Hi Dan,

We are doing put operations within transactions.
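
The puts happen inside an AsyncEventListener, and a GemFire transaction is begun while processing each batch (you can see TXManagerImpl.begin in the trace below). A simplified sketch of the shape — the "txLog" region name and the TxPutListener class are illustrative stand-ins, not the actual TxLogAsyncEventListener/GemFireTxLogDao code:

import java.util.List;

import com.gemstone.gemfire.cache.Cache;
import com.gemstone.gemfire.cache.CacheFactory;
import com.gemstone.gemfire.cache.CacheTransactionManager;
import com.gemstone.gemfire.cache.Region;
import com.gemstone.gemfire.cache.asyncqueue.AsyncEvent;
import com.gemstone.gemfire.cache.asyncqueue.AsyncEventListener;

// Illustrative sketch only -- not the actual listener/DAO code.
public class TxPutListener implements AsyncEventListener {
    @Override
    public boolean processEvents(List<AsyncEvent> events) {
        Cache cache = CacheFactory.getAnyInstance();
        Region<Object, Object> region = cache.getRegion("txLog"); // illustrative name
        CacheTransactionManager txMgr = cache.getCacheTransactionManager();
        txMgr.begin(); // the TXManagerImpl.begin() frame in the trace below
        try {
            for (AsyncEvent event : events) {
                region.put(event.getKey(), event.getDeserializedValue());
            }
            txMgr.commit();
            return true; // batch processed; the queue can discard it
        } catch (RuntimeException e) {
            if (txMgr.exists()) {
                txMgr.rollback();
            }
            return false; // batch will be redelivered
        }
    }

    @Override
    public void close() {
    }
}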

Here is the stack trace:

[warn 2016/09/29 06:17:16.297 UTC <Event Processor for 
GatewaySender_AsyncEventQueue_txLogEventQueue> tid=0x19] 
com.gemstone.gemfire.internal.cache.wan.GatewaySenderEventCallbackDispatcher@55bded67:
 Exception during processing batch 448
com.gemstone.gemfire.internal.cache.wan.GatewaySenderException: 
com.gemstone.gemfire.internal.cache.wan.GatewaySenderEventCallbackDispatcher@55bded67:
 Exception during processing batch 448, caused by 
com.gemstone.gemfire.cache.CacheClosedException: For DiskStore: nsxDiskStore: 
Failed writing key to "/common/nsxapi/data/self/BACKUPnsxDiskStore_7", caused 
by com.gemstone.gemfire.cache.DiskAccessException: For DiskStore: nsxDiskStore: 
Failed writing key to "/common/nsxapi/data/self/BACKUPnsxDiskStore_7", caused 
by java.io.IOException: Stream Closed
        at com.gemstone.gemfire.internal.cache.wan.GatewaySenderEventCallbackDispatcher.dispatchBatch(GatewaySenderEventCallbackDispatcher.java:173)
        at com.gemstone.gemfire.internal.cache.wan.GatewaySenderEventCallbackDispatcher.dispatchBatch(GatewaySenderEventCallbackDispatcher.java:83)
        at com.gemstone.gemfire.internal.cache.wan.AbstractGatewaySenderEventProcessor.processQueue(AbstractGatewaySenderEventProcessor.java:579)
        at com.gemstone.gemfire.internal.cache.wan.serial.SerialGatewaySenderEventProcessor.run(SerialGatewaySenderEventProcessor.java:219)
Caused by: com.gemstone.gemfire.cache.CacheClosedException: For DiskStore: 
nsxDiskStore: Failed writing key to 
"/common/nsxapi/data/self/BACKUPnsxDiskStore_7", caused by 
com.gemstone.gemfire.cache.DiskAccessException: For DiskStore: nsxDiskStore: 
Failed writing key to "/common/nsxapi/data/self/BACKUPnsxDiskStore_7", caused 
by java.io.IOException: Stream Closed
        at com.gemstone.gemfire.internal.cache.GemFireCacheImpl$Stopper.generateCancelledException(GemFireCacheImpl.java:1299)
        at com.gemstone.gemfire.CancelCriterion.checkCancelInProgress(CancelCriterion.java:82)
        at com.gemstone.gemfire.internal.cache.TXManagerImpl.checkClosed(TXManagerImpl.java:606)
        at com.gemstone.gemfire.internal.cache.TXManagerImpl.begin(TXManagerImpl.java:279)
        at com.vmware.nsx.management.container.dao.gemfire.GemFireTxLogDao.processTxLog(GemFireTxLogDao.java:119)
        at com.vmware.nsx.management.container.dao.gemfire.TxLogAsyncEventListener.processEvents(TxLogAsyncEventListener.java:93)
        at com.gemstone.gemfire.internal.cache.wan.GatewaySenderEventCallbackDispatcher.dispatchBatch(GatewaySenderEventCallbackDispatcher.java:164)
        ... 3 more
Caused by: com.gemstone.gemfire.cache.DiskAccessException: For DiskStore: 
nsxDiskStore: Failed writing key to 
"/common/nsxapi/data/self/BACKUPnsxDiskStore_7", caused by java.io.IOException: 
Stream Closed
        at com.gemstone.gemfire.internal.cache.Oplog.flushAll(Oplog.java:5235)
        at com.gemstone.gemfire.internal.cache.Oplog.flushAll(Oplog.java:5223)
        at com.gemstone.gemfire.internal.cache.Oplog.flushAll(Oplog.java:5215)
        at com.gemstone.gemfire.internal.cache.Oplog.basicClose(Oplog.java:3312)
        at com.gemstone.gemfire.internal.cache.Oplog.destroy(Oplog.java:3375)
        at com.gemstone.gemfire.internal.cache.Oplog.handleEmptyAndOldest(Oplog.java:5877)
        at com.gemstone.gemfire.internal.cache.Oplog.handleNoLiveValues(Oplog.java:5924)
        at com.gemstone.gemfire.internal.cache.Oplog.basicModify(Oplog.java:4751)
        at com.gemstone.gemfire.internal.cache.Oplog.copyForwardModifyForCompact(Oplog.java:4594)
        at com.gemstone.gemfire.internal.cache.Oplog.compact(Oplog.java:6087)
        at com.gemstone.gemfire.internal.cache.DiskStoreImpl$OplogCompactor.compact(DiskStoreImpl.java:2970)
        at com.gemstone.gemfire.internal.cache.DiskStoreImpl$OplogCompactor.run(DiskStoreImpl.java:3034)
        at com.gemstone.gemfire.internal.cache.DiskStoreImpl$6.run(DiskStoreImpl.java:4701)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Stream Closed
        at java.io.RandomAccessFile.getFilePointer(Native Method)
        at com.gemstone.gemfire.internal.cache.persistence.UninterruptibleRandomAccessFile.getFilePointer(UninterruptibleRandomAccessFile.java:59)
        at com.gemstone.gemfire.internal.cache.persistence.UninterruptibleRandomAccessFile$UninterruptibleFileChannelImpl.doUninterruptibly(UninterruptibleRandomAccessFile.java:102)
        at com.gemstone.gemfire.internal.cache.persistence.UninterruptibleRandomAccessFile$UninterruptibleFileChannelImpl.force(UninterruptibleRandomAccessFile.java:226)
        at com.gemstone.gemfire.internal.cache.Oplog.flush(Oplog.java:5202)
        at com.gemstone.gemfire.internal.cache.Oplog.flushAll(Oplog.java:5232)
        ... 17 more

Thanks
Kapil

-----Original Message-----
From: Dan Smith [mailto:[email protected]] 
Sent: Tuesday, October 04, 2016 10:08 AM
To: [email protected]
Subject: Re: OplogCompactor race issue?

Hi Kapil,

I haven't seen this particular issue before. Do you have any stack traces of 
the exception from the OplogCompactor? What sort of operations are you doing? 
puts?

-Dan

On Tue, Oct 4, 2016 at 1:16 AM, Kapil Goyal <[email protected]> wrote:
> Hi All,
>
> We have been testing a single cache node with a lot of data recently 
> and frequently run into this error:
>
> [info 2016/09/29 06:16:06.823 UTC <OplogCompactor nsxDiskStore for oplog oplog#6> tid=0x19] OplogCompactor for nsxDiskStore compaction oplog id(s): oplog#6
> [info 2016/09/29 06:16:08.232 UTC <OplogCompactor nsxDiskStore for oplog oplog#6> tid=0x19] compaction did 6,310 creates and updates in 1,408 ms
> [info 2016/09/29 06:16:08.248 UTC <Oplog Delete Task> tid=0x19] Deleted oplog#6 crf for disk store nsxDiskStore.
> [info 2016/09/29 06:16:08.256 UTC <Oplog Delete Task> tid=0x19] Deleted oplog#6 krf for disk store nsxDiskStore.
> [info 2016/09/29 06:16:08.256 UTC <Oplog Delete Task> tid=0x19] Deleted oplog#6 drf for disk store nsxDiskStore.
> [info 2016/09/29 06:17:03.887 UTC <Event Processor for GatewaySender_AsyncEventQueue_txLogEventQueue> tid=0x19] Created oplog#8 drf for disk store nsxDiskStore.
> [info 2016/09/29 06:17:03.911 UTC <Event Processor for GatewaySender_AsyncEventQueue_txLogEventQueue> tid=0x19] Created oplog#8 crf for disk store nsxDiskStore.
> [info 2016/09/29 06:17:04.031 UTC <Idle OplogCompactor> tid=0x19] Created oplog#7 krf for disk store nsxDiskStore.
> [info 2016/09/29 06:17:04.314 UTC <OplogCompactor nsxDiskStore for oplog oplog#7> tid=0x19] OplogCompactor for nsxDiskStore compaction oplog id(s): oplog#7
> [error 2016/09/29 06:17:16.075 UTC <OplogCompactor nsxDiskStore for oplog oplog#7> tid=0x19] A DiskAccessException has occurred while writing to the disk for disk store nsxDiskStore. The cache will be closed.
> com.gemstone.gemfire.cache.DiskAccessException: For DiskStore: nsxDiskStore: Failed writing key to "/common/nsxapi/data/self/BACKUPnsxDiskStore_7", caused by java.io.IOException: Stream Closed
>         at com.gemstone.gemfire.internal.cache.Oplog.flushAll(Oplog.java:5235)
>
> From the logs, it appears there may be a race between the threads "Idle
> OplogCompactor" and "OplogCompactor nsxDiskStore for oplog oplog#7":
> both are doing operations on oplog#7. The former logs the creation of
> oplog#7's KRF file, while the latter is trying to access its DRF or CRF
> file. Is it possible that "Idle OplogCompactor" closed the DRF/CRF
> files for oplog#7 as part of creating its KRF? This is what the GemFire
> docs say about it:
>
> "After the oplog is closed, GemFire also attempts to create a krf 
> file, which contains the key names as well as the offset for the value 
> within the crf file."
>
> Based on the above, it's possible that oplog#7 had already been closed
> and its KRF created by the time the compactor tried to access its files.
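>
> To sanity-check that this failure mode is plausible, here is a
> standalone sketch in plain JDK code (nothing GemFire-specific; the file
> name and thread roles are made up): one thread keeps writing through a
> RandomAccessFile while another thread closes it, which produces exactly
> this "java.io.IOException: Stream Closed" from the same
> getFilePointer() native call that appears in our stack trace.
>
> import java.io.IOException;
> import java.io.RandomAccessFile;
>
> public class StreamClosedRaceDemo {
>     public static void main(String[] args) throws Exception {
>         RandomAccessFile oplogFile =
>                 new RandomAccessFile("/tmp/oplog-race-demo.crf", "rw");
>
>         // Stands in for the compactor copying live entries forward.
>         Thread compactor = new Thread(() -> {
>             try {
>                 for (int i = 0; i < 1_000_000; i++) {
>                     oplogFile.getFilePointer(); // native call; fails once closed
>                     oplogFile.writeInt(i);
>                 }
>             } catch (IOException e) {
>                 e.printStackTrace(); // java.io.IOException: Stream Closed
>             }
>         }, "OplogCompactor");
>
>         compactor.start();
>         Thread.sleep(5);   // let the writer get going
>         oplogFile.close(); // stands in for another thread closing the oplog
>         compactor.join();
>     }
> }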
>
> Have any of you run into this error before? Any suggestions?
>
> Thanks
> Kapil
