Hi Dan,
We are doing put operations within transactions.
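To give a concrete picture, the writes are essentially transactional puts along these lines (a minimal sketch only; the region name, keys, and values below are illustrative, not our actual data model):

import com.gemstone.gemfire.cache.Cache;
import com.gemstone.gemfire.cache.CacheFactory;
import com.gemstone.gemfire.cache.CacheTransactionManager;
import com.gemstone.gemfire.cache.Region;

public class TxPutExample {
    public static void main(String[] args) {
        // Obtain the already-running cache; the cache and regions are
        // created elsewhere from our configuration (not shown here).
        Cache cache = CacheFactory.getAnyInstance();
        Region<String, String> region = cache.getRegion("exampleRegion"); // illustrative name

        CacheTransactionManager txMgr = cache.getCacheTransactionManager();
        txMgr.begin();
        try {
            // Several puts are grouped into a single transaction.
            region.put("key-1", "value-1");
            region.put("key-2", "value-2");
            txMgr.commit();
        } catch (RuntimeException e) {
            // Roll back only if the transaction is still in progress.
            if (txMgr.exists()) {
                txMgr.rollback();
            }
            throw e;
        }
    }
}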
Here is the stack trace:
[warn 2016/09/29 06:17:16.297 UTC <Event Processor for GatewaySender_AsyncEventQueue_txLogEventQueue> tid=0x19] com.gemstone.gemfire.internal.cache.wan.GatewaySenderEventCallbackDispatcher@55bded67: Exception during processing batch 448
com.gemstone.gemfire.internal.cache.wan.GatewaySenderException: com.gemstone.gemfire.internal.cache.wan.GatewaySenderEventCallbackDispatcher@55bded67: Exception during processing batch 448, caused by com.gemstone.gemfire.cache.CacheClosedException: For DiskStore: nsxDiskStore: Failed writing key to "/common/nsxapi/data/self/BACKUPnsxDiskStore_7", caused by com.gemstone.gemfire.cache.DiskAccessException: For DiskStore: nsxDiskStore: Failed writing key to "/common/nsxapi/data/self/BACKUPnsxDiskStore_7", caused by java.io.IOException: Stream Closed
    at com.gemstone.gemfire.internal.cache.wan.GatewaySenderEventCallbackDispatcher.dispatchBatch(GatewaySenderEventCallbackDispatcher.java:173)
    at com.gemstone.gemfire.internal.cache.wan.GatewaySenderEventCallbackDispatcher.dispatchBatch(GatewaySenderEventCallbackDispatcher.java:83)
    at com.gemstone.gemfire.internal.cache.wan.AbstractGatewaySenderEventProcessor.processQueue(AbstractGatewaySenderEventProcessor.java:579)
    at com.gemstone.gemfire.internal.cache.wan.serial.SerialGatewaySenderEventProcessor.run(SerialGatewaySenderEventProcessor.java:219)
Caused by: com.gemstone.gemfire.cache.CacheClosedException: For DiskStore: nsxDiskStore: Failed writing key to "/common/nsxapi/data/self/BACKUPnsxDiskStore_7", caused by com.gemstone.gemfire.cache.DiskAccessException: For DiskStore: nsxDiskStore: Failed writing key to "/common/nsxapi/data/self/BACKUPnsxDiskStore_7", caused by java.io.IOException: Stream Closed
    at com.gemstone.gemfire.internal.cache.GemFireCacheImpl$Stopper.generateCancelledException(GemFireCacheImpl.java:1299)
    at com.gemstone.gemfire.CancelCriterion.checkCancelInProgress(CancelCriterion.java:82)
    at com.gemstone.gemfire.internal.cache.TXManagerImpl.checkClosed(TXManagerImpl.java:606)
    at com.gemstone.gemfire.internal.cache.TXManagerImpl.begin(TXManagerImpl.java:279)
    at com.vmware.nsx.management.container.dao.gemfire.GemFireTxLogDao.processTxLog(GemFireTxLogDao.java:119)
    at com.vmware.nsx.management.container.dao.gemfire.TxLogAsyncEventListener.processEvents(TxLogAsyncEventListener.java:93)
    at com.gemstone.gemfire.internal.cache.wan.GatewaySenderEventCallbackDispatcher.dispatchBatch(GatewaySenderEventCallbackDispatcher.java:164)
    ... 3 more
Caused by: com.gemstone.gemfire.cache.DiskAccessException: For DiskStore: nsxDiskStore: Failed writing key to "/common/nsxapi/data/self/BACKUPnsxDiskStore_7", caused by java.io.IOException: Stream Closed
    at com.gemstone.gemfire.internal.cache.Oplog.flushAll(Oplog.java:5235)
    at com.gemstone.gemfire.internal.cache.Oplog.flushAll(Oplog.java:5223)
    at com.gemstone.gemfire.internal.cache.Oplog.flushAll(Oplog.java:5215)
    at com.gemstone.gemfire.internal.cache.Oplog.basicClose(Oplog.java:3312)
    at com.gemstone.gemfire.internal.cache.Oplog.destroy(Oplog.java:3375)
    at com.gemstone.gemfire.internal.cache.Oplog.handleEmptyAndOldest(Oplog.java:5877)
    at com.gemstone.gemfire.internal.cache.Oplog.handleNoLiveValues(Oplog.java:5924)
    at com.gemstone.gemfire.internal.cache.Oplog.basicModify(Oplog.java:4751)
    at com.gemstone.gemfire.internal.cache.Oplog.copyForwardModifyForCompact(Oplog.java:4594)
    at com.gemstone.gemfire.internal.cache.Oplog.compact(Oplog.java:6087)
    at com.gemstone.gemfire.internal.cache.DiskStoreImpl$OplogCompactor.compact(DiskStoreImpl.java:2970)
    at com.gemstone.gemfire.internal.cache.DiskStoreImpl$OplogCompactor.run(DiskStoreImpl.java:3034)
    at com.gemstone.gemfire.internal.cache.DiskStoreImpl$6.run(DiskStoreImpl.java:4701)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Stream Closed
    at java.io.RandomAccessFile.getFilePointer(Native Method)
    at com.gemstone.gemfire.internal.cache.persistence.UninterruptibleRandomAccessFile.getFilePointer(UninterruptibleRandomAccessFile.java:59)
    at com.gemstone.gemfire.internal.cache.persistence.UninterruptibleRandomAccessFile$UninterruptibleFileChannelImpl.doUninterruptibly(UninterruptibleRandomAccessFile.java:102)
    at com.gemstone.gemfire.internal.cache.persistence.UninterruptibleRandomAccessFile$UninterruptibleFileChannelImpl.force(UninterruptibleRandomAccessFile.java:226)
    at com.gemstone.gemfire.internal.cache.Oplog.flush(Oplog.java:5202)
    at com.gemstone.gemfire.internal.cache.Oplog.flushAll(Oplog.java:5232)
    ... 17 more
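For reference, the listener and DAO in that trace boil down to something like the following (a heavily simplified sketch of our TxLogAsyncEventListener/GemFireTxLogDao; the target region name and value handling here are illustrative, not the real code):

import java.util.List;

import com.gemstone.gemfire.cache.Cache;
import com.gemstone.gemfire.cache.CacheFactory;
import com.gemstone.gemfire.cache.CacheTransactionManager;
import com.gemstone.gemfire.cache.Region;
import com.gemstone.gemfire.cache.asyncqueue.AsyncEvent;
import com.gemstone.gemfire.cache.asyncqueue.AsyncEventListener;

public class TxLogAsyncEventListener implements AsyncEventListener {

    @Override
    public boolean processEvents(List<AsyncEvent> events) {
        Cache cache = CacheFactory.getAnyInstance();
        // txMgr.begin() is the TXManagerImpl.begin() frame in the trace;
        // once the DiskAccessException has closed the cache, this is where
        // the CacheClosedException surfaces.
        CacheTransactionManager txMgr = cache.getCacheTransactionManager();
        txMgr.begin();
        try {
            for (AsyncEvent event : events) {
                // Replay each queued tx-log event into the target region
                // (illustrative; the real DAO does more bookkeeping).
                Region<Object, Object> target = cache.getRegion("txLogTarget");
                target.put(event.getKey(), event.getDeserializedValue());
            }
            txMgr.commit();
            return true;
        } catch (RuntimeException e) {
            if (txMgr.exists()) {
                txMgr.rollback();
            }
            // Returning false tells the async event queue to retry the batch.
            return false;
        }
    }

    @Override
    public void close() {
    }
}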
Thanks
Kapil
-----Original Message-----
From: Dan Smith [mailto:[email protected]]
Sent: Tuesday, October 04, 2016 10:08 AM
To: [email protected]
Subject: Re: OplogCompactor race issue?
Hi Kapil,
I haven't seen this particular issue before. Do you have any stack traces of
the exception from the OplogCompactor? What sort of operations are you doing?
puts?
-Dan
On Tue, Oct 4, 2016 at 1:16 AM, Kapil Goyal <[email protected]> wrote:
> Hi All,
>
> We have been testing a single cache node with a lot of data recently
> and frequently run into this error:
>
> [info 2016/09/29 06:16:06.823 UTC <OplogCompactor nsxDiskStore for oplog oplog#6> tid=0x19] OplogCompactor for nsxDiskStore compaction oplog id(s): oplog#6
> [info 2016/09/29 06:16:08.232 UTC <OplogCompactor nsxDiskStore for oplog oplog#6> tid=0x19] compaction did 6,310 creates and updates in 1,408 ms
> [info 2016/09/29 06:16:08.248 UTC <Oplog Delete Task> tid=0x19] Deleted oplog#6 crf for disk store nsxDiskStore.
> [info 2016/09/29 06:16:08.256 UTC <Oplog Delete Task> tid=0x19] Deleted oplog#6 krf for disk store nsxDiskStore.
> [info 2016/09/29 06:16:08.256 UTC <Oplog Delete Task> tid=0x19] Deleted oplog#6 drf for disk store nsxDiskStore.
> [info 2016/09/29 06:17:03.887 UTC <Event Processor for GatewaySender_AsyncEventQueue_txLogEventQueue> tid=0x19] Created oplog#8 drf for disk store nsxDiskStore.
> [info 2016/09/29 06:17:03.911 UTC <Event Processor for GatewaySender_AsyncEventQueue_txLogEventQueue> tid=0x19] Created oplog#8 crf for disk store nsxDiskStore.
> [info 2016/09/29 06:17:04.031 UTC <Idle OplogCompactor> tid=0x19] Created oplog#7 krf for disk store nsxDiskStore.
> [info 2016/09/29 06:17:04.314 UTC <OplogCompactor nsxDiskStore for oplog oplog#7> tid=0x19] OplogCompactor for nsxDiskStore compaction oplog id(s): oplog#7
> [error 2016/09/29 06:17:16.075 UTC <OplogCompactor nsxDiskStore for oplog oplog#7> tid=0x19] A DiskAccessException has occurred while writing to the disk for disk store nsxDiskStore. The cache will be closed.
> com.gemstone.gemfire.cache.DiskAccessException: For DiskStore: nsxDiskStore: Failed writing key to "/common/nsxapi/data/self/BACKUPnsxDiskStore_7", caused by java.io.IOException: Stream Closed
>     at com.gemstone.gemfire.internal.cache.Oplog.flushAll(Oplog.java:5235)
>
> From the logs it appears there may be a race between threads "Idle
> OplogCompactor" and "OplogCompactor nsxDiskStore for oplog oplog#7". I
> see that both are doing operations related to oplog#7. The former logs
> creation of a KRF file, while the latter is trying to access either
> the DRF or the CRF file. Now, is it possible that "Idle
> OplogCompactor" closed the DRF/CRF files for oplog#7 as part of
> creating the KRF for the same? This is what GemFire docs say about it:
>
> "After the oplog is closed, GemFire also attempts to create a krf
> file, which contains the key names as well as the offset for the value
> within the crf file."
>
> Based on the above, it's possible that oplog#7 had already been closed
> and its KRF already created when the compactor tried to access the files.
>
> Have any of you run into this error before? Any suggestions?
>
> Thanks
> Kapil