Pavel Kovalenko created IGNITE-9084: ---------------------------------------
Summary: Trash in WAL after node stop may affect WAL rebalance Key: IGNITE-9084 URL: https://issues.apache.org/jira/browse/IGNITE-9084 Project: Ignite Issue Type: Bug Components: cache Affects Versions: 2.6 Reporter: Pavel Kovalenko Assignee: Pavel Kovalenko Fix For: 2.7 During iteration over WAL we can face with trash in WAL segment, which can remains after node restart. We should handle this situation in WAL rebalance iterator and gracefully stop iteration process. {noformat} [2018-07-25 17:18:21,152][ERROR][sys-#25385%persistence.IgnitePdsTxHistoricalRebalancingTest0%][GridCacheIoManager] Failed to process message [senderId=f0d35df7-ff93-4b6c-b699-45f3e7c00003, messageType=class o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage] class org.apache.ignite.IgniteException: Failed to read WAL record at position: 19346739 size: 67108864 at org.apache.ignite.internal.util.lang.GridIteratorAdapter.next(GridIteratorAdapter.java:38) at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$WALHistoricalIterator.advance(GridCacheOffheapManager.java:1033) at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$WALHistoricalIterator.next(GridCacheOffheapManager.java:948) at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$WALHistoricalIterator.nextX(GridCacheOffheapManager.java:917) at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$WALHistoricalIterator.nextX(GridCacheOffheapManager.java:842) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteRebalanceIteratorImpl.nextX(IgniteRebalanceIteratorImpl.java:130) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteRebalanceIteratorImpl.next(IgniteRebalanceIteratorImpl.java:185) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteRebalanceIteratorImpl.next(IgniteRebalanceIteratorImpl.java:37) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplier.handleDemandMessage(GridDhtPartitionSupplier.java:348) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.handleDemandMessage(GridDhtPreloader.java:370) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:380) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:365) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1056) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$700(GridCacheIoManager.java:101) at org.apache.ignite.internal.processors.cache.GridCacheIoManager$OrderedMessageListener.onMessage(GridCacheIoManager.java:1613) at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556) at org.apache.ignite.internal.managers.communication.GridIoManager.access$4100(GridIoManager.java:125) at org.apache.ignite.internal.managers.communication.GridIoManager$GridCommunicationMessageSet.unwind(GridIoManager.java:2752) at org.apache.ignite.internal.managers.communication.GridIoManager.unwindMessageSet(GridIoManager.java:1516) at org.apache.ignite.internal.managers.communication.GridIoManager.access$4400(GridIoManager.java:125) at org.apache.ignite.internal.managers.communication.GridIoManager$10.run(GridIoManager.java:1485) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: class org.apache.ignite.IgniteCheckedException: Failed to read WAL record at position: 19346739 size: 67108864 at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.handleRecordException(AbstractWalRecordsIterator.java:263) at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.advanceRecord(AbstractWalRecordsIterator.java:229) at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.advance(AbstractWalRecordsIterator.java:149) at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.onNext(AbstractWalRecordsIterator.java:115) at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.onNext(AbstractWalRecordsIterator.java:49) at org.apache.ignite.internal.util.GridCloseableIteratorAdapter.nextX(GridCloseableIteratorAdapter.java:41) at org.apache.ignite.internal.util.lang.GridIteratorAdapter.next(GridIteratorAdapter.java:35) ... 24 more Caused by: class org.apache.ignite.IgniteCheckedException: Failed to read WAL record at position: 19346739 size: 67108864 at org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV1Serializer.readWithCrc(RecordV1Serializer.java:391) at org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV2Serializer.readRecord(RecordV2Serializer.java:231) at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.advanceRecord(AbstractWalRecordsIterator.java:217) ... 29 more Caused by: class org.apache.ignite.internal.processors.cache.persistence.wal.crc.IgniteDataIntegrityViolationException: val: -49246101 writtenCrc: 0 at org.apache.ignite.internal.processors.cache.persistence.wal.FileInput$Crc32CheckingFileInput.close(FileInput.java:327) at org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV1Serializer.readWithCrc(RecordV1Serializer.java:377) ... 31 more {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)