Hi Andrey. Thanks for the information. These issues look related to the ones we've run into. Looking forward to the fixes.
Regards.

Arseny Kovalchuk
Senior Software Engineer at Synesis
skype: arseny.kovalchuk
mobile: +375 (29) 666-16-16
LinkedIn Profile <http://www.linkedin.com/in/arsenykovalchuk/en>

On 26 December 2017 at 14:49, Andrey Mashenkov <[email protected]> wrote:

> Hi Arseny,
>
> Seems this is already fixed [1] in master, but seems there is another
> issue [2] and we are in the middle of fixing it.
> We've found there were some unsafe memory changing operations without lock.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-6423
> [2] https://issues.apache.org/jira/browse/IGNITE-7278
>
> On Tue, Dec 26, 2017 at 1:02 PM, Arseny Kovalchuk <
> [email protected]> wrote:
>
>> Hi guys.
>>
>> Another issue when using Ignite 2.3 with native persistence enabled. See
>> details below.
>>
>> We deploy Ignite along with our services in Kubernetes (v 1.8) on
>> premises. Ignite cluster is a StatefulSet of 5 Pods (5 instances) of Ignite
>> version 2.3. Each Pod mounts PersistentVolume backed by CEPH RBD.
>>
>> We put about 230 events/second into Ignite, 70% of events are ~200KB in
>> size and 30% are 5000KB. Smaller events have indexed fields and we query
>> them via SQL.
>>
>> The cluster is activated from a client node which also streams events
>> into Ignite from Kafka. We use custom implementation of streamer which uses
>> cache.putAll() API.
>>
>> We started cluster from scratch without any persistent data. After a
>> while we got corrupted data with the error message.
>>
>> [2017-12-26 07:44:14,251] ERROR [sys-#127%ignite-instance-2%]
>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader:
>> - Partition eviction failed, this can cause grid hang.
>> class org.apache.ignite.IgniteException: Runtime failure on search row:
>> Row@5b1479d6[ key: 171:1513946618964:3008806055072854, val:
>> ru.synesis.kipod.event.KipodEvent [idHash=510912646, hash=-387621419,
>> face_last_name=null, face_list_id=null, channel=171, source=,
>> face_similarity=null, license_plate_number=null, descriptors=null,
>> cacheName=kipod_events, cacheKey=171:1513946618964:3008806055072854,
>> stream=171, alarm=false, processed_at=0, face_id=null, id=3008806055072854,
>> persistent=false, face_first_name=null, license_plate_first_name=null,
>> face_full_name=null, level=0, module=Kpx.Synesis.Outdoor,
>> end_time=1513946624379, params=null, commented_at=0, tags=[vehicle, 0,
>> human, 0, truck, 0, start_time=1513946618964, processed=false,
>> kafka_offset=111259, license_plate_last_name=null, armed=false,
>> license_plate_country=null, topic=MovingObject, comment=,
>> expiration=1514033024000, original_id=null, license_plate_lists=null], ver:
>> GridCacheVersion [topVer=125430590, order=1513955001926, nodeOrder=3] ][
>> 3008806055072854, MovingObject, Kpx.Synesis.Outdoor, 0, , 1513946618964,
>> 1513946624379, 171, 171, FALSE, FALSE, , FALSE, FALSE, 0, 0, 111259,
>> 1514033024000, (vehicle, 0, human, 0, truck, 0), null, null, null, null,
>> null, null, null, null, null, null, null, null ]
>>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:1787)
>>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.remove(BPlusTree.java:1578)
>>     at org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.remove(H2TreeIndex.java:216)
>>     at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.doUpdate(GridH2Table.java:496)
>>     at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.update(GridH2Table.java:423)
>>     at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.remove(IgniteH2Indexing.java:580)
>>     at org.apache.ignite.internal.processors.query.GridQueryProcessor.remove(GridQueryProcessor.java:2334)
>>     at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.remove(GridCacheQueryManager.java:461)
>>     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishRemove(IgniteCacheOffheapManagerImpl.java:1453)
>>     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1416)
>>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.remove(GridCacheOffheapManager.java:1271)
>>     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:374)
>>     at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3233)
>>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:588)
>>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:951)
>>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:809)
>>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:593)
>>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:580)
>>     at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6631)
>>     at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967)
>>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>     at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.lang.IllegalStateException: Failed to get page IO
>> instance (page content is corrupted)
>>     at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forVersion(IOVersions.java:83)
>>     at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forPage(IOVersions.java:95)
>>     at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:148)
>>     at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:102)
>>     at org.apache.ignite.internal.processors.query.h2.database.H2RowFactory.getRow(H2RowFactory.java:62)
>>     at org.apache.ignite.internal.processors.query.h2.database.io.H2ExtrasLeafIO.getLookupRow(H2ExtrasLeafIO.java:126)
>>     at org.apache.ignite.internal.processors.query.h2.database.io.H2ExtrasLeafIO.getLookupRow(H2ExtrasLeafIO.java:36)
>>     at org.apache.ignite.internal.processors.query.h2.database.H2Tree.getRow(H2Tree.java:123)
>>     at org.apache.ignite.internal.processors.query.h2.database.H2Tree.getRow(H2Tree.java:40)
>>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.getRow(BPlusTree.java:4372)
>>     at org.apache.ignite.internal.processors.query.h2.database.H2Tree.compare(H2Tree.java:200)
>>     at org.apache.ignite.internal.processors.query.h2.database.H2Tree.compare(H2Tree.java:40)
>>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.compare(BPlusTree.java:4359)
>>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findInsertionPoint(BPlusTree.java:4279)
>>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$1500(BPlusTree.java:81)
>>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Search.run0(BPlusTree.java:261)
>>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:4697)
>>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:4682)
>>     at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.readPage(PageHandler.java:158)
>>     at org.apache.ignite.internal.processors.cache.persistence.DataStructure.read(DataStructure.java:319)
>>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:1823)
>>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:1842)
>>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:1842)
>>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:1842)
>>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:1752)
>>     ... 23 more
>>
>> After restart we also
>>
>> Arseny Kovalchuk
>>
>> Senior Software Engineer at Synesis
>> skype: arseny.kovalchuk
>> mobile: +375 (29) 666-16-16
>> LinkedIn Profile <http://www.linkedin.com/in/arsenykovalchuk/en>
>
> --
> Best regards,
> Andrey V. Mashenkov
