Re: Corrupted B+ Tree Causing Repeated Crashes
Hello,

Has the following code fix solved this issue?
https://issues.apache.org/jira/browse/IGNITE-12489

On 2021/2/27 at 2:22 AM, Maxim Muzafarov wrote:
> Mitchell,
>
> I've created issue [1] for your case, but it is really hard to determine the root cause without additional information (the exception stack trace alone isn't enough for analysis). Do you have a reproducer for the issue? Does the issue still appear?
>
> Some issues in the persistent data store have already been fixed since version 2.7.5, but some of them may still exist. So, I see the following options here:
>
> - as a workaround: clean up the node's PDS (persistent data store) and wait for the data to be rebalanced;
> - try to reproduce the issue on another cluster;
> - take a snapshot, restore it in another environment, and run the idle_verify check (it will probably help).
>
> [1] https://issues.apache.org/jira/browse/IGNITE-14252
Re: Corrupted B+ Tree Causing Repeated Crashes
Mitchell,

I've created issue [1] for your case, but it is really hard to determine the root cause without additional information (the exception stack trace alone isn't enough for analysis). Do you have a reproducer for the issue? Does the issue still appear?

Some issues in the persistent data store have already been fixed since version 2.7.5, but some of them may still exist. So, I see the following options here:

- as a workaround: clean up the node's PDS (persistent data store) and wait for the data to be rebalanced;
- try to reproduce the issue on another cluster;
- take a snapshot, restore it in another environment, and run the idle_verify check (it will probably help).

[1] https://issues.apache.org/jira/browse/IGNITE-14252

On Fri, 26 Feb 2021 at 06:44, Mitchell Rathbun (BLOOMBERG/ 731 LEX) wrote:
> The rest of the logs are mixed in with the rest of our process logs, so I can't really share that. The configuration looks as follows:
>
> DataRegionConfiguration dataRegionCfg = new DataRegionConfiguration();
> dataRegionCfg.setName(DATA_REGION_NAME)
>     .setInitialSize(200_000_000)
>     .setMaxSize(200_000_000)
>     .setPersistenceEnabled(true)
>     .setMetricsEnabled(true);
>
> DataStorageConfiguration storageCfg = new DataStorageConfiguration();
> storageCfg.setDataRegionConfigurations(dataRegionCfg)
>     .setWriteThrottlingEnabled(true)
>     .setMetricsEnabled(true);
>
> IgniteConfiguration ignCfg = new IgniteConfiguration();
> ignCfg.setWorkDirectory(workDirectory)
>     .setDataStorageConfiguration(storageCfg)
>     .setIgniteInstanceName("instanceName")
>     .setSystemWorkerBlockedTimeout(1)
>     .setFailureDetectionTimeout(1)
Re: Corrupted B+ Tree Causing Repeated Crashes
Mitchell,

Can you provide the full log and the cache configuration?

On Thu, 25 Feb 2021 at 03:55, Mitchell Rathbun (BLOOMBERG/ 731 LEX) wrote:
> Any other thoughts on this? The data corruption occurred when we were using version 2.7.5. I have looked at a couple of tickets involving corrupted trees, but it doesn't seem like any of them apply to our use case of Ignite. Would like to understand at least how we get into this corrupted state in the first place, and how to handle it when it happens. Is there a way to detect and log this error while avoiding crashing the process?
Re: Corrupted B+ Tree Causing Repeated Crashes
Any other thoughts on this? The data corruption occurred when we were using version 2.7.5. I have looked at a couple of tickets involving corrupted trees, but it doesn't seem like any of them apply to our use case of Ignite. Would like to understand at least how we get into this corrupted state in the first place, and how to handle it when it happens. Is there a way to detect and log this error while avoiding crashing the process?

From: user@ignite.apache.org At: 02/19/21 14:18:44
To: Mitchell Rathbun (BLOOMBERG/ 731 LEX), user@ignite.apache.org
Subject: Re: Corrupted B+ Tree Causing Repeated Crashes

> Hello! What version of Apache Ignite are you using?
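On the question of detecting and logging the error without crashing the process: the log in this thread shows the node running with StopNodeOrHaltFailureHandler, which stops the node on a critical error. The following is only a minimal sketch, assuming it is acceptable to keep a node with a known-corrupted tree running; the class name LogOnlyFailureHandler and the configure helper are hypothetical and were not proposed by anyone in the thread. It relies on Ignite's AbstractFailureHandler, FailureContext and IgniteConfiguration.setFailureHandler.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

import org.apache.ignite.Ignite;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.AbstractFailureHandler;
import org.apache.ignite.failure.FailureContext;

/** Hypothetical handler: logs every reported failure and leaves the node running. */
class LogOnlyFailureHandler extends AbstractFailureHandler {
    private static final Logger LOG =
        Logger.getLogger(LogOnlyFailureHandler.class.getName());

    @Override
    protected boolean handle(Ignite ignite, FailureContext failureCtx) {
        // failureCtx.type() is e.g. CRITICAL_ERROR; failureCtx.error() carries
        // the CorruptedTreeException shown in the log.
        LOG.log(Level.SEVERE, "Ignite failure, type=" + failureCtx.type(),
            failureCtx.error());

        // Returning false tells Ignite the node was not invalidated,
        // so the node is neither stopped nor halted.
        return false;
    }

    /** Hypothetical helper: installs the handler on an existing configuration. */
    static IgniteConfiguration configure(IgniteConfiguration cfg) {
        return cfg.setFailureHandler(new LogOnlyFailureHandler());
    }
}
```

This does not repair anything: reads that touch the corrupted page would presumably still fail with an exception on the calling thread, and continuing to write to a corrupted store may make matters worse. It only moves the decision about what to do next from Ignite to the application.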
Re: Corrupted B+ Tree Causing Repeated Crashes
Actually, we were using 2.7.5 when the data was corrupted. We upgraded to 2.9.1 without clearing the corrupted data and got the error that was posted in the first message.

From: user@ignite.apache.org At: 02/19/21 14:18:44
To: Mitchell Rathbun (BLOOMBERG/ 731 LEX), user@ignite.apache.org
Subject: Re: Corrupted B+ Tree Causing Repeated Crashes

> Hello! What version of Apache Ignite are you using?
Re: Corrupted B+ Tree Causing Repeated Crashes
2.9.1

From: user@ignite.apache.org At: 02/19/21 14:18:44
To: Mitchell Rathbun (BLOOMBERG/ 731 LEX), user@ignite.apache.org
Subject: Re: Corrupted B+ Tree Causing Repeated Crashes

> Hello! What version of Apache Ignite are you using?
Re: Corrupted B+ Tree Causing Repeated Crashes
Hello! What version of Apache Ignite are you using?

19.02.2021, 22:07, "Mitchell Rathbun (BLOOMBERG/ 731 LEX)":
> We are encountering the following error repeatedly, which causes our node to crash:
>
> 2021-02-19 13:30:38,175 ERROR STDIO [pool-32-thread-5] {} Feb 19, 2021 1:30:38 PM org.apache.ignite.logger.java.JavaLogger error
> SEVERE: Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-128547534, val2=281474976721835]], msg=Runtime failure on lookup row: SearchRow [key=com.bloomberg.aim.wingman.cachemgr.Ts3DataCache$Ts3SecurityCacheKey [idHash=1436767547, hash=-931214342, accountCusip=com.bloomberg.aim.wingman.common.dto.submgr.AccountCusip [idHash=316813954, hash=343304888, accountId=0, cusip=com.bloomberg.aim.wingman.common.dto.Cusip [idHash=1325824124, hash=2123451959, cusip1=136125, cusip2=9001, cusip3=541401120, dept=2, subflag=2]]], hash=-931214342, cacheId=0
> class org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-128547534, val2=281474976721835]], msg=Runtime failure on lookup row: SearchRow [key=com.bloomberg.aim.wingman.cachemgr.Ts3DataCache$Ts3SecurityCacheKey [idHash=1436767547, hash=-931214342, accountCusip=com.bloomberg.aim.wingman.common.dto.submgr.AccountCusip [idHash=316813954, hash=343304888, accountId=0, cusip=com.bloomberg.aim.wingman.common.dto.Cusip [idHash=1325824124, hash=2123451959, cusip1=136125, cusip2=9001, cusip3=541401120, dept=2, subflag=2]]], hash=-931214342, cacheId=0]]
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:6106)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findOne(BPlusTree.java:1367)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findOne(BPlusTree.java:1344)
>     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.find(IgniteCacheOffheapManagerImpl.java:2755)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.find(GridCacheOffheapManager.java:2469)
>     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.read(IgniteCacheOffheapManagerImpl.java:637)
>     at org.apache.ignite.internal.processors.cache.local.atomic.GridLocalAtomicCache.getAllInternal(GridLocalAtomicCache.java:410)
>     at org.apache.ignite.internal.processors.cache.local.atomic.GridLocalAtomicCache.getAll(GridLocalAtomicCache.java:323)
>     at org.apache.ignite.internal.processors.cache.GridCacheAdapter.repairableGetAll(GridCacheAdapter.java:4907)
>     at org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAll(GridCacheAdapter.java:1617)
>     at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.getAll(IgniteCacheProxyImpl.java:1157)
>     at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.getAll(GatewayProtectedCacheProxy.java:724)
>     at com.bloomberg.aim.wingman.cachemgr.Ts3DataCache.fetchCalcrtDataByKeySync(Ts3DataCache.java:1535)
>     at com.bloomberg.aim.wingman.cachemgr.Ts3DataCache.lambda$fetchCalcrtDataBySecurityKeyAccountAsync$11(Ts3DataCache.java:895)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.IllegalStateException: Item not found: 1
>     at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.findIndirectItemIndex(AbstractDataPageIO.java:351)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.getDataOffset(AbstractDataPageIO.java:459)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.readPayload(AbstractDataPageIO.java:501)
>     at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compareKeys(CacheDataTree.java:447)
>     at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compare(CacheDataTree.java:386)
>     at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compare(CacheDataTree.java:63)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.compare(BPlusTree.java:5377)
>     at
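The pages(groupId, pageId) pair in the error above can help narrow down where the corruption sits. The sketch below is not from the thread and rests on two assumptions about Ignite internals that should be checked against the version in use: that a page ID packs the page index into the low 32 bits, the partition ID into the next 16 bits and a page-type flag into the next 8 bits, and that a group ID is the String.hashCode() of the cache (or cache group) name. The class PageIdDecoder and its methods are hypothetical helpers.

```java
/** Hypothetical helper for reading the (groupId, pageId) pair from the log. */
final class PageIdDecoder {
    /** Assumed layout: low 32 bits hold the page index within the partition. */
    public static int pageIndex(long pageId) {
        return (int) pageId;
    }

    /** Assumed layout: bits 32..47 hold the partition ID. */
    public static int partitionId(long pageId) {
        return (int) ((pageId >>> 32) & 0xFFFF);
    }

    /** Assumed layout: bits 48..55 hold the page-type flag. */
    public static int flag(long pageId) {
        return (int) ((pageId >>> 48) & 0xFF);
    }

    /** Assumption: the group ID equals the hash code of the cache or group name. */
    public static boolean matchesGroup(String cacheOrGroupName, int groupId) {
        return cacheOrGroupName.hashCode() == groupId;
    }

    public static void main(String[] args) {
        long pageId = 281474976721835L; // val2 from the log
        int groupId = -128547534;       // val1 from the log

        System.out.println("pageIndex=" + pageIndex(pageId)
            + ", partition=" + partitionId(pageId)
            + ", flag=" + flag(pageId));
        // prints "pageIndex=11179, partition=0, flag=1"

        // Pass candidate cache names on the command line to find the affected cache.
        for (String name : args)
            System.out.println(name + " -> " + matchesGroup(name, groupId));
    }
}
```

Under these assumptions the failing page would be page 11179 of partition 0. Running the class with the application's cache names as arguments would show which cache, if any, hashes to -128547534.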