Hello!

This looks like PDS corruption to me. Could you share the persistence files from the problematic node? I assume it fails on every restart?
Regards,
--
Ilya Kasnacheev

Thu, May 20, 2021 at 12:52, Lo, Marcus <marcus...@citi.com>:

> Hi,
>
> We have a 4 node ignite cluster setup. After running the cluster for 1
> day, we encounter the following error almost at the same time at node #2,
> #3, and #4:
>
> Critical system error detected. Will be handled accordingly to configured
> handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [
> SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=FailureContext [type=CRITICAL_ERROR, err=class
> o.a.i.IgniteCheckedException: Maximum number of retries 1000 reached for
> Put operation (the tree may be corrupted). Increase
> IGNITE_BPLUS_TREE_LOCK_RETRIES system property if you regularly see this
> message (current value is 1000).]]
> org.apache.ignite.IgniteCheckedException: Maximum number of retries 1000
> reached for Put operation (the tree may be corrupted). Increase
> IGNITE_BPLUS_TREE_LOCK_RETRIES system property if you regularly see this
> message (current value is 1000).
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Get.checkLockRetry(BPlusTree.java:3109) [ignite-core-2.10.0.jar:2.10.0]
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.checkLockRetry(BPlusTree.java:3906) [ignite-core-2.10.0.jar:2.10.0]
>
> Tried increasing IGNITE_BPLUS_TREE_LOCK_RETRIES to 100,000 and restarted
> the nodes, but it didn’t help and the node went into the same error
> straight away.
>
> Can you please shed some light on how to resolve the issue? Thanks.
>
> I also attach the logs for your reference:
>
> ignite-node-[1,2,3,4].log: the full log files for all nodes
>
> ignite-restart.log: the log for node 2 when it crashed
>
> Regards,
>
> Marcus
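
On the IGNITE_BPLUS_TREE_LOCK_RETRIES property named in the error message: below is a minimal sketch of how one might check which value a JVM actually sees, assuming the property is passed as a JVM system property (for example -DIGNITE_BPLUS_TREE_LOCK_RETRIES=100000 on the node's command line). The class LockRetriesCheck is a hypothetical helper, not part of Apache Ignite, and how Ignite itself reads and parses the value is not shown here. The default of 1000 is taken from the error message above.

```java
// Sketch only: LockRetriesCheck is a hypothetical helper, not an Apache Ignite class.
final class LockRetriesCheck {
    // Property name and default value are copied from the error message in this thread.
    static final String PROP = "IGNITE_BPLUS_TREE_LOCK_RETRIES";
    static final int DEFAULT_RETRIES = 1000;

    /**
     * Returns the value of the system property as seen by this JVM,
     * or the default when the property is unset or is not a plain integer.
     */
    static int effectiveRetries() {
        return Integer.getInteger(PROP, DEFAULT_RETRIES);
    }

    public static void main(String[] args) {
        // Run with: java -DIGNITE_BPLUS_TREE_LOCK_RETRIES=100000 LockRetriesCheck
        System.out.println(PROP + " = " + effectiveRetries());
    }
}
```

In this sketch a value written with a thousands separator ("100,000") is not a valid integer and falls back to the default, so the value is best passed as plain digits. If the tree is really corrupted, as suggested in the reply above, raising the limit would not be expected to help, which matches what was observed after the restart.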