Thanks Andrei. Looking at my exception (see below), it seems to be related
to https://issues.apache.org/jira/browse/IGNITE-11620, in that it occurred while
expiration was going on.

1. As a workaround, would it be valid to increase my TTL to reduce the
likelihood of this occurring? (A minimal sketch of what I mean is below.)
2. My worry about using "NoOpFailureHandler" is that the error would still have
occurred, and it might leave the node in a bad state, which might be just as bad
as, or worse than, killing the node outright. (Also sketched further below.)
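
To make 1. concrete, this is roughly the kind of expiry-policy change I have in mind. It is only a minimal sketch: the cache name matches the group in my logs, but the durations (and the current 30-minute setting) are placeholders for illustration, not my actual values.

import java.util.concurrent.TimeUnit;
import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;
import org.apache.ignite.configuration.CacheConfiguration;

public class TtlWorkaroundSketch {
    public static void main(String[] args) {
        // Cache name matches the group in the log; durations are illustrative only.
        CacheConfiguration<String, byte[]> cacheCfg = new CacheConfiguration<>("mainCache");

        // Hypothetical current setting: entries expire 30 minutes after creation,
        // so the ttl-cleanup-worker is expiring entries fairly often.
        // cacheCfg.setExpiryPolicyFactory(
        //     CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.MINUTES, 30)));

        // Proposed workaround (question 1): a much longer TTL, so far fewer expirations
        // can race with a partition that is concurrently evicted during rebalance.
        cacheCfg.setExpiryPolicyFactory(
            CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.HOURS, 6)));
    }
}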

If you can confirm that 1. is a valid line of defense (albeit not air-tight), that
would be great.
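
For completeness on 2., this is the NoOpFailureHandler change I understand Andrei to be suggesting, next to what the node effectively runs today per the log below. Again, only a sketch of the two options, not something I have decided to deploy.

import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.NoOpFailureHandler;
import org.apache.ignite.failure.StopNodeOrHaltFailureHandler;

public class FailureHandlerSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // What the node effectively uses today, per the log below: halt the JVM
        // when a critical worker such as ttl-cleanup-worker terminates.
        // cfg.setFailureHandler(new StopNodeOrHaltFailureHandler(false, 0));

        // Andrei's suggested workaround: only log the critical failure and keep the
        // node running. My worry (question 2) is that the node may then stay up in
        // a bad state.
        cfg.setFailureHandler(new NoOpFailureHandler());
    }
}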

Thanks,
Abhishek

P.S. My exception is below. Note that it occurs in 'expire()' - a similar stack 
trace to the one in IGNITE-11620.


 [ERROR] ttl-cleanup-worker-#159 - Critical system error detected. Will be 
handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler 
[tryStop=false, timeout=0, super=AbstractFailureHandler 
[ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, 
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
[type=SYSTEM_WORKER_TERMINATION, err=class 
o.a.i.i.processors.cache.distributed.dht.topology.GridDhtInvalidPartitionException
 [part=1013, msg=Adding entry to partition that is concurrently evicted 
[grp=mainCache, part=1013, shouldBeMoving=, belongs=false, 
topVer=AffinityTopologyVersion [topVer=1978, minorTopVer=1], 
curTopVer=AffinityTopologyVersion [topVer=1978, minorTopVer=1]]]]] 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtInvalidPartitionException: Adding entry to partition that is concurrently evicted [grp=mainCache, part=1013, shouldBeMoving=, belongs=false, topVer=AffinityTopologyVersion [topVer=1978, minorTopVer=1], curTopVer=AffinityTopologyVersion [topVer=1978, minorTopVer=1]]
    at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.localPartition0(GridDhtPartitionTopologyImpl.java:950) ~[ignite-core-2.7.5-0-2.jar:2.7.5]
    at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.localPartition(GridDhtPartitionTopologyImpl.java:825) ~[ignite-core-2.7.5-0-2.jar:2.7.5]
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridCachePartitionedConcurrentMap.localPartition(GridCachePartitionedConcurrentMap.java:70) ~[ignite-core-2.7.5-0-2.jar:2.7.5]
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridCachePartitionedConcurrentMap.putEntryIfObsoleteOrAbsent(GridCachePartitionedConcurrentMap.java:89) ~[ignite-core-2.7.5-0-2.jar:2.7.5]
    at org.apache.ignite.internal.processors.cache.GridCacheAdapter.entryEx(GridCacheAdapter.java:1008) ~[ignite-core-2.7.5-0-2.jar:2.7.5]
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.entryEx(GridDhtCacheAdapter.java:544) ~[ignite-core-2.7.5-0-2.jar:2.7.5]
    at org.apache.ignite.internal.processors.cache.GridCacheAdapter.entryEx(GridCacheAdapter.java:999) ~[ignite-core-2.7.5-0-2.jar:2.7.5]
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.expireInternal(IgniteCacheOffheapManagerImpl.java:1403) ~[ignite-core-2.7.5-0-2.jar:2.7.5]
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.expire(IgniteCacheOffheapManagerImpl.java:1347) ~[ignite-core-2.7.5-0-2.jar:2.7.5]
    at org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:207) ~[ignite-core-2.7.5-0-2.jar:2.7.5]
    at org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:139) [ignite-core-2.7.5-0-2.jar:2.7.5]
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.7.5-0-2.jar:2.7.5]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]

From: [email protected] At: 01/31/20 05:11:57 To: [email protected]
Subject: Re: "Adding entry to partition that is concurrently evicted" error

                  
Hi,

The current problem should be solved in Ignite 2.8. I am not sure why this fix
isn't part of Ignite 2.7.6:

https://issues.apache.org/jira/browse/IGNITE-11127

Your cluster nodes were stopped by the configured failure handler:

https://apacheignite.readme.io/docs/critical-failures-handling#section-failure-handling

I am not sure about possible workarounds here (probably you can set the
NoOpFailureHandler). You can also try creating a thread on the developer user list:

http://apache-ignite-developers.2346864.n4.nabble.com/Apache-Ignite-2-7-release-td34076i40.html

BR,
Andrei
On 1/29/2020 1:58 AM, Abhishek Gupta (BLOOMBERG/ 919 3RD A) wrote:
Hello!

I've got a 6 node Ignite 2.7.5 grid. I had this strange issue where multiple
nodes hit the following exception:

[ERROR] [sys-stripe-53-#54] GridCacheIoManager - Failed to process message
[senderId=f4a736b6-cfff-4548-a8b4-358d54d19ac6, messageType=class
o.a.i.i.processors.cache.distributed.near.GridNearGetRequest]
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtInvalidPartitionException: Adding entry to partition that is concurrently evicted [grp=mainCache, part=733, shouldBeMoving=, belongs=false, topVer=AffinityTopologyVersion [topVer=1978, minorTopVer=1], curTopVer=AffinityTopologyVersion [topVer=1978, minorTopVer=1]]

and then died after:

2020-01-27 13:30:19.849 [ERROR] [ttl-cleanup-worker-#159] - JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.i.processors.cache.distributed.dht.topology.GridDhtInvalidPartitionException [part=1013, msg=Adding entry to partition that is concurrently evicted [grp=mainCache, part=1013, shouldBeMoving=, belongs=false, topVer=AffinityTopologyVersion [topVer=1978, minorTopVer=1], curTopVer=AffinityTopologyVersion [topVer=1978, minorTopVer=1]]]]]

The sequence of events was simply the following: one of the nodes (let's call it
node 1) was down for 2.5 hours and was then restarted. After a configured delay of
20 minutes, it started to rebalance from the other 5 nodes. There were no other
nodes that joined or left in this period. Forty minutes into the rebalance, the
above errors started showing up on the other nodes and they just bounced, and
therefore there was data loss.

I found a few links related to this, but nothing that explained the root cause or
what my workaround could be:

* http://apache-ignite-users.70518.x6.nabble.com/Adding-entry-to-partition-that-is-concurrently-evicted-td24782.html#a24786
* https://issues.apache.org/jira/browse/IGNITE-9803
* https://issues.apache.org/jira/browse/IGNITE-11620

Thanks,
Abhishek
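
(For reference, the 20-minute delay mentioned in the original message above comes from a cache-level rebalance delay setting along these lines - a simplified sketch, not the exact production config.)

import java.util.concurrent.TimeUnit;
import org.apache.ignite.configuration.CacheConfiguration;

public class RebalanceDelaySketch {
    public static void main(String[] args) {
        // Cache name matches the group in the logs; everything else is simplified.
        CacheConfiguration<String, byte[]> cacheCfg = new CacheConfiguration<>("mainCache");

        // Wait 20 minutes after a topology change before starting rebalance, so a
        // short restart does not immediately trigger a full rebalance.
        cacheCfg.setRebalanceDelay(TimeUnit.MINUTES.toMillis(20));
    }
}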
    
