Ivan Daschinskiy created IGNITE-13690:
-----------------------------------------

             Summary: Failed to init coordinator caches on concurrent start of 
nodes with different cache configurations.
                 Key: IGNITE-13690
                 URL: https://issues.apache.org/jira/browse/IGNITE-13690
             Project: Ignite
          Issue Type: Bug
    Affects Versions: 2.9
            Reporter: Ivan Daschinskiy


Scenario:
1. Start nodes with different cache configurations simultaneously
(for simplicity, let the client nodes have caches configured and the servers none).
2. While processing the first exchange, the coordinator fails with:

{code:java}
[2020-11-10 13:23:57,232][ERROR][start-node-1][DifferentCacheConfigurationConcurrentStart0] Got exception while starting (will rollback startup routine).
java.lang.AssertionError: Invalid exchange futures state [cur=6, total=7]
        at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager$17.applyx(CacheAffinitySharedManager.java:1964)
        at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager$17.applyx(CacheAffinitySharedManager.java:1935)
        at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.lambda$forAllRegisteredCacheGroups$e0a6939d$1(CacheAffinitySharedManager.java:1265)
        at org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11157)
        at org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11059)
        at org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11039)
        at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.forAllRegisteredCacheGroups(CacheAffinitySharedManager.java:1264)
        at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.initCoordinatorCaches(CacheAffinitySharedManager.java:1935)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCoordinatorCaches(GridDhtPartitionsExchangeFuture.java:716)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:850)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3175)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3021)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at java.lang.Thread.run(Thread.java:748)
{code}


The root cause is a race in creating {{LocalJoinCachesContext}}: as a result, the 
local join caches differ from the caches registered from other nodes. 

Reproducers for the ZooKeeper and ring discoveries are attached. 
NB! The failure is not always reproducible. To increase its probability, add a 
sleep at the start of {{GridDhtPartitionsExchangeFuture#init}}:

{code:java}
public void init(boolean newCrd) throws IgniteInterruptedCheckedException {
    // Temporary delay, for reproduction only: widens the race window.
    if (newCrd)
        U.sleep(500);

    // ... rest of init() unchanged
{code}
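
Step 1 of the scenario needs all nodes to begin starting at the same moment. The following is only a minimal sketch of such a start harness in plain Java, not the attached reproducer. {{ConcurrentStart}} and {{startAll}} are hypothetical names. Each task stands in for a real node start such as {{Ignition.start(cfg)}}, where client configurations would list caches and server configurations would not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper (not part of Ignite): releases all start tasks together. */
class ConcurrentStart {
    /**
     * Runs every task on its own thread and releases them at the same moment.
     *
     * @return Throwables raised by the tasks (empty if every start succeeded).
     */
    static List<Throwable> startAll(List<Runnable> startTasks) {
        List<Throwable> failures = new ArrayList<>();

        if (startTasks.isEmpty())
            return failures;

        ExecutorService pool = Executors.newFixedThreadPool(startTasks.size());
        CountDownLatch ready = new CountDownLatch(startTasks.size());
        CountDownLatch go = new CountDownLatch(1);

        try {
            List<Future<?>> futs = new ArrayList<>();

            for (Runnable task : startTasks) {
                futs.add(pool.submit(() -> {
                    ready.countDown(); // This thread is parked at the start line.
                    go.await();        // Wait until every thread is ready.
                    task.run();        // Placeholder for a real node start.

                    return null;
                }));
            }

            ready.await();  // All threads are waiting on the gate.
            go.countDown(); // Open the gate: tasks begin concurrently.

            for (Future<?> fut : futs) {
                try {
                    fut.get();
                }
                catch (ExecutionException e) {
                    // E.g. an AssertionError thrown during startup.
                    failures.add(e.getCause());
                }
            }
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new IllegalStateException("Interrupted while starting tasks", e);
        }
        finally {
            pool.shutdownNow();
        }

        return failures;
    }
}
```

The two latches only make the start calls begin together. Whether the race then fires still depends on timing, so a run that returns no failures does not show that the bug is absent.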



