[ https://issues.apache.org/jira/browse/IGNITE-7753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stanislav Lukyanov reassigned IGNITE-7753:
------------------------------------------

    Assignee:     (was: Stanislav Lukyanov)

> Processors are incorrectly initialized if a node joins during cluster activation
> --------------------------------------------------------------------------------
>
>                 Key: IGNITE-7753
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7753
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.3, 2.4, 2.5
>            Reporter: Stanislav Lukyanov
>            Priority: Major
>
> If a node joins during the cluster activation process (while the related
> exchange operation is in progress), then some of the GridProcessor instances
> of that node will be incorrectly initialized. While GridClusterStateProcessor
> will correctly report the active cluster state, other processors that are
> sensitive to the cluster state, e.g. GridServiceProcessor, will not be
> initialized.
>
> A reproducer is below.
> =======================
> Ignite server = IgnitionEx.start(
>     "examples/config/persistentstore/example-persistent-store.xml", "server");
>
> CyclicBarrier barrier = new CyclicBarrier(2);
>
> Thread activationThread = new Thread(() -> {
>     try {
>         barrier.await();
>         server.active(true);
>     }
>     catch (Exception e) {
>         e.printStackTrace(); // TODO implement.
>     }
> });
> activationThread.start();
>
> barrier.await();
>
> IgnitionEx.setClientMode(true);
> Ignite client = IgnitionEx.start(
>     "examples/config/persistentstore/example-persistent-store.xml", "client");
>
> activationThread.join();
>
> client.services().deployClusterSingleton("myClusterSingleton",
>     new SimpleMapServiceImpl<>());
> =======================
>
> Here a single server node is started, then a client node is started while
> the cluster is being activated, and then the client attempts to deploy a
> service. As a result, the thread calling the deploy method hangs forever
> with a stack trace like this:
> =======================
> "main@1" prio=5 tid=0x1 nid=NA waiting
>   java.lang.Thread.State: WAITING
>     at sun.misc.Unsafe.park(Unsafe.java:-1)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>     at org.apache.ignite.internal.util.IgniteUtils.awaitQuiet(IgniteUtils.java:7505)
>     at org.apache.ignite.internal.processors.service.GridServiceProcessor.serviceCache(GridServiceProcessor.java:290)
>     at org.apache.ignite.internal.processors.service.GridServiceProcessor.writeServiceToCache(GridServiceProcessor.java:728)
>     at org.apache.ignite.internal.processors.service.GridServiceProcessor.deployAll(GridServiceProcessor.java:634)
>     at org.apache.ignite.internal.processors.service.GridServiceProcessor.deployAll(GridServiceProcessor.java:600)
>     at org.apache.ignite.internal.processors.service.GridServiceProcessor.deployMultiple(GridServiceProcessor.java:488)
>     at org.apache.ignite.internal.processors.service.GridServiceProcessor.deployClusterSingleton(GridServiceProcessor.java:469)
>     at org.apache.ignite.internal.IgniteServicesImpl.deployClusterSingleton(IgniteServicesImpl.java:120)
> =======================
>
> The behavior depends on timing: the client has to join in the middle of the
> activation's exchange process. Putting Thread.sleep(4000) into
> GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest seems to work on
> a development laptop.

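
The stack trace above ends in an unbounded CountDownLatch.await() reached from GridServiceProcessor.serviceCache. The sketch below is a simplified, standalone model of that kind of hang, not Ignite code: StateSensitiveProcessor, onActivate and awaitReady are hypothetical names, and the assumption that the latch is released only by an activation callback is one plausible reading of the trace rather than something the report confirms. It shows how a node that misses the callback never becomes ready, and how a bounded wait turns the hang into a detectable failure.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a cluster-state-sensitive processor (not Ignite code). */
final class StateSensitiveProcessor {
    /** Released once the processor has been initialized for an active cluster. */
    private final CountDownLatch startLatch = new CountDownLatch(1);

    /** Assumed activation callback: the only place that releases the latch. */
    void onActivate() {
        startLatch.countDown();
    }

    /**
     * Bounded wait for initialization. An unbounded startLatch.await() here
     * would block forever if onActivate() is never called.
     */
    boolean awaitReady(long timeoutMs) {
        try {
            return startLatch.await(timeoutMs, TimeUnit.MILLISECONDS);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // Preserve the interrupt status.
            return false;
        }
    }
}

final class ActivationLatchSketch {
    /** Models a node that joined before activation and received the callback. */
    static boolean joinedBeforeActivation() {
        StateSensitiveProcessor proc = new StateSensitiveProcessor();
        proc.onActivate();
        return proc.awaitReady(100);
    }

    /** Models a node that joined mid-activation and never received the callback. */
    static boolean joinedDuringActivation() {
        StateSensitiveProcessor proc = new StateSensitiveProcessor();
        return proc.awaitReady(100);
    }

    public static void main(String[] args) {
        System.out.println("joined before activation, ready: " + joinedBeforeActivation());
        System.out.println("joined during activation, ready: " + joinedDuringActivation());
    }
}
```

In this model the second case times out after 100 ms instead of blocking; with an unbounded await, as in the trace, the calling thread would stay parked indefinitely.
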
-- This message was sent by Atlassian Jira (v8.3.4#803005)