Hi,

We have a cluster with 5 server nodes. We run the Ignite nodes in Docker containers. To trigger a fail-over, we run "docker stop" on one node. Throughout the stress test we keep sending operations to the cluster.
As soon as we stop one of the 5 nodes, all the other nodes stop processing requests submitted with compute.call. A thread dump shows that they all have threads blocked in the state below: they are blocked in a syncOp (a putIfAbsent) and do not recover. We waited at least 10 minutes and they were still blocked there, and they even print "Threads starvation" messages once enough requests have been made. Please note that in a new test with 6 nodes we no longer see this issue.

Here is the dump of one blocked thread:

"pub-#4%glueGrid%" #21 prio=5 os_prio=0 tid=0x00007ff30c4a0800 nid=0x23 waiting on condition [0x00007ff2a86e4000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000000f5529918> (a org.apache.ignite.internal.util.future.GridFutureAdapter$ChainFuture)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:155)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:115)
    at org.apache.ignite.internal.processors.cache.GridCacheAdapter$36.op(GridCacheAdapter.java:2642)
    at org.apache.ignite.internal.processors.cache.GridCacheAdapter$36.op(GridCacheAdapter.java:2640)
    at org.apache.ignite.internal.processors.cache.GridCacheAdapter.syncOp(GridCacheAdapter.java:4440)
    at org.apache.ignite.internal.processors.cache.GridCacheAdapter.putIfAbsent(GridCacheAdapter.java:2640)
    at org.apache.ignite.internal.processors.cache.IgniteCacheProxy.putIfAbsent(IgniteCacheProxy.java:1220)
    at comp.journeyprocessing.repository.Repository.persist(Repository.java:32)
    at comp.journeyprocessing.RequestProcessor.storeCorrelationIdToIndicateRequestIsHandled(RequestProcessor.java:61)
    at comp.journeyprocessing.RequestProcessor.process(RequestProcessor.java:51)
    at comp.journeyprocessing.RequestProcessor$$FastClassBySpringCGLIB$$3398aa.invoke(<generated>)
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:720)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
    at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:99)
    at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:281)
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:96)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:655)
    at comp.journeyprocessing.RequestProcessor$$EnhancerBySpringCGLIB$$27ed152b.process(<generated>)
    at comp.journeyprocessing.RequestManagerImpl.processRequest(RequestManagerImpl.java:67)
    at comp.journeyprocessing.RequestManagerImpl.handle(RequestManagerImpl.java:59)
    at comp.journeyprocessingclient.api.TheGlueRequestCallable.execute(TheGlueRequestCallable.java:15)
    at comp.journeyprocessingclient.api.TheGlueRequestCallable.execute(TheGlueRequestCallable.java:5)
    at comp.journeyprocessingclient.TheGlueCallable.call(TheGlueCallable.java:14)
    at org.apache.ignite.internal.processors.closure.GridClosureProcessor$C2V2.execute(GridClosureProcessor.java:2004)
    at org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:509)
    at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6484)
    at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:503)
    at org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:456)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1167)
    at org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1772)
    at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1058)
    at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:836)
    at org.apache.ignite.internal.managers.communication.GridIoManager.access$1700(GridIoManager.java:104)
    at org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:799)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Thank you,
Nicu

Nicolae Marasoiu
Agile Developer, CEGEKA
