Ilya Lantukh created IGNITE-9236:
------------------------------------

             Summary: Handshake timeout never completes in some tests (GridCacheReplicatedFailoverSelfTest in particular)
                 Key: IGNITE-9236
                 URL: https://issues.apache.org/jira/browse/IGNITE-9236
             Project: Ignite
          Issue Type: Bug
            Reporter: Ilya Lantukh
            Assignee: Ilya Lantukh

In GridCacheReplicatedFailoverSelfTest, one thread tries to establish a TCP connection and hangs on the handshake forever while holding the lock on RebalanceFuture:
{code}
[11:51:55] : [Step 3/4] Locked synchronizers:
[11:51:55] : [Step 3/4]     java.util.concurrent.ThreadPoolExecutor$Worker@5b17b883
[11:51:55] : [Step 3/4] Thread [name="sys-#68921%new-node-topology-change-thread-1%", id=77410, state=RUNNABLE, blockCnt=3, waitCnt=0]
[11:51:55] : [Step 3/4]     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
[11:51:55] : [Step 3/4]     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
[11:51:55] : [Step 3/4]     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
[11:51:55] : [Step 3/4]     at sun.nio.ch.IOUtil.read(IOUtil.java:197)
[11:51:55] : [Step 3/4]     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
[11:51:55] : [Step 3/4]     - locked java.lang.Object@23aaa756
[11:51:55] : [Step 3/4]     at o.a.i.spi.communication.tcp.TcpCommunicationSpi.safeTcpHandshake(TcpCommunicationSpi.java:3647)
[11:51:55] : [Step 3/4]     at o.a.i.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3293)
[11:51:55] : [Step 3/4]     at o.a.i.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2967)
[11:51:55] : [Step 3/4]     at o.a.i.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2850)
[11:51:55] : [Step 3/4]     at o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2693)
[11:51:55] : [Step 3/4]     at o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2652)
[11:51:55] : [Step 3/4]     at o.a.i.i.managers.communication.GridIoManager.send(GridIoManager.java:1643)
[11:51:55] : [Step 3/4]     at o.a.i.i.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1750)
[11:51:55] : [Step 3/4]     at o.a.i.i.processors.cache.GridCacheIoManager.sendOrderedMessage(GridCacheIoManager.java:1231)
[11:51:55] : [Step 3/4]     at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture.cleanupRemoteContexts(GridDhtPartitionDemander.java:1111)
[11:51:55] : [Step 3/4]     at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture.cancel(GridDhtPartitionDemander.java:1041)
[11:51:55] : [Step 3/4]     - locked o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture@7e28f150
[11:51:55] : [Step 3/4]     at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.lambda$null$2(GridDhtPartitionDemander.java:534)
[11:51:55] : [Step 3/4]     at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$$Lambda$41/603501511.run(Unknown Source)
[11:51:55] : [Step 3/4]     at o.a.i.i.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6800)
[11:51:55] : [Step 3/4]     at o.a.i.i.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
[11:51:55] : [Step 3/4]     at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:110)
[11:51:55] : [Step 3/4]     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[11:51:55] : [Step 3/4]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[11:51:55] : [Step 3/4]     at java.lang.Thread.run(Thread.java:748)
{code}

Because of that, the exchange worker hangs forever while trying to acquire the same lock:
{code}
[11:51:55] : [Step 3/4] Thread [name="exchange-worker-#68894%new-node-topology-change-thread-1%", id=77379, state=BLOCKED, blockCnt=11, waitCnt=7]
[11:51:55] : [Step 3/4]     Lock [object=o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture@7e28f150, ownerName=sys-#68921%new-node-topology-change-thread-1%, ownerId=77410]
[11:51:55] : [Step 3/4]     at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture.cancel(GridDhtPartitionDemander.java:1033)
[11:51:55] : [Step 3/4]     at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.addAssignments(GridDhtPartitionDemander.java:302)
[11:51:55] : [Step 3/4]     at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPreloader.addAssignments(GridDhtPreloader.java:441)
[11:51:55] : [Step 3/4]     at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2659)
[11:51:55] : [Step 3/4]     at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2377)
[11:51:55] : [Step 3/4]     at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:110)
[11:51:55] : [Step 3/4]     at java.lang.Thread.run(Thread.java:748)
{code}

The timeout is explicitly set to Integer.MAX_VALUE in the GridCacheAbstractSelfTest.getConfiguration(...) method, so in practice the handshake never times out.
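
To illustrate the failure mode in the dumps above, here is a minimal, self-contained sketch of a blocking read performed while holding a monitor and bounded by a watchdog that closes the channel after a timeout. It is not Ignite code and makes no claim about how TcpCommunicationSpi enforces its handshake timeout. The class HandshakeTimeoutSketch, the futureLock object (standing in for the RebalanceFuture monitor) and the Pipe (standing in for a peer that never answers) are all hypothetical. With a bounded timeout the read is aborted and the monitor is released; with a value such as Integer.MAX_VALUE the watchdog would not fire in any practical time frame, and every other thread waiting on the monitor would stay BLOCKED.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.Pipe;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class HandshakeTimeoutSketch {
    /** Hypothetical lock standing in for the RebalanceFuture monitor. */
    private static final Object futureLock = new Object();

    /**
     * Performs a blocking read that never receives data while holding futureLock.
     * A watchdog closes the channel after timeoutMs, which aborts the read.
     *
     * @return true if the watchdog aborted the read, false if data arrived.
     */
    static boolean handshakeUnderLock(long timeoutMs) throws IOException {
        ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();
        Pipe pipe = Pipe.open(); // Nothing is ever written: a silent "peer".

        try {
            synchronized (futureLock) {
                // Schedule the timeout action: closing the channel from another
                // thread makes the blocked read throw AsynchronousCloseException.
                ScheduledFuture<?> killer = watchdog.schedule(() -> {
                    try {
                        pipe.source().close();
                    }
                    catch (IOException ignored) {
                        // Nothing useful to do in this sketch.
                    }
                }, timeoutMs, TimeUnit.MILLISECONDS);

                try {
                    pipe.source().read(ByteBuffer.allocate(8)); // Blocks until closed.
                    return false;
                }
                catch (ClosedChannelException e) {
                    // Covers AsynchronousCloseException (closed while blocked).
                    return true;
                }
                finally {
                    killer.cancel(false);
                }
            }
        }
        finally {
            watchdog.shutdownNow();
            pipe.sink().close();
            pipe.source().close();
        }
    }

    public static void main(String[] args) throws Exception {
        // A short timeout, so the lock is released quickly.
        System.out.println("read aborted by timeout: " + handshakeUnderLock(200));
    }
}
```

Calling handshakeUnderLock(200) returns after roughly the configured delay. Passing Integer.MAX_VALUE milliseconds (about 24.8 days) would reproduce the shape of the hang reported here, so it is not done in the example.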