Hi Stan, I spent half of a day but was not able to find such balanced configuration.
I followed your second advice and everything looks good now "I’d suggest to use an ExecutorService to call put()/putAsync(), getting a cancelable Future from the start. " I'm doing all ignite calls via dedicated thread pool and controlling max call time via future. Calls still hangs for 5-60 seconds after server nodes up/down but this happens in cache-pool and is not affecting whole system. I'm just handling this as cache misses on application side Thanks On Mon, Jun 11, 2018 at 7:27 PM, Stanislav Lukyanov <[email protected]> wrote: > When a node joins the cluster needs to perform partition map exchange > process. If this process takes too long, the cluster may become > unresponsive. > > Looks like this is what happened in your case. You can check how long an > exchange took by looking for “Started exchange” and “Finished exchange” in > the logs – I assume it’s around 20 seconds. > > > > Debugging hanged partition map exchange issues may be pretty tricky. > > My best guess so far is that the reduced timeouts you’ve set resulted in > failed network IO (e.g. instead of waiting for a message for 10s and > getting it on the first try, you retry every 3s until a fast enough > delivery happens – which might be the 10th or 20th attempt). > > Try changing timeouts back and see how long your exchanges take on a node > join. Perhaps some value will be low enough to detect node failures and > high enough to allow regular operations to pass. > > If that doesn’t help, please share the full logs from all nodes. > > > > Thanks, > > Stan > > > > *From: *Olexandr K <[email protected]> > *Sent: *11 июня 2018 г. 14:24 > *To: *[email protected] > *Subject: *Re: Cache operations hanging for a minute when one of server > nodesgoesdown > > > > Hi Stan, > > > > I tried to decrease network/failure timeouts and it worked fine when node > stopped. > > Unfortunately I got lot's of hanged calls when it started again. > > At that time all cache calls got stuck for 25-30 seconds. > > Is it expected? I thought rebalancing should occur in background and node > should join the cluster when it is 100% ready, no? > > See some log extracts below. > > > > <property name="networkSendRetryCount" value="1" /> <!-- default: > 3 --> > <property name="networkSendRetryDelay" value="500" /> <!-- > default: 1000 --> > <property name="networkTimeout" value="1500" /> <!-- default: 5000 > --> > <property name="failureDetectionTimeout" value="3000" /> <!-- > default: 10000 --> > <property name="clientFailureDetectionTimeout" value="10000" /> > <!-- default: 30000 --> > > > > > > -- CACHE CALL STARTED: cache.put() > > > > 83837380032 2018-06-11 13:51:54.777 [https-jsse-nio-8080-exec-4] INFO > com.xxx.lk.backend.cache.impl.RefreshTokenCache - Store: > d289a0a3-bca6-49a4-ae9d-9568517d656e > > > > -- MANY WARNINGS BEFORE CALL COMPLETED > > > 2018-06-11 13:51:55.058 [grid-timeout-worker-#4] DEBUG > org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor - > Timeout has occurred [obj=CancelableTask > [id=fe9e57ee361-ba3899f9-a10b-4498-98e6-ed3b65dfc3f8, > endTime=1528714315053, period=3000, cancel=false, task=org.apache.ignite. > internal.processors.query.GridQueryProcessor$2@2649758c], process=true] > 2018-06-11 13:51:55.355 [grid-timeout-worker-#4] DEBUG > org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor - > Timeout has occurred [obj=CancelableTask > [id=ee9e57ee361-ba3899f9-a10b-4498-98e6-ed3b65dfc3f8, > endTime=1528714315351, period=2000, cancel=false, task=org.apache.ignite. > internal.processors.query.h2.IgniteH2Indexing$13@110ac52b], process=true] > 2018-06-11 13:51:56.339 [nio-acceptor-#5] DEBUG org.apache.ignite.spi. > communication.tcp.TcpCommunicationSpi - Balancing data [min0=0, minIdx=0, > max0=-1, maxIdx=-1] > > 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN > org.apache.ignite.internal.diagnostic - Failed to wait for partition map > exchange [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], > node=ba482197-64cc-4d84-81f7-2b58f0c66a0c]. Dumping pending objects that > might be the cause: > 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN > org.apache.ignite.internal.diagnostic - Ready affinity version: > AffinityTopologyVersion [topVer=5, minorTopVer=0] > 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN > org.apache.ignite.internal.diagnostic - Last exchange future: > GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent > [evtNode=TcpDiscoveryNode [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, > addrs=[10.2.0.163, 127.0.0.1, 30.251.106.199], sockAddrs=[v-hp-lk-dcn01. > xxxgroup.tek.loc/10.2.0.163:47500, /127.0.0.1:47500, /30.251.106.199:47500], > discPort=47500, order=6, intOrder=5, lastExchangeTime=1528714313636, > loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=6, > nodeId8=ba482197, msg=Node joined: TcpDiscoveryNode > [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, > 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163: > 47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, > intOrder=5, lastExchangeTime=1528714313636, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, > tstamp=1528714314058], crd=TcpDiscoveryNode > [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, > addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, > /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, > intOrder=2, lastExchangeTime=1528714105823, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], > exchId=GridDhtPartitionExchangeId > [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], > discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode > [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, > 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163: > 47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, > intOrder=5, lastExchangeTime=1528714313636, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=6, > nodeId8=ba482197, msg=Node joined: TcpDiscoveryNode > [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, > 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163: > 47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, > intOrder=5, lastExchangeTime=1528714313636, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, > tstamp=1528714314058], nodeId=f5fbdfd8, evt=NODE_JOINED], added=true, > initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=true, > hash=538678914], init=true, lastVer=null, partReleaseFut=null, > exchActions=null, affChangeMsg=null, initTs=1528714314058, > centralizedAff=false, changeGlobalStateE=null, done=false, state=CLIENT, > evtLatch=0, remaining=[f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, > 34dc1ff0-45ec-4fee-995b-84dde8a451cb], super=GridFutureAdapter > [ignoreInterrupts=false, state=INIT, res=null, hash=1716673894]] > 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager > - First 10 pending exchange futures [total=0] > 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN > org.apache.ignite.internal.diagnostic - Last 10 exchange futures (total: > 3): > 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN > org.apache.ignite.internal.diagnostic - >>> GridDhtPartitionsExchangeFuture > [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], > evt=NODE_JOINED, evtNode=TcpDiscoveryNode > [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, > addrs=[10.2.0.163, 127.0.0.1, 30.251.106.199], sockAddrs=[v-hp-lk-dcn01. > xxxgroup.tek.loc/10.2.0.163:47500, /127.0.0.1:47500, /30.251.106.199:47500], > discPort=47500, order=6, intOrder=5, lastExchangeTime=1528714313636, > loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], done=false] > 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN > org.apache.ignite.internal.diagnostic - >>> GridDhtPartitionsExchangeFuture > [topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], evt=NODE_LEFT, > evtNode=TcpDiscoveryNode [id=c7e39052-0882-42f3-91aa-6a438c2ab745, > addrs=[10.2.0.163, 127.0.0.1, 30.251.106.199], sockAddrs=[/ > 10.2.0.163:47500, /127.0.0.1:47500, /30.251.106.199:47500], > discPort=47500, order=1, intOrder=1, lastExchangeTime=1528714105823, > loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], done=true] > 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN > org.apache.ignite.internal.diagnostic - >>> GridDhtPartitionsExchangeFuture > [topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], > evt=NODE_JOINED, evtNode=TcpDiscoveryNode > [id=ba482197-64cc-4d84-81f7-2b58f0c66a0c, > addrs=[10.2.0.250, 127.0.0.1, 30.251.106.11], sockAddrs=[ > v-hp-lk-jbn02.9801.1678.ecs.hp.com/30.251.106.11:0, /10.2.0.250:0, / > 127.0.0.1:0], discPort=0, order=4, intOrder=0, lastExchangeTime=1528714098847, > loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=true], done=true] > 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN > org.apache.ignite.internal.diagnostic - Pending transactions: > 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN > org.apache.ignite.internal.diagnostic - Pending explicit locks: > 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN > org.apache.ignite.internal.diagnostic - Pending cache futures: > 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN > org.apache.ignite.internal.diagnostic - Pending atomic cache futures: > 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN > org.apache.ignite.internal.diagnostic - Pending data streamer futures: > 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN > org.apache.ignite.internal.diagnostic - Pending transaction deadlock > detection futures: > 2018-06-11 13:51:57.074 [grid-nio-worker-tcp-comm-1-#7] DEBUG > org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Encrypted > data [status=OK, handshakeStaus=NOT_HANDSHAKING, > ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker > [super=AbstractNioClientWorker [idx=1, bytesRcvd=11210, bytesSent=0, > bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker > [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=null, > finished=false, hashCode=1835849342, interrupted=false, > runner=grid-nio-worker-tcp-comm-1-#7]]], > writeBuf=java.nio.DirectByteBuffer[pos=853 > lim=853 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 > cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, > rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, > node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, > addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, > /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, > intOrder=2, lastExchangeTime=1528714105823, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, > connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], > outRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4, > sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, > node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, > addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, > /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, > intOrder=2, lastExchangeTime=1528714105823, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, > connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], > super=GridNioSessionImpl [locAddr=/10.2.0.250:49581, > rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/10.2.0.252:47100, > createTime=1528714107495, closeTime=0, bytesSent=95956, bytesRcvd=11210, > bytesSent0=0, bytesRcvd0=0, sndSchedTime=1528714107495, > lastSndTime=1528714317074, lastRcvTime=1528714313367, readsPaused=false, > filterChain=FilterChain[filters=[GridNioCodecFilter > [parser=org.apache.ignite.internal.util.nio.GridDirectParser@51afeee5, > directMode=true], GridConnectionBytesVerifyFilter, SSL filter], > accepted=false]]] > 2018-06-11 13:51:57.074 [grid-nio-worker-tcp-comm-1-#7] DEBUG > org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Unwrapping > received data: GridSelectorNioSessionImpl [worker=DirectNioClientWorker > [super=AbstractNioClientWorker [idx=1, bytesRcvd=17050, bytesSent=0, > bytesRcvd0=5840, bytesSent0=0, select=true, super=GridWorker > [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=null, > finished=false, hashCode=1835849342, interrupted=false, > runner=grid-nio-worker-tcp-comm-1-#7]]], > writeBuf=java.nio.DirectByteBuffer[pos=0 > lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=5840 lim=5840 > cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, > rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, > node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, > addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, > /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, > intOrder=2, lastExchangeTime=1528714105823, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, > connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], > outRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4, > sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, > node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, > addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, > /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, > intOrder=2, lastExchangeTime=1528714105823, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, > connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], > super=GridNioSessionImpl [locAddr=/10.2.0.250:49581, > rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/10.2.0.252:47100, > createTime=1528714107495, closeTime=0, bytesSent=96889, bytesRcvd=17050, > bytesSent0=933, bytesRcvd0=5840, sndSchedTime=1528714107495, > lastSndTime=1528714317074, lastRcvTime=1528714317074, readsPaused=false, > filterChain=FilterChain[filters=[GridNioCodecFilter > [parser=org.apache.ignite.internal.util.nio.GridDirectParser@51afeee5, > directMode=true], GridConnectionBytesVerifyFilter, SSL filter], > accepted=false]] > 2018-06-11 13:51:57.074 [grid-nio-worker-tcp-comm-1-#7] DEBUG > org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Unwrapped > raw data [status=BUFFER_UNDERFLOW, handshakeStatus=NOT_HANDSHAKING, > ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker > [super=AbstractNioClientWorker [idx=1, bytesRcvd=17050, bytesSent=0, > bytesRcvd0=5840, bytesSent0=0, select=true, super=GridWorker > [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=null, > finished=false, hashCode=1835849342, interrupted=false, > runner=grid-nio-worker-tcp-comm-1-#7]]], > writeBuf=java.nio.DirectByteBuffer[pos=0 > lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=5840 lim=5840 > cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, > rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, > node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, > addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, > /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, > intOrder=2, lastExchangeTime=1528714105823, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, > connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], > outRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4, > sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, > node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, > addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, > /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, > intOrder=2, lastExchangeTime=1528714105823, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, > connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], > super=GridNioSessionImpl [locAddr=/10.2.0.250:49581, > rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/10.2.0.252:47100, > createTime=1528714107495, closeTime=0, bytesSent=96889, bytesRcvd=17050, > bytesSent0=933, bytesRcvd0=5840, sndSchedTime=1528714107495, > lastSndTime=1528714317074, lastRcvTime=1528714317074, readsPaused=false, > filterChain=FilterChain[filters=[GridNioCodecFilter > [parser=org.apache.ignite.internal.util.nio.GridDirectParser@51afeee5, > directMode=true], GridConnectionBytesVerifyFilter, SSL filter], > accepted=false]]] > 2018-06-11 13:51:57.074 [grid-nio-worker-tcp-comm-1-#7] DEBUG > org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Unwrapping > received data: GridSelectorNioSessionImpl [worker=DirectNioClientWorker > [super=AbstractNioClientWorker [idx=1, bytesRcvd=20719, bytesSent=0, > bytesRcvd0=9509, bytesSent0=0, select=true, super=GridWorker > [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=null, > finished=false, hashCode=1835849342, interrupted=false, > runner=grid-nio-worker-tcp-comm-1-#7]]], > writeBuf=java.nio.DirectByteBuffer[pos=0 > lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=3669 lim=3669 > cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, > rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, > node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, > addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, > /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, > intOrder=2, lastExchangeTime=1528714105823, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, > connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], > outRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4, > sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, > node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, > addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, > /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, > intOrder=2, lastExchangeTime=1528714105823, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, > connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], > super=GridNioSessionImpl [locAddr=/10.2.0.250:49581, > rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/10.2.0.252:47100, > createTime=1528714107495, closeTime=0, bytesSent=96889, bytesRcvd=20719, > bytesSent0=933, bytesRcvd0=9509, sndSchedTime=1528714107495, > lastSndTime=1528714317074, lastRcvTime=1528714317074, readsPaused=false, > filterChain=FilterChain[filters=[GridNioCodecFilter > [parser=org.apache.ignite.internal.util.nio.GridDirectParser@51afeee5, > directMode=true], GridConnectionBytesVerifyFilter, SSL filter], > accepted=false]] > 2018-06-11 13:51:57.074 [grid-nio-worker-tcp-comm-1-#7] DEBUG > org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Unwrapped > raw data [status=OK, handshakeStatus=NOT_HANDSHAKING, > ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker > [super=AbstractNioClientWorker [idx=1, bytesRcvd=20719, bytesSent=0, > bytesRcvd0=9509, bytesSent0=0, select=true, super=GridWorker > [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=null, > finished=false, hashCode=1835849342, interrupted=false, > runner=grid-nio-worker-tcp-comm-1-#7]]], > writeBuf=java.nio.DirectByteBuffer[pos=0 > lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=3669 lim=3669 > cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, > rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, > node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, > addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, > /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, > intOrder=2, lastExchangeTime=1528714105823, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, > connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], > outRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4, > sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, > node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, > addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, > /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, > intOrder=2, lastExchangeTime=1528714105823, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, > connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], > super=GridNioSessionImpl [locAddr=/10.2.0.250:49581, > rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/10.2.0.252:47100, > createTime=1528714107495, closeTime=0, bytesSent=96889, bytesRcvd=20719, > bytesSent0=933, bytesRcvd0=9509, sndSchedTime=1528714107495, > lastSndTime=1528714317074, lastRcvTime=1528714317074, readsPaused=false, > filterChain=FilterChain[filters=[GridNioCodecFilter > [parser=org.apache.ignite.internal.util.nio.GridDirectParser@51afeee5, > directMode=true], GridConnectionBytesVerifyFilter, SSL filter], > accepted=false]]] > 2018-06-11 13:51:57.074 [grid-nio-worker-tcp-comm-1-#7] DEBUG > org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Unwrapped > raw data [status=BUFFER_UNDERFLOW, handshakeStatus=NOT_HANDSHAKING, > ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker > [super=AbstractNioClientWorker [idx=1, bytesRcvd=20719, bytesSent=0, > bytesRcvd0=9509, bytesSent0=0, select=true, super=GridWorker > [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=null, > finished=false, hashCode=1835849342, interrupted=false, > runner=grid-nio-worker-tcp-comm-1-#7]]], > writeBuf=java.nio.DirectByteBuffer[pos=0 > lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=3669 lim=3669 > cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, > rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, > node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, > addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, > /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, > intOrder=2, lastExchangeTime=1528714105823, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, > connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], > outRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4, > sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, > node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, > addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, > /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, > intOrder=2, lastExchangeTime=1528714105823, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, > connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], > super=GridNioSessionImpl [locAddr=/10.2.0.250:49581, > rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/10.2.0.252:47100, > createTime=1528714107495, closeTime=0, bytesSent=96889, bytesRcvd=20719, > bytesSent0=933, bytesRcvd0=9509, sndSchedTime=1528714107495, > lastSndTime=1528714317074, lastRcvTime=1528714317074, readsPaused=false, > filterChain=FilterChain[filters=[GridNioCodecFilter > [parser=org.apache.ignite.internal.util.nio.GridDirectParser@51afeee5, > directMode=true], GridConnectionBytesVerifyFilter, SSL filter], > accepted=false]]] > 2018-06-11 13:51:57.089 [sys-#25] INFO org.apache.ignite.internal.diagnostic > - Exchange future waiting for coordinator response > [crd=34dc1ff0-45ec-4fee-995b-84dde8a451cb, topVer=AffinityTopologyVersion > [topVer=6, minorTopVer=0]] > Remote node information: > General node info [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, client=false, > discoTopVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], > time=13:51:56.992] > Partitions exchange info [readyVer=AffinityTopologyVersion [topVer=5, > minorTopVer=0]] > Last initialized exchange future: GridDhtPartitionsExchangeFuture > [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode > [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, > 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163: > 47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, > intOrder=5, lastExchangeTime=1528714313497, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=6, > nodeId8=34dc1ff0, msg=Node joined: TcpDiscoveryNode > [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, > 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163: > 47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, > intOrder=5, lastExchangeTime=1528714313497, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, > tstamp=1528714313966], crd=TcpDiscoveryNode > [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, > addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, > v-hp-lk-dcn02.9801.1678.ecs.hp.com/30.251.106.90:47500, /10.2.0.252:47500], > discPort=47500, order=2, intOrder=2, lastExchangeTime=1528714316726, > loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], > exchId=GridDhtPartitionExchangeId > [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], > discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode > [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, > 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163: > 47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, > intOrder=5, lastExchangeTime=1528714313497, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=6, > nodeId8=34dc1ff0, msg=Node joined: TcpDiscoveryNode > [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, > 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163: > 47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, > intOrder=5, lastExchangeTime=1528714313497, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, > tstamp=1528714313966], nodeId=f5fbdfd8, evt=NODE_JOINED], added=true, > initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=true, > hash=1994546166], init=true, lastVer=null, > partReleaseFut=PartitionReleaseFuture > [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], > futures=[ExplicitLockReleaseFuture > [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], futures=[]], > TxReleaseFuture [topVer=AffinityTopologyVersion [topVer=6, > minorTopVer=0], futures=[]], AtomicUpdateReleaseFuture > [topVer=AffinityTopologyVersion > [topVer=6, minorTopVer=0], futures=[]], DataStreamerReleaseFuture > [topVer=AffinityTopologyVersion > [topVer=6, minorTopVer=0], futures=[]]]], exchActions=null, > affChangeMsg=null, initTs=1528714313966, centralizedAff=false, > changeGlobalStateE=null, done=false, state=CRD, evtLatch=0, > remaining=[f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f], super=GridFutureAdapter > [ignoreInterrupts=false, state=INIT, res=null, hash=715729116]] > Communication SPI statistics [rmtNode=ba482197-64cc-4d84- > 81f7-2b58f0c66a0c] > Communication SPI recovery descriptors: > [key=ConnectionKey [nodeId=ba482197-64cc-4d84-81f7-2b58f0c66a0c, > idx=0, connCnt=0], msgsSent=4, msgsAckedByRmt=0, msgsRcvd=453, > lastAcked=448, reserveCnt=1, descIdHash=1249269712] > Communication SPI clients: > [node=ba482197-64cc-4d84-81f7-2b58f0c66a0c, > client=GridTcpNioCommunicationClient > [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker > [super=AbstractNioClientWorker [idx=2, bytesRcvd=98649, bytesSent=0, > bytesRcvd0=933, bytesSent0=0, select=true, super=GridWorker > [name=grid-nio-worker-tcp-comm-2, igniteInstanceName=null, > finished=false, hashCode=989360520, interrupted=false, > runner=grid-nio-worker-tcp-comm-2-#27]]], > writeBuf=java.nio.DirectByteBuffer[pos=0 > lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 > cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=0, resendCnt=0, > rcvCnt=453, sentCnt=4, reserved=true, lastAck=448, nodeLeft=false, > node=TcpDiscoveryNode [id=ba482197-64cc-4d84-81f7-2b58f0c66a0c, > addrs=[10.2.0.250, 127.0.0.1, 30.251.106.11], sockAddrs=[/10.2.0.250:0, / > 30.251.106.11:0, /127.0.0.1:0], discPort=0, order=4, intOrder=4, > lastExchangeTime=1528714105712, loc=false, ver=2.4.0#20180305-sha1:aa342270, > isClient=true], connected=true, connectCnt=0, queueLimit=4096, > reserveCnt=1, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor > [acked=0, resendCnt=0, rcvCnt=453, sentCnt=4, reserved=true, lastAck=448, > nodeLeft=false, node=TcpDiscoveryNode > [id=ba482197-64cc-4d84-81f7-2b58f0c66a0c, > addrs=[10.2.0.250, 127.0.0.1, 30.251.106.11], sockAddrs=[/10.2.0.250:0, / > 30.251.106.11:0, /127.0.0.1:0], discPort=0, order=4, intOrder=4, > lastExchangeTime=1528714105712, loc=false, ver=2.4.0#20180305-sha1:aa342270, > isClient=true], connected=true, connectCnt=0, queueLimit=4096, > reserveCnt=1, pairedConnections=false], super=GridNioSessionImpl [locAddr=/ > 10.2.0.252:47100, rmtAddr=/10.2.0.250:49581, createTime=1528714107367, > closeTime=0, bytesSent=12857, bytesRcvd=98649, bytesSent0=0, > bytesRcvd0=933, sndSchedTime=1528714107367, lastSndTime=1528714313279, > lastRcvTime=1528714316992, readsPaused=false, > filterChain=FilterChain[filters=[GridNioCodecFilter > [parser=org.apache.ignite.internal.util.nio.GridDirectParser@5f146306, > directMode=true], GridConnectionBytesVerifyFilter, SSL filter], > accepted=true]], super=GridAbstractCommunicationClient > [lastUsed=1528714107415, closed=false, connIdx=0]]] > NIO sessions statistics: > >> Selector info [idx=2, keysCnt=1, bytesRcvd=98649, bytesRcvd0=933, > bytesSent=0, bytesSent0=0] > Connection info [in=true, rmtAddr=/10.2.0.250:49581, locAddr=/ > 10.2.0.252:47100, msgsSent=4, msgsAckedByRmt=0, descIdHash=1249269712, > unackedMsgs=[GridNearSingleGetResponse, GridNearSingleGetResponse, > GridDhtPartitionsFullMessage, GridDhtPartitionsFullMessage], msgsRcvd=453, > lastAcked=448, descIdHash=1249269712, bytesRcvd=98649, bytesRcvd0=933, > bytesSent=12857, bytesSent0=0, opQueueSize=0] > Exchange future: GridDhtPartitionsExchangeFuture > [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode > [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, > 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163: > 47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, > intOrder=5, lastExchangeTime=1528714313497, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=6, > nodeId8=34dc1ff0, msg=Node joined: TcpDiscoveryNode > [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, > 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163: > 47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, > intOrder=5, lastExchangeTime=1528714313497, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, > tstamp=1528714313966], crd=TcpDiscoveryNode > [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, > addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, > v-hp-lk-dcn02.9801.1678.ecs.hp.com/30.251.106.90:47500, /10.2.0.252:47500], > discPort=47500, order=2, intOrder=2, lastExchangeTime=1528714316726, > loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], > exchId=GridDhtPartitionExchangeId > [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], > discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode > [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, > 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163: > 47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, > intOrder=5, lastExchangeTime=1528714313497, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=6, > nodeId8=34dc1ff0, msg=Node joined: TcpDiscoveryNode > [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, > 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163: > 47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, > intOrder=5, lastExchangeTime=1528714313497, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, > tstamp=1528714313966], nodeId=f5fbdfd8, evt=NODE_JOINED], added=true, > initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=true, > hash=1994546166], init=true, lastVer=null, > partReleaseFut=PartitionReleaseFuture > [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], > futures=[ExplicitLockReleaseFuture > [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], futures=[]], > TxReleaseFuture [topVer=AffinityTopologyVersion [topVer=6, > minorTopVer=0], futures=[]], AtomicUpdateReleaseFuture > [topVer=AffinityTopologyVersion > [topVer=6, minorTopVer=0], futures=[]], DataStreamerReleaseFuture > [topVer=AffinityTopologyVersion > [topVer=6, minorTopVer=0], futures=[]]]], exchActions=null, > affChangeMsg=null, initTs=1528714313966, centralizedAff=false, > changeGlobalStateE=null, done=false, state=CRD, evtLatch=0, > remaining=[f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f], super=GridFutureAdapter > [ignoreInterrupts=false, state=INIT, res=null, hash=715729116]] > Local communication statistics: > Communication SPI statistics [rmtNode=34dc1ff0-45ec-4fee- > 995b-84dde8a451cb] > Communication SPI recovery descriptors: > [key=ConnectionKey [nodeId=34dc1ff0-45ec-4fee-995b-84dde8a451cb, > idx=0, connCnt=-1], msgsSent=453, msgsAckedByRmt=448, msgsRcvd=5, > lastAcked=0, reserveCnt=1, descIdHash=105307846] > Communication SPI clients: > [node=34dc1ff0-45ec-4fee-995b-84dde8a451cb, > client=GridTcpNioCommunicationClient > [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker > [super=AbstractNioClientWorker [idx=1, bytesRcvd=20719, bytesSent=0, > bytesRcvd0=9509, bytesSent0=0, select=true, super=GridWorker > [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=null, > finished=false, hashCode=1835849342, interrupted=false, > runner=grid-nio-worker-tcp-comm-1-#7]]], > writeBuf=java.nio.DirectByteBuffer[pos=0 > lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 > cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, > rcvCnt=5, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, > node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, > addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, > /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, > intOrder=2, lastExchangeTime=1528714105823, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, > connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], > outRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=5, > sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, > node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, > addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, > /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, > intOrder=2, lastExchangeTime=1528714105823, loc=false, > ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, > connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], > super=GridNioSessionImpl [locAddr=/10.2.0.250:49581, > rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/10.2.0.252:47100, > createTime=1528714107495, closeTime=0, bytesSent=96889, bytesRcvd=20719, > bytesSent0=933, bytesRcvd0=9509, sndSchedTime=1528714107495, > lastSndTime=1528714317074, lastRcvTime=1528714317074, readsPaused=false, > filterChain=FilterChain[filters=[GridNioCodecFilter > [parser=org.apache.ignite.internal.util.nio.GridDirectParser@51afeee5, > directMode=true], GridConnectionBytesVerifyFilter, SSL filter], > accepted=false]], super=GridAbstractCommunicationClient > [lastUsed=1528714107495, closed=false, connIdx=0]]] > NIO sessions statistics: > >> Selector info [idx=1, keysCnt=1, bytesRcvd=20719, bytesRcvd0=9509, > bytesSent=0, bytesSent0=0] > Connection info [in=false, rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/ > 10.2.0.252:47100, locAddr=/10.2.0.250:49581, msgsSent=453, > msgsAckedByRmt=448, descIdHash=105307846, unackedMsgs=[ > GridNearAtomicFullUpdateRequest, GridNearAtomicFullUpdateRequest, > GridNearAtomicFullUpdateRequest, GridDhtPartitionsSingleMessage, > IgniteDiagnosticMessage], msgsRcvd=5, lastAcked=0, descIdHash=105307846, > bytesRcvd=20719, bytesRcvd0=9509, bytesSent=96889, bytesSent0=933, > opQueueSize=0] > > 2018-06-11 13:51:57.370 [grid-timeout-worker-#4] DEBUG > org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor - > Timeout has occurred [obj=CancelableTask > [id=ee9e57ee361-ba3899f9-a10b-4498-98e6-ed3b65dfc3f8, > endTime=1528714317355, period=2000, cancel=false, task=org.apache.ignite. > internal.processors.query.h2.IgniteH2Indexing$13@110ac52b], process=true] > > -- CALL COMPLETED > > > 83837380032 2018-06-11 13:52:21.903 [https-jsse-nio-8080-exec-4] DEBUG > com.xxx.lk.backend.cache.impl.RefreshTokenCache - Cache operation > completed > > > > On Mon, Jun 11, 2018 at 12:27 PM, Stanislav Lukyanov < > [email protected]> wrote: > > Hi, > > > > Try to configure the existing timeouts: > > - failureDetectionTimeout > > - clientFailureDetectionTimeout > > - networkTimeout > > > > Not sure if the Future you mention really needs timeout. If there was a > timeout, it would be either failureDetectionTimeout or networkTimeout > anyway. > > Currently the calling thread has to wait for the failed node to be kicked > out of the cluster. The time that the cluster needs to kick out a node > depends (primarily) on the failureDetectionTimeout, although isn’t strictly > equal to it. > > > > The problem I can see here is that the putAsync is not actually async > since it performs substantial work in the calling thread. I’ll give it > another look and file a bug. > > For now, as a workaround (aside from configuring the timeouts) I’d suggest > to use an ExecutorService to call put()/putAsync(), getting a cancelable > Future from the start. > > > > Thanks, > > Stan > > > > *From: *Olexandr K <[email protected]> > *Sent: *11 июня 2018 г. 2:44 > *To: *[email protected] > *Subject: *Cache operations hanging for a minute when one of server nodes > goesdown > > > > Hi Igniters, > > > > I'm testing our system for availability. > > It uses Ignite as key/value persistent cache. > > > > Here is my test: > > 1) start 2 server and 2 client nodes > > 2) run heavy load on client nodes (some application logic which cause > cache calls) > > 3) stop 1 server node > > > > Here I expect all in-progress cache operations targeted to server 1 node > to fail fast. > > What I don't want is to hang all my processing threads for significant > time. > > Unfortunately it works exactly that way: I'm constantly getting my threads > blocked for 20-80 seconds. > > Finally putAsync() completes successfully but I'd prefer cache operation > to fail fast. I don't want to hang all processing threads for a minute > because of cache. > > > > It works the same for put() and putAsync() calls. > > > > As I see in the code, it can be fixed by calling future.get(timeout) > instead of future.get() in TcpCommunicationSpi. > > Timeout should be configurable. > > > > TcpCommunicationSpi (line: 2799) > > private GridCommunicationClient reserveClient(ClusterNode node, int > connIdx) { > > ... > > client = fut.get(); > > > > Does it make sense from your point of view? > > > > Here is my thread dump: > > > > threads=[ > { > threadName=https-jsse-nio-8080-exec-20, > threadId=102, > blockedTime=-1, > blockedCount=0, > waitedTime=-1, > waitedCount=5, > lockName=null, > lockOwnerId=-1, > lockOwnerName=null, > inNative=false, > suspended=false, > threadState=WAITING, > stackTrace=[ > { > methodName=park, > fileName=Unsafe.java, > lineNumber=-2, > className=sun.misc.Unsafe, > nativeMethod=true > }, > { > methodName=park, > fileName=LockSupport.java, > lineNumber=304, > className=java.util.concurrent.locks.LockSupport, > nativeMethod=false > }, > { > methodName=get0, > fileName=GridFutureAdapter.java, > lineNumber=177, > className=org.apache.ignite.internal.util.future. > GridFutureAdapter, > nativeMethod=false > }, > { > methodName=get, > fileName=GridFutureAdapter.java, > lineNumber=140, > className=org.apache.ignite.internal.util.future. > GridFutureAdapter, > nativeMethod=false > }, > { > methodName=reserveClient, > fileName=TcpCommunicationSpi.java, > lineNumber=2799, > className=org.apache.ignite.spi.communication.tcp. > TcpCommunicationSpi, > nativeMethod=false > }, > > .... > > { > methodName=putAsync, > fileName=IgniteCacheProxyImpl.java, > lineNumber=1035, > className=org.apache.ignite.internal.processors.cache. > IgniteCacheProxyImpl, > nativeMethod=false > }, > { > methodName=putAsync, > fileName=GatewayProtectedCacheProxy.java, > lineNumber=900, > className=org.apache.ignite.internal.processors.cache. > GatewayProtectedCacheProxy, > nativeMethod=false > }, > > > > Sample cache config: > > > > <bean class="org.apache.ignite.configuration.CacheConfiguration"> > <property name="name" value="RefreshToken"/> > <property name="dataRegionName" > value="auth_durable_region"/> > <property name="atomicityMode" value="ATOMIC"/> > <property name="writeSynchronizationMode" > value="FULL_ASYNC"/> > <property name="cacheMode" value="PARTITIONED"/> > <property name="backups" value="1"/> > <property name="eagerTtl" value="true"/> > </bean> > > > > Ignite version: 2.4.0 > > OS: Windows Server 2012 R2 > > > > BR, Oleksandr > > > > > > >
