Hi,

Try to configure the existing timeouts:
- failureDetectionTimeout
- clientFailureDetectionTimeout
- networkTimeout
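A minimal sketch of how these three timeouts might be set in Spring XML, in the same style as the cache config quoted later in this thread. It assumes all three are plain properties of IgniteConfiguration; the values (in milliseconds) are placeholders for illustration, not recommendations.

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Illustrative values in milliseconds; tune them for your own cluster. -->
    <property name="failureDetectionTimeout" value="5000"/>
    <property name="clientFailureDetectionTimeout" value="10000"/>
    <property name="networkTimeout" value="3000"/>
</bean>
```

Lower values should let the cluster drop a failed node sooner, but they also make false positives more likely during long GC pauses or short network glitches.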
I'm not sure the Future you mention really needs a timeout. If there were one, it would be either failureDetectionTimeout or networkTimeout anyway.

Currently the calling thread has to wait for the failed node to be kicked out of the cluster. The time the cluster needs to kick out a node depends primarily on failureDetectionTimeout, although it isn't strictly equal to it.

The problem I can see here is that putAsync() is not actually async, since it performs substantial work in the calling thread. I'll give it another look and file a bug.

For now, as a workaround (aside from configuring the timeouts), I'd suggest using an ExecutorService to call put()/putAsync(), so that you get a cancelable Future from the start.

Thanks,
Stan

From: Olexandr K
Sent: June 11, 2018 2:44
To: [email protected]
Subject: Cache operations hanging for a minute when one of server nodes goes down

Hi Igniters,

I'm testing our system for availability. It uses Ignite as a persistent key/value cache.

Here is my test:
1) start 2 server and 2 client nodes
2) run heavy load on client nodes (some application logic which causes cache calls)
3) stop 1 server node

Here I expect all in-progress cache operations targeted at server node 1 to fail fast. What I don't want is for all my processing threads to hang for a significant time. Unfortunately, it works exactly that way: my threads are constantly getting blocked for 20-80 seconds. Eventually putAsync() completes successfully, but I'd prefer the cache operation to fail fast. I don't want to hang all processing threads for a minute because of the cache. It works the same way for put() and putAsync() calls.

As I see in the code, it can be fixed by calling future.get(timeout) instead of future.get() in TcpCommunicationSpi. The timeout should be configurable.

TcpCommunicationSpi (line: 2799)
private GridCommunicationClient reserveClient(ClusterNode node, int connIdx) {
    ...
    client = fut.get();

Does it make sense from your point of view?
Here is my thread dump:

threads=[
  {
    threadName=https-jsse-nio-8080-exec-20,
    threadId=102,
    blockedTime=-1,
    blockedCount=0,
    waitedTime=-1,
    waitedCount=5,
    lockName=null,
    lockOwnerId=-1,
    lockOwnerName=null,
    inNative=false,
    suspended=false,
    threadState=WAITING,
    stackTrace=[
      { methodName=park, fileName=Unsafe.java, lineNumber=-2, className=sun.misc.Unsafe, nativeMethod=true },
      { methodName=park, fileName=LockSupport.java, lineNumber=304, className=java.util.concurrent.locks.LockSupport, nativeMethod=false },
      { methodName=get0, fileName=GridFutureAdapter.java, lineNumber=177, className=org.apache.ignite.internal.util.future.GridFutureAdapter, nativeMethod=false },
      { methodName=get, fileName=GridFutureAdapter.java, lineNumber=140, className=org.apache.ignite.internal.util.future.GridFutureAdapter, nativeMethod=false },
      { methodName=reserveClient, fileName=TcpCommunicationSpi.java, lineNumber=2799, className=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi, nativeMethod=false },
      ....
      { methodName=putAsync, fileName=IgniteCacheProxyImpl.java, lineNumber=1035, className=org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl, nativeMethod=false },
      { methodName=putAsync, fileName=GatewayProtectedCacheProxy.java, lineNumber=900, className=org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy, nativeMethod=false },

Sample cache config:

<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="RefreshToken"/>
    <property name="dataRegionName" value="auth_durable_region"/>
    <property name="atomicityMode" value="ATOMIC"/>
    <property name="writeSynchronizationMode" value="FULL_ASYNC"/>
    <property name="cacheMode" value="PARTITIONED"/>
    <property name="backups" value="1"/>
    <property name="eagerTtl" value="true"/>
</bean>

Ignite version: 2.4.0
OS: Windows Server 2012 R2

BR,
Oleksandr
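The ExecutorService workaround suggested in the reply at the top of this thread could be sketched roughly as follows. This is an illustration using only java.util.concurrent, not Ignite code: the class name BoundedCacheCall, the method runWithTimeout and the Runnable standing in for the cache call are all hypothetical. In a real application the Runnable would wrap something like cache.put(key, value).

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BoundedCacheCall {
    /**
     * Runs a potentially blocking operation on the given pool and waits
     * at most timeoutMs for it. Returns true if it finished in time,
     * false if the wait timed out (the task is then cancelled).
     */
    static boolean runWithTimeout(ExecutorService pool, Runnable op, long timeoutMs)
            throws InterruptedException, ExecutionException {
        Future<?> fut = pool.submit(op);
        try {
            fut.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Interrupts the worker thread; the caller is released either way.
            fut.cancel(true);
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Stand-in for a cache call that returns immediately.
            boolean fast = runWithTimeout(pool, () -> { }, 1000);

            // Stand-in for a cache call that hangs while a node is failing.
            boolean slow = runWithTimeout(pool, () -> {
                try {
                    Thread.sleep(5000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, 100);

            System.out.println("fast=" + fast + " slow=" + slow);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Note that cancel(true) only interrupts the worker thread. Whether the underlying cache call reacts to that interrupt is not something this sketch can guarantee, so a worker may stay blocked until the failed node leaves the cluster. The pool should be sized with that in mind.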
