Hi,

Try to configure the existing timeouts:
- failureDetectionTimeout
- clientFailureDetectionTimeout
- networkTimeout

Not sure if the Future you mention really needs a timeout. If there were a 
timeout, it would be either failureDetectionTimeout or networkTimeout anyway.
Currently the calling thread has to wait for the failed node to be kicked out 
of the cluster. The time that the cluster needs to kick out a node depends 
(primarily) on the failureDetectionTimeout, although it isn’t strictly equal to it.

The problem I can see here is that putAsync is not actually async, since it 
performs substantial work in the calling thread. I’ll give it another look and 
file a bug.
For now, as a workaround (aside from configuring the timeouts), I’d suggest 
using an ExecutorService to call put()/putAsync(), getting a cancelable Future 
from the start.
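
A minimal sketch of that workaround, using only java.util.concurrent. The 
names BoundedCachePut, CacheOp and runWithTimeout are hypothetical and not 
part of Ignite; CacheOp stands in for whatever cache call you make, for 
example () -> cache.put(key, value). Note that cancel(true) only interrupts 
the worker thread. Whether the underlying cache call reacts to the interrupt 
is an assumption, so the operation may still complete later in the background.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BoundedCachePut {
    /** Hypothetical stand-in for a blocking cache call such as cache.put(key, value). */
    interface CacheOp {
        void run() throws Exception;
    }

    /**
     * Runs the operation on the given pool and waits at most timeoutMs for it.
     * Returns true if it finished in time, false if the wait timed out.
     * A failure inside the operation is rethrown as ExecutionException.
     */
    static boolean runWithTimeout(ExecutorService pool, CacheOp op, long timeoutMs)
            throws InterruptedException, ExecutionException {
        // The lambda returns a value, so it is submitted as a Callable.
        Future<?> fut = pool.submit(() -> {
            op.run();
            return null;
        });
        try {
            fut.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Give up waiting and interrupt the worker thread.
            fut.cancel(true);
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Dedicated pool, so a stuck cache call blocks a worker, not the request thread.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            boolean fast = runWithTimeout(pool, () -> { }, 1000);
            // Simulates a call that hangs while a failed node is being detected.
            boolean slow = runWithTimeout(pool, () -> Thread.sleep(5000), 200);
            System.out.println("fast completed: " + fast + ", slow completed: " + slow);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The pool should be sized for the number of calls that may hang at the same 
time, since each timed-out call can keep occupying a worker until the 
underlying operation returns.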

Thanks,
Stan

From: Olexandr K
Sent: June 11, 2018, 2:44
To: [email protected]
Subject: Cache operations hanging for a minute when one of server nodes goes down

Hi Igniters,

I'm testing our system for availability.
It uses Ignite as a persistent key/value cache.

Here is my test:
1) start 2 server and 2 client nodes
2) run heavy load on client nodes (some application logic which causes cache 
calls)
3) stop 1 server node

Here I expect all in-progress cache operations targeting server node 1 to 
fail fast.
What I don't want is for all my processing threads to hang for a significant 
time. Unfortunately it works exactly that way: my threads are constantly 
blocked for 20-80 seconds. 
Eventually putAsync() completes successfully, but I'd prefer the cache 
operation to fail fast. I don't want all processing threads to hang for a 
minute because of the cache.

It works the same for put() and putAsync() calls.

As I see in the code, it can be fixed by calling future.get(timeout) instead of 
future.get() in TcpCommunicationSpi.
Timeout should be configurable.

TcpCommunicationSpi (line: 2799)
  private GridCommunicationClient reserveClient(ClusterNode node, int connIdx) {
    ...
    client = fut.get();

Does it make sense from your point of view?

Here is my thread dump:

threads=[
  {
    threadName=https-jsse-nio-8080-exec-20,
    threadId=102,
    blockedTime=-1,
    blockedCount=0,
    waitedTime=-1,
    waitedCount=5,
    lockName=null,
    lockOwnerId=-1,
    lockOwnerName=null,
    inNative=false,
    suspended=false,
    threadState=WAITING,
    stackTrace=[
      {
        methodName=park,
        fileName=Unsafe.java,
        lineNumber=-2,
        className=sun.misc.Unsafe,
        nativeMethod=true
      },
      {
        methodName=park,
        fileName=LockSupport.java,
        lineNumber=304,
        className=java.util.concurrent.locks.LockSupport,
        nativeMethod=false
      },
      {
        methodName=get0,
        fileName=GridFutureAdapter.java,
        lineNumber=177,
        className=org.apache.ignite.internal.util.future.GridFutureAdapter,
        nativeMethod=false
      },
      {
        methodName=get,
        fileName=GridFutureAdapter.java,
        lineNumber=140,
        className=org.apache.ignite.internal.util.future.GridFutureAdapter,
        nativeMethod=false
      },
      {
        methodName=reserveClient,
        fileName=TcpCommunicationSpi.java,
        lineNumber=2799,
        className=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi,
        nativeMethod=false
      },
....
      {
        methodName=putAsync,
        fileName=IgniteCacheProxyImpl.java,
        lineNumber=1035,
        className=org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl,
        nativeMethod=false
      },
      {
        methodName=putAsync,
        fileName=GatewayProtectedCacheProxy.java,
        lineNumber=900,
        className=org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy,
        nativeMethod=false
      },

Sample cache config:

<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="RefreshToken"/>
    <property name="dataRegionName" value="auth_durable_region"/>
    <property name="atomicityMode" value="ATOMIC"/>
    <property name="writeSynchronizationMode" value="FULL_ASYNC"/>
    <property name="cacheMode" value="PARTITIONED"/>
    <property name="backups" value="1"/>
    <property name="eagerTtl" value="true"/>
</bean>

Ignite version: 2.4.0
OS: Windows Server 2012 R2

BR, Oleksandr
