Hello!

I would recommend turning off failure detection, since its default
configuration is not very forgiving. It may also be fixed in 2.7.5.
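For reference, a minimal sketch of what that could look like. This is not from the thread; the timeout values are illustrative, and you should check them against your own workload:

```java
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.NoOpFailureHandler;

public class RelaxedFailureDetection {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Effectively turn off failure handling: critical failures are
        // logged instead of stopping/halting the node (the default handler
        // is StopNodeOrHaltFailureHandler, as seen in the log above).
        cfg.setFailureHandler(new NoOpFailureHandler());

        // Alternatively (or additionally), relax the timeouts that trigger
        // the "failure detection timeout is reached" and
        // SYSTEM_WORKER_BLOCKED events. Values here are examples only.
        cfg.setFailureDetectionTimeout(60_000);    // default is 10000 ms
        cfg.setSystemWorkerBlockedTimeout(60_000); // available since 2.7

        Ignition.start(cfg);
    }
}
```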

This just means some operation took longer than expected and Ignite
panicked.

Regards,

Thu, Jun 20, 2019, 19:28 John Smith <[email protected]>:

> Actually this happened when the Wi-Fi node connected. But it never happened
> before...
>
> [14:51:46,660][INFO][exchange-worker-#43%xxxxxx%][GridDhtPartitionsExchangeFuture]
> Completed partition exchange
> [localNode=e9e9f4b9-b249-4a4d-87ee-fc97097ad9ee,
> exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
> [topVer=59, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode
> [id=45516c37-5ee0-4046-a13a-9573607d25aa, addrs=[0:0:0:0:0:0:0:1,
> 127.0.0.1, MY_WIFI_IP, MY_WIFI_IP], sockAddrs=[/MY_WIFI_IP:0,
> /0:0:0:0:0:0:0:1:0, /127.0.0.1:0, /MY_WIFI_IP:0], discPort=0, order=59,
> intOrder=32, lastExchangeTime=1561042306599, loc=false,
> ver=2.7.0#20181130-sha1:256ae401, isClient=true], done=true],
> topVer=AffinityTopologyVersion [topVer=59, minorTopVer=0],
> durationFromInit=0]
> [14:51:46,660][INFO][exchange-worker-#43%xxxxxx%][time] Finished exchange
> init [topVer=AffinityTopologyVersion [topVer=59, minorTopVer=0], crd=true]
> [14:51:46,662][INFO][exchange-worker-#43%xxxxxx%][GridCachePartitionExchangeManager]
> Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
> [topVer=59, minorTopVer=0], force=false, evt=NODE_JOINED,
> node=45516c37-5ee0-4046-a13a-9573607d25aa]
> [14:51:47,123][INFO][grid-nio-worker-tcp-comm-2-#26%xxxxxx%][TcpCommunicationSpi]
> Accepted incoming communication connection [locAddr=/xxx.xxx.xxx.69:47100,
> rmtAddr=/MY_WIFI_IP:62249]
> [14:51:59,428][INFO][db-checkpoint-thread-#1068%xxxxxx%][GridCacheDatabaseSharedManager]
> Checkpoint started [checkpointId=56e2ea25-7273-49ab-81ac-0fdbc5945626,
> startPtr=FileWALPointer [idx=137, fileOff=45790479, len=17995],
> checkpointLockWait=0ms, checkpointLockHoldTime=12ms,
> walCpRecordFsyncDuration=3ms, pages=242, reason='timeout']
> [14:51:59,544][INFO][db-checkpoint-thread-#1068%xxxxxx%][GridCacheDatabaseSharedManager]
> Checkpoint finished [cpId=56e2ea25-7273-49ab-81ac-0fdbc5945626, pages=242,
> markPos=FileWALPointer [idx=137, fileOff=45790479, len=17995],
> walSegmentsCleared=0, walSegmentsCovered=[], markDuration=23ms,
> pagesWrite=14ms, fsync=101ms, total=138ms]
> [14:52:45,827][INFO][tcp-disco-msg-worker-#2%xxxxxx%][TcpDiscoverySpi]
> Local node seems to be disconnected from topology (failure detection
> timeout is reached) [failureDetectionTimeout=10000, connCheckInterval=500]
> [14:52:45,847][SEVERE][ttl-cleanup-worker-#1652%xxxxxx%][G] Blocked
> system-critical thread has been detected. This can lead to cluster-wide
> undefined behaviour [threadName=tcp-disco-msg-worker, blockedFor=39s]
> [14:52:45,859][INFO][tcp-disco-sock-reader-#36%xxxxxx%][TcpDiscoverySpi]
> Finished serving remote node connection [rmtAddr=/xxx.xxx.xxx.76:56861,
> rmtPort=56861
> [14:52:45,864][WARNING][ttl-cleanup-worker-#1652%xxxxxx%][G] Thread
> [name="tcp-disco-msg-worker-#2%xxxxxx%", id=83, state=RUNNABLE, blockCnt=6,
> waitCnt=24621465]
>
> [14:52:45,875][SEVERE][ttl-cleanup-worker-#1652%xxxxxx%][] Critical system
> error detected. Will be handled accordingly to configured handler
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler
> [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext
> [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker
> [name=tcp-disco-msg-worker, igniteInstanceName=xxxxxx, finished=false,
> heartbeatTs=1561042326687]]]
> class org.apache.ignite.IgniteException: GridWorker
> [name=tcp-disco-msg-worker, igniteInstanceName=xxxxxx, finished=false,
> heartbeatTs=1561042326687]
>         at
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831)
>         at
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
>         at
> org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
>         at
> org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297)
>         at
> org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:151)
>         at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>         at java.lang.Thread.run(Thread.java:748)
>
>
> [14:52:47,974][WARNING][jvm-pause-detector-worker][IgniteKernal%xxxxxx]
> Possible too long JVM pause: 2047 milliseconds.
> [14:52:47,994][INFO][tcp-disco-srvr-#3%xxxxxx%][TcpDiscoverySpi]
> TCP discovery accepted incoming connection [rmtAddr=/xxx.xxx.xxx.72,
> rmtPort=37607]
> [14:52:47,994][INFO][tcp-disco-srvr-#3%xxxxxx%][TcpDiscoverySpi]
> TCP discovery spawning a new thread for connection
> [rmtAddr=/xxx.xxx.xxx.72, rmtPort=37607]
>
> [14:52:47,996][INFO][tcp-disco-sock-reader-#37%xxxxxx%][TcpDiscoverySpi]
> Started serving remote node connection [rmtAddr=/xxx.xxx.xxx.72:37607,
> rmtPort=37607]
>
> [14:52:48,005][WARNING][ttl-cleanup-worker-#1652%xxxxxx%][FailureProcessor]
> Thread dump at 2019/06/20 14:52:47 UTC
>         Thread [name="sys-#25624%xxxxxx%", id=33109, state=TIMED_WAITING,
> blockCnt=0, waitCnt=1]
>             Lock
> [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@3a9414a4,
> ownerName=null, ownerId=-1]
>                 at sun.misc.Unsafe.park(Native Method)
>                 at
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>                 at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
>                 at
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
>                 at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
>                 at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
>                 at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>                 at java.lang.Thread.run(Thread.java:748)
>
>         Thread [name="Thread-6972", id=33108, state=TIMED_WAITING,
> blockCnt=0, waitCnt=17]
>             Lock
> [object=java.util.concurrent.SynchronousQueue$TransferStack@62bdd75c,
> ownerName=null, ownerId=-1]
>                 at sun.misc.Unsafe.park(Native Method)
>                 at
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>                 at
> java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
>                 at
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
>                 at
> java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
>                 at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
>                 at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
>                 at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>                 at java.lang.Thread.run(Thread.java:748)
>
>
> On Thu, 20 Jun 2019 at 10:08, John Smith <[email protected]> wrote:
>
>> Ok, where do I look for the visor logs when it hangs? And it's not a "no
>> caches" issue; the cluster works great. It's when visor cannot reach a
>> specific client node.
>>
>> On Thu., Jun. 20, 2019, 8:45 a.m. Vasiliy Sisko, <[email protected]>
>> wrote:
>>
>>> Hello @javadevmtl
>>>
>>> I failed to reproduce your problem.
>>> In case of any error in the cache command, Visor CMD shows the message
>>> "No caches found".
>>> Please provide logs of the visor, server and client nodes after the
>>> command hangs.
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>>
>>
