Actually, this happened when the WiFi node connected. But it never happened
before...
[14:51:46,660][INFO][exchange-worker-#43%xxxxxx%][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=e9e9f4b9-b249-4a4d-87ee-fc97097ad9ee,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=59, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode
[id=45516c37-5ee0-4046-a13a-9573607d25aa, addrs=[0:0:0:0:0:0:0:1,
127.0.0.1, MY_WIFI_IP, MY_WIFI_IP], sockAddrs=[/MY_WIFI_IP:0,
/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, /MY_WIFI_IP:0], discPort=0, order=59,
intOrder=32, lastExchangeTime=1561042306599, loc=false,
ver=2.7.0#20181130-sha1:256ae401, isClient=true], done=true],
topVer=AffinityTopologyVersion [topVer=59, minorTopVer=0],
durationFromInit=0]
[14:51:46,660][INFO][exchange-worker-#43%xxxxxx%][time] Finished exchange
init [topVer=AffinityTopologyVersion [topVer=59, minorTopVer=0], crd=true]
[14:51:46,662][INFO][exchange-worker-#43%xxxxxx%][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=59, minorTopVer=0], force=false, evt=NODE_JOINED,
node=45516c37-5ee0-4046-a13a-9573607d25aa]
[14:51:47,123][INFO][grid-nio-worker-tcp-comm-2-#26%xxxxxx%][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/xxx.xxx.xxx.69:47100,
rmtAddr=/MY_WIFI_IP:62249]
[14:51:59,428][INFO][db-checkpoint-thread-#1068%xxxxxx%][GridCacheDatabaseSharedManager]
Checkpoint started [checkpointId=56e2ea25-7273-49ab-81ac-0fdbc5945626,
startPtr=FileWALPointer [idx=137, fileOff=45790479, len=17995],
checkpointLockWait=0ms, checkpointLockHoldTime=12ms,
walCpRecordFsyncDuration=3ms, pages=242, reason='timeout']
[14:51:59,544][INFO][db-checkpoint-thread-#1068%xxxxxx%][GridCacheDatabaseSharedManager]
Checkpoint finished [cpId=56e2ea25-7273-49ab-81ac-0fdbc5945626, pages=242,
markPos=FileWALPointer [idx=137, fileOff=45790479, len=17995],
walSegmentsCleared=0, walSegmentsCovered=[], markDuration=23ms,
pagesWrite=14ms, fsync=101ms, total=138ms]
[14:52:45,827][INFO][tcp-disco-msg-worker-#2%xxxxxx%][TcpDiscoverySpi]
Local node seems to be disconnected from topology (failure detection
timeout is reached) [failureDetectionTimeout=10000, connCheckInterval=500]
[14:52:45,847][SEVERE][ttl-cleanup-worker-#1652%xxxxxx%][G] Blocked
system-critical thread has been detected. This can lead to cluster-wide
undefined behaviour [threadName=tcp-disco-msg-worker, blockedFor=39s]
[14:52:45,859][INFO][tcp-disco-sock-reader-#36%xxxxxx%][TcpDiscoverySpi]
Finished serving remote node connection [rmtAddr=/xxx.xxx.xxx.76:56861,
rmtPort=56861]
[14:52:45,864][WARNING][ttl-cleanup-worker-#1652%xxxxxx%][G] Thread
[name="tcp-disco-msg-worker-#2%xxxxxx%", id=83, state=RUNNABLE, blockCnt=6,
waitCnt=24621465]
[14:52:45,875][SEVERE][ttl-cleanup-worker-#1652%xxxxxx%][] Critical system
error detected. Will be handled accordingly to configured handler
[hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler
[ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker
[name=tcp-disco-msg-worker, igniteInstanceName=xxxxxx, finished=false,
heartbeatTs=1561042326687]]]
class org.apache.ignite.IgniteException: GridWorker
[name=tcp-disco-msg-worker, igniteInstanceName=xxxxxx, finished=false,
heartbeatTs=1561042326687]
at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831)
at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
at
org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
at
org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297)
at
org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:151)
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
[14:52:47,974][WARNING][jvm-pause-detector-worker][IgniteKernal%xxxxxx]
Possible too long JVM pause: 2047 milliseconds.
[14:52:47,994][INFO][tcp-disco-srvr-#3%xxxxxx%][TcpDiscoverySpi]
TCP discovery accepted incoming connection [rmtAddr=/xxx.xxx.xxx.72,
rmtPort=37607]
[14:52:47,994][INFO][tcp-disco-srvr-#3%xxxxxx%][TcpDiscoverySpi]
TCP discovery spawning a new thread for connection
[rmtAddr=/xxx.xxx.xxx.72, rmtPort=37607]
[14:52:47,996][INFO][tcp-disco-sock-reader-#37%xxxxxx%][TcpDiscoverySpi]
Started serving remote node connection [rmtAddr=/xxx.xxx.xxx.72:37607,
rmtPort=37607]
[14:52:48,005][WARNING][ttl-cleanup-worker-#1652%xxxxxx%][FailureProcessor]
Thread dump at 2019/06/20 14:52:47 UTC
Thread [name="sys-#25624%xxxxxx%", id=33109, state=TIMED_WAITING,
blockCnt=0, waitCnt=1]
Lock
[object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@3a9414a4,
ownerName=null, ownerId=-1]
at sun.misc.Unsafe.park(Native Method)
at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at
java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Thread [name="Thread-6972", id=33108, state=TIMED_WAITING,
blockCnt=0, waitCnt=17]
Lock
[object=java.util.concurrent.SynchronousQueue$TransferStack@62bdd75c,
ownerName=null, ownerId=-1]
at sun.misc.Unsafe.park(Native Method)
at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
at
java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
at
java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
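For reference, the failureDetectionTimeout=10000 visible in the log above is
the IgniteConfiguration default; if a node sits behind a flaky link (e.g. the
WiFi client here), raising it can avoid premature "disconnected from topology"
verdicts. A minimal Spring XML sketch (the 30000 value is illustrative only,
not a recommendation for this cluster):

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Raise the discovery failure detection timeout (default 10000 ms). -->
    <property name="failureDetectionTimeout" value="30000"/>
</bean>
```

Note this only widens the detection window; it does not address the blocked
tcp-disco-msg-worker or the ~2 s JVM pause also reported in the log.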
On Thu, 20 Jun 2019 at 10:08, John Smith <[email protected]> wrote:
> Ok, where do I look for the visor logs when it hangs? And it's not a "no
> caches" issue; the cluster works great. It's when visor cannot reach a
> specific client node.
>
> On Thu., Jun. 20, 2019, 8:45 a.m. Vasiliy Sisko, <[email protected]>
> wrote:
>
>> Hello @javadevmtl
>>
>> I failed to reproduce your problem.
>> In case of any error in the cache command, Visor CMD shows the message "No
>> caches found".
>> Please provide the logs of visor and of the server and client nodes after
>> the command hangs.
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>
>