How do I turn it off? Also, I think I know what may have caused the Visor issue: I was connecting to the cluster without specifying ports 47500..47509 in the addresses. Once I added the port range it seems much more stable, and I can even see the WiFi node and everything.
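In case it helps, here is roughly how I'm building the client config now (just a sketch; the server-1/2/3 host names are placeholders for my real nodes). The failure handler part at the end is only my guess at what "turning it off" would mean, based on the ignoredFailureTypes that shows up in the log below, so please correct me if you meant something else:

import java.util.Arrays;
import java.util.Collections;

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.FailureType;
import org.apache.ignite.failure.StopNodeOrHaltFailureHandler;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

public class ClientStart {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setClientMode(true);

        // Discovery: list every server with the full discovery port range,
        // instead of the bare host names I was using before.
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setAddresses(Arrays.asList(
            "server-1:47500..47509",   // placeholder host names
            "server-2:47500..47509",
            "server-3:47500..47509"));
        cfg.setDiscoverySpi(new TcpDiscoverySpi().setIpFinder(ipFinder));

        // My guess at "turning it off": keep the default handler but have it
        // ignore SYSTEM_WORKER_BLOCKED so a blocked worker does not stop the node.
        StopNodeOrHaltFailureHandler failureHnd = new StopNodeOrHaltFailureHandler(false, 0);
        failureHnd.setIgnoredFailureTypes(Collections.singleton(FailureType.SYSTEM_WORKER_BLOCKED));
        cfg.setFailureHandler(failureHnd);

        Ignition.start(cfg);
    }
}

Is that roughly it, or did you mean something on the server side?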
On Fri, 21 Jun 2019 at 06:01, Ilya Kasnacheev <[email protected]> wrote:

> Hello!
>
> It is recommended to turn off failure detection, since its default config
> is not very convenient. Maybe it is also fixed in 2.7.5.
>
> This just means some operation took longer than expected and Ignite
> panicked.
>
> Regards,
>
> Thu, 20 Jun 2019, 19:28 John Smith <[email protected]>:
>
>> Actually this happened when the WiFi node connected. But it never
>> happened before...
>>
>> [14:51:46,660][INFO][exchange-worker-#43%xxxxxx%][GridDhtPartitionsExchangeFuture] Completed partition exchange [localNode=e9e9f4b9-b249-4a4d-87ee-fc97097ad9ee, exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=59, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode [id=45516c37-5ee0-4046-a13a-9573607d25aa, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, MY_WIFI_IP, MY_WIFI_IP], sockAddrs=[/MY_WIFI_IP:0, /0:0:0:0:0:0:0:1:0, /127.0.0.1:0, /MY_WIFI_IP:0], discPort=0, order=59, intOrder=32, lastExchangeTime=1561042306599, loc=false, ver=2.7.0#20181130-sha1:256ae401, isClient=true], done=true], topVer=AffinityTopologyVersion [topVer=59, minorTopVer=0], durationFromInit=0]
>> [14:51:46,660][INFO][exchange-worker-#43%xxxxxx%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=59, minorTopVer=0], crd=true]
>> [14:51:46,662][INFO][exchange-worker-#43%xxxxxx%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=59, minorTopVer=0], force=false, evt=NODE_JOINED, node=45516c37-5ee0-4046-a13a-9573607d25aa]
>> [14:51:47,123][INFO][grid-nio-worker-tcp-comm-2-#26%xxxxxx%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/xxx.xxx.xxx.69:47100, rmtAddr=/MY_WIFI_IP:62249]
>> [14:51:59,428][INFO][db-checkpoint-thread-#1068%xxxxxx%][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=56e2ea25-7273-49ab-81ac-0fdbc5945626, startPtr=FileWALPointer [idx=137, fileOff=45790479, len=17995], checkpointLockWait=0ms, checkpointLockHoldTime=12ms, walCpRecordFsyncDuration=3ms, pages=242, reason='timeout']
>> [14:51:59,544][INFO][db-checkpoint-thread-#1068%xxxxxx%][GridCacheDatabaseSharedManager] Checkpoint finished [cpId=56e2ea25-7273-49ab-81ac-0fdbc5945626, pages=242, markPos=FileWALPointer [idx=137, fileOff=45790479, len=17995], walSegmentsCleared=0, walSegmentsCovered=[], markDuration=23ms, pagesWrite=14ms, fsync=101ms, total=138ms]
>> [14:52:45,827][INFO][tcp-disco-msg-worker-#2%xxxxxx%][TcpDiscoverySpi] Local node seems to be disconnected from topology (failure detection timeout is reached) [failureDetectionTimeout=10000, connCheckInterval=500]
>> [14:52:45,847][SEVERE][ttl-cleanup-worker-#1652%xxxxxx%][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=tcp-disco-msg-worker, blockedFor=39s]
>> [14:52:45,859][INFO][tcp-disco-sock-reader-#36%xxxxxx%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/xxx.xxx.xxx.76:56861, rmtPort=56861
>> [14:52:45,864][WARNING][ttl-cleanup-worker-#1652%xxxxxx%][G] Thread [name="tcp-disco-msg-worker-#2%xxxxxx%", id=83, state=RUNNABLE, blockCnt=6, waitCnt=24621465]
>>
>> [14:52:45,875][SEVERE][ttl-cleanup-worker-#1652%xxxxxx%][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=tcp-disco-msg-worker, igniteInstanceName=xxxxxx, finished=false, heartbeatTs=1561042326687]]]
>> class org.apache.ignite.IgniteException: GridWorker [name=tcp-disco-msg-worker, igniteInstanceName=xxxxxx, finished=false, heartbeatTs=1561042326687]
>>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831)
>>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
>>     at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
>>     at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297)
>>     at org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:151)
>>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>>     at java.lang.Thread.run(Thread.java:748)
>>
>> [14:52:47,974][WARNING][jvm-pause-detector-worker][IgniteKernal%xxxxxx] Possible too long JVM pause: 2047 milliseconds.
>> [14:52:47,994][INFO][tcp-disco-srvr-#3%xxxxxx%][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/xxx.xxx.xxx.72, rmtPort=37607]
>> [14:52:47,994][INFO][tcp-disco-srvr-#3%xxxxxx%][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/xxx.xxx.xxx.72, rmtPort=37607]
>> [14:52:47,996][INFO][tcp-disco-sock-reader-#37%xxxxxx%][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/xxx.xxx.xxx.72:37607, rmtPort=37607]
>> [14:52:48,005][WARNING][ttl-cleanup-worker-#1652%xxxxxx%][FailureProcessor] Thread dump at 2019/06/20 14:52:47 UTC
>> Thread [name="sys-#25624%xxxxxx%", id=33109, state=TIMED_WAITING, blockCnt=0, waitCnt=1]
>>     Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@3a9414a4, ownerName=null, ownerId=-1]
>>     at sun.misc.Unsafe.park(Native Method)
>>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
>>     at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
>>     at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>     at java.lang.Thread.run(Thread.java:748)
>>
>> Thread [name="Thread-6972", id=33108, state=TIMED_WAITING, blockCnt=0, waitCnt=17]
>>     Lock [object=java.util.concurrent.SynchronousQueue$TransferStack@62bdd75c, ownerName=null, ownerId=-1]
>>     at sun.misc.Unsafe.park(Native Method)
>>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>>     at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
>>     at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
>>     at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
>>     at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>     at java.lang.Thread.run(Thread.java:748)
>>
>>
>> On Thu, 20 Jun 2019 at 10:08, John Smith <[email protected]> wrote:
>>
>>> Ok, where do I look for the Visor logs when it hangs? And it's not a "no
>>> caches" issue; the cluster works great. It's when Visor cannot reach a
>>> specific client node.
>>>
>>> On Thu., Jun. 20, 2019, 8:45 a.m. Vasiliy Sisko, <[email protected]>
>>> wrote:
>>>
>>>> Hello @javadevmtl
>>>>
>>>> I failed to reproduce your problem.
>>>> In case of any error in the cache command, Visor CMD shows the message
>>>> "No caches found".
>>>> Please provide logs of Visor, server and client nodes after the command
>>>> hangs.
>>>>
>>>>
>>>>
>>>> --
>>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>>
