> Does a network issue make the JVM halt?

There is failureDetectionTimeout, which lets the other nodes in the cluster detect that a node is unreachable and exclude it from the topology. So I believe it could be something like a temporary network problem. I would recommend adding some network monitoring so you are prepared for the next failure.
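The idea behind failureDetectionTimeout can be shown with a small model, assuming a simplified heartbeat scheme. The class and method names below (FailureDetectorSketch, recordHeartbeat, excludeUnreachable) are hypothetical. This is not Ignite's TcpDiscoverySpi implementation. It only shows how a node that stays silent for longer than the timeout gets dropped from the topology.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical, simplified model of timeout-based failure detection.
 * NOT Ignite's discovery code: it only illustrates the failureDetectionTimeout idea.
 */
public class FailureDetectorSketch {
    private final long failureDetectionTimeoutMs;
    // Last time (in ms) each node was heard from.
    private final Map<String, Long> lastHeardMs = new HashMap<>();

    public FailureDetectorSketch(long failureDetectionTimeoutMs) {
        this.failureDetectionTimeoutMs = failureDetectionTimeoutMs;
    }

    /** Records that a message (e.g. a heartbeat) arrived from nodeId at nowMs. */
    public void recordHeartbeat(String nodeId, long nowMs) {
        lastHeardMs.put(nodeId, nowMs);
    }

    /** Removes and returns every node that has been silent for longer than the timeout. */
    public Set<String> excludeUnreachable(long nowMs) {
        Set<String> excluded = new TreeSet<>();
        for (Map.Entry<String, Long> e : lastHeardMs.entrySet()) {
            if (nowMs - e.getValue() > failureDetectionTimeoutMs) {
                excluded.add(e.getKey());
            }
        }
        // Remove after iterating to avoid modifying the map while looping over it.
        lastHeardMs.keySet().removeAll(excluded);
        return excluded;
    }

    /** Nodes that are still part of the topology. */
    public Set<String> topology() {
        return new TreeSet<>(lastHeardMs.keySet());
    }

    public static void main(String[] args) {
        FailureDetectorSketch detector = new FailureDetectorSketch(30_000L);
        detector.recordHeartbeat("server-1", 0L);
        detector.recordHeartbeat("server-2", 0L);
        // server-1 keeps talking; server-2 goes silent (e.g. a network problem).
        detector.recordHeartbeat("server-1", 25_000L);
        System.out.println("Excluded: " + detector.excludeUnreachable(35_000L));
        System.out.println("Topology: " + detector.topology());
    }
}
```

In this model, a node that stops responding (for example, during a temporary network problem) is removed from the topology once it has been silent for longer than the timeout, which is 30000 ms in the configuration described in this thread.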
Best Regards,
Evgenii

On Fri, 26 Jul 2019 at 16:01, Akash Shinde <[email protected]> wrote:
> This issue is not consistent, but it occurs sometimes. Does a network issue
> make the JVM halt? As per my understanding, a node will disconnect from the
> cluster if a network issue happens. But in this case multiple JVMs were
> terminated. Can it be a bug in Ignite 2.6?
>
> Thanks,
> Akash
>
> On Fri, Jul 26, 2019 at 4:00 PM Evgenii Zhuravlev <
> [email protected]> wrote:
>
>> I don't see any specific errors in the logs. To me, it looks like
>> network problems; moreover, the client nodes print messages about
>> connection problems. Is this issue reproducible?
>> Evgenii
>>
>> On Fri, 26 Jul 2019 at 09:21, Akash Shinde <[email protected]> wrote:
>>
>>> Can someone please help me with this issue?
>>>
>>> On Wed, Jul 24, 2019 at 12:04 PM Akash Shinde <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>> Please find attached the logs from all server and client nodes. I have
>>>> also attached the GC logs for each node.
>>>>
>>>> Thanks,
>>>> Akash
>>>>
>>>>
>>>> On Tue, Jul 23, 2019 at 8:59 PM Evgenii Zhuravlev <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Can you please share the full logs, from node start, for all nodes in
>>>>> the cluster?
>>>>>
>>>>> Thanks,
>>>>> Evgenii
>>>>>
>>>>> On Tue, 23 Jul 2019 at 16:51, Akash Shinde <[email protected]> wrote:
>>>>>
>>>>>> I am using Ignite 2.6. I have created a cluster of 7 server
>>>>>> nodes and 3 client nodes. Out of the seven server nodes, five stopped
>>>>>> unexpectedly with the error log lines below.
>>>>>> I have attached the logs of two such server nodes.
>>>>>>
>>>>>> FailureDetectionTimeout is set to 30000 ms in the Ignite configuration.
>>>>>> The network timeout is left at the default.
>>>>>> ClientFailureDetectionTimeout is set to 30000 ms.
>>>>>>
>>>>>> I checked the GC logs, but it does not seem to be a GC pause issue. I
>>>>>> have attached the GC logs too.
>>>>>>
>>>>>> 1) Can someone please help me identify the reason for this issue?
>>>>>> 2) Are there any specific conditions that cause this issue, or is it a
>>>>>> bug in Ignite 2.6?
>>>>>>
>>>>>>
>>>>>> *ERROR LOG LINES*
>>>>>> 2019-07-22 09:22:47,281 19417675 [tcp-disco-srvr-#3%springDataNode%]
>>>>>> ERROR - Critical system error detected. Will be handled accordingly to
>>>>>> configured handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler,
>>>>>> failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION,
>>>>>> err=java.lang.IllegalStateException: Thread
>>>>>> tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]
>>>>>> java.lang.IllegalStateException: Thread
>>>>>> tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.
>>>>>> at
>>>>>> org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:5686)
>>>>>> at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
>>>>>> 2019-07-22 09:22:47,281 19417675 [tcp-disco-srvr-#3%springDataNode%]
>>>>>> ERROR - JVM will be halted immediately due to the failure:
>>>>>> [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION,
>>>>>> err=java.lang.IllegalStateException: Thread
>>>>>> tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Akash
>>>>>>
>>>>>
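The error lines above follow a fixed sequence: the discovery server thread tcp-disco-srvr-#3 exits, a SYSTEM_WORKER_TERMINATION failure is reported, and the configured StopNodeOrHaltFailureHandler halts the JVM. The sketch below models only that sequence, under stated assumptions. The names CriticalWorkerSketch, startCriticalWorker and RecordingHaltHandler are hypothetical. This is not Ignite's implementation. The handler records its decision instead of calling Runtime.getRuntime().halt(...), so the example is safe to run.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical, simplified model of the failure sequence in the error log lines.
 * NOT Ignite's implementation; all names here are illustrative.
 */
public class CriticalWorkerSketch {

    /** Only the failure type seen in the logs is modeled. */
    public enum FailureType { SYSTEM_WORKER_TERMINATION }

    /** Stand-in for a failure handler (hypothetical interface). */
    public interface FailureHandler {
        void onFailure(FailureType type, Throwable err);
    }

    /** Records its decision instead of halting, so the sketch is safe to run. */
    public static final class RecordingHaltHandler implements FailureHandler {
        public final List<String> actions = Collections.synchronizedList(new ArrayList<>());

        @Override
        public void onFailure(FailureType type, Throwable err) {
            // A real halt handler would call Runtime.getRuntime().halt(...) at this point.
            actions.add("HALT on " + type + ": " + err.getMessage());
        }
    }

    /**
     * Runs body in a thread with the given name. A critical worker is expected to run for
     * the node's whole lifetime; this sketch has no graceful-stop path, so any exit,
     * normal or exceptional, is reported to the handler as a failure.
     */
    public static Thread startCriticalWorker(String name, Runnable body, FailureHandler handler) {
        Thread worker = new Thread(() -> {
            Throwable cause = null;
            try {
                body.run();
            } catch (Throwable e) {
                cause = e; // whatever escaped the worker's loop
            }
            IllegalStateException err =
                    new IllegalStateException("Thread " + name + " is terminated unexpectedly.", cause);
            handler.onFailure(FailureType.SYSTEM_WORKER_TERMINATION, err);
        }, name);
        worker.start();
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        RecordingHaltHandler handler = new RecordingHaltHandler();
        // Simulate the discovery server thread dying from an exception.
        Thread worker = startCriticalWorker("tcp-disco-srvr-#3", () -> {
            throw new RuntimeException("simulated error");
        }, handler);
        worker.join();
        System.out.println(handler.actions);
    }
}
```

In this model, the halt is the handler's response to the worker thread exiting. What made the thread exit in the first place is a separate question, and the model does not answer it.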
