Hi,

Can you please share the new logs? They will help us understand the possible cause of the issue.

Thanks,
Evgenii

On Wed, Aug 28, 2019 at 17:56, Akash Shinde <[email protected]> wrote:

> Hi,
>
> I have now set the failure detection timeout to 120000 ms, and I am
> still getting this error message intermittently on Ignite 2.6.
> It could be a network issue, but I am not able to confirm that.
>
> 1) What are all the possible reasons for the following error? Listing
> them might help to narrow down the issue.
> [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException:
> Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]
>
> 2) Will upgrading to the latest Ignite version, 2.7.5 or 2.7.6, solve
> this problem?
>
> 3) How do you monitor the network? Can you please suggest a tool?
>
> 4) I understand that a node gets segmented because of a long GC pause or
> a network connectivity problem. Is my understanding correct?
>
> 5) What is the purpose of the networkTimeout configuration? In my case
> it is set to 10000.
>
> Regards,
> Akash
>
> On Mon, Jul 29, 2019 at 2:28 PM Evgenii Zhuravlev <[email protected]> wrote:
>
>> >Does network issue make JVM halt?
>> There is a failureDetectionTimeout, which helps the other nodes in the
>> cluster detect that a node is unreachable and exclude it from the
>> topology. So I believe it could be something like a temporary network
>> problem. I would recommend adding some network monitoring to be
>> prepared for the next failure.
>>
>> Best Regards,
>> Evgenii
>>
>> On Fri, Jul 26, 2019 at 16:01, Akash Shinde <[email protected]> wrote:
>>
>>> This issue is not consistent, but it occurs sometimes. Does a network
>>> issue make the JVM halt? As per my understanding, a node will
>>> disconnect from the cluster if a network issue happens.
>>> But in this case multiple JVMs were terminated. Can it be a bug in
>>> Ignite 2.6?
>>>
>>> Thanks,
>>> Akash
>>>
>>> On Fri, Jul 26, 2019 at 4:00 PM Evgenii Zhuravlev <[email protected]> wrote:
>>>
>>>> I don't see any specific errors in the logs. To me, it looks like
>>>> network problems; moreover, the client nodes print messages about
>>>> connection problems. Is this issue reproducible?
>>>> Evgenii
>>>>
>>>> On Fri, Jul 26, 2019 at 09:21, Akash Shinde <[email protected]> wrote:
>>>>
>>>>> Can someone please help me with this issue?
>>>>>
>>>>> On Wed, Jul 24, 2019 at 12:04 PM Akash Shinde <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>> Please find attached logs from all server and client nodes. GC logs
>>>>>> for each node are attached as well.
>>>>>>
>>>>>> Thanks,
>>>>>> Akash
>>>>>>
>>>>>> On Tue, Jul 23, 2019 at 8:59 PM Evgenii Zhuravlev <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Can you please share the full logs, starting from node start, from
>>>>>>> all nodes in the cluster?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Evgenii
>>>>>>>
>>>>>>> On Tue, Jul 23, 2019 at 16:51, Akash Shinde <[email protected]> wrote:
>>>>>>>
>>>>>>>> I am using Ignite 2.6. I have created a cluster of seven server
>>>>>>>> nodes and three client nodes. Five of the seven server nodes
>>>>>>>> stopped unexpectedly with the error log lines below.
>>>>>>>> I have attached the logs of two such server nodes.
>>>>>>>>
>>>>>>>> FailureDetectionTimeout is set to 30000 ms in the Ignite
>>>>>>>> configuration.
>>>>>>>> Network timeout is the default.
>>>>>>>> ClientFailureDetectionTimeout is set to 30000 ms.
>>>>>>>>
>>>>>>>> I checked the GC logs, but it does not seem to be a GC pause
>>>>>>>> issue. I have attached the GC logs too.
>>>>>>>>
>>>>>>>> 1) Can someone please help me identify the reason for this issue?
>>>>>>>> 2) Are there any specific reasons that cause this issue, or is it
>>>>>>>> a bug in Ignite 2.6?
>>>>>>>>
>>>>>>>> *ERROR LOG LINES*
>>>>>>>> 2019-07-22 09:22:47,281 19417675 [tcp-disco-srvr-#3%springDataNode%] ERROR - Critical system error detected. Will be handled accordingly to configured handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]
>>>>>>>> java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.
>>>>>>>>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:5686)
>>>>>>>>     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
>>>>>>>> 2019-07-22 09:22:47,281 19417675 [tcp-disco-srvr-#3%springDataNode%] ERROR - JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Akash
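
Since five of the seven servers stopped, it may help to pull the "Critical system error detected" entries out of every node's log and compare their timestamps. Below is a minimal sketch of that in plain Java, not anything shipped with Ignite. The class name `FailureLogScan` and its `Hit` type are hypothetical, and the regular expression assumes each log entry sits on one line in the layout of the error lines quoted above (timestamp, a numeric column, thread name in brackets, level). A different log pattern would need a different expression.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts critical-failure entries from node logs.
final class FailureLogScan {

    // One matched "Critical system error detected" entry.
    static final class Hit {
        final String timestamp;
        final String thread;
        final String failureType;

        Hit(String timestamp, String thread, String failureType) {
            this.timestamp = timestamp;
            this.thread = thread;
            this.failureType = failureType;
        }
    }

    // Assumed layout: "<date> <time>,<ms> <number> [<thread>] ERROR - Critical system error detected. ... [type=<TYPE>, ..."
    private static final Pattern ENTRY = Pattern.compile(
        "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})\\s+\\d+\\s+"
            + "\\[([^\\]]+)\\]\\s+ERROR\\s+-\\s+Critical system error detected"
            + ".*?\\btype=(\\w+)");

    // Returns one Hit per line that matches the assumed layout.
    static List<Hit> scan(List<String> lines) {
        List<Hit> hits = new ArrayList<>();
        for (String line : lines) {
            Matcher m = ENTRY.matcher(line);
            if (m.find()) {
                hits.add(new Hit(m.group(1), m.group(2), m.group(3)));
            }
        }
        return hits;
    }

    // Usage: pass one or more log file paths as arguments.
    public static void main(String[] args) throws IOException {
        for (String path : args) {
            for (Hit h : scan(Files.readAllLines(Paths.get(path)))) {
                System.out.println(path + "\t" + h.timestamp + "\t" + h.failureType + "\t" + h.thread);
            }
        }
    }
}
```

If the timestamps printed for the different servers fall within a few seconds of each other, that would be consistent with a shared cause such as a network interruption. Widely scattered timestamps would suggest looking at each node separately.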
