Hi,

Sorry for the late reply; I was out of town. I am trying to fetch the logs. Meanwhile, could you please answer the questions from my last mail?
Thanks,
Akash

On Thu, Aug 29, 2019 at 6:51 PM Evgenii Zhuravlev <[email protected]> wrote:

> Hi,
> Can you please share new logs? It will help to understand the possible
> reason of the issue.
>
> Thanks,
> Evgenii
>
> Wed, Aug 28, 2019 at 17:56, Akash Shinde <[email protected]>:
>
>> Hi,
>>
>> Now I have set the failure detection timeout to 120000 mills and I am
>> still getting this error message intermittently on Ignite 2.6 version.
>> It could be the network issue but I am not able to confirm that this is
>> happening because of network issue.
>>
>> 1) What are all possible reasons for following error? Could you please
>> mention it, it might help to narrow down the issue.
>> [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException:
>> Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]
>>
>> 2) Will upgrading to latest Ignite version 2.7.5 or 2.7.6 solve this
>> problem?
>>
>> 3) How do you monitor the network. Can you please suggest any tool?
>>
>> 4) I understand that node gets segmented because of long GC pause or
>> network connectivity. Is my understanding correct?
>>
>> 5) What is the purpose of networkTimeout configuration? In my case it is
>> set to 10000 .
>>
>> Regards,
>> Akash
>>
>> On Mon, Jul 29, 2019 at 2:28 PM Evgenii Zhuravlev <
>> [email protected]> wrote:
>>
>>> >Does network issue make JVM halt?
>>> There is a failureDetectionTimeout, which will help other nodes in the
>>> cluster to detect that node is unreachable and to exclude this node from
>>> topology. So, I believe it could be something like a temporary network
>>> problem. I would recommend to add some network monitoring to be prepared
>>> for the next failure.
>>>
>>> Best Regards,
>>> Evgenii
>>>
>>> Fri, Jul 26, 2019 at 16:01, Akash Shinde <[email protected]>:
>>>
>>>> This issue is not consistent and but occurs sometimes. Does network
>>>> issue make JVM halt?.
>>>> As per my understanding node will disconnects from
>>>> cluster if network issue happens.
>>>> But in this case multiple JVMs were terminated.Can it be a bug in
>>>> Ignite 2.6 version?
>>>>
>>>> Thanks,
>>>> Akash
>>>>
>>>> On Fri, Jul 26, 2019 at 4:00 PM Evgenii Zhuravlev <
>>>> [email protected]> wrote:
>>>>
>>>>> I don't see any specific errors in the logs. For me, it looks like
>>>>> network problems, moreover, on client nodes it prints messages about
>>>>> connection problems. Is this issue reproducible?
>>>>> Evgenii
>>>>>
>>>>> Fri, Jul 26, 2019 at 09:21, Akash Shinde <[email protected]>:
>>>>>
>>>>>> Can someone please help me on this issue ?
>>>>>>
>>>>>> On Wed, Jul 24, 2019 at 12:04 PM Akash Shinde <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> Please find attached logs from all server and client nodes.Also
>>>>>>> attached gc logs for each node.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Akash
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jul 23, 2019 at 8:59 PM Evgenii Zhuravlev <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Can you please share full logs from the node start from all nodes
>>>>>>>> in the cluster?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Evgenii
>>>>>>>>
>>>>>>>> Tue, Jul 23, 2019 at 16:51, Akash Shinde <[email protected]>:
>>>>>>>>
>>>>>>>>> I am using Ignite 2.6 version. I have created a cluster of 7
>>>>>>>>> server nodes and three client nodes. Out of seven nodes five nodes
>>>>>>>>> stopped
>>>>>>>>> unexpectedly with below error logs lines.
>>>>>>>>> I have attached logs of two such server nodes.
>>>>>>>>>
>>>>>>>>> FailureDetectionTimeout is set to 30000 ms in Ignite
>>>>>>>>> configuration.
>>>>>>>>> Network time out is default.
>>>>>>>>> ClientFailureDetectionTimeout is set to 30000 ms.
>>>>>>>>>
>>>>>>>>> I check gc logs but it does not seem to be GC pause issue. I have
>>>>>>>>> attached GC logs too.
>>>>>>>>>
>>>>>>>>> 1) Can someone please help me to identify the reason for this
>>>>>>>>> issue?
>>>>>>>>> 2) Are there any specific reasons which causes this issue or it is
>>>>>>>>> a bug in Ignite 2.6 version?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *ERROR LOGS LINES*
>>>>>>>>> 2019-07-22 09:22:47,281 19417675
>>>>>>>>> [tcp-disco-srvr-#3%springDataNode%] ERROR - Critical system error
>>>>>>>>> detected. Will be handled accordingly to configured handler [hnd=class
>>>>>>>>> o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext
>>>>>>>>> [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException:
>>>>>>>>> Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]
>>>>>>>>> java.lang.IllegalStateException: Thread
>>>>>>>>> tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.
>>>>>>>>> at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:5686)
>>>>>>>>> at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
>>>>>>>>> 2019-07-22 09:22:47,281 19417675
>>>>>>>>> [tcp-disco-srvr-#3%springDataNode%] ERROR - JVM will be halted
>>>>>>>>> immediately
>>>>>>>>> due to the failure: [failureCtx=FailureContext
>>>>>>>>> [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException:
>>>>>>>>> Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Akash
>>>>>>>>>
>>>>>>>>
