Hi,

I have now set the failure detection timeout to 120000 ms, and I am still
getting this error message intermittently on Ignite 2.6.
It could be a network issue, but I have not been able to confirm that the
network is the cause.

1) What are all the possible reasons for the following error? Could you
please list them? It might help to narrow down the issue.
 [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException:
Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]

2) Will upgrading to the latest Ignite version, 2.7.5 or 2.7.6, solve this
problem?

3) How do you monitor the network? Can you please suggest a tool?

4) I understand that a node gets segmented because of a long GC pause or a
network connectivity problem. Is my understanding correct?

5) What is the purpose of the networkTimeout configuration? In my case it is
set to 10000.
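
Regarding question 3, here is a minimal sketch of the kind of connectivity
probe that could serve as basic network monitoring between the nodes. It is
only an illustration under assumptions: the class name DiscoveryPortProbe, the
5000 ms connect timeout, and the fallback address 127.0.0.1 are made up for
this example, and 47500 is Ignite's default discovery port, which may differ
in a customized setup. It uses only the JDK and does not touch any Ignite API.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.time.Instant;

// Hypothetical helper, not part of Ignite: checks TCP reachability of each node.
final class DiscoveryPortProbe {

    /**
     * Tries to open a TCP connection to host:port.
     * Returns the connect time in milliseconds, or -1 if the connection
     * was refused, failed, or did not complete within timeoutMs.
     */
    static long probe(String host, int port, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // Pass the server node addresses as arguments; 127.0.0.1 is only a placeholder.
        String[] hosts = args.length > 0 ? args : new String[] {"127.0.0.1"};
        int port = 47500; // Ignite's default discovery port; adjust if configured differently.

        // One pass over all hosts, one timestamped line per host.
        for (String host : hosts) {
            long ms = probe(host, port, 5000);
            String result = ms < 0 ? "UNREACHABLE" : ms + " ms";
            System.out.println(Instant.now() + " " + host + ":" + port + " " + result);
        }
    }
}
```

Running this periodically on each node (for example from cron) and keeping the
output would make it possible to compare the timestamps of UNREACHABLE lines
or slow connects with the time of the next SYSTEM_WORKER_TERMINATION error.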

Regards,
Akash

On Mon, Jul 29, 2019 at 2:28 PM Evgenii Zhuravlev <[email protected]>
wrote:

> > Does a network issue make the JVM halt?
> There is a failureDetectionTimeout, which helps the other nodes in the
> cluster detect that a node is unreachable and exclude it from the
> topology. So I believe it could be something like a temporary network
> problem. I would recommend adding some network monitoring to be prepared
> for the next failure.
>
> Best Regards,
> Evgenii
>
> On Fri, Jul 26, 2019 at 4:01 PM Akash Shinde <[email protected]> wrote:
>
>> This issue is not consistent but occurs sometimes. Does a network issue
>> make the JVM halt? As per my understanding, a node will disconnect from the
>> cluster if a network issue happens.
>> But in this case multiple JVMs were terminated. Can it be a bug in Ignite
>> 2.6?
>>
>> Thanks,
>> Akash
>>
>> On Fri, Jul 26, 2019 at 4:00 PM Evgenii Zhuravlev <
>> [email protected]> wrote:
>>
>>> I don't see any specific errors in the logs. To me, it looks like
>>> network problems; moreover, the client nodes print messages about
>>> connection problems. Is this issue reproducible?
>>> Evgenii
>>>
>>> On Fri, Jul 26, 2019 at 9:21 AM Akash Shinde <[email protected]> wrote:
>>>
>>>> Can someone please help me with this issue?
>>>>
>>>> On Wed, Jul 24, 2019 at 12:04 PM Akash Shinde <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>> Please find attached logs from all server and client nodes. I have also
>>>>> attached GC logs for each node.
>>>>>
>>>>> Thanks,
>>>>> Akash
>>>>>
>>>>>
>>>>> On Tue, Jul 23, 2019 at 8:59 PM Evgenii Zhuravlev <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Can you please share the full logs, starting from node start, from all
>>>>>> nodes in the cluster?
>>>>>>
>>>>>> Thanks,
>>>>>> Evgenii
>>>>>>
>>>>>> On Tue, Jul 23, 2019 at 4:51 PM Akash Shinde <[email protected]> wrote:
>>>>>>
>>>>>>> I am using Ignite 2.6. I have created a cluster of seven server
>>>>>>> nodes and three client nodes. Five of the seven server nodes stopped
>>>>>>> unexpectedly with the error log lines below.
>>>>>>> I have attached the logs of two such server nodes.
>>>>>>>
>>>>>>> FailureDetectionTimeout is set to 30000 ms in the Ignite configuration.
>>>>>>> The network timeout is left at its default.
>>>>>>> ClientFailureDetectionTimeout is set to 30000 ms.
>>>>>>>
>>>>>>> I checked the GC logs, but it does not seem to be a GC pause issue. I
>>>>>>> have attached the GC logs too.
>>>>>>>
>>>>>>> 1) Can someone please help me identify the reason for this issue?
>>>>>>> 2) Are there any specific reasons that cause this issue, or is it a
>>>>>>> bug in Ignite 2.6?
>>>>>>>
>>>>>>>
>>>>>>> *ERROR LOG LINES*
>>>>>>> 2019-07-22 09:22:47,281 19417675 [tcp-disco-srvr-#3%springDataNode%]
>>>>>>> ERROR  - Critical system error detected. Will be handled accordingly to
>>>>>>> configured handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler,
>>>>>>> failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION,
>>>>>>> err=java.lang.IllegalStateException: Thread
>>>>>>> tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]
>>>>>>> java.lang.IllegalStateException: Thread
>>>>>>> tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.
>>>>>>> at
>>>>>>> org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:5686)
>>>>>>> at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
>>>>>>> 2019-07-22 09:22:47,281 19417675 [tcp-disco-srvr-#3%springDataNode%]
>>>>>>> ERROR  - JVM will be halted immediately due to the failure:
>>>>>>> [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION,
>>>>>>> err=java.lang.IllegalStateException: Thread
>>>>>>> tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Akash
>>>>>>>
>>>>>>
