Hi,
Can you please share the new logs? They will help us understand the
possible cause of the issue.
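
On the network monitoring question quoted below, one simple option is to
probe the TCP discovery port between the hosts on a schedule and log the
connect latency, so that a gap can be matched against the time of the next
failure. A minimal sketch in Java, under stated assumptions: the class name
DiscoveryPortProbe and its arguments are hypothetical, and 47500 is only the
default discovery port, so pass the port from your own configuration if it
differs. It only checks that a TCP connection can be opened, nothing more.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.time.Instant;

// Hypothetical helper, not part of Ignite: logs TCP connect latency to one host:port.
public final class DiscoveryPortProbe {

    /** Returns the connect latency in ms, or -1 if the connect failed or timed out. */
    public static long probe(String host, int port, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Arguments: host, port, number of probes, pause between probes in ms.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 47500; // assumed default port
        int count = args.length > 2 ? Integer.parseInt(args[2]) : 1;
        long intervalMs = args.length > 3 ? Long.parseLong(args[3]) : 5000L;

        for (int i = 0; i < count; i++) {
            long ms = probe(host, port, 2000);
            String result = ms < 0 ? "UNREACHABLE" : ms + " ms";
            System.out.printf("%s %s:%d %s%n", Instant.now(), host, port, result);
            if (i + 1 < count) {
                Thread.sleep(intervalMs);
            }
        }
    }
}
```

Running it from each host against the others, with a large probe count and
the output redirected to a file, gives timestamps that can be compared with
the node logs and the GC logs.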

Thanks,
Evgenii

On Wed, Aug 28, 2019 at 17:56, Akash Shinde <[email protected]> wrote:

> Hi,
>
> I have now set the failure detection timeout to 120000 ms, and I am
> still getting this error message intermittently on Ignite 2.6.
> It could be a network issue, but I have not been able to confirm that.
>
> 1) What are all the possible causes of the following error? A list of
> them might help to narrow down the issue.
>  [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException:
> Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]
>
> 2) Will upgrading to the latest Ignite version (2.7.5 or 2.7.6) solve
> this problem?
>
> 3) How do you monitor the network? Can you please suggest a tool?
>
> 4) I understand that a node gets segmented because of a long GC pause
> or a network connectivity problem. Is my understanding correct?
>
> 5) What is the purpose of the networkTimeout configuration property? In
> my case it is set to 10000.
>
> Regards,
> Akash
>
> On Mon, Jul 29, 2019 at 2:28 PM Evgenii Zhuravlev <
> [email protected]> wrote:
>
>> >Does a network issue make the JVM halt?
>> There is a failureDetectionTimeout, which lets the other nodes in the
>> cluster detect that a node is unreachable and exclude it from the
>> topology. So I believe it could be something like a temporary network
>> problem. I would recommend adding some network monitoring to be prepared
>> for the next failure.
>>
>> Best Regards,
>> Evgenii
>>
>> On Fri, Jul 26, 2019 at 16:01, Akash Shinde <[email protected]> wrote:
>>
>>> This issue is not consistent; it occurs only sometimes. Does a network
>>> issue make the JVM halt? As I understand it, a node disconnects from the
>>> cluster if a network issue happens.
>>> But in this case multiple JVMs were terminated. Can it be a bug in
>>> Ignite 2.6?
>>>
>>> Thanks,
>>> Akash
>>>
>>> On Fri, Jul 26, 2019 at 4:00 PM Evgenii Zhuravlev <
>>> [email protected]> wrote:
>>>
>>>> I don't see any specific errors in the logs. To me, it looks like a
>>>> network problem; moreover, the client nodes print messages about
>>>> connection problems. Is this issue reproducible?
>>>> Evgenii
>>>>
>>>> On Fri, Jul 26, 2019 at 09:21, Akash Shinde <[email protected]> wrote:
>>>>
>>>>> Can someone please help me with this issue?
>>>>>
>>>>> On Wed, Jul 24, 2019 at 12:04 PM Akash Shinde <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>> Please find attached the logs from all server and client nodes. I
>>>>>> have also attached the GC logs for each node.
>>>>>>
>>>>>> Thanks,
>>>>>> Akash
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 23, 2019 at 8:59 PM Evgenii Zhuravlev <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Can you please share the full logs, starting from node startup,
>>>>>>> from all nodes in the cluster?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Evgenii
>>>>>>>
>>>>>>> On Tue, Jul 23, 2019 at 16:51, Akash Shinde <[email protected]> wrote:
>>>>>>>
>>>>>>>> I am using Ignite 2.6. I have created a cluster of seven server
>>>>>>>> nodes and three client nodes. Five of the seven server nodes
>>>>>>>> stopped unexpectedly with the error log lines below.
>>>>>>>> I have attached the logs of two of those server nodes.
>>>>>>>>
>>>>>>>> FailureDetectionTimeout is set to 30000 ms in the Ignite
>>>>>>>> configuration.
>>>>>>>> The network timeout is left at its default.
>>>>>>>> ClientFailureDetectionTimeout is set to 30000 ms.
>>>>>>>>
>>>>>>>> I checked the GC logs, but it does not seem to be a GC pause
>>>>>>>> issue. I have attached the GC logs too.
>>>>>>>>
>>>>>>>> 1) Can someone please help me identify the reason for this
>>>>>>>> issue?
>>>>>>>> 2) Are there any specific causes of this issue, or is it a bug
>>>>>>>> in Ignite 2.6?
>>>>>>>>
>>>>>>>>
>>>>>>>> *ERROR LOG LINES*
>>>>>>>> 2019-07-22 09:22:47,281 19417675 [tcp-disco-srvr-#3%springDataNode%] ERROR  - Critical system error detected. Will be handled accordingly to configured handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]
>>>>>>>> java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.
>>>>>>>> at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:5686)
>>>>>>>> at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
>>>>>>>> 2019-07-22 09:22:47,281 19417675 [tcp-disco-srvr-#3%springDataNode%] ERROR  - JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Akash
>>>>>>>>
>>>>>>>