Hello!

I don't see any lengthy GC pauses, yet one node was segmented. It is
unclear what exactly caused this.

Can you try increasing failureDetectionTimeout to 2 minutes (120000 ms) and
retrying? Please attach the logs if it fails again.
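
For reference, here is a minimal sketch of setting this from Java. It assumes
the server nodes are configured programmatically rather than through Spring
XML, and that ignite-core is on the classpath. The class name is only a
placeholder.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

// Hypothetical startup class; adapt it to however your nodes are launched.
public class ServerNodeStartup {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // The setter takes milliseconds: 2 minutes = 120000 ms.
        cfg.setFailureDetectionTimeout(120_000L);

        // Start the node and print the effective value as a sanity check.
        Ignite ignite = Ignition.start(cfg);
        System.out.println("failureDetectionTimeout = "
            + ignite.configuration().getFailureDetectionTimeout() + " ms");
    }
}
```

The same value should be set on every server node. Client nodes have a
separate clientFailureDetectionTimeout setting, so the client started from
the laptop may need its own adjustment.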

Regards,
-- 
Ilya Kasnacheev


On Tue, Jan 8, 2019 at 17:33, Akash Shinde <[email protected]> wrote:

> Hi Evgenii,
>
> I am starting 7 Ignite nodes on 7 VMs. But to narrow down the problem I
> started only two server nodes on two VMs, core03 and core04. Initially
> these VMs were on different VHS, so we moved both VMs onto the same VHS (to
> avoid network issues) and checked the network bandwidth using iperf. Now
> the network bandwidth is 6.7 Gbps. Then I started one client node from a
> laptop just to check the cluster status.
>
> But even after doing this I am facing the same problem. The nodes are
> segmenting during data loading.
>
> I have attached the logs for the two server nodes. They also contain GC logs.
>
>
> Thanks,
> Akash
>
> On Tue, Jan 8, 2019 at 6:00 AM Evgenii Zhuravlev <[email protected]>
> wrote:
>
>> Hi,
>>
>> Can you share logs from all nodes, especially from node
>> qagmscore02/10.114.113.53:47500?
>>
>> Evgenii
>>
>> On Mon, Jan 7, 2019 at 08:14, Akash Shinde <[email protected]> wrote:
>>
>>> Hi,
>>> Could someone please help me with this issue?
>>>
>>> Thanks,
>>> Akash
>>>
>>> On Thu, Jan 3, 2019 at 5:46 PM Akash Shinde <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am getting a "Timed out waiting for message delivery receipt" WARN
>>>> message in my logs.
>>>> But I am sure that it is not happening because of a long GC pause. I have
>>>> checked the memory utilization and it is very low.
>>>>
>>>> I also checked the connectivity between the two nodes between which
>>>> the timeout is happening.
>>>> The bandwidth is shown below.
>>>>
>>>> [ ID] Interval       Transfer     Bandwidth
>>>> [  4]  0.0-10.1 sec   855 MBytes   708 Mbits/sec
>>>>
>>>> I often get the following message in my logs. Is it because the two nodes
>>>> are not able to communicate within the given time limit?
>>>>
>>>> *ERROR:*
>>>>  Blocked system-critical thread has been detected. This can lead to
>>>> cluster-wide undefined behaviour [threadName=tcp-disco-msg-worker,
>>>> blockedFor=14s]
>>>>
>>>> I have also attached a log snippet. Can someone please help to narrow
>>>> down the issue?
>>>>
>>>> Thanks,
>>>> Akash
>>>>
>>>
