Hello! I don't see any lengthy GC pauses, yet one node was segmented. It is unclear what exactly would cause this.
Can you try increasing failureDetectionTimeout to 2 minutes (120000 ms) and retrying? Please attach the logs if it fails again.

Regards,
-- 
Ilya Kasnacheev


Tue, Jan 8, 2019 at 17:33, Akash Shinde <[email protected]>:

> Hi Evgenii,
>
> I am starting 7 Ignite nodes on 7 VMs. But to narrow down the problem I
> started only two server nodes on two VMs, core03 and core04. Initially
> these VMs were on different VHS, so we moved them to the same VHS (to
> avoid network issues) and checked the network bandwidth using iperf. Now
> the network bandwidth is 6.7 Gbps. Then I started one client node from a
> laptop just to check the cluster status.
>
> But even after doing this I am facing the same problem. The nodes are
> segmenting during the data loading.
>
> I have attached the logs for the two server nodes. They also contain the
> GC logs.
>
> Thanks,
> Akash
>
> On Tue, Jan 8, 2019 at 6:00 AM Evgenii Zhuravlev <[email protected]>
> wrote:
>
>> Hi,
>>
>> Can you share logs from all nodes, especially from node
>> qagmscore02/10.114.113.53:47500?
>>
>> Evgenii
>>
>> Mon, Jan 7, 2019 at 08:14, Akash Shinde <[email protected]>:
>>
>>> Hi,
>>> Could someone please help me with this issue?
>>>
>>> Thanks,
>>> Akash
>>>
>>> On Thu, Jan 3, 2019 at 5:46 PM Akash Shinde <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am getting a "Timed out waiting for message delivery receipt" WARN
>>>> message in my logs.
>>>> But I am sure that it is not happening because of a long GC pause. I
>>>> have checked the memory utilization and it is very low.
>>>>
>>>> I also tried to check the connectivity between the two nodes between
>>>> which the timeout is happening.
>>>> The bandwidth is as shown below.
>>>>
>>>> [ ID] Interval       Transfer     Bandwidth
>>>> [  4]  0.0-10.1 sec   855 MBytes   708 Mbits/sec
>>>>
>>>> Many times I get the following message in my logs. Is it because the
>>>> two nodes are not able to communicate within the given time limit?
>>>>
>>>> *ERROR:*
>>>> Blocked system-critical thread has been detected. This can lead to
>>>> cluster-wide undefined behaviour [threadName=tcp-disco-msg-worker,
>>>> blockedFor=14s]
>>>>
>>>> I have also attached a log snippet. Can someone please help to narrow
>>>> down the issue?
>>>>
>>>> Thanks,
>>>> Akash
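
P.S. For reference, the failureDetectionTimeout change suggested at the top of this reply could look roughly like this in Java. It is only a minimal sketch, assuming the server nodes are configured programmatically through IgniteConfiguration rather than through Spring XML. The class name ServerNodeStartup is hypothetical, and discovery settings and everything else from the real setup are omitted.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

// Hypothetical class name, for illustration only.
public class ServerNodeStartup {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // The timeout is given in milliseconds: 120000 ms = 2 minutes.
        cfg.setFailureDetectionTimeout(120_000L);

        // Start the node with the raised timeout and print the effective value.
        Ignite ignite = Ignition.start(cfg);
        System.out.println("failureDetectionTimeout = "
            + ignite.configuration().getFailureDetectionTimeout() + " ms");
    }
}
```

The same value would presumably need to be set on every server node. If the nodes are started from a Spring XML file instead, the equivalent would be a failureDetectionTimeout property on the IgniteConfiguration bean.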
