Hi Christine - Thanks for looking into the logs. I also see that the node eventually comes out of GATHER state here:

Jun 07 16:56:10 corosync [TOTEM ] entering GATHER state from 0.
Jun 07 16:56:10 corosync [TOTEM ] Creating commit token because I am the rep.

Does this mean that it timed out or gave up, and then came out of that state?

Second point: I did see some unexpected entries when I ran tcpdump on the node coro.4 (it is also pasted in one of the earlier threads). You can see that it was receiving messages like:

10:23:17.117347 IP 172.22.0.13.50468 > 172.22.0.4.netsupport: UDP, length 332
10:23:17.140960 IP 172.22.0.8.50438 > 172.22.0.4.netsupport: UDP, length 82
10:23:17.141319 IP 172.22.0.6.38535 > 172.22.0.4.netsupport: UDP, length 156

Please note that 172.22.0.8 and 172.22.0.6 are not part of my group, and I was wondering why these messages are arriving.

Thanks!

On Fri, Jun 8, 2018 at 2:34 PM, Christine Caulfield <[email protected]> wrote:
> On 07/06/18 18:32, Prasad Nagaraj wrote:
> > Hi Christine - Got it:)
> >
> > I have collected few seconds of debug logs from all nodes after startup.
> > Please find them attached.
> > Please let me know if this will help us to identify rootcause.
> >
>
> The problem is on the node coro.4 - it never gets out of the JOIN
>
> "Jun 07 16:55:37 corosync [TOTEM ] entering GATHER state from 11."
>
> process so something is wrong on that node, either a rogue routing table
> entry, dangling iptables rule or even a broken NIC.
>
> Chrissie
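
For the tcpdump question above, a small script can list which senders in a capture fall outside the cluster. This is only a rough sketch: the function name and the membership set are hypothetical (172.22.0.13 is assumed to be a legitimate peer of 172.22.0.4; substitute the addresses from your own corosync.conf), and the regular expression only covers the one-line UDP format shown in the capture.

```python
import re
from collections import Counter

# Hypothetical membership list (an assumption, not taken from any config):
# replace with the node addresses from your own corosync.conf.
EXPECTED_MEMBERS = {"172.22.0.4", "172.22.0.13"}

# Matches tcpdump lines of the form seen in the capture, e.g.
# 10:23:17.117347 IP 172.22.0.13.50468 > 172.22.0.4.netsupport: UDP, length 332
LINE_RE = re.compile(
    r"^\S+ IP (?P<src>\d+\.\d+\.\d+\.\d+)\.\S+ > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.\S+: UDP, length (?P<len>\d+)"
)


def unexpected_senders(lines, members=EXPECTED_MEMBERS):
    """Count UDP packets per source address that is not in `members`."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("src") not in members:
            counts[match.group("src")] += 1
    return counts


# The three lines quoted in the message.
SAMPLE = [
    "10:23:17.117347 IP 172.22.0.13.50468 > 172.22.0.4.netsupport: UDP, length 332",
    "10:23:17.140960 IP 172.22.0.8.50438 > 172.22.0.4.netsupport: UDP, length 82",
    "10:23:17.141319 IP 172.22.0.6.38535 > 172.22.0.4.netsupport: UDP, length 156",
]

if __name__ == "__main__":
    for src, n in unexpected_senders(SAMPLE).items():
        print(f"{src}: {n} packet(s) from a non-member address")
```

Run against a longer capture saved to a file, this would show whether the non-member traffic is occasional or continuous, which may help narrow down where it originates.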
