On 07/06/18 18:32, Prasad Nagaraj wrote:
> Hi Christine - Got it :)
>
> I have collected a few seconds of debug logs from all nodes after startup.
> Please find them attached.
> Please let me know if this will help us identify the root cause.
>
The problem is on node coro.4. It never gets out of the JOIN process:

  Jun 07 16:55:37 corosync [TOTEM ] entering GATHER state from 11.

So something is wrong on that node, possibly a rogue routing table entry, a dangling iptables rule, or even a broken NIC.

Chrissie

> Thanks!
>
> On Thu, Jun 7, 2018 at 8:43 PM, Christine Caulfield <ccaul...@redhat.com> wrote:
>
> > On 07/06/18 15:53, Prasad Nagaraj wrote:
> > > Hi - As you can see in the corosync.conf details, I have already set
> > > debug: on
> > >
> >
> > But only in the (disabled) AMF subsystem, not for corosync as a whole :)
> >
> >     logger_subsys {
> >         subsys: AMF
> >         debug: on
> >     }
> >
> > Chrissie
> >
> > > On Thu, 7 Jun 2018, 8:03 pm Christine Caulfield, <ccaul...@redhat.com> wrote:
> > >
> > > > On 07/06/18 15:24, Prasad Nagaraj wrote:
> > > > >
> > > > > No iptables rules or other firewalls are set up on these nodes.
> > > > >
> > > > > One observation is that each node sends messages with its own ring
> > > > > sequence number, and the numbers are not converging. I have seen
> > > > > that in a good cluster, when nodes respond with the same sequence
> > > > > number, the membership is formed automatically. In our case, that
> > > > > does not happen.
> > > > >
> > > >
> > > > That's just a side-effect of the cluster not forming; it's not causing
> > > > it. Can you enable full corosync debugging (just add debug: on to the
> > > > end of the logging {} stanza) and see if that gives any more useful
> > > > information? (I only need the corosync bits, not the pcmk ones.)
> > > >
> > > > Chrissie
> > > >
> > > > > For example, one node sends:
> > > > > Jun 07 07:55:04 corosync [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 71084: memb=1, new=0, lost=0
> > > > > .....
> > > > > Jun 07 07:55:16 corosync [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 71096: memb=1, new=0, lost=0
> > > > > Jun 07 07:55:16 corosync [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 71096: memb=1, new=0, lost=0
> > > > >
> > > > > The other node sends messages with its own numbers:
> > > > > Jun 07 07:55:12 corosync [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 71088: memb=1, new=0, lost=0
> > > > > Jun 07 07:55:12 corosync [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 71088: memb=1, new=0, lost=0
> > > > > .......
> > > > > Jun 07 07:55:24 corosync [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 71100: memb=1, new=0, lost=0
> > > > > Jun 07 07:55:24 corosync [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 71100: memb=1, new=0, lost=0
> > > > >
> > > > > Any idea why this happens, and why the sequence numbers from
> > > > > different nodes are not converging?
> > > > >
> > > > > Thanks!
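For reference, a minimal sketch of the change suggested earlier in the thread: putting debug: on at the end of the logging {} stanza itself, rather than only inside the AMF logger_subsys block. It assumes a corosync 1.x-style corosync.conf like the one discussed here; the placeholder comment stands in for whatever logging options are already configured, which are not shown in this thread.

```
logging {
        # ... existing logging options, unchanged ...

        # Per-subsystem setting: only affects the (disabled) AMF subsystem.
        logger_subsys {
                subsys: AMF
                debug: on
        }

        # Stanza-level setting: turns on debug output for corosync as a whole.
        debug: on
}
```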
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org