Hello everyone,

we have been running a 3-node Pacemaker (1.1.14) / Corosync (2.3.5) cluster successfully for a couple of months, and we have now started seeing a faulty ring with an unexpected 127.0.0.1 binding that we cannot reset via "corosync-cfgtool -r".

We have had this once before, and only restarting Corosync (and everything else) on the node showing the unexpected 127.0.0.1 binding made the problem go away. However, in production we would obviously like to avoid this if possible.

So, from the following description: how can I troubleshoot this issue, and does anybody have a good idea what might be happening here?

We run two passive RRP rings across different IP subnets via udpu and we get the following output (all IPs obfuscated). Please notice the unexpected interface binding 127.0.0.1 for host pg2.

If we reset via "corosync-cfgtool -r" on each node, heartbeat ring ID 1 briefly shows "no faults" but goes back to "FAULTY" seconds later.

Regards,
Martin Schlegel

_____________________________________

root@pg1:~# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
    id      = A.B.C1.5
    status  = ring 0 active with no faults
RING ID 1
    id      = D.E.F1.170
    status  = Marking ringid 1 interface D.E.F1.170 FAULTY

root@pg2:~# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
    id      = A.B.C2.88
    status  = ring 0 active with no faults
RING ID 1
    id      = 127.0.0.1
    status  = Marking ringid 1 interface 127.0.0.1 FAULTY

root@pg3:~# corosync-cfgtool -s
Printing ring status.
Local node ID 3
RING ID 0
    id      = A.B.C3.236
    status  = ring 0 active with no faults
RING ID 1
    id      = D.E.F3.112
    status  = Marking ringid 1 interface D.E.F3.112 FAULTY

_____________________________________

/etc/corosync/corosync.conf from pg1 - other nodes use different subnets and IPs, but are otherwise identical:

===========================================
quorum {
    provider: corosync_votequorum
    expected_votes: 3
}

totem {
    version: 2

    crypto_cipher: none
    crypto_hash: none

    rrp_mode: passive

    interface {
        ringnumber: 0
        bindnetaddr: A.B.C1.0
        mcastport: 5405
        ttl: 1
    }

    interface {
        ringnumber: 1
        bindnetaddr: D.E.F1.64
        mcastport: 5405
        ttl: 1
    }

    transport: udpu
}

nodelist {
    node {
        ring0_addr: pg1
        ring1_addr: pg1p
        nodeid: 1
    }
    node {
        ring0_addr: pg2
        ring1_addr: pg2p
        nodeid: 2
    }
    node {
        ring0_addr: pg3
        ring1_addr: pg3p
        nodeid: 3
    }
}

logging {
    to_syslog: yes
}
===========================================
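
One thing that could be checked on each node, as a guess rather than a confirmed cause, is whether any of the ring1_addr names from the nodelist above resolves to a loopback address through the system resolver (for example via an /etc/hosts entry). The following is a minimal Python sketch of such a check. The function names resolve and find_loopback are made up for illustration and are not part of Corosync or Pacemaker, and the sketch only looks at name resolution, not at how Corosync itself picks the interface to bind.

```python
import ipaddress
import socket


def resolve(name):
    """Return the sorted, de-duplicated IP addresses the system resolver gives for name.

    An unresolvable name yields an empty list instead of raising.
    """
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    # info[4][0] is the address string; strip an IPv6 scope suffix such as "%eth0".
    return sorted({info[4][0].split("%")[0] for info in infos})


def find_loopback(names, resolver=resolve):
    """Map each name to the loopback addresses it resolves to.

    Names with no loopback address are left out of the result.
    The resolver argument is injectable so the logic can be tried without DNS.
    """
    bad = {}
    for name in names:
        loop = [a for a in resolver(name) if ipaddress.ip_address(a).is_loopback]
        if loop:
            bad[name] = loop
    return bad
```

Run on pg2, a call such as find_loopback(["pg1p", "pg2p", "pg3p"]) would return an entry for pg2p only if that name maps to a loopback address there. An empty result would suggest that name resolution is not the reason for the 127.0.0.1 binding.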