On 31/03/17 02:32 AM, Jan Friesse wrote:
>> The original message has the logs from nodes 1 and 3. Node 2, the one
>> that got fenced in this test, doesn't really show much. Here are the
>> logs from it:
>>
>> Mar 24 16:35:10 b014 ntpd[2318]: Deleting interface #5 enp6s0f0,
>> 192.168.100.14#123, interface stats: received=0, sent=0, dropped=0,
>> active_time=3253 secs
>> Mar 24 16:35:10 b014 ntpd[2318]: Deleting interface #7 enp6s0f0,
>> fe80::a236:9fff:fe8a:6500%6#123, interface stats: received=0, sent=0,
>> dropped=0, active_time=3253 secs
>> Mar 24 16:35:13 b014 corosync[2166]: notice [TOTEM ] A processor failed,
>> forming new configuration.
>> Mar 24 16:35:13 b014 corosync[2166]: [TOTEM ] A processor failed,
>> forming new configuration.
>> Mar 24 16:35:13 b014 corosync[2166]: notice [TOTEM ] The network
>> interface is down.
>
> This is a problem. Corosync handles ifdown really badly. If this was not
> intentional, it may be caused by NetworkManager. If so, please install the
> equivalent of the NetworkManager-config-server package (it's actually one
> file called 00-server.conf, so you can extract it from, for example, the
> Fedora package:
> https://www.rpmfind.net/linux/RPM/fedora/devel/rawhide/x86_64/n/NetworkManager-config-server-1.8.0-0.1.fc27.noarch.html)
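For anyone following along: as I understand it, the 00-server.conf shipped by
that package is just a small NetworkManager config fragment along these lines
(paraphrased from memory, not copied from the package, so check the actual
file before relying on it):

```ini
# /etc/NetworkManager/conf.d/00-server.conf (assumed content)
[main]
# Keep the interface configured even when link carrier drops,
# so the address corosync is bound to does not disappear.
ignore-carrier=*
# Don't create automatic "default" connections for new NICs.
no-auto-default=*
```

The relevant part for this thread is ignore-carrier, which stops
NetworkManager from tearing the interface down on a carrier loss.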
ifdown'ing corosync's interface happens a lot, intentionally or otherwise.
I think it is reasonable to expect corosync to handle this properly. How
hard would it be to make corosync resilient to this fault case?

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org