FeldHost™ Admin <ad...@feldhost.cz> writes:

> rule of thumb is use separate dedicated network for corosync traffic.
> For ex. we use two corosync rings, first and active one on separate
> network card and switch, second passive one on team (bond) device vlan.

Hi,

That's fine in principle, but this is a bladecenter setup: we can't really use separate network cards, since it's a single chassis at the end of the day. Besides, we haven't encountered Corosync glitches. The Corosync virtual network is shared with the DLM traffic only and has 200 Mb/s bandwidth dedicated to it in the interface (BIOS) setup.

Failure story for amusement: the blades expose a BMC watchdog device to the OS, which was picked up by Corosync. It seemed like a useful second line of defense in case fencing (BMC IPMI power) failed for any reason; I let it live and forgot about it. Months later, after a firmware upgrade, the BMC had to be restarted, and the watchdog device ioctl blocked Corosync for a minute or so. Of course membership fell apart. Across the full cluster, in fact, because the BMC restarts were performed back-to-back (I authorized a single restart only, but anyway). I leave the rest to your imagination. Fencing (STONITH) worked (with delays) until quorum dissolved entirely... after a couple of minutes, it was over.

We spent the rest of the day picking up the pieces, then the next few days trying to reproduce the perceived Corosync network outage during BMC reboots without the cluster stack running. All in vain, of course. Half a year later, an independent investigation of sporadic small Corosync delays revealed the watchdog connection, and we then disabled the feature. Don't use (poorly implemented) BMC watchdogs.
-- 
Feri
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org