On Wed, Jun 15, 2011 at 05:32:10PM -0500, mark - pacemaker list wrote:
> On Wed, Jun 15, 2011 at 4:20 PM, Dejan Muhamedagic <deja...@fastmail.fm> wrote:
> > On Wed, Jun 15, 2011 at 03:26:56PM -0500, mark - pacemaker list wrote:
> > > On Wed, Jun 15, 2011 at 12:24 PM, imnotpc <imno...@rock3d.net> wrote:
> > > >
> > > > What I was thinking is that the DC is never fenced
> > >
> > > Is this actually the case?
> >
> > In a way it is true. Only DC can order fencing and there is
> > always exactly one DC in a partition. On split brain, each
> > partition elects a DC and if the DC has quorum it can try to
> > fence nodes in other partitions. That's why in two-node clusters
> > there's always a shoot-out. But note that the old DC (before
> > split brain), if it loses quorum, gets fenced by a new DC from
> > another partition.
> >
> > > It would sure explain the one "gotcha" I've never been able to
> > > work around in a three node cluster with stonith/SBD. If you
> > > unplug the network cable from the DC (but it and the other nodes
> > > all still see the SBD disk via their other NIC(s)), the DC of
> > > course becomes completely isolated. It will fence
> >
> > Fence? It won't fence anything unless it has quorum. Do you have
> > no-quorum-policy=ignore?
>
> I have no-quorum-policy=freeze.
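Aside, for anyone reading the archives: both settings under discussion are
ordinary cluster properties. A minimal crm shell sketch, with illustrative
values rather than anything taken from the cluster shown below:

    # Illustrative only: require fencing, and freeze (rather than stop)
    # resources when a partition loses quorum.
    crm configure property stonith-enabled=true
    crm configure property no-quorum-policy=freeze

    # Check what is currently set.
    crm configure show | grep -E 'no-quorum-policy|stonith-enabled'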
OK. It seems like freeze freezes just the resources, but fencing
requests are still generated. That really shouldn't be happening.
Can you please file a bugzilla?

Cheers,

Dejan

> With this status:
>
> ============
> Last updated: Wed Jun 15 16:48:57 2011
> Stack: Heartbeat
> Current DC: cn1.testlab.local (814b426f-ab10-445c-9158-a1765d82395e) - partition with quorum
> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
> 3 Nodes configured, unknown expected votes
> 5 Resources configured.
> ============
>
> Online: [ cn2.testlab.local cn3.testlab.local cn1.testlab.local ]
>
>  Resource Group: MySQL-history
>      iscsi_mysql_history (ocf::heartbeat:iscsi): Started cn1.testlab.local
>      volgrp_mysql_history (ocf::heartbeat:LVM): Started cn1.testlab.local
>      fs_mysql_history (ocf::heartbeat:Filesystem): Started cn1.testlab.local
>      ip_mysql_history (ocf::heartbeat:IPaddr2): Started cn1.testlab.local
>      mysql_history (ocf::heartbeat:mysql): Started cn1.testlab.local
>      mail_alert_history (ocf::heartbeat:MailTo): Started cn1.testlab.local
>  Resource Group: MySQL-hsa
>      iscsi_mysql_hsa (ocf::heartbeat:iscsi): Started cn2.testlab.local
>      volgrp_mysql_hsa (ocf::heartbeat:LVM): Started cn2.testlab.local
>      fs_mysql_hsa (ocf::heartbeat:Filesystem): Started cn2.testlab.local
>      ip_mysql_hsa (ocf::heartbeat:IPaddr2): Started cn2.testlab.local
>      mysql_hsa (ocf::heartbeat:mysql): Started cn2.testlab.local
>      mail_alert_hsa (ocf::heartbeat:MailTo): Started cn2.testlab.local
>  Resource Group: MySQL-livedata
>      iscsi_mysql_livedata (ocf::heartbeat:iscsi): Started cn3.testlab.local
>      volgrp_mysql_livedata (ocf::heartbeat:LVM): Started cn3.testlab.local
>      fs_mysql_livedata (ocf::heartbeat:Filesystem): Started cn3.testlab.local
>      ip_mysql_livedata (ocf::heartbeat:IPaddr2): Started cn3.testlab.local
>      mysql_livedata (ocf::heartbeat:mysql): Started cn3.testlab.local
>      mail_alert_livedata (ocf::heartbeat:MailTo): Started cn3.testlab.local
>  stonith_sbd (stonith:external/sbd): Started cn2.testlab.local
>  Resource Group: Cluster_Status
>      cluster_status_ip (ocf::heartbeat:IPaddr2): Started cn3.testlab.local
>      cluster_status_page (ocf::heartbeat:apache): Started cn3.testlab.local
>
> I isolated cn1 (the DC, but stonith_sbd was running on cn2). In this case,
> one of the two good nodes became DC and cn1 was fenced, so things worked as
> I'd expect. The outage for cn1's resources is quite short.
>
> However, with *this* status, where everything is the same as above except
> the stonith_sbd resource is also located on cn1, so it is both DC and the
> node running stonith_sbd:
>
> ============
> Last updated: Wed Jun 15 16:58:49 2011
> Stack: Heartbeat
> Current DC: cn1.testlab.local (814b426f-ab10-445c-9158-a1765d82395e) - partition with quorum
> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
> 3 Nodes configured, unknown expected votes
> 5 Resources configured.
> ============
>
> Online: [ cn2.testlab.local cn3.testlab.local cn1.testlab.local ]
>
>  Resource Group: MySQL-history
>      iscsi_mysql_history (ocf::heartbeat:iscsi): Started cn1.testlab.local
>      volgrp_mysql_history (ocf::heartbeat:LVM): Started cn1.testlab.local
>      fs_mysql_history (ocf::heartbeat:Filesystem): Started cn1.testlab.local
>      ip_mysql_history (ocf::heartbeat:IPaddr2): Started cn1.testlab.local
>      mysql_history (ocf::heartbeat:mysql): Started cn1.testlab.local
>      mail_alert_history (ocf::heartbeat:MailTo): Started cn1.testlab.local
>  Resource Group: MySQL-hsa
>      iscsi_mysql_hsa (ocf::heartbeat:iscsi): Started cn2.testlab.local
>      volgrp_mysql_hsa (ocf::heartbeat:LVM): Started cn2.testlab.local
>      fs_mysql_hsa (ocf::heartbeat:Filesystem): Started cn2.testlab.local
>      ip_mysql_hsa (ocf::heartbeat:IPaddr2): Started cn2.testlab.local
>      mysql_hsa (ocf::heartbeat:mysql): Started cn2.testlab.local
>      mail_alert_hsa (ocf::heartbeat:MailTo): Started cn2.testlab.local
>  Resource Group: MySQL-livedata
>      iscsi_mysql_livedata (ocf::heartbeat:iscsi): Started cn3.testlab.local
>      volgrp_mysql_livedata (ocf::heartbeat:LVM): Started cn3.testlab.local
>      fs_mysql_livedata (ocf::heartbeat:Filesystem): Started cn3.testlab.local
>      ip_mysql_livedata (ocf::heartbeat:IPaddr2): Started cn3.testlab.local
>      mysql_livedata (ocf::heartbeat:mysql): Started cn3.testlab.local
>      mail_alert_livedata (ocf::heartbeat:MailTo): Started cn3.testlab.local
>  stonith_sbd (stonith:external/sbd): Started cn1.testlab.local
>  Resource Group: Cluster_Status
>      cluster_status_ip (ocf::heartbeat:IPaddr2): Started cn2.testlab.local
>      cluster_status_page (ocf::heartbeat:apache): Started cn2.testlab.local
>
> ... when I isolated cn1, it almost immediately fenced cn3. Approx 30
> seconds later cn2 promotes itself to DC as it's the only surviving node
> with network connectivity, but of course cn3 is just trying to come back
> up after a reboot so it isn't participating yet. I have two nodes that
> think they're DC, neither with quorum. That's where I decided to change
> no-quorum-policy to freeze, because at this time all services would shut
> down completely. With freeze, at least the services on the surviving good
> node stay up.
>
> Once cn3 finishes booting pacemaker starts, then cn2 and cn3 form a quorum
> and cn1 finally gets fenced, and all resources are able to start on
> machines with network connectivity. The outage in this case has of course
> been quite a bit longer than the previous one.
>
> Regards,
> Mark

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
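A short sketch of the checks involved in reproducing the scenario above; the
SBD device path is a placeholder, so substitute whatever -d argument your sbd
setup actually uses:

    # Which node is DC, and where the stonith_sbd resource is running,
    # before the cable is pulled:
    crm_mon -1 | grep -E 'Current DC|stonith_sbd'

    # What SBD sees on the shared disk (one message slot per node):
    sbd -d /dev/disk/by-id/YOUR-SBD-DISK list

    # Follow fencing activity during the test (adjust the log path to
    # wherever syslog writes on your distribution):
    tail -f /var/log/messages | grep -iE 'stonith|sbd'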