Hi Andrew,
I've installed pacemaker and updated my ha.cf on the quorum node, removing
ccm and replacing it with the pacemaker respawn line. You're correct -
migration happens quickly and does not need to wait for these timeouts. I
have since reverted crmd-integration-timeout to its default 3m value.

Thanks,

Andrew

----- Original Message -----
From: "Andrew Beekhof" <and...@beekhof.net>
To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
Sent: Monday, February 27, 2012 5:25:46 AM
Subject: Re: [Pacemaker] Configuring 3rd Node as Quorum Node in 2 Node Cluster

It looks like we're waiting for the other node to respond, which it won't
do. Is running pacemaker on the other node but with standby=true an option
for you?

On Sat, Feb 25, 2012 at 6:25 AM, Andrew Martin <amar...@xes-inc.com> wrote:
> Hi Andreas,
>
> Thanks, adding "respawn hacluster /usr/lib/heartbeat/ccm" to ha.cf worked.
> Since quorum-node is in standby, it shows up as "OFFLINE (standby)" in
> crm_mon. It seems that "cl_status nodestatus quorum-node" always returns
> "active", even if heartbeat is stopped on the quorum node. However,
> "cl_status hblinkstatus quorum-node br0" can correctly detect whether
> heartbeat is down on quorum-node, so I can use that to check its
> connectivity.
>
> I was able to successfully test resources automatically stopping once
> quorum was lost. I did this by shutting down node2 so that only node1 and
> quorum-node remained. I then stopped heartbeat on quorum-node, which
> resulted in node1 losing quorum and the resources stopping (as expected).
> After starting heartbeat on quorum-node again, node1 reestablished quorum
> within about 1 minute. However, it took significantly longer (around 18
> minutes) for the resources on node1 to start again. Looking through the
> logs, I discovered that this is because of the values of the
> cluster-recheck-interval (displayed as "PEngine Recheck Timer" in the
> logs) and crmd-integration-timeout (displayed as "Integration Timer" in
> the logs) properties. Here's the sequence of events as I understand it:
> 1. quorum is reestablished
> 2. the cluster-recheck-interval timer pops, sees that quorum has been
>    reestablished, and schedules crmd-integration-timeout to run
> 3. after crmd-integration-timeout's timeout period, it pops, also sees
>    that quorum has been reestablished, and thus starts the resources
>
> Based on this, the maximum wait time for resources to start once quorum
> has been reestablished is the value of cluster-recheck-interval plus the
> value of crmd-integration-timeout, or (15m + 3m). I have confirmed this
> value through several runs of this test. This seems like a very long time
> to me, so I adjusted both of these values down to 1m.
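>
> A minimal sketch of lowering both properties with the crm shell (assuming
> crmsh is in use; the property names are the ones from the logs and the 1m
> values are the ones used in this test):
>
>   crm configure property cluster-recheck-interval="1min"
>   crm configure property crmd-integration-timeout="1min"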
> Running the test again, I was able to confirm that the resources started
> 2m after quorum was reestablished:
>
> ## quorum reestablished
> 12:35:31 node1 ccm: [27015]: debug: quorum plugin: majority
> 12:35:31 node1 ccm: [27015]: debug: cluster:linux-ha, member_count=2, member_quorum_votes=200
> 12:35:31 node1 ccm: [27015]: debug: total_node_count=3, total_quorum_votes=300
> 12:35:31 node1 crmd: [27020]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=14)
> 12:35:31 node1 crmd: [27020]: info: crm_update_quorum: Updating quorum status to true (call=366)
> ## cluster-recheck-interval pops, schedules crmd-integration-timeout to run after its timeout
> 12:36:18 node1 crmd: [27020]: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (60000ms)
> 12:36:18 node1 crmd: [27020]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
> 12:36:18 node1 crmd: [27020]: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
> ## crmd-integration-timeout runs, starts the resources
> 12:37:18 node1 crmd: [27020]: ERROR: crm_timer_popped: Integration Timer (I_INTEGRATED) just popped in state S_INTEGRATION! (60000ms)
> 12:37:18 node1 crmd: [27020]: info: crm_timer_popped: Welcomed: 1, Integrated: 1
> 12:37:18 node1 crmd: [27020]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_TIMER_POPPED origin=crm_timer_popped ]
> 12:37:18 node1 crmd: [27020]: WARN: do_state_transition: Progressed to state S_FINALIZE_JOIN after C_TIMER_POPPED
> 12:37:20 node1 crmd: [27020]: info: do_dc_join_final: Ensuring DC, quorum and node attributes are up-to-date
> 12:37:20 node1 crmd: [27020]: info: crm_update_quorum: Updating quorum status to true (call=379)
> 12:37:20 node1 pengine: [29916]: notice: LogActions: Leave p_drbd_r0:1#011(Stopped)
> 12:37:20 node1 pengine: [29916]: notice: LogActions: Leave p_drbd_r1:1#011(Stopped)
> 12:37:20 node1 pengine: [29916]: notice: LogActions: Leave p_drbd_r2:1#011(Stopped)
> 12:37:20 node1 pengine: [29916]: notice: LogActions: Leave p_libvirt-bin:1#011(Stopped)
> 12:37:20 node1 pengine: [29916]: notice: LogActions: Leave p_libvirt-bin:2#011(Stopped)
> 12:37:21 node1 crmd: [27020]: notice: run_graph: Transition 77 (Complete=24, Pending=0, Fired=0, Skipped=3, Incomplete=0, Source=/var/lib/pengine/pe-input-526.bz2): Stopped
> 12:37:21 node1 pengine: [29916]: notice: LogActions: Leave p_drbd_r0:1#011(Stopped)
> 12:37:21 node1 pengine: [29916]: notice: LogActions: Leave p_drbd_r1:1#011(Stopped)
> 12:37:21 node1 pengine: [29916]: notice: LogActions: Leave p_drbd_r2:1#011(Stopped)
> 12:37:21 node1 pengine: [29916]: notice: LogActions: Leave p_libvirt-bin:1#011(Stopped)
> 12:37:21 node1 pengine: [29916]: notice: LogActions: Leave p_libvirt-bin:2#011(Stopped)
> 12:37:23 node1 lrmd: [27017]: info: RA output: (p_vm:start:stdout) Domain MyVM started
> 12:38:23 node1 crmd: [27020]: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (60000ms)
> 12:38:23 node1 crmd: [27020]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
> 12:38:23 node1 crmd: [27020]: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
> 12:38:23 node1 pengine: [29916]: notice: LogActions: Leave p_drbd_r0:1#011(Stopped)
> 12:38:23 node1 pengine: [29916]: notice: LogActions: Leave p_drbd_r1:1#011(Stopped)
>
> I've attached the full logs from this time period in addition to the
> excerpt above.
>
> Is there a better way to trigger the starting of resources once quorum
> has been reestablished? Or is modifying these two properties a good way
> of doing it?
>
> Thanks,
>
> Andrew
>
> ________________________________
> From: "Andreas Kurz" <andr...@hastexo.com>
> To: pacemaker@oss.clusterlabs.org
> Sent: Friday, February 24, 2012 7:26:59 AM
> Subject: Re: [Pacemaker] Configuring 3rd Node as Quorum Node in 2 Node Cluster
>
> Hello,
>
> On 02/23/2012 03:59 PM, Andrew Martin wrote:
>> I set up the 3rd node ("quorum") yesterday by only installing heartbeat,
>> not pacemaker. Is pacemaker necessary as well? I commented out the
>> following lines in its ha.cf since it is always going to be running in
>> standby:
>>
>> autojoin none
>> mcast eth0 239.0.0.43 694 1 0
>> bcast eth0
>> warntime 5
>> deadtime 15
>> initdead 60
>> keepalive 2
>> node node1
>> node node2
>> node quorum
>> #crm respawn
>> #respawn hacluster /usr/lib/heartbeat/dopd
>> #apiauth dopd gid=haclient uid=hacluster
>
> Hmm ... IIRC I had to enable ccm in ha.cf on the third node during my
> last heartbeat tests to enable a quorum node:
>
> respawn hacluster ccm
>
>> Since this quorum node only has a single ethernet interface, can it be
>> used for both the mcast and bcast parameters? How are both the multicast
>> and broadcast pathways used for node communication? After saving these
>> parameters and reloading heartbeat on all nodes, the "quorum" node is
>> listed as offline in the cluster. Is there something missing in my
>> configuration that is preventing it from communicating with the rest of
>> the cluster?
>
> cl_status and hbclient should give you some membership information, and
> there should be some log entries on the nodes running Pacemaker. I don't
> think the node will show up as ONLINE in crm_mon if no Pacemaker is
> running there.
>
> You only need the communication settings you also used for node1/2 on
> the shared network ... so only the mcast directive is needed/possible on
> node3.
>
>> Also, another more general question about the failover - node1 and node2
>> are each connected to the shared network over br0 and connected directly
>> to each other with a crossover cable over br1:
>>
>>    ------------------      ----------
>>    | Shared Network |------| quorum |
>>    ------------------      ----------
>>            |
>>      br0  / \  br0
>>          /   \
>>   ---------       ---------
>>   | node1 |-------| node2 |
>>   ---------  br1  ---------
>>
>> The corresponding configuration in ha.cf is
>>
>> autojoin none
>> mcast br0 239.0.0.43 694 1 0
>> bcast br1
>> warntime 5
>> ....
>>
>> If br0 to one of the nodes were to be cut while the "quorum" node was
>> down, would they still be able to communicate over br1 (e.g. to maintain
>> quorum between themselves and fail over to the other node that still has
>> an active br0)?
>
> As long as node1/2 can communicate, the cluster has quorum. To fail over
> resources to the node with the best connectivity, configure the ping
> resource agent and constraints; there is a chapter on this in "Pacemaker
> Explained" on clusterlabs.org ... http://goo.gl/x7dwK
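>
> For illustration, a sketch of such a ping/connectivity setup in crm shell
> syntax (the resource and constraint names, the ping target and the
> dampen/multiplier values are placeholders; p_vm from the logs above is
> only used as an example resource):
>
>   primitive p_ping ocf:pacemaker:ping \
>     params host_list="192.168.1.1" multiplier="1000" dampen="5s" \
>     op monitor interval="10s"
>   clone cl_ping p_ping
>   # keep p_vm off nodes where the ping attribute is missing or zero
>   location loc_vm_on_connected_node p_vm \
>     rule -inf: not_defined pingd or pingd lte 0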
> Regards,
> Andreas
>
> --
> Need help with Pacemaker?
> http://www.hastexo.com/now
>
>> Thanks,
>>
>> Andrew
>>
>> ------------------------------------------------------------------------
>> From: "Andreas Kurz" <andr...@hastexo.com>
>> To: pacemaker@oss.clusterlabs.org
>> Sent: Monday, January 23, 2012 1:53:27 PM
>> Subject: Re: [Pacemaker] Configuring 3rd Node as Quorum Node in 2 Node Cluster
>>
>> On 01/23/2012 03:36 PM, Andrew Martin wrote:
>>> I think I will configure the 3rd (quorum) node in standby mode. In the
>>> near future I am looking into setting up 2 additional clusters (each of
>>> these is also a 2-node cluster) and would like to use this same server
>>> as the quorum node for those clusters as well. Is this possible? If so,
>>> how do I have to configure heartbeat (or multiple instances of
>>> heartbeat) to join multiple clusters at once and act as the quorum node
>>> in each?
>>
>> No, multiple heartbeat instances per node are not supported ... but why
>> not create minimal VM instances ... though not too minimal, as you have
>> a good chance that these standby instances end up in the DC role.
>>
>> Regards,
>> Andreas
>>
>> --
>> Need help with Pacemaker?
>> http://www.hastexo.com/now
>>
>>> Thanks,
>>>
>>> Andrew
>>>
>>> ------------------------------------------------------------------------
>>> From: "Andreas Kurz" <andr...@hastexo.com>
>>> To: pacemaker@oss.clusterlabs.org
>>> Sent: Friday, January 13, 2012 6:35:48 AM
>>> Subject: Re: [Pacemaker] Configuring 3rd Node as Quorum Node in 2 Node Cluster
>>>
>>> On 01/13/2012 12:32 PM, Ivan Savčić | Epix wrote:
>>>> On 1/11/2012 8:28 AM, Florian Haas wrote:
>>>>> Another option would be to permanently run the 3rd node in standby
>>>>> mode.
>>>>
>>>> Just wondering, wouldn't the standby mode prevent that node from
>>>> performing the fencing actions? Also, can it act as DC then?
>>>
>>> It would run no resources (including stonith resources) but can be the
>>> DC.
>>>
>>> Another option for running a "pure" quorum node would be to only start
>>> CCM but not pacemaker ... though that setup looks quite strange, e.g.
>>> in crm_mon output ....
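>>>
>>> For illustration, keeping the third node permanently in standby can be
>>> done from the crm shell (a sketch, assuming crmsh and the node name
>>> "quorum" from the ha.cf above):
>>>
>>>   crm node standby quorum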
>>> Regards,
>>> Andreas
>>>
>>> --
>>> Need help with Pacemaker?
>>> http://www.hastexo.com/now
>>>
>>>> Thanks,
>>>> Ivan
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org