On Thu, Aug 9, 2012 at 12:14 PM, Bob Haxo <[email protected]> wrote:
> Greetings.
>
> I have followed the setup instructions of Clusters From Scratch :
> Creating Active/Passive and Active/Active Clusters on Fedora, Edition 5,
> including locating the new cman pages that do not seem to be linked into
> the main document, for example,
>
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch08s02s02.html
The 1.1 document was updated for corosync 2.x. I kept the cman/plugin version around but moved it to:

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/index.html

Look for "Version: 1.1-plugin" on the main docs page.

>
> The stack that I'm implementing includes RHEL6.3, drbd, dlm, gfs2,
> Pacemaker (RHEL6.3 build), cman, kvm ... hopefully I didn't leave
> anybody off the party list.
>
> I have these all working together to support "live" migration of the
> virt client between the two phys hosts, so at that level, all is good.
>
> Questions: Is there a document that fully covers such an
> installation, meaning one that extends the Cluster From Scratch (and
> replaces the Apache example) to implementation of a HA virtual client? For
> instance, should libvirtd be handled as a Pacemaker resource, or should
> it be started as a system service at boot? What should be done with
> "libvirt-guests"?

These things I do not know, sorry.

> Should cman be started as a system service at boot?

I prefer not to, but it's just a personal preference. I run potentially broken versions of the cluster and have been hit hard before by processes running amok and putting machines into reboot cycles.

>
> Problem: When the non-VM-host is rebooted, then when Pacemaker
> restarts the gfs2 filesystem gets restarted on the VM host, which causes
> the stop and start of the VirtualDomain. The gfs2 filesystem also gets
> restarted without the VirtualDomain resource included.

This sounds like the "starting a clone on A causes a restart of the clone on B" bug. I think we've squashed that one now, but not in a released version... how confident are you at creating rpms?

> This behavior does not seem correct ... I think I would have flagged it
> in my memory if I'd encountered the behavior when working with the SLES
> HAE product.
> I've been doing a lot of fumbling this past week trying to
> get the colocation and order statements correct, without affecting this
> behavior.
>
> What am I missing?
>
> Here are the first indications of this restart issue during the restart
> of Pacemaker and friends with the boot. I have attached more messages.
>
> Aug 8 20:00:57 hikari crmd[2734]: info: abort_transition_graph:
> te_update_diff:176 - Triggered transition abort (complete=1, tag=nvpair,
> id=status-hikari2-master-drbd_r0.1, name=master-drbd_r0:1, value=5, magic=NA,
> cib=0.474.170) : Transient attribute: update
> Aug 8 20:00:57 hikari crmd[2734]: notice: do_state_transition: State
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
> origin=abort_transition_graph ]
> Aug 8 20:00:57 hikari pengine[2733]: notice: unpack_config: On loss of CCM
> Quorum: Ignore
> Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Promote
> drbd_r0:1#011(Slave -> Master hikari2)
> Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Restart
> virt#011(Started hikari) <<<<<<<<<<<<<<<<<<
> Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Restart
> shared-gfs2:0#011(Started hikari) <<<<<<<<
> Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Start
> shared-gfs2:1#011(hikari2)
> Aug 8 20:00:57 hikari crmd[2734]: info: abort_transition_graph:
> te_update_diff:176 - Triggered transition abort (complete=1, tag=nvpair,
> id=status-hikari2-master-drbd_r1.1, name=master-drbd_r1:1, value=5, magic=NA,
> cib=0.474.171) : Transient attribute: update
>
> Here are the current constraints resulting from fumbling (actually,
> trying to make sense of all of the information obtained from Google
> searches):
>
> colocation co-gfs-on-drbd inf: c_shared-gfs2 drbd_r0_clone:Master
> order o-drbd_r0-then-gfs inf: drbd_r0_clone:promote c_shared-gfs2:start
> order o-drbd_r1_clone-then-virt inf: drbd_r1_clone virt
> order o-gfs-then-virt inf: c_shared-gfs2 virt
>
> Full config file attached.
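
For anyone reading along in the archive: the lines flagged with "<<<<" in the excerpt above are the policy engine's planned actions. Below is a minimal sketch, in Python, of one way to pull those planned actions out of a syslog file so that unexpected restarts stand out. It assumes the log lines have been unwrapped to one entry per line, and that "#011" is syslog's escaped tab (octal 011) between the resource name and the detail. The names LOG_ACTION, planned_actions and restarted_resources are made up for this illustration and are not part of Pacemaker.

```python
import re

# Matches pengine "LogActions" entries as they appear in the quoted excerpt.
# Assumption: "#011" is an escaped tab separating resource name and detail.
LOG_ACTION = re.compile(
    r"pengine\[\d+\]:\s+notice:\s+LogActions:\s+"
    r"(?P<action>\w+)\s+(?P<resource>[^\s#(]+)(?:#011|\s)*\((?P<detail>[^)]*)\)"
)


def planned_actions(lines):
    """Return (action, resource, detail) for every LogActions line found."""
    found = []
    for line in lines:
        match = LOG_ACTION.search(line)
        if match:
            found.append((match["action"], match["resource"], match["detail"]))
    return found


def restarted_resources(lines):
    """Return the resources the policy engine planned to restart."""
    return [res for action, res, _ in planned_actions(lines) if action == "Restart"]


# The pengine lines from the excerpt above, unwrapped to one entry per line.
SAMPLE = [
    "Aug 8 20:00:57 hikari pengine[2733]: notice: unpack_config: On loss of CCM Quorum: Ignore",
    "Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Promote drbd_r0:1#011(Slave -> Master hikari2)",
    "Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Restart virt#011(Started hikari)",
    "Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Restart shared-gfs2:0#011(Started hikari)",
    "Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Start shared-gfs2:1#011(hikari2)",
]

if __name__ == "__main__":
    for resource in restarted_resources(SAMPLE):
        print(resource)
```

Run against the excerpt, this lists virt and shared-gfs2:0, i.e. the two resources on hikari that get restarted when hikari2 rejoins. It only reads what the policy engine logged; it says nothing about why the restart was scheduled.
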
>
> For reference, here is "service blah status" for the set of services:
>
> [root@hikari2 ~]# ha-status
> ------- service corosync status -------
> corosync (pid 1996) is running...
> ------- service cman status -------
> cluster is running.
> ------- service drbd status -------
> drbd driver loaded OK; device status:
> version: 8.4.1 (api:1/proto:86-100)
> GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by
> phil@Build64R6, 2012-04-17 11:28:08
> m:res  cs         ro               ds                 p  mounted  fstype
> 1:r0   Connected  Primary/Primary  UpToDate/UpToDate  C  /shared  gfs2
> 2:r1   Connected  Primary/Primary  UpToDate/UpToDate  C
> 3:r2   Connected  Primary/Primary  UpToDate/UpToDate  C
> ------- service pacemaker status -------
> pacemakerd (pid 8912) is running...
> ------- service gfs2 status -------
> Configured GFS2 mountpoints:
> /shared
> Active GFS2 mountpoints:
> /shared
> ------- service libvirtd status -------
> libvirtd (pid 2510) is running...
>
> [root@hikari ~]# crm_mon -1ro
> ============
> Last updated: Wed Aug 8 21:01:47 2012
> Last change: Wed Aug 8 20:48:49 2012 via cibadmin on hikari
> Stack: cman
> Current DC: hikari - partition with quorum
> Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
> 2 Nodes configured, 2 expected votes
> 11 Resources configured.
> ============
>
> Online: [ hikari hikari2 ]
>
> Full list of resources:
>
>  Master/Slave Set: drbd_r0_clone [drbd_r0]
>      Masters: [ hikari hikari2 ]
>  Master/Slave Set: drbd_r1_clone [drbd_r1]
>      Masters: [ hikari hikari2 ]
>  Master/Slave Set: drbd_r2_clone [drbd_r2]
>      Masters: [ hikari hikari2 ]
>  ipmi-fencing-1 (stonith:fence_ipmilan): Started hikari
>  ipmi-fencing-2 (stonith:fence_ipmilan): Started hikari2
>  virt (ocf::heartbeat:VirtualDomain): Started hikari
>  Clone Set: c_shared-gfs2 [shared-gfs2]
>      Started: [ hikari hikari2 ]
>
> Operations:
> * Node hikari2:
>    drbd_r1:1: migration-threshold=1000000
>     + (17) monitor: interval=60000ms rc=0 (ok)
>     + (26) promote: rc=0 (ok)
>    drbd_r0:1: migration-threshold=1000000
>     + (21) promote: rc=0 (ok)
>    drbd_r2:1: migration-threshold=1000000
>     + (19) monitor: interval=60000ms rc=0 (ok)
>     + (27) promote: rc=0 (ok)
>    ipmi-fencing-2: migration-threshold=1000000
>     + (12) start: rc=0 (ok)
>     + (13) monitor: interval=240000ms rc=0 (ok)
>    shared-gfs2:1: migration-threshold=1000000
>     + (25) start: rc=0 (ok)
> * Node hikari:
>    drbd_r1:0: migration-threshold=1000000
>     + (24) promote: rc=0 (ok)
>    drbd_r2:0: migration-threshold=1000000
>     + (25) promote: rc=0 (ok)
>    shared-gfs2:0: migration-threshold=1000000
>     + (92) start: rc=0 (ok)
>    drbd_r0:0: migration-threshold=1000000
>     + (23) promote: rc=0 (ok)
>    ipmi-fencing-1: migration-threshold=1000000
>     + (12) start: rc=0 (ok)
>     + (13) monitor: interval=240000ms rc=0 (ok)
>    virt: migration-threshold=1000000
>     + (120) start: rc=0 (ok)
>     + (121) monitor: interval=10000ms rc=0 (ok)
>
> Thanks for reading ...
> Bob Haxo
> [email protected]
>
> _______________________________________________
> Pacemaker mailing list: [email protected]
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
