On Mon, Aug 13, 2012 at 11:27 AM, Bob Haxo <[email protected]> wrote:
>
> On Fri, 2012-08-10 at 12:21 +1000, Andrew Beekhof wrote:
>> On Thu, Aug 9, 2012 at 12:14 PM, Bob Haxo <bhaxo at sgi.com> wrote:
>> > Greetings.
>> >
>> > I have followed the setup instructions of Clusters From Scratch :
>> > Creating Active/Passive and Active/Active Clusters on Fedora, Edition 5,
>> > including locating the new cman pages that do not seem to be linked into
>> > the main document, for example,
>> >
>> > http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch08s02s02.html
>>
>> The 1.1 document was updated for corosync 2.x
>> I kept the cman/plugin version around but moved it to:
>>
>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/index.html
>>
>> Look for "Version: 1.1-plugin" on the main docs page.
>
> Andrew, much thanks for the response ... and much thanks here ... I had
> not connected the dots regarding use of cman being an *earlier* version
> of the docs (and software stack).
>
>> >
>> > The stack that I'm implementing includes RHEL6.3, drbd, dlm, gfs2,
>> > Pacemaker (RHEL6.3 build), cman, kvm ... hopefully I didn't leave
>> > anybody off the party list.
>> >
>> > I have these all working together to support "live" migration of the
>> > virt client between the two phys hosts, so at that level, all is good.
>> >
>> > Questions: Is there a document that fully covers such an
>> > installation, meaning one that extends the Cluster From Scratch (and
>> > replaces the Apache example) to implementation of a HA virtual client?
>> > For instance, should libvirtd be handled as a Pacemaker resource, or
>> > should it be started as a system service at boot? What should be done
>> > with "libvirt-guests"?
>>
>> These things I do not know sorry.
>>
>> > Should cman be started as a system service at boot?
>>
>> I prefer not to, but it's just a personal preference.
>> I run potentially broken versions of the cluster and have been hit
>> hard before with processes running amok and putting machines into
>> reboot cycles.
>
> Ah, right. I too in my testing start cman and pacemaker manually. I
> was thinking more of when moving from testing to production. I think
> you have answered that.
>
>> >
>> > Problem: When the non-VM-host is rebooted, then when Pacemaker
>> > restarts the gfs2 filesystem gets restarted on the VM host, which causes
>> > the stop and start of the VirtualDomain. The gfs2 filesystem also gets
>> > restarted without of the VirtualDomain resource included.
>>
>> This sounds like the "starting a clone on A causes a restart of the
>> clone on B" bug.
>> I think we've squashed that one now but not in a released version...
>> how confident are you at creating rpms?
>
> :-) Well "how confident" depends upon the precise meaning of "creating
> rpms" .. if this is building a rpm given a working spec file, then that
> I can do. If it is a matter of making mods to an almost working spec
> file, that I can do. If it involves creating the spec file from scratch
> for a large project, that would be a challenge.

Yeah, that would be asking a bit much :)

Depending on how "clean" the machine you're working on is, and if it's
running the same software versions as the machine that the results
will be installed on, you /should/ be able to check out the latest git
and run 'make rpm'.

Otherwise you might need to set up mock and run something like
'make mock-epel-6-x86_64' from the top of the latest pacemaker git tree.

>
> FYI, I'm trying to get Pacemaker accepted for use in a product rather
> than rgmanager.
>
> Thanks, Andrew.
> Bob Haxo
> bhaxo at sgi.com
>
>> > This behavior does not seem correct ... I think I would have flagged it
>> > in my memory if I'd encountered the behavior when working with the SLES
>> > HAE product. I've been doing a lot of fumbling this past week trying to
>> > get the colocation and order statements correct, without affecting this
>> > behavior.
>> >
>> > What am I missing?
>> >
>> > Here are the first indications of this restart issue during the restart
>> > of Pacemaker and friends with the boot. I have attached more messages.
>> >
>> > Aug 8 20:00:57 hikari crmd[2734]: info: abort_transition_graph:
>> > te_update_diff:176 - Triggered transition abort (complete=1, tag=nvpair,
>> > id=status-hikari2-master-drbd_r0.1, name=master-drbd_r0:1, value=5,
>> > magic=NA, cib=0.474.170) : Transient attribute: update
>> > Aug 8 20:00:57 hikari crmd[2734]: notice: do_state_transition: State
>> > transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
>> > cause=C_FSA_INTERNAL origin=abort_transition_graph ]
>> > Aug 8 20:00:57 hikari pengine[2733]: notice: unpack_config: On loss of
>> > CCM Quorum: Ignore
>> > Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Promote
>> > drbd_r0:1#011(Slave -> Master hikari2)
>> > Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Restart
>> > virt#011(Started hikari) <<<<<<<<<<<<<<<<<<
>> > Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Restart
>> > shared-gfs2:0#011(Started hikari) <<<<<<<<
>> > Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Start
>> > shared-gfs2:1#011(hikari2)
>> > Aug 8 20:00:57 hikari crmd[2734]: info: abort_transition_graph:
>> > te_update_diff:176 - Triggered transition abort (complete=1, tag=nvpair,
>> > id=status-hikari2-master-drbd_r1.1, name=master-drbd_r1:1, value=5,
>> > magic=NA, cib=0.474.171) : Transient attribute: update
>> >
>> > Here are the current constraints resulting from fumbling (actually,
>> > trying to make sense of all of the information obtained from Google
>> > searches):
>> >
>> > colocation co-gfs-on-drbd inf: c_shared-gfs2 drbd_r0_clone:Master
>> > order o-drbd_r0-then-gfs inf: drbd_r0_clone:promote c_shared-gfs2:start
>> > order o-drbd_r1_clone-then-virt inf: drbd_r1_clone virt
>> > order o-gfs-then-virt inf: c_shared-gfs2 virt
>> >
>> > Full config file attached.
>> >
>> > For reference, here is "service blah status" for the set of services:
>> >
>> > [root@hikari2 ~]# ha-status
>> > ------- service corosync status -------
>> > corosync (pid 1996) is running...
>> > ------- service cman status -------
>> > cluster is running.
>> > ------- service drbd status -------
>> > drbd driver loaded OK; device status:
>> > version: 8.4.1 (api:1/proto:86-100)
>> > GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by
>> > phil@Build64R6, 2012-04-17 11:28:08
>> > m:res  cs         ro               ds                 p  mounted  fstype
>> > 1:r0   Connected  Primary/Primary  UpToDate/UpToDate  C  /shared  gfs2
>> > 2:r1   Connected  Primary/Primary  UpToDate/UpToDate  C
>> > 3:r2   Connected  Primary/Primary  UpToDate/UpToDate  C
>> > ------- service pacemaker status -------
>> > pacemakerd (pid 8912) is running...
>> > ------- service gfs2 status -------
>> > Configured GFS2 mountpoints:
>> > /shared
>> > Active GFS2 mountpoints:
>> > /shared
>> > ------- service libvirtd status -------
>> > libvirtd (pid 2510) is running...
>> >
>> > [root@hikari ~]# crm_mon -1ro
>> > ============
>> > Last updated: Wed Aug 8 21:01:47 2012
>> > Last change: Wed Aug 8 20:48:49 2012 via cibadmin on hikari
>> > Stack: cman
>> > Current DC: hikari - partition with quorum
>> > Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
>> > 2 Nodes configured, 2 expected votes
>> > 11 Resources configured.
>> > ============
>> >
>> > Online: [ hikari hikari2 ]
>> >
>> > Full list of resources:
>> >
>> >  Master/Slave Set: drbd_r0_clone [drbd_r0]
>> >      Masters: [ hikari hikari2 ]
>> >  Master/Slave Set: drbd_r1_clone [drbd_r1]
>> >      Masters: [ hikari hikari2 ]
>> >  Master/Slave Set: drbd_r2_clone [drbd_r2]
>> >      Masters: [ hikari hikari2 ]
>> >  ipmi-fencing-1 (stonith:fence_ipmilan): Started hikari
>> >  ipmi-fencing-2 (stonith:fence_ipmilan): Started hikari2
>> >  virt (ocf::heartbeat:VirtualDomain): Started hikari
>> >  Clone Set: c_shared-gfs2 [shared-gfs2]
>> >      Started: [ hikari hikari2 ]
>> >
>> > Operations:
>> > * Node hikari2:
>> >    drbd_r1:1: migration-threshold=1000000
>> >     + (17) monitor: interval=60000ms rc=0 (ok)
>> >     + (26) promote: rc=0 (ok)
>> >    drbd_r0:1: migration-threshold=1000000
>> >     + (21) promote: rc=0 (ok)
>> >    drbd_r2:1: migration-threshold=1000000
>> >     + (19) monitor: interval=60000ms rc=0 (ok)
>> >     + (27) promote: rc=0 (ok)
>> >    ipmi-fencing-2: migration-threshold=1000000
>> >     + (12) start: rc=0 (ok)
>> >     + (13) monitor: interval=240000ms rc=0 (ok)
>> >    shared-gfs2:1: migration-threshold=1000000
>> >     + (25) start: rc=0 (ok)
>> > * Node hikari:
>> >    drbd_r1:0: migration-threshold=1000000
>> >     + (24) promote: rc=0 (ok)
>> >    drbd_r2:0: migration-threshold=1000000
>> >     + (25) promote: rc=0 (ok)
>> >    shared-gfs2:0: migration-threshold=1000000
>> >     + (92) start: rc=0 (ok)
>> >    drbd_r0:0: migration-threshold=1000000
>> >     + (23) promote: rc=0 (ok)
>> >    ipmi-fencing-1: migration-threshold=1000000
>> >     + (12) start: rc=0 (ok)
>> >     + (13) monitor: interval=240000ms rc=0 (ok)
>> >    virt: migration-threshold=1000000
>> >     + (120) start: rc=0 (ok)
>> >     + (121) monitor: interval=10000ms rc=0 (ok)
>> >
>> > Thanks for reading ...
>> > Bob Haxo
>> > bhaxo @ sgi.com
>> >
>> > _______________________________________________
>> > Pacemaker mailing list: [email protected]
>> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: http://bugs.clusterlabs.org
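
Not from the thread: a minimal sketch for picking out the policy engine's
Restart decisions (the lines Bob flagged with "<<<<" in the quoted syslog
excerpt) from a larger log. It assumes each entry sits on a single line in
the shape shown in that excerpt, with "#011" being syslog's escaped tab; the
wrapped copy quoted in this mail would need its lines rejoined first. The
names LOG_ACTION, find_actions and SAMPLE are made up for illustration.

```python
import re

# Assumed line shape, copied from the quoted excerpt (joined onto one line):
#   "... pengine[2733]: notice: LogActions: Restart virt#011(Started hikari)"
# "#011" is the escaped tab between the resource name and its state.
LOG_ACTION = re.compile(
    r"pengine\[\d+\]:\s+notice:\s+LogActions:\s+"
    r"(?P<action>\w+)\s+(?P<resource>\S+?)(?:#011|\s+)\((?P<detail>[^)]*)\)"
)


def find_actions(lines, action="Restart"):
    """Return (resource, detail) pairs for LogActions entries with this verb."""
    hits = []
    for line in lines:
        match = LOG_ACTION.search(line)
        if match and match.group("action") == action:
            hits.append((match.group("resource"), match.group("detail")))
    return hits


# The four LogActions entries from the excerpt, unwrapped.
SAMPLE = [
    "Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Promote drbd_r0:1#011(Slave -> Master hikari2)",
    "Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Restart virt#011(Started hikari)",
    "Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Restart shared-gfs2:0#011(Started hikari)",
    "Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Start shared-gfs2:1#011(hikari2)",
]

if __name__ == "__main__":
    for resource, detail in find_actions(SAMPLE):
        print(f"Restart planned: {resource} ({detail})")
```

Pointing find_actions() at the lines of the real syslog file instead of
SAMPLE should list every resource the policy engine planned to restart, which
makes it easier to compare behaviour before and after installing a rebuilt
package.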
