Re: [Pacemaker] resource agent starting out-of-order

Pavel Levshin Sun, 13 Mar 2011 13:21:48 -0700

Hi.

You have hit this:


Mar  3 16:49:16 breadnut2 VirtualDomain[20709]: INFO: Virtual domain vg.test1 currently has no state, retrying.
Mar  3 16:49:16 breadnut2 lrmd: [20694]: WARN: p-vd_vg.test1:monitor process (PID 20709) timed out (try 1).  Killing with signal SIGTERM (15).
Mar  3 16:49:16 breadnut2 lrmd: [20694]: WARN: operation monitor[5] on ocf::VirtualDomain::p-vd_vg.test1 for client 20697, its parameters: crm_feature_set=[3.0.5] config=[/etc/libvirt/qemu/vg.test1.xml] CRM_meta_timeout=[20000] migration_transport=[tcp] : pid [20709] timed out
Mar  3 16:49:16 breadnut2 crmd: [20697]: ERROR: process_lrm_event: LRM operation p-vd_vg.test1_monitor_0 (5) Timed Out (timeout=20000ms)

When a cluster node comes up, it is directed to probe each clustered resource on the node. This behaviour does not depend on constraints; the check is mandatory.

At that moment, libvirtd is not running yet. Thus, the VirtualDomain RA is unable to connect to it and check whether your VM is running, so the probe times out (after 20 seconds, per the log above).

A timed-out monitor action is treated as an "unknown error" for the resource. Pengine cannot be sure that your resource is not running, so it assumes it is, stops the resource everywhere, then starts it again to recover.
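
To make that reasoning concrete, here is a rough sketch of how a probe result is interpreted. It is a simplified illustration, not Pacemaker code: `interpret_probe` is a hypothetical helper, and "timeout" is just a label used here for a monitor that lrmd had to kill. The numeric values are the standard OCF exit codes.

```shell
#!/bin/sh
# Simplified sketch: what the cluster concludes from a probe
# (a monitor run with interval 0, e.g. p-vd_vg.test1_monitor_0).
# interpret_probe is a hypothetical name used only for illustration.
interpret_probe() {
    case "$1" in
        0) echo "running" ;;            # OCF_SUCCESS
        7) echo "not running" ;;        # OCF_NOT_RUNNING: the clean answer
        8) echo "running as master" ;;  # OCF_RUNNING_MASTER
        # A timeout or any other error leaves the state unknown, so the
        # resource is treated as possibly active and must be recovered.
        *) echo "unknown: assume active, stop then start" ;;
    esac
}

interpret_probe 7
interpret_probe timeout
```

Only the "not running" answer lets the node join without disturbing the copy of the resource that is already active elsewhere.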

This is what you get. How to work around it is a different story. Frankly, I don't see a decent way.

The VirtualDomain RA really cannot tell whether the VM is running while it cannot connect to libvirtd. I'm not too sure, but your log suggests that libvirtd will not be started until the VirtualDomain monitor returns.

I'd suggest starting libvirtd before corosync, from the initscripts, and seeing if that helps.
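
A minimal sketch of that idea, under stated assumptions: `wait_for` is a hypothetical helper, the init script names in the usage comment are guesses for a Debian system and may differ on yours, and the 30 second limit is arbitrary. The point is only that corosync should not start until libvirtd actually answers.

```shell
#!/bin/sh
# wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: retries COMMAND once per second until it succeeds
# or TIMEOUT_SECONDS have elapsed. Returns 0 on success, 1 on timeout.
wait_for() {
    timeout=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Assumed usage early in the boot sequence (script names are assumptions):
#   /etc/init.d/libvirt-bin start
#   wait_for 30 virsh -c qemu:///system list || echo "libvirtd not ready" >&2
#   /etc/init.d/corosync start
```

With libvirtd answering before the cluster stack starts, the initial probe of p-vd_vg.test1 should be able to get a definite answer instead of timing out.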


Can anyone propose a cleaner solution?


--
Pavel Levshin


03.03.2011 9:05, AP wrote:

Hi,

Having deep issues with my cluster setup. Everything works ok until
I add a VirtualDomain RA in. Then things go pear-shaped in that it seems
to ignore the "order" crm config for it and starts as soon as it can.

The crm config is provided below. Basically p-vd_vg.test1 attempts to
start despite p-libvirtd not being started and p-drbd_vg.test1 not
being master (or slave for that matter - ie it's not configured at all).

Eventually p-libvirtd and p-drbd_vg.test1 start and p-vd_vg.test1 attempts
to as well; pengine on the node where p-vd_vg.test1 is already running then
complains with:

Mar  3 16:49:16 breadnut pengine: [2097]: ERROR: native_create_actions: Resource p-vd_vg.test1 (ocf::VirtualDomain) is active on 2 nodes attempting recovery
Mar  3 16:49:16 breadnut pengine: [2097]: WARN: See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.

Then mass slaughter occurs and p-vd_vg.test1 is restarted where it was
running previously whilst the other node gets an error for it.

Essentially I cannot restart the 2nd node without it breaking the 1st.

Now, as I understand it, a lone primitive will run once on any node - this
is just fine by me.

colo-vd_vg.test1 indicates that p-vd_vg.test1 should run where ms-drbd_vg.test1
is master. ms-drbd_vg.test1 should only be master where clone-libvirtd is
started.

order-vg.test1 indicates that ms-drbd_vg.test1 should start after clone-lvm_gh
is started (successfully). (This used to have a promote for ms-drbd_vg.test1
but then ms-drbd_vg.test1 would be demoted and not stopped on shutdown which
would cause clone-lvm_gh to error out on stop)

order-vd_vg.test1 indicates p-vd_vg.test1 should only start where
ms-drbd_vg.test1 and clone-libvirtd have both successfully started (the
order of their starting being irrelevant).

cli-standby-p-vd_vg.test1 was put there by my migrating p-vd_vg.test1
about the place.

This happens with or without fencing, and with fencing configured as below
or as just a single primitive with both nodes in the hostlist.

Help with this would be awesome and appreciated. I do not know what I am
missing here. The config makes sense to me so I don't even know where
to start poking and prodding. I be flailing.

Config and s/w version list is below:

OS: Debian Squeeze
Kernel: 2.6.37.2

PACKAGES:

ii  cluster-agents                      1:1.0.4-0ubuntu1~custom1     The reusable cluster components for Linux HA
ii  cluster-glue                        1.0.7-3ubuntu1~custom1       The reusable cluster components for Linux HA
ii  corosync                            1.3.0-1ubuntu1~custom1       Standards-based cluster framework (daemon and modules)
ii  libccs3                             3.1.0-0ubuntu1~custom1       Red Hat cluster suite - cluster configuration libraries
ii  libcib1                             1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - CIB
ii  libcman3                            3.1.0-0ubuntu1~custom1       Red Hat cluster suite - cluster manager libraries
ii  libcorosync4                        1.3.0-1ubuntu1~custom1       Standards-based cluster framework (libraries)
ii  libcrmcluster1                      1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - CRM
ii  libcrmcommon2                       1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - common CRM
ii  libfence4                           3.1.0-0ubuntu1~custom1       Red Hat cluster suite - fence client library
ii  liblrm2                             1.0.7-3ubuntu1~custom1       Reusable cluster libraries -- liblrm2
ii  libpe-rules2                        1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - rules for P-Engine
ii  libpe-status3                       1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - status for P-Engine
ii  libpengine3                         1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - P-Engine
ii  libpils2                            1.0.7-3ubuntu1~custom1       Reusable cluster libraries -- libpils2
ii  libplumb2                           1.0.7-3ubuntu1~custom1       Reusable cluster libraries -- libplumb2
ii  libplumbgpl2                        1.0.7-3ubuntu1~custom1       Reusable cluster libraries -- libplumbgpl2
ii  libstonith1                         1.0.7-3ubuntu1~custom1       Reusable cluster libraries -- libstonith1
ii  libstonithd1                        1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - stonith
ii  libtransitioner1                    1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - transitioner
ii  pacemaker                           1.1.5-0ubuntu1~ppa1~custom1  HA cluster resource manager

CONFIG:

node breadnut
node breadnut2 \
         attributes standby="off"
primitive fencing-bn stonith:meatware \
         params hostlist="breadnut" \
         op start interval="0" timeout="60s" \
         op stop interval="0" timeout="70s" \
         op monitor interval="10" timeout="60s"
primitive fencing-bn2 stonith:meatware \
         params hostlist="breadnut2" \
         op start interval="0" timeout="60s" \
         op stop interval="0" timeout="70s" \
         op monitor interval="10" timeout="60s"
primitive p-drbd_vg.test1 ocf:linbit:drbd \
         params drbd_resource="vg.test1" \
         operations $id="ops-drbd_vg.test1" \
         op start interval="0" timeout="240s" \
         op stop interval="0" timeout="100s" \
         op monitor interval="20" role="Master" timeout="20s" \
         op monitor interval="30" role="Slave" timeout="20s"
primitive p-libvirtd ocf:local:libvirtd \
         meta allow-migrate="off" \
         op start interval="0" timeout="200s" \
         op stop interval="0" timeout="100s" \
         op monitor interval="10" timeout="200s"
primitive p-lvm_gh ocf:heartbeat:LVM \
         params volgrpname="gh" \
         meta allow-migrate="off" \
         op start interval="0" timeout="90s" \
         op stop interval="0" timeout="100s" \
         op monitor interval="10" timeout="100s"
primitive p-vd_vg.test1 ocf:heartbeat:VirtualDomain \
         params config="/etc/libvirt/qemu/vg.test1.xml" \
         params migration_transport="tcp" \
         meta allow-migrate="true" is-managed="true" \
         op start interval="0" timeout="120s" \
         op stop interval="0" timeout="120s" \
         op migrate_to interval="0" timeout="120s" \
         op migrate_from interval="0" timeout="120s" \
         op monitor interval="10s" timeout="120s"
ms ms-drbd_vg.test1 p-drbd_vg.test1 \
         meta resource-stickines="100" notify="true" master-max="2" target-role="Master"
clone clone-libvirtd p-libvirtd \
         meta interleave="true"
clone clone-lvm_gh p-lvm_gh \
         meta interleave="true"
location cli-standby-p-vd_vg.test1 p-vd_vg.test1 \
         rule $id="cli-standby-rule-p-vd_vg.test1" -inf: #uname eq breadnut2
location loc-fencing-bn fencing-bn -inf: breadnut
location loc-fencing-bn2 fencing-bn2 -inf: breadnut2
colocation colo-vd_vg.test1 inf: p-vd_vg.test1:Started ms-drbd_vg.test1:Master clone-libvirtd:Started
order order-vd_vg.test1 inf: ( ms-drbd_vg.test1:start clone-libvirtd:start ) p-vd_vg.test1:start
order order-vg.test1 inf: clone-lvm_gh:start ms-drbd_vg.test1:start
property $id="cib-bootstrap-options" \
         dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
         cluster-infrastructure="openais" \
         default-resource-stickiness="1000" \
         stonith-enabled="true" \
         expected-quorum-votes="2" \
         no-quorum-policy="ignore" \
         last-lrm-refresh="1299128317"



_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
