Hi.
You have hit this:
Mar 3 16:49:16 breadnut2 VirtualDomain[20709]: INFO: Virtual domain vg.test1 currently has no state, retrying.
Mar 3 16:49:16 breadnut2 lrmd: [20694]: WARN: p-vd_vg.test1:monitor process (PID 20709) timed out (try 1). Killing with signal SIGTERM (15).
Mar 3 16:49:16 breadnut2 lrmd: [20694]: WARN: operation monitor[5] on ocf::VirtualDomain::p-vd_vg.test1 for client 20697, its parameters: crm_feature_set=[3.0.5] config=[/etc/libvirt/qemu/vg.test1.xml] CRM_meta_timeout=[20000] migration_transport=[tcp] : pid [20709] timed out
Mar 3 16:49:16 breadnut2 crmd: [20697]: ERROR: process_lrm_event: LRM operation p-vd_vg.test1_monitor_0 (5) Timed Out (timeout=20000ms)
When a cluster node comes up, it is directed to probe each clustered
resource on that node (the p-vd_vg.test1_monitor_0 operation in the log
above is such a probe). This behaviour does not depend on constraints;
the check is mandatory.
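As a toy model (not Pacemaker's actual scheduler), the Python sketch below contrasts the two kinds of action: probes are issued for every resource unconditionally, while only starts are sequenced by order constraints. The dependency map is an assumed simplification of the order constraints in the config quoted below, with the clones and the master/slave set collapsed to their primitives and promotion ignored; `graphlib` needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set, Tuple

# Simplified view of the order constraints in the quoted config:
# each resource maps to the resources that must be started before it.
ORDER: Dict[str, Set[str]] = {
    "p-lvm_gh": set(),
    "p-libvirtd": set(),
    "p-drbd_vg.test1": {"p-lvm_gh"},
    "p-vd_vg.test1": {"p-drbd_vg.test1", "p-libvirtd"},
}

def node_join_actions(order: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Toy model of the actions scheduled when a node joins the cluster."""
    # 1. Probes: every resource is probed; no constraint is consulted.
    probes = [("probe", r) for r in sorted(order)]
    # 2. Starts: only these are sequenced by the order constraints
    #    (static_order() yields predecessors before their dependants).
    starts = [("start", r) for r in TopologicalSorter(order).static_order()]
    return probes + starts

actions = node_join_actions(ORDER)
# The VirtualDomain probe is issued before libvirtd has been started.
assert actions.index(("probe", "p-vd_vg.test1")) < actions.index(("start", "p-libvirtd"))
```

In this model no ordering constraint can delay the probe, which matches the behaviour described above.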
At that moment libvirtd is not running yet, so the VirtualDomain RA is
unable to connect to it to check whether your VM is running. The probe
therefore times out (here after 20000 ms).
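A timed-out monitor action means an "unknown error" for the resource.
Pengine cannot be sure that your resource is not running on that node,
so it assumes it is. Since the VM really is running on the other node,
the resource now looks active on two nodes, so pengine stops it
everywhere and then starts it again to recover.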
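The reasoning above can be sketched in Python. This is a deliberately simplified model, not Pacemaker code: the OCF exit codes 0 (OCF_SUCCESS) and 7 (OCF_NOT_RUNNING) are the standard ones, but representing a timed-out probe as `None`, the helper names, and treating every result other than OCF_NOT_RUNNING as "possibly active" are assumptions (real Pacemaker special-cases some other return codes).

```python
from typing import Dict, List, Optional

OCF_SUCCESS = 0      # monitor: resource is running
OCF_NOT_RUNNING = 7  # monitor: resource is cleanly stopped

def maybe_active(rc: Optional[int]) -> bool:
    """rc is the probe's exit code, or None if the probe timed out.

    Simplification: only OCF_NOT_RUNNING proves the resource is stopped;
    a timeout (or any other error) leaves it possibly active on that node.
    """
    return rc != OCF_NOT_RUNNING

def recovery_plan(probes: Dict[str, Optional[int]]) -> List[str]:
    """Hypothetical helper: actions for a primitive given per-node probes."""
    active = sorted(node for node, rc in probes.items() if maybe_active(rc))
    if len(active) > 1:
        # Apparently active on several nodes: stop it on all of them,
        # then start a single copy again.
        return [f"stop on {node}" for node in active] + ["start on one node"]
    return []

# breadnut really runs the VM; the probe on breadnut2 timed out.
print(recovery_plan({"breadnut": OCF_SUCCESS, "breadnut2": None}))
# ['stop on breadnut', 'stop on breadnut2', 'start on one node']
```

With a clean OCF_NOT_RUNNING from breadnut2 instead of a timeout, the plan would be empty and the VM would stay where it is.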
That is exactly what you are seeing. How to work around it is a
different story; frankly, I don't see a decent way.
The VirtualDomain RA really cannot tell whether the VM is running while
it cannot connect to libvirtd. I'm not too sure, but your log suggests
that libvirtd will not be started until the VirtualDomain monitor returns.
I'd suggest starting libvirtd before corosync, from the init scripts, and
seeing whether that helps.
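One way to act on that suggestion is to make sure libvirtd actually answers before corosync is started. The Python sketch below only illustrates the idea under stated assumptions: it re-runs a check command until it succeeds or a deadline passes. The `wait_until_ok` name is hypothetical, and using `virsh --connect qemu:///system list` as the check, called from a boot script ahead of corosync, is an assumption rather than part of the setup described here.

```python
import subprocess
import time
from typing import Sequence

def wait_until_ok(cmd: Sequence[str], timeout: float = 60.0,
                  interval: float = 1.0, attempt_timeout: float = 10.0) -> bool:
    """Re-run cmd until it exits with status 0 or `timeout` seconds pass.

    Returns True as soon as cmd succeeds, False once the deadline expires.
    Each attempt is itself limited to `attempt_timeout` seconds so that a
    hanging check (e.g. a client stuck connecting) cannot block forever.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                    stderr=subprocess.DEVNULL,
                                    timeout=attempt_timeout)
            if result.returncode == 0:
                return True
        except FileNotFoundError:
            pass  # the command is not installed (yet); keep polling
        except subprocess.TimeoutExpired:
            pass  # this attempt hung and was killed; try again
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, a boot script could call `wait_until_ok(["virsh", "--connect", "qemu:///system", "list"], timeout=120)` and start corosync only once it returns True. The URI and the 120-second limit are assumptions; this only orders the boot sequence and does not change how the VirtualDomain RA behaves when libvirtd is unreachable.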
Can anyone propose a cleaner solution?
--
Pavel Levshin
On 03.03.2011 9:05, AP wrote:
Hi,
I'm having deep issues with my cluster setup. Everything works OK until
I add a VirtualDomain RA. Then things go pear-shaped: it seems to ignore
the "order" crm config for it and starts as soon as it can.
The crm config is provided below. Basically, p-vd_vg.test1 attempts to
start even though p-libvirtd is not started and p-drbd_vg.test1 is not
master (or slave, for that matter; i.e. it's not configured at all).
Eventually p-libvirtd and p-drbd_vg.test1 start and p-vd_vg.test1 attempts
to start too, whereupon pengine on the node where p-vd_vg.test1 is already
running complains with:
Mar 3 16:49:16 breadnut pengine: [2097]: ERROR: native_create_actions: Resource p-vd_vg.test1 (ocf::VirtualDomain) is active on 2 nodes attempting recovery
Mar 3 16:49:16 breadnut pengine: [2097]: WARN: See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
Then mass slaughter occurs and p-vd_vg.test1 is restarted where it was
running previously whilst the other node gets an error for it.
Essentially I cannot restart the 2nd node without it breaking the 1st.
Now, as I understand it, a lone primitive will run on only one node at a
time, which is just fine by me.
colo-vd_vg.test1 indicates that p-vd_vg.test1 should run where ms-drbd_vg.test1
is master. ms-drbd_vg.test1 should only be master where clone-libvirtd is
started.
order-vg.test1 indicates that ms-drbd_vg.test1 should start after clone-lvm_gh
has started (successfully). (This used to have a promote for ms-drbd_vg.test1,
but then ms-drbd_vg.test1 would be demoted and not stopped on shutdown, which
would cause clone-lvm_gh to error out on stop.)
order-vd_vg.test1 indicates p-vd_vg.test1 should only start where
ms-drbd_vg.test1 and clone-libvirtd have both successfully started (the
order of their starting being irrelevant).
cli-standby-p-vd_vg.test1 was put there when I migrated p-vd_vg.test1
around.
This happens with or without fencing, and with fencing configured either
as below or as a single primitive with both nodes in the hostlist.
Help with this would be awesome and much appreciated. I do not know what
I am missing here. The config makes sense to me, so I don't even know where
to start poking and prodding. I be flailing.
Config and s/w version list is below:
OS: Debian Squeeze
Kernel: 2.6.37.2
PACKAGES:
ii  cluster-agents    1:1.0.4-0ubuntu1~custom1     The reusable cluster components for Linux HA
ii  cluster-glue      1.0.7-3ubuntu1~custom1       The reusable cluster components for Linux HA
ii  corosync          1.3.0-1ubuntu1~custom1       Standards-based cluster framework (daemon and modules)
ii  libccs3           3.1.0-0ubuntu1~custom1       Red Hat cluster suite - cluster configuration libraries
ii  libcib1           1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - CIB
ii  libcman3          3.1.0-0ubuntu1~custom1       Red Hat cluster suite - cluster manager libraries
ii  libcorosync4      1.3.0-1ubuntu1~custom1       Standards-based cluster framework (libraries)
ii  libcrmcluster1    1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - CRM
ii  libcrmcommon2     1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - common CRM
ii  libfence4         3.1.0-0ubuntu1~custom1       Red Hat cluster suite - fence client library
ii  liblrm2           1.0.7-3ubuntu1~custom1       Reusable cluster libraries -- liblrm2
ii  libpe-rules2      1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - rules for P-Engine
ii  libpe-status3     1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - status for P-Engine
ii  libpengine3       1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - P-Engine
ii  libpils2          1.0.7-3ubuntu1~custom1       Reusable cluster libraries -- libpils2
ii  libplumb2         1.0.7-3ubuntu1~custom1       Reusable cluster libraries -- libplumb2
ii  libplumbgpl2      1.0.7-3ubuntu1~custom1       Reusable cluster libraries -- libplumbgpl2
ii  libstonith1       1.0.7-3ubuntu1~custom1       Reusable cluster libraries -- libstonith1
ii  libstonithd1      1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - stonith
ii  libtransitioner1  1.1.5-0ubuntu1~ppa1~custom1  The Pacemaker libraries - transitioner
ii  pacemaker         1.1.5-0ubuntu1~ppa1~custom1  HA cluster resource manager
CONFIG:
node breadnut
node breadnut2 \
attributes standby="off"
primitive fencing-bn stonith:meatware \
params hostlist="breadnut" \
op start interval="0" timeout="60s" \
op stop interval="0" timeout="70s" \
op monitor interval="10" timeout="60s"
primitive fencing-bn2 stonith:meatware \
params hostlist="breadnut2" \
op start interval="0" timeout="60s" \
op stop interval="0" timeout="70s" \
op monitor interval="10" timeout="60s"
primitive p-drbd_vg.test1 ocf:linbit:drbd \
params drbd_resource="vg.test1" \
operations $id="ops-drbd_vg.test1" \
op start interval="0" timeout="240s" \
op stop interval="0" timeout="100s" \
op monitor interval="20" role="Master" timeout="20s" \
op monitor interval="30" role="Slave" timeout="20s"
primitive p-libvirtd ocf:local:libvirtd \
meta allow-migrate="off" \
op start interval="0" timeout="200s" \
op stop interval="0" timeout="100s" \
op monitor interval="10" timeout="200s"
primitive p-lvm_gh ocf:heartbeat:LVM \
params volgrpname="gh" \
meta allow-migrate="off" \
op start interval="0" timeout="90s" \
op stop interval="0" timeout="100s" \
op monitor interval="10" timeout="100s"
primitive p-vd_vg.test1 ocf:heartbeat:VirtualDomain \
params config="/etc/libvirt/qemu/vg.test1.xml" \
params migration_transport="tcp" \
meta allow-migrate="true" is-managed="true" \
op start interval="0" timeout="120s" \
op stop interval="0" timeout="120s" \
op migrate_to interval="0" timeout="120s" \
op migrate_from interval="0" timeout="120s" \
op monitor interval="10s" timeout="120s"
ms ms-drbd_vg.test1 p-drbd_vg.test1 \
meta resource-stickines="100" notify="true" master-max="2" target-role="Master"
clone clone-libvirtd p-libvirtd \
meta interleave="true"
clone clone-lvm_gh p-lvm_gh \
meta interleave="true"
location cli-standby-p-vd_vg.test1 p-vd_vg.test1 \
rule $id="cli-standby-rule-p-vd_vg.test1" -inf: #uname eq breadnut2
location loc-fencing-bn fencing-bn -inf: breadnut
location loc-fencing-bn2 fencing-bn2 -inf: breadnut2
colocation colo-vd_vg.test1 inf: p-vd_vg.test1:Started ms-drbd_vg.test1:Master clone-libvirtd:Started
order order-vd_vg.test1 inf: ( ms-drbd_vg.test1:start clone-libvirtd:start ) p-vd_vg.test1:start
order order-vg.test1 inf: clone-lvm_gh:start ms-drbd_vg.test1:start
property $id="cib-bootstrap-options" \
dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
default-resource-stickiness="1000" \
stonith-enabled="true" \
expected-quorum-votes="2" \
no-quorum-policy="ignore" \
last-lrm-refresh="1299128317"
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker