Mystery solved: never put

    [ "${2}" = release ] && crm resource stop VMA_${1}

inside /etc/libvirt/hooks/qemu. Very wrong decision.
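For anyone who finds this thread later: libvirt invokes /etc/libvirt/hooks/qemu with the guest name as $1 and the operation as $2, and the "release" operation also fires on the *source* node once a live migration has finished. So the line above told Pacemaker to stop the resource it had just migrated successfully; since "crm resource stop" works by writing target-role="Stopped" into the CIB, that is exactly the mysterious attribute from the original post. It also explains why renaming the resource "fixed" things: VMA_${1} simply stopped matching any resource. A minimal sketch of a defused hook, assuming libvirt's standard hook arguments and our VMA_<domain> naming convention:

    #!/bin/sh
    # /etc/libvirt/hooks/qemu -- minimal sketch, not our full hook.
    # libvirt calls it as: qemu <guest_name> <operation> <sub-operation> <extra>
    guest="$1"
    op="$2"

    case "$op" in
        release)
            # "release" also fires on the migration source right after a
            # successful live migration, so calling "crm resource stop" here
            # writes target-role="Stopped" into the CIB and keeps the freshly
            # migrated VM down. Log instead of touching the cluster:
            logger -t libvirt-hook "qemu release for ${guest}; leaving the CRM alone"
            ;;
    esac

    exit 0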
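And if someone hits the same symptom, the stray attribute is easy to spot and clear (resource name taken from the config quoted below):

    # Query the meta attribute directly:
    crm_resource --resource VMA_VM1 --meta --get-parameter target-role

    # ...or grep the raw CIB for it:
    cibadmin --query | grep target-role

    # Remove it so the cluster is allowed to run the resource again:
    crm_resource --resource VMA_VM1 --meta --delete-parameter target-role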
On Monday 07 December 2015 16:49:01 emmanuel segura wrote:
> Next time, show your full config, unless it has something special that
> you can't show.
>
> 2015-12-07 9:08 GMT+01:00 Klechomir <kle...@gmail.com>:
> > Hi,
> > Sorry, I didn't get your point.
> >
> > The xml of the VM is on an active-active DRBD device with an ocfs2 fs
> > on it and is visible from both nodes.
> > The live migration is always successful.
> >
> > On 4.12.2015 19:30, emmanuel segura wrote:
> >> I think the xml of your vm needs to be available on both nodes, but
> >> you're using a failover resource, Filesystem_CDrive1; pacemaker
> >> monitors resources on both nodes to check whether they are running
> >> on multiple nodes.
> >>
> >> 2015-12-04 18:06 GMT+01:00 Ken Gaillot <kgail...@redhat.com>:
> >>> On 12/04/2015 10:22 AM, Klechomir wrote:
> >>>> Hi list,
> >>>> My issue is the following:
> >>>>
> >>>> I have a very stable cluster, using Corosync 2.1.0.26 and Pacemaker
> >>>> 1.1.8 (observed the same problem with Corosync 2.3.5 & Pacemaker
> >>>> 1.1.13-rc3).
> >>>>
> >>>> I bumped into this issue when I started playing with VirtualDomain
> >>>> resources, but it seems to be unrelated to the RA.
> >>>>
> >>>> The problem is that, without apparent reason, a resource gets
> >>>> target-role="Stopped". This happens after a (successful) migration,
> >>>> after a failover, or after a VM restart.
> >>>>
> >>>> My tests showed that changing the resource name fixes this problem,
> >>>> but that seems to be a temporary workaround.
> >>>>
> >>>> The resource configuration is:
> >>>>
> >>>> primitive VMA_VM1 ocf:heartbeat:VirtualDomain \
> >>>>   params config="/NFSvolumes/CDrive1/VM1/VM1.xml" \
> >>>>     hypervisor="qemu:///system" migration_transport="tcp" \
> >>>>   meta allow-migrate="true" target-role="Started" \
> >>>>   op start interval="0" timeout="120s" \
> >>>>   op stop interval="0" timeout="120s" \
> >>>>   op monitor interval="10" timeout="30" depth="0" \
> >>>>   utilization cpu="1" hv_memory="925"
> >>>>
> >>>> order VM_VM1_after_Filesystem_CDrive1 inf: Filesystem_CDrive1 VMA_VM1
> >>>>
> >>>> Here is the log from one such stop, after a successful migration
> >>>> with "crm resource migrate VMA_VM1":
> >>>>
> >>>> Dec 04 15:18:22 [3818929] CLUSTER-1 crmd: debug: cancel_op: Cancelling op 5564 for VMA_VM1 (VMA_VM1:5564)
> >>>> Dec 04 15:18:22 [4434] CLUSTER-1 lrmd: info: cancel_recurring_action: Cancelling operation VMA_VM1_monitor_10000
> >>>> Dec 04 15:18:23 [3818929] CLUSTER-1 crmd: debug: cancel_op: Op 5564 for VMA_VM1 (VMA_VM1:5564): cancelled
> >>>> Dec 04 15:18:23 [3818929] CLUSTER-1 crmd: debug: do_lrm_rsc_op: Performing key=351:199:0:fb6e486a-023a-4b44-83cf-4c0c208a0f56 op=VMA_VM1_migrate_to_0
> >>>> VirtualDomain(VMA_VM1)[1797698]: 2015/12/04_15:18:23 DEBUG: Virtual domain VM1 is currently running.
> >>>> VirtualDomain(VMA_VM1)[1797698]: 2015/12/04_15:18:23 INFO: VM1: Starting live migration to CLUSTER-2 (using virsh --connect=qemu:///system --quiet migrate --live VM1 qemu+tcp://CLUSTER-2/system ).
> >>>> Dec 04 15:18:24 [3818929] CLUSTER-1 crmd: info: process_lrm_event: LRM operation VMA_VM1_monitor_10000 (call=5564, status=1, cib-update=0, confirmed=false) Cancelled
> >>>> Dec 04 15:18:24 [3818929] CLUSTER-1 crmd: debug: update_history_cache: Updating history for 'VMA_VM1' with monitor op
> >>>> VirtualDomain(VMA_VM1)[1797698]: 2015/12/04_15:18:26 INFO: VM1: live migration to CLUSTER-2 succeeded.
> >>>> Dec 04 15:18:26 [4434] CLUSTER-1 lrmd: debug: operation_finished: VMA_VM1_migrate_to_0:1797698 - exited with rc=0
> >>>> Dec 04 15:18:26 [4434] CLUSTER-1 lrmd: notice: operation_finished: VMA_VM1_migrate_to_0:1797698 [ 2015/12/04_15:18:23 INFO: VM1: Starting live migration to CLUSTER-2 (using virsh --connect=qemu:///system --quiet migrate --live VM1 qemu+tcp://CLUSTER-2/system ). ]
> >>>> Dec 04 15:18:26 [4434] CLUSTER-1 lrmd: notice: operation_finished: VMA_VM1_migrate_to_0:1797698 [ 2015/12/04_15:18:26 INFO: VM1: live migration to CLUSTER-2 succeeded. ]
> >>>> Dec 04 15:18:27 [3818929] CLUSTER-1 crmd: debug: create_operation_update: do_update_resource: Updating resouce VMA_VM1 after complete migrate_to op (interval=0)
> >>>> Dec 04 15:18:27 [3818929] CLUSTER-1 crmd: notice: process_lrm_event: LRM operation VMA_VM1_migrate_to_0 (call=5697, rc=0, cib-update=89, confirmed=true) ok
> >>>> Dec 04 15:18:27 [3818929] CLUSTER-1 crmd: debug: update_history_cache: Updating history for 'VMA_VM1' with migrate_to op
> >>>> Dec 04 15:18:31 [3818929] CLUSTER-1 crmd: debug: cancel_op: Operation VMA_VM1:5564 already cancelled
> >>>> Dec 04 15:18:31 [3818929] CLUSTER-1 crmd: debug: do_lrm_rsc_op: Performing key=225:200:0:fb6e486a-023a-4b44-83cf-4c0c208a0f56 op=VMA_VM1_stop_0
> >>>> VirtualDomain(VMA_VM1)[1798719]: 2015/12/04_15:18:31 DEBUG: Virtual domain VM1 is not running: failed to get domain 'vm1' domain not found: no domain with matching name 'vm1'
> >>>
> >>> This looks like the problem. Configuration error?
> >>>
> >>>> VirtualDomain(VMA_VM1)[1798719]: 2015/12/04_15:18:31 INFO: Domain VM1 already stopped.
> >>>> Dec 04 15:18:31 [4434] CLUSTER-1 lrmd: debug: operation_finished: VMA_VM1_stop_0:1798719 - exited with rc=0
> >>>> Dec 04 15:18:31 [4434] CLUSTER-1 lrmd: notice: operation_finished: VMA_VM1_stop_0:1798719 [ 2015/12/04_15:18:31 INFO: Domain VM1 already stopped. ]
> >>>> Dec 04 15:18:32 [3818929] CLUSTER-1 crmd: debug: create_operation_update: do_update_resource: Updating resouce VMA_VM1 after complete stop op (interval=0)
> >>>> Dec 04 15:18:32 [3818929] CLUSTER-1 crmd: notice: process_lrm_event: LRM operation VMA_VM1_stop_0 (call=5701, rc=0, cib-update=90, confirmed=true) ok
> >>>> Dec 04 15:18:32 [3818929] CLUSTER-1 crmd: debug: update_history_cache: Updating history for 'VMA_VM1' with stop op
> >>>> Dec 04 15:20:58 [3818929] CLUSTER-1 crmd: debug: create_operation_update: build_active_RAs: Updating resouce VMA_VM1 after complete stop op (interval=0)
> >>>> Dec 04 15:20:58 [3818929] CLUSTER-1 crmd: debug: create_operation_update: build_active_RAs: Updating resouce VMA_VM1 after complete monitor op (interval=0)
> >>>> Dec 04 15:23:31 [1833996] CLUSTER-1 crm_resource: debug: process_orphan_resource: Detected orphan resource VMA_VM1 on CLUSTER-2
> >>>>
> >>>> Any suggestions are welcome.
> >>>>
> >>>> Best regards,
> >>>> Klecho

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org