On 20 Dec 2013, at 5:30 am, Bob Haxo <bh...@sgi.com> wrote:

> Hello,
> 
> Earlier emails related to this topic:
> [pacemaker] chicken-egg-problem with libvirtd and a VM within cluster
> [pacemaker] VirtualDomain problem after reboot of one node
> 
> 
> My configuration:
> 
> RHEL6.5/CMAN/gfs2/Pacemaker/crmsh
> 
> pacemaker-libs-1.1.10-14.el6_5.1.x86_64
> pacemaker-cli-1.1.10-14.el6_5.1.x86_64
> pacemaker-1.1.10-14.el6_5.1.x86_64
> pacemaker-cluster-libs-1.1.10-14.el6_5.1.x86_64
> 
> Two node HA VM cluster using real shared drive, not drbd.
> 
> Resources (relevant to this discussion):
> primitive p_fs_images ocf:heartbeat:Filesystem \
> primitive p_libvirtd lsb:libvirtd \
> primitive virt ocf:heartbeat:VirtualDomain \
> 
> services chkconfig on: cman, clvmd, pacemaker
> services chkconfig off: corosync, gfs2, libvirtd
> 
> Observation:
> 
> Rebooting the NON-host system results in the restart of the VM merrily 
> running on the host system.

I'm still bootstrapping after the break, but I'm not following this.  Can you 
rephrase? 

> 
> Apparent cause:
> 
> Upon startup, Pacemaker apparently checks the status of configured resources. 
> However, the status request for the virt (ocf:heartbeat:VirtualDomain) 
> resource fails with:
> 
> Dec 18 12:19:30 [4147] mici-admin2       lrmd:  warning: child_timeout_callback: virt_monitor_0 process (PID 4158) timed out
> Dec 18 12:19:30 [4147] mici-admin2       lrmd:  warning: operation_finished: virt_monitor_0:4158 - timed out after 200000ms
> Dec 18 12:19:30 [4147] mici-admin2       lrmd:   notice: operation_finished: virt_monitor_0:4158:stderr [ error: Failed to reconnect to the hypervisor ]
> Dec 18 12:19:30 [4147] mici-admin2       lrmd:   notice: operation_finished: virt_monitor_0:4158:stderr [ error: no valid connection ]
> Dec 18 12:19:30 [4147] mici-admin2       lrmd:   notice: operation_finished: virt_monitor_0:4158:stderr [ error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory ]

Sounds like the agent should perhaps be returning OCF_NOT_RUNNING in this case.
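
As a rough sketch of that idea (this is not the actual VirtualDomain agent code, and the function name libvirt_probe_status is hypothetical): the probe could check for the libvirt socket first and report "not running" right away instead of waiting for virsh to time out. It assumes that a missing socket means the domain is not active on this node, which may not hold if a qemu process outlives libvirtd.

```shell
#!/bin/sh
# OCF return codes as defined by the OCF resource agent API.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Hypothetical helper, not part of the shipped VirtualDomain agent.
# Usage: libvirt_probe_status [SOCKET_PATH]
# Returns OCF_NOT_RUNNING when the libvirt socket is absent, so a probe
# (monitor_0) can finish immediately rather than hang until its timeout.
libvirt_probe_status() {
    sock="${1:-/var/run/libvirt/libvirt-sock}"

    if [ ! -S "$sock" ]; then
        # Assumption: without libvirtd's socket, no domain managed
        # through libvirtd is considered active on this node.
        return $OCF_NOT_RUNNING
    fi

    # Socket exists: the caller would continue with its normal
    # status check (for example, querying the domain state via virsh).
    return $OCF_SUCCESS
}
```

A monitor action could call libvirt_probe_status before any virsh command and return its exit code directly when it is non-zero.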

> 
> 
> This failure then snowballs into an "orphan" situation in which the running 
> VM is restarted.
> 
> One suggestion was to enable libvirtd at boot (chkconfig libvirtd on), and 
> presumably delete the libvirtd resource, so that 
> /var/run/libvirt/libvirt-sock has already been created by service libvirtd. 
> With libvirtd started by the system, there is no unneeded reboot of the VM.
> 
> However, it may be that removing libvirtd from Pacemaker control leaves the 
> VM vdisk filesystem susceptible to corruption during a reboot-induced 
> failover.
> 
> Question:
> 
> Is there an accepted Pacemaker configuration such that the unneeded restart 
> of the VM does not occur with the reboot of the non-host system?
> 
> Regards,
> Bob Haxo
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
