Remove libvirtd from Pacemaker and run "chkconfig libvirtd on" on every node, so that the cluster manages just the VM. Maybe I'm wrong, but I don't see any reason to put libvirtd as a primitive in Pacemaker.
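For concreteness, the suggestion above might look like the following administrative steps, assuming crmsh and the resource name p_libvirtd used later in this thread. This is a sketch only; it can't be run outside an actual cluster, and the order (start the service everywhere before deleting the resource) is meant to avoid a window with no libvirtd running.

```shell
# On every node: let init start libvirtd at boot, outside Pacemaker's control
chkconfig libvirtd on
service libvirtd start

# On one node: take the libvirtd primitive out of the cluster configuration
crm resource stop p_libvirtd
crm configure delete p_libvirtd
```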
2013/12/19 Bob Haxo <bh...@sgi.com>:

> Hi Emmanuel,
>
> Thanks for the suggestions. It is pretty clear what the problem is; it's
> just not clear what the fix or the workaround is.
>
> Search the Pacemaker email archive for Andrew Beekhof's email of 12 Oct
> 2012, "Re: [Pacemaker] chicken-egg-problem with libvirtd and a VM within
> cluster", and the email to which he is responding (from Tom Fernandes).
>
> The status/monitor operation of VirtualDomain fails because
> /var/run/libvirt/libvirt-sock has not been created. This socket is
> created by lsb:libvirtd, but that resource is not started until Pacemaker
> has heard back from heartbeat:VirtualDomain, which will never happen until
> /var/run/libvirt/libvirt-sock has been created (running "service libvirtd
> start" by hand during this wait period does let Pacemaker continue
> starting resources). After the VirtualDomain monitor operation times out,
> Pacemaker deals with the failed logic loop, resulting in a restart of
> the VM.
>
> I am hoping that "Unfortunately we still don't have a good answer for you."
> is no longer the case, and that there is a fix or a community-accepted
> workaround for the issue.
>
> Regards,
> Bob Haxo
>
> On Thu, 2013-12-19 at 19:48 +0100, emmanuel segura wrote:
>
>> Maybe the problem is this: the cluster tries to start the VM while
>> libvirtd isn't started.
>>
>> 2013/12/19 emmanuel segura <emi2f...@gmail.com>:
>>
>>> If you don't set your VM to start at boot time, you don't need to put
>>> libvirtd in the cluster. Maybe that isn't the problem, but why put OS
>>> services in the cluster, for example crond ......
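The deadlock described above boils down to a simple precondition: the VirtualDomain monitor can only talk to the hypervisor once libvirtd has created its control socket. A minimal sketch of that check, assuming the default RHEL 6 socket path (the helper function name is ours, not part of any resource agent):

```shell
#!/bin/sh
# Sketch of the precondition the VirtualDomain monitor depends on:
# virsh (and therefore the resource agent) can only connect once
# libvirtd has created its control socket.
check_libvirt_sock() {
    # Default path on RHEL 6; pass a different path as $1 if needed.
    sock="${1:-/var/run/libvirt/libvirt-sock}"
    if [ -S "$sock" ]; then
        echo "libvirtd socket present: monitor can connect"
    else
        echo "libvirtd socket missing: monitor will time out"
    fi
}

check_libvirt_sock
```

Until that socket exists, every monitor attempt fails with exactly the "Failed to connect socket" error quoted below, which is why manually running "service libvirtd start" mid-wait unblocks the cluster.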
>>> :)
>>>
>>> 2013/12/19 Bob Haxo <bh...@sgi.com>:
>>>
>>>> Hello,
>>>>
>>>> Earlier emails related to this topic:
>>>> [pacemaker] chicken-egg-problem with libvirtd and a VM within cluster
>>>> [pacemaker] VirtualDomain problem after reboot of one node
>>>>
>>>> My configuration:
>>>>
>>>> RHEL6.5/CMAN/gfs2/Pacemaker/crmsh
>>>>
>>>> pacemaker-libs-1.1.10-14.el6_5.1.x86_64
>>>> pacemaker-cli-1.1.10-14.el6_5.1.x86_64
>>>> pacemaker-1.1.10-14.el6_5.1.x86_64
>>>> pacemaker-cluster-libs-1.1.10-14.el6_5.1.x86_64
>>>>
>>>> Two-node HA VM cluster using a real shared drive, not drbd.
>>>>
>>>> Resources (relevant to this discussion):
>>>> primitive p_fs_images ocf:heartbeat:Filesystem \
>>>> primitive p_libvirtd lsb:libvirtd \
>>>> primitive virt ocf:heartbeat:VirtualDomain \
>>>>
>>>> Services chkconfig on: cman, clvmd, pacemaker
>>>> Services chkconfig off: corosync, gfs2, libvirtd
>>>>
>>>> Observation:
>>>>
>>>> Rebooting the NON-host system results in the restart of the VM merrily
>>>> running on the host system.
>>>>
>>>> Apparent cause:
>>>>
>>>> Upon startup, Pacemaker apparently checks the status of configured
>>>> resources. However, the status request for the virt
>>>> (ocf:heartbeat:VirtualDomain) resource fails with:
>>>>
>>>> Dec 18 12:19:30 [4147] mici-admin2 lrmd: warning: child_timeout_callback: virt_monitor_0 process (PID 4158) timed out
>>>> Dec 18 12:19:30 [4147] mici-admin2 lrmd: warning: operation_finished: virt_monitor_0:4158 - timed out after 200000ms
>>>> Dec 18 12:19:30 [4147] mici-admin2 lrmd: notice: operation_finished: virt_monitor_0:4158:stderr [ error: Failed to reconnect to the hypervisor ]
>>>> Dec 18 12:19:30 [4147] mici-admin2 lrmd: notice: operation_finished: virt_monitor_0:4158:stderr [ error: no valid connection ]
>>>> Dec 18 12:19:30 [4147] mici-admin2 lrmd: notice: operation_finished: virt_monitor_0:4158:stderr [ error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory ]
>>>>
>>>> This failure then snowballs into an "orphan" situation in which the
>>>> running VM is restarted.
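The primitives quoted above are shown without their constraints. The ordering one would expect for this design is sketched below in crmsh syntax, using the resource names from the post (the constraint IDs are ours). Note the limitation at the heart of this thread: ordering constraints govern starts, but the initial probe (monitor_0) that Pacemaker runs on startup is not subject to them, which is consistent with the monitor failing before p_libvirtd is started.

```
order o_fs_before_libvirtd inf: p_fs_images p_libvirtd
order o_libvirtd_before_virt inf: p_libvirtd virt
colocation c_virt_with_libvirtd inf: virt p_libvirtd
```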
>>>> There was the suggestion of chkconfig on libvirtd (and presumably
>>>> deleting the resource) so that /var/run/libvirt/libvirt-sock has been
>>>> created by service libvirtd. With libvirtd started by the system, there
>>>> is no un-needed reboot of the VM.
>>>>
>>>> However, it may be that removing libvirtd from Pacemaker control leaves
>>>> the VM vdisk filesystem susceptible to corruption during a
>>>> reboot-induced failover.
>>>>
>>>> Question:
>>>>
>>>> Is there an accepted Pacemaker configuration such that the un-needed
>>>> restart of the VM does not occur with the reboot of the non-host
>>>> system?
>>>>
>>>> Regards,
>>>> Bob Haxo

--
esta es mi vida e me la vivo hasta que dios quiera
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org