Re: [Pacemaker] DRBD with Pacemaker on CentOS 6.5
Hi, DocumentRoot is still set to /var/www/html. "ls -al /var/www/html" shows different things on the 2 nodes.

node01:

total 28
drwxr-xr-x. 3 root root  4096 Nov 11 12:25 .
drwxr-xr-x. 6 root root  4096 Jul 23 22:18 ..
-rw-r--r--. 1 root root    50 Oct 28 18:00 index.html
drwx------. 2 root root 16384 Oct 28 17:59 lost+found

node02 only has index.html, no lost+found, and it's a different version of the file. The status URL is enabled on both nodes.

On Oct 30, 2014 11:14 AM, "Andrew Beekhof" wrote:
> > On 29 Oct 2014, at 1:01 pm, Sihan Goi wrote:
> >
> > Hi,
> >
> > I've never used crm_report before. I just read the man file and
> > generated a tarball from 1-2 hours before I reconfigured all the DRBD
> > related resources. I've put the tarball here -
> > https://www.dropbox.com/s/suj9pttjp403msv/unexplained-apache-failure.tar.bz2?dl=0
> >
> > Hope you can help figure out what I'm doing wrong. Thanks for the help!
>
> Oct 28 18:13:38 node02 Filesystem(WebFS)[29940]: INFO: Running start for /dev/drbd/by-res/wwwdata on /var/www/html
> Oct 28 18:13:39 node02 kernel: EXT4-fs (drbd1): mounted filesystem with ordered data mode. Opts:
> Oct 28 18:13:39 node02 crmd[9870]: notice: process_lrm_event: LRM operation WebFS_start_0 (call=164, rc=0, cib-update=298, confirmed=true) ok
> Oct 28 18:13:39 node02 crmd[9870]: notice: te_rsc_command: Initiating action 7: start WebSite_start_0 on node02 (local)
> Oct 28 18:13:39 node02 apache(WebSite)[30007]: ERROR: Syntax error on line 292 of /etc/httpd/conf/httpd.conf: DocumentRoot must be a directory
>
> Is DocumentRoot still set to /var/www/html?
> If so, what happens if you run 'ls -al /var/www/html' in a shell?
>
> Oct 28 18:13:39 node02 apache(WebSite)[30007]: INFO: apache not running
> Oct 28 18:13:39 node02 apache(WebSite)[30007]: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
>
> Did you enable the status url?
> > http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_enable_the_apache_status_url.html

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
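The mismatch above (lost+found present on node01 only) usually means the DRBD-backed ext4 filesystem is mounted on node01 while node02 is showing the bare mount-point directory underneath. A quick check on each node narrows this down before Apache starts; this is only a sketch, and check_docroot is my name for the helper, not anything from the thread:

```shell
# check_docroot DIR: report whether DIR exists and is a directory --
# the same condition Apache's "DocumentRoot must be a directory" enforces.
check_docroot() {
    if [ -d "$1" ]; then
        echo "ok: $1 is a directory"
    else
        echo "fail: $1 missing or not a directory"
    fi
}

check_docroot /var/www/html
# On the node where DRBD is Primary, also confirm the mount itself:
#   mountpoint -q /var/www/html && echo mounted
```

Comparing this output on both nodes during a failover shows whether WebFS is really mounted where Apache is about to start.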
Re: [Pacemaker] Losing corosync communication clusterwide
> On 11 Nov 2014, at 4:39 am, Daniel Dehennin wrote:
>
> emmanuel segura writes:
>
>> I think, you don't have fencing configured in your cluster.
>
> I have fencing configured and working, modulo fencing VMs on dead host[1].

Are you saying that the host and the VMs running inside it are both part of the same cluster?

> Regards.
>
> Footnotes:
> [1] http://oss.clusterlabs.org/pipermail/pacemaker/2014-November/022965.html
>
> --
> Daniel Dehennin
> Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
> Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
Re: [Pacemaker] Losing corosync communication clusterwide
Tomasz Kontusz writes:

> Hanging corosync sounds like libqb problems: trusty comes with 0.16,
> which likes to hang from time to time. Try building libqb 0.17.

Thanks, I'll look at this.

Is there a way to get back to a normal state without rebooting all machines and interrupting services?

I thought about a lightweight version of something like:

1. stop pacemaker on all nodes without doing anything with resources, they all continue to work
2. stop corosync on all nodes
3. start corosync on all nodes
4. start pacemaker on all nodes; as services are running, nothing needs to be done

I looked in the documentation but failed to find any kind of cluster management best practices.

Regards.
--
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
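The four steps sketched above can be written as a node loop. Treat this purely as a sketch: the node names and Upstart service names are assumed (Ubuntu Trusty), and `run` only echoes the commands so the whole sequence can be reviewed before anything is executed; swap in `ssh "$1" "$2"` to run it for real. Whether corosync can be bounced under a live DLM without fencing side effects is exactly the open question in this thread.

```shell
NODES="nebula1 nebula2 nebula3"       # assumed node names
run() { echo "+ $1: $2"; }            # dry run; replace with: ssh "$1" "$2"

# 1. stop pacemaker everywhere (resources keep running)
for n in $NODES; do run "$n" "service pacemaker stop"; done
# 2./3. bounce corosync everywhere
for n in $NODES; do run "$n" "service corosync stop";  done
for n in $NODES; do run "$n" "service corosync start"; done
# 4. start pacemaker again; running services are simply re-probed
for n in $NODES; do run "$n" "service pacemaker start"; done
```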
Re: [Pacemaker] Losing corosync communication clusterwide
emmanuel segura writes:

> I think, you don't have fencing configured in your cluster.

I have fencing configured and working, modulo fencing VMs on dead host[1].

Regards.

Footnotes:
[1] http://oss.clusterlabs.org/pipermail/pacemaker/2014-November/022965.html

--
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
Re: [Pacemaker] Losing corosync communication clusterwide
Hanging corosync sounds like libqb problems: trusty comes with 0.16, which likes to hang from time to time. Try building libqb 0.17.

Daniel Dehennin wrote:

>Hello,
>
>I just have an issue on my pacemaker setup: my dlm/clvm/gfs2 was
>blocked.
>
>The “dlm_tool ls” command told me “wait ringid”.
>
>The corosync-* commands hang (like corosync-quorumtool).
>
>The pacemaker “crm_mon” displays nothing wrong.
>
>I'm using Ubuntu Trusty Tahr:
>
>- corosync 2.3.3-1ubuntu1
>- pacemaker 1.1.10+git20130802-1ubuntu2.1
>
>My cluster was manually rebooted.
>
>Any idea how to debug such a situation?
>
>Regards.
>--
>Daniel Dehennin
>Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
>Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF

--
Sent from K-9 Mail.
Re: [Pacemaker] Fencing dependency between bare metal host and its VMs guest
On Mon, 10 Nov 2014 10:07:18 +0100, Tomasz Kontusz wrote:

> I think the suggestion was to put shooting the host in the fencing path of a
> VM. This way if you can't get the host to fence the VM (as the host is
> already dead) you just check if the host was fenced.

Exactly. One thing I do not know is how it will behave in case of multiple VMs on the same host, i.e. will pacemaker try to fence the host for every VM, or recognize that all the VMs are dead after the first time the agent is invoked?

> Daniel Dehennin wrote:
> >Andrei Borzenkov writes:
> >
> >[...]
> >
> >>> Now I have one issue: when the bare metal host on which the VM is
> >>> running dies, the VM is lost and cannot be fenced.
> >>>
> >>> Is there a way to make pacemaker ACK the fencing of the VM running on a
> >>> host when the host is fenced itself?
> >>
> >> Yes, you can define multiple stonith agents and priority between them.
> >>
> >> http://clusterlabs.org/wiki/Fencing_topology
> >
> >Hello,
> >
> >If I understand correctly, fencing topology is the way to have several
> >fencing devices for a node and try them consecutively until one works.
> >
> >In my configuration, I group the VM stonith agents with the
> >corresponding VM resource, to make them move together[1].
> >
> >Here is my use case:
> >
> >1. Resource ONE-Frontend-Group runs on nebula1
> >2. nebula1 is fenced
> >3. node one-frontend can not be fenced
> >
> >Is there a way to say that the life of node one-frontend is related to
> >the state of resource ONE-Frontend?
> >
> >In which case, when the node nebula1 is fenced, pacemaker should be
> >aware that resource ONE-Frontend is not running any more, so node
> >one-frontend is OFFLINE and not UNCLEAN.
> >
> >Regards.
> >
> >Footnotes:
> >[1] http://oss.clusterlabs.org/pipermail/pacemaker/2014-October/022671.html
> >
> >--
> >Daniel Dehennin
> >Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
> >Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
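A fencing level that tries the VM's libvirt agent first and falls back to the IPMI agent of the bare-metal host could look like the crmsh fragment below. The resource names are taken from Daniel's posted configuration, but the level layout is my assumption, not a tested setup, and it only helps while the VM is pinned to that particular host:

```shell
# crmsh syntax sketch: space-separated entries after "node:" are
# successive levels; level 1 = fence the guest via libvirt,
# level 2 = fence the host it runs on via IPMI.
crm configure fencing_topology \
    one-frontend: Stonith-ONE-Frontend Stonith-nebula1-IPMILAN
```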
Re: [Pacemaker] Losing corosync communication clusterwide
I think, you don't have fencing configured in your cluster.

2014-11-10 17:02 GMT+01:00 Daniel Dehennin:
> Daniel Dehennin writes:
>
>> Hello,
>
> Hello,
>
>> I just have an issue on my pacemaker setup, my dlm/clvm/gfs2 was
>> blocked.
>>
>> The “dlm_tool ls” command told me “wait ringid”.
>
> It happened again:
>
> root@nebula2:~# dlm_tool ls
> dlm lockspaces
> name          datastores
> id            0x1b61ba6a
> flags         0x0004 kern_stop
> change        member 4 joined 1 remove 0 failed 0 seq 3,3
> members       1084811078 1084811079 1084811080 108489
> new change    member 3 joined 0 remove 1 failed 1 seq 4,4
> new status    wait ringid
> new members   1084811078 1084811079 1084811080
>
> name          clvmd
> id            0x4104eefa
> flags         0x0004 kern_stop
> change        member 4 joined 1 remove 0 failed 0 seq 3,3
> members       1084811078 1084811079 1084811080 108489
> new change    member 3 joined 0 remove 1 failed 1 seq 4,4
> new status    wait ringid
> new members   1084811078 1084811079 1084811080
>
> root@nebula2:~# dlm_tool status
> cluster nodeid 1084811079 quorate 1 ring seq 21372 21372
> daemon now 8351 fence_pid 0
> fence 108489 nodedown pid 0 actor 0 fail 1415634527 fence 0 now 1415634734
> node 1084811078 M add 5089 rem 0 fail 0 fence 0 at 0 0
> node 1084811079 M add 5089 rem 0 fail 0 fence 0 at 0 0
> node 1084811080 M add 5089 rem 0 fail 0 fence 0 at 0 0
> node 108489 X add 5766 rem 8144 fail 8144 fence 0 at 0 0
>
> Any idea?
> --
> Daniel Dehennin
> Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
> Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF

--
this is my life and I live it as long as God wills
Re: [Pacemaker] Losing corosync communication clusterwide
Daniel Dehennin writes:

> Hello,

Hello,

> I just have an issue on my pacemaker setup, my dlm/clvm/gfs2 was
> blocked.
>
> The “dlm_tool ls” command told me “wait ringid”.

It happened again:

root@nebula2:~# dlm_tool ls
dlm lockspaces
name          datastores
id            0x1b61ba6a
flags         0x0004 kern_stop
change        member 4 joined 1 remove 0 failed 0 seq 3,3
members       1084811078 1084811079 1084811080 108489
new change    member 3 joined 0 remove 1 failed 1 seq 4,4
new status    wait ringid
new members   1084811078 1084811079 1084811080

name          clvmd
id            0x4104eefa
flags         0x0004 kern_stop
change        member 4 joined 1 remove 0 failed 0 seq 3,3
members       1084811078 1084811079 1084811080 108489
new change    member 3 joined 0 remove 1 failed 1 seq 4,4
new status    wait ringid
new members   1084811078 1084811079 1084811080

root@nebula2:~# dlm_tool status
cluster nodeid 1084811079 quorate 1 ring seq 21372 21372
daemon now 8351 fence_pid 0
fence 108489 nodedown pid 0 actor 0 fail 1415634527 fence 0 now 1415634734
node 1084811078 M add 5089 rem 0 fail 0 fence 0 at 0 0
node 1084811079 M add 5089 rem 0 fail 0 fence 0 at 0 0
node 1084811080 M add 5089 rem 0 fail 0 fence 0 at 0 0
node 108489 X add 5766 rem 8144 fail 8144 fence 0 at 0 0

Any idea?
--
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
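When this state recurs, a monitoring script can spot it by counting lockspaces reporting "wait ringid" in the dlm_tool output. A small sketch (the helper name is mine; it only scans text on stdin):

```shell
# stuck_lockspaces: read "dlm_tool ls" output on stdin and print how
# many lockspaces are blocked waiting for a new ring id.
stuck_lockspaces() {
    grep -c 'wait ringid'
}

# usage on a live node:
#   dlm_tool ls | stuck_lockspaces
```

A non-zero count together with kern_stop in the flags line is the signature shown above.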
[Pacemaker] Intermittent Failovers: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Hey Team,

I'm receiving some strange intermittent failovers on a two-node cluster (it happens once every week or two). When this happens, both nodes are unavailable; one node will be marked offline and the other will be shown as unclean. Any help on this would be massively appreciated. Thanks.

Running Ubuntu 12.04 (64-bit)
Pacemaker 1.1.6-2ubuntu3.3
Corosync 1.4.2-2ubuntu0.2

Here are the logs:

Nov 08 14:26:26 corosync [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x12bebe0, async-conn=0x12bebe0) left
Nov 08 14:26:26 corosync [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Nov 08 14:26:27 corosync [pcmk ] info: pcmk_ipc_exit: Client attrd (conn=0x12d0230, async-conn=0x12d0230) left
Nov 08 14:26:32 corosync [pcmk ] info: pcmk_ipc_exit: Client cib (conn=0x12c7d80, async-conn=0x12c7d80) left
Nov 08 14:26:32 corosync [pcmk ] info: pcmk_ipc_exit: Client stonith-ng (conn=0x12c3a20, async-conn=0x12c3a20) left
Nov 08 14:26:32 corosync [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Nov 08 14:26:32 corosync [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
Nov 08 14:26:32 corosync [pcmk ] info: pcmk_ipc: Recorded connection 0x12bebe0 for stonith-ng/0
Nov 08 14:26:32 corosync [pcmk ] info: pcmk_ipc: Recorded connection 0x12c2f40 for attrd/0
Nov 08 14:26:33 corosync [pcmk ] info: pcmk_ipc: Recorded connection 0x12c72a0 for cib/0
Nov 08 14:26:33 corosync [pcmk ] info: pcmk_ipc: Sending membership update 12 to cib
Nov 08 14:26:33 corosync [pcmk ] info: pcmk_ipc: Recorded connection 0x12cb600 for crmd/0
Nov 08 14:26:33 corosync [pcmk ] info: pcmk_ipc: Sending membership update 12 to crmd

Output of crm configure show:

node p-sbc3 \
        attributes standby="off"
node p-sbc4 \
        attributes standby="off"
primitive fs lsb:FSSofia \
        op monitor interval="2s" enabled="true" timeout="10s" on-fail="standby" \
        meta target-role="Started"
primitive fs-ip ocf:heartbeat:IPaddr2 \
        params ip="10.100.0.90" nic="eth0:0" cidr_netmask="24" \
        op monitor interval="10s"
primitive fs-ip2 ocf:heartbeat:IPaddr2 \
        params ip="10.100.0.99" nic="eth0:1" cidr_netmask="24" \
        op monitor interval="10s"
group cluster_services fs-ip fs-ip2 fs \
        meta target-role="Started"
property $id="cib-bootstrap-options" \
        dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        last-lrm-refresh="1348755080" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
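To see how often the IPC delivery failures happen before a failover, the corosync log can be scanned and summarised per client. This is a sketch (the helper name is mine, and the log path varies by install):

```shell
# ipc_failures: read corosync log text on stdin and summarise which
# local pacemaker clients had IPC deliveries fail, with a count per client.
ipc_failures() {
    grep 'ipc delivery failed' \
        | sed 's/.*Sending message to \(local\.[a-z-]*\) failed.*/\1/' \
        | sort | uniq -c
}

# usage (path assumed):
#   ipc_failures < /var/log/corosync/corosync.log
```

If the counts cluster around the same minute as pcmk_ipc_exit lines, as in the excerpt above, the daemons disconnected before the messages could be routed.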
Re: [Pacemaker] How to avoid CRM sending stop when ha.cf gets 2nd node configured
On Sat, Nov 08, 2014 at 12:58:36AM +, aridh bose wrote:
> Hi,
> While using heartbeat and pacemaker, is it possible to bring up the first
> node, which can go as Master, followed by the second node, which should go
> as Slave, without causing any issues to the first node? Currently, I see a
> couple of problems in achieving this:
>
> 1. Assuming I am not using mcast communication, heartbeat is mandating me
>    to configure the second node's info either in ha.cf or in /etc/hosts
>    with the associated IP address. Why can't it come up by itself as
>    Master to start with?
>
> 2. If I update ha.cf with the 2nd node info and use 'heartbeat -r', CRM
>    first sends stop on the Master before sending start.
>
> Appreciate any help or pointers.

Regardless of what you do there, or why, or on which communication stack: how about you first put pacemaker into "maintenance-mode", then do your re-architecting of your cluster, and once you are satisfied with the new cluster, take it out of maintenance mode again?

At least that is one of the intended use cases for maintenance mode.

--
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA and Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
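In crm shell terms, the maintenance-mode suggestion amounts to the fragment below (a sketch; check resource state with crm_mon before releasing it, since pacemaker will re-probe everything at that point):

```shell
crm configure property maintenance-mode=true    # pacemaker stops managing resources
# ... reconfigure ha.cf / the node list, reload heartbeat ...
crm_mon -1                                      # sanity-check before releasing
crm configure property maintenance-mode=false
```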
[Pacemaker] Losing corosync communication clusterwide
Hello,

I just have an issue on my pacemaker setup: my dlm/clvm/gfs2 was blocked.

The “dlm_tool ls” command told me “wait ringid”.

The corosync-* commands hang (like corosync-quorumtool).

The pacemaker “crm_mon” displays nothing wrong.

I'm using Ubuntu Trusty Tahr:

- corosync 2.3.3-1ubuntu1
- pacemaker 1.1.10+git20130802-1ubuntu2.1

My cluster was manually rebooted.

Any idea how to debug such a situation?

Regards.
--
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
Re: [Pacemaker] Fencing dependency between bare metal host and its VMs guest
I think the suggestion was to put shooting the host in the fencing path of a VM. This way, if you can't get the host to fence the VM (as the host is already dead), you just check if the host was fenced.

Daniel Dehennin wrote:

>Andrei Borzenkov writes:
>
>[...]
>
>>> Now I have one issue: when the bare metal host on which the VM is
>>> running dies, the VM is lost and cannot be fenced.
>>>
>>> Is there a way to make pacemaker ACK the fencing of the VM running on a
>>> host when the host is fenced itself?
>>
>> Yes, you can define multiple stonith agents and priority between them.
>>
>> http://clusterlabs.org/wiki/Fencing_topology
>
>Hello,
>
>If I understand correctly, fencing topology is the way to have several
>fencing devices for a node and try them consecutively until one works.
>
>In my configuration, I group the VM stonith agents with the
>corresponding VM resource, to make them move together[1].
>
>Here is my use case:
>
>1. Resource ONE-Frontend-Group runs on nebula1
>2. nebula1 is fenced
>3. node one-frontend can not be fenced
>
>Is there a way to say that the life of node one-frontend is related to
>the state of resource ONE-Frontend?
>
>In which case, when the node nebula1 is fenced, pacemaker should be
>aware that resource ONE-Frontend is not running any more, so node
>one-frontend is OFFLINE and not UNCLEAN.
>
>Regards.
>
>Footnotes:
>[1] http://oss.clusterlabs.org/pipermail/pacemaker/2014-October/022671.html
>
>--
>Daniel Dehennin
>Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
>Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF

--
Sent from K-9 Mail.
Re: [Pacemaker] Fencing dependency between bare metal host and its VMs guest
Andrei Borzenkov writes:

[...]

>> Now I have one issue: when the bare metal host on which the VM is
>> running dies, the VM is lost and cannot be fenced.
>>
>> Is there a way to make pacemaker ACK the fencing of the VM running on a
>> host when the host is fenced itself?
>
> Yes, you can define multiple stonith agents and priority between them.
>
> http://clusterlabs.org/wiki/Fencing_topology

Hello,

If I understand correctly, fencing topology is the way to have several fencing devices for a node and try them consecutively until one works.

In my configuration, I group the VM stonith agents with the corresponding VM resource, to make them move together[1].

Here is my use case:

1. Resource ONE-Frontend-Group runs on nebula1
2. nebula1 is fenced
3. node one-frontend can not be fenced

Is there a way to say that the life of node one-frontend is related to the state of resource ONE-Frontend?

In which case, when the node nebula1 is fenced, pacemaker should be aware that resource ONE-Frontend is not running any more, so node one-frontend is OFFLINE and not UNCLEAN.

Regards.
Footnotes:
[1] http://oss.clusterlabs.org/pipermail/pacemaker/2014-October/022671.html

--
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF

node $id="1084811078" nebula1
node $id="1084811079" nebula2
node $id="1084811080" nebula3
node $id="108488" quorum \
        attributes standby="on"
node $id="108489" one-frontend
primitive ONE-Datastores ocf:heartbeat:Filesystem \
        params device="/dev/one-fs/datastores" directory="/var/lib/one/datastores" fstype="gfs2" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="100" \
        op monitor interval="20" timeout="40"
primitive ONE-Frontend ocf:heartbeat:VirtualDomain \
        params config="/var/lib/one/datastores/one/one.xml" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="100" \
        utilization cpu="1" hv_memory="1024"
primitive ONE-vg ocf:heartbeat:LVM \
        params volgrpname="one-fs" \
        op start interval="0" timeout="30" \
        op stop interval="0" timeout="30" \
        op monitor interval="60" timeout="30"
primitive Quorum-Node ocf:heartbeat:VirtualDomain \
        params config="/var/lib/libvirt/qemu/pcmk/quorum.xml" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="100" \
        utilization cpu="1" hv_memory="1024"
primitive Stonith-ONE-Frontend stonith:external/libvirt \
        params hostlist="one-frontend" hypervisor_uri="qemu:///system" pcmk_host_list="one-frontend" pcmk_host_check="static-list" \
        op monitor interval="30m"
primitive Stonith-Quorum-Node stonith:external/libvirt \
        params hostlist="quorum" hypervisor_uri="qemu:///system" pcmk_host_list="quorum" pcmk_host_check="static-list" \
        op monitor interval="30m"
primitive Stonith-nebula1-IPMILAN stonith:external/ipmi \
        params hostname="nebula1-ipmi" ipaddr="XXX.XXX.XXX.XXX" interface="lanplus" userid="USER" passwd="PASSWORD1" passwd_method="env" priv="operator" pcmk_host_list="nebula1" pcmk_host_check="static-list" \
        op monitor interval="30m" \
        meta target-role="Started"
primitive Stonith-nebula2-IPMILAN stonith:external/ipmi \
        params hostname="nebula2-ipmi" ipaddr="YYY.YYY.YYY.YYY" interface="lanplus" userid="USER" passwd="PASSWORD2" passwd_method="env" priv="operator" pcmk_host_list="nebula2" pcmk_host_check="static-list" \
        op monitor interval="30m" \
        meta target-role="Started"
primitive Stonith-nebula3-IPMILAN stonith:external/ipmi \
        params hostname="nebula3-ipmi" ipaddr="ZZZ.ZZZ.ZZZ.ZZZ" interface="lanplus" userid="USER" passwd="PASSWORD3" passwd_method="env" priv="operator" pcmk_host_list="nebula3" pcmk_host_check="static-list" \
        op monitor interval="30m" \
        meta target-role="Started"
primitive clvm ocf:lvm2:clvmd \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="100" \
        op monitor interval="60" timeout="90"
primitive dlm ocf:pacemaker:controld \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="100" \
        op monitor interval="60" timeout="60"
group ONE-Frontend-Group Stonith-ONE-Frontend ONE-Frontend \
        meta target-role="Started"
group ONE-Storage dlm clvm ONE-vg ONE-Datastores
group Quorum-Node-Group Stonith-Quorum-Node Quorum-Node \
        meta target-role="Started"
clone ONE-Storage-Clone ONE-Storage \
        meta interleave="true" target-role="Started"
location Nebula1-does-not-fence-itslef Stonith-nebula1-IPMILAN \
        rule $id="Nebula1-does-not-fence-itslef-rule" 50: #uname eq nebula2 \
        rule $id="Nebula1-does-not-fence-itslef-rule-0" 40: #uname eq nebula3
location Nebula2-does-not-fence-itslef Stonith-nebula2-IPMILAN \
        rule $id="Nebula2-does-not-fence-itslef-rule" 50: #una