Re: [Pacemaker] DRBD with Pacemaker on CentOs 6.5

2014-11-10 Thread Sihan Goi
Hi,

DocumentRoot is still set to /var/www/html.
ls -al /var/www/html shows different things on the two nodes:
node01:

total 28
drwxr-xr-x. 3 root root  4096 Nov 11 12:25 .
drwxr-xr-x. 6 root root  4096 Jul 23 22:18 ..
-rw-r--r--. 1 root root    50 Oct 28 18:00 index.html
drwx------. 2 root root 16384 Oct 28 17:59 lost+found

node02 only has index.html, no lost+found, and it's a different version of
the file.

The status URL is enabled on both nodes.
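(The lost+found directory showing up on node01 but not on node02 suggests the
DRBD-backed filesystem is only mounted on node01, and that on node02
/var/www/html is just the empty mount point on the root filesystem. A minimal
sketch of how to check that on each node, reusing the wwwdata resource name
from the logs below:)

mount | grep /var/www/html        # is anything actually mounted on the DocumentRoot?
cat /proc/drbd                    # which node is currently Primary for the DRBD device?
ls -al /dev/drbd/by-res/wwwdata   # does the device the Filesystem resource mounts exist?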


On Oct 30, 2014 11:14 AM, "Andrew Beekhof"  wrote:

>
> > On 29 Oct 2014, at 1:01 pm, Sihan Goi  wrote:
> >
> > Hi,
> >
> > I've never used crm_report before. I just read the man file and
> generated a tarball from 1-2 hours before I reconfigured all the DRBD
> related resources. I've put the tarball here -
> https://www.dropbox.com/s/suj9pttjp403msv/unexplained-apache-failure.tar.bz2?dl=0
> >
> > Hope you can help figure out what I'm doing wrong. Thanks for the help!
>
> Oct 28 18:13:38 node02 Filesystem(WebFS)[29940]: INFO: Running start for
> /dev/drbd/by-res/wwwdata on /var/www/html
> Oct 28 18:13:39 node02 kernel: EXT4-fs (drbd1): mounted filesystem with
> ordered data mode. Opts:
> Oct 28 18:13:39 node02 crmd[9870]:   notice: process_lrm_event: LRM
> operation WebFS_start_0 (call=164, rc=0, cib-update=298, confirmed=true) ok
> Oct 28 18:13:39 node02 crmd[9870]:   notice: te_rsc_command: Initiating
> action 7: start WebSite_start_0 on node02 (local)
> Oct 28 18:13:39 node02 apache(WebSite)[30007]: ERROR: Syntax error on line
> 292 of /etc/httpd/conf/httpd.conf: DocumentRoot must be a directory
>
> Is DocumentRoot still set to /var/www/html?
> If so, what happens if you run 'ls -al /var/www/html' in a shell?
>
> Oct 28 18:13:39 node02 apache(WebSite)[30007]: INFO: apache not running
> Oct 28 18:13:39 node02 apache(WebSite)[30007]: INFO: waiting for apache
> /etc/httpd/conf/httpd.conf to come up
>
> Did you enable the status url?
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_enable_the_apache_status_url.html
>
>
>


Re: [Pacemaker] Losing corosync communication clusterwide

2014-11-10 Thread Andrew Beekhof

> On 11 Nov 2014, at 4:39 am, Daniel Dehennin  
> wrote:
> 
> emmanuel segura  writes:
> 
>> I think you don't have fencing configured in your cluster.
> 
> I have fencing configured and working, except for fencing VMs whose host is dead[1].

Are you saying that the host and the VMs running inside it are both part of the 
same cluster?

> 
> Regards.
> 
> Footnotes: 
> [1]  http://oss.clusterlabs.org/pipermail/pacemaker/2014-November/022965.html
> 
> -- 
> Daniel Dehennin
> Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
> Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] Losing corosync communication clusterwide

2014-11-10 Thread Daniel Dehennin
Tomasz Kontusz  writes:

> Hanging corosync sounds like libqb problems: trusty comes with 0.16,
> which likes to hang from time to time. Try building libqb 0.17.

Thanks, I'll look at this.

Is there a way to get back to normal state without rebooting all
machines and interrupting services?

I thought about a lightweight version of something like:

1. stop pacemaker on all nodes without doing anything with resources,
   so they all continue to run

2. stop corosync on all nodes

3. start corosync on all nodes

4. start pacemaker on all nodes; as the services are still running,
   nothing needs to be done

I looked in the documentation but failed to find any kind of
cluster-management best practices.
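A rough sketch of that sequence, assuming crmsh and the init scripts shipped
with Trusty; putting the cluster in maintenance mode first is what tells
Pacemaker to leave the running resources alone (and it assumes the CIB can
still be updated, which may not hold if corosync is completely hung):

crm configure property maintenance-mode=true   # resources keep running, unmanaged
# then on every node:
service pacemaker stop
service corosync stop
service corosync start
service pacemaker start
# once crm_mon shows all nodes online again:
crm configure property maintenance-mode=false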

Regards.
-- 
Daniel Dehennin
Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] Losing corosync communication clusterwide

2014-11-10 Thread Daniel Dehennin
emmanuel segura  writes:

> I think you don't have fencing configured in your cluster.

I have fencing configured and working, except for fencing VMs whose host is dead[1].

Regards.

Footnotes: 
[1]  http://oss.clusterlabs.org/pipermail/pacemaker/2014-November/022965.html

-- 
Daniel Dehennin
Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] Losing corosync communication clusterwide

2014-11-10 Thread Tomasz Kontusz
Hanging corosync sounds like libqb problems: trusty comes with 0.16, which 
likes to hang from time to time. Try building libqb 0.17.
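For reference, a source build would look roughly like this; the GitHub
location and the v0.17.0 tag name are assumptions, and autotools plus libqb's
build dependencies need to be installed first:

git clone https://github.com/ClusterLabs/libqb.git
cd libqb
git checkout v0.17.0          # assumed tag name for the 0.17.0 release
./autogen.sh
./configure --prefix=/usr     # match the distro prefix so corosync picks it up
make
sudo make install
sudo ldconfig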

Daniel Dehennin  wrote:
>Hello,
>
>I just had an issue on my Pacemaker setup: my dlm/clvm/gfs2 stack was
>blocked.
>
>The “dlm_tool ls” command told me “wait ringid”.
>
>The corosync-* commands hang (like corosync-quorumtool).
>
>The Pacemaker “crm_mon” displays nothing wrong.
>
>I'm using Ubuntu Trusty Tahr:
>
>- corosync 2.3.3-1ubuntu1
>- pacemaker 1.1.10+git20130802-1ubuntu2.1
>
>My cluster was manually rebooted.
>
>Any idea how to debug such a situation?
>
>Regards.
>-- 
>Daniel Dehennin
>Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
>Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF
>
>
>
>

-- 
Sent from K-9 Mail.


Re: [Pacemaker] Fencing dependency between a bare metal host and its VM guests

2014-11-10 Thread Andrei Borzenkov
On Mon, 10 Nov 2014 10:07:18 +0100
Tomasz Kontusz  wrote:

> I think the suggestion was to put shooting the host into the fencing path of
> a VM. This way, if you can't get the host to fence the VM (because the host
> is already dead), you just check whether the host itself was fenced.
> 

Exactly. One thing I do not know is how it will behave in the case of multiple
VMs on the same host, i.e. whether pacemaker will try to fence the host for
every VM or recognize that all of its VMs are dead after the first time the
agent is invoked.
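One way to express that in crm syntax, reusing the agents from Daniel's
configuration quoted later in this digest; Stonith-ONE-Frontend-via-host is a
hypothetical second device for the VM node that power-cycles nebula1's IPMI,
so it only sketches the idea for a VM pinned to that one host:

crm configure primitive Stonith-ONE-Frontend-via-host stonith:external/ipmi \
    params hostname="nebula1-ipmi" ipaddr="XXX.XXX.XXX.XXX" interface="lanplus" \
           userid="USER" passwd="PASSWORD1" passwd_method="env" priv="operator" \
           pcmk_host_list="one-frontend" pcmk_host_check="static-list"
crm configure fencing_topology \
    one-frontend: Stonith-ONE-Frontend Stonith-ONE-Frontend-via-host

Whether stonithd would then treat the other VMs on nebula1 as already fenced,
or invoke the host-level device once per VM, is exactly the open question above.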

> Daniel Dehennin  wrote:
> >Andrei Borzenkov  writes:
> >
> >
> >[...]
> >
> >>> Now I have one issue: when the bare metal host on which the VM is
> >>> running dies, the VM is lost and cannot be fenced.
> >>> 
> >>> Is there a way to make pacemaker ACK the fencing of the VM running
> >on a
> >>> host when the host is fenced itself?
> >>> 
> >>
> >> Yes, you can define multiple stonith agents and priority between
> >them.
> >>
> >> http://clusterlabs.org/wiki/Fencing_topology
> >
> >Hello,
> >
> >If I understand correctly, fencing topology is the way to have several
> >fencing devices for a node and try them consecutively until one works.
> >
> >In my configuration, I group the VM stonith agents with the
> >corresponding VM resource, to make them move together[1].
> >
> >Here is my use case:
> >
> >1. Resource ONE-Frontend-Group runs on nebula1
> >2. nebula1 is fenced
> >3. node one-frontend cannot be fenced
> >
> >Is there a way to say that the life of node one-frontend is tied to
> >the state of resource ONE-Frontend?
> >
> >In that case, when node nebula1 is fenced, pacemaker should be aware
> >that resource ONE-Frontend is not running any more, so node
> >one-frontend is OFFLINE and not UNCLEAN.
> >
> >Regards.
> >
> >Footnotes: 
> >[1] 
> >http://oss.clusterlabs.org/pipermail/pacemaker/2014-October/022671.html
> >
> >-- 
> >Daniel Dehennin
> >Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
> >Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF
> >
> >
> >
> >
> >
> 




Re: [Pacemaker] Losing corosync communication clusterwide

2014-11-10 Thread emmanuel segura
I think you don't have fencing configured in your cluster.

2014-11-10 17:02 GMT+01:00 Daniel Dehennin :
> Daniel Dehennin  writes:
>
>> Hello,
>
> Hello,
>
>> I just had an issue on my Pacemaker setup: my dlm/clvm/gfs2 stack was
>> blocked.
>>
>> The “dlm_tool ls” command told me “wait ringid”.
>
> It happened again:
>
> root@nebula2:~# dlm_tool ls
> dlm lockspaces
> name          datastores
> id            0x1b61ba6a
> flags         0x0004 kern_stop
> change        member 4 joined 1 remove 0 failed 0 seq 3,3
> members       1084811078 1084811079 1084811080 108489
> new change    member 3 joined 0 remove 1 failed 1 seq 4,4
> new status    wait ringid
> new members   1084811078 1084811079 1084811080
>
> name          clvmd
> id            0x4104eefa
> flags         0x0004 kern_stop
> change        member 4 joined 1 remove 0 failed 0 seq 3,3
> members       1084811078 1084811079 1084811080 108489
> new change    member 3 joined 0 remove 1 failed 1 seq 4,4
> new status    wait ringid
> new members   1084811078 1084811079 1084811080
>
> root@nebula2:~# dlm_tool status
> cluster nodeid 1084811079 quorate 1 ring seq 21372 21372
> daemon now 8351 fence_pid 0
> fence 108489 nodedown pid 0 actor 0 fail 1415634527 fence 0 now 1415634734
> node 1084811078 M add 5089 rem 0 fail 0 fence 0 at 0 0
> node 1084811079 M add 5089 rem 0 fail 0 fence 0 at 0 0
> node 1084811080 M add 5089 rem 0 fail 0 fence 0 at 0 0
> node 108489 X add 5766 rem 8144 fail 8144 fence 0 at 0 0
>
> Any idea?
> --
> Daniel Dehennin
> Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
> Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF
>



-- 
this is my life and I live it for as long as God wills



Re: [Pacemaker] Losing corosync communication clusterwide

2014-11-10 Thread Daniel Dehennin
Daniel Dehennin  writes:

> Hello,

Hello,

> I just had an issue on my Pacemaker setup: my dlm/clvm/gfs2 stack was
> blocked.
>
> The “dlm_tool ls” command told me “wait ringid”.

It happened again:

root@nebula2:~# dlm_tool ls
dlm lockspaces
name          datastores
id            0x1b61ba6a
flags         0x0004 kern_stop
change        member 4 joined 1 remove 0 failed 0 seq 3,3
members       1084811078 1084811079 1084811080 108489
new change    member 3 joined 0 remove 1 failed 1 seq 4,4
new status    wait ringid
new members   1084811078 1084811079 1084811080

name          clvmd
id            0x4104eefa
flags         0x0004 kern_stop
change        member 4 joined 1 remove 0 failed 0 seq 3,3
members       1084811078 1084811079 1084811080 108489
new change    member 3 joined 0 remove 1 failed 1 seq 4,4
new status    wait ringid
new members   1084811078 1084811079 1084811080

root@nebula2:~# dlm_tool status
cluster nodeid 1084811079 quorate 1 ring seq 21372 21372
daemon now 8351 fence_pid 0
fence 108489 nodedown pid 0 actor 0 fail 1415634527 fence 0 now 1415634734
node 1084811078 M add 5089 rem 0 fail 0 fence 0 at 0 0
node 1084811079 M add 5089 rem 0 fail 0 fence 0 at 0 0
node 1084811080 M add 5089 rem 0 fail 0 fence 0 at 0 0
node 108489 X add 5766 rem 8144 fail 8144 fence 0 at 0 0

Any idea?
-- 
Daniel Dehennin
Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




[Pacemaker] Intermittent Failovers: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)

2014-11-10 Thread Zach Wolf
Hey Team,

I'm seeing some strange intermittent failovers on a two-node cluster
(it happens once every week or two). When it does, both nodes become
unavailable; one node is marked offline and the other is shown as
unclean. Any help on this would be massively appreciated. Thanks.

Running Ubuntu 12.04 (64-bit)
Pacemaker 1.1.6-2ubuntu3.3
Corosync 1.4.2-2ubuntu0.2

Here are the logs:
Nov 08 14:26:26 corosync [pcmk  ] info: pcmk_ipc_exit: Client crmd 
(conn=0x12bebe0, async-conn=0x12bebe0) left
Nov 08 14:26:26 corosync [pcmk  ] WARN: route_ais_message: Sending message to 
local.crmd failed: ipc delivery failed (rc=-2)
Nov 08 14:26:27 corosync [pcmk  ] info: pcmk_ipc_exit: Client attrd 
(conn=0x12d0230, async-conn=0x12d0230) left
Nov 08 14:26:32 corosync [pcmk  ] info: pcmk_ipc_exit: Client cib 
(conn=0x12c7d80, async-conn=0x12c7d80) left
Nov 08 14:26:32 corosync [pcmk  ] info: pcmk_ipc_exit: Client stonith-ng 
(conn=0x12c3a20, async-conn=0x12c3a20) left
Nov 08 14:26:32 corosync [pcmk  ] WARN: route_ais_message: Sending message to 
local.crmd failed: ipc delivery failed (rc=-2)
Nov 08 14:26:32 corosync [pcmk  ] WARN: route_ais_message: Sending message to 
local.cib failed: ipc delivery failed (rc=-2)
Nov 08 14:26:32 corosync [pcmk  ] info: pcmk_ipc: Recorded connection 0x12bebe0 
for stonith-ng/0
Nov 08 14:26:32 corosync [pcmk  ] info: pcmk_ipc: Recorded connection 0x12c2f40 
for attrd/0
Nov 08 14:26:33 corosync [pcmk  ] info: pcmk_ipc: Recorded connection 0x12c72a0 
for cib/0
Nov 08 14:26:33 corosync [pcmk  ] info: pcmk_ipc: Sending membership update 12 
to cib
Nov 08 14:26:33 corosync [pcmk  ] info: pcmk_ipc: Recorded connection 0x12cb600 
for crmd/0
Nov 08 14:26:33 corosync [pcmk  ] info: pcmk_ipc: Sending membership update 12 
to crmd

Output of crm configure show:
node p-sbc3 \
attributes standby="off"
node p-sbc4 \
attributes standby="off"
primitive fs lsb:FSSofia \
op monitor interval="2s" enabled="true" timeout="10s" on-fail="standby" 
\
meta target-role="Started"
primitive fs-ip ocf:heartbeat:IPaddr2 \
params ip="10.100.0.90" nic="eth0:0" cidr_netmask="24" \
op monitor interval="10s"
primitive fs-ip2 ocf:heartbeat:IPaddr2 \
params ip="10.100.0.99" nic="eth0:1" cidr_netmask="24" \
op monitor interval="10s"
group cluster_services fs-ip fs-ip2 fs \
meta target-role="Started"
property $id="cib-bootstrap-options" \
dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
last-lrm-refresh="1348755080" \
no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"


Re: [Pacemaker] How to avoid CRM sending stop when ha.cf gets 2nd node configured

2014-11-10 Thread Lars Ellenberg
On Sat, Nov 08, 2014 at 12:58:36AM +, aridh bose wrote:
> Hi,
> While using heartbeat and pacemaker, is it possible to bring up a first
> node that becomes Master, followed by a second node that should come up
> as Slave, without causing any issues to the first node? Currently, I
> see a couple of problems in achieving this:
>
> 1. Assuming I am not using mcast communication, heartbeat makes me
> configure the second node's info either in ha.cf or in /etc/hosts with
> its associated IP address. Why can't the first node come up by itself
> as Master to start with?
>
> 2. If I update ha.cf with the 2nd node's info and use 'heartbeat -r',
> CRM first sends a stop on the Master before sending a start.
>
> Appreciate any help or pointers.


Regardless of what you do there, or why,
or on which communication stack:

how about you first put pacemaker into "maintenance-mode",
then you do your re-architecting of your cluster,
and once you are satisfied with the new cluster,
you take it out of maintenance mode again?

At least that is one of the intended use cases
for maintenance mode.
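With crmsh that boils down to something like this (a sketch, assuming the crm
shell is available):

crm configure property maintenance-mode=true
# ... adjust ha.cf, restart heartbeat, let the second node join ...
crm configure property maintenance-mode=false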

-- 
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA  and  Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



[Pacemaker] Losing corosync communication clusterwide

2014-11-10 Thread Daniel Dehennin
Hello,

I just had an issue on my Pacemaker setup: my dlm/clvm/gfs2 stack was
blocked.

The “dlm_tool ls” command told me “wait ringid”.

The corosync-* commands hang (like corosync-quorumtool).

The Pacemaker “crm_mon” displays nothing wrong.

I'm using Ubuntu Trusty Tahr:

- corosync 2.3.3-1ubuntu1
- pacemaker 1.1.10+git20130802-1ubuntu2.1

My cluster was manually rebooted.

Any idea how to debug such a situation?
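A few things that can still be collected when the corosync-* tools themselves
hang, sketched as a starting point rather than a recipe (all standard tools,
though whether corosync still answers the blackbox trigger in this state is
not guaranteed):

corosync-blackbox                 # dump corosync's libqb flight recorder
dlm_tool dump | tail -n 50        # dlm_controld's view of the stalled membership change
crm_mon -1                        # Pacemaker's (possibly stale) view of the cluster
gdb -p $(pidof corosync) -batch -ex 'thread apply all bt'   # see where corosync is stuck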

Regards.
-- 
Daniel Dehennin
Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] Fencing dependency between a bare metal host and its VM guests

2014-11-10 Thread Tomasz Kontusz
I think the suggestion was to put shooting the host into the fencing path of a
VM. This way, if you can't get the host to fence the VM (because the host is
already dead), you just check whether the host itself was fenced.

Daniel Dehennin  wrote:
>Andrei Borzenkov  writes:
>
>
>[...]
>
>>> Now I have one issue: when the bare metal host on which the VM is
>>> running dies, the VM is lost and cannot be fenced.
>>> 
>>> Is there a way to make pacemaker ACK the fencing of the VM running
>on a
>>> host when the host is fenced itself?
>>> 
>>
>> Yes, you can define multiple stonith agents and priority between
>them.
>>
>> http://clusterlabs.org/wiki/Fencing_topology
>
>Hello,
>
>If I understand correctly, fencing topology is the way to have several
>fencing devices for a node and try them consecutively until one works.
>
>In my configuration, I group the VM stonith agents with the
>corresponding VM resource, to make them move together[1].
>
>Here is my use case:
>
>1. Resource ONE-Frontend-Group runs on nebula1
>2. nebula1 is fenced
>3. node one-frontend cannot be fenced
>
>Is there a way to say that the life of node one-frontend is tied to
>the state of resource ONE-Frontend?
>
>In that case, when node nebula1 is fenced, pacemaker should be aware
>that resource ONE-Frontend is not running any more, so node
>one-frontend is OFFLINE and not UNCLEAN.
>
>Regards.
>
>Footnotes: 
>[1] 
>http://oss.clusterlabs.org/pipermail/pacemaker/2014-October/022671.html
>
>-- 
>Daniel Dehennin
>Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
>Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF
>
>
>
>
>

-- 
Sent from K-9 Mail.


Re: [Pacemaker] Fencing dependency between a bare metal host and its VM guests

2014-11-10 Thread Daniel Dehennin
Andrei Borzenkov  writes:


[...]

>> Now I have one issue: when the bare metal host on which the VM is
>> running dies, the VM is lost and cannot be fenced.
>> 
>> Is there a way to make pacemaker ACK the fencing of the VM running on a
>> host when the host is fenced itself?
>> 
>
> Yes, you can define multiple stonith agents and priority between them.
>
> http://clusterlabs.org/wiki/Fencing_topology

Hello,

If I understand correctly, fencing topology is the way to have several
fencing devices for a node and try them consecutively until one works.

In my configuration, I group the VM stonith agents with the
corresponding VM resource, to make them move together[1].

Here is my use case:

1. Resource ONE-Frontend-Group runs on nebula1
2. nebula1 is fenced
3. node one-frontend cannot be fenced

Is there a way to say that the life of node one-frontend is tied to
the state of resource ONE-Frontend?

In that case, when node nebula1 is fenced, pacemaker should be aware that
resource ONE-Frontend is not running any more, so node one-frontend is
OFFLINE and not UNCLEAN.

Regards.

Footnotes: 
[1]  http://oss.clusterlabs.org/pipermail/pacemaker/2014-October/022671.html

-- 
Daniel Dehennin
Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF

node $id="1084811078" nebula1
node $id="1084811079" nebula2
node $id="1084811080" nebula3
node $id="108488" quorum \
attributes standby="on"
node $id="108489" one-frontend
primitive ONE-Datastores ocf:heartbeat:Filesystem \
params device="/dev/one-fs/datastores" 
directory="/var/lib/one/datastores" fstype="gfs2" \
op start interval="0" timeout="90" \
op stop interval="0" timeout="100" \
op monitor interval="20" timeout="40"
primitive ONE-Frontend ocf:heartbeat:VirtualDomain \
params config="/var/lib/one/datastores/one/one.xml" \
op start interval="0" timeout="90" \
op stop interval="0" timeout="100" \
utilization cpu="1" hv_memory="1024"
primitive ONE-vg ocf:heartbeat:LVM \
params volgrpname="one-fs" \
op start interval="0" timeout="30" \
op stop interval="0" timeout="30" \
op monitor interval="60" timeout="30"
primitive Quorum-Node ocf:heartbeat:VirtualDomain \
params config="/var/lib/libvirt/qemu/pcmk/quorum.xml" \
op start interval="0" timeout="90" \
op stop interval="0" timeout="100" \
utilization cpu="1" hv_memory="1024"
primitive Stonith-ONE-Frontend stonith:external/libvirt \
params hostlist="one-frontend" hypervisor_uri="qemu:///system" 
pcmk_host_list="one-frontend" pcmk_host_check="static-list" \
op monitor interval="30m"
primitive Stonith-Quorum-Node stonith:external/libvirt \
params hostlist="quorum" hypervisor_uri="qemu:///system" 
pcmk_host_list="quorum" pcmk_host_check="static-list" \
op monitor interval="30m"
primitive Stonith-nebula1-IPMILAN stonith:external/ipmi \
params hostname="nebula1-ipmi" ipaddr="XXX.XXX.XXX.XXX" 
interface="lanplus" userid="USER" passwd="PASSWORD1" passwd_method="env" 
priv="operator" pcmk_host_list="nebula1" pcmk_host_check="static-list" \
op monitor interval="30m" \
meta target-role="Started"
primitive Stonith-nebula2-IPMILAN stonith:external/ipmi \
params hostname="nebula2-ipmi" ipaddr="YYY.YYY.YYY.YYY" 
interface="lanplus" userid="USER" passwd="PASSWORD2" passwd_method="env" 
priv="operator" pcmk_host_list="nebula2" pcmk_host_check="static-list" \
op monitor interval="30m" \
meta target-role="Started"
primitive Stonith-nebula3-IPMILAN stonith:external/ipmi \
params hostname="nebula3-ipmi" ipaddr="ZZZ.ZZZ.ZZZ.ZZZ" 
interface="lanplus" userid="USER" passwd="PASSWORD3" passwd_method="env" 
priv="operator" pcmk_host_list="nebula3" pcmk_host_check="static-list" \
op monitor interval="30m" \
meta target-role="Started"
primitive clvm ocf:lvm2:clvmd \
op start interval="0" timeout="90" \
op stop interval="0" timeout="100" \
op monitor interval="60" timeout="90"
primitive dlm ocf:pacemaker:controld \
op start interval="0" timeout="90" \
op stop interval="0" timeout="100" \
op monitor interval="60" timeout="60"
group ONE-Frontend-Group Stonith-ONE-Frontend ONE-Frontend \
meta target-role="Started"
group ONE-Storage dlm clvm ONE-vg ONE-Datastores
group Quorum-Node-Group Stonith-Quorum-Node Quorum-Node \
meta target-role="Started"
clone ONE-Storage-Clone ONE-Storage \
meta interleave="true" target-role="Started"
location Nebula1-does-not-fence-itslef Stonith-nebula1-IPMILAN \
rule $id="Nebula1-does-not-fence-itslef-rule" 50: #uname eq nebula2 \
rule $id="Nebula1-does-not-fence-itslef-rule-0" 40: #uname eq nebula3
location Nebula2-does-not-fence-itslef Stonith-nebula2-IPMILAN \
rule $id="Nebula2-does-not-fence-itslef-rule" 50: #una