Re: [Pacemaker] CoroSync's UDPu transport for public IP addresses?

2014-12-30 Thread Daniel Dehennin
Dmitry Koterov dmitry.kote...@gmail.com writes:

 Oh, it seems I've found the solution! At least two mistakes were in my
 corosync.conf (BTW the logs did not report any errors, so my conclusion is
 based on my experiments only).

 1. nodelist.node MUST contain only IP addresses. No hostnames! They simply
 do not work: crm status shows no nodes, and there are no warnings in the
 logs about this.

You can add a name like this:

nodelist {
  node {
ring0_addr: public-ip-address-of-the-first-machine
name: node1
  }
  node {
ring0_addr: public-ip-address-of-the-second-machine
name: node2
  }
}

I used it on Ubuntu Trusty with udpu.
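
For reference, here is a minimal corosync.conf sketch along those lines,
combining the named nodelist with the udpu transport (the cluster name, node
IDs and quorum section below are illustrative assumptions, not taken from my
setup):

#+begin_src sh
# Sketch only: write a minimal udpu configuration with named nodes.
cat > /etc/corosync/corosync.conf <<'EOF'
totem {
  version: 2
  cluster_name: example
  transport: udpu
}

nodelist {
  node {
    ring0_addr: public-ip-address-of-the-first-machine
    name: node1
    nodeid: 1
  }
  node {
    ring0_addr: public-ip-address-of-the-second-machine
    name: node2
    nodeid: 2
  }
}

quorum {
  provider: corosync_votequorum
  two_node: 1
}
EOF
#+end_src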

Regards.

-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] Avoid monitoring of resources on nodes

2014-12-04 Thread Daniel Dehennin
Andrew Beekhof and...@beekhof.net writes:

 What version of pacemaker is this?
 Some very old versions wanted the agent to be installed on all nodes.

It's 1.1.10+git20130802-1ubuntu2.1 on Trusty Tahr.

Regards.
-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] Pacemaker fencing and DLM/cLVM

2014-11-28 Thread Daniel Dehennin
Andrew Beekhof and...@beekhof.net writes:

 This was fixed a few months ago:

 + David Vossel (9 months ago) 054fedf: Fix: stonith_api_time_helper now 
 returns when the most recent fencing operation completed  (origin/pr/444)
 + Andrew Beekhof (9 months ago) d9921e5: Fix: Fencing: Pass the correct 
 options when looking up the history by node name 
 + Andrew Beekhof (9 months ago) b0a8876: Log: Fencing: Send details of 
 stonith_api_time() and stonith_api_kick() to syslog 

 It doesn't seem Ubuntu has these patches

Thanks, I just opened a bug report[1].

Footnotes: 
[1]  https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1397278

-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] Avoid monitoring of resources on nodes

2014-11-26 Thread Daniel Dehennin
Daniel Dehennin daniel.dehen...@baby-gnu.org writes:

 I'll try to find out how to make the change directly in the XML.

 Ok, looking at the git history, this feature seems to be available only on
 the master branch and not yet released.

I do not have that feature in my pacemaker version.

Does this sound normal? I have:

- asymmetrical Opt-in cluster[1]

- a group of resources with INFINITY location on a specific node

And the excluded nodes are fenced because of the many monitor errors on
this resource.
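
To see what is driving the fencing, the fail counts can be checked; a quick
check, assuming crm_mon is available on the nodes:

#+begin_src sh
# One-shot cluster status including resource fail counts and failed actions.
crm_mon -1 --failcounts
#+end_src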

Regards.

Footnotes: 
[1]  
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Pacemaker_Explained/_asymmetrical_opt_in_clusters.html

-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




[Pacemaker] Avoid monitoring of resources on nodes

2014-11-25 Thread Daniel Dehennin
Hello,

I have a 4-node cluster and some resources are only installed on 2 of
them.

I set the cluster to asymmetric and use an INFINITY location constraint:

primitive Mysqld upstart:mysql \
op monitor interval=60
primitive OpenNebula-Sunstone-Sysv lsb:opennebula-sunstone \
op monitor interval=60
primitive OpenNebula-Sysv lsb:opennebula \
op monitor interval=60
group OpenNebula Mysqld OpenNebula-Sysv OpenNebula-Sunstone-Sysv \
meta target-role=Started
location OpenNebula-runs-on-Frontend OpenNebula inf: one-frontend
property $id=cib-bootstrap-options \
dc-version=1.1.10-42f2063 \
cluster-infrastructure=corosync \
symmetric-cluster=false \
stonith-enabled=true \
stonith-timeout=30 \
last-lrm-refresh=1416817941 \
no-quorum-policy=stop \
stop-all-resources=off

But I get a lot of failed monitor operations for these resources on the
other nodes, because the resources are not installed there.

Is there a way to completely exclude the resources from those nodes,
including the monitoring?

Regards.

Ubuntu Trusty Tahr (amd64):
- corosync 2.3.3-1ubuntu1
- pacemaker 1.1.10+git20130802-1ubuntu2.1

-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] Avoid monitoring of resources on nodes

2014-11-25 Thread Daniel Dehennin
Daniel Dehennin daniel.dehen...@baby-gnu.org writes:

 Hello,

Hello,

 I have a 4 nodes cluster and some resources are only installed on 2 of
 them.

 I set cluster asymmetry and infinity location:

 primitive Mysqld upstart:mysql \
   op monitor interval=60
 primitive OpenNebula-Sunstone-Sysv lsb:opennebula-sunstone \
   op monitor interval=60
 primitive OpenNebula-Sysv lsb:opennebula \
   op monitor interval=60
 group OpenNebula Mysqld OpenNebula-Sysv OpenNebula-Sunstone-Sysv \
   meta target-role=Started
 location OpenNebula-runs-on-Frontend OpenNebula inf: one-frontend
 property $id=cib-bootstrap-options \
   dc-version=1.1.10-42f2063 \
   cluster-infrastructure=corosync \
   symmetric-cluster=false \
   stonith-enabled=true \
   stonith-timeout=30 \
   last-lrm-refresh=1416817941 \
   no-quorum-policy=stop \
   stop-all-resources=off

 But I have a lot of failing monitoring on other nodes of these resources
 because they are not installed on them.

 Is there a way to completely exclude the resources from nodes, even the
 monitoring?

This causes trouble on my setup: as the resources fail, my nodes all get
fenced.

Any hints?

Regards.
-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] Pacemaker fencing and DLM/cLVM

2014-11-25 Thread Daniel Dehennin
Christine Caulfield ccaul...@redhat.com writes:

 It seems to me that fencing is failing for some reason, though I can't
 tell from the logs exactly why, so you might have to investgate your
 setup for IPMI to see just what is happening (I'm no IPMI expert,
 sorry).

Thanks for looking, but IPMI stonith is actually working; for all the nodes
I tested:

stonith_admin --reboot node

And it works.

 The logs files tell me this though:

 Nov 25 10:56:32 nebula3 dlm_controld[6465]: 1035 fence request
 1084811079 pid 7358 nodedown time 1416909392 fence_all dlm_stonith
 Nov 25 10:56:32 nebula3 dlm_controld[6465]: 1035 fence result
 1084811079 pid 7358 result 1 exit status
 Nov 25 10:56:32 nebula3 dlm_controld[6465]: 1035 fence status
 1084811079 receive 1 from 1084811080 walltime 1416909392 local 1035
 Nov 25 10:56:32 nebula3 dlm_controld[6465]: 1035 fence request
 1084811079 no actor


 Showing a status code '1' from dlm_stonith - the result should be 0 if
 fencing completed succesfully.

But 1084811080 is nebula3 and in its logs I see:

Nov 25 10:56:33 nebula3 stonith-ng[6232]:   notice: can_fence_host_with_device: 
Stonith-nebula2-IPMILAN can fence nebula2: static-list
[...]
Nov 25 10:56:34 nebula3 stonith-ng[6232]:   notice: log_operation: Operation 
'reboot' [7359] (call 4 from crmd.5038) for host 'nebula2' with device 
'Stonith-nebula2-IPMILAN' returned: 0 (OK)
Nov 25 10:56:34 nebula3 stonith-ng[6232]:error: crm_abort: 
crm_glib_handler: Forked child 7376 to record non-fatal assert at logging.c:63 
: Source ID 20 was not found when attempting to remove it
Nov 25 10:56:34 nebula3 stonith-ng[6232]:error: crm_abort: 
crm_glib_handler: Forked child 7377 to record non-fatal assert at logging.c:63 
: Source ID 21 was not found when attempting to remove it
Nov 25 10:56:34 nebula3 stonith-ng[6232]:   notice: remote_op_done: Operation 
reboot of nebula2 by nebula1 for crmd.5038@nebula1.34bed18c: OK
Nov 25 10:56:34 nebula3 crmd[6236]:   notice: tengine_stonith_notify: Peer 
nebula2 was terminated (reboot) by nebula1 for nebula1: OK 
(ref=34bed18c-c395-4de2-b323-e00208cac6c7) by client crmd.5038
Nov 25 10:56:34 nebula3 crmd[6236]:   notice: crm_update_peer_state: 
tengine_stonith_notify: Node nebula2[0] - state is now lost (was (null))

Which tells me that stonith-ng manages to fence the node and notifies
its success.

How could the “returned: 0 (OK)” become “receive 1”?

A logic issue somewhere between stonith-ng and dlm_controld?
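
A hedged way to compare the two views, assuming stonith_admin and dlm_tool
are available on the node:

#+begin_src sh
# What stonith-ng recorded for the fenced node...
stonith_admin --history nebula2 --verbose
# ...versus what dlm_controld saw for its own fence request.
dlm_tool dump | grep -i fence
#+end_src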

Thanks.
-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] Avoid monitoring of resources on nodes

2014-11-25 Thread Daniel Dehennin
David Vossel dvos...@redhat.com writes:

 actually, this is possible now. I am unaware of any configuration tools (pcs 
 or
 crmsh) that support this feature yet though. You might have to edit the cib 
 xml
 manually.

 There's a new 'resource-discovery' option you can set on a location constraint
 that help prevent resources from ever being started or monitored on a node.

 Example: never start or monitor the resource FAKE1 on 18node2.

 <rsc_location id="location-FAKE1-18node2" node="18node2"
  resource-discovery="never" rsc="FAKE1" score="-INFINITY"/>

 There are more examples in this regression test.
 https://github.com/ClusterLabs/pacemaker/blob/master/pengine/test10/resource-discovery.xml#L99

Thanks a lot.

I'll try to find out how to make the change directly in the XML.
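
Since the configuration shells do not know this attribute yet, one way is to
push the raw XML with cibadmin; a sketch, reusing the example constraint above
and assuming a pacemaker version that already understands resource-discovery:

#+begin_src sh
# Create the location constraint directly in the constraints section of the CIB.
cibadmin -C -o constraints -X \
  '<rsc_location id="location-FAKE1-18node2" node="18node2" resource-discovery="never" rsc="FAKE1" score="-INFINITY"/>'
#+end_src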

Regards.
-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] Avoid monitoring of resources on nodes

2014-11-25 Thread Daniel Dehennin
Daniel Dehennin daniel.dehen...@baby-gnu.org writes:

 There's a new 'resource-discovery' option you can set on a location 
 constraint
 that help prevent resources from ever being started or monitored on a node.

 Example: never start or monitor the resource FAKE1 on 18node2.

 <rsc_location id="location-FAKE1-18node2" node="18node2"
  resource-discovery="never" rsc="FAKE1" score="-INFINITY"/>

 There are more examples in this regression test.
 https://github.com/ClusterLabs/pacemaker/blob/master/pengine/test10/resource-discovery.xml#L99

 Thanks a lot.

 I'll try to find out how to make the change directly in the XML.

Ok, looking at the git history, this feature seems to be available only on
the master branch and not yet released.

Thanks.
-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




[Pacemaker] Pacemaker fencing and DLM/cLVM

2014-11-24 Thread Daniel Dehennin
]:   notice: tengine_stonith_notify: Peer 
nebula1 was terminated (reboot) by nebula2 for nebula3: OK 
(ref=50c93bed-e66f-48a5-bd2f-100a9e7ca7a1) by client crmd.6043
Nov 24 09:51:13 nebula3 crmd[6043]:   notice: te_rsc_command: Initiating action 
22: start Stonith-nebula3-IPMILAN_start_0 on nebula2
Nov 24 09:51:14 nebula3 crmd[6043]:   notice: run_graph: Transition 5 
(Complete=11, Pending=0, Fired=0, Skipped=1, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-warn-2.bz2): Stopped
Nov 24 09:51:14 nebula3 pengine[6042]:   notice: process_pe_message: Calculated 
Transition 6: /var/lib/pacemaker/pengine/pe-input-2.bz2
Nov 24 09:51:14 nebula3 crmd[6043]:   notice: te_rsc_command: Initiating action 
21: monitor Stonith-nebula3-IPMILAN_monitor_180 on nebula2
Nov 24 09:51:15 nebula3 crmd[6043]:   notice: run_graph: Transition 6 
(Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-2.bz2): Complete
Nov 24 09:51:15 nebula3 crmd[6043]:   notice: do_state_transition: State 
transition S_TRANSITION_ENGINE - S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Nov 24 09:52:10 nebula3 dlm_controld[6263]: 566 datastores wait for fencing
Nov 24 09:52:10 nebula3 dlm_controld[6263]: 566 clvmd wait for fencing
Nov 24 09:55:10 nebula3 dlm_controld[6263]: 747 fence status 1084811078 receive 
-125 from 1084811079 walltime 1416819310 local 747

When the node is fenced I get “clvmd wait for fencing” and “datastores
wait for fencing” (datastores is my GFS2 volume).

Any idea of what I can check when this happens?
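
A few things that may be worth checking when this happens; a sketch, assuming
dlm_tool, stonith_admin and crm_mon are installed (node name taken from the
logs above):

#+begin_src sh
dlm_tool ls                      # lockspace status (e.g. what it is waiting for)
dlm_tool dump | grep -i fence    # dlm_controld's view of the fence requests
stonith_admin --history nebula1  # what stonith-ng recorded for the fenced node
crm_mon -1 --failcounts          # overall cluster state and failed actions
#+end_src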

Regards.
-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] Pacemaker fencing and DLM/cLVM

2014-11-24 Thread Daniel Dehennin
Michael Schwartzkopff m...@sys4.de writes:

 Yes. You have to tell all the underlying infrastructure to use the fencing of 
 pacemaker. I assume that you are working on a RH clone.

 See: 
 http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/ch08s02s03.html

Sorry, this is my fault.

I'm using Ubuntu 14.04:

- corosync 2.3.3-1ubuntu1
- pacemaker 1.1.10+git20130802-1ubuntu2.1

I thought everything was integrated in such a configuration.
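
For reference, a minimal sketch of running the DLM/cLVM stack as
Pacemaker-managed clones, using the agent names from the configuration posted
elsewhere in this thread (timeouts are illustrative):

#+begin_src sh
crm configure primitive dlm ocf:pacemaker:controld \
    op monitor interval=60 timeout=60
crm configure primitive clvm ocf:lvm2:clvmd \
    op monitor interval=60 timeout=90
crm configure group dlm-clvm dlm clvm
crm configure clone dlm-clvm-clone dlm-clvm meta interleave=true
#+end_src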

Regards.

-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] TOTEM Retransmit list in logs when a node gets up

2014-11-14 Thread Daniel Dehennin
Christine Caulfield ccaul...@redhat.com writes:


[...]

 If its only happening at startup it could be the switch/router
 learning the ports for the nodes and building its routing
 tables. Switching to udpu will then get rid of the message if it's
 annoying

Switching to udpu makes it work correctly.

Thanks.
-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




[Pacemaker] TOTEM Retransmit list in logs when a node gets up

2014-11-13 Thread Daniel Dehennin
Hello,

My cluster seems to work correctly, but when I start corosync and
pacemaker on one of the nodes[1] I start to see TOTEM logs like this:

#+begin_src
Nov 13 14:00:10 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4e 4f
Nov 13 14:00:10 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 46 47 48 49 
4a 4b 4c 4d 4e 4f
Nov 13 14:00:10 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4b 4c 4d 4e 
4f
Nov 13 14:00:30 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 47 48 49 4a 
4b 4c 4d 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 47 48 49 4a 
4b 4c 4d 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4b 4c 4d 4e 
4f
Nov 13 14:00:30 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4c 4d 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4c 4d 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4d 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 47 48 49 4a 
4b 4c 4d 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4a 4b 4c 4d 
4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4b 4c 4d 4e 
4f
#+end_src

I do not understand what is happening; do you have any hints?
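
Two quick checks that may help narrow it down, assuming the standard
corosync 2.x tools are installed:

#+begin_src sh
corosync-cfgtool -s            # ring status as seen by the local node
corosync-cmapctl | grep totem  # effective totem settings (token, netmtu, ...)
#+end_src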

Regards.

Footnotes: 
[1]  the VM using two cards 
http://oss.clusterlabs.org/pipermail/pacemaker/2014-November/022962.html

-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] Fencing dependency between bare metal host and its VMs guest

2014-11-10 Thread Daniel Dehennin
Andrei Borzenkov arvidj...@gmail.com writes:


[...]

 Now I have one issue, when the bare metal host on which the VM is
 running die, the VM is lost and can not be fenced.
 
 Is there a way to make pacemaker ACK the fencing of the VM running on a
 host when the host is fenced itself?
 

 Yes, you can define multiple stonith agents and priority between them.

 http://clusterlabs.org/wiki/Fencing_topology

Hello,

If I understand correctly, fencing topology is the way to have several
fencing devices for a node and try them consecutively until one works.

In my configuration, I group the VM stonith agents with the
corresponding VM resource, to make them move together[1].

Here is my use case:

1. Resource ONE-Frontend-Group runs on nebula1
2. nebula1 is fenced
3. node one-frontend cannot be fenced

Is there a way to say that the life of node one-frontend is tied to
the state of the resource ONE-Frontend?

In that case, when node nebula1 is fenced, pacemaker should be aware that
the resource ONE-Frontend is not running any more, so node one-frontend is
OFFLINE rather than UNCLEAN.
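
One possible reading of the fencing-topology suggestion, sketched in crmsh
syntax with the device names from the attached configuration; whether a
host-level IPMI device is an acceptable second level for a guest node is
exactly the open question here:

#+begin_src sh
# Level 1: the libvirt device for the guest; level 2: the IPMI device of one
# host that may run it. Sketch only -- it does not follow the VM around.
crm configure fencing_topology \
    one-frontend: Stonith-ONE-Frontend Stonith-nebula1-IPMILAN
#+end_src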

Regards.

Footnotes: 
[1]  http://oss.clusterlabs.org/pipermail/pacemaker/2014-October/022671.html

-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF

node $id=1084811078 nebula1
node $id=1084811079 nebula2
node $id=1084811080 nebula3
node $id=108488 quorum \
attributes standby=on
node $id=108489 one-frontend
primitive ONE-Datastores ocf:heartbeat:Filesystem \
params device=/dev/one-fs/datastores 
directory=/var/lib/one/datastores fstype=gfs2 \
op start interval=0 timeout=90 \
op stop interval=0 timeout=100 \
op monitor interval=20 timeout=40
primitive ONE-Frontend ocf:heartbeat:VirtualDomain \
params config=/var/lib/one/datastores/one/one.xml \
op start interval=0 timeout=90 \
op stop interval=0 timeout=100 \
utilization cpu=1 hv_memory=1024
primitive ONE-vg ocf:heartbeat:LVM \
params volgrpname=one-fs \
op start interval=0 timeout=30 \
op stop interval=0 timeout=30 \
op monitor interval=60 timeout=30
primitive Quorum-Node ocf:heartbeat:VirtualDomain \
params config=/var/lib/libvirt/qemu/pcmk/quorum.xml \
op start interval=0 timeout=90 \
op stop interval=0 timeout=100 \
utilization cpu=1 hv_memory=1024
primitive Stonith-ONE-Frontend stonith:external/libvirt \
params hostlist=one-frontend hypervisor_uri=qemu:///system 
pcmk_host_list=one-frontend pcmk_host_check=static-list \
op monitor interval=30m
primitive Stonith-Quorum-Node stonith:external/libvirt \
params hostlist=quorum hypervisor_uri=qemu:///system 
pcmk_host_list=quorum pcmk_host_check=static-list \
op monitor interval=30m
primitive Stonith-nebula1-IPMILAN stonith:external/ipmi \
params hostname=nebula1-ipmi ipaddr=XXX.XXX.XXX.XXX 
interface=lanplus userid=USER passwd=PASSWORD1 passwd_method=env 
priv=operator pcmk_host_list=nebula1 pcmk_host_check=static-list \
op monitor interval=30m \
meta target-role=Started
primitive Stonith-nebula2-IPMILAN stonith:external/ipmi \
params hostname=nebula2-ipmi ipaddr=YYY.YYY.YYY.YYY 
interface=lanplus userid=USER passwd=PASSWORD2 passwd_method=env 
priv=operator pcmk_host_list=nebula2 pcmk_host_check=static-list \
op monitor interval=30m \
meta target-role=Started
primitive Stonith-nebula3-IPMILAN stonith:external/ipmi \
params hostname=nebula3-ipmi ipaddr=ZZZ.ZZZ.ZZZ.ZZZ 
interface=lanplus userid=USER passwd=PASSWORD3 passwd_method=env 
priv=operator pcmk_host_list=nebula3 pcmk_host_check=static-list \
op monitor interval=30m \
meta target-role=Started
primitive clvm ocf:lvm2:clvmd \
op start interval=0 timeout=90 \
op stop interval=0 timeout=100 \
op monitor interval=60 timeout=90
primitive dlm ocf:pacemaker:controld \
op start interval=0 timeout=90 \
op stop interval=0 timeout=100 \
op monitor interval=60 timeout=60
group ONE-Frontend-Group Stonith-ONE-Frontend ONE-Frontend \
meta target-role=Started
group ONE-Storage dlm clvm ONE-vg ONE-Datastores
group Quorum-Node-Group Stonith-Quorum-Node Quorum-Node \
meta target-role=Started
clone ONE-Storage-Clone ONE-Storage \
meta interleave=true target-role=Started
location Nebula1-does-not-fence-itslef Stonith-nebula1-IPMILAN \
rule $id=Nebula1-does-not-fence-itslef-rule 50: #uname eq nebula2 \
rule $id=Nebula1-does-not-fence-itslef-rule-0 40: #uname eq nebula3
location Nebula2-does-not-fence-itslef Stonith-nebula2-IPMILAN \
rule $id=Nebula2-does-not-fence-itslef-rule 50: #uname eq nebula3 \
rule $id=Nebula2-does-not-fence-itslef-rule-0 40: #uname eq nebula1
location Nebula3-does-not-fence-itslef Stonith-nebula3-IPMILAN \
rule $id=Nebula3-does

[Pacemaker] Losing corosync communication clusterwide

2014-11-10 Thread Daniel Dehennin
Hello,

I just had an issue on my pacemaker setup: my dlm/clvm/gfs2 stack was
blocked.

The “dlm_tool ls” command told me “wait ringid”.

The corosync-* commands hangs (like corosync-quorumtool).

The pacemaker “crm_mon” display nothing wrong.

I'm using Ubuntu Trusty Tahr:

- corosync 2.3.3-1ubuntu1
- pacemaker 1.1.10+git20130802-1ubuntu2.1

My cluster was manually rebooted.

Any idea how to debug such a situation?

Regards.
-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] Losing corosync communication clusterwide

2014-11-10 Thread Daniel Dehennin
emmanuel segura emi2f...@gmail.com writes:

 I think, you don't have fencing configured in your cluster.

I have fencing configured and working, modulo fencing VMs on a dead host[1].

Regards.

Footnotes: 
[1]  http://oss.clusterlabs.org/pipermail/pacemaker/2014-November/022965.html

-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] Losing corosync communication clusterwide

2014-11-10 Thread Daniel Dehennin
Tomasz Kontusz tomasz.kont...@gmail.com writes:

 Hanging corosync sounds like libqb problems: trusty comes with 0.16,
 which likes to hang from time to time. Try building libqb 0.17.

Thanks, I'll look at this.

Is there a way to get back to a normal state without rebooting all the
machines and interrupting services?

I thought about a lightweight version of something like:

1. stop pacemaker on all nodes without doing anything with resources,
   they all continue to work
   
2. stop corosync on all nodes

3. start corosync on all nodes

4. start pacemaker on all nodes; as the services are still running, nothing
   needs to be done

I looked in the documentation but failed to find any cluster
management best practices.
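
A hedged sketch of that sequence using maintenance-mode, assuming the CIB is
still writable and the Trusty init scripts are in use (adjust the node list to
your cluster); maintenance-mode tells Pacemaker to leave the running resources
alone:

#+begin_src sh
crm configure property maintenance-mode=true
for node in nebula1 nebula2 nebula3; do
    ssh "$node" 'service pacemaker stop && service corosync stop'
done
for node in nebula1 nebula2 nebula3; do
    ssh "$node" 'service corosync start && service pacemaker start'
done
crm configure property maintenance-mode=false
#+end_src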

Regards.
-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




[Pacemaker] [SOLVED] Re: Multicast corosync packets and default route

2014-11-07 Thread Daniel Dehennin
Daniel Dehennin daniel.dehen...@baby-gnu.org writes:

 Daniel Dehennin daniel.dehen...@baby-gnu.org writes:

 Hello,


 [...]

 I only manage to have my VM become a corosync member like the others when
 the default route is on the same interface as my multicast traffic.

[...]


Using tcpdump, I found the difference between the single-card VM and the
multi-card VM.

When using multiple cards, I need to force the IGMP version since my
physical switches do not support IGMPv3.

It looks like the kernel uses IGMPv3 to register any local IP address
with the multicast group.


Single card VM:

No.  Time  Source   Destination  Protocol Info
  2  0.000985  192.168.231.110  226.94.1.1   IGMPv2   Membership Report 
group 226.94.1.1

Frame 2: 46 bytes on wire (368 bits), 46 bytes captured (368 bits)
Ethernet II, Src: RealtekU_03:6d:2d (52:54:00:03:6d:2d), Dst: 
IPv4mcast_5e:01:01 (01:00:5e:5e:01:01)
Internet Protocol Version 4, Src: 192.168.231.110 (192.168.231.110), Dst: 
226.94.1.1 (226.94.1.1)
Internet Group Management Protocol
[IGMP Version: 2]
Type: Membership Report (0x16)
Max Resp Time: 0,0 sec (0x00)
Header checksum: 0x06a0 [correct]
Multicast Address: 226.94.1.1 (226.94.1.1)

Multicard VM:

No.  Time  Source   Destination  Protocol Info
  2  0.004419  192.168.231.111  224.0.0.22   IGMPv3   Membership Report / 
Join group 226.94.1.1 for any sources

Frame 2: 54 bytes on wire (432 bits), 54 bytes captured (432 bits)
Ethernet II, Src: RealtekU_dc:b6:92 (52:54:00:dc:b6:92), Dst: IPv4mcast_16 
(01:00:5e:00:00:16)
Internet Protocol Version 4, Src: 192.168.231.111 (192.168.231.111), Dst: 
224.0.0.22 (224.0.0.22)
Internet Group Management Protocol
[IGMP Version: 3]
Type: Membership Report (0x22)
Header checksum: 0xf69e [correct]
Num Group Records: 1
Group Record : 226.94.1.1  Change To Exclude Mode
Record Type: Change To Exclude Mode (4)
Aux Data Len: 0
Num Src: 0
Multicast Address: 226.94.1.1 (226.94.1.1)

So I force the IGMP version for all interfaces with the following:

sysctl -w net.ipv4.conf.all.force_igmp_version=2
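
To make this survive a reboot, the setting can also go into sysctl.d (a
sketch; the file name is arbitrary):

#+begin_src sh
echo 'net.ipv4.conf.all.force_igmp_version = 2' > /etc/sysctl.d/60-igmpv2.conf
sysctl --system   # reload all sysctl configuration files
#+end_src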

Now my dual-card VM is part of the ring:

root@nebula3:~# corosync-quorumtool
Quorum information
--
Date: Fri Nov  7 16:32:34 2014
Quorum provider:  corosync_votequorum
Nodes:5
Node ID:  1084811080
Ring ID:  20624
Quorate:  Yes

Votequorum information
--
Expected votes:   5
Highest expected: 5
Total votes:  5
Quorum:   3
Flags:Quorate WaitForAll LastManStanding

Membership information
--
Nodeid  Votes Name
1084811078  1 nebula1.eole.lan
1084811079  1 nebula2.eole.lan
1084811080  1 nebula3.eole.lan (local)
108488  1 quorum.eole.lan
108489  1 one-frontend.eole.lan

-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




[Pacemaker] Fencing dependency between bare metal host and its VMs guest

2014-11-07 Thread Daniel Dehennin
Hello,

I finally managed to integrate my VM into corosync, and my dlm/cLVM/GFS2
stack is running on it.

Now I have one issue: when the bare metal host on which the VM is
running dies, the VM is lost and cannot be fenced.

Is there a way to make pacemaker acknowledge the fencing of the VM running
on a host when the host itself is fenced?

Regards.

-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




[Pacemaker] Multicast corosync packets and default route

2014-11-06 Thread Daniel Dehennin
Hello,

I'm trying to set up a pacemaker/corosync cluster on Ubuntu Trusty to access a
SAN for use with OpenNebula[1]:

- pacemaker 1.1.10+git20130802-1ubuntu2.1
- corosync 2.3.3-1ubuntu1

I have a dedicated VLAN for cluster communications.

Each bare metal node has a dedicated interface eth0 on that VLAN; 3
other interfaces are used as a bond0 integrated into an Open vSwitch as a
VLAN trunk.

One VM has two interfaces on this Open vSwitch:

- one for cluster communication
- one to provide services, with default route on it

My 3 bare metal nodes are OK, with pacemaker up and running dlm/cLVM/GFS2,
but my VM is always isolated.

I set up a dedicated quorum VM (standby=on) with a single interface
plugged into the cluster communication VLAN, and it works
(corosync/pacemaker).

I ran ssmping to debug multicast communication and found that the VM can
only reach the bare metal nodes with unicast pings.

I ended up adding a route for multicast:

ip route add 224.0.0.0/4 dev eth1 src 192.168.1.111

But it does not work.

I only manage to have my VM become a corosync member like the others when
the default route is on the same interface as my multicast traffic.

I'm sure there is something I do not understand about corosync and
multicast communication; do you have any hints?
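
One more way to test multicast between the members, assuming the omping
package is available; the multicast group and node names below are
placeholders to adapt to your corosync.conf, and the command has to run on
every node at the same time:

#+begin_src sh
# Each node reports both unicast and multicast reachability to the others.
omping -m 226.94.1.1 nebula1 nebula2 nebula3 one-frontend
#+end_src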

Regards.

Footnotes: 
[1]  http://opennebula.org/

-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] Fencing of movable VirtualDomains

2014-10-13 Thread Daniel Dehennin
Andrew Beekhof and...@beekhof.net writes:


[...]

 Is the ipaddr for each device really the same?  If so, why not use a
 single 'resource'?

No, sorry, the IP addr was not the same.

 Also, 1.1.7 wasn't as smart as 1.1.12 when it came to deciding which fencing 
 device to use.

 Likely you'll get the behaviour you want with a version upgrade.

I'll do that this week.

Regards.
-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] Fencing of movable VirtualDomains

2014-10-07 Thread Daniel Dehennin
Andrew Beekhof and...@beekhof.net writes:

 Maybe not, the colocation should be sufficient, but even without the
 orders, fencing of the unclean VMs is attempted with the other stonith
 devices.

 Which other devices?  The config you sent through didn't have any
 others.

Sorry, I sent it to the linux-cluster mailing list but not here; I attach it.

 I'll switch to a newer corosync/pacemaker and use pacemaker_remote if
 I can manage dlm/cLVM/OCFS2 with it.

 No can do.  All three services require corosync on the node. 

OK, so pacemaker_remote is useless in my case, but upgrading seems
required[1] since the Wheezy software stack looks too old.

Thanks.

Footnotes: 
[1]  http://article.gmane.org/gmane.linux.redhat.cluster/22963

-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF

node nebula1
node nebula2
node nebula3
node one
node quorum \
attributes standby=on
primitive ONE-Frontend ocf:heartbeat:VirtualDomain \
params config=/var/lib/one/datastores/one/one.xml \
op start interval=0 timeout=90 \
op stop interval=0 timeout=100 \
meta target-role=Stopped
primitive ONE-OCFS2-datastores ocf:heartbeat:Filesystem \
params device=/dev/one-fs/datastores 
directory=/var/lib/one/datastores fstype=ocfs2 \
op start interval=0 timeout=90 \
op stop interval=0 timeout=100 \
op monitor interval=20 timeout=40
primitive ONE-vg ocf:heartbeat:LVM \
params volgrpname=one-fs \
op start interval=0 timeout=30 \
op stop interval=0 timeout=30 \
op monitor interval=60 timeout=30
primitive Quorum-Node ocf:heartbeat:VirtualDomain \
params config=/var/lib/libvirt/qemu/pcmk/quorum.xml \
op start interval=0 timeout=90 \
op stop interval=0 timeout=100 \
meta target-role=Started
primitive Stonith-ONE-Frontend stonith:external/libvirt \
params hostlist=one hypervisor_uri=qemu:///system 
pcmk_host_list=one pcmk_host_check=static-list \
op monitor interval=30m \
meta target-role=Started
primitive Stonith-Quorum-Node stonith:external/libvirt \
params hostlist=quorum hypervisor_uri=qemu:///system 
pcmk_host_list=quorum pcmk_host_check=static-list \
op monitor interval=30m \
meta target-role=Started
primitive Stonith-nebula1-IPMILAN stonith:external/ipmi \
params hostname=nebula1-ipmi ipaddr=A.B.C.D interface=lanplus 
userid=user passwd=X passwd_method=env priv=operator 
pcmk_host_list=nebula1 pcmk_host_check=static-list priority=10 \
op monitor interval=30m \
meta target-role=Started
primitive Stonith-nebula2-IPMILAN stonith:external/ipmi \
params hostname=nebula2-ipmi ipaddr=A.B.C.D interface=lanplus 
userid=user passwd=X passwd_method=env priv=operator 
pcmk_host_list=nebula2 pcmk_host_check=static-list priority=20 \
op monitor interval=30m \
meta target-role=Started
primitive Stonith-nebula3-IPMILAN stonith:external/ipmi \
params hostname=nebula3-ipmi ipaddr=A.B.C.D interface=lanplus 
userid=user passwd=X passwd_method=env priv=operator 
pcmk_host_list=nebula3 pcmk_host_check=static-list priority=30 \
op monitor interval=30m \
meta target-role=Started
primitive clvm ocf:lvm2:clvm \
op start interval=0 timeout=90 \
op stop interval=0 timeout=90 \
op monitor interval=60 timeout=90
primitive dlm ocf:pacemaker:controld \
op start interval=0 timeout=90 \
op stop interval=0 timeout=100 \
op monitor interval=60 timeout=60
primitive o2cb ocf:pacemaker:o2cb \
params stack=pcmk daemon_timeout=30 \
op start interval=0 timeout=90 \
op stop interval=0 timeout=100 \
op monitor interval=60 timeout=60
group ONE-Storage dlm o2cb clvm ONE-vg ONE-OCFS2-datastores
clone ONE-Storage-Clone ONE-Storage \
meta interleave=true target-role=Started
location Nebula1-does-not-fence-itslef Stonith-nebula1-IPMILAN \
rule $id=Nebula1-does-not-fence-itslef-rule inf: #uname ne nebula1
location Nebula2-does-not-fence-itslef Stonith-nebula2-IPMILAN \
rule $id=Nebula2-does-not-fence-itslef-rule inf: #uname ne nebula2
location Nebula3-does-not-fence-itslef Stonith-nebula3-IPMILAN \
rule $id=Nebula3-does-not-fence-itslef-rule inf: #uname ne nebula3
location Nodes-with-ONE-Storage ONE-Storage-Clone \
rule $id=Nodes-with-ONE-Storage-rule inf: #uname eq nebula1 or #uname 
eq nebula2 or #uname eq nebula3 or #uname eq one
location ONE-Fontend-fenced-by-hypervisor Stonith-ONE-Frontend \
rule $id=ONE-Fontend-fenced-by-hypervisor-rule inf: #uname ne quorum 
or #uname ne one
location ONE-Frontend-run-on-hypervisor ONE-Frontend \
rule $id=ONE-Frontend-run-on-hypervisor-rule 40: #uname eq nebula1 \
rule $id=ONE-Frontend-run-on-hypervisor-rule-0 30: #uname eq nebula2 \
rule $id=ONE-Frontend-run

Re: [Pacemaker] Fencing of movable VirtualDomains

2014-10-06 Thread Daniel Dehennin
Andrew Beekhof and...@beekhof.net writes:

 It may be due to two “order”:
 
 #+begin_src
 order ONE-Frontend-after-its-Stonith inf: Stonith-ONE-Frontend ONE-Frontend
 order Quorum-Node-after-its-Stonith inf: Stonith-Quorum-Node Quorum-Node
 #+end_src

 Probably. Any particular reason for them to exist?

Maybe not, the colocation should be sufficient, but even without the
orders, fencing of the unclean VMs is attempted with the other stonith
devices.

I'll switch to a newer corosync/pacemaker and use pacemaker_remote if
I can manage dlm/cLVM/OCFS2 with it.

Regards.

-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




[Pacemaker] Fencing of movable VirtualDomains

2014-10-02 Thread Daniel Dehennin
Hello,

I'm setting up a 3-node OpenNebula[1] cluster on Debian Wheezy using a
SAN for shared storage and KVM as the hypervisor.

The OpenNebula frontend is a VM, for HA[2].

I had some quorum issues when the node running the frontend dies, as the
two other nodes lose quorum, so I added a pure quorum node in
standby=on mode.

My physical hosts are fenced using stonith:external/ipmi, which works
great: one stonith device per node with an anti-location constraint on
itself.

I have more trouble fencing the VMs since they can move.

I tried to define a stonith device per VM and colocate it with the VM
itself like this:

#+begin_src
primitive ONE-Frontend ocf:heartbeat:VirtualDomain \
params config=/var/lib/one/datastores/one/one.xml \
op start interval=0 timeout=90 \
op stop interval=0 timeout=100 \
meta target-role=Stopped
primitive Quorum-Node ocf:heartbeat:VirtualDomain \
params config=/var/lib/one/datastores/one/quorum.xml \
op start interval=0 timeout=90 \
op stop interval=0 timeout=100 \
meta target-role=Started is-managed=true
primitive Stonith-Quorum-Node stonith:external/libvirt \
params hostlist=quorum hypervisor_uri=qemu:///system
pcmk_host_list=quorum pcmk_host_check=static-list \
op monitor interval=30m \
meta target-role=Started
location ONE-Fontend-fenced-by-hypervisor Stonith-ONE-Frontend \
rule $id=ONE-Fontend-fenced-by-hypervisor-rule inf: #uname ne quorum 
or #uname ne one
location ONE-Frontend-run-on-hypervisor ONE-Frontend \
rule $id=ONE-Frontend-run-on-hypervisor-rule 20: #uname eq nebula1 \
rule $id=ONE-Frontend-run-on-hypervisor-rule-0 30: #uname eq nebula2 \
rule $id=ONE-Frontend-run-on-hypervisor-rule-1 40: #uname eq nebula3
location Quorum-Node-fenced-by-hypervisor Stonith-Quorum-Node \
rule $id=Quorum-Node-fenced-by-hypervisor-rule inf: #uname ne quorum 
or #uname ne one
location Quorum-Node-run-on-hypervisor Quorum-Node \
rule $id=Quorum-Node-run-on-hypervisor-rule 50: #uname eq nebula1 \
rule $id=Quorum-Node-run-on-hypervisor-rule-0 40: #uname eq nebula2 \
rule $id=Quorum-Node-run-on-hypervisor-rule-1 30: #uname eq nebula3
colocation Fence-ONE-Frontend-on-its-hypervisor inf: ONE-Frontend
Stonith-ONE-Frontend
colocation Fence-Quorum-Node-on-its-hypervisor inf: Quorum-Node
Stonith-Quorum-Node
property $id=cib-bootstrap-options \
dc-version=1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff \
cluster-infrastructure=openais \
expected-quorum-votes=5 \
stonith-enabled=true \
last-lrm-refresh=1412242734 \
stonith-timeout=30 \
symmetric-cluster=false
#+end_src

But I cannot start the Quorum-Node resource; I get the following in the logs:

#+begin_src
info: can_fence_host_with_device: Stonith-nebula2-IPMILAN can not fence quorum: 
static-list
#+end_src

All the examples I found describe a configuration where each VM stays on
a single hypervisor, in which case libvirt is configured to listen on
TCP and the “hypervisor_uri” points to it.

Does someone have ideas on configuring stonith:external/libvirt for
movable VMs?
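
One approach, used later in this thread (see the 2014-11-10 configuration),
is to keep each per-VM stonith device on the same node as its VM by grouping
them, so hypervisor_uri=qemu:///system always points at the hypervisor
actually hosting the guest; a sketch in crmsh syntax with the resource names
above:

#+begin_src sh
crm configure group ONE-Frontend-Group Stonith-ONE-Frontend ONE-Frontend
crm configure group Quorum-Node-Group Stonith-Quorum-Node Quorum-Node
#+end_src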

Regards.

Footnotes: 
[1]  http://opennebula.org/

[2]  
http://docs.opennebula.org/4.8/advanced_administration/high_availability/oneha.html

-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] Fencing of movable VirtualDomains

2014-10-02 Thread Daniel Dehennin
emmanuel segura emi2f...@gmail.com writes:

 for guest fencing you can use, something like this
 http://www.daemonzone.net/e/3/, rather to have a full cluster stack in
 your guest, you can try to use pacemaker-remote for your virtual guest

I think it could be done for the pure quorum node, but my other node
needs to access the cLVM and OCFS2 resources.

After some problems with cLVM blocking, even when the cluster was quorate,
I saw that “Stonith-Quorum-Node” and “Stonith-ONE-Frontend” were
started only when I asked to start the respective VirtualDomain.

It may be due to these two “order” constraints:

#+begin_src
order ONE-Frontend-after-its-Stonith inf: Stonith-ONE-Frontend ONE-Frontend
order Quorum-Node-after-its-Stonith inf: Stonith-Quorum-Node Quorum-Node
#+end_src

Now, it seems I mostly have dragons in DLM/o2cb/cLVM in my VM :-/

Regards.
-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF

