Re: [Pacemaker] Two node cluster and no hardware device for stonith.

2015-02-02 Thread Andrea
Hi,

I tested a network failure and it works.
During the failure, each node tries to fence the other node.
When the network comes back, the node with the network problem is fenced and rebooted.
Moreover, cman is killed on one node: typically cman on node1 kills cman on
node2. So I have 2 situations:

1) Network failure on node2
When the network comes back, node2 is fenced and cman on node2 is killed.
The watchdog script checks the key registration and reboots node2.
After the reboot, the cluster comes back with 2 nodes up.

2) Network failure on node1
When the network comes back, node1 is fenced, and cman on node2 is killed
(the cluster is down!).
The watchdog script checks the key registration and reboots node1.
During the reboot the cluster is offline, because node1 is rebooting and cman
on node2 has been killed.
After the reboot, node1 is up and fences node2. Now the watchdog reboots node2.
After that reboot, the cluster comes back with 2 nodes up.
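Andrea does not post her watchdog script. As a rough illustration of what "check for key registration" can mean, here is a minimal Python sketch, assuming the shared disk is queried with sg_persist --in --read-keys (from sg3_utils) and that each node knows its own registration key. The device path, key values, function names and the forced reboot are hypothetical choices, not her script:

```python
from __future__ import annotations

import re
import subprocess

# Assumption: each registered key appears on its own line as "0x<hex>" in the
# output of `sg_persist --in --read-keys`; verify against your sg3_utils version.
_KEY_LINE = re.compile(r"0x[0-9a-fA-F]+")


def registered_keys(sg_persist_output: str) -> set[str]:
    """Return the reservation keys listed in sg_persist output, lower-cased."""
    keys = set()
    for line in sg_persist_output.splitlines():
        token = line.strip()
        if _KEY_LINE.fullmatch(token):
            keys.add(token.lower())
    return keys


def node_is_fenced(sg_persist_output: str, local_key: str) -> bool:
    """With fence_scsi, a node whose key is no longer registered has been fenced."""
    return local_key.lower() not in registered_keys(sg_persist_output)


def check_and_maybe_reboot(device: str, local_key: str, dry_run: bool = True) -> bool:
    """Hypothetical watchdog step: read the keys on `device`, reboot if ours is gone.

    Returns True when the local node is considered fenced.
    With dry_run=True nothing is rebooted; the result is only reported.
    """
    result = subprocess.run(
        ["sg_persist", "--in", "--read-keys", device],
        capture_output=True, text=True, check=True,
    )
    fenced = node_is_fenced(result.stdout, local_key)
    if fenced and not dry_run:
        # "reboot -f" skips the orderly service shutdown (hypothetical policy choice).
        subprocess.run(["reboot", "-f"], check=False)
    return fenced
```

Called periodically (e.g. from a watchdog or timer) with dry_run=False, this would reboot a node as soon as its key disappears from the shared device; with the default dry_run=True it only reports the state.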


The only problem is the downtime in situation 2, but it is acceptable in my
context.
I created my fence device with this command:
[ONE] pcs stonith create scsi fence_scsi pcmk_host_list="serverHA1 serverHA2" pcmk_reboot_action=off meta provides=unfencing --force
as described here:
https://access.redhat.com/articles/530533
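Since the pcmk_host_list value contains a space, it has to be quoted in the shell to reach pcs as a single argument. A small Python sketch that builds the same argument list and lets shlex.join() (Python 3.8+) do the quoting; only the parameters from the command in this post are used, and the helper name is hypothetical:

```python
from __future__ import annotations

import shlex


def fence_scsi_create_argv(name: str, hosts: list[str]) -> list[str]:
    """Build the pcs arguments from the post; the host list stays ONE argument."""
    return [
        "pcs", "stonith", "create", name, "fence_scsi",
        "pcmk_host_list=" + " ".join(hosts),
        "pcmk_reboot_action=off",
        "meta", "provides=unfencing",
        "--force",
    ]


if __name__ == "__main__":
    argv = fence_scsi_create_argv("scsi", ["serverHA1", "serverHA2"])
    # shlex.join quotes any argument containing shell metacharacters such as spaces.
    print(shlex.join(argv))
```

Passing argv straight to subprocess.run() avoids the shell, and therefore the quoting problem, altogether.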


If possible, I will test fence_vmware (without the watchdog script) and
post my results here.

Thanks to all,
Andrea




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] IPMI stonith resource gets stuck

2015-02-02 Thread Marek marx Grac


On 01/30/2015 05:03 PM, Jérôme Charaoui wrote:


Thank you for looking at this, much appreciated.

The timeout issue intrigued me because I had noticed ipmitool sometimes
taking over 10 seconds attempting to execute an action on a
non-responding IPMI device over the lanplus interface.


So I had a look at the ipmi stonith plugin code and the ipmitool 
manpage itself and noticed this little gem in the latter:


-R count  Set the number of retries for lan/lanplus interface (default=4).


I then went ahead and added -R 1 in the plugin's ipmitool_opts 
variable, and my problem went away!


If you use the fence agent fence_ipmilan, you can set this with retry_on
(or --retry-on X when passing options on the command line).
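For reference, a minimal Python sketch of the same -R fix applied to a direct ipmitool call rather than to the stonith plugin's ipmitool_opts. The host, credentials, the accepted set of power actions and the 15-second subprocess timeout are placeholders/assumptions; fence_ipmilan users would use retry_on as described above instead:

```python
from __future__ import annotations

import subprocess

# Power actions accepted by this sketch (a subset of `ipmitool chassis power`).
_ACTIONS = {"status", "on", "off", "cycle", "reset"}


def ipmitool_power_argv(host: str, user: str, password: str,
                        action: str = "status", retries: int = 1) -> list[str]:
    """Build an ipmitool chassis power command over lanplus with fewer retries.

    -R sets the number of lan/lanplus retries (ipmitool's default is 4).
    """
    if action not in _ACTIONS:
        raise ValueError(f"unsupported power action: {action}")
    return [
        "ipmitool", "-I", "lanplus",
        "-H", host, "-U", user, "-P", password,
        "-R", str(retries),
        "chassis", "power", action,
    ]


def ipmi_power(host: str, user: str, password: str, action: str = "status",
               retries: int = 1, timeout: float = 15.0) -> subprocess.CompletedProcess:
    """Run the command; the subprocess timeout is an extra safety net, not an ipmitool option."""
    argv = ipmitool_power_argv(host, user, password, action, retries)
    return subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
```

Note that -P exposes the password on the process list; ipmitool's -E option, which reads it from the IPMI_PASSWORD environment variable, is an alternative.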


m,



Re: [Pacemaker] HA Summit Key-signing Party (was: Organizing HA Summit 2015)

2015-02-02 Thread Jan Pokorný
On 26/01/15 15:14 +0100, Jan Pokorný wrote:
 Timeline?
 Best if you send me your public keys before 2015-02-02.  I will then
 compile a list of the attendees together with their keys and publish
 it at https://people.redhat.com/jpokorny/keysigning/2015-ha/
 so you can print it out and be ready for the party.
 
 Thanks for your cooperation; I'm looking forward to this side-event and
 hope it will be beneficial to all involved.

Thanks for participating.

Please print out
https://people.redhat.com/jpokorny/keysigning/2015-ha/complete.html
(preferably in landscape format) and check your own fingerprints on it
beforehand, prepare your ID document, and you are ready to proceed to
the signing event, which is currently planned for 2015-02-05 16:30 CET:
http://plan.alteeve.ca/index.php/Main_Page#Feb_5th
(I'll post an update should it change).
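If you want to double-check the fingerprint on the printout against your local keyring before the event, a minimal Python sketch could normalize the printed hex groups and compare them with the primary-key fingerprint from gpg --with-colons --fingerprint. The function names are hypothetical, and the sketch assumes the printout shows the fingerprint as space-separated hex groups and that the key ID matches exactly one key:

```python
from __future__ import annotations

import subprocess


def normalize_fingerprint(fpr: str) -> str:
    """Drop whitespace and unify case so grouped and ungrouped forms compare equal."""
    return "".join(fpr.split()).upper()


def fingerprints_from_colons(colon_output: str) -> list[str]:
    """Extract fingerprints from gpg's colon listing.

    Each 'fpr' record carries the fingerprint in field 10; the first 'fpr'
    record follows the 'pub' record, so it belongs to the primary key.
    """
    fprs = []
    for line in colon_output.splitlines():
        fields = line.split(":")
        if fields[0] == "fpr" and len(fields) > 9:
            fprs.append(fields[9].upper())
    return fprs


def primary_fingerprint(key_id: str) -> str:
    """Ask the local gpg keyring for the primary-key fingerprint of key_id."""
    out = subprocess.run(
        ["gpg", "--with-colons", "--fingerprint", key_id],
        capture_output=True, text=True, check=True,
    ).stdout
    fprs = fingerprints_from_colons(out)
    if not fprs:
        raise LookupError(f"no fingerprint found for {key_id}")
    return fprs[0]


def matches_printout(printed: str, key_id: str) -> bool:
    """True if the fingerprint copied from the printout matches the local key."""
    return normalize_fingerprint(printed) == primary_fingerprint(key_id)
```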

-- 
Jan




Re: [Pacemaker] Two node cluster and no hardware device for stonith.

2015-02-02 Thread Digimer
That the fence failed until the network came back makes your fence method
less than ideal. Will it eventually fence while the network is still down?


Most importantly, though: were cluster resources blocked while the fence was
pending? If so, then your cluster is safe, and that is the most
important part.


On 02/02/15 06:22 AM, Andrea wrote:

Hi,

I tested a network failure and it works.
During the failure, each node tries to fence the other node.
When the network comes back, the node with the network problem is fenced and rebooted.
Moreover, cman is killed on one node: typically cman on node1 kills cman on
node2. So I have 2 situations:

1) Network failure on node2
When the network comes back, node2 is fenced and cman on node2 is killed.
The watchdog script checks the key registration and reboots node2.
After the reboot, the cluster comes back with 2 nodes up.

2) Network failure on node1
When the network comes back, node1 is fenced, and cman on node2 is killed
(the cluster is down!).
The watchdog script checks the key registration and reboots node1.
During the reboot the cluster is offline, because node1 is rebooting and cman
on node2 has been killed.
After the reboot, node1 is up and fences node2. Now the watchdog reboots node2.
After that reboot, the cluster comes back with 2 nodes up.


The only problem is the downtime in situation 2, but it is acceptable in my
context.
I created my fence device with this command:
[ONE] pcs stonith create scsi fence_scsi pcmk_host_list="serverHA1 serverHA2" pcmk_reboot_action=off meta provides=unfencing --force
as described here:
https://access.redhat.com/articles/530533


If possible, I will test fence_vmware (without the watchdog script) and
post my results here.

Thanks to all,
Andrea








--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?




Re: [Pacemaker] HA Summit Key-signing Party

2015-02-02 Thread Digimer

On 02/02/15 11:48 AM, Jan Pokorný wrote:

On 26/01/15 15:14 +0100, Jan Pokorný wrote:

Timeline?
Best if you send me your public keys before 2015-02-02.  I will then
compile a list of the attendees together with their keys and publish
it at https://people.redhat.com/jpokorny/keysigning/2015-ha/
so you can print it out and be ready for the party.

Thanks for your cooperation; I'm looking forward to this side-event and
hope it will be beneficial to all involved.


Thanks for participating.

Please print out
https://people.redhat.com/jpokorny/keysigning/2015-ha/complete.html
(preferably in landscape format) and check your own fingerprints on it
beforehand, prepare your ID document, and you are ready to proceed to
the signing event, which is currently planned for 2015-02-05 16:30 CET:
http://plan.alteeve.ca/index.php/Main_Page#Feb_5th
(I'll post an update should it change).


Will there be a printer available in the room/area of the summit? If so,
it might be good to set aside a bit of time to help people new to PGP
get set up before the actual key-signing.


--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?

