Re: [Pacemaker] Two node cluster and no hardware device for stonith.
Hi,

I tried a network failure and it works. During the failure, each node tries to fence the other node. When the network comes back, the node with the network problem is fenced and rebooted. Moreover, cman kills cman on one node (typically node1 kills cman on node2), so I have 2 situations:

1) Network failure on node2
When the network comes back, node2 is fenced and cman is killed on node2. The watchdog script checks for key registration and reboots node2. After the reboot, the cluster comes back with 2 nodes up.

2) Network failure on node1
When the network comes back, node1 is fenced and cman is killed on node2 (the cluster is down!). The watchdog script checks for key registration and reboots node1. During the reboot the cluster is offline, because node1 is rebooting and cman on node2 was killed. After the reboot, node1 is up and fences node2. Now the watchdog reboots node2. After that reboot, the cluster comes back with 2 nodes up.

The only problem is the downtime in situation 2, but it is acceptable in my context.

I created my fence device with this command:

[ONE] pcs stonith create scsi fence_scsi pcmk_host_list="serverHA1 serverHA2" pcmk_reboot_action="off" meta provides="unfencing" --force

as described here: https://access.redhat.com/articles/530533

If possible, I will test fence_vmware (without the watchdog script) and I will post my results here.

Thanks to all,
Andrea

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
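The watchdog's key-registration check described in this message could be sketched roughly as follows. This is a minimal sketch, not the watchdog script that actually ships with fence_scsi: it assumes that the output of `sg_persist --in --read-keys` lists each registered key as a hexadecimal value on a line of its own, and the function names, the key values, and the sample output are all hypothetical.

```python
import re

# Matches a line that consists only of a hexadecimal reservation key,
# e.g. "    0x3abe0001". Assumption: each registered key is printed
# on its own line in this form.
KEY_LINE = re.compile(r"^\s*(0x[0-9a-fA-F]+)\s*$")


def registered_keys(read_keys_output):
    """Return the set of integer keys found in the read-keys output text."""
    keys = set()
    for line in read_keys_output.splitlines():
        match = KEY_LINE.match(line)
        if match:
            keys.add(int(match.group(1), 16))
    return keys


def should_reboot(read_keys_output, my_key):
    """True when this node's key is no longer registered (the node was fenced)."""
    return my_key not in registered_keys(read_keys_output)


# Illustrative output only (hypothetical keys): one key is left on the
# device after the other node's key has been removed by fencing.
SAMPLE = """\
  PR generation=0x4, 1 registered reservation key follows:
    0x3abe0001
"""
```

With this sample, a node whose key is 0x3abe0000 would decide to reboot, while the node holding 0x3abe0001 would keep running.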
Re: [Pacemaker] IPMI stonith resource gets stuck
On 01/30/2015 05:03 PM, Jérôme Charaoui wrote:
> Thank you for looking at this, much appreciated.
>
> The timeout issue intrigued me because I had noticed ipmitool sometimes taking over 10 seconds when attempting to execute an action on a non-responding IPMI device over the lanplus interface. So I had a look at the ipmi stonith plugin code and the ipmitool manpage itself and noticed this little gem in the latter:
>
>   -R count  Set the number of retries for lan/lanplus interface (default=4).
>
> I then went ahead and added -R 1 to the plugin's ipmitool_opts variable, and my problem went away!

If you use the fence agent fence_ipmilan, then you can set this with retry_on (or --retry-on X when passing options as command-line arguments).

m,
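To make the effect of the -R option concrete, here is a minimal sketch of how a wrapper might assemble an ipmitool command line with a reduced retry count. It is not the stonith plugin's code: the function name, host name, and user name are hypothetical, and authentication options are omitted for brevity. Only the -I, -H, -U, and -R options and the `chassis power` subcommand are taken from ipmitool itself.

```python
from typing import List, Optional


def ipmitool_argv(host: str, user: str, action: str,
                  retries: Optional[int] = None) -> List[str]:
    """Build an ipmitool argv for a chassis power action over lanplus.

    `retries` maps to ipmitool's -R option; None leaves ipmitool's
    default retry count in place.
    """
    argv = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user]
    if retries is not None:
        if retries < 0:
            raise ValueError("retries must be non-negative")
        argv += ["-R", str(retries)]
    # Authentication options are intentionally left out of this sketch.
    argv += ["chassis", "power", action]
    return argv


# Hypothetical BMC address and user, with the retry count lowered to 1.
print(" ".join(ipmitool_argv("bmc.example.com", "admin", "status", retries=1)))
```

Lowering the retry count shortens the time spent waiting on a BMC that does not answer, at the cost of giving up sooner on a device that is merely slow.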
Re: [Pacemaker] HA Summit Key-signing Party (was: Organizing HA Summit 2015)
On 26/01/15 15:14 +0100, Jan Pokorný wrote:
> Timeline? Best if you send me your public keys before 2015-02-02. I will then compile a list of the attendees together with their keys and publish it at https://people.redhat.com/jpokorny/keysigning/2015-ha/ so you can print it out and be ready for the party.
>
> Thanks for your cooperation, looking forward to this side-event and hope this will be beneficial to all involved.

Thanks for participating. Please print out https://people.redhat.com/jpokorny/keysigning/2015-ha/complete.html (best in landscape format) and check your fingerprints there, prepare your ID document, and you are ready to proceed to the signing event, which is currently planned for 2015-02-05 16:30 CET: http://plan.alteeve.ca/index.php/Main_Page#Feb_5th (I'll post an update should it change).

-- 
Jan
Re: [Pacemaker] Two node cluster and no hardware device for stonith.
That the fence failed until the network came back makes your fence method less than ideal. Will it eventually fence with the network still failed?

Most importantly, though: were cluster resources blocked while the fence was pending? If so, then your cluster is safe, and that is the most important part.

On 02/02/15 06:22 AM, Andrea wrote:
> Hi,
>
> I tried a network failure and it works. During the failure, each node tries to fence the other node. When the network comes back, the node with the network problem is fenced and rebooted. Moreover, cman kills cman on one node (typically node1 kills cman on node2), so I have 2 situations:
>
> 1) Network failure on node2
> When the network comes back, node2 is fenced and cman is killed on node2. The watchdog script checks for key registration and reboots node2. After the reboot, the cluster comes back with 2 nodes up.
>
> 2) Network failure on node1
> When the network comes back, node1 is fenced and cman is killed on node2 (the cluster is down!). The watchdog script checks for key registration and reboots node1. During the reboot the cluster is offline, because node1 is rebooting and cman on node2 was killed. After the reboot, node1 is up and fences node2. Now the watchdog reboots node2. After that reboot, the cluster comes back with 2 nodes up.
>
> The only problem is the downtime in situation 2, but it is acceptable in my context.
>
> I created my fence device with this command:
>
> [ONE] pcs stonith create scsi fence_scsi pcmk_host_list="serverHA1 serverHA2" pcmk_reboot_action="off" meta provides="unfencing" --force
>
> as described here: https://access.redhat.com/articles/530533
>
> If possible, I will test fence_vmware (without the watchdog script) and I will post my results here.
>
> Thanks to all,
> Andrea

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
Re: [Pacemaker] HA Summit Key-signing Party
On 02/02/15 11:48 AM, Jan Pokorný wrote:
> On 26/01/15 15:14 +0100, Jan Pokorný wrote:
>> Timeline? Best if you send me your public keys before 2015-02-02. I will then compile a list of the attendees together with their keys and publish it at https://people.redhat.com/jpokorny/keysigning/2015-ha/ so you can print it out and be ready for the party.
>>
>> Thanks for your cooperation, looking forward to this side-event and hope this will be beneficial to all involved.
>
> Thanks for participating. Please print out https://people.redhat.com/jpokorny/keysigning/2015-ha/complete.html (best in landscape format) and check your fingerprints there, prepare your ID document, and you are ready to proceed to the signing event, which is currently planned for 2015-02-05 16:30 CET: http://plan.alteeve.ca/index.php/Main_Page#Feb_5th (I'll post an update should it change).

Will there be a printer available in the room/area of the summit? If so, it might be good to set aside a bit of time to help people new to PGP get set up before the actual key-signing.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?