Re: [Pacemaker] Howto write a STONITH agent
Holger Teutsch holger.teutsch@... writes:

> I had the same experience. Ilo is _extremely_ slow and unreliable. Go for external/ipmi. That works very fast and reliable. It is available with ILO 2.x firmware.
>
> - holger

Hi,

I am at this same point right now. I am trying to do a remote power reset of HP blades. Do those work with IPMI too? Which ports need to be opened in the firewall for this?

Thanks,
Prakash

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
Re: [Pacemaker] Howto write a STONITH agent
On Fri, 2011-01-14 at 17:10 +0100, Christoph Herrmann wrote:
> -Original Message-
> From: Dejan Muhamedagic deja...@fastmail.fm
> Sent: Fri 14.01.2011 12:31
> To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
> Subject: Re: [Pacemaker] Howto write a STONITH agent
>
> > Hi,
> >
> > On Thu, Jan 13, 2011 at 09:09:38PM +0100, Christoph Herrmann wrote:
> > > Hi,
> > >
> > > I have some brand new HP Blades with ILO Boards (iLO 2 Standard Blade Edition 1.81 ...) But I'm not able to connect with them via the external/riloe agent. When I try:
> > >
> > > stonith -t external/riloe -p hostlist=node1 ilo_hostname=ilo1 ilo_user=ilouser ilo_password=ilopass ilo_can_reset=1 ilo_protocol=2.0 ilo_powerdown_method=power -S
> >
> > Try this:
> >
> > stonith -t external/riloe hostlist=node1 ilo_hostname=ilo1 ilo_user=ilouser ilo_password=ilopass ilo_can_reset=1 ilo_protocol=2.0 ilo_powerdown_method=power -S
>
> That's much better (looks like PEBKAC ;-), thanks! But it is not reliable. I've tested it about 10 times and 5 times it hangs. That's not what I want.

I had the same experience. iLO is _extremely_ slow and unreliable. Go for external/ipmi. That works very fast and reliably. It is available with iLO 2.x firmware.

- holger
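For anyone who wants to try the external/ipmi suggestion, a status check could probably be written in the same name=value style as the riloe call quoted above. This is only a sketch: the parameter names (hostname, ipaddr, userid, passwd, interface) and all values are assumptions that do not come from this thread, so list the names your installed plugin really expects with `stonith -t external/ipmi -n` first. The helper name `ipmi_status_cmd` and the `IPMI_*` variables are made up for illustration.

```shell
#!/bin/sh
# Sketch only: prints a `stonith` status check for external/ipmi, mirroring
# the name=value syntax used for external/riloe in this thread.
# ASSUMPTIONS: the parameter names (hostname, ipaddr, userid, passwd,
# interface) and every value below are illustrative; confirm the real names
# with `stonith -t external/ipmi -n` before relying on them.

ipmi_status_cmd() {
    # $1 = cluster node name, $2 = address of its management board
    printf 'stonith -t external/ipmi hostname=%s ipaddr=%s userid=%s passwd=%s interface=%s -S\n' \
        "$1" "$2" "${IPMI_USER:-ilouser}" "${IPMI_PASS:-ilopass}" "${IPMI_IFACE:-lan}"
}

# Print the command for review instead of running it.
ipmi_status_cmd node1 ilo1
```

The function only prints the command line, so it can be inspected and adjusted before anything talks to the management board.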
Re: [Pacemaker] Howto write a STONITH agent
Hi,

On Thu, Jan 13, 2011 at 09:09:38PM +0100, Christoph Herrmann wrote:
> Hi,
>
> I have some brand new HP Blades with ILO Boards (iLO 2 Standard Blade Edition 1.81 ...) But I'm not able to connect with them via the external/riloe agent. When I try:
>
> stonith -t external/riloe -p hostlist=node1 ilo_hostname=ilo1 ilo_user=ilouser ilo_password=ilopass ilo_can_reset=1 ilo_protocol=2.0 ilo_powerdown_method=power -S

Try this:

stonith -t external/riloe hostlist=node1 ilo_hostname=ilo1 ilo_user=ilouser ilo_password=ilopass ilo_can_reset=1 ilo_protocol=2.0 ilo_powerdown_method=power -S

Thanks,

Dejan

> I get the following answer:
>
> external/riloe[14317]: ERROR: unknown power method %s, setting to power
> external/riloe[14317]: ERROR: [Errno -2] Name or service not known, while talking to ilo_hostname=ilo1
> ** (process:14315): CRITICAL **: external_run_cmd: Calling '/usr/lib64/stonith/plugins/external/riloe status' returned 1
> ** (process:14315): CRITICAL **: external_status: 'riloe status' failed with rc 1
> stonith: external/riloe device not accessible.
>
> But I can access ilo1 with http, https and ssh. The easiest way to reset a node is to run:
>
> ssh -i ilo-sshkey ilouser@ilo1 reset system1
>
> I thought it is easier to write a new ssh-ilo agent (I'm almost done :-) than debugging the existing one. But I'm looking for a short howto. I've read some STONITH agents, but they are not completely self-explaining and I have some questions. Is there a short howto write a stonith agent manual which google and I were not able to find? Or should I post all questions to the list? here we go:
>
> 1. (and most important): What does the status check do, if you have an agent which runs as cloned resource (my ssh-ilo agent should run as a cloned resource). Does it check all nodes? Is it possible to check the status of a single node?
>
> 2. What are the expected return codes?
>
> more to follow ;-)
>
> regards
> Christoph :-)
> --
> Vorstand/Board of Management: Dr. Bernd Finkbeiner, Dr. Roland Niemeier, Dr. Arno Steitz, Dr. Ingrid Zech
> Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Michel Lepert
> Sitz/Registered Office: Tuebingen
> Registergericht/Registration Court: Stuttgart
> Registernummer/Commercial Register No.: HRB 382196
Re: [Pacemaker] Howto write a STONITH agent
-Original Message-
From: Dejan Muhamedagic deja...@fastmail.fm
Sent: Fri 14.01.2011 12:31
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Howto write a STONITH agent

> Hi,
>
> On Thu, Jan 13, 2011 at 09:09:38PM +0100, Christoph Herrmann wrote:
> > Hi,
> >
> > I have some brand new HP Blades with ILO Boards (iLO 2 Standard Blade Edition 1.81 ...) But I'm not able to connect with them via the external/riloe agent. When I try:
> >
> > stonith -t external/riloe -p hostlist=node1 ilo_hostname=ilo1 ilo_user=ilouser ilo_password=ilopass ilo_can_reset=1 ilo_protocol=2.0 ilo_powerdown_method=power -S
>
> Try this:
>
> stonith -t external/riloe hostlist=node1 ilo_hostname=ilo1 ilo_user=ilouser ilo_password=ilopass ilo_can_reset=1 ilo_protocol=2.0 ilo_powerdown_method=power -S

That's much better (looks like PEBKAC ;-), thanks! But it is not reliable. I've tested it about 10 times and 5 times it hangs. That's not what I want.

In the end I will use my own ssh-ilo agent. It's very simple (KISS) and reliable. The external/riloe agent did not look too simple.

So my questions still remain. Is there a HOWTO for writing stonith agents? Is it useful to write (to run) a stonith agent as a cloned resource? What should the status check do with a cloned stonith resource? Is it useful in any way? (As long as I have 4 different nodes with 4 different iLO boards.)

Cheers,
Christoph :-)
Re: [Pacemaker] Howto write a STONITH agent
On Fri, Jan 14, 2011 at 05:10:17PM +0100, Christoph Herrmann wrote:
> -Original Message-
> From: Dejan Muhamedagic deja...@fastmail.fm
> Sent: Fri 14.01.2011 12:31
> To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
> Subject: Re: [Pacemaker] Howto write a STONITH agent
>
> > Hi,
> >
> > On Thu, Jan 13, 2011 at 09:09:38PM +0100, Christoph Herrmann wrote:
> > > Hi,
> > >
> > > I have some brand new HP Blades with ILO Boards (iLO 2 Standard Blade Edition 1.81 ...) But I'm not able to connect with them via the external/riloe agent. When I try:
> > >
> > > stonith -t external/riloe -p hostlist=node1 ilo_hostname=ilo1 ilo_user=ilouser ilo_password=ilopass ilo_can_reset=1 ilo_protocol=2.0 ilo_powerdown_method=power -S
> >
> > Try this:
> >
> > stonith -t external/riloe hostlist=node1 ilo_hostname=ilo1 ilo_user=ilouser ilo_password=ilopass ilo_can_reset=1 ilo_protocol=2.0 ilo_powerdown_method=power -S
>
> That's much better (looks like PEBKAC ;-), thanks! But it is not reliable. I've tested it about 10 times and 5 times it hangs. That's not what I want.

Did you try to find out why it hung?

> Finally I will use my own ssh-ilo agent. It's very simple (KISS) and reliable. The external/riloe agent did not look too simple.

Right. Let's everybody roll our own ;-)

> So my questions still remain. Is there a HOWTO for writing stonith agents?

No.

> Is it useful to write (to run) a stonith agent as a cloned resource?

Sometimes. There are quite a few resources. You can take a look at clusterlabs.org.

> What should the status check do with a cloned stonith resource? Is it useful in any way? (As long as I have 4 different nodes with 4 different ilo boards.)

The status should check for the device status, not nodes.

Thanks,

Dejan
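To make the point about status concrete: in an agent like the ssh-ilo one discussed here, a status action would only ask whether the management board itself answers, and would say nothing about the node's operating system. The following is a minimal sketch under several assumptions, not the agent from this thread: the parameter names (ilo_hostname, ilo_user, ilo_sshkey), the helper names, and the remote `show` command are all placeholders, and `reset system1` is simply the command quoted earlier. Return codes follow the convention visible in the log above ("failed with rc 1"): 0 for success, 1 for failure.

```shell
#!/bin/sh
# Sketch of the status and reset actions of a hypothetical ssh-ilo plugin.
# ASSUMPTIONS: ilo_hostname, ilo_user and ilo_sshkey are made-up parameter
# names read from the environment; the remote commands "show" and
# "reset system1" are placeholders ("reset system1" is the command quoted
# in this thread). ILO_SSH exists only so the sketch can be dry-run.

ILO_SSH=${ILO_SSH:-ssh}

ilo_run() {
    # Run one command on the management board. BatchMode keeps a password
    # prompt from blocking, ConnectTimeout bounds a dead connection.
    $ILO_SSH -o BatchMode=yes -o ConnectTimeout=10 -i "${ilo_sshkey:-ilo-sshkey}" \
        "${ilo_user:-ilouser}@${ilo_hostname:-ilo1}" "$@"
}

ilo_status() {
    # Device check only: succeed if the board answers at all.
    ilo_run show >/dev/null 2>&1 && return 0
    return 1
}

ilo_reset() {
    # Ask the board to reset the server behind it.
    ilo_run reset system1 >/dev/null 2>&1 && return 0
    return 1
}
```

Setting `ILO_SSH=true` or `ILO_SSH=false` before calling the functions exercises both return paths without touching real hardware.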
[Pacemaker] Howto write a STONITH agent
Hi,

I have some brand new HP Blades with iLO boards (iLO 2 Standard Blade Edition 1.81 ...), but I'm not able to connect to them via the external/riloe agent. When I try:

stonith -t external/riloe -p hostlist=node1 ilo_hostname=ilo1 ilo_user=ilouser ilo_password=ilopass ilo_can_reset=1 ilo_protocol=2.0 ilo_powerdown_method=power -S

I get the following answer:

external/riloe[14317]: ERROR: unknown power method %s, setting to power
external/riloe[14317]: ERROR: [Errno -2] Name or service not known, while talking to ilo_hostname=ilo1
** (process:14315): CRITICAL **: external_run_cmd: Calling '/usr/lib64/stonith/plugins/external/riloe status' returned 1
** (process:14315): CRITICAL **: external_status: 'riloe status' failed with rc 1
stonith: external/riloe device not accessible.

But I can access ilo1 with http, https and ssh. The easiest way to reset a node is to run:

ssh -i ilo-sshkey ilouser@ilo1 reset system1

I thought it would be easier to write a new ssh-ilo agent (I'm almost done :-) than to debug the existing one. But I'm looking for a short howto. I've read some STONITH agents, but they are not completely self-explanatory and I have some questions. Is there a short "how to write a stonith agent" manual which Google and I were not able to find? Or should I post all questions to the list? Here we go:

1. (and most important): What does the status check do if you have an agent which runs as a cloned resource (my ssh-ilo agent should run as a cloned resource)? Does it check all nodes? Is it possible to check the status of a single node?

2. What are the expected return codes?

more to follow ;-)

regards
Christoph :-)

--
Vorstand/Board of Management: Dr. Bernd Finkbeiner, Dr. Roland Niemeier, Dr. Arno Steitz, Dr. Ingrid Zech
Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Michel Lepert
Sitz/Registered Office: Tuebingen
Registergericht/Registration Court: Stuttgart
Registernummer/Commercial Register No.: HRB 382196
Re: [Pacemaker] Howto write a STONITH agent
Hi Christoph,

Have you taken a look in /usr/lib64/stonith/plugins/external? The ipmi plugin might serve as a coding example/template. Or maybe the drac5 plugin. At first glance, drac5 appears to be using ssh.

Bob Haxo
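When reading the plugins in that directory as templates, the common shape appears to be a script that receives one action name as its first argument and takes its parameters from the environment. The skeleton below is a sketch under that assumption and is not copied from any shipped plugin: the name ssh-ilo, the function `ssh_ilo`, and the parameter names (hostlist, ilo_hostname, ilo_user, ilo_sshkey) are illustrative. Shipped plugins also answer further getinfo-* queries (for example an XML parameter description), which are left out here, so compare with the ipmi or drac5 plugin on your own system.

```shell
#!/bin/sh
# Skeleton of a hypothetical external plugin named ssh-ilo.
# ASSUMPTIONS: the action names follow the external plugin interface as the
# shipped plugins appear to implement it; hostlist and the ilo_* parameter
# names are illustrative and not taken from any real plugin.

ssh_ilo() {
    action=$1
    case "$action" in
    gethosts)
        # One host per line: the nodes this device can fence.
        for h in $hostlist; do echo "$h"; done
        ;;
    getconfignames)
        # Parameter names the plugin expects in its environment.
        for n in hostlist ilo_hostname ilo_user ilo_sshkey; do echo "$n"; done
        ;;
    getinfo-devid|getinfo-devname)
        echo "ssh-ilo STONITH device"
        ;;
    getinfo-devdescr)
        echo "Resets a node through the ssh command line of its iLO board"
        ;;
    on|off|reset|status)
        # Power actions and the device check are left out of this skeleton.
        echo "ssh-ilo: $action not implemented in this sketch" >&2
        return 1
        ;;
    *)
        # Unknown action: report failure.
        return 1
        ;;
    esac
}

# Demo call; a real plugin would end with: ssh_ilo "$@"
hostlist="node1 node2"
ssh_ilo gethosts
```

Keeping the dispatch in one function makes it easy to call each action by hand while developing, before the script is dropped into the plugin directory.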