Re: [Pacemaker] AP9606 fencing device

2010-11-17 Thread Dejan Muhamedagic
Hi,

On Tue, Nov 16, 2010 at 08:15:10PM -0700, Devin Reade wrote:
 The caveat is that this PDU used to work with the default implementation;
 however, at some point someone updated the OIDs in apcmastersnmp to
 match newer firmware.  Therefore, I had to reverse-patch that RA:

Yes, looking at the repository, that happened sometime in 2007,
though the log message claimed that the OIDs would work with both
older and newer PDUs. Apparently not. Then there was an effort by
Philip Gwyn to implement a new plugin which would support both,
and it was almost finished. At the time we had a somewhat more
stringent contribution policy and Philip didn't do what was
necessary in the end. It's a shame that the contribution didn't
make it into the project at the time. I can see that the code is
still available at http://www.awale.qc.ca/ha-linux/apc-snmp/
If Philip's still listening or somebody else wants to push this,
we can take a look at it again.

Thanks,

Dejan


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] AP9606 fencing device

2010-11-16 Thread Devin Reade
--On Wednesday, October 27, 2010 09:47:14 AM +0200 Pavlos Parissis
pavlos.paris...@gmail.com wrote:

 I have an APC AP9606 PDU and I am trying to find a stonith agent which
 works with that PDU.

I know that this is an old thread, but I'll reply anyway.

I have one cluster that uses an old APC AP9606 for which I've not
been able to obtain a flash update.  In particular, it is:
 hardware revision: J13
 APP version 2.2.0
 AOS version 3.0.3

It is running just fine (see caveat below) with the following configuration,
and I can attest that it has properly stonith'd nodes many times.

primitive msw stonith:apcmastersnmp \
        operations $id=msw-operations \
        op monitor interval=15 timeout=15 start-delay=15 \
        params ipaddr=IPADDR port=161 community=COMMUNITY
clone msw-clone msw \
        meta clone-max=2 target-role=started

(yeah, that monitor interval is probably a little quick ...)

That particular cluster is getting long in the tooth:
 pacemaker-1.0.5-4.6.x86_64
 openais-0.80.5-15.1.x86_64

The caveat is that this PDU used to work with the default implementation;
however, at some point someone updated the OIDs in apcmastersnmp to
match newer firmware.  Therefore, I had to reverse-patch that RA:

===
--- apcmastersnmp.c.orig  2009-09-26 16:12:27.0 -0600
+++ apcmastersnmp.c  2009-09-28 16:46:17.0 -0600
@@ -137,12 +137,12 @@
 #define OUTLET_NO_CMD_PEND 2
 
 /* oids */
-#define OID_IDENT                  ".1.3.6.1.4.1.318.1.1.12.1.5.0"
-#define OID_NUM_OUTLETS            ".1.3.6.1.4.1.318.1.1.12.1.8.0"
-#define OID_OUTLET_NAMES           ".1.3.6.1.4.1.318.1.1.12.3.4.1.1.2.%i"
-#define OID_OUTLET_STATE           ".1.3.6.1.4.1.318.1.1.12.3.3.1.1.4.%i"
-#define OID_OUTLET_COMMAND_PENDING ".1.3.6.1.4.1.318.1.1.12.3.5.1.1.5.%i"
-#define OID_OUTLET_REBOOT_DURATION ".1.3.6.1.4.1.318.1.1.12.3.4.1.1.6.%i"
+#define OID_IDENT                  ".1.3.6.1.4.1.318.1.1.4.1.4.0"
+#define OID_NUM_OUTLETS            ".1.3.6.1.4.1.318.1.1.4.4.1.0"
+#define OID_OUTLET_NAMES           ".1.3.6.1.4.1.318.1.1.4.5.2.1.3.%i"
+#define OID_OUTLET_STATE           ".1.3.6.1.4.1.318.1.1.4.4.2.1.3.%i"
+#define OID_OUTLET_COMMAND_PENDING ".1.3.6.1.4.1.318.1.1.4.4.2.1.2.%i"
+#define OID_OUTLET_REBOOT_DURATION ".1.3.6.1.4.1.318.1.1.4.5.2.1.5.%i"
 
 /*
 snmpset -c private -v1 172.16.0.32:161
===
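
For anyone comparing the two sets of defines, here is a minimal Python
sketch of how the per-outlet templates expand. It is not part of
apcmastersnmp.c: the dictionary and function names are invented for
illustration, and it assumes that the %i in each template is simply
replaced by the outlet number. Only the OID strings come from the patch.

```python
# OID templates copied from the patch: "stock" is the set the patch removes,
# "reverted" is the set the reverse patch puts back for this AP9606.
STOCK_OIDS = {
    "OID_OUTLET_NAMES": ".1.3.6.1.4.1.318.1.1.12.3.4.1.1.2.%i",
    "OID_OUTLET_STATE": ".1.3.6.1.4.1.318.1.1.12.3.3.1.1.4.%i",
}
REVERTED_OIDS = {
    "OID_OUTLET_NAMES": ".1.3.6.1.4.1.318.1.1.4.5.2.1.3.%i",
    "OID_OUTLET_STATE": ".1.3.6.1.4.1.318.1.1.4.4.2.1.3.%i",
}


def outlet_oid(table, name, outlet):
    """Expand one template for a given outlet number (hypothetical helper)."""
    return table[name] % outlet


for label, table in (("stock", STOCK_OIDS), ("reverted", REVERTED_OIDS)):
    print(label, outlet_oid(table, "OID_OUTLET_STATE", 3))
```

The two expansions sit under different subtrees (318.1.1.12 versus
318.1.1.4), so a PDU that only answers for one of them would presumably
not respond to queries built from the other set.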





Re: [Pacemaker] AP9606 fencing device

2010-11-16 Thread Pavlos Parissis
On 17 November 2010 04:15, Devin Reade g...@gno.org wrote:


 The caveat is that this PDU used to work with the default implementation;
 however, at some point someone updated the OIDs in apcmastersnmp to
 match newer firmware.  Therefore, I had to reverse-patch that RA:



I faced the same problem, and because I didn't want to modify the code of
the apcmastersnmp RA, I used the rackpdu RA, where I could set the OIDs in
the parameters.
This RA worked perfectly until the PDU died!
I suggest using the rackpdu RA, because if you upgrade your cluster software
your modification will be gone.

Cheers,
Pavlos


Re: [Pacemaker] AP9606 fencing device

2010-10-28 Thread Dejan Muhamedagic
Hi,

On Wed, Oct 27, 2010 at 08:15:09PM +0200, Pavlos Parissis wrote:
 
 
 
 
 I did the same tests without cloning, and pacemaker moves the fencing resource
 before it triggers a reboot on the node where the fencing resource was running.
 So, cloning the fencing resource and having just one fence resource show the
 same behaviour, at least for these 2 tests.
 Now I don't know which configuration I should choose!

Whichever you feel more comfortable with, provided that the
device really can support multiple connections simultaneously.
I'd opt for the non-cloned version: it's simpler and it avoids
possible device contention.

Thanks,

Dejan




Re: [Pacemaker] AP9606 fencing device

2010-10-27 Thread Vadym Chepkov

On Oct 27, 2010, at 3:47 AM, Pavlos Parissis wrote:
 
 Does anyone know any other PDU which works out of box with the
 supplied stonith agents?
 

I use APC AP7901, works like a charm:

primitive pdu stonith:external/rackpdu \
        params pduip=10.6.6.6 community=pdu-6 hostlist=AUTO
clone fencing pdu

Vadym






Re: [Pacemaker] AP9606 fencing device

2010-10-27 Thread Vadym Chepkov

On Oct 27, 2010, at 7:27 AM, Pavlos Parissis wrote:

 
 Then most likely the default OIDs of the rackpdu agent match the
 OIDs of the AP7901.
 In my case I have to use the OID .1.3.6.1.4.1.318.1.1.4.4.2.1.3 for the
 device itself and the OID .1.3.6.1.4.1.318.1.1.4.4.2.1.4 for retrieving
 (snmpwalk) the outlet list.
 
 Hold on a sec, are you using a clone on the AP7901? Does it support multiple
 connections? Mine doesn't.

Then it's useless regardless of whether it is cloned or not; you have to have
multiple instances, because a server can't reliably fence itself, right?

Vadym





Re: [Pacemaker] AP9606 fencing device

2010-10-27 Thread Dejan Muhamedagic
Hi,

On Wed, Oct 27, 2010 at 01:58:20PM +0200, Pavlos Parissis wrote:
 
 My understanding is/was that I need to have one resource running on 1 of the
 3 nodes in the cluster, and if a fence event has to be triggered then
 pacemaker will send it to the one stonith resource. I am planning to test
 that in the coming days.[1]
 Am I right? If not, then I have to buy a different PDU! :-(

Yes. In case a node which is currently running the stonith
resource is to be fenced, then the stonith resource would move
elsewhere first. But, yes, you should test this just like
anything else. Make sure to test both the node gone event
(failed links) and a critical action failing (such as stop).

Thanks,

Dejan

 Cheers,
 Pavlos
 
 
 [1] By testing I mean killing the heartbeat links on 1 node; the DC node
 should then fence that node.



Re: [Pacemaker] AP9606 fencing device

2010-10-27 Thread Vadym Chepkov

On Oct 27, 2010, at 7:58 AM, Pavlos Parissis wrote:

 
 
 My understanding is/was that I need to have one resource running on 1 of the
 3 nodes in the cluster, and if a fence event has to be triggered then
 pacemaker will send it to the one stonith resource. I am planning to test
 that in the coming days.[1]
 Am I right? If not, then I have to buy a different PDU! :-(
 

My understanding is you have to have a fencing device for each of your hosts.
Are you sure the one-connection limitation applies to SNMP? Most likely it's
only for TCP sessions (ssh/http).
If you look into the rackpdu log you will see this:

Oct 19 12:39:00 xen-11 stonithd: [8606]: debug: external_run_cmd: Calling '/usr/lib64/stonith/plugins/external/rackpdu gethosts'
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_run_cmd: '/usr/lib64/stonith/plugins/external/rackpdu gethosts' output: xen-11 xen-12 Outlet_3 Outlet_4 Outlet_5 Outlet_6 Outlet_7 Outlet_8
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_hostlist: running 'rackpdu gethosts' returned 0
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_hostlist: rackpdu host xen-11
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_hostlist: rackpdu host xen-12
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_hostlist: rackpdu host Outlet_3
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_hostlist: rackpdu host Outlet_4
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_hostlist: rackpdu host Outlet_5
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_hostlist: rackpdu host Outlet_6
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_hostlist: rackpdu host Outlet_7
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_hostlist: rackpdu host Outlet_8
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: remove us (xen-11) from the host list for pdu:0

check the last line - the agent is smart enough to know it can't fence itself.
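
A small Python sketch of what the log shows, for illustration only. It
is not the stonithd implementation: the regular expressions and the
function name are assumptions based purely on the wording of the debug
messages quoted above (an excerpt is repeated in the code, one message
per line).

```python
import re

# Excerpt of the debug messages from the log above, one message per line.
LOG = """\
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_hostlist: running 'rackpdu gethosts' returned 0
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_hostlist: rackpdu host xen-11
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_hostlist: rackpdu host xen-12
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: external_hostlist: rackpdu host Outlet_3
Oct 19 12:39:01 xen-11 stonithd: [8606]: debug: remove us (xen-11) from the host list for pdu:0
"""

# Patterns guessed from the message wording, not taken from any source code.
HOST_RE = re.compile(r"external_hostlist: rackpdu host (\S+)")
SELF_RE = re.compile(r"remove us \((\S+)\) from the host list")


def fenceable_hosts(log_text):
    """Hosts reported by the agent, minus any node logged as 'remove us'."""
    hosts = HOST_RE.findall(log_text)
    removed = set(SELF_RE.findall(log_text))
    return [h for h in hosts if h not in removed]


print(fenceable_hosts(LOG))
```

Run against the excerpt, the local node xen-11 drops out of the list,
which is the effect the last log line describes.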

Vadym




Re: [Pacemaker] AP9606 fencing device

2010-10-27 Thread Vadym Chepkov

On Oct 27, 2010, at 8:11 AM, Dejan Muhamedagic wrote:

 
 Yes. In case a node which is currently running the stonith
 resource is to be fenced, then the stonith resource would move
 elsewhere first. But, yes, you should test this just like
 anything else. Make sure to test both the node gone event
 (failed links) and a critical action failing (such as stop).
 
 Thanks,
 
 Dejan

The rackpdu stonith agent seems to explicitly remove the node itself from the
list of hosts it can fence, so I assume that if you have just one instance
running, the cluster would not see any stonith device capable of fencing the
server where the agent was started initially. Would pacemaker move such a
resource anyway? Even though it reported that it can't fence the server in
trouble?

Vadym




Re: [Pacemaker] AP9606 fencing device

2010-10-27 Thread Pavlos Parissis
On 27 October 2010 14:09, Vadym Chepkov vchep...@gmail.com wrote:

[...snip...]


 My understanding is you have to have a fencing device for each of your
 hosts. Are you sure the one-connection limitation applies to SNMP? Most
 likely it's only for TCP sessions (ssh/http).

Valid point, Vadym: SNMP runs over UDP, so the communication is connectionless.
I am wondering how I can test whether cloning works on this PDU.
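
To illustrate the connectionless point, here is a minimal Python sketch
using plain UDP sockets on the loopback interface. It does not speak
SNMP and says nothing about how a given PDU behaves; the function name
and the message contents are made up. It only shows that one UDP
endpoint can take datagrams from several senders without any session
being set up.

```python
import socket


def udp_demo():
    """Send one datagram from each of two clients to a single UDP socket."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # let the OS pick a free port
    server.settimeout(2.0)
    addr = server.getsockname()

    clients = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM) for _ in range(2)]
    for i, client in enumerate(clients, start=1):
        # No connect/accept step: each sender just fires a datagram.
        client.sendto(f"request from node-{i}".encode(), addr)

    received = []
    for _ in clients:
        data, _peer = server.recvfrom(1024)
        received.append(data.decode())

    for client in clients:
        client.close()
    server.close()
    return sorted(received)


print(udp_demo())
```

Whether the SNMP agent on a particular PDU copes with overlapping
requests would still have to be checked against the device itself, for
example by querying it from two nodes at the same time.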



Re: [Pacemaker] AP9606 fencing device

2010-10-27 Thread Dejan Muhamedagic
Hi,

On Wed, Oct 27, 2010 at 08:34:19AM -0400, Vadym Chepkov wrote:
 
 
 The rackpdu stonith agent seems to explicitly remove the node itself from the
 list of hosts it can fence, so I assume that if you have just one instance
 running, the cluster would not see any stonith device capable of fencing the
 server where the agent was started initially. Would pacemaker move such a
 resource anyway?

Yes, it should.

 Even though it reported that it can't fence the server in trouble?

The node on which the resource is running is removed from the
hostlist, but once the resource moves elsewhere, the node will
reappear in the list.
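
The behaviour described here can be modelled in a few lines of Python.
This is a toy model of the explanation above, not pacemaker's actual
scheduling logic; the node names and both function names are made up.

```python
NODES = ["node-01", "node-02", "node-03"]  # hypothetical three-node cluster


def hostlist(running_on):
    """Hosts a single stonith instance can fence: every node but its own."""
    return [n for n in NODES if n != running_on]


def plan_fencing(target, running_on):
    """Return the node the stonith resource must run on to fence target."""
    if target not in hostlist(running_on):
        # The instance sits on the node to be fenced: move it elsewhere first.
        running_on = next(n for n in NODES if n != target)
    return running_on


# Target is the node hosting the resource, so the resource is moved first.
print(plan_fencing("node-01", running_on="node-01"))
# Target is another node, so the resource can stay where it is.
print(plan_fencing("node-02", running_on="node-01"))
```

After the move, the target is back in the host list of the relocated
instance, which matches the "reappear in the list" remark.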

Thanks,

Dejan



Re: [Pacemaker] AP9606 fencing device

2010-10-27 Thread Pavlos Parissis
Hi,

I quickly tested cloning with this fencing device and it worked. I used
iptables to break the heartbeat link on node-01, and it was fenced by the
other node, the DC.
In the coming days I will test without cloning the fencing device.

Cheers,
Pavlos


Re: [Pacemaker] AP9606 fencing device

2010-10-27 Thread Pavlos Parissis
On 27 October 2010 14:09, Vadym Chepkov vchep...@gmail.com wrote:



 check the last line - the agent is smart enough to know it can't fence
 itself.



Do you enable debug by setting debug 1 in ha.cf?
Do you see this WARN on your system?

stonith-ng: [3369]: WARN: parse_host_line: Could not parse (0 42): /usr/lib/stonith/plugins/external/rackpdu: line 125: local: can only be used in a function

Cheers,
Pavlos


Re: [Pacemaker] AP9606 fencing device

2010-10-27 Thread Pavlos Parissis
On 27 October 2010 19:23, Vadym Chepkov vchep...@gmail.com wrote:


 On Oct 27, 2010, at 1:18 PM, Pavlos Parissis wrote:


 OK, I have done the same hack, but I will remove it. I think 1.1.4 will be
 out before we go into production, and hopefully this will be fixed in 1.1.4.



 This is part of cluster-glue, not pacemaker, and it's at 1.0.6 now.


Yep, you are right and I am wrong.


Re: [Pacemaker] AP9606 fencing device

2010-10-27 Thread Pavlos Parissis
I did more testing using the cloned type of fencing, and it worked as I expected.

Test 1: hack the init script to return 1 on stop and run a crm resource move
on that resource.
Result: the node was fenced and the resource was started on the other node.

Test 2: use the firewall to break the heartbeat links on the node with the
resource.
Result: the node was fenced and the resource was started on the other node.

As Dejan suggested, I am going to run the same type of tests when 1 fence
resource is used.
In this test I will try to cause fencing of the node which has the fencing
resource running on it and see if pacemaker moves the resource before it
fences the node.


Cheers,
Pavlos


Re: [Pacemaker] AP9606 fencing device

2010-10-27 Thread Pavlos Parissis
On 27 October 2010 19:46, Pavlos Parissis pavlos.paris...@gmail.com wrote:


 As Dejan suggested, I am going to run the same type of tests when 1 fence
 resource is used.
 In this test I will try to cause fencing of the node which has the fencing
 resource running on it and see if pacemaker moves the resource before it
 fences the node.




I did the same tests without cloning, and pacemaker moves the fencing resource
before it triggers a reboot on the node where the fencing resource was running.
So, cloning the fencing resource and having just one fence resource show the
same behaviour, at least for these 2 tests.
Now I don't know which configuration I should choose!

Cheers,
Pavlos