Re: [Pacemaker] Project updates

2010-11-16 Thread Andrew Beekhof
On Tue, Nov 16, 2010 at 8:07 AM, Keisuke MORI keisuke.mori...@gmail.com wrote:
 Hi Andrew,

 On Fri, Nov 12, 2010 at 9:32 AM, Andrew Beekhof and...@beekhof.net wrote:
 For those that aren't using RSS readers, I wanted to draw people's
 attention to a couple of updates that went out today.

 Congratulations on the 1.0.10 release!
 It is very exciting for us too,
 but there is one thing I would like to comment on...


 2010/11/13 Andrew Beekhof and...@beekhof.net:
 On Fri, Nov 12, 2010 at 5:49 PM, Vadym Chepkov vchep...@gmail.com wrote:


  STABLE_SERIES          = stable-1.0

  RPM_ROOT       = $(shell pwd)
 diff -r 99f5a1e61667 configure.ac
 --- a/configure.ac      Fri Nov 12 09:12:32 2010 +0100
 +++ b/configure.ac      Fri Nov 12 11:47:28 2010 -0500
 @@ -19,7 +19,7 @@
  dnl     checks for library functions
  dnl     checks for system services

 -AC_INIT(pacemaker, 1.0.9, pacemaker@oss.clusterlabs.org)
 +AC_INIT(pacemaker, 1.0.10, pacemaker@oss.clusterlabs.org)

 That's kinda annoying but not crucial.  Thanks for pointing it out.


 This makes it confusing for users to tell which version they are
 actually using when they report a problem, because all the
 logs and the crm_mon output show the version as 1.0.9.

 Any chance of another RPM release with this fix?

Oh, I forgot about crm_mon.
I'll see what I can do.
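
A mismatch like this is easy to catch mechanically before tagging a release. A minimal sketch in Python, assuming the version is the second argument of `AC_INIT` exactly as in the diff above; the function name `ac_init_version` is invented for illustration, and bracket-quoted arguments (`[1.0.10]`) are not handled.

```python
import re

def ac_init_version(configure_ac_text):
    """Return the version given as the second AC_INIT argument, or None."""
    # AC_INIT(name, version, bug-report-address)
    match = re.search(r"AC_INIT\(\s*[^,]+,\s*([^,)\s]+)", configure_ac_text)
    return match.group(1) if match else None

# The line that shipped in the 1.0.10 tarball, per the diff above.
shipped = "AC_INIT(pacemaker, 1.0.9, pacemaker@oss.clusterlabs.org)\n"
release = "1.0.10"

found = ac_init_version(shipped)
if found != release:
    print("configure.ac says %s but the release is %s" % (found, release))
```

Run against the unpatched line, this reports the 1.0.9 / 1.0.10 disagreement.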

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] how to use stonith external/vmware

2010-11-16 Thread Dika Ye
Dear All,

What is the external/vmware stonith plugin used for? Does anybody know how
to use it?

Thanks.

Best wishes,

Dika.Ye



Re: [Pacemaker] options listed more than once

2010-11-16 Thread Andrew Beekhof
On Mon, Nov 15, 2010 at 4:35 PM, Pavlos Parissis
pavlos.paris...@gmail.com wrote:
 Hi,

 When I have multiple values for a cluster option, how do I check which
 value is currently being used by the cluster?
 Configuration Explained refers to the rules chapter,
 but I couldn't find an answer in that chapter.

Good question, I don't think we have a way to do that directly at the moment.
You might be able to use crm_resource to infer it with -g.

Not a bad feature request though, could you add it to bugzilla?


 Here is what I have and I want to get the current value of resource-stickiness
 [r...@node-03 log]# crm_attribute --type rsc_defaults --name
 resource-stickiness --query
 Multiple attributes match name=resource-stickiness
  Value: INFINITY       (id=working-hours-stickiness)
  Value: 0      (id=after-hours-stickiness)


 Cheers,
 Pavlos



Re: [Pacemaker] Balancing of clone resources (globally-unique=true)

2010-11-16 Thread Andrew Beekhof
On Mon, Nov 15, 2010 at 2:38 PM, Chris Picton ch...@ecntelecoms.com wrote:
 On Mon, 15 Nov 2010 08:37:52 +0100, Andrew Beekhof wrote:

 On Fri, Nov 12, 2010 at 7:41 AM, Chris Picton
 ch...@ecntelecoms.com wrote:
 I have attached the output as requested

 Normally it would get balanced, but it's being pushed to 01 because there
 are so many resources on 02

    sort_node_weight: slb-test-02.ecntelecoms.za.net (12) 
 slb-test-01.ecntelecoms.za.net (2) : resources

 So the cluster is trying to balance out the resources, just not at the
 level you were expecting.

 I agree with the above.

 However, how would I weight the clusterip clone so that it is preferentially
 balanced across the nodes, even in the presence of many other resources
 on a single node?

A reasonable request, could you create a bugzilla for that?
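
The `sort_node_weight` line quoted above shows the effect: with 12 resources counted on slb-test-02 and 2 on slb-test-01, a placement that prefers the less loaded node keeps choosing 01. A rough Python illustration of that idea only, not of Pacemaker's actual allocation code; `pick_node` is a made-up name.

```python
def pick_node(resource_counts):
    """Pick the node currently running the fewest resources.

    Ties are broken by node name so the result is deterministic.
    """
    return min(resource_counts, key=lambda node: (resource_counts[node], node))

# Counts taken from the sort_node_weight line in this thread.
counts = {
    "slb-test-02.ecntelecoms.za.net": 12,
    "slb-test-01.ecntelecoms.za.net": 2,
}

# Place two clone instances one after the other.
for _ in range(2):
    node = pick_node(counts)
    counts[node] += 1
    print(node)
```

Both instances land on slb-test-01, because even after the first placement it still has far fewer resources than 02. Balancing the clone on its own terms would need a weighting that looks only at instances of that clone.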



Re: [Pacemaker] options listed more than once

2010-11-16 Thread Pavlos Parissis
On 16 November 2010 10:49, Andrew Beekhof and...@beekhof.net wrote:
 On Mon, Nov 15, 2010 at 4:35 PM, Pavlos Parissis
 pavlos.paris...@gmail.com wrote:
 Hi,

 When I have multiple values for a cluster option, how do I check which
 value is currently being used by the cluster?
 Configuration Explained refers to the rules chapter,
 but I couldn't find an answer in that chapter.

 Good question, I don't think we have a way to do that directly at the moment.
 You might be able to use crm_resource to infer it with -g.
Nope
[r...@node-01 ~]# crm_resource -g resource-stickiness  -r ip_01
Error performing operation: The object/attribute does not exist
[r...@node-01 ~]# crm_resource -g resource-stickiness -t group -r pbx_service_01
Error performing operation: The object/attribute does not exist
[r...@node-01 ~]# crm_resource -g resource-stickiness -t primitive -r ip_01
Error performing operation: The object/attribute does not exist


 Not a bad feature request though, could you add it to bugzilla?
Done, http://developerbugs.linux-foundation.org/show_bug.cgi?id=2521



 Here is what I have and I want to get the current value of 
 resource-stickiness
 [r...@node-03 log]# crm_attribute --type rsc_defaults --name
 resource-stickiness --query
 Multiple attributes match name=resource-stickiness
  Value: INFINITY       (id=working-hours-stickiness)
  Value: 0      (id=after-hours-stickiness)


 Cheers,
 Pavlos



Re: [Pacemaker] options listed more than once

2010-11-16 Thread Andrew Beekhof
On Tue, Nov 16, 2010 at 11:04 AM, Pavlos Parissis
pavlos.paris...@gmail.com wrote:
 On 16 November 2010 10:49, Andrew Beekhof and...@beekhof.net wrote:
 On Mon, Nov 15, 2010 at 4:35 PM, Pavlos Parissis
 pavlos.paris...@gmail.com wrote:
 Hi,

 When I have multiple values for a cluster option, how do I check which
 value is currently being used by the cluster?
 Configuration Explained refers to the rules chapter,
 but I couldn't find an answer in that chapter.

 Good question, I don't think we have a way to do that directly at the moment.
 You might be able to use crm_resource to infer it with -g.
 Nope

did you try with -m too?


 [r...@node-01 ~]# crm_resource -g resource-stickiness  -r ip_01
 Error performing operation: The object/attribute does not exist
 [r...@node-01 ~]# crm_resource -g resource-stickiness -t group -r 
 pbx_service_01
 Error performing operation: The object/attribute does not exist
 [r...@node-01 ~]# crm_resource -g resource-stickiness -t primitive -r ip_01
 Error performing operation: The object/attribute does not exist


 Not a bad feature request though, could you add it to bugzilla?
 Done, http://developerbugs.linux-foundation.org/show_bug.cgi?id=2521



 Here is what I have and I want to get the current value of 
 resource-stickiness
 [r...@node-03 log]# crm_attribute --type rsc_defaults --name
 resource-stickiness --query
 Multiple attributes match name=resource-stickiness
  Value: INFINITY       (id=working-hours-stickiness)
  Value: 0      (id=after-hours-stickiness)


 Cheers,
 Pavlos



Re: [Pacemaker] options listed more than once

2010-11-16 Thread Pavlos Parissis
On 16 November 2010 11:12, Andrew Beekhof and...@beekhof.net wrote:
 On Tue, Nov 16, 2010 at 11:04 AM, Pavlos Parissis
 pavlos.paris...@gmail.com wrote:
 On 16 November 2010 10:49, Andrew Beekhof and...@beekhof.net wrote:
 On Mon, Nov 15, 2010 at 4:35 PM, Pavlos Parissis
 pavlos.paris...@gmail.com wrote:
 Hi,

 When I have multiple values for a cluster option, how do I check which
 value is currently being used by the cluster?
 Configuration Explained refers to the rules chapter,
 but I couldn't find an answer in that chapter.

 Good question, I don't think we have a way to do that directly at the 
 moment.
 You might be able to use crm_resource to infer it with -g.
 Nope

 did you try with -m too?
 [r...@node-01 ~]# crm_resource -g resource-stickiness  -r ip_01
 Error performing operation: The object/attribute does not exist
 [r...@node-01 ~]# crm_resource -g resource-stickiness -t group -r 
 pbx_service_01
 Error performing operation: The object/attribute does not exist
 [r...@node-01 ~]# crm_resource -g resource-stickiness -t primitive -r ip_01
 Error performing operation: The object/attribute does not exist


oops, no
 # crm_resource -m -g resource-stickiness -t primitive -r ip_01
1000
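
For scripting, the invocation that finally worked can be wrapped as follows. A sketch in Python that uses only the flags shown in this thread (`-m`, `-g`, `-t`, `-r`); the helper names are invented, and `query_stickiness` assumes `crm_resource` is on the PATH of a cluster node.

```python
import subprocess

def stickiness_command(resource, resource_type="primitive"):
    """Build the crm_resource argv used in the message above."""
    return ["crm_resource", "-m", "-g", "resource-stickiness",
            "-t", resource_type, "-r", resource]

def query_stickiness(resource):
    """Run the query on a cluster node and return whatever value it prints."""
    result = subprocess.run(stickiness_command(resource),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Only the command line is printed here; nothing is executed.
print(" ".join(stickiness_command("ip_01")))
```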


 Not a bad feature request though, could you add it to bugzilla?
 Done, http://developerbugs.linux-foundation.org/show_bug.cgi?id=2521



 Here is what I have and I want to get the current value of 
 resource-stickiness
 [r...@node-03 log]# crm_attribute --type rsc_defaults --name
 resource-stickiness --query
 Multiple attributes match name=resource-stickiness
  Value: INFINITY       (id=working-hours-stickiness)
  Value: 0      (id=after-hours-stickiness)


 Cheers,
 Pavlos



Re: [Pacemaker] Stonith Device APC AP7900

2010-11-16 Thread Dejan Muhamedagic
Hi,

On Mon, Nov 15, 2010 at 10:41:22AM -0700, Devin Reade wrote:
 --On Monday, November 15, 2010 08:40:45 AM -0700 Rick Cone
 rc...@securepaymentsystems.com wrote:
 
  In production I am planning to have 2 separate AP7900 units each plugged
  into 2 different APC UPS units to achieve that.  I would then have the
  single node name on each, for each of the 2 PS's on the individual
  systems.
 
 So for this setup, you would have to trigger two stonith devices, one
 for each AP7900, with identical node names.  Only after both succeeded
 would you be able to consider the node to be dead.  Correct?
 
 I don't recall reading anything in the pacemaker et al documentation that
 would cover this case.  I was under the impression that after one 
 stonith resource is successfully invoked, the node would be considered
 to be offline.  If so, I'd be suspicious about assuming both PDUs
 would get activated without further investigation and testing.  (I 
 don't think that you could consider two node names on one PDU to be
 equivalent to one on each of two PDUs.)

Right, there's currently no way to do a simultaneous reset on two
distinct fencing devices.

 I think that in such a case you'd also have to ensure that your stonith
 action is poweroff rather than reset, or your node may not actually
 lose power (although you could mitigate that likelihood by configuring
 a longer reset time in the PDU).

Defining more than one stonith resource wouldn't work in this
case either, because as soon as one of them reports success, the
node is considered fenced.
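
The gap between the current behaviour and what the two-PDU setup would need can be stated in a few lines. This is only an illustration in Python of the two policies, not Pacemaker or stonithd code; the device names are placeholders.

```python
def fenced_first_success(results):
    """Behaviour described above: one successful device is enough."""
    return any(results.values())

def fenced_all_required(results):
    """What a node with two power supplies on two PDUs would need."""
    return bool(results) and all(results.values())

# Hypothetical outcome: the first AP7900 cut its outlet, the second one failed.
results = {"pdu-a": True, "pdu-b": False}

print(fenced_first_success(results))  # True, although one power supply is still live
print(fenced_all_required(results))   # False
```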

Thanks,

Dejan

 However, while I've written RAs before, I've never looked at the stonith
 logic, so I could be completely out to lunch.  It sounds like an interesting
 edge case, and edge cases make me nervous :)
 
 Devin
 
 


Re: [Pacemaker] understanding scores

2010-11-16 Thread Dejan Muhamedagic
Hi,

On Mon, Nov 15, 2010 at 05:01:27PM +0100, Pavlos Parissis wrote:
 On 15 November 2010 16:43, Andrew Beekhof and...@beekhof.net wrote:
  On Mon, Nov 15, 2010 at 3:18 PM, Pavlos Parissis
  pavlos.paris...@gmail.com wrote:
  On 15 November 2010 08:07, Andrew Beekhof and...@beekhof.net wrote:
  On Fri, Nov 12, 2010 at 7:54 PM, Pavlos Parissis
  pavlos.paris...@gmail.com wrote:
  Hi,
 
  I am trying to understand how the scores are calculated based on the
  output of ptest -sL, and I have a few questions.
  Below are my scores with a line-number column, and at the bottom you will
  find my configuration.
 
  So, let's start
 
  1 group_color: pbx_service_01 allocation score on node-01: 200
   2 group_color: pbx_service_01 allocation score on node-03: 10
   3 group_color: ip_01 allocation score on node-01: 1200
   4 group_color: ip_01 allocation score on node-03: 10
  so far so good, ip_01 has 1000 due to resource-stickiness=1000 plus
  200 from the group location constraint
 
   5 group_color: fs_01 allocation score on node-01: 1000
   6 group_color: fs_01 allocation score on node-03: 0
   7 group_color: pbx_01 allocation score on node-01: 1000
   8 group_color: pbx_01 allocation score on node-03: 0
   9 group_color: sshd_01 allocation score on node-01: 1000
   10 group_color: sshd_01 allocation score on node-03: 0
   11 group_color: mailAlert-01 allocation score on node-01: 1000
   12 group_color: mailAlert-01 allocation score on node-03: 0
  hold on now, why do all the above resources have 1000 on node-01 and not
  1200 like ip_01?
 
  its only applied to ip_01, the rest inherit it from there
 
 
   13 native_color: ip_01 allocation score on node-01: 5200
  5 resources x 1000 from resource-stickiness=1000 plus, right? what
  is the difference between in native and group?
 
  Many things, can you be specific?
  In principle, what are the differences? If my question sounds stupid,
  it is because I don't understand the terminology.
 
  well, groups are an ordered collection of natives.
 
 
 
 
   14 native_color: ip_01 allocation score on node-03: 10
   15 clone_color: ms-drbd_01 allocation score on node-01: 4100
  why 4100?
 
  probably the promotion score
  I have order pbx_service_01-after-drbd_01 inf: ms-drbd_01:promote
  pbx_service_01:start
  does the promotion score you mention come from the above constraint?
 
  only from colocation constraints
 then it comes out from colocation fs_01-on-drbd_01 inf: fs_01
 ms-drbd_01:Master
 which has score inf, where at line 15 score is 4100.
 
  I think my issue here is how I look at the numbers, I assume that
  every time I see score for a resource, that score also includes any
  scores mentioned before. Is my assumption correct?
 
  often, it depends on what colocation constraints you have set up
 
 Ok, let me ask it differently, how by looking at the output of ptest
 -sL can I find the score of a resource for a specific node?
 Since the score of a resource for a specific node is mentioned in
 several lines, it is not that easy- at least to me.

You can try showscores.sh. It should do exactly what you
want. It is not packaged, but you can get it from the pacemaker
repository.
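
If fetching the script is not an option, the per-node scores can also be collected from the `ptest -sL` output directly. A minimal sketch in Python, assuming the line format quoted earlier in this thread (`<phase>: <resource> allocation score on <node>: <score>`); it is not a reimplementation of showscores.sh, and `scores_for` is an invented name.

```python
import re
from collections import defaultdict

# Scores are kept as strings because they may be INFINITY or -INFINITY.
LINE = re.compile(r"(\w+): (\S+) allocation score on (\S+): (-?\w+)")

def scores_for(ptest_output):
    """Map (resource, node) to its list of (phase, score) entries, in output order."""
    table = defaultdict(list)
    for line in ptest_output.splitlines():
        match = LINE.search(line)
        if match:
            phase, resource, node, score = match.groups()
            table[(resource, node)].append((phase, score))
    return dict(table)

sample = """\
group_color: ip_01 allocation score on node-01: 1200
group_color: ip_01 allocation score on node-03: 10
native_color: ip_01 allocation score on node-01: 5200
native_color: ip_01 allocation score on node-03: 10
"""

for (resource, node), entries in scores_for(sample).items():
    print(resource, node, entries)
```

Grouping by resource and node puts the `group_color` and `native_color` entries for the same pair side by side, which is the view asked for above.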

Thanks,

Dejan

 Cheers,
 Pavlos
 


[Pacemaker] DRBD MC 0.8.4 / Pacemaker GUI

2010-11-16 Thread Rasto Levrinc
Hi,

This is the next DRBD MC release 0.8.4. DRBD MC is a Java application that 
helps to configure DRBD, Pacemaker, VMs or any combination of them. It uses 
SSH to connect to the cluster from a desktop computer.

Focus of this release was on performance, profiling, fixing leaks and stuff 
like that. Thanks to this DRBD MC went straight from being bloated and 
sloppy application to lightweight and lean, making the source code a 
textbook of concentrated good-coding practices.

It turned out that, after a couple of changes, it is now possible to run DRBD 
MC as an applet without any performance or functionality loss. See here:

http://oss.linbit.com/drbd-mc/img/drbd-mc-0.8.4.png

It just sits in the browser and makes all the everything-must-be-web people
happy. For now, if you want to enable the applet functionality, you have to
compile and sign it yourself.

You can get DRBD MC here:

http://www.drbd.org/mc/management-console/
http://oss.linbit.com/drbd-mc/DMC-0.8.4.jar
http://oss.linbit.com/drbd-mc/drbd-mc-0.8.4.tar.gz

1. Download the DMC-0.8.4.jar file.
2. Make sure you use Sun Java, not OpenJDK 1.6.
3. Start it: java -jar DMC-0.8.4.jar
4. It connects to the cluster via SSH.

DRBD MC is compatible with everything from Heartbeat 2.1.3 to Pacemaker 1.1.3 with
Corosync or Heartbeat, and with DRBD 8.

Here are the most important changes:
* add clone-node-max meta attribute
* fix defaults in IPaddr/IPaddr2 RAs
* remove useless node name and DNS check in host dialog wizard
* add --no-upgrade-check option
* fix applying of clones
* fix leaks with groups and clones
* graph fixes
* fix graph resizing
* fix leak with DRBD resources
* make it possible to run as an applet
* upgrade Jung library to 2.0.1

Rasto Levrinc

-- 
: Dipl-Ing Rastislav Levrinc
: DRBD-MC http://www.drbd.org/mc/management-console/
: DRBD/HA support and consulting http://www.linbit.com/
DRBD(R) and LINBIT(R) are registered trademarks of LINBIT, Austria.



Re: [Pacemaker] AP9606 fencing device

2010-11-16 Thread Devin Reade
--On Wednesday, October 27, 2010 09:47:14 AM +0200 Pavlos Parissis
pavlos.paris...@gmail.com wrote:

 I have a APC AP9606 PDU and I am trying to find a stonith agent which
 works with that PDU.

I know that this is an old thread, but I'll reply anyway.

I have one cluster that uses an old APC AP9606 for which I've not
been able to obtain a flash update.  In particular, it is:
 hardware revision: J13
 APP version 2.2.0
 AOS version 3.0.3

It is running just fine (see caveat below) with the following configuration,
and I can attest that it has properly stonith'd nodes many times.

primitive msw stonith:apcmastersnmp \
    operations $id=msw-operations \
    op monitor interval=15 timeout=15 start-delay=15 \
    params ipaddr=IPADDR port=161 community=COMMUNITY
clone msw-clone msw \
    meta clone-max=2 target-role=started

(yeah, that monitor interval is probably a little quick ...)

That particular cluster is getting long in the tooth:
 pacemaker-1.0.5-4.6.x86_64
 openais-0.80.5-15.1.x86_64

The caveat is that this PDU used to work with the default implementation,
however at some point someone updated the OIDs in apcmastersnmp to 
match newer firmware.  Therefore, I had to reverse patch that RA:

===
--- apcmastersnmp.c.orig 2009-09-26 16:12:27.0 -0600
+++ apcmastersnmp.c 2009-09-28 16:46:17.0 -0600
@@ -137,12 +137,12 @@
 #define OUTLET_NO_CMD_PEND 2
 
 /* oids */
-#define OID_IDENT                  ".1.3.6.1.4.1.318.1.1.12.1.5.0"
-#define OID_NUM_OUTLETS            ".1.3.6.1.4.1.318.1.1.12.1.8.0"
-#define OID_OUTLET_NAMES           ".1.3.6.1.4.1.318.1.1.12.3.4.1.1.2.%i"
-#define OID_OUTLET_STATE           ".1.3.6.1.4.1.318.1.1.12.3.3.1.1.4.%i"
-#define OID_OUTLET_COMMAND_PENDING ".1.3.6.1.4.1.318.1.1.12.3.5.1.1.5.%i"
-#define OID_OUTLET_REBOOT_DURATION ".1.3.6.1.4.1.318.1.1.12.3.4.1.1.6.%i"
+#define OID_IDENT                  ".1.3.6.1.4.1.318.1.1.4.1.4.0"
+#define OID_NUM_OUTLETS            ".1.3.6.1.4.1.318.1.1.4.4.1.0"
+#define OID_OUTLET_NAMES           ".1.3.6.1.4.1.318.1.1.4.5.2.1.3.%i"
+#define OID_OUTLET_STATE           ".1.3.6.1.4.1.318.1.1.4.4.2.1.3.%i"
+#define OID_OUTLET_COMMAND_PENDING ".1.3.6.1.4.1.318.1.1.4.4.2.1.2.%i"
+#define OID_OUTLET_REBOOT_DURATION ".1.3.6.1.4.1.318.1.1.4.5.2.1.5.%i"
 
 /*
 snmpset -c private -v1 172.16.0.32:161
===
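
The two OID sets differ only in the subtree below the APC enterprise prefix, so in principle the choice could be made at run time instead of by patching. A sketch in Python of such a lookup, using only OIDs from the diff above; the keys `old_firmware` and `new_firmware` and the function name `outlet_oid` are labels invented here, and nothing like this exists in apcmastersnmp itself.

```python
# Per-outlet OID templates copied from the diff; %i is the outlet number.
OIDS = {
    "old_firmware": {  # the "+" lines, which work on the AP9606 described above
        "outlet_names": ".1.3.6.1.4.1.318.1.1.4.5.2.1.3.%i",
        "outlet_state": ".1.3.6.1.4.1.318.1.1.4.4.2.1.3.%i",
    },
    "new_firmware": {  # the "-" lines, as shipped in the updated plugin
        "outlet_names": ".1.3.6.1.4.1.318.1.1.12.3.4.1.1.2.%i",
        "outlet_state": ".1.3.6.1.4.1.318.1.1.12.3.3.1.1.4.%i",
    },
}

def outlet_oid(firmware, field, outlet):
    """Fill the outlet number into the OID template for the given firmware family."""
    return OIDS[firmware][field] % outlet

print(outlet_oid("old_firmware", "outlet_state", 3))
print(outlet_oid("new_firmware", "outlet_state", 3))
```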





Re: [Pacemaker] colocation that doesn't

2010-11-16 Thread Vadym Chepkov

On Nov 15, 2010, at 2:18 AM, Andrew Beekhof wrote:

 On Fri, Nov 5, 2010 at 4:07 AM, Vadym Chepkov vchep...@gmail.com wrote:
 
 On Nov 4, 2010, at 12:53 PM, Alan Jones wrote:
 
 If I understand you correctly, the role of the second resource in the
 colocation command was defaulting to that of the first (Master), which
 is not defined or is untested for non-ms resources.
 Unfortunately, after I changed that line to:
 
 colocation mystateful-ms-loc inf: mystateful-ms:Master myprim:Started
 
 ...it still doesn't work:
 
 myprim  (ocf::pacemaker:DummySlow): Started node6.acme.com
 Master/Slave Set: mystateful-ms
 Masters: [ node5.acme.com ]
 Slaves: [ node6.acme.com ]
 
 And after:
 location myprim-loc myprim -inf: node5.acme.com
 
 myprim  (ocf::pacemaker:DummySlow): Started node6.acme.com
 Master/Slave Set: mystateful-ms
 Masters: [ node6.acme.com ]
 Slaves: [ node5.acme.com ]
 
 What I would like to do is enable logging for the code that calculates
 the weights, etc.
 It is obvious to me that the weights are calculated differently for
 mystateful-ms based on the weights used in myprim.
 Can you enable more verbose logging online or do you have to recompile?
 My version is 1.0.9-89bd754939df5150de7cd76835f98fe90851b677 which is
 different from Vadym's.
 BTW: Is there another release planned for the stable branch?  1.0.9.1
 is now 4 months old.
 I understand that I could take the top of tree, but I would like to
 believe that others are running the same version. ;)
 Thank you!
 Alan
 
 On Thu, Nov 4, 2010 at 8:22 AM, Dejan Muhamedagic deja...@fastmail.fm 
 wrote:
 Hi,
 
 On Thu, Nov 04, 2010 at 06:51:59AM -0400, Vadym Chepkov wrote:
 On Thu, Nov 4, 2010 at 5:37 AM, Dejan Muhamedagic deja...@fastmail.fm 
 wrote:
 
 This should be:
 
 colocation mystateful-ms-loc inf: mystateful-ms:Master myprim:Started
 
 
 Interesting, so in this case it is not necessary?
 
 colocation fs_on_drbd inf: WebFS WebDataClone:Master
 (taken from Cluster_from_Scratch)
 
 but other way around it is?
 
 Yes, the role of the second resource defaults to the role of the
 first. Ditto for order and actions. A bit confusing, I know.
 
 Thanks,
 
 Dejan
 
 
 
 I did it a bit different this time and I observe the same anomaly.
 
 First I started stateful clone
 
 primitive s1 ocf:pacemaker:Stateful
 ms ms1 s1 meta master-max=1 master-node-max=1 clone-max=2 
 clone-node-max=1 notify=true
 
 Then a primitive:
 
 primitive d1 ocf:pacemaker:Dummy
 
 Made sure Master and primitive are running on different hosts
 location ld1 d1 10: xen-12
 
 and then I added constraint
 colocation c1 inf: ms1:Master d1:Started
 
  Master/Slave Set: ms1
 Masters: [ xen-11 ]
 Slaves: [ xen-12 ]
  d1 (ocf::pacemaker:Dummy): Started xen-12
 
 
 It seems colocation constraint is not enough to promote a clone. Looks like 
 a bug.
 
 # ptest -sL|grep s1
 clone_color: ms1 allocation score on xen-11: 0
 clone_color: ms1 allocation score on xen-12: 0
 clone_color: s1:0 allocation score on xen-11: 11
 clone_color: s1:0 allocation score on xen-12: 0
 clone_color: s1:1 allocation score on xen-11: 0
 clone_color: s1:1 allocation score on xen-12: 6
 native_color: s1:0 allocation score on xen-11: 11
 native_color: s1:0 allocation score on xen-12: 0
 native_color: s1:1 allocation score on xen-11: -100
 native_color: s1:1 allocation score on xen-12: 6
 s1:0 promotion score on xen-11: 20
 s1:1 promotion score on xen-12: 20
 
 Vadym
 
 Could you attach the result of cibadmin -Ql when the cluster is in
 this state please?


I created http://developerbugs.linux-foundation.org/show_bug.cgi?id=2522 with 
hb_report included

Thank you,
Vadym
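
The defaulting rule Dejan describes earlier in this thread (the second resource takes the role of the first when it has none of its own) can be made concrete. A small Python sketch of that rule applied to the `rsc:Role` pairs of a crm shell colocation line; it illustrates the rule as stated in the thread, it is not the crm shell's parser, and `expand_roles` is a made-up name.

```python
def expand_roles(first, second):
    """Return (resource, role) pairs, letting the second inherit the first's role."""
    rsc1, _, role1 = first.partition(":")
    rsc2, _, role2 = second.partition(":")
    return (rsc1, role1 or None), (rsc2, role2 or role1 or None)

# Without an explicit role, myprim silently inherits Master.
print(expand_roles("mystateful-ms:Master", "myprim"))
# Spelling out Started avoids that.
print(expand_roles("mystateful-ms:Master", "myprim:Started"))
# The Cluster_from_Scratch form: the first resource has no role, so nothing is inherited.
print(expand_roles("WebFS", "WebDataClone:Master"))
```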




Re: [Pacemaker] drbd-xen and fencing

2010-11-16 Thread Vadym Chepkov

On Nov 15, 2010, at 2:08 AM, Andrew Beekhof wrote:

 Don't use init.d/drbd, use the ocf script that comes with the drbd packages

Well, that doesn't help with live migration, unfortunately.

This is a quote from /etc/xen/scripts/block-drbd:

# This script will not load the DRBD kernel module for you, nor will
# it attach, detach, connect, or disconnect your resource. The init
# script distributed with DRBD will do that for you. Make sure it is
# started before attempting to start a DRBD-backed domU.


 
 On Thu, Nov 11, 2010 at 2:19 PM, Vadym Chepkov vchep...@gmail.com wrote:
 Hi,
 
 I posted a less elaborate version of this question to drbd mail-list, but,
 unfortunately, didn't get a reply,
 maybe audience of this list has more experience.
 I am trying to make xen live migration to work reliably, but wasn't
 successful so far.
 Here is the problem.
 In a cluster configuration I have two types of resources: file systems on
 drbd, with an explicit drbd resource configuration, and
 Xen resources with an implicit one, using the drbd-xen block device helper. For the
 former everything works great, but the latter doesn't work quite well.
 In order for the helper script to work, the drbd module has to be loaded and
 the underlying resources up. So I have to start the init.d/drbd script.
 I can't make it an lsb cluster resource, because stop would be disastrous for
 the file system resources. Enabling it in the startup sequence breaks
 /usr/lib/drbd/crm-unfence-peer.sh, because the cluster stack is not completely
 up by the time the drbd script finishes, and there is no way to configure only
 the specific resources that need to be initialized.
 Also, I can't find a way to fence the Xen resource. I tried fence-peer
 /usr/lib/drbd/crm-fence-peer.sh -i xen_vsvn,
 where xen_svn is the name of the Xen primitive, but it doesn't work,
 so there is a danger of starting a Xen VM on an out-of-date node. There
 is also no way of monitoring the underlying drbd resources.
 I thought of adding the underlying drbd resource explicitly to the cluster, but
 I can't figure out what the configuration would be:
 this resource can be master on both nodes, but if it is master on just one, that's
 fine too.
 allow-two-primaries has to be enabled for live migration, and at the time of
 migration the resources are primary on both nodes, but when migration finishes
 it's primary/slave again. But if I configure the drbd resource in the cluster
 with meta master-max=2 master-node-max=1,
 the cluster insists on having them both primary all the time.
 Hope I didn't bore you to death and there is an elegant solution for
 this conundrum :)
 Thank you,
 Vadym
 
 


Re: [Pacemaker] drbd-xen and fencing

2010-11-16 Thread Vadym Chepkov

On Nov 15, 2010, at 3:45 AM, Lars Ellenberg wrote:

 On Mon, Nov 15, 2010 at 08:08:40AM +0100, Andrew Beekhof wrote:
 Don't use init.d/drbd, use the ocf script that comes with the drbd packages
 
 Then he loses xen live migration, I think.  Because Pacemaker cannot
 migrate dependent resources, and because xen apparently still thinks it
 needs both nodes Primary for a short amount of time.

And something needs to bring the underlying storage into the Connected state.
It seems the current drbd RA can't help with this problem.


 
 On Thu, Nov 11, 2010 at 2:19 PM, Vadym Chepkov vchep...@gmail.com wrote:
 Hi,
 
 I posted a less elaborate version of this question to the drbd mailing list but,
 unfortunately, didn't get a reply;
 maybe the audience of this list has more experience.
 I am trying to make Xen live migration work reliably, but haven't been
 successful so far.
 Here is the problem.
 In my cluster configuration I have two types of resources: file systems on
 drbd, with an explicit drbd resource configuration, and
 Xen resources with an implicit one, using the drbd-xen block device helper. For the
 former everything works great, but the latter doesn't work quite well.
 For the helper script to work, the drbd module has to be loaded and the
 underlying resources have to be up, so I have to start the init.d/drbd script.
 I can't make it an lsb cluster resource, because a stop would be disastrous for
 the file system resources. Enabling it in the startup sequence breaks
 /usr/lib/drbd/crm-unfence-peer.sh, because the cluster stack is not completely
 up by the time the drbd script finishes, and there is no way to configure only
 the specific resources that need to be initialized.
 Also, I can't find a way to fence a Xen resource. I tried fence-peer
 /usr/lib/drbd/crm-fence-peer.sh -i xen_vsvn,
 where xen_vsvn is the name of the Xen primitive, but it doesn't work,
 
 Can you be more specific?
 What did you try,
 what did you expect,
 and how do you determine it did not work?

Whenever a drbd resource gets disconnected, the fencing script usually defines a
constraint, for example:

location drbd-fence-by-handler-ms_drbd_ldap ms_drbd_ldap \
    rule $id=drbd-fence-by-handler-rule-ms_drbd_ldap $role=Master -inf: #uname ne xen-11
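
To make the naming pattern in that constraint explicit, here is a small Python
sketch. It only formats a string of the same shape as the example above; the
function name and its arguments are illustrative assumptions, and it is not the
logic of crm-fence-peer.sh itself.

```python
def fence_constraint(ms_resource: str, surviving_node: str) -> str:
    """Format a crm location constraint shaped like the example above.

    Hypothetical helper for illustration only: it mirrors the id pattern
    seen in the example (drbd-fence-by-handler-<resource>) and forbids the
    Master role on every node whose name is not surviving_node.
    """
    constraint_id = f"drbd-fence-by-handler-{ms_resource}"
    rule_id = f"drbd-fence-by-handler-rule-{ms_resource}"
    return (
        f"location {constraint_id} {ms_resource} \\\n"
        f"    rule $id={rule_id} $role=Master -inf: #uname ne {surviving_node}"
    )


if __name__ == "__main__":
    print(fence_constraint("ms_drbd_ldap", "xen-11"))
```

Note that the constraint is built around the name of a master/slave resource,
so something of that kind has to exist in the cluster configuration for the
constraint to refer to.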

This doesn't happen if I define the handler as
fence-peer /usr/lib/drbd/crm-fence-peer.sh
or
fence-peer /usr/lib/drbd/crm-fence-peer.sh -i xen_vsvn,
where xen_vsvn is the Xen resource with the drbd helper. I assume this is because
crm-fence-peer.sh expects the drbd resource to be present in the cluster
configuration, which is not the case here.

I guess that to solve this issue correctly a drbd clone is necessary, one that
brings the resources to the Connected state and then doesn't freak out if both
nodes are Secondary (when the Xen VM is shut down) or both nodes are Primary
(when a migration is in progress).
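
Spelled out, the requirement is that every combination of roles on the two
nodes is acceptable once the resources are Connected. A minimal Python sketch
of that idea follows; the function and the labels it returns are made up for
illustration and do not describe how Pacemaker or the drbd RA behave today.

```python
from typing import Iterable

# Hypothetical labels for the three situations described above.
SITUATIONS = {
    0: "vm-stopped",   # both nodes Secondary
    1: "vm-running",   # one Primary, one Secondary
    2: "migrating",    # both nodes Primary during live migration
}


def describe_roles(roles: Iterable[str]) -> str:
    """Return which situation a pair of drbd roles corresponds to."""
    pair = list(roles)
    if len(pair) != 2:
        raise ValueError("expected the roles of exactly two nodes")
    for role in pair:
        if role not in ("Primary", "Secondary"):
            raise ValueError(f"unknown drbd role: {role}")
    return SITUATIONS[pair.count("Primary")]


if __name__ == "__main__":
    for pair in (("Secondary", "Secondary"),
                 ("Primary", "Secondary"),
                 ("Primary", "Primary")):
        print(pair, "->", describe_roles(pair))
```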

Vadym


 
 so there is a danger of starting the Xen VM on an out-of-date node. There
 is also no way of monitoring the underlying drbd resources.
 I thought of adding the underlying drbd resource explicitly to the cluster, but
 I can't figure out what the configuration would be
 for a resource that can be master on both nodes, but is also fine as master on
 just one.
 allow-two-primaries has to be enabled for live migration; during
 migration the resources are primary on both nodes, but when migration finishes,
 it's primary/slave again. But if I configure the drbd resource in the cluster
 with meta master-max=2 master-node-max=1,
 the cluster insists on having both primary all the time.
 Hope I didn't bore you to death and there is an elegant solution for
 this conundrum :)
 Thank you,
 Vadym
 
 
 
 -- 
 : Lars Ellenberg
 : LINBIT | Your Way to High Availability
 : DRBD/HA support and consulting http://www.linbit.com
 
 DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
 


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: 

Re: [Pacemaker] AP9606 fencing device

2010-11-16 Thread Pavlos Parissis
On 17 November 2010 04:15, Devin Reade g...@gno.org wrote:

 --On Wednesday, October 27, 2010 09:47:14 AM +0200 Pavlos Parissis
 pavlos.paris...@gmail.com wrote:

  I have a APC AP9606 PDU and I am trying to find a stonith agent which
  works with that PDU.

 I know that this is an old thread, but I'll reply anyway.

 I have one cluster that uses an old APC AP9606 for which I've not
 been able to obtain a flash update.  In particular, it is:
 hardware revision: J13
 APP version 2.2.0
 AOS version 3.0.3

 It is running just fine (see caveat below) with the following configuration,
 and I can attest that it has properly stonith'd nodes many times.

 primitive msw stonith:apcmastersnmp \
        operations $id=msw-operations \
        op monitor interval=15 timeout=15 start-delay=15 \
        params ipaddr=IPADDR port=161 community=COMMUNITY
 clone msw-clone msw \
        meta clone-max=2 target-role=started

 (yeah, that monitor interval is probably a little quick ...)

 That particular cluster is getting long in the tooth:
 pacemaker-1.0.5-4.6.x86_64
 openais-0.80.5-15.1.x86_64

 The caveat is that this PDU used to work with the default implementation;
 however, at some point someone updated the OIDs in apcmastersnmp to
 match newer firmware.  Therefore, I had to reverse patch that RA:

 ===
 --- apcmastersnmp.c.orig 2009-09-26 16:12:27.0 -0600
 +++ apcmastersnmp.c 2009-09-28 16:46:17.0 -0600
 @@ -137,12 +137,12 @@
  #define OUTLET_NO_CMD_PEND 2

  /* oids */
 -#define OID_IDENT  .1.3.6.1.4.1.318.1.1.12.1.5.0
 -#define OID_NUM_OUTLETS.1.3.6.1.4.1.318.1.1.12.1.8.0
 -#define OID_OUTLET_NAMES   .1.3.6.1.4.1.318.1.1.12.3.4.1.1.2.%i
 -#define OID_OUTLET_STATE   .1.3.6.1.4.1.318.1.1.12.3.3.1.1.4.%i
 -#define OID_OUTLET_COMMAND_PENDING .1.3.6.1.4.1.318.1.1.12.3.5.1.1.5.%i
 -#define OID_OUTLET_REBOOT_DURATION .1.3.6.1.4.1.318.1.1.12.3.4.1.1.6.%i
 +#define OID_IDENT  .1.3.6.1.4.1.318.1.1.4.1.4.0
 +#define OID_NUM_OUTLETS.1.3.6.1.4.1.318.1.1.4.4.1.0
 +#define OID_OUTLET_NAMES   .1.3.6.1.4.1.318.1.1.4.5.2.1.3.%i
 +#define OID_OUTLET_STATE   .1.3.6.1.4.1.318.1.1.4.4.2.1.3.%i
 +#define OID_OUTLET_COMMAND_PENDING .1.3.6.1.4.1.318.1.1.4.4.2.1.2.%i
 +#define OID_OUTLET_REBOOT_DURATION .1.3.6.1.4.1.318.1.1.4.5.2.1.5.%i

  /*
snmpset -c private -v1 172.16.0.32:161
 ===


I faced the same problem, and because I didn't want to modify the code of the
apcmastersnmp RA, I used the rackpdu RA, where I could set the OIDs in the
parameters.
This RA worked perfectly until the PDU died!
I suggest using the rackpdu RA, because if you upgrade your cluster software
your modification will be gone.
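
To show what passing the OIDs as parameters amounts to, here is a rough Python
sketch that keeps the OID templates as data and expands the %i placeholder for
one outlet. The values are copied from the "+" lines of the patch quoted
above; the dictionary keys, the function name, and the assumption that outlet
numbers start at 1 are mine, and this does not show the rackpdu RA's actual
parameter names.

```python
# OID templates taken from the "+" lines of the quoted patch, i.e. the
# values restored for the older AP9606 firmware. The keys are illustrative.
AP9606_OIDS = {
    "ident": ".1.3.6.1.4.1.318.1.1.4.1.4.0",
    "num_outlets": ".1.3.6.1.4.1.318.1.1.4.4.1.0",
    "outlet_names": ".1.3.6.1.4.1.318.1.1.4.5.2.1.3.%i",
    "outlet_state": ".1.3.6.1.4.1.318.1.1.4.4.2.1.3.%i",
    "outlet_command_pending": ".1.3.6.1.4.1.318.1.1.4.4.2.1.2.%i",
    "outlet_reboot_duration": ".1.3.6.1.4.1.318.1.1.4.5.2.1.5.%i",
}


def outlet_oid(kind: str, outlet: int) -> str:
    """Expand a per-outlet OID template for the given outlet number."""
    template = AP9606_OIDS[kind]
    if "%i" not in template:
        raise ValueError(f"{kind} is not a per-outlet OID")
    if outlet < 1:
        # Assumption: outlets are numbered from 1.
        raise ValueError("outlet numbers are assumed to start at 1")
    return template % outlet


if __name__ == "__main__":
    print(outlet_oid("outlet_state", 3))
```

Keeping the OIDs outside the agent's source like this is what lets the setup
survive a package upgrade.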

Cheers,
Pavlos
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker