Re: [Pacemaker] power failure handling

2010-05-27 Thread Vadym Chepkov

On May 27, 2010, at 7:21 AM, Andrew Beekhof wrote:

 On Wed, May 26, 2010 at 9:07 PM, Vadym Chepkov vchep...@gmail.com wrote:
 Hi,
 
 What would be the proper way to shut down members of a two-node cluster in case 
 of a power outage?
 I assume as soon as I issue 'crm node standby node-1 reboot' resources will 
 start to fail over to the second node and,
 first of all, there is no reason for that, and, second of all,
 a consecutive 'crm node standby node-2 reboot' might get into some race 
 condition.
 
 Why?

Just a gut feeling, and I would prefer to have it in one transaction; call me 
a purist :)
I would use 'crm configure load update standby.cfg', but I can't figure out how 
to set the lifetime=reboot attribute properly. 
crm is definitely using a hack on this one, because when I issue this command 
the node goes standby, but 'crm configure show' and 'crm node show' 
indicate that the standby attribute is off, weird.

 
 
 In pseudo-property terms:
 
 crm configure property 
 stop-all-resources-even-if-target-role-is-started-until-reboot=true
 
 crm configure property stop-all-resources=true
 
 followed by:
  cibadmin --delete-all --xpath '//nvpair[@name=target-role]'
 
 should work

it would also alter resources that were stopped for a reason, and it certainly 
can be tweaked, 
but it won't take care of the until-reboot part.
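For what it's worth, the until-reboot behaviour corresponds to a transient attribute stored in the CIB status section, which is discarded when a node rejoins after a reboot. A hedged, untested sketch (assuming pacemaker 1.0's crm_attribute options; the node names are placeholders):

```shell
# Sketch only: put both nodes into until-reboot standby by writing the
# standby attribute with lifetime=reboot (i.e. into the transient
# status section), so it clears itself on the next reboot.
crm_attribute --node node-1 --name standby --update on --lifetime reboot
crm_attribute --node node-2 --name standby --update on --lifetime reboot

# Verify; the attribute should disappear after the node reboots.
crm_attribute --node node-1 --name standby --lifetime reboot --query
```

This is the same mechanism the crm shell's "hack" appears to use, which would explain why 'crm configure show' (which only looks at the configuration section) reports standby as off.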

Thanks,
Vadym
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


Re: [Pacemaker] pengine self-maintenance

2010-05-19 Thread Vadym Chepkov

On May 17, 2010, at 11:38 AM, Dejan Muhamedagic wrote:
 
 You don't want to set it that low. PE input files are part of
 your cluster history. Set it to a few thousand.
 

What could be the drawbacks of setting it too low?
How are these files being used?

And shouldn't some reasonable default be in place? I just happened to notice 
90% inode utilization on my /var; some may not be so lucky.


 # ls /var/lib/pengine/|wc -l
 123500
 
 
 /var/lib/heartbeat/crm/ seems also growing unattended.
 
 Unless there is a bug somewhere, it should be storing only the last
 100 configurations.
 
 you are right, they are being reused


I found another bug/feature :)
When it's time to reuse the cib-xxx/pe-xxx files, the numbering restarts at 1, 
but the initial start creates files with a 0 suffix.
So you have your pe-warn-0.bz2 frozen in time, for example :)



 
 
 Does pacemaker do any self-maintenance, or will it eventually crash the system
 by using up all inodes?
 
 Also, why is cluster-recheck-interval not in the pengine metadata output? Is it
 deprecated?
 
 It's controlled by the crmd, so it's in the crmd metadata output.
 
 Ah, then crm cli has a bug? 
 
 When you press TAB, the crmd metadata is not shown:
 
 crm(live)configure# property 
 batch-limit=                  no-quorum-policy=       pe-input-series-max=    stonith-enabled=
 cluster-delay=                node-health-green=      pe-warn-series-max=     stonith-timeout=
 default-action-timeout=       node-health-red=        remove-after-stop=      stop-all-resources=
 default-resource-stickiness=  node-health-strategy=   start-failure-is-fatal= stop-orphan-actions=
 is-managed-default=           node-health-yellow=     startup-fencing=        stop-orphan-resources=
 maintenance-mode=             pe-error-series-max=    stonith-action=         symmetric-cluster=
 
 Yes, you can file a bugzilla for that. Note that the property
 will still be set if you type it.
 

Done, Bug 2419

Thanks,
Vadym





Re: [Pacemaker] pengine self-maintenance

2010-05-19 Thread Vadym Chepkov
On Wed, May 19, 2010 at 1:26 PM, Dejan Muhamedagic deja...@fastmail.fm wrote:

  And shouldn't be some reasonable default be in place? I just
  happened to notice 90% inode utilization on my /var, some could
  be not so lucky.


 Yes, that could be a problem. Perhaps that default could be
 changed to say 1 which would be close enough to unlimited for
 clusters in normal use :)


Even if your cluster is absolutely solid and none of the applications ever goes 
up or down, this will be reached in 104 days :)
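For what it's worth, the arithmetic behind that figure (assuming the proposed cap was 10000 files, which the truncated "1" above seems to be, and the default 15-minute cluster-recheck-interval producing roughly one PE input per run) works out like this:

```python
# Back-of-the-envelope check of the 104-day claim. The cap of 10000 and
# the 15-minute recheck interval are assumptions for illustration.
FILES_CAP = 10000
RECHECKS_PER_DAY = 24 * 60 // 15  # 96 PE runs per day

days_until_cap = FILES_CAP // RECHECKS_PER_DAY
print(days_until_cap)  # 104
```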


  I found another bug/feature :)
  When it's time to reuse the cib-xxx/pe-xxx files, the numbering restarts at 1, but
 the initial start creates files with a 0 suffix
  So you have your pe-warn-0.bz2 frozen in time, for example :)



Vadym

Re: [Pacemaker] IP address does not failover on a new test cluster

2010-05-18 Thread Vadym Chepkov
On Tue, May 18, 2010 at 2:22 PM, Ruiyuan Jiang ruiyuan_ji...@liz.com wrote:

  Hi, Vadym



 I modified the configuration per your suggestion. Here is the current
 configuration of the cluster:



 [r...@usnbrl52 ~]# crm configure show

 node usnbrl52

 node usnbrl53

 primitive ClusterIP ocf:heartbeat:IPaddr2 \

 params ip=156.146.22.48 cidr_netmask=32 \

 op monitor interval=30s

 property $id=cib-bootstrap-options \

 dc-version=1.0.8-fab8db4bbd271ba0a630578ec23776dfbaa4e2cf \

 cluster-infrastructure=openais \

 expected-quorum-votes=2 \

 stonith-enabled=false

 rsc_defaults $id=rsc-options \

 resource-stickiness=100

 [r...@usnbrl52 ~]#



 After the change, the IP address still does not fail over to the other node
 usnbrl53 after I shut down openais on node usnbrl52. The cluster IP has no
 problem binding on usnbrl52 when the “openais” gets stopped and started on
 the node.


That's because no-quorum-policy=ignore is still not there; it is not listed
in the 'crm configure show' output.
Run the command again:

crm configure property no-quorum-policy=ignore

and make sure 'crm configure show' has changed accordingly.

Vadym

Re: [Pacemaker] IP address does not failover on a new test cluster

2010-05-18 Thread Vadym Chepkov
On Tue, May 18, 2010 at 3:58 PM, Ruiyuan Jiang ruiyuan_ji...@liz.com wrote:

  Thanks, Vadym



 This time it failed over to the other node. For a two-node cluster, does the
 cluster have to be set to “no-quorum-policy=ignore” to fail over or work
 correctly?


I can't say it better myself:

http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

A two-node cluster only has quorum when both nodes are running, which is no
longer the case for our cluster. This would normally make the creation of a
two-node cluster pointless; however, it is possible to control how Pacemaker
behaves when quorum is lost. In particular, we can tell the cluster to simply
ignore quorum altogether.
crm configure property no-quorum-policy=ignore

Re: [Pacemaker] pengine self-maintenance

2010-05-17 Thread Vadym Chepkov

On May 17, 2010, at 2:52 AM, Andrew Beekhof wrote:

 On Sun, May 16, 2010 at 1:09 AM, Vadym Chepkov vchep...@gmail.com wrote:
 Hi
 
 I noticed pengine (pacemaker-1.0.8-6.el5) creates quite a lot of files in
 /var/lib/pengine,
 especially when cluster-recheck-interval is set to enable failure-timeout
 checks.
 
 pengine metadata | grep series-max

Great, thanks. After I set it, I take it I need to clean up the excess files manually?

# crm configure show |grep series-max
pe-error-series-max=10 \
pe-warn-series-max=10 \
pe-input-series-max=10

# ls /var/lib/pengine/|wc -l
123500
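Since the *-series-max settings only cap new files, an existing backlog like this has to be removed by hand. A hedged sketch of the selection logic (the directory, prefix, and keep count are illustrative assumptions; the actual deletion is left commented out):

```python
import re

def files_to_prune(names, prefix="pe-input", keep=10):
    """Return series files whose numeric suffix is not among the
    `keep` highest, i.e. the candidates for deletion."""
    pat = re.compile(r"^%s-(\d+)\.bz2$" % re.escape(prefix))
    numbered = sorted(
        (int(m.group(1)), n) for n in names if (m := pat.match(n))
    )
    # Keep the newest `keep` files; everything older is prunable.
    return [n for _, n in numbered[:-keep]]

# Usage sketch (not run here):
#   import os
#   for name in files_to_prune(os.listdir("/var/lib/pengine")):
#       os.remove(os.path.join("/var/lib/pengine", name))
```

The same idea applies to the pe-warn and pe-error series by changing the prefix.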

 
 /var/lib/heartbeat/crm/ seems also growing unattended.
 
 Unless there is a bug somewhere, it should be storing only the last
 100 configurations.

you are right, they are being reused

 
  Does pacemaker do any self-maintenance, or will it eventually crash the system
  by using up all inodes?
 
  Also, why is cluster-recheck-interval not in the pengine metadata output? Is it
  deprecated?
 
 It's controlled by the crmd, so it's in the crmd metadata output.

Ah, then crm cli has a bug? 

When you press TAB, the crmd metadata is not shown:

crm(live)configure# property 
batch-limit=                  no-quorum-policy=       pe-input-series-max=    stonith-enabled=
cluster-delay=                node-health-green=      pe-warn-series-max=     stonith-timeout=
default-action-timeout=       node-health-red=        remove-after-stop=      stop-all-resources=
default-resource-stickiness=  node-health-strategy=   start-failure-is-fatal= stop-orphan-actions=
is-managed-default=           node-health-yellow=     startup-fencing=        stop-orphan-resources=
maintenance-mode=             pe-error-series-max=    stonith-action=         symmetric-cluster=

Thanks,
Vadym




Re: [Pacemaker] Detecting a lost network connection

2010-05-17 Thread Vadym Chepkov

On May 17, 2010, at 11:56 AM, Simon Lavigne-Giroux wrote:

 Hi,
 
 I have 2 servers running Pacemaker. When the router fails, both nodes become 
 primary. Is it possible for Pacemaker on the secondary server to detect that 
 the network connection is not available and not become primary?
 

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ch09s03s03s02.html





Re: [Pacemaker] IP address does not failover on a new test cluster

2010-05-17 Thread Vadym Chepkov

On May 17, 2010, at 5:40 PM, Ruiyuan Jiang wrote:

 Hi, Gianluca
  
 I modified my configuration and deleted “crm configure property 
 no-quorum-policy=ignore” as you suggested, but I have the same problem that 
 the IP address does not fail over. Thanks.
  
 [r...@usnbrl52 log]# crm configure show
 node usnbrl52
 node usnbrl53
 primitive ClusterIP ocf:heartbeat:IPaddr2 \
 params ip=156.146.22.48 cidr_netmask=32 \
 op monitor interval=30s
 property $id=cib-bootstrap-options \
 dc-version=1.0.8-fab8db4bbd271ba0a630578ec23776dfbaa4e2cf \
 cluster-infrastructure=openais \
 expected-quorum-votes=2 \
 stonith-enabled=false
 rsc_defaults $id=rsc-options \
 resource-stickiness=default

Did you run 'crm configure show' after you set the property?
The option is not shown in your output.

Also, resource-stickiness=default seems suspicious.
What default? I thought it should be a numeric value.




[Pacemaker] pengine self-maintenance

2010-05-15 Thread Vadym Chepkov
Hi

I noticed pengine (pacemaker-1.0.8-6.el5) creates quite a lot of files in 
/var/lib/pengine,
especially when cluster-recheck-interval is set to enable failure-timeout 
checks.
/var/lib/heartbeat/crm/ also seems to be growing unattended.
Does pacemaker do any self-maintenance, or will it eventually crash the system 
by using up all inodes?

Also, why is cluster-recheck-interval not in the pengine metadata output? Is it 
deprecated?

Thanks,
Vadym

Re: [Pacemaker] clone ip definition and location stops my resources...

2010-05-11 Thread Vadym Chepkov
You forgot to turn on the monitor operation for ping (which does the actual job).


On May 11, 2010, at 5:15 AM, Gianluca Cecchi wrote:

 On Mon, May 10, 2010 at 4:39 PM, Vadym Chepkov vchep...@gmail.com wrote:
 # crm ra meta ping
 
 name (string, [undef]): Attribute name
 The name of the attributes to set.  This is the name to be used in the 
 constraints.
 
 By default it is pingd, but you are checking against pinggw
 
 I suggest you do not change the name though, but adjust your location constraint 
 to use pingd instead.
 crm_mon only notices pingd at the moment when you pass the -f argument: it's 
 hardcoded
 
 
 On Mon, May 10, 2010 at 9:34 AM, Gianluca Cecchi gianluca.cec...@gmail.com 
 wrote:
 Hello,
 using pacemaker 1.0.8 on rh el 5 I have some problems understanding the way 
 ping clone works to setup monitoring of gw... even after reading docs...
 
 As soon as I run:
 crm configure location nfs-group-with-pinggw nfs-group rule -inf: not_defined 
 pinggw or pinggw lte 0
 
 the resources go stopped and don't re-start
 
 [snip]
 
 hem...
 I changed the location line so that now I have:
 primitive pinggw ocf:pacemaker:ping \
   params host_list=192.168.101.1 multiplier=100 \
   op start interval=0 timeout=90 \
   op stop interval=0 timeout=100
 
 clone cl-pinggw pinggw \
   meta globally-unique=false
 
 location nfs-group-with-pinggw nfs-group \
   rule $id=nfs-group-with-pinggw-rule -inf: not_defined pingd or pingd 
 lte 0
 
 But now nothing happens  if I run for example
  iptables -A OUTPUT -p icmp -d 192.168.101.1 -j REJECT (or DROP)
 in the node where nfs-group is running.
 
 Do I have to rename the primitive itself to pingd?
 It seems that the binary /bin/ping is not accessed at all (checked with ls -lu ...)
 
 Or do I have to change the general property I previously defined to avoid 
 failback:
 rsc_defaults $id=rsc-options \
   resource-stickiness=100
 
 crm_mon -f -r gives:
 Online: [ ha1 ha2 ]
 
 Full list of resources:
 
 SitoWeb (ocf::heartbeat:apache):Started ha1
  Master/Slave Set: NfsData
  Masters: [ ha1 ]
  Slaves: [ ha2 ]
  Resource Group: nfs-group
  ClusterIP  (ocf::heartbeat:IPaddr2): Started ha1
  lv_drbd0   (ocf::heartbeat:LVM):   Started ha1
  NfsFS(ocf::heartbeat:Filesystem):Started ha1
  nfssrv (ocf::heartbeat:nfsserver): Started ha1
 nfsclient (ocf::heartbeat:Filesystem):Started ha2
  Clone Set: cl-pinggw
  Started: [ ha2 ha1 ]
 
 Migration summary:
 * Node ha1:  pingd=100
 * Node ha2:  pingd=100
 
 Probably I didn't understand correctly what described at the link:
 http://www.clusterlabs.org/wiki/Pingd_with_resources_on_different_networks
 or it is outdated now... and instead of defining two clones it is better (aka 
 works) to populate the host_list parameter as described here in case of more 
 networks connected:
 
 http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch09s03s03.html
 
 Probably I'm missing something very simple but I don't get a clue to it...
 Gianluca

Re: [Pacemaker] clone ip definition and location stops my resources...

2010-05-11 Thread Vadym Chepkov
First of all, none of the monitor operations is on by default in pacemaker; this 
is something that you have to turn on.
For the ping RA the start and stop op parameters don't do much, so you can safely 
drop them.

Here are my settings; they do work for me:

primitive ping ocf:pacemaker:ping \
        params name=pingd host_list=10.10.10.250 multiplier=200 timeout=3 \
        op monitor interval=10
clone connected ping \
        meta globally-unique=false
location rg0-connected rg0 \
        rule -inf: not_defined pingd or pingd lte 0


On May 11, 2010, at 7:06 AM, Gianluca Cecchi wrote:

 On Tue, May 11, 2010 at 12:50 PM, Vadym Chepkov vchep...@gmail.com wrote:
 You forgot to turn on monitor operation for ping (actual job)
 
 
 
 I saw from the 
 [r...@ha1 ~]# crm ra meta ping 
 command
 
 Operations' defaults (advisory minimum):
 
 start timeout=60
 stop  timeout=20
 reloadtimeout=100
 monitor_0 interval=10 timeout=60
 
 So I presumed it was by default in place for the ping resource.
 Do you mean that I should define the resource this way:
 crm configure primitive pinggw ocf:pacemaker:ping \
  params host_list=192.168.101.1 multiplier=100 \
  op start interval=0 timeout=90 \
  op stop interval=0 timeout=100 \
  op monitor interval=10 timeout=60
 
 Ok, I did it and I now get the same behavior as with pingd. Thanks ;-)
 
 Migration summary:
 * Node ha1:  pingd=0
 * Node ha2:  pingd=100
 
 And if I remove the iptables rule  I get:
 Migration summary:
 * Node ha1:  pingd=100
 * Node ha2:  pingd=100
 
 Now I will check the all resources stopped problem...

Re: [Pacemaker] clone ip definition and location stops my resources...

2010-05-11 Thread Vadym Chepkov
By the way, there is another issue with your config.

Since you set the multiplier to 100, it will negate your resource-stickiness, 
which is also set to 100.
Either reduce the multiplier or increase the default resource-stickiness (I have 
mine at 1000).
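The interaction is just score arithmetic: the ping attribute contributes multiplier × (number of reachable hosts) to a node's score, which competes with resource-stickiness. A simplified model (not Pacemaker's actual placement code) using the numbers from this thread:

```python
# Simplified score comparison; real Pacemaker scoring has more inputs.
# This only illustrates stickiness vs. the connectivity multiplier.
def stays_put(stickiness, multiplier, hosts_here, hosts_there):
    """True if the current node's score (stickiness + connectivity)
    strictly beats the alternative node's connectivity score."""
    return stickiness + multiplier * hosts_here > multiplier * hosts_there

# multiplier 100 vs stickiness 100: one extra reachable host elsewhere
# already cancels the current node's stickiness.
print(stays_put(100, 100, 1, 2))   # False: the resource may migrate
print(stays_put(1000, 100, 1, 2))  # True: stickiness dominates
```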


Vadym
On May 11, 2010, at 7:06 AM, Gianluca Cecchi wrote:

 On Tue, May 11, 2010 at 12:50 PM, Vadym Chepkov vchep...@gmail.com wrote:
 You forgot to turn on monitor operation for ping (actual job)
 
 
 
 I saw from the 
 [r...@ha1 ~]# crm ra meta ping 
 command
 
 Operations' defaults (advisory minimum):
 
 start timeout=60
 stop  timeout=20
 reloadtimeout=100
 monitor_0 interval=10 timeout=60
 
 So I presumed it was by default in place for the ping resource.
 Do you mean that I should define the resource this way:
 crm configure primitive pinggw ocf:pacemaker:ping \
  params host_list=192.168.101.1 multiplier=100 \
  op start interval=0 timeout=90 \
  op stop interval=0 timeout=100 \
  op monitor interval=10 timeout=60
 
 Ok, I did it and I now get the same behavior as with pingd. Thanks ;-)
 
 Migration summary:
 * Node ha1:  pingd=0
 * Node ha2:  pingd=100
 
 And if I remove the iptables rule  I get:
 Migration summary:
 * Node ha1:  pingd=100
 * Node ha2:  pingd=100
 
 Now I will check the all resources stopped problem...

Re: [Pacemaker] clone ip definition and location stops my resources...

2010-05-11 Thread Vadym Chepkov
pingd is a daemon which is running all the time and does its job.
You still need to define a monitor operation though: what if the daemon dies?
'op monitor' just has a different meaning for ping and pingd:
with pingd - monitor the daemon
with ping - monitor connectivity

as for warnings:

crm configure property default-action-timeout=120s

On Tue, May 11, 2010 at 11:00 AM, Gianluca Cecchi gianluca.cec...@gmail.com
 wrote:

 On Tue, May 11, 2010 at 1:13 PM, Vadym Chepkov vchep...@gmail.com wrote:

  First of all, none of the monitor operations is on by default in pacemaker;
  this is something that you have to turn on.
  For the ping RA the start and stop op parameters don't do much, so you can
  safely drop them.



 Yes, but for the pacemaker:pingd RA I didn't need to pass the op monitor
 parameter to have it working

 Also, in general I added the start/stop op parameters, because without them
 I get, for example with the command you suggested:

 [r...@ha1 ~]# crm configure primitive pinggw ocf:pacemaker:ping \
  params host_list=192.168.101.1 multiplier=200 timeout=3 \
  op monitor interval=10
 WARNING: pinggw: default-action-timeout 20s for start is smaller than the
 advised 60
 WARNING: pinggw: default-action-timeout 20s for monitor_0 is smaller than
 the advised 60

 Do I have to ignore the warnings?
 Or do I have to adapt the resource creation with:
 [r...@ha1 ~]# crm configure primitive pinggw ocf:pacemaker:ping \
  params host_list=192.168.101.1 multiplier=200 timeout=3 \
  op start timeout=60

 That gives no warnings (even if I would have expected the warning about the
 monitor_0 timeout as I didn't set it...???)





Re: [Pacemaker] clone ip definition and location stops my resources...

2010-05-11 Thread Vadym Chepkov
There is no default unless it's set; that's why crm complains.



On Tue, May 11, 2010 at 12:41 PM, Gianluca Cecchi gianluca.cec...@gmail.com
 wrote:

 On Tue, May 11, 2010 at 5:47 PM, Vadym Chepkov vchep...@gmail.com wrote:

  pingd is a daemon which is running all the time and does its job
  you still need to define a monitor operation though, what if the daemon
  dies?
  op monitor just has a different meaning for ping and pingd:
  with pingd - monitor the daemon
  with ping - monitor connectivity

 as for warnings:

 crm configure property default-action-timeout=120s


 Thanks again!
 Now it is more clear.

  Only doubt: why doesn't pacemaker directly set 120s as the default
  timeout?
 Any drawbacks in setting it to 120?
 Also, with
 crm configure show
 I can see
  property $id=cib-bootstrap-options \
 dc-version=1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7 \
 cluster-infrastructure=openais \
  expected-quorum-votes=2 \
 stonith-enabled=false \
  no-quorum-policy=ignore \
 last-lrm-refresh=1273484758
 rsc_defaults $id=rsc-options \
  resource-stickiness=1000

  Any way to see the default value of the default-action-timeout
  parameter that I'm going to change (I presume it is 20s from the warnings I
  received), and of the other ones that are not shown with the show
  command?



Re: [Pacemaker] Pacemaker installation on CentOs 5.3

2010-05-11 Thread Vadym Chepkov
You didn't have to do 'yum makecache'.

Some time ago Andrew accidentally replaced some rpms without bumping up the
revision number.
This made yum complain.
'yum clean all' should have cured all that.
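In other words, the recovery sequence for stale repo metadata boils down to the following sketch ('yum makecache' is optional, as noted above):

```shell
# Drop all cached repodata, including entries with bad checksums,
# then let the next yum operation rebuild the cache.
yum clean all
# Optional: pre-build the cache instead of waiting for the next install.
yum makecache
yum install pacemaker
```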



On Tue, May 11, 2010 at 2:09 PM, Simon Lavigne-Giroux simon...@gmail.com wrote:

 I found the solution to my problem, I had to do a 'yum clean all' and 'yum
 makecache' before doing the 'yum update'

 I'm just getting used to yum.

 Simon

 On Mon, May 10, 2010 at 12:55 PM, Simon Lavigne-Giroux simon...@gmail.com
  wrote:

 Hi,

 I'm trying to install pacemaker from your epel-5 repository from your
 guide for a CentOs installation and it doesn't work.

 There is a checksum failure when using 'yum update' :

 http://www.clusterlabs.org/rpm/epel-5/repodata/filelists.xml.gz: [Errno
 -1] Metadata file does not match checksum
 Trying other mirror.
 Error: failure: repodata/filelists.xml.gz from clusterlabs: [Errno 256] No
 more mirrors to try.

 When I call 'yum install pacemaker', I have missing dependency errors for
 these elements

 libnetsnmpagent.so.15
 libcrypto.so.8
 libtinfo.so.5
 libxml2.so.2
 ... and more.

 Can you repair the checksum problem? Is there an alternative way to get
 pacemaker from a repository on CentOS 5.3?

 Thanks

 Simon




Re: [Pacemaker] clone ip definition and location stops my resources...

2010-05-10 Thread Vadym Chepkov
# crm ra meta ping

name (string, [undef]): Attribute name
The name of the attributes to set.  This is the name to be used in the
constraints.

By default it is pingd, but you are checking against pinggw

I suggest you do not change the name though, but adjust your location constraint
to use pingd instead.
crm_mon only notices pingd at the moment when you pass the -f argument: it's
hardcoded


On Mon, May 10, 2010 at 9:34 AM, Gianluca Cecchi
gianluca.cec...@gmail.com wrote:

 Hello,
 using pacemaker 1.0.8 on RHEL 5 I have some problems understanding the way
 the ping clone works to set up monitoring of the gw... even after reading the docs...

 As soon as I run:
 crm configure location nfs-group-with-pinggw nfs-group rule -inf:
 not_defined pinggw or pinggw lte 0

 the resources go stopped and don't re-start

 Then, as soon as I run
 crm configure delete nfs-group-with-pinggw

 the resources of the group start again...

 config (part of it, actually) I try to apply is this:
 group nfs-group ClusterIP lv_drbd0 NfsFS nfssrv \
 meta target-role=Started
 ms NfsData nfsdrbd \
 meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
 notify=true
 primitive pinggw ocf:pacemaker:ping \
 params host_list=192.168.101.1 multiplier=100 \
 op start interval=0 timeout=90 \
  op stop interval=0 timeout=100
 clone cl-pinggw pinggw \
 meta globally-unique=false
 location nfs-group-with-pinggw nfs-group \
 rule $id=nfs-group-with-pinggw-rule -inf: not_defined pinggw or pinggw
 lte 0

 Is the location constraint to be done with the ping resource or with its clone?
 Is it a cause of the problem that I have also defined an nfs client on the
 other node with:

 primitive nfsclient ocf:heartbeat:Filesystem \
 params device=nfsha:/nfsdata/web directory=/nfsdata/web fstype=nfs \
  op start interval=0 timeout=60 \
 op stop interval=0 timeout=60
 colocation nfsclient_not_on_nfs-group -inf: nfs-group nfsclient
 order nfsclient_after_nfs-group inf: nfs-group nfsclient

 Thansk in advance,
 Gianluca

 From messages of the server running the nfs-group at that moment:
 May 10 15:18:27 ha1 cibadmin: [29478]: info: Invoked: cibadmin -Ql
 May 10 15:18:27 ha1 cibadmin: [29479]: info: Invoked: cibadmin -Ql
 May 10 15:18:28 ha1 crm_shadow: [29536]: info: Invoked: crm_shadow -c
 __crmshell.29455
 May 10 15:18:28 ha1 cibadmin: [29537]: info: Invoked: cibadmin -p -U
 May 10 15:18:28 ha1 crm_shadow: [29539]: info: Invoked: crm_shadow -C
 __crmshell.29455 --force
 May 10 15:18:28 ha1 cib: [8470]: info: cib_replace_notify: Replaced:
 0.267.14 - 0.269.1 from null
 May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: - cib
 epoch=267 num_updates=14 admin_epoch=0 /
 May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: + cib
 epoch=269 num_updates=1 admin_epoch=0 
 May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
 configuration 
 May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
 constraints 
 May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
 rsc_location id=nfs-group-with-pinggw rsc=nfs-group
 __crm_diff_marker__=added:top 
 May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
   rule boolean-op=or id=nfs-group-with-pinggw-rule score=-INFINITY 
 May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
 expression attribute=pinggw id=nfs-group-with-pinggw-expression
 operation=not_defined /
 May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
 expression attribute=pinggw id=nfs-group-with-pinggw-expression-0
 operation=lte value=0 /
 May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
   /rule
 May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
 /rsc_location
 May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
 /constraints
 May 10 15:18:28 ha1 crmd: [8474]: info: abort_transition_graph:
 need_abort:59 - Triggered transition abort (complete=1) : Non-status change
 May 10 15:18:28 ha1 attrd: [8472]: info: do_cib_replaced: Sending full
 refresh
 May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: +
 /configuration
 May 10 15:18:28 ha1 crmd: [8474]: info: need_abort: Aborting on change to
 epoch
 May 10 15:18:28 ha1 attrd: [8472]: info: attrd_trigger_update: Sending
 flush op to all hosts for: master-nfsdrbd:0 (1)
 May 10 15:18:28 ha1 cib: [8470]: info: log_data_element: cib:diff: + /cib
 May 10 15:18:28 ha1 crmd: [8474]: info: do_state_transition: State
 transition S_IDLE - S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
 origin=abort_transition_graph ]
 May 10 15:18:28 ha1 cib: [8470]: info: cib_process_request: Operation
 complete: op cib_replace for section 'all' (origin=local/crm_shadow/2,
 version=0.269.1): ok (rc=0)
 May 10 15:18:28 ha1 crmd: [8474]: info: do_state_transition: All 2 cluster
 nodes are eligible to run resources.
 May 10 15:18:28 ha1 cib: [8470]: info: cib_process_request: Operation
 complete: op cib_modify for section nodes 

Re: [Pacemaker] pacemaker and gnbd

2010-05-04 Thread Vadym Chepkov
On Tue, May 4, 2010 at 3:41 AM, Andrew Beekhof and...@beekhof.net wrote:


 Hmmm... I wonder if the RHEL5.5 kernel is new enough to run the dlm.
 I suspect not.

 Why not try the RHEL6 beta?  It comes with compatible versions of
 everything (including pacemaker).


http://ftp.redhat.com/redhat/rhel/beta/6/x86_64/os/Packages/

I don't see gnbd.

And EPEL is not supporting RHEL6 yet.

Vadym

Re: [Pacemaker] pacemaker and gnbd

2010-05-03 Thread Vadym Chepkov

On May 3, 2010, at 2:23 AM, Andrew Beekhof wrote:
 
 
 I doubt openais conflicts with corosync, unless you have a very old
 version of cman.
 The repos include openais 1.0.x which is built against corosync.
 

Unless I am doing something terribly wrong, this is not the case.

Red Hat EL 5.5 (the latest at the moment) comes with cman-2.0.115-34.el5.x86_64.rpm:

# rpm -q --requires -p cman-2.0.115-34.el5.x86_64.rpm 
warning: cman-2.0.115-34.el5.x86_64.rpm: Header V3 DSA signature: NOKEY, key ID 
37017186
kernel = 2.6.18-36.el5
/sbin/chkconfig  
/sbin/chkconfig  
openais  
pexpect  
/bin/sh  
/bin/sh  
rpmlib(PayloadFilesHavePrefix) = 4.0-1
rpmlib(CompressedFileNames) = 3.0.4-1
/bin/bash  
/usr/bin/perl  
/usr/bin/python  
libcpg.so.2()(64bit)  
libcpg.so.2(OPENAIS_CPG_1.0)(64bit)  
libc.so.6()(64bit)  
libc.so.6(GLIBC_2.2.5)(64bit)  
libc.so.6(GLIBC_2.3.2)(64bit)  
libc.so.6(GLIBC_2.3.3)(64bit)  
libc.so.6(GLIBC_2.3)(64bit)  
libdlm.so.2()(64bit)  
libdl.so.2()(64bit)  
libm.so.6()(64bit)  
libnss3.so()(64bit)  
libnss3.so(NSS_3.2)(64bit)  
libnss3.so(NSS_3.4)(64bit)  
libpthread.so.0()(64bit)  
libpthread.so.0(GLIBC_2.2.5)(64bit)  
libpthread.so.0(GLIBC_2.3.2)(64bit)  
librt.so.1()(64bit)  
librt.so.1(GLIBC_2.2.5)(64bit)  
libSaCkpt.so.2()(64bit)  
libSaCkpt.so.2(OPENAIS_CKPT_B.01.01)(64bit)  
libxml2.so.2()(64bit)  
libz.so.1()(64bit)  
perl(Getopt::Std)  
perl(IPC::Open3)  
perl(Net::Telnet)  
perl(POSIX)  
perl(strict)  
perl(warnings)  
perl(XML::LibXML)  

So, it depends on openais 0.8 (libcpg.so.2) 

And here is yum output:

# yum install gnbd
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package gnbd.x86_64 0:1.1.7-1.el5 set to be updated
--> Processing Dependency: libcman.so.2()(64bit) for package: gnbd
--> Running transaction check
---> Package cman.x86_64 0:2.0.115-34.el5 set to be updated
--> Processing Dependency: libSaCkpt.so.2(OPENAIS_CKPT_B.01.01)(64bit) for package: cman
--> Processing Dependency: perl(Net::Telnet) for package: cman
--> Processing Dependency: perl(XML::LibXML) for package: cman
--> Processing Dependency: pexpect for package: cman
--> Processing Dependency: openais for package: cman
--> Processing Dependency: libcpg.so.2(OPENAIS_CPG_1.0)(64bit) for package: cman
--> Processing Dependency: libSaCkpt.so.2()(64bit) for package: cman
--> Processing Dependency: libcpg.so.2()(64bit) for package: cman
--> Running transaction check
---> Package openais.x86_64 0:0.80.6-16.el5 set to be updated
---> Package perl-Net-Telnet.noarch 0:3.03-5 set to be updated
---> Package perl-XML-LibXML.x86_64 0:1.58-6 set to be updated
--> Processing Dependency: perl-XML-NamespaceSupport for package: perl-XML-LibXML
--> Processing Dependency: perl-XML-LibXML-Common for package: perl-XML-LibXML
--> Processing Dependency: perl(XML::SAX::Exception) for package: perl-XML-LibXML
--> Processing Dependency: perl(XML::LibXML::Common) for package: perl-XML-LibXML
--> Processing Dependency: perl-XML-SAX for package: perl-XML-LibXML
--> Processing Dependency: perl(XML::SAX::DocumentLocator) for package: perl-XML-LibXML
--> Processing Dependency: perl(XML::SAX::Base) for package: perl-XML-LibXML
--> Processing Dependency: perl(XML::NamespaceSupport) for package: perl-XML-LibXML
---> Package pexpect.noarch 0:2.3-3.el5 set to be updated
--> Running transaction check
---> Package perl-XML-LibXML-Common.x86_64 0:0.13-8.2.2 set to be updated
---> Package perl-XML-NamespaceSupport.noarch 0:1.09-1.2.1 set to be updated
---> Package perl-XML-SAX.noarch 0:0.14-8 set to be updated
--> Processing Conflict: corosync conflicts openais <= 0.89
--> Finished Dependency Resolution
corosync-1.2.1-1.el5.x86_64 from installed has depsolving problems
  --> corosync conflicts with openais
Error: corosync conflicts with openais


Vadym




Re: [Pacemaker] pacemaker and gnbd

2010-05-03 Thread Vadym Chepkov

On May 3, 2010, at 10:27 AM, Andrew Beekhof wrote:

 It is the case, the conflict is slightly different than you think.
 Corosync doesn't conflict with all versions of openais, just the one
 cman wants to use.
 
 You need to rebuild cman to use the newer version of openais.

Hmm, this is what I asked at the very beginning:

On Sat, May 1, 2010 at 3:30 PM, Vadym Chepkov vchep...@gmail.com wrote:
 Hi,
 
 I found out I can't use gnbd if I use the pacemaker rpm from the clusterlabs 
 repository, because gnbd depends on cman, which requires openais, which 
 conflicts with the corosync that pacemaker depends on.
 Is it just a matter of recompiling the cman rpm using corosync libraries instead 
 of openais? Or does something else need to be done?


Unfortunately, cman doesn't get compiled right away:

DEBUG: make[1]: Entering directory 
`/builddir/build/BUILD/cman-2.0.115/cman/daemon'
DEBUG: gcc -Wall  -fPIC -I//builddir/build/BUILD/cman-2.0.115/ccs/lib 
-I//usr/include -I../config -DCMAN_RELEASE_NAME=\"2.0.115\" 
-DOPENAIS_EXTERNAL_SERVICE -O2 -c -o daemon.o daemon.c
DEBUG: daemon.c:32:35: error: openais/totem/aispoll.h: No such file or directory
DEBUG: daemon.c:33:35: error: openais/totem/totemip.h: No such file or directory
DEBUG: In file included from daemon.c:37:
DEBUG: cnxman-private.h:17:33: error: openais/totem/totem.h: No such file or 
directory
DEBUG: In file included from daemon.c:42:
DEBUG: ais.h:25: error: array type has incomplete element type
DEBUG: ais.h:26: error: array type has incomplete element type
DEBUG: daemon.c:59: error: expected '=', ',', ';', 'asm' or '__attribute__' 
before 'ais_poll_handle'
DEBUG: daemon.c:62: error: expected ')' before 'handle'
DEBUG: daemon.c:63: error: expected ')' before 'handle'
DEBUG: daemon.c: In function 'send_reply_message':
DEBUG: daemon.c:89: warning: implicit declaration of function 'remove_client'
DEBUG: daemon.c:89: error: 'ais_poll_handle' undeclared (first use in this 
function)
DEBUG: daemon.c:89: error: (Each undeclared identifier is reported only once
DEBUG: daemon.c:89: error: for each function it appears in.)
DEBUG: daemon.c:108: warning: implicit declaration of function 
'poll_dispatch_modify'
DEBUG: daemon.c:108: error: 'process_client' undeclared (first use in this 
function)
DEBUG: daemon.c: At top level:
DEBUG: daemon.c:113: error: expected ')' before 'handle'
DEBUG: daemon.c: In function 'send_queued_reply':
DEBUG: daemon.c:168: error: 'ais_poll_handle' undeclared (first use in this 
function)
DEBUG: daemon.c:168: error: 'process_client' undeclared (first use in this 
function)
DEBUG: daemon.c: At top level:
DEBUG: daemon.c:173: error: expected ')' before 'handle'
DEBUG: daemon.c:323: error: expected ')' before 'handle'
DEBUG: daemon.c:354: error: expected declaration specifiers or '...' before 
'poll_handle'
DEBUG: daemon.c: In function 'open_local_sock':
DEBUG: daemon.c:402: warning: implicit declaration of function 
'poll_dispatch_add'
DEBUG: daemon.c:402: error: 'handle' undeclared (first use in this function)
DEBUG: daemon.c:402: error: 'process_rendezvous' undeclared (first use in this 
function)
DEBUG: daemon.c: At top level:
DEBUG: daemon.c:500: error: expected '=', ',', ';', 'asm' or '__attribute__' 
before 'aisexec_poll_handle'
DEBUG: daemon.c: In function 'cman_init':
DEBUG: daemon.c:506: error: 'ais_poll_handle' undeclared (first use in this 
function)
DEBUG: daemon.c:506: error: 'aisexec_poll_handle' undeclared (first use in this 
function)
DEBUG: daemon.c:512: error: too many arguments to function 'open_local_sock'
DEBUG: daemon.c:516: error: too many arguments to function 'open_local_sock'
DEBUG: make[1]: Leaving directory 
`/builddir/build/BUILD/cman-2.0.115/cman/daemon'
DEBUG: RPM build errors:
DEBUG: make[1]: *** [daemon.o] Error 1





Re: [Pacemaker] pacemaker and gnbd

2010-05-03 Thread Vadym Chepkov

On May 3, 2010, at 6:03 PM, Vadym Chepkov wrote:

 
 On May 3, 2010, at 5:39 PM, Andrew Beekhof wrote:
 
 
 perhaps try the srpm from F-12
 
 Would be nice, but the last one was in F-9, it seems:
 
 http://koji.fedoraproject.org/koji/packageinfo?packageID=182

Oh, I found out it's part of the cluster package now.
But it also doesn't compile :(

DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c: In function 
'create_lockspace_v5':
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1231: error: 
'DLM_LOCKSPACE_LEN' undeclared (first use in this function)
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1231: error: 
(Each undeclared identifier is reported only once
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1231: error: for 
each function it appears in.)
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1236: warning: 
left-hand operand of comma expression has no effect
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1231: warning: 
unused variable 'reqbuf'
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c: In function 
'create_lockspace_v6':
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1255: error: 
'DLM_LOCKSPACE_LEN' undeclared (first use in this function)
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1260: warning: 
left-hand operand of comma expression has no effect
DEBUG: /builddir/build/BUILD/cluster-3.0.7/dlm/libdlm/libdlm.c:1255: warning: 
unused variable 'reqbuf'
DEBUG: make[2]: make[2]: Leaving directory 
`/builddir/build/BUILD/cluster-3.0.7/dlm/libdlm'

Vadym


[Pacemaker] pacemaker and gnbd

2010-05-01 Thread Vadym Chepkov
Hi,

I found out I can't use gnbd if I use the pacemaker rpm from the clusterlabs 
repository, because gnbd depends on cman, which requires openais, which 
conflicts with the corosync that pacemaker depends on.
Is it just a matter of recompiling the cman rpm using corosync libraries instead 
of openais? Or does something else need to be done?

Thank you,
Vadym Chepkov


Re: [Pacemaker] OpenAIS priorities

2010-04-29 Thread Vadym Chepkov
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/node-score-equal.html
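
If the scores really are equal, a small stickiness value makes the outcome 
deterministic in practice: the resource simply stays on the node where it is 
already running, since stickiness is added to that node's score. A minimal 
sketch (the value 100 is illustrative only):

# crm configure rsc_defaults resource-stickiness=100

With this in place, once a resource has been placed somewhere, its current 
node effectively wins any future tie.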

On Apr 29, 2010, at 10:20 AM, Dan Frincu wrote:

 Greetings all,
 
 In the case of two servers in a cluster with OpenAIS, take the following 
 example:
 
 location Failover_Alert_1 Failover_Alert 100: abc.localdomain
 location Failover_Alert_2 Failover_Alert 200: def.localdomain
 
 This will set up the preference of a resource for def.localdomain because it 
 has the higher score assigned to it. But what happens when the scores 
 match? Is there a tiebreaker, some sort of election process, to choose which 
 node will be the one handling the resource?
 
 Thank you in advance,
 Best regards.
 
 -- 
 Dan FRINCU
 Internal Support Engineer
 
 
 




[Pacemaker] duality and equality

2010-04-10 Thread Vadym Chepkov
Hi,

I noticed there are quite a few configuration parameters in pacemaker that can 
be set two different ways: via cluster properties or rsc/op_defaults.
For example,
property default-resource-stickiness and rsc_defaults resource-stickiness,
property is-managed-default and rsc_defaults is-managed, property 
stop-all-resources and rsc_defaults target-role, property 
default-action-timeout and op_defaults timeout. I assume this duality exists 
for historical reasons, and in the computing world it is not unusual to achieve 
the same result in different ways. But in this case curious minds want to know: 
which parameter takes precedence if both are set and contradict each other?

I also noticed some differences in how these settings are assessed.

# crm configure show
node c20.chepkov.lan
node c21.chepkov.lan
primitive ip_rg0 ocf:heartbeat:IPaddr2 \
	params nic="eth0" ip="10.10.10.22" cidr_netmask="32"
primitive ping ocf:pacemaker:ping \
	params name="ping" dampen="5s" multiplier="200" host_list="10.10.10.250"
clone connected ping \
	meta globally-unique="false"
property $id="cib-bootstrap-options" \
	dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
	cluster-infrastructure="openais" \
	expected-quorum-votes="2" \
	no-quorum-policy="ignore" \
	stonith-enabled="false"

# crm configure verify
WARNING: ping: default-action-timeout 20s for start is smaller than the advised 
60
WARNING: ip_rg0: default-action-timeout 20s for start is smaller than the 
advised 90
WARNING: ip_rg0: default-action-timeout 20s for stop is smaller than the 
advised 100

# crm configure op_defaults timeout=120
WARNING: ping: default-action-timeout 20s for start is smaller than the advised 
60
WARNING: ip_rg0: default-action-timeout 20s for start is smaller than the 
advised 90
WARNING: ip_rg0: default-action-timeout 20s for stop is smaller than the 
advised 100

But,

# crm configure property default-action-timeout=120

makes it happy.
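
For what it's worth, the two settings land in different sections of the CIB, 
which is visible in the XML (a hand-written sketch; element names are as in the 
Pacemaker 1.0 schema, the ids are made up):

  <crm_config>
    <cluster_property_set id="cib-bootstrap-options">
      <nvpair id="opt-timeout" name="default-action-timeout" value="120"/>
    </cluster_property_set>
  </crm_config>
  <op_defaults>
    <meta_attributes id="op-options">
      <nvpair id="op-timeout" name="timeout" value="120"/>
    </meta_attributes>
  </op_defaults>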

And this makes me wonder: are these parameters really the same, or do they have 
different meanings? Thank you.

Sincerely yours,
  Vadym Chepkov


