Re: [Pacemaker] unknown third node added to a 2 node cluster?

2014-10-22 Thread Brian J. Murrell (brian)
On Mon, 2014-10-13 at 12:51 +1100, Andrew Beekhof wrote: Even the same address can be a problem. That brief window where things were getting renewed can screw up corosync. But as I proved, there was no renewal at all during the period of this entire pacemaker run, so the use of DHCP here is

Re: [Pacemaker] unknown third node added to a 2 node cluster?

2014-10-10 Thread Brian J. Murrell (brian)
On Wed, 2014-10-08 at 12:39 +1100, Andrew Beekhof wrote: On 8 Oct 2014, at 2:09 am, Brian J. Murrell (brian) brian-squohqy54cvwr29bmmi...@public.gmane.org wrote: Given a 2 node pacemaker-1.1.10-14.el6_5.3 cluster with nodes node5 and node6 I saw an unknown third node being added

[Pacemaker] unknown third node added to a 2 node cluster?

2014-10-07 Thread Brian J. Murrell (brian)
Given a 2 node pacemaker-1.1.10-14.el6_5.3 cluster with nodes node5 and node6 I saw an unknown third node being added to the cluster, but only on node5: Sep 18 22:52:16 node5 corosync[17321]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 12: memb=2, new=0, lost=0 Sep

[Pacemaker] another node rebooting too quickly bug?

2014-04-24 Thread Brian J. Murrell
Hi, As was previously discussed there is a bug in the handling of a STONITH if a node reboots too quickly. I had a different kind of failure that I suspect is the same kind of problem, just different symptom. The situation is a two node cluster with two resources plus a fencing resource. Each

Re: [Pacemaker] Node stuck in pending state

2014-04-10 Thread Brian J. Murrell
On Thu, 2014-04-10 at 10:04 +1000, Andrew Beekhof wrote: Brian: the detective work above is highly appreciated NP. I feel like I am getting better at reading these logs and can provide some more detailed dissection of them. And am happy to do so to help get to the bottom of things. :-)

Re: [Pacemaker] Node stuck in pending state

2014-04-09 Thread Brian J. Murrell
On Tue, 2014-04-08 at 17:29 -0400, Digimer wrote: Looks like your fencing (stonith) failed. Where? If I'm reading the logs correctly, it looks like stonith worked. Here's the stonith: Apr 8 09:53:21 lotus-4vm6 stonith-ng[2492]: notice: log_operation: Operation 'reboot' [3306] (call 2 from

Re: [Pacemaker] error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again

2014-02-06 Thread Brian J. Murrell (brian)
On Wed, 2014-01-08 at 13:30 +1100, Andrew Beekhof wrote: What version of pacemaker? Most recently I have been seeing this in 1.1.10 as shipped by RHEL6.5. On 10 Dec 2013, at 4:40 am, Brian J. Murrell brian-squohqy54cvwr29bmmi...@public.gmane.org wrote: I didn't seem to get a response

Re: [Pacemaker] error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again

2014-02-06 Thread Brian J. Murrell (brian)
On Thu, 2014-02-06 at 10:42 -0500, Brian J. Murrell (brian) wrote: On Wed, 2014-01-08 at 13:30 +1100, Andrew Beekhof wrote: What version of pacemaker? Most recently I have been seeing this in 1.1.10 as shipped by RHEL6.5. Doh! Somebody did a test run that had not been updated to use

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-15 Thread Brian J. Murrell (brian)
On Wed, 2014-01-15 at 17:11 +1100, Andrew Beekhof wrote: Consider any long running action, such as starting a database. We do not update the CIB until after actions have completed, so there can and will be times when the status section is out of date to one degree or another. But that is

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-15 Thread Brian J. Murrell (brian)
On Thu, 2014-01-16 at 08:35 +1100, Andrew Beekhof wrote: I know, I was giving you another example of when the cib is not completely up-to-date with reality. Yeah, I understood that. I was just countering with why that example is actually more acceptable. It may very well be partially

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-14 Thread Brian J. Murrell (brian)
On Tue, 2014-01-14 at 16:01 +1100, Andrew Beekhof wrote: On Tue, 2014-01-14 at 08:09 +1100, Andrew Beekhof wrote: The local cib hasn't caught up yet by the looks of it. I should have asked in my previous message: is this entirely an artifact of having just restarted or are there any

[Pacemaker] crm_resource -L not trustable right after restart

2014-01-13 Thread Brian J. Murrell (brian)
Hi, I found a situation using pacemaker 1.1.10 on RHEL6.5 where the output of crm_resource -L is not trustable shortly after a node is booted. Here is the output from crm_resource -L on one of the nodes in a two node cluster (the one that was not rebooted): st-fencing

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-13 Thread Brian J. Murrell (brian)
On Tue, 2014-01-14 at 08:09 +1100, Andrew Beekhof wrote: The local cib hasn't caught up yet by the looks of it. Should crm_resource actually be [mis-]reporting as if it were knowledgeable when it's not though? IOW is this expected behaviour or should it be considered a bug? Should I open a

Re: [Pacemaker] does adding a second ring actually work with cman?

2013-12-17 Thread Brian J. Murrell
On Tue, 2013-12-17 at 16:33 +0100, Florian Crouzat wrote: Is it possible that lotus-5vm8 (from DNS) and lotus-5vm8-ring1 (from /etc/hosts) resolves to the same IP (10.128.0.206) which could maybe confuse cman and make it decide that there is only one ring ? No, they do resolve to two

[Pacemaker] cman, ccs: Validation Failure, unable to modify configuration file

2013-12-16 Thread Brian J. Murrell
So, trying to create a cluster on a given node with ccs: # ccs -p xxx -h $(hostname) --createcluster foo2 Validation Failure, unable to modify configuration file (use -i to ignore this error). But there shouldn't be any configuration here yet. I've not done anything with this node: # ccs -p

[Pacemaker] does adding a second ring actually work with cman?

2013-12-16 Thread Brian J. Murrell
So, I was reading: https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Cluster_Administration/s2-rrp-ccs-CA.html about adding a second ring to one's CMAN configuration. I tried to add a second ring to my configuration without success. Given the example: # ccs -h
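For context, the RRP recipe in that document boils down to registering an alternate name for each node via ccs; a rough sketch, assuming this ccs version provides --addalt (node names and the -ring1 alternate hostnames are placeholders):

    ccs -h node1 --addalt node1 node1-ring1
    ccs -h node1 --addalt node2 node2-ring1
    # the alternate names must resolve to the ring-1 addresses, e.g. via /etc/hosts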

Re: [Pacemaker] is ccs as racy as it feels?

2013-12-10 Thread Brian J. Murrell
On Tue, 2013-12-10 at 10:27 +, Christine Caulfield wrote: Sadly you're not wrong. That's what I was afraid of. But it's actually no worse than updating corosync.conf manually, I think it is... in fact it's pretty much the same thing, Not really. Updating corosync.conf on any

Re: [Pacemaker] error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again

2013-12-09 Thread Brian J. Murrell
On Mon, 2013-12-09 at 09:28 +0100, Jan Friesse wrote: Error 6 error means try again. This is happening ether if corosync is overloaded or creating new membership. Please take a look to /var/log/cluster/corosync.log if you see something strange there (+ make sure you have newest corosync).

[Pacemaker] is ccs as racy as it feels?

2013-12-09 Thread Brian J. Murrell
So, I'm trying to wrap my head around this need to migrate to pacemaker +CMAN. I've been looking at http://clusterlabs.org/quickstart-redhat.html and https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Cluster_Administration/ It seems ccs is the tool to configure
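For reference, the sort of ccs sequence the quickstart walks through looks roughly like this local-file (-f) variant (cluster and node names are placeholders); each invocation rewrites cluster.conf, which is what prompted the raciness question:

    ccs -f /etc/cluster/cluster.conf --createcluster mycluster
    ccs -f /etc/cluster/cluster.conf --addnode node1
    ccs -f /etc/cluster/cluster.conf --addnode node2
    ccs -f /etc/cluster/cluster.conf --addfencedev pcmk agent=fence_pcmk
    ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect node1
    ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk node1 pcmk-redirect port=node1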

[Pacemaker] prevent starting resources on failed node

2013-12-06 Thread Brian J. Murrell (brian)
[ Hopefully this doesn't cause a duplicate post but my first attempt returned an error. ] Using pacemaker 1.1.10 (but I think this issue is more general than that release), I want to enforce a policy that once a node fails, no resources can be started/run on it until the user permits it. I have

[Pacemaker] error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again

2013-12-06 Thread Brian J. Murrell (brian)
I seem to have another instance where pacemaker fails to exit at the end of a shutdown. Here's the log from the start of the service pacemaker stop: Dec 3 13:00:39 wtm-60vm8 crmd[14076]: notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS

Re: [Pacemaker] catch-22: can't fence node A because node A has the fencing resource

2013-12-03 Thread Brian J. Murrell
On Tue, 2013-12-03 at 18:26 -0500, David Vossel wrote: We did away with all of the policy engine logic involved with trying to move fencing devices off of the target node before executing the fencing action. Behind the scenes all fencing devices are now essentially clones. If the

[Pacemaker] catch-22: can't fence node A because node A has the fencing resource

2013-12-02 Thread Brian J. Murrell
So, I'm migrating my working pacemaker configuration from 1.1.7 to 1.1.10 and am finding what appears to be a new behavior in 1.1.10. If a given node is running a fencing resource and that node goes AWOL, it needs to be fenced (of course). But any other node trying to take over the fencing

Re: [Pacemaker] Best way to notify stonith action

2013-07-08 Thread Brian J. Murrell
On 13-07-08 03:48 AM, Andreas Mock wrote: Hi all, I'm just wondering what the best way is to let an admin know that the cluster (rest of a cluster) has stonithed some other nodes? You could modify or even just wrap the stonith agent. They are usually just python or shell script anyway
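A minimal sketch of such a wrapper, assuming an agent that takes its parameters on stdin as the RHCS fence agents do (the agent path and mail address are placeholders):

    #!/bin/sh
    # capture the parameters the cluster feeds the agent on stdin,
    # notify the admin, then hand them to the real agent unchanged
    params=$(cat)
    printf '%s\n' "$params" | mail -s "fencing action on $(hostname)" admin@example.com
    printf '%s\n' "$params" | /usr/sbin/fence_ipmilan
    exit $?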

Re: [Pacemaker] error: do_exit: Could not recover from internal error

2013-05-23 Thread Brian J. Murrell
On 13-05-22 07:05 PM, Andrew Beekhof wrote: Also, 1.1.8-7 was not tested with the plugin _at_all_ (and neither will future RHEL builds). Was 1.1.7-* in EL 6.3 tested with the plugin? Is staying with most recent EL 6.3 pacemaker-1.1.7 release really the more stable option for people not

[Pacemaker] error: do_exit: Could not recover from internal error

2013-05-22 Thread Brian J. Murrell
Using pacemaker 1.1.8-7 on EL6, I got the following series of events trying to shut down pacemaker and then corosync. The corosync shutdown (service corosync stop) ended up spinning/hanging indefinitely (~7hrs now). The events, including a: May 21 23:47:18 node1 crmd[17598]: error: do_exit:

[Pacemaker] stonith-ng: error: remote_op_done: Operation reboot of node2 by node1 for stonith_admin: Timer expired

2013-05-16 Thread Brian J. Murrell
Using Pacemaker 1.1.8 on EL6.4 with the pacemaker plugin, I'm finding strange behavior with stonith_admin -B node2. It seems to shut the node down but not start it back up and ends up reporting a timer expired: # stonith_admin -B node2 Command failed: Timer expired The pacemaker log for the

Re: [Pacemaker] resource starts but then fails right away

2013-05-10 Thread Brian J. Murrell
On 13-05-09 09:53 PM, Andrew Beekhof wrote: May 7 02:36:16 node1 crmd[16836]: info: delete_resource: Removing resource testfs-resource1 for 18002_crm_resource (internal) on node1 May 7 02:36:16 node1 lrmd: [16833]: info: flush_op: process for operation monitor[8] on

[Pacemaker] resource starts but then fails right away

2013-05-09 Thread Brian J. Murrell
Using Pacemaker 1.1.7 on EL6.3, I'm getting an intermittent recurrence of a situation where I add a resource and start it and it seems to start but then right away fail. i.e. # clean up resource before trying to start, just to make sure we start with a clean slate # crm resource cleanup

[Pacemaker] will a stonith resource be moved from an AWOL node?

2013-04-30 Thread Brian J. Murrell
I'm using pacemaker 1.1.8 and I don't see stonith resources moving away from AWOL hosts like I thought I did with 1.1.7. So I guess the first thing to do is clear up what is supposed to happen. If I have a single stonith resource for a cluster and it's running on node A and then node A goes

Re: [Pacemaker] will a stonith resource be moved from an AWOL node?

2013-04-30 Thread Brian J. Murrell
On 13-04-30 11:13 AM, Lars Marowsky-Bree wrote: Pacemaker 1.1.8's stonith/fencing subsystem directly ties into the CIB, and will complete the fencing request even if the fencing/stonith resource is not instantiated on the node yet. But clearly that's not happening here. (There's a bug in

[Pacemaker] warning: unpack_rsc_op: Processing failed op monitor for my_resource on node1: unknown error (1)

2013-04-30 Thread Brian J. Murrell
Using 1.1.8 on EL6.4, I am seeing this sort of thing: pengine[1590]: warning: unpack_rsc_op: Processing failed op monitor for my_resource on node1: unknown error (1) The full log from the point of adding the resource until the errors: Apr 30 11:46:30 node1 cibadmin[3380]: notice:

Re: [Pacemaker] why so long to stonith?

2013-04-24 Thread Brian J. Murrell
On 13-04-24 01:16 AM, Andrew Beekhof wrote: Almost certainly you are hitting: https://bugzilla.redhat.com/show_bug.cgi?id=951340 Yup. The patch posted there fixed it. I am doing my best to convince people that make decisions that this is worthy of an update before 6.5. I've added

[Pacemaker] why so long to stonith?

2013-04-23 Thread Brian J. Murrell
Using pacemaker 1.1.8 on RHEL 6.4, I did a test where I just killed (-KILL) corosync on a peer node. Pacemaker seemed to take a long time to transition to stonithing it though after noticing it was AWOL: Apr 23 19:05:20 node2 corosync[1324]: [TOTEM ] A processor failed, forming new

[Pacemaker] crm_attribute not returning node attribute

2013-04-19 Thread Brian J. Murrell
Given: host1# crm node attribute host1 show foo scope=nodes name=foo value=bar Why doesn't this return anything: host1# crm_attribute --node host1 --name foo --query host1# echo $? 0 cibadmin -Q confirms the presence of the attribute: node id=host1 uname=host1
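One thing worth trying (a guess, not a confirmed fix) is naming the lifetime explicitly so the query is forced to the nodes section rather than the transient status section:

    host1# crm_attribute --node host1 --name foo --lifetime forever --query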

Re: [Pacemaker] racing crm commands... last write wins?

2013-04-12 Thread Brian J. Murrell
On 13-04-11 06:00 PM, Andrew Beekhof wrote: Actually, I think the semantics of -C are first-write-wins and any subsequent attempts fail with -EEXIST Indeed, you are correct. I think my point though was that it didn't matter in my case which writer wins since they should all be trying to

Re: [Pacemaker] racing crm commands... last write wins?

2013-04-12 Thread Brian J. Murrell
On 13-04-10 07:02 PM, Andrew Beekhof wrote: On 11/04/2013, at 6:33 AM, Brian J. Murrell brian-squohqy54cvwr29bmmi...@public.gmane.org wrote: Does crm_resource suffer from this problem no Excellent. I was unable to find any comprehensive documentation on just how to implement

Re: [Pacemaker] racing crm commands... last write wins?

2013-04-11 Thread Brian J. Murrell
On 13-04-10 04:33 PM, Brian J. Murrell wrote: Does crm_resource suffer from this problem or does it properly only send exactly the update to the CIB for the operation it's trying to achieve? In exploring all options, how about pcs? Does pcs' resource create ... for example have the same read

Re: [Pacemaker] racing crm commands... last write wins?

2013-04-11 Thread Brian J. Murrell
On 13-04-11 07:37 AM, Brian J. Murrell wrote: In exploring all options, how about pcs? Does pcs' resource create ... for example have the same read+modify+replace problem as crm configure or does pcs resource create also only send proper fragments to update just the part of the CIB it's

Re: [Pacemaker] racing crm commands... last write wins?

2013-04-10 Thread Brian J. Murrell
On 13-02-21 07:48 PM, Andrew Beekhof wrote: On Fri, Feb 22, 2013 at 5:18 AM, Brian J. Murrell brian-squohqy54cvwr29bmmi...@public.gmane.org wrote: I wonder what happens in the case of two racing crm commands that want to update the CIB (with non-overlapping/conflicting data). Is there any

Re: [Pacemaker] stonith and avoiding split brain in two nodes cluster

2013-03-28 Thread Brian J. Murrell
On 13-03-25 03:50 PM, Jacek Konieczny wrote: The first node to notice that the other is unreachable will fence (kill) the other, making sure it is the only one operating on the shared data. Right. But with typical two-node clusters ignoring no-quorum, because quorum is being ignored, as soon

Re: [Pacemaker] racing crm commands... last write wins?

2013-02-25 Thread Brian J. Murrell
On 13-02-25 10:30 AM, Dejan Muhamedagic wrote: Before doing replace, crmsh queries the CIB and checks if the epoch was modified in the meantime. But doesn't take out a lock of any sort to prevent an update in the meanwhile, right? Those operations are not atomic, though. Indeed. Perhaps

[Pacemaker] a situation where pacemaker refuses to stop

2013-02-23 Thread Brian J. Murrell
I seem to have found a situation where pacemaker (pacemaker-1.1.7-6.el6.x86_64) refuses to stop (i.e. service pacemaker stop) on EL6. The status of the 2 node cluster was that the node being asked to stop (node2) was continually trying to stonith another node (node1) in the cluster which was not

[Pacemaker] racing crm commands... last write wins?

2013-02-21 Thread Brian J. Murrell
I wonder what happens in the case of two racing crm commands that want to update the CIB (with non-overlapping/conflicting data). Is there any locking to ensure that one crm cannot overwrite the other's change? (i.e. second one to get there has to read in the new CIB before being able to apply

[Pacemaker] return properties and rsc_defaults back to default values

2013-02-14 Thread Brian J. Murrell
Is there a way to return an individual property (or all properties) and/or a rsc_default (or all) back to default values, using crm, or otherwise? Cheers, b.
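One approach is simply to delete the explicit setting so the built-in default applies again; a sketch using crm_attribute (the property names here are illustrative, and rsc_defaults as a --type is assumed to be supported by this version):

    # drop an explicitly-set cluster property
    crm_attribute --type crm_config --name no-quorum-policy --delete
    # drop a resource default the same way
    crm_attribute --type rsc_defaults --name resource-stickiness --delete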

[Pacemaker] location constraint anywhere on asymmetric cluster

2013-01-30 Thread Brian J. Murrell
I'm experimenting with asymmetric clusters and resource location constraints. My cluster has some resources which have to be restricted to certain nodes and other resources which can run on any node. Given that, an opt-in cluster seems the most manageable. That is, it seems easier to create

Re: [Pacemaker] best/proper way to shut down a node for service

2013-01-23 Thread Brian J. Murrell
On 13-01-23 03:32 AM, Dan Frincu wrote: Hi, Hi, I usually put the node in standby, which means it can no longer run any resources on it. Both Pacemaker and Corosync continue to run, node provides quorum. But a node in standby will still be STONITHed if it goes AWOL. I put a node in standby

[Pacemaker] best/proper way to shut down a node for service

2013-01-22 Thread Brian J. Murrell
OK. So you have a corosync cluster of nodes with pacemaker managing resources on them, including (of course) STONITH. What's the best/proper way to shut down a node, say, for maintenance such that pacemaker doesn't go trying to fix that situation and STONITHing it to try to bring it back up,
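The standby approach discussed in the replies looks roughly like this sketch (node1 is a placeholder, and the service names assume a cman/pacemaker stack):

    crm node standby node1        # resources migrate away, node still counts for quorum
    service pacemaker stop
    service cman stop             # or corosync, depending on the stack
    # ...do the maintenance, then bring it back:
    service cman start && service pacemaker start
    crm node online node1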

Re: [Pacemaker] Call cib_query failed (-41): Remote node did not respond

2012-07-04 Thread Brian J. Murrell
On 12-07-04 02:12 AM, Andrew Beekhof wrote: On Wed, Jul 4, 2012 at 10:06 AM, Brian J. Murrell brian-squohqy54cvwr29bmmi...@public.gmane.org wrote: Just because I reduced the number of nodes doesn't mean that I reduced the parallelism any. Yes. You did. You reduced the number of check

Re: [Pacemaker] Call cib_query failed (-41): Remote node did not respond

2012-07-04 Thread Brian J. Murrell
On 12-07-04 04:27 AM, Andreas Kurz wrote: beside increasing the batch limit to a higher value ... did you also tune corosync totem timings? Not yet. But a closer look at the logs reveals a bunch of these: Jun 28 14:56:56 node-2 corosync[30497]: [pcmk ] ERROR: send_cluster_msg_raw: Child

Re: [Pacemaker] Call cib_query failed (-41): Remote node did not respond

2012-07-03 Thread Brian J. Murrell
On 12-06-27 11:30 PM, Andrew Beekhof wrote: The updates from you aren't the problem. Its the number of resource operations (that need to be stored in the CIB) that result from your changes that might be causing the problem. Just to follow this up for anyone currently following or anyone

Re: [Pacemaker] Call cib_query failed (-41): Remote node did not respond

2012-07-03 Thread Brian J. Murrell
On 12-07-03 06:17 PM, Andrew Beekhof wrote: Even adding passive nodes multiplies the number of probe operations that need to be performed and loaded into the cib. So it seems. I just would not have thought they'd be such a load since, from a simplistic perspective, they are not trying to

Re: [Pacemaker] Call cib_query failed (-41): Remote node did not respond

2012-07-03 Thread Brian J. Murrell
On 12-07-03 04:26 PM, David Vossel wrote: This is not a definite. Perhaps you are experiencing this given the pacemaker version you are running Yes, that is absolutely possible and it certainly has been under consideration throughout this process. I did also recognize however, that I am

[Pacemaker] Call cib_query failed (-41): Remote node did not respond

2012-06-26 Thread Brian J. Murrell
So, I have an 18 node cluster here (so a small haystack, indeed, but still a haystack in which to try to find a needle) where a certain set of (yet unknown, figuring that out is part of this process) operations are pooching pacemaker. The symptom is that on one or more nodes I get the following

[Pacemaker] manually failing back resources when set sticky

2012-03-30 Thread Brian J. Murrell
In my cluster configuration, each resource can be run on one of two node and I designate a primary and a secondary using location constraints such as: location FOO-primary FOO 20: bar1 location FOO-secondary FOO 10: bar2 And I also set a default stickiness to prevent auto-fail-back (i.e. to
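Putting the pieces from the post together, the configuration and a manual fail-back might look like this sketch (FOO/bar1/bar2 are the post's placeholders, and 1000 is an arbitrary stickiness value):

    location FOO-primary FOO 20: bar1
    location FOO-secondary FOO 10: bar2
    rsc_defaults resource-stickiness=1000
    # fail back by hand without touching the global stickiness:
    crm resource move FOO bar1
    crm resource unmove FOO      # remove the temporary constraint once it has settled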

Re: [Pacemaker] manually failing back resources when set sticky

2012-03-30 Thread Brian J. Murrell
On 12-03-30 02:35 PM, Florian Haas wrote: crm configure rsc_defaults resource-stickiness=0 ... and then when resources have moved back, set it to 1000 again. It's really that simple. :) That sounds racy. I am changing a parameter which has the potential to affect the stickiness of all

[Pacemaker] resources show as running on all nodes right after adding them

2012-03-28 Thread Brian J. Murrell
We seem to have occasion where we find crm_resource reporting that a resource is running on more nodes (usually all of them!) than it should be when we query right after adding it: # crm_resource --resource chalkfs-OST_3 --locate resource chalkfs-OST_3 is running on: chalk02 resource chalkfs-OST_3 is running

Re: [Pacemaker] resources show as running on all nodes right after adding them

2012-03-28 Thread Brian J. Murrell
On 12-03-28 10:39 AM, Florian Haas wrote: Probably because your resource agent reports OCF_SUCCESS on a probe operation To be clear, is this the status $OP in the agent? Cheers, b.
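For illustration, the monitor/status handler of an OCF agent is expected to answer a probe (monitor with interval 0) honestly; a sketch of the relevant branch with a hypothetical process check, assuming the usual ocf-shellfuncs are sourced for $OCF_SUCCESS and $OCF_NOT_RUNNING:

    my_resource_monitor() {
        # a probe on a node where the service has never run must not claim success
        if pgrep -f my_daemon >/dev/null 2>&1; then
            return $OCF_SUCCESS
        fi
        return $OCF_NOT_RUNNING
    }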

[Pacemaker] running a resource on any node in an asymmetric cluster

2011-10-26 Thread Brian J. Murrell
I want to be able to run a resource on any node in an asymmetric cluster so I tried creating a rule to run it on any node not named foo since there are no nodes named foo in my cluster: # cat /tmp/foo.xml rsc_location id=run-bar-anywhere rsc=bar rule id=run-bar-anywhere-rule score=100

Re: [Pacemaker] running a resource on any node in an asymmetric cluster

2011-10-26 Thread Brian J. Murrell
On 11-10-26 10:19 AM, Brian J. Murrell wrote: # cat /tmp/foo.xml rsc_location id=run-bar-anywhere rsc=bar rule id=run-bar-anywhere-rule score=100 -- I figured it out: the score integer has to be quoted. I'm thinking too much like a programmer
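With the score quoted, the constraint can be loaded as below; the expression line is a reconstruction of what "any node not named foo" would look like in CIB XML, not the poster's exact file:

    cat > /tmp/foo.xml <<'EOF'
    <rsc_location id="run-bar-anywhere" rsc="bar">
      <rule id="run-bar-anywhere-rule" score="100">
        <expression id="run-bar-anywhere-expr" attribute="#uname" operation="ne" value="foo"/>
      </rule>
    </rsc_location>
    EOF
    cibadmin -o constraints -C -x /tmp/foo.xml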

[Pacemaker] cloning primatives with differing params

2011-10-25 Thread Brian J. Murrell
I want to create a stonith primitive and clone it for each node in my cluster. I'm using the fence-agents virsh agent as my stonith primitive. Currently for a single node it looks like: primitive st-pm-node1 stonith:fence_virsh \ params ipaddr=192.168.122.1 login=xxx passwd=xxx
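Since clone instances share the primitive's parameters, one common workaround is one stonith primitive per node; a sketch in crm shell syntax (the port and pcmk_host_list values are assumptions about what differs per node):

    primitive st-pm-node1 stonith:fence_virsh \
        params ipaddr=192.168.122.1 login=xxx passwd=xxx port=node1 pcmk_host_list=node1
    primitive st-pm-node2 stonith:fence_virsh \
        params ipaddr=192.168.122.1 login=xxx passwd=xxx port=node2 pcmk_host_list=node2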

[Pacemaker] stonith configured but not happening

2011-10-18 Thread Brian J. Murrell
I have a pacemaker 1.0.10 installation on rhel5 but I can't seem to manage to get a working stonith configuration. I have tested my stonith device manually using the stonith command and it works fine. What doesn't seem to be happening is pacemaker/stonithd actually asking for a stonith. In my

Re: [Pacemaker] stonith configured but not happening

2011-10-18 Thread Brian J. Murrell
On 11-10-18 09:40 AM, Andreas Kurz wrote: Hello, Hi, I'd expect this to be the problem ... if you insist on using an unsymmetric cluster you must add a location score for each resource you want to be up on a node ... so add a location constraint for the fencing clone for each node ... or

[Pacemaker] concurrent uses of cibadmin: Signon to CIB failed: connection failed

2011-09-29 Thread Brian J. Murrell
So, in another thread there was a discussion of using cibadmin to mitigate possible concurrency issue of crm shell. I have written a test program to test that theory and unfortunately cibadmin falls down in the face of heavy concurrency also with errors such as: Signon to CIB failed: connection
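A rough sketch of the kind of retry wrapper one might put around cibadmin to tolerate the transient sign-on failures under heavy concurrency (resource.xml is a placeholder, and this is a workaround, not a fix):

    for i in 1 2 3 4 5; do
        cibadmin -o resources -C -x resource.xml && break
        sleep $((i * 2))      # back off and retry while the cib is busy
    done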

Re: [Pacemaker] Concurrent runs of 'crm configure primitive' interfering

2011-09-28 Thread Brian J. Murrell
On 11-09-28 10:20 AM, Dejan Muhamedagic wrote: Hi, Hi, I'm really not sure. Need to investigate this area more. Well, I am experimenting with cibadmin. It's certainly not as nice and shiny as crm shell though. :-) cibadmin talks to the cib (the process) and cib should allow only one

Re: [Pacemaker] Call cib_modify failed (-22): The object/attribute does not exist

2011-09-26 Thread Brian J. Murrell
On 11-09-26 03:44 AM, Tim Serong wrote: Because: 1) You need to run cibadmin -o resources -C -x test.xml to create the resource (-C creates, -U updates an existing resource). That's what I thought/wondered but the EXAMPLES section in the manpage is quite clear that it's asking one to

Re: [Pacemaker] Call cib_modify failed (-22): The object/attribute does not exist

2011-09-26 Thread Brian J. Murrell
On 11-09-25 09:21 PM, Andrew Beekhof wrote: As the error says, the resource R_10.10.10.101 doesn't exist yet. Put it in a resources tag or use -C instead of -U Thanks much. I already replied to Tim, but the summary is that the manpage is incorrect in two places. One is specifying the

[Pacemaker] Call cib_modify failed (-22): The object/attribute does not exist

2011-09-24 Thread Brian J. Murrell
Using pacemaker-1.0.10-1.4.el5 I am trying to add the R_10.10.10.101 IPaddr2 example resource: primitive id=R_10.10.10.101 class=ocf type=IPaddr2 provider=heartbeat instance_attributes id=RA_R_10.10.10.101 attributes nvpair id=R_ip_P_ip name=ip value=10.10.10.101/ nvpair id=R_ip_P_nic
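As the follow-ups point out, the fix is to wrap the primitive in a file of its own and create (-C) it under the resources section; a sketch of what the file and command might look like (the nic value is a placeholder since it is cut off above):

    cat > test.xml <<'EOF'
    <primitive id="R_10.10.10.101" class="ocf" provider="heartbeat" type="IPaddr2">
      <instance_attributes id="RA_R_10.10.10.101">
        <nvpair id="R_ip_P_ip" name="ip" value="10.10.10.101"/>
        <nvpair id="R_ip_P_nic" name="nic" value="eth0"/>
      </instance_attributes>
    </primitive>
    EOF
    cibadmin -o resources -C -x test.xml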

[Pacemaker] property default-resource-stickiness vs. rsc_defaults resource-stickiness

2011-08-25 Thread Brian J. Murrell
I've seen both the setting of a default-resource-stickiness property (i.e. http://www.howtoforge.com/installation-and-setup-guide-for-drbd-openais-pacemaker-xen-on-opensuse-11.1) and a rsc_defaults option with resource-stickiness