Re: [Pacemaker] command to dump cluster configuration in pcs format?

2014-01-16 Thread Andrew Beekhof
On 16 Jan 2014, at 10:59 pm, Lars Marowsky-Bree l...@suse.com wrote: On 2014-01-15T20:25:30, Bob Haxo bh...@sgi.com wrote: Unfortunately, it has taken me weeks to develop (what now seems to be) a working configuration (including mods to the VirtualDomain agent to avoid

Re: [Pacemaker] command to dump cluster configuration in pcs format?

2014-01-16 Thread Andrew Beekhof
On 17 Jan 2014, at 9:05 am, Lars Marowsky-Bree l...@suse.com wrote: On 2014-01-17T07:40:34, Andrew Beekhof and...@beekhof.net wrote: Well, unless RHT states that installing crmsh on top of their distribution invalidates support for the pacemaker back-end, you could just ship crmsh as part

Re: [Pacemaker] Question about new migration

2014-01-15 Thread Andrew Beekhof
On 15 Jan 2014, at 7:12 pm, Kazunori INOUE kazunori.ino...@gmail.com wrote: Hi David, With new migration logic, when VM was migrated by 'node standby', start was performed in migrate_target. (migrate_from was not performed.) Is this the designed behavior? # crm_mon -rf1 Stack:

Re: [Pacemaker] hangs pending

2014-01-15 Thread Andrew Beekhof
On 16 Jan 2014, at 12:41 am, Andrey Groshev gre...@yandex.ru wrote: 15.01.2014, 02:53, Andrew Beekhof and...@beekhof.net: On 15 Jan 2014, at 12:15 am, Andrey Groshev gre...@yandex.ru wrote: 14.01.2014, 10:00, Andrey Groshev gre...@yandex.ru: 14.01.2014, 07:47, Andrew Beekhof

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-15 Thread Andrew Beekhof
On 16 Jan 2014, at 6:53 am, Brian J. Murrell (brian) br...@interlinx.bc.ca wrote: On Wed, 2014-01-15 at 17:11 +1100, Andrew Beekhof wrote: Consider any long running action, such as starting a database. We do not update the CIB until after actions have completed, so there can

Re: [Pacemaker] command to dump cluster configuration in pcs format?

2014-01-15 Thread Andrew Beekhof
On 16 Jan 2014, at 11:49 am, Bob Haxo bh...@sgi.com wrote: On 01/15/2014 05:02 PM, Bob Haxo wrote: Greetings, The command crm configure show dumps the cluster configuration in a format that is suitable for use in configuring a cluster. The command pcs config generates nice human

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-15 Thread Andrew Beekhof
On 16 Jan 2014, at 1:13 pm, Brian J. Murrell (brian) br...@interlinx.bc.ca wrote: On Thu, 2014-01-16 at 08:35 +1100, Andrew Beekhof wrote: I know, I was giving you another example of when the cib is not completely up-to-date with reality. Yeah, I understood that. I was just countering

Re: [Pacemaker] command to dump cluster configuration in pcs format?

2014-01-15 Thread Andrew Beekhof
On 16 Jan 2014, at 3:25 pm, Bob Haxo bh...@sgi.com wrote: On Thu, 2014-01-16 at 12:32 +1100, Andrew Beekhof wrote: On 16 Jan 2014, at 11:49 am, Bob Haxo bh...@sgi.com wrote: On 01/15/2014 05:02 PM, Bob Haxo wrote: Greetings, The command crm configure show dumps the cluster

Re: [Pacemaker] Time to get ready for 1.1.11

2014-01-15 Thread Andrew Beekhof
resource manager pacemaker@oss.clusterlabs.org Sent: Tuesday, January 7, 2014 4:50:11 PM Subject: Re: [Pacemaker] Time to get ready for 1.1.11 - Original Message - From: Andrew Beekhof and...@beekhof.net To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org Sent

Re: [Pacemaker] [Linux-HA] Better way to change master in 3 node pgsql cluster

2014-01-14 Thread Andrew Beekhof
' No other way to move master? 2014/1/13 Andrew Beekhof and...@beekhof.net On 13 Jan 2014, at 8:32 pm, Andrey Rogovsky a.rogov...@gmail.com wrote: Hi, I have a 3 node postgresql cluster. It works well. But I have some trouble with changing master. For now, if I need to change master

Re: [Pacemaker] hangs pending

2014-01-14 Thread Andrew Beekhof
On 15 Jan 2014, at 12:15 am, Andrey Groshev gre...@yandex.ru wrote: 14.01.2014, 10:00, Andrey Groshev gre...@yandex.ru: 14.01.2014, 07:47, Andrew Beekhof and...@beekhof.net: Ok, here's what happens: 1. node2 is lost 2. fencing of node2 starts 3. node2 reboots (and cluster

Re: [Pacemaker] [Enhancement] Change of the globally-unique attribute of the resource.

2014-01-14 Thread Andrew Beekhof
On 14 Jan 2014, at 7:26 pm, renayama19661...@ybb.ne.jp wrote: Hi All, When a user changes the globally-unique attribute of the resource, a problem occurs. When it manages the resource with PID file, this occurs, but this is because PID file name changes by globally-unique attribute.

Re: [Pacemaker] [Enhancement] Change of the globally-unique attribute of the resource.

2014-01-14 Thread Andrew Beekhof
of the globally-unique attribute. I'd have expected the stop action to be performed with the old attributes. crm_report tarball? Okay. I register this topic with Bugzilla. I attach the log to Bugzilla. Best Regards, Hideo Yamauchi. --- On Wed, 2014/1/15, Andrew Beekhof

Re: [Pacemaker] Consider extra slave node resource when calculating actions for failover

2014-01-14 Thread Andrew Beekhof
On 14 Jan 2014, at 11:25 pm, Juraj Fabo juraj.f...@gmail.com wrote: Hi I have master-slave cluster with configuration attached below. It is based on documented postgresql master-slave cluster configuration. Colocation constraints should work that way that if some of master-group

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-14 Thread Andrew Beekhof
On 14 Jan 2014, at 11:50 pm, Brian J. Murrell (brian) br...@interlinx.bc.ca wrote: On Tue, 2014-01-14 at 16:01 +1100, Andrew Beekhof wrote: On Tue, 2014-01-14 at 08:09 +1100, Andrew Beekhof wrote: The local cib hasn't caught up yet by the looks of it. I should have asked in my

Re: [Pacemaker] hangs pending

2014-01-13 Thread Andrew Beekhof
On 13 Jan 2014, at 8:31 pm, Andrey Groshev gre...@yandex.ru wrote: 13.01.2014, 02:51, Andrew Beekhof and...@beekhof.net: On 10 Jan 2014, at 9:55 pm, Andrey Groshev gre...@yandex.ru wrote: 10.01.2014, 14:31, Andrey Groshev gre...@yandex.ru: 10.01.2014, 14:01, Andrew Beekhof

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-13 Thread Andrew Beekhof
On 14 Jan 2014, at 5:13 am, Brian J. Murrell (brian) br...@interlinx.bc.ca wrote: Hi, I found a situation using pacemaker 1.1.10 on RHEL6.5 where the output of crm_resource -L is not trustable, shortly after a node is booted. Here is the output from crm_resource -L on one of the nodes

Re: [Pacemaker] [Linux-HA] Better way to change master in 3 node pgsql cluster

2014-01-13 Thread Andrew Beekhof
On 13 Jan 2014, at 8:32 pm, Andrey Rogovsky a.rogov...@gmail.com wrote: Hi, I have a 3 node postgresql cluster. It works well. But I have some trouble with changing master. For now, if I need to change master, I must: 1) Stop PGSQL on each node and cluster service 2) Start Setup new manual

Re: [Pacemaker] Location / Colocation constraints issue

2014-01-13 Thread Andrew Beekhof
On 19 Dec 2013, at 1:08 am, Gaëtan Slongo gslo...@it-optics.com wrote: Hi! I'm currently building a 2 node cluster for firewalling. I would like to run shorewall on both the master and the slave node. I tried many things but nothing works as expected. Shorewall configurations are

Re: [Pacemaker] hangs pending

2014-01-13 Thread Andrew Beekhof
' returned: -62 (Timer expired) On 14 Jan 2014, at 7:18 am, Andrew Beekhof and...@beekhof.net wrote: On 13 Jan 2014, at 8:31 pm, Andrey Groshev gre...@yandex.ru wrote: 13.01.2014, 02:51, Andrew Beekhof and...@beekhof.net: On 10 Jan 2014, at 9:55 pm, Andrey Groshev gre...@yandex.ru wrote
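
The reboot operation in the stonith-ng log above returned -62 (Timer expired), and the replies in this thread point out that the fencing timeout needs to be bigger. The text "Timer expired" is Linux's strerror for ETIME (62), so the code appears to follow the negative-errno convention. A minimal Python sketch, assuming only the "returned: <rc>" fragment shown in the log, that decodes such codes with the standard errno module (errno numbers and names are platform specific):

```python
import errno
import os
import re

# Matches the "returned: -62" part of a stonith-ng log line.
RC_PATTERN = re.compile(r"returned:\s*(-?\d+)")


def decode_rc(log_line):
    """Return (rc, errno name, strerror text) for the first rc found, or None."""
    match = RC_PATTERN.search(log_line)
    if match is None:
        return None
    rc = int(match.group(1))
    code = abs(rc)
    # Assumption: negative codes are -errno; the mapping below is platform specific.
    name = errno.errorcode.get(code, "UNKNOWN")
    return rc, name, os.strerror(code)


# Fragment copied from the log quoted in this thread (the rest of the line is elided).
print(decode_rc("returned: -62 (Timer expired)"))
```

Decoding the number only confirms that the operation timed out; the fix discussed in the thread is still to give the fencing operation a longer timeout.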

Re: [Pacemaker] hangs pending

2014-01-13 Thread Andrew Beekhof
On 14 Jan 2014, at 1:19 pm, Andrew Beekhof and...@beekhof.net wrote: Apart from anything else, your timeout needs to be bigger: Jan 13 12:21:36 [17223] dev-cluster2-node1.unix.tensor.ru stonith-ng: ( commands.c:1321 ) error: log_operation: Operation 'reboot' [11331] (call 2 from

Re: [Pacemaker] hangs pending

2014-01-13 Thread Andrew Beekhof
sure that the victim will rebooted and again available via ssh - it exit with 0. does not seem true. On 14 Jan 2014, at 1:19 pm, Andrew Beekhof and...@beekhof.net wrote: Apart from anything else, your timeout needs to be bigger: Jan 13 12:21:36 [17223] dev-cluster2-node1.unix.tensor.ru stonith

Re: [Pacemaker] hangs pending

2014-01-13 Thread Andrew Beekhof
On 14 Jan 2014, at 3:34 pm, Andrey Groshev gre...@yandex.ru wrote: 14.01.2014, 06:25, Andrew Beekhof and...@beekhof.net: Apart from anything else, your timeout needs to be bigger: Jan 13 12:21:36 [17223] dev-cluster2-node1.unix.tensor.ru stonith-ng: ( commands.c:1321 ) error

Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-01-13 Thread Andrew Beekhof
On 14 Jan 2014, at 3:41 pm, Brian J. Murrell (brian) br...@interlinx.bc.ca wrote: On Tue, 2014-01-14 at 08:09 +1100, Andrew Beekhof wrote: The local cib hasn't caught up yet by the looks of it. Should crm_resource actually be [mis-]reporting as if it were knowledgeable when it's

Re: [Pacemaker] A resource starts with a standby node. (Latest attrd does not serve as the crmd-transition-delay parameter)

2014-01-13 Thread Andrew Beekhof
On 14 Jan 2014, at 3:52 pm, renayama19661...@ybb.ne.jp wrote: Hi All, I contributed next bugzilla by a problem to occur for the difference of the timing of the attribute update by attrd before. * https://developerbugs.linuxfoundation.org/show_bug.cgi?id=2528 We can evade this problem

Re: [Pacemaker] A resource starts with a standby node. (Latest attrd does not serve as the crmd-transition-delay parameter)

2014-01-13 Thread Andrew Beekhof
with crmd-transition-delay to me. I report the details again. # Probably it will be Bugzilla. . . Sounds good Best Regards, Hideo Yamauchi. --- On Tue, 2014/1/14, Andrew Beekhof and...@beekhof.net wrote: On 14 Jan 2014, at 3:52 pm, renayama19661...@ybb.ne.jp wrote: Hi All, I

Re: [Pacemaker] A resource starts with a standby node. (Latest attrd does not serve as the crmd-transition-delay parameter)

2014-01-13 Thread Andrew Beekhof
to work so that new attrd dispensed with crmd-transition-delay to me. I report the details again. # Probably it will be Bugzilla. . . Sounds good All right! Many Thanks! Hideo Yamauchi. --- On Tue, 2014/1/14, Andrew Beekhof and...@beekhof.net wrote: On 14 Jan 2014, at 4:13 pm

Re: [Pacemaker] [PATCH] Downgrade probe log message for promoted ms resources

2014-01-12 Thread Andrew Beekhof
Fair enough. Pull request? On 12 Jan 2014, at 8:29 pm, Vladislav Bogdanov bub...@hoster-ok.com wrote: Hi, This is the only one message I see in logs in otherwise static cluster (with rechecks enabled), probably it is good idea to downgrade it to info. diff --git a/lib/pengine/unpack.c

Re: [Pacemaker] hangs pending

2014-01-12 Thread Andrew Beekhof
On 10 Jan 2014, at 9:55 pm, Andrey Groshev gre...@yandex.ru wrote: 10.01.2014, 14:31, Andrey Groshev gre...@yandex.ru: 10.01.2014, 14:01, Andrew Beekhof and...@beekhof.net: On 10 Jan 2014, at 5:03 pm, Andrey Groshev gre...@yandex.ru wrote: 10.01.2014, 05:29, Andrew Beekhof

Re: [Pacemaker] again return code, now in crm_attribute

2014-01-12 Thread Andrew Beekhof
On 10 Jan 2014, at 6:18 pm, Andrey Groshev gre...@yandex.ru wrote: 10.01.2014, 10:15, Andrew Beekhof and...@beekhof.net: On 10 Jan 2014, at 4:38 pm, Andrey Groshev gre...@yandex.ru wrote: 10.01.2014, 09:06, Andrew Beekhof and...@beekhof.net: On 10 Jan 2014, at 3:51 pm, Andrey Groshev

Re: [Pacemaker] Manual fence confirmation by stonith_admin doesn't work again.

2014-01-12 Thread Andrew Beekhof
On 10 Jan 2014, at 3:54 pm, Nikita Staroverov nsfo...@gmail.com wrote: There is no-one to tell yet. We have to wait for cman to decide something needs fencing before pacemaker can perform the notification. if I get you right i need own fencing agent that doing manual confirmed fence

Re: [Pacemaker] hangs pending

2014-01-10 Thread Andrew Beekhof
On 10 Jan 2014, at 5:03 pm, Andrey Groshev gre...@yandex.ru wrote: 10.01.2014, 05:29, Andrew Beekhof and...@beekhof.net: On 9 Jan 2014, at 11:11 pm, Andrey Groshev gre...@yandex.ru wrote: 08.01.2014, 06:22, Andrew Beekhof and...@beekhof.net: On 29 Nov 2013, at 7:17 pm, Andrey Groshev

Re: [Pacemaker] starting resources with failed stonith resource

2014-01-09 Thread Andrew Beekhof
On 9 Jan 2014, at 8:29 pm, Frank Van Damme frank.vanda...@gmail.com wrote: 2014/1/8 Andrew Beekhof and...@beekhof.net: I don't understand it: if this means that the stonith devices have failed a million times, We also set it to 100 when the start action fails. why is it trying

Re: [Pacemaker] Breaking dependency loop stonith

2014-01-09 Thread Andrew Beekhof
On 9 Jan 2014, at 5:05 pm, Andrey Groshev gre...@yandex.ru wrote: 08.01.2014, 06:15, Andrew Beekhof and...@beekhof.net: On 27 Nov 2013, at 12:26 am, Andrey Groshev gre...@yandex.ru wrote: Hi, ALL. I want to clarify two more questions. After stonith reboot - this node hangs

Re: [Pacemaker] again return code, now in crm_attribute

2014-01-09 Thread Andrew Beekhof
On 9 Jan 2014, at 4:44 pm, Andrey Groshev gre...@yandex.ru wrote: 09.01.2014, 02:39, Andrew Beekhof and...@beekhof.net: On 18 Dec 2013, at 11:55 pm, Andrey Groshev gre...@yandex.ru wrote: Hi, Andrew and ALL. I'm sorry, but I again found an error. :) Crux of the problem

Re: [Pacemaker] again return code, now in crm_attribute

2014-01-09 Thread Andrew Beekhof
On 10 Jan 2014, at 3:51 pm, Andrey Groshev gre...@yandex.ru wrote: 10.01.2014, 03:28, Andrew Beekhof and...@beekhof.net: On 9 Jan 2014, at 4:44 pm, Andrey Groshev gre...@yandex.ru wrote: 09.01.2014, 02:39, Andrew Beekhof and...@beekhof.net: On 18 Dec 2013, at 11:55 pm, Andrey

Re: [Pacemaker] again return code, now in crm_attribute

2014-01-09 Thread Andrew Beekhof
On 10 Jan 2014, at 4:38 pm, Andrey Groshev gre...@yandex.ru wrote: 10.01.2014, 09:06, Andrew Beekhof and...@beekhof.net: On 10 Jan 2014, at 3:51 pm, Andrey Groshev gre...@yandex.ru wrote: 10.01.2014, 03:28, Andrew Beekhof and...@beekhof.net: On 9 Jan 2014, at 4:44 pm, Andrey Groshev

Re: [Pacemaker] lrmd segfault at pacemaker 1.1.11-rc1

2014-01-08 Thread Andrew Beekhof
On 8 Jan 2014, at 9:15 pm, Kazunori INOUE kazunori.ino...@gmail.com wrote: 2014/1/8 Andrew Beekhof and...@beekhof.net: On 18 Dec 2013, at 9:50 pm, Kazunori INOUE kazunori.ino...@gmail.com wrote: Hi David, 2013/12/18 David Vossel dvos...@redhat.com: That's a really weird one... I

Re: [Pacemaker] starting resources with failed stonith resource

2014-01-08 Thread Andrew Beekhof
On 8 Jan 2014, at 2:41 am, Frank Van Damme frank.vanda...@gmail.com wrote: Hi list, I recently had some trouble with a dual-node mysql cluster, which runs in master-slave mode with Percona resource manager. While analyzing what happened to the cluster, I found this in syslog (network

Re: [Pacemaker] Error: node does not appear to exist in configuration

2014-01-08 Thread Andrew Beekhof
On 6 Jan 2014, at 8:09 pm, Jerald B. Darow jbda...@ace-host.net wrote: Where am I going wrong here? Good question... Chris? [root@zero mysql]# pcs cluster standby zero.acenet.us Error: node 'zero.acenet.us' does not appear to exist in configuration [root@zero mysql]# pcs cluster cib |

Re: [Pacemaker] monitoring redis in master-slave mode

2014-01-08 Thread Andrew Beekhof
On 13 Dec 2013, at 11:06 pm, ESWAR RAO eswar7...@gmail.com wrote: Hi All, I have a 3 node setup with HB+pacemaker. I wanted to run redis in master-slave mode using an ocf script. https://groups.google.com/forum/#!msg/redis-db/eY3zCKnl0G0/lW5fObHrjwQJ But with the below configuration

Re: [Pacemaker] catch-22: can't fence node A because node A has the fencing resource

2014-01-08 Thread Andrew Beekhof
On 4 Dec 2013, at 11:47 am, Brian J. Murrell br...@interlinx.bc.ca wrote: On Tue, 2013-12-03 at 18:26 -0500, David Vossel wrote: We did away with all of the policy engine logic involved with trying to move fencing devices off of the target node before executing the fencing action.

Re: [Pacemaker] pacemaker + cman - node names and bind address

2014-01-08 Thread Andrew Beekhof
On 5 Dec 2013, at 8:51 pm, Nikola Ciprich nikola.cipr...@linuxbox.cz wrote: Hello Digimer, and thanks for Your reply. I understand your points, but my question is about something a bit different.. example: I have two nodes, node1 (lan address resolves to 192.168.1.1) and node2 (lan

Re: [Pacemaker] How to permanently delete ghostly nodes?

2014-01-08 Thread Andrew Beekhof
On 7 Dec 2013, at 8:19 pm, Andrey Rogovsky a.rogov...@gmail.com wrote: I renamed several nodes and restart the cluster Now I show a old nodes in status offline I tried to delete them, but every time you change the cluster configuration they show in offline again It depends a bit on the

Re: [Pacemaker] again return code, now in crm_attribute

2014-01-08 Thread Andrew Beekhof
On 18 Dec 2013, at 11:55 pm, Andrey Groshev gre...@yandex.ru wrote: Hi, Andrew and ALL. I'm sorry, but I again found an error. :) Crux of the problem: # crm_attribute --type crm_config --attr-name stonith-enabled --query; echo $? scope=crm_config name=stonith-enabled value=true 0 #
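
The reports in this thread (here and in the "exit code crm_attribute" message, where a failed lookup still exits with 0) suggest that a script should not trust the exit status of crm_attribute --query on its own. A minimal Python sketch, assuming the result is printed on stdout in the "scope=... name=... value=..." form shown above, that also requires a value field before treating the option as set. The helper names are hypothetical:

```python
import subprocess


def parse_query_output(text):
    """Split 'scope=... name=... value=...' output into a dict of key/value pairs."""
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:  # skip words that are not key=value pairs
            fields[key] = value
    return fields


def query_cluster_option(name):
    """Return the value of a crm_config option, or None if it could not be read."""
    # Same command line as in the thread; needs an installed, running cluster.
    proc = subprocess.run(
        ["crm_attribute", "--type", "crm_config", "--attr-name", name, "--query"],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        return None
    # The exit status alone was reported as unreliable, so also require "value".
    return parse_query_output(proc.stdout).get("value")


print(parse_query_output("scope=crm_config name=stonith-enabled value=true"))
```

An error message such as "Could not map name=... to a UUID" yields no value field, so query_cluster_option returns None for it even when the exit status is 0.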

Re: [Pacemaker] CentOS 6.5 Pacemaker Oracle Active/Failover cluster setup on SAN

2014-01-07 Thread Andrew Beekhof
On 6 Jan 2014, at 4:15 pm, Pui Edylie em...@edylie.net wrote: Good Day members, I am wondering if anyone has set this up successfully? I noticed that there is a lack of Oracle script to initiate this. I would be willing to pay someone for this effort and hopefully we could create a

Re: [Pacemaker] Manual fence confirmation by stonith_admin doesn't work again.

2014-01-07 Thread Andrew Beekhof
On 19 Dec 2013, at 6:54 pm, Nikita Staroverov nsfo...@gmail.com wrote: Please see: https://access.redhat.com/site/articles/36302 If you don't have an account, the relevant part is: Usage of fence_manual is not supported in any production cluster. You may use this fence agent for

Re: [Pacemaker] lrmd segfault at pacemaker 1.1.11-rc1

2014-01-07 Thread Andrew Beekhof
On 18 Dec 2013, at 9:50 pm, Kazunori INOUE kazunori.ino...@gmail.com wrote: Hi David, 2013/12/18 David Vossel dvos...@redhat.com: That's a really weird one... I don't see how it is possible for op-id to be NULL there. You might need to give valgrind a shot to detect whatever is

Re: [Pacemaker] reboot of non-vm host results in VM restart -- of chickens and eggs and VMs

2014-01-07 Thread Andrew Beekhof
On 20 Dec 2013, at 5:30 am, Bob Haxo bh...@sgi.com wrote: Hello, Earlier emails related to this topic: [pacemaker] chicken-egg-problem with libvirtd and a VM within cluster [pacemaker] VirtualDomain problem after reboot of one node My configuration:

Re: [Pacemaker] Minor buffer overflow..

2014-01-07 Thread Andrew Beekhof
On 5 Dec 2013, at 3:20 pm, Rob Thomas xro...@gmail.com wrote: I was idly wondering why the SMTP and SNMP modules were disabled by default on the RHEL builds, and was in the middle of writing a shell script to duplicate them when I noticed there was a tiny buffer overflow in crm_mon. This

Re: [Pacemaker] Starting Pacemaker Cluster Manager [FAILED]

2014-01-07 Thread Andrew Beekhof
On 21 Nov 2013, at 9:56 pm, Miha m...@softnet.si wrote: Hi, how can I delete/reset all config, so that I could do again: pcs cluster destroy on all nodes looks about right 'pcs cluster setup mycluster pcmk-1 pcmk-2' and begin again at the beginning? tnx! p.s.: below is a log

Re: [Pacemaker] some questions about STONITH

2014-01-07 Thread Andrew Beekhof
On 26 Nov 2013, at 12:39 am, Andrey Groshev gre...@yandex.ru wrote: ...snip... Make next test: #stonith_admin --reboot=dev-cluster2-node2 Node reboot, but resource don't start. In crm_mon status - Node dev-cluster2-node2 (172793105): pending. And it will be hung. That is *probably*

Re: [Pacemaker] Breaking dependency loop stonith

2014-01-07 Thread Andrew Beekhof
On 27 Nov 2013, at 12:26 am, Andrey Groshev gre...@yandex.ru wrote: Hi, ALL. I want to clarify two more questions. After stonith reboot - this node hangs with status pending. The logs found string . info: rsc_merge_weights:pgsql:1: Breaking dependency loop at msPostgresql

Re: [Pacemaker] Weird behavior of PCS command while defining DRBD resources

2014-01-07 Thread Andrew Beekhof
On 27 Nov 2013, at 10:21 pm, Muhammad Kamran Azeem kamranaz...@gmail.com wrote: Apologies for double post. In my initial post, I forgot to set the subject properly. Hello List, I am new here. I worked with Linux HA during 2006-2008, went in HPC direction, and came back to HA a

Re: [Pacemaker] prevent starting resources on failed node

2014-01-07 Thread Andrew Beekhof
On 7 Dec 2013, at 2:17 am, Brian J. Murrell (brian) br...@interlinx.bc.ca wrote: [ Hopefully this doesn't cause a duplicate post but my first attempt returned an error. ] Using pacemaker 1.1.10 (but I think this issue is more general than that release), I want to enforce a policy that

Re: [Pacemaker] error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again

2014-01-07 Thread Andrew Beekhof
What version of pacemaker? There were some improvements to how we handle sending messages via CPG recently. On 10 Dec 2013, at 4:40 am, Brian J. Murrell br...@interlinx.bc.ca wrote: On Mon, 2013-12-09 at 09:28 +0100, Jan Friesse wrote: Error 6 means try again. This is happening either
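
Jan Friesse's point is that error 6 from CPG means "try again". In corosync's cs_error_t enumeration, 6 is CS_ERR_TRY_AGAIN, a transient condition that the caller is expected to retry. A minimal Python sketch of the usual retry-with-backoff pattern follows; send is a hypothetical stand-in for whatever actually calls cpg_mcast_joined(), and the attempt count and delays are arbitrary assumptions, not values Pacemaker uses:

```python
import time

# Values from corosync's cs_error_t enumeration.
CS_OK = 1
CS_ERR_TRY_AGAIN = 6


def send_with_retry(send, message, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call send(message) until it returns something other than CS_ERR_TRY_AGAIN."""
    rc = CS_ERR_TRY_AGAIN
    for attempt in range(attempts):
        rc = send(message)
        if rc != CS_ERR_TRY_AGAIN:
            return rc  # success or a non-transient error: stop retrying
        if attempt + 1 < attempts:
            # Exponential backoff gives corosync time to make progress.
            sleep(base_delay * (2 ** attempt))
    return rc  # still CS_ERR_TRY_AGAIN after the last attempt


# Hypothetical sender that succeeds on the third call.
replies = iter([CS_ERR_TRY_AGAIN, CS_ERR_TRY_AGAIN, CS_OK])
print(send_with_retry(lambda msg: next(replies), "ping", sleep=lambda s: None))
```

Passing sleep as a parameter keeps the sketch testable without real delays.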

Re: [Pacemaker] Reg. trigger when node failure occurs

2014-01-07 Thread Andrew Beekhof
On 11 Dec 2013, at 3:45 pm, ESWAR RAO eswar7...@gmail.com wrote: Hi Micheal, I am configuring the ClusterMon as below on the 3 node setup: I am following http://floriancrouzat.net/2013/01/monitor-a-pacemaker-cluster-with-ocfpacemakerclustermon-andor-external-agent/ # crm configure

Re: [Pacemaker] host came online but it was ignored

2014-01-07 Thread Andrew Beekhof
On Wed, Dec 11, 2013 at 6:18 AM, Andrew Beekhof and...@beekhof.net wrote: version of pacemaker? On 10 Dec 2013, at 10:41 pm, ESWAR RAO eswar7...@gmail.com wrote: Hi Micheal, There are no firewall rules. I could only see below messages in logs: Dec 10 14:13:48 nvp-common

Re: [Pacemaker] Question about node-action-limit and migration-limit

2014-01-07 Thread Andrew Beekhof
On 18 Dec 2013, at 9:51 pm, Kazunori INOUE kazunori.ino...@gmail.com wrote: Hi, When I set only migration-limit without setting node-action-limit in pacemaker-1.1, the number of 'operation' other than migrate_to/from was limited to the value of migration-limit. (The node that I used has

Re: [Pacemaker] Time to get ready for 1.1.11

2013-12-19 Thread Andrew Beekhof
- Original Message - From: David Vossel dvos...@redhat.com To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org Sent: Wednesday, December 11, 2013 3:33:46 PM Subject: Re: [Pacemaker] Time to get ready for 1.1.11 - Original Message - From: Andrew Beekhof

Re: [Pacemaker] Pacemaker and RHEL/CENTOS 5.x compatibility ?

2013-12-19 Thread Andrew Beekhof
On 20 Dec 2013, at 1:36 am, Stephane Robin sro...@kivasystems.com wrote: Hi, This is a follow up on my previous post 'Trouble building Pacemaker from source on CentOS 5.10' Andrew: Thanks for your pointers. It turns out Pacemaker 1.1.10 needed more changes to build on CentOS 5.x.

Re: [Pacemaker] question on on-fail=restart

2013-12-18 Thread Andrew Beekhof
On 19 Dec 2013, at 4:03 am, Brusq, Jerome jerome.br...@signalis.com wrote: Dear all, I have a custom lsb script that launch a custom process. primitive myscript lsb:ha_swift \ op start interval=0 timeout=30s \ op stop interval=0 timeout=30s \ op monitor interval=15s

Re: [Pacemaker] Trouble building Pacemaker from source on CentOS 5.10

2013-12-18 Thread Andrew Beekhof
On 14 Dec 2013, at 7:51 am, Stephane Robin sro...@kivasystems.com wrote: Hi, I'm trying to build Pacemaker-1.1.10 (from git), with corosync 2.3.2 and libqb 0.16.0 on a CentOS 5.10 64b system. I have the latest autotools (automake 1.14, autoconf 2.69, libtool 2.4, pkg-config 0.27.1)

Re: [Pacemaker] host came online but it was ignored

2013-12-10 Thread Andrew Beekhof
version of pacemaker? On 10 Dec 2013, at 10:41 pm, ESWAR RAO eswar7...@gmail.com wrote: Hi Micheal, There are no firewall rules. I could only see below messages in logs: Dec 10 14:13:48 nvp-common crmd: [9220]: WARN: crmd_ha_msg_callback: Ignoring HA message (op=join_announce) from

Re: [Pacemaker] is ccs as racy as it feels?

2013-12-10 Thread Andrew Beekhof
On 10 Dec 2013, at 11:31 pm, Brian J. Murrell br...@interlinx.bc.ca wrote: On Tue, 2013-12-10 at 10:27 +, Christine Caulfield wrote: Sadly you're not wrong. That's what I was afraid of. But it's actually no worse than updating corosync.conf manually, I think it is... in

Re: [Pacemaker] Where the heck is Beekhof?

2013-12-01 Thread Andrew Beekhof
12:04:01 +1100 Andrew Beekhof and...@beekhof.net wrote: If you find yourself asking $subject at some point in the next couple of months, the answer is that I'm taking leave to look after our new son (Lawson Tiberius Beekhof) who was born on Tuesday. Congrats! And remember: If you want

Re: [Pacemaker] no-quorum-policy=freeze

2013-12-01 Thread Andrew Beekhof
On Wed, Nov 27, 2013, at 04:50 AM, Olivier Nicaise wrote: Hello all, I have an issue with the no quorum policy freeze (stonith disabled). I'm using an old version of pacemaker (1.1.6), the one distributed by Ubuntu 12.04. I have a cluster with 3 nodes running various resources, including

Re: [Pacemaker] p_mysql operation monitor failed 'not installed'

2013-11-21 Thread Andrew Beekhof
On 22 Nov 2013, at 7:32 am, Miha m...@softnet.si wrote: HI, what could be a reason for this error: notice: unpack_rsc_op: Preventing p_mysql from re-starting on sip2: operation monitor failed 'not installed' (rc=5) the agent, or something the agent needs is not available. how did you
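
Pacemaker reports the agent's exit code as rc=5. In the OCF resource agent API that is OCF_ERR_INSTALLED, which matches Andrew's reading that the agent, or something it needs, is not available on sip2. A small Python lookup, assuming log lines end with "(rc=N)" as in the one quoted above. The helper name is hypothetical:

```python
import re

# Standard OCF resource agent exit codes.
OCF_RETURN_CODES = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}


def explain_failure(log_line):
    """Map the '(rc=N)' suffix of a Pacemaker log line to its OCF name."""
    match = re.search(r"\(rc=(\d+)\)", log_line)
    if match is None:
        return None
    rc = int(match.group(1))
    return OCF_RETURN_CODES.get(rc, f"unknown rc {rc}")


line = ("notice: unpack_rsc_op: Preventing p_mysql from re-starting on sip2: "
        "operation monitor failed 'not installed' (rc=5)")
print(explain_failure(line))
```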

Re: [Pacemaker] exit code crm_attribute

2013-11-21 Thread Andrew Beekhof
--attr-name notexistattibute --query /dev/null; echo $? Could not map name=fackename.node.org to a UUID 0 Version PCMK 1.1.11 23.09.2013, 08:23, Andrew Beekhof and...@beekhof.net: On 20/09/2013, at 5:53 PM, Andrey Groshev gre...@yandex.ru wrote: Hi again! Today again met

Re: [Pacemaker] CentOS 6.4 last update - Failed to create cluster resources with pcs command

2013-11-21 Thread Andrew Beekhof
On 22 Nov 2013, at 4:15 am, Dmitry Bron dmitr...@gmail.com wrote: Hi All, We have two fresh installed boxes with CentOS 6.4 and with last updates which we want to configure as Active - Standby in HA cluster. We copied all configuration files from another worked well HA cluster. We

Re: [Pacemaker] pacemaker update crash my config (cannot be represented in the CLI notation)

2013-11-20 Thread Andrew Beekhof
On 21 Nov 2013, at 6:08 am, Lars Marowsky-Bree l...@suse.com wrote: On 2013-11-20T16:43:51, Beo Banks beo.ba...@googlemail.com wrote: INFO: object cli-prefer-mysql cannot be represented in the CLI notation crm configure show | grep xml INFO: object cli-prefer-mysql cannot be

[Pacemaker] Time to get ready for 1.1.11

2013-11-20 Thread Andrew Beekhof
With over 400 updates since the release of 1.1.10, it's time to start thinking about a new release. Today I have tagged release candidate 1. The most notable fixes include: + attrd: Implementation of a truly atomic attrd for use with corosync 2.x + cib: Allow values to be added/updated

Re: [Pacemaker] stonith ra class missing

2013-11-19 Thread Andrew Beekhof
On 19 Nov 2013, at 4:19 pm, Michael Schwartzkopff m...@sys4.de wrote: Andrew Beekhof and...@beekhof.net wrote: On 19 Nov 2013, at 1:23 am, Michael Schwartzkopff m...@sys4.de wrote: Hi, I installed pacemaker on a RHEL 6.4 machine. Now crm tells me that there is no stonith

Re: [Pacemaker] Finally. A REAL question.

2013-11-18 Thread Andrew Beekhof
On 18 Nov 2013, at 3:30 pm, Rob Thomas xro...@gmail.com wrote: I've been browsing through the cluster.log, and it's not even trying to move httpd. I'm almost certain that it used to work fine with resource sets. Hmm. OK. I went and -actually looked- at the CIB I was previously generating.

Re: [Pacemaker] stonith ra class missing

2013-11-18 Thread Andrew Beekhof
On 19 Nov 2013, at 1:23 am, Michael Schwartzkopff m...@sys4.de wrote: Hi, I installed pacemaker on a RHEL 6.4 machine. Now crm tells me that there is no stonith ra class, only lsb, ocf and service. What did I miss? thanks for any valuable comments. did you install the fencing-agents

Re: [Pacemaker] Finally. A REAL question.

2013-11-18 Thread Andrew Beekhof
On 19 Nov 2013, at 6:00 am, Rob Thomas xro...@gmail.com wrote: On Mon, Nov 18, 2013 at 9:17 PM, Andrew Beekhof and...@beekhof.net wrote: my eyes! my eyes! So... What's the -right- way to do it then? 8) http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource

Re: [Pacemaker] Finally. A REAL question.

2013-11-18 Thread Andrew Beekhof
On 19 Nov 2013, at 10:30 am, Rob Thomas xro...@gmail.com wrote: On Tue, Nov 19, 2013 at 8:55 AM, Andrew Beekhof and...@beekhof.net wrote: On 19 Nov 2013, at 6:00 am, Rob Thomas xro...@gmail.com wrote: So... What's the -right- way to do it then? 8) rsc_colocation id

Re: [Pacemaker] The larger cluster is tested.

2013-11-18 Thread Andrew Beekhof
://drive.google.com/file/d/0BwMFJItoO-fVZnJIazd5MFQ1aGs/edit?usp=sharing The report at the time of making it operate by my test code is the following. https://drive.google.com/file/d/0BwMFJItoO-fVbzB0NjFLeVY3Zmc/edit?usp=sharing Regards, Yusuke 2013/11/13 Andrew Beekhof and...@beekhof.net

Re: [Pacemaker] No such device, problem with setting pacemaker

2013-11-18 Thread Andrew Beekhof
On 18 Nov 2013, at 11:59 pm, Miha m...@softnet.si wrote: Hi, I am setting up a cluster with pacemaker and corosync for the first time. Server A and server B can ping each other, I have disabled selinux and iptables but I cannot get this going. I did it step by step as written in the tutorial. Have

Re: [Pacemaker] Finally. A REAL question.

2013-11-18 Thread Andrew Beekhof
On 19 Nov 2013, at 2:50 pm, Rob Thomas xro...@gmail.com wrote: On 19 Nov 2013, at 6:00 am, Rob Thomas xro...@gmail.com wrote: So... What's the -right- way to do it then? 8) rsc_colocation id=pcs_rsc_colocation resource_set id=pcs_rsc_set resource_ref id=httpd/
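
The XML quoted above lost its angle brackets in the archive. Its shape is an rsc_colocation element containing a resource_set of resource_ref entries. A minimal Python sketch that builds that shape with xml.etree.ElementTree, assuming a score of INFINITY (the quoted fragment is cut off before any score) and using "fs_data" as a purely hypothetical second resource. It only produces the XML; loading it into a cluster is a separate step:

```python
import xml.etree.ElementTree as ET


def colocation_set(constraint_id, set_id, resource_ids, score="INFINITY"):
    """Build an rsc_colocation constraint holding a single resource_set."""
    constraint = ET.Element("rsc_colocation", id=constraint_id, score=score)
    resource_set = ET.SubElement(constraint, "resource_set", id=set_id)
    for rid in resource_ids:
        ET.SubElement(resource_set, "resource_ref", id=rid)
    return ET.tostring(constraint, encoding="unicode")


# The ids come from the thread; "fs_data" is a hypothetical second resource.
print(colocation_set("pcs_rsc_colocation", "pcs_rsc_set", ["httpd", "fs_data"]))
```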

Re: [Pacemaker] Finally. A REAL question.

2013-11-18 Thread Andrew Beekhof
On 19 Nov 2013, at 3:09 pm, Andrew Beekhof and...@beekhof.net wrote: On 19 Nov 2013, at 2:50 pm, Rob Thomas xro...@gmail.com wrote: On 19 Nov 2013, at 6:00 am, Rob Thomas xro...@gmail.com wrote: So... What's the -right- way to do it then? 8) rsc_colocation id=pcs_rsc_colocation

Re: [Pacemaker] Remove a ghost node

2013-11-18 Thread Andrew Beekhof
On 19 Nov 2013, at 3:21 am, Sean Lutner s...@rentul.net wrote: On Nov 17, 2013, at 7:40 PM, Andrew Beekhof and...@beekhof.net wrote: On 15 Nov 2013, at 2:28 pm, Sean Lutner s...@rentul.net wrote: Yes the varnish resources are in a group which is then cloned. -EDONTDOTHAT You can't

Re: [Pacemaker] CentOS 6.4 and CFS.

2013-11-17 Thread Andrew Beekhof
On 16 Nov 2013, at 9:42 am, Rob Thomas xro...@gmail.com wrote: Line 363 of /usr/lib/python2.6/site-packages/pcs/cluster.py has this: nodes = utils.getNodesFromCorosyncConf() Ahha. Look what I just spotted. https://github.com/feist/pcs/commit/8b888080c37ddea88b92dfd95aadd78b9db68b55

Re: [Pacemaker] Remove a ghost node

2013-11-17 Thread Andrew Beekhof
On 15 Nov 2013, at 2:28 pm, Sean Lutner s...@rentul.net wrote: Yes the varnish resources are in a group which is then cloned. -EDONTDOTHAT You can't refer to the things inside a clone. 1.1.8 will have just been ignoring those constraints. So the implicit order and colocation
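
Andrew's rule is that constraints cannot refer to the resources inside a clone, and 1.1.8 would silently have ignored such constraints. A minimal Python sketch that scans CIB XML (for example, the output of cibadmin --query) for colocation, order and location constraints naming a group or primitive nested in a clone. It is an illustrative check under the assumption that constraints use the rsc/with-rsc, first/then and rsc attributes, and it does not inspect resource sets. The sample CIB and all ids in it are hypothetical:

```python
import xml.etree.ElementTree as ET

# Constraint attributes that name a resource (resource sets are not checked here).
REF_ATTRS = {
    "rsc_colocation": ("rsc", "with-rsc"),
    "rsc_order": ("first", "then"),
    "rsc_location": ("rsc",),
}


def constraints_into_clones(cib_xml):
    """Return (constraint id, resource id) pairs where a constraint names a clone member."""
    root = ET.fromstring(cib_xml)
    inside_clone = set()
    for clone in root.iter("clone"):
        for child in clone.iter():
            # Groups and primitives nested in a clone should be addressed via the clone id.
            if child is not clone and child.tag in ("group", "primitive"):
                inside_clone.add(child.get("id"))
    problems = []
    for tag, attrs in REF_ATTRS.items():
        for constraint in root.iter(tag):
            for attr in attrs:
                target = constraint.get(attr)
                if target in inside_clone:
                    problems.append((constraint.get("id"), target))
    return problems


# Hypothetical, trimmed CIB: a group of varnish resources that is then cloned.
SAMPLE = """<cib><configuration>
  <resources>
    <clone id="varnish-clone">
      <group id="varnish-group">
        <primitive id="varnish"/>
        <primitive id="varnish-ip"/>
      </group>
    </clone>
  </resources>
  <constraints>
    <rsc_colocation id="col-bad" rsc="varnish-ip" with-rsc="varnish" score="INFINITY"/>
    <rsc_order id="order-ok" first="varnish-clone" then="web"/>
  </constraints>
</configuration></cib>"""

print(constraints_into_clones(SAMPLE))
```

The order constraint that names varnish-clone is not flagged, which matches the advice to point constraints at the clone itself.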

Re: [Pacemaker] Finally. A REAL question.

2013-11-17 Thread Andrew Beekhof
On 18 Nov 2013, at 12:43 pm, Rob Thomas xro...@gmail.com wrote: Previously, using crm, it was reasonably painless to ensure that resource groups ran on the same node. I'm having difficulties figuring out what the 'right' way to do this is with pcs You tried: pcs constraint colocation

Re: [Pacemaker] CentOS 6.4 and CFS.

2013-11-15 Thread Andrew Beekhof
On 15 Nov 2013, at 5:56 pm, Rob Thomas xro...@gmail.com wrote: So I'm a long time corosync fan, and I've recently come back into the fold to change everything I've previously written to pcs, because that's the new cool thing. Sadly, things seem to be a bit broken. Here's how things have

Re: [Pacemaker] Remove a ghost node

2013-11-14 Thread Andrew Beekhof
On 14 Nov 2013, at 2:55 pm, Sean Lutner s...@rentul.net wrote: On Nov 13, 2013, at 10:51 PM, Andrew Beekhof and...@beekhof.net wrote: On 14 Nov 2013, at 1:12 pm, Sean Lutner s...@rentul.net wrote: On Nov 10, 2013, at 8:03 PM, Sean Lutner s...@rentul.net wrote: On Nov 10, 2013

Re: [Pacemaker] why pacemaker does not control the resources

2013-11-14 Thread Andrew Beekhof
On 14 Nov 2013, at 5:06 pm, Andrey Groshev gre...@yandex.ru wrote: 14.11.2013, 02:22, Andrew Beekhof and...@beekhof.net: On 14 Nov 2013, at 6:13 am, Andrey Groshev gre...@yandex.ru wrote: 13.11.2013, 03:22, Andrew Beekhof and...@beekhof.net: On 12 Nov 2013, at 4:42 pm, Andrey Groshev

Re: [Pacemaker] Remove a ghost node

2013-11-14 Thread Andrew Beekhof
On 15 Nov 2013, at 10:24 am, Sean Lutner s...@rentul.net wrote: On Nov 14, 2013, at 6:14 PM, Andrew Beekhof and...@beekhof.net wrote: On 14 Nov 2013, at 2:55 pm, Sean Lutner s...@rentul.net wrote: On Nov 13, 2013, at 10:51 PM, Andrew Beekhof and...@beekhof.net wrote: On 14 Nov

Re: [Pacemaker] Question about the resource to fence a node

2013-11-14 Thread Andrew Beekhof
On 14 Nov 2013, at 5:53 pm, Kazunori INOUE kazunori.ino...@gmail.com wrote: Hi, Andrew 2013/11/13 Kazunori INOUE kazunori.ino...@gmail.com: 2013/11/13 Andrew Beekhof and...@beekhof.net: On 16 Oct 2013, at 8:51 am, Andrew Beekhof and...@beekhof.net wrote: On 15/10/2013, at 8:24 PM

Re: [Pacemaker] why pacemaker does not control the resources

2013-11-13 Thread Andrew Beekhof
On 14 Nov 2013, at 6:13 am, Andrey Groshev gre...@yandex.ru wrote: 13.11.2013, 03:22, Andrew Beekhof and...@beekhof.net: On 12 Nov 2013, at 4:42 pm, Andrey Groshev gre...@yandex.ru wrote: 11.11.2013, 03:44, Andrew Beekhof and...@beekhof.net: On 8 Nov 2013, at 7:49 am, Andrey Groshev

Re: [Pacemaker] stonith_admin does not work as expected

2013-11-13 Thread Andrew Beekhof
2013/11/11, Andrew Beekhof and...@beekhof.net: Impossible to comment without knowing the pacemaker version, full config, and how fence_ifmib works (I assume it's a custom agent?) On 12 Nov 2013, at 1:21 am, andreas graeper agrae...@googlemail.com wrote: hi, two nodes. n1 (slave) fence_2

Re: [Pacemaker] crmd Segmentation fault at pacemaker 1.0.12

2013-11-13 Thread Andrew Beekhof
On 13 Nov 2013, at 7:36 pm, TAKATSUKA Haruka haru...@sraoss.co.jp wrote: Hello, pacemaker hackers I report crmd's crash at pacemaker 1.0.12 . We are going to upgrade pacemaker 1.0.12 to 1.0.13 . But I was not able to find a fix for this problem from ChangeLog. tengine.c:do_te_invoke()

Re: [Pacemaker] Remove a ghost node

2013-11-13 Thread Andrew Beekhof
On 14 Nov 2013, at 1:12 pm, Sean Lutner s...@rentul.net wrote: On Nov 10, 2013, at 8:03 PM, Sean Lutner s...@rentul.net wrote: On Nov 10, 2013, at 7:54 PM, Andrew Beekhof and...@beekhof.net wrote: On 11 Nov 2013, at 11:44 am, Sean Lutner s...@rentul.net wrote: On Nov 10, 2013

Re: [Pacemaker] recover cib from raw file

2013-11-12 Thread Andrew Beekhof
to have to use it :-) Regards Sean O'Reilly On Mon 11/11/13 10:03 PM , Andrew Beekhof and...@beekhof.net sent: On 11 Nov 2013, at 9:41 pm, s.oreilly s.orei...@linnovations.co.uk wrote: Hi, Is it possible to recover/replace cib.xml from one of the raw files in /var/lib/pacemaker

Re: [Pacemaker] Network outage debugging

2013-11-12 Thread Andrew Beekhof
On 13 Nov 2013, at 6:10 am, Sean Lutner s...@rentul.net wrote: The folks testing the cluster I've been building have run a script which blocks all traffic except SSH on one node of the cluster for 15 seconds to mimic a network failure. During this time, the network being down seems to

Re: [Pacemaker] Follow up: Colocation constraint to External Managed Resource (cluster-recheck-interval=5m ignored after 1.1.10 update?)

2013-11-12 Thread Andrew Beekhof
On 13 Nov 2013, at 12:06 am, Robert H. pacema...@elconas.de wrote: Hello, for PaceMaker 1.1.8 (CentOS Version) the thread http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg18048.html was solved with adding cluster-recheck-interval=5m, causing the LRM It's the policy engine

Re: [Pacemaker] why pacemaker does not control the resources

2013-11-12 Thread Andrew Beekhof
On 12 Nov 2013, at 4:42 pm, Andrey Groshev gre...@yandex.ru wrote: 11.11.2013, 03:44, Andrew Beekhof and...@beekhof.net: On 8 Nov 2013, at 7:49 am, Andrey Groshev gre...@yandex.ru wrote: Hi, PPL! I need help. I do not understand... Why has stopped working. This configuration work

Re: [Pacemaker] The larger cluster is tested.

2013-11-12 Thread Andrew Beekhof
to handle the synchronization message? Regards, Yusuke 2013/11/12 Andrew Beekhof and...@beekhof.net: On 11 Nov 2013, at 11:48 pm, yusuke iida yusk.i...@gmail.com wrote: Execution of the graph was also checked. Since the number of pending(s) is restricted to 16 from the middle
