Re: [Pacemaker] Loss of ocf:pacemaker:ping target forces resources to restart?

2013-05-16 Thread emmanuel segura
Hello, how do you configure your cluster network? Are you using a private network for the cluster and a public one for the services? 2013/5/15 Andrew Widdersheim awiddersh...@hotmail.com: Sorry to bring up old issues, but I am having the exact same problem as the original poster. A simultaneous

[Pacemaker] pcs/crmsh Cheat sheet

2013-05-16 Thread Andrew Beekhof
By popular request, I've taken a stab at a cheat-sheet for those switching between pcs and crmsh. https://github.com/ClusterLabs/pacemaker/blob/master/doc/pcs-crmsh-quick-ref.md Any and all assistance expanding it and ensuring it is accurate will be gratefully received. -- Andrew

Re: [Pacemaker] [Question and Problem] In vSphere5.1 environment, IO blocking of pengine occurs at the time of shared disk trouble for a long time.

2013-05-16 Thread Andrew Beekhof
On 16/05/2013, at 3:49 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: 16.05.2013 02:46, Andrew Beekhof wrote: On 15/05/2013, at 6:44 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: 15.05.2013 11:18, Andrew Beekhof wrote: On 15/05/2013, at 5:31 PM, Vladislav Bogdanov

Re: [Pacemaker] Pacemaker Digest, Vol 66, Issue 58

2013-05-16 Thread Wolfgang Routschka
Hi Andreas, thanks for your answer. crm_simulate -s -L (node2 is offline, r_postfix is running on node1): native_color: r_haproxy allocation score on node1: -INFINITY native_color: r_haproxy allocation score on node2: -INFINITY crm_simulate -s -L (both nodes are online, r_postfix is running

Re: [Pacemaker] Stonith: How to avoid deathmatch cluster partitioning

2013-05-16 Thread Lars Marowsky-Bree
On 2013-05-15T22:55:43, Andreas Kurz andr...@hastexo.com wrote: start-delay is an option of the monitor operation ... in fact it means don't trust that start was successful, wait for the initial monitor some more time. It can be used on start here though, to avoid exactly this situation; and it

Re: [Pacemaker] Stonith: How to avoid deathmatch cluster partitioning

2013-05-16 Thread Klaus Darilion
Hi Andreas! On 15.05.2013 22:55, Andreas Kurz wrote: On 2013-05-15 15:34, Klaus Darilion wrote: On 15.05.2013 14:51, Digimer wrote: On 05/15/2013 08:37 AM, Klaus Darilion wrote: primitive st-pace1 stonith:external/xen0 \ params hostlist=pace1 dom0=xentest1 \ op start

Re: [Pacemaker] error with cib synchronisation on disk

2013-05-16 Thread Халезов Иван
On 16.05.2013 07:14, Andrew Beekhof wrote: On 15/05/2013, at 9:53 PM, Халезов Иван i.khale...@rts.ru wrote: Hello everyone! Some problems occurred with synchronising the CIB configuration to disk. I have these errors in pacemaker's logfile: What were the messages before this? Did it happen once

Re: [Pacemaker] pacemaker colocation after one node is down

2013-05-16 Thread Wolfgang Routschka
Hi Andreas, thank you for your answer. The solution is one colocation with a negative score: colocation cl_g_ip-address_not_on_r_postfix -1: g_ip-address r_postfix Greetings Wolfgang On 2013-05-15 21:30, Wolfgang Routschka wrote: Hi everybody, one question today about a colocation rule on a 2-node

[Pacemaker] stonith-ng: error: remote_op_done: Operation reboot of node2 by node1 for stonith_admin: Timer expired

2013-05-16 Thread Brian J. Murrell
Using Pacemaker 1.1.8 on EL6.4 with the pacemaker plugin, I'm finding strange behavior with stonith_admin -B node2. It seems to shut the node down but not start it back up, and ends up reporting a timer expired: # stonith_admin -B node2 Command failed: Timer expired The pacemaker log for the

Re: [Pacemaker] Loss of ocf:pacemaker:ping target forces resources to restart?

2013-05-16 Thread Andrew Widdersheim
The cluster has 3 connections total. The first connection is the outside interface where services can communicate and is also used for cluster communication using mcast. The second interface is a cross-over that is solely for cluster communication. The third connection is another cross-over

Re: [Pacemaker] pacemaker colocation after one node is down

2013-05-16 Thread Andreas Kurz
On 2013-05-16 13:42, Wolfgang Routschka wrote: Hi Andreas, thank you for your answer. The solution is one colocation with a negative score. Ah, yes, only _one_ of them with a non-negative value is needed. Scores of all constraints are added up. Regards, Andreas colocation
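Andreas's point that the scores of all constraints are added up can be made concrete. The following is a minimal Python sketch, assuming the INFINITY arithmetic described in the Pacemaker Explained manual (INFINITY is 1,000,000; adding INFINITY gives INFINITY, adding -INFINITY gives -INFINITY, and INFINITY plus -INFINITY is -INFINITY). The function names are hypothetical and this is not Pacemaker's own code.

```python
# Hypothetical sketch of Pacemaker-style score addition; assumes the
# INFINITY rules from "Pacemaker Explained", not Pacemaker's implementation.
INFINITY = 1_000_000


def add_scores(a: int, b: int) -> int:
    """Add two constraint scores using the documented INFINITY rules."""
    # -INFINITY dominates everything, including +INFINITY.
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    # +INFINITY absorbs any finite value.
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    # Finite sums are clamped to the range [-INFINITY, INFINITY].
    return max(-INFINITY, min(INFINITY, a + b))


def total_score(scores: list[int]) -> int:
    """Combine all constraint scores that apply to one resource on one node."""
    total = 0
    for score in scores:
        total = add_scores(total, score)
    return total


if __name__ == "__main__":
    print(total_score([100, -1]))             # finite scores simply add
    print(total_score([INFINITY, -INFINITY]))  # -INFINITY wins
```

For example, a finite -1 colocation combined with a +100 preference yields 99, while a single -INFINITY contribution outweighs any number of positive scores, including INFINITY.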

Re: [Pacemaker] crm subshell 1.2.4 incompatible to pacemaker 1.1.9?

2013-05-16 Thread Rainer Brestan
The bug is in the function is_normal_node. This function checks the attribute type for state normal. But this attribute is not used any more. CIB output from Pacemaker 1.1.8 nodes node id=int2node1 uname=int2node1 instance_attributes id=nodes-int2node1 nvpair id=nodes-int2node1-standby

[Pacemaker] having problem with crm cib shadow

2013-05-16 Thread George Gibat
crm(live)cib# use gfs2 ERROR: gfs2: no such shadow CIB crm(live)cib# new gfs2 A shadow instance 'gfs2' already exists. To prevent accidental destruction of the cluster, the --force flag is required in order to proceed. crm(live)cib# list crm(live)cib# use gfs2 ERROR: gfs2: no such shadow CIB

Re: [Pacemaker] Loss of ocf:pacemaker:ping target forces resources to restart?

2013-05-16 Thread Andrew Martin
Andrew, I'd recommend adding more than one host to your p_ping resource and seeing if that improves the situation. When I had this problem, I observed better behavior after adding more than one IP to the list of hosts and changing the p_ping location constraint to be as follows: location

Re: [Pacemaker] Loss of ocf:pacemaker:ping target forces resources to restart?

2013-05-16 Thread Andrew Widdersheim
Thanks for the help. Adding another node to the ping host_list may help in some situations, but the root issue doesn't really get solved. Also, the location constraint you posted is very different from mine. Your constraint requires connectivity, whereas the one I am trying to use looks for

Re: [Pacemaker] having problem with crm cib shadow

2013-05-16 Thread John McCabe
Which Linux distribution and version of pacemaker are you using? /John On Thursday, 16 May 2013, George Gibat wrote: crm(live)cib# use gfs2 ERROR: gfs2: no such shadow CIB crm(live)cib# new gfs2 A shadow instance 'gfs2' already exists. To prevent accidental destruction of the cluster, the

Re: [Pacemaker] having problem with crm cib shadow

2013-05-16 Thread George G. Gibat
CentOS 6.4, pacemaker 1.1.8-7.el6 On 2013-05-16 18:57, John McCabe wrote: Which Linux distribution and version of pacemaker are you using? /John On Thursday, 16 May 2013, George Gibat wrote: crm(live)cib# use gfs2 ERROR: gfs2: no such shadow

[Pacemaker] question about interface failover

2013-05-16 Thread christopher barry
Greetings, I've set up a new 2-node MySQL cluster using * drbd 8.3.1.3 * corosync 1.4.2 * pacemaker 1.1.7 on Debian Wheezy nodes. Failover seems to be working fine for everything except the IPs manually configured on the interfaces. See config here:

Re: [Pacemaker] having problem with crm cib shadow

2013-05-16 Thread John McCabe
Worth trying crm_shadow as described here - http://www.gossamer-threads.com/lists/linuxha/pacemaker/84969 I had the same problem and took it as a sign that I should just move to pcs (from the RHEL repo, not the latest source), which went pretty smoothly, only had a few problems with assigning

[Pacemaker] pacemaker-remote tls handshaking

2013-05-16 Thread Lindsay Todd
I've built pacemaker 1.1.10rc2 and am trying to get the pacemaker-remote features working on my Scientific Linux 6.4 system. It almost works... The /etc/pacemaker/authkey file is on all the cluster nodes, as well as my test VM (readable to all users, and checksums are the same everywhere). I

[Pacemaker] mysql ocf resource agent - resource stays unmanaged if binary unavailable

2013-05-16 Thread Vladimir
Hi, our Pacemaker setup provides a MySQL resource using the OCF resource agent. Today my colleagues and I tested forcing the mysql resource to fail. I don't understand the following behaviour. When I remove the mysqld_safe binary (whose path is specified in the crm config) from one server and move the mysql

Re: [Pacemaker] pacemaker-remote tls handshaking

2013-05-16 Thread David Vossel
- Original Message - From: Lindsay Todd rltodd@gmail.com To: The Pacemaker cluster resource manager Pacemaker@oss.clusterlabs.org Sent: Thursday, May 16, 2013 3:44:09 PM Subject: [Pacemaker] pacemaker-remote tls handshaking I've built pacemaker 1.1.10rc2 and am trying to get the

Re: [Pacemaker] [Question and Problem] In vSphere5.1 environment, IO blocking of pengine occurs at the time of shared disk trouble for a long time.

2013-05-16 Thread renayama19661014
Hi Andrew, Hi Vladislav, I will test whether this fix is effective for this problem. * https://github.com/beekhof/pacemaker/commit/eb6264bf2db395779e65dadf1c626e050a388c59 Best Regards, Hideo Yamauchi. --- On Thu, 2013/5/16, Andrew Beekhof and...@beekhof.net wrote: On 16/05/2013, at 3:49

Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7

2013-05-16 Thread Andrew Widdersheim
Just tried the patch you gave and it worked fine. Any plans on putting this patch in officially, or was this a one-off? Aside from this patch, I guess the only thing needed to get things to work is to install things slightly differently and add a symlink from cluster-glue's lrmd to pacemaker's.

Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7

2013-05-16 Thread Andrew Beekhof
On 17/05/2013, at 11:38 AM, Andrew Widdersheim awiddersh...@hotmail.com wrote: Just tried the patch you gave and it worked fine. Any plans on putting this patch in officially or was this a one off? It will be in 1.1.10-rc3 soon. Aside from this patch I guess the only thing to get things to

Re: [Pacemaker] pacemaker-1.1.10 results in Failed to sign on to the LRM 7

2013-05-16 Thread Andrew Widdersheim
I'm attaching 3 patches I made fairly quickly to fix the installation issues and also an issue I noticed with the ping ocf from the latest pacemaker. One is for cluster-glue to prevent lrmd from building and later installing. May also want to modify this patch to take lrmd out of both spec

Re: [Pacemaker] [Question and Problem] In vSphere5.1 environment, IO blocking of pengine occurs at the time of shared disk trouble for a long time.

2013-05-16 Thread Andrew Beekhof
On 17/05/2013, at 10:27 AM, renayama19661...@ybb.ne.jp wrote: Hi Andrew, Hi Vladislav, I will test whether this fix is effective for this problem. * https://github.com/beekhof/pacemaker/commit/eb6264bf2db395779e65dadf1c626e050a388c59 Doubtful, it just reduces code duplication. But it