Re: [Linux-HA] Antw: Question around resources constraints (pacemaker on RHE7.1)

2015-09-13 Thread Andrew Beekhof
> Thanks a lot > Alain > > > From: linux-ha-boun...@lists.linux-ha.org > [linux-ha-boun...@lists.linux-ha.org] on behalf of Andrew Beekhof > [and...@beekhof.net] > Sent: Friday, 28 August 2015 04:31 > To: Please su

Re: [Linux-HA] Antw: Question around resources constraints (pacemaker on RHE7.1)

2015-08-27 Thread Andrew Beekhof
On 25 Aug 2015, at 6:18 pm, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: MOULLE, ALAIN alain.mou...@atos.net wrote on 21.08.2015 at 15:27 in message df84cff8a85ab546b2d53fff12267727022...@frauvj99ex5msx.ww931.my-it-solutions.net : Hi I can't find a way to configure

Re: [Linux-HA] Ping nodes and cluster rules

2015-04-26 Thread Andrew Beekhof
On 17 Apr 2015, at 10:58 pm, Adam Błaszczykowski adam.blaszczykow...@gmail.com wrote: Hello, I am using two nodes in cluster with corosync 2.3.4 and pacemaker 1.1.12 and each node has access to 2 ping nodes. I would like to know if it is possible to set following cluster rules: Rule 1

Re: [Linux-HA] Q: Resource migration (Xen live migration)

2015-03-29 Thread Andrew Beekhof
On 13 Feb 2015, at 8:38 pm, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hello! I have some questions on pacemaker's resource migration. We have a Xen host that has some problems (still to be investigated) that causes some VM disk not to be ready for use. When trying to

Re: [Linux-HA] Announcing the Heartbeat 3.0.6 Release

2015-02-19 Thread Andrew Beekhof
On 11 Feb 2015, at 8:24 am, Lars Ellenberg lars.ellenb...@linbit.com wrote: TL;DR: If you intend to set up a new High Availability cluster using the Pacemaker cluster manager, you typically should not care for Heartbeat, but use recent releases (2.3.x) of Corosync. If you

Re: [Linux-ha-dev] [Linux-HA] Announcing the Heartbeat 3.0.6 Release

2015-02-19 Thread Andrew Beekhof
On 11 Feb 2015, at 8:24 am, Lars Ellenberg lars.ellenb...@linbit.com wrote: TL;DR: If you intend to set up a new High Availability cluster using the Pacemaker cluster manager, you typically should not care for Heartbeat, but use recent releases (2.3.x) of Corosync. If you

Re: [Linux-ha-dev] quorum status

2015-01-29 Thread Andrew Beekhof
Andrew Beekhof [mailto:and...@beekhof.net] Sent: Thursday, January 29, 2015 1:51 PM To: Yan, Xiaoping (NSN - CN/Hangzhou) Cc: Linux-HA-Dev@lists.linux-ha.org Subject: Re: quorum status Did you shut down pacemaker or corosync or both? On 29 Jan 2015, at 4:18 pm, Yan, Xiaoping (NSN - CN

Re: [Linux-ha-dev] quorum status

2015-01-28 Thread Andrew Beekhof
Did you shut down pacemaker or corosync or both? On 29 Jan 2015, at 4:18 pm, Yan, Xiaoping (NSN - CN/Hangzhou) xiaoping@nsn.com wrote: Hi, Any suggestion please? Br, Rip _ From: Yan, Xiaoping (NSN - CN/Hangzhou) Sent: Wednesday,

Re: [Linux-HA] SLES11 SP3: warning: crm_find_peer: Node 'h01' and 'h01' share the same cluster nodeid: 739512321

2015-01-20 Thread Andrew Beekhof
On 21 Jan 2015, at 3:38 am, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! When a SLES11SP3 node joined a 3-node cluster after reboot (and preceding update), a node with up-to-date software showed these messages (I feel these should not appear): Jan 20 17:12:38 h10

Re: [Linux-HA] Support for DRDB

2015-01-19 Thread Andrew Beekhof
On 18 Jan 2015, at 3:45 am, Lars Marowsky-Bree l...@suse.com wrote: On 2015-01-16T16:25:15, EXTERNAL Konold Martin (erfrakon, RtP2/TEF72) external.martin.kon...@de.bosch.com wrote: I am glad to hear that SLE HA has no plans to drop support for DRBD. Unfortunately I currently cannot

Re: [Linux-HA] Support for DRDB

2015-01-19 Thread Andrew Beekhof
On 17 Jan 2015, at 4:19 am, Digimer li...@alteeve.ca wrote: On 16/01/15 10:43 AM, Dmitri Maziuk wrote: On 1/16/2015 8:39 AM, Lars Marowsky-Bree wrote: On 2015-01-16T11:56:04, EXTERNAL Konold Martin (erfrakon, RtP2/TEF72) external.martin.kon...@de.bosch.com wrote: I have been told that

Re: [Linux-HA] Monitor a Pacemaker Cluster with ocf:pacemaker:ClusterMon and/or external-agent

2015-01-07 Thread Andrew Beekhof
On 26 Nov 2014, at 4:15 pm, Ranjan Gajare ranjangajar...@gmail.com wrote: I have CentOs 9.3 9.3? That would be impressive. Can you tell me the version of pacemaker you have installed? I suspect you're lacking this patch: https://github.com/beekhof/pacemaker/commit/3df6aff installed and

Re: [Linux-HA] Antw: Re: Q: Avoid resource restart after configuration change

2014-12-03 Thread Andrew Beekhof
On 3 Dec 2014, at 6:55 pm, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Andrew Beekhof and...@beekhof.net wrote on 03.12.2014 at 06:45 in message 6da1e43b-b83a-4441-9fb0-88bfe409d...@beekhof.net: On 1 Dec 2014, at 7:46 pm, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de

Re: [Linux-HA] Q: Avoid resource restart after configuration change

2014-12-02 Thread Andrew Beekhof
On 1 Dec 2014, at 7:46 pm, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! I'd like to change a resource's configuration, but don't want a restart of the resource. That is, I want the configuration to be effective the next time the resource is started. I know in general

Re: [Linux-HA] [Cluster-devel] [ha-wg] [RFC] Organizing HA Summit 2015

2014-11-25 Thread Andrew Beekhof
On 25 Nov 2014, at 8:54 pm, Lars Marowsky-Bree l...@suse.com wrote: On 2014-11-24T16:16:05, Fabio M. Di Nitto fdini...@redhat.com wrote: Yeah, well, devconf.cz is not such an interesting event for those who do not wear the fedora ;-) That would be the perfect opportunity for you to

Re: [Linux-HA] [ha-wg-technical] [Cluster-devel] [ha-wg] [RFC] Organizing HA Summit 2015

2014-11-25 Thread Andrew Beekhof
On 26 Nov 2014, at 10:06 am, Digimer li...@alteeve.ca wrote: On 25/11/14 04:31 PM, Andrew Beekhof wrote: Yeah, but you're already bringing him for your personal conference. That's a bit different. ;-) OK, let's switch tracks a bit. What *topics* do we actually have? Can we fill two days

Re: [Linux-HA] [ha-wg-technical] [ha-wg] [RFC] Organizing HA Summit 2015

2014-11-25 Thread Andrew Beekhof
On 26 Nov 2014, at 4:51 pm, Fabio M. Di Nitto fabbi...@fabbione.net wrote: On 11/25/2014 10:54 AM, Lars Marowsky-Bree wrote: On 2014-11-24T16:16:05, Fabio M. Di Nitto fdini...@redhat.com wrote: Yeah, well, devconf.cz is not such an interesting event for those who do not wear the

Re: [Linux-HA] time_longclock illumos

2014-11-16 Thread Andrew Beekhof
On 17 Nov 2014, at 5:17 am, Randy S sim@live.nl wrote: Hi all, new user here. We have been testing an older version of the heartbeat / pacemaker combination compiled for illumos (an opensolaris follow-up). Versions: Heartbeat-3-0-STABLE-3.0.5 Pacemaker-1-0-Pacemaker-1.0.11 The

Re: [Linux-HA] New user can't get cman to recognize other systems

2014-10-21 Thread Andrew Beekhof
On 22 Oct 2014, at 7:36 am, jayknowsu...@gmail.com wrote: Yep, my network engineer and I found that the multicast packets were being blocked by the underlying hypervisor for the VM systems. Yeah, that'll happen :-( I believe it's fixed in newer kernels, but for a while there multicast would

Re: [Linux-HA] New user can't get cman to recognize other systems

2014-10-21 Thread Andrew Beekhof
://access.redhat.com/solutions/784373 On 21/10/14 06:14 PM, jayknowsu...@gmail.com wrote: Sure! But i can't seem to get Redhat to let me see the bug, even though I have an account. Sent from my iPad On Oct 21, 2014, at 5:51 PM, Andrew Beekhof and...@beekhof.net wrote: On 22 Oct 2014, at 7

Re: [Linux-HA] Configuring corosync on a CentOS 6.5

2014-10-21 Thread Andrew Beekhof
On 22 Oct 2014, at 2:15 am, John Scalia jayknowsu...@gmail.com wrote: Hi all, again, My network engineer and I have found that the VM's hypervisor was set up to block multicast broadcasts by our security team. Blocked or lost? These links might be worth a look:

Re: [Linux-HA] New user can't get cman to recognize other systems

2014-10-20 Thread Andrew Beekhof
On 21 Oct 2014, at 7:17 am, John Scalia jayknowsu...@gmail.com wrote: Thanks, but on CentOS are you saying to use pcs cluster start rather than using service cman start and service pacemaker start? I was just going by the tutorial, which doesn't mention this. 'service pacemaker start'

Re: [Linux-HA] Two node Pacemaker with one Corosync only quorum node

2014-10-06 Thread Andrew Beekhof
On 3 Sep 2014, at 9:29 am, Brian Campbell brian.campb...@editshare.com wrote: I'm wondering if there are any problems that would occur if you ran a cluster with only two nodes running Pacemaker, but add a third Corosync only node to provide quorum. I tried this setup, and it appears to

Re: [Linux-HA] Antw: Re: Q: dampening explained?

2014-09-10 Thread Andrew Beekhof
On 10 Sep 2014, at 6:43 pm, Lars Ellenberg lars.ellenb...@linbit.com wrote: On Wed, Sep 10, 2014 at 03:00:15PM +1000, Andrew Beekhof wrote: What are the precise semantics of dampening (attrd_updater -d)? Basic idea of attrd_updater -d delay is in fact wait for the dust to settle

Re: [Linux-HA] Antw: Re: Q: dampening explained?

2014-09-09 Thread Andrew Beekhof
On 9 Sep 2014, at 4:11 pm, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Andrew Beekhof and...@beekhof.net wrote on 09.09.2014 at 00:25 in message c19365b9-b626-471e-a92a-001950d01...@beekhof.net: On 8 Sep 2014, at 5:19 pm, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de

Re: [Linux-HA] Antw: Re: Q: dampening explained?

2014-09-09 Thread Andrew Beekhof
On 9 Sep 2014, at 11:12 pm, Lars Ellenberg lars.ellenb...@linbit.com wrote: On Tue, Sep 09, 2014 at 05:10:33PM +1000, Andrew Beekhof wrote: On 9 Sep 2014, at 4:11 pm, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Andrew Beekhof and...@beekhof.net wrote on 09.09.2014 at 00:25

Re: [Linux-HA] Q: dampening explained?

2014-09-08 Thread Andrew Beekhof
On 8 Sep 2014, at 5:19 pm, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! I remember having asked this before, but I'm still missing a good explanation: What are the precise semantics of dampening (attrd_updater -d)? The manual page just says: -d, --delay=value

Re: [Linux-HA] unable to recover from split-brain in a two-node cluster

2014-06-24 Thread Andrew Beekhof
On 25 Jun 2014, at 12:03 am, Lars Ellenberg lars.ellenb...@linbit.com wrote: On Tue, Jun 24, 2014 at 12:23:30PM +1000, Andrew Beekhof wrote: On 24 Jun 2014, at 1:52 am, f...@vmware.com wrote: Hi, I understand that initially the split-brain is caused by heartbeat messaging layer

Re: [Linux-HA] unable to recover from split-brain in a two-node cluster

2014-06-23 Thread Andrew Beekhof
, but IMHO it surpassed heartbeat for reliability 3-4 years ago. Thanks, -Kaiwei - Original Message - From: Andrew Beekhof and...@beekhof.net To: General Linux-HA mailing list linux-ha@lists.linux-ha.org Sent: Sunday, June 22, 2014 3:45:00 PM Subject: Re: [Linux-HA] unable to recover

Re: [Linux-HA] unable to recover from split-brain in a two-node cluster

2014-06-22 Thread Andrew Beekhof
On 21 Jun 2014, at 5:18 am, f...@vmware.com wrote: Hi, New to this list and hope I can get some help here. I'm using pacemaker 1.0.10 and heartbeat 3.0.5 for a two-node cluster. I'm having split-brain problem when heartbeat messages sometimes get dropped when system is under high

Re: [Linux-HA] How to restart cluster ?

2014-06-09 Thread Andrew Beekhof
On 9 Jun 2014, at 7:58 pm, jarek ja...@poczta.srv.pl wrote: Hello! Thank you for the answer, but this answer didn't solve my problem. I have simple two-node cluster with virtual ip address and Postgres with streaming replication, created with this tutorial:

Re: [Linux-HA] getting proper sources

2014-06-01 Thread Andrew Beekhof
On 1 Jun 2014, at 2:15 am, Dmitri Maziuk dmaz...@bmrb.wisc.edu wrote: On 5/30/2014 6:20 PM, Andrew Beekhof wrote: Is there a reason you keep spouting nonsense? Yes: I have a memory and it remembers. For example, this: http://www.gossamer-threads.com/lists/linuxha/users/81573?do

Re: [Linux-HA] getting proper sources

2014-05-30 Thread Andrew Beekhof
On 31 May 2014, at 12:56 am, Dmitri Maziuk dmaz...@bmrb.wisc.edu wrote: On 5/29/2014 4:20 PM, Digimer wrote: On 29/05/14 01:43 PM, Dimitri Maziuk wrote: Support for free is 50% chance Lars will ask you if you're a paying Suse customer. Jay was asking about RHEL, but even then, I've seen

Re: [Linux-HA] getting proper sources

2014-05-29 Thread Andrew Beekhof
On 30 May 2014, at 3:01 am, Jay G. Scott g...@arlut.utexas.edu wrote: On Wed, May 28, 2014 at 07:42:58PM -0400, Digimer wrote: On 28/05/14 05:05 PM, Jay G. Scott wrote: Greetings, I'm a noob. If this isn't the right place to ask this, let me know. I took general configuration

Re: [Linux-HA] getting proper sources

2014-05-29 Thread Andrew Beekhof
On 30 May 2014, at 7:20 am, Digimer li...@alteeve.ca wrote: On 29/05/14 01:43 PM, Dimitri Maziuk wrote: On 05/29/2014 12:01 PM, Jay G. Scott wrote: what's the answer for ... Centos, I guess...? And it does embarrass me to have to ask that. Pacemaker/corosync -- 2+-node clusters,

Re: [Linux-HA] A question of pgsql resource

2014-05-19 Thread Andrew Beekhof
On 19 May 2014, at 8:12 pm, Naoya Anzai anzai-na...@mxu.nes.nec.co.jp wrote: Hi Matsuo-san Thank you for your response. But I would like you to keep current stopping process because I think it's safer to use STONITH. Could you implement it adding new parameter if you implement?

Re: [Linux-HA] OSX and HA

2014-05-08 Thread Andrew Beekhof
See if http://clusterlabs.org/wiki/SourceInstall#Darwin.2FMacOS_X helps. On 9 May 2014, at 5:38 am, Andrew Marks andrew.mar...@icloud.com wrote: I am looking for information on compiling this HA to be used with OSX Server (10.8/10.9) I am looking to do a simple active/passive cluster for

Re: [Linux-HA] How I can create unordered group of resources

2014-05-05 Thread Andrew Beekhof
On 5 May 2014, at 10:06 pm, Fabian Herschel fabian.hersc...@arcor.de wrote: On 05/05/2014 02:36 AM, Andrew Beekhof wrote: On 4 May 2014, at 4:22 pm, Fabian Herschel fabian.hersc...@arcor.de wrote: I would create the group with the meta attribute for unordered resources. Meta ordered=false

Re: [Linux-HA] How I can create unordered group of resources

2014-05-04 Thread Andrew Beekhof
On 4 May 2014, at 4:22 pm, Fabian Herschel fabian.hersc...@arcor.de wrote: I would create the group with the meta attribute for unordered resources. Meta ordered=false N. Use a colocation set. Sent from Samsung tablet Original message From:

Re: [Linux-HA] Resource blocked

2014-04-22 Thread Andrew Beekhof
On 23 Apr 2014, at 4:15 am, Tom Parker tpar...@cbnco.com wrote: Good morning I am trying to restart resources on one of my clusters and I am getting the message pengine[13397]: notice: LogActions: Start domtcot1-qa(qaxen1 - blocked) How can I find out why this resource is

Re: [Linux-HA] Info messages in syslog: Are they normal?

2014-04-05 Thread Andrew Beekhof
Your mail client has butchered your message beyond recognition. Pacemaker did send some INFO logs to syslog in the past, so yes it is normal. On 5 Apr 2014, at 1:55 pm, Anna Hegedus akh...@hotmail.com wrote: Hi Everyone, I have to apologize in advance. I have used other solutions before in

Re: [Linux-HA] How to tell pacemaker to process a new event during a long-running resource operation

2014-03-17 Thread Andrew Beekhof
On 17 Mar 2014, at 6:49 pm, Lars Marowsky-Bree l...@suse.com wrote: On 2014-03-14T15:50:18, David Vossel dvos...@redhat.com wrote: in-flight operations always have to complete before we can process a new transition. The only way we can transition earlier is by killing the in-flight

Re: [Linux-HA] Multiple colocation with same resource group

2014-02-24 Thread Andrew Beekhof
On 25 Feb 2014, at 1:29 am, Tony Stocker tony.stoc...@nasa.gov wrote: On Mon, 24 Feb 2014, Andrew Beekhof wrote: On 22 Feb 2014, at 2:16 am, Greg Woods wo...@ucar.edu wrote: On Fri, 2014-02-21 at 12:37 +, Tony Stocker wrote: colocation inf_ftpd inf: infra_group ftpd

Re: [Linux-HA] Multiple colocation with same resource group

2014-02-23 Thread Andrew Beekhof
On 22 Feb 2014, at 2:16 am, Greg Woods wo...@ucar.edu wrote: On Fri, 2014-02-21 at 12:37 +, Tony Stocker wrote: colocation inf_ftpd inf: infra_group ftpd or do I need to use an 'order' statement instead, i.e.: order ftp_infra mandatory: infra_group:start ftpd I'm

Re: [Linux-HA] pgsql resource agent in status Stopped after crm resource cleanup

2014-02-23 Thread Andrew Beekhof
On 21 Feb 2014, at 10:55 pm, Lukas Grossar lukas.gros...@adfinis-sygroup.ch wrote: Hi I'm currently building a 2 node DRBD backed PostgreSQL on Debian Wheezy and I'm testing how Pacemaker reacts to specific failure scenarios. One thing I did test that currently drives me crazy is when I

Re: [Linux-HA] Antw: Re: Q: crm configure edit/show regex

2014-02-19 Thread Andrew Beekhof
On 20 Feb 2014, at 1:42 am, Lars Marowsky-Bree l...@suse.com wrote: On 2014-02-19T10:31:45, Andrew Beekhof and...@beekhof.net wrote: Unifying this might be difficult, as far as I know pcs doesn't have an interactive mode or anything similar to the configure interface of crmsh.. It does

Re: [Linux-HA] Antw: Re: Q: crm configure edit/show regex

2014-02-18 Thread Andrew Beekhof
On 19 Feb 2014, at 12:13 am, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Kristoffer Grönlund kgronl...@suse.com wrote on 18.02.2014 at 14:07 in message 20140218140726.1b2dfd0f@ultralix: On Tue, 18 Feb 2014 13:48:21 +0100 Lars Marowsky-Bree l...@suse.com wrote: On

Re: [Linux-HA] openstack fencing agent

2014-02-18 Thread Andrew Beekhof
On 19 Feb 2014, at 9:15 am, JR botem...@gmail.com wrote: Greetings, Someone on the linux-ha irc channel suggested that perhaps this agent might be of use to others using openstack. Note: it's based on fence_virsh and was written in about 20 mins; it seems to work for me but YMMV.

Re: [Linux-HA] Antw: Re: Why does o2cb RA remove module ocfs2?

2014-02-17 Thread Andrew Beekhof
On 17 Feb 2014, at 6:39 pm, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Andrew Beekhof and...@beekhof.net wrote on 17.02.2014 at 02:33 in message 7619a7e9-f006-4098-90f9-5c5b8bc84...@beekhof.net: On 11 Feb 2014, at 10:38 pm, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de

Re: [Linux-HA] Antw: Re: Why does o2cb RA remove module ocfs2?

2014-02-16 Thread Andrew Beekhof
On 11 Feb 2014, at 10:38 pm, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Lars Marowsky-Bree l...@suse.com wrote on 05.02.2014 at 15:11 in message 20140205141140.gu13...@suse.de: On 2014-02-05T15:06:47, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: I guess the

Re: [Linux-HA] Antw: Re: crmd (?) becomes unresponsive

2014-02-06 Thread Andrew Beekhof
On 22 Jan 2014, at 10:58 pm, Thomas Schulte tho...@cupracer.de wrote: Hi Lars, I thought that was what I just did? This is likely a pacemaker problem, and more pacemaker experts are subscribed to the clusterlabs list than here. I think we're mostly all on both. Also, the right bugzilla

Re: [Linux-HA] SLE11 SP3: attrd[13911]: error: plugin_dispatch: Receiving message body failed: (2) Library error: Success (0)

2014-01-15 Thread Andrew Beekhof
Looks like corosync is dying underneath pacemaker. On 15 Jan 2014, at 6:49 pm, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! I feel the current clusterstack for SLES11 SP3 has several problems. I'm fighting for a day to get my test cluster up again after having installed the

Re: [Linux-HA] Antw: Re: SLE11 SP3: attrd[13911]: error: plugin_dispatch: Receiving message body failed: (2) Library error: Success (0)

2014-01-15 Thread Andrew Beekhof
On 15 Jan 2014, at 11:13 pm, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Andrew Beekhof and...@beekhof.net wrote on 15.01.2014 at 10:27 in message 59b0ba57-84bd-4ed9-be06-22c41bc21...@beekhof.net: Looks like corosync is dying underneath pacemaker. On 15 Jan 2014, at 6:49

Re: [Linux-HA] Better way to change master in 3 node pgsql cluster

2014-01-13 Thread Andrew Beekhof
On 13 Jan 2014, at 8:32 pm, Andrey Rogovsky a.rogov...@gmail.com wrote: Hi I have a 3 node postgresql cluster. It works well. But I have some trouble with changing the master. For now, if I need to change the master, I must: 1) Stop PGSQL on each node and cluster service 2) Start Setup new manual

Re: [Linux-HA] Antw: Re: does heartbeat 3.0.4 use IP aliases under CentOS 6.5?

2014-01-07 Thread Andrew Beekhof
On 7 Jan 2014, at 6:27 pm, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote on 04.01.2014 at 18:39 in message 52c8473b.8000...@bmrb.wisc.edu: On 1/4/2014 10:39 AM, Lars Marowsky-Bree wrote: On 2014-01-03T20:56:42, Digimer li...@alteeve.ca

Re: [Linux-HA] FW cluster fails at 4am

2014-01-07 Thread Andrew Beekhof
On 28 Dec 2013, at 3:34 pm, Tracy Reed tr...@ultraviolet.org wrote: Hello all, First, thanks in advance for any help anyone may provide. I've been battling this problem off and on for months and it is driving me mad: Once every week or two my cluster fails. For reasons unknown it seems

Re: [Linux-HA] FW cluster fails at 4am

2014-01-07 Thread Andrew Beekhof
On 7 Jan 2014, at 10:52 am, Tracy Reed tr...@ultraviolet.org wrote: On Sat, Dec 28, 2013 at 12:42:28AM PST, Jefferson Ogata spake thusly: Is it possible that it's a coincidence of log rotation after patching? In certain circumstances i've had library replacement or subsequent prelink

Re: [Linux-HA] Attribures was not changed via crm_attribute

2014-01-07 Thread Andrew Beekhof
On 27 Dec 2013, at 3:24 am, Andrey Rogovsky a.rogov...@gmail.com wrote: This command helped me. I returned the pg cluster to a normal state. Thanks a lot. Now I rebooted a node as a test and got this state: Resource Group: master pgsql-master-ip (ocf::heartbeat:IPaddr2): Started

Re: [Linux-HA] Problem not in our membership

2013-12-10 Thread Andrew Beekhof
On 6 Dec 2013, at 6:57 pm, Moullé Alain alain.mou...@bull.net wrote: Hi, I've found a thread talking about this problem on 1.1.7, but at the end , is the patch : https://github.com/ClusterLabs/pacemaker/commit/03f6105592281901cc10550b8ad19af4beb5f72f sufficient and correct to solve the

Re: [Linux-HA] FYI: resource-agents-3.9.2-40.el6.x86_64 kills heartbeat-3.0.4

2013-12-01 Thread Andrew Beekhof
On Wed, Nov 27, 2013, at 06:15 PM, Jefferson Ogata wrote: On 2013-11-28 01:55, Andrew Beekhof wrote: On 28 Nov 2013, at 11:29 am, Jefferson Ogata linux...@antibozo.net wrote: On 2013-11-28 00:12, Dimitri Maziuk wrote: Just so you know: RedHat's (centos, actually) latest build

Re: [Linux-HA] FYI: resource-agents-3.9.2-40.el6.x86_64 kills heartbeat-3.0.4

2013-12-01 Thread Andrew Beekhof
On Thu, Nov 28, 2013, at 10:26 AM, Dimitri Maziuk wrote: On 2013-11-27 20:15, Jefferson Ogata wrote: It's nicer, however, when Red Hat takes a conservative position with the Tech Preview. They could have shipped a minimal set of resource agents in the first place, so people would have a

Re: [Linux-HA] fence_apc always fails after some time and resources remains stopped

2013-12-01 Thread Andrew Beekhof
On Thu, Nov 28, 2013, at 12:11 AM, RaSca wrote: On Fri 22 Nov 2013 10:26:08 CET, RaSca wrote: [...] After this, resources remain in stopped state. Why does this happen? Am I in this case: https://github.com/ClusterLabs/pacemaker/pull/334 ? What kind of workaround can I use?

Re: [Linux-HA] FYI: resource-agents-3.9.2-40.el6.x86_64 kills heartbeat-3.0.4

2013-11-27 Thread Andrew Beekhof
On 28 Nov 2013, at 11:29 am, Jefferson Ogata linux...@antibozo.net wrote: On 2013-11-28 00:12, Dimitri Maziuk wrote: Just so you know: RedHat's (centos, actually) latest build of resource-agents sets $HA_BIN to /usr/libexec/heartbeat. The daemon in heartbeat-3.0.4 RPM is

Re: [Linux-HA] Node affinity across reboots

2013-11-14 Thread Andrew Beekhof
On 15 Nov 2013, at 4:37 am, Michael Jones michael.jo...@quantum.com wrote: Hello, I'm attempting to use pacemaker/corosync (1.1.10/2.31) in a two node active/passive embedded application where all resources should run only on one node. I'm looking for a configuration option that will

Re: [Linux-HA] Node affinity across reboots

2013-11-13 Thread Andrew Beekhof
On 14 Nov 2013, at 9:25 am, Michael Jones michael.jo...@quantum.com wrote: Hello, I'm attempting to use pacemaker/corosync (1.1.10/2.31) in a two node active/passive embedded application where all resources should run only on one node. I'm looking for a configuration option that will

Re: [Linux-HA] iSCSI corruption during interconnect failure with pacemaker+tgt+drbd+protocol C

2013-11-12 Thread Andrew Beekhof
On 13 Nov 2013, at 2:10 pm, Jefferson Ogata linux...@antibozo.net wrote: Here's a problem i don't understand, and i'd like a solution to if possible, or at least i'd like to understand why it's a problem, because i'm clearly not getting something. I have an iSCSI target cluster using

Re: [Linux-HA] Antw: How many primitives, groups can I have

2013-11-11 Thread Andrew Beekhof
On 12 Nov 2013, at 12:01 am, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! I guess there is no direct limit for the number of primitives, but performance may depend on that number: Maybe O(1), maybe O(n), hopefully not O(n^2) or worse. Immediately I suspect the communication

Re: [Linux-HA] Two fencing devices, long timeout - why?

2013-11-10 Thread Andrew Beekhof
I can't really comment until I know which version of pacemaker this is... On 1 Nov 2013, at 2:57 am, Jakob Curdes j...@info-systems.de wrote: Hi , I have a cman-based cluster that uses pcmk-fencing. we have configured an ipmilan fencing device and an apc fencing device with stonith. I set a

Re: [Linux-HA] RH and gfs2 and Pacemaker

2013-10-16 Thread Andrew Beekhof
Thanks Alain On 15/10/2013 22:40, Andrew Beekhof wrote: On 16/10/2013, at 1:35 AM, Moullé Alain alain.mou...@bull.net wrote: OK , I was following this documentation : http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/ch08s02.html the key pieces

Re: [Linux-HA] RH and gfs2 and Pacemaker

2013-10-15 Thread Andrew Beekhof
On 16/10/2013, at 1:35 AM, Moullé Alain alain.mou...@bull.net wrote: OK , I was following this documentation : http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/ch08s02.html the key pieces are the dlm_controld and gfs_controld helpers -- no .pcmk suffix Part of

Re: [Linux-HA] Antw: Re: Max number of resources under Pacemaker ?

2013-10-01 Thread Andrew Beekhof
On 30/09/2013, at 10:49 PM, Moullé Alain alain.mou...@bull.net wrote: Hi, sorry for the delay on this thread, I was unavailable a few weeks, but just FYI, I wanted to share some results I got a few weeks ago: I've tried some tests on a configuration and start/stop of 500 Dummy

Re: [Linux-HA] problems eliminating the use of multicast (fwd)

2013-09-19 Thread Andrew Beekhof
On 19/09/2013, at 8:52 PM, Jakob Curdes j...@info-systems.de wrote: On 19.09.2013 11:49, David Lang wrote: On Thu, 19 Sep 2013, Jakob Curdes wrote: That's the direction we started, but apparently the centos pacemaker/corosync packages don't look at the corosync.conf file, they expect

Re: [Linux-HA] [Pacemaker] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10

2013-09-08 Thread Andrew Beekhof
crm resource demote ms_drbd (will only make drbd Secondary stuff) ... meanwhile, DRBD will establish the connection ... crm resource promote ms_drbd (will then promote one node) Hth, Lars -Original Message- From: Andrew Beekhof [mailto:and...@beekhof.net

Re: [Linux-HA] Max number of resources under Pacemaker ?

2013-09-04 Thread Andrew Beekhof
On 04/09/2013, at 4:09 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: 04.09.2013 07:16, Andrew Beekhof wrote: On 03/09/2013, at 9:20 PM, Moullé Alain alain.mou...@bull.net wrote: Hello, A simple question : is there a maximum number of resources (let's say simple primitives

Re: [Linux-HA] Antw: Re: Max number of resources under Pacemaker ?

2013-09-04 Thread Andrew Beekhof
reason to be found in syslog, and the cluster got quite confused. (I had reported this to my favourite supporter (SR 10851868591), but haven't heard anything since then...) Regards, Ulrich Andrew Beekhof and...@beekhof.net wrote on 04.09.2013 at 06:16 in message 3703-9464-458e-9024

Re: [Linux-HA] Antw: Re: Max number of resources under Pacemaker ?

2013-09-04 Thread Andrew Beekhof
On 04/09/2013, at 5:50 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Andrew Beekhof and...@beekhof.net wrote on 04.09.2013 at 08:27 in message ffc283dc-1f66-4b41-a919-1d66e1bf8...@beekhof.net: On 04/09/2013, at 4:09 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote

Re: [Linux-HA] Max number of resources under Pacemaker ?

2013-09-03 Thread Andrew Beekhof
On 03/09/2013, at 9:20 PM, Moullé Alain alain.mou...@bull.net wrote: Hello, A simple question : is there a maximum number of resources (let's say simple primitives) that Pacemaker can support at first at configuration of resources via crm, and of course after configuration when

Re: [Linux-HA] Antw: Quick 'death match cycle' question.

2013-09-03 Thread Andrew Beekhof
On 03/09/2013, at 4:32 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! I don't have a real answer for this, but I can report other bad experience with 2-node cluster like yours: If the DC is fenced, the other node tries to become DC, but if the other node (who still

Re: [Linux-HA] Antw: Quick 'death match cycle' question.

2013-09-03 Thread Andrew Beekhof
On 04/09/2013, at 3:02 AM, Lars Marowsky-Bree l...@suse.com wrote: On 2013-09-03T10:25:58, Digimer li...@alteeve.ca wrote: I've run only 2-node clusters and I've not seen this problem. That said, I've long-ago moved off of openais in favour of corosync. Given that membership is handled

Re: [Linux-HA] Antw: Quick 'death match cycle' question.

2013-09-03 Thread Andrew Beekhof
On 04/09/2013, at 3:10 AM, Digimer li...@alteeve.ca wrote: On 03/09/13 13:08, Lars Marowsky-Bree wrote: On 2013-09-03T13:04:52, Digimer li...@alteeve.ca wrote: My mistake then. I had assumed that corosync was just a stripped down openais, so I figured openais provided the same functions.

Re: [Linux-HA] error: te_connect_stonith: Sign-in failed: triggered a retry

2013-09-03 Thread Andrew Beekhof
to have gone away. On 08/29/2013 10:16 PM, Andrew Beekhof wrote: On 30/08/2013, at 5:51 AM, Tom Parker tpar...@cbnco.com wrote: Hello Since my upgrade last night I am also seeing this message in the logs on my servers. error: te_connect_stonith: Sign-in failed: triggered a retry

Re: [Linux-HA] error: filter_colocation_constraint: both allocated but to different nodes

2013-09-02 Thread Andrew Beekhof
. Regards, Thomas On 2013-08-28 06:25, Thomas Schulte wrote: Hi Andrew, thank you! The latest file is attached (pe-input-59.bz2). Regards, Thomas On 28.08.2013 at 01:40, Andrew Beekhof and...@beekhof.net wrote: On 27/08/2013, at 8:54 PM, Thomas Schulte tho...@cupracer.de wrote: Hi list

Re: [Linux-HA] Antw: A couple of questions regarding STONITH fencing ...

2013-09-02 Thread Andrew Beekhof
On 29/08/2013, at 4:18 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! After some short thinking I find that using ssh as STONITH is probably the wrong thing to do, because it can never STONITH if the target is down already. Correct (Is it documented that a STONITH

Re: [Linux-HA] Antw: A couple of questions regarding STONITH fencing ...

2013-09-02 Thread Andrew Beekhof
On 29/08/2013, at 5:28 PM, Alex Sudakar alex.suda...@gmail.com wrote: On Thu, Aug 29, 2013 at 4:18 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: After some short thinking I find that using ssh as STONITH is probably the wrong thing to do, because it can never STONITH if the

Re: [Linux-HA] error: te_connect_stonith: Sign-in failed: triggered a retry

2013-08-29 Thread Andrew Beekhof
On 30/08/2013, at 5:51 AM, Tom Parker tpar...@cbnco.com wrote: Hello Since my upgrade last night I am also seeing this message in the logs on my servers. error: te_connect_stonith: Sign-in failed: triggered a retry Old mailing lists seem to imply that this is an issue with heartbeat

Re: [Linux-HA] Pacemaker 1.19 cannot manage more than 127 resources

2013-08-29 Thread Andrew Beekhof
On 30/08/2013, at 5:49 AM, Tom Parker tpar...@cbnco.com wrote: Hello. Last night I updated my SLES 11 servers to HAE-SP3 which contains the following versions of software: cluster-glue-1.0.11-0.15.28 libcorosync4-1.4.5-0.18.15 corosync-1.4.5-0.18.15 pacemaker-mgmt-2.1.2-0.7.40

Re: [Linux-HA] Pacemaker 1.19 cannot manage more than 127 resources

2013-08-29 Thread Andrew Beekhof
for the PCMK_ipc_type. Do you have any suggestions for large clusters? shm is the new upstream default, but it may not have propagated to suse yet. Thanks Tom On 08/29/2013 11:19 PM, Andrew Beekhof wrote: On 30/08/2013, at 5:49 AM, Tom Parker tpar...@cbnco.com wrote: Hello. Las

Re: [Linux-HA] [Pacemaker] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10

2013-08-27 Thread Andrew Beekhof
deletions(-) Needle, meet haystack. Particularly since I have no idea what that drbd error means. If you want me to have a look, you'll need to create a crm_report archive of works and not works. Logs aren't enough. Best regards Andreas Mock -Original Message- From: Andrew

Re: [Linux-HA] error: filter_colocation_constraint: both allocated but to different nodes

2013-08-27 Thread Andrew Beekhof
On 27/08/2013, at 8:54 PM, Thomas Schulte tho...@cupracer.de wrote: Hi list, I'm experiencing a strange problem and I can't figure out what's wrong. I'm running openSUSE 12.3 on a 2-node-cluster with pacemaker-1.1.9-55.2.x86_64. Every 15 minutes the following messages are logged. If

Re: [Linux-HA] cibadmin --query wiping out resource configuration

2013-08-26 Thread Andrew Beekhof
Seems to be fixed in the latest version:
[root@pcmk-1 ~]# cibadmin --query --local --scope=resources --no-children
<resources/>
[root@pcmk-1 ~]# cibadmin --query --local --scope=resources --no-children
<resources/>
[root@pcmk-1 ~]# cibadmin --query --local --scope=resources --no-children
<resources/>

Re: [Linux-HA] Storing arbitrary metadata in the CIB

2013-08-26 Thread Andrew Beekhof
On 23/08/2013, at 5:54 PM, Ferenc Wagner wf...@niif.hu wrote: Andrew Beekhof and...@beekhof.net writes: On 22/08/2013, at 10:08 PM, Ferenc Wagner wf...@niif.hu wrote: Our setup uses some cluster wide pieces of meta information. Think access control lists for resource instances used

Re: [Linux-HA] cibadmin --delete disregarding attributes

2013-08-26 Thread Andrew Beekhof
On 23/08/2013, at 5:32 PM, Ferenc Wagner wf...@niif.hu wrote: Andrew Beekhof and...@beekhof.net writes: On 22/08/2013, at 10:22 PM, Ferenc Wagner wf...@niif.hu wrote: man cibadmin says: the tagname and all attributes must match in order for the element to be deleted, for the element

Re: [Linux-HA] Antw: Re: Q: groups of groups

2013-08-26 Thread Andrew Beekhof
On 23/08/2013, at 5:15 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Andrew Beekhof and...@beekhof.net wrote on 23.08.2013 at 02:14 in message 7e68fa3b-6c15-4e39-ba43-f9a76647f...@beekhof.net: On 22/08/2013, at 7:31 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de

Re: [Linux-HA] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10

2013-08-26 Thread Andrew Beekhof
On 27/08/2013, at 3:31 AM, Andreas Mock andreas.m...@web.de wrote: Hi all, while the linbit drbd resource agent seems to work perfectly on pacemaker 1.1.8 (standard software repository) we have problems with the last release 1.1.10 and also with the newest head 1.1.11.xxx. As using

Re: [Linux-HA] cman-controlled cluster takes an hour to start !?

2013-08-25 Thread Andrew Beekhof
Chrissie: Perhaps you have some insight here? On 23/08/2013, at 8:42 PM, Jakob Curdes j...@info-systems.de wrote: Hmmm, the problem turns out to be DNS-related. At startup, some of the virtual interfaces are inactive and the DNS servers are unreachable. And CMAN seems to do a lookup for all ip

Re: [Linux-HA] Q: groups of groups

2013-08-22 Thread Andrew Beekhof
On 22/08/2013, at 7:31 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! Suppose you have an application A that needs two filesystems F1 and F2. The filesystems are on separate LVM VGs VG1 and VG2 with LVs L1 and L2, respectively. The RAID R1 and R2 provide the LVM PVs.

Re: [Linux-HA] Storing arbitrary metadata in the CIB

2013-08-22 Thread Andrew Beekhof
On 22/08/2013, at 10:08 PM, Ferenc Wagner wf...@niif.hu wrote: Hi, Our setup uses some cluster wide pieces of meta information. Think access control lists for resource instances used by some utilities or some common configuration data used by the resource agents. Currently this info is

Re: [Linux-HA] cibadmin --delete disregarding attributes

2013-08-22 Thread Andrew Beekhof
On 22/08/2013, at 10:22 PM, Ferenc Wagner wf...@niif.hu wrote: Hi, man cibadmin says: the tagname and all attributes must match in order for the element to be deleted, for the element to be deleted --- not the children of the element to be deleted but experience says otherwise: the

Re: [Linux-HA] establishing a new resource-agent package provider

2013-08-13 Thread Andrew Beekhof
On 13/08/2013, at 7:41 PM, Lars Marowsky-Bree l...@suse.com wrote: On 2013-08-07T19:16:24, Lars Ellenberg lars.ellenb...@linbit.com wrote: Hi all, sorry for being a bit late to the game. I was on vacation for 2,5 weeks with no internet-enabled equipment. I can highly recommend the

Re: [Linux-HA] {SPAM 04.2} Re: Many location on ping resources and best practice for connectivity monitoring

2013-08-09 Thread Andrew Beekhof
On 09/08/2013, at 6:42 PM, RaSca ra...@miamammausalinux.org wrote: On Fri 09 Aug 2013 04:42:28 CEST, Andrew Beekhof wrote: [...] That sounds like something playing with the virt bridge when the vm starts. Is the host trying to ping through the bridge too? Yes
