Re: [Linux-ha-dev] PATCH: race in iSCSILogicalUnit

2012-10-23 Thread Florian Haas
On 10/08/2012 01:58 PM, Philipp Marek wrote: Hi Florian, Dejan told me that you're the maintainer for the iSCSI pieces, so I'm sending you this patch. Sorry about the late reply; lately I've been watching the GitHub pull requests more religiously than the list. Please apply, thank you

[Linux-ha-dev] RA developer's guide 1.0.3

2012-07-26 Thread Florian Haas
Hi everyone, Tal Yalon has pointed out an error in the RA developer's guide about the required and unique attributes in RA metadata (they belong on parameter elements, not content as the guide erroneously stated). I've spun and pushed a minor update. Enjoy release 1.0.3.
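As a minimal sketch of the corrected placement, here is how a shell-based agent might print its metadata. The agent name `example`, the `ip` parameter, and the function name `meta_data` are illustrative assumptions and are not taken from the guide; the only point is that `unique` and `required` sit on the `parameter` element, while `content` just describes the value's type.

```shell
#!/bin/sh
# Illustrative sketch only: the agent "example" and its "ip" parameter are made up.
meta_data() {
    cat <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="example" version="0.1">
  <version>1.0</version>
  <longdesc lang="en">Example agent used only to show attribute placement.</longdesc>
  <shortdesc lang="en">Example agent</shortdesc>
  <parameters>
    <!-- unique and required are attributes of the parameter element -->
    <parameter name="ip" unique="1" required="1">
      <longdesc lang="en">Hypothetical IP address parameter.</longdesc>
      <shortdesc lang="en">IP address</shortdesc>
      <!-- content only describes the type of the value -->
      <content type="string"/>
    </parameter>
  </parameters>
  <actions>
    <action name="meta-data" timeout="5"/>
  </actions>
</resource-agent>
EOF
}

meta_data
```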

Re: [Linux-HA] Does globally-unique make sense on filesystems cloned resources?

2012-06-06 Thread Florian Haas
On Wed, Jun 6, 2012 at 4:43 PM, RaSca ra...@miamammausalinux.org wrote: Hi all, I've configured an NFS share which is cloned on each node of my cluster. What I need to understand is how the globally-unique parameter applies to this situation. Starting from its definition: Globally unique

Re: [Linux-HA] Strange Pacemaker issue

2012-06-06 Thread Florian Haas
On Wed, Jun 6, 2012 at 5:58 PM, Yves Trudeau y.trud...@videotron.ca wrote: Hi, I have an issue with a 5-node cluster. 2 nodes report wrong crm status output; it is as if all the other nodes were offline. The interesting aspect though is that on the remaining 3 nodes, they appear

Re: [Linux-HA] Make ClusterLabs-resource-agents Errors

2012-06-05 Thread Florian Haas
On Mon, Jun 4, 2012 at 11:59 AM, xxg xxgora...@gmail.com wrote: Dear Engineers, I was installing ClusterLabs-resource-agents on Red Hat 6.0 64-bit from the source package ClusterLabs-resource-agents-v3.9.3-0-g18cc715.zip. I ran configure using this command:

[Linux-ha-dev] sbd spinoff from cluster-glue

2012-06-01 Thread Florian Haas
Dejan, Lars, is it confirmed from your end that sbd is moving out of cluster-glue? If so, it would be nice if we could get a cluster-glue release with sbd removed, and a release of standalone sbd, so packagers can fix the relevant distro packages up properly. Cheers, Florian

[Linux-ha-dev] [PATCH 0 of 2] Autotoolize build

2012-05-29 Thread Florian Haas
Lars, I did this as an exercise of sorts to understand how this compiles and what its dependencies are. Considering the code base is quite small it may seem like a pointless stunt to jump through all the autofoo hoops, but it makes life that much easier for distro packagers. Feel free to apply

[Linux-ha-dev] [PATCH 1 of 2] build: autotoolize

2012-05-29 Thread Florian Haas
# HG changeset patch # User Florian Haas flor...@hastexo.com # Date 1338230941 -7200 # Branch autotools # Node ID 9888c2e4353b08599e6977e5e61dd6d34ce6151e # Parent c4de704b6cea21c69b3c767d1c47bed727f94d82 build: autotoolize diff -r c4de704b6cea -r 9888c2e4353b COPYING --- /dev/null Thu Jan 01

Re: [Linux-ha-dev] [PATCH 0 of 2] Autotoolize build

2012-05-29 Thread Florian Haas
On Tue, May 29, 2012 at 2:27 PM, Florian Haas flor...@hastexo.com wrote: Lars, I did this as an exercise of sorts to understand how this compiles and what its dependencies are. Considering the code base is quite small it may seem like a pointless stunt to jump through all the autofoo hoops

Re: [Linux-ha-dev] [PATCH 0 of 2] Autotoolize build

2012-05-29 Thread Florian Haas
On Tue, May 29, 2012 at 4:32 PM, Lars Marowsky-Bree l...@suse.com wrote: On 2012-05-29T14:31:20, Florian Haas flor...@hastexo.com wrote: Forgot to mention this in the original cover message, for those who haven't been following the discussion: this is for sbd which is just spinning off from

Re: [Linux-ha-dev] [PATCH 0 of 2] Autotoolize build

2012-05-29 Thread Florian Haas
On Tue, May 29, 2012 at 6:26 PM, Lars Marowsky-Bree l...@suse.com wrote: On 2012-05-29T17:56:59, Florian Haas flor...@hastexo.com wrote: In case you're wondering why I didn't use PKG_CHECK_MODULES for the PE libraries: their pkg-config file is currently broken; Andrew has a pull request

Re: [Linux-ha-dev] [PATCH 0 of 2] Autotoolize build

2012-05-29 Thread Florian Haas
On Tue, May 29, 2012 at 6:38 PM, Lars Marowsky-Bree l...@suse.com wrote: On 2012-05-29T18:34:15, Florian Haas flor...@hastexo.com wrote: Yeah, it seems you just broke the build by including cluster/stack.h and not bothering to add an AC_CHECK_HEADERS to configure.ac. Where does that come from

Re: [Linux-HA] Bug in iSCSILogicalUnit

2012-05-22 Thread Florian Haas
On Mon, May 21, 2012 at 1:26 PM, Vadym Chepkov vchep...@gmail.com wrote: Hi, I am not subscribed to the devel list, so don't think my e-mail will go there. You know, subscribing only takes a minute, but oh well... Yes, I have the same pattern in iscsi target and in the store path tgtadm

Re: [Linux-HA] Bug in iSCSILogicalUnit

2012-05-22 Thread Florian Haas
On Tue, May 22, 2012 at 11:48 AM, Florian Haas flor...@hastexo.com wrote: Your patch looks fine to me for STGT; for IET it should be equally trivial but sadly I don't have the capacity to test that at the moment. Anyone able to give IET with an updated agent a spin? Fixed RA is in the iscsi

Re: [Linux-ha-dev] [Linux-HA] Bug in iSCSILogicalUnit

2012-05-21 Thread Florian Haas
Hi Vadym, moving the discussion to the -dev list, which is the more appropriate forum for this. Please reply to -dev; more comments inline. On Sun, May 20, 2012 at 9:52 PM, Vadym Chepkov vchep...@gmail.com wrote: Hi, The monitor operation of  iSCSILogicalUnit  is not specific enough in the

Re: [Linux-HA] DRBD-8.4

2012-05-21 Thread Florian Haas
On Sun, May 20, 2012 at 2:54 PM, Willi Fehler willi.feh...@t-online.de wrote: Hi all, I would like to upgrade my cluster to DRBD-8.4.1. Currently my cluster is using DRBD-8.3.12 on CentOS-6.2. Why? In other words, what features unique to 8.4.x (and unavailable in 8.3) are you planning to use?

Re: [Linux-HA] Bug in iSCSILogicalUnit

2012-05-21 Thread Florian Haas
Hi Vadym, moving the discussion to the -dev list, which is the more appropriate forum for this. Please reply to -dev; more comments inline. On Sun, May 20, 2012 at 9:52 PM, Vadym Chepkov vchep...@gmail.com wrote: Hi, The monitor operation of  iSCSILogicalUnit  is not specific enough in the

Re: [Linux-ha-dev] [PATCH v2] resource-agents: add Linux proxy arp resource agent

2012-04-04 Thread Florian Haas
On Wed, Apr 4, 2012 at 1:52 AM, Christian Franke nob...@nowhere.ws wrote: Hello Florian, Your question is fully justified - I sincerely apologize for ignoring that comprehensive documentation. I rewrote the patch trying to adhere to the requirements given in the documentation. Wow, that is

Re: [Linux-ha-dev] [PATCH] resource-agents: add GNU/Linux proxy arp resource agent

2012-04-03 Thread Florian Haas
On Tue, Apr 3, 2012 at 1:07 PM, Christian Franke nob...@nowhere.ws wrote: This patch adds an OCF resource agent which maintains proxy arp entries in a GNU/Linux arp table. This is especially useful when a high-availability routing setup is built and it is required to perform proxy arp.

Re: [Linux-HA] Cluster node hanging upon access to ocfs2 fs when second cluster node dies ?

2012-04-03 Thread Florian Haas
On Tue, Apr 3, 2012 at 10:32 AM, Rainer Krienke krie...@uni-koblenz.de wrote: Hello, I am new to HA setup and my first try was to set up a HA cluster (using SLES 11 SP2 and the SLES11 SP2 HA extension)  that simply offers an OCFS2 filesystem. I did the setup according to the SLES 11 SP2 HA

Re: [Linux-HA] Cluster node hanging upon access to ocfs2 fs when second cluster node dies ?

2012-04-03 Thread Florian Haas
On Tue, Apr 3, 2012 at 2:06 PM, Rainer Krienke krie...@uni-koblenz.de wrote: On 03.04.2012 11:44, Lars Marowsky-Bree wrote: property $id=cib-bootstrap-options \ dc-version=1.1.6-b988976485d15cb702c9307df55512d323831a5e \ cluster-infrastructure=openais \

Re: [Linux-HA] R: crm configure primitive syntax, please HELP!!!

2012-03-30 Thread Florian Haas
On Fri, Mar 30, 2012 at 2:34 PM, Guglielmo Abbruzzese g.abbruzz...@resi.it wrote: It doesn't seem to be working. [root@NODE_A resources]# crm configure primitive resource_vrt_ip ocf:heartbeat:IPaddr2  params ip=192.168.15.73 nic=bond0 meta target-role=Stopped multiple-active=stop_start

Re: [Linux-HA] High Performance High Availability Guide: new community documentation project

2012-03-26 Thread Florian Haas
On Mon, Mar 26, 2012 at 1:52 AM, Andrew Beekhof and...@beekhof.net wrote: On Fri, Mar 23, 2012 at 10:39 PM, Florian Haas flor...@hastexo.com wrote: Hi everyone, for those interested in contributing to a community documentation project focusing on performance optimization in high availability

Re: [Linux-HA] High Performance High Availability Guide: new community documentation project

2012-03-26 Thread Florian Haas
On Mon, Mar 26, 2012 at 11:34 AM, Andrew Beekhof and...@beekhof.net wrote: Does your scope supersede that of CfS? I would definitely think so. CfS doesn't much touch upon performance, and I don't see any need to -- it's a good document to get you started, and shouldn't be overloaded with too

Re: [Linux-HA] Antw: Re: order transitivity (was Re: order troubles)

2012-03-23 Thread Florian Haas
Hi Ulrich, On Fri, Mar 23, 2012 at 8:28 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! http://www.hastexo.com/resources/hints-and-kinks/mandatory-and-advisory-ordering-pacemaker I have a comment on the article: If you specify that foo should be started before bar, does that

[Linux-HA] High Performance High Availability Guide: new community documentation project

2012-03-23 Thread Florian Haas
Hi everyone, for those interested in contributing to a community documentation project focusing on performance optimization in high availability clusters, please take a look at the following URLs: https://github.com/fghaas/hp-ha-guide (GitHub repo) http://www.hastexo.com/node/173 (blog post --

Re: [Linux-HA] order troubles

2012-03-22 Thread Florian Haas
On Thu, Mar 22, 2012 at 10:34 AM, Lars Ellenberg lars.ellenb...@linbit.com wrote: order o_nfs_before_vz 0: cl_fs_nfs cl_vz order o_vz_before_ve992 0: cl_vz ve992 a score of 0 is roughly equivalent to if you happen to plan to do both operations in the same transition, would you please

Re: [Linux-HA] Command execution prior to resource start

2012-03-19 Thread Florian Haas
On Sat, Mar 17, 2012 at 1:06 PM, Charles Williams ch...@itadmins.net wrote: Hey all, I have been looking all over for a way to do this and have yet to find anything. I have a symlink resource that links an nfs share to /var/spool/cron

Re: [Linux-HA] How do I clear the Failed actions section?

2012-03-07 Thread Florian Haas
On Wed, Mar 7, 2012 at 5:51 PM, William Seligman selig...@nevis.columbia.edu wrote: Again, a disclaimer: I am not an expert. Your advice was spot on. :) Cheers, Florian

Re: [Linux-HA] Apparent problem in pacemaker ordering

2012-03-05 Thread Florian Haas
On Sat, Mar 3, 2012 at 8:14 PM, Florian Haas flor...@hastexo.com wrote: In other words, interleave=true is actually the reasonable thing to set on all clone instances by default, and I believe the pengine actually does use a default of interleave=true on defined clone sets since some 1.1.x

Re: [Linux-HA] Ping directive failure.

2012-03-05 Thread Florian Haas
On Thu, Mar 1, 2012 at 12:53 PM, Goodwin Anna anna.good...@uk.fujitsu.com wrote: We have an active/passive server pair running Linux HA. The ping directive pings an IP address which is no longer active (pinging the address doesn't return anything, it just hangs) and the

Re: [Linux-HA] Apparent problem in pacemaker ordering

2012-03-05 Thread Florian Haas
On Sat, Mar 3, 2012 at 9:30 PM, William Seligman selig...@nevis.columbia.edu wrote: In other words, interleave=true is actually the reasonable thing to set on all clone instances by default, and I believe the pengine actually does use a default of interleave=true on defined clone sets since

Re: [Linux-HA] Apparent problem in pacemaker ordering

2012-03-03 Thread Florian Haas
On Sat, Mar 3, 2012 at 6:55 PM, William Seligman selig...@nevis.columbia.edu wrote: On 3/3/12 12:03 PM, emmanuel segura wrote: are you sure the exportfs agent can be used with an active/active clone? a) I've been through the script. If there's some problem associated with it being cloned, I

Re: [Linux-HA] Off-site Quorum Provider?

2012-03-02 Thread Florian Haas
Hi Eric! On Fri, Mar 2, 2012 at 10:34 PM, Robinson, Eric eric.robin...@psmnv.com wrote: We have two geographically separate data centers connected by 4 x Gigabit links (in 2 trunks). Our HA clusters are distributed between the data centers, with each node of a 2-node cluster in a separate data

Re: [Linux-HA] Process pause detected

2012-03-02 Thread Florian Haas
On Fri, Mar 2, 2012 at 11:11 PM, Willi Fehler willi.feh...@t-online.de wrote: Hi, I have two IBM x3650 M3 running CentOS-6.2 with Pacemaker, Corosync, OpenAIS, DRBD, MySQL and Redis and I see a lot of messages like Process pause detected in /var/log/messages. Then DRBD stops and the second

Re: [Linux-HA] o2cb Pacemaker Stack glue driver not loaded

2012-03-01 Thread Florian Haas
On Thu, Mar 1, 2012 at 8:47 AM, Stefan Schloesser sschloes...@intermediate.de wrote: Hi Florian, thanks for the link, I added the COROSYNC_DEFAULT_CONFIG_IFACE=openaisserviceenableexperimental:corosync_parser Param to no avail, also tried stonith (though I don't see why this should be

Re: [Linux-HA] o2cb Pacemaker Stack glue driver not loaded

2012-03-01 Thread Florian Haas
On Thu, Mar 1, 2012 at 10:25 AM, Stefan Schloesser sschloes...@intermediate.de wrote: Hi Florian, None, because dual-Primary and OCFS2 are utterly pointless for Apache and MySQL. Apache and MySQL can be easily built on single-Primary DRBD with a regular filesystem (ext3/4, XFS, you name it).

Re: [Linux-HA] o2cb Pacemaker Stack glue driver not loaded

2012-03-01 Thread Florian Haas
On 03/01/12 11:25, Stefan Schloesser wrote: Hi Marcus, I would like load-balancing and use typo3 which writes upon access to the filesystem and db (cache etc.). Still pointless? I do (well, I will be again when I get corosync/pacemaker working again!) something similar using a managed

Re: [Linux-HA] fence_nut fencing agent - use NUT to fence via UPS

2012-03-01 Thread Florian Haas
On Thu, Mar 1, 2012 at 11:37 PM, William Seligman selig...@nevis.columbia.edu wrote: That script doesn't work for stonith-ng. So here's a new agent, written in Perl, and tested under pacemaker-1.1.6 and nut-2.4.3. I know there's a fence_apc_snmp agent that's already in resource-agents.

Re: [Linux-HA] libpthread segfaults

2012-02-29 Thread Florian Haas
On Wed, Feb 29, 2012 at 11:14 AM, Marcus Bointon mar...@synchromedia.co.uk wrote: I've scrapped my old heartbeat config and I'm trying to start from a clean slate with corosync/pacemaker That's excellent! installed on Ubuntu Lucid from the ubuntu-ha PPA

Re: [Linux-HA] libpthread segfaults

2012-02-29 Thread Florian Haas
On Wed, Feb 29, 2012 at 4:19 PM, Marcus Bointon mar...@synchromedia.co.uk wrote: cib appears to be fine on www5. I've never touched anything in /var/lib/heartbeat/crm - this is a completely vanilla config, though it may be that there are remnants of the old heartbeat config (which was only

Re: [Linux-HA] o2cb Pacemaker Stack glue driver not loaded

2012-02-29 Thread Florian Haas
On Wed, Feb 29, 2012 at 4:56 PM, Stefan Schloesser sschloes...@intermediate.de wrote: I get a Stack glue driver not loaded from the o2cb script: info: RA output: (o2cb:0:start:stderr) 2012/02/29_16:18:47 INFO: Stack glue driver not loaded What is a stack glue driver and what remains to

Re: [Linux-HA] libpthread segfaults

2012-02-29 Thread Florian Haas
On Wed, Feb 29, 2012 at 5:14 PM, Marcus Bointon mar...@synchromedia.co.uk wrote: On 29 Feb 2012, at 16:33, Florian Haas wrote: No, there's an easier way to fix that problem. :) You said this was a vanilla config that needn't be preserved, right? Shut down Corosync on both nodes. Kill

Re: [Linux-HA] libpthread segfaults

2012-02-29 Thread Florian Haas
On 02/29/12 18:46, Marcus Bointon wrote: On 29 Feb 2012, at 17:43, Florian Haas wrote: My hunch is that you never properly shut down corosync on that one. Did you check your ps output to see if it was really down? Corosync 1.2.x had some nasty shutdown issues when running with Pacemaker

Re: [Linux-HA] Oracle instance to be managed by an heartbeat based Linux Cluster

2012-02-28 Thread Florian Haas
Michel, On Mon, Feb 27, 2012 at 11:37 PM, Michel Lion ml...@allot.com wrote: Hi, I would need to configure a Linux Cluster for an Oracle instance. I found some very interesting scripts for starting up and shutting down the database to be used inside the heartbeat configuration files

Re: [Linux-ha-dev] some iSCSITarget meta data issues

2012-02-26 Thread Florian Haas
On 02/27/12 07:38, Rasto Levrinc wrote: I am talking about long shortdescs. E.g. <shortdesc lang="en">Specifies the iSCSI target implementation (iet, tgt or lio).</shortdesc> is way too long and it abbreviates to something like "Specifies the iSCSI..." in the GUI. You may not care about this, but

Re: [Linux-ha-dev] [RfC] [Patch] Filesystem

2012-02-21 Thread Florian Haas
On Mon, Feb 20, 2012 at 9:40 PM, Lars Ellenberg lars.ellenb...@linbit.com wrote: What do you say? +1 Florian

Re: [Linux-ha-dev] [PATCH] Medium: Use the resource timeout as an override to the default dbus timeout for upstart RA

2012-02-20 Thread Florian Haas
On Mon, Feb 20, 2012 at 11:57 AM, Andrew Beekhof and...@beekhof.net wrote: It does, but the exit status is always '0', which makes 'service' binary unusable for monitoring the status of the service without parsing the command output. 10 head 20 desk 30 add 40 goto 10 I believe you went

Re: [Linux-HA] pacemaker/corosync - cl_status . REASON: hb_api_signon: Can't initiate connection to heartbeat

2012-02-14 Thread Florian Haas
On 02/14/12 03:09, Andrew Beekhof wrote: On Tue, Feb 14, 2012 at 7:26 AM, Thomas Baumann t...@tiri.li wrote: Hello list, In my current pacemaker/corosync installation in a 2 node cluster I get following error: # cl_status listnodes This is a heartbeat command, you're running corosync

Re: [Linux-HA] question

2012-02-07 Thread Florian Haas
On Fri, Feb 3, 2012 at 12:10 AM, Amel m...@amel.no wrote: Hey hey, I am new to HA Heartbeat, and I am wondering about one thing as I could not find any info about it on Your page. Can we run the HA heartbeat across the WAN links if the servers are on the different subnets ?? In case yes

Re: [Linux-HA] if promote runs into timeout

2012-01-12 Thread Florian Haas
On Tue, Jan 10, 2012 at 4:44 PM, erkan yanar er...@linsenraum.de wrote: On Tue, Jan 10, 2012 at 10:20:02AM +0100, Andreas Kurz wrote: Hello, On 01/06/2012 06:14 PM, erkan yanar wrote: Hi, I'm having the issue that promoting a master can run into the promote timeout. After that,

Re: [Linux-ha-dev] Additional changes made via DHCPD review process

2011-12-09 Thread Florian Haas
On Fri, Dec 9, 2011 at 6:30 AM, Dejan Muhamedagic de...@suse.de wrote: Hi, On Tue, Dec 06, 2011 at 01:39:04PM -0400, Chris Bowlby wrote: Hi All,   Ok, I'll look into csync, and will concede the point on the RA syncing the out of chrooted configuration file. I still need to find a means to

Re: [Linux-HA] Q: OCFS on Dual-Primary DRBD: Wrong medium type

2011-12-09 Thread Florian Haas
On Fri, Dec 9, 2011 at 1:56 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! I have configured a Dual-Primary DRBD as disk for OCFS. Occasionally the Filesystem RA reports a Wrong medium type: Dec  7 16:44:46 h02 lrmd: [10930]: info: RA output:

Re: [Linux-HA] Q: OCFS on Dual-Primary DRBD: Wrong medium type

2011-12-09 Thread Florian Haas
On Fri, Dec 9, 2011 at 2:00 PM, Florian Haas flor...@hastexo.com wrote: What could be the reasons for that? (The problem occurred when the other node of the cluster had killed itself, and the cluster tried to recover from unclean state) Most likely, the Filesystem is trying to mount your

Re: [Linux-HA] Light Weight Quorum Arbitration

2011-12-06 Thread Florian Haas
On Tue, Dec 6, 2011 at 9:50 AM, Lars Marowsky-Bree l...@suse.com wrote: On 2011-12-04T00:57:05, Andreas Kurz andr...@hastexo.com wrote: the concept of an arbitrator for split-site clusters is already implemented and should be available with Pacemaker 1.1.6 though it seems not to be directly

Re: [Linux-ha-dev] pgsql and streaming replcation

2011-12-05 Thread Florian Haas
On Sun, Dec 4, 2011 at 11:11 PM, Serge Dubrouski serge...@gmail.com wrote: Florian, Dejan how would you like to merge a patch when we are ready? The patch will be rather big one and AFAIK you have some policy on the amount of changes for one patch. If it's a big addition of functionality, then

Re: [Linux-ha-dev] [Linux-HA] Generic Python framework for OCF Resource Agents released

2011-12-03 Thread Florian Haas
Hi Volker, welcome. First off I would like to say that what you present is pretty impressive and could turn out to be something extraordinarily helpful. I agree that being able to easily create robust resource agents in Python would be a huge plus. I also agree with the pain points you mentioned

Re: [Linux-HA] Generic Python framework for OCF Resource Agents released

2011-12-03 Thread Florian Haas
Hi Volker, welcome. First off I would like to say that what you present is pretty impressive and could turn out to be something extraordinarily helpful. I agree that being able to easily create robust resource agents in Python would be a huge plus. I also agree with the pain points you mentioned

Re: [Linux-ha-dev] [Linux-HA] new RA: varnish

2011-12-02 Thread Florian Haas
On Wed, Nov 23, 2011 at 10:40 AM, Léon Keijser keij...@stone-it.com wrote: Hi, I've created a new RA to manage Varnish instances. I've forked resource-agents and added it here: https://github.com/lkeijser/resource-agents We haven't seen much review from others here on the list, but Léon has

Re: [Linux-HA] new RA: varnish

2011-12-02 Thread Florian Haas
On Wed, Nov 23, 2011 at 10:40 AM, Léon Keijser keij...@stone-it.com wrote: Hi, I've created a new RA to manage Varnish instances. I've forked resource-agents and added it here: https://github.com/lkeijser/resource-agents We haven't seen much review from others here on the list, but Léon has

Re: [Linux-HA] OCF RA mysql

2011-12-01 Thread Florian Haas
On Wed, Nov 30, 2011 at 3:14 PM, Nick Khamis sym...@gmail.com wrote: Does the latest version of the RAs have all the old heartbeat related material removed? I don't follow. Care to clarify the question? Florian

Re: [Linux-ha-dev] [PATCH 0/2] LVM: Fix activation for clustered VGs

2011-11-30 Thread Florian Haas
On Fri, Nov 25, 2011 at 6:38 PM, Florian Haas flor...@hastexo.com wrote: Lars (both), Dejan, Nils, could you take a quick peek at whether the following (untested) patches look like they're making sense? The more important one is obviously the second one. Nils, could you apply those patches

Re: [Linux-HA] OCF RA mysql

2011-11-30 Thread Florian Haas
On Wed, Nov 30, 2011 at 9:46 AM, alain.mou...@bull.net wrote: Hi, I'm facing a problem with this RA, and I wonder whether it is a real one that has already been identified: the stop of the mysql resource fails and therefore the node is fenced. I've checked a little in the RA script: the stop kills the

Re: [Linux-ha-dev] [PATCH 2/2] Medium: LVM: force dmevent monitoring for clones

2011-11-28 Thread Florian Haas
On Sat, Nov 26, 2011 at 11:58 AM, Lars Marowsky-Bree l...@suse.com wrote: On 2011-11-25T18:38:06, Florian Haas flor...@hastexo.com wrote: Starting a clustered volume with monitoring disabled is not allowed: http://www.redhat.com/archives/lvm-devel/2010-March/msg00289.html Which would

Re: [Linux-HA] Antw: Re: ocf_heartbeat:Xinetd: bad status report

2011-11-28 Thread Florian Haas
On Mon, Nov 28, 2011 at 8:51 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Florian Haas florian.h...@hastexo.com wrote on 25.11.2011 at 14:35 in message 4ecf99ae.8050...@hastexo.com: [...] ... and I just morphed that patch into a git branch. Ulrich, as much as we're all

Re: [Linux-HA] Antw: Re: ocf_heartbeat:Xinetd: bad status report

2011-11-28 Thread Florian Haas
On Mon, Nov 28, 2011 at 2:58 PM, Dejan Muhamedagic deja...@fastmail.fm wrote: Why? It seems typeset is the POSIX thing, while local is a BASH-ism. So what's wrong with local variables? local is almost certainly not a bashism. At least I can recall once changing typeset to local in some RA.
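For readers following the `local` versus `typeset` question, here is a small sketch of what `local` does. It is not taken from any resource agent; `demo_scope`, `value`, and `seen_inside` are hypothetical names. As far as I know, `local` is not specified by POSIX, but bash and dash both accept it, which is why it runs under `/bin/sh` on most Linux systems.

```shell
#!/bin/sh
# Illustrative sketch only: demo_scope is a made-up function,
# not part of any shipped resource agent.
value="outer"

demo_scope() {
    # 'local' limits this variable to the function body, so the
    # assignment below does not touch the global 'value'.
    local value
    value="inner"
    # seen_inside is deliberately global so the caller can inspect it.
    seen_inside="$value"
}

demo_scope
echo "inside: $seen_inside, outside: $value"
```

Without the `local` line, the function would overwrite the global `value`, which is the kind of side effect agents with many helper functions want to avoid.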

Re: [Linux-HA] Antw: Re: Stonith SBD not fencing nodes

2011-11-28 Thread Florian Haas
On Mon, Nov 28, 2011 at 4:35 PM, Hal Martin hal.mar...@gmail.com wrote: Looking at the mercurial repository for pacemaker (http://hg.clusterlabs.org/pacemaker/) I do not see any check-ins since 1.1.6 was tagged two months ago. Pacemaker has since moved to GitHub:

Re: [Linux-HA] failover questions

2011-11-26 Thread Florian Haas
Hello, On Sat, Nov 26, 2011 at 12:54 PM, Willi Fehler willi.feh...@t-online.de wrote: Hi Florian, the SRPM are working on CentOS-6. I just got a simple warning /var/lib/heartbeat/cores no such file or directory if I configure the cib. OK, that's a minor packaging issue that's easily fixed.

[Linux-ha-dev] [PATCH 1/2] Low: LVM: add local convenience variables in LVM_start

2011-11-25 Thread Florian Haas
--- heartbeat/LVM | 11 +++ 1 files changed, 7 insertions(+), 4 deletions(-) diff --git a/heartbeat/LVM b/heartbeat/LVM index 683d4d5..d8ad3ca 100755 --- a/heartbeat/LVM +++ b/heartbeat/LVM @@ -201,6 +201,8 @@ LVM_monitor() { # Enable LVM volume # LVM_start() { + local

[Linux-ha-dev] [PATCH 0/2] LVM: Fix activation for clustered VGs

2011-11-25 Thread Florian Haas
Lars (both), Dejan, Nils, could you take a quick peek at whether the following (untested) patches look like they're making sense? The more important one is obviously the second one. Nils, could you apply those patches on the system where you ran into the issue? If it's more convenient, you can

[Linux-ha-dev] [PATCH 2/2] Medium: LVM: force dmevent monitoring for clones

2011-11-25 Thread Florian Haas
Starting a clustered volume with monitoring disabled is not allowed: http://www.redhat.com/archives/lvm-devel/2010-March/msg00289.html Which would be fine, as activation/monitoring = 1 ships as the default in lvm.conf. However, at least some versions of LVM seem to ignore this, throwing an error

Re: [Linux-HA] Q: unmanaged MD-RAID auto-recovery

2011-11-25 Thread Florian Haas
On 11/25/11 10:20, Ulrich Windl wrote: Hi! I thought that an unmanaged resource would not mess with the resources, but as it seems, the RAID1 monitor does auto-recovery even in unmanaged mode. That's highly unlikely. Do you have any logs to back up that claim? If so, please pastebin those

Re: [Linux-HA] Antw: Re: Q: unmanaged MD-RAID auto-recovery

2011-11-25 Thread Florian Haas
On 11/25/11 10:47, Ulrich Windl wrote: The resource is unmanaged: Nov 24 12:59:05 h03 pengine: [15876]: notice: LogActions: Leave prm_c11_db_15k_raid1 (Started unmanaged) [...] LUN shrink begins: Nov 24 12:59:39 h03 kernel: [1220873.890571] sd 2:0:3:13: [sdai] Result:

Re: [Linux-HA] Antw: Re: Q: unmanaged MD-RAID auto-recovery

2011-11-25 Thread Florian Haas
On 11/25/11 13:29, Lars Ellenberg wrote: From the log snippet it's not entirely clear whether that's a recurring monitor (interval == whatever you configured, or 20 if default), or a probe (interval == 0). A recurring monitor clearly should not happen at all when unmanaged. That is
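To make the probe versus recurring-monitor distinction concrete, here is a rough sketch of how a shell agent could tell the two apart. `is_probe` is a hypothetical helper written for this example; agents that source ocf-shellfuncs can use `ocf_is_probe` instead. It assumes Pacemaker exports the operation interval as `OCF_RESKEY_CRM_meta_interval` (in milliseconds, as far as I know), and the values assigned below are examples only.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not copied from ocf-shellfuncs.
is_probe() {
    # $1 is the action name. A probe is a monitor action whose
    # interval is 0; anything else is not a probe.
    [ "$1" = "monitor" ] && [ "$OCF_RESKEY_CRM_meta_interval" = "0" ]
}

# Example values, set here only so the sketch runs outside a cluster.
OCF_RESKEY_CRM_meta_interval=0
if is_probe monitor; then echo "probe"; else echo "recurring monitor"; fi

OCF_RESKEY_CRM_meta_interval=20000
if is_probe monitor; then echo "probe"; else echo "recurring monitor"; fi
```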

Re: [Linux-HA] Antw: Re: ocf_heartbeat:Xinetd: bad status report

2011-11-25 Thread Florian Haas
On 11/25/11 13:18, Lars Ellenberg wrote: On Tue, Nov 22, 2011 at 12:09:51PM +0100, Ulrich Windl wrote: Dejan Muhamedagic deja...@fastmail.fm wrote on 21.11.2011 at 16:11 in message 2021151134.GB3600@squib: [...] This RA could certainly be improved. Patches welcome! OK, I tried a

Re: [Linux-HA] Antw: Re: Q: unmanaged MD-RAID auto-recovery

2011-11-25 Thread Florian Haas
On 11/25/11 20:31, Lars Ellenberg wrote: On Fri, Nov 25, 2011 at 01:54:33PM +0100, Florian Haas wrote: On 11/25/11 13:29, Lars Ellenberg wrote: From the log snippet it's not entirely clear whether that's a recurring monitor (interval == whatever you configured, or 20 if default), or a probe

Re: [Linux-HA] failover questions

2011-11-24 Thread Florian Haas
On 11/24/11 21:13, Willi Fehler wrote: I've some questions: 1. Could you please send me a link, where I can download SRPMS for Fedora? I've checked the redhat mirrors, and I didn't find any packages. (SRPM) Well they're obviously hosted on the Fedora mirrors, by far not all of which are

Re: [Linux-ha-dev] Review request: fixes to IPaddr2

2011-11-22 Thread Florian Haas
On 11/21/11 16:51, Dejan Muhamedagic wrote: Hi Florian, On Fri, Nov 11, 2011 at 10:28:00AM +0100, Florian Haas wrote: Dejan/Lars, I noticed I've had a bunch of minor changes to IPaddr2 sitting in a branch since July, and never got around to asking for a review or merging them. I've just

Re: [Linux-HA] failover questions

2011-11-22 Thread Florian Haas
On 11/22/11 20:18, Willi Fehler wrote: Hi, I'm trying to set up a database cluster with MySQL/Redis. My problem is, the failover is working if I shutdown/reboot one node. I take it that _that_ part isn't really a problem. :) If I shutdown the network on one node (ifdown eth0 or ifdown

Re: [Linux-HA] failover questions

2011-11-22 Thread Florian Haas
On 11/23/11 07:51, Willi Fehler wrote: Hi Florian, thank you so much for your feedback. My goal is, if the cluster communication (eth0) fails on the active node, a failover should be triggered by pacemaker, because if eth0 is down, the application can't talk to the cluster.

Re: [Linux-HA] Antw: What about start-delay attribute status ?

2011-11-21 Thread Florian Haas
On 11/21/11 13:03, alain.mou...@bull.net wrote: Hi, yes that's exactly the purpose of my question (and exactly the same problem of big-monitoring-trains) : if we can always use start-delay to randomize the first monitor operation time on all the resources on a server, but if it is really

Re: [Linux-HA] Question about groups

2011-11-20 Thread Florian Haas
On 11/21/11 08:35, alain.mou...@bull.net wrote: Hi, it seems that my last email a week ago has been lost, don't know why ... so: is the migration-threshold parameter in the meta attributes of a group allowed/effective? I've never tried this, but I would be surprised if it worked. Groups don't

Re: [Linux-ha-dev] [GIT PULL] Moving the OCF RA dev guide to the resource-agents repo

2011-11-18 Thread Florian Haas
On 11/16/11 10:37, Florian Haas wrote: Hi everyone, this is something I've been meaning to do for a long time, and I've finally had the time to do so. Now that the ClusterLabs repo on Github has been established as the central source of OCF resource agents, there is really no reason why

[Linux-ha-dev] Updated OCF RA dev guide (was Re: [GIT PULL] Moving the OCF RA dev guide to the resource-agents repo)

2011-11-18 Thread Florian Haas
On 11/18/11 10:50, Florian Haas wrote: Done. Merged and pushed. I'll now add a few updates and then rebuild the content hosted at http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html. I guess I should be able to upload the new stuff some time between now and Monday morning. An updated

[Linux-HA] Updated OCF RA dev guide (was Re: [GIT PULL] Moving the OCF RA dev guide to the resource-agents repo)

2011-11-18 Thread Florian Haas
On 11/18/11 10:50, Florian Haas wrote: Done. Merged and pushed. I'll now add a few updates and then rebuild the content hosted at http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html. I guess I should be able to upload the new stuff some time between now and Monday morning. An updated

Re: [Linux-ha-dev] [GIT PULL] Re: [Linux-HA] [RfC] Review request for ocf:heartbeat:asterisk (Asterisk OCF RA)

2011-11-17 Thread Florian Haas
On 11/16/11 10:28, Florian Haas wrote: Hi everybody, barring any last-minute vetoes, I intend to pull the following changes since commit 020c8f7b08e232aef05e277b09632171a7561744: Heard no vetoes. Merged and pushed. https://github.com/ClusterLabs/resource-agents/commit

Re: [Linux-HA] [GIT PULL] Re: [RfC] Review request for ocf:heartbeat:asterisk (Asterisk OCF RA)

2011-11-17 Thread Florian Haas
On 11/16/11 10:28, Florian Haas wrote: Hi everybody, barring any last-minute vetoes, I intend to pull the following changes since commit 020c8f7b08e232aef05e277b09632171a7561744: Heard no vetoes. Merged and pushed. https://github.com/ClusterLabs/resource-agents/commit

[Linux-ha-dev] [GIT PULL] Re: [Linux-HA] [RfC] Review request for ocf:heartbeat:asterisk (Asterisk OCF RA)

2011-11-16 Thread Florian Haas
at: git://github.com/fghaas/resource-agents asterisk Thanks a lot to Dejan, Lars and Russell for their extensive and valuable feedback. Cheers, Florian Andreas Kurz (2): Medium: asterisk: remove -x option from pgrep Low: asterisk: refine sipsak exit code interpretation Florian

[Linux-ha-dev] [GIT PULL] Moving the OCF RA dev guide to the resource-agents repo

2011-11-16 Thread Florian Haas
/resource-agents dev-guide Dejan Muhamedagic (1): RA dev guide: edit ocft description Florian Haas (68): doc: move man page generation to doc/man Add RA developer's guide RA dev guide: explain API RA dev guide: add info about expected behavior for actions RA dev guide

[Linux-HA] [GIT PULL] Re: [RfC] Review request for ocf:heartbeat:asterisk (Asterisk OCF RA)

2011-11-16 Thread Florian Haas
at: git://github.com/fghaas/resource-agents asterisk Thanks a lot to Dejan, Lars and Russell for their extensive and valuable feedback. Cheers, Florian Andreas Kurz (2): Medium: asterisk: remove -x option from pgrep Low: asterisk: refine sipsak exit code interpretation Florian

Re: [Linux-ha-dev] ocf_run: sanitize output before logging?

2011-11-15 Thread Florian Haas
On 2011-11-15 16:21, Dejan Muhamedagic wrote: Hi, On Mon, Nov 14, 2011 at 09:53:12PM +0100, Florian Haas wrote: Dejan, Lars, and other shell gurus in attendance, maybe I'm totally off my rocker, and one of you guys can set me straight. But to me this part of the ocf_run function seems

Re: [Linux-ha-dev] [PATCH] prevent Slave promotion in mysql RA

2011-11-14 Thread Florian Haas
Hello Ikeda-san! On 2011-11-11 09:22, Junko IKEDA wrote: Hi, I am running MySQL replication setting with 2 nodes Master/Slave configuration. If Slave status(secs_behind) is larger than Master's parameter(max_slave_lag), Slave data is outdated, right? Yes. check_slave() in mysql RA would

Re: [Linux-ha-dev] [PATCH] prevent Slave promotion in mysql RA

2011-11-14 Thread Florian Haas
On 2011-11-14 20:44, Marek Marczykowski wrote: On 14.11.2011 09:55, Raoul Bhatia [IPAX] wrote: On 2011-11-11 09:22, Junko IKEDA wrote: Hi, I am running MySQL replication setting with 2 nodes Master/Slave configuration. If Slave status(secs_behind) is larger than Master's

[Linux-ha-dev] ocf_run: sanitize output before logging?

2011-11-14 Thread Florian Haas
Dejan, Lars, and other shell gurus in attendance, maybe I'm totally off my rocker, and one of you guys can set me straight. But to me this part of the ocf_run function seems a bit fishy: output=`$@ 2>&1` rc=$? output=`echo $output` Am I gravely mistaken, or would any funny
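
For readers skimming the archive, here is a minimal sketch of what the capture-then-echo pattern quoted above does to a command's output. It is not the actual ocf_run code: the function name `capture` and the variable `flattened` are hypothetical, and the `2>&1` redirection is an assumption about what the archive stripped from the snippet. The point it illustrates is only that an unquoted `$output` is word-split (and subject to glob expansion) before `echo` sees it.

```shell
#!/bin/sh
# Hypothetical stand-in for the capture step discussed in the thread;
# this is NOT the real ocf_run from the OCF shell function library.
capture() {
    # Run the given command, capturing stdout and stderr together
    # (the 2>&1 redirection is assumed, see the lead-in).
    output=`"$@" 2>&1`
    rc=$?
    # Unquoted expansion: the shell splits $output into words, so
    # newlines and runs of spaces collapse into single spaces.
    # Any glob characters in the output would also be expanded here.
    flattened=`echo $output`
}

capture printf 'line one\nline   two\n'
echo "rc=$rc"
echo "flattened=$flattened"
```

Quoting the expansion (`echo "$output"`) would preserve the original whitespace and suppress globbing; whether that is the right fix for ocf_run is what the thread goes on to discuss.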

Re: [Linux-HA] Should This Worry Me?

2011-11-13 Thread Florian Haas
On 2011-11-13 03:02, Robinson, Eric wrote: Should I be concerned that the standby node of a 2-node cluster is logging these messages about every 15 seconds? Nope. It's just ocf:linbit:drbd updating the master preference based on DRBD's current status. Cheers, Florian -- Need help with DRBD?

Re: [Linux-HA] [RfC] Review request for ocf:heartbeat:asterisk (Asterisk OCF RA)

2011-11-12 Thread Florian Haas
On 2011-11-12 03:14, Russell Bryant wrote: https://github.com/fghaas/resource-agents/commit/ac34c61cc209c389c4501a9f4a699a23e803a0f6#heartbeat/asterisk Does this look reasonable to you? In case you're unfamiliar with our function library: ocf_run runs a command and captures its output, but

Re: [Linux-ha-dev] [Linux-HA] [RfC] Review request for ocf:heartbeat:asterisk (Asterisk OCF RA)

2011-11-11 Thread Florian Haas
Just FYI, I noticed I erroneously put the asterisk changes in the master branch on my github repo; I've now moved them to a separate asterisk branch. The direct links to commits, which I posted earlier, should still work as the SHA IDs are unchanged. They just point to commits in a different

[Linux-ha-dev] Review request: fixes to IPaddr2

2011-11-11 Thread Florian Haas
Dejan/Lars, I noticed I've had a bunch of minor changes to IPaddr2 sitting in a branch since July, and never got around to asking for a review or merging them. I've just rebased them to the current state of master. If one of you could take a look, I'd much appreciate that. Thanks!

Re: [Linux-ha-dev] Patches for VirtualDomain RA

2011-11-11 Thread Florian Haas
On 2011-07-29 10:22, Michael Schwartzkopff wrote: Hi, I hope I found the correct list. Playing with the VirtualDomain RA I found two problems. Please find the description and patches below. Sorry for not tending to this for a while, and thanks to Dejan for the reminder. 1) During stop

Re: [Linux-ha-dev] Patches for VirtualDomain RA

2011-11-11 Thread Florian Haas
On 2011-11-11 11:42, Michael Schwartzkopff wrote: 2) The next problem is that a graceful shutdown sometimes does not work when the machine just booted. This patch makes the RA send a shutdown command every 10 seconds while shutting down the machine. This catches the boot problem. @@ -234,6
