Re: [Linux-ha-dev] Ordering of clones; does it work?

2014-11-27 Thread Lars Marowsky-Bree
On 2014-11-27T10:10:47, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! I had thought ordering of clones would work, but it looks like it does not in current SLES11 SP3 (1.1.11-3ca8c3b): I have rules like: order ord_DLM_O2CB inf: cln_DLM cln_O2CB order ord_DLM_cLVMd inf:

Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing

2014-10-24 Thread Lars Marowsky-Bree
On 2014-10-23T20:36:38, Lars Ellenberg lars.ellenb...@linbit.com wrote: If we want to require presence of start-stop-daemon, we could make all this somebody else's problem. I need to find some time to browse through the code to see if it can be improved further. But in any case, using (a tool

Re: [Linux-ha-dev] pull request for sg_persist new RA ocft

2014-03-21 Thread Lars Marowsky-Bree
On 2014-03-18T02:24:51, Liuhua Wang lw...@suse.com wrote: Hi Liuhua, thanks for pushing again! I've taken some time to provide some code review. Overall, I think it looks good, mostly cosmetic and coding style. I'd welcome more insight from others on this list; especially those with maintainer

Re: [Linux-ha-dev] crmsh error : cib-bootstrap-options already exist

2013-08-29 Thread Lars Marowsky-Bree
On 2013-08-28T20:13:43, Dejan Muhamedagic de...@suse.de wrote: A new RC has been released today. It contains both fixes. It doesn't do atomic updates anymore, because cibadmin or something cannot stomach comments. Couldn't find the upstream bug report :-( Can you give me the pacemaker bugid,

Re: [Linux-ha-dev] [Linux-HA] Problem in SLES11 SP2 (actions on removed resources)?

2013-04-19 Thread Lars Marowsky-Bree
On 2013-04-19T09:56:37, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: sbd monitoring went crazy (reporting running sbds when there were none, complaining about the inability to stop sbd when there was none), so I stopped it. What did you monitor? And what do you mean by went crazy?

Re: [Linux-ha-dev] A patch for crmsh.spec

2013-03-19 Thread Lars Marowsky-Bree
On 2013-03-19T15:28:05, yusuke iida yusk.i...@gmail.com wrote: I made a patch to solve the above. Dejan seems to be busy. Would you cope with this instead? Dejan is just on vacation this week. Let's wait for him to return ;-) Regards, Lars -- Architect Storage/HA SUSE LINUX

Re: [Linux-ha-dev] TOTEM implementation eror (SLES11 SP2)?

2013-02-26 Thread Lars Marowsky-Bree
On 2013-02-25T15:26:36, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hello, I'm wondering about these messages: Feb 25 14:53:31 so4 corosync[12457]: [TOTEM ] Retransmit List: 2a5 28b 28d 28e 295 296 297 298 299 29a 29b 29c 29d 29e 29f 2a0 2a1 2a2 2a3 2a4 2a6 That has nothing

Re: [Linux-ha-dev] RA trace facility

2012-11-21 Thread Lars Marowsky-Bree
On 2012-11-21T16:33:18, Dejan Muhamedagic de...@suse.de wrote: Hi, This is little something which could help while debugging resource agents. Setting the environment variable __OCF_TRACE_RA would cause the resource agent run to be traced (as in set -x). PS4 is set accordingly (that's a

Re: [Linux-ha-dev] RA trace facility

2012-11-21 Thread Lars Marowsky-Bree
On 2012-11-21T18:02:49, Dejan Muhamedagic de...@suse.de wrote: What would you think of OCF_RESKEY_RA_TRACE ? A meta attribute perhaps? That wouldn't cause a resource restart. Point, but - meta attributes so far were mostly for the PE/pacemaker, this would be for the RA. Would a changed

Re: [Linux-ha-dev] Q: Xen RA: node_ip_attribute

2012-09-21 Thread Lars Marowsky-Bree
On 2012-09-20T08:47:59, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: ---(resource-agents-3.9.3-0.7.1 of SLES 11 SP2)--- node_ip_attribute (string): Node attribute containing target IP address ^^ In case of a live migration, the system will

Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent

2012-09-12 Thread Lars Marowsky-Bree
On 2012-09-11T15:04:55, Alan Robertson al...@unix.sh wrote: Depends. Pacemaker may still care about the status of these agents. If it can't start or stop them, what can it do with them? The status from these agents may feed into operations on other resources that are fully managed.

Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent

2012-09-12 Thread Lars Marowsky-Bree
On 2012-09-12T09:01:05, Alan Robertson al...@unix.sh wrote: The status from these agents may feed into operations on other resources that are fully managed. Understood. I believe it will care about those other agents - not these. It shouldn't know about these, AFAIK. I guess then

Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent

2012-09-08 Thread Lars Marowsky-Bree
On 2012-09-07T13:46:27, Alan Robertson al...@unix.sh wrote: Well, I presume that one would not tell pacemaker about such agents, as they would not be useful to pacemaker. From the point of view of the crm command, you wouldn't consider them as valid resource agents to put in a

Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent

2012-09-07 Thread Lars Marowsky-Bree
On 2012-09-05T15:25:44, Dejan Muhamedagic de...@suse.de wrote: BTW, FWIW - monocf may be just like ocf, sans start and stop operations. That would make all ocf RA eligible for this use. Thinking about this, not entirely. We'd have to fake the start/stop at least. (In particular the start.)

Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent

2012-09-05 Thread Lars Marowsky-Bree
On 2012-09-04T19:20:23, Alan Robertson al...@unix.sh wrote: I will likely write a monitor-only resource agent for web servers. What would you think about calling it from the other web resource agents? Sharing code - in this case, the monitor-via-network of the http agents - seems to make

Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent

2012-09-05 Thread Lars Marowsky-Bree
On 2012-09-05T15:25:44, Dejan Muhamedagic de...@suse.de wrote: How about a new element. Something like primitive vm1 ocf:heartbeat:VirtualDomain require vm1 web-test dns-test How we map this into Pacemaker's dependency scheme is obviously open to discussion. The require would imply that

Re: [Linux-ha-dev] Note: Core Dumps with corosync-1.4.1-0.13.1 (SLES11 SP2)

2012-09-03 Thread Lars Marowsky-Bree
On 2012-08-31T14:56:22, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! By random I realized that every node of my 5-node test-cluster had at least one corosync-coredump. Unfortunately they even seem to have different signatures. I can provide a rough backtrace to get you warmed

Re: [Linux-ha-dev] apply_xml_diff: Digest mis-match

2012-08-17 Thread Lars Marowsky-Bree
On 2012-08-13T15:39:22, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! In pacemaker-1.1.6-1.29.1 (SLES11 SP2 x86_64) I see this for an idle cluster with just one stonith resource running when doing some unrelated change: What is the unrelated change you are doing? Does

Re: [Linux-ha-dev] [rfc] SBD with Pacemaker/Quorum integration

2012-06-15 Thread Lars Marowsky-Bree
On 2012-05-25T17:31:52, Florian Haas flor...@hastexo.com wrote: Um, right now I have no opinion. Your commit messages are pretty terse, and there's no README in the repo. Mind adding one? FWIW, there is now a manual page as well. That might help with understanding what it is supposed to do.

Re: [Linux-ha-dev] sbd spinoff from cluster-glue

2012-06-01 Thread Lars Marowsky-Bree
On 2012-06-01T16:16:20, Florian Haas flor...@hastexo.com wrote: Dejan, Lars, is it confirmed from your end that sbd is moving out of cluster-glue? If so, it would be nice if we could get a cluster-glue release with sbd removed, and a release of standalone sbd, so packagers can fix the

Re: [Linux-ha-dev] [rfc] SBD with Pacemaker/Quorum integration

2012-05-29 Thread Lars Marowsky-Bree
On 2012-05-29T08:39:06, Florian Haas flor...@hastexo.com wrote: Should be packageable on every platform, though I admit that I've not tried building the pacemaker module against anything but the corosync+pacemaker+openais stuff we ship on SLE HA 11 so far. Are you expecting this to build

Re: [Linux-ha-dev] [PATCH 0 of 2] Autotoolize build

2012-05-29 Thread Lars Marowsky-Bree
On 2012-05-29T14:31:20, Florian Haas flor...@hastexo.com wrote: Forgot to mention this in the original cover message, for those who haven't been following the discussion: this is for sbd which is just spinning off from cluster-glue. Thanks, I've merged them both! Regards, Lars --

Re: [Linux-ha-dev] [PATCH 0 of 2] Autotoolize build

2012-05-29 Thread Lars Marowsky-Bree
On 2012-05-29T17:56:59, Florian Haas flor...@hastexo.com wrote: In case you're wondering why I didn't use PKG_CHECK_MODULES for the PE libraries: their pkg-config file is currently broken; Andrew has a pull request for Pacemaker for that. I was wondering more about how to build this against

Re: [Linux-ha-dev] [PATCH 0 of 2] Autotoolize build

2012-05-29 Thread Lars Marowsky-Bree
On 2012-05-29T18:34:15, Florian Haas flor...@hastexo.com wrote: Yeah, it seems you just broke the build by including cluster/stack.h and not bothering to add an AC_CHECK_HEADERS to configure.ac. Where does that come from, is that new to Pacemaker? Uh? It builds here on the 1.1.7 pacemaker

Re: [Linux-ha-dev] [PATCH 0 of 2] Autotoolize build

2012-05-29 Thread Lars Marowsky-Bree
On 2012-05-29T18:57:30, Florian Haas flor...@hastexo.com wrote: The integration with the cluster stack is rather specific to whatever pacemaker/corosync version + configuration you build against. Unfortunately. Well that's what #ifdef HAVE_CLUSTER_STACK_H and friends are good for, no? I

Re: [Linux-ha-dev] [rfc] SBD with Pacemaker/Quorum integration

2012-05-25 Thread Lars Marowsky-Bree
On 2012-05-25T17:31:52, Florian Haas flor...@hastexo.com wrote: That aside, what do you think of the idea/approach? Um, right now I have no opinion. Your commit messages are pretty terse, and there's no README in the repo. Mind adding one? Good point. I wasn't aware the commit messages were

Re: [Linux-ha-dev] [rfc] SBD with Pacemaker/Quorum integration

2012-05-25 Thread Lars Marowsky-Bree
On 2012-05-25T21:44:25, Florian Haas flor...@hastexo.com wrote: If so, the master thread will not self-fence even if the majority of devices is currently unavailable. That's it, nothing more. Does that help? It does. One naive question: what's the rationale of tying in with

Re: [Linux-ha-dev] [rfc] SBD with Pacemaker/Quorum integration

2012-05-24 Thread Lars Marowsky-Bree
On 2012-05-24T14:34:59, Florian Haas flor...@hastexo.com wrote: To give you a glance of the extended sbd code, you can check out http://hg.linux-ha.org/sbd - the new Pacemaker integration is activated using the -P option in /etc/sysconfig/sbd, otherwise sbd remains a drop-in replacement

Re: [Linux-ha-dev] [PATCH] Filesystem RA: remove a status file only when OCF_CHECK_LEVEL is set as 20

2012-05-08 Thread Lars Marowsky-Bree
On 2012-05-08T12:08:27, Dejan Muhamedagic de...@suse.de wrote: In the default (without OCF_CHECK_LEVEL), it's enough to try to unmount the file system, isn't it? https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/Filesystem#L774 I don't see a need to remove the STATUSFILE

Re: [Linux-ha-dev] [PATCH v2] resource-agents: add Linux proxy arp resource agent

2012-04-04 Thread Lars Marowsky-Bree
On 2012-04-04T01:52:12, Christian Franke nob...@nowhere.ws wrote: Hello Florian, Your question is fully justified - I sincerely apologize for ignoring that comprehensive documentation. I rewrote the patch trying to adhere to the requirements given in the documentation. Hi Christian,

Re: [Linux-ha-dev] Patch: pgsql streaming replication

2012-03-19 Thread Lars Marowsky-Bree
On 2012-03-19T11:09:16, Dejan Muhamedagic de...@suse.de wrote: --- a/heartbeat/pgsql +++ b/heartbeat/pgsql @@ -1,12 +1,13 @@ -#!/bin/sh +#!/bin/bash Our policy is not to change shell. Is that absolutely necessary? He sends in many patches. bash is a 1MB install. I can't believe that

Re: [Linux-ha-dev] [PATCH 2/2] Medium: LVM: force dmevent monitoring for clones

2011-11-30 Thread Lars Marowsky-Bree
On 2011-11-28T21:14:22, Florian Haas flor...@hastexo.com wrote: Seems to make sense. of course, an alternative would be to add a Conflicts: lvm2 x.y.z to the package on the respective versions to make sure it's only installed with a fixed lvm2 package ...? Surely you're joking.

Re: [Linux-ha-dev] [PATCH 2/2] Medium: LVM: force dmevent monitoring for clones

2011-11-26 Thread Lars Marowsky-Bree
On 2011-11-25T18:38:06, Florian Haas flor...@hastexo.com wrote: Starting a clustered volume with monitoring disabled is not allowed: http://www.redhat.com/archives/lvm-devel/2010-March/msg00289.html Which would be fine, as activation/monitoring = 1 ships as the default in lvm.conf.

Re: [Linux-ha-dev] Stonith turns node names to lowercase

2011-10-19 Thread Lars Marowsky-Bree
On 2011-10-18T12:12:05, Alberic de Pertat alberic.deper...@adelux.fr wrote: I am currently in the process of writing a fencing agent for VMware vCenter. After some tests, I noticed that the stonith command is turning the nodename to lowercase. Yes, that was added because host/nodenames by

Re: [Linux-ha-dev] Stonith turns node names to lowercase

2011-10-19 Thread Lars Marowsky-Bree
On 2011-10-18T12:40:40, Florian Haas flor...@hastexo.com wrote: g_strdown(nodecopy); Is there a reason for this ? I suppose Dejan will accept a patch making this configurable. Please, no. We fence by hostname; hostnames are case insensitive by definition. Plugins need to handle that.
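
The point that plugins have to cope with case-insensitive host names can be illustrated with a small sketch. It assumes a fencing plugin that receives the already lowercased node name and has to find the matching entry in an inventory whose names use mixed case. The function name `find_vm` and the inventory list are hypothetical and are not part of stonith or any vCenter API.

```python
from typing import Iterable, Optional


def find_vm(inventory: Iterable[str], nodename: str) -> Optional[str]:
    """Return the inventory name that matches nodename, ignoring case.

    Hypothetical helper: 'inventory' stands in for whatever list of
    machine names the fencing device reports.
    """
    wanted = nodename.casefold()
    for vm_name in inventory:
        # Compare case-insensitively, but return the name exactly as the
        # device spells it, since the device may itself be case sensitive.
        if vm_name.casefold() == wanted:
            return vm_name
    return None


if __name__ == "__main__":
    print(find_vm(["Node-A", "NODE-B"], "node-b"))
```

Matching this way keeps the lowercasing in stonith untouched while letting the plugin address the device with its own spelling.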

Re: [Linux-ha-dev] [ha-wg] CFP: HA Mini-Conference in Prague on Oct 25th

2011-10-06 Thread Lars Marowsky-Bree
On 2011-10-03T11:10:13, Andrew Beekhof and...@beekhof.net wrote: Based on Boston last year, I imagine the conversations will last right up until Lars starts presenting his talk on Friday afternoon. People came and went at random, and if someone essential was missing for a conversation we

Re: [Linux-ha-dev] [ha-wg] CFP: HA Mini-Conference in Prague on Oct 25th

2011-09-30 Thread Lars Marowsky-Bree
Hi all, it turns out that there was zero feedback about people wanting to present, only some about travel budget being too tight to come. So we had some discussions about whether to cancel this completely, as this made planning rather difficult. But just in the last few days, I got a fair share

Re: [Linux-ha-dev] Patch: VirtualDomain - fix probe if config is not on shared storage

2011-06-27 Thread Lars Marowsky-Bree
On 2011-06-27T12:00:28, Dominik Klein dominik.kl...@googlemail.com wrote: Now it sees NOT_RUNNING on all nodes during probe and may decide to start the VM on a node where it cannot run. That, with the current version of the agent, leads to a failed start, a failed stop during recovery and

Re: [Linux-ha-dev] Filesystem ocf file

2011-05-16 Thread Lars Marowsky-Bree
On 2011-05-16T14:52:16, Dejan Muhamedagic de...@suse.de wrote: There's a bunch of OCF_RESKEY_CRM_meta_*, that sounds like a good way to subdivide a name space, though a bit too verbose. That's basically an invention that Andrew came up with and that we might as well codify, now that

Re: [Linux-ha-dev] Filesystem ocf file

2011-05-10 Thread Lars Marowsky-Bree
On 2011-05-06T09:37:09, Florian Haas florian.h...@linbit.com wrote: To use it, set op monitor interval=X OCF_CHECK_LEVEL=Y The spec never decreed that this was how it has to be configured, just that this was the way how the environment variable had to be passed in. (The idea being that it

Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers)

2011-05-01 Thread Lars Marowsky-Bree
On 2011-04-26T16:03:48, Dejan Muhamedagic de...@suse.de wrote: - the required attributes in meta-data need to be reviewed, a parameter is either required or has a default, cannot be both Why would this be the case? Regards, Lars -- Architect Storage/HA, OPS Engineering, Novell,

Re: [Linux-ha-dev] Patch : ocf/resource.d/heartbeat/Filesystem: Use /proc/mounts if available since /etc/mtab might be outdated

2011-03-24 Thread Lars Marowsky-Bree
On 2011-03-23T14:45:50, Corvus Corax e...@lightwerk.com wrote: Re-implementing mount seems like a bad idea. If more extensive checks than a binary mounted xor not are needed, this would be noticed by the deeper monitor levels or the application-level monitor, no? Regards, Lars

Re: [Linux-ha-dev] Patch : ocf/resource.d/heartbeat/Filesystem: Use /proc/mounts if available since /etc/mtab might be outdated

2011-03-23 Thread Lars Marowsky-Bree
On 2011-03-23T11:50:02, Lars Ellenberg lars.ellenb...@linbit.com wrote: # Take advantage of /etc/mtab if present, use portable mount command # otherwise. Normalize format to dev mountpoint fstype. I wonder if we shouldn't just always rely on mount and insist on that providing proper data.

Re: [Linux-ha-dev] New stonith plugin for netio230A device

2011-03-23 Thread Lars Marowsky-Bree
On 2011-03-21T15:20:33, Martin Dziobek dzio...@ihr.uni-stuttgart.de wrote: It does not make sense to use more than one port of a netio device for a single node if you have redundant power supplies in a box, since this would remove redundancy. Each port is fed through the netio's power

[Linux-ha-dev] Announcement: Linux Foundation HA working group mailing lists

2011-03-03 Thread Lars Marowsky-Bree
Hi everyone, please excuse the long Cc list. Behind the scenes, some of the projects that make up the cluster stack on Linux have been working together to converge and integrate the various projects. We have been meeting on and off for the last decade, and made some amazing progress over the

Re: [Linux-ha-dev] New master/slave resource agent for DB2 databases in HADR (High Availability Disaster Recovery) mode

2011-02-09 Thread Lars Marowsky-Bree
On 2011-02-09T11:56:53, Dejan Muhamedagic deja...@fastmail.fm wrote: Great! Unfortunately, we can't replace the old db2 now, the number of changes is very large: That, by itself, doesn't strike me as a reasonable argument for duplicating the RA. It may seem a reasonable idea to protect

Re: [Linux-ha-dev] New master/slave resource agent for DB2 databases in HADR (High Availability Disaster Recovery) mode

2011-02-09 Thread Lars Marowsky-Bree
On 2011-02-09T14:17:58, Dejan Muhamedagic deja...@fastmail.fm wrote: At any rate, I wouldn't want to take responsibility for replacing the existing (and working RA) with a completely new and not yet tested code. Call me coward :) I agree with that. (Not the coward part, of course!) IMHO

Re: [Linux-ha-dev] New master/slave resource agent for DB2 databases in HADR (High Availability Disaster Recovery) mode

2011-02-09 Thread Lars Marowsky-Bree
On 2011-02-09T14:43:17, Andrew Beekhof and...@beekhof.net wrote: It happens often enough - it's just normally by a core developer. And realistically, almost every RA is going to get similar treatment (over time) as they're merged with the Red Hat ones. The pending big refactoring merge

Re: [Linux-ha-dev] New master/slave resource agent for DB2 databases in HADR (High Availability Disaster Recovery) mode

2011-02-09 Thread Lars Marowsky-Bree
On 2011-02-09T17:05:08, Dejan Muhamedagic deja...@fastmail.fm wrote: That, by itself, doesn't strike me as a reasonable argument for duplicating the RA. Look, if I could tell that changes were safe, then I certainly wouldn't moan about it. If you can, please go ahead. ... I thought I

Re: [Linux-ha-dev] New master/slave resource agent for DB2 databases in HADR (High Availability Disaster Recovery) mode

2011-02-09 Thread Lars Marowsky-Bree
On 2011-02-09T18:22:09, Dejan Muhamedagic deja...@fastmail.fm wrote: We need to get out of this oh my god fear is bad I'm afraid attitude as a project/group. It doesn't make me happy. Right now, I think the best option would be to put your agent in /usr/lib/ocf/resource.d/testing and ask

Re: [Linux-ha-dev] New master/slave resource agent for DB2 databases in HADR (High Availability Disaster Recovery) mode

2011-02-09 Thread Lars Marowsky-Bree
On 2011-02-09T18:08:51, Holger Teutsch holger.teut...@web.de wrote: coming back from a 3 hour break I'm a bit shocked about the very *active* discussion and it is difficult for me to find an entry point. Heh ;-) Sorry, you've prodded a wasp nest. I've spent a lot of time to make it plug in

Re: [Linux-ha-dev] New master/slave resource agent for DB2 databases in HADR (High Availability Disaster Recovery) mode

2011-02-09 Thread Lars Marowsky-Bree
On 2011-02-09T17:09:29, Dejan Muhamedagic deja...@fastmail.fm wrote: I agree with that. (Not the coward part, of course!) IMHO though, the answer is to improve our test coverage to the point where we can refactor and clean up code without taking an unreasonable risk of breakage. Dream on

Re: [Linux-ha-dev] New master/slave resource agent for DB2 databases in HADR (High Availability Disaster Recovery) mode

2011-02-09 Thread Lars Marowsky-Bree
On 2011-02-09T16:58:12, Dejan Muhamedagic deja...@fastmail.fm wrote: Actually, I was also thinking about having another provider for db2. I think that we should do this. I'm still unconvinced, because it requires users to change their configuration back and forth. If the RA is really

Re: [Linux-ha-dev] Add real monitoring capabilities to IPaddr2 resource agent

2011-02-08 Thread Lars Marowsky-Bree
On 2011-02-05T00:16:45, Lars Ellenberg lars.ellenb...@linbit.com wrote: I do like this packet counter monitoring. So do I, but I'd just casually suggest that this may make sense as a daemon, or at least per physical NIC - instead of per virtual IP. (Andrew, much to my dismay, has changed the

Re: [Linux-ha-dev] Antwort: Re: Antwort: Re: OCF RA dev guide: final heads up

2011-01-17 Thread Lars Marowsky-Bree
On 2011-01-17T10:25:50, Andrew Beekhof and...@beekhof.net wrote: While we're at it... Andrew, could you pass the OCF_RESKEY_CRM_meta_depth variable? Then we can update the resource agents and the documentation. You mean create one and pass it? No such thing currently exists. It wouldn't

Re: [Linux-ha-dev] ocf:linbit:drbd incorrectly handles split brain

2011-01-13 Thread Lars Marowsky-Bree
On 2011-01-05T15:46:54, Florian Haas florian.h...@linbit.com wrote: Run Pacemaker on Heartbeat, and use dopd, and this won't happen. Hi Florian, is there any missing functionality in the pacemaker integration? Regards, Lars -- Architect Storage/HA, OPS Engineering, Novell, Inc. SUSE

Re: [Linux-ha-dev] OCF Resource Agent Developer's Guide: unique parameters ?

2011-01-10 Thread Lars Marowsky-Bree
On 2011-01-10T16:46:43, Florian Haas florian.h...@linbit.com wrote: Dejan, is there any chance that you could put parameter uniqueness enforcement into the shell? Or maybe this is something that needs to be put in the PE itself, and should flag a warning in ptest output? The

Re: [Linux-ha-dev] Thinking about a new communications plugin

2010-11-23 Thread Lars Marowsky-Bree
On 2010-11-22T14:18:27, Alan Robertson al...@unix.sh wrote: Any thoughts about this? http://kronosnet.org/ -- Architect Storage/HA, OPS Engineering, Novell, Inc. SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) Experience is the name everyone gives to their mistakes. --

Re: [Linux-ha-dev] a scalable membership and LRM proxy proposal

2010-11-05 Thread Lars Marowsky-Bree
On 2010-11-04T08:48:58, Alan Robertson al...@unix.sh wrote: This is something that's come up several times in the past (containers of resources), and something that seems to be neatly addressed by the current work on Matahari. http://repos.fedorapeople.org/repos/beekhof/matahari

Re: [Linux-ha-dev] pgsql Patch

2010-09-02 Thread Lars Marowsky-Bree
On 2010-09-01T20:26:09, Serge Dubrouski serge...@gmail.com wrote: While I agree in general, I'm not sure that it wouldn't be better to modify ocf_run in a way that it would return the actual error code from a command that it ran instead of the generic $OCF_ERR_GENERIC I'd be inclined to agree, but

Re: [Linux-ha-dev] About movement of the Quorum control.

2010-08-30 Thread Lars Marowsky-Bree
On 2010-08-27T16:16:45, Andrew Beekhof and...@beekhof.net wrote: bcast is handled by corosync, by the way; and unicast support may also be coming. Yeah, but it's not supported on RHEL which means he's not actively testing it or fixing bugs. Your mileage may vary :-) We do test it though -

Re: [Linux-ha-dev] About movement of the Quorum control.

2010-08-27 Thread Lars Marowsky-Bree
On 2010-08-27T09:53:32, Andrew Beekhof and...@beekhof.net wrote: However, it is very difficult for us to wait for corosync to be stable. It's pretty close these days. It does lack ucast and bcast support, but there are plans to address that outside of corosync. The current corosync is really

Re: [Linux-ha-dev] [PATCH]Restraint of the noisy log output of the monitor processing.

2010-08-18 Thread Lars Marowsky-Bree
On 2010-08-18T14:42:15, renayama19661...@ybb.ne.jp wrote: I am sending the patch, which I changed to take the same form as the former monitor. With this patch, the log of the monitor processing becomes quiet. Yes, but the output goes to stdout still, it is just not captured by the script - but the lrmd

Re: [Linux-ha-dev] CFP: Linux Plumbers Mini-Conf on High-Availability/Clustering

2010-08-10 Thread Lars Marowsky-Bree
On 2010-08-04T15:59:27, Lars Marowsky-Bree l...@novell.com wrote: Hi all, there will (hopefully!) be a mini-conference on HA/Clustering at this year's LPC in Cambridge, MA, Nov 3-5th. Just a quick reminder, there've not been many proposals submitted yet. If the trend continues, the mini

Re: [Linux-ha-dev] [PATCH] IPv6addr: removing libnet dependency

2010-08-09 Thread Lars Marowsky-Bree
On 2010-07-30T16:12:24, Simon Horman ho...@verge.net.au wrote: This change looks good to me. I have applied the change. http://hg.linux-ha.org/agents/rev/612e2966f372 I've had to commit a small revision, because on IA64, the memory on the stack is not aligned properly for the cast to

[Linux-ha-dev] CFP: Linux Plumbers Mini-Conf on High-Availability/Clustering

2010-08-04 Thread Lars Marowsky-Bree
Hi all, there will (hopefully!) be a mini-conference on HA/Clustering at this year's LPC in Cambridge, MA, Nov 3-5th. This would be an informal summit for the HA folks to get together and discuss the various issues that would benefit from a face to face meeting; to facilitate progress faster

Re: [Linux-ha-dev] Monitoring Process Death

2010-06-02 Thread Lars Marowsky-Bree
On 2010-06-01T00:37:24, Lars Ellenberg lars.ellenb...@linbit.com wrote: Once we notice, what are we supposed to do? Not take any action ourselves, but tell pacemaker the resource has failed, because that is where $policy lives? Yes, that's it, I think. Sorry for the outburst, I'm really overly

Re: [Linux-ha-dev] Monitoring Process Death

2010-06-02 Thread Lars Marowsky-Bree
On 2010-06-02T14:51:25, Lars Ellenberg lars.ellenb...@linbit.com wrote: I'd not put it into the RA. I'd write a wrapper around whatever process is supposed to run. Maybe a shell script, or a shell function. Something like that: call_crm_resource-F_on_process_exit.sh #!/bin/sh # takes

Re: [Linux-ha-dev] Monitoring Process Death

2010-06-02 Thread Lars Marowsky-Bree
On 2010-06-02T15:39:26, Lars Ellenberg lars.ellenb...@linbit.com wrote: Most auto-backgrounding thingies also have a foreground mode. No. SAP etc don't. BTW, the RA cannot register the pid of the background process either, as it only knows the pid of the process before it backgrounded

Re: [Linux-ha-dev] Monitoring Process Death

2010-05-31 Thread Lars Marowsky-Bree
On 2010-05-28T16:09:03, Bob Schatz bsch...@yahoo.com wrote: I have started reading the lrmd source. One thing I am worried about is that if I give a PID to lrmd, how will lrmd monitor it? My RA is a shell script that forks off a daemon. If I give this daemon PID to lrmd does lrmd

Re: [Linux-ha-dev] Upstart RA

2010-05-31 Thread Lars Marowsky-Bree
On 2010-05-17T08:40:51, Andrew Beekhof and...@beekhof.net wrote: Exit codes weren't implemented since upstart knows a bit more states than just 'running' or 'not running', i.e. it knows distinction between running, but stopping and running. Which is still no excuse for them not doing exit

Re: [Linux-ha-dev] Monitoring Process Death

2010-05-31 Thread Lars Marowsky-Bree
On 2010-05-31T18:16:30, Lars Ellenberg lars.ellenb...@linbit.com wrote: There are several flavors of overhead. One underestimated is programming and code maintenance overhead ;-) Why would we register pid with lrm and duplicate code from heartbeat proper to lrmd and whatnot, or even rewrite

Re: [Linux-ha-dev] Monitoring Process Death

2010-05-28 Thread Lars Marowsky-Bree
On 2010-05-21T12:12:12, Bob Schatz bsch...@yahoo.com wrote: I think the basic requirements are: 1. When a process starts it registers itself with a kernel component. This registration also gets passed an action. The easiest way would be for the RA to register pids to be monitored to lrmd,

Re: [Linux-ha-dev] Tickle Ack function in portblock resource

2010-05-02 Thread Lars Marowsky-Bree
On 2010-04-26T16:35:31, Sam Tran stl...@gmail.com wrote: primitive portblock_block ocf:heartbeat:portblock \ params protocol=tcp ip=192.168.8.171 portno=636 action=block \ op monitor interval=10 timeout=10 depth=0 primitive portblock_unblock ocf:heartbeat:portblock \

Re: [Linux-ha-dev] Deprecated resource agents

2010-04-20 Thread Lars Marowsky-Bree
On 2010-04-19T23:04:42, Lars Ellenberg lars.ellenb...@linbit.com wrote: If it works and is auto-migrated, the warning shouldn't be noisy - the logs already are ;-) That's the vicious circle: if the noise level is too high, everyone starts shouting. If we can do it mostly under the lid,

Re: [Linux-ha-dev] Deprecated resource agents

2010-04-20 Thread Lars Marowsky-Bree
On 2010-04-19T23:03:23, Tim Serong tser...@novell.com wrote: This'd be easiest if the metadata explicitly said an RA was deprecated, for example something like: <?xml version="1.0"?> <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd"> <resource-agent name="Evmsd" version="0.9" deprecated="true"> ...

Re: [Linux-ha-dev] OCF

2010-04-20 Thread Lars Marowsky-Bree
On 2010-04-20T09:15:12, Andrew Beekhof and...@beekhof.net wrote: Which brings up another good point... Can we please make OCF relevant again by converting the repo to Hg and allowing access? The server the CVS repo is on right now is way too old to host a recent hg version, and for various

Re: [Linux-ha-dev] Deprecated resource agents

2010-04-20 Thread Lars Marowsky-Bree
On 2010-04-20T08:20:57, Florian Haas florian.h...@linbit.com wrote: And the first two _are_ still used/maintained for the SLES10 packages. Which means they'll _never_ even install a resource-agents package without risking to lose support. Why do we need to carry those RAs in future

Re: [Linux-ha-dev] proposed fix for the ABI extension of cluster-glue

2010-04-19 Thread Lars Marowsky-Bree
On 2010-04-17T21:04:06, Lars Ellenberg lars.ellenb...@linbit.com wrote: This is the interdiff: Bump the so version, note this is the libtool interface:revision:age way, the resulting soname is libplumb.so.2.1.0 --- a/lib/clplumbing/Makefile.am Thu Apr 15 15:58:50 2010 +0200 +++
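
As a worked example of the libtool interface:revision:age scheme mentioned above, the sketch below shows how a `-version-info` triple maps to the numeric suffix of the library file on Linux (`current - age`, then `age`, then `revision`). The triple `3:0:1` is an assumption chosen because it yields the `2.1.0` quoted in the message; the truncated diff does not show the value actually committed, and the function name is hypothetical.

```python
def libtool_linux_suffix(current: int, revision: int, age: int) -> str:
    """Map libtool's current:revision:age to the Linux file name suffix.

    On Linux, libtool names the library file
    lib<name>.so.<current - age>.<age>.<revision>.
    """
    if min(current, revision, age) < 0:
        raise ValueError("version numbers must be non-negative")
    if age > current:
        raise ValueError("age must not exceed current")
    return f"{current - age}.{age}.{revision}"


if __name__ == "__main__":
    # Assumed triple 3:0:1, i.e. interface 3 that is still compatible
    # with interface 2 (age 1), first revision of that interface.
    print("libplumb.so." + libtool_linux_suffix(3, 0, 1))
```

Under that assumption, adding interfaces without removing any raises both current and age, so the major number `current - age` stays at 2 and binaries linked against the older library keep loading.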

Re: [Linux-ha-dev] Deprecated resource agents

2010-04-19 Thread Lars Marowsky-Bree
On 2010-04-19T14:05:48, Florian Haas florian.h...@linbit.com wrote: - EvmsSCC and - Evmsd (both apply to EVMS, which is no longer maintained); - LinuxSCSI (superseded by SCSI reservations and SF-EX); - drbd (superseded by ocf:linbit:drbd); - pingd (superseded by ocf:pacemaker:pingd,

Re: [Linux-ha-dev] proposed fix for the ABI extension of cluster-glue

2010-04-17 Thread Lars Marowsky-Bree
On 2010-04-16T23:51:55, Lars Ellenberg lars.ellenb...@linbit.com wrote: We need it for ACL support in pacemaker. I thought that was going to be done differently, as it needs to be solved differently anyways to support remote TCP (or in general, non unix domain) connections? That doesn't

Re: [Linux-ha-dev] Announcement: new release for cluster-glue (1.0.5)

2010-04-16 Thread Lars Marowsky-Bree
On 2010-04-16T11:48:35, Lars Ellenberg lars.ellenb...@linbit.com wrote: We may add the functionality back, in case anyone actually needs it. We need it for ACL support in pacemaker. But certainly not breaking backwards ABI compatibility, if not absolutely necessary. Which it is not for this

Re: [Linux-ha-dev] proposed fix for the ABI extension of cluster-glue

2010-04-16 Thread Lars Marowsky-Bree
On 2010-04-16T13:48:17, Lars Ellenberg lars.ellenb...@linbit.com wrote: Old code works with old and new glue. New code works with old and new glue. New code depending on the new struct members get a define to check for at compile time, and a runtime check just as well. New code depending

Re: [Linux-ha-dev] Announcement: new release for cluster-glue (1.0.5)

2010-04-15 Thread Lars Marowsky-Bree
On 2010-04-15T16:11:55, Dejan Muhamedagic deja...@fastmail.fm wrote: Hello, The cluster glue release 1.0.4 contains a change which breaks ABI. That would break all Heartbeat and Pacemaker installations which weren't built against that release. We could discuss more about what and how, but

Re: [Linux-ha-dev] [PATCH 0 of 1] High: SAPInstance RA: don't rely on op target rc when monitoring clones (lf#2371)

2010-03-30 Thread Lars Marowsky-Bree
On 2010-03-29T11:51:10, Tim Serong tser...@novell.com wrote: Hi All, This fixes a flawed assumption the SAPInstance RA makes about how it should monitor master/slave clones. It was relying on the op target RC, but this is completely wrong, and in any case, as of

Re: [Linux-ha-dev] [PATCH] Medium: RA: mysql: added replication capabilities

2010-02-26 Thread Lars Marowsky-Bree
On 2010-02-26T15:49:34, Marian Marinov m...@yuhu.biz wrote: The RA also should make sure that the list only contains one value. But yes, this should work. Is it actually possible to have more than one node in this variable for Master/Slave resources? Yes, if meta_attribute:master-max

Re: [Linux-ha-dev] [PATCH] Medium: RA: mysql: added replication capabilities

2010-02-25 Thread Lars Marowsky-Bree
On 2010-02-25T09:18:51, Florian Haas florian.h...@linbit.com wrote: 1. the whitespaces after the HOSTNAME should remain since this is how the unames are exported to the RA Seriously? Then it's not your business to fix that, those environment variables must be set correctly. Andrew, can

Re: [Linux-ha-dev] announcement: cluster-glue 1.0.3 release

2010-02-02 Thread Lars Marowsky-Bree
On 2010-02-02T18:16:17, Dejan Muhamedagic deja...@fastmail.fm wrote: Hello everybody, The cluster-glue release 1.0.2 has a serious bug in lrmd: on lrmadmin connect lrmd would stop running repeating operations (monitor). Every time crm shell is run, lrmadmin is also invoked. Please don't

Re: [Linux-ha-dev] [PATCH] Tickle ACK to TCP connections

2010-01-26 Thread Lars Marowsky-Bree
On 2010-01-22T16:07:23, Dejan Muhamedagic deja...@fastmail.fm wrote: Just about to apply the patch. The only changes I made were to the meta data. The usage should, however, be documented in more detail, probably including how to overall handle the HA network services as it has already been

Re: [Linux-ha-dev] ulimit in ocf scripts

2010-01-15 Thread Lars Marowsky-Bree
On 2010-01-13T11:31:01, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: i am talking about elevated (or even reduced) limits which are enforced by the os, for example no. of open files/locks/pipes, scheduling priority, cpu time, maybe core file size, etc. (see man ulimit, man bash or similar).

Re: [Linux-ha-dev] [PATCH] Tickle ACK to TCP connections

2010-01-06 Thread Lars Marowsky-Bree
On 2010-01-06T15:21:53, Jiaju Zhang jjzhang.li...@gmail.com wrote: I have done some testing on the patch, and it works as expected. So I regenerated this patch based on the current tip of the resource agents repository. Attached is the hg export of it. Good job! Let's get this polished soon! Some

Re: [Linux-ha-dev] [PATCH] Tickle ACK to TCP connections

2010-01-06 Thread Lars Marowsky-Bree
On 2010-01-06T16:32:33, Dejan Muhamedagic deja...@fastmail.fm wrote: That said, I'm fine with using a file if it keeps up the performance. But the sync_script - we definitely don't want to be calling an external script, I guess. csync2 would be automatic anyway. The script (or program)

[Linux-ha-dev] glue cs#7505f2e115c5 - reintroducing heartbeat name into paths?

2009-12-10 Thread Lars Marowsky-Bree
Hi Dejan, what's the point of this changeset? It renames several paths from 'glue' to 'heartbeat'. I'm not sure that is a good idea; why? Regards, Lars -- Architect Storage/HA, OPS Engineering, Novell, Inc. SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) Experience is

Re: [Linux-ha-dev] glue cs#7505f2e115c5 - reintroducing heartbeat name into paths?

2009-12-10 Thread Lars Marowsky-Bree
On 2009-12-10T18:27:45, Lars Marowsky-Bree l...@suse.de wrote: what's the point of this changeset? It renames several paths from 'glue' to 'heartbeat'. I'm not sure that is a good idea; why? Same for agents cs#84cceca21769. Regards, Lars -- Architect Storage/HA, OPS Engineering

Re: [Linux-ha-dev] glue cs#7505f2e115c5 - reintroducing heartbeat name into paths?

2009-12-10 Thread Lars Marowsky-Bree
On 2009-12-10T21:45:34, Dejan Muhamedagic deja...@fastmail.fm wrote: There are several packages using /usr/lib/heartbeat and similar. Yeah, but that was mostly a legacy thing, I thought - on a system without heartbeat installed, this is sort of a confusing artifact. The only thing where we have

Re: [Linux-ha-dev] [PATCH]Improvement of the log output of Filesystem.

2009-11-19 Thread Lars Marowsky-Bree
On 2009-11-19T10:06:37, renayama19661...@ybb.ne.jp wrote: Hi raoul, i think it would be better to concat this into one line, something like: ocf_log err "Failed to read device $DEVICE: $err_output" I imitated it for handling of Filesystem_monitor_20 and thought about a revision.

Re: [Linux-ha-dev] suggestions and patch for meatclient

2009-11-19 Thread Lars Marowsky-Bree
On 2009-11-16T18:58:12, Dejan Muhamedagic deja...@fastmail.fm wrote: To have this handled by CRM, crmd would have to cancel the currently running stonith action, i.e. send the appropriate message to stonithd. Sure. This also happens if the node eventually reboots and rejoins cleanly, too,

Re: [Linux-ha-dev] suggestions and patch for meatclient

2009-11-14 Thread Lars Marowsky-Bree
On 2009-11-13T11:42:31, Dejan Muhamedagic deja...@fastmail.fm wrote: I would like to hear any opinion. Great idea! But I'd like to suggest a bit different execution, i.e. to have usage like this: The idea is nice, but what we actually want is a crm node clean-down-confirmation XXX command,

Re: [Linux-ha-dev] Fwd: [PATCH] call validate-all when monitoring with OCF_RESKEY_CRM_meta_interval=0

2009-11-11 Thread Lars Marowsky-Bree
On 2009-11-11T10:14:29, Florian Haas florian.h...@linbit.com wrote: This is a common mistake, we remove them when we find them, but please don't add new ones ;-) Maybe I'm missing something, but I don't follow that. If that is indeed a mistake, then please replace common with ubiquitous

Re: [Linux-ha-dev] Fwd: [PATCH] call validate-all when monitoring with OCF_RESKEY_CRM_meta_interval=0

2009-11-11 Thread Lars Marowsky-Bree
On 2009-11-11T12:22:48, Dejan Muhamedagic deja...@fastmail.fm wrote: Simply put: if you're checking for dependencies possibly provided by other resources, these won't be present at probe time. (Kind of obvious, really.) In this case, the CRM, which knows that dependencies are not
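The distinction this thread turns on can be sketched as follows: a monitor action invoked with OCF_RESKEY_CRM_meta_interval=0 is a probe, and checks for dependencies that other resources provide should not be treated as hard failures at that point. This is only an illustration under stated assumptions; the function names and the check names `config_syntax` and `external_dependencies` are hypothetical and are not taken from any resource agent.

```python
import os


def is_probe(action, env=None):
    """True if this invocation is a probe: a monitor with interval 0."""
    env = os.environ if env is None else env
    return action == "monitor" and env.get("OCF_RESKEY_CRM_meta_interval") == "0"


def checks_to_run(action, env=None):
    """Pick validation checks for an action (check names are hypothetical).

    On a probe, dependencies provided by other resources may not be
    started yet, so only checks local to this resource are run.
    """
    checks = ["config_syntax"]
    if not is_probe(action, env):
        checks.append("external_dependencies")
    return checks


print(checks_to_run("monitor", {"OCF_RESKEY_CRM_meta_interval": "0"}))
print(checks_to_run("monitor", {"OCF_RESKEY_CRM_meta_interval": "10000"}))
```

Passing the environment as a parameter keeps the sketch testable; a real agent would read it from its process environment.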
