Re: [Linux-HA] Antw: Re: file system resource becomes inaccessible when any of the nodes goes down

2015-07-09 Thread Lars Marowsky-Bree
On 2015-07-07T14:15:14, Muhammad Sharfuddin m.sharfud...@nds.com.pk wrote: now msgwait timeout is set to 10s and a delay/inaccessibility of 15 seconds was observed. If a service (App, DB, file server) is installed and running from the ocfs2 file system via the surviving/online node, then

Re: [Linux-HA] Antw: Re: Antw: Re: file system resource becomes inaccessible when any of the nodes goes down

2015-07-09 Thread Lars Marowsky-Bree
On 2015-07-07T12:23:44, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: The advantage depends on the alternatives: If two nodes both want to access the same filesystem, you can use OCFS2, NFS, or CIFS (list not complete). If only one node can access a filesystem, you could try any

Re: [Linux-HA] crmsh fails to stop already stopped resource

2015-02-16 Thread Lars Marowsky-Bree
On 2015-02-16T09:20:22, Kristoffer Grönlund kgronl...@suse.com wrote: Actually, I decided that it does make sense to return 0 as the error code even if the resource to delete doesn't exist, so I pushed a commit to change this. The error message is still printed, though. I'm not sure I agree,

Re: [Linux-HA] Antw: Re: Antw: Re: Antw: Re: Antw: Re: SLES11 SP3: warning: crm_find_peer: Node 'h01' and 'h01' share the same cluster nodeid: 739512321

2015-01-30 Thread Lars Marowsky-Bree
On 2015-01-30T14:57:29, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: # grep -i high /etc/corosync/corosync.conf clear_node_high_bit:new Could this cause our problem? This is an option that didn't exist prior to SP3. With "there was no change" I meant: No administrator

Re: [Linux-HA] Antw: Re: Antw: Re: Antw: Re: SLES11 SP3: warning: crm_find_peer: Node 'h01' and 'h01' share the same cluster nodeid: 739512321

2015-01-30 Thread Lars Marowsky-Bree
On 2015-01-30T08:23:14, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Two of the three nodes were actually updated from SP1 via SP2 to SP3, and the third node was installed with SP3. AFAIR there was no configuration change since SP1. That must be incorrect, because: Was the

Re: [Linux-HA] Antw: Re: SLES11 SP3: warning: crm_find_peer: Node 'h01' and 'h01' share the same cluster nodeid: 739512321

2015-01-28 Thread Lars Marowsky-Bree
On 2015-01-28T16:21:23, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Kind of answering my own question: Node id 84939948 in hex is 051014AC, which is 5.16.20.172 where the IP address actually is 172.20.16.5. But I see another node ID of 739512325 (hex 2C141005) which is 44.20.16.5.
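Ulrich's hex arithmetic can be reproduced with plain shell (a sketch using only the values quoted above; the point is that on little-endian hardware the nodeid is stored in host byte order, so the octets come out reversed relative to the interface IP):

```shell
#!/bin/sh
# Node id quoted in the thread, shown as hex and as two dotted quads.
nodeid=84939948
hex=$(printf '%08X' "$nodeid")          # 051014AC

b1=$(( (nodeid >> 24) & 255 ))          # 5
b2=$(( (nodeid >> 16) & 255 ))          # 16
b3=$(( (nodeid >> 8)  & 255 ))          # 20
b4=$((  nodeid        & 255 ))          # 172

stored="$b1.$b2.$b3.$b4"                # 5.16.20.172, as printed in the log
actual="$b4.$b3.$b2.$b1"                # 172.20.16.5, the real interface IP
echo "$hex $stored $actual"
```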

Re: [Linux-HA] Antw: Re: Antw: Re: SLES11 SP3: warning: crm_find_peer: Node 'h01' and 'h01' share the same cluster nodeid: 739512321

2015-01-28 Thread Lars Marowsky-Bree
On 2015-01-28T16:44:34, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: address actually is 172.20.16.5. But I see another node ID of 739512325 (hex 2C141005) which is 44.20.16.5. That seems reversed compared to the above, and the 0x2c doesn't fit anywhere. It does. The

Re: [Linux-HA] Support for DRDB

2015-01-17 Thread Lars Marowsky-Bree
On 2015-01-16T16:25:15, EXTERNAL Konold Martin (erfrakon, RtP2/TEF72) external.martin.kon...@de.bosch.com wrote: I am glad to hear that SLE HA has no plans to drop support for DRBD. Unfortunately I currently cannot disclose who is spreading this false information. Too bad. Do let them

Re: [Linux-HA] Antw: multipath sbd stonith device recommended configuration

2015-01-16 Thread Lars Marowsky-Bree
On 2015-01-16T08:11:48, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! MHO: The correct time to wait is in an interval bounded by these two values: 1: An I/O delay that may occur during normal operation that is never allowed to trigger fencing 2: The maximum value to are
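Ulrich's two bounds map onto sbd's watchdog and msgwait timeouts: the watchdog timeout must be longer than any I/O delay seen during normal operation, and msgwait is conventionally set to at least twice the watchdog timeout. A sketch (the timeout values and the device path are hypothetical, not a recommendation, and the create command is only printed):

```shell
#!/bin/sh
# Hypothetical sbd sizing: watchdog above worst-case benign I/O delay,
# msgwait at twice the watchdog timeout (the usual convention).
watchdog=5
msgwait=$((2 * watchdog))

# Assemble (but do not run) the create command for a placeholder device.
cmd="sbd -d /dev/disk/by-id/EXAMPLE -1 $watchdog -4 $msgwait create"
echo "$cmd"
```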

Re: [Linux-HA] Antw: Re: Antw: multipath sbd stonith device recommended configuration

2015-01-16 Thread Lars Marowsky-Bree
On 2015-01-16T12:22:57, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Unfortunately the SBD syntax is a real mess, and there is no manual page (AFAIK) for SBD. ... because man sbd isn't obvious enough, I guess. ;-) OK, I haven't re-checked recently: You added one! Yes, we

Re: [Linux-HA] Support for DRDB

2015-01-16 Thread Lars Marowsky-Bree
On 2015-01-16T11:56:04, EXTERNAL Konold Martin (erfrakon, RtP2/TEF72) external.martin.kon...@de.bosch.com wrote: I have been told that support for DRBD is supposed to be phased out from both SLES and RHEL in the near future. This is massively incorrect for SLE HA. (drbd is part of the HA

Re: [Linux-HA] SLES11 SP3 compatibility with HP Data Protector 7' Automatic Desaster Recovery Module

2014-12-04 Thread Lars Marowsky-Bree
On 2014-12-04T08:12:28, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Of course HP's software isn't quite flexible here, but maybe a symlink from the old location to the new one wouldn't be bad (for the lifetime of SLES11, maybe)... A symlink might not work, depending on what kind

Re: [Linux-ha-dev] Ordering of clones; does it work?

2014-11-27 Thread Lars Marowsky-Bree
On 2014-11-27T10:10:47, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! I had thought ordering of clones would work, but it looks like it does not in current SLES11 SP3 (1.1.11-3ca8c3b): I have rules like: order ord_DLM_O2CB inf: cln_DLM cln_O2CB order ord_DLM_cLVMd inf:

Re: [Linux-HA] [ha-wg] [Pacemaker] [Cluster-devel] [RFC] Organizing HA Summit 2015

2014-11-26 Thread Lars Marowsky-Bree
On 2014-11-25T16:46:01, David Vossel dvos...@redhat.com wrote: Okay, okay, apparently we have got enough topics to discuss. I'll grumble a bit more about Brno, but let's get the organisation of that thing on track ... Sigh. Always so much work! I'm assuming arrival on the 3rd and departure on

Re: [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015

2014-11-25 Thread Lars Marowsky-Bree
On 2014-11-24T16:16:05, Fabio M. Di Nitto fdini...@redhat.com wrote: Yeah, well, devconf.cz is not such an interesting event for those who do not wear the fedora ;-) That would be the perfect opportunity for you to convert users to Suse ;) I'd prefer, at least for this round, to keep

Re: [Linux-HA] [ha-wg] [ha-wg-technical] [RFC] Organizing HA Summit 2015

2014-11-24 Thread Lars Marowsky-Bree
On 2014-11-11T09:17:56, Fabio M. Di Nitto fdini...@redhat.com wrote: Hey, I know I'm a bit late to the game, but: I'd be happy to meet, yet Brno is not all that easy to reach. There don't appear to be regular flights to BRQ, and it's also quite far by train. Am I missing something obvious

Re: [Linux-HA] [ha-wg] [ha-wg-technical] [RFC] Organizing HA Summit 2015

2014-11-24 Thread Lars Marowsky-Bree
On 2014-11-24T06:59:39, Digimer li...@alteeve.ca wrote: The LINBIT folks suggested to land in Vienna and then it's two hours by road, but I've not looked too closely at it just yet. I'd be happy to meet in Vienna. I'm not keen on first flying to VIE and then spending 2+h on the road/bus.

Re: [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015

2014-11-24 Thread Lars Marowsky-Bree
On 2014-09-08T12:30:23, Fabio M. Di Nitto fdini...@redhat.com wrote: Folks, Fabio, thanks for organizing this and getting the ball rolling. And again sorry for being late to said game; I was busy elsewhere. However, it seems that the idea for such a HA Summit in Brno/Feb 2015 hasn't exactly

Re: [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015

2014-11-24 Thread Lars Marowsky-Bree
On 2014-11-24T15:54:33, Fabio M. Di Nitto fdini...@redhat.com wrote: dates and location were chosen to piggy-back with devconf.cz and allow people to travel for more than just HA Summit. Yeah, well, devconf.cz is not such an interesting event for those who do not wear the fedora ;-) I'd

Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing

2014-10-24 Thread Lars Marowsky-Bree
On 2014-10-23T20:36:38, Lars Ellenberg lars.ellenb...@linbit.com wrote: If we want to require presence of start-stop-daemon, we could make all this somebody else's problem. I need to find some time to browse through the code to see if it can be improved further. But in any case, using (a tool

Re: [Linux-HA] Q: ocf-tester

2014-09-09 Thread Lars Marowsky-Bree
On 2014-09-09T16:03:04, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: I modified the ping RA to meet my needs, and then I used ocf-tester to check it with the settings desired. I'm wondering about the output; shouldn't ocf-tester query the metadata _before_ trying to use the methods,

Re: [Linux-HA] getting proper sources

2014-06-07 Thread Lars Marowsky-Bree
On 2014-06-07T16:13:05, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote: So I'd appreciate it if you'd not make those claims; I admit to feeling slighted. The claim that prompted this was that the level of support a centos user gets is for pacemaker: 50% chance that the Lars over there will ask

Re: [Linux-HA] getting proper sources

2014-06-06 Thread Lars Marowsky-Bree
On 2014-05-31T11:15:20, Dmitri Maziuk dmaz...@bmrb.wisc.edu wrote: Is there a reason you keep spouting nonsense? Yes: I have a memory and it remembers. For example, this: http://www.gossamer-threads.com/lists/linuxha/users/81573?do=post_view_threaded#81573 I don't remember that being an

Re: [Linux-HA] Multiple IP address on ocf:heartbeat:IPaddr2

2014-06-05 Thread Lars Marowsky-Bree
On 2014-06-05T07:41:18, Teerapatr Kittiratanachai maillist...@gmail.com wrote: My /usr/lib/ocf/resource.d/heartbeat/ directory doesn't have the `IPv6addr` agent. But I found that the `IPaddr2` agent also supports IPv6, from the source code in GitHub (IPaddr2
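Assuming that build of resource-agents really does ship IPv6 support in IPaddr2, a v6 address is configured like any other cluster IP (a sketch; the address, NIC, and resource name are invented, older releases still need the separate IPv6addr agent, and the command is only assembled, not applied to a live CIB):

```shell
#!/bin/sh
# Assemble (but do not apply) a hypothetical IPv6 primitive using IPaddr2.
cmd="crm configure primitive p_ip6 ocf:heartbeat:IPaddr2 \
    params ip=2001:db8::10 cidr_netmask=64 nic=eth0 \
    op monitor interval=10s"
echo "$cmd"
```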

Re: [Linux-HA] Heartbeat Supported Version

2014-06-02 Thread Lars Marowsky-Bree
On 2014-06-02T20:37:59, Venkata G Thota venkt...@in.ibm.com wrote: Hello, In our project we had the heartbeat cluster with version heartbeat-2.1.4-0.24.9. Is it the supported version ? Kindly assist how to get support for heartbeat cluster issues. Regards That looks like a fairly

Re: [Linux-HA] Heartbeat Supported Version

2014-06-02 Thread Lars Marowsky-Bree
On 2014-06-02T12:04:23, Digimer li...@alteeve.ca wrote: You should email Linbit (http://linbit.com) as they're the company that still supports the heartbeat package. For completeness, I doubt Linbit will support this version, since 2.1.4 from SLES 10 contains a number of backports from the

Re: [Linux-HA] SBD flipping between Pacemaker: UNHEALTHY and OK

2014-04-25 Thread Lars Marowsky-Bree
On 2014-04-22T14:21:33, Tom Parker tpar...@cbnco.com wrote: Hi Tom, Has anyone seen this? Do you know what might be causing the flapping? No, I've never seen this. Apr 21 22:03:04 qaxen6 sbd: [12974]: info: Waiting to sign in with cluster ... So it connected fine. This is the process

Re: [Linux-HA] Antw: Re: Q: NTP Problem after Xen live migration

2014-04-17 Thread Lars Marowsky-Bree
On 2014-04-17T08:05:43, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: I think Xen live migration and correct time has a lot to do with HA; maybe not with the product you have in mind, but with the concept in general. Sure. But Xen and kernel time keeping developers aren't subscribed to

Re: [Linux-HA] Q: NTP Problem after Xen live migration

2014-04-16 Thread Lars Marowsky-Bree
On 2014-04-16T09:18:21, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: As it turns out, the time in the VMs is actually wrong after migration: # ntpq -pn remote refid st t when poll reach delay offset jitter

Re: [Linux-ha-dev] pull request for sg_persist new RA ocft

2014-03-21 Thread Lars Marowsky-Bree
On 2014-03-18T02:24:51, Liuhua Wang lw...@suse.com wrote: Hi Liuhua, thanks for pushing again! I've taken some time to provide some code review. Overall, I think it looks good, mostly cosmetic and coding style. I'd welcome more insight from others on this list; especially those with maintainer

Re: [Linux-HA] How to tell pacemaker to process a new event during a long-running resource operation

2014-03-17 Thread Lars Marowsky-Bree
On 2014-03-14T15:50:18, David Vossel dvos...@redhat.com wrote: in-flight operations always have to complete before we can process a new transition. The only way we can transition earlier is by killing the in-flight process, which results in failure recovery and possibly fencing depending

Re: [Linux-HA] Antw: resource-agents: exportfs: Unlocking filesystems on stop by default

2014-03-11 Thread Lars Marowsky-Bree
On 2014-03-11T12:37:39, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: I'm wondering: Does unlock mean that all file locks are invalidated? If so, I think it's a bad idea, because for a migration of the NFS server the exports will be stopped/started, thus losing all locks. That's not

Re: [Linux-HA] 2 Nodes split brain, distant sites

2014-02-28 Thread Lars Marowsky-Bree
On 2014-02-27T11:05:21, Digimer li...@alteeve.ca wrote: So regardless of quorum, fencing is required. It is the only way to reliably avoid split-brains. Unfortunately, fencing doesn't work on stretch clusters. For a two node stretch cluster, sbd can also be used reliably as a fencing

Re: [Linux-HA] 2 Nodes split brain, distant sites

2014-02-28 Thread Lars Marowsky-Bree
On 2014-02-28T13:16:33, Digimer li...@alteeve.ca wrote: Assuming a SAN in each location (otherwise you have a single point of failure), then isn't it still possible to end up with a split-brain if/when the WAN link fails? As I suggested a 3rd tie-breaker site (which, in the case of SBD, can

Re: [Linux-HA] Why wont ocf:heartbeat:Xen check the current status?

2014-02-22 Thread Lars Marowsky-Bree
On 2014-02-22T12:35:42, ml ml mliebher...@googlemail.com wrote: Hello List, I have a two-node cluster with Debian 7 and this configuration: node proxy01-example.net node proxy02-example.net primitive login.example.net ocf:heartbeat:Xen \ params xmfile=/etc/xen/login.example.net.cfg \

Re: [Linux-HA] resources don't migrate on failure of one node (in a two node cluster)

2014-02-22 Thread Lars Marowsky-Bree
On 2014-02-22T13:49:40, JR botem...@gmail.com wrote: I've been told by folks on the linux-ha IRC that fencing is my answer and I've put in place the null fence client. I understand that this is not what I'd want in production, but for my testing it seems to be the correct way to test a

Re: [Linux-HA] Antw: Re: Q: crm configure edit/show regex

2014-02-19 Thread Lars Marowsky-Bree
On 2014-02-19T10:31:45, Andrew Beekhof and...@beekhof.net wrote: Unifying this might be difficult, as far as I know pcs doesn't have an interactive mode or anything similar to the configure interface of crmsh.. It does have bash completion for the command line. FWIW, so does the crm shell

Re: [Linux-HA] Why does o2cb RA remove module ocfs2?

2014-02-05 Thread Lars Marowsky-Bree
On 2014-02-05T12:24:00, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: I had a problem where O2CB stop fenced the node that was shut down: I had updated the kernel, and then rebooted. As part of shutdown, the cluster stack was stopped. In turn, the O2CB resource was stopped.

Re: [Linux-HA] Antw: Re: Why does o2cb RA remove module ocfs2?

2014-02-05 Thread Lars Marowsky-Bree
On 2014-02-05T15:06:47, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: I guess the kernel update is more common than just the ocfs2-kmp update Well, some customers do apply updates in the recommended way, and thus don't encounter this ;-) In any case, since at this time the cluster

Re: [Linux-HA] Antw: crm configure stonith:external/vcenter - parameter VI_CREDSTORE does not exist

2014-01-30 Thread Lars Marowsky-Bree
On 2014-01-30T12:19:27, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: root@vm-nas1:~# crm ra info fencing_vcenter stonith:external/vcenter ERROR: stonith:external/vcenter:fencing_vcenter: could not parse meta-data: I guess your RA may be LSB (which is kind of obsolete). Hm? How can

Re: [Linux-HA] Antw: crm configure stonith:external/vcenter - parameter VI_CREDSTORE does not exist

2014-01-30 Thread Lars Marowsky-Bree
On 2014-01-31T07:59:57, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: root@vm-nas1:~# crm ra info fencing_vcenter stonith:external/vcenter ERROR: stonith:external/vcenter:fencing_vcenter: could not parse meta-data: I guess your RA may be LSB (which is kind of obsolete). Hm?

Re: [Linux-HA] Antw: Re: /usr/sbin/lrmadmin missing from cluster-glue

2014-01-27 Thread Lars Marowsky-Bree
On 2014-01-27T08:59:55, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Talking of node-action-limit: I think I read in the syslog (not the best way to document changes) that the migration-limit parameter is obsoleted by node-action-limit in latest SLES. Is that correct? No.

Re: [Linux-HA] /usr/sbin/lrmadmin missing from cluster-glue

2014-01-25 Thread Lars Marowsky-Bree
On 2014-01-24T10:52:56, Tom Parker tpar...@cbnco.com wrote: Thanks Kristoffer. How is tuning done for lrm now? What do you want to tune? The LRM_MAX_CHILDREN setting is still (okay: again ;-), that was broken in one update) honored as before. Or you can use the node-action-limit property in

Re: [Linux-HA] Antw: Re: heartbeat failover

2014-01-24 Thread Lars Marowsky-Bree
On 2014-01-24T08:16:03, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: We have a server with a network traffic light on the front. With corosync/pacemaker the light is constantly flickering, even if the cluster does nothing. So I guess it's normal. Yes. Totem and other components have

Re: [Linux-HA] crmd (?) becomes unresponsive

2014-01-22 Thread Lars Marowsky-Bree
On 2014-01-22T09:55:10, Thomas Schulte tho...@cupracer.de wrote: Hi Thomas, since those are very recent upstream versions, I think you'll have a better chance to ask directly on the pacemaker mailing list, or directly report via bugs.clusterlabs.org - at least for providing the attachments,

Re: [Linux-HA] Antw: Re: crmd (?) becomes unresponsive

2014-01-22 Thread Lars Marowsky-Bree
On 2014-01-22T11:18:06, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: We are living in a very distributed world, even when using one Linux distribution. Maybe those who know could post periodic reminders which problems to post where... I thought that was what I just did? This is

Re: [Linux-HA] Q: fsck.ocfs anyone

2014-01-15 Thread Lars Marowsky-Bree
On 2014-01-15T12:05:22, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Ulrich, please either ask this question to support or at least on the ocfs2 mailing list. We really can't provide enterprise-level support via a generic mailing list. That is not a sustainable business model. And

Re: [Linux-HA] SLE11 SP3: attrd[13911]: error: plugin_dispatch: Receiving message body failed: (2) Library error: Success (0)

2014-01-15 Thread Lars Marowsky-Bree
On 2014-01-15T08:49:55, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: I feel the current clusterstack for SLES11 SP3 has several problems. I'm fighting for a day to get my test cluster up again after having installed the latest updates. I still cannot find out what's going on, but I

Re: [Linux-HA] Antw: Re: Q: fsck.ocfs anyone

2014-01-15 Thread Lars Marowsky-Bree
On 2014-01-15T15:05:02, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: My man at Novell knows about the issue, too ;-) %s/Novell/SUSE/g I understand that Novell does not want to read about bugs in their products in mailinglists, just as customers don't want to see bugs in the products

Re: [Linux-HA] 3 active nodes in 3 node configuration

2014-01-09 Thread Lars Marowsky-Bree
On 2014-01-09T22:38:17, erkin kabataş ekb...@gmail.com wrote: I am using heartbeat-2.1.2-2.i386.rpm, heartbeat-pils-2.1.2-2.i386.rpm, heartbeat-stonith-2.1.2-2.i386.rpm packages on RHEL 5.5. 2.1.2? Seriously, upgrade. You're running code from 2007. I have 3 nodes and I only use cluster IP

Re: [Linux-HA] does heartbeat 3.0.4 use IP aliases under CentOS 6.5?

2014-01-04 Thread Lars Marowsky-Bree
On 2014-01-03T20:56:42, Digimer li...@alteeve.ca wrote: causing a lot of reinvention of the wheel. In the last 5~6 years, both teams have been working hard to unify under one common open-source HA stack. Pacemaker + corosync v2+ is the result of all that hard work. :) Yes. We know finally

Re: [Linux-HA] Antw: Re: Problem with updating the cluster stack (SLES11 SP3) (long)

2014-01-02 Thread Lars Marowsky-Bree
On 2014-01-02T10:17:13, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Are you using the update that was pushed on Friday? The previous As I installed updates on Monday, I guess so ;-) Hm, we're not yet aware of any new or existing bugs in that update. Of course, we'll learn

Re: [Linux-HA] [Pacemaker] crmsh: New syntax for location constraints, suggestions / comments

2013-12-13 Thread Lars Marowsky-Bree
On 2013-12-13T10:16:41, Kristoffer Grönlund kgronl...@suse.com wrote: Lars (lmb) suggested that we might switch to using the { } - brackets around resource sets everywhere for consistency. My only concern with that would be that it would be a breaking change to the previous crmsh syntax.

Re: [Linux-HA] [Pacemaker] crmsh: New syntax for location constraints, suggestions / comments

2013-12-13 Thread Lars Marowsky-Bree
On 2013-12-13T13:51:27, Andrey Groshev gre...@yandex.ru wrote: Just thought that I was missing in location, something like: node=any :) Can you describe what this is supposed to achieve? any is the default for symmetric clusters anyway. Regards, Lars -- Architect Storage/HA SUSE LINUX

Re: [Linux-HA] Antw: Re: SLES11 SP2 HAE: problematic change for LVM RA

2013-12-04 Thread Lars Marowsky-Bree
On 2013-12-04T10:25:58, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: You thought it was working, but in fact it wasn't. ;-) working meaning the resource started. not working meaning the resource does not start You see I have minimal requirements ;-) I'm sorry; we couldn't

Re: [Linux-HA] Antw: Re: SLES11 SP2 HAE: problematic change for LVM RA

2013-12-03 Thread Lars Marowsky-Bree
On 2013-12-02T09:22:10, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: No! Then it can't work. Exclusive activation only works for clustered volume groups, since it uses the DLM to protect against the VG being activated more than once in the cluster. Hi! Try it with

Re: [Linux-HA] SLES11 SP2 HAE: problematic change for LVM RA

2013-11-29 Thread Lars Marowsky-Bree
On 2013-11-29T12:05:28, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! A short notice: We had a problem after updating the resource agents in SLES11 SP2 HAE: A LVM VG would not start after updating the RAs. The primitive had exclusive=true for years, but the current RA requires

Re: [Linux-HA] Antw: Re: cib: [12226]: WARN: cib_diff_notify: Update (client: crmd, call:7137): -1.-1.-1 - 0.620.0 (The action/feature is not supported)

2013-11-29 Thread Lars Marowsky-Bree
On 2013-11-26T12:09:41, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: I saw that I don't have an SBD device any more (it's stopped). Unfortunately I could not start it (crm resource start prm_stonith_sbd). I guess it's due to the fact that the cluster won't start resources until the

Re: [Linux-HA] Antw: Re: SLES11 SP2 HAE: problematic change for LVM RA

2013-11-29 Thread Lars Marowsky-Bree
On 2013-11-29T13:46:17, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: I just did s/true/false/... Was that a clustered volume? Clustered exclusive=true ?? No! Then it can't work. Exclusive activation only works for clustered volume groups, since it uses the DLM to protect
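The dependency being spelled out here (exclusive=true only works on top of a clustered VG, which in turn needs the DLM/clvmd stack running) can be sketched as two steps; the VG and resource names are invented, and the commands are only printed, not executed:

```shell
#!/bin/sh
# Print (don't run) the two steps: mark the VG clustered, then configure
# exclusive activation through the ocf:heartbeat:LVM resource agent.
vg=vg_shared
steps="vgchange -c y $vg
crm configure primitive p_$vg ocf:heartbeat:LVM \
    params volgrpname=$vg exclusive=true"
echo "$steps"
```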

Re: [Linux-HA] Antw: Re: SLES11 SP2 HAE: problematic change for LVM RA

2013-11-29 Thread Lars Marowsky-Bree
On 2013-11-29T13:48:33, Lars Marowsky-Bree l...@suse.com wrote: Was that a clustered volume? Clustered exclusive=true ?? No! Then it can't work. Exclusive activation only works for clustered volume groups, since it uses the DLM to protect against the VG being activated more than once

Re: [Linux-HA] Antw: Re: cib: [12226]: WARN: cib_diff_notify: Update (client: crmd, call:7137): -1.-1.-1 - 0.620.0 (The action/feature is not supported)

2013-11-26 Thread Lars Marowsky-Bree
On 2013-11-26T09:32:50, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Another thing I've noticed: One of our nodes has defective hardware and is down. It was OK all the time with SLES11 SP2, but SP3 now tried to fence the node and got a fencing timeout: Hmmm. Isn't the logic that

Re: [Linux-HA] cib: [12226]: WARN: cib_diff_notify: Update (client: crmd, call:7137): -1.-1.-1 - 0.620.0 (The action/feature is not supported)

2013-11-25 Thread Lars Marowsky-Bree
On 2013-11-25T17:48:25, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi Ulrich, Probably reason: cib: [12226]: ERROR: cib_perform_op: Discarding update with feature set '3.0.7' greater than our own '3.0.6' Is it required to update the whole cluster at once? It shouldn't be, and

Re: [Linux-HA] Antw: Xen XL Resource Agent

2013-11-18 Thread Lars Marowsky-Bree
On 2013-11-15T09:05:53, Tom Parker tpar...@cbnco.com wrote: The XL tools are much faster and lighter weight. I am not sure if they report proper codes (I will have to test) but the XM stack has been deprecated so at some point I assume it will go away completely. The Xen RA already supports

Re: [Linux-HA] ocf:heartbeat/ra's

2013-10-31 Thread Lars Marowsky-Bree
On 2013-10-31T10:54:15, Chuck Smith cgasm...@comcast.net wrote: I have been debugging ocf:heartbeat:anything, can someone point me to the definitive standards for ra handshake, as it appears there are several ported legacy methods that are inconsistent. Also, if you can point me to the top

Re: [Linux-HA] RH and gfs2 and Pacemaker

2013-10-15 Thread Lars Marowsky-Bree
On 2013-10-15T14:15:50, Moullé Alain alain.mou...@bull.net wrote: in fact, I would like to know if someone has configured gfs2 under Pacemaker with the dlm-controld and gfs-controld from cman-3.0.12 rpm (so without any more the dlm-controld.pcml and gfs-controld.pcml) ? And if it works

Re: [Linux-HA] RH and gfs2 and Pacemaker

2013-10-15 Thread Lars Marowsky-Bree
On 2013-10-15T16:25:37, Moullé Alain alain.mou...@bull.net wrote: Hi Lars, thanks a lot for the information. I'll try, but the documentation asks for gfs2-cluster rpm installation, and for now I don't find this rpm on RHEL6.4, and don't know if it is always required ... but it is not in your

Re: [Linux-HA] Antw: General question about heartbeat tokens and node overloaded.

2013-10-02 Thread Lars Marowsky-Bree
On 2013-10-02T09:36:14, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: In general I'm afraid you cannot handle this situation in a perfect way: You have two types of problems: 1) A node, resource, or monitor is hanging, but a long timeout prevents to recognize this in time 2) A

Re: [Linux-HA] Antw: General question about heartbeat tokens and node overloaded.

2013-10-02 Thread Lars Marowsky-Bree
On 2013-10-02T13:40:16, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: There is one notable exception: If you have shared storage (SAN, NAS, NFS), the cause of the slowness may be external to the systems being monitored, thus fencing those will not improve the situation, most likely.

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-01 Thread Lars Marowsky-Bree
On 2013-10-01T00:53:15, Tom Parker tpar...@cbnco.com wrote: Thanks for paying attention to this issue (not really a bug) as I am sure I am not the only one with this issue. For now I have set all my VMs to destroy so that the cluster is the only thing managing them but this is not super
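Setting all VMs to destroy, as described above, happens in the domU configuration: an in-guest reboot then looks to Xen like a clean shutdown, and the cluster's monitor/start cycle brings the domain back instead of Xen rebooting it behind the cluster's back. A sketch (the domain name is an example, and the fragment is written to a temp file rather than to /etc/xen):

```shell
#!/bin/sh
# Hypothetical domU config fragment; written to a temp file for illustration.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
name = "login.example.net"
on_poweroff = "destroy"
on_reboot = "destroy"
on_crash = "destroy"
EOF
# With on_reboot set to destroy, Xen tears the domU down on reboot and
# the cluster restarts it, so cluster and hypervisor never disagree.
grep -c '"destroy"' "$cfg"
```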

Re: [Linux-HA] 2 node clustes with seperate quorum server

2013-09-25 Thread Lars Marowsky-Bree
On 2013-09-24T20:55:40, AvatarSmith cgasm...@comcast.net wrote: I'm having a bit of an issue under centos 6.4 x64. I have two duplicate hardware systems (raid arrays, 10G NICs, etc) configured identically and drbd replication is working fine in the cluster between the two. When I started doing

Re: [Linux-HA] 2 node clustes with seperate quorum server

2013-09-25 Thread Lars Marowsky-Bree
On 2013-09-25T11:00:17, Chuck Smith cgasm...@comcast.net wrote: do act accordingly, for instance, I have raw primitives, add them to a group, then decide to move them to a different group (subject to load order) I can't just remove it from the group and put it in a different one, I have to

Re: [Linux-HA] Xen RA and rebooting

2013-09-17 Thread Lars Marowsky-Bree
On 2013-09-16T16:36:38, Tom Parker tpar...@cbnco.com wrote: Can you kindly file a bug report here so it doesn't get lost https://github.com/ClusterLabs/resource-agents/issues ? Submitted (Issue #308) Thanks. It definitely leads to data corruption and I think has to do with the way that

Re: [Linux-HA] Xen RA and rebooting

2013-09-17 Thread Lars Marowsky-Bree
On 2013-09-17T11:38:34, Ferenc Wagner wf...@niif.hu wrote: On the other hand, doesn't the recover action after a monitor failure consist of a stop action on the original host before the new start, just to make sure? Or maybe I'm confusing things... Yes, it would - but it seems there's a

Re: [Linux-HA] Clone colocation missing? (was: Pacemaker 1.19 cannot manage more than 127 resources)

2013-09-14 Thread Lars Marowsky-Bree
On 2013-09-13T17:48:40, Tom Parker tpar...@cbnco.com wrote: Hi Feri I agree that it should be necessary but for some reason it works well the way it is and everything starts in the correct order. Maybe someone on the dev list can explain a little bit better why this is working. It may

Re: [Linux-HA] Xen RA and rebooting

2013-09-14 Thread Lars Marowsky-Bree
On 2013-09-14T00:28:30, Tom Parker tpar...@cbnco.com wrote: Does anyone know of a good way to prevent pacemaker from declaring a vm dead if it's rebooted from inside the vm. It seems to be detecting the vm as stopped for the brief moment between shutting down and starting up. Hrm. Good

Re: [Linux-HA] FW: Can't seem to shutdown the DC

2013-09-12 Thread Lars Marowsky-Bree
On 2013-09-12T18:14:04, marcy.d.cor...@wellsfargo.com wrote: Hello list, Using SUSE SLES 11 SP2. I have 4 servers in a cluster running cLVM + OCFS2. If I try to shut down the one that is the DC using openais stop, strange things happen resulting in a really messed up cluster. One

Re: [Linux-HA] Problem with crm 1.2.6 rc 3 and older

2013-09-10 Thread Lars Marowsky-Bree
On 2013-09-09T17:22:14, Dejan Muhamedagic deja...@fastmail.fm wrote: a) When pacemaker and all other commandline tool can live nicely with multiple meta-attributes sections (it seems to be allowed by the xml definition) and address all nvpairs just by name beneath this tag, than crm

Re: [Linux-HA] Antw: Re: Max number of resources under Pacemaker ?

2013-09-05 Thread Lars Marowsky-Bree
On 2013-09-04T08:26:14, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: In my experience network traffic grows somewhat linearly with the size of the CIB. At some point you probably have to change communication parameters to keep the cluster in a happy communication state. Yes, I wish

Re: [Linux-HA] Antw: Quick 'death match cycle' question.

2013-09-03 Thread Lars Marowsky-Bree
On 2013-09-03T10:25:58, Digimer li...@alteeve.ca wrote: I've run only 2-node clusters and I've not seen this problem. That said, I've long-ago moved off of openais in favour of corosync. Given that membership is handled there, I would look at openais as the source of your trouble. This is,

Re: [Linux-HA] Antw: Quick 'death match cycle' question.

2013-09-03 Thread Lars Marowsky-Bree
On 2013-09-03T13:04:52, Digimer li...@alteeve.ca wrote: My mistake then. I had assumed that corosync was just a stripped down openais, so I figured openais provided the same functions. My personal experience with openais is limited to my early days of learning HA clustering on EL5. Yes and

Re: [Linux-HA] Quick 'death match cycle' question.

2013-09-03 Thread Lars Marowsky-Bree
On 2013-09-03T21:14:02, Vladislav Bogdanov bub...@hoster-ok.com wrote: To solve problem 2, simply disable corosync/pacemaker from starting on boot. This way, the fenced node will be (hopefully) back up and running, so you can ssh into it and look at what happened. It won't try to rejoin
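On the init systems of that era, keeping the stack from starting at boot is a pair of one-liners per service (a sketch; service names differ between distributions, and the commands are only collected and printed here, not executed):

```shell
#!/bin/sh
# Collect (don't execute) the disable commands for both init flavours,
# so a fenced node stays out of the cluster after it reboots.
out=""
for svc in corosync pacemaker; do
    out="$out
chkconfig $svc off        # SysV init (e.g. SLES 11, EL6)
systemctl disable $svc    # systemd-based systems"
done
echo "$out"
```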

Re: [Linux-HA] Pacemaker 1.19 cannot manage more than 127 resources

2013-08-30 Thread Lars Marowsky-Bree
On 2013-08-29T15:49:30, Tom Parker tpar...@cbnco.com wrote: Hello. Last night I updated my SLES 11 servers to HAE-SP3 which contains the following versions of software: Could you kindly file a report via NTS? That's the way to get official and timely support for SLE HA. (I don't mean to cut

Re: [Linux-ha-dev] crmsh error : cib-bootstrap-options already exist

2013-08-29 Thread Lars Marowsky-Bree
On 2013-08-28T20:13:43, Dejan Muhamedagic de...@suse.de wrote: A new RC has been released today. It contains both fixes. It doesn't do atomic updates anymore, because cibadmin or something cannot stomach comments. Couldn't find the upstream bug report :-( Can you give me the pacemaker bugid,

Re: [Linux-HA] establishing a new resource-agent package provider

2013-08-13 Thread Lars Marowsky-Bree
On 2013-08-13T20:53:13, Andrew Beekhof and...@beekhof.net wrote: I'd: - Rename the provider to core - Rework our own documentation and as we find it - Transparently support references to ocf:heartbeat forever: - Re-map s/heartbeat/core/ in the LRM (silently, or it'd get really

Re: [Linux-HA] Antw: Re: beating a dead horse: cLVM, OCFS2 and TOTEM

2013-07-12 Thread Lars Marowsky-Bree
On 2013-07-12T11:05:32, Wengatz Herbert herbert.weng...@baaderbank.de wrote: Seeing the high dropping quote... (just compare this to the other NIC) - have you tried a new cable? Maybe it's a cheap hardware problem... The drop rate is normal. A slave NIC in a bonded active/passive
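The claim above can be checked directly on the host; a sketch (the bond and slave interface names are assumptions):

```shell
# The backup slave of an active/passive bond receives frames it does
# not forward up the stack, so its drop counter grows; this is expected.
cat /proc/net/bonding/bond0     # shows bonding mode and the active slave
ip -s link show dev eth1        # per-NIC statistics, including dropped
```

If the drops accumulate only on the currently passive slave, the hardware is fine and no cable swap is warranted.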

Re: [Linux-HA] Antw: crm resource restart is broken (d4de3af6dd33)

2013-07-12 Thread Lars Marowsky-Bree
On 2013-07-12T12:19:40, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: BTW: The way resource restart is implemented (i.e.: stop, wait, then start) has a major problem: If the stop causes the node where the crm command is running to be fenced, the resource will remain stopped even after the

Re: [Linux-HA] Antw: crm resource restart is broken (d4de3af6dd33)

2013-07-12 Thread Lars Marowsky-Bree
On 2013-07-12T12:26:18, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: (Another way to trigger a restart is to modify the instance parameters. Set __manual_restart=1 and it'll restart.) once? ;-) Keep increasing it. ;-) -- Architect Storage/HA SUSE LINUX Products GmbH, GF: Jeff
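The instance-parameter trick mentioned above can be sketched with crm_resource (the resource name is an assumption; any change to an instance attribute makes the policy engine restart the resource):

```shell
# Bump a dummy instance parameter so the cluster sees a configuration
# change and restarts the resource:
crm_resource --resource my-rsc \
             --set-parameter __manual_restart \
             --parameter-value 1
# To trigger it again later, keep increasing the value (2, 3, ...).
```

Since the parameter is not consumed by the resource agent, the only effect is the restart itself.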

Re: [Linux-HA] Antw: Re: beating a dead horse: cLVM, OCFS2 and TOTEM

2013-07-11 Thread Lars Marowsky-Bree
On 2013-07-11T08:41:33, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: For a really silly idea, but can you swap the network cards for a test? Say, with Intel NICs, or even another Broadcom model? Unfortunately no: The 4-way NIC is onboard, and all slots are full. Too bad. But then

Re: [Linux-HA] Antw: Re: Master/Slave status check using crm_mon

2013-07-10 Thread Lars Marowsky-Bree
On 2013-07-10T13:26:32, John M john332...@gmail.com wrote: Current application supports only the Master/Slave configuration and there can be one master and one slave process in a group. A cluster can host multiple groups. You could, indeed, group your systems into 3 or 5 node clusters, and

Re: [Linux-HA] beating a dead horse: cLVM, OCFS2 and TOTEM

2013-07-10 Thread Lars Marowsky-Bree
On 2013-07-10T08:31:17, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: I had reported about terrible performance of cLVM (maybe related to using OCFS2 also) when used in SLES11 SP2. I guessed cLVM (or OCFS2) is communicating itself to death on activity. Now I have some interesting news: No,

Re: [Linux-HA] Antw: Re: beating a dead horse: cLVM, OCFS2 and TOTEM

2013-07-10 Thread Lars Marowsky-Bree
On 2013-07-10T14:33:12, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Network problems in hypervisors though also have a tendency to be, well, due to the hypervisor, or some network cards (broadcom?). Yes: driver: bnx2 version: 2.1.11 firmware-version: bc 5.2.3 NCSI 2.0.12 For

Re: [Linux-HA] Best version of Heartbeat + Pacemaker

2013-07-09 Thread Lars Marowsky-Bree
On 2013-07-08T22:35:31, Digimer li...@alteeve.ca wrote: As for multi-DC support, watch the booth project. It's supposed to bring stretch clustering to corosync + pacemaker. Stretch clustering is already possible and supported (depending on whom you ask; it is on SLE HA) with corosync. booth

Re: [Linux-HA] Antw: Re: Master/Slave status check using crm_mon

2013-07-09 Thread Lars Marowsky-Bree
On 2013-07-09T20:06:45, John M john332...@gmail.com wrote: Now I want to know 1. Can I use a node which is part of another cluster as a quorum node? 2. Can I configure a standalone quorum node that can manage 25 clusters? No. Using the quorum node approach, a node can only ever be part of

Re: [Linux-HA] Antw: Re: Master/Slave status check using crm_mon

2013-07-09 Thread Lars Marowsky-Bree
On 2013-07-09T23:11:01, John M john332...@gmail.com wrote: So STONITH/fencing is the only option? A quorum node is no alternative to fencing, anyway. Regards, Lars -- Architect Storage/HA SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG

Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-05 Thread Lars Marowsky-Bree
On 2013-07-05T19:06:54, Vladislav Bogdanov bub...@hoster-ok.com wrote: params #merge param1=value1 param2=value2 meta #replace ... utilization #keep and so on. With default to #replace? Even more. If we allow such meta lexemes anywhere (not only at the very beginning), then

Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-03 Thread Lars Marowsky-Bree
On 2013-07-03T00:20:19, Vladislav Bogdanov bub...@hoster-ok.com wrote: I do not edit them. I my setup I generate full crm config with template-based framework. And then you do a load/replace? Tough; yes, that'll clearly overwrite what is already there and added by scripts that more dynamically

Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-03 Thread Lars Marowsky-Bree
On 2013-07-03T10:26:09, Dejan Muhamedagic deja...@fastmail.fm wrote: Not sure that is expected by most people. How you then delete attributes? Tough call :) Ideas welcome. Set them to an empty string, or a magic #undef value. It's not only for the nodes. Attributes of resources should be
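For node attributes at least, outright deletion already has a direct command, which is the semantics the empty-string / #undef discussion above is trying to express in crmsh syntax. A sketch (node and attribute names are assumptions):

```shell
# Delete a node attribute rather than setting it to an empty value:
crm_attribute --node node1 --name myattr --delete
# Short form:
#   crm_attribute -N node1 -n myattr -D
```

The open question in the thread is how a declarative "load update" should encode such a deletion, since an absent nvpair is ambiguous between "keep" and "remove".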

Re: [Linux-HA] Backing out of HA

2013-07-02 Thread Lars Marowsky-Bree
On 2013-07-01T16:31:13, William Seligman selig...@nevis.columbia.edu wrote: a) people can exclaim You fool! and point out all the stupid things I did wrong; b) sysadmins who are contemplating the switch to HA have additional points to add to the pros and cons. I think you bring up an

Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-02 Thread Lars Marowsky-Bree
On 2013-07-02T11:05:01, Vladislav Bogdanov bub...@hoster-ok.com wrote: One thing I see immediately, is that node utilization attributes are deleted after I do 'load update' with empty node utilization sections. That is probably not specific to this patch. Yes, that isn't specific to that. I

Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-02 Thread Lars Marowsky-Bree
On 2013-07-02T13:14:48, Vladislav Bogdanov bub...@hoster-ok.com wrote: Yes, that's exactly what you need here. I know, but I do not expect that to be implemented soon. crm_attribute -l reboot -z doesn't strike me as an unlikely request. You could file an enhancement request for that. But
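The request discussed above combines two crm_attribute switches that each work on their own; a command fragment showing the pieces (attribute names and values are assumptions):

```shell
# Permanent utilization attribute (supported):
crm_attribute -N node1 -z -n capacity -v 10
# Transient, reboot-lifetime plain attribute (supported):
crm_attribute -N node1 -l reboot -n myattr -v 1
# The combination "-l reboot -z" (a transient utilization attribute)
# is what the thread asks to have implemented.
```

Hence the suggestion to file it as an enhancement request rather than treat it as a bug.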
