[Linux-HA] Antw: Re: Comments inside CRM config?

2011-02-02 Thread Ulrich Windl
Dejan Muhamedagic deja...@fastmail.fm schrieb am 02.02.2011 um 13:34 in Nachricht 20110202123432.GD32096@rondo.homenet: Hi, On Tue, Feb 01, 2011 at 09:46:59PM +0100, Matthias Ferdinand wrote: Hi, I tried to use comments inside CRM configuration as per

[Linux-HA] pacemaker/HealthCPU

2011-02-03 Thread Ulrich Windl
Hi! I'm starting to explore Linux-HA. Examining one of the monitors, I think things could be made much more efficient. For example: To get the percent of idle CPU the monitor uses 4 processes: top -b -n2 | grep Cpu | tail -1 | awk -F,|\.[0-9]%id '{ print $4 }' However awk can do the effect of
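
The idea raised in this post, getting the idle CPU percentage without a four-process pipeline, can also be sketched by sampling /proc/stat twice and comparing the counters. The following is a minimal sketch in Python, not the code of the HealthCPU monitor; the function names are made up for illustration, and it assumes a Linux system, because /proc/stat is not available on FreeBSD or Solaris.

```python
import time


def parse_cpu_line(line):
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Counters: user nice system idle iowait irq softirq steal.
    # Guest time is already contained in user/nice, so it is not added again.
    values = [int(v) for v in fields[1:9]]
    if len(values) < 4:
        raise ValueError("cpu line has too few counters")
    return values[3], sum(values)


def idle_percent(before, after):
    """Idle percentage between two samples of the aggregate 'cpu' line."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    delta_total = total1 - total0
    if delta_total <= 0:
        raise ValueError("samples must be taken at different times")
    return 100.0 * (idle1 - idle0) / delta_total


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate 'cpu' line (Linux only)."""
    with open(path) as stat:
        for line in stat:
            if line.startswith("cpu "):
                return line
    raise RuntimeError("no aggregate cpu line found")


def sample_idle(interval=1.0):
    """Take two samples 'interval' seconds apart and return the idle percentage."""
    first = read_cpu_line()
    time.sleep(interval)
    return idle_percent(first, read_cpu_line())
```

Calling sample_idle() on a Linux node returns a float between 0 and 100. Only the idle counter is treated as idle here, so I/O wait counts as busy time, which matches the id column that top prints.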

[Linux-HA] Antw: Re: [Pacemaker] Solved: SLES 11 HAE SP1 Signon to CIB Failed

2011-02-03 Thread Ulrich Windl
Tim Serong tser...@novell.com schrieb am 03.02.2011 um 12:11 in Nachricht 4d4b1a08020a00035...@novprvlin0050.provo.novell.com: On 2/3/2011 at 08:47 PM, darren.mans...@opengi.co.uk wrote: On Fri, Jan 28, 2011 at 1:06 PM, darren.mans...@opengi.co.uk wrote: [...] It seems the example

[Linux-HA] Antw: Re: pacemaker/HealthCPU

2011-02-03 Thread Ulrich Windl
Michael Schwartzkopff mi...@clusterbau.com schrieb am 03.02.2011 um 13:09 in Nachricht 201102031309.04931.mi...@clusterbau.com: On Thursday 03 February 2011 12:35:34 Ulrich Windl wrote: Hi! I'm starting to explore Linux-HA. Examining one of the monitors, I think things could be made

[Linux-HA] Antw: Re: pacemaker/HealthCPU

2011-02-03 Thread Ulrich Windl
Soffen, Matthew msof...@iso-ne.com schrieb am 03.02.2011 um 16:35 in Nachricht e847bfef193361409d48010ec8ace3bc01703...@exchangebe.iso-ne.com: Morning All, Please also keep in mind that /proc/stat is ONLY in Linux, and Linux-HA, despite the name, is also used on FreeBSD and Solaris. Hi! Good

[Linux-HA] Antw: corosync + pacemaker on FC13 problem

2011-02-03 Thread Ulrich Windl
Hi! I'd guess /var/log/messages is the place to look first. Regards, Ulrich Linux Cook linuxc...@gmail.com schrieb am 04.02.2011 um 08:25 in Nachricht AANLkTikRGKUPqpg8MW7NkFTHUi=0hyp8pceyhh7ig...@mail.gmail.com: hi! I need help with my corosync and pacemaker configuration. Corosync is

[Linux-HA] Antw: Re: pacemaker/HealthCPU

2011-02-04 Thread Ulrich Windl
20110204122642.GG10069@barkeeper1-xen.linbit: On Thu, Feb 03, 2011 at 01:09:04PM +0100, Michael Schwartzkopff wrote: On Thursday 03 February 2011 12:35:34 Ulrich Windl wrote: Hi! I'm starting to explore Linux-HA. Examining one of the monitors, I think things could be made much more efficient

[Linux-HA] Antw: Re: stonith resource in RHEL6

2011-02-07 Thread Ulrich Windl
Andrew Beekhof and...@beekhof.net schrieb am 07.02.2011 um 13:54 in Nachricht AANLkTim4C5ZqhP-52wYEc0zTfAgTCTmCZtnk=y-we...@mail.gmail.com: On Mon, Feb 7, 2011 at 1:04 PM, ap bi...@antworte.me wrote: Hi list, I have pacemaker-1.1.2, corosync-1.2.3 and try to implement a meatware

[Linux-HA] Q: Clusters and Mirroring

2011-02-07 Thread Ulrich Windl
Hi, I have a typical beginner's question: If I want to mirror a filesystem that is available on multiple cluster nodes, I cannot use MD-RAID, because that might corrupt data if activated on multiple nodes simultaneously. OTOH when using LVM mirroring (cLVM) I have two choices: 1) --mirrorlog

[Linux-HA] Documentation for external/vmware STONITH device

2011-02-08 Thread Ulrich Windl
Hi! The documentation for external/vmware (SLES11 SP1) is sub-minimal: # stonith -t external/vmware -h ** INFO: Cannot get rhcs plugin subplugins STONITH Device: external/vmware - VMware Server STONITH device For more information see http://www.vmware.com/ List of valid parameter names for

[Linux-HA] Antw: Could not connect to the CIB: Remote node did not respond

2011-02-09 Thread Ulrich Windl
liang...@asc-csa.gc.ca schrieb am 09.02.2011 um 16:00 in Nachricht 253e072d9bb4e94782e852095edbc596030b1...@excsth01.csa.space.gc.ca: [...] Call cib_replace failed (-41): Remote node did not respond null [...] Hi, I cannot help on the issue (I'm still at the theory phase with Linux HA), but

[Linux-HA] One-Node-Cluster

2011-02-14 Thread Ulrich Windl
Andrew Beekhof and...@beekhof.net schrieb am 14.02.2011 um 10:08 in Nachricht aanlktinuc9_oqpwjubxrdmqkncqvnqx68a_1kbqss...@mail.gmail.com: [...] The log just keeps on saying: Feb 8 16:01:03 dmcs2 pengine: [1480]: WARN: cluster_status: We do not have quorum - fencing and resource

[Linux-HA] Antw: Re: fsck filesystem?

2011-03-14 Thread Ulrich Windl
Dejan Muhamedagic deja...@fastmail.fm schrieb am 21.02.2011 um 17:43 in Nachricht 20110221164331.GA3603@squib: Hi, On Fri, Feb 18, 2011 at 11:56:49AM -0500, Tony Nelson wrote: Hi All, I have a small cluster configured like this: [-- config -]

[Linux-HA] Antw: Sort of crm commandes but off line ?

2011-03-24 Thread Ulrich Windl
For what it's worth: With HP-UX ServiceGuard you can query the cluster on a node that is configured as a member of the cluster but is currently halted (down). I guess the command temporarily launches some communication component to query the information from another node... Regards, Ulrich

[Linux-HA] Antw: Re: DRBD and pacemaker interaction

2011-03-28 Thread Ulrich Windl
Lars Ellenberg lars.ellenb...@linbit.com schrieb am 26.03.2011 um 00:10 in Nachricht 20110325231023.GK24099@barkeeper1-xen.linbit: [...] Pacemaker is not a substitute for proper monitoring (nagios, whatever). [...] AFAIK, pacemaker starts a monitor program every now and then to check the

Re: [Linux-HA] Antw: Sort of crm commandes but off line ?

2011-03-28 Thread Ulrich Windl
. Maybe crm can find out from the XML configuration file what nodes to query for the cluster status (assuming communication for the cluster is still up). Ulrich Ulrich Windl a écrit : For what it's worth: With HP-UX ServiceGuard you can query the cluster on a node that is configured

[Linux-HA] Antw: Re: Question about max_child_count

2011-04-05 Thread Ulrich Windl
Dejan Muhamedagic deja...@fastmail.fm schrieb am 04.04.2011 um 13:56 in Nachricht 20110404115618.GD3553@squib: Hi, On Mon, Apr 04, 2011 at 08:32:10AM +0200, Alain.Moulle wrote: Hi I got a strange message about ... max_child_count (4) reached, postponing execution of operation stop

[Linux-HA] Antw: Re: op monition on-fail option

2011-04-05 Thread Ulrich Windl
Dejan Muhamedagic deja...@fastmail.fm schrieb am 04.04.2011 um 14:51 in Nachricht 20110404125103.GK3553@squib: Hi, On Tue, Mar 22, 2011 at 06:39:08PM +0200, Pavlos Polianidis wrote: Dear all, I am looking for a way to add/modify an op-option op monitor on-fail=restart through a

Re: [Linux-HA] Antw: Re: op monition on-fail option

2011-04-05 Thread Ulrich Windl
Dejan Muhamedagic deja...@fastmail.fm schrieb am 05.04.2011 um 13:44 in Nachricht 20110405114447.GG17807@rondo.homenet: Hi, On Tue, Apr 05, 2011 at 08:25:21AM +0200, Ulrich Windl wrote: Dejan Muhamedagic deja...@fastmail.fm schrieb am 04.04.2011 um 14:51 in Nachricht 20110404125103

Re: [Linux-HA] Antw: Re: op monition on-fail option

2011-04-06 Thread Ulrich Windl
Dejan Muhamedagic deja...@fastmail.fm schrieb am 05.04.2011 um 16:40 in Nachricht 20110405144030.GB3505@squib: On Tue, Apr 05, 2011 at 03:58:39PM +0200, Ulrich Windl wrote: Dejan Muhamedagic deja...@fastmail.fm schrieb am 05.04.2011 um 13:44 in Nachricht 20110405114447.GG17807

[Linux-HA] Robustness of crm

2011-04-06 Thread Ulrich Windl
Hi, found in SLES11 SP1 with all current updates (pacemaker-1.1.5-5.5.5): # crm crm(live)# configure crm(live)configure# template crm(live)configure template# list Traceback (most recent call last): File "/usr/sbin/crm", line 45, in <module> main.run() File

[Linux-HA] Making crm quit (#2)

2011-04-06 Thread Ulrich Windl
Hi! I just managed to make crm of SLES11 SP1 quit again: # crm crm(live)# configure crm(live)configure# primitive prm_OCF1_dlm ocf:pacemaker::controld op monitor interval=60 timeout=60 Traceback (most recent call last): File "/usr/sbin/crm", line 45, in <module> main.run() File

[Linux-HA] Antw: Re: Making crm quit (#2)

2011-04-07 Thread Ulrich Windl
Dejan Muhamedagic deja...@fastmail.fm schrieb am 06.04.2011 um 16:48 in Nachricht 20110406144816.GC3673@squib: Hi, On Wed, Apr 06, 2011 at 03:49:58PM +0200, Ulrich Windl wrote: Hi! I just managed to make crm of SLES11 SP1 quit again: Amazing :) # crm crm(live)# configure crm

[Linux-HA] Q: Faulty corosync ring

2011-04-13 Thread Ulrich Windl
Hi! I have a question: corosync reports a faulty redundant ring: # corosync-cfgtool -s Printing ring status. Local node ID 17831084 RING ID 0 id = 172.20.16.1 status = ring 0 active with no faults RING ID 1 id = 10.2.2.1 status = Marking seqid 112097

[Linux-HA] Antw: Q: Faulty corosync ring

2011-04-13 Thread Ulrich Windl
Hi! Found the answer: Some nodes were disagreeing on the netmasks for 10.2.2.x... Regards, Ulrich Ulrich Windl ulrich.wi...@rz.uni-regensburg.de schrieb am 13.04.2011 um 16:38 in Nachricht 4da5d1a202a15...@gwsmtp1.uni-regensburg.de: Hi! I have a question: corosync reports a faulty
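
Why disagreeing netmasks can upset a redundant ring can be illustrated with a small check: two nodes that use the same 10.2.2.x addresses but different prefix lengths derive different networks. The sketch below uses Python's standard ipaddress module; the prefix lengths and the second node's address are assumptions chosen for illustration, since the post does not state the actual netmasks, and the helper name is hypothetical.

```python
import ipaddress


def same_network(*interfaces):
    """Return True if all 'address/prefix' strings belong to one identical network."""
    networks = {ipaddress.ip_interface(item).network for item in interfaces}
    return len(networks) == 1


# Hypothetical ring 1 settings: both nodes agree on a /24 netmask.
print(same_network("10.2.2.1/24", "10.2.2.2/24"))

# Hypothetical misconfiguration: one node uses /16, so its view of the
# network (10.2.0.0/16) differs from the other node's (10.2.2.0/24).
print(same_network("10.2.2.1/24", "10.2.2.2/16"))
```

Running such a comparison over the interface settings collected from every node is one possible way to spot this kind of mismatch before corosync marks the ring faulty.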

[Linux-HA] XEN NPIV with Brocade bfa driver anyone?

2011-04-26 Thread Ulrich Windl
Hi! I just found out that XEN4's NPIV (Fibre Channel N_Port ID Virtualization) does not work with Brocade's bfa driver in SLES 11 SP1. That is because of non-standard sysfs entries being used for virtual ports (similar to Emulex, but still different). I wonder whether anybody did hack the

[Linux-HA] ocf:pacemaker:ping: dampen

2011-04-29 Thread Ulrich Windl
Hi, I think the description for dampen in OCF:pacemaker:ping (pacemaker-1.1.5-5.5.5 of SLES11 SP1) is too terse: <parameter name="dampen" unique="0"> <longdesc lang="en"> The time to wait (dampening) further changes occur </longdesc> <shortdesc lang="en">Dampening interval</shortdesc> <content type="integer"

[Linux-HA] Antw: ocf:pacemaker:ping: dampen

2011-04-29 Thread Ulrich Windl
Update: I found out that the parameter is passed to attrd_updater as -d, where the documentation is just as bad: -d, --delay=value The time to wait (dampening) in seconds further changes occur I found no manual page for it. Regards, Ulrich Ulrich Windl ulrich.wi...@rz.uni

[Linux-HA] Antw: Re: ocf:pacemaker:ping: dampen

2011-05-02 Thread Ulrich Windl
Andrew Beekhof and...@beekhof.net schrieb am 29.04.2011 um 09:31 in Nachricht BANLkTi=-ftyk9uxcgu0m2wqhquu_rt8...@mail.gmail.com: On Fri, Apr 29, 2011 at 9:27 AM, Dominik Klein d...@in-telegence.net wrote: It waits $dampen before changes are pushed to the cib. So that eventually occurring

Re: [Linux-HA] Antw: Re: ocf:pacemaker:ping: dampen

2011-05-03 Thread Ulrich Windl
Andrew Beekhof and...@beekhof.net schrieb am 02.05.2011 um 13:20 in Nachricht banlktimmruow2ldzsrzlmb1wwy9hpp4...@mail.gmail.com: On Mon, May 2, 2011 at 8:27 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Andrew Beekhof and...@beekhof.net schrieb am 29.04.2011 um 09:31

[Linux-HA] Antw: HA servers rebooting

2011-05-10 Thread Ulrich Windl
Hi! I wonder: Are you trying to create a snapshot of a VG that is activated on another node? A snapshot modifies the data blocks of the original VG, so I wouldn't be surprised if the kernel crashes then. Regards, Ulrich Brent Clark 06.05.11 11.44 Uhr Hiya I'm wondering if someone could share

[Linux-HA] Antw: Re: Massive amount of log messages after node failure

2011-05-17 Thread Ulrich Windl
Hi! I think that pacemaker is logging too much all the time, so you can hardly find out if there really is a problem. For example, external/sbd is logging a message every time the shared disk is OK, that is every 30s or so. In contrast (just for example) HP ServiceGuard only logs if the status

Re: [Linux-HA] Antw: Re: Massive amount of log messages after node failure

2011-05-18 Thread Ulrich Windl
Lars Marowsky-Bree l...@suse.de schrieb am 17.05.2011 um 22:39 in Nachricht 20110517203904.gj4...@suse.de: On 2011-05-17T17:16:51, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: I think that pacemaker is logging too much all the time, so you can hardly find out if there really

[Linux-HA] Antw: HA Nodes Port 691 UDP

2011-05-18 Thread Ulrich Windl
Randy Katz rk...@simplicityhosting.com schrieb am 18.05.2011 um 14:31 in Nachricht 4dd3bc2e.2080...@simplicityhosting.com: Hi, does anyone on this list know why there are UDP requests on port 691 of the HA nodes? I turned on firewalling and my crm_mon would not show both nodes' status until I

Re: [Linux-HA] Antw: Re: Massive amount of log messages after node failure

2011-05-18 Thread Ulrich Windl
Dejan Muhamedagic deja...@fastmail.fm schrieb am 18.05.2011 um 16:47 in Nachricht 20110518144724.GA3661@squib: Hi, On Wed, May 18, 2011 at 09:03:29AM +0200, Ulrich Windl wrote: Lars Marowsky-Bree l...@suse.de schrieb am 17.05.2011 um 22:39 in Nachricht 20110517203904.gj4...@suse.de

[Linux-HA] Antw: Re: SFEX questions

2011-05-19 Thread Ulrich Windl
NAKAHIRA Kazutomo nakahira.kazut...@oss.ntt.co.jp schrieb am 19.05.2011 um 06:39 in Nachricht 4dd49f06.8070...@oss.ntt.co.jp: Hi, Ulrich Hi! Thanks for answering. However there's one thing I don't understand: [...] 3) If I want to protect an MD-RAID1, would it work if I partition the leg

[Linux-HA] Setting up SBD resources in SLES11

2011-05-19 Thread Ulrich Windl
Hi! I doubted that setting up the SBD resources is described correctly in the High Availability Guide of SLES 11. My comment (to Novell, I think) was: Shouldn't there be a resource per node? Following the procedure, the resource just starts on an arbitrary node. If one primitive per node,

[Linux-HA] Antw: Re: SFEX questions

2011-05-19 Thread Ulrich Windl
Hi! While I'm at it: I read README.sfex as well, and I'm going to provide some suggestions (error correction, suggestions for wording) for it. Sorry for the PDF (150kB) with the comments; it's OCR'ed. Regards, Ulrich

[Linux-HA] SBD and SFEX on one shared (aprtitioned) disk?

2011-05-19 Thread Ulrich Windl
Hi! From what I've read about SBD and SFEX, I could use one disk for both of them, if SBD and SFEX each get a partition on the disk. Right? Reason: The minimum disk size on our SAN is 1GB, and it's quite wasteful to have 1GB just for SBD. Doing some calculation, 1MB for SBD should be enough for

[Linux-HA] Resource fencing with two SFEX disks?

2011-05-19 Thread Ulrich Windl
Hi! Another question on SFEX: Recent SBD allows multiple (1-3) devices, and SFEX could be used similarly, but I wonder how to express a majority of sfex primitives in CRM (what I have in mind is this: When using two SFEX disks, the resources should stay up if one of those disks fails, but not

[Linux-HA] SLES11 SP1 ocf:nfsserver

2011-05-19 Thread Ulrich Windl
Hi! While wondering why my ocf:nfsserver won't start, I tested the RA. Wondering: # OCF_ROOT=/usr/lib/ocf OCF_RESKEY_nfs_ip=172.20.16.208 OCF_RESKEY_nfs_shared_infodir=/exports/home/NFS /usr/lib/ocf/resource.d/heartbeat/nfsserver status usage:

[Linux-HA] Where did coros...@lists.osdl.org go?

2011-05-20 Thread Ulrich Windl
Hi! The corosync-overview man page still has this address, but that seems to have gone: lists.linux-foundation.org[140.211.169.51] said: 550 5.1.1 coros...@lists.osdl.org... User unknown Does anybody know the current address? I hope the project is not dead... Regards, Ulrich

[Linux-HA] Antw: Re: DO NOT start using heartbeat 2.x in crm mode, but just use Pacemaker, please! [was: managing resource httpd in heartbeat]

2011-05-20 Thread Ulrich Windl
Lars Marowsky-Bree l...@suse.de schrieb am 19.05.2011 um 13:02 in Nachricht 20110519110256.gl26...@suse.de: [...] Of course. And while our esteemed SLES10 customers are still fully supported on our maintained 2.1.4-fixed version, I personally believe everyone should move swiftly to a newer

[Linux-HA] Antw: Re: SBD and SFEX on one shared (aprtitioned) disk?

2011-05-20 Thread Ulrich Windl
Lars Marowsky-Bree l...@suse.de schrieb am 19.05.2011 um 13:03 in Nachricht 20110519110338.gm26...@suse.de: On 2011-05-19T11:24:23, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! From what I've read about SBD and SFEX, I could use one disk for both of them, if SBD

[Linux-HA] Antw: Re: Setting up SBD resources in SLES11

2011-05-20 Thread Ulrich Windl
Lars Marowsky-Bree l...@suse.de schrieb am 19.05.2011 um 13:15 in Nachricht 20110519111526.gn26...@suse.de: On 2011-05-19T09:19:42, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! I doubted that setting up the SBD resources is described correctly in the High

[Linux-HA] SLES11 SP1: bug in crm shell completion

2011-05-20 Thread Ulrich Windl
Hi! The crm shell of SLES11 SP1 has the following auto-completion bug: After defining new primitives in crm configure, the new primitives don't show in completion after commit until the configure context is re-entered (e.g. by up, configure). While talking about completion: If I enter del foo,

[Linux-HA] ocf:heartbeat:exportfs

2011-05-23 Thread Ulrich Windl
Hi! Do you use ocf:heartbeat:exportfs? It seems you cannot specify multiple clients using the same options (i.e.: clientspec=h1,h2,h3). So you need to specify a rather large number of primitives if you cannot use reasonable pattern matching. Any good solutions? Regards, Ulrich

[Linux-HA] Q: Load balancing using utilization

2011-05-23 Thread Ulrich Windl
Hi! I have defined nodes with a CPU and RAM utilization value, and I have defined a resource that uses CPU and RAM. I'd expect to see the node's utilization (remaining capacity) reduced by the amount that the running resources specify, but that doesn't seem to be the case. What could be wrong? I

Re: [Linux-HA] Antw: Re: SBD and SFEX on one shared (aprtitioned) disk?

2011-05-23 Thread Ulrich Windl
Lars Marowsky-Bree l...@suse.de schrieb am 23.05.2011 um 13:06 in Nachricht 20110523110652.gb29...@suse.de: On 2011-05-20T08:16:23, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Well, yes. I'm not quite sure why you'd want to use sfex though if you have sbd fencing anyway

[Linux-HA] Antw: Re: ocf:IPaddr: Iflabel seems to be ignored

2011-05-23 Thread Ulrich Windl
Lars Marowsky-Bree l...@suse.de schrieb am 23.05.2011 um 13:13 in Nachricht 20110523111348.gd29...@suse.de: On 2011-05-12T11:28:49, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: So I guess it simply does not work (or more likely: it never was implemented

[Linux-HA] Antw: Re: Q: Load balancing using utilization

2011-05-23 Thread Ulrich Windl
Lars Marowsky-Bree l...@suse.de schrieb am 23.05.2011 um 13:21 in Nachricht 20110523112136.gf29...@suse.de: On 2011-05-23T10:11:21, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! I have defined nodes with a CPU and RAM utilization value, and I have defined a resource

[Linux-HA] Antw: Re: ocf:heartbeat:exportfs

2011-05-23 Thread Ulrich Windl
Eric Warnke ewar...@albany.edu schrieb am 23.05.2011 um 14:38 in Nachricht c9ffcb62.88a4%ewar...@uamail.albany.edu: I have a test environment running and noted the same limitation. As far as I can tell you have a few options. 1) Have a primitive for each client group. 2) Rewrite the

[Linux-HA] Antw: Re: ocf:IPaddr: Iflabel seems to be ignored

2011-05-23 Thread Ulrich Windl
Lars Ellenberg lars.ellenb...@linbit.com schrieb am 23.05.2011 um 15:37 in Nachricht 20110523133716.GH16134@barkeeper1-xen.linbit: On Mon, May 23, 2011 at 01:13:48PM +0200, Lars Marowsky-Bree wrote: On 2011-05-12T11:28:49, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: So I

[Linux-HA] Antw: cat /dev/ttyS0

2011-05-24 Thread Ulrich Windl
Hai Tao taoh...@hotmail.com schrieb am 23.05.2011 um 22:59 in Nachricht bay156-w38e2c0bf103481109bd9deeb...@phx.gbl: this might not be too close to HA, but I am not sure if someone has seen this before: I use a serial cable between two nodes, and I am testing the heartbeat with:

[Linux-HA] Incomplete check in start method of ocf:heartbeat:LVM

2011-05-24 Thread Ulrich Windl
Hello boys (and girls?)! Building my resources incrementally made me debug the LVM RA: Syslog said nothing, but the resource won't start. As it turned out, the check whether a VG was activated is wrong: vgchange -a seems to report an error if a VG without LVs was activated (It's completely OK

[Linux-HA] RA methods being idempotent?

2011-05-27 Thread Ulrich Windl
Hi! I have a question due to a problem that causes a node fence: When I set up a Volume group on an MD-RAID device, trying to start these resources through pacemaker causes a node fence. I didn't investigate, but I guess the problem is that the resources are already up when the RAs try to

[Linux-HA] Antw: Re: ocf:IPaddr: Iflabel seems to be ignored

2011-06-03 Thread Ulrich Windl
Alan Robertson al...@unix.sh schrieb am 02.06.2011 um 19:49 in Nachricht 4de7cd34.3000...@unix.sh: On 05/23/2011 07:37 AM, Lars Ellenberg wrote: On Mon, May 23, 2011 at 01:13:48PM +0200, Lars Marowsky-Bree wrote: On 2011-05-12T11:28:49, Ulrich Windlulrich.wi...@rz.uni-regensburg.de

[Linux-HA] Q: completion of resources in resource show of crm shell

2011-06-27 Thread Ulrich Windl
Hi, this is for SLES11 SP1 and crm 1.1.5 (Build 5bd2b9154d7d9f86d7f56fe0a74072a5a6590c60): I realized that show in crm resource does not support completion (to show a specific resource). However when specifying a resource, just the status of that resource is shown. Would be nice if resource

[Linux-HA] Antw: Re: Q: completion of resources in resource show of crm shell

2011-06-27 Thread Ulrich Windl
Dejan Muhamedagic deja...@fastmail.fm schrieb am 27.06.2011 um 14:08 in Nachricht 20110627120805.GC29249@rondo.homenet: Hi, On Mon, Jun 27, 2011 at 09:52:59AM +0200, Ulrich Windl wrote: Hi, this is for SLES11 SP1 and crm 1.1.5 (Build 5bd2b9154d7d9f86d7f56fe0a74072a5a6590c60): I

[Linux-HA] ra-api-1.dtd and related

2011-06-28 Thread Ulrich Windl
Hi! The thing with XML and DTD seems to be a hard thing to manage: Reading the RA specification draft by Mr. Marowsky-Brée, the DTD for the RA's metadata should be at http://www.opencf.org/standards/ra-api-1.dtd. However, there it isn't. With today's symbolic links, aliases, redirections, CGI

[Linux-HA] Q: ms-resources and grouping

2011-06-30 Thread Ulrich Windl
Hi! I have a question: when I want to have a filesystem on a logical volume, where the VG is on a RAID1, I would typically have three resources to handle that. Now if I wish to have a clone or ms resource, how could I connect the resources so that the resource's nodes find the desired

[Linux-HA] ocf:heartbeat:Raid1 unable to re-add missing (stale) leg of RAID1

2011-07-01 Thread Ulrich Windl
Hi! I don't know if this is an issue of mdadm in SLES11 SP1, but we had the situation where a RAID1 ended up with one leg, when a manual mdadm --re-add /dev/mdX /dev/disk/... worked. Inspecting the RA, I guess it also should have tried that (a bit differently): ocf_log info

[Linux-HA] Antw: ocf:heartbeat:Raid1 unable to re-add missing (stale) leg of RAID1

2011-07-01 Thread Ulrich Windl
Ulrich Windl ulrich.wi...@rz.uni-regensburg.de schrieb am 01.07.2011 um 16:32 in Nachricht 4e0df69b02a16...@gwsmtp1.uni-regensburg.de: Hi! I don't know if this is an issue of mdadm in SLES11 SP1, but we had the situation where a RAID1 ended up with one leg, when a manual mdadm --re

[Linux-HA] crm_mon's Failed action

2011-07-01 Thread Ulrich Windl
Hi, a quick thought: Wouldn't it be very helpful if crm_mon would include a time stamp for a failed action? That way one could inspect the syslog for details. Example: Failed actions: prm_db_monitor_12 (node=host02, call=327, rc=-2, status=Timed Out): unknown exec error Regards,

[Linux-HA] Q: What is (null)?

2011-07-04 Thread Ulrich Windl
Hi, null in syslog messages always looks like a programming error to me; so what does it mean? Example: Jul 4 11:10:15 hostname pengine: [11516]: notice: process_pe_message: Transition 332: PEngine Input stored in: (null) [...] Jul 4 11:10:15 hostname crmd: [11517]: info: do_te_invoke:

[Linux-HA] crm manage prm vs. crm_mon

2011-07-04 Thread Ulrich Windl
Hi! This was found in SLES11 SP1 (Version: 1.1.5-5bd2b9154d7d9f86d7f56fe0a74072a5a6590c60): A resource is being displayed as (unmanaged) FAILED. I used crm resource manage prm to set the resource back to managed mode. However the resource is still displayed as unmanaged by crm_mon. When

[Linux-HA] Antw: Re: crm manage prm vs. crm_mon

2011-07-04 Thread Ulrich Windl
Tim Serong tser...@novell.com schrieb am 04.07.2011 um 13:34 in Nachricht 4e11a538.5080...@novell.com: On 04/07/11 19:48, Ulrich Windl wrote: Hi! This was found in SLES11 SP1 (Version: 1.1.5-5bd2b9154d7d9f86d7f56fe0a74072a5a6590c60): A resource is being displayed as (unmanaged) FAILED

[Linux-HA] Antw: Re: ocf:heartbeat:Raid1 unable to re-add missing (stale) leg of RAID1

2011-07-04 Thread Ulrich Windl
Lars Marowsky-Bree l...@suse.de schrieb am 04.07.2011 um 15:09 in Nachricht 20110704130914.gc1...@suse.de: On 2011-07-01T16:32:27, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! I don't know if this is an issue of mdadm in SLES11 SP1, but we had the situation where

[Linux-HA] Antw: Re: ra-api-1.dtd and related

2011-07-04 Thread Ulrich Windl
Lars Marowsky-Bree l...@suse.de schrieb am 04.07.2011 um 15:11 in Nachricht 20110704131155.gd1...@suse.de: On 2011-06-28T08:15:53, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! The thing with XML and DTD seems to be a hard thing to manage: Reading the RA specification

Re: [Linux-HA] Antw: Re: crm manage prm vs. crm_mon

2011-07-04 Thread Ulrich Windl
Tim Serong tser...@novell.com schrieb am 04.07.2011 um 15:27 in Nachricht 4e11bfd7.7030...@novell.com: On 04/07/11 23:16, Ulrich Windl wrote: Tim Serongtser...@novell.com schrieb am 04.07.2011 um 13:34 in Nachricht 4e11a538.5080...@novell.com: On 04/07/11 19:48, Ulrich Windl wrote: Hi

[Linux-HA] Antw: Re: ra-api-1.dtd and related

2011-07-04 Thread Ulrich Windl
Ulrich Windl ulrich.wi...@rz.uni-regensburg.de schrieb am 04.07.2011 um 15:31 in Nachricht 4e11dcca02a16...@gwsmtp1.uni-regensburg.de [...] Hi Lars, as I started to write the RA, I found that the draft specification (written by you?) talks about a monitor action (section 3.4.3

[Linux-HA] Q: remaining LRM resources after deleting a clone

2011-07-05 Thread Ulrich Windl
Hi! I wonder whether this is intended behaviour: I had configured a clone resource in a two-node cluster. That clone resource was stopped and deleted using crm. After a while crm_mon still showed a clone resource for the other node. Inspecting the CIB (cibadmin -Q) I found that in element

[Linux-HA] Antw: Re: ra-api-1.dtd obsolete?

2011-07-05 Thread Ulrich Windl
Dejan Muhamedagic deja...@fastmail.fm schrieb am 05.07.2011 um 15:41 in Nachricht 20110705134105.GA3767@squib: Hi, On Tue, Jul 05, 2011 at 03:32:49PM +0200, Ulrich Windl wrote: Hello again! I'm wondering whether ra-api-1.dtd is still current: element action: validity error : Value

Re: [Linux-HA] Antw: Re: ra-api-1.dtd obsolete?

2011-07-06 Thread Ulrich Windl
Dejan Muhamedagic deja...@fastmail.fm schrieb am 05.07.2011 um 17:16 in Nachricht 20110705151610.GA3822@squib: [...] Now the funny thing is that you changed the DTD in an incompatible way without changing the version number. That's very bad practice! You're right, the version should've

Re: [Linux-HA] Antw: Re: ra-api-1.dtd obsolete?

2011-07-06 Thread Ulrich Windl
Lars Marowsky-Bree l...@suse.de schrieb am 06.07.2011 um 17:20 in Nachricht 20110706152050.ga1...@suse.de: On 2011-07-06T16:08:16, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: And for the records, here are the diffs between the old and the new version: Yes. We know

[Linux-HA] Antw: Re: Q: How does crm locate RAs?

2011-07-07 Thread Ulrich Windl
Florian Haas florian.h...@linbit.com schrieb am 06.07.2011 um 21:51 in Nachricht 4e14bca8.7070...@linbit.com: (Your MUA seems to have injected = in the toward the end of most lines. May want to have a look at fixing that.) On 07/06/2011 05:40 PM, Ulrich Windl wrote: Hi! As I've

[Linux-HA] XML metadata as a separate file (Was: Q: How does crm locate RAs?)

2011-07-08 Thread Ulrich Windl
Florian Haas florian.h...@linbit.com schrieb am 07.07.2011 um 10:09 in Nachricht 4e1569bd.2010...@linbit.com: [...] Then a question: Is it allowed to have a subdirectory per RA, like .../provider/agent/agent* ? I implemented my metadata as a separate XML file, and now I wonder if the

[Linux-HA] Antw: Forkbomb not initiating failover

2011-07-08 Thread Ulrich Windl
James Smith james.sm...@m247.com schrieb am 07.07.2011 um 11:59 in Nachricht 05CDC1A731F2E64C8C3BD1957047E844471ED3@office-server2.m247.local: Hi, Summary: Two node cluster running DRBD, IET with a floating IP and stonith enabled. All this works well, I can kernel panic the machine,

[Linux-HA] Antw: Re: Forkbomb not initiating failover

2011-07-08 Thread Ulrich Windl
James Smith james.sm...@m247.com schrieb am 07.07.2011 um 13:52 in Nachricht 05CDC1A731F2E64C8C3BD1957047E8444727D4@office-server2.m247.local: Hi, I appreciate that, but it doesn't answer the question. What I'm getting at, is there are multiple scenarios where a system can fail but in

[Linux-HA] Q: RAs for NPIV and multipath?

2011-07-11 Thread Ulrich Windl
Hi, I'm wondering: Does anybody (except me) use NPIV and multipath? With NPIV, you would move a virtual WWN from one node to another to migrate the storage device(s). However when you use multipathing, a vWWN move looks much like a device disconnect, and multipath will flood the syslog with

[Linux-HA] Q: What type of dependency is colocation?

2011-07-11 Thread Ulrich Windl
Hi! I think I misunderstood colocation: I thought a colocation means if two resources are running, they should be (or not) on the same node. In practice, however, there seems to be more: I have two resources, say A and B. A has higher priority than B, and an ordering that B should be started

[Linux-HA] Antw: Re: Q: What type of dependency is colocation?

2011-07-11 Thread Ulrich Windl
Florian Haas florian.h...@linbit.com schrieb am 11.07.2011 um 14:12 in Nachricht 4e1ae897.4050...@linbit.com: On 2011-07-11 13:40, Ulrich Windl wrote: Hi! I think I misunderstood colocation: I thought a colocation means if two resources are running, they should be (or not) on the same

Re: [Linux-HA] Antw: Re: Q: What type of dependency is colocation?

2011-07-11 Thread Ulrich Windl
Tim Serong tser...@novell.com schrieb am 11.07.2011 um 15:51 in Nachricht 4e1affe5.1010...@novell.com: [...] You probably want to flip the colocation constraint: colocation col_rksapr00_saprouter_ping inf: \ prm_rksapr00_ping grp_rksapr00 Yes I did that once I realized that it's

Re: [Linux-HA] Antw: Re: Q: What type of dependency is colocation?

2011-07-11 Thread Ulrich Windl
Hi! I don't want to carry on this thread, but there's a general problem with several Linux projects: Those that are inside the project all have the opinion that it's the greatest project of all, and everybody that doesn't share that opinion is either ignorant or stupid. I think that's one of

[Linux-HA] Non-existing RA (crm ra info)

2011-07-15 Thread Ulrich Windl
Hi, When trying to get info via crm:xola, the system produced a strange error: crm(live)ra# info ocf:xola lrmadmin[13586]: 2011/07/15_12:45:06 ERROR: lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply message of rmetadata with function get_ret_from_msg. ERROR:

[Linux-HA] Attempt to send nothing causes an error?

2011-07-19 Thread Ulrich Windl
Hi, I wonder whether this is some programming problem: Jul 19 12:52:47 h03 corosync[14584]: [MAIN ] Corosync Cluster Engine ('1.3.1'): started and ready to provide service. [...] Jul 19 12:52:47 h03 corosync[14584]: [pcmk ] info: send_member_notification: Sending membership update 1024 to 0

[Linux-HA] Antw: Re: location and orders : Question about a behavior ...

2011-08-04 Thread Ulrich Windl
Dan Frincu df.clus...@gmail.com schrieb am 03.08.2011 um 13:28 in Nachricht CADQRkwiFCEUnq-i9Dtv6AbjQz4Z_e792=3is81zv1eqdrnj...@mail.gmail.com: Hi, On Wed, Aug 3, 2011 at 2:22 PM, alain.mou...@bull.net wrote: Hi Thanks I don't think the 1000 or 5000 value makes any difference, The

Re: [Linux-HA] Antw: Re: location and orders : Question about a behavior ...

2011-08-04 Thread Ulrich Windl
Maloja01 maloj...@arcor.de schrieb am 04.08.2011 um 12:58 in Nachricht 4e3a7b5c.1030...@arcor.de: On 08/04/2011 08:28 AM, Ulrich Windl wrote: Hi! Isn't the stickiness effectively based on the failcount? We have one resource that has a location constraint for one node with a weight

[Linux-HA] Antw: Re: [ha-wg-technical] The mess with OCF_CHECK_LEVEL (crm aborts during commit)

2011-08-05 Thread Ulrich Windl
Dejan Muhamedagic de...@suse.de schrieb am 04.08.2011 um 18:32 in Nachricht 20110804163245.GA28585@rondo.homenet: Hi, On Thu, Aug 04, 2011 at 05:45:16PM +0200, Ulrich Windl wrote: Hi! Some RAs support OCF_CHECK_LEVEL (e.g. ocf:heartbeat:Raid1). However the OCF_CHECK_LEVEL

Re: [Linux-HA] Antw: Re: location and orders : Question about a behavior ...

2011-08-05 Thread Ulrich Windl
and complicated. I feel that a group stickiness should override individual resource stickinesses, and not be used as a default stickiness for every resource in the group. Regards, Ulrich Regards Fabian On 08/04/2011 03:10 PM, Ulrich Windl wrote: Maloja01 maloj...@arcor.de schrieb am 04.08.2011 um 12

Re: [Linux-HA] Antw: Re: [ha-wg-technical] The mess with OCF_CHECK_LEVEL (crm aborts during commit)

2011-08-05 Thread Ulrich Windl
Dejan Muhamedagic de...@suse.de schrieb am 05.08.2011 um 08:39 in Nachricht 20110805063900.GB31749@rondo.homenet: Hi, On Fri, Aug 05, 2011 at 08:23:43AM +0200, Ulrich Windl wrote: Dejan Muhamedagic de...@suse.de schrieb am 04.08.2011 um 18:32 in Nachricht 20110804163245.GA28585

[Linux-HA] ocf::LVM monitor needs excessive time to complete

2011-08-05 Thread Ulrich Windl
Hi, we run a cluster that has about 30 LVM VGs that are monitored every minute with a timeout interval of 90s. Surprisingly even if the system is in nominal state, the LVM monitor times out. I suspect this has to do with multiple LVM commands being run in parallel like this: # ps ax |grep vg

[Linux-HA] ocf:heartbeat:exportfs and crm configure verify

2011-08-05 Thread Ulrich Windl
Hi! I think the RA for exportfs needs to be changed to allow a list of hosts (I had mentioned that before). Linux only allows either a hostname pattern, an IP mask, or a netgroup, but you cannot specify a thing like host[358] or host{3,5,8}. So as an ugly workaround one uses one resource per

[Linux-HA] Q: default vs. default (e.g. exportfs)

2011-08-05 Thread Ulrich Windl
Hi! I frequently see problems I don't understand: When configuring an exportfs resource using crm shell without explicitly specifying operations or timeouts, I get warnings like these: WARNING: prm_nfs_v03: default timeout 20s for start is smaller than the advised 40 I wonder: If the default

[Linux-HA] Antw: Re: ocf::LVM monitor needs excessive time to complete

2011-08-05 Thread Ulrich Windl
Dejan Muhamedagic deja...@fastmail.fm schrieb am 05.08.2011 um 14:18 in Nachricht 20110805121851.GB950@rondo.homenet: Hi, On Fri, Aug 05, 2011 at 01:55:25PM +0200, Ulrich Windl wrote: Hi, we run a cluster that has about 30 LVM VGs that are monitored every minute with a timeout

Re: [Linux-HA] Antw: Re: [ha-wg-technical] The mess with OCF_CHECK_LEVEL (crm aborts during commit)

2011-08-08 Thread Ulrich Windl
Andrew Beekhof and...@beekhof.net schrieb am 08.08.2011 um 04:07 in Nachricht caedlwg2ftv2jvzyfxpgp_hamd_ysdk9cyhemqbwhuatssjm...@mail.gmail.com: On Fri, Aug 5, 2011 at 5:15 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Dejan Muhamedagic de...@suse.de schrieb am 05.08.2011 um 08

[Linux-HA] Antw: Link recovery?

2011-08-10 Thread Ulrich Windl
Michael Moon moo...@yahoo.com schrieb am 10.08.2011 um 01:25 in Nachricht 1312932351.7763.yahoomail...@web39408.mail.mud.yahoo.com: I am setting up a test between two machines (Box A & Box B) running heartbeat 3.0.4. I have two connections (eth0 & eth1) from each machine connected to a switch.

[Linux-HA] Antw: Re: Link recovery?

2011-08-11 Thread Ulrich Windl
Michael Moon moo...@yahoo.com schrieb am 11.08.2011 um 02:09 in Nachricht 1313021358.88558.yahoomail...@web39413.mail.mud.yahoo.com: From: Ulrich Windl ulrich.wi...@rz.uni-regensburg.de To: linux-ha@lists.linux-ha.org linux-ha@lists.linux-ha.org; Michael

[Linux-HA] Antw: Re: Q: default vs. default (e.g. exportfs)

2011-08-11 Thread Ulrich Windl
Andrew Beekhof and...@beekhof.net schrieb am 11.08.2011 um 07:57 in Nachricht CAEDLWG3UfkJsYf3x9CUu45K9vdO1rce7FF9V1sooHkdp_X=x...@mail.gmail.com: On Sat, Aug 6, 2011 at 12:01 AM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote: Hi! I frequently see problems I don't understand

[Linux-HA] Renaming a running resource: to do, or not to do?

2011-08-11 Thread Ulrich Windl
Hi! Using crm shell, you cannot rename a running resource. However I managed to do it via a shadow cib: I renamed the resource in the shadow cib, then committed the shadow cib. From the XML changes, I got the impression that the old primitive is removed, and then the new primitive is added.
