[ClusterLabs] Antw: Re: Xen Migration/resource cleanup problem in SLES11 SP3

2015-10-08 Thread Ulrich Windl
>>> Dejan Muhamedagic schrieb am 08.10.2015 um 16:13 in Nachricht <20151008141357.GB15084@tuttle.linbit>: > Hi, > > On Thu, Oct 08, 2015 at 02:29:08PM +0200, Ulrich Windl wrote: >> Hi! >> >> I'd like to report an "interesting problem" w

[ClusterLabs] Antw: Monitoring Op for LVM - Excessive Logging

2015-10-09 Thread Ulrich Windl
>>> Jorge Fábregas schrieb am 09.10.2015 um 14:20 in Nachricht <5617b10f.1060...@gmail.com>: > Hi, > > Is there a way to stop the excessive logging produced by the LVM monitor > operation? I have it set at the default (30 seconds) here on SLES 11 > SP4. However, every time it runs the DC will wri

[ClusterLabs] Antw: Re: Antw: Monitoring Op for LVM - Excessive Logging

2015-10-11 Thread Ulrich Windl
>>> Jorge Fábregas schrieb am 09.10.2015 um 18:00 in Nachricht <5617e49c.6060...@gmail.com>: > On 10/09/2015 09:06 AM, Ulrich Windl wrote: >> Did you try daemon_options="-d0"? (in clvmd resource) > > You've nailed it Ulrich! I still think thi

[ClusterLabs] Antw: Stopped node detection.

2015-10-16 Thread Ulrich Windl
>>> "Vallevand, Mark K" schrieb am 15.10.2015 um >>> 22:55 in Nachricht <2f280811793d43418745268be7397...@us-exch13-5.na.uis.unisys.com>: > Ubuntu 12.04 LTS > pacemaker 1.1.10 > cman 3.1.7 > corosync 1.4.6 > > If my cluster has no resources, it seems like it takes 20s for a stopped > node to be

[ClusterLabs] Antw: Re: difference between OCF return codes for monitor action

2015-10-22 Thread Ulrich Windl
Hi! I think the RAs' exit codes should be defined on the resource (agent)'s state only, not on the reaction pacemaker will perform. Otherwise the idea of universal resource agents will never work. Regards, Ulrich >>> "Vallevand, Mark K" schrieb am 21.10.2015 um 16:36 in Nachricht <6555c5b6072f4
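
The point about exit codes describing only the resource's state can be made concrete with a small lookup. The sketch below lists the standard OCF return code numbers; the enum class, the helper name `describe_monitor_rc`, and the wording of the descriptions are illustrative assumptions, not part of any resource agent library. Note that the table says nothing about what the cluster manager does in response.

```python
from enum import IntEnum


class OcfRc(IntEnum):
    """Standard OCF resource agent return codes."""
    OCF_SUCCESS = 0
    OCF_ERR_GENERIC = 1
    OCF_ERR_ARGS = 2
    OCF_ERR_UNIMPLEMENTED = 3
    OCF_ERR_PERM = 4
    OCF_ERR_INSTALLED = 5
    OCF_ERR_CONFIGURED = 6
    OCF_NOT_RUNNING = 7
    OCF_RUNNING_MASTER = 8
    OCF_FAILED_MASTER = 9


# Hypothetical descriptions: each one states what the agent observed,
# not how the cluster manager should react.
_STATE = {
    OcfRc.OCF_SUCCESS: "resource is running",
    OcfRc.OCF_NOT_RUNNING: "resource is cleanly stopped",
    OcfRc.OCF_RUNNING_MASTER: "resource is running in master role",
    OcfRc.OCF_FAILED_MASTER: "resource is in master role but failed",
}


def describe_monitor_rc(rc: int) -> str:
    """Translate a monitor exit code into a description of resource state."""
    try:
        code = OcfRc(rc)
    except ValueError:
        return "unknown return code"
    # Any code without a state description is reported as an error by name.
    return _STATE.get(code, f"error: {code.name}")
```

For example, `describe_monitor_rc(7)` yields the "cleanly stopped" description, while `describe_monitor_rc(5)` reports the error by name.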

[ClusterLabs] Antw: Failover to spare node

2015-10-22 Thread Ulrich Windl
>>> Andrei Borzenkov schrieb am 22.10.2015 um 18:49 in Nachricht <562913b5.6070...@gmail.com>: > Let's say I have a pool of nodes and multiple services, somehow > distributed across them. I would like to keep one node as "spare", > without services by default, and if any of "worker" nodes fail,

[ClusterLabs] Antw: Resource placement strategy and utilization AND resource location preference

2015-10-28 Thread Ulrich Windl
>>> "Vallevand, Mark K" schrieb am 27.10.2015 um >>> 22:24 in Nachricht <37343dddcd2d454baaea374e80d73...@us-exch13-5.na.uis.unisys.com>: > How do the resource placement strategy and utilization AND resource location > preference relate? > > I mean, is it one or the other? Or both somehow? I

[ClusterLabs] Antw: ORACLE 12 and SLES HAE (Sles 11sp3)

2015-10-28 Thread Ulrich Windl
Probably if Oracle 12 is compatible with Oracle 11, the RAs will continue to work. >>> "Cristiano Coltro" schrieb am 28.10.2015 um 09:45 in Nachricht <56309953026ff...@prv-mh.provo.novell.com>: > Hi, > most of the SLES 11 sp3 with HAE are migrating Oracle Db. > The migration will be from

[ClusterLabs] Antw: VIP monitoring failing with Timed Out error

2015-10-28 Thread Ulrich Windl
>>> Pritam Kharat schrieb am 28.10.2015 um >>> 09:51 in Nachricht : > Hi All, > > I am facing one issue in my two node HA. When I stop pacemaker on ACTIVE > node, it takes more time to stop and by this time VIP migration with other > resources migration fails to STANDBY node. (I have seen same i

[ClusterLabs] Antw: Re: [Question] Question about mysql RA.

2015-11-04 Thread Ulrich Windl
>>> Ken Gaillot schrieb am 04.11.2015 um 16:44 in >>> Nachricht <563a27c2.5090...@redhat.com>: > On 11/04/2015 04:36 AM, renayama19661...@ybb.ne.jp wrote: [...] >> pid=`cat $OCF_RESKEY_pid 2> /dev/null ` >> /bin/kill $pid > /dev/null > > I think before this line, the RA should do a "kill

[ClusterLabs] Antw: Re: restarting resources

2015-11-06 Thread Ulrich Windl
>>> zulucloud schrieb am 06.11.2015 um 12:48 in >>> Nachricht <563c9374.5000...@mailbox.org>: > > On 11/02/2015 05:59 PM, - - wrote: >> Hi, >> I need to be able to restart a resource (e.g apache) whenever a >> configuration >> file is updated. I have been using the 'crm resource restart ' c

[ClusterLabs] Antw: new version of Cronlink RA

2015-12-07 Thread Ulrich Windl
Hi! I wonder: It seems it does the same thing as the RA I wrote some years ago: (crm ra info ISC-cron) OCF Resource Agent managing crontabs for ISC cron (ocf:xola:ISC-cron) OCF Resource Agent managing crontabs for ISC cron This RA manages crontabs for the ISC cron daemon by managing links to spe

[ClusterLabs] Antw: Re: start service after filesystemressource

2015-12-07 Thread Ulrich Windl
>>> Ken Gaillot schrieb am 20.11.2015 um 16:06 in >>> Nachricht <564f36e2.90...@redhat.com>: [...] >> location cli-prefer-collectd collectd inf: host-1 >> location cli-prefer-failover-ip1 failover-ip1 inf: host-1 >> location cli-prefer-failover-ip2 failover-ip2 inf: host-1 >> location cli-prefer

[ClusterLabs] Antw: Perl Modules for resource agents (was: Resource Agent language discussion)

2015-12-07 Thread Ulrich Windl
Hi! A few comments (Build.PL): The part beginning at --- $ocf_dirs = qx{ . "$lib_ocf_dirs" 2> /dev/null echo "\$INITDIR" ... --- is somewhat complicated. Why not do something like --- $ocf_dirs = qx{ . "$lib_ocf_dirs" 2> /dev/null echo "INITDIR=\$INITDIR" ... --- and then parse the outpu

[ClusterLabs] Antw: Re: design of a two-node cluster

2015-12-07 Thread Ulrich Windl
>>> Digimer schrieb am 07.12.2015 um 22:40 in Nachricht <5665fcdc.1030...@alteeve.ca>: [...] > Node 1 looks up how to fence node 2, sees no delay and fences > immediately. Node 2 looks up how to fence node 1, sees a delay and > pauses. Node 2 will be dead long before the delay expires, ensuring th

[ClusterLabs] Antw: Re: Antw: Re: design of a two-node cluster

2015-12-08 Thread Ulrich Windl
>>> Andrei Borzenkov schrieb am 08.12.2015 um 09:01 in Nachricht : > On Tue, Dec 8, 2015 at 10:44 AM, Ulrich Windl > wrote: >>>>> Digimer schrieb am 07.12.2015 um 22:40 in Nachricht >> <5665fcdc.1030...@alteeve.ca>: >> [...] >>> Node

[ClusterLabs] Antw: Re: design of a two-node cluster

2015-12-08 Thread Ulrich Windl
>>> "Lentes, Bernd" schrieb am 08.12.2015 >>> um 09:13 in Nachricht <00a901d13190$5c6db3c0$15491b40$@helmholtz-muenchen.de>: > Digimer wrote: > >> >>> Should I install all VMs in one partition or every VM in a separate >> >>> partition? The advantage of one VM per partition is that I don't >>

[ClusterLabs] Antw: Re: Antw: Re: design of a two-node cluster

2015-12-08 Thread Ulrich Windl
>>> "Lentes, Bernd" schrieb am 08.12.2015 >>> um 13:10 in Nachricht <012101d131b1$5ec1b2e0$1c4518a0$@helmholtz-muenchen.de>: > Ulrich wrote: > >> >> >>> "Lentes, Bernd" schrieb >> am >> >>> 08.12.2015 um >> 09:13 in Nachricht <00a901d13190$5c6db3c0$15491b40$@helmholtz- >> muenchen.de>: >> > Di

[ClusterLabs] Notice: SLES11SP4 broke exportfs!

2015-12-11 Thread Ulrich Windl
Hi! After updating from SLES11SP3 (June version) to SLES11SP4 (today's version) exportfs fails to get the export status. I have messages like this in syslog: Dec 11 19:22:09 h04 crmd[11128]: notice: process_lrm_event: rksaph04-prm_nfs_c11_mnt_exp_monitor_0:93 [ /usr/lib/ocf/resource.d/heartbea

[ClusterLabs] SLES11SP4 (crmsh-2.1.2+git49.g2e3fa0e-1.32): problems parsing nodes

2015-12-11 Thread Ulrich Windl
Hi! "crm configure edit" displays the nodes section in XML like this: xml \ \ \ \ \ \ \ \ In SLES11 SP3 this was not the case... The rest of the config is not displayed in XML, just the nodes. Regards, Ulrich ___ Users

[ClusterLabs] Q: "Bad global update"

2015-12-11 Thread Ulrich Windl
Hi! Can anybody explain what "cib[10103]: warning: cib_process_diff: Bad global update(...)" means? I saw masses of these messages when updating a node from SLES11SP3 to SLES11 SP4. At the end of the block of those messages there was this message: cib[10103]:error: cib_process_diff: Diff -1.

[ClusterLabs] Antw: Re: SLES11SP4 (crmsh-2.1.2+git49.g2e3fa0e-1.32): problems parsing nodes

2015-12-13 Thread Ulrich Windl
>>> Kristoffer Grönlund schrieb am 12.12.2015 um 08:43 in Nachricht <87io443ydc@krigpad.kri.gs>: > Ulrich Windl writes: > >> Hi! >> >> "crm configure edit" displays the nodes section in XML like this: >> >> xml \ >>\

[ClusterLabs] Antw: Re: Resources start serial, not parralel

2015-12-13 Thread Ulrich Windl
Hi! There is one feature in Linux that may affect you: If processes block on I/O (NFS also), the load increases, and the load is the _sum_, and not the _average_ of all CPUs. So if you have many CPUs, your observable load will typically increase. Recently we had a load of 60, but nobody actuall
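
Since the reported load is a sum over all CPUs, dividing it by the CPU count gives a figure that is easier to compare between machines. The following is a minimal sketch of that normalization; the function names are hypothetical, and the CPU count of 24 used in the usage note is an assumed value for illustration, not a detail from the message.

```python
import os


def load_per_cpu(load: float, cpus: int) -> float:
    """Normalize a system-wide load figure by the number of CPUs."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return load / cpus


def current_load_per_cpu() -> float:
    """Return the 1-minute load average of this host divided by its CPU count.

    os.getloadavg() is available on Unix-like systems only.
    """
    one_minute, _five, _fifteen = os.getloadavg()
    return load_per_cpu(one_minute, os.cpu_count() or 1)
```

Under the assumed 24 CPUs, a load of 60 works out to `load_per_cpu(60.0, 24)`, that is 2.5 runnable or I/O-blocked tasks per CPU.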

[ClusterLabs] Antw: Re: SLES11SP4 (crmsh-2.1.2+git49.g2e3fa0e-1.32): problems parsing nodes

2015-12-14 Thread Ulrich Windl
]:error: crm_int_helper: Characters left over after parsing 'off': 'off' I feel different parts of the cluster stack have different expectations on the syntax... Regards, Ulrich >>> Ulrich Windl schrieb am 14.12.2015 um 08:22 in Nachricht <566E6E49.49E : 161 :

[ClusterLabs] Need help for specific resource manipulation

2015-12-15 Thread Ulrich Windl
Hi! I have two scenarios where I could need help from the experts: 1) For a master/slave resource, is it possible to stop just the master or slave (and let the cluster do the recovery)? For clone resources the failed operations get suffixes like ":0", ":1", ... Would it work with master/slave

[ClusterLabs] Antw: Re: design of a two-node cluster

2015-12-15 Thread Ulrich Windl
>>> "Lentes, Bernd" schrieb am 16.12.2015 >>> um 00:35 in Nachricht <1621336773.386234.1450222516246.javamail.zim...@helmholtz-muenchen.de>: [...] > What about a quorum disk? I also read about "tiebreakers" or that STONITH > is magically able to choose the right > node to fence (I can't beli

[ClusterLabs] Antw: Re: Antw: Re: design of a two-node cluster

2015-12-16 Thread Ulrich Windl
>>> "Lentes, Bernd" schrieb am 16.12.2015 >>> um 13:31 in Nachricht <1496952480.528507.1450269100369.javamail.zim...@helmholtz-muenchen.de>: > Ulrich wrote: > > - On Dec 16, 2015, at 8:36 AM, Ulrich Windl > ulrich.wi...@rz.uni-regensburg.de

[ClusterLabs] Antw: Re: Antw: Re: Resources start serial, not parralel

2015-12-16 Thread Ulrich Windl
>>> Michal Koutný schrieb am 16.12.2015 um 13:43 in Nachricht <56715c56.60...@suse.com>: > Hi Oleg. > > On 12/16/2015 11:31 AM, Oleg Ilyin wrote: >> So, the main point of my issue is jobs = 1 >> >> Please, is it possible to increase the number of jobs through throttle high? > The parameter you are

[ClusterLabs] Antw: Re: Early VM resource migration

2015-12-16 Thread Ulrich Windl
>>> Klechomir schrieb am 16.12.2015 um 17:30 in Nachricht <5671918e.40...@gmail.com>: > On 16.12.2015 17:52, Ken Gaillot wrote: >> On 12/16/2015 02:09 AM, Klechomir wrote: >>> Hi list, >>> I have a cluster with VM resources on a cloned active-active storage. >>> >>> VirtualDomain resource migrates

[ClusterLabs] Antw: Re: Antw: Re: Early VM resource migration

2015-12-17 Thread Ulrich Windl
es. config will help solving the problem. You could send logs with the actual startup sequence then. > > Regards, > KIecho > > On 17.12.2015 08:19:43 Ulrich Windl wrote: >> >>> Klechomir schrieb am 16.12.2015 um 17:30 in Nachricht >> >> <5671918e.40...@gm

[ClusterLabs] Antw: Re: Notice: SLES11SP4 broke exportfs!

2015-12-21 Thread Ulrich Windl
>>> Dejan Muhamedagic schrieb am 21.12.2015 um 11:40 in Nachricht <20151221104011.GB9783@walrus.homenet>: > Hi, > > On Fri, Dec 11, 2015 at 07:27:28PM +0100, Ulrich Windl wrote: >> Hi! >> >> After updating from SLES11SP3 (june version) to SLES11SP4 (to

[ClusterLabs] Q: migration-threshold

2015-12-22 Thread Ulrich Windl
Hi! Simple question (SLES11 SP3): I have set migration-threshold=1 and it seems to work. Now I tried to stop two resources that are "Migration enabled". I found out that the stop for the second resource was delayed until the first one acknowledged the stop. Is this a side-effect of migration-th

[ClusterLabs] Antw: Regarding IP tables and IP Address clone

2015-12-23 Thread Ulrich Windl
>>> Somanath Jeeva schrieb am 23.12.2015 um 06:01 >>> in Nachricht <4f5e5141ed95ff45b3128f3c7b1b2a6721abe...@eusaamb109.ericsson.se>: > Hi , > > Thanks for the quick reply. > > I will check with our IT team regarding the multicast MAC at switch level. > > But the Virtual IP is reachable for 15

[ClusterLabs] crm shell: "help migrate" is somewhat "thin"...

2015-12-28 Thread Ulrich Windl
Hi! When trying to migrate a clone resource I got this (SLES11 SP4 with crmsh-2.1.2+git49.g2e3fa0e-1.32): # crm resource migrate cln_ctdb PT5M Resource 'cln_ctdb' not moved: active in 2 locations. You can prevent 'cln_ctdb' from running on a specific location with: --ban --host Error performing
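
The `PT5M` argument in the command above has the shape of an ISO 8601 duration (five minutes). As a rough illustration of what such a value encodes, here is a small parser for the day/hour/minute/second subset. It is a sketch under the assumption that the lifetime argument follows ISO 8601 duration syntax; `parse_duration` is a hypothetical helper and not part of crmsh.

```python
import re
from datetime import timedelta

# Subset of ISO 8601 durations: PnDTnHnMnS (no years, months, or weeks).
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    """Convert a duration such as 'PT5M' or 'P1DT2H' into a timedelta."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"not a supported ISO 8601 duration: {text!r}")
    # Keep only the components that were actually present in the string.
    parts = {name: int(value) for name, value in match.groupdict().items() if value}
    if not parts:
        raise ValueError(f"duration has no components: {text!r}")
    return timedelta(**parts)
```

With this helper, `parse_duration("PT5M").total_seconds()` evaluates to 300.0.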

[ClusterLabs] Antw: crm shell: "help migrate" is somewhat "thin"...

2015-12-28 Thread Ulrich Windl
>>> Ulrich Windl schrieb am 28.12.2015 um 09:17 in Nachricht <5680F025.9B2 : >>> 161 : 60728>: > Hi! > > When trying to migrate a clone resource I got this (SLES11 SP4 with > crmsh-2.1.2+git49.g2e3fa0e-1.32): > # crm resource migrate cln_ctdb PT5M >

[ClusterLabs] Q: (crm shell) unmanage vs. maintenance

2015-12-28 Thread Ulrich Windl
Hi! I put a resource into maintenance mode using "crm resource maintenance ". After a while I tried to undo it by trying "crm resource manage ", but that did not have any effect; I had to use "crm resource maintenance off". So may I ask what is the difference between both variants? Regards, Ul

[ClusterLabs] Antw: About globally unique resource instances distribution per node

2015-12-29 Thread Ulrich Windl
Hi! I would expect if you set the cpu utilization per primitive (used in clone) to one and set the cpu capacity per node to the correct number that no node has more primitives than the cpu number allows and that primitives are distributed among all available nodes. Isn't that true in your case? W

[ClusterLabs] Antw: Regarding IP tables and IP Address clone

2015-12-30 Thread Ulrich Windl
>>> Somanath Jeeva schrieb am 30.12.2015 um 11:34 >>> in Nachricht <4f5e5141ed95ff45b3128f3c7b1b2a6721abf...@eusaamb109.ericsson.se>: > On 12/22/2015 08:09 AM, Somanath Jeeva wrote: >> Hi >> I am trying to use IP load balancing using the cloning feature in Pacemaker, but > after 15 min the virtual IP

[ClusterLabs] Antw: SBD Latency Warnings

2016-01-10 Thread Ulrich Windl
>>> Jorge Fábregas schrieb am 30.12.2015 um 17:53 in Nachricht <56840c21.1050...@gmail.com>: > Hi, > > We're having some issues with a particular oversubscribed hypervisor > (cpu-wise) where we run SLES 11 SP4 guests. I had to increase many > timeouts on the cluster to cope with this: Hi! (I'm

[ClusterLabs] Antw: Re: Antw: About globally unique resource instances distribution per node

2016-01-10 Thread Ulrich Windl
>>> Daniel Hernández schrieb am 30.12.2015 um 19:43 in Nachricht : > On 12/30/15, Ulrich Windl wrote: >> Hi! >> >> I would expect if you set the cpu utilization per primitive (used in clone) >> to >> one and set the cpu capacity per node to the correct

[ClusterLabs] Antw: DLM hanging when corosync is OK causes cluster to hang

2016-01-11 Thread Ulrich Windl
>>> Digimer schrieb am 11.01.2016 um 17:59 in Nachricht <5693df77.7000...@alteeve.ca>: > Hi all, > > We hit a strange problem where a RAID controller on a node failed, > causing DLM (gfs2/clvmd) to hang, but the node was never fenced. I > assume this was because corosync was still working. I w

[ClusterLabs] Q: What is the meaning of "sbd: [19541]: info: Watchdog enabled."

2016-01-13 Thread Ulrich Windl
Hi! Since an update of sbd in SLES11 SP4 (sbd-1.2.1-0.12.1), I see frequent syslog messages like these (grep "Watchdog enabled." /var/log/messages): Jan 13 00:01:01 h02 sbd: [19373]: info: Watchdog enabled. Jan 13 00:01:01 h02 sbd: [19380]: info: Watchdog enabled. Jan 13 00:04:02 h02 sbd: [21740]

[ClusterLabs] SLES11SP4: problem in ldirectord with external-perl?

2016-01-14 Thread Ulrich Windl
Hi! We are using an ldirectord configuration that uses an external-perl to check for connections. That worked in SLES11SP3, but I just discovered that IPVS thinks all real services are up, when in fact the Perl module reports they are down. Could this be due to some bad change? Trying to debug

[ClusterLabs] Antw: crmsh 2.2.0 has been released!

2016-01-17 Thread Ulrich Windl
Hi! I have a proposal for future release announcements: Can you include the corresponding help texts for new features (where new features are mentioned)? That would have two advantages: 1) One better understands what the feature is about 2) One can quickly give feedback if the (new) help text is

[ClusterLabs] Q: maintenance-mode turning off automatically?

2016-01-27 Thread Ulrich Windl
Hi! I have a question (for SLES11 SP4 with current updates): Is it a feature that maintenance mode turns off when you do an rcopenais stop/start? I had turned maintenance mode on, stopped both nodes of a cluster (rcopenais stop). Then, when I started the node that was DC before, maintenance mode w

[ClusterLabs] Antw: Q: maintenance-mode turning off automatically?

2016-01-27 Thread Ulrich Windl
m /var/lib/heartbeat/crm to /var/lib/pacemaker/cib/ while the node was active. As the move was within one filesystem, that really shouldn't have hurt much... Regards, Ulrich Windl >>> "Ulrich Windl" schrieb am 27.01.2016 um 13:42 in Nachricht <56a8c95802a10001

[ClusterLabs] Antw: Re: kvm live migration, resource moving

2016-02-04 Thread Ulrich Windl
>>> Kyle O'Donnell schrieb am 04.02.2016 um 14:17 in Nachricht <124846465.11794.1454591851253.javamail.zim...@0b10.mx>: [...] > I had: > location cli-prefer-tome tome_kvm inf: ny4j1-kvm02 > > removed that and I am all good! [...] That's why I ALWAYS specify a time when migrating resources. So i

[ClusterLabs] Antw: Re: Antw: Re: kvm live migration, resource moving

2016-02-04 Thread Ulrich Windl
ference/priority (lower=higher prior)? > > ----- Original Message - > From: "Ulrich Windl" > To: "users" > Sent: Thursday, February 4, 2016 8:47:23 AM > Subject: [ClusterLabs] Antw: Re: kvm live migration, resource moving > >>>> Kyle O'Donn

[ClusterLabs] Antw: crmsh configure delete for constraints

2016-02-09 Thread Ulrich Windl
Hi! I guess it's hard to guess what the cluster will do when you remove constraints. Maybe adding a keyword like "delete --no-wait ..." moves the risk to the operator... However I'm not quite sure I understood your problem fully. Regards, Ulrich >>> Vladislav Bogdanov schrieb am 08.02.2016 um

[ClusterLabs] Antw: Re: DLM fencing

2016-02-09 Thread Ulrich Windl
>>> Digimer schrieb am 08.02.2016 um 20:03 in Nachricht <56b8e68a.1060...@alteeve.ca>: > On 08/02/16 01:56 PM, Ferenc Wágner wrote: >> Ken Gaillot writes: >> >>> On 02/07/2016 12:21 AM, G Spot wrote: >>> Thanks for your response, am using ocf:pacemaker:controld resource agent and stoni

[ClusterLabs] Antw: Re: crmsh configure delete for constraints

2016-02-10 Thread Ulrich Windl
>>> Dejan Muhamedagic schrieb am 09.02.2016 um 20:58 in Nachricht <20160209195816.GD2437@walrus.homenet>: [...] >> Particularly, imho RAs should not run validate_all on stop >> action. > > I'd disagree here. If the environment is no good (bad > installation, missing configuration and similar), th

[ClusterLabs] Antw: Re: crmsh configure delete for constraints

2016-02-10 Thread Ulrich Windl
>>> Vladislav Bogdanov schrieb am 10.02.2016 um 05:39 in Nachricht <6e479808-6362-4932-b2c6-348c7efc4...@hoster-ok.com>: [...] > Well, I'd reword. Generally, RA should not exit with error if validation > fails on stop. > Is that better? [...] As we have different error codes, what type of error

[ClusterLabs] Antw: Re: Antw: Re: crmsh configure delete for constraints

2016-02-10 Thread Ulrich Windl
>>> Ferenc Wágner schrieb am 10.02.2016 um 11:56 in Nachricht <87mvr8n896@lant.ki.iif.hu>: > Vladislav Bogdanov writes: > >> If pacemaker has got an error on start, it will run stop with the same >> set of parameters anyways. And will get error again if that one was >> from validation and RA

[ClusterLabs] Antw: Re: Antw: Re: DLM fencing

2016-02-10 Thread Ulrich Windl
>>> Digimer schrieb am 10.02.2016 um 17:32 in Nachricht <56bb6637.6090...@alteeve.ca>: > On 10/02/16 02:40 AM, Ulrich Windl wrote: [...] >>> If fencing fails or is not configured, DLM never unblocks and anything >>> using it is left hung (by design,

[ClusterLabs] Antw: Pacemaker for 389 directory server with multi-master replication

2016-02-21 Thread Ulrich Windl
>>> "Bernie Jones" schrieb am 20.02.2016 um 13:50 in Nachricht <000601d16bdd$372178e0$a5646aa0$@ltd.uk>: > Hi all, > > > > I'm new to this list and fairly new to pacemaker and have just spent a > couple of days trying unsuccessfully to solve a configuration challenge. > > > > I have seen a

[ClusterLabs] Antw: Migrating cluster node from SLES11 for SAP sp2 to sp3

2016-02-22 Thread Ulrich Windl
>>> Cristiano Coltro schrieb am 22.02.2016 um 09:29 in Nachricht : > Hi All, > A customer of mine is searching for the smoothest way to migrate an environment > from SLES for SAP 11 sp2 to SLES for SAP 11 sp3. May I ask why they want to go to SP3 now when SP4 has been out for months? Regards, Ulrich _

[ClusterLabs] Antw: Re: Regular pengine warnings after a transient failure

2016-03-08 Thread Ulrich Windl
>>> Ferenc Wágner schrieb am 08.03.2016 um 15:08 in Nachricht <87wppdoydv@lant.ki.iif.hu>: > Ken Gaillot writes: > >> On 03/07/2016 02:03 PM, Ferenc Wágner wrote: >> >>> The transition-keys match, does this mean that the above is a late >>> result from the monitor operation which was conside

[ClusterLabs] Antw: Re: Pacemaker startup-fencing

2016-03-19 Thread Ulrich Windl
>>> Ferenc Wágner schrieb am 16.03.2016 um 13:47 in Nachricht <87k2l2zj0n@lant.ki.iif.hu>: [...] > Then I wonder why I hear the "must have working fencing if you value > your data" mantra so often (and always without explanation). After all, > it does not risk the data, only the automatic clu

[ClusterLabs] Antw: Installed Galera, now HAProxy won't start

2016-03-19 Thread Ulrich Windl
>>> Matthew Mucker schrieb am 16.03.2016 um 23:10 in >>> Nachricht [...] > So thinking this through logically, it seems to me that the Openstack > docs were wrong in telling me to configure MariaDB server to bind to all > available ports In a cluster environment with virtual IP addresse

[ClusterLabs] Antw: Re: reproducible split brain

2016-03-19 Thread Ulrich Windl
>>> Christopher Harvey schrieb am 16.03.2016 um 21:04 in Nachricht <1458158684.122207.551267810.11f73...@webmail.messagingengine.com>: [...] >> > Would stonith solve this problem, or does this look like a bug? >> >> It should, that is its job. > > is there some log I can enable that would say >

[ClusterLabs] Antw: Re: Cluster failover failure with Unresolved dependency

2016-03-21 Thread Ulrich Windl
>>> Lorand Kelemen schrieb am 18.03.2016 um 16:42 in Nachricht : > I reviewed all the logs, but found nothing out of the ordinary, besides the > "resource cannot run anywhere" line, however after the cluster recheck > interval expired the services started fine without any suspicious log > entries.

[ClusterLabs] Antw: Re: PCS, Corosync, Pacemaker, and Bind (Ken Gaillot)

2016-03-21 Thread Ulrich Windl
>>> Dennis Jacobfeuerborn schrieb am 19.03.2016 um >>> 14:32 in Nachricht <56ed5507.7070...@conversis.de>: > On 17.03.2016 08:45, Andrei Borzenkov wrote: >> On Wed, Mar 16, 2016 at 9:35 PM, Mike Bernhardt wrote: >>> I guess I have to say "never mind!" I don't know what the problem was >>> yester

[ClusterLabs] Antw: Re: Antw: Re: reproducible split brain

2016-03-21 Thread Ulrich Windl
>>> Dennis Jacobfeuerborn schrieb am 19.03.2016 um >>> 15:10 in Nachricht <56ed5dc4.9080...@conversis.de>: [...] > I think the key issue here is that when people think about corosync they > believe there can only be two states for membership (true or false) when > in reality there are three possible s

[ClusterLabs] Antw: Re: Antw: Re: Cluster failover failure with Unresolved dependency

2016-03-21 Thread Ulrich Windl
visd: migration-threshold=1 fail-count=1 last-failure='Mon Mar 21 > 10:00:53 2016' > > Failed Actions: > * amavisd_monitor_6 on mail2 'not running' (7): call=2604, > status=complete, exitreason='none', > last-rc-change='Mon Mar 21 10:00:5

[ClusterLabs] Antw: Re: no clone for pcs-based cluster fencing?

2016-03-21 Thread Ulrich Windl
>>> Ken Gaillot schrieb am 21.03.2016 um 15:22 in >>> Nachricht <56f003b0.4020...@redhat.com>: [...] > It's actually newer pacemaker versions rather than pcs itself. Fence > agents do not need to be cloned, or even running -- as long as they're > configured and enabled, any node can use the reso

[ClusterLabs] Antw: OCF agent validation proposal (was Re: fence_scsi no such device)

2016-03-22 Thread Ulrich Windl
>>> Kristoffer Grönlund schrieb am 22.03.2016 um 09:40 in Nachricht <87lh5ahpmr@krigpad.kri.gs>: > Jan Pokorný writes: > >> Hmm, I keep lamenting that by extending agents metadata with inline >> RelaxNG grammar to express co-occurrence/mutual exclusion of >> particular parameters and/or its

[ClusterLabs] Antw: Re: spread out resources

2016-04-03 Thread Ulrich Windl
Hi! Actually from my SLES11 SP[1-4] experience, the cluster always distributes resources across all available nodes, and only if I don't want that do I have to add constraints. I wonder why that does not seem to work for you. Regards, Ulrich >>> Ferenc Wágner schrieb am 02.04.2016 um 10:28 in Na

[ClusterLabs] Antw: Re: Pacemaker on-fail standby recovery does not start DRBD slave resource

2016-04-06 Thread Ulrich Windl
>>> Ken Gaillot schrieb am 07.04.2016 um 00:04 in Nachricht <57058805.8050...@redhat.com>: > On 03/30/2016 12:18 PM, Sam Gardner wrote: >> I'll check about the cluster-recheck-interval. Attached is a crm_report. >> >> In the meantime, what is all performed on that interval? The Red Hat docs >> sa

[ClusterLabs] Antw: syntax in rsc_template after update

2016-04-08 Thread Ulrich Windl
>>> Robert Dahlem schrieb am 08.04.2016 um 12:37 in Nachricht <570789f9.4040...@gmx.net>: > Hi, > > I am in the process of updating my SUSE SLES 11 system from SP3 to SP4. Did you install updates also? (e.g. crmsh-2.1.2+git132.gbc9fde0-7.2) > > This involves some version changes: > crmsh

[ClusterLabs] Antw: Utilization zones

2016-04-18 Thread Ulrich Windl
>>> Ferenc Wágner schrieb am 18.04.2016 um 17:07 in Nachricht <87a8krx8e7@lant.ki.iif.hu>: > Hi, > > I'm using the "balanced" placement strategy with good success. It > distributes our VM resources according to memory size perfectly. > However, I'd like to take the NUMA topology into account

[ClusterLabs] Antw: Re: Utilization zones

2016-04-19 Thread Ulrich Windl
>>> Ferenc Wágner schrieb am 19.04.2016 um 13:42 in Nachricht <87y489x1s2.fsf...@lant.ki.iif.hu>: > "Ulrich Windl" writes: > >> Ferenc Wágner schrieb am 18.04.2016 um 17:07 in Nachricht >> >>> I'm using the "balanced" placeme

[ClusterLabs] Q: Resource balancing opration

2016-04-19 Thread Ulrich Windl
Hi! I'm wondering: If you boot a node on a cluster, most resources will go to another node (if possible). Due to stickiness configured, those resources will stay there. So I'm wondering whether or how I could cause a rebalance of resources on the cluster. I must admit that I don't understand th

[ClusterLabs] Antw: Re: Q: Resource balancing opration

2016-04-20 Thread Ulrich Windl
>>> Ken Gaillot schrieb am 20.04.2016 um 16:44 in >>> Nachricht <571795e5.4090...@redhat.com>: > On 04/20/2016 01:17 AM, Ulrich Windl wrote: >> Hi! >> >> I'm wondering: If you boot a node on a cluster, most resources will go to > another no

[ClusterLabs] Antw: Re: Antw: Re: Q: Resource balancing opration

2016-04-21 Thread Ulrich Windl
>>> Tomas Jelinek schrieb am 21.04.2016 um 10:20 in >>> Nachricht <57188d6b.5020...@redhat.com>: > Dne 21.4.2016 v 08:56 Ulrich Windl napsal(a): >>>>> Ken Gaillot schrieb am 20.04.2016 um 16:44 in >>>>> Nachricht >> <571795e5.4

[ClusterLabs] Antw: Coming in 1.1.15: Event-driven alerts

2016-04-21 Thread Ulrich Windl
>>> Ken Gaillot schrieb am 21.04.2016 um 19:50 in >>> Nachricht <571912f3.2060...@redhat.com>: [...] > The alerts section can have any number of alerts, which look like: > > path="/srv/pacemaker/pcmk_alert_sample.sh"> > > value="/var/log/cluster-alerts.log

[ClusterLabs] Antw: Re: Performance of a mirrored LV (cLVM) with OCFS: Attempt to monitor it

2016-04-25 Thread Ulrich Windl
>>> Lars Marowsky-Bree schrieb am 25.04.2016 um 12:12 in Nachricht <20160425101236.gd10...@suse.de>: > On 2016-04-25T10:10:38, Ulrich Windl wrote: > > Hi Ulrich, > > I can't really comment on why the cLVM2 is slow (somewhat surprisingly, > because floc

[ClusterLabs] Antw: Re: Coming in 1.1.15: Event-driven alerts

2016-04-27 Thread Ulrich Windl
>>> Kristoffer Grönlund schrieb am 27.04.2016 um 12:12 in Nachricht <87twinl5sb@krigpad.kri.gs>: > Ken Gaillot writes: > >> The most prominent feature will be Klaus Wenninger's new implementation >> of event-driven alerts -- the ability to call scripts whenever >> interesting events occur (n

[ClusterLabs] Antw: Re: Coming in 1.1.15: Event-driven alerts

2016-04-27 Thread Ulrich Windl
Hi! I wonder: would passing the CIB generation (like 1.6.122) or a (local?) event sequence number to the notification script (SNMP trap) help? Regards, Ulrich >>> Klaus Wenninger schrieb am 27.04.2016 um 20:14 in Nachricht <57210183.6050...@redhat.com>: > On 04/27/2016 04:19 PM, renayama19661.

[ClusterLabs] Antw: ringid interface FAULTY no resource move

2016-05-04 Thread Ulrich Windl
>>> Rafal Sanocki schrieb am 04.05.2016 um 14:14 in Nachricht <78d882b1-a407-31e0-2b9e-b5f8406d4...@gmail.com>: > Hello, > I can't find what I did wrong. I have a 2-node cluster: Corosync, Pacemaker, > DRBD. When I unplug the cable nothing happened. "nothing"? The wrong cable? [...] Regards, Ulri

[ClusterLabs] Antw: Error message: Couldn't find device

2016-05-10 Thread Ulrich Windl
>>> Christopher Fogarty schrieb am >>> 09.05.2016 um 17:54 in Nachricht <4c41e827a2b5457ab06160db63bdd...@sveexch01.versiant.net>: > I am getting this error message > > data_vol (ocf::heartbeat:LVM): Started node1 > ClusterIP (ocf::heartbeat:IPaddr2): Started node1 > KahaDB_FS

[ClusterLabs] Antw: SDB + Pacemaker for virtuals

2016-05-10 Thread Ulrich Windl
>>> Mohammed Alam schrieb am 09.05.2016 um 08:17 in >>> Nachricht : > Hi, > > I was reading Andrew Beekhof's blog (particularly this article: > http://blog.clusterlabs.org/blog/2015/sbd-fun-and-profit/) and became very > interested in setting up a cluster using SBD fencing on a shared FCOE > sto

[ClusterLabs] Antw: Re: FR: send failcount to OCF RA start/stop actions

2016-05-10 Thread Ulrich Windl
>>> Ken Gaillot schrieb am 10.05.2016 um 00:40 in >>> Nachricht <573111d3.7060...@redhat.com>: > On 05/04/2016 11:47 AM, Adam Spiers wrote: >> Ken Gaillot wrote: >>> On 05/04/2016 08:49 AM, Klaus Wenninger wrote: On 05/04/2016 02:09 PM, Adam Spiers wrote: > Hi all, > > As discus

[ClusterLabs] bug in crm shell (SLES11 SP4)? required parameters not checked

2016-05-10 Thread Ulrich Windl
Hi! I think I found this bug in crm shell (crmsh-2.1.2+git132.gbc9fde0-7.2): When defining a new primitive, required parameters are not checked (I'm using my own RA, but any RA should do): -- crm(live)configure# primitive bad ocf:xola:iotwatch crm(live)configure# -- But when I edit the primiti

[ClusterLabs] Q: primitive formatting in crm shell

2016-05-10 Thread Ulrich Windl
Hi! I have a long-standing question: What are the rules for formatting a primitive in crm shell? It seems operations are put on separate lines, but parameters are not (The problem is more clear on non-wide terminals like 80 columns). I think having parameters on separate lines would help readabi

[ClusterLabs] Antw: Re: bug in crm shell (SLES11 SP4)? required parameters not checked

2016-05-10 Thread Ulrich Windl
>>> Dejan Muhamedagic schrieb am 10.05.2016 um 19:24 in Nachricht <20160510172423.GA26066@walrus.homenet>: > Hi, > > On Tue, May 10, 2016 at 02:51:19PM +0200, Kristoffer Grönlund wrote: >> Ulrich Windl writes: >> >> > Hi! >> > >>

[ClusterLabs] Antw: Resource clone groups - resource deletion

2016-05-11 Thread Ulrich Windl
Hi! I think the solution is switching from sequential dependencies in groups to parallel dependencies, separating in ordering and colocation. In crm shell it is displayed as "order any inf: first ( then1 then2 )" and "colocation any inf: ( then1 then2 ) first". Would this help? We had a similar

[ClusterLabs] Antw: Re: Antw: Re: bug in crm shell (SLES11 SP4)? required parameters not checked

2016-05-11 Thread Ulrich Windl
>>> Lars Marowsky-Bree schrieb am 11.05.2016 um 16:13 in >>> Nachricht <20160511141305.gi...@suse.de>: > On 2016-05-11T08:02:56, Ulrich Windl > wrote: > >> > $ crm help Checks >> > $ crm options help check-frequency >> > $ crm option

[ClusterLabs] Q: monitor and probe result codes and consequences

2016-05-12 Thread Ulrich Windl
Hi! I have a question regarding an RA written by myself and pacemaker 1.1.12-f47ea56 (SLES11 SP4): During "probe" all resources' "monitor" actions are executed (regardless of any ordering constraints). Therefore my RA considers a parameter as invalid ("file does not exist") (the file will be p
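
The probe situation described in this message can be sketched in a few lines of shell. This is only an illustration and not the poster's RA: `demo_monitor`, `is_probe` and the `watched_file` argument are hypothetical names, and treating a probe as a monitor call with interval 0 follows the convention of `ocf_is_probe` in ocf-shellfuncs. Only the numeric OCF exit codes are taken from the OCF resource agent convention. Whether reporting "not running" is the right answer during a probe depends on the agent in question.

```shell
#!/bin/sh
# Minimal sketch, assuming a resource agent whose only precondition is a file.
# OCF exit codes (standard values from the OCF resource agent convention):
OCF_SUCCESS=0
OCF_ERR_ARGS=2
OCF_NOT_RUNNING=7

# Hypothetical helper: a probe is a monitor action with interval 0.
is_probe() {
    [ "${__OCF_ACTION:-}" = "monitor" ] &&
        [ "${OCF_RESKEY_CRM_meta_interval:-}" = "0" ]
}

# Hypothetical monitor: $1 is the file the resource depends on.
demo_monitor() {
    watched_file="$1"
    if [ ! -e "$watched_file" ]; then
        if is_probe; then
            # During a probe the file may legitimately not exist yet,
            # so report "not running" instead of a parameter error.
            return "$OCF_NOT_RUNNING"
        fi
        # Outside a probe, a missing file is treated as an invalid parameter.
        return "$OCF_ERR_ARGS"
    fi
    return "$OCF_SUCCESS"
}
```

Calling `demo_monitor` with `__OCF_ACTION=monitor` and `OCF_RESKEY_CRM_meta_interval=0` set exercises the probe branch; any other interval exercises the regular monitor branch.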

[ClusterLabs] Q: (somewhat off-topic) xentop counter VBD_RSECT

2016-05-12 Thread Ulrich Windl
Hi! Excuse me for asking something somewhat off-topic: We are running some Xen PVMs on SLES11 SP4, and I noticed from time to time what looks like a 64-bit overflow in VBD_RSECT. I think I had addressed this issue once before with support, but without success (memory may be wrong). I'd simply l

[ClusterLabs] Antw: Q: monitor and probe result codes and consequences

2016-05-12 Thread Ulrich Windl
>>> Ulrich Windl schrieb am 12.05.2016 um 09:56 in Nachricht <5734373F.4DE : 161 : 60728>: > Hi! > > I have a question regarding an RA written by myself and pacemaker > 1.1.12-f47ea56 (SLES11 SP4): > > During "probe" all resources'

[ClusterLabs] Antw: Re: Q: monitor and probe result codes and consequences

2016-05-13 Thread Ulrich Windl
>>> Ken Gaillot schrieb am 12.05.2016 um 16:41 in >>> Nachricht <57349629.40...@redhat.com>: > On 05/12/2016 02:56 AM, Ulrich Windl wrote: >> Hi! >> >> I have a question regarding an RA written by myself and pacemaker > 1.1.12-f47ea56 (SLES11 SP4

[ClusterLabs] Antw: Re: Antw: Re: Q: monitor and probe result codes and consequences

2016-05-13 Thread Ulrich Windl
>>> Dejan Muhamedagic schrieb am 13.05.2016 um 12:16 in Nachricht <20160513101626.GA12493@walrus.homenet>: > Hi, > > On Fri, May 13, 2016 at 09:05:54AM +0200, Ulrich Windl wrote: >> >>> Ken Gaillot schrieb am 12.05.2016 um 16:41 in >> >>> N

[ClusterLabs] Antw: Re: Using different folder for /var/lib/pacemaker and usage of /dev/shm files

2016-05-16 Thread Ulrich Windl
Hi! One of the main problems I identified with POSIX shared memory (/dev/shm) in Linux is that changes to the shared memory don't affect the i-node, so you cannot tell from a "ls -rtl" which segments are still active and which are not. You can only see the creation time. Maybe there should be

[ClusterLabs] Bad help in crm shell? (You can prevent 'clone' from running on a specific location with: --ban --host )

2016-05-17 Thread Ulrich Windl
Hi! I tried to move a primitive away from a node with crm shell: crm resource migrate clone PT5M This results in the message: Resource 'clone' not moved: active in 2 locations. You can prevent 'clone' from running on a specific location with: --ban --host However when I tried it on host "node

[ClusterLabs] Antw: Re: Antw: Re: Using different folder for /var/lib/pacemaker and usage of /dev/shm files

2016-05-17 Thread Ulrich Windl
is a node from a three-node cluster (SLES11 SP4), with no tuning applied to SHM sizes: # df /dev/shm Filesystem 1K-blocks Used Available Use% Mounted on tmpfs220 73368 2037752 4% /dev/shm Regards, Ulrich > > Thanks > Nikhil > > On Tue, May 17, 201

[ClusterLabs] Antw: Re: Bad help in crm shell? (You can prevent 'clone' from running on a specific location with: --ban --host )

2016-05-17 Thread Ulrich Windl
>>> Kristoffer Grönlund schrieb am 17.05.2016 um 14:51 in Nachricht <87d1okkftm@krigpad.kri.gs>: > Ulrich Windl writes: > >> Hi! >> >> I tried to move a primitive away from a node with crm shell: >> >> crm resource migrate clone PT5M >

[ClusterLabs] Antw: Re: Antw: Re: Using different folder for /var/lib/pacemaker and usage of /dev/shm files

2016-05-17 Thread Ulrich Windl
>>> Ken Gaillot schrieb am 17.05.2016 um 16:53 in >>> Nachricht <573b3074.1040...@redhat.com>: > On 05/17/2016 04:07 AM, Nikhil Utane wrote: >> What I would like to understand is how much total shared memory >> (approximately) would Pacemaker need so that accordingly I can define >> the partition

[ClusterLabs] Antw: Re: Antw: Re: Using different folder for /var/lib/pacemaker and usage of /dev/shm files

2016-05-17 Thread Ulrich Windl
>>> Ken Gaillot schrieb am 17.05.2016 um 19:23 in >>> Nachricht <573b537a.1080...@redhat.com>: > On 05/17/2016 12:02 PM, Nikhil Utane wrote: >> OK. Will do that. >> >> Actually I gave the /dev/shm usage when the cluster wasn't up. >> When it is up, I see it occupies close to 300 MB (it's also th

[ClusterLabs] Antw: Pacemaker restart resources when node joins cluster after failback

2016-05-19 Thread Ulrich Windl
>>> Dharmesh schrieb am 19.05.2016 um 13:18 in >>> Nachricht : > Hi, > > i am having a 2 node Debian cluster with resources configured in it. > Everything is working fine apart from one thing. Usually you find the reasons in the logs (syslog, cluster log, etc.). > > As and when one of my two

[ClusterLabs] Antw: Re: Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-05-19 Thread Ulrich Windl
>>> Jehan-Guillaume de Rorthais schrieb am 19.05.2016 um >>> 21:29 in Nachricht <20160519212947.6cc0fd7b@firost>: [...] > I was thinking of a use case where a graceful demote or stop action failed > multiple times and to give a chance to the RA to choose another method to > stop > the resource b
