[ClusterLabs] PCMK_ipc_buffer recommendation

2019-01-17 Thread Ferenc Wágner
Hi, Looking at lib/common/ipc.c, Pacemaker recommends setting PCMK_ipc_buffer to 4 times the *uncompressed* size of the biggest message seen: error: Could not compress the message (2309508 bytes) into less than the configured ipc limit (131072 bytes). Set PCMK_ipc_buffer to a higher value (9238
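
The rule of thumb quoted in this message (four times the uncompressed size of the largest message seen) can be checked with a few lines of Python. This is only a sketch of the arithmetic: the helper name `recommended_ipc_buffer` and the `factor` parameter are illustrative assumptions, not part of Pacemaker.

```python
def recommended_ipc_buffer(uncompressed_bytes: int, factor: int = 4) -> int:
    """Return `factor` times the uncompressed message size.

    The factor of 4 is the multiplier described in the message above;
    this helper is hypothetical and not part of Pacemaker itself.
    """
    if uncompressed_bytes <= 0:
        raise ValueError("message size must be positive")
    return uncompressed_bytes * factor


if __name__ == "__main__":
    largest_message = 2309508  # bytes, taken from the quoted error message
    print(f"PCMK_ipc_buffer={recommended_ipc_buffer(largest_message)}")
```

The printed line has the shape of an environment-variable assignment; where such a setting belongs (for example a sysconfig or defaults file) depends on the distribution.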

Re: [ClusterLabs] live migration rarely fails seemingly without reason

2018-12-03 Thread Ferenc Wágner
"Lentes, Bernd" writes: > 2018-12-03T16:03:02.836145+01:00 ha-idg-2 libvirtd[3117]: 2018-12-03 > 15:03:02.835+: 4515: error : qemuMigrationCheckJobStatus:1456 : operation > failed: migration job: unexpectedly failed The above message is a hint at the real problem. It comes from libvirtd,

Re: [ClusterLabs] Any CLVM/DLM users around?

2018-10-01 Thread Ferenc Wágner
Patrick Whitney writes: > I have a two node (test) cluster running corosync/pacemaker with DLM > and CLVM. > > I was running into an issue where when one node failed, the remaining node > would appear to do the right thing, from the pcmk perspective, that is. > It would create a new cluster (of

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-27 Thread Ferenc Wágner
Christine Caulfield writes: > I'm also looking into high-res timestamps for logfiles too. Wouldn't that be a useful option for the syslog output as well? I'm sometimes concerned by the batching effect added by the transport between the application and the (local) log server (rsyslog or systemd)

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-27 Thread Ferenc Wágner
Ken Gaillot writes: > libqb would simply provide the API for reopening the log, and clients > such as pacemaker would intercept the signal and call the API. Just for posterity: you needn't restrict yourself to signals. Logrotate has nothing to do with signals. Signals are a rather limited form

Re: [ClusterLabs] Antw: Salvaging aborted resource migration

2018-09-27 Thread Ferenc Wágner
Ken Gaillot writes: > On Thu, 2018-09-27 at 09:36 +0200, Ulrich Windl wrote: > >> Obviously you violated the most important cluster rule that is "be >> patient". Maybe the next important is "Don't change the >> configuration while the cluster is not in IDLE state" ;-) > > Agreed -- although eve

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-27 Thread Ferenc Wágner
Christine Caulfield writes: > TBH I would be quite happy to leave this to logrotate but the message I > was getting here is that we need additional help from libqb. I'm willing > to go with a consensus on this though Yes, to do a proper job logrotate has to have a way to get the log files reopen

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-27 Thread Ferenc Wágner
Christine Caulfield writes: > I'm looking into new features for libqb and the option in > https://github.com/ClusterLabs/libqb/issues/142#issuecomment-76206425 > looks like a good option to me. It feels backwards to me: traditionally, increasing numbers signify older rotated logs, while this pro

[ClusterLabs] Salvaging aborted resource migration

2018-09-26 Thread Ferenc Wágner
Hi, The current behavior of cancelled migration with Pacemaker 1.1.16 with a resource implementing push migration: # /usr/sbin/crm_resource --ban -r vm-conv-4 vhbl03 crmd[10017]: notice: State transition S_IDLE -> S_POLICY_ENGINE vhbl03 pengine[10016]: notice: Migrate vm-conv-4#011(Started v

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-26 Thread Ferenc Wágner
Jan Friesse writes: > wagner.fer...@kifu.gov.hu writes: > >> triggered by your favourite IPC mechanism (SIGHUP and SIGUSRx are common >> choices, but logging.* cmap keys probably fit Corosync better). That >> would enable proper log rotation. > > What is the reason that you find "copytruncate" a

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-24 Thread Ferenc Wágner
Jan Friesse writes: > Default example config should be definitively ported to newer style of > nodelist without interface section. example.udpu can probably be > deleted as well as example.xml (whole idea of having XML was because > of cluster config tools like pcs, but these tools never used > c

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-24 Thread Ferenc Wágner
Jan Friesse writes: > Have you had a time to play with packaging current alpha to find out > if there are no issues? I had no problems with Fedora, but Debian has > a lot of patches, and I would be really grateful if we could reduce > them a lot - so please let me know if there is patch which you

Re: [ClusterLabs] Corosync 3 release plans?

2018-08-27 Thread Ferenc Wágner
Jan Friesse writes: > Currently I'm pretty happy with current Corosync alpha stability so it > would be possible to release final right now, but because I want to > give us some room to break protocol/abi (only if needed and right now > I don't see any strong reason for such breakage), I didn't r

[ClusterLabs] Corosync 3 release plans? (was: Redundant ring not recovering after node is back)

2018-08-26 Thread Ferenc Wágner
Jan Friesse writes: > try corosync 3.x (current Alpha4 is pretty stable [...] Hi Honza, Can you provide an estimate for the Corosync 3 release timeline? We have to plan the ABI transition in Debian and the freeze date is drawing closer. -- Thanks, Feri

Re: [ClusterLabs] Redundant ring not recovering after node is back

2018-08-25 Thread Ferenc Wágner
wf...@niif.hu (Ferenc Wágner) writes: > David Tolosa writes: > >> I tried to install corosync 3.x and it works pretty well. >> But when I install pacemaker, it installs previous version of corosync as >> dependency and breaks all the setup. >> Any suggestions? >

Re: [ClusterLabs] Redundant ring not recovering after node is back

2018-08-25 Thread Ferenc Wágner
David Tolosa writes: > I tried to install corosync 3.x and it works pretty well. > But when I install pacemaker, it installs previous version of corosync as > dependency and breaks all the setup. > Any suggestions? Install the equivs package to create a dummy corosync package representing your l

Re: [ClusterLabs] Antw: Re: Spurious node loss in corosync cluster

2018-08-22 Thread Ferenc Wágner
Jan Friesse writes: > Is that system VM or physical machine? Because " Corosync main process > was not scheduled for..." is usually happening on VMs where hosts are > highly overloaded. Or when physical hosts use BMC watchdogs. But Prasad didn't encounter such logs in the setup at hand, as far

Re: [ClusterLabs] DLM recovery stuck (digression: Corosync watchdog experience)

2018-08-10 Thread Ferenc Wágner
FeldHost™ Admin writes: > rule of thumb is use separate dedicated network for corosync traffic. > For ex. we use two corosync rings, first and active one on separate > network card and switch, second passive one on team (bond) device vlan. Hi, That's fine in principle, but this is a bladecenter

Re: [ClusterLabs] DLM recovery stuck

2018-08-09 Thread Ferenc Wágner
David Teigland writes: > On Thu, Aug 09, 2018 at 06:11:48PM +0200, Ferenc Wágner wrote: > >> Almost ten years ago you requested more info in a similar case, let's >> see if we can get further now! > > Hi, the usual cause is that a network message from the dlm h

Re: [ClusterLabs] DLM recovery stuck

2018-08-09 Thread Ferenc Wágner
wf...@niif.hu (Ferenc Wágner) writes: > For a start I attached the dump output from another node. I meant to... 146 dlm_controld 4.0.5 started 146 our_nodeid 167773708 146 found /dev/misc/dlm-control minor 58 146 found /dev/misc/dlm-monitor minor 57 146 found /dev/misc/dlm_plock minor 56

[ClusterLabs] DLM recovery stuck

2018-08-09 Thread Ferenc Wágner
Hi David, Almost ten years ago you requested more info in a similar case, let's see if we can get further now! We're running a 6-node Corosync cluster. DLM is started by systemd: ● dlm.service - dlm control daemon Loaded: loaded (/lib/systemd/system/dlm.service; enabled) Active: active (r

Re: [ClusterLabs] [questionnaire] Do you manage your pacemaker configuration by hand and (if so) what reusability features do you use?

2018-06-07 Thread Ferenc Wágner
Jan Pokorný writes: > 1. [X] Do you edit CIB by hand (as opposed to relying on crm/pcs or > their UI counterparts)? For debugging one has to understand the CIB anyway, so why learn additional syntaxes? :) Most of our configuration changes are scripted via a home-grown domain-specific CL

Re: [ClusterLabs] Corosync 2.4.4 is available at corosync.org!

2018-04-13 Thread Ferenc Wágner
Jan Friesse writes: > Ferenc Wágner napsal(a): > >> I wonder if c139255 (totemsrp: Implement sanity checks of received >> msgs) has direct security relevance as well. > > Not entirely direct, but quite similar. > >> Should I include that too in the Debian secur

Re: [ClusterLabs] Corosync 2.4.4 is available at corosync.org!

2018-04-12 Thread Ferenc Wágner
Jan Pokorný writes: > On 12/04/18 14:33 +0200, Jan Friesse wrote: > >> This release contains a lot of fixes, including fix for >> CVE-2018-1084. > > Security related updates would preferably provide more context Absolutely, thanks for providing that! Looking at the git log, I wonder if c139255

Re: [ClusterLabs] Issues found in Pacemaker 1.1.18, fixes in 1.1 branch

2017-12-12 Thread Ferenc Wágner
Ken Gaillot writes: > A couple of regressions have been found in the recent Pacemaker 1.1.18 > release. > > Fixes for these, plus one finishing an incomplete fix in 1.1.18, are in > the master branch, and have been backported to the 1.1 branch for ease > of patching. It is recommended that anyone

Re: [ClusterLabs] Is corosync supposed to be restarted if it fies?

2017-11-27 Thread Ferenc Wágner
Andrei Borzenkov writes: > 25.11.2017 10:05, Andrei Borzenkov пишет: > >> In one of guides suggested procedure to simulate split brain was to kill >> corosync process. It actually worked on one cluster, but on another >> corosync process was restarted after being killed without cluster >> noticin

Re: [ClusterLabs] Important note for anyone using guest nodes and upgrading to 1.1.18

2017-11-20 Thread Ferenc Wágner
Ken Gaillot writes: > This will also be of interest to distribution packagers ... Hi Ken, Do you mean that this warrants a prominent package changelog entry? Or what else could packagers do about this? -- Thanks, Feri ___ Users mailing list: Users@c

Re: [ClusterLabs] Pacemaker resource parameter reload confusion

2017-11-01 Thread Ferenc Wágner
Ken Gaillot writes: > When an operation completes, a history entry () is added to > the pe-input file. If the agent supports reload, the entry will include > op-force-restart and op-restart-digest fields. Now I see those are > present in the vm-alder_last_0 entry, so agent support isn't the issue

Re: [ClusterLabs] Pacemaker resource parameter reload confusion

2017-10-31 Thread Ferenc Wágner
Ken Gaillot writes: > The pe-input is indeed entirely sufficient. > > I forgot to check why the reload was not possible in this case. It > turns out it is this: > >    trace: check_action_definition:  Resource vm-alder doesn't know > how to reload > > Does the resource agent implement the "re

Re: [ClusterLabs] How to mount rpc_pipefs on a different mountpoint on RHEL/CentOS 7?

2017-10-31 Thread Ferenc Wágner
Dennis Jacobfeuerborn writes: > if I create a new unit file for the new file the services would not > depend on it so it wouldn't get automatically mounted when they start. Put the new unit file under /etc/systemd/system/x.service.requires to have x.service require it. I don't get the full pict

Re: [ClusterLabs] Pacemaker resource parameter reload confusion

2017-10-31 Thread Ferenc Wágner
Ken Gaillot writes: > On Fri, 2017-10-20 at 15:52 +0200, Ferenc Wágner wrote: > >> Ken Gaillot writes: >> >>> On Fri, 2017-09-22 at 18:30 +0200, Ferenc Wágner wrote: >>> >>>> Ken Gaillot writes: >>>> >>>>> Hm

Re: [ClusterLabs] Colocation rule with vip and ms master

2017-10-30 Thread Ferenc Wágner
Norberto Lopes writes: > On Fri, 27 Oct 2017 at 06:41 Ferenc Wágner wrote: > >> Norberto Lopes writes: >> >>> colocation backup-vip-not-with-master -inf: backupVIP postgresMS:Master >>> colocation backup-vip-not-with-master inf: backupVIP postgresMS:Slave

Re: [ClusterLabs] Colocation rule with vip and ms master

2017-10-26 Thread Ferenc Wágner
Norberto Lopes writes: > colocation backup-vip-not-with-master -inf: backupVIP postgresMS:Master > colocation backup-vip-not-with-master inf: backupVIP postgresMS:Slave > > Basically what's occurring in my cluster is that the first rule stops the > Sync node from being promoted if the Master ever

Re: [ClusterLabs] Pacemaker resource parameter reload confusion

2017-10-20 Thread Ferenc Wágner
Ken Gaillot writes: > On Fri, 2017-09-22 at 18:30 +0200, Ferenc Wágner wrote: >> Ken Gaillot writes: >> >>> Hmm, stop+reload is definitely a bug. Can you attach (or email it to >>> me privately, or file a bz with it attached) the above pe-input file

Re: [ClusterLabs] corosync service not automatically started

2017-10-11 Thread Ferenc Wágner
Václav Mach writes: > On 10/11/2017 09:00 AM, Ferenc Wágner wrote: > >> Václav Mach writes: >> >>> allow-hotplug eth0 >>> iface eth0 inet dhcp >> >> Try replacing allow-hotplug with auto. Ifupdown simply runs ifup -a >> before network-

Re: [ClusterLabs] ClusterMon mail notification - does not work

2017-10-11 Thread Ferenc Wágner
Donat Zenichev writes: > then resource is stopped, but nothing occurred on e-mail destination. > Where I did wrong actions? Please note that ClusterMon notifications are becoming deprecated (they should still work, but I've got no experience with them). Try using alerts instead, as documented a

Re: [ClusterLabs] corosync service not automatically started

2017-10-11 Thread Ferenc Wágner
Václav Mach writes: > allow-hotplug eth0 > iface eth0 inet dhcp Try replacing allow-hotplug with auto. Ifupdown simply runs ifup -a before network-online.target, which excludes allow-hotplug interfaces. That means allow-hotplug interfaces are not waited for before corosync is started during boo
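
The change suggested in this message (declaring the interface with `auto` instead of `allow-hotplug`) can be sketched as a small text transformation. The function name `use_auto` is a hypothetical helper, and the sketch only handles stanzas that name exactly one interface, as in the quoted configuration.

```python
def use_auto(interfaces_text: str, iface: str) -> str:
    """Replace an 'allow-hotplug <iface>' line with 'auto <iface>'.

    Hypothetical helper for illustration; only lines naming exactly this
    one interface are rewritten, every other line is kept unchanged.
    """
    lines = []
    for line in interfaces_text.splitlines():
        if line.split() == ["allow-hotplug", iface]:
            line = f"auto {iface}"
        lines.append(line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    original = "allow-hotplug eth0\niface eth0 inet dhcp\n"
    print(use_auto(original, "eth0"), end="")
```

Working on a copy of the configuration text rather than editing the live file in place keeps the sketch safe to try.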

Re: [ClusterLabs] Strange failed transition messages showing up in log every 15 minutes

2017-09-25 Thread Ferenc Wágner
Dennis Jacobfeuerborn writes: > I see the following messages repeated every 15 minutes in > /var/log/messages: > > Sep 25 20:49:52 nfs2-storage2 pengine[2640]: warning: Processing failed op > promote for drbd:0 on nfs2-storage2: unknown error (1) > > The status still shows an error but this seem

Re: [ClusterLabs] Pacemaker resource parameter reload confusion

2017-09-22 Thread Ferenc Wágner
Ken Gaillot writes: > Hmm, stop+reload is definitely a bug. Can you attach (or email it to me > privately, or file a bz with it attached) the above pe-input file with > any sensitive info removed? I sent you the pe-input file privately. It indeed shows the issue: $ /usr/sbin/crm_simulate -x pe

[ClusterLabs] Pacemaker resource parameter reload confusion

2017-09-22 Thread Ferenc Wágner
Hi, I'm running a custom resource agent under Pacemaker 1.1.16, which has several reloadable parameters: $ /usr/sbin/crm_resource --show-metadata=ocf:niif:TransientDomain | fgrep unique= I used to routinely change the unique="0" parameters without having the corresponding resources re

Re: [ClusterLabs] Pacemaker 1.1.18 deprecation warnings

2017-09-20 Thread Ferenc Wágner
Ken Gaillot writes: > * undocumented LRMD_MAX_CHILDREN environment variable > (PCMK_node_action_limit is the current syntax) By the way, is the current syntax documented somewhere? Looking at crmd/throttle.c, throttle_update_job_max() is only ever invoked with a NULL argument, so "Global prefer

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-11 Thread Ferenc Wágner
Jan Friesse writes: > Back to problem you have. It's definitively HW issue but I'm thinking > how to solve it in software. Right now, I can see two ways: > 1. Set dog FD to be non blocking right at the end of setup_watchdog - >This is proffered but I'm not sure if it's really going to work.

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-11 Thread Ferenc Wágner
Klaus Wenninger writes: > Just for my understanding: You are using watchdog-handling in corosync? Yes, I was. -- Feri

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-10 Thread Ferenc Wágner
Valentin Vidic writes: > On Sun, Sep 10, 2017 at 08:27:47AM +0200, Ferenc Wágner wrote: > >> Confirmed: setting watchdog_device: off cluster wide got rid of the >> above warnings. > > Interesting, what brand or version of IPMI has this problem? It's a Fujitsu PR

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-09 Thread Ferenc Wágner
wf...@niif.hu (Ferenc Wágner) writes: > Jan Friesse writes: > >> wf...@niif.hu writes: >> >>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day >>> (in August; in May, it happened 0-2 times a day only, it's slowly >>> rampin

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-05 Thread Ferenc Wágner
Jan Friesse writes: > wf...@niif.hu writes: > >> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day >> (in August; in May, it happened 0-2 times a day only, it's slowly >> ramping up): >> >> vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new >> configuration. >>

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-01 Thread Ferenc Wágner
Digimer writes: > On 2017-08-29 10:45 AM, Ferenc Wágner wrote: > >> Digimer writes: >> >>> On 2017-08-28 12:07 PM, Ferenc Wágner wrote: >>> >>>> [...] >>>> While dlm_tool status reports (similar on all nodes): >>>>

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-01 Thread Ferenc Wágner
Jan Friesse writes: > wf...@niif.hu writes: > >> Jan Friesse writes: >> >>> wf...@niif.hu writes: >>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day (in August; in May, it happened 0-2 times a day only, it's slowly ramping up): vhbl08 corosync[3687

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-31 Thread Ferenc Wágner
Jan Friesse writes: > wf...@niif.hu writes: > >> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day >> (in August; in May, it happened 0-2 times a day only, it's slowly >> ramping up): >> >> vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new >> configuration. >>

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-31 Thread Ferenc Wágner
Klaus Wenninger writes: > Just seen that you are hosting VMs which might make you use KSM ... > Don't fully remember at the moment but I have some memory of > issues with KSM and page-locking. > iirc it was some bug in the kernel memory-management that should > be fixed a long time ago but ... H

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-30 Thread Ferenc Wágner
"Ulrich Windl" writes: >>>> Ferenc Wágner schrieb am 28.08.2017 um 18:07 in Nachricht > <87mv6jk75r@lant.ki.iif.hu>: > > cLVM under I/O load can be really slow (I'm talking about delays in the range > of a few seconds). Yes, I know, and it'

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-29 Thread Ferenc Wágner
Jan Friesse writes: > wf...@niif.hu writes: > >> Jan Friesse writes: >> >>> wf...@niif.hu writes: >>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day (in August; in May, it happened 0-2 times a day only, it's slowly ramping up): vhbl08 corosync[3687

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-29 Thread Ferenc Wágner
Digimer writes: > On 2017-08-28 12:07 PM, Ferenc Wágner wrote: > >> [...] >> While dlm_tool status reports (similar on all nodes): >> >> cluster nodeid 167773705 quorate 1 ring seq 3088 3088 >> daemon now 2941405 fence_pid 0 >> node 167773705 M a

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-29 Thread Ferenc Wágner
Jan Friesse writes: > wf...@niif.hu writes: > >> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day >> (in August; in May, it happened 0-2 times a day only, it's slowly >> ramping up): >> >> vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new >> configuration. >>

[ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-28 Thread Ferenc Wágner
Hi, In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day (in August; in May, it happened 0-2 times a day only, it's slowly ramping up): vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new configuration. vhbl03 corosync[3890]: [TOTEM ] A processor failed, forming n

Re: [ClusterLabs] Pacemaker 1.1.17 Release Candidate 4 (likely final)

2017-06-21 Thread Ferenc Wágner
Ken Gaillot writes: > The most significant change in this release is a new cluster option to > improve scalability. > > As users start to create clusters with hundreds of resources and many > nodes, one bottleneck is a complete reprobe of all resources (for > example, after a cleanup of all resou

Re: [ClusterLabs] Notifications on changes in clustered LVM

2017-06-20 Thread Ferenc Wágner
Digimer writes: > On 19/06/17 11:40 PM, Andrei Borzenkov wrote: > >> 20.06.2017 02:15, Digimer пишет: >> >>> On 19/06/17 06:59 PM, Ferenc Wágner wrote: >>> >>>> Digimer writes: >>>> >>>>> So we have a tool that watches

Re: [ClusterLabs] Notifications on changes in clustered LVM

2017-06-19 Thread Ferenc Wágner
Digimer writes: > So we have a tool that watches for changes to clvmd by running > pvscan/vgscan/lvscan, but this seems to be expensive and occassionally > cause trouble. What kind of trouble did you experience? > Is there any other way to be notified or to check when something > changes? LV (

Re: [ClusterLabs] Ubuntu 16.04 - Only binds on 127.0.0.1 then fails until reinstall

2017-05-05 Thread Ferenc Wágner
James Booth writes: > Sorry for the repeat mails, but I had issues subscribing list time > (Looks like it has worked successfully now!). > > Anywho, I'm really desperate for some help on my issue in > http://lists.clusterlabs.org/pipermail/users/2017-April/005495.html - > I can recap the info in

Re: [ClusterLabs] Why shouldn't one store resource configuration in the CIB?

2017-04-18 Thread Ferenc Wágner
Ken Gaillot writes: > On 04/13/2017 11:11 AM, Ferenc Wágner wrote: > >> I encountered several (old) statements on various forums along the lines >> of: "the CIB is not a transactional database and shouldn't be used as >> one" or "resource paramet

[ClusterLabs] Why shouldn't one store resource configuration in the CIB?

2017-04-13 Thread Ferenc Wágner
Hi, I encountered several (old) statements on various forums along the lines of: "the CIB is not a transactional database and shouldn't be used as one" or "resource parameters should only uniquely identify a resource, not configure it" and "the CIB was not designed to be a configuration database b

Re: [ClusterLabs] Surprising semantics of location constraints with INFINITY score

2017-04-13 Thread Ferenc Wágner
kgronl...@suse.com (Kristoffer Grönlund) writes: > I discovered today that a location constraint with score=INFINITY > doesn't actually restrict resources to running only on particular > nodes. Yeah, I made the same "discovery" some time ago. Since then I've been using something like the followi

Re: [ClusterLabs] Never join a list without a problem...

2017-03-01 Thread Ferenc Wágner
Jeffrey Westgate writes: > We use Nagios to monitor, and once every 20 to 40 hours - sometimes > longer, and we cannot set a clock by it - while the machine is 95% > idle (or more according to 'top'), the host load shoots up to 50 or > 60%. It takes about 20 minutes to peak, and another 30 to 45

Re: [ClusterLabs] Insert delay between the statup of VirtualDomain

2017-02-27 Thread Ferenc Wágner
Oscar Segarra writes: > In my environment I have 5 guestes that have to be started up in a > specified order starting for the MySQL database server. We use a somewhat redesigned resource agent, which connects to the guest using a virtio channel and waits for a signal before exiting from the star

Re: [ClusterLabs] Antw: Re: [Question] About a change of crm_failcount.

2017-02-09 Thread Ferenc Wágner
Jehan-Guillaume de Rorthais writes: > PAF use private attribute to give informations between actions. We > detect the failure during the notify as well, but raise the error > during the promotion itself. See how I dealt with this in PAF: > > https://github.com/ioguix/PAF/commit/6123025ff7cd9929b5

Re: [ClusterLabs] Antw: Re: Antw: Re: Pacemaker kill does not cause node fault ???

2017-02-08 Thread Ferenc Wágner
Ken Gaillot writes: > On 02/07/2017 01:11 AM, Ulrich Windl wrote: > >> Ken Gaillot writes: >> >>> On 02/06/2017 03:28 AM, Ulrich Windl wrote: >>> Isn't the question: Is crmd a process that is expected to die (and thus need restarting)? Or wouldn't one prefer to debug this situatio

Re: [ClusterLabs] Pacemaker kill does not cause node fault ???

2017-02-08 Thread Ferenc Wágner
Ken Gaillot writes: > On 02/03/2017 07:00 AM, RaSca wrote: >> >> On 03/02/2017 11:06, Ferenc Wágner wrote: >>> Ken Gaillot writes: >>> >>>> On 01/10/2017 04:24 AM, Stefan Schloesser wrote: >>>> >>>>> I am currently

[ClusterLabs] Failed reload

2017-02-08 Thread Ferenc Wágner
Hi, There was an interesting discussion on this list about "Doing reload right" last July (which I still haven't digested entirely). Now I've got a related question about the current and intended behavior: what happens if a reload operation fails? I found some suggestions in http://ocf.community

Re: [ClusterLabs] Pacemaker kill does not cause node fault ???

2017-02-03 Thread Ferenc Wágner
Ken Gaillot writes: > On 01/10/2017 04:24 AM, Stefan Schloesser wrote: > >> I am currently testing a 2 node cluster under Ubuntu 16.04. The setup >> seems to be working ok including the STONITH. >> For test purposes I issued a "pkill -f pace" killing all pacemaker >> processes on one node. >> >

Re: [ClusterLabs] HALVM problem with 2 nodes cluster

2017-01-18 Thread Ferenc Wágner
Marco Marino writes: > Ferenc, regarding the flag use_lvmetad in > /usr/lib/ocf/resource.d/heartbeat/LVM I read: > >> lvmetad is a daemon that caches lvm metadata to improve the >> performance of LVM commands. This daemon should never be used when >> volume groups exist that are being managed by

Re: [ClusterLabs] HALVM problem with 2 nodes cluster

2017-01-18 Thread Ferenc Wágner
Marco Marino writes: > I agree with you for > use_lvmetad = 0 (setting it = 1 in a clustered environment is an error) Where does this information come from? AFAIK, if locking_type=3 (LVM uses internal clustered locking, that is, clvmd), lvmetad is not used anyway, even if it's running. So it's

Re: [ClusterLabs] Antw: Re: VirtualDomain started in two hosts

2017-01-18 Thread Ferenc Wágner
Ken Gaillot writes: > * When you move the VM, the cluster detects that it is not running on > the node you told it to keep it running on. Because there is no > "Stopped" monitor, the cluster doesn't immediately realize that a new > rogue instance is running on another node. So, the cluster thinks

Re: [ClusterLabs] permissions under /etc/corosync/qnetd (was: Corosync 2.4.0 is available at corosync.org!)

2016-11-07 Thread Ferenc Wágner
Jan Friesse writes: > Ferenc Wágner napsal(a): > >> Have you got any plans/timeline for 2.4.2 yet? > > Yep, I'm going to release it in few minutes/hours. Man, that was quick. I've got a bunch of typo fixes queued..:) Please consider announcing upcoming releases a c

Re: [ClusterLabs] Corosync 2.4.0 is available at corosync.org!

2016-11-03 Thread Ferenc Wágner
Jan Friesse writes: >> Jan Friesse writes: >> >>> Please note that because of required changes in votequorum, >>> libvotequorum is no longer binary compatible. This is reason for >>> version bump. >> >> Er, what version bump? Corosync 2.4.1 still produces >> libvotequorum.so.7.0.0 for me, just

Re: [ClusterLabs] Special care needed when upgrading Pacemaker Remote nodes

2016-10-29 Thread Ferenc Wágner
Ken Gaillot writes: > This spurred me to complete a long-planned overhaul of Pacemaker > Explained's "Upgrading" appendix: > > http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/_upgrading.html > > Feedback is welcome. Since you asked for it..:) 1. Table D.1.: why does

Re: [ClusterLabs] Corosync 2.4.0 is available at corosync.org!

2016-10-28 Thread Ferenc Wágner
Jan Friesse writes: > Please note that because of required changes in votequorum, > libvotequorum is no longer binary compatible. This is reason for > version bump. Er, what version bump? Corosync 2.4.1 still produces libvotequorum.so.7.0.0 for me, just like Corosync 2.3.6. -- Thanks, Feri

Re: [ClusterLabs] Pacemaker and OCFS2 on stand alone mode

2016-07-08 Thread Ferenc Wágner
"Carlos Xavier" writes: > 1467918891 Is dlm missing from kernel? No misc devices found. > 1467918891 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2 > 1467918891 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2 > 1467918891 No /sys/kernel/config, is configfs loaded? > 1467918891 s

Re: [ClusterLabs] Doing reload right

2016-07-04 Thread Ferenc Wágner
Ken Gaillot writes: > Does anyone know of an RA that uses reload correctly? My resource agents advertise a no-op reload action for handling their "private" meta attributes. Meta in the sense that they are used by the resource agent when performing certain operations, not by the managed resource

Re: [ClusterLabs] DLM standalone without crm ?

2016-06-26 Thread Ferenc Wágner
"Lentes, Bernd" writes: > i don't have neither an init-script nor a systemd service file. > The only packages i find in the repositories concerning dlm are: > libdlm3-3.00.01-0.31.87 > libdlm-3.00.01-0.31.87 > And i have a kernel module for dlm. > Nothing else. Sorry, my experience is limited to

Re: [ClusterLabs] DLM standalone without crm ?

2016-06-25 Thread Ferenc Wágner
"Lentes, Bernd" writes: > wf...@niif.hu writes: > >> "Lentes, Bernd" writes: >> >>> is it possible to have a DLM running without CRM? >> >> Yes. You'll need to configure fencing, though, since by default DLM >> will try to use stonithd (from Pacemaker). But DLM fencing didn't >> handle fencing f

Re: [ClusterLabs] DLM standalone without crm ?

2016-06-25 Thread Ferenc Wágner
"Lentes, Bernd" writes: > is it possible to have a DLM running without CRM? Yes. You'll need to configure fencing, though, since by default DLM will try to use stonithd (from Pacemaker). But DLM fencing didn't handle fencing failures correctly for me, resulting in more nodes being fenced until

[ClusterLabs] restarting pacemakerd

2016-06-18 Thread Ferenc Wágner
Hi, Could somebody please elaborate a little why the pacemaker systemd service file contains "Restart=on-failure"? I mean that a failed node gets fenced anyway, so most of the time this would be a futile effort. On the other hand, one could argue that restarting failed services should be the defa

Re: [ClusterLabs] Alert notes

2016-06-16 Thread Ferenc Wágner
Klaus Wenninger writes: > On 06/16/2016 11:05 AM, Ferenc Wágner wrote: > >> Klaus Wenninger writes: >> >>> On 06/15/2016 06:11 PM, Ferenc Wágner wrote: >>> >>>> I think the default timestamp should contain date and time zone >>>> spe

Re: [ClusterLabs] Alert notes

2016-06-16 Thread Ferenc Wágner
Klaus Wenninger writes: > On 06/15/2016 06:11 PM, Ferenc Wágner wrote: > >> Please find some random notes about my adventures testing the new alert >> system. >> >> The first alert example in the documentation has no recipient: >> >> >> >&

[ClusterLabs] Alert notes

2016-06-15 Thread Ferenc Wágner
Hi, Please find some random notes about my adventures testing the new alert system. The first alert example in the documentation has no recipient: In the example above, the cluster will call my-script.sh for each event. while the next section starts as: Each alert may be conf

Re: [ClusterLabs] Pacemaker 1.1.15 - Release Candidate 4

2016-06-12 Thread Ferenc Wágner
Ken Gaillot writes: > With this release candidate, we now provide three sample alert scripts > to use with the new alerts feature, installed in the > /usr/share/pacemaker/alerts directory. Hi, Is there a real reason to name these scripts *.sample? Sure, they are samples, but they are also usab

Re: [ClusterLabs] Master-Slaver resource Restarted after configuration change

2016-06-10 Thread Ferenc Wágner
Ilia Sokolinski writes: > We have a custom Master-Slave resource running on a 3-node pcs cluster on > CentOS 7.1 > > As part of what is supposed to be an NDU we do update some properties of the > resource. > For some reason this causes both Master and Slave instances of the resource > to be r

Re: [ClusterLabs] Minimum configuration for dynamically adding a node to a cluster

2016-06-08 Thread Ferenc Wágner
Nikhil Utane writes: > Would like to know the best and easiest way to add a new node to an already > running cluster. > > Our limitation: > 1) pcsd cannot be used since (as per my understanding) it communicates over > ssh which is prevented. > 2) No manual editing of corosync.conf If you use IPv

Re: [ClusterLabs] how to "switch on" cLVM ?

2016-06-07 Thread Ferenc Wágner
"Lentes, Bernd" writes: > - On Jun 7, 2016, at 3:53 PM, Ferenc Wágner wf...@niif.hu wrote: > >> "Lentes, Bernd" writes: >> >>> Ok. Does DLM takes care that a LV just can be used on one host ? >> >> No. Even plain LVM uses loc

Re: [ClusterLabs] how to "switch on" cLVM ?

2016-06-07 Thread Ferenc Wágner
"Lentes, Bernd" writes: > Ok. Does DLM takes care that a LV just can be used on one host ? No. Even plain LVM uses locks to serialize access to its metadata (avoid concurrent writes corrupting it). These locks are provided by the host kernel (locking_type=1). DLM extends the locking concept t

Re: [ClusterLabs] Can't get nfs4 to work.

2016-06-02 Thread Ferenc Wágner
"Stephano-Shachter, Dylan" writes: > I can not figure out why version 4 is not supported. Have you got fsid=root (or fsid=0) on your root export? See man exports. -- Feri ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/li

Re: [ClusterLabs] Trouble with deb packaging from 1.12 to 1.15

2016-05-17 Thread Ferenc Wágner
Andrey Rogovsky writes: > I have deb rules, comes from 1.12 and try apply it to current release. 1.1.14 is available in sid, stretch and jessie-backports, any reason you can't use those packages? > In the building I get an error: > dh_testroot -a > rm -rf `pwd`/debian/tmp/usr/lib/service_crm.so

Re: [ClusterLabs] dlm_controld 4.0.4 exits when crmd is fencing another node

2016-04-27 Thread Ferenc Wágner
David Teigland writes: > On Tue, Apr 26, 2016 at 09:57:06PM +0200, Valentin Vidic wrote: > >> The bug is caused by the missing braces in the expanded if >> statement. >> >> Do you think we can get a new version out with this patch as the >> fencing in 4.0.4 does not work properly due to this iss

[ClusterLabs] operation parallelism

2016-04-22 Thread Ferenc Wágner
Hi, Are recurring monitor operations constrained by the batch-limit cluster option? I ask because I'd like to limit the number of parallel start and stop operations (because they are resource hungry and potentially take long) without starving other operations, especially monitors. -- Thanks, Fer

Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts

2016-04-22 Thread Ferenc Wágner
Ken Gaillot writes: > Each alert may have any number of recipients configured. These values > will simply be passed to the script as arguments. The first recipient > will also be passed as the CRM_alert_recipient environment variable, > for compatibility with existing scripts that only support on
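A minimal sketch of how a script might pick up its recipients under the scheme quoted above (recipients as arguments, the first one also in CRM_alert_recipient). It follows only what the quoted message describes; the interface of the released alerts feature may differ, and the function name collect_recipients is hypothetical.

```python
import os
import sys


def collect_recipients(argv, environ):
    """Gather recipients as described in the quoted message.

    Assumption: recipients arrive as command-line arguments, and the
    first one is duplicated in the CRM_alert_recipient environment
    variable for scripts that only understand a single recipient.
    """
    recipients = list(argv[1:])
    env_recipient = environ.get("CRM_alert_recipient")
    # Fall back to the environment variable when no arguments are given.
    if not recipients and env_recipient:
        recipients.append(env_recipient)
    return recipients


if __name__ == "__main__":
    for recipient in collect_recipients(sys.argv, os.environ):
        print("would notify", recipient)
```

Taking argv and environ as parameters, rather than reading sys.argv and os.environ inside the function, keeps the logic easy to exercise outside a cluster.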

Re: [ClusterLabs] Antw: Re: Utilization zones

2016-04-19 Thread Ferenc Wágner
"Ulrich Windl" writes: > Ferenc Wágner schrieb am 19.04.2016 um 13:42 in Nachricht > >> "Ulrich Windl" writes: >> >>> Ferenc Wágner schrieb am 18.04.2016 um 17:07 in Nachricht >>> >>>> I'm using the "balance

Re: [ClusterLabs] Utilization zones

2016-04-19 Thread Ferenc Wágner
"Ulrich Windl" writes: > Ferenc Wágner schrieb am 18.04.2016 um 17:07 in Nachricht > >> I'm using the "balanced" placement strategy with good success. It >> distributes our VM resources according to memory size perfectly. >> However, I

[ClusterLabs] Utilization zones

2016-04-18 Thread Ferenc Wágner
Hi, I'm using the "balanced" placement strategy with good success. It distributes our VM resources according to memory size perfectly. However, I'd like to take the NUMA topology into account. That means each host should have several capacity pools (of each capacity type) to arrange the resource

[ClusterLabs] crmd error: Cannot route message to unknown node

2016-04-07 Thread Ferenc Wágner
Hi, On a freshly rebooted cluster node (after crm_mon reports it as 'online'), I get the following: wferi@vhbl08:~$ sudo crm_resource -r vm-cedar --cleanup Cleaning up vm-cedar on vhbl03, removing fail-count-vm-cedar Cleaning up vm-cedar on vhbl04, removing fail-count-vm-cedar Cleaning up vm-ceda

Re: [ClusterLabs] Antw: Re: spread out resources

2016-04-04 Thread Ferenc Wágner
"Ulrich Windl" writes: > Actually form my SLES11 SP[1-4] experience, the cluster always > distributes resources across all available nodes, and only if don't > want that, I'll have to add constraints. I wonder why that does not > seem to work for you. Because I'd like to spread small subsets of
