[ClusterLabs] PCMK_ipc_buffer recommendation

2019-01-17 Thread Ferenc Wágner
Hi, Looking at lib/common/ipc.c, Pacemaker recommends setting PCMK_ipc_buffer to 4 times the *uncompressed* size of the biggest message seen: error: Could not compress the message (2309508 bytes) into less than the configured ipc limit (131072 bytes). Set PCMK_ipc_buffer to a higher value
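The rule of thumb above (four times the uncompressed size of the largest message) reduces to simple arithmetic. The sketch below assumes that the 4× factor read from lib/common/ipc.c applies unchanged. The helper name `recommended_ipc_buffer` is hypothetical and not part of Pacemaker.

```python
# Hypothetical helper applying the rule of thumb quoted from lib/common/ipc.c:
# PCMK_ipc_buffer should be about 4x the *uncompressed* size of the largest
# IPC message seen. The factor comes from that post, not from official docs.
FACTOR = 4


def recommended_ipc_buffer(largest_uncompressed_msg: int, factor: int = FACTOR) -> int:
    """Return a suggested PCMK_ipc_buffer value in bytes."""
    if largest_uncompressed_msg <= 0:
        raise ValueError("message size must be positive")
    return factor * largest_uncompressed_msg


if __name__ == "__main__":
    # Figures taken from the log message above.
    msg_size = 2309508    # uncompressed message size, bytes
    configured = 131072   # configured ipc limit, bytes
    value = recommended_ipc_buffer(msg_size)
    print(f"PCMK_ipc_buffer={value}")  # PCMK_ipc_buffer=9238032
    print(f"that is {value / configured:.1f} times the configured limit")
```

Set the resulting value as PCMK_ipc_buffer in Pacemaker's environment file. Depending on the distribution, this is typically /etc/sysconfig/pacemaker or /etc/default/pacemaker.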

Re: [ClusterLabs] live migration rarely fails seemingly without reason

2018-12-03 Thread Ferenc Wágner
"Lentes, Bernd" writes: > 2018-12-03T16:03:02.836145+01:00 ha-idg-2 libvirtd[3117]: 2018-12-03 > 15:03:02.835+: 4515: error : qemuMigrationCheckJobStatus:1456 : operation > failed: migration job: unexpectedly failed The above message is a hint at the real problem. It comes from libvirtd,

Re: [ClusterLabs] Any CLVM/DLM users around?

2018-10-01 Thread Ferenc Wágner
Patrick Whitney writes: > I have a two node (test) cluster running corosync/pacemaker with DLM > and CLVM. > > I was running into an issue where when one node failed, the remaining node > would appear to do the right thing, from the pcmk perspective, that is. > It would create a new cluster (of

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-27 Thread Ferenc Wágner
Christine Caulfield writes: > I'm also looking into high-res timestamps for logfiles too. Wouldn't that be a useful option for the syslog output as well? I'm sometimes concerned by the batching effect added by the transport between the application and the (local) log server (rsyslog or

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-27 Thread Ferenc Wágner
Ken Gaillot writes: > libqb would simply provide the API for reopening the log, and clients > such as pacemaker would intercept the signal and call the API. Just for posterity: you needn't restrict yourself to signals. Logrotate has nothing to do with signals. Signals are a rather limited

Re: [ClusterLabs] Antw: Salvaging aborted resource migration

2018-09-27 Thread Ferenc Wágner
Ken Gaillot writes: > On Thu, 2018-09-27 at 09:36 +0200, Ulrich Windl wrote: > >> Obviously you violated the most important cluster rule that is "be >> patient". Maybe the next important is "Don't change the >> configuration while the cluster is not in IDLE state" ;-) > > Agreed -- although

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-27 Thread Ferenc Wágner
Christine Caulfield writes: > TBH I would be quite happy to leave this to logrotate but the message I > was getting here is that we need additional help from libqb. I'm willing > to go with a consensus on this though Yes, to do a proper job logrotate has to have a way to get the log files

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-27 Thread Ferenc Wágner
Christine Caulfield writes: > I'm looking into new features for libqb and the option in > https://github.com/ClusterLabs/libqb/issues/142#issuecomment-76206425 > looks like a good option to me. It feels backwards to me: traditionally, increasing numbers signify older rotated logs, while this
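Ferenc compares the proposal against a specific numbering scheme: log.1 is the most recently rotated file, and higher numbers are progressively older. The sketch below shows only that traditional scheme. It does not describe what libqb or logrotate do internally, and the names `rotate` and `keep` are illustrative.

```python
import os
import tempfile


def rotate(path: str, keep: int) -> None:
    """Traditional numbered rotation: after this call path.1 holds what was
    the live log (newest), path.<keep> the oldest; anything older is dropped."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    oldest = f"{path}.{keep}"
    if os.path.exists(oldest):
        os.remove(oldest)
    # Shift path.n -> path.(n+1), oldest first, so nothing gets overwritten.
    for n in range(keep - 1, 0, -1):
        src = f"{path}.{n}"
        if os.path.exists(src):
            os.replace(src, f"{path}.{n + 1}")
    if os.path.exists(path):
        os.replace(path, f"{path}.1")


if __name__ == "__main__":
    log = os.path.join(tempfile.mkdtemp(), "corosync.log")
    for generation in ("first", "second", "third"):
        with open(log, "w") as f:
            f.write(generation)
        rotate(log, keep=2)
    for suffix in (".1", ".2"):
        with open(log + suffix) as f:
            print(suffix, f.read())
```

Under this scheme, every rotation renames every kept file. The writing process therefore has to reopen its log afterwards, which is the subject of the neighbouring messages.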

[ClusterLabs] Salvaging aborted resource migration

2018-09-27 Thread Ferenc Wágner
Hi, The current behavior of cancelled migration with Pacemaker 1.1.16 with a resource implementing push migration: # /usr/sbin/crm_resource --ban -r vm-conv-4 vhbl03 crmd[10017]: notice: State transition S_IDLE -> S_POLICY_ENGINE vhbl03 pengine[10016]: notice: Migrate vm-conv-4#011(Started

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-26 Thread Ferenc Wágner
Jan Friesse writes: > wagner.fer...@kifu.gov.hu writes: > >> triggered by your favourite IPC mechanism (SIGHUP and SIGUSRx are common >> choices, but logging.* cmap keys probably fit Corosync better). That >> would enable proper log rotation. > > What is the reason that you find "copytruncate"
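The reopen-on-request idea can be sketched with SIGHUP as the trigger, since that is one of the mechanisms mentioned. The sketch is Python, not Corosync or libqb code. `ReopenableLog` and `install_sighup_reopen` are hypothetical names, and signal handling here assumes a POSIX system.

```python
import os
import signal
import tempfile


class ReopenableLog:
    """Append-only, line-buffered log that can be reopened by path, e.g. after
    logrotate has renamed the old file away."""

    def __init__(self, path: str):
        self.path = path
        self._fh = open(path, "a", buffering=1)

    def write(self, line: str) -> None:
        self._fh.write(line + "\n")

    def reopen(self) -> None:
        # Close the old (possibly renamed) file, then start a fresh one at self.path.
        self._fh.close()
        self._fh = open(self.path, "a", buffering=1)

    def close(self) -> None:
        self._fh.close()


def install_sighup_reopen(log: ReopenableLog) -> None:
    """Reopen the log whenever the process receives SIGHUP (POSIX only)."""
    def handler(signum, frame):
        log.reopen()
    signal.signal(signal.SIGHUP, handler)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "daemon.log")
    log = ReopenableLog(path)
    install_sighup_reopen(log)
    log.write("before rotation")
    os.rename(path, path + ".1")          # what logrotate does without copytruncate
    os.kill(os.getpid(), signal.SIGHUP)   # what a postrotate script could send
    log.write("after rotation")
    log.close()
```

Compared with copytruncate, this approach does not lose lines written between the copy and the truncation. The logrotate documentation warns about exactly that window for copytruncate.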

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-24 Thread Ferenc Wágner
Jan Friesse writes: > Default example config should be definitively ported to newer style of > nodelist without interface section. example.udpu can probably be > deleted as well as example.xml (whole idea of having XML was because > of cluster config tools like pcs, but these tools never used >

Re: [ClusterLabs] Corosync 3 release plans?

2018-09-24 Thread Ferenc Wágner
Jan Friesse writes: > Have you had a time to play with packaging current alpha to find out > if there are no issues? I had no problems with Fedora, but Debian has > a lot of patches, and I would be really grateful if we could reduce > them a lot - so please let me know if there is patch which

Re: [ClusterLabs] Corosync 3 release plans?

2018-08-27 Thread Ferenc Wágner
Jan Friesse writes: > Currently I'm pretty happy with current Corosync alpha stability so it > would be possible to release final right now, but because I want to > give us some room to break protocol/abi (only if needed and right now > I don't see any strong reason for such breakage), I didn't

[ClusterLabs] Corosync 3 release plans? (was: Redundant ring not recovering after node is back)

2018-08-26 Thread Ferenc Wágner
Jan Friesse writes: > try corosync 3.x (current Alpha4 is pretty stable [...] Hi Honza, Can you provide an estimate for the Corosync 3 release timeline? We have to plan the ABI transition in Debian, and the freeze date is drawing closer. -- Thanks, Feri

Re: [ClusterLabs] Redundant ring not recovering after node is back

2018-08-25 Thread Ferenc Wágner
wf...@niif.hu (Ferenc Wágner) writes: > David Tolosa writes: > >> I tried to install corosync 3.x and it works pretty well. >> But when I install pacemaker, it installs previous version of corosync as >> dependency and breaks all the setup. >> Any suggestions?

Re: [ClusterLabs] Redundant ring not recovering after node is back

2018-08-25 Thread Ferenc Wágner
David Tolosa writes: > I tried to install corosync 3.x and it works pretty well. > But when I install pacemaker, it installs previous version of corosync as > dependency and breaks all the setup. > Any suggestions? Install the equivs package to create a dummy corosync package representing your

Re: [ClusterLabs] Antw: Re: Spurious node loss in corosync cluster

2018-08-22 Thread Ferenc Wágner
Jan Friesse writes: > Is that system VM or physical machine? Because " Corosync main process > was not scheduled for..." is usually happening on VMs where hosts are > highly overloaded. Or when physical hosts use BMC watchdogs. But Prasad didn't encounter such logs in the setup at hand, as far

Re: [ClusterLabs] DLM recovery stuck (digression: Corosync watchdog experience)

2018-08-10 Thread Ferenc Wágner
FeldHost™ Admin writes: > rule of thumb is use separate dedicated network for corosync traffic. > For ex. we use two corosync rings, first and active one on separate > network card and switch, second passive one on team (bond) device vlan. Hi, That's fine in principle, but this is a

Re: [ClusterLabs] DLM recovery stuck

2018-08-09 Thread Ferenc Wágner
David Teigland writes: > On Thu, Aug 09, 2018 at 06:11:48PM +0200, Ferenc Wágner wrote: > >> Almost ten years ago you requested more info in a similar case, let's >> see if we can get further now! > > Hi, the usual cause is that a network message from the dlm has be

Re: [ClusterLabs] DLM recovery stuck

2018-08-09 Thread Ferenc Wágner
wf...@niif.hu (Ferenc Wágner) writes: > For a start I attached the dump output from another node. I meant to... 146 dlm_controld 4.0.5 started 146 our_nodeid 167773708 146 found /dev/misc/dlm-control minor 58 146 found /dev/misc/dlm-monitor minor 57 146 found /dev/misc/dlm_plock minor 56

[ClusterLabs] DLM recovery stuck

2018-08-09 Thread Ferenc Wágner
Hi David, Almost ten years ago you requested more info in a similar case, let's see if we can get further now! We're running a 6-node Corosync cluster. DLM is started by systemd: ● dlm.service - dlm control daemon Loaded: loaded (/lib/systemd/system/dlm.service; enabled) Active: active

Re: [ClusterLabs] [questionnaire] Do you manage your pacemaker configuration by hand and (if so) what reusability features do you use?

2018-06-07 Thread Ferenc Wágner
Jan Pokorný writes: > 1. [X] Do you edit CIB by hand (as opposed to relying on crm/pcs or > their UI counterparts)? For debugging one has to understand the CIB anyway, so why learn additional syntaxes? :) Most of our configuration changes are scripted via a home-grown domain-specific

Re: [ClusterLabs] Corosync 2.4.4 is available at corosync.org!

2018-04-12 Thread Ferenc Wágner
Jan Pokorný writes: > On 12/04/18 14:33 +0200, Jan Friesse wrote: > >> This release contains a lot of fixes, including fix for >> CVE-2018-1084. > > Security related updates would preferably provide more context Absolutely, thanks for providing that! Looking at the git

Re: [ClusterLabs] Issues found in Pacemaker 1.1.18, fixes in 1.1 branch

2017-12-12 Thread Ferenc Wágner
Ken Gaillot writes: > A couple of regressions have been found in the recent Pacemaker 1.1.18 > release. > > Fixes for these, plus one finishing an incomplete fix in 1.1.18, are in > the master branch, and have been backported to the 1.1 branch for ease > of patching. It is

Re: [ClusterLabs] Is corosync supposed to be restarted if it dies?

2017-11-27 Thread Ferenc Wágner
Andrei Borzenkov writes: > 25.11.2017 10:05, Andrei Borzenkov wrote: > >> In one of guides suggested procedure to simulate split brain was to kill >> corosync process. It actually worked on one cluster, but on another >> corosync process was restarted after being killed

Re: [ClusterLabs] Pacemaker resource parameter reload confusion

2017-11-01 Thread Ferenc Wágner
Ken Gaillot writes: > When an operation completes, a history entry () is added to > the pe-input file. If the agent supports reload, the entry will include > op-force-restart and op-restart-digest fields. Now I see those are > present in the vm-alder_last_0 entry, so agent

Re: [ClusterLabs] Pacemaker resource parameter reload confusion

2017-10-31 Thread Ferenc Wágner
Ken Gaillot writes: > The pe-input is indeed entirely sufficient. > > I forgot to check why the reload was not possible in this case. It > turns out it is this: > >    trace: check_action_definition:  Resource vm-alder doesn't know > how to reload > > Does the resource

Re: [ClusterLabs] How to mount rpc_pipefs on a different mountpoint on RHEL/CentOS 7?

2017-10-31 Thread Ferenc Wágner
Dennis Jacobfeuerborn writes: > if I create a new unit file for the new file the services would not > depend on it so it wouldn't get automatically mounted when they start. Put the new unit file under /etc/systemd/system/x.service.requires to have x.service require it. I

Re: [ClusterLabs] Pacemaker resource parameter reload confusion

2017-10-31 Thread Ferenc Wágner
Ken Gaillot <kgail...@redhat.com> writes: > On Fri, 2017-10-20 at 15:52 +0200, Ferenc Wágner wrote: > >> Ken Gaillot <kgail...@redhat.com> writes: >> >>> On Fri, 2017-09-22 at 18:30 +0200, Ferenc Wágner wrote: >>> >>>> Ken Gaillo

Re: [ClusterLabs] Colocation rule with vip and ms master

2017-10-31 Thread Ferenc Wágner
Norberto Lopes <nlopes...@gmail.com> writes: > On Fri, 27 Oct 2017 at 06:41 Ferenc Wágner <wf...@niif.hu> wrote: > >> Norberto Lopes <nlopes...@gmail.com> writes: >> >>> colocation backup-vip-not-with-master -inf: backupVIP postgresMS:Master

Re: [ClusterLabs] Colocation rule with vip and ms master

2017-10-26 Thread Ferenc Wágner
Norberto Lopes writes: > colocation backup-vip-not-with-master -inf: backupVIP postgresMS:Master > colocation backup-vip-not-with-master inf: backupVIP postgresMS:Slave > > Basically what's occurring in my cluster is that the first rule stops the > Sync node from being

Re: [ClusterLabs] Pacemaker resource parameter reload confusion

2017-10-20 Thread Ferenc Wágner
Ken Gaillot <kgail...@redhat.com> writes: > On Fri, 2017-09-22 at 18:30 +0200, Ferenc Wágner wrote: >> Ken Gaillot <kgail...@redhat.com> writes: >> >>> Hmm, stop+reload is definitely a bug. Can you attach (or email it to >>> me privately, or file a

Re: [ClusterLabs] corosync service not automatically started

2017-10-11 Thread Ferenc Wágner
Václav Mach <ma...@cesnet.cz> writes: > On 10/11/2017 09:00 AM, Ferenc Wágner wrote: > >> Václav Mach <ma...@cesnet.cz> writes: >> >>> allow-hotplug eth0 >>> iface eth0 inet dhcp >> >> Try replacing allow-hotplug with auto. Ifupdow

Re: [ClusterLabs] ClusterMon mail notification - does not work

2017-10-11 Thread Ferenc Wágner
Donat Zenichev writes: > then resource is stopped, but nothing occurred on e-mail destination. > Where I did wrong actions? Please note that ClusterMon notifications are becoming deprecated (they should still work, but I've got no experience with them). Try using

Re: [ClusterLabs] corosync service not automatically started

2017-10-11 Thread Ferenc Wágner
Václav Mach writes: > allow-hotplug eth0 > iface eth0 inet dhcp Try replacing allow-hotplug with auto. Ifupdown simply runs ifup -a before network-online.target, which excludes allow-hotplug interfaces. That means allow-hotplug interfaces are not waited for before corosync is

Re: [ClusterLabs] Pacemaker resource parameter reload confusion

2017-09-22 Thread Ferenc Wágner
Ken Gaillot writes: > Hmm, stop+reload is definitely a bug. Can you attach (or email it to me > privately, or file a bz with it attached) the above pe-input file with > any sensitive info removed? I sent you the pe-input file privately. It indeed shows the issue: $

Re: [ClusterLabs] Pacemaker 1.1.18 deprecation warnings

2017-09-20 Thread Ferenc Wágner
Ken Gaillot writes: > * undocumented LRMD_MAX_CHILDREN environment variable > (PCMK_node_action_limit is the current syntax) By the way, is the current syntax documented somewhere? Looking at crmd/throttle.c, throttle_update_job_max() is only ever invoked with a NULL

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-11 Thread Ferenc Wágner
Jan Friesse writes: > Back to problem you have. It's definitely HW issue but I'm thinking > how to solve it in software. Right now, I can see two ways: > 1. Set dog FD to be non blocking right at the end of setup_watchdog - >This is preferred but I'm not sure if it's
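The first option above, making the watchdog file descriptor non-blocking so that a stuck write cannot stall the main loop, is the usual O_NONBLOCK pattern. The sketch below uses a pipe as a stand-in, because opening a real /dev/watchdog would arm it. Whether a particular watchdog driver honours O_NONBLOCK is not something this example can show. `make_nonblocking` and `try_write` are hypothetical names.

```python
import os


def make_nonblocking(fd: int) -> None:
    # Same effect as fcntl(fd, F_SETFL, flags | O_NONBLOCK) in C.
    os.set_blocking(fd, False)


def try_write(fd: int, data: bytes) -> bool:
    """Attempt a write; return False instead of blocking if the fd is not ready."""
    try:
        os.write(fd, data)
        return True
    except BlockingIOError:
        return False


if __name__ == "__main__":
    # Stand-in for the watchdog device: a pipe nobody reads from, so it fills up.
    r, w = os.pipe()
    make_nonblocking(w)
    writes = 0
    while try_write(w, b"\0" * 4096):
        writes += 1
    print(f"pipe full after {writes} writes; caller was not blocked")
    os.close(r)
    os.close(w)
```

In a daemon, a False return would be logged and retried on the next tick instead of freezing the process that is supposed to keep the watchdog fed.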

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-11 Thread Ferenc Wágner
Klaus Wenninger writes: > Just for my understanding: You are using watchdog-handling in corosync? Yes, I was. -- Feri

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-10 Thread Ferenc Wágner
Valentin Vidic <valentin.vi...@carnet.hr> writes: > On Sun, Sep 10, 2017 at 08:27:47AM +0200, Ferenc Wágner wrote: > >> Confirmed: setting watchdog_device: off cluster wide got rid of the >> above warnings. > > Interesting, what brand or version of IPMI has this pro

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-10 Thread Ferenc Wágner
wf...@niif.hu (Ferenc Wágner) writes: > Jan Friesse <jfrie...@redhat.com> writes: > >> wf...@niif.hu writes: >> >>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day >>> (in August; in May, it happened 0-2 times a day only, it's slowl

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-05 Thread Ferenc Wágner
Jan Friesse writes: > wf...@niif.hu writes: > >> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day >> (in August; in May, it happened 0-2 times a day only, it's slowly >> ramping up): >> >> vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-01 Thread Ferenc Wágner
Digimer <li...@alteeve.ca> writes: > On 2017-08-29 10:45 AM, Ferenc Wágner wrote: > >> Digimer <li...@alteeve.ca> writes: >> >>> On 2017-08-28 12:07 PM, Ferenc Wágner wrote: >>> >>>> [...] >>>> While dlm_tool status repo

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-01 Thread Ferenc Wágner
Jan Friesse writes: > wf...@niif.hu writes: > >> Jan Friesse writes: >> >>> wf...@niif.hu writes: >>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day (in August; in May, it happened 0-2 times a day only, it's slowly

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-31 Thread Ferenc Wágner
Jan Friesse writes: > wf...@niif.hu writes: > >> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day >> (in August; in May, it happened 0-2 times a day only, it's slowly >> ramping up): >> >> vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-31 Thread Ferenc Wágner
Klaus Wenninger writes: > Just seen that you are hosting VMs which might make you use KSM ... > Don't fully remember at the moment but I have some memory of > issues with KSM and page-locking. > iirc it was some bug in the kernel memory-management that should > be fixed a

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-29 Thread Ferenc Wágner
Jan Friesse writes: > wf...@niif.hu writes: > >> Jan Friesse writes: >> >>> wf...@niif.hu writes: >>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day (in August; in May, it happened 0-2 times a day only, it's slowly

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-29 Thread Ferenc Wágner
Digimer <li...@alteeve.ca> writes: > On 2017-08-28 12:07 PM, Ferenc Wágner wrote: > >> [...] >> While dlm_tool status reports (similar on all nodes): >> >> cluster nodeid 167773705 quorate 1 ring seq 3088 3088 >> daemon now 2941405 fence_pid 0 >

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-29 Thread Ferenc Wágner
Jan Friesse writes: > wf...@niif.hu writes: > >> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day >> (in August; in May, it happened 0-2 times a day only, it's slowly >> ramping up): >> >> vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new

[ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-28 Thread Ferenc Wágner
Hi, In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day (in August; in May, it happened 0-2 times a day only, it's slowly ramping up): vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new configuration. vhbl03 corosync[3890]: [TOTEM ] A processor failed, forming

Re: [ClusterLabs] Pacemaker 1.1.17 Release Candidate 4 (likely final)

2017-06-21 Thread Ferenc Wágner
Ken Gaillot writes: > The most significant change in this release is a new cluster option to > improve scalability. > > As users start to create clusters with hundreds of resources and many > nodes, one bottleneck is a complete reprobe of all resources (for > example, after

Re: [ClusterLabs] Notifications on changes in clustered LVM

2017-06-20 Thread Ferenc Wágner
Digimer <li...@alteeve.ca> writes: > On 19/06/17 11:40 PM, Andrei Borzenkov wrote: > >> 20.06.2017 02:15, Digimer wrote: >> >>> On 19/06/17 06:59 PM, Ferenc Wágner wrote: >>> >>>> Digimer <li...@alteeve.ca> writes: >>>

Re: [ClusterLabs] Notifications on changes in clustered LVM

2017-06-19 Thread Ferenc Wágner
Digimer writes: > So we have a tool that watches for changes to clvmd by running > pvscan/vgscan/lvscan, but this seems to be expensive and occasionally > cause trouble. What kind of trouble did you experience? > Is there any other way to be notified or to check when

Re: [ClusterLabs] Ubuntu 16.04 - Only binds on 127.0.0.1 then fails until reinstall

2017-05-06 Thread Ferenc Wágner
James Booth writes: > Sorry for the repeat mails, but I had issues subscribing last time > (Looks like it has worked successfully now!). > > Anywho, I'm really desperate for some help on my issue in >

Re: [ClusterLabs] Why shouldn't one store resource configuration in the CIB?

2017-04-18 Thread Ferenc Wágner
Ken Gaillot <kgail...@redhat.com> writes: > On 04/13/2017 11:11 AM, Ferenc Wágner wrote: > >> I encountered several (old) statements on various forums along the lines >> of: "the CIB is not a transactional database and shouldn't be used as >> one" or

[ClusterLabs] Why shouldn't one store resource configuration in the CIB?

2017-04-13 Thread Ferenc Wágner
Hi, I encountered several (old) statements on various forums along the lines of: "the CIB is not a transactional database and shouldn't be used as one" or "resource parameters should only uniquely identify a resource, not configure it" and "the CIB was not designed to be a configuration database

Re: [ClusterLabs] Surprising semantics of location constraints with INFINITY score

2017-04-13 Thread Ferenc Wágner
kgronl...@suse.com (Kristoffer Grönlund) writes: > I discovered today that a location constraint with score=INFINITY > doesn't actually restrict resources to running only on particular > nodes. Yeah, I made the same "discovery" some time ago. Since then I've been using something like the

Re: [ClusterLabs] Never join a list without a problem...

2017-03-01 Thread Ferenc Wágner
Jeffrey Westgate writes: > We use Nagios to monitor, and once every 20 to 40 hours - sometimes > longer, and we cannot set a clock by it - while the machine is 95% > idle (or more according to 'top'), the host load shoots up to 50 or > 60%. It takes about 20

Re: [ClusterLabs] Insert delay between the startup of VirtualDomain

2017-02-27 Thread Ferenc Wágner
Oscar Segarra writes: > In my environment I have 5 guests that have to be started up in a > specified order starting with the MySQL database server. We use a somewhat redesigned resource agent, which connects to the guest using a virtio channel and waits for a signal

Re: [ClusterLabs] Antw: Re: [Question] About a change of crm_failcount.

2017-02-09 Thread Ferenc Wágner
Jehan-Guillaume de Rorthais writes: > PAF use private attribute to give informations between actions. We > detect the failure during the notify as well, but raise the error > during the promotion itself. See how I dealt with this in PAF: > >

Re: [ClusterLabs] Antw: Re: Antw: Re: Pacemaker kill does not cause node fault ???

2017-02-08 Thread Ferenc Wágner
Ken Gaillot writes: > On 02/07/2017 01:11 AM, Ulrich Windl wrote: > >> Ken Gaillot writes: >> >>> On 02/06/2017 03:28 AM, Ulrich Windl wrote: >>> Isn't the question: Is crmd a process that is expected to die (and thus need restarting)? Or

Re: [ClusterLabs] Pacemaker kill does not cause node fault ???

2017-02-08 Thread Ferenc Wágner
Ken Gaillot <kgail...@redhat.com> writes: > On 02/03/2017 07:00 AM, RaSca wrote: >> >> On 03/02/2017 11:06, Ferenc Wágner wrote: >>> Ken Gaillot <kgail...@redhat.com> writes: >>> >>>> On 01/10/2017 04:24 AM, Stefan Schloesser wrote: >

[ClusterLabs] Failed reload

2017-02-08 Thread Ferenc Wágner
Hi, There was an interesting discussion on this list about "Doing reload right" last July (which I still haven't digested entirely). Now I've got a related question about the current and intended behavior: what happens if a reload operation fails? I found some suggestions in

Re: [ClusterLabs] Pacemaker kill does not cause node fault ???

2017-02-03 Thread Ferenc Wágner
Ken Gaillot writes: > On 01/10/2017 04:24 AM, Stefan Schloesser wrote: > >> I am currently testing a 2 node cluster under Ubuntu 16.04. The setup >> seems to be working ok including the STONITH. >> For test purposes I issued a "pkill -f pace" killing all pacemaker >>

Re: [ClusterLabs] HALVM problem with 2 nodes cluster

2017-01-18 Thread Ferenc Wágner
Marco Marino writes: > Ferenc, regarding the flag use_lvmetad in > /usr/lib/ocf/resource.d/heartbeat/LVM I read: > >> lvmetad is a daemon that caches lvm metadata to improve the >> performance of LVM commands. This daemon should never be used when >> volume groups exist

Re: [ClusterLabs] HALVM problem with 2 nodes cluster

2017-01-18 Thread Ferenc Wágner
Marco Marino writes: > I agree with you for > use_lvmetad = 0 (setting it = 1 in a clustered environment is an error) Where does this information come from? AFAIK, if locking_type=3 (LVM uses internal clustered locking, that is, clvmd), lvmetad is not used anyway, even if

Re: [ClusterLabs] Antw: Re: VirtualDomain started in two hosts

2017-01-18 Thread Ferenc Wágner
Ken Gaillot writes: > * When you move the VM, the cluster detects that it is not running on > the node you told it to keep it running on. Because there is no > "Stopped" monitor, the cluster doesn't immediately realize that a new > rogue instance is running on another node.

Re: [ClusterLabs] permissions under /etc/corosync/qnetd (was: Corosync 2.4.0 is available at corosync.org!)

2016-11-07 Thread Ferenc Wágner
Jan Friesse <jfrie...@redhat.com> writes: > Ferenc Wágner wrote: > >> Have you got any plans/timeline for 2.4.2 yet? > > Yep, I'm going to release it in few minutes/hours. Man, that was quick. I've got a bunch of typo fixes queued. :) Please consider announcing up

Re: [ClusterLabs] Special care needed when upgrading Pacemaker Remote nodes

2016-10-29 Thread Ferenc Wágner
Ken Gaillot writes: > This spurred me to complete a long-planned overhaul of Pacemaker > Explained's "Upgrading" appendix: > > http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/_upgrading.html > > Feedback is welcome. Since you asked for it..:) 1.

Re: [ClusterLabs] Corosync 2.4.0 is available at corosync.org!

2016-10-28 Thread Ferenc Wágner
Jan Friesse writes: > Please note that because of required changes in votequorum, > libvotequorum is no longer binary compatible. This is reason for > version bump. Er, what version bump? Corosync 2.4.1 still produces libvotequorum.so.7.0.0 for me, just like Corosync

Re: [ClusterLabs] Doing reload right

2016-07-04 Thread Ferenc Wágner
Ken Gaillot writes: > Does anyone know of an RA that uses reload correctly? My resource agents advertise a no-op reload action for handling their "private" meta attributes. Meta in the sense that they are used by the resource agent when performing certain operations, not
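A no-op reload like the one described reduces to an action handler that returns success without touching the running service. The agent re-reads its private parameters from the `OCF_RESKEY_*` environment on every later operation. The exit codes used below (0 = OCF_SUCCESS, 3 = OCF_ERR_UNIMPLEMENTED) are the standard OCF values. The action table and function names are hypothetical. This sketch does not address when Pacemaker chooses reload over restart, which is what the "Doing reload right" thread was about.

```python
import os
import sys
from typing import Callable, Dict, Mapping

# Standard OCF resource agent exit codes.
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3


def action_reload(env: Mapping[str, str]) -> int:
    """No-op reload: the changed parameters are only consulted by the agent
    itself during later operations (read from OCF_RESKEY_* each time), so the
    running service needs no action."""
    return OCF_SUCCESS


# Hypothetical action table; a real agent would also implement start, stop,
# monitor and meta-data (the latter advertising the reload action).
ACTIONS: Dict[str, Callable[[Mapping[str, str]], int]] = {
    "reload": action_reload,
}


def dispatch(action: str, env: Mapping[str, str]) -> int:
    handler = ACTIONS.get(action)
    if handler is None:
        return OCF_ERR_UNIMPLEMENTED
    return handler(env)


if __name__ == "__main__":
    requested = sys.argv[1] if len(sys.argv) > 1 else ""
    sys.exit(dispatch(requested, os.environ))
```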

Re: [ClusterLabs] DLM standalone without crm ?

2016-06-26 Thread Ferenc Wágner
"Lentes, Bernd" writes: > i don't have neither an init-script nor a systemd service file. > The only packages i find in the repositories concerning dlm are: > libdlm3-3.00.01-0.31.87 > libdlm-3.00.01-0.31.87 > And i have a kernel module for dlm. > Nothing

Re: [ClusterLabs] DLM standalone without crm ?

2016-06-25 Thread Ferenc Wágner
"Lentes, Bernd" writes: > is it possible to have a DLM running without CRM? Yes. You'll need to configure fencing, though, since by default DLM will try to use stonithd (from Pacemaker). But DLM fencing didn't handle fencing failures correctly for me,

[ClusterLabs] restarting pacemakerd

2016-06-18 Thread Ferenc Wágner
Hi, Could somebody please elaborate a little why the pacemaker systemd service file contains "Restart=on-failure"? I mean that a failed node gets fenced anyway, so most of the time this would be a futile effort. On the other hand, one could argue that restarting failed services should be the

Re: [ClusterLabs] Alert notes

2016-06-16 Thread Ferenc Wágner
Klaus Wenninger <kwenn...@redhat.com> writes: > On 06/16/2016 11:05 AM, Ferenc Wágner wrote: > >> Klaus Wenninger <kwenn...@redhat.com> writes: >> >>> On 06/15/2016 06:11 PM, Ferenc Wágner wrote: >>> >>>> I think the default timestamp

Re: [ClusterLabs] Alert notes

2016-06-16 Thread Ferenc Wágner
Klaus Wenninger <kwenn...@redhat.com> writes: > On 06/15/2016 06:11 PM, Ferenc Wágner wrote: > >> Please find some random notes about my adventures testing the new alert >> system. >> >> The first alert example in the documentation has no recipient: >

Re: [ClusterLabs] Master-Slave resource Restarted after configuration change

2016-06-10 Thread Ferenc Wágner
Ilia Sokolinski writes: > We have a custom Master-Slave resource running on a 3-node pcs cluster on > CentOS 7.1 > > As part of what is supposed to be an NDU we do update some properties of the > resource. > For some reason this causes both Master and Slave instances of

Re: [ClusterLabs] Minimum configuration for dynamically adding a node to a cluster

2016-06-08 Thread Ferenc Wágner
Nikhil Utane writes: > Would like to know the best and easiest way to add a new node to an already > running cluster. > > Our limitation: > 1) pcsd cannot be used since (as per my understanding) it communicates over > ssh which is prevented. > 2) No manual editing of

Re: [ClusterLabs] how to "switch on" cLVM ?

2016-06-07 Thread Ferenc Wágner
"Lentes, Bernd" <bernd.len...@helmholtz-muenchen.de> writes: > - On Jun 7, 2016, at 3:53 PM, Ferenc Wágner wf...@niif.hu wrote: > >> "Lentes, Bernd" <bernd.len...@helmholtz-muenchen.de> writes: >> >>> Ok. Does DLM take care that a L

Re: [ClusterLabs] how to "switch on" cLVM ?

2016-06-07 Thread Ferenc Wágner
"Lentes, Bernd" writes: > Ok. Does DLM take care that an LV can only be used on one host? No. Even plain LVM uses locks to serialize access to its metadata (avoid concurrent writes corrupting it). These locks are provided by the host kernel

Re: [ClusterLabs] Can't get nfs4 to work.

2016-06-02 Thread Ferenc Wágner
"Stephano-Shachter, Dylan" writes: > I can not figure out why version 4 is not supported. Have you got fsid=root (or fsid=0) on your root export? See man exports. -- Feri

Re: [ClusterLabs] Trouble with deb packaging from 1.12 to 1.15

2016-05-17 Thread Ferenc Wágner
Andrey Rogovsky writes: > I have deb rules, comes from 1.12 and try apply it to current release. 1.1.14 is available in sid, stretch and jessie-backports, any reason you can't use those packages? > In the building I get an error: > dh_testroot -a > rm -rf

Re: [ClusterLabs] dlm_controld 4.0.4 exits when crmd is fencing another node

2016-04-27 Thread Ferenc Wágner
David Teigland writes: > On Tue, Apr 26, 2016 at 09:57:06PM +0200, Valentin Vidic wrote: > >> The bug is caused by the missing braces in the expanded if >> statement. >> >> Do you think we can get a new version out with this patch as the >> fencing in 4.0.4 does not work

[ClusterLabs] operation parallelism

2016-04-22 Thread Ferenc Wágner
Hi, Are recurring monitor operations constrained by the batch-limit cluster option? I ask because I'd like to limit the number of parallel start and stop operations (because they are resource hungry and potentially take long) without starving other operations, especially monitors. -- Thanks,

Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts

2016-04-22 Thread Ferenc Wágner
Ken Gaillot writes: > Each alert may have any number of recipients configured. These values > will simply be passed to the script as arguments. The first recipient > will also be passed as the CRM_alert_recipient environment variable, > for compatibility with existing
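The recipient handling described in the announcement can be sketched as an alert agent that takes recipients from its arguments and falls back to `CRM_alert_recipient`. Apart from that variable, which is named in the message, the sketch assumes the `CRM_alert_kind` and `CRM_alert_desc` environment variables. Check those names against the alerts documentation for your Pacemaker version, since the released interface may differ from this pre-release description. The function names are hypothetical.

```python
import os
import sys
from typing import List, Mapping, Sequence


def recipients_from(argv: Sequence[str], env: Mapping[str, str]) -> List[str]:
    """Recipients as described in the announcement: all of them as arguments,
    the first one also as CRM_alert_recipient (used here as a fallback)."""
    recips = list(argv)
    if not recips and env.get("CRM_alert_recipient"):
        recips = [env["CRM_alert_recipient"]]
    return recips


def format_alert(env: Mapping[str, str]) -> str:
    # CRM_alert_kind / CRM_alert_desc are assumed variable names (see lead-in).
    kind = env.get("CRM_alert_kind", "unknown")
    desc = env.get("CRM_alert_desc", "")
    return f"[{kind}] {desc}".rstrip()


if __name__ == "__main__":
    for recipient in recipients_from(sys.argv[1:], os.environ):
        # A real agent would send mail, SNMP traps, etc.; this sketch only prints.
        print(f"to {recipient}: {format_alert(os.environ)}")
```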

Re: [ClusterLabs] Antw: Re: Utilization zones

2016-04-19 Thread Ferenc Wágner
"Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de> writes: > Ferenc Wágner <wf...@niif.hu> wrote on 19.04.2016 at 13:42 in message > >> "Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de> writes: >> >>> Ferenc Wágner <wf..

Re: [ClusterLabs] Utilization zones

2016-04-19 Thread Ferenc Wágner
"Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de> writes: > Ferenc Wágner <wf...@niif.hu> wrote on 18.04.2016 at 17:07 in message > >> I'm using the "balanced" placement strategy with good success. It >> distributes our VM resources accord

[ClusterLabs] Utilization zones

2016-04-18 Thread Ferenc Wágner
Hi, I'm using the "balanced" placement strategy with good success. It distributes our VM resources according to memory size perfectly. However, I'd like to take the NUMA topology into account. That means each host should have several capacity pools (of each capacity type) to arrange the
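The request is for each host to expose several capacity pools (one per NUMA node) instead of a single per-node total. One possible placement rule can be sketched under these assumptions: a VM must fit entirely inside one NUMA node's free memory, the pool with the most free memory wins (a loose analogue of "most free capacity first"), and ties go to the earlier host. `Host` and `place` are hypothetical. This is not how Pacemaker's balanced strategy is implemented.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Host:
    name: str
    numa_free_mb: List[int]  # free memory in each NUMA node (capacity pool)


def place(hosts: List[Host], vm_mb: int) -> Optional[Tuple[str, int]]:
    """Choose the (host, NUMA index) pool with the most free memory that can
    hold the whole VM; ties go to the earlier host/pool. Returns None if no
    single pool fits. Deducts the VM's memory from the chosen pool."""
    best: Optional[Tuple[Host, int]] = None
    best_free = -1
    for host in hosts:
        for idx, free in enumerate(host.numa_free_mb):
            if free >= vm_mb and free > best_free:
                best, best_free = (host, idx), free
    if best is None:
        return None
    host, idx = best
    host.numa_free_mb[idx] -= vm_mb
    return host.name, idx


if __name__ == "__main__":
    cluster = [Host("vhbl03", [16000, 8000]), Host("vhbl04", [12000, 12000])]
    for _ in range(4):
        print(place(cluster, 10000))
```

After the first placement, vhbl03 still has 14000 MB free in total but no single NUMA node can take another 10000 MB VM. A per-node utilization total cannot express this distinction.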

[ClusterLabs] crmd error: Cannot route message to unknown node

2016-04-07 Thread Ferenc Wágner
Hi, On a freshly rebooted cluster node (after crm_mon reports it as 'online'), I get the following: wferi@vhbl08:~$ sudo crm_resource -r vm-cedar --cleanup Cleaning up vm-cedar on vhbl03, removing fail-count-vm-cedar Cleaning up vm-cedar on vhbl04, removing fail-count-vm-cedar Cleaning up

Re: [ClusterLabs] Antw: Re: spread out resources

2016-04-04 Thread Ferenc Wágner
"Ulrich Windl" writes: > Actually from my SLES11 SP[1-4] experience, the cluster always > distributes resources across all available nodes, and only if don't > want that, I'll have to add constraints. I wonder why that does not > seem to work for you. Because

Re: [ClusterLabs] spread out resources

2016-04-02 Thread Ferenc Wágner
Ken Gaillot <kgail...@redhat.com> writes: > On 03/30/2016 08:37 PM, Ferenc Wágner wrote: > >> I've got a couple of resources (A, B, C, D, ... more than cluster nodes) >> that I want to spread out to different nodes as much as possible. They >> are all the same

Re: [ClusterLabs] dlm reason for leaving the cluster changes when stopping gfs2-utils service

2016-03-23 Thread Ferenc Wágner
(Please post only to the list, or at least keep it amongst the Cc-s.) Momcilo Medic <fedorau...@fedoraproject.org> writes: > On Wed, Mar 23, 2016 at 1:56 PM, Ferenc Wágner <wf...@niif.hu> wrote: >> Momcilo Medic <fedorau...@fedoraproject.org> writes: >> >>

Re: [ClusterLabs] dlm reason for leaving the cluster changes when stopping gfs2-utils service

2016-03-23 Thread Ferenc Wágner
Momcilo Medic writes: > I have three hosts setup in my test environment. > They each have two connections to the SAN which has GFS2 on it. > > Everything works like a charm, except when I reboot a host. > Once it tries to stop gfs2-utils service it will just hang.

Re: [ClusterLabs] "No such device" with fence_pve agent

2016-03-23 Thread Ferenc Wágner
Ken Gaillot writes: > There is a fence parameter pcmk_host_check that specifies how pacemaker > determines which fence devices can fence which nodes. The default is > dynamic-list, which means to run the fence agent's list command to get > the nodes. [...] > > You can
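A simplified Python model of how the pcmk_host_check modes decide whether a fence device may target a node. The function and argument names are hypothetical stand-ins for the device's pcmk_host_list, the agent's list output and its status action. Real Pacemaker also consults pcmk_host_map, and its choice of default mode has varied between versions.

```python
def can_fence(mode, node, static_hosts=(), list_output=(), status_ok=None):
    """Decide whether one fence device may target `node` (simplified model).

    static_hosts: names configured in pcmk_host_list
    list_output:  names printed by the fence agent's list action
    status_ok:    callable(node) -> bool standing in for the status action
    """
    if mode == "static-list":
        return node in static_hosts
    if mode == "dynamic-list":
        # Only names reported by the agent count; they must match exactly.
        return node in list_output
    if mode == "status":
        return bool(status_ok and status_ok(node))
    if mode == "none":
        # Assume the device can fence every node.
        return True
    raise ValueError(f"unknown pcmk_host_check mode: {mode}")
```

In this model, under dynamic-list, a device whose list action reports names that differ from the cluster node names (for example, VM IDs) never matches. No device is then eligible for that node.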

Re: [ClusterLabs] Pacemaker startup-fencing

2016-03-19 Thread Ferenc Wágner
Andrei Borzenkov <arvidj...@gmail.com> writes: > On Wed, Mar 16, 2016 at 2:22 PM, Ferenc Wágner <wf...@niif.hu> wrote: > >> Pacemaker explained says about this cluster option: >> >> Advanced Use Only: Should the cluster shoot unseen nodes? Not using >>

[ClusterLabs] Pacemaker startup-fencing

2016-03-19 Thread Ferenc Wágner
Hi, Pacemaker explained says about this cluster option: Advanced Use Only: Should the cluster shoot unseen nodes? Not using the default is very unsafe! 1. What are those "unseen" nodes? And a possibly related question: 2. If I've got UNCLEAN (offline) nodes, is there a way to clean

Re: [ClusterLabs] Pacemaker startup-fencing

2016-03-18 Thread Ferenc Wágner
Andrei Borzenkov <arvidj...@gmail.com> writes: > On Wed, Mar 16, 2016 at 4:18 PM, Lars Ellenberg <lars.ellenb...@linbit.com> > wrote: > >> On Wed, Mar 16, 2016 at 01:47:52PM +0100, Ferenc Wágner wrote: >> >>>>> And some more about fencing: >>

[ClusterLabs] GFS and cLVM fencing requirements with DLM

2016-03-15 Thread Ferenc Wágner
Hi, I'm referring here to an ancient LKML thread introducing DLM. In http://article.gmane.org/gmane.linux.kernel/299788 David Teigland states: GFS requires that a failed node be fenced prior to gfs being told to begin recovery for that node which sounds very plausible as according to

Re: [ClusterLabs] Regular pengine warnings after a transient failure

2016-03-08 Thread Ferenc Wágner
Ken Gaillot <kgail...@redhat.com> writes: > On 03/07/2016 02:03 PM, Ferenc Wágner wrote: > >> The transition-keys match, does this mean that the above is a late >> result from the monitor operation which was considered timed-out >> previously? How did it reach
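A small Python parser for the transition keys mentioned above. It can help check whether two operation results refer to the same scheduled action. The field order (action ID, transition ID, expected return code, transition engine UUID) and the "op_status:op_rc;key" layout of the transition magic are assumptions based on how Pacemaker encodes these values, so verify them against your version. The example key in the test is made up.

```python
from typing import NamedTuple


class TransitionKey(NamedTuple):
    action_id: int
    transition_id: int
    target_rc: int
    te_uuid: str


def parse_transition_key(key: str) -> TransitionKey:
    """Split 'action:transition:target_rc:uuid' (assumed field order)."""
    action, transition, rc, uuid = key.split(":", 3)
    return TransitionKey(int(action), int(transition), int(rc), uuid)


def parse_transition_magic(magic: str):
    """Split 'op_status:op_rc;transition_key' into its parts (assumed layout)."""
    status_rc, key = magic.split(";", 1)
    op_status, op_rc = (int(x) for x in status_rc.split(":"))
    return op_status, op_rc, parse_transition_key(key)
```

If a late result carries a key equal to that of an action already recorded as timed out, both refer to the same action in the same transition. That is consistent with, though not proof of, a delayed monitor result.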

Re: [ClusterLabs] Regular pengine warnings after a transient failure

2016-03-08 Thread Ferenc Wágner
Andrew Beekhof <abeek...@redhat.com> writes: > On Tue, Mar 8, 2016 at 7:03 AM, Ferenc Wágner <wf...@niif.hu> wrote: > >> Ken Gaillot <kgail...@redhat.com> writes: >> >>> On 03/07/2016 07:31 AM, Ferenc Wágner wrote: >>> >>>>
