[ClusterLabs] PCMK_ipc_buffer recommendation
Hi,

Looking at lib/common/ipc.c, Pacemaker recommends setting PCMK_ipc_buffer to 4 times the *uncompressed* size of the biggest message seen:

error: Could not compress the message (2309508 bytes) into less than the configured ipc limit (131072 bytes). Set PCMK_ipc_buffer to a higher value (9238032 bytes suggested)

Before setting it, I'd like to ask for confirmation: is a 10 MB buffer really reasonable and recommended in the above case? I wonder what effect it will have on total memory consumption. Growing by 10 MB would be OK; growing by 10 MB times some biggish number wouldn't.

--
Thanks,
Feri
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
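[For reference: PCMK_ipc_buffer is an environment variable read by the Pacemaker daemons at startup; it is usually set in the distribution's environment file, and all daemons must be restarted afterwards. A sketch (the exact path varies by distribution):

# /etc/sysconfig/pacemaker (RHEL/Fedora) or /etc/default/pacemaker (Debian/Ubuntu)
# Raise the IPC buffer to the value suggested in the log message above.
PCMK_ipc_buffer=9238032
]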
Re: [ClusterLabs] live migration rarely fails seemingly without reason
"Lentes, Bernd" writes:

> 2018-12-03T16:03:02.836145+01:00 ha-idg-2 libvirtd[3117]: 2018-12-03
> 15:03:02.835+0000: 4515: error : qemuMigrationCheckJobStatus:1456 : operation
> failed: migration job: unexpectedly failed

The above message is a hint at the real problem. It comes from libvirtd, so you should investigate there. I'd check the libvirtd logs for further clues.

--
Regards,
Feri
Re: [ClusterLabs] Any CLVM/DLM users around?
Patrick Whitney writes:

> I have a two node (test) cluster running corosync/pacemaker with DLM
> and CLVM.
>
> I was running into an issue where when one node failed, the remaining node
> would appear to do the right thing, from the pcmk perspective, that is.
> It would create a new cluster (of one) and fence the other node, but
> then, rather surprisingly, DLM would see the other node offline, and it
> would go offline itself, abandoning the lockspace.
>
> I changed my DLM settings to "enable_fencing=0", disabling DLM fencing, and
> our tests are now working as expected.

I'm running a larger Pacemaker cluster with standalone DLM + cLVM (that is, they are started by systemd, not by Pacemaker). I've seen weird DLM fencing behavior, but not what you describe above (though I ran with more than two nodes from the very start). Actually, I don't even understand how it occurred to you to disable DLM fencing to fix that...

> I'm a little concerned I have masked an issue by doing this, as in all
> of the tutorials and docs I've read, there is no mention of having to
> configure DLM whatsoever.

Unfortunately it's very hard to come by any reliable info about DLM. I had a couple of enlightening exchanges with David Teigland (its primary author) on this list; he is very helpful indeed, but I'm still very far from having a working understanding of it. But I've been running with --enable_fencing=0 for years without issues, leaving all fencing to Pacemaker. Note that manual cLVM operations are the only users of DLM here, so delayed fencing does not cause any problems; the cluster services do not depend on DLM being operational (I mean it can stay frozen for several days -- as it happened in a couple of pathological cases). GFS2 would be a very different thing, I guess.
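[For the record, the setting can be made persistent either on the dlm_controld command line, as in my unit file, or in dlm.conf; a minimal sketch:

# /etc/dlm/dlm.conf
# Leave all fencing to Pacemaker; dlm_controld will not fence by itself.
enable_fencing=0
]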
--
Regards,
Feri
Re: [ClusterLabs] Corosync 3 release plans?
Christine Caulfield writes:

> I'm also looking into high-res timestamps for logfiles too.

Wouldn't that be a useful option for the syslog output as well? I'm sometimes concerned by the batching effect added by the transport between the application and the (local) log server (rsyslog or systemd). Reliably merging messages from different channels can prove impossible without internal timestamps (even considering a single machine only).

Another interesting feature could be structured, direct journal output (if you're looking for challenges).

--
Regards,
Feri
Re: [ClusterLabs] Corosync 3 release plans?
Ken Gaillot writes:

> libqb would simply provide the API for reopening the log, and clients
> such as pacemaker would intercept the signal and call the API.

Just for posterity: you needn't restrict yourself to signals. Logrotate has nothing to do with signals. Signals are a rather limited form of IPC, which might be a good tool for some applications. Both Pacemaker and Corosync already employ much richer IPC mechanisms, which might be more natural to extend for triggering log rotation than adding a new IPC mechanism. Logrotate optionally runs scripts before and after renaming the log files; these can invoke kill, corosync-cmapctl, cibadmin and so on all the same. It's entirely your call.

--
Regards,
Feri
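[To illustrate the logrotate side of this: the rotate policy lives entirely in logrotate, and whatever reopen trigger the daemon ends up supporting goes into the postrotate script. A sketch -- the SIGHUP trigger here is hypothetical, neither daemon implements one yet:

# /etc/logrotate.d/corosync (sketch, assuming a reopen trigger exists)
/var/log/cluster/corosync.log {
    weekly
    rotate 8
    compress
    missingok
    postrotate
        # replace with whatever IPC the daemon ends up exposing,
        # e.g. a signal or a cmap key flipped via corosync-cmapctl
        /usr/bin/killall -HUP corosync 2>/dev/null || true
    endscript
}
]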
Re: [ClusterLabs] Antw: Salvaging aborted resource migration
Ken Gaillot writes:

> On Thu, 2018-09-27 at 09:36 +0200, Ulrich Windl wrote:
>
>> Obviously you violated the most important cluster rule that is "be
>> patient". Maybe the next important is "Don't change the
>> configuration while the cluster is not in IDLE state" ;-)
>
> Agreed -- although even idle, removing a ban can result in a migration
> back (if something like stickiness doesn't prevent it).

I've got no problem with that in general. However, I can't guarantee that every configuration change happens in idle state: certain operations (mostly resource additions) are done by several administrators without synchronization, and of course asynchronous cluster events can also happen any time. So I have to ask: what are the consequences of breaking this "impossible" rule?

> There's currently no way to tell pacemaker that an operation (i.e.
> migrate_from) is a no-op and can be ignored. If a migration is only
> partially completed, it has to be considered a failure and reverted.

OK. Are there other complex operations which can "partially complete" if a transition is aborted by some event?

Now let's suppose a pull migration scenario: migrate_to does nothing, but in this tiny window a configuration change aborts the transition. The resources would go through a full recovery (stop+start), right? Now let's suppose migrate_from gets scheduled and starts performing the migration. Before it finishes, a configuration change aborts the transition. The cluster waits for the outstanding operation to finish, doesn't it? And if it finishes successfully, is the migration considered complete, requiring no recovery?

> I'm not sure why the reload was scheduled; I suspect it's a bug due to
> a restart being needed but no parameters having changed. There should
> be special handling for a partial migration to make the stop required.

Probably CLBZ#5309 again...
You debugged a pe-input file for me with a similar issue almost exactly a year ago (thread subject "Pacemaker resource parameter reload confusion"). Time to upgrade this cluster, I guess.

--
Thanks,
Feri
Re: [ClusterLabs] Corosync 3 release plans?
Christine Caulfield writes:

> TBH I would be quite happy to leave this to logrotate but the message I
> was getting here is that we need additional help from libqb. I'm willing
> to go with a consensus on this though

Yes, to do a proper job logrotate has to have a way to get the log files reopened. And applications can't do that without support from libqb, if I understood Honza right.

--
Regards,
Feri
Re: [ClusterLabs] Corosync 3 release plans?
Christine Caulfield writes:

> I'm looking into new features for libqb and the option in
> https://github.com/ClusterLabs/libqb/issues/142#issuecomment-76206425
> looks like a good option to me.

It feels backwards to me: traditionally, increasing numbers signify older rotated logs, while this proposal does the opposite. And what happens on application restart? Do you overwrite from 0? Do you ever jump back to 0? It also leaves the problem of cleaning up old log files unsolved...

> Though adding an API call to re-open the log file could be done too -
> I'm not averse to having both,

Not adding log rotation policy (and implementation!) to each application is a win in my opinion, and also unifies local administration. Syslog is pretty good in this regard; my only gripe with it is that its time stamps can't be quite as precise as the ones from the (realtime) application (even nowadays, under systemd). And that it can block the log stream... on the other hand, disk latencies can block log writes just as well.

--
Regards,
Feri
[ClusterLabs] Salvaging aborted resource migration
Hi,

The current behavior of cancelled migration with Pacemaker 1.1.16 with a resource implementing push migration:

# /usr/sbin/crm_resource --ban -r vm-conv-4

vhbl03 crmd[10017]: notice: State transition S_IDLE -> S_POLICY_ENGINE
vhbl03 pengine[10016]: notice: Migrate vm-conv-4#011(Started vhbl07 -> vhbl04)
vhbl03 crmd[10017]: notice: Initiating migrate_to operation vm-conv-4_migrate_to_0 on vhbl07
vhbl03 pengine[10016]: notice: Calculated transition 4633, saving inputs in /var/lib/pacemaker/pengine/pe-input-1069.bz2
[...]

At this point, with the migration still ongoing, I wanted to get rid of the constraint:

# /usr/sbin/crm_resource --clear -r vm-conv-4

vhbl03 crmd[10017]: notice: Transition aborted by deletion of rsc_location[@id='cli-ban-vm-conv-4-on-vhbl07']: Configuration change
vhbl07 crmd[10233]: notice: Result of migrate_to operation for vm-conv-4 on vhbl07: 0 (ok)
vhbl03 crmd[10017]: notice: Transition 4633 (Complete=6, Pending=0, Fired=0, Skipped=1, Incomplete=6, Source=/var/lib/pacemaker/pengine/pe-input-1069.bz2): Stopped
vhbl03 pengine[10016]: notice: Resource vm-conv-4 can no longer migrate to vhbl04. Stopping on vhbl07 too
vhbl03 pengine[10016]: notice: Reload vm-conv-4#011(Started vhbl07)
vhbl03 pengine[10016]: notice: Calculated transition 4634, saving inputs in /var/lib/pacemaker/pengine/pe-input-1070.bz2
vhbl03 crmd[10017]: notice: Initiating stop operation vm-conv-4_stop_0 on vhbl07
vhbl03 crmd[10017]: notice: Initiating stop operation vm-conv-4_stop_0 on vhbl04
vhbl03 crmd[10017]: notice: Initiating reload operation vm-conv-4_reload_0 on vhbl04

This recovery was entirely unnecessary, as the resource successfully migrated to vhbl04 (the migrate_from operation does nothing). Pacemaker does not know this, but is there a way to educate it?
I think in this special case it is possible to redesign the agent making migrate_to a no-op and doing everything in migrate_from, which would significantly reduce the window between the start points of the two "halves", but I'm not sure that would help in the end: Pacemaker could still decide to do an unnecessary stop+start recovery. Would it? I failed to find any documentation on recovery from aborted migration transitions. I don't expect on-fail (for migrate_* ops, not me) to apply here, does it?

Side question: why initiate a reload in any case, like above?

Even more side question: could you please consider using space instead of TAB in syslog messages? (Actually, I wouldn't mind getting rid of them altogether in any output.)

--
Thanks,
Feri
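[The pull-style split I have in mind would look roughly like this in the agent -- a sketch only; vm_pull_state is a hypothetical helper, and a real agent of course needs the full OCF boilerplate:

```shell
#!/bin/sh
# Sketch of a pull-style live migration split for an OCF agent.
OCF_SUCCESS=0

vm_migrate_to() {
    # Push half: deliberately a no-op, so the window in which an
    # aborted transition sees a "partial" migration is minimized.
    return $OCF_SUCCESS
}

vm_migrate_from() {
    # Pull half: all the real work happens on the destination node.
    # vm_pull_state is a hypothetical helper fetching the domain
    # state from the source host named by the CRM meta attribute.
    vm_pull_state "$OCF_RESKEY_CRM_meta_migrate_source" || return 1
    return $OCF_SUCCESS
}
```
]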
Re: [ClusterLabs] Corosync 3 release plans?
Jan Friesse writes:

> wagner.fer...@kifu.gov.hu writes:
>
>> triggered by your favourite IPC mechanism (SIGHUP and SIGUSRx are common
>> choices, but logging.* cmap keys probably fit Corosync better). That
>> would enable proper log rotation.
>
> What is the reason that you find "copytruncate" as non-proper log
> rotation? I know there is a risk to lose some lines, but it should be
> pretty small.

Yes, there's a chance of losing some messages. It may be acceptable in some cases, but it's never desirable. The copy operation also wastes I/O bandwidth. Reopening the log files on some external trigger is a better solution on all accounts and also an industry standard.

> Anyway, this is again one of the features where support from libqb would
> be nice to have (there is actually an issue opened:
> https://github.com/ClusterLabs/libqb/issues/239).

That's a convoluted one for a simple reopen! But yes, if libqb does not expose such functionality, you can't do much about it. I'll stay with syslog for now. :) In cluster environments centralised log management is a must anyway, and that's annoying to achieve with direct file logs.

>> Jan Friesse writes:
>>
>>> No matter how much I still believe totemsrp as a library would be
>>> super nice to have - but current state is far away from what I would
>>> call library (= something small, without non-related things like
>>> transports/ip/..., testable (ideally with unit tests testing corner
>>> cases)) and making one fat binary looks like a better way.
>>>
>>> I'll make a patch and send a PR (it should be easy).
>>
>> Sounds sensible. Somebody can still split it out later if needed.
>
> Yep (and PR sent + merged already :) )

Great! Did you mean to keep the totem.h, totemip.h, totempg.h and totemstats.h header files installed nevertheless? And totem_pg.pc could go as well, I guess.
--
Regards,
Feri
Re: [ClusterLabs] Corosync 3 release plans?
Jan Friesse writes:

> Default example config should be definitively ported to newer style of
> nodelist without interface section. example.udpu can probably be
> deleted as well as example.xml (whole idea of having XML was because
> of cluster config tools like pcs, but these tools never used
> corosync.xml).

Kind of strange, because the inherently hierarchical Corosync configuration admits a very natural XML representation.

> I was also thinking about allowing timestamp by default, because log
> without timestamp is useless.

I recommend adding high resolution timestamps, even, but for the direct file log only, not for syslog (by default). And log file reopening triggered by your favourite IPC mechanism (SIGHUP and SIGUSRx are common choices, but logging.* cmap keys probably fit Corosync better). That would enable proper log rotation.

>> Finally, something totally unrelated: the libtotem_pg shared object
>> isn't standalone anymore, it has several undefined symbols (icmap_get_*,
>> stats_knet_add_member, etc) which are defined in the corosync binary.
>
> This must be fixed.

Or rather eliminated, if I read correctly below.

>> Why is it still a separate object then?
>
> Honestly, I don't have too much strong reasons. We've talked with
> Chrissie about it last year, and actually only reason I was able to
> find out was to have a code/component separation so in theory other
> project can use totem (what was original idea, but it never happened
> and I don't think it will ever happen). Anyway, conclusion was to
> remove the totem as a shared library and keep it as a static library
> only, but nobody actually implemented that yet.

That doesn't buy you anything if you use it in a single binary only.
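[For context, second-resolution timestamps can already be requested per logger in corosync.conf; the file logging I have in mind is something like this sketch:

logging {
    to_syslog: yes
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    # second-resolution stamps; the high-resolution variant discussed
    # in this thread would be an additional option here once implemented
    timestamp: on
}
]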
> No matter how much I still believe totemsrp as a library would be
> super nice to have - but current state is far away from what I would
> call library (= something small, without non-related things like
> transports/ip/..., testable (ideally with unit tests testing corner
> cases)) and making one fat binary looks like a better way.
>
> I'll make a patch and send a PR (it should be easy).

Sounds sensible. Somebody can still split it out later if needed.

> Thank you for the testing and reporting problems!

My pleasure, speaking about the latter. I haven't got to do any significant testing yet, unfortunately.

--
Regards,
Feri
Re: [ClusterLabs] Corosync 3 release plans?
Jan Friesse writes:

> Have you had a time to play with packaging current alpha to find out
> if there are no issues? I had no problems with Fedora, but Debian has
> a lot of patches, and I would be really grateful if we could reduce
> them a lot - so please let me know if there is patch which you've sent
> PR for and it's not merged yet.

Hi Honza,

Sorry for the delay. You've already merged my PR for two simple typos, thanks! Beyond that, there really isn't much in our patch queue anymore. As far as I can see, current master even has a patch for error propagation in notifyd, which will let us drop one more!

And we arrive at the example configs. We prefer syslog for several reasons (copytruncate rotation isn't pretty, decoupling possible I/O stalls) and we haven't got the /var/log/cluster legacy. But more importantly, the knet default transport requires a nodelist instead of interfaces, unlike mcast udp. The "ring" terminology might need a change as well, especially ring0_addr. So I'd welcome an overhaul of the (now knet) example config, but I'm not personally qualified for doing that. :)

Finally, something totally unrelated: the libtotem_pg shared object isn't standalone anymore, it has several undefined symbols (icmap_get_*, stats_knet_add_member, etc) which are defined in the corosync binary. Why is it still a separate object then?

--
Thanks,
Feri
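[To make the nodelist requirement concrete, a minimal knet-style corosync.conf would look something like this sketch (cluster name, node names and addresses invented):

totem {
    version: 2
    cluster_name: example
    transport: knet
}

nodelist {
    node {
        nodeid: 1
        name: node1
        ring0_addr: 192.168.1.1
    }
    node {
        nodeid: 2
        name: node2
        ring0_addr: 192.168.1.2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}
]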
Re: [ClusterLabs] Corosync 3 release plans?
Jan Friesse writes:

> Currently I'm pretty happy with current Corosync alpha stability so it
> would be possible to release final right now, but because I want to
> give us some room to break protocol/abi (only if needed and right now
> I don't see any strong reason for such breakage), I didn't release it
> yet.

Great!

> Currently I'm planning to release 3.0.0 in the beginning of December
> but if it would mean to miss Debian freeze date I'm open to release it
> sooner.

No need, we should be plenty good with this; the target date of the transition freeze is 2019-Jan-12.

--
Thanks,
Feri
[ClusterLabs] Corosync 3 release plans? (was: Redundant ring not recovering after node is back)
Jan Friesse writes:

> try corosync 3.x (current Alpha4 is pretty stable [...]

Hi Honza,

Can you provide an estimate for the Corosync 3 release timeline? We have to plan the ABI transition in Debian and the freeze date is drawing closer.

--
Thanks,
Feri
Re: [ClusterLabs] Redundant ring not recovering after node is back
wf...@niif.hu (Ferenc Wágner) writes:

> David Tolosa writes:
>
>> I tried to install corosync 3.x and it works pretty well.
>> But when I install pacemaker, it installs previous version of corosync as
>> dependency and breaks all the setup.
>> Any suggestions?
>
> Install the equivs package to create a dummy corosync package
> representing your local corosync build.
> https://manpages.debian.org/stretch/equivs/equivs-build.1.en.html

Forget it, libcfg changed ABI, so you'll have to recompile Pacemaker after all.

--
Regards,
Feri
Re: [ClusterLabs] Redundant ring not recovering after node is back
David Tolosa writes:

> I tried to install corosync 3.x and it works pretty well.
> But when I install pacemaker, it installs previous version of corosync as
> dependency and breaks all the setup.
> Any suggestions?

Install the equivs package to create a dummy corosync package representing your local corosync build.
https://manpages.debian.org/stretch/equivs/equivs-build.1.en.html

--
Regards,
Feri
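[For illustration, an equivs control file for such a dummy package could look like this sketch (the version string is invented; pick one high enough to satisfy pacemaker's dependency):

# corosync-dummy.ctl -- feed to equivs-build to produce a placeholder .deb
Section: admin
Priority: optional
Standards-Version: 3.9.2
Package: corosync
Version: 2.99.0-0local1
Description: dummy corosync package
 Satisfies the pacemaker dependency while a locally built
 corosync (installed outside dpkg) provides the real thing.

Then build and install it with "equivs-build corosync-dummy.ctl" followed by "dpkg -i corosync_*.deb".]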
Re: [ClusterLabs] Antw: Re: Spurious node loss in corosync cluster
Jan Friesse writes:

> Is that system VM or physical machine? Because "Corosync main process
> was not scheduled for..." is usually happening on VMs where hosts are
> highly overloaded.

Or when physical hosts use BMC watchdogs. But Prasad didn't encounter such logs in the setup at hand, as far as I understand.

--
Regards,
Feri
Re: [ClusterLabs] DLM recovery stuck (digression: Corosync watchdog experience)
FeldHost™ Admin writes:

> rule of thumb is use separate dedicated network for corosync traffic.
> For ex. we use two corosync rings, first and active one on separate
> network card and switch, second passive one on team (bond) device vlan.

Hi,

That's fine in principle, but this is a bladecenter setting: we can't really use separate network cards, it's a single chassis at the end of the day. Besides, we've not encountered Corosync glitches. The Corosync virtual network is shared with the DLM traffic only and has 200 Mb/s bandwidth dedicated to it in the interface (BIOS) setup.

Failure story for amusement: the blades expose a BMC watchdog device to the OS, which was picked up by Corosync. It seemed like a useful second line of defense in case fencing (BMC IPMI power) failed for any reason; I let it live and forgot about it. Months later, after a firmware upgrade the BMC had to be restarted, and the watchdog device ioctl blocked Corosync for a minute or so. Of course membership fell apart. Actually, across the full cluster, because the BMC restarts were performed back-to-back (I authorized a single restart only, but anyway). I leave the rest to your imagination. Fencing (STONITH) worked (with delays) until quorum dissolved entirely... after a couple of minutes, it was over. We spent the rest of the day picking up the pieces, then the next few trying to reproduce the perceived Corosync network outage during BMC reboots without the cluster stack running. Of course in total vain. Half a year later an independent investigation of sporadic small Corosync delays revealed the watchdog connection, then we disabled the feature.

Don't use (poorly implemented) BMC watchdogs.

--
Feri
Re: [ClusterLabs] DLM recovery stuck
David Teigland writes:

> On Thu, Aug 09, 2018 at 06:11:48PM +0200, Ferenc Wágner wrote:
>
>> Almost ten years ago you requested more info in a similar case, let's
>> see if we can get further now!
>
> Hi, the usual cause is that a network message from the dlm has been
> lost/dropped/missed. The dlm can't recover from that, which is clearly a
> weak point in the design. There may be some new development coming along
> to finally improve that.

Hi David,

Good to hear! Can you share any more info about this development?

> One way you can confirm this is to check if the dlm on one or more nodes
> is waiting for a message that's not arriving. Often you'll see an entry
> in the dlm "waiters" debugfs file corresponding to a response that's being
> waited on.

If you mean dlm/clvmd_waiters, it's empty on all nodes. Is there anything else to check?

> Another red flag is kernel messages from a driver indicating some network
> hickup at the time things hung. I can't say if these messages you sent
> happened at the right time, or if they even correspond to the dlm
> interface, but it's worth checking as a possible explanation:
>
> [  137.207059] be2net 0000:05:00.0 enp5s0f0: Link is Up
> [  137.252901] be2net 0000:05:00.1 enp5s0f1: Link is Up

Hard to say... This is an iSCSI offload card with two physical ports, which are virtualized in the card into 4-4 logical ports, 3-3 of which are passed to the OS as separate PCI functions, while the other two are used for iSCSI traffic. The DLM traffic goes through a Linux bond made of enp5s0f4 and enp5s0f5, which is started at 112.393798 and used for Corosync traffic first. The above two lines are signs of OpenVSwitch starting up for independent purposes. It should be totally independent, but it's the same device after all, so I can't exclude all possibility of "crosstalk".
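[For anyone following along: the waiters file lives under debugfs, named after the lockspace ("clvmd" here). Roughly what I checked:

# mount debugfs if it isn't already (usually automatic on modern systems)
mount -t debugfs none /sys/kernel/debug 2>/dev/null

# per-lockspace DLM debug files
cat /sys/kernel/debug/dlm/clvmd_waiters

# dlm_tool offers a friendlier view of the lock state
dlm_tool lockdebug clvmd
]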
> [  153.886619] connection2:0: detected conn error (1011)

See above: iSCSI traffic is offloaded, not visible on the OS level, and these connection failures are expected at the moment because some of the targets are inaccessible. *But* it uses the same wire in the end, just different VLANs, and the virtualization (in the card itself) may not provide absolutely perfect separation.

--
Thanks,
Feri
Re: [ClusterLabs] DLM recovery stuck
wf...@niif.hu (Ferenc Wágner) writes:

> For a start I attached the dump output from another node.

I meant to...

146 dlm_controld 4.0.5 started
146 our_nodeid 167773708
146 found /dev/misc/dlm-control minor 58
146 found /dev/misc/dlm-monitor minor 57
146 found /dev/misc/dlm_plock minor 56
146 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
146 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
146 set recover_callbacks 1
146 cmap totem.cluster_name = 'vhbl'
146 set cluster_name vhbl
146 /dev/misc/dlm-monitor fd 10
146 cluster quorum 1 seq 3648 nodes 5
146 cluster node 167773705 added seq 3648
146 set_configfs_node 167773705 10.0.6.9 local 0
146 cluster node 167773707 added seq 3648
146 set_configfs_node 167773707 10.0.6.11 local 0
146 cluster node 167773708 added seq 3648
146 set_configfs_node 167773708 10.0.6.12 local 1
146 cluster node 167773709 added seq 3648
146 set_configfs_node 167773709 10.0.6.13 local 0
146 cluster node 167773710 added seq 3648
146 set_configfs_node 167773710 10.0.6.14 local 0
146 cpg_join dlm:controld ...
146 setup_cpg_daemon 12
146 dlm:controld conf 5 1 0 memb 167773705 167773707 167773708 167773709 167773710 join 167773708 left
146 daemon joined 167773705
146 daemon joined 167773707
146 daemon joined 167773708
146 daemon joined 167773709
146 daemon joined 167773710
146 dlm:controld ring 167773705:3648 5 memb 167773705 167773707 167773708 167773709 167773710
146 receive_protocol 167773705 max 3.1.1.0 run 3.1.1.1
146 daemon node 167773705 prot max 0.0.0.0 run 0.0.0.0
146 daemon node 167773705 save max 3.1.1.0 run 3.1.1.1
146 run protocol from nodeid 167773705
146 daemon run 3.1.1 max 3.1.1 kernel run 1.1.1 max 1.1.1
146 plocks 13
146 receive_fence_clear from 167773705 for 167773708 result 0 flags 6
146 fence_in_progress_unknown 0 recv
146 receive_protocol 167773707 max 3.1.1.0 run 3.1.1.1
146 daemon node 167773707 prot max 0.0.0.0 run 0.0.0.0
146 daemon node 167773707 save max 3.1.1.0 run 3.1.1.1
146 receive_protocol 167773708 max 3.1.1.0 run 0.0.0.0
146 daemon node 167773708 prot max 0.0.0.0 run 0.0.0.0
146 daemon node 167773708 save max 3.1.1.0 run 0.0.0.0
146 receive_protocol 167773708 max 3.1.1.0 run 3.1.1.0
146 daemon node 167773708 prot max 3.1.1.0 run 0.0.0.0
146 daemon node 167773708 save max 3.1.1.0 run 3.1.1.0
146 receive_protocol 167773709 max 3.1.1.0 run 3.1.1.1
146 daemon node 167773709 prot max 0.0.0.0 run 0.0.0.0
146 daemon node 167773709 save max 3.1.1.0 run 3.1.1.1
146 receive_protocol 167773710 max 3.1.1.0 run 3.1.1.1
146 daemon node 167773710 prot max 0.0.0.0 run 0.0.0.0
146 daemon node 167773710 save max 3.1.1.0 run 3.1.1.1
147 uevent: add@/kernel/dlm/clvmd
147 kernel: add@ clvmd
147 uevent: online@/kernel/dlm/clvmd
147 kernel: online@ clvmd
147 clvmd cpg_join dlm:ls:clvmd ...
147 dlm:ls:clvmd conf 5 1 0 memb 167773705 167773707 167773708 167773709 167773710 join 167773708 left
147 clvmd add_change cg 1 joined nodeid 167773708
147 clvmd add_change cg 1 we joined
147 clvmd add_change cg 1 counts member 5 joined 1 remove 0 failed 0
147 clvmd check_ringid cluster 3648 cpg 0:0
147 dlm:ls:clvmd ring 167773705:3648 5 memb 167773705 167773707 167773708 167773709 167773710
147 clvmd check_ringid done cluster 3648 cpg 167773705:3648
147 clvmd check_fencing disabled
147 clvmd send_start 167773708:1 counts 0 5 1 0 0
147 clvmd wait_messages cg 1 need 5 of 5
147 clvmd receive_start 167773708:1 len 92
147 clvmd match_change 167773708:1 matches cg 1
147 clvmd wait_messages cg 1 need 4 of 5
147 clvmd receive_start 167773709:12 len 92
147 clvmd match_change 167773709:12 matches cg 1
147 clvmd wait_messages cg 1 need 3 of 5
147 clvmd receive_start 167773710:14 len 92
147 clvmd match_change 167773710:14 matches cg 1
147 clvmd wait_messages cg 1 need 2 of 5
147 clvmd receive_start 167773705:4 len 92
147 clvmd match_change 167773705:4 matches cg 1
147 clvmd wait_messages cg 1 need 1 of 5
147 clvmd receive_start 167773707:8 len 92
147 clvmd match_change 167773707:8 matches cg 1
147 clvmd wait_messages cg 1 got all 5
147 clvmd start_kernel cg 1 member_count 5
147 write "1090842362" to "/sys/kernel/dlm/clvmd/id"
147 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/167773705"
147 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/167773707"
147 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/167773708"
147 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/167773709"
147 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/167773710"
147 write "1" to "/sys/kernel/dlm/clvmd/control"
147 write "0" to "/sys/kernel/dlm/clvmd/event_done"
147 clvmd prepare_plocks
147 clvmd set_plock_data_node from 0 to 167773705
147 clvmd save_plocks start
147 clvmd receive_plocks_done 167773705:4 flags 2 plocks_data 0 need 1 save 1
147 clvmd match_change 167773705:4 matc
[ClusterLabs] DLM recovery stuck
Hi David,

Almost ten years ago you requested more info in a similar case, let's see if we can get further now! We're running a 6-node Corosync cluster. DLM is started by systemd:

● dlm.service - dlm control daemon
   Loaded: loaded (/lib/systemd/system/dlm.service; enabled)
   Active: active (running) since Thu 2018-08-09 17:13:18 CEST; 33min ago
     Docs: man:dlm_controld
           man:dlm.conf
           man:dlm_stonith
  Process: 3690 ExecStartPre=/sbin/modprobe dlm (code=exited, status=0/SUCCESS)
 Main PID: 3692 (dlm_controld)
   CGroup: /system.slice/dlm.service
           └─3692 /usr/sbin/dlm_controld --foreground -D --enable_fencing=0

All other nodes have cLVM volumes activated, but activation is stuck on this node (in the last step of a rolling cluster reboot):

[ 136.729172] dlm: Using TCP for communications
[ 136.743935] dlm: clvmd: joining the lockspace group...
[ 136.749419] dlm: clvmd: dlm_recover 1
[ 136.749485] dlm: clvmd: add member 167773710
[ 136.749493] dlm: clvmd: add member 167773709
[ 136.749497] dlm: clvmd: add member 167773708
[ 136.749499] dlm: clvmd: add member 167773707
[ 136.749504] dlm: clvmd: add member 167773706
[ 136.749506] dlm: clvmd: add member 167773705
[ 136.749519] dlm: connecting to 167773709
[ 136.752848] dlm: connecting to 167773708
[ 136.752889] dlm: connecting to 167773707
[ 136.752918] dlm: connecting to 167773706
[ 136.752943] dlm: connecting to 167773705
[ 136.768589] dlm: clvmd: dlm_recover_members 6 nodes
[ 136.941888] dlm: clvmd: group event done 0 0
[ 136.960496] dlm: clvmd: join complete
[ 137.019929] device enp5s0f1 entered promiscuous mode
[ 137.036637] device enp5s0f0 entered promiscuous mode
[ 137.054869] device vhbond entered promiscuous mode
[ 137.207059] be2net :05:00.0 enp5s0f0: Link is Up
[ 137.252901] be2net :05:00.1 enp5s0f1: Link is Up
[ 138.009742] device vlan39 entered promiscuous mode
[ 138.151755] device vlan894 entered promiscuous mode
[ 153.861395] scsi host1: BM_2032 : Event CXN_KILLED_RST_RCVD[10] received on CID : 9
[ 153.886619] connection2:0: detected conn error (1011)
[ 364.687306] INFO: task clvmd:5242 blocked for more than 120 seconds.
[ 364.708222] Not tainted 4.9.0-0.bpo.6-amd64 #1 Debian 4.9.88-1+deb9u1~bpo8+1
[ 364.733131] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 364.758896] clvmd D0 5242 1 0x
[ 364.776934] 98eff5ecfc00 98fff8543040 98efe9744140
[ 364.801322] 98518ec0 a8a64ddc7cb0 af20e973 aedde780
[ 364.825720] bd9dfe3d7034cec9 41fef240 fa4841fef240 98efe9744140
[ 364.850103] Call Trace:
[ 364.858140] [] ? __schedule+0x243/0x6f0
[ 364.876181] [] ? alloc_pages_vma+0xb0/0x240
[ 364.895364] [] ? schedule+0x32/0x80
[ 364.912261] [] ? rwsem_down_read_failed+0x10a/0x160
[ 364.933732] [] ? call_rwsem_down_read_failed+0x14/0x30
[ 364.956059] [] ? down_read+0x1c/0x30
[ 364.973259] [] ? dlm_user_request+0x47/0x200 [dlm]
[ 364.994443] [] ? cache_alloc_refill+0x20f/0x2b0
[ 365.014773] [] ? kmem_cache_alloc_trace+0xc2/0x200
[ 365.035962] [] ? device_write+0x5b6/0x7a0 [dlm]
[ 365.056290] [] ? vfs_write+0xb3/0x1a0
[ 365.073754] [] ? SyS_write+0x52/0xc0
[ 365.090937] [] ? do_syscall_64+0x91/0x1a0
[ 365.109559] [] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6

Here's the output of "dlm_tool dump" on the stuck node:

131 dlm_controld 4.0.5 started
131 our_nodeid 167773710
131 found /dev/misc/dlm-control minor 58
131 found /dev/misc/dlm-monitor minor 57
131 found /dev/misc/dlm_plock minor 56
131 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
131 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
131 set recover_callbacks 1
131 cmap totem.cluster_name = 'vhbl'
131 set cluster_name vhbl
131 /dev/misc/dlm-monitor fd 10
131 cluster quorum 0 seq 3680 nodes 1
131 cluster node 167773710 added seq 3680
131 set_configfs_node 167773710 10.0.6.14 local 1
131 cpg_join dlm:controld ...
131 setup_cpg_daemon 12
135 dlm:controld ring 167773705:3724 6 memb 167773705 167773706 167773707 167773708 167773709 167773710
135 dlm:controld ring 167773705:3724 6 memb 167773705 167773706 167773707 167773708 167773709 167773710
135 dlm:controld conf 6 1 0 memb 167773705 167773706 167773707 167773708 167773709 167773710 join 167773710 left
135 daemon joined 167773705
135 daemon joined 167773706
135 daemon joined 167773707
135 daemon joined 167773708
135 daemon joined 167773709
135 daemon joined 167773710
135 receive_protocol 167773707 max 3.1.1.0 run 3.1.1.1
135 daemon node 167773707 prot max 0.0.0.0 run 0.0.0.0
135 daemon node 167773707 save max 3.1.1.0 run 3.1.1.1
135 run protocol from nodeid 167773707
135 daemon run 3.1.1 max 3.1.1 kernel run 1.1.1 max 1.1.1
135 plocks 13
135 cluster quorum 1 seq 3724 nodes 6
135 cluster node 167773705 added seq 3724
135 set_configfs_node 167773705 10.0.6.9 local 0
135 cluster node 167773706 added
Re: [ClusterLabs] [questionnaire] Do you manage your pacemaker configuration by hand and (if so) what reusability features do you use?
Jan Pokorný writes:

> 1. [X] Do you edit CIB by hand (as opposed to relying on crm/pcs or
>    their UI counterparts)?

For debugging one has to understand the CIB anyway, so why learn additional syntaxes? :) Most of our configuration changes are scripted via a home-grown domain-specific CLI. Using crmsh or pcs under the hood instead of cibadmin and crm_resource would bring additional dependencies and require additional knowledge (of these tools).

> 2. [X] Do you use "template" based syntactic simplification[1] in CIB?

This allows changing templated resource parameters at a single place.

> 3. [ ] Do you use "id-ref" based syntactic simplification[2] in CIB?
>
> 3.1 [ ] When positive about 3., would you mind much if "id-refs" got
>     unfold/exploded during the "cibadmin --upgrade --force"
>     equivalent as a reliability/safety precaution?
>
> 4. [ ] Do you use "tag" based syntactic grouping[3] in CIB?

--
Regards,
Feri

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Corosync 2.4.4 is available at corosync.org!
Jan Pokorný writes:

> On 12/04/18 14:33 +0200, Jan Friesse wrote:
>
>> This release contains a lot of fixes, including fix for
>> CVE-2018-1084.
>
> Security related updates would preferably provide more context

Absolutely, thanks for providing that! Looking at the git log, I wonder if c139255 (totemsrp: Implement sanity checks of received msgs) has direct security relevance as well. Should I include that too in the Debian security update? Debian stable has 2.4.2, so I'm cherry-picking into that version.

--
Thanks,
Feri
Re: [ClusterLabs] Issues found in Pacemaker 1.1.18, fixes in 1.1 branch
Ken Gaillot writes:

> A couple of regressions have been found in the recent Pacemaker 1.1.18
> release.
>
> Fixes for these, plus one finishing an incomplete fix in 1.1.18, are in
> the master branch, and have been backported to the 1.1 branch for ease
> of patching. It is recommended that anyone compiling or packaging
> 1.1.18 include all the commits from the 1.1 branch.

Hi Ken,

Did you consider cutting a new patch-level release with these fixes? That would help determine the presence of the fixes for bug reports and questions. Which is even more important if you don't plan to make further 1.x releases.

> * 1.1.18 improved scalability by eliminating redundant node attribute
> write-outs. This proved to be too aggressive in one case: when a
> cluster is configured with only IP addresses (no node names) in
> corosync.conf, attribute write-outs can be incorrectly delayed; in the
> worst case, this prevents a node from shutting down due to the shutdown
> attribute not being written.

I guess this applies all the same to clusters defined without a Corosync nodelist. Right?

--
Thanks,
Feri
Re: [ClusterLabs] Is corosync supposed to be restarted if it dies?
Andrei Borzenkov writes:

> 25.11.2017 10:05, Andrei Borzenkov writes:
>
>> In one of the guides the suggested procedure to simulate split brain
>> was to kill the corosync process. It actually worked on one cluster,
>> but on another the corosync process was restarted after being killed
>> without the cluster noticing anything. Except after several attempts
>> pacemaker died with stopping resources ... :)
>>
>> This is SLES12 SP2; I do not see any Restart in the service definition
>> so it is probably not systemd.
>
> FTR - it was not corosync, but pacemaker; its unit file specifies
> RestartOn=error so killing corosync caused pacemaker to fail and be
> restarted by systemd.

And starting corosync via a Requires dependency?

--
Feri
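The behaviour discussed here can be sketched with a minimal unit-file fragment (directive values are illustrative; check the pacemaker.service actually shipped by your distribution, as details vary): with such a dependency, systemd restarting pacemaker after a failure also pulls corosync back up.

```ini
# Sketch of the relevant pacemaker.service directives (illustrative):
[Unit]
After=corosync.service
Requires=corosync.service      # starting pacemaker pulls corosync in

[Service]
Restart=on-failure             # killing corosync makes pacemaker fail;
                               # systemd restarts it, and Requires=
                               # starts corosync again as a side effect
```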
Re: [ClusterLabs] Pacemaker resource parameter reload confusion
Ken Gaillot writes:

> When an operation completes, a history entry (lrm_rsc_op) is added to
> the pe-input file. If the agent supports reload, the entry will include
> op-force-restart and op-restart-digest fields. Now I see those are
> present in the vm-alder_last_0 entry, so agent support isn't the issue.

Thanks for the explanation.

> However, the operation is recorded as a *failed* probe (i.e. the
> resource was running where it wasn't expected). This gets recorded as a
> separate vm-alder_last_failure_0 entry, which does not get the special
> fields. It looks to me like this failure entry is forcing the restart.
> That would be a good idea if it's an actual failure; if we find a
> resource unexpectedly running, we don't know how it was started, so a
> full restart makes sense.
>
> However, I'm guessing it may not have been a real error, but a resource
> cleanup. A cleanup clears the history so the resource is re-probed, and
> I suspect that re-probe is what got recorded here as a failure. Does
> that match what actually happened?

Well, I can't really remember, it happened two months ago... I'm pretty sure the resource wasn't running unexpectedly, I'd surely recall such a grave failure. Interestingly, though, my shell history contains a cleanup operation shortly after the parameter change. Also, if you look at the logs in my thread-starting mail, you'll find

warning: Processing failed op monitor for vm-alder on vhbl05: not running (7)

which does not seem to match up with the failure in the lrm_rsc_op entry in pe-input. It's sort of "normal" that such a resource disappears and gets restarted by the cluster. If that report survived the unexpected restart, I might have wanted to routinely clean it up afterwards.

(I'm leaving for a short holiday now, expect longer delays.)

--
Regards,
Feri
Re: [ClusterLabs] Pacemaker resource parameter reload confusion
Ken Gaillot writes:

> The pe-input is indeed entirely sufficient.
>
> I forgot to check why the reload was not possible in this case. It
> turns out it is this:
>
> trace: check_action_definition: Resource vm-alder doesn't know
> how to reload
>
> Does the resource agent implement the "reload" action and advertise it
> in the actions section of its metadata?

Absolutely, I use this operation routinely.

$ /usr/sbin/crm_resource --show-metadata=ocf:niif:TransientDomain
[...]

And the implementation is just a no-op. vm-alder is based on a template, just like all other VMs:

[...]
[...]
[...]
[...]

I wonder why it wouldn't know how to reload. How is that visible in the pe-input file? I'd check the other resources...

--
Thanks,
Feri
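For reference, an OCF agent advertises reload support in its metadata roughly like this (a minimal, illustrative fragment; timeouts and the other actions are made up):

```xml
<actions>
  <action name="start"   timeout="60s"/>
  <action name="stop"    timeout="60s"/>
  <action name="reload"  timeout="20s"/>
  <action name="monitor" timeout="20s" interval="10s"/>
</actions>
```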
Re: [ClusterLabs] How to mount rpc_pipefs on a different mountpoint on RHEL/CentOS 7?
Dennis Jacobfeuerborn writes:

> if I create a new unit file for the new file the services would not
> depend on it so it wouldn't get automatically mounted when they start.

Put the new unit file under /etc/systemd/system/x.service.requires to have x.service require it. I don't get the full picture, but this trick may help puzzle it together.

--
Feri
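The .requires mechanism above is just a directory of symlinks next to the unit files. A rough sketch (the unit names are hypothetical stand-ins, and demo-etc replaces /etc/systemd/system so the sketch is safe to run):

```shell
# Create a .requires drop-in directory: every unit symlinked inside it
# becomes a Requires= dependency of the service the directory is named after.
etc=demo-etc/systemd/system                # stand-in for /etc/systemd/system
mkdir -p "$etc/nfs-idmapd.service.requires"
touch "$etc/var-lib-nfs-rpc_pipefs.mount"  # stand-in for the real mount unit
ln -sf ../var-lib-nfs-rpc_pipefs.mount \
    "$etc/nfs-idmapd.service.requires/var-lib-nfs-rpc_pipefs.mount"
ls "$etc/nfs-idmapd.service.requires"
```

On a real system, run `systemctl daemon-reload` afterwards so systemd picks the dependency up; note that Requires= alone does not order the units, so an After= drop-in may be wanted as well.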
Re: [ClusterLabs] Pacemaker resource parameter reload confusion
Ken Gaillot <kgail...@redhat.com> writes:

> On Fri, 2017-10-20 at 15:52 +0200, Ferenc Wágner wrote:
>
>> Ken Gaillot <kgail...@redhat.com> writes:
>>
>>> On Fri, 2017-09-22 at 18:30 +0200, Ferenc Wágner wrote:
>>>
>>>> Ken Gaillot <kgail...@redhat.com> writes:
>>>>
>>>>> Hmm, stop+reload is definitely a bug. Can you attach (or email it
>>>>> to me privately, or file a bz with it attached) the above pe-input
>>>>> file with any sensitive info removed?
>>>>
>>>> I sent you the pe-input file privately.  It indeed shows the issue:
>>>>
>>>> $ /usr/sbin/crm_simulate -x pe-input-1033.bz2 -RS
>>>> [...]
>>>> Executing cluster transition:
>>>>  * Resource action: vm-alder stop on vhbl05
>>>>  * Resource action: vm-alder reload on vhbl05
>>>> [...]
>>>>
>>>> Hope you can easily get to the bottom of this.
>>>
>>> This turned out to have the same underlying cause as CLBZ#5309. I
>>> have a fix pending review, which I expect to make it into the
>>> soon-to-be-released 1.1.18.
>>
>> Great!
>>
>>> It is a regression introduced in 1.1.15 by commit 2558d76f. The
>>> logic for reloads was consolidated in one place, but that happened
>>> to be before restarts were scheduled, so it no longer had the right
>>> information about whether a restart was needed. Now, it sets an
>>> ordering flag that is used later to cancel the reload if the restart
>>> becomes required. I've also added a regression test for it.
>>
>> Restarts shouldn't even enter the picture here, so I don't get your
>> explanation. But I also don't know the code, so that doesn't mean a
>> thing. I'll test the next RC to be sure.
>
> :-)
>
> Reloads are done in place of restarts, when circumstances allow. So
> reloads are always related to (potential) restarts.
>
> The problem arose because not all of the relevant circumstances are
> known at the time the reload action is created. We may figure out later
> that a resource the reloading resource depends on must be restarted,
> therefore the reloading resource must be fully restarted instead of
> reloaded. E.g. a database resource might otherwise be able to reload,
> but not if the filesystem it's using is going away.
>
> Previously in those cases, we would end up scheduling both the reload
> and the restart. Now, we schedule only the restart.

Hi Ken,

1.1.18-rc3 indeed schedules a restart, not a reload, like 1.1.16 did. However, this wasn't my problem, I really expect a reload on the change of a non-unique parameter. The problem was that 1.1.16 also executed a stop action in parallel with the reload.

Maybe I'm testing it wrong: I just copied the pe-input file to another system (which doesn't even know this resource agent) running 1.1.18-rc3 and gave it to crm_simulate. Does the pe-input file contain all the information necessary to decide between restart and reload? The op-force-restart attribute does not contain the name of the changed parameter, but I can't find any info on what changed at all. Should I see a clean reload in this test setup at all?

--
Thanks,
Feri
Re: [ClusterLabs] Colocation rule with vip and ms master
Norberto Lopes <nlopes...@gmail.com> writes:

> On Fri, 27 Oct 2017 at 06:41 Ferenc Wágner <wf...@niif.hu> wrote:
>
>> Norberto Lopes <nlopes...@gmail.com> writes:
>>
>>> colocation backup-vip-not-with-master -inf: backupVIP postgresMS:Master
>>> colocation backup-vip-not-with-master inf: backupVIP postgresMS:Slave
>>>
>>> Basically what's occurring in my cluster is that the first rule stops the
>>> Sync node from being promoted if the Master ever dies. The second doesn't
>>> but I can't quite follow why.
>>
>> Getting a score of -inf means that the resource won't run. On the other
>> hand, (+)inf just means "strongest" preference.
>
> Apologies but I'm not following. I'm probably misunderstanding something.
>
> From what I could gather from
> https://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_mandatory_placement.html
> I can't follow the subtle difference between the two on a running
> cluster. As an example: If backupVIP is already in node A and
> postgresMS:Master in node B, and postgresMS:Master dies, in my case,
> postgresMS:Master never gets promoted in node C. But from the -inf
> rule it should be able to?
>
> Any insights into this would be greatly appreciated.

You're right: what I said was based on my experience with location constraints, and according to the linked documentation colocation constraints behave differently. Sorry for misleading you. I'm leaving this discussion to the more knowledgeable.

--
Regards,
Feri
Re: [ClusterLabs] Colocation rule with vip and ms master
Norberto Lopes writes:

> colocation backup-vip-not-with-master -inf: backupVIP postgresMS:Master
> colocation backup-vip-not-with-master inf: backupVIP postgresMS:Slave
>
> Basically what's occurring in my cluster is that the first rule stops the
> Sync node from being promoted if the Master ever dies. The second doesn't
> but I can't quite follow why.

Getting a score of -inf means that the resource won't run. On the other hand, (+)inf just means "strongest" preference.

--
Regards,
Feri
Re: [ClusterLabs] Pacemaker resource parameter reload confusion
Ken Gaillot <kgail...@redhat.com> writes:

> On Fri, 2017-09-22 at 18:30 +0200, Ferenc Wágner wrote:
>
>> Ken Gaillot <kgail...@redhat.com> writes:
>>
>>> Hmm, stop+reload is definitely a bug. Can you attach (or email it to
>>> me privately, or file a bz with it attached) the above pe-input file
>>> with any sensitive info removed?
>>
>> I sent you the pe-input file privately.  It indeed shows the issue:
>>
>> $ /usr/sbin/crm_simulate -x pe-input-1033.bz2 -RS
>> [...]
>> Executing cluster transition:
>>  * Resource action: vm-alder stop on vhbl05
>>  * Resource action: vm-alder reload on vhbl05
>> [...]
>>
>> Hope you can easily get to the bottom of this.
>
> This turned out to have the same underlying cause as CLBZ#5309. I have
> a fix pending review, which I expect to make it into the soon-to-be-
> released 1.1.18.

Great!

> It is a regression introduced in 1.1.15 by commit 2558d76f. The logic
> for reloads was consolidated in one place, but that happened to be
> before restarts were scheduled, so it no longer had the right
> information about whether a restart was needed. Now, it sets an
> ordering flag that is used later to cancel the reload if the restart
> becomes required. I've also added a regression test for it.

Restarts shouldn't even enter the picture here, so I don't get your explanation. But I also don't know the code, so that doesn't mean a thing. I'll test the next RC to be sure.

--
Thanks,
Feri
Re: [ClusterLabs] corosync service not automatically started
Václav Mach <ma...@cesnet.cz> writes:

> On 10/11/2017 09:00 AM, Ferenc Wágner wrote:
>
>> Václav Mach <ma...@cesnet.cz> writes:
>>
>>> allow-hotplug eth0
>>> iface eth0 inet dhcp
>>
>> Try replacing allow-hotplug with auto. Ifupdown simply runs ifup -a
>> before network-online.target, which excludes allow-hotplug interfaces.
>> That means allow-hotplug interfaces are not waited for before corosync
>> is started during boot.
>
> That did the trick for network config using DHCP. Thanks for the
> clarification.
>
> Do you know the reason why allow-hotplug interfaces are excluded?
> It's obvious that if ifup (according to its man page) is run as
> 'ifup -a' it does ignore them, but I don't get why allow-hotplug
> interfaces should be ignored by the init system.

Allow-hotplug interfaces aren't assumed to be present all the time, but rather to be plugged in and out arbitrarily. They are handled by udev, asynchronously, while the system is running. Waiting for them during bootup would be strange if you ask me.

--
Regards,
Feri
Re: [ClusterLabs] ClusterMon mail notification - does not work
Donat Zenichev writes:

> then resource is stopped, but nothing occurred on e-mail destination.
> Where I did wrong actions?

Please note that ClusterMon notifications are becoming deprecated (they should still work, but I've got no experience with them). Try using alerts instead, as documented at https://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch07.html

--
Regards,
Feri
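A minimal alerts configuration of the kind that chapter describes might look like the following CIB fragment (the agent path and recipient value are illustrative; Pacemaker ships sample alert agents, e.g. alert_smtp.sh.sample, which would have to be copied somewhere and made executable first):

```xml
<alerts>
  <alert id="alert-smtp" path="/usr/local/bin/alert_smtp.sh">
    <!-- the recipient value is passed to the agent, here a mail address -->
    <recipient id="alert-smtp-recipient" value="admin@example.com"/>
  </alert>
</alerts>
```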
Re: [ClusterLabs] corosync service not automatically started
Václav Mach writes:

> allow-hotplug eth0
> iface eth0 inet dhcp

Try replacing allow-hotplug with auto. Ifupdown simply runs ifup -a before network-online.target, which excludes allow-hotplug interfaces. That means allow-hotplug interfaces are not waited for before corosync is started during boot.

--
Regards,
Feri
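In /etc/network/interfaces terms, the suggested change is just this (using the DHCP stanza from the quote):

```text
# /etc/network/interfaces
# before: brought up asynchronously by udev, not waited for at boot
#allow-hotplug eth0
# after: brought up by "ifup -a", so network-online.target waits for it
auto eth0
iface eth0 inet dhcp
```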
Re: [ClusterLabs] Pacemaker resource parameter reload confusion
Ken Gaillot writes:

> Hmm, stop+reload is definitely a bug. Can you attach (or email it to me
> privately, or file a bz with it attached) the above pe-input file with
> any sensitive info removed?

I sent you the pe-input file privately.  It indeed shows the issue:

$ /usr/sbin/crm_simulate -x pe-input-1033.bz2 -RS
[...]
Executing cluster transition:
 * Resource action: vm-alder stop on vhbl05
 * Resource action: vm-alder reload on vhbl05
[...]

Hope you can easily get to the bottom of this.

> Nothing's been done about reload yet. It's waiting until we get around
> to an overhaul of the OCF resource agent standard, so we can define
> the semantics more clearly. It will involve replacing "unique" with
> separate meta-data for reloadability and GUI hinting, and possibly
> changes to the reload operation. Of course we'll try to stay backward-
> compatible.

Thanks for the confirmation.

--
Feri
Re: [ClusterLabs] Pacemaker 1.1.18 deprecation warnings
Ken Gaillot writes:

> * undocumented LRMD_MAX_CHILDREN environment variable
> (PCMK_node_action_limit is the current syntax)

By the way, is the current syntax documented somewhere? Looking at crmd/throttle.c, throttle_update_job_max() is only ever invoked with a NULL argument, so "Global preference from the CIB" isn't implemented either. Or do I overlook something?

--
Thanks,
Feri
Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck
Jan Friesse writes:

> Back to the problem you have. It's definitively a HW issue, but I'm
> thinking how to solve it in software. Right now, I can see two ways:
> 1. Set dog FD to be non-blocking right at the end of setup_watchdog -
>    This is preferred but I'm not sure if it's really going to work.

I'll run some tests to see what works (if anything). The keepalives can be provided by write()s as well, but somehow I don't expect that to make a difference. We'll see.

> 2. Create thread which makes sure to tackle wd regularly.

That would work, but maybe too well if entirely decoupled from the main loop.

--
Thanks,
Feri
Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck
Klaus Wenninger writes:

> Just for my understanding: You are using watchdog-handling in corosync?

Yes, I was.

--
Feri
Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck
Valentin Vidic <valentin.vi...@carnet.hr> writes:

> On Sun, Sep 10, 2017 at 08:27:47AM +0200, Ferenc Wágner wrote:
>
>> Confirmed: setting watchdog_device: off cluster wide got rid of the
>> above warnings.
>
> Interesting, what brand or version of IPMI has this problem?

It's a Fujitsu PRIMERGY BX924 S4 blade iRMC S4 with Firmware Version 8.43F and SDRR Version 3.60 ID 0376 BX924S4.

$ sudo ipmitool -I open mc info
Device ID                 : 52
Device Revision           : 2
Firmware Revision         : 1.00
IPMI Version              : 2.0
Manufacturer ID           : 10368
Manufacturer Name         : Fujitsu Siemens
Product ID                : 886 (0x0376)
Product Name              : Unknown (0x376)
Device Available          : yes
Provides Device SDRs      : no
Additional Device Support :
    Sensor Device
    SDR Repository Device
    SEL Device
    FRU Inventory Device
    IPMB Event Receiver
    IPMB Event Generator
    Chassis Device
Aux Firmware Rev Info     : 0x08 0x2b 0x00 0x46

--
Regards,
Feri
Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck
wf...@niif.hu (Ferenc Wágner) writes:

> Jan Friesse <jfrie...@redhat.com> writes:
>
>> wf...@niif.hu writes:
>>
>>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
>>> (in August; in May, it happened 0-2 times a day only, it's slowly
>>> ramping up):
>>>
>>> vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new configuration.
>>> vhbl03 corosync[3890]: [TOTEM ] A processor failed, forming new configuration.
>>> vhbl07 corosync[3805]: [MAIN ] Corosync main process was not scheduled for 4317.0054 ms (threshold is 2400. ms). Consider token timeout increase.
>>
>> ^^^ This is main problem you have to solve. It usually means that
>> machine is too overloaded. It is happening quite often when corosync
>> is running inside VM where host machine is unable to schedule regular
>> VM running.
>
> After some extensive tracing, I think the problem lies elsewhere: my
> IPMI watchdog device is slow beyond imagination.

Confirmed: setting watchdog_device: off cluster wide got rid of the above warnings.

> Its ioctl operations can take seconds, starving all other functions.
> At least, it seems to block the main thread of Corosync. Is this a
> plausible scenario? Corosync has two threads, what are their roles?

--
Regards,
Feri
Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck
Jan Friesse writes:

> wf...@niif.hu writes:
>
>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
>> (in August; in May, it happened 0-2 times a day only, it's slowly
>> ramping up):
>>
>> vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new configuration.
>> vhbl03 corosync[3890]: [TOTEM ] A processor failed, forming new configuration.
>> vhbl07 corosync[3805]: [MAIN ] Corosync main process was not scheduled for 4317.0054 ms (threshold is 2400. ms). Consider token timeout increase.
>
> ^^^ This is main problem you have to solve. It usually means that
> machine is too overloaded. It is happening quite often when corosync
> is running inside VM where host machine is unable to schedule regular
> VM running.

After some extensive tracing, I think the problem lies elsewhere: my IPMI watchdog device is slow beyond imagination. Its ioctl operations can take seconds, starving all other functions. At least, it seems to block the main thread of Corosync. Is this a plausible scenario? Corosync has two threads, what are their roles?

--
Thanks,
Feri
Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck
Digimer <li...@alteeve.ca> writes: > On 2017-08-29 10:45 AM, Ferenc Wágner wrote: > >> Digimer <li...@alteeve.ca> writes: >> >>> On 2017-08-28 12:07 PM, Ferenc Wágner wrote: >>> >>>> [...] >>>> While dlm_tool status reports (similar on all nodes): >>>> >>>> cluster nodeid 167773705 quorate 1 ring seq 3088 3088 >>>> daemon now 2941405 fence_pid 0 >>>> node 167773705 M add 196 rem 0 fail 0 fence 0 at 0 0 >>>> node 167773706 M add 5960 rem 5730 fail 0 fence 0 at 0 0 >>>> node 167773707 M add 2089 rem 1802 fail 0 fence 0 at 0 0 >>>> node 167773708 M add 3646 rem 3413 fail 0 fence 0 at 0 0 >>>> node 167773709 M add 2588921 rem 2588920 fail 0 fence 0 at 0 0 >>>> node 167773710 M add 196 rem 0 fail 0 fence 0 at 0 0 >>>> >>>> dlm_tool ls shows "kern_stop": >>>> >>>> dlm lockspaces >>>> name clvmd >>>> id0x4104eefa >>>> flags 0x0004 kern_stop >>>> changemember 5 joined 0 remove 1 failed 1 seq 8,8 >>>> members 167773705 167773706 167773707 167773708 167773710 >>>> new changemember 6 joined 1 remove 0 failed 0 seq 9,9 >>>> new statuswait messages 1 >>>> new members 167773705 167773706 167773707 167773708 167773709 167773710 >>>> >>>> on all nodes except for vhbl07 (167773709), where it gives >>>> >>>> dlm lockspaces >>>> name clvmd >>>> id0x4104eefa >>>> flags 0x >>>> changemember 6 joined 1 remove 0 failed 0 seq 11,11 >>>> members 167773705 167773706 167773707 167773708 167773709 167773710 >>>> >>>> instead. >>>> >>>> [...] Is there a way to unblock DLM without rebooting all nodes? >>> >>> Looks like the lost node wasn't fenced. >> >> Why dlm status does not report any lost node then? Or do I misinterpret >> its output? >> >>> Do you have fencing configured and tested? If not, DLM will block >>> forever because it won't recover until it has been told that the lost >>> peer has been fenced, by design. >> >> What command would you recommend for unblocking DLM in this case? > > First, fix fencing. Do you have that setup and working? I really don't want DLM to do fencing. 
DLM blocking for a couple of days is not an issue in this setup (cLVM
isn't a "service" of this cluster, only a rarely needed administration
tool). Fencing is set up and works fine for Pacemaker, so it's used to
recover actual HA services. But letting DLM use it resulted in disaster
a year and a half ago (see Message-ID: <87r3g5a969@lant.ki.iif.hu>),
which I still haven't understood, and I'd rather not go there again
until that's taken care of properly. So for now, a manual unblock path
is all I'm after.

-- Thanks, Feri

___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck
Jan Friesse writes:

> wf...@niif.hu writes:
>
>> Jan Friesse writes:
>>
>>> wf...@niif.hu writes:
>>>
>>>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
>>>> (in August; in May, it happened 0-2 times a day only, it's slowly
>>>> ramping up):
>>>>
>>>> vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new configuration.
>>>> vhbl03 corosync[3890]: [TOTEM ] A processor failed, forming new configuration.
>>>> vhbl07 corosync[3805]: [MAIN ] Corosync main process was not scheduled
>>>> for 4317.0054 ms (threshold is 2400.0000 ms). Consider token timeout increase.
>>>
>>> ^^^ This is main problem you have to solve. It usually means that
>>> machine is too overloaded. [...]
>>
>> Before I start tracing the scheduler, I'd like to ask something: what
>> wakes up the Corosync main process periodically? The token making a
>> full circle? (Please forgive my simplistic understanding of the TOTEM
>> protocol.) That would explain the recommendation in the log message,
>> but does not fit well with the overload assumption: totally idle nodes
>> could just as easily produce such warnings if there are no other regular
>> wakeup sources. (I'm looking at timer_function_scheduler_timeout, but I
>> know too little of libqb to decide.)
>
> Corosync main loop is based on epoll, so corosync is woken up either by
> receiving data (network socket or unix socket for services), or when
> there are data to send and the socket is ready for a non-blocking write,
> or after a timeout. This timeout is exactly what you call another
> wakeup source.
>
> The timeout is used for scheduling periodical tasks inside corosync.
>
> One of the periodical tasks is the scheduler pause detector. It is
> basically scheduled every (token_timeout / 3) msec and it computes the
> diff between the current and last time. If the diff is larger than
> (token_timeout * 0.8), it displays a warning.

Thanks, I can work with this. I'll come back as soon as I find something
(or need further information :).
>>> As a start you can try what message say = Consider token timeout >>> increase. Currently you have 3 seconds, in theory 6 second should be >>> enough. >> >> It was probably high time I realized that token timeout is scaled >> automatically when one has a nodelist. When you say Corosync should >> work OK with default settings up to 16 nodes, you assume this scaling is >> in effect, don't you? On the other hand, I've got no nodelist in the >> config, but token = 3000, which is less than the default 1000+4*650 with >> six nodes, and this will get worse as the cluster grows. > > This is described in corosync.conf man page (token_coefficient). Yes, that's how I found out. It also says: "This value is used only when nodelist section is specified and contains at least 3 nodes." > Final timeout is computed using totem.token as a base value. So if you > set totem.token to 3000 it means that final totem timeout value is not > 3000 but (3000 + 4 * 650). But I've got no nodelist section, and according to the warning, my token timeout is indeed 3 seconds, as you promptly deduced. So the documentation seems to be correct. -- Thanks, Feri ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
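[Editor's sketch] The pause-detector behaviour Honza describes above is easy to model: the check runs every token_timeout / 3 ms and warns when the gap between two consecutive wakeups exceeds token_timeout * 0.8. A simplified Python model (not corosync's actual C code):

```python
# Simplified model of corosync's scheduler pause detector.
# With token_timeout = 3000 ms: the check is scheduled every 1000 ms,
# and a warning fires when the main loop was not scheduled for > 2400 ms.
def pause_warnings(token_timeout_ms, wakeup_times_ms):
    """wakeup_times_ms: times at which the periodic task actually ran."""
    interval = token_timeout_ms / 3       # scheduling period of the check
    threshold = token_timeout_ms * 0.8    # warn above this gap
    warnings = []
    for prev, cur in zip(wakeup_times_ms, wakeup_times_ms[1:]):
        if cur - prev > threshold:
            warnings.append((cur, cur - prev))
    return interval, threshold, warnings

# The gap of 4317.0054 ms from the log exceeds the 2400 ms threshold:
interval, threshold, warns = pause_warnings(3000, [0, 1000, 2000, 6317.0054])
print(interval, threshold, warns)
```

This also explains why an idle node can produce the warning: the detector only measures how late its own wakeup was, whatever the cause.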
Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck
Jan Friesse writes:

> wf...@niif.hu writes:
>
>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
>> (in August; in May, it happened 0-2 times a day only, it's slowly
>> ramping up):
>>
>> vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new
>> configuration.
>> vhbl03 corosync[3890]: [TOTEM ] A processor failed, forming new
>> configuration.
>> vhbl07 corosync[3805]: [MAIN ] Corosync main process was not scheduled
>> for 4317.0054 ms (threshold is 2400.0000 ms). Consider token timeout
>> increase.
>
> ^^^ This is main problem you have to solve. It usually means that
> machine is too overloaded. [...]

Before I start tracing the scheduler, I'd like to ask something: what
wakes up the Corosync main process periodically? The token making a
full circle? (Please forgive my simplistic understanding of the TOTEM
protocol.) That would explain the recommendation in the log message,
but does not fit well with the overload assumption: totally idle nodes
could just as easily produce such warnings if there are no other regular
wakeup sources. (I'm looking at timer_function_scheduler_timeout, but I
know too little of libqb to decide.)

> As a start you can try what message say = Consider token timeout
> increase. Currently you have 3 seconds, in theory 6 second should be
> enough.

It was probably high time I realized that the token timeout is scaled
automatically when one has a nodelist. When you say Corosync should
work OK with default settings up to 16 nodes, you assume this scaling is
in effect, don't you? On the other hand, I've got no nodelist in the
config, but token = 3000, which is less than the default 1000+4*650 with
six nodes, and this will get worse as the cluster grows.

Comments on the above ramblings welcome! I'm grateful for all the
valuable input poured into this thread by all parties: it's proven
really educational in quite unexpected ways, beyond what I was able to
ask in the beginning.
-- Thanks, Feri
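[Editor's sketch] The automatic scaling discussed above is documented in corosync.conf(5): with a nodelist of at least 3 nodes, the effective token timeout becomes token + (nodes - 2) * token_coefficient (defaults: token = 1000 ms, token_coefficient = 650 ms). A quick check of the numbers from this thread:

```python
def effective_token_timeout(token_ms, nodes, token_coefficient_ms=650,
                            have_nodelist=True):
    # Per corosync.conf(5), token_coefficient only applies when a
    # nodelist section with at least 3 nodes is configured.
    if have_nodelist and nodes >= 3:
        return token_ms + (nodes - 2) * token_coefficient_ms
    return token_ms

# Default token = 1000 with a 6-node nodelist: 1000 + 4*650 = 3600 ms.
print(effective_token_timeout(1000, 6))
# This cluster: explicit token = 3000, but no nodelist section,
# so the timeout stays at 3000 ms regardless of cluster size.
print(effective_token_timeout(3000, 6, have_nodelist=False))
```

This matches the observation in the thread: without a nodelist, an explicit token = 3000 actually ends up *below* what the defaults would give a six-node cluster.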
Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck
Klaus Wenninger writes:

> Just seen that you are hosting VMs, which might make you use KSM...
> Don't fully remember at the moment, but I have some memory of
> issues with KSM and page-locking.
> IIRC it was some bug in the kernel memory-management that should
> have been fixed a long time ago, but...

Hi Klaus,

I failed to find anything relevant by a quick internet search. Can you
recall something more specific, so that I can ensure I'm running with
this issue fixed?

-- Thanks, Feri
Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck
Jan Friesse writes:

> wf...@niif.hu writes:
>
>> Jan Friesse writes:
>>
>>> wf...@niif.hu writes:
>>>
>>>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
>>>> (in August; in May, it happened 0-2 times a day only, it's slowly
>>>> ramping up):
>>>>
>>>> vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new configuration.
>>>> vhbl03 corosync[3890]: [TOTEM ] A processor failed, forming new configuration.
>>>> vhbl07 corosync[3805]: [MAIN ] Corosync main process was not scheduled
>>>> for 4317.0054 ms (threshold is 2400.0000 ms). Consider token timeout increase.
>>>
>>> ^^^ This is main problem you have to solve. It usually means that
>>> machine is too overloaded. It is happening quite often when corosync
>>> is running inside VM where host machine is unable to schedule regular
>>> VM running.
>>
>> Corosync isn't running in a VM here, these nodes are 2x8 core servers
>> hosting VMs themselves as Pacemaker resources. (Incidentally, some of
>> these VMs run Corosync to form a test cluster, but that should be
>> irrelevant now.) And they aren't overloaded in any apparent way: Munin
>> reports 2900% CPU idle (out of 32 hyperthreads). There's no swap, but
>> the corosync process is locked into memory anyway. It's also running as
>> SCHED_RR prio 99, competing only with multipathd and the SCHED_FIFO prio
>> 99 kernel threads (migration/* and watchdog/*) under Linux 4.9. I'll
>> try to take a closer look at the scheduling of these. Can you recommend
>> some indicators to check out?
>
> No real hints. But one question: are you 100% sure memory is locked?
> Because we had a problem where mlockall was called in the wrong place,
> so corosync was actually not locked, and it was causing similar issues.
>
> This behavior is fixed by
> https://github.com/corosync/corosync/commit/238e2e62d8b960e7c10bfa0a8281d78ec99f3a26

I based this assertion on the L flag in the ps STAT column.
The above commit should not affect me, because I'm running corosync with
the -f option:

$ ps l 3805
F   UID   PID  PPID PRI  NI    VSZ    RSS WCHAN  STAT TTY        TIME COMMAND
4     0  3805     1 -100   - 247464 141016 -     SLsl ?        251:10 /usr/sbin/corosync -f

By the way, are the above VSZ and RSS numbers reasonable? One more
thing: these servers run without any swap.

>>> As a start you can try what message say = Consider token timeout
>>> increase. Currently you have 3 seconds, in theory 6 second should be
>>> enough.
>>
>> OK, thanks for the tip. Can I do this on-line, without shutting down
>> Corosync?
>
> Corosync way is to just edit/copy corosync.conf on all nodes and call
> corosync-cfgtool -R on one of the nodes (crmsh/pcs may have better
> way).

Great, that's what I wanted to know: whether -R is expected to make this
change effective.

-- Thanks, Feri
Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck
Digimer <li...@alteeve.ca> writes:

> On 2017-08-28 12:07 PM, Ferenc Wágner wrote:
>
>> [...]
>> While dlm_tool status reports (similar on all nodes):
>>
>> cluster nodeid 167773705 quorate 1 ring seq 3088 3088
>> daemon now 2941405 fence_pid 0
>> node 167773705 M add 196 rem 0 fail 0 fence 0 at 0 0
>> node 167773706 M add 5960 rem 5730 fail 0 fence 0 at 0 0
>> node 167773707 M add 2089 rem 1802 fail 0 fence 0 at 0 0
>> node 167773708 M add 3646 rem 3413 fail 0 fence 0 at 0 0
>> node 167773709 M add 2588921 rem 2588920 fail 0 fence 0 at 0 0
>> node 167773710 M add 196 rem 0 fail 0 fence 0 at 0 0
>>
>> dlm_tool ls shows "kern_stop":
>>
>> dlm lockspaces
>> name          clvmd
>> id            0x4104eefa
>> flags         0x00000004 kern_stop
>> change        member 5 joined 0 remove 1 failed 1 seq 8,8
>> members       167773705 167773706 167773707 167773708 167773710
>> new change    member 6 joined 1 remove 0 failed 0 seq 9,9
>> new status    wait messages 1
>> new members   167773705 167773706 167773707 167773708 167773709 167773710
>>
>> on all nodes except for vhbl07 (167773709), where it gives
>>
>> dlm lockspaces
>> name          clvmd
>> id            0x4104eefa
>> flags         0x00000000
>> change        member 6 joined 1 remove 0 failed 0 seq 11,11
>> members       167773705 167773706 167773707 167773708 167773709 167773710
>>
>> instead.
>>
>> [...] Is there a way to unblock DLM without rebooting all nodes?
>
> Looks like the lost node wasn't fenced.

Why doesn't dlm status report any lost node then? Or do I misinterpret
its output?

> Do you have fencing configured and tested? If not, DLM will block
> forever because it won't recover until it has been told that the lost
> peer has been fenced, by design.

What command would you recommend for unblocking DLM in this case?

-- Thanks, Feri
Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck
Jan Friesse writes:

> wf...@niif.hu writes:
>
>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
>> (in August; in May, it happened 0-2 times a day only, it's slowly
>> ramping up):
>>
>> vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new
>> configuration.
>> vhbl03 corosync[3890]: [TOTEM ] A processor failed, forming new
>> configuration.
>> vhbl07 corosync[3805]: [MAIN ] Corosync main process was not scheduled
>> for 4317.0054 ms (threshold is 2400.0000 ms). Consider token timeout
>> increase.
>
> ^^^ This is main problem you have to solve. It usually means that
> machine is too overloaded. It is happening quite often when corosync
> is running inside VM where host machine is unable to schedule regular
> VM running.

Hi Honza,

Corosync isn't running in a VM here, these nodes are 2x8 core servers
hosting VMs themselves as Pacemaker resources. (Incidentally, some of
these VMs run Corosync to form a test cluster, but that should be
irrelevant now.) And they aren't overloaded in any apparent way: Munin
reports 2900% CPU idle (out of 32 hyperthreads). There's no swap, but
the corosync process is locked into memory anyway. It's also running as
SCHED_RR prio 99, competing only with multipathd and the SCHED_FIFO prio
99 kernel threads (migration/* and watchdog/*) under Linux 4.9. I'll
try to take a closer look at the scheduling of these. Can you recommend
some indicators to check out?

Are scheduling delays expected to generate TOTEM membership "changes"
without any leaving and joining nodes?

> As a start you can try what message say = Consider token timeout
> increase. Currently you have 3 seconds, in theory 6 second should be
> enough.

OK, thanks for the tip. Can I do this on-line, without shutting down
Corosync?
-- Thanks, Feri
[ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck
Hi,

In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
(in August; in May, it happened 0-2 times a day only, it's slowly
ramping up):

vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new configuration.
vhbl03 corosync[3890]: [TOTEM ] A processor failed, forming new configuration.
vhbl07 corosync[3805]: [MAIN ] Corosync main process was not scheduled for 4317.0054 ms (threshold is 2400.0000 ms). Consider token timeout increase.
vhbl07 corosync[3805]: [TOTEM ] A processor failed, forming new configuration.
vhbl04 corosync[3759]: [TOTEM ] A new membership (10.0.6.9:3056) was formed. Members
vhbl05 corosync[3919]: [TOTEM ] A new membership (10.0.6.9:3056) was formed. Members
vhbl06 corosync[3759]: [TOTEM ] A new membership (10.0.6.9:3056) was formed. Members
vhbl07 corosync[3805]: [TOTEM ] A new membership (10.0.6.9:3056) was formed. Members
vhbl08 corosync[3687]: [TOTEM ] A new membership (10.0.6.9:3056) was formed. Members
vhbl03 corosync[3890]: [TOTEM ] A new membership (10.0.6.9:3056) was formed. Members
vhbl07 corosync[3805]: [QUORUM] Members[6]: 167773705 167773706 167773707 167773708 167773709 167773710
vhbl08 corosync[3687]: [QUORUM] Members[6]: 167773705 167773706 167773707 167773708 167773709 167773710
vhbl06 corosync[3759]: [QUORUM] Members[6]: 167773705 167773706 167773707 167773708 167773709 167773710
vhbl07 corosync[3805]: [MAIN ] Completed service synchronization, ready to provide service.
vhbl04 corosync[3759]: [QUORUM] Members[6]: 167773705 167773706 167773707 167773708 167773709 167773710
vhbl08 corosync[3687]: [MAIN ] Completed service synchronization, ready to provide service.
vhbl06 corosync[3759]: [MAIN ] Completed service synchronization, ready to provide service.
vhbl04 corosync[3759]: [MAIN ] Completed service synchronization, ready to provide service.
vhbl05 corosync[3919]: [QUORUM] Members[6]: 167773705 167773706 167773707 167773708 167773709 167773710
vhbl03 corosync[3890]: [QUORUM] Members[6]: 167773705 167773706 167773707 167773708 167773709 167773710
vhbl05 corosync[3919]: [MAIN ] Completed service synchronization, ready to provide service.
vhbl03 corosync[3890]: [MAIN ] Completed service synchronization, ready to provide service.

The cluster is running Corosync 2.4.2 multicast. Those lines really end
at "Members": there are no joined or left nodes listed. Pacemaker on
top reacts like:

[9982] vhbl03 pacemakerd: info: pcmk_quorum_notification: Quorum retained | membership=3056 members=6
[9991] vhbl03       crmd: info: pcmk_quorum_notification: Quorum retained | membership=3056 members=6
[9986] vhbl03        cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=vhbl07/crmd/4477, version=0.1694.12)
[9986] vhbl03        cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=vhbl07/crmd/4478, version=0.1694.12)
[9986] vhbl03        cib: info: cib_process_ping: Reporting our current digest to vhbl07: 85250f3039d269f96012750f13e232d9 for 0.1694.12 (0x55ef057447d0 0)

on all nodes except for vhbl07, where it says:

[9886] vhbl07       crmd: info: pcmk_quorum_notification: Quorum retained | membership=3056 members=6
[9877] vhbl07 pacemakerd: info: pcmk_quorum_notification: Quorum retained | membership=3056 members=6
[9881] vhbl07        cib: info: cib_process_request: Forwarding cib_modify operation for section nodes to all (origin=local/crmd/
[9881] vhbl07        cib: info: cib_process_request: Forwarding cib_modify operation for section status to all (origin=local/crmd
[9881] vhbl07        cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=vhbl07/cr
[9881] vhbl07        cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=vhbl07/c
[9881] vhbl07        cib: info: cib_process_ping: Reporting our current digest to
vhbl07: 85250f3039d269f96012750f13e232d9 for 0.1694.

So Pacemaker does nothing, basically, and I can't see any adverse effect
on resource management, but DLM seems to have some problem, which may or
may not be related. When the TOTEM error appears, all nodes log this:

vhbl03 dlm_controld[3914]: 2801675 dlm:controld ring 167773705:3056 6 memb 167773705 167773706 167773707 167773708 167773709 167773710
vhbl03 dlm_controld[3914]: 2801675 fence work wait for cluster ringid
vhbl03 dlm_controld[3914]: 2801675 dlm:ls:clvmd ring 167773705:3056 6 memb 167773705 167773706 167773707 167773708 167773709 167773710
vhbl03 dlm_controld[3914]: 2801675 clvmd wait_messages cg 9 need 1 of 6
vhbl03 dlm_controld[3914]: 2801675 fence work wait for cluster ringid
vhbl03 dlm_controld[3914]: 2801675 cluster quorum 1 seq 3056 nodes 6

dlm_controld is running with --enable_fencing=0. Pacemaker does
Re: [ClusterLabs] Pacemaker 1.1.17 Release Candidate 4 (likely final)
Ken Gaillot writes:

> The most significant change in this release is a new cluster option to
> improve scalability.
>
> As users start to create clusters with hundreds of resources and many
> nodes, one bottleneck is a complete reprobe of all resources (for
> example, after a cleanup of all resources).

Hi,

Does crm_resource --cleanup without any --resource specified do this?
Does this happen any other (automatic or manual) way?

> This can generate enough CIB updates to get the crmd's CIB connection
> dropped for not processing them quickly enough.

Is this a catastrophic scenario, or does the cluster recover gently?

> This bottleneck has been addressed with a new cluster option,
> cluster-ipc-limit, to raise the threshold for dropping the connection.
> The default is 500. The recommended value is the number of nodes in the
> cluster multiplied by the number of resources.

I'm running a production cluster with 6 nodes and 159 resources (ATM),
which gives almost twice the above default. What symptoms should I
expect to see under 1.1.16? (1.1.16 has just been released with Debian
stretch. We can't really upgrade it, but changing the built-in default
is possible if it makes sense.)

-- Thanks, Feri
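[Editor's sketch] The arithmetic behind "almost twice the above default" is straightforward; a tiny helper applying the rule from the announcement (nodes × resources, never below the 500 default):

```python
# Recommended cluster-ipc-limit per the 1.1.17 release announcement:
# number of nodes * number of resources, with a built-in default of 500.
def recommended_cluster_ipc_limit(nodes, resources, default=500):
    return max(default, nodes * resources)

nodes, resources = 6, 159   # the cluster described in the question
print(nodes * resources)    # 954 -- nearly twice the default of 500
print(recommended_cluster_ipc_limit(nodes, resources))
```

So the cluster in the question would want cluster-ipc-limit = 954 (or a round 1000) rather than the stock 500.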
Re: [ClusterLabs] Notifications on changes in clustered LVM
Digimer <li...@alteeve.ca> writes:

> On 19/06/17 11:40 PM, Andrei Borzenkov wrote:
>
>> 20.06.2017 02:15, Digimer writes:
>>
>>> On 19/06/17 06:59 PM, Ferenc Wágner wrote:
>>>
>>>> Digimer <li...@alteeve.ca> writes:
>>>>
>>>>> So we have a tool that watches for changes to clvmd by running
>>>>> pvscan/vgscan/lvscan, but this seems to be expensive and occasionally
>>>>> causes trouble.
>>>>
>>>> What kind of trouble did you experience?
>>>>
>>>>> Is there any other way to be notified or to check when something
>>>>> changes?
>>>>
>>>> LV (de)activation generates udev events (due to block devices appearing/
>>>> disappearing). PVs too, but they don't go through clvmd.
>>>
>>> Interesting (dbus), I'll look into that.
>>
>> udev events are sent over netlink, not D-Bus.
>
> I've not used that before. Any docs on how to listen for those events,
> by chance? If nothing off hand, don't worry, I can search.

Or just configure udev to run appropriate programs on the events you're
interested in. Less efficient, but simpler.

-- Feri
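[Editor's sketch] The "just configure udev" suggestion could look something like the rule below. The file name, match keys, and hook script path are illustrative only (none of them come from the thread); device-mapper's udev rules export properties such as DM_LV_NAME for LV block devices:

```
# /etc/udev/rules.d/99-clvm-notify.rules  (hypothetical path and script)
# Run a notification hook whenever a device-mapper LV appears or disappears.
ACTION=="add|remove", SUBSYSTEM=="block", KERNEL=="dm-*", \
    ENV{DM_LV_NAME}=="?*", \
    RUN+="/usr/local/sbin/notify-lv-change $env{ACTION} $env{DM_LV_NAME}"
```

For interactive debugging, `udevadm monitor --udev --subsystem-match=block` shows the same events as they arrive over netlink.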
Re: [ClusterLabs] Notifications on changes in clustered LVM
Digimer writes:

> So we have a tool that watches for changes to clvmd by running
> pvscan/vgscan/lvscan, but this seems to be expensive and occasionally
> causes trouble.

What kind of trouble did you experience?

> Is there any other way to be notified or to check when something
> changes?

LV (de)activation generates udev events (due to block devices appearing/
disappearing). PVs too, but they don't go through clvmd.

-- Regards, Feri
Re: [ClusterLabs] Ubuntu 16.04 - Only binds on 127.0.0.1 then fails until reinstall
James Booth writes:

> Sorry for the repeat mails, but I had issues subscribing last time
> (looks like it has worked successfully now!).
>
> Anywho, I'm really desperate for some help on my issue in
> http://lists.clusterlabs.org/pipermail/users/2017-April/005495.html -
> I can recap the info in this thread and provide any configs if needed!

Hi James,

This thread is badly fragmented and confusing now, but let's try to
proceed. It seems corosync ignores its config file. Maybe you edit a
stray corosync.conf, not the one corosync actually reads (which should
probably be /etc/corosync/corosync.conf). Please issue the following
command as a regular user, and show us its output (make sure strace is
installed):

$ strace -f -eopen /usr/sbin/corosync -p -f

It should reveal the name of the config file. For example, under a
different version a section of the output looks like this:

open("/dev/shm/qb-corosync-16489-blackbox-header", O_RDWR|O_CREAT|O_TRUNC, 0600) = 3
open("/dev/shm/qb-corosync-16489-blackbox-data", O_RDWR|O_CREAT|O_TRUNC, 0600) = 4
open("/etc/corosync/corosync.conf", O_RDONLY) = 3
open("/etc/localtime", O_RDONLY|O_CLOEXEC) = 3
Process 16490 attached
[pid 16489] open("/var/run/corosync.pid", O_WRONLY|O_CREAT, 0640) = -1 EACCES (Permission denied)

If you can identify the name of the config file, please also post its
path and its full content.

-- Regards, Feri
Re: [ClusterLabs] Why shouldn't one store resource configuration in the CIB?
Ken Gaillot <kgail...@redhat.com> writes:

> On 04/13/2017 11:11 AM, Ferenc Wágner wrote:
>
>> I encountered several (old) statements on various forums along the lines
>> of: "the CIB is not a transactional database and shouldn't be used as
>> one" or "resource parameters should only uniquely identify a resource,
>> not configure it" and "the CIB was not designed to be a configuration
>> database but people still use it that way". Sorry if I misquote these,
>> I go by my memories now; I failed to dig up the links with a quick try.
>>
>> Well, I've been guilty of the above offenses for years, but it
>> worked out pretty well that way, which helped to suppress these warnings
>> in the back of my head. Still, I'm curious: what's the reason for these
>> warnings, what are the dangers of "abusing" the CIB this way?
>> /var/lib/pacemaker/cib/cib.xml is 336 kB with 6 nodes and 155 resources
>> configured. Old Pacemaker versions required tuning PCMK_ipc_buffer to
>> handle this, but even the default is big enough nowadays (128 kB after
>> compression, I guess).
>>
>> Am I walking on thin ice? What should I look out for?
>
> That's a good question. Certainly, there is some configuration
> information in most resource definitions, so it's more a matter of degree.
>
> The main concerns I can think of are:
>
> 1. Size: Increasing the CIB size increases the I/O, CPU and networking
> overhead of the cluster (and if it crosses the compression threshold,
> significantly). It also marginally increases the time it takes the
> policy engine to calculate a new state, which slows recovery.

Thanks for the input, Ken! Is this what you mean?

cib: info: crm_compress_string: Compressed 1028972 bytes into 69095 (ratio 14:1) in 138ms

At the same time /var/lib/pacemaker/cib/cib.xml is 336K, and

# cibadmin -Q --scope resources | wc -c
330951
# cibadmin -Q --scope status | wc -c
732820

Even though I consume about 2 kB per resource, the status section weighs
2.2 times as much as the resources section.
Which means shrinking the resource size wouldn't change the full size
significantly. At the same time, we should probably monitor the trends
of the cluster messaging health as we expand it (with nodes and
resources). What would be some useful indicators to graph?

> 2. Consistency: Clusters can become partitioned. If changes are made on
> one or more partitions during the separation, the changes won't be
> reflected on all nodes until the partition heals, at which time the
> cluster will reconcile them, potentially losing one side's changes.

Ah, that's a very good point, which I neglected totally: even inquorate
partitions can have configuration changes. Thanks for bringing this up!
I wonder if there's any practical workaround for that.

> I suppose this isn't qualitatively different from using a separate
> configuration file, but those tend to be more static, and failure to
> modify all copies would be more obvious when doing them individually
> rather than issuing a single cluster command.

From a different angle: if a node is off, you can't modify its
configuration file. So you need an independent mechanism to do what the
CIB synchronization does anyway, or a shared file system with its added
complexity. On the other hand, one needn't guess how Pacemaker
reconciles conflicting resource configuration changes. Indeed, how does
it?

-- Thanks, Feri
[ClusterLabs] Why shouldn't one store resource configuration in the CIB?
Hi,

I encountered several (old) statements on various forums along the lines
of: "the CIB is not a transactional database and shouldn't be used as
one" or "resource parameters should only uniquely identify a resource,
not configure it" and "the CIB was not designed to be a configuration
database but people still use it that way". Sorry if I misquote these,
I go by my memories now; I failed to dig up the links with a quick try.

Well, I've been guilty of the above offenses for years, but it worked
out pretty well that way, which helped to suppress these warnings in the
back of my head. Still, I'm curious: what's the reason for these
warnings, what are the dangers of "abusing" the CIB this way?
/var/lib/pacemaker/cib/cib.xml is 336 kB with 6 nodes and 155 resources
configured. Old Pacemaker versions required tuning PCMK_ipc_buffer to
handle this, but even the default is big enough nowadays (128 kB after
compression, I guess).

Am I walking on thin ice? What should I look out for?

-- Thanks, Feri
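[Editor's sketch] Pacemaker compresses large IPC messages with bzip2 (crm_compress_string uses libbz2), and repetitive XML like a CIB compresses very well — the 14:1 ratio quoted later in the thread is typical. A quick way to estimate whether a CIB-sized document fits under the 128 kB post-compression limit; the synthetic XML below is a stand-in, not a real cib.xml:

```python
import bz2

# Synthetic stand-in for a cib.xml: ~340 kB of repetitive resource XML.
cib_xml = "".join(
    "<primitive id='vm-%03d' class='ocf' provider='heartbeat' "
    "type='VirtualDomain'/>\n" % i
    for i in range(4000)
).encode()

PCMK_IPC_BUFFER = 128 * 1024   # default limit, applied after compression

compressed = bz2.compress(cib_xml)
print(len(cib_xml), len(compressed), len(compressed) < PCMK_IPC_BUFFER)
```

Repetitive markup compresses far below the threshold, which is why a 336 kB cib.xml fits comfortably in the default buffer nowadays.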
Re: [ClusterLabs] Surprising semantics of location constraints with INFINITY score
kgronl...@suse.com (Kristoffer Grönlund) writes:

> I discovered today that a location constraint with score=INFINITY
> doesn't actually restrict resources to running only on particular
> nodes.

Yeah, I made the same "discovery" some time ago. Since then I've been
using something like the following to restrict my-rsc to my-node:

-- Feri
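[Editor's note] The constraint itself was stripped by the list archive. The standard idiom for actually pinning a resource is a -INFINITY rule matching every *other* node (a plain score=INFINITY constraint only expresses a preference). A sketch using the my-rsc/my-node names from the message — the `id` values are illustrative:

```xml
<rsc_location id="loc-my-rsc-only-my-node" rsc="my-rsc">
  <rule id="loc-my-rsc-only-my-node-rule" score="-INFINITY">
    <expression id="loc-my-rsc-only-my-node-expr"
                attribute="#uname" operation="ne" value="my-node"/>
  </rule>
</rsc_location>
```

This forbids my-rsc on any node whose #uname differs from my-node, so it can never be "rescued" onto another node.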
Re: [ClusterLabs] Never join a list without a problem...
Jeffrey Westgate writes:

> We use Nagios to monitor, and once every 20 to 40 hours - sometimes
> longer, and we cannot set a clock by it - while the machine is 95%
> idle (or more according to 'top'), the host load shoots up to 50 or
> 60%. It takes about 20 minutes to peak, and another 30 to 45 minutes
> to come back down to baseline, which is mostly 0.00. (attached
> hostload.pdf) This happens to both machines, randomly, and is
> concerning, as we'd like to find what's causing it and resolve it.

Try running atop (http://www.atoptool.nl/). It collects and logs
process accounting info, allowing you to step back in time and check
resource usage in the past.

-- Feri
Re: [ClusterLabs] Insert delay between the statup of VirtualDomain
Oscar Segarra writes:

> In my environment I have 5 guests that have to be started up in a
> specified order, starting with the MySQL database server.

We use a somewhat redesigned resource agent, which connects to the guest
using a virtio channel and waits for a signal before exiting from the
start operation. The signal is sent by an appropriately placed startup
script in the guest. This is fully independent from regular network
traffic and does not need any channel configuration.

-- Feri
Re: [ClusterLabs] Antw: Re: [Question] About a change of crm_failcount.
Jehan-Guillaume de Rorthais writes: > PAF uses private attributes to pass information between actions. We > detect the failure during the notify as well, but raise the error > during the promotion itself. See how I dealt with this in PAF: > > https://github.com/ioguix/PAF/commit/6123025ff7cd9929b56c9af2faaefdf392886e68 This is the first time I hear about private attributes. Since they could come in useful one day, I'd like to understand them better. After some reading, they seem to be node attributes, not resource attributes. This may be irrelevant for PAF, but doesn't it mean that two resources of the same type on the same node would interfere with each other? Also, your _set_priv_attr could fall into an infinite loop if another instance used it at an inappropriate moment. Am I missing something here? -- Thanks, Feri
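For reference, private node attributes are manipulated with attrd_updater; a hedged sketch (the attribute name is hypothetical). Private attributes live only in attrd and are never written to the CIB, so they survive nothing and trigger no cluster transitions:

```sh
# Set a private node attribute (kept in attrd only, not in the CIB):
attrd_updater --name my_attr --update some_value --private
# Read it back on this node:
attrd_updater --name my_attr --query
# Remove it again:
attrd_updater --name my_attr --delete --private
```

Note the per-node scope: two resources using the same attribute name on the same node would indeed share it, which is the interference question raised above.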
Re: [ClusterLabs] Antw: Re: Antw: Re: Pacemaker kill does not cause node fault ???
Ken Gaillot writes: > On 02/07/2017 01:11 AM, Ulrich Windl wrote: > >> Ken Gaillot writes: >> >>> On 02/06/2017 03:28 AM, Ulrich Windl wrote: >>> Isn't the question: Is crmd a process that is expected to die (and thus need restarting)? Or wouldn't one prefer to debug this situation. I fear that restarting it might just cover some fatal failure... >>> >>> If crmd or corosync dies, the node will be fenced (if fencing is enabled >>> and working). If one of the crmd's persistent connections (such as to >>> the cib) fails, it will exit, so it ends up the same. >> >> But isn't it due to crmd not responding to network packets? So if the >> timeout is long enough, and crmd is started fast enough, will the >> node really be fenced? > > If crmd dies, it leaves its corosync process group, and I'm pretty sure > the other nodes will fence it for that reason, regardless of the duration. See http://lists.clusterlabs.org/pipermail/users/2016-March/002415.html for a case when a Pacemaker cluster survived a crmd failure and restart. Re-reading the thread, I'm still unsure what saved us from resources being started in parallel and massive data loss. I'd fully expect fencing in such cases... -- Feri
Re: [ClusterLabs] Pacemaker kill does not cause node fault ???
Ken Gaillot <kgail...@redhat.com> writes: > On 02/03/2017 07:00 AM, RaSca wrote: >> >> On 03/02/2017 11:06, Ferenc Wágner wrote: >>> Ken Gaillot <kgail...@redhat.com> writes: >>> >>>> On 01/10/2017 04:24 AM, Stefan Schloesser wrote: >>>> >>>>> I am currently testing a 2 node cluster under Ubuntu 16.04. The setup >>>>> seems to be working ok including the STONITH. >>>>> For test purposes I issued a "pkill -f pace" killing all pacemaker >>>>> processes on one node. >>>>> >>>>> Result: >>>>> The node is marked as "pending", all resources stay on it. If I >>>>> manually kill a resource it is not noticed. On the other node a drbd >>>>> "promote" command fails (drbd is still running as master on the first >>>>> node). >>>> >>>> I suspect that, when you kill pacemakerd, systemd respawns it quickly >>>> enough that fencing is unnecessary. Try "pkill -f pace; systemctl stop >>>> pacemaker". >>> >>> What exactly is "quickly enough"? >> >> What Ken is saying is that Pacemaker, as a service managed by systemd, >> has in its service definition file >> (/usr/lib/systemd/system/pacemaker.service) this option: >> >> Restart=on-failure >> >> Looking at [1] it is explained: systemd immediately restarts the process >> if it ends for some unexpected reason (like a forced kill). >> >> [1] https://www.freedesktop.org/software/systemd/man/systemd.service.html > > And the cluster itself is resilient to some daemon restarts. If only > pacemakerd is killed, corosync and pacemaker's crmd can still function > without any issues. When pacemakerd respawns, it reestablishes contact > with any other cluster daemons still running (and its pacemakerd peers > on other cluster nodes). KillMode=process looks like a very important component of the service file then. Probably worth commenting, especially its relation to Restart=on-failure (it also affects plain stop operations, of course). But I still wonder how "quickly enough" could be quantified.
Have we got a timeout for this, or are we good while the cluster is quiescent, or maybe something else? -- Thanks, Feri
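The two directives under discussion can be sketched as a service-file excerpt (paraphrased from upstream pacemaker.service; exact contents may differ between distributions and versions):

```ini
[Service]
# Respawn pacemakerd if it exits abnormally, e.g. killed by a signal.
# A clean "systemctl stop" does not trigger a respawn.
Restart=on-failure
# On stop/kill, signal only the main pacemakerd process; the child
# daemons (crmd, cib, ...) are left for pacemakerd's orderly shutdown
# rather than being killed as a cgroup.
KillMode=process
```

The interaction is what the thread describes: "pkill -f pace" counts as a failure, so systemd respawns pacemakerd, which then reattaches to any surviving child daemons.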
[ClusterLabs] Failed reload
Hi, There was an interesting discussion on this list about "Doing reload right" last July (which I still haven't digested entirely). Now I've got a related question about the current and intended behavior: what happens if a reload operation fails? I found some suggestions in http://ocf.community.tummy.narkive.com/RngPlNfz/adding-reload-to-the-ocf-specification, from 11 years back, and the question wasn't clear cut at all. Now I'm contemplating adding best-effort reloads to an RA, but not sure what behavior I can expect and depend on in the long run. I'd be grateful for your insights. -- Thanks, Feri
Re: [ClusterLabs] Pacemaker kill does not cause node fault ???
Ken Gaillot writes: > On 01/10/2017 04:24 AM, Stefan Schloesser wrote: > >> I am currently testing a 2 node cluster under Ubuntu 16.04. The setup >> seems to be working ok including the STONITH. >> For test purposes I issued a "pkill -f pace" killing all pacemaker >> processes on one node. >> >> Result: >> The node is marked as "pending", all resources stay on it. If I >> manually kill a resource it is not noticed. On the other node a drbd >> "promote" command fails (drbd is still running as master on the first >> node). > > I suspect that, when you kill pacemakerd, systemd respawns it quickly > enough that fencing is unnecessary. Try "pkill -f pace; systemctl stop > pacemaker". What exactly is "quickly enough"? -- Thanks, Feri
Re: [ClusterLabs] HALVM problem with 2 nodes cluster
Marco Marino writes: > Ferenc, regarding the flag use_lvmetad in > /usr/lib/ocf/resource.d/heartbeat/LVM I read: > >> lvmetad is a daemon that caches lvm metadata to improve the >> performance of LVM commands. This daemon should never be used when >> volume groups exist that are being managed by the cluster. The >> lvmetad daemon introduces a response lag, where certain LVM commands >> look like they have completed (like vg activation) when in fact the >> command is still in progress by the lvmetad. This can cause >> reliability issues when managing volume groups in the cluster. For >> example, if you have a volume group that is a dependency for another >> application, it is possible the cluster will think the volume group >> is activated and attempt to start the application before the volume group >> is really accessible... lvmetad is bad. > > in the function LVM_validate_all() Wow, if this is true, then this is serious breakage in LVM. Thanks for the pointer. I think this should be brought up with the LVM developers. -- Feri
Re: [ClusterLabs] HALVM problem with 2 nodes cluster
Marco Marino writes: > I agree with you for > use_lvmetad = 0 (setting it = 1 in a clustered environment is an error) Where does this information come from? AFAIK, if locking_type=3 (LVM uses internal clustered locking, that is, clvmd), lvmetad is not used anyway, even if it's running. So it's best to disable it to avoid warning messages all around. This is the case with active/active clustering in LVM itself, in which Pacemaker isn't involved. On the other hand, if you use Pacemaker to do active/passive clustering by appropriately activating/deactivating your VG, this isn't clustering from the LVM point of view: you don't set the clustered flag on your VG, don't run clvmd, and use locking_type=1. Lvmetad should be perfectly fine with this in principle (unless it caches metadata of inactive VGs, which would be stupid, but I never tested this). > but I think I have to set > locking_type = 3 only if I use clvm Right. -- Feri
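The two setups discussed above map to a couple of lines in /etc/lvm/lvm.conf; a hedged sketch (values per the thread, not a complete configuration):

```
global {
    # Active/active with clvmd (the clustered flag set on the VG):
    locking_type = 3    # internal clustered locking via clvmd/DLM
    use_lvmetad = 0     # lvmetad is bypassed by clustered locking anyway;
                        # disabling it silences the warnings

    # Active/passive managed purely by Pacemaker (no clvmd) would instead
    # use the plain-LVM defaults:
    #locking_type = 1   # local file-based locking
}
```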
Re: [ClusterLabs] Antw: Re: VirtualDomain started in two hosts
Ken Gaillot writes: > * When you move the VM, the cluster detects that it is not running on > the node you told it to keep it running on. Because there is no > "Stopped" monitor, the cluster doesn't immediately realize that a new > rogue instance is running on another node. So, the cluster thinks the VM > crashed on the original node, and recovers it by starting it again. Ken, do you mean that if a periodic "stopped" monitor is configured, it is forced to run immediately (out of schedule) when the regular periodic monitor unexpectedly returns with stopped status? That is, before the cluster takes the recovery action? Conceptually, that would be similar to the probe run on node startup. If not, then maybe it would be a useful resource option to have (I mean running cluster-wide probes on an unexpected monitor failure, before recovery). An optional safety check. -- Regards, Feri
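The "Stopped" monitor Ken mentions is configured as a second monitor operation with role=Stopped; a hedged crm shell sketch (resource and path names hypothetical; note the two monitors must use different intervals):

```sh
crm configure primitive vm1 ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/vm1.xml" \
    op monitor interval=30s \
    op monitor interval=45s role=Stopped
```

The role=Stopped monitor runs periodically on the nodes where the resource is *supposed* to be stopped, so a rogue second instance is detected there instead of being mistaken for a crash on the original node.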
Re: [ClusterLabs] permissions under /etc/corosync/qnetd (was: Corosync 2.4.0 is available at corosync.org!)
Jan Friesse <jfrie...@redhat.com> writes: > Ferenc Wágner napsal(a): > >> Have you got any plans/timeline for 2.4.2 yet? > > Yep, I'm going to release it in few minutes/hours. Man, that was quick. I've got a bunch of typo fixes queued..:) Please consider announcing upcoming releases a couple of days in advance; as a packager, I'd much appreciate it. Maybe even tag release candidates... Anyway, I've got a question concerning corosync-qnetd. I run it as user and group coroqnetd. Is granting it read access to cert8.db and key3.db enough for proper operation? corosync-qnetd-certutil gives write access to group coroqnetd to everything, which seems unintuitive to me. Please note that I've got zero experience with NSS. But I don't expect the daemon to change the certificate database. Should I? -- Thanks, Feri ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Special care needed when upgrading Pacemaker Remote nodes
Ken Gaillot writes: > This spurred me to complete a long-planned overhaul of Pacemaker > Explained's "Upgrading" appendix: > > http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/_upgrading.html > > Feedback is welcome. Since you asked for it..:) 1. Table D.1.: why does a rolling upgrade imply any service outage? Always? 2. Detach method: why use rsc_defaults instead of maintenance mode? 3. When do you think 1.1.16 will be released? With approximately how much ABI incompatibility in the libraries? -- Thanks, Feri
Re: [ClusterLabs] Corosync 2.4.0 is available at corosync.org!
Jan Friesse writes: > Please note that because of required changes in votequorum, > libvotequorum is no longer binary compatible. This is the reason for > the version bump. Er, what version bump? Corosync 2.4.1 still produces libvotequorum.so.7.0.0 for me, just like Corosync 2.3.6. -- Thanks, Feri
Re: [ClusterLabs] Doing reload right
Ken Gaillot writes: > Does anyone know of an RA that uses reload correctly? My resource agents advertise a no-op reload action for handling their "private" meta attributes. Meta in the sense that they are used by the resource agent when performing certain operations, not by the managed resource itself. Which means they are trivially changeable online, without any resource operation whatsoever. > Does anyone object to the (backward-incompatible) solution proposed > here? I'm all for cleanups, but please keep an online migration path around. -- Feri
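Under the pre-overhaul semantics being discussed, an RA opts into reload by advertising the action in its metadata; a hedged XML sketch (parameter name hypothetical). Changes to parameters declared unique="0" then trigger a reload instead of a full stop/start:

```xml
<!-- Excerpt of hypothetical RA metadata -->
<parameters>
  <!-- unique="0": changing this online results in a reload, not a restart -->
  <parameter name="alert_email" unique="0" required="0">
    <content type="string"/>
  </parameter>
</parameters>
<actions>
  <!-- Advertising reload is what enables the behavior above -->
  <action name="reload" timeout="20"/>
</actions>
```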
Re: [ClusterLabs] DLM standalone without crm ?
"Lentes, Bernd" writes: > i don't have neither an init-script nor a systemd service file. > The only packages i find in the repositories concerning dlm are: > libdlm3-3.00.01-0.31.87 > libdlm-3.00.01-0.31.87 > And i have a kernel module for dlm. > Nothing else. Sorry, my experience is limited to DLM 4. -- Feri
Re: [ClusterLabs] DLM standalone without crm ?
"Lentes, Bernd" writes: > is it possible to have a DLM running without CRM? Yes. You'll need to configure fencing, though, since by default DLM will try to use stonithd (from Pacemaker). But DLM fencing didn't handle fencing failures correctly for me, resulting in more nodes being fenced until quorum was lost. -- Feri
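Disabling dlm_controld's own fencing (as mentioned in the cLVM thread above) is a one-line change; a hedged sketch of /etc/dlm/dlm.conf for DLM 4:

```
# Leave fencing entirely to something else (e.g. Pacemaker, or manual
# intervention); dlm_controld will wait for fencing to be confirmed
# externally instead of initiating it itself.
enable_fencing=0
```

With this, the lockspace simply stays frozen after a node failure until the failed node is dealt with, which is what the larger-cluster setup described earlier in this digest relies on.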
[ClusterLabs] restarting pacemakerd
Hi, Could somebody please elaborate a little why the pacemaker systemd service file contains "Restart=on-failure"? I mean that a failed node gets fenced anyway, so most of the time this would be a futile effort. On the other hand, one could argue that restarting failed services should be the default behavior of systemd (or any init system). Still, it is not. I'd be grateful for some insight into the matter. -- Thanks, Feri ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Alert notes
Klaus Wenninger <kwenn...@redhat.com> writes: > On 06/16/2016 11:05 AM, Ferenc Wágner wrote: > >> Klaus Wenninger <kwenn...@redhat.com> writes: >> >>> On 06/15/2016 06:11 PM, Ferenc Wágner wrote: >>> >>>> I think the default timestamp should contain date and time zone >>>> specification to make it unambiguous. >>> >>> Idea was to have a trade-off between length and amount of information. >> >> I don't think it's worth saving a couple of bytes by dropping this >> information. In many cases there will be some way to recover it (from >> SMTP headers or system logs), but that complicates things. > > Wasn't about saving some bytes in the size of a file or so but > rather to keep readability. If the timestamp fills your screen > you won't be able to read the actual information...have a look > at /var/log/messages... > Pure intention was to have a default that creates a kind of nice-looking > output together with the file-example to give people an impression > what they could do with the feature. I see. Incidentally, the file example is probably the one which would profit most from having full timestamps. And some locking. >> In a similar vein, keeping the sequence number around would simplify >> alert ordering and loss detection on the receiver side. Especially with >> SNMP, where the transport is unreliable as well. > > Nice idea... any OID in mind? No. But you can always extend PACEMAKER-MIB. > Unfortunately the sequence-number we have right now as environment > variable is not really fit for this purpose. It counts up with each > and every alert being sent on a single node. So if you have multiple > alerts configured you would experience gaps that prevent you from > using it for loss detection. I see, it isn't per alert, unfortunately. Still better than nothing, though... >>>> (BTW I'd prefer to run the alert scripts as a different user than the >>>> various Pacemaker components, but that would lead too far now.)
>>> >>> well, something we thought about already and a point where the >>> new feature breaks the ClusterMon-Interface. >>> Unfortunately the impact is quite high - crmd has dropped privileges - >>> but if the pain-level rises high enough ... >> There's very little room to do this. You'd need to configure an alert >> user and group, and store them in the saved uid/gid set before dropping >> privileges for the crmd process. Or use a separate daemon for sending >> alerts, which feels cleaner. > Yes 2nd daemon was the idea. We don't want to give more rights > to crmd than it needs. Btw. the daemon is there already: lrmd ;-) It's running as root already, so at least no problem changing to any user. And the default could be hacluster. >> You are right. The snmptrap tool does the string->binary conversion if >> it gets the correct format. Otherwise, if the length matches, it does a >> plain cast to binary, interpreting for example 12:34:56.78 as >> 12594-58-51,52:58:53.54,.55:56. Looks like the sample SNMP alert agent >> shouldn't let the users choose any timestamp-format but >> %Y-%m-%d,%H:%M:%S.%1N,%:z; unfortunately there's no way to enforce this >> in the current design. > Well, generic vs. failsafe ;-) > Of course one could introduce something like the metadata in RAs > to achieve things like that but we wanted to keep the ball flat... > After all the scripts are just examples...and the timestamp-format > that should work is given in the header of the script... More emphasis would help, I think. >> Maybe it would be more appropriate to get the timestamp from crmd as >> a high resolution (fractional) epoch all the time, and do the string >> conversion in the agents as necessary. One could still control the >> format via instance_attributes where allowed. Or keep around the >> current mechanism as well to reduce code duplication in the agents. >> Just some ideas... > epoch was actually my first default ... > additional epoch might be interesting alternative...
It would be useful. Actually, crm_time_format_hr() currently fails for any format string ending with any %-escape but N. For example, "%Yx" is formatted as "2016x", but "%Y" returns NULL. You can avoid fixing this by providing a fractional epoch instead. :) -- Regards, Feri
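Pinning the timestamp format per alert, as argued above, is done with a meta attribute; a hedged CIB sketch (ids, path, and recipient value hypothetical) using the one strict format snmptrap accepts:

```xml
<alerts>
  <alert id="snmp_alert" path="/path/to/snmp_alert.sh">
    <meta_attributes id="snmp_alert-meta">
      <!-- the one format snmptrap converts properly into an OCTETSTR date -->
      <nvpair id="snmp_alert-ts" name="timestamp-format"
              value="%Y-%m-%d,%H:%M:%S.%1N,%:z"/>
    </meta_attributes>
    <recipient id="snmp_alert-recipient" value="192.0.2.1"/>
  </alert>
</alerts>
```

Nothing stops an administrator from configuring a different timestamp-format here, which is exactly the enforcement gap discussed in the thread.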
Re: [ClusterLabs] Alert notes
Klaus Wenninger <kwenn...@redhat.com> writes: > On 06/15/2016 06:11 PM, Ferenc Wágner wrote: > >> Please find some random notes about my adventures testing the new alert >> system. >> >> The first alert example in the documentation has no recipient: >> >> >> >> In the example above, the cluster will call my-script.sh for each >> event. >> >> while the next section starts as: >> >> Each alert may be configured with one or more recipients. The cluster >> will call the agent separately for each recipient. > The goal of the first example is to be as simple as possible. > But of course it makes sense to mention that it is not compulsory > to add a recipient. And I guess it makes sense to point that out > as it is just ugly to think that you have to fake a recipient while > it wouldn't make any sense in your context. I agree. >> I think the default timestamp should contain date and time zone >> specification to make it unambiguous. > Idea was to have a trade-off between length and amount of information. I don't think it's worth saving a couple of bytes by dropping this information. In many cases there will be some way to recover it (from SMTP headers or system logs), but that complicates things. In a similar vein, keeping the sequence number around would simplify alert ordering and loss detection on the receiver side. Especially with SNMP, where the transport is unreliable as well. >> (BTW I'd prefer to run the alert scripts as a different user than the >> various Pacemaker components, but that would lead too far now.) > well, something we thought about already and a point where the > new feature breaks the ClusterMon-Interface. > Unfortunately the impact is quite high - crmd has dropped privileges - > but if the pain-level rises high enough ... There's very little room to do this. You'd need to configure an alert user and group, and store them in the saved uid/gid set before dropping privileges for the crmd process.
Or use a separate daemon for sending alerts, which feels cleaner. >> The SNMP agent seems to have a problem with hrSystemDate, which should >> be an OCTETSTR with strict format, not some plain textual timestamp. >> But I haven't really looked into this yet. > Actually I had tried it with the snmptrap-tool coming with rhel-7.2 > and it worked with the string given in the example. > Did you copy it 1-1? There is a typo in the document having the > double-quotes doubled. The format is strict and there are actually > 2 formats allowed - one with timezone and one without. The > format string given should match the latter. You are right. The snmptrap tool does the string->binary conversion if it gets the correct format. Otherwise, if the length matches, it does a plain cast to binary, interpreting for example 12:34:56.78 as 12594-58-51,52:58:53.54,.55:56. Looks like the sample SNMP alert agent shouldn't let the users choose any timestamp-format but %Y-%m-%d,%H:%M:%S.%1N,%:z; unfortunately there's no way to enforce this in the current design. Maybe it would be more appropriate to get the timestamp from crmd as a high resolution (fractional) epoch all the time, and do the string conversion in the agents as necessary. One could still control the format via instance_attributes where allowed. Or keep around the current mechanism as well to reduce code duplication in the agents. Just some ideas... -- Regards, Feri
Re: [ClusterLabs] Master-Slaver resource Restarted after configuration change
Ilia Sokolinski writes: > We have a custom Master-Slave resource running on a 3-node pcs cluster on > CentOS 7.1 > > As part of what is supposed to be an NDU we do update some properties of the > resource. > For some reason this causes both Master and Slave instances of the resource > to be restarted. > > Since restart takes a fairly long time for us, the update becomes very much > disruptive. > > Is this expected? Yes, if you changed a parameter declared with unique="1" in your resource agent metadata. > We have not seen this behavior with the previous release of pacemaker. I'm surprised... -- Feri
Re: [ClusterLabs] Minimum configuration for dynamically adding a node to a cluster
Nikhil Utane writes: > Would like to know the best and easiest way to add a new node to an already > running cluster. > > Our limitation: > 1) pcsd cannot be used since (as per my understanding) it communicates over > ssh which is prevented. > 2) No manual editing of corosync.conf If you use IPv4 multicast for Corosync 2 communication, then you needn't have a nodelist in corosync.conf. However, if you want a quorum provider, then expected_votes must be set correctly, otherwise a small partition booting up could mistakenly assume it has quorum. In a live system all corosync daemons will recognize new nodes and increase their "live" expected_votes accordingly. But they won't write this back to the config file, leading to lack of information on reboot if they can't learn better from their peers. > So what I am thinking is, the first node will add nodelist with nodeid: 1 > into its corosync.conf file. > > nodelist { > node { > ring0_addr: node1 > nodeid: 1 > } > } > > The second node to be added will get this information through some other > means and add itself with nodeid: 2 into its corosync file. > Now the question I have is, does node1 also need to be updated with > information about node 2? It'd better, at least to exclude any possibility of clashing nodeids. > When i tested it locally, the cluster was up even without node1 having > node2 in its corosync.conf. Node2's corosync had both. If node1 doesn't > need to be told about node2, is there a way where we don't configure the > nodes but let them discover each other through the multicast IP (best > option). If you use IPv4 multicast and don't specify otherwise, the node IDs are assigned according to the ring0 addresses (IPv4 addresses are 32 bit integers after all). But you still have to update expected_votes. > Assuming we should add it to keep the files in sync, what's the best way to > add the node information (either itself or other) preferably through some > CLI command?
There's no corosync tool to update the config file. An Augeas lens is provided for corosync.conf though, which should help with the task (I myself have never tried it). Then corosync-cfgtool -R makes all daemons in the cluster reload their config files. -- Feri
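The nodelist-free multicast setup described above can be sketched as a corosync.conf fragment (addresses and cluster name are example values, not from the thread):

```
totem {
    version: 2
    cluster_name: mycluster
    transport: udp                # IPv4 multicast; nodes discover each other
    interface {
        ringnumber: 0
        bindnetaddr: 192.0.2.0    # example network
        mcastaddr: 239.255.1.1    # example multicast group
        mcastport: 5405
    }
}
quorum {
    provider: corosync_votequorum
    # Must match the real cluster size: without a nodelist, a small
    # partition booting alone would otherwise assume it has quorum.
    expected_votes: 3
}
```

After editing this on all nodes, "corosync-cfgtool -R" triggers a cluster-wide reload as noted above.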
Re: [ClusterLabs] how to "switch on" cLVM ?
"Lentes, Bernd" <bernd.len...@helmholtz-muenchen.de> writes: > - On Jun 7, 2016, at 3:53 PM, Ferenc Wágner wf...@niif.hu wrote: >> "Lentes, Bernd" <bernd.len...@helmholtz-muenchen.de> writes: >>> Ok. Does DLM take care that a LV just can be used on one host ? >> No. Even plain LVM uses locks to serialize access to its metadata >> (avoid concurrent writes corrupting it). These locks are provided by >> the host kernel (locking_type=1). DLM extends the locking concept to a >> full cluster from a single host, which is exactly what cLVM needs. This >> is activated by locking_type=3. > So DLM and cLVM just take care that the metadata is consistent. > Neither of them controls any access to the LV itself ? cLVM controls activation as well (besides metadata consistency), but does not control access to activated LVs, which are cluster-unaware device-mapper devices, just like under plain LVM. >>> cLVM just takes care that the naming is the same on all nodes, right? >> More than that. As above, it keeps the LVM metadata consistent amongst >> the members of the cluster. It can also activate LVs on all members >> ("global" activation), or ensure that an LV is active on a single member >> only ("exclusive" activation). >>>>> Later on it's possible that some vm's run on host 1 and some on host 2. >>>>> Does >>>>> clvm need to be a ressource managed by the cluster manager? >> The clvm daemon can be handled as a cloned cluster resource, but it >> isn't necessary. It requires corosync (or some other membership/ >> communication layer) and DLM to work. DLM can be configured to do its >> own fencing or to use that of Pacemaker (if present). >>>>> If i use a fs inside the lv, a "normal" fs like ext3 is sufficient, i >>>>> think. But >>>>> it has to be a cluster ressource, right ? >> If your filesystem is a plain cluster resource, then your resource >> manager will ensure that it isn't mounted on more than one node, and >> everything should be all right.
>> >> Same with VMs on LVs: assuming no LV is used by two VMs (which would >> bring back the previous problem on another level) and your VMs are >> non-clone cluster resources, your resource manager will ensure that each >> LV is used by a single VM only (on whichever host), and everything >> should be all right, even though your LVs are active on all hosts (which >> makes live migration possible, if your resource agent supports that). > Does the LV need to be a ressource (if i don't have a FS) ? No. (If you use cLVM. If you don't use cLVM, then your VGs must be resources, otherwise nothing guarantees the consistency of their metadata.) > From what i understand from what you say the LV's are active on all > hosts, and the ressource manager controls that a VM is just running on > one host, so the LV is just used by one host. Right ? So it has not to > be a ressource. Right. (The LVs must be active on all hosts to enable free live migration. There might be other solutions, because the LVs receive I/O on one host only at any given time, but then you have to persuade your hypervisor that the block device it wants will really be available once migration is complete.) -- Feri
Re: [ClusterLabs] how to "switch on" cLVM ?
"Lentes, Bernd" writes: > Ok. Does DLM take care that a LV just can be used on one host ? No. Even plain LVM uses locks to serialize access to its metadata (avoid concurrent writes corrupting it). These locks are provided by the host kernel (locking_type=1). DLM extends the locking concept to a full cluster from a single host, which is exactly what cLVM needs. This is activated by locking_type=3. > cLVM just takes care that the naming is the same on all nodes, right? More than that. As above, it keeps the LVM metadata consistent amongst the members of the cluster. It can also activate LVs on all members ("global" activation), or ensure that an LV is active on a single member only ("exclusive" activation). >>> Later on it's possible that some vm's run on host 1 and some on host 2. Does >>> clvm need to be a ressource managed by the cluster manager? The clvm daemon can be handled as a cloned cluster resource, but it isn't necessary. It requires corosync (or some other membership/ communication layer) and DLM to work. DLM can be configured to do its own fencing or to use that of Pacemaker (if present). >>> If i use a fs inside the lv, a "normal" fs like ext3 is sufficient, i >>> think. But >>> it has to be a cluster ressource, right ? If your filesystem is a plain cluster resource, then your resource manager will ensure that it isn't mounted on more than one node, and everything should be all right. Same with VMs on LVs: assuming no LV is used by two VMs (which would bring back the previous problem on another level) and your VMs are non-clone cluster resources, your resource manager will ensure that each LV is used by a single VM only (on whichever host), and everything should be all right, even though your LVs are active on all hosts (which makes live migration possible, if your resource agent supports that).
-- Feri
Re: [ClusterLabs] Can't get nfs4 to work.
"Stephano-Shachter, Dylan" writes: > I can not figure out why version 4 is not supported. Have you got fsid=root (or fsid=0) on your root export? See man exports. -- Feri
Re: [ClusterLabs] Trouble with deb packaging from 1.12 to 1.15
Andrey Rogovsky writes: > I have deb rules, comes from 1.12 and try apply it to current release. 1.1.14 is available in sid, stretch and jessie-backports, any reason you can't use those packages? > In the building I get an error: > dh_testroot -a > rm -rf `pwd`/debian/tmp/usr/lib/service_crm.so > rm -rf `pwd`/debian/tmp/usr/lib/service_crm.la > rm -rf `pwd`/debian/tmp/usr/lib/service_crm.a > dh_install --sourcedir=debian/tmp --list-missing > dh_install: pacemaker missing files (usr/lib*/heartbeat/attrd), aborting This doesn't seem to come from any recent packaging of Pacemaker. > I was check buildroot - this directory and symlinks is missing > Is this correct? May be I need add they manual? It's expected, unless you configure such a lib(exec)dir explicitly. -- Feri
Re: [ClusterLabs] dlm_controld 4.0.4 exits when crmd is fencing another node
David Teigland writes: > On Tue, Apr 26, 2016 at 09:57:06PM +0200, Valentin Vidic wrote: > >> The bug is caused by the missing braces in the expanded if >> statement. >> >> Do you think we can get a new version out with this patch as the >> fencing in 4.0.4 does not work properly due to this issue? > > Thanks for seeing that, I'll fix it right away. I uploaded the new release to Debian. Sorry for the breakage. -- Feri
[ClusterLabs] operation parallelism
Hi, Are recurring monitor operations constrained by the batch-limit cluster option? I ask because I'd like to limit the number of parallel start and stop operations (because they are resource hungry and potentially take long) without starving other operations, especially monitors. -- Thanks, Feri
Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts
Ken Gaillot writes: > Each alert may have any number of recipients configured. These values > will simply be passed to the script as arguments. The first recipient > will also be passed as the CRM_alert_recipient environment variable, > for compatibility with existing scripts that only support one > recipient. > [...] > In the current implementation, meta-attributes and instance attributes > may also be specified within the <recipient> block, in which case they > override any values specified in the <alert> block when sent to that > recipient. Sorry, I don't get this. The first paragraph above tells me that for a given cluster event each <alert> is run once, with all recipients passed as command line arguments to the alert executable. But a single invocation can only have a single set of environmental variables, so how can you override instance attributes for individual recipients? > Whether this stays in the final 1.1.15 release or not depends on > whether people find this to be useful, or confusing. Now guess..:) -- Feri
Re: [ClusterLabs] Antw: Re: Utilization zones
"Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de> writes: > Ferenc Wágner <wf...@niif.hu> wrote on 19.04.2016 at 13:42: > >> "Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de> writes: >> >>> Ferenc Wágner <wf...@niif.hu> wrote on 18.04.2016 at 17:07: >>> >>>> I'm using the "balanced" placement strategy with good success. It >>>> distributes our VM resources according to memory size perfectly. >>>> However, I'd like to take the NUMA topology into account. That means >>>> each host should have several capacity pools (of each capacity type) to >>>> arrange the resources in. Can Pacemaker do something like this? >>> >>> I think you can, but depending on VM technology, the hypervisor may >>> not care much about NUMA. More details? >> >> The NUMA technology would be handled by the resource agent, if it was >> told by Pacemaker which utilization zone to use on its host. I just >> need the policy engine to do more granular resource placement and >> communicate the selected zone to the resource agents on the hosts. >> >> I'm pretty sure there's no direct support for this, but there might be >> different approaches I missed. Thus I'm looking for ideas here. > > My initial idea was this: Define a memory resource for every NUMA pool > on each host, then assign your resources to NUMA pools (utilization): > The resources will pick some host, but when one pool is full, your > resources cannot go to another pool. Is something like this what you > wanted? Yes, and you also see correctly why this solution is unsatisfactory: I don't want to tie my resources to a fraction of the host capacities (like for example the first NUMA nodes of the hosts). If nothing better comes up, I'll probably interleave all my VM memory and forget about the NUMA topology until I find the time to implement a new placement strategy. That would be an unfortunate pessimization, though.
-- Thanks, Feri
Re: [ClusterLabs] Utilization zones
"Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de> writes: > Ferenc Wágner <wf...@niif.hu> wrote on 18.04.2016 at 17:07: > >> I'm using the "balanced" placement strategy with good success. It >> distributes our VM resources according to memory size perfectly. >> However, I'd like to take the NUMA topology into account. That means >> each host should have several capacity pools (of each capacity type) to >> arrange the resources in. Can Pacemaker do something like this? > > I think you can, but depending on VM technology, the hypervisor may > not care much about NUMA. More details? The NUMA technology would be handled by the resource agent, if it was told by Pacemaker which utilization zone to use on its host. I just need the policy engine to do more granular resource placement and communicate the selected zone to the resource agents on the hosts. I'm pretty sure there's no direct support for this, but there might be different approaches I missed. Thus I'm looking for ideas here. -- Thanks, Feri
[ClusterLabs] Utilization zones
Hi, I'm using the "balanced" placement strategy with good success. It distributes our VM resources according to memory size perfectly. However, I'd like to take the NUMA topology into account. That means each host should have several capacity pools (of each capacity type) to arrange the resources in. Can Pacemaker do something like this? -- Thanks, Feri
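The per-pool placement asked about above can be mocked up in a few lines to make the idea concrete. This is a toy illustration only (plain Python, invented numbers, a simple greedy strategy), not how Pacemaker's policy engine actually works:

```python
# Toy sketch of per-NUMA-pool capacity accounting: each host exposes
# several memory pools instead of one, and each VM must fit entirely
# into a single pool.

def place(vms, pools):
    """vms: {name: mem}; pools: {(host, pool): capacity}.
    Returns {vm: (host, pool)}, or raises if some VM does not fit."""
    free = dict(pools)
    placement = {}
    # Greedy: biggest VMs first, each into the currently emptiest pool
    # that can still hold it.
    for vm, mem in sorted(vms.items(), key=lambda kv: -kv[1]):
        candidates = [p for p, cap in free.items() if cap >= mem]
        if not candidates:
            raise ValueError(f"no pool can hold {vm} ({mem} MB)")
        target = max(candidates, key=lambda p: free[p])
        free[target] -= mem
        placement[vm] = target
    return placement

# Two hosts, two 64 GB NUMA nodes each (all numbers made up).
pools = {("host1", 0): 65536, ("host1", 1): 65536,
         ("host2", 0): 65536, ("host2", 1): 65536}
vms = {"vm-a": 40960, "vm-b": 40960, "vm-c": 40960, "vm-d": 16384}
print(place(vms, pools))
```

With one pool per NUMA node, a VM can never be granted more memory than a single node provides, which is exactly the granularity the "balanced" strategy lacks when each host advertises one big capacity value.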
[ClusterLabs] crmd error: Cannot route message to unknown node
Hi, On a freshly rebooted cluster node (after crm_mon reports it as 'online'), I get the following: wferi@vhbl08:~$ sudo crm_resource -r vm-cedar --cleanup Cleaning up vm-cedar on vhbl03, removing fail-count-vm-cedar Cleaning up vm-cedar on vhbl04, removing fail-count-vm-cedar Cleaning up vm-cedar on vhbl05, removing fail-count-vm-cedar Cleaning up vm-cedar on vhbl06, removing fail-count-vm-cedar Cleaning up vm-cedar on vhbl07, removing fail-count-vm-cedar Cleaning up vm-cedar on vhbl08, removing fail-count-vm-cedar Waiting for 6 replies from the CRMd..No messages received in 60 seconds.. aborting Meanwhile, this is written into syslog (I can also provide info level logs if necessary): 22:03:02 vhbl08 crmd[8990]: error: Cannot route message to unknown node vhbl03 22:03:02 vhbl08 crmd[8990]: error: Cannot route message to unknown node vhbl04 22:03:02 vhbl08 crmd[8990]: error: Cannot route message to unknown node vhbl06 22:03:02 vhbl08 crmd[8990]: error: Cannot route message to unknown node vhbl07 22:03:04 vhbl08 crmd[8990]: notice: Operation vm-cedar_monitor_0: not running (node=vhbl08, call=626, rc=7, cib-update=169, confirmed=true) For background: wferi@vhbl08:~$ sudo cibadmin --scope=nodes -Q Why does this happen? I've got no node names in corosync.conf, but Pacemaker defaults to uname -n all right. -- Thanks, Feri
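For what it's worth, node names can be pinned down explicitly in corosync.conf (corosync 2.x nodelist syntax) so that Pacemaker does not have to fall back on uname -n. A sketch with made-up addresses and IDs:

```
# corosync.conf (excerpt)
nodelist {
    node {
        ring0_addr: 192.0.2.8   # placeholder address
        name: vhbl08            # must match the name Pacemaker knows the node by
        nodeid: 8
    }
    # ... one node { } block per cluster member
}
```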
Re: [ClusterLabs] Antw: Re: spread out resources
"Ulrich Windl" writes: > Actually from my SLES11 SP[1-4] experience, the cluster always > distributes resources across all available nodes, and only if I don't > want that, I'll have to add constraints. I wonder why that does not > seem to work for you. Because I'd like to spread small subsets of the resources (one such subset is A, B, C and D) independently. -- Feri
Re: [ClusterLabs] spread out resources
Ken Gaillot <kgail...@redhat.com> writes: > On 03/30/2016 08:37 PM, Ferenc Wágner wrote: > >> I've got a couple of resources (A, B, C, D, ... more than cluster nodes) >> that I want to spread out to different nodes as much as possible. They >> are all the same, there's no distinguished one amongst them. I tried >> >> [XML colocation constraint lost in archiving] >> >> But crm_simulate did not finish with the above in the CIB. >> What's a good way to get this working? > > Per the docs, "A colocated set with sequential=false makes sense only if > there is another set in the constraint. Otherwise, the constraint has no > effect." Using sequential=false would allow another set to depend on all > these resources, without them depending on each other. That was the very idea behind the above colocation constraint: it contains the same group twice. Yeah, it's somewhat contrived, but I had no other idea with any chance of success. And this one failed as well. > I haven't actually tried resource sets with negative scores, so I'm not > sure what happens there. With sequential=true, I'd guess that each > resource would avoid the resource listed before it, but not necessarily > any of the others. Probably, but that isn't what I'm after. > By default, pacemaker does spread things out as evenly as possible, so I > don't think anything special is needed. Yes, but only on the scale of all resources. And I've also got a hundred independent ones, which wash out this global spreading effect if you consider only a select handful. > If you want more control over the assignment, you can look into > placement strategies: We use balanced placement to account for the different memory requirements of the various resources globally. It would be possible to introduce a new, artificial utilization "dimension" for each resource group we want to spread independently, but this doesn't sound very compelling.
For sets of two resources, a simple negative colocation constraint works very well; it'd be a pity if it wasn't possible to extend this concept to larger sets. -- Thanks, Feri
Re: [ClusterLabs] dlm reason for leaving the cluster changes when stopping gfs2-utils service
(Please post only to the list, or at least keep it amongst the Cc-s.) Momcilo Medic <fedorau...@fedoraproject.org> writes: > On Wed, Mar 23, 2016 at 1:56 PM, Ferenc Wágner <wf...@niif.hu> wrote: >> Momcilo Medic <fedorau...@fedoraproject.org> writes: >> >>> I have three hosts setup in my test environment. >>> They each have two connections to the SAN which has GFS2 on it. >>> >>> Everything works like a charm, except when I reboot a host. >>> Once it tries to stop gfs2-utils service it will just hang. >> >> Are you sure the OS reboot sequence does not stop the network or >> corosync before GFS and DLM? > > I specifically configured services to start in this order: > Corosync - DLM - GFS2-utils > and to shutdown in this order: > GFS2-utils - DLM - Corosync. > > I've acomplish this with: > update-rc.d -f corosync remove > update-rc.d -f corosync-notifyd remove > update-rc.d -f dlm remove > update-rc.d -f gfs2-utils remove > update-rc.d -f xendomains remove > update-rc.d corosync start 25 2 3 4 5 . stop 35 0 1 6 . > update-rc.d corosync-notifyd start 25 2 3 4 5 . stop 35 0 1 6 . > update-rc.d dlm start 30 2 3 4 5 . stop 30 0 1 6 . > update-rc.d gfs2-utils start 35 2 3 4 5 . stop 25 0 1 6 . > update-rc.d xendomains start 40 2 3 4 5 . stop 20 0 1 6 . I don't know your OS, the above may or may not work. > Also, the moment I was capturing logs, corosync and dlm were not > running as services, but in foreground debugging mode. > SSH connection did not break until I powered down the host so network > is not stopped either. At least you've got interactive debugging ability then. So try to find out why the Corosync membership broke down. The output of corosync-quorumtool and corosync-cpgtool might help. Also try pinging the Corosync ring0 addresses between the nodes. 
-- Feri
Re: [ClusterLabs] dlm reason for leaving the cluster changes when stopping gfs2-utils service
Momcilo Medic writes: > I have three hosts setup in my test environment. > They each have two connections to the SAN which has GFS2 on it. > > Everything works like a charm, except when I reboot a host. > Once it tries to stop gfs2-utils service it will just hang. Are you sure the OS reboot sequence does not stop the network or corosync before GFS and DLM? -- Feri
Re: [ClusterLabs] "No such device" with fence_pve agent
Ken Gaillot writes: > There is a fence parameter pcmk_host_check that specifies how pacemaker > determines which fence devices can fence which nodes. The default is > dynamic-list, which means to run the fence agent's list command to get > the nodes. [...] > > You can specify pcmk_host_list or pcmk_host_map to use a static target > list for the device. I meant to research this, but now that you brought it up: does the default of pcmk_host_check automatically change to static-list if pcmk_host_list is defined? Does pcmk_host_map override pcmk_host_list? Does it play together with pcmk_host_check=dynamic-list? -- Thanks, Feri
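For comparison, this is roughly what a static target list looks like in the CIB (a sketch only: the resource id and all values are invented, and the fence_pve device-specific parameters are omitted; the pcmk_host_* nvpairs are the point):

```
<primitive id="fence-pve1" class="stonith" type="fence_pve">
  <instance_attributes id="fence-pve1-params">
    <nvpair id="fence-pve1-hosts" name="pcmk_host_list" value="node1 node2"/>
    <nvpair id="fence-pve1-check" name="pcmk_host_check" value="static-list"/>
  </instance_attributes>
</primitive>
```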
Re: [ClusterLabs] Pacemaker startup-fencing
Andrei Borzenkov <arvidj...@gmail.com> writes: > On Wed, Mar 16, 2016 at 2:22 PM, Ferenc Wágner <wf...@niif.hu> wrote: > >> Pacemaker explained says about this cluster option: >> >> Advanced Use Only: Should the cluster shoot unseen nodes? Not using >> the default is very unsafe! >> >> 1. What are those "unseen" nodes? > > Nodes that lost communication with other nodes (think of unplugging cables) Translating to node status, does it mean UNCLEAN (offline) nodes which suddenly return? Can Pacemaker tell these apart from abruptly power cycled nodes (when reboot happens before the comeback)? I guess if a node was successfully fenced at the time, it won't be considered UNCLEAN, but is that the only way to avoid that? >> And a possibly related question: >> >> 2. If I've got UNCLEAN (offline) nodes, is there a way to clean them up, >> so that they don't get fenced when I switch them on? I mean without >> removing the node altogether, to keep its capacity settings for >> example. > > You can declare node as down using "crm node clearstate". You should > not really do it unless you ascertained that node is actually > physically down. Great. Is there an equivalent in bare bones Pacemaker, that is, not involving the CRM shell? Like deleting some status or LRMD history element of the node, for example? >> And some more about fencing: >> >> 3. What's the difference in cluster behavior between >> - stonith-enabled=FALSE (9.3.2: how often will the stop operation be >> retried?) >> - having no configured STONITH devices (resources won't be started, >> right?) >> - failing to STONITH with some error (on every node) >> - timing out the STONITH operation >> - manual fencing > > I do not think there is much difference. Without fencing pacemaker > cannot make decision to relocate resources so cluster will be stuck. Then I wonder why I hear the "must have working fencing if you value your data" mantra so often (and always without explanation).
After all, it does not risk the data, only the automatic cluster recovery, right? >> 4. What's the modern way to do manual fencing? (stonith_admin >> --confirm + what? > > node name. :) I worded that question really poorly. I meant to ask what kind of cluster (STONITH) configuration makes the cluster sit patiently until I do the manual fencing, then carry on without timeouts or other errors. Just as if some automatic fencing agent did the job, but letting me investigate the node status beforehand. -- Thanks, Feri
[ClusterLabs] Pacemaker startup-fencing
Hi, Pacemaker explained says about this cluster option: Advanced Use Only: Should the cluster shoot unseen nodes? Not using the default is very unsafe! 1. What are those "unseen" nodes? And a possibly related question: 2. If I've got UNCLEAN (offline) nodes, is there a way to clean them up, so that they don't get fenced when I switch them on? I mean without removing the node altogether, to keep its capacity settings for example. And some more about fencing: 3. What's the difference in cluster behavior between - stonith-enabled=FALSE (9.3.2: how often will the stop operation be retried?) - having no configured STONITH devices (resources won't be started, right?) - failing to STONITH with some error (on every node) - timing out the STONITH operation - manual fencing 4. What's the modern way to do manual fencing? (stonith_admin --confirm + what? I ask because meatware.so comes from cluster-glue and uses the old API). -- Thanks, Feri
Re: [ClusterLabs] Pacemaker startup-fencing
Andrei Borzenkov <arvidj...@gmail.com> writes: > On Wed, Mar 16, 2016 at 4:18 PM, Lars Ellenberg <lars.ellenb...@linbit.com> > wrote: > >> On Wed, Mar 16, 2016 at 01:47:52PM +0100, Ferenc Wágner wrote: >> >>>>> And some more about fencing: >>>>> >>>>> 3. What's the difference in cluster behavior between >>>>> - stonith-enabled=FALSE (9.3.2: how often will the stop operation be >>>>> retried?) >>>>> - having no configured STONITH devices (resources won't be started, >>>>> right?) >>>>> - failing to STONITH with some error (on every node) >>>>> - timing out the STONITH operation >>>>> - manual fencing >>>> >>>> I do not think there is much difference. Without fencing pacemaker >>>> cannot make decision to relocate resources so cluster will be stuck. >>> >>> Then I wonder why I hear the "must have working fencing if you value >>> your data" mantra so often (and always without explanation). After all, >>> it does not risk the data, only the automatic cluster recovery, right? >> >> stonith-enabled=false >> means: >> if some node becomes unresponsive, >> it is immediately *assumed* it was "clean" dead. >> no fencing takes place, >> resource takeover happens without further protection. > > Oh! Actually it is not quite clear from documentation; documentation > does not explain what happens in case of stonith-enabled=false at all. Yes, this is a crucially important piece of information, which should be prominently announced in the documentation. Thanks for spelling it out, Lars. Hope you don't mind that I turned your text into https://github.com/ClusterLabs/pacemaker/pull/960. -- Feri
[ClusterLabs] GFS and cLVM fencing requirements with DLM
Hi, I'm referring here to an ancient LKML thread introducing DLM. In http://article.gmane.org/gmane.linux.kernel/299788 David Teigland states: GFS requires that a failed node be fenced prior to gfs being told to begin recovery for that node which sounds very plausible as according to that thread DLM itself does not make sure fencing happens before DLM recovery, thus DLM locks could be granted to others before the failed node is fenced (if at all). Now more than ten years passed and I wonder 1. if the above is still true (or maybe my interpretation was wrong to start with) 2. how it is arranged for in the GFS2 code (I failed to find it with naive search phrases) 3. whether clvmd does the same 4. what are the pros/cons of disabling DLM fencing (even the dlm_stonith proxy) and leaving fencing fully to the resource manager (Pacemaker) -- Thanks, Feri
Re: [ClusterLabs] Regular pengine warnings after a transient failure
Ken Gaillot <kgail...@redhat.com> writes: > On 03/07/2016 02:03 PM, Ferenc Wágner wrote: > >> The transition-keys match, does this mean that the above is a late >> result from the monitor operation which was considered timed-out >> previously? How did it reach vhbl07, if the DC at that time was vhbl03? >> >>> The pe-input files from the transitions around here should help. >> >> They are available. What shall I look for? > > It's not the most user-friendly of tools, but crm_simulate can show how > the cluster would react to each transition: crm_simulate -Sx $FILE.bz2 $ /usr/sbin/crm_simulate -Sx pe-input-430.bz2 -D recover_many.dot [...] $ dot recover_many.dot -Tpng >recover_many.png dot: graph is too large for cairo-renderer bitmaps. Scaling by 0.573572 to fit The result is a 32767x254 bitmap of green ellipses connected by arrows. Most arrows are impossible to follow, but the picture seems to agree with the textual output from crm_simulate: * 30 FAILED resources on vhbl05 are to be recovered * 32 Stopped resources are to be started (these are actually running, but considered Stopped as a consequence of the crmd restart on vhbl03) On the other hand, simulation based on pe-input-431.bz2 reports * only 2 FAILED resources to recover on vhbl05 * 36 resources to start (the 4 new are the ones whose recoveries started during the previous -- aborted -- transition) I failed to extract anything more out of these simulations than what was already known from the logs. But I'm happy to see that the cluster probes the disappeared resources on vhbl03 (where they disappeared with the crmd restart) even though it plans to start some of them on other nodes. -- Regards, Feri
Re: [ClusterLabs] Regular pengine warnings after a transient failure
Andrew Beekhof <abeek...@redhat.com> writes: > On Tue, Mar 8, 2016 at 7:03 AM, Ferenc Wágner <wf...@niif.hu> wrote: > >> Ken Gaillot <kgail...@redhat.com> writes: >> >>> On 03/07/2016 07:31 AM, Ferenc Wágner wrote: >>> >>>> 12:55:13 vhbl07 crmd[8484]: notice: Transition aborted by >>>> vm-eiffel_monitor_6 'create' on vhbl05: Foreign event >>>> (magic=0:0;521:0:0:634eef05-39c1-4093-94d4-8d624b423bb7, cib=0.613.98, >>>> source=process_graph_event:600, 0) >>> >>> That means the action was initiated by a different node (the previous DC >>> presumably), > > I suspect s/previous/other/ Is there a way to find out for sure? > With a stuck machine its entirely possible that the other nodes elected a > new leader. > Would I be right in guessing that fencing is disabled? No, fencing is enabled. However, Corosync did not experience any problem. I guess it's locked in memory and doesn't need storage anymore. (I elided the rest of your answer, thanks for those parts, too; I think those are settled now.) -- Thanks, Feri