[ClusterLabs] PCMK_ipc_buffer recommendation

2019-01-17 Thread Ferenc Wágner
Hi,

Looking at lib/common/ipc.c, Pacemaker recommends setting
PCMK_ipc_buffer to 4 times the *uncompressed* size of the biggest
message seen:

error: Could not compress the message (2309508 bytes) into less than the 
configured ipc limit (131072 bytes). Set PCMK_ipc_buffer to a higher value 
(9238032 bytes suggested)

Before setting it, I'd like to ask for confirmation: is a 10 MB buffer
really reasonable and recommended in the above case?  I wonder what
effect it will have on total memory consumption.  Growing by 10 MB would
be OK; growing by 10 MB * some biggish number wouldn't.
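
For reference, a minimal sketch of how such a setting is usually applied
(assuming the Debian-style /etc/default/pacemaker environment file; on
RHEL-style systems it would be /etc/sysconfig/pacemaker):

# /etc/default/pacemaker -- set on every node, then restart the cluster
# stack on each node in turn
PCMK_ipc_buffer=9238032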
-- 
Thanks,
Feri


Re: [ClusterLabs] live migration rarely fails seemingly without reason

2018-12-03 Thread Ferenc Wágner
"Lentes, Bernd"  writes:

> 2018-12-03T16:03:02.836145+01:00 ha-idg-2 libvirtd[3117]: 2018-12-03 
> 15:03:02.835+: 4515: error : qemuMigrationCheckJobStatus:1456 : operation 
> failed: migration job: unexpectedly failed

The above message is a hint at the real problem.  It comes from
libvirtd, so you should investigate there.  I'd check the libvirtd logs
for further clues.
-- 
Regards,
Feri


Re: [ClusterLabs] Any CLVM/DLM users around?

2018-10-01 Thread Ferenc Wágner
Patrick Whitney  writes:

> I have a two node (test) cluster running corosync/pacemaker with DLM
> and CLVM.
>
> I was running into an issue where when one node failed, the remaining node
> would appear to do the right thing, from the pcmk perspective, that is.
> It would  create a new cluster (of one) and fence the other node, but
> then, rather surprisingly, DLM would see the other node offline, and it
> would go offline itself, abandoning the lockspace.
>
> I changed my DLM settings to "enable_fencing=0", disabling DLM fencing, and
> our tests are now working as expected.

I'm running a larger Pacemaker cluster with standalone DLM + cLVM (that
is, they are started by systemd, not by Pacemaker).  I've seen weird DLM
fencing behavior, but not what you describe above (though I ran with
more than two nodes from the very start).  Actually, I don't even
understand how it occurred to you to disable DLM fencing to fix that...

> I'm a little concern I have masked an issue by doing this, as in all
> of the tutorials and docs I've read, there is no mention of having to
> configure DLM whatsoever.

Unfortunately it's very hard to come by any reliable info about DLM.  I
had a couple of enlightening exchanges with David Teigland (its primary
author) on this list, he is very helpful indeed, but I'm still very far
from having a working understanding of it.

But I've been running with --enable_fencing=0 for years without issues,
leaving all fencing to Pacemaker.  Note that manual cLVM operations are
the only users of DLM here, so delayed fencing does not cause any
problems; the cluster services do not depend on DLM being operational (I
mean it can stay frozen for several days -- as happened in a couple
of pathological cases).  GFS2 would be a very different thing, I guess.
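
A minimal sketch of the corresponding configuration, assuming dlm_controld
reads /etc/dlm/dlm.conf (the command-line form --enable_fencing=0 used above
is equivalent):

# /etc/dlm/dlm.conf -- leave all fencing to Pacemaker
enable_fencing=0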
-- 
Regards,
Feri


Re: [ClusterLabs] Corosync 3 release plans?

2018-09-27 Thread Ferenc Wágner
Christine Caulfield  writes:

> I'm also looking into high-res timestamps for logfiles too.

Wouldn't that be a useful option for the syslog output as well?  I'm
sometimes concerned by the batching effect added by the transport
between the application and the (local) log server (rsyslog or systemd).
Reliably merging messages from different channels can prove impossible
without internal timestamps (even considering a single machine only).

Another interesting feature could be structured, direct journal output
(if you're looking for challenges).
-- 
Regards,
Feri


Re: [ClusterLabs] Corosync 3 release plans?

2018-09-27 Thread Ferenc Wágner
Ken Gaillot  writes:

> libqb would simply provide the API for reopening the log, and clients
> such as pacemaker would intercept the signal and call the API.

Just for posterity: you needn't restrict yourself to signals.  Logrotate
has nothing to do with signals.  Signals are a rather limited form of
IPC, which might be a good tool for some applications.  Both Pacemaker
and Corosync already employ much richer IPC mechanisms, which might be
more natural to extend for triggering log rotation than adding a new IPC
mechanism.  Logrotate optionally runs scripts before and after renaming
the log files; these can invoke kill, corosync-cmapctl, cibadmin and so
on all the same.  It's entirely your call.
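
A minimal logrotate sketch of the pre/post hooks meant above; the log path
is an assumption, and the postrotate command is only a placeholder since the
reopen trigger under discussion does not exist yet:

/var/log/cluster/corosync.log {
    weekly
    rotate 8
    compress
    delaycompress
    missingok
    notifempty
    postrotate
        # placeholder: whatever reopen trigger gets implemented
        # (a signal, a cmap key poked via corosync-cmapctl, ...)
        /bin/true
    endscript
}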
-- 
Regards,
Feri


Re: [ClusterLabs] Antw: Salvaging aborted resource migration

2018-09-27 Thread Ferenc Wágner
Ken Gaillot  writes:

> On Thu, 2018-09-27 at 09:36 +0200, Ulrich Windl wrote:
> 
>> Obviously you violated the most important cluster rule that is "be
>> patient".  Maybe the next important is "Don't change the
>> configuration while the cluster is not in IDLE state" ;-)
>
> Agreed -- although even idle, removing a ban can result in a migration
> back (if something like stickiness doesn't prevent it).

I've got no problem with that in general.  However, I can't guarantee
that every configuration change happens in idle state: certain
operations (mostly resource additions) are done by several
administrators without synchronization, and of course asynchronous
cluster events can happen at any time.  So I have to ask: what are the
consequences of breaking this "impossible" rule?

> There's currently no way to tell pacemaker that an operation (i.e.
> migrate_from) is a no-op and can be ignored. If a migration is only
> partially completed, it has to be considered a failure and reverted.

OK.  Are there other complex operations which can "partially complete"
if a transition is aborted by some event?

Now let's suppose a pull migration scenario: migrate_to does nothing,
but in this tiny window a configuration change aborts the transition.
The resources would go through a full recovery (stop+start), right?
Now let's suppose migrate_from gets scheduled and starts performing the
migration.  Before it finishes, a configuration change aborts the
transition.  The cluster waits for the outstanding operation to finish,
doesn't it?  And if it finishes successfully, is the migration
considered complete requiring no recovery?

> I'm not sure why the reload was scheduled; I suspect it's a bug due to
> a restart being needed but no parameters having changed. There should
> be special handling for a partial migration to make the stop required.

Probably CLBZ#5309 again...  You debugged a pe-input file for me with a
similar issue almost exactly a year ago (thread subject "Pacemaker
resource parameter reload confusion").  Time to upgrade this cluster, I
guess.
-- 
Thanks,
Feri


Re: [ClusterLabs] Corosync 3 release plans?

2018-09-27 Thread Ferenc Wágner
Christine Caulfield  writes:

> TBH I would be quite happy to leave this to logrotate but the message I
> was getting here is that we need additional help from libqb. I'm willing
> to go with a consensus on this though

Yes, to do a proper job logrotate has to have a way to get the log files
reopened.  And applications can't do that without support from libqb, if
I understood Honza right.
-- 
Regards,
Feri


Re: [ClusterLabs] Corosync 3 release plans?

2018-09-27 Thread Ferenc Wágner
Christine Caulfield  writes:

> I'm looking into new features for libqb and the option in
> https://github.com/ClusterLabs/libqb/issues/142#issuecomment-76206425
> looks like a good option to me.

It feels backwards to me: traditionally, increasing numbers signify
older rotated logs, while this proposal does the opposite.  And what
happens on application restart?  Do you overwrite from 0?  Do you ever
jump back to 0?  It also leaves the problem of cleaning up old log files
unsolved...

> Though adding an API call to re-open the log file could be done too -
> I'm not averse to having both,

Not adding log rotation policy (and implementation!) to each application
is a win in my opinion, and it also unifies local administration.  Syslog
is pretty good in this regard; my only gripe with it is that its time
stamps can't be quite as precise as the ones from the (realtime)
application (even nowadays, under systemd).  And that it can block the
log stream... on the other hand, disk latencies can block log writes
just as well.
-- 
Regards,
Feri


[ClusterLabs] Salvaging aborted resource migration

2018-09-27 Thread Ferenc Wágner
Hi,

The current behavior of cancelled migration with Pacemaker 1.1.16 with a
resource implementing push migration:

# /usr/sbin/crm_resource --ban -r vm-conv-4

vhbl03 crmd[10017]:   notice: State transition S_IDLE -> S_POLICY_ENGINE
vhbl03 pengine[10016]:   notice: Migrate vm-conv-4#011(Started vhbl07 -> vhbl04)
vhbl03 crmd[10017]:   notice: Initiating migrate_to operation 
vm-conv-4_migrate_to_0 on vhbl07
vhbl03 pengine[10016]:   notice: Calculated transition 4633, saving inputs in 
/var/lib/pacemaker/pengine/pe-input-1069.bz2
[...]

At this point, with the migration still ongoing, I wanted to get rid of
the constraint:

# /usr/sbin/crm_resource --clear -r vm-conv-4

vhbl03 crmd[10017]:   notice: Transition aborted by deletion of 
rsc_location[@id='cli-ban-vm-conv-4-on-vhbl07']: Configuration change
vhbl07 crmd[10233]:   notice: Result of migrate_to operation for vm-conv-4 on 
vhbl07: 0 (ok)
vhbl03 crmd[10017]:   notice: Transition 4633 (Complete=6, Pending=0, Fired=0, 
Skipped=1, Incomplete=6, Source=/var/lib/pacemaker/pengine/pe-input-1069.bz2): 
Stopped
vhbl03 pengine[10016]:   notice: Resource vm-conv-4 can no longer migrate to 
vhbl04. Stopping on vhbl07 too
vhbl03 pengine[10016]:   notice: Reload  vm-conv-4#011(Started vhbl07)
vhbl03 pengine[10016]:   notice: Calculated transition 4634, saving inputs in 
/var/lib/pacemaker/pengine/pe-input-1070.bz2
vhbl03 crmd[10017]:   notice: Initiating stop operation vm-conv-4_stop_0 on 
vhbl07
vhbl03 crmd[10017]:   notice: Initiating stop operation vm-conv-4_stop_0 on 
vhbl04
vhbl03 crmd[10017]:   notice: Initiating reload operation vm-conv-4_reload_0 on 
vhbl04

This recovery was entirely unnecessary, as the resource successfully
migrated to vhbl04 (the migrate_from operation does nothing).  Pacemaker
does not know this, but is there a way to educate it?  I think in this
special case it is possible to redesign the agent making migrate_to a
no-op and doing everything in migrate_from, which would significantly
reduce the window between the start points of the two "halves", but I'm
not sure that would help in the end: Pacemaker could still decide to do
an unnecessary stop+start recovery.  Would it?  I failed to find any
documentation on recovery from aborted migration transitions.  I don't
expect on-fail (for migrate_* ops, not me) to apply here, does it?

Side question: why initiate a reload in any case, like above?

Even more side question: could you please consider using space instead
of TAB in syslog messages?  (Actually, I wouldn't mind getting rid of
them altogether in any output.)
-- 
Thanks,
Feri


Re: [ClusterLabs] Corosync 3 release plans?

2018-09-26 Thread Ferenc Wágner
Jan Friesse  writes:

> wagner.fer...@kifu.gov.hu writes:
>
>> triggered by your favourite IPC mechanism (SIGHUP and SIGUSRx are common
>> choices, but logging.* cmap keys probably fit Corosync better).  That
>> would enable proper log rotation.
>
> What is the reason that you find "copytruncate" as non-proper log
> rotation? I know there is a risk to loose some lines, but it should be
> pretty small.

Yes, there's a chance of losing some messages.  It may be acceptable in
some cases, but it's never desirable.  The copy operation also wastes
I/O bandwidth.  Reopening the log files on some external trigger is a
better solution on all accounts and also an industry standard.

> Anyway, this again one of the feature where support from libqb would
> be nice to have (there is actually issue opened
> https://github.com/ClusterLabs/libqb/issues/239).

That's a convoluted one for a simple reopen!  But yes, if libqb does not
expose such functionality, you can't do much about it.  I'll stay with
syslog for now. :)  In cluster environments centralised log management is
a must anyway, and that's annoying to achieve with direct file logs.

>> Jan Friesse  writes:
>>
>>> No matter how much I still believe totemsrp as a library would be
>>> super nice to have - but current state is far away from what I would
>>> call library (= something small, without non-related things like
>>> transports/ip/..., testable (ideally with unit tests testing corner
>>> cases)) and making one fat binary looks like a better way.
>>>
>>> I'll made a patch and send PR (it should be easy).
>>
>> Sounds sensible.  Somebody can still split it out later if needed.
>
> Yep (and PR send + merged already :) )

Great!  Did you mean to keep the totem.h, totemip.h, totempg.h and
totemstats.h header files installed nevertheless?  And totem_pg.pc could
go as well, I guess.
-- 
Regards,
Feri


Re: [ClusterLabs] Corosync 3 release plans?

2018-09-24 Thread Ferenc Wágner
Jan Friesse  writes:

> Default example config should be definitively ported to newer style of
> nodelist without interface section. example.udpu can probably be
> deleted as well as example.xml (whole idea of having XML was because
> of cluster config tools like pcs, but these tools never used
> corosync.xml).

Kind of strange, because the inherently hierarchical Corosync
configuration admits a very natural XML representation.

> I was also thinking about allowing timestamp by default, because log
> without timestamp is useless.

I'd even recommend adding high-resolution timestamps, but only for the
direct file log, not for syslog (by default).  And log file reopening
triggered by your favourite IPC mechanism (SIGHUP and SIGUSRx are common
choices, but logging.* cmap keys probably fit Corosync better).  That
would enable proper log rotation.

>> Finally, something totally unrelated: the libtotem_pg shared object
>> isn't standalone anymore, it has several undefined symbols (icmap_get_*,
>> stats_knet_add_member, etc) which are defined in the corosync binary.
>
> This must be fixed.

Or rather eliminated, if I read correctly below.

>> Why is it still a separate object then?
>
> Honestly, I don't have too much strong reasons. We've talked with
> Chrissie about it last year, and actually only reason I was able to
> find out was to have a code/component separation so in theory other
> project can use totem (what was original idea, but it never happened
> and I don't think it will ever happen). Anyway, conclusion was to
> remove the totem as a shared library and keep it as a static library
> only, but nobody actually implemented that yet.

That doesn't buy you anything if you use it in a single binary only.

> No matter how much I still believe totemsrp as a library would be
> super nice to have - but current state is far away from what I would
> call library (= something small, without non-related things like
> transports/ip/..., testable (ideally with unit tests testing corner
> cases)) and making one fat binary looks like a better way.
>
> I'll made a patch and send PR (it should be easy).

Sounds sensible.  Somebody can still split it out later if needed.

> Thank you for the testing and reporting problems!

My pleasure, speaking about the latter.  I haven't got to do any
significant testing yet, unfortunately.
-- 
Regards,
Feri


Re: [ClusterLabs] Corosync 3 release plans?

2018-09-24 Thread Ferenc Wágner
Jan Friesse  writes:

> Have you had a time to play with packaging current alpha to find out
> if there are no issues? I had no problems with Fedora, but Debian has
> a lot of patches, and I would be really grateful if we could reduce
> them a lot - so please let me know if there is patch which you've sent
> PR for and it's not merged yet.

Hi Honza,

Sorry for the delay.  You've already merged my PR for two simple typos,
thanks!  Beyond that, there really isn't much in our patch queue
anymore.  As far as I can see, current master even has a patch for error
propagation in notifyd, which will let us drop one more!  And we arrive
at the example configs.  We prefer syslog for several reasons
(copytruncate rotation isn't pretty, and syslog decouples us from possible
I/O stalls) and
we haven't got the /var/log/cluster legacy.  But more importantly, the
knet default transport requires a nodelist instead of interfaces, unlike
mcast udp.  The "ring" terminology might need a change as well,
especially ring0_addr.  So I'd welcome an overhaul of the (now knet)
example config, but I'm not personally qualified for doing that. :)
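
A rough sketch of the kind of knet-era example meant here (addresses, names
and the two_node setting are made up for illustration):

totem {
    version: 2
    cluster_name: example
    transport: knet
    crypto_cipher: aes256
    crypto_hash: sha256
}

nodelist {
    node {
        ring0_addr: 192.168.1.1
        name: node1
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.1.2
        name: node2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_syslog: yes
}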

Finally, something totally unrelated: the libtotem_pg shared object
isn't standalone anymore, it has several undefined symbols (icmap_get_*,
stats_knet_add_member, etc) which are defined in the corosync binary.
Why is it still a separate object then?
-- 
Thanks,
Feri


Re: [ClusterLabs] Corosync 3 release plans?

2018-08-27 Thread Ferenc Wágner
Jan Friesse  writes:

> Currently I'm pretty happy with current Corosync alpha stability so it
> would be possible to release final right now, but because I want to
> give us some room to break protocol/abi (only if needed and right now
> I don't see any strong reason for such breakage), I didn't release it
> yet.

Great!

> Currently I'm planning to release 3.0.0 in the beginning of December
> but if it would mean to miss Debian freeze date I'm open to release it
> sooner.

No need, we should be fine with this; the target date of the
transition freeze is 2019-Jan-12.
-- 
Thanks,
Feri


[ClusterLabs] Corosync 3 release plans? (was: Redundant ring not recovering after node is back)

2018-08-26 Thread Ferenc Wágner
Jan Friesse  writes:

> try corosync 3.x (current Alpha4 is pretty stable [...]

Hi Honza,

Can you provide an estimate for the Corosync 3 release timeline?  We
have to plan the ABI transition in Debian and the freeze date is drawing
closer.
-- 
Thanks,
Feri


Re: [ClusterLabs] Redundant ring not recovering after node is back

2018-08-25 Thread Ferenc Wágner
wf...@niif.hu (Ferenc Wágner) writes:

> David Tolosa  writes:
>
>> I tried to install corosync 3.x and it works pretty well.
>> But when I install pacemaker, it installs previous version of corosync as
>> dependency and breaks all the setup.
>> Any suggestions?
>
> Install the equivs package to create a dummy corosync package
> representing your local corosync build.
> https://manpages.debian.org/stretch/equivs/equivs-build.1.en.html

Forget it, libcfg changed ABI, so you'll have to recompile Pacemaker
after all.
-- 
Regards,
Feri


Re: [ClusterLabs] Redundant ring not recovering after node is back

2018-08-25 Thread Ferenc Wágner
David Tolosa  writes:

> I tried to install corosync 3.x and it works pretty well.
> But when I install pacemaker, it installs previous version of corosync as
> dependency and breaks all the setup.
> Any suggestions?

Install the equivs package to create a dummy corosync package
representing your local corosync build.
https://manpages.debian.org/stretch/equivs/equivs-build.1.en.html
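
A sketch of the equivs usage meant here (the version string is made up;
match it to your local corosync build):

$ equivs-control corosync-dummy
$ editor corosync-dummy        # end up with roughly:
Section: admin
Priority: optional
Standards-Version: 3.9.2
Package: corosync
Version: 2.99.0-1~local1
Description: dummy package standing in for a locally built corosync
 Satisfies pacemaker's corosync dependency without pulling in the
 distribution package.
$ equivs-build corosync-dummy
$ dpkg -i corosync_2.99.0-1~local1_all.deb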
-- 
Regards,
Feri


Re: [ClusterLabs] Antw: Re: Spurious node loss in corosync cluster

2018-08-22 Thread Ferenc Wágner
Jan Friesse  writes:

> Is that system VM or physical machine? Because " Corosync main process
> was not scheduled for..." is usually happening on VMs where hosts are
> highly overloaded.

Or when physical hosts use BMC watchdogs.  But Prasad didn't encounter
such logs in the setup at hand, as far as I understand.
-- 
Regards,
Feri


Re: [ClusterLabs] DLM recovery stuck (digression: Corosync watchdog experience)

2018-08-10 Thread Ferenc Wágner
FeldHost™ Admin  writes:

> rule of thumb is use separate dedicated network for corosync traffic.
> For ex. we use two corosync rings, first and active one on separate
> network card and switch, second passive one on team (bond) device vlan.

Hi,

That's fine in principle, but this is a bladecenter setting: we can't
really use separate network cards, as it's a single chassis at the end of
the day.  Besides, we've not encountered Corosync glitches.  The
Corosync virtual network is shared with the DLM traffic only and has 200
Mb/s bandwidth dedicated to it in the interface (BIOS) setup.

Failure story for amusement: the blades expose a BMC watchdog device to
the OS, which was picked up by Corosync.  It seemed like a useful second
line of defense in case fencing (BMC IPMI power) failed for any reason;
I let it live and forgot about it.  Months later, after a firmware
upgrade the BMC had to be restarted, and the watchdog device ioctl
blocked Corosync for a minute or so.  Of course membership fell apart.
Actually, across the full cluster, because the BMC restarts were
performed back-to-back (I authorized a single restart only, but anyway).
I leave the rest to your imagination.  Fencing (STONITH) worked (with
delays) until quorum dissolved entirely... after a couple of minutes, it
was over.  We spent the rest of the day picking up the pieces, then the
next few trying to reproduce the perceived Corosync network outage
during BMC reboots without the cluster stack running.  Of course in
total vain.  Half a year later an independent investigation of sporadic
small Corosync delays revealed the watchdog connection, then we disabled
the feature.  Don't use (poorly implemented) BMC watchdogs.
-- 
Feri


Re: [ClusterLabs] DLM recovery stuck

2018-08-09 Thread Ferenc Wágner
David Teigland  writes:

> On Thu, Aug 09, 2018 at 06:11:48PM +0200, Ferenc Wágner wrote:
> 
>> Almost ten years ago you requested more info in a similar case, let's
>> see if we can get further now!
>
> Hi, the usual cause is that a network message from the dlm has been
> lost/dropped/missed.  The dlm can't recover from that, which is clearly a
> weak point in the design.  There may be some new development coming along
> to finally improve that.

Hi David,

Good to hear!  Can you share any more info about this development?

> One way you can confirm this is to check if the dlm on one or more nodes
> is waiting for a message that's not arriving.  Often you'll see an entry
> in the dlm "waiters" debugfs file corresponding to a response that's being
> waited on.

If you mean dlm/clvmd_waiters, it's empty on all nodes.  Is there
anything else to check?

> Another red flag is kernel messages from a driver indicating some network
> hickup at the time things hung.  I can't say if these messages you sent
> happened at the right time, or if they even correspond to the dlm
> interface, but it's worth checking as a possible explanation:
>
> [  137.207059] be2net :05:00.0 enp5s0f0: Link is Up
> [  137.252901] be2net :05:00.1 enp5s0f1: Link is Up

Hard to say...  This is an iSCSI offload card with two physical ports,
each of which is virtualized in the card into four logical ports; three
of each set are passed to the OS as separate PCI functions, while the
remaining two (one per physical port) are used for iSCSI traffic.  The
DLM traffic goes through a Linux bond made
of enp5s0f4 and enp5s0f5, which is started at 112.393798 and used for
Corosync traffic first.  The above two lines are signs of OpenVSwitch
starting up for independent purposes.  It should be totally independent,
but it's the same device after all, so I can't exclude all possibility
of "crosstalk".

> [  153.886619]  connection2:0: detected conn error (1011)

See above: iSCSI traffic is offloaded, not visible on the OS level, and
these connection failures are expected at the moment because some of the
targets are inaccessible.  *But* it uses the same wire in the end, just
different VLANs, and the virtualization (in the card itself) may not
provide absolutely perfect separation.
-- 
Thanks,
Feri


Re: [ClusterLabs] DLM recovery stuck

2018-08-09 Thread Ferenc Wágner
wf...@niif.hu (Ferenc Wágner) writes:

> For a start I attached the dump output from another node.

I meant to...

146 dlm_controld 4.0.5 started
146 our_nodeid 167773708
146 found /dev/misc/dlm-control minor 58
146 found /dev/misc/dlm-monitor minor 57
146 found /dev/misc/dlm_plock minor 56
146 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
146 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
146 set recover_callbacks 1
146 cmap totem.cluster_name = 'vhbl'
146 set cluster_name vhbl
146 /dev/misc/dlm-monitor fd 10
146 cluster quorum 1 seq 3648 nodes 5
146 cluster node 167773705 added seq 3648
146 set_configfs_node 167773705 10.0.6.9 local 0
146 cluster node 167773707 added seq 3648
146 set_configfs_node 167773707 10.0.6.11 local 0
146 cluster node 167773708 added seq 3648
146 set_configfs_node 167773708 10.0.6.12 local 1
146 cluster node 167773709 added seq 3648
146 set_configfs_node 167773709 10.0.6.13 local 0
146 cluster node 167773710 added seq 3648
146 set_configfs_node 167773710 10.0.6.14 local 0
146 cpg_join dlm:controld ...
146 setup_cpg_daemon 12
146 dlm:controld conf 5 1 0 memb 167773705 167773707 167773708 167773709 
167773710 join 167773708 left
146 daemon joined 167773705
146 daemon joined 167773707
146 daemon joined 167773708
146 daemon joined 167773709
146 daemon joined 167773710
146 dlm:controld ring 167773705:3648 5 memb 167773705 167773707 167773708 
167773709 167773710
146 receive_protocol 167773705 max 3.1.1.0 run 3.1.1.1
146 daemon node 167773705 prot max 0.0.0.0 run 0.0.0.0
146 daemon node 167773705 save max 3.1.1.0 run 3.1.1.1
146 run protocol from nodeid 167773705
146 daemon run 3.1.1 max 3.1.1 kernel run 1.1.1 max 1.1.1
146 plocks 13
146 receive_fence_clear from 167773705 for 167773708 result 0 flags 6
146 fence_in_progress_unknown 0 recv
146 receive_protocol 167773707 max 3.1.1.0 run 3.1.1.1
146 daemon node 167773707 prot max 0.0.0.0 run 0.0.0.0
146 daemon node 167773707 save max 3.1.1.0 run 3.1.1.1
146 receive_protocol 167773708 max 3.1.1.0 run 0.0.0.0
146 daemon node 167773708 prot max 0.0.0.0 run 0.0.0.0
146 daemon node 167773708 save max 3.1.1.0 run 0.0.0.0
146 receive_protocol 167773708 max 3.1.1.0 run 3.1.1.0
146 daemon node 167773708 prot max 3.1.1.0 run 0.0.0.0
146 daemon node 167773708 save max 3.1.1.0 run 3.1.1.0
146 receive_protocol 167773709 max 3.1.1.0 run 3.1.1.1
146 daemon node 167773709 prot max 0.0.0.0 run 0.0.0.0
146 daemon node 167773709 save max 3.1.1.0 run 3.1.1.1
146 receive_protocol 167773710 max 3.1.1.0 run 3.1.1.1
146 daemon node 167773710 prot max 0.0.0.0 run 0.0.0.0
146 daemon node 167773710 save max 3.1.1.0 run 3.1.1.1
147 uevent: add@/kernel/dlm/clvmd
147 kernel: add@ clvmd
147 uevent: online@/kernel/dlm/clvmd
147 kernel: online@ clvmd
147 clvmd cpg_join dlm:ls:clvmd ...
147 dlm:ls:clvmd conf 5 1 0 memb 167773705 167773707 167773708 167773709 
167773710 join 167773708 left
147 clvmd add_change cg 1 joined nodeid 167773708
147 clvmd add_change cg 1 we joined
147 clvmd add_change cg 1 counts member 5 joined 1 remove 0 failed 0
147 clvmd check_ringid cluster 3648 cpg 0:0
147 dlm:ls:clvmd ring 167773705:3648 5 memb 167773705 167773707 167773708 
167773709 167773710
147 clvmd check_ringid done cluster 3648 cpg 167773705:3648
147 clvmd check_fencing disabled
147 clvmd send_start 167773708:1 counts 0 5 1 0 0
147 clvmd wait_messages cg 1 need 5 of 5
147 clvmd receive_start 167773708:1 len 92
147 clvmd match_change 167773708:1 matches cg 1
147 clvmd wait_messages cg 1 need 4 of 5
147 clvmd receive_start 167773709:12 len 92
147 clvmd match_change 167773709:12 matches cg 1
147 clvmd wait_messages cg 1 need 3 of 5
147 clvmd receive_start 167773710:14 len 92
147 clvmd match_change 167773710:14 matches cg 1
147 clvmd wait_messages cg 1 need 2 of 5
147 clvmd receive_start 167773705:4 len 92
147 clvmd match_change 167773705:4 matches cg 1
147 clvmd wait_messages cg 1 need 1 of 5
147 clvmd receive_start 167773707:8 len 92
147 clvmd match_change 167773707:8 matches cg 1
147 clvmd wait_messages cg 1 got all 5
147 clvmd start_kernel cg 1 member_count 5
147 write "1090842362" to "/sys/kernel/dlm/clvmd/id"
147 set_members mkdir 
"/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/167773705"
147 set_members mkdir 
"/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/167773707"
147 set_members mkdir 
"/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/167773708"
147 set_members mkdir 
"/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/167773709"
147 set_members mkdir 
"/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/167773710"
147 write "1" to "/sys/kernel/dlm/clvmd/control"
147 write "0" to "/sys/kernel/dlm/clvmd/event_done"
147 clvmd prepare_plocks
147 clvmd set_plock_data_node from 0 to 167773705
147 clvmd save_plocks start
147 clvmd receive_plocks_done 167773705:4 flags 2 plocks_data 0 need 1 save 1
147 clvmd match_change 167773705:4 matc

[ClusterLabs] DLM recovery stuck

2018-08-09 Thread Ferenc Wágner
Hi David,

Almost ten years ago you requested more info in a similar case, let's
see if we can get further now!

We're running a 6-node Corosync cluster.  DLM is started by systemd:

● dlm.service - dlm control daemon
   Loaded: loaded (/lib/systemd/system/dlm.service; enabled)
   Active: active (running) since Thu 2018-08-09 17:13:18 CEST; 33min ago
 Docs: man:dlm_controld
   man:dlm.conf
   man:dlm_stonith
  Process: 3690 ExecStartPre=/sbin/modprobe dlm (code=exited, status=0/SUCCESS)
 Main PID: 3692 (dlm_controld)
   CGroup: /system.slice/dlm.service
   └─3692 /usr/sbin/dlm_controld --foreground -D --enable_fencing=0

All other nodes have cLVM volumes activated, but activation is stuck on
this node (in the last step of a rolling cluster reboot):

[  136.729172] dlm: Using TCP for communications
[  136.743935] dlm: clvmd: joining the lockspace group...
[  136.749419] dlm: clvmd: dlm_recover 1
[  136.749485] dlm: clvmd: add member 167773710
[  136.749493] dlm: clvmd: add member 167773709
[  136.749497] dlm: clvmd: add member 167773708
[  136.749499] dlm: clvmd: add member 167773707
[  136.749504] dlm: clvmd: add member 167773706
[  136.749506] dlm: clvmd: add member 167773705
[  136.749519] dlm: connecting to 167773709
[  136.752848] dlm: connecting to 167773708
[  136.752889] dlm: connecting to 167773707
[  136.752918] dlm: connecting to 167773706
[  136.752943] dlm: connecting to 167773705
[  136.768589] dlm: clvmd: dlm_recover_members 6 nodes
[  136.941888] dlm: clvmd: group event done 0 0
[  136.960496] dlm: clvmd: join complete
[  137.019929] device enp5s0f1 entered promiscuous mode
[  137.036637] device enp5s0f0 entered promiscuous mode
[  137.054869] device vhbond entered promiscuous mode
[  137.207059] be2net :05:00.0 enp5s0f0: Link is Up
[  137.252901] be2net :05:00.1 enp5s0f1: Link is Up
[  138.009742] device vlan39 entered promiscuous mode
[  138.151755] device vlan894 entered promiscuous mode
[  153.861395] scsi host1: BM_2032 : Event CXN_KILLED_RST_RCVD[10] received on 
CID : 9
[  153.886619]  connection2:0: detected conn error (1011)
[  364.687306] INFO: task clvmd:5242 blocked for more than 120 seconds.
[  364.708222]   Not tainted 4.9.0-0.bpo.6-amd64 #1 Debian 
4.9.88-1+deb9u1~bpo8+1
[  364.733131] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  364.758896] clvmd   D0  5242  1 0x
[  364.776934]  98eff5ecfc00  98fff8543040 
98efe9744140
[  364.801322]  98518ec0 a8a64ddc7cb0 af20e973 
aedde780
[  364.825720]  bd9dfe3d7034cec9 41fef240 fa4841fef240 
98efe9744140
[  364.850103] Call Trace:
[  364.858140]  [] ? __schedule+0x243/0x6f0
[  364.876181]  [] ? alloc_pages_vma+0xb0/0x240
[  364.895364]  [] ? schedule+0x32/0x80
[  364.912261]  [] ? rwsem_down_read_failed+0x10a/0x160
[  364.933732]  [] ? call_rwsem_down_read_failed+0x14/0x30
[  364.956059]  [] ? down_read+0x1c/0x30
[  364.973259]  [] ? dlm_user_request+0x47/0x200 [dlm]
[  364.994443]  [] ? cache_alloc_refill+0x20f/0x2b0
[  365.014773]  [] ? kmem_cache_alloc_trace+0xc2/0x200
[  365.035962]  [] ? device_write+0x5b6/0x7a0 [dlm]
[  365.056290]  [] ? vfs_write+0xb3/0x1a0
[  365.073754]  [] ? SyS_write+0x52/0xc0
[  365.090937]  [] ? do_syscall_64+0x91/0x1a0
[  365.109559]  [] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6

Here's the output of "dlm_tool dump" on the stuck node:

131 dlm_controld 4.0.5 started
131 our_nodeid 167773710
131 found /dev/misc/dlm-control minor 58
131 found /dev/misc/dlm-monitor minor 57
131 found /dev/misc/dlm_plock minor 56
131 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
131 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
131 set recover_callbacks 1
131 cmap totem.cluster_name = 'vhbl'
131 set cluster_name vhbl
131 /dev/misc/dlm-monitor fd 10
131 cluster quorum 0 seq 3680 nodes 1
131 cluster node 167773710 added seq 3680
131 set_configfs_node 167773710 10.0.6.14 local 1
131 cpg_join dlm:controld ...
131 setup_cpg_daemon 12
135 dlm:controld ring 167773705:3724 6 memb 167773705 167773706 167773707 
167773708 167773709 167773710
135 dlm:controld ring 167773705:3724 6 memb 167773705 167773706 167773707 
167773708 167773709 167773710
135 dlm:controld conf 6 1 0 memb 167773705 167773706 167773707 167773708 
167773709 167773710 join 167773710 left
135 daemon joined 167773705
135 daemon joined 167773706
135 daemon joined 167773707
135 daemon joined 167773708
135 daemon joined 167773709
135 daemon joined 167773710
135 receive_protocol 167773707 max 3.1.1.0 run 3.1.1.1
135 daemon node 167773707 prot max 0.0.0.0 run 0.0.0.0
135 daemon node 167773707 save max 3.1.1.0 run 3.1.1.1
135 run protocol from nodeid 167773707
135 daemon run 3.1.1 max 3.1.1 kernel run 1.1.1 max 1.1.1
135 plocks 13
135 cluster quorum 1 seq 3724 nodes 6
135 cluster node 167773705 added seq 3724
135 set_configfs_node 167773705 10.0.6.9 local 0
135 cluster node 167773706 added 

Re: [ClusterLabs] [questionnaire] Do you manage your pacemaker configuration by hand and (if so) what reusability features do you use?

2018-06-07 Thread Ferenc Wágner
Jan Pokorný  writes:

> 1.  [X] Do you edit CIB by hand (as opposed to relying on crm/pcs or
> their UI counterparts)?

For debugging one has to understand the CIB anyway, so why learn
additional syntaxes? :) Most of our configuration changes are scripted
via a home-grown domain-specific CLI.  Using crmsh or pcs under the hood
instead of cibadmin and crm_resource would bring additional dependencies
and require additional knowledge (of these tools).
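
A minimal sketch of the kind of scripted change meant here (resource and
parameter names are hypothetical):

# change a single resource parameter without hand-editing XML
crm_resource --resource vm-example --set-parameter memory --parameter-value 4096

# or regenerate and replace a whole CIB section
cibadmin --query --scope resources > resources.xml
# ... patch resources.xml programmatically ...
cibadmin --replace --scope resources --xml-file resources.xml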

> 2.  [X] Do you use "template" based syntactic simplification[1] in CIB?

This allows changing templated resource parameters at a single place.
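
A sketch of what this looks like in the CIB (ids, agent and parameters are
made up for illustration):

<template id="vm-template" class="ocf" provider="heartbeat" type="VirtualDomain">
  <instance_attributes id="vm-template-params">
    <nvpair id="vm-template-hypervisor" name="hypervisor" value="qemu:///system"/>
  </instance_attributes>
</template>
<primitive id="vm-example" template="vm-template">
  <instance_attributes id="vm-example-params">
    <nvpair id="vm-example-config" name="config" value="/etc/libvirt/qemu/example.xml"/>
  </instance_attributes>
</primitive>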

> 3.  [ ] Do you use "id-ref" based syntactic simplification[2] in CIB?
>
> 3.1 [ ] When positive about 3., would you mind much if "id-refs" got
> unfold/exploded during the "cibadmin --upgrade --force"
> equivalent as a reliability/safety precaution?
>
> 4.  [ ] Do you use "tag" based syntactic grouping[3] in CIB?
-- 
Regards,
Feri


Re: [ClusterLabs] Corosync 2.4.4 is available at corosync.org!

2018-04-12 Thread Ferenc Wágner
Jan Pokorný  writes:

> On 12/04/18 14:33 +0200, Jan Friesse wrote:
>
>> This release contains a lot of fixes, including fix for
>> CVE-2018-1084.
>
> Security related updates would preferably provide more context

Absolutely, thanks for providing that!  Looking at the git log, I wonder
if c139255 (totemsrp: Implement sanity checks of received msgs) has
direct security relevance as well.  Should I include that too in the
Debian security update?  Debian stable has 2.4.2, so I'm cherry picking
into that version.
-- 
Thanks,
Feri


Re: [ClusterLabs] Issues found in Pacemaker 1.1.18, fixes in 1.1 branch

2017-12-12 Thread Ferenc Wágner
Ken Gaillot  writes:

> A couple of regressions have been found in the recent Pacemaker 1.1.18
> release.
>
> Fixes for these, plus one finishing an incomplete fix in 1.1.18, are in
> the master branch, and have been backported to the 1.1 branch for ease
> of patching. It is recommended that anyone compiling or packaging
> 1.1.18 include all the commits from the 1.1 branch.

Hi Ken,

Did you consider cutting a new patch-level release with these fixes?
That would make it easier to determine whether the fixes are present
when handling bug reports and questions, which is even more important
if you don't plan to make further 1.x releases.

> * 1.1.18 improved scalability by eliminating redundant node attribute
> write-outs. This proved to be too aggressive in one case: when a
> cluster is configured with only IP addresses (no node names) in
> corosync.conf, attribute write-outs can be incorrectly delayed; in the
> worst case, this prevents a node from shutting down due to the shutdown
> attribute not being written.

I guess this applies all the same for clusters defined without a
Corosync nodelist.  Right?
-- 
Thanks,
Feri



Re: [ClusterLabs] Is corosync supposed to be restarted if it fies?

2017-11-27 Thread Ferenc Wágner
Andrei Borzenkov  writes:

> 25.11.2017 10:05, Andrei Borzenkov writes:
>
>> In one of guides suggested procedure to simulate split brain was to kill
>> corosync process. It actually worked on one cluster, but on another
>> corosync process was restarted after being killed without cluster
>> noticing anything. Except after several attempts pacemaker died with
>> stopping resources ... :)
>> 
>> This is SLES12 SP2; I do not see any Restart in service definition so it
>> probably not systemd.
>> 
> FTR - it was not corosync, but pacemakker; its unit file specifies
> RestartOn=error so killing corosync caused pacemaker to fail and be
> restarted by systemd.

And starting corosync via a Requires dependency?
-- 
Feri



Re: [ClusterLabs] Pacemaker resource parameter reload confusion

2017-11-01 Thread Ferenc Wágner
Ken Gaillot  writes:

> When an operation completes, a history entry (<lrm_rsc_op>) is added to
> the pe-input file. If the agent supports reload, the entry will include
> op-force-restart and op-restart-digest fields. Now I see those are
> present in the vm-alder_last_0 entry, so agent support isn't the issue.

Thanks for the explanation.

> However, the operation is recorded as a *failed* probe (i.e. the
> resource was running where it wasn't expected). This gets recorded as a
> separate vm-alder_last_failure_0 entry, which does not get the special
> fields. It looks to me like this failure entry is forcing the restart.
> That would be a good idea if it's an actual failure; if we find a
> resource unexpectedly running, we don't know how it was started, so a
> full restart makes sense. 
>
> However, I'm guessing it may not have been a real error, but a resource
> cleanup. A cleanup clears the history so the resource is re-probed, and
> I suspect that re-probe is what got recorded here as a failure. Does
> that match what actually happened?

Well, I can't really remember; it happened two months ago...  I'm pretty
sure the resource wasn't running unexpectedly, as I'd surely recall such a
grave failure.  Interestingly, though, my shell history contains a
cleanup operation shortly after the parameter change.  Also, if you look
at the logs in my thread starting mail, you'll find

warning: Processing failed op monitor for vm-alder on vhbl05: not running (7)

which does not seem to match up with the failure in the lrm_rsc_op entry
in pe-input.  It's sort of "normal" that such a resource disappears and
gets restarted by the cluster.  If that report survived the unexpected
restart, I might have wanted to routinely clean it up afterwards.

(I'm leaving for a short holiday now, expect longer delays.)
-- 
Regards,
Feri



Re: [ClusterLabs] Pacemaker resource parameter reload confusion

2017-10-31 Thread Ferenc Wágner
Ken Gaillot  writes:

> The pe-input is indeed entirely sufficient.
>
> I forgot to check why the reload was not possible in this case. It
> turns out it is this:
>
>    trace: check_action_definition:  Resource vm-alder doesn't know
> how to reload
>
> Does the resource agent implement the "reload" action and advertise it
> in the <actions> section of its metadata?

Absolutely, I use this operation routinely.

$ /usr/sbin/crm_resource --show-metadata=ocf:niif:TransientDomain
[...]
[agent metadata XML stripped by the archive: the actions list, which
advertises the reload operation, among others]

And the implementation is just a no-op.
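
(For context, a sketch of the kind of no-op reload handler meant here, in
OCF shell style; the function name is made up:)

TransientDomain_reload() {
    # nothing to do: the reloadable parameters only matter at start time
    return $OCF_SUCCESS
}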

vm-alder is based on a template, just like all other VMs:

[CIB XML stripped by the archive: the resource template definition and the
vm-alder primitive referencing it via its template attribute]

I wonder why it wouldn't know how to reload.  How is that visible in the
pe-input file?  I'd check the other resources...
-- 
Thanks,
Feri



Re: [ClusterLabs] How to mount rpc_pipefs on a different mountpoint on RHEL/CentOS 7?

2017-10-31 Thread Ferenc Wágner
Dennis Jacobfeuerborn  writes:

> if I create a new unit file for the new file the services would not
> depend on it so it wouldn't get automatically mounted when they start.

Put the new unit file under /etc/systemd/system/x.service.requires to
have x.service require it.  I don't get the full picture, but this trick
may help puzzle it together.
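
A sketch of the trick, with a hypothetical mount point and consumer service
(systemd mount unit names must match the mount path):

# /etc/systemd/system/srv-rpc_pipefs.mount
[Unit]
Description=RPC pipe filesystem on an alternative mountpoint

[Mount]
What=sunrpc
Where=/srv/rpc_pipefs
Type=rpc_pipefs

# make the consuming service pull it in:
mkdir -p /etc/systemd/system/nfs-idmapd.service.requires
ln -s /etc/systemd/system/srv-rpc_pipefs.mount \
      /etc/systemd/system/nfs-idmapd.service.requires/
systemctl daemon-reload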
-- 
Feri



Re: [ClusterLabs] Pacemaker resource parameter reload confusion

2017-10-31 Thread Ferenc Wágner
Ken Gaillot <kgail...@redhat.com> writes:

> On Fri, 2017-10-20 at 15:52 +0200, Ferenc Wágner wrote:
>
>> Ken Gaillot <kgail...@redhat.com> writes:
>> 
>>> On Fri, 2017-09-22 at 18:30 +0200, Ferenc Wágner wrote:
>>>
>>>> Ken Gaillot <kgail...@redhat.com> writes:
>>>> 
>>>>> Hmm, stop+reload is definitely a bug. Can you attach (or email it
>>>>> to me privately, or file a bz with it attached) the above pe-input
>>>>> file with any sensitive info removed?
>>>> 
>>>> I sent you the pe-input file privately.  It indeed shows the
>>>> issue:
>>>> 
>>>> $ /usr/sbin/crm_simulate -x pe-input-1033.bz2 -RS
>>>> [...]
>>>> Executing cluster transition:
>>>>  * Resource action: vm-alder stop on vhbl05
>>>>  * Resource action: vm-alder reload on vhbl05
>>>> [...]
>>>> 
>>>> Hope you can easily get to the bottom of this.
>>> 
>>> This turned out to have the same underlying cause as CLBZ#5309. I
>>> have a fix pending review, which I expect to make it into the
>>> soon-to-be-released 1.1.18.
>> 
>> Great!
>> 
>>> It is a regression introduced in 1.1.15 by commit 2558d76f. The
>>> logic for reloads was consolidated in one place, but that happened
>>> to be before restarts were scheduled, so it no longer had the right
>>> information about whether a restart was needed. Now, it sets an
>>> ordering flag that is used later to cancel the reload if the restart
>>> becomes required. I've also added a regression test for it.
>> 
>> Restarts shouldn't even enter the picture here, so I don't get your
>> explanation.  But I also don't know the code, so that doesn't mean a
>> thing.  I'll test the next RC to be sure.
>
> :-)
>
> Reloads are done in place of restarts, when circumstances allow. So
> reloads are always related to (potential) restarts.
>
> The problem arose because not all of the relevant circumstances are
> known at the time the reload action is created. We may figure out later
> that a resource the reloading resource depends on must be restarted,
> therefore the reloading resource must be fully restarted instead of
> reloaded. E.g. a database resource might otherwise be able to reload,
> but not if the filesystem it's using is going away.
>
> Previously in those cases, we would end up scheduling both the reload
> and the restart. Now, we schedule only the restart.

Hi Ken,

1.1.18-rc3 indeed schedules a restart, not a reload like 1.1.16 did.
However, this wasn't my problem; I really expect a reload on the change
of a non-unique parameter.  The problem was that 1.1.16 also executed a
stop action in parallel with the reload.

Maybe I test it wrong: I just copied the pe-input file to another system
(which doesn't even know this resource agent) running 1.1.18-rc3 and
gave it to crm_simulate.  Does the pe-input file contain all the
information necessary to decide between restart and reload?  The
op-force-restart attribute does not contain the name of the changed
parameter, but I can't find any info on what changed at all.  Should I
see a clean reload in this test setup at all?
-- 
Thanks,
Feri



Re: [ClusterLabs] Colocation rule with vip and ms master

2017-10-31 Thread Ferenc Wágner
Norberto Lopes <nlopes...@gmail.com> writes:

> On Fri, 27 Oct 2017 at 06:41 Ferenc Wágner <wf...@niif.hu> wrote:
>
>> Norberto Lopes <nlopes...@gmail.com> writes:
>>
>>> colocation backup-vip-not-with-master -inf: backupVIP postgresMS:Master
>>> colocation backup-vip-not-with-master inf: backupVIP postgresMS:Slave
>>>
>>> Basically what's occurring in my cluster is that the first rule stops the
>>> Sync node from being promoted if the Master ever dies. The second doesn't
>>> but I can't quite follow why.
>>
>> Getting a score of -inf means that the resource won't run.  On the other
>> hand, (+)inf just means "strongest" preference.
>
> Apologies but I'm not following. I'm probably misunderstanding something.
>
> From what I could gather from
> https://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_mandatory_placement.html
> I can't follow the subtle difference between the two on a running
> cluster. As an example: If backupVIP is already in node A and
> postgresMS:Master in node B, and postgresMS:Master dies, in my case,
> postgresMS:Master never gets promoted in node C. But from the -inf
> rule it should be able to?
>
> Any insights into this would be greatly appreciated.

You're right: what I said was based on my experience with location
constraints, and according to the linked documentation colocation
constraints behave differently.  Sorry for misleading you.  I'm leaving
this discussion to the more knowledgeable.
-- 
Regards,
Feri



Re: [ClusterLabs] Colocation rule with vip and ms master

2017-10-26 Thread Ferenc Wágner
Norberto Lopes  writes:

> colocation backup-vip-not-with-master -inf: backupVIP postgresMS:Master
> colocation backup-vip-not-with-master inf: backupVIP postgresMS:Slave
>
> Basically what's occurring in my cluster is that the first rule stops the
> Sync node from being promoted if the Master ever dies. The second doesn't
> but I can't quite follow why.

Getting a score of -inf means that the resource won't run.  On the other
hand, (+)inf just means "strongest" preference.
-- 
Regards,
Feri



Re: [ClusterLabs] Pacemaker resource parameter reload confusion

2017-10-20 Thread Ferenc Wágner
Ken Gaillot <kgail...@redhat.com> writes:

> On Fri, 2017-09-22 at 18:30 +0200, Ferenc Wágner wrote:
>> Ken Gaillot <kgail...@redhat.com> writes:
>> 
>>> Hmm, stop+reload is definitely a bug. Can you attach (or email it to
>>> me privately, or file a bz with it attached) the above pe-input file
>>> with any sensitive info removed?
>> 
>> I sent you the pe-input file privately.  It indeed shows the issue:
>> 
>> $ /usr/sbin/crm_simulate -x pe-input-1033.bz2 -RS
>> [...]
>> Executing cluster transition:
>>  * Resource action: vm-alder stop on vhbl05
>>  * Resource action: vm-alder reload on vhbl05
>> [...]
>> 
>> Hope you can easily get to the bottom of this.
>
> This turned out to have the same underlying cause as CLBZ#5309. I have
> a fix pending review, which I expect to make it into the soon-to-be-
> released 1.1.18.

Great!

> It is a regression introduced in 1.1.15 by commit 2558d76f. The logic
> for reloads was consolidated in one place, but that happened to be
> before restarts were scheduled, so it no longer had the right
> information about whether a restart was needed. Now, it sets an
> ordering flag that is used later to cancel the reload if the restart
> becomes required. I've also added a regression test for it.

Restarts shouldn't even enter the picture here, so I don't get your
explanation.  But I also don't know the code, so that doesn't mean a
thing.  I'll test the next RC to be sure.
-- 
Thanks,
Feri



Re: [ClusterLabs] corosync service not automatically started

2017-10-11 Thread Ferenc Wágner
Václav Mach <ma...@cesnet.cz> writes:

> On 10/11/2017 09:00 AM, Ferenc Wágner wrote:
>
>> Václav Mach <ma...@cesnet.cz> writes:
>>
>>> allow-hotplug eth0
>>> iface eth0 inet dhcp
>>
>> Try replacing allow-hotplug with auto.  Ifupdown simply runs ifup -a
>> before network-online.target, which excludes allow-hotplug interfaces.
>> That means allow-hotplug interfaces are not waited for before corosync
>> is started during boot.
>
> That did the trick for network config using DHCP. Thanks for clarification.
>
> Do you know what is the reason, why allow-hotplug interfaces are
> excluded? It's obivous that if ifup (according to it's man) is run as
> 'ifup -a' it does ignore them, but I don't get why allow hotplug
> interfaces should be ignored by init system.

Allow-hotplug interfaces aren't assumed to be present all the time, but
rather to be plugged in and out arbitrarily.  They are handled by udev,
asynchronously when the system is running.  Waiting for them during
bootup would be strange if you ask me.
-- 
Regards,
Feri



Re: [ClusterLabs] ClusterMon mail notification - does not work

2017-10-11 Thread Ferenc Wágner
Donat Zenichev  writes:

> then resource is stopped, but nothing occurred on e-mail destination.
> Where I did wrong actions?

Please note that ClusterMon notifications are becoming deprecated (they
should still work, but I've got no experience with them).  Try using
alerts instead, as documented at
https://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch07.html
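
A minimal sketch using the sample SMTP alert agent shipped with Pacemaker
(assuming a reasonably recent pcs; paths and the address are examples):

cp /usr/share/pacemaker/alerts/alert_smtp.sh.sample /usr/local/bin/alert_smtp.sh
chmod 755 /usr/local/bin/alert_smtp.sh
pcs alert create path=/usr/local/bin/alert_smtp.sh id=smtp-alert
pcs alert recipient add smtp-alert value=admin@example.com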
-- 
Regards,
Feri



Re: [ClusterLabs] corosync service not automatically started

2017-10-11 Thread Ferenc Wágner
Václav Mach  writes:

> allow-hotplug eth0
> iface eth0 inet dhcp

Try replacing allow-hotplug with auto.  Ifupdown simply runs ifup -a
before network-online.target, which excludes allow-hotplug interfaces.
That means allow-hotplug interfaces are not waited for before corosync
is started during boot.
-- 
Regards,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker resource parameter reload confusion

2017-09-22 Thread Ferenc Wágner
Ken Gaillot  writes:

> Hmm, stop+reload is definitely a bug. Can you attach (or email it to me
> privately, or file a bz with it attached) the above pe-input file with
> any sensitive info removed?

I sent you the pe-input file privately.  It indeed shows the issue:

$ /usr/sbin/crm_simulate -x pe-input-1033.bz2 -RS
[...]
Executing cluster transition:
 * Resource action: vm-alder        stop on vhbl05
 * Resource action: vm-alder        reload on vhbl05
[...]

Hope you can easily get to the bottom of this.

> Nothing's been done about reload yet. It's waiting until we get around
> to an overhaul of the OCF resource agent standard, so we can define
> the semantics more clearly. It will involve replacing "unique" with
> separate meta-data for reloadability and GUI hinting, and possibly
> changes to the reload operation. Of course we'll try to stay backward-
> compatible.

Thanks for the confirmation.
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker 1.1.18 deprecation warnings

2017-09-20 Thread Ferenc Wágner
Ken Gaillot  writes:

> * undocumented LRMD_MAX_CHILDREN environment variable
> (PCMK_node_action_limit is the current syntax)

By the way, is the current syntax documented somewhere?  Looking at
crmd/throttle.c, throttle_update_job_max() is only ever invoked with a
NULL argument, so "Global preference from the CIB" isn't implemented
either.  Or do I overlook something?
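
For concreteness, I mean the variable one sets in the daemon environment
file (/etc/default/pacemaker here on Debian, /etc/sysconfig/pacemaker
elsewhere), with an illustrative value:

PCMK_node_action_limit=4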
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-11 Thread Ferenc Wágner
Jan Friesse  writes:

> Back to problem you have. It's definitively HW issue but I'm thinking
> how to solve it in software. Right now, I can see two ways:
> 1. Set dog FD to be non blocking right at the end of setup_watchdog - 
>This is proffered but I'm not sure if it's really going to work.

I'll run some tests to see what works (if anything).  The keepalives can
be provided by write()s as well, but somehow I don't expect that to make
a difference.  We'll see.

> 2. Create thread which makes sure to tackle wd regularly.

That would work, but maybe too well if entirely decoupled from the main
loop.
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-11 Thread Ferenc Wágner
Klaus Wenninger  writes:

> Just for my understanding: You are using watchdog-handling in corosync?

Yes, I was.
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-10 Thread Ferenc Wágner
Valentin Vidic <valentin.vi...@carnet.hr> writes:

> On Sun, Sep 10, 2017 at 08:27:47AM +0200, Ferenc Wágner wrote:
>
>> Confirmed: setting watchdog_device: off cluster wide got rid of the
>> above warnings.
>
> Interesting, what brand or version of IPMI has this problem?

It's a Fujitsu PRIMERGY BX924 S4 blade iRMC S4 with Firmware Version
8.43F and SDRR Version 3.60 ID 0376 BX924S4.

$ sudo ipmitool -I open mc info
Device ID : 52
Device Revision   : 2
Firmware Revision : 1.00
IPMI Version  : 2.0
Manufacturer ID   : 10368
Manufacturer Name : Fujitsu Siemens
Product ID: 886 (0x0376)
Product Name  : Unknown (0x376)
Device Available  : yes
Provides Device SDRs  : no
Additional Device Support :
Sensor Device
SDR Repository Device
SEL Device
FRU Inventory Device
IPMB Event Receiver
IPMB Event Generator
Chassis Device
Aux Firmware Rev Info : 
0x08
0x2b
0x00
0x46
-- 
Regards,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-10 Thread Ferenc Wágner
wf...@niif.hu (Ferenc Wágner) writes:

> Jan Friesse <jfrie...@redhat.com> writes:
>
>> wf...@niif.hu writes:
>>
>>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
>>> (in August; in May, it happened 0-2 times a day only, it's slowly
>>> ramping up):
>>>
>>> vhbl08 corosync[3687]:   [TOTEM ] A processor failed, forming new 
>>> configuration.
>>> vhbl03 corosync[3890]:   [TOTEM ] A processor failed, forming new 
>>> configuration.
>>> vhbl07 corosync[3805]:   [MAIN  ] Corosync main process was not scheduled 
>>> for 4317.0054 ms (threshold is 2400. ms). Consider token timeout 
>>> increase.
>>
>> ^^^ This is main problem you have to solve. It usually means that
>> machine is too overloaded. It is happening quite often when corosync
>> is running inside VM where host machine is unable to schedule regular
>> VM running.
>
> After some extensive tracing, I think the problem lies elsewhere: my
> IPMI watchdog device is slow beyond imagination.

Confirmed: setting watchdog_device: off cluster wide got rid of the
above warnings.

> Its ioctl operations can take seconds, starving all other functions.
> At least, it seems to block the main thread of Corosync.  Is this a
> plausible scenario?  Corosync has two threads, what are their roles?
-- 
Regards,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-05 Thread Ferenc Wágner
Jan Friesse  writes:

> wf...@niif.hu writes:
>
>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
>> (in August; in May, it happened 0-2 times a day only, it's slowly
>> ramping up):
>>
>> vhbl08 corosync[3687]:   [TOTEM ] A processor failed, forming new 
>> configuration.
>> vhbl03 corosync[3890]:   [TOTEM ] A processor failed, forming new 
>> configuration.
>> vhbl07 corosync[3805]:   [MAIN  ] Corosync main process was not scheduled 
>> for 4317.0054 ms (threshold is 2400. ms). Consider token timeout 
>> increase.
>
> ^^^ This is main problem you have to solve. It usually means that
> machine is too overloaded. It is happening quite often when corosync
> is running inside VM where host machine is unable to schedule regular
> VM running.

After some extensive tracing, I think the problem lies elsewhere: my
IPMI watchdog device is slow beyond imagination.  Its ioctl operations
can take seconds, starving all other functions.  At least, it seems to
block the main thread of Corosync.  Is this a plausible scenario?
Corosync has two threads, what are their roles?
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-01 Thread Ferenc Wágner
Digimer <li...@alteeve.ca> writes:

> On 2017-08-29 10:45 AM, Ferenc Wágner wrote:
>
>> Digimer <li...@alteeve.ca> writes:
>> 
>>> On 2017-08-28 12:07 PM, Ferenc Wágner wrote:
>>>
>>>> [...]
>>>> While dlm_tool status reports (similar on all nodes):
>>>>
>>>> cluster nodeid 167773705 quorate 1 ring seq 3088 3088
>>>> daemon now 2941405 fence_pid 0 
>>>> node 167773705 M add 196 rem 0 fail 0 fence 0 at 0 0
>>>> node 167773706 M add 5960 rem 5730 fail 0 fence 0 at 0 0
>>>> node 167773707 M add 2089 rem 1802 fail 0 fence 0 at 0 0
>>>> node 167773708 M add 3646 rem 3413 fail 0 fence 0 at 0 0
>>>> node 167773709 M add 2588921 rem 2588920 fail 0 fence 0 at 0 0
>>>> node 167773710 M add 196 rem 0 fail 0 fence 0 at 0 0
>>>>
>>>> dlm_tool ls shows "kern_stop":
>>>>
>>>> dlm lockspaces
>>>> name  clvmd
>>>> id    0x4104eefa
>>>> flags 0x0004 kern_stop
>>>> change    member 5 joined 0 remove 1 failed 1 seq 8,8
>>>> members   167773705 167773706 167773707 167773708 167773710 
>>>> new change    member 6 joined 1 remove 0 failed 0 seq 9,9
>>>> new status    wait messages 1
>>>> new members   167773705 167773706 167773707 167773708 167773709 167773710 
>>>>
>>>> on all nodes except for vhbl07 (167773709), where it gives
>>>>
>>>> dlm lockspaces
>>>> name  clvmd
>>>> id    0x4104eefa
>>>> flags 0x 
>>>> change    member 6 joined 1 remove 0 failed 0 seq 11,11
>>>> members   167773705 167773706 167773707 167773708 167773709 167773710 
>>>>
>>>> instead.
>>>>
>>>> [...] Is there a way to unblock DLM without rebooting all nodes?
>>>
>>> Looks like the lost node wasn't fenced.
>> 
>> Why dlm status does not report any lost node then?  Or do I misinterpret
>> its output?
>> 
>>> Do you have fencing configured and tested? If not, DLM will block
>>> forever because it won't recover until it has been told that the lost
>>> peer has been fenced, by design.
>> 
>> What command would you recommend for unblocking DLM in this case?
>
> First, fix fencing. Do you have that setup and working?

I really don't want DLM to do fencing.  DLM blocking for a couple of
days is not an issue in this setup (cLVM isn't a "service" of this
cluster, only a rarely needed administration tool).  Fencing is set up
and works fine for Pacemaker, so it's used to recover actual HA
services.  But letting DLM use it resulted in disaster one and a half
years ago (see Message-ID: <87r3g5a969@lant.ki.iif.hu>), which I
still haven't understood, and I'd rather not go there again until that's
taken care of properly.  So for now, a manual unblock path is all I'm
after.
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-01 Thread Ferenc Wágner
Jan Friesse  writes:

> wf...@niif.hu writes:
>
>> Jan Friesse  writes:
>>
>>> wf...@niif.hu writes:
>>>
 In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
 (in August; in May, it happened 0-2 times a day only, it's slowly
 ramping up):

 vhbl08 corosync[3687]:   [TOTEM ] A processor failed, forming new 
 configuration.
 vhbl03 corosync[3890]:   [TOTEM ] A processor failed, forming new 
 configuration.
 vhbl07 corosync[3805]:   [MAIN  ] Corosync main process was not scheduled 
 for 4317.0054 ms (threshold is 2400. ms). Consider token timeout 
 increase.
>>>
>>> ^^^ This is main problem you have to solve. It usually means that
>>> machine is too overloaded. [...]
>>
>> Before I start tracing the scheduler, I'd like to ask something: what
>> wakes up the Corosync main process periodically?  The token making a
>> full circle?  (Please forgive my simplistic understanding of the TOTEM
>> protocol.)  That would explain the recommendation in the log message,
>> but does not fit well with the overload assumption: totally idle nodes
>> could just as easily produce such warnings if there are no other regular
>> wakeup sources.  (I'm looking at timer_function_scheduler_timeout but I
>> know too little of libqb to decide.)
>
> Corosync main loop is based on epoll, so corosync is waked up ether by
> receiving data (network socket or unix socket for services) or when
> there are data to sent and socket is ready for non blocking write or
> after timeout. This timeout is exactly what you call other wakeup
> resource.
>
> Timeout is used for scheduling periodical tasks inside corosync.
>
> One of periodical tasks is scheduler pause detector. It is basically
> scheduled every (token_timeout / 3) msec and it computes diff between
> current and last time. If diff is larger than (token_timeout * 0.8) it
> displays warning.

Thanks, I can work with this.  I'll come back as soon as I find
something (or need further information :).

>>> As a start you can try what message say = Consider token timeout
>>> increase. Currently you have 3 seconds, in theory 6 second should be
>>> enough.
>>
>> It was probably high time I realized that token timeout is scaled
>> automatically when one has a nodelist.  When you say Corosync should
>> work OK with default settings up to 16 nodes, you assume this scaling is
>> in effect, don't you?  On the other hand, I've got no nodelist in the
>> config, but token = 3000, which is less than the default 1000+4*650 with
>> six nodes, and this will get worse as the cluster grows.
>
> This is described in corosync.conf man page (token_coefficient).

Yes, that's how I found out.  It also says: "This value is used only
when nodelist section is specified and contains at least 3 nodes."

> Final timeout is computed using totem.token as a base value. So if you
> set totem.token to 3000 it means that final totem timeout value is not
> 3000 but (3000 + 4 * 650).

But I've got no nodelist section, and according to the warning, my token
timeout is indeed 3 seconds, as you promptly deduced.  So the
documentation seems to be correct.
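
For reference, the scaling being discussed would kick in with something
like this in corosync.conf together with a nodelist of at least three
nodes (illustrative values):

totem {
    token: 3000
    token_coefficient: 650
}

giving token + (N - 2) * token_coefficient, that is 3000 + 4 * 650 =
5600 ms on six nodes, or 1000 + 4 * 650 = 3600 ms with the default
token.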
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-31 Thread Ferenc Wágner
Jan Friesse  writes:

> wf...@niif.hu writes:
>
>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
>> (in August; in May, it happened 0-2 times a day only, it's slowly
>> ramping up):
>>
>> vhbl08 corosync[3687]:   [TOTEM ] A processor failed, forming new 
>> configuration.
>> vhbl03 corosync[3890]:   [TOTEM ] A processor failed, forming new 
>> configuration.
>> vhbl07 corosync[3805]:   [MAIN  ] Corosync main process was not scheduled 
>> for 4317.0054 ms (threshold is 2400. ms). Consider token timeout 
>> increase.
>
> ^^^ This is main problem you have to solve. It usually means that
> machine is too overloaded. [...]

Before I start tracing the scheduler, I'd like to ask something: what
wakes up the Corosync main process periodically?  The token making a
full circle?  (Please forgive my simplistic understanding of the TOTEM
protocol.)  That would explain the recommendation in the log message,
but does not fit well with the overload assumption: totally idle nodes
could just as easily produce such warnings if there are no other regular
wakeup sources.  (I'm looking at timer_function_scheduler_timeout but I
know too little of libqb to decide.)

> As a start you can try what message say = Consider token timeout
> increase. Currently you have 3 seconds, in theory 6 second should be
> enough.

It was probably high time I realized that token timeout is scaled
automatically when one has a nodelist.  When you say Corosync should
work OK with default settings up to 16 nodes, you assume this scaling is
in effect, don't you?  On the other hand, I've got no nodelist in the
config, but token = 3000, which is less than the default 1000+4*650 with
six nodes, and this will get worse as the cluster grows.

Comments on the above ramblings welcome!

I'm grateful for all the valuable input poured into this thread by all
parties: it's proven really educative in quite unexpected ways beyond
what I was able to ask in the beginning.
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-31 Thread Ferenc Wágner
Klaus Wenninger  writes:

> Just seen that you are hosting VMs which might make you use KSM ...
> Don't fully remember at the moment but I have some memory of
> issues with KSM and page-locking.
> iirc it was some bug in the kernel memory-management that should
> be fixed a long time ago but ...

Hi Klaus,

I failed to find anything relevant by a quick internet search.  Can you
recall something more specific, so that I can ensure I'm running with
this issue fixed?
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-29 Thread Ferenc Wágner
Jan Friesse  writes:

> wf...@niif.hu writes:
>
>> Jan Friesse  writes:
>>
>>> wf...@niif.hu writes:
>>>
 In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
 (in August; in May, it happened 0-2 times a day only, it's slowly
 ramping up):

 vhbl08 corosync[3687]:   [TOTEM ] A processor failed, forming new 
 configuration.
 vhbl03 corosync[3890]:   [TOTEM ] A processor failed, forming new 
 configuration.
 vhbl07 corosync[3805]:   [MAIN  ] Corosync main process was not scheduled 
 for 4317.0054 ms (threshold is 2400. ms). Consider token timeout 
 increase.
>>>
>>> ^^^ This is main problem you have to solve. It usually means that
>>> machine is too overloaded. It is happening quite often when corosync
>>> is running inside VM where host machine is unable to schedule regular
>>> VM running.
>>
>> Corosync isn't running in a VM here, these nodes are 2x8 core servers
>> hosting VMs themselves as Pacemaker resources.  (Incidentally, some of
>> these VMs run Corosync to form a test cluster, but that should be
>> irrelevant now.)  And they aren't overloaded in any apparent way: Munin
>> reports 2900% CPU idle (out of 32 hyperthreads).  There's no swap, but
>> the corosync process is locked into memory anyway.  It's also running as
>> SCHED_RR prio 99, competing only with multipathd and the SCHED_FIFO prio
>> 99 kernel threads (migration/* and watchdog/*) under Linux 4.9.  I'll
>> try to take a closer look at the scheduling of these.  Can you recommend
>> some indicators to check out?
>
> No real hints. But one question. Are you 100% sure memory is locked?
> Because we had problem where mlockall was called in wrong place so
> corosync was actually not locked and it was causing similar issues.
>
> This behavior is fixed by
> https://github.com/corosync/corosync/commit/238e2e62d8b960e7c10bfa0a8281d78ec99f3a26

I based this assertion on the L flag in the ps STAT column.  The above
commit should not affect me because I'm running corosync with the -f
option:

$ ps l 3805
F   UID   PID  PPID PRI  NI    VSZ    RSS WCHAN  STAT TTY        TIME COMMAND
4     0  3805     1 -100   - 247464 141016 -      SLsl ?        251:10 /usr/sbin/corosync -f

By the way, are the above VSZ and RSS numbers reasonable?

One more thing: these servers run without any swap.

>>> As a start you can try what message say = Consider token timeout
>>> increase. Currently you have 3 seconds, in theory 6 second should be
>>> enough.
>>
>> OK, thanks for the tip.  Can I do this on-line, without shutting down
>> Corosync?
>
> Corosync way is to just edit/copy corosync.conf on all nodes and call
> corosync-cfgtool -R on one of the nodes (crmsh/pcs may have better
> way).

Great, that's what I wanted to know: whether -R is expected to make this
change effective.
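
In other words, roughly this (a sketch using my node names):

for n in vhbl03 vhbl04 vhbl05 vhbl06 vhbl07 vhbl08; do
    scp /etc/corosync/corosync.conf $n:/etc/corosync/corosync.conf
done
corosync-cfgtool -R    # once, on any one node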
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-29 Thread Ferenc Wágner
Digimer <li...@alteeve.ca> writes:

> On 2017-08-28 12:07 PM, Ferenc Wágner wrote:
>
>> [...]
>> While dlm_tool status reports (similar on all nodes):
>> 
>> cluster nodeid 167773705 quorate 1 ring seq 3088 3088
>> daemon now 2941405 fence_pid 0 
>> node 167773705 M add 196 rem 0 fail 0 fence 0 at 0 0
>> node 167773706 M add 5960 rem 5730 fail 0 fence 0 at 0 0
>> node 167773707 M add 2089 rem 1802 fail 0 fence 0 at 0 0
>> node 167773708 M add 3646 rem 3413 fail 0 fence 0 at 0 0
>> node 167773709 M add 2588921 rem 2588920 fail 0 fence 0 at 0 0
>> node 167773710 M add 196 rem 0 fail 0 fence 0 at 0 0
>> 
>> dlm_tool ls shows "kern_stop":
>> 
>> dlm lockspaces
>> name  clvmd
>> id    0x4104eefa
>> flags 0x0004 kern_stop
>> change    member 5 joined 0 remove 1 failed 1 seq 8,8
>> members   167773705 167773706 167773707 167773708 167773710 
>> new change    member 6 joined 1 remove 0 failed 0 seq 9,9
>> new status    wait messages 1
>> new members   167773705 167773706 167773707 167773708 167773709 167773710 
>> 
>> on all nodes except for vhbl07 (167773709), where it gives
>> 
>> dlm lockspaces
>> name  clvmd
>> id    0x4104eefa
>> flags 0x 
>> change    member 6 joined 1 remove 0 failed 0 seq 11,11
>> members   167773705 167773706 167773707 167773708 167773709 167773710 
>> 
>> instead.
>> 
>> [...] Is there a way to unblock DLM without rebooting all nodes?
>
> Looks like the lost node wasn't fenced.

Why dlm status does not report any lost node then?  Or do I misinterpret
its output?

> Do you have fencing configured and tested? If not, DLM will block
> forever because it won't recover until it has been told that the lost
> peer has been fenced, by design.

What command would you recommend for unblocking DLM in this case?
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-29 Thread Ferenc Wágner
Jan Friesse  writes:

> wf...@niif.hu writes:
>
>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
>> (in August; in May, it happened 0-2 times a day only, it's slowly
>> ramping up):
>>
>> vhbl08 corosync[3687]:   [TOTEM ] A processor failed, forming new 
>> configuration.
>> vhbl03 corosync[3890]:   [TOTEM ] A processor failed, forming new 
>> configuration.
>> vhbl07 corosync[3805]:   [MAIN  ] Corosync main process was not scheduled 
>> for 4317.0054 ms (threshold is 2400. ms). Consider token timeout 
>> increase.
>
> ^^^ This is main problem you have to solve. It usually means that
> machine is too overloaded. It is happening quite often when corosync
> is running inside VM where host machine is unable to schedule regular
> VM running.

Hi Honza,

Corosync isn't running in a VM here, these nodes are 2x8 core servers
hosting VMs themselves as Pacemaker resources.  (Incidentally, some of
these VMs run Corosync to form a test cluster, but that should be
irrelevant now.)  And they aren't overloaded in any apparent way: Munin
reports 2900% CPU idle (out of 32 hyperthreads).  There's no swap, but
the corosync process is locked into memory anyway.  It's also running as
SCHED_RR prio 99, competing only with multipathd and the SCHED_FIFO prio
99 kernel threads (migration/* and watchdog/*) under Linux 4.9.  I'll
try to take a closer look at the scheduling of these.  Can you recommend
some indicators to check out?

Are scheduling delays expected to generate TOTEM membership "changes"
without any leaving and joining nodes?

> As a start you can try what message say = Consider token timeout
> increase. Currently you have 3 seconds, in theory 6 second should be
> enough.

OK, thanks for the tip.  Can I do this on-line, without shutting down
Corosync?
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-28 Thread Ferenc Wágner
Hi,

In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
(in August; in May, it happened 0-2 times a day only, it's slowly
ramping up):

vhbl08 corosync[3687]:   [TOTEM ] A processor failed, forming new configuration.
vhbl03 corosync[3890]:   [TOTEM ] A processor failed, forming new configuration.
vhbl07 corosync[3805]:   [MAIN  ] Corosync main process was not scheduled for 
4317.0054 ms (threshold is 2400. ms). Consider token timeout increase.
vhbl07 corosync[3805]:   [TOTEM ] A processor failed, forming new configuration.
vhbl04 corosync[3759]:   [TOTEM ] A new membership (10.0.6.9:3056) was formed. 
Members
vhbl05 corosync[3919]:   [TOTEM ] A new membership (10.0.6.9:3056) was formed. 
Members
vhbl06 corosync[3759]:   [TOTEM ] A new membership (10.0.6.9:3056) was formed. 
Members
vhbl07 corosync[3805]:   [TOTEM ] A new membership (10.0.6.9:3056) was formed. 
Members
vhbl08 corosync[3687]:   [TOTEM ] A new membership (10.0.6.9:3056) was formed. 
Members
vhbl03 corosync[3890]:   [TOTEM ] A new membership (10.0.6.9:3056) was formed. 
Members
vhbl07 corosync[3805]:   [QUORUM] Members[6]: 167773705 167773706 167773707 
167773708 167773709 167773710
vhbl08 corosync[3687]:   [QUORUM] Members[6]: 167773705 167773706 167773707 
167773708 167773709 167773710
vhbl06 corosync[3759]:   [QUORUM] Members[6]: 167773705 167773706 167773707 
167773708 167773709 167773710
vhbl07 corosync[3805]:   [MAIN  ] Completed service synchronization, ready to 
provide service.
vhbl04 corosync[3759]:   [QUORUM] Members[6]: 167773705 167773706 167773707 
167773708 167773709 167773710
vhbl08 corosync[3687]:   [MAIN  ] Completed service synchronization, ready to 
provide service.
vhbl06 corosync[3759]:   [MAIN  ] Completed service synchronization, ready to 
provide service.
vhbl04 corosync[3759]:   [MAIN  ] Completed service synchronization, ready to 
provide service.
vhbl05 corosync[3919]:   [QUORUM] Members[6]: 167773705 167773706 167773707 
167773708 167773709 167773710
vhbl03 corosync[3890]:   [QUORUM] Members[6]: 167773705 167773706 167773707 
167773708 167773709 167773710
vhbl05 corosync[3919]:   [MAIN  ] Completed service synchronization, ready to 
provide service.
vhbl03 corosync[3890]:   [MAIN  ] Completed service synchronization, ready to 
provide service.

The cluster is running Corosync 2.4.2 multicast.  Those lines really end
at Members, there are no joined of left nodes listed.  Pacemaker on top
reacts like:

[9982] vhbl03 pacemakerd: info: pcmk_quorum_notification:   Quorum retained 
| membership=3056 members=6
[9991] vhbl03   crmd: info: pcmk_quorum_notification:   Quorum retained 
| membership=3056 members=6
[9986] vhbl03cib: info: cib_process_request:Completed 
cib_modify operation for section nodes: OK (rc=0, origin=vhbl07/crmd/4477, 
version=0.1694.12)
[9986] vhbl03cib: info: cib_process_request:Completed 
cib_modify operation for section status: OK (rc=0, origin=vhbl07/crmd/4478, 
version=0.1694.12)
[9986] vhbl03cib: info: cib_process_ping:   Reporting our current 
digest to vhbl07: 85250f3039d269f96012750f13e232d9 for 0.1694.12 
(0x55ef057447d0 0)

on all nodes except for vhbl07, where it says:

[9886] vhbl07   crmd: info: pcmk_quorum_notification:   Quorum retained 
| membership=3056 members=6
[9877] vhbl07 pacemakerd: info: pcmk_quorum_notification:   Quorum retained 
| membership=3056 members=6
[9881] vhbl07cib: info: cib_process_request:Forwarding 
cib_modify operation for section nodes to all (origin=local/crmd/
[9881] vhbl07cib: info: cib_process_request:Forwarding 
cib_modify operation for section status to all (origin=local/crmd
[9881] vhbl07cib: info: cib_process_request:Completed 
cib_modify operation for section nodes: OK (rc=0, origin=vhbl07/cr
[9881] vhbl07cib: info: cib_process_request:Completed 
cib_modify operation for section status: OK (rc=0, origin=vhbl07/c
[9881] vhbl07cib: info: cib_process_ping:   Reporting our current 
digest to vhbl07: 85250f3039d269f96012750f13e232d9 for 0.1694.

So Pacemaker does nothing, basically, and I can't see any adverse effect
to resource management, but DLM seems to have some problem, which may or
may not be related.  When the TOTEM error appears, all nodes log this:

vhbl03 dlm_controld[3914]: 2801675 dlm:controld ring 167773705:3056 6 memb 
167773705 167773706 167773707 167773708 167773709 167773710
vhbl03 dlm_controld[3914]: 2801675 fence work wait for cluster ringid
vhbl03 dlm_controld[3914]: 2801675 dlm:ls:clvmd ring 167773705:3056 6 memb 
167773705 167773706 167773707 167773708 167773709 167773710
vhbl03 dlm_controld[3914]: 2801675 clvmd wait_messages cg 9 need 1 of 6
vhbl03 dlm_controld[3914]: 2801675 fence work wait for cluster ringid
vhbl03 dlm_controld[3914]: 2801675 cluster quorum 1 seq 3056 nodes 6

dlm_controld is running with --enable_fencing=0.  Pacemaker does 

Re: [ClusterLabs] Pacemaker 1.1.17 Release Candidate 4 (likely final)

2017-06-21 Thread Ferenc Wágner
Ken Gaillot  writes:

> The most significant change in this release is a new cluster option to
> improve scalability.
>
> As users start to create clusters with hundreds of resources and many
> nodes, one bottleneck is a complete reprobe of all resources (for
> example, after a cleanup of all resources).

Hi,

Does crm_resource --cleanup without any --resource specified do this?
Does this happen any other (automatic or manual) way?

> This can generate enough CIB updates to get the crmd's CIB connection
> dropped for not processing them quickly enough.

Is this a catastrophic scenario, or does the cluster recover gently?

> This bottleneck has been addressed with a new cluster option,
> cluster-ipc-limit, to raise the threshold for dropping the connection.
> The default is 500. The recommended value is the number of nodes in the
> cluster multiplied by the number of resources.

I'm running a production cluster with 6 nodes and 159 resources (ATM),
which gives almost twice the above default.  What symptoms should I
expect to see under 1.1.16?  (1.1.16 has just been released with Debian
stretch.  We can't really upgrade it, but changing the built-in default
is possible if it makes sense.)
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Notifications on changes in clustered LVM

2017-06-20 Thread Ferenc Wágner
Digimer <li...@alteeve.ca> writes:

> On 19/06/17 11:40 PM, Andrei Borzenkov wrote:
>
>> 20.06.2017 02:15, Digimer пишет:
>>
>>> On 19/06/17 06:59 PM, Ferenc Wágner wrote:
>>>
>>>> Digimer <li...@alteeve.ca> writes:
>>>>
>>>>> So we have a tool that watches for changes to clvmd by running
>>>>> pvscan/vgscan/lvscan, but this seems to be expensive and occasionally
>>>>> cause trouble.
>>>>
>>>> What kind of trouble did you experience?
>>>>
>>>>> Is there any other way to be notified or to check when something
>>>>> changes?
>>>>
>>>> LV (de)activation generates udev events (due to block devices appearing/
>>>> disappearing).  PVs too, but they don't go through clvmd.
>>>
>>> Interesting (dbus), I'll look into that.
>> 
>> udev events are sent over netlink, not D-Bus.
>
> I've not used that before. Any docs on how to listen for those events,
> by chance? If nothing off hand, don't worry, I can search.

Or just configure udev to run appropriate programs on the events you're
interested in.  Less efficient, but simpler.
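
A minimal sketch (rule file name and script path invented) reacting to
LV block devices coming and going:

# /etc/udev/rules.d/99-clvm-notify.rules
ACTION=="add|remove", SUBSYSTEM=="block", KERNEL=="dm-*", \
    RUN+="/usr/local/sbin/clvm-change-notify %k"

udev exports ACTION, DEVNAME and friends in the program's environment;
just keep whatever RUN+= launches short-lived.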
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Notifications on changes in clustered LVM

2017-06-19 Thread Ferenc Wágner
Digimer  writes:

> So we have a tool that watches for changes to clvmd by running
> pvscan/vgscan/lvscan, but this seems to be expensive and occasionally
> cause trouble.

What kind of trouble did you experience?

> Is there any other way to be notified or to check when something
> changes?

LV (de)activation generates udev events (due to block devices appearing/
disappearing).  PVs too, but they don't go through clvmd.
-- 
Regards,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Ubuntu 16.04 - Only binds on 127.0.0.1 then fails until reinstall

2017-05-06 Thread Ferenc Wágner
James Booth  writes:

> Sorry for the repeat mails, but I had issues subscribing list time
> (Looks like it has worked successfully now!).
>
> Anywho, I'm really desperate for some help on my issue in
> http://lists.clusterlabs.org/pipermail/users/2017-April/005495.html -
> I can recap the info in this thread and provide any configs if needed!

Hi James,

This thread is badly fragmented and confusing now, but let's try to
proceed.  It seems corosync ignores its config file.  Maybe you edit a
stray corosync.conf, not the one corosync actually reads (which should
probably be /etc/corosync/corosync.conf).  Please issue the following
command as a regular user, and show us its output (make sure strace is
installed):

$ strace -f -eopen /usr/sbin/corosync -p -f

It should reveal the name of the config file.  For example, under a
different version a section of the output looks like this:

open("/dev/shm/qb-corosync-16489-blackbox-header", O_RDWR|O_CREAT|O_TRUNC, 
0600) = 3
open("/dev/shm/qb-corosync-16489-blackbox-data", O_RDWR|O_CREAT|O_TRUNC, 0600) 
= 4
open("/etc/corosync/corosync.conf", O_RDONLY) = 3
open("/etc/localtime", O_RDONLY|O_CLOEXEC) = 3
Process 16490 attached
[pid 16489] open("/var/run/corosync.pid", O_WRONLY|O_CREAT, 0640) = -1 EACCES 
(Permission denied)

If you can identify the name of the config file, please also post its
path and its full content.
-- 
Regards,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Why shouldn't one store resource configuration in the CIB?

2017-04-18 Thread Ferenc Wágner
Ken Gaillot <kgail...@redhat.com> writes:

> On 04/13/2017 11:11 AM, Ferenc Wágner wrote:
> 
>> I encountered several (old) statements on various forums along the lines
>> of: "the CIB is not a transactional database and shouldn't be used as
>> one" or "resource parameters should only uniquely identify a resource,
>> not configure it" and "the CIB was not designed to be a configuration
>> database but people still use it that way".  Sorry if I misquote these,
>> I go by my memories now, I failed to dig up the links by a quick try.
>> 
>> Well, I've been feeling guilty in the above offenses for years, but it
>> worked out pretty well that way which helped to suppress these warnings
>> in the back of my head.  Still, I'm curious: what's the reason for these
>> warnings, what are the dangers of "abusing" the CIB this way?
>> /var/lib/pacemaker/cib/cib.xml is 336 kB with 6 nodes and 155 resources
>> configured.  Old Pacemaker versions required tuning PCMK_ipc_buffer to
>> handle this, but even the default is big enough nowadays (128 kB after
>> compression, I guess).
>> 
>> Am I walking on thin ice?  What should I look out for?
>
> That's a good question. Certainly, there is some configuration
> information in most resource definitions, so it's more a matter of degree.
>
> The main concerns I can think of are:
>
> 1. Size: Increasing the CIB size increases the I/O, CPU and networking
> overhead of the cluster (and if it crosses the compression threshold,
> significantly). It also marginally increases the time it takes the
> policy engine to calculate a new state, which slows recovery.

Thanks for the input, Ken!  Is this what you mean?

cib: info: crm_compress_string: Compressed 1028972 bytes into 69095 (ratio 
14:1) in 138ms

At the same time /var/lib/pacemaker/cib/cib.xml is 336K, and

# cibadmin -Q --scope resources | wc -c
330951
# cibadmin -Q --scope status | wc -c
732820

Even though I consume about 2 kB per resource, the status section
weighs 2.2 times the resources section.  Which means shrinking the
resource size wouldn't change the full size significantly.

At the same time, we should probably monitor the trends of the cluster
messaging health as we expand it (with nodes and resources).  What would
be some useful indicators to graph?

> 2. Consistency: Clusters can become partitioned. If changes are made on
> one or more partitions during the separation, the changes won't be
> reflected on all nodes until the partition heals, at which time the
> cluster will reconcile them, potentially losing one side's changes.

Ah, that's a very good point, which I neglected totally: even inquorate
partitions can have configuration changes.  Thanks for bringing this up!
I wonder if there's any practical workaround for that.

> I suppose this isn't qualitatively different from using a separate
> configuration file, but those tend to be more static, and failure to
> modify all copies would be more obvious when doing them individually
> rather than issuing a single cluster command.

From a different angle: if a node is off, you can't modify its
configuration file.  So you need an independent mechanism to do what the
CIB synchronization does anyway, or a shared file system with its added
complexity.  On the other hand, one needn't guess how Pacemaker
reconciles the conflicting resource configuration changes.  Indeed, how
does it?
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Why shouldn't one store resource configuration in the CIB?

2017-04-13 Thread Ferenc Wágner
Hi,

I encountered several (old) statements on various forums along the lines
of: "the CIB is not a transactional database and shouldn't be used as
one" or "resource parameters should only uniquely identify a resource,
not configure it" and "the CIB was not designed to be a configuration
database but people still use it that way".  Sorry if I misquote these,
I go by my memories now, I failed to dig up the links by a quick try.

Well, I've been feeling guilty in the above offenses for years, but it
worked out pretty well that way which helped to suppress these warnings
in the back of my head.  Still, I'm curious: what's the reason for these
warnings, what are the dangers of "abusing" the CIB this way?
/var/lib/pacemaker/cib/cib.xml is 336 kB with 6 nodes and 155 resources
configured.  Old Pacemaker versions required tuning PCMK_ipc_buffer to
handle this, but even the default is big enough nowadays (128 kB after
compression, I guess).

Am I walking on thin ice?  What should I look out for?
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Surprising semantics of location constraints with INFINITY score

2017-04-13 Thread Ferenc Wágner
kgronl...@suse.com (Kristoffer Grönlund) writes:

> I discovered today that a location constraint with score=INFINITY
> doesn't actually restrict resources to running only on particular
> nodes.

Yeah, I made the same "discovery" some time ago.  Since then I've been
using something like the following to restrict my-rsc to my-node:

<rsc_location id="my-rsc-only-on-my-node" rsc="my-rsc">
  <rule id="my-rsc-only-on-my-node-rule" score="-INFINITY">
    <expression id="my-rsc-only-on-my-node-expr"
                attribute="#uname" operation="ne" value="my-node"/>
  </rule>
</rsc_location>
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Never join a list without a problem...

2017-03-01 Thread Ferenc Wágner
Jeffrey Westgate  writes:

> We use Nagios to monitor, and once every 20 to 40 hours - sometimes
> longer, and we cannot set a clock by it - while the machine is 95%
> idle (or more according to 'top'), the host load shoots up to 50 or
> 60%.  It takes about 20 minutes to peak, and another 30 to 45 minutes
> to come back down to baseline, which is mostly 0.00.  (attached
> hostload.pdf) This happens to both machines, randomly, and is
> concerning, as we'd like to find what's causing it and resolve it.

Try running atop (http://www.atoptool.nl/).  It collects and logs
process accounting info, allowing you to step back in time and check
resource usage in the past.
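
For example, to replay the interval around one of the spikes (log path
and date as created by the stock atop packaging):

atop -r /var/log/atop/atop_20170301 -b 02:00

and then step through the samples with 't' / 'T'.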
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Insert delay between the statup of VirtualDomain

2017-02-27 Thread Ferenc Wágner
Oscar Segarra  writes:

> In my environment I have 5 guestes that have to be started up in a
> specified order starting for the MySQL database server.

We use a somewhat redesigned resource agent, which connects to the guest
using a virtio channel and waits for a signal before exiting from the
start operation.  The signal is sent by an appropriately placed startup
script from the guest.  This is fully independent of regular network
traffic and does not need any channel configuration.
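
Roughly like this (a sketch with invented names, not our actual agent):

# host side, at the end of the RA's start action
timeout 300 sh -c \
    'socat -u UNIX-CONNECT:/var/lib/libvirt/qemu/channel/target/guest.ready - \
     | grep -q ready' || exit $OCF_ERR_GENERIC

# guest side, from a boot script run once the interesting services are up
echo ready > /dev/virtio-ports/org.example.ready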
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: [Question] About a change of crm_failcount.

2017-02-09 Thread Ferenc Wágner
Jehan-Guillaume de Rorthais  writes:

> PAF use private attribute to give informations between actions. We
> detect the failure during the notify as well, but raise the error
> during the promotion itself. See how I dealt with this in PAF:
>
> https://github.com/ioguix/PAF/commit/6123025ff7cd9929b56c9af2faaefdf392886e68

This is the first time I hear about private attributes.  Since they
could come useful one day, I'd like to understand them better.  After
some reading, they seem to be node attributes, not resource attributes.
This may be irrelevant for PAF, but doesn't it mean that two resources
of the same type on the same node would interfere with each other?
Also, your _set_priv_attr could fall into an infinite loop if another
instance used it at the inappropriate moment.  Do I miss something here?
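
For the record, the mechanism I understand this to be (name and value
are only illustrative):

attrd_updater --name some_private_attr --update 42 --private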
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Antw: Re: Pacemaker kill does not cause node fault ???

2017-02-08 Thread Ferenc Wágner
Ken Gaillot  writes:

> On 02/07/2017 01:11 AM, Ulrich Windl wrote:
>
>> Ken Gaillot  writes:
>>
>>> On 02/06/2017 03:28 AM, Ulrich Windl wrote:
>>>
 Isn't the question: Is crmd a process that is expected to die (and
 thus need restarting)? Or wouldn't one prefer to debug this
 situation. I fear that restarting it might just cover some fatal
 failure...
>>>
>>> If crmd or corosync dies, the node will be fenced (if fencing is enabled
>>> and working). If one of the crmd's persistent connections (such as to
>>> the cib) fails, it will exit, so it ends up the same.
>> 
>> But isn't it due to crmd not responding to network packets? So if the
>> timeout is long enough, and crmd is started fast enough, will the
>> node really be fenced?
>
> If crmd dies, it leaves its corosync process group, and I'm pretty sure
> the other nodes will fence it for that reason, regardless of the duration.

See http://lists.clusterlabs.org/pipermail/users/2016-March/002415.html
for a case when a Pacemaker cluster survived a crmd failure and restart.
Re-reading the thread, I'm still unsure what saved our ass from
resources being started in parallel and losing massive data.  I'd fully
expect fencing in such cases...
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker kill does not cause node fault ???

2017-02-08 Thread Ferenc Wágner
Ken Gaillot <kgail...@redhat.com> writes:

> On 02/03/2017 07:00 AM, RaSca wrote:
>> 
>> On 03/02/2017 11:06, Ferenc Wágner wrote:
>>> Ken Gaillot <kgail...@redhat.com> writes:
>>>
>>>> On 01/10/2017 04:24 AM, Stefan Schloesser wrote:
>>>>
>>>>> I am currently testing a 2 node cluster under Ubuntu 16.04. The setup
>>>>> seems to be working ok including the STONITH.
>>>>> For test purposes I issued a "pkill -f pace" killing all pacemaker
>>>>> processes on one node.
>>>>>
>>>>> Result:
>>>>> The node is marked as "pending", all resources stay on it. If I
>>>>> manually kill a resource it is not noticed. On the other node a drbd
>>>>> "promote" command fails (drbd is still running as master on the first
>>>>> node).
>>>>
>>>> I suspect that, when you kill pacemakerd, systemd respawns it quickly
>>>> enough that fencing is unnecessary. Try "pkill -f pace; systemd stop
>>>> pacemaker".
>>>
>>> What exactly is "quickly enough"?
>> 
>> What Ken is saying is that Pacemaker, as a service managed by systemd,
>> have in its service definition file
>> (/usr/lib/systemd/system/pacemaker.service) this option:
>> 
>> Restart=on-failure
>> 
>> Looking at [1] it is explained: systemd restarts immediately the process
>> if it ends for some unexpected reason (like a forced kill).
>> 
>> [1] https://www.freedesktop.org/software/systemd/man/systemd.service.html
>
> And the cluster itself is resilient to some daemon restarts. If only
> pacemakerd is killed, corosync and pacemaker's crmd can still function
> without any issues. When pacemakerd respawns, it reestablishes contact
> with any other cluster daemons still running (and its pacemakerd peers
> on other cluster nodes).

KillMode=process looks like a very important component of the service
file then.  Probably worth commenting, especially its relation to
Restart=on-failure (it also affects plain stop operations, of course).

But I still wonder how "quickly enough" could be quantified.  Have we
got a timeout for this, or are we good while the cluster is quiescent,
or maybe something else?
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Failed reload

2017-02-08 Thread Ferenc Wágner
Hi,

There was an interesting discussion on this list about "Doing reload
right" last July (which I still haven't digested entirely).  Now I've
got a related question about the current and intented behavior: what
happens if a reload operation fails?  I found some suggestions in
http://ocf.community.tummy.narkive.com/RngPlNfz/adding-reload-to-the-ocf-specification,
from 11 years back, and the question wasn't clear cut at all.  Now I'm
contemplating adding best-effort reloads to an RA, but not sure what
behavior I can expect and depend on in the long run.  I'd be grateful
for your insights.
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker kill does not cause node fault ???

2017-02-03 Thread Ferenc Wágner
Ken Gaillot  writes:

> On 01/10/2017 04:24 AM, Stefan Schloesser wrote:
> 
>> I am currently testing a 2 node cluster under Ubuntu 16.04. The setup
>> seems to be working ok including the STONITH.
>> For test purposes I issued a "pkill -f pace" killing all pacemaker
>> processes on one node.
>> 
>> Result:
>> The node is marked as "pending", all resources stay on it. If I
>> manually kill a resource it is not noticed. On the other node a drbd
>> "promote" command fails (drbd is still running as master on the first
>> node).
>
> I suspect that, when you kill pacemakerd, systemd respawns it quickly
> enough that fencing is unnecessary. Try "pkill -f pace; systemd stop
> pacemaker".

What exactly is "quickly enough"?
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] HALVM problem with 2 nodes cluster

2017-01-18 Thread Ferenc Wágner
Marco Marino  writes:

> Ferenc, regarding the flag use_lvmetad in
> /usr/lib/ocf/resource.d/heartbeat/LVM I read:
>
>> lvmetad is a daemon that caches lvm metadata to improve the
>> performance of LVM commands. This daemon should never be used when
>> volume groups exist that are being managed by the cluster. The
>> lvmetad daemon introduces a response lag, where certain LVM commands
>> look like they have completed (like vg activation) when in fact the
>> command is still in progress by the lvmetad.  This can cause
>> reliability issues when managing volume groups in the cluster.  For
>> Example, if you have a volume group that is a dependency for another
>> application, it is possible the cluster will think the volume group
>> is activated and attempt to start the application before volume group
>> is really accesible... lvmetad is bad.
>
> in the function LVM_validate_all()

Wow, if this is true, then this is serious breakage in LVM.  Thanks for
the pointer.  I think this should be brought up with the LVM developers.
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] HALVM problem with 2 nodes cluster

2017-01-18 Thread Ferenc Wágner
Marco Marino  writes:

> I agree with you for
> use_lvmetad = 0 (setting it = 1 in a clustered environment is an error)

Where does this information come from?  AFAIK, if locking_type=3 (LVM
uses internal clustered locking, that is, clvmd), lvmetad is not used
anyway, even if it's running.  So it's best to disable it to avoid
warning messages all around.  This is the case with active/active
clustering in LVM itself, in which Pacemaker isn't involved.

On the other hand, if you use Pacemaker to do active/passive clustering
by appropriately activating/deactivating your VG, this isn't clustering
from the LVM point of view, you don't set the clustered flag on your VG,
don't run clvmd and use locking_type=1.  Lvmetad should be perfectly
fine with this in principle (unless it caches metadata of inactive VGs,
which would be stupid, but I never tested this).

> but I think I have to set
> locking_type = 3 only if I use clvm

Right.
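
So, as a sketch (not a drop-in lvm.conf), the clvmd case boils down to

global {
    locking_type = 3    # clustered locking via clvmd
    use_lvmetad = 0     # lvmetad is not consulted with clustered locking anyway
}

while the Pacemaker-driven active/passive case keeps locking_type = 1
and no clustered flag on the VG.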
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: VirtualDomain started in two hosts

2017-01-18 Thread Ferenc Wágner
Ken Gaillot  writes:

> * When you move the VM, the cluster detects that it is not running on
> the node you told it to keep it running on. Because there is no
> "Stopped" monitor, the cluster doesn't immediately realize that a new
> rogue instance is running on another node. So, the cluster thinks the VM
> crashed on the original node, and recovers it by starting it again.

Ken, do you mean that if a periodic "stopped" monitor is configured, it
is forced to run immediately (out of schedule) when the regular periodic
monitor unexpectedly returns with stopped status?  That is, before the
cluster takes the recovery action?  Conceptually, that would be similar
to the probe run on node startup.  If not, then maybe it would be a
useful resource option to have (I mean running cluster-wide probes on an
unexpected monitor failure, before recovery).  An optional safety check.
-- 
Regards,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] permissions under /etc/corosync/qnetd (was: Corosync 2.4.0 is available at corosync.org!)

2016-11-07 Thread Ferenc Wágner
Jan Friesse <jfrie...@redhat.com> writes:

> Ferenc Wágner napsal(a):
>
>> Have you got any plans/timeline for 2.4.2 yet?
>
> Yep, I'm going to release it in few minutes/hours.

Man, that was quick.  I've got a bunch of typo fixes queued..:) Please
consider announcing upcoming releases a couple of days in advance; as a
packager, I'd much appreciate it.  Maybe even tag release candidates...

Anyway, I've got a question concerning corosync-qnetd.  I run it as
user and group coroqnetd.  Is granting it read access to cert8.db and
key3.db enough for proper operation?  corosync-qnetd-certutil gives
write access to group coroqnetd to everything, which seems unintuitive
to me.  Please note that I've got zero experience with NSS.  But I don't
expect the daemon to change the certificate database.  Should I?
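
Concretely, the kind of tightening I have in mind (paths as created by
corosync-qnetd-certutil, shown only as a sketch):

chown root:coroqnetd /etc/corosync/qnetd/nssdb/cert8.db \
                     /etc/corosync/qnetd/nssdb/key3.db
chmod 0640 /etc/corosync/qnetd/nssdb/cert8.db \
           /etc/corosync/qnetd/nssdb/key3.db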
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Special care needed when upgrading Pacemaker Remote nodes

2016-10-29 Thread Ferenc Wágner
Ken Gaillot  writes:

> This spurred me to complete a long-planned overhaul of Pacemaker
> Explained's "Upgrading" appendix:
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/_upgrading.html
>
> Feedback is welcome.

Since you asked for it..:)

1. Table D.1.: why does a rolling upgrade imply any service outage? Always?

2. Detach method: why use rsc_defaults instead of maintenance mode?

3. When do you think 1.1.16 will be released?  With approximately how
   much ABI incompatibility in the libraries?
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Corosync 2.4.0 is available at corosync.org!

2016-10-28 Thread Ferenc Wágner
Jan Friesse  writes:

> Please note that because of required changes in votequorum,
> libvotequorum is no longer binary compatible. This is reason for
> version bump.

Er, what version bump?  Corosync 2.4.1 still produces
libvotequorum.so.7.0.0 for me, just like Corosync 2.3.6.
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Doing reload right

2016-07-04 Thread Ferenc Wágner
Ken Gaillot  writes:

> Does anyone know of an RA that uses reload correctly?

My resource agents advertise a no-op reload action for handling their
"private" meta attributes.  Meta in the sense that they are used by the
resource agent when performing certain operations, not by the managed
resource itself.  Which means they are trivially changeable online,
without any resource operation whatsoever.
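
Roughly, such an agent advertises reload in its meta-data (sketch, the
timeout is made up):

    <action name="reload" timeout="20" />

marks the affected parameters with unique="0", and implements the reload
action as a no-op returning $OCF_SUCCESS, so changing those parameters
never touches the managed resource itself.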

> Does anyone object to the (backward-incompatible) solution proposed
> here?

I'm all for cleanups, but please keep an online migration path around.
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] DLM standalone without crm ?

2016-06-26 Thread Ferenc Wágner
"Lentes, Bernd"  writes:

> i don't have neither an init-script nor a systemd service file.
> The only packages i find in the repositories concerning dlm are:
> libdlm3-3.00.01-0.31.87
> libdlm-3.00.01-0.31.87
> And i have a kernel module for dlm.
> Nothing else.

Sorry, my experience is limited to DLM 4.
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] DLM standalone without crm ?

2016-06-25 Thread Ferenc Wágner
"Lentes, Bernd"  writes:

> is it possible to have a DLM running without CRM?

Yes.  You'll need to configure fencing, though, since by default DLM
will try to use stonithd (from Pacemaker).  But DLM fencing didn't
handle fencing failures correctly for me, resulting in more nodes being
fenced until quorum was lost.
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] restarting pacemakerd

2016-06-18 Thread Ferenc Wágner
Hi,

Could somebody please elaborate a little why the pacemaker systemd
service file contains "Restart=on-failure"?  I mean that a failed node
gets fenced anyway, so most of the time this would be a futile effort.
On the other hand, one could argue that restarting failed services
should be the default behavior of systemd (or any init system).  Still,
it is not.  I'd be grateful for some insight into the matter.
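
(In case someone wants to experiment: the setting can be overridden
locally with a systemd drop-in, for example a file such as
/etc/systemd/system/pacemaker.service.d/restart.conf (name arbitrary)
containing

    [Service]
    Restart=no

followed by systemctl daemon-reload.)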
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Alert notes

2016-06-16 Thread Ferenc Wágner
Klaus Wenninger <kwenn...@redhat.com> writes:

> On 06/16/2016 11:05 AM, Ferenc Wágner wrote:
>
>> Klaus Wenninger <kwenn...@redhat.com> writes:
>>
>>> On 06/15/2016 06:11 PM, Ferenc Wágner wrote:
>>>
>>>> I think the default timestamp should contain date and time zone
>>>> specification to make it unambigous.
>>>
>>> Idea was to have a trade-off between length and amount of information.
>>
>> I don't think it's worth saving a couple of bytes by dropping this
>> information.  In many cases there will be some way to recover it (from
>> SMTP headers or system logs), but that complicates things.
>
> Wasn't about saving some bytes in the size of a file or so but
> rather to keep readability. If the timestamp fills your screen
> you won't be able to read the actual information...have a look
> at /var/log/messages...
> Pure intention was to have a default that creates a kind of nice-looking
> output together with the file-example to give people an impression
> what they could do with the feature.

I see.  Incidentally, the file example is probably the one which would
profit most from having full timestamps.  And some locking.


>> In a similar vein, keeping the sequence number around would simplify
>> alert ordering and loss detection on the receiver side.  Especially with
>> SNMP, where the transport is unreliable as well.
>
> Nice idea... any OID in mind?

No.  But you can always extend PACEMAKER-MIB.

> Unfortunately the sequence-number we have right now as environment-
> variable is not really fit for this purpose. It counts up with each
> and every alert being sent on a single node. So if you have multiple
> alerts configured you would experience gaps that prevent you from
> using it as loss-detection.

I see, it isn't per alert, unfortunately.  Still better than nothing,
though...

>>>> (BTW I'd prefer to run the alert scripts as a different user than the
>>>> various Pacemaker components, but that would lead too far now.)
>>>
>>> well, something we thought about already and a point where the
>>> new feature breaks the ClusterMon-Interface.
>>> Unfortunately the impact is quite high - crmd has dropped privileges -
>>> but if the pain-level rises high enough ...
>>
>> There's very little room to do this.  You'd need to configure an alert
>> user and group, and store them in the saved uid/gid set before dropping
>> privileges for the crmd process.  Or use a separate daemon for sending
>> alerts, which feels cleaner.
>
> Yes 2nd daemon was the idea. We don't want to give more rights
> to crmd than it needs. Btw. the daemon is there already: lrmd ;-)

It's running as root already, so at least there's no problem changing to
any user.  And the default could be hacluster.

>> You are right.  The snmptrap tool does the string->binary conversion if
>> it gets the correct format.  Otherwise, if the length matches, it does a
>> plain cast to binary, interpreting for example 12:34:56.78 as
>> 12594-58-51,52:58:53.54,.55:56.  Looks like the sample SNMP alert agent
>> shouldn't let the users choose any timestamp-format but
>> %Y-%m-%d,%H:%M:%S.%1N,%:z; unfortunately there's no way to enforce this
>> in the current design. 
>
> Well, generic vs. failsafe  ;-)
> Of course one could introduce something like the metadata in RAs
> to achieve things like that but we wanted to keep the ball flat...
> After all the scripts are just examples...and the timestamp-format
> that should work is given in the header of the script...

More emphasis would help, I think.

>> Maybe it would be more appropriate to get the timestamp from crmd as
>> a high resolution (fractional) epoch all the time, and do the string
>> conversion in the agents as necessary.  One could still control the
>> format via instance_attributes where allowed.  Or keep around the
>> current mechanism as well to reduce code duplication in the agents.
>> Just some ideas...
>
> epoch was actually my first default ...
> additional epoch might be interesting alternative...

It would be useful.  Actually, crm_time_format_hr() currently fails for
any format string ending with any %-escape but N.  For example, "%Yx" is
formatted as "2016x", but "%Y" returns NULL.  You can avoid fixing this
by providing a fractional epoch instead. :)
-- 
Regards,
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Alert notes

2016-06-16 Thread Ferenc Wágner
Klaus Wenninger <kwenn...@redhat.com> writes:

> On 06/15/2016 06:11 PM, Ferenc Wágner wrote:
>
>> Please find some random notes about my adventures testing the new alert
>> system.
>>
>> The first alert example in the documentation has no recipient:
>>
>>     <alert id="my-alert" path="/path/to/my-script.sh" />
>>
>> In the example above, the cluster will call my-script.sh for each
>> event.
>>
>> while the next section starts as:
>>
>> Each alert may be configured with one or more recipients. The cluster
>> will call the agent separately for each recipient.
>
> The goal of the first example is to be as simple as possible.
> But of course it makes sense to mention that it is not compulsory
> to ad a recipient. And I guess it makes sense to point that out
> as it is just ugly to think that you have to fake a recipient while
> it wouldn't make any sense in your context.

I agree.

>> I think the default timestamp should contain date and time zone
>> specification to make it unambigous.
>
> Idea was to have a trade-off between length and amount of information.

I don't think it's worth saving a couple of bytes by dropping this
information.  In many cases there will be some way to recover it (from
SMTP headers or system logs), but that complicates things.

In a similar vein, keeping the sequence number around would simplify
alert ordering and loss detection on the receiver side.  Especially with
SNMP, where the transport is unreliable as well.

>> (BTW I'd prefer to run the alert scripts as a different user than the
>> various Pacemaker components, but that would lead too far now.)
>
> well, something we thought about already and a point where the
> new feature breaks the ClusterMon-Interface.
> Unfortunately the impact is quite high - crmd has dropped privileges -
> but if the pain-level rises high enough ...

There's very little room to do this.  You'd need to configure an alert
user and group, and store them in the saved uid/gid set before dropping
privileges for the crmd process.  Or use a separate daemon for sending
alerts, which feels cleaner.

>> The SNMP agent seems to have a problem with hrSystemDate, which should
>> be an OCTETSTR with strict format, not some plain textual timestamp.
>> But I haven't really looked into this yet.
>
> Actually I had tried it with the snmptrap-tool coming with rhel-7.2
> and it worked with the string given in the example.
> Did you copy it 1-1? There is a typo in the document having the
> double-quotes double. The format is strict and there are actually
> 2 formats allowed - one with timezone and one without. The
> format string given should match the latter.

You are right.  The snmptrap tool does the string->binary conversion if
it gets the correct format.  Otherwise, if the length matches, it does a
plain cast to binary, interpreting for example 12:34:56.78 as
12594-58-51,52:58:53.54,.55:56.  Looks like the sample SNMP alert agent
shouldn't let the users choose any timestamp-format but
%Y-%m-%d,%H:%M:%S.%1N,%:z; unfortunately there's no way to enforce this
in the current design.  Maybe it would be more appropriate to get the
timestamp from crmd as a high resolution (fractional) epoch all the
time, and do the string conversion in the agents as necessary.  One
could still control the format via instance_attributes where allowed.
Or keep around the current mechanism as well to reduce code duplication
in the agents.  Just some ideas...
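
For illustration, the format above rendered with GNU date (assuming
sub-second support) would be something like:

    date +"%Y-%m-%d,%H:%M:%S.%1N,%:z"    # e.g. 2016-06-16,11:05:00.0,+02:00

which is the only shape the string->binary conversion appears to accept.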
-- 
Regards,
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Master-Slave resource Restarted after configuration change

2016-06-10 Thread Ferenc Wágner
Ilia Sokolinski  writes:

> We have a custom Master-Slave resource running on a 3-node pcs cluster on 
> CentOS 7.1
>
> As part of what is supposed to be an NDU we do update some properties of the 
> resource.
> For some reason this causes both Master and Slave instances of the  resource 
> to be restarted.
>
> Since restart takes a fairly long time for us, the update becomes very much 
> disruptive.
>
> Is this expected? 

Yes, if you changed a parameter declared with unique="1" in your resource
agent metadata.
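
The relevant bit of the agent meta-data looks something like this
(sketch, the parameter name is invented):

    <parameter name="config" unique="1" required="1">
      <content type="string"/>
    </parameter>

Any change to such a parameter forces a full stop/start rather than a
reload.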

> We have not seen this behavior with the previous release of pacemaker.

I'm surprised...
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Minimum configuration for dynamically adding a node to a cluster

2016-06-08 Thread Ferenc Wágner
Nikhil Utane  writes:

> Would like to know the best and easiest way to add a new node to an already
> running cluster.
>
> Our limitation:
> 1) pcsd cannot be used since (as per my understanding) it communicates over
> ssh which is prevented.
> 2) No manual editing of corosync.conf

If you use IPv4 multicast for Corosync 2 communication, then you needn't
have a nodelist in corosync.conf.  However, if you want a quorum
provider, then expected_votes must be set correctly, otherwise a small
partition booting up could mistakenly assume it has quorum.  In a live
system all corosync daemons will recognize new nodes and increase their
"live" expected_votes accordingly.  But they won't write this back to
the config file, leading to lack of information on reboot if they can't
learn better from their peers.

> So what I am thinking is, the first node will add nodelist with nodeid: 1
> into its corosync.conf file.
>
> nodelist {
> node {
>   ring0_addr: node1
>   nodeid: 1
> }
> }
>
> The second node to be added will get this information through some other
> means and add itself with nodeid: 2 into it's corosync file.
> Now the question I have is, does node1 also need to be updated with
> information about node 2?

It'd better be, at least to exclude any possibility of clashing nodeids.

> When i tested it locally, the cluster was up even without node1 having
> node2 in its corosync.conf. Node2's corosync had both. If node1 doesn't
> need to be told about node2, is there a way where we don't configure the
> nodes but let them discover each other through the multicast IP (best
> option).

If you use IPv4 multicast and don't specify otherwise, the node IDs are
assigned according to the ring0 addresses (IPv4 addresses are 32 bit
integers after all).  But you still have to update expected_votes.

> Assuming we should add it to keep the files in sync, what's the best way to
> add the node information (either itself or other) preferably through some
> CLI command?

There's no corosync tool to update the config file.  An Augeas lens is
provided for corosync.conf though, which should help with the task (I
myself never tried it).  Then corosync-cfgtool -R makes all daemons in
the cluster reload their config files.
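
A minimal manual procedure would be something like this (sketch; adjust
the vote count to your cluster):

    # in /etc/corosync/corosync.conf on every node:
    quorum {
        provider: corosync_votequorum
        expected_votes: 3
    }
    # then make all daemons in the cluster re-read their config:
    corosync-cfgtool -R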
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] how to "switch on" cLVM ?

2016-06-07 Thread Ferenc Wágner
"Lentes, Bernd" <bernd.len...@helmholtz-muenchen.de> writes:

> - On Jun 7, 2016, at 3:53 PM, Ferenc Wágner wf...@niif.hu wrote:
>
>> "Lentes, Bernd" <bernd.len...@helmholtz-muenchen.de> writes:
>> 
>>> Ok. Does DLM takes care that a LV just can be used on one host ?
>> 
>> No.  Even plain LVM uses locks to serialize access to its metadata
>> (avoid concurrent writes corrupting it).  These locks are provided by
>> the host kernel (locking_type=1).  DLM extends the locking concept to a
>> full cluster from a single host, which is exactly what cLVM needs.  This
>> is activated by locking_type=3.
>
> So DLM and cLVM just takes care that the metadata is consistent.
> None of them controls any access to the LV itself ?

cLVM controls activation as well (besides metadata consistency), but does
not control access to activated LVs, which are cluster-unaware
device-mapper devices, just like under plain LVM.

>>> cLVM just takes care that the naming is the same on all nodes, right?
>> 
>> More than that.  As above, it keeps the LVM metadata consistent amongst
>> the members of the cluster.  It can also activate LVs on all members
>> ("global" activation), or ensure that an LV is active on a single member
>> only ("exclusive" activation).
>> 
>>>>> Later on it's possible that some vm's run on host 1 and some on host 2. 
>>>>> Does
>>>>> clvm needs to be a ressource managed by the cluster manager?
>> 
>> The clvm daemon can be handled as a cloned cluster resource, but it
>> isn't necessary.  It requires corosync (or some other membership/
>> communication layer) and DLM to work.  DLM can be configured to do its
>> own fencing or to use that of Pacemaker (if present).
>> 
>>>>> If i use a fs inside the lv, a "normal" fs like ext3 is sufficient, i 
>>>>> think. But
>>>>> it has to be a cluster ressource, right ?
>> 
>> If your filesystem is a plain cluster resource, then your resource
>> manager will ensure that it isn't mounted on more than one node, and
>> everything should be all right.
>> 
>> Same with VMs on LVs: assuming no LV is used by two VMs (which would
>> bring back the previous problem on another level) and your VMs are
>> non-clone cluster resources, your resource manager will ensure that each
>> LV is used by a single VM only (on whichever host), and everything
>> should be all right, even though your LVs are active on all hosts (which
>> makes live migration possible, if your resource agent supports that).
>
> Does the LV need to be a ressource (if i don't have a FS) ?

No.  (If you use cLVM.  If you don't use cLVM, then your VGs must be
resources, otherwise nothing guarantees the consistency of their
metadata.)

> From what i understand from what you say the LV's are active on all
> hosts, and the ressource manager controls that a VM is just running on
> one host, so the LV is just used by one host. Right ? So it has not to
> be a ressource.

Right.  (The LVs must be active on all hosts to enable free live
migration.  There might be other solutions, because the LVs receive I/O
on one host only at any given time, but then you have to persuade your
hypervisor that the block device it wants will really be available once
migration is complete.)
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] how to "switch on" cLVM ?

2016-06-07 Thread Ferenc Wágner
"Lentes, Bernd"  writes:

> Ok. Does DLM takes care that a LV just can be used on one host ?

No.  Even plain LVM uses locks to serialize access to its metadata
(avoid concurrent writes corrupting it).  These locks are provided by
the host kernel (locking_type=1).  DLM extends the locking concept to a
full cluster from a single host, which is exactly what cLVM needs.  This
is activated by locking_type=3.

> cLVM just takes care that the naming is the same on all nodes, right?

More than that.  As above, it keeps the LVM metadata consistent amongst
the members of the cluster.  It can also activate LVs on all members
("global" activation), or ensure that an LV is active on a single member
only ("exclusive" activation).
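
For example (sketch, VG/LV names made up, clvmd already running
everywhere):

    # /etc/lvm/lvm.conf on each node:
    locking_type = 3
    # activate an LV on all cluster members:
    lvchange -ay vg0/lv0
    # or exclusively, on a single member only:
    lvchange -aey vg0/lv0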

>>> Later on it's possible that some vm's run on host 1 and some on host 2. Does
>>> clvm needs to be a ressource managed by the cluster manager?

The clvm daemon can be handled as a cloned cluster resource, but it
isn't necessary.  It requires corosync (or some other membership/
communication layer) and DLM to work.  DLM can be configured to do its
own fencing or to use that of Pacemaker (if present).

>>> If i use a fs inside the lv, a "normal" fs like ext3 is sufficient, i 
>>> think. But
>>> it has to be a cluster ressource, right ?

If your filesystem is a plain cluster resource, then your resource
manager will ensure that it isn't mounted on more than one node, and
everything should be all right.

Same with VMs on LVs: assuming no LV is used by two VMs (which would
bring back the previous problem on another level) and your VMs are
non-clone cluster resources, your resource manager will ensure that each
LV is used by a single VM only (on whichever host), and everything
should be all right, even though your LVs are active on all hosts (which
makes live migration possible, if your resource agent supports that).
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Can't get nfs4 to work.

2016-06-02 Thread Ferenc Wágner
"Stephano-Shachter, Dylan"  writes:

> I can not figure out why version 4 is not supported.

Have you got fsid=root (or fsid=0) on your root export?
See man exports.
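Something along these lines (paths made up):

    # /etc/exports
    /srv/nfs4        *(rw,sync,fsid=0,crossmnt)
    /srv/nfs4/data   *(rw,sync)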
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Trouble with deb packaging from 1.12 to 1.15

2016-05-17 Thread Ferenc Wágner
Andrey Rogovsky  writes:

> I have deb rules, comes from 1.12 and try apply it to current release.

1.1.14 is available in sid, stretch and jessie-backports, any reason you
can't use those packages?

> In the building I get an error:
> dh_testroot -a
> rm -rf `pwd`/debian/tmp/usr/lib/service_crm.so
> rm -rf `pwd`/debian/tmp/usr/lib/service_crm.la
> rm -rf `pwd`/debian/tmp/usr/lib/service_crm.a
> dh_install --sourcedir=debian/tmp --list-missing
> dh_install: pacemaker missing files (usr/lib*/heartbeat/attrd), aborting

This doesn't seem to come from any recent packaging of Pacemaker.

> I was check buildroot - this directory and symlinks is missing
> Is this correct? May be I need add they manual?

It's expected, unless you configure such a lib(exec)dir explicitly.
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] dlm_controld 4.0.4 exits when crmd is fencing another node

2016-04-27 Thread Ferenc Wágner
David Teigland  writes:

> On Tue, Apr 26, 2016 at 09:57:06PM +0200, Valentin Vidic wrote:
>
>> The bug is caused by the missing braces in the expanded if
>> statement.
>> 
>> Do you think we can get a new version out with this patch as the
>> fencing in 4.0.4 does not work properly due to this issue?
>
> Thanks for seeing that, I'll fix it right away.

I uploaded the new release to Debian.
Sorry for the breakage.
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] operation parallelism

2016-04-22 Thread Ferenc Wágner
Hi,

Are recurring monitor operations constrained by the batch-limit cluster
option?  I ask because I'd like to limit the number of parallel start
and stop operations (because they are resource hungry and potentially
take long) without starving other operations, especially monitors.
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts

2016-04-22 Thread Ferenc Wágner
Ken Gaillot  writes:

> Each alert may have any number of recipients configured. These values
> will simply be passed to the script as arguments. The first recipient
> will also be passed as the CRM_alert_recipient environment variable,
> for compatibility with existing scripts that only support one
> recipient.
> [...]
> In the current implementation, meta-attributes and instance attributes
> may also be specified within the <recipient> block, in which case they
> override any values specified in the <alert> block when sent to that
> recipient.

Sorry, I don't get this.  The first paragraph above tells me that for a
given cluster event each <alert> is run once, with all recipients passed
as command line arguments to the alert executable.  But a single
invocation can only have a single set of environmental variables, so how
can you override instance attributes for individual recipients?
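
To make the question concrete, I mean a configuration like this (sketch,
IDs and addresses invented):

    <alert id="a1" path="/usr/local/bin/alert.sh">
      <recipient id="a1-r1" value="admin@example.com">
        <instance_attributes id="a1-r1-ia">
          <nvpair id="a1-r1-ts" name="timestamp-format" value="%s"/>
        </instance_attributes>
      </recipient>
      <recipient id="a1-r2" value="ops@example.com"/>
    </alert>

If both recipients arrive as arguments of a single invocation, whose
instance attributes end up in the environment?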

> Whether this stays in the final 1.1.15 release or not depends on
> whether people find this to be useful, or confusing.

Now guess..:)
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Utilization zones

2016-04-19 Thread Ferenc Wágner
"Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de> writes:

> Ferenc Wágner <wf...@niif.hu> schrieb am 19.04.2016 um 13:42 in Nachricht
>
>> "Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de> writes:
>> 
>>> Ferenc Wágner <wf...@niif.hu> schrieb am 18.04.2016 um 17:07 in Nachricht
>>> 
>>>> I'm using the "balanced" placement strategy with good success.  It
>>>> distributes our VM resources according to memory size perfectly.
>>>> However, I'd like to take the NUMA topology into account.  That means
>>>> each host should have several capacity pools (of each capacity type) to
>>>> arrange the resources in.  Can Pacemaker do something like this?
>>>
>>> I think you can, but depending on VM technology, the hypervisor may
>>> not care much about NUMA. More details?
>> 
>> The NUMA technology would be handled by the resource agent, if it was
>> told by Pacemaker which utilization zone to use on its host.  I just
>> need the policy engine to do more granular resource placement and
>> communicate the selected zone to the resource agents on the hosts.
>> 
>> I'm pretty sure there's no direct support for this, but there might be
>> different approaches I missed.  Thus I'm looking for ideas here.
>
> My initial idea was this: Define a memory resource for every NUMA pool
> on each host, then assign your resources to NUMA pools (utilization):
> The resources will pick some host, but when one pool is full, your
> resources cannot go to another pool. Is something like this what you
> wanted?

Yes, and you also see correctly why this solution is unsatisfactory: I
don't want to tie my resources to a fraction of the host capacities
(like for example the first NUMA nodes of the hosts).
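
(For the record, that workaround would look something like this in each
node entry, with made-up names and sizes:

    <utilization id="vhbl03-utilization">
      <nvpair id="vhbl03-numa0_mem" name="numa0_mem" value="65536"/>
      <nvpair id="vhbl03-numa1_mem" name="numa1_mem" value="65536"/>
    </utilization>

with every VM consuming either numa0_mem or numa1_mem, which is exactly
the static assignment I'd like to avoid.)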

If nothing better comes up, I'll probably interleave all my VM memory
and forget about the NUMA topology until I find the time to implement a
new placement strategy.  That would be an unfortunate pessimization,
though.
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Utilization zones

2016-04-19 Thread Ferenc Wágner
"Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de> writes:

> Ferenc Wágner <wf...@niif.hu> schrieb am 18.04.2016 um 17:07 in Nachricht
> 
>> I'm using the "balanced" placement strategy with good success.  It
>> distributes our VM resources according to memory size perfectly.
>> However, I'd like to take the NUMA topology into account.  That means
>> each host should have several capacity pools (of each capacity type) to
>> arrange the resources in.  Can Pacemaker do something like this?
>
> I think you can, but depending on VM technology, the hypervisor may
> not care much about NUMA. More details?

The NUMA technology would be handled by the resource agent, if it was
told by Pacemaker which utilization zone to use on its host.  I just
need the policy engine to do more granular resource placement and
communicate the selected zone to the resource agents on the hosts.

I'm pretty sure there's no direct support for this, but there might be
different approaches I missed.  Thus I'm looking for ideas here.
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Utilization zones

2016-04-18 Thread Ferenc Wágner
Hi,

I'm using the "balanced" placement strategy with good success.  It
distributes our VM resources according to memory size perfectly.
However, I'd like to take the NUMA topology into account.  That means
each host should have several capacity pools (of each capacity type) to
arrange the resources in.  Can Pacemaker do something like this?
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] crmd error: Cannot route message to unknown node

2016-04-07 Thread Ferenc Wágner
Hi,

On a freshly rebooted cluster node (after crm_mon reports it as
'online'), I get the following:

wferi@vhbl08:~$ sudo crm_resource -r vm-cedar --cleanup
Cleaning up vm-cedar on vhbl03, removing fail-count-vm-cedar
Cleaning up vm-cedar on vhbl04, removing fail-count-vm-cedar
Cleaning up vm-cedar on vhbl05, removing fail-count-vm-cedar
Cleaning up vm-cedar on vhbl06, removing fail-count-vm-cedar
Cleaning up vm-cedar on vhbl07, removing fail-count-vm-cedar
Cleaning up vm-cedar on vhbl08, removing fail-count-vm-cedar
Waiting for 6 replies from the CRMd..No messages received in 60 seconds.. aborting

Meanwhile, this is written into syslog (I can also provide info level
logs if necessary):

22:03:02 vhbl08 crmd[8990]:    error: Cannot route message to unknown node vhbl03
22:03:02 vhbl08 crmd[8990]:    error: Cannot route message to unknown node vhbl04
22:03:02 vhbl08 crmd[8990]:    error: Cannot route message to unknown node vhbl06
22:03:02 vhbl08 crmd[8990]:    error: Cannot route message to unknown node vhbl07
22:03:04 vhbl08 crmd[8990]:   notice: Operation vm-cedar_monitor_0: not running (node=vhbl08, call=626, rc=7, cib-update=169, confirmed=true)

For background:

wferi@vhbl08:~$ sudo cibadmin --scope=nodes -Q
[XML mangled by the archive: the <nodes> section, with one <node> entry
for each of vhbl03 through vhbl08]


Why does this happen?  I've got no node names in corosync.conf, but
Pacemaker defaults to uname -n all right.
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: spread out resources

2016-04-04 Thread Ferenc Wágner
"Ulrich Windl"  writes:

> Actually form my SLES11 SP[1-4] experience, the cluster always
> distributes resources across all available nodes, and only if don't
> want that, I'll have to add constraints. I wonder why that does not
> seem to work for you.

Because I'd like to spread small subsets of the resources (one such
subset is A, B, C and D) independently.
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] spread out resources

2016-04-02 Thread Ferenc Wágner
Ken Gaillot <kgail...@redhat.com> writes:

> On 03/30/2016 08:37 PM, Ferenc Wágner wrote:
> 
>> I've got a couple of resources (A, B, C, D, ... more than cluster nodes)
>> that I want to spread out to different nodes as much as possible.  They
>> are all the same, there's no distinguished one amongst them.  I tried
>> 
>> [constraint XML mangled by the archive: an <rsc_colocation> with a
>> negative score containing the same <resource_set> (A, B, C and D,
>> sequential="false") twice]
>> But crm_simulate did not finish with the above in the CIB.
>> What's a good way to get this working?
>
> Per the docs, "A colocated set with sequential=false makes sense only if
> there is another set in the constraint. Otherwise, the constraint has no
> effect." Using sequential=false would allow another set to depend on all
> these resources, without them depending on each other.

That was the very idea behind the above colocation constraint: it
contains the same group twice.  Yeah, it's somewhat contrived, but I had
no other idea with any chance of success.  And this one failed as well.

> I haven't actually tried resource sets with negative scores, so I'm not
> sure what happens there. With sequential=true, I'd guess that each
> resource would avoid the resource listed before it, but not necessarily
> any of the others.

Probably, but that isn't what I'm after.

> By default, pacemaker does spread things out as evenly as possible, so I
> don't think anything special is needed.

Yes, but only on the scale of all resources.  And I've also got a
hundred independent ones, which wash out this global spreading effect if
you consider only a select handful.

> If you want more control over the assignment, you can look into
> placement strategies:

We use balanced placement to account for the different memory
requirements of the various resources globally.  It would be possible to
introduce a new, artifical utilization "dimension" for each resource
group we want to spread independently, but this doesn't sound very
compelling.  For sets of two resources, a simple negative colocation
constraint works very well; it'd be a pity if it wasn't possible to
extend this concept to larger sets.
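
For a pair, the working constraint is simply something like (sketch, IDs
and score made up):

    <rsc_colocation id="A-apart-from-B" rsc="A" with-rsc="B" score="-INFINITY"/>

but spelling out every pair of a larger set quickly gets unwieldy.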
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] dlm reason for leaving the cluster changes when stopping gfs2-utils service

2016-03-23 Thread Ferenc Wágner
(Please post only to the list, or at least keep it amongst the Cc-s.)

Momcilo Medic <fedorau...@fedoraproject.org> writes:

> On Wed, Mar 23, 2016 at 1:56 PM, Ferenc Wágner <wf...@niif.hu> wrote:
>> Momcilo Medic <fedorau...@fedoraproject.org> writes:
>>
>>> I have three hosts setup in my test environment.
>>> They each have two connections to the SAN which has GFS2 on it.
>>>
>>> Everything works like a charm, except when I reboot a host.
>>> Once it tries to stop gfs2-utils service it will just hang.
>>
>> Are you sure the OS reboot sequence does not stop the network or
>> corosync before GFS and DLM?
>
> I specifically configured services to start in this order:
> Corosync - DLM - GFS2-utils
> and to shutdown in this order:
> GFS2-utils - DLM - Corosync.
>
> I've acomplish this with:
>  update-rc.d -f corosync remove
>  update-rc.d -f corosync-notifyd remove
>  update-rc.d -f dlm remove
>  update-rc.d -f gfs2-utils remove
>  update-rc.d -f xendomains remove
>  update-rc.d corosync start 25 2 3 4 5 . stop 35 0 1 6 .
>  update-rc.d corosync-notifyd start 25 2 3 4 5 . stop 35 0 1 6 .
>  update-rc.d dlm start 30 2 3 4 5 . stop 30 0 1 6 .
>  update-rc.d gfs2-utils start 35 2 3 4 5 . stop 25 0 1 6 .
>  update-rc.d xendomains start 40 2 3 4 5 . stop 20 0 1 6 .

I don't know your OS; the above may or may not work.

> Also, the moment I was capturing logs, corosync and dlm were not
> running as services, but in foreground debugging mode.
> SSH connection did not break until I powered down the host so network
> is not stopped either.

At least you've got interactive debugging ability then.  So try to find
out why the Corosync membership broke down.  The output of
corosync-quorumtool and corosync-cpgtool might help.  Also try pinging
the Corosync ring0 addresses between the nodes.
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] dlm reason for leaving the cluster changes when stopping gfs2-utils service

2016-03-23 Thread Ferenc Wágner
Momcilo Medic  writes:

> I have three hosts setup in my test environment.
> They each have two connections to the SAN which has GFS2 on it.
>
> Everything works like a charm, except when I reboot a host.
> Once it tries to stop gfs2-utils service it will just hang.

Are you sure the OS reboot sequence does not stop the network or
corosync before GFS and DLM?
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] "No such device" with fence_pve agent

2016-03-23 Thread Ferenc Wágner
Ken Gaillot  writes:

> There is a fence parameter pcmk_host_check that specifies how pacemaker
> determines which fence devices can fence which nodes. The default is
> dynamic-list, which means to run the fence agent's list command to get
> the nodes.  [...]
>
> You can specify pcmk_host_list or pcmk_host_map to use a static target
> list for the device.

I meant to research this, but now that you brought it up: does the
default of pcmk_host_check automatically change to static-list if
pcmk_host_list is defined?

Does pcmk_host_map override pcmk_host_list?  Does it play together with
pcmk_host_check=dynamic-list?
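
For reference, I'm thinking of a device defined roughly like this
(sketch; node names and plug IDs are invented, other required parameters
omitted):

    <primitive id="fence-pve" class="stonith" type="fence_pve">
      <instance_attributes id="fence-pve-ia">
        <nvpair id="fp-list" name="pcmk_host_list" value="node1 node2"/>
        <nvpair id="fp-map" name="pcmk_host_map" value="node1:101;node2:102"/>
      </instance_attributes>
    </primitive>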
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker startup-fencing

2016-03-19 Thread Ferenc Wágner
Andrei Borzenkov <arvidj...@gmail.com> writes:

> On Wed, Mar 16, 2016 at 2:22 PM, Ferenc Wágner <wf...@niif.hu> wrote:
>
>> Pacemaker explained says about this cluster option:
>>
>> Advanced Use Only: Should the cluster shoot unseen nodes? Not using
>> the default is very unsafe!
>>
>> 1. What are those "unseen" nodes?
>
> Nodes that lost communication with other nodes (think of unplugging cables)

Translating to node status, does it mean UNCLEAN (offline) nodes which
suddenly return?  Can Pacemaker tell these apart from abruptly power
cycled nodes (when reboot happens before the comeback)?  I guess if a
node was successfully fenced at the time, it won't be considered
UNCLEAN, but is that the only way to avoid that?

>> And a possibly related question:
>>
>> 2. If I've got UNCLEAN (offline) nodes, is there a way to clean them up,
>>so that they don't get fenced when I switch them on?  I mean without
>>removing the node altogether, to keep its capacity settings for
>>example.
>
> You can declare node as down using "crm node clearstate". You should
> not really do it unless you ascertained that node is actually
> physically down.

Great.  Is there an equivalent in bare bones Pacemaker, that is, not
involving the CRM shell?  Like deleting some status or LRMD history
element of the node, for example?

>> And some more about fencing:
>>
>> 3. What's the difference in cluster behavior between
>>- stonith-enabled=FALSE (9.3.2: how often will the stop operation be 
>> retried?)
>>- having no configured STONITH devices (resources won't be started, 
>> right?)
>>- failing to STONITH with some error (on every node)
>>- timing out the STONITH operation
>>- manual fencing
>
> I do not think there is much difference. Without fencing pacemaker
> cannot make decision to relocate resources so cluster will be stuck.

Then I wonder why I hear the "must have working fencing if you value
your data" mantra so often (and always without explanation).  After all,
it does not risk the data, only the automatic cluster recovery, right?

>> 4. What's the modern way to do manual fencing?  (stonith_admin
>>--confirm + what?
>
> node name.

:) I worded that question really poorly.  I meant to ask what kind of
cluster (STONITH) configuration makes the cluster sit patiently until I
do the manual fencing, then carry on without timeouts or other errors.
Just as if some automatic fencing agent did the job, but letting me
investigate the node status beforehand.
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Pacemaker startup-fencing

2016-03-19 Thread Ferenc Wágner
Hi,

Pacemaker explained says about this cluster option:

Advanced Use Only: Should the cluster shoot unseen nodes? Not using
the default is very unsafe!

1. What are those "unseen" nodes?

And a possibly related question:

2. If I've got UNCLEAN (offline) nodes, is there a way to clean them up,
   so that they don't get fenced when I switch them on?  I mean without
   removing the node altogether, to keep its capacity settings for
   example.

And some more about fencing:

3. What's the difference in cluster behavior between
   - stonith-enabled=FALSE (9.3.2: how often will the stop operation be 
retried?)
   - having no configured STONITH devices (resources won't be started, right?)
   - failing to STONITH with some error (on every node)
   - timing out the STONITH operation
   - manual fencing

4. What's the modern way to do manual fencing?  (stonith_admin
   --confirm + what?  I ask because meatware.so comes from
   cluster-glue and uses the old API).
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker startup-fencing

2016-03-18 Thread Ferenc Wágner
Andrei Borzenkov <arvidj...@gmail.com> writes:

> On Wed, Mar 16, 2016 at 4:18 PM, Lars Ellenberg <lars.ellenb...@linbit.com> 
> wrote:
>
>> On Wed, Mar 16, 2016 at 01:47:52PM +0100, Ferenc Wágner wrote:
>>
>>>>> And some more about fencing:
>>>>>
>>>>> 3. What's the difference in cluster behavior between
>>>>>- stonith-enabled=FALSE (9.3.2: how often will the stop operation be 
>>>>> retried?)
>>>>>- having no configured STONITH devices (resources won't be started, 
>>>>> right?)
>>>>>- failing to STONITH with some error (on every node)
>>>>>- timing out the STONITH operation
>>>>>- manual fencing
>>>>
>>>> I do not think there is much difference. Without fencing pacemaker
>>>> cannot make decision to relocate resources so cluster will be stuck.
>>>
>>> Then I wonder why I hear the "must have working fencing if you value
>>> your data" mantra so often (and always without explanation).  After all,
>>> it does not risk the data, only the automatic cluster recovery, right?
>>
>> stonith-enabled=false
>> means:
>> if some node becomes unresponsive,
>> it is immediately *assumed* it was "clean" dead.
>> no fencing takes place,
>> resource takeover happens without further protection.
>
> Oh! Actually it is not quite clear from documentation; documentation
> does not explain what happens in case of stonith-enabled=false at all.

Yes, this is a crucially important piece of information, which should be
prominently announced in the documentation.  Thanks for spelling it out,
Lars.  Hope you don't mind that I turned your text into
https://github.com/ClusterLabs/pacemaker/pull/960.
-- 
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] GFS and cLVM fencing requirements with DLM

2016-03-15 Thread Ferenc Wágner
Hi,

I'm referring here to an ancient LKML thread introducing DLM.  In
http://article.gmane.org/gmane.linux.kernel/299788 David Teigland
states:

GFS requires that a failed node be fenced prior to gfs being told to
begin recovery for that node

which sounds very plausible as according to that thread DLM itself does
not make sure fencing happens before DLM recovery, thus DLM locks could
be granted to others before the failed node is fenced (if at all).

Now more than ten years passed and I wonder

1. if the above is still true (or maybe my interpretation was wrong to
   start with)

2. how it is arranged for in the GFS2 code (I failed to find it with
   naive search phrases)

3. whether clvmd does the same

4. what are the pros/cons of disabling DLM fencing (even the dlm_stonith
   proxy) and leaving fencing fully to the resource manager (Pacemaker)
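
(The concrete setting I have in mind is simply

    # /etc/dlm/dlm.conf
    enable_fencing=0

and leaving stonithd to do all the actual fencing.)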
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Regular pengine warnings after a transient failure

2016-03-08 Thread Ferenc Wágner
Ken Gaillot <kgail...@redhat.com> writes:

> On 03/07/2016 02:03 PM, Ferenc Wágner wrote:
>
>> The transition-keys match, does this mean that the above is a late
>> result from the monitor operation which was considered timed-out
>> previously?  How did it reach vhbl07, if the DC at that time was vhbl03?
>> 
>>> The pe-input files from the transitions around here should help.
>> 
>> They are available.  What shall I look for?
>
> It's not the most user-friendly of tools, but crm_simulate can show how
> the cluster would react to each transition: crm_simulate -Sx $FILE.bz2

$ /usr/sbin/crm_simulate -Sx pe-input-430.bz2 -D recover_many.dot
[...]
$ dot recover_many.dot -Tpng >recover_many.png
dot: graph is too large for cairo-renderer bitmaps. Scaling by 0.573572 to fit

The result is a 32767x254 bitmap of green ellipses connected by arrows.
Most arrows are impossible to follow, but the picture seems to agree
with the textual output from crm_simulate:

* 30 FAILED resources on vhbl05 are to be recovered
* 32 Stopped resources are to be started (these are actually running,
  but considered Stopped as a consequence of the crmd restart on vhbl03)

On the other hand, simulation based on pe-input-431.bz2 reports
* only 2 FAILED resources to recover on vhbl05
* 36 resources to start (the 4 new are the ones whose recoveries started
  during the previous -- aborted -- transition)

I failed to extract anything more out of these simulations than what was
already known from the logs.  But I'm happy to see that the cluster
probes the disappeared resources on vhbl03 (where they disappeared with
the crmd restart) even though it plans to start some of them on other
nodes.
-- 
Regards,
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Regular pengine warnings after a transient failure

2016-03-08 Thread Ferenc Wágner
Andrew Beekhof <abeek...@redhat.com> writes:

> On Tue, Mar 8, 2016 at 7:03 AM, Ferenc Wágner <wf...@niif.hu> wrote:
>
>> Ken Gaillot <kgail...@redhat.com> writes:
>>
>>> On 03/07/2016 07:31 AM, Ferenc Wágner wrote:
>>>
>>>> 12:55:13 vhbl07 crmd[8484]: notice: Transition aborted by 
>>>> vm-eiffel_monitor_6 'create' on vhbl05: Foreign event 
>>>> (magic=0:0;521:0:0:634eef05-39c1-4093-94d4-8d624b423bb7, cib=0.613.98, 
>>>> source=process_graph_event:600, 0)
>>>
>>> That means the action was initiated by a different node (the previous DC
>>> presumably),
>
> I suspect s/previous/other/

Is there a way to find out for sure?

> With a stuck machine its entirely possible that the other nodes elected a
> new leader.
> Would I be right in guessing that fencing is disabled?

No, fencing is enabled.  However, Corosync did not experience any
problem.  I guess it's locked in memory and doesn't need storage
anymore.

(I elided the rest of your answer, thanks for those parts, too; I think
those are settled now.)
-- 
Thanks,
Feri

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

