Hi Andrew,

23.12.2010 14:14, Andrew Beekhof wrote:
...
>> Especially I need to understand how pacemaker integrates with cman's
>> fencing/dlm subsystem:
>> *) Do I need to configure fencing in both cman and pacemaker?
>
> No. Just in Pacemaker.
> fenced spins waiting for Pacemaker to make an API call that tells it
> that fencing completed, at which point the dlm can continue.
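For reference, the Pacemaker side of my fencing looks roughly like this (a sketch, not my exact config; fence_ipmilan and its parameters here are placeholders for my real fencing devices):

============
# one stonith resource per node; runs on some other node and kills this one
crm configure primitive st-vd01-b stonith:fence_ipmilan \
        params ipaddr="10.5.4.200" login="admin" passwd="secret" pcmk_host_list="vd01-b" \
        op monitor interval="60s"
crm configure property stonith-enabled="true"
============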
It doesn't seem to be enough even with c6a01b02950b: when I killall -9 corosync on one node (vd01-b, cman id 2), which by chance was the DC, I get the following in the log on the future DC (vd01-d), which, again by chance, runs the stonith resource for vd01-b (only relevant log lines shown):

============
Mar 23 10:08:49 vd01-d corosync[1630]: [TOTEM ] A processor failed, forming new configuration.
Mar 23 10:09:01 vd01-d kernel: dlm: closing connection to node 2
Mar 23 10:09:01 vd01-d crmd: [1875]: info: cman_event_callback: Membership 1582268: quorum retained
Mar 23 10:09:01 vd01-d crmd: [1875]: info: ais_status_callback: status: vd01-b is now lost (was member)
Mar 23 10:09:01 vd01-d crmd: [1875]: info: crm_update_peer: Node vd01-b: id=2 state=lost (new) addr=(null) votes=0 born=1582212 seen=1582264 proc=00000000000000000000000000111312
Mar 23 10:09:01 vd01-d corosync[1630]: [CLM ] Members Left:
Mar 23 10:09:01 vd01-d crmd: [1875]: WARN: check_dead_member: Our DC node (vd01-b) left the cluster
Mar 23 10:09:01 vd01-d corosync[1630]: [CLM ] #011r(0) ip(10.5.4.65)
Mar 23 10:09:01 vd01-d crmd: [1875]: info: send_ais_text: Peer overloaded or membership in flux: Re-sending message (Attempt 1 of 20)
Mar 23 10:09:01 vd01-d corosync[1630]: [QUORUM] Members[15]: 1 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Mar 23 10:09:02 vd01-d corosync[1630]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 23 10:09:02 vd01-d fenced[1688]: fencing deferred to vd01-a
Mar 23 10:09:02 vd01-d crmd: [1875]: info: update_dc: Unset DC vd01-b
============

At this time fenced (on vd01-a, which has cman id 1 and is the fencing domain master) tries to kill that node but fails:

============
Mar 23 10:09:02 vd01-a fenced[1748]: fencing node vd01-b
Mar 23 10:09:02 vd01-a fenced[1748]: fence vd01-b dev 0.0 agent none result: error no method
Mar 23 10:09:02 vd01-a fenced[1748]: fence vd01-b failed
Mar 23 10:09:05 vd01-a fenced[1748]: fencing node vd01-b
Mar 23 10:09:05 vd01-a fenced[1748]: fence vd01-b dev 0.0 agent none result: error no method
Mar 23 10:09:05 vd01-a fenced[1748]: fence vd01-b failed
Mar 23 10:09:08 vd01-a fenced[1748]: fencing node vd01-b
Mar 23 10:09:08 vd01-a fenced[1748]: fence vd01-b dev 0.0 agent none result: error no method
Mar 23 10:09:08 vd01-a fenced[1748]: fence vd01-b failed
============

All DLM-related stuff is blocked. After one minute vd01-d takes over the DC role:

============
Mar 23 10:10:03 vd01-d crmd: [1875]: info: update_dc: Set DC to vd01-d (3.0.5)
============

After that, all monitor operations on resources which depend on the DLM (LVM, GFS) fail with a timeout, and all dependent resources are then stopped, so the cluster stops being highly available. Only almost one more minute later does pacemaker decide to stonith vd01-b:

============
Mar 23 10:10:54 vd01-d crmd: [1875]: WARN: match_down_event: No match for shutdown action on vd01-b
Mar 23 10:10:54 vd01-d crmd: [1875]: info: te_update_diff: Stonith/shutdown of vd01-b not matched
Mar 23 10:10:55 vd01-d pengine: [1874]: WARN: pe_fence_node: Node vd01-b will be fenced because it is un-expectedly down
Mar 23 10:10:55 vd01-d pengine: [1874]: WARN: determine_online_status: Node vd01-b is unclean
============

And one minute later vd01-b is finally fenced:

============
Mar 23 10:12:17 vd01-a crmd: [1935]: info: tengine_stonith_notify: Peer vd01-b was terminated (reboot) by vd01-d for vd01-d (ref=05cd139e-585d-452e-a22d-0ef188a64d81): OK
Mar 23 10:12:17 vd01-a crmd: [1935]: notice: tengine_stonith_notify: Notified CMAN that 'vd01-b' is now fenced
Mar 23 10:12:17 vd01-a crmd: [1935]: notice: tengine_stonith_notify: Confirmed CMAN fencing event for 'vd01-b'
Mar 23 10:12:17 vd01-a fenced[1748]: fence vd01-b overridden by administrator intervention
============

Overall it took three and a half minutes (10:08:49 to 10:12:17) to fence the failed node.
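By the way, the "result: error no method" lines from fenced above simply mean that my cluster.conf has no <fence> methods defined at all; everything is on the Pacemaker side, as recommended. If I duplicate fencing on the cman side, I expect it to look roughly like this (a sketch; the same placeholder fence_ipmilan device as above stands in for my real hardware):

============
<clusternode name="vd01-b" nodeid="2">
        <fence>
                <method name="1">
                        <device name="ipmi-vd01-b"/>
                </method>
        </fence>
</clusternode>
...
<fencedevices>
        <fencedevice name="ipmi-vd01-b" agent="fence_ipmilan" ipaddr="10.5.4.200" login="admin" passwd="secret"/>
</fencedevices>
============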
So, for this kind of failure (a crash of corosync) it could be much safer to duplicate fencing in both cman and pacemaker as sketched above, because then it would take only 15-20 seconds to do the same. I'll check it a bit later: I need to configure fencing in cman first, and also check the case when the fencing domain master fails.

An alternative could be for fenced to ask pacemaker to fence the failed node (is it done this way? see the P.S. below), but this will not help much if the DC fails (my case), because the election of a new DC takes some time too, and (I assume) pacemaker will refuse to do fencing without a DC. And this time is enough for monitor ops to fail (yes, I can configure bigger timeouts, but I generally want the cluster to be as smart as possible).

Would you please comment on this?

Best,
Vladislav
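P.S. To illustrate the alternative above: assuming there is an agent that simply redirects the cman fencing request to Pacemaker (fence_pcmk looks like it is meant for exactly this, correct me if I'm wrong), cluster.conf would be roughly (untested sketch):

============
<clusternode name="vd01-b" nodeid="2">
        <fence>
                <method name="pcmk-redirect">
                        <device name="pcmk" port="vd01-b"/>
                </method>
        </fence>
</clusternode>
...
<fencedevices>
        <fencedevice name="pcmk" agent="fence_pcmk"/>
</fencedevices>
============

But even then the dlm stays blocked for as long as pacemaker needs to elect a new DC and actually run stonith, which is exactly the window described above.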