Hi Andrew,

I investigated on my test cluster what actually happens with dlm and fencing.
I added more debug messages to the dlm dump, and also added a re-kick of the node after some time. The result is that the stonith history does not contain any information until pacemaker itself decides to fence the node.

The test case I used: killall -9 dlm_controld.pcmk on one node. After that I see in the dlm dump:

1322748122 dlm:controld conf 3 0 1 memb 1074005258 1090782474 1124336906 join left 1107559690
1322748122 dlm:ls:clvmd conf 3 0 1 memb 1074005258 1090782474 1124336906 join left 1107559690
1322748122 clvmd add_change cg 7 remove nodeid 1107559690 reason 5
1322748122 Requested that node 1107559690 be kicked from the cluster
1322748122 clvmd add_change cg 7 counts member 3 joined 0 remove 1 failed 1
1322748122 clvmd stop_kernel cg 7
1322748122 write "0" to "/sys/kernel/dlm/clvmd/control"
1322748122 It does not appear node 1107559690/vd01-c has been shot
1322748122 clvmd check_fencing 1107559690 wait add 1322748073 fail 1322748122 last 0
1322748122 It does not appear node 1107559690/vd01-c has been shot
1322748123 It does not appear node 1107559690/vd01-c has been shot
...
1322748133 It does not appear node 1107559690/vd01-c has been shot
1322748133 Requested that node 1107559690 be kicked from the cluster
1322748134 It does not appear node 1107559690/vd01-c has been shot
...
1322748276 It does not appear node 1107559690/vd01-c has been shot
1322748276 Requested that node 1107559690 be kicked from the cluster
1322748277 It does not appear node 1107559690/vd01-c has been shot
1322748278 It does not appear node 1107559690/vd01-c has been shot
1322748279 It does not appear node 1107559690/vd01-c has been shot
1322748280 It does not appear node 1107559690/vd01-c has been shot
1322748281 It does not appear node 1107559690/vd01-c has been shot
1322748282 It does not appear node 1107559690/vd01-c has been shot
1322748283 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748284 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748285 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748286 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748287 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748288 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748289 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748290 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748291 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748292 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748293 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748294 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748295 Stonith history[0]: Fencing of node 1107559690/vd01-c is in progress
1322748296 Processing membership 488
1322748296 Skipped active node 1124336906: born-on=476, last-seen=488, this-event=488, last-event=484
1322748296 Skipped active node 1074005258: born-on=484, last-seen=488, this-event=488, last-event=484
1322748296 Skipped active node 1090782474: born-on=464, last-seen=488, this-event=488, last-event=484
1322748296 del_configfs_node rmdir "/sys/kernel/config/dlm/cluster/comms/1107559690"
1322748296 Removed inactive node 1107559690: born-on=468, last-seen=484, this-event=488, last-event=484
1322748296 Stonith history[0]: Node 1107559690/vd01-c fenced at 1322748296
1322748296 Node 1107559690/vd01-c was last shot at: 1322748296
1322748296 clvmd check_fencing 1107559690 done add 1322748073 fail 1322748122 last 1322748296

So the first stonith history entry appeared only at 1322748283, i.e. 161 seconds after the initial fencing request at 1322748122 (1322748283 - 1322748122 = 161). That corresponds to the following log lines (1322748283 = Dec 01 2011 14:04:43 UTC):

Dec 1 14:04:42 vd01-b pengine: [1894]: WARN: stage6: Scheduling Node vd01-c for STONITH
Dec 1 14:04:42 vd01-b pengine: [1894]: WARN: native_stop_constraints: Stop of failed resource dlm:2 is implicit after vd01-c is fenced
Dec 1 14:04:42 vd01-b pengine: [1894]: WARN: native_stop_constraints: Stop of failed resource clvmd:2 is implicit after vd01-c is fenced

From my point of view, this means that the call to crm_terminate_member_no_mainloop() does not actually schedule a fencing operation; fencing only starts once pengine schedules it on its own.

Best,
Vladislav

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org