I'm fairly new to clustering under Linux. I basically have one shared storage resource right now, using dlm and gfs2. I'm using fibre channel, and when both of my nodes are up (two-node cluster), dlm and gfs2 seem to be operating perfectly. If I reboot node B, node A works fine, and vice versa.
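When both nodes are up, the healthy state can be checked with something like this (a minimal sketch using the standard tools; output omitted):

crm_mon -1          # one-shot cluster status: both nodes online, cl_gfs2 started on both
dlm_tool ls         # the gfs2 lockspace should list both node IDs as members
dlm_tool status     # no pending fence actions in the healthy state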
When node B goes offline unexpectedly and becomes unclean, dlm seems to block all I/O to the shared storage. dlm knows node B is down:

# dlm_tool status
cluster nodeid 1084772368 quorate 1 ring seq 32644 32644
daemon now 865695 fence_pid 18186
fence 1084772369 nodedown pid 18186 actor 1084772368 fail 1527119246 fence 0 now 1527119524
node 1084772368 M add 861439 rem 0 fail 0 fence 0 at 0 0
node 1084772369 X add 865239 rem 865416 fail 865416 fence 0 at 0 0

On the same server, I see these messages in my daemon.log:

May 23 19:52:47 alpha stonith-api[18186]: stonith_api_kick: Could not kick (reboot) node 1084772369/(null) : No route to host (-113)
May 23 19:52:47 alpha dlm_stonith[18186]: kick_helper error -113 nodeid 1084772369

I can recover from the situation by forcing it (or by bringing the other node back online):

dlm_tool fence_ack 1084772369

The cluster config is pretty straightforward:

node 1084772368: alpha
node 1084772369: beta
primitive p_dlm_controld ocf:pacemaker:controld \
        op monitor interval=60 timeout=60 \
        meta target-role=Started \
        params args="-K -L -s 1"
primitive p_fs_gfs2 Filesystem \
        params device="/dev/sdb2" directory="/vms" fstype=gfs2
primitive stonith_sbd stonith:external/sbd \
        params pcmk_delay_max=30 sbd_device="/dev/sdb1" \
        meta target-role=Started
group g_gfs2 p_dlm_controld p_fs_gfs2
clone cl_gfs2 g_gfs2 \
        meta interleave=true target-role=Started
location cli-prefer-cl_gfs2 cl_gfs2 role=Started inf: alpha
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.16-94ff4df \
        cluster-infrastructure=corosync \
        cluster-name=zeta \
        last-lrm-refresh=1525523370 \
        stonith-enabled=true \
        stonith-timeout=20s

Any pointers would be appreciated. I feel like this should be working, but I'm not sure if I've missed something.

Thanks,
Jason
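P.S. One thing that might help narrow this down (a rough sketch, assuming the node names and sbd device from the config above) is to exercise fencing through pacemaker directly, bypassing dlm_stonith:

# verify both nodes have a slot on the shared sbd device
sbd -d /dev/sdb1 list

# ask pacemaker to fence the peer directly; if this also fails,
# the problem is in the stonith layer rather than in dlm
stonith_admin --reboot beta

# once fencing is confirmed, dlm should clear the pending fence
# request and unblock I/O on the gfs2 lockspace
dlm_tool status
dlm_tool ls

If stonith_admin can fence beta but dlm_stonith still can't kick it, the issue is presumably in how dlm_controld hands the fence request off to pacemaker, rather than in the sbd device itself.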