Digimer <li...@alteeve.ca> writes:

> On 2017-08-29 10:45 AM, Ferenc Wágner wrote:
>
>> Digimer <li...@alteeve.ca> writes:
>>
>>> On 2017-08-28 12:07 PM, Ferenc Wágner wrote:
>>>
>>>> [...]
>>>> While dlm_tool status reports (similar on all nodes):
>>>>
>>>> cluster nodeid 167773705 quorate 1 ring seq 3088 3088
>>>> daemon now 2941405 fence_pid 0
>>>> node 167773705 M add 196 rem 0 fail 0 fence 0 at 0 0
>>>> node 167773706 M add 5960 rem 5730 fail 0 fence 0 at 0 0
>>>> node 167773707 M add 2089 rem 1802 fail 0 fence 0 at 0 0
>>>> node 167773708 M add 3646 rem 3413 fail 0 fence 0 at 0 0
>>>> node 167773709 M add 2588921 rem 2588920 fail 0 fence 0 at 0 0
>>>> node 167773710 M add 196 rem 0 fail 0 fence 0 at 0 0
>>>>
>>>> dlm_tool ls shows "kern_stop":
>>>>
>>>> dlm lockspaces
>>>> name clvmd
>>>> id 0x4104eefa
>>>> flags 0x00000004 kern_stop
>>>> change member 5 joined 0 remove 1 failed 1 seq 8,8
>>>> members 167773705 167773706 167773707 167773708 167773710
>>>> new change member 6 joined 1 remove 0 failed 0 seq 9,9
>>>> new status wait messages 1
>>>> new members 167773705 167773706 167773707 167773708 167773709 167773710
>>>>
>>>> on all nodes except for vhbl07 (167773709), where it gives
>>>>
>>>> dlm lockspaces
>>>> name clvmd
>>>> id 0x4104eefa
>>>> flags 0x00000000
>>>> change member 6 joined 1 remove 0 failed 0 seq 11,11
>>>> members 167773705 167773706 167773707 167773708 167773709 167773710
>>>>
>>>> instead.
>>>>
>>>> [...] Is there a way to unblock DLM without rebooting all nodes?
>>>
>>> Looks like the lost node wasn't fenced.
>>
>> Why dlm status does not report any lost node then? Or do I misinterpret
>> its output?
>>
>>> Do you have fencing configured and tested? If not, DLM will block
>>> forever because it won't recover until it has been told that the lost
>>> peer has been fenced, by design.
>>
>> What command would you recommend for unblocking DLM in this case?
>
> First, fix fencing. Do you have that setup and working?
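
For anyone who wants to spot this state across nodes without reading the output by eye, here is a minimal sketch in Python. It is only an illustration, not part of dlm_tool: the function names parse_dlm_ls and blocked_lockspaces are made up for this example, and the parser assumes the "dlm_tool ls" output keeps the line layout quoted above (one "name", "flags", "members" and optional "new members" line per lockspace). It reports each lockspace flagged kern_stop together with the node IDs that appear only in the pending "new members" list.

```python
def parse_dlm_ls(text):
    """Parse `dlm_tool ls` text into a list of lockspace dicts.

    Assumes the layout shown in the quoted output; unknown lines are ignored.
    """
    lockspaces = []
    current = None
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue
        if tokens[0] == "name" and len(tokens) > 1:
            # A "name" line starts a new lockspace record.
            current = {"name": tokens[1], "flags": [],
                       "members": set(), "new_members": set()}
            lockspaces.append(current)
        elif current is None:
            continue  # e.g. the "dlm lockspaces" header line
        elif tokens[0] == "flags":
            # tokens[1] is the hex mask, the rest are symbolic flag names.
            current["flags"] = tokens[2:]
        elif tokens[0] == "members":
            current["members"] = {int(t) for t in tokens[1:]}
        elif tokens[:2] == ["new", "members"]:
            current["new_members"] = {int(t) for t in tokens[2:]}
    return lockspaces


def blocked_lockspaces(text):
    """Return (name, pending_node_ids) for each lockspace in kern_stop."""
    result = []
    for ls in parse_dlm_ls(text):
        if "kern_stop" in ls["flags"]:
            pending = sorted(ls["new_members"] - ls["members"])
            result.append((ls["name"], pending))
    return result


# Sample copied from the output quoted above (all nodes except vhbl07).
SAMPLE_BLOCKED = """\
dlm lockspaces
name clvmd
id 0x4104eefa
flags 0x00000004 kern_stop
change member 5 joined 0 remove 1 failed 1 seq 8,8
members 167773705 167773706 167773707 167773708 167773710
new change member 6 joined 1 remove 0 failed 0 seq 9,9
new status wait messages 1
new members 167773705 167773706 167773707 167773708 167773709 167773710
"""

# Sample copied from the output quoted above (vhbl07 itself).
SAMPLE_OK = """\
dlm lockspaces
name clvmd
id 0x4104eefa
flags 0x00000000
change member 6 joined 1 remove 0 failed 0 seq 11,11
members 167773705 167773706 167773707 167773708 167773709 167773710
"""

if __name__ == "__main__":
    print(blocked_lockspaces(SAMPLE_BLOCKED))  # [('clvmd', [167773709])]
    print(blocked_lockspaces(SAMPLE_OK))       # []
```

On a live node the text could come from running "dlm_tool ls" through subprocess, but the sketch only diagnoses the state; it does not unblock anything.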

I really don't want DLM to do fencing. DLM blocking for a couple of days
is not an issue in this setup (cLVM isn't a "service" of this cluster,
only a rarely needed administration tool). Fencing is set up and works
fine for Pacemaker, so it's used to recover actual HA services. But
letting DLM use it resulted in disaster a year and a half ago (see
Message-ID: <87r3g5a969....@lant.ki.iif.hu>), which I still haven't
managed to understand, and I'd rather not go there again until that's
taken care of properly. So for now, a manual unblock path is all I'm
after.
-- 
Thanks,
Feri

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org