Hi Ulrich,

It's not that it isn't working for me; it's that, to make it (at least appear to) work, I've had to modify dlm.conf, and I have found no mention of this being necessary in any of the tutorials or walkthroughs I've read. I was curious what ramifications I would encounter by setting 'enable_fencing=0' in dlm.conf.
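For reference, these are the two dlm.conf variants I'm comparing (a sketch showing only the relevant lines; both options are documented in dlm.conf(5)):

```
# /etc/dlm/dlm.conf -- current workaround: dlm_controld never requests fencing
enable_fencing=0

# variant still to be tested: keep DLM fencing enabled, but skip the
# fencing of nodes that are absent when the lockspace first starts up
#enable_startup_fencing=0
```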
From what I can tell, my config is sane with regard to interleave, colocation, and ordering. SBD is a possibility we're considering, but haven't fully embraced as of yet.

I'm still planning on testing 'enable_startup_fencing=0' instead of 'enable_fencing=0' in dlm.conf and will report back (in case anyone is interested).

Best,
-Pat

On Tue, Oct 2, 2018 at 2:25 AM Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de> wrote:

> Hi!
>
> I'm sorry that DLM/cLVM does not work for you. Did you double-check the
> configuration (meta interleave=true, colocation and ordering), especially
> the clones?
> Also, as you have shared storage, why don't you use SBD for fencing?
>
> Regards,
> Ulrich
>
> >>> Patrick Whitney <pwhit...@luminoso.com> schrieb am 01.10.2018 um 22:01 in
> Nachricht
> <cae0zlk_va6gthz9tg3woecua2ridaehaoq8ieqdz4meokcy...@mail.gmail.com>:
> > Hi Ulrich,
> >
> > When I first encountered this issue, I posted this:
> >
> > https://lists.clusterlabs.org/pipermail/users/2018-September/015637.html
> >
> > ... I was using resource fencing in this example, but, as I've mentioned
> > before, the issue would come about not when fencing occurred, but when
> > the fenced node was shut down (we were using resource fencing).
> >
> > During that discussion, you and others suggested that power fencing was
> > the only way DLM was going to cooperate, and one suggestion of using
> > meatware was proposed.
> >
> > Unfortunately, I found out later that meatware was no longer available
> > (https://lists.clusterlabs.org/pipermail/users/2018-September/015715.html),
> > but we were lucky enough that our test environment is KVM/libvirt, so I
> > used fence_virsh. Again, I had the same problem: when the "bad" node was
> > fenced, dlm_controld would issue (what appears to be) a fence_all, I
> > would receive messages that the dlm clone was down on all members, and
> > there would be a log message that the clvm lockspace was abandoned.
> >
> > It was only when I disabled fencing for dlm (enable_fencing=0 in
> > dlm.conf, but with fencing kept enabled in pcmk) that things began to
> > work as expected.
> >
> > One suggestion earlier in this thread was trying the dlm configuration
> > of disabling startup fencing (enable_startup_fencing=0), which sounds
> > like a plausible solution after looking over the logs, but I haven't
> > tested it yet.
> >
> > The conclusion I'm coming to is:
> > 1. The reason DLM cannot handle resource fencing is that it keeps its
> > own "heartbeat/control" channel (for lack of a better term) via the
> > network, and pcmk cannot instruct DLM "Don't worry about that guy over
> > there", which means we must use power fencing, but;
> > 2. DLM does not like to see one of its members disappear; when that does
> > happen, DLM does "something" which causes the lockspace to disappear...
> > unless you disable fencing for DLM.
> >
> > I am now speculating that DLM restarts when the communications fail, and
> > that disabling startup fencing for DLM (enable_startup_fencing=0) may be
> > the solution to my problem (reverting my enable_fencing=0 DLM config).
> >
> > Best,
> > -Pat
> >
> > On Mon, Oct 1, 2018 at 3:38 PM Ulrich Windl <
> > ulrich.wi...@rz.uni-regensburg.de> wrote:
> >
> >> Hi!
> >>
> >> It would be much more helpful if you could provide logs around the
> >> problem events. Personally, I think you _must_ implement proper
> >> fencing. In addition, DLM seems to do its own fencing when there is a
> >> communication problem.
> >>
> >> Regards,
> >> Ulrich
> >>
> >> >>> Patrick Whitney <pwhit...@luminoso.com> 01.10.18 16.25 Uhr >>>
> >> Hi Everyone,
> >>
> >> I wanted to solicit input on my configuration.
> >>
> >> I have a two node (test) cluster running corosync/pacemaker with DLM
> >> and CLVM.
> >>
> >> I was running into an issue where, when one node failed, the remaining
> >> node would appear to do the right thing, from the pcmk perspective,
> >> that is: it would create a new cluster (of one) and fence the other
> >> node. But then, rather surprisingly, DLM would see the other node
> >> offline, and it would go offline itself, abandoning the lockspace.
> >>
> >> I changed my DLM settings to "enable_fencing=0", disabling DLM fencing,
> >> and our tests are now working as expected.
> >>
> >> I'm a little concerned I have masked an issue by doing this, as in all
> >> of the tutorials and docs I've read, there is no mention of having to
> >> configure DLM whatsoever.
> >>
> >> Is anyone else running a similar stack who can comment?
> >>
> >> Best,
> >> -Pat
> >> --
> >> Patrick Whitney
> >> DevOps Engineer -- Tools
> >>
> >> _______________________________________________
> >> Users mailing list: Users@clusterlabs.org
> >> https://lists.clusterlabs.org/mailman/listinfo/users
> >>
> >> Project Home: http://www.clusterlabs.org
> >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >> Bugs: http://bugs.clusterlabs.org
> >
> > --
> > Patrick Whitney
> > DevOps Engineer -- Tools

--
Patrick Whitney
DevOps Engineer -- Tools
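P.S. For anyone following along, the interleave/colocation/ordering layout Ulrich asked about looks roughly like this in pcs syntax. This is a sketch, not my exact commands: the resource and clone names (dlm, clvmd, dlm-clone, clvmd-clone) and the monitor intervals are illustrative.

```shell
# DLM and CLVM as clones; interleave=true so each node's clvmd instance
# only waits on the dlm instance on the same node, not on all nodes
pcs resource create dlm ocf:pacemaker:controld \
    op monitor interval=30s clone interleave=true ordered=true
pcs resource create clvmd ocf:heartbeat:clvm \
    op monitor interval=30s clone interleave=true ordered=true

# clvmd must run where dlm runs, and must start after it
pcs constraint colocation add clvmd-clone with dlm-clone
pcs constraint order start dlm-clone then clvmd-clone
```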