Re: [ClusterLabs] Corosync lost quorum but DLM still gives locks

2017-11-02 Thread Jean-Marc Saffroy
Replying to myself:

On Wed, 11 Oct 2017, Jean-Marc Saffroy wrote:
> I am caught by surprise with this behaviour of DLM:
> - I have 5 nodes (test VMs)
> - 3 of them have 1 vote for the corosync quorum (they are "voters")
> - 2 of them have 0 votes ("non-voters")

[ClusterLabs] Corosync lost quorum but DLM still gives locks

2017-10-11 Thread Jean-Marc Saffroy
Hi,

I am caught by surprise with this behaviour of DLM:
- I have 5 nodes (test VMs)
- 3 of them have 1 vote for the corosync quorum (they are "voters")
- 2 of them have 0 votes ("non-voters")

So the corosync quorum is 2. On the non-voters, I run DLM and an application that runs it. On DLM, fenci
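The quorum figure in the post above follows from a plain majority rule: 3 votes in total gives floor(3/2) + 1 = 2, and the zero-vote nodes do not change it. Here is a minimal Python sketch of that arithmetic. It assumes a simple majority and ignores votequorum options such as two_node or last_man_standing; the function and variable names are made up for illustration and are not part of corosync.

```python
def majority_quorum(votes):
    """Votes needed for quorum under a simple majority rule (assumption)."""
    total = sum(votes)
    return total // 2 + 1


def is_quorate(alive_votes, quorum):
    """True if the votes of the nodes still reachable meet the quorum."""
    return sum(alive_votes) >= quorum


# Three voters with 1 vote each, two non-voters with 0 votes, as in the post.
node_votes = [1, 1, 1, 0, 0]
quorum = majority_quorum(node_votes)

# Losing one voter keeps quorum; losing two voters loses it,
# no matter how many non-voters are still up.
one_voter_lost = is_quorate([1, 1, 0, 0], quorum)
two_voters_lost = is_quorate([1, 0, 0], quorum)
```

Under these assumptions the two non-voters never contribute to quorum, so the partition that holds them is quorate only if it also contains at least two voters.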

Re: [ClusterLabs] Is "Process pause detected" triggered too easily?

2017-10-04 Thread Jean-Marc Saffroy
On Wed, 4 Oct 2017, Jan Friesse wrote:
> > Could you clarify the formula for me? I don't see how "- 2" and "650"
> > map to this configuration.
>
> Since Corosync 2.3.4 when nodelist is used, totem.token is used only as
> a basis for calculating real token timeout. You can check corosync.conf
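For readers wondering where the "- 2" and "650" come from: as I read corosync.conf(5), when a nodelist with at least 3 nodes is configured, the real token timeout is computed as token + (number_of_nodes - 2) * token_coefficient, with token_coefficient defaulting to 650 ms. A small Python sketch of that calculation follows. The function name is hypothetical, and the token value of 1000 ms in the example is only an assumed input, not a value taken from this thread.

```python
def real_token_timeout_ms(token_ms, node_count, token_coefficient_ms=650):
    """Effective token timeout per the formula described in corosync.conf(5).

    The coefficient only applies when the nodelist has at least 3 nodes
    (my reading of the man page); otherwise totem.token is used as-is.
    """
    if node_count < 3:
        return token_ms
    return token_ms + (node_count - 2) * token_coefficient_ms


# Assumed example inputs: totem.token = 1000 ms, default coefficient.
timeout_5_nodes = real_token_timeout_ms(1000, 5)    # 1000 + 3 * 650
timeout_25_nodes = real_token_timeout_ms(1000, 25)  # 1000 + 23 * 650
```

The point of the sketch is that the effective timeout grows linearly with cluster size, so the totem.token value in the config file is only the starting point of the calculation.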

Re: [ClusterLabs] Is "Process pause detected" triggered too easily?

2017-10-03 Thread Jean-Marc Saffroy
Hi Jan,

On Tue, 3 Oct 2017, Jan Friesse wrote:
> > I hope this makes sense! :)
>
> I would still have some questions :) but that is really not related to
> the problem you have.

Questions are welcome! I am new to this stack, so there is certainly room for learning and for improvement.

> My p

Re: [ClusterLabs] Is "Process pause detected" triggered too easily?

2017-10-02 Thread Jean-Marc Saffroy
On Mon, 2 Oct 2017, Jan Friesse wrote:
> > We had one problem on a real deployment of DLM+corosync (5 voters and 20
> > non-voters, with dlm on those 20, for a specific application that uses
>
> What you mean by voters and non-voters? There is 25 nodes in total and
> each of them is running coro

Re: [ClusterLabs] Is "Process pause detected" triggered too easily?

2017-09-27 Thread Jean-Marc Saffroy
On Wed, 27 Sep 2017, Jan Friesse wrote:
> I don't think scheduling is the case. If scheduler would be the case
> other message (Corosync main process was not scheduled for ...) would
> kick in. This looks more like a something is blocked in totemsrp.

Ah, interesting!

> > Also, it looks like th