Re: [ClusterLabs] How to implement a fencing agent

2018-08-09 Thread Jan Pokorný
On 09/08/18 14:10 -0500, Ryan Thomas wrote: > I did some more investigation and was able to answer two of my questions: > > First, why did "pcs stonith list" not show my fence_foo agent? pcs runs the > meta-data action on the agent to get the description. Since my fence_foo > agent wasn't impleme

Re: [ClusterLabs] DLM recovery stuck

2018-08-09 Thread David Teigland
> If you mean dlm/clvmd_waiters, it's empty on all nodes. Is there > anything else to check? I guess that might be the wrong thing to look at when it's recovery that's blocked, my memory about this isn't great. I think the clues to check for recovery are mainly the dlm kernel messages and maybe:

Re: [ClusterLabs] How to implement a fencing agent

2018-08-09 Thread Ryan Thomas
I did some more investigation and was able to answer two of my questions: First, why did "pcs stonith list" not show my fence_foo agent? pcs runs the meta-data action on the agent to get the description. Since my fence_foo agent wasn't implemented, this would fail, and pcs would not display it. I

Re: [ClusterLabs] DLM recovery stuck

2018-08-09 Thread FeldHost™ Admin
Hi Feri, the rule of thumb is to use a separate dedicated network for corosync traffic. For example, we use two corosync rings: the first, active one on a separate network card and switch, and the second, passive one on a team (bond) device VLAN. Best regards, Kristián Feldsam Tel.: +420 773 303 353, +421 944 137 535 E-mai

Re: [ClusterLabs] DLM recovery stuck

2018-08-09 Thread Ferenc Wágner
David Teigland writes: > On Thu, Aug 09, 2018 at 06:11:48PM +0200, Ferenc Wágner wrote: > >> Almost ten years ago you requested more info in a similar case, let's >> see if we can get further now! > > Hi, the usual cause is that a network message from the dlm has been > lost/dropped/missed. The

Re: [ClusterLabs] How to implement a fencing agent

2018-08-09 Thread Ryan Thomas
Thanks for the advice and information. >> 1. The documentation encourages the use of the python fencing library. >> How does one install this? > Fencing library is basically this file (sans the doubled extension): > https://github.com/ClusterLabs/fence-agents/blob/v4.2.1/lib/fencing.py.py

Re: [ClusterLabs] DLM recovery stuck

2018-08-09 Thread David Teigland
On Thu, Aug 09, 2018 at 06:11:48PM +0200, Ferenc Wágner wrote: > Hi David, > > Almost ten years ago you requested more info in a similar case, let's > see if we can get further now! Hi, the usual cause is that a network message from the dlm has been lost/dropped/missed. The dlm can't recover fro

Re: [ClusterLabs] DLM recovery stuck

2018-08-09 Thread Ferenc Wágner
wf...@niif.hu (Ferenc Wágner) writes: > For a start I attached the dump output from another node. I meant to... 146 dlm_controld 4.0.5 started 146 our_nodeid 167773708 146 found /dev/misc/dlm-control minor 58 146 found /dev/misc/dlm-monitor minor 57 146 found /dev/misc/dlm_plock minor 56 146 /sy

[ClusterLabs] DLM recovery stuck

2018-08-09 Thread Ferenc Wágner
Hi David, Almost ten years ago you requested more info in a similar case, let's see if we can get further now! We're running a 6-node Corosync cluster. DLM is started by systemd: ● dlm.service - dlm control daemon Loaded: loaded (/lib/systemd/system/dlm.service; enabled) Active: active (r

[ClusterLabs] Corosync-qdevice 3.0 - Beta1 is available at GitHub!

2018-08-09 Thread Jan Friesse
I am pleased to announce the first beta version release of Corosync-Qdevice 3.0 available immediately from GitHub at https://github.com/corosync/corosync-qdevice/releases as corosync-qdevice-2.92.0. This release contains mostly bugfixes and support for new NSS database format. Complete chang

Re: [ClusterLabs] How to implement a fencing agent

2018-08-09 Thread Jan Pokorný
On 09/08/18 07:59 +0200, Ulrich Windl wrote: > Ryan Thomas wrote on 08.08.2018 at 23:26 in message: >> I’m attempting to implement a fencing agent. >> >> The ClusterLabs/fence-agent github repo has some helpful information >> including fence-agents/doc/FenceAgentAPI.md, but I haven’t

[ClusterLabs] Example of somewhat confusing log message from crmd

2018-08-09 Thread Ulrich Windl
Hi! While setting up a new clustered VG in SLES11 SP4 (create it, then configure it in the cluster), I noticed this warning message, which is a good example of how log messages could be improved: Aug 9 12:19:41 h10 crmd[20946]: warning: status_from_rc: Action 58 (prm_LVM_VMD:0_monitor_0) on h05