Re: [ClusterLabs] Regression in Filesystem RA

2017-10-18 Thread Christian Balzer
Hello Dejan, On Tue, 17 Oct 2017 13:13:11 +0200 Dejan Muhamedagic wrote: > Hi Lars, > > On Mon, Oct 16, 2017 at 08:52:04PM +0200, Lars Ellenberg wrote: > > On Mon, Oct 16, 2017 at 08:09:21PM +0200, Dejan Muhamedagic wrote: > > > Hi, > > > > > > On Thu, Oct 12, 2017 at 03:30:30PM +0900, Chris

Re: [ClusterLabs] When resource fails to start it stops an apparently unrelated resource

2017-10-18 Thread Ken Gaillot
On Wed, 2017-10-18 at 16:58 +0200, Gerard Garcia wrote: > I'm using version 1.1.15-11.el7_3.2-e174ec8. As far as I know the > latest stable version in Centos 7.3 > > Gerard Interesting ... this was an undetected bug that was coincidentally fixed by the recent fail-count work released in 1.1.17. T

Re: [ClusterLabs] strange cluster state

2017-10-18 Thread Ken Gaillot
On Fri, 2017-09-29 at 15:32 +0200, Václav Mach wrote: > Hello, > > I am trying to setup simple 2 node cluster. The setup is done with  > ansible. The whole project is available on github at  > https://github.com/lager1/cesnet_HA (README is written in czech, but  > other parts may be relevant). >

Re: [ClusterLabs] monitor failed actions not cleared

2017-10-18 Thread Ken Gaillot
On Mon, 2017-10-02 at 13:29 +, LE COQUIL Pierre-Yves wrote: > Hi, >   > I finally found my mistake: > I have set up the failure-timeout like the lifetime example in the > RedHat Documentation with the value PT1M. > If I set up the failure-timeout with 60, it works like it should. This is a bug
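
For reference, PT1M is an ISO 8601 duration meaning one minute, so it should be equivalent to the plain value 60 (seconds) that worked. The sketch below shows that conversion in Python for the time part only (hours, minutes, seconds). The function name is hypothetical and this is not Pacemaker's own parser, only an illustration of what the two spellings are expected to mean.

```python
import re

# Matches only the time portion of an ISO 8601 duration, e.g. PT1M, PT1H30M, PT45S.
# Date components (years, months, days) are deliberately out of scope for this sketch.
_TIME_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def iso8601_time_duration_to_seconds(text):
    """Convert a duration such as 'PT1M' to a number of seconds (hypothetical helper)."""
    match = _TIME_DURATION.match(text)
    if match is None or text == "PT":
        raise ValueError("unsupported duration: %r" % text)
    hours, minutes, seconds = (int(group) if group else 0 for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


if __name__ == "__main__":
    # The value from the post and the plain-seconds value that worked.
    print(iso8601_time_duration_to_seconds("PT1M"), 60)
```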

Re: [ClusterLabs] VirtualDomain live migration error

2017-10-18 Thread Ken Gaillot
On Sat, 2017-09-02 at 01:21 +0200, Oscar Segarra wrote: > Hi, > > I have updated the known_hosts: > > Now, I get the following error: > > Sep 02 01:03:41 [1535] vdicnode01 cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id

Re: [ClusterLabs] set node in maintenance - stop corosync - node is fenced - is that correct ?

2017-10-18 Thread Lentes, Bernd
- On Oct 16, 2017, at 10:57 PM, kgaillot kgail...@redhat.com wrote: >> from the Changelog: >> >> Changes since Pacemaker-1.1.15 >>   ... >>   + pengine: do not fence a node in maintenance mode if it shuts down >> cleanly >>   ... >> >> just saying ... may or may not be what you are seeing.

Re: [ClusterLabs] set node in maintenance - stop corosync - node is fenced - is that correct ?

2017-10-18 Thread Lentes, Bernd
- On Oct 16, 2017, at 9:27 PM, Digimer li...@alteeve.ca wrote: > > I understood what you meant about it getting fenced after stopping > corosync. What I am not clear on is if you are stopping corosync on the > normal node, or the node that is in maintenance mode. > > In either case, as I

Re: [ClusterLabs] Regression in Filesystem RA

2017-10-18 Thread Christian Balzer
On Mon, 16 Oct 2017 20:52:04 +0200 Lars Ellenberg wrote: > On Mon, Oct 16, 2017 at 08:09:21PM +0200, Dejan Muhamedagic wrote: > > Hi, > > > > On Thu, Oct 12, 2017 at 03:30:30PM +0900, Christian Balzer wrote: > > > > > > Hello, > > > > > > 2nd post in 10 years, lets see if this one gets an ans

Re: [ClusterLabs] corosync race condition when node leaves immediately after joining

2017-10-18 Thread Jan Friesse
Jonathan, On 18/10/17 14:38, Jan Friesse wrote: Can you please try to remove "votequorum_exec_send_nodeinfo(us->node_id);" line from votequorum.c in the votequorum_exec_init_fn function (around line 2306) and let me know if problem persists? Wow! With that change, I'm pleased to say that I'm

Re: [ClusterLabs] When resource fails to start it stops an apparently unrelated resource

2017-10-18 Thread Gerard Garcia
I'm using version 1.1.15-11.el7_3.2-e174ec8. As far as I know the latest stable version in Centos 7.3 Gerard On Wed, Oct 18, 2017 at 4:42 PM, Ken Gaillot wrote: > On Wed, 2017-10-18 at 14:25 +0200, Gerard Garcia wrote: > > So I think I found the problem. The two resources are named forwarder >

Re: [ClusterLabs] When resource fails to start it stops an apparently unrelated resource

2017-10-18 Thread Ken Gaillot
On Wed, 2017-10-18 at 14:25 +0200, Gerard Garcia wrote: > So I think I found the problem. The two resources are named forwarder > and bgpforwarder. It doesn't matter if bgpforwarder exists. It is > just that when I set the failcount to INFINITY to a resource named > bgpforwarder (crm_failcount -r b

Re: [ClusterLabs] corosync race condition when node leaves immediately after joining

2017-10-18 Thread Jonathan Davies
On 18/10/17 14:38, Jan Friesse wrote: Can you please try to remove "votequorum_exec_send_nodeinfo(us->node_id);" line from votequorum.c in the votequorum_exec_init_fn function (around line 2306) and let me know if problem persists? Wow! With that change, I'm pleased to say that I'm not able

Re: [ClusterLabs] corosync race condition when node leaves immediately after joining

2017-10-18 Thread Jan Friesse
Jonathan, On 16/10/17 15:58, Jan Friesse wrote: Jonathan, On 13/10/17 17:24, Jan Friesse wrote: I've done a bit of digging and am getting closer to the root cause of the race. We rely on having votequorum_sync_init called twice -- once when node 1 joins (with member_list_entries=2) and

Re: [ClusterLabs] When resource fails to start it stops an apparently unrelated resource

2017-10-18 Thread Gerard Garcia
So I think I found the problem. The two resources are named forwarder and bgpforwarder. It doesn't matter if bgpforwarder exists. It is just that when I set the failcount to INFINITY to a resource named bgpforwarder (crm_failcount -r bgpforwarder -v INFINITY) it directly affects the forwarder resou
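
The symptom described above, where a fail count set for bgpforwarder also hits forwarder, is what one would expect if a lookup by resource name is not anchored to the full name. The Python sketch below is a hypothetical illustration of that class of bug, not Pacemaker's actual code. The `fail-count-<name>` attribute layout and both function names are assumptions made for the example.

```python
import re

# Hypothetical node attributes. Only bgpforwarder has a fail count set.
attributes = {"fail-count-bgpforwarder": "INFINITY"}


def matching_failcounts_unanchored(attrs, resource):
    """Buggy lookup: accepts any attribute whose name merely ends with the resource name."""
    pattern = re.compile(re.escape(resource) + r"$")
    return [name for name in attrs if pattern.search(name)]


def matching_failcounts_anchored(attrs, resource):
    """Safer lookup: the whole attribute name must match, prefix included."""
    pattern = re.compile(r"^fail-count-" + re.escape(resource) + r"$")
    return [name for name in attrs if pattern.match(name)]


if __name__ == "__main__":
    # The unanchored lookup for "forwarder" wrongly picks up bgpforwarder's entry.
    print(matching_failcounts_unanchored(attributes, "forwarder"))
    # The anchored lookup finds nothing for "forwarder".
    print(matching_failcounts_anchored(attributes, "forwarder"))
```

With the anchored pattern, a resource is only affected by a fail count recorded under exactly its own name.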

Re: [ClusterLabs] Fwd: Stopped DRBD

2017-10-18 Thread Vladislav Bogdanov
Hi, ensure you have two monitor operations configured for your drbd resource: for 'Master' and 'Slave' roles ('Slave' == 'Started' == '' for ms resources). http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_monitoring_multi_state_resources.html 18.10.2017 11:18, Антон

[ClusterLabs] Fwd: Stopped DRBD

2017-10-18 Thread Антон Сацкий
Hi list need your help [root@voipserver ~]# pcs status Cluster name: ClusterKrusher Stack: corosync Current DC: voipserver.backup (version 1.1.16-12.el7_4.2-94ff4df) - partition with quorum Last updated: Tue Oct 17 19:46:05 2017 Last change: Tue Oct 17 19:28:22 2017 by root via cibadmin on voipse