Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-12 Thread Jan Friesse
Ferenc, Jan Friesse writes: Back to the problem you have. It's definitely a HW issue but I'm thinking how to solve it in software. Right now, I can see two ways: 1. Set dog FD to be non-blocking right at the end of setup_watchdog - This is preferred but I'm not sure if
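For readers unfamiliar with the first option mentioned above, here is a minimal sketch of what "setting the FD to be non-blocking" usually means on POSIX systems. It is not corosync's actual code: the helper name `set_nonblocking` is hypothetical, and whether `O_NONBLOCK` changes the behaviour of a given watchdog device depends on its kernel driver, which is the same uncertainty the message itself expresses.

```c
#include <fcntl.h>

/*
 * Hypothetical helper (not part of corosync): switch an already-open
 * file descriptor to non-blocking mode while preserving its other
 * status flags. Returns 0 on success, -1 on error (errno is set by fcntl).
 */
int set_nonblocking(int fd)
{
    /* Read the current file status flags. */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;

    /* Add O_NONBLOCK on top of whatever was already set. */
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    return 0;
}
```

Under these assumptions, the call would be made once on the watchdog descriptor after it has been opened and configured, so that later operations that honour `O_NONBLOCK` return an error instead of stalling the caller.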

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-11 Thread Ferenc Wágner
Jan Friesse writes: > Back to the problem you have. It's definitely a HW issue but I'm thinking > how to solve it in software. Right now, I can see two ways: > 1. Set dog FD to be non-blocking right at the end of setup_watchdog - > This is preferred but I'm not sure if it's

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-11 Thread Ferenc Wágner
Klaus Wenninger writes: > Just for my understanding: You are using watchdog-handling in corosync? Yes, I was. -- Feri ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-11 Thread Valentin Vidic
On Mon, Sep 11, 2017 at 04:18:08PM +0200, Klaus Wenninger wrote: > Just for my understanding: You are using watchdog-handling in corosync? Corosync package in Debian gets built with --enable-watchdog so by default it takes /dev/watchdog during runtime. Don't think SUSE or RedHat packages get

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-11 Thread Klaus Wenninger
On 09/11/2017 12:32 PM, Jan Friesse wrote: > Ferenc, > >> wf...@niif.hu (Ferenc Wágner) writes: >> >>> Jan Friesse writes: >>> wf...@niif.hu writes: > In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day > (in August; in May, it happened 0-2

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-11 Thread Jan Friesse
Ferenc, wf...@niif.hu (Ferenc Wágner) writes: Jan Friesse writes: wf...@niif.hu writes: In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day (in August; in May, it happened 0-2 times a day only, it's slowly ramping up): vhbl08 corosync[3687]:

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-10 Thread Ferenc Wágner
Valentin Vidic writes: > On Sun, Sep 10, 2017 at 08:27:47AM +0200, Ferenc Wágner wrote: > >> Confirmed: setting watchdog_device: off cluster wide got rid of the >> above warnings. > > Interesting, what brand or version of IPMI has this problem? It's a Fujitsu PRIMERGY

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-10 Thread Valentin Vidic
On Sun, Sep 10, 2017 at 08:27:47AM +0200, Ferenc Wágner wrote: > Confirmed: setting watchdog_device: off cluster wide got rid of the > above warnings. Interesting, what brand or version of IPMI has this problem? -- Valentin
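For context, the `watchdog_device: off` setting quoted above would typically look like the fragment below. This is a sketch rather than the poster's actual configuration: it assumes the option lives in the `resources` section of corosync.conf, as described in corosync.conf(5) for builds with watchdog support, and that the same change is applied on every node.

```
# /etc/corosync/corosync.conf (fragment; placement in "resources" is an assumption)
resources {
    # Do not open /dev/watchdog, even if corosync was built with --enable-watchdog
    watchdog_device: off
}
```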

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-10 Thread Ferenc Wágner
wf...@niif.hu (Ferenc Wágner) writes: > Jan Friesse writes: > >> wf...@niif.hu writes: >> >>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day >>> (in August; in May, it happened 0-2 times a day only, it's slowly >>> ramping up): >>> >>> vhbl08

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-05 Thread Ferenc Wágner
Jan Friesse writes: > wf...@niif.hu writes: > >> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day >> (in August; in May, it happened 0-2 times a day only, it's slowly >> ramping up): >> >> vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-01 Thread Klaus Wenninger
On 08/31/2017 11:58 PM, Ferenc Wágner wrote: > Klaus Wenninger writes: > >> Just seen that you are hosting VMs which might make you use KSM ... >> Don't fully remember at the moment but I have some memory of >> issues with KSM and page-locking. >> iirc it was some bug in the

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-01 Thread Ferenc Wágner
Digimer writes: > On 2017-08-29 10:45 AM, Ferenc Wágner wrote: > >> Digimer writes: >> >>> On 2017-08-28 12:07 PM, Ferenc Wágner wrote: >>> [...] While dlm_tool status reports (similar on all nodes): cluster nodeid 167773705 quorate 1

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-01 Thread Ferenc Wágner
Jan Friesse writes: > wf...@niif.hu writes: > >> Jan Friesse writes: >> >>> wf...@niif.hu writes: >>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day (in August; in May, it happened 0-2 times a day only, it's slowly

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-01 Thread Jan Friesse
Ferenc, Jan Friesse writes: wf...@niif.hu writes: In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day (in August; in May, it happened 0-2 times a day only, it's slowly ramping up): vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-31 Thread Ferenc Wágner
Jan Friesse writes: > wf...@niif.hu writes: > >> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day >> (in August; in May, it happened 0-2 times a day only, it's slowly >> ramping up): >> >> vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-31 Thread Ferenc Wágner
Klaus Wenninger writes: > Just seen that you are hosting VMs which might make you use KSM ... > Don't fully remember at the moment but I have some memory of > issues with KSM and page-locking. > iirc it was some bug in the kernel memory-management that should > be fixed a

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-30 Thread Klaus Wenninger
On 08/30/2017 08:54 AM, Jan Friesse wrote: > Ferenc, > >> Jan Friesse writes: >> >>> wf...@niif.hu writes: >>> Jan Friesse writes: > wf...@niif.hu writes: > >> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a >>

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-29 Thread Digimer
On 2017-08-29 10:45 AM, Ferenc Wágner wrote: > Digimer writes: > >> On 2017-08-28 12:07 PM, Ferenc Wágner wrote: >> >>> [...] >>> While dlm_tool status reports (similar on all nodes): >>> >>> cluster nodeid 167773705 quorate 1 ring seq 3088 3088 >>> daemon now 2941405 fence_pid

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-29 Thread Ferenc Wágner
Jan Friesse writes: > wf...@niif.hu writes: > >> Jan Friesse writes: >> >>> wf...@niif.hu writes: >>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day (in August; in May, it happened 0-2 times a day only, it's slowly

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-29 Thread Jan Friesse
Ferenc, Jan Friesse writes: wf...@niif.hu writes: In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day (in August; in May, it happened 0-2 times a day only, it's slowly ramping up): vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-29 Thread Ferenc Wágner
Digimer writes: > On 2017-08-28 12:07 PM, Ferenc Wágner wrote: > >> [...] >> While dlm_tool status reports (similar on all nodes): >> >> cluster nodeid 167773705 quorate 1 ring seq 3088 3088 >> daemon now 2941405 fence_pid 0 >> node 167773705 M add 196 rem 0 fail 0 fence 0 at

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-29 Thread Ferenc Wágner
Jan Friesse writes: > wf...@niif.hu writes: > >> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day >> (in August; in May, it happened 0-2 times a day only, it's slowly >> ramping up): >> >> vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-28 Thread Jan Friesse
Ferenc, Hi, In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day (in August; in May, it happened 0-2 times a day only, it's slowly ramping up): vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new configuration. vhbl03 corosync[3890]: [TOTEM ] A processor

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-28 Thread Digimer
On 2017-08-28 12:07 PM, Ferenc Wágner wrote: > Hi, > > In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day > (in August; in May, it happened 0-2 times a day only, it's slowly > ramping up): > > vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new > configuration.

[ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-28 Thread Ferenc Wágner
Hi, In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day (in August; in May, it happened 0-2 times a day only, it's slowly ramping up): vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new configuration. vhbl03 corosync[3890]: [TOTEM ] A processor failed, forming