Re: [ClusterLabs] VirtualDomain live migration error

2017-09-01 Thread Oscar Segarra
Hi, I have updated the known_hosts: Now, I get the following error: Sep 02 01:03:41 [1535] vdicnode01cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='vm-vdicdb01']/lrm_rsc_op[@id='vm-vdicdb01_last_0']: @operation_key=vm-vdic
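
A minimal sketch of refreshing known_hosts for ssh-based live migration, run as the user that actually performs the migration; the second node name is an assumption, only vdicnode01 appears in the log above:

    # add (hashed) host keys of all cluster nodes for this user
    ssh-keyscan -H vdicnode01 vdicnode02 >> ~/.ssh/known_hosts
    # confirm a non-interactive connection now works in both directions
    ssh -o BatchMode=yes vdicnode02 true && echo "ssh OK"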

Re: [ClusterLabs] Pacemaker stopped monitoring the resource

2017-09-01 Thread Ken Gaillot
On Fri, 2017-09-01 at 15:06 +0530, Abhay B wrote: > Are you sure the monitor stopped? Pacemaker only logs > recurring monitors > when the status changes. Any successful monitors after this > wouldn't be > logged. > > Yes. Since there were no logs which s
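
For reference, this is roughly how a recurring monitor is declared with pcs; the resource name and intervals below are placeholders, not values from the thread:

    # add a 30-second recurring monitor to a hypothetical resource "my_rsc"
    pcs resource op add my_rsc monitor interval=30s timeout=60s
    # list the operations currently configured for the resource
    pcs resource show my_rsc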

Re: [ClusterLabs] VirtualDomain live migration error

2017-09-01 Thread Ken Gaillot
On Fri, 2017-09-01 at 00:26 +0200, Oscar Segarra wrote: > Hi, > > > Yes, it is > > > The qemu-kvm process is executed by the oneadmin user. > > > When the cluster tries the live migration, which users come into play? > > > Oneadmin > Root > Hacluster > > > I have just configured passwordless ssh
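
A hedged sketch of the per-user passwordless ssh setup being discussed, assuming the oneadmin account and a hypothetical peer node name; it has to be repeated in both directions:

    # on the source node, as the oneadmin user (the account running qemu-kvm)
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa   # only if no key exists yet
    ssh-copy-id oneadmin@vdicnode02            # repeat for every peer node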

Re: [ClusterLabs] Antw: Is there a way to ignore a single monitoring timeout

2017-09-01 Thread Klechomir
On 1.09.2017 17:21, Jan Pokorný wrote: On 01/09/17 09:48 +0300, Klechomir wrote: I have cases, when for an unknown reason a single monitoring request never returns result. So having bigger timeouts doesn't resolve this problem. If I get you right, the pain point here is a command called by the

Re: [ClusterLabs] Antw: Is there a way to ignore a single monitoring timeout

2017-09-01 Thread Jan Pokorný
On 01/09/17 09:48 +0300, Klechomir wrote: > I have cases, when for an unknown reason a single monitoring request > never returns result. > So having bigger timeouts doesn't resolve this problem. If I get you right, the pain point here is a command called by the resource agents during monitor opera
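
One common workaround for a helper command that hangs inside an agent's monitor action is to bound it with coreutils timeout; a sketch only, with a placeholder probe command and illustrative limits:

    # inside the monitor function of a shell resource agent
    if ! timeout 20s some_probe_command; then
        ocf_log warn "probe timed out or failed"
        return $OCF_ERR_GENERIC   # report a failure instead of hanging forever
    fi
    return $OCF_SUCCESS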

[ClusterLabs] Antw: Re: Antw: Re: Antw: Is there a way to ignore a single monitoring timeout

2017-09-01 Thread Ulrich Windl
>>> Klechomir wrote on 01.09.2017 at 13:15 in message <258bd7d9-ed89-f1f0-8f9b-bca7420c6...@gmail.com>: > What I observe is that single monitoring request of different resources > with different resource agents is timing out. > > For example LVM resource (the LVM RA) does this sometimes. We
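
For completeness, raising the monitor timeout on an LVM resource usually looks like this (pcs syntax; resource name and volume group are placeholders), although the thread reports that even such values did not help:

    pcs resource create my_lvm ocf:heartbeat:LVM volgrpname=myvg \
        op monitor interval=60s timeout=300s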

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-01 Thread Klaus Wenninger
On 08/31/2017 11:58 PM, Ferenc Wágner wrote: > Klaus Wenninger writes: > >> Just seen that you are hosting VMs which might make you use KSM ... >> Don't fully remember at the moment but I have some memory of >> issues with KSM and page-locking. >> iirc it was some bug in the kernel memory-manageme
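
Whether KSM is involved can at least be checked, and temporarily ruled out, through the standard sysfs interface; a hedged sketch, not something prescribed in the thread:

    cat /sys/kernel/mm/ksm/run             # 1 = KSM running, 0 = stopped
    cat /sys/kernel/mm/ksm/pages_sharing   # pages currently shared
    echo 0 > /sys/kernel/mm/ksm/run        # stop KSM for testing (as root)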

Re: [ClusterLabs] Antw: Re: Antw: Is there a way to ignore a single monitoring timeout

2017-09-01 Thread Klechomir
On 1 Sep 2017, at 13:15, Klechomir > wrote: What I observe is that single monitoring request of different resources with different resource agents is timing out. For example LVM resource (the LVM RA) does this sometimes. Setting ridiculously high timeouts (5 minutes an

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-01 Thread Ferenc Wágner
Digimer writes: > On 2017-08-29 10:45 AM, Ferenc Wágner wrote: > >> Digimer writes: >> >>> On 2017-08-28 12:07 PM, Ferenc Wágner wrote: >>> [...] While dlm_tool status reports (similar on all nodes): cluster nodeid 167773705 quorate 1 ring seq 3088 3088 daemon now 29414
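
The usual userspace commands for inspecting DLM state, as a reminder of where output like the above comes from:

    dlm_tool status   # membership/fencing state as seen by dlm_controld
    dlm_tool ls       # lockspaces and their change/recovery state
    dlm_tool dump     # dlm_controld debug buffer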

Re: [ClusterLabs] Antw: Re: Antw: Is there a way to ignore a single monitoring timeout

2017-09-01 Thread Kristián Feldsam
Best regards, Kristián Feldsam Tel.: +420 773 303 353, +421 944 137 535 E-mail: supp...@feldhost.cz www.feldhost.cz - FeldHost™ – professional hosting and server services at fair prices. FELDSAM s.r.o. V rohu 434/3 Praha 4 – Libuš, PSČ 142 00 Company ID: 290 60 958, VAT ID: CZ290 60 958 C 200350 ved

Re: [ClusterLabs] Antw: Re: Antw: Is there a way to ignore a single monitoring timeout

2017-09-01 Thread Klechomir
What I observe is that a single monitoring request of different resources, with different resource agents, is timing out. For example, the LVM resource (the LVM RA) does this sometimes. Setting ridiculously high timeouts (5 minutes and more) didn't solve the problem, so I think I'm out of options there
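
If the goal is to tolerate a single failed or timed-out monitor rather than react to it immediately, the usual knobs are migration-threshold and failure-timeout; a hedged pcs sketch with a placeholder resource name:

    # allow two failures before the resource is moved away,
    # and forget a failure after ten minutes
    pcs resource meta my_rsc migration-threshold=2 failure-timeout=10min
    crm_mon -1 -f   # one-shot cluster status including current fail counts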

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-01 Thread Ferenc Wágner
Jan Friesse writes: > wf...@niif.hu writes: > >> Jan Friesse writes: >> >>> wf...@niif.hu writes: >>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day (in August; in May, it happened 0-2 times a day only, it's slowly ramping up): vhbl08 corosync[3687

Re: [ClusterLabs] Pacemaker stopped monitoring the resource

2017-09-01 Thread Abhay B
> > Are you sure the monitor stopped? Pacemaker only logs recurring monitors > when the status changes. Any successful monitors after this wouldn't be > logged. Yes. Since there were no logs which said "RecurringOp: Start recurring monitor" on the node after it had failed. Also there were no lo
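
One way to check this on the affected node is to grep the Pacemaker logs for the recurring-monitor message; log locations differ between distributions, so the paths below are examples only:

    grep -E "RecurringOp|_monitor_" /var/log/cluster/corosync.log | tail -n 50
    # or, on systemd-based machines:
    journalctl -u pacemaker | grep -E "RecurringOp|_monitor_" | tail -n 50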

Re: [ClusterLabs] Antw: Re: Antw: Is there a way to ignore a single monitoring timeout

2017-09-01 Thread Jehan-Guillaume de Rorthais
On Fri, 01 Sep 2017 09:07:16 +0200 "Ulrich Windl" wrote: > >>> Klechomir wrote on 01.09.2017 at 08:48 in message > <9f043557-233d-6c1c-b46d-63f8c2ee5...@gmail.com>: > > Hi Ulrich, > > Have to disagree here. > > > > I have cases, when for an unknown reason a single monitoring request > >

[ClusterLabs] Antw: Re: Antw: Is there a way to ignore a single monitoring timeout

2017-09-01 Thread Ulrich Windl
>>> Klechomir wrote on 01.09.2017 at 08:48 in message <9f043557-233d-6c1c-b46d-63f8c2ee5...@gmail.com>: > Hi Ulrich, > Have to disagree here. > > I have cases, when for an unknown reason a single monitoring request > never returns result. > So having bigger timeouts doesn't resolve this prob

[ClusterLabs] Antw: Re: Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-01 Thread Ulrich Windl
Hi! I don't know the answer, but I wonder what would happen if corosync ran at normal scheduling priority. My suspicion is that something is wrong, and using the highest real-time priority could be the wrong fix for that problem ;-) Personally I think a process that does disk I/O and is waiting for net
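
For experimentation only, the scheduling class of a running corosync can be inspected and changed with chrt from util-linux; this sketches what "normal scheduling priority" would mean, it is not a recommendation:

    chrt -p $(pidof corosync)       # show current policy (normally SCHED_RR)
    chrt -o -p 0 $(pidof corosync)  # switch to the normal class, priority 0 (root, for testing)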