Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-19 Thread Jan Friesse
Edwin, On 19/02/2019 17:02, Klaus Wenninger wrote: On 02/19/2019 05:41 PM, Edwin Török wrote: On 19/02/2019 16:26, Edwin Török wrote: On 18/02/2019 18:27, Edwin Török wrote: Did a test today with CentOS 7.6 with upstream kernel and with 4.20.10-1.el7.elrepo.x86_64 (tested both with

[ClusterLabs] Antw: Re: Why Do All The Services Go Down When Just One Fails?

2019-02-19 Thread Ulrich Windl
>>> Eric Robinson wrote on 19.02.2019 at 21:06 in message >> -Original Message- >> From: Users On Behalf Of Ken Gaillot >> Sent: Tuesday, February 19, 2019 10:31 AM >> To: Cluster Labs - All topics related to open-source clustering welcomed >> >> Subject: Re: [ClusterLabs] Why Do

[ClusterLabs] Antw: Re: corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-19 Thread Ulrich Windl
>>> Klaus Wenninger wrote on 19.02.2019 at 18:02 in message <7b626ca1-4f59-6257-bfb5-ef5d0d823...@redhat.com>: [...] >> >> It is looping on: >> debug Feb 19 16:37:24 mcast_sendmsg(408):12: sendmsg(mcast) failed >> (non-critical): Resource temporarily unavailable (11) I wonder whether this

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-19 Thread Andrei Borzenkov
19.02.2019 23:06, Eric Robinson wrote: ... > Bottom line is, how do we configure the cluster in such a way that > there are no cascading circumstances when a MySQL resource fails? > Basically, if a MySQL resource fails, it fails. We'll deal with that > on an ad-hoc basis. I don't want the whole

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-19 Thread Ken Gaillot
On Tue, 2019-02-19 at 20:06 +, Eric Robinson wrote: > > -Original Message- > > From: Users On Behalf Of Ken > > Gaillot > > Sent: Tuesday, February 19, 2019 10:31 AM > > To: Cluster Labs - All topics related to open-source clustering > > welcomed > > > > Subject: Re: [ClusterLabs]

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-19 Thread Eric Robinson
> -Original Message- > From: Users On Behalf Of Ken Gaillot > Sent: Tuesday, February 19, 2019 10:31 AM > To: Cluster Labs - All topics related to open-source clustering welcomed > > Subject: Re: [ClusterLabs] Why Do All The Services Go Down When Just One > Fails? > > On Tue, 2019-02-19

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-19 Thread Ken Gaillot
On Tue, 2019-02-19 at 17:40 +, Eric Robinson wrote: > > -Original Message- > > From: Users On Behalf Of Andrei > > Borzenkov > > Sent: Sunday, February 17, 2019 11:56 AM > > To: users@clusterlabs.org > > Subject: Re: [ClusterLabs] Why Do All The Services Go Down When > > Just One > >

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-19 Thread Klaus Wenninger
On 02/19/2019 06:21 PM, Edwin Török wrote: > > On 19/02/2019 17:02, Klaus Wenninger wrote: >> On 02/19/2019 05:41 PM, Edwin Török wrote: >>> On 19/02/2019 16:26, Edwin Török wrote: On 18/02/2019 18:27, Edwin Török wrote: > Did a test today with CentOS 7.6 with upstream kernel and with

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-19 Thread Eric Robinson
> -Original Message- > From: Users On Behalf Of Andrei > Borzenkov > Sent: Sunday, February 17, 2019 11:56 AM > To: users@clusterlabs.org > Subject: Re: [ClusterLabs] Why Do All The Services Go Down When Just One > Fails? > > 17.02.2019 0:44, Eric Robinson wrote: > > Thanks for the

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-19 Thread Edwin Török
On 19/02/2019 17:02, Klaus Wenninger wrote: > On 02/19/2019 05:41 PM, Edwin Török wrote: >> On 19/02/2019 16:26, Edwin Török wrote: >>> On 18/02/2019 18:27, Edwin Török wrote: Did a test today with CentOS 7.6 with upstream kernel and with 4.20.10-1.el7.elrepo.x86_64 (tested both with

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-19 Thread Klaus Wenninger
On 02/19/2019 05:41 PM, Edwin Török wrote: > On 19/02/2019 16:26, Edwin Török wrote: >> On 18/02/2019 18:27, Edwin Török wrote: >>> Did a test today with CentOS 7.6 with upstream kernel and with >>> 4.20.10-1.el7.elrepo.x86_64 (tested both with upstream SBD, and our >>> patched [1] SBD) and was

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-19 Thread Edwin Török
On 19/02/2019 16:26, Edwin Török wrote: > On 18/02/2019 18:27, Edwin Török wrote: >> Did a test today with CentOS 7.6 with upstream kernel and with >> 4.20.10-1.el7.elrepo.x86_64 (tested both with upstream SBD, and our >> patched [1] SBD) and was not able to reproduce the issue yet. > > I was

Re: [ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

2019-02-19 Thread Edwin Török
On 18/02/2019 18:27, Edwin Török wrote: > Did a test today with CentOS 7.6 with upstream kernel and with > 4.20.10-1.el7.elrepo.x86_64 (tested both with upstream SBD, and our > patched [1] SBD) and was not able to reproduce the issue yet. I was able to finally reproduce this using only upstream

Re: [ClusterLabs] Documentation for Corosync

2019-02-19 Thread Jan Friesse
Guido, Hi: My name's Guido and I work at an ISP. We are deploying a cluster using Corosync as the cluster engine and we're very happy with how it works. I just want to understand in more depth how Corosync works. I could not find specific material on the web, just how to configure

[ClusterLabs] Documentation for Corosync

2019-02-19 Thread Guido Alvarez
Hi: My name's Guido and I work at an ISP. We are deploying a cluster using Corosync as the cluster engine and we're very happy with how it works. I just want to understand in more depth how Corosync works. I could not find specific material on the web, just how to configure