Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-20 Thread Andrei Borzenkov
18.02.2019 18:53, Ken Gaillot wrote: > On Sun, 2019-02-17 at 20:33 +0300, Andrei Borzenkov wrote: >> 17.02.2019 0:33, Andrei Borzenkov wrote: >>> 17.02.2019 0:03, Eric Robinson wrote: Here are the relevant corosync logs. It appears that the stop action for resource p_mysql_002

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-19 Thread Andrei Borzenkov
19.02.2019 23:06, Eric Robinson wrote: ... > Bottom line is, how do we configure the cluster in such a way that > there are no cascading circumstances when a MySQL resource fails? > Basically, if a MySQL resource fails, it fails. We'll deal with that > on an ad-hoc basis. I don't want the whole

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-19 Thread Ken Gaillot
> > > Borzenkov > > > > Sent: Sunday, February 17, 2019 11:56 AM > > > > To: users@clusterlabs.org > > > > Subject: Re: [ClusterLabs] Why Do All The Services Go Down When > > > > Just > > > > One Fails? > > > > >

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-19 Thread Eric Robinson
> -----Original Message----- > From: Users On Behalf Of Ken Gaillot > Sent: Tuesday, February 19, 2019 10:31 AM > To: Cluster Labs - All topics related to open-source clustering welcomed > > Subject: Re: [ClusterLabs] Why Do All The Services Go Down When Just One > Fails? >

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-19 Thread Ken Gaillot
On Tue, 2019-02-19 at 17:40 +0000, Eric Robinson wrote: > > -----Original Message----- > > From: Users On Behalf Of Andrei > > Borzenkov > > Sent: Sunday, February 17, 2019 11:56 AM > > To: users@clusterlabs.org > > Subject: Re: [ClusterLabs] Why Do All Th

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-19 Thread Eric Robinson
> -----Original Message----- > From: Users On Behalf Of Andrei > Borzenkov > Sent: Sunday, February 17, 2019 11:56 AM > To: users@clusterlabs.org > Subject: Re: [ClusterLabs] Why Do All The Services Go Down When Just One > Fails? > > 17.02.2019 0:44, Eric

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-18 Thread Ken Gaillot
On Sun, 2019-02-17 at 20:33 +0300, Andrei Borzenkov wrote: > 17.02.2019 0:33, Andrei Borzenkov wrote: > > 17.02.2019 0:03, Eric Robinson wrote: > > > Here are the relevant corosync logs. > > > > > > It appears that the stop action for resource p_mysql_002 failed, > > > and that caused a cascading

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-17 Thread Andrei Borzenkov
17.02.2019 0:44, Eric Robinson wrote: > Thanks for the feedback, Andrei. > > I only want cluster failover to occur if the filesystem or drbd resources > fail, or if the cluster messaging layer detects a complete node failure. Is > there a way to tell PaceMaker not to trigger a cluster failover

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-17 Thread Andrei Borzenkov
17.02.2019 0:33, Andrei Borzenkov wrote: > 17.02.2019 0:03, Eric Robinson wrote: >> Here are the relevant corosync logs. >> >> It appears that the stop action for resource p_mysql_002 failed, and that >> caused a cascading series of service changes. However, I don't understand >> why, since no

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Valentin Vidic
On Sat, Feb 16, 2019 at 10:23:17PM +0000, Eric Robinson wrote: > I'm looking through the docs but I don't see how to set the on-fail value for > a resource. It is not set on the resource itself but on each of the actions (monitor, start, stop). -- Valentin
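
To illustrate the point that on-fail belongs to each action rather than to the resource, here is a minimal sketch using `pcs resource update ... op`. The helper name `set_op_option`, the `DRY_RUN` switch, the 30s monitor interval, and the chosen on-fail values are assumptions made for illustration, not settings recommended anywhere in the thread. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: on-fail is an operation option, so it is set per action.
# DRY_RUN=1 (the default here) prints the pcs commands instead of running
# them; set DRY_RUN=0 on a cluster node to apply them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print or run
#   pcs resource update <resource> op <action> <options...>
set_op_option() {
    resource=$1
    action=$2
    shift 2
    if [ "$DRY_RUN" = "1" ]; then
        echo "pcs resource update $resource op $action $*"
    else
        pcs resource update "$resource" op "$action" "$@"
    fi
}

# Assumed policy: a failed monitor is treated as if it had not failed.
set_op_option p_mysql_002 monitor interval=30s on-fail=ignore

# Assumed policy: a failed stop blocks further actions on this resource
# until an administrator cleans it up.
set_op_option p_mysql_002 stop interval=0s on-fail=block
```

Whether `ignore` or `block` is appropriate depends on the cluster's fencing setup and on how the resources are grouped, so the values above should be read as placeholders.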

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Andrei Borzenkov
-----Original Message----- >> From: Users On Behalf Of Andrei >> Borzenkov >> Sent: Saturday, February 16, 2019 1:34 PM >> To: users@clusterlabs.org >> Subject: Re: [ClusterLabs] Why Do All The Services Go Down When Just One >> Fails? >> >> 17.02.2019 0:03, E

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Eric Robinson
> > Subject: Re: [ClusterLabs] Why Do All The Services Go Down When Just One > Fails? > > > On Sat, Feb 16, 2019 at 09:33:42PM +0000, Eric Robinson wrote: > > > I just noticed that. I also noticed that the lsb init script has a > > > hard-coded stop ti

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Eric Robinson
> On Sat, Feb 16, 2019 at 09:33:42PM +0000, Eric Robinson wrote: > > I just noticed that. I also noticed that the lsb init script has a > > hard-coded stop timeout of 30 seconds. So if the init script waits > > longer than the cluster resource timeout of 15s, that would cause the > > Yes, you

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Eric Robinson
> -----Original Message----- > From: Users On Behalf Of Andrei > Borzenkov > Sent: Saturday, February 16, 2019 1:34 PM > To: users@clusterlabs.org > Subject: Re: [ClusterLabs] Why Do All The Services Go Down When Just One > Fails? > > 17.02.2019 0:03, Eric Robinson wrote

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Valentin Vidic
On Sat, Feb 16, 2019 at 09:33:42PM +0000, Eric Robinson wrote: > I just noticed that. I also noticed that the lsb init script has a > hard-coded stop timeout of 30 seconds. So if the init script waits > longer than the cluster resource timeout of 15s, that would cause the Yes, you should use
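
The timeout mismatch described in this message can be checked with a few lines of shell. The sketch below uses the two values quoted in the thread (30 seconds in the init script, 15 seconds for the cluster stop operation); the 30-second safety margin and the helper name `stop_timeout_ok` are assumptions for illustration. It only prints a suggested `pcs` command and does not change the cluster.

```shell
#!/bin/sh
# Values quoted in the thread.
INIT_STOP_WAIT=30        # seconds the lsb init script waits on stop
CLUSTER_STOP_TIMEOUT=15  # seconds the cluster allows for the stop action

# Assumed safety margin on top of the init script's own wait.
MARGIN=30

# Hypothetical helper: succeeds only if the cluster timeout ($1) is
# strictly longer than the init script wait ($2).
stop_timeout_ok() {
    [ "$1" -gt "$2" ]
}

if ! stop_timeout_ok "$CLUSTER_STOP_TIMEOUT" "$INIT_STOP_WAIT"; then
    new_timeout=$((INIT_STOP_WAIT + MARGIN))
    # Printed only; run it by hand on a cluster node if it looks right.
    echo "pcs resource update p_mysql_002 op stop interval=0s timeout=${new_timeout}s"
fi
```

With the values above the check fails (15 is not greater than 30), so the script prints a command that raises the stop timeout to 60s.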

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Andrei Borzenkov
17.02.2019 0:03, Eric Robinson wrote: > Here are the relevant corosync logs. > > It appears that the stop action for resource p_mysql_002 failed, and that > caused a cascading series of service changes. However, I don't understand > why, since no other resources are dependent on p_mysql_002. >

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Eric Robinson
> -----Original Message----- > From: Users On Behalf Of Valentin Vidic > Sent: Saturday, February 16, 2019 1:28 PM > To: users@clusterlabs.org > Subject: Re: [ClusterLabs] Why Do All The Services Go Down When Just One > Fails? > > On Sat, Feb 16, 2019 at 09:03:43PM +0

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Valentin Vidic
On Sat, Feb 16, 2019 at 09:03:43PM +0000, Eric Robinson wrote: > Here are the relevant corosync logs. > > It appears that the stop action for resource p_mysql_002 failed, and > that caused a cascading series of service changes. However, I don't > understand why, since no other resources are

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Valentin Vidic
On Sat, Feb 16, 2019 at 08:50:57PM +0000, Eric Robinson wrote: > Which logs? You mean /var/log/cluster/corosync.log? On the DC node pacemaker will be logging the actions it is trying to run (start or stop some resources). > But even if the stop action is resulting in an error, why would the >

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Eric Robinson
Here are the relevant corosync logs. It appears that the stop action for resource p_mysql_002 failed, and that caused a cascading series of service changes. However, I don't understand why, since no other resources are dependent on p_mysql_002. [root@001db01a cluster]# cat

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Valentin Vidic
On Sat, Feb 16, 2019 at 08:34:21PM +0000, Eric Robinson wrote: > Why is it that when one of the resources that start with p_mysql_* > goes into a FAILED state, all the other MySQL services also stop? Perhaps stop is not working correctly for these lsb services, so for example stopping

Re: [ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Eric Robinson
topics related to open-source clustering welcomed Subject: [ClusterLabs] Why Do All The Services Go Down When Just One Fails? These are the resources on our cluster. [root@001db01a ~]# pcs status Cluster name: 001db01ab Stack: corosync Current DC: 001db01a (version 1.1.18-11.el7_5.3-2b07d5c5a9

[ClusterLabs] Why Do All The Services Go Down When Just One Fails?

2019-02-16 Thread Eric Robinson
These are the resources on our cluster. [root@001db01a ~]# pcs status Cluster name: 001db01ab Stack: corosync Current DC: 001db01a (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum Last updated: Sat Feb 16 15:24:55 2019 Last change: Sat Feb 16 15:10:21 2019 by root via cibadmin on