Re: [ClusterLabs] No slave is promoted to be master

2018-04-17 Thread Jehan-Guillaume de Rorthais
On Tue, 17 Apr 2018 04:16:38 +
范国腾  wrote:

> I checked the status again. It is not that no slave gets promoted: one is
> promoted about 15 minutes after the cluster starts.
> 
> I tried in three labs and the results are the same: the promotion happens
> about 15 minutes after the cluster starts.
> 
> Why is there about 15 minutes delay every time?

This was a bug in Pacemaker up to 1.1.17. I reported it last August, and Ken
Gaillot fixed it a few days later in 1.1.18. See:

https://lists.clusterlabs.org/pipermail/developers/2017-August/001110.html
https://lists.clusterlabs.org/pipermail/developers/2017-September/001113.html

I wonder whether disabling the pgsql resource before shutting down the cluster
would be a simpler and safer workaround, e.g.:

 pcs resource disable pgsql-ha --wait
 pcs cluster stop --all

and then, to start it again:

 pcs cluster start --all
 pcs resource enable pgsql-ha
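Here is the same sequence as two small shell helpers. This is a minimal sketch that assumes the resource is named pgsql-ha, as above. PCS and RSC are hypothetical overrides added for dry runs (e.g. PCS=echo) and are not pcs options:

```shell
#!/bin/sh
# Sketch of the disable/stop and start/enable workaround as two helpers.
# Assumptions: the promotable resource is called pgsql-ha (as above), and
# PCS/RSC are hypothetical overrides, e.g. PCS=echo for a dry run.
PCS=${PCS:-pcs}
RSC=${RSC:-pgsql-ha}

# Disable the resource and wait for it to stop; only stop the whole
# cluster if that disable command succeeded.
cluster_stop() {
    "$PCS" resource disable "$RSC" --wait &&
        "$PCS" cluster stop --all
}

# Start the cluster on all nodes, then re-enable the resource so that
# Pacemaker starts and promotes it again.
cluster_start() {
    "$PCS" cluster start --all &&
        "$PCS" resource enable "$RSC"
}
```

Chaining with && means the nodes are not taken down while pgsql-ha may still be running. With PCS=echo, both helpers only print the pcs commands they would run.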

Another workaround would be to force a master score on one node, **only if needed**, using:

  crm_master -N  -r  -l forever -v 1
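You can check the "if needed" part first. This is a minimal sketch that assumes crm_simulate -s prints lines such as "pgsqld:0 promotion score on db1: 1001", as in the output shown elsewhere in this thread. no_promotable_node and force_master_if_needed are hypothetical helper names, and you supply the node and resource arguments yourself:

```shell
# Hypothetical helper: exit 0 when no "promotion score" line read on stdin
# (crm_simulate -s output) shows a positive score, i.e. nothing promotable.
no_promotable_node() {
    awk '/promotion score on/ {
             s = $NF
             if (s == "INFINITY" || s == "+INFINITY" || s + 0 > 0) found = 1
         }
         END { exit found ? 1 : 0 }'
}

# force_master_if_needed NODE RESOURCE (hypothetical wrapper): set a master
# score of 1 on NODE only when the live cluster shows no promotable node.
force_master_if_needed() {
    if crm_simulate -sL | no_promotable_node; then
        crm_master -N "$1" -r "$2" -l forever -v 1
    fi
}
```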

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] No slave is promoted to be master

2018-04-13 Thread Jehan-Guillaume de Rorthais
OK, I know what happened.

It seems your standbys were not replicating when the master "crashed". You can
find many messages like these in the log files:

  WARNING: No secondary connected to the master
  WARNING: "db2" is not connected to the primary
  WARNING: "db3" is not connected to the primary

When a standby is not replicating, the master sets a negative master score on
it to forbid its promotion, as it is probably lagging by some unknown amount
of time.

The following command shows the scores just before the simulated master crash:

  $ crm_simulate -x pe-input-2039.bz2 -s|grep -E 'date|promotion'
  Using the original execution date of: 2018-04-11 16:23:07Z
  pgsqld:0 promotion score on db1: 1001
  pgsqld:1 promotion score on db2: -1000
  pgsqld:2 promotion score on db3: -1000
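A small filter makes these lines easier to compare across a series of pe-input files. This is a minimal sketch that assumes the line format above. promotion_scores is a hypothetical helper name, and INFINITY scores are not handled:

```shell
# Hypothetical helper: turn the "promotion score" lines of crm_simulate -s
# output (read on stdin) into "node score" pairs, highest score first.
# Assumes finite numeric scores; INFINITY/-INFINITY are not handled.
promotion_scores() {
    awk '/promotion score on/ {
             node = $(NF - 1)
             sub(/:$/, "", node)      # "db1:" -> "db1"
             print node, $NF
         }' | sort -k2,2nr -k1,1
}
```

For example, `crm_simulate -x pe-input-2039.bz2 -s | promotion_scores` prints "db1 1001" first, followed by "db2 -1000" and "db3 -1000". Neither standby could be promoted.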

A score of "1001" designates the master. Streaming standbys always have a
positive master score between 1000 and 1000-N*10, where N is the number of
connected standbys.
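The scoring rule can be written down as a toy model. This is a hypothetical sketch of the behaviour described here, not the resource agent's actual code. It assumes that connected standbys are ranked from least to most lagging and receive 1000, 990, 980 and so on, while standbys that are not replicating receive -1000. The argument layout (master, connected standbys, "--", disconnected standbys) is invented for the example:

```shell
# Hypothetical model of the scores described above; NOT the agent's code.
# Usage: model_scores MASTER [CONNECTED...] -- [DISCONNECTED...]
# CONNECTED standbys are assumed to be ordered from least to most lagging.
model_scores() {
    master=$1; shift
    echo "$master 1001"
    i=0
    while [ $# -gt 0 ] && [ "$1" != "--" ]; do
        echo "$1 $((1000 - i * 10))"   # 1000, 990, 980, ...
        i=$((i + 1)); shift
    done
    [ $# -gt 0 ] && shift              # drop the "--" separator
    for node in "$@"; do
        echo "$node -1000"             # not replicating: never promoted
    done
}
```

`model_scores db1 -- db2 db3` reproduces the pe-input-2039 situation (db1 1001, db2 -1000, db3 -1000). `model_scores db1 db2 db3 --` gives db2 1000 and db3 990, so in this model the least lagging standby, db2, would be the preferred candidate.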



On Fri, 13 Apr 2018 01:37:54 +
范国腾  wrote:

> The log is in the attachment.
> 
> We introduced a bug in the PG code on the master node so that it cannot be
> restarted any more, in order to test the following scenario: one slave
> should be promoted when the master crashes.
> 
> -----Original Message-----
> From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com]
> Sent: April 12, 2018 17:39
> To: 范国腾
> Cc: Cluster Labs - All topics related to open-source clustering welcomed
> Subject: Re: [ClusterLabs] No slave is promoted to be master
> 
> Hi,
> On Thu, 12 Apr 2018 08:31:39 +
> 范国腾  wrote:
> 
> > Thank you very much for helping check this issue. The information is in
> > the attachment.
> > 
> > I restarted the cluster after I sent my first email. I am not sure
> > whether that affects "the result of crm_simulate -sL".
> 
> It does...
> 
> Could you please provide the files
> from /var/lib/pacemaker/pengine/pe-input-2039.bz2 to pe-input-2065.bz2?
> 
> [...]
> > Then the master was restarted and it could not start (that is OK, and we
> > know the reason).
> 
> Why couldn't it start ?



-- 
Jehan-Guillaume de Rorthais
Dalibo


Re: [ClusterLabs] No slave is promoted to be master

2018-04-12 Thread Jehan-Guillaume de Rorthais
Hi,
On Thu, 12 Apr 2018 08:31:39 +
范国腾  wrote:

> Thank you very much for helping check this issue. The information is in the
> attachment.
> 
> I restarted the cluster after I sent my first email. I am not sure whether
> that affects "the result of crm_simulate -sL".

It does...

Could you please provide the files
from /var/lib/pacemaker/pengine/pe-input-2039.bz2 to pe-input-2065.bz2?

[...]
> Then the master was restarted and it could not start (that is OK, and we
> know the reason).

Why couldn't it start ?


Re: [ClusterLabs] No slave is promoted to be master

2018-04-12 Thread Jehan-Guillaume de Rorthais
Hi,
On Thu, 12 Apr 2018 03:14:52 +
范国腾  wrote:
> We have three nodes in the cluster. When the master postgres resource on one
> node (db1) crashed and could not start any more, we expected one of the slave
> nodes (db2, db3) to be promoted to master. But none was.
> 
> [screenshot]
> 
> Here is the log:
> 
> db1: [screenshot]
> db2: [screenshot]
> db3: [screenshot]

Could you please provide:

* your full logs from all nodes, as (compressed) text files
* the full setup of the cluster
* the result of "crm_simulate -sL"

Regards,