Re: [ClusterLabs] No slave is promoted to be master
On Tue, 17 Apr 2018 04:16:38 + 范国腾 wrote:

> I checked the status again. It is not that the slave is never promoted: it
> is promoted about 15 minutes after the cluster starts.
>
> I tried in three labs and the results are the same: the promotion happens
> 15 minutes after the cluster starts.
>
> Why is there a delay of about 15 minutes every time?

This was a bug in Pacemaker up to 1.1.17. I reported it last August and Ken Gaillot fixed it a few days later in 1.1.18. See:

  https://lists.clusterlabs.org/pipermail/developers/2017-August/001110.html
  https://lists.clusterlabs.org/pipermail/developers/2017-September/001113.html

I wonder if disabling the pgsql resource before shutting down the cluster might be a simpler and safer workaround. E.g.:

  pcs resource disable pgsql-ha --wait
  pcs cluster stop --all

and

  pcs cluster start --all
  pcs resource enable pgsql-ha

Another fix would be to force a master score on one node **if needed** using:

  crm_master -N <node> -r <resource> -l forever -v 1

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
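The disable/stop/start/enable workaround suggested in this reply could be wrapped in a small script. This is only a sketch: `run`, `restart_cluster_safely` and the `DRY_RUN` variable are names made up for illustration, the resource name pgsql-ha is the one used in this thread, and the four pcs commands are exactly those quoted in the reply. With DRY_RUN=1 (the default here) it only prints what it would run.

```shell
#!/bin/sh
# Sketch only: wraps the four pcs commands quoted in the reply.
# DRY_RUN, run and restart_cluster_safely are illustrative names, not part of pcs.
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Disable the resource, restart the whole cluster, re-enable the resource.
# The && chain stops at the first failing command.
restart_cluster_safely() {
    res="${1:-pgsql-ha}"   # resource name used in this thread
    run pcs resource disable "$res" --wait &&
    run pcs cluster stop --all &&
    run pcs cluster start --all &&
    run pcs resource enable "$res"
}
```

Calling `restart_cluster_safely pgsql-ha` prints the four commands in order; setting DRY_RUN=0 beforehand would actually execute them, so it is worth reviewing the printed sequence first.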
[ClusterLabs] No slave is promoted to be master
Hi,

We installed a new lab which only has the postgres resource and the vip resource. After the cluster is installed, the status is OK: one node is master and the other is slave. Then I run "pcs cluster stop --all" to stop the cluster, and then "pcs cluster start --all" to start it again. All the pgsql instances are in slave status and are never promoted to master, like this:

  Master/Slave Set: pgsql-ha [pgsqld]
      Slaves: [ sds1 sds2 ]

There is no error in the log, and "crm_simulate -sL" shows the following. It seems that the scores are OK too. The detailed log and config are in the attachment.

  [root@node1 ~]# crm_simulate -sL

  Current cluster status:
  Online: [ sds1 sds2 ]

   Master/Slave Set: pgsql-ha [pgsqld]
       Slaves: [ sds1 sds2 ]
   Resource Group: mastergroup
       master-vip (ocf::heartbeat:IPaddr2): Stopped
       pgsql-master-ip(ocf::heartbeat:IPaddr2): Stopped

  Allocation scores:
  clone_color: pgsql-ha allocation score on sds1: 1
  clone_color: pgsql-ha allocation score on sds2: 1
  clone_color: pgsqld:0 allocation score on sds1: 1003
  clone_color: pgsqld:0 allocation score on sds2: 1
  clone_color: pgsqld:1 allocation score on sds1: 1
  clone_color: pgsqld:1 allocation score on sds2: 1002
  native_color: pgsqld:0 allocation score on sds1: 1003
  native_color: pgsqld:0 allocation score on sds2: 1
  native_color: pgsqld:1 allocation score on sds1: -INFINITY
  native_color: pgsqld:1 allocation score on sds2: 1002
  pgsqld:0 promotion score on sds1: 1002
  pgsqld:1 promotion score on sds2: 1001
  group_color: mastergroup allocation score on sds1: 0
  group_color: mastergroup allocation score on sds2: 0
  group_color: master-vip allocation score on sds1: 0
  group_color: master-vip allocation score on sds2: 0
  native_color: master-vip allocation score on sds1: 1003
  native_color: master-vip allocation score on sds2: -INFINITY
  native_color: pgsql-master-ip allocation score on sds1: 1003
  native_color: pgsql-master-ip allocation score on sds2: -INFINITY

  Transition Summary:
   * Promote pgsqld:0 (Slave -> Master sds1)
   * Start master-vip (sds1)
   * Start pgsql-master-ip (sds1)

Attachment: log.rar
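To check the promotion scores in a dump like the one in this message without reading every allocation line, the relevant lines can be filtered out of the "crm_simulate -sL" output. A minimal sketch, assuming the line format shown in this message ("<instance> promotion score on <node>: <score>"); `promotion_scores` and `best_candidate` are hypothetical helper names, not Pacemaker tools.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of Pacemaker.

# Read crm_simulate -s output on stdin and print "node score"
# for every "promotion score" line.
promotion_scores() {
    awk '/ promotion score on / {
        node = $(NF - 1)          # e.g. "sds1:"
        sub(/:$/, "", node)       # drop the trailing colon
        print node, $NF
    }'
}

# Print the node with the highest numeric promotion score.
# Non-numeric values such as -INFINITY are skipped.
best_candidate() {
    promotion_scores | awk '
        $2 ~ /^-?[0-9]+$/ && (best == "" || $2 + 0 > max) {
            max = $2 + 0
            best = $1
        }
        END { if (best != "") print best }'
}
```

Typical use would be `crm_simulate -sL | promotion_scores`. A good score only shows which node the policy engine prefers; as this thread shows, it does not by itself mean the promotion has already been carried out.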
Re: [ClusterLabs] No slave is promoted to be master
OK, I know what happened.

It seems your standbys were not replicating when the master "crashed". You can find tons of messages like this in the log files:

  WARNING: No secondary connected to the master
  WARNING: "db2" is not connected to the primary
  WARNING: "db3" is not connected to the primary

When a standby is not replicating, the master sets a negative master score on it to forbid its promotion, as it is probably lagging by some undefined amount of time.

The following command shows the scores just before the simulated master crash:

  $ crm_simulate -x pe-input-2039.bz2 -s|grep -E 'date|promotion'
  Using the original execution date of: 2018-04-11 16:23:07Z
  pgsqld:0 promotion score on db1: 1001
  pgsqld:1 promotion score on db2: -1000
  pgsqld:2 promotion score on db3: -1000

The "1001" score designates the master. Streaming standbys always have a positive master score between 1000 and 1000-N*10, where N is the number of connected standbys.

On Fri, 13 Apr 2018 01:37:54 + 范国腾 wrote:

> The log is in the attachment.
>
> We introduced a bug in the PG code on the master node so that it cannot be
> restarted, in order to test the following scenario: one slave should be
> promoted when the master crashes.
>
> -----Original Message-----
> From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com]
> Sent: 12 April 2018 17:39
> To: 范国腾
> Cc: Cluster Labs - All topics related to open-source clustering welcomed
> Subject: Re: [ClusterLabs] No slave is promoted to be master
>
> Hi,
> On Thu, 12 Apr 2018 08:31:39 +
> 范国腾 wrote:
>
> > Thank you very much for helping to check this issue. The information is in
> > the attachment.
> >
> > I have restarted the cluster after I sent my first email. Not sure if
> > it affects the checking of "the result of "crm_simulate -sL"
>
> It does...
>
> Could you please provide the files
> from /var/lib/pacemaker/pengine/pe-input-2039.bz2 to pe-input-2065.bz2 ?
>
> [...]
> > Then the master is restarted and it could not start (that is OK and we
> > know the reason).
>
> Why couldn't it start ?
-- 
Jehan-Guillaume de Rorthais
Dalibo
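The score rules described in this reply (1001 for the master, 1000 down to 1000-N*10 for streaming standbys, negative for standbys that are not replicating) can be restated as a small check. This is a sketch that only re-expresses those three statements; it is not the resource agent's actual code, and `classify_score` with its output labels is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: classify_score and its labels are illustrative names.
# Usage: classify_score SCORE N
#   SCORE  promotion score as printed by crm_simulate -s
#   N      number of connected standbys
classify_score() {
    score="$1"
    n="$2"
    low=$((1000 - n * 10))    # lower bound for a streaming standby

    if [ "$score" -eq 1001 ]; then
        echo "master"
    elif [ "$score" -lt 0 ]; then
        echo "not-replicating"      # promotion is forbidden on this node
    elif [ "$score" -ge "$low" ] && [ "$score" -le 1000 ]; then
        echo "streaming-standby"
    else
        echo "unknown"              # outside the ranges described in the reply
    fi
}
```

Applied to the output in this reply, `classify_score 1001 2` reports db1 as the master and `classify_score -1000 2` reports db2 and db3 as not replicating, which is why neither could be promoted.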
Re: [ClusterLabs] No slave is promoted to be master
Hi,

On Thu, 12 Apr 2018 08:31:39 + 范国腾 wrote:

> Thank you very much for helping to check this issue. The information is in
> the attachment.
>
> I have restarted the cluster after I sent my first email. Not sure if it
> affects the checking of "the result of "crm_simulate -sL"

It does...

Could you please provide the files from /var/lib/pacemaker/pengine/pe-input-2039.bz2 to pe-input-2065.bz2 ?

[...]
> Then the master is restarted and it could not start (that is OK and we know
> the reason).

Why couldn't it start ?
Re: [ClusterLabs] No slave is promoted to be master
Hi,

On Thu, 12 Apr 2018 03:14:52 + 范国腾 wrote:

> We have three nodes in the cluster. When the master postgres resource on one
> node (db1) crashed and could not start any more, we expected one of the slave
> nodes (db2, db3) to be promoted to master. But that does not happen.
>
> [cid:image001.jpg@01D3D24F.03F259A0]
>
> Here is the log
>
> db1:
> [cid:image002.jpg@01D3D24F.03F259A0]
> Db2:
> [cid:image007.png@01D3D24E.200050D0]
> Db3:
> [cid:image008.png@01D3D24E.200050D0]

Could you please provide:

* your full logs from all nodes as textual (compressed) files?
* the full setup of the cluster
* the result of "crm_simulate -sL"

Regards,