On 05/07/2018 07:39 AM, 范国腾 wrote:
> Hi,
>
> We have a two-node cluster using PAF to manage PostgreSQL. Node 2 (sds2) is the master.
> Master/Slave Set: pgsql-ha [pgsqld]
>      Master: [sds2]
>      Slaves: [ sds1 ]
>
> On the master node (sds2), I removed the PostgreSQL data directory. I expected
> the master node (sds2) to stop and the slave node (sds1) to be promoted to master.
> The sds2 log shows that it executes monitor->notify->demote->notify->stop. The
> sds1 log also shows "Promote pgsqld:0#011(Slave -> Master sds1)". But
> "pcs status" shows the following status. Could you please help check
> what prevents the promotion from happening on sds1? What should I do to
> recover the system?

I didn't check every detail, but it looks as if stopping the resource
failed on sds2 (pgsqld_stop_0 returned 'invalid parameter'). The cluster
therefore doesn't know the resource's state on sds2 and can't safely
promote on sds1.
If you had enabled fencing, the failed stop would lead to sds2 being
fenced so that sds1 can take over.
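
To make the reasoning concrete, here is a toy model in Python. It is only a sketch of the decision described above, not Pacemaker code; the names NodeState and plan_after_failed_stop are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeState:
    """Hypothetical view of the node that currently holds the master."""
    name: str
    stop_failed: bool  # True if the last stop operation returned an error


def plan_after_failed_stop(old_master: NodeState, standby: str,
                           fencing_enabled: bool) -> List[str]:
    """Return the ordered actions a cluster manager could take (toy model)."""
    if not old_master.stop_failed:
        # The resource is known to be stopped, so promoting elsewhere is safe.
        return [f"promote {standby}"]
    if fencing_enabled:
        # State is unknown, but powering the node off makes it known.
        return [f"fence {old_master.name}", f"promote {standby}"]
    # State is unknown and cannot be resolved: refuse to promote.
    return [f"block resource on {old_master.name}"]


if __name__ == "__main__":
    sds2 = NodeState(name="sds2", stop_failed=True)
    print(plan_after_failed_stop(sds2, "sds1", fencing_enabled=False))
    print(plan_after_failed_stop(sds2, "sds1", fencing_enabled=True))
```

Without fencing the model ends in the "blocked" state that your "pcs status" output shows; with fencing it fences sds2 first and only then promotes sds1.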

As digimer would say: "use fencing!"

Regards,
Klaus

>
> 2 nodes configured
> 3 resources configured
> Online: [ sds1 sds2 ]
> Full list of resources:
>  Master/Slave Set: pgsql-ha [pgsqld]
>      pgsqld     (ocf::heartbeat:pgsqlms):       FAILED Master sds2 (blocked)
>      Slaves: [ sds1 ]
>  Resource Group: mastergroup
>      master-vip (ocf::heartbeat:IPaddr2):       Started sds2
> Failed Actions:
> * pgsqld_stop_0 on sds2 'invalid parameter' (2): call=42, status=complete, 
> exitreason='PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists',
>     last-rc-change='Mon May  7 00:39:06 2018', queued=1ms, exec=72ms
>
>
>
> Here is the sds2 log:
> May  7 00:38:46 node2 pgsqlms(pgsqld)[14000]: INFO: Execute action monitor 
> and the result 8
> May  7 00:38:56 node2 pgsqlms(pgsqld)[14077]: INFO: Execute action monitor 
> and the result 8
> May  7 00:39:06 node2 pgsqlms(pgsqld)[14152]: ERROR: PGDATA 
> "/home/highgo/highgo/database/4.3.1/data" does not exists
> May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_monitor_10000:14152:stderr 
> [ ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not 
> exists ]
> May  7 00:39:06 node2 crmd[1129]:  notice: sds2-pgsqld_monitor_10000:36 [ 
> ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not 
> exists\n ]
> May  7 00:39:06 node2 pgsqlms(pgsqld)[14162]: ERROR: PGDATA 
> "/home/highgo/highgo/database/4.3.1/data" does not exists
> May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_notify_0:14162:stderr [ 
> ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not 
> exists ]
> May  7 00:39:06 node2 crmd[1129]:  notice: Result of notify operation for 
> pgsqld on sds2: 0 (ok)
> May  7 00:39:06 node2 crmd[1129]:  notice: sds2-pgsqld_monitor_10000:36 [ 
> ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not 
> exists\n ]
> May  7 00:39:06 node2 pgsqlms(pgsqld)[14172]: ERROR: PGDATA 
> "/home/highgo/highgo/database/4.3.1/data" does not exists
> May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_demote_0:14172:stderr [ 
> ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not 
> exists ]
> May  7 00:39:06 node2 crmd[1129]:  notice: Result of demote operation for 
> pgsqld on sds2: 2 (invalid parameter)
> May  7 00:39:06 node2 crmd[1129]:  notice: sds2-pgsqld_demote_0:39 [ 
> ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not 
> exists\n ]
> May  7 00:39:06 node2 pgsqlms(pgsqld)[14182]: ERROR: PGDATA 
> "/home/highgo/highgo/database/4.3.1/data" does not exists
> May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_notify_0:14182:stderr [ 
> ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not 
> exists ]
> May  7 00:39:06 node2 crmd[1129]:  notice: Result of notify operation for 
> pgsqld on sds2: 0 (ok)
> May  7 00:39:06 node2 pgsqlms(pgsqld)[14192]: ERROR: PGDATA 
> "/home/highgo/highgo/database/4.3.1/data" does not exists
> May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_notify_0:14192:stderr [ 
> ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not 
> exists ]
> May  7 00:39:06 node2 crmd[1129]:  notice: Result of notify operation for 
> pgsqld on sds2: 0 (ok)
> May  7 00:39:06 node2 pgsqlms(pgsqld)[14202]: ERROR: PGDATA 
> "/home/highgo/highgo/database/4.3.1/data" does not exists
> May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_stop_0:14202:stderr [ 
> ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not 
> exists ]
> May  7 00:39:06 node2 crmd[1129]:  notice: Result of stop operation for 
> pgsqld on sds2: 2 (invalid parameter)
> May  7 00:39:06 node2 crmd[1129]:  notice: sds2-pgsqld_stop_0:42 [ 
> ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not 
> exists\n ]
> May  7 00:40:01 node2 systemd: Started Session 4 of user root.
> May  7 00:40:01 node2 systemd: Starting Session 4 of user root.
> May  7 00:47:21 node2 pacemakerd[1063]:  notice: Caught 'Terminated' signal
> May  7 00:47:21 node2 systemd: Stopping Pacemaker High Availability Cluster 
> Manager...
> May  7 00:47:21 node2 pacemakerd[1063]:  notice: Shutting down Pacemaker
> May  7 00:47:21 node2 pacemakerd[1063]:  notice: Stopping crmd
> May  7 00:47:21 node2 crmd[1129]:  notice: Caught 'Terminated' signal
> May  7 00:47:21 node2 crmd[1129]:  notice: Shutting down cluster resource 
> manager
>
> Here is the sds1 log (in the attachment):
> May  7 00:38:47 node1 pgsqlms(pgsqld)[4426]: INFO: Execute action monitor and the result 0
> May  7 00:39:03 node1 pgsqlms(pgsqld)[4442]: INFO: Execute action monitor and the result 0
> May  7 00:39:06 node1 crmd[1133]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
> May  7 00:39:06 node1 pengine[1132]: warning: Processing failed op monitor for pgsqld:1 on sds2: invalid parameter (2)
> May  7 00:39:06 node1 pengine[1132]:   error: Preventing pgsql-ha from re-starting on sds2: operation monitor failed 'invalid parameter' (2)
> May  7 00:39:06 node1 pengine[1132]:  notice: Promote pgsqld:0#011(Slave -> Master sds1)
> May  7 00:39:06 node1 pengine[1132]:  notice: Demote  pgsqld:1#011(Master -> Stopped sds2)
> May  7 00:39:06 node1 pengine[1132]:  notice: Move    master-vip#011(Started sds2 -> sds1)
> May  7 00:39:06 node1 pengine[1132]:  notice: Calculated transition 31, saving inputs in /var/lib/pacemaker/pengine/pe-input-97.bz2
> May  7 00:39:06 node1 pengine[1132]: warning: Processing failed op monitor for pgsqld:1 on sds2: invalid parameter (2)
> May  7 00:39:06 node1 pengine[1132]:   error: Preventing pgsql-ha from re-starting on sds2: operation monitor failed 'invalid parameter' (2)
> May  7 00:39:06 node1 pengine[1132]:  notice: Promote pgsqld:0#011(Slave -> Master sds1)
> May  7 00:39:06 node1 pengine[1132]:  notice: Demote  pgsqld:1#011(Master -> Stopped sds2)
> May  7 00:39:06 node1 pengine[1132]:  notice: Move    master-vip#011(Started sds2 -> sds1)
> May  7 00:39:06 node1 pengine[1132]:  notice: Calculated transition 32, saving inputs in /var/lib/pacemaker/pengine/pe-input-98.bz2
> May  7 00:39:06 node1 crmd[1133]:  notice: Initiating cancel operation pgsqld_monitor_16000 locally on sds1
> May  7 00:39:06 node1 crmd[1133]:  notice: Initiating notify operation pgsqld_pre_notify_demote_0 locally on sds1
> May  7 00:39:06 node1 crmd[1133]:  notice: Initiating notify operation pgsqld_pre_notify_demote_0 on sds2
>
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
