Hi,
I have a replicated PostgreSQL 9.3 setup and I'm trying to put it under
Pacemaker control using the ocf:heartbeat:pgsql resource agent shipped with
SLES 12 SP1.
This is the crmsh script that I used to configure Pacemaker.
configure cib new pgsql_cfg --force
configure primitive res-ars-pgsql ocf:heartbeat:pgsql \
pgctl="/usr/lib/postgresql93/bin/pg_ctl" \
psql="/usr/lib/postgresql93/bin/psql" \
pgdata="/var/lib/pgsql/data/" \
rep_mode="sync" \
node_list="ars1 ars2" \
restore_command="cp /var/lib/pgsql/pg_archive/%f %p" \
primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
master_ip="192.168.244.223" \
restart_on_promote='true' \
pghost="192.168.244.223" \
repuser="postgres" \
check_wal_receiver='true' \
monitor_user='postgres' \
monitor_password='xxx' \
op start timeout="120s" interval="0s" on-fail="restart" \
op monitor timeout="120s" interval="4s" on-fail="restart" \
op monitor timeout="120s" interval="3s" on-fail="restart" role="Master" \
op promote timeout="120s" interval="0s" on-fail="restart" \
op demote timeout="120s" interval="0s" on-fail="stop" \
op stop timeout="120s" interval="0s" on-fail="block" \
op notify timeout="90s" interval="0s"
configure ms ms-ars-pgsql res-ars-pgsql \
meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
configure colocation col-ars-pgsql-with-drbd inf: ms-ars-pgsql:Master ms-ars-drbd:Master
configure cib commit pgsql_cfg
I have a ~postgres/.pgpass file in place.
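For reference, the .pgpass format is hostname:port:database:username:password,
and the file must be mode 0600 or libpq ignores it. A minimal example with
illustrative values (IP taken from master_ip above, password a placeholder):

```
# ~postgres/.pgpass -- must be chmod 0600, owned by postgres
# hostname:port:database:username:password
# entry for the streaming-replication connection (values illustrative)
192.168.244.223:5432:replication:postgres:xxx
```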
The resources remain stopped. Only once in the 12 hours I've been working on
this did both nodes try to bring up PostgreSQL (both in recovery mode), and
Pacemaker then shut them both down again.
When running ocf-tester, I believe I'm supposed to pass the name of the
master/slave resource:
ars2:/usr/lib/ocf/resource.d/heartbeat # ocf-tester -v -n ms-ars-pgsql `pwd`/pgsql
Beginning tests for /usr/lib/ocf/resource.d/heartbeat/pgsql...
Testing permissions with uid nobody
Testing: meta-data
Testing: meta-data
...
<XML removed/>
...
Testing: validate-all
Checking current state
Testing: stop
INFO: waiting for server to shut down.... done
server stopped
INFO: PostgreSQL is down
Testing: monitor
INFO: PostgreSQL is down
Testing: monitor
ocf-exit-reason:Setup problem: couldn't find command: /usr/bin/pg_ctl
Testing: start
INFO: server starting
INFO: PostgreSQL start command sent.
INFO: PostgreSQL is started.
Testing: monitor
Testing: monitor
INFO: Don't check /var/lib/pgsql/data during probe
Testing: notify
Checking for demote action
ocf-exit-reason:Not in a replication mode.
Checking for promote action
ocf-exit-reason:Not in a replication mode.
Testing: demotion of started resource
ocf-exit-reason:Not in a replication mode.
* rc=6: Demoting a start resource should not fail
Testing: promote
ocf-exit-reason:Not in a replication mode.
* rc=6: Promote failed
Testing: demote
ocf-exit-reason:Not in a replication mode.
* rc=6: Demote failed
Aborting tests
The 'Not in a replication mode' message disagrees with the rep_mode="sync"
setting in the res-ars-pgsql configuration above.
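One possibility (an assumption on my part, not something I've verified):
ocf-tester does not read resource parameters from the CIB, so the -n name is
only a label. The agent then runs with its defaults, which would explain both
"Not in a replication mode" (rep_mode unset) and the earlier complaint about
/usr/bin/pg_ctl (the default pgctl path). If that's right, the replication
parameters would have to be passed explicitly with -o, roughly like this
(untested sketch mirroring the primitive above):

```shell
ocf-tester -v -n ms-ars-pgsql \
    -o pgctl=/usr/lib/postgresql93/bin/pg_ctl \
    -o psql=/usr/lib/postgresql93/bin/psql \
    -o pgdata=/var/lib/pgsql/data/ \
    -o rep_mode=sync \
    -o node_list="ars1 ars2" \
    -o master_ip=192.168.244.223 \
    `pwd`/pgsql
```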
I'm not sure whether the pacemaker.log entries for the CIB changes are needed,
but here is the relevant excerpt:
Aug 11 09:19:53 [2757] ars2 pengine: info: clone_print:
Master/Slave Set: ms-ars-pgsql [res-ars-pgsql]
Aug 11 09:19:53 [2757] ars2 pengine: info: short_print:
Stopped: [ ars1 ars2 ]
Aug 11 09:19:53 [2757] ars2 pengine: info: get_failcount_full:
res-ars-pgsql:0 has failed INFINITY times on ars1
Aug 11 09:19:53 [2757] ars2 pengine: warning:
common_apply_stickiness: Forcing ms-ars-pgsql away from ars1 after 1000000
failures (max=1000000)
Aug 11 09:19:53 [2757] ars2 pengine: info: get_failcount_full:
ms-ars-pgsql has failed INFINITY times on ars1
Aug 11 09:19:53 [2757] ars2 pengine: warning:
common_apply_stickiness: Forcing ms-ars-pgsql away from ars1 after 1000000
failures (max=1000000)
Aug 11 09:19:53 [2757] ars2 pengine: info: get_failcount_full:
res-ars-pgsql:0 has failed INFINITY times on ars2
Aug 11 09:19:53 [2757] ars2 pengine: warning:
common_apply_stickiness: Forcing ms-ars-pgsql away from ars2 after 1000000
failures (max=1000000)
Aug 11 09:19:53 [2757] ars2 pengine: info: get_failcount_full:
ms-ars-pgsql has failed INFINITY times on ars2
Aug 11 09:19:53 [2757] ars2 pengine: warning:
common_apply_stickiness: Forcing ms-ars-pgsql away from ars2 after 1000000
failures (max=1000000)
Aug 11 09:19:53 [2757] ars2 pengine: info: rsc_merge_weights:
ms-ars-drbd: Rolling back scores from ms-ars-pgsql
Aug 11 09:19:53 [2757] ars2 pengine: info: master_color:
Promoting res-ars-drbd:1 (Master ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: master_color:
ms-ars-drbd: Promoted 1 instances of a possible 1 to master
Aug 11 09:19:53 [2757] ars2 pengine: info: native_color:
res-ars-pgsql:0: Rolling back scores from ms-ars-drbd
Aug 11 09:19:53 [2757] ars2 pengine: info: native_color:
Resource res-ars-pgsql:0 cannot run anywhere
Aug 11 09:19:53 [2757] ars2 pengine: info: native_color:
res-ars-pgsql:1: Rolling back scores from ms-ars-drbd
Aug 11 09:19:53 [2757] ars2 pengine: info: native_color:
Resource res-ars-pgsql:1 cannot run anywhere
Aug 11 09:19:53 [2757] ars2 pengine: info: master_color:
ms-ars-pgsql: Promoted 0 instances of a possible 1 to master
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave
res-mgmt-vip (Started ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave
res-mgmt-app (Started ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave
res-ars-vip (Started ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave
res-ars-drbd:0 (Slave ars1)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave
res-ars-drbd:1 (Master ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave
res-ars-lvm (Started ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave
res-ars-fs_dropbox (Started ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave
res-ars-fs_svndata (Started ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave
res-ars-pgsql:0 (Stopped)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave
res-ars-pgsql:1 (Stopped)
Aug 11 09:19:53 [2758] ars2 crmd: info: do_state_transition:
State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Aug 11 09:19:53 [2758] ars2 crmd: notice: do_te_invoke:
Processing graph 222 (ref=pe_calc-dc-1470932393-1349) derived from
/var/lib/pacemaker/pengine/pe-input-625.bz2
And the corresponding entries from /var/log/messages:
2016-08-11T09:19:53.146603-07:00 ars-2 crmd[2758]: notice: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED
origin=crm_timer_popped ]
2016-08-11T09:19:53.152322-07:00 ars-2 pengine[2757]: notice: On loss
of CCM Quorum: Ignore
2016-08-11T09:19:53.153078-07:00 ars-2 pengine[2757]: warning: Forcing
ms-ars-pgsql away from ars1 after 1000000 failures (max=1000000)
2016-08-11T09:19:53.153266-07:00 ars-2 pengine[2757]: warning: Forcing
ms-ars-pgsql away from ars1 after 1000000 failures (max=1000000)
2016-08-11T09:19:53.153395-07:00 ars-2 pengine[2757]: warning: Forcing
ms-ars-pgsql away from ars2 after 1000000 failures (max=1000000)
2016-08-11T09:19:53.153547-07:00 ars-2 pengine[2757]: warning: Forcing
ms-ars-pgsql away from ars2 after 1000000 failures (max=1000000)
2016-08-11T09:19:53.155568-07:00 ars-2 crmd[2758]: notice: Processing
graph 222 (ref=pe_calc-dc-1470932393-1349) derived from
/var/lib/pacemaker/pengine/pe-input-625.bz2
2016-08-11T09:19:53.155768-07:00 ars-2 pengine[2757]: notice:
Calculated Transition 222: /var/lib/pacemaker/pengine/pe-input-625.bz2
2016-08-11T09:19:53.155927-07:00 ars-2 crmd[2758]: notice: Transition
222 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-625.bz2): Complete
2016-08-11T09:19:53.156085-07:00 ars-2 crmd[2758]: notice: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
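If it helps: the INFINITY fail counts in the pengine log above are why both
instances "cannot run anywhere", so any retry presumably needs the fail counts
cleared first. A sketch of what I understand the cleanup to be (assuming
crmsh and the standard Pacemaker CLI tools; run on a cluster node):

```shell
# Clear the fail counts so the policy engine will place the resource again
crm resource cleanup res-ars-pgsql

# Or, per node, with the lower-level tool:
crm_failcount --delete -r res-ars-pgsql -N ars1
crm_failcount --delete -r res-ars-pgsql -N ars2
```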
Can anyone provide thoughts on how to debug this?
Should I give up on the SLES-provided RA and use PAF instead?
Thanks,
Darren
_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org