Hi,
I have a replicated PostgreSQL 9.3 setup and I'm trying to put it under
Pacemaker control using the ocf:heartbeat:pgsql resource agent shipped with
SLES 12 SP1.
This is the crmsh script that I used to configure Pacemaker.
configure cib new pgsql_cfg --force
configure primitive res-ars-pgsql ocf:heartbeat:pgsql \
pgctl="/usr/lib/postgresql93/bin/pg_ctl" \
psql="/usr/lib/postgresql93/bin/psql" \
pgdata="/var/lib/pgsql/data/" \
rep_mode="sync" \
node_list="ars1 ars2" \
restore_command="cp /var/lib/pgsql/pg_archive/%f %p" \
primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
master_ip="192.168.244.223" \
restart_on_promote='true' \
pghost="192.168.244.223" \
repuser="postgres" \
check_wal_receiver='true' \
monitor_user='postgres' \
monitor_password='xxx' \
op start timeout="120s" interval="0s" on-fail="restart" \
op monitor timeout="120s" interval="4s" on-fail="restart" \
op monitor timeout="120s" interval="3s" on-fail="restart" role="Master" \
op promote timeout="120s" interval="0s" on-fail="restart" \
op demote timeout="120s" interval="0s" on-fail="stop" \
op stop timeout="120s" interval="0s" on-fail="block" \
op notify timeout="90s" interval="0s"
configure ms ms-ars-pgsql res-ars-pgsql \
meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
configure colocation col-ars-pgsql-with-drbd inf: ms-ars-pgsql:Master ms-ars-drbd:Master
configure cib commit pgsql_cfg
I have a ~postgres/.pgpass file in place.
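For reference, the .pgpass format is hostname:port:database:username:password,
and the file must be mode 0600 or libpq ignores it. A minimal example with
illustrative values (IP taken from master_ip above, password a placeholder):

```
# ~postgres/.pgpass -- must be chmod 0600, owned by postgres
# hostname:port:database:username:password
# entry for the streaming-replication connection (values illustrative)
192.168.244.223:5432:replication:postgres:xxx
```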
The resources remain stopped. Only once in the 12 hours I've been working on
this did both nodes try to bring up PostgreSQL (both in recovery mode), and
Pacemaker then shut them both down again.
When running ocf-tester, I believe I'm supposed to pass the name of the
master/slave resource:
ars2:/usr/lib/ocf/resource.d/heartbeat # ocf-tester -v -n ms-ars-pgsql `pwd`/pgsql
Beginning tests for /usr/lib/ocf/resource.d/heartbeat/pgsql...
Testing permissions with uid nobody
Testing: meta-data
Testing: meta-data
...
<XML removed/>
...
Testing: validate-all
Checking current state
Testing: stop
INFO: waiting for server to shut down.... done
server stopped
INFO: PostgreSQL is down
Testing: monitor
INFO: PostgreSQL is down
Testing: monitor
ocf-exit-reason:Setup problem: couldn't find command: /usr/bin/pg_ctl
Testing: start
INFO: server starting
INFO: PostgreSQL start command sent.
INFO: PostgreSQL is started.
Testing: monitor
Testing: monitor
INFO: Don't check /var/lib/pgsql/data during probe
Testing: notify
Checking for demote action
ocf-exit-reason:Not in a replication mode.
Checking for promote action
ocf-exit-reason:Not in a replication mode.
Testing: demotion of started resource
ocf-exit-reason:Not in a replication mode.
* rc=6: Demoting a start resource should not fail
Testing: promote
ocf-exit-reason:Not in a replication mode.
* rc=6: Promote failed
Testing: demote
ocf-exit-reason:Not in a replication mode.
* rc=6: Demote failed
Aborting tests
The 'Not in a replication mode' message disagrees with the rep_mode="sync"
setting in the res-ars-pgsql configuration above.
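One possibility (an assumption on my part, not something I've verified):
ocf-tester does not read resource parameters from the CIB, so the -n name is
only a label. The agent then runs with its defaults, which would explain both
"Not in a replication mode" (rep_mode unset) and the earlier complaint about
/usr/bin/pg_ctl (the default pgctl path). If that's right, the replication
parameters would have to be passed explicitly with -o, roughly like this
(untested sketch mirroring the primitive above):

```shell
ocf-tester -v -n ms-ars-pgsql \
    -o pgctl=/usr/lib/postgresql93/bin/pg_ctl \
    -o psql=/usr/lib/postgresql93/bin/psql \
    -o pgdata=/var/lib/pgsql/data/ \
    -o rep_mode=sync \
    -o node_list="ars1 ars2" \
    -o master_ip=192.168.244.223 \
    `pwd`/pgsql
```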
I'm not sure whether the pacemaker.log entries for the CIB changes are needed,
but here is the relevant excerpt:
Aug 11 09:19:53 [2757] ars2 pengine: info: clone_print:
Master/Slave Set: ms-ars-pgsql [res-ars-pgsql]
Aug 11 09:19:53 [2757] ars2 pengine: info: short_print:
Stopped: [ ars1 ars2 ]
Aug 11 09:19:53 [2757] ars2 pengine: info: get_failcount_full:
res-ars-pgsql:0 has failed INFINITY times on ars1
Aug 11 09:19:53 [2757] ars2 pengine: warning:
common_apply_stickiness: Forcing ms-ars-pgsql away from ars1 after 1000000
failures (max=1000000)
Aug 11 09:19:53 [2757] ars2 pengine: info: get_failcount_full:
ms-ars-pgsql has failed INFINITY times on ars1
Aug 11 09:19:53 [2757] ars2 pengine: warning:
common_apply_stickiness: Forcing ms-ars-pgsql away from ars1 after 1000000
failures (max=1000000)
Aug 11 09:19:53 [2757] ars2 pengine: info: get_failcount_full:
res-ars-pgsql:0 has failed INFINITY times on ars2
Aug 11 09:19:53 [2757] ars2 pengine: warning:
common_apply_stickiness: Forcing ms-ars-pgsql away from ars2 after 1000000
failures (max=1000000)
Aug 11 09:19:53 [2757] ars2 pengine: info: get_failcount_full:
ms-ars-pgsql has failed INFINITY times on ars2
Aug 11 09:19:53 [2757] ars2 pengine: warning:
common_apply_stickiness: Forcing ms-ars-pgsql away from ars2 after 1000000
failures (max=1000000)
Aug 11 09:19:53 [2757] ars2 pengine: info: rsc_merge_weights:
ms-ars-drbd: Rolling back scores from ms-ars-pgsql
Aug 11 09:19:53 [2757] ars2 pengine: info: master_color:
Promoting res-ars-drbd:1 (Master ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: master_color:
ms-ars-drbd: Promoted 1 instances of a possible 1 to master
Aug 11 09:19:53 [2757] ars2 pengine: info: native_color:
res-ars-pgsql:0: Rolling back scores from ms-ars-drbd
Aug 11 09:19:53 [2757] ars2 pengine: info: native_color:
Resource res-ars-pgsql:0 cannot run anywhere
Aug 11 09:19:53 [2757] ars2 pengine: info: native_color:
res-ars-pgsql:1: Rolling back scores from ms-ars-drbd
Aug 11 09:19:53 [2757] ars2 pengine: info: native_color:
Resource res-ars-pgsql:1 cannot run anywhere
Aug 11 09:19:53 [2757] ars2 pengine: info: master_color:
ms-ars-pgsql: Promoted 0 instances of a possible 1 to master
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave
res-mgmt-vip (Started ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave
res-mgmt-app (Started ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave
res-ars-vip (Started ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave
res-ars-drbd:0 (Slave ars1)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave
res-ars-drbd:1 (Master ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave
res-ars-lvm (Started ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave
res-ars-fs_dropbox (Started ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave
res-ars-fs_svndata (Started ars2)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave
res-ars-pgsql:0 (Stopped)
Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave
res-ars-pgsql:1 (Stopped)
Aug 11 09:19:53 [2758] ars2 crmd: info: do_state_transition:
State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Aug 11 09:19:53 [2758] ars2 crmd: notice: do_te_invoke:
Processing graph 222 (ref=pe_calc-dc-1470932393-1349) derived from
/var/lib/pacemaker/pengine/pe-input-625.bz2
And the corresponding entries from /var/log/messages:
2016-08-11T09:19:53.146603-07:00 ars-2 crmd[2758]: notice: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED
origin=crm_timer_popped ]
2016-08-11T09:19:53.152322-07:00 ars-2 pengine[2757]: notice: On loss
of CCM Quorum: Ignore
2016-08-11T09:19:53.153078-07:00 ars-2 pengine[2757]: warning: Forcing
ms-ars-pgsql away from ars1 after 1000000 failures (max=1000000)
2016-08-11T09:19:53.153266-07:00 ars-2 pengine[2757]: warning: Forcing
ms-ars-pgsql away from ars1 after 1000000 failures (max=1000000)
2016-08-11T09:19:53.153395-07:00 ars-2 pengine[2757]: warning: Forcing
ms-ars-pgsql away from ars2 after 1000000 failures (max=1000000)
2016-08-11T09:19:53.153547-07:00 ars-2 pengine[2757]: warning: Forcing
ms-ars-pgsql away from ars2 after 1000000 failures (max=1000000)
2016-08-11T09:19:53.155568-07:00 ars-2 crmd[2758]: notice: Processing
graph 222 (ref=pe_calc-dc-1470932393-1349) derived from
/var/lib/pacemaker/pengine/pe-input-625.bz2
2016-08-11T09:19:53.155768-07:00 ars-2 pengine[2757]: notice:
Calculated Transition 222: /var/lib/pacemaker/pengine/pe-input-625.bz2
2016-08-11T09:19:53.155927-07:00 ars-2 crmd[2758]: notice: Transition
222 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-625.bz2): Complete
2016-08-11T09:19:53.156085-07:00 ars-2 crmd[2758]: notice: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
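If it helps: the INFINITY fail counts in the pengine log above are why both
instances "cannot run anywhere", so any retry presumably needs the fail counts
cleared first. A sketch of what I understand the cleanup to be (assuming
crmsh and the standard Pacemaker CLI tools; run on a cluster node):

```shell
# Clear the fail counts so the policy engine will place the resource again
crm resource cleanup res-ars-pgsql

# Or, per node, with the lower-level tool:
crm_failcount --delete -r res-ars-pgsql -N ars1
crm_failcount --delete -r res-ars-pgsql -N ars2
```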
Can anyone provide thoughts on how to debug this?
Should I give up on the SLES-provided RA and use PAF instead?
Thanks,
Darren
_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org