Two tips:
1) Did you stop the configured postgres in the cluster and put it into maintenance mode while trying ocf-tester?
2) When testing my RAs I temporarily replace "#!/bin/sh" with "#!/bin/sh -x". It produces a lot of output, but sometimes you'll find the problem.
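To illustrate tip 2 without editing the agent itself: running a script as "sh -x script" gives the same trace as changing its first line to "#!/bin/sh -x". The sketch below is only an illustration. It uses a tiny stand-in script (hypothetical, not the real pgsql RA) and a made-up path, /nonexistent/pg_ctl, so it shows what the trace looks like rather than what the real agent does.

```shell
#!/bin/sh
# Stand-in for a resource agent (hypothetical; NOT the real pgsql RA).
# /nonexistent/pg_ctl is a made-up path chosen so the check always fails.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/sh
PGCTL=/nonexistent/pg_ctl
if [ ! -x "$PGCTL" ]; then
    echo "couldn't find command: $PGCTL"
    exit 5    # 5 is OCF_ERR_INSTALLED
fi
exit 0
EOF

# "sh -x" prints each command to stderr before running it, which has the
# same effect as temporarily changing the shebang to "#!/bin/sh -x".
trace=$(sh -x "$script" 2>&1)
rc=$?
rm -f "$script"

echo "$trace"
echo "rc=$rc"
```

In the trace, each executed command appears with its variables already expanded, so a wrong or defaulted value (such as the pg_ctl path) is visible right before the failing test.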
Regards,
Ulrich

>>> Darren Kinley <[email protected]> wrote on 11.08.2016 at 23:44 in message
<[email protected]>:
> Hi,
>
> I have PostgreSQL 9.3 replicated and I'm trying to put it under Pacemaker
> control using ocf:heartbeat:pgsql provided by SLES12SP1.
>
> This is the crmsh script that I used to configure Pacemaker.
>
> configure cib new pgsql_cfg --force
> configure primitive res-ars-pgsql ocf:heartbeat:pgsql \
>     pgctl="/usr/lib/postgresql93/bin/pg_ctl" \
>     psql="/usr/lib/postgresql93/bin/psql" \
>     pgdata="/var/lib/pgsql/data/" \
>     rep_mode="sync" \
>     node_list="ars1 ars2" \
>     restore_command="cp /var/lib/pgsql/pg_archive/%f %p" \
>     primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
>     master_ip="192.168.244.223" \
>     restart_on_promote='true' \
>     pghost="191.168.244.223" \
>     repuser="postgres" \
>     check_wal_receiver='true' \
>     monitor_user='postgres' \
>     monitor_password='xxx' \
>     op start timeout="120s" interval="0s" on-fail="restart" \
>     op monitor timeout="120s" interval="4s" on-fail="restart" \
>     op monitor timeout="120s" interval="3s" on-fail="restart" role="Master" \
>     op promote timeout="120s" interval="0s" on-fail="restart" \
>     op demote timeout="120s" interval="0s" on-fail="stop" \
>     op stop timeout="120s" interval="0s" on-fail="block" \
>     op notify timeout="90s" interval="0s"
> configure ms ms-ars-pgsql res-ars-pgsql \
>     meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
> configure colocation col-ars-pgsql-with-drbd inf: ms-ars-pgsql:Master ms-ars-drbd:Master
> configure cib commit pgsql_cfg
>
> I have a ~postgres/.pgpass
>
>
> My nodes remain stopped and only once during the 12 hours I've been working
> on this did both nodes try to bring up PG (both in recovery mode) before
> shutting them both down.
>
> When running ocf-tester I think that I'm to name the master/slave resource.
>
> ars2:/usr/lib/ocf/resource.d/heartbeat # ocf-tester -v -n ms-ars-pgsql `pwd`/pgsql
> Beginning tests for /usr/lib/ocf/resource.d/heartbeat/pgsql...
> Testing permissions with uid nobody
> Testing: meta-data
> Testing: meta-data
> ...
> <XML removed/>
> ...
> Testing: validate-all
> Checking current state
> Testing: stop
> INFO: waiting for server to shut down.... done server stopped
> INFO: PostgreSQL is down
> Testing: monitor
> INFO: PostgreSQL is down
> Testing: monitor
> ocf-exit-reason:Setup problem: couldn't find command: /usr/bin/pg_ctl
> Testing: start
> INFO: server starting
> INFO: PostgreSQL start command sent.
> INFO: PostgreSQL is started.
> Testing: monitor
> Testing: monitor
> INFO: Don't check /var/lib/pgsql/data during probe
> Testing: notify
> Checking for demote action
> ocf-exit-reason:Not in a replication mode.
> Checking for promote action
> ocf-exit-reason:Not in a replication mode.
> Testing: demotion of started resource
> ocf-exit-reason:Not in a replication mode.
> * rc=6: Demoting a start resource should not fail
> Testing: promote
> ocf-exit-reason:Not in a replication mode.
> * rc=6: Promote failed
> Testing: demote
> ocf-exit-reason:Not in a replication mode.
> * rc=6: Demote failed
> Aborting tests
>
>
> 'Not in a replication mode' disagrees with the res-ars-pgsql above.
> I'm not sure that the pacemaker.log for CIB changes is needed.
>
> Aug 11 09:19:53 [2757] ars2 pengine: info: clone_print: Master/Slave Set: ms-ars-pgsql [res-ars-pgsql]
> Aug 11 09:19:53 [2757] ars2 pengine: info: short_print: Stopped: [ ars1 ars2 ]
> Aug 11 09:19:53 [2757] ars2 pengine: info: get_failcount_full: res-ars-pgsql:0 has failed INFINITY times on ars1
> Aug 11 09:19:53 [2757] ars2 pengine: warning: common_apply_stickiness: Forcing ms-ars-pgsql away from ars1 after 1000000 failures (max=1000000)
> Aug 11 09:19:53 [2757] ars2 pengine: info: get_failcount_full: ms-ars-pgsql has failed INFINITY times on ars1
> Aug 11 09:19:53 [2757] ars2 pengine: warning: common_apply_stickiness: Forcing ms-ars-pgsql away from ars1 after 1000000 failures (max=1000000)
> Aug 11 09:19:53 [2757] ars2 pengine: info: get_failcount_full: res-ars-pgsql:0 has failed INFINITY times on ars2
> Aug 11 09:19:53 [2757] ars2 pengine: warning: common_apply_stickiness: Forcing ms-ars-pgsql away from ars2 after 1000000 failures (max=1000000)
> Aug 11 09:19:53 [2757] ars2 pengine: info: get_failcount_full: ms-ars-pgsql has failed INFINITY times on ars2
> Aug 11 09:19:53 [2757] ars2 pengine: warning: common_apply_stickiness: Forcing ms-ars-pgsql away from ars2 after 1000000 failures (max=1000000)
> Aug 11 09:19:53 [2757] ars2 pengine: info: rsc_merge_weights: ms-ars-drbd: Rolling back scores from ms-ars-pgsql
> Aug 11 09:19:53 [2757] ars2 pengine: info: master_color: Promoting res-ars-drbd:1 (Master ars2)
> Aug 11 09:19:53 [2757] ars2 pengine: info: master_color: ms-ars-drbd: Promoted 1 instances of a possible 1 to master
> Aug 11 09:19:53 [2757] ars2 pengine: info: native_color: res-ars-pgsql:0: Rolling back scores from ms-ars-drbd
> Aug 11 09:19:53 [2757] ars2 pengine: info: native_color: Resource res-ars-pgsql:0 cannot run anywhere
> Aug 11 09:19:53 [2757] ars2 pengine: info: native_color: res-ars-pgsql:1: Rolling back scores from ms-ars-drbd
> Aug 11 09:19:53 [2757] ars2 pengine: info: native_color: Resource res-ars-pgsql:1 cannot run anywhere
> Aug 11 09:19:53 [2757] ars2 pengine: info: master_color: ms-ars-pgsql: Promoted 0 instances of a possible 1 to master
> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-mgmt-vip (Started ars2)
> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-mgmt-app (Started ars2)
> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-vip (Started ars2)
> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-drbd:0 (Slave ars1)
> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-drbd:1 (Master ars2)
> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-lvm (Started ars2)
> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-fs_dropbox (Started ars2)
> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-fs_svndata (Started ars2)
> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-pgsql:0 (Stopped)
> Aug 11 09:19:53 [2757] ars2 pengine: info: LogActions: Leave res-ars-pgsql:1 (Stopped)
> Aug 11 09:19:53 [2758] ars2 crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Aug 11 09:19:53 [2758] ars2 crmd: notice: do_te_invoke: Processing graph 222 (ref=pe_calc-dc-1470932393-1349) derived from /var/lib/pacemaker/pengine/pe-input-625.bz2
>
> and /var/log/messages
>
> 2016-08-11T09:19:53.146603-07:00 ars-2 crmd[2758]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
> 2016-08-11T09:19:53.152322-07:00 ars-2 pengine[2757]: notice: On loss of CCM Quorum: Ignore
> 2016-08-11T09:19:53.153078-07:00 ars-2 pengine[2757]: warning: Forcing ms-ars-pgsql away from ars1 after 1000000 failures (max=1000000)
> 2016-08-11T09:19:53.153266-07:00 ars-2 pengine[2757]: warning: Forcing ms-ars-pgsql away from ars1 after 1000000 failures (max=1000000)
> 2016-08-11T09:19:53.153395-07:00 ars-2 pengine[2757]: warning: Forcing ms-ars-pgsql away from ars2 after 1000000 failures (max=1000000)
> 2016-08-11T09:19:53.153547-07:00 ars-2 pengine[2757]: warning: Forcing ms-ars-pgsql away from ars2 after 1000000 failures (max=1000000)
> 2016-08-11T09:19:53.155568-07:00 ars-2 crmd[2758]: notice: Processing graph 222 (ref=pe_calc-dc-1470932393-1349) derived from /var/lib/pacemaker/pengine/pe-input-625.bz2
> 2016-08-11T09:19:53.155768-07:00 ars-2 pengine[2757]: notice: Calculated Transition 222: /var/lib/pacemaker/pengine/pe-input-625.bz2
> 2016-08-11T09:19:53.155927-07:00 ars-2 crmd[2758]: notice: Transition 222 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-625.bz2): Complete
> 2016-08-11T09:19:53.156085-07:00 ars-2 crmd[2758]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>
>
> Can anyone provide thoughts on how to debug this?
> Should I give up with the SLES provided RA and use PAF instead?
>
> Thanks,
> Darren
