Thank you, Andrew, that's perfect! The more I use Pacemaker, the trickier it gets when working with third-party tools (although crmsh is great). Completing the Pacemaker configuration obviously involves some manual XML tuning. It's often more a crmsh issue than a Pacemaker one, actually...
2014-06-27 7:43 GMT+02:00 Andrew Beekhof <and...@beekhof.net>:

> On 25 Jun 2014, at 7:36 pm, Sékine Coulibaly <scoulib...@gmail.com> wrote:
>
> > Hi all,
> >
> > My setup is as follows: Red Hat 6.3 (yes, I know, this is quite old), Pacemaker 1.1.7, Corosync 1.4.1.
> >
> > I noticed something strange, because it doesn't comply with what I read (and understood) in the following resources:
> > 1. http://crmsh.nongnu.org/crm.8.html#cmdhelp_configure_order
> > 2. http://doc.opensuse.org/products/draft/SLE-HA/SLE-ha-guide_sd_draft/cha.ha.manual_config.html
> >
> > The strange behaviour I experience is related to ordering. Let's take this very basic case:
> >
> > One ZooKeeper process and one PostgreSQL master, and I want them both to run on the master node. Since they bind to the VIP address, I want the VIP to be started before ZooKeeper and PostgreSQL. So I set up:
> >
> > primitive VIP73 ocf:heartbeat:IPaddr2 \
> >     params ip="192.168.73.222" broadcast="192.168.73.255" nic="eth1" cidr_netmask="24" iflabel="VIP73" \
> >     op monitor interval="10s" timeout="20s"
> >
> > primitive POSTGRESQL ocf:custom:postgresql \
> >     params repmgr_conf="/var/lib/pgsql/repmgr/repmgr.conf" pgctl="/usr/pgsql-9.2/bin/pg_ctl" pgdata="/opt/custom/pgdata" \
> >     op start interval="0" timeout="90s" \
> >     op stop interval="0" timeout="60s" \
> >     op promote interval="0" timeout="120s" \
> >     op monitor interval="53s" role="Master" \
> >     op monitor interval="60s" role="Slave"
> >
> > primitive ZK ocf:custom:zookeeper \
> >     op monitor interval="5s" timeout="10s" \
> >     op start interval="0" timeout="10s" \
> >     op stop interval="0" timeout="10s"
> >
> > ms MS_POSTGRESQL POSTGRESQL \
> >     meta clone-max="2" target-role="Started" resource-stickiness="100" notify="true"
> >
> > I then add an ordering constraint such as this one:
> >
> > order VIP_last inf: VIP73 ZK MS_POSTGRESQL:promote
> >
> > I expect the VIP to be mounted, then ZK started, and then the PostgreSQL master to be promoted. Instead, all the resources seem to be started in parallel.
>
> Yep, that'll do that.
> I think you need brackets around the group or something, my memory is a little hazy.
> That's the trigger for crmsh to set sequential=true, which gives the behaviour you're looking for.
>
> The default is quite silly.
>
> Possibly it's easier to just edit the XML.
>
> > Here comes a /var/log/messages extract, taken just after a corosync restart.
> > It can be seen at Jun 25 05:19:52 that VIP73 and POSTGRESQL are all started simultaneously.
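[Editor's note: the behaviour discussed above comes from crmsh turning a multi-resource `order` line into a resource set whose members may start in parallel. One workaround, which avoids resource-set semantics altogether, is to split the constraint into pairwise orders. This is a sketch reusing the resource names from the configuration above; the constraint IDs `VIP_before_ZK` and `ZK_before_PG` are made up:

```
# crmsh: two pairwise ordering constraints instead of one three-resource set.
# Each pair is strictly sequential: VIP73 must be started before ZK,
# and ZK must be started before MS_POSTGRESQL is promoted.
order VIP_before_ZK inf: VIP73 ZK
order ZK_before_PG inf: ZK MS_POSTGRESQL:promote
```

Alternatively, following the edit-the-XML suggestion, forcing sequential="true" on the generated resource set should give the same result. A sketch of what the CIB fragment might look like (untested against Pacemaker 1.1.7; the set IDs are illustrative):

```
<rsc_order id="VIP_last" score="INFINITY">
  <resource_set id="VIP_last-set-0" sequential="true" action="start">
    <resource_ref id="VIP73"/>
    <resource_ref id="ZK"/>
  </resource_set>
  <resource_set id="VIP_last-set-1" action="promote">
    <resource_ref id="MS_POSTGRESQL"/>
  </resource_set>
</rsc_order>
```
]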
> > Do I have something wrong here, or something I didn't understand?
> >
> > Thank you!
> >
> > Jun 25 05:19:52 clustera pengine[33832]: info: determine_online_status: Node clustera is online
> > Jun 25 05:19:52 clustera pengine[33832]: info: native_print: VIP73#011(ocf::heartbeat:IPaddr2):#011Stopped
> > Jun 25 05:19:52 clustera pengine[33832]: info: clone_print: Master/Slave Set: MS_POSTGRESQL [POSTGRESQL]
> > Jun 25 05:19:52 clustera pengine[33832]: info: short_print: Stopped: [ POSTGRESQL:0 POSTGRESQL:1 ]
> > Jun 25 05:19:52 clustera pengine[33832]: info: native_print: ZK#011(ocf::custom:zookeeper):#011Stopped
> > Jun 25 05:19:52 clustera pengine[33832]: info: native_color: Resource POSTGRESQL:1 cannot run anywhere
> > Jun 25 05:19:52 clustera pengine[33832]: info: master_color: MS_POSTGRESQL: Promoted 0 instances of a possible 1 to master
> > Jun 25 05:19:52 clustera pengine[33832]: info: RecurringOp: Start recurring monitor (10s) for VIP73 on clustera
> > Jun 25 05:19:52 clustera pengine[33832]: info: RecurringOp: Start recurring monitor (60s) for POSTGRESQL:0 on clustera
> > Jun 25 05:19:52 clustera pengine[33832]: info: RecurringOp: Start recurring monitor (60s) for POSTGRESQL:0 on clustera
> > Jun 25 05:19:52 clustera pengine[33832]: info: RecurringOp: Start recurring monitor (5s) for ZK on clustera
> > Jun 25 05:19:52 clustera pengine[33832]: notice: LogActions: Start VIP73#011(clustera)
> > Jun 25 05:19:52 clustera pengine[33832]: notice: LogActions: Start POSTGRESQL:0#011(clustera)
> > Jun 25 05:19:52 clustera pengine[33832]: info: LogActions: Leave POSTGRESQL:1#011(Stopped)
> > Jun 25 05:19:52 clustera pengine[33832]: notice: LogActions: Start ZK#011(clustera)
> >
> > ...
> > Jun 25 05:19:52 clustera crmd[33833]: info: te_rsc_command: Initiating action 3: probe_complete probe_complete on clustera (local) - no waiting
> > Jun 25 05:19:52 clustera crmd[33833]: info: te_rsc_command: Initiating action 7: start VIP73_start_0 on clustera (local)
> > Jun 25 05:19:52 clustera attrd[33831]: info: find_hash_entry: Creating hash entry for probe_complete
> > Jun 25 05:19:52 clustera attrd[33831]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> > Jun 25 05:19:52 clustera lrmd: [33830]: info: rsc:VIP73:5: start
> > Jun 25 05:19:52 clustera attrd[33831]: notice: attrd_perform_update: Sent update 4: probe_complete=true
> > Jun 25 05:19:52 clustera crmd[33833]: info: te_rsc_command: Initiating action 9: start POSTGRESQL:0_start_0 on clustera (local)
> > Jun 25 05:19:52 clustera lrmd: [33830]: info: rsc:POSTGRESQL:0:6: start
> > Jun 25 05:19:52 clustera IPaddr2(VIP73)[33953]: INFO: ip -f inet addr add 192.168.73.222/24 brd 192.168.73.255 dev eth1 label eth1:VIP73
> > Jun 25 05:19:52 clustera avahi-daemon[2065]: Registering new address record for 192.168.73.222 on eth1.IPv4.
> > Jun 25 05:19:52 clustera IPaddr2(VIP73)[33953]: INFO: ip link set eth1 up
> > Jun 25 05:19:52 clustera IPaddr2(VIP73)[33953]: INFO: /usr/lib64/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp-192.168.73.222 eth1 192.168.73.222 auto not_used not_used
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: POSTGRESQL:0: Starting
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: ACTION=start
> > Jun 25 05:19:52 clustera lrmd: [33830]: info: Managed VIP73:start process 33953 exited with return code 0.
> > Jun 25 05:19:52 clustera crmd[33833]: info: process_lrm_event: LRM operation VIP73_start_0 (call=5, rc=0, cib-update=29, confirmed=true) ok
> > Jun 25 05:19:52 clustera crmd[33833]: info: te_rsc_command: Initiating action 8: monitor VIP73_monitor_10000 on clustera (local)
> > Jun 25 05:19:52 clustera lrmd: [33830]: info: rsc:VIP73:7: monitor
> > Jun 25 05:19:52 clustera crmd[33833]: info: te_rsc_command: Initiating action 35: start ZK_start_0 on clustera (local)
> > Jun 25 05:19:52 clustera lrmd: [33830]: info: rsc:ZK:8: start
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RA_VERSION_MAJOR=1
> > Jun 25 05:19:52 clustera zookeeper(ZK)[34058]: INFO: [ZK] No pid file found
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RA_VERSION_MINOR=0
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_clone=0
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_clone_max=2
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_clone_node_max=1
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_globally_unique=false
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_master_max=1
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_master_node_max=1
> > Jun 25 05:19:52 clustera lrmd: [33830]: info: Managed VIP73:monitor process 34057 exited with return code 0.
> > Jun 25 05:19:52 clustera crmd[33833]: info: process_lrm_event: LRM operation VIP73_monitor_10000 (call=7, rc=0, cib-update=30, confirmed=false) ok
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_name=start
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify=true
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_active_resource=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_active_uname=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_demote_resource=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_demote_uname=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_inactive_resource=POSTGRESQL:0
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: POSTGRESQL:1
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_master_resource=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_master_uname=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_promote_resource=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_promote_uname=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_slave_resource=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_slave_uname=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_start_resource=POSTGRESQL:0
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_start_uname=clustera
> > Jun 25 05:19:52 clustera lrmd: [33830]: info: RA output: (ZK:start:stdout) Starting zookeeper ...
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_stop_resource=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_notify_stop_uname=
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_CRM_meta_timeout=90000
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_crm_feature_set=3.0.6
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_pgctl=/usr/pgsql-9.2/bin/pg_ctl
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_pgdata=/opt/custom/pgdata
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESKEY_repmgr_conf=/var/lib/pgsql/repmgr/repmgr.conf
> > Jun 25 05:19:52 clustera avahi-daemon[2065]: Invalid legacy unicast query packet.
> > Jun 25 05:19:52 clustera avahi-daemon[2065]: Received response from host 192.168.72.1 with invalid source port 50070 on interface 'eth0.0'
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESOURCE_INSTANCE=POSTGRESQL:0
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESOURCE_PROVIDER=custom
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_RESOURCE_TYPE=postgresql
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: OCF_ROOT=/usr/lib/ocf
> > Jun 25 05:19:52 clustera postgresql(POSTGRESQL:0)[33954]: INFO: Run as postgres: /usr/pgsql-9.2/bin/pg_ctl start -w -D /opt/custom/pgdata -l /var/log/custom/custom/ha/postgresql/postgres_ha.log -o '-c config_file=/opt/custom/pgdata/postgresql.conf' -o '-p 5432'
> > Jun 25 05:19:53 clustera avahi-daemon[2065]: Invalid legacy unicast query packet.
> > Jun 25 05:19:53 clustera avahi-daemon[2065]: Invalid legacy unicast query packet.
> > Jun 25 05:19:53 clustera avahi-daemon[2065]: Received response from host 192.168.72.1 with invalid source port 50070 on interface 'eth0.0'
> > Jun 25 05:19:53 clustera avahi-daemon[2065]: Received response from host 192.168.72.1 with invalid source port 50070 on interface 'eth0.0'
> > Jun 25 05:19:53 clustera lrmd: [33830]: info: RA output: (ZK:start:stdout) STARTED
> > Jun 25 05:19:53 clustera lrmd: [33830]: info: RA output: (ZK:start:stdout) ---------------------------------------------------------#012Starting ZK restore#012---------------------------------------------------------
> > Jun 25 05:19:53 clustera lrmd: [33830]: info: RA output: (ZK:start:stderr) ls: cannot access /var/custom/custom/ha/snapshots/zk_dump_*.xml: No such file or directory
> > Jun 25 05:19:53 clustera lrmd: [33830]: info: RA output: (ZK:start:stdout) [ZK] Last snapshot : <NONE>#012[ZK] WARNING: Nothing to restore
> > Jun 25 05:19:53 clustera zookeeper(ZK)[34058]: INFO: [ZK] Nothing to restore
> > Jun 25 05:19:53 clustera lrmd: [33830]: info: RA output: (ZK:start:stdout) ---------------------------------------------------------
> > Jun 25 05:19:53 clustera lrmd: [33830]: info: Managed ZK:start process 34058 exited with return code 0.
> > Jun 25 05:19:53 clustera crmd[33833]: info: process_lrm_event: LRM operation ZK_start_0 (call=8, rc=0, cib-update=31, confirmed=true) ok
> > Jun 25 05:19:53 clustera crmd[33833]: info: te_rsc_command: Initiating action 36: monitor ZK_monitor_5000 on clustera (local)
> > Jun 25 05:19:53 clustera lrmd: [33830]: info: rsc:ZK:9: monitor
> > Jun 25 05:19:53 clustera lrmd: [33830]: info: Managed ZK:monitor process 34390 exited with return code 0.
> > Jun 25 05:19:53 clustera crmd[33833]: info: process_lrm_event: LRM operation ZK_monitor_5000 (call=9, rc=0, cib-update=32, confirmed=false) ok
> > Jun 25 05:19:53 clustera postgresql(POSTGRESQL:0)[33954]: INFO: waiting for server to start.... done server started
> > Jun 25 05:19:54 clustera avahi-daemon[2065]: Received response from host 192.168.72.1 with invalid source port 50070 on interface 'eth0.0'
> > Jun 25 05:19:56 clustera avahi-daemon[2065]: Received response from host 192.168.72.1 with invalid source port 50070 on interface 'eth0.0'
> > Jun 25 05:19:56 clustera lrmd: [33830]: info: RA output: (VIP73:start:stderr) ARPING 192.168.73.222 from 192.168.73.222 eth1#012Sent 5 probes (5 broadcast(s))#
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org