Hi,

On Thu, Sep 29, 2011 at 09:30:55AM -0300, Charles Richard wrote:
> Here it is attached.
>
> I also see the following 2 errors in the node 2 logs, which I assume
> mean the problem is really that node1 is not getting demoted, and I'm
> not sure why:
>
> Error 1:
>
> Sep 28 19:53:20 staging2 drbd[8587]: ERROR: mysqld: Called drbdadm -c /etc/drbd.conf primary mysqld
> Sep 28 19:53:20 staging2 drbd[8587]: ERROR: mysqld: Exit code 11
> Sep 28 19:53:20 staging2 drbd[8587]: ERROR: mysqld: Command output:
> Sep 28 19:53:20 staging2 lrmd: [1442]: info: RA output: (drbd_mysql:1:promote:stdout)
> Sep 28 19:53:22 staging2 lrmd: [1442]: info: RA output: (drbd_mysql:1:promote:stderr) 0: State change failed: (-1) Multiple primaries not allowed by config
>
> Error 2:
>
> Sep 28 19:53:27 staging2 kernel: d-con mysqld: Requested state change failed by peer: Refusing to be Primary while peer is not outdated (-7)
> Sep 28 19:53:27 staging2 kernel: d-con mysqld: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> Outdated ) pdsk( UpToDate -> DUnknown )
> Sep 28 19:53:27 staging2 kernel: d-con mysqld: meta connection shut down by peer.
>
> Also, failover works fine if I reboot either machine: the outdated
> machine comes back up as secondary. The scenario where I get the errors
> above is when I pull the network cable from the primary. Is a stonith
> device what should be protecting against this scenario, potentially by
> rebooting the primary?
Yes. That's the only way for the cluster to stay sane in the case of a
split brain caused by pulling the network cable.

Thanks,

Dejan

> Feels like I'm getting so close to getting this working!
>
> Thanks!
> Charles
>
> On Thu, Sep 29, 2011 at 4:15 AM, Andrew Beekhof <and...@beekhof.net> wrote:
>
> > Could you attach /var/lib/pengine/pe-input-3802.bz2 from staging1?
> > That would tell us why.
> >
> > On Mon, Sep 26, 2011 at 10:28 PM, Charles Richard
> > <chachi.rich...@gmail.com> wrote:
> > > Hi,
> > >
> > > I'm finally making some headway with my pacemaker install, but now
> > > that crm_mon no longer returns errors and crm_verify is clear, I'm
> > > having a problem where my master won't get promoted. Not sure what
> > > to do with this one; any suggestions? Here are the log snippet and
> > > config files:
> > >
> > > Sep 26 04:06:12 staging1 crmd: [1686]: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped!
> > > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
> > > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
> > > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
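As an illustration of the stonith point above, a minimal sketch in crm shell
syntax, assuming IPMI-capable hardware; the resource names, IPMI addresses
and credentials here are made up and must be replaced with real ones:

```
primitive stonith_staging1 stonith:external/ipmi \
    params hostname="staging1.dev.applepeak.com" \
        ipaddr="10.10.10.91" userid="admin" passwd="secret" interface="lan" \
    op monitor interval="60s"
primitive stonith_staging2 stonith:external/ipmi \
    params hostname="staging2.dev.applepeak.com" \
        ipaddr="10.10.10.92" userid="admin" passwd="secret" interface="lan" \
    op monitor interval="60s"
# keep each stonith resource off the node it is meant to shoot
location l_stonith_staging1 stonith_staging1 -inf: staging1.dev.applepeak.com
location l_stonith_staging2 stonith_staging2 -inf: staging2.dev.applepeak.com
property stonith-enabled="true"
```

On the DRBD side, the resource can additionally hand fencing decisions to
the cluster (a "fencing resource-only;" statement in the disk section of
drbd.conf, plus the crm-fence-peer.sh handler shipped with DRBD), so a
disconnected peer gets outdated via the CIB instead of trying to promote.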
> > > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_pe_invoke: Query 106: Requesting the current CIB: S_POLICY_ENGINE
> > > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_pe_invoke_callback: Invoking the PE: query=106, ref=pe_calc-dc-1317020772-95, seq=2564, quorate=1
> > > Sep 26 04:06:12 staging1 pengine: [1685]: info: unpack_config: Startup probes: enabled
> > > Sep 26 04:06:12 staging1 pengine: [1685]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > > Sep 26 04:06:12 staging1 pengine: [1685]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> > > Sep 26 04:06:12 staging1 pengine: [1685]: info: unpack_domains: Unpacking domains
> > > Sep 26 04:06:12 staging1 pengine: [1685]: info: determine_online_status: Node staging1.dev.applepeak.com is online
> > > Sep 26 04:06:12 staging1 pengine: [1685]: info: determine_online_status: Node staging2.dev.applepeak.com is online
> > > Sep 26 04:06:12 staging1 pengine: [1685]: notice: group_print: Resource Group: mysql
> > > Sep 26 04:06:12 staging1 pengine: [1685]: notice: native_print: fs_mysql#011(ocf::heartbeat:Filesystem):#011Stopped
> > > Sep 26 04:06:12 staging1 pengine: [1685]: notice: native_print: ip_mysql#011(ocf::heartbeat:IPaddr2):#011Stopped
> > > Sep 26 04:06:12 staging1 pengine: [1685]: notice: native_print: mysqld#011(lsb:mysqld):#011Stopped
> > > Sep 26 04:06:12 staging1 pengine: [1685]: notice: clone_print: Master/Slave Set: ms_drbd_mysql
> > > Sep 26 04:06:12 staging1 pengine: [1685]: notice: short_print: Stopped: [ drbd_mysql:0 drbd_mysql:1 ]
> > > Sep 26 04:06:12 staging1 pengine: [1685]: info: master_color: ms_drbd_mysql: Promoted 0 instances of a possible 1 to master
> > > Sep 26 04:06:12 staging1 pengine: [1685]: info: native_merge_weights: fs_mysql: Rolling back scores from ip_mysql
> > > Sep 26 04:06:12 staging1 pengine: [1685]: info: native_merge_weights: ip_mysql: Rolling back scores from mysqld
> > > Sep 26 04:06:12 staging1 pengine: [1685]: info: master_color: ms_drbd_mysql: Promoted 0 instances of a possible 1 to master
> > > Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource fs_mysql#011(Stopped)
> > > Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource ip_mysql#011(Stopped)
> > > Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource mysqld#011(Stopped)
> > > Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource drbd_mysql:0#011(Stopped)
> > > Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource drbd_mysql:1#011(Stopped)
> > > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> > > Sep 26 04:06:12 staging1 crmd: [1686]: info: unpack_graph: Unpacked transition 72: 0 actions in 0 synapses
> > > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_te_invoke: Processing graph 72 (ref=pe_calc-dc-1317020772-95) derived from /var/lib/pengine/pe-input-3802.bz2
> > > Sep 26 04:06:12 staging1 crmd: [1686]: info: run_graph: ====================================================
> > > Sep 26 04:06:12 staging1 crmd: [1686]: notice: run_graph: Transition 72 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-3802.bz2): Complete
> > > Sep 26 04:06:12 staging1 crmd: [1686]: info: te_graph_trigger: Transition 72 is now complete
> > > Sep 26 04:06:12 staging1 crmd: [1686]: info: notify_crmd: Transition 72 status: done - <null>
> > > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> > > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: Starting PEngine Recheck Timer
> > > Sep 26 04:06:12 staging1 pengine: [1685]: info: process_pe_message: Transition 72: PEngine Input stored in: /var/lib/pengine/pe-input-3802.bz2
> > > Sep 26 04:15:09 staging1 cib: [1682]: info: cib_stats: Processed 1 operations (0.00us average, 0% utilization) in the last 10min
> > >
> > > My drbd config file:
> > >
> > > resource mysqld {
> > >     protocol C;
> > >     startup { wfc-timeout 0; degr-wfc-timeout 120; }
> > >     disk { on-io-error detach; }
> > >     on staging1 {
> > >         device /dev/drbd0;
> > >         disk /dev/vg_staging1/lv_data;
> > >         meta-disk internal;
> > >         address 10.10.20.1:7788;
> > >     }
> > >     on staging2 {
> > >         device /dev/drbd0;
> > >         disk /dev/vg_staging2/lv_data;
> > >         meta-disk internal;
> > >         address 10.10.20.2:7788;
> > >     }
> > > }
> > >
> > > corosync.conf:
> > >
> > > compatibility: whitetank
> > >
> > > aisexec {
> > >     user: root
> > >     group: root
> > > }
> > >
> > > totem {
> > >     version: 2
> > >     secauth: off
> > >     threads: 0
> > >     interface {
> > >         ringnumber: 0
> > >         bindnetaddr: 10.10.10.0
> > >         mcastaddr: 226.94.1.1
> > >         mcastport: 5405
> > >     }
> > > }
> > >
> > > logging {
> > >     fileline: off
> > >     to_stderr: no
> > >     to_logfile: no
> > >     to_syslog: yes
> > >     logfile: /var/log/cluster/corosync.log
> > >     debug: off
> > >     timestamp: on
> > >     logger_subsys {
> > >         subsys: AMF
> > >         debug: off
> > >     }
> > > }
> > >
> > > amf {
> > >     mode: disabled
> > > }
> > >
> > > service {
> > >     # Load Pacemaker
> > >     name: pacemaker
> > >     ver: 0
> > >     use_mgmtd: yes
> > > }
> > >
> > > And my crm config:
> > >
> > > node staging1.dev.applepeak.com
> > > node staging2.dev.applepeak.com
> > > primitive drbd_mysql ocf:linbit:drbd \
> > >     params drbd_resource="mysqld" \
> > >     op monitor interval="15s" \
> > >     op start interval="0" timeout="240s" \
> > >     op stop interval="0" timeout="100s"
> > > primitive fs_mysql ocf:heartbeat:Filesystem \
> > >     params device="/dev/drbd0" directory="/opt/data/mysql/data/mysql" fstype="ext4" \
> > >     op start interval="0" timeout="60s" \
> > >     op stop interval="0" timeout="60s"
> > > primitive ip_mysql ocf:heartbeat:IPaddr2 \
> > >     params ip="10.10.10.31" nic="eth0"
> > > primitive mysqld lsb:mysqld
> > > group mysql fs_mysql ip_mysql mysqld
> > > ms ms_drbd_mysql drbd_mysql \
> > >     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> > > colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
> > > order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
> > > property $id="cib-bootstrap-options" \
> > >     dc-version="1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe" \
> > >     cluster-infrastructure="openais" \
> > >     expected-quorum-votes="2" \
> > >     stonith-enabled="false" \
> > >     last-lrm-refresh="1316961847" \
> > >     stop-all-resources="true" \
> > >     no-quorum-policy="ignore"
> > > rsc_defaults $id="rsc-options" \
> > >     resource-stickiness="100"
> > >
> > > Thanks,
> > > Charles

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker