Hi,

On Mon, Feb 08, 2010 at 03:45:25PM -0600, D. J. Draper wrote:
>
> On Mon, Feb 8 13:36:47 EST 2010, Dejan Muhamedagic wrote:
> The logs don't contain the period when CRM probes for running
> resources. But I can imagine what is actually going on. This is a
> deficiency in handling probes in the LVM and, perhaps, the
> Filesystem resource agents. Can you please post the logs from the
> time when the cluster is starting. Actually, best to open a
> bugzilla and attach a hb_report report.
>
> Thanks,
>
> Dejan
>
> Thanks for the reply Dejan. I attached a zip file with several
> log files covering two reboots on each server. To generate

According to Node01Reboot1500ha-log.log, CRM first starts LVM
then drbd:

Feb 08 15:03:36 node01.houseofdraper.org lrmd: [1771]: info: rsc:lvm_data0:6: start
Feb 08 15:03:36 node01.houseofdraper.org crmd: [1774]: info: do_lrm_rsc_op: Performing key=7:1:0:1fcb0ada-cc5d-463b-ab2d-e046fee580ed op=drbd_data0:1_start_0 )
Feb 08 15:03:36 node01.houseofdraper.org lrmd: [1771]: info: rsc:drbd_data0:1:7: start
Feb 08 15:03:36 node01.houseofdraper.org crmd: [1774]: info: do_lrm_rsc_op: Performing key=35:1:0:1fcb0ada-cc5d-463b-ab2d-e046fee580ed op=drbd_data1:1_start_0 )
Feb 08 15:03:36 node01.houseofdraper.org lrmd: [1771]: info: rsc:drbd_data1:1:8: start

That's obviously a configuration problem. It's similar in all the
other logs; it's as if there are no constraints.

There are also numerous drbd errors:

Node01Reboot1400messages.log:Feb 8 14:00:56 node01 drbd[6124]: ERROR: data0: Called drbdadm -c /etc/drbd.conf secondary data0
Node01Reboot1400messages.log:Feb 8 14:00:56 node01 drbd[6124]: ERROR: data0: Exit code 11
Node01Reboot1400messages.log:Feb 8 14:00:56 node01 drbd[6124]: ERROR: data0: Command output:
Node01Reboot1400messages.log:Feb 8 14:00:56 node01 drbd[6124]: ERROR: data0: Called drbdadm -c /etc/drbd.conf secondary data0

etc.

Looking again at your configuration, there are some strange
resource relations:

> order ord_data00 inf: ms_drbd_data0:promote ms_drbd_data1:promote

How are these two dependent on each other?

> order ord_data01 inf: ms_drbd_data0:promote lvm_data0:start
> order ord_data02 inf: lvm_data0:start fs_data0:start
> order ord_data03 inf: ms_drbd_data1:promote lvm_data1:start
> order ord_data04 inf: lvm_data1:start fs_data1:start
> order ord_data05 inf: fs_data0:start fs_data1:start

And these two?

> order ord_data06 inf: fs_data1:start ip_data:start
> order ord_data07 inf: ip_data:start svc_nfs:start
> order ord_data08 inf: ip_data:start svc_samba:start

Perhaps you could use groups to reduce the configuration size a
bit. It's quite hard to follow all the constraints.
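
To illustrate the groups suggestion, here is a rough, untested sketch
in crm shell syntax. The resource names are the ones from your
configuration; the group and constraint ids (grp_*, col_*, ord_data0,
ord_data1, ord_svc*) are made up for the example. The colocation
lines are an assumption on my part, since the colocations you
currently have are not shown in the excerpt above.

```
# Hypothetical ids; adjust to taste. A group starts its members in
# the listed order and keeps them on the same node.
group grp_data0 lvm_data0 fs_data0
group grp_data1 lvm_data1 fs_data1
group grp_svc ip_data svc_nfs svc_samba

# Each storage group follows the promotion of its own drbd device.
order ord_data0 inf: ms_drbd_data0:promote grp_data0:start
order ord_data1 inf: ms_drbd_data1:promote grp_data1:start

# Services start once both filesystems are up.
order ord_svc0 inf: grp_data0:start grp_svc:start
order ord_svc1 inf: grp_data1:start grp_svc:start

# Assumed colocations: storage runs where drbd is Master, and the
# services run where the storage is.
colocation col_data0 inf: grp_data0 ms_drbd_data0:Master
colocation col_data1 inf: grp_data1 ms_drbd_data1:Master
colocation col_svc0 inf: grp_svc grp_data0
colocation col_svc1 inf: grp_svc grp_data1
```

Note that this is not strictly equivalent to the nine order
constraints above: it drops the ordering between the two drbd
devices and between the two filesystems, and inside grp_svc
svc_samba would start after svc_nfs instead of in parallel with it.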

Please use hb_report, it is the only way one can correlate events
with logs with configuration. And you'll find it a tad easier
than collecting stuff by hand. The bugzilla is at
http://developerbugs.linux-foundation.org/

Thanks,

Dejan

> these, I started with all the resources running on Node01. I
> issued the first reboot at 14:00, after which all the resources
> except fs_data0 started successfully on Node02. I issued a
> second reboot at 15:00, after which only the drbd resources
> successfully restarted on Node01:
>
> -bash-4.0# crm status
> ============
> Last updated: Mon Feb 8 15:42:25 2010
> Stack: Heartbeat
> Current DC: node02.houseofdraper.org (a91b7362-448e-4437-a543-19e0067a5d2e) - partition with quorum
> Version: 1.0.7-d3fa20fc76c7947d6de66db7e52526dc6bd7d782
> 2 Nodes configured, unknown expected votes
> 4 Resources configured.
> ============
>
> Online: [ node01.houseofdraper.org node02.houseofdraper.org ]
>
> Master/Slave Set: ms_drbd_data0
>     Masters: [ node01.houseofdraper.org ]
>     Slaves: [ node02.houseofdraper.org ]
> Master/Slave Set: ms_drbd_data1
>     Masters: [ node01.houseofdraper.org ]
>     Slaves: [ node02.houseofdraper.org ]
>
> Failed actions:
>     lvm_data0_start_0 (node=node02.houseofdraper.org, call=14, rc=1, status=complete): unknown error
>     fs_data0_start_0 (node=node02.houseofdraper.org, call=6, rc=5, status=complete): not installed
>     lvm_data0_start_0 (node=node01.houseofdraper.org, call=6, rc=1, status=complete): unknown error
>     fs_data0_start_0 (node=node01.houseofdraper.org, call=14, rc=5, status=complete): not installed
> -bash-4.0#
>
> As for the bugzilla report, if you would kindly point me to a
> FAQ or HOWTO covering the proper submission of a bugzilla
> report for this group, I would be happy to initiate one.
> Thanks in advance,
>
> DJ

_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker