Folks,

I am having trouble starting my DRBD+OCFS2 filesystem. It seems to be a timing problem: the filesystem tries to come up before DRBD has brought the second node of the cluster into Primary mode. I find this in the logs:
Dec 4 15:50:05 aztestc4 lrmd: [1177]: info: RA output: (p_fs_share:1:start:stderr) FATAL: Module scsi_hostadapter not found.
Dec 4 15:50:05 aztestc4 lrmd: [1177]: info: RA output: (p_fs_share:1:start:stderr) blockdev:
Dec 4 15:50:05 aztestc4 lrmd: [1177]: info: RA output: (p_fs_share:1:start:stderr) cannot open /dev/drbd/by-res/share
Dec 4 15:50:05 aztestc4 lrmd: [1177]: info: RA output: (p_fs_share:1:start:stderr) :
Dec 4 15:50:05 aztestc4 lrmd: [1177]: info: RA output: (p_fs_share:1:start:stderr) Wrong medium type
Dec 4 15:50:05 aztestc4 lrmd: [1177]: info: RA output: (p_fs_share:1:start:stderr) mount.ocfs2
Dec 4 15:50:05 aztestc4 lrmd: [1177]: info: RA output: (p_fs_share:1:start:stderr) :
Dec 4 15:50:05 aztestc4 lrmd: [1177]: info: RA output: (p_fs_share:1:start:stderr) I/O error on channel
Dec 4 15:50:05 aztestc4 lrmd: [1177]: info: RA output: (p_fs_share:1:start:stderr)
Dec 4 15:50:05 aztestc4 lrmd: [1177]: info: RA output: (p_fs_share:1:start:stderr) while opening device /dev/drbd1
Dec 4 15:50:05 aztestc4 lrmd: [1177]: info: RA output: (p_fs_share:1:start:stderr)
Dec 4 15:50:05 aztestc4 Filesystem[1631]: ERROR: Couldn't mount filesystem /dev/drbd/by-res/share on /share
Dec 4 15:50:05 aztestc4 lrmd: [1177]: WARN: Managed p_fs_share:1:start process 1631 exited with return code 1.
Dec 4 15:50:05 aztestc4 lrmd: [1177]: info: operation start[15] on p_fs_share:1 for client 1180: pid 1631 exited with return code 1
Dec 4 15:50:05 aztestc4 crmd: [1180]: debug: create_operation_update: do_update_resource: Updating resouce p_fs_share:1 after complete start op (interval=0)
Dec 4 15:50:05 aztestc4 crmd: [1180]: info: process_lrm_event: LRM operation p_fs_share:1_start_0 (call=15, rc=1, cib-update=18, confirmed=true) unknown error

If I simply wait a little while (maybe a minute, maybe less) and then run "crm resource cleanup cl_fs_share", the filesystem starts properly on both nodes.

Here are the pertinent parts of my configuration:

primitive p_drbd_share ocf:linbit:drbd \
    params drbd_resource="share" \
    op monitor interval="15s" role="Master" timeout="20s" \
    op monitor interval="20s" role="Slave" timeout="20s" \
    op start interval="0" timeout="240s" \
    op stop interval="0" timeout="100s"
primitive p_fs_share ocf:heartbeat:Filesystem \
    params device="/dev/drbd/by-res/share" directory="/share" fstype="ocfs2" options="rw,noatime" \
    op start interval="0" timeout="60" \
    op stop interval="0" timeout="60" \
    op monitor interval="20" timeout="40"
primitive p_o2cb ocf:pacemaker:o2cb \
    params stack="cman" \
    op start interval="0" timeout="90" \
    op stop interval="0" timeout="100" \
    op monitor interval="10" timeout="20"
ms ms_drbd_share p_drbd_share \
    meta master-max="2" notify="true" interleave="true" clone-max="2" is-managed="true" target-role="Started"
clone cl_fs_share p_fs_share \
    meta interleave="true" notify="true" globally-unique="false" target-role="Started"
clone cl_o2cb p_o2cb \
    meta interleave="true" globally-unique="false"
order o_ocfs2 inf: ms_drbd_share:promote cl_o2cb
order o_share inf: cl_o2cb cl_fs_share

Should I increase the start timeout in

primitive p_fs_share ocf:heartbeat:Filesystem \
    ... \
    op start interval="0" timeout="60"

to take care of this? I am dubious, because I think cl_o2cb is starting (which then allows cl_fs_share to start) before ms_drbd_share has finished promoting.

Thanks,
    -- Art Z.

--
Art Zemon, President
Hen's Teeth Network <http://www.hens-teeth.net/>
for reliable web hosting and programming
(866)HENS-NET / (636)447-3030 ext. 200 / www.hens-teeth.net