What is your O2CB_HEARTBEAT_THRESHOLD set to?
For more, refer to:
http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#HEARTBEAT
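As I read that FAQ entry, the threshold and the disk heartbeat timeout are related by threshold = (timeout in seconds / 2) + 1, with one heartbeat every 2 seconds. A minimal shell sketch of that arithmetic follows; the function names are made up for illustration and are not part of the o2cb tools:

```shell
# Hypothetical helpers, assuming the FAQ relation
#   O2CB_HEARTBEAT_THRESHOLD = (timeout_in_seconds / 2) + 1
# with a 2-second disk heartbeat interval.

# Print the threshold needed for a given timeout in seconds.
o2cb_threshold_for_timeout() {
    echo $(( $1 / 2 + 1 ))
}

# Print the timeout in seconds implied by a given threshold.
o2cb_timeout_for_threshold() {
    echo $(( ($1 - 1) * 2 ))
}
```

For example, `o2cb_timeout_for_threshold 7` prints 12, and `o2cb_threshold_for_timeout 60` prints 31. The value itself is typically set as O2CB_HEARTBEAT_THRESHOLD in /etc/sysconfig/o2cb.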
[EMAIL PROTECTED] wrote:
I'm performing some testing with ocfs2 on 2 nodes with Red Hat AS4
Update 4 (x86_64) (multipath included in the 2.6 kernel) and am
running into some issues when cleanly rebooting the 2nd node while the
1st node is still up.
So if I do the following on the 2nd node, the 1st node does not fence
itself:
/etc/init.d/ocfs2 stop
/etc/init.d/o2cb stop
wait more than 60 seconds
init 6
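The sequence above could be scripted roughly as follows. This is only a sketch: the function name and the DRY_RUN switch are made up for illustration, and it defaults to printing the commands instead of running them, since the last step reboots the node.

```shell
# Hypothetical wrapper around the four steps listed above.
# DRY_RUN defaults to 1, so by default the commands are only printed.
# The first argument is the wait in seconds (default 90, i.e. "more than 60").
clean_reboot_node() {
    wait_secs=${1:-90}
    for cmd in "/etc/init.d/ocfs2 stop" \
               "/etc/init.d/o2cb stop" \
               "sleep $wait_secs" \
               "init 6"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would run: $cmd"
        else
            # Unquoted on purpose so the string splits into command + args.
            $cmd || return 1
        fi
    done
}
```

Calling `clean_reboot_node 90` prints the four steps; setting DRY_RUN=0 would actually execute them as root.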
I get the following on the 1st node, but everything is fine:
Sep 21 21:44:49 bbflgrid11 kernel: SCSI error : <0 0 0 12> return code
= 0x20000
Sep 21 21:44:49 bbflgrid11 kernel: end_request: I/O error, dev sdm,
sector 1.
Sep 21 21:44:49 bbflgrid11 kernel: device-mapper: dm-multipath:
Failing path 8:192.
Sep 21 21:44:49 bbflgrid11 kernel: SCSI error : <0 0 0 14> return code
= 0x20000
Sep 21 21:44:49 bbflgrid11 kernel: end_request: I/O error, dev sdo,
sector 193297
Sep 21 21:44:49 bbflgrid11 kernel: device-mapper: dm-multipath:
Failing path 8:224.
Sep 21 21:44:49 bbflgrid11 kernel: SCSI error : <0 0 0 13> return code
= 0x20000
Sep 21 21:44:49 bbflgrid11 kernel: end_request: I/O error, dev sdn,
sector 192785
Sep 21 21:44:49 bbflgrid11 kernel: device-mapper: dm-multipath:
Failing path 8:208.
Sep 21 21:44:49 bbflgrid11 multipathd: 8:192: mark as failed
Sep 21 21:44:49 bbflgrid11 multipathd: mpath1: remaining active paths: 1
Sep 21 21:44:49 bbflgrid11 multipathd: 8:224: mark as failed
Sep 21 21:44:49 bbflgrid11 multipathd: mpath3: remaining active paths: 1
Sep 21 21:44:49 bbflgrid11 multipathd: 8:208: mark as failed
Sep 21 21:44:49 bbflgrid11 multipathd: mpath2: remaining active paths: 1
Sep 21 21:44:58 bbflgrid11 multipathd: 8:192: readsector0 checker
reports path is up
Sep 21 21:44:58 bbflgrid11 multipathd: 8:192: reinstated
Sep 21 21:44:58 bbflgrid11 multipathd: mpath1: remaining active paths: 2
Sep 21 21:44:58 bbflgrid11 multipathd: 8:208: readsector0 checker
reports path is up
Sep 21 21:44:58 bbflgrid11 multipathd: 8:208: reinstated
Sep 21 21:44:58 bbflgrid11 multipathd: mpath2: remaining active paths: 2
Sep 21 21:44:58 bbflgrid11 multipathd: 8:224: readsector0 checker
reports path is up
Sep 21 21:44:58 bbflgrid11 multipathd: 8:224: reinstated
Sep 21 21:44:58 bbflgrid11 multipathd: mpath3: remaining active paths: 2
Sep 21 21:46:06 bbflgrid11 kernel: SCSI error : <1 0 0 11> return code
= 0x20000
Sep 21 21:46:06 bbflgrid11 kernel: end_request: I/O error, dev sdaa,
sector 1920
Sep 21 21:46:06 bbflgrid11 kernel: device-mapper: dm-multipath:
Failing path 65:160.
Sep 21 21:46:06 bbflgrid11 multipathd: 65:160: mark as failed
Sep 21 21:46:06 bbflgrid11 multipathd: mpath0: remaining active paths: 1
Sep 21 21:46:06 bbflgrid11 multipathd: 65:160: readsector0 checker
reports path is up
Sep 21 21:46:06 bbflgrid11 multipathd: 65:160: reinstated
Sep 21 21:46:06 bbflgrid11 multipathd: mpath0: remaining active paths: 2
Now if I do the following on the 2nd node, the 1st node fences itself
(same as above, except don't wait 60 seconds after o2cb stop):
/etc/init.d/ocfs2 stop
/etc/init.d/o2cb stop
init 6
Node 1 logs the following and fences itself. I have to power cycle the
server to get it back; it doesn't reboot or shut down, it just hangs:
Sep 21 21:28:00 bbflgrid11 kernel: SCSI error : <0 0 0 13> return code
= 0x20000
Sep 21 21:28:00 bbflgrid11 kernel: end_request: I/O error, dev sdn,
sector 192785
Sep 21 21:28:00 bbflgrid11 kernel: device-mapper: dm-multipath:
Failing path 8:208.
Sep 21 21:28:00 bbflgrid11 multipathd: 8:208: mark as failed
Sep 21 21:28:00 bbflgrid11 multipathd: mpath2: remaining active paths: 1
Sep 21 21:28:00 bbflgrid11 kernel: SCSI error : <1 0 0 12> return code
= 0x20000
Sep 21 21:28:00 bbflgrid11 kernel: end_request: I/O error, dev sdab,
sector 192784
Sep 21 21:28:00 bbflgrid11 kernel: end_request: I/O error, dev sdab,
sector 192786
Sep 21 21:28:00 bbflgrid11 kernel: device-mapper: dm-multipath:
Failing path 65:176.
Sep 21 21:28:00 bbflgrid11 kernel: SCSI error : <1 0 0 13> return code
= 0x20000
Sep 21 21:28:00 bbflgrid11 kernel: end_request: I/O error, dev sdac,
sector 192785
Sep 21 21:28:00 bbflgrid11 kernel: device-mapper: dm-multipath:
Failing path 65:192.
Sep 21 21:28:00 bbflgrid11 multipathd: 65:176: mark as failed
Sep 21 21:28:00 bbflgrid11 multipathd: mpath1: remaining active paths: 1
Sep 21 21:28:01 bbflgrid11 multipathd: 65:192: mark as failed
Sep 21 21:28:01 bbflgrid11 multipathd: mpath2: remaining active paths: 0
Sep 21 21:28:01 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331 ERROR:
IO Error -5
Sep 21 21:28:01 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
ERROR: status = -5
Sep 21 21:28:01 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331 ERROR:
IO Error -5
Sep 21 21:28:01 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
ERROR: status = -5
Sep 21 21:28:01 bbflgrid11 multipathd: 65:176: readsector0 checker
reports path is up
Sep 21 21:28:01 bbflgrid11 multipathd: 65:176: reinstated
Sep 21 21:28:01 bbflgrid11 multipathd: mpath1: remaining active paths: 2
Sep 21 21:28:03 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331 ERROR:
IO Error -5
Sep 21 21:28:03 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
ERROR: status = -5
Sep 21 21:28:03 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331 ERROR:
IO Error -5
Sep 21 21:28:03 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
ERROR: status = -5
Sep 21 21:28:05 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331 ERROR:
IO Error -5
Sep 21 21:28:05 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
ERROR: status = -5
Sep 21 21:28:05 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331 ERROR:
IO Error -5
Sep 21 21:28:05 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
ERROR: status = -5
Sep 21 21:28:07 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331 ERROR:
IO Error -5
Sep 21 21:28:07 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
ERROR: status = -5
Sep 21 21:28:07 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331 ERROR:
IO Error -5
Sep 21 21:28:07 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
ERROR: status = -5
Sep 21 21:28:09 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331 ERROR:
IO Error -5
Sep 21 21:28:09 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
ERROR: status = -5
Sep 21 21:28:09 bbflgrid11 kernel: (4912,1):o2hb_bio_end_io:331 ERROR:
IO Error -5
Sep 21 21:28:09 bbflgrid11 kernel: (4912,1):o2hb_do_disk_heartbeat:973
ERROR: status = -5
Sep 21 21:28:09 bbflgrid11 multipathd: 8:208: readsector0 checker
reports path is up
Sep 21 21:28:09 bbflgrid11 multipathd: 8:208: reinstated
Sep 21 21:28:09 bbflgrid11 multipathd: mpath2: remaining active paths: 1
Sep 21 21:28:10 bbflgrid11 multipathd: 65:192: readsector0 checker
reports path is up
Sep 21 21:28:10 bbflgrid11 multipathd: 65:192: reinstated
Sep 21 21:28:10 bbflgrid11 multipathd: mpath2: remaining active paths: 2
...
Index 14: took 0 ms to do submit_bio for read
Index 15: took 0 ms to do waiting for read completion
(11,1):o2hb_stop_all_regions:1908 ERROR: stopping heartbeat on all
active regions
Kernel panic - not syncing: ocfs2 is very sorry to be fencing this
system by panicing
It seems that if I wait for node 1 to heartbeat to node 2 with o2cb
down before rebooting, it's fine, but if I reboot before node 1 has
had a chance to heartbeat to node 2 with o2cb down, it panics.
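In the failing run above, multipathd reports mpath2 at 0 remaining active paths at 21:28:01, and the o2hb I/O errors start in the same second. To pick such moments out of a longer syslog, something like the following could be used. It is a sketch that assumes the multipathd line format shown in the logs above; the function name is made up:

```shell
# Hypothetical helper: print "Mon DD HH:MM:SS mapname" for every multipathd
# line that reports a map with zero remaining active paths.
# Reads the files given as arguments, or stdin if none are given.
zero_path_events() {
    awk '/multipathd: .*remaining active paths: 0[[:space:]]*$/ {
        map = $6                # e.g. "mpath2:"
        sub(/:$/, "", map)      # strip the trailing colon
        print $1, $2, $3, map
    }' "$@"
}
```

For example, `zero_path_events /var/log/messages` would list each time a map lost all of its paths, which can then be compared against the o2hb error timestamps.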
Shawn E. Ruff
Senior Oracle DBA
Fiberlink Communications
------------------------------------------------------------------------
_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users