Re: [Ocfs2-users] AoE+ocfs2 = Heartbeat write timeout to device

Sunil Mushran Sat, 08 Mar 2008 09:48:56 -0800

The older 12 sec default heartbeat timeout was too low. It has been bumped
up to 60 secs. The FAQ has details on this.
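A minimal sketch of the threshold/timeout conversion. It assumes the relationship given in the OCFS2 FAQ: the heartbeat timeout in seconds is (O2CB_HEARTBEAT_THRESHOLD - 1) * 2, with a 2 second disk heartbeat interval. Under that assumption the old default threshold of 7 gives the 12000 ms in the error below, and 31 gives 60 secs. The helper names are hypothetical.

```python
# Assumed (per the OCFS2 FAQ): timeout_s = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2
HEARTBEAT_INTERVAL_S = 2  # assumed disk heartbeat interval, in seconds

def timeout_for_threshold(threshold: int) -> int:
    """Heartbeat timeout (seconds) implied by a given threshold value."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_S

def threshold_for_timeout(timeout_s: int) -> int:
    """Smallest threshold whose timeout is at least timeout_s seconds."""
    # Ceiling division, then +1 to invert (threshold - 1) * interval.
    return -(-timeout_s // HEARTBEAT_INTERVAL_S) + 1

print(timeout_for_threshold(7))   # 12, the old default
print(threshold_for_timeout(60))  # 31, for the new 60 sec default
```

The threshold itself is set in the o2cb configuration (e.g. /etc/sysconfig/o2cb on Red Hat-style systems); check the FAQ for your release before changing it.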


[EMAIL PROTECTED] wrote:

Hi,

I have a problem with 100Mbit Ethernet, AoE and ocfs2. I set up 2 boxes
connected via 100Mbit Ethernet to their ATA-over-Ethernet storage. The
ocfs2 filesystem resides on such an AoE partition. If I produce high
throughput to that ocfs2 partition on one node, it reboots after a few
seconds.

I use dd for testing, e.g. dd if=/dev/zero of=test bs=1M count=1000
If I write 100MB of data to the disk, everything is fine. If I write 1GB of
data to the disk, the node reboots after a few seconds and prints the
following error:

(9,0):o2hb_write_timeout:167 ERROR: Heartbeat write timeout to device etherd/e402.0 after 12000 milliseconds
(9,0):o2hb_stop_all_regions:1865 ERROR: stopping heartbeat on all active regions.

This can't be caused by lost heartbeat packets: I set up a separate
network for heartbeat to track down this problem.

I know that 100Mbit Ethernet is a bottleneck, but this should not
cause the system to reboot, right? Even if I switch to Gigabit
Ethernet, it may become the bottleneck in the future.
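A rough back-of-the-envelope sketch of how big that bottleneck is. It computes an idealized lower bound on transfer time at wire speed, ignoring Ethernet/AoE framing overhead, page-cache buffering and disk speed, and assumes dd's bs=1M means 1 MiB. Under those assumptions 100 MiB fits inside the old 12 s timeout on 100Mbit, while 1000 MiB needs well over a minute. That is consistent with, though not proof of, heartbeat writes queuing behind the bulk write. The function name is hypothetical.

```python
# Idealized lower bound on transfer time over a link (assumption: wire speed,
# no protocol overhead, no caching, disk never the bottleneck).
MIB = 1024 * 1024  # dd's bs=1M is 1 MiB

def min_transfer_seconds(num_bytes: int, link_mbit_per_s: float) -> float:
    """Seconds to push num_bytes through a link of link_mbit_per_s megabits/s."""
    bytes_per_second = link_mbit_per_s * 1_000_000 / 8
    return num_bytes / bytes_per_second

if __name__ == "__main__":
    old_timeout_s = 12  # timeout quoted in the o2hb error message
    for label, size in [("100 MiB", 100 * MIB), ("1000 MiB", 1000 * MIB)]:
        for link in (100, 1000):
            t = min_transfer_seconds(size, link)
            verdict = "exceeds" if t > old_timeout_s else "within"
            print(f"{label} over {link} Mbit/s: >= {t:.1f} s ({verdict} {old_timeout_s} s)")
```

On 100Mbit this gives about 8.4 s for 100 MiB and about 83.9 s for 1000 MiB; on Gigabit, 1000 MiB drops to about 8.4 s.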

Has anyone experienced this already? Do you know how to solve this issue?
Please help, I need to run some tests.
Your help is really appreciated.

Cheers,
Holger


_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users
