Sunil,
I was running the following iozone command (iozone version 3.248):
/iozone -az -e -q 4096 -n 1G -g 18G -b r5_ocfs2_iozone1.xls
Tool is available here:
http://iozone.org
This cycles through various tests, using "record" sizes of 4K through
4MB and file sizes ranging from 1GB to 18GB.
From the log file, it appears that it was about halfway through the
16GB file test, using 32K records.
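
For reference, here is a rough sketch of the test matrix that command
walks through. It assumes iozone's automatic mode doubles both the
record size and the file size at each step; that doubling rule is an
assumption on my part, not something read from this run's log, and
the helper name is made up for illustration.

```python
# Sketch of the size matrix for: -a -z -q 4096 -n 1G -g 18G
# Assumption: automatic mode doubles the size at every step.

KB_PER_GB = 1024 * 1024


def doubling_sizes(start_kb, stop_kb):
    """Return start_kb, 2*start_kb, ... up to and including stop_kb."""
    sizes = []
    size = start_kb
    while size <= stop_kb:
        sizes.append(size)
        size *= 2
    return sizes


# Record sizes from 4K up to the -q limit of 4096K (4MB).
record_sizes_kb = doubling_sizes(4, 4096)

# File sizes from the -n minimum (1G) up to the -g maximum (18G).
file_sizes_kb = doubling_sizes(1 * KB_PER_GB, 18 * KB_PER_GB)
file_sizes_gb = [size // KB_PER_GB for size in file_sizes_kb]

print("record sizes (KB):", record_sizes_kb)
print("file sizes (GB):", file_sizes_gb)
print("combinations per test:", len(record_sizes_kb) * len(file_sizes_kb))
```

Under that assumption the largest file tested is 16GB, since the next
step (32GB) exceeds the 18GB limit. That fits the log, which stops in
the 16GB file test.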
--Peter
Sunil Mushran wrote:
http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.txt
Refer to the section titled "Heartbeat" and "Quorum and Fencing".
What size ios were you performing when running iozone?
Peter Sylvester wrote:
Sunil,
Can you expand upon this explanation a bit?
What kind of I/O (disk, network, etc) are we talking about here, and
under what conditions could it possibly take 12 seconds?
Disk I/O service time should be around 10ms for these (10K RPM SCSI)
drives.
Remember that this is a single node cluster, managing locally
attached disk, so it should only be talking to itself.
thanks,
Peter Sylvester
Sunil Mushran wrote:
What this means is that the hb thread was unable to complete an io
for 12 secs and was forced to fence the node.
One solution is to increase this threshold time by specifying
it in /etc/sysconfig/o2cb.
O2CB_HEARTBEAT_THRESHOLD=14
The default value of 7 results in 12 secs.
(O2CB_HEARTBEAT_THRESHOLD - 1) * 2 secs
Setting it to 14 will make it 26 secs.
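
The arithmetic can be checked with a short sketch. It uses only the
formula quoted above; the function names and the reverse lookup are
illustrative, not part of the o2cb tools.

```python
import math

# Interval taken from the formula above: (threshold - 1) * 2 secs.
HEARTBEAT_INTERVAL_SECS = 2


def heartbeat_timeout_secs(threshold):
    """Seconds the hb thread may stall before the node is fenced."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECS


def threshold_for_timeout(target_secs):
    """Smallest threshold whose timeout is at least target_secs."""
    return math.ceil(target_secs / HEARTBEAT_INTERVAL_SECS) + 1


for threshold in (7, 14):
    print(threshold, "->", heartbeat_timeout_secs(threshold), "secs")
```

To tolerate a stall of a given length, threshold_for_timeout() gives
the value to put in /etc/sysconfig/o2cb, for example 14 for a stall
of 26 secs.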
Peter Sylvester wrote:
System config:
Dell PE2850 server
(4) 36GB SCSI drives in (onboard) RAID-5
RHEL4-U2
Dell ATI Video Driver update 10/2005
ocfs2-2.6.9-22.ELsmp-1.0.7-1.i686.rpm
ocfs2-tools-1.0.2-1.i386.rpm
ocfs2console-1.0.2-1.i386.rpm
Note that this is a single node cluster, nothing else
installed/running except iozone.
I was running some "iozone" tests on the OCFS2 volume for about a
day, and the system locked up completely.
The following messages were transcribed from the console (nothing
written to /var/log/messages):
usb4-2: device not accepting address 4, error -71
(11,1): o2hb_write_timeout: 164 ERROR: heartbeat write timeout to
device sda6 after 12000 miliseconds
(11,1): o2hb_stop_all_regions: 1724 ERROR: stopping heartbeat on
all active regeons
Kernel Panic - not syncing: ocfs2 is very sorry to be fencing the
system by panicing
Questions:
What does all this mean?
Why is nothing getting written to /var/log/messages?
Is this software really ready for prime time (honestly...)?
thanks,
Peter Sylvester
MITRE Corp.
_______________________________________________
Ocfs2-users mailing list
[EMAIL PROTECTED]
http://oss.oracle.com/mailman/listinfo/ocfs2-users