I'm looking for some help, if any is to be had.
The behavior I'm seeing is a log entry stating "ping timeout of 5 secs
expired", then the session is dropped. There are roughly 2 seconds of
"Failing command..." messages, then the session is re-established (log
excerpt below). This has caused corruption in the ext3 journal, making
the filesystem go read-only until it's fsck'd.
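In case it helps anyone reading the excerpt: the cdb values in the
"Failing command" lines are SCSI opcodes (0x28 READ(10), 0x2a
WRITE(10), 0x88 READ(16), 0x8a WRITE(16)). Below is a rough Python
sketch that tallies them, to see how many reads and writes were in
flight when the session dropped. The function name and the sample
text are my own for illustration; the regex assumes the message
format shown in my log and tolerates the line wrapping.

```python
import re
from collections import Counter

# Standard SCSI opcodes that appear in the log excerpt.
OPCODES = {
    0x28: "READ(10)",
    0x2A: "WRITE(10)",
    0x88: "READ(16)",
    0x8A: "WRITE(16)",
}


def tally_failed(log_text):
    """Count failed commands per SCSI opcode in iscsi-sfnet log text."""
    # \s+ lets the match span a line break between "command" and "cdb".
    codes = re.findall(r"Failing command\s+cdb (0x[0-9a-fA-F]+)", log_text)
    # Unknown opcodes are kept under their raw hex string.
    return Counter(OPCODES.get(int(c, 16), c) for c in codes)


# Three entries copied from the excerpt, wrapped as they were pasted.
SAMPLE = """\
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command
cdb 0x8a task 636019 with return code = 0x2
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command
cdb 0x28 task 636035 with return code = 0x2
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command
cdb 0x2a task 636043 with return code = 0x2
"""

if __name__ == "__main__":
    for name, count in sorted(tally_failed(SAMPLE).items()):
        print(name, count)
```

Feeding it the relevant part of /var/log/messages instead of SAMPLE
should give the read/write mix for a whole incident.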
I can see a few ways this could be debugged, but I'm still new to
iSCSI, so direction would be appreciated. The pattern is the "ping
timeout of 5 secs" message, followed by about 2 seconds of failed
commands, followed by the session re-establishing. Is there a
tunable parameter that I missed which would make the system less
vulnerable to these inexplicable lags?
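To put numbers on the timeout message, here is a minimal Python
sketch that pulls the three counters out of the "ping timeout" line
and converts the gaps to seconds. It assumes the counters are kernel
jiffies and that the tick rate is 1000 per second; I have not
confirmed either, and HZ and the function name are my own.

```python
import re

# The "ping timeout" log line from my excerpt, rejoined onto one line.
LINE = ("Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: ping timeout of 5 "
        "secs expired, last rx 4318639488, last ping 4318644488, now 4318649488")

HZ = 1000  # assumption: counters are jiffies at 1000 ticks per second


def ping_timing(line, hz=HZ):
    """Return the gaps between last rx, last ping and now, in seconds."""
    m = re.search(
        r"ping timeout of (\d+) secs expired, "
        r"last rx (\d+), last ping (\d+), now (\d+)",
        line,
    )
    if m is None:
        raise ValueError("not a ping-timeout line")
    timeout, last_rx, last_ping, now = map(int, m.groups())
    return {
        "timeout_secs": timeout,
        "rx_to_ping_secs": (last_ping - last_rx) / hz,
        "ping_to_now_secs": (now - last_ping) / hz,
        "silent_secs": (now - last_rx) / hz,
    }


if __name__ == "__main__":
    print(ping_timing(LINE))
```

If the jiffies assumption holds, the gaps are 5000 ticks each, which
would mean nothing was received from the array for about 10 seconds in
total: 5 seconds before the ping went out, then the 5 second ping
timeout.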
I have yet to investigate the source of these lags. The system does
push a lot of traffic, ~200 Mbit/sec sustained over the gigabit
Ethernet. I've also looked at updating my Ethernet drivers: the
Broadcom drivers have been updated since the version released with
SL 4.6.
The system is Scientific Linux 4.6 (SL is a RHEL derivative like
CentOS) running on a quad dual-core AMD, Tyan S4882 motherboard with
built-in Broadcom gigabit Ethernet. The storage array is from Rorke
Data, and is basically a re-branded Infortrend A16E-G2130-4. The
iSCSI initiator and the array are on the same subnet.
My kernel and iscsi-initiator-utils are as up to date as I can get
them:
# cat /etc/redhat-release
Scientific Linux SL release 4.6 (Beryllium)
# uname -r
2.6.9-67.0.15.ELsmp
# rpm -qi iscsi-initiator-utils
Name        : iscsi-initiator-utils
Relocations : (not relocatable)
Version     : 4.0.3.0
Vendor      : Scientific Linux
Release     : 6
Build Date  : Tue Nov 20 17:49:23 2007
Install Date: Tue Jun 24 12:59:15 2008
Build Host  : yort.fnal.gov
Group       : System Environment/Daemons
Source RPM  : iscsi-initiator-utils-4.0.3.0-6.src.rpm
Size        : 227927
License     : GPL
Signature   : DSA/SHA1, Thu Feb 7 11:39:33 2008, Key ID 25dbef78a7048f8d
URL : http://linux-iscsi.sourceforge.net/
Summary : iSCSI daemon and utility programs
Description :
The iscsi package provides the server daemon for the iSCSI protocol,
as well as the utility programs used to manage it. iSCSI is a protocol
for distributed disk access using SCSI commands sent over Internet
Protocol networks.
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: ping timeout of 5 secs expired, last rx 4318639488, last ping 4318644488, now 4318649488
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Session dropped
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command cdb 0x8a task 636019 with return code = 0x2
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command cdb 0x8a task 636033 with return code = 0x2
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command cdb 0x88 task 636034 with return code = 0x2
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command cdb 0x28 task 636035 with return code = 0x2
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command cdb 0x28 task 636036 with return code = 0x2
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command cdb 0x28 task 636037 with return code = 0x2
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command cdb 0x28 task 636038 with return code = 0x2
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command cdb 0x28 task 636039 with return code = 0x2
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command cdb 0x28 task 636040 with return code = 0x2
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command cdb 0x8a task 636041 with return code = 0x2
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command cdb 0x88 task 636042 with return code = 0x2
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command cdb 0x2a task 636043 with return code = 0x2
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command cdb 0x2a task 636044 with return code = 0x2
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command cdb 0x2a task 636045 with return code = 0x2
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command cdb 0x2a task 636046 with return code = 0x2
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command cdb 0x2a task 636047 with return code = 0x2
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command cdb 0x2a task 636048 with return code = 0x2
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command cdb 0x2a task 636049 with return code = 0x2
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command cdb 0x2a task 636050 with return code = 0x2
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command cdb 0x2a task 4294967295 with return code = 0x2
Jul 9 18:08:48 HOSTNAME kernel: iscsi-sfnet:host7: Failing command cdb 0x28 task 4294967295 with return code = 0x2
Jul 9 18:08:48