On 06/25/2009 06:33 AM, Santi Saez wrote:
>
> Hi,
>
> I randomly get these iSCSI errors on a Linux box with CentOS 5.3,
> running the default kernel (2.6.18) and using Open-iSCSI
> (6.2.0.868-0.18.el5_3.1):
>
> ping timeout of 5 secs expired, last rx (..)

This indicates that the initiator sent an iSCSI ping (a NOP-Out) but we did 
not get a reply in time. When this happens, the initiator drops the session, 
then tries to log back in and retry the IO.
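
The 5 second figure in that ping message normally comes from the NOP-Out 
settings (node.conn[0].timeo.noop_out_interval and noop_out_timeout). Below 
is a small sketch for checking them. It assumes the stock config path 
/etc/iscsi/iscsid.conf, and show_noop_settings is a hypothetical helper name. 
Keep in mind that iscsid.conf only supplies defaults for new node records; the 
values actually in effect for an already discovered target live in its node 
record, which "iscsiadm -m node -T <target> -p <portal>" prints.

```shell
#!/bin/sh
# Hypothetical helper: print the active NOP-Out (iSCSI ping) settings from an
# iscsid.conf-style file. Uses the assumed stock path unless a file is given.
show_noop_settings() {
    conf="${1:-/etc/iscsi/iscsid.conf}"
    # Only uncommented lines take effect, so lines starting with '#' are skipped.
    grep -E '^[[:space:]]*node\.conn\[0\]\.timeo\.noop_out_(interval|timeout)[[:space:]]*=' "$conf"
}

# Usage:
#   show_noop_settings
#   show_noop_settings /path/to/other/iscsid.conf
```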

> connection1:0: iscsi: detected conn error (1011)
> Kernel reported iSCSI connection 1:0 error (1011) state (3)
> session1: iscsi: session recovery timed out after 120 secs
> iscsi: cmd 0x28 is not queued (8)

This indicates that we tried to log back in for 2 minutes (the 120 second 
recovery timeout) but could not. At that point, we fail the IO.
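
As a rough back-of-the-envelope sketch (an assumption about how the timers 
add up, not a description of the exact kernel code path): after the last PDU 
is received, the initiator may wait up to noop_out_interval before sending a 
ping, then noop_out_timeout for the reply, and then replacement_timeout trying 
to log back in before it fails IO. With the values in this thread that is 
about 5 + 5 + 120 = 130 seconds. The function name is hypothetical.

```shell
#!/bin/sh
# Rough upper bound (seconds) from the last PDU received until queued IO is
# failed. Assumes detection takes noop_out_interval + noop_out_timeout and
# recovery then waits the full replacement_timeout; real timing may differ.
worst_case_io_error_secs() {
    noop_interval=$1   # node.conn[0].timeo.noop_out_interval
    noop_timeout=$2    # node.conn[0].timeo.noop_out_timeout
    replacement=$3     # node.session.timeo.replacement_timeout
    echo $((noop_interval + noop_timeout + replacement))
}

worst_case_io_error_secs 5 5 120   # prints 130
```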

> sd 1:0:0:0: SCSI error: return code = 0x00010000
> end_request: I/O error, dev sdb, sector 226732039
> sd 1:0:0:0: SCSI error: return code = 0x00010000
> end_request: I/O error, dev sdb, sector 187040175
>
> Full log is available at: http://pastebin.com/f40472f99
>
> After that, we need to reboot the server to regain read-write access to the ext3 fs.
>

You might be able to avoid this problem by increasing 
node.session.timeo.replacement_timeout in iscsid.conf (don't forget to 
rediscover the storage so the new value gets picked up). However, if we 
are not able to reconnect for a couple of minutes, then something is wrong here.
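
A minimal sketch of that change, assuming the stock /etc/iscsi/iscsid.conf 
path. set_replacement_timeout is a hypothetical helper, 600 seconds is only an 
example value, and 192.168.0.10 stands in for your target's portal address. 
It only rewrites an existing uncommented replacement_timeout line and reports 
failure if no such line was updated.

```shell
#!/bin/sh
# Hypothetical helper: set node.session.timeo.replacement_timeout in an
# iscsid.conf-style file (assumed stock path unless a file is given).
set_replacement_timeout() {
    secs=$1
    conf="${2:-/etc/iscsi/iscsid.conf}"
    case "$secs" in
        ''|*[!0-9]*) echo "timeout must be a whole number of seconds" >&2; return 1 ;;
    esac
    # Rewrite the uncommented setting in place, keeping a .bak copy.
    sed -i.bak "s/^\([[:space:]]*node\.session\.timeo\.replacement_timeout[[:space:]]*=\).*/\1 $secs/" "$conf"
    # Succeed only if the file now carries the requested value.
    grep -q "^[[:space:]]*node\.session\.timeo\.replacement_timeout[[:space:]]*=[[:space:]]*$secs\$" "$conf"
}

# Example (portal address is an assumption; substitute your own):
#   set_replacement_timeout 600
#   iscsiadm -m discovery -t sendtargets -p 192.168.0.10
```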

Maybe running iscsid by hand with debugging on will give us more info:

iscsid -d 8

Or, it might be helpful if you could run it by hand, set up a test disk, 
log in, and then just pull the cable, so we can check that relogin is 
working. You should see:

ping timeout of 5 secs expired,
connection1:0: iscsi: detected conn error (1011)
Kernel reported iSCSI connection 1:0 error (1011) state (3)

When you see that, plug the cable back in. Then, instead of
session1: iscsi: session recovery timed out after 120 secs
you should see:
connection1:0 is operational after recovery
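
To make that check a little less manual, here is a sketch that scans kernel 
log text for the three messages quoted above and exits non-zero if any 
session recovery timed out. check_iscsi_recovery is a hypothetical name, and 
the exact message wording may vary between versions, so the patterns only 
match the stable parts of each line.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and summarize the
# outcome of iSCSI connection recovery, based on the messages in this thread.
check_iscsi_recovery() {
    awk '
        /ping timeout of [0-9]+ secs expired/ { ping++ }
        /is operational after recovery/       { ok++ }
        /session recovery timed out after/    { failed++ }
        END {
            printf "ping timeouts: %d, recovered: %d, recovery timeouts: %d\n", ping, ok, failed
            exit (failed > 0 ? 1 : 0)
        }'
}

# Usage (CentOS 5 normally logs kernel messages to /var/log/messages):
#   dmesg | check_iscsi_recovery
#   check_iscsi_recovery < /var/log/messages
```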

> We use the default Open-iSCSI config:
>
> http://pastebin.com/f9f15d82
>
> More info about this device:
>
> # cat /sys/block/sdb/device/timeout
> 60
>
> # cat /sys/class/iscsi_session/session1/recovery_tmo
> 120
>
> There are more initiators connected to the same target and switch that
> are not affected by this situation, so we think that changing some
> Open-iSCSI configuration parameter might solve this. Any ideas? Thanks!!
>
> Regards,
>
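
To compare these values across the affected box and the unaffected 
initiators, a sketch like the one below dumps recovery_tmo for every iSCSI 
session and the SCSI command timeout for every sd disk (local disks 
included), using the same sysfs files shown above. show_iscsi_timeouts is a 
hypothetical name; the sysfs root is a parameter only so it can be tested 
against a fake tree.

```shell
#!/bin/sh
# Hypothetical helper: print recovery_tmo per iSCSI session and the SCSI
# command timeout per sd disk, reading the sysfs files quoted in this thread.
show_iscsi_timeouts() {
    root="${1:-/sys}"
    for f in "$root"/class/iscsi_session/session*/recovery_tmo; do
        [ -r "$f" ] || continue            # glob did not match: no sessions
        s=$(basename "$(dirname "$f")")
        echo "$s recovery_tmo=$(cat "$f")"
    done
    for f in "$root"/block/sd*/device/timeout; do
        [ -r "$f" ] || continue            # glob did not match: no sd disks
        d=$(basename "$(dirname "$(dirname "$f")")")
        echo "$d timeout=$(cat "$f")"
    done
}

# Usage: run on each initiator and compare the output.
#   show_iscsi_timeouts
```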


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~----------~----~----~----~------~----~------~--~---