On 06/25/2009 06:33 AM, Santi Saez wrote:
>
> Hi,
>
> Randomly I get those iSCSI errors on a Linux box with CentOS 5.3,
> running default kernel (2.6.18) and using Open-iSCSI
> (6.2.0.868-0.18.el5_3.1):
>
> ping timeout of 5 secs expired, last rx (..)

This indicates that the initiator sent an iSCSI ping but we did not get a
reply. When this happens, the initiator drops the session, tries to log
back in, and retries the IO.

> connection1:0: iscsi: detected conn error (1011)
> Kernel reported iSCSI connection 1:0 error (1011) state (3)
> session1: iscsi: session recovery timed out after 120 secs
> iscsi: cmd 0x28 is not queued (8)

This indicates that we tried to log back in for 2 minutes but could not.
At that point, we fail the IO.

> sd 1:0:0:0: SCSI error: return code = 0x00010000
> end_request: I/O error, dev sdb, sector 226732039
> sd 1:0:0:0: SCSI error: return code = 0x00010000
> end_request: I/O error, dev sdb, sector 187040175
>
> Full log is available at: http://pastebin.com/f40472f99
>
> After that, we need to reboot the server to recover read-write into ext3 fs.
>

You might be able to avoid this problem by increasing
node.session.timeo.replacement_timeout in iscsid.conf (don't forget to
rediscover the storage so the new value gets picked up).

However, if we are not able to reconnect for a couple of minutes, then
something is wrong here. Maybe running iscsid by hand with debugging on
will give us more info:

iscsid -d 8

Or, if you could run it by hand and make a test disk, then log in and just
pull the cable so we can check that relogin is working, that might be
helpful. You should see:

ping timeout of 5 secs expired,
connection1:0: iscsi: detected conn error (1011)
Kernel reported iSCSI connection 1:0 error (1011) state (3)

When you see that, you should plug the cable back in.

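To make these messages easier to spot while the cable is out, a small
filter over the log might help. This is only a sketch: the function name
iscsi_recovery_filter is made up for illustration, and the example path
assumes kernel messages end up in /var/log/messages, which may not match
your syslog setup.

```shell
# Hypothetical helper (not part of Open-iSCSI): reads log text on stdin
# and prints only the iSCSI recovery messages discussed in this thread.
iscsi_recovery_filter() {
    grep -E 'ping timeout of [0-9]+ secs expired|detected conn error|session recovery timed out|is operational after recovery'
}

# Example use on a live system (the log path is an assumption):
#   tail -f /var/log/messages | iscsi_recovery_filter
```

Run it in a second terminal before pulling the cable, so the messages
show up as they are logged.
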
Then, instead of

session1: iscsi: session recovery timed out after 120 secs

you should see

connection1:0 is operational after recovery

> Where use default Open-iSCSI config:
>
> http://pastebin.com/f9f15d82
>
> More info about this device:
>
> # cat /sys/block/sdb/device/timeout
> 60
>
> # cat /sys/class/iscsi_session/session1/recovery_tmo
> 120
>
> There are more initiators connected to the same target and switch, and
> are not affected by this situation, so we think that maybe changing some
> Open-iSCSI configuration parameter we can solve this.. any ideas? thanks!!
>
> Regards,
>
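
The recovery_tmo value of 120 quoted above should reflect
node.session.timeo.replacement_timeout. As an alternative to rediscovery,
iscsiadm can update the setting on an existing node record. The sketch
below only prints the command so it can be reviewed before running. The
function name, target IQN, portal address and the value 300 are all
placeholders for illustration, not values from this thread.

```shell
# Hypothetical helper: builds the iscsiadm command that updates the
# replacement timeout stored in an existing node record.
#   $1 = target IQN, $2 = portal (ip:port), $3 = timeout in seconds
replacement_timeout_cmd() {
    printf 'iscsiadm -m node -T %s -p %s -o update -n node.session.timeo.replacement_timeout -v %s\n' "$1" "$2" "$3"
}

# Placeholder target and portal; substitute your own values.
replacement_timeout_cmd "iqn.2009-06.example:storage.disk1" "192.0.2.10:3260" 300
```

After running the printed command, log out of the session and back in,
then read /sys/class/iscsi_session/session1/recovery_tmo again to confirm
the new value was picked up.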