On 30 Jul 2009 at 19:25, Erez Zilber wrote:

> 
> Mike,
> 
> I'm seeing some strange behavior that I don't completely understand:
> during heavy IO, I disconnect the cable on the target machine and
> after a few seconds, I'm trying to logout from that target on the
> initiator side. Here's what I see:

Hi,

IMHO unpluggin the cable is a bad thing to test for network problems:

1) A Gb NIC typically takes up to 5 seconds to re-establish the link to a 
switch 
after re-plugging.
2) The kernel (udev) may trigger the execution of some scripts to set up the 
network after un-plug/re-plug (maybe even using DHCP)
3) The TCP stack may suspend I/O until some timeout
...then come the iSCSI issues...

Maybe a more effective (and easier to manage) test would be adding a blackhole 
route. That's purely software and it just makes the packets go to nowhere.

Regards,
Ulrich

> 
> t = 0: iscsi_eh_cmd_timed_out gets called for all commands that timed
> out. It resets the timer because we have a ping task &
> time_before_eq(conn->last_recv + (conn->recv_timeout * HZ), jiffies)
> is true.
> 
> t = /sys/block/sdX/device/timeout: iscsi_eh_cmd_timed_out gets called
> again. The session state is ISCSI_STATE_IN_RECOVERY. This will be
> called again and again every /sys/block/sdX/device/timeout seconds.
> The function returns and resets the timer.
> 
> t = 11: I'm using iscsiadm to logout from the bad session
> 
> t = 60 sec: __iscsi_block_session is called. I guess that the call
> stack is something like: mgmt_ipc_session_logout ->
> session_logout_task -> session_conn_shutdown -> iscsi_sw_tcp_conn_stop
> -> iscsi_conn_stop -> iscsi_start_session_recovery ->
> iscsi_block_session -> __iscsi_block_session. I'm not sure if this was
> initiated by the logout that iscsiadm initiated 49 seconds ago.
> 
> Now, I see that the commands that were in flight time out:
> 
> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: timing out command, waited 42s
> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: SCSI error: return code = 0x060e0000
> Jul 30 18:26:46 be8 kernel: end_request: I/O error, dev sdc, sector 100663048
> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: timing out command, waited 42s
> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: SCSI error: return code = 0x06000000
> Jul 30 18:26:46 be8 kernel: end_request: I/O error, dev sdc, sector 100662792
> Jul 30 18:26:46 be8 kernel: device-mapper: multipath: Failing path 8:32.
> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: timing out command, waited 42s
> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: SCSI error: return code = 0x060e0000
> Jul 30 18:26:46 be8 kernel: end_request: I/O error, dev sdc, sector 41943296
> 
> t = 61: ctldev_handle calls iscsi_sched_conn_context with
> EV_CONN_ERROR. I'm not sure where this came from.
> 
> What I don't understand is why __iscsi_block_session was called only
> after 60 seconds. Is it configurable? Anyway, here's the node info:
> 
> node.name = iqn.2009-01.com.example
> node.tpgt = 1
> node.startup = manual
> iface.hwaddress = default
> iface.ipaddress = default
> iface.iscsi_ifacename = default
> iface.net_ifacename = default
> iface.transport_name = tcp
> iface.initiatorname = <empty>
> node.discovery_address = 172.22.0.20
> node.discovery_port = 3260
> node.discovery_type = send_targets
> node.session.initial_cmdsn = 0
> node.session.initial_login_retry_max = 1
> node.session.cmds_max = 128
> node.session.queue_depth = 32
> node.session.auth.authmethod = None
> node.session.auth.username = <empty>
> node.session.auth.password = <empty>
> node.session.auth.username_in = <empty>
> node.session.auth.password_in = <empty>
> node.session.timeo.replacement_timeout = 1
> node.session.err_timeo.abort_timeout = 2
> node.session.err_timeo.lu_reset_timeout = 2
> node.session.err_timeo.host_reset_timeout = 1
> node.session.iscsi.FastAbort = Yes
> node.session.iscsi.InitialR2T = No
> node.session.iscsi.ImmediateData = Yes
> node.session.iscsi.FirstBurstLength = 262144
> node.session.iscsi.MaxBurstLength = 16776192
> node.session.iscsi.DefaultTime2Retain = 0
> node.session.iscsi.DefaultTime2Wait = 2
> node.session.iscsi.MaxConnections = 1
> node.session.iscsi.MaxOutstandingR2T = 1
> node.session.iscsi.ERL = 0
> node.conn[0].address = 172.22.0.20
> node.conn[0].port = 3260
> node.conn[0].startup = manual
> node.conn[0].tcp.window_size = 524288
> node.conn[0].tcp.type_of_service = 0
> node.conn[0].timeo.logout_timeout = 2
> node.conn[0].timeo.login_timeout = 2
> node.conn[0].timeo.auth_timeout = 2
> node.conn[0].timeo.noop_out_interval = 5
> node.conn[0].timeo.noop_out_timeout = 5
> node.conn[0].iscsi.MaxRecvDataSegmentLength = 131072
> node.conn[0].iscsi.HeaderDigest = None
> node.conn[0].iscsi.DataDigest = None
> node.conn[0].iscsi.IFMarker = No
> node.conn[0].iscsi.OFMarker = No
> 
> Thanks,
> Erez
> 
> > 



--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~----------~----~----~----~------~----~------~--~---

Reply via email to