On 30 Jul 2009 at 19:25, Erez Zilber wrote: > > Mike, > > I'm seeing some strange behavior that I don't completely understand: > during heavy IO, I disconnect the cable on the target machine and > after a few seconds, I'm trying to logout from that target on the > initiator side. Here's what I see:
Hi, IMHO unpluggin the cable is a bad thing to test for network problems: 1) A Gb NIC typically takes up to 5 seconds to re-establish the link to a switch after re-plugging. 2) The kernel (udev) may trigger the execution of some scripts to set up the network after un-plug/re-plug (maybe even using DHCP) 3) The TCP stack may suspend I/O until some timeout ...then come the iSCSI issues... Maybe a more effective (and easier to manage) test would be adding a blackhole route. That's purely software and it just makes the packets go to nowhere. Regards, Ulrich > > t = 0: iscsi_eh_cmd_timed_out gets called for all commands that timed > out. It resets the timer because we have a ping task & > time_before_eq(conn->last_recv + (conn->recv_timeout * HZ), jiffies) > is true. > > t = /sys/block/sdX/device/timeout: iscsi_eh_cmd_timed_out gets called > again. The session state is ISCSI_STATE_IN_RECOVERY. This will be > called again and again every /sys/block/sdX/device/timeout seconds. > The function returns and resets the timer. > > t = 11: I'm using iscsiadm to logout from the bad session > > t = 60 sec: __iscsi_block_session is called. I guess that the call > stack is something like: mgmt_ipc_session_logout -> > session_logout_task -> session_conn_shutdown -> iscsi_sw_tcp_conn_stop > -> iscsi_conn_stop -> iscsi_start_session_recovery -> > iscsi_block_session -> __iscsi_block_session. I'm not sure if this was > initiated by the logout that iscsiadm initiated 49 seconds ago. > > Now, I see that the commands that were in flight time out: > > Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: timing out command, waited 42s > Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: SCSI error: return code = 0x060e0000 > Jul 30 18:26:46 be8 kernel: end_request: I/O error, dev sdc, sector 100663048 > Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: timing out command, waited 42s > Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: SCSI error: return code = 0x06000000 > Jul 30 18:26:46 be8 kernel: end_request: I/O error, dev sdc, sector 100662792 > Jul 30 18:26:46 be8 kernel: device-mapper: multipath: Failing path 8:32. > Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: timing out command, waited 42s > Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: SCSI error: return code = 0x060e0000 > Jul 30 18:26:46 be8 kernel: end_request: I/O error, dev sdc, sector 41943296 > > t = 61: ctldev_handle calls iscsi_sched_conn_context with > EV_CONN_ERROR. I'm not sure where this came from. > > What I don't understand is why __iscsi_block_session was called only > after 60 seconds. Is it configurable? Anyway, here's the node info: > > node.name = iqn.2009-01.com.example > node.tpgt = 1 > node.startup = manual > iface.hwaddress = default > iface.ipaddress = default > iface.iscsi_ifacename = default > iface.net_ifacename = default > iface.transport_name = tcp > iface.initiatorname = <empty> > node.discovery_address = 172.22.0.20 > node.discovery_port = 3260 > node.discovery_type = send_targets > node.session.initial_cmdsn = 0 > node.session.initial_login_retry_max = 1 > node.session.cmds_max = 128 > node.session.queue_depth = 32 > node.session.auth.authmethod = None > node.session.auth.username = <empty> > node.session.auth.password = <empty> > node.session.auth.username_in = <empty> > node.session.auth.password_in = <empty> > node.session.timeo.replacement_timeout = 1 > node.session.err_timeo.abort_timeout = 2 > node.session.err_timeo.lu_reset_timeout = 2 > node.session.err_timeo.host_reset_timeout = 1 > node.session.iscsi.FastAbort = Yes > node.session.iscsi.InitialR2T = No > node.session.iscsi.ImmediateData = Yes > node.session.iscsi.FirstBurstLength = 262144 > node.session.iscsi.MaxBurstLength = 16776192 > node.session.iscsi.DefaultTime2Retain = 0 > node.session.iscsi.DefaultTime2Wait = 2 > node.session.iscsi.MaxConnections = 1 > node.session.iscsi.MaxOutstandingR2T = 1 > node.session.iscsi.ERL = 0 > node.conn[0].address = 172.22.0.20 > node.conn[0].port = 3260 > node.conn[0].startup = manual > node.conn[0].tcp.window_size = 524288 > node.conn[0].tcp.type_of_service = 0 > node.conn[0].timeo.logout_timeout = 2 > node.conn[0].timeo.login_timeout = 2 > node.conn[0].timeo.auth_timeout = 2 > node.conn[0].timeo.noop_out_interval = 5 > node.conn[0].timeo.noop_out_timeout = 5 > node.conn[0].iscsi.MaxRecvDataSegmentLength = 131072 > node.conn[0].iscsi.HeaderDigest = None > node.conn[0].iscsi.DataDigest = None > node.conn[0].iscsi.IFMarker = No > node.conn[0].iscsi.OFMarker = No > > Thanks, > Erez > > > --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "open-iscsi" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/open-iscsi -~----------~----~----~----~------~----~------~--~---
