Mike Christie wrote: > Erez Zilber wrote: >> Mike, >> >> I'm seeing some strange behavior that I don't completely understand: >> during heavy IO, I disconnect the cable on the target machine and >> after a few seconds, I'm trying to logout from that target on the >> initiator side. Here's what I see: >> >> t = 0: iscsi_eh_cmd_timed_out gets called for all commands that timed >> out. It resets the timer because we have a ping task & >> time_before_eq(conn->last_recv + (conn->recv_timeout * HZ), jiffies) >> is true. >> >> t = /sys/block/sdX/device/timeout: iscsi_eh_cmd_timed_out gets called >> again. The session state is ISCSI_STATE_IN_RECOVERY. This will be >> called again and again every /sys/block/sdX/device/timeout seconds. >> The function returns and resets the timer. >> >> t = 11: I'm using iscsiadm to logout from the bad session >> >> t = 60 sec: __iscsi_block_session is called. I guess that the call >> stack is something like: mgmt_ipc_session_logout -> >> session_logout_task -> session_conn_shutdown -> iscsi_sw_tcp_conn_stop >> -> iscsi_conn_stop -> iscsi_start_session_recovery -> >> iscsi_block_session -> __iscsi_block_session. I'm not sure if this was >> initiated by the logout that iscsiadm initiated 49 seconds ago. >> >> Now, I see that the commands that were in flight time out: >> >> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: timing out command, waited 42s >> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: SCSI error: return code = 0x060e0000 >> Jul 30 18:26:46 be8 kernel: end_request: I/O error, dev sdc, sector 100663048 >> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: timing out command, waited 42s >> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: SCSI error: return code = 0x06000000 >> Jul 30 18:26:46 be8 kernel: end_request: I/O error, dev sdc, sector 100662792 >> Jul 30 18:26:46 be8 kernel: device-mapper: multipath: Failing path 8:32. >> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: timing out command, waited 42s >> Jul 30 18:26:46 be8 kernel: sd 2:0:0:0: SCSI error: return code = 0x060e0000 >> Jul 30 18:26:46 be8 kernel: end_request: I/O error, dev sdc, sector 41943296 >> >> t = 61: ctldev_handle calls iscsi_sched_conn_context with >> EV_CONN_ERROR. I'm not sure where this came from. >> >> What I don't understand is why __iscsi_block_session was called only >> after 60 seconds. Is it configurable? Anyway, here's the node info: >> > > block session is only called from iscsi_conn_stop. > > iscsi_conn_stop is called when there is a connection error like 1011. > The kernel will throw the error, then send a msg to userspace. Userspace > will then call ep_disconnect to have the kernel clean up the kernel > structs for the ep. Then when that has completed, iscsid will call > iscsi_conn_stop. > > What could be delaying the block session is that the > iscsi_start_session_recovery -> iscsi_suspend_tx will wait for the xmit > thread to flush. Before we had a timeout of 15 seconds so we could be > waiting in there that long. I just modified that to 3 seconds in that > sndtmo patch (that is also in the linux-2.6-iscsi git tree), but what we > really want is a way to just signal that thread to wake right away. >
Oh yeah, what version of open-iscsi were you using? Could you try the 871 release with that patch http://groups.google.com/group/open-iscsi/browse_thread/thread/0940a3217ea0c4ff In the older releases, we could end up waiting for 3 sndtmo periods so that would add 45 secs on there. --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "open-iscsi" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/open-iscsi -~----------~----~----~----~------~----~------~--~---
