Hi all, during heavy I/O the iSCSI initiator starts spitting out receive timeouts and connection failures, even though the connection itself is not faulty.
I managed to trace it down to the way open-iscsi treats SCSI commands. During queuecommand we just take the scmd, add it to the cmdqueue, and kick the workqueue to transmit these commands. However, when the system is under heavy I/O load, the time difference between queueing the command and processing it on the workqueue might be quite considerable. In fact, it might be longer than the SCSI command timeout itself, causing the command to time out.

And to make matters worse, we're injecting NOPs now and again to check whether the connection is still alive. However, currently we're only counting the time since we last received some data from the target. The time the NOP request is stuck on the queue is not being taken into account, causing erroneous connection failures.

To remedy this I've created two patches, one for checking the cmdqueue before trying to send NOPs and the other for checking the cmdqueue when the SCSI command timeout has kicked in.

They _do_ look sane to me, and they certainly cause the spurious connection failures to drop quite considerably. However, as they interfere with error handling I'd like to have a second or third opinion here.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
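
For illustration, the NOP timing problem described above can be sketched in plain C. This is not the actual patch and does not use the real open-iscsi structures: `struct conn_state`, `naive_check()` and `queue_aware_check()` are hypothetical names, and the "ticks" are arbitrary time units. The sketch assumes one possible reading of the fix, namely that silence from the target should only count against the connection once the NOP has actually left the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified state; NOT the real open-iscsi structures. */
struct conn_state {
    uint64_t last_recv;    /* tick of the last data received from the target */
    bool     nop_sent;     /* false while the NOP is still sitting on the queue */
    uint64_t nop_sent_at;  /* tick at which the NOP was actually transmitted */
    uint64_t recv_timeout; /* allowed silence from the target, in ticks */
};

enum verdict { CONN_OK, CONN_FAILED };

/* Behaviour described as problematic: only the time since the last
 * receive is counted, regardless of whether the NOP was ever sent. */
static enum verdict naive_check(const struct conn_state *c, uint64_t now)
{
    assert(c != NULL);
    return (now - c->last_recv >= c->recv_timeout) ? CONN_FAILED : CONN_OK;
}

/* Queue-aware variant (an assumption, not the posted patch): time spent
 * on the queue is not held against the connection. */
static enum verdict queue_aware_check(const struct conn_state *c, uint64_t now)
{
    assert(c != NULL);

    /* NOP never reached the wire, so the target's silence proves nothing. */
    if (!c->nop_sent)
        return CONN_OK;

    /* The target has sent something since the NOP went out. */
    if (c->last_recv >= c->nop_sent_at)
        return CONN_OK;

    /* Count the timeout from the moment of transmission. */
    return (now - c->nop_sent_at >= c->recv_timeout) ? CONN_FAILED : CONN_OK;
}
```

With `last_recv = 100`, `recv_timeout = 10` and a NOP that is still queued at tick 200, `naive_check()` reports a connection failure while `queue_aware_check()` keeps the connection up. The same reasoning would apply to the SCSI command timeout: a command that is still on the cmdqueue has not been seen by the target yet.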