Re: SUSE open-iscsi bug on replacement_timeout [resend]

Mike Christie Wed, 23 Jan 2013 13:59:04 -0800

On 01/17/2013 07:18 PM, Lee Duncan wrote:
>> > Yeah, that should trigger it. Are you seeing IO failed with DID_BUS_BUSY
>> > too, like in the Novell bugzilla?
> I never saw "DID_BUS_BUSY" in the logs attached to the original bug report,
> and I don't see any such message in /var/log/messages now. I know my kernel
> has the "DID_BUS_BUSY" code present, but I just don't know how to tell if
> anything is returning that or not.
>


In older SLES kernels we just got the hex value of the errors variable. So:

Oct  3 19:34:10 IBMx3250-200-174 kernel: sd 1:0:0:0: SCSI error: return code = 0x00020000

is DID_BUS_BUSY (host byte 0x02), right?
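As a quick check of that reading: a Linux SCSI result word packs four byte-sized fields, namely the driver byte (bits 24-31), host byte (bits 16-23), message byte (bits 8-15) and status byte (bits 0-7). Below is a minimal sketch that splits a logged value. The function name is hypothetical, and the host-byte table is a subset of the kernel's `DID_*` constants. It is worth checking against the SCSI headers of the kernel actually in use.

```python
# Subset of the Linux kernel's DID_* host-byte codes (check against the
# scsi.h of the kernel you are debugging; older kernels lack the last two).
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
    0x0A: "DID_PASSTHROUGH",
    0x0B: "DID_SOFT_ERROR",
    0x0C: "DID_IMM_RETRY",
    0x0D: "DID_REQUEUE",
    0x0E: "DID_TRANSPORT_DISRUPTED",
    0x0F: "DID_TRANSPORT_FAILFAST",
}


def decode_scsi_result(result: int) -> dict:
    """Split a 32-bit SCSI result word into its four byte fields (hypothetical helper)."""
    host = (result >> 16) & 0xFF
    return {
        "driver_byte": (result >> 24) & 0xFF,
        "host_byte": host,
        "host_name": HOST_BYTE_NAMES.get(host, f"unknown(0x{host:02x})"),
        "msg_byte": (result >> 8) & 0xFF,
        "status_byte": result & 0xFF,
    }


if __name__ == "__main__":
    # Value copied from the "return code = 0x00020000" log line.
    print(decode_scsi_result(int("0x00020000", 16)))
```

For the logged value, `decode_scsi_result(0x00020000)["host_name"]` is `"DID_BUS_BUSY"`, and the status, message and driver bytes are all zero.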


In the log you sent, I do not see any errors except:

Jan 16 13:02:49 sles10vm kernel: sd 2:0:0:0: timing out command, waited 60s

This sort of makes sense, because it looks like the failure had lasted a
minute by then (13:01:49 to 13:02:49; the start of the error is below):

Jan 16 13:01:49 sles10vm kernel:  connection2:0: iscsi: detected conn error (1011)

What does not make sense is why that command is floating around, getting
retried, and hitting that time check. In the upstream code the IO sits in
a blocked queue, so it should not hit that check while the session is
blocked. Until the replacement/recovery timeout expires, the command
should just sit in the queue and never reach the "timing out command"
code path.
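The reasoning above can be modelled in a few lines. This is a simplified, hypothetical sketch, not kernel code. It assumes the "timing out command, waited Ns" check compares the time since the command was first allocated against a budget of (allowed retries + 1) × per-attempt timeout. That is the understanding here of the upstream `scsi_softirq_done()` logic of that era, and it may not match the SLES kernel exactly. The `Command` class, the function names and the numbers are illustrative. The sketch shows the point made in the text: a command held in a blocked queue is never evaluated against that budget, so only the replacement/recovery timeout can fail it.

```python
from dataclasses import dataclass


@dataclass
class Command:
    """Hypothetical stand-in for a SCSI command, times in seconds."""
    issued_at: float  # when the command was first allocated
    timeout: float    # per-attempt timeout
    allowed: int      # number of retries allowed


def retry_budget_exceeded(cmd: Command, now: float) -> bool:
    """Model of the assumed total-time check: budget = (allowed + 1) * timeout."""
    wait_for = (cmd.allowed + 1) * cmd.timeout
    return now > cmd.issued_at + wait_for


def on_failed_completion(cmd: Command, now: float, queue_blocked: bool) -> str:
    """Decide a failed command's fate in this simplified model.

    While the device queue is blocked (session in recovery), the command is
    held and the budget check is never consulted; in the real stack only the
    replacement/recovery timeout would then fail it.
    """
    if queue_blocked:
        return "held in blocked queue"
    if retry_budget_exceeded(cmd, now):
        return "failed: timing out command"
    return "retried"
```

With a 30 s timeout and one retry, for example, the budget is 60 s. These values are illustrative and not taken from the reported system. An unblocked command evaluated at 61 s is failed, while the same command in a blocked queue is simply held.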

When you run the test, before the IO fails with that error, is the iSCSI
device in the "blocked" state? You can run iscsiadm -m session -P 3 to
see the states.
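Besides `iscsiadm -m session -P 3`, the SCSI device state can be read directly from sysfs while the test runs. This makes it easy to log the state with timestamps and line it up against /var/log/messages. Below is a minimal sketch. It assumes the standard `/sys/class/scsi_device/<H:C:T:L>/device/state` attribute, with values such as `running`, `blocked` or `offline`. The helper names and the polling interval are hypothetical. The `sysfs_root` parameter exists only so the code can be pointed at a test directory.

```python
import time
from pathlib import Path


def scsi_device_states(sysfs_root: str = "/sys") -> dict:
    """Return {"H:C:T:L": state} for every SCSI device listed in sysfs."""
    states = {}
    base = Path(sysfs_root) / "class" / "scsi_device"
    if not base.is_dir():
        return states
    for dev in sorted(base.iterdir()):
        state_file = dev / "device" / "state"
        try:
            states[dev.name] = state_file.read_text().strip()
        except OSError:
            continue  # device vanished or attribute unreadable
    return states


def watch(hctl: str, interval: float = 1.0, count: int = 120,
          sysfs_root: str = "/sys") -> None:
    """Print one device's state (e.g. "2:0:0:0") with a timestamp each interval."""
    for _ in range(count):
        state = scsi_device_states(sysfs_root).get(hctl, "missing")
        print(time.strftime("%H:%M:%S"), hctl, state, flush=True)
        time.sleep(interval)


if __name__ == "__main__":
    # 2:0:0:0 matches the "sd 2:0:0:0" device in the log above.
    watch("2:0:0:0")
```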

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
