On 01/17/2013 07:18 PM, Lee Duncan wrote:
>> > Yeah, that should trigger it. Are you seeing IO failed with DID_BUS_BUSY
>> > too like in the Novell bugzilla?
> I never saw "DID_BUS_BUSY" in the logs attached to the original bug report,
> and I don't see any such message in /var/log/messages now. I know my kernel
> has the "DID_BUS_BUSY" code present, but I just don't know how to tell if
> anything is returning that or not.

In older SLES kernels we just got the hex value of the errors variable. So:

Oct 3 19:34:10 IBMx3250-200-174 kernel: sd 1:0:0:0: SCSI error: return code = 0x00020000

is DID_BUS_BUSY, right?

In the log you sent I do not see any errors except:

Jan 16 13:02:49 sles10vm kernel: sd 2:0:0:0: timing out command, waited 60s

This sort of makes sense, because it looks like the failure had lasted a minute by then (start of error below):

Jan 16 13:01:49 sles10vm kernel: connection2:0: iscsi: detected conn error (1011)

What does not make sense is why that command is floating around getting retried and hitting that time check. In the upstream code the IO is in a blocked queue, so it should not hit that check while blocked. As long as the replacement/recovery timeout has not expired, the command should just be sitting in the queue and should not hit that "timing out command" code path.

When you run the test, before IO is failed with that error, is the iscsi device in the state "blocked"? You can run

iscsiadm -m session -P 3

to see the states.
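
For reference, here is a rough sketch of how a logged "return code" hex value can be split into its bytes to see which host code it carries. It assumes the usual Linux SCSI result layout (status byte in bits 0-7, message byte in bits 8-15, host byte in bits 16-23, driver byte in bits 24-31) and the host byte values from the kernel's include/scsi/scsi.h. The function names are made up for illustration and the table only lists a few of the codes.

```python
# Subset of the host byte codes from the Linux kernel's include/scsi/scsi.h.
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
}


def decode_scsi_result(result):
    """Split a 32-bit SCSI result value into its four bytes (hypothetical helper)."""
    return {
        "status_byte": result & 0xFF,
        "msg_byte": (result >> 8) & 0xFF,
        "host_byte": (result >> 16) & 0xFF,
        "driver_byte": (result >> 24) & 0xFF,
    }


def host_byte_name(result):
    """Return the symbolic name of the host byte, or a hex fallback if not in the table."""
    host = decode_scsi_result(result)["host_byte"]
    return HOST_BYTE_NAMES.get(host, "unknown (0x%02x)" % host)


if __name__ == "__main__":
    # Value taken from the "SCSI error: return code = 0x00020000" log line.
    print(host_byte_name(0x00020000))
```

With the value from the log line above, the host byte comes out as 0x02, which is DID_BUS_BUSY.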
