Re: kernel: sd 2:0:0:5: [sdm] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK

Mike Christie Thu, 09 Sep 2010 09:56:06 -0700

On 09/02/2010 08:12 PM, shantanu mehendale wrote:

> I am dealing with an issue on iSCSI transport where I am seeing
> "DID_TRANSPORT_FAILFAST" hostbyte errors possibly reaching the
> application which is sending I/O on a device-mapper node. Reading the
> code a little, I thought that after the iscsi replacement_timeout
> timer fires, the IO stuck in the IO queues will be sent up to
> device-mapper, which would send the IO to the new path. Is there a

It should.

> possibility that dm-multipath is not able to handle all the errors, so
> some of them end up going to the application? Basically this is a

Not normally. Only if all paths failed would dm propagate IO errors upwards.

> cable-pull kind of experiment where we would expect the path failover
> to work and IO to continue properly.
> Since I read another issue posted here discussing a problem with the
> "DID_TRANSPORT_DISRUPTED" error code, I was wondering if
> "DID_TRANSPORT_FAILFAST" also has similar issues with limited
> retries and such.

DID_* are scsi host byte error codes. dm-multipath, and apps using the dm device in the normal block IO path, never see them. Apps using the SG_IO pass-through interface would.
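For illustration, here is how the "Result:" log line decodes. In kernels of this vintage the scsi result is a 32-bit word: SCSI status in bits 0-7, message byte in bits 8-15, host byte (the DID_* code) in bits 16-23 and driver byte in bits 24-31. This is a minimal Python sketch, not kernel code; `describe_result` is a hypothetical helper, and only a few of the DID_* values from include/scsi/scsi.h are listed.

```python
# A subset of the host byte (DID_*) values from include/scsi/scsi.h.
DID_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x03: "DID_TIME_OUT",
    0x0E: "DID_TRANSPORT_DISRUPTED",
    0x0F: "DID_TRANSPORT_FAILFAST",
}


def host_byte(result: int) -> int:
    """Bits 16-23 of the result word: the DID_* code set by the LLD/transport."""
    return (result >> 16) & 0xFF


def driver_byte(result: int) -> int:
    """Bits 24-31 of the result word."""
    return (result >> 24) & 0xFF


def describe_result(result: int) -> str:
    """Format a result word roughly like the sd 'Result:' log line."""
    hb = host_byte(result)
    db = driver_byte(result)
    host = DID_NAMES.get(hb, f"0x{hb:02x}")
    # A zero driver byte is what the kernel prints as DRIVER_OK,SUGGEST_OK.
    driver = "DRIVER_OK,SUGGEST_OK" if db == 0 else f"0x{db:02x}"
    return f"hostbyte={host} driverbyte={driver}"


print(describe_result(0x0F << 16))
```

Running it prints `hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK`, matching the kernel log lines quoted below. A block-path application never sees this word. An SG_IO caller gets the host byte back in the `host_status` field of `struct sg_io_hdr`.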


These errors:

> Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:3: [sdl] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
> Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdl, sector 113

Are from the iscsi layer failing the IO with DID_TRANSPORT_FAILFAST to the scsi layer, and the scsi layer logging it. For FS/block IO, the scsi layer then translates that error to -EIO and returns the IO to the block layer, which returns it to the multipath layer. The multipath layer looks like it is retrying the IO OK or queueing it internally, because we do not see any IO errors from dm-multipath's kernel component.


For pass through IO, the scsi layer would just pass the error upwards.
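The two completion routes can be modelled in a few lines. This is a simplified Python sketch, not the kernel's code: `complete_request`, `multipath_submit` and the path functions are hypothetical names, each path is just a callable returning a result word, and the only error mapping modelled is the one described here (any failure becomes -EIO on the block path; pass-through callers get the raw result).

```python
import errno
from typing import Callable, List

DID_OK = 0x00
DID_TRANSPORT_FAILFAST = 0x0F  # host byte value from include/scsi/scsi.h


def scsi_result(host: int, status: int = 0) -> int:
    """Pack a host byte (bits 16-23) and SCSI status (bits 0-7) into a result word."""
    return (host << 16) | status


def complete_request(result: int, passthrough: bool) -> int:
    """Model of what the scsi layer hands upward when a command completes."""
    if passthrough:
        # SG_IO-style callers get the raw result and can inspect the host byte.
        return result
    # FS/block IO: any failure is collapsed to -EIO for the block layer.
    return 0 if result == 0 else -errno.EIO


def multipath_submit(paths: List[Callable[[], int]]) -> int:
    """Model of dm-multipath: on a block-layer error, retry the IO on the next path."""
    for send in paths:
        if complete_request(send(), passthrough=False) == 0:
            return 0
        # This path failed; fall through and try the next one.
    # Every path failed, so the error propagates upwards.
    return -errno.EIO


def pulled_path() -> int:
    """Hypothetical path whose cable was pulled: the transport fails the IO fast."""
    return scsi_result(DID_TRANSPORT_FAILFAST)


def healthy_path() -> int:
    """Hypothetical working path."""
    return scsi_result(DID_OK)


print(multipath_submit([pulled_path, healthy_path]), multipath_submit([pulled_path, pulled_path]))
```

With a pulled path followed by a healthy one, `multipath_submit` returns 0; only when every path fails does -EIO reach the caller. On Linux that is -5, the same value that appears in the TAPDISK log lines quoted below.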

We do see errors from the path checker, but it looks like it is sending TURs (TEST UNIT READY commands) using the SG_IO pass-through path. We expect IO errors from there, because it is using SG_IO and setting retries to 0. So when the problem is detected and the iscsi layer drops the connection and requeues the IO, the scsi layer will see that retries is 0 and will fail the IO upwards.
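The retries-versus-requeue decision can be sketched as a retry budget check. This is a simplified Python model, not the kernel's disposition code: the `retries`/`allowed` field names are chosen to echo `struct scsi_cmnd`, `on_requeue` is a hypothetical helper, and the budget of 5 for normal IO is an arbitrary value for illustration. A command sent with no retries allowed, like the path checker's SG_IO TUR, fails upwards on the first requeue, while a command with a budget is retried.

```python
from dataclasses import dataclass


@dataclass
class Command:
    allowed: int      # retry budget; 0 for the SG_IO TUR described above
    retries: int = 0  # retries used so far


def on_requeue(cmd: Command) -> str:
    """Decide what happens when the transport drops the connection and requeues cmd."""
    if cmd.retries < cmd.allowed:
        cmd.retries += 1
        return "retry"
    # Budget exhausted (or zero from the start): complete the command with an error.
    return "fail"


tur = Command(allowed=0)    # path checker TUR: no retries
fs_io = Command(allowed=5)  # arbitrary non-zero budget for normal IO
print(on_requeue(tur), on_requeue(fs_io))  # fail retry
```

The same reasoning applies to any application that issues its own pass-through IO with no retries set: the first transport disruption surfaces as an error.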


Is the app that is getting the IO errors related to this:

> Aug 27 15:36:06 cb-xen-srv16 TAPDISK[13127]: ERROR: errno -5 at __tapdisk_vbd_complete_td_request: req 0: write 0x0008 secs to 0x00000180
> Aug 27 15:36:06 cb-xen-srv16 TAPDISK[12972]: ERROR: errno -5 at vhd_complete: /d

If so, is it above or below the dm device? And how is it sending IO to the device? If it is some sort of pass-through IO method like SG_IO with no retries set, then you are going to get IO errors right away, as with the multipath SG_IO TUR path checker.


--
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.
