On 09/02/2010 08:12 PM, shantanu mehendale wrote:
> I am dealing with an issue on ISCSI transport where I am seeing
> "DID_TRANSPORT_FAILFAST" hostbyte errors possibly reaching the
> application which is sending I/O on a device-mapper node. Reading the
> code a little I thought that after the iscsi replacement_timeout
> timer fires, the io stuck in the io queues will be sent up to the
> device-mapper, which would send the io to the new path. Is there a

It should.

> possibility that dm-multipath is not able to handle all the errors so
> some of them end up going to the application. Basically this is a

Not normally. If all paths failed, then dm would propagate IO errors
upwards.

> cable pull kind of experiment where we would expect the path failover
> to work and io to continue properly.
> Since I read another issue posted here discussing an issue with
> "DID_TRANSPORT_DISRUPTED" error code, I was wondering if
> "DID_TRANSPORT_FAILFAST" also has some similar issues with limited
> retries and such.

DID_* are scsi error codes. dm multipath and apps using the dm device in
the normal block io path never see them. Apps using the SG IO pass
through interface would.
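
To make the "hostbyte" part concrete, here is a minimal sketch of how the
32-bit scsi result word splits into its bytes. The numeric DID_* values are
the ones from the Linux scsi headers as far as I know them; the helper names
(split_result, host_byte_name) are made up for illustration, and only a few
codes are listed.

```python
# A few host byte (DID_*) values from the Linux scsi headers.
# The table is deliberately incomplete; unknown codes are printed in hex.
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x0e: "DID_TRANSPORT_DISRUPTED",
    0x0f: "DID_TRANSPORT_FAILFAST",
}


def split_result(result):
    """Split a 32-bit scsi result word into its four one-byte fields."""
    return {
        "status": result & 0xFF,          # SCSI status byte from the target
        "msg": (result >> 8) & 0xFF,      # message byte
        "host": (result >> 16) & 0xFF,    # host byte: the DID_* code
        "driver": (result >> 24) & 0xFF,  # driver byte
    }


def host_byte_name(result):
    """Return the symbolic DID_* name for the host byte of a result word."""
    host = split_result(result)["host"]
    return HOST_BYTE_NAMES.get(host, "unknown(0x%02x)" % host)


if __name__ == "__main__":
    print(host_byte_name(0x000F0000))
```

A result word with 0x0f in the host byte position decodes to
DID_TRANSPORT_FAILFAST, which is the string the sd driver prints in the
"Result: hostbyte=..." log lines.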
These errors:
> Aug 27 15:36:06 cb-xen-srv16 kernel: sd 2:0:0:3: [sdl] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK,SUGGEST_OK
> Aug 27 15:36:06 cb-xen-srv16 kernel: end_request: I/O error, dev sdl, sector 113
Are from the iscsi layer failing the IO with DID_TRANSPORT_FAILFAST to
the scsi layer, and the scsi layer logging it. For FS/block IO the scsi
layer then translates that error to -EIO and returns the IO to the block
layer which returns it to the multipath layer. The multipath layer looks
like it is retrying the IO ok or internally queueing, because we do not
see any IO errors from dm-multipath's kernel component.
For pass through IO, the scsi layer would just pass the error upwards.
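
As a rough illustration of that difference, here is a toy model (not kernel
code) of what the submitter gets back in each case. The function name
complete_io and the returned dictionary are invented for this sketch; only
the -EIO translation for FS/block IO and the pass-up of the raw error for
pass through IO come from the description above.

```python
import errno

DID_OK = 0x00
DID_TRANSPORT_FAILFAST = 0x0F


def complete_io(host_byte, passthrough):
    """Toy model of what the submitter sees when a command completes.

    passthrough=False models FS/block IO, passthrough=True models SG IO.
    """
    if passthrough:
        # Pass through IO: the raw host byte is handed back to the caller.
        return {"errno": 0, "host_status": host_byte}
    # FS/block IO: a failed command is reported as -EIO, and the DID_*
    # code itself is not visible above the scsi layer.
    err = 0 if host_byte == DID_OK else -errno.EIO
    return {"errno": err, "host_status": None}


if __name__ == "__main__":
    print(complete_io(DID_TRANSPORT_FAILFAST, passthrough=False))
    print(complete_io(DID_TRANSPORT_FAILFAST, passthrough=True))
```

On Linux EIO is 5, so the block IO case yields -5, while the pass through
case carries 0x0f in host_status.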
We do see errors from the path checker, but it looks like it is sending
TURs using the sg io pass through path. We expect IO errors from there,
because it is using sg io and setting retries to 0. So when the problem
is detected and the iscsi layer drops the connection and requeues IO,
the scsi layer will see that retries is 0 and will fail the IO upwards.
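
For reference, a TUR sent through sg io looks roughly like the sketch below.
This is not the multipath path checker's code, just a minimal example of the
SG_IO ioctl with a TEST UNIT READY command, assuming the sg v3 header layout
from <scsi/sg.h>. The names SgIoHdr, build_tur and send_tur are made up here,
and the 5 second timeout is an arbitrary choice.

```python
import ctypes
import fcntl
import os

SG_IO = 0x2285       # ioctl request number from <scsi/sg.h>
SG_DXFER_NONE = -1   # no data phase, as for TEST UNIT READY


class SgIoHdr(ctypes.Structure):
    """Mirror of struct sg_io_hdr (sg v3 interface)."""
    _fields_ = [
        ("interface_id", ctypes.c_int),
        ("dxfer_direction", ctypes.c_int),
        ("cmd_len", ctypes.c_ubyte),
        ("mx_sb_len", ctypes.c_ubyte),
        ("iovec_count", ctypes.c_ushort),
        ("dxfer_len", ctypes.c_uint),
        ("dxferp", ctypes.c_void_p),
        ("cmdp", ctypes.c_void_p),
        ("sbp", ctypes.c_void_p),
        ("timeout", ctypes.c_uint),
        ("flags", ctypes.c_uint),
        ("pack_id", ctypes.c_int),
        ("usr_ptr", ctypes.c_void_p),
        ("status", ctypes.c_ubyte),
        ("masked_status", ctypes.c_ubyte),
        ("msg_status", ctypes.c_ubyte),
        ("sb_len_wr", ctypes.c_ubyte),
        ("host_status", ctypes.c_ushort),
        ("driver_status", ctypes.c_ushort),
        ("resid", ctypes.c_int),
        ("duration", ctypes.c_uint),
        ("info", ctypes.c_uint),
    ]


def build_tur(timeout_ms=5000):
    """Build an SG_IO header for TEST UNIT READY (6-byte CDB, all zeros)."""
    cdb = (ctypes.c_ubyte * 6)()     # opcode 0x00 is TEST UNIT READY
    sense = (ctypes.c_ubyte * 32)()  # buffer for sense data on failure
    hdr = SgIoHdr()
    hdr.interface_id = ord("S")
    hdr.dxfer_direction = SG_DXFER_NONE
    hdr.cmd_len = len(cdb)
    hdr.mx_sb_len = len(sense)
    hdr.cmdp = ctypes.addressof(cdb)
    hdr.sbp = ctypes.addressof(sense)
    hdr.timeout = timeout_ms         # milliseconds
    # Return the buffers too, so they stay alive while the header is used.
    return hdr, cdb, sense


def send_tur(dev_path):
    """Send TEST UNIT READY to dev_path; return (status, host, driver)."""
    hdr, cdb, sense = build_tur()
    fd = os.open(dev_path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        fcntl.ioctl(fd, SG_IO, hdr)  # the kernel fills in the status fields
    finally:
        os.close(fd)
    return hdr.status, hdr.host_status, hdr.driver_status
```

Calling send_tur() needs a real scsi device node and enough privileges. With
the connection down, a caller like this would expect host_status to come back
non-zero (for example 0x0f for DID_TRANSPORT_FAILFAST) instead of the command
being retried.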
Is the app that is getting the IO errors related to this:
> Aug 27 15:36:06 cb-xen-srv16 TAPDISK[13127]: ERROR: errno -5 at __tapdisk_vbd_complete_td_request: req 0: write 0x0008 secs to 0x00000180
> Aug 27 15:36:06 cb-xen-srv16 TAPDISK[12972]: ERROR: errno -5 at vhd_complete: /d
If so, is it above or below the dm device? And how is it sending IO to
the device? If it is some sort of pass through IO method like sg io with
no retries set, then you are going to get IO errors right away, like with
the multipath SG IO TUR path checker.