Re: task set full and /or busy?

2011-09-02 Thread Mike Christie
On 09/02/2011 12:54 PM, iscsi developer man wrote:
> Hi Mike, I have a follow-up question.
> 
> Your notes were very insightful. I was able to recreate exactly what you
> said. 
> 
> I was slightly confused, because iostat counts the operation in its output. 
> 
> 
> Here's another question: What about other failure scenarios? We talked
> about task_set_full and busy, but what if the connection is broken by
> the target side? Is that subject to the same 180 second ((scsi commands
> allowed + 1) * timeout value) timeout?

Yes and no and maybe :) It depends on other timers and the target.

If the target drops the conn and we get a notification (either the
target sends an iscsi async pdu or we get a tcp/ip socket state change
notification), then the initiator will mark the session as down, block the
scsi devices accessed through that session, and then fail the outstanding
IO with a return value that tells the scsi layer to requeue it until we
tell it otherwise.

We will then try to relogin to the target for replacement/recovery
timeout seconds (see iscsid.conf and the README for info on that timer).
If we cannot log in within that timeout, we will tell the scsi layer to
unblock the queues and we fail all the IO.
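A minimal Python sketch of that sequence, written as a simulation with a fake clock rather than real initiator code. `recover_session`, the `try_login` callback and the fixed 5-second gap between login attempts are hypothetical; the 120-second default assumes the `node.session.timeo.replacement_timeout = 120` value found in many stock iscsid.conf files, so check yours.

```python
from typing import Callable, Tuple


def recover_session(try_login: Callable[[float], bool],
                    replacement_timeout: float = 120.0,
                    retry_interval: float = 5.0) -> Tuple[str, float]:
    """Hypothetical model of initiator recovery after a detected conn drop.

    The drop happens at simulated time 0. At that point the session is
    marked down, the scsi devices are blocked and outstanding IO is handed
    back to the scsi layer to be requeued. try_login(t) is a hypothetical
    callback returning True if a relogin attempted at time t succeeds.
    """
    t = 0.0
    while t < replacement_timeout:
        if try_login(t):
            # Relogin worked: unblock the devices; requeued IO is resent.
            return ("recovered", t)
        t += retry_interval  # hypothetical fixed delay between attempts
    # Replacement/recovery timeout expired: unblock the queues, fail all IO.
    return ("io_failed", replacement_timeout)


# Target never comes back: IO is failed once the timeout expires.
print(recover_session(lambda t: False))      # ('io_failed', 120.0)
# Target is back after about 10 seconds: IO is resent instead of failed.
print(recover_session(lambda t: t >= 10.0))  # ('recovered', 10.0)
```

The point of the model is that IO is neither completed nor failed while the session is down; it sits requeued until either a login succeeds or the timer runs out.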

If the target does not send us an async pdu and we do not get a state
change notification, then chances are the initiator will send a nop
(iscsi ping) (see the noop timeout settings in iscsid.conf). That will time
out and then the initiator will handle it like above.

If the noop timeout settings are higher than the scsi command timeout (the
single-run timeout value), then the scsi command would time out first. That
would start the scsi eh (error handler), which asks the initiator to do
aborts and resets. Those would fail, the initiator would drop the
connection, and we would wait for replacement/recovery timeout seconds
like above.
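A rough way to see which of the two paths above kicks in first. This is a hypothetical worst-case model, not initiator code: the parameters only mirror `node.conn[0].timeo.noop_out_interval` and `node.conn[0].timeo.noop_out_timeout` from iscsid.conf and the per-device scsi command timeout, and the function does not read any of them.

```python
from typing import Tuple


def first_detector(noop_out_interval: float, noop_out_timeout: float,
                   cmd_timeout: float) -> Tuple[str, float]:
    """Roughly which mechanism notices a silently dead connection first,
    and after about how many seconds (hypothetical worst-case model)."""
    # Assume the link died right after the last reply: the next nop is
    # sent one interval later and then has to time out.
    nop_detect = noop_out_interval + noop_out_timeout
    if nop_detect <= cmd_timeout:
        # Nop timeout fires: the initiator drops the conn and starts recovery.
        return ("nop_timeout", nop_detect)
    # Otherwise the scsi command times out first: the scsi eh asks for
    # aborts/resets, those fail, and then the initiator drops the conn.
    return ("scsi_eh", cmd_timeout)


print(first_detector(5, 5, 30))    # ('nop_timeout', 10)
print(first_detector(60, 30, 30))  # ('scsi_eh', 30)
```

Either way, once the connection is dropped the replacement/recovery timer still has to expire before IO is failed upward.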

See the README's section 8. It describes many of the timers and how the eh works.


> 
> 
> Is it possible that on connection breaks, iscsi returns immediate failure? 

Not really immediately. But if all the timers were set really low, it
could happen really quickly.
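As a back-of-the-envelope illustration, assuming detection is either immediate (async pdu or socket notification) or takes one noop interval plus the noop timeout, and ignoring time spent in the eh and in login attempts. The helper and its parameters are hypothetical; lowering the noop and replacement/recovery timeout values shrinks the result accordingly.

```python
def seconds_until_io_fails(noop_out_interval: float, noop_out_timeout: float,
                           replacement_timeout: float,
                           notified: bool = False) -> float:
    """Rough time from the conn dying to IO being failed upward.

    Hypothetical estimate: ignores eh and login attempt durations.
    notified=True models an async pdu or socket state change notification.
    """
    detect = 0.0 if notified else noop_out_interval + noop_out_timeout
    return float(detect + replacement_timeout)


print(seconds_until_io_fails(5, 5, 120))                 # 130.0
print(seconds_until_io_fails(5, 5, 120, notified=True))  # 120.0
print(seconds_until_io_fails(1, 1, 5))                   # 7.0
```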

> 
> 
> We have been experimenting, and it looks like check conditions & sense data
> return immediate errors.

What do you mean? Return to layers above scsi or to scsi/iscsi layers?

It depends on the sense code. Some are immediately returned upwards to
the block/FS/passthrough layers.
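Since the handling depends on the sense code, it helps to be sure what the target is actually returning. A minimal decoder, assuming standard SPC sense data in fixed (response code 0x70/0x71) or descriptor (0x72/0x73) format; which keys the kernel retries versus fails upward is decided by the scsi layer and is not modeled here. `parse_sense` and `SENSE_KEYS` are hypothetical names.

```python
from typing import Tuple

SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0x8: "BLANK CHECK",
    0x9: "VENDOR SPECIFIC", 0xA: "COPY ABORTED", 0xB: "ABORTED COMMAND",
    0xD: "VOLUME OVERFLOW", 0xE: "MISCOMPARE",
}


def parse_sense(sense: bytes) -> Tuple[int, int, int]:
    """Return (sense_key, asc, ascq) from fixed or descriptor format sense."""
    if len(sense) < 4:
        raise ValueError("sense buffer too short")
    response_code = sense[0] & 0x7F  # bit 7 is the VALID bit in fixed format
    if response_code in (0x72, 0x73):  # descriptor format
        return sense[1] & 0x0F, sense[2], sense[3]
    if response_code in (0x70, 0x71):  # fixed format
        if len(sense) < 14:
            raise ValueError("fixed format sense needs at least 14 bytes")
        return sense[2] & 0x0F, sense[12], sense[13]
    raise ValueError(f"unsupported response code {response_code:#04x}")


# ILLEGAL REQUEST, ASC/ASCQ 24h/00h (invalid field in CDB), fixed format
fixed = bytes([0xF0, 0, 0x05, 0, 0, 0, 0, 0x0A, 0, 0, 0, 0, 0x24, 0x00])
key, asc, ascq = parse_sense(fixed)
print(SENSE_KEYS[key], hex(asc), hex(ascq))  # ILLEGAL REQUEST 0x24 0x0
```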

> 
> Does the same happen with connection closes by the target?
> 
> 
> thanks
> 
> 
> iscsi devel man

Re: task set full and /or busy?

2011-09-02 Thread iscsi developer man
Hi Mike, I have a follow-up question.

Your notes were very insightful. I was able to recreate exactly what you
said.

I was slightly confused, because iostat counts the operation in its output.


Here's another question: What about other failure scenarios? We talked about
task_set_full and busy, but what if the connection is broken by the target
side? Is that subject to the same 180 second ((scsi commands allowed + 1) *
timeout value) timeout?


Is it possible that on connection breaks, iscsi returns immediate failure?


We have been experimenting, and it looks like check conditions & sense data
return immediate errors.

Does the same happen with connection closes by the target?


thanks


iscsi devel man






-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.



Re: task set full and /or busy?

2011-08-25 Thread iscsi developer man
Thanks Mike,

It's good to hear that SCSI BUSY and SCSI Task_Set_Full are both handled
correctly by the Linux kernel.


The bug must be in my code then!

I'll look deeper at the wireshark traces.

thanks

iscsi devel man.




Re: task set full and /or busy?

2011-08-25 Thread Mike Christie
On 08/25/2011 05:23 PM, iscsi developer man wrote:
> Thanks Mike,
> 
> So what happens if we return the task set full or the busy status forever?
> Does the host get an io error at a certain timeout, does the host silently
> return back to the application that the operation has completed
> successfully,  or does it retry indefinitely?
>

The info below is for the current upstream kernel. It is probably
correct from about 2.6.18 - 3.*.

There is a maximum amount of time that the scsi layer will keep retrying a
command. It depends on the command type. The formula is:

(scsi_cmnd->allowed + 1) * scsi_cmnd->timeout.

The allowed value for disk IO is 5. The timeout depends on your distro.
You can see it in /sys/block/sdX/device/timeout. The kernel sets it to
30, but some distros' udev versions set it to 60. Users can set it to
whatever makes them happy, so who knows.
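Plugging in the numbers: the helper below just evaluates the formula above. `max_retry_seconds` and `read_device_timeout` are hypothetical helpers, `allowed = 5` is the disk IO value given above, and `sdb` is a placeholder device name.

```python
from pathlib import Path

DISK_IO_ALLOWED = 5  # retries allowed for disk IO, per the explanation above


def max_retry_seconds(timeout: int, allowed: int = DISK_IO_ALLOWED) -> int:
    """Upper bound the scsi layer keeps retrying: (allowed + 1) * timeout."""
    return (allowed + 1) * timeout


def read_device_timeout(dev: str) -> int:
    """Read the per-command timeout in seconds, e.g. dev="sdb"."""
    return int(Path(f"/sys/block/{dev}/device/timeout").read_text().strip())


print(max_retry_seconds(30))  # 180 (kernel default timeout)
print(max_retry_seconds(60))  # 360 (distros whose udev sets 60)
# On a live system: max_retry_seconds(read_device_timeout("sdb"))
```

With the kernel default of 30 this gives the 180 seconds mentioned in the follow-up question.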

If the command has not completed in ((scsi_cmnd->allowed + 1) *
scsi_cmnd->timeout) seconds then the command is failed. In
/var/log/messages you would see:

"timing out command, waited X seconds",

And the upper layers would get some error. The error value depends on
the IO type. The block layer, dm, and file systems (kernel stuff) get -EIO.
If you were doing SG_IO, then you would see the scsi status value you set
in the SG_IO error data.
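For the SG_IO case, the status byte the target put in the SCSI Response PDU should show up in the `status` field of `sg_io_hdr`, with `masked_status` holding the older Linux encoding (the status shifted right by one bit). A small, hypothetical decoder for the statuses discussed in this thread; issuing the ioctl itself is left out.

```python
from typing import Tuple

# SAM status codes, as carried in the iSCSI SCSI Response PDU Status field.
SAM_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x04: "CONDITION MET",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
    0x28: "TASK SET FULL",
    0x30: "ACA ACTIVE",
    0x40: "TASK ABORTED",
}


def describe_sg_status(status: int) -> Tuple[str, int]:
    """Return (name, masked_status) for an sg_io_hdr status byte."""
    name = SAM_STATUS.get(status, f"UNKNOWN ({status:#04x})")
    # masked_status is the old Linux-style value: (status >> 1) & 0x7f
    masked = (status >> 1) & 0x7F
    return name, masked


name, masked = describe_sg_status(0x28)
print(name, hex(masked))  # TASK SET FULL 0x14
```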





Re: task set full and /or busy?

2011-08-25 Thread iscsi developer man
Thanks Mike,

So what happens if we return the task set full or the busy status forever?
Does the host get an io error at a certain timeout, does the host silently
return back to the application that the operation has completed
successfully,  or does it retry indefinitely?

My concern is that the host is acknowledging the write to the application
after a timeout period for that pdu, which would explain the lost writes
during my target's rebuilds.


thanks,

iscsi devel man.
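One way to check this from the application side, instead of relying on iostat counters, is to fsync the data and look at the error. A minimal sketch, assuming a file on a file system backed by the iSCSI disk; `write_durably` is a hypothetical helper. Per the replies in this thread, once the scsi layer gives up on a command, file systems get -EIO, which Python raises as `OSError` with `errno.EIO`.

```python
import errno
import os


def write_durably(path: str, data: bytes) -> bool:
    """Write data and fsync it.

    Returns True if the kernel reports success, False if it reports EIO
    (e.g. the scsi layer gave up on the command). Other errors are re-raised.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)  # may be a partial write
            view = view[written:]
        os.fsync(fd)  # deferred device write errors are typically reported here
        return True
    except OSError as e:
        if e.errno == errno.EIO:
            return False
        raise
    finally:
        os.close(fd)
```

A plain write() usually only lands in the page cache, so a successful write() alone says little about whether the target acknowledged the data.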






Re: task set full and /or busy?

2011-08-25 Thread Mike Christie
On 08/25/2011 10:28 AM, iscsi developer man wrote:
> Hey everyone,
> 
> I am writing an iscsi target, and I am working on some experimental
> features for dealing with rebuilds.
> 
> 
> Right now, I am considering sending back to the initiator a
> TASK_SET_FULL SCSI Status, and/or a BUSY SCSI Status, in the Command
> Response PDU.
> 
> 
> However, it looks like it currently is not correctly

What do you mean by it is not being handled? Are you returning task set
full, and what is happening?

> being handled. I.e., I am not acknowledging the writes, yet open-iscsi
> thinks that they are being acknowledged.

What do you mean by not acking writes? Do you mean with the task set
full status, or do you mean you are also doing something with the
ExpCmdSN or some other value?

> 
> 
> Is there a reason why this is?
> 
> What is the initiator's expected behavior with regard to SCSI
> statuses that are not SCSI_GOOD?

The iscsi layer does not really do anything. It just copies the status,
and the sense data if there is any, into the scsi_cmnd struct, and the
scsi layer handles it.

For BUSY status, the scsi layer will just retry the cmd immediately. For
TASK_SET_FULL, it will lower the device's queue depth and then retry
immediately.
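The behavior described above, combined with the (allowed + 1) * timeout limit from Mike's follow-up reply in this thread, can be modeled roughly as follows. This is a simplified, hypothetical simulation, not kernel code: `handle_status`, `simulate`, the per-try cost and the drop-by-one queue depth adjustment are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

GOOD, BUSY, TASK_SET_FULL = 0x00, 0x08, 0x28  # SAM status codes


@dataclass
class Device:
    queue_depth: int


def handle_status(dev: Device, status: int) -> str:
    """Simplified, hypothetical model of the scsi layer's reaction."""
    if status == GOOD:
        return "complete"
    if status == BUSY:
        return "retry"  # retry right away, queue depth untouched
    if status == TASK_SET_FULL:
        # Lower the queue depth, then retry. Dropping by one is an
        # assumption; the kernel's actual adjustment logic differs.
        dev.queue_depth = max(1, dev.queue_depth - 1)
        return "retry"
    return "other"  # CHECK CONDITION etc. go through sense handling


def simulate(dev: Device, statuses: Iterable[int], seconds_per_try: float,
             timeout: int = 30, allowed: int = 5) -> Tuple[str, float]:
    """Feed target replies in order until done or the retry window closes."""
    deadline = (allowed + 1) * timeout
    elapsed = 0.0
    for status in statuses:
        if elapsed >= deadline:
            return ("failed: timing out command", elapsed)
        action = handle_status(dev, status)
        if action != "retry":
            return (action, elapsed)
        elapsed += seconds_per_try  # hypothetical round-trip cost per try
    return ("pending", elapsed)


dev = Device(queue_depth=32)
print(simulate(dev, [BUSY, TASK_SET_FULL, GOOD], 1.0), dev.queue_depth)
# ('complete', 2.0) 31
print(simulate(Device(32), [BUSY] * 1000, 1.0))
# ('failed: timing out command', 180.0)
```

A target that answers BUSY or TASK SET FULL forever therefore ends in a failed command, not a silent success.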
