Antw: [EXT] Re: [PATCH] scsi: iscsi: prefer xmit of DataOut before new cmd

2022-06-16 Thread Ulrich Windl
>>> Adam Hutchinson wrote on 15.06.2022 at 20:57 in message:
> Is there any reason not to use time as an indicator that pending R2Ts
> need to be processed? Could R2Ts be tagged with a timestamp when
> received and only be given priority over new commands if the age of the
> R2T at the head exceeds some configurable limit? This would guarantee
> that R2Ts are eventually serviced even if the block layer doesn't reduce
> the submission rate of new commands; it wouldn't remove the
> performance benefits of the current algorithm, which gives priority to
> new commands, and it would be a relatively simple solution. A
> threshold of 0 could indicate that R2Ts should always be given
> priority over new commands. Just a thought...
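
[A minimal sketch of this age-threshold idea in kernel-style C. The
r2t_jiffies field and the r2t_age_limit setting are hypothetical additions
for illustration; only conn->requeue and the task->running list linkage
exist in the driver today.]

	/* On R2T reception: remember when the R2T arrived. */
	task->r2t_jiffies = jiffies;

	/* In the TX worker, before starting another new command: service the
	 * oldest pending R2T first if it has waited longer than the limit.
	 * A limit of 0 would always prefer pending R2Ts, as suggested above.
	 */
	if (!list_empty(&conn->requeue)) {
		task = list_first_entry(&conn->requeue, struct iscsi_task,
					running);
		if (time_after_eq(jiffies, task->r2t_jiffies + r2t_age_limit))
			goto check_requeue;
	}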

I had a similar thought comparing SCSI command scheduling with process
scheduling: real-time scheduling can cause starvation when newer requests are
postponed indefinitely, while the classic scheduler increases the chance of
longer-waiting tasks being scheduled next.
In any case that would require some sorting of the queue (or searching for a
maximum/minimum among the requests, which is equivalent).

Regards,
Ulrich


> 
> Regards,
> Adam
> 
> On Wed, Jun 15, 2022 at 11:37 AM Mike Christie wrote:
>>
>> > On 6/7/22 8:19 AM, Dmitry Bogdanov wrote:
>> > In the function iscsi_data_xmit (TX worker) there is a walk through the
>> > queue of new SCSI commands that is replenished in parallel. Only after
>> > that queue is emptied does the function start sending pending DataOut
>> > PDUs. That leads to DataOut timer timeouts on the target side and to
>> > connection reinstatement.
>> >
>> > This patch swaps the walk through the new commands queue and the pending
>> > DataOut queue, to give preference to ongoing commands over new ones.
>> >
>> > Reviewed-by: Konstantin Shelekhin 
>> > Signed-off-by: Dmitry Bogdanov 
>>
>> Let's do this patch. I've tried so many combos of implementations and
>> they all have different perf gains or losses with different workloads.
>> I've already been going back and forth with myself for over a year
>> (the link for my patch in the other mail was version N) and I don't
>> think a common solution is going to happen.
>>
>> Your patch fixes the bug, and I've found a workaround for my issue
>> where I tweak the queue depth, so I think we will be ok.
>>
>> Reviewed-by: Mike Christie 
>>
> 
> 
> 
> -- 
> "Things turn out best for the people who make the best out of the way
> things turn out." - Art Linkletter
> 






RE: Antw: [EXT] Re: [PATCH] scsi: iscsi: prefer xmit of DataOut before new cmd

2022-06-10 Thread Dmitriy Bogdanov
Hi Ulrich,

> In my primitive point of view iSCSI is just "another type of cable", making me
> wonder:
> Is iSCSI allowed to reorder the requests at all? Shouldn't the block layer or
> initiator do so, or should the target do out-of-order processing (tagged
> queueing)?

The iSCSI RFC does not require serializing the command flow. That is just an
"iSCSI user" feature - sending a set of SCSI commands to a device as an
unbreakable batch.
But, as far as I understood, the problem Mike described is not reordering but
an increase in the time between the full data transmissions of the commands in
the batch.

> I mean: If there is a problem that occurs even without using iSCSI, should 
> iSCSI try to fix it?
Since that is a software-iSCSI-specific issue, it can be fixed/improved in
software. How it is handled in HW-offloaded implementations is unknown to me.

BR,
 Dmitry



Antw: [EXT] Re: [PATCH] scsi: iscsi: prefer xmit of DataOut before new cmd

2022-06-10 Thread Ulrich Windl
Hi!

In my primitive point of view iSCSI is just "another type of cable", making me
wonder:
Is iSCSI allowed to reorder the requests at all? Shouldn't the block layer or
initiator do so, or should the target do out-of-order processing (tagged queueing)?

I mean: If there is a problem that occurs even without using iSCSI, should 
iSCSI try to fix it?

Regards,
Ulrich

>>> Mike Christie wrote on 09.06.2022 at 22:58 in message:
> On 6/9/22 4:02 AM, Dmitriy Bogdanov wrote:
>> Hi Mike,
>> 
>>> On 6/8/22 9:16 AM, Dmitriy Bogdanov wrote:
>>>>> On 6/7/22 10:55 AM, Mike Christie wrote:
>>>>>> On 6/7/22 8:19 AM, Dmitry Bogdanov wrote:
>>>>>>> In the function iscsi_data_xmit (TX worker) there is a walk through the
>>>>>>> queue of new SCSI commands that is replenished in parallel. Only after
>>>>>>> that queue is emptied does the function start sending pending DataOut
>>>>>>> PDUs. That leads to DataOut timer timeouts on the target side and to
>>>>>>> connection reinstatement.
>>>>>>>
>>>>>>> This patch swaps the walk through the new commands queue and the pending
>>>>>>> DataOut queue, to give preference to ongoing commands over new ones.
>>>>>>>
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>>  	task = list_entry(conn->cmdqueue.next, struct iscsi_task,
>>>>>>> @@ -1594,28 +1616,10 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
>>>>>>>  	 */
>>>>>>>  	if (!list_empty(&conn->mgmtqueue))
>>>>>>>  		goto check_mgmt;
>>>>>>> +	if (!list_empty(&conn->requeue))
>>>>>>> +		goto check_requeue;
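
[For orientation, a rough sketch of the dispatch order in iscsi_data_xmit
that results from this patch, using the labels from drivers/scsi/libiscsi.c;
loop bodies are elided:]

check_mgmt:
	/* 1. Drain mgmt PDUs (nops, TMFs) first. */
	while (!list_empty(&conn->mgmtqueue)) { /* ... xmit mgmt task ... */ }

check_requeue:
	/* 2. Then send pending DataOut PDUs (R2T responses) for ongoing
	 * commands, so the target's DataOut timer cannot expire just because
	 * new commands keep arriving.
	 */
	while (!list_empty(&conn->requeue)) { /* ... xmit DataOut ... */ }

	/* 3. Only then start new SCSI commands, rechecking the other two
	 * queues after every transmission.
	 */
	while (!list_empty(&conn->cmdqueue)) {
		/* ... xmit one new command ... */
		if (!list_empty(&conn->mgmtqueue))
			goto check_mgmt;
		if (!list_empty(&conn->requeue))
			goto check_requeue;
	}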
>>>>>>
>>>>>> Hey, I've been posting a similar patch:
>>>>>>
>>>>>> https://www.spinics.net/lists/linux-scsi/msg156939.html
>>>>>>
>>>>>> A problem I hit is a possible perf regression so I tried to allow
>>>>>> us to start up a burst of cmds in parallel. It's pretty simple where
>>>>>> we allow up to a queue's worth of cmds to start. It doesn't try to
>>>>>> check that all cmds are from the same queue or anything fancy to try
>>>>>> and keep the code simple. Mostly just assuming users might try to bunch
>>>>>> cmds together during submission or they might hit the queue plugging
>>>>>> code.
>>>>>>
>>>>>> What do you think?
>>>>>
>>>>> Oh yeah, what about a modparam batch_limit? It's between 0 and cmd_per_lun.
>>>>> 0 would check after every transmission like above.
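
[A rough sketch of that batch_limit idea; the module parameter and the
xmitted counter are hypothetical, not an existing libiscsi option:]

static unsigned int batch_limit;
module_param(batch_limit, uint, 0644);
MODULE_PARM_DESC(batch_limit,
		 "New cmds to start before checking pending R2Ts (0 = check after every cmd)");

	/* Inside the cmdqueue loop of iscsi_data_xmit: after each transmitted
	 * command, break out to the R2T processing only once a batch's worth
	 * of new commands has been started.
	 */
	if (++xmitted > batch_limit && !list_empty(&conn->requeue))
		goto check_requeue;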

>>>> Did you really face a perf regression? I could not imagine how it is
>>>> possible.
>>>> A DataOut PDU contains data too, so throughput performance cannot be
>>>> decreased by sending DataOut PDUs.
>>>
>>>
>>> We can agree that queue plugging and batching improves throughput right?
>>> The app or block layer may try to batch commands. It could be with something
>>> like fio's batch args or you hit the block layer queue plugging.
>> 
>> I agree that those features 100% give a throughput improvement on local
>> devices with serial interfaces like SATA1. Since SATA2 (Native Command
>> Queuing), devices can reorder incoming commands to provide the best
>> throughput. SCSI, I guess, has had a similar feature from the very beginning.
>> But on remote devices (iSCSI and FC) it is not 100% - the initiator's command
>> order may be reordered by the nature of the network protocol itself. I mean
>> one PDU vs R2T+DataOut PDUs, PDU resending due to CRC errors, or something
>> like that.
> I think we are talking about slightly different things. You are coming up
> with different possible scenarios where it doesn't work. I get them. You
> are correct for those cases. I'm not talking about those cases. I'm talking
> about the specific case I described.
> 
> I'm saying we have targets where we use backends that get improved
> performance when they get batched cmds. When the network is ok, and the
> user's app is batching cmds, they come from the app down to the target's
> backend device as a batch. My concern is that with your patch we will no
> longer get that behavior.
> 
> The reason is that the target and initiator can do:
> 
> 1. initiator sends scsi cmd pdu1
> 2. target sends R2T
> 3. initiator sees R2T and hits the goto. Sends data
> 4. target reads in data. Sends cmd to block layer.
> 5. initiator sends scsi cmd pdu2
> 6. target sends R2T
> 7. initiator reads in R2T sends data.
> 8. target reads in data and sends cmd to block layer.
> 
> The problem here could be that between 4 and 8 the block layer has run the
> queue and sent that one cmd to the real device already, because we have that
> extra delay now. So when I implemented the fix in the same way as you and ran
> iostat I would see a lower aqu-sz, for example.
> 
> With the current code we might not have that extra delay between 4 - 8 so
> I would see:
> 
> 1. initiator sends scsi cmd pdu1
> 2. initiator sends scsi cmd pdu2
> 3. initiator sends scsi cmd pdu3
> 4. target 

Antw: [EXT] Re: [PATCH] scsi: iscsi: prefer xmit of DataOut before new cmd

2022-06-09 Thread Ulrich Windl
>>> Mike Christie wrote on 08.06.2022 at 17:36 in message <48af6f5f-c3b6-ac65-836d-518153ab2...@oracle.com>:
> On 6/8/22 9:16 AM, Dmitriy Bogdanov wrote:
>> Hi Mike,
>> 
>>> On 6/7/22 10:55 AM, Mike Christie wrote:
>>>> On 6/7/22 8:19 AM, Dmitry Bogdanov wrote:
>>>>> In the function iscsi_data_xmit (TX worker) there is a walk through the
>>>>> queue of new SCSI commands that is replenished in parallel. Only after
>>>>> that queue is emptied does the function start sending pending DataOut
>>>>> PDUs. That leads to DataOut timer timeouts on the target side and to
>>>>> connection reinstatement.
>>>>>
>>>>> This patch swaps the walk through the new commands queue and the pending
>>>>> DataOut queue, to give preference to ongoing commands over new ones.
>>>>>

>>>> ...
>>>>
>>>>>  	task = list_entry(conn->cmdqueue.next, struct iscsi_task,
>>>>> @@ -1594,28 +1616,10 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
>>>>>  	 */
>>>>>  	if (!list_empty(&conn->mgmtqueue))
>>>>>  		goto check_mgmt;
>>>>> +	if (!list_empty(&conn->requeue))
>>>>> +		goto check_requeue;
>>>>
>>>>
>>>> Hey, I've been posting a similar patch:
>>>>
>>>> https://www.spinics.net/lists/linux-scsi/msg156939.html
>>>>
>>>> A problem I hit is a possible perf regression so I tried to allow
>>>> us to start up a burst of cmds in parallel. It's pretty simple where
>>>> we allow up to a queue's worth of cmds to start. It doesn't try to
>>>> check that all cmds are from the same queue or anything fancy to try
>>>> and keep the code simple. Mostly just assuming users might try to bunch
>>>> cmds together during submission or they might hit the queue plugging
>>>> code.
>>>>
>>>> What do you think?
>>>
>>> Oh yeah, what about a modparam batch_limit? It's between 0 and cmd_per_lun.
>>> 0 would check after every transmission like above.
>> 
>> Did you really face a perf regression? I could not imagine how it is
>> possible.
>> A DataOut PDU contains data too, so throughput performance cannot be
>> decreased by sending DataOut PDUs.
> 
> 
> We can agree that queue plugging and batching improves throughput right?

Hi!

Isn't that the classic "throughput vs. response time" trade-off? I think you
cannot optimize one without affecting the other.
(I can remember discussions like "You are sending one Ethernet packet for each
key pressed; are you crazy?" when network admins were worried about throughput.)

> The app or block layer may try to batch commands. It could be with something
> like fio's batch args or you hit the block layer queue plugging.
> 
> With the current code we can end up sending all cmds to the target in a way
> the target can send them to the real device batched. For example, we send 
> off
> the initial N scsi command PDUs in one run of iscsi_data_xmit. The target 
> reads
> them in, and sends off N R2Ts. We are able to read N R2Ts in the same call.
> And again we are able to send the needed data for them in one call of
> iscsi_data_xmit. The target is able to read in the data and send off the
> WRITEs to the physical device in a batch.
> 
> With your patch, we can end up not batching them like the app/block layer
> intended. For example, we now call iscsi_data_xmit and enter the cmdqueue loop.
> We've sent N - M scsi cmd PDUs, then see that we've got an incoming R2T to
> handle. So we goto check_requeue. We send the needed data. The target then
> starts to send the cmd to the physical device. If we have read in multiple
> R2Ts then we will continue the requeue loop. And so we might be able to send
> the data fast enough that the target can then send those commands to the
> physical device. But we've now broken up the batching that the upper layers
> sent to us and that we preserved before.
> 
>> 
>> The only thing is latency performance. But that is not an easy question.
> 
> Agree, latency is important and that's why I was saying we can make it a
> config option. Users can continue to try and batch their cmds and we don't
> break them. We also fix the bug in that we don't get stuck in the cmdqueue
> loop always taking in new cmds.
> 
>> IMHO, a system should strive to reduce the maximum latency almost without
>> impacting the minimum (prefer current commands), instead of reducing the
>> minimum latency to the detriment of the maximum (prefer new commands).
>> 
>> Any preference of new commands over current ones looks like an io scheduler
> 
> I can see your point of view where you see it as preferring new cmds
> vs existing. It's probably due to my patch not hooking into commit_rqs
> and trying to figure out the batching exactly. It's more of a simple
> estimate.

Is it also about the classic "reads stall when all buffers are dirty" (reads to 
a fast device may time-out