Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-06-04 Thread Hannes Reinecke
On 6/4/19 4:45 AM, Ming Lei wrote:
> On Mon, Jun 03, 2019 at 01:38:53PM +0200, Hannes Reinecke wrote:
>> On 6/3/19 11:31 AM, Ming Lei wrote:[ .. ]
>>> Your RFC patch changes to allow requests with same tag submitted to driver
>>> & hardware at the same time, so we should double-check if hisi_v3 hardware
>>> is happy with this change.
>>>
>>> John, is hisi_sas v3 fine with this way?
>>>
>> As mentioned above, I don't think this can happen.
> 
> Then prove where I am wrong.
> 
> Just attach a little bcc script, which should have been done by
> bpftrace easily, just not found how to pass multiple variable as key.
> 
> 1) run the attached bcc script in another terminal
> 
> 2) load null_blk via:
> 
> rmmod null_blk;modprobe null_blk queue_mode=2 irqmode=2 
> completion_nsec=100 submit_queues=2 hw_queue_depth=4
> 
> 3) run the following fio:
> fio --bs=4k --size=128G  --rw=randread --direct=1 --ioengine=libaio 
> --iodepth=16 --runtime=10 --name=fiotest --filename=/dev/nullb0 --numjobs=8
> 
> Then ctrl+C on the bcc script, and you will see how many requests with
> same tag from different queues are handled concurrently.
> 
> BTW, this test is run on dual core VM/Fedora 30, and bcc package is
> required.
> 
Tsk.

I do get old. After revisiting the relevant patches it does look as if
you are right after all. So apologies :-)

Which indeed means we have to implement shared host tags first, and
resurrect your original patchset.
On the good side this should also reduce the memory footprint, as with
that patchset we will only allocate queue_depth requests, not
queue_depth * nr_hw_queues as we do now.
So I guess it should be a worthwhile goal.

Will be updating the patches and do some performance measurements.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)


Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-06-03 Thread Ming Lei
On Mon, Jun 03, 2019 at 01:38:53PM +0200, Hannes Reinecke wrote:
> On 6/3/19 11:31 AM, Ming Lei wrote:
> > On Mon, Jun 03, 2019 at 10:47:24AM +0200, Hannes Reinecke wrote:
> >> On 6/3/19 10:16 AM, Ming Lei wrote:
> >>> On Mon, Jun 03, 2019 at 09:46:39AM +0200, Hannes Reinecke wrote:
>  On 6/3/19 9:37 AM, Ming Lei wrote:
> > On Mon, Jun 03, 2019 at 08:08:18AM +0200, Hannes Reinecke wrote:
> >> On 6/1/19 1:06 AM, Ming Lei wrote:
> >>> On Fri, May 31, 2019 at 12:26:56PM +0200, Hannes Reinecke wrote:
>  On 5/31/19 10:46 AM, Ming Lei wrote:
> >> [ .. ]
>  First we check for the 'slot_index_alloc()' callback to handle weird 
>  v2
>  allocation rules, _then_ we look for a tag, and only if we do _not_ 
>  have
>  a tag we're using the bitmap.
> >>>
> >>> OK, looks I miss the above change.
> >>>
>  And the bitmap is already correctly sized, as otherwise we'd have a
>  clash between internal and tagged I/O commands even now.
> >>>
> >>> But now the big problem is in the following two line code:
> >>>
> >>> +   else if (blk_tag != (u32)-1)
> >>> +   rc = blk_mq_unique_tag_to_tag(blk_tag);
> >>>
> >>> Request from different blk-mq hw queue has same tag returned from
> >>> blk_mq_unique_tag_to_tag().
> >>>
> >> Yes, but the sbitmap allocator will ensure that each command will get a
> >> unique tag.
> >
> > Each hw queue has independent sbitmap allocator, so commands with same
> > tag can come from different hw queue.
> >
>  It does not for SCSI.
>  See below.
> 
> > So you meant this RFC patch depends on the host-wide tags patchset I
> > posted?
> >
> >>
> >>> Now the biggest question is that if V3 hw supports per-queue tags,
> >>> If yes, it should be real MQ hardware, otherwise I guess commands with
> >>> same tag at the same time may not work for host-wide tags.
> >>>
> >>
> >> Of course you can't have different commands with the same tag. But the
> >> sbitmap allocator prevents this from happening, as for host-wide tags
> >> the tagset is _shared_ between all devices, so the sbitmap allocator
> >> will only ever run on _one_ tagset for all commands.
> >
> > But blk-mq doesn't support host-wide tags yet, so how can this single
> > patch work?
> >
>  Wrong. It does:
> 
>  struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev)
>  {
>   sdev->request_queue = blk_mq_init_queue(&sdev->host->tag_set);
>   if (IS_ERR(sdev->request_queue))
>   return NULL;
> 
>   sdev->request_queue->queuedata = sdev;
>   __scsi_init_queue(sdev->host, sdev->request_queue);
>   blk_queue_flag_set(QUEUE_FLAG_SCSI_PASSTHROUGH, sdev->request_queue);
>   return sdev->request_queue;
>  }
> 
> 
>  IE every scsi device is using the tagset from the host.
> >>>
> >>> Looks we are not in the same page, and you misunderstood two concepts:
> >>> scsi's host-wide tagset, and the new host-tags of BLK_MQ_F_HOST_TAGS.
> >>>
> >>> I admit that the new flag of BLK_MQ_F_HOST_TAGS is misleading.
> >>>
> >>> Now let me clarify it a bit:
> >>>
> >>> 1) the current SCSI hostwide tags means all LUNs share the host tagset,
> >>> but the tagset may include multiple hw queues, and each hw queue still
> >>> has independent tags, that is why blk-mq provides blk_mq_unique_tag().
> >>> In short, each LUN's hw queue has independent tags.
> >>>
> >> Which is where I fundamentally disagree.
> >> Each hw queue does _not_ have independent tags.
> >> Each hw queue will use tags from the same (host-wide) tagset; the tags
> >> themselves will be allocated for each queue on an ad-hoc base, ie there
> >> is no fixed mapping between tag values and hardware queues.
> > 
> > Tagset is set of tags, and one tags is for serving one hw queue.
> > 
> > Each hw queue has its own tags, please see __blk_mq_alloc_rq_map()
> > in which standalone sbitmap allocator and rq pool is allocated to
> > each hw queue represented by 'hctx_idx'.
> > 
> Yes, but ...

So you agree the theory.

> 
> > And for each hw queue, the allocated tag value for request is in
> > the range of 0 ~ queue_depth - 1, that is why I say requests from
> > different hw queue may have same tag.
> > 
> 
> But this is not what I have been observing working with lpfc and qla2xxx.
> Both drivers have been converted to using scsi-mq with nr_hw_queues > 1
> some years ago, and do work just fine.
> And none of those drivers allow for re-using an in-flight tag on
> different hardware queues.

Really?

Just run quick grep on the two drivers, looks both don't use cmnd->tag or
rq->tag.

> If your reasoning is correct none of these drivers would work.
> 
> > Your RFC patch changes to allow requests with same tag submitted to driver
> > & hardware at the same time, so we should double-check if hisi_v3 hardwa

Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-06-03 Thread Hannes Reinecke
On 6/3/19 1:38 PM, Hannes Reinecke wrote:
> On 6/3/19 11:31 AM, Ming Lei wrote:
>> On Mon, Jun 03, 2019 at 10:47:24AM +0200, Hannes Reinecke wrote:
>>> On 6/3/19 10:16 AM, Ming Lei wrote:
 On Mon, Jun 03, 2019 at 09:46:39AM +0200, Hannes Reinecke wrote:
> On 6/3/19 9:37 AM, Ming Lei wrote:
>> On Mon, Jun 03, 2019 at 08:08:18AM +0200, Hannes Reinecke wrote:
>>> On 6/1/19 1:06 AM, Ming Lei wrote:
 On Fri, May 31, 2019 at 12:26:56PM +0200, Hannes Reinecke wrote:
> On 5/31/19 10:46 AM, Ming Lei wrote:
>>> [ .. ]
> First we check for the 'slot_index_alloc()' callback to handle weird 
> v2
> allocation rules, _then_ we look for a tag, and only if we do _not_ 
> have
> a tag we're using the bitmap.

 OK, looks I miss the above change.

> And the bitmap is already correctly sized, as otherwise we'd have a
> clash between internal and tagged I/O commands even now.

 But now the big problem is in the following two line code:

 +   else if (blk_tag != (u32)-1)
 +   rc = blk_mq_unique_tag_to_tag(blk_tag);

 Request from different blk-mq hw queue has same tag returned from
 blk_mq_unique_tag_to_tag().

>>> Yes, but the sbitmap allocator will ensure that each command will get a
>>> unique tag.
>>
>> Each hw queue has independent sbitmap allocator, so commands with same
>> tag can come from different hw queue.
>>
> It does not for SCSI.
> See below.
>
>> So you meant this RFC patch depends on the host-wide tags patchset I
>> posted?
>>
>>>
 Now the biggest question is that if V3 hw supports per-queue tags,
 If yes, it should be real MQ hardware, otherwise I guess commands with
 same tag at the same time may not work for host-wide tags.

>>>
>>> Of course you can't have different commands with the same tag. But the
>>> sbitmap allocator prevents this from happening, as for host-wide tags
>>> the tagset is _shared_ between all devices, so the sbitmap allocator
>>> will only ever run on _one_ tagset for all commands.
>>
>> But blk-mq doesn't support host-wide tags yet, so how can this single
>> patch work?
>>
> Wrong. It does:
>
> struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev)
> {
>   sdev->request_queue = blk_mq_init_queue(&sdev->host->tag_set);
>   if (IS_ERR(sdev->request_queue))
>   return NULL;
>
>   sdev->request_queue->queuedata = sdev;
>   __scsi_init_queue(sdev->host, sdev->request_queue);
>   blk_queue_flag_set(QUEUE_FLAG_SCSI_PASSTHROUGH, sdev->request_queue);
>   return sdev->request_queue;
> }
>
>
> IE every scsi device is using the tagset from the host.

 Looks we are not in the same page, and you misunderstood two concepts:
 scsi's host-wide tagset, and the new host-tags of BLK_MQ_F_HOST_TAGS.

 I admit that the new flag of BLK_MQ_F_HOST_TAGS is misleading.

 Now let me clarify it a bit:

 1) the current SCSI hostwide tags means all LUNs share the host tagset,
 but the tagset may include multiple hw queues, and each hw queue still
 has independent tags, that is why blk-mq provides blk_mq_unique_tag().
 In short, each LUN's hw queue has independent tags.

>>> Which is where I fundamentally disagree.
>>> Each hw queue does _not_ have independent tags.
>>> Each hw queue will use tags from the same (host-wide) tagset; the tags
>>> themselves will be allocated for each queue on an ad-hoc base, ie there
>>> is no fixed mapping between tag values and hardware queues.
>>
>> Tagset is set of tags, and one tags is for serving one hw queue.
>>
>> Each hw queue has its own tags, please see __blk_mq_alloc_rq_map()
>> in which standalone sbitmap allocator and rq pool is allocated to
>> each hw queue represented by 'hctx_idx'.
>>
> Yes, but ...
> 
>> And for each hw queue, the allocated tag value for request is in
>> the range of 0 ~ queue_depth - 1, that is why I say requests from
>> different hw queue may have same tag.
>>
> 
> But this is not what I have been observing working with lpfc and qla2xxx.
> Both drivers have been converted to using scsi-mq with nr_hw_queues > 1
> some years ago, and do work just fine.
> And none of those drivers allow for re-using an in-flight tag on
> different hardware queues.
> If your reasoning is correct none of these drivers would work.
> 
To illustrate what I mean I've attached a histogram from a fio run on a
smartpqi device (with 48 queues and 1024 tags in total).
As you can see, tags are evenly used by any queue, with the tag
allocator ensuring that in-flight tags will not be re-used on other hw
queues.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de 

Re: [PATCH RFC] hisi_sas_v3: multiqueue support (v2 hw divergence)

2019-06-03 Thread John Garry


The crazy (escalating from weird) rules to workaround the HW bug(s) mean
that we need to chop up the command tag range into blocks of 32 even tag
indexes per SATA device; this means that SATA device #0 can use 64, 66,
68, 70...126. device #1 can use 128, 130, 132,..., device #2 can use
192, 194,...

I don't know how you can take a rq tag and generate a command tag
suitable for a SATA device.
Actually, you can.

We can setup a _distinct_ tagset per SATA device.
Eg we can setup a shared tagset for SAS (which is half the size of the
original tagset), and shift the tags by one to get a valid SAS tag.
For SATA we can setup a _distinct_ tagset per device, containing 32
tags. The we can invoke some calculation to transform the tag (which is
not guaranteed to be in the range of 0-31) into a valid hardware tag.

Should actually work.


ok, fine. I suppose that each SATA device having its own tagset could be 
more efficient, as in reality the tags are separate.




Problem is that we'd need to set aside some tags for TMF,


It not just TMF to consider, but the HW also supports "internal abort", 
which also requires a unique tag. And in some scenarios we send ATA 
softreset, as is the abort flow design for this host.


 but I really

don't think that we can or should do command aborts on SATA; while there
is the 'abort NCQ' command, it'll work only for NCQ commands, and won't
help us for 'normal' commands.
And seeing that on error all NCQ commands will be aborted anyway, plus
the standard ATA error handler will be resulting into a device reset, I
guess we should skip command abort on SATA and escalate to device reset
straightaway. That would also have the nice benefit that we need only to
set _one_ tag aside for TMF, as we will only send one TMF at a time.



Yeah, I think that it should be ok. All error handlng is sequential.

Thanks very much,
John


Cheers,

Hannes






Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-06-03 Thread Hannes Reinecke
On 6/3/19 11:31 AM, Ming Lei wrote:
> On Mon, Jun 03, 2019 at 10:47:24AM +0200, Hannes Reinecke wrote:
>> On 6/3/19 10:16 AM, Ming Lei wrote:
>>> On Mon, Jun 03, 2019 at 09:46:39AM +0200, Hannes Reinecke wrote:
 On 6/3/19 9:37 AM, Ming Lei wrote:
> On Mon, Jun 03, 2019 at 08:08:18AM +0200, Hannes Reinecke wrote:
>> On 6/1/19 1:06 AM, Ming Lei wrote:
>>> On Fri, May 31, 2019 at 12:26:56PM +0200, Hannes Reinecke wrote:
 On 5/31/19 10:46 AM, Ming Lei wrote:
>> [ .. ]
 First we check for the 'slot_index_alloc()' callback to handle weird v2
 allocation rules, _then_ we look for a tag, and only if we do _not_ 
 have
 a tag we're using the bitmap.
>>>
>>> OK, looks I miss the above change.
>>>
 And the bitmap is already correctly sized, as otherwise we'd have a
 clash between internal and tagged I/O commands even now.
>>>
>>> But now the big problem is in the following two line code:
>>>
>>> +   else if (blk_tag != (u32)-1)
>>> +   rc = blk_mq_unique_tag_to_tag(blk_tag);
>>>
>>> Request from different blk-mq hw queue has same tag returned from
>>> blk_mq_unique_tag_to_tag().
>>>
>> Yes, but the sbitmap allocator will ensure that each command will get a
>> unique tag.
>
> Each hw queue has independent sbitmap allocator, so commands with same
> tag can come from different hw queue.
>
 It does not for SCSI.
 See below.

> So you meant this RFC patch depends on the host-wide tags patchset I
> posted?
>
>>
>>> Now the biggest question is that if V3 hw supports per-queue tags,
>>> If yes, it should be real MQ hardware, otherwise I guess commands with
>>> same tag at the same time may not work for host-wide tags.
>>>
>>
>> Of course you can't have different commands with the same tag. But the
>> sbitmap allocator prevents this from happening, as for host-wide tags
>> the tagset is _shared_ between all devices, so the sbitmap allocator
>> will only ever run on _one_ tagset for all commands.
>
> But blk-mq doesn't support host-wide tags yet, so how can this single
> patch work?
>
 Wrong. It does:

 struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev)
 {
sdev->request_queue = blk_mq_init_queue(&sdev->host->tag_set);
if (IS_ERR(sdev->request_queue))
return NULL;

sdev->request_queue->queuedata = sdev;
__scsi_init_queue(sdev->host, sdev->request_queue);
blk_queue_flag_set(QUEUE_FLAG_SCSI_PASSTHROUGH, sdev->request_queue);
return sdev->request_queue;
 }


 IE every scsi device is using the tagset from the host.
>>>
>>> Looks we are not in the same page, and you misunderstood two concepts:
>>> scsi's host-wide tagset, and the new host-tags of BLK_MQ_F_HOST_TAGS.
>>>
>>> I admit that the new flag of BLK_MQ_F_HOST_TAGS is misleading.
>>>
>>> Now let me clarify it a bit:
>>>
>>> 1) the current SCSI hostwide tags means all LUNs share the host tagset,
>>> but the tagset may include multiple hw queues, and each hw queue still
>>> has independent tags, that is why blk-mq provides blk_mq_unique_tag().
>>> In short, each LUN's hw queue has independent tags.
>>>
>> Which is where I fundamentally disagree.
>> Each hw queue does _not_ have independent tags.
>> Each hw queue will use tags from the same (host-wide) tagset; the tags
>> themselves will be allocated for each queue on an ad-hoc base, ie there
>> is no fixed mapping between tag values and hardware queues.
> 
> Tagset is set of tags, and one tags is for serving one hw queue.
> 
> Each hw queue has its own tags, please see __blk_mq_alloc_rq_map()
> in which standalone sbitmap allocator and rq pool is allocated to
> each hw queue represented by 'hctx_idx'.
> 
Yes, but ...

> And for each hw queue, the allocated tag value for request is in
> the range of 0 ~ queue_depth - 1, that is why I say requests from
> different hw queue may have same tag.
> 

But this is not what I have been observing working with lpfc and qla2xxx.
Both drivers have been converted to using scsi-mq with nr_hw_queues > 1
some years ago, and do work just fine.
And none of those drivers allow for re-using an in-flight tag on
different hardware queues.
If your reasoning is correct none of these drivers would work.

> Your RFC patch changes to allow requests with same tag submitted to driver
> & hardware at the same time, so we should double-check if hisi_v3 hardware
> is happy with this change.
> 
> John, is hisi_sas v3 fine with this way?
> 
As mentioned above, I don't think this can happen.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)


Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-06-03 Thread Ming Lei
On Mon, Jun 03, 2019 at 11:07:11AM +0100, John Garry wrote:
> On 03/06/2019 10:54, Ming Lei wrote:
> > On Mon, Jun 03, 2019 at 09:57:26AM +0100, John Garry wrote:
> > > > > > Otherwise duplicated slot can be used from different blk-mq hw 
> > > > > > queue.
> > > > > > 
> > > > > > > 
> > > > > > > > The worsen thing is that V3's actual max queue depth is (4096 - 
> > > > > > > > 96), but
> > > > > > > > this patch claims that the device can support (4096 - 96) * 32 
> > > > > > > > command
> > > > > > > > slots
> > > > > 
> > > > > To be clear about the hw, the hw supports max 4096 command tags and 
> > > > > has 16
> > > > 
> > > > Is 4096 the max allowed host-wide command tags? Or per-queue's max 
> > > > commands
> > > > tags?
> > > 
> > > 4096 is max allowed host wide, in range [0, 4096), and tags are not per
> > > queue. HW delivery queues can be configured to be as large as desired.
> > 
> > Then all delivery(and complete) queues share the 4096 command slots, we 
> > can't
> > convert hisi_sas V3 into typical blk-mq MQ model simply as done by Hannes's 
> > patch.
> > 
> > Or you may partition the 4096 tags into 16 queues, then each queue's
> > depth is ~256. Depends on if performance is good for workloads.
> > You still can build a unique tag easily in [0, 4096), such as
> > (req->tag * hw_queue_index).
> > 
> > blk-mq's hw queue has independent tags, which is from NVMe's 
> > submission/completion
> > queue.
> 
> e, and this is what I thought you were addressing in "[PATCH 1/9]
> blk-mq: allow hw queues to share hostwide tags", right?

That patch allows drivers, such as hisi_sas v3, to share the same single
tags among all hw queues, then the private hw queue(reply queue, or
delivery/complete queue) can be converted to blk-mq hw queue.

> 
> > 
> > > 
> > > And on another point I saw mentioned, the device supports multiple
> > > submission and multiple completion queues. They are symmetrical, and any
> > > command will always complete on the same queue index which it was 
> > > submitted.
> > 
> > DQ & CQ did confuse me a bit, :-)
> 
> DQ is "delivery" queue. Interna; terminolgy.
> 
> In response to earlier "It depends on if hisi_sas V3 hardware supports
> independent tags for each queue. "
> 
> As you now know, it doesn't.
> 
> TBH, I would be slightly surprised if any SCSI host supported independent
> tags per HW queue (ignoring these tri-mode types). The reason is that
> hisi_sas uses this tag as SAS SSP frame tag, and later used for TMF tag. If
> we sent different commands to a SCSI end device from different queues, then
> possibly non-unique tags and this would break the model. But maybe other
> SCSI host work differently.

When I tested some SCSI FC at MQ mode years ago, I did suspect if their tags
is hw queue wide, but result is really true.

Thanks,
Ming


Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-06-03 Thread John Garry

On 03/06/2019 10:54, Ming Lei wrote:

On Mon, Jun 03, 2019 at 09:57:26AM +0100, John Garry wrote:

Otherwise duplicated slot can be used from different blk-mq hw queue.




The worsen thing is that V3's actual max queue depth is (4096 - 96), but
this patch claims that the device can support (4096 - 96) * 32 command
slots


To be clear about the hw, the hw supports max 4096 command tags and has 16


Is 4096 the max allowed host-wide command tags? Or per-queue's max commands
tags?


4096 is max allowed host wide, in range [0, 4096), and tags are not per
queue. HW delivery queues can be configured to be as large as desired.


Then all delivery(and complete) queues share the 4096 command slots, we can't
convert hisi_sas V3 into typical blk-mq MQ model simply as done by Hannes's 
patch.

Or you may partition the 4096 tags into 16 queues, then each queue's
depth is ~256. Depends on if performance is good for workloads.
You still can build a unique tag easily in [0, 4096), such as
(req->tag * hw_queue_index).

blk-mq's hw queue has independent tags, which is from NVMe's 
submission/completion
queue.


e, and this is what I thought you were addressing in "[PATCH 1/9] 
blk-mq: allow hw queues to share hostwide tags", right?






And on another point I saw mentioned, the device supports multiple
submission and multiple completion queues. They are symmetrical, and any
command will always complete on the same queue index which it was submitted.


DQ & CQ did confuse me a bit, :-)


DQ is "delivery" queue. Interna; terminolgy.

In response to earlier "It depends on if hisi_sas V3 hardware supports 
independent tags for each queue. "


As you now know, it doesn't.

TBH, I would be slightly surprised if any SCSI host supported 
independent tags per HW queue (ignoring these tri-mode types). The 
reason is that hisi_sas uses this tag as SAS SSP frame tag, and later 
used for TMF tag. If we sent different commands to a SCSI end device 
from different queues, then possibly non-unique tags and this would 
break the model. But maybe other SCSI host work differently.


Thanks,
John



Thanks,
Ming

.






Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-06-03 Thread Ming Lei
On Mon, Jun 03, 2019 at 09:57:26AM +0100, John Garry wrote:
> > > > Otherwise duplicated slot can be used from different blk-mq hw queue.
> > > > 
> > > > > 
> > > > > > The worsen thing is that V3's actual max queue depth is (4096 - 
> > > > > > 96), but
> > > > > > this patch claims that the device can support (4096 - 96) * 32 
> > > > > > command
> > > > > > slots
> > > 
> > > To be clear about the hw, the hw supports max 4096 command tags and has 16
> > 
> > Is 4096 the max allowed host-wide command tags? Or per-queue's max commands
> > tags?
> 
> 4096 is max allowed host wide, in range [0, 4096), and tags are not per
> queue. HW delivery queues can be configured to be as large as desired.

Then all delivery(and complete) queues share the 4096 command slots, we can't
convert hisi_sas V3 into typical blk-mq MQ model simply as done by Hannes's 
patch.

Or you may partition the 4096 tags into 16 queues, then each queue's
depth is ~256. Depends on if performance is good for workloads.
You still can build a unique tag easily in [0, 4096), such as
(req->tag * hw_queue_index).

blk-mq's hw queue has independent tags, which is from NVMe's 
submission/completion
queue.

> 
> And on another point I saw mentioned, the device supports multiple
> submission and multiple completion queues. They are symmetrical, and any
> command will always complete on the same queue index which it was submitted.

DQ & CQ did confuse me a bit, :-)

Thanks,
Ming


Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-06-03 Thread John Garry

On 03/06/2019 08:46, Hannes Reinecke wrote:

On 6/3/19 9:37 AM, Ming Lei wrote:

On Mon, Jun 03, 2019 at 08:08:18AM +0200, Hannes Reinecke wrote:

On 6/1/19 1:06 AM, Ming Lei wrote:

On Fri, May 31, 2019 at 12:26:56PM +0200, Hannes Reinecke wrote:

On 5/31/19 10:46 AM, Ming Lei wrote:

[ .. ]

First we check for the 'slot_index_alloc()' callback to handle weird v2
allocation rules, _then_ we look for a tag, and only if we do _not_ have
a tag we're using the bitmap.


OK, looks I miss the above change.


And the bitmap is already correctly sized, as otherwise we'd have a
clash between internal and tagged I/O commands even now.


But now the big problem is in the following two line code:

+   else if (blk_tag != (u32)-1)
+   rc = blk_mq_unique_tag_to_tag(blk_tag);

Request from different blk-mq hw queue has same tag returned from
blk_mq_unique_tag_to_tag().


Yes, but the sbitmap allocator will ensure that each command will get a
unique tag.


Each hw queue has independent sbitmap allocator, so commands with same
tag can come from different hw queue.


It does not for SCSI.
See below.


So you meant this RFC patch depends on the host-wide tags patchset I
posted?




Now the biggest question is that if V3 hw supports per-queue tags,
If yes, it should be real MQ hardware, otherwise I guess commands with
same tag at the same time may not work for host-wide tags.



Of course you can't have different commands with the same tag. But the
sbitmap allocator prevents this from happening, as for host-wide tags
the tagset is _shared_ between all devices, so the sbitmap allocator
will only ever run on _one_ tagset for all commands.


But blk-mq doesn't support host-wide tags yet, so how can this single
patch work?


Wrong. It does:

struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev)
{
sdev->request_queue = blk_mq_init_queue(&sdev->host->tag_set);
if (IS_ERR(sdev->request_queue))
return NULL;

sdev->request_queue->queuedata = sdev;
__scsi_init_queue(sdev->host, sdev->request_queue);
blk_queue_flag_set(QUEUE_FLAG_SCSI_PASSTHROUGH, sdev->request_queue);
return sdev->request_queue;
}


IE every scsi device is using the tagset from the host.



Hi Hannes,

Do you think that we have a problem for the bsg devices we create in 
SCSI transport SAS, in that they don't seem to use the same host tagset?


I'm looking at scsi_transport_sas.c::sas_bsg_initialise()->bsg_setup_queue()

Thanks,
John


Cheers,

Hannes






Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-06-03 Thread Ming Lei
On Mon, Jun 03, 2019 at 10:47:24AM +0200, Hannes Reinecke wrote:
> On 6/3/19 10:16 AM, Ming Lei wrote:
> > On Mon, Jun 03, 2019 at 09:46:39AM +0200, Hannes Reinecke wrote:
> >> On 6/3/19 9:37 AM, Ming Lei wrote:
> >>> On Mon, Jun 03, 2019 at 08:08:18AM +0200, Hannes Reinecke wrote:
>  On 6/1/19 1:06 AM, Ming Lei wrote:
> > On Fri, May 31, 2019 at 12:26:56PM +0200, Hannes Reinecke wrote:
> >> On 5/31/19 10:46 AM, Ming Lei wrote:
>  [ .. ]
> >> First we check for the 'slot_index_alloc()' callback to handle weird v2
> >> allocation rules, _then_ we look for a tag, and only if we do _not_ 
> >> have
> >> a tag we're using the bitmap.
> >
> > OK, looks I miss the above change.
> >
> >> And the bitmap is already correctly sized, as otherwise we'd have a
> >> clash between internal and tagged I/O commands even now.
> >
> > But now the big problem is in the following two line code:
> >
> > +   else if (blk_tag != (u32)-1)
> > +   rc = blk_mq_unique_tag_to_tag(blk_tag);
> >
> > Request from different blk-mq hw queue has same tag returned from
> > blk_mq_unique_tag_to_tag().
> >
>  Yes, but the sbitmap allocator will ensure that each command will get a
>  unique tag.
> >>>
> >>> Each hw queue has independent sbitmap allocator, so commands with same
> >>> tag can come from different hw queue.
> >>>
> >> It does not for SCSI.
> >> See below.
> >>
> >>> So you meant this RFC patch depends on the host-wide tags patchset I
> >>> posted?
> >>>
> 
> > Now the biggest question is that if V3 hw supports per-queue tags,
> > If yes, it should be real MQ hardware, otherwise I guess commands with
> > same tag at the same time may not work for host-wide tags.
> >
> 
>  Of course you can't have different commands with the same tag. But the
>  sbitmap allocator prevents this from happening, as for host-wide tags
>  the tagset is _shared_ between all devices, so the sbitmap allocator
>  will only ever run on _one_ tagset for all commands.
> >>>
> >>> But blk-mq doesn't support host-wide tags yet, so how can this single
> >>> patch work?
> >>>
> >> Wrong. It does:
> >>
> >> struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev)
> >> {
> >>sdev->request_queue = blk_mq_init_queue(&sdev->host->tag_set);
> >>if (IS_ERR(sdev->request_queue))
> >>return NULL;
> >>
> >>sdev->request_queue->queuedata = sdev;
> >>__scsi_init_queue(sdev->host, sdev->request_queue);
> >>blk_queue_flag_set(QUEUE_FLAG_SCSI_PASSTHROUGH, sdev->request_queue);
> >>return sdev->request_queue;
> >> }
> >>
> >>
> >> IE every scsi device is using the tagset from the host.
> > 
> > Looks we are not in the same page, and you misunderstood two concepts:
> > scsi's host-wide tagset, and the new host-tags of BLK_MQ_F_HOST_TAGS.
> > 
> > I admit that the new flag of BLK_MQ_F_HOST_TAGS is misleading.
> > 
> > Now let me clarify it a bit:
> > 
> > 1) the current SCSI hostwide tags means all LUNs share the host tagset,
> > but the tagset may include multiple hw queues, and each hw queue still
> > has independent tags, that is why blk-mq provides blk_mq_unique_tag().
> > In short, each LUN's hw queue has independent tags.
> > 
> Which is where I fundamentally disagree.
> Each hw queue does _not_ have independent tags.
> Each hw queue will use tags from the same (host-wide) tagset; the tags
> themselves will be allocated for each queue on an ad-hoc base, ie there
> is no fixed mapping between tag values and hardware queues.

Tagset is set of tags, and one tags is for serving one hw queue.

Each hw queue has its own tags, please see __blk_mq_alloc_rq_map()
in which standalone sbitmap allocator and rq pool is allocated to
each hw queue represented by 'hctx_idx'.

And for each hw queue, the allocated tag value for request is in
the range of 0 ~ queue_depth - 1, that is why I say requests from
different hw queue may have same tag.

Your RFC patch changes to allow requests with same tag submitted to driver
& hardware at the same time, so we should double-check if hisi_v3 hardware
is happy with this change.

John, is hisi_sas v3 fine with this way?

> Which is why there is blk_mq_unique_tag(); this precisely returns a tag
> which is unique across all devices from this host by shifting the queue
> number on top of the actual tag number:
> 
> u32 blk_mq_unique_tag(struct request *rq)
> {
>   return (rq->mq_hctx->queue_num << BLK_MQ_UNIQUE_TAG_BITS) |
>   (rq->tag & BLK_MQ_UNIQUE_TAG_MASK);
> }
>

See the two line code in your patch again:

+   else if (blk_tag != (u32)-1)
+   rc = blk_mq_unique_tag_to_tag(blk_tag);

rc is same with rq->tag, which can be same for requests submitted from
different hw queues at the same time.

Then look at the following code in hisi_sas_task_prep():

slot_idx = rc;
slot = &hisi_hba->slot_info[slot_idx];

Your patc

Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-06-03 Thread John Garry

Otherwise duplicated slot can be used from different blk-mq hw queue.




The worsen thing is that V3's actual max queue depth is (4096 - 96), but
this patch claims that the device can support (4096 - 96) * 32 command
slots


To be clear about the hw, the hw supports max 4096 command tags and has 16


Is 4096 the max allowed host-wide command tags? Or per-queue's max commands
tags?


4096 is max allowed host wide, in range [0, 4096), and tags are not per 
queue. HW delivery queues can be configured to be as large as desired.


And on another point I saw mentioned, the device supports multiple 
submission and multiple completion queues. They are symmetrical, and any 
command will always complete on the same queue index which it was submitted.




If it is per-queue's max command tags, V3 should be real MQ hardware,
otherwise it is still host wide. Otherwise, Hannes's patch can't work
because tag from different hw queue can be same.


hw queues. The hw queue depth is configurable by software, and we configure
it at 4096 per queue - same as max command tags - this is to support
possibility of all commands using the same queue simultaneously.


Suppose 4096 is the host-wide command tags:

As Hannes's patch changed to allow each blk-mq hw queue to accept 4096 commands,
there will be big contention in driver, given now the actual .can_queue becomes
4096 * 32, and how to avoid the same tag from different hw queue?


The documentation needs to be clarifed for this, to avoid future confusion.



Thanks,
Ming

.






Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-06-03 Thread Hannes Reinecke
On 6/3/19 10:16 AM, Ming Lei wrote:
> On Mon, Jun 03, 2019 at 09:46:39AM +0200, Hannes Reinecke wrote:
>> On 6/3/19 9:37 AM, Ming Lei wrote:
>>> On Mon, Jun 03, 2019 at 08:08:18AM +0200, Hannes Reinecke wrote:
 On 6/1/19 1:06 AM, Ming Lei wrote:
> On Fri, May 31, 2019 at 12:26:56PM +0200, Hannes Reinecke wrote:
>> On 5/31/19 10:46 AM, Ming Lei wrote:
 [ .. ]
>> First we check for the 'slot_index_alloc()' callback to handle weird v2
>> allocation rules, _then_ we look for a tag, and only if we do _not_ have
>> a tag we're using the bitmap.
>
> OK, looks I miss the above change.
>
>> And the bitmap is already correctly sized, as otherwise we'd have a
>> clash between internal and tagged I/O commands even now.
>
> But now the big problem is in the following two line code:
>
> +   else if (blk_tag != (u32)-1)
> +   rc = blk_mq_unique_tag_to_tag(blk_tag);
>
> Request from different blk-mq hw queue has same tag returned from
> blk_mq_unique_tag_to_tag().
>
 Yes, but the sbitmap allocator will ensure that each command will get a
 unique tag.
>>>
>>> Each hw queue has independent sbitmap allocator, so commands with same
>>> tag can come from different hw queue.
>>>
>> It does not for SCSI.
>> See below.
>>
>>> So you meant this RFC patch depends on the host-wide tags patchset I
>>> posted?
>>>

> Now the biggest question is that if V3 hw supports per-queue tags,
> If yes, it should be real MQ hardware, otherwise I guess commands with
> same tag at the same time may not work for host-wide tags.
>

 Of course you can't have different commands with the same tag. But the
 sbitmap allocator prevents this from happening, as for host-wide tags
 the tagset is _shared_ between all devices, so the sbitmap allocator
 will only ever run on _one_ tagset for all commands.
>>>
>>> But blk-mq doesn't support host-wide tags yet, so how can this single
>>> patch work?
>>>
>> Wrong. It does:
>>
>> struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev)
>> {
>>  sdev->request_queue = blk_mq_init_queue(&sdev->host->tag_set);
>>  if (IS_ERR(sdev->request_queue))
>>  return NULL;
>>
>>  sdev->request_queue->queuedata = sdev;
>>  __scsi_init_queue(sdev->host, sdev->request_queue);
>>  blk_queue_flag_set(QUEUE_FLAG_SCSI_PASSTHROUGH, sdev->request_queue);
>>  return sdev->request_queue;
>> }
>>
>>
>> IE every scsi device is using the tagset from the host.
> 
> Looks we are not in the same page, and you misunderstood two concepts:
> scsi's host-wide tagset, and the new host-tags of BLK_MQ_F_HOST_TAGS.
> 
> I admit that the new flag of BLK_MQ_F_HOST_TAGS is misleading.
> 
> Now let me clarify it a bit:
> 
> 1) the current SCSI hostwide tags means all LUNs share the host tagset,
> but the tagset may include multiple hw queues, and each hw queue still
> has independent tags, that is why blk-mq provides blk_mq_unique_tag().
> In short, each LUN's hw queue has independent tags.
> 
Which is where I fundamentally disagree.
Each hw queue does _not_ have independent tags.
Each hw queue will use tags from the same (host-wide) tagset; the tags
themselves will be allocated for each queue on an ad-hoc base, ie there
is no fixed mapping between tag values and hardware queues.
Which is why there is blk_mq_unique_tag(); this precisely returns a tag
which is unique across all devices from this host by shifting the queue
number on top of the actual tag number:

u32 blk_mq_unique_tag(struct request *rq)
{
return (rq->mq_hctx->queue_num << BLK_MQ_UNIQUE_TAG_BITS) |
(rq->tag & BLK_MQ_UNIQUE_TAG_MASK);
}

So the tagset itself is not split across devices, and all devices can
(potentially) use all tags from the tagset.

> 
> 2) some drivers(hisi_v3 hw, hpsa, megraid_sas, mpt3sas) only support
> single hw queue, but has multiple reply queue which can avoid CPU
> lockup, so we are working towards converting the private reply queue
> into blk-mq hw queue. That is what BLK_MQ_F_HOST_TAGS covers.
> 
I am aware.

> Now you think 1) is enough for converting the private reply queue into
> blk-mq hw queue, that is definitely not correct since blk-mq MQ doesn't
> support shared tags among hw queues.
> 
Currently blk-mq assumes a symmetric submission/completion model, ie it
is assumed that a multiqueue device will have the same number of
submissions and completion queues.

For the HBAs listed above the submission works by writing an address
into a single dedicated PCI register, whereas there are completion
queues per interrupt. So we really have a 1:n mapping between submission
and completion queues.

Which is what your patchset is trying to address, and I'm perfectly fine
with this.

All I'm arguing is that hisi_sas v3 really maps onto the original model,
seeing that is has per-queue submission paths, and as such maps more
closely on the original model for b

Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-06-03 Thread Ming Lei
On Mon, Jun 03, 2019 at 09:46:39AM +0200, Hannes Reinecke wrote:
> On 6/3/19 9:37 AM, Ming Lei wrote:
> > On Mon, Jun 03, 2019 at 08:08:18AM +0200, Hannes Reinecke wrote:
> >> On 6/1/19 1:06 AM, Ming Lei wrote:
> >>> On Fri, May 31, 2019 at 12:26:56PM +0200, Hannes Reinecke wrote:
>  On 5/31/19 10:46 AM, Ming Lei wrote:
> >> [ .. ]
>  First we check for the 'slot_index_alloc()' callback to handle weird v2
>  allocation rules, _then_ we look for a tag, and only if we do _not_ have
>  a tag we're using the bitmap.
> >>>
> >>> OK, looks I miss the above change.
> >>>
>  And the bitmap is already correctly sized, as otherwise we'd have a
>  clash between internal and tagged I/O commands even now.
> >>>
> >>> But now the big problem is in the following two line code:
> >>>
> >>> +   else if (blk_tag != (u32)-1)
> >>> +   rc = blk_mq_unique_tag_to_tag(blk_tag);
> >>>
> >>> Request from different blk-mq hw queue has same tag returned from
> >>> blk_mq_unique_tag_to_tag().
> >>>
> >> Yes, but the sbitmap allocator will ensure that each command will get a
> >> unique tag.
> > 
> > Each hw queue has independent sbitmap allocator, so commands with same
> > tag can come from different hw queue.
> > 
> It does not for SCSI.
> See below.
> 
> > So you meant this RFC patch depends on the host-wide tags patchset I
> > posted?
> > 
> >>
> >>> Now the biggest question is that if V3 hw supports per-queue tags,
> >>> If yes, it should be real MQ hardware, otherwise I guess commands with
> >>> same tag at the same time may not work for host-wide tags.
> >>>
> >>
> >> Of course you can't have different commands with the same tag. But the
> >> sbitmap allocator prevents this from happening, as for host-wide tags
> >> the tagset is _shared_ between all devices, so the sbitmap allocator
> >> will only ever run on _one_ tagset for all commands.
> > 
> > But blk-mq doesn't support host-wide tags yet, so how can this single
> > patch work?
> > 
> Wrong. It does:
> 
> struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev)
> {
>   sdev->request_queue = blk_mq_init_queue(&sdev->host->tag_set);
>   if (IS_ERR(sdev->request_queue))
>   return NULL;
> 
>   sdev->request_queue->queuedata = sdev;
>   __scsi_init_queue(sdev->host, sdev->request_queue);
>   blk_queue_flag_set(QUEUE_FLAG_SCSI_PASSTHROUGH, sdev->request_queue);
>   return sdev->request_queue;
> }
> 
> 
> IE every scsi device is using the tagset from the host.

Looks we are not in the same page, and you misunderstood two concepts:
scsi's host-wide tagset, and the new host-tags of BLK_MQ_F_HOST_TAGS.

I admit that the new flag of BLK_MQ_F_HOST_TAGS is misleading.

Now let me clarify it a bit:

1) the current SCSI hostwide tags means all LUNs share the host tagset,
but the tagset may include multiple hw queues, and each hw queue still
has independent tags, that is why blk-mq provides blk_mq_unique_tag().
In short, each LUN's hw queue has independent tags.


2) some drivers(hisi_v3 hw, hpsa, megraid_sas, mpt3sas) only support
single hw queue, but has multiple reply queue which can avoid CPU
lockup, so we are working towards converting the private reply queue
into blk-mq hw queue. That is what BLK_MQ_F_HOST_TAGS covers.

Now you think 1) is enough for converting the private reply queue into
blk-mq hw queue, that is definitely not correct since blk-mq MQ doesn't
support shared tags among hw queues.

Please take a look at the patch you posted before, and you will see
the point.

https://marc.info/?l=linux-block&m=149132580511346&w=2




Thanks,
Ming


Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-06-03 Thread Hannes Reinecke
On 6/3/19 9:37 AM, Ming Lei wrote:
> On Mon, Jun 03, 2019 at 08:08:18AM +0200, Hannes Reinecke wrote:
>> On 6/1/19 1:06 AM, Ming Lei wrote:
>>> On Fri, May 31, 2019 at 12:26:56PM +0200, Hannes Reinecke wrote:
 On 5/31/19 10:46 AM, Ming Lei wrote:
>> [ .. ]
 First we check for the 'slot_index_alloc()' callback to handle weird v2
 allocation rules, _then_ we look for a tag, and only if we do _not_ have
 a tag we're using the bitmap.
>>>
>>> OK, looks I miss the above change.
>>>
 And the bitmap is already correctly sized, as otherwise we'd have a
 clash between internal and tagged I/O commands even now.
>>>
>>> But now the big problem is in the following two line code:
>>>
>>> +   else if (blk_tag != (u32)-1)
>>> +   rc = blk_mq_unique_tag_to_tag(blk_tag);
>>>
>>> Request from different blk-mq hw queue has same tag returned from
>>> blk_mq_unique_tag_to_tag().
>>>
>> Yes, but the sbitmap allocator will ensure that each command will get a
>> unique tag.
> 
> Each hw queue has independent sbitmap allocator, so commands with same
> tag can come from different hw queue.
> 
It does not for SCSI.
See below.

> So you meant this RFC patch depends on the host-wide tags patchset I
> posted?
> 
>>
>>> Now the biggest question is that if V3 hw supports per-queue tags,
>>> If yes, it should be real MQ hardware, otherwise I guess commands with
>>> same tag at the same time may not work for host-wide tags.
>>>
>>
>> Of course you can't have different commands with the same tag. But the
>> sbitmap allocator prevents this from happening, as for host-wide tags
>> the tagset is _shared_ between all devices, so the sbitmap allocator
>> will only ever run on _one_ tagset for all commands.
> 
> But blk-mq doesn't support host-wide tags yet, so how can this single
> patch work?
> 
Wrong. It does:

struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev)
{
sdev->request_queue = blk_mq_init_queue(&sdev->host->tag_set);
if (IS_ERR(sdev->request_queue))
return NULL;

sdev->request_queue->queuedata = sdev;
__scsi_init_queue(sdev->host, sdev->request_queue);
blk_queue_flag_set(QUEUE_FLAG_SCSI_PASSTHROUGH, sdev->request_queue);
return sdev->request_queue;
}


IE every scsi device is using the tagset from the host.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)


Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-06-03 Thread Ming Lei
On Mon, Jun 03, 2019 at 08:08:18AM +0200, Hannes Reinecke wrote:
> On 6/1/19 1:06 AM, Ming Lei wrote:
> > On Fri, May 31, 2019 at 12:26:56PM +0200, Hannes Reinecke wrote:
> >> On 5/31/19 10:46 AM, Ming Lei wrote:
> [ .. ]
> >> First we check for the 'slot_index_alloc()' callback to handle weird v2
> >> allocation rules, _then_ we look for a tag, and only if we do _not_ have
> >> a tag we're using the bitmap.
> > 
> > OK, looks I miss the above change.
> > 
> >> And the bitmap is already correctly sized, as otherwise we'd have a
> >> clash between internal and tagged I/O commands even now.
> > 
> > But now the big problem is in the following two line code:
> > 
> > +   else if (blk_tag != (u32)-1)
> > +   rc = blk_mq_unique_tag_to_tag(blk_tag);
> > 
> > Request from different blk-mq hw queue has same tag returned from
> > blk_mq_unique_tag_to_tag().
> > 
> Yes, but the sbitmap allocator will ensure that each command will get a
> unique tag.

Each hw queue has independent sbitmap allocator, so commands with same
tag can come from different hw queue.

So you meant this RFC patch depends on the host-wide tags patchset I
posted?

> 
> > Now the biggest question is that if V3 hw supports per-queue tags,
> > If yes, it should be real MQ hardware, otherwise I guess commands with
> > same tag at the same time may not work for host-wide tags.
> > 
> 
> Of course you can't have different commands with the same tag. But the
> sbitmap allocator prevents this from happening, as for host-wide tags
> the tagset is _shared_ between all devices, so the sbitmap allocator
> will only ever run on _one_ tagset for all commands.

But blk-mq doesn't support host-wide tags yet, so how can this single
patch work?


Thanks,
Ming


Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-06-02 Thread Hannes Reinecke
On 5/31/19 3:46 PM, John Garry wrote:
> On 31/05/2019 14:18, Hannes Reinecke wrote:
>> On 5/31/19 11:42 AM, John Garry wrote:
>>>
>>> -static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba,
>>> - struct scsi_cmnd *scsi_cmnd)
>>> +static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba)
>>>  {
>>>  int index;
>>>  void *bitmap = hisi_hba->slot_index_tags;
>>>  unsigned long flags;
>>>
>>> -    if (scsi_cmnd)
>>> -    return scsi_cmnd->request->tag;
>>> -
>>>  spin_lock_irqsave(&hisi_hba->lock, flags);
>>>  index = find_next_zero_bit(bitmap, hisi_hba->slot_index_count,
>>>     hisi_hba->last_slot_index + 1);
>>
>> Then you switch to hisi_sas_slot_index_alloc() for allocating the
>> unique
>> tag via spin_lock & find_next_zero_bit(). Do you think this way is
>> more
>> efficient than blk-mq's sbitmap?
>>>
>>> These are not fast path, as used only for TMF, internal IO, etc.
>>>
>>> Having said that, hopefully we can move to scsi_{get,put}_reserved_cmd()
>>> when available, so that the LLDD has to stop managing them.
>>>
>>
> slot_index_alloc() is only used for commands which do _not_ have a tag
> (eg internal commands), or for v2 hardware which has weird allocation
> rules.

 But this patch has switched to this allocation unconditionally for all
 commands:

>>>
>>> As Hannes said, v2 had a few bugs which meant that we had to make a
>>> specific version of this function for that hw revision, cf.
>>> slot_index_alloc_quirk_v2_hw(), and it cannot use request queue tag.
>>>
>>> But, indeed, we could consider sbitmap for v2 hw. I'm not sure if it
>>> would help, considering the weird rules.
>>>
>> We should be able to get away by shifting all tags by 1 to the left,
>> and adding '1' to SMP/SAS commands, and '2' to STP commands, no?
>> Then index '0' would be free, and the allocation rules are satisfied.
>>
> 
> The crazy (escalating from weird) rules to workaround the HW bug(s) mean
> that we need to chop up the command tag range into blocks of 32 even tag
> indexes per SATA device; this means that SATA device #0 can use 64, 66,
> 68, 70...126. device #1 can use 128, 130, 132,..., device #2 can use
> 192, 194,...
> 
> I don't know how you can take a rq tag and generate a command tag
> suitable for a SATA device.
> Actually, you can.
We can setup a _distinct_ tagset per SATA device.
Eg we can setup a shared tagset for SAS (which is half the size of the
original tagset), and shift the tags by one to get a valid SAS tag.
For SATA we can setup a _distinct_ tagset per device, containing 32
tags. The we can invoke some calculation to transform the tag (which is
not guaranteed to be in the range of 0-31) into a valid hardware tag.

Should actually work.

Problem is that we'd need to set aside some tags for TMF, but I really
don't think that we can or should do command aborts on SATA; while there
is the 'abort NCQ' command, it'll work only for NCQ commands, and won't
help us for 'normal' commands.
And seeing that on error all NCQ commands will be aborted anyway, plus
the standard ATA error handler will be resulting into a device reset, I
guess we should skip command abort on SATA and escalate to device reset
straightaway. That would also have the nice benefit that we need only to
set _one_ tag aside for TMF, as we will only send one TMF at a time.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)


Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-06-02 Thread Hannes Reinecke
On 6/1/19 1:06 AM, Ming Lei wrote:
> On Fri, May 31, 2019 at 12:26:56PM +0200, Hannes Reinecke wrote:
>> On 5/31/19 10:46 AM, Ming Lei wrote:
[ .. ]
>> First we check for the 'slot_index_alloc()' callback to handle weird v2
>> allocation rules, _then_ we look for a tag, and only if we do _not_ have
>> a tag we're using the bitmap.
> 
> OK, looks I miss the above change.
> 
>> And the bitmap is already correctly sized, as otherwise we'd have a
>> clash between internal and tagged I/O commands even now.
> 
> But now the big problem is in the following two line code:
> 
> +   else if (blk_tag != (u32)-1)
> +   rc = blk_mq_unique_tag_to_tag(blk_tag);
> 
> Request from different blk-mq hw queue has same tag returned from
> blk_mq_unique_tag_to_tag().
> 
Yes, but the sbitmap allocator will ensure that each command will get a
unique tag.

> Now the biggest question is that if V3 hw supports per-queue tags,
> If yes, it should be real MQ hardware, otherwise I guess commands with
> same tag at the same time may not work for host-wide tags.
> 

Of course you can't have different commands with the same tag. But the
sbitmap allocator prevents this from happening, as for host-wide tags
the tagset is _shared_ between all devices, so the sbitmap allocator
will only ever run on _one_ tagset for all commands.

>>
 -   if (scsi_cmnd)
 -   return scsi_cmnd->request->tag;
 -
>>>
>>> Otherwise duplicated slot can be used from different blk-mq hw queue.
>>>

> The worsen thing is that V3's actual max queue depth is (4096 - 96), but
> this patch claims that the device can support (4096 - 96) * 32 command
> slots, finally hisi_sas_slot_index_alloc() is used to respect the actual
> max queue depth(4000).
>
 Well, this patch is an RFC to demonstrate my idea. Of course the queue
 depth should be identical before and after the conversion.
>>>
>>> That is why I call it is hard to partition the hostwide tags to MQ.
>>>
>> It's not. The driver already sets aside a portion of tags for internal
>> commands (check HISI_SAS_RESERVED_IPTT_CNT), so it is already
>> effectively partitioned.
> 
> I meant partitioning for each hw queue because you said it is host-wide
> tags.
> 
> I still don't understand why you can it is host-wide tags, and now each
> queue has same tag space with the host-wide tags. 
> 
See above. For host-wide tags the tagset is shared between all devices
(cf scsi_mq_alloc_queue() and scsi_mq_setup_tags()), so we do not need
to partition the tagspace between devices.
And as there is only one tagset the sbitmap allocator ensures that each
tag is in fact unique across _all_ devices.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)


Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-05-31 Thread Ming Lei
On Fri, May 31, 2019 at 12:26:56PM +0200, Hannes Reinecke wrote:
> On 5/31/19 10:46 AM, Ming Lei wrote:
> > On Fri, May 31, 2019 at 10:32:04AM +0200, Hannes Reinecke wrote:
> >> On 5/31/19 10:21 AM, Ming Lei wrote:
> >>> On Fri, May 31, 2019 at 09:41:58AM +0200, Hannes Reinecke wrote:
>  (Resending due to missing mailing list submission)
> 
>  Update v3 to support SCSI multiqueue.
> 
>  Signed-off-by: Hannes Reinecke 
>  ---
>   drivers/scsi/hisi_sas/hisi_sas.h   |  1 -
>   drivers/scsi/hisi_sas/hisi_sas_main.c  | 45 
>  +-
>   drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 44 
>  +++--
>   3 files changed, 36 insertions(+), 54 deletions(-)
> 
>  diff --git a/drivers/scsi/hisi_sas/hisi_sas.h 
>  b/drivers/scsi/hisi_sas/hisi_sas.h
>  index fc87994b5d73..4b6f32f60689 100644
>  --- a/drivers/scsi/hisi_sas/hisi_sas.h
>  +++ b/drivers/scsi/hisi_sas/hisi_sas.h
>  @@ -378,7 +378,6 @@ struct hisi_hba {
>   u32 intr_coal_count;/* Interrupt count to coalesce */
>   
>   int cq_nvecs;
>  -unsigned int *reply_map;
>   
>   /* debugfs memories */
>   u32 *debugfs_global_reg;
>  diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c 
>  b/drivers/scsi/hisi_sas/hisi_sas_main.c
>  index 8a7feb8ed8d6..f4237c4754a4 100644
>  --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
>  +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
>  @@ -200,16 +200,12 @@ static void hisi_sas_slot_index_set(struct 
>  hisi_hba *hisi_hba, int slot_idx)
>   set_bit(slot_idx, bitmap);
>   }
>   
>  -static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba,
>  - struct scsi_cmnd *scsi_cmnd)
>  +static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba)
>   {
>   int index;
>   void *bitmap = hisi_hba->slot_index_tags;
>   unsigned long flags;
>   
>  -if (scsi_cmnd)
>  -return scsi_cmnd->request->tag;
>  -
>   spin_lock_irqsave(&hisi_hba->lock, flags);
>   index = find_next_zero_bit(bitmap, hisi_hba->slot_index_count,
>  hisi_hba->last_slot_index + 1);
> >>>
> >>> Then you switch to hisi_sas_slot_index_alloc() for allocating the unique
> >>> tag via spin_lock & find_next_zero_bit(). Do you think this way is more
> >>> efficient than blk-mq's sbitmap?
> >>>
> >> slot_index_alloc() is only used for commands which do _not_ have a tag
> >> (eg internal commands), or for v2 hardware which has weird allocation 
> >> rules.
> > 
> > But this patch has switched to this allocation unconditionally for all 
> > commands:
> > 
> No:
> 
> @@ -503,21 +513,10 @@ static int hisi_sas_task_prep(struct sas_task *task,
> 
> if (hisi_hba->hw->slot_index_alloc)
> rc = hisi_hba->hw->slot_index_alloc(hisi_hba, device);
> -   else {
> -   struct scsi_cmnd *scsi_cmnd = NULL;
> -
> -   if (task->uldd_task) {
> -   struct ata_queued_cmd *qc;
> -
> -   if (dev_is_sata(device)) {
> -   qc = task->uldd_task;
> -   scsi_cmnd = qc->scsicmd;
> -   } else {
> -   scsi_cmnd = task->uldd_task;
> -   }
> -   }
> -   rc  = hisi_sas_slot_index_alloc(hisi_hba, scsi_cmnd);
> -   }
> +   else if (blk_tag != (u32)-1)
> +   rc = blk_mq_unique_tag_to_tag(blk_tag);
> +   else
> +   rc  = hisi_sas_slot_index_alloc(hisi_hba);
> if (rc < 0)
> goto err_out_dif_dma_unmap;
> 
> 
> First we check for the 'slot_index_alloc()' callback to handle weird v2
> allocation rules, _then_ we look for a tag, and only if we do _not_ have
> a tag we're using the bitmap.

OK, looks I miss the above change.

> And the bitmap is already correctly sized, as otherwise we'd have a
> clash between internal and tagged I/O commands even now.

But now the big problem is in the following two line code:

+   else if (blk_tag != (u32)-1)
+   rc = blk_mq_unique_tag_to_tag(blk_tag);

Request from different blk-mq hw queue has same tag returned from
blk_mq_unique_tag_to_tag().

Now the biggest question is that if V3 hw supports per-queue tags,
If yes, it should be real MQ hardware, otherwise I guess commands with
same tag at the same time may not work for host-wide tags.

> 
> >> -   if (scsi_cmnd)
> >> -   return scsi_cmnd->request->tag;
> >> -
> > 
> > Otherwise duplicated slot can be used from different blk-mq hw queue.
> > 
> >>
> >>> The worsen thing is that V3's actual max queue depth is (4096 - 96), but
> >>> this patch claims that the device can support (4096 - 96)

Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-05-31 Thread Ming Lei
On Fri, May 31, 2019 at 10:42:12AM +0100, John Garry wrote:
> > > > > 
> > > > > -static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba,
> > > > > -  struct scsi_cmnd *scsi_cmnd)
> > > > > +static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba)
> > > > >  {
> > > > >   int index;
> > > > >   void *bitmap = hisi_hba->slot_index_tags;
> > > > >   unsigned long flags;
> > > > > 
> > > > > - if (scsi_cmnd)
> > > > > - return scsi_cmnd->request->tag;
> > > > > -
> > > > >   spin_lock_irqsave(&hisi_hba->lock, flags);
> > > > >   index = find_next_zero_bit(bitmap, hisi_hba->slot_index_count,
> > > > >  hisi_hba->last_slot_index + 1);
> > > > 
> > > > Then you switch to hisi_sas_slot_index_alloc() for allocating the unique
> > > > tag via spin_lock & find_next_zero_bit(). Do you think this way is more
> > > > efficient than blk-mq's sbitmap?
> 
> These are not fast path, as used only for TMF, internal IO, etc.

OK, looks I miss that.

> 
> Having said that, hopefully we can move to scsi_{get,put}_reserved_cmd()
> when available, so that the LLDD has to stop managing them.

Agree.

> 
> > > > 
> > > slot_index_alloc() is only used for commands which do _not_ have a tag
> > > (eg internal commands), or for v2 hardware which has weird allocation 
> > > rules.
> > 
> > But this patch has switched to this allocation unconditionally for all 
> > commands:
> > 
> 
> As Hannes said, v2 had a few bugs which meant that we had to make a specific
> version of this function for that hw revision, cf.
> slot_index_alloc_quirk_v2_hw(), and it cannot use request queue tag.
> 
> But, indeed, we could consider sbitmap for v2 hw. I'm not sure if it would
> help, considering the weird rules.
> 
> > > -   if (scsi_cmnd)
> > > -   return scsi_cmnd->request->tag;
> > > -
> > 
> > Otherwise duplicated slot can be used from different blk-mq hw queue.
> > 
> > > 
> > > > The worsen thing is that V3's actual max queue depth is (4096 - 96), but
> > > > this patch claims that the device can support (4096 - 96) * 32 command
> > > > slots
> 
> To be clear about the hw, the hw supports max 4096 command tags and has 16

Is 4096 the max allowed host-wide command tags? Or per-queue's max commands
tags?

If it is per-queue's max command tags, V3 should be real MQ hardware,
otherwise it is still host wide. Otherwise, Hannes's patch can't work
because tag from different hw queue can be same.

> hw queues. The hw queue depth is configurable by software, and we configure
> it at 4096 per queue - same as max command tags - this is to support
> possibility of all commands using the same queue simultaneously.

Suppose 4096 is the host-wide command tags:

As Hannes's patch changed to allow each blk-mq hw queue to accept 4096 commands,
there will be big contention in driver, given now the actual .can_queue becomes
4096 * 32, and how to avoid the same tag from different hw queue? 

Thanks,
Ming


Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-05-31 Thread John Garry

On 31/05/2019 14:18, Hannes Reinecke wrote:

On 5/31/19 11:42 AM, John Garry wrote:


-static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba,
- struct scsi_cmnd *scsi_cmnd)
+static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba)
 {
 int index;
 void *bitmap = hisi_hba->slot_index_tags;
 unsigned long flags;

-if (scsi_cmnd)
-return scsi_cmnd->request->tag;
-
 spin_lock_irqsave(&hisi_hba->lock, flags);
 index = find_next_zero_bit(bitmap, hisi_hba->slot_index_count,
hisi_hba->last_slot_index + 1);


Then you switch to hisi_sas_slot_index_alloc() for allocating the
unique
tag via spin_lock & find_next_zero_bit(). Do you think this way is more
efficient than blk-mq's sbitmap?


These are not fast path, as used only for TMF, internal IO, etc.

Having said that, hopefully we can move to scsi_{get,put}_reserved_cmd()
when available, so that the LLDD has to stop managing them.




slot_index_alloc() is only used for commands which do _not_ have a tag
(eg internal commands), or for v2 hardware which has weird allocation
rules.


But this patch has switched to this allocation unconditionally for all
commands:



As Hannes said, v2 had a few bugs which meant that we had to make a
specific version of this function for that hw revision, cf.
slot_index_alloc_quirk_v2_hw(), and it cannot use request queue tag.

But, indeed, we could consider sbitmap for v2 hw. I'm not sure if it
would help, considering the weird rules.


We should be able to get away by shifting all tags by 1 to the left,
and adding '1' to SMP/SAS commands, and '2' to STP commands, no?
Then index '0' would be free, and the allocation rules are satisfied.



The crazy (escalating from weird) rules to workaround the HW bug(s) mean 
that we need to chop up the command tag range into blocks of 32 even tag 
indexes per SATA device; this means that SATA device #0 can use 64, 66, 
68, 70...126. device #1 can use 128, 130, 132,..., device #2 can use 
192, 194,...


I don't know how you can take a rq tag and generate a command tag 
suitable for a SATA device.


Thanks!

John


I'll see to whip up a patch.

Cheers,

Hannes






Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-05-31 Thread Hannes Reinecke
On 5/31/19 11:42 AM, John Garry wrote:
>
> -static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba,
> - struct scsi_cmnd *scsi_cmnd)
> +static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba)
>  {
>  int index;
>  void *bitmap = hisi_hba->slot_index_tags;
>  unsigned long flags;
>
> -    if (scsi_cmnd)
> -    return scsi_cmnd->request->tag;
> -
>  spin_lock_irqsave(&hisi_hba->lock, flags);
>  index = find_next_zero_bit(bitmap, hisi_hba->slot_index_count,
>     hisi_hba->last_slot_index + 1);

 Then you switch to hisi_sas_slot_index_alloc() for allocating the
 unique
 tag via spin_lock & find_next_zero_bit(). Do you think this way is more
 efficient than blk-mq's sbitmap?
> 
> These are not fast path, as used only for TMF, internal IO, etc.
> 
> Having said that, hopefully we can move to scsi_{get,put}_reserved_cmd()
> when available, so that the LLDD has to stop managing them.
> 

>>> slot_index_alloc() is only used for commands which do _not_ have a tag
>>> (eg internal commands), or for v2 hardware which has weird allocation
>>> rules.
>>
>> But this patch has switched to this allocation unconditionally for all
>> commands:
>>
> 
> As Hannes said, v2 had a few bugs which meant that we had to make a
> specific version of this function for that hw revision, cf.
> slot_index_alloc_quirk_v2_hw(), and it cannot use request queue tag.
> 
> But, indeed, we could consider sbitmap for v2 hw. I'm not sure if it
> would help, considering the weird rules.
> 
We should be able to get away by shifting all tags by 1 to the left,
and adding '1' to SMP/SAS commands, and '2' to STP commands, no?
Then index '0' would be free, and the allocation rules are satisfied.

I'll see to whip up a patch.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)


Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-05-31 Thread Hannes Reinecke
On 5/31/19 10:46 AM, Ming Lei wrote:
> On Fri, May 31, 2019 at 10:32:04AM +0200, Hannes Reinecke wrote:
>> On 5/31/19 10:21 AM, Ming Lei wrote:
>>> On Fri, May 31, 2019 at 09:41:58AM +0200, Hannes Reinecke wrote:
 (Resending due to missing mailing list submission)

 Update v3 to support SCSI multiqueue.

 Signed-off-by: Hannes Reinecke 
 ---
  drivers/scsi/hisi_sas/hisi_sas.h   |  1 -
  drivers/scsi/hisi_sas/hisi_sas_main.c  | 45 
 +-
  drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 44 
 +++--
  3 files changed, 36 insertions(+), 54 deletions(-)

 diff --git a/drivers/scsi/hisi_sas/hisi_sas.h 
 b/drivers/scsi/hisi_sas/hisi_sas.h
 index fc87994b5d73..4b6f32f60689 100644
 --- a/drivers/scsi/hisi_sas/hisi_sas.h
 +++ b/drivers/scsi/hisi_sas/hisi_sas.h
 @@ -378,7 +378,6 @@ struct hisi_hba {
u32 intr_coal_count;/* Interrupt count to coalesce */
  
int cq_nvecs;
 -  unsigned int *reply_map;
  
/* debugfs memories */
u32 *debugfs_global_reg;
 diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c 
 b/drivers/scsi/hisi_sas/hisi_sas_main.c
 index 8a7feb8ed8d6..f4237c4754a4 100644
 --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
 +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
 @@ -200,16 +200,12 @@ static void hisi_sas_slot_index_set(struct hisi_hba 
 *hisi_hba, int slot_idx)
set_bit(slot_idx, bitmap);
  }
  
 -static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba,
 -   struct scsi_cmnd *scsi_cmnd)
 +static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba)
  {
int index;
void *bitmap = hisi_hba->slot_index_tags;
unsigned long flags;
  
 -  if (scsi_cmnd)
 -  return scsi_cmnd->request->tag;
 -
spin_lock_irqsave(&hisi_hba->lock, flags);
index = find_next_zero_bit(bitmap, hisi_hba->slot_index_count,
   hisi_hba->last_slot_index + 1);
>>>
>>> Then you switch to hisi_sas_slot_index_alloc() for allocating the unique
>>> tag via spin_lock & find_next_zero_bit(). Do you think this way is more
>>> efficient than blk-mq's sbitmap?
>>>
>> slot_index_alloc() is only used for commands which do _not_ have a tag
>> (eg internal commands), or for v2 hardware which has weird allocation rules.
> 
> But this patch has switched to this allocation unconditionally for all 
> commands:
> 
No:

@@ -503,21 +513,10 @@ static int hisi_sas_task_prep(struct sas_task *task,

if (hisi_hba->hw->slot_index_alloc)
rc = hisi_hba->hw->slot_index_alloc(hisi_hba, device);
-   else {
-   struct scsi_cmnd *scsi_cmnd = NULL;
-
-   if (task->uldd_task) {
-   struct ata_queued_cmd *qc;
-
-   if (dev_is_sata(device)) {
-   qc = task->uldd_task;
-   scsi_cmnd = qc->scsicmd;
-   } else {
-   scsi_cmnd = task->uldd_task;
-   }
-   }
-   rc  = hisi_sas_slot_index_alloc(hisi_hba, scsi_cmnd);
-   }
+   else if (blk_tag != (u32)-1)
+   rc = blk_mq_unique_tag_to_tag(blk_tag);
+   else
+   rc  = hisi_sas_slot_index_alloc(hisi_hba);
if (rc < 0)
goto err_out_dif_dma_unmap;


First we check for the 'slot_index_alloc()' callback to handle weird v2
allocation rules, _then_ we look for a tag, and only if we do _not_ have
a tag we're using the bitmap.
And the bitmap is already correctly sized, as otherwise we'd have a
clash between internal and tagged I/O commands even now.

>> -   if (scsi_cmnd)
>> -   return scsi_cmnd->request->tag;
>> -
> 
> Otherwise duplicated slot can be used from different blk-mq hw queue.
> 
>>
>>> The worsen thing is that V3's actual max queue depth is (4096 - 96), but
>>> this patch claims that the device can support (4096 - 96) * 32 command
>>> slots, finally hisi_sas_slot_index_alloc() is used to respect the actual
>>> max queue depth(4000).
>>>
>> Well, this patch is an RFC to demonstrate my idea. Of course the queue
>> depth should be identical before and after the conversion.
> 
> That is why I call it is hard to partition the hostwide tags to MQ.
> 
It's not. The driver already sets aside a portion of tags for internal
commands (check HISI_SAS_RESERVED_IPTT_CNT), so it is already
effectively partitioned.

>>
>>> Big contention is caused on hisi_sas_slot_index_alloc(), meantime huge> 
>>> memory is wasted for request pool.
>>>
>> See above. That allocation is only used if no blk tag is available.
> 
> This patch switches the allocation for all commands.
> 
See above. No.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   

Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-05-31 Thread John Garry


-static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba,
-struct scsi_cmnd *scsi_cmnd)
+static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba)
 {
int index;
void *bitmap = hisi_hba->slot_index_tags;
unsigned long flags;

-   if (scsi_cmnd)
-   return scsi_cmnd->request->tag;
-
spin_lock_irqsave(&hisi_hba->lock, flags);
index = find_next_zero_bit(bitmap, hisi_hba->slot_index_count,
   hisi_hba->last_slot_index + 1);


Then you switch to hisi_sas_slot_index_alloc() for allocating the unique
tag via spin_lock & find_next_zero_bit(). Do you think this way is more
efficient than blk-mq's sbitmap?


These are not fast path, as used only for TMF, internal IO, etc.

Having said that, hopefully we can move to scsi_{get,put}_reserved_cmd() 
when available, so that the LLDD has to stop managing them.





slot_index_alloc() is only used for commands which do _not_ have a tag
(eg internal commands), or for v2 hardware which has weird allocation rules.


But this patch has switched to this allocation unconditionally for all commands:



As Hannes said, v2 had a few bugs which meant that we had to make a 
specific version of this function for that hw revision, cf. 
slot_index_alloc_quirk_v2_hw(), and it cannot use request queue tag.


But, indeed, we could consider sbitmap for v2 hw. I'm not sure if it 
would help, considering the weird rules.



-   if (scsi_cmnd)
-   return scsi_cmnd->request->tag;
-


Otherwise duplicated slot can be used from different blk-mq hw queue.




The worsen thing is that V3's actual max queue depth is (4096 - 96), but
this patch claims that the device can support (4096 - 96) * 32 command
slots


To be clear about the hw, the hw supports max 4096 command tags and has 
16 hw queues. The hw queue depth is configurable by software, and we 
configure it at 4096 per queue - same as max command tags - this is to 
support possibility of all commands using the same queue simultaneously.


, finally hisi_sas_slot_index_alloc() is used to respect the actual

max queue depth(4000).


Well, this patch is an RFC to demonstrate my idea. Of course the queue
depth should be identical before and after the conversion.


That is why I call it is hard to partition the hostwide tags to MQ.




Big contention is caused on hisi_sas_slot_index_alloc(), meantime huge> memory 
is wasted for request pool.


See above. That allocation is only used if no blk tag is available.


This patch switches the allocation for all commands.



Or, at least, that was the idea :-)


Agree, :-)


thanks,
Ming

.






Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-05-31 Thread Ming Lei
On Fri, May 31, 2019 at 10:32:04AM +0200, Hannes Reinecke wrote:
> On 5/31/19 10:21 AM, Ming Lei wrote:
> > On Fri, May 31, 2019 at 09:41:58AM +0200, Hannes Reinecke wrote:
> >> (Resending due to missing mailing list submission)
> >>
> >> Update v3 to support SCSI multiqueue.
> >>
> >> Signed-off-by: Hannes Reinecke 
> >> ---
> >>  drivers/scsi/hisi_sas/hisi_sas.h   |  1 -
> >>  drivers/scsi/hisi_sas/hisi_sas_main.c  | 45 
> >> +-
> >>  drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 44 
> >> +++--
> >>  3 files changed, 36 insertions(+), 54 deletions(-)
> >>
> >> diff --git a/drivers/scsi/hisi_sas/hisi_sas.h 
> >> b/drivers/scsi/hisi_sas/hisi_sas.h
> >> index fc87994b5d73..4b6f32f60689 100644
> >> --- a/drivers/scsi/hisi_sas/hisi_sas.h
> >> +++ b/drivers/scsi/hisi_sas/hisi_sas.h
> >> @@ -378,7 +378,6 @@ struct hisi_hba {
> >>u32 intr_coal_count;/* Interrupt count to coalesce */
> >>  
> >>int cq_nvecs;
> >> -  unsigned int *reply_map;
> >>  
> >>/* debugfs memories */
> >>u32 *debugfs_global_reg;
> >> diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c 
> >> b/drivers/scsi/hisi_sas/hisi_sas_main.c
> >> index 8a7feb8ed8d6..f4237c4754a4 100644
> >> --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
> >> +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
> >> @@ -200,16 +200,12 @@ static void hisi_sas_slot_index_set(struct hisi_hba 
> >> *hisi_hba, int slot_idx)
> >>set_bit(slot_idx, bitmap);
> >>  }
> >>  
> >> -static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba,
> >> -   struct scsi_cmnd *scsi_cmnd)
> >> +static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba)
> >>  {
> >>int index;
> >>void *bitmap = hisi_hba->slot_index_tags;
> >>unsigned long flags;
> >>  
> >> -  if (scsi_cmnd)
> >> -  return scsi_cmnd->request->tag;
> >> -
> >>spin_lock_irqsave(&hisi_hba->lock, flags);
> >>index = find_next_zero_bit(bitmap, hisi_hba->slot_index_count,
> >>   hisi_hba->last_slot_index + 1);
> > 
> > Then you switch to hisi_sas_slot_index_alloc() for allocating the unique
> > tag via spin_lock & find_next_zero_bit(). Do you think this way is more
> > efficient than blk-mq's sbitmap?
> > 
> slot_index_alloc() is only used for commands which do _not_ have a tag
> (eg internal commands), or for v2 hardware which has weird allocation rules.

But this patch has switched to this allocation unconditionally for all commands:

> -   if (scsi_cmnd)
> -   return scsi_cmnd->request->tag;
> -

Otherwise duplicated slot can be used from different blk-mq hw queue.

> 
> > The worsen thing is that V3's actual max queue depth is (4096 - 96), but
> > this patch claims that the device can support (4096 - 96) * 32 command
> > slots, finally hisi_sas_slot_index_alloc() is used to respect the actual
> > max queue depth(4000).
> > 
> Well, this patch is an RFC to demonstrate my idea. Of course the queue
> depth should be identical before and after the conversion.

That is why I call it is hard to partition the hostwide tags to MQ.

> 
> > Big contention is caused on hisi_sas_slot_index_alloc(), meantime huge> 
> > memory is wasted for request pool.
> > 
> See above. That allocation is only used if no blk tag is available.

This patch switches the allocation for all commands.

> 
> Or, at least, that was the idea :-)

Agree, :-)


thanks,
Ming


Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-05-31 Thread Hannes Reinecke
On 5/31/19 10:21 AM, Ming Lei wrote:
> On Fri, May 31, 2019 at 09:41:58AM +0200, Hannes Reinecke wrote:
>> (Resending due to missing mailing list submission)
>>
>> Update v3 to support SCSI multiqueue.
>>
>> Signed-off-by: Hannes Reinecke 
>> ---
>>  drivers/scsi/hisi_sas/hisi_sas.h   |  1 -
>>  drivers/scsi/hisi_sas/hisi_sas_main.c  | 45 
>> +-
>>  drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 44 
>> +++--
>>  3 files changed, 36 insertions(+), 54 deletions(-)
>>
>> diff --git a/drivers/scsi/hisi_sas/hisi_sas.h 
>> b/drivers/scsi/hisi_sas/hisi_sas.h
>> index fc87994b5d73..4b6f32f60689 100644
>> --- a/drivers/scsi/hisi_sas/hisi_sas.h
>> +++ b/drivers/scsi/hisi_sas/hisi_sas.h
>> @@ -378,7 +378,6 @@ struct hisi_hba {
>>  u32 intr_coal_count;/* Interrupt count to coalesce */
>>  
>>  int cq_nvecs;
>> -unsigned int *reply_map;
>>  
>>  /* debugfs memories */
>>  u32 *debugfs_global_reg;
>> diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c 
>> b/drivers/scsi/hisi_sas/hisi_sas_main.c
>> index 8a7feb8ed8d6..f4237c4754a4 100644
>> --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
>> +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
>> @@ -200,16 +200,12 @@ static void hisi_sas_slot_index_set(struct hisi_hba 
>> *hisi_hba, int slot_idx)
>>  set_bit(slot_idx, bitmap);
>>  }
>>  
>> -static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba,
>> - struct scsi_cmnd *scsi_cmnd)
>> +static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba)
>>  {
>>  int index;
>>  void *bitmap = hisi_hba->slot_index_tags;
>>  unsigned long flags;
>>  
>> -if (scsi_cmnd)
>> -return scsi_cmnd->request->tag;
>> -
>>  spin_lock_irqsave(&hisi_hba->lock, flags);
>>  index = find_next_zero_bit(bitmap, hisi_hba->slot_index_count,
>> hisi_hba->last_slot_index + 1);
> 
> Then you switch to hisi_sas_slot_index_alloc() for allocating the unique
> tag via spin_lock & find_next_zero_bit(). Do you think this way is more
> efficient than blk-mq's sbitmap?
> 
slot_index_alloc() is only used for commands which do _not_ have a tag
(eg internal commands), or for v2 hardware which has weird allocation rules.

> The worsen thing is that V3's actual max queue depth is (4096 - 96), but
> this patch claims that the device can support (4096 - 96) * 32 command
> slots, finally hisi_sas_slot_index_alloc() is used to respect the actual
> max queue depth(4000).
> 
Well, this patch is an RFC to demonstrate my idea. Of course the queue
depth should be identical before and after the conversion.

> Big contention is caused on hisi_sas_slot_index_alloc(), meantime huge> 
> memory is wasted for request pool.
> 
See above. That allocation is only used if no blk tag is available.

Or, at least, that was the idea :-)

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)


Re: [PATCH RFC] hisi_sas_v3: multiqueue support

2019-05-31 Thread Ming Lei
On Fri, May 31, 2019 at 09:41:58AM +0200, Hannes Reinecke wrote:
> (Resending due to missing mailing list submission)
> 
> Update v3 to support SCSI multiqueue.
> 
> Signed-off-by: Hannes Reinecke 
> ---
>  drivers/scsi/hisi_sas/hisi_sas.h   |  1 -
>  drivers/scsi/hisi_sas/hisi_sas_main.c  | 45 
> +-
>  drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 44 +++--
>  3 files changed, 36 insertions(+), 54 deletions(-)
> 
> diff --git a/drivers/scsi/hisi_sas/hisi_sas.h 
> b/drivers/scsi/hisi_sas/hisi_sas.h
> index fc87994b5d73..4b6f32f60689 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas.h
> +++ b/drivers/scsi/hisi_sas/hisi_sas.h
> @@ -378,7 +378,6 @@ struct hisi_hba {
>   u32 intr_coal_count;/* Interrupt count to coalesce */
>  
>   int cq_nvecs;
> - unsigned int *reply_map;
>  
>   /* debugfs memories */
>   u32 *debugfs_global_reg;
> diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c 
> b/drivers/scsi/hisi_sas/hisi_sas_main.c
> index 8a7feb8ed8d6..f4237c4754a4 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
> @@ -200,16 +200,12 @@ static void hisi_sas_slot_index_set(struct hisi_hba 
> *hisi_hba, int slot_idx)
>   set_bit(slot_idx, bitmap);
>  }
>  
> -static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba,
> -  struct scsi_cmnd *scsi_cmnd)
> +static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba)
>  {
>   int index;
>   void *bitmap = hisi_hba->slot_index_tags;
>   unsigned long flags;
>  
> - if (scsi_cmnd)
> - return scsi_cmnd->request->tag;
> -
>   spin_lock_irqsave(&hisi_hba->lock, flags);
>   index = find_next_zero_bit(bitmap, hisi_hba->slot_index_count,
>  hisi_hba->last_slot_index + 1);

Then you switch to hisi_sas_slot_index_alloc() for allocating the unique
tag via spin_lock & find_next_zero_bit(). Do you think this way is more
efficient than blk-mq's sbitmap?

The worsen thing is that V3's actual max queue depth is (4096 - 96), but
this patch claims that the device can support (4096 - 96) * 32 command
slots, finally hisi_sas_slot_index_alloc() is used to respect the actual
max queue depth(4000).

Big contention is caused on hisi_sas_slot_index_alloc(), meantime huge
memory is wasted for request pool.



Thanks,
Ming