Re: [PATCH v4 00/14] mpt3sas driver NVMe support:

2017-09-25 Thread Martin K. Petersen

Hi Suganath,

> Also, Making all PRP buffer may or may not need FW changes (assuming
> it is possible.), we may end up into multiple FW version check.

I don't understand how submitting an I/O that is guaranteed to honor the
constraints of the target NVMe drive could possibly cause problems for
the controller firmware. Quite the contrary, it's the best case
scenario.

> Since this is main IO path and current driver is following H/W
> limitation, we should avoid any changes in this area until and unless
> change is universal acceptable in FW (for all type of work load).

This is why you need to involve the Linux community early in the design
process and not when your implementation is complete.

We could have told you right away what the correct approach would be for
your Linux driver. And that said approach works for products from other
vendors so we see no compelling reason to deviate from it.

As evidenced by Broadcom disowning the legacy mpt and megaraid drivers,
I will be stuck maintaining this mpt3sas code for a decade or more. Long
after Broadcom has ended official support and moved on to different
ASICs and programming interfaces. Consequently, I am very heavily biased
towards solutions that leverage the shared interfaces provided by the
kernel and that don't have special cases and workarounds inside the
driver.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v4 00/14] mpt3sas driver NVMe support:

2017-09-18 Thread Suganath Prabu Subramani
Hi Martin,

On Fri, Sep 15, 2017 at 6:37 AM, Martin K. Petersen
 wrote:
>
> Suganath,
>
>> Is there any update on the submitted mpt3sas patches.
>
> We are waiting for you to report back your findings on PRP vs. SGL.

We are working on this, since there is h/w dependent, we are in
discussion with H/W & F/W team and doing experiments. If there is no
impact, and if SGL translation has to be removed, this change has to
go through some phase of testing, before we post it to upstream, since
that is not inline with H/W requirement.

The hardware translation of IEEE SGL to NVMe PRPs has limitation.

We have added the below comment in patch 3 as well:

if a command cannot be translated by hardware then it will go
to firmware and the firmware needs to translate it. And this will
have a performance reduction. To avoid that driver proactively
checks whether the translation will be done in hardware or not,
if not then driver try to translate inside the driver

Current code posted to upstream is inline with hardware requirements
and well tested internally.

SGL vs NVMe PRP building in driver is small sanity check for decision making
and it is not going to change in long run.

Also, Making all PRP buffer may or may not need FW changes (assuming
it is possible.),
we may end up into multiple FW version check.

Since this is main IO path and current driver is following H/W limitation,
we should avoid any changes in this area until and unless change is
universal acceptable in FW (for all type of work load).

Hope this clarifies.
>
> --
> Martin K. Petersen  Oracle Linux Engineering

Thanks,
Suganath Prabu S


Re: [PATCH v4 00/14] mpt3sas driver NVMe support:

2017-09-14 Thread Martin K. Petersen

Suganath,

> Is there any update on the submitted mpt3sas patches.

We are waiting for you to report back your findings on PRP vs. SGL.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v4 00/14] mpt3sas driver NVMe support:

2017-09-13 Thread Suganath Prabu Subramani
Hi Martin,

Is there any update on the submitted mpt3sas patches.

Thanks,
Suganath Prabu S

On Fri, Sep 1, 2017 at 2:09 PM, Suganath Prabu Subramani
 wrote:
> Hi Martin,
>
> On Fri, Sep 1, 2017 at 8:52 AM, Martin K. Petersen
>  wrote:
>>
>> Hi Suganath,
>>
>>> Let me explain - NVME device fast path is possible in two ways.  IEEE
>>> SGL and PRP SGL. Due to h/w constraint we choose IEEE SGL only for
>>> smaller IO size.  Both above is true h/w Fast Path and no firmware
>>> involvement.
>>
>>> Agree with you. We are planning to see if we can keep only simple Fast
>>> Path using only PRP.
>>
>> That would be great, thank you!
>>
>>> Currently there is no performance issue for UNMAP translation in FW.
>>
>> Good!
>>
 And yet patch 4 circumvents that statement by adding support for
 encapsulated commands to bypass the FW translation...
>>>
>>> This path is not due to performance reason. User wants to interact
>>> with NVME drive in native NVME command for management.
>>
>> Patch 4 states:
>>
>> "This encapsulated NVMe command is used by applications to send direct
>> NVMe commands to NVMe drives or for handling unmap where the translation
>> at controller/firmware level is having performance issues."
>>
>> --
>
> The statement in description of patch 4 is added by mistake, We ll
> correct the description and re sending that.
>
>> Martin K. Petersen  Oracle Linux Engineering
>
> Thanks,
> Suganath Prabu S


Re: [PATCH v4 00/14] mpt3sas driver NVMe support:

2017-09-01 Thread Suganath Prabu Subramani
Hi Martin,

On Fri, Sep 1, 2017 at 8:52 AM, Martin K. Petersen
 wrote:
>
> Hi Suganath,
>
>> Let me explain - NVME device fast path is possible in two ways.  IEEE
>> SGL and PRP SGL. Due to h/w constraint we choose IEEE SGL only for
>> smaller IO size.  Both above is true h/w Fast Path and no firmware
>> involvement.
>
>> Agree with you. We are planning to see if we can keep only simple Fast
>> Path using only PRP.
>
> That would be great, thank you!
>
>> Currently there is no performance issue for UNMAP translation in FW.
>
> Good!
>
>>> And yet patch 4 circumvents that statement by adding support for
>>> encapsulated commands to bypass the FW translation...
>>
>> This path is not due to performance reason. User wants to interact
>> with NVME drive in native NVME command for management.
>
> Patch 4 states:
>
> "This encapsulated NVMe command is used by applications to send direct
> NVMe commands to NVMe drives or for handling unmap where the translation
> at controller/firmware level is having performance issues."
>
> --

The statement in description of patch 4 is added by mistake, We ll
correct the description and re sending that.

> Martin K. Petersen  Oracle Linux Engineering

Thanks,
Suganath Prabu S


Re: [PATCH v4 00/14] mpt3sas driver NVMe support:

2017-08-31 Thread Martin K. Petersen

Hi Suganath,

> Let me explain - NVME device fast path is possible in two ways.  IEEE
> SGL and PRP SGL. Due to h/w constraint we choose IEEE SGL only for
> smaller IO size.  Both above is true h/w Fast Path and no firmware
> involvement.

> Agree with you. We are planning to see if we can keep only simple Fast
> Path using only PRP.

That would be great, thank you!

> Currently there is no performance issue for UNMAP translation in FW.

Good!

>> And yet patch 4 circumvents that statement by adding support for
>> encapsulated commands to bypass the FW translation...
>
> This path is not due to performance reason. User wants to interact
> with NVME drive in native NVME command for management.

Patch 4 states:

"This encapsulated NVMe command is used by applications to send direct
NVMe commands to NVMe drives or for handling unmap where the translation
at controller/firmware level is having performance issues."

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v4 00/14] mpt3sas driver NVMe support:

2017-08-30 Thread Suganath Prabu Subramani
Hi Martin,
Replied inline.

Thanks,
Suganath Prabu S

On Thu, Aug 31, 2017 at 8:35 AM, Martin K. Petersen
 wrote:
>
> Hi Suganath,
>
>> Theoretically we want to use h/w capability (to translate IEEE to PRP)
>> for smaller IO size to leverage h/w capability.
>
> Nobody says we have to use the capability just because the hardware has
> it.
>
> Unlike some other operating systems, Linux will only submit I/Os to the
> driver that conform to the reported underlying constraints of the
> hardware.

 In general, h/w constraints are handled. What we missed is
Fast Path h/w
which is not exposed to OS.

> I fail to understand how letting the HBA firmware translate an
> SGL to a PRP for a subset of I/Os could do anything but add latency.

 Let me explain - NVME device fast path is possible in two ways.
IEEE SGL and PRP SGL. Due to h/w constraint we choose IEEE SGL only for
smaller IO size.
Both above is true h/w Fast Path and no firmware involvement.

> Plus complexity in the hot path of the driver.

 Agree with you. We are planning to see if we can keep only
simple Fast
Path using only PRP. It will take some time to finalize as we have to
engage h/w and f/w team.  BTW - This area is h/w dependent and we do not
see further changes in this area.

>
>> - If the unmap translation in firmware is slow, why don't you translate
>>   WRITE SAME/w UNMAP set to DSM DEALLOCATE without requiring
>>   applications to do encapsulated passthrough?
>
>> => As of now, current FW supports UNMAP command but not WRITE_SAME for
>> NVME drive. We did some experiment to convert UMAP command in driver,
>> but that is not really giving any performance improvement.
>
> It is imperative that the common use case, Linux' discard
> infrastructure, is working correctly and is as performant as any
> application-driven passthrough workaround.
>
> Unlike SCSI-to-SATA translation you have the benefit of a 1:1 mapping
> between UNMAP and DEALLOCATE. I'm not even sure why there would be a
> significant performance penalty in the firmware?

 I agree. Currently there is no performance issue for UNMAP
translation in
FW.


>> We would like to continue with UNMAP (and all other non-read/write
>> commands) to be handled in FW.
>
> And yet patch 4 circumvents that statement by adding support for
> encapsulated commands to bypass the FW translation...

 This path is not due to performance reason. User wants to
interact with
NVME drive in native NVME command for management.

>
> --
> Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v4 00/14] mpt3sas driver NVMe support:

2017-08-30 Thread Martin K. Petersen

Hi Suganath,

> Theoretically we want to use h/w capability (to translate IEEE to PRP)
> for smaller IO size to leverage h/w capability.

Nobody says we have to use the capability just because the hardware has
it.

Unlike some other operating systems, Linux will only submit I/Os to the
driver that conform to the reported underlying constraints of the
hardware. I fail to understand how letting the HBA firmware translate an
SGL to a PRP for a subset of I/Os could do anything but add latency.
Plus complexity in the hot path of the driver.

> - If the unmap translation in firmware is slow, why don't you translate
>   WRITE SAME/w UNMAP set to DSM DEALLOCATE without requiring
>   applications to do encapsulated passthrough?

> => As of now, current FW supports UNMAP command but not WRITE_SAME for
> NVME drive. We did some experiment to convert UMAP command in driver,
> but that is not really giving any performance improvement.

It is imperative that the common use case, Linux' discard
infrastructure, is working correctly and is as performant as any
application-driven passthrough workaround.

Unlike SCSI-to-SATA translation you have the benefit of a 1:1 mapping
between UNMAP and DEALLOCATE. I'm not even sure why there would be a
significant performance penalty in the firmware?

> We would like to continue with UNMAP (and all other non-read/write
> commands) to be handled in FW.

And yet patch 4 circumvents that statement by adding support for
encapsulated commands to bypass the FW translation...

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v4 00/14] mpt3sas driver NVMe support:

2017-08-30 Thread Suganath Prabu Subramani
Hi Martin,

Replied in line.

- I don't understand why you go through all these hoops to decide
  whether to use PRPs or IEEE scatterlists. If the firmware translation
  is slow, why even bother with the SG format in the first place? Set
  the max I/O size to match MDTS and you're done.

=>  We will  set MDTS value as max hw sectors using
blk_queue_max_hw_sectors(). As of now, we see correct MDTS value is
being set to block layer via VPD page 0xb0 (Block limits VPD page )
response from FW for NVME device.
=> I will remove MDTS checks in IO path.


- What's the benefit of using SG for regular I/O commands?

=>  Broadcom's IT Tri-mode HBA hardware has a capability of
translating IEEE SGLs to PRP's only up to ~4 page block size.
If the IO block size is greater than that (along with other condition
described code base_is_prp_possible), driver has to frame the PRP's to
avoid FW intervention. Both the case is a fast path, but for smaller
IO (up to 20K) size will frame IEEE SGL and large IO size will frame
PRP format SGL. Theoretically we want to use h/w capability (to
translate IEEE to PRP) for smaller IO size to leverage h/w capability.
We are investigating if at all we can send all PRP and avoid checks in
driver, but that exercise may take time as we have many different
opinions. We prefer to use existing code as it is stable and in-line
with h/w requirement.


- If the unmap translation in firmware is slow, why don't you translate
  WRITE SAME/w UNMAP set to DSM DEALLOCATE without requiring
  applications to do encapsulated passthrough?

=> As of now, current FW supports UNMAP command but not WRITE_SAME for
NVME drive. We did some experiment to convert UMAP command in driver,
but that is not really giving any performance improvement. We would
like to continue with UNMAP (and all other non-read/write commands) to
be handled in FW.


- Also make sure you attribute your patches correctly (From: root
). And you don't need
that long CC: list. Just send the patch series to
linux-s...@vger.kernel.org.

=>  I will fix this type of issue going forward

Thanks,
Suganath Prabu S

On Wed, Aug 23, 2017 at 7:48 AM, Martin K. Petersen
 wrote:
>
> Suganath,
>
>>   mpt3sas: SGL to PRP Translation for I/Os to NVMe  devices
>
> I'm still confused about this patch.
>
>  - I don't understand why you go through all these hoops to decide
>whether to use PRPs or IEEE scatterlists. If the firmware translation
>is slow, why even bother with the SG format in the first place? Set
>the max I/O size to match MDTS and you're done.
>
>  - What's the benefit of using SG for regular I/O commands?
>
>  - If the unmap translation in firmware is slow, why don't you translate
>WRITE SAME/w UNMAP set to DSM DEALLOCATE without requiring
>applications to do encapsulated passthrough?
>
> Also make sure you attribute your patches correctly (From: root
> ). And you don't need that
> long CC: list. Just send the patch series to linux-s...@vger.kernel.org.
>
> Thanks!
>
> --
> Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v4 00/14] mpt3sas driver NVMe support:

2017-08-22 Thread Martin K. Petersen

Suganath,

>   mpt3sas: SGL to PRP Translation for I/Os to NVMe  devices

I'm still confused about this patch.

 - I don't understand why you go through all these hoops to decide
   whether to use PRPs or IEEE scatterlists. If the firmware translation
   is slow, why even bother with the SG format in the first place? Set
   the max I/O size to match MDTS and you're done.

 - What's the benefit of using SG for regular I/O commands?

 - If the unmap translation in firmware is slow, why don't you translate
   WRITE SAME/w UNMAP set to DSM DEALLOCATE without requiring
   applications to do encapsulated passthrough?

Also make sure you attribute your patches correctly (From: root
). And you don't need that
long CC: list. Just send the patch series to linux-s...@vger.kernel.org.

Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering