Re: [PATCH v3 03/13] mpt3sas: SGL to PRP Translation for I/Os to NVMe devices

2017-08-09 Thread Suganath Prabu Subramani
Hi Martin,
This code was added to detect holes, when we started testing with 4.9
kernel. when we disabled "use_blk_mq" and no merges, we are hitting
issues with holes. Anyhow In latest upstream, it got fixed with this
commit 5a8d75a1b8c99bdc926ba69b7b7dbe4fae81a5af

So we are removing the code related to hole detection .

Thanks,
Suganath Prabu S



On Tue, Aug 8, 2017 at 9:42 PM, Martin K. Petersen
 wrote:
>
> Suganath,
>
>> + /*
>> +  ** Below code detects gaps/holes in IO data buffers.
>> +  ** What does holes/gaps mean?
>> +  ** Any SGE except first one in a SGL starts at non NVME page size
>> +  ** aligned address OR Any SGE except last one in a SGL ends at
>> +  ** non NVME page size boundary.
>> +  **
>> +  ** Driver has already informed block layer by setting boundary rules
>> +  ** for bio merging done at NVME page size boundary calling kernel API
>> +  ** blk_queue_virt_boundary inside slave_config.
>> +  ** Still there is possibility of IO coming with holes to driver 
>> because
>> +  ** of IO merging done by IO scheduler.
>
> All this SGL to PRP code needs to go.
>
> If you are seeing anything that's not a valid PRP after setting the
> queue virt boundary then there's a block layer bug that needs to be
> debugged and fixed. Regardless of whether you are using an I/O scheduler
> or not.
>
> --
> Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v3 03/13] mpt3sas: SGL to PRP Translation for I/Os to NVMe devices

2017-08-08 Thread Martin K. Petersen

Suganath,

> + /*
> +  ** Below code detects gaps/holes in IO data buffers.
> +  ** What does holes/gaps mean?
> +  ** Any SGE except first one in a SGL starts at non NVME page size
> +  ** aligned address OR Any SGE except last one in a SGL ends at
> +  ** non NVME page size boundary.
> +  **
> +  ** Driver has already informed block layer by setting boundary rules
> +  ** for bio merging done at NVME page size boundary calling kernel API
> +  ** blk_queue_virt_boundary inside slave_config.
> +  ** Still there is possibility of IO coming with holes to driver because
> +  ** of IO merging done by IO scheduler.

All this SGL to PRP code needs to go.

If you are seeing anything that's not a valid PRP after setting the
queue virt boundary then there's a block layer bug that needs to be
debugged and fixed. Regardless of whether you are using an I/O scheduler
or not.

-- 
Martin K. Petersen  Oracle Linux Engineering


[PATCH v3 03/13] mpt3sas: SGL to PRP Translation for I/Os to NVMe devices

2017-08-08 Thread Suganath Prabu S
* Added support for translating the SGLs associated with incoming
commands either to IEE SGL or NVMe PRPs for NVMe devices.

* The hardware translation of IEEE SGL to NVMe PRPs has limitation
and if a command cannot be translated by hardware then it will go
to firmware and the firmware needs to translate it. And this will
have a performance reduction. To avoid that driver proactively
checks whether the translation will be done in hardware or not,
if not then driver try to translate inside the driver.

Signed-off-by: Chaitra P B 
Signed-off-by: Suganath Prabu S 
---
 drivers/scsi/mpt3sas/mpt3sas_base.c  |  668 +-
 drivers/scsi/mpt3sas/mpt3sas_base.h  |   44 ++-
 drivers/scsi/mpt3sas/mpt3sas_ctl.c   |1 +
 drivers/scsi/mpt3sas/mpt3sas_scsih.c |   14 +-
 drivers/scsi/mpt3sas/mpt3sas_warpdrive.c |2 +-
 5 files changed, 713 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c 
b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 18039bb..d48f176 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -59,6 +59,7 @@
 #include 
 #include 
 #include 
+#include /* To get host page size per arch */
 #include 
 
 
@@ -1347,6 +1348,547 @@ _base_build_sg(struct MPT3SAS_ADAPTER *ioc, void *psge,
 /* IEEE format sgls */
 
 /**
+ * _base_build_nvme_prp - This function is called for NVMe end devices to build
+ * a native SGL (NVMe PRP). The native SGL is built starting in the first PRP
+ * entry of the NVMe message (PRP1).  If the data buffer is small enough to be
+ * described entirely using PRP1, then PRP2 is not used.  If needed, PRP2 is
+ * used to describe a larger data buffer.  If the data buffer is too large to
+ * describe using the two PRP entriess inside the NVMe message, then PRP1
+ * describes the first data memory segment, and PRP2 contains a pointer to a 
PRP
+ * list located elsewhere in memory to describe the remaining data memory
+ * segments.  The PRP list will be contiguous.
+
+ * The native SGL for NVMe devices is a Physical Region Page (PRP).  A PRP
+ * consists of a list of PRP entries to describe a number of noncontigous
+ * physical memory segments as a single memory buffer, just as a SGL does.  
Note
+ * however, that this function is only used by the IOCTL call, so the memory
+ * given will be guaranteed to be contiguous.  There is no need to translate
+ * non-contiguous SGL into a PRP in this case.  All PRPs will describe
+ * contiguous space that is one page size each.
+ *
+ * Each NVMe message contains two PRP entries.  The first (PRP1) either 
contains
+ * a PRP list pointer or a PRP element, depending upon the command.  PRP2
+ * contains the second PRP element if the memory being described fits within 2
+ * PRP entries, or a PRP list pointer if the PRP spans more than two entries.
+ *
+ * A PRP list pointer contains the address of a PRP list, structured as a 
linear
+ * array of PRP entries.  Each PRP entry in this list describes a segment of
+ * physical memory.
+ *
+ * Each 64-bit PRP entry comprises an address and an offset field.  The address
+ * always points at the beginning of a 4KB physical memory page, and the offset
+ * describes where within that 4KB page the memory segment begins.  Only the
+ * first element in a PRP list may contain a non-zero offest, implying that all
+ * memory segments following the first begin at the start of a 4KB page.
+ *
+ * Each PRP element normally describes 4KB of physical memory, with exceptions
+ * for the first and last elements in the list.  If the memory being described
+ * by the list begins at a non-zero offset within the first 4KB page, then the
+ * first PRP element will contain a non-zero offset indicating where the region
+ * begins within the 4KB page.  The last memory segment may end before the end
+ * of the 4KB segment, depending upon the overall size of the memory being
+ * described by the PRP list.
+ *
+ * Since PRP entries lack any indication of size, the overall data buffer 
length
+ * is used to determine where the end of the data memory buffer is located, and
+ * how many PRP entries are required to describe it.
+ *
+ * @ioc: per adapter object
+ * @smid: system request message index for getting asscociated SGL
+ * @nvme_encap_request: the NVMe request msg frame pointer
+ * @data_out_dma: physical address for WRITES
+ * @data_out_sz: data xfer size for WRITES
+ * @data_in_dma: physical address for READS
+ * @data_in_sz: data xfer size for READS
+ *
+ * Returns nothing.
+ */
+static void
+_base_build_nvme_prp(struct MPT3SAS_ADAPTER *ioc, u16 smid,
+   Mpi26NVMeEncapsulatedRequest_t *nvme_encap_request,
+   dma_addr_t data_out_dma, size_t data_out_sz, dma_addr_t data_in_dma,
+   size_t data_in_sz)
+{
+   int prp_size = NVME_PRP_SIZE;
+   u64 *prp_entry, *prp1_entry, *prp2_entry, *prp_entry_phys;
+   u64