[PATCH 02/13] mpt3sas: SGL to PRP Translation for I/Os to NVMe devices
* Added support for translating the SGLs associated with incoming
  commands either to IEEE SGLs or to NVMe PRPs for NVMe devices.

* The hardware translation of IEEE SGLs to NVMe PRPs has limitations. If
  a command cannot be translated by hardware, it goes to firmware and
  the firmware has to translate it, which reduces performance. To avoid
  that, the driver proactively checks whether the translation will be
  done in hardware; if not, the driver tries to do the translation
  itself.

Signed-off-by: Chaitra P B
Signed-off-by: Suganath Prabu S
---
 drivers/scsi/mpt3sas/mpt3sas_base.c      | 339 ++-
 drivers/scsi/mpt3sas/mpt3sas_base.h      |  41 +++-
 drivers/scsi/mpt3sas/mpt3sas_ctl.c       |   1 +
 drivers/scsi/mpt3sas/mpt3sas_scsih.c     |  14 +-
 drivers/scsi/mpt3sas/mpt3sas_warpdrive.c |   2 +-
 5 files changed, 380 insertions(+), 17 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 11c6afe..1ad3cbb 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -59,6 +59,7 @@
 #include
 #include
 #include
+#include /* To get host page size per arch */
 #include
@@ -1344,7 +1345,218 @@ _base_build_sg(struct MPT3SAS_ADAPTER *ioc, void *psge,
 	}
 }

-/* IEEE format sgls */
+/**
+ * base_make_prp_nvme -
+ * Prepare PRPs(Physical Region Page)- SGLs specific to NVMe drives only
+ *
+ * @ioc:		per adapter object
+ * @scmd:		SCSI command from the mid-layer
+ * @mpi_request:	mpi request
+ * @smid:		msg Index
+ * @sge_count:		scatter gather element count.
+ *
+ * Returns:		true: PRPs are built
+ *			false: IEEE SGLs needs to be built
+ */
+void
+base_make_prp_nvme(struct MPT3SAS_ADAPTER *ioc,
+		struct scsi_cmnd *scmd,
+		Mpi25SCSIIORequest_t *mpi_request,
+		u16 smid, int sge_count)
+{
+	int sge_len, offset, num_prp_in_chain = 0;
+	Mpi25IeeeSgeChain64_t *main_chain_element, *ptr_first_sgl;
+	u64 *curr_buff;
+	dma_addr_t msg_phys;
+	u64 sge_addr;
+	u32 page_mask, page_mask_result;
+	struct scatterlist *sg_scmd;
+	u32 first_prp_len;
+	int data_len = scsi_bufflen(scmd);
+	u32 nvme_pg_size;
+
+	nvme_pg_size = max_t(u32, ioc->page_size, NVME_PRP_PAGE_SIZE);
+	/*
+	 * Nvme has a very convoluted prp format. One prp is required
+	 * for each page or partial page. Driver need to split up OS sg_list
+	 * entries if it is longer than one page or cross a page
+	 * boundary. Driver also have to insert a PRP list pointer entry as
+	 * the last entry in each physical page of the PRP list.
+	 *
+	 * NOTE: The first PRP "entry" is actually placed in the first
+	 * SGL entry in the main message as IEEE 64 format. The 2nd
+	 * entry in the main message is the chain element, and the rest
+	 * of the PRP entries are built in the contiguous pcie buffer.
+	 */
+	page_mask = nvme_pg_size - 1;
+
+	/*
+	 * Native SGL is needed.
+	 * Put a chain element in main message frame that points to the first
+	 * chain buffer.
+	 *
+	 * NOTE: The ChainOffset field must be 0 when using a chain pointer to
+	 * a native SGL.
+	 */
+
+	/* Set main message chain element pointer */
+	main_chain_element = (pMpi25IeeeSgeChain64_t)&mpi_request->SGL;
+	/*
+	 * For NVMe the chain element needs to be the 2nd SG entry in the main
+	 * message.
+	 */
+	main_chain_element = (Mpi25IeeeSgeChain64_t *)
+		((u8 *)main_chain_element + sizeof(MPI25_IEEE_SGE_CHAIN64));
+
+	/*
+	 * For the PRP entries, use the specially allocated buffer of
+	 * contiguous memory. Normal chain buffers can't be used
+	 * because each chain buffer would need to be the size of an OS
+	 * page (4k).
+	 */
+	curr_buff = mpt3sas_base_get_pcie_sgl(ioc, smid);
+	msg_phys = (dma_addr_t)mpt3sas_base_get_pcie_sgl_dma(ioc, smid);
+
+	main_chain_element->Address = cpu_to_le64(msg_phys);
+	main_chain_element->NextChainOffset = 0;
+	main_chain_element->Flags = MPI2_IEEE_SGE_FLAGS_CHAIN_ELEMENT |
+			MPI2_IEEE_SGE_FLAGS_SYSTEM_ADDR |
+			MPI26_IEEE_SGE_FLAGS_NSF_NVME_PRP;
+
+	/* Build first prp, sge need not to be page aligned*/
+	ptr_first_sgl = (pMpi25IeeeSgeChain64_t)&mpi_request->SGL;
+	sg_scmd = scsi_sglist(scmd);
+	sge_addr = sg_dma_address(sg_scmd);
+	sge_len = sg_dma_len(sg_scmd);
+
+	offset = (u32)(sge_addr & page_mask);
+	first_prp_len = nvme_pg_size - offset;
+
+	ptr_first_sgl->Address = cpu_to_le64(sge_addr);
+	ptr_first_sgl->Length = cpu_to_le32(first_prp_len);
+
+	data_len -= first_prp_len;
+
+
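The first-PRP arithmetic in the patch (offset = sge_addr & page_mask, first_prp_len = nvme_pg_size - offset) is one step of a more general rule: a DMA segment is cut at every page boundary, and each piece gets its own PRP entry. A minimal user-space sketch of that rule follows. The helper name split_segment_into_prps() and its parameters are hypothetical and are not part of the patch; the real driver also has to handle the PRP list pointer at the end of each list page, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only (not from the patch).
 * Writes one PRP address per page or partial page covered by the
 * DMA-contiguous segment [addr, addr + len) into out[] and returns the
 * number of entries written. Only the first entry can start mid-page;
 * every later entry is page aligned. Stops early if out[] is full.
 */
static size_t split_segment_into_prps(uint64_t addr, uint32_t len,
				      uint32_t pg_size, uint64_t *out,
				      size_t out_cap)
{
	uint64_t page_mask = (uint64_t)pg_size - 1;
	size_t n = 0;

	/* The mask trick only works for a non-zero power-of-two page size. */
	assert(pg_size != 0 && (pg_size & (pg_size - 1)) == 0);

	while (len > 0 && n < out_cap) {
		uint32_t offset = (uint32_t)(addr & page_mask);
		uint32_t chunk = pg_size - offset; /* bytes to end of page */

		if (chunk > len)
			chunk = len;
		out[n++] = addr;
		addr += chunk;
		len -= chunk;
	}
	return n;
}
```

With a 4 KiB page, a 0x1200-byte segment starting at 0x10000E00 yields two entries: 0x10000E00 (512 bytes to the end of the page) and 0x10001000 (one full page).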
Re: [PATCH 02/13] mpt3sas: SGL to PRP Translation for I/Os to NVMe devices
Hi Keith,

We have made the change and submitted V2 of the patch set.

Thanks,
Suganath Prabu S

On Wed, Jul 12, 2017 at 5:34 AM, Keith Busch wrote:
> On Tue, Jul 11, 2017 at 01:55:02AM -0700, Suganath Prabu S wrote:
>> +/**
>> + * _base_check_pcie_native_sgl - This function is called for PCIe end devices to
>> + * determine if the driver needs to build a native SGL. If so, that native
>> + * SGL is built in the special contiguous buffers allocated especially for
>> + * PCIe SGL creation. If the driver will not build a native SGL, return
>> + * TRUE and a normal IEEE SGL will be built. Currently this routine
>> + * supports NVMe.
>> + * @ioc: per adapter object
>> + * @mpi_request: mf request pointer
>> + * @smid: system request message index
>> + * @scmd: scsi command
>> + * @pcie_device: points to the PCIe device's info
>> + *
>> + * Returns 0 if native SGL was built, 1 if no SGL was built
>> + */
>> +static int
>> +_base_check_pcie_native_sgl(struct MPT3SAS_ADAPTER *ioc,
>> +	Mpi25SCSIIORequest_t *mpi_request, u16 smid, struct scsi_cmnd *scmd,
>> +	struct _pcie_device *pcie_device)
>> +{
>
>
>> +	/* Return 0, indicating we built a native SGL. */
>> +	return 1;
>> +}
>
> This function doesn't return 0 ever. Not sure why it's here.
>
> Curious about your device, though, if a nvme native SGL can *not* be
> built, does the HBA firmware then buffer it in its local memory before
> sending/receiving to/from the host?
>
> And if a native SGL can be built, does the NVMe target DMA directly
> to/from host memory, giving a performance boost?
Re: [PATCH 02/13] mpt3sas: SGL to PRP Translation for I/Os to NVMe devices
Hi Suganath,

[auto build test ERROR on scsi/for-next]
[also build test ERROR on v4.12 next-20170711]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Suganath-Prabu-S/mpt3sas-Add-nvme-device-support-in-slave-alloc-target-alloc-and-probe/20170711-204831
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
config: x86_64-kexec (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

Note: the linux-review/Suganath-Prabu-S/mpt3sas-Add-nvme-device-support-in-slave-alloc-target-alloc-and-probe/20170711-204831 HEAD 46a9c8fb1d7fe7649aa0eaa925c6653a6fa3047e builds fine.
      It only hurts bisectability.

All errors (new ones prefixed by >>):

   In file included from drivers/scsi/mpt3sas/mpt3sas_base.c:66:0:
>> drivers/scsi/mpt3sas/mpt3sas_base.h:57:26: fatal error: mpi/mpi2_pci.h: No such file or directory
    #include "mpi/mpi2_pci.h"
                             ^
   compilation terminated.

vim +57 drivers/scsi/mpt3sas/mpt3sas_base.h

    48
    49	#include "mpi/mpi2_type.h"
    50	#include "mpi/mpi2.h"
    51	#include "mpi/mpi2_ioc.h"
    52	#include "mpi/mpi2_cnfg.h"
    53	#include "mpi/mpi2_init.h"
    54	#include "mpi/mpi2_raid.h"
    55	#include "mpi/mpi2_tool.h"
    56	#include "mpi/mpi2_sas.h"
  > 57	#include "mpi/mpi2_pci.h"
    58

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Re: [PATCH 02/13] mpt3sas: SGL to PRP Translation for I/Os to NVMe devices
On Tue, Jul 11, 2017 at 01:55:02AM -0700, Suganath Prabu S wrote:
> +/**
> + * _base_check_pcie_native_sgl - This function is called for PCIe end devices to
> + * determine if the driver needs to build a native SGL. If so, that native
> + * SGL is built in the special contiguous buffers allocated especially for
> + * PCIe SGL creation. If the driver will not build a native SGL, return
> + * TRUE and a normal IEEE SGL will be built. Currently this routine
> + * supports NVMe.
> + * @ioc: per adapter object
> + * @mpi_request: mf request pointer
> + * @smid: system request message index
> + * @scmd: scsi command
> + * @pcie_device: points to the PCIe device's info
> + *
> + * Returns 0 if native SGL was built, 1 if no SGL was built
> + */
> +static int
> +_base_check_pcie_native_sgl(struct MPT3SAS_ADAPTER *ioc,
> +	Mpi25SCSIIORequest_t *mpi_request, u16 smid, struct scsi_cmnd *scmd,
> +	struct _pcie_device *pcie_device)
> +{
> +	/* Return 0, indicating we built a native SGL. */
> +	return 1;
> +}

This function doesn't return 0 ever. Not sure why it's here.

Curious about your device, though, if a nvme native SGL can *not* be
built, does the HBA firmware then buffer it in its local memory before
sending/receiving to/from the host?

And if a native SGL can be built, does the NVMe target DMA directly
to/from host memory, giving a performance boost?
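The mismatch raised in this review (a comment promising 0, code returning 1) is easier to avoid when the two outcomes have names. The sketch below is only an illustration of that idea: the enum, the function name check_native_sgl() and both of its parameters are hypothetical, and the decision logic is a guess based on the commit message (build PRPs in the driver only when hardware cannot translate), not on the elided function body.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only; not from the patch. */
enum sgl_result {
	NATIVE_SGL_BUILT = 0,		/* PRPs were built, caller is done */
	NATIVE_SGL_NOT_BUILT = 1	/* caller must build an IEEE SGL */
};

/*
 * Assumed policy, taken from the commit message: the driver builds a
 * native SGL only for NVMe devices whose command the hardware cannot
 * translate on its own.
 */
static enum sgl_result check_native_sgl(bool is_nvme, bool hw_can_translate)
{
	if (!is_nvme || hw_can_translate)
		return NATIVE_SGL_NOT_BUILT;

	/* A real driver would build the PRP entries at this point. */
	return NATIVE_SGL_BUILT;
}
```

A caller can then write `if (check_native_sgl(...) == NATIVE_SGL_NOT_BUILT)` before falling back to the IEEE SGL path, so the comment and the return value cannot drift apart silently.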
Re: [PATCH 02/13] mpt3sas: SGL to PRP Translation for I/Os to NVMe devices
On Tue, Jul 11, 2017 at 01:55:02AM -0700, Suganath Prabu S wrote:
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
> index 60fa7b6..cebdd8e 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.h
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
> @@ -54,6 +54,7 @@
>  #include "mpi/mpi2_raid.h"
>  #include "mpi/mpi2_tool.h"
>  #include "mpi/mpi2_sas.h"
> +#include "mpi/mpi2_pci.h"

Could you adjust your patch order for this series so each can compile?
Here in patch 2 you're including a header that's not defined until
patch 12.
[PATCH 02/13] mpt3sas: SGL to PRP Translation for I/Os to NVMe devices
* Added support for translating the SGLs associated with incoming
  commands either to IEEE SGLs or to NVMe PRPs for NVMe devices.

* The hardware translation of IEEE SGLs to NVMe PRPs has limitations. If
  a command cannot be translated by hardware, it goes to firmware and
  the firmware has to translate it, which reduces performance. To avoid
  that, the driver proactively checks whether the translation will be
  done in hardware; if not, the driver tries to do the translation
  itself.

Signed-off-by: Chaitra P B
Signed-off-by: Suganath Prabu S
---
 drivers/scsi/mpt3sas/mpt3sas_base.c  | 623 +-
 drivers/scsi/mpt3sas/mpt3sas_base.h  |  43 +++-
 drivers/scsi/mpt3sas/mpt3sas_ctl.c   |   1 +
 drivers/scsi/mpt3sas/mpt3sas_scsih.c |  12 +-
 4 files changed, 666 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 18039bb..b67212c 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -59,6 +59,7 @@
 #include
 #include
 #include
+#include /* To get host page size per arch */
 #include
@@ -1347,6 +1348,502 @@ _base_build_sg(struct MPT3SAS_ADAPTER *ioc, void *psge,
 /* IEEE format sgls */

 /**
+ * _base_build_nvme_prp - This function is called for NVMe end devices to build
+ * a native SGL (NVMe PRP). The native SGL is built starting in the first PRP
+ * entry of the NVMe message (PRP1). If the data buffer is small enough to be
+ * described entirely using PRP1, then PRP2 is not used. If needed, PRP2 is
+ * used to describe a larger data buffer. If the data buffer is too large to
+ * describe using the two PRP entries inside the NVMe message, then PRP1
+ * describes the first data memory segment, and PRP2 contains a pointer to a PRP
+ * list located elsewhere in memory to describe the remaining data memory
+ * segments. The PRP list will be contiguous.
+
+ * The native SGL for NVMe devices is a Physical Region Page (PRP). A PRP
+ * consists of a list of PRP entries to describe a number of noncontiguous
+ * physical memory segments as a single memory buffer, just as a SGL does. Note
+ * however, that this function is only used by the IOCTL call, so the memory
+ * given will be guaranteed to be contiguous. There is no need to translate
+ * non-contiguous SGL into a PRP in this case. All PRPs will describe
+ * contiguous space that is one page size each.
+ *
+ * Each NVMe message contains two PRP entries. The first (PRP1) either contains
+ * a PRP list pointer or a PRP element, depending upon the command. PRP2
+ * contains the second PRP element if the memory being described fits within 2
+ * PRP entries, or a PRP list pointer if the PRP spans more than two entries.
+ *
+ * A PRP list pointer contains the address of a PRP list, structured as a linear
+ * array of PRP entries. Each PRP entry in this list describes a segment of
+ * physical memory.
+ *
+ * Each 64-bit PRP entry comprises an address and an offset field. The address
+ * always points at the beginning of a 4KB physical memory page, and the offset
+ * describes where within that 4KB page the memory segment begins. Only the
+ * first element in a PRP list may contain a non-zero offset, implying that all
+ * memory segments following the first begin at the start of a 4KB page.
+ *
+ * Each PRP element normally describes 4KB of physical memory, with exceptions
+ * for the first and last elements in the list. If the memory being described
+ * by the list begins at a non-zero offset within the first 4KB page, then the
+ * first PRP element will contain a non-zero offset indicating where the region
+ * begins within the 4KB page. The last memory segment may end before the end
+ * of the 4KB segment, depending upon the overall size of the memory being
+ * described by the PRP list.
+ *
+ * Since PRP entries lack any indication of size, the overall data buffer length
+ * is used to determine where the end of the data memory buffer is located, and
+ * how many PRP entries are required to describe it.
+ *
+ * @ioc: per adapter object
+ * @smid: system request message index for getting associated SGL
+ * @nvme_encap_request: the NVMe request msg frame pointer
+ * @data_out_dma: physical address for WRITES
+ * @data_out_sz: data xfer size for WRITES
+ * @data_in_dma: physical address for READS
+ * @data_in_sz: data xfer size for READS
+ *
+ * Returns nothing.
+ */
+static void
+_base_build_nvme_prp(struct MPT3SAS_ADAPTER *ioc, u16 smid,
+	Mpi26NVMeEncapsulatedRequest_t *nvme_encap_request,
+	dma_addr_t data_out_dma, size_t data_out_sz, dma_addr_t data_in_dma,
+	size_t data_in_sz)
+{
+	int prp_size = NVME_PRP_SIZE;
+	u64 *prp_entry, *prp1_entry, *prp2_entry, *prp_entry_phys;
+	u64 *prp_page, *prp_page_phys;
+	u32 offset, entry_len;
+	u32 page_mask_result, page_mas
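The kernel-doc comment above explains that PRP entries carry no length, so the entry count follows from the starting offset and the total buffer length, and that the count decides whether PRP1 alone, PRP1 plus PRP2, or PRP1 plus a PRP list is used. A minimal user-space sketch of that calculation for a contiguous buffer follows. The names prp_entries_needed(), prp_layout_for() and enum prp_layout are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of how the two in-message PRP slots are used. */
enum prp_layout {
	PRP1_ONLY,	/* buffer fits in one page-sized piece */
	PRP1_AND_PRP2,	/* two pieces, PRP2 holds the second data pointer */
	PRP1_AND_LIST	/* three or more, PRP2 points at a PRP list */
};

/*
 * Number of PRP entries needed for a contiguous buffer of len bytes at
 * dma_addr: one per page or partial page touched. Only the first entry
 * may start at a non-zero offset within its page.
 */
static uint32_t prp_entries_needed(uint64_t dma_addr, uint32_t len,
				   uint32_t pg_size)
{
	uint64_t offset;

	/* The mask below requires a non-zero power-of-two page size. */
	assert(pg_size != 0 && (pg_size & (pg_size - 1)) == 0);

	if (len == 0)
		return 0;

	offset = dma_addr & ((uint64_t)pg_size - 1);
	/* Round (offset + len) up to a whole number of pages. */
	return (uint32_t)((offset + len + pg_size - 1) / pg_size);
}

static enum prp_layout prp_layout_for(uint32_t entries)
{
	if (entries <= 1)
		return PRP1_ONLY;
	if (entries == 2)
		return PRP1_AND_PRP2;
	return PRP1_AND_LIST;
}
```

For example, with 4 KiB pages a page-aligned 4 KiB buffer needs one entry, the same 4 KiB starting 0xE00 bytes into a page needs two, and a page-aligned 12 KiB buffer needs three and therefore a PRP list.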