Re: [PATCH 0/5 v2] Fix NVMe driver support on Power with 32-bit DMA
On Sat, Oct 03, 2015 at 06:51:06AM +1000, Benjamin Herrenschmidt wrote: > On Fri, 2015-10-02 at 13:09 -0700, Nishanth Aravamudan wrote: > > > 1) add a generic dma_get_page_shift implementation that just returns > > PAGE_SHIFT > > So you chose to return the granularity of the iommu to the driver > rather than providing a way for the driver to request a specific > alignment for DMA mappings. Any specific reason ? At least for NVMe that's the way to go - it allows to set a page set in the device which should fit the IOMMU page size. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 0/5 v2] Fix NVMe driver support on Power with 32-bit DMA
We received a bug report recently when DDW (64-bit direct DMA on Power) is not enabled for NVMe devices. In that case, we fall back to 32-bit DMA via the IOMMU, which is always done via 4K TCEs (Translation Control Entries). The NVMe device driver, though, assumes that the DMA alignment for the PRP entries will match the device's page size, and that the DMA aligment matches the kernel's page aligment. On Power, the the IOMMU page size, as mentioned above, can be 4K, while the device can have a page size of 8K, while the kernel has a page size of 64K. This eventually trips the BUG_ON in nvme_setup_prps(), as we have a 'dma_len' that is a multiple of 4K but not 8K (e.g., 0xF000). In this particular case, and generally, we want to use the IOMMU's page size for the default device page size, rather than the kernel's page size. This series consists of five patches: 1) add a generic dma_get_page_shift implementation that just returns PAGE_SHIFT 2) override the generic implementation on Power to use the IOMMU table's page shift if available 3) allow further specific overriding on power with machdep platform overrides 4) use the machdep override on pseries, as the DDW code puts the TCE shift in a special property and there is no IOMMU table available 5) leverage the new API in the NVMe driver With these patches, a NVMe device survives our internal hardware exerciser; the kernel BUGs within a few seconds without the patch. arch/powerpc/include/asm/dma-mapping.h | 3 +++ arch/powerpc/include/asm/machdep.h | 3 ++- arch/powerpc/kernel/dma.c| 11 +++ arch/powerpc/platforms/pseries/iommu.c | 36 drivers/block/nvme-core.c| 3 ++- include/asm-generic/dma-mapping-common.h | 7 +++ 6 files changed, 61 insertions(+), 2 deletions(-) v1 -> v2: Based upon feedback from Christoph Hellwig, rather than using an arch-specific hack, expose the DMA page shift via a generic DMA API and override it on Power as needed. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/5 v2] Fix NVMe driver support on Power with 32-bit DMA
On Fri, 2015-10-02 at 13:09 -0700, Nishanth Aravamudan wrote: > 1) add a generic dma_get_page_shift implementation that just returns > PAGE_SHIFT So you chose to return the granularity of the iommu to the driver rather than providing a way for the driver to request a specific alignment for DMA mappings. Any specific reason ? Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/5 v2] Fix NVMe driver support on Power with 32-bit DMA
On Fri, 2015-10-02 at 14:04 -0700, Nishanth Aravamudan wrote: > Right, I did start with your advice and tried that approach, but it > turned out I was wrong about the actual issue at the time. The problem > for NVMe isn't actually the starting address alignment (which it can > handle not being aligned to the device's page size). It doesn't handle > (addr + len % dev_page_size != 0). That is, it's really a length > alignment issue. > > It seems incredibly device specific to have a an API into the DMA code > to request an end alignment -- no other device seems to have this > issue/design. If you think that's better, I can fiddle with that > instead. > > Sorry, I should have called this out better as an alternative > consideration. Nah it's fine. Ok. Also adding the alignment requirement to the API would have been a much more complex patch since it would have had to be implemented for all archs. I think your current solution is fine. Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/5 v2] Fix NVMe driver support on Power with 32-bit DMA
On 03.10.2015 [07:35:09 +1000], Benjamin Herrenschmidt wrote: > On Fri, 2015-10-02 at 14:04 -0700, Nishanth Aravamudan wrote: > > Right, I did start with your advice and tried that approach, but it > > turned out I was wrong about the actual issue at the time. The problem > > for NVMe isn't actually the starting address alignment (which it can > > handle not being aligned to the device's page size). It doesn't handle > > (addr + len % dev_page_size != 0). That is, it's really a length > > alignment issue. > > > > It seems incredibly device specific to have a an API into the DMA code > > to request an end alignment -- no other device seems to have this > > issue/design. If you think that's better, I can fiddle with that > > instead. > > > > Sorry, I should have called this out better as an alternative > > consideration. > > Nah it's fine. Ok. Also adding the alignment requirement to the API > would have been a much more complex patch since it would have had to > be implemented for all archs. > > I think your current solution is fine. Great, thanks. Also, while it's possible an alignment API would be more performant...we're already not using DDW on Power in this case, performance is not a primary concern. We want to simply be functional/correct in this configuration. -Nish ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/5 v2] Fix NVMe driver support on Power with 32-bit DMA
On 03.10.2015 [06:51:06 +1000], Benjamin Herrenschmidt wrote: > On Fri, 2015-10-02 at 13:09 -0700, Nishanth Aravamudan wrote: > > > 1) add a generic dma_get_page_shift implementation that just returns > > PAGE_SHIFT > > So you chose to return the granularity of the iommu to the driver > rather than providing a way for the driver to request a specific > alignment for DMA mappings. Any specific reason ? Right, I did start with your advice and tried that approach, but it turned out I was wrong about the actual issue at the time. The problem for NVMe isn't actually the starting address alignment (which it can handle not being aligned to the device's page size). It doesn't handle (addr + len % dev_page_size != 0). That is, it's really a length alignment issue. It seems incredibly device specific to have a an API into the DMA code to request an end alignment -- no other device seems to have this issue/design. If you think that's better, I can fiddle with that instead. Sorry, I should have called this out better as an alternative consideration. -Nish ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev