Re: [PATCH 0/5 v2] Fix NVMe driver support on Power with 32-bit DMA

2015-10-03 Thread Christoph Hellwig
On Sat, Oct 03, 2015 at 06:51:06AM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2015-10-02 at 13:09 -0700, Nishanth Aravamudan wrote:
> 
> > 1) add a generic dma_get_page_shift implementation that just returns
> > PAGE_SHIFT
> 
> So you chose to return the granularity of the iommu to the driver
> rather than providing a way for the driver to request a specific
> alignment for DMA mappings. Any specific reason ?

At least for NVMe that's the way to go - it allows to set a page set in
the device which should fit the IOMMU page size.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 0/5 v2] Fix NVMe driver support on Power with 32-bit DMA

2015-10-02 Thread Nishanth Aravamudan
We received a bug report recently when DDW (64-bit direct DMA on Power)
is not enabled for NVMe devices. In that case, we fall back to 32-bit
DMA via the IOMMU, which is always done via 4K TCEs (Translation Control
Entries).
 
The NVMe device driver, though, assumes that the DMA alignment for the
PRP entries will match the device's page size, and that the DMA aligment
matches the kernel's page aligment. On Power, the the IOMMU page size,
as mentioned above, can be 4K, while the device can have a page size of
8K, while the kernel has a page size of 64K. This eventually trips the
BUG_ON in nvme_setup_prps(), as we have a 'dma_len' that is a multiple
of 4K but not 8K (e.g., 0xF000).
 
In this particular case, and generally, we want to use the IOMMU's page
size for the default device page size, rather than the kernel's page
size.
 
This series consists of five patches:

1) add a generic dma_get_page_shift implementation that just returns
PAGE_SHIFT
2) override the generic implementation on Power to use the IOMMU table's
page shift if available
3) allow further specific overriding on power with machdep platform
overrides
4) use the machdep override on pseries, as the DDW code puts the TCE
shift in a special property and there is no IOMMU table available
5) leverage the new API in the NVMe driver
 
With these patches, a NVMe device survives our internal hardware
exerciser; the kernel BUGs within a few seconds without the patch.

 arch/powerpc/include/asm/dma-mapping.h   |  3 +++
 arch/powerpc/include/asm/machdep.h   |  3 ++-
 arch/powerpc/kernel/dma.c| 11 +++
 arch/powerpc/platforms/pseries/iommu.c   | 36 

 drivers/block/nvme-core.c|  3 ++-
 include/asm-generic/dma-mapping-common.h |  7 +++
 6 files changed, 61 insertions(+), 2 deletions(-)

v1 -> v2:
  Based upon feedback from Christoph Hellwig, rather than using an
  arch-specific hack, expose the DMA page shift via a generic DMA API and
  override it on Power as needed.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 0/5 v2] Fix NVMe driver support on Power with 32-bit DMA

2015-10-02 Thread Benjamin Herrenschmidt
On Fri, 2015-10-02 at 13:09 -0700, Nishanth Aravamudan wrote:

> 1) add a generic dma_get_page_shift implementation that just returns
> PAGE_SHIFT

So you chose to return the granularity of the iommu to the driver
rather than providing a way for the driver to request a specific
alignment for DMA mappings. Any specific reason ?

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 0/5 v2] Fix NVMe driver support on Power with 32-bit DMA

2015-10-02 Thread Benjamin Herrenschmidt
On Fri, 2015-10-02 at 14:04 -0700, Nishanth Aravamudan wrote:
> Right, I did start with your advice and tried that approach, but it
> turned out I was wrong about the actual issue at the time. The problem
> for NVMe isn't actually the starting address alignment (which it can
> handle not being aligned to the device's page size). It doesn't handle
> (addr + len % dev_page_size != 0). That is, it's really a length
> alignment issue.
> 
> It seems incredibly device specific to have a an API into the DMA code
> to request an end alignment -- no other device seems to have this
> issue/design. If you think that's better, I can fiddle with that
> instead.
> 
> Sorry, I should have called this out better as an alternative
> consideration.

Nah it's fine. Ok. Also adding the alignment requirement to the API
would have been a much more complex patch since it would have had to
be implemented for all archs.

I think your current solution is fine.

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 0/5 v2] Fix NVMe driver support on Power with 32-bit DMA

2015-10-02 Thread Nishanth Aravamudan
On 03.10.2015 [07:35:09 +1000], Benjamin Herrenschmidt wrote:
> On Fri, 2015-10-02 at 14:04 -0700, Nishanth Aravamudan wrote:
> > Right, I did start with your advice and tried that approach, but it
> > turned out I was wrong about the actual issue at the time. The problem
> > for NVMe isn't actually the starting address alignment (which it can
> > handle not being aligned to the device's page size). It doesn't handle
> > (addr + len % dev_page_size != 0). That is, it's really a length
> > alignment issue.
> > 
> > It seems incredibly device specific to have a an API into the DMA code
> > to request an end alignment -- no other device seems to have this
> > issue/design. If you think that's better, I can fiddle with that
> > instead.
> > 
> > Sorry, I should have called this out better as an alternative
> > consideration.
> 
> Nah it's fine. Ok. Also adding the alignment requirement to the API
> would have been a much more complex patch since it would have had to
> be implemented for all archs.
> 
> I think your current solution is fine.

Great, thanks. Also, while it's possible an alignment API would be more
performant...we're already not using DDW on Power in this case,
performance is not a primary concern. We want to simply be
functional/correct in this configuration.

-Nish

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 0/5 v2] Fix NVMe driver support on Power with 32-bit DMA

2015-10-02 Thread Nishanth Aravamudan
On 03.10.2015 [06:51:06 +1000], Benjamin Herrenschmidt wrote:
> On Fri, 2015-10-02 at 13:09 -0700, Nishanth Aravamudan wrote:
> 
> > 1) add a generic dma_get_page_shift implementation that just returns
> > PAGE_SHIFT
> 
> So you chose to return the granularity of the iommu to the driver
> rather than providing a way for the driver to request a specific
> alignment for DMA mappings. Any specific reason ?

Right, I did start with your advice and tried that approach, but it
turned out I was wrong about the actual issue at the time. The problem
for NVMe isn't actually the starting address alignment (which it can
handle not being aligned to the device's page size). It doesn't handle
(addr + len % dev_page_size != 0). That is, it's really a length
alignment issue.

It seems incredibly device specific to have a an API into the DMA code
to request an end alignment -- no other device seems to have this
issue/design. If you think that's better, I can fiddle with that
instead.

Sorry, I should have called this out better as an alternative
consideration.

-Nish


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev