Re: [PATCH 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

2021-01-28 Thread Jianxiong Gao via iommu
I have tried to set it once at probe and then leave it in place; however,
the NVMe driver does not seem to like it, and the VM does not boot
correctly. I have spent a couple of days debugging but I am a bit lost
now.
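
What I tried is roughly the following one-time call (a sketch only; the
exact placement in nvme_probe() and the error handling are my
assumption, not the actual change I tested):

/*
 * Sketch of the probe-time variant: set the mask a single time when
 * the device is probed, instead of toggling it around every map/unmap.
 * Assumes the dma_set_page_offset_mask() helper from patch 1/3.
 */
static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	...
	/* NVME_CTRL_PAGE_SIZE is the 4KiB controller page size. */
	if (dma_set_page_offset_mask(&pdev->dev, NVME_CTRL_PAGE_SIZE - 1))
		dev_warn(&pdev->dev,
			 "dma_set_page_offset_mask failed, offsets may not be preserved\n");
	...
}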

Basically, whenever nvme_setup_prp_simple() is mapping with the mask
(see the sketch after the log below), I am getting timeout issues on
boot, which to my knowledge indicates an NVMe driver failure:

> [5.500662] random: crng init done
> [5.502933] random: 7 urandom warning(s) missed due to ratelimiting
> [  132.077795] dracut-initqueue[472]: Warning: dracut-initqueue timeout -
> starting timeout scripts
> [  132.614755] dracut-initqueue[472]: Warning: dracut-initqueue timeout -
> starting timeout scripts
>
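
For context, the single-segment PRP path in question looks roughly like
this in the current driver (paraphrased from drivers/nvme/host/pci.c,
not a verbatim quote; details may differ):

static blk_status_t nvme_setup_prp_simple(struct nvme_dev *dev,
		struct request *req, struct nvme_rw_command *cmnd,
		struct bio_vec *bv)
{
	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
	/* The offset is taken from the CPU-side bio_vec... */
	unsigned int offset = bv->bv_offset & (NVME_CTRL_PAGE_SIZE - 1);
	unsigned int first_prp_len = NVME_CTRL_PAGE_SIZE - offset;

	/*
	 * ...but PRP1 is whatever DMA address the mapping returns.  If
	 * swiotlb does not preserve the in-page offset, first_prp_len no
	 * longer describes what the controller actually transfers from
	 * PRP1.
	 */
	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req), 0);
	if (dma_mapping_error(dev->dev, iod->first_dma))
		return BLK_STS_RESOURCE;
	iod->dma_len = bv->bv_len;

	cmnd->dptr.prp1 = cpu_to_le64(iod->first_dma);
	if (bv->bv_len > first_prp_len)
		cmnd->dptr.prp2 = cpu_to_le64(iod->first_dma + first_prp_len);
	return BLK_STS_OK;
}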

I have checked that all the mappings happened correctly:

> [4.773570] nvme 0000:00:04.0: nvme setup prp simple is mapping 200
> data, with offset 200, from fc9acd6c6040, mapped at 7affb200
> [4.784540] nvme 0000:00:04.0: nvme setup prp simple is mapping 200
> data, with offset 400, from fc9acd6c6040, mapped at 7affc400
> [4.794096] nvme 0000:00:04.0: nvme setup prp simple is mapping 200
> data, with offset 600, from fc9acd6c6040, mapped at 7affd600
> [4.801983] nvme 0000:00:04.0: nvme setup prp simple is mapping 200
> data, with offset 800, from fc9acd6c6040, mapped at 7affe800
> [4.806873] nvme 0000:00:04.0: nvme setup prp simple is mapping 200
> data, with offset a00, from fc9acd6c6040, mapped at 7afffa00
> [4.812382] nvme 0000:00:04.0: nvme setup prp simple is mapping 200
> data, with offset c00, from fc9acd6c6040, mapped at 7b000c00
> [4.817423] nvme 0000:00:04.0: nvme setup prp simple is mapping 200
> data, with offset e00, from fc9acd6c6040, mapped at 7b001e00
> [4.823652] nvme 0000:00:04.0: nvme setup prp simple is mapping 200
> data, with offset 200, from fc9acd6c60c0, mapped at 7b003200
> [4.828679] nvme 0000:00:04.0: nvme setup prp simple is mapping 200
> data, with offset 400, from fc9acd6c60c0, mapped at 7b004400
> [4.833875] nvme 0000:00:04.0: nvme setup prp simple is mapping 200
> data, with offset 600, from fc9acd6c60c0, mapped at 7b005600
> [4.838926] nvme 0000:00:04.0: nvme setup prp simple is mapping 200
> data, with offset 800, from fc9acd6c60c0, mapped at 7b006800

...

I have compared it to not setting the mask. The only difference in the
result is that instead of being mapped to *200 / *400 / *600 etc., the
buffers are all mapped to *000. So I believe the mapping is done
correctly, and according to the NVMe spec (figures 108/109), the mapping
should keep the offset. I am not sure what caused the error that
eventually led to the failure. Is there another
bug in the NVMe driver?
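
To make the failure mode concrete, here is how I read the PRP contract
from those figures, with made-up numbers (NVME_CTRL_PAGE_SIZE assumed to
be 0x1000):

/*
 * A 0x400-byte transfer starting at CPU offset 0xe00:
 *
 *   segment 1: 0x200 bytes at offset 0xe00 of page A
 *   segment 2: 0x200 bytes at offset 0x000 of page B
 *
 * Offset preserved (bounced to ...fe00):
 *   PRP1 = ...fe00, so the controller transfers 0x1000 - 0xe00 = 0x200
 *   bytes from PRP1, then continues page-aligned at PRP2.  Correct.
 *
 * Offset discarded (bounced to ...f000):
 *   PRP1 = ...f000, so the controller transfers up to a full page from
 *   PRP1, yet the bounce slot only holds 0x200 valid bytes.  The data
 *   the device sees is shifted: silent corruption, which here surfaces
 *   as command timeouts.
 */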

On Thu, Jan 28, 2021 at 10:18 AM Christoph Hellwig  wrote:

> On Thu, Jan 28, 2021 at 06:00:58PM +0000, Robin Murphy wrote:
> > If it were possible for this to fail, you might leak the DMA mapping here.
> > However if dev->dma_parms somehow disappeared since a dozen lines above
> > then I think you've got far bigger problems anyway.
> >
> > That said, do you really need to keep toggling this back and forth all the
> > time? Even if the device does make other mappings elsewhere that don't
> > necessarily need the same strict alignment, would it be significantly
> > harmful just to set it once at probe and leave it in place anyway?
>
> Yes, we should keep it set all the time.  While some NVMe devices have
> the option to use SGLs that do not have this limitation, I have
> absolutely no sympathy for anyone running NVMe with swiotlb as that
> means their system imposes an addressing limitation.  We need to make
> sure it does not corrupt data, but we're not going to make any effort
> to optimize for such a degenerate setup.
>


-- 
Jianxiong Gao

Re: [PATCH 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

2021-01-28 Thread Christoph Hellwig
On Thu, Jan 28, 2021 at 06:00:58PM +0000, Robin Murphy wrote:
> If it were possible for this to fail, you might leak the DMA mapping here. 
> However if dev->dma_parms somehow disappeared since a dozen lines above 
> then I think you've got far bigger problems anyway.
>
> That said, do you really need to keep toggling this back and forth all the 
> time? Even if the device does make other mappings elsewhere that don't 
> necessarily need the same strict alignment, would it be significantly 
> harmful just to set it once at probe and leave it in place anyway?

Yes, we should keep it set all the time.  While some NVMe devices have
the option to use SGLs that do not have this limitation, I have
absolutely no sympathy for anyone running NVMe with swiotlb as that
means their system imposes an addressing limitation.  We need to make
sure it does not corrupt data, but we're not going to make any effort
to optimize for such a degenerate setup.
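
For reference, the SGL alternative mentioned above describes each data
buffer with an explicit (address, length) descriptor and so carries no
implicit page-offset contract the way PRPs do. A sketch of the
descriptor layout (cf. struct nvme_sgl_desc in include/linux/nvme.h):

struct nvme_sgl_desc {
	__le64	addr;	/* arbitrary byte address, no alignment implied */
	__le32	length;	/* explicit length, not inferred from the offset */
	__u8	rsvd[3];
	__u8	type;	/* descriptor type, e.g. data block */
};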


Re: [PATCH 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

2021-01-28 Thread Robin Murphy

On 2021-01-28 00:38, Jianxiong Gao wrote:

The NVMe driver relies on the address offset to function properly.
This patch adds the offset preserve mask to the NVMe driver when mapping
via dma_map_sg_attrs and unmapping via nvme_unmap_sg. The mask
depends on the page size defined by the CC.MPS register of the NVMe
controller.

Signed-off-by: Jianxiong Gao 
---
  drivers/nvme/host/pci.c | 19 +--
  1 file changed, 17 insertions(+), 2 deletions(-)
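
[For readers without the rest of the series: dma_set_page_offset_mask()
comes from patch 1/3. Presumably it is a thin setter along these lines;
this is a sketch of the series' API, not a verbatim quote, and the field
name is inferred from the series title:]

/*
 * Sketch, not verbatim: record the offset-preserve mask in the device's
 * DMA parameters so that swiotlb can keep bounce buffers at the same
 * in-page offset as the original buffer.
 */
static inline int dma_set_page_offset_mask(struct device *dev,
		unsigned int page_offset_mask)
{
	if (dev->dma_parms) {
		dev->dma_parms->offset_preserve_mask = page_offset_mask;
		return 0;
	}
	return -EIO;
}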

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 856aa31931c1..0b23f04068be 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -580,12 +580,15 @@ static void nvme_free_sgls(struct nvme_dev *dev, struct request *req)
 static void nvme_unmap_sg(struct nvme_dev *dev, struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-
+	if (dma_set_page_offset_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1))
+		dev_warn(dev->dev, "dma_set_page_offset_mask failed to set offset\n");
 	if (is_pci_p2pdma_page(sg_page(iod->sg)))
 		pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
 				    rq_dma_dir(req));
 	else
 		dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
+	if (dma_set_page_offset_mask(dev->dev, 0))
+		dev_warn(dev->dev, "dma_set_page_offset_mask failed to reset offset\n");
 }
 
 static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)

@@ -842,7 +845,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 	blk_status_t ret = BLK_STS_RESOURCE;
-	int nr_mapped;
+	int nr_mapped, offset_ret;
 
 	if (blk_rq_nr_phys_segments(req) == 1) {
 		struct bio_vec bv = req_bvec(req);
@@ -868,12 +871,24 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 	if (!iod->nents)
 		goto out_free_sg;
 
+	offset_ret = dma_set_page_offset_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1);
+	if (offset_ret) {
+		dev_warn(dev->dev, "dma_set_page_offset_mask failed to set offset\n");
+		goto out_free_sg;
+	}
+
 	if (is_pci_p2pdma_page(sg_page(iod->sg)))
 		nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
 				iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
 	else
 		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
 					     rq_dma_dir(req), DMA_ATTR_NO_WARN);
+
+	offset_ret = dma_set_page_offset_mask(dev->dev, 0);
+	if (offset_ret) {
+		dev_warn(dev->dev, "dma_set_page_offset_mask failed to reset offset\n");
+		goto out_free_sg;

If it were possible for this to fail, you might leak the DMA mapping 
here. However if dev->dma_parms somehow disappeared since a dozen lines 
above then I think you've got far bigger problems anyway.


That said, do you really need to keep toggling this back and forth all 
the time? Even if the device does make other mappings elsewhere that 
don't necessarily need the same strict alignment, would it be 
significantly harmful just to set it once at probe and leave it in place 
anyway?


Robin.


+	}
 	if (!nr_mapped)
 		goto out_free_sg;

