RE: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces

2021-11-17 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Tuesday, November 16, 2021 9:46 PM
> 
> On Tue, Nov 16, 2021 at 09:57:30AM +0800, Lu Baolu wrote:
> > Hi Christoph,
> >
> > On 11/15/21 9:14 PM, Christoph Hellwig wrote:
> > > On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
> > > > +enum iommu_dma_owner {
> > > > +   DMA_OWNER_NONE,
> > > > +   DMA_OWNER_KERNEL,
> > > > +   DMA_OWNER_USER,
> > > > +};
> > > > +
> > >
> > > > +   enum iommu_dma_owner dma_owner;
> > > > +   refcount_t owner_cnt;
> > > > +   struct file *owner_user_file;
> > >
> > > I'd just overload the ownership into owner_user_file,
> > >
> > >   NULL-> no owner
> > >   (struct file *)1UL) -> kernel
> > >   real pointer-> user
> > >
> > > Which could simplify a lot of the code dealing with the owner.
> > >
> >
> > Yeah! Sounds reasonable. I will make this in the next version.
> 
> It would be good to figure out how to make iommu_attach_device()
> enforce no other driver binding as a kernel user without a file *, as
> Robin pointed to, before optimizing this.
> 
> This fixes an existing bug where iommu_attach_device() only checks the
> group size and is vulnerable to a hot plug increasing the group size
> after it returns. That check should be replaced by this series's logic
> instead.
> 

I think this existing bug in iommu_attach_device() is different from
what this series is attempting to solve. To avoid breaking the singleton
group assumption there, the ideal band-aid is to fail device hotplug.
Otherwise some IOVA ranges which are supposed to go upstream
to the IOMMU may be treated as p2p and routed to the hotplugged
device instead. Conceptually a singleton group is different from a
multi-device group which has only one device bound to a driver...

This series aims to avoid conflicts from having both user and kernel
drivers mixed in a multi-device group.

Thanks
Kevin
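
A minimal sketch of the owner_user_file encoding Christoph suggests
above, assuming hypothetical helper names (the series defines its own
accessors):

	/* Hypothetical sentinel: kernel ownership without a real file */
	#define KERNEL_DMA_OWNER	((struct file *)1UL)

	static bool group_dma_owned_by_kernel(struct iommu_group *group)
	{
		return group->owner_user_file == KERNEL_DMA_OWNER;
	}

	static bool group_dma_owned_by_user(struct iommu_group *group)
	{
		return group->owner_user_file &&
		       group->owner_user_file != KERNEL_DMA_OWNER;
	}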


Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces

2021-11-17 Thread Lu Baolu

On 11/17/21 9:35 PM, Jason Gunthorpe wrote:
> On Wed, Nov 17, 2021 at 01:22:19PM +0800, Lu Baolu wrote:
>> Hi Jason,
>>
>> On 11/16/21 9:46 PM, Jason Gunthorpe wrote:
>>> On Tue, Nov 16, 2021 at 09:57:30AM +0800, Lu Baolu wrote:
>>>> Hi Christoph,
>>>>
>>>> On 11/15/21 9:14 PM, Christoph Hellwig wrote:
>>>>> On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
>>>>>> +enum iommu_dma_owner {
>>>>>> +   DMA_OWNER_NONE,
>>>>>> +   DMA_OWNER_KERNEL,
>>>>>> +   DMA_OWNER_USER,
>>>>>> +};
>>>>>> +
>>>>>
>>>>>> +   enum iommu_dma_owner dma_owner;
>>>>>> +   refcount_t owner_cnt;
>>>>>> +   struct file *owner_user_file;
>>>>>
>>>>> I'd just overload the ownership into owner_user_file,
>>>>>
>>>>>   NULL -> no owner
>>>>>   (struct file *)1UL) -> kernel
>>>>>   real pointer -> user
>>>>>
>>>>> Which could simplify a lot of the code dealing with the owner.
>>>>
>>>> Yeah! Sounds reasonable. I will make this in the next version.
>>>
>>> It would be good to figure out how to make iommu_attach_device()
>>> enforce no other driver binding as a kernel user without a file *, as
>>> Robin pointed to, before optimizing this.
>>>
>>> This fixes an existing bug where iommu_attach_device() only checks the
>>> group size and is vulnerable to a hot plug increasing the group size
>>> after it returns. That check should be replaced by this series's logic
>>> instead.
>>
>> In my understanding, the essence of this problem is that only the
>> user owner of the iommu_group could attach an UNMANAGED domain to it.
>> If I understand it right, how about introducing a new interface to
>> allocate a user-managed domain and storing the user file pointer in it?
>
> For iommu_attach_device() the semantic is simple non-sharing, so there
> is no need for the file * at all, it can just be NULL.

The file * being NULL means the device is only owned by the kernel
driver. Perhaps we can check this pointer in iommu_attach_device() to
avoid using it for user domain attachment.

>> Does the above help here?
>
> No, iommu_attach_device() is kernel only and should not interact with
> userspace.

The existing iommu_attach_device() allows only a singleton group. As
we have added a group ownership attribute, we can enforce this
interface for kernel domain usage only.

> I'm also going to see if I can learn what Tegra is doing with
> iommu_attach_group()

Okay! Thank you!

> Jason

Best regards,
baolu


Re: [PATCH 1/2] dt-bindings: Add Arm SMMUv3 PMCG binding

2021-11-17 Thread Rob Herring
On Tue, Nov 16, 2021 at 5:52 AM Jean-Philippe Brucker
 wrote:
>
> Add binding for the Arm SMMUv3 PMU. Each node represents a PMCG, and is
> placed as a sibling node of the SMMU. Although the PMCGs registers may
> be within the SMMU MMIO region, they are separate devices, and there can
> be multiple PMCG devices for each SMMU (for example one for the TCU and
> one for each TBU).
>
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  .../bindings/iommu/arm,smmu-v3-pmcg.yaml  | 67 +++
>  1 file changed, 67 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml
>
> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml 
> b/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml
> new file mode 100644
> index ..a893e071fdb4
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml
> @@ -0,0 +1,67 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/iommu/arm,smmu-v3-pmcg.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Arm SMMUv3 Performance Monitor Counter Group
> +
> +maintainers:
> +  - Will Deacon 
> +  - Robin Murphy 
> +
> +description: |+

Don't need '|+' if no formatting to preserve.

> +  An SMMUv3 may have several Performance Monitor Counter Groups (PMCGs).
> +  They are standalone performance monitoring units that support both
> +  architected and IMPLEMENTATION DEFINED event counters.

Humm, I don't know that I agree they are standalone. They could be I
guess, but looking at the MMU-600 spec the PMCG looks like it's just a
subset of registers in a larger block. This seems similar to MPAM
(which I'm working on a binding for) where it's just a register map
and interrupts, but every other possible resource is unspecified by
the architecture.

The simplest change from this would be just specifying that the PMCG
is child node(s) of whatever it is part of. The extreme would be this
is all part of the SMMU binding (i.e. reg entry X is PMCG registers,
interrupts entry Y is pmu irq).

> +
> +properties:
> +  $nodename:
> +pattern: "^pmu@[0-9a-f]*"

s/*/+/

Need at least 1 digit.

> +  compatible:
> +oneOf:
> +  - items:
> +- enum:
> +  - hisilicon,smmu-v3-pmcg-hip08
> +- const: arm,smmu-v3-pmcg
> +  - const: arm,smmu-v3-pmcg
> +
> +  reg:
> +description: |
> +  Base addresses of the PMCG registers. Either a single address for Page 0
> +  or an additional address for Page 1, where some registers can be
> +  relocated with SMMU_PMCG_CFGR.RELOC_CTRS.
> +minItems: 1
> +maxItems: 2
> +
> +  interrupts:
> +maxItems: 1
> +
> +  msi-parent: true
> +
> +required:
> +  - compatible
> +  - reg
> +
> +additionalProperties: false
> +
> +examples:
> +  - |+
> +#include 
> +#include 
> +
> +pmu@2b42 {
> +compatible = "arm,smmu-v3-pmcg";
> +reg = <0 0x2b42 0 0x1000>,
> +  <0 0x2b43 0 0x1000>;
> +interrupts = ;
> +msi-parent = < 0xff>;
> +};
> +
> +pmu@2b44 {
> +compatible = "arm,smmu-v3-pmcg";
> +reg = <0 0x2b44 0 0x1000>,
> +  <0 0x2b45 0 0x1000>;
> +interrupts = ;
> +msi-parent = < 0xff>;
> +};
> --
> 2.33.1
>


[PATCH v4 11/23] nvme-pci: convert to using dma_map_sgtable()

2021-11-17 Thread Logan Gunthorpe
The dma_map operations now support P2PDMA pages directly. So remove
the calls to pci_p2pdma_[un]map_sg_attrs() and replace them with calls
to dma_map_sgtable().

dma_map_sgtable() returns more complete error codes than dma_map_sg()
and allows differentiating EREMOTEIO errors in case an unsupported
P2PDMA transfer is requested. When this happens, return BLK_STS_TARGET
so the request isn't retried.
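
As a hedged sketch (names follow the patch; exact placement in the
driver's mapping path is illustrative), the conversion lets the driver
distinguish the fatal P2PDMA case from transient failures:

	ret = dma_map_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req),
			      DMA_ATTR_NO_WARN);
	if (ret) {
		if (ret == -EREMOTEIO)
			return BLK_STS_TARGET;	/* unsupported P2PDMA: don't retry */
		return BLK_STS_RESOURCE;	/* transient: may succeed if retried */
	}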

Signed-off-by: Logan Gunthorpe 
Reviewed-by: Max Gurtovoy 
---
 drivers/nvme/host/pci.c | 69 +
 1 file changed, 29 insertions(+), 40 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 72f623999ba5..3f2bd1efe076 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -229,11 +229,10 @@ struct nvme_iod {
bool use_sgl;
int aborted;
int npages; /* In the PRP list. 0 means small pool in use */
-   int nents;  /* Used in scatterlist */
dma_addr_t first_dma;
unsigned int dma_len;   /* length of single DMA segment mapping */
dma_addr_t meta_dma;
-   struct scatterlist *sg;
+   struct sg_table sgt;
 };
 
 static inline unsigned int nvme_dbbuf_size(struct nvme_dev *dev)
@@ -531,7 +530,7 @@ static void nvme_commit_rqs(struct blk_mq_hw_ctx *hctx)
 static void **nvme_pci_iod_list(struct request *req)
 {
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-   return (void **)(iod->sg + blk_rq_nr_phys_segments(req));
+   return (void **)(iod->sgt.sgl + blk_rq_nr_phys_segments(req));
 }
 
 static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req)
@@ -583,17 +582,6 @@ static void nvme_free_sgls(struct nvme_dev *dev, struct 
request *req)
}
 }
 
-static void nvme_unmap_sg(struct nvme_dev *dev, struct request *req)
-{
-   struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-
-   if (is_pci_p2pdma_page(sg_page(iod->sg)))
-   pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
-   rq_dma_dir(req));
-   else
-   dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
-}
-
 static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 {
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
@@ -604,9 +592,10 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct 
request *req)
return;
}
 
-   WARN_ON_ONCE(!iod->nents);
+   WARN_ON_ONCE(!iod->sgt.nents);
+
+   dma_unmap_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req), 0);
 
-   nvme_unmap_sg(dev, req);
if (iod->npages == 0)
dma_pool_free(dev->prp_small_pool, nvme_pci_iod_list(req)[0],
  iod->first_dma);
@@ -614,7 +603,7 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct 
request *req)
nvme_free_sgls(dev, req);
else
nvme_free_prps(dev, req);
-   mempool_free(iod->sg, dev->iod_mempool);
+   mempool_free(iod->sgt.sgl, dev->iod_mempool);
 }
 
 static void nvme_print_sgl(struct scatterlist *sgl, int nents)
@@ -637,7 +626,7 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev 
*dev,
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
struct dma_pool *pool;
int length = blk_rq_payload_bytes(req);
-   struct scatterlist *sg = iod->sg;
+   struct scatterlist *sg = iod->sgt.sgl;
int dma_len = sg_dma_len(sg);
u64 dma_addr = sg_dma_address(sg);
int offset = dma_addr & (NVME_CTRL_PAGE_SIZE - 1);
@@ -710,16 +699,16 @@ static blk_status_t nvme_pci_setup_prps(struct nvme_dev 
*dev,
dma_len = sg_dma_len(sg);
}
 done:
-   cmnd->dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sg));
+   cmnd->dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sgt.sgl));
cmnd->dptr.prp2 = cpu_to_le64(iod->first_dma);
return BLK_STS_OK;
 free_prps:
nvme_free_prps(dev, req);
return BLK_STS_RESOURCE;
 bad_sgl:
-   WARN(DO_ONCE(nvme_print_sgl, iod->sg, iod->nents),
+   WARN(DO_ONCE(nvme_print_sgl, iod->sgt.sgl, iod->sgt.nents),
"Invalid SGL for payload:%d nents:%d\n",
-   blk_rq_payload_bytes(req), iod->nents);
+   blk_rq_payload_bytes(req), iod->sgt.nents);
return BLK_STS_IOERR;
 }
 
@@ -745,12 +734,13 @@ static void nvme_pci_sgl_set_seg(struct nvme_sgl_desc 
*sge,
 }
 
 static blk_status_t nvme_pci_setup_sgls(struct nvme_dev *dev,
-   struct request *req, struct nvme_rw_command *cmd, int entries)
+   struct request *req, struct nvme_rw_command *cmd)
 {
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
struct dma_pool *pool;
struct nvme_sgl_desc *sg_list;
-   struct scatterlist *sg = iod->sg;
+   struct scatterlist *sg = iod->sgt.sgl;
+   int entries = iod->sgt.nents;
dma_addr_t sgl_dma;
int i = 0;
 
@@ -848,7 +838,7 @@ static blk_status_t 

[PATCH v4 06/23] dma-mapping: allow EREMOTEIO return code for P2PDMA transfers

2021-11-17 Thread Logan Gunthorpe
Add EREMOTEIO error return to dma_map_sgtable() which will be used
by .map_sg() implementations that detect P2PDMA pages that the
underlying DMA device cannot access.

Signed-off-by: Logan Gunthorpe 
Reviewed-by: Jason Gunthorpe 
---
 kernel/dma/mapping.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 9478eccd1c8e..c056a1468189 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -197,7 +197,7 @@ static int __dma_map_sg_attrs(struct device *dev, struct 
scatterlist *sg,
if (ents > 0)
debug_dma_map_sg(dev, sg, nents, ents, dir, attrs);
else if (WARN_ON_ONCE(ents != -EINVAL && ents != -ENOMEM &&
- ents != -EIO))
+ ents != -EIO && ents != -EREMOTEIO))
return -EIO;
 
return ents;
@@ -255,6 +255,8 @@ EXPORT_SYMBOL(dma_map_sg_attrs);
  * complete the mapping. Should succeed if retried later.
  *   -EIO  Legacy error code with an unknown meaning. eg. this is
  * returned if a lower level call returned DMA_MAPPING_ERROR.
+ *   -EREMOTEIOThe DMA device cannot access P2PDMA memory specified in
+ * the sg_table. This will not succeed if retried.
  */
 int dma_map_sgtable(struct device *dev, struct sg_table *sgt,
enum dma_data_direction dir, unsigned long attrs)
-- 
2.30.2



[PATCH v4 09/23] iommu/dma: support PCI P2PDMA pages in dma-iommu map_sg

2021-11-17 Thread Logan Gunthorpe
When a PCI P2PDMA page is seen, set the IOVA length of the segment
to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
apply the appropriate bus address to the segment. The IOVA is not
created if the scatterlist only consists of P2PDMA pages.

A P2PDMA page may have three possible outcomes when being mapped:
  1) If the data path between the two devices doesn't go through
 the root port, then it should be mapped with a PCI bus address
  2) If the data path goes through the host bridge, it should be mapped
 normally with an IOMMU IOVA.
  3) It is not possible for the two devices to communicate and thus
 the mapping operation should fail (and it will return -EREMOTEIO).

Similar to dma-direct, the sg_dma_mark_bus_address() flag is used to
indicate bus address segments. On unmap, P2PDMA segments are skipped
over when determining the start and end IOVA addresses.

With this change, the flags variable in the dma_map_ops is set to
DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for P2PDMA pages.

Signed-off-by: Logan Gunthorpe 
Reviewed-by: Jason Gunthorpe 
---
 drivers/iommu/dma-iommu.c | 67 +++
 1 file changed, 60 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index b42e38a0dbe2..c70c661d8f98 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -883,6 +883,16 @@ static int __finalise_sg(struct device *dev, struct 
scatterlist *sg, int nents,
sg_dma_address(s) = DMA_MAPPING_ERROR;
sg_dma_len(s) = 0;
 
+   if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {
+   if (i > 0)
+   cur = sg_next(cur);
+
+   pci_p2pdma_map_bus_segment(s, cur);
+   count++;
+   cur_len = 0;
+   continue;
+   }
+
/*
 * Now fill in the real DMA data. If...
 * - there is a valid output segment to append to
@@ -979,6 +989,8 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
	struct iova_domain *iovad = &cookie->iovad;
struct scatterlist *s, *prev = NULL;
int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs);
+   struct dev_pagemap *pgmap = NULL;
+   enum pci_p2pdma_map_type map_type;
dma_addr_t iova;
size_t iova_len = 0;
unsigned long mask = dma_get_seg_boundary(dev);
@@ -1014,6 +1026,35 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
s_length = iova_align(iovad, s_length + s_iova_off);
s->length = s_length;
 
+   if (is_pci_p2pdma_page(sg_page(s))) {
+   if (sg_page(s)->pgmap != pgmap) {
+   pgmap = sg_page(s)->pgmap;
+   map_type = pci_p2pdma_map_type(pgmap, dev);
+   }
+
+   switch (map_type) {
+   case PCI_P2PDMA_MAP_BUS_ADDR:
+   /*
+* A zero length will be ignored by
+* iommu_map_sg() and then can be detected
+* in __finalise_sg() to actually map the
+* bus address.
+*/
+   s->length = 0;
+   continue;
+   case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
+   /*
+* Mapping through host bridge should be
+* mapped with regular IOVAs, thus we
+* do nothing here and continue below.
+*/
+   break;
+   default:
+   ret = -EREMOTEIO;
+   goto out_restore_sg;
+   }
+   }
+
/*
 * Due to the alignment of our single IOVA allocation, we can
 * depend on these assumptions about the segment boundary mask:
@@ -1036,6 +1077,9 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
prev = s;
}
 
+   if (!iova_len)
+   return __finalise_sg(dev, sg, nents, 0);
+
iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev);
if (!iova) {
ret = -ENOMEM;
@@ -1057,7 +1101,7 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
 out_restore_sg:
__invalidate_sg(sg, nents);
 out:
-   if (ret != -ENOMEM)
+   if (ret != -ENOMEM && ret != -EREMOTEIO)
return -EINVAL;
return ret;
 }
@@ -1065,7 +1109,7 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist 

[PATCH v4 00/23] Userspace P2PDMA with O_DIRECT NVMe devices

2021-11-17 Thread Logan Gunthorpe
Hi,

This patchset continues my work to add userspace P2PDMA access using
O_DIRECT NVMe devices. This posting fixes a lot of issues that were
raised in review of the last posting, which is here[1].

The patchset enables userspace P2PDMA by allowing userspace to mmap()
allocated chunks of the CMB. The resulting VMA can be passed only
to O_DIRECT IO on NVMe backed files or block devices. A flag is added
to GUP() in Patch <>, then Patches <> through <> wire this flag up based
on whether the block queue indicates P2PDMA support. Patches <>
through <> enable the CMB to be mapped into userspace by mmapping
the nvme char device.

This is relatively straightforward, however the one significant
problem is that, presently, pci_p2pdma_map_sg() requires a homogeneous
SGL with all P2PDMA pages or all regular pages. Enhancing GUP to
support enforcing this rule would require a huge hack that I don't
expect would be all that palatable. So the first 13 patches add
support for P2PDMA pages to dma_map_sg[table]() in the dma-direct
and dma-iommu implementations. Thus systems without an IOMMU, as well
as those with Intel and AMD IOMMUs, are supported. (Other IOMMU
implementations would then be unsupported, notably ARM and PowerPC,
but support would be added when they convert to dma-iommu.)

dma_map_sgtable() is preferred when dealing with P2PDMA memory as it
will return -EREMOTEIO when the DMA device cannot map specific P2PDMA
pages based on the existing rules in calc_map_type_and_dist().

The other issue is dma_unmap_sg() needs a flag to determine whether a
given dma_addr_t was mapped regularly or as a PCI bus address. To allow
this, a third flag is added to the page_link field in struct
scatterlist. This effectively means support for P2PDMA will now depend
on CONFIG_64BIT.
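
As a rough sketch of that scheme (the helper names come from the
changelog below; the exact bit value is an assumption), the flag lives
alongside the existing chain/end markers in the low bits of page_link:

	/* Bits 0 and 1 of page_link already encode SG_CHAIN/SG_END; using
	 * a third low bit is what creates the CONFIG_64BIT dependency.
	 * The value below is illustrative only. */
	#define SG_DMA_BUS_ADDRESS	(1UL << 2)

	static inline bool sg_is_dma_bus_address(struct scatterlist *sg)
	{
		return sg->page_link & SG_DMA_BUS_ADDRESS;
	}

	static inline void sg_dma_mark_bus_address(struct scatterlist *sg)
	{
		sg->page_link |= SG_DMA_BUS_ADDRESS;
	}

	static inline void sg_dma_unmark_bus_address(struct scatterlist *sg)
	{
		sg->page_link &= ~SG_DMA_BUS_ADDRESS;
	}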

Feedback welcome.

This series is based on v5.16-rc1. A git branch is available here:

  https://github.com/sbates130272/linux-p2pmem/  p2pdma_user_cmb_v4

Thanks,

Logan

[1] https://lore.kernel.org/all/20210916234100.122368-1-log...@deltatee.com

--

Changes since v3:
  - Add some comment and commit message cleanups I had missed for v3,
also moved the prototypes for some of the p2pdma helpers to
dma-map-ops.h (which I missed in v3 and was suggested in v2).
  - Add separate cleanup patch for scatterlist.h and change the macros
to functions. (Suggested by Chaitanya and Jason, respectively)
  - Rename sg_dma_mark_pci_p2pdma() and sg_is_dma_pci_p2pdma() to
sg_dma_mark_bus_address() and sg_is_dma_bus_address() which
is a more generic name (As requested by Jason)
  - Fixes to some comments and commit messages as suggested by Bjorn
and Jason.
  - Ensure swiotlb is not used with P2PDMA pages. (Per Jason)
  - The sgtable conversion in RDMA was split out and sent upstream
    separately; the new patch is only the removal. (Per Jason)
  - Moved the FOLL_PCI_P2PDMA check outside of get_dev_pagemap() as
Jason suggested this will be removed in the near term.
  - Add two patches to ensure that zone device pages with different
pgmaps are never merged in the block layer or
sg_alloc_append_table_from_pages() (Per Jason)
  - Ensure synchronize_rcu() or call_rcu() is used before returning
    pages to the genalloc. (Jason pointed out that pages are not
    guaranteed to be unused in all architectures until at least
    after an RCU grace period, and that synchronize_rcu() was likely
    too slow to use in the vma close operation.)
  - Collected Acks and Reviews by Bjorn, Jason and Max.

Logan Gunthorpe (23):
  lib/scatterlist: cleanup macros into static inline functions
  lib/scatterlist: add flag for indicating P2PDMA segments in an SGL
  PCI/P2PDMA: Attempt to set map_type if it has not been set
  PCI/P2PDMA: Expose pci_p2pdma_map_type()
  PCI/P2PDMA: Introduce helpers for dma_map_sg implementations
  dma-mapping: allow EREMOTEIO return code for P2PDMA transfers
  dma-direct: support PCI P2PDMA pages in dma-direct map_sg
  dma-mapping: add flags to dma_map_ops to indicate PCI P2PDMA support
  iommu/dma: support PCI P2PDMA pages in dma-iommu map_sg
  nvme-pci: check DMA ops when indicating support for PCI P2PDMA
  nvme-pci: convert to using dma_map_sgtable()
  RDMA/core: introduce ib_dma_pci_p2p_dma_supported()
  RDMA/rw: drop pci_p2pdma_[un]map_sg()
  PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg()
  mm: introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages
  iov_iter: introduce iov_iter_get_pages_[alloc_]flags()
  block: add check when merging zone device pages
  lib/scatterlist: add check when merging zone device pages
  block: set FOLL_PCI_P2PDMA in __bio_iov_iter_get_pages()
  block: set FOLL_PCI_P2PDMA in bio_map_user_iov()
  mm: use custom page_free for P2PDMA pages
  PCI/P2PDMA: Introduce pci_mmap_p2pmem()
  nvme-pci: allow mmaping the CMB in userspace

 block/bio.c  |  10 +-
 block/blk-map.c  |   7 +-
 drivers/infiniband/core/rw.c |  45 +---
 drivers/iommu/dma-iommu.c|  67 +-
 drivers/nvme/host/core.c |  18 +-
 

[PATCH v4 10/23] nvme-pci: check DMA ops when indicating support for PCI P2PDMA

2021-11-17 Thread Logan Gunthorpe
Introduce a supports_pci_p2pdma() operation in nvme_ctrl_ops to
replace the fixed NVME_F_PCI_P2PDMA flag such that the dma_map_ops
flags can be checked for PCI P2PDMA support.

Signed-off-by: Logan Gunthorpe 
---
 drivers/nvme/host/core.c |  3 ++-
 drivers/nvme/host/nvme.h |  2 +-
 drivers/nvme/host/pci.c  | 11 +--
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 4b5de8f5435a..344414351314 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3819,7 +3819,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, 
unsigned nsid,
blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, ns->queue);
 
blk_queue_flag_set(QUEUE_FLAG_NONROT, ns->queue);
-   if (ctrl->ops->flags & NVME_F_PCI_P2PDMA)
+   if (ctrl->ops->supports_pci_p2pdma &&
+   ctrl->ops->supports_pci_p2pdma(ctrl))
blk_queue_flag_set(QUEUE_FLAG_PCI_P2PDMA, ns->queue);
 
ns->ctrl = ctrl;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index b334af8aa264..a9f60b12a32b 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -486,7 +486,6 @@ struct nvme_ctrl_ops {
unsigned int flags;
 #define NVME_F_FABRICS (1 << 0)
 #define NVME_F_METADATA_SUPPORTED  (1 << 1)
-#define NVME_F_PCI_P2PDMA  (1 << 2)
int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 *val);
int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 val);
int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 *val);
@@ -494,6 +493,7 @@ struct nvme_ctrl_ops {
void (*submit_async_event)(struct nvme_ctrl *ctrl);
void (*delete_ctrl)(struct nvme_ctrl *ctrl);
int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size);
+   bool (*supports_pci_p2pdma)(struct nvme_ctrl *ctrl);
 };
 
 /*
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ca2ee806d74b..72f623999ba5 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2900,17 +2900,24 @@ static int nvme_pci_get_address(struct nvme_ctrl *ctrl, 
char *buf, int size)
return snprintf(buf, size, "%s\n", dev_name(>dev));
 }
 
+static bool nvme_pci_supports_pci_p2pdma(struct nvme_ctrl *ctrl)
+{
+   struct nvme_dev *dev = to_nvme_dev(ctrl);
+
+   return dma_pci_p2pdma_supported(dev->dev);
+}
+
 static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
.name   = "pcie",
.module = THIS_MODULE,
-   .flags  = NVME_F_METADATA_SUPPORTED |
- NVME_F_PCI_P2PDMA,
+   .flags  = NVME_F_METADATA_SUPPORTED,
.reg_read32 = nvme_pci_reg_read32,
.reg_write32= nvme_pci_reg_write32,
.reg_read64 = nvme_pci_reg_read64,
.free_ctrl  = nvme_pci_free_ctrl,
.submit_async_event = nvme_pci_submit_async_event,
.get_address= nvme_pci_get_address,
+   .supports_pci_p2pdma= nvme_pci_supports_pci_p2pdma,
 };
 
 static int nvme_dev_map(struct nvme_dev *dev)
-- 
2.30.2



[PATCH v4 21/23] mm: use custom page_free for P2PDMA pages

2021-11-17 Thread Logan Gunthorpe
When P2PDMA pages are passed to userspace, they will need to be
reference counted properly and returned to their genalloc after their
reference count returns to 1. This is accomplished with the existing
DEV_PAGEMAP_OPS and the .page_free() operation.

Change CONFIG_P2PDMA to select CONFIG_DEV_PAGEMAP_OPS and add
MEMORY_DEVICE_PCI_P2PDMA to page_is_devmap_managed(),
devmap_managed_enable_[put|get]() and free_devmap_managed_page().

Signed-off-by: Logan Gunthorpe 
---
 drivers/pci/Kconfig  |  1 +
 drivers/pci/p2pdma.c | 13 +
 include/linux/mm.h   |  1 +
 mm/memremap.c| 12 +---
 4 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 95f29601a4df..da53799cddab 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -170,6 +170,7 @@ config PCI_P2PDMA
#
select NEED_SG_DMA_BUS_ADDR_FLAG
select GENERIC_ALLOCATOR
+   select DEV_PAGEMAP_OPS
help
	  Enables drivers to do PCI peer-to-peer transactions to and from
	  BARs that are exposed in other devices that are part of
diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 563e9be9599e..16992b0f0c36 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -101,6 +101,18 @@ static const struct attribute_group p2pmem_group = {
.name = "p2pmem",
 };
 
+static void p2pdma_page_free(struct page *page)
+{
+   struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(page->pgmap);
+
+   gen_pool_free(pgmap->provider->p2pdma->pool,
+ (uintptr_t)page_to_virt(page), PAGE_SIZE);
+}
+
+static const struct dev_pagemap_ops p2pdma_pgmap_ops = {
+   .page_free = p2pdma_page_free,
+};
+
 static void pci_p2pdma_release(void *data)
 {
struct pci_dev *pdev = data;
@@ -198,6 +210,7 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, 
size_t size,
pgmap->range.end = pgmap->range.start + size - 1;
pgmap->nr_range = 1;
pgmap->type = MEMORY_DEVICE_PCI_P2PDMA;
+   pgmap->ops = &p2pdma_pgmap_ops;
 
p2p_pgmap->provider = pdev;
p2p_pgmap->bus_offset = pci_bus_address(pdev, bar) -
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3367d936b256..f26ea7e1fc74 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1168,6 +1168,7 @@ static inline bool page_is_devmap_managed(struct page 
*page)
switch (page->pgmap->type) {
case MEMORY_DEVICE_PRIVATE:
case MEMORY_DEVICE_FS_DAX:
+   case MEMORY_DEVICE_PCI_P2PDMA:
return true;
default:
break;
diff --git a/mm/memremap.c b/mm/memremap.c
index 5a66a71ab591..ec3143ffdeee 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -44,14 +44,16 @@ EXPORT_SYMBOL(devmap_managed_key);
 static void devmap_managed_enable_put(struct dev_pagemap *pgmap)
 {
if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
-   pgmap->type == MEMORY_DEVICE_FS_DAX)
+   pgmap->type == MEMORY_DEVICE_FS_DAX ||
+   pgmap->type == MEMORY_DEVICE_PCI_P2PDMA)
		static_branch_dec(&devmap_managed_key);
 }
 
 static void devmap_managed_enable_get(struct dev_pagemap *pgmap)
 {
if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
-   pgmap->type == MEMORY_DEVICE_FS_DAX)
+   pgmap->type == MEMORY_DEVICE_FS_DAX ||
+   pgmap->type == MEMORY_DEVICE_PCI_P2PDMA)
		static_branch_inc(&devmap_managed_key);
 }
 #else
@@ -355,6 +357,10 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
case MEMORY_DEVICE_GENERIC:
break;
case MEMORY_DEVICE_PCI_P2PDMA:
+   if (!pgmap->ops->page_free) {
+   WARN(1, "Missing page_free method\n");
+   return ERR_PTR(-EINVAL);
+   }
params.pgprot = pgprot_noncached(params.pgprot);
break;
default:
@@ -498,7 +504,7 @@ EXPORT_SYMBOL_GPL(get_dev_pagemap);
 void free_devmap_managed_page(struct page *page)
 {
/* notify page idle for dax */
-   if (!is_device_private_page(page)) {
+   if (!is_device_private_page(page) && !is_pci_p2pdma_page(page)) {
		wake_up_var(&page->_refcount);
return;
}
-- 
2.30.2


[PATCH v4 07/23] dma-direct: support PCI P2PDMA pages in dma-direct map_sg

2021-11-17 Thread Logan Gunthorpe
Add PCI P2PDMA support for dma_direct_map_sg() so that it can map
PCI P2PDMA pages directly without a hack in the callers. This allows
for heterogeneous SGLs that contain both P2PDMA and regular pages.

A P2PDMA page may have three possible outcomes when being mapped:
  1) If the data path between the two devices doesn't go through the
 root port, then it should be mapped with a PCI bus address
  2) If the data path goes through the host bridge, it should be mapped
 normally, as though it were a CPU physical address
  3) It is not possible for the two devices to communicate and thus
 the mapping operation should fail (and it will return -EREMOTEIO).

SGL segments that contain PCI bus addresses are marked with
sg_dma_mark_bus_address() and are ignored when unmapped.

P2PDMA mappings are also failed if swiotlb needs to be used on the
mapping.

Signed-off-by: Logan Gunthorpe 
---
 kernel/dma/direct.c | 43 +--
 kernel/dma/direct.h |  7 ++-
 2 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 4c6c5e0635e3..f2368263f847 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -421,29 +421,60 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
arch_sync_dma_for_cpu_all();
 }
 
+/*
+ * Unmaps segments, except for ones marked as pci_p2pdma which do not
+ * require any further action as they contain a bus address.
+ */
 void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
int nents, enum dma_data_direction dir, unsigned long attrs)
 {
struct scatterlist *sg;
int i;
 
-   for_each_sg(sgl, sg, nents, i)
-   dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
-attrs);
+   for_each_sg(sgl, sg, nents, i) {
+   if (sg_is_dma_bus_address(sg))
+   sg_dma_unmark_bus_address(sg);
+   else
+   dma_direct_unmap_page(dev, sg->dma_address,
+ sg_dma_len(sg), dir, attrs);
+   }
 }
 #endif
 
 int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
enum dma_data_direction dir, unsigned long attrs)
 {
-   int i;
+   struct pci_p2pdma_map_state p2pdma_state = {};
+   enum pci_p2pdma_map_type map;
struct scatterlist *sg;
+   int i, ret;
 
for_each_sg(sgl, sg, nents, i) {
+   if (is_pci_p2pdma_page(sg_page(sg))) {
+   map = pci_p2pdma_map_segment(&p2pdma_state, dev, sg);
+   switch (map) {
+   case PCI_P2PDMA_MAP_BUS_ADDR:
+   continue;
+   case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
+   /*
+* Any P2P mapping that traverses the PCI
+* host bridge must be mapped with CPU physical
+* address and not PCI bus addresses. This is
+* done with dma_direct_map_page() below.
+*/
+   break;
+   default:
+   ret = -EREMOTEIO;
+   goto out_unmap;
+   }
+   }
+
sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
sg->offset, sg->length, dir, attrs);
-   if (sg->dma_address == DMA_MAPPING_ERROR)
+   if (sg->dma_address == DMA_MAPPING_ERROR) {
+   ret = -EIO;
goto out_unmap;
+   }
sg_dma_len(sg) = sg->length;
}
 
@@ -451,7 +482,7 @@ int dma_direct_map_sg(struct device *dev, struct 
scatterlist *sgl, int nents,
 
 out_unmap:
dma_direct_unmap_sg(dev, sgl, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
-   return -EIO;
+   return ret;
 }
 
 dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index 4632b0f4f72e..a33152d79069 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -87,10 +87,15 @@ static inline dma_addr_t dma_direct_map_page(struct device 
*dev,
phys_addr_t phys = page_to_phys(page) + offset;
dma_addr_t dma_addr = phys_to_dma(dev, phys);
 
-   if (is_swiotlb_force_bounce(dev))
+   if (is_swiotlb_force_bounce(dev)) {
+   if (is_pci_p2pdma_page(page))
+   return DMA_MAPPING_ERROR;
return swiotlb_map(dev, phys, size, dir, attrs);
+   }
 
if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
+   if (is_pci_p2pdma_page(page))
+   return DMA_MAPPING_ERROR;
if (swiotlb_force != SWIOTLB_NO_FORCE)
return 

[PATCH v4 05/23] PCI/P2PDMA: Introduce helpers for dma_map_sg implementations

2021-11-17 Thread Logan Gunthorpe
Add pci_p2pdma_map_segment() as a helper for simple dma_map_sg()
implementations. It takes an scatterlist segment that must point to a
pci_p2pdma struct page and will map it if the mapping requires a bus
address.

The return value indicates whether the mapping required a bus address
or whether the caller still needs to map the segment normally. If the
segment should not be mapped, -EREMOTEIO is returned.

This helper uses a state structure to track the changes to the
pgmap across calls and avoid needing to lookup into the xarray for
every page.

Also add pci_p2pdma_map_bus_segment() which is useful for IOMMU
dma_map_sg() implementations where the sg segment containing the page
differs from the sg segment containing the DMA address.

Prototypes for these helpers are added to dma-map-ops.h as they are only
useful to dma map implementations and don't need to pollute the public
pci-p2pdma header.
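
A hedged usage sketch for a simple (non-IOMMU) dma_map_sg()
implementation, mirroring how dma-direct uses the helper elsewhere in
this series (error unwinding omitted for brevity):

	struct pci_p2pdma_map_state p2pdma_state = {};
	struct scatterlist *sg;
	int i;

	for_each_sg(sgl, sg, nents, i) {
		if (is_pci_p2pdma_page(sg_page(sg))) {
			switch (pci_p2pdma_map_segment(&p2pdma_state, dev, sg)) {
			case PCI_P2PDMA_MAP_BUS_ADDR:
				continue;	/* already mapped to a bus address */
			case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
				break;		/* fall through to normal mapping */
			default:
				return -EREMOTEIO;
			}
		}
		/* ... map the segment normally ... */
	}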

Signed-off-by: Logan Gunthorpe 
Acked-by: Bjorn Helgaas 
---
 drivers/pci/p2pdma.c| 59 +
 include/linux/dma-map-ops.h | 21 +
 2 files changed, 80 insertions(+)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 02a13a5ac680..6ad3a8816677 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -944,6 +944,65 @@ void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct 
scatterlist *sg,
 }
 EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
 
+/**
+ * pci_p2pdma_map_segment - map an sg segment determining the mapping type
+ * @state: State structure that should be declared outside of the for_each_sg()
+ * loop and initialized to zero.
+ * @dev: DMA device that's doing the mapping operation
+ * @sg: scatterlist segment to map
+ *
+ * This is a helper to be used by non-IOMMU dma_map_sg() implementations where
+ * the sg segment is the same for the page_link and the dma_address.
+ *
+ * Attempt to map a single segment in an SGL with the PCI bus address.
+ * The segment must point to a PCI P2PDMA page and thus must be
+ * wrapped in a is_pci_p2pdma_page(sg_page(sg)) check.
+ *
+ * Returns the type of mapping used and maps the page if the type is
+ * PCI_P2PDMA_MAP_BUS_ADDR.
+ */
+enum pci_p2pdma_map_type
+pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev,
+  struct scatterlist *sg)
+{
+   if (state->pgmap != sg_page(sg)->pgmap) {
+   state->pgmap = sg_page(sg)->pgmap;
+   state->map = pci_p2pdma_map_type(state->pgmap, dev);
+   state->bus_off = to_p2p_pgmap(state->pgmap)->bus_offset;
+   }
+
+   if (state->map == PCI_P2PDMA_MAP_BUS_ADDR) {
+   sg->dma_address = sg_phys(sg) + state->bus_off;
+   sg_dma_len(sg) = sg->length;
+   sg_dma_mark_bus_address(sg);
+   }
+
+   return state->map;
+}
+
+/**
+ * pci_p2pdma_map_bus_segment - map an sg segment pre determined to
+ * be mapped with PCI_P2PDMA_MAP_BUS_ADDR
+ * @pg_sg: scatterlist segment with the page to map
+ * @dma_sg: scatterlist segment to assign a DMA address to
+ *
+ * This is a helper for iommu dma_map_sg() implementations when the
+ * segment for the DMA address differs from the segment containing the
+ * source page.
+ *
+ * pci_p2pdma_map_type() must have already been called on the pg_sg and
+ * returned PCI_P2PDMA_MAP_BUS_ADDR.
+ */
+void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
+   struct scatterlist *dma_sg)
+{
+   struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(sg_page(pg_sg)->pgmap);
+
+   dma_sg->dma_address = sg_phys(pg_sg) + pgmap->bus_offset;
+   sg_dma_len(dma_sg) = pg_sg->length;
+   sg_dma_mark_bus_address(dma_sg);
+}
+
 /**
  * pci_p2pdma_enable_store - parse a configfs/sysfs attribute store
  * to enable p2pdma
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index d693a0e33bac..752f91e5eb5d 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -413,15 +413,36 @@ enum pci_p2pdma_map_type {
PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
 };
 
+struct pci_p2pdma_map_state {
+   struct dev_pagemap *pgmap;
+   int map;
+   u64 bus_off;
+};
+
 #ifdef CONFIG_PCI_P2PDMA
 enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
 struct device *dev);
+enum pci_p2pdma_map_type
+pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev,
+  struct scatterlist *sg);
+void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
+   struct scatterlist *dma_sg);
 #else /* CONFIG_PCI_P2PDMA */
 static inline enum pci_p2pdma_map_type
 pci_p2pdma_map_type(struct dev_pagemap *pgmap, struct device *dev)
 {
return PCI_P2PDMA_MAP_NOT_SUPPORTED;
 }
+static inline enum pci_p2pdma_map_type
+pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev,
+  struct scatterlist 

[PATCH v4 20/23] block: set FOLL_PCI_P2PDMA in bio_map_user_iov()

2021-11-17 Thread Logan Gunthorpe
When a bio's queue supports PCI P2PDMA, set FOLL_PCI_P2PDMA for
iov_iter_get_pages_flags(). This allows PCI P2PDMA pages to be
passed from userspace and enables NVMe passthru requests to
use P2PDMA pages.

Signed-off-by: Logan Gunthorpe 
---
 block/blk-map.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/blk-map.c b/block/blk-map.c
index 4526adde0156..7508448e290c 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -234,6 +234,7 @@ static int bio_map_user_iov(struct request *rq, struct 
iov_iter *iter,
gfp_t gfp_mask)
 {
unsigned int max_sectors = queue_max_hw_sectors(rq->q);
+   unsigned int flags = 0;
struct bio *bio;
int ret;
int j;
@@ -246,13 +247,17 @@ static int bio_map_user_iov(struct request *rq, struct 
iov_iter *iter,
return -ENOMEM;
bio->bi_opf |= req_op(rq);
 
+   if (blk_queue_pci_p2pdma(rq->q))
+   flags |= FOLL_PCI_P2PDMA;
+
while (iov_iter_count(iter)) {
struct page **pages;
ssize_t bytes;
size_t offs, added = 0;
int npages;
 
-   bytes = iov_iter_get_pages_alloc(iter, &pages, LONG_MAX, &offs);
+   bytes = iov_iter_get_pages_alloc_flags(iter, &pages, LONG_MAX,
+  &offs, flags);
if (unlikely(bytes <= 0)) {
ret = bytes ? bytes : -EFAULT;
goto out_unmap;
-- 
2.30.2



[PATCH v4 16/23] iov_iter: introduce iov_iter_get_pages_[alloc_]flags()

2021-11-17 Thread Logan Gunthorpe
Add iov_iter_get_pages_flags() and iov_iter_get_pages_alloc_flags()
which take a flags argument that is passed to get_user_pages_fast().

This is so that FOLL_PCI_P2PDMA can be passed when appropriate.

Signed-off-by: Logan Gunthorpe 
---
 include/linux/uio.h | 21 +
 lib/iov_iter.c  | 15 +++
 2 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 6350354f97e9..4c6e64d2f7dd 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -243,10 +243,23 @@ void iov_iter_pipe(struct iov_iter *i, unsigned int 
direction, struct pipe_inode
 void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t 
count);
 void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray 
*xarray,
 loff_t start, size_t count);
-ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
-   size_t maxsize, unsigned maxpages, size_t *start);
-ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages,
-   size_t maxsize, size_t *start);
+ssize_t iov_iter_get_pages_flags(struct iov_iter *i, struct page **pages,
+   size_t maxsize, unsigned maxpages, size_t *start,
+   unsigned int gup_flags);
+ssize_t iov_iter_get_pages_alloc_flags(struct iov_iter *i,
+   struct page ***pages, size_t maxsize, size_t *start,
+   unsigned int gup_flags);
+static inline ssize_t iov_iter_get_pages(struct iov_iter *i,
+   struct page **pages, size_t maxsize, unsigned maxpages,
+   size_t *start)
+{
+   return iov_iter_get_pages_flags(i, pages, maxsize, maxpages, start, 0);
+}
+static inline ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
+   struct page ***pages, size_t maxsize, size_t *start)
+{
+   return iov_iter_get_pages_alloc_flags(i, pages, maxsize, start, 0);
+}
 int iov_iter_npages(const struct iov_iter *i, int maxpages);
 void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state);
 
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 66a740e6e153..0d557e0e82b2 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1515,9 +1515,9 @@ static struct page *first_bvec_segment(const struct 
iov_iter *i,
return page;
 }
 
-ssize_t iov_iter_get_pages(struct iov_iter *i,
+ssize_t iov_iter_get_pages_flags(struct iov_iter *i,
   struct page **pages, size_t maxsize, unsigned maxpages,
-  size_t *start)
+  size_t *start, unsigned int gup_flags)
 {
size_t len;
int n, res;
@@ -1528,7 +1528,6 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
return 0;
 
if (likely(iter_is_iovec(i))) {
-   unsigned int gup_flags = 0;
unsigned long addr;
 
if (iov_iter_rw(i) != WRITE)
@@ -1558,7 +1557,7 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
return iter_xarray_get_pages(i, pages, maxsize, maxpages, 
start);
return -EFAULT;
 }
-EXPORT_SYMBOL(iov_iter_get_pages);
+EXPORT_SYMBOL(iov_iter_get_pages_flags);
 
 static struct page **get_pages_array(size_t n)
 {
@@ -1640,9 +1639,9 @@ static ssize_t iter_xarray_get_pages_alloc(struct 
iov_iter *i,
return actual;
 }
 
-ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
+ssize_t iov_iter_get_pages_alloc_flags(struct iov_iter *i,
   struct page ***pages, size_t maxsize,
-  size_t *start)
+  size_t *start, unsigned int gup_flags)
 {
struct page **p;
size_t len;
@@ -1654,7 +1653,6 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
return 0;
 
if (likely(iter_is_iovec(i))) {
-   unsigned int gup_flags = 0;
unsigned long addr;
 
if (iov_iter_rw(i) != WRITE)
@@ -1667,6 +1665,7 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
p = get_pages_array(n);
if (!p)
return -ENOMEM;
+
res = get_user_pages_fast(addr, n, gup_flags, p);
if (unlikely(res <= 0)) {
kvfree(p);
@@ -1694,7 +1693,7 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
return iter_xarray_get_pages_alloc(i, pages, maxsize, start);
return -EFAULT;
 }
-EXPORT_SYMBOL(iov_iter_get_pages_alloc);
+EXPORT_SYMBOL(iov_iter_get_pages_alloc_flags);
 
 size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
   struct iov_iter *i)
-- 
2.30.2



[PATCH v4 15/23] mm: introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages

2021-11-17 Thread Logan Gunthorpe
Callers that expect PCI P2PDMA pages can now set FOLL_PCI_P2PDMA to
allow obtaining P2PDMA pages. If a caller does not set this flag
and tries to map P2PDMA pages it will fail.

This is implemented by failing to return PCI P2PDMA pages when
FOLL_PCI_P2PDMA is not set. The check is only done if pte_devmap()
is set.

FOLL_PCI_P2PDMA cannot be set if FOLL_LONGTERM is set.
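
A hedged caller-side sketch: a caller that can handle P2PDMA pages
opts in explicitly (and must not combine the flag with FOLL_LONGTERM):

	/* pin pages that may include PCI P2PDMA memory */
	ret = get_user_pages_fast(addr, nr_pages,
				  FOLL_WRITE | FOLL_PCI_P2PDMA, pages);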

Signed-off-by: Logan Gunthorpe 
---
 include/linux/mm.h |  1 +
 mm/gup.c   | 22 +-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a7e4a9e7d807..65cb27cebbab 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2971,6 +2971,7 @@ struct page *follow_page(struct vm_area_struct *vma, 
unsigned long address,
 #define FOLL_SPLIT_PMD 0x2 /* split huge pmd before returning */
 #define FOLL_PIN   0x4 /* pages must be released via unpin_user_page */
 #define FOLL_FAST_ONLY 0x8 /* gup_fast: prevent fall-back to slow gup */
+#define FOLL_PCI_P2PDMA0x10 /* allow returning PCI P2PDMA pages */
 
 /*
  * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
diff --git a/mm/gup.c b/mm/gup.c
index 2c51e9748a6a..c31461c3d256 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -527,6 +527,12 @@ static struct page *follow_page_pte(struct vm_area_struct 
*vma,
page = pte_page(pte);
else
goto no_page;
+
+   if (unlikely(!(flags & FOLL_PCI_P2PDMA) &&
+is_pci_p2pdma_page(page))) {
+   page = ERR_PTR(-EREMOTEIO);
+   goto out;
+   }
} else if (unlikely(!page)) {
if (flags & FOLL_DUMP) {
/* Avoid special (like zero) pages in core dumps */
@@ -980,6 +986,9 @@ static int check_vma_flags(struct vm_area_struct *vma, 
unsigned long gup_flags)
if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma))
return -EOPNOTSUPP;
 
+   if ((gup_flags & FOLL_LONGTERM) && (gup_flags & FOLL_PCI_P2PDMA))
+   return -EOPNOTSUPP;
+
if (vma_is_secretmem(vma))
return -EFAULT;
 
@@ -2297,6 +2306,10 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, 
unsigned long end,
VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
page = pte_page(pte);
 
+   if (unlikely(pte_devmap(pte) && !(flags & FOLL_PCI_P2PDMA) &&
+is_pci_p2pdma_page(page)))
+   goto pte_unmap;
+
head = try_grab_compound_head(page, 1, flags);
if (!head)
goto pte_unmap;
@@ -2374,6 +2387,12 @@ static int __gup_device_huge(unsigned long pfn, unsigned 
long addr,
undo_dev_pagemap(nr, nr_start, flags, pages);
break;
}
+
+   if (!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)) {
+   undo_dev_pagemap(nr, nr_start, flags, pages);
+   break;
+   }
+
SetPageReferenced(page);
pages[*nr] = page;
if (unlikely(!try_grab_page(page, flags))) {
@@ -2842,7 +2861,8 @@ static int internal_get_user_pages_fast(unsigned long 
start,
 
if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
   FOLL_FORCE | FOLL_PIN | FOLL_GET |
-  FOLL_FAST_ONLY | FOLL_NOFAULT)))
+  FOLL_FAST_ONLY | FOLL_NOFAULT |
+  FOLL_PCI_P2PDMA)))
return -EINVAL;
 
if (gup_flags & FOLL_PIN)
-- 
2.30.2



[PATCH v4 22/23] PCI/P2PDMA: Introduce pci_mmap_p2pmem()

2021-11-17 Thread Logan Gunthorpe
Introduce pci_mmap_p2pmem() which is a helper to allocate and mmap
a hunk of p2pmem into userspace.

Pages are allocated from the genalloc in bulk and their reference count
incremented. They are returned to the genalloc when the page is put.

The VMA does not take a reference to the pages when they are inserted
with vmf_insert_mixed() (which is necessary for zone device pages) so
the backing P2P memory is stored in a structure in vm_private_data.

A pseudo mount is used to allocate an inode for each PCI device. The
inode's address_space is used in the file doing the mmap so that all
VMAs are collected and can be unmapped if the PCI device is unbound.
After unmapping, the VMAs are iterated through and their pages are
put so the device can continue to be unbound. An active flag is used
to signal to VMAs not to allocate any further P2P memory once the
removal process starts. The flag is synchronized against concurrent
access with RCU.

The VMAs and inode will survive after the unbind of the device, but no
pages will be present in the VMA and a subsequent access will result
in a SIGBUS error.
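
A hedged sketch of the RCU-synchronized active check described above
(hypothetical helper; the series performs an equivalent check in its
allocation path):

	static bool p2pdma_is_active(struct pci_dev *pdev)
	{
		struct pci_p2pdma *p2pdma;
		bool active = false;

		rcu_read_lock();
		p2pdma = rcu_dereference(pdev->p2pdma);
		if (p2pdma)
			active = p2pdma->active;
		rcu_read_unlock();

		return active;
	}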

Signed-off-by: Logan Gunthorpe 
Acked-by: Bjorn Helgaas 
---
 drivers/pci/p2pdma.c   | 301 -
 include/linux/pci-p2pdma.h |  11 ++
 include/uapi/linux/magic.h |   1 +
 3 files changed, 311 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 16992b0f0c36..641a7808a527 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -17,14 +17,19 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
+#include 
 
 struct pci_p2pdma {
struct gen_pool *pool;
bool p2pmem_published;
struct xarray map_types;
+   struct inode *inode;
+   bool active;
 };
 
 struct pci_p2pdma_pagemap {
@@ -33,6 +38,15 @@ struct pci_p2pdma_pagemap {
u64 bus_offset;
 };
 
+struct pci_p2pdma_map {
+   struct kref ref;
+   struct rcu_head rcu;
+   struct pci_dev *pdev;
+   struct inode *inode;
+   void *kaddr;
+   size_t len;
+};
+
 static struct pci_p2pdma_pagemap *to_p2p_pgmap(struct dev_pagemap *pgmap)
 {
return container_of(pgmap, struct pci_p2pdma_pagemap, pgmap);
@@ -101,6 +115,26 @@ static const struct attribute_group p2pmem_group = {
.name = "p2pmem",
 };
 
+/*
+ * P2PDMA internal mount
+ * Fake an internal VFS mount-point in order to allocate struct address_space
+ * mappings to remove VMAs on unbind events.
+ */
+static int pci_p2pdma_fs_cnt;
+static struct vfsmount *pci_p2pdma_fs_mnt;
+
+static int pci_p2pdma_fs_init_fs_context(struct fs_context *fc)
+{
+   return init_pseudo(fc, P2PDMA_MAGIC) ? 0 : -ENOMEM;
+}
+
+static struct file_system_type pci_p2pdma_fs_type = {
+   .name = "p2dma",
+   .owner = THIS_MODULE,
+   .init_fs_context = pci_p2pdma_fs_init_fs_context,
+   .kill_sb = kill_anon_super,
+};
+
 static void p2pdma_page_free(struct page *page)
 {
struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(page->pgmap);
@@ -129,6 +163,9 @@ static void pci_p2pdma_release(void *data)
gen_pool_destroy(p2pdma->pool);
	sysfs_remove_group(&pdev->dev.kobj, &p2pmem_group);
	xa_destroy(&p2pdma->map_types);
+
+   iput(p2pdma->inode);
+   simple_release_fs(&pci_p2pdma_fs_mnt, &pci_p2pdma_fs_cnt);
 }
 
 static int pci_p2pdma_setup(struct pci_dev *pdev)
@@ -146,17 +183,32 @@ static int pci_p2pdma_setup(struct pci_dev *pdev)
if (!p2p->pool)
goto out;
 
-   error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev);
+   error = simple_pin_fs(&pci_p2pdma_fs_type, &pci_p2pdma_fs_mnt,
+ &pci_p2pdma_fs_cnt);
if (error)
goto out_pool_destroy;
 
+   p2p->inode = alloc_anon_inode(pci_p2pdma_fs_mnt->mnt_sb);
+   if (IS_ERR(p2p->inode)) {
+   error = -ENOMEM;
+   goto out_unpin_fs;
+   }
+
+   error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev);
+   if (error)
+   goto out_put_inode;
+
	error = sysfs_create_group(&pdev->dev.kobj, &p2pmem_group);
if (error)
-   goto out_pool_destroy;
+   goto out_put_inode;
 
rcu_assign_pointer(pdev->p2pdma, p2p);
return 0;
 
+out_put_inode:
+   iput(p2p->inode);
+out_unpin_fs:
+   simple_release_fs(&pci_p2pdma_fs_mnt, &pci_p2pdma_fs_cnt);
 out_pool_destroy:
gen_pool_destroy(p2p->pool);
 out:
@@ -164,6 +216,54 @@ static int pci_p2pdma_setup(struct pci_dev *pdev)
return error;
 }
 
+static void pci_p2pdma_map_free_pages(struct pci_p2pdma_map *pmap)
+{
+   int i;
+
+   if (!pmap->kaddr)
+   return;
+
+   for (i = 0; i < pmap->len; i += PAGE_SIZE)
+   put_page(virt_to_page(pmap->kaddr + i));
+
+   pmap->kaddr = NULL;
+}
+
+static void pci_p2pdma_free_mappings(struct address_space *mapping)
+{
+   struct vm_area_struct *vma;
+
+   i_mmap_lock_write(mapping);
+ 

[PATCH v4 13/23] RDMA/rw: drop pci_p2pdma_[un]map_sg()

2021-11-17 Thread Logan Gunthorpe
dma_map_sg() now supports the use of P2PDMA pages so pci_p2pdma_map_sg()
is no longer necessary and may be dropped. This means the
rdma_rw_[un]map_sg() helpers are no longer necessary. Remove it all.

Signed-off-by: Logan Gunthorpe 
Reviewed-by: Jason Gunthorpe 
---
 drivers/infiniband/core/rw.c | 45 
 1 file changed, 9 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 5a3bd41b331c..d4517b68d1ca 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -273,33 +273,6 @@ static int rdma_rw_init_single_wr(struct rdma_rw_ctx *ctx, 
struct ib_qp *qp,
return 1;
 }
 
-static void rdma_rw_unmap_sg(struct ib_device *dev, struct scatterlist *sg,
-u32 sg_cnt, enum dma_data_direction dir)
-{
-   if (is_pci_p2pdma_page(sg_page(sg)))
-   pci_p2pdma_unmap_sg(dev->dma_device, sg, sg_cnt, dir);
-   else
-   ib_dma_unmap_sg(dev, sg, sg_cnt, dir);
-}
-
-static int rdma_rw_map_sgtable(struct ib_device *dev, struct sg_table *sgt,
-  enum dma_data_direction dir)
-{
-   int nents;
-
-   if (is_pci_p2pdma_page(sg_page(sgt->sgl))) {
-   if (WARN_ON_ONCE(ib_uses_virt_dma(dev)))
-   return 0;
-   nents = pci_p2pdma_map_sg(dev->dma_device, sgt->sgl,
- sgt->orig_nents, dir);
-   if (!nents)
-   return -EIO;
-   sgt->nents = nents;
-   return 0;
-   }
-   return ib_dma_map_sgtable_attrs(dev, sgt, dir, 0);
-}
-
 /**
  * rdma_rw_ctx_init - initialize a RDMA READ/WRITE context
  * @ctx:   context to initialize
@@ -326,7 +299,7 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp 
*qp, u32 port_num,
};
int ret;
 
-   ret = rdma_rw_map_sgtable(dev, &sgt, dir);
+   ret = ib_dma_map_sgtable_attrs(dev, &sgt, dir, 0);
if (ret)
return ret;
sg_cnt = sgt.nents;
@@ -365,7 +338,7 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp 
*qp, u32 port_num,
return ret;
 
 out_unmap_sg:
-   rdma_rw_unmap_sg(dev, sgt.sgl, sgt.orig_nents, dir);
+   ib_dma_unmap_sgtable_attrs(dev, &sgt, dir, 0);
return ret;
 }
 EXPORT_SYMBOL(rdma_rw_ctx_init);
@@ -413,12 +386,12 @@ int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, 
struct ib_qp *qp,
return -EINVAL;
}
 
-   ret = rdma_rw_map_sgtable(dev, &sgt, dir);
+   ret = ib_dma_map_sgtable_attrs(dev, &sgt, dir, 0);
if (ret)
return ret;
 
if (prot_sg_cnt) {
-   ret = rdma_rw_map_sgtable(dev, &prot_sgt, dir);
+   ret = ib_dma_map_sgtable_attrs(dev, &prot_sgt, dir, 0);
if (ret)
goto out_unmap_sg;
}
@@ -485,9 +458,9 @@ int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, 
struct ib_qp *qp,
kfree(ctx->reg);
 out_unmap_prot_sg:
if (prot_sgt.nents)
-   rdma_rw_unmap_sg(dev, prot_sgt.sgl, prot_sgt.orig_nents, dir);
+   ib_dma_unmap_sgtable_attrs(dev, &prot_sgt, dir, 0);
 out_unmap_sg:
-   rdma_rw_unmap_sg(dev, sgt.sgl, sgt.orig_nents, dir);
+   ib_dma_unmap_sgtable_attrs(dev, &sgt, dir, 0);
return ret;
 }
 EXPORT_SYMBOL(rdma_rw_ctx_signature_init);
@@ -620,7 +593,7 @@ void rdma_rw_ctx_destroy(struct rdma_rw_ctx *ctx, struct 
ib_qp *qp,
break;
}
 
-   rdma_rw_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
+   ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
 }
 EXPORT_SYMBOL(rdma_rw_ctx_destroy);
 
@@ -648,8 +621,8 @@ void rdma_rw_ctx_destroy_signature(struct rdma_rw_ctx *ctx, 
struct ib_qp *qp,
kfree(ctx->reg);
 
if (prot_sg_cnt)
-   rdma_rw_unmap_sg(qp->pd->device, prot_sg, prot_sg_cnt, dir);
-   rdma_rw_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
+   ib_dma_unmap_sg(qp->pd->device, prot_sg, prot_sg_cnt, dir);
+   ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
 }
 EXPORT_SYMBOL(rdma_rw_ctx_destroy_signature);
 
-- 
2.30.2



[PATCH v4 04/23] PCI/P2PDMA: Expose pci_p2pdma_map_type()

2021-11-17 Thread Logan Gunthorpe
pci_p2pdma_map_type() will be needed by the dma-iommu map_sg
implementation because it will need to determine the mapping type
ahead of actually doing the mapping to create the actual IOMMU mapping.

Prototypes for this helper are added to dma-map-ops.h as it is only
useful to dma map implementations and doesn't need to pollute the public
pci-p2pdma header.

Signed-off-by: Logan Gunthorpe 
Acked-by: Bjorn Helgaas 
Reviewed-by: Jason Gunthorpe 
---
 drivers/pci/p2pdma.c| 25 +
 include/linux/dma-map-ops.h | 45 +
 2 files changed, 61 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 9a39c2c307ab..02a13a5ac680 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -10,6 +10,7 @@
 
 #define pr_fmt(fmt) "pci-p2pdma: " fmt
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -20,13 +21,6 @@
 #include 
 #include 
 
-enum pci_p2pdma_map_type {
-   PCI_P2PDMA_MAP_UNKNOWN = 0,
-   PCI_P2PDMA_MAP_NOT_SUPPORTED,
-   PCI_P2PDMA_MAP_BUS_ADDR,
-   PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
-};
-
 struct pci_p2pdma {
struct gen_pool *pool;
bool p2pmem_published;
@@ -841,8 +835,21 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
 }
 EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
 
-static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
-   struct device *dev)
+/**
+ * pci_p2pdma_map_type - return the type of mapping that should be used for
+ * a given device and pgmap
+ * @pgmap: the pagemap of a page to determine the mapping type for
+ * @dev: device that is mapping the page
+ *
+ * Returns one of:
+ * PCI_P2PDMA_MAP_NOT_SUPPORTED - The mapping should not be done
+ * PCI_P2PDMA_MAP_BUS_ADDR - The mapping should use the PCI bus address
+ * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE - The mapping should be done normally
+ * using the CPU physical address (in dma-direct) or an IOVA
+ * mapping for the IOMMU.
+ */
+enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
+struct device *dev)
 {
enum pci_p2pdma_map_type type = PCI_P2PDMA_MAP_NOT_SUPPORTED;
struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 0d5b06b3a4a6..d693a0e33bac 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -379,4 +379,49 @@ static inline void debug_dma_dump_mappings(struct device 
*dev)
 
 extern const struct dma_map_ops dma_dummy_ops;
 
+enum pci_p2pdma_map_type {
+   /*
+* PCI_P2PDMA_MAP_UNKNOWN: Used internally for indicating the mapping
+* type hasn't been calculated yet. Functions that return this enum
+* never return this value.
+*/
+   PCI_P2PDMA_MAP_UNKNOWN = 0,
+
+   /*
+* PCI_P2PDMA_MAP_NOT_SUPPORTED: Indicates the transaction will
+* traverse the host bridge and the host bridge is not in the
+* allowlist. DMA Mapping routines should return an error when
+* this is returned.
+*/
+   PCI_P2PDMA_MAP_NOT_SUPPORTED,
+
+   /*
+* PCI_P2PDMA_BUS_ADDR: Indicates that two devices can talk to
+* each other directly through a PCI switch and the transaction will
+* not traverse the host bridge. Such a mapping should program
+* the DMA engine with PCI bus addresses.
+*/
+   PCI_P2PDMA_MAP_BUS_ADDR,
+
+   /*
+* PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: Indicates two devices can talk
+* to each other, but the transaction traverses a host bridge on the
+* allowlist. In this case, a normal mapping either with CPU physical
+* addresses (in the case of dma-direct) or IOVA addresses (in the
+* case of IOMMUs) should be used to program the DMA engine.
+*/
+   PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
+};
+
+#ifdef CONFIG_PCI_P2PDMA
+enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
+struct device *dev);
+#else /* CONFIG_PCI_P2PDMA */
+static inline enum pci_p2pdma_map_type
+pci_p2pdma_map_type(struct dev_pagemap *pgmap, struct device *dev)
+{
+   return PCI_P2PDMA_MAP_NOT_SUPPORTED;
+}
+#endif /* CONFIG_PCI_P2PDMA */
+
 #endif /* _LINUX_DMA_MAP_OPS_H */
-- 
2.30.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 19/23] block: set FOLL_PCI_P2PDMA in __bio_iov_iter_get_pages()

2021-11-17 Thread Logan Gunthorpe
When a bio's queue supports PCI P2PDMA, set FOLL_PCI_P2PDMA for
iov_iter_get_pages_flags(). This allows PCI P2PDMA pages to be passed
from userspace and enables the O_DIRECT path in iomap based filesystems
and direct to block devices.
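
(Editor's aside: for other GUP callers the opt-in would look roughly
like the sketch below, assuming the FOLL_PCI_P2PDMA flag added earlier
in this series; whether a given pin_user_pages variant honors the flag
is an assumption here:)

	unsigned int gup_flags = FOLL_WRITE | FOLL_PCI_P2PDMA;
	int pinned = pin_user_pages_fast(addr, nr_pages, gup_flags, pages);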

Signed-off-by: Logan Gunthorpe 
---
 block/bio.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/bio.c b/block/bio.c
index f4e2e30d7a24..f0a17c7f41c3 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1096,6 +1096,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, 
struct iov_iter *iter)
struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
struct page **pages = (struct page **)bv;
bool same_page = false;
+   unsigned int flags = 0;
ssize_t size, left;
unsigned len, i;
size_t offset;
@@ -1108,7 +1109,12 @@ static int __bio_iov_iter_get_pages(struct bio *bio, 
struct iov_iter *iter)
BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
 
-   size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
+   if (bio->bi_bdev && bio->bi_bdev->bd_disk &&
+   blk_queue_pci_p2pdma(bio->bi_bdev->bd_disk->queue))
+   flags |= FOLL_PCI_P2PDMA;
+
+   size = iov_iter_get_pages_flags(iter, pages, LONG_MAX, nr_pages,
+   &offset, flags);
if (unlikely(size <= 0))
return size ? size : -EFAULT;
 
-- 
2.30.2



[PATCH v4 03/23] PCI/P2PDMA: Attempt to set map_type if it has not been set

2021-11-17 Thread Logan Gunthorpe
Attempt to find the mapping type for P2PDMA pages on the first
DMA map attempt if it has not been done ahead of time.

Previously, the mapping type was expected to be calculated ahead of
time, but if pages are to come from userspace then there's no
way to ensure the path was checked ahead of time.

This change calculates the mapping type on the first map attempt if it
hasn't been pre-calculated, so it is no longer invalid to call
pci_p2pdma_map_sg() before the mapping type is determined; drop the
WARN_ON for that case.

Signed-off-by: Logan Gunthorpe 
Acked-by: Bjorn Helgaas 
---
 drivers/pci/p2pdma.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 8d47cb7218d1..9a39c2c307ab 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -848,6 +848,7 @@ static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct 
dev_pagemap *pgmap,
struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
struct pci_dev *client;
struct pci_p2pdma *p2pdma;
+   int dist;
 
if (!provider->p2pdma)
return PCI_P2PDMA_MAP_NOT_SUPPORTED;
@@ -864,6 +865,10 @@ static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct 
dev_pagemap *pgmap,
	type = xa_to_value(xa_load(&p2pdma->map_types,
   map_types_idx(client)));
rcu_read_unlock();
+
+   if (type == PCI_P2PDMA_MAP_UNKNOWN)
+   return calc_map_type_and_dist(provider, client, &dist, true);
+
return type;
 }
 
@@ -906,7 +911,6 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct 
scatterlist *sg,
case PCI_P2PDMA_MAP_BUS_ADDR:
return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents);
default:
-   WARN_ON_ONCE(1);
return 0;
}
 }
-- 
2.30.2



[PATCH v4 02/23] lib/scatterlist: add flag for indicating P2PDMA segments in an SGL

2021-11-17 Thread Logan Gunthorpe
Make use of the third free LSB in scatterlist's page_link on 64bit systems.

The extra bit will be used by dma_[un]map_sg_p2pdma() to determine when a
given SGL segment's dma_address points to a PCI bus address.
dma_unmap_sg_p2pdma() will need to perform different cleanup when a
segment is marked as a bus address.

Create a CONFIG_NEED_SG_DMA_BUS_ADDR_FLAG bool which depends on
CONFIG_64BIT (so there is space in the page link for the new flag).
CONFIG_PCI_P2PDMA will then select this, which means PCI P2PDMA will
require CONFIG_64BIT. This should be acceptable as the majority of P2PDMA
use cases are restricted to newer root complexes and roughly require the
extra address space for memory BARs used in the transactions.
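
(Editor's sketch of the intended consumption, using the helpers added
below; the unmap loop itself is illustrative, not part of this patch:)

	struct scatterlist *sg;
	int i;

	for_each_sg(sgl, sg, nents, i) {
		if (sg_is_dma_bus_address(sg)) {
			/* bus addresses were never mapped; just clear the mark */
			sg_dma_unmark_bus_address(sg);
			continue;
		}
		/* ... unmap this segment through the IOMMU as usual ... */
	}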

Signed-off-by: Logan Gunthorpe 
---
 drivers/pci/Kconfig |  5 +
 include/linux/scatterlist.h | 44 -
 kernel/dma/Kconfig  | 10 +
 3 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 43e615aa12ff..95f29601a4df 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -164,6 +164,11 @@ config PCI_PASID
 config PCI_P2PDMA
bool "PCI peer-to-peer transfer support"
depends on ZONE_DEVICE
+   #
+   # The need for the scatterlist DMA bus address flag means PCI P2PDMA
+   # requires 64bit
+   #
+   select NEED_SG_DMA_BUS_ADDR_FLAG
select GENERIC_ALLOCATOR
help
  Enables drivers to do PCI peer-to-peer transactions to and from
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 7ff9d6386c12..917c09dcc566 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -64,12 +64,24 @@ struct sg_append_table {
 #define SG_CHAIN   0x01UL
 #define SG_END 0x02UL
 
+/*
+ * bit 2 is the third free bit in the page_link on 64bit systems which
+ * is used by dma_unmap_sg() to determine if the dma_address is a
+ * bus address when doing P2PDMA.
+ */
+#ifdef CONFIG_NEED_SG_DMA_BUS_ADDR_FLAG
+#define SG_DMA_BUS_ADDRESS 0x04UL
+static_assert(__alignof__(struct page) >= 8);
+#else
+#define SG_DMA_BUS_ADDRESS 0x00UL
+#endif
+
 /*
  * We overload the LSB of the page pointer to indicate whether it's
  * a valid sg entry, or whether it points to the start of a new scatterlist.
  * Those low bits are there for everyone! (thanks mason :-)
  */
-#define SG_PAGE_LINK_MASK (SG_CHAIN | SG_END)
+#define SG_PAGE_LINK_MASK (SG_CHAIN | SG_END | SG_DMA_BUS_ADDRESS)
 
 static inline unsigned int __sg_flags(struct scatterlist *sg)
 {
@@ -91,6 +103,11 @@ static inline bool sg_is_last(struct scatterlist *sg)
return __sg_flags(sg) & SG_END;
 }
 
+static inline bool sg_is_dma_bus_address(struct scatterlist *sg)
+{
+   return __sg_flags(sg) & SG_DMA_BUS_ADDRESS;
+}
+
 /**
  * sg_assign_page - Assign a given page to an SG entry
  * @sg:SG entry
@@ -245,6 +262,31 @@ static inline void sg_unmark_end(struct scatterlist *sg)
sg->page_link &= ~SG_END;
 }
 
+/**
+ * sg_dma_mark_bus_address - Mark the scatterlist entry as a bus address
+ * @sg: SG entry
+ *
+ * Description:
+ *   Marks the passed in sg entry to indicate that the dma_address is
+ *   a bus address and doesn't need to be unmapped.
+ **/
+static inline void sg_dma_mark_bus_address(struct scatterlist *sg)
+{
+   sg->page_link |= SG_DMA_BUS_ADDRESS;
+}
+
+/**
+ * sg_dma_unmark_bus_address - Unmark the scatterlist entry as a bus address
+ * @sg: SG entry
+ *
+ * Description:
+ *   Clears the bus address mark.
+ **/
+static inline void sg_dma_unmark_bus_address(struct scatterlist *sg)
+{
+   sg->page_link &= ~SG_DMA_BUS_ADDRESS;
+}
+
 /**
  * sg_phys - Return physical address of an sg entry
  * @sg: SG entry
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 1b02179758cb..6e5e1d8e1329 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -27,6 +27,16 @@ config ARCH_HAS_DMA_MAP_DIRECT
 config NEED_SG_DMA_LENGTH
bool
 
+#
+# PCI P2PDMA needs to store bus addresses in the SGL's dma_address so that the
+# dma_unmap_sg() implementations can know not to unmap those segments.
+# The flag is stored in the 3rd bit in the page_link field in the SGL
+# which means this can only be done on 64bit systems.
+#
+config NEED_SG_DMA_BUS_ADDR_FLAG
+   depends on 64BIT
+   bool
+
 config NEED_DMA_MAP_STATE
bool
 
-- 
2.30.2


[PATCH v4 14/23] PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg()

2021-11-17 Thread Logan Gunthorpe
This interface is superseded by support in dma_map_sg() which now supports
heterogeneous scatterlists. There are no longer any users, so remove it.
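
(Editor's note: the replacement pattern, shown for orientation only, is
the regular scatterlist API; the dma-mapping core now deals with any
P2PDMA segments internally:)

	nents = dma_map_sg_attrs(dev, sgl, count, dir, 0);
	/* ... issue DMA ... */
	dma_unmap_sg_attrs(dev, sgl, count, dir, 0);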

Signed-off-by: Logan Gunthorpe 
Acked-by: Bjorn Helgaas 
Reviewed-by: Jason Gunthorpe 
Reviewed-by: Max Gurtovoy 
---
 drivers/pci/p2pdma.c   | 65 --
 include/linux/pci-p2pdma.h | 27 
 2 files changed, 92 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 6ad3a8816677..563e9be9599e 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -879,71 +879,6 @@ enum pci_p2pdma_map_type pci_p2pdma_map_type(struct 
dev_pagemap *pgmap,
return type;
 }
 
-static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
-   struct device *dev, struct scatterlist *sg, int nents)
-{
-   struct scatterlist *s;
-   int i;
-
-   for_each_sg(sg, s, nents, i) {
-   s->dma_address = sg_phys(s) + p2p_pgmap->bus_offset;
-   sg_dma_len(s) = s->length;
-   }
-
-   return nents;
-}
-
-/**
- * pci_p2pdma_map_sg_attrs - map a PCI peer-to-peer scatterlist for DMA
- * @dev: device doing the DMA request
- * @sg: scatter list to map
- * @nents: elements in the scatterlist
- * @dir: DMA direction
- * @attrs: DMA attributes passed to dma_map_sg() (if called)
- *
- * Scatterlists mapped with this function should be unmapped using
- * pci_p2pdma_unmap_sg_attrs().
- *
- * Returns the number of SG entries mapped or 0 on error.
- */
-int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
-   int nents, enum dma_data_direction dir, unsigned long attrs)
-{
-   struct pci_p2pdma_pagemap *p2p_pgmap =
-   to_p2p_pgmap(sg_page(sg)->pgmap);
-
-   switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev)) {
-   case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
-   return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
-   case PCI_P2PDMA_MAP_BUS_ADDR:
-   return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents);
-   default:
-   return 0;
-   }
-}
-EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg_attrs);
-
-/**
- * pci_p2pdma_unmap_sg_attrs - unmap a PCI peer-to-peer scatterlist that was
- * mapped with pci_p2pdma_map_sg()
- * @dev: device doing the DMA request
- * @sg: scatter list to map
- * @nents: number of elements returned by pci_p2pdma_map_sg()
- * @dir: DMA direction
- * @attrs: DMA attributes passed to dma_unmap_sg() (if called)
- */
-void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
-   int nents, enum dma_data_direction dir, unsigned long attrs)
-{
-   enum pci_p2pdma_map_type map_type;
-
-   map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev);
-
-   if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE)
-   dma_unmap_sg_attrs(dev, sg, nents, dir, attrs);
-}
-EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
-
 /**
  * pci_p2pdma_map_segment - map an sg segment determining the mapping type
  * @state: State structure that should be declared outside of the for_each_sg()
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index 8318a97c9c61..2c07aa6b7665 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -30,10 +30,6 @@ struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev 
*pdev,
 unsigned int *nents, u32 length);
 void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
 void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
-int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
-   int nents, enum dma_data_direction dir, unsigned long attrs);
-void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
-   int nents, enum dma_data_direction dir, unsigned long attrs);
 int pci_p2pdma_enable_store(const char *page, struct pci_dev **p2p_dev,
bool *use_p2pdma);
 ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev,
@@ -83,17 +79,6 @@ static inline void pci_p2pmem_free_sgl(struct pci_dev *pdev,
 static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
 {
 }
-static inline int pci_p2pdma_map_sg_attrs(struct device *dev,
-   struct scatterlist *sg, int nents, enum dma_data_direction dir,
-   unsigned long attrs)
-{
-   return 0;
-}
-static inline void pci_p2pdma_unmap_sg_attrs(struct device *dev,
-   struct scatterlist *sg, int nents, enum dma_data_direction dir,
-   unsigned long attrs)
-{
-}
 static inline int pci_p2pdma_enable_store(const char *page,
struct pci_dev **p2p_dev, bool *use_p2pdma)
 {
@@ -119,16 +104,4 @@ static inline struct pci_dev *pci_p2pmem_find(struct 
device *client)
	return pci_p2pmem_find_many(&client, 1);
 }
 
-static inline int pci_p2pdma_map_sg(struct device *dev, struct scatterlist *sg,
-   

[PATCH v4 17/23] block: add check when merging zone device pages

2021-11-17 Thread Logan Gunthorpe
Consecutive zone device pages should not be merged into the same sgl
or bvec segment with other types of pages or if they belong to different
pgmaps. Otherwise getting the pgmap of a given segment is not possible
without scanning the entire segment.

Add a helper to determine if zone device pages are mergeable and use
this helper in page_is_mergeable(). It returns true if both pages are
not zone device pages, or if both are zone device pages with the same
pgmap.

Signed-off-by: Logan Gunthorpe 
---
 block/bio.c|  2 ++
 include/linux/mm.h | 23 +++
 2 files changed, 25 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index 15ab0d6d1c06..f4e2e30d7a24 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -803,6 +803,8 @@ static inline bool page_is_mergeable(const struct bio_vec 
*bv,
return false;
if (xen_domain() && !xen_biovec_phys_mergeable(bv, page))
return false;
+   if (!zone_device_pages_are_mergeable(bv->bv_page, page))
+   return false;
 
*same_page = ((vec_end_addr & PAGE_MASK) == page_addr);
if (*same_page)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 65cb27cebbab..3367d936b256 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1118,6 +1118,24 @@ static inline bool is_zone_device_page(const struct page 
*page)
 {
return page_zonenum(page) == ZONE_DEVICE;
 }
+
+/*
+ * Consecutive zone device pages should not be merged into the same sgl
+ * or bvec segment with other types of pages or if they belong to different
+ * pgmaps. Otherwise getting the pgmap of a given segment is not possible
+ * without scanning the entire segment. This helper returns true either if
+ * both pages are not zone device pages or both pages are zone device pages
+ * with the same pgmap.
+ */
+static inline bool zone_device_pages_are_mergeable(const struct page *a,
+  const struct page *b)
+{
+   if (is_zone_device_page(a) != is_zone_device_page(b))
+   return false;
+   if (!is_zone_device_page(a))
+   return true;
+   return a->pgmap == b->pgmap;
+}
 extern void memmap_init_zone_device(struct zone *, unsigned long,
unsigned long, struct dev_pagemap *);
 #else
@@ -1125,6 +1143,11 @@ static inline bool is_zone_device_page(const struct page 
*page)
 {
return false;
 }
+static inline bool zone_device_pages_are_mergeable(const struct page *a,
+  const struct page *b)
+{
+   return true;
+}
 #endif
 
 static inline bool is_zone_movable_page(const struct page *page)
-- 
2.30.2



[PATCH v4 08/23] dma-mapping: add flags to dma_map_ops to indicate PCI P2PDMA support

2021-11-17 Thread Logan Gunthorpe
Add a flags member to the dma_map_ops structure with one flag to
indicate support for PCI P2PDMA.

Also, add a helper to check if a device supports PCI P2PDMA.
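
(Editor's sketch of the intended driver-side use; the fallback policy is
an illustration, not from this patch:)

	/* gate any P2PDMA buffer setup on the new capability check */
	if (!dma_pci_p2pdma_supported(&pdev->dev))
		return -EOPNOTSUPP;	/* or fall back to host memory */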

Signed-off-by: Logan Gunthorpe 
Reviewed-by: Jason Gunthorpe 
---
 include/linux/dma-map-ops.h | 10 ++
 include/linux/dma-mapping.h |  5 +
 kernel/dma/mapping.c| 18 ++
 3 files changed, 33 insertions(+)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 752f91e5eb5d..4d4161d58ce0 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -11,7 +11,17 @@
 
 struct cma;
 
+/*
+ * Values for struct dma_map_ops.flags:
+ *
+ * DMA_F_PCI_P2PDMA_SUPPORTED: Indicates the dma_map_ops implementation can
+ * handle PCI P2PDMA pages in the map_sg/unmap_sg operation.
+ */
+#define DMA_F_PCI_P2PDMA_SUPPORTED (1 << 0)
+
 struct dma_map_ops {
+   unsigned int flags;
+
void *(*alloc)(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp,
unsigned long attrs);
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index dca2b1355bb1..f7c61b2b4b5e 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -140,6 +140,7 @@ int dma_mmap_attrs(struct device *dev, struct 
vm_area_struct *vma,
unsigned long attrs);
 bool dma_can_mmap(struct device *dev);
 int dma_supported(struct device *dev, u64 mask);
+bool dma_pci_p2pdma_supported(struct device *dev);
 int dma_set_mask(struct device *dev, u64 mask);
 int dma_set_coherent_mask(struct device *dev, u64 mask);
 u64 dma_get_required_mask(struct device *dev);
@@ -250,6 +251,10 @@ static inline int dma_supported(struct device *dev, u64 
mask)
 {
return 0;
 }
+static inline bool dma_pci_p2pdma_supported(struct device *dev)
+{
+   return false;
+}
 static inline int dma_set_mask(struct device *dev, u64 mask)
 {
return -EIO;
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index c056a1468189..74858326ef94 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -724,6 +724,24 @@ int dma_supported(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_supported);
 
+bool dma_pci_p2pdma_supported(struct device *dev)
+{
+   const struct dma_map_ops *ops = get_dma_ops(dev);
+
+   /* if ops is not set, dma direct will be used which supports P2PDMA */
+   if (!ops)
+   return true;
+
+   /*
+* Note: dma_ops_bypass is not checked here because P2PDMA should
+* not be used with dma mapping ops that do not have support even
+* if the specific device is bypassing them.
+*/
+
+   return ops->flags & DMA_F_PCI_P2PDMA_SUPPORTED;
+}
+EXPORT_SYMBOL_GPL(dma_pci_p2pdma_supported);
+
 #ifdef CONFIG_ARCH_HAS_DMA_SET_MASK
 void arch_dma_set_mask(struct device *dev, u64 mask);
 #else
-- 
2.30.2



[PATCH v4 01/23] lib/scatterlist: cleanup macros into static inline functions

2021-11-17 Thread Logan Gunthorpe
Convert the sg_is_chain(), sg_is_last() and sg_chain_ptr() macros
into static inline functions. There's no reason for these to be macros
and static inlines are generally preferred these days.

Also introduce the SG_PAGE_LINK_MASK define so the P2PDMA work, which is
adding another bit to this mask, can do so more easily.

Suggested-by: Jason Gunthorpe 
Signed-off-by: Logan Gunthorpe 
---
 include/linux/scatterlist.h | 29 +++--
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 266754a55327..7ff9d6386c12 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -69,10 +69,27 @@ struct sg_append_table {
  * a valid sg entry, or whether it points to the start of a new scatterlist.
  * Those low bits are there for everyone! (thanks mason :-)
  */
-#define sg_is_chain(sg)((sg)->page_link & SG_CHAIN)
-#define sg_is_last(sg) ((sg)->page_link & SG_END)
-#define sg_chain_ptr(sg)   \
-   ((struct scatterlist *) ((sg)->page_link & ~(SG_CHAIN | SG_END)))
+#define SG_PAGE_LINK_MASK (SG_CHAIN | SG_END)
+
+static inline unsigned int __sg_flags(struct scatterlist *sg)
+{
+   return sg->page_link & SG_PAGE_LINK_MASK;
+}
+
+static inline struct scatterlist *sg_chain_ptr(struct scatterlist *sg)
+{
+   return (struct scatterlist *)(sg->page_link & ~SG_PAGE_LINK_MASK);
+}
+
+static inline bool sg_is_chain(struct scatterlist *sg)
+{
+   return __sg_flags(sg) & SG_CHAIN;
+}
+
+static inline bool sg_is_last(struct scatterlist *sg)
+{
+   return __sg_flags(sg) & SG_END;
+}
 
 /**
  * sg_assign_page - Assign a given page to an SG entry
@@ -92,7 +109,7 @@ static inline void sg_assign_page(struct scatterlist *sg, 
struct page *page)
 * In order for the low bit stealing approach to work, pages
 * must be aligned at a 32-bit boundary as a minimum.
 */
-   BUG_ON((unsigned long) page & (SG_CHAIN | SG_END));
+   BUG_ON((unsigned long)page & SG_PAGE_LINK_MASK);
 #ifdef CONFIG_DEBUG_SG
BUG_ON(sg_is_chain(sg));
 #endif
@@ -126,7 +143,7 @@ static inline struct page *sg_page(struct scatterlist *sg)
 #ifdef CONFIG_DEBUG_SG
BUG_ON(sg_is_chain(sg));
 #endif
-   return (struct page *)((sg)->page_link & ~(SG_CHAIN | SG_END));
+   return (struct page *)((sg)->page_link & ~SG_PAGE_LINK_MASK);
 }
 
 /**
-- 
2.30.2



[PATCH v4 23/23] nvme-pci: allow mmaping the CMB in userspace

2021-11-17 Thread Logan Gunthorpe
Allow userspace to obtain CMB memory by mmaping the controller's
char device. The mmap call allocates and returns a hunk of CMB memory
(the offset is ignored), so userspace does not have control over the
address within the CMB.

A VMA allocated in this way will only be usable by drivers that set
FOLL_PCI_P2PDMA when calling GUP, and inter-device support will be
checked the first time the pages are mapped for DMA.

Currently this is only supported by O_DIRECT to a PCI NVMe device
or through the NVMe passthrough IOCTL.
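
(Editor's userspace sketch; the device path and length are illustrative,
not part of the patch:)

	#include <fcntl.h>
	#include <stddef.h>
	#include <sys/mman.h>

	static void *map_cmb(size_t len)
	{
		int fd = open("/dev/nvme0", O_RDWR);

		if (fd < 0)
			return NULL;
		/* the offset is ignored; the kernel picks the CMB address */
		return mmap(NULL, len, PROT_READ | PROT_WRITE,
			    MAP_SHARED, fd, 0);
	}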

Signed-off-by: Logan Gunthorpe 
---
 drivers/nvme/host/core.c | 15 +++
 drivers/nvme/host/nvme.h |  2 ++
 drivers/nvme/host/pci.c  | 18 ++
 3 files changed, 35 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 344414351314..39ad592cacdc 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3111,6 +3111,10 @@ static int nvme_dev_open(struct inode *inode, struct 
file *file)
}
 
file->private_data = ctrl;
+
+   if (ctrl->ops->mmap_file_open)
+   ctrl->ops->mmap_file_open(ctrl, file);
+
return 0;
 }
 
@@ -3124,12 +3128,23 @@ static int nvme_dev_release(struct inode *inode, struct 
file *file)
return 0;
 }
 
+static int nvme_dev_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   struct nvme_ctrl *ctrl = file->private_data;
+
+   if (!ctrl->ops->mmap_cmb)
+   return -ENODEV;
+
+   return ctrl->ops->mmap_cmb(ctrl, vma);
+}
+
 static const struct file_operations nvme_dev_fops = {
.owner  = THIS_MODULE,
.open   = nvme_dev_open,
.release= nvme_dev_release,
.unlocked_ioctl = nvme_dev_ioctl,
.compat_ioctl   = compat_ptr_ioctl,
+   .mmap   = nvme_dev_mmap,
 };
 
 static ssize_t nvme_sysfs_reset(struct device *dev,
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a9f60b12a32b..5fdc1a2027e9 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -494,6 +494,8 @@ struct nvme_ctrl_ops {
void (*delete_ctrl)(struct nvme_ctrl *ctrl);
int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size);
bool (*supports_pci_p2pdma)(struct nvme_ctrl *ctrl);
+   void (*mmap_file_open)(struct nvme_ctrl *ctrl, struct file *file);
+   int (*mmap_cmb)(struct nvme_ctrl *ctrl, struct vm_area_struct *vma);
 };
 
 /*
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 3f2bd1efe076..05d6e7284000 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2896,6 +2896,22 @@ static bool nvme_pci_supports_pci_p2pdma(struct 
nvme_ctrl *ctrl)
return dma_pci_p2pdma_supported(dev->dev);
 }
 
+static void nvme_pci_mmap_file_open(struct nvme_ctrl *ctrl,
+   struct file *file)
+{
+   struct pci_dev *pdev = to_pci_dev(to_nvme_dev(ctrl)->dev);
+
+   pci_p2pdma_mmap_file_open(pdev, file);
+}
+
+static int nvme_pci_mmap_cmb(struct nvme_ctrl *ctrl,
+struct vm_area_struct *vma)
+{
+   struct pci_dev *pdev = to_pci_dev(to_nvme_dev(ctrl)->dev);
+
+   return pci_mmap_p2pmem(pdev, vma);
+}
+
 static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
.name   = "pcie",
.module = THIS_MODULE,
@@ -2907,6 +2923,8 @@ static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
.submit_async_event = nvme_pci_submit_async_event,
.get_address= nvme_pci_get_address,
.supports_pci_p2pdma= nvme_pci_supports_pci_p2pdma,
+   .mmap_file_open = nvme_pci_mmap_file_open,
+   .mmap_cmb   = nvme_pci_mmap_cmb,
 };
 
 static int nvme_dev_map(struct nvme_dev *dev)
-- 
2.30.2



[PATCH v4 12/23] RDMA/core: introduce ib_dma_pci_p2p_dma_supported()

2021-11-17 Thread Logan Gunthorpe
Introduce the helper function ib_dma_pci_p2p_dma_supported() to check
if a given ib_device can be used in P2PDMA transfers. This ensures
the ib_device is not using virt_dma and also that the underlying
dma_device supports P2PDMA.

Use the new helper in nvme-rdma to replace the existing check for
ib_uses_virt_dma(). Adding the dma_pci_p2pdma_supported() check allows
switching away from pci_p2pdma_[un]map_sg().

Signed-off-by: Logan Gunthorpe 
Reviewed-by: Jason Gunthorpe 
Reviewed-by: Max Gurtovoy 
---
 drivers/nvme/target/rdma.c |  2 +-
 include/rdma/ib_verbs.h| 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 1deb4043e242..22519739a874 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -415,7 +415,7 @@ static int nvmet_rdma_alloc_rsp(struct nvmet_rdma_device 
*ndev,
if (ib_dma_mapping_error(ndev->device, r->send_sge.addr))
goto out_free_rsp;
 
-   if (!ib_uses_virt_dma(ndev->device))
+   if (ib_dma_pci_p2p_dma_supported(ndev->device))
		r->req.p2p_client = &ndev->device->dev;
r->send_sge.length = sizeof(*r->req.cqe);
r->send_sge.lkey = ndev->pd->local_dma_lkey;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 6e9ad656ecb7..6355a0d5fd00 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -4003,6 +4003,17 @@ static inline bool ib_uses_virt_dma(struct ib_device 
*dev)
return IS_ENABLED(CONFIG_INFINIBAND_VIRT_DMA) && !dev->dma_device;
 }
 
+/*
+ * Check if an IB device's underlying DMA mapping supports P2PDMA transfers.
+ */
+static inline bool ib_dma_pci_p2p_dma_supported(struct ib_device *dev)
+{
+   if (ib_uses_virt_dma(dev))
+   return false;
+
+   return dma_pci_p2pdma_supported(dev->dma_device);
+}
+
 /**
  * ib_dma_mapping_error - check a DMA addr for error
  * @dev: The device for which the dma_addr was created
-- 
2.30.2



[PATCH v4 18/23] lib/scatterlist: add check when merging zone device pages

2021-11-17 Thread Logan Gunthorpe
Consecutive zone device pages should not be merged into the same sgl
or bvec segment with other types of pages or if they belong to different
pgmaps. Otherwise getting the pgmap of a given segment is not possible
without scanning the entire segment.

Factor out the check for page mergeability into a pages_are_mergeable()
helper and add a check with zone_device_pages_are_mergeable().

Signed-off-by: Logan Gunthorpe 
---
 lib/scatterlist.c | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index d5e82e4a57ad..dc473010235c 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -410,6 +410,15 @@ static struct scatterlist *get_next_sg(struct 
sg_append_table *table,
return new_sg;
 }
 
+static bool pages_are_mergeable(struct page *a, struct page *b)
+{
+   if (page_to_pfn(a) != page_to_pfn(b) + 1)
+   return false;
+   if (!zone_device_pages_are_mergeable(a, b))
+   return false;
+   return true;
+}
+
 /**
  * sg_alloc_append_table_from_pages - Allocate and initialize an append sg
  *table from an array of pages
@@ -447,6 +456,7 @@ int sg_alloc_append_table_from_pages(struct sg_append_table 
*sgt_append,
unsigned int chunks, cur_page, seg_len, i, prv_len = 0;
unsigned int added_nents = 0;
struct scatterlist *s = sgt_append->prv;
+   struct page *last_pg;
 
/*
 * The algorithm below requires max_segment to be aligned to PAGE_SIZE
@@ -460,21 +470,17 @@ int sg_alloc_append_table_from_pages(struct 
sg_append_table *sgt_append,
return -EOPNOTSUPP;
 
if (sgt_append->prv) {
-   unsigned long paddr =
-   (page_to_pfn(sg_page(sgt_append->prv)) * PAGE_SIZE +
-sgt_append->prv->offset + sgt_append->prv->length) /
-   PAGE_SIZE;
-
if (WARN_ON(offset))
return -EINVAL;
 
/* Merge contiguous pages into the last SG */
prv_len = sgt_append->prv->length;
-   while (n_pages && page_to_pfn(pages[0]) == paddr) {
+   last_pg = sg_page(sgt_append->prv);
+   while (n_pages && pages_are_mergeable(last_pg, pages[0])) {
if (sgt_append->prv->length + PAGE_SIZE > max_segment)
break;
sgt_append->prv->length += PAGE_SIZE;
-   paddr++;
+   last_pg = pages[0];
pages++;
n_pages--;
}
@@ -488,7 +494,7 @@ int sg_alloc_append_table_from_pages(struct sg_append_table 
*sgt_append,
for (i = 1; i < n_pages; i++) {
seg_len += PAGE_SIZE;
if (seg_len >= max_segment ||
-   page_to_pfn(pages[i]) != page_to_pfn(pages[i - 1]) + 1) {
+   !pages_are_mergeable(pages[i], pages[i - 1])) {
chunks++;
seg_len = 0;
}
@@ -504,8 +510,7 @@ int sg_alloc_append_table_from_pages(struct sg_append_table 
*sgt_append,
for (j = cur_page + 1; j < n_pages; j++) {
seg_len += PAGE_SIZE;
if (seg_len >= max_segment ||
-   page_to_pfn(pages[j]) !=
-   page_to_pfn(pages[j - 1]) + 1)
+   !pages_are_mergeable(pages[j], pages[j - 1]))
break;
}
 
-- 
2.30.2



[PATCH 4/4] iommu: dart: Support t6000 variant

2021-11-17 Thread Sven Peter via iommu
The M1 Pro/Max SoCs come with a new variant of DART which supports a
larger physical address space with a slightly different PTE format.
Pass through the correct paddr address space size to the io-pgtable code
which will take care of the rest.

Signed-off-by: Sven Peter 
---
 drivers/iommu/apple-dart.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index 565ef5598811..c04648dfd747 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -81,10 +81,15 @@
 #define DART_TTBR_VALID BIT(31)
 #define DART_TTBR_SHIFT 12
 
+struct apple_dart_hw {
+   u32 oas;
+};
+
 /*
  * Private structure associated with each DART device.
  *
  * @dev: device struct
+ * @hw: SoC-specific hardware data
  * @regs: mapped MMIO region
  * @irq: interrupt number, can be shared with other DARTs
  * @clks: clocks associated with this DART
@@ -98,6 +103,7 @@
  */
 struct apple_dart {
struct device *dev;
+   const struct apple_dart_hw *hw;
 
void __iomem *regs;
 
@@ -421,7 +427,7 @@ static int apple_dart_finalize_domain(struct iommu_domain 
*domain,
pgtbl_cfg = (struct io_pgtable_cfg){
.pgsize_bitmap = dart->pgsize,
.ias = 32,
-   .oas = 36,
+   .oas = dart->hw->oas,
.coherent_walk = 1,
.iommu_dev = dart->dev,
};
@@ -855,6 +861,7 @@ static int apple_dart_probe(struct platform_device *pdev)
return -ENOMEM;
 
dart->dev = dev;
+   dart->hw = of_device_get_match_data(dev);
	spin_lock_init(&dart->lock);
 
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
@@ -944,8 +951,16 @@ static int apple_dart_remove(struct platform_device *pdev)
return 0;
 }
 
+static const struct apple_dart_hw apple_dart_hw_t8103 = {
+   .oas = 36,
+};
+static const struct apple_dart_hw apple_dart_hw_t6000 = {
+   .oas = 42,
+};
+
 static const struct of_device_id apple_dart_of_match[] = {
-   { .compatible = "apple,t8103-dart", .data = NULL },
+   { .compatible = "apple,t8103-dart", .data = _dart_hw_t8103 },
+   { .compatible = "apple,t6000-dart", .data = _dart_hw_t6000 },
{},
 };
 MODULE_DEVICE_TABLE(of, apple_dart_of_match);
-- 
2.25.1



[PATCH 1/4] dt-bindings: iommu: dart: add t6000 compatible

2021-11-17 Thread Sven Peter via iommu
The M1 Max/Pro SoCs come with a new DART variant that is incompatible with
the previous one. Add a new compatible for those.

Signed-off-by: Sven Peter 
---
 Documentation/devicetree/bindings/iommu/apple,dart.yaml | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/iommu/apple,dart.yaml 
b/Documentation/devicetree/bindings/iommu/apple,dart.yaml
index 94aa9e9afa59..ca2cbde9f3c9 100644
--- a/Documentation/devicetree/bindings/iommu/apple,dart.yaml
+++ b/Documentation/devicetree/bindings/iommu/apple,dart.yaml
@@ -22,7 +22,9 @@ description: |+
 
 properties:
   compatible:
-const: apple,t8103-dart
+enum:
+  - apple,t8103-dart
+  - apple,t6000-dart
 
   reg:
 maxItems: 1
-- 
2.25.1



[PATCH 3/4] iommu/io-pgtable: Add DART PTE support for t6000

2021-11-17 Thread Sven Peter via iommu
The DARTs present in the M1 Pro/Max SoC support a 42bit physical address
space by shifting the paddr and extending its mask inside the PTE.
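
(Editor's worked example of the encoding described above; the address is
made up but round-trips through the masks added below:)

	u64 paddr = 0x3c0dead0000ULL;			/* 42-bit, page aligned */
	u64 pte   = (paddr >> 4) & GENMASK_ULL(37, 10);	/* == 0x3c0dead000 */
	u64 back  = (pte & GENMASK_ULL(37, 10)) << 4;	/* == paddr again */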

Signed-off-by: Sven Peter 
---
 drivers/iommu/io-pgtable-arm.c | 30 +-
 include/linux/io-pgtable.h |  2 ++
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index a8c660b8b3e9..be66774aaf70 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -137,6 +137,11 @@
 #define APPLE_DART_PTE_SUBPAGE_START   GENMASK_ULL(63, 52)
 #define APPLE_DART_PTE_SUBPAGE_END GENMASK_ULL(51, 40)
 
+#define APPLE_DART_PADDR_MASK_PS_36BIT GENMASK_ULL(35, 12)
+#define APPLE_DART_PADDR_SHIFT_PS_36BIT(0)
+#define APPLE_DART_PADDR_MASK_PS_42BIT GENMASK_ULL(37, 10)
+#define APPLE_DART_PADDR_SHIFT_PS_42BIT(4)
+
 /* IOPTE accessors */
 #define iopte_deref(pte,d) __va(iopte_to_paddr(pte, d))
 
@@ -171,6 +176,13 @@ static arm_lpae_iopte paddr_to_iopte(phys_addr_t paddr,
 {
arm_lpae_iopte pte = paddr;
 
+   if (data->iop.fmt == APPLE_DART) {
+   pte = paddr >> data->iop.cfg.apple_dart_cfg.paddr_shift;
+   pte &= data->iop.cfg.apple_dart_cfg.paddr_mask;
+
+   return pte;
+   }
+
/* Of the bits which overlap, either 51:48 or 15:12 are always RES0 */
return (pte | (pte >> (48 - 12))) & ARM_LPAE_PTE_ADDR_MASK;
 }
@@ -180,6 +192,12 @@ static phys_addr_t iopte_to_paddr(arm_lpae_iopte pte,
 {
u64 paddr = pte & ARM_LPAE_PTE_ADDR_MASK;
 
+   if (data->iop.fmt == APPLE_DART) {
+   paddr = pte & data->iop.cfg.apple_dart_cfg.paddr_mask;
+   paddr <<= data->iop.cfg.apple_dart_cfg.paddr_shift;
+   return paddr;
+   }
+
if (ARM_LPAE_GRANULE(data) < SZ_64K)
return paddr;
 
@@ -1122,8 +1140,18 @@ apple_dart_alloc_pgtable(struct io_pgtable_cfg *cfg, 
void *cookie)
struct arm_lpae_io_pgtable *data;
int i;
 
-   if (cfg->oas > 36)
+   switch (cfg->oas) {
+   case 36:
+   cfg->apple_dart_cfg.paddr_shift = 
APPLE_DART_PADDR_SHIFT_PS_36BIT;
+   cfg->apple_dart_cfg.paddr_mask = APPLE_DART_PADDR_MASK_PS_36BIT;
+   break;
+   case 42:
+   cfg->apple_dart_cfg.paddr_shift = 
APPLE_DART_PADDR_SHIFT_PS_42BIT;
+   cfg->apple_dart_cfg.paddr_mask = APPLE_DART_PADDR_MASK_PS_42BIT;
+   break;
+   default:
return NULL;
+   }
 
data = arm_lpae_alloc_pgtable(cfg);
if (!data)
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 86af6f0a00a2..4e26ebb0be93 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -136,6 +136,8 @@ struct io_pgtable_cfg {
struct {
u64 ttbr[4];
u32 n_ttbrs;
+   u32 paddr_shift;
+   u64 paddr_mask;
} apple_dart_cfg;
};
 };
-- 
2.25.1



[PATCH 2/4] iommu/io-pgtable: Add DART subpage protection support

2021-11-17 Thread Sven Peter via iommu
DART allows exposing only a subpage to the device. While this is an
optional feature on the M1 DARTs, the new ones present on the Pro/Max
models require this field in every PTE.

Signed-off-by: Sven Peter 
---
 drivers/iommu/io-pgtable-arm.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index dd9e47189d0d..a8c660b8b3e9 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -10,6 +10,7 @@
 #define pr_fmt(fmt)"arm-lpae io-pgtable: " fmt
 
 #include 
+#include <linux/bitfield.h>
 #include 
 #include 
 #include 
@@ -133,6 +134,9 @@
 #define APPLE_DART_PTE_PROT_NO_WRITE (1<<7)
 #define APPLE_DART_PTE_PROT_NO_READ (1<<8)
 
+#define APPLE_DART_PTE_SUBPAGE_START   GENMASK_ULL(63, 52)
+#define APPLE_DART_PTE_SUBPAGE_END GENMASK_ULL(51, 40)
+
 /* IOPTE accessors */
 #define iopte_deref(pte,d) __va(iopte_to_paddr(pte, d))
 
@@ -273,6 +277,12 @@ static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable 
*data,
else
pte |= ARM_LPAE_PTE_TYPE_BLOCK;
 
+   if (data->iop.fmt == APPLE_DART) {
+   /* subpage protection: always allow access to the entire page */
+   pte |= FIELD_PREP(APPLE_DART_PTE_SUBPAGE_START, 0);
+   pte |= FIELD_PREP(APPLE_DART_PTE_SUBPAGE_END, 0xfff);
+   }
+
for (i = 0; i < num_entries; i++)
ptep[i] = pte | paddr_to_iopte(paddr + i * sz, data);
 
-- 
2.25.1



[PATCH 0/4] iommu: M1 Pro/Max DART support

2021-11-17 Thread Sven Peter via iommu
Hi,

This is a fairly brief series to add support for the DARTs present in the
M1 Pro/Max. They have two differences that make them incompatible with
those in the M1:

  - the physical addresses are shifted left by 4 bits and have 2 more
    bits inside the PTE entries
  - the subpage protection feature is now mandatory. For Linux we can
just configure it to always allow access to the entire page.

Note that this needs a fix to the core pagetable code. Hector already
sent a first version separately to the mailing list since the problem
is (at least in theory) also present on other SoCs using the LPAE format
with a large physical address space [1].

Sven

[1] 
https://lore.kernel.org/linux-iommu/a2b45243-7e0a-a2ac-4e14-5256a3e7a...@arm.com/T/#t

Sven Peter (4):
  dt-bindings: iommu: dart: add t6000 compatible
  iommu/io-pgtable: Add DART subpage protection support
  iommu/io-pgtable: Add DART PTE support for t6000
  iommu: dart: Support t6000 variant

 .../devicetree/bindings/iommu/apple,dart.yaml |  4 +-
 drivers/iommu/apple-dart.c| 19 -
 drivers/iommu/io-pgtable-arm.c| 40 ++-
 include/linux/io-pgtable.h|  2 +
 4 files changed, 61 insertions(+), 4 deletions(-)

-- 
2.25.1



Re: [PATCH] iommu/io-pgtable-arm: Fix table descriptor paddr formatting

2021-11-17 Thread Robin Murphy

On 2021-11-17 17:12, Hector Martin wrote:

Table descriptors were being installed without properly formatting the
address using paddr_to_iopte, which does not match up with the
iopte_deref in __arm_lpae_map. This is incorrect for the LPAE pte
format, as it does not handle the high bits properly.


Oh, I guess whatever system it was tested on can't have exercised it all
that thoroughly. IIRC I couldn't test it myself since at the time none
of the Fast Model builds with SMMUs actually implemented any memory
above 48 bits.


This was found on Apple T6000 DARTs, which require a new pte format
(different shift); adding support for that to
paddr_to_iopte/iopte_to_paddr caused it to break badly, as even <48-bit
addresses would end up incorrect in that case.


...I look forward to not wanting to look at that patch :)


Signed-off-by: Hector Martin 
---
  drivers/iommu/io-pgtable-arm.c | 14 +++---
  1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index dd9e47189d0d..b636e2737607 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -315,12 +315,12 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable 
*data,
  static arm_lpae_iopte arm_lpae_install_table(arm_lpae_iopte *table,
 arm_lpae_iopte *ptep,
 arm_lpae_iopte curr,
-struct io_pgtable_cfg *cfg)
+struct arm_lpae_io_pgtable *data)
  {


Please just define a local "cfg" variable here like in most other
places, to avoid the rest of the churn in this function. Other than
that,

Acked-by: Robin Murphy 

Also,

Fixes: 6c89928ff7a0 ("iommu/io-pgtable-arm: Support 52-bit physical address")

Thanks,
Robin.
(currently elbow-deep in other parts of io-pgtable-arm...)





[PATCH] iommu/io-pgtable-arm: Fix table descriptor paddr formatting

2021-11-17 Thread Hector Martin
Table descriptors were being installed without properly formatting the
address using paddr_to_iopte, which does not match up with the
iopte_deref in __arm_lpae_map. This is incorrect for the LPAE pte
format, as it does not handle the high bits properly.

This was found on Apple T6000 DARTs, which require a new pte format
(different shift); adding support for that to
paddr_to_iopte/iopte_to_paddr caused it to break badly, as even <48-bit
addresses would end up incorrect in that case.
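
(Editor's illustration of the failure mode: in the 52-bit LPAE format,
paddr bits 51:48 are carried in PTE bits 15:12, so installing a raw
__pa() value silently drops them; the address below is made up:)

	u64 paddr = 0x000f000000010000ULL;	/* >48-bit, 64K aligned */
	u64 pte   = (paddr | (paddr >> 36)) & GENMASK_ULL(47, 12);
	/* pte == 0x1f000; plain __pa(table) would lose the 0xf nibble */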

Signed-off-by: Hector Martin 
---
 drivers/iommu/io-pgtable-arm.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index dd9e47189d0d..b636e2737607 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -315,12 +315,12 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable 
*data,
 static arm_lpae_iopte arm_lpae_install_table(arm_lpae_iopte *table,
 arm_lpae_iopte *ptep,
 arm_lpae_iopte curr,
-struct io_pgtable_cfg *cfg)
+struct arm_lpae_io_pgtable *data)
 {
arm_lpae_iopte old, new;
 
-   new = __pa(table) | ARM_LPAE_PTE_TYPE_TABLE;
-   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS)
+   new = paddr_to_iopte(__pa(table), data) | ARM_LPAE_PTE_TYPE_TABLE;
+   if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_NS)
new |= ARM_LPAE_PTE_NSTABLE;
 
/*
@@ -332,11 +332,11 @@ static arm_lpae_iopte 
arm_lpae_install_table(arm_lpae_iopte *table,
 
old = cmpxchg64_relaxed(ptep, curr, new);
 
-   if (cfg->coherent_walk || (old & ARM_LPAE_PTE_SW_SYNC))
+   if (data->iop.cfg.coherent_walk || (old & ARM_LPAE_PTE_SW_SYNC))
return old;
 
/* Even if it's not ours, there's no point waiting; just kick it */
-   __arm_lpae_sync_pte(ptep, 1, cfg);
+   __arm_lpae_sync_pte(ptep, 1, &data->iop.cfg);
if (old == curr)
WRITE_ONCE(*ptep, new | ARM_LPAE_PTE_SW_SYNC);
 
@@ -380,7 +380,7 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, 
unsigned long iova,
if (!cptep)
return -ENOMEM;
 
-   pte = arm_lpae_install_table(cptep, ptep, 0, cfg);
+   pte = arm_lpae_install_table(cptep, ptep, 0, data);
if (pte)
__arm_lpae_free_pages(cptep, tblsz, cfg);
} else if (!cfg->coherent_walk && !(pte & ARM_LPAE_PTE_SW_SYNC)) {
@@ -592,7 +592,7 @@ static size_t arm_lpae_split_blk_unmap(struct 
arm_lpae_io_pgtable *data,
		__arm_lpae_init_pte(data, blk_paddr, pte, lvl, 1, &tablep[i]);
}
 
-   pte = arm_lpae_install_table(tablep, ptep, blk_pte, cfg);
+   pte = arm_lpae_install_table(tablep, ptep, blk_pte, data);
if (pte != blk_pte) {
__arm_lpae_free_pages(tablep, tablesz, cfg);
/*
-- 
2.33.0



[PATCH v2 3/3] perf/smmuv3: Synthesize IIDR from CoreSight ID registers

2021-11-17 Thread Jean-Philippe Brucker
From: Robin Murphy 

The SMMU_PMCG_IIDR register was not present in older revisions of the
Arm SMMUv3 spec. On Arm Ltd. implementations, the IIDR value consists of
fields from several PIDR registers, allowing us to present a
standardized identifier to userspace.

Signed-off-by: Robin Murphy 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/perf/arm_smmuv3_pmu.c | 55 ++-
 1 file changed, 54 insertions(+), 1 deletion(-)

diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index 19697617153a..598d6978280d 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -76,6 +76,10 @@
 #define SMMU_PMCG_CR0xE04
 #define SMMU_PMCG_CR_ENABLE BIT(0)
 #define SMMU_PMCG_IIDR  0xE08
+#define SMMU_PMCG_IIDR_PRODUCTIDGENMASK(31, 20)
+#define SMMU_PMCG_IIDR_VARIANT  GENMASK(19, 16)
+#define SMMU_PMCG_IIDR_REVISION GENMASK(15, 12)
+#define SMMU_PMCG_IIDR_IMPLEMENTER  GENMASK(11, 0)
 #define SMMU_PMCG_CEID0 0xE20
 #define SMMU_PMCG_CEID1 0xE28
 #define SMMU_PMCG_IRQ_CTRL  0xE50
@@ -84,6 +88,20 @@
 #define SMMU_PMCG_IRQ_CFG1  0xE60
 #define SMMU_PMCG_IRQ_CFG2  0xE64
 
+/* IMP-DEF ID registers */
+#define SMMU_PMCG_PIDR0 0xFE0
+#define SMMU_PMCG_PIDR0_PART_0  GENMASK(7, 0)
+#define SMMU_PMCG_PIDR1 0xFE4
+#define SMMU_PMCG_PIDR1_DES_0   GENMASK(7, 4)
+#define SMMU_PMCG_PIDR1_PART_1  GENMASK(3, 0)
+#define SMMU_PMCG_PIDR2 0xFE8
+#define SMMU_PMCG_PIDR2_REVISIONGENMASK(7, 4)
+#define SMMU_PMCG_PIDR2_DES_1   GENMASK(2, 0)
+#define SMMU_PMCG_PIDR3 0xFEC
+#define SMMU_PMCG_PIDR3_REVAND  GENMASK(7, 4)
+#define SMMU_PMCG_PIDR4 0xFD0
+#define SMMU_PMCG_PIDR4_DES_2   GENMASK(3, 0)
+
 /* MSI config fields */
 #define MSI_CFG0_ADDR_MASK  GENMASK_ULL(51, 2)
 #define MSI_CFG2_MEMATTR_DEVICE_nGnRE   0x1
@@ -755,6 +773,41 @@ static void smmu_pmu_get_acpi_options(struct smmu_pmu 
*smmu_pmu)
dev_notice(smmu_pmu->dev, "option mask 0x%x\n", smmu_pmu->options);
 }
 
+static bool smmu_pmu_coresight_id_regs(struct smmu_pmu *smmu_pmu)
+{
+   return of_device_is_compatible(smmu_pmu->dev->of_node,
+  "arm,mmu-600-pmcg");
+}
+
+static void smmu_pmu_get_iidr(struct smmu_pmu *smmu_pmu)
+{
+   u32 iidr = readl_relaxed(smmu_pmu->reg_base + SMMU_PMCG_IIDR);
+
+   if (!iidr && smmu_pmu_coresight_id_regs(smmu_pmu)) {
+   u32 pidr0 = readl(smmu_pmu->reg_base + SMMU_PMCG_PIDR0);
+   u32 pidr1 = readl(smmu_pmu->reg_base + SMMU_PMCG_PIDR1);
+   u32 pidr2 = readl(smmu_pmu->reg_base + SMMU_PMCG_PIDR2);
+   u32 pidr3 = readl(smmu_pmu->reg_base + SMMU_PMCG_PIDR3);
+   u32 pidr4 = readl(smmu_pmu->reg_base + SMMU_PMCG_PIDR4);
+
+   u32 productid = FIELD_GET(SMMU_PMCG_PIDR0_PART_0, pidr0) |
+   (FIELD_GET(SMMU_PMCG_PIDR1_PART_1, pidr1) << 8);
+   u32 variant = FIELD_GET(SMMU_PMCG_PIDR2_REVISION, pidr2);
+   u32 revision = FIELD_GET(SMMU_PMCG_PIDR3_REVAND, pidr3);
+   u32 implementer =
+   FIELD_GET(SMMU_PMCG_PIDR1_DES_0, pidr1) |
+   (FIELD_GET(SMMU_PMCG_PIDR2_DES_1, pidr2) << 4) |
+   (FIELD_GET(SMMU_PMCG_PIDR4_DES_2, pidr4) << 8);
+
+   iidr = FIELD_PREP(SMMU_PMCG_IIDR_PRODUCTID, productid) |
+  FIELD_PREP(SMMU_PMCG_IIDR_VARIANT, variant) |
+  FIELD_PREP(SMMU_PMCG_IIDR_REVISION, revision) |
+  FIELD_PREP(SMMU_PMCG_IIDR_IMPLEMENTER, implementer);
+   }
+
+   smmu_pmu->iidr = iidr;
+}
+
 static int smmu_pmu_probe(struct platform_device *pdev)
 {
struct smmu_pmu *smmu_pmu;
@@ -826,7 +879,7 @@ static int smmu_pmu_probe(struct platform_device *pdev)
return err;
}
 
-   smmu_pmu->iidr = readl_relaxed(smmu_pmu->reg_base + SMMU_PMCG_IIDR);
+   smmu_pmu_get_iidr(smmu_pmu);
 
name = devm_kasprintf(>dev, GFP_KERNEL, "smmuv3_pmcg_%llx",
  (res_0->start) >> SMMU_PMCG_PA_SHIFT);
-- 
2.33.1



[PATCH v2 2/3] perf/smmuv3: Add devicetree support

2021-11-17 Thread Jean-Philippe Brucker
Add device-tree support to the SMMUv3 PMCG driver.

Signed-off-by: Jay Chen 
Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Robin Murphy 
---
 drivers/perf/arm_smmuv3_pmu.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index 226348822ab3..19697617153a 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -47,6 +47,7 @@
 #include 
 #include 
 #include 
+#include <linux/of.h>
 #include 
 #include 
 #include 
@@ -834,7 +835,8 @@ static int smmu_pmu_probe(struct platform_device *pdev)
return -EINVAL;
}
 
-   smmu_pmu_get_acpi_options(smmu_pmu);
+   if (!dev->of_node)
+   smmu_pmu_get_acpi_options(smmu_pmu);
 
/* Pick one CPU to be the preferred one to use */
smmu_pmu->on_cpu = raw_smp_processor_id();
@@ -884,9 +886,16 @@ static void smmu_pmu_shutdown(struct platform_device *pdev)
	smmu_pmu_disable(&smmu_pmu->pmu);
 }
 
+static const struct of_device_id smmu_pmu_of_match[] = {
+   { .compatible = "arm,smmu-v3-pmcg" },
+   {}
+};
+MODULE_DEVICE_TABLE(of, smmu_pmu_of_match);
+
 static struct platform_driver smmu_pmu_driver = {
.driver = {
.name = "arm-smmu-v3-pmcg",
+   .of_match_table = of_match_ptr(smmu_pmu_of_match),
.suppress_bind_attrs = true,
},
.probe = smmu_pmu_probe,
-- 
2.33.1



[PATCH v2 1/3] dt-bindings: Add Arm SMMUv3 PMCG binding

2021-11-17 Thread Jean-Philippe Brucker
Add binding for the Arm SMMUv3 PMU. Each node represents a PMCG, and is
placed as a sibling node of the SMMU. Although the PMCGs registers may
be within the SMMU MMIO region, they are separate devices, and there can
be multiple PMCG devices for each SMMU (for example one for the TCU and
one for each TBU).

Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Robin Murphy 
---
 .../bindings/perf/arm,smmu-v3-pmcg.yaml   | 70 +++
 1 file changed, 70 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/perf/arm,smmu-v3-pmcg.yaml

diff --git a/Documentation/devicetree/bindings/perf/arm,smmu-v3-pmcg.yaml 
b/Documentation/devicetree/bindings/perf/arm,smmu-v3-pmcg.yaml
new file mode 100644
index ..a4b53a6a1ebf
--- /dev/null
+++ b/Documentation/devicetree/bindings/perf/arm,smmu-v3-pmcg.yaml
@@ -0,0 +1,70 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/perf/arm,smmu-v3-pmcg.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Arm SMMUv3 Performance Monitor Counter Group
+
+maintainers:
+  - Will Deacon 
+  - Robin Murphy 
+
+description: |
+  An SMMUv3 may have several Performance Monitor Counter Groups (PMCGs).
+  They are standalone performance monitoring units that support both
+  architected and IMPLEMENTATION DEFINED event counters.
+
+properties:
+  $nodename:
+    pattern: "^pmu@[0-9a-f]*"
+  compatible:
+    oneOf:
+      - items:
+          - const: arm,mmu-600-pmcg
+          - const: arm,smmu-v3-pmcg
+      - const: arm,smmu-v3-pmcg
+
+  reg:
+    items:
+      - description: Register page 0
+      - description: Register page 1, if SMMU_PMCG_CFGR.RELOC_CTRS = 1
+    minItems: 1
+
+  interrupts:
+    maxItems: 1
+
+  msi-parent: true
+
+required:
+  - compatible
+  - reg
+
+anyOf:
+  - required:
+      - interrupts
+  - required:
+      - msi-parent
+
+additionalProperties: false
+
+examples:
+  - |
+    #include <dt-bindings/interrupt-controller/arm-gic.h>
+    #include <dt-bindings/interrupt-controller/irq.h>
+
+    pmu@2b420000 {
+        compatible = "arm,smmu-v3-pmcg";
+        reg = <0x2b420000 0x1000>,
+              <0x2b430000 0x1000>;
+        interrupts = <GIC_SPI 80 IRQ_TYPE_EDGE_RISING>;
+        msi-parent = <&its 0xff>;
+    };
+
+    pmu@2b440000 {
+        compatible = "arm,smmu-v3-pmcg";
+        reg = <0x2b440000 0x1000>,
+              <0x2b450000 0x1000>;
+        interrupts = <GIC_SPI 81 IRQ_TYPE_EDGE_RISING>;
+        msi-parent = <&its 0xff>;
+    };
-- 
2.33.1



[PATCH v2 0/3] perf/smmuv3: Support devicetree

2021-11-17 Thread Jean-Philippe Brucker
Add devicetree binding for the SMMUv3 PMU, called Performance Monitoring
Counter Group (PMCG) in the spec. Each SMMUv3 implementation can have
multiple independent PMCGs, for example one for the Translation Control
Unit (TCU) and one per Translation Buffer Unit (TBU).

Since v1 [1]:
* Fixed warnings in the binding doc
* Removed hip08 support
* Merged Robin's version. I took the liberty of splitting the driver
  patch into 2 and 3. One fix in patch 3, and whitespace changes (the
  driver uses spaces instead of tabs to align #define values, which I
  was going to fix but actually seems more common across the tree.)

[1] 
https://lore.kernel.org/linux-iommu/2026113536.69758-1-jean-phili...@linaro.org/

Jean-Philippe Brucker (2):
  dt-bindings: Add Arm SMMUv3 PMCG binding
  perf/smmuv3: Add devicetree support

Robin Murphy (1):
  perf/smmuv3: Synthesize IIDR from CoreSight ID registers

 .../bindings/perf/arm,smmu-v3-pmcg.yaml   | 70 +++
 drivers/perf/arm_smmuv3_pmu.c | 66 -
 2 files changed, 134 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/perf/arm,smmu-v3-pmcg.yaml

-- 
2.33.1



Re: [PATCH 3/5] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-11-17 Thread Tianyu Lan

On 11/17/2021 6:01 PM, Christoph Hellwig wrote:
> This doesn't really have much to do with normal DMA mapping,
> so why does this direct through the dma ops?

According to the previous discussion, dma_alloc_noncontiguous()
and dma_vmap_noncontiguous() may be used to handle the noncontiguous
memory alloc/map in the netvsc driver, so the alloc/free and vmap/vunmap
callbacks are added here to handle that case. The previous v4 & v5
patches handled the allocation and map in the netvsc driver instead.
If this should not go through the dma ops, we may also make it a
vmbus-specific function and keep it in the vmbus driver.

https://lkml.org/lkml/2021/9/28/51
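
For reference, a minimal sketch of the dma_alloc_noncontiguous() /
dma_vmap_noncontiguous() pattern under discussion; the helper is
illustrative, not the netvsc code:

#include <linux/dma-mapping.h>

/*
 * Allocate noncontiguous DMA memory and map it into a contiguous
 * kernel virtual range, as a driver such as netvsc might do.
 */
static void *alloc_and_vmap(struct device *dev, size_t size,
                            struct sg_table **sgt_out)
{
        struct sg_table *sgt;
        void *vaddr;

        sgt = dma_alloc_noncontiguous(dev, size, DMA_BIDIRECTIONAL,
                                      GFP_KERNEL, 0);
        if (!sgt)
                return NULL;

        vaddr = dma_vmap_noncontiguous(dev, size, sgt);
        if (!vaddr) {
                dma_free_noncontiguous(dev, size, sgt, DMA_BIDIRECTIONAL);
                return NULL;
        }

        *sgt_out = sgt;
        return vaddr;
}

Teardown is the mirror image: dma_vunmap_noncontiguous(dev, vaddr)
followed by dma_free_noncontiguous(dev, size, sgt, DMA_BIDIRECTIONAL).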




Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces

2021-11-17 Thread Jason Gunthorpe via iommu
On Wed, Nov 17, 2021 at 01:22:19PM +0800, Lu Baolu wrote:
> Hi Jason,
> 
> On 11/16/21 9:46 PM, Jason Gunthorpe wrote:
> > On Tue, Nov 16, 2021 at 09:57:30AM +0800, Lu Baolu wrote:
> > > Hi Christoph,
> > > 
> > > On 11/15/21 9:14 PM, Christoph Hellwig wrote:
> > > > On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
> > > > > +enum iommu_dma_owner {
> > > > > + DMA_OWNER_NONE,
> > > > > + DMA_OWNER_KERNEL,
> > > > > + DMA_OWNER_USER,
> > > > > +};
> > > > > +
> > > > 
> > > > > + enum iommu_dma_owner dma_owner;
> > > > > + refcount_t owner_cnt;
> > > > > + struct file *owner_user_file;
> > > > 
> > > > I'd just overload the ownership into owner_user_file,
> > > > 
> > > >NULL -> no owner
> > > >(struct file *)1UL)  -> kernel
> > > >real pointer -> user
> > > > 
> > > > Which could simplify a lot of the code dealing with the owner.
> > > > 
> > > 
> > > Yeah! Sounds reasonable. I will make this in the next version.
> > 
> > It would be good to figure out how to make iommu_attach_device()
> > enforce no other driver binding as a kernel user without a file *, as
> > Robin pointed to, before optimizing this.
> > 
> > This fixes an existing bug where iommu_attach_device() only checks the
> > group size and is vulnerable to a hot plug increasing the group size
> > after it returns. That check should be replaced by this series's logic
> > instead.
> 
> As per my understanding, the essence of this problem is that only the
> user owner of the iommu_group could attach an UNMANAGED domain to it.
> If I understand it right, how about introducing a new interface to
> allocate a user managed domain and storing the user file pointer in it.

For iommu_attach_device() the semantic is simple non-sharing, so there
is no need for the file * at all, it can just be NULL.

> Does above help here?

No, iommu_attach_device() is kernel only and should not interact with
userspace.

I'm also going to see if I can learn what Tegra is doing with
iommu_attach_group()

Jason
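
To make Christoph's suggested encoding concrete, a small sketch (the
struct and helper names are hypothetical, not from the series):

#include <linux/fs.h>
#include <linux/refcount.h>

/* Sentinel marking "owned by a kernel driver" in the overloaded field */
#define DMA_OWNER_KERNEL_MARKER ((struct file *)1UL)

/* Hypothetical stand-in for the iommu_group owner fields */
struct group_dma_owner {
        struct file *owner_user_file;   /* NULL, the marker, or a real file */
        refcount_t owner_cnt;
};

static inline bool dma_owner_none(struct group_dma_owner *o)
{
        return !o->owner_user_file;
}

static inline bool dma_owner_kernel(struct group_dma_owner *o)
{
        return o->owner_user_file == DMA_OWNER_KERNEL_MARKER;
}

static inline bool dma_owner_user(struct group_dma_owner *o)
{
        return !dma_owner_none(o) && !dma_owner_kernel(o);
}

With this, iommu_attach_device() could claim kernel ownership by storing
the marker rather than a file pointer, matching the point above that the
kernel-only path needs no file * at all.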


Re: [PATCH 1/5] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM

2021-11-17 Thread Tianyu Lan

Hi Christoph:
   Thanks for your review.

On 11/17/2021 5:59 PM, Christoph Hellwig wrote:
> The subject is wrong, nothing x86-specific here.  Please use
> "swiotlb: " as the prefix

OK. Will update. Thanks.

> > + * @vaddr:      The vaddr of the swiotlb memory pool. The swiotlb
> > + *      memory pool may be remapped in the memory encrypted case and store
>
> Please avoid the overly long line.
>
> > +     /*
> > +      * With swiotlb_unencrypted_base setting, swiotlb bounce buffer will
> > +      * be remapped in the swiotlb_update_mem_attributes() and return here
> > +      * directly.
> > +      */
>
> I'd word this as:
>
> /*
>  * If swiotlb_unencrypted_base is set, the bounce buffer memory will
>  * be remapped and cleared in swiotlb_update_mem_attributes.
>  */

Thanks for the suggestion. Will update. Thanks.

> > +     ret = swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
> > +     if (ret) {
> > +             memblock_free(mem->slots, alloc_size);
> > +             return ret;
> > +     }
>
> With the latest update swiotlb_init_io_tlb_mem will always return 0,
> so no need for the return value change or error handling here.

OK. Will revert the change.
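
For illustration only, assuming swiotlb_init_io_tlb_mem() now returns
void as stated above, the reverted hunk would reduce to something like:

-       ret = swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
-       if (ret) {
-               memblock_free(mem->slots, alloc_size);
-               return ret;
-       }
+       swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);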


Re: [PATCH 3/5] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-11-17 Thread Tianyu Lan




On 11/17/2021 3:12 AM, Borislav Petkov wrote:
> What you should do, instead, is add an isol. VM specific
> hv_cc_platform_has() just like amd_cc_platform_has() and handle
> the cc_attrs there for your platform, like return false for
> CC_ATTR_GUEST_MEM_ENCRYPT and then you won't need to add that hv_* thing
> everywhere.
>
> And then fix it up in __set_memory_enc_dec() too.

Yes, agree. Will add hv cc_attrs and check via cc_platform_has().
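
A minimal sketch of that plan, assuming the existing
hv_is_isolation_supported() helper; the function names and dispatch
below are illustrative, not the final patch:

#include <linux/cc_platform.h>
#include <asm/mem_encrypt.h>
#include <asm/mshyperv.h>

static bool hyperv_cc_platform_has(enum cc_attr attr)
{
        /*
         * Per the suggestion above: isolation VMs handle bounce
         * buffering themselves, so do not advertise
         * CC_ATTR_GUEST_MEM_ENCRYPT (or any other attribute yet)
         * to generic code.
         */
        return false;
}

bool cc_platform_has(enum cc_attr attr)
{
        if (hv_is_isolation_supported())
                return hyperv_cc_platform_has(attr);

        if (sme_me_mask)        /* existing AMD path */
                return amd_cc_platform_has(attr);

        return false;
}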




Re: [RFC 1/2] arm64: Add support for system cache memory type

2021-11-17 Thread Christoph Hellwig
On Tue, Nov 16, 2021 at 03:15:59PM -0800, Georgi Djakov wrote:
>  include/linux/dma-map-ops.h  | 8 

You forgot to CC the maintainer.  Also, don't try to hide DMA
core changes in arch-specific patches ever again.


Re: [PATCH 3/5] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-11-17 Thread Christoph Hellwig
This doesn't really have much to do with normal DMA mapping,
so why does this direct through the dma ops?


Re: [PATCH 1/5] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM

2021-11-17 Thread Christoph Hellwig
The subject is wrong, nothing x86-specific here.  Please use
"swiotlb: " as the prefix

> + * @vaddr:      The vaddr of the swiotlb memory pool. The swiotlb
> + *      memory pool may be remapped in the memory encrypted case and store

Please avoid the overly long line.

> + /*
> +  * With swiotlb_unencrypted_base setting, swiotlb bounce buffer will
> +  * be remapped in the swiotlb_update_mem_attributes() and return here
> +  * directly.
> +  */

I'd word this as:

/*
 * If swiotlb_unencrypted_base is set, the bounce buffer memory will
 * be remapped and cleared in swiotlb_update_mem_attributes.
 */

> + ret = swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
> + if (ret) {
> + memblock_free(mem->slots, alloc_size);
> + return ret;
> + }

With the latest update swiotlb_init_io_tlb_mem will always return 0,
so no need for the return value change or error handling here.