Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Jason Wang


On 2020/9/16 3:26 AM, Raj, Ashok wrote:

IIUC, you are asking that part of the interface to move to an API interface
that potentially the new /dev/sva and VFIO could share? I think the APIs
for PASID management themselves are generic (Jean's patchset + Jacob's
ioasid set management).

Yes, the in kernel APIs are pretty generic now, and can be used by
many types of drivers.

Good, so there are no new requirements here, I suppose.



The requirement is not for in-kernel APIs but a generic uAPIs.



As JasonW kicked this off, VDPA will need all this identical stuff
too. We already know this, and I think Intel VDPA HW will need it, so
it should concern you too :)

This is one of those things that I would disagree and commit :-)..


A PASID vIOMMU solution sharable with VDPA and VFIO, based on a PASID
control char dev (eg /dev/sva, or maybe /dev/iommu) seems like a
reasonable starting point for discussion.

Looks like now we are getting closer to what we need. :-)

Given that PASID APIs are general purpose today, any driver can use them
to take advantage. VFIO fortunately or unfortunately has the IOMMU things
abstracted. I suppose that support is also mostly built on top of the
generic iommu* API abstractions in a vendor-neutral way?

I'm still lost on what is missing that vDPA can't build on top of what is
available?



For sure it can, but we may end up with duplicated (or similar) uAPIs, which
is bad.


Thanks




Cheers,
Ashok




Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Jason Wang


On 2020/9/14 9:31 PM, Jean-Philippe Brucker wrote:

If it's possible, I would suggest a generic uAPI instead of a VFIO specific
one.

A large part of this work is already generic uAPI, in
include/uapi/linux/iommu.h.



This is not what I read from this series; all the following uAPIs are VFIO
specific:


struct vfio_iommu_type1_nesting_op;
struct vfio_iommu_type1_pasid_request;

And include/uapi/linux/iommu.h is not included in 
include/uapi/linux/vfio.h at all.





This patchset connects that generic interface
to the pre-existing VFIO uAPI that deals with IOMMU mappings of an
assigned device. But the bulk of the work is done by the IOMMU subsystem,
and is available to all device drivers.



So is there any reason for not introducing the uAPI to IOMMU drivers directly?





Jason suggest something like /dev/sva. There will be a lot of other
subsystems that could benefit from this (e.g vDPA).

Do you have a more precise idea of the interface /dev/sva would provide,
how it would interact with VFIO and others?



Can we replace the container fd with an sva fd, like:

sva = open("/dev/sva", O_RDWR);
group = open("/dev/vfio/26", O_RDWR);
ioctl(group, VFIO_GROUP_SET_SVA, &sva);

Then we can do all SVA stuff through the sva fd, and other subsystems
(like vDPA) only need to implement the function that is equivalent to
VFIO_GROUP_SET_SVA.
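
To make that concrete, a minimal user-space sketch follows. VFIO_GROUP_SET_SVA
does not exist today (the ioctl number below is a made-up placeholder), and the
PASID/bind ioctls are left as comments because their shape is exactly what is
being discussed:

#include <fcntl.h>
#include <sys/ioctl.h>

/* Placeholder ioctl number; this uAPI does not exist today. */
#define VFIO_GROUP_SET_SVA	_IOW(';', 200, int)

int main(void)
{
	int sva = open("/dev/sva", O_RDWR);
	int group = open("/dev/vfio/26", O_RDWR);

	/* Replace the container fd: attach the VFIO group to the sva context. */
	ioctl(group, VFIO_GROUP_SET_SVA, &sva);

	/*
	 * All PASID allocation and page-table binding would then go through
	 * the sva fd, e.g. hypothetically:
	 *
	 *	ioctl(sva, SVA_ALLOC_PASID, &pasid);
	 *	ioctl(sva, SVA_BIND_PGTABLE, &bind_args);
	 *
	 * so another subsystem (vDPA, ...) only needs its own equivalent
	 * of VFIO_GROUP_SET_SVA.
	 */
	return 0;
}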




vDPA could transport the
generic iommu.h structures via its own uAPI, and call the IOMMU API
directly without going through an intermediate /dev/sva handle.



Is there any value in that transporting? I think we have agreed that VFIO is
not the only user of vSVA ...


It's not hard to forecast that there will be more subsystems that want
to benefit from vSVA; we don't want to duplicate similar uAPIs in
all of those subsystems.


Thanks




Thanks,
Jean




Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Lu Baolu

On 9/16/20 8:22 AM, Jacob Pan (Jun) wrote:

If user space wants to bind page tables, create the PASID with
/dev/sva, use ioctls there to setup the page table the way it wants,
then pass the now configured PASID to a driver that can use it.


Are we talking about bare metal SVA? If so, I don't see the need for
userspace to know there is a PASID. All user space needs is that my
current mm is bound to a device by the driver. So it can be a one-step
process for the user instead of two.


Driver does not do page table binding. Do not duplicate all the
control plane uAPI in every driver.

PASID management and binding is separated from the driver(s) that are
using the PASID.


Why separate? Drivers need to be involved in PASID life cycle
management. For example, when tearing down a PASID, the driver needs to
stop DMA and the IOMMU driver needs to unbind, etc. If the driver is the
control point, then things are just in order. I am referring to bare metal SVA.

For guest SVA, I agree that binding is separate from PASID allocation.
Could you review this doc. in terms of life cycle?
https://lkml.org/lkml/2020/8/22/13

My point is that /dev/sva has no value for bare metal SVA; we are just
talking about whether guest SVA uAPIs can be consolidated. Or am I missing
something?



Not only bare metal SVA, but also subdevice passthrough (Intel Scalable
IOV and ARM SubStream ID) consumes PASIDs, which has nothing to do
with user space; hence /dev/sva is unsuited.

Best regards,
baolu


RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Tuesday, September 15, 2020 10:29 PM
>
> > Do they need a device at all?  It's not clear to me why RID based
> > IOMMU management fits within vfio's scope, but PASID based does not.
> 
> In RID mode vfio-pci completely owns the PCI function, so it is more
> natural that VFIO, as the sole device owner, would own the DMA mapping
> machinery. Further, the RID IOMMU mode is rarely used outside of VFIO
> so there is not much reason to try and disaggregate the API.

It is also used by vDPA.

> 
> PASID on the other hand, is shared. vfio-mdev drivers will share the
> device with other kernel drivers. PASID and DMA will be concurrent
> with VFIO and other kernel drivers/etc.
> 

Looks like you are equating PASID to host-side sharing, while ignoring
another valid usage where a PASID-capable device is passed through
to the guest via vfio-pci and PASID is then used by the guest
for guest-side sharing. In such a case it is an exclusive usage on the
host side, so what is the problem with VFIO managing PASID, given
that vfio-pci completely owns the function?

Thanks
Kevin 


[PATCH] iommu/tegra-smmu: Fix tlb_mask

2020-09-15 Thread Nicolin Chen
The "num_tlb_lines" might not be a power-of-2 value, being 48 on
Tegra210 for example. So the current way of calculating tlb_mask
using the num_tlb_lines is not correct: tlb_mask=0x5f in case of
num_tlb_lines=48, which will trim a setting of 0x30 (48) to 0x10.
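
A quick worked illustration of that arithmetic (a sketch; fls() is the kernel
helper from <linux/bitops.h>):

unsigned int lines = 48;			/* num_tlb_lines on Tegra210 */

/* Old mask: (48 << 1) - 1 = 0x5f, which is not a contiguous bitmask,
 * so a setting of 48 lines is trimmed: 0x30 & 0x5f = 0x10 (16 lines).
 */
unsigned int old_mask = (lines << 1) - 1;

/* New mask: fls(48) = 6, so (1 << 6) - 1 = 0x3f covers the whole range
 * and the setting survives: 0x30 & 0x3f = 0x30 (48 lines).
 */
unsigned int new_mask = (1 << fls(lines)) - 1;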

Signed-off-by: Nicolin Chen 
---
 drivers/iommu/tegra-smmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 84fdee473873..0becdbfea306 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -1120,7 +1120,7 @@ struct tegra_smmu *tegra_smmu_probe(struct device *dev,
BIT_MASK(mc->soc->num_address_bits - SMMU_PTE_SHIFT) - 1;
dev_dbg(dev, "address bits: %u, PFN mask: %#lx\n",
mc->soc->num_address_bits, smmu->pfn_mask);
-   smmu->tlb_mask = (smmu->soc->num_tlb_lines << 1) - 1;
+   smmu->tlb_mask = (1 << fls(smmu->soc->num_tlb_lines)) - 1;
dev_dbg(dev, "TLB lines: %u, mask: %#lx\n", smmu->soc->num_tlb_lines,
smmu->tlb_mask);
 
-- 
2.17.1



Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Jacob Pan (Jun)
Hi Jason,
On Tue, 15 Sep 2020 20:51:26 -0300, Jason Gunthorpe 
wrote:

> On Tue, Sep 15, 2020 at 03:08:51PM -0700, Jacob Pan wrote:
> > > A PASID vIOMMU solution sharable with VDPA and VFIO, based on a
> > > PASID control char dev (eg /dev/sva, or maybe /dev/iommu) seems
> > > like a reasonable starting point for discussion.  
> > 
> > I am not sure what can really be consolidated in /dev/sva.   
> 
> More or less, everything in this patch. All the manipulations of PASID
> that are required for vIOMMU use case/etc. Basically all PASID control
> that is not just a 1:1 mapping of the mm_struct.
> 
> > will have their own kernel-user interfaces anyway for their usage
> > models. They are just providing the specific transport while
> > sharing generic IOMMU UAPIs and IOASID management.  
> 
> > As I mentioned PASID management is already consolidated in the
> > IOASID layer, so for VDPA or other users, it is just a matter of creating
> > its own ioasid_set and doing allocation.  
> 
> Creating the PASID is not the problem, managing what the PASID maps to
> is the issue. That is all uAPI that we don't really have today.
> 
> > IOASID is also available to in-kernel users, which do not
> > need /dev/sva AFAICT. For bare metal SVA, I don't see a need to
> > create this 'floating' state of the PASID when created by /dev/sva.
> > PASID allocation could happen behind the scene when users need to
> > bind page tables to a device DMA stream.  
> 
> My point is I would like to see one set of uAPI ioctls to bind page
> tables. I don't want to have VFIO, VDPA, etc, etc uAPIs to do the
> exact same things only slightly differently.
> 
Got your point. I am not familiar with VDPA, but the VFIO uAPI is very
thin, mostly passing the IOMMU uAPI structs through as opaque data.

> If user space wants to bind page tables, create the PASID with
> /dev/sva, use ioctls there to setup the page table the way it wants,
> then pass the now configured PASID to a driver that can use it. 
> 
Are we talking about bare metal SVA? If so, I don't see the need for
userspace to know there is a PASID. All user space needs is that my
current mm is bound to a device by the driver. So it can be a one-step
process for the user instead of two.
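
In kernel terms the one-step model is just the existing SVA bind; a minimal
sketch, assuming the driver does the bind on behalf of the calling process
(iommu_sva_bind_device() is the real in-kernel API, the surrounding context
is illustrative):

	/* In the driver's open()/ioctl() path, bind the caller's mm: */
	struct iommu_sva *handle;

	handle = iommu_sva_bind_device(dev, current->mm, NULL);
	if (IS_ERR(handle))
		return PTR_ERR(handle);

	/*
	 * The PASID is allocated behind the scenes and stays internal to
	 * the kernel; user space only asked "bind my current mm to this
	 * device".
	 */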

> Driver does not do page table binding. Do not duplicate all the
> control plane uAPI in every driver.
> 
> PASID management and binding is separated from the driver(s) that are
> using the PASID.
> 
Why separate? Drivers need to be involved in PASID life cycle
management. For example, when tearing down a PASID, the driver needs to
stop DMA and the IOMMU driver needs to unbind, etc. If the driver is the
control point, then things are just in order. I am referring to bare metal SVA.

For guest SVA, I agree that binding is separate from PASID allocation.
Could you review this doc. in terms of life cycle?
https://lkml.org/lkml/2020/8/22/13

My point is that /dev/sva has no value for bare metal SVA; we are just
talking about whether guest SVA uAPIs can be consolidated. Or am I missing
something?

> Jason


Thanks,

Jacob


[PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent

2020-09-15 Thread Robin Murphy
According to a downstream commit I found in the Khadas vendor kernel,
the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
how to handle this properly) we should describe it as such. Otherwise
the mismatch leads to all manner of fun with mismatched attributes and
inadvertently snooping stale data from caches, which would account for
at least some of the brokenness observed on this platform.

Signed-off-by: Robin Murphy 
---
 arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi 
b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
index 9b8548e5f6e5..ee8fcae9f9f0 100644
--- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
@@ -135,3 +135,7 @@ map1 {
};
};
 };
+
+&mali {
+	dma-coherent;
+};
-- 
2.28.0.dirty



Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Jason Gunthorpe
On Tue, Sep 15, 2020 at 03:08:51PM -0700, Jacob Pan wrote:
> > A PASID vIOMMU solution sharable with VDPA and VFIO, based on a PASID
> > control char dev (eg /dev/sva, or maybe /dev/iommu) seems like a
> > reasonable starting point for discussion.
> 
> I am not sure what can really be consolidated in /dev/sva. 

More or less, everything in this patch. All the manipulations of PASID
that are required for vIOMMU use case/etc. Basically all PASID control
that is not just a 1:1 mapping of the mm_struct.

> will have their own kernel-user interfaces anyway for their usage models.
> They are just providing the specific transport while sharing generic IOMMU
> UAPIs and IOASID management.

> As I mentioned PASID management is already consolidated in the IOASID layer,
> so for VDPA or other users, it is just a matter of creating its own
> ioasid_set and doing allocation.

Creating the PASID is not the problem, managing what the PASID maps to
is the issue. That is all uAPI that we don't really have today.

> IOASID is also available to in-kernel users, which do not
> need /dev/sva AFAICT. For bare metal SVA, I don't see a need to create this
> 'floating' state of the PASID when created by /dev/sva. PASID allocation
> could happen behind the scene when users need to bind page tables to a
> device DMA stream.

My point is I would like to see one set of uAPI ioctls to bind page
tables. I don't want to have VFIO, VDPA, etc, etc uAPIs to do the exact
same things only slightly differently.

If user space wants to bind page tables, create the PASID with
/dev/sva, use ioctls there to setup the page table the way it wants,
then pass the now configured PASID to a driver that can use it. 

Driver does not do page table binding. Do not duplicate all the
control plane uAPI in every driver.

PASID management and binding is separated from the driver(s) that are
using the PASID.
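
As a sketch of that split (every name below is hypothetical, since this is
precisely the uAPI under discussion):

	/* Hypothetical uAPI sketch; no such ioctls exist today. */
	int sva = open("/dev/sva", O_RDWR);
	int dev = open("/dev/some_accelerator", O_RDWR);
	unsigned int pasid;
	struct sva_bind_args args = { /* page-table setup parameters */ };

	ioctl(sva, SVA_ALLOC_PASID, &pasid);	/* 1. create the PASID        */
	ioctl(sva, SVA_BIND_PGTABLE, &args);	/* 2. configure what it maps to */
	ioctl(dev, DEV_ATTACH_PASID, &pasid);	/* 3. hand the configured PASID
						 *    to a driver; the driver
						 *    never binds page tables */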

Jason


[PATCH 1/3] iommu/io-pgtable-arm: Support coherency for Mali LPAE

2020-09-15 Thread Robin Murphy
Midgard GPUs have ACE-Lite master interfaces which allow systems to
integrate them in an I/O-coherent manner. It seems that from the GPU's
viewpoint, the rest of the system is its outer shareable domain, and so
even when snoop signals are wired up, they are only emitted for outer
shareable accesses. As such, setting the TTBR_SHARE_OUTER bit does
indeed get coherent pagetable walks working nicely for the coherent
T620 in the Arm Juno SoC.

Reviewed-by: Steven Price 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/io-pgtable-arm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index dc7bcf858b6d..e47012006dcc 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -440,7 +440,7 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
 	}
 
-	if (prot & IOMMU_CACHE)
+	if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
 		pte |= ARM_LPAE_PTE_SH_IS;
 	else
 		pte |= ARM_LPAE_PTE_SH_OS;
@@ -1049,6 +1049,9 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
 	cfg->arm_mali_lpae_cfg.transtab = virt_to_phys(data->pgd) |
 					  ARM_MALI_LPAE_TTBR_READ_INNER |
 					  ARM_MALI_LPAE_TTBR_ADRMODE_TABLE;
+	if (cfg->coherent_walk)
+		cfg->arm_mali_lpae_cfg.transtab |= ARM_MALI_LPAE_TTBR_SHARE_OUTER;
+
 	return &data->iop;
 
 out_free_data:
-- 
2.28.0.dirty



[PATCH 2/3] drm/panfrost: Support cache-coherent integrations

2020-09-15 Thread Robin Murphy
When the GPU's ACE-Lite interface is fully wired up and capable of
snooping CPU caches, it may be described as "dma-coherent" in
devicetree, which will already inform the DMA layer not to perform
unnecessary cache maintenance. However, we still need to ensure that
the GPU uses the appropriate cacheable outer-shareable attributes in
order to generate the requisite snoop signals, and that CPU mappings
don't create a mismatch by using a non-cacheable type either.

Signed-off-by: Robin Murphy 
---
 drivers/gpu/drm/panfrost/panfrost_device.h | 1 +
 drivers/gpu/drm/panfrost/panfrost_drv.c| 2 ++
 drivers/gpu/drm/panfrost/panfrost_gem.c| 2 ++
 drivers/gpu/drm/panfrost/panfrost_mmu.c| 1 +
 4 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h 
b/drivers/gpu/drm/panfrost/panfrost_device.h
index c30c719a8059..b31f45315e96 100644
--- a/drivers/gpu/drm/panfrost/panfrost_device.h
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -84,6 +84,7 @@ struct panfrost_device {
/* pm_domains for devices with more than one. */
struct device *pm_domain_devs[MAX_PM_DOMAINS];
struct device_link *pm_domain_links[MAX_PM_DOMAINS];
+   bool coherent;
 
struct panfrost_features features;
const struct panfrost_compatible *comp;
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c 
b/drivers/gpu/drm/panfrost/panfrost_drv.c
index ada51df9a7a3..2a6f2f716b2f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -588,6 +588,8 @@ static int panfrost_probe(struct platform_device *pdev)
if (!pfdev->comp)
return -ENODEV;
 
+	pfdev->coherent = device_get_dma_attr(&pdev->dev) == DEV_DMA_COHERENT;
+
/* Allocate and initialze the DRM device. */
	ddev = drm_dev_alloc(&panfrost_drm_driver, &pdev->dev);
if (IS_ERR(ddev))
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c 
b/drivers/gpu/drm/panfrost/panfrost_gem.c
index 33355dd302f1..cdf1a8754eba 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -220,6 +220,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = {
  */
 struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t size)
 {
+   struct panfrost_device *pfdev = dev->dev_private;
struct panfrost_gem_object *obj;
 
obj = kzalloc(sizeof(*obj), GFP_KERNEL);
@@ -229,6 +230,7 @@ struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t
 	INIT_LIST_HEAD(&obj->mappings.list);
 	mutex_init(&obj->mappings.lock);
 	obj->base.base.funcs = &panfrost_gem_funcs;
+	obj->base.map_cached = pfdev->coherent;
 
 	return &obj->base.base;
 }
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index e8f7b11352d2..8852fd378f7a 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -371,6 +371,7 @@ int panfrost_mmu_pgtable_alloc(struct panfrost_file_priv *priv)
 		.pgsize_bitmap	= SZ_4K | SZ_2M,
 		.ias		= FIELD_GET(0xff, pfdev->features.mmu_features),
 		.oas		= FIELD_GET(0xff00, pfdev->features.mmu_features),
+		.coherent_walk	= pfdev->coherent,
 		.tlb		= &mmu_tlb_ops,
 		.iommu_dev	= pfdev->dev,
};
-- 
2.28.0.dirty



[PATCH 0/3] drm: panfrost: Coherency support

2020-09-15 Thread Robin Murphy
Hi all,

I polished up my original proof-of-concept a little while back, but now
that I've got my hands on my Juno again I've been able to actually test
it to my satisfaction, so here are proper patches!

It probably makes sense for patches #1 and #2 to stay together and both
go via drm-misc, provided Will's OK with that.

Robin.


Robin Murphy (3):
  iommu/io-pgtable-arm: Support coherency for Mali LPAE
  drm/panfrost: Support cache-coherent integrations
  arm64: dts: meson: Describe G12b GPU as coherent

 arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 
 drivers/gpu/drm/panfrost/panfrost_device.h  | 1 +
 drivers/gpu/drm/panfrost/panfrost_drv.c | 2 ++
 drivers/gpu/drm/panfrost/panfrost_gem.c | 2 ++
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 +
 drivers/iommu/io-pgtable-arm.c  | 5 -
 6 files changed, 14 insertions(+), 1 deletion(-)

-- 
2.28.0.dirty



Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Jason Gunthorpe
On Tue, Sep 15, 2020 at 12:26:32PM -0700, Raj, Ashok wrote:

> > Yes, there is. There is a limited pool of HW PASIDs. If one user fork
> > bombs, it can easily claim an unreasonable number from that pool as
> > each process will claim a PASID. That can DOS the rest of the system.
> 
> Not sure how you had this played out.. For PASID used in ENQCMD today for
> our SVM usages, we *DO* not automatically propagate or allocate new PASIDs. 
> 
> The new process needs to bind to get a PASID for its own use. For threads
> of same process the PASID is inherited. For forks(), we do not
> auto-allocate them.

Auto-allocate doesn't matter, the PASID is tied to the mm_struct,
after fork the program will get a new mm_struct, and it can manually
re-trigger PASID allocation for that mm_struct from any SVA kernel
driver.

64k processes, each with their own mm_struct, all triggering SVA, will
allocate 64k PASID's and use up the whole 16 bit space.

> Given that PASID APIs are general purpose today, any driver can use them
> to take advantage. VFIO fortunately or unfortunately has the IOMMU things
> abstracted. I suppose that support is also mostly built on top of the
> generic iommu* API abstractions in a vendor-neutral way?
> 
> I'm still lost on what is missing that vDPA can't build on top of what is
> available?

I think it is basically everything in this patch.. Why duplicate all
this uAPI?

Jason 


Re: [PATCH v4 4/4] PCI/ACS: Enable PCI_ACS_TB for untrusted/external-facing devices

2020-09-15 Thread Rajat Jain via iommu
Hello Bjorn,


On Tue, Jul 14, 2020 at 1:19 PM Rajat Jain  wrote:
>
> On Sat, Jul 11, 2020 at 12:53 PM Bjorn Helgaas  wrote:
> >
> > On Fri, Jul 10, 2020 at 03:53:59PM -0700, Rajat Jain wrote:
> > > On Fri, Jul 10, 2020 at 2:29 PM Raj, Ashok  wrote:
> > > > On Fri, Jul 10, 2020 at 03:29:22PM -0500, Bjorn Helgaas wrote:
> > > > > On Tue, Jul 07, 2020 at 03:46:04PM -0700, Rajat Jain wrote:
> > > > > > When enabling ACS, enable translation blocking for external facing 
> > > > > > ports
> > > > > > and untrusted devices.
> > > > > >
> > > > > > Signed-off-by: Rajat Jain 
> > > > > > ---
> > > > > > v4: Add braces to avoid warning from kernel robot
> > > > > > print warning for only external-facing devices.
> > > > > > v3: print warning if ACS_TB not supported on 
> > > > > > external-facing/untrusted ports.
> > > > > > Minor code comments fixes.
> > > > > > v2: Commit log change
> > > > > >
> > > > > >  drivers/pci/pci.c|  8 
> > > > > >  drivers/pci/quirks.c | 15 +++
> > > > > >  2 files changed, 23 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > > > index 73a8627822140..a5a6bea7af7ce 100644
> > > > > > --- a/drivers/pci/pci.c
> > > > > > +++ b/drivers/pci/pci.c
> > > > > > @@ -876,6 +876,14 @@ static void pci_std_enable_acs(struct pci_dev 
> > > > > > *dev)
> > > > > > /* Upstream Forwarding */
> > > > > > ctrl |= (cap & PCI_ACS_UF);
> > > > > >
> > > > > > +   /* Enable Translation Blocking for external devices */
> > > > > > +   if (dev->external_facing || dev->untrusted) {
> > > > > > +   if (cap & PCI_ACS_TB)
> > > > > > +   ctrl |= PCI_ACS_TB;
> > > > > > +   else if (dev->external_facing)
> > > > > > +   pci_warn(dev, "ACS: No Translation Blocking on 
> > > > > > external-facing dev\n");
> > > > > > +   }
> > > > >
> > > > > IIUC, this means that external devices can *never* use ATS and
> > > > > can never cache translations.
> > >
> > > Yes, but it already exists today (and this patch doesn't change that):
> > > 521376741b2c2 "PCI/ATS: Only enable ATS for trusted devices"
> > >
> > > IMHO any external device trying to send ATS traffic despite having ATS
> > > disabled should count as a bad intent. And this patch is trying to
> > > plug that loophole, by blocking the AT traffic from devices that we do
> > > not expect to see AT from anyway.
> >
> > Thinking about this some more, I wonder if Linux should:
> >
> >   - Explicitly disable ATS for every device at enumeration-time, e.g.,
> > in pci_init_capabilities(),
> >
> >   - Enable PCI_ACS_TB for every device (not just external-facing or
> > untrusted ones),
> >
> >   - Disable PCI_ACS_TB for the relevant devices along the path only
> > when enabling ATS.
> >
> > One nice thing about doing that is that the "untrusted" test would be
> > only in pci_enable_ats(), and we wouldn't need one in
> > pci_std_enable_acs().
>
> Sent out v5 with this approach here:
> https://patchwork.kernel.org/patch/11663515/

Any feedback on the patch above? It has been waiting for feedback.

Thanks & Best Regards,

Rajat


>
> Thanks,
>
> Rajat
>
> >
> >
> > It's possible BIOS gives us devices with ATS enabled, and this might
> > break them, but that seems like something we'd want to find out about.
> >
> > Bjorn


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Jacob Pan
Hi Jason,

On Tue, 15 Sep 2020 15:45:10 -0300, Jason Gunthorpe  wrote:

> On Tue, Sep 15, 2020 at 11:11:54AM -0700, Raj, Ashok wrote:
> > > PASID applies widely to many device and needs to be introduced with a
> > > wide community agreement so all scenarios will be supportable.  
> > 
> > True, reading some of the earlier replies I was clearly confused as I
> > thought you were talking about mdev again. But now that you say it, you
> > have moved past mdev and it's the PASID interfaces, correct?  
> 
> Yes, we agreed mdev for IDXD at LPC, didn't talk about PASID.
> 
> > For native user applications, there is just 1 PASID per
> > process. There is no need for quota management.  
> 
> Yes, there is. There is a limited pool of HW PASIDs. If one user fork
> bombs, it can easily claim an unreasonable number from that pool as
> each process will claim a PASID. That can DOS the rest of the system.
> 
> If PASID DOS is a worry then it must be solved at the IOMMU level for
> all user applications that might trigger a PASID allocation. VFIO is
> not special.
> 
> > IIUC, you are asking that part of the interface to move to an API
> > interface that potentially the new /dev/sva and VFIO could share? I
> > think the APIs for PASID management themselves are generic (Jean's
> > patchset + Jacob's ioasid set management).  
> 
> Yes, the in kernel APIs are pretty generic now, and can be used by
> many types of drivers.
> 
Right, IOMMU UAPIs are not VFIO specific; we pass the user pointer to the
IOMMU layer to process.

Similarly for PASID management, the IOASID extensions we are proposing
will handle ioasid_set (groups/pools), quota, permissions, and notifications
in the IOASID core. There is nothing VFIO specific.
https://lkml.org/lkml/2020/8/22/12

> As JasonW kicked this off, VDPA will need all this identical stuff
> too. We already know this, and I think Intel VDPA HW will need it, so
> it should concern you too :)
> 
> A PASID vIOMMU solution sharable with VDPA and VFIO, based on a PASID
> control char dev (eg /dev/sva, or maybe /dev/iommu) seems like a
> reasonable starting point for discussion.
> 
I am not sure what can really be consolidated in /dev/sva. VFIO and VDPA
will have their own kernel-user interfaces anyway for their usage models.
They are just providing the specific transport while sharing generic IOMMU
UAPIs and IOASID management.

As I mentioned, PASID management is already consolidated in the IOASID layer,
so for VDPA or other users it is just a matter of creating its own ioasid_set
and doing allocation.
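
For example, creating a set and allocating from it would look roughly like
this; ioasid_alloc() and INVALID_IOASID exist in the kernel today, while
ioasid_alloc_set() and its arguments are from the proposed extensions and
should be read as illustrative:

	#include <linux/ioasid.h>

	/* Create a private set/pool with a quota (proposed extension API;
	 * token, quota and private_data are caller-provided).
	 */
	struct ioasid_set *set = ioasid_alloc_set(token, quota,
						  IOASID_SET_TYPE_MM);

	/* Allocate a PASID from that set (existing in-kernel API);
	 * the 1..(1 << 20) - 1 range here is illustrative.
	 */
	ioasid_t pasid = ioasid_alloc(set, 1, (1U << 20) - 1, private_data);
	if (pasid == INVALID_IOASID)
		/* quota exhausted or range full */;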

IOASID is also available to in-kernel users, which do not
need /dev/sva AFAICT. For bare metal SVA, I don't see a need to create this
'floating' state of the PASID when created by /dev/sva. PASID allocation
could happen behind the scenes when users need to bind page tables to a
device DMA stream. Security authorization of the PASID is natively enforced
when users try to bind page tables; there is no need to pass the FD handle of
the PASID back to the kernel as you suggested earlier.

Thanks,

Jacob


Re: [PATCH] dma-direct: Fix potential NULL pointer dereference

2020-09-15 Thread Thomas Tai




On 2020-09-15 11:09 a.m., Christoph Hellwig wrote:

On Tue, Sep 15, 2020 at 10:40:39AM -0400, Thomas Tai wrote:

+++ b/include/linux/dma-direct.h
@@ -62,9 +62,6 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size,
 {
 	dma_addr_t end = addr + size - 1;
 
-	if (!dev->dma_mask)
-		return false;
-


I am concerned that some drivers may rely on this NULL check. Do you
think we can keep this check and use the following WARN_ON_ONCE()?


dma_capable is not a helper for drivers, but just for dma-direct
and related code. And this patch adds the checks for the three
places where we call into the ->map* methods.



Hi Christoph,
I tried out the suggested changes, and it successfully warned about the NULL
pointer without panicking. I notice that there are some places outside
dma-direct which call dma_capable().


https://elixir.bootlin.com/linux/v5.9-rc5/source/arch/x86/kernel/amd_gart_64.c#L187

https://elixir.bootlin.com/linux/v5.9-rc5/source/drivers/xen/swiotlb-xen.c#L387

Also, if I remove the NULL check in dma_capable(), I may run into the
risk of a NULL pointer dereference within the function.


@@ -62,9 +62,6 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size,
 {
 	dma_addr_t end = addr + size - 1;
 
-	if (!dev->dma_mask)
-		return false;
-
 	if (is_ram && !IS_ENABLED(CONFIG_ARCH_DMA_ADDR_T_64BIT) &&
 	    min(addr, end) < phys_to_dma(dev, PFN_PHYS(min_low_pfn)))
 		return false;
 
 	return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
 	                           ^
 	                           |
 	                ** risk of a null dereference **
 }


Given that the WARN_ON_ONCE already does the intended warning, would you
be OK with keeping the NULL check in dma_capable()?
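
i.e. something along these lines; a sketch of the suggestion, not a tested
patch:

static inline bool dma_capable(struct device *dev, dma_addr_t addr,
			       size_t size, bool is_ram)
{
	dma_addr_t end = addr + size - 1;

	/* Keep the check, but make misuse loud instead of silent. */
	if (WARN_ON_ONCE(!dev->dma_mask))
		return false;

	if (is_ram && !IS_ENABLED(CONFIG_ARCH_DMA_ADDR_T_64BIT) &&
	    min(addr, end) < phys_to_dma(dev, PFN_PHYS(min_low_pfn)))
		return false;

	return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
}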


Thank you,
Thomas



Re: [PATCH 6/6] dma-mapping: introduce DMA range map, supplanting dma_pfn_offset

2020-09-15 Thread Mathieu Poirier
On Tue, Sep 15, 2020 at 07:41:22AM +0200, Christoph Hellwig wrote:
> On Mon, Sep 14, 2020 at 05:01:47PM -0600, Mathieu Poirier wrote:
> 
> [700 lines of the fullquote deleted..]
> 
> > > + for (r = map; r->size; r++)
> > > + num_ranges++;
> > > +
> > > + new_map = kmemdup(map, array_size(num_ranges + 1, sizeof(*map)),
> > > +   GFP_KERNEL);
> > > + if (!new_map)
> > > + return -ENOMEM;
> > > + to->dma_range_map = new_map;
> > > + return 0;
> > > +}
> > > +
> > 
> > This patch seemed Ok to me but it broke the stm32 remoteproc
> > implementation. When I tested things out, function
> > dma_coerce_mask_and_coherent() returns -5 and the rest of the
> > initialisation fails. I isolated things to function dma_to_pfn() [2].
> > In the original implementation __bus_to_pfn() returns 0xfffff and
> > dev->dma_pfn_offset is equal to 0x38000. As such the function returns
> > 0x137fff and dma_supported() a non-zero value [3].
> > 
> > With this set, function dma_to_pfn() received a face lift. Function
> > __bus_to_pfn() still returns 0xfffff but translate_dma_to_phys()
> > returns 0, which forces dma_supported() to also return 0, and that is
> > where the -5 (-EIO) comes from.
> > 
> > Taking a further look at translate_dma_to_phys(), @dma_addr never
> > falls within the bus_dma_region ranges and it returns 0.
> > 
> > I'm suspecting an initialisation problem and if it occurred here, it will
> > likely show up elsewhere.
> 
> Can you try this incremental patch?
> 
> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
> index 088c97181ab146..c6b21acba7a459 100644
> --- a/include/linux/dma-direct.h
> +++ b/include/linux/dma-direct.h
> @@ -46,7 +46,7 @@ static inline phys_addr_t translate_dma_to_phys(struct device *dev,
>  		if (dma_addr >= m->dma_start && dma_addr - m->dma_start < m->size)
>  			return (phys_addr_t)dma_addr + m->offset;
>  
> -	return 0;
> +	return (phys_addr_t)-1;

That did the trick - the stm32 platform driver's probe() function completes and
the remote processor is operational.

That being said, the value returned by function dma_to_pfn()
is 0x137fff in the original code and 0xfffff with your patches applied.

Thanks,
Mathieu

>  }
>  
>  #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Raj, Ashok
On Tue, Sep 15, 2020 at 03:45:10PM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 15, 2020 at 11:11:54AM -0700, Raj, Ashok wrote:
> > > PASID applies widely to many device and needs to be introduced with a
> > > wide community agreement so all scenarios will be supportable.
> > 
> > True, reading some of the earlier replies I was clearly confused as I
> > thought you were talking about mdev again. But now that you say it, you
> > have moved past mdev and it's the PASID interfaces, correct?
> 
> Yes, we agreed mdev for IDXD at LPC, didn't talk about PASID.
> 
> > For native user applications, there is just 1 PASID per
> > process. There is no need for quota management.
> 
> Yes, there is. There is a limited pool of HW PASIDs. If one user fork
> bombs, it can easily claim an unreasonable number from that pool as
> each process will claim a PASID. That can DOS the rest of the system.

Not sure how you had this played out.. For PASID used in ENQCMD today for
our SVM usages, we *DO* not automatically propagate or allocate new PASIDs.

The new process needs to bind to get a PASID for its own use. For threads
of the same process the PASID is inherited. For forks(), we do not
auto-allocate them, since a PASID isn't a sharable resource, much like how
you would not pass mmio mmaps that cannot be shared to forked processes,
correct? Such as your doorbell space, for example.

> 
> If PASID DOS is a worry then it must be solved at the IOMMU level for
> all user applications that might trigger a PASID allocation. VFIO is
> not special.

Feels like you can simply avoid the PASID DOS rather than permit it to
happen. 
> 
> > IIUC, you are asking that part of the interface to move to an API interface
> > that potentially the new /dev/sva and VFIO could share? I think the APIs
> > for PASID management themselves are generic (Jean's patchset + Jacob's
> > ioasid set management).
> 
> Yes, the in kernel APIs are pretty generic now, and can be used by
> many types of drivers.

Good, so there are no new requirements here, I suppose.
> 
> As JasonW kicked this off, VDPA will need all this identical stuff
> too. We already know this, and I think Intel VDPA HW will need it, so
> it should concern you too :)

This is one of those things that I would disagree and commit :-).. 

> 
> A PASID vIOMMU solution sharable with VDPA and VFIO, based on a PASID
> control char dev (eg /dev/sva, or maybe /dev/iommu) seems like a
> reasonable starting point for discussion.

Looks like now we are getting closer to what we need. :-)

Given that PASID APIs are general purpose today, any driver can use them
to take advantage. VFIO fortunately or unfortunately has the IOMMU things
abstracted. I suppose that support is also mostly built on top of the
generic iommu* API abstractions in a vendor-neutral way?

I'm still lost on what is missing that vDPA can't build on top of what is
available?

Cheers,
Ashok


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Jason Gunthorpe
On Tue, Sep 15, 2020 at 11:11:54AM -0700, Raj, Ashok wrote:
> > PASID applies widely to many device and needs to be introduced with a
> > wide community agreement so all scenarios will be supportable.
> 
> True, reading some of the earlier replies I was clearly confused as I
> thought you were talking about mdev again. But now that you say it, you
> have moved past mdev and it's the PASID interfaces, correct?

Yes, we agreed mdev for IDXD at LPC, didn't talk about PASID.

> For native user applications, there is just 1 PASID per
> process. There is no need for quota management.

Yes, there is. There is a limited pool of HW PASIDs. If one user fork
bombs, it can easily claim an unreasonable number from that pool as
each process will claim a PASID. That can DOS the rest of the system.

If PASID DOS is a worry then it must be solved at the IOMMU level for
all user applications that might trigger a PASID allocation. VFIO is
not special.

> IIUC, you are asking that part of the interface to move to an API interface
> that potentially the new /dev/sva and VFIO could share? I think the APIs
> for PASID management themselves are generic (Jean's patchset + Jacob's
> ioasid set management).

Yes, the in kernel APIs are pretty generic now, and can be used by
many types of drivers.

As JasonW kicked this off, VDPA will need all this identical stuff
too. We already know this, and I think Intel VDPA HW will need it, so
it should concern you too :)

A PASID vIOMMU solution sharable with VDPA and VFIO, based on a PASID
control char dev (eg /dev/sva, or maybe /dev/iommu) seems like a
reasonable starting point for discussion.

Jason


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Raj, Ashok
On Tue, Sep 15, 2020 at 08:33:41AM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 14, 2020 at 03:44:38PM -0700, Raj, Ashok wrote:
> > Hi Jason,
> > 
> > I thought we discussed this at LPC, but still seems to be going in
> > circles :-(.
> 
> We discused mdev at LPC, not PASID.
> 
> PASID applies widely to many device and needs to be introduced with a
> wide community agreement so all scenarios will be supportable.

True, reading some of the earlier replies I was clearly confused as I
thought you were talking about mdev again. But now that you say it, you
have moved past mdev and it's the PASID interfaces, correct?

> 
> > As you had suggested earlier in the mail thread could Jason Wang maybe
> > build out what it takes to have a full fledged /dev/sva interface for vDPA
> > and figure out how the interfaces should emerge? otherwise it appears
> > everyone is talking very high level and with that limited understanding of
> > how things work at the moment. 
> 
> You want Jason Wang to do the work to get Intel PASID support merged?
> Seems a bit of strange request.

I was reading mdev in my head. Not PASID, sorry.

For native user applications, there is just 1 PASID per process, so there
is no need for quota management. VFIO, being the one used for guests where
there are more PASIDs per guest, is where this is enforced today.

IIUC, you are asking that part of the interface to move to an API interface
that potentially the new /dev/sva and VFIO could share? I think the APIs
for PASID management themselves are generic (Jean's patchset + Jacob's
ioasid set management). 

Possibly what you need is already available, just not in the specific way
that you expect, maybe?

Let me check with Jacob and let him/Jean pick that up.

Cheers,
Ashok



[PATCH v2] iommu: Kconfig: Update help description for IPMMU_VMSA config

2020-09-15 Thread Lad Prabhakar
The ipmmu-vmsa driver is also used on Renesas RZ/G{1,2} SoCs; update the
description for the IPMMU_VMSA config symbol to reflect this.

Signed-off-by: Lad Prabhakar 
Reviewed-by: Chris Paterson 
Reviewed-by: Geert Uytterhoeven 
---
v1->v2
* Updated commit description as suggested by Geert
* Included RB from Geert
---
 drivers/iommu/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index bef5d75e306b..d8f71bf31786 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -232,7 +232,7 @@ config IPMMU_VMSA
select ARM_DMA_USE_IOMMU
help
  Support for the Renesas VMSA-compatible IPMMU found in the R-Mobile
- APE6, R-Car Gen2, and R-Car Gen3 SoCs.
+ APE6, R-Car Gen{2,3} and RZ/G{1,2} SoCs.
 
  If unsure, say N.
 
-- 
2.17.1



[PATCH 18/18] firewire-ohci: use dma_alloc_pages

2020-09-15 Thread Christoph Hellwig
Use dma_alloc_pages to allocate DMAable pages instead of hoping that
the architecture either has GFP_DMA32 or not more than 4G of memory.

Signed-off-by: Christoph Hellwig 
---
 drivers/firewire/ohci.c | 26 +++---
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/drivers/firewire/ohci.c b/drivers/firewire/ohci.c
index 020cb15a4d8fcc..9811c40956e54d 100644
--- a/drivers/firewire/ohci.c
+++ b/drivers/firewire/ohci.c
@@ -674,17 +674,16 @@ static void ar_context_link_page(struct ar_context *ctx, unsigned int index)
 
 static void ar_context_release(struct ar_context *ctx)
 {
+   struct device *dev = ctx->ohci->card.device;
unsigned int i;
 
vunmap(ctx->buffer);
 
-   for (i = 0; i < AR_BUFFERS; i++)
-   if (ctx->pages[i]) {
-   dma_unmap_page(ctx->ohci->card.device,
-  ar_buffer_bus(ctx, i),
-  PAGE_SIZE, DMA_FROM_DEVICE);
-   __free_page(ctx->pages[i]);
-   }
+   for (i = 0; i < AR_BUFFERS; i++) {
+   if (ctx->pages[i])
+   dma_free_pages(dev, PAGE_SIZE, ctx->pages[i],
+  ar_buffer_bus(ctx, i), DMA_FROM_DEVICE);
+   }
 }
 
 static void ar_context_abort(struct ar_context *ctx, const char *error_msg)
@@ -970,6 +969,7 @@ static void ar_context_tasklet(unsigned long data)
 static int ar_context_init(struct ar_context *ctx, struct fw_ohci *ohci,
   unsigned int descriptors_offset, u32 regs)
 {
+   struct device *dev = ohci->card.device;
unsigned int i;
dma_addr_t dma_addr;
struct page *pages[AR_BUFFERS + AR_WRAPAROUND_PAGES];
@@ -980,17 +980,13 @@ static int ar_context_init(struct ar_context *ctx, struct fw_ohci *ohci,
 	tasklet_init(&ctx->tasklet, ar_context_tasklet, (unsigned long)ctx);
 
 	for (i = 0; i < AR_BUFFERS; i++) {
-		ctx->pages[i] = alloc_page(GFP_KERNEL | GFP_DMA32);
+		ctx->pages[i] = dma_alloc_pages(dev, PAGE_SIZE, &dma_addr,
+						DMA_FROM_DEVICE, GFP_KERNEL);
if (!ctx->pages[i])
goto out_of_memory;
-   dma_addr = dma_map_page(ohci->card.device, ctx->pages[i],
-   0, PAGE_SIZE, DMA_FROM_DEVICE);
-   if (dma_mapping_error(ohci->card.device, dma_addr)) {
-   __free_page(ctx->pages[i]);
-   ctx->pages[i] = NULL;
-   goto out_of_memory;
-   }
set_page_private(ctx->pages[i], dma_addr);
+   dma_sync_single_for_device(dev, dma_addr, PAGE_SIZE,
+  DMA_FROM_DEVICE);
}
 
for (i = 0; i < AR_BUFFERS; i++)
-- 
2.28.0



[PATCH 17/18] dma-iommu: implement ->alloc_noncoherent

2020-09-15 Thread Christoph Hellwig
Implement the alloc_noncoherent method to provide memory that is neither
coherent nor contiguous.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 41 +++
 1 file changed, 37 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 00a5b49248e334..c12c1dc43d312e 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -572,6 +572,7 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
  * @size: Size of buffer in bytes
  * @dma_handle: Out argument for allocated DMA handle
  * @gfp: Allocation flags
+ * @prot: pgprot_t to use for the remapped mapping
  * @attrs: DMA attributes for this allocation
  *
  * If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
@@ -580,14 +581,14 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
  * Return: Mapped virtual address, or NULL on failure.
  */
 static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
-   dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
+   dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot,
+   unsigned long attrs)
 {
struct iommu_domain *domain = iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
	struct iova_domain *iovad = &cookie->iovad;
bool coherent = dev_is_dma_coherent(dev);
int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
-   pgprot_t prot = dma_pgprot(dev, PAGE_KERNEL, attrs);
unsigned int count, min_size, alloc_sizes = domain->pgsize_bitmap;
struct page **pages;
struct sg_table sgt;
@@ -1030,8 +1031,10 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
gfp |= __GFP_ZERO;
 
if (IS_ENABLED(CONFIG_DMA_REMAP) && gfpflags_allow_blocking(gfp) &&
-   !(attrs & DMA_ATTR_FORCE_CONTIGUOUS))
-   return iommu_dma_alloc_remap(dev, size, handle, gfp, attrs);
+   !(attrs & DMA_ATTR_FORCE_CONTIGUOUS)) {
+   return iommu_dma_alloc_remap(dev, size, handle, gfp,
+   dma_pgprot(dev, PAGE_KERNEL, attrs), attrs);
+   }
 
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
!gfpflags_allow_blocking(gfp) && !coherent)
@@ -1052,6 +1055,34 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
return cpu_addr;
 }
 
+#ifdef CONFIG_DMA_REMAP
+static void *iommu_dma_alloc_noncoherent(struct device *dev, size_t size,
+   dma_addr_t *handle, enum dma_data_direction dir, gfp_t gfp)
+{
+   if (!gfpflags_allow_blocking(gfp)) {
+   struct page *page;
+
+   page = dma_common_alloc_pages(dev, size, handle, dir, gfp);
+   if (!page)
+   return NULL;
+   return page_address(page);
+   }
+
+   return iommu_dma_alloc_remap(dev, size, handle, gfp | __GFP_ZERO,
+PAGE_KERNEL, 0);
+}
+
+static void iommu_dma_free_noncoherent(struct device *dev, size_t size,
+   void *cpu_addr, dma_addr_t handle, enum dma_data_direction dir)
+{
+   __iommu_dma_unmap(dev, handle, size);
+   __iommu_dma_free(dev, size, cpu_addr);
+}
+#else
+#define iommu_dma_alloc_noncoherentNULL
+#define iommu_dma_free_noncoherent NULL
+#endif /* CONFIG_DMA_REMAP */
+
 static int iommu_dma_mmap(struct device *dev, struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs)
@@ -1122,6 +1153,8 @@ static const struct dma_map_ops iommu_dma_ops = {
.free   = iommu_dma_free,
.alloc_pages= dma_common_alloc_pages,
.free_pages = dma_common_free_pages,
+   .alloc_noncoherent  = iommu_dma_alloc_noncoherent,
+   .free_noncoherent   = iommu_dma_free_noncoherent,
.mmap   = iommu_dma_mmap,
.get_sgtable= iommu_dma_get_sgtable,
.map_page   = iommu_dma_map_page,
-- 
2.28.0



[PATCH v8 6/9] x86/msr-index: Define IA32_PASID MSR

2020-09-15 Thread Fenghua Yu
The IA32_PASID MSR (0xd93) contains the Process Address Space Identifier
(PASID), a 20-bit value. Bit 31 must be set to indicate the value
programmed in the MSR is valid. Hardware uses PASID to identify process
address space and direct responses to the right address space.
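
For illustration, programming the MSR with the definitions added below would
look roughly like this (a sketch, not part of the patch):

	/* Compose the value: 20-bit PASID in bits 19:0, valid bit 31. */
	u64 val = (pasid & GENMASK_ULL(19, 0)) | MSR_IA32_PASID_VALID;
	wrmsrl(MSR_IA32_PASID, val);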

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Change "identify process" to "identify process address space" in the
  commit message (Thomas)

 arch/x86/include/asm/msr-index.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 2859ee4f39a8..aaddc6a9e237 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -257,6 +257,9 @@
 #define MSR_IA32_LASTINTFROMIP 0x01dd
 #define MSR_IA32_LASTINTTOIP   0x01de
 
+#define MSR_IA32_PASID 0x0d93
+#define MSR_IA32_PASID_VALID   BIT_ULL(31)
+
 /* DEBUGCTLMSR bits (others vary by model): */
 #define DEBUGCTLMSR_LBR(1UL <<  0) /* last branch 
recording */
 #define DEBUGCTLMSR_BTF_SHIFT  1
-- 
2.19.1



[PATCH v8 2/9] iommu/vt-d: Change flags type to unsigned int in binding mm

2020-09-15 Thread Fenghua Yu
"flags" passed to intel_svm_bind_mm() is a bit mask and should be
defined as "unsigned int" instead of "int".

Change its type to "unsigned int".

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
Reviewed-by: Lu Baolu 
---
v5:
- Reviewed by Lu Baolu

v2:
- Add this new patch per Thomas' comment.

 drivers/iommu/intel/svm.c   | 7 ---
 include/linux/intel-iommu.h | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index e78a74a9c1cf..fc90a079e228 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -446,7 +446,8 @@ int intel_svm_unbind_gpasid(struct device *dev, u32 pasid)
 
 /* Caller must hold pasid_mutex, mm reference */
 static int
-intel_svm_bind_mm(struct device *dev, int flags, struct svm_dev_ops *ops,
+intel_svm_bind_mm(struct device *dev, unsigned int flags,
+ struct svm_dev_ops *ops,
  struct mm_struct *mm, struct intel_svm_dev **sd)
 {
struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
@@ -1033,7 +1034,7 @@ intel_svm_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
 {
struct iommu_sva *sva = ERR_PTR(-EINVAL);
struct intel_svm_dev *sdev = NULL;
-   int flags = 0;
+   unsigned int flags = 0;
int ret;
 
/*
@@ -1042,7 +1043,7 @@ intel_svm_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
 	 * and intel_svm etc.
 	 */
 	if (drvdata)
-		flags = *(int *)drvdata;
+		flags = *(unsigned int *)drvdata;
 	mutex_lock(&pasid_mutex);
 	ret = intel_svm_bind_mm(dev, flags, NULL, mm, &sdev);
if (ret)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 7322073f62d0..9c3e8337442a 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -765,7 +765,7 @@ struct intel_svm {
struct mm_struct *mm;
 
struct intel_iommu *iommu;
-   int flags;
+   unsigned int flags;
u32 pasid;
int gpasid; /* In case that guest PASID is different from host PASID */
struct list_head devs;
-- 
2.19.1



[PATCH v8 3/9] Documentation/x86: Add documentation for SVA (Shared Virtual Addressing)

2020-09-15 Thread Fenghua Yu
From: Ashok Raj 

ENQCMD and Data Streaming Accelerator (DSA) and all of their associated
features are a complicated stack with lots of interconnected pieces.
This documentation provides a big picture overview for all of the
features.

Signed-off-by: Ashok Raj 
Co-developed-by: Fenghua Yu 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v8:
- Address all of comments from Boris and Randy.

v7:
- Change the doc for updating PASID by IPI and context switch (Andy).

v3:
- Replace deprecated intel_svm_bind_mm() by iommu_sva_bind_mm() (Baolu)
- Fix a couple of typos (Baolu)

v2:
- Fix the doc format and add the doc in toctree (Thomas)
- Modify the doc for better description (Thomas, Tony, Dave)

 Documentation/x86/index.rst |   1 +
 Documentation/x86/sva.rst   | 256 
 2 files changed, 257 insertions(+)
 create mode 100644 Documentation/x86/sva.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 265d9e9a093b..e5d5ff096685 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -30,3 +30,4 @@ x86-specific Documentation
usb-legacy-support
i386/index
x86_64/index
+   sva
diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
new file mode 100644
index ..a1f008ef7dad
--- /dev/null
+++ b/Documentation/x86/sva.rst
@@ -0,0 +1,256 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===
+Shared Virtual Addressing (SVA) with ENQCMD
+===
+
+Background
+==
+
+Shared Virtual Addressing (SVA) allows the processor and device to use the
+same virtual addresses avoiding the need for software to translate virtual
+addresses to physical addresses. SVA is what PCIe calls Shared Virtual
+Memory (SVM).
+
+In addition to the convenience of using application virtual addresses
+by the device, it also doesn't require pinning pages for DMA.
+PCIe Address Translation Services (ATS) along with Page Request Interface
+(PRI) allow devices to function much the same way as the CPU handling
+application page-faults. For more information please refer to the PCIe
+specification Chapter 10: ATS Specification.
+
+Use of SVA requires IOMMU support in the platform. IOMMU also is required
+to support PCIe features ATS and PRI. ATS allows devices to cache
+translations for virtual addresses. The IOMMU driver uses the mmu_notifier()
+support to keep the device TLB cache and the CPU cache in sync. PRI allows
+the device to request paging the virtual address by using the CPU page tables
+before accessing the address.
+
+
+Shared Hardware Workqueues
+==
+
+Unlike Single Root I/O Virtualization (SR-IOV), Scalable IOV (SIOV) permits
+the use of Shared Work Queues (SWQ) by both applications and Virtual
+Machines (VMs). This allows better hardware utilization vs. hard
+partitioning resources that could result in underutilization. In order to
+allow the hardware to distinguish the context for which work is being
+executed in the hardware via the SWQ interface, SIOV uses Process Address
+Space ID (PASID), which is a 20-bit number defined by the PCIe SIG.
+
+PASID value is encoded in all transactions from the device. This allows the
+IOMMU to track I/O on a per-PASID granularity in addition to using the PCIe
+Resource Identifier (RID) which is the Bus/Device/Function.
+
+
+ENQCMD
+==
+
+ENQCMD is a new instruction on Intel platforms that atomically submits a
+work descriptor to a device. The descriptor includes the operation to be
+performed, virtual addresses of all parameters, virtual address of a completion
+record, and the PASID (process address space ID) of the current process.
+
+ENQCMD works with non-posted semantics and carries a status back if the
+command was accepted by hardware. This allows the submitter to know if the
+submission needs to be retried or other device specific mechanisms to
+implement fairness or ensure forward progress should be provided.
+
+ENQCMD is the glue that ensures applications can directly submit commands
+to the hardware and also permits hardware to be aware of application context
+to perform I/O operations via use of PASID.
+
+Process Address Space Tagging
+=
+
+A new thread-scoped MSR (IA32_PASID) provides the connection between
+user processes and the rest of the hardware. When an application first
+accesses an SVA-capable device this MSR is initialized with a newly
+allocated PASID. The driver for the device calls an IOMMU-specific API
+that sets up the routing for DMA and page-requests.
+
+For example, the Intel Data Streaming Accelerator (DSA) uses
+iommu_sva_bind_device(), which will do the following:
+
+- Allocate the PASID, and program the process page-table (%cr3 register) in the
+  PASID context entries.
+- Register for mmu_notifier() to track any page-table invalidations to keep
+  the device TLB in sync. For example, when a page-table entry is invalidated,

[PATCH v8 4/9] x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions

2020-09-15 Thread Fenghua Yu
The work submission instruction comes in two flavors. ENQCMD can be called
both in ring 3 and ring 0 and always uses the contents of PASID MSR when
shipping the command to the device. ENQCMDS allows a kernel driver to
submit commands on behalf of a user process. The driver supplies the
PASID value in ENQCMDS. There isn't any usage of ENQCMD in the kernel
as of now.

The CPU feature flag is shown as "enqcmd" in /proc/cpuinfo.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Re-write commit message (Thomas)

 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/cpuid-deps.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 2901d5df4366..fea10d04d05f 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -353,6 +353,7 @@
 #define X86_FEATURE_CLDEMOTE   (16*32+25) /* CLDEMOTE instruction */
 #define X86_FEATURE_MOVDIRI(16*32+27) /* MOVDIRI instruction */
 #define X86_FEATURE_MOVDIR64B  (16*32+28) /* MOVDIR64B instruction */
+#define X86_FEATURE_ENQCMD		(16*32+29) /* ENQCMD and ENQCMDS instructions */
 
 /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery support */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 3cbe24ca80ab..3a02707c1f4d 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -69,6 +69,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_CQM_MBM_TOTAL,X86_FEATURE_CQM_LLC   },
{ X86_FEATURE_CQM_MBM_LOCAL,X86_FEATURE_CQM_LLC   },
{ X86_FEATURE_AVX512_BF16,  X86_FEATURE_AVX512VL  },
+   { X86_FEATURE_ENQCMD,   X86_FEATURE_XSAVES},
{}
 };
 
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v8 8/9] x86/cpufeatures: Mark ENQCMD as disabled when configured out

2020-09-15 Thread Fenghua Yu
Currently, the ENQCMD feature depends on CONFIG_IOMMU_SUPPORT.
Add X86_FEATURE_ENQCMD to the disabled features mask.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v8:
- Re-write commit message (Boris).
- Move "#ifdef CONFIG_IOMMU_SUPPORT" hunk from patch 9 (Boris).

v7:
- Split this patch from a previous patch.

 arch/x86/include/asm/disabled-features.h | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/disabled-features.h 
b/arch/x86/include/asm/disabled-features.h
index 4ea8584682f9..5861d34f9771 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -56,6 +56,12 @@
 # define DISABLE_PTI   (1 << (X86_FEATURE_PTI & 31))
 #endif
 
+#ifdef CONFIG_IOMMU_SUPPORT
+# define DISABLE_ENQCMD0
+#else
+# define DISABLE_ENQCMD (1 << (X86_FEATURE_ENQCMD & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -75,7 +81,8 @@
 #define DISABLED_MASK130
 #define DISABLED_MASK140
 #define DISABLED_MASK150
-#define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP)
+#define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
+			 DISABLE_ENQCMD)
 #define DISABLED_MASK170
 #define DISABLED_MASK180
 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19)
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v8 9/9] x86/mmu: Allocate/free PASID

2020-09-15 Thread Fenghua Yu
A PASID is allocated for an "mm" the first time any thread binds
to an SVM capable device and is freed from the "mm" when the SVM is
unbound by the last thread. It's possible for the "mm" to have different
PASID values in different binding/unbinding SVM cycles.

The mm's PASID (non-zero for a valid PASID, 0 for an invalid one) is
propagated to the per-thread PASID MSR for all threads within the mm
through IPI, context switch, or inheritance, so that a running thread
always has a PASID MSR matching the mm's PASID.

Suggested-by: Andy Lutomirski 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v8:
- Re-write commit message (Boris).
- Remove ppasid_state == NULL check in update_pasid() (Boris).
- Move "#ifdef CONFIG_IOMMU_SUPPORT" hunk to patch 8 (Boris).
- Add comment when calling update_pasid().

v7:
- Don't fix up PASID in #GP. Instead, update the PASID MSR by IPI and
  context switch after PASID allocation and free. Inherit PASID from
  parent. (Andy)

Before v7:
- Allocate a PASID for the mm and free it until mm exit.

 arch/x86/include/asm/fpu/api.h  | 12 ++
 arch/x86/include/asm/fpu/internal.h |  7 
 arch/x86/kernel/fpu/xstate.c| 57 +
 drivers/iommu/intel/svm.c   | 28 +-
 4 files changed, 103 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index b774c52e5411..dcd9503b1098 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -62,4 +62,16 @@ extern void switch_fpu_return(void);
  */
 extern int cpu_has_xfeatures(u64 xfeatures_mask, const char **feature_name);
 
+/*
+ * Tasks that are not using SVA have mm->pasid set to zero to note that they
+ * will not have the valid bit set in MSR_IA32_PASID while they are running.
+ */
+#define PASID_DISABLED 0
+
+#ifdef CONFIG_IOMMU_SUPPORT
+/* Update current's PASID MSR/state by mm's PASID. */
+void update_pasid(void);
+#else
+static inline void update_pasid(void) { }
+#endif
 #endif /* _ASM_X86_FPU_API_H */
diff --git a/arch/x86/include/asm/fpu/internal.h 
b/arch/x86/include/asm/fpu/internal.h
index 0a460f2a3f90..341d00eba3f8 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -583,6 +583,13 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
pkru_val = pk->pkru;
}
__write_pkru(pkru_val);
+
+	/*
+	 * The expensive PASID MSR write in update_pasid() will be avoided
+	 * because TIF_NEED_FPU_LOAD was set. And the PASID state won't be
+	 * updated unless it differs from mm->pasid, to reduce overhead.
+	 */
+   update_pasid();
 }
 
 /*
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 67f1a03b9b23..5d8047441a0a 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1402,3 +1402,60 @@ int proc_pid_arch_status(struct seq_file *m, struct 
pid_namespace *ns,
return 0;
 }
 #endif /* CONFIG_PROC_PID_ARCH_STATUS */
+
+#ifdef CONFIG_IOMMU_SUPPORT
+void update_pasid(void)
+{
+   u64 pasid_state;
+   u32 pasid;
+
+   if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
+   return;
+
+   if (!current->mm)
+   return;
+
+   pasid = READ_ONCE(current->mm->pasid);
+   /* Set the valid bit in the PASID MSR/state only for valid pasid. */
+   pasid_state = pasid == PASID_DISABLED ?
+ pasid : pasid | MSR_IA32_PASID_VALID;
+
+   /*
+* No need to hold fregs_lock() since the task's fpstate won't
+* be changed by others (e.g. ptrace) while the task is being
+* switched to or is in IPI.
+*/
+   if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
+   /* The MSR is active and can be directly updated. */
+   wrmsrl(MSR_IA32_PASID, pasid_state);
+   } else {
+		struct fpu *fpu = &current->thread.fpu;
+   struct ia32_pasid_state *ppasid_state;
+   struct xregs_state *xsave;
+
+   /*
+* The CPU's xstate registers are not currently active. Just
+* update the PASID state in the memory buffer here. The
+* PASID MSR will be loaded when returning to user mode.
+*/
+		xsave = &fpu->state.xsave;
+   xsave->header.xfeatures |= XFEATURE_MASK_PASID;
+   ppasid_state = get_xsave_addr(xsave, XFEATURE_PASID);
+   /*
+* Since XFEATURE_MASK_PASID is set in xfeatures, ppasid_state
+* won't be NULL and no need to check its value.
+*
+* Only update the task's PASID state when it's different
+* from the mm's pasid.
+*/
+   if (ppasid_state->pasid != pasid_state) {
+   /*
+* Invalid fpregs so that state restoring will pick up
+* the PASID 

[PATCH v8 7/9] mm: Define pasid in mm

2020-09-15 Thread Fenghua Yu
PASID is shared by all threads in a process. So the logical place to keep
track of it is in the "mm". Both ARM and X86 need to use the PASID in the
"mm".

Suggested-by: Christoph Hellwig 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v4:
- Change PASID type to u32 (Christoph)

v3:
- Change CONFIG_PCI_PASID to CONFIG_IOMMU_SUPPORT because non-PCI device
  can have PASID in ARM (Jean)

v2:
- This new patch moves "pasid" from x86 specific mm_context_t to generic
  struct mm_struct per Christopher's comment: 
https://lore.kernel.org/linux-iommu/20200414170252.714402-1-jean-phili...@linaro.org/T/#mb57110ffe1aaa24750eeea4f93b611f0d1913911
- Jean-Philippe Brucker released a virtually same patch. I still put this
  patch in the series for better review. The upstream kernel only needs one
  of the two patches eventually.
https://lore.kernel.org/linux-iommu/20200519175502.2504091-2-jean-phili...@linaro.org/
- Change CONFIG_IOASID to CONFIG_PCI_PASID (Ashok)

 include/linux/mm_types.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 496c3ff97cce..1ff0615ef19f 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -542,6 +542,10 @@ struct mm_struct {
atomic_long_t hugetlb_usage;
 #endif
struct work_struct async_put_work;
+
+#ifdef CONFIG_IOMMU_SUPPORT
+   u32 pasid;
+#endif
} __randomize_layout;
 
/*
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v8 1/9] drm, iommu: Change type of pasid to u32

2020-09-15 Thread Fenghua Yu
PASID is defined as a few different types in iommu including "int",
"u32", and "unsigned int". To be consistent and to match with uapi
definitions, define PASID and its variations (e.g. max PASID) as "u32".
"u32" is also shorter and a little more explicit than "unsigned int".

No PASID type change in uapi although it defines PASID as __u64 in
some places.

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
Reviewed-by: Lu Baolu 
Acked-by: Felix Kuehling 
---
v8:
- Change subject to "drm, iommu:" (Boris).

v7:
- Add "Acked-by: Felix Kuehling "

v6:
- Change return type to u32 for kfd_pasid_alloc() (Felix)

v5:
- Reviewed by Lu Baolu

v4:
- Change PASID type from "unsigned int" to "u32" (Christoph)

v2:
- Create this new patch to define PASID as "unsigned int" consistently in
  iommu (Thomas)

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  4 +--
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c|  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  4 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c   |  6 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h   |  4 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  8 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  8 ++---
 .../gpu/drm/amd/amdkfd/cik_event_interrupt.c  |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.h   |  2 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |  7 ++---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c   |  8 ++---
 drivers/gpu/drm/amd/amdkfd/kfd_events.h   |  4 +--
 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c|  6 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_pasid.c|  4 +--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 20 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  2 +-
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |  2 +-
 drivers/iommu/amd/amd_iommu.h | 10 +++---
 drivers/iommu/amd/iommu.c | 31 ++-
 drivers/iommu/amd/iommu_v2.c  | 20 ++--
 drivers/iommu/intel/dmar.c|  7 +++--
 drivers/iommu/intel/iommu.c   |  4 +--
 drivers/iommu/intel/pasid.c   | 31 +--
 drivers/iommu/intel/pasid.h   | 24 +++---
 drivers/iommu/intel/svm.c | 12 +++
 drivers/iommu/iommu.c |  2 +-
 drivers/misc/uacce/uacce.c|  2 +-
 include/linux/amd-iommu.h |  8 ++---
 include/linux/intel-iommu.h   | 12 +++
 include/linux/intel-svm.h |  2 +-
 include/linux/iommu.h | 10 +++---
 include/linux/uacce.h |  2 +-
 38 files changed, 141 insertions(+), 141 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index ffe149aafc39..dfef5a7e0f5a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -207,11 +207,11 @@ uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct kgd_dev 
*dst, struct kgd_dev *s
})
 
 /* GPUVM API */
-int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned int 
pasid,
+int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, u32 pasid,
void **vm, void **process_info,
struct dma_fence **ef);
 int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct kgd_dev *kgd,
-   struct file *filp, unsigned int pasid,
+   struct file *filp, u32 pasid,
void **vm, void **process_info,
struct dma_fence **ef);
 void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
index bf927f432506..ee531c3988d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
@@ -105,7 +105,7 @@ static void kgd_program_sh_mem_settings(struct kgd_dev 
*kgd, uint32_t vmid,
unlock_srbm(kgd);
 }
 
-static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
+static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, u32 pasid,
unsigned int vmid)
 {
struct amdgpu_device *adev = get_amdgpu_device(kgd);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index 

[PATCH v8 5/9] x86/fpu/xstate: Add supervisor PASID state for ENQCMD feature

2020-09-15 Thread Fenghua Yu
From: Yu-cheng Yu 

The ENQCMD instruction reads the PASID from the IA32_PASID MSR. The MSR
value is stored in the task's supervisor FPU PASID state and is context
switched by XSAVES/XRSTORS.

Signed-off-by: Yu-cheng Yu 
Co-developed-by: Fenghua Yu 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Modify the commit message (Thomas)

 arch/x86/include/asm/fpu/types.h  | 11 ++-
 arch/x86/include/asm/fpu/xstate.h |  2 +-
 arch/x86/kernel/fpu/xstate.c  |  6 +-
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index c87364ea6446..f5a38a5f3ae1 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -114,7 +114,7 @@ enum xfeature {
XFEATURE_Hi16_ZMM,
XFEATURE_PT_UNIMPLEMENTED_SO_FAR,
XFEATURE_PKRU,
-   XFEATURE_RSRVD_COMP_10,
+   XFEATURE_PASID,
XFEATURE_RSRVD_COMP_11,
XFEATURE_RSRVD_COMP_12,
XFEATURE_RSRVD_COMP_13,
@@ -134,6 +134,7 @@ enum xfeature {
 #define XFEATURE_MASK_Hi16_ZMM (1 << XFEATURE_Hi16_ZMM)
 #define XFEATURE_MASK_PT   (1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR)
 #define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU)
+#define XFEATURE_MASK_PASID(1 << XFEATURE_PASID)
 #define XFEATURE_MASK_LBR  (1 << XFEATURE_LBR)
 
 #define XFEATURE_MASK_FPSSE(XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
@@ -256,6 +257,14 @@ struct arch_lbr_state {
struct lbr_entryentries[];
 } __packed;
 
+/*
+ * State component 10 is supervisor state used for context-switching the
+ * PASID state.
+ */
+struct ia32_pasid_state {
+   u64 pasid;
+} __packed;
+
 struct xstate_header {
u64 xfeatures;
u64 xcomp_bv;
diff --git a/arch/x86/include/asm/fpu/xstate.h 
b/arch/x86/include/asm/fpu/xstate.h
index 14ab815132d4..47a92232d595 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -35,7 +35,7 @@
  XFEATURE_MASK_BNDCSR)
 
 /* All currently supported supervisor features */
-#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (0)
+#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
 
 /*
  * A supervisor state component may not always contain valuable information,
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 038e19c0019e..67f1a03b9b23 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -37,6 +37,7 @@ static const char *xfeature_names[] =
"AVX-512 ZMM_Hi256" ,
"Processor Trace (unused)"  ,
"Protection Keys User registers",
+   "PASID state",
"unknown xstate feature",
 };
 
@@ -51,6 +52,7 @@ static short xsave_cpuid_features[] __initdata = {
X86_FEATURE_AVX512F,
X86_FEATURE_INTEL_PT,
X86_FEATURE_PKU,
+   X86_FEATURE_ENQCMD,
 };
 
 /*
@@ -318,6 +320,7 @@ static void __init print_xstate_features(void)
print_xstate_feature(XFEATURE_MASK_ZMM_Hi256);
print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
print_xstate_feature(XFEATURE_MASK_PKRU);
+   print_xstate_feature(XFEATURE_MASK_PASID);
 }
 
 /*
@@ -592,6 +595,7 @@ static void check_xstate_against_struct(int nr)
XCHECK_SZ(sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state);
XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM,  struct avx_512_hi16_state);
XCHECK_SZ(sz, nr, XFEATURE_PKRU,  struct pkru_state);
+   XCHECK_SZ(sz, nr, XFEATURE_PASID, struct ia32_pasid_state);
 
/*
 * Make *SURE* to add any feature numbers in below if
@@ -601,7 +605,7 @@ static void check_xstate_against_struct(int nr)
if ((nr < XFEATURE_YMM) ||
(nr >= XFEATURE_MAX) ||
(nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) ||
-   ((nr >= XFEATURE_RSRVD_COMP_10) && (nr <= XFEATURE_LBR))) {
+   ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_LBR))) {
WARN_ONCE(1, "no structure for xstate: %d\n", nr);
XSTATE_WARN_ON(1);
}
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v8 0/9] x86: tag application address space for devices

2020-09-15 Thread Fenghua Yu
Typical hardware devices require a driver stack to translate application
buffers to hardware addresses, and a kernel-user transition to notify the
hardware of new work. What if both the translation and transition overhead
could be eliminated? This is what Shared Virtual Address (SVA) and
ENQCMD-enabled hardware like the Data Streaming Accelerator (DSA) aims to
achieve. Applications map portals into their local address space and
directly submit work to them using a new instruction.

This series enables ENQCMD and associated management of the new MSR
(MSR_IA32_PASID). This new MSR allows an application address space to be
associated with what the PCIe spec calls a Process Address Space ID (PASID).
This PASID tag is carried along with all requests between applications and
devices and allows devices to interact with the process address space.
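
[Editor's note: to make the portal pattern concrete, a user-space sketch.
The device node and descriptor layout are hypothetical, and _enqcmd() is
assumed to be the compiler intrinsic for the new instruction (built with
-menqcmd):

	#include <fcntl.h>
	#include <sys/mman.h>
	#include <immintrin.h>	/* _enqcmd() */

	int submit(const void *desc)	/* 64-byte work descriptor */
	{
		int fd = open("/dev/myaccel/wq0", O_RDWR);	/* hypothetical */
		void *portal = mmap(NULL, 4096, PROT_WRITE, MAP_SHARED, fd, 0);

		/* The CPU stamps the descriptor with IA32_PASID; a non-zero
		 * return (ZF set) means the shared queue was full: retry. */
		return _enqcmd(portal, desc);
	}
]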

SVA- and ENQCMD-enabled device drivers need this series. The phase 2 DSA
patches with SVA and ENQCMD support were released on top of this series:
https://lore.kernel.org/patchwork/cover/1244060/

This series only provides simple and basic support for ENQCMD and the MSR:
1. Clean up type definitions (patch 1-2). These patches can be in a
   separate series.
   - Define "pasid" as "u32" consistently
   - Define "flags" as "unsigned int"
2. Explain the various technical terms used in the series (patch 3).
3. Enumerate support for ENQCMD in the processor (patch 4).
4. Handle FPU PASID state and the MSR during context switch (patches 5-6).
5. Define "pasid" in mm_struct (patch 7).
6. Disable ENQCMD when configured out (patch 8).
7. Allocate and free PASID for a process (patch 9).

This patch series and the DSA phase 2 series are in
https://github.com/intel/idxd-driver/tree/idxd-stage2

References:
1. Detailed information on the ENQCMD/ENQCMDS instructions and the
IA32_PASID MSR can be found in Intel Architecture Instruction Set
Extensions and Future Features Programming Reference:
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

2. Detailed information on DSA can be found in DSA specification:
https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification

Change log:
v8:
- Change subject to "drm, iommu:" in patch 1 (Boris).
- Address all comments on patch 3 from Boris and Randy.
- Re-write commit message for patch 8 (Boris)
- Re-write commit message, remove "#ifdef CONFIG_IOMMU_SUPPORT", remove
  the "if (!ppasid_state)" check in patch 9 (Boris).

v7:
- Don't fix up PASID in #GP. Instead, update the PASID MSR by IPI and
  context switch after PASID allocation and free. Inherit PASID from
  parent. (Andy)

v6:
- Change return type to u32 for kfd_pasid_alloc() in patch 1 (Felix)

v5:
- Mark ENQCMD disabled when configured out and use cpu_feature_enabled()
  to simplify the feature checking code in patch 10 and 12 (PeterZ and
  Dave Hansen)
- Add Reviewed-by: Lu Baolu to patch 1, 2, 10, and 12.

v4:
- Define PASID as "u32" instead of "unsigned int" in patch 1, 7, 10, 12.
  (Christoph)
- Drop v3 patch 2 which changes PASID type in ocxl because it's not related
  to x86 and was rejected by ocxl maintainer Frederic Barrat
- A split patch which changes PASID type to u32 in crypto/hisilicon/qm.c
  was released separately to linux-crypto mailing list because it's not
  related to x86 and is a standalone patch:

v3:
- Change names of bind_mm() and unbind_mm() to match to new APIs in
  patch 4 (Baolu)
- Change CONFIG_PCI_PASID to CONFIG_IOMMU_SUPPORT because non-PCI device
  can have PASID in ARM in patch 8 (Jean)
- Add a few sanity checks in __free_pasid() and alloc_pasid() in
  patch 11 (Baolu)
- Add patch 12 to define a new flag "has_valid_pasid" for a task and
  use the flag to identify if the task has a valid PASID MSR (PeterZ)
- Add fpu__pasid_write() to update the MSR in fixup() in patch 13
- Check if mm->pasid can be found in fixup() in patch 13

v2:
- Add patches 1-3 to define "pasid" and "flags" as "unsigned int"
  consistently (Thomas)
  (these 3 patches could be in a separate patch set)
- Add patch 8 to move "pasid" to generic mm_struct (Christoph).
  Jean-Philippe Brucker released a virtually same patch. Upstream only
  needs one of the two.
- Add patch 9 to initialize PASID in a new mm.
- Plus other changes described in each patch (Thomas)

Ashok Raj (1):
  Documentation/x86: Add documentation for SVA (Shared Virtual
Addressing)

Fenghua Yu (7):
  drm, iommu: Change type of pasid to u32
  iommu/vt-d: Change flags type to unsigned int in binding mm
  x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions
  x86/msr-index: Define IA32_PASID MSR
  mm: Define pasid in mm
  x86/cpufeatures: Mark ENQCMD as disabled when configured out
  x86/mmu: Allocate/free PASID

Yu-cheng Yu (1):
  x86/fpu/xstate: Add supervisor PASID state for ENQCMD feature

 Documentation/x86/index.rst   |   1 +
 Documentation/x86/sva.rst

[PATCH 16/18] dma-mapping: add new {alloc, free}_noncoherent dma_map_ops methods

2020-09-15 Thread Christoph Hellwig
This will allow IOMMU drivers to allocate non-contiguous memory and
return a vmapped virtual address.
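
[Editor's note: a sketch of the shape this enables. The my_iommu_* names
are hypothetical; the actual IOMMU implementation follows later in the
series:

	/* Hypothetical wiring of the new hooks into a dma_map_ops instance. */
	static const struct dma_map_ops my_iommu_dma_ops = {
		.alloc			= my_iommu_dma_alloc,
		.free			= my_iommu_dma_free,
		/* non-contiguous pages behind a vmap()ed address: */
		.alloc_noncoherent	= my_iommu_dma_alloc_noncoherent,
		.free_noncoherent	= my_iommu_dma_free_noncoherent,
		/* ... map_page, map_sg, sync_* hooks ... */
	};
]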

Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-mapping.h |  5 +
 kernel/dma/mapping.c| 33 +++--
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index bf592cf0db4acb..b4b5d75260d6dc 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -80,6 +80,11 @@ struct dma_map_ops {
gfp_t gfp);
void (*free_pages)(struct device *dev, size_t size, struct page *vaddr,
dma_addr_t dma_handle, enum dma_data_direction dir);
+   void* (*alloc_noncoherent)(struct device *dev, size_t size,
+   dma_addr_t *dma_handle, enum dma_data_direction dir,
+   gfp_t gfp);
+   void (*free_noncoherent)(struct device *dev, size_t size, void *vaddr,
+   dma_addr_t dma_handle, enum dma_data_direction dir);
int (*mmap)(struct device *, struct vm_area_struct *,
  void *, dma_addr_t, size_t,
  unsigned long attrs);
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 6f86c925b8251d..8614d7d2ee59a9 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -502,19 +502,40 @@ EXPORT_SYMBOL_GPL(dma_free_pages);
 void *dma_alloc_noncoherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
 {
-   struct page *page;
+   const struct dma_map_ops *ops = get_dma_ops(dev);
+   void *vaddr;
 
-   page = dma_alloc_pages(dev, size, dma_handle, dir, gfp);
-   if (!page)
-   return NULL;
-   return page_address(page);
+   if (!ops || !ops->alloc_noncoherent) {
+   struct page *page;
+
+   page = dma_alloc_pages(dev, size, dma_handle, dir, gfp);
+   if (!page)
+   return NULL;
+   return page_address(page);
+   }
+
+   size = PAGE_ALIGN(size);
+   vaddr = ops->alloc_noncoherent(dev, size, dma_handle, dir, gfp);
+   if (vaddr)
+   debug_dma_map_page(dev, virt_to_page(vaddr), 0, size, dir,
+  *dma_handle);
+   return vaddr;
 }
 EXPORT_SYMBOL_GPL(dma_alloc_noncoherent);
 
 void dma_free_noncoherent(struct device *dev, size_t size, void *vaddr,
dma_addr_t dma_handle, enum dma_data_direction dir)
 {
-   dma_free_pages(dev, size, virt_to_page(vaddr), dma_handle, dir);
+   const struct dma_map_ops *ops = get_dma_ops(dev);
+
+   if (!ops || !ops->free_noncoherent) {
+   dma_free_pages(dev, size, virt_to_page(vaddr), dma_handle, dir);
+   return;
+   }
+
+   size = PAGE_ALIGN(size);
+   debug_dma_unmap_page(dev, dma_handle, size, dir);
+   ops->free_noncoherent(dev, size, vaddr, dma_handle, dir);
 }
 EXPORT_SYMBOL_GPL(dma_free_noncoherent);
 
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 15/18] dma-mapping: add a new dma_alloc_pages API

2020-09-15 Thread Christoph Hellwig
This API is the equivalent of alloc_pages, except that the returned memory
is guaranteed to be DMA addressable by the passed-in device.  The
implementation will also be used to provide a more sensible replacement
for the DMA_ATTR_NON_CONSISTENT flag.

Additionally, dma_alloc_noncoherent is switched over to use dma_alloc_pages
as its backend.
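
[Editor's note: a brief usage sketch of the API added below; dev, src and
len come from a hypothetical driver context:

	struct page *page;
	dma_addr_t dma;

	page = dma_alloc_pages(dev, PAGE_SIZE, &dma, DMA_TO_DEVICE, GFP_KERNEL);
	if (!page)
		return -ENOMEM;

	/* Unlike dma_alloc_coherent(), this is a normal struct page. */
	memcpy(page_address(page), src, len);
	dma_sync_single_for_device(dev, dma, len, DMA_TO_DEVICE);
	/* ... device reads the buffer ... */
	dma_free_pages(dev, PAGE_SIZE, page, dma, DMA_TO_DEVICE);
]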

Signed-off-by: Christoph Hellwig 
---
 Documentation/core-api/dma-attributes.rst |  8 ---
 arch/alpha/kernel/pci_iommu.c |  2 +
 arch/arm/mm/dma-mapping-nommu.c   |  2 +
 arch/arm/mm/dma-mapping.c |  4 ++
 arch/ia64/hp/common/sba_iommu.c   |  2 +
 arch/mips/jazz/jazzdma.c  |  7 +--
 arch/powerpc/kernel/dma-iommu.c   |  2 +
 arch/powerpc/platforms/ps3/system-bus.c   |  4 ++
 arch/powerpc/platforms/pseries/vio.c  |  2 +
 arch/s390/pci/pci_dma.c   |  2 +
 arch/x86/kernel/amd_gart_64.c |  2 +
 drivers/iommu/dma-iommu.c |  2 +
 drivers/iommu/intel/iommu.c   |  4 ++
 drivers/parisc/ccio-dma.c |  2 +
 drivers/parisc/sba_iommu.c|  2 +
 drivers/xen/swiotlb-xen.c |  2 +
 include/linux/dma-direct.h|  5 ++
 include/linux/dma-mapping.h   | 34 ++--
 include/linux/dma-noncoherent.h   |  3 --
 kernel/dma/direct.c   | 52 ++-
 kernel/dma/mapping.c  | 63 +--
 kernel/dma/ops_helpers.c  | 35 +
 kernel/dma/virt.c |  2 +
 23 files changed, 206 insertions(+), 37 deletions(-)

diff --git a/Documentation/core-api/dma-attributes.rst 
b/Documentation/core-api/dma-attributes.rst
index 29dcbe8826e85e..1887d92e8e9269 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -25,14 +25,6 @@ Since it is optional for platforms to implement 
DMA_ATTR_WRITE_COMBINE,
 those that do not will simply ignore the attribute and exhibit default
 behavior.
 
-DMA_ATTR_NON_CONSISTENT
------------------------
-
-DMA_ATTR_NON_CONSISTENT lets the platform to choose to return either
-consistent or non-consistent memory as it sees fit.  By using this API,
-you are guaranteeing to the platform that you have all the correct and
-necessary sync points for this memory in the driver.
-
 DMA_ATTR_NO_KERNEL_MAPPING
 --
 
diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index 6f7de4f4e191e7..447e0fd0ed3895 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -952,5 +952,7 @@ const struct dma_map_ops alpha_pci_ops = {
.dma_supported  = alpha_pci_supported,
.mmap   = dma_common_mmap,
.get_sgtable= dma_common_get_sgtable,
+   .alloc_pages= dma_common_alloc_pages,
+   .free_pages = dma_common_free_pages,
 };
 EXPORT_SYMBOL(alpha_pci_ops);
diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index 287ef898a55e11..43c6d66b6e733a 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -176,6 +176,8 @@ static void arm_nommu_dma_sync_sg_for_cpu(struct device 
*dev, struct scatterlist
 const struct dma_map_ops arm_nommu_dma_ops = {
.alloc  = arm_nommu_dma_alloc,
.free   = arm_nommu_dma_free,
+   .alloc_pages= dma_direct_alloc_pages,
+   .free_pages = dma_direct_free_pages,
.mmap   = arm_nommu_dma_mmap,
.map_page   = arm_nommu_dma_map_page,
.unmap_page = arm_nommu_dma_unmap_page,
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 8a8949174b1c06..7738b4d23f692f 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -199,6 +199,8 @@ static int arm_dma_supported(struct device *dev, u64 mask)
 const struct dma_map_ops arm_dma_ops = {
.alloc  = arm_dma_alloc,
.free   = arm_dma_free,
+   .alloc_pages= dma_direct_alloc_pages,
+   .free_pages = dma_direct_free_pages,
.mmap   = arm_dma_mmap,
.get_sgtable= arm_dma_get_sgtable,
.map_page   = arm_dma_map_page,
@@ -226,6 +228,8 @@ static int arm_coherent_dma_mmap(struct device *dev, struct 
vm_area_struct *vma,
 const struct dma_map_ops arm_coherent_dma_ops = {
.alloc  = arm_coherent_dma_alloc,
.free   = arm_coherent_dma_free,
+   .alloc_pages= dma_direct_alloc_pages,
+   .free_pages = dma_direct_free_pages,
.mmap   = arm_coherent_dma_mmap,
.get_sgtable= arm_dma_get_sgtable,
.map_page   = arm_coherent_dma_map_page,
diff --git 

[PATCH 14/18] dma-mapping: remove dma_cache_sync

2020-09-15 Thread Christoph Hellwig
All users are gone now, remove the API.

Signed-off-by: Christoph Hellwig 
---
 arch/mips/Kconfig   |  1 -
 arch/mips/jazz/jazzdma.c|  1 -
 arch/mips/mm/dma-noncoherent.c  |  6 --
 arch/parisc/Kconfig |  1 -
 arch/parisc/kernel/pci-dma.c|  6 --
 include/linux/dma-mapping.h |  8 
 include/linux/dma-noncoherent.h | 10 --
 kernel/dma/Kconfig  |  3 ---
 kernel/dma/mapping.c| 14 --
 9 files changed, 50 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index c95fa3a2484cf0..1be91c5d666e61 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -1134,7 +1134,6 @@ config DMA_NONCOHERENT
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_HAS_DMA_SET_UNCACHED
select DMA_NONCOHERENT_MMAP
-   select DMA_NONCOHERENT_CACHE_SYNC
select NEED_DMA_MAP_STATE
 
 config SYS_HAS_EARLY_PRINTK
diff --git a/arch/mips/jazz/jazzdma.c b/arch/mips/jazz/jazzdma.c
index dab4d058cea9b1..2bf849caf507b1 100644
--- a/arch/mips/jazz/jazzdma.c
+++ b/arch/mips/jazz/jazzdma.c
@@ -620,7 +620,6 @@ const struct dma_map_ops jazz_dma_ops = {
.sync_single_for_device = jazz_dma_sync_single_for_device,
.sync_sg_for_cpu= jazz_dma_sync_sg_for_cpu,
.sync_sg_for_device = jazz_dma_sync_sg_for_device,
-   .cache_sync = arch_dma_cache_sync,
.mmap   = dma_common_mmap,
.get_sgtable= dma_common_get_sgtable,
 };
diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
index 97a14adbafc99c..f34ad1f09799f1 100644
--- a/arch/mips/mm/dma-noncoherent.c
+++ b/arch/mips/mm/dma-noncoherent.c
@@ -137,12 +137,6 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 }
 #endif
 
-void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-   enum dma_data_direction direction)
-{
-   dma_sync_virt_for_device(vaddr, size, direction);
-}
-
 #ifdef CONFIG_DMA_PERDEV_COHERENT
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
const struct iommu_ops *iommu, bool coherent)
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 3b0f53dd70bc9b..ed15da1da174e0 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -195,7 +195,6 @@ config PA11
depends on PA7000 || PA7100LC || PA7200 || PA7300LC
select ARCH_HAS_SYNC_DMA_FOR_CPU
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
-   select DMA_NONCOHERENT_CACHE_SYNC
 
 config PREFETCH
def_bool y
diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
index 38c68e131bbe2a..ce38c0b9158125 100644
--- a/arch/parisc/kernel/pci-dma.c
+++ b/arch/parisc/kernel/pci-dma.c
@@ -454,9 +454,3 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
 {
flush_kernel_dcache_range((unsigned long)phys_to_virt(paddr), size);
 }
-
-void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-  enum dma_data_direction direction)
-{
-   flush_kernel_dcache_range((unsigned long)vaddr, size);
-}
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 4e1de194b45cbf..5b4e97b0846fd3 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -123,8 +123,6 @@ struct dma_map_ops {
void (*sync_sg_for_device)(struct device *dev,
   struct scatterlist *sg, int nents,
   enum dma_data_direction dir);
-   void (*cache_sync)(struct device *dev, void *vaddr, size_t size,
-   enum dma_data_direction direction);
int (*dma_supported)(struct device *dev, u64 mask);
u64 (*get_required_mask)(struct device *dev);
size_t (*max_mapping_size)(struct device *dev);
@@ -254,8 +252,6 @@ void *dmam_alloc_attrs(struct device *dev, size_t size, 
dma_addr_t *dma_handle,
gfp_t gfp, unsigned long attrs);
 void dmam_free_coherent(struct device *dev, size_t size, void *vaddr,
dma_addr_t dma_handle);
-void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-   enum dma_data_direction dir);
 int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs);
@@ -339,10 +335,6 @@ static inline void dmam_free_coherent(struct device *dev, 
size_t size,
void *vaddr, dma_addr_t dma_handle)
 {
 }
-static inline void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
-   enum dma_data_direction dir)
-{
-}
 static inline int dma_get_sgtable_attrs(struct device *dev,
struct sg_table *sgt, void *cpu_addr, dma_addr_t dma_addr,
size_t size, unsigned long attrs)
diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
index b9bc6c557ea46f..0888656369a45b 100644
--- 

[PATCH 13/18] 53c700: convert to dma_alloc_noncoherent

2020-09-15 Thread Christoph Hellwig
Use the new non-coherent DMA API including proper ownership transfers.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/53c700.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/53c700.c b/drivers/scsi/53c700.c
index c59226d7e2f6b5..5117d90ccd9edf 100644
--- a/drivers/scsi/53c700.c
+++ b/drivers/scsi/53c700.c
@@ -269,18 +269,25 @@ NCR_700_get_SXFER(struct scsi_device *SDp)
  spi_period(SDp->sdev_target));
 }
 
+static inline dma_addr_t virt_to_dma(struct NCR_700_Host_Parameters *h, void *p)
+{
+   return h->pScript + ((uintptr_t)p - (uintptr_t)h->script);
+}
+
 static inline void dma_sync_to_dev(struct NCR_700_Host_Parameters *h,
void *addr, size_t size)
 {
if (h->noncoherent)
-   dma_cache_sync(h->dev, addr, size, DMA_TO_DEVICE);
+   dma_sync_single_for_device(h->dev, virt_to_dma(h, addr),
+  size, DMA_BIDIRECTIONAL);
 }
 
 static inline void dma_sync_from_dev(struct NCR_700_Host_Parameters *h,
void *addr, size_t size)
 {
if (h->noncoherent)
-   dma_cache_sync(h->dev, addr, size, DMA_FROM_DEVICE);
+		dma_sync_single_for_cpu(h->dev, virt_to_dma(h, addr), size,
+					DMA_BIDIRECTIONAL);
 }
 
 struct Scsi_Host *
@@ -300,8 +307,8 @@ NCR_700_detect(struct scsi_host_template *tpnt,
 	memory = dma_alloc_coherent(dev, TOTAL_MEM_SIZE, &pScript, GFP_KERNEL);
if (!memory) {
hostdata->noncoherent = 1;
-		memory = dma_alloc_attrs(dev, TOTAL_MEM_SIZE, &pScript,
-					 GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+		memory = dma_alloc_noncoherent(dev, TOTAL_MEM_SIZE, &pScript,
+					       DMA_BIDIRECTIONAL, GFP_KERNEL);
}
if (!memory) {
printk(KERN_ERR "53c700: Failed to allocate memory for driver, 
detaching\n");
@@ -414,8 +421,9 @@ NCR_700_release(struct Scsi_Host *host)
(struct NCR_700_Host_Parameters *)host->hostdata[0];
 
if (hostdata->noncoherent)
-   dma_free_attrs(hostdata->dev, TOTAL_MEM_SIZE, hostdata->script,
-  hostdata->pScript, DMA_ATTR_NON_CONSISTENT);
+   dma_free_noncoherent(hostdata->dev, TOTAL_MEM_SIZE,
+   hostdata->script, hostdata->pScript,
+   DMA_BIDIRECTIONAL);
else
dma_free_coherent(hostdata->dev, TOTAL_MEM_SIZE,
  hostdata->script, hostdata->pScript);
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 12/18] sgiseeq: convert to dma_alloc_noncoherent

2020-09-15 Thread Christoph Hellwig
Use the new non-coherent DMA API including proper ownership transfers.
This includes adding additional calls to dma_sync_desc_dev as the
old syncing was rather ad-hoc.

Thanks to Thomas Bogendoerfer for debugging the ownership transfer
issues.

Signed-off-by: Christoph Hellwig 
---
 drivers/net/ethernet/seeq/sgiseeq.c | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/seeq/sgiseeq.c 
b/drivers/net/ethernet/seeq/sgiseeq.c
index 8507ff2420143a..37ff25a84030eb 100644
--- a/drivers/net/ethernet/seeq/sgiseeq.c
+++ b/drivers/net/ethernet/seeq/sgiseeq.c
@@ -112,14 +112,18 @@ struct sgiseeq_private {
 
 static inline void dma_sync_desc_cpu(struct net_device *dev, void *addr)
 {
-   dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc),
-  DMA_FROM_DEVICE);
+   struct sgiseeq_private *sp = netdev_priv(dev);
+
+   dma_sync_single_for_cpu(dev->dev.parent, VIRT_TO_DMA(sp, addr),
+   sizeof(struct sgiseeq_rx_desc), DMA_BIDIRECTIONAL);
 }
 
 static inline void dma_sync_desc_dev(struct net_device *dev, void *addr)
 {
-   dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc),
-  DMA_TO_DEVICE);
+   struct sgiseeq_private *sp = netdev_priv(dev);
+
+   dma_sync_single_for_device(dev->dev.parent, VIRT_TO_DMA(sp, addr),
+   sizeof(struct sgiseeq_rx_desc), DMA_BIDIRECTIONAL);
 }
 
 static inline void hpc3_eth_reset(struct hpc3_ethregs *hregs)
@@ -403,6 +407,8 @@ static inline void sgiseeq_rx(struct net_device *dev, 
struct sgiseeq_private *sp
 	rd = &sp->rx_desc[sp->rx_new];
dma_sync_desc_cpu(dev, rd);
}
+   dma_sync_desc_dev(dev, rd);
+
 	dma_sync_desc_cpu(dev, &sp->rx_desc[orig_end]);
 	sp->rx_desc[orig_end].rdma.cntinfo &= ~(HPCDMA_EOR);
 	dma_sync_desc_dev(dev, &sp->rx_desc[orig_end]);
@@ -443,6 +449,7 @@ static inline void kick_tx(struct net_device *dev,
dma_sync_desc_cpu(dev, td);
}
if (td->tdma.cntinfo & HPCDMA_XIU) {
+   dma_sync_desc_dev(dev, td);
hregs->tx_ndptr = VIRT_TO_DMA(sp, td);
hregs->tx_ctrl = HPC3_ETXCTRL_ACTIVE;
}
@@ -476,6 +483,7 @@ static inline void sgiseeq_tx(struct net_device *dev, 
struct sgiseeq_private *sp
if (!(td->tdma.cntinfo & (HPCDMA_XIU)))
break;
if (!(td->tdma.cntinfo & (HPCDMA_ETXD))) {
+   dma_sync_desc_dev(dev, td);
if (!(status & HPC3_ETXCTRL_ACTIVE)) {
hregs->tx_ndptr = VIRT_TO_DMA(sp, td);
hregs->tx_ctrl = HPC3_ETXCTRL_ACTIVE;
@@ -740,8 +748,8 @@ static int sgiseeq_probe(struct platform_device *pdev)
sp = netdev_priv(dev);
 
/* Make private data page aligned */
-	sr = dma_alloc_attrs(&pdev->dev, sizeof(*sp->srings), &sp->srings_dma,
-			     GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+	sr = dma_alloc_noncoherent(&pdev->dev, sizeof(*sp->srings),
+			&sp->srings_dma, DMA_BIDIRECTIONAL, GFP_KERNEL);
if (!sr) {
printk(KERN_ERR "Sgiseeq: Page alloc failed, aborting.\n");
err = -ENOMEM;
@@ -802,8 +810,8 @@ static int sgiseeq_probe(struct platform_device *pdev)
return 0;
 
 err_out_free_attrs:
-	dma_free_attrs(&pdev->dev, sizeof(*sp->srings), sp->srings,
-		       sp->srings_dma, DMA_ATTR_NON_CONSISTENT);
+	dma_free_noncoherent(&pdev->dev, sizeof(*sp->srings), sp->srings,
+		       sp->srings_dma, DMA_BIDIRECTIONAL);
 err_out_free_dev:
free_netdev(dev);
 
@@ -817,8 +825,8 @@ static int sgiseeq_remove(struct platform_device *pdev)
struct sgiseeq_private *sp = netdev_priv(dev);
 
unregister_netdev(dev);
-	dma_free_attrs(&pdev->dev, sizeof(*sp->srings), sp->srings,
-		       sp->srings_dma, DMA_ATTR_NON_CONSISTENT);
+	dma_free_noncoherent(&pdev->dev, sizeof(*sp->srings), sp->srings,
+		       sp->srings_dma, DMA_BIDIRECTIONAL);
free_netdev(dev);
 
return 0;
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 11/18] lib82596: convert to dma_alloc_noncoherent

2020-09-15 Thread Christoph Hellwig
Use the new non-coherent DMA API including proper ownership transfers.
This includes moving the DMA helpers to lib82596 based of an ifdef to
avoid include order problems.

Signed-off-by: Christoph Hellwig 
---
 drivers/net/ethernet/i825xx/lasi_82596.c |  25 ++---
 drivers/net/ethernet/i825xx/lib82596.c   | 114 ++-
 drivers/net/ethernet/i825xx/sni_82596.c  |   4 -
 3 files changed, 80 insertions(+), 63 deletions(-)

diff --git a/drivers/net/ethernet/i825xx/lasi_82596.c 
b/drivers/net/ethernet/i825xx/lasi_82596.c
index a12218e940a2fa..96c6f4f36904ed 100644
--- a/drivers/net/ethernet/i825xx/lasi_82596.c
+++ b/drivers/net/ethernet/i825xx/lasi_82596.c
@@ -96,21 +96,14 @@
 
 #define OPT_SWAP_PORT  0x0001  /* Need to wordswp on the MPU port */
 
-#define DMA_WBACK(ndev, addr, len) \
-	do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len, DMA_TO_DEVICE); } while (0)
-
-#define DMA_INV(ndev, addr, len) \
-	do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len, DMA_FROM_DEVICE); } while (0)
-
-#define DMA_WBACK_INV(ndev, addr, len) \
-	do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len, DMA_BIDIRECTIONAL); } while (0)
-
 #define SYSBUS  0x006c
 
 /* big endian CPU, 82596 "big" endian mode */
 #define SWAP32(x)   (((u32)(x)<<16) | ((((u32)(x)))>>16))
 #define SWAP16(x)   (x)
 
+#define NONCOHERENT_DMA 1
+
 #include "lib82596.c"
 
 MODULE_AUTHOR("Richard Hirst");
@@ -184,9 +177,9 @@ lan_init_chip(struct parisc_device *dev)
 
lp = netdev_priv(netdevice);
lp->options = dev->id.sversion == 0x72 ? OPT_SWAP_PORT : 0;
-	lp->dma = dma_alloc_attrs(&dev->dev, sizeof(struct i596_dma),
-				  &lp->dma_addr, GFP_KERNEL,
-				  DMA_ATTR_NON_CONSISTENT);
+	lp->dma = dma_alloc_noncoherent(&dev->dev,
+			sizeof(struct i596_dma), &lp->dma_addr,
+			DMA_BIDIRECTIONAL, GFP_KERNEL);
if (!lp->dma)
goto out_free_netdev;
 
@@ -196,8 +189,8 @@ lan_init_chip(struct parisc_device *dev)
return 0;
 
 out_free_dma:
-	dma_free_attrs(&dev->dev, sizeof(struct i596_dma), lp->dma,
-			lp->dma_addr, DMA_ATTR_NON_CONSISTENT);
+	dma_free_noncoherent(&dev->dev, sizeof(struct i596_dma),
+		       lp->dma, lp->dma_addr, DMA_BIDIRECTIONAL);
 out_free_netdev:
free_netdev(netdevice);
return retval;
@@ -209,8 +202,8 @@ static int __exit lan_remove_chip(struct parisc_device 
*pdev)
struct i596_private *lp = netdev_priv(dev);
 
unregister_netdev (dev);
-	dma_free_attrs(&pdev->dev, sizeof(struct i596_private), lp->dma,
-		       lp->dma_addr, DMA_ATTR_NON_CONSISTENT);
+	dma_free_noncoherent(&pdev->dev, sizeof(struct i596_private), lp->dma,
+		       lp->dma_addr, DMA_BIDIRECTIONAL);
free_netdev (dev);
return 0;
 }
diff --git a/drivers/net/ethernet/i825xx/lib82596.c 
b/drivers/net/ethernet/i825xx/lib82596.c
index b4e4b3eb5758b5..ca2fb303fcc6f6 100644
--- a/drivers/net/ethernet/i825xx/lib82596.c
+++ b/drivers/net/ethernet/i825xx/lib82596.c
@@ -365,13 +365,44 @@ static int max_cmd_backlog = TX_RING_SIZE-1;
 static void i596_poll_controller(struct net_device *dev);
 #endif
 
+static inline dma_addr_t virt_to_dma(struct i596_private *lp, volatile void *v)
+{
+   return lp->dma_addr + ((unsigned long)v - (unsigned long)lp->dma);
+}
+
+#ifdef NONCOHERENT_DMA
+static inline void dma_sync_dev(struct net_device *ndev, volatile void *addr,
+   size_t len)
+{
+   dma_sync_single_for_device(ndev->dev.parent,
+   virt_to_dma(netdev_priv(ndev), addr), len,
+   DMA_BIDIRECTIONAL);
+}
+
+static inline void dma_sync_cpu(struct net_device *ndev, volatile void *addr,
+   size_t len)
+{
+   dma_sync_single_for_cpu(ndev->dev.parent,
+   virt_to_dma(netdev_priv(ndev), addr), len,
+   DMA_BIDIRECTIONAL);
+}
+#else
+static inline void dma_sync_dev(struct net_device *ndev, volatile void *addr,
+   size_t len)
+{
+}
+static inline void dma_sync_cpu(struct net_device *ndev, volatile void *addr,
+   size_t len)
+{
+}
+#endif /* NONCOHERENT_DMA */
 
 static inline int wait_istat(struct net_device *dev, struct i596_dma *dma, int 
delcnt, char *str)
 {
-   DMA_INV(dev, &(dma->iscp), sizeof(struct i596_iscp));
+   dma_sync_cpu(dev, &(dma->iscp), sizeof(struct i596_iscp));
while (--delcnt && dma->iscp.stat) {
udelay(10);
-   DMA_INV(dev, &(dma->iscp), sizeof(struct i596_iscp));
+   dma_sync_cpu(dev, &(dma->iscp), sizeof(struct i596_iscp));
}
if (!delcnt) {
printk(KERN_ERR "%s: %s, iscp.stat %04x, didn't clear\n",
@@ -384,10 +415,10 @@ static inline int wait_istat(struct net_device *dev, 
struct i596_dma *dma, int d
 
 static inline int wait_cmd(struct net_device *dev, struct 

[PATCH 10/18] hal2: convert to dma_alloc_noncoherent

2020-09-15 Thread Christoph Hellwig
Use the new non-coherent DMA API including proper ownership transfers.
This also means we can allocate the buffer memory with the proper
direction instead of bidirectional.

Signed-off-by: Christoph Hellwig 
---
 sound/mips/hal2.c | 58 ++-
 1 file changed, 27 insertions(+), 31 deletions(-)

diff --git a/sound/mips/hal2.c b/sound/mips/hal2.c
index ec84bc4c3a6e77..9ac9b58d7c8cdd 100644
--- a/sound/mips/hal2.c
+++ b/sound/mips/hal2.c
@@ -441,7 +441,8 @@ static inline void hal2_stop_adc(struct snd_hal2 *hal2)
hal2->adc.pbus.pbus->pbdma_ctrl = HPC3_PDMACTRL_LD;
 }
 
-static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec)
+static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec,
+   enum dma_data_direction buffer_dir)
 {
struct device *dev = hal2->card->dev;
struct hal2_desc *desc;
@@ -449,15 +450,15 @@ static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, 
struct hal2_codec *codec)
int count = H2_BUF_SIZE / H2_BLOCK_SIZE;
int i;
 
-	codec->buffer = dma_alloc_attrs(dev, H2_BUF_SIZE, &buffer_dma,
-					GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+	codec->buffer = dma_alloc_noncoherent(dev, H2_BUF_SIZE, &buffer_dma,
+					buffer_dir, GFP_KERNEL);
 	if (!codec->buffer)
 		return -ENOMEM;
-	desc = dma_alloc_attrs(dev, count * sizeof(struct hal2_desc),
-			       &desc_dma, GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+	desc = dma_alloc_noncoherent(dev, count * sizeof(struct hal2_desc),
+			&desc_dma, DMA_BIDIRECTIONAL, GFP_KERNEL);
if (!desc) {
-		dma_free_attrs(dev, H2_BUF_SIZE, codec->buffer, buffer_dma,
-			       DMA_ATTR_NON_CONSISTENT);
+		dma_free_noncoherent(dev, H2_BUF_SIZE, codec->buffer,
+				buffer_dma, buffer_dir);
return -ENOMEM;
}
codec->buffer_dma = buffer_dma;
@@ -470,20 +471,22 @@ static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, 
struct hal2_codec *codec)
  desc_dma : desc_dma + (i + 1) * sizeof(struct hal2_desc);
desc++;
}
-   dma_cache_sync(dev, codec->desc, count * sizeof(struct hal2_desc),
-  DMA_TO_DEVICE);
+   dma_sync_single_for_device(dev, codec->desc_dma,
+  count * sizeof(struct hal2_desc),
+  DMA_BIDIRECTIONAL);
codec->desc_count = count;
return 0;
 }
 
-static void hal2_free_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec)
+static void hal2_free_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec,
+   enum dma_data_direction buffer_dir)
 {
struct device *dev = hal2->card->dev;
 
-   dma_free_attrs(dev, codec->desc_count * sizeof(struct hal2_desc),
-  codec->desc, codec->desc_dma, DMA_ATTR_NON_CONSISTENT);
-   dma_free_attrs(dev, H2_BUF_SIZE, codec->buffer, codec->buffer_dma,
-  DMA_ATTR_NON_CONSISTENT);
+   dma_free_noncoherent(dev, codec->desc_count * sizeof(struct hal2_desc),
+  codec->desc, codec->desc_dma, DMA_BIDIRECTIONAL);
+   dma_free_noncoherent(dev, H2_BUF_SIZE, codec->buffer, codec->buffer_dma,
+   buffer_dir);
 }
 
 static const struct snd_pcm_hardware hal2_pcm_hw = {
@@ -509,21 +512,16 @@ static int hal2_playback_open(struct snd_pcm_substream 
*substream)
 {
struct snd_pcm_runtime *runtime = substream->runtime;
struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);
-   int err;
 
runtime->hw = hal2_pcm_hw;
-
-	err = hal2_alloc_dmabuf(hal2, &hal2->dac);
-	if (err)
-		return err;
-	return 0;
+	return hal2_alloc_dmabuf(hal2, &hal2->dac, DMA_TO_DEVICE);
 }
 
 static int hal2_playback_close(struct snd_pcm_substream *substream)
 {
struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);
 
-	hal2_free_dmabuf(hal2, &hal2->dac);
+	hal2_free_dmabuf(hal2, &hal2->dac, DMA_TO_DEVICE);
return 0;
 }
 
@@ -579,7 +577,9 @@ static void hal2_playback_transfer(struct snd_pcm_substream 
*substream,
unsigned char *buf = hal2->dac.buffer + rec->hw_data;
 
memcpy(buf, substream->runtime->dma_area + rec->sw_data, bytes);
-   dma_cache_sync(hal2->card->dev, buf, bytes, DMA_TO_DEVICE);
+   dma_sync_single_for_device(hal2->card->dev,
+   hal2->dac.buffer_dma + rec->hw_data, bytes,
+   DMA_TO_DEVICE);
 
 }
 
@@ -597,22 +597,16 @@ static int hal2_capture_open(struct snd_pcm_substream 
*substream)
 {
struct snd_pcm_runtime *runtime = substream->runtime;
struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);
-	struct hal2_codec *adc = &hal2->adc;
-   int err;
 
runtime->hw = hal2_pcm_hw;
-
-   err 

[PATCH 09/18] sgiwd93: convert to dma_alloc_noncoherent

2020-09-15 Thread Christoph Hellwig
Use the new non-coherent DMA API including proper ownership transfers.
This also means we can allocate the memory as DMA_TO_DEVICE instead
of bidirectional.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/sgiwd93.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/sgiwd93.c b/drivers/scsi/sgiwd93.c
index 3bdf0deb8f1529..cf1030c9dda17f 100644
--- a/drivers/scsi/sgiwd93.c
+++ b/drivers/scsi/sgiwd93.c
@@ -95,7 +95,7 @@ void fill_hpc_entries(struct ip22_hostdata *hd, struct 
scsi_cmnd *cmd, int din)
 */
hcp->desc.pbuf = 0;
hcp->desc.cntinfo = HPCDMA_EOX;
-   dma_cache_sync(hd->dev, hd->cpu,
+   dma_sync_single_for_device(hd->dev, hd->dma,
   (unsigned long)(hcp + 1) - (unsigned long)hd->cpu,
   DMA_TO_DEVICE);
 }
@@ -234,8 +234,8 @@ static int sgiwd93_probe(struct platform_device *pdev)
 
hdata = host_to_hostdata(host);
hdata->dev = >dev;
-	hdata->cpu = dma_alloc_attrs(&pdev->dev, HPC_DMA_SIZE, &hdata->dma,
-				     GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+	hdata->cpu = dma_alloc_noncoherent(&pdev->dev, HPC_DMA_SIZE,
+				&hdata->dma, DMA_TO_DEVICE, GFP_KERNEL);
if (!hdata->cpu) {
printk(KERN_WARNING "sgiwd93: Could not allocate memory for "
   "host %d buffer.\n", unit);
@@ -274,8 +274,8 @@ static int sgiwd93_probe(struct platform_device *pdev)
 out_irq:
free_irq(irq, host);
 out_free:
-	dma_free_attrs(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma,
-		       DMA_ATTR_NON_CONSISTENT);
+	dma_free_noncoherent(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma,
+			DMA_TO_DEVICE);
 out_put:
scsi_host_put(host);
 out:
@@ -291,8 +291,8 @@ static int sgiwd93_remove(struct platform_device *pdev)
 
scsi_remove_host(host);
free_irq(pd->irq, host);
-	dma_free_attrs(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma,
-		       DMA_ATTR_NON_CONSISTENT);
+	dma_free_noncoherent(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma,
+			DMA_TO_DEVICE);
scsi_host_put(host);
return 0;
 }
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 08/18] dma-mapping: add a new dma_alloc_noncoherent API

2020-09-15 Thread Christoph Hellwig
Add a new API to allocate and free memory that is guaranteed to be
addressable by a device, but which potentially is not cache coherent
for DMA.

To transfer ownership to and from the device, the existing streaming
DMA API calls dma_sync_single_for_device and dma_sync_single_for_cpu
must be used.
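
[Editor's note: the ownership-transfer pattern, sketched; dev, src and len
come from a hypothetical driver context, not from this patch:

	void *buf;
	dma_addr_t dma;

	buf = dma_alloc_noncoherent(dev, size, &dma, DMA_BIDIRECTIONAL,
				    GFP_KERNEL);
	if (!buf)
		return -ENOMEM;

	memcpy(buf, src, len);		/* CPU owns the buffer */
	dma_sync_single_for_device(dev, dma, len, DMA_TO_DEVICE);
	/* ... device DMA runs ... */
	dma_sync_single_for_cpu(dev, dma, len, DMA_FROM_DEVICE);
	/* the CPU may now read what the device wrote */

	dma_free_noncoherent(dev, size, buf, dma, DMA_BIDIRECTIONAL);
]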

For now the new calls are implemented on top of dma_alloc_attrs just
like the old-noncoherent API, but once all drivers are switched to
the new API it will be replaced with a better working implementation
that is available on all architectures.

Signed-off-by: Christoph Hellwig 
---
 Documentation/core-api/dma-api.rst | 75 ++
 include/linux/dma-mapping.h| 12 +
 2 files changed, 48 insertions(+), 39 deletions(-)

diff --git a/Documentation/core-api/dma-api.rst 
b/Documentation/core-api/dma-api.rst
index 90239348b30f6f..ea0413276ddb70 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -516,48 +516,56 @@ routines, e.g.:::
}
 
 
-Part II - Advanced dma usage
-----------------------------
+Part II - Non-coherent DMA allocations
+--------------------------------------
 
-Warning: These pieces of the DMA API should not be used in the
-majority of cases, since they cater for unlikely corner cases that
-don't belong in usual drivers.
+These APIs allow allocating pages in the kernel direct mapping that are
+guaranteed to be DMA addressable.  This means that unlike dma_alloc_coherent,
+virt_to_page can be called on the resulting address, and the resulting
+struct page can be used for everything a struct page is suitable for.
 
-If you don't understand how cache line coherency works between a
-processor and an I/O device, you should not be using this part of the
-API at all.
+If you don't understand how cache line coherency works between a processor and
+an I/O device, you should not be using this part of the API.
 
 ::
 
void *
-   dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
-   gfp_t flag, unsigned long attrs)
+   dma_alloc_noncoherent(struct device *dev, size_t size,
+   dma_addr_t *dma_handle, enum dma_data_direction dir,
+   gfp_t gfp)
 
-Identical to dma_alloc_coherent() except that when the
-DMA_ATTR_NON_CONSISTENT flags is passed in the attrs argument, the
-platform will choose to return either consistent or non-consistent memory
-as it sees fit.  By using this API, you are guaranteeing to the platform
-that you have all the correct and necessary sync points for this memory
-in the driver should it choose to return non-consistent memory.
+This routine allocates a region of <size> bytes of consistent memory.  It
+returns a pointer to the allocated region (in the processor's virtual address
+space) or NULL if the allocation failed.  The returned memory may or may not
+be in the kernel's direct mapping.  Drivers must not call virt_to_page on
+the returned memory region.
 
-Note: where the platform can return consistent memory, it will
-guarantee that the sync points become nops.
+It also returns a <dma_handle> which may be cast to an unsigned integer the
+same width as the bus and given to the device as the DMA address base of
+the region.
 
-Warning:  Handling non-consistent memory is a real pain.  You should
-only use this API if you positively know your driver will be
-required to work on one of the rare (usually non-PCI) architectures
-that simply cannot make consistent memory.
+The dir parameter specifies whether data is read and/or written by the device,
+see dma_map_single() for details.
+
+The gfp parameter allows the caller to specify the ``GFP_`` flags (see
+kmalloc()) for the allocation, but rejects flags used to specify a memory
+zone such as GFP_DMA or GFP_HIGHMEM.
+
+Before giving the memory to the device, dma_sync_single_for_device() needs
+to be called, and before reading memory written by the device,
+dma_sync_single_for_cpu(), just like for streaming DMA mappings that are
+reused.
 
 ::
 
void
-   dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
-  dma_addr_t dma_handle, unsigned long attrs)
+   dma_free_noncoherent(struct device *dev, size_t size, void *cpu_addr,
+   dma_addr_t dma_handle, enum dma_data_direction dir)
 
-Free memory allocated by the dma_alloc_attrs().  All common
-parameters must be identical to those otherwise passed to dma_free_coherent,
-and the attrs argument must be identical to the attrs passed to
-dma_alloc_attrs().
+Free a region of memory previously allocated using dma_alloc_noncoherent().
+dev, size, dma_handle and dir must all be the same as those passed into
+dma_alloc_noncoherent().  cpu_addr must be the virtual address returned by
+dma_alloc_noncoherent().
 
 ::
 
@@ -575,17 +583,6 @@ memory or doing partial flushes.
into the width returned by this call.  It will also always be a power
of two for easy 

[PATCH 07/18] 53c700: improve non-coherent DMA handling

2020-09-15 Thread Christoph Hellwig
Switch the 53c700 driver to only use non-coherent descriptor memory if it
really has to because dma_alloc_coherent fails.  This doesn't matter for
any of the platforms it runs on currently, but that will change soon.

To help with this, two new helpers that transfer ownership to and from the
device are added to abstract the syncing of the non-coherent memory.
The two current bidirectional cases are mapped to transfers to the
device, as that appears to be what they are used for.  Note that for parisc,
which is the only architecture this driver needs to use non-coherent
memory on, the direction argument of dma_cache_sync is ignored, so this
will not change behavior in any way.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/53c700.c | 113 +++---
 drivers/scsi/53c700.h |  17 ---
 2 files changed, 72 insertions(+), 58 deletions(-)

diff --git a/drivers/scsi/53c700.c b/drivers/scsi/53c700.c
index 84b57a8f86bfa9..c59226d7e2f6b5 100644
--- a/drivers/scsi/53c700.c
+++ b/drivers/scsi/53c700.c
@@ -269,6 +269,20 @@ NCR_700_get_SXFER(struct scsi_device *SDp)
  spi_period(SDp->sdev_target));
 }
 
+static inline void dma_sync_to_dev(struct NCR_700_Host_Parameters *h,
+   void *addr, size_t size)
+{
+   if (h->noncoherent)
+   dma_cache_sync(h->dev, addr, size, DMA_TO_DEVICE);
+}
+
+static inline void dma_sync_from_dev(struct NCR_700_Host_Parameters *h,
+   void *addr, size_t size)
+{
+   if (h->noncoherent)
+   dma_cache_sync(h->dev, addr, size, DMA_FROM_DEVICE);
+}
+
 struct Scsi_Host *
 NCR_700_detect(struct scsi_host_template *tpnt,
   struct NCR_700_Host_Parameters *hostdata, struct device *dev)
@@ -283,9 +297,13 @@ NCR_700_detect(struct scsi_host_template *tpnt,
if(tpnt->sdev_attrs == NULL)
tpnt->sdev_attrs = NCR_700_dev_attrs;
 
-	memory = dma_alloc_attrs(dev, TOTAL_MEM_SIZE, &pScript,
-				 GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
-	if(memory == NULL) {
+	memory = dma_alloc_coherent(dev, TOTAL_MEM_SIZE, &pScript, GFP_KERNEL);
+	if (!memory) {
+		hostdata->noncoherent = 1;
+		memory = dma_alloc_attrs(dev, TOTAL_MEM_SIZE, &pScript,
+					 GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+	}
+   if (!memory) {
printk(KERN_ERR "53c700: Failed to allocate memory for driver, 
detaching\n");
return NULL;
}
@@ -339,11 +357,11 @@ NCR_700_detect(struct scsi_host_template *tpnt,
for (j = 0; j < PATCHES; j++)
		script[LABELPATCHES[j]] = bS_to_host(pScript + SCRIPT[LABELPATCHES[j]]);
/* now patch up fixed addresses. */
-   script_patch_32(hostdata->dev, script, MessageLocation,
+   script_patch_32(hostdata, script, MessageLocation,
pScript + MSGOUT_OFFSET);
-   script_patch_32(hostdata->dev, script, StatusAddress,
+   script_patch_32(hostdata, script, StatusAddress,
pScript + STATUS_OFFSET);
-   script_patch_32(hostdata->dev, script, ReceiveMsgAddress,
+   script_patch_32(hostdata, script, ReceiveMsgAddress,
pScript + MSGIN_OFFSET);
 
hostdata->script = script;
@@ -395,8 +413,12 @@ NCR_700_release(struct Scsi_Host *host)
struct NCR_700_Host_Parameters *hostdata = 
(struct NCR_700_Host_Parameters *)host->hostdata[0];
 
-   dma_free_attrs(hostdata->dev, TOTAL_MEM_SIZE, hostdata->script,
-  hostdata->pScript, DMA_ATTR_NON_CONSISTENT);
+   if (hostdata->noncoherent)
+   dma_free_attrs(hostdata->dev, TOTAL_MEM_SIZE, hostdata->script,
+  hostdata->pScript, DMA_ATTR_NON_CONSISTENT);
+   else
+   dma_free_coherent(hostdata->dev, TOTAL_MEM_SIZE,
+ hostdata->script, hostdata->pScript);
return 1;
 }
 
@@ -804,8 +826,8 @@ process_extended_message(struct Scsi_Host *host,
shost_printk(KERN_WARNING, host,
"Unexpected SDTR msg\n");
hostdata->msgout[0] = A_REJECT_MSG;
-		dma_cache_sync(hostdata->dev, hostdata->msgout, 1, DMA_TO_DEVICE);
-   script_patch_16(hostdata->dev, hostdata->script,
+   dma_sync_to_dev(hostdata, hostdata->msgout, 1);
+   script_patch_16(hostdata, hostdata->script,
MessageCount, 1);
/* SendMsgOut returns, so set up the return
 * address */
@@ -817,9 +839,8 @@ process_extended_message(struct Scsi_Host *host,
printk(KERN_INFO "scsi%d: (%d:%d), Unsolicited WDTR after CMD, 
Rejecting\n",
   host->host_no, pun, lun);
hostdata->msgout[0] = A_REJECT_MSG;
-

[PATCH 06/18] lib82596: move DMA allocation into the callers of i82596_probe

2020-09-15 Thread Christoph Hellwig
This allows us to get rid of the LIB82596_DMA_ATTR defined and prepare
for untangling the coherent vs non-coherent DMA allocation API.

Signed-off-by: Christoph Hellwig 
---
 drivers/net/ethernet/i825xx/lasi_82596.c | 24 ++--
 drivers/net/ethernet/i825xx/lib82596.c   | 36 
 drivers/net/ethernet/i825xx/sni_82596.c  | 19 +
 3 files changed, 40 insertions(+), 39 deletions(-)

diff --git a/drivers/net/ethernet/i825xx/lasi_82596.c b/drivers/net/ethernet/i825xx/lasi_82596.c
index aec7e98bcc853a..a12218e940a2fa 100644
--- a/drivers/net/ethernet/i825xx/lasi_82596.c
+++ b/drivers/net/ethernet/i825xx/lasi_82596.c
@@ -96,8 +96,6 @@
 
 #define OPT_SWAP_PORT  0x0001  /* Need to wordswp on the MPU port */
 
-#define LIB82596_DMA_ATTR  DMA_ATTR_NON_CONSISTENT
-
 #define DMA_WBACK(ndev, addr, len) \
	do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len, DMA_TO_DEVICE); } while (0)
 
@@ -155,7 +153,7 @@ lan_init_chip(struct parisc_device *dev)
 {
struct  net_device *netdevice;
struct i596_private *lp;
-   int retval;
+   int retval = -ENOMEM;
int i;
 
if (!dev->irq) {
@@ -186,12 +184,22 @@ lan_init_chip(struct parisc_device *dev)
 
lp = netdev_priv(netdevice);
lp->options = dev->id.sversion == 0x72 ? OPT_SWAP_PORT : 0;
+	lp->dma = dma_alloc_attrs(&dev->dev, sizeof(struct i596_dma),
+				  &lp->dma_addr, GFP_KERNEL,
+				  DMA_ATTR_NON_CONSISTENT);
+   if (!lp->dma)
+   goto out_free_netdev;
 
retval = i82596_probe(netdevice);
-   if (retval) {
-   free_netdev(netdevice);
-   return -ENODEV;
-   }
+   if (retval)
+   goto out_free_dma;
+   return 0;
+
+out_free_dma:
+	dma_free_attrs(&dev->dev, sizeof(struct i596_dma), lp->dma,
+		       lp->dma_addr, DMA_ATTR_NON_CONSISTENT);
+out_free_netdev:
+   free_netdev(netdevice);
return retval;
 }
 
@@ -202,7 +210,7 @@ static int __exit lan_remove_chip(struct parisc_device *pdev)
 
unregister_netdev (dev);
	dma_free_attrs(&pdev->dev, sizeof(struct i596_private), lp->dma,
-  lp->dma_addr, LIB82596_DMA_ATTR);
+  lp->dma_addr, DMA_ATTR_NON_CONSISTENT);
free_netdev (dev);
return 0;
 }
diff --git a/drivers/net/ethernet/i825xx/lib82596.c b/drivers/net/ethernet/i825xx/lib82596.c
index b03757e169e475..b4e4b3eb5758b5 100644
--- a/drivers/net/ethernet/i825xx/lib82596.c
+++ b/drivers/net/ethernet/i825xx/lib82596.c
@@ -1047,9 +1047,8 @@ static const struct net_device_ops i596_netdev_ops = {
 
 static int i82596_probe(struct net_device *dev)
 {
-   int i;
struct i596_private *lp = netdev_priv(dev);
-   struct i596_dma *dma;
+   int ret;
 
/* This lot is ensure things have been cache line aligned. */
BUILD_BUG_ON(sizeof(struct i596_rfd) != 32);
@@ -1063,41 +1062,28 @@ static int i82596_probe(struct net_device *dev)
if (!dev->base_addr || !dev->irq)
return -ENODEV;
 
-	dma = dma_alloc_attrs(dev->dev.parent, sizeof(struct i596_dma),
-			      &lp->dma_addr, GFP_KERNEL,
-			      LIB82596_DMA_ATTR);
-   if (!dma) {
-   printk(KERN_ERR "%s: Couldn't get shared memory\n", __FILE__);
-   return -ENOMEM;
-   }
-
	dev->netdev_ops = &i596_netdev_ops;
dev->watchdog_timeo = TX_TIMEOUT;
 
-   memset(dma, 0, sizeof(struct i596_dma));
-   lp->dma = dma;
-
-   dma->scb.command = 0;
-   dma->scb.cmd = I596_NULL;
-   dma->scb.rfd = I596_NULL;
+   memset(lp->dma, 0, sizeof(struct i596_dma));
+   lp->dma->scb.command = 0;
+   lp->dma->scb.cmd = I596_NULL;
+   lp->dma->scb.rfd = I596_NULL;
	spin_lock_init(&lp->lock);
 
-   DMA_WBACK_INV(dev, dma, sizeof(struct i596_dma));
+   DMA_WBACK_INV(dev, lp->dma, sizeof(struct i596_dma));
 
-   i = register_netdev(dev);
-   if (i) {
-   dma_free_attrs(dev->dev.parent, sizeof(struct i596_dma),
-  dma, lp->dma_addr, LIB82596_DMA_ATTR);
-   return i;
-   }
+   ret = register_netdev(dev);
+   if (ret)
+   return ret;
 
DEB(DEB_PROBE, printk(KERN_INFO "%s: 82596 at %#3lx, %pM IRQ %d.\n",
  dev->name, dev->base_addr, dev->dev_addr,
  dev->irq));
DEB(DEB_INIT, printk(KERN_INFO
 "%s: dma at 0x%p (%d bytes), lp->scb at 0x%p\n",
-			     dev->name, dma, (int)sizeof(struct i596_dma),
-			     &dma->scb));
+			     dev->name, lp->dma, (int)sizeof(struct i596_dma),
+			     &lp->dma->scb));
 
return 0;
 }
diff --git a/drivers/net/ethernet/i825xx/sni_82596.c b/drivers/net/ethernet/i825xx/sni_82596.c
index 

[PATCH 05/18] net/au1000-eth: stop using DMA_ATTR_NON_CONSISTENT

2020-09-15 Thread Christoph Hellwig
The au1000-eth driver contains none of the manual cache synchronization
required for using DMA_ATTR_NON_CONSISTENT.  From what I can tell it
can be used on both dma coherent and non-coherent DMA platforms, but
I suspect it has been buggy on the non-coherent platforms all along.

Signed-off-by: Christoph Hellwig 
---
 drivers/net/ethernet/amd/au1000_eth.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/amd/au1000_eth.c b/drivers/net/ethernet/amd/au1000_eth.c
index 75dbd221dc594b..19e195420e2434 100644
--- a/drivers/net/ethernet/amd/au1000_eth.c
+++ b/drivers/net/ethernet/amd/au1000_eth.c
@@ -1131,10 +1131,9 @@ static int au1000_probe(struct platform_device *pdev)
/* Allocate the data buffers
 * Snooping works fine with eth on all au1xxx
 */
-	aup->vaddr = (u32)dma_alloc_attrs(&pdev->dev, MAX_BUF_SIZE *
+	aup->vaddr = (u32)dma_alloc_coherent(&pdev->dev, MAX_BUF_SIZE *
 					  (NUM_TX_BUFFS + NUM_RX_BUFFS),
-					  &aup->dma_addr, 0,
-					  DMA_ATTR_NON_CONSISTENT);
+					  &aup->dma_addr, 0);
if (!aup->vaddr) {
dev_err(>dev, "failed to allocate data buffers\n");
err = -ENOMEM;
@@ -1310,9 +1309,8 @@ static int au1000_probe(struct platform_device *pdev)
 err_remap2:
iounmap(aup->mac);
 err_remap1:
-	dma_free_attrs(&pdev->dev, MAX_BUF_SIZE * (NUM_TX_BUFFS + NUM_RX_BUFFS),
-			(void *)aup->vaddr, aup->dma_addr,
-			DMA_ATTR_NON_CONSISTENT);
+	dma_free_coherent(&pdev->dev, MAX_BUF_SIZE * (NUM_TX_BUFFS + NUM_RX_BUFFS),
+			(void *)aup->vaddr, aup->dma_addr);
 err_vaddr:
free_netdev(dev);
 err_alloc:
@@ -1344,9 +1342,8 @@ static int au1000_remove(struct platform_device *pdev)
if (aup->tx_db_inuse[i])
au1000_ReleaseDB(aup, aup->tx_db_inuse[i]);
 
-	dma_free_attrs(&pdev->dev, MAX_BUF_SIZE * (NUM_TX_BUFFS + NUM_RX_BUFFS),
-			(void *)aup->vaddr, aup->dma_addr,
-			DMA_ATTR_NON_CONSISTENT);
+	dma_free_coherent(&pdev->dev, MAX_BUF_SIZE * (NUM_TX_BUFFS + NUM_RX_BUFFS),
+			(void *)aup->vaddr, aup->dma_addr);
 
iounmap(aup->macdma);
iounmap(aup->mac);
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 04/18] drm/nouveau/gk20a: stop setting DMA_ATTR_NON_CONSISTENT

2020-09-15 Thread Christoph Hellwig
DMA_ATTR_NON_CONSISTENT is a no-op except on PA-RISC and a few MIPS
configs, so don't set it in this ARM specific driver part.

Signed-off-by: Christoph Hellwig 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
index 985f2990ab0dda..13d4d7ac0697b4 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
@@ -594,8 +594,7 @@ gk20a_instmem_new(struct nvkm_device *device, int index,
 
nvkm_info(>base.subdev, "using IOMMU\n");
} else {
-   imem->attrs = DMA_ATTR_NON_CONSISTENT |
- DMA_ATTR_WEAK_ORDERING |
+   imem->attrs = DMA_ATTR_WEAK_ORDERING |
  DMA_ATTR_WRITE_COMBINE;
 
nvkm_info(>base.subdev, "using DMA API\n");
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 03/18] drm/exynos: stop setting DMA_ATTR_NON_CONSISTENT

2020-09-15 Thread Christoph Hellwig
DMA_ATTR_NON_CONSISTENT is a no-op except on PA-RISC and a few MIPS
configs, so don't set it in this ARM specific driver.

Signed-off-by: Christoph Hellwig 
---
 drivers/gpu/drm/exynos/exynos_drm_gem.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c
index efa476858db54b..07073222b8f691 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_gem.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c
@@ -42,8 +42,6 @@ static int exynos_drm_alloc_buf(struct exynos_drm_gem *exynos_gem, bool kvmap)
if (exynos_gem->flags & EXYNOS_BO_WC ||
!(exynos_gem->flags & EXYNOS_BO_CACHABLE))
attr |= DMA_ATTR_WRITE_COMBINE;
-   else
-   attr |= DMA_ATTR_NON_CONSISTENT;
 
/* FBDev emulation requires kernel mapping */
if (!kvmap)
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 02/18] mm: turn alloc_pages into an inline function

2020-09-15 Thread Christoph Hellwig
To prevent a compiler error when a method called alloc_pages is
added to the dma_map_ops (which I plan to do).

Signed-off-by: Christoph Hellwig 
---
 include/linux/gfp.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 67a0774e080b98..dd2577c5407112 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -550,8 +550,10 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
 #define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true)
 #else
-#define alloc_pages(gfp_mask, order) \
-   alloc_pages_node(numa_node_id(), gfp_mask, order)
+static inline struct page *alloc_pages(gfp_t gfp_mask, unsigned int order)
+{
+   return alloc_pages_node(numa_node_id(), gfp_mask, order);
+}
 #define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\
alloc_pages(gfp_mask, order)
 #define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 01/18] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT flag

2020-09-15 Thread Christoph Hellwig
From: Sergey Senozhatsky 

The patch partially reverts some of the UAPI bits of the buffer
cache management hints, namely the queue consistency (memory
coherency) user-space hint, because, as it turned out, the kernel
implementation of this feature was misusing DMA_ATTR_NON_CONSISTENT.

The patch reverts both the kernel and user space parts: it removes the
DMA consistency attr functions, rolls back the changes to the
v4l2_requestbuffers and v4l2_create_buffers structures and the
corresponding UAPI functions (plus the compat32 layer), and cleans up
the documentation.

Signed-off-by: Christoph Hellwig 
Signed-off-by: Sergey Senozhatsky 
Signed-off-by: Christoph Hellwig 
---
 .../userspace-api/media/v4l/buffer.rst| 17 ---
 .../media/v4l/vidioc-create-bufs.rst  |  6 +--
 .../media/v4l/vidioc-reqbufs.rst  | 12 +
 .../media/common/videobuf2/videobuf2-core.c   | 46 +++
 .../common/videobuf2/videobuf2-dma-contig.c   | 19 
 .../media/common/videobuf2/videobuf2-dma-sg.c |  3 +-
 .../media/common/videobuf2/videobuf2-v4l2.c   | 18 +---
 drivers/media/v4l2-core/v4l2-compat-ioctl32.c | 10 +---
 drivers/media/v4l2-core/v4l2-ioctl.c  |  5 +-
 include/media/videobuf2-core.h|  7 +--
 include/uapi/linux/videodev2.h| 13 +-
 11 files changed, 22 insertions(+), 134 deletions(-)

diff --git a/Documentation/userspace-api/media/v4l/buffer.rst b/Documentation/userspace-api/media/v4l/buffer.rst
index 57e752aaf414a7..2044ed13cd9d7d 100644
--- a/Documentation/userspace-api/media/v4l/buffer.rst
+++ b/Documentation/userspace-api/media/v4l/buffer.rst
@@ -701,23 +701,6 @@ Memory Consistency Flags
 :stub-columns: 0
 :widths:   3 1 4
 
-* .. _`V4L2-FLAG-MEMORY-NON-CONSISTENT`:
-
-  - ``V4L2_FLAG_MEMORY_NON_CONSISTENT``
-  - 0x0001
-  - A buffer is allocated either in consistent (it will be automatically
-   coherent between the CPU and the bus) or non-consistent memory. The
-   latter can provide performance gains, for instance the CPU cache
-   sync/flush operations can be avoided if the buffer is accessed by the
-   corresponding device only and the CPU does not read/write to/from that
-   buffer. However, this requires extra care from the driver -- it must
-   guarantee memory consistency by issuing a cache flush/sync when
-   consistency is needed. If this flag is set V4L2 will attempt to
-   allocate the buffer in non-consistent memory. The flag takes effect
-   only if the buffer is used for :ref:`memory mapping <mmap>` I/O and the
-   queue reports the :ref:`V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS
-   <V4L2-BUF-CAP-SUPPORTS-MMAP-CACHE-HINTS>` capability.
-
 .. c:type:: v4l2_memory
 
 enum v4l2_memory
diff --git a/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst b/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst
index f2a702870fadc1..12cf6b44f414f7 100644
--- a/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst
+++ b/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst
@@ -120,13 +120,9 @@ than the number requested.
If you want to just query the capabilities without making any
other changes, then set ``count`` to 0, ``memory`` to
``V4L2_MEMORY_MMAP`` and ``format.type`` to the buffer type.
-* - __u32
-  - ``flags``
-  - Specifies additional buffer management attributes.
-   See :ref:`memory-flags`.
 
 * - __u32
-  - ``reserved``\ [6]
+  - ``reserved``\ [7]
   - A place holder for future extensions. Drivers and applications
must set the array to zero.
 
diff --git a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
index 75d894d9c36c42..0e3e2fde65e850 100644
--- a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
+++ b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
@@ -112,17 +112,10 @@ aborting or finishing any DMA in progress, an implicit
``V4L2_MEMORY_MMAP`` and ``type`` set to the buffer type. This will
free any previously allocated buffers, so this is typically something
that will be done at the start of the application.
-* - union {
-  - (anonymous)
-* - __u32
-  - ``flags``
-  - Specifies additional buffer management attributes.
-   See :ref:`memory-flags`.
 * - __u32
   - ``reserved``\ [1]
-  - Kept for backwards compatibility. Use ``flags`` instead.
-* - }
-  -
+  - A place holder for future extensions. Drivers and applications
+   must set the array to zero.
 
 .. tabularcolumns:: |p{6.1cm}|p{2.2cm}|p{8.7cm}|
 
@@ -169,7 +162,6 @@ aborting or finishing any DMA in progress, an implicit
  - This capability is set by the driver to indicate that the queue supports
 cache and memory management hints. However, it's only valid when the
 queue is used for :ref:`memory mapping <mmap>` streaming I/O. See
-:ref:`V4L2_FLAG_MEMORY_NON_CONSISTENT 

a saner API for allocating DMA addressable pages v3

2020-09-15 Thread Christoph Hellwig
Hi all,

this series replaces the DMA_ATTR_NON_CONSISTENT flag to dma_alloc_attrs
with a separate new dma_alloc_pages API, which is available on all
platforms.  In addition to cleaning up the convoluted code path, this
ensures that other drivers that have asked for better support for
non-coherent DMA to pages without incurring bounce buffering can finally
be properly supported.

As a follow up I plan to move the implementation of the
DMA_ATTR_NO_KERNEL_MAPPING flag over to this framework as well, given
that it also is a fundamentally non-coherent allocation.  The replacement
for that flag would then return a struct page, as it is allowed to
actually return pages without a kernel mapping as the name suggests
(although most of the time they will actually have a kernel mapping..)

In addition to the conversions of the existing non-coherent DMA users,
I've also added a patch to convert the firewire ohci driver to use
the new dma_alloc_pages API.
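
For reference, the pages-based side of the new API as proposed here is
used roughly like this (a sketch of the intended calling convention, not
a final signature; error handling trimmed):

	struct page *page;
	dma_addr_t dma_handle;

	/* Pages come from the direct kernel mapping; the mapping is
	 * non-coherent, so explicit dma_sync_single_for_device() /
	 * dma_sync_single_for_cpu() ownership transfers are required. */
	page = dma_alloc_pages(dev, size, &dma_handle, DMA_BIDIRECTIONAL,
			       GFP_KERNEL);
	if (!page)
		return -ENOMEM;

	/* ... device DMA, with sync calls around any CPU access ... */

	dma_free_pages(dev, size, page, dma_handle, DMA_BIDIRECTIONAL);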

The first patch is queued up for 5.9 in the media tree, but included here
for completeness.


A git tree is available here:

git://git.infradead.org/users/hch/misc.git dma_alloc_pages

Gitweb:


http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma_alloc_pages


Changes since v2:
 - fix up the patch reshuffle which wasn't quite correct
 - fix up a few commit messages

Changes since v1:
 - rebased on the latests dma-mapping tree, which merged many of the
   cleanups
 - fix an argument passing typo in 53c700, caught by sparse
 - rename a few macro arguments in 53c700
 - pass the right device to the DMA API in the lib82596 drivers
 - fix memory ownership transfers in sgiseeq
 - better document what a page in the direct kernel mapping means
 - split into dma_alloc_pages that returns a struct page and is in the
   direct mapping vs dma_alloc_noncoherent that can be vmapped
 - convert the firewire ohci driver to dma_alloc_pages

Diffstat:
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] irqchip/qcom-pdc: Allow QCOM_PDC to be loadable as a permanent module

2020-09-15 Thread Bjorn Andersson
On Mon 14 Sep 21:04 UTC 2020, John Stultz wrote:

> Allows qcom-pdc driver to be loaded as a permanent module.
> 
> An earlier version of this patch was merged in a larger patchset
> but was reverted entirely when issues were found with other
> drivers, so now that Marc has provided a better solution in his
> Hybrid probing patch set, I wanted to re-submit this change.
> 

Reviewed-by: Bjorn Andersson 

> Cc: Andy Gross 
> Cc: Bjorn Andersson 
> Cc: Joerg Roedel 
> Cc: Thomas Gleixner 
> Cc: Jason Cooper 
> Cc: Marc Zyngier 
> Cc: Linus Walleij 
> Cc: Maulik Shah 
> Cc: Lina Iyer 
> Cc: Saravana Kannan 
> Cc: Todd Kjos 
> Cc: Greg Kroah-Hartman 
> Cc: linux-arm-...@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org
> Cc: linux-g...@vger.kernel.org
> Signed-off-by: John Stultz 
> ---
>  drivers/irqchip/Kconfig| 2 +-
>  drivers/irqchip/qcom-pdc.c | 2 ++
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
> index bfc9719dbcdc..bb70b7177f94 100644
> --- a/drivers/irqchip/Kconfig
> +++ b/drivers/irqchip/Kconfig
> @@ -425,7 +425,7 @@ config GOLDFISH_PIC
>   for Goldfish based virtual platforms.
>  
>  config QCOM_PDC
> - bool "QCOM PDC"
> + tristate "QCOM PDC"
>   depends on ARCH_QCOM
>   select IRQ_DOMAIN_HIERARCHY
>   help
> diff --git a/drivers/irqchip/qcom-pdc.c b/drivers/irqchip/qcom-pdc.c
> index 8543fa23da10..59eb3c8473b0 100644
> --- a/drivers/irqchip/qcom-pdc.c
> +++ b/drivers/irqchip/qcom-pdc.c
> @@ -433,3 +433,5 @@ static int qcom_pdc_init(struct device_node *node, struct 
> device_node *parent)
>  IRQCHIP_HYBRID_DRIVER_BEGIN(qcom_pdc)
>  IRQCHIP_MATCH("qcom,pdc", qcom_pdc_init)
>  IRQCHIP_HYBRID_DRIVER_END(qcom_pdc)
> +MODULE_DESCRIPTION("Qualcomm Technologies, Inc. Power Domain Controller");
> +MODULE_LICENSE("GPL v2");
> -- 
> 2.17.1
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dma-direct: Fix potential NULL pointer dereference

2020-09-15 Thread Thomas Tai




On 2020-09-15 11:09 a.m., Christoph Hellwig wrote:

On Tue, Sep 15, 2020 at 10:40:39AM -0400, Thomas Tai wrote:

+++ b/include/linux/dma-direct.h
@@ -62,9 +62,6 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size,
 {
	dma_addr_t end = addr + size - 1;

-	if (!dev->dma_mask)
-		return false;
-


I am concerned that some drivers may rely on this NULL checking. Would you
think we can keep this checking and use the following WARN_ON_ONCE()?


dma_capable is not a helper for drivers, but just for dma-direct
and related code.  And this patch adds the checks for the three
places how we call into the ->map* methods.



Ok. That sounds good to me. I will make the suggested changes and run 
some tests before sending out the V2 patch.


Thank you,
Thomas
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dma-direct: Fix potential NULL pointer dereference

2020-09-15 Thread Christoph Hellwig
On Tue, Sep 15, 2020 at 10:40:39AM -0400, Thomas Tai wrote:
>> +++ b/include/linux/dma-direct.h
>> @@ -62,9 +62,6 @@ static inline bool dma_capable(struct device *dev, 
>> dma_addr_t addr, size_t size,
>>   {
>>  dma_addr_t end = addr + size - 1;
>>   -  if (!dev->dma_mask)
>> -return false;
>> -
>
> I am concerned that some drivers may rely on this NULL checking. Would you 
> think we can keep this checking and use the following WARN_ON_ONCE()?

dma_capable is not a helper for drivers, but just for dma-direct
and related code.  And this patch adds the checks for the three
places how we call into the ->map* methods.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dma-direct: Fix potential NULL pointer dereference

2020-09-15 Thread Thomas Tai



On 2020-09-15 10:26 a.m., Christoph Hellwig wrote:

On Tue, Sep 15, 2020 at 10:11:51AM -0400, Thomas Tai wrote:



On 2020-09-15 10:07 a.m., Christoph Hellwig wrote:

On Tue, Sep 15, 2020 at 08:03:14AM -0600, Thomas Tai wrote:

When booting the kernel v5.9-rc4 on a VM, the kernel would panic when
printing a warning message in swiotlb_map(). It is because dev->dma_mask
can potentially be a null pointer. Using the dma_get_mask() macro can
avoid the NULL pointer dereference.


dma_mask must not be zero.  This means drm is calling DMA API functions
on something weird.  This needs to be fixed in the caller.



Thanks, Christoph, for your comment. The caller already fixed the null
pointer in the latest v5.9-rc5. I am thinking that if we had used
dma_get_mask(), the kernel wouldn't have panicked and could have properly
printed out the warning message.


If we want to solve this something like this patch is probably the
right way:



diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 6e87225600ae35..064870844f06c1 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -62,9 +62,6 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size,
 {
	dma_addr_t end = addr + size - 1;

-	if (!dev->dma_mask)
-		return false;
-


I am concerned that some drivers may rely on this NULL checking. Would 
you think we can keep this checking and use the following WARN_ON_ONCE()?



if (is_ram && !IS_ENABLED(CONFIG_ARCH_DMA_ADDR_T_64BIT) &&
min(addr, end) < phys_to_dma(dev, PFN_PHYS(min_low_pfn)))
return false;
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 0d129421e75fc8..2b01d8f7baf160 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -144,6 +144,10 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 	dma_addr_t addr;
 
 	BUG_ON(!valid_dma_direction(dir));
+
+	if (WARN_ON_ONCE(!dev->dma_mask))
+		return DMA_MAPPING_ERROR;
+
if (dma_map_direct(dev, ops))
addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
else
@@ -179,6 +183,10 @@ int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
 	int ents;
 
 	BUG_ON(!valid_dma_direction(dir));
+
+	if (WARN_ON_ONCE(!dev->dma_mask))
+		return 0;
+
if (dma_map_direct(dev, ops))
ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
else
@@ -217,6 +225,9 @@ dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr,
 	if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
 		return DMA_MAPPING_ERROR;
 
+	if (WARN_ON_ONCE(!dev->dma_mask))
+		return DMA_MAPPING_ERROR;
+
if (dma_map_direct(dev, ops))
addr = dma_direct_map_resource(dev, phys_addr, size, dir, 
attrs);
else if (ops->map_resource)


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Jason Gunthorpe
On Mon, Sep 14, 2020 at 04:33:10PM -0600, Alex Williamson wrote:

> Can you explain that further, or spit-ball what you think this /dev/sva
> interface looks like and how a user might interact between vfio and
> this new interface? 

When you open it you get some container, inside the container the
user can create PASIDs. PASIDs outside that container cannot be
reached.

Creating a PASID, or the guest PASID range would be the entry point
for doing all the operations against a PASID or range that this patch
series imagines:
 - Map process VA mappings to the PASID's DMA virtual address space
 - Catch faults
 - Setup any special HW stuff like Intel's two level thing, ARM stuff, etc
 - Expose resource controls, cgroup, whatever
 - Migration special stuff (allocate fixed PASIDs)

A PASID is a handle for an IOMMU page table, and the tools to
manipulate it. Within /dev/sva the page table is just 'floating' and
not linked to any PCI functions

The open /dev/sva FD holding the allocated PASIDs would be passed to a
kernel driver. This is a security authorization that the specified
PASID can be assigned to a PCI device by the kernel.

At this point the kernel driver would have the IOMMU permit its
bus/device/function to use the PASID. The PASID can be passed to
multiple drivers of any driver flavour so table re-use is
possible. Now the IOMMU page table is linked to a device.

The kernel device driver would also do the device specific programming
to setup the PASID in the device, attach it to some device object and
expose the device for user DMA.

For instance IDXD's char dev would map the queue memory and associate
the PASID with that queue and setup the HW to be ready for the new
enque instruction. The IDXD mdev would link to its emulated PCI BAR
and ensure the guest can only use PASID's included in the /dev/sva
container.

The qemu control plane for vIOMMU related to PASID would run over
/dev/sva.

I think the design could go further where a 'PASID' is just an
abstract idea of a page table, then vfio-pci could consume it too as a
IOMMU page table handle even though there is no actual PASID. So qemu
could end up with one API to universally control the vIOMMU, an API
that can be shared between subsystems and is not tied to VFIO.
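
To make that concrete, a userspace flow against such an interface might
look roughly like the sketch below. Every ioctl name and structure here
is hypothetical, made up only to illustrate the ownership hand-off:

	/* Open the PASID container; PASIDs outside it cannot be reached. */
	int sva = open("/dev/sva", O_RDWR);

	/* Create a PASID: a handle for a 'floating' IOMMU page table. */
	struct sva_alloc_pasid alloc = {};		/* hypothetical */
	ioctl(sva, SVA_ALLOC_PASID, &alloc);

	/* Map process VA into the PASID's DMA virtual address space. */
	struct sva_map map = {				/* hypothetical */
		.pasid = alloc.pasid,
		.vaddr = (__u64)buf,
		.iova  = iova,
		.size  = len,
	};
	ioctl(sva, SVA_MAP, &map);

	/* Hand the sva FD to a kernel device driver as the security
	 * authorization; the driver then has the IOMMU permit its
	 * bus/device/function to use the PASID. */
	struct drv_attach attach = {			/* hypothetical */
		.sva_fd = sva,
		.pasid  = alloc.pasid,
	};
	ioctl(device_fd, DRV_ATTACH_PASID, &attach);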

> allocating pasids and associating them with page tables for that
> two-stage IOMMU setup, performing cache invalidations based on page
> table updates, etc.  How does it make more sense for a vIOMMU to
> setup some aspects of the IOMMU through vfio and others through a
> TBD interface?

vfio's IOMMU interface is about RID based full device ownership,
and fixed mappings.

PASID is about mediation, shared ownership and page faulting.

Does PASID overlap with the existing IOMMU RID interface beyond both
are using the IOMMU?

> The IOMMU needs to allocate PASIDs, so in that sense it enforces a
> quota via the architectural limits, but is the IOMMU layer going to
> distinguish in-kernel versus user limits?  A cgroup limit seems like a
> good idea, but that's not really at the IOMMU layer either and I don't
> see that a /dev/sva and vfio interface couldn't both support a cgroup
> type quota.

It is all good questions. PASID is new, this stuff needs to be
sketched out more. A lot of in-kernel users of IOMMU PASID are
probably going to be triggered by userspace actions.

I think a cgroup quota would end up near the IOMMU layer, so vfio,
sva, and any other driver char devs would all be restricted by the
cgroup as peers.

> And it's not clear that they'll have compatible requirements.  A
> userspace idxd driver might have limited needs versus a vIOMMU backend.
> Does a single quota model adequately support both or are we back to the
> differences between access to a device and ownership of a device?

At the end of the day a PASID is just a number and the drivers only
use of it is to program it into HW.

All these other differences deal with the IOMMU side of the PASID, how
pages are mapped into it, how page fault works, etc, etc. Keeping the
two concerns seperated seems very clean. A device driver shouldn't
care how the PASID is setup.

> > > This series is a blueprint within the context of the ownership and
> > > permission model that VFIO already provides.  It doesn't seem like we
> > > can pluck that out on its own, nor is it necessarily the case that VFIO
> > > wouldn't want to provide PASID services within its own API even if we
> > > did have this undefined /dev/sva interface.  
> > 
> > I don't see what you do - VFIO does not own PASID, and in this
> > vfio-mdev mode it does not own the PCI device/IOMMU either. So why
> > would this need to be part of the VFIO owernship and permission model?
> 
> Doesn't the PASID model essentially just augment the requester ID IOMMU
> model so as to manage the IOVAs for a subdevice of a RID?  

I'd say not really.. PASID is very different from RID because PASID
must always be mediated by the kernel. vfio-pci doesn't know how to
use PASID because it doesn't 

Re: [PATCH] dma-direct: Fix potential NULL pointer dereference

2020-09-15 Thread Christoph Hellwig
On Tue, Sep 15, 2020 at 10:11:51AM -0400, Thomas Tai wrote:
>
>
> On 2020-09-15 10:07 a.m., Christoph Hellwig wrote:
>> On Tue, Sep 15, 2020 at 08:03:14AM -0600, Thomas Tai wrote:
>>> When booting the kernel v5.9-rc4 on a VM, the kernel would panic when
>>> printing a warning message in swiotlb_map(). It is because dev->dma_mask
>>> can potentially be a null pointer. Using the dma_get_mask() macro can
>>> avoid the NULL pointer dereference.
>>
>> dma_mask must not be zero.  This means drm is calling DMA API functions
>> on something weird.  This needs to be fixed in the caller.
>>
>
> Thanks, Christoph for your comment. The caller already fixed the null 
> pointer in the latest v5.9-rc5. I am thinking that if we had used the 
> dma_get_mask(), the kernel couldn't panic and could properly print out the 
> warning message.

If we want to solve this something like this patch is probably the
right way:



diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 6e87225600ae35..064870844f06c1 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -62,9 +62,6 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size,
 {
dma_addr_t end = addr + size - 1;
 
-   if (!dev->dma_mask)
-   return false;
-
if (is_ram && !IS_ENABLED(CONFIG_ARCH_DMA_ADDR_T_64BIT) &&
min(addr, end) < phys_to_dma(dev, PFN_PHYS(min_low_pfn)))
return false;
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 0d129421e75fc8..2b01d8f7baf160 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -144,6 +144,10 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
dma_addr_t addr;
 
BUG_ON(!valid_dma_direction(dir));
+
+   if (WARN_ON_ONCE(!dev->dma_mask))
+   return DMA_MAPPING_ERROR;
+
if (dma_map_direct(dev, ops))
addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
else
@@ -179,6 +183,10 @@ int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
int ents;
 
BUG_ON(!valid_dma_direction(dir));
+
+   if (WARN_ON_ONCE(!dev->dma_mask))
+   return 0;
+
if (dma_map_direct(dev, ops))
ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
else
@@ -217,6 +225,9 @@ dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr,
 	if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
 		return DMA_MAPPING_ERROR;
 
+	if (WARN_ON_ONCE(!dev->dma_mask))
+		return DMA_MAPPING_ERROR;
+
addr = dma_direct_map_resource(dev, phys_addr, size, dir, 
attrs);
else if (ops->map_resource)
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dma-direct: Fix potential NULL pointer dereference

2020-09-15 Thread Thomas Tai




On 2020-09-15 10:07 a.m., Christoph Hellwig wrote:

On Tue, Sep 15, 2020 at 08:03:14AM -0600, Thomas Tai wrote:

When booting the kernel v5.9-rc4 on a VM, the kernel would panic when
printing a warning message in swiotlb_map(). It is because dev->dma_mask
can potentially be a null pointer. Using the dma_get_mask() macro can
avoid the NULL pointer dereference.


dma_mask must not be zero.  This means drm is calling DMA API functions
on something weird.  This needs to be fixed in the caller.



Thanks, Christoph, for your comment. The caller already fixed the null
pointer in the latest v5.9-rc5. I am thinking that if we had used
dma_get_mask(), the kernel wouldn't have panicked and could have properly
printed out the warning message.
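
For reference, dma_get_mask() sidesteps the dereference because it falls
back to a 32-bit mask when dev->dma_mask is NULL; its definition in
include/linux/dma-mapping.h reads:

static inline u64 dma_get_mask(struct device *dev)
{
	if (dev->dma_mask && *dev->dma_mask)
		return *dev->dma_mask;
	return DMA_BIT_MASK(32);
}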


Thomas
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 07/17] 53c700: improve non-coherent DMA handling

2020-09-15 Thread James Bottomley
On Tue, 2020-09-15 at 08:27 +0200, Christoph Hellwig wrote:
> On Mon, Sep 14, 2020 at 08:20:18AM -0700, James Bottomley wrote:
> > If you're going to change the macros from taking a device to taking
> > a hostdata structure then the descriptive argument name needs to
> > change ... it can't be dev anymore.  I'm happy with it simply
> > becoming 'h' if hostdata is too long.
> > 
> > I already asked for this on the first go around:
> 
> And I did rename them, those hunks just accidentally slipped into
> patch 12 instead of this one.  Fixed for the next versions.

Ah, yes, found it ... thanks for doing that!

James

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dma-direct: Fix potential NULL pointer dereference

2020-09-15 Thread Christoph Hellwig
On Tue, Sep 15, 2020 at 04:07:19PM +0200, Christoph Hellwig wrote:
> On Tue, Sep 15, 2020 at 08:03:14AM -0600, Thomas Tai wrote:
> > When booting the kernel v5.9-rc4 on a VM, the kernel would panic when
> > printing a warning message in swiotlb_map(). It is because dev->dma_mask
> > can potentially be a null pointer. Using the dma_get_mask() macro can
> > avoid the NULL pointer dereference.
> 
> dma_mask must not be zero.  This means drm is calling DMA API functions
> on something weird.  This needs to be fixed in the caller.

s/zero/NULL/, but the point stands.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dma-direct: Fix potential NULL pointer dereference

2020-09-15 Thread Christoph Hellwig
On Tue, Sep 15, 2020 at 08:03:14AM -0600, Thomas Tai wrote:
> When booting the kernel v5.9-rc4 on a VM, the kernel would panic when
> printing a warning message in swiotlb_map(). It is because dev->dma_mask
> can potentially be a null pointer. Using the dma_get_mask() macro can
> avoid the NULL pointer dereference.

dma_mask must not be zero.  This means drm is calling DMA API functions
on something weird.  This needs to be fixed in the caller.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] dma-direct: Fix potential NULL pointer dereference

2020-09-15 Thread Thomas Tai
When booting the kernel v5.9-rc4 on a VM, the kernel would panic when
printing a warning message in swiotlb_map(). It is because dev->dma_mask
can potentially be a null pointer. Using the dma_get_mask() macro can
avoid the NULL pointer dereference.

Fixes: d323bb44e4d2 ("drm/virtio: Call the right shmem helpers")

[drm] Initialized virtio_gpu 0.1.0 0 for virtio0 on minor 0
 BUG: kernel NULL pointer dereference, address: 
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x) - not-present page
 PGD 0 P4D 0
 Oops:  [#1] SMP PTI
 CPU: 1 PID: 331 Comm: systemd-udevd Not tainted 5.9.0-rc4 #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
 BIOS 1.13.0-1ubuntu1 04/01/2014
 RIP: 0010:swiotlb_map+0x1ac/0x200
 Code: e8 d9 fc ff ff 80 3d 92 ee 4c 01 00 75 51 49 8b 84 24 48 02 00 00
 4d 8b 6c 24 50 c6 05 7c ee 4c 01 01 4d 8b bc 24 58 02 00 00 <4c> 8b 30
 4d 85 ed 75 04 4d 8b 2c 24 4c 89 e7 e8 10 6b 4f 00 4d 89
 RSP: 0018:9f96801af6f8 EFLAGS: 00010246
 RAX:  RBX: 1000 RCX: 0080
 RDX: 007f RSI: 0202 RDI: 0202
 RBP: 9f96801af748 R08:  R09: 0020
 R10:  R11: 8fabfffa3000 R12: 8faad02c7810
 R13:  R14: 0020 R15: 
 FS:  7fabc63588c0() GS:8fabf7c8()
 knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2:  CR3: 000151496005 CR4: 00370ee0
 DR0:  DR1:  DR2: 
 DR3:  DR6: fffe0ff0 DR7: 0400
 Call Trace:
  dma_direct_map_sg+0x124/0x210
  dma_map_sg_attrs+0x32/0x50
  drm_gem_shmem_get_pages_sgt+0x6a/0x90 [drm]
  virtio_gpu_object_create+0x140/0x2f0 [virtio_gpu]
  ? ww_mutex_unlock+0x26/0x30
  virtio_gpu_mode_dumb_create+0xab/0x160 [virtio_gpu]
  drm_mode_create_dumb+0x82/0x90 [drm]
  drm_client_framebuffer_create+0xaa/0x200 [drm]
  drm_fb_helper_generic_probe+0x59/0x150 [drm_kms_helper]
  drm_fb_helper_single_fb_probe+0x29e/0x3e0 [drm_kms_helper]
  __drm_fb_helper_initial_config_and_unlock+0x41/0xd0 [drm_kms_helper]
  drm_fbdev_client_hotplug+0xe6/0x1a0 [drm_kms_helper]
  drm_fbdev_generic_setup+0xaf/0x170 [drm_kms_helper]
  virtio_gpu_probe+0xea/0x100 [virtio_gpu]
  virtio_dev_probe+0x14b/0x1e0 [virtio]
  really_probe+0x1db/0x440
  driver_probe_device+0xe9/0x160
  device_driver_attach+0x5d/0x70
  __driver_attach+0x8f/0x150
  ? device_driver_attach+0x70/0x70
  bus_for_each_dev+0x7e/0xc0
  driver_attach+0x1e/0x20
  bus_add_driver+0x152/0x1f0
  driver_register+0x74/0xd0
  ? 0xc0529000
  register_virtio_driver+0x20/0x30 [virtio]
  virtio_gpu_driver_init+0x15/0x1000 [virtio_gpu]
  do_one_initcall+0x4a/0x1fa
  ? _cond_resched+0x19/0x30
  ? kmem_cache_alloc_trace+0x16b/0x2e0
  do_init_module+0x62/0x240
  load_module+0xe0e/0x1100
  ? security_kernel_post_read_file+0x5c/0x70
  __do_sys_finit_module+0xbe/0x120
  ? __do_sys_finit_module+0xbe/0x120
  __x64_sys_finit_module+0x1a/0x20
  do_syscall_64+0x38/0x50
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Thomas Tai 
---
 include/linux/dma-direct.h | 2 +-
 kernel/dma/swiotlb.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 6e87225..7556067 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -168,7 +168,7 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev,
 
dev_WARN_ONCE(dev, 1,
 "DMA addr %pad+%zu overflow (mask %llx, bus limit 
%llx).\n",
-_addr, size, *dev->dma_mask, 
dev->bus_dma_limit);
+_addr, size, dma_get_mask(dev), 
dev->bus_dma_limit);
return DMA_MAPPING_ERROR;
}
 
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index c19379fa..aa7727b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -682,7 +682,7 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
 			attrs | DMA_ATTR_SKIP_CPU_SYNC);
 	dev_WARN_ONCE(dev, 1,
 		"swiotlb addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
-		&dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
+		&dma_addr, size, dma_get_mask(dev), dev->bus_dma_limit);
return DMA_MAPPING_ERROR;
}
 
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/amd: fix interrupt remapping for avic

2020-09-15 Thread Joao Martins
On 9/15/20 1:30 PM, Suravee Suthikulpanit wrote:
> On 9/15/20 6:25 PM, Maxim Levitsky wrote:
>> On Mon, 2020-09-14 at 21:48 +0700, Suravee Suthikulpanit wrote:
>>> Could you please try with the following patch instead?
>>>
>>> --- a/drivers/iommu/amd/iommu.c
>>> +++ b/drivers/iommu/amd/iommu.c
>>> @@ -3840,14 +3840,18 @@ int amd_iommu_activate_guest_mode(void *data)
>>>{
>>>   struct amd_ir_data *ir_data = (struct amd_ir_data *)data;
>>>   struct irte_ga *entry = (struct irte_ga *) ir_data->entry;
>>> +   u64 valid;
>>>
>>>   if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) ||
>>>   !entry || entry->lo.fields_vapic.guest_mode)
>>>   return 0;
>>>
>>> +   valid = entry->lo.fields_vapic.valid;
>>> +
>>>   entry->lo.val = 0;
>>>   entry->hi.val = 0;
>>>
>>> +   entry->lo.fields_vapic.valid   = valid;
>>>   entry->lo.fields_vapic.guest_mode  = 1;
>>>   entry->lo.fields_vapic.ga_log_intr = 1;
>>>   entry->hi.fields.ga_root_ptr   = ir_data->ga_root_ptr;
>>> @@ -3864,12 +3868,14 @@ int amd_iommu_deactivate_guest_mode(void *data)
>>>   struct amd_ir_data *ir_data = (struct amd_ir_data *)data;
>>>   struct irte_ga *entry = (struct irte_ga *) ir_data->entry;
>>>   struct irq_cfg *cfg = ir_data->cfg;
>>> -   u64 valid = entry->lo.fields_remap.valid;
>>> +   u64 valid;
>>>
>>>   if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) ||
>>>   !entry || !entry->lo.fields_vapic.guest_mode)
>>>   return 0;
>>>
>>> +   valid = entry->lo.fields_remap.valid;
>>> +
>>>   entry->lo.val = 0;
>>>   entry->hi.val = 0;
>> I see. I based my approach on the fact that valid bit was
>> set always to true anyway before, plus that amd_iommu_activate_guest_mode
>> should be really only called when someone activates a valid interrupt 
>> remapping
>> entry, but IMHO the approach of preserving the valid bit is safer anyway.
>>
>> It works on my system (I applied the patch manually, since either your or my 
>> email client,
>> seems to mangle the patch)
>>
> 
> Sorry for the mangled patch. I'll submit the patch w/ your information. 
> Thanks for your help reporting, debugging, and 
> testing the patch.
> 
I assume you're only doing the valid bit preservation in 
amd_iommu_activate_guest_mode() ?
The null deref fix in amd_iommu_deactivate_guest_mode() was fixed elsewhere[0], 
or are you
planning on merging both changes like the diff you attached?

Asking also because commit 26e495f341 ("iommu/amd: Restore IRTE.RemapEn bit 
after
programming IRTE") was added in v5.4 and v5.8 stable trees but the v5.4 
backport didn't
include e52d58d54a321 ("iommu/amd: Use cmpxchg_double() when updating 128-bit 
IRTE").

Joao

[0] 
https://lore.kernel.org/linux-iommu/20200910171621.12879-1-joao.m.mart...@oracle.com/
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/4] dt-bindings: reserved-memory: Document "active" property

2020-09-15 Thread Thierry Reding
On Mon, Sep 14, 2020 at 04:08:29PM -0600, Rob Herring wrote:
> On Fri, Sep 04, 2020 at 02:59:57PM +0200, Thierry Reding wrote:
> > From: Thierry Reding 
> > 
> > Reserved memory regions can be marked as "active" if hardware is
> > expected to access the regions during boot and before the operating
> > system can take control. One example where this is useful is for the
> > operating system to infer whether the region needs to be identity-
> > mapped through an IOMMU.
> 
> I like simple solutions, but this hardly seems adequate to solve the 
> problem of passing IOMMU setup from bootloader/firmware to the OS. Like 
> what is the IOVA that's supposed to be used if identity mapping is not 
> used?

The assumption here is that if the region is not active there is no need
for the IOVA to be specified because the kernel will allocate memory and
assign any IOVA of its choosing.

Also, note that this is not meant as a way of passing IOMMU setup from
the bootloader or firmware to the OS. The purpose of this is to specify
that some region of memory is actively being accessed during boot. The
particular case that I'm looking at is where the bootloader set up a
splash screen and keeps it on during boot. The bootloader has not set up
an IOMMU mapping and the identity mapping serves as a way of keeping the
accesses by the display hardware working during the transitional period
after the IOMMU translations have been enabled by the kernel but before
the kernel display driver has had a chance to set up its own IOMMU
mappings.

> If you know enough about the regions to assume identity mapping, then 
> can't you know if active or not?

We could alternatively add some property that describes the region as
requiring an identity mapping. But note that we can't make any
assumptions here about the usage of these regions because the IOMMU
driver simply has no way of knowing what they are being used for.

Some additional information is required in device tree for the IOMMU
driver to be able to make that decision.

Thierry

> 
> > Signed-off-by: Thierry Reding 
> > ---
> >  .../bindings/reserved-memory/reserved-memory.txt   | 7 +++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git 
> > a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt 
> > b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > index 4dd20de6977f..163d2927e4fc 100644
> > --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > @@ -63,6 +63,13 @@ reusable (optional) - empty property
> >able to reclaim it back. Typically that means that the operating
> >system can use that region to store volatile or cached data that
> >can be otherwise regenerated or migrated elsewhere.
> > +active (optional) - empty property
> > +- If this property is set for a reserved memory region, it indicates
> > +  that some piece of hardware may be actively accessing this region.
> > +  Should the operating system want to enable IOMMU protection for a
> > +  device, all active memory regions must have been identity-mapped
> > +  in order to ensure that non-quiescent hardware during boot can
> > +  continue to access the memory.
> >  
> >  Linux implementation note:
> >  - If a "linux,cma-default" property is present, then Linux will use the
> > -- 
> > 2.28.0
> > 


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] iommu/amd: fix interrupt remapping for avic

2020-09-15 Thread Suravee Suthikulpanit




On 9/15/20 6:25 PM, Maxim Levitsky wrote:

On Mon, 2020-09-14 at 21:48 +0700, Suravee Suthikulpanit wrote:

Maxim,

On 9/13/2020 7:42 PM, Maxim Levitsky wrote:

Commit e52d58d54a32 ("iommu/amd: Use cmpxchg_double() when updating 128-bit 
IRTE")
accidentally removed an assumption that modify_irte_ga always set the valid bit
and amd_iommu_activate_guest_mode relied on that.

Side effect of this is that on my machine, VFIO based VMs with AVIC enabled
would eventually crash and show IOMMU errors like that:

AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0055 address=0xfffdf800 
flags=0x0008]

Fixes: e52d58d54a321 ("iommu/amd: Use cmpxchg_double() when updating 128-bit 
IRTE")
Signed-off-by: Maxim Levitsky 
---
   drivers/iommu/amd/iommu.c | 1 +
   1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 07ae8b93887e5..aff4cc1869356 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3853,6 +3853,7 @@ int amd_iommu_activate_guest_mode(void *data)
entry->hi.fields.ga_root_ptr   = ir_data->ga_root_ptr;
entry->hi.fields.vector= ir_data->ga_vector;
entry->lo.fields_vapic.ga_tag  = ir_data->ga_tag;
+   entry->lo.fields_remap.valid = 1;
   
	return modify_irte_ga(ir_data->irq_2_irte.devid,
			      ir_data->irq_2_irte.index, entry, ir_data);



Could you please try with the following patch instead?

--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3840,14 +3840,18 @@ int amd_iommu_activate_guest_mode(void *data)
   {
  struct amd_ir_data *ir_data = (struct amd_ir_data *)data;
  struct irte_ga *entry = (struct irte_ga *) ir_data->entry;
+   u64 valid;

  if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) ||
  !entry || entry->lo.fields_vapic.guest_mode)
  return 0;

+   valid = entry->lo.fields_vapic.valid;
+
  entry->lo.val = 0;
  entry->hi.val = 0;

+   entry->lo.fields_vapic.valid   = valid;
  entry->lo.fields_vapic.guest_mode  = 1;
  entry->lo.fields_vapic.ga_log_intr = 1;
  entry->hi.fields.ga_root_ptr   = ir_data->ga_root_ptr;
@@ -3864,12 +3868,14 @@ int amd_iommu_deactivate_guest_mode(void *data)
  struct amd_ir_data *ir_data = (struct amd_ir_data *)data;
  struct irte_ga *entry = (struct irte_ga *) ir_data->entry;
  struct irq_cfg *cfg = ir_data->cfg;
-   u64 valid = entry->lo.fields_remap.valid;
+   u64 valid;

  if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) ||
  !entry || !entry->lo.fields_vapic.guest_mode)
  return 0;

+   valid = entry->lo.fields_remap.valid;
+
  entry->lo.val = 0;
  entry->hi.val = 0;

I see. I based my approach on the fact that valid bit was
set always to true anyway before, plus that amd_iommu_activate_guest_mode
should be really only called when someone activates a valid interrupt remapping
entry, but IMHO the approach of preserving the valid bit is safer anyway.

It works on my system (I applied the patch manually, since either your or my 
email client,
seems to mangle the patch)

Thanks,
Best regards,
Maxim Levitsky




Sorry for the mangled patch. I'll submit the patch w/ your information. Thanks for your help reporting, debugging, and 
testing the patch.


Sincerely,

Suravee

--




Thanks,
Suravee





___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Jason Gunthorpe
On Mon, Sep 14, 2020 at 03:44:38PM -0700, Raj, Ashok wrote:
> Hi Jason,
> 
> I thought we discussed this at LPC, but still seems to be going in
> circles :-(.

We discused mdev at LPC, not PASID.

PASID applies widely to many device and needs to be introduced with a
wide community agreement so all scenarios will be supportable.

> As you had suggested earlier in the mail thread could Jason Wang maybe
> build out what it takes to have a full fledged /dev/sva interface for vDPA
> and figure out how the interfaces should emerge? otherwise it appears
> everyone is talking very high level and with that limited understanding of
> how things work at the moment. 

You want Jason Wang to do the work to get Intel PASID support merged?
Seems a bit of a strange request.

> This has to move ahead of these email discussions, hoping somone with the
> right ideas would help move this forward.

Why not try yourself to come up with a proposal?

Jason 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/amd: fix interrupt remapping for avic

2020-09-15 Thread Maxim Levitsky
On Mon, 2020-09-14 at 21:48 +0700, Suravee Suthikulpanit wrote:
> Maxim,
> 
> On 9/13/2020 7:42 PM, Maxim Levitsky wrote:
> > Commit e52d58d54a32 ("iommu/amd: Use cmpxchg_double() when updating 128-bit 
> > IRTE")
> > accidentally removed an assumption that modify_irte_ga always set the valid 
> > bit
> > and amd_iommu_activate_guest_mode relied on that.
> > 
> > Side effect of this is that on my machine, VFIO based VMs with AVIC enabled
> > would eventually crash and show IOMMU errors like that:
> > 
> > AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0055 
> > address=0xfffdf800 flags=0x0008]
> > 
> > Fixes: e52d58d54a321 ("iommu/amd: Use cmpxchg_double() when updating 
> > 128-bit IRTE")
> > Signed-off-by: Maxim Levitsky 
> > ---
> >   drivers/iommu/amd/iommu.c | 1 +
> >   1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> > index 07ae8b93887e5..aff4cc1869356 100644
> > --- a/drivers/iommu/amd/iommu.c
> > +++ b/drivers/iommu/amd/iommu.c
> > @@ -3853,6 +3853,7 @@ int amd_iommu_activate_guest_mode(void *data)
> > entry->hi.fields.ga_root_ptr   = ir_data->ga_root_ptr;
> > entry->hi.fields.vector= ir_data->ga_vector;
> > entry->lo.fields_vapic.ga_tag  = ir_data->ga_tag;
> > +   entry->lo.fields_remap.valid = 1;
> >   
> > return modify_irte_ga(ir_data->irq_2_irte.devid,
> >   ir_data->irq_2_irte.index, entry, ir_data);
> > 
> 
> Could you please try with the following patch instead?
> 
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -3840,14 +3840,18 @@ int amd_iommu_activate_guest_mode(void *data)
>   {
>  struct amd_ir_data *ir_data = (struct amd_ir_data *)data;
>  struct irte_ga *entry = (struct irte_ga *) ir_data->entry;
> +   u64 valid;
> 
>  if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) ||
>  !entry || entry->lo.fields_vapic.guest_mode)
>  return 0;
> 
> +   valid = entry->lo.fields_vapic.valid;
> +
>  entry->lo.val = 0;
>  entry->hi.val = 0;
> 
> +   entry->lo.fields_vapic.valid   = valid;
>  entry->lo.fields_vapic.guest_mode  = 1;
>  entry->lo.fields_vapic.ga_log_intr = 1;
>  entry->hi.fields.ga_root_ptr   = ir_data->ga_root_ptr;
> @@ -3864,12 +3868,14 @@ int amd_iommu_deactivate_guest_mode(void *data)
>  struct amd_ir_data *ir_data = (struct amd_ir_data *)data;
>  struct irte_ga *entry = (struct irte_ga *) ir_data->entry;
>  struct irq_cfg *cfg = ir_data->cfg;
> -   u64 valid = entry->lo.fields_remap.valid;
> +   u64 valid;
> 
>  if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) ||
>  !entry || !entry->lo.fields_vapic.guest_mode)
>  return 0;
> 
> +   valid = entry->lo.fields_remap.valid;
> +
>  entry->lo.val = 0;
>  entry->hi.val = 0;
I see. I based my approach on the fact that valid bit was
set always to true anyway before, plus that amd_iommu_activate_guest_mode
should be really only called when someone activates a valid interrupt remapping
entry, but IMHO the approach of preserving the valid bit is safer anyway.

It works on my system (I applied the patch manually, since either your or my 
email client,
seems to mangle the patch)

Thanks,
Best regards,
Maxim Levitsky


> --

> 
> Thanks,
> Suravee
> 


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [trivial PATCH] treewide: Convert switch/case fallthrough; to break;

2020-09-15 Thread Miquel Raynal
Hi Joe,

For MTD:

>  drivers/mtd/nand/raw/nandsim.c|  2 +-

Reviewed-by: Miquel Raynal 


Thanks,
Miquèl
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [Intel-gfx] [trivial PATCH] treewide: Convert switch/case fallthrough; to break;

2020-09-15 Thread Jani Nikula
On Wed, 09 Sep 2020, Joe Perches  wrote:
> diff --git a/drivers/gpu/drm/i915/display/intel_sprite.c 
> b/drivers/gpu/drm/i915/display/intel_sprite.c
> index 5ac0dbf0e03d..35ac539cc2b1 100644
> --- a/drivers/gpu/drm/i915/display/intel_sprite.c
> +++ b/drivers/gpu/drm/i915/display/intel_sprite.c
> @@ -2861,7 +2861,7 @@ static bool gen12_plane_format_mod_supported(struct 
> drm_plane *_plane,
>   case I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS:
>   if (!gen12_plane_supports_mc_ccs(dev_priv, plane->id))
>   return false;
> - fallthrough;
> + break;
>   case DRM_FORMAT_MOD_LINEAR:
>   case I915_FORMAT_MOD_X_TILED:
>   case I915_FORMAT_MOD_Y_TILED:

Acked-by: Jani Nikula 

for merging via whichever tree seems best.

BR,
Jani.


-- 
Jani Nikula, Intel Open Source Graphics Center
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 0/6] Convert the intel iommu driver to the dma-iommu api

2020-09-15 Thread Tvrtko Ursulin


On 15/09/2020 02:47, Lu Baolu wrote:

Hi Tvrtko,

On 9/14/20 4:04 PM, Tvrtko Ursulin wrote:


Hi,

On 12/09/2020 04:21, Lu Baolu wrote:

Tom Murphy has done almost all the work. His latest patch series was
posted here.

https://lore.kernel.org/linux-iommu/20200903201839.7327-1-murph...@tcd.ie/ 



Thanks a lot!

This series is a follow-up with below changes:

1. Add a quirk for the i915 driver issue described in Tom's cover
letter.


Last week I copied you on an i915 series which appears to remove the
need for this quirk. So if we get those i915 patches reviewed and
merged, do you still want to pursue this quirk?


It's up to the graphics guys. I don't know the details of the i915
driver. I don't think my tests could cover all cases.


I am the graphics guy. :) I just need some reviews (internally) for my 
series and then we can merge it, at which point you don't need the quirk 
patch any more. I'll try to accelerate this.


With regards to testing, you could send your series with my patches on 
top to our trybot mailing list (intel-gfx-try...@lists.freedesktop.org / 
https://patchwork.freedesktop.org/project/intel-gfx-trybot/series/?ordering=-last_updated) 
which would show you if it is still hitting the DMAR issues in i915.





2. Fix several bugs in patch "iommu: Allow the dma-iommu api to use
bounce buffers" to make the bounce buffer work for untrusted devices.
3. Several cleanups in iommu/vt-d driver after the conversion.


With the previous version of the series I hit a problem on Ivybridge 
where apparently the DMA engine width is not respected. At least that 
is my layman interpretation of the errors. From the older thread:


<3> [209.526605] DMAR: intel_iommu_map: iommu width (39) is not sufficient for the mapped address (008000)


Relevant iommu boot related messages are:

<6>[    0.184234] DMAR: Host address width 36
<6>[    0.184245] DMAR: DRHD base: 0x00fed9 flags: 0x0
<6>[    0.184288] DMAR: dmar0: reg_base_addr fed9 ver 1:0 cap c020e60262 ecap f0101a
<6>[    0.184308] DMAR: DRHD base: 0x00fed91000 flags: 0x1
<6>[    0.184337] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap c9008020660262 ecap f0105a
<6>[    0.184357] DMAR: RMRR base: 0x00d8d28000 end: 0x00d8d46fff
<6>[    0.184377] DMAR: RMRR base: 0x00db00 end: 0x00df1f
<6>[    0.184398] DMAR-IR: IOAPIC id 2 under DRHD base 0xfed91000 IOMMU 1
<6>[    0.184414] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
<6>[    0.184428] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
<6>[    0.185173] DMAR-IR: Enabled IRQ remapping in x2apic mode
<6>[    0.878934] DMAR: No ATSR found
<6>[    0.878966] DMAR: dmar0: Using Queued invalidation
<6>[    0.879007] DMAR: dmar1: Using Queued invalidation
<6>[    0.915032] DMAR: Intel(R) Virtualization Technology for Directed I/O
<6>[    0.915060] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
<6>[    0.915084] software IO TLB: mapped [mem 0xc80d4000-0xcc0d4000] (64MB)


(Full boot log at https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7054/fi-ivb-3770/boot0.txt,
failures at https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7054/fi-ivb-3770/igt@i915_selftest@l...@blt.html.)



Does this look familiar or at least plausible to you? Is this 
something your new series has fixed?


This happens while attaching a domain to the device. It has nothing to
do with this patch series. I will look into this issue, but not in this
email thread context.


I am not sure at which step the domain is attached to the device, but
these kinds of messages:


<3> [209.526605] DMAR: intel_iommu_map: iommu width (39) is not sufficient for the mapped address (008000)

They definitely appear to happen at runtime, as i915 is getting 
exercised by userspace.
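
For context, the message above appears to come from the address-width
guard in intel_iommu_map(), roughly as below (paraphrased from memory
from drivers/iommu/intel/iommu.c around v5.9; check the tree for the
exact code):

	u64 max_addr = iova + size;

	if (dmar_domain->max_addr < max_addr) {
		u64 end;

		/* check if minimum agaw is sufficient for mapped address */
		end = __DOMAIN_MAX_ADDR(dmar_domain->gaw) + 1;
		if (end < max_addr) {
			pr_err("%s: iommu width (%d) is not sufficient for the mapped address (%llx)\n",
			       __func__, dmar_domain->gaw, max_addr);
			return -EFAULT;
		}
		dmar_domain->max_addr = max_addr;
	}

So with dmar_domain->gaw == 39, any IOVA + size needing more than 39
bits is rejected at map time, which matches these failures showing up
only once userspace starts exercising i915.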


Regards,

Tvrtko

Re: [PATCHv4 6/6] iommu: arm-smmu-impl: Remove unwanted extra blank lines

2020-09-15 Thread Sai Prakash Ranjan

On 2020-09-11 22:20, Sai Prakash Ranjan wrote:

On 2020-09-11 22:04, Robin Murphy wrote:

On 2020-09-11 17:21, Sai Prakash Ranjan wrote:

On 2020-09-11 21:37, Will Deacon wrote:

On Fri, Sep 11, 2020 at 05:03:06PM +0100, Robin Murphy wrote:
BTW am I supposed to have received 3 copies of everything? Because 
I did...


Yeah, this seems to be happening for all of Sai's emails :/



Sorry, I am not sure what went wrong, as I only sent this once and
there are no recent changes to any of my configs; I'll check it
further.


Actually on closer inspection it appears to be "correct" behaviour.
I'm still subscribed to LAKML and the IOMMU list on this account, but
normally Office 365 deduplicates so aggressively that I have rules set
up to copy list mails that I'm cc'ed on back to my inbox, in case they
arrive first and cause the direct copy to get eaten - apparently
there's something unique about your email setup that manages to defeat
the deduplicator and make it deliver all 3 copies intact... :/



No changes in my local setup at least, but in the past we have had
cases of codeaurora mail acting weird, or it could be my VPN; I will
have to check.



This was an issue with codeaurora servers and I am told that it is
fixed now.

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH] irqchip/qcom-pdc: Allow QCOM_PDC to be loadable as a permanent module

2020-09-15 Thread Greg Kroah-Hartman
On Mon, Sep 14, 2020 at 09:04:23PM +, John Stultz wrote:
> Allows the qcom-pdc driver to be loaded as a permanent module.
> 
> An earlier version of this patch was merged in a larger patchset
> but was reverted entirely when issues were found with other
> drivers, so now that Marc has provided a better solution in his
> Hybrid probing patch set, I wanted to re-submit this change.
> 
> Cc: Andy Gross 
> Cc: Bjorn Andersson 
> Cc: Joerg Roedel 
> Cc: Thomas Gleixner 
> Cc: Jason Cooper 
> Cc: Marc Zyngier 
> Cc: Linus Walleij 
> Cc: Maulik Shah 
> Cc: Lina Iyer 
> Cc: Saravana Kannan 
> Cc: Todd Kjos 
> Cc: Greg Kroah-Hartman 
> Cc: linux-arm-...@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org
> Cc: linux-g...@vger.kernel.org
> Signed-off-by: John Stultz 
> ---

Reviewed-by: Greg Kroah-Hartman 


Re: a saner API for allocating DMA addressable pages v2

2020-09-15 Thread Christoph Hellwig
On Mon, Sep 14, 2020 at 04:26:17PM +0100, Matthew Wilcox wrote:
> On Mon, Sep 14, 2020 at 04:44:16PM +0200, Christoph Hellwig wrote:
> > I'm still a little unsure about the API naming, as alloc_pages sort of
> > implies a struct page return value, but we return a kernel virtual
> > address.
> 
> Erm ... dma_alloc_pages() returns a struct page, so is this sentence
> stale?

Yes.

> You say that like it's a bad thing.  I think the problem is more that
> people don't understand what non-coherent means and think they're
> supporting it when they're not.
> 
> dma_alloc_manual_flushing()?

That sounds pretty awkward..

> 
> > As a follow up I plan to move the implementation of the
> > DMA_ATTR_NO_KERNEL_MAPPING flag over to this framework as well, given
> > that is also is a fundamentally non coherent allocation.  The replacement
> > for that flag would then return a struct page, as it is allowed to
> > actually return pages without a kernel mapping as the name suggested
> > (although most of the time they will actually have a kernel mapping..)
> 
> If the page doesn't have a kernel mapping, shouldn't it return a PFN
> or a phys_addr?

Most APIs we'll feed it into need a struct page.  The difference is just
that it can be a highmem page.  And if we want to get fancy we could
change the kernel mapping to PROT_NONE eventually.
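
For readers following along, a hedged usage sketch of the allocator
under discussion (signatures as dma_alloc_pages()/dma_free_pages()
later landed; the exact form in this v2 posting may differ):

	static int example_alloc(struct device *dev, size_t size)
	{
		struct page *page;
		dma_addr_t dma;

		page = dma_alloc_pages(dev, size, &dma, DMA_BIDIRECTIONAL,
				       GFP_KERNEL);
		if (!page)
			return -ENOMEM;

		/* CPU access goes through page_address(page); the caller
		 * owns cache maintenance, bracketing device access with
		 * dma_sync_single_for_device()/dma_sync_single_for_cpu(). */

		dma_free_pages(dev, size, page, dma, DMA_BIDIRECTIONAL);
		return 0;
	}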


Re: [PATCH 03/17] drm/exynos: stop setting DMA_ATTR_NON_CONSISTENT

2020-09-15 Thread Christoph Hellwig
On Mon, Sep 14, 2020 at 06:34:02PM +0300, Sergei Shtylyov wrote:
> On 9/14/20 5:44 PM, Christoph Hellwig wrote:
> 
> > DMA_ATTR_NON_CONSISTENT is a no-op except on PARISC and some mips
> > configs, so don't set it in this ARM specific driver.
> 
>    Hm, PARISC and ARM capitalized but mips in lower case? :-)

I guess it should also be PA-RISC if we start nitpicking..


Re: [PATCH 11/17] sgiseeq: convert to dma_alloc_noncoherent

2020-09-15 Thread Christoph Hellwig
On Mon, Sep 14, 2020 at 04:13:58PM +0100, Matthew Wilcox wrote:
> On Mon, Sep 14, 2020 at 04:44:27PM +0200, Christoph Hellwig wrote:
> >  drivers/net/ethernet/i825xx/lasi_82596.c |  25 ++---
> >  drivers/net/ethernet/i825xx/lib82596.c   | 114 ++-
> >  drivers/net/ethernet/i825xx/sni_82596.c  |   4 -
> >  drivers/net/ethernet/seeq/sgiseeq.c  |  28 --
> >  drivers/scsi/53c700.c|   9 +-
> >  5 files changed, 103 insertions(+), 77 deletions(-)
> 
> I think your patch slicing-and-dicing went wrong here ;-(

Pretty much..  Fixed up for the next version.


Re: [PATCH 07/17] 53c700: improve non-coherent DMA handling

2020-09-15 Thread Christoph Hellwig
On Mon, Sep 14, 2020 at 08:20:18AM -0700, James Bottomley wrote:
> If you're going to change the macros from taking a device to taking a
> hostdata structure then the descriptive argument name needs to change
> ... it can't be dev anymore.  I'm happy with it simply becoming 'h' if
> hostdata is too long.
> 
> I already asked for this on the first go around:

And I did rename them; those hunks just accidentally slipped into patch
12 instead of this one. Fixed for the next version.
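
A tiny illustration of the rename being asked for (the macro and the
hostdata layout here are hypothetical, not the driver's actual ones):

	/* before: argument named dev, but a hostdata pointer is passed */
	#define FOO_SYNC_FOR_CPU(dev, addr, len) \
		dma_sync_single_for_cpu((dev)->dev, (addr), (len), \
					DMA_FROM_DEVICE)

	/* after: a short, honest name for the hostdata argument */
	#define FOO_SYNC_FOR_CPU(h, addr, len) \
		dma_sync_single_for_cpu((h)->dev, (addr), (len), \
					DMA_FROM_DEVICE)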