Re: iommu/vt-d: Cure VF irqdomain hickup

2020-11-12 Thread Lu Baolu

Hi Thomas,

On 2020/11/13 3:15, Thomas Gleixner wrote:

The recent changes to store the MSI irqdomain pointer in struct device
missed that Intel DMAR does not register virtual function devices.  Due to
that a VF device gets the plain PCI-MSI domain assigned and then issues
compat MSI messages which get caught by the interrupt remapping unit.

Cure that by inheriting the irq domain from the physical function
device.

That's a temporary workaround. The correct fix is to inherit the irq domain
from the bus, but that's a larger effort which needs quite some other
changes to the way how x86 manages PCI and MSI domains.

Fixes: 85a8dfc57a0b ("iommm/vt-d: Store irq domain in struct device")
Reported-by: Jason Gunthorpe 
Signed-off-by: Thomas Gleixner 
---
  drivers/iommu/intel/dmar.c |   19 ++-
  1 file changed, 18 insertions(+), 1 deletion(-)

--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -333,6 +333,11 @@ static void  dmar_pci_bus_del_dev(struct
dmar_iommu_notify_scope_dev(info);
  }
  
+static inline void vf_inherit_msi_domain(struct pci_dev *pdev)
+{
+   dev_set_msi_domain(&pdev->dev, dev_get_msi_domain(&pdev->physfn->dev));
+}
+
  static int dmar_pci_bus_notifier(struct notifier_block *nb,
 unsigned long action, void *data)
  {
@@ -342,8 +347,20 @@ static int dmar_pci_bus_notifier(struct
/* Only care about add/remove events for physical functions.
 * For VFs we actually do the lookup based on the corresponding
 * PF in device_to_iommu() anyway. */
-   if (pdev->is_virtfn)
+   if (pdev->is_virtfn) {
+   /*
+* Note: This is a horrible hack and needs to be cleaned
+* up by assigning the domain to the bus, but that's too
+* big of a change for post rc3.
+*
+* Ensure that the VF device inherits the irq domain of the
+* PF device:
+*/
+   if (action == BUS_NOTIFY_ADD_DEVICE)
+   vf_inherit_msi_domain(pdev);
return NOTIFY_DONE;
+   }
+
if (action != BUS_NOTIFY_ADD_DEVICE &&
action != BUS_NOTIFY_REMOVED_DEVICE)
return NOTIFY_DONE;


We also encountered this problem in internal testing. This patch can
solve the problem.

Acked-by: Lu Baolu 

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 00/14] iommu/amd: Add Generic IO Page Table Framework Support

2020-11-12 Thread Suravee Suthikulpanit

Joerg,

Please disregard V3. I am working on V4 and will resubmit.

Thank you,
Suravee

On 11/11/20 10:10 AM, Suravee Suthikulpanit wrote:

Hi Joerg,

Do you have any update on this series?

Thanks,
Suravee

On 11/2/20 10:16 AM, Suravee Suthikulpanit wrote:

Joerg,

You asked me to remind you to pull this into linux-next.

Thanks,
Suravee

On 10/4/20 8:45 AM, Suravee Suthikulpanit wrote:

The framework allows IO page table implementations to be invoked through
callbacks. This allows the AMD IOMMU driver to switch between different types
of AMD IOMMU page tables (e.g. v1 vs. v2).

This series refactors the current implementation of AMD IOMMU v1 page table
to adopt the framework. There should be no functional change.
Subsequent series will introduce support for the AMD IOMMU v2 page table.

Thanks,
Suravee

Change from V2 
(https://lore.kernel.org/lkml/835c0d46-ed96-9fbe-856a-777dcffac...@amd.com/T/#t)
   - Patch 2/14: Introduce helper function io_pgtable_cfg_to_data.
   - Patch 13/14: Put back the struct iommu_flush_ops since patch v2 would run
     into NULL pointer bug when calling free_io_pgtable_ops if not defined.

Change from V1 (https://lkml.org/lkml/2020/9/23/251)
   - Do not specify struct io_pgtable_cfg.coherent_walk, since it is
 not currently used. (per Robin)
   - Remove unused struct iommu_flush_ops.  (patch 2/13)
   - Move amd_iommu_setup_io_pgtable_ops to iommu.c instead of io_pgtable.c
     (patch 13/13)

Suravee Suthikulpanit (14):
   iommu/amd: Re-define amd_iommu_domain_encode_pgtable as inline
   iommu/amd: Prepare for generic IO page table framework
   iommu/amd: Move pt_root to struct amd_io_pgtable
   iommu/amd: Convert to using amd_io_pgtable
   iommu/amd: Declare functions as extern
   iommu/amd: Move IO page table related functions
   iommu/amd: Restructure code for freeing page table
   iommu/amd: Remove amd_iommu_domain_get_pgtable
   iommu/amd: Rename variables to be consistent with struct
 io_pgtable_ops
   iommu/amd: Refactor fetch_pte to use struct amd_io_pgtable
   iommu/amd: Introduce iommu_v1_iova_to_phys
   iommu/amd: Introduce iommu_v1_map_page and iommu_v1_unmap_page
   iommu/amd: Introduce IOMMU flush callbacks
   iommu/amd: Adopt IO page table framework

  drivers/iommu/amd/Kconfig   |   1 +
  drivers/iommu/amd/Makefile  |   2 +-
  drivers/iommu/amd/amd_iommu.h   |  22 +
  drivers/iommu/amd/amd_iommu_types.h |  43 +-
  drivers/iommu/amd/io_pgtable.c  | 564 
  drivers/iommu/amd/iommu.c   | 646 +++-
  drivers/iommu/io-pgtable.c  |   3 +
  include/linux/io-pgtable.h  |   2 +
  8 files changed, 691 insertions(+), 592 deletions(-)
  create mode 100644 drivers/iommu/amd/io_pgtable.c



Re: [PATCH v8 5/9] ACPI/IORT: Enable stall support for platform devices

2020-11-12 Thread Hanjun Guo

On 2020/11/12 20:55, Jean-Philippe Brucker wrote:

Copy the "Stall supported" bit, that tells whether a platform device
supports stall, into the fwspec struct.

Signed-off-by: Jean-Philippe Brucker 


Acked-by: Hanjun Guo 


[RESEND][PATCH 1/2] arm-smmu-qcom: Ensure the qcom_scm driver has finished probing

2020-11-12 Thread John Stultz
Robin Murphy pointed out that if the arm-smmu driver probes before
the qcom_scm driver, we may call qcom_scm_qsmmu500_wait_safe_toggle()
before the __scm is initialized.

Now, getting this to happen is a bit contrived, as in my efforts it
required enabling asynchronous probing for both drivers, moving the
firmware dts node to the end of the dtsi file, as well as forcing a
long delay in the qcom_scm_probe function.

With those tweaks we ran into the following crash:
[2.631040] arm-smmu 1500.iommu: Stage-1: 48-bit VA -> 48-bit IPA
[2.633372] Unable to handle kernel NULL pointer dereference at virtual 
address 
...
[2.633402] [] user address but active_mm is swapper
[2.633409] Internal error: Oops: 9605 [#1] PREEMPT SMP
[2.633415] Modules linked in:
[2.633427] CPU: 5 PID: 117 Comm: kworker/u16:2 Tainted: GW 
5.10.0-rc1-mainline-00025-g272a618fc36-dirty #3971
[2.633430] Hardware name: Thundercomm Dragonboard 845c (DT)
[2.633448] Workqueue: events_unbound async_run_entry_fn
[2.633456] pstate: 80c5 (Nzcv daif +PAN +UAO -TCO BTYPE=--)
[2.633465] pc : qcom_scm_qsmmu500_wait_safe_toggle+0x78/0xb0
[2.633473] lr : qcom_smmu500_reset+0x58/0x78
[2.633476] sp : ffc0105a3b60
...
[2.633567] Call trace:
[2.633572]  qcom_scm_qsmmu500_wait_safe_toggle+0x78/0xb0
[2.633576]  qcom_smmu500_reset+0x58/0x78
[2.633581]  arm_smmu_device_reset+0x194/0x270
[2.633585]  arm_smmu_device_probe+0xc94/0xeb8
[2.633592]  platform_drv_probe+0x58/0xa8
[2.633597]  really_probe+0xec/0x398
[2.633601]  driver_probe_device+0x5c/0xb8
[2.633606]  __driver_attach_async_helper+0x64/0x88
[2.633610]  async_run_entry_fn+0x4c/0x118
[2.633617]  process_one_work+0x20c/0x4b0
[2.633621]  worker_thread+0x48/0x460
[2.633628]  kthread+0x14c/0x158
[2.633634]  ret_from_fork+0x10/0x18
[2.633642] Code: a9034fa0 d0007f73 29107fa0 91342273 (f9400020)

To avoid this, this patch adds a check on qcom_scm_is_available() in
the qcom_smmu_impl_init() function, returning -EPROBE_DEFER if it's
not ready.

This allows the driver to try to probe again later after qcom_scm has
finished probing.

Cc: Robin Murphy 
Cc: Will Deacon 
Cc: Andy Gross 
Cc: Maulik Shah 
Cc: Bjorn Andersson 
Cc: Saravana Kannan 
Cc: Marc Zyngier 
Cc: Lina Iyer 
Cc: iommu@lists.linux-foundation.org
Cc: linux-arm-msm 
Reported-by: Robin Murphy 
Signed-off-by: John Stultz 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 66ba4870659f4..ef37ccfa82562 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -159,6 +159,10 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
 {
struct qcom_smmu *qsmmu;
 
+   /* Check to make sure qcom_scm has finished probing */
+   if (!qcom_scm_is_available())
+   return ERR_PTR(-EPROBE_DEFER);
+
qsmmu = devm_kzalloc(smmu->dev, sizeof(*qsmmu), GFP_KERNEL);
if (!qsmmu)
return ERR_PTR(-ENOMEM);
-- 
2.17.1
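
Editorial sketch, not part of the patch: the deferral described above can be modelled in user space roughly as follows. Every model_* name is hypothetical, and the retry loop only approximates what the driver core's deferred-probe machinery does. The numeric value 517 is the kernel's EPROBE_DEFER.

```c
#include <stddef.h>

/* The kernel's EPROBE_DEFER is 517; redefined here for a user-space model. */
#define MODEL_EPROBE_DEFER 517

/* Hypothetical stand-in for the state behind qcom_scm_is_available(). */
static int model_scm_ready;

static int model_scm_is_available(void)
{
    return model_scm_ready;
}

/* Dependent driver probe: refuse to continue until the provider is ready. */
static int model_smmu_probe(void)
{
    if (!model_scm_is_available())
        return -MODEL_EPROBE_DEFER;
    return 0;
}

/*
 * Simplified retry loop. The provider becomes ready once the attempt
 * counter reaches provider_ready_at. Returns the attempt number on which
 * the probe stopped deferring, or -1 if it was still deferring after
 * max_attempts.
 */
static int model_probe_with_retry(int max_attempts, int provider_ready_at)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (attempt >= provider_ready_at)
            model_scm_ready = 1;
        if (model_smmu_probe() != -MODEL_EPROBE_DEFER)
            return attempt;
    }
    return -1;
}
```

The point of returning the deferral code instead of dereferencing the not-yet-initialised provider is that the caller can simply try again later, which is what the patch relies on.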



[RESEND][PATCH 2/2] iommu: Avoid crash if iommu_group is null

2020-11-12 Thread John Stultz
In trying to handle a possible driver probe ordering issue
brought up by Robin Murphy, I ran across a separate null pointer
crash in the iommu core in iommu_group_remove_device():
[2.732803] dwc3-qcom a6f8800.usb: failed to get usb-ddr path: -517
[2.739281] Unable to handle kernel NULL pointer dereference at virtual 
address 00c0
...
[2.775619] [00c0] user address but active_mm is swapper
[2.782039] Internal error: Oops: 9605 [#1] PREEMPT SMP
[2.787670] Modules linked in:
[2.790769] CPU: 6 PID: 1 Comm: swapper/0 Tainted: GW 
5.10.0-rc1-mainline-00025-g272a618fc36-dirty #3973
[2.801719] Hardware name: Thundercomm Dragonboard 845c (DT)
[2.807431] pstate: 00c5 (nzcv daif +PAN +UAO -TCO BTYPE=--)
[2.813508] pc : iommu_group_remove_device+0x30/0x1b0
[2.818611] lr : iommu_release_device+0x4c/0x78
[2.823189] sp : ffc01005b950
...
[2.907082] Call trace:
[2.909566]  iommu_group_remove_device+0x30/0x1b0
[2.914323]  iommu_release_device+0x4c/0x78
[2.918559]  iommu_bus_notifier+0xe8/0x108
[2.922708]  blocking_notifier_call_chain+0x78/0xb8
[2.927641]  device_del+0x2ac/0x3d0
[2.931177]  platform_device_del.part.9+0x20/0x98
[2.935933]  platform_device_unregister+0x2c/0x40
[2.940694]  of_platform_device_destroy+0xd8/0xe0
[2.945450]  device_for_each_child_reverse+0x58/0xb0
[2.950471]  of_platform_depopulate+0x4c/0x78
[2.954886]  dwc3_qcom_probe+0x93c/0xcb8
[2.958858]  platform_drv_probe+0x58/0xa8
[2.962917]  really_probe+0xec/0x398
[2.966531]  driver_probe_device+0x5c/0xb8
[2.970677]  device_driver_attach+0x74/0x98
[2.974911]  __driver_attach+0x60/0xe8
[2.978700]  bus_for_each_dev+0x84/0xd8
[2.982581]  driver_attach+0x30/0x40
[2.986194]  bus_add_driver+0x160/0x208
[2.990076]  driver_register+0x64/0x110
[2.993957]  __platform_driver_register+0x58/0x68
[2.998716]  dwc3_qcom_driver_init+0x20/0x28
[3.003041]  do_one_initcall+0x6c/0x2d0
[3.006925]  kernel_init_freeable+0x214/0x268
[3.011339]  kernel_init+0x18/0x118
[3.014876]  ret_from_fork+0x10/0x18
[3.018495] Code: d0006a21 f9417295 91130021 910162b6 (b940c2a2)

In the case above, the arm-smmu driver fails to probe with
EPROBE_DEFER, and I'm guessing that causes
iommu_group_add_device() to fail and sets
dev->iommu_group = NULL, then somehow we hit
iommu_group_remove_device() and trip over the null value?
I'm not really sure...

Anyway, adding the null check seems to avoid the issue and the
system boots fine after the arm-smmu driver later reprobed.

Feedback or better ideas for a solution would be appreciated!

Cc: Robin Murphy 
Cc: Will Deacon 
Cc: Andy Gross 
Cc: Maulik Shah 
Cc: Bjorn Andersson 
Cc: Saravana Kannan 
Cc: Marc Zyngier 
Cc: Lina Iyer 
Cc: iommu@lists.linux-foundation.org
Cc: linux-arm-msm 
Signed-off-by: John Stultz 
---
 drivers/iommu/iommu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index b53446bb8c6b4..28229f7ef7d5a 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -877,6 +877,10 @@ void iommu_group_remove_device(struct device *dev)
struct iommu_group *group = dev->iommu_group;
struct group_device *tmp_device, *device = NULL;
 
+   /* Avoid crash if iommu_group value is null */
+   if (!group)
+   return;
+
dev_info(dev, "Removing from iommu group %d\n", group->id);
 
/* Pre-notify listeners that a device is being removed. */
-- 
2.17.1
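
Editorial sketch, not part of the patch: a minimal user-space model of the guard, assuming simplified stand-in structs (the real struct iommu_group and struct device look nothing like this) and a hypothetical model_group_remove_device() helper.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct iommu_group {
    int id;
    int device_count;
};

struct device {
    struct iommu_group *iommu_group;
};

/*
 * Returns 1 if the device was detached from a group, 0 if it had no
 * group. Without the early return, the group->device_count access below
 * would dereference NULL, which is the shape of the reported crash.
 */
static int model_group_remove_device(struct device *dev)
{
    struct iommu_group *group = dev->iommu_group;

    /* Avoid a crash if the device never joined a group. */
    if (!group)
        return 0;

    group->device_count--;
    dev->iommu_group = NULL;
    return 1;
}
```

Calling the helper twice on the same device is harmless in this model: the second call sees a NULL group and returns early.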



Re: iommu/vt-d: Cure VF irqdomain hickup

2020-11-12 Thread Thomas Gleixner
On Thu, Nov 12 2020 at 20:15, Thomas Gleixner wrote:
> The recent changes to store the MSI irqdomain pointer in struct device
> missed that Intel DMAR does not register virtual function devices.  Due to
> that a VF device gets the plain PCI-MSI domain assigned and then issues
> compat MSI messages which get caught by the interrupt remapping unit.
>
> Cure that by inheriting the irq domain from the physical function
> device.
>
> That's a temporary workaround. The correct fix is to inherit the irq domain
> from the bus, but that's a larger effort which needs quite some other
> changes to the way how x86 manages PCI and MSI domains.

Bah, that's not really going to work with the way how irq remapping
works on x86 because at least Intel/DMAR can have more than one DMAR
unit on a bus.

So the alternative solution would be to assign the domain per device,
but the current ordering creates a chicken-and-egg problem. Looking the
domain up in pci_set_msi_domain() does not work because at that point
the device is not registered in the IOMMU. That happens from
device_add().

Marc, is there any problem to reorder the calls in pci_device_add():

  device_add();
  pci_set_msi_domain();

That would allow adding an irq_find_matching_fwspec() based lookup to
pci_msi_get_device_domain().

Though I'm not yet convinced that the outcome would be less horrible
than the hack in the DMAR driver when I'm taking all the other horrors
of x86 (including XEN) into account :)

Thanks,

tglx
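
Editorial sketch, not part of the mail: the ordering problem described above can be modelled in user space as below. All model_* names and the enum are hypothetical. The only assumption carried over from the text is that the remapping domain can be found only after the device has been registered with the IOMMU, which happens from device_add().

```c
/* Hypothetical domain kinds for the model. */
enum model_domain {
    MODEL_DOMAIN_NONE,
    MODEL_DOMAIN_PLAIN_PCI_MSI,
    MODEL_DOMAIN_REMAPPED
};

struct model_dev {
    int registered_with_iommu;
    enum model_domain msi_domain;
};

/* Stands in for device_add(): the IOMMU learns about the device here. */
static void model_device_add(struct model_dev *d)
{
    d->registered_with_iommu = 1;
}

/*
 * Stands in for pci_set_msi_domain() with a per-device lookup: the
 * remapping domain is only found if the IOMMU already knows the device.
 */
static void model_set_msi_domain(struct model_dev *d)
{
    d->msi_domain = d->registered_with_iommu ? MODEL_DOMAIN_REMAPPED
                                             : MODEL_DOMAIN_PLAIN_PCI_MSI;
}

/* Current order: domain lookup first, registration second. */
static enum model_domain model_add_current_order(struct model_dev *d)
{
    model_set_msi_domain(d);
    model_device_add(d);
    return d->msi_domain;
}

/* Proposed order: registration first, domain lookup second. */
static enum model_domain model_add_reordered(struct model_dev *d)
{
    model_device_add(d);
    model_set_msi_domain(d);
    return d->msi_domain;
}
```

In this model only the reordered sequence ends up with the remapped domain, which is why the question to Marc is about swapping the two calls.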


Re: [PATCH v6 3/3] firmware: QCOM_SCM: Allow qcom_scm driver to be loadable as a permenent module

2020-11-12 Thread John Stultz
On Thu, Nov 12, 2020 at 9:37 AM Will Deacon  wrote:
> On Tue, Nov 10, 2020 at 10:51:46AM -0800, John Stultz wrote:
> > On Tue, Nov 10, 2020 at 5:35 AM Linus Walleij  
> > wrote:
> > > On Fri, Nov 6, 2020 at 5:27 AM John Stultz  wrote:
> > >
> > > > Allow the qcom_scm driver to be loadable as a permenent module.
> > > >
> > ...
> > > I applied this patch to the pinctrl tree as well, I suppose
> > > that was the intention. If someone gets upset I can always
> > > pull it out.
> >
> > Will: You ok with this?
>
> We didn't come up with something better, so I can live with it.

Ok, thanks!

> Not sure
> about the other issues that were reported by Robin though -- your RFC for
> fixing those looked a bit more controversial ;)

Huh, I hadn't heard anything back on that series and was going to
resend it. Do let me know if you have more thoughts on that one.

thanks
-john


iommu/vt-d: Cure VF irqdomain hickup

2020-11-12 Thread Thomas Gleixner
The recent changes to store the MSI irqdomain pointer in struct device
missed that Intel DMAR does not register virtual function devices.  Due to
that a VF device gets the plain PCI-MSI domain assigned and then issues
compat MSI messages which get caught by the interrupt remapping unit.

Cure that by inheriting the irq domain from the physical function
device.

That's a temporary workaround. The correct fix is to inherit the irq domain
from the bus, but that's a larger effort which needs quite some other
changes to the way how x86 manages PCI and MSI domains.

Fixes: 85a8dfc57a0b ("iommm/vt-d: Store irq domain in struct device")
Reported-by: Jason Gunthorpe 
Signed-off-by: Thomas Gleixner 
---
 drivers/iommu/intel/dmar.c |   19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -333,6 +333,11 @@ static void  dmar_pci_bus_del_dev(struct
dmar_iommu_notify_scope_dev(info);
 }
 
+static inline void vf_inherit_msi_domain(struct pci_dev *pdev)
+{
+   dev_set_msi_domain(&pdev->dev, dev_get_msi_domain(&pdev->physfn->dev));
+}
+
 static int dmar_pci_bus_notifier(struct notifier_block *nb,
 unsigned long action, void *data)
 {
@@ -342,8 +347,20 @@ static int dmar_pci_bus_notifier(struct
/* Only care about add/remove events for physical functions.
 * For VFs we actually do the lookup based on the corresponding
 * PF in device_to_iommu() anyway. */
-   if (pdev->is_virtfn)
+   if (pdev->is_virtfn) {
+   /*
+* Note: This is a horrible hack and needs to be cleaned
+* up by assigning the domain to the bus, but that's too
+* big of a change for post rc3.
+*
+* Ensure that the VF device inherits the irq domain of the
+* PF device:
+*/
+   if (action == BUS_NOTIFY_ADD_DEVICE)
+   vf_inherit_msi_domain(pdev);
return NOTIFY_DONE;
+   }
+
if (action != BUS_NOTIFY_ADD_DEVICE &&
action != BUS_NOTIFY_REMOVED_DEVICE)
return NOTIFY_DONE;
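
Editorial sketch, not part of the patch: a user-space model of the inheritance the patch performs. The struct layouts, the MODEL_* action values and model_bus_notifier() are hypothetical stand-ins; only the body of vf_inherit_msi_domain() mirrors the patch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types, reduced to what is needed. */
struct irq_domain {
    const char *name;
};

struct device {
    struct irq_domain *msi_domain;
};

struct pci_dev {
    struct device dev;
    int is_virtfn;           /* non-zero for an SR-IOV virtual function */
    struct pci_dev *physfn;  /* owning physical function, set for VFs */
};

/* Illustrative action codes; not the kernel's values. */
enum { MODEL_BUS_NOTIFY_ADD_DEVICE = 1, MODEL_BUS_NOTIFY_OTHER = 2 };

static struct irq_domain *dev_get_msi_domain(const struct device *dev)
{
    return dev->msi_domain;
}

static void dev_set_msi_domain(struct device *dev, struct irq_domain *d)
{
    dev->msi_domain = d;
}

/* Same body as the helper in the patch: copy the PF's domain to the VF. */
static void vf_inherit_msi_domain(struct pci_dev *pdev)
{
    dev_set_msi_domain(&pdev->dev, dev_get_msi_domain(&pdev->physfn->dev));
}

/* Returns 1 if the event was consumed by the VF path, 0 otherwise. */
static int model_bus_notifier(struct pci_dev *pdev, int action)
{
    if (pdev->is_virtfn) {
        if (action == MODEL_BUS_NOTIFY_ADD_DEVICE)
            vf_inherit_msi_domain(pdev);
        return 1;
    }
    return 0;
}
```

A VF that starts with the plain domain ends up pointing at whatever domain its PF has once the add notification is seen; other notifications leave it alone.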


Re: remove dma_virt_ops v2

2020-11-12 Thread Jason Gunthorpe
On Thu, Nov 12, 2020 at 06:09:56PM +0100, Christoph Hellwig wrote:
> On Thu, Nov 12, 2020 at 12:59:35PM -0400, Jason Gunthorpe wrote:
> >  RMDA/sw: Don't allow drivers using dma_virt_ops on highmem configs
> 
> I think this one actually is something needed in 5.10 and -stable.

Done, I added a

Fixes: 551199aca1c3 ("lib/dma-virt: Add dma_virt_ops")

Jason


Re: [PATCH v6 3/3] firmware: QCOM_SCM: Allow qcom_scm driver to be loadable as a permenent module

2020-11-12 Thread Will Deacon
On Tue, Nov 10, 2020 at 10:51:46AM -0800, John Stultz wrote:
> On Tue, Nov 10, 2020 at 5:35 AM Linus Walleij  
> wrote:
> > On Fri, Nov 6, 2020 at 5:27 AM John Stultz  wrote:
> >
> > > Allow the qcom_scm driver to be loadable as a permenent module.
> > >
> ...
> > I applied this patch to the pinctrl tree as well, I suppose
> > that was the intention. If someone gets upset I can always
> > pull it out.
> 
> Will: You ok with this?

We didn't come up with something better, so I can live with it. Not sure
about the other issues that were reported by Robin though -- your RFC for
fixing those looked a bit more controversial ;)

Will


Re: remove dma_virt_ops v2

2020-11-12 Thread santosh . shilimkar

+ Ka-Cheong

On 11/12/20 5:23 AM, Jason Gunthorpe wrote:

On Thu, Nov 12, 2020 at 10:40:30AM +0100, Christoph Hellwig wrote:

ping?

On Fri, Nov 06, 2020 at 07:19:31PM +0100, Christoph Hellwig wrote:

Hi Jason,

this series switches the RDMA core to opencode the special case of
devices bypassing the DMA mapping in the RDMA ULPs.  The virt ops
have caused a bit of trouble due to the P2P code not working with
them due to the fact that we'd do two dma mapping iterations for a
single I/O, but also are a bit of layering violation and lead to
more code than necessary.

Tested with nvme-rdma over rxe.

Note that the rds changes are untested, as I could not find any
simple rds test setup.

Changes since v2:
  - simplify the INFINIBAND_VIRT_DMA dependencies
  - add a ib_uses_virt_dma helper
  - use ib_uses_virt_dma in nvmet-rdma to disable p2p for virt_dma devices
  - use ib_dma_max_seg_size in umem
  - stop using dmapool in rds

Changes since v1:
  - disable software RDMA drivers for highmem configs
  - update the PCI commit logs


Santosh, can you please check the RDS parts?



Hi Ka-Cheong,

Can you please check Christoph's change [1], which replaces the
dma-pool API with ib_dma_* and the slab allocator? The dma-pool usage
was added as part of your "net/rds: Use DMA memory pool allocation for
rds_header" commit.


Regards,
Santosh

[1] https://www.spinics.net/lists/linux-pci/msg101547.html


Re: remove dma_virt_ops v2

2020-11-12 Thread Christoph Hellwig
On Thu, Nov 12, 2020 at 12:59:35PM -0400, Jason Gunthorpe wrote:
>  RMDA/sw: Don't allow drivers using dma_virt_ops on highmem configs

I think this one actually is something needed in 5.10 and -stable.


Re: remove dma_virt_ops v2

2020-11-12 Thread Jason Gunthorpe
On Fri, Nov 06, 2020 at 07:19:31PM +0100, Christoph Hellwig wrote:
> Hi Jason,
> 
> this series switches the RDMA core to opencode the special case of
> devices bypassing the DMA mapping in the RDMA ULPs.  The virt ops
> have caused a bit of trouble due to the P2P code not working with
> them due to the fact that we'd do two dma mapping iterations for a
> single I/O, but also are a bit of layering violation and lead to
> more code than necessary.
> 
> Tested with nvme-rdma over rxe.
> 
> Note that the rds changes are untested, as I could not find any
> simple rds test setup.
> 
> Changes since v2:
>  - simplify the INFINIBAND_VIRT_DMA dependencies
>  - add a ib_uses_virt_dma helper
>  - use ib_uses_virt_dma in nvmet-rdma to disable p2p for virt_dma devices
>  - use ib_dma_max_seg_size in umem
>  - stop using dmapool in rds
> 
> Changes since v1:
>  - disable software RDMA drivers for highmem configs
>  - update the PCI commit logs

Let's give Santosh a little longer for RDS, I've grabbed the precursor
parts to for-next for now:

 nvme-rdma: Use ibdev_to_node instead of dereferencing ->dma_device
 RDMA: Lift ibdev_to_node from rds to common code
 RDMA/core: Remove ib_dma_{alloc,free}_coherent
 RDMA/umem: Use ib_dma_max_seg_size instead of dma_get_max_seg_size
 RMDA/sw: Don't allow drivers using dma_virt_ops on highmem configs

Will get the rest next week regardless.

Thanks,
Jason


Re: [PATCH v6 1/7] arm64: mm: Move reserve_crashkernel() into mem_init()

2020-11-12 Thread Nicolas Saenz Julienne
Hi Catalin,

On Tue, 2020-11-10 at 18:17 +, Catalin Marinas wrote:
> On Fri, Nov 06, 2020 at 07:46:29PM +0100, Nicolas Saenz Julienne wrote:
> > On Thu, 2020-11-05 at 16:11 +, James Morse wrote:
> > > On 03/11/2020 17:31, Nicolas Saenz Julienne wrote:
> > > > crashkernel might reserve memory located in ZONE_DMA. We plan to delay
> > > > ZONE_DMA's initialization after unflattening the devicetree and ACPI's
> > > > boot table initialization, so move it later in the boot process.
> > > > Specifically into mem_init(), this is the last place crashkernel will be
> > > > able to reserve the memory before the page allocator kicks in. There
> > > > isn't any apparent reason for doing this earlier.
> > > 
> > > It's so that map_mem() can carve it out of the linear/direct map.
> > > This is so that stray writes from a crashing kernel can't accidentally
> > > corrupt the kdump kernel. We depend on this if we continue with kdump,
> > > but failed to offline all the other CPUs.
> > 
> > I presume here you refer to arch_kexec_protect_crashkres(), IIUC this will
> > only happen further down the line, after having loaded the kdump kernel
> > image. But it also depends on the mappings to be PAGE sized (flags ==
> > NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS).
> 
> IIUC, arch_kexec_protect_crashkres() is only for the crashkernel image,
> not the whole reserved memory that the crashkernel will use. For the
> latter, we avoid the linear map by marking it as nomap in map_mem().

I'm not sure we're on the same page here, so sorry if this was already implied.

The crashkernel memory mapping is bypassed while preparing the linear mappings
but it is then mapped right away, with page granularity and !MTE.
See paging_init()->map_mem():

/*
 * Use page-level mappings here so that we can shrink the region
 * in page granularity and put back unused memory to buddy system
 * through /sys/kernel/kexec_crash_size interface.
 */
if (crashk_res.end) {
__map_memblock(pgdp, crashk_res.start, crashk_res.end + 1,
   PAGE_KERNEL,
   NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);
memblock_clear_nomap(crashk_res.start,
 resource_size(&crashk_res));
}

IIUC the inconvenience here is that we need special mapping options for
crashkernel and updating those after having mapped that memory as regular
memory isn't possible/easy to do.

> > > We also depend on this when skipping the checksum code in purgatory,
> > > which can be exceedingly slow.
> > 
> > This one I don't fully understand, so I'll lazily assume the prerequisite is
> > the same WRT how memory is mapped. :)
> > 
> > Ultimately there's also /sys/kernel/kexec_crash_size's handling. Same
> > prerequisite.
> > 
> > Keeping in mind acpi_table_upgrade() and unflatten_device_tree() depend on
> > having the linear mappings available.
> 
> So it looks like reserve_crashkernel() wants to reserve memory before
> setting up the linear map with the information about the DMA zones in
> place but that comes later when we can parse the firmware tables.
> 
> I wonder, instead of not mapping the crashkernel reservation, can we not
> do an arch_kexec_protect_crashkres() for the whole reservation after we
> created the linear map?

arch_kexec_protect_crashkres() depends on __change_memory_common() which
ultimately depends on the memory to be mapped with PAGE_SIZE pages. As I
commented above, the trick would work as long as there is a way to update the
linear mappings with whatever crashkernel needs later in the boot process.

> > Let me stress that knowing the DMA constraints in the system before
> > reserving crashkernel's regions is necessary if we ever want it to work
> > seamlessly on all platforms. Be it small stuff like the Raspberry Pi or
> > huge servers with TB of memory.
> 
> Indeed. So we have 3 options (so far):
> 
> 1. Allow the crashkernel reservation to go into the linear map but set
>it to invalid once allocated.
> 
> 2. Parse the flattened DT (not sure what we do with ACPI) before
>creating the linear map. We may have to rely on some SoC ID here
>instead of actual DMA ranges.
> 
> 3. Assume the smallest ZONE_DMA possible on arm64 (1GB) for crashkernel
>reservations and not rely on arm64_dma_phys_limit in
>reserve_crashkernel().
> 
> I think (2) we tried hard to avoid. Option (3) brings us back to the
> issues we had on large crashkernel reservations regressing on some
> platforms (though it's been a while since, they mostly went quiet ;)).
> However, with Chen's crashkernel patches we end up with two
> reservations, one in the low DMA zone and one higher, potentially above
> 4GB. Having a fixed 1GB limit wouldn't be any worse for crashkernel
> reservations than what we have now.
> 
> If (1) works, I'd go for it (James knows this part better than me),
> 

Re: [PATCH v2 04/17] iommu/hyperv: don't setup IRQ remapping when running as root

2020-11-12 Thread Vitaly Kuznetsov
Wei Liu  writes:

> The IOMMU code needs more work. We're sure for now the IRQ remapping
> hooks are not applicable when Linux is the root.

Super-nitpick: I would suggest we always say 'root partition' as 'root'
has a 'slightly different' meaning in Linux and this commit message may
sound confusing to an unprepared reader.

>
> Signed-off-by: Wei Liu 
> Acked-by: Joerg Roedel 
> ---
>  drivers/iommu/hyperv-iommu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index e09e2d734c57..8d3ce3add57d 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "irq_remapping.h"
>  
> @@ -143,7 +144,7 @@ static int __init hyperv_prepare_irq_remapping(void)
>   int i;
>  
>   if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
> - !x2apic_supported())
> + !x2apic_supported() || hv_root_partition)
>   return -ENODEV;
>  
>   fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);

Reviewed-by: Vitaly Kuznetsov 

-- 
Vitaly



Re: REGRESSION: Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI

2020-11-12 Thread Thomas Gleixner
On Thu, Nov 12 2020 at 15:15, Thomas Gleixner wrote:
> On Thu, Nov 12 2020 at 08:55, Jason Gunthorpe wrote:
>> On Wed, Aug 26, 2020 at 01:16:28PM +0200, Thomas Gleixner wrote:
>> They were unable to bisect further into the series because some of the
>> interior commits don't boot :(
>>
>> When we try to load the mlx5 driver on a bare metal VF it gets this:
>>
>> [Thu Oct 22 08:54:51 2020] DMAR: DRHD: handling fault status reg 2
>> [Thu Oct 22 08:54:51 2020] DMAR: [INTR-REMAP] Request device [42:00.2] fault index 1600 [fault reason 37] Blocked a compatibility format interrupt request
>> [Thu Oct 22 08:55:04 2020] mlx5_core :42:00.1 eth4: Link down
>> [Thu Oct 22 08:55:11 2020] mlx5_core :42:00.1 eth4: Link up
>> [Thu Oct 22 08:55:54 2020] mlx5_core :42:00.2: mlx5_cmd_eq_recover:264:(pid 3390): Recovered 1 EQEs on cmd_eq
>> [Thu Oct 22 08:55:54 2020] mlx5_core :42:00.2: wait_func_handle_exec_timeout:1051:(pid 3390): cmd0: CREATE_EQ(0x301) recovered after timeout
>> [Thu Oct 22 08:55:54 2020] DMAR: DRHD: handling fault status reg 102
>> [Thu Oct 22 08:55:54 2020] DMAR: [INTR-REMAP] Request device [42:00.2] fault index 1600 [fault reason 37] Blocked a compatibility format interrupt request
>>
>> If you have any idea Ziyad and Itay can run any debugging you like.
>>
>> I suppose it is because this series is handing out compatibility
>> addr/data pairs while the IOMMU is set up to only accept remap ones
>> from SRIOV VFs?
>
> So the issue seems to be that the VF device has the default irq domain
> assigned and not the remapping domain. Let me stare into the code to see
> how these VF devices are set up and registered with the IOMMU/remap
> unit.

Found the reason. Will fix it after walking the dogs. Brain needs some
fresh air.

Thanks,

tglx

Re: REGRESSION: Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI

2020-11-12 Thread Thomas Gleixner
Jason,

(trimmed CC list a bit)

On Thu, Nov 12 2020 at 08:55, Jason Gunthorpe wrote:
> On Wed, Aug 26, 2020 at 01:16:28PM +0200, Thomas Gleixner wrote:
> They were unable to bisect further into the series because some of the
> interior commits don't boot :(
>
> When we try to load the mlx5 driver on a bare metal VF it gets this:
>
> [Thu Oct 22 08:54:51 2020] DMAR: DRHD: handling fault status reg 2
> [Thu Oct 22 08:54:51 2020] DMAR: [INTR-REMAP] Request device [42:00.2] fault index 1600 [fault reason 37] Blocked a compatibility format interrupt request
> [Thu Oct 22 08:55:04 2020] mlx5_core :42:00.1 eth4: Link down
> [Thu Oct 22 08:55:11 2020] mlx5_core :42:00.1 eth4: Link up
> [Thu Oct 22 08:55:54 2020] mlx5_core :42:00.2: mlx5_cmd_eq_recover:264:(pid 3390): Recovered 1 EQEs on cmd_eq
> [Thu Oct 22 08:55:54 2020] mlx5_core :42:00.2: wait_func_handle_exec_timeout:1051:(pid 3390): cmd0: CREATE_EQ(0x301) recovered after timeout
> [Thu Oct 22 08:55:54 2020] DMAR: DRHD: handling fault status reg 102
> [Thu Oct 22 08:55:54 2020] DMAR: [INTR-REMAP] Request device [42:00.2] fault index 1600 [fault reason 37] Blocked a compatibility format interrupt request
>
> If you have any idea Ziyad and Itay can run any debugging you like.
>
> I suppose it is because this series is handing out compatibility
> addr/data pairs while the IOMMU is set up to only accept remap ones
> from SRIOV VFs?

So the issue seems to be that the VF device has the default irq domain
assigned and not the remapping domain. Let me stare into the code to see
how these VF devices are set up and registered with the IOMMU/remap
unit.

Thanks,

tglx


Re: remove dma_virt_ops v2

2020-11-12 Thread Jason Gunthorpe
On Thu, Nov 12, 2020 at 10:40:30AM +0100, Christoph Hellwig wrote:
> ping?
> 
> On Fri, Nov 06, 2020 at 07:19:31PM +0100, Christoph Hellwig wrote:
> > Hi Jason,
> > 
> > this series switches the RDMA core to opencode the special case of
> > devices bypassing the DMA mapping in the RDMA ULPs.  The virt ops
> > have caused a bit of trouble due to the P2P code not working with
> > them due to the fact that we'd do two dma mapping iterations for a
> > single I/O, but also are a bit of layering violation and lead to
> > more code than necessary.
> > 
> > Tested with nvme-rdma over rxe.
> > 
> > Note that the rds changes are untested, as I could not find any
> > simple rds test setup.
> > 
> > Changes since v2:
> >  - simplify the INFINIBAND_VIRT_DMA dependencies
> >  - add a ib_uses_virt_dma helper
> >  - use ib_uses_virt_dma in nvmet-rdma to disable p2p for virt_dma devices
> >  - use ib_dma_max_seg_size in umem
> >  - stop using dmapool in rds
> > 
> > Changes since v1:
> >  - disable software RDMA drivers for highmem configs
> >  - update the PCI commit logs

Santosh, can you please check the RDS parts?

Jason


[PATCH v8 1/9] iommu: Add a page fault handler

2020-11-12 Thread Jean-Philippe Brucker
Some systems allow devices to handle I/O Page Faults in the core mm. For
example systems implementing the PCIe PRI extension or Arm SMMU stall
model. Infrastructure for reporting these recoverable page faults was
added to the IOMMU core by commit 0c830e6b3282 ("iommu: Introduce device
fault report API"). Add a page fault handler for host SVA.

IOMMU drivers can now instantiate several fault workqueues and link them
to IOPF-capable devices. Drivers can choose between a single global
workqueue, one per IOMMU device, one per low-level fault queue, one per
domain, etc.

When it receives a fault event, supposedly in an IRQ handler, the IOMMU
driver reports the fault using iommu_report_device_fault(), which calls
the registered handler. The page fault handler then calls the mm fault
handler, and reports either success or failure with iommu_page_response().
When the handler succeeds, the IOMMU retries the access.

The iopf_param pointer could be embedded into iommu_fault_param. But
putting iopf_param into the iommu_param structure allows us not to care
about ordering between calls to iopf_queue_add_device() and
iommu_register_device_fault_handler().
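
The report/handle/respond flow described above can be modelled in plain
userspace C. This is only a rough sketch under stated assumptions: the
names fault_event, toy_mm, mm_handle_fault() and report_fault() are
hypothetical stand-ins and not the kernel API, and the "mm" is reduced to
a single mapped range per PASID.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical response codes: either the access can be retried or not. */
enum page_resp { RESP_SUCCESS, RESP_INVALID };

/* Hypothetical recoverable fault, as an IOMMU driver might report it. */
struct fault_event {
	uint32_t pasid;  /* address space the device was using */
	uint64_t addr;   /* faulting I/O virtual address */
	bool     write;  /* true for a write access */
};

/* Toy stand-in for an mm: one mapped range [start, end) per PASID. */
struct toy_mm {
	uint32_t pasid;
	uint64_t start, end;
	bool     writable;
};

/* Stand-in for the mm fault handler: can this access be made to succeed? */
static bool mm_handle_fault(const struct toy_mm *mm,
			    const struct fault_event *evt)
{
	if (evt->addr < mm->start || evt->addr >= mm->end)
		return false;
	if (evt->write && !mm->writable)
		return false;
	return true;
}

/*
 * Stand-in for the registered fault handler: look up the mm bound to the
 * PASID, try to resolve the fault, and return the response that would be
 * sent back so the IOMMU can retry or fail the access.
 */
static enum page_resp report_fault(const struct toy_mm *mms, size_t n,
				   const struct fault_event *evt)
{
	for (size_t i = 0; i < n; i++) {
		if (mms[i].pasid != evt->pasid)
			continue;
		return mm_handle_fault(&mms[i], evt) ? RESP_SUCCESS
						     : RESP_INVALID;
	}
	return RESP_INVALID; /* no address space bound to this PASID */
}
```

In this model a read inside the mapped range yields RESP_SUCCESS, while a
write to a read-only range, an out-of-range address, or an unknown PASID
yields RESP_INVALID.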

Signed-off-by: Jean-Philippe Brucker 
---
v8:
* Re-use CONFIG_IOMMU_SVA_LIB and move definitions to iommu-sva-lib.h,
  since this is an API internal to IOMMU drivers.
* Fix typos.
---
 drivers/iommu/Makefile        |   1 +
 drivers/iommu/iommu-sva-lib.h |  53 
 include/linux/iommu.h         |   2 +
 drivers/iommu/io-pgfault.c    | 462 ++
 4 files changed, 518 insertions(+)
 create mode 100644 drivers/iommu/io-pgfault.c

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 61bd30cd8369..60fafc23dee6 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -28,3 +28,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
 obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
 obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
 obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o
+obj-$(CONFIG_IOMMU_SVA_LIB) += io-pgfault.o
diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
index b40990aef3fd..031155010ca8 100644
--- a/drivers/iommu/iommu-sva-lib.h
+++ b/drivers/iommu/iommu-sva-lib.h
@@ -12,4 +12,57 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max);
 void iommu_sva_free_pasid(struct mm_struct *mm);
 struct mm_struct *iommu_sva_find(ioasid_t pasid);
 
+/* I/O Page fault */
+struct device;
+struct iommu_fault;
+struct iopf_queue;
+
+#ifdef CONFIG_IOMMU_SVA_LIB
+int iommu_queue_iopf(struct iommu_fault *fault, void *cookie);
+
+int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev);
+int iopf_queue_remove_device(struct iopf_queue *queue,
+struct device *dev);
+int iopf_queue_flush_dev(struct device *dev);
+struct iopf_queue *iopf_queue_alloc(const char *name);
+void iopf_queue_free(struct iopf_queue *queue);
+int iopf_queue_discard_partial(struct iopf_queue *queue);
+
+#else /* CONFIG_IOMMU_SVA_LIB */
+static inline int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
+{
+   return -ENODEV;
+}
+
+static inline int iopf_queue_add_device(struct iopf_queue *queue,
+   struct device *dev)
+{
+   return -ENODEV;
+}
+
+static inline int iopf_queue_remove_device(struct iopf_queue *queue,
+  struct device *dev)
+{
+   return -ENODEV;
+}
+
+static inline int iopf_queue_flush_dev(struct device *dev)
+{
+   return -ENODEV;
+}
+
+static inline struct iopf_queue *iopf_queue_alloc(const char *name)
+{
+   return NULL;
+}
+
+static inline void iopf_queue_free(struct iopf_queue *queue)
+{
+}
+
+static inline int iopf_queue_discard_partial(struct iopf_queue *queue)
+{
+   return -ENODEV;
+}
+#endif /* CONFIG_IOMMU_SVA_LIB */
 #endif /* _IOMMU_SVA_LIB_H */
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 047480a19997..a1c78c4cdeb1 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -350,6 +350,7 @@ struct iommu_fault_param {
  * struct dev_iommu - Collection of per-device IOMMU data
  *
  * @fault_param: IOMMU detected device fault reporting data
+ * @iopf_param: I/O Page Fault queue and data
  * @fwspec: IOMMU fwspec data
  * @iommu_dev:  IOMMU device this device is linked to
  * @priv:   IOMMU Driver private data
@@ -360,6 +361,7 @@ struct iommu_fault_param {
 struct dev_iommu {
struct mutex lock;
struct iommu_fault_param*fault_param;
+   struct iopf_device_param*iopf_param;
struct iommu_fwspec *fwspec;
struct iommu_device *iommu_dev;
void*priv;
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
new file mode 100644
index ..fc1d5d29ac37
--- /dev/null
+++ b/drivers/iommu/io-pgfault.c
@@ -0,0 +1,462 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Handle device page faults

[PATCH v8 9/9] iommu/arm-smmu-v3: Add support for PRI

2020-11-12 Thread Jean-Philippe Brucker
For PCI devices that support it, enable the PRI capability and handle
PRI Page Requests with the generic fault handler. It is enabled on
demand by iommu_sva_device_init().

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  20 +-
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  28 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 275 +++---
 3 files changed, 272 insertions(+), 51 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 124f604ed677..7c2d31133148 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -230,6 +230,7 @@
 #define STRTAB_STE_1_S1COR GENMASK_ULL(5, 4)
 #define STRTAB_STE_1_S1CSH GENMASK_ULL(7, 6)
 
+#define STRTAB_STE_1_PPAR  (1UL << 18)
 #define STRTAB_STE_1_S1STALLD  (1UL << 27)
 
 #define STRTAB_STE_1_EATS  GENMASK_ULL(29, 28)
@@ -360,6 +361,9 @@
 #define CMDQ_PRI_0_SID GENMASK_ULL(63, 32)
 #define CMDQ_PRI_1_GRPID   GENMASK_ULL(8, 0)
 #define CMDQ_PRI_1_RESP    GENMASK_ULL(13, 12)
+#define CMDQ_PRI_1_RESP_FAILURE    0UL
+#define CMDQ_PRI_1_RESP_INVALID    1UL
+#define CMDQ_PRI_1_RESP_SUCCESS    2UL
 
 #define CMDQ_RESUME_0_SID  GENMASK_ULL(63, 32)
 #define CMDQ_RESUME_0_RESP_TERM    0UL
@@ -427,12 +431,6 @@
 #define MSI_IOVA_BASE  0x800
 #define MSI_IOVA_LENGTH    0x10
 
-enum pri_resp {
-   PRI_RESP_DENY = 0,
-   PRI_RESP_FAIL = 1,
-   PRI_RESP_SUCC = 2,
-};
-
 struct arm_smmu_cmdq_ent {
/* Common fields */
u8  opcode;
@@ -494,7 +492,7 @@ struct arm_smmu_cmdq_ent {
u32 sid;
u32 ssid;
u16 grpid;
-   enum pri_resp   resp;
+   u8  resp;
} pri;
 
#define CMDQ_OP_RESUME  0x44
@@ -568,6 +566,9 @@ struct arm_smmu_evtq {
 
 struct arm_smmu_priq {
struct arm_smmu_queue   q;
+   struct iopf_queue   *iopf;
+   u64 batch;
+   wait_queue_head_t   wq;
 };
 
 /* High-level stream table and context descriptor structures */
@@ -703,6 +704,8 @@ struct arm_smmu_master {
    unsigned int    num_streams;
    bool            ats_enabled;
    bool            stall_enabled;
+   bool            pri_supported;
+   bool            prg_resp_needs_ssid;
    bool            sva_enabled;
    struct list_head bonds;
    unsigned int    ssid_bits;
@@ -754,6 +757,9 @@ void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, 
u16 asid);
 bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd);
 int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
unsigned long iova, size_t size);
+int arm_smmu_enable_pri(struct arm_smmu_master *master);
+void arm_smmu_disable_pri(struct arm_smmu_master *master);
+int arm_smmu_flush_priq(struct arm_smmu_device *smmu);
 
 #ifdef CONFIG_ARM_SMMU_V3_SVA
 bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 64e2082ef9ed..1fdfd40f70fd 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -370,6 +370,19 @@ arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
 void arm_smmu_sva_unbind(struct iommu_sva *handle)
 {
struct arm_smmu_bond *bond = sva_to_bond(handle);
+   struct arm_smmu_master *master = dev_iommu_priv_get(handle->dev);
+
+   /*
+* For stall, the event queue does not need to be flushed since the
+* device driver ensured all transactions are complete. For PRI however,
+* although the device driver has stopped all DMA for this PASID, it may
+* have left Page Requests in flight (if using the Stop Marker Message
+* to stop PASID). Complete them.
+*/
+   if (master->pri_supported) {
+   arm_smmu_flush_priq(master->smmu);
+   iopf_queue_flush_dev(handle->dev);
+   }
 
 mutex_lock(&sva_lock);
 if (refcount_dec_and_test(&bond->refs)) {
@@ -435,7 +448,7 @@ bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
 
 static bool arm_smmu_iopf_supported(struct arm_smmu_master *master)
 {
-   return master->stall_enabled;
+   return master->stall_enabled || master->pri_supported;
 }
 
 bool 

[PATCH v8 3/9] dt-bindings: document stall property for IOMMU masters

2020-11-12 Thread Jean-Philippe Brucker
On ARM systems, some platform devices behind an IOMMU may support stall,
which is the ability to recover from page faults. Let the firmware tell us
when a device supports stall.

Reviewed-by: Rob Herring 
Signed-off-by: Jean-Philippe Brucker 
---
 .../devicetree/bindings/iommu/iommu.txt| 18 ++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/iommu.txt 
b/Documentation/devicetree/bindings/iommu/iommu.txt
index 3c36334e4f94..26ba9e530f13 100644
--- a/Documentation/devicetree/bindings/iommu/iommu.txt
+++ b/Documentation/devicetree/bindings/iommu/iommu.txt
@@ -92,6 +92,24 @@ Optional properties:
   tagging DMA transactions with an address space identifier. By default,
   this is 0, which means that the device only has one address space.
 
+- dma-can-stall: When present, the master can wait for a transaction to
+  complete for an indefinite amount of time. Upon translation fault some
+  IOMMUs, instead of aborting the translation immediately, may first
+  notify the driver and keep the transaction in flight. This allows the OS
+  to inspect the fault and, for example, make physical pages resident
+  before updating the mappings and completing the transaction. Such IOMMU
+  accepts a limited number of simultaneous stalled transactions before
+  having to either put back-pressure on the master, or abort new faulting
+  transactions.
+
+  Firmware has to opt in to stalling, because most buses and masters don't
+  support it. In particular it isn't compatible with PCI, where
+  transactions have to complete before a time limit. More generally it
+  won't work in systems and masters that haven't been designed for
+  stalling. For example the OS, in order to handle a stalled transaction,
+  may attempt to retrieve pages from secondary storage in a stalled
+  domain, leading to a deadlock.
+
 
 Notes:
 ==
-- 
2.29.1



[PATCH v8 5/9] ACPI/IORT: Enable stall support for platform devices

2020-11-12 Thread Jean-Philippe Brucker
Copy the "Stall supported" bit, that tells whether a platform device
supports stall, into the fwspec struct.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/acpi/arm64/iort.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 70df1ecba7fe..3e39b2212388 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -968,6 +968,7 @@ static void iort_named_component_init(struct device *dev,
nc = (struct acpi_iort_named_component *)node->node_data;
fwspec->num_pasid_bits = FIELD_GET(ACPI_IORT_NC_PASID_BITS,
   nc->node_flags);
+   fwspec->can_stall = (nc->node_flags & ACPI_IORT_NC_STALL_SUPPORTED);
 }
 
 static int iort_nc_iommu_map(struct device *dev, struct acpi_iort_node *node)
-- 
2.29.1



[PATCH v8 6/9] iommu/arm-smmu-v3: Add stall support for platform devices

2020-11-12 Thread Jean-Philippe Brucker
The SMMU provides a Stall model for handling page faults in platform
devices. It is similar to PCIe PRI, but doesn't require devices to have
their own translation cache. Instead, faulting transactions are parked
and the OS is given a chance to fix the page tables and retry the
transaction.

Enable stall for devices that support it (opt-in by firmware). When an
event corresponds to a translation error, call the IOMMU fault handler.
If the fault is recoverable, it will call us back to terminate or
continue the stall.
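
As an illustration of how a stalled transaction might be resumed, the
sketch below packs a RESUME command using the field positions that this
patch defines (SID in bits 63:32 and RESP in bits 13:12 of the first
word, STAG in bits 15:0 of the second). It is a userspace model, not the
driver code: build_resume_cmd() and resume_resp_for() are hypothetical
helpers, placing the opcode in bits 7:0 is an assumption, and so is the
choice of ABORT for an unresolved fault.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the definitions added by this patch. */
#define CMDQ_OP_RESUME			0x44u
#define CMDQ_RESUME_0_RESP_TERM		0u
#define CMDQ_RESUME_0_RESP_RETRY	1u
#define CMDQ_RESUME_0_RESP_ABORT	2u

/*
 * Hypothetical policy: retry the transaction if the fault was resolved,
 * otherwise abort it (assumption, the driver may choose differently).
 */
static unsigned int resume_resp_for(bool fault_resolved)
{
	return fault_resolved ? CMDQ_RESUME_0_RESP_RETRY
			      : CMDQ_RESUME_0_RESP_ABORT;
}

/* Pack the two 64-bit words of a RESUME command. */
static void build_resume_cmd(uint64_t cmd[2], uint32_t sid, uint16_t stag,
			     unsigned int resp)
{
	cmd[0]  = (uint64_t)CMDQ_OP_RESUME;        /* bits 7:0 (assumed)   */
	cmd[0] |= (uint64_t)(resp & 0x3) << 12;    /* CMDQ_RESUME_0_RESP   */
	cmd[0] |= (uint64_t)sid << 32;             /* CMDQ_RESUME_0_SID    */
	cmd[1]  = (uint64_t)stag;                  /* CMDQ_RESUME_1_STAG   */
}
```

For example, resuming stream 0x1234 with stall tag 0xbeef and a RETRY
response gives a first word of 0x0000123400001044 and a second word of
0xbeef.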

Signed-off-by: Jean-Philippe Brucker 
---
v8:
* Extract firmware setting into separate patch
* Don't drain event queue on unbind()
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  36 
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  26 ++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 173 +-
 3 files changed, 224 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 7bd98fdce5c3..124f604ed677 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -361,6 +361,13 @@
 #define CMDQ_PRI_1_GRPID   GENMASK_ULL(8, 0)
 #define CMDQ_PRI_1_RESP    GENMASK_ULL(13, 12)
 
+#define CMDQ_RESUME_0_SID  GENMASK_ULL(63, 32)
+#define CMDQ_RESUME_0_RESP_TERM    0UL
+#define CMDQ_RESUME_0_RESP_RETRY   1UL
+#define CMDQ_RESUME_0_RESP_ABORT   2UL
+#define CMDQ_RESUME_0_RESP GENMASK_ULL(13, 12)
+#define CMDQ_RESUME_1_STAG GENMASK_ULL(15, 0)
+
 #define CMDQ_SYNC_0_CS GENMASK_ULL(13, 12)
 #define CMDQ_SYNC_0_CS_NONE    0
 #define CMDQ_SYNC_0_CS_IRQ 1
@@ -377,6 +384,25 @@
 
 #define EVTQ_0_ID  GENMASK_ULL(7, 0)
 
+#define EVT_ID_TRANSLATION_FAULT   0x10
+#define EVT_ID_ADDR_SIZE_FAULT 0x11
+#define EVT_ID_ACCESS_FAULT    0x12
+#define EVT_ID_PERMISSION_FAULT    0x13
+
+#define EVTQ_0_SSV (1UL << 11)
+#define EVTQ_0_SSID    GENMASK_ULL(31, 12)
+#define EVTQ_0_SID GENMASK_ULL(63, 32)
+#define EVTQ_1_STAG    GENMASK_ULL(15, 0)
+#define EVTQ_1_STALL   (1UL << 31)
+#define EVTQ_1_PRIV    (1UL << 33)
+#define EVTQ_1_EXEC    (1UL << 34)
+#define EVTQ_1_READ    (1UL << 35)
+#define EVTQ_1_S2  (1UL << 39)
+#define EVTQ_1_CLASS   GENMASK_ULL(41, 40)
+#define EVTQ_1_TT_READ (1UL << 44)
+#define EVTQ_2_ADDR    GENMASK_ULL(63, 0)
+#define EVTQ_3_IPA GENMASK_ULL(51, 12)
+
 /* PRI queue */
 #define PRIQ_ENT_SZ_SHIFT  4
 #define PRIQ_ENT_DWORDS    ((1 << PRIQ_ENT_SZ_SHIFT) >> 3)
@@ -471,6 +497,13 @@ struct arm_smmu_cmdq_ent {
enum pri_resp   resp;
} pri;
 
+   #define CMDQ_OP_RESUME  0x44
+   struct {
+   u32 sid;
+   u16 stag;
+   u8  resp;
+   } resume;
+
 #define CMDQ_OP_CMD_SYNC    0x46
struct {
u64 msiaddr;
@@ -529,6 +562,7 @@ struct arm_smmu_cmdq_batch {
 
 struct arm_smmu_evtq {
struct arm_smmu_queue   q;
+   struct iopf_queue   *iopf;
u32 max_stalls;
 };
 
@@ -668,6 +702,7 @@ struct arm_smmu_master {
struct arm_smmu_stream  *streams;
    unsigned int    num_streams;
    bool            ats_enabled;
+   bool            stall_enabled;
    bool            sva_enabled;
    struct list_head bonds;
    unsigned int    ssid_bits;
@@ -687,6 +722,7 @@ struct arm_smmu_domain {
 
struct io_pgtable_ops   *pgtbl_ops;
    bool            non_strict;
+   bool            stall_enabled;
    atomic_t        nr_ats_masters;
 
enum arm_smmu_domain_stage  stage;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index f734797ea07a..64e2082ef9ed 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -435,7 +435,7 @@ bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
 
 static bool arm_smmu_iopf_supported(struct arm_smmu_master *master)
 {
-   return false;
+   return master->stall_enabled;
 }
 
 bool arm_smmu_master_sva_supported(struct arm_smmu_master *master)
@@ -459,24 +459,46 @@ bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master)
 
 

[PATCH v8 2/9] iommu/arm-smmu-v3: Maintain a SID->device structure

2020-11-12 Thread Jean-Philippe Brucker
When handling faults from the event or PRI queue, we need to find the
struct device associated with a SID. Add an rb_tree to keep track of SIDs.
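
The lookup this enables can be sketched in userspace C as an ordered
binary tree keyed by SID. This is a simplified model only: a plain
unbalanced tree stands in for the kernel rb_tree, locking is omitted, and
stream_node, stream_insert() and stream_find_master() are hypothetical
names chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* One tree node per stream ID, pointing back at its owning master. */
struct stream_node {
	uint32_t sid;
	const char *master;            /* stand-in for the master struct */
	struct stream_node *left, *right;
};

/* Insert a node; returns -1 if the SID is already present. */
static int stream_insert(struct stream_node **root, struct stream_node *new)
{
	while (*root) {
		if ((*root)->sid < new->sid)
			root = &(*root)->right;
		else if ((*root)->sid > new->sid)
			root = &(*root)->left;
		else
			return -1;     /* duplicate SID */
	}
	new->left = new->right = NULL;
	*root = new;
	return 0;
}

/* Walk down the tree comparing SIDs; returns NULL if the SID is unknown. */
static const char *stream_find_master(const struct stream_node *node,
				      uint32_t sid)
{
	while (node) {
		if (node->sid < sid)
			node = node->right;
		else if (node->sid > sid)
			node = node->left;
		else
			return node->master;
	}
	return NULL;
}
```

A fault handler would call stream_find_master() with the SID taken from
the queue entry and drop the event if no master is found.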

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  13 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 161 
 2 files changed, 144 insertions(+), 30 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 58868a5677b6..7bd98fdce5c3 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -648,6 +648,15 @@ struct arm_smmu_device {
 
/* IOMMU core code handle */
struct iommu_device iommu;
+
+   struct rb_root  streams;
+   struct mutex    streams_mutex;
+};
+
+struct arm_smmu_stream {
+   u32 id;
+   struct arm_smmu_master  *master;
+   struct rb_node  node;
 };
 
 /* SMMU private data for each master */
@@ -656,8 +665,8 @@ struct arm_smmu_master {
struct device   *dev;
struct arm_smmu_domain  *domain;
    struct list_head domain_head;
-   u32 *sids;
-   unsigned int    num_sids;
+   struct arm_smmu_stream  *streams;
+   unsigned int    num_streams;
    bool            ats_enabled;
    bool            sva_enabled;
    struct list_head bonds;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 453478c83933..d87c87136d63 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -918,8 +918,8 @@ static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain,
 
 spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 list_for_each_entry(master, &smmu_domain->devices, domain_head) {
-   for (i = 0; i < master->num_sids; i++) {
-   cmd.cfgi.sid = master->sids[i];
+   for (i = 0; i < master->num_streams; i++) {
+   cmd.cfgi.sid = master->streams[i].id;
 arm_smmu_cmdq_batch_add(smmu, &cmds, &cmd);
}
}
@@ -1371,6 +1371,32 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
return 0;
 }
 
+__maybe_unused
+static struct arm_smmu_master *
+arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
+{
+   struct rb_node *node;
+   struct arm_smmu_stream *stream;
+   struct arm_smmu_master *master = NULL;
+
+   mutex_lock(&smmu->streams_mutex);
+   node = smmu->streams.rb_node;
+   while (node) {
+   stream = rb_entry(node, struct arm_smmu_stream, node);
+   if (stream->id < sid) {
+   node = node->rb_right;
+   } else if (stream->id > sid) {
+   node = node->rb_left;
+   } else {
+   master = stream->master;
+   break;
+   }
+   }
+   mutex_unlock(&smmu->streams_mutex);
+
+   return master;
+}
+
 /* IRQ and event handlers */
 static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 {
@@ -1604,8 +1630,8 @@ static int arm_smmu_atc_inv_master(struct arm_smmu_master *master)
 
 arm_smmu_atc_inv_to_cmd(0, 0, 0, &cmd);
 
-   for (i = 0; i < master->num_sids; i++) {
-   cmd.atc.sid = master->sids[i];
+   for (i = 0; i < master->num_streams; i++) {
+   cmd.atc.sid = master->streams[i].id;
 arm_smmu_cmdq_issue_cmd(master->smmu, &cmd);
}
 
@@ -1648,8 +1674,8 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
 if (!master->ats_enabled)
 continue;
 
-   for (i = 0; i < master->num_sids; i++) {
-   cmd.atc.sid = master->sids[i];
+   for (i = 0; i < master->num_streams; i++) {
+   cmd.atc.sid = master->streams[i].id;
 arm_smmu_cmdq_batch_add(smmu_domain->smmu, &cmds, &cmd);
}
}
@@ -2064,13 +2090,13 @@ static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master)
int i, j;
struct arm_smmu_device *smmu = master->smmu;
 
-   for (i = 0; i < master->num_sids; ++i) {
-   u32 sid = master->sids[i];
+   for (i = 0; i < master->num_streams; ++i) {
+   u32 sid = master->streams[i].id;
__le64 *step = arm_smmu_get_step_for_sid(smmu, sid);
 
/* Bridged PCI devices may end up with duplicated IDs */
for (j = 0; j < i; j++)
-   if (master->sids[j] == sid)
+   if (master->streams[j].id == sid)

[PATCH v8 4/9] of/iommu: Support dma-can-stall property

2020-11-12 Thread Jean-Philippe Brucker
Copy the dma-can-stall property into the fwspec structure.

Signed-off-by: Jean-Philippe Brucker 
---
 include/linux/iommu.h| 2 ++
 drivers/iommu/of_iommu.c | 5 -
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a1c78c4cdeb1..9076fb592c8f 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -572,6 +572,7 @@ struct iommu_group *fsl_mc_device_group(struct device *dev);
  * @iommu_fwnode: firmware handle for this device's IOMMU
  * @iommu_priv: IOMMU driver private data for this device
  * @num_pasid_bits: number of PASID bits supported by this device
+ * @can_stall: the device is allowed to stall
  * @num_ids: number of associated device IDs
  * @ids: IDs which this device may present to the IOMMU
  */
@@ -579,6 +580,7 @@ struct iommu_fwspec {
const struct iommu_ops  *ops;
struct fwnode_handle*iommu_fwnode;
u32 num_pasid_bits;
+   boolcan_stall;
unsigned intnum_ids;
u32 ids[];
 };
diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index e505b9130a1c..d6255ca823d8 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -212,9 +212,12 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
 err = of_iommu_configure_device(master_np, dev, id);
 
 fwspec = dev_iommu_fwspec_get(dev);
-   if (!err && fwspec)
+   if (!err && fwspec) {
 of_property_read_u32(master_np, "pasid-num-bits",
  &fwspec->num_pasid_bits);
+   fwspec->can_stall = of_property_read_bool(master_np,
+ "dma-can-stall");
+   }
}
 
/*
-- 
2.29.1



[PATCH v8 7/9] PCI/ATS: Add PRI stubs

2020-11-12 Thread Jean-Philippe Brucker
The SMMUv3 driver, which can be built without CONFIG_PCI, will soon gain
support for PRI.  Partially revert commit c6e9aefbf9db ("PCI/ATS: Remove
unused PRI and PASID stubs") to re-introduce the PRI stubs, and avoid
adding more #ifdefs to the SMMU driver.

Acked-by: Bjorn Helgaas 
Reviewed-by: Kuppuswamy Sathyanarayanan 

Signed-off-by: Jean-Philippe Brucker 
---
 include/linux/pci-ats.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
index df54cd5b15db..ccfca09fd232 100644
--- a/include/linux/pci-ats.h
+++ b/include/linux/pci-ats.h
@@ -30,6 +30,13 @@ int pci_reset_pri(struct pci_dev *pdev);
 int pci_prg_resp_pasid_required(struct pci_dev *pdev);
 bool pci_pri_supported(struct pci_dev *pdev);
 #else
+static inline int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
+{ return -ENODEV; }
+static inline void pci_disable_pri(struct pci_dev *pdev) { }
+static inline int pci_reset_pri(struct pci_dev *pdev)
+{ return -ENODEV; }
+static inline int pci_prg_resp_pasid_required(struct pci_dev *pdev)
+{ return 0; }
 static inline bool pci_pri_supported(struct pci_dev *pdev)
 { return false; }
 #endif /* CONFIG_PCI_PRI */
-- 
2.29.1



[PATCH v8 0/9] iommu: I/O page faults for SMMUv3

2020-11-12 Thread Jean-Philippe Brucker
Add support for stall and PRI to the SMMUv3 driver, along with a common
I/O Page Fault handler.

These patches were last sent as part of v7 of the larger SVA series [1].
Main changes since v7:
* Dropped CONFIG_IOMMU_PAGE_FAULT, reuse CONFIG_IOMMU_SVA_LIB instead.
* Extracted devicetree support into patch 4.
* Added patch 5 for ACPI support.
* Dropped event queue flush on unbind(). Since device drivers must
  complete DMA transactions before calling unbind(), there cannot be any
  pending stalled event.
* A few small fixes.

The series depends on "iommu/sva: Add PASID helpers" [2], since it
provides the function to search an mm_struct by PASID.

Has anyone been testing the PRI patches on hardware? I still only have a
software model to test them, so as much as I'd like to cross this off my
list, we could leave out patches 7-9 for now.

[1] https://lore.kernel.org/linux-iommu/20200519175502.2504091-1-jean-phili...@linaro.org/
[2] https://lore.kernel.org/linux-iommu/20201106155048.997886-1-jean-phili...@linaro.org/

Jean-Philippe Brucker (9):
  iommu: Add a page fault handler
  iommu/arm-smmu-v3: Maintain a SID->device structure
  dt-bindings: document stall property for IOMMU masters
  of/iommu: Support dma-can-stall property
  ACPI/IORT: Enable stall support for platform devices
  iommu/arm-smmu-v3: Add stall support for platform devices
  PCI/ATS: Add PRI stubs
  PCI/ATS: Export PRI functions
  iommu/arm-smmu-v3: Add support for PRI

 drivers/iommu/Makefile                        |   1 +
 .../devicetree/bindings/iommu/iommu.txt       |  18 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  69 +-
 drivers/iommu/iommu-sva-lib.h                 |  53 ++
 include/linux/iommu.h                         |   4 +
 include/linux/pci-ats.h                       |   7 +
 drivers/acpi/arm64/iort.c                     |   1 +
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  52 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 605 +++---
 drivers/iommu/io-pgfault.c                    | 462 +
 drivers/iommu/of_iommu.c                      |   5 +-
 drivers/pci/ats.c                             |   4 +
 12 files changed, 1191 insertions(+), 90 deletions(-)
 create mode 100644 drivers/iommu/io-pgfault.c

-- 
2.29.1



[PATCH v8 8/9] PCI/ATS: Export PRI functions

2020-11-12 Thread Jean-Philippe Brucker
The SMMUv3 driver uses pci_{enable,disable}_pri() and related
functions. Export those functions to allow the driver to be built as a
module.

Acked-by: Bjorn Helgaas 
Reviewed-by: Kuppuswamy Sathyanarayanan 

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/pci/ats.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index 46bc7f31fb4d..e36d601015d9 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -191,6 +191,7 @@ void pci_pri_init(struct pci_dev *pdev)
if (status & PCI_PRI_STATUS_PASID)
pdev->pasid_required = 1;
 }
+EXPORT_SYMBOL_GPL(pci_pri_init);
 
 /**
  * pci_enable_pri - Enable PRI capability
@@ -238,6 +239,7 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
 
return 0;
 }
+EXPORT_SYMBOL_GPL(pci_enable_pri);
 
 /**
  * pci_disable_pri - Disable PRI capability
@@ -317,6 +319,7 @@ int pci_reset_pri(struct pci_dev *pdev)
 
return 0;
 }
+EXPORT_SYMBOL_GPL(pci_reset_pri);
 
 /**
  * pci_prg_resp_pasid_required - Return PRG Response PASID Required bit
@@ -332,6 +335,7 @@ int pci_prg_resp_pasid_required(struct pci_dev *pdev)
 
return pdev->pasid_required;
 }
+EXPORT_SYMBOL_GPL(pci_prg_resp_pasid_required);
 
 /**
  * pci_pri_supported - Check if PRI is supported.
-- 
2.29.1



REGRESSION: Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI

2020-11-12 Thread Jason Gunthorpe
On Wed, Aug 26, 2020 at 01:16:28PM +0200, Thomas Gleixner wrote:
> This is the second version of providing a base to support device MSI (non
> PCI based) and on top of that support for IMS (Interrupt Message Store)
> based devices in a halfways architecture independent way.

Hi Thomas,

Our test team has been struggling with a regression on bare metal
SRIOV VFs since -rc1 that they were able to bisect to this series

This commit tests good:

5712c3ed549e ("Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc")

This commit tests bad:

981aa1d366bf ("PCI: MSI: Fix Kconfig dependencies for PCI_MSI_ARCH_FALLBACKS")

They were unable to bisect further into the series because some of the
interior commits don't boot :(

When we try to load the mlx5 driver on a bare metal VF it gets this:

[Thu Oct 22 08:54:51 2020] DMAR: DRHD: handling fault status reg 2
[Thu Oct 22 08:54:51 2020] DMAR: [INTR-REMAP] Request device [42:00.2] fault index 1600 [fault reason 37] Blocked a compatibility format interrupt request
[Thu Oct 22 08:55:04 2020] mlx5_core :42:00.1 eth4: Link down
[Thu Oct 22 08:55:11 2020] mlx5_core :42:00.1 eth4: Link up
[Thu Oct 22 08:55:54 2020] mlx5_core :42:00.2: mlx5_cmd_eq_recover:264:(pid 3390): Recovered 1 EQEs on cmd_eq
[Thu Oct 22 08:55:54 2020] mlx5_core :42:00.2: wait_func_handle_exec_timeout:1051:(pid 3390): cmd0: CREATE_EQ(0x301) recovered after timeout
[Thu Oct 22 08:55:54 2020] DMAR: DRHD: handling fault status reg 102
[Thu Oct 22 08:55:54 2020] DMAR: [INTR-REMAP] Request device [42:00.2] fault index 1600 [fault reason 37] Blocked a compatibility format interrupt request

If you have any idea Ziyad and Itay can run any debugging you like.

I suppose it is because this series is handing out compatibility
addr/data pairs while the IOMMU is setup to only accept remap ones
from SRIOV VFs?

Thanks,
Jason

[PATCH v3 1/4] iommu/iova: Add free_all_cpu_cached_iovas()

2020-11-12 Thread John Garry
Add a helper function to free the CPU rcache for all online CPUs.

There also exists a function of the same name in
drivers/iommu/intel/iommu.c, but the parameters are different, and there
should be no conflict.

Signed-off-by: John Garry 
---
 drivers/iommu/iova.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 30d969a4c5fd..81b7399dd5e8 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -227,6 +227,14 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
return -ENOMEM;
 }
 
+static void free_all_cpu_cached_iovas(struct iova_domain *iovad)
+{
+   unsigned int cpu;
+
+   for_each_online_cpu(cpu)
+   free_cpu_cached_iovas(cpu, iovad);
+}
+
 static struct kmem_cache *iova_cache;
 static unsigned int iova_cache_users;
 static DEFINE_MUTEX(iova_cache_mutex);
@@ -422,15 +430,12 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long size,
 retry:
new_iova = alloc_iova(iovad, size, limit_pfn, true);
if (!new_iova) {
-   unsigned int cpu;
-
if (!flush_rcache)
return 0;
 
/* Try replenishing IOVAs by flushing rcache. */
flush_rcache = false;
-   for_each_online_cpu(cpu)
-   free_cpu_cached_iovas(cpu, iovad);
+   free_all_cpu_cached_iovas(iovad);
goto retry;
}
 
-- 
2.26.2



[PATCH v3 0/4] iommu/iova: Solve longterm IOVA issue

2020-11-12 Thread John Garry
This series contains a patch to solve the longterm IOVA issue which
leizhen originally tried to address at [0].

A filtered kernel log is available at the following link, showing periodic
dumps of IOVA sizes, per CPU and per depot bin, per IOVA size granule:
https://raw.githubusercontent.com/hisilicon/kernel-dev/topic-iommu-5.10-iova-debug-v3/aging_test

Notice, for example, the following logs:
[13175.355584] print_iova1 cpu_total=40135 depot_total=3866 total=44001
[83483.457858] print_iova1 cpu_total=62532 depot_total=24476 total=87008

Here the total IOVA rcache size grew from about 44K to 87K between the two
timestamps, roughly 70300 seconds (about 19.5 hours) apart.

Along with this patch, I included the following:
- A small helper to clear all IOVAs for a domain
- Change polarity of the IOVA magazine helpers
- Small optimisation from Cong Wang included, which was never applied [1].
  There was some debate about the other patches in that series, but this one
  is quite straightforward.

Differences to v2:
- Update commit message for patch 3/4

Differences to v1:
- Add IOVA clearing helper
- Add patch to change polarity of mag helpers
- Avoid logically-redundant extra variable in __iova_rcache_insert()

[0] 
https://lore.kernel.org/linux-iommu/20190815121104.29140-3-thunder.leiz...@huawei.com/
[1] 
https://lore.kernel.org/linux-iommu/4b74d40a-22d1-af53-fcb6-5d7018370...@huawei.com/

Cong Wang (1):
  iommu: avoid taking iova_rbtree_lock twice

John Garry (3):
  iommu/iova: Add free_all_cpu_cached_iovas()
  iommu/iova: Avoid double-negatives in magazine helpers
  iommu/iova: Flush CPU rcache for when a depot fills

 drivers/iommu/iova.c | 66 +---
 1 file changed, 38 insertions(+), 28 deletions(-)

-- 
2.26.2



[PATCH v3 2/4] iommu/iova: Avoid double-negatives in magazine helpers

2020-11-12 Thread John Garry
A crash similar to the following can be observed if the initial CPU rcache
magazine allocations fail in init_iova_rcaches():

Unable to handle kernel NULL pointer dereference at virtual address 

Mem abort info:
   ESR = 0x9604
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
Data abort info:
   ISV = 0, ISS = 0x0004
   CM = 0, WnR = 0
[] user address but active_mm is swapper
Internal error: Oops: 9604 [#1] PREEMPT SMP
Modules linked in:
CPU: 11 PID: 696 Comm: irq/40-hisi_sas Not tainted 5.9.0-rc7-dirty #109
Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 
03/15/2019
Call trace:
  free_iova_fast+0xfc/0x280
  iommu_dma_free_iova+0x64/0x70
  __iommu_dma_unmap+0x9c/0xf8
  iommu_dma_unmap_sg+0xa8/0xc8
  dma_unmap_sg_attrs+0x28/0x50
  cq_thread_v3_hw+0x2dc/0x528
  irq_thread_fn+0x2c/0xa0
  irq_thread+0x130/0x1e0
  kthread+0x154/0x158
  ret_from_fork+0x10/0x34

Code: f9400060 f102001f 54000981 d421 (f9400043)

 ---[ end trace 4afcbdfc61b60467 ]---

The issue is that the expression !iova_magazine_full(NULL) evaluates to
true; this falls over in __iova_rcache_insert() when we attempt to cache a
mag and cpu_rcache->loaded == NULL:

if (!iova_magazine_full(cpu_rcache->loaded)) {
can_insert = true;
...

if (can_insert)
iova_magazine_push(cpu_rcache->loaded, iova_pfn);

As above, can_insert evaluates to true, which it shouldn't, and we try
to insert pfns into a NULL mag, which is not safe.

To avoid this, stop using double-negatives, like !iova_magazine_full() and
!iova_magazine_empty(), and use positive tests, like
iova_magazine_has_space() and iova_magazine_has_pfns(), respectively; these
can safely deal with cpu_rcache->{loaded, prev} = NULL.
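
As a standalone illustration of the point above, the following sketch contrasts the old negated test with the new positive one for a NULL magazine. The struct is a simplified stand-in for the kernel's, and the IOVA_MAG_SIZE value of 128 is an assumption made only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IOVA_MAG_SIZE 128  /* assumed value for this sketch */

/* Simplified stand-in for the kernel structure. */
struct iova_magazine {
	unsigned long size;
	unsigned long pfns[IOVA_MAG_SIZE];
};

/* Old helper: returns false for NULL, so !iova_magazine_full(NULL) is true. */
static bool iova_magazine_full(struct iova_magazine *mag)
{
	return mag && mag->size == IOVA_MAG_SIZE;
}

/* New helper: a NULL magazine never has space, so callers cannot push to it. */
static bool iova_magazine_has_space(struct iova_magazine *mag)
{
	if (!mag)
		return false;
	return mag->size < IOVA_MAG_SIZE;
}
```

For mag == NULL the old test "!iova_magazine_full(mag)" passes while the new test "iova_magazine_has_space(mag)" fails, which is the behavioural difference this patch relies on.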

Signed-off-by: John Garry 
---
 drivers/iommu/iova.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 81b7399dd5e8..1f3f0f8b12e0 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -827,14 +827,18 @@ iova_magazine_free_pfns(struct iova_magazine *mag, struct iova_domain *iovad)
mag->size = 0;
 }
 
-static bool iova_magazine_full(struct iova_magazine *mag)
+static bool iova_magazine_has_space(struct iova_magazine *mag)
 {
-   return (mag && mag->size == IOVA_MAG_SIZE);
+   if (!mag)
+   return false;
+   return mag->size < IOVA_MAG_SIZE;
 }
 
-static bool iova_magazine_empty(struct iova_magazine *mag)
+static bool iova_magazine_has_pfns(struct iova_magazine *mag)
 {
-   return (!mag || mag->size == 0);
+   if (!mag)
+   return false;
+   return mag->size;
 }
 
 static unsigned long iova_magazine_pop(struct iova_magazine *mag,
@@ -843,7 +847,7 @@ static unsigned long iova_magazine_pop(struct iova_magazine *mag,
int i;
unsigned long pfn;
 
-   BUG_ON(iova_magazine_empty(mag));
+   BUG_ON(!iova_magazine_has_pfns(mag));
 
/* Only fall back to the rbtree if we have no suitable pfns at all */
for (i = mag->size - 1; mag->pfns[i] > limit_pfn; i--)
@@ -859,7 +863,7 @@ static unsigned long iova_magazine_pop(struct iova_magazine *mag,
 
 static void iova_magazine_push(struct iova_magazine *mag, unsigned long pfn)
 {
-   BUG_ON(iova_magazine_full(mag));
+   BUG_ON(!iova_magazine_has_space(mag));
 
mag->pfns[mag->size++] = pfn;
 }
@@ -905,9 +909,9 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
cpu_rcache = raw_cpu_ptr(rcache->cpu_rcaches);
spin_lock_irqsave(&cpu_rcache->lock, flags);
 
-   if (!iova_magazine_full(cpu_rcache->loaded)) {
+   if (iova_magazine_has_space(cpu_rcache->loaded)) {
can_insert = true;
-   } else if (!iova_magazine_full(cpu_rcache->prev)) {
+   } else if (iova_magazine_has_space(cpu_rcache->prev)) {
swap(cpu_rcache->prev, cpu_rcache->loaded);
can_insert = true;
} else {
@@ -916,8 +920,9 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
if (new_mag) {
spin_lock(&rcache->lock);
if (rcache->depot_size < MAX_GLOBAL_MAGS) {
-   rcache->depot[rcache->depot_size++] =
-   cpu_rcache->loaded;
+   if (cpu_rcache->loaded)
+   rcache->depot[rcache->depot_size++] =
+   cpu_rcache->loaded;
} else {
mag_to_free = cpu_rcache->loaded;
}
@@ -968,9 +973,9 @@ static unsigned long __iova_rcache_get(struct iova_rcache *rcache,
cpu_rcache = raw_cpu_ptr(rcache->cpu_rcaches);
spin_lock_irqsave(&cpu_rcache->lock, flags);
 
-   if (!iova_magazine_empty(cpu_rcache->loaded)) {
+   if 

Re: [PATCHv7 1/7] iommu/io-pgtable-arm: Add support to use system cache

2020-11-12 Thread Will Deacon
On Wed, Nov 11, 2020 at 11:32:42AM +0530, Sai Prakash Ranjan wrote:
> On 2020-11-10 17:48, Will Deacon wrote:
> > On Fri, Oct 30, 2020 at 02:53:08PM +0530, Sai Prakash Ranjan wrote:
> > > Add a quirk IO_PGTABLE_QUIRK_SYS_CACHE to override the
> > > attributes set in TCR for the page table walker when
> > > using system cache.
> > > 
> > > Signed-off-by: Sai Prakash Ranjan 
> > > ---
> > >  drivers/iommu/io-pgtable-arm.c | 7 ++-
> > >  include/linux/io-pgtable.h | 4 
> > >  2 files changed, 10 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/iommu/io-pgtable-arm.c
> > > b/drivers/iommu/io-pgtable-arm.c
> > > index a7a9bc08dcd1..a356caf1683a 100644
> > > --- a/drivers/iommu/io-pgtable-arm.c
> > > +++ b/drivers/iommu/io-pgtable-arm.c
> > > @@ -761,7 +761,8 @@ arm_64_lpae_alloc_pgtable_s1(struct
> > > io_pgtable_cfg *cfg, void *cookie)
> > > 
> > >   if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
> > >   IO_PGTABLE_QUIRK_NON_STRICT |
> > > - IO_PGTABLE_QUIRK_ARM_TTBR1))
> > > + IO_PGTABLE_QUIRK_ARM_TTBR1 |
> > > + IO_PGTABLE_QUIRK_SYS_CACHE))
> > >   return NULL;
> > > 
> > >   data = arm_lpae_alloc_pgtable(cfg);
> > > @@ -773,6 +774,10 @@ arm_64_lpae_alloc_pgtable_s1(struct
> > > io_pgtable_cfg *cfg, void *cookie)
> > >   tcr->sh = ARM_LPAE_TCR_SH_IS;
> > >   tcr->irgn = ARM_LPAE_TCR_RGN_WBWA;
> > >   tcr->orgn = ARM_LPAE_TCR_RGN_WBWA;
> > > + } else if (cfg->quirks & IO_PGTABLE_QUIRK_SYS_CACHE) {
> > > + tcr->sh = ARM_LPAE_TCR_SH_OS;
> > > + tcr->irgn = ARM_LPAE_TCR_RGN_NC;
> > > + tcr->orgn = ARM_LPAE_TCR_RGN_WBWA;
> > 
> > Given that this only applies in the case where the page-table walker is
> > non-coherent, I think we'd be better off renaming the quirk to something
> > like IO_PGTABLE_QUIRK_ARM_OUTER_WBWA and then rejecting it in the
> > non-coherent case.
> > 
> 
> Do you mean like below?
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index a7a9bc08dcd1..94de1f71db42 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -776,7 +776,10 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg
> *cfg, void *cookie)
> } else {
> tcr->sh = ARM_LPAE_TCR_SH_OS;
> tcr->irgn = ARM_LPAE_TCR_RGN_NC;
> -   tcr->orgn = ARM_LPAE_TCR_RGN_NC;
> +   if (!(cfg->quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
> +   tcr->orgn = ARM_LPAE_TCR_RGN_NC;
> +   else
> +   tcr->orgn = ARM_LPAE_TCR_RGN_WBWA;

Yes, but rejecting the quirk if the walker is coherent (I accidentally said
"non-coherent" earlier on).

Will
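
One way to read the suggestion in this exchange is sketched below: honour the outer write-back quirk only for a non-coherent walker and reject it when the walker is coherent. This is a simplified, self-contained model and not the actual io-pgtable code; all type, field and macro names here are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the io-pgtable definitions. */
#define QUIRK_ARM_OUTER_WBWA  (1u << 0)

enum rgn { RGN_NC, RGN_WBWA };
enum sh  { SH_OS, SH_IS };

struct pgtable_cfg { unsigned int quirks; bool coherent_walk; };
struct tcr_cfg     { enum sh sh; enum rgn irgn, orgn; };

/* Returns false if the quirk is requested for a coherent walker. */
static bool setup_tcr(const struct pgtable_cfg *cfg, struct tcr_cfg *tcr)
{
	bool outer_wbwa = (cfg->quirks & QUIRK_ARM_OUTER_WBWA) != 0;

	if (cfg->coherent_walk) {
		if (outer_wbwa)
			return false;	/* quirk is meaningless here: reject */
		tcr->sh = SH_IS;
		tcr->irgn = RGN_WBWA;
		tcr->orgn = RGN_WBWA;
		return true;
	}

	/* Non-coherent walker: inner non-cacheable, outer per the quirk. */
	tcr->sh = SH_OS;
	tcr->irgn = RGN_NC;
	tcr->orgn = outer_wbwa ? RGN_WBWA : RGN_NC;
	return true;
}
```

Rejecting the combination up front keeps the quirk from being silently ignored on coherent hardware.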


Re: remove dma_virt_ops v2

2020-11-12 Thread Christoph Hellwig
ping?

On Fri, Nov 06, 2020 at 07:19:31PM +0100, Christoph Hellwig wrote:
> Hi Jason,
> 
> this series switches the RDMA core to opencode the special case of
> devices bypassing the DMA mapping in the RDMA ULPs.  The virt ops
> have caused a bit of trouble due to the P2P code not working with
> them due to the fact that we'd do two dma mapping iterations for a
> single I/O, but also are a bit of layering violation and lead to
> more code than necessary.
> 
> Tested with nvme-rdma over rxe.
> 
> Note that the rds changes are untested, as I could not find any
> simple rds test setup.
> 
> Changes since v2:
>  - simplify the INFINIBAND_VIRT_DMA dependencies
>  - add a ib_uses_virt_dma helper
>  - use ib_uses_virt_dma in nvmet-rdma to disable p2p for virt_dma devices
>  - use ib_dma_max_seg_size in umem
>  - stop using dmapool in rds
> 
> Changes since v1:
>  - disable software RDMA drivers for highmem configs
>  - update the PCI commit logs
---end quoted text---


Re: [PATCHv7 2/7] iommu/arm-smmu: Add domain attribute for system cache

2020-11-12 Thread Will Deacon
On Wed, Nov 11, 2020 at 12:10:50PM +0530, Sai Prakash Ranjan wrote:
> On 2020-11-10 17:48, Will Deacon wrote:
> > On Fri, Oct 30, 2020 at 02:53:09PM +0530, Sai Prakash Ranjan wrote:
> > > Add iommu domain attribute for using system cache aka last level
> > > cache by client drivers like GPU to set right attributes for caching
> > > the hardware pagetables into the system cache.
> > > 
> > > Signed-off-by: Sai Prakash Ranjan 
> > > ---
> > >  drivers/iommu/arm/arm-smmu/arm-smmu.c | 17 +
> > >  drivers/iommu/arm/arm-smmu/arm-smmu.h |  1 +
> > >  include/linux/iommu.h |  1 +
> > >  3 files changed, 19 insertions(+)
> > > 
> > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > index b1cf8f0abc29..070d13f80c7e 100644
> > > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > @@ -789,6 +789,9 @@ static int arm_smmu_init_domain_context(struct
> > > iommu_domain *domain,
> > >   if (smmu_domain->non_strict)
> > >   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
> > > 
> > > + if (smmu_domain->sys_cache)
> > > + pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_SYS_CACHE;
> > > +
> > >   pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
> > >   if (!pgtbl_ops) {
> > >   ret = -ENOMEM;
> > > @@ -1520,6 +1523,9 @@ static int arm_smmu_domain_get_attr(struct
> > > iommu_domain *domain,
> > >   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
> > >   *(int *)data = smmu_domain->non_strict;
> > >   return 0;
> > > + case DOMAIN_ATTR_SYS_CACHE:
> > > + *((int *)data) = smmu_domain->sys_cache;
> > > + return 0;
> > >   default:
> > >   return -ENODEV;
> > >   }
> > > @@ -1551,6 +1557,17 @@ static int arm_smmu_domain_set_attr(struct
> > > iommu_domain *domain,
> > >   else
> > >   smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
> > >   break;
> > > + case DOMAIN_ATTR_SYS_CACHE:
> > > + if (smmu_domain->smmu) {
> > > + ret = -EPERM;
> > > + goto out_unlock;
> > > + }
> > > +
> > > + if (*((int *)data))
> > > + smmu_domain->sys_cache = true;
> > > + else
> > > + smmu_domain->sys_cache = false;
> > > + break;
> > >   default:
> > >   ret = -ENODEV;
> > >   }
> > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > > b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > > index 885840f3bec8..dfc44d806671 100644
> > > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > > @@ -373,6 +373,7 @@ struct arm_smmu_domain {
> > >   struct mutex  init_mutex; /* Protects smmu pointer */
> > >   spinlock_t  cb_lock; /* Serialises ATS1* ops and TLB syncs */
> > >   struct iommu_domain domain;
> > > + bool  sys_cache;
> > >  };
> > > 
> > >  struct arm_smmu_master_cfg {
> > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > > index b95a6f8db6ff..4f4bb9c6f8f6 100644
> > > --- a/include/linux/iommu.h
> > > +++ b/include/linux/iommu.h
> > > @@ -118,6 +118,7 @@ enum iommu_attr {
> > >   DOMAIN_ATTR_FSL_PAMUV1,
> > >   DOMAIN_ATTR_NESTING,/* two stages of translation */
> > >   DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
> > > + DOMAIN_ATTR_SYS_CACHE,
> > 
> > I think you're trying to make this look generic, but it's really not.
> > If we need to funnel io-pgtable quirks through domain attributes, then I
> > think we should be open about that and add something like
> > DOMAIN_ATTR_IO_PGTABLE_CFG which could take a struct of page-table
> > configuration data for the domain (this could just be quirks initially,
> > but maybe it's worth extending to take ias, oas and page size)
> > 
> 
> Actually the initial versions used DOMAIN_ATTR_QCOM_SYS_CACHE
> to make it QCOM specific and not generic, I don't see anyone else
> using this attribute, would that work?

No -- I'd prefer to have _one_ domain attribute for funneling all the
IO_PGTABLE_CFG data. Otherwise, we'll just end up with things like
DOMAIN_ATTR_QCOM_SYS_CACHE_EXT or DOMAIN_ATTR_QCOM_QUIRKS later on.

Will
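
A rough sketch of what a single attribute carrying page-table configuration could look like follows. It is only a model of the idea discussed above: the struct, enum and function names are hypothetical, the struct carries quirks only, and the -EPERM check mirrors the "already attached" test in the quoted arm_smmu_domain_set_attr() hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical payload for a DOMAIN_ATTR_IO_PGTABLE_CFG-style attribute. */
struct io_pgtable_domain_attr {
	unsigned long quirks;	/* could later grow ias, oas, page size */
};

/* Simplified stand-in for a driver's domain structure. */
struct demo_domain {
	bool attached;		/* set once the domain is bound to hardware */
	unsigned long pgtbl_quirks;
};

enum demo_attr { DEMO_ATTR_IO_PGTABLE_CFG };

static int demo_domain_set_attr(struct demo_domain *d, enum demo_attr attr,
				const void *data)
{
	switch (attr) {
	case DEMO_ATTR_IO_PGTABLE_CFG: {
		const struct io_pgtable_domain_attr *cfg = data;

		if (d->attached)
			return -EPERM;	/* too late to change the page tables */
		d->pgtbl_quirks = cfg->quirks;
		return 0;
	}
	default:
		return -ENODEV;
	}
}
```

With one struct-valued attribute, adding a new quirk means adding a flag rather than another vendor-specific domain attribute.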


Re: [PATCH v2] iommu/vt-d: avoid unnecessory panic if iommu init fail in tboot system

2020-11-12 Thread Lukasz Hawrylko
On Tue, 2020-11-10 at 15:19 +0800, Zhenzhong Duan wrote:
> "intel_iommu=off" command line is used to disable iommu but iommu is force
> enabled in a tboot system for security reasons.
> 
> However, for better performance on high-speed network devices, the option
> "intel_iommu=tboot_noforce" was introduced to disable the force-on.
> 
> By default the kernel should panic if iommu init fails under tboot, for
> security reasons, but that is unnecessary if we use
> "intel_iommu=tboot_noforce,off".
> 
> Fix the code setting force_on and move intel_iommu_tboot_noforce
> from tboot code to intel iommu code.
> 
> Fixes: 7304e8f28bb2 ("iommu/vt-d: Correctly disable Intel IOMMU force on")
> Signed-off-by: Zhenzhong Duan 
> ---
> v2: move check of intel_iommu_tboot_noforce into iommu code per Baolu.
> 

Tested-by: Lukasz Hawrylko 
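
The policy described in the quoted commit message can be summarised with a small model. This is a sketch based only on that description, not the actual patch; the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: the IOMMU is forced on under tboot unless the user
 * passed intel_iommu=tboot_noforce.
 */
static bool iommu_force_on(bool tboot_enabled, bool tboot_noforce)
{
	return tboot_enabled && !tboot_noforce;
}

/* A failed init is fatal only when the IOMMU was forced on. */
static bool must_panic_on_init_failure(bool force_on, bool init_ok)
{
	return force_on && !init_ok;
}
```

Under this model, "intel_iommu=tboot_noforce,off" on a tboot system gives force_on == false, so a failed or skipped IOMMU init no longer leads to a panic.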
