AMD-Vi: Event logged [IO_PAGE_FAULT device=42:00.0 domain=0x005e address=0xfffffffdf8030000 flags=0x0008]

2020-12-02 Thread Marc Smith
Hi,

First, I must preface this email by apologizing in advance for asking
about a distro kernel (RHEL in this case); I'm not truly reporting this
problem and requesting a fix here (I know that should be taken up with
the vendor). Rather, I'm hoping someone can give me a few hints/pointers
on where to look next when debugging this issue.

I'm using RHEL 7.8.2003 (CentOS) with a 3.10.0-1127.18.2.el7 kernel.
The systems use a Supermicro H12SSW-NT board (AMD), and we have the
IOMMU enabled along with SR-IOV. I have several virtual machines (QEMU
KVM) that run on these servers, and I'm passing PCIe end-points into
the VMs (in some cases the whole PCIe EP itself, and for some devices
I use SR-IOV and pass in the VFs to the VMs). The VMs run Linux as
their guest OS (a couple of different distros).

While the servers (VMs) are idle, I don't experience any problems. But
when I start doing a lot of I/O in the virtual machines (iSCSI across
Ethernet interfaces, disk I/O via SAS HBAs that are passed into the
VM, etc.) I notice the following after some time at the host layer
("hypervisor"):
Nov 29 10:50:00 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT
device=42:00.0 domain=0x005e address=0xfffdf803 flags=0x0008]
Nov 29 22:02:03 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT
device=c8:02.1 domain=0x005f address=0xfffdf806 flags=0x0008]
Nov 30 02:13:54 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT
device=42:00.0 domain=0x005e address=0xfffdf802 flags=0x0008]
Nov 30 02:28:44 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT
device=c8:02.0 domain=0x005e address=0xfffdf802 flags=0x0008]
Nov 30 10:48:53 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT
device=01:00.0 domain=0x005e address=0xfffdf804 flags=0x0008]
Dec  2 07:05:22 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT
device=c8:03.0 domain=0x005e address=0xfffdf801 flags=0x0008]

These events happen to all PCIe devices that are passed into the VMs,
although not all at once; as you can see from the timestamps above,
they are not very frequent even under heavy load (in the log snippet
above, the system was running a big workload over several days). For the
Ethernet devices that are passed into the VMs, I noticed that they
experience transmit hangs / resets in the virtual machines, and when
these occur, they correspond to a matching IO_PAGE_FAULT on the host
belonging to that PCI device.

FWIW, those NIC hangs look like this (visible in the VM guest OS):
[17879.279091] NETDEV WATCHDOG: s1p1 (bnxt_en): transmit queue 2 timed out
[17879.279111] WARNING: CPU: 5 PID: 0 at net/sched/sch_generic.c:447
dev_watchdog+0x121/0x17e
...
[17879.279213] bnxt_en 0000:01:09.0 s1p1: TX timeout detected, starting reset task!
[17883.075299] bnxt_en 0000:01:09.0 s1p1: Resp cmpl intr err msg: 0x51
[17883.075302] bnxt_en 0000:01:09.0 s1p1: hwrm_ring_free type 1 failed. rc:fff0 err:0
[17886.957100] bnxt_en 0000:01:09.0 s1p1: Resp cmpl intr err msg: 0x51
[17886.957103] bnxt_en 0000:01:09.0 s1p1: hwrm_ring_free type 2 failed. rc:fff0 err:0
[17890.843023] bnxt_en 0000:01:09.0 s1p1: Resp cmpl intr err msg: 0x51
[17890.843025] bnxt_en 0000:01:09.0 s1p1: hwrm_ring_free type 2 failed. rc:fff0 err:0

We see these NIC hangs in the VMs with both Broadcom and
Mellanox Ethernet adapters passed into the VMs, so I don't
think it's the NICs causing the IO_PAGE_FAULT events observed in the
hypervisor. Plus, we see IO_PAGE_FAULT events for devices other than
Ethernet adapters.


I have several of these same servers (all using the same motherboard,
processor, memory, BIOS, etc.) and they all exhibit this behavior
with the IO_PAGE_FAULT events, so I don't believe it to be any one
faulty server / component. My question is really where to dig/push
next. Is this perhaps an issue with the BIOS/firmware on these
motherboards? Something with the chipset (AMD IOMMU)? A colleague has
suggested that even the AGESA may be involved. Or should I be focusing
on the Linux kernel's AMD IOMMU driver (software)?

I've been poking around other similar bug reports, and I see the
IO_PAGE_FAULT and NIC reset / transmit hang seem to be related in
other posts. This commit looked promising:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4e50ce03976fbc8ae995a000c4b10c737467beaa

But I see RH has already back-ported it into their
3.10.0-1127.18.2.el7 kernel source. I'm open to trying a newer Linux
vanilla kernel (e.g., 5.4.x) but would prefer to resolve this in the
RHEL kernel I'm using now. I'll take a look at this next, although due
to the complex nature of this hypervisor/VM setup, it's a bit tedious
to test.


Kernel messages from boot (using the amd_iommu_dump=1 parameter):
...
[0.214395] AMD-Vi: Using IVHD type 0x11
[0.214627] AMD-Vi: device: c0:00.2 cap: 0040 seg: 0 flags: b0 info
[0.214628] AMD-Vi: mmio-addr: f370
[0.214634] AMD-Vi:   DEV_SELECT_RANGE_START  devid: c0:01.0 flags: 0

Re: [RESEND PATCH v3 0/4] iommu/iova: Solve longterm IOVA issue

2020-12-02 Thread Dmitry Safonov
On Tue, 1 Dec 2020 at 21:50, Will Deacon  wrote:
>
> On Tue, 17 Nov 2020 18:25:30 +0800, John Garry wrote:
> > This series contains a patch to solve the longterm IOVA issue which
> > leizhen originally tried to address at [0].
> >
> > A sieved kernel log is at the following, showing periodic dumps of IOVA
> > sizes, per CPU and per depot bin, per IOVA size granule:
> > https://raw.githubusercontent.com/hisilicon/kernel-dev/topic-iommu-5.10-iova-debug-v3/aging_test
> >
> > [...]
>
> Applied the final patch to arm64 (for-next/iommu/iova), thanks!
>
> [4/4] iommu: avoid taking iova_rbtree_lock twice
>   https://git.kernel.org/arm64/c/3a651b3a27a1

Glad it made it into -next; two years ago I couldn't convince the iommu
maintainer it was worth it (but with a different justification):
https://lore.kernel.org/linux-iommu/20180621180823.805-3-d...@arista.com/

Thanks,
 Dmitry
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v7] swiotlb: Adjust SWIOTLB bounce buffer size for SEV guests.

2020-12-02 Thread Ashish Kalra
From: Ashish Kalra 

For SEV, all DMA to and from the guest has to use shared (unencrypted)
pages.  SEV uses SWIOTLB to make this happen without requiring changes
to device drivers.  However, depending on the workload being run, the
default 64MB of SWIOTLB might not be enough and SWIOTLB may run out of
buffers to use for DMA, resulting in I/O errors and/or performance
degradation for high-I/O workloads.

Adjust the default size of SWIOTLB for SEV guests using a percentage
of the total memory available to the guest for SWIOTLB buffers.

Using late_initcall() interface to invoke swiotlb_adjust() does not
work as the size adjustment needs to be done before mem_encrypt_init()
and reserve_crashkernel() which use the allocated SWIOTLB buffer size,
hence call it explicitly from setup_arch().

The SWIOTLB default size adjustment needs to be added as an architecture
specific interface/callback to allow architectures such as those supporting
memory encryption to adjust/expand SWIOTLB size for their use.

v5 fixed build errors and warnings as
Reported-by: kbuild test robot 

Signed-off-by: Ashish Kalra 
---
 arch/x86/kernel/setup.c   |  2 ++
 arch/x86/mm/mem_encrypt.c | 31 +++
 include/linux/swiotlb.h   |  6 ++
 kernel/dma/swiotlb.c  | 22 ++
 4 files changed, 61 insertions(+)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 84f581c91db4..31e24e198061 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1149,6 +1149,8 @@ void __init setup_arch(char **cmdline_p)
if (boot_cpu_has(X86_FEATURE_GBPAGES))
hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
 
+   swiotlb_adjust();
+
/*
 * Reserve memory for crash kernel after SRAT is parsed so that it
 * won't consume hotpluggable memory.
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 1bcfbcd2bfd7..46549bd3d840 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -485,7 +485,38 @@ static void print_mem_encrypt_feature_info(void)
pr_cont("\n");
 }
 
+#define SEV_ADJUST_SWIOTLB_SIZE_PERCENT 6
+
 /* Architecture __weak replacement functions */
+unsigned long __init arch_swiotlb_adjust(unsigned long iotlb_default_size)
+{
+   unsigned long size = iotlb_default_size;
+
+   /*
+* For SEV, all DMA has to occur via shared/unencrypted pages.
+* SEV uses SWIOTLB to make this happen without changing device
+* drivers. However, depending on the workload being run, the
+* default 64MB of SWIOTLB may not be enough and SWIOTLB may
+* run out of buffers for DMA, resulting in I/O errors and/or
+* performance degradation especially with high I/O workloads.
+* Adjust the default size of SWIOTLB for SEV guests using
+* a percentage of guest memory for SWIOTLB buffers.
+* Also as the SWIOTLB bounce buffer memory is allocated
+* from low memory, ensure that the adjusted size is within
+* the limits of low available memory.
+*
+*/
+   if (sev_active()) {
+   phys_addr_t total_mem = memblock_phys_mem_size();
+   size = total_mem * SEV_ADJUST_SWIOTLB_SIZE_PERCENT / 100;
+   size = clamp_val(size, iotlb_default_size, SZ_1G);
+   pr_info("SWIOTLB bounce buffer size adjusted to %luMB for SEV",
+   size >> 20);
+   }
+
+   return size;
+}
+
 void __init mem_encrypt_init(void)
 {
if (!sme_me_mask)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 3bb72266a75a..b5904fa4b67c 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -33,6 +33,7 @@ extern void swiotlb_init(int verbose);
 int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
 extern unsigned long swiotlb_nr_tbl(void);
 unsigned long swiotlb_size_or_default(void);
+unsigned long __init arch_swiotlb_adjust(unsigned long size);
 extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
 extern int swiotlb_late_init_with_default_size(size_t default_size);
 extern void __init swiotlb_update_mem_attributes(void);
@@ -77,6 +78,7 @@ void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(void);
+void __init swiotlb_adjust(void);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
 static inline bool is_swiotlb_buffer(phys_addr_t paddr)
@@ -99,6 +101,10 @@ static inline bool is_swiotlb_active(void)
 {
return false;
 }
+
+static inline void swiotlb_adjust(void)
+{
+}
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 781b9dca197c..0150ca2336bc 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -163,6 +163,28 @@ unsigned long swiotlb_size_or_default(void)
return size ? size : (IO_TLB_DEFAULT_SIZE);
 }
 
+unsigne

Re: [PATCH v3 1/1] vfio/type1: Add vfio_group_domain()

2020-12-02 Thread Alex Williamson
On Tue,  1 Dec 2020 09:23:28 +0800
Lu Baolu  wrote:

> Add the API for getting the domain from a vfio group. This could be used
> by the physical device drivers which rely on the vfio/mdev framework for
> mediated device user level access. The typical use case like below:
> 
>   int pasid;
>   struct vfio_group *vfio_group;
>   struct iommu_domain *iommu_domain;
>   struct device *dev = mdev_dev(mdev);
>   struct device *iommu_device = mdev_get_iommu_device(dev);
> 
>   if (!iommu_device ||
>   !iommu_dev_feature_enabled(iommu_device, IOMMU_DEV_FEAT_AUX))
>   return -EINVAL;
> 
>   vfio_group = vfio_group_get_external_user_from_dev(dev);
>   if (IS_ERR_OR_NULL(vfio_group))
>   return -EFAULT;
> 
>   iommu_domain = vfio_group_domain(vfio_group);
>   if (IS_ERR_OR_NULL(iommu_domain)) {
>   vfio_group_put_external_user(vfio_group);
>   return -EFAULT;
>   }
> 
>   pasid = iommu_aux_get_pasid(iommu_domain, iommu_device);
>   if (pasid < 0) {
>   vfio_group_put_external_user(vfio_group);
>   return -EFAULT;
>   }
> 
>   /* Program device context with pasid value. */
>   ...
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/vfio/vfio.c | 18 ++
>  drivers/vfio/vfio_iommu_type1.c | 23 +++
>  include/linux/vfio.h|  3 +++
>  3 files changed, 44 insertions(+)
> 
> Change log:
>  - v2: 
> https://lore.kernel.org/linux-iommu/20201126012726.1185171-1-baolu...@linux.intel.com/
>  - Changed according to comments @ 
> https://lore.kernel.org/linux-iommu/20201130135725.70fdf...@w520.home/
>  - Fix a typo 
> https://lore.kernel.org/linux-iommu/dm5pr11mb143560e51c84baf83ae54ac0c3...@dm5pr11mb1435.namprd11.prod.outlook.com/
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 2151bc7f87ab..588e8026d94b 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -2331,6 +2331,24 @@ int vfio_unregister_notifier(struct device *dev, enum 
> vfio_notify_type type,
>  }
>  EXPORT_SYMBOL(vfio_unregister_notifier);
>  
> +struct iommu_domain *vfio_group_domain(struct vfio_group *group)

Could we make this vfio_group_iommu_domain()?  We're making a callback
specific to a vfio IOMMU backend participating in the IOMMU API, so we
might as well make this callback explicitly tied to it.

> +{
> + struct vfio_container *container;
> + struct vfio_iommu_driver *driver;
> +
> + if (!group)
> + return ERR_PTR(-EINVAL);
> +
> + container = group->container;
> + driver = container->iommu_driver;
> + if (likely(driver && driver->ops->group_domain))
> + return driver->ops->group_domain(container->iommu_data,
> +  group->iommu_group);

Likewise group_iommu_domain()?


> + else
> + return ERR_PTR(-ENOTTY);

Nit, we don't need 'else' here, the first branch always returns.

Otherwise I think it looks good.  Thanks,

Alex

> +}
> +EXPORT_SYMBOL_GPL(vfio_group_domain);
> +
>  /**
>   * Module/class support
>   */
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 67e827638995..d7b5acb3056a 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -2980,6 +2980,28 @@ static int vfio_iommu_type1_dma_rw(void *iommu_data, 
> dma_addr_t user_iova,
>   return ret;
>  }
>  
> +static struct iommu_domain *
> +vfio_iommu_type1_group_domain(void *iommu_data, struct iommu_group 
> *iommu_group)
> +{
> + struct iommu_domain *domain = ERR_PTR(-ENODEV);
> + struct vfio_iommu *iommu = iommu_data;
> + struct vfio_domain *d;
> +
> + if (!iommu || !iommu_group)
> + return ERR_PTR(-EINVAL);
> +
> + mutex_lock(&iommu->lock);
> + list_for_each_entry(d, &iommu->domain_list, next) {
> + if (find_iommu_group(d, iommu_group)) {
> + domain = d->domain;
> + break;
> + }
> + }
> + mutex_unlock(&iommu->lock);
> +
> + return domain;
> +}
> +
>  static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
>   .name   = "vfio-iommu-type1",
>   .owner  = THIS_MODULE,
> @@ -2993,6 +3015,7 @@ static const struct vfio_iommu_driver_ops 
> vfio_iommu_driver_ops_type1 = {
>   .register_notifier  = vfio_iommu_type1_register_notifier,
>   .unregister_notifier= vfio_iommu_type1_unregister_notifier,
>   .dma_rw = vfio_iommu_type1_dma_rw,
> + .group_domain   = vfio_iommu_type1_group_domain,
>  };
>  
>  static int __init vfio_iommu_type1_init(void)
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 38d3c6a8dc7e..6cd0de2764cb 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -90,6 +90,7 @@ struct vfio_iommu_driver_ops {
>

Re: [RESEND PATCH v3 0/4] iommu/iova: Solve longterm IOVA issue

2020-12-02 Thread John Garry

On 01/12/2020 21:02, Will Deacon wrote:

cc'ing some more people who have touched iova code recently

> On Tue, Dec 01, 2020 at 03:35:02PM +0000, John Garry wrote:
>> On 17/11/2020 10:25, John Garry wrote:
>> Is there any chance that we can get these picked up for 5.11? We've seen
>> this issue solved here for a long time.
>>
>> Or, @Robin, let me know if not happy with this since v1.
>>
>> BTW, patch #4 has been on the go for ~1 year now, and is a nice small
>> optimisation from Cong, which I picked up and already had a RB tag.
>
> I can pick the last patch up, but I'd really like some reviewed/tested-bys
> on the others.

ok, fair enough.

Considering the extremes required to unearth the main problem, it'll be 
hard to get testers, but, fwiw, I can provide a tested-by from the reporter:


Tested-by: Xiang Chen 

@Robin, You originally had some interest in this topic - are you now 
satisfied with the changes I am proposing?


Please let me know.

Thanks,
John


Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2020-12-02 Thread Wang Xingang

Thanks for your reply. We are testing vSVA, and will let you know if
other problems are found.

On 2020/12/1 21:58, Auger Eric wrote:
> Hi Xingang,
>
> On 12/1/20 2:33 PM, Xingang Wang wrote:
>> Hi Eric
>>
>> On Wed, 18 Nov 2020 12:21:43, Eric Auger wrote:
>>> @@ -1710,7 +1710,11 @@ static void arm_smmu_tlb_inv_context(void *cookie)
>>>      * insertion to guarantee those are observed before the TLBI. Do be
>>>      * careful, 007.
>>>      */
>>> -   if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>> +   if (ext_asid >= 0) { /* guest stage 1 invalidation */
>>> +   cmd.opcode  = CMDQ_OP_TLBI_NH_ASID;
>>> +   cmd.tlbi.asid   = ext_asid;
>>> +   cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
>>> +   } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>
>> Found a problem here: the cmd for guest stage 1 invalidation is built,
>> but it is not delivered to the smmu.
>
> Thank you for the report. I will fix that soon. With that fixed, have
> you been able to run vSVA on top of the series? Do you need other stuff
> to be fixed at the SMMU level? As I am going to respin soon, please let
> me know which branch is best to rebase on, to ease your integration.
>
> Best Regards
>
> Eric




Re: [PATCH] iommu: arm-smmu-impl: add NXP hook to preserve bootmappings

2020-12-02 Thread Laurentiu Tudor
Hi Robin,

Sorry for the late reply, we had a few days off over here. Comments inline.

On 11/25/2020 8:10 PM, Robin Murphy wrote:
> On 2020-11-25 15:50, laurentiu.tu...@nxp.com wrote:
>> From: Laurentiu Tudor 
>>
>> Add a NXP specific hook to preserve SMMU mappings present at
>> boot time (created by the boot loader). These are needed for
>> MC firmware present on some NXP chips to continue working
>> across kernel boot and SMMU initialization.
>>
>> Signed-off-by: Laurentiu Tudor 
>> ---
>>   drivers/iommu/arm/arm-smmu/arm-smmu-impl.c | 33 ++
>>   1 file changed, 33 insertions(+)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
>> b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
>> index 7fed89c9d18a..ca07d9d4be69 100644
>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
>> @@ -187,6 +187,36 @@ static const struct arm_smmu_impl
>> mrvl_mmu500_impl = {
>>   .reset = arm_mmu500_reset,
>>   };
>>   +static int nxp_cfg_probe(struct arm_smmu_device *smmu)
>> +{
>> +    int i, cnt = 0;
>> +    u32 smr;
>> +
>> +    for (i = 0; i < smmu->num_mapping_groups; i++) {
>> +    smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
>> +
>> +    if (FIELD_GET(ARM_SMMU_SMR_VALID, smr)) {
> 
> I bet this is fun over kexec...

Right. I haven't even considered kexec.

> Note that the Qualcomm special case got a bit of a free pass since it
> involves working around a totally broken hypervisor, plus gets to play
> the "nobody sane will run an enterprise distro on their phone" card to
> an extent; I don't think the likes of Layerscape kit get it quite so
> easy ;)

I agree that this is not ideal, but the plan here was to have something
to boot vanilla kernel OOB on our chips, which is something on my mind
for quite a while now. I do realize that we won't get away with it
in the long run.

>> +    smmu->smrs[i].id = FIELD_GET(ARM_SMMU_SMR_ID, smr);
>> +    smmu->smrs[i].mask = FIELD_GET(ARM_SMMU_SMR_MASK, smr);
>> +    smmu->smrs[i].valid = true;
>> +
>> +    smmu->s2crs[i].type = S2CR_TYPE_BYPASS;
>> +    smmu->s2crs[i].privcfg = S2CR_PRIVCFG_DEFAULT;
>> +    smmu->s2crs[i].cbndx = 0xff;
>> +
>> +    cnt++;
>> +    }
>> +    }
>> +
>> +    dev_notice(smmu->dev, "\tpreserved %d boot mapping%s\n", cnt,
>> +   cnt == 1 ? "" : "s");
> 
> That gets you around the initial SMMU reset, but what happens for the
> arbitrarily long period of time between the MC device getting attached
> to a default domain and the MC driver actually probing and (presumably)
> being able to map and reinitialise its firmware?

Perhaps I'm missing something, but won't the MC firmware keep running
off this bypass mapping created by the bootloader, which is exactly
what gets preserved here?

>> +
>> +    return 0;
>> +}
>> +
>> +static const struct arm_smmu_impl nxp_impl = {
>> +    .cfg_probe = nxp_cfg_probe,
>> +};
> 
> I believe you're mostly using MMU-500, so you probably don't want to
> simply throw out the relevant errata workarounds.
> 
>>   struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device
>> *smmu)
>>   {
>> @@ -226,5 +256,8 @@ struct arm_smmu_device *arm_smmu_impl_init(struct
>> arm_smmu_device *smmu)
>>   if (of_device_is_compatible(np, "marvell,ap806-smmu-500"))
>>   smmu->impl = &mrvl_mmu500_impl;
>>   +    if (of_property_read_bool(np, "nxp,keep-boot-mappings"))
>> +    smmu->impl = &nxp_impl;
> 
> Normally you'd get a "what about ACPI?" here, but given the number of
> calls and email threads we've had specifically about trying to make ACPI
> support for these platforms work, that gets upgraded to at least a "WHAT
> ABOUT ACPI!?" :P
I do have ACPI in mind, but for now I just wanted to get a
first impression of the approach. One idea I was pondering was to
have this property in the MC node (quick reminder: MC is exposed as a NC
in ACPI, so we should be able to replicate the property there too). In the
meantime, we are collaborating with our partners on using RMRR by
adding support for it in the arm-smmu-v2 driver.

> But seriously, the case of device firmware in memory being active before
> handover to Linux is *literally* the original reason behind IORT RMRs.
> We already know we need a way to specify the equivalent thing for DT
> systems, such that both can be handled commonly. I really don't want to
> have to support a vendor-specific mechanism for not-even-fully-solving a
> completely generic issue, sorry.
> 

I remember that some months ago there was a proposal from nvidia [1] to
map per-device reserved memory into SMMU. Would it make sense to revive
it as it seemed a viable solution for our case too?

[1]
https://patchwork.kernel.org/project/linux-arm-kernel/list/?series=213701&state=%2A&archive=both

---
Best Regards, Laurentiu