date:20220517

RE: [PATCH v12 0/9] ACPI/IORT: Support for IORT RMR node

2022-05-17 Thread Shameerali Kolothum Thodi via iommu



> -Original Message-
> From: Lorenzo Pieralisi [mailto:lorenzo.pieral...@arm.com]
> Sent: 13 May 2022 10:50
> To: Robin Murphy ; Shameerali Kolothum Thodi
> ; raf...@kernel.org;
> j...@8bytes.org
> Cc: Shameerali Kolothum Thodi ;
> Guohanjun (Hanjun Guo) ;
> linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org;
> iommu@lists.linux-foundation.org; Linuxarm ;
> w...@kernel.org; wanghuiqiang ;
> steven.pr...@arm.com; sami.muja...@arm.com; j...@solid-run.com;
> eric.au...@redhat.com; laurentiu.tu...@nxp.com; h...@infradead.org
> Subject: Re: [PATCH v12 0/9] ACPI/IORT: Support for IORT RMR node
> 
> [with Christoph's correct email address]
> 
> On Tue, May 10, 2022 at 09:07:00AM +0100, Robin Murphy wrote:
> > On 2022-05-10 08:23, Shameerali Kolothum Thodi wrote:
> > > Hi Joerg/Robin,
> > >
> > > I think this series is now ready to be merged. Could you please let
> > > me know if there is anything missing.
> >
> > Fine by me - these patches have had enough review and testing now that
> > even if anything else did come up, I think it would be better done as
> > follow-up work on the merged code.
> 
> Given the ACPICA dependency I believe it is best for this series
> to go via the ACPI tree, right ?
> 
> I assume there are all the required ACKs for that to happen.

The SMMUv3/SMMU related changes (patches 6 - 9) still doesn't have
explicit ACK from maintainers other than the go ahead above from Robin.

Just thought of highlighting it as not sure that will be an issue or not.

Thanks,
Shameer
 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v8 2/8] hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device

2022-05-17 Thread Yicong Yang via iommu

On 2022/5/16 22:03, Jonathan Cameron wrote:
> On Mon, 16 May 2022 20:52:17 +0800
> Yicong Yang  wrote:
> 
>> HiSilicon PCIe tune and trace device(PTT) is a PCIe Root Complex integrated
>> Endpoint(RCiEP) device, providing the capability to dynamically monitor and
>> tune the PCIe traffic and trace the TLP headers.
>>
>> Add the driver for the device to enable the trace function. Register PMU
>> device of PTT trace, then users can use trace through perf command. The
>> driver makes use of perf AUX trace function and support the following
>> events to configure the trace:
>>
>> - filter: select Root port or Endpoint to trace
>> - type: select the type of traced TLP headers
>> - direction: select the direction of traced TLP headers
>> - format: select the data format of the traced TLP headers
>>
>> This patch initially add a basic driver of PTT trace.
>>
>> Signed-off-by: Yicong Yang 
> 
> Hi Yicong,
> 
> It's been a while since I looked at this driver, so I'll admit
> I can't remember if any of the things I've raised below were
> previously discussed. 
> 
> All minor stuff (biggest is question of failing cleanly in unlikely
> case of failing the allocation in the filter addition vs carrying
> on anyway), so feel free to add
> 
> Reviewed-by: Jonathan Cameron 
> 
>> diff --git a/drivers/hwtracing/ptt/Makefile b/drivers/hwtracing/ptt/Makefile
>> new file mode 100644
>> index ..908c09a98161
>> --- /dev/null
>> +++ b/drivers/hwtracing/ptt/Makefile
>> @@ -0,0 +1,2 @@
>> +# SPDX-License-Identifier: GPL-2.0
>> +obj-$(CONFIG_HISI_PTT) += hisi_ptt.o
>> diff --git a/drivers/hwtracing/ptt/hisi_ptt.c 
>> b/drivers/hwtracing/ptt/hisi_ptt.c
>> new file mode 100644
>> index ..ef25ce98f664
>> --- /dev/null
>> +++ b/drivers/hwtracing/ptt/hisi_ptt.c
> 
> 
> ...
> 
> 
>> +
>> +static int hisi_ptt_init_filters(struct pci_dev *pdev, void *data)
>> +{
>> +struct hisi_ptt_filter_desc *filter;
>> +struct hisi_ptt *hisi_ptt = data;
>> +
>> +filter = kzalloc(sizeof(*filter), GFP_KERNEL);
>> +if (!filter) {
>> +pci_err(hisi_ptt->pdev, "failed to add filter %s\n", 
>> pci_name(pdev));
> 
> If this fails we carry on anyway (no error checking on the bus_walk).
> I think we should error out in that case (would need to use a flag placed
> somewhere in hisi_ptt to tell we had an error).
> 
> That would complicate the unwind though.
> Easiest way to do that unwind is probably to register a separate
> devm_add_action_or_reset() callback for each filter.
> 
> If you prefer to carry on even with this allocation error, then maybe add a 
> comment
> here somewhere to make it clear that will happen.
> 

I'd prefer to carry on with this memory allocation error because this may not 
be fatal
as: 1) the filter maybe partial initialized but the trace is available with 
resident
filters and 2) the driver log here provides enough information that which 
filter has
problem to add. 3) furthermore the tune function doesn't rely on this so we 
give a
chance for the tune to be available if the filter allocation failed.

The log message provide information about what happened here, will add a 
comment here
why we decide to carry on:

/*
 * We won't fail the probe if filter allocation failed here. The filters
 * should be partial initialized and users would know which filter fails
 * through the log. Other functions of PTT device are still available.
 */
filter = kzalloc(sizeof(*filter), GFP_KERNEL);
if (!filter) {
pci_err(hisi_ptt->pdev, "failed to add filter %s\n", 
pci_name(pdev));
return -ENOMEM;
}

It's better to manually manage the filters as we're going to add support of 
dynamically
updating the filters, which means the filter can be added/removed after probe. 
With
devres it'll be complex to keep the order (the new added one will be released 
before
the PMU unregistration, which is not expected).

Will address the rest comments.

Thanks,
Yicong

>> +return -ENOMEM;
>> +}
>> +
>> +filter->devid = PCI_DEVID(pdev->bus->number, pdev->devfn);
>> +
>> +if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT) {
>> +filter->is_port = true;
>> +list_add_tail(&filter->list, &hisi_ptt->port_filters);
>> +
>> +/* Update the available port mask */
>> +hisi_ptt->port_mask |= hisi_ptt_get_filter_val(filter->devid, 
>> true);
>> +} else {
>> +list_add_tail(&filter->list, &hisi_ptt->req_filters);
>> +}
>> +
>> +return 0;
>> +}
>> +
>> +static void hisi_ptt_release_filters(void *data)
>> +{
>> +struct hisi_ptt_filter_desc *filter, *tmp;
>> +struct hisi_ptt *hisi_ptt = data;
>> +
>> +list_for_each_entry_safe(filter, tmp, &hisi_ptt->req_filters, list) {
>> +list_del(&filter->list);
>> +kfree(filter);
> 
> I think with separate release per entry above, this bit become simpler as
> we walk all the elemen

Re: [PATCH v8 2/8] hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device

2022-05-17 Thread Yicong Yang via iommu

On 2022/5/17 0:23, John Garry wrote:
> On 16/05/2022 13:52, Yicong Yang wrote:
>> HiSilicon PCIe tune and trace device(PTT) is a PCIe Root Complex integrated
>> Endpoint(RCiEP) device, providing the capability to dynamically monitor and
>> tune the PCIe traffic and trace the TLP headers.
>>
>> Add the driver for the device to enable the trace function. Register PMU
>> device of PTT trace, then users can use trace through perf command. The
>> driver makes use of perf AUX trace function and support the following
>> events to configure the trace:
>>
>> - filter: select Root port or Endpoint to trace
>> - type: select the type of traced TLP headers
>> - direction: select the direction of traced TLP headers
>> - format: select the data format of the traced TLP headers
>>
>> This patch initially add a basic driver of PTT trace.
> 
> Initially add basic trace support.
> 
>>
>> Signed-off-by: Yicong Yang 
> 
> Generally this looks ok, apart from nitpicking below, so, FWIW:
> Reviewed-by: John Garry 
> >> ---
>>   drivers/Makefile |   1 +
>>   drivers/hwtracing/Kconfig    |   2 +
>>   drivers/hwtracing/ptt/Kconfig    |  12 +
>>   drivers/hwtracing/ptt/Makefile   |   2 +
>>   drivers/hwtracing/ptt/hisi_ptt.c | 964 +++
>>   drivers/hwtracing/ptt/hisi_ptt.h | 178 ++
>>   6 files changed, 1159 insertions(+)
>>   create mode 100644 drivers/hwtracing/ptt/Kconfig
>>   create mode 100644 drivers/hwtracing/ptt/Makefile
>>   create mode 100644 drivers/hwtracing/ptt/hisi_ptt.c
>>   create mode 100644 drivers/hwtracing/ptt/hisi_ptt.h
>>

[...]

>> +static int hisi_ptt_cpu_teardown(unsigned int cpu, struct hlist_node *node)
>> +{
>> +    struct hisi_ptt *hisi_ptt;
>> +    int target, src;
>> +
>> +    hisi_ptt = hlist_entry_safe(node, struct hisi_ptt, hotplug_node);
>> +    src = hisi_ptt->trace_ctrl.on_cpu;
>> +
>> +    if (!hisi_ptt->trace_ctrl.started || src != cpu)
>> +    return 0;
>> +
>> +    target = 
>> cpumask_any(cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev)));
>> +    if (target < nr_cpumask_bits) {
> 
> the comment for cpumask_any() hints to check against nr_cpu_ids - any 
> specific reason to check against nr_cpumask_bits?
> 

here should be:
if (target >= nr_cpumask_bits) {

will fix this up.

>> +    dev_err(hisi_ptt->hisi_ptt_pmu.dev, "no available cpu for perf 
>> context migration\n");
>> +    return 0;
>> +    }
>> +
>> +    perf_pmu_migrate_context(&hisi_ptt->hisi_ptt_pmu, src, target);
>> +    WARN_ON(irq_set_affinity(pci_irq_vector(hisi_ptt->pdev, 
>> HISI_PTT_TRACE_DMA_IRQ),
>> + cpumask_of(cpu)));
>> +    hisi_ptt->trace_ctrl.on_cpu = target;
>> +
>> +    return 0;
>> +}
>> +
>> +static int __init hisi_ptt_init(void)
>> +{
>> +    int ret;
>> +
>> +    ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, DRV_NAME, NULL,
>> +  hisi_ptt_cpu_teardown);
>> +    if (ret < 0)
>> +    return ret;
>> +    hisi_ptt_pmu_online = ret;
>> +
>> +    ret = pci_register_driver(&hisi_ptt_driver);
>> +    if (ret)
>> +    cpuhp_remove_multi_state(hisi_ptt_pmu_online);
>> +
>> +    return ret;
>> +}
>> +module_init(hisi_ptt_init);
>> +
>> +static void __exit hisi_ptt_exit(void)
>> +{
>> +    pci_unregister_driver(&hisi_ptt_driver);
>> +    cpuhp_remove_multi_state(hisi_ptt_pmu_online);
>> +}
>> +module_exit(hisi_ptt_exit);
>> +
>> +MODULE_LICENSE("GPL");
>> +MODULE_AUTHOR("Yicong Yang ");
>> +MODULE_DESCRIPTION("Driver for HiSilicon PCIe tune and trace device");
>> diff --git a/drivers/hwtracing/ptt/hisi_ptt.h 
>> b/drivers/hwtracing/ptt/hisi_ptt.h
>> new file mode 100644
>> index ..2344e4195648
>> --- /dev/null
>> +++ b/drivers/hwtracing/ptt/hisi_ptt.h
>> @@ -0,0 +1,178 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * Driver for HiSilicon PCIe tune and trace device
>> + *
>> + * Copyright (c) 2022 HiSilicon Technologies Co., Ltd.
>> + * Author: Yicong Yang 
>> + */
>> +
>> +#ifndef _HISI_PTT_H
>> +#define _HISI_PTT_H
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#define DRV_NAME "hisi_ptt"
>> +
>> +/*
>> + * The definition of the device registers and register fields.
>> + */
>> +#define HISI_PTT_TRACE_ADDR_SIZE    0x0800
>> +#define HISI_PTT_TRACE_ADDR_BASE_LO_0    0x0810
>> +#define HISI_PTT_TRACE_ADDR_BASE_HI_0    0x0814
>> +#define HISI_PTT_TRACE_ADDR_STRIDE    0x8
>> +#define HISI_PTT_TRACE_CTRL    0x0850
>> +#define   HISI_PTT_TRACE_CTRL_EN    BIT(0)
>> +#define   HISI_PTT_TRACE_CTRL_RST    BIT(1)
>> +#define   HISI_PTT_TRACE_CTRL_RXTX_SEL    GENMASK(3, 2)
>> +#define   HISI_PTT_TRACE_CTRL_TYPE_SEL    GENMASK(7, 4)
>> +#define   HISI_PTT_TRACE_CTRL_DATA_FORMAT    BIT(14)
>> +#define   HISI_PTT_TRACE_CTRL_FILTER_MODE    BIT(15)
>> +#define   HISI_PTT_TRACE_CTRL_TARGET_SEL    GENMASK(31, 16)
>> +#define HISI_PTT_TRACE_INT_STAT    0x0890
>> +#define   HISI_PTT_TRACE_INT_STAT_MASK    GENMASK(3, 0)
>> +#defin

Re: [PATCH v8 2/8] hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device

2022-05-17 Thread John Garry via iommu


On 17/05/2022 09:09, Yicong Yang wrote:

+    target = cpumask_any(cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev)));
+    if (target < nr_cpumask_bits) {

the comment for cpumask_any() hints to check against nr_cpu_ids - any specific 
reason to check against nr_cpumask_bits?


here should be:
if (target >= nr_cpumask_bits) {

will fix this up.



I am still not sure that using nr_cpumask_bits is correct.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] iommu/dma: Fix check for error return from iommu_map_sg_atomic()

2022-05-17 Thread Christoph Hellwig

On Fri, May 13, 2022 at 05:39:48PM +0200, Niklas Schnelle wrote:
> In __iommu_dma_alloc_noncontiguous() the value returned by
> iommu_map_sg_atomic() is checked for being smaller than size. Before
> commit ad8f36e4b6b1 ("iommu: return full error code from
> iommu_map_sg[_atomic]()") this simply checked if the requested size was
> successfully mapped.
> 
> After that commit iommu_map_sg_atomic() may also return a negative
> error value. In principle this too would be covered by the existing
> check. There is one problem however, as size is of type size_t while the
> return type of iommu_map_sg_atomic() is now of type ssize_t the latter gets
> converted to size_t and negative error values end up as very large
> positive values making the check succeed. Fix this by making the return
> type visible with a local variable and add an explicit cast to ssize_t.
> 
> Fixes: ad8f36e4b6b1 ("iommu: return full error code from 
> iommu_map_sg[_atomic]()")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Niklas Schnelle 

I don't see what the point of the newly added local variable is here.
Just casting size should be all that is needed as far as I can tell.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC PATCH] dma-iommu: Add iommu_dma_max_mapping_size()

2022-05-17 Thread Christoph Hellwig

On Mon, May 16, 2022 at 09:06:01PM +0800, John Garry wrote:
> For streaming DMA mappings involving an IOMMU and whose IOVA len regularly
> exceeds the IOVA rcache upper limit (meaning that they are not cached),
> performance can be reduced.
> 
> Add the IOMMU callback for DMA mapping API dma_max_mapping_size(), which
> allows the drivers to know the mapping limit and thus limit the requested 
> IOVA lengths.
> 
> This resolves the performance issue originally reported in [0] for a SCSI
> HBA driver which was regularly mapping SGLs which required IOVAs in
> excess of the IOVA caching limit. In this case the block layer limits the
> max sectors per request - as configured in __scsi_init_queue() - which
> will limit the total SGL length the driver tries to map and in turn limits
> IOVA lengths requested.
> 
> [0] 
> https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/
> 
> Signed-off-by: John Garry 
> ---
> Sending as an RFC as iommu_dma_max_mapping_size() is a soft limit, and not
> a hard limit which I expect is the semantics of dma_map_ops.max_mapping_size
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 09f6e1c0f9c0..e2d5205cde37 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1442,6 +1442,21 @@ static unsigned long 
> iommu_dma_get_merge_boundary(struct device *dev)
>   return (1UL << __ffs(domain->pgsize_bitmap)) - 1;
>  }
>  

> + if (!domain)
> + return 0;
> +
> + cookie = domain->iova_cookie;
> + if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)
> + return 0;

Can these conditions even be true here?

> +static inline unsigned long iova_rcache_range(void)
> +{
> + return 0;
> +}

Given that IOMMU_DMA select IOMMU_IOVA there is no need for this stub.

Otherwise this looks sensible to me.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC PATCH] dma-iommu: Add iommu_dma_max_mapping_size()

2022-05-17 Thread John Garry via iommu


On 17/05/2022 09:38, Christoph Hellwig wrote:

On Mon, May 16, 2022 at 09:06:01PM +0800, John Garry wrote:

For streaming DMA mappings involving an IOMMU and whose IOVA len regularly
exceeds the IOVA rcache upper limit (meaning that they are not cached),
performance can be reduced.

Add the IOMMU callback for DMA mapping API dma_max_mapping_size(), which
allows the drivers to know the mapping limit and thus limit the requested
IOVA lengths.

This resolves the performance issue originally reported in [0] for a SCSI
HBA driver which was regularly mapping SGLs which required IOVAs in
excess of the IOVA caching limit. In this case the block layer limits the
max sectors per request - as configured in __scsi_init_queue() - which
will limit the total SGL length the driver tries to map and in turn limits
IOVA lengths requested.


BTW, on a separate topic, I noticed that even with this change my ATA 
devices have max_hw_sectors_kb of 32767, as opposed to 128 for SAS 
devices. It seems that libata-scsi - specifically ata_scsi_dev_config() 
- doesn't honour the shost max_sectors limit. I guess that is not 
intentional.




[0] 
https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/

Signed-off-by: John Garry 
---
Sending as an RFC as iommu_dma_max_mapping_size() is a soft limit, and not
a hard limit which I expect is the semantics of dma_map_ops.max_mapping_size

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 09f6e1c0f9c0..e2d5205cde37 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1442,6 +1442,21 @@ static unsigned long iommu_dma_get_merge_boundary(struct 
device *dev)
return (1UL << __ffs(domain->pgsize_bitmap)) - 1;
  }
  



+   if (!domain)
+   return 0;
+
+   cookie = domain->iova_cookie;
+   if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)
+   return 0;


Can these conditions even be true here?


I don't think so. Paranoia on my part.




+static inline unsigned long iova_rcache_range(void)
+{
+   return 0;
+}


Given that IOMMU_DMA select IOMMU_IOVA there is no need for this stub.


hmmm.. ok. Policy was to be stub everything but I think that it has changed.



Otherwise this looks sensible to me.

.


Great, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 2/2] iommu: mtk_iommu: Add support for MT6795 Helio X10 M4Us

2022-05-17 Thread Yong Wu via iommu

On Fri, 2022-05-13 at 17:14 +0200, AngeloGioacchino Del Regno wrote:
> Add support for the M4Us found in the MT6795 Helio X10 SoC.
> 
> Signed-off-by: AngeloGioacchino Del Regno <
> angelogioacchino.delre...@collabora.com>
> ---
>  drivers/iommu/mtk_iommu.c | 20 +++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 71b2ace74cd6..3d802dd3f377 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -157,6 +157,7 @@
>  enum mtk_iommu_plat {
>   M4U_MT2712,
>   M4U_MT6779,
> + M4U_MT6795,
>   M4U_MT8167,
>   M4U_MT8173,
>   M4U_MT8183,
> @@ -953,7 +954,8 @@ static int mtk_iommu_hw_init(const struct
> mtk_iommu_data *data, unsigned int ban
>* Global control settings are in bank0. May re-init these
> global registers
>* since no sure if there is bank0 consumers.
>*/
> - if (data->plat_data->m4u_plat == M4U_MT8173) {
> + if (data->plat_data->m4u_plat == M4U_MT6795 ||
> + data->plat_data->m4u_plat == M4U_MT8173) {
>   regval = F_MMU_PREFETCH_RT_REPLACE_MOD |
>F_MMU_TF_PROT_TO_PROGRAM_ADDR_MT8173;
>   } else {
> @@ -1138,6 +1140,9 @@ static int mtk_iommu_probe(struct
> platform_device *pdev)
>   case M4U_MT2712:
>   p = "mediatek,mt2712-infracfg";
>   break;
> + case M4U_MT6795:
> + p = "mediatek,mt6795-infracfg";
> + break;
>   case M4U_MT8173:
>   p = "mediatek,mt8173-infracfg";
>   break;
> @@ -1404,6 +1409,18 @@ static const struct mtk_iommu_plat_data
> mt6779_data = {
>   .larbid_remap  = {{0}, {1}, {2}, {3}, {5}, {7, 8}, {10}, {9}},
>  };
>  
> +static const struct mtk_iommu_plat_data mt6795_data = {
> + .m4u_plat = M4U_MT6795,
> + .flags= HAS_4GB_MODE | HAS_BCLK | RESET_AXI |
> + HAS_LEGACY_IVRP_PADDR | MTK_IOMMU_TYPE_MM,
> + .inv_sel_reg  = REG_MMU_INV_SEL_GEN1,
> + .banks_num= 1,
> + .banks_enable = {true},
> + .iova_region  = single_domain,
> + .iova_region_nr = ARRAY_SIZE(single_domain),
> + .larbid_remap = {{0}, {1}, {2}, {3}, {4}}, /* Linear mapping.
> */
> +};

This is nearly same with mt8173_data. mt8173 has one more larb than
mt6795, its larbid_remap is also ok for mt6795.

thus it looks we could use mt8173 as the backward compatible.
compatible = "mediatek,mt6795-m4u",
 "mediatek,mt8173-m4u";

After this, the only thing is about "mediatek,mt6795-infracfg". we have
to try again with mediatek,mt6795-infracfg after mediatek,mt8173-
infracfg fail. I think we should allow the backward case in 4GB mode
judgment if we have.

What's your opinion? or some other suggestion?
Thanks.

> +
>  static const struct mtk_iommu_plat_data mt8167_data = {
>   .m4u_plat = M4U_MT8167,
>   .flags= RESET_AXI | HAS_LEGACY_IVRP_PADDR |
> MTK_IOMMU_TYPE_MM,
> @@ -1515,6 +1532,7 @@ static const struct mtk_iommu_plat_data
> mt8195_data_vpp = {
>  static const struct of_device_id mtk_iommu_of_ids[] = {
>   { .compatible = "mediatek,mt2712-m4u", .data = &mt2712_data},
>   { .compatible = "mediatek,mt6779-m4u", .data = &mt6779_data},
> + { .compatible = "mediatek,mt6795-m4u", .data = &mt6795_data},
>   { .compatible = "mediatek,mt8167-m4u", .data = &mt8167_data},
>   { .compatible = "mediatek,mt8173-m4u", .data = &mt8173_data},
>   { .compatible = "mediatek,mt8183-m4u", .data = &mt8183_data},

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC PATCH] dma-iommu: Add iommu_dma_max_mapping_size()

2022-05-17 Thread Christoph Hellwig

On Tue, May 17, 2022 at 10:02:00AM +0100, John Garry wrote:
> BTW, on a separate topic, I noticed that even with this change my ATA 
> devices have max_hw_sectors_kb of 32767, as opposed to 128 for SAS devices. 
> It seems that libata-scsi - specifically ata_scsi_dev_config() - doesn't 
> honour the shost max_sectors limit. I guess that is not intentional.

I don't think it is.  the libsas/libsata integration is a bit messy
sometimes..
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v8 2/8] hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device

2022-05-17 Thread Yicong Yang via iommu

On 2022/5/17 16:21, John Garry wrote:
> On 17/05/2022 09:09, Yicong Yang wrote:
 +    target = 
 cpumask_any(cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev)));
 +    if (target < nr_cpumask_bits) {
>>> the comment for cpumask_any() hints to check against nr_cpu_ids - any 
>>> specific reason to check against nr_cpumask_bits?
>>>
>> here should be:
>> if (target >= nr_cpumask_bits) {
>>
>> will fix this up.
>>
> 
> I am still not sure that using nr_cpumask_bits is correct.

Let's use nr_cpu_ids to match the comment of cpumask_any(). Actually we should 
have
nr_cpu_ids(possible cpus, init to NR_CPUS) <= nr_cpumask_bits (NR_CPUS) so it's 
ok here.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] swiotlb: Max mapping size takes min align mask into account

2022-05-17 Thread Christoph Hellwig

Thanks,

applied to the dma-mapping for-next tree.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 2/2] iommu: mtk_iommu: Add support for MT6795 Helio X10 M4Us

2022-05-17 Thread AngeloGioacchino Del Regno


Il 17/05/22 11:08, Yong Wu ha scritto:

On Fri, 2022-05-13 at 17:14 +0200, AngeloGioacchino Del Regno wrote:

Add support for the M4Us found in the MT6795 Helio X10 SoC.

Signed-off-by: AngeloGioacchino Del Regno <
angelogioacchino.delre...@collabora.com>
---
  drivers/iommu/mtk_iommu.c | 20 +++-
  1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 71b2ace74cd6..3d802dd3f377 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -157,6 +157,7 @@
  enum mtk_iommu_plat {
M4U_MT2712,
M4U_MT6779,
+   M4U_MT6795,
M4U_MT8167,
M4U_MT8173,
M4U_MT8183,
@@ -953,7 +954,8 @@ static int mtk_iommu_hw_init(const struct
mtk_iommu_data *data, unsigned int ban
 * Global control settings are in bank0. May re-init these
global registers
 * since no sure if there is bank0 consumers.
 */
-   if (data->plat_data->m4u_plat == M4U_MT8173) {
+   if (data->plat_data->m4u_plat == M4U_MT6795 ||
+   data->plat_data->m4u_plat == M4U_MT8173) {
regval = F_MMU_PREFETCH_RT_REPLACE_MOD |
 F_MMU_TF_PROT_TO_PROGRAM_ADDR_MT8173;
} else {
@@ -1138,6 +1140,9 @@ static int mtk_iommu_probe(struct
platform_device *pdev)
case M4U_MT2712:
p = "mediatek,mt2712-infracfg";
break;
+   case M4U_MT6795:
+   p = "mediatek,mt6795-infracfg";
+   break;
case M4U_MT8173:
p = "mediatek,mt8173-infracfg";
break;
@@ -1404,6 +1409,18 @@ static const struct mtk_iommu_plat_data
mt6779_data = {
.larbid_remap  = {{0}, {1}, {2}, {3}, {5}, {7, 8}, {10}, {9}},
  };
  
+static const struct mtk_iommu_plat_data mt6795_data = {

+   .m4u_plat = M4U_MT6795,
+   .flags= HAS_4GB_MODE | HAS_BCLK | RESET_AXI |
+   HAS_LEGACY_IVRP_PADDR | MTK_IOMMU_TYPE_MM,
+   .inv_sel_reg  = REG_MMU_INV_SEL_GEN1,
+   .banks_num= 1,
+   .banks_enable = {true},
+   .iova_region  = single_domain,
+   .iova_region_nr = ARRAY_SIZE(single_domain),
+   .larbid_remap = {{0}, {1}, {2}, {3}, {4}}, /* Linear mapping.
*/
+};


This is nearly same with mt8173_data. mt8173 has one more larb than
mt6795, its larbid_remap is also ok for mt6795.



I think that we should be explicit about the larbid_remap property,
since mt6795 has one less larb, we should explicitly say that like
I did there... that's only for human readability I admit ... but,
still, I wouldn't want to see people thinking that MT6795 has 6 LARBs
because they've read that larbid_remap having 6 entries.


thus it looks we could use mt8173 as the backward compatible.
 compatible = "mediatek,mt6795-m4u",
  "mediatek,mt8173-m4u";

After this, the only thing is about "mediatek,mt6795-infracfg". we have
to try again with mediatek,mt6795-infracfg after mediatek,mt8173-
infracfg fail. I think we should allow the backward case in 4GB mode
judgment if we have.

What's your opinion? or some other suggestion?
Thanks.


I know, I may have a plan for that, but I wanted to have a good reason to
propose such a thing, as if it's just about two SoCs needing that, there
would be no good reason to get things done differently.

...so, in order to provide a good cleanup, we have two possible roads to
follow here: either we add a generic "mediatek,infracfg" compatible to the
infra node (but I don't like that), or we can do it like it was previously
done in mtk-pm-domains.c (I prefer that approach):

iommu: iommu@somewhere {
... something ...
mediatek,infracfg = <&infracfg>;
};

infracfg = syscon_regmap_lookup_by_compatible(node, "mediatek,infracfg");
if (IS_ERR(infracfg)) {
/* try with the older way */
switch (...) {
case  p = "mediatek,mt2712-infracfg";
... blah blah ...
}
/* legacy also failed, ouch! */
if (IS_ERR(infracfg))
return PTR_ERR(infracfg);
}

ret = regmap_read ... etc etc etc

Cheers,
Angelo
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 2/2] iommu: mtk_iommu: Add support for MT6795 Helio X10 M4Us

2022-05-17 Thread Matthias Brugger




On 17/05/2022 11:26, AngeloGioacchino Del Regno wrote:

Il 17/05/22 11:08, Yong Wu ha scritto:

On Fri, 2022-05-13 at 17:14 +0200, AngeloGioacchino Del Regno wrote:

Add support for the M4Us found in the MT6795 Helio X10 SoC.

Signed-off-by: AngeloGioacchino Del Regno <
angelogioacchino.delre...@collabora.com>
---
  drivers/iommu/mtk_iommu.c | 20 +++-
  1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 71b2ace74cd6..3d802dd3f377 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -157,6 +157,7 @@
  enum mtk_iommu_plat {
  M4U_MT2712,
  M4U_MT6779,
+    M4U_MT6795,
  M4U_MT8167,
  M4U_MT8173,
  M4U_MT8183,
@@ -953,7 +954,8 @@ static int mtk_iommu_hw_init(const struct
mtk_iommu_data *data, unsigned int ban
   * Global control settings are in bank0. May re-init these
global registers
   * since no sure if there is bank0 consumers.
   */
-    if (data->plat_data->m4u_plat == M4U_MT8173) {
+    if (data->plat_data->m4u_plat == M4U_MT6795 ||
+    data->plat_data->m4u_plat == M4U_MT8173) {
  regval = F_MMU_PREFETCH_RT_REPLACE_MOD |
   F_MMU_TF_PROT_TO_PROGRAM_ADDR_MT8173;
  } else {
@@ -1138,6 +1140,9 @@ static int mtk_iommu_probe(struct
platform_device *pdev)
  case M4U_MT2712:
  p = "mediatek,mt2712-infracfg";
  break;
+    case M4U_MT6795:
+    p = "mediatek,mt6795-infracfg";
+    break;
  case M4U_MT8173:
  p = "mediatek,mt8173-infracfg";
  break;
@@ -1404,6 +1409,18 @@ static const struct mtk_iommu_plat_data
mt6779_data = {
  .larbid_remap  = {{0}, {1}, {2}, {3}, {5}, {7, 8}, {10}, {9}},
  };
+static const struct mtk_iommu_plat_data mt6795_data = {
+    .m4u_plat = M4U_MT6795,
+    .flags  = HAS_4GB_MODE | HAS_BCLK | RESET_AXI |
+    HAS_LEGACY_IVRP_PADDR | MTK_IOMMU_TYPE_MM,
+    .inv_sel_reg  = REG_MMU_INV_SEL_GEN1,
+    .banks_num    = 1,
+    .banks_enable = {true},
+    .iova_region  = single_domain,
+    .iova_region_nr = ARRAY_SIZE(single_domain),
+    .larbid_remap = {{0}, {1}, {2}, {3}, {4}}, /* Linear mapping.
*/
+};


This is nearly same with mt8173_data. mt8173 has one more larb than
mt6795, its larbid_remap is also ok for mt6795.



I think that we should be explicit about the larbid_remap property,
since mt6795 has one less larb, we should explicitly say that like
I did there... that's only for human readability I admit ... but,
still, I wouldn't want to see people thinking that MT6795 has 6 LARBs
because they've read that larbid_remap having 6 entries.


thus it looks we could use mt8173 as the backward compatible.
 compatible = "mediatek,mt6795-m4u",
  "mediatek,mt8173-m4u";

After this, the only thing is about "mediatek,mt6795-infracfg". we have
to try again with mediatek,mt6795-infracfg after mediatek,mt8173-
infracfg fail. I think we should allow the backward case in 4GB mode
judgment if we have.

What's your opinion? or some other suggestion?
Thanks.


I know, I may have a plan for that, but I wanted to have a good reason to
propose such a thing, as if it's just about two SoCs needing that, there
would be no good reason to get things done differently.

...so, in order to provide a good cleanup, we have two possible roads to
follow here: either we add a generic "mediatek,infracfg" compatible to the
infra node (but I don't like that), or we can do it like it was previously
done in mtk-pm-domains.c (I prefer that approach):

iommu: iommu@somewhere {
 ... something ...
 mediatek,infracfg = <&infracfg>;
};

infracfg = syscon_regmap_lookup_by_compatible(node, "mediatek,infracfg");
if (IS_ERR(infracfg)) {
 /* try with the older way */
 switch (...) {
 case  p = "mediatek,mt2712-infracfg";
 ... blah blah ...
 }
 /* legacy also failed, ouch! */
 if (IS_ERR(infracfg))
     return PTR_ERR(infracfg);
}

ret = regmap_read ... etc etc etc



I prefer that approach as well.

Regards,
Matthias


Cheers,
Angelo

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] iommu/dma: Fix check for error return from iommu_map_sg_atomic()

2022-05-17 Thread Niklas Schnelle

On Tue, 2022-05-17 at 10:36 +0200, Christoph Hellwig wrote:
> On Fri, May 13, 2022 at 05:39:48PM +0200, Niklas Schnelle wrote:
> > In __iommu_dma_alloc_noncontiguous() the value returned by
> > iommu_map_sg_atomic() is checked for being smaller than size. Before
> > commit ad8f36e4b6b1 ("iommu: return full error code from
> > iommu_map_sg[_atomic]()") this simply checked if the requested size was
> > successfully mapped.
> > 
> > After that commit iommu_map_sg_atomic() may also return a negative
> > error value. In principle this too would be covered by the existing
> > check. There is one problem however, as size is of type size_t while the
> > return type of iommu_map_sg_atomic() is now of type ssize_t the latter gets
> > converted to size_t and negative error values end up as very large
> > positive values making the check succeed. Fix this by making the return
> > type visible with a local variable and add an explicit cast to ssize_t.
> > 
> > Fixes: ad8f36e4b6b1 ("iommu: return full error code from 
> > iommu_map_sg[_atomic]()")
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Niklas Schnelle 
> 
> I don't see what the point of the newly added local variable is here.
> Just casting size should be all that is needed as far as I can tell.

No technical reason just found it easier to read and more descriptive.
I'll sent a v2 with just the cast, it does simplify the commit message.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] iommu/dma: Fix check for error return from iommu_map_sg_atomic()

2022-05-17 Thread Robin Murphy


On 2022-05-17 11:17, Niklas Schnelle wrote:

On Tue, 2022-05-17 at 10:36 +0200, Christoph Hellwig wrote:

On Fri, May 13, 2022 at 05:39:48PM +0200, Niklas Schnelle wrote:

In __iommu_dma_alloc_noncontiguous() the value returned by
iommu_map_sg_atomic() is checked for being smaller than size. Before
commit ad8f36e4b6b1 ("iommu: return full error code from
iommu_map_sg[_atomic]()") this simply checked if the requested size was
successfully mapped.

After that commit iommu_map_sg_atomic() may also return a negative
error value. In principle this too would be covered by the existing
check. There is one problem however, as size is of type size_t while the
return type of iommu_map_sg_atomic() is now of type ssize_t the latter gets
converted to size_t and negative error values end up as very large
positive values making the check succeed. Fix this by making the return
type visible with a local variable and add an explicit cast to ssize_t.

Fixes: ad8f36e4b6b1 ("iommu: return full error code from 
iommu_map_sg[_atomic]()")
Cc: sta...@vger.kernel.org
Signed-off-by: Niklas Schnelle 


I don't see what the point of the newly added local variable is here.
Just casting size should be all that is needed as far as I can tell.


No technical reason just found it easier to read and more descriptive.
I'll sent a v2 with just the cast, it does simplify the commit message.


Note that this is already fixed upstream, though:

https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?h=core&id=a3884774d731f03d3a3dd4fb70ec2d9341ceb39d

Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v2] iommu/dma: Fix check for error return from iommu_map_sg_atomic()

2022-05-17 Thread Niklas Schnelle

In __iommu_dma_alloc_noncontiguous() the return value of
iommu_map_sg_atomic() is treated as an error if it is smaller than size.
Before upstream commit ad8f36e4b6b1 ("iommu: return full error code from
iommu_map_sg[_atomic]()") this simply checked if the requested size was
successfully mapped.

After that commit iommu_map_sg_atomic() may also return a negative
error value. In principle this too would be covered by the existing
check. There is one problem however, as size is of type size_t while the
return type of iommu_map_sg_atomic() is now of type ssize_t the latter gets
converted to size_t and negative error values end up as very large
positive values making the check succeed even if an error was returned.
Fix this by casting size to ssize_t.

Fixes: ad8f36e4b6b1 ("iommu: return full error code from 
iommu_map_sg[_atomic]()")
Cc: sta...@vger.kernel.org
Signed-off-by: Niklas Schnelle 
---
v1 -> v2:
- Don't needlessly add a local variable

 drivers/iommu/dma-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 09f6e1c0f9c0..80db2aa5458c 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -814,7 +814,7 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct 
device *dev,
}
 
if (iommu_map_sg_atomic(domain, iova, sgt->sgl, sgt->orig_nents, ioprot)
-   < size)
+   < (ssize_t)size)
goto out_free_sg;
 
sgt->sgl->dma_address = iova;
-- 
2.32.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] iommu/dma: Fix check for error return from iommu_map_sg_atomic()

2022-05-17 Thread Niklas Schnelle

On Tue, 2022-05-17 at 11:18 +0100, Robin Murphy wrote:
> On 2022-05-17 11:17, Niklas Schnelle wrote:
> > On Tue, 2022-05-17 at 10:36 +0200, Christoph Hellwig wrote:
> > > On Fri, May 13, 2022 at 05:39:48PM +0200, Niklas Schnelle wrote:
> > > > In __iommu_dma_alloc_noncontiguous() the value returned by
> > > > iommu_map_sg_atomic() is checked for being smaller than size. Before
> > > > commit ad8f36e4b6b1 ("iommu: return full error code from
> > > > iommu_map_sg[_atomic]()") this simply checked if the requested size was
> > > > successfully mapped.
> > > > 
> > > > After that commit iommu_map_sg_atomic() may also return a negative
> > > > error value. In principle this too would be covered by the existing
> > > > check. There is one problem however, as size is of type size_t while the
> > > > return type of iommu_map_sg_atomic() is now of type ssize_t the latter 
> > > > gets
> > > > converted to size_t and negative error values end up as very large
> > > > positive values making the check succeed. Fix this by making the return
> > > > type visible with a local variable and add an explicit cast to ssize_t.
> > > > 
> > > > Fixes: ad8f36e4b6b1 ("iommu: return full error code from 
> > > > iommu_map_sg[_atomic]()")
> > > > Cc: sta...@vger.kernel.org
> > > > Signed-off-by: Niklas Schnelle 
> > > 
> > > I don't see what the point of the newly added local variable is here.
> > > Just casting size should be all that is needed as far as I can tell.
> > 
> > No technical reason just found it easier to read and more descriptive.
> > I'll sent a v2 with just the cast, it does simplify the commit message.
> 
> Note that this is already fixed upstream, though:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?h=core&id=a3884774d731f03d3a3dd4fb70ec2d9341ceb39d
> 
> Robin.

Ah oh well then nevermind and you can of course also ignore the v2 I
sent out a minute ago.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC PATCH] dma-iommu: Add iommu_dma_max_mapping_size()

2022-05-17 Thread Robin Murphy


On 2022-05-16 14:06, John Garry wrote:

For streaming DMA mappings involving an IOMMU and whose IOVA len regularly
exceeds the IOVA rcache upper limit (meaning that they are not cached),
performance can be reduced.

Add the IOMMU callback for DMA mapping API dma_max_mapping_size(), which
allows the drivers to know the mapping limit and thus limit the requested
IOVA lengths.

This resolves the performance issue originally reported in [0] for a SCSI
HBA driver which was regularly mapping SGLs which required IOVAs in
excess of the IOVA caching limit. In this case the block layer limits the
max sectors per request - as configured in __scsi_init_queue() - which
will limit the total SGL length the driver tries to map and in turn limits
IOVA lengths requested.

[0] 
https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/

Signed-off-by: John Garry 
---
Sending as an RFC as iommu_dma_max_mapping_size() is a soft limit, and not
a hard limit which I expect is the semantics of dma_map_ops.max_mapping_size


Indeed, sorry but NAK for this being nonsense. As I've said at least 
once before, if the unnecessary SAC address allocation attempt slows 
down your workload, make it not do that in the first place. If you don't 
like the existing command-line parameter then fine, there are plenty of 
other options, it just needs to be done in a way that doesn't break x86 
systems with dodgy firmware, as my first attempt turned out to.


Furthermore, if a particular SCSI driver doesn't benefit from mappings 
larger than 256KB, then that driver is also free to limit its own 
mapping size. There are other folks out there with use-cases for mapping 
*gigabytes* at once; you don't get to cripple the API and say that 
that's suddenly not allowed just because it happens to make your thing 
go faster, that's absurd.


Thanks,
Robin.


diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 09f6e1c0f9c0..e2d5205cde37 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1442,6 +1442,21 @@ static unsigned long iommu_dma_get_merge_boundary(struct 
device *dev)
return (1UL << __ffs(domain->pgsize_bitmap)) - 1;
  }
  
+static size_t iommu_dma_max_mapping_size(struct device *dev)

+{
+   struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+   struct iommu_dma_cookie *cookie;
+
+   if (!domain)
+   return 0;
+
+   cookie = domain->iova_cookie;
+   if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)
+   return 0;
+
+   return iova_rcache_range();
+}
+
  static const struct dma_map_ops iommu_dma_ops = {
.alloc  = iommu_dma_alloc,
.free   = iommu_dma_free,
@@ -1462,6 +1477,7 @@ static const struct dma_map_ops iommu_dma_ops = {
.map_resource   = iommu_dma_map_resource,
.unmap_resource = iommu_dma_unmap_resource,
.get_merge_boundary = iommu_dma_get_merge_boundary,
+   .max_mapping_size   = iommu_dma_max_mapping_size,
  };
  
  /*

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index db77aa675145..9f00b58d546e 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -26,6 +26,11 @@ static unsigned long iova_rcache_get(struct iova_domain 
*iovad,
  static void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain 
*iovad);
  static void free_iova_rcaches(struct iova_domain *iovad);
  
+unsigned long iova_rcache_range(void)

+{
+   return PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1);
+}
+
  static int iova_cpuhp_dead(unsigned int cpu, struct hlist_node *node)
  {
struct iova_domain *iovad;
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 320a70e40233..ae3e18d77e6c 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -79,6 +79,8 @@ static inline unsigned long iova_pfn(struct iova_domain 
*iovad, dma_addr_t iova)
  int iova_cache_get(void);
  void iova_cache_put(void);
  
+unsigned long iova_rcache_range(void);

+
  void free_iova(struct iova_domain *iovad, unsigned long pfn);
  void __free_iova(struct iova_domain *iovad, struct iova *iova);
  struct iova *alloc_iova(struct iova_domain *iovad, unsigned long size,
@@ -105,6 +107,11 @@ static inline void iova_cache_put(void)
  {
  }
  
+static inline unsigned long iova_rcache_range(void)

+{
+   return 0;
+}
+
  static inline void free_iova(struct iova_domain *iovad, unsigned long pfn)
  {
  }

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/2] dt-bindings: mediatek: Add bindings for MT6795 M4U

2022-05-17 Thread AngeloGioacchino Del Regno


Il 16/05/22 18:03, Rob Herring ha scritto:

On Fri, 13 May 2022 17:14:10 +0200, AngeloGioacchino Del Regno wrote:

Add bindings for the MediaTek Helio X10 (MT6795) IOMMU/M4U.

Signed-off-by: AngeloGioacchino Del Regno 

---
  .../bindings/iommu/mediatek,iommu.yaml|  3 +
  include/dt-bindings/memory/mt6795-larb-port.h | 96 +++
  2 files changed, 99 insertions(+)
  create mode 100644 include/dt-bindings/memory/mt6795-larb-port.h



Acked-by: Rob Herring 



Hello Rob,

I'm sad to say that, but I have to send a new version of this patch even though
you have acked it already... and I will have to drop your ack, as the changes to
the yaml patch will be a bit different, as we're adding support for a phandle
to mediatek,infracfg after some discussion about it on patch 2/2 of this series.

The mediatek,infracfg phandle addition will come as a different series, and this
patch (on v2) will add a conditional for:

  - if:

  properties:

compatible:

  contains:

enum:

  - mediatek,mt6795-m4u

then:

  required:

- mediatek,infracfg

Sorry about wasting your time on this v1.

Regards,
Angelo
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 6/7] x86/boot/tboot: Move tboot_force_iommu() to Intel IOMMU

2022-05-17 Thread Jason Gunthorpe via iommu

On Tue, May 17, 2022 at 10:05:43AM +0800, Baolu Lu wrote:
> Hi Jason,
> 
> On 2022/5/17 02:06, Jason Gunthorpe wrote:
> > > +static __init int tboot_force_iommu(void)
> > > +{
> > > + if (!tboot_enabled())
> > > + return 0;
> > > +
> > > + if (no_iommu || dmar_disabled)
> > > + pr_warn("Forcing Intel-IOMMU to enabled\n");
> > Unrelated, but when we are in the special secure IOMMU modes, do we
> > force ATS off? Specifically does the IOMMU reject TLPs that are marked
> > as translated?
> 
> Good question. From IOMMU point of view, I don't see a point to force
> ATS off, but trust boot involves lots of other things that I am not
> familiar with. Anybody else could help to answer?

ATS is inherently not secure, if a rouge device can issue a TLP with
the translated bit set then it has unlimited access to host memory.

Many of these trusted iommu scenarios rely on the idea that a rouge
device cannot DMA to arbitary system memory.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: Error when running fio against nvme-of rdma target (mlx5 driver)

2022-05-17 Thread Max Gurtovoy via iommu

Hi,

Can you please send the original scenario, setup details and dumps ?

I can't find it in my mailbox.

you can send it directly to me to avoid spam.

-Max.

On 5/17/2022 11:26 AM, Mark Ruijter wrote:

Hi Robin,

I ran into the exact same problem while testing with 4 connect-x6 cards, kernel 
5.18-rc6.

[ 4878.273016] nvme nvme0: Successfully reconnected (3 attempts)
[ 4879.122015] nvme nvme0: starting error recovery
[ 4879.122028] infiniband mlx5_4: mlx5_handle_error_cqe:332:(pid 0): WC error: 
4, Message: local protection error
[ 4879.122035] infiniband mlx5_4: dump_cqe:272:(pid 0): dump error cqe
[ 4879.122037] : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122039] 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122040] 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122040] 0030: 00 00 00 00 a9 00 56 04 00 00 00 ed 0d da ff e2
[ 4881.085547] nvme nvme3: Reconnecting in 10 seconds...

I assume this means that the problem has still not been resolved?
If so, I'll try to diagnose the problem.

Thanks,

--Mark

On 11/02/2022, 12:35, "Linux-nvme on behalf of Robin Murphy" 
 wrote:

 On 2022-02-10 23:58, Martin Oliveira wrote:
 > On 2/9/22 1:41 AM, Chaitanya Kulkarni wrote:
 >> On 2/8/22 6:50 PM, Martin Oliveira wrote:
 >>> Hello,
 >>>
 >>> We have been hitting an error when running IO over our nvme-of setup, 
using the mlx5 driver and we are wondering if anyone has seen anything similar/has any 
suggestions.
 >>>
 >>> Both initiator and target are AMD EPYC 7502 machines connected over 
RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single 
NVMe fabrics device, one physical SSD per namespace.
 >>>
 >>
 >> Thanks for reporting this, if you can bisect the problem on your setup
 >> it will help others to help you better.
 >>
 >> -ck
 >
 > Hi Chaitanya,
 >
 > I went back to a kernel as old as 4.15 and the problem was still there, 
so I don't know of a good commit to start from.
 >
 > I also learned that I can reproduce this with as little as 3 cards and I 
updated the firmware on the Mellanox cards to the latest version.
 >
 > I'd be happy to try any tests if someone has any suggestions.

 The IOMMU is probably your friend here - one thing that might be worth
 trying is capturing the iommu:map and iommu:unmap tracepoints to see if
 the address reported in subsequent IOMMU faults was previously mapped as
 a valid DMA address (be warned that there will likely be a *lot* of
 trace generated). With 5.13 or newer, booting with "iommu.forcedac=1"
 should also make it easier to tell real DMA IOVAs from rogue physical
 addresses or other nonsense, as real DMA addresses should then look more
 like 0x24d08000.

 That could at least help narrow down whether it's some kind of
 use-after-free race or a completely bogus address creeping in somehow.

 Robin.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC PATCH] dma-iommu: Add iommu_dma_max_mapping_size()

2022-05-17 Thread John Garry via iommu


On 17/05/2022 11:40, Robin Murphy wrote:

On 2022-05-16 14:06, John Garry wrote:
For streaming DMA mappings involving an IOMMU and whose IOVA len 
regularly

exceeds the IOVA rcache upper limit (meaning that they are not cached),
performance can be reduced.

Add the IOMMU callback for DMA mapping API dma_max_mapping_size(), which
allows the drivers to know the mapping limit and thus limit the requested
IOVA lengths.

This resolves the performance issue originally reported in [0] for a SCSI
HBA driver which was regularly mapping SGLs which required IOVAs in
excess of the IOVA caching limit. In this case the block layer limits the
max sectors per request - as configured in __scsi_init_queue() - which
will limit the total SGL length the driver tries to map and in turn 
limits

IOVA lengths requested.

[0] 
https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/ 



Signed-off-by: John Garry 
---
Sending as an RFC as iommu_dma_max_mapping_size() is a soft limit, and 
not
a hard limit which I expect is the semantics of 
dma_map_ops.max_mapping_size


Indeed, sorry but NAK for this being nonsense. As I've said at least 
once before, if the unnecessary SAC address allocation attempt slows 
down your workload, make it not do that in the first place. If you don't 
like the existing command-line parameter then fine, > there are plenty of
other options, it just needs to be done in a way that doesn't break x86 
systems with dodgy firmware, as my first attempt turned out to.


Sorry, but I am not interested in this. It was discussed in Jan last 
year without any viable solution.


Anyway we still have the long-term IOVA aging issue, and requesting 
non-cached IOVAs is involved in that. So I would rather keep the SCSI 
driver to requesting cached IOVAs all the time.


I did try to do it the other way around - configuring the IOVA caching 
range according to the drivers requirement but that got nowhere.




Furthermore, if a particular SCSI driver doesn't benefit from mappings 
larger than 256KB, then that driver is also free to limit its own 
mapping size. There are other folks out there with use-cases for mapping 
*gigabytes* at once; you don't get to cripple the API and say that 
that's suddenly not allowed just because it happens to make your thing 
go faster, that's absurd.


I'd say less catastrophically slow, not faster.

So how to inform the SCSI driver of this caching limit then so that it 
may limit the SGL length?


Thanks,
John
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC PATCH] dma-iommu: Add iommu_dma_max_mapping_size()

2022-05-17 Thread Robin Murphy


On 2022-05-17 12:26, John Garry wrote:

On 17/05/2022 11:40, Robin Murphy wrote:

On 2022-05-16 14:06, John Garry wrote:
For streaming DMA mappings involving an IOMMU and whose IOVA len 
regularly

exceeds the IOVA rcache upper limit (meaning that they are not cached),
performance can be reduced.

Add the IOMMU callback for DMA mapping API dma_max_mapping_size(), which
allows the drivers to know the mapping limit and thus limit the 
requested

IOVA lengths.

This resolves the performance issue originally reported in [0] for a 
SCSI

HBA driver which was regularly mapping SGLs which required IOVAs in
excess of the IOVA caching limit. In this case the block layer limits 
the

max sectors per request - as configured in __scsi_init_queue() - which
will limit the total SGL length the driver tries to map and in turn 
limits

IOVA lengths requested.

[0] 
https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/ 



Signed-off-by: John Garry 
---
Sending as an RFC as iommu_dma_max_mapping_size() is a soft limit, 
and not
a hard limit which I expect is the semantics of 
dma_map_ops.max_mapping_size


Indeed, sorry but NAK for this being nonsense. As I've said at least 
once before, if the unnecessary SAC address allocation attempt slows 
down your workload, make it not do that in the first place. If you 
don't like the existing command-line parameter then fine, > there are 
plenty of
other options, it just needs to be done in a way that doesn't break 
x86 systems with dodgy firmware, as my first attempt turned out to.


Sorry, but I am not interested in this. It was discussed in Jan last 
year without any viable solution.


Er, OK, if you're not interested in solving that problem I don't see why 
you'd bring it up, but hey ho. *I* still think it's important, so I 
guess I'll revive my old patch with a CONFIG_X86 bodge and have another 
go at some point.


Anyway we still have the long-term IOVA aging issue, and requesting 
non-cached IOVAs is involved in that. So I would rather keep the SCSI 
driver to requesting cached IOVAs all the time.


I did try to do it the other way around - configuring the IOVA caching 
range according to the drivers requirement but that got nowhere.


FWIW I thought that all looked OK, it just kept getting drowned out by 
more critical things in my inbox so I hoped someone else might comment. 
If it turns out that I've become the de-facto IOVA maintainer in 
everyone else's minds now and they're all waiting for my word then fair 
enough, I just need to know and reset my expectations accordingly. Joerg?


Furthermore, if a particular SCSI driver doesn't benefit from mappings 
larger than 256KB, then that driver is also free to limit its own 
mapping size. There are other folks out there with use-cases for 
mapping *gigabytes* at once; you don't get to cripple the API and say 
that that's suddenly not allowed just because it happens to make your 
thing go faster, that's absurd.


I'd say less catastrophically slow, not faster.

So how to inform the SCSI driver of this caching limit then so that it 
may limit the SGL length?


Driver-specific mechanism; block-layer-specific mechanism; redefine this 
whole API to something like dma_opt_mapping_size(), as a limit above 
which mappings might become less efficient or start to fail (callback to 
my thoughts on [1] as well, I suppose); many options. Just not imposing 
a ridiculously low *maximum* on everyone wherein mapping calls "should 
not be larger than the returned value" when that's clearly bollocks.


Cheers,
Robin.

[1] 
https://lore.kernel.org/linux-iommu/20220510142109.38-1-ltyker...@gmail.com/

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 2/5] iommu: Add blocking_domain_ops field in iommu_ops

2022-05-17 Thread Robin Murphy


On 2022-05-17 03:37, Baolu Lu wrote:

Hi Jason,

On 2022/5/16 21:57, Jason Gunthorpe wrote:

On Mon, May 16, 2022 at 12:22:08PM +0100, Robin Murphy wrote:

On 2022-05-16 02:57, Lu Baolu wrote:

Each IOMMU driver must provide a blocking domain ops. If the hardware
supports detaching domain from device, setting blocking domain equals
detaching the existing domain from the deivce. Otherwise, an UNMANAGED
domain without any mapping will be used instead.
Unfortunately that's backwards - most of the implementations of 
.detach_dev
are disabling translation entirely, meaning the device ends up 
effectively

in passthrough rather than blocked.

Ideally we'd convert the detach_dev of every driver into either
a blocking or identity domain. The trick is knowing which is which..


I am still a bit puzzled about how the blocking_domain should be used 
when it is extended to support ->set_dev_pasid.


If it's a blocking domain, the IOMMU driver knows that setting the
blocking domain to device pasid means detaching the existing one.

But if it's an identity domain, how could the IOMMU driver choose
between:

  - setting the input domain to the pasid on device; or,
  - detaching the existing domain.

I've ever thought about below solutions:

- Checking the domain types and dispatching them to different
   operations.
- Using different blocking domains for different types of domains.

But both look rough.



Guessing going down the list:
  apple dart - blocking, detach_dev calls apple_dart_hw_disable_dma() 
same as

   IOMMU_DOMAIN_BLOCKED
  [I wonder if this drive ris wrong in other ways though because
    I dont see a remove_streams in attach_dev]
  exynos - this seems to disable the 'sysmmu' so I'm guessing this is
   identity
  iommu-vmsa - Comment says 'disable mmu translaction' so I'm guessing
   this is idenity
  mkt_v1 - Code looks similar to mkt, which is probably identity.
  rkt - No idea
  sprd - No idea
  sun50i - This driver confusingly treats identity the same as
   unmanaged, seems wrong, no idea.
  amd - Not sure, clear_dte_entry() seems to set translation on but 
points

    the PTE to 0 ? Based on the spec table 8 I would have expected
    TV to be clear which would be blocking. Maybe a bug??
  arm smmu qcomm - not sure
  intel - blocking

These doesn't support default domains, so detach_dev should return
back to DMA API ownership, which is either identity or something weird:
  fsl_pamu - identity due to the PPC use of dma direct
  msm
  mkt
  omap
  s390 - platform DMA ops
  terga-gart - Usually something called a GART would be 0 length once
   disabled, guessing blocking?
  tegra-smmu

So, the approach here should be to go driver by driver and convert
detach_dev to either identity, blocking or just delete it entirely,
excluding the above 7 that don't support default domains. And get acks
from the driver owners.



Agreed. There seems to be a long way to go. I am wondering if we could
decouple this refactoring from my new SVA API work? We can easily switch
.detach_dev_pasid to using blocking domain later.


FWIW from my point of view I'm happy with having a .detach_dev_pasid op 
meaning implicitly-blocked access for now. On SMMUv3, PASIDs don't mix 
with our current notion of IOMMU_DOMAIN_IDENTITY (nor the potential one 
for IOMMU_DOMAIN_BLOCKED), so giving PASIDs functional symmetry with 
devices would need significantly more work anyway.


Thanks,
Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 2/5] iommu: Add blocking_domain_ops field in iommu_ops

2022-05-17 Thread Jason Gunthorpe via iommu

On Tue, May 17, 2022 at 10:37:55AM +0800, Baolu Lu wrote:
> Hi Jason,
> 
> On 2022/5/16 21:57, Jason Gunthorpe wrote:
> > On Mon, May 16, 2022 at 12:22:08PM +0100, Robin Murphy wrote:
> > > On 2022-05-16 02:57, Lu Baolu wrote:
> > > > Each IOMMU driver must provide a blocking domain ops. If the hardware
> > > > supports detaching domain from device, setting blocking domain equals
> > > > detaching the existing domain from the deivce. Otherwise, an UNMANAGED
> > > > domain without any mapping will be used instead.
> > > Unfortunately that's backwards - most of the implementations of 
> > > .detach_dev
> > > are disabling translation entirely, meaning the device ends up effectively
> > > in passthrough rather than blocked.
> > Ideally we'd convert the detach_dev of every driver into either
> > a blocking or identity domain. The trick is knowing which is which..
> 
> I am still a bit puzzled about how the blocking_domain should be used when
> it is extended to support ->set_dev_pasid.
> 
> If it's a blocking domain, the IOMMU driver knows that setting the
> blocking domain to device pasid means detaching the existing one.

> But if it's an identity domain, how could the IOMMU driver choose
> between:

The identity domain would never be used for detaching a PASID. The
reason it is in this explanation is because every detach_dev op should
be deleted - and the code that it was doing can remain in the driver
as either a blocking or identity domain.

> > So, the approach here should be to go driver by driver and convert
> > detach_dev to either identity, blocking or just delete it entirely,
> > excluding the above 7 that don't support default domains. And get acks
> > from the driver owners.
 
> Agreed. There seems to be a long way to go. I am wondering if we could
> decouple this refactoring from my new SVA API work? We can easily switch
> .detach_dev_pasid to using blocking domain later.

Well, PASID has nothing to do with this. PASID enabled drivers would
just need:
 - Must be using default domains and must have a NULL detach_dev op
 - Must provide a blocking_domain
 - Must provide set_dev_pasid for the unmanaged and blocking domains
   it creates

That is it.

Realigning the existing drivers for their base RID support is another
task.

The appeal of this is that it matches how base RID support should look
and doesn't continue the confusing appearance that there is a strictly
paired attach/detach operation.

Any domain can be set ony and RID/PASID at any time.

Drivers intended to be used with VFIO really want a blocking_domain
anyhow for efficiency.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 2/5] iommu: Add blocking_domain_ops field in iommu_ops

2022-05-17 Thread Jason Gunthorpe via iommu

On Tue, May 17, 2022 at 01:43:03PM +0100, Robin Murphy wrote:

> FWIW from my point of view I'm happy with having a .detach_dev_pasid op
> meaning implicitly-blocked access for now. 

If this is the path then lets not call it attach/detach
please. 'set_dev_pasid' and 'set_dev_blocking_pasid' are clearer
names.

> On SMMUv3, PASIDs don't mix with our current notion of
> IOMMU_DOMAIN_IDENTITY (nor the potential one for
> IOMMU_DOMAIN_BLOCKED), so giving PASIDs functional symmetry with
> devices would need significantly more work anyway.

There is no extra work in the driver, beyond SMMU having to implement
a blocking domain. The blocking domain's set_dev_pasid op simply is
whatever set_dev_blocking_pasid would have done on the unmanaged
domain.

identity doesn't come into this, identity domains should have a NULL
set_dev_pasid op if the driver can't support using it on a PASID.

IMHO blocking_domain->ops->set_dev_pasid() is just a more logical name
than domain->ops->set_dev_blocking_pasid() - especially since VFIO
would like drivers to implement blocking domain anyhow.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 3/8] dt-bindings: iommu: mediatek: Add mediatek, pericfg phandle

2022-05-17 Thread AngeloGioacchino Del Regno

Add property "mediatek,pericfg" to let the mtk_iommu driver retrieve
a phandle to the pericfg syscon instead of performing a per-soc
compatible lookup, as it was also done with infracfg.

Signed-off-by: AngeloGioacchino Del Regno 

---
 Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml 
b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
index 78c72c22740b..a6cf9678271f 100644
--- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
+++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
@@ -116,6 +116,10 @@ properties:
   Refer to bindings/memory-controllers/mediatek,smi-larb.yaml. It must sort
   according to the local arbiter index, like larb0, larb1, larb2...
 
+  mediatek,pericfg:
+$ref: "/schemas/types.yaml#/definitions/phandle"
+description: The phandle to the mediatek pericfg syscon
+
   '#iommu-cells':
 const: 1
 description: |
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 7/8] dt-bindings: iommu: mediatek: Require mediatek, infracfg for mt2712/8173

2022-05-17 Thread AngeloGioacchino Del Regno

Both MT2712 and MT8173 got a mediatek,infracfg phandle: add that to
the required properties for these SoCs to deprecate the old way of
looking for SoC-specific infracfg compatible in the entire devicetree.

Signed-off-by: AngeloGioacchino Del Regno 

---
 .../devicetree/bindings/iommu/mediatek,iommu.yaml| 12 
 1 file changed, 12 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml 
b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
index a6cf9678271f..17d78b17027a 100644
--- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
+++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
@@ -175,6 +175,18 @@ allOf:
   required:
 - power-domains
 
+  - if:
+  properties:
+compatible:
+  contains:
+enum:
+  - mediatek,mt2712-m4u
+  - mediatek,mt8173-m4u
+
+then:
+  required:
+- mediatek,infracfg
+
   - if: # The IOMMUs don't have larbs.
   not:
 properties:
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 2/8] iommu: mtk_iommu: Lookup phandle to retrieve syscon to infracfg

2022-05-17 Thread AngeloGioacchino Del Regno

This driver will get support for more SoCs and the list of infracfg
compatibles is expected to grow: in order to prevent getting this
situation out of control and see a long list of compatible strings,
add support to retrieve a handle to infracfg's regmap through a
new "mediatek,infracfg" phandle.

In order to keep retrocompatibility with older devicetrees, the old
way is kept in place, but also a dev_warn() was added to advertise
this change in hope that the user will see it and eventually update
the devicetree if this is possible.

Signed-off-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 40 +--
 1 file changed, 26 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 71b2ace74cd6..cfaaa98d2b50 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -1134,22 +1134,34 @@ static int mtk_iommu_probe(struct platform_device *pdev)
data->protect_base = ALIGN(virt_to_phys(protect), MTK_PROTECT_PA_ALIGN);
 
if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_4GB_MODE)) {
-   switch (data->plat_data->m4u_plat) {
-   case M4U_MT2712:
-   p = "mediatek,mt2712-infracfg";
-   break;
-   case M4U_MT8173:
-   p = "mediatek,mt8173-infracfg";
-   break;
-   default:
-   p = NULL;
+   infracfg = syscon_regmap_lookup_by_phandle(dev->of_node, 
"mediatek,infracfg");
+   if (IS_ERR(infracfg)) {
+   dev_warn(dev, "Cannot find phandle to 
mediatek,infracfg:"
+ " Please update your devicetree.\n");
+   /*
+* Legacy devicetrees will not specify a phandle to
+* mediatek,infracfg: in that case, we use the older
+* way to retrieve a syscon to infra.
+*
+* This is for retrocompatibility purposes only, hence
+* no more compatibles shall be added to this.
+*/
+   switch (data->plat_data->m4u_plat) {
+   case M4U_MT2712:
+   p = "mediatek,mt2712-infracfg";
+   break;
+   case M4U_MT8173:
+   p = "mediatek,mt8173-infracfg";
+   break;
+   default:
+   p = NULL;
+   }
+
+   infracfg = syscon_regmap_lookup_by_compatible(p);
+   if (IS_ERR(infracfg))
+   return PTR_ERR(infracfg);
}
 
-   infracfg = syscon_regmap_lookup_by_compatible(p);
-
-   if (IS_ERR(infracfg))
-   return PTR_ERR(infracfg);
-
ret = regmap_read(infracfg, REG_INFRA_MISC, &val);
if (ret)
return ret;
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 6/8] arm64: dts: mediatek: mt2712e: Add mediatek, infracfg phandle for IOMMU

2022-05-17 Thread AngeloGioacchino Del Regno

The IOMMU driver now looks for the "mediatek,infracfg" phandle as a
new way to retrieve a syscon to that:
even though the old way is retained, it has been deprecated and the
driver will write a message in kmsg advertising to use the phandle
way instead.

For this reason, assign the right phandle to mediatek,infracfg in
the iommu node.

Signed-off-by: AngeloGioacchino Del Regno 

---
 arch/arm64/boot/dts/mediatek/mt2712e.dtsi | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/boot/dts/mediatek/mt2712e.dtsi 
b/arch/arm64/boot/dts/mediatek/mt2712e.dtsi
index 623eb3beabf2..4797537cb368 100644
--- a/arch/arm64/boot/dts/mediatek/mt2712e.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt2712e.dtsi
@@ -329,6 +329,7 @@ iommu0: iommu@10205000 {
interrupts = ;
clocks = <&infracfg CLK_INFRA_M4U>;
clock-names = "bclk";
+   mediatek,infracfg = <&infracfg>;
mediatek,larbs = <&larb0>, <&larb1>, <&larb2>,
 <&larb3>, <&larb6>;
#iommu-cells = <1>;
@@ -346,6 +347,7 @@ iommu1: iommu@1020a000 {
interrupts = ;
clocks = <&infracfg CLK_INFRA_M4U>;
clock-names = "bclk";
+   mediatek,infracfg = <&infracfg>;
mediatek,larbs = <&larb4>, <&larb5>, <&larb7>;
#iommu-cells = <1>;
};
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 1/8] dt-bindings: iommu: mediatek: Add mediatek, infracfg phandle

2022-05-17 Thread AngeloGioacchino Del Regno

Add property "mediatek,infracfg" to let the mtk_iommu driver retrieve
a phandle to the infracfg syscon instead of performing a per-soc
compatible lookup.

Signed-off-by: AngeloGioacchino Del Regno 

---
 Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml 
b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
index 2ae3bbad7f1a..78c72c22740b 100644
--- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
+++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
@@ -101,6 +101,10 @@ properties:
 items:
   - const: bclk
 
+  mediatek,infracfg:
+$ref: "/schemas/types.yaml#/definitions/phandle"
+description: The phandle to the mediatek infracfg syscon
+
   mediatek,larbs:
 $ref: /schemas/types.yaml#/definitions/phandle-array
 minItems: 1
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 4/8] iommu: mtk_iommu: Lookup phandle to retrieve syscon to pericfg

2022-05-17 Thread AngeloGioacchino Del Regno

On some SoCs (of which only MT8195 is supported at the time of writing),
the "R" and "W" (I/O) enable bits for the IOMMUs are in the pericfg_ao
register space and not in the IOMMU space: as it happened already with
infracfg, it is expected that this list will grow.

Instead of specifying pericfg compatibles on a per-SoC basis, following
what was done with infracfg, let's lookup the syscon by phandle instead.
Also following the previous infracfg change, add a warning for outdated
devicetrees, in hope that the user will take action.

Signed-off-by: AngeloGioacchino Del Regno 

---
 drivers/iommu/mtk_iommu.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index cfaaa98d2b50..c7e2d836199e 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -138,6 +138,8 @@
 /* PM and clock always on. e.g. infra iommu */
 #define PM_CLK_AO  BIT(15)
 #define IFA_IOMMU_PCIE_SUPPORT BIT(16)
+/* IOMMU I/O (r/w) is enabled using PERICFG_IOMMU_1 register */
+#define HAS_PERI_IOMMU1_REGBIT(17)
 
 #define MTK_IOMMU_HAS_FLAG_MASK(pdata, _x, mask)   \
pdata)->flags) & (mask)) == (_x))
@@ -187,7 +189,6 @@ struct mtk_iommu_plat_data {
u32 flags;
u32 inv_sel_reg;
 
-   char*pericfg_comp_str;
struct list_head*hw_list;
unsigned intiova_region_nr;
const struct mtk_iommu_iova_region  *iova_region;
@@ -1214,14 +1215,19 @@ static int mtk_iommu_probe(struct platform_device *pdev)
goto out_runtime_disable;
}
} else if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_INFRA) &&
-  data->plat_data->pericfg_comp_str) {
-   infracfg = 
syscon_regmap_lookup_by_compatible(data->plat_data->pericfg_comp_str);
-   if (IS_ERR(infracfg)) {
-   ret = PTR_ERR(infracfg);
-   goto out_runtime_disable;
-   }
+  MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_PERI_IOMMU1_REG)) {
+   data->pericfg = syscon_regmap_lookup_by_phandle(dev->of_node, 
"mediatek,pericfg");
+   if (IS_ERR(data->pericfg)) {
+   dev_warn(dev, "Cannot find phandle to mediatek,pericfg:"
+ " Please update your devicetree.\n");
 
-   data->pericfg = infracfg;
+   p = "mediatek,mt8195-pericfg_ao";
+   data->pericfg = syscon_regmap_lookup_by_compatible(p);
+   if (IS_ERR(data->pericfg)) {
+   ret = PTR_ERR(data->pericfg);
+   goto out_runtime_disable;
+   }
+   }
}
 
platform_set_drvdata(pdev, data);
@@ -1480,8 +1486,8 @@ static const struct mtk_iommu_plat_data mt8192_data = {
 static const struct mtk_iommu_plat_data mt8195_data_infra = {
.m4u_plat = M4U_MT8195,
.flags= WR_THROT_EN | DCM_DISABLE | STD_AXI_MODE | 
PM_CLK_AO |
-   MTK_IOMMU_TYPE_INFRA | IFA_IOMMU_PCIE_SUPPORT,
-   .pericfg_comp_str = "mediatek,mt8195-pericfg_ao",
+   HAS_PERI_IOMMU1_REG | MTK_IOMMU_TYPE_INFRA |
+   IFA_IOMMU_PCIE_SUPPORT,
.inv_sel_reg  = REG_MMU_INV_SEL_GEN2,
.banks_num= 5,
.banks_enable = {true, false, false, false, true},
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 5/8] arm64: dts: mediatek: mt8173: Add mediatek, infracfg phandle for IOMMU

2022-05-17 Thread AngeloGioacchino Del Regno

The IOMMU driver now looks for the "mediatek,infracfg" phandle as a
new way to retrieve a syscon to that:
even though the old way is retained, it has been deprecated and the
driver will write a message in kmsg advertising to use the phandle
way instead.

For this reason, assign the right phandle to mediatek,infracfg in
the iommu node.

Signed-off-by: AngeloGioacchino Del Regno 

---
 arch/arm64/boot/dts/mediatek/mt8173.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/mediatek/mt8173.dtsi 
b/arch/arm64/boot/dts/mediatek/mt8173.dtsi
index 40d7b47fc52e..825a3c670373 100644
--- a/arch/arm64/boot/dts/mediatek/mt8173.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8173.dtsi
@@ -588,6 +588,7 @@ iommu: iommu@10205000 {
interrupts = ;
clocks = <&infracfg CLK_INFRA_M4U>;
clock-names = "bclk";
+   mediatek,infracfg = <&infracfg>;
mediatek,larbs = <&larb0>, <&larb1>, <&larb2>,
 <&larb3>, <&larb4>, <&larb5>;
#iommu-cells = <1>;
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 8/8] dt-bindings: iommu: mediatek: Require mediatek, pericfg for mt8195-infra

2022-05-17 Thread AngeloGioacchino Del Regno

The MT8195 SoC has IOMMU related registers in the pericfg_ao iospace:
require a phandle to that.

Signed-off-by: AngeloGioacchino Del Regno 

---

Note for Rob: as of now, there's no iommu node in upstream mt8195 devicetrees 
yet.

 .../devicetree/bindings/iommu/mediatek,iommu.yaml  | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml 
b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
index 17d78b17027a..2441c2e8e55d 100644
--- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
+++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
@@ -187,6 +187,16 @@ allOf:
   required:
 - mediatek,infracfg
 
+  - if:
+  properties:
+compatible:
+  contains:
+const: mediatek,mt8195-iommu-infra
+
+then:
+  required:
+- mediatek,pericfg
+
   - if: # The IOMMUs don't have larbs.
   not:
 properties:
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 0/8] mtk_iommu: Specify phandles to infracfg and pericfg

2022-05-17 Thread AngeloGioacchino Del Regno

The IOMMU has registers in the infracfg and/or pericfg iospaces: as
for the currently supported SoCs, MT2712 and MT8173 need a phandle to
infracfg, while MT8195 needs one to pericfg.

Before this change, the driver was checking for a SoC-specific infra/peri
compatible but, sooner or later, these lists are going to grow a lot...
...and this is why it was chosen to add phandles (as it was done with
some other drivers already - look at mtk-pm-domains, mt8192-afe

Please note that, while it was necessary to update the devicetrees for
MT8173 and MT2712e, there was no update for MT8195 because there is no
IOMMU node in there yet.

AngeloGioacchino Del Regno (8):
  dt-bindings: iommu: mediatek: Add mediatek,infracfg phandle
  iommu: mtk_iommu: Lookup phandle to retrieve syscon to infracfg
  dt-bindings: iommu: mediatek: Add mediatek,pericfg phandle
  iommu: mtk_iommu: Lookup phandle to retrieve syscon to pericfg
  arm64: dts: mediatek: mt8173: Add mediatek,infracfg phandle for IOMMU
  arm64: dts: mediatek: mt2712e: Add mediatek,infracfg phandle for IOMMU
  dt-bindings: iommu: mediatek: Require mediatek,infracfg for
mt2712/8173
  dt-bindings: iommu: mediatek: Require mediatek,pericfg for
mt8195-infra

 .../bindings/iommu/mediatek,iommu.yaml| 30 +
 arch/arm64/boot/dts/mediatek/mt2712e.dtsi |  2 +
 arch/arm64/boot/dts/mediatek/mt8173.dtsi  |  1 +
 drivers/iommu/mtk_iommu.c | 66 ---
 4 files changed, 75 insertions(+), 24 deletions(-)

-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC PATCH] dma-iommu: Add iommu_dma_max_mapping_size()

2022-05-17 Thread John Garry via iommu


On 17/05/2022 13:02, Robin Murphy wrote:


Indeed, sorry but NAK for this being nonsense. As I've said at least 
once before, if the unnecessary SAC address allocation attempt slows 
down your workload, make it not do that in the first place. If you 
don't like the existing command-line parameter then fine, > there are 
plenty of
other options, it just needs to be done in a way that doesn't break 
x86 systems with dodgy firmware, as my first attempt turned out to.


Sorry, but I am not interested in this. It was discussed in Jan last 
year without any viable solution.


Er, OK, if you're not interested in solving that problem I don't see why 
you'd bring it up, but hey ho. *I* still think it's important, so I 
guess I'll revive my old patch with a CONFIG_X86 bodge and have another 
go at some point.


Let me rephrase, I would be happy to help fix that problem if we really 
can get it fixed, however for my problem it's better to try to get the 
SCSI driver to stop requesting uncached IOVAs foremost.




Anyway we still have the long-term IOVA aging issue, and requesting 
non-cached IOVAs is involved in that. So I would rather keep the SCSI 
driver to requesting cached IOVAs all the time.


I did try to do it the other way around - configuring the IOVA caching 
range according to the drivers requirement but that got nowhere.


Note that this is still not a final solution as it's not always viable 
to ask a user to unbind + bind the driver.




FWIW I thought that all looked OK, it just kept getting drowned out by 
more critical things in my inbox so I hoped someone else might comment. 
If it turns out that I've become the de-facto IOVA maintainer in 
everyone else's minds now and they're all waiting for my word then fair 
enough, I just need to know and reset my expectations accordingly. Joerg?


It would be great to see an improvement here...



Furthermore, if a particular SCSI driver doesn't benefit from 
mappings larger than 256KB, then that driver is also free to limit 
its own mapping size. There are other folks out there with use-cases 
for mapping *gigabytes* at once; you don't get to cripple the API and 
say that that's suddenly not allowed just because it happens to make 
your thing go faster, that's absurd.


I'd say less catastrophically slow, not faster.

So how to inform the SCSI driver of this caching limit then so that it 
may limit the SGL length?


Driver-specific mechanism; block-layer-specific mechanism; redefine this 
whole API to something like dma_opt_mapping_size(), as a limit above 
which mappings might become less efficient or start to fail (callback to 
my thoughts on [1] as well, I suppose); many options.


ok, fine.

Just not imposing 
a ridiculously low *maximum* on everyone wherein mapping calls "should 
not be larger than the returned value" when that's clearly bollocks.


I agree that this change is in violation as the documentation clearly 
implies a hard limit.


However, FWIW, from looking at users of dma_max_mapping_size(), they 
seem to use in a similar way to SCSI/block layer core, i.e. use this 
value to limit the max SGL total len per IO command. And not as a method 
to learn max DMA consistent mappings size for ring buffers, etc.


Anyway I'll look at dma_opt_mapping_size() but I am not sure how keen 
Christoph will be on this... it is strange to introduce that API due to 
peculiarity of the IOVA allocator.





[1] 
https://lore.kernel.org/linux-iommu/20220510142109.38-1-ltyker...@gmail.com/


Thanks,
John
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 2/8] iommu: mtk_iommu: Lookup phandle to retrieve syscon to infracfg

2022-05-17 Thread Robin Murphy


On 2022-05-17 14:21, AngeloGioacchino Del Regno wrote:

This driver will get support for more SoCs and the list of infracfg
compatibles is expected to grow: in order to prevent getting this
situation out of control and see a long list of compatible strings,
add support to retrieve a handle to infracfg's regmap through a
new "mediatek,infracfg" phandle.

In order to keep retrocompatibility with older devicetrees, the old
way is kept in place, but also a dev_warn() was added to advertise
this change in hope that the user will see it and eventually update
the devicetree if this is possible.

Signed-off-by: AngeloGioacchino Del Regno 

---
  drivers/iommu/mtk_iommu.c | 40 +--
  1 file changed, 26 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 71b2ace74cd6..cfaaa98d2b50 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -1134,22 +1134,34 @@ static int mtk_iommu_probe(struct platform_device *pdev)
data->protect_base = ALIGN(virt_to_phys(protect), MTK_PROTECT_PA_ALIGN);
  
  	if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_4GB_MODE)) {

-   switch (data->plat_data->m4u_plat) {
-   case M4U_MT2712:
-   p = "mediatek,mt2712-infracfg";
-   break;
-   case M4U_MT8173:
-   p = "mediatek,mt8173-infracfg";
-   break;
-   default:
-   p = NULL;
+   infracfg = syscon_regmap_lookup_by_phandle(dev->of_node, 
"mediatek,infracfg");
+   if (IS_ERR(infracfg)) {
+   dev_warn(dev, "Cannot find phandle to 
mediatek,infracfg:"
+ " Please update your devicetree.\n");


Is this really a dev_warn-level problem? There's no functional impact, 
given that we can't stop supporting the original binding any time soon, 
if ever, so I suspect this is more likely to just annoy users and CI 
systems than effect any significant change.



+   /*
+* Legacy devicetrees will not specify a phandle to
+* mediatek,infracfg: in that case, we use the older
+* way to retrieve a syscon to infra.
+*
+* This is for retrocompatibility purposes only, hence
+* no more compatibles shall be added to this.
+*/
+   switch (data->plat_data->m4u_plat) {
+   case M4U_MT2712:
+   p = "mediatek,mt2712-infracfg";
+   break;
+   case M4U_MT8173:
+   p = "mediatek,mt8173-infracfg";
+   break;
+   default:
+   p = NULL;
+   }
+
+   infracfg = syscon_regmap_lookup_by_compatible(p);


Would it not make sense to punt this over to the same mechanism as for 
pericfg, such that it simplifies down to something like:


if (IS_ERR(infracfg) && plat_data->infracfg) {
infracfg = 
syscon_regmap_lookup_by_compatible(plat_data->infracfg);
...
}

?

TBH if we're still going to have a load of per-SoC data in the driver 
anyway then I don't see that we really gain much by delegating one 
aspect of it to DT, but meh. I would note that with the phandle 
approach, you still need some *other* flag in the driver to know whether 
a phandle is expected to be present or not, whereas a NULL vs. non-NULL 
string is at least neatly self-describing.


Robin.


+   if (IS_ERR(infracfg))
+   return PTR_ERR(infracfg);
}
  
-		infracfg = syscon_regmap_lookup_by_compatible(p);

-
-   if (IS_ERR(infracfg))
-   return PTR_ERR(infracfg);
-
ret = regmap_read(infracfg, REG_INFRA_MISC, &val);
if (ret)
return ret;

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/8] dt-bindings: iommu: mediatek: Add mediatek,infracfg phandle

2022-05-17 Thread Krzysztof Kozlowski

On 17/05/2022 15:21, AngeloGioacchino Del Regno wrote:
> Add property "mediatek,infracfg" to let the mtk_iommu driver retrieve
> a phandle to the infracfg syscon instead of performing a per-soc
> compatible lookup.
> 
> Signed-off-by: AngeloGioacchino Del Regno 
> 
> ---
>  Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml 
> b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
> index 2ae3bbad7f1a..78c72c22740b 100644
> --- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
> +++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
> @@ -101,6 +101,10 @@ properties:
>  items:
>- const: bclk
>  
> +  mediatek,infracfg:
> +$ref: "/schemas/types.yaml#/definitions/phandle"

No need for quotes. They are not present in other places.


> +description: The phandle to the mediatek infracfg syscon
> +
>mediatek,larbs:
>  $ref: /schemas/types.yaml#/definitions/phandle-array
>  minItems: 1


Best regards,
Krzysztof
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 3/8] dt-bindings: iommu: mediatek: Add mediatek,pericfg phandle

2022-05-17 Thread Krzysztof Kozlowski

On 17/05/2022 15:21, AngeloGioacchino Del Regno wrote:
> Add property "mediatek,pericfg" to let the mtk_iommu driver retrieve
> a phandle to the pericfg syscon instead of performing a per-soc
> compatible lookup, as it was also done with infracfg.
> 
> Signed-off-by: AngeloGioacchino Del Regno 
> 
> ---
>  Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml 
> b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
> index 78c72c22740b..a6cf9678271f 100644
> --- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
> +++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
> @@ -116,6 +116,10 @@ properties:
>Refer to bindings/memory-controllers/mediatek,smi-larb.yaml. It must 
> sort
>according to the local arbiter index, like larb0, larb1, larb2...
>  
> +  mediatek,pericfg:
> +$ref: "/schemas/types.yaml#/definitions/phandle"

No need for quotes.


Best regards,
Krzysztof
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: Error when running fio against nvme-of rdma target (mlx5 driver)

2022-05-17 Thread Mark Ruijter

Hi Robin,

I ran into the exact same problem while testing with 4 connect-x6 cards, kernel 
5.18-rc6.

[ 4878.273016] nvme nvme0: Successfully reconnected (3 attempts)
[ 4879.122015] nvme nvme0: starting error recovery
[ 4879.122028] infiniband mlx5_4: mlx5_handle_error_cqe:332:(pid 0): WC error: 
4, Message: local protection error
[ 4879.122035] infiniband mlx5_4: dump_cqe:272:(pid 0): dump error cqe
[ 4879.122037] : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122039] 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122040] 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 4879.122040] 0030: 00 00 00 00 a9 00 56 04 00 00 00 ed 0d da ff e2
[ 4881.085547] nvme nvme3: Reconnecting in 10 seconds...

I assume this means that the problem has still not been resolved?
If so, I'll try to diagnose the problem.

Thanks,

--Mark

On 11/02/2022, 12:35, "Linux-nvme on behalf of Robin Murphy" 

wrote:

On 2022-02-10 23:58, Martin Oliveira wrote:
> On 2/9/22 1:41 AM, Chaitanya Kulkarni wrote:
>> On 2/8/22 6:50 PM, Martin Oliveira wrote:
>>> Hello,
>>>
>>> We have been hitting an error when running IO over our nvme-of setup, 
using the mlx5 driver and we are wondering if anyone has seen anything 
similar/has any suggestions.
>>>
>>> Both initiator and target are AMD EPYC 7502 machines connected over 
RDMA using a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a 
single NVMe fabrics device, one physical SSD per namespace.
>>>
>>
>> Thanks for reporting this, if you can bisect the problem on your setup
>> it will help others to help you better.
>>
>> -ck
> 
> Hi Chaitanya,
> 
> I went back to a kernel as old as 4.15 and the problem was still there, 
so I don't know of a good commit to start from.
> 
> I also learned that I can reproduce this with as little as 3 cards and I 
updated the firmware on the Mellanox cards to the latest version.
> 
> I'd be happy to try any tests if someone has any suggestions.

The IOMMU is probably your friend here - one thing that might be worth 
trying is capturing the iommu:map and iommu:unmap tracepoints to see if 
the address reported in subsequent IOMMU faults was previously mapped as 
a valid DMA address (be warned that there will likely be a *lot* of 
trace generated). With 5.13 or newer, booting with "iommu.forcedac=1" 
should also make it easier to tell real DMA IOVAs from rogue physical 
addresses or other nonsense, as real DMA addresses should then look more 
like 0x24d08000.

That could at least help narrow down whether it's some kind of 
use-after-free race or a completely bogus address creeping in somehow.

Robin.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v6 21/29] x86/nmi: Add an NMI_WATCHDOG NMI handler category

2022-05-17 Thread Ricardo Neri

On Mon, May 09, 2022 at 03:59:40PM +0200, Thomas Gleixner wrote:
> On Thu, May 05 2022 at 17:00, Ricardo Neri wrote:
> > Add a NMI_WATCHDOG as a new category of NMI handler. This new category
> > is to be used with the HPET-based hardlockup detector. This detector
> > does not have a direct way of checking if the HPET timer is the source of
> > the NMI. Instead, it indirectly estimates it using the time-stamp counter.
> >
> > Therefore, we may have false-positives in case another NMI occurs within
> > the estimated time window. For this reason, we want the handler of the
> > detector to be called after all the NMI_LOCAL handlers. A simple way
> > of achieving this with a new NMI handler category.
> >
> > @@ -379,6 +385,10 @@ static noinstr void default_do_nmi(struct pt_regs 
> > *regs)
> > }
> > raw_spin_unlock(&nmi_reason_lock);
> >  
> > +   handled = nmi_handle(NMI_WATCHDOG, regs);
> > +   if (handled == NMI_HANDLED)
> > +   goto out;
> > +
> 
> How is this supposed to work reliably?
> 
> If perf is active and the HPET NMI and the perf NMI come in around the
> same time, then nmi_handle(LOCAL) can swallow the NMI and the watchdog
> won't be checked. Because MSI is strictly edge and the message is only
> sent once, this can result in a stale watchdog, no?

This is true. Instead, at the end of each NMI I should _also_ check if the TSC
is within the expected value of the HPET NMI watchdog. In this way, unrelated
NMIs (e.g., perf NMI) are handled and we don't miss the NMI from the HPET
channel.

Thanks and BR,
Ricardo
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] vfio: Remove VFIO_TYPE1_NESTING_IOMMU

2022-05-17 Thread Alex Williamson

On Tue, 10 May 2022 13:55:24 -0300
Jason Gunthorpe  wrote:

> This control causes the ARM SMMU drivers to choose a stage 2
> implementation for the IO pagetable (vs the stage 1 usual default),
> however this choice has no visible impact to the VFIO user. Further qemu
> never implemented this and no other userspace user is known.
> 
> The original description in commit f5c9ecebaf2a ("vfio/iommu_type1: add
> new VFIO_TYPE1_NESTING_IOMMU IOMMU type") suggested this was to "provide
> SMMU translation services to the guest operating system" however the rest
> of the API to set the guest table pointer for the stage 1 was never
> completed, or at least never upstreamed, rendering this part useless dead
> code.
> 
> Since the current patches to enable nested translation, aka userspace page
> tables, rely on iommufd and will not use the enable_nesting()
> iommu_domain_op, remove this infrastructure. However, don't cut too deep
> into the SMMU drivers for now expecting the iommufd work to pick it up -
> we still need to create S2 IO page tables.
> 
> Remove VFIO_TYPE1_NESTING_IOMMU and everything under it including the
> enable_nesting iommu_domain_op.
> 
> Just in-case there is some userspace using this continue to treat
> requesting it as a NOP, but do not advertise support any more.
> 
> Signed-off-by: Jason Gunthorpe 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 
>  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 16 
>  drivers/iommu/iommu.c   | 10 --
>  drivers/vfio/vfio_iommu_type1.c | 12 +---
>  include/linux/iommu.h   |  3 ---
>  include/uapi/linux/vfio.h   |  2 +-
>  6 files changed, 2 insertions(+), 57 deletions(-)
> 
> It would probably make sense for this to go through the VFIO tree with Robin's
> ack for the SMMU changes.

I'd be in favor of applying this, but it seems Robin and Eric are
looking for a stay of execution and I'd also be looking for an ack from
Joerg.  Thanks,

Alex

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz

2022-05-17 Thread Ricardo Neri

On Tue, May 10, 2022 at 09:16:21PM +1000, Nicholas Piggin wrote:
> Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> > The HPET hardlockup detector relies on tsc_khz to estimate the value of
> > that the TSC will have when its HPET channel fires. A refined tsc_khz
> > helps to estimate better the expected TSC value.
> > 
> > Using the early value of tsc_khz may lead to a large error in the expected
> > TSC value. Restarting the NMI watchdog detector has the effect of kicking
> > its HPET channel and make use of the refined tsc_khz.
> > 
> > When the HPET hardlockup is not in use, restarting the NMI watchdog is
> > a noop.
> > 
> > Cc: Andi Kleen 
> > Cc: Stephane Eranian 
> > Cc: "Ravi V. Shankar" 
> > Cc: iommu@lists.linux-foundation.org
> > Cc: linuxppc-...@lists.ozlabs.org
> > Cc: x...@kernel.org
> > Signed-off-by: Ricardo Neri 
> > ---
> > Changes since v5:
> >  * Introduced this patch
> > 
> > Changes since v4
> >  * N/A
> > 
> > Changes since v3
> >  * N/A
> > 
> > Changes since v2:
> >  * N/A
> > 
> > Changes since v1:
> >  * N/A
> > ---
> >  arch/x86/kernel/tsc.c | 6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > index cafacb2e58cc..cc1843044d88 100644
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -1386,6 +1386,12 @@ static void tsc_refine_calibration_work(struct 
> > work_struct *work)
> > /* Inform the TSC deadline clockevent devices about the recalibration */
> > lapic_update_tsc_freq();
> >  
> > +   /*
> > +* If in use, the HPET hardlockup detector relies on tsc_khz.
> > +* Reconfigure it to make use of the refined tsc_khz.
> > +*/
> > +   lockup_detector_reconfigure();
> 
> I don't know if the API is conceptually good.
> 
> You change something that the lockup detector is currently using, 
> *while* the detector is running asynchronously, and then reconfigure
> it. 

Yes, this is what happens.

> What happens in the window? If this code is only used for small
> adjustments maybe it does not really matter

Please see my comment

> but in principle it's a bad API to export.
> 
> lockup_detector_reconfigure as an internal API is okay because it
> reconfigures things while the watchdog is stopped

I see.

> [actually that  looks untrue for soft dog which uses watchdog_thresh in
> is_softlockup(), but that should be fixed].

Perhaps there should be a watchdog_thresh_user. When the user updates it,
the detector is stopped, watchdog_thresh is updated, and then the detector
is resumed.

> 
> You're the arch so you're allowed to stop the watchdog and configure
> it, e.g., hardlockup_detector_perf_stop() is called in arch/.

I had it like this but it did not look right to me. You are right, however,
I can stop/restart the watchdog when needed.

Thanks and BR,
Ricardo
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz

2022-05-17 Thread Ricardo Neri

On Tue, May 10, 2022 at 01:44:05PM +0200, Thomas Gleixner wrote:
> On Tue, May 10 2022 at 21:16, Nicholas Piggin wrote:
> > Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> >> +  /*
> >> +   * If in use, the HPET hardlockup detector relies on tsc_khz.
> >> +   * Reconfigure it to make use of the refined tsc_khz.
> >> +   */
> >> +  lockup_detector_reconfigure();
> >
> > I don't know if the API is conceptually good.
> >
> > You change something that the lockup detector is currently using, 
> > *while* the detector is running asynchronously, and then reconfigure
> > it. What happens in the window? If this code is only used for small
> > adjustments maybe it does not really matter but in principle it's
> > a bad API to export.
> >
> > lockup_detector_reconfigure as an internal API is okay because it
> > reconfigures things while the watchdog is stopped [actually that
> > looks untrue for soft dog which uses watchdog_thresh in
> > is_softlockup(), but that should be fixed].
> >
> > You're the arch so you're allowed to stop the watchdog and configure
> > it, e.g., hardlockup_detector_perf_stop() is called in arch/.
> >
> > So you want to disable HPET watchdog if it was enabled, then update
> > wherever you're using tsc_khz, then re-enable.
> 
> The real question is whether making this refined tsc_khz value
> immediately effective matters at all. IMO, it does not because up to
> that point the watchdog was happily using the coarse calibrated value
> and the whole use TSC to assess whether the HPET fired mechanism is just
> a guestimate anyway. So what's the point of trying to guess 'more
> correct'.

In some of my test systems I observed that, the TSC value does not fall
within the expected error window the first time the HPET channel expires.

I inferred that the error computed using the coarser tsc_khz was wrong.
Recalculating the error window with refined tsc_khz would correct it.

However, restarting the timer has the side-effect of kicking the timer and,
therefore pushing the first HPET NMI further in the future.

Perhaps kicking HPET channel, not recomputing the error window, corrected
(masked?) the problem.

I will investigate further and rework or drop this patch as needed.

Thanks and BR,
Ricardo
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v6 15/29] x86/hpet: Add helper function hpet_set_comparator_periodic()

2022-05-17 Thread Ricardo Neri

On Sat, May 14, 2022 at 10:17:38AM +0200, Thomas Gleixner wrote:
> On Fri, May 13 2022 at 14:19, Ricardo Neri wrote:
> > On Fri, May 06, 2022 at 11:41:13PM +0200, Thomas Gleixner wrote:
> >> The argument about not bloating the code
> >> with an "obvious???" function which is quite small is slightly beyond my
> >> comprehension level.
> >
> > That obvious function would look like this:
> >
> > void hpet_set_comparator_one_shot(int channel, u32 delta)
> > {
> > u32 count;
> >
> > count = hpet_readl(HPET_COUNTER);
> > count += delta;
> > hpet_writel(count, HPET_Tn_CMP(channel));
> > }
> 
> This function only works reliably when the delta is large. See
> hpet_clkevt_set_next_event().

That is a good point. One more reason to not have a
hpet_set_comparator_one_shot(), IMO.

Thanks and BR,
Ricardo
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 2/2] iommu: mtk_iommu: Add support for MT6795 Helio X10 M4Us

2022-05-17 Thread Yong Wu via iommu

On Tue, 2022-05-17 at 11:26 +0200, AngeloGioacchino Del Regno wrote:
> Il 17/05/22 11:08, Yong Wu ha scritto:
> > On Fri, 2022-05-13 at 17:14 +0200, AngeloGioacchino Del Regno
> > wrote:
> > > Add support for the M4Us found in the MT6795 Helio X10 SoC.
> > > 
> > > Signed-off-by: AngeloGioacchino Del Regno <
> > > angelogioacchino.delre...@collabora.com>
> > > ---
> > >   drivers/iommu/mtk_iommu.c | 20 +++-
> > >   1 file changed, 19 insertions(+), 1 deletion(-)

[...]

> > > +static const struct mtk_iommu_plat_data mt6795_data = {
> > > + .m4u_plat = M4U_MT6795,
> > > + .flags= HAS_4GB_MODE | HAS_BCLK | RESET_AXI |
> > > + HAS_LEGACY_IVRP_PADDR | MTK_IOMMU_TYPE_MM,
> > > + .inv_sel_reg  = REG_MMU_INV_SEL_GEN1,
> > > + .banks_num= 1,
> > > + .banks_enable = {true},
> > > + .iova_region  = single_domain,
> > > + .iova_region_nr = ARRAY_SIZE(single_domain),
> > > + .larbid_remap = {{0}, {1}, {2}, {3}, {4}}, /* Linear mapping.
> > > */
> > > +};
> > 
> > This is nearly same with mt8173_data. mt8173 has one more larb than
> > mt6795, its larbid_remap is also ok for mt6795.
> > 
> 
> I think that we should be explicit about the larbid_remap property,
> since mt6795 has one less larb, we should explicitly say that like
> I did there... that's only for human readability I admit ... but,
> still, I wouldn't want to see people thinking that MT6795 has 6 LARBs
> because they've read that larbid_remap having 6 entries.

In the linear mapping case, It does help. Strictly larbid_remap is two-
dimensional array now, It doesn't indicate how many larbs this SoC
has. If someone would like to know this, he could read the mtxxx-larb-
port.h. We also could document the larb numbers in the binding
for readability.

Anyway, It is not a big problem. A new structure is ok for me. I just
complain there are too many structures, specially in the internal
branch which contains many non-upstream SoCs, and there may be several
IOMMU HWs in one SoC. thus I'd like to group it personally.

> 
> > thus it looks we could use mt8173 as the backward compatible.
> >  compatible = "mediatek,mt6795-m4u",
> >   "mediatek,mt8173-m4u";
> > 
> > After this, the only thing is about "mediatek,mt6795-infracfg". we
> > have
> > to try again with mediatek,mt6795-infracfg after mediatek,mt8173-
> > infracfg fail. I think we should allow the backward case in 4GB
> > mode
> > judgment if we have.
> > 
> > What's your opinion? or some other suggestion?
> > Thanks.
> 
> I know, I may have a plan for that, but I wanted to have a good
> reason to
> propose such a thing, as if it's just about two SoCs needing that,
> there
> would be no good reason to get things done differently.
> 
> ...so, in order to provide a good cleanup, we have two possible roads
> to
> follow here: either we add a generic "mediatek,infracfg" compatible
> to the
> infra node (but I don't like that), or we can do it like it was
> previously
> done in mtk-pm-domains.c (I prefer that approach):
> 
> iommu: iommu@somewhere {
>   ... something ...
>   mediatek,infracfg = <&infracfg>;

We like this too. But this was not suggested from Rob long before.

https://lore.kernel.org/linux-mediatek/20200715205120.GA778876@bogus/

> };
> 
> infracfg = syscon_regmap_lookup_by_compatible(node,
> "mediatek,infracfg");
> if (IS_ERR(infracfg)) {
>   /* try with the older way */
>   switch (...) {
>   case  p = "mediatek,mt2712-infracfg";
>   ... blah blah ...
>   }
>   /* legacy also failed, ouch! */
>   if (IS_ERR(infracfg))
>   return PTR_ERR(infracfg);
> }
> 
> ret = regmap_read ... etc etc etc
> 
> Cheers,
> Angelo

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 7/8] dt-bindings: iommu: mediatek: Require mediatek,infracfg for mt2712/8173

2022-05-17 Thread Rob Herring

On Tue, May 17, 2022 at 03:21:06PM +0200, AngeloGioacchino Del Regno wrote:
> Both MT2712 and MT8173 got a mediatek,infracfg phandle: add that to
> the required properties for these SoCs to deprecate the old way of
> looking for SoC-specific infracfg compatible in the entire devicetree.

Wait, what? If there's only one possible node that can match, I prefer 
the 'old way'. Until we implemented a phandle cache, searching the 
entire tree was how phandle lookups worked too, so not any better.

But if this makes things more consistent,

Acked-by: Rob Herring 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 8/8] dt-bindings: iommu: mediatek: Require mediatek,pericfg for mt8195-infra

2022-05-17 Thread Rob Herring

On Tue, 17 May 2022 15:21:07 +0200, AngeloGioacchino Del Regno wrote:
> The MT8195 SoC has IOMMU related registers in the pericfg_ao iospace:
> require a phandle to that.
> 
> Signed-off-by: AngeloGioacchino Del Regno 
> 
> ---
> 
> Note for Rob: as of now, there's no iommu node in upstream mt8195 devicetrees 
> yet.
> 
>  .../devicetree/bindings/iommu/mediatek,iommu.yaml  | 10 ++
>  1 file changed, 10 insertions(+)
> 

Acked-by: Rob Herring 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 2/5] iommu: Add blocking_domain_ops field in iommu_ops

2022-05-17 Thread Baolu Lu


On 2022/5/17 21:13, Jason Gunthorpe wrote:

On Tue, May 17, 2022 at 01:43:03PM +0100, Robin Murphy wrote:


FWIW from my point of view I'm happy with having a .detach_dev_pasid op
meaning implicitly-blocked access for now.


If this is the path then lets not call it attach/detach
please. 'set_dev_pasid' and 'set_dev_blocking_pasid' are clearer
names.


Sure. And with the blocking domain implemented, the
set_dev_blocking_pasid could be deprecated.




On SMMUv3, PASIDs don't mix with our current notion of
IOMMU_DOMAIN_IDENTITY (nor the potential one for
IOMMU_DOMAIN_BLOCKED), so giving PASIDs functional symmetry with
devices would need significantly more work anyway.


There is no extra work in the driver, beyond SMMU having to implement
a blocking domain. The blocking domain's set_dev_pasid op simply is
whatever set_dev_blocking_pasid would have done on the unmanaged
domain.

identity doesn't come into this, identity domains should have a NULL
set_dev_pasid op if the driver can't support using it on a PASID.

IMHO blocking_domain->ops->set_dev_pasid() is just a more logical name
than domain->ops->set_dev_blocking_pasid() - especially since VFIO
would like drivers to implement blocking domain anyhow.


Perhaps implementing blocking domains for intel and smmuv3 iommu drivers
seem to be a more practical start.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

49 matches

Mail list logo