Re: [PATCH v2 3/5] irqchip: Allow QCOM_PDC to be loadable as a permanent module

2020-07-12 Thread Marc Zyngier
On Sat, 11 Jul 2020 00:27:45 +0100,
Stephen Boyd  wrote:
> 
> Quoting John Stultz (2020-07-10 15:44:18)
> > On Thu, Jul 9, 2020 at 11:02 PM Stephen Boyd  wrote:
> > >
> > > Does it work? I haven't looked in detail but I worry that the child
> > > irqdomain (i.e. pinctrl-msm) would need to delay probing until this
> > > parent irqdomain is registered. Or has the hierarchical irqdomain code
> > > been updated to handle the parent child relationship and wait for things
> > > to probe or be loaded?
> > 
> > So I can't say I know the underlying hardware particularly well, but
> > I've been using this successfully on the Dragonboard 845c with both
> > static builds as well as module enabled builds.
> > And the same patch has been in the android-mainline and android-5.4
> > kernels for a while without objections from QCOM.
> > 
> > As to the probe ordering question, Saravana can maybe speak in more
> > detail if it's involved in this case but the fw_devlink code has
> > addressed many of these sorts of ordering issues.
> > However, I'm not sure if I'm lucking into the right probe order, as we
> > have been able to boot android-mainline w/ both fw_devlink=on and
> > fw_devlink=off (though in the =off case, we need
> > deferred_probe_timeout=30 to give us a bit more time for modules to
> > load after init starts).
> > 
> 
> Ok I looked at the code (sorry for not checking earlier) and I see this in
> msm_gpio_init()
> 
> 	np = of_parse_phandle(pctrl->dev->of_node, "wakeup-parent", 0);
> 	if (np) {
> 		chip->irq.parent_domain = irq_find_matching_host(np,
> 								 DOMAIN_BUS_WAKEUP);
> 		of_node_put(np);
> 		if (!chip->irq.parent_domain)
> 			return -EPROBE_DEFER;
> 
> so it looks like we'll probe defer the pinctrl driver until the pdc module
> loads. Meaning it should work to have pinctrl builtin and pdc as a module.

What I hope is that eventually fw_devlink will become the norm (on by
default), and that probe deferral will become a thing of the past.

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [RFC][PATCH 5/5] firmware: QCOM_SCM: Allow qcom_scm driver to be loadable as a permanent module

2020-06-16 Thread Marc Zyngier

Hi John,

+Will for the SMMU part.

On 2020-06-16 07:13, John Stultz wrote:

Allow the qcom_scm driver to be loadable as a
permanent module.

Cc: Andy Gross 
Cc: Bjorn Andersson 
Cc: Joerg Roedel 
Cc: Thomas Gleixner 
Cc: Jason Cooper 
Cc: Marc Zyngier 
Cc: Linus Walleij 
Cc: Lina Iyer 
Cc: Saravana Kannan 
Cc: Todd Kjos 
Cc: Greg Kroah-Hartman 
Cc: linux-arm-...@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Cc: linux-g...@vger.kernel.org
Signed-off-by: John Stultz 
---
 drivers/firmware/Kconfig| 2 +-
 drivers/firmware/Makefile   | 3 ++-
 drivers/firmware/qcom_scm.c | 4 
 drivers/iommu/Kconfig   | 2 ++
 4 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index fbd785dd0513..9e533a462bf4 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -236,7 +236,7 @@ config INTEL_STRATIX10_RSU
  Say Y here if you want Intel RSU support.

 config QCOM_SCM
-   bool
+   tristate "Qcom SCM driver"
depends on ARM || ARM64
select RESET_CONTROLLER

diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile
index 99510be9f5ed..cf24d674216b 100644
--- a/drivers/firmware/Makefile
+++ b/drivers/firmware/Makefile
@@ -17,7 +17,8 @@ obj-$(CONFIG_ISCSI_IBFT)  += iscsi_ibft.o
 obj-$(CONFIG_FIRMWARE_MEMMAP)  += memmap.o
 obj-$(CONFIG_RASPBERRYPI_FIRMWARE) += raspberrypi.o
 obj-$(CONFIG_FW_CFG_SYSFS) += qemu_fw_cfg.o
-obj-$(CONFIG_QCOM_SCM) += qcom_scm.o qcom_scm-smc.o qcom_scm-legacy.o
+obj-$(CONFIG_QCOM_SCM) += qcom-scm.o
+qcom-scm-objs += qcom_scm.o qcom_scm-smc.o qcom_scm-legacy.o
 obj-$(CONFIG_TI_SCI_PROTOCOL)  += ti_sci.o
 obj-$(CONFIG_TRUSTED_FOUNDATIONS) += trusted_foundations.o
 obj-$(CONFIG_TURRIS_MOX_RWTM)  += turris-mox-rwtm.o
diff --git a/drivers/firmware/qcom_scm.c b/drivers/firmware/qcom_scm.c
index 0e7233a20f34..b5e88bf66975 100644
--- a/drivers/firmware/qcom_scm.c
+++ b/drivers/firmware/qcom_scm.c
@@ -1155,6 +1155,7 @@ static const struct of_device_id qcom_scm_dt_match[] = {
 	{ .compatible = "qcom,scm" },
 	{}
 };
+MODULE_DEVICE_TABLE(of, qcom_scm_dt_match);

 static struct platform_driver qcom_scm_driver = {
 	.driver = {
@@ -1170,3 +1171,6 @@ static int __init qcom_scm_init(void)
 	return platform_driver_register(&qcom_scm_driver);
 }
 subsys_initcall(qcom_scm_init);
+
+MODULE_DESCRIPTION("Qualcomm Technologies, Inc. SCM driver");
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index b510f67dfa49..714893535dd2 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -381,6 +381,7 @@ config SPAPR_TCE_IOMMU
 config ARM_SMMU
tristate "ARM Ltd. System MMU (SMMU) Support"
 	depends on (ARM64 || ARM || (COMPILE_TEST && !GENERIC_ATOMIC64)) && MMU
+	depends on QCOM_SCM || !QCOM_SCM #if QCOM_SCM=m this can't be =y

This looks a bit ugly. Could you explain why we need this at the SMMU 
level? I'd have expected the dependency to flow the other way around...



select IOMMU_API
select IOMMU_IO_PGTABLE_LPAE
select ARM_DMA_USE_IOMMU if ARM
@@ -500,6 +501,7 @@ config QCOM_IOMMU
# Note: iommu drivers cannot (yet?) be built as modules
bool "Qualcomm IOMMU Support"
depends on ARCH_QCOM || (COMPILE_TEST && !GENERIC_ATOMIC64)
+   depends on QCOM_SCM=y
select IOMMU_API
select IOMMU_IO_PGTABLE_LPAE
select ARM_DMA_USE_IOMMU


Thanks,

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH v2 3/5] irqchip: Allow QCOM_PDC to be loadable as a permanent module

2020-06-27 Thread Marc Zyngier
On Sat, 27 Jun 2020 02:34:25 +0100,
John Stultz  wrote:
> 
> On Fri, Jun 26, 2020 at 12:42 AM Stephen Boyd  wrote:
> >
> > Quoting John Stultz (2020-06-24 17:10:37)
> > > diff --git a/drivers/irqchip/qcom-pdc.c b/drivers/irqchip/qcom-pdc.c
> > > index 6ae9e1f0819d..3fee8b655da1 100644
> > > --- a/drivers/irqchip/qcom-pdc.c
> > > +++ b/drivers/irqchip/qcom-pdc.c
> > > @@ -430,4 +432,33 @@ static int qcom_pdc_init(struct device_node *node, 
> > > struct device_node *parent)
> > > return ret;
> > >  }
> > >
> > > +#ifdef MODULE
> > > +static int qcom_pdc_probe(struct platform_device *pdev)
> > > +{
> > > +   struct device_node *np = pdev->dev.of_node;
> > > +   struct device_node *parent = of_irq_find_parent(np);
> > > +
> > > +   return qcom_pdc_init(np, parent);
> > > +}
> > > +
> > > +static const struct of_device_id qcom_pdc_match_table[] = {
> > > +   { .compatible = "qcom,pdc" },
> > > +   {}
> > > +};
> > > +MODULE_DEVICE_TABLE(of, qcom_pdc_match_table);
> > > +
> > > +static struct platform_driver qcom_pdc_driver = {
> > > +   .probe = qcom_pdc_probe,
> > > +   .driver = {
> > > +   .name = "qcom-pdc",
> > > +   .of_match_table = qcom_pdc_match_table,
> > > +   .suppress_bind_attrs = true,
> > > +   },
> > > +};
> > > +module_platform_driver(qcom_pdc_driver);
> > > +#else
> > >  IRQCHIP_DECLARE(qcom_pdc, "qcom,pdc", qcom_pdc_init);
> >
> > Is there any reason to use IRQCHIP_DECLARE if this can work as a
> > platform device driver?
> >
> 
> Hey! Thanks so much for the review!
> 
> Mostly it was done this way to minimize the change in the non-module
> case. But if you'd rather avoid the #ifdefery I'll respin it without.

That would certainly be my own preference. In general, IRQCHIP_DECLARE
and platform drivers should be mutually exclusive in the same driver:
if you can delay the probing and have it as a proper platform device,
then this should be the one true way.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [EXT] Re: [PATCH v2 12/12] bus: fsl-mc: Add ACPI support for fsl-mc

2020-07-16 Thread Marc Zyngier

On 2020-07-16 04:23, Makarand Pawagi wrote:

-Original Message-
From: Lorenzo Pieralisi 


[...]


Anyway - you need to seek feedback from Marc on whether patches 11 and
12 are OK from an irqchip perspective, it is possible we can take the
rest of the series independently if everyone agrees but I don't
necessarily see a reason for that.

Long story short: you need Marc's ACK on [11-12], it is your code.


Hi Marc, can you please review/ack this patch?


https://lore.kernel.org/linux-acpi/bd07f44dad1d029e0d023202cbf5f...@kernel.org/

M.
--
Jazz is not dead. It just smells funny...


Re: iommu/vt-d: Cure VF irqdomain hickup

2020-11-13 Thread Marc Zyngier

On 2020-11-12 21:34, Thomas Gleixner wrote:

On Thu, Nov 12 2020 at 20:15, Thomas Gleixner wrote:

The recent changes to store the MSI irqdomain pointer in struct device
missed that Intel DMAR does not register virtual function devices. Due
to that a VF device gets the plain PCI-MSI domain assigned and then
issues compat MSI messages which get caught by the interrupt remapping
unit.

Cure that by inheriting the irq domain from the physical function
device.

That's a temporary workaround. The correct fix is to inherit the irq
domain from the bus, but that's a larger effort which needs quite some
other changes to the way how x86 manages PCI and MSI domains.


Bah, that's not really going to work with the way how irq remapping
works on x86 because at least Intel/DMAR can have more than one DMAR
unit on a bus.

So the alternative solution would be to assign the domain per device,
but the current ordering creates a chicken-and-egg problem. Looking the
domain up in pci_set_msi_domain() does not work because at that point
the device is not registered in the IOMMU. That happens from
device_add().

Marc, is there any problem to reorder the calls in pci_device_add():

  device_add();
  pci_set_msi_domain();


I *think* it works as long as we keep the "match_driver = false" hack.
Otherwise, we risk binding to a driver early, and game over.


That would allow to add a irq_find_matching_fwspec() based lookup to
pci_msi_get_device_domain().


Just so that I understand the issue: is the core of the problem that
there is no 1:1 mapping between a PCI bus and a DMAR unit, and no
firmware topology information to indicate which one to pick?


Though I'm not yet convinced that the outcome would be less horrible
than the hack in the DMAR driver when I'm taking all the other horrors
of x86 (including XEN) into account :)


I tried to follow the notifier into the DMAR driver, ended up in the
IRQ remapping code, and lost the will to live. I have a question though:

In the bus notifier callback, you end-up in dmar_pci_bus_add_dev(),
which calls intel_irq_remap_add_device(), which tries to set the
MSI domain. Why isn't that enough? Are we still missing any information
at that stage?

Thanks,

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH] iommu/arm-smmu-v3: Add default domain quirk for Arm Fast Models

2021-06-30 Thread Marc Zyngier
On Tue, 29 Jun 2021 18:34:40 +0100,
Will Deacon  wrote:
> 
> On Fri, Jun 18, 2021 at 05:24:49PM +0100, Robin Murphy wrote:
> > Arm Fast Models are still implementing legacy virtio-pci devices behind
> > the SMMU, which continue to be problematic as "real hardware" devices
> > (from the point of view of the simulated system) without the mitigating
> > VIRTIO_F_ACCESS_PLATFORM feature.
> > 
> > Since we now have the ability to force passthrough on a device-specific
> > basis, let's use it to work around this particular oddity so that people
> > who just want to boot Linux on a model don't have to fiddle around with
> > things to avoid the SMMU getting in the way by default (and I don't have
> > to keep telling them which things...)
> > 
> > Signed-off-by: Robin Murphy 
> > ---
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 +++
> >  1 file changed, 15 insertions(+)
> 
> Any chance of getting the fastmodels updated instead? It feels like it
> has to happen *eventually*, and then there would be no need for this bodge.

That'd be ideal. What are the chances of that happening before the Sun
turns into a black hole?

> Will
> 
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index 54b2f27b81d4..13cf16e8f45b 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -2649,6 +2649,20 @@ static int arm_smmu_dev_disable_feature(struct 
> > device *dev,
> > }
> >  }
> >  
> > +static int arm_smmu_def_domain_type(struct device *dev)
> > +{
> > +	if (dev_is_pci(dev)) {
> > +		struct pci_dev *pdev = to_pci_dev(dev);
> > +
> > +		/* Legacy virtio-block devices on Arm Fast Models */
> > +		if (pdev->vendor == 0x1af4 && pdev->device == 0x1001 &&
> > +		    pdev->subsystem_vendor == 0x00ff &&
> > +		    pdev->subsystem_device == 0x0002)
> > +			return IOMMU_DOMAIN_IDENTITY;
> > +	}
> > +
> > +	return 0;
> > +}

Could this be expressed as a PCI quirk instead? It would at least keep
the ID matching out of the SMMU driver...

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)

2021-04-24 Thread Marc Zyngier
On Fri, 23 Apr 2021 18:58:23 +0100,
Krishna Reddy  wrote:
> 
> >> Did that patch cause any issue, or is it just not needed on your system?
> >> It fixes an hypothetical problem with the way ATS is implemented. 
> >> Maybe I actually observed it on an old software model, I don't 
> >> remember. Either way it's unlikely to go upstream but I'd like to know 
> >> if I should drop it from my tree.
> 
> > Had to revert same patch "mm: notify remote TLBs when dirtying a PTE" to
> > avoid below crash[1]. I am not sure about the cause yet.
> 
> I have noticed this issue earlier with patch pointed here and root
> caused the issue as below.  It happens after vfio_mmap request from
> QEMU for the PCIe device and during the access of VA when PTE access
> flags are updated.
> 
> kvm_mmu_notifier_change_pte() --> kvm_set_spte_hva() -->
> clean_dcache_guest_page()
> 
> The validation model doesn't have FWB capability supported.
> __clean_dcache_guest_page() attempts to perform dcache flush on pcie
> bar address(not a valid_pfn()) through page_address(), which doesn't
> have page table mapping and leads to exception.
> 
> I have worked around the issue by filtering out the request if the
> pfn is not valid in __clean_dcache_guest_page().  As the patch
> wasn't posted in the community, reverted it as well.

That's papering over the real issue, and this mapping path needs
fixing as it was only ever expected to be called for CoW.

Can you please try the following patch and let me know if that fixes
the issue for good?

Thanks,

M.

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 77cb2d28f2a4..b62dd40a4083 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1147,7 +1147,8 @@ int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
 	 * We've moved a page around, probably through CoW, so let's treat it
 	 * just like a translation fault and clean the cache to the PoC.
 	 */
-	clean_dcache_guest_page(pfn, PAGE_SIZE);
+	if (!kvm_is_device_pfn(pfn))
+		clean_dcache_guest_page(pfn, PAGE_SIZE);
 	handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &pte);
 	return 0;
 }
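
For context, kvm_is_device_pfn() in the arm64 KVM code of that era is
essentially a !pfn_valid() check, i.e. the same filter Krishna applied
by hand in __clean_dcache_guest_page():

/* arch/arm64/kvm/mmu.c (around the time of this thread) */
static bool kvm_is_device_pfn(kvm_pfn_t pfn)
{
	return !pfn_valid(pfn);
}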


-- 
Without deviation from the norm, progress is not possible.


Re: [Patch V2 13/13] genirq/msi: Provide helpers to return Linux IRQ/dev_msi hw IRQ number

2021-03-26 Thread Marc Zyngier
On Fri, 26 Mar 2021 01:02:43 +,
"Dey, Megha"  wrote:
> 
> Hi Marc,
> 
> On 3/25/2021 10:53 AM, Marc Zyngier wrote:
> > On Fri, 26 Feb 2021 20:11:17 +,
> > Megha Dey  wrote:
> >> From: Dave Jiang 
> >> 
> >> Add new helpers to get the Linux IRQ number and device specific index
> >> for given device-relative vector so that the drivers don't need to
> >> allocate their own arrays to keep track of the vectors and hwirq for
> >> the multi vector device MSI case.
> >> 
> >> Reviewed-by: Tony Luck 
> >> Signed-off-by: Dave Jiang 
> >> Signed-off-by: Megha Dey 
> >> ---
> >>   include/linux/msi.h |  2 ++
> >>   kernel/irq/msi.c| 44 
> >>   2 files changed, 46 insertions(+)
> >> 
> >> diff --git a/include/linux/msi.h b/include/linux/msi.h
> >> index 24abec0..d60a6ba 100644
> >> --- a/include/linux/msi.h
> >> +++ b/include/linux/msi.h
> >> @@ -451,6 +451,8 @@ struct irq_domain 
> >> *platform_msi_create_irq_domain(struct fwnode_handle *fwnode,
> >>   int platform_msi_domain_alloc_irqs(struct device *dev, unsigned int nvec,
> >>   irq_write_msi_msg_t write_msi_msg);
> >>   void platform_msi_domain_free_irqs(struct device *dev);
> >> +int msi_irq_vector(struct device *dev, unsigned int nr);
> >> +int dev_msi_hwirq(struct device *dev, unsigned int nr);
> >> /* When an MSI domain is used as an intermediate domain */
> >>   int msi_domain_prepare_irqs(struct irq_domain *domain, struct device 
> >> *dev,
> >> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> >> index 047b59d..f2a8f55 100644
> >> --- a/kernel/irq/msi.c
> >> +++ b/kernel/irq/msi.c
> >> @@ -581,4 +581,48 @@ struct msi_domain_info *msi_get_domain_info(struct 
> >> irq_domain *domain)
> >>return (struct msi_domain_info *)domain->host_data;
> >>   }
> >>   +/**
> >> + * msi_irq_vector - Get the Linux IRQ number of a device vector
> >> + * @dev: device to operate on
> >> + * @nr: device-relative interrupt vector index (0-based).
> >> + *
> >> + * Returns the Linux IRQ number of a device vector.
> >> + */
> >> +int msi_irq_vector(struct device *dev, unsigned int nr)
> >> +{
> >> +  struct msi_desc *entry;
> >> +  int i = 0;
> >> +
> >> +  for_each_msi_entry(entry, dev) {
> >> +  if (i == nr)
> >> +  return entry->irq;
> >> +  i++;
> > This obviously doesn't work with Multi-MSI, does it?
> 
> This API is only for devices that support device MSI interrupts. They
> follow MSI-x format and don't support multi MSI (part of MSI).
> 
> Not sure if I am missing something here, can you please let me know?

Nothing in the prototype of the function indicates this limitation,
nor does the documentation. And I'm not sure why you should exclude
part of the MSI functionality here. It can't be for performance
reason, so you might as well make sure this works for all the MSI
variants:

int msi_irq_vector(struct device *dev, unsigned int nr)
{
	struct msi_desc *entry;
	int irq, index = 0;

	for_each_msi_vector(entry, irq, dev) {
		if (index == nr)
			return irq;
		index++;
	}

	WARN_ON_ONCE(1);
	return -EINVAL;
}
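
Purely to illustrate the intended use of such a helper (the driver,
handler and names below are invented), a consumer would then do
something like:

static irqreturn_t my_drv_handler(int irq, void *ctx)
{
	return IRQ_HANDLED;
}

/* Request the Linux IRQ backing device-relative vector 2. */
static int my_drv_setup_irq(struct device *dev, void *ctx)
{
	int virq = msi_irq_vector(dev, 2);

	if (virq < 0)
		return virq;

	return devm_request_irq(dev, virq, my_drv_handler, 0, "my-drv", ctx);
}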

>
> > 
> >> +  }
> >> +  WARN_ON_ONCE(1);
> >> +  return -EINVAL;
> >> +}
> >> +EXPORT_SYMBOL_GPL(msi_irq_vector);
> >> +
> >> +/**
> >> + * dev_msi_hwirq - Get the device MSI hw IRQ number of a device vector
> >> + * @dev: device to operate on
> >> + * @nr: device-relative interrupt vector index (0-based).
> >> + *
> >> + * Return the dev_msi hw IRQ number of a device vector.
> >> + */
> >> +int dev_msi_hwirq(struct device *dev, unsigned int nr)
> >> +{
> >> +  struct msi_desc *entry;
> >> +  int i = 0;
> >> +
> >> +  for_each_msi_entry(entry, dev) {
> >> +  if (i == nr)
> >> +  return entry->device_msi.hwirq;
> >> +  i++;
> >> +  }
> >> +  WARN_ON_ONCE(1);
> >> +  return -EINVAL;
> >> +}

And this helper would be more generally useful if it returned the n-th
msi_desc entry rather than some obscure field in a substructure.

struct msi_desc *msi_get_nth_desc(struct device *dev, unsigned int nth)
{
	struct msi_desc *entry = NULL;
	unsigned int index = 0;

	for_each_msi_entry(entry, dev) {
		if (index == nth)
			return entry;
		index++;
	}

	return NULL;
}

Re: [Patch V2 07/13] irqdomain/msi: Provide msi_alloc/free_store() callbacks

2021-03-25 Thread Marc Zyngier
On Fri, 26 Feb 2021 20:11:11 +,
Megha Dey  wrote:
> 
> From: Thomas Gleixner 
> 
> For devices which don't have a standard storage for MSI messages like the
> upcoming IMS (Interrupt Message Store) it's required to allocate storage
> space before allocating interrupts and after freeing them.
> 
> This could be achieved with the existing callbacks, but that would be
> awkward because they operate on msi_alloc_info_t which is not uniform
> across architectures. Also these callbacks are invoked per interrupt but
> the allocation might have bulk requirements depending on the device.
> 
> As such devices can operate on different architectures it is simpler to
> have separate callbacks which operate on struct device. The resulting
> storage information has to be stored in struct msi_desc so the underlying
> irq chip implementation can retrieve it for the relevant operations.
> 
> Reviewed-by: Tony Luck 
> Signed-off-by: Thomas Gleixner 
> Signed-off-by: Megha Dey 
> ---
>  include/linux/msi.h |  8 
>  kernel/irq/msi.c| 11 +++
>  2 files changed, 19 insertions(+)
> 
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index 46e879c..e915932 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -323,6 +323,10 @@ struct msi_domain_info;
>   *   function.
>   * @domain_free_irqs:Optional function to override the default free
>   *   function.
> + * @msi_alloc_store: Optional callback to allocate storage in a device
> + *   specific non-standard MSI store
> + * @msi_alloc_free:  Optional callback to free storage in a device
> + *   specific non-standard MSI store
>   *
>   * @get_hwirq, @msi_init and @msi_free are callbacks used by
>   * msi_create_irq_domain() and related interfaces
> @@ -372,6 +376,10 @@ struct msi_domain_ops {
>struct device *dev, int nvec);
>   void(*domain_free_irqs)(struct irq_domain *domain,
>   struct device *dev);
> + int (*msi_alloc_store)(struct irq_domain *domain,
> +struct device *dev, int nvec);
> + void(*msi_free_store)(struct irq_domain *domain,
> +   struct device *dev);
>  };
>  
>  /**
> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> index c54316d..047b59d 100644
> --- a/kernel/irq/msi.c
> +++ b/kernel/irq/msi.c
> @@ -434,6 +434,12 @@ int __msi_domain_alloc_irqs(struct irq_domain *domain, 
> struct device *dev,
>   if (ret)
>   return ret;
>  
> + if (ops->msi_alloc_store) {
> + ret = ops->msi_alloc_store(domain, dev, nvec);

What is supposed to happen if we get aliasing devices (similar to what
we have with devices behind a PCI bridge)?

The ITS code goes through all kind of hoops to try and detect this
case when sizing the translation tables (in the .prepare callback),
and I have the feeling that sizing the message store is analogous.

Or do we all have the warm fuzzy feeling that aliasing is a thing of
the past and that we can ignore this potential problem?

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [Patch V2 08/13] genirq: Set auxiliary data for an interrupt

2021-03-25 Thread Marc Zyngier
On Fri, 26 Feb 2021 20:11:12 +,
Megha Dey  wrote:
> 
> Introduce a new function pointer in the irq_chip structure (irq_set_auxdata)
> which is responsible for updating data that is stored in a shared register
> or data storage. For example, the idxd driver uses the auxiliary data API
> to enable/set and disable the PASID field in the IMS entry (introduced
> in a later patch); that data is not typically present in an MSI entry.
> 
> Reviewed-by: Tony Luck 
> Signed-off-by: Megha Dey 
> ---
>  include/linux/interrupt.h |  2 ++
>  include/linux/irq.h   |  4 
>  kernel/irq/manage.c   | 32 
>  3 files changed, 38 insertions(+)
> 
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index 967e257..461ed1c 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -496,6 +496,8 @@ extern int irq_get_irqchip_state(unsigned int irq, enum 
> irqchip_irq_state which,
>  extern int irq_set_irqchip_state(unsigned int irq, enum irqchip_irq_state 
> which,
>bool state);
>  
> +int irq_set_auxdata(unsigned int irq, unsigned int which, u64 val);
> +
>  #ifdef CONFIG_IRQ_FORCED_THREADING
>  # ifdef CONFIG_PREEMPT_RT
>  #  define force_irqthreads   (true)
> diff --git a/include/linux/irq.h b/include/linux/irq.h
> index 2efde6a..fc19f32 100644
> --- a/include/linux/irq.h
> +++ b/include/linux/irq.h
> @@ -491,6 +491,8 @@ static inline irq_hw_number_t irqd_to_hwirq(struct 
> irq_data *d)
>   *   irq_request_resources
>   * @irq_compose_msi_msg: optional to compose message content for MSI
>   * @irq_write_msi_msg:   optional to write message content for MSI
> + * @irq_set_auxdata: Optional function to update auxiliary data e.g. in
> + *   shared registers
>   * @irq_get_irqchip_state:   return the internal state of an interrupt
>   * @irq_set_irqchip_state:   set the internal state of a interrupt
>   * @irq_set_vcpu_affinity:   optional to target a vCPU in a virtual machine
> @@ -538,6 +540,8 @@ struct irq_chip {
>   void(*irq_compose_msi_msg)(struct irq_data *data, struct 
> msi_msg *msg);
>   void(*irq_write_msi_msg)(struct irq_data *data, struct 
> msi_msg *msg);
>  
> + int (*irq_set_auxdata)(struct irq_data *data, unsigned int 
> which, u64 auxval);
> +
>   int (*irq_get_irqchip_state)(struct irq_data *data, enum 
> irqchip_irq_state which, bool *state);
>   int (*irq_set_irqchip_state)(struct irq_data *data, enum 
> irqchip_irq_state which, bool state);
>  
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 85ede4e..68ff559 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -2860,3 +2860,35 @@ bool irq_check_status_bit(unsigned int irq, unsigned 
> int bitmask)
>   return res;
>  }
>  EXPORT_SYMBOL_GPL(irq_check_status_bit);
> +
> +/**
> + * irq_set_auxdata - Set auxiliary data
> + * @irq: Interrupt to update
> + * @which:   Selector which data to update
> + * @auxval:  Auxiliary data value
> + *
> + * Function to update auxiliary data for an interrupt, e.g. to update data
> + * which is stored in a shared register or data storage (e.g. IMS).
> + */
> +int irq_set_auxdata(unsigned int irq, unsigned int which, u64 val)

This looks to me like a massively generalised version of
irq_set_irqchip_state(), only without any defined semantics when it
comes to the 'which' state, making it look like the irqchip version of
an ioctl.
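
For comparison, the 'which' of irq_set_irqchip_state() is a small,
globally documented enum rather than an open-coded integer:

enum irqchip_irq_state {
	IRQCHIP_STATE_PENDING,		/* Is interrupt pending? */
	IRQCHIP_STATE_ACTIVE,		/* Is interrupt in progress? */
	IRQCHIP_STATE_MASKED,		/* Is interrupt masked? */
	IRQCHIP_STATE_LINE_LEVEL,	/* Is IRQ line high? */
};

int irq_set_irqchip_state(unsigned int irq, enum irqchip_irq_state which,
			  bool state);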

We also have the irq_set_vcpu_affinity() callback that is used to
perpetrate all sort of sins (and I have abused this interface more
than I should admit it).

Can we try and converge on a single interface that allows for
"side-band state" to be communicated, with documented state?

> +{
> +	struct irq_desc *desc;
> +	struct irq_data *data;
> +	unsigned long flags;
> +	int res = -ENODEV;
> +
> +	desc = irq_get_desc_buslock(irq, &flags, 0);
> +	if (!desc)
> +		return -EINVAL;
> +
> +	for (data = &desc->irq_data; data; data = irqd_get_parent_data(data)) {
> +		if (data->chip->irq_set_auxdata) {
> +			res = data->chip->irq_set_auxdata(data, which, val);

And this is where things can break: because you don't define what
'which' is, you can end-up with two stacked layers clashing in their
interpretation of 'which', potentially doing the wrong thing.

Short of having a global, cross architecture definition of all the
possible states, this is frankly dodgy.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [Patch V2 13/13] genirq/msi: Provide helpers to return Linux IRQ/dev_msi hw IRQ number

2021-03-25 Thread Marc Zyngier
On Fri, 26 Feb 2021 20:11:17 +,
Megha Dey  wrote:
> 
> From: Dave Jiang 
> 
> Add new helpers to get the Linux IRQ number and device specific index
> for given device-relative vector so that the drivers don't need to
> allocate their own arrays to keep track of the vectors and hwirq for
> the multi vector device MSI case.
> 
> Reviewed-by: Tony Luck 
> Signed-off-by: Dave Jiang 
> Signed-off-by: Megha Dey 
> ---
>  include/linux/msi.h |  2 ++
>  kernel/irq/msi.c| 44 
>  2 files changed, 46 insertions(+)
> 
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index 24abec0..d60a6ba 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -451,6 +451,8 @@ struct irq_domain *platform_msi_create_irq_domain(struct 
> fwnode_handle *fwnode,
>  int platform_msi_domain_alloc_irqs(struct device *dev, unsigned int nvec,
>  irq_write_msi_msg_t write_msi_msg);
>  void platform_msi_domain_free_irqs(struct device *dev);
> +int msi_irq_vector(struct device *dev, unsigned int nr);
> +int dev_msi_hwirq(struct device *dev, unsigned int nr);
>  
>  /* When an MSI domain is used as an intermediate domain */
>  int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev,
> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> index 047b59d..f2a8f55 100644
> --- a/kernel/irq/msi.c
> +++ b/kernel/irq/msi.c
> @@ -581,4 +581,48 @@ struct msi_domain_info *msi_get_domain_info(struct 
> irq_domain *domain)
>   return (struct msi_domain_info *)domain->host_data;
>  }
>  
> +/**
> + * msi_irq_vector - Get the Linux IRQ number of a device vector
> + * @dev: device to operate on
> + * @nr: device-relative interrupt vector index (0-based).
> + *
> + * Returns the Linux IRQ number of a device vector.
> + */
> +int msi_irq_vector(struct device *dev, unsigned int nr)
> +{
> + struct msi_desc *entry;
> + int i = 0;
> +
> + for_each_msi_entry(entry, dev) {
> + if (i == nr)
> + return entry->irq;
> + i++;

This obviously doesn't work with Multi-MSI, does it?

> + }
> + WARN_ON_ONCE(1);
> + return -EINVAL;
> +}
> +EXPORT_SYMBOL_GPL(msi_irq_vector);
> +
> +/**
> + * dev_msi_hwirq - Get the device MSI hw IRQ number of a device vector
> + * @dev: device to operate on
> + * @nr: device-relative interrupt vector index (0-based).
> + *
> + * Return the dev_msi hw IRQ number of a device vector.
> + */
> +int dev_msi_hwirq(struct device *dev, unsigned int nr)
> +{
> + struct msi_desc *entry;
> + int i = 0;
> +
> + for_each_msi_entry(entry, dev) {
> + if (i == nr)
> + return entry->device_msi.hwirq;
> + i++;
> + }
> + WARN_ON_ONCE(1);
> + return -EINVAL;
> +}
> +EXPORT_SYMBOL_GPL(dev_msi_hwirq);
> +
>  #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */

And what uses these helpers?

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [Patch V2 12/13] irqchip: Add IMS (Interrupt Message Store) driver

2021-03-25 Thread Marc Zyngier
On Fri, 26 Feb 2021 20:11:16 +,
Megha Dey  wrote:
> 
> Generic IMS (Interrupt Message Store) irq chips and irq domain
> implementations for IMS based devices which store the interrupt messages
> in an array in device memory.
> 
> Allocation and freeing of interrupts happens via the generic
> msi_domain_alloc/free_irqs() interface. No special purpose IMS magic
> required as long as the interrupt domain is stored in the underlying
> device struct. The irq_set_auxdata() is used to program the pasid into
> the IMS entry.
> 
> [Megha: Fixed compile time errors
> Added necessary dependencies to IMS_MSI_ARRAY config
> Fixed polarity of IMS_VECTOR_CTRL
> Added reads after writes to flush writes to device
> Added set_desc ops to IMS msi domain ops
> Tested the IMS infrastructure with the IDXD driver]
> 
> Reviewed-by: Tony Luck 
> Signed-off-by: Thomas Gleixner 
> Signed-off-by: Megha Dey 
> ---
>  drivers/irqchip/Kconfig |  14 +++
>  drivers/irqchip/Makefile|   1 +
>  drivers/irqchip/irq-ims-msi.c   | 211 
> 
>  include/linux/irqchip/irq-ims-msi.h |  68 
>  4 files changed, 294 insertions(+)
>  create mode 100644 drivers/irqchip/irq-ims-msi.c
>  create mode 100644 include/linux/irqchip/irq-ims-msi.h
> 
> diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
> index e74fa20..2fb0c24 100644
> --- a/drivers/irqchip/Kconfig
> +++ b/drivers/irqchip/Kconfig
> @@ -586,4 +586,18 @@ config MST_IRQ
>   help
> Support MStar Interrupt Controller.
>  
> +config IMS_MSI
> + depends on PCI
> + select DEVICE_MSI
> + bool
> +
> +config IMS_MSI_ARRAY
> + bool "IMS Interrupt Message Store MSI controller for device memory 
> storage arrays"
> + depends on PCI
> + select IMS_MSI
> + select GENERIC_MSI_IRQ_DOMAIN
> + help
> +   Support for IMS Interrupt Message Store MSI controller
> +   with IMS slot storage in a slot array in device memory
> +
>  endmenu
> diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile
> index c59b95a..e903201 100644
> --- a/drivers/irqchip/Makefile
> +++ b/drivers/irqchip/Makefile
> @@ -113,3 +113,4 @@ obj-$(CONFIG_LOONGSON_PCH_MSI)+= 
> irq-loongson-pch-msi.o
>  obj-$(CONFIG_MST_IRQ)+= irq-mst-intc.o
>  obj-$(CONFIG_SL28CPLD_INTC)  += irq-sl28cpld.o
>  obj-$(CONFIG_MACH_REALTEK_RTL)   += irq-realtek-rtl.o
> +obj-$(CONFIG_IMS_MSI)+= irq-ims-msi.o
> diff --git a/drivers/irqchip/irq-ims-msi.c b/drivers/irqchip/irq-ims-msi.c
> new file mode 100644
> index 000..fa23207
> --- /dev/null
> +++ b/drivers/irqchip/irq-ims-msi.c
> @@ -0,0 +1,211 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// (C) Copyright 2021 Thomas Gleixner 
> +/*
> + * Shared interrupt chips and irq domains for IMS devices
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#ifdef CONFIG_IMS_MSI_ARRAY

Given that this covers the whole driver, what is this #defined used
for? You might as well make the driver depend on this config option.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [Patch V2 07/13] irqdomain/msi: Provide msi_alloc/free_store() callbacks

2021-03-26 Thread Marc Zyngier
On Thu, 25 Mar 2021 18:44:48 +,
Thomas Gleixner  wrote:
> 
> On Thu, Mar 25 2021 at 17:08, Marc Zyngier wrote:
> > Megha Dey  wrote:
> >> @@ -434,6 +434,12 @@ int __msi_domain_alloc_irqs(struct irq_domain 
> >> *domain, struct device *dev,
> >>if (ret)
> >>return ret;
> >>  
> >> +  if (ops->msi_alloc_store) {
> >> +  ret = ops->msi_alloc_store(domain, dev, nvec);
> >
> > What is supposed to happen if we get aliasing devices (similar to what
> > we have with devices behind a PCI bridge)?
> >
> > The ITS code goes through all kind of hoops to try and detect this
> > case when sizing the translation tables (in the .prepare callback),
> > and I have the feeling that sizing the message store is analogous.
> 
> No. The message store itself is sized upfront by the underlying 'master'
> device. Each 'master' device has it's own irqdomain.
> 
> This is the allocation for the subdevice and this is not part of PCI and
> therefore not subject to PCI aliasing.

Fair enough. If we are guaranteed that there is no aliasing, then this
point is moot.

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [Patch V2 08/13] genirq: Set auxiliary data for an interrupt

2021-03-26 Thread Marc Zyngier
On Thu, 25 Mar 2021 18:59:48 +,
Thomas Gleixner  wrote:
> 
> On Thu, Mar 25 2021 at 17:23, Marc Zyngier wrote:
> >> +{
> >> +	struct irq_desc *desc;
> >> +	struct irq_data *data;
> >> +	unsigned long flags;
> >> +	int res = -ENODEV;
> >> +
> >> +	desc = irq_get_desc_buslock(irq, &flags, 0);
> >> +	if (!desc)
> >> +		return -EINVAL;
> >> +
> >> +	for (data = &desc->irq_data; data; data = irqd_get_parent_data(data)) {
> >> +		if (data->chip->irq_set_auxdata) {
> >> +			res = data->chip->irq_set_auxdata(data, which, val);
> >
> > And this is where things can break: because you don't define what
> > 'which' is, you can end-up with two stacked layers clashing in their
> > interpretation of 'which', potentially doing the wrong thing.
> >
> > Short of having a global, cross architecture definition of all the
> > possible states, this is frankly dodgy.
> 
> My bad, I suggested this in the first place.
> 
> So what you suggest is to make 'which' an enum and have that in
> include/linux/whatever.h so we end up with unique identifiers accross
> architectures, irqdomains and whatever, right?

Exactly. As long as 'which' is unique and unambiguous, we can solve
the stacking issue (which is oddly starting to smell like the ghost of
the SVR3 STREAMS... /me runs ;-).

Once we have that, I can start killing the irq_set_vcpu_affinity()
abuse I have been guilty of over the past years. Even more, we could
kill irq_set_vcpu_affinity() altogether, because we have a generic way
of passing side-band information from a driver down to the IRQ stack.
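
To make that concrete, the "unique and unambiguous" selector could be
as simple as a shared enum next to the prototype; the name and entries
below are invented purely for illustration:

/* Hypothetical sketch, not an existing kernel interface. */
enum irq_auxdata_which {
	IRQ_AUXDATA_IMS_PASID,		/* PASID field of an IMS slot */
	IRQ_AUXDATA_VCPU_DOORBELL,	/* e.g. a GICv4-style doorbell */
};

int irq_set_auxdata(unsigned int irq, enum irq_auxdata_which which, u64 val);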

> That makes a lot of sense.
> 
> Though that leaves the question of the data type for 'val'. While u64 is
> probably good enough for most stuff, anything which needs more than that
> is left out (again). union as arguments are horrible especially if you
> need the counterpart irq_get_auxdata() for which you need a pointer and
> then you can't do a forward declaration. Something like this might work
> though and avoid to make the pointer business unconditional:
> 
> struct irq_auxdata {
> 	union {
> 		u64		val;
> 		struct foo	*foo;
> 	};
> };

I guess that at some point, irq_get_auxdata() will become a
requirement so we might as well bite the bullet now, and the above
looks like a good start to me.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [RFC PATCH 3/5] KVM: ARM64: Add support for pinned VMIDs

2021-03-09 Thread Marc Zyngier
Hi Shameer,

[+Will]

On Mon, 22 Feb 2021 15:53:36 +,
Shameer Kolothum  wrote:
> 
> On an ARM64 system with a SMMUv3 implementation that fully supports
> Broadcast TLB Maintenance(BTM) feature, the CPU TLB invalidate
> instructions are received by SMMU. This is very useful when the
> SMMU shares the page tables with the CPU(eg: Guest SVA use case).
> For this to work, the SMMU must use the same VMID that is allocated
> by KVM to configure the stage 2 translations.
> 
> At present KVM VMID allocations are recycled on rollover and may
> change as a result. This will create issues if we have to share
> the KVM VMID with SMMU. Hence, we split the KVM VMID space into
> two, the first half follows the normal recycle on rollover policy
> while the second half of the VMID space is used to allocate pinned
> VMIDs. This feature is enabled based on a command line option
> "kvm-arm.pinned_vmid_enable".

I think this is the wrong approach. Instead of shoving the notion of
pinned VMID into the current allocator, which really isn't designed
for this, it'd be a lot better if we aligned the KVM VMID allocator
with the ASID allocator, which already has support for pinning and is
in general much more efficient.
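
The "pinning" referred to here is, at its core, a per-ID reference
count that keeps an allocation alive across generation rollover; a
rough conceptual sketch (names invented, not the actual allocator):

struct pinned_id {
	atomic64_t	id;		/* generation | ASID/VMID */
	refcount_t	pinned;		/* non-zero: never recycled */
};

static bool id_may_be_recycled(struct pinned_id *p)
{
	/* Pinned IDs survive a rollover; everything else is fair game. */
	return refcount_read(&p->pinned) == 0;
}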

Julien Grall worked on such a series[1] a long while ago, which got
stalled because of the 32bit KVM port. Since we don't have this burden
anymore, I'd rather you look in that direction instead of wasting half
of the VMID space on potentially pinned VMIDs.

Thanks,

M.

[1] 
https://patchwork.kernel.org/project/linux-arm-kernel/cover/20190724162534.7390-1-julien.gr...@arm.com/


-- 
Without deviation from the norm, progress is not possible.

Re: [PATCH v3 4/6] iommu: Move IOMMU pagesize check to attach_device

2021-10-21 Thread Marc Zyngier
On Thu, 21 Oct 2021 03:22:30 +0100,
Lu Baolu  wrote:
> 
> On 10/20/21 10:22 PM, Marc Zyngier wrote:
> > On Wed, 20 Oct 2021 06:21:44 +0100,
> > Lu Baolu  wrote:
> >> 
> >> On 2021/10/20 0:37, Sven Peter via iommu wrote:
> >>> + /*
> >>> +  * Check that CPU pages can be represented by the IOVA granularity.
> >>> +  * This has to be done after ops->attach_dev since many IOMMU drivers
> >>> +  * only limit domain->pgsize_bitmap after having attached the first
> >>> +  * device.
> >>> +  */
> >>> + ret = iommu_check_page_size(domain);
> >>> + if (ret) {
> >>> + __iommu_detach_device(domain, dev);
> >>> + return ret;
> >>> + }
> >> 
> >> It looks odd. __iommu_attach_device() attaches an I/O page table for a
> >> device. How does it relate to CPU pages? Why is it a failure case if CPU
> >> page size is not covered?
> > 
> > If you allocate a CPU PAGE_SIZE'd region, and point it at a device
> > that now can DMA to more than what you have allocated because the
> > IOMMU's own page size is larger, the device has now access to data it
> > shouldn't see. In my book, that's a pretty bad thing.
> 
> But even you enforce the CPU page size check here, this problem still
> exists unless all DMA buffers are PAGE_SIZE aligned and sized, right?

Let me take a CPU analogy: you have a page that contains some user
data *and* a kernel secret. How do you map this page into userspace
without leaking the kernel secret?

PAGE_SIZE allocations are the unit of isolation, and this applies to
both CPU and IOMMU. If you have allocated a DMA buffer that is less
than a page, you then have to resort to bounce buffering, or accept
that your data isn't safe.
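
To put numbers on it (values picked purely for illustration): with 4K
CPU pages and an IOMMU whose smallest granule is 16K, mapping a single
CPU page necessarily exposes its neighbours:

#define CPU_PAGE_SIZE	SZ_4K	/* unit of CPU-side isolation */
#define IOMMU_GRANULE	SZ_16K	/* smallest possible IOMMU mapping */

/*
 * The smallest mapping that covers one CPU page is a full granule, so
 * the device also gains access to IOMMU_GRANULE - CPU_PAGE_SIZE = 12K
 * of unrelated data sharing that granule -- the IOMMU equivalent of
 * mapping the "kernel secret" page into userspace.
 */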

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v3 4/6] iommu: Move IOMMU pagesize check to attach_device

2021-10-20 Thread Marc Zyngier
On Wed, 20 Oct 2021 06:21:44 +0100,
Lu Baolu  wrote:
> 
> On 2021/10/20 0:37, Sven Peter via iommu wrote:
> > The iova allocator is capable of handling any granularity which is a power
> > of two. Remove the much stronger condition that the granularity must be
> > smaller or equal to the CPU page size from a BUG_ON there.
> > Instead, check this condition during __iommu_attach_device and fail
> > gracefully.
> > 
> > Signed-off-by: Sven Peter
> > ---
> >   drivers/iommu/iommu.c | 35 ---
> >   drivers/iommu/iova.c  |  7 ---
> >   include/linux/iommu.h |  5 +
> >   3 files changed, 41 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index dd7863e453a5..28896739964b 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -80,6 +80,8 @@ static struct iommu_domain *__iommu_domain_alloc(struct 
> > bus_type *bus,
> >  unsigned type);
> >   static int __iommu_attach_device(struct iommu_domain *domain,
> >  struct device *dev);
> > +static void __iommu_detach_device(struct iommu_domain *domain,
> > + struct device *dev);
> >   static int __iommu_attach_group(struct iommu_domain *domain,
> > struct iommu_group *group);
> >   static void __iommu_detach_group(struct iommu_domain *domain,
> > @@ -1974,6 +1976,19 @@ void iommu_domain_free(struct iommu_domain *domain)
> >   }
> >   EXPORT_SYMBOL_GPL(iommu_domain_free);
> >   +static int iommu_check_page_size(struct iommu_domain *domain)
> > +{
> > +   if (!iommu_is_paging_domain(domain))
> > +   return 0;
> > +
> > +   if (!(domain->pgsize_bitmap & (PAGE_SIZE | (PAGE_SIZE - 1)))) {
> > +   pr_warn("IOMMU pages cannot exactly represent CPU pages.\n");
> > +   return -EFAULT;
> > +   }
> > +
> > +   return 0;
> > +}
> > +
> >   static int __iommu_attach_device(struct iommu_domain *domain,
> >  struct device *dev)
> >   {
> > @@ -1983,9 +1998,23 @@ static int __iommu_attach_device(struct iommu_domain 
> > *domain,
> > return -ENODEV;
> > ret = domain->ops->attach_dev(domain, dev);
> > -   if (!ret)
> > -   trace_attach_device_to_domain(dev);
> > -   return ret;
> > +   if (ret)
> > +   return ret;
> > +
> > +   /*
> > +* Check that CPU pages can be represented by the IOVA granularity.
> > +* This has to be done after ops->attach_dev since many IOMMU drivers
> > +* only limit domain->pgsize_bitmap after having attached the first
> > +* device.
> > +*/
> > +   ret = iommu_check_page_size(domain);
> > +   if (ret) {
> > +   __iommu_detach_device(domain, dev);
> > +   return ret;
> > +   }
> 
> It looks odd. __iommu_attach_device() attaches an I/O page table for a
> device. How does it relate to CPU pages? Why is it a failure case if CPU
> page size is not covered?

If you allocate a CPU PAGE_SIZE'd region, and point it at a device
that now can DMA to more than what you have allocated because the
IOMMU's own page size is larger, the device has now access to data it
shouldn't see. In my book, that's a pretty bad thing.

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH] iommu/dart: Clear sid2group entry when a group is freed

2021-09-24 Thread Marc Zyngier
On Fri, 24 Sep 2021 14:45:02 +0100,
Sven Peter  wrote:
> 
> sid2groups keeps track of which stream id combinations belong to an
> iommu_group so that they can be assigned correctly to devices.
> When an iommu_group is freed, a stale pointer will however remain in
> sid2groups. This prevents devices with the same stream id combination
> from ever being attached again (see below).
> Fix that by creating a shadow copy of the stream id configuration
> when a group is allocated for the first time, and clear the sid2group
> entry when that group is freed.
> 
>   # echo 1 >/sys/bus/pci/devices/\:03\:00.0/remove
>   pci :03:00.0: Removing from iommu group 1
>   # echo 1 >/sys/bus/pci/rescan
>   [...]
>   pci :03:00.0: BAR 0: assigned [mem 0x6a000-0x6a000 64bit pref]
>   pci :03:00.0: BAR 2: assigned [mem 0x6a001-0x6a001 64bit pref]
>   pci :03:00.0: BAR 6: assigned [mem 0x6c010-0x6c01007ff pref]
>   tg3 :03:00.0: Failed to add to iommu group 1: -2
>   [...]
> 
> Fixes: 46d1fb072e76b161 ("iommu/dart: Add DART iommu driver")
> Reported-by: Marc Zyngier 
> Signed-off-by: Sven Peter 

Thanks for posting this.

Tested-by: Marc Zyngier 

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH] iommu/dart: Remove iommu_flush_ops

2021-09-21 Thread Marc Zyngier
On Tue, 21 Sep 2021 16:39:34 +0100,
Sven Peter  wrote:
> 
> apple_dart_tlb_flush_{all,walk} expect to get a struct apple_dart_domain
> but instead get a struct iommu_domain right now. This breaks those two
> functions and can lead to kernel panics like the one below.
> DART can only invalidate the entire TLB and apple_dart_iotlb_sync will
> already flush everything. There's no need to do that again inside those
> two functions. Let's just drop them.
> 
>   pci :03:00.0: Removing from iommu group 1
>   Unable to handle kernel paging request at virtual address 00010023
>   [...]
>   Call trace:
>_raw_spin_lock_irqsave+0x54/0xbc
>apple_dart_hw_stream_command.constprop.0+0x2c/0x130
>apple_dart_tlb_flush_all+0x48/0x90
>free_io_pgtable_ops+0x40/0x70
>apple_dart_domain_free+0x2c/0x44
>iommu_group_release+0x68/0xac
>kobject_cleanup+0x4c/0x1fc
>kobject_cleanup+0x14c/0x1fc
>kobject_put+0x64/0x84
>iommu_group_remove_device+0x110/0x180
>iommu_release_device+0x50/0xa0
>   [...]
> 
> Fixes: 46d1fb072e76b161 ("iommu/dart: Add DART iommu driver")
> Reported-by: Marc Zyngier 
> Signed-off-by: Sven Peter 

Thanks for addressing this so quickly.

Acked-by: Marc Zyngier 
Tested-by: Marc Zyngier 

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v3 4/6] iommu: Move IOMMU pagesize check to attach_device

2021-10-22 Thread Marc Zyngier
On Fri, 22 Oct 2021 03:52:38 +0100,
Lu Baolu  wrote:
> 
> On 10/21/21 4:10 PM, Marc Zyngier wrote:
> > On Thu, 21 Oct 2021 03:22:30 +0100,
> > Lu Baolu  wrote:
> >> 
> >> On 10/20/21 10:22 PM, Marc Zyngier wrote:
> >>> On Wed, 20 Oct 2021 06:21:44 +0100,
> >>> Lu Baolu  wrote:
> >>>> 
> >>>> On 2021/10/20 0:37, Sven Peter via iommu wrote:
> >>>>> +   /*
> >>>>> +* Check that CPU pages can be represented by the IOVA 
> >>>>> granularity.
> >>>>> +* This has to be done after ops->attach_dev since many IOMMU 
> >>>>> drivers
> >>>>> +* only limit domain->pgsize_bitmap after having attached the 
> >>>>> first
> >>>>> +* device.
> >>>>> +*/
> >>>>> +   ret = iommu_check_page_size(domain);
> >>>>> +   if (ret) {
> >>>>> +   __iommu_detach_device(domain, dev);
> >>>>> +   return ret;
> >>>>> +   }
> >>>> 
> >>>> It looks odd. __iommu_attach_device() attaches an I/O page table for a
> >>>> device. How does it relate to CPU pages? Why is it a failure case if CPU
> >>>> page size is not covered?
> >>> 
> >>> If you allocate a CPU PAGE_SIZE'd region, and point it at a device
> >>> that now can DMA to more than what you have allocated because the
> >>> IOMMU's own page size is larger, the device has now access to data it
> >>> shouldn't see. In my book, that's a pretty bad thing.
> >> 
> >> But even you enforce the CPU page size check here, this problem still
> >> exists unless all DMA buffers are PAGE_SIZE aligned and sized, right?
> > 
> > Let me take a CPU analogy: you have a page that contains some user
> > data *and* a kernel secret. How do you map this page into userspace
> > without leaking the kernel secret?
> > 
> > PAGE_SIZE allocations are the unit of isolation, and this applies to
> > both CPU and IOMMU. If you have allocated a DMA buffer that is less
> > than a page, you then have to resort to bounce buffering, or accept
> > that your data isn't safe.
> 
> I can understand the problems when IOMMU page sizes is larger than CPU
> page size. But the code itself is not clean. The vendor iommu drivers
> know more hardware details than the iommu core. It looks odd that the
> vendor iommu says "okay, I can attach this I/O page table to the
> device", but the iommu core says "no, you can't" and rolls everything
> back.

If your IOMMU driver can do things behind the core's back and
contradict the view that the core has, then it is probably time to fix
your IOMMU driver and make the core aware of what is going on.
Supported page sizes is one of these things.
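
The usual way a driver reports that today is the pgsize_bitmap it
advertises through its iommu_ops; a minimal sketch with hypothetical
granules:

/* Sketch: an IOMMU driver declaring its hardware granules up front. */
static const struct iommu_ops my_iommu_ops = {
	.pgsize_bitmap	= SZ_16K | SZ_2M | SZ_32M,
	/* ... the usual attach/map/unmap callbacks ... */
};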

In general, keeping the IOMMU driver as dumb as possible is a worthy
design goal, and this is why we have these abstractions.

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [patch 29/37] PCI/MSI: Use __msi_get_virq() in pci_get_vector()

2021-11-28 Thread Marc Zyngier
On Sat, 27 Nov 2021 01:22:03 +,
Thomas Gleixner  wrote:
> 
> Use __msi_get_vector() and handle the return values to be compatible.
> 
> No functional change intended.

You wish ;-).

[   15.779540] pcieport 0001:00:01.0: AER: request AER IRQ -22 failed

Notice how amusing the IRQ number is?

> 
> Signed-off-by: Thomas Gleixner 
> ---
>  drivers/pci/msi/msi.c |   25 +
>  1 file changed, 5 insertions(+), 20 deletions(-)
> 
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -1023,28 +1023,13 @@ EXPORT_SYMBOL(pci_free_irq_vectors);
>   */
>  int pci_irq_vector(struct pci_dev *dev, unsigned int nr)
>  {
> - if (dev->msix_enabled) {
> - struct msi_desc *entry;
> +	int irq = __msi_get_virq(&dev->dev, nr);
>  
> - for_each_pci_msi_entry(entry, dev) {
> - if (entry->msi_index == nr)
> - return entry->irq;
> - }
> - WARN_ON_ONCE(1);
> - return -EINVAL;
> + switch (irq) {
> + case -ENODEV: return !nr ? dev->irq : -EINVAL;
> + case -ENOENT: return -EINVAL;
>   }
> -
> - if (dev->msi_enabled) {
> - struct msi_desc *entry = first_pci_msi_entry(dev);
> -
> - if (WARN_ON_ONCE(nr >= entry->nvec_used))
> - return -EINVAL;
> - } else {
> - if (WARN_ON_ONCE(nr > 0))
> - return -EINVAL;
> - }
> -
> - return dev->irq + nr;
> + return irq;
>  }
>  EXPORT_SYMBOL(pci_irq_vector);

I worked around it with the hack below, but I doubt this is the real
thing. portdrv_core.c does complicated things, and I don't completely
understand its logic.

M.

diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index 1f72bc734226..b15278a5fb4b 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -1092,8 +1092,9 @@ int pci_irq_vector(struct pci_dev *dev, unsigned int nr)
 	int irq = __msi_get_virq(&dev->dev, nr);
 
 	switch (irq) {
-	case -ENODEV: return !nr ? dev->irq : -EINVAL;
-	case -ENOENT: return -EINVAL;
+	case -ENOENT:
+	case -ENODEV:
+		return !nr ? dev->irq : -EINVAL;
 	}
 	return irq;
 }

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH -next] iommu/arm-smmu-v3: Add suspend and resume support

2021-07-21 Thread Marc Zyngier
On Wed, 21 Jul 2021 12:42:14 +0100,
Robin Murphy  wrote:
> 
> [ +Marc for MSI bits ]
> 
> On 2021-07-21 02:33, Bixuan Cui wrote:
> > Add suspend and resume support for arm-smmu-v3 by low-power mode.
> > 
> > When the smmu is suspended, it is powered off and the registers are
> > cleared. So saves the msi_msg context during msi interrupt initialization
> > of smmu. When resume happens it calls arm_smmu_device_reset() to restore
> > the registers.
> > 
> > Signed-off-by: Bixuan Cui 
> > Reviewed-by: Wei Yongjun 
> > Reviewed-by: Zhen Lei 
> > Reviewed-by: Ding Tianhong 
> > Reviewed-by: Hanjun Guo 
> > ---
> > 
> >   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 72 ++---
> >   1 file changed, 64 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index 235f9bdaeaf2..bf1163acbcb1 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -40,6 +40,7 @@ MODULE_PARM_DESC(disable_bypass,
> > static bool disable_msipolling;
> >   module_param(disable_msipolling, bool, 0444);
> > +static bool bypass;
> >   MODULE_PARM_DESC(disable_msipolling,
> > "Disable MSI-based polling for CMD_SYNC completion.");
> >   @@ -3129,11 +3130,37 @@ static void arm_smmu_write_msi_msg(struct
> > msi_desc *desc, struct msi_msg *msg)
> > doorbell = (((u64)msg->address_hi) << 32) | msg->address_lo;
> > doorbell &= MSI_CFG0_ADDR_MASK;
> >   + /* Saves the msg context for resume if desc->msg is empty */
> > +   if (desc->msg.address_lo == 0 && desc->msg.address_hi == 0) {
> > +   desc->msg.address_lo = msg->address_lo;
> > +   desc->msg.address_hi = msg->address_hi;
> > +   desc->msg.data = msg->data;
> > +   }
> 
> My gut feeling is that this is something a device driver maybe
> shouldn't be poking into, but I'm not entirely familiar with the area
> :/

Certainly not. If you rely on the message being stored into the
descriptors, then implement this in the core code, like we do for PCI.

> 
> > +
> > writeq_relaxed(doorbell, smmu->base + cfg[0]);
> > writel_relaxed(msg->data, smmu->base + cfg[1]);
> > writel_relaxed(ARM_SMMU_MEMATTR_DEVICE_nGnRE, smmu->base + cfg[2]);
> >   }
> >   +static void arm_smmu_resume_msis(struct arm_smmu_device *smmu)
> > +{
> > +   struct msi_desc *desc;
> > +   struct device *dev = smmu->dev;
> > +
> > +   for_each_msi_entry(desc, dev) {
> > +   switch (desc->platform.msi_index) {
> > +   case EVTQ_MSI_INDEX:
> > +   case GERROR_MSI_INDEX:
> > +   case PRIQ_MSI_INDEX:
> > +   arm_smmu_write_msi_msg(desc, &(desc->msg));

Consider using get_cached_msi_msg() instead of using the internals of
the descriptor.
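
i.e. something along these lines in the resume path, assuming the core
has cached the message for these descriptors (sketch only):

	struct msi_msg msg;

	get_cached_msi_msg(desc->irq, &msg);
	arm_smmu_write_msi_msg(desc, &msg);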

> > +   break;
> > +   default:
> > +   continue;
> > +
> > +   }
> > +   }
> > +}
> > +
> >   static void arm_smmu_setup_msis(struct arm_smmu_device *smmu)
> >   {
> > struct msi_desc *desc;
> > @@ -3184,11 +3211,17 @@ static void arm_smmu_setup_msis(struct 
> > arm_smmu_device *smmu)
> > devm_add_action(dev, arm_smmu_free_msis, dev);
> >   }
> >   -static void arm_smmu_setup_unique_irqs(struct arm_smmu_device
> > *smmu)
> > +static void arm_smmu_setup_unique_irqs(struct arm_smmu_device *smmu, bool 
> > resume_mode)
> >   {
> > int irq, ret;
> >   - arm_smmu_setup_msis(smmu);
> > +   if (!resume_mode)
> > +   arm_smmu_setup_msis(smmu);
> > +   else {
> > +   /* The irq doesn't need to be re-requested during resume */
> > +   arm_smmu_resume_msis(smmu);
> > +   return;
> 
> What about wired IRQs?

Yeah. I assume the SMMU needs to be told which event gets signalled
using MSIs or IRQs? Or is that implied by the MSI being configured or
not (I used to know the answer to that, but I have long paged it out).

> 
> > +   }
> > /* Request interrupt lines */
> > irq = smmu->evtq.q.irq;
> > @@ -3230,7 +3263,7 @@ static void arm_smmu_setup_unique_irqs(struct 
> > arm_smmu_device *smmu)
> > }
> >   }
> >   -static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu)
> > +static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu, bool 
> > resume_mode)
> >   {
> > int ret, irq;
> > u32 irqen_flags = IRQ_CTRL_EVTQ_IRQEN | IRQ_CTRL_GERROR_IRQEN;
> > @@ -3257,7 +3290,7 @@ static int arm_smmu_setup_irqs(struct arm_smmu_device 
> > *smmu)
> > if (ret < 0)
> > dev_warn(smmu->dev, "failed to enable combined irq\n");
> > } else
> > -   arm_smmu_setup_unique_irqs(smmu);
> > +   arm_smmu_setup_unique_irqs(smmu, resume_mode);
> > if (smmu->features & ARM_SMMU_FEAT_PRI)
> > irqen_flags |= IRQ_CTRL_PRIQ_IRQEN;
> > @@ -3282,7 +3315,7 @@ static int arm_smmu_device_disable(struct 
> > arm_smmu_device *smmu)
> > return ret;
> >   }
> >   -static int 

Re: [PATCH -next] iommu/arm-smmu-v3: Add suspend and resume support

2021-07-21 Thread Marc Zyngier
On Wed, 21 Jul 2021 14:59:47 +0100,
Robin Murphy  wrote:
> 
> On 2021-07-21 14:12, Marc Zyngier wrote:
> > On Wed, 21 Jul 2021 12:42:14 +0100,
> > Robin Murphy  wrote:
> >> 
> >> [ +Marc for MSI bits ]
> >> 
> >> On 2021-07-21 02:33, Bixuan Cui wrote:
> >>> Add suspend and resume support for arm-smmu-v3 in low-power mode.
> >>> 
> >>> When the SMMU is suspended, it is powered off and the registers are
> >>> cleared. So save the msi_msg context during MSI interrupt initialization
> >>> of the SMMU. When resume happens, it calls arm_smmu_device_reset() to
> >>> restore the registers.
> >>> 
> >>> Signed-off-by: Bixuan Cui 
> >>> Reviewed-by: Wei Yongjun 
> >>> Reviewed-by: Zhen Lei 
> >>> Reviewed-by: Ding Tianhong 
> >>> Reviewed-by: Hanjun Guo 
> >>> ---
> >>> 
> >>>drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 72 ++---
> >>>1 file changed, 64 insertions(+), 8 deletions(-)
> >>> 
> >>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> >>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> >>> index 235f9bdaeaf2..bf1163acbcb1 100644
> >>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> >>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> >>> @@ -40,6 +40,7 @@ MODULE_PARM_DESC(disable_bypass,
> >>>  static bool disable_msipolling;
> >>>module_param(disable_msipolling, bool, 0444);
> >>> +static bool bypass;
> >>>MODULE_PARM_DESC(disable_msipolling,
> >>>   "Disable MSI-based polling for CMD_SYNC completion.");
> >>>@@ -3129,11 +3130,37 @@ static void arm_smmu_write_msi_msg(struct
> >>> msi_desc *desc, struct msi_msg *msg)
> >>>   doorbell = (((u64)msg->address_hi) << 32) | msg->address_lo;
> >>>   doorbell &= MSI_CFG0_ADDR_MASK;
> >>>+  /* Saves the msg context for resume if desc->msg is empty */
> >>> + if (desc->msg.address_lo == 0 && desc->msg.address_hi == 0) {
> >>> + desc->msg.address_lo = msg->address_lo;
> >>> + desc->msg.address_hi = msg->address_hi;
> >>> + desc->msg.data = msg->data;
> >>> + }
> >> 
> >> My gut feeling is that this is something a device driver maybe
> >> shouldn't be poking into, but I'm not entirely familiar with the area
> >> :/
> > 
> > Certainly not. If you rely on the message being stored into the
> > descriptors, then implement this in the core code, like we do for PCI.
> 
> Ah, so it would be an acceptable compromise to *read* desc->msg (and
> thus avoid having to store our own copy of the message) if the core
> was guaranteed to cache it? That's good to know, thanks.

Yeah, vfio, a couple of other weird drivers and (*surprise!*) ia64 are
using this kind of trick. I don't see a reason not to implement that
for platform-MSI (although level signalling may be interesting...), or
even to move it into the core MSI code.
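
Something along these lines, completely untested and from memory (the
platform-MSI internals may have moved since I last looked), would do
the trick:

/* drivers/base/platform-msi.c */
static void platform_msi_write_msg(struct irq_data *data, struct msi_msg *msg)
{
	struct msi_desc *desc = irq_data_get_msi_desc(data);
	struct platform_msi_priv_data *priv_data = desc->platform.msi_priv_data;

	/* Cache the message so get_cached_msi_msg() also works for platform MSIs */
	desc->msg = *msg;

	priv_data->write_msg(desc, msg);
}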

> 
> >>> +
> >>>   writeq_relaxed(doorbell, smmu->base + cfg[0]);
> >>>   writel_relaxed(msg->data, smmu->base + cfg[1]);
> >>>   writel_relaxed(ARM_SMMU_MEMATTR_DEVICE_nGnRE, smmu->base + 
> >>> cfg[2]);
> >>>}
> >>>+static void arm_smmu_resume_msis(struct arm_smmu_device *smmu)
> >>> +{
> >>> + struct msi_desc *desc;
> >>> + struct device *dev = smmu->dev;
> >>> +
> >>> + for_each_msi_entry(desc, dev) {
> >>> + switch (desc->platform.msi_index) {
> >>> + case EVTQ_MSI_INDEX:
> >>> + case GERROR_MSI_INDEX:
> >>> + case PRIQ_MSI_INDEX:
> >>> + arm_smmu_write_msi_msg(desc, &(desc->msg));
> > 
> > Consider using get_cached_msi_msg() instead of using the internals of
> > the descriptor.
> 
> Oh, there's even a proper API for it, marvellous! I hadn't managed to
> dig that far myself :)

It is a bit odd in the sense that it takes a copy of the message
instead of returning a pointer, but at least this solves lifetime
issues.
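
Provided the message does get cached by the core, the resume path then
boils down to something like this (untested sketch):

static void arm_smmu_resume_msis(struct arm_smmu_device *smmu)
{
	struct msi_desc *desc;

	for_each_msi_entry(desc, smmu->dev) {
		struct msi_msg msg;

		get_cached_msi_msg(desc->irq, &msg);
		arm_smmu_write_msi_msg(desc, &msg);
	}
}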

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-22 Thread Marc Zyngier

On 2021-07-22 12:12, John Garry wrote:

Hi John,

[...]


Your kernel log should show:
[0.00] GICv3: Pseudo-NMIs enabled using forced ICC_PMR_EL1
synchronisation


Unrelated, but you seem to be running with ICC_CTLR_EL3.PMHE set,
which makes the overhead of pseudo-NMIs much higher than it should be
(you take a DSB SY on each interrupt unmasking).

If you are not using 1:N distribution of SPIs on the secure side,
consider turning that off in your firmware. This should make NMIs
slightly more pleasant to use.
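
For the record, the cost shows up as the pmr_sync() barrier arm64
inserts after each PMR write when PMHE is detected -- roughly this
(quoting from memory, see arch/arm64/include/asm/barrier.h):

#define pmr_sync()						\
	do {							\
		extern struct static_key_false gic_pmr_sync;	\
								\
		if (static_branch_unlikely(&gic_pmr_sync))	\
			dsb(sy);				\
	} while (0)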

M.
--
Jazz is not dead. It just smells funny...
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH -next v2] iommu/arm-smmu-v3: Add suspend and resume support

2021-07-27 Thread Marc Zyngier
On Tue, 27 Jul 2021 13:14:08 +0100,
Bixuan Cui  wrote:
> 
> Add suspend and resume support for arm-smmu-v3 in low-power mode.
> 
> When the SMMU is suspended, it is powered off and the registers are
> cleared. So save the msi_msg context during MSI interrupt initialization
> of the SMMU. When resume happens, it calls arm_smmu_device_reset() to
> restore the registers.
> 
> Signed-off-by: Bixuan Cui 
> Reviewed-by: Wei Yongjun 
> Reviewed-by: Zhen Lei 
> Reviewed-by: Ding Tianhong 
> Reviewed-by: Hanjun Guo 
> ---
> Changes in v2:
> * Using get_cached_msi_msg() instead of the descriptor to resume msi_msg
>   in arm_smmu_resume_msis();
> 
> * Move arm_smmu_resume_msis() from arm_smmu_setup_unique_irqs() into
>   arm_smmu_setup_irqs() and rename it to arm_smmu_resume_unique_irqs();
> 
>   Call arm_smmu_setup_unique_irqs() to configure the IRQ during probe and
>   call arm_smmu_resume_unique_irqs() in resume mode to restore the IRQ
>   registers to make the code more reasonable.
> 
> * Call arm_smmu_device_disable() to disable smmu and clear CR0_SMMUEN on
>   suspend. Then the warning about CR0_SMMUEN being enabled can be cleared
>   on resume.
> 
> * Using SET_SYSTEM_SLEEP_PM_OPS();
> 
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 69 ++---
>  1 file changed, 62 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 235f9bdaeaf2..66f35d5c7a70 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -40,6 +40,7 @@ MODULE_PARM_DESC(disable_bypass,
>  
>  static bool disable_msipolling;
>  module_param(disable_msipolling, bool, 0444);
> +static bool bypass;

As outlined before, this is likely to be wrong if you can have
per-SMMU bypass control.

>  MODULE_PARM_DESC(disable_msipolling,
>   "Disable MSI-based polling for CMD_SYNC completion.");
>  
> @@ -3129,11 +3130,38 @@ static void arm_smmu_write_msi_msg(struct msi_desc 
> *desc, struct msi_msg *msg)
>   doorbell = (((u64)msg->address_hi) << 32) | msg->address_lo;
>   doorbell &= MSI_CFG0_ADDR_MASK;
>  
> + /* Saves the msg context for resume if desc->msg is empty */
> + if (desc->msg.address_lo == 0x0 && desc->msg.address_hi == 0x0) {
> + desc->msg.address_lo = msg->address_lo;
> + desc->msg.address_hi = msg->address_hi;
> + desc->msg.data = msg->data;
> + }

I thought I had made it clear that this approach is not acceptable.
Please fix the generic code to keep track of the latest message.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/dma: Explicitly sort PCI DMA windows

2022-03-23 Thread Marc Zyngier
On Tue, 22 Mar 2022 17:27:36 +,
Robin Murphy  wrote:
> 
> Originally, creating the dma_ranges resource list in pre-sorted fashion
> was the simplest and most efficient way to enforce the order required by
> iova_reserve_pci_windows(). However since then at least one PCI host
> driver is now re-sorting the list for its own probe-time processing,
> which doesn't seem entirely unreasonable, so that basic assumption no
> longer holds. Make iommu-dma robust and get the sort order it needs by
> explicitly sorting, which means we can also save the effort at creation
> time and just build the list in whatever natural order the DT had.
> 
> Signed-off-by: Robin Murphy 
> ---
> 
> Looking at this area off the back of the XGene thread[1] made me realise
> that we need to do it anyway, regardless of whether it might also happen
> to restore the previous XGene behaviour or not. Presumably nobody's
> tried to use pcie-cadence-host behind an IOMMU yet...

This definitely restores PCIe functionality on my Mustang (XGene-1).
Hopefully dann can comment on whether this addresses his own issue, as
his firmware is significantly different.
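
For anyone following along at home, the change boils down to sorting
the bridge's dma_ranges by base address before iova_reserve_pci_windows()
walks them -- something like this (paraphrasing the patch from memory,
untested):

static int iommu_dma_ranges_sort(void *priv, const struct list_head *a,
				 const struct list_head *b)
{
	struct resource_entry *res_a = container_of(a, struct resource_entry, node);
	struct resource_entry *res_b = container_of(b, struct resource_entry, node);

	return res_a->res->start > res_b->res->start;
}

	/* in iova_reserve_pci_windows(), before walking the entries */
	list_sort(NULL, &bridge->dma_ranges, iommu_dma_ranges_sort);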

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: fully convert arm to use dma-direct

2022-04-23 Thread Marc Zyngier

On 2022-04-22 22:17, Linus Walleij wrote:

On Thu, Apr 21, 2022 at 9:42 AM Christoph Hellwig  wrote:


arm is the last platform not using the dma-direct code for directly
mapped DMA.  With the dmabounce removal from Arnd we can easily switch
arm to always use dma-direct now (it already does for LPAE configs
and nommu).  I'd love to merge this series through the dma-mapping tree
as it gives us the opportunity for additional core dma-mapping
improvements.

(...)


 b/arch/arm/mach-footbridge/Kconfig   |1
 b/arch/arm/mach-footbridge/common.c  |   19
 b/arch/arm/mach-footbridge/include/mach/dma-direct.h |8
 b/arch/arm/mach-footbridge/include/mach/memory.h |4


I think Marc Z has a Netwinder that he can test this on. Marc?
I have one too, just not much in my office because of parental leave.


I'm about to travel for a week. Can this wait until I'm back?
This is one of the few boxes that isn't hooked up to the PDU,
so I can't operate it remotely.

M.
--
Jazz is not dead. It just smells funny...
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 1/8] irqchip/mips-gic: Only register IPI domain when SMP is enabled

2022-07-07 Thread Marc Zyngier
On Tue, 05 Jul 2022 14:52:43 +0100,
Serge Semin  wrote:
> 
> Hi Samuel
> 
> On Fri, Jul 01, 2022 at 03:00:49PM -0500, Samuel Holland wrote:
> > The MIPS GIC irqchip driver may be selected in a uniprocessor
> > configuration, but it unconditionally registers an IPI domain.
> > 
> > Limit the part of the driver dealing with IPIs to only be compiled when
> > GENERIC_IRQ_IPI is enabled, which corresponds to an SMP configuration.
> 
> Thanks for the patch. Some comment is below.
> 
> > 
> > Reported-by: kernel test robot 
> > Signed-off-by: Samuel Holland 
> > ---
> > 
> > Changes in v3:
> >  - New patch to fix build errors in uniprocessor MIPS configs
> > 
> >  drivers/irqchip/Kconfig|  3 +-
> >  drivers/irqchip/irq-mips-gic.c | 80 +++---
> >  2 files changed, 56 insertions(+), 27 deletions(-)
> > 
> > diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
> > index 1f23a6be7d88..d26a4ff7c99f 100644
> > --- a/drivers/irqchip/Kconfig
> > +++ b/drivers/irqchip/Kconfig
> > @@ -322,7 +322,8 @@ config KEYSTONE_IRQ
> >  
> >  config MIPS_GIC
> > bool
> > -   select GENERIC_IRQ_IPI
> > +   select GENERIC_IRQ_IPI if SMP
> 
> > +   select IRQ_DOMAIN_HIERARCHY
> 
> It seems to me that the IRQ domain hierarchy is supposed to be
> created only if IPI is required. At least that's what the MIPS GIC
> driver implies. Thus we can go further and ifdef-out the
> gic_irq_domain_alloc() and gic_irq_domain_free() method definitions
> behind CONFIG_IRQ_DOMAIN_HIERARCHY, together with the initialization:
> 
>  static const struct irq_domain_ops gic_irq_domain_ops = {
>   .xlate = gic_irq_domain_xlate,
> +#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
>   .alloc = gic_irq_domain_alloc,
>   .free = gic_irq_domain_free,
> +#endif
>   .map = gic_irq_domain_map,
> };
> 
> If the GENERIC_IRQ_IPI config is enabled, CONFIG_IRQ_DOMAIN_HIERARCHY
> will be automatically selected (see the config definition in
> kernel/irq/Kconfig). If the IRQ hierarchy is needed for some other
> functionality like GENERIC_MSI_IRQ_DOMAIN or GPIOs, then those users
> will explicitly enable the IRQ_DOMAIN_HIERARCHY config, thus activating
> the .alloc and .free method definitions above.
> 
> To sum up, you can drop the IRQ_DOMAIN_HIERARCHY force-select from
> this patch and make the MIPS GIC driver code a bit more coherent.
> 
> @Marc, please correct me if I'm wrong.

Either way probably works correctly, but Samuel's approach is more
readable IMO. It is far easier to reason about a high-level feature
(GENERIC_IRQ_IPI) than an implementation detail (IRQ_DOMAIN_HIERARCHY).

If you really want to save a handful of bytes, you can make the
callbacks conditional on GENERIC_IRQ_IPI, and be done with it. But
this can come as an additional patch.
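
i.e. something like this (untested):

static const struct irq_domain_ops gic_irq_domain_ops = {
	.xlate = gic_irq_domain_xlate,
#ifdef CONFIG_GENERIC_IRQ_IPI
	.alloc = gic_irq_domain_alloc,
	.free = gic_irq_domain_free,
#endif
	.map = gic_irq_domain_map,
};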

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 6/8] genirq: Add and use an irq_data_update_affinity helper

2022-07-07 Thread Marc Zyngier
On Sun, 03 Jul 2022 16:22:03 +0100,
Oleksandr  wrote:
> 
> 
> On 01.07.22 23:00, Samuel Holland wrote:
> 
> 
> Hello Samuel
> 
> > Some architectures and irqchip drivers modify the cpumask returned by
> > irq_data_get_affinity_mask, usually by copying into it. This is
> > problematic for uniprocessor configurations, where the affinity mask
> > should be constant, as it is known at compile time.
> > 
> > Add and use a setter for the affinity mask, following the pattern of
> > irq_data_update_effective_affinity. This allows the getter function to
> > return a const cpumask pointer.
> > 
> > Signed-off-by: Samuel Holland 
> > ---
> > 
> > Changes in v3:
> >   - New patch to introduce irq_data_update_affinity
> > 
> >   arch/alpha/kernel/irq.c  | 2 +-
> >   arch/ia64/kernel/iosapic.c   | 2 +-
> >   arch/ia64/kernel/irq.c   | 4 ++--
> >   arch/ia64/kernel/msi_ia64.c  | 4 ++--
> >   arch/parisc/kernel/irq.c | 2 +-
> >   drivers/irqchip/irq-bcm6345-l1.c | 4 ++--
> >   drivers/parisc/iosapic.c | 2 +-
> >   drivers/sh/intc/chip.c   | 2 +-
> >   drivers/xen/events/events_base.c | 7 ---
> >   include/linux/irq.h  | 6 ++
> >   10 files changed, 21 insertions(+), 14 deletions(-)
> > 
> > diff --git a/arch/alpha/kernel/irq.c b/arch/alpha/kernel/irq.c
> > index f6d2946edbd2..15f2effd6baf 100644
> > --- a/arch/alpha/kernel/irq.c
> > +++ b/arch/alpha/kernel/irq.c
> > @@ -60,7 +60,7 @@ int irq_select_affinity(unsigned int irq)
> > cpu = (cpu < (NR_CPUS-1) ? cpu + 1 : 0);
> > last_cpu = cpu;
> >   - cpumask_copy(irq_data_get_affinity_mask(data),
> > cpumask_of(cpu));
> > +   irq_data_update_affinity(data, cpumask_of(cpu));
> > chip->irq_set_affinity(data, cpumask_of(cpu), false);
> > return 0;
> >   }
> > diff --git a/arch/ia64/kernel/iosapic.c b/arch/ia64/kernel/iosapic.c
> > index 35adcf89035a..99300850abc1 100644
> > --- a/arch/ia64/kernel/iosapic.c
> > +++ b/arch/ia64/kernel/iosapic.c
> > @@ -834,7 +834,7 @@ iosapic_unregister_intr (unsigned int gsi)
> > if (iosapic_intr_info[irq].count == 0) {
> >   #ifdef CONFIG_SMP
> > /* Clear affinity */
> > -   cpumask_setall(irq_get_affinity_mask(irq));
> > +   irq_data_update_affinity(irq_get_irq_data(irq), cpu_all_mask);
> >   #endif
> > /* Clear the interrupt information */
> > iosapic_intr_info[irq].dest = 0;
> > diff --git a/arch/ia64/kernel/irq.c b/arch/ia64/kernel/irq.c
> > index ecef17c7c35b..275b9ea58c64 100644
> > --- a/arch/ia64/kernel/irq.c
> > +++ b/arch/ia64/kernel/irq.c
> > @@ -57,8 +57,8 @@ static char irq_redir [NR_IRQS]; // = { [0 ... NR_IRQS-1] 
> > = 1 };
> >   void set_irq_affinity_info (unsigned int irq, int hwid, int redir)
> >   {
> > if (irq < NR_IRQS) {
> > -   cpumask_copy(irq_get_affinity_mask(irq),
> > -cpumask_of(cpu_logical_id(hwid)));
> > +   irq_data_update_affinity(irq_get_irq_data(irq),
> > +cpumask_of(cpu_logical_id(hwid)));
> > irq_redir[irq] = (char) (redir & 0xff);
> > }
> >   }
> > diff --git a/arch/ia64/kernel/msi_ia64.c b/arch/ia64/kernel/msi_ia64.c
> > index df5c28f252e3..025e5133c860 100644
> > --- a/arch/ia64/kernel/msi_ia64.c
> > +++ b/arch/ia64/kernel/msi_ia64.c
> > @@ -37,7 +37,7 @@ static int ia64_set_msi_irq_affinity(struct irq_data 
> > *idata,
> > msg.data = data;
> > pci_write_msi_msg(irq, &msg);
> > -   cpumask_copy(irq_data_get_affinity_mask(idata), cpumask_of(cpu));
> > +   irq_data_update_affinity(idata, cpumask_of(cpu));
> > return 0;
> >   }
> > @@ -132,7 +132,7 @@ static int dmar_msi_set_affinity(struct irq_data *data,
> > msg.address_lo |= MSI_ADDR_DEST_ID_CPU(cpu_physical_id(cpu));
> > dmar_msi_write(irq, &msg);
> > -   cpumask_copy(irq_data_get_affinity_mask(data), mask);
> > +   irq_data_update_affinity(data, mask);
> > return 0;
> >   }
> > diff --git a/arch/parisc/kernel/irq.c b/arch/parisc/kernel/irq.c
> > index 0fe2d79fb123..5ebb1771b4ab 100644
> > --- a/arch/parisc/kernel/irq.c
> > +++ b/arch/parisc/kernel/irq.c
> > @@ -315,7 +315,7 @@ unsigned long txn_affinity_addr(unsigned int irq, int 
> > cpu)
> >   {
> >   #ifdef CONFIG_SMP
> > struct irq_data *d = irq_get_irq_data(irq);
> > -   cpumask_copy(irq_data_get_affinity_mask(d), cpumask_of(cpu));
> > +   irq_data_update_affinity(d, cpumask_of(cpu));
> >   #endif
> > return per_cpu(cpu_data, cpu).txn_addr;
> > diff --git a/drivers/irqchip/irq-bcm6345-l1.c 
> > b/drivers/irqchip/irq-bcm6345-l1.c
> > index 142a7431745f..6899e37810a8 100644
> > --- a/drivers/irqchip/irq-bcm6345-l1.c
> > +++ b/drivers/irqchip/irq-bcm6345-l1.c
> > @@ -216,11 +216,11 @@ static int bcm6345_l1_set_affinity(struct irq_data *d,
> > enabled = intc->cpus[old_cpu]->enable_cache[word] & mask;
> > if (enabled)
> > __bcm6345_l1_mask(d);
> > - 

Re: fully convert arm to use dma-direct

2022-05-06 Thread Marc Zyngier
On Fri, 22 Apr 2022 22:17:20 +0100,
Linus Walleij  wrote:
> 
> On Thu, Apr 21, 2022 at 9:42 AM Christoph Hellwig  wrote:
> 
> > arm is the last platform not using the dma-direct code for directly
> > mapped DMA.  With the dmabounce removal from Arnd we can easily switch
> > arm to always use dma-direct now (it already does for LPAE configs
> > and nommu).  I'd love to merge this series through the dma-mapping tree
> > as it gives us the opportunity for additional core dma-mapping
> > improvements.
> (...)
> 
> >  b/arch/arm/mach-footbridge/Kconfig   |1
> >  b/arch/arm/mach-footbridge/common.c  |   19
> >  b/arch/arm/mach-footbridge/include/mach/dma-direct.h |8
> >  b/arch/arm/mach-footbridge/include/mach/memory.h |4
> 
> I think Marc Z has a Netwinder that he can test this on. Marc?
> I have one too, just not much in my office because of parental leave.

Finally found some time to hook the machine up and throw a new kernel
at it. Booted at its usual glacial speed, so FWIW:

Tested-by: Marc Zyngier 

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

