Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
On 05/05/2021 04:15, Jason Gunthorpe wrote: On Tue, May 04, 2021 at 01:54:55PM +1000, David Gibson wrote: On Mon, May 03, 2021 at 01:05:30PM -0300, Jason Gunthorpe wrote: On Thu, Apr 29, 2021 at 01:20:22PM +1000, David Gibson wrote: There is a certain appeal to having some 'PPC_TCE_CREATE_SPECIAL_IOASID' entry point that has a wack of extra information like windows that can be optionally called by the viommu driver and it remains well defined and described. Windows really aren't ppc specific. They're absolutely there on x86 and everything else as well - it's just that people are used to having a window at 0.. that you can often get away with treating it sloppily. My point is this detailed control seems to go on to more than just windows. As you say the vIOMMU is emulating specific HW that needs to have kernel interfaces to match it exactly. It's really not that bad. The case of emulating the PAPR vIOMMU on something else is relatively easy, because all updates to the IO page tables go through hypercalls. So, as long as the backend IOMMU can map all the IOVAs that the guest IOMMU can, then qemu's implementation of those hypercalls just needs to put an equivalent mapping in the backend, which it can do with a generic VFIO_DMA_MAP. So you also want the PAPR vIOMMU driver to run on, say, an ARM IOMMU? This is a good feature in general when let's say there is a linux supported device which has a proprietary device firmware update tool which only exists as an x86 binary and your hardware is not x86 - running qemu + vfio in full emulation would provide a way to run the tool to update a physical device. -- Alexey ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
On Tue, May 04, 2021 at 03:11:54PM -0700, Jacob Pan wrote: > > It is a weird way to use xarray to have a structure which > > itself is just a wrapper around another RCU protected structure. > > > > Make the caller supply the ioasid_data memory, embedded in its own > > element, get rid of the void * and rely on XA_ZERO_ENTRY to hold > > allocated but not active entries. > > > Let me try to paraphrase to make sure I understand. Currently > struct ioasid_data is private to the iasid core, its memory is allocated by > the ioasid core. > > You are suggesting the following: > 1. make struct ioasid_data public > 2. caller allocates memory for ioasid_data, initialize it then pass it to > ioasid_alloc to store in the xarray > 3. caller will be responsible for setting private data inside ioasid_data > and do call_rcu after update if needed. Basically, but you probably won't need a "private data" once the caller has this struct as it can just embed it in whatever larger struct makes sense for it and use container_of/etc I didn't look too closely at the whole thing though. Honestly I'm a bit puzzled why we need a pluggable global allocator framework.. The whole framework went to some trouble to isolate everything into iommu drivers then that whole design is disturbed by this global thing. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH v5 5/7] iommu/vt-d: Fixup delivery mode of the HPET hardlockup interrupt
On Tue, May 04 2021 at 12:10, Ricardo Neri wrote: > In x86 there is not an IRQF_NMI flag that can be used to indicate the There exists no IRQF_NMI flag at all. No architecture provides that. > delivery mode when requesting an interrupt (via request_irq()). Thus, > there is no way for the interrupt remapping driver to know and set > the delivery mode. There is no support for this today. So what? > Hence, when allocating an interrupt, check if such interrupt belongs to > the HPET hardlockup detector and fixup the delivery mode accordingly. What? > + /* > + * If we find the HPET hardlockup detector irq, fixup the > + * delivery mode. > + */ > + if (is_hpet_irq_hardlockup_detector(info)) > + irq_cfg->delivery_mode = APIC_DELIVERY_MODE_NMI; Again. We are not sticking some random device checks into that code. It's wrong and I explained it to you before. https://lore.kernel.org/lkml/alpine.deb.2.21.1906161042080.1...@nanos.tec.linutronix.de/ But I'm happy to repeat it again: "No. This is horrible hackery violating all the layering which we carefully put into place to avoid exactly this kind of sprinkling conditionals into all code pathes. With some thought the existing irqdomain hierarchy can be used to achieve the same thing without tons of extra functions and conditionals." So the outcome of thought and using the irqdomain hierarchy is: Replacing an hpet specific conditional in one place with an hpet specific conditional in a different place. Impressive. hpet_assign_irq(, bool nmi) init_info(info) ... if (nmi) info.flags |= X86_IRQ_ALLOC_AS_NMI; irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info) intel_irq_remapping_alloc(..., info) irq_domain_alloc_irq_parents(..., info) x86_vector_alloc_irqs(..., info) { if (info->flags & X86_IRQ_ALLOC_AS_NMI && nr_irqs != 1) return -EINVAL; for (i = 0; i < nr_irqs; i++) { if (info->flags & X86_IRQ_ALLOC_AS_NMI) { irq_cfg_setup_nmi(apicd); continue; } ... } irq_cfg_setup_nmi() sets irq_cfg->delivery_mode and whatever is required and everything else just works. Of course this needs a few other minor tweaks but none of those introduces random hpet quirks all over the place. Not convoluted enough, right? But that solves none of other problems. Let me summarize again which options or non-options we have: 1) Selective IPIs from NMI context cannot work As explained in the other thread. 2) Shorthand IPI allbutself from NMI This should work, but that obviously does not take the watchdog cpumask into account. Also this only works when IPI shorthand mode is enabled. See apic_smt_update() for details. 3) Sending the IPIs from irq_work This would solve the problem, but if the CPU which is the NMI target is really stuck in an interrupt disabled region then the IPIs won't be sent. OTOH, if that's the case then the CPU which was processing the NMI will continue to be stuck until the next NMI hits which will detect that the CPU is stuck which is a good enough reason to send a shorthand IPI to all CPUs ignoring the watchdog cpumask. Same limitation vs. shorthand mode as #2 4) Changing affinity of the HPET NMI from NMI As we established two years ago that cannot work with interrupt remapping 5) Changing affinity of the HPET NMI from irq_work Same issues as #3 Anything else than #2 is just causing more problems than it solves, but surely the NOHZ_FULL/isolation people might have opinions on this. OTOH, as this is opt-in, anything which wants a watchdog mask which is not the full online set, has to accept that HPET has these restrictions. And that's exactly what I suggested two years ago: https://lore.kernel.org/lkml/alpine.deb.2.21.1906172343120.1...@nanos.tec.linutronix.de/ "It definitely would be worthwhile to experiment with that. if we could use shorthands (also for regular IPIs) that would be a great improvement in general and would nicely solve that NMI issue. Beware of the dragons though." As a consequence of this conversation I implemented shorthand IPIs... But I haven't seen any mentioning that this has been tried, why the approach was not chosen or any discussion about that matter. Not that I'm surprised. Thanks, tglx ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
Hi Jason, On Tue, 4 May 2021 15:00:50 -0300, Jason Gunthorpe wrote: > On Tue, May 04, 2021 at 08:41:48AM -0700, Jacob Pan wrote: > > > > > > > > (also looking at ioasid.c, why do we need such a thin and odd > > > > wrapper around xarray?) > > > > > > > > > > I'll leave it to Jean and Jacob. > > > Could you elaborate? > > I mean stuff like this: > > int ioasid_set_data(ioasid_t ioasid, void *data) > { > struct ioasid_data *ioasid_data; > int ret = 0; > > spin_lock(&ioasid_allocator_lock); > ioasid_data = xa_load(&active_allocator->xa, ioasid); > if (ioasid_data) > rcu_assign_pointer(ioasid_data->private, data); > else > ret = -ENOENT; > spin_unlock(&ioasid_allocator_lock); > > /* > * Wait for readers to stop accessing the old private data, so the > * caller can free it. > */ > if (!ret) > synchronize_rcu(); > > return ret; > } > EXPORT_SYMBOL_GPL(ioasid_set_data); > > It is a weird way to use xarray to have a structure which > itself is just a wrapper around another RCU protected structure. > > Make the caller supply the ioasid_data memory, embedded in its own > element, get rid of the void * and rely on XA_ZERO_ENTRY to hold > allocated but not active entries. > Let me try to paraphrase to make sure I understand. Currently struct ioasid_data is private to the iasid core, its memory is allocated by the ioasid core. You are suggesting the following: 1. make struct ioasid_data public 2. caller allocates memory for ioasid_data, initialize it then pass it to ioasid_alloc to store in the xarray 3. caller will be responsible for setting private data inside ioasid_data and do call_rcu after update if needed. Correct? > Make the synchronize_rcu() the caller responsiblity, and callers > should really be able to use call_rcu() > > Jason Thanks, Jacob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH v5 3/7] iommu/vt-d: Rework prepare_irte() to support per-irq delivery mode
A previous changeset introduced a new member to struct irq_cfg to specify the delivery mode of an interrupt. Supporting the configuration of the delivery mode would require adding a third argument to prepare_irte(). Instead, simply take a pointer to an irq_cfg data structure as the only argument. Always configure the delivery mode of the Interrupt Remapping Table Entry using the values specified in the irq_cfg data structure. This change does not change the existing behavior, as the delivery mode of the APIC is used to configure the irq_cfg of each irq. Cc: Andi Kleen Cc: Borislav Petkov Cc: David Woodhouse (supporter:INTEL IOMMU (VT-d)) Cc: "Ravi V. Shankar" Cc: Ingo Molnar Cc: Jacob Pan Cc: Lu Baolu (supporter:INTEL IOMMU (VT-d)) Cc: Stephane Eranian Cc: Thomas Gleixner Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d)) Cc: x...@kernel.org Reviewed-by: Ashok Raj Signed-off-by: Ricardo Neri --- Changes since v4: * None Changes since v3: * None Changes since v2: * None Changes since v1: * Introduced this patch. --- drivers/iommu/intel/irq_remapping.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c index 611ef5243cb6..daa5df53db59 100644 --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -1104,7 +1104,7 @@ void intel_irq_remap_add_device(struct dmar_pci_notify_info *info) dev_set_msi_domain(&info->dev->dev, map_dev_to_ir(info->dev)); } -static void prepare_irte(struct irte *irte, int vector, unsigned int dest) +static void prepare_irte(struct irte *irte, struct irq_cfg *irq_cfg) { memset(irte, 0, sizeof(*irte)); @@ -1118,9 +1118,9 @@ static void prepare_irte(struct irte *irte, int vector, unsigned int dest) * irq migration in the presence of interrupt-remapping. */ irte->trigger_mode = 0; - irte->dlvry_mode = apic->delivery_mode; - irte->vector = vector; - irte->dest_id = IRTE_DEST(dest); + irte->dlvry_mode = irq_cfg->delivery_mode; + irte->vector = irq_cfg->vector; + irte->dest_id = IRTE_DEST(irq_cfg->dest_apicid); irte->redir_hint = 1; } @@ -1261,8 +1261,7 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data, { struct irte *irte = &data->irte_entry; - prepare_irte(irte, irq_cfg->vector, irq_cfg->dest_apicid); - + prepare_irte(irte, irq_cfg); switch (info->type) { case X86_IRQ_ALLOC_TYPE_IOAPIC: /* Set source-id of interrupt request */ -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH v5 5/7] iommu/vt-d: Fixup delivery mode of the HPET hardlockup interrupt
The HPET hardlockup detector requires that the HPET timer delivers the interrupt as NMI. When interrupt remapping is disabled, this can be done by programming the HPET MSI registers directly. With interrupt remapping, it is necessary to populate an entry in the interrupt remapping table. In x86 there is not an IRQF_NMI flag that can be used to indicate the delivery mode when requesting an interrupt (via request_irq()). Thus, there is no way for the interrupt remapping driver to know and set the delivery mode. Hence, when allocating an interrupt, check if such interrupt belongs to the HPET hardlockup detector and fixup the delivery mode accordingly. Cc: Andi Kleen Cc: Borislav Petkov Cc: David Woodhouse (supporter:INTEL IOMMU (VT-d)) Cc: "Ravi V. Shankar" Cc: Ingo Molnar Cc: Jacob Pan Cc: Lu Baolu (supporter:INTEL IOMMU (VT-d)) Cc: Stephane Eranian Cc: Thomas Gleixner Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d)) Cc: x...@kernel.org Reviewed-by: Ashok Raj Signed-off-by: Ricardo Neri --- Changes since v4: * Introduced this patch. Changes since v3: * N/A Changes since v2: * N/A Changes since v1: * N/A --- drivers/iommu/intel/irq_remapping.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c index daa5df53db59..b07c68ecac01 100644 --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -1376,6 +1377,14 @@ static int intel_irq_remapping_alloc(struct irq_domain *domain, irq_data->hwirq = (index << 16) + i; irq_data->chip_data = ird; irq_data->chip = &intel_ir_chip; + + /* +* If we find the HPET hardlockup detector irq, fixup the +* delivery mode. +*/ + if (is_hpet_irq_hardlockup_detector(info)) + irq_cfg->delivery_mode = APIC_DELIVERY_MODE_NMI; + intel_irq_remapping_prepare_irte(ird, irq_cfg, info, index, i); irq_set_status_flags(virq + i, IRQ_MOVE_PCNTXT); } -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH v5 1/7] x86/apic: Add irq_cfg::delivery_mode
Until now, the delivery mode of APIC interrupts is set to the default mode set in the APIC driver. However, there are no restrictions in hardware to configure each interrupt with a different delivery mode. Specifying the delivery mode per interrupt is useful when one is interested in changing the delivery mode of a particular interrupt. For instance, this can be used to deliver an interrupt as non-maskable. Add a new member, delivery_mode, to struct irq_cfg. This new member can be used to update the configuration of the delivery mode in each interrupt domain. Currently, all interrupt domains set the delivery mode of interrupts using the APIC setting. Interrupt domains use an irq_cfg data structure to configure their own data structures and hardware resources. Thus, in order to keep the current behavior, set the delivery mode of the irq configuration that as the APIC setting. In this manner, irq domains can obtain the delivery mode from the irq configuration data instead of the APIC setting, if needed. Cc: Andi Kleen Cc: Borislav Petkov Cc: David Woodhouse (supporter:INTEL IOMMU (VT-d)) Cc: "Ravi V. Shankar" Cc: Ingo Molnar Cc: Jacob Pan Cc: Lu Baolu (supporter:INTEL IOMMU (VT-d)) Cc: Stephane Eranian Cc: Thomas Gleixner Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d)) Cc: x86@kernel.orgReviewed-by: Ashok Raj Signed-off-by: Ricardo Neri --- Changes since v4: * Rebased to use new enumeration apic_delivery_modes. Changes since v3: * None Changes since v2: * Reduced scope to only add the interrupt delivery mode in struct irq_alloc_info. Changes since v1: * Introduced this patch. --- arch/x86/include/asm/hw_irq.h | 1 + arch/x86/kernel/apic/vector.c | 10 ++ 2 files changed, 11 insertions(+) diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index d465ece58151..370f4db0372b 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -90,6 +90,7 @@ struct irq_alloc_info { struct irq_cfg { unsigned intdest_apicid; unsigned intvector; + enum apic_delivery_modesdelivery_mode; }; extern struct irq_cfg *irq_cfg(unsigned int irq); diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c index 6dbdc7c22bb7..d47ed07a56a4 100644 --- a/arch/x86/kernel/apic/vector.c +++ b/arch/x86/kernel/apic/vector.c @@ -567,6 +567,16 @@ static int x86_vector_alloc_irqs(struct irq_domain *domain, unsigned int virq, irqd->chip_data = apicd; irqd->hwirq = virq + i; irqd_set_single_target(irqd); + + /* +* Initialize the delivery mode of this irq to match the +* default delivery mode of the APIC. This is useful for +* children irq domains which want to take the delivery +* mode from the individual irq configuration rather +* than from the APIC. +*/ +apicd->hw_irq_cfg.delivery_mode = apic->delivery_mode; + /* * Prevent that any of these interrupts is invoked in * non interrupt context via e.g. generic_handle_irq() -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH v5 6/7] iommu/amd: Fixup delivery mode of the HPET hardlockup interrupt
The HPET hardlockup detector requires that the HPET timer delivers the interrupt as NMI. When interrupt remapping is disabled, this can be done by programming the HPET MSI registers directly. With interrupt remapping, it is necessary to populate an entry in the interrupt remapping table. In x86 there is not an IRQF_NMI flag that can be used to indicate the delivery mode when requesting an interrupt (via request_irq()). Thus, there is no way for the interrupt remapping driver to know and set the delivery mode. Hence, when allocating an interrupt, check if such interrupt belongs to the HPET hardlockup detector and fixup the delivery mode accordingly. Cc: Ashok Raj Cc: Andi Kleen Cc: Borislav Petkov Cc: David Woodhouse (supporter:INTEL IOMMU (VT-d)) Cc: "Ravi V. Shankar" Cc: Ingo Molnar Cc: Jacob Pan Cc: Lu Baolu (supporter:INTEL IOMMU (VT-d)) Cc: Stephane Eranian Cc: Thomas Gleixner Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d)) Cc: x...@kernel.org Signed-off-by: Ricardo Neri --- Changes since v4: * Introduced this patch. Changes since v3: * N/A Changes since v2: * N/A Changes since v1: * N/A --- drivers/iommu/amd/iommu.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index e8d9fae0c766..758e08ba42e6 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -35,6 +35,7 @@ #include #include #include +#include #include #include #include @@ -3254,6 +3255,14 @@ static int irq_remapping_alloc(struct irq_domain *domain, unsigned int virq, irq_data->hwirq = (devid << 16) + i; irq_data->chip_data = data; irq_data->chip = &amd_ir_chip; + + /* +* If we find the HPET hardlockup detector irq, fixup the +* delivery mode. +*/ + if (is_hpet_irq_hardlockup_detector(info)) + cfg->delivery_mode = APIC_DELIVERY_MODE_NMI; + irq_remapping_prepare_irte(data, cfg, info, devid, index, i); irq_set_status_flags(virq + i, IRQ_MOVE_PCNTXT); } -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH v5 2/7] x86/hpet: Introduce function to identify HPET hardlockup detector irq
The HPET hardlockup detector needs to deliver its interrupt as NMI. In x86 there is not an IRQF_NMI flag that can be used in the irq plumbing code to tell interrupt remapping drivers to set the interrupt delivery mode accordingly. Hence, they must fixup the delivery mode internally. Implement a method to determine if the interrupt being allocated belongs to the HPET hardlockup detector. Cc: Andi Kleen Cc: Borislav Petkov Cc: David Woodhouse (supporter:INTEL IOMMU (VT-d)) Cc: "Ravi V. Shankar" Cc: Ingo Molnar Cc: Jacob Pan Cc: Lu Baolu (supporter:INTEL IOMMU (VT-d)) Cc: Stephane Eranian Cc: Thomas Gleixner Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d)) Cc: x...@kernel.org Reviewed-by: Ashok Raj Signed-off-by: Ricardo Neri --- Changes since v4: * Introduced this patch. Previous versions had special functions to allocate and set the affinity of a remapped NMI interrupt. Changes since v3: * N/A Changes since v2: * N/A Changes since v1: * N/A --- arch/x86/include/asm/hpet.h | 3 +++ arch/x86/kernel/hpet.c | 33 + 2 files changed, 36 insertions(+) diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h index df11c7d4af44..5bf675970d4b 100644 --- a/arch/x86/include/asm/hpet.h +++ b/arch/x86/include/asm/hpet.h @@ -149,6 +149,7 @@ extern void hardlockup_detector_hpet_stop(void); extern void hardlockup_detector_hpet_enable(unsigned int cpu); extern void hardlockup_detector_hpet_disable(unsigned int cpu); extern void hardlockup_detector_switch_to_perf(void); +extern bool is_hpet_irq_hardlockup_detector(struct irq_alloc_info *info); #else static inline int hardlockup_detector_hpet_init(void) { return -ENODEV; } @@ -156,6 +157,8 @@ static inline void hardlockup_detector_hpet_stop(void) {} static inline void hardlockup_detector_hpet_enable(unsigned int cpu) {} static inline void hardlockup_detector_hpet_disable(unsigned int cpu) {} static inline void hardlockup_detector_switch_to_perf(void) {} +static inline bool is_hpet_irq_hardlockup_detector(struct irq_alloc_info *info) +{ return false; } #endif /* CONFIG_X86_HARDLOCKUP_DETECTOR_HPET */ #else /* CONFIG_HPET_TIMER */ diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c index 5012590dc1b8..3e43e0f348b8 100644 --- a/arch/x86/kernel/hpet.c +++ b/arch/x86/kernel/hpet.c @@ -1479,6 +1479,39 @@ struct hpet_hld_data *hpet_hld_get_timer(void) hld_data = NULL; return NULL; } + +/** + * is_hpet_irq_hardlockup_detector() - Identify the HPET hld interrupt info + * @info: Interrupt allocation info, with private HPET channel data + * + * The HPET hardlockup detector is special as it needs its interrupts delivered + * as NMI. However, for interrupt remapping we use the existing irq subsystem + * to configure and route the HPET interrupt. Unfortunately, there is not a + * IRQF_NMI flag for x86. Instead, identify whether the interrupt being + * allocated for the HPET channel belongs to the hardlockup detector. + * + * Returns: True if @info indicates that it belongs to the HPET hardlockup + * detector. False otherwise. + */ +bool is_hpet_irq_hardlockup_detector(struct irq_alloc_info *info) +{ + struct hpet_channel *hc; + + if (!info) + return false; + + if (info->type != X86_IRQ_ALLOC_TYPE_HPET) + return false; + + hc = info->data; + if (!hc) + return false; + + if (hc->mode == HPET_MODE_NMI_WATCHDOG) + return true; + + return false; +} #endif /* CONFIG_X86_HARDLOCKUP_DETECTOR_HPET */ #endif -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH v5 7/7] x86/watchdog/hardlockup/hpet: Support interrupt remapping
When interrupt remapping is enabled in the system, the MSI interrupt address and data fields must follow a special format that the IOMMU defines. However, the HPET hardlockup detector must rely on the interrupt subsystem to have the interrupt remapping drivers allocate, activate, and set the affinity of HPET timer interrupt. Hence, it must use request_irq() to use such functionality. In x86 there is not an IRQF_NMI flag to indicate to the interrupt subsystem the delivery mode of the interrupt. A previous changset added functionality to detect the interrupt of the HPET hardlockup detector and fixup the delivery mode accordingly. Also, since request_irq() is used, a non-NMI interrupt handler must be defined. Even if it is not needed. When Interrupt Remapping is enabled, use the new facility to ensure interrupt is plumbed properly to work with interrupt remapping. Cc: Ashok Raj Cc: Andi Kleen Cc: Borislav Petkov Cc: David Woodhouse (supporter:INTEL IOMMU (VT-d)) Cc: "Ravi V. Shankar" Cc: Ingo Molnar Cc: Jacob Pan Cc: Lu Baolu (supporter:INTEL IOMMU (VT-d)) Cc: Stephane Eranian Cc: Thomas Gleixner Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d)) Cc: x...@kernel.org Signed-off-by: Ricardo Neri --- Changes since v4: * Use request_irq() to obtain an IRTE for the HPET hardlockup detector instead of the custom interfaces previously implemented in the interrupt remapping drivers. * Simplified detection of interrupt remapping by checking the parent of the HPET irq domain. * Stopped using the HPET magic fields of struct irq_alloc_info. They were removed in commit 2bf1e7bcedb8 ("x86/msi: Consolidate HPET allocation") * Rephrased commit message for clarity. (Ashok) * Clarified error message of non-NMI handler. (Ashok) Changes since v3: * None Changes since v2: * None Changes since v1: * Introduced this patch. Added custom functions in the Intel IOMMU driver to allocate an IRTE for the HPET hardlockup detector. --- arch/x86/include/asm/hpet.h | 2 ++ arch/x86/kernel/hpet.c | 3 ++ arch/x86/kernel/watchdog_hld_hpet.c | 48 + 3 files changed, 47 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h index 5bf675970d4b..d130285ddc96 100644 --- a/arch/x86/include/asm/hpet.h +++ b/arch/x86/include/asm/hpet.h @@ -109,6 +109,7 @@ extern void hpet_unregister_irq_handler(rtc_irq_handler handler); * @tsc_ticks_per_group: TSC ticks that must elapse for each group of * monitored CPUs. * @irq: IRQ number assigned to the HPET channel + * @int_remap_enabled: True if interrupt remapping is enabled * @handling_cpu: CPU handling the HPET interrupt * @pkgs_per_group:Number of physical packages in a group of CPUs * receiving an IPI @@ -133,6 +134,7 @@ struct hpet_hld_data { u64 tsc_next; u64 tsc_ticks_per_group; int irq; + boolintr_remap_enabled; u32 handling_cpu; u32 pkgs_per_group; u32 nr_groups; diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c index 3e43e0f348b8..ff4abdef5e15 100644 --- a/arch/x86/kernel/hpet.c +++ b/arch/x86/kernel/hpet.c @@ -1464,6 +1464,9 @@ struct hpet_hld_data *hpet_hld_get_timer(void) if (!hpet_domain) goto err; + if (hpet_domain->parent != x86_vector_domain) + hld_data->intr_remap_enabled = true; + hc->mode = HPET_MODE_NMI_WATCHDOG; irq = hpet_assign_irq(hpet_domain, hc, hc->num); if (irq <= 0) diff --git a/arch/x86/kernel/watchdog_hld_hpet.c b/arch/x86/kernel/watchdog_hld_hpet.c index 3fd2405b31fa..265641d001ac 100644 --- a/arch/x86/kernel/watchdog_hld_hpet.c +++ b/arch/x86/kernel/watchdog_hld_hpet.c @@ -176,6 +176,14 @@ static int update_msi_destid(struct hpet_hld_data *hdata) { u32 destid; + if (hdata->intr_remap_enabled) { + int ret; + + ret = irq_set_affinity(hdata->irq, + cpumask_of(hdata->handling_cpu)); + return ret; + } + destid = apic->calc_dest_apicid(hdata->handling_cpu); /* * HPET only supports a 32-bit MSI address register. Thus, only @@ -393,26 +401,52 @@ static int hardlockup_detector_nmi_handler(unsigned int type, return NMI_DONE; } +/* + * When interrupt remapping is enabled, we request the irq for the detector + * using request_irq() and then we fixup the delivery mode to NMI using + * is_hpet_irq_hardlockup_detector(). If the latter fails, we will see a non- + * NMI interrupt. + * + */ +static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data) +{ + pr_err_once("Received a non-NMI interrupt. The HLD detector always uses NMIs!\n"); +
[RFC PATCH v5 4/7] iommu/amd: Set the IRTE delivery mode from irq_cfg
There is not hardware requirement to have a different delivery mode for each interrupt. Instead of using the delivery mode of the APIC driver, use the delivery mode of each specific interrupt configuration. This allows to accommodate interrupts which require a specific delivery mode, such as the HPET hardlockup detector. Outside of such case, there are not functional changes since the delivery mode of an interrupt is initialized with the delivery mode of the APIC driver. Cc: Andi Kleen Cc: Borislav Petkov Cc: David Woodhouse (supporter:INTEL IOMMU (VT-d)) Cc: "Ravi V. Shankar" Cc: Ingo Molnar Cc: Jacob Pan Cc: Lu Baolu (supporter:INTEL IOMMU (VT-d)) Cc: Stephane Eranian Cc: Thomas Gleixner Cc: iommu@lists.linux-foundation.org (open list:INTEL IOMMU (VT-d)) Cc: x...@kernel.org Reviewed-by: Ashok Raj Signed-off-by: Ricardo Neri --- Changes since v4: * Introduced this patch. Changes since v3: * N/A Changes since v2: * N/A Changes since v1: * N/A --- drivers/iommu/amd/iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index a69a8b573e40..e8d9fae0c766 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -3122,7 +3122,7 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data, data->irq_2_irte.devid = devid; data->irq_2_irte.index = index + sub_handle; - iommu->irte_ops->prepare(data->entry, apic->delivery_mode, + iommu->irte_ops->prepare(data->entry, irq_cfg->delivery_mode, apic->dest_mode_logical, irq_cfg->vector, irq_cfg->dest_apicid, devid); -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC PATCH v5 0/7] x86: watchdog/hardlockup/hpet: Add support for interrupt remapping
Hi IOMMU experts, I proposed a hardlockup detector driven by the HPET timer [1]. Such detector is driven by a single timer. The hardlockup detector brings the extra complexity of having to update the affinity of the interrupt periodically and initiated from NMI context. The proposed design only requires updating the affinity every watchdog_thresh (the interval is between [1, 60] seconds). Also, the affinity update is offloaded to an irq_work. Handling the HPET interrupt affinity is trivial with !intremap since the detector composes the MSI message and writes it directly to the HPET registers. However, for intremap we must use the existing IOMMU drivers as well as the kernel's irq plumbing. Thomas Gleixner has imposed two restrictions: 1) Do not implement an IRQF_NMI flag for x86 as it is not possible to determine the source of an NMI [2]. 2) Use the irq subsystem to update the affinity of the HPET interrupt [3]. 1) implies that the interrupt remapping drivers need to implement a quirk to identify the HPET interrupt and update its delivery mode to NMI. 2) means that the hardlockup detector must use request_irq() to allocate the HPET interrupt. This patch series attempts to meet the requirements above by a) Decoupling the delivery mode of an APIC interrupt from the delivery mode of the APIC driver (patch 1) b) Implement quirks in the Intel and AMD IOMMU drivers to identify the HPET timer and update the delivery mode accordingly (patches 2-5). c) Add support for interrupt remapping in the HPET hardlockup detector in [1]. This includes the unavoidable eyesore of using request_irq() and having a useless regular interrupt handler (patch 6). I would like to get your feedback on whether the HPET NMI quirk looks sane to you and whether offloading the affinity setup to an irq_work could pose issues. Thanks and BR, Ricardo [1]. https://lore.kernel.org/lkml/20210504190526.22347-1-ricardo.neri-calde...@linux.intel.com/T/#mf77988cc98f9ca6988831e17f68394577388959d [2]. https://lore.kernel.org/lkml/alpine.deb.2.21.1808021137400.2...@nanos.tec.linutronix.de/ [3]. https://lore.kernel.org/lkml/alpine.deb.2.21.1906161042080.1...@nanos.tec.linutronix.de/ Changes since v4: * With !CONFIG_IRQ_REMAP [1] now disables the HPET channel before changing the MSI Destination ID field. This should avoid races between a pending interrupt and updating the detector's interrupt affinity. (Ashok) * Rebased to use new enumeration apic_delivery_modes. * Removed custom functions to allocate an interrupt for the detector and instead added support to identify the detector's interrupt and change the delivery mode. * With interrupt remapping enabled, use request_irq(). * Added support for AMD IOMMU. Changes since v3: * None Changes since v2: * None Changes since v1: * Introduced support for interrupt remapping Ricardo Neri (7): x86/apic: Add irq_cfg::delivery_mode x86/hpet: Introduce function to identify HPET hardlockup detector irq iommu/vt-d: Rework prepare_irte() to support per-irq delivery mode iommu/amd: Set the IRTE delivery mode from irq_cfg iommu/vt-d: Fixup delivery mode of the HPET hardlockup interrupt iommu/amd: Fixup delivery mode of the HPET hardlockup interrupt x86/watchdog/hardlockup/hpet: Support interrupt remapping arch/x86/include/asm/hpet.h | 5 +++ arch/x86/include/asm/hw_irq.h | 1 + arch/x86/kernel/apic/vector.c | 10 ++ arch/x86/kernel/hpet.c | 36 ++ arch/x86/kernel/watchdog_hld_hpet.c | 48 + drivers/iommu/amd/iommu.c | 11 ++- drivers/iommu/intel/irq_remapping.c | 20 7 files changed, 118 insertions(+), 13 deletions(-) -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [GIT PULL] dma-mapping updates for Linux 5.13
The pull request you sent on Tue, 4 May 2021 09:58:23 +0200: > git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-5.13 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/954b7207059cc4004f2e18f49c335304b1c6d64a Thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/prtracker.html ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
On Tue, May 04, 2021 at 01:54:55PM +1000, David Gibson wrote: > On Mon, May 03, 2021 at 01:05:30PM -0300, Jason Gunthorpe wrote: > > On Thu, Apr 29, 2021 at 01:20:22PM +1000, David Gibson wrote: > > > > There is a certain appeal to having some > > > > 'PPC_TCE_CREATE_SPECIAL_IOASID' entry point that has a wack of extra > > > > information like windows that can be optionally called by the viommu > > > > driver and it remains well defined and described. > > > > > > Windows really aren't ppc specific. They're absolutely there on x86 > > > and everything else as well - it's just that people are used to having > > > a window at 0.. that you can often get away with > > > treating it sloppily. > > > > My point is this detailed control seems to go on to more than just > > windows. As you say the vIOMMU is emulating specific HW that needs to > > have kernel interfaces to match it exactly. > > It's really not that bad. The case of emulating the PAPR vIOMMU on > something else is relatively easy, because all updates to the IO page > tables go through hypercalls. So, as long as the backend IOMMU can > map all the IOVAs that the guest IOMMU can, then qemu's implementation > of those hypercalls just needs to put an equivalent mapping in the > backend, which it can do with a generic VFIO_DMA_MAP. So you also want the PAPR vIOMMU driver to run on, say, an ARM IOMMU? > vIOMMUs with page tables in guest memory are harder, but only really > in the usual ways that a vIOMMU of that type is harder (needs cache > mode or whatever). At whatever point you need to shadow from the > guest IO page tables to the host backend, you can again do that with > generic maps, as long as the backend supports the necessary IOVAs, and > has an IO page size that's equal to or a submultiple of the vIOMMU > page size. But this definitely all becomes HW specific. For instance I want to have an ARM vIOMMU driver it needs to do some ret = ioctl(ioasid_fd, CREATE_NESTED_IOASID, [page table format is ARMvXXX]) if (ret == -EOPNOTSUPP) ret = ioctl(ioasid_fd, CREATE_NORMAL_IOASID, ..) // and do completely different and more expensive emulation I can get a little bit of generality, but at the end of the day the IOMMU must create a specific HW layout of the nested page table, if it can't, it can't. > > I'm remarking that trying to unify every HW IOMMU implementation that > > ever has/will exist into a generic API complete enough to allow the > > vIOMMU to be created is likely to result in an API too complicated to > > understand.. > > Maybe not every one, but I think we can get a pretty wide range with a > reasonable interface. It sounds like a reasonable guideline is if the feature is actually general to all IOMMUs and can be used by qemu as part of a vIOMMU emulation when compatible vIOMMU HW is not available. Having 'requested window' support that isn't actually implemented in every IOMMU is going to mean the PAPR vIOMMU emulation won't work, defeating the whole point of making things general? Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
On Tue, May 04, 2021 at 08:41:48AM -0700, Jacob Pan wrote: > > > > > > (also looking at ioasid.c, why do we need such a thin and odd wrapper > > > around xarray?) > > > > > > > I'll leave it to Jean and Jacob. > Could you elaborate? I mean stuff like this: int ioasid_set_data(ioasid_t ioasid, void *data) { struct ioasid_data *ioasid_data; int ret = 0; spin_lock(&ioasid_allocator_lock); ioasid_data = xa_load(&active_allocator->xa, ioasid); if (ioasid_data) rcu_assign_pointer(ioasid_data->private, data); else ret = -ENOENT; spin_unlock(&ioasid_allocator_lock); /* * Wait for readers to stop accessing the old private data, so the * caller can free it. */ if (!ret) synchronize_rcu(); return ret; } EXPORT_SYMBOL_GPL(ioasid_set_data); It is a weird way to use xarray to have a structure which itself is just a wrapper around another RCU protected structure. Make the caller supply the ioasid_data memory, embedded in its own element, get rid of the void * and rely on XA_ZERO_ENTRY to hold allocated but not active entries. Make the synchronize_rcu() the caller responsiblity, and callers should really be able to use call_rcu() Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
On Wed, Apr 28, 2021 at 06:58:19AM +, Tian, Kevin wrote: > > From: Jason Gunthorpe > > Sent: Wednesday, April 28, 2021 1:12 AM > > > [...] > > > As Alex says, if this line fails because of the group restrictions, > > > that's not great because it's not very obvious what's gone wrong. > > > > Okay, that is fair, but let's solve that problem directly. For > > instance netlink has been going in the direction of adding a "extack" > > from the kernel which is a descriptive error string. If the failing > > ioctl returned the string: > > > > "cannot join this device to the IOASID because device XXX in the > >same group #10 is in use" > > > > Would you agree it is now obvious what has gone wrong? In fact would > > you agree this is a lot better user experience than what applications > > do today even though they have the group FD? > > > > Currently all the discussions are around implicit vs. explicit uAPI semantics > on the group restriction. However if we look beyond group the implicit > semantics might be inevitable when dealing with incompatible iommu > domains. An existing example of iommu incompatibility is IOMMU_ > CACHE. I still think we need to get rid of these incompatibilities somehow. Having multiple HW incompatible IOASID in the same platform is just bad all around. When modeling in userspace IOMMU_CACHE sounds like it is a property of each individual IOASID, not an attribute that requires a new domain. People that want to create cache bypass IOASID's should just ask for that that directly. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating
Hi again! On 04/05/2021 07:52, Suravee Suthikulpanit wrote: On certain AMD platforms, when the IOMMU performance counter source (csource) field is zero, power-gating for the counter is enabled, which prevents write access and returns zero for read access. This can cause invalid perf result especially when event multiplexing is needed (i.e. more number of events than available counters) since the current logic keeps track of the previously read counter value, and subsequently re-program the counter to continue counting the event. With power-gating enabled, we cannot gurantee successful re-programming of the counter. Workaround this issue by : 1. Modifying the ordering of setting/reading counters and enabing/ disabling csources to only access the counter when the csource is set to non-zero. 2. Since AMD IOMMU PMU does not support interrupt mode, the logic can be simplified to always start counting with value zero, and accumulate the counter value when stopping without the need to keep track and reprogram the counter with the previously read counter value. Results for Ryzen 4700U running Ubuntu 21.04 kernel 5.11.0-16 patched as above. All amd_iommu events: Performance counter stats for 'system wide': 18 amd_iommu_0/cmd_processed/(33.29%) 9 amd_iommu_0/cmd_processed_inv/(33.33%) 0 amd_iommu_0/ign_rd_wr_mmio_1ff8h/ (33.36%) 308 amd_iommu_0/int_dte_hit/ (33.40%) 5 amd_iommu_0/int_dte_mis/ (33.45%) 346 amd_iommu_0/mem_dte_hit/ (33.46%) 8,954 amd_iommu_0/mem_dte_mis/ (33.48%) 0 amd_iommu_0/mem_iommu_tlb_pde_hit/(33.46%) 771 amd_iommu_0/mem_iommu_tlb_pde_mis/(33.44%) 14 amd_iommu_0/mem_iommu_tlb_pte_hit/(33.40%) 836 amd_iommu_0/mem_iommu_tlb_pte_mis/(33.36%) 0 amd_iommu_0/mem_pass_excl/(33.32%) 0 amd_iommu_0/mem_pass_pretrans/(33.28%) 1,601 amd_iommu_0/mem_pass_untrans/ (33.27%) 0 amd_iommu_0/mem_target_abort/ (33.27%) 1,130 amd_iommu_0/mem_trans_total/ (33.27%) 0 amd_iommu_0/page_tbl_read_gst/(33.27%) 312 amd_iommu_0/page_tbl_read_nst/(33.28%) 279 amd_iommu_0/page_tbl_read_tot/(33.27%) 0 amd_iommu_0/smi_blk/ (33.29%) 0 amd_iommu_0/smi_recv/ (33.27%) 0 amd_iommu_0/tlb_inv/ (33.26%) 0 amd_iommu_0/vapic_int_guest/ (33.25%) 366 amd_iommu_0/vapic_int_non_guest/ (33.27%) 10.001941666 seconds time elapsed Groups of 8 amd_iommu events: Performance counter stats for 'system wide': 14 amd_iommu_0/cmd_processed/ 7 amd_iommu_0/cmd_processed_inv/ 0 amd_iommu_0/ign_rd_wr_mmio_1ff8h/ 502 amd_iommu_0/int_dte_hit/ 6 amd_iommu_0/int_dte_mis/ 532 amd_iommu_0/mem_dte_hit/ 13,622 amd_iommu_0/mem_dte_mis/ 159 amd_iommu_0/mem_iommu_tlb_pde_hit/ 10.002170562 seconds time elapsed Performance counter stats for 'system wide': 762 amd_iommu_0/mem_iommu_tlb_pde_mis/ 20 amd_iommu_0/mem_iommu_tlb_pte_hit/ 698 amd_iommu_0/mem_iommu_tlb_pte_mis/ 0 amd_iommu_0/mem_pass_excl/ 0 amd_iommu_0/mem_pass_pretrans/ 15 amd_iommu_0/mem_pass_untrans/ 0 amd_iommu_0/mem_target_abort/ 718 amd_iommu_0/mem_trans_total/ 10.001683428 seconds time elapsed Performance counter stats for 'system wide': 0 amd_iommu_0/page_tbl_read_gst/ 33 amd_iommu_0/page_tbl_read_nst/ 33 amd_iommu_0/page_tbl_read_tot/ 0 amd_iommu_0/smi_blk/ 0 amd_iommu_0/smi_recv/ 0 amd_iommu_0/tlb_inv/ 0 amd_iommu_0/vapic_int_guest/ 11,638 amd_iommu_0/vapic_int_non_guest/ 10.002205748 seconds time elapsed ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
On Tue, May 04, 2021 at 09:22:55AM -0700, Jacob Pan wrote: > Hi Jason, > > On Wed, 28 Apr 2021 17:46:06 -0300, Jason Gunthorpe wrote: > > > > > I think the name IOASID is fine for the uAPI, the kernel version can > > > > be called ioasid_id or something. > > > > > > ioasid is already an id and then ioasid_id just adds confusion. Another > > > point is that ioasid is currently used to represent both PCI PASID and > > > ARM substream ID in the kernel. It implies that if we want to separate > > > ioasid and pasid in the uAPI the 'pasid' also needs to be replaced with > > > another general term usable for substream ID. Are we making the > > > terms too confusing here? > > > > This is why I also am not so sure about exposing the PASID in the API > > because it is ultimately a HW specific item. > > > > As I said to David, one avenue is to have some generic uAPI that is > > very general and keep all this deeply detailed stuff, that really only > > matters for qemu, as part of a more HW specific vIOMMU driver > > interface. > I think it is not just for QEMU. I am assuming you meant PASID is > needed for guest driver to program assigned but not mediated devices. Anything that directly operates the device and tries to instantiate PASIDs for vfio-pci devices will need to understand the PASID. > User space drivers may also need to get the real HW PASID to program it on > to the HW. So this uAPI need to provide some lookup functionality. Perhaps > the kernel generic version can be called ioasid_hw_id? > > So we have the following per my understanding: > - IOASID: a userspace logical number which identifies a page table, this can > be a first level (GVA-GPA), or a second level (GPA->HPA) page table. > - PASID: strictly defined in PCIe term > - Substream ID: strictly defined in ARM SMMUv3 spec. > - IOASID_HW_ID: a generic ID backed by PASID, Substream ID, or any other >HW IDs used to tag DMA > > Is that right? It is reasonable. If a IOASID_HW_ID IOCTL can back with a enum that qualified its exact nature it might be perfectly fine. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
Hi Jason, On Wed, 28 Apr 2021 17:46:06 -0300, Jason Gunthorpe wrote: > > > I think the name IOASID is fine for the uAPI, the kernel version can > > > be called ioasid_id or something. > > > > ioasid is already an id and then ioasid_id just adds confusion. Another > > point is that ioasid is currently used to represent both PCI PASID and > > ARM substream ID in the kernel. It implies that if we want to separate > > ioasid and pasid in the uAPI the 'pasid' also needs to be replaced with > > another general term usable for substream ID. Are we making the > > terms too confusing here? > > This is why I also am not so sure about exposing the PASID in the API > because it is ultimately a HW specific item. > > As I said to David, one avenue is to have some generic uAPI that is > very general and keep all this deeply detailed stuff, that really only > matters for qemu, as part of a more HW specific vIOMMU driver > interface. I think it is not just for QEMU. I am assuming you meant PASID is needed for guest driver to program assigned but not mediated devices. User space drivers may also need to get the real HW PASID to program it on to the HW. So this uAPI need to provide some lookup functionality. Perhaps the kernel generic version can be called ioasid_hw_id? So we have the following per my understanding: - IOASID: a userspace logical number which identifies a page table, this can be a first level (GVA-GPA), or a second level (GPA->HPA) page table. - PASID: strictly defined in PCIe term - Substream ID: strictly defined in ARM SMMUv3 spec. - IOASID_HW_ID: a generic ID backed by PASID, Substream ID, or any other HW IDs used to tag DMA Is that right? Thanks, Jacob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
Hi Kevin, On Wed, 28 Apr 2021 06:34:11 +, "Tian, Kevin" wrote: > > > > (also looking at ioasid.c, why do we need such a thin and odd wrapper > > around xarray?) > > > > I'll leave it to Jean and Jacob. I am not sure whether you are referring to the current ioasid.c or the changes proposed in this patchset. I added a per VM/ioasid_set (also per /dev/ioasid fd) xarray to store guest-host PASID mapping. The current code has a xarray for the allocators. struct ioasid_allocator_data { struct ioasid_allocator_ops *ops; struct list_head list; struct list_head slist; #define IOASID_ALLOCATOR_CUSTOM BIT(0) /* Needs framework to track results */ unsigned long flags; struct xarray xa; struct rcu_head rcu; }; Could you elaborate? Thanks, Jacob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 2/4] dt-bindings: iommu: rockchip: Add compatible for v2
Hi Rob, On Friday, April 30, 2021 22:34 -03, Rob Herring wrote: > On Thu, Apr 22, 2021 at 04:16:00PM +0200, Benjamin Gaignard wrote: > > Add compatible for the second version of IOMMU hardware block. > > RK356x IOMMU can also be link to a power domain. > > > > Signed-off-by: Benjamin Gaignard > > --- > > version 2: > > - Add power-domains property > > > > .../devicetree/bindings/iommu/rockchip,iommu.yaml | 7 ++- > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml > > b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml > > index 0db208cf724a..e54353ccd1ec 100644 > > --- a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml > > +++ b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml > > @@ -19,7 +19,9 @@ description: |+ > > > > properties: > >compatible: > > -const: rockchip,iommu > > +enum: > > + - rockchip,iommu > > + - rockchip,iommu-v2 > > This should be SoC specific. > It seems iommu-v2 is really the name Rockchip gives to this IOMMU IP core. Can we keep the "rockchip,iommu-v2" compatible, and add SoC-specific ones, as we normally do: compatible = "rockchip,rk3568-iommu", "rockchip,iommu-v2"; Just like we'd do with any peripheral: compatible = "st,stm32-dwmac", "snps,dwmac-3.50a"; Thanks! Ezequiel ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating
On Tue, May 04, 2021 at 06:58:29PM +0700, Suthikulpanit, Suravee wrote: > Peter, > > On 5/4/2021 4:39 PM, Peter Zijlstra wrote: > > On Tue, May 04, 2021 at 01:52:36AM -0500, Suravee Suthikulpanit wrote: > > > > > 2. Since AMD IOMMU PMU does not support interrupt mode, the logic > > > can be simplified to always start counting with value zero, > > > and accumulate the counter value when stopping without the need > > > to keep track and reprogram the counter with the previously read > > > counter value. > > > > This relies on the hardware counter being the full 64bit wide, is it? > > > > The HW counter value is 48-bit. Not sure why it needs to be 64-bit? > I might be missing some points here? Could you please describe? How do you deal with the 48bit overflow if you don't use the interrupt? ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating
Peter, On 5/4/2021 4:39 PM, Peter Zijlstra wrote: On Tue, May 04, 2021 at 01:52:36AM -0500, Suravee Suthikulpanit wrote: 2. Since AMD IOMMU PMU does not support interrupt mode, the logic can be simplified to always start counting with value zero, and accumulate the counter value when stopping without the need to keep track and reprogram the counter with the previously read counter value. This relies on the hardware counter being the full 64bit wide, is it? The HW counter value is 48-bit. Not sure why it needs to be 64-bit? I might be missing some points here? Could you please describe? Thanks, Suravee ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating
On Tue, May 04, 2021 at 01:52:36AM -0500, Suravee Suthikulpanit wrote: > 2. Since AMD IOMMU PMU does not support interrupt mode, the logic >can be simplified to always start counting with value zero, >and accumulate the counter value when stopping without the need >to keep track and reprogram the counter with the previously read >counter value. This relies on the hardware counter being the full 64bit wide, is it? ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RESEND PATCH v2] iommu/amd: Fix extended features logging
print_iommu_info prints the EFR register and then the decoded list of features on a separate line: pci :00:00.2: AMD-Vi: Extended features (0x206d73ef22254ade): PPR X2APIC NX GT IA GA PC GA_vAPIC The second line is emitted via 'pr_cont', which causes it to have a different ('warn') loglevel compared to the previous line ('info'). Commit 9a295ff0ffc9 attempted to rectify this by removing the newline from the pci_info format string, but this doesn't work, as pci_info calls implicitly append a newline anyway. Printing the decoded features on the same line would make it quite long. Instead, change pci_info() to pr_info() to omit PCI bus location info, which is also shown in the preceding message. This results in: pci :00:00.2: AMD-Vi: Found IOMMU cap 0x40 AMD-Vi: Extended features (0x206d73ef22254ade): PPR X2APIC NX GT IA GA PC GA_vAPIC AMD-Vi: Interrupt remapping enabled Fixes: 9a295ff0ffc9 ("iommu/amd: Print extended features in one line to fix divergent log levels") Link: https://lore.kernel.org/lkml/alpine.lnx.2.20.13.2104112326460.11...@monopod.intra.ispras.ru Signed-off-by: Alexander Monakov Cc: Paul Menzel Cc: Joerg Roedel Cc: Suravee Suthikulpanit Cc: iommu@lists.linux-foundation.org --- drivers/iommu/amd/init.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 429a4baa3aee..8f0eb865119a 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -1954,8 +1954,8 @@ static void print_iommu_info(void) pci_info(pdev, "Found IOMMU cap 0x%x\n", iommu->cap_ptr); if (iommu->cap & (1 << IOMMU_CAP_EFR)) { - pci_info(pdev, "Extended features (%#llx):", -iommu->features); + pr_info("Extended features (%#llx):", iommu->features); + for (i = 0; i < ARRAY_SIZE(feat_str); ++i) { if (iommu_feature(iommu, (1ULL << i))) pr_cont(" %s", feat_str[i]); base-commit: 9f4ad9e425a1d3b6a34617b8ea226d56a119a717 -- 2.30.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 4/4] iommu: rockchip: Add support iommu v2
From: Simon Xue RK356x SoC got new IOMMU hardware block (version 2). Add a compatible to distinguish it from the first version. Signed-off-by: Simon Xue [Benjamin] - port driver from kernel 4.19 to 5.12 - change functions prototype. - squash all fixes in this commit. Signed-off-by: Benjamin Gaignard --- version 3: - Change compatible name to use SoC name drivers/iommu/rockchip-iommu.c | 422 +++-- 1 file changed, 406 insertions(+), 16 deletions(-) diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c index e5d86b7177de..32dcf0aaec18 100644 --- a/drivers/iommu/rockchip-iommu.c +++ b/drivers/iommu/rockchip-iommu.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include #include @@ -81,6 +82,30 @@ */ #define RK_IOMMU_PGSIZE_BITMAP 0x007ff000 +#define DT_LO_MASK 0xf000 +#define DT_HI_MASK GENMASK_ULL(39, 32) +#define DT_SHIFT 28 + +#define DTE_BASE_HI_MASK GENMASK(11, 4) + +#define PAGE_DESC_LO_MASK 0xf000 +#define PAGE_DESC_HI1_LOWER 32 +#define PAGE_DESC_HI1_UPPER 35 +#define PAGE_DESC_HI2_LOWER 36 +#define PAGE_DESC_HI2_UPPER 39 +#define PAGE_DESC_HI_MASK1 GENMASK_ULL(PAGE_DESC_HI1_UPPER, PAGE_DESC_HI1_LOWER) +#define PAGE_DESC_HI_MASK2 GENMASK_ULL(PAGE_DESC_HI2_UPPER, PAGE_DESC_HI2_LOWER) + +#define DTE_HI1_LOWER 8 +#define DTE_HI1_UPPER 11 +#define DTE_HI2_LOWER 4 +#define DTE_HI2_UPPER 7 +#define DTE_HI_MASK1 GENMASK(DTE_HI1_UPPER, DTE_HI1_LOWER) +#define DTE_HI_MASK2 GENMASK(DTE_HI2_UPPER, DTE_HI2_LOWER) + +#define PAGE_DESC_HI_SHIFT1 (PAGE_DESC_HI1_LOWER - DTE_HI1_LOWER) +#define PAGE_DESC_HI_SHIFT2 (PAGE_DESC_HI2_LOWER - DTE_HI2_LOWER) + struct rk_iommu_domain { struct list_head iommus; u32 *dt; /* page directory table */ @@ -91,6 +116,10 @@ struct rk_iommu_domain { struct iommu_domain domain; }; +struct rockchip_iommu_data { + u32 version; +}; + /* list of clocks required by IOMMU */ static const char * const rk_iommu_clocks[] = { "aclk", "iface", @@ -108,6 +137,7 @@ struct rk_iommu { struct list_head node; /* entry in rk_iommu_domain.iommus */ struct iommu_domain *domain; /* domain to which iommu is attached */ struct iommu_group *group; + u32 version; }; struct rk_iommudata { @@ -174,11 +204,32 @@ static struct rk_iommu_domain *to_rk_domain(struct iommu_domain *dom) #define RK_DTE_PT_ADDRESS_MASK0xf000 #define RK_DTE_PT_VALID BIT(0) +/* + * In v2: + * 31:12 - PT address bit 31:0 + * 11: 8 - PT address bit 35:32 + * 7: 4 - PT address bit 39:36 + * 3: 1 - Reserved + * 0 - 1 if PT @ PT address is valid + */ +#define RK_DTE_PT_ADDRESS_MASK_V2 0xfff0 + static inline phys_addr_t rk_dte_pt_address(u32 dte) { return (phys_addr_t)dte & RK_DTE_PT_ADDRESS_MASK; } +static inline phys_addr_t rk_dte_pt_address_v2(u32 dte) +{ + u64 dte_v2 = dte; + + dte_v2 = ((dte_v2 & DTE_HI_MASK2) << PAGE_DESC_HI_SHIFT2) | +((dte_v2 & DTE_HI_MASK1) << PAGE_DESC_HI_SHIFT1) | +(dte_v2 & PAGE_DESC_LO_MASK); + + return (phys_addr_t)dte_v2; +} + static inline bool rk_dte_is_pt_valid(u32 dte) { return dte & RK_DTE_PT_VALID; @@ -189,6 +240,15 @@ static inline u32 rk_mk_dte(dma_addr_t pt_dma) return (pt_dma & RK_DTE_PT_ADDRESS_MASK) | RK_DTE_PT_VALID; } +static inline u32 rk_mk_dte_v2(dma_addr_t pt_dma) +{ + pt_dma = (pt_dma & PAGE_DESC_LO_MASK) | +((pt_dma & PAGE_DESC_HI_MASK1) >> PAGE_DESC_HI_SHIFT1) | +(pt_dma & PAGE_DESC_HI_MASK2) >> PAGE_DESC_HI_SHIFT2; + + return (pt_dma & RK_DTE_PT_ADDRESS_MASK_V2) | RK_DTE_PT_VALID; +} + /* * Each PTE has a Page address, some flags and a valid bit: * +-+---+---+-+ @@ -215,11 +275,37 @@ static inline u32 rk_mk_dte(dma_addr_t pt_dma) #define RK_PTE_PAGE_READABLE BIT(1) #define RK_PTE_PAGE_VALID BIT(0) +/* + * In v2: + * 31:12 - Page address bit 31:0 + * 11:9 - Page address bit 34:32 + * 8:4 - Page address bit 39:35 + * 3 - Security + * 2 - Readable + * 1 - Writable + * 0 - 1 if Page @ Page address is valid + */ +#define RK_PTE_PAGE_ADDRESS_MASK_V2 0xfff0 +#define RK_PTE_PAGE_FLAGS_MASK_V20x000e +#define RK_PTE_PAGE_READABLE_V2 BIT(2) +#define RK_PTE_PAGE_WRITABLE_V2 BIT(1) + static inline phys_addr_t rk_pte_page_address(u32 pte) { return (phys_addr_t)pte & RK_PTE_PAGE_ADDRESS_MASK; } +static inline phys_addr_t rk_pte_page_address_v2(u32 pte) +{ + u64 pte_v2 = pte; + + pte_v2 = ((pte_v2 & DTE_HI_MASK2) << PAGE_DESC_HI_SHIFT2) | +((pte_v2 & DTE_HI_MASK1) << PAGE_DESC_HI_SHIFT1) | +(pte_v2 & PAGE_DESC_LO_MASK); + + return (phys_addr_t)pte_v2; +} + static inline bool rk_pte_is_page_valid(u32 pte) { return pte & RK_PTE_PAGE_VALID; @@ -229,12 +315,26 @@ static inline bool rk_pte_is_page_valid(u32 p
[PATCH v3 2/4] dt-bindings: iommu: rockchip: Add compatible for v2
Add compatible for the second version of IOMMU hardware block. RK356x IOMMU can also be link to a power domain. Signed-off-by: Benjamin Gaignard --- version 3: - Rename compatible with SoC name version 2: - Add power-domains property .../devicetree/bindings/iommu/rockchip,iommu.yaml | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml index 0db208cf724a..7f6bca185b5f 100644 --- a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml +++ b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml @@ -19,7 +19,9 @@ description: |+ properties: compatible: -const: rockchip,iommu +enum: + - rockchip,iommu + - rockchip,rk3568-iommu reg: minItems: 1 @@ -46,6 +48,9 @@ properties: "#iommu-cells": const: 0 + power-domains: +maxItems: 1 + rockchip,disable-mmu-reset: $ref: /schemas/types.yaml#/definitions/flag description: | -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 1/4] dt-bindings: iommu: rockchip: Convert IOMMU to DT schema
Convert Rockchip IOMMU to DT schema Signed-off-by: Benjamin Gaignard --- version 2: - Change maintainer - Change reg maxItems - Change interrupt maxItems .../bindings/iommu/rockchip,iommu.txt | 38 - .../bindings/iommu/rockchip,iommu.yaml| 79 +++ 2 files changed, 79 insertions(+), 38 deletions(-) delete mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.txt create mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt b/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt deleted file mode 100644 index 6ecefea1c6f9.. --- a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt +++ /dev/null @@ -1,38 +0,0 @@ -Rockchip IOMMU -== - -A Rockchip DRM iommu translates io virtual addresses to physical addresses for -its master device. Each slave device is bound to a single master device, and -shares its clocks, power domain and irq. - -Required properties: -- compatible : Should be "rockchip,iommu" -- reg : Address space for the configuration registers -- interrupts : Interrupt specifier for the IOMMU instance -- interrupt-names : Interrupt name for the IOMMU instance -- #iommu-cells: Should be <0>. This indicates the iommu is a -"single-master" device, and needs no additional information -to associate with its master device. See: -Documentation/devicetree/bindings/iommu/iommu.txt -- clocks : A list of clocks required for the IOMMU to be accessible by -the host CPU. -- clock-names : Should contain the following: - "iface" - Main peripheral bus clock (PCLK/HCL) (required) - "aclk" - AXI bus clock (required) - -Optional properties: -- rockchip,disable-mmu-reset : Don't use the mmu reset operation. - Some mmu instances may produce unexpected results - when the reset operation is used. - -Example: - - vopl_mmu: iommu@ff940300 { - compatible = "rockchip,iommu"; - reg = <0xff940300 0x100>; - interrupts = ; - interrupt-names = "vopl_mmu"; - clocks = <&cru ACLK_VOP1>, <&cru HCLK_VOP1>; - clock-names = "aclk", "iface"; - #iommu-cells = <0>; - }; diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml new file mode 100644 index ..0db208cf724a --- /dev/null +++ b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml @@ -0,0 +1,79 @@ +# SPDX-License-Identifier: GPL-2.0-only +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/iommu/rockchip,iommu.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Rockchip IOMMU + +maintainers: + - Heiko Stuebner + +description: |+ + A Rockchip DRM iommu translates io virtual addresses to physical addresses for + its master device. Each slave device is bound to a single master device and + shares its clocks, power domain and irq. + + For information on assigning IOMMU controller to its peripheral devices, + see generic IOMMU bindings. + +properties: + compatible: +const: rockchip,iommu + + reg: +minItems: 1 +maxItems: 2 + + interrupts: +minItems: 1 +maxItems: 2 + + interrupt-names: +minItems: 1 +maxItems: 2 + + clocks: +items: + - description: Core clock + - description: Interface clock + + clock-names: +items: + - const: aclk + - const: iface + + "#iommu-cells": +const: 0 + + rockchip,disable-mmu-reset: +$ref: /schemas/types.yaml#/definitions/flag +description: | + Do not use the mmu reset operation. + Some mmu instances may produce unexpected results + when the reset operation is used. + +required: + - compatible + - reg + - interrupts + - clocks + - clock-names + - "#iommu-cells" + +additionalProperties: false + +examples: + - | +#include +#include + +vopl_mmu: iommu@ff940300 { + compatible = "rockchip,iommu"; + reg = <0xff940300 0x100>; + interrupts = ; + interrupt-names = "vopl_mmu"; + clocks = <&cru ACLK_VOP1>, <&cru HCLK_VOP1>; + clock-names = "aclk", "iface"; + #iommu-cells = <0>; +}; -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 3/4] ARM: dts: rockchip: rk322x: Fix iommu-cells properties name
Add '#" to iommu-cells properties Signed-off-by: Benjamin Gaignard --- arch/arm/boot/dts/rk322x.dtsi | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/arm/boot/dts/rk322x.dtsi b/arch/arm/boot/dts/rk322x.dtsi index a4dd50aaf3fc..8af2e9288029 100644 --- a/arch/arm/boot/dts/rk322x.dtsi +++ b/arch/arm/boot/dts/rk322x.dtsi @@ -564,7 +564,7 @@ vpu_mmu: iommu@20020800 { interrupt-names = "vpu_mmu"; clocks = <&cru ACLK_VPU>, <&cru HCLK_VPU>; clock-names = "aclk", "iface"; - iommu-cells = <0>; + #iommu-cells = <0>; status = "disabled"; }; @@ -575,7 +575,7 @@ vdec_mmu: iommu@20030480 { interrupt-names = "vdec_mmu"; clocks = <&cru ACLK_RKVDEC>, <&cru HCLK_RKVDEC>; clock-names = "aclk", "iface"; - iommu-cells = <0>; + #iommu-cells = <0>; status = "disabled"; }; @@ -629,7 +629,7 @@ iep_mmu: iommu@20070800 { interrupt-names = "iep_mmu"; clocks = <&cru ACLK_IEP>, <&cru HCLK_IEP>; clock-names = "aclk", "iface"; - iommu-cells = <0>; + #iommu-cells = <0>; status = "disabled"; }; -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 0/4] Add driver for rk356x
This series adds the IOMMU driver for rk356x SoC. Since a new compatible is needed to distinguish this second version of IOMMU hardware block from the first one, it is an opportunity to convert the binding to DT schema. version 3: - Rename compatible with soc prefix - Rebase on v5.12 tag version 2: - Fix iommu-cells typo in rk322x.dtsi - Change maintainer - Change reg maxItems - Add power-domains property Benjamin Gaignard (3): dt-bindings: iommu: rockchip: Convert IOMMU to DT schema dt-bindings: iommu: rockchip: Add compatible for v2 ARM: dts: rockchip: rk322x: Fix iommu-cells properties name Simon Xue (1): iommu: rockchip: Add support iommu v2 .../bindings/iommu/rockchip,iommu.txt | 38 -- .../bindings/iommu/rockchip,iommu.yaml| 84 arch/arm/boot/dts/rk322x.dtsi | 6 +- drivers/iommu/rockchip-iommu.c| 422 +- 4 files changed, 493 insertions(+), 57 deletions(-) delete mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.txt create mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[GIT PULL] dma-mapping updates for Linux 5.13
The following changes since commit 1e28eed17697bcf343c6743f0028cc3b5dd88bf0: Linux 5.12-rc3 (2021-03-14 14:41:02 -0700) are available in the Git repository at: git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-5.13 for you to fetch changes up to a7f3d3d3600c8ed119eb0d2483de0062ce2e3707: dma-mapping: add unlikely hint to error path in dma_mapping_error (2021-04-02 16:41:08 +0200) dma-mapping updates for Linux 5.13: - add a new dma_alloc_noncontiguous API (me, Ricardo Ribalda) - fix a copyright noice (Hao Fang) - add an unlikely annotation to dma_mapping_error (Heiner Kallweit) - remove a pointless empty line (Wang Qing) - add support for multi-pages map/unmap bencharking (Xiang Chen) Christoph Hellwig (5): dma-mapping: add a dma_mmap_pages helper dma-mapping: refactor dma_{alloc,free}_pages dma-mapping: add a dma_alloc_noncontiguous API dma-iommu: refactor iommu_dma_alloc_remap dma-iommu: implement ->alloc_noncontiguous Hao Fang (1): dma-mapping: benchmark: use the correct HiSilicon copyright Heiner Kallweit (1): dma-mapping: add unlikely hint to error path in dma_mapping_error Ricardo Ribalda (1): media: uvcvideo: Use dma_alloc_noncontiguous API Wang Qing (1): dma-mapping: remove a pointless empty line in dma_alloc_coherent Xiang Chen (1): dma-mapping: benchmark: Add support for multi-pages map/unmap Documentation/core-api/dma-api.rst | 88 ++ drivers/iommu/dma-iommu.c | 103 - drivers/media/usb/uvc/uvc_video.c | 94 +++ drivers/media/usb/uvc/uvcvideo.h| 5 +- include/linux/dma-map-ops.h | 19 +++ include/linux/dma-mapping.h | 37 +- kernel/dma/map_benchmark.c | 23 ++-- kernel/dma/mapping.c| 148 ++-- tools/testing/selftests/dma/dma_map_benchmark.c | 22 +++- 9 files changed, 456 insertions(+), 83 deletions(-) ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 1/4] dt-bindings: iommu: rockchip: Convert IOMMU to DT schema
Le 01/05/2021 à 00:14, Rob Herring a écrit : On Thu, Apr 22, 2021 at 02:16:53PM -0300, Ezequiel Garcia wrote: (Adding Kever) Hi Benjamin, Thanks a lot for working on this, it looks amazing. Together with the great work that Rockchip is doing, it seems RK3566/RK3568 will have decent support very soon. One comment here: On Thu, 2021-04-22 at 16:15 +0200, Benjamin Gaignard wrote: Convert Rockchip IOMMU to DT schema Signed-off-by: Benjamin Gaignard --- version 2: - Change maintainer - Change reg maxItems - Change interrupt maxItems .../bindings/iommu/rockchip,iommu.txt | 38 - .../bindings/iommu/rockchip,iommu.yaml | 79 +++ 2 files changed, 79 insertions(+), 38 deletions(-) delete mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.txt create mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt b/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt deleted file mode 100644 index 6ecefea1c6f9.. --- a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt +++ /dev/null @@ -1,38 +0,0 @@ -Rockchip IOMMU -== - -A Rockchip DRM iommu translates io virtual addresses to physical addresses for -its master device. Each slave device is bound to a single master device, and -shares its clocks, power domain and irq. - -Required properties: -- compatible : Should be "rockchip,iommu" -- reg : Address space for the configuration registers -- interrupts : Interrupt specifier for the IOMMU instance -- interrupt-names : Interrupt name for the IOMMU instance -- #iommu-cells : Should be <0>. This indicates the iommu is a - "single-master" device, and needs no additional information - to associate with its master device. See: - Documentation/devicetree/bindings/iommu/iommu.txt -- clocks : A list of clocks required for the IOMMU to be accessible by - the host CPU. -- clock-names : Should contain the following: - "iface" - Main peripheral bus clock (PCLK/HCL) (required) - "aclk" - AXI bus clock (required) - -Optional properties: -- rockchip,disable-mmu-reset : Don't use the mmu reset operation. - Some mmu instances may produce unexpected results - when the reset operation is used. - -Example: - - vopl_mmu: iommu@ff940300 { - compatible = "rockchip,iommu"; - reg = <0xff940300 0x100>; - interrupts = ; - interrupt-names = "vopl_mmu"; - clocks = <&cru ACLK_VOP1>, <&cru HCLK_VOP1>; - clock-names = "aclk", "iface"; - #iommu-cells = <0>; - }; diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml new file mode 100644 index ..0db208cf724a --- /dev/null +++ b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml @@ -0,0 +1,79 @@ +# SPDX-License-Identifier: GPL-2.0-only +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/iommu/rockchip,iommu.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Rockchip IOMMU + +maintainers: + - Heiko Stuebner + +description: |+ + A Rockchip DRM iommu translates io virtual addresses to physical addresses for + its master device. Each slave device is bound to a single master device and + shares its clocks, power domain and irq. + + For information on assigning IOMMU controller to its peripheral devices, + see generic IOMMU bindings. + +properties: + compatible: + const: rockchip,iommu + + reg: + minItems: 1 + maxItems: 2 + + interrupts: + minItems: 1 + maxItems: 2 + + interrupt-names: + minItems: 1 + maxItems: 2 + AFAICS, the driver supports handling multiple MMUs, and there's one reg and interrupt cell for each MMU. IOW, there's no requirement that maxItems is 2. Is there any way we can describe that? Or maybe just allow a bigger maximum? With #iommu-cells == 0, how would one distinguish which IOMMU is associated with a device? IOW, is more that 1 really usable? If you need more just pick a maxItems value that's either the most seen or 'should be enough'TM. If the entries are just multiple instances of the same thing, please note that here. In current dts files it is up to two interruptions per IOMMU hardware blocks so I will keep it to this value. Benjamin Rob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu