Re: [PATCH v6 15/29] x86/hpet: Add helper function hpet_set_comparator_periodic()

2022-05-06 Thread Thomas Gleixner
On Fri, May 06 2022 at 23:41, Thomas Gleixner wrote:
> On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
>> Programming an HPET channel as periodic requires setting the
>> HPET_TN_SETVAL bit in the channel configuration. Plus, the comparator
>> register must be written twice (once for the comparator value and once for
>> the periodic value). Since this programming might be needed in several
>> places (e.g., the HPET clocksource and the HPET-based hardlockup detector),
>> add a helper function for this purpose.
>>
>> A helper function hpet_set_comparator_oneshot() could also be implemented.
>> However, such function would only program the comparator register and the
>> function would be quite small. Hence, it is better to not bloat the code
>> with such an obvious function.
>
> This word salad above does not provide a single reason why the periodic
> programming function is required and better suited for the NMI watchdog
> case and then goes on and blurbs about why a function which is not
> required is not implemented. The argument about not bloating the code
> with an "obvious???" function which is quite small is slightly beyond my
> comprehension level.

What's even more incomprehensible is that the patch which actually sets
up that NMI watchdog cruft has:

> +   if (hc->boot_cfg & HPET_TN_PERIODIC_CAP)
> +   hld_data->has_periodic = true;

So how the heck does that work with a HPET which does not support
periodic mode?

That watchdog muck will still happily invoke that set periodic function
in the hope that it works by chance?

Thanks,

tglx
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 15/29] x86/hpet: Add helper function hpet_set_comparator_periodic()

2022-05-06 Thread Thomas Gleixner
On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> Programming an HPET channel as periodic requires setting the
> HPET_TN_SETVAL bit in the channel configuration. Plus, the comparator
> register must be written twice (once for the comparator value and once for
> the periodic value). Since this programming might be needed in several
> places (e.g., the HPET clocksource and the HPET-based hardlockup detector),
> add a helper function for this purpose.
>
> A helper function hpet_set_comparator_oneshot() could also be implemented.
> However, such function would only program the comparator register and the
> function would be quite small. Hence, it is better to not bloat the code
> with such an obvious function.

This word salad above does not provide a single reason why the periodic
programming function is required and better suited for the NMI watchdog
case and then goes on and blurbs about why a function which is not
required is not implemented. The argument about not bloating the code
with an "obvious???" function which is quite small is slightly beyond my
comprehension level.

Thanks,

tglx


Re: [PATCH v6 13/29] iommu/amd: Compose MSI messages for NMI irqs in non-IR format

2022-05-06 Thread Thomas Gleixner
On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> +  *
> +  * Also, NMIs do not have an associated vector. No need for cleanup.

They have a vector and what the heck is this cleanup comment for here?
There is nothing cleanup alike anywhere near.

Adding confusing comments is worse than adding no comments at all.

Thanks,

tglx


Re: [PATCH v6 12/29] iommu/amd: Enable NMIPass when allocating an NMI irq

2022-05-06 Thread Thomas Gleixner
On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
>  
> + if (info->flags & X86_IRQ_ALLOC_AS_NMI) {
> + /* Only one IRQ per NMI */
> + if (nr_irqs != 1)
> + return -EINVAL;

See previous reply.



Re: [PATCH] drm/nouveau/tegra: Stop using iommu_present()

2022-05-06 Thread Lyude Paul
Whoops! Was going through my unread emails and noticed I somehow missed this
patch last month.

Reviewed-by: Lyude Paul 

I will push this to drm-misc-fixes in a little bit (assuming I don't find it
there already)

On Tue, 2022-04-05 at 15:21 +0100, Robin Murphy wrote:
> Even if some IOMMU has registered itself on the platform "bus", that
> doesn't necessarily mean it provides translation for the device we
> care about. Replace iommu_present() with a more appropriate check.
> 
> Signed-off-by: Robin Murphy 
> ---
>  drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
> index 992cc285f2fe..2ed528c065fa 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
> @@ -123,7 +123,7 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra *tdev)
>  
>  	mutex_init(&tdev->iommu.mutex);
>  
> -	if (iommu_present(&platform_bus_type)) {
> +	if (device_iommu_mapped(dev)) {
>  		tdev->iommu.domain = iommu_domain_alloc(&platform_bus_type);
>  		if (!tdev->iommu.domain)
>  			goto error;

-- 
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat


Re: [PATCH v6 10/29] iommu/vt-d: Implement minor tweaks for NMI irqs

2022-05-06 Thread Thomas Gleixner
On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> The Intel IOMMU interrupt remapping driver already programs correctly the
> delivery mode of individual irqs as per their irq_data. Improve handling
> of NMIs. Allow only one irq per NMI. Also, it is not necessary to cleanup
> irq vectors after updating affinity.

Structuring a changelog in paragraphs might make it readable. New lines
exist for a reason.

> NMIs do not have associated vectors.

Again. NMI has a vector associated, but it is not subject to dynamic
vector management.

> diff --git a/drivers/iommu/intel/irq_remapping.c 
> b/drivers/iommu/intel/irq_remapping.c
> index fb2d71bea98d..791a9331e257 100644
> --- a/drivers/iommu/intel/irq_remapping.c
> +++ b/drivers/iommu/intel/irq_remapping.c
> @@ -1198,8 +1198,12 @@ intel_ir_set_affinity(struct irq_data *data, const 
> struct cpumask *mask,
>* After this point, all the interrupts will start arriving
>* at the new destination. So, time to cleanup the previous
>* vector allocation.
> +  *
> +  * Do it only for non-NMI irqs. NMIs don't have associated
> +  * vectors.

See above.

>*/
> - send_cleanup_vector(cfg);
> + if (cfg->delivery_mode != APIC_DELIVERY_MODE_NMI)
> + send_cleanup_vector(cfg);

So this needs to be replicated for all invocations of
send_cleanup_vector(), right? Why can't you put it into that function?
  
>   return IRQ_SET_MASK_OK_DONE;
>  }
> @@ -1352,6 +1356,9 @@ static int intel_irq_remapping_alloc(struct irq_domain 
> *domain,
>   if (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI)
>   info->flags &= ~X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
>  
> + if ((info->flags & X86_IRQ_ALLOC_AS_NMI) && nr_irqs != 1)
> + return -EINVAL;

This cannot be reached when the vector allocation domain already
rejected it, but copy & pasta is wonderful and increases the line count.

Thanks,

tglx




Re: [PATCH v6 05/29] x86/apic/vector: Do not allocate vectors for NMIs

2022-05-06 Thread Thomas Gleixner
On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> Vectors are meaningless when allocating IRQs with NMI as the delivery
> mode.

Vectors are not meaningless. NMI has a fixed vector.

The point is that for a fixed vector there is no vector management
required.

Can you spot the difference?

> In such case, skip the reservation of IRQ vectors. Do it in the lowest-
> level functions where the actual IRQ reservation takes place.
>
> Since NMIs target specific CPUs, keep the functionality to find the best
> CPU.

Again. What for?
  
> + if (apicd->hw_irq_cfg.delivery_mode == APIC_DELIVERY_MODE_NMI) {
> + cpu = irq_matrix_find_best_cpu(vector_matrix, dest);
> + apicd->cpu = cpu;
> + vector = 0;
> + goto no_vector;
> + }

Why would a NMI ever end up in this code? There is no vector management
required and this find cpu exercise is pointless.

> 	vector = irq_matrix_alloc(vector_matrix, dest, resvd, &cpu);
>   trace_vector_alloc(irqd->irq, vector, resvd, vector);
>   if (vector < 0)
>   return vector;
>   apic_update_vector(irqd, vector, cpu);
> +
> +no_vector:
>   apic_update_irq_cfg(irqd, vector, cpu);
>  
>   return 0;
> @@ -321,12 +330,22 @@ assign_managed_vector(struct irq_data *irqd, const 
> struct cpumask *dest)
>   /* set_affinity might call here for nothing */
>   if (apicd->vector && cpumask_test_cpu(apicd->cpu, vector_searchmask))
>   return 0;
> +
> + if (apicd->hw_irq_cfg.delivery_mode == APIC_DELIVERY_MODE_NMI) {
> + cpu = irq_matrix_find_best_cpu_managed(vector_matrix, dest);
> + apicd->cpu = cpu;
> + vector = 0;
> + goto no_vector;
> + }

This code can never be reached for a NMI delivery. If so, then it's a
bug.

This all is special purpose for that particular HPET NMI watchdog use
case and we are not exposing this to anything else at all.

So why are you sprinkling this NMI nonsense all over the place? Just
because? There are well defined entry points to all of this where this
can be fenced off.

If at some day the hardware people get their act together and provide a
proper vector space for NMIs then we have to revisit that, but then
there will be a separate NMI vector management too.

What you want is the below which also covers the next patch. Please keep
an eye on the comments I added/modified.

Thanks,

tglx
---
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -42,6 +42,7 @@ EXPORT_SYMBOL_GPL(x86_vector_domain);
 static DEFINE_RAW_SPINLOCK(vector_lock);
 static cpumask_var_t vector_searchmask;
 static struct irq_chip lapic_controller;
+static struct irq_chip lapic_nmi_controller;
 static struct irq_matrix *vector_matrix;
 #ifdef CONFIG_SMP
 static DEFINE_PER_CPU(struct hlist_head, cleanup_list);
@@ -451,6 +452,10 @@ static int x86_vector_activate(struct ir
trace_vector_activate(irqd->irq, apicd->is_managed,
  apicd->can_reserve, reserve);
 
+   /* NMI has a fixed vector. No vector management required */
+   if (apicd->hw_irq_cfg.delivery_mode == APIC_DELIVERY_MODE_NMI)
+   return 0;
+
	raw_spin_lock_irqsave(&vector_lock, flags);
if (!apicd->can_reserve && !apicd->is_managed)
assign_irq_vector_any_locked(irqd);
@@ -472,6 +477,10 @@ static void vector_free_reserved_and_man
trace_vector_teardown(irqd->irq, apicd->is_managed,
  apicd->has_reserved);
 
+   /* NMI has a fixed vector. No vector management required */
+   if (apicd->hw_irq_cfg.delivery_mode == APIC_DELIVERY_MODE_NMI)
+   return;
+
if (apicd->has_reserved)
irq_matrix_remove_reserved(vector_matrix);
if (apicd->is_managed)
@@ -568,6 +577,24 @@ static int x86_vector_alloc_irqs(struct
irqd->hwirq = virq + i;
irqd_set_single_target(irqd);
 
+   if (info->flags & X86_IRQ_ALLOC_AS_NMI) {
+   /*
+* NMIs have a fixed vector and need their own
+* interrupt chip so nothing can end up in the
+* regular local APIC management code except the
+* MSI message composing callback.
+*/
+   irqd->chip = _nmi_controller;
+   /*
+* Don't allow affinity setting attempts for NMIs.
+* This cannot work with the regular affinity
+* mechanisms and for the intended HPET NMI
+* watchdog use case it's not required.
+*/
+   irqd_set_no_balance(irqd);
+   continue;
+   }
+
/*
 * Prevent that any of these interrupts is invoked in
	 * non interrupt context via e.g. generic_handle_irq()

Re: [PATCH v6 03/29] x86/apic/msi: Set the delivery mode individually for each IRQ

2022-05-06 Thread Thomas Gleixner
On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> There are no restrictions in hardware to set  MSI messages with its
> own delivery mode.

"messages with its own" ? Plural/singular confusion.

> Use the mode specified in the provided IRQ hardware
> configuration data. Since most of the IRQs are configured to use the
> delivery mode of the APIC driver in use (set in all of them to
> APIC_DELIVERY_MODE_FIXED), the only functional changes are where
> IRQs are configured to use a specific delivery mode.

This does not parse. There are no functional changes due to this patch
and there is no point talking about functional changes in subsequent
patches here.

> Changing the utility function __irq_msi_compose_msg() takes care of
> implementing the change in the in the local APIC, PCI-MSI, and DMAR-MSI

in the in the

> irq_chips.
>
> The IO-APIC irq_chip configures the entries in the interrupt redirection
> table using the delivery mode specified in the corresponding MSI message.
> Since the MSI message is composed by a higher irq_chip in the hierarchy,
> it does not need to be updated.

The point is that updating __irq_msi_compose_msg() covers _all_ MSI
consumers including IO-APIC.

I had to read that changelog 3 times to make sense of it. Something like
this perhaps:

  "x86/apic/msi: Use the delivery mode from irq_cfg for message composition

   irq_cfg provides a delivery mode for each interrupt. Use it instead
   of the hardcoded APIC_DELIVERY_MODE_FIXED. This allows to compose
   messages for NMI delivery mode which is required to implement a HPET
   based NMI watchdog.

   No functional change as the default delivery mode is set to
   APIC_DELIVERY_MODE_FIXED."
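The idea behind the change to __irq_msi_compose_msg() — taking the delivery mode from the per-interrupt irq_cfg instead of hardcoding FIXED — can be sketched as below. The field layout follows the x86 MSI data register (vector in bits 7:0, delivery mode in bits 10:8); the struct and function names here are illustrative, not the kernel's:

```c
#include <assert.h>
#include <stdint.h>

/* x86 APIC delivery mode encodings used in the MSI data register. */
#define MSI_DELIVERY_MODE_FIXED 0x0
#define MSI_DELIVERY_MODE_NMI   0x4

/* Stand-in for the relevant fields of irq_cfg. */
struct fake_irq_cfg {
	uint8_t vector;
	uint8_t delivery_mode;
};

/* Compose the MSI data word from the per-interrupt configuration:
 * vector in bits 7:0, delivery mode in bits 10:8. */
static uint32_t compose_msi_data(const struct fake_irq_cfg *cfg)
{
	return cfg->vector | ((uint32_t)cfg->delivery_mode << 8);
}
```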

Thanks,

tglx


Re: [PATCH v6 02/29] x86/apic: Add irq_cfg::delivery_mode

2022-05-06 Thread Thomas Gleixner
On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> Currently, the delivery mode of all interrupts is set to the mode of the
> APIC driver in use. There are no restrictions in hardware to configure the
> delivery mode of each interrupt individually. Also, certain IRQs need
> to be

s/IRQ/interrupt/ Changelogs can do without acronyms.

> configured with a specific delivery mode (e.g., NMI).
>
> Add a new member, delivery_mode, to struct irq_cfg. Subsequent changesets
> will update every irq_domain to set the delivery mode of each IRQ to that
> specified in its irq_cfg data.
>
> To keep the current behavior, when allocating an IRQ in the root
> domain

The root domain does not allocate an interrupt. The root domain
allocates a vector for an interrupt. There is a very clear and technical
distinction. Can you please be more careful about the wording?

> --- a/arch/x86/kernel/apic/vector.c
> +++ b/arch/x86/kernel/apic/vector.c
> @@ -567,6 +567,7 @@ static int x86_vector_alloc_irqs(struct irq_domain 
> *domain, unsigned int virq,
>   irqd->chip_data = apicd;
>   irqd->hwirq = virq + i;
>   irqd_set_single_target(irqd);
> +

Stray newline.

>   /*
>* Prevent that any of these interrupts is invoked in
>* non interrupt context via e.g. generic_handle_irq()
> @@ -577,6 +578,14 @@ static int x86_vector_alloc_irqs(struct irq_domain 
> *domain, unsigned int virq,
>   /* Don't invoke affinity setter on deactivated interrupts */
>   irqd_set_affinity_on_activate(irqd);
>  
> + /*
> +  * Initialize the delivery mode of this irq to match the

s/irq/interrupt/

> +  * default delivery mode of the APIC. Children irq domains
> +  * may take the delivery mode from the individual irq
> +  * configuration rather than from the APIC driver.
> +  */
> + apicd->hw_irq_cfg.delivery_mode = apic->delivery_mode;
> +
>   /*
>* Legacy vectors are already assigned when the IOAPIC
>* takes them over. They stay on the same vector. This is


Re: [PATCH v6 01/29] irq/matrix: Expose functions to allocate the best CPU for new vectors

2022-05-06 Thread Thomas Gleixner
Ricardo,

On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> Certain types of interrupts, such as NMI, do not have an associated vector.
> They, however, target specific CPUs. Thus, when assigning the destination
> CPU, it is beneficial to select the one with the lowest number of
> vectors.

Why is that beneficial especially in the context of a NMI watchdog which
then broadcasts the NMI to all other CPUs?

That's wishful thinking perhaps, but I don't see any benefit at all.

> Prepend the functions matrix_find_best_cpu_managed() and
> matrix_find_best_cpu_managed()

The same function prepended twice becomes two functions :)

> with the irq_ prefix and expose them for
> IRQ controllers to use when allocating and activating vector-less IRQs.

There is no such thing like a vectorless IRQ. NMIs have a vector. Can we
please describe facts and not pulled out of thin air concepts which do
not exist?
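For context, the mechanic of the helper being exposed — among the CPUs in the destination mask, pick the one with the fewest allocated vectors — does something like the following in spirit. This is a simplified model, not the kernel's irq_matrix code; plain arrays stand in for the cpumask and the per-CPU matrix maps:

```c
#include <assert.h>

/* Simplified model of irq_matrix_find_best_cpu(): scan the CPUs
 * present in the destination mask and return the one with the
 * lowest number of allocated vectors. */
static int find_best_cpu(const unsigned int *allocated,
			 const int *in_mask, int ncpus)
{
	int best = -1;
	unsigned int lowest = ~0U;

	for (int cpu = 0; cpu < ncpus; cpu++) {
		if (!in_mask[cpu])
			continue; /* not in the destination mask */
		if (allocated[cpu] < lowest) {
			lowest = allocated[cpu];
			best = cpu;
		}
	}
	return best;
}
```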

Thanks,

tglx


Re: [PATCH] iommu: Reorganize __iommu_attach_group()

2022-05-06 Thread Jason Gunthorpe via iommu
On Fri, May 06, 2022 at 05:44:11PM +0100, Robin Murphy wrote:

> > > Nit: if that's true then it's equally true for iommu_attach_group() as 
> > > well.
> > 
> > Is it? I didn't check this as closely..
> 
> Well, there certainly seems no obvious reason for one to WARN where the
> other fails quietly. Either it's an egregious loss of internal consistency
> to not know whether your thing is already attached or it isn't, but I don't
> see why the exact flavour of attach API called should matter.

I will try to check

> > We don't have a name for the set of domains owned by the core code vs a
> > domain set externally.
> > 
> > If you don't like core how about internal/external?
> 
> Oh no, I rather like the "core domain" nomenclature, it's the
> "iommu_group_is_..." part that then fails to parse sensibly.

Oh Ok

> > > Even getting past that, does it make sense to say NULL
> > > is a core domain? I'm not convinced.
> > 
NULL is not an externally imposed domain so it is definitely a
core/internal domain in this logic.
> 
> So if it *is* a domain then I can call NULL->attach_dev() and...? ;)

You can call iommu_group_set_domain(group, NULL) and it will work.

As I said, it must have this symmetry:

 __iommu_group_attach_core_domain()
 assert(__iommu_group_is_core_domain())

This is why I choose the name, because it is a clear pairing with
__iommu_group_attach_core_domain().

How about __iommu_group_is_core_domain_attached()? Solves the grammar
issue.

> > > For the sake of future readability, I'd
> > > prefer to call this something more purpose-related like
> > > __iommu_group_can_attach()
> > 
> > That is not the case - we can always attach any domain.
> 
> No, in the context of the iommu_attach_{device,group}() APIs where this
> helper is relevant, after a successful attach, it has long been required to
> detach before another attach can be performed. That's literally the code
> you're moving around in this patch.


> This is precisely why I'm starting to think it would be beneficial to
> differentiate the internal interface now that it's firmly moving away from
> the attach/detach paradigm still exposed externally.

I'm viewing the API family of __iommu_group_attach_core_domain /
__iommu_group_set_domain / __iommu_group_is_core_domain_attached()
as a layering point - the other APIs are being built on top of this
family.

It is now exposing a semantic the same as the default-domain enabled
driver ops, but is using different nomenclature.

Should it be __iommu_group_set_core_domain() ?

This naming seems way harder than it really should be.

> Heh, if we can ever get to the point where we don't have differing old and
> new semantics side-by-side at almost every level, maybe then we can find the
> right name for *everything*. Besides, it's arguably even weirder for
> attach_domain to be the only way to call detach_dev, so there's that.

It is because the ops are misnamed for what they now do.

 attach_dev is really set_domain_for_dev()
 detach_dev is really reset_dev_to_platform()

Which definitely is the root cause of the confusion that created this
bug in the first place.

Anyhow, I'll post the fixing patch again with the new comment and we
can decide how to revise the language separately, no need to delay
getting Lu's series back into the testing stream.

Thanks,
Jason


Re: [PATCH] iommu: Reorganize __iommu_attach_group()

2022-05-06 Thread Robin Murphy

On 2022-05-06 14:21, Jason Gunthorpe wrote:

On Fri, May 06, 2022 at 10:12:29AM +0100, Robin Murphy wrote:

On 2022-05-05 17:15, Jason Gunthorpe via iommu wrote:

Call iommu_group_do_attach_device() only from
__iommu_group_attach_domain() which should be used to attach any domain to
the group.

The only unique thing __iommu_attach_group() does is to check if the group
is already attached to some caller specified group. Put this test into
__iommu_group_is_core_domain(), matching the
__iommu_group_attach_core_domain() nomenclature.

Change the two callers to directly call __iommu_group_attach_domain() and
test __iommu_group_is_core_domain().

iommu_attach_device() should trigger a WARN_ON if the group is attached as
the caller is using the function wrong.


Nit: if that's true then it's equally true for iommu_attach_group() as well.


Is it? I didn't check this as closely..


Well, there certainly seems no obvious reason for one to WARN where the 
other fails quietly. Either it's an egregious loss of internal 
consistency to not know whether your thing is already attached or it 
isn't, but I don't see why the exact flavour of attach API called should 
matter.



@@ -2110,6 +2100,12 @@ static int __iommu_group_attach_domain(struct 
iommu_group *group,
return 0;
   }
+static bool __iommu_group_is_core_domain(struct iommu_group *group)


I can see the thought process behind it, but once we've had some time away
from actively working on this area, this is clearly going to be a terrible
name.


We don't have a name for the set of domains owned by the core code vs a
domain set externally.

If you don't like core how about internal/external?


Oh no, I rather like the "core domain" nomenclature, it's the 
"iommu_group_is_..." part that then fails to parse sensibly.



__iommu_group_set_internal_domain()
__iommu_group_internal_domain_attached() / 
__iommu_group_external_domain_attached() ?

I wanted to use the word 'user' here (ie kernel user of the iommu
kAPI) for external domain but that felt confusing as well given we are
talking about introducing userspace domains for nesting.


Even getting past that, does it make sense to say NULL
is a core domain? I'm not convinced.


NULL is not an externally imposed domain so it is definitely a
core/internal domain in this logic.


So if it *is* a domain then I can call NULL->attach_dev() and...? ;)

It's true that it's not a non-core domain, but IMO the two negatives 
don't simply cancel out. Hence why I think it's easier to try framing it 
in terms of what the *result* implies, rather than wrestle with trying 
to concisely capture that the check itself is approximately

iommu_group_is_not_already_attached_to_user_allocated_domain().


For the sake of future readability, I'd
prefer to call this something more purpose-related like
__iommu_group_can_attach()


That is not the case - we can always attach any domain.


No, in the context of the iommu_attach_{device,group}() APIs where this 
helper is relevant, after a successful attach, it has long been required 
to detach before another attach can be performed. That's literally the 
code you're moving around in this patch.


(Prior to default domains that wasn't actually enforced, but I doubt all 
the drivers would have handled it correctly.)


This is precisely why I'm starting to think it would be beneficial to 
differentiate the internal interface now that it's firmly moving away 
from the attach/detach paradigm still exposed externally.



This is only used as a sanity check to see if an externally imposed
domain is currently attached or not.

We could also just delete the check..


(and also just define it earlier to avoid the forward-declaration).


I put it here deliberately because the function directly below is
__iommu_group_attach_core_domain() - which causes
__iommu_group_is_core_domain() to become true. The symmetry helps
explain how the two functions relate.

This whole file is out of order, it seems to be a style or something?


FWIW I think the declarations we do have there are all things that got 
refactored out of existing code. I don't see any having been added with 
entirely new functions where easily avoidable.



In fact at that point I'd also be tempted to rename
__iommu_group_attach_domain() to __iommu_group_set_domain(), to further
clarify that attach/detach still reflects the external API, but the internal
mechanism is really a bit different since default/core domains came in.


It make sense - weird that set_domain is the only way to call
attach_dev, but OK :) I'll do that in the prior patch


Heh, if we can ever get to the point where we don't have differing old 
and new semantics side-by-side at almost every level, maybe then we can 
find the right name for *everything*. Besides, it's arguably even 
weirder for attach_domain to be the only way to call detach_dev, so 
there's that.



Please give me your thought on external/internal as naming instead I
can adjust the prior patch as well.


Re: [PATCH] iommu/arm-smmu: fix possible null-ptr-deref in arm_smmu_device_probe()

2022-05-06 Thread Will Deacon
On Mon, 25 Apr 2022 19:41:36 +0800, Yang Yingliang wrote:
> It will cause null-ptr-deref when using 'res', if platform_get_resource()
> returns NULL, so move using 'res' after devm_ioremap_resource() that
> will check it to avoid null-ptr-deref.
> And use devm_platform_get_and_ioremap_resource() to simplify code.
> 
> 

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/1] iommu/arm-smmu: fix possible null-ptr-deref in arm_smmu_device_probe()
  https://git.kernel.org/will/c/d9ed8af1dee3

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH v3 0/3] iommu/arm-smmu: Support Tegra234 SMMU

2022-05-06 Thread Will Deacon
On Fri, 29 Apr 2022 10:22:40 +0200, Thierry Reding wrote:
> From: Thierry Reding 
> 
> Hi Joerg,
> 
> this is essentially a resend of v2 with a Acked-by:s from Robin and Will
> added. These have been on the list for quite a while now, but apparently
> there was a misunderstanding, so neither you nor Will picked this up.
> 
> [...]

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/3] dt-bindings: arm-smmu: Document nvidia,memory-controller property
  https://git.kernel.org/will/c/c02bda09f91a
[2/3] dt-bindings: arm-smmu: Add compatible for Tegra234 SOC
  https://git.kernel.org/will/c/95d5aeabda00
[3/3] iommu/arm-smmu: Support Tegra234 SMMU
  https://git.kernel.org/will/c/5ca216155b5e

Cheers,
-- 
Will



Re: [PATCH] iommu/arm-smmu-v3: check return value after calling platform_get_resource()

2022-05-06 Thread Will Deacon
On Mon, 25 Apr 2022 19:45:25 +0800, Yang Yingliang wrote:
> It will cause null-ptr-deref if platform_get_resource() returns NULL,
> we need check the return value.
> 
> 

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/1] iommu/arm-smmu-v3: check return value after calling 
platform_get_resource()
  https://git.kernel.org/will/c/b131fa8c1d2a

Cheers,
-- 
Will



Re: [PATCH v2 0/7] SDX65 devicetree updates

2022-05-06 Thread Will Deacon
On Mon, 11 Apr 2022 15:20:08 +0530, Rohit Agarwal wrote:
> This series adds devicetree nodes for SDX65. It adds reserved memory
> nodes, SDHCI, smmu and tcsr mutex support.
> 
> Changes from v1:
>  - Addressed Mani's Comments and made necessary.
>  - Rebased on top of v5.18-rc2.
> 
> [...]

Applied SMMU binding patch to will (for-joerg/arm-smmu/updates), thanks!

[4/7] dt-bindings: arm-smmu: Add binding for SDX65 SMMU
  https://git.kernel.org/will/c/5a4eb9163471

However, I must confess that I don't understand the point in updating
the binding documentation but not the driver. Should we be matching on
the new compatible string somewhere?

Cheers,
-- 
Will



Re: [PATCH v2] iommu/arm-smmu-v3-sva: Fix mm use-after-free

2022-05-06 Thread Will Deacon
On Tue, 26 Apr 2022 14:04:45 +0100, Jean-Philippe Brucker wrote:
> We currently call arm64_mm_context_put() without holding a reference to
> the mm, which can result in use-after-free. Call mmgrab()/mmdrop() to
> ensure the mm only gets freed after we unpinned the ASID.
> 
> 

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/1] iommu/arm-smmu-v3-sva: Fix mm use-after-free
  https://git.kernel.org/will/c/cbd23144f766

Cheers,
-- 
Will



Re: [PATCH 0/2] iommu/arm-smmu-qcom: Add SC8280XP support

2022-05-06 Thread Will Deacon
On Tue, 3 May 2022 09:34:27 -0700, Bjorn Andersson wrote:
> This adds the compatible for the Qualcomm SC8280XP platform and associate the
> Qualcomm impl in the ARM SMMU driver to it.
> 
> Bjorn Andersson (2):
>   dt-bindings: arm-smmu: Add compatible for Qualcomm SC8280XP
>   iommu/arm-smmu-qcom: Add SC8280XP support
> 
> [...]

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/2] dt-bindings: arm-smmu: Add compatible for Qualcomm SC8280XP
  https://git.kernel.org/will/c/38db6b41b2f4
[2/2] iommu/arm-smmu-qcom: Add SC8280XP support
  https://git.kernel.org/will/c/d044023e219d

Cheers,
-- 
Will



Re: [PATCH] iommu/vt-d: Increase DMAR_UNITS_SUPPORTED

2022-05-06 Thread Steve Wahl
On Fri, May 06, 2022 at 08:12:11AM +, Tian, Kevin wrote:
> > From: David Woodhouse 
> > Sent: Friday, May 6, 2022 3:17 PM
> > 
> > On Fri, 2022-05-06 at 06:49 +, Tian, Kevin wrote:
> > > > From: Baolu Lu 
> > > >
> > > > > --- a/include/linux/dmar.h
> > > > > +++ b/include/linux/dmar.h
> > > > > @@ -19,7 +19,7 @@
> > > > >   struct acpi_dmar_header;
> > > > >
> > > > >   #ifdef  CONFIG_X86
> > > > > -# define DMAR_UNITS_SUPPORTED	MAX_IO_APICS
> > > > > +# define DMAR_UNITS_SUPPORTED	640
> > > > >   #else
> > > > >   # define DMAR_UNITS_SUPPORTED	64
> > > > >   #endif
> > >
> > > ... is it necessary to permanently do 10x increase which wastes memory
> > > on most platforms which won't have such need.
> > 
> > I was just looking at that. It mostly adds about 3½ KiB to each struct
> > dmar_domain.
> > 
> > I think the only actual static array is the dmar_seq_ids bitmap which
> > grows to 640 *bits* which is fairly negligible, and the main growth is
> > that it adds about 3½ KiB to each struct dmar_domain for the
> > iommu_refcnt[] and iommu_did[] arrays.
> 
> Thanks for the quick experiment! though the added material is
> negligible it's cleaner to me if having a way to configure it as
> discussed below.
> 
> > 
> > > Does it make more sense to have a configurable approach similar to
> > > CONFIG_NR_CPUS? or even better can we just replace those static
> > > arrays with dynamic allocation so removing this restriction
> > > completely?
> > 
> > Hotplug makes that fun, but I suppose you only need to grow the array
> > in a given struct dmar_domain if you actually add a device to it that's
> > behind a newly added IOMMU. I don't know if the complexity of making it
> > fully dynamic is worth it though. We could make it a config option,
> > and/or a command line option (perhaps automatically derived from
> > CONFIG_NR_CPUS).
> 
> either config option or command line option is OK to me. Probably
> the former is simpler given no need to dynamically expand the
> static array. btw though deriving from CONFIG_NR_CPUS could work 
> in this case it is unclear why tying the two together is necessary in
> concept, e.g. is there guarantee that the number of IOMMUs must
> be smaller than the number of CPUs in a platform?
> 
> > 
> > If it wasn't for hotplug, I think we'd know the right number by the
> > time we actually need it anyway, wouldn't we? Can we have a heuristic
> > for how many DMAR units are likely to be hotplugged? Is it as simple as
> > the ratio of present to not-yet-present CPUs in MADT?
> 
> Probably. But I don't have enough knowledge on DMAR hotplug to
> judge (e.g. whether it's strictly tied to CPU hotplug and if yes whether
> there could be multiple IOMMUs hotplugged together with a CPU
> socket)...
> 
> Thanks
> Kevin

Would anyone be more comfortable if we only increase the limit where
MAXSMP is set?

i.e.

#if defined(CONFIG_X86) && defined(CONFIG_MAXSMP)
# define	DMAR_UNITS_SUPPORTED	640
#elif defined(CONFIG_X86)
# define	DMAR_UNITS_SUPPORTED	MAX_IO_APICS
#else
# define	DMAR_UNITS_SUPPORTED	64
#endif

Thank you all for your time looking at this.

--> Steve Wahl

-- 
Steve Wahl, Hewlett Packard Enterprise


Re: [PATCH 01/11] dt-bindings: iommu: arm,smmu-v3: make PRI IRQ optional

2022-05-06 Thread Andre Przywara
On Thu, 28 Apr 2022 08:56:38 +0200
Krzysztof Kozlowski  wrote:

Hi,

> On 27/04/2022 13:25, Andre Przywara wrote:
> > The Page Request Interface (PRI) is an optional PCIe feature. As such, a
> > SMMU would not need to handle it if the PCIe host bridge or the SMMU
> > itself do not implement it. Also an SMMU could be connected to a platform
> > device, without any PRI functionality whatsoever.
> > In all cases there would be no SMMU PRI queue interrupt to be wired up
> > to an interrupt controller.
> > 
> > Relax the binding to allow specifying three interrupts, omitting the PRI
> > IRQ. At the moment, with the "eventq,gerror,priq,cmdq-sync" order, we
> > would need to sacrifice the command queue sync interrupt as well, which
> > might not be desired.
> > The Linux driver does not care about any order at all, just picks IRQs
> > based on their names, and treats all (wired) IRQs as optional.  
> 
> The last sentence is not a good explanation for the bindings. They are
> not about Linux and are used in other projects as well.

It was not meant as an explanation, but just as an assurance that we can
*change* the binding. At the moment the order is strict, so binding
compliant DT consumers could just read the first, second, third, and fourth
interrupt, without caring about the names. If we now allow a different
order, this would break those users.
I couldn't find any user of arm,smmu-v3 in FreeBSD, OpenBSD, U-Boot,
or Zephyr, hence my mentioning of Linux being fine, so it's safe to relax
the strict ordering requirement.
If someone knows about other DT consumers that need attention, I am
all ears.

Cheers,
Andre

> > Signed-off-by: Andre Przywara 
> > ---
> >  .../bindings/iommu/arm,smmu-v3.yaml   | 21 ++-
> >  1 file changed, 16 insertions(+), 5 deletions(-)
> > 
> > diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.yaml
> > index e87bfbcc69135..6b3111f1f06ce 100644
> > --- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.yaml
> > +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.yaml
> > @@ -37,12 +37,23 @@ properties:
> >hardware supports just a single, combined interrupt line.
> >If provided, then the combined interrupt will be used in 
> > preference to
> >any others.
> > -  - minItems: 2
> > +  - minItems: 1
> >  items:
> > -  - const: eventq # Event Queue not empty
> > -  - const: gerror # Global Error activated
> > -  - const: priq   # PRI Queue not empty
> > -  - const: cmdq-sync  # CMD_SYNC complete
> > +  - enum:
> > +  - eventq # Event Queue not empty
> > +  - gerror # Global Error activated
> > +  - cmdq-sync  # CMD_SYNC complete
> > +  - priq   # PRI Queue not empty
> > +  - enum:
> > +  - gerror
> > +  - cmdq-sync
> > +  - priq
> > +  - enum:
> > +  - cmdq-sync
> > +  - priq
> > +  - enum:
> > +  - cmdq-sync
> > +  - priq  
> 
> The order should be strict, so if you want the first interrupt optional,
> then:
> oneOf:
>  - items:
> ... 4 items list
>  - items
> ... 3 items list
> 
> Best regards,
> Krzysztof



[PATCH v2 01/11] dt-bindings: iommu: arm, smmu-v3: make PRI IRQ optional

2022-05-06 Thread Andre Przywara
The Page Request Interface (PRI) is an optional PCIe feature. As such, a
SMMU would not need to handle it if the PCIe host bridge or the SMMU
itself do not implement it. Also an SMMU could be connected to a platform
device, without any PRI functionality whatsoever.
In all cases there would be no SMMU PRI queue interrupt to be wired up
to an interrupt controller.
At the moment, with the "eventq,gerror,priq,cmdq-sync" order, we
would need to sacrifice the command queue sync interrupt as well, which
might not be desired.

Relax the binding to allow specifying certain useful combinations of
wired interrupts, for instance just the "gerror" interrupt, or omitting
both "pri" and "cmdq-sync".

Signed-off-by: Andre Przywara 
---
 .../devicetree/bindings/iommu/arm,smmu-v3.yaml   | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.yaml
index e87bfbcc69135..c57a53d87e4e6 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.yaml
@@ -37,12 +37,18 @@ properties:
   hardware supports just a single, combined interrupt line.
   If provided, then the combined interrupt will be used in preference 
to
   any others.
-  - minItems: 2
+  - minItems: 1
 items:
-  - const: eventq # Event Queue not empty
-  - const: gerror # Global Error activated
-  - const: priq   # PRI Queue not empty
-  - const: cmdq-sync  # CMD_SYNC complete
+  - enum:
+  - eventq # Event Queue not empty
+  - gerror # Global Error activated
+  - const: gerror
+  - enum:
+  - cmdq-sync  # CMD_SYNC complete
+  - priq   # PRI Queue not empty
+  - enum:
+  - cmdq-sync
+  - priq
 
   '#iommu-cells':
 const: 1
-- 
2.25.1



Re: [PATCH v2] iommu: iommu_group_claim_dma_owner() must always assign a domain

2022-05-06 Thread Jason Gunthorpe via iommu
On Fri, May 06, 2022 at 02:35:50PM +0100, Robin Murphy wrote:

> > So you want to say "DMA is always managed" when attaching a domain of
> > type IOMMU_DOMAIN_UNMANAGED? :)
> 
> Touché ;) Just goes to confirm the point above that confusion between
> general concepts and specific API terms is all too easy. An "unmanaged"
> domain from the PoV of the API just means it's managed by the external
> caller, but you're right that that's not necessarily so obvious either.

Yeah, I'm not so keen on the naming used for IOMMU_DOMAIN_*

I looked for a bit and could not figure out why we need to have
IOMMU_DOMAIN_DMA either.. I didn't find anything obvious in the iommu
drivers that looked like a special case for this? Most drivers treat
it identically to UNMANAGED in their alloc functions

Only mtk, arm-smmu and some odd stuff in Intel seemed to be sensitive?

> > /*
> >  * Changing the domain is done by calling attach_dev() on the new
> >  * domain. This switch does not have to be atomic and DMA can be
> >  * discarded during the transition. DMA must always be translated by
> 
> s/always be translated by/only be able to access/ and we have a deal :)

Done, thanks

Jason

Re: [PATCH v2] iommu: iommu_group_claim_dma_owner() must always assign a domain

2022-05-06 Thread Robin Murphy

On 2022-05-06 14:10, Jason Gunthorpe wrote:
> On Fri, May 06, 2022 at 10:32:50AM +0100, Robin Murphy wrote:
> 
> > > > It is as discussed with Robin, NULL means to return the group back to
> > > > some platform defined translation, and in some cases this means
> > > > returning back to work with the platform's DMA ops - presumably if it
> > > > isn't using the dma api.
> > > 
> > > This is clearer than the original text.
> > 
> > Perhaps we could just leave that sentence as "Otherwise the NULL domain
> > represents platform-specific behaviour."
> 
> Sure, there are only three drivers that use this call path and who
> knows what they do I guess..
> 
> > > > Sometimes. This is a statement about the required
> > > > atomicity. New/old/block are all valid translations during the
> > > > operation. Identity is not.
> > > 
> > > but new/old/block are not the same type of classifications. A group
> > > has an old domain and a new domain at this transition point, and
> > > both old/new domains have a domain type (dma, unmanaged, block,
> > > identity, etc.). Mixing them together only increases confusion here.
> 
> Block refers to the global concept of blocking - not to a type.
> 
> > Good point - in particular I think the "DMA is always translated" part would
> > be more accurately said as "DMA is always managed".
> 
> So you want to say "DMA is always managed" when attaching a domain of
> type IOMMU_DOMAIN_UNMANAGED? :)

Touché ;) Just goes to confirm the point above that confusion between
general concepts and specific API terms is all too easy. An "unmanaged"
domain from the PoV of the API just means it's managed by the external
caller, but you're right that that's not necessarily so obvious either.

> Lets just clarify a bit that blocking isn't talking about a domain
> type:
> 
> /*
>  * Changing the domain is done by calling attach_dev() on the new
>  * domain. This switch does not have to be atomic and DMA can be
>  * discarded during the transition. DMA must always be translated by

s/always be translated by/only be able to access/ and we have a deal :)

Cheers,
Robin.

>  * either new_domain or group->domain, never something else.
>  *
>  * Note that this is called in error unwind paths, attaching to a
>  * domain that has already been attached cannot fail.
>  */
> 
> (aside, I expect down the road we will have some domain capability
> 'attach is atomic' meaning if new and old have the same translation
> then there is no disruption to DMA)
> 
> Thanks,
> Jason


Re: [PATCH] iommu: Reorganize __iommu_attach_group()

2022-05-06 Thread Jason Gunthorpe via iommu
On Fri, May 06, 2022 at 10:12:29AM +0100, Robin Murphy wrote:
> On 2022-05-05 17:15, Jason Gunthorpe via iommu wrote:
> > Call iommu_group_do_attach_device() only from
> > __iommu_group_attach_domain() which should be used to attach any domain to
> > the group.
> > 
> > The only unique thing __iommu_attach_group() does is to check if the group
> > is already attached to some caller specified group. Put this test into
> > __iommu_group_is_core_domain(), matching the
> > __iommu_group_attach_core_domain() nomenclature.
> > 
> > Change the two callers to directly call __iommu_group_attach_domain() and
> > test __iommu_group_is_core_domain().
> > 
> > iommu_attach_device() should trigger a WARN_ON if the group is attached as
> > the caller is using the function wrong.
> 
> Nit: if that's true then it's equally true for iommu_attach_group() as well.

Is it? I didn't check this as closely..

> > @@ -2110,6 +2100,12 @@ static int __iommu_group_attach_domain(struct iommu_group *group,
> > return 0;
> >   }
> > +static bool __iommu_group_is_core_domain(struct iommu_group *group)
> 
> I can see the thought process behind it, but once we've had some time away
> from actively working on this area, this is clearly going to be a terrible
> name.

We don't have a name for the set of domains owned by the core code vs a
domain set externally.

If you don't like core how about internal/external?

__iommu_group_set_internal_domain()
__iommu_group_internal_domain_attached() / 
__iommu_group_external_domain_attached() ?

I wanted to use the word 'user' here (ie kernel user of the iommu
kAPI) for external domain but that felt confusing as well given we are
talking about introducing userspace domains for nesting.

> Even getting past that, does it make sense to say NULL
> is a core domain? I'm not convinced. 

NULL is not an externally imposed domain so it is definately a
core/internal domain in this logic.

> For the sake of future readability, I'd
> prefer to call this something more purpose-related like
> __iommu_group_can_attach() 

That is not the case - we can always attach any domain.

This is only used as a sanity check to see if an externally imposed
domain is currently attached or not.

We could also just delete the check..

> (and also just define it earlier to avoid the forward-declaration).

I put it here deliberately because the function directly below is
__iommu_group_attach_core_domain() - which causes
__iommu_group_is_core_domain() to become true. The symmetry helps
explain how the two functions relate.

This whole file is out of order, it seems to be a style or something?

> In fact at that point I'd also be tempted to rename
> __iommu_group_attach_domain() to __iommu_group_set_domain(), to further
> clarify that attach/detach still reflects the external API, but the internal
> mechanism is really a bit different since default/core domains came in.

It makes sense - weird that set_domain is the only way to call
attach_dev, but OK :) I'll do that in the prior patch

Please give me your thought on external/internal as naming instead I
can adjust the prior patch as well.

Jason


Re: [PATCH v2] iommu: iommu_group_claim_dma_owner() must always assign a domain

2022-05-06 Thread Jason Gunthorpe via iommu
On Fri, May 06, 2022 at 10:32:50AM +0100, Robin Murphy wrote:

> > > It is as discussed with Robin, NULL means to return the group back to
> > > some platform defined translation, and in some cases this means
> > > returning back to work with the platform's DMA ops - presumably if it
> > > isn't using the dma api.
> > 
> > This is clearer than the original text.
> 
> Perhaps we could just leave that sentence as "Otherwise the NULL domain
> represents platform-specific behaviour."

Sure, there are only three drivers that use this call path and who
knows what they do I guess..

> > > Sometimes. This is a statement about the required
> > > atomicity. New/old/block are all valid translations during the
> > > operation. Identity is not.
> > 
> > but new/old/block are not the same type of classifications. A group
> > has an old domain and a new domain at this transition point, and
> > both old/new domains have a domain type (dma, unmanaged, block,
> > identity, etc.). Mixing them together only increases confusion here.

Block refers to the global concept of blocking - not to a type.

> Good point - in particular I think the "DMA is always translated" part would
> be more accurately said as "DMA is always managed". 

So you want to say "DMA is always managed" when attaching a domain of
type IOMMU_DOMAIN_UNMANAGED? :)

Lets just clarify a bit that blocking isn't talking about a domain
type:

/*
 * Changing the domain is done by calling attach_dev() on the new
 * domain. This switch does not have to be atomic and DMA can be
 * discarded during the transition. DMA must always be translated by
 * either new_domain or group->domain, never something else.
 *
 * Note that this is called in error unwind paths, attaching to a
 * domain that has already been attached cannot fail.
 */

(aside, I expect down the road we will have some domain capability
'attach is atomic' meaning if new and old have the same translation
then there is no disruption to DMA)

Thanks,
Jason


Re: [PATCH RFC 11/12] iommufd: vfio container FD ioctl compatibility

2022-05-06 Thread Jason Gunthorpe via iommu
On Fri, May 06, 2022 at 03:25:03PM +1000, David Gibson wrote:
> On Thu, May 05, 2022 at 04:07:28PM -0300, Jason Gunthorpe wrote:

> > When the iommu_domain is created I want to have a
> > iommu-driver-specific struct, so PPC can customize its iommu_domain
> > however it likes.
> 
> This requires that the client be aware of the host side IOMMU model.
> That's true in VFIO now, and it's nasty; I was really hoping we could
> *stop* doing that.

iommufd has two modes, the 'generic interface' which what this patch
series shows that does not require any device specific knowledge.

The default iommu_domain that the iommu driver creates will be used
here, it is up to the iommu driver to choose something reasonable for
use by applications like DPDK. ie PPC should probably pick its biggest
x86-like aperture.

The iommu-driver-specific struct is the "advanced" interface and
allows a user-space IOMMU driver to tightly control the HW with full
HW specific knowledge. This is where all the weird stuff that is not
general should go.

> Note that I'm talking here *purely* about the non-optimized case where
> all updates to the host side IO pagetables are handled by IOAS_MAP /
> IOAS_COPY, with no direct hardware access to user or guest managed IO
> pagetables.  The optimized case obviously requires end-to-end
> agreement on the pagetable format amongst other domain properties.

Sure, this is how things are already..

> What I'm hoping is that qemu (or whatever) can use this non-optimized
> as a fallback case where it doesn't need to know the properties of
> whatever host side IOMMU models there are.  It just requests what it
> needs based on the vIOMMU properties it needs to replicate and the
> host kernel either can supply it or can't.

There aren't really any negotiable vIOMMU properties beyond the
ranges, and the ranges are not *really* negotiable.

There are lots of dragons with the idea we can actually negotiate
ranges - because asking for the wrong range for what the HW can do
means you don't get anything. Which is completely contrary to the idea
of easy generic support for things like DPDK.

So DPDK can't ask for ranges, it is not generic.

This means we are really talking about a qemu-only API, and IMHO, qemu
is fine to have a PPC specific userspace driver to tweak this PPC
unique thing if the default windows are not acceptable.

IMHO it is no different from imagining an Intel specific userspace
driver that is using userspace IO pagetables to optimize
cross-platform qemu vIOMMU emulation. We should be comfortable with
the idea that accessing the full device-specific feature set requires
a HW specific user space driver.

> Admittedly those are pretty niche cases, but allowing for them gives
> us flexibility for the future.  Emulating an ARM SMMUv3 guest on an
> ARM SMMU v4 or v5 or v.whatever host is likely to be a real case in
> the future, and AFAICT, ARM are much less conservative than x86 about
> maintaining similar hw interfaces over time.  That's why I think
> considering these ppc cases will give a more robust interface for
> other future possibilities as well.

I don't think so - PPC has just done two things that are completely
divergent from eveything else - having two IO page tables for the same
end point, and using map/unmap hypercalls instead of a nested page
table.

Everyone else seems to be focused on IOPTEs that are similar to CPU
PTEs, particularly to enable SVA and other tricks, and CPU's don't
have either of this weirdness.

> You can consider that a map/unmap hypercall, but the size of the
> mapping is fixed (the IO pagesize which was previously set for the
> aperture).

Yes, I would consider that a map/unmap hypercall vs a nested translation.
 
> > Assuming yes, I'd expect that:
> > 
> > The iommu_domain for nested PPC is just a log of map/unmap hypervsior
> > calls to make. Whenever a new PE is attached to that domain it gets
> > the logged map's replayed to set it up, and when a PE is detached the
> > log is used to unmap everything.
> 
> And likewise duplicate every H_PUT_TCE to all the PEs in the domain.
> Sure.  It means the changes won't be atomic across the domain, but I
> guess that doesn't matter.  I guess you could have the same thing on a
> sufficiently complex x86 or ARM system, if you put two devices into
> the IOAS that were sufficiently far from each other in the bus
> topology that they use a different top-level host IOMMU.

Yes, strict atomicity is not needed.

> > It is not perfectly memory efficient - and we could perhaps talk about
> > a API modification to allow re-use of the iommufd datastructure
> > somehow, but I think this is a good logical starting point.
> 
> Because the map size is fixed, a "replay log" is effectively
> equivalent to a mirror of the entire IO pagetable.

So, for virtualized PPC the iommu_domain is an xarray of mapped PFNs
and device attach/detach just sweeps the xarray and does the
hypercalls. Very similar to what we discussed for S390.

It seems OK, 

Re: fully convert arm to use dma-direct

2022-05-06 Thread Marc Zyngier
On Fri, 22 Apr 2022 22:17:20 +0100,
Linus Walleij  wrote:
> 
> On Thu, Apr 21, 2022 at 9:42 AM Christoph Hellwig  wrote:
> 
> > arm is the last platform not using the dma-direct code for directly
> > mapped DMA.  With the dmaboune removal from Arnd we can easily switch
> > arm to always use dma-direct now (it already does for LPAE configs
> > and nommu).  I'd love to merge this series through the dma-mapping tree
> > as it gives us the opportunity for additional core dma-mapping
> > improvements.
> (...)
> 
> >  b/arch/arm/mach-footbridge/Kconfig   |1
> >  b/arch/arm/mach-footbridge/common.c  |   19
> >  b/arch/arm/mach-footbridge/include/mach/dma-direct.h |8
> >  b/arch/arm/mach-footbridge/include/mach/memory.h |4
> 
> I think Marc Z has a Netwinder that he can test this on. Marc?
> I have one too, just not much in my office because of parental leave.

Finally found some time to hook the machine up and throw a new kernel
at it. Booted at its usual glacial speed, so FWIW:

Tested-by: Marc Zyngier 

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH RFC 00/19] IOMMUFD Dirty Tracking

2022-05-06 Thread Jason Gunthorpe via iommu
On Fri, May 06, 2022 at 03:51:40AM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Thursday, May 5, 2022 10:08 PM
> > 
> > On Thu, May 05, 2022 at 07:40:37AM +, Tian, Kevin wrote:
> > 
> > > In concept this is an iommu property instead of a domain property.
> > 
> > Not really, domains shouldn't be changing behaviors once they are
> > created. If a domain supports dirty tracking and I attach a new device
> > then it still must support dirty tracking.
> 
> That sort of suggests that userspace should specify whether a domain
> supports dirty tracking when it's created. But how does userspace
> know that it should create the domain in this way in the first place? 
> live migration is triggered on demand and it may not happen in the
> lifetime of a VM.

The best you could do is to look at the devices being plugged in at VM
startup, and if they all support live migration then request dirty
tracking, otherwise don't.

However, it costs nothing to have dirty tracking as long as all iommus
support it in the system - which seems to be the normal case today.

We should just always turn it on at this point. 

> and if the user always creates domain to allow dirty tracking by default,
> how does it know a failed attach is due to missing dirty tracking support
> by the IOMMU and then creates another domain which disables dirty
> tracking and retry-attach again?

The automatic logic is complicated for sure, if you had a device flag
it would have to figure it out that way

Jason


RE: [PATCH RFC 11/12] iommufd: vfio container FD ioctl compatibility

2022-05-06 Thread Tian, Kevin
> From: David Gibson 
> Sent: Friday, May 6, 2022 1:25 PM
> 
> >
> > When the iommu_domain is created I want to have a
> > iommu-driver-specific struct, so PPC can customize its iommu_domain
> > however it likes.
> 
> This requires that the client be aware of the host side IOMMU model.
> That's true in VFIO now, and it's nasty; I was really hoping we could
> *stop* doing that.

that model is anyway inevitable when talking about user page table,
i.e. when nesting is enabled.

> 
> Note that I'm talking here *purely* about the non-optimized case where
> all updates to the host side IO pagetables are handled by IOAS_MAP /
> IOAS_COPY, with no direct hardware access to user or guest managed IO
> pagetables.  The optimized case obviously requires end-to-end
> agreement on the pagetable format amongst other domain properties.
> 
> What I'm hoping is that qemu (or whatever) can use this non-optimized
> as a fallback case where it doesn't need to know the properties of
> whatever host side IOMMU models there are.  It just requests what it
> needs based on the vIOMMU properties it needs to replicate and the
> host kernel either can supply it or can't.
> 
> In many cases it should be perfectly possible to emulate a PPC style
> vIOMMU on an x86 host, because the x86 IOMMU has such a colossal
> aperture that it will encompass wherever the ppc apertures end
> up. Similarly we could simulate an x86-style no-vIOMMU guest on a ppc
> host (currently somewhere between awkward and impossible) by placing
> the host apertures to cover guest memory.
> 
> Admittedly those are pretty niche cases, but allowing for them gives
> us flexibility for the future.  Emulating an ARM SMMUv3 guest on an
> ARM SMMU v4 or v5 or v.whatever host is likely to be a real case in
> the future, and AFAICT, ARM are much less conservative than x86 about
> maintaining similar hw interfaces over time.  That's why I think
> considering these ppc cases will give a more robust interface for
> other future possibilities as well.

It's not niche cases. We already have virtio-iommu which can work
on both ARM and x86 platforms, i.e. what current iommufd provides
is already generic enough except on PPC.

Then IMHO the key open here is:

Can PPC adapt to the current iommufd proposal if it can be
refactored to fit the standard iommu domain/group concepts?

If not, what is the remaining gap after PPC becomes a normal
citizen in the iommu layer and is it worth solving it in the general
interface or via iommu-driver-specific domain (given this will
exist anyway)?

to close that open I'm with Jason:

   "Fundamentally PPC has to fit into the iommu standard framework of
   group and domains, we can talk about modifications, but drifting too
   far away is a big problem."

Directly jumping to the iommufd layer for what changes might be
applied to all platforms sounds counter-intuitive if we haven't tried 
to solve the gap in the iommu layer in the first place, as even
there is argument that certain changes in iommufd layer can find
matching concept on other platforms it still sort of looks redundant
since those platforms already work with the current model.

My two cents.

Thanks
Kevin


Re: [PATCH v2] iommu: iommu_group_claim_dma_owner() must always assign a domain

2022-05-06 Thread Baolu Lu

On 2022/5/6 03:27, Jason Gunthorpe wrote:
> On Thu, May 05, 2022 at 07:56:59PM +0100, Robin Murphy wrote:
> 
> > Ack to that, there are certainly further improvements to consider once we've
> > got known-working code into a released kernel, but let's not get ahead of
> > ourselves just now.
> 
> Yes please
> 
> > (I'm pretty sure we could get away with a single blocking domain per IOMMU
> > instance, rather than per group, but I deliberately saved that one for later
> > - right now one per group to match default domains is simpler to reason
> > about and less churny in the context of the current ownership patches)
> 
> I noticed this too..
> 
> But I thought the driver can do a better job of this. There is no
> reason it has to allocate memory to return a IOMMU_DOMAIN_BLOCKED
> domain - this struct could be a globally allocated singleton for the
> entire driver and that would be even better memory savings.
> 
> For instance, here is a sketch for the Intel driver based on Baolu's
> remark that intel_iommu_detach_device() establishes a blocking
> behavior already on detach_dev (Baolu if you like it can you make a
> proper patch?):

Yes, I will.

The same scheme could also be applied to identity/sva/block domains.
Perhaps make it after v5.19.

Best regards,
baolu


Re: [PATCH v2] iommu: iommu_group_claim_dma_owner() must always assign a domain

2022-05-06 Thread Robin Murphy

On 2022-05-06 00:28, Tian, Kevin wrote:

From: Jason Gunthorpe
Sent: Thursday, May 5, 2022 11:33 PM

/*
-* If the group has been claimed already, do not re-attach the default
-* domain.
+* New drivers should support default domains and so the
detach_dev() op
+* will never be called. Otherwise the NULL domain indicates the
+* translation for the group should be set so it will work with the


translation should be 'blocked'?


No, not blocked.


+* platform DMA ops.


I didn't get the meaning of the last sentence.


It is as discussed with Robin, NULL means to return the group back to
some platform defined translation, and in some cases this means
returning back to work with the platform's DMA ops - presumably if it
isn't using the dma api.


This is clearer than the original text.


Perhaps we could just leave that sentence as "Otherwise the NULL domain 
represents platform-specific behaviour."





+   /*
+* Changing the domain is done by calling attach on the new domain.
+* Drivers should implement this so that DMA is always translated by


what does 'this' mean?


The code below - attach_dev called in a loop for each dev in the group.


Yes.




+* either the new, old, or a blocking domain. DMA should never


isn't the blocking domain passed in as the new domain?


Soemtimes. This is a statement about the required
atomicity. New/old/block are all valid translations during the
operation. Identity is not.


but new/old/block are not the same type of classifications. A group
has an old domain and a new domain at this transition point, and
both old/new domains have a domain type (dma, unmanged, block,
identity, etc.). Mixing them together only increases confusion here.


Good point - in particular I think the "DMA is always translated" part 
would be more accurately said as "DMA is always managed". When we're 
reattaching back to the default domain here, and it happens to be an 
identity domain, then DMA *may* be untranslated, but in a manner that 
we've knowingly chosen. The key point is that if the driver supports 
core domains, then it should never have any in-between state that allows 
access to anything *other* than the current domain or the new domain.
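The invariant Robin describes above can be sketched as a small user-space mock: each device in a group is always attached to the old, the new, or a blocking domain, never left untranslated mid-transition. All `mock_*` names below are illustrative stand-ins, not the real driver API.

```c
#include <assert.h>
#include <stddef.h>

struct mock_domain { int id; };

#define MOCK_NDEV 3

/* Current domain of each device in the group. */
struct mock_domain *mock_dev_domain[MOCK_NDEV];

/* Attach one device; 'fail_at' simulates a device the hardware rejects. */
static int mock_attach_dev(int dev, struct mock_domain *dom, int fail_at)
{
	if (dev == fail_at)
		return -1;
	mock_dev_domain[dev] = dom;
	return 0;
}

/*
 * Attach the whole group. On a mid-loop failure, park every device on
 * the blocking domain so no device is ever without a translation.
 */
int mock_group_attach(struct mock_domain *new_dom,
		      struct mock_domain *blocking, int fail_at)
{
	int i, ret = 0;

	for (i = 0; i < MOCK_NDEV; i++) {
		ret = mock_attach_dev(i, new_dom, fail_at);
		if (ret)
			break;
	}
	if (ret)
		for (i = 0; i < MOCK_NDEV; i++)
			mock_dev_domain[i] = blocking;
	return ret;
}
```

On either path, every device ends up in one of the three permitted translations; the in-between state where only some devices moved is never observable after the call returns.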


Thanks,
Robin.





So, I'm going to leave this patch as-is since it has been tested now
and we need to get the series back into iommu-next ASAP.



Agree. Just hope some improvements on the code comment.

But either way given no more code change:

Reviewed-by: Kevin Tian 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu



Re: [PATCH] iommu: Reorganize __iommu_attach_group()

2022-05-06 Thread Robin Murphy

On 2022-05-05 17:15, Jason Gunthorpe via iommu wrote:

Call iommu_group_do_attach_device() only from
__iommu_group_attach_domain() which should be used to attach any domain to
the group.

The only unique thing __iommu_attach_group() does is to check if the group
is already attached to some caller specified group. Put this test into
__iommu_group_is_core_domain(), matching the
__iommu_group_attach_core_domain() nomenclature.

Change the two callers to directly call __iommu_group_attach_domain() and
test __iommu_group_is_core_domain().

iommu_attach_device() should trigger a WARN_ON if the group is attached as
the caller is using the function wrong.


Nit: if that's true then it's equally true for iommu_attach_group() as well.


Suggested-by: "Tian, Kevin" 
Signed-off-by: Jason Gunthorpe 
---
  drivers/iommu/iommu.c | 42 +++++++++++++++++++-----------------------
  1 file changed, 19 insertions(+), 23 deletions(-)

This goes after "iommu: iommu_group_claim_dma_owner() must always assign a
domain" and simplifies some of the remaining duplication.

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c1bdec807ea381..09d00be887f924 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -81,9 +81,10 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
 unsigned type);
  static int __iommu_attach_device(struct iommu_domain *domain,
 struct device *dev);
-static int __iommu_attach_group(struct iommu_domain *domain,
-   struct iommu_group *group);
+static int __iommu_group_attach_domain(struct iommu_group *group,
+  struct iommu_domain *new_domain);
  static void __iommu_group_attach_core_domain(struct iommu_group *group);
+static bool __iommu_group_is_core_domain(struct iommu_group *group);
  static int iommu_create_device_direct_mappings(struct iommu_group *group,
   struct device *dev);
  static struct iommu_group *iommu_group_get_for_dev(struct device *dev);
@@ -1938,10 +1939,11 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
 */
	mutex_lock(&group->mutex);
ret = -EINVAL;
-   if (iommu_group_device_count(group) != 1)
+   if (iommu_group_device_count(group) != 1 ||
+   WARN_ON(!__iommu_group_is_core_domain(group)))
goto out_unlock;
  
-	ret = __iommu_attach_group(domain, group);
+	ret = __iommu_group_attach_domain(group, domain);
 
 out_unlock:
 	mutex_unlock(&group->mutex);
@@ -2032,31 +2034,19 @@ static int iommu_group_do_attach_device(struct device *dev, void *data)
return __iommu_attach_device(domain, dev);
  }
  
-static int __iommu_attach_group(struct iommu_domain *domain,
-				struct iommu_group *group)
-{
-   int ret;
-
-   if (group->domain && group->domain != group->default_domain &&
-   group->domain != group->blocking_domain)
-   return -EBUSY;
-
-   ret = __iommu_group_for_each_dev(group, domain,
-iommu_group_do_attach_device);
-   if (ret == 0)
-   group->domain = domain;
-
-   return ret;
-}
-
  int iommu_attach_group(struct iommu_domain *domain, struct iommu_group *group)
  {
int ret;
  
 	mutex_lock(&group->mutex);
-	ret = __iommu_attach_group(domain, group);
-	mutex_unlock(&group->mutex);
+   if (!__iommu_group_is_core_domain(group)) {
+   ret = -EBUSY;
+   goto out_unlock;
+   }
  
+	ret = __iommu_group_attach_domain(group, domain);
+out_unlock:
+	mutex_unlock(&group->mutex);
return ret;
  }
  EXPORT_SYMBOL_GPL(iommu_attach_group);
@@ -2110,6 +2100,12 @@ static int __iommu_group_attach_domain(struct iommu_group *group,
return 0;
  }
  
+static bool __iommu_group_is_core_domain(struct iommu_group *group)


I can see the thought process behind it, but once we've had some time 
away from actively working on this area, this is clearly going to be a 
terrible name. "Is this group a core domain? Er, no, it's a group; what 
an odd question to ask :/" Even getting past that, does it make sense to 
say NULL is a core domain? I'm not convinced. For the sake of future 
readability, I'd prefer to call this something more purpose-related like 
__iommu_group_can_attach() (and also just define it earlier to avoid the 
forward-declaration).


In fact at that point I'd also be tempted to rename 
__iommu_group_attach_domain() to __iommu_group_set_domain(), to further 
clarify that attach/detach still reflects the external API, but the 
internal mechanism is really a bit different since default/core domains 
came in.


Thanks,
Robin.


+{
+   return !group->domain || group->domain == group->default_domain ||
+  group->domain == group->blocking_domain;
+}
+
  /*
   * Put the group's domain back to 
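Robin's suggested rename can be sketched out of tree as a standalone predicate. The group is attachable exactly when its current domain is one of the core-owned states: NULL, the default domain, or the blocking domain. `mock_group` stands in for `struct iommu_group`; the names are illustrative, not the in-tree API.

```c
#include <assert.h>
#include <stddef.h>

struct mock_domain { int id; };

struct mock_group {
	struct mock_domain *domain;
	struct mock_domain *default_domain;
	struct mock_domain *blocking_domain;
};

/* Sketch of the predicate under the suggested __iommu_group_can_attach() name. */
int mock_group_can_attach(const struct mock_group *group)
{
	return !group->domain ||
	       group->domain == group->default_domain ||
	       group->domain == group->blocking_domain;
}
```

Phrased this way, the question the caller asks is "may I attach now?" rather than "is this group a core domain?", which sidesteps the naming oddity Robin points out.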

RE: [PATCH] iommu/vt-d: Increase DMAR_UNITS_SUPPORTED

2022-05-06 Thread Tian, Kevin
> From: David Woodhouse 
> Sent: Friday, May 6, 2022 3:17 PM
> 
> On Fri, 2022-05-06 at 06:49 +, Tian, Kevin wrote:
> > > From: Baolu Lu 
> > >
> > > > --- a/include/linux/dmar.h
> > > > +++ b/include/linux/dmar.h
> > > > @@ -19,7 +19,7 @@
> > > >   struct acpi_dmar_header;
> > > >
> > > >   #ifdef	CONFIG_X86
> > > > -# define	DMAR_UNITS_SUPPORTED	MAX_IO_APICS
> > > > +# define	DMAR_UNITS_SUPPORTED	640
> > > >   #else
> > > >   # define	DMAR_UNITS_SUPPORTED	64
> > > >   #endif
> >
> > ... is it necessary to permanently do 10x increase which wastes memory
> > on most platforms which won't have such need.
> 
> I was just looking at that. It mostly adds about 3½ KiB to each struct
> dmar_domain.
> 
> I think the only actual static array is the dmar_seq_ids bitmap which
> grows to 640 *bits* which is fairly negligible, and the main growth is
> that it adds about 3½ KiB to each struct dmar_domain for the
> iommu_refcnt[] and iommu_did[] arrays.

Thanks for the quick experiment! Though the added material is
negligible, it would be cleaner to me to have a way to configure it,
as discussed below.

> 
> > Does it make more sense to have a configurable approach similar to
> > CONFIG_NR_CPUS? or even better can we just replace those static
> > arrays with dynamic allocation so removing this restriction
> > completely?
> 
> Hotplug makes that fun, but I suppose you only need to grow the array
> in a given struct dmar_domain if you actually add a device to it that's
> behind a newly added IOMMU. I don't know if the complexity of making it
> fully dynamic is worth it though. We could make it a config option,
> and/or a command line option (perhaps automatically derived from
> CONFIG_NR_CPUS).

Either a config option or a command line option is OK to me. Probably
the former is simpler, given there is no need to dynamically expand the
static array. BTW, though deriving from CONFIG_NR_CPUS could work
in this case, it is unclear why tying the two together is necessary in
concept, e.g. is there a guarantee that the number of IOMMUs must
be smaller than the number of CPUs in a platform?
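A configurable limit along these lines could look like the following Kconfig sketch. This is purely illustrative: no such `DMAR_UNITS_SUPPORTED` Kconfig symbol is assumed to exist, and the ranges and defaults are placeholders.

```kconfig
config DMAR_UNITS_SUPPORTED
	int "Number of DMAR units supported"
	depends on DMAR_TABLE
	range 64 1024
	default 1024 if MAXSMP
	default 128
```

An `int` symbol like this keeps the arrays static (no dynamic expansion needed) while letting large-socket platforms opt into a higher limit without taxing everyone else.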

> 
> If it wasn't for hotplug, I think we'd know the right number by the
> time we actually need it anyway, wouldn't we? Can we have a heuristic
> for how many DMAR units are likely to be hotplugged? Is it as simple as
> the ratio of present to not-yet-present CPUs in MADT?

Probably. But I don't have enough knowledge on DMAR hotplug to
judge (e.g. whether it's strictly tied to CPU hotplug and if yes whether
there could be multiple IOMMUs hotplugged together with a CPU
socket)...

Thanks
Kevin

RE: [PATCH] iommu/vt-d: Increase DMAR_UNITS_SUPPORTED

2022-05-06 Thread Tian, Kevin
> From: Rödel, Jörg
> Sent: Friday, May 6, 2022 3:11 PM
> 
> On Fri, May 06, 2022 at 06:49:43AM +, Tian, Kevin wrote:
> > another nit: dmar is intel specific thus CONFIG_X86 is always true.
> 
> There are Itanium systems which have DMAR units. Is that no longer
> supported?
> 

Sorry, I just forgot that. I haven't touched an Itanium system for
over ten years.

Re: [PATCH v3 2/4] iommu/vt-d: Check domain force_snooping against attached devices

2022-05-06 Thread Baolu Lu

On 2022/5/6 14:10, Tian, Kevin wrote:

From: Lu Baolu 
Sent: Friday, May 6, 2022 1:27 PM
+
+/*
+ * Set the page snoop control for a pasid entry which has been set up.
+ */
+void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
+ struct device *dev, u32 pasid)
+{
+   struct pasid_entry *pte;
+   u16 did;
+
+	spin_lock(&iommu->lock);
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (WARN_ON(!pte || !pasid_pte_is_present(pte))) {
+		spin_unlock(&iommu->lock);
+		return;
+	}
+
+	pasid_set_pgsnp(pte);
+	did = pasid_get_domain_id(pte);
+	spin_unlock(&iommu->lock);
+
+   pasid_flush_caches(iommu, pte, pasid, did);
+}


The comment of pasid_flush_caches() says:

/*
  * This function flushes cache for a newly setup pasid table entry.
  * Caller of it should not modify the in-use pasid table entries.
  */

Can you check whether that comment still hold?


I am sorry that I overlooked this. Very appreciated for pointing this
out!

How about below additional changes?

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index da47406011d3..68f7e8cfa848 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -782,5 +782,24 @@ void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,

did = pasid_get_domain_id(pte);
	spin_unlock(&iommu->lock);

-   pasid_flush_caches(iommu, pte, pasid, did);
+   if (!ecap_coherent(iommu->ecap))
+   clflush_cache_range(pte, sizeof(*pte));
+
+   /*
+* VT-d spec 3.4 table23 states guides for cache invalidation:
+*
+* - PASID-selective-within-Domain PASID-cache invalidation
+* - PASID-selective PASID-based IOTLB invalidation
+* - If (pasid is RID_PASID)
+*- Global Device-TLB invalidation to affected functions
+*   Else
+*- PASID-based Device-TLB invalidation (with S=1 and
+*  Addr[63:12]=0x7FFF_F) to affected functions
+*/
+   pasid_cache_invalidation_with_pasid(iommu, did, pasid);
+   qi_flush_piotlb(iommu, did, pasid, 0, -1, 0);
+
+   /* Device IOTLB doesn't need to be flushed in caching mode. */
+   if (!cap_caching_mode(iommu->cap))
+   devtlb_invalidation_with_pasid(iommu, dev, pasid);
 }

Best regards,
baolu


Re: [PATCH v2] iommu/sva: Fix PASID use-after-free issue

2022-05-06 Thread Thomas Gleixner
On Fri, May 06 2022 at 09:49, Zhangfei Gao wrote:
> Will this be taken for 5.18?

It's queued in tip core/urgent and will go to Linus this week.



Re: [PATCH] iommu/vt-d: Increase DMAR_UNITS_SUPPORTED

2022-05-06 Thread David Woodhouse
On Fri, 2022-05-06 at 06:49 +, Tian, Kevin wrote:
> > From: Baolu Lu 
> > 
> > > --- a/include/linux/dmar.h
> > > +++ b/include/linux/dmar.h
> > > @@ -19,7 +19,7 @@
> > >   struct acpi_dmar_header;
> > > 
> > >   #ifdef	CONFIG_X86
> > > -# define	DMAR_UNITS_SUPPORTED	MAX_IO_APICS
> > > +# define	DMAR_UNITS_SUPPORTED	640
> > >   #else
> > >   # define	DMAR_UNITS_SUPPORTED	64
> > >   #endif
> 
> ... is it necessary to permanently do 10x increase which wastes memory
> on most platforms which won't have such need.

I was just looking at that. It mostly adds about 3½ KiB to each struct
dmar_domain.

I think the only actual static array is the dmar_seq_ids bitmap which
grows to 640 *bits* which is fairly negligible, and the main growth is
that it adds about 3½ KiB to each struct dmar_domain for the
iommu_refcnt[] and iommu_did[] arrays.
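The "about 3½ KiB" figure checks out on a back-of-envelope basis, assuming the per-DMAR-unit cost in `struct dmar_domain` is one `int` (for `iommu_refcnt[]`) plus one `u16` (for `iommu_did[]`), and the usual 4-byte `int` / 2-byte `short` sizes:

```c
#include <assert.h>

/*
 * Bytes of per-domain array space for a given DMAR_UNITS_SUPPORTED,
 * assuming one int (iommu_refcnt[]) and one u16 (iommu_did[]) per unit.
 */
unsigned long dmar_domain_array_bytes(unsigned long units)
{
	return units * (sizeof(int) + sizeof(unsigned short));
}
```

At 640 units that is 3840 bytes (3.75 KiB) of array space per domain, i.e. roughly 3 KiB of growth over the old 128-unit limit, consistent with the estimate above.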

> Does it make more sense to have a configurable approach similar to
> CONFIG_NR_CPUS? or even better can we just replace those static
> arrays with dynamic allocation so removing this restriction
> completely?

Hotplug makes that fun, but I suppose you only need to grow the array
in a given struct dmar_domain if you actually add a device to it that's
behind a newly added IOMMU. I don't know if the complexity of making it
fully dynamic is worth it though. We could make it a config option,
and/or a command line option (perhaps automatically derived from
CONFIG_NR_CPUS).

If it wasn't for hotplug, I think we'd know the right number by the
time we actually need it anyway, wouldn't we? Can we have a heuristic
for how many DMAR units are likely to be hotplugged? Is it as simple as
the ratio of present to not-yet-present CPUs in MADT?
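The heuristic floated here can be sketched as scaling the boot-time DMAR count by the ratio of possible to present CPUs from MADT. This is purely illustrative; no such helper exists in the kernel, and whether DMAR hotplug actually tracks CPU hotplug this closely is exactly the open question in the thread.

```c
#include <assert.h>

/*
 * Estimate how many DMAR units to provision for, given the number seen
 * at boot and the present/possible CPU counts from MADT. Rounds up so a
 * partially populated socket still gets a slot.
 */
unsigned int estimate_dmar_units(unsigned int present_dmars,
				 unsigned int present_cpus,
				 unsigned int possible_cpus)
{
	return (present_dmars * possible_cpus + present_cpus - 1) /
	       present_cpus;
}
```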


> another nit: dmar is intel specific thus CONFIG_X86 is always true.

DMAR exists on IA64 too.



Re: [PATCH] iommu/vt-d: Increase DMAR_UNITS_SUPPORTED

2022-05-06 Thread Rödel, Jörg
On Fri, May 06, 2022 at 06:49:43AM +, Tian, Kevin wrote:
> another nit: dmar is intel specific thus CONFIG_X86 is always true.

There are Itanium systems which have DMAR units. Is that no longer
supported?

Regards,

-- 
Jörg Rödel
jroe...@suse.de

SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
 
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev



RE: [PATCH] iommu/vt-d: Increase DMAR_UNITS_SUPPORTED

2022-05-06 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Friday, May 6, 2022 1:57 PM
> 
> On 2022/5/6 03:46, Steve Wahl wrote:
> > Increase DMAR_UNITS_SUPPORTED to support 64 sockets with 10 DMAR units
> > each, for a total of 640.
> >
> > If the available hardware exceeds DMAR_UNITS_SUPPORTED (previously set
> > to MAX_IO_APICS, or 128), it causes these messages: "DMAR: Failed to
> > allocate seq_id", "DMAR: Parse DMAR table failure.", and "x2apic: IRQ
> > remapping doesn't support X2APIC mode x2apic disabled"; and the system
> > fails to boot.
> >
> > Signed-off-by: Steve Wahl 
> > Reviewed-by: Mike Travis 
> > ---
> >
> > Note that we could not find a reason for connecting
> > DMAR_UNITS_SUPPORTED to MAX_IO_APICS as was done previously.  Perhaps
> > it seemed like the two would continue to match on earlier processors.
> > There doesn't appear to be kernel code that assumes that the value of
> > one is related to the other.
> 
> +Kevin
> 
> This maximum value was introduced by below commit. And I don't see any
> hardware/software restrictions that we can't enlarge it after ten years.

I don't see a rationale either, but...

> 
> commit 1b198bb04ad72669d4bd6575fc9945ed595bfee0
> Author: Mike Travis 
> Date:   Mon Mar 5 15:05:16 2012 -0800
> 
> >  x86/iommu/intel: Increase the number of iommus supported to MAX_IO_APICS
> 
>  The number of IOMMUs supported should be the same as the number
>  of IO APICS.  This limit comes into play when the IOMMUs are
>  identity mapped, thus the number of possible IOMMUs in the
>  "static identity" (si) domain should be this same number.
> [...]
> 
> >
> >   include/linux/dmar.h | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/include/linux/dmar.h b/include/linux/dmar.h
> > index 45e903d84733..9d4867b8f42e 100644
> > --- a/include/linux/dmar.h
> > +++ b/include/linux/dmar.h
> > @@ -19,7 +19,7 @@
> >   struct acpi_dmar_header;
> >
> >   #ifdef	CONFIG_X86
> > -# define	DMAR_UNITS_SUPPORTED	MAX_IO_APICS
> > +# define	DMAR_UNITS_SUPPORTED	640
> >   #else
> >   # define	DMAR_UNITS_SUPPORTED	64
> >   #endif
> 

... is it necessary to permanently do 10x increase which wastes memory
on most platforms which won't have such need.

Does it make more sense to have a configurable approach similar to
CONFIG_NR_CPUS? or even better can we just replace those static
arrays with dynamic allocation so removing this restriction
completely?

another nit: dmar is intel specific thus CONFIG_X86 is always true.

Thanks
Kevin


RE: [PATCH v3 2/4] iommu/vt-d: Check domain force_snooping against attached devices

2022-05-06 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Friday, May 6, 2022 1:27 PM
> +
> +/*
> + * Set the page snoop control for a pasid entry which has been set up.
> + */
> +void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
> +   struct device *dev, u32 pasid)
> +{
> + struct pasid_entry *pte;
> + u16 did;
> +
> +	spin_lock(&iommu->lock);
> +	pte = intel_pasid_get_entry(dev, pasid);
> +	if (WARN_ON(!pte || !pasid_pte_is_present(pte))) {
> +		spin_unlock(&iommu->lock);
> +		return;
> +	}
> +
> +	pasid_set_pgsnp(pte);
> +	did = pasid_get_domain_id(pte);
> +	spin_unlock(&iommu->lock);
> +
> + pasid_flush_caches(iommu, pte, pasid, did);
> +}

The comment of pasid_flush_caches() says:

/*
 * This function flushes cache for a newly setup pasid table entry.
 * Caller of it should not modify the in-use pasid table entries.
 */

Can you check whether that comment still hold?