Re: [PATCH v6 22/29] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector

2022-05-14 Thread Thomas Gleixner
On Fri, May 13 2022 at 15:16, Ricardo Neri wrote:
> On Mon, May 09, 2022 at 04:03:39PM +0200, Thomas Gleixner wrote:
>> > +  /* If we are here, IPI shorthands are enabled. */
>> > +  apic->send_IPI_allbutself(NMI_VECTOR);
>> 
>> So if the monitored cpumask is a subset of online CPUs, which is the
>> case when isolation features are enabled, then you still send NMIs to
>> those isolated CPUs. I'm sure the isolation folks will be enthused.
>
> Yes, I acknowledged this limitation in the cover letter. I should also update
> Documentation/admin-guide/lockup-watchdogs.rst.
>
> This patchset proposes the HPET NMI watchdog as an opt-in feature.
>
> Perhaps the limitation might be mitigated by adding a check for
> non-housekeeping and non-monitored CPUs in exc_nmi(). However, that will
> not eliminate the problem of isolated CPUs also getting the NMI.

Right. It's a mess...
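
For illustration, a minimal sketch of the early-exit check mentioned above
(the handler name is made up for the example; hld_data->monitored_cpumask is
taken from the patches quoted later in this thread). It makes unmonitored and
isolated CPUs bail out early, but the shorthand IPI still reaches them:

static int hpet_hld_nmi_handler(unsigned int type, struct pt_regs *regs)
{
        /* Not a monitored CPU: the NMI is not ours, ignore it. */
        if (!cpumask_test_cpu(smp_processor_id(), hld_data->monitored_cpumask))
                return NMI_DONE;

        /* ... inspect this CPU for a hardlockup ... */

        return NMI_HANDLED;
}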


Re: [PATCH v6 15/29] x86/hpet: Add helper function hpet_set_comparator_periodic()

2022-05-14 Thread Thomas Gleixner
On Fri, May 13 2022 at 14:19, Ricardo Neri wrote:
> On Fri, May 06, 2022 at 11:41:13PM +0200, Thomas Gleixner wrote:
>> The argument about not bloating the code
>> with an "obvious???" function which is quite small is slightly beyond my
>> comprehension level.
>
> That obvious function would look like this:
>
> void hpet_set_comparator_one_shot(int channel, u32 delta)
> {
>   u32 count;
>
>   count = hpet_readl(HPET_COUNTER);
>   count += delta;
>   hpet_writel(count, HPET_Tn_CMP(channel));
> }

This function only works reliably when the delta is large. See
hpet_clkevt_set_next_event().
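
For context, a sketch of the hazard being pointed at, with an illustrative
safety check added to the helper (HPET_MIN_DELTA and the int return are
additions for the example; the register accessors are the ones used above).
With a small delta the counter can pass the comparator before the write lands,
and the channel then stays silent until the 32-bit counter wraps:

#define HPET_MIN_DELTA  128     /* illustrative safety margin, in HPET cycles */

static int hpet_set_comparator_one_shot(int channel, u32 delta)
{
        u32 count = hpet_readl(HPET_COUNTER) + delta;

        hpet_writel(count, HPET_Tn_CMP(channel));

        /*
         * If the counter already passed the comparator, the interrupt is
         * lost until the counter wraps around. Tell the caller so it can
         * retry with a larger delta, as the clockevent code does.
         */
        if ((s32)(count - hpet_readl(HPET_COUNTER)) < HPET_MIN_DELTA)
                return -ETIME;

        return 0;
}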

Thanks,

tglx


Re: [PATCH v6 05/29] x86/apic/vector: Do not allocate vectors for NMIs

2022-05-14 Thread Thomas Gleixner
On Fri, May 13 2022 at 16:45, Ricardo Neri wrote:
> On Fri, May 13, 2022 at 10:50:09PM +0200, Thomas Gleixner wrote:
>> > Also, if lapic_nmi_controller.irq_set_affinity() is NULL, then irq_chips
>> > INTEL-IR, AMD-IR, those using msi_domain_set_affinity() need to check for
>> > NULL. They currently unconditionally call the parent irq_chip's
>> > irq_set_affinity(). I see that there is an irq_chip_set_affinity_parent()
>> > function. Perhaps it can be used for this check?
>> 
>> Yes, this obviously lacks an NMI-specific set_affinity callback, and it
>> can be very trivial and does not have any of the complexity of interrupt
>> affinity assignment. First online CPU in the mask with a fallback to any
>> online CPU.
>
> Why would we need a fallback to any online CPU? Shouldn't it fail if it cannot
> find an online CPU in the mask?

Might as well fail. Let me think about it.
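
For illustration, such a trivial callback without the fallback, i.e. failing
when the mask contains no online CPU. This is a sketch only: locking and the
rewrite of the MSI message for the new target CPU are omitted, and
apic_chip_data/apicd->cpu are used as in the vector.c code quoted elsewhere in
this thread:

static int lapic_nmi_set_affinity(struct irq_data *irqd,
                                  const struct cpumask *dest, bool force)
{
        struct apic_chip_data *apicd = irqd->chip_data;
        unsigned int cpu;

        cpu = cpumask_first_and(dest, cpu_online_mask);
        if (cpu >= nr_cpu_ids)
                return -ENODEV;

        /* Retarget the fixed-vector NMI; recomposing the MSI message is elided. */
        apicd->cpu = cpu;
        irq_data_update_effective_affinity(irqd, cpumask_of(cpu));

        return IRQ_SET_MASK_OK;
}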

>> I did not claim that this is complete. This was for illustration.
>
> In the reworked patch, may I add a Co-developed-by with your name and
> your SOB?

Suggested-by is good enough.

Thanks,

tglx


Re: [PATCH v6 05/29] x86/apic/vector: Do not allocate vectors for NMIs

2022-05-13 Thread Thomas Gleixner
On Fri, May 13 2022 at 11:03, Ricardo Neri wrote:
> On Fri, May 06, 2022 at 11:12:20PM +0200, Thomas Gleixner wrote:
>> Why would a NMI ever end up in this code? There is no vector management
>> required and this find cpu exercise is pointless.
>
> But even if the NMI has a fixed vector, it is still necessary to determine
> which CPU will get the NMI. It is still necessary to determine what to
> write in the Destination ID field of the MSI message.
>
> irq_matrix_find_best_cpu() would find the CPU with the lowest number of
> managed vectors so that the NMI is directed to that CPU. 

What's the point of sending it to the CPU with the lowest number of
interrupts? It's not that this NMI happens every 50 microseconds.
We pick one online CPU and are done.

> In today's code, an NMI would end up here because we rely on the existing
> interrupt management infrastructure... Unless the check is done in the
> entry points as you propose.

Correct. We don't want to call into functions which are not designed for
NMIs.
 
>> > +
>> > +  if (apicd->hw_irq_cfg.delivery_mode == APIC_DELIVERY_MODE_NMI) {
>> > +  cpu = irq_matrix_find_best_cpu_managed(vector_matrix, dest);
>> > +  apicd->cpu = cpu;
>> > +  vector = 0;
>> > +  goto no_vector;
>> > +  }
>> 
>> This code can never be reached for a NMI delivery. If so, then it's a
>> bug.
>> 
>> This all is special purpose for that particular HPET NMI watchdog use
>> case and we are not exposing this to anything else at all.
>> 
>> So why are you sprinkling this NMI nonsense all over the place? Just
>> because? There are well defined entry points to all of this where this
>> can be fenced off.
>
> I put the NMI checks in these points because assign_vector_locked() and
> assign_managed_vector() are reached through multiple paths and these are
> the two places where the allocation of the vector is requested and the
> destination CPU is determined.
>
> I do observe this code being reached for an NMI, but that is because this
> code still does not know about NMIs... Unless the checks for NMI are put
> in the entry points as you pointed out.
>
> The intent was to refactor the code in a generic manner and not to focus
> only in the NMI watchdog. That would have looked hacky IMO.

We don't want to have more of this really. Supporting NMIs on x86 in a
broader way is simply not reasonable because there is only one NMI
vector and we have no sensible way to get to the cause of the NMI
without a massive overhead.

Even if we get multiple NMI vectors some shiny day, this will be
fundamentally different from regular interrupts and certainly not
exposed broadly. There will be 99.99% fixed vectors for simplicity's sake.

>> +if (info->flags & X86_IRQ_ALLOC_AS_NMI) {
>> +/*
>> + * NMIs have a fixed vector and need their own
>> + * interrupt chip so nothing can end up in the
>> + * regular local APIC management code except the
>> + * MSI message composing callback.
>> + */
>> +irqd->chip = &lapic_nmi_controller;
>> +/*
>> + * Don't allow affinity setting attempts for NMIs.
>> + * This cannot work with the regular affinity
>> + * mechanisms and for the intended HPET NMI
>> + * watchdog use case it's not required.
>
> But we do need the ability to set affinity, right? As stated above, we need
> to know what Destination ID to write in the MSI message or in the interrupt
> remapping table entry.
>
> It cannot be any CPU because only one specific CPU is supposed to handle the
> NMI from the HPET channel.
>
> We cannot hard-code a CPU for that because it may go offline (and ignore NMIs)
> or not be part of the monitored CPUs.
>
> Also, if lapic_nmi_controller.irq_set_affinity() is NULL, then irq_chips
> INTEL-IR, AMD-IR, those using msi_domain_set_affinity() need to check for
> NULL. They currently unconditionally call the parent irq_chip's
> irq_set_affinity(). I see that there is an irq_chip_set_affinity_parent()
> function. Perhaps it can be used for this check?

Yes, this obviously lacks an NMI-specific set_affinity callback, and it
can be very trivial and does not have any of the complexity of interrupt
affinity assignment. First online CPU in the mask with a fallback to any
online CPU.

I did not claim that this is complete. This was for illustration.

>> + */
>> +irqd_set_no_balance(irqd);
>
> This code does not set apicd->hw_irq_cfg.delivery_mode as NMI, right?
> I had to add that to make it work.

I assumed you can figure that out on your own :)

Thanks,

tglx


Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz

2022-05-10 Thread Thomas Gleixner
On Tue, May 10 2022 at 21:16, Nicholas Piggin wrote:
> Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
>> +/*
>> + * If in use, the HPET hardlockup detector relies on tsc_khz.
>> + * Reconfigure it to make use of the refined tsc_khz.
>> + */
>> +lockup_detector_reconfigure();
>
> I don't know if the API is conceptually good.
>
> You change something that the lockup detector is currently using, 
> *while* the detector is running asynchronously, and then reconfigure
> it. What happens in the window? If this code is only used for small
> adjustments maybe it does not really matter but in principle it's
> a bad API to export.
>
> lockup_detector_reconfigure as an internal API is okay because it
> reconfigures things while the watchdog is stopped [actually that
> looks untrue for soft dog which uses watchdog_thresh in
> is_softlockup(), but that should be fixed].
>
> You're the arch so you're allowed to stop the watchdog and configure
> it, e.g., hardlockup_detector_perf_stop() is called in arch/.
>
> So you want to disable HPET watchdog if it was enabled, then update
> wherever you're using tsc_khz, then re-enable.

The real question is whether making this refined tsc_khz value
immediately effective matters at all. IMO, it does not, because up to
that point the watchdog was happily using the coarse calibrated value,
and the whole "use the TSC to assess whether the HPET fired" mechanism is
just a guesstimate anyway. So what's the point of trying to guess "more
correctly"?

Thanks,

tglx



Re: [PATCH v6 22/29] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector

2022-05-09 Thread Thomas Gleixner
On Thu, May 05 2022 at 17:00, Ricardo Neri wrote:
> + if (is_hpet_hld_interrupt(hdata)) {
> + /*
> +  * Kick the timer first. If the HPET channel is periodic, it
> +  * helps to reduce the delta between the expected TSC value and
> +  * its actual value the next time the HPET channel fires.
> +  */
> + kick_timer(hdata, !(hdata->has_periodic));
> +
> + if (cpumask_weight(hld_data->monitored_cpumask) > 1) {
> + /*
> +  * Since we cannot know the source of an NMI, the best
> +  * we can do is to use a flag to indicate to all online
> +  * CPUs that they will get an NMI and that the source of
> +  * that NMI is the hardlockup detector. Offline CPUs
> +  * also receive the NMI but they ignore it.
> +  *
> +  * Even though we are in NMI context, we have concluded
> +  * that the NMI came from the HPET channel assigned to
> +  * the detector, an event that is infrequent and only
> +  * occurs in the handling CPU. There should not be races
> +  * with other NMIs.
> +  */
> + cpumask_copy(hld_data->inspect_cpumask,
> +  cpu_online_mask);
> +
> + /* If we are here, IPI shorthands are enabled. */
> + apic->send_IPI_allbutself(NMI_VECTOR);

So if the monitored cpumask is a subset of online CPUs, which is the
case when isolation features are enabled, then you still send NMIs to
those isolated CPUs. I'm sure the isolation folks will be enthused.

Thanks,

tglx


Re: [PATCH v6 21/29] x86/nmi: Add an NMI_WATCHDOG NMI handler category

2022-05-09 Thread Thomas Gleixner
On Thu, May 05 2022 at 17:00, Ricardo Neri wrote:
> Add a NMI_WATCHDOG as a new category of NMI handler. This new category
> is to be used with the HPET-based hardlockup detector. This detector
> does not have a direct way of checking if the HPET timer is the source of
> the NMI. Instead, it indirectly estimates it using the time-stamp counter.
>
> Therefore, we may have false-positives in case another NMI occurs within
> the estimated time window. For this reason, we want the handler of the
> detector to be called after all the NMI_LOCAL handlers. A simple way
> of achieving this is with a new NMI handler category.
>
> @@ -379,6 +385,10 @@ static noinstr void default_do_nmi(struct pt_regs *regs)
>   }
>   raw_spin_unlock(&nmi_reason_lock);
>  
> + handled = nmi_handle(NMI_WATCHDOG, regs);
> + if (handled == NMI_HANDLED)
> + goto out;
> +

How is this supposed to work reliably?

If perf is active and the HPET NMI and the perf NMI come in around the
same time, then nmi_handle(LOCAL) can swallow the NMI and the watchdog
won't be checked. Because MSI is strictly edge and the message is only
sent once, this can result in a stale watchdog, no?
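
For reference, a sketch of the TSC heuristic the quoted changelog relies on
(the field names tsc_next and tsc_error are illustrative). The window check
itself does not help with the scenario above: if the NMI_LOCAL handlers
swallow the NMI, this code is never reached.

static bool is_hpet_hld_interrupt(struct hpet_hld_data *hdata)
{
        /* tsc_next: expected TSC value at the next HPET expiry, set on (re)arm */
        s64 delta = rdtsc() - hdata->tsc_next;

        return delta > -(s64)hdata->tsc_error && delta < (s64)hdata->tsc_error;
}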

Thanks,

tglx





Re: [PATCH v6 15/29] x86/hpet: Add helper function hpet_set_comparator_periodic()

2022-05-06 Thread Thomas Gleixner
On Fri, May 06 2022 at 23:41, Thomas Gleixner wrote:
> On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
>> Programming an HPET channel as periodic requires setting the
>> HPET_TN_SETVAL bit in the channel configuration. Plus, the comparator
>> register must be written twice (once for the comparator value and once for
>> the periodic value). Since this programming might be needed in several
>> places (e.g., the HPET clocksource and the HPET-based hardlockup detector),
>> add a helper function for this purpose.
>>
>> A helper function hpet_set_comparator_oneshot() could also be implemented.
>> However, such function would only program the comparator register and the
>> function would be quite small. Hence, it is better to not bloat the code
>> with such an obvious function.
>
> This word salad above does not provide a single reason why the periodic
> programming function is required and better suited for the NMI watchdog
> case and then goes on and blurbs about why a function which is not
> required is not implemented. The argument about not bloating the code
> with an "obvious???" function which is quite small is slightly beyond my
> comprehension level.

What's even more incomprehensible is that the patch which actually sets
up that NMI watchdog cruft has:

> +   if (hc->boot_cfg & HPET_TN_PERIODIC_CAP)
> +   hld_data->has_periodic = true;

So how the heck does that work with a HPET which does not support
periodic mode?

That watchdog muck will still happily invoke that set periodic function
in the hope that it works by chance?
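
For illustration, the kind of guard the question calls for, assuming the
detector falls back to one-shot re-arming when the channel lacks
HPET_TN_PERIODIC_CAP (setup_hld_channel, hdata->channel and
hdata->ticks_per_period are illustrative names, and the helper's parameters
are assumed):

static void setup_hld_channel(struct hpet_hld_data *hdata)
{
        if (hdata->has_periodic) {
                hpet_set_comparator_periodic(hdata->channel,
                                             hdata->ticks_per_period);
                return;
        }

        /* No periodic support: arm one-shot and re-arm from the NMI handler. */
        hpet_writel(hpet_readl(HPET_COUNTER) + hdata->ticks_per_period,
                    HPET_Tn_CMP(hdata->channel));
}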

Thanks,

tglx


Re: [PATCH v6 15/29] x86/hpet: Add helper function hpet_set_comparator_periodic()

2022-05-06 Thread Thomas Gleixner
On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> Programming an HPET channel as periodic requires setting the
> HPET_TN_SETVAL bit in the channel configuration. Plus, the comparator
> register must be written twice (once for the comparator value and once for
> the periodic value). Since this programming might be needed in several
> places (e.g., the HPET clocksource and the HPET-based hardlockup detector),
> add a helper function for this purpose.
>
> A helper function hpet_set_comparator_oneshot() could also be implemented.
> However, such function would only program the comparator register and the
> function would be quite small. Hence, it is better to not bloat the code
> with such an obvious function.

This word salad above does not provide a single reason why the periodic
programming function is required and better suited for the NMI watchdog
case and then goes on and blurbs about why a function which is not
required is not implemented. The argument about not bloating the code
with an "obvious???" function which is quite small is slightly beyond my
comprehension level.

Thanks,

tglx


Re: [PATCH v6 13/29] iommu/amd: Compose MSI messages for NMI irqs in non-IR format

2022-05-06 Thread Thomas Gleixner
On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> +  *
> +  * Also, NMIs do not have an associated vector. No need for cleanup.

They have a vector, and what the heck is this cleanup comment for here?
There is nothing cleanup-like anywhere near.

Adding confusing comments is worse than adding no comments at all.

Thanks,

tglx


Re: [PATCH v6 12/29] iommu/amd: Enable NMIPass when allocating an NMI irq

2022-05-06 Thread Thomas Gleixner
On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
>  
> + if (info->flags & X86_IRQ_ALLOC_AS_NMI) {
> + /* Only one IRQ per NMI */
> + if (nr_irqs != 1)
> + return -EINVAL;

See previous reply.



Re: [PATCH v6 10/29] iommu/vt-d: Implement minor tweaks for NMI irqs

2022-05-06 Thread Thomas Gleixner
On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> The Intel IOMMU interrupt remapping driver already programs correctly the
> delivery mode of individual irqs as per their irq_data. Improve handling
> of NMIs. Allow only one irq per NMI. Also, it is not necessary to cleanup
> irq vectors after updating affinity.

Structuring a changelog in paragraphs might make it readable. New lines
exist for a reason.

> NMIs do not have associated vectors.

Again. NMI has a vector associated, but it is not subject to dynamic
vector management.

> diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
> index fb2d71bea98d..791a9331e257 100644
> --- a/drivers/iommu/intel/irq_remapping.c
> +++ b/drivers/iommu/intel/irq_remapping.c
> @@ -1198,8 +1198,12 @@ intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
>* After this point, all the interrupts will start arriving
>* at the new destination. So, time to cleanup the previous
>* vector allocation.
> +  *
> +  * Do it only for non-NMI irqs. NMIs don't have associated
> +  * vectors.

See above.

>*/
> - send_cleanup_vector(cfg);
> + if (cfg->delivery_mode != APIC_DELIVERY_MODE_NMI)
> + send_cleanup_vector(cfg);

So this needs to be replicated for all invocations of
send_cleanup_vector(), right? Why can't you put it into that function?
  
>   return IRQ_SET_MASK_OK_DONE;
>  }
> @@ -1352,6 +1356,9 @@ static int intel_irq_remapping_alloc(struct irq_domain *domain,
>   if (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI)
>   info->flags &= ~X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
>  
> + if ((info->flags & X86_IRQ_ALLOC_AS_NMI) && nr_irqs != 1)
> + return -EINVAL;

This cannot be reached when the vector allocation domain already
rejected it, but copy & pasta is wonderful and increases the line count.
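
As to the send_cleanup_vector() question above, the fold could look roughly
like this (a sketch; the body of the existing function is elided and
irq_cfg::delivery_mode is the field added earlier in this series):

void send_cleanup_vector(struct irq_cfg *cfg)
{
        /* NMI delivery mode uses a fixed vector. There is nothing to clean up. */
        if (cfg->delivery_mode == APIC_DELIVERY_MODE_NMI)
                return;

        /* ... existing cleanup IPI logic unchanged ... */
}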

Thanks,

tglx




Re: [PATCH v6 05/29] x86/apic/vector: Do not allocate vectors for NMIs

2022-05-06 Thread Thomas Gleixner
On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> Vectors are meaningless when allocating IRQs with NMI as the delivery
> mode.

Vectors are not meaningless. NMI has a fixed vector.

The point is that for a fixed vector there is no vector management
required.

Can you spot the difference?

> In such case, skip the reservation of IRQ vectors. Do it in the lowest-
> level functions where the actual IRQ reservation takes place.
>
> Since NMIs target specific CPUs, keep the functionality to find the best
> CPU.

Again. What for?
  
> + if (apicd->hw_irq_cfg.delivery_mode == APIC_DELIVERY_MODE_NMI) {
> + cpu = irq_matrix_find_best_cpu(vector_matrix, dest);
> + apicd->cpu = cpu;
> + vector = 0;
> + goto no_vector;
> + }

Why would a NMI ever end up in this code? There is no vector management
required and this find cpu exercise is pointless.

>   vector = irq_matrix_alloc(vector_matrix, dest, resvd, &cpu);
>   trace_vector_alloc(irqd->irq, vector, resvd, vector);
>   if (vector < 0)
>   return vector;
>   apic_update_vector(irqd, vector, cpu);
> +
> +no_vector:
>   apic_update_irq_cfg(irqd, vector, cpu);
>  
>   return 0;
> @@ -321,12 +330,22 @@ assign_managed_vector(struct irq_data *irqd, const struct cpumask *dest)
>   /* set_affinity might call here for nothing */
>   if (apicd->vector && cpumask_test_cpu(apicd->cpu, vector_searchmask))
>   return 0;
> +
> + if (apicd->hw_irq_cfg.delivery_mode == APIC_DELIVERY_MODE_NMI) {
> + cpu = irq_matrix_find_best_cpu_managed(vector_matrix, dest);
> + apicd->cpu = cpu;
> + vector = 0;
> + goto no_vector;
> + }

This code can never be reached for a NMI delivery. If so, then it's a
bug.

This all is special purpose for that particular HPET NMI watchdog use
case and we are not exposing this to anything else at all.

So why are you sprinkling this NMI nonsense all over the place? Just
because? There are well defined entry points to all of this where this
can be fenced off.

If at some day the hardware people get their act together and provide a
proper vector space for NMIs then we have to revisit that, but then
there will be a separate NMI vector management too.

What you want is the below which also covers the next patch. Please keep
an eye on the comments I added/modified.

Thanks,

tglx
---
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -42,6 +42,7 @@ EXPORT_SYMBOL_GPL(x86_vector_domain);
 static DEFINE_RAW_SPINLOCK(vector_lock);
 static cpumask_var_t vector_searchmask;
 static struct irq_chip lapic_controller;
+static struct irq_chip lapic_nmi_controller;
 static struct irq_matrix *vector_matrix;
 #ifdef CONFIG_SMP
 static DEFINE_PER_CPU(struct hlist_head, cleanup_list);
@@ -451,6 +452,10 @@ static int x86_vector_activate(struct ir
trace_vector_activate(irqd->irq, apicd->is_managed,
  apicd->can_reserve, reserve);
 
+   /* NMI has a fixed vector. No vector management required */
+   if (apicd->hw_irq_cfg.delivery_mode == APIC_DELIVERY_MODE_NMI)
+   return 0;
+
raw_spin_lock_irqsave(&vector_lock, flags);
if (!apicd->can_reserve && !apicd->is_managed)
assign_irq_vector_any_locked(irqd);
@@ -472,6 +477,10 @@ static void vector_free_reserved_and_man
trace_vector_teardown(irqd->irq, apicd->is_managed,
  apicd->has_reserved);
 
+   /* NMI has a fixed vector. No vector management required */
+   if (apicd->hw_irq_cfg.delivery_mode == APIC_DELIVERY_MODE_NMI)
+   return;
+
if (apicd->has_reserved)
irq_matrix_remove_reserved(vector_matrix);
if (apicd->is_managed)
@@ -568,6 +577,24 @@ static int x86_vector_alloc_irqs(struct
irqd->hwirq = virq + i;
irqd_set_single_target(irqd);
 
+   if (info->flags & X86_IRQ_ALLOC_AS_NMI) {
+   /*
+* NMIs have a fixed vector and need their own
+* interrupt chip so nothing can end up in the
+* regular local APIC management code except the
+* MSI message composing callback.
+*/
+   irqd->chip = &lapic_nmi_controller;
+   /*
+* Don't allow affinity setting attempts for NMIs.
+* This cannot work with the regular affinity
+* mechanisms and for the intended HPET NMI
+* watchdog use case it's not required.
+*/
+   irqd_set_no_balance(irqd);
+   continue;
+   }
+
/*
 * Prevent that any of these interrupts is invoked in
 * non interrupt context via e.g. generic_handle_irq()

Re: [PATCH v6 03/29] x86/apic/msi: Set the delivery mode individually for each IRQ

2022-05-06 Thread Thomas Gleixner
On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> There are no restrictions in hardware to set  MSI messages with its
> own delivery mode.

"messages with its own" ? Plural/singular confusion.

> Use the mode specified in the provided IRQ hardware
> configuration data. Since most of the IRQs are configured to use the
> delivery mode of the APIC driver in use (set in all of them to
> APIC_DELIVERY_MODE_FIXED), the only functional changes are where
> IRQs are configured to use a specific delivery mode.

This does not parse. There are no functional changes due to this patch
and there is no point talking about functional changes in subsequent
patches here.

> Changing the utility function __irq_msi_compose_msg() takes care of
> implementing the change in the in the local APIC, PCI-MSI, and DMAR-MSI

in the in the

> irq_chips.
>
> The IO-APIC irq_chip configures the entries in the interrupt redirection
> table using the delivery mode specified in the corresponding MSI message.
> Since the MSI message is composed by a higher irq_chip in the hierarchy,
> it does not need to be updated.

The point is that updating __irq_msi_compose_msg() covers _all_ MSI
consumers including IO-APIC.

I had to read that changelog 3 times to make sense of it. Something like
this perhaps:

  "x86/apic/msi: Use the delivery mode from irq_cfg for message composition

   irq_cfg provides a delivery mode for each interrupt. Use it instead
   of the hardcoded APIC_DELIVERY_MODE_FIXED. This allows to compose
   messages for NMI delivery mode which is required to implement a HPET
   based NMI watchdog.

   No functional change as the default delivery mode is set to
   APIC_DELIVERY_MODE_FIXED."

Thanks,

tglx


Re: [PATCH v6 02/29] x86/apic: Add irq_cfg::delivery_mode

2022-05-06 Thread Thomas Gleixner
On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> Currently, the delivery mode of all interrupts is set to the mode of the
> APIC driver in use. There are no restrictions in hardware to configure the
> delivery mode of each interrupt individually. Also, certain IRQs need
> to be

s/IRQ/interrupt/ Changelogs can do without acronyms.

> configured with a specific delivery mode (e.g., NMI).
>
> Add a new member, delivery_mode, to struct irq_cfg. Subsequent changesets
> will update every irq_domain to set the delivery mode of each IRQ to that
> specified in its irq_cfg data.
>
> To keep the current behavior, when allocating an IRQ in the root
> domain

The root domain does not allocate an interrupt. The root domain
allocates a vector for an interrupt. There is a very clear and technical
distinction. Can you please be more careful about the wording?

> --- a/arch/x86/kernel/apic/vector.c
> +++ b/arch/x86/kernel/apic/vector.c
> @@ -567,6 +567,7 @@ static int x86_vector_alloc_irqs(struct irq_domain *domain, unsigned int virq,
>   irqd->chip_data = apicd;
>   irqd->hwirq = virq + i;
>   irqd_set_single_target(irqd);
> +

Stray newline.

>   /*
>* Prevent that any of these interrupts is invoked in
>* non interrupt context via e.g. generic_handle_irq()
> @@ -577,6 +578,14 @@ static int x86_vector_alloc_irqs(struct irq_domain *domain, unsigned int virq,
>   /* Don't invoke affinity setter on deactivated interrupts */
>   irqd_set_affinity_on_activate(irqd);
>  
> + /*
> +  * Initialize the delivery mode of this irq to match the

s/irq/interrupt/

> +  * default delivery mode of the APIC. Children irq domains
> +  * may take the delivery mode from the individual irq
> +  * configuration rather than from the APIC driver.
> +  */
> + apicd->hw_irq_cfg.delivery_mode = apic->delivery_mode;
> +
>   /*
>* Legacy vectors are already assigned when the IOAPIC
>* takes them over. They stay on the same vector. This is


Re: [PATCH v6 01/29] irq/matrix: Expose functions to allocate the best CPU for new vectors

2022-05-06 Thread Thomas Gleixner
Ricardo,

On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> Certain types of interrupts, such as NMI, do not have an associated vector.
> They, however, target specific CPUs. Thus, when assigning the destination
> CPU, it is beneficial to select the one with the lowest number of
> vectors.

Why is that beneficial especially in the context of a NMI watchdog which
then broadcasts the NMI to all other CPUs?

That's wishful thinking perhaps, but I don't see any benefit at all.

> Prepend the functions matrix_find_best_cpu_managed() and
> matrix_find_best_cpu_managed()

The same function prepended twice becomes two functions :)

> with the irq_ prefix and expose them for
> IRQ controllers to use when allocating and activating vector-less IRQs.

There is no such thing as a vectorless IRQ. NMIs have a vector. Can we
please describe facts and not concepts pulled out of thin air which do
not exist?

Thanks,

tglx


Re: [PATCH v2] iommu/sva: Fix PASID use-after-free issue

2022-05-06 Thread Thomas Gleixner
On Fri, May 06 2022 at 09:49, Zhangfei Gao wrote:
> Will this be taken for 5.18?

It's queued in tip core/urgent and will go to Linus this week.



Re: [PATCH] x86/msi: Fix msi message data shadow struct

2022-04-06 Thread Thomas Gleixner
Reto,

On Wed, Apr 06 2022 at 10:36, Reto Buerki wrote:

> The x86 MSI message data is 32 bits in total and is either in
> compatibility or remappable format, see Intel Virtualization Technology
> for Directed I/O, section 5.1.2.
>
> Fixes: 6285aa50736 ("x86/msi: Provide msi message shadow structs")
> Signed-off-by: Reto Buerki 
> Signed-off-by: Adrian-Ken Rueegsegger 

This Signed-off-by chain is incorrect. It would be correct if Adrian-Ken
had sent the patch.

If you both worked on that then please use the Co-developed-by tag
according to Documentation/process/

Thanks,

tglx


Re: [PATCH v3 12/35] x86/msi: Provide msi message shadow structs

2022-04-06 Thread Thomas Gleixner
On Wed, Apr 06 2022 at 10:36, Reto Buerki wrote:
> While working on some out-of-tree patches, we noticed that assignment to
> dmar_subhandle of struct x86_msi_data lead to a QEMU warning about
> reserved bits in MSI data being set:
>
> qemu-system-x86_64: vtd_interrupt_remap_msi: invalid IR MSI (sid=256, address=0xfee003d8, data=0x1)
>
> This message originates from hw/i386/intel_iommu.c in QEMU:
>
> #define VTD_IR_MSI_DATA_RESERVED (0x)
> if (origin->data & VTD_IR_MSI_DATA_RESERVED) { ... }
>
> Looking at struct x86_msi_data, it appears that it is actually 48-bits in size
> since the bitfield containing the vector, delivery_mode etc is 2 bytes wide
> followed by dmar_subhandle which is 32 bits. Thus assignment to dmar_subhandle
> leads to bits > 16 being set.
>
> If I am not mistaken, the MSI data field should be 32-bits wide for all
> platforms (struct msi_msg, include/linux/msi.h). Is this analysis
> correct or did I miss something wrt. handling of dmar_subhandle?

It's correct and I'm completely surprised that this went unnoticed
for more than a year. Where is that brown paperbag...
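
For illustration, the direction of the fix under discussion is to overlay
dmar_subhandle with the compatibility-format bitfield in a union so the shadow
of the 32-bit MSI data register stays 32 bits wide (a sketch; the exact
committed layout may differ):

struct x86_msi_data {
        union {
                struct {
                        u32     vector                  :  8,
                                delivery_mode           :  3,
                                dest_mode_logical       :  1,
                                reserved                :  2,
                                active_low              :  1,
                                is_level                :  1;
                };
                u32     dmar_subhandle;
        };
} __attribute__ ((packed));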

Thanks,

tglx



Re: [PATCH v4 11/11] docs: x86: Change documentation for SVA (Shared Virtual Addressing)

2022-02-14 Thread Thomas Gleixner
On Mon, Feb 07 2022 at 15:02, Fenghua Yu wrote:
> Since allocating, freeing and fixing up PASID are changed, the
> documentation is updated to reflect the changes.
>
> Originally-by: Ashok Raj 
> Signed-off-by: Fenghua Yu 
> Reviewed-by: Tony Luck 

Acked-by: Thomas Gleixner 


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-02-14 Thread Thomas Gleixner
On Mon, Feb 07 2022 at 15:02, Fenghua Yu wrote:

> PASIDs are process wide. It was attempted to use refcounted PASIDs to
> free them when the last thread drops the refcount. This turned out to
> be complex and error prone. Given the fact that the PASID space is 20
> bits, which allows up to 1M processes to have a PASID associated
> concurrently, PASID resource exhaustion is not a realistic concern.
>
> Therefore it was decided to simplify the approach and stick with lazy
> on-demand PASID allocation, but drop the eager free approach and make
> an allocated PASID's lifetime bound to the lifetime of the process.
>
> Get rid of the refcounting mechanisms and replace/rename the interfaces
> to reflect this new approach.
>
> Suggested-by: Dave Hansen 
> Signed-off-by: Fenghua Yu 
> Reviewed-by: Tony Luck 

Reviewed-by: Thomas Gleixner 


Re: [PATCH v4 04/11] kernel/fork: Initialize mm's PASID

2022-02-14 Thread Thomas Gleixner
On Mon, Feb 07 2022 at 15:02, Fenghua Yu wrote:

> A new mm doesn't have a PASID yet when it's created. Initialize
> the mm's PASID on fork() or for init_mm to INVALID_IOASID (-1).
>
> INIT_PASID (0) is reserved for kernel legacy DMA PASID. It cannot be
> allocated to a user process. Initializing the process's PASID to 0 may
> cause confusion about why the process uses the reserved kernel legacy DMA
> PASID. Initializing the PASID to INVALID_IOASID (-1) explicitly tells
> that the process doesn't have a valid PASID yet.
>
> Even though the only user of mm_pasid_init() is in fork.c, define it
> in <linux/sched/mm.h> as the first of three mm/pasid life cycle
> functions (init/set/drop) to keep these all together.
>
> Suggested-by: Dave Hansen 
> Signed-off-by: Fenghua Yu 
> Reviewed-by: Tony Luck 

Reviewed-by: Thomas Gleixner 


Re: [PATCH v3 09/11] x86/cpufeatures: Re-enable ENQCMD

2022-02-04 Thread Thomas Gleixner
On Fri, Jan 28 2022 at 12:29, Fenghua Yu wrote:
> Since ENQCMD is handled by #GP fix up, it can be re-enabled.
>
> The ENQCMD feature can only be used if CONFIG_INTEL_IOMMU_SVM is set. Add
> X86_FEATURE_ENQCMD to the disabled features mask as appropriate so that
> cpu_feature_enabled() can be used to check the feature.
>
> Signed-off-by: Fenghua Yu 
> Reviewed-by: Tony Luck 

Reviewed-by: Thomas Gleixner 


Re: [PATCH v3 08/11] x86/traps: Demand-populate PASID MSR via #GP

2022-02-04 Thread Thomas Gleixner
On Fri, Jan 28 2022 at 12:29, Fenghua Yu wrote:

> All tasks start with PASID state disabled. This means that the first
> time they execute an ENQCMD instruction they will take a #GP fault.
>
> Modify the #GP fault handler to check if the "mm" for the task has
> already been allocated a PASID. If so, try to fix the #GP fault by
> loading the IA32_PASID MSR.
>
> Signed-off-by: Fenghua Yu 
> Reviewed-by: Tony Luck 

Reviewed-by: Thomas Gleixner 


Re: [PATCH v3 07/11] sched: Define and initialize a flag to identify valid PASID in the task

2022-02-04 Thread Thomas Gleixner
On Fri, Jan 28 2022 at 12:29, Fenghua Yu wrote:

> From: Peter Zijlstra 
>
> Add a new single bit field to the task structure to track whether this task
> has initialized the IA32_PASID MSR to the mm's PASID.
>
> Initialize the field to zero when creating a new task with fork/clone.
>
> Signed-off-by: Peter Zijlstra 
> Co-developed-by: Fenghua Yu 
> Signed-off-by: Fenghua Yu 
> Reviewed-by: Tony Luck 

Reviewed-by: Thomas Gleixner 


Re: [PATCH v3 06/11] x86/fpu: Clear PASID when copying fpstate

2022-02-04 Thread Thomas Gleixner
On Fri, Jan 28 2022 at 12:29, Fenghua Yu wrote:
> Signed-off-by: Fenghua Yu 
> Reviewed-by: Tony Luck 

Reviewed-by: Thomas Gleixner 


Re: [PATCH v3 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-02-04 Thread Thomas Gleixner
On Fri, Jan 28 2022 at 12:28, Fenghua Yu wrote:
> To avoid complexity of updating each thread's PASID status (e.g. sending
> IPI to update IA32_PASID MSR) on allocating and freeing PASID, once
> allocated and assigned to an mm, the PASID stays with the mm for the
> rest of the mm's lifetime. A reference to the PASID is taken on
> allocating the PASID. Binding/unbinding the PASID won't change refcount.
> The reference is dropped on mm exit and thus the PASID is freed.
>
> Two helpers mm_pasid_set() and mm_pasid_drop() are defined in mm because
> the PASID operations handle the pasid member in mm_struct and should be
> part of mm operations. Because IOASID's reference count is not used any
> more and removed, unused ioasid_get() and iommu_sva_free_pasid()
> are deleted and ioasid_put() is renamed to ioasid_free().
>
> 20-bit PASID allows up to 1M processes bound to PASIDs at the same time.
> With cgroups and other controls that might limit the number of process
> creation, the limited number of PASIDs is not a realistic issue for
> lazy PASID free.

Please take a step back and think hard about it whether that changelog
makes sense to you a year from now.

Let me walk you through:

> To avoid complexity of updating each thread's PASID status (e.g. sending
> IPI to update IA32_PASID MSR) on allocating and freeing PASID, once
> allocated and assigned to an mm, the PASID stays with the mm for the
> rest of the mm's lifetime.

You are missing the opportunity to tell a story about the history of this
decision here:

  PASIDs are process wide. It was attempted to use refcounted PASIDs to
  free them when the last thread drops the refcount. This turned out to
  be complex and error prone. Given the fact that the PASID space is 20
  bits, which allows up to 1M processes to have a PASID associated
  concurrently, PASID resource exhaustion is not a realistic concern.

  Therefore it was decided to simplify the approach and stick with lazy
  on-demand PASID allocation, but drop the eager free approach and make
  an allocated PASID's lifetime bound to the lifetime of the process.

> A reference to the PASID is taken on allocating the
> PASID. Binding/unbinding the PASID won't change refcount.  The
> reference is dropped on mm exit and thus the PASID is freed.

There is no refcount in play anymore, right? So how does this part of
the changelog make any sense?

This is followed by:

> Two helpers mm_pasid_set() and mm_pasid_drop() are defined in mm because
> the PASID operations handle the pasid member in mm_struct and should be
> part of mm operations. Because IOASID's reference count is not used any
> more and removed, unused ioasid_get() and iommu_sva_free_pasid()
> are deleted and ioasid_put() is renamed to ioasid_free().

which does not provide much rationale and just blurbs about _what_ the
patch is doing and not about the why and the connection to the
above. And the refcount removal section contradicts the stale text
above.

So this paragraph should be something like this:

  Get rid of the refcounting mechanisms and replace/rename the
  interfaces to reflect this new approach.

Hmm?

Thanks,

tglx


Re: [PATCH v3 03/11] iommu/ioasid: Introduce a helper to check for valid PASIDs

2022-02-04 Thread Thomas Gleixner
On Fri, Jan 28 2022 at 12:28, Fenghua Yu wrote:

> pasid_valid() is defined to check if a given PASID is valid.
>
> Suggested-by: Ashok Raj 
> Suggested-by: Jacob Pan 
> Signed-off-by: Fenghua Yu 
> Reviewed-by: Tony Luck 

Reviewed-by: Thomas Gleixner 


Re: [PATCH v3 04/11] kernel/fork: Initialize mm's PASID

2022-02-04 Thread Thomas Gleixner
On Fri, Jan 28 2022 at 12:28, Fenghua Yu wrote:
> A new mm doesn't have a PASID yet when it's created. Initialize
> the mm's PASID on fork() or for init_mm to INVALID_IOASID (-1).

I must be missing something here.

> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index aa5f09ca5bcf..c74d1edbac2f 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -8,6 +8,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /*
>   * Routines for handling mm_structs
> @@ -433,4 +434,13 @@ static inline void membarrier_update_current_mm(struct mm_struct *next_mm)
>  }
>  #endif
>  
> +#ifdef CONFIG_IOMMU_SVA
> +static inline void mm_pasid_init(struct mm_struct *mm)
> +{
> + mm->pasid = INVALID_IOASID;
> +}
> +#else
> +static inline void mm_pasid_init(struct mm_struct *mm) {}
> +#endif
> +
>  #endif /* _LINUX_SCHED_MM_H */

So this adds mm_pasid_init() to linux/sched/mm.h which replaces:

> -static void mm_init_pasid(struct mm_struct *mm)
> -{
> -#ifdef CONFIG_IOMMU_SVA
> - mm->pasid = INIT_PASID;
> -#endif
> -}
> -

I.e. already existing code which is initializing mm->pasid with
INIT_PASID (0) while the replacement initializes it with INVALID_IOASID
(-1).

The changelog does not have any information about why INIT_PASID is the
wrong initialization value and why this change does not have any side
effects.

It neither mentions why having this in a globally available header makes
sense when the only call site is in the C file from which the already
existing function is removed.

Thanks,

tglx


Re: [PATCH v3 03/11] iommu/ioasid: Introduce a helper to check for valid PASIDs

2022-02-04 Thread Thomas Gleixner
On Fri, Jan 28 2022 at 12:28, Fenghua Yu wrote:
> pasid_valid() is defined to check if a given PASID is valid.
>
> Suggested-by: Ashok Raj 
> Suggested-by: Jacob Pan 
> Signed-off-by: Fenghua Yu 
> Reviewed-by: Tony Luck 

Reviewed-by: Thomas Gleixner 


Re: [PATCH v3 02/11] mm: Change CONFIG option for mm->pasid field

2022-02-04 Thread Thomas Gleixner
On Fri, Jan 28 2022 at 12:28, Fenghua Yu wrote:

> This currently depends on CONFIG_IOMMU_SUPPORT. But it is only
> needed when CONFIG_IOMMU_SVA option is enabled.
>
> Change the CONFIG guards around definition and initialization
> of mm->pasid field.
>
> Suggested-by: Jacob Pan 
> Signed-off-by: Fenghua Yu 
> Reviewed-by: Tony Luck 

Reviewed-by: Thomas Gleixner 


Re: [PATCH v3 01/11] iommu/sva: Rename CONFIG_IOMMU_SVA_LIB to CONFIG_IOMMU_SVA

2022-02-04 Thread Thomas Gleixner
On Fri, Jan 28 2022 at 12:28, Fenghua Yu wrote:

> This CONFIG option originally only referred to the Shared
> Virtual Address (SVA) library. But it is now also used for
> non-library portions of code.
>
> Drop the "_LIB" suffix so that there is just one configuration
> options for all code relating to SVA.
>
> Signed-off-by: Fenghua Yu 
> Reviewed-by: Tony Luck 

Reviewed-by: Thomas Gleixner 


Re: [patch V3 28/35] PCI/MSI: Simplify pci_irq_get_affinity()

2022-01-31 Thread Thomas Gleixner
Guenter,

On Mon, Jan 31 2022 at 07:21, Guenter Roeck wrote:
> Sure. Please see http://server.roeck-us.net/qemu/x86/.
> The logs are generated with v5.16.4.

thanks for providing the data. It definitely helped me to leave the
state of not seeing the wood for the trees. Fix below.

Thanks,

tglx
---
Subject: PCI/MSI: Remove bogus warning in pci_irq_get_affinity()
From: Thomas Gleixner 
Date: Mon, 31 Jan 2022 22:02:46 +0100

The recent overhaul of pci_irq_get_affinity() introduced a regression when
pci_irq_get_affinity() is called for an MSI-X interrupt which was not
allocated with affinity descriptor information.

The original code just returned a NULL pointer in that case, but the rework
added a WARN_ON() under the assumption that the corresponding WARN_ON() in
the MSI case can be applied to MSI-X as well.

In fact the MSI warning in the original code does not make sense either
because it's legitimate to invoke pci_irq_get_affinity() for an MSI
interrupt which was not allocated with affinity descriptor information.

Remove it and just return NULL as the original code did.

Fixes: f48235900182 ("PCI/MSI: Simplify pci_irq_get_affinity()")
Reported-by: Guenter Roeck 
Signed-off-by: Thomas Gleixner 
---
 drivers/pci/msi/msi.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -,7 +,8 @@ const struct cpumask *pci_irq_get_affini
if (!desc)
return cpu_possible_mask;
 
-   if (WARN_ON_ONCE(!desc->affinity))
+   /* MSI[X] interrupts can be allocated without affinity descriptor */
+   if (!desc->affinity)
return NULL;
 
/*


Re: [patch V3 28/35] PCI/MSI: Simplify pci_irq_get_affinity()

2022-01-31 Thread Thomas Gleixner
On Sun, Jan 30 2022 at 09:12, Guenter Roeck wrote:
> On Fri, Dec 10, 2021 at 11:19:26PM +0100, Thomas Gleixner wrote:
> This patch results in the following runtime warning when booting x86
> (32 bit) nosmp images from NVME in qemu.
>
> [   14.825482] nvme nvme0: 1/0/0 default/read/poll queues
> ILLOPC: ca7c6d10: 0f 0b
> [   14.826188] [ cut here ]
> [   14.826307] WARNING: CPU: 0 PID: 7 at drivers/pci/msi/msi.c:1114 pci_irq_get_affinity+0x80/0x90

This complains about msi_desc->affinity being NULL.

> git bisect bad f48235900182d64537c6e8f8dc0932b57a1a0638
> # first bad commit: [f48235900182d64537c6e8f8dc0932b57a1a0638] PCI/MSI: Simplify pci_irq_get_affinity()

Hrm. Can you please provide dmesg and /proc/interrupts from a
kernel before that commit?

Thanks,

tglx



Re: [PATCH v2 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-01-28 Thread Thomas Gleixner
On Thu, Jan 27 2022 at 18:42, Fenghua Yu wrote:
> On Wed, Jan 26, 2022 at 10:38:04PM +0100, Thomas Gleixner wrote:
>> Against Linus tree please so that the bugfix applies.
>> 
>> > I will fold the following patch into patch #5. The patch #11 (the doc patch)
>> > also needs to remove one paragraph talking about refcount.
>> >
>> > So I will send the whole patch set with the following changes:
>> > 1. One new bug fix patch (the above patch)
>
> When I studied your aux_domain bug fix patch above, I found more aux_domain bugs.
> But then I find aux_domain will be completely removed because all aux_domain
> related callbacks are not called and are dead code (no wonder there are
> so many bugs in aux_domain). Please see this series: 
> https://lore.kernel.org/linux-iommu/20220124071103.2097118-4-baolu...@linux.intel.com/
> For the series, Baolu confirms that he is "pretty sure that should be part
> of v5.18". And I don't find the series calls any IOASID function after
> removing the aux_domain code.
>
> So that means we don't need to fix those issues in the dead aux_domain
> code any more because it will be completely removed in 5.18, right?
>
> If you agree, I will not include the aux_domain fix patch or any other
> aux_domain fix patches in the up-coming v3. Is that OK?

Fair enough.



Re: [PATCH v2 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-01-26 Thread Thomas Gleixner
On Wed, Jan 26 2022 at 09:36, Fenghua Yu wrote:
> On Wed, Jan 26, 2022 at 03:23:42PM +0100, Thomas Gleixner wrote:
>> On Tue, Jan 25 2022 at 07:18, Fenghua Yu wrote:
>> While looking at ioasid_put() usage I tripped over the following UAF
>> issue:
>> 
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -4817,8 +4817,10 @@ static int aux_domain_add_dev(struct dma
>>  auxiliary_unlink_device(domain, dev);
>>  link_failed:
>>  spin_unlock_irqrestore(&device_domain_lock, flags);
>> -if (list_empty(&domain->subdevices) && domain->default_pasid > 0)
>> +if (list_empty(&domain->subdevices) && domain->default_pasid > 0) {
>>  ioasid_put(domain->default_pasid);
>> +domain->default_pasid = INVALID_IOASID;
>> +}
>>  
>>  return ret;
>>  }
>> @@ -4847,8 +4849,10 @@ static void aux_domain_remove_dev(struct
>>  
>>  spin_unlock_irqrestore(&device_domain_lock, flags);
>>  
>> -if (list_empty(&domain->subdevices) && domain->default_pasid > 0)
>> +if (list_empty(&domain->subdevices) && domain->default_pasid > 0) {
>>  ioasid_put(domain->default_pasid);
>> +domain->default_pasid = INVALID_IOASID;
>> +}
>>  }
>>  
>>  static int prepare_domain_attach_device(struct iommu_domain *domain,
>
> The above patch fixes an existing issue. I will put it in a separate patch,
> right?

Correct.

> It cannot be applied cleanly to the upstream tree. Do you want me to base
> the above patch (and the whole patch set) to the upstream tree or a specific
> tip branch?

Against Linus tree please so that the bugfix applies.

> I will fold the following patch into patch #5. The patch #11 (the doc patch)
> also needs to remove one paragraph talking about refcount.
>
> So I will send the whole patch set with the following changes:
> 1. One new bug fix patch (the above patch)
> 2. Updated patch #5 (with the following patch folded)
> 3. Updated patch #11 (removing refcount description)

Looks good.

Thanks,

tglx


Re: [PATCH v2 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-01-26 Thread Thomas Gleixner
On Tue, Jan 25 2022 at 07:18, Fenghua Yu wrote:
> On Mon, Jan 24, 2022 at 09:55:56PM +0100, Thomas Gleixner wrote:
>  /**
>   * ioasid_put - Release a reference to an ioasid
>   * @ioasid: the ID to remove

which in turn makes ioasid_put() a misnomer and the whole refcounting of
the ioasid a pointless exercise.

While looking at ioasid_put() usage I tripped over the following UAF
issue:

--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4817,8 +4817,10 @@ static int aux_domain_add_dev(struct dma
auxiliary_unlink_device(domain, dev);
 link_failed:
spin_unlock_irqrestore(&device_domain_lock, flags);
-   if (list_empty(&domain->subdevices) && domain->default_pasid > 0)
+   if (list_empty(&domain->subdevices) && domain->default_pasid > 0) {
ioasid_put(domain->default_pasid);
+   domain->default_pasid = INVALID_IOASID;
+   }
 
return ret;
 }
@@ -4847,8 +4849,10 @@ static void aux_domain_remove_dev(struct
 
spin_unlock_irqrestore(&device_domain_lock, flags);
 
-   if (list_empty(&domain->subdevices) && domain->default_pasid > 0)
+   if (list_empty(&domain->subdevices) && domain->default_pasid > 0) {
ioasid_put(domain->default_pasid);
+   domain->default_pasid = INVALID_IOASID;
+   }
 }
 
 static int prepare_domain_attach_device(struct iommu_domain *domain,

Vs. ioasid_put() I think we should fold the following:

--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4818,7 +4818,7 @@ static int aux_domain_add_dev(struct dma
 link_failed:
spin_unlock_irqrestore(&device_domain_lock, flags);
if (list_empty(&domain->subdevices) && domain->default_pasid > 0) {
-   ioasid_put(domain->default_pasid);
+   ioasid_free(domain->default_pasid);
domain->default_pasid = INVALID_IOASID;
}
 
@@ -4850,7 +4850,7 @@ static void aux_domain_remove_dev(struct
spin_unlock_irqrestore(&device_domain_lock, flags);
 
if (list_empty(&domain->subdevices) && domain->default_pasid > 0) {
-   ioasid_put(domain->default_pasid);
+   ioasid_free(domain->default_pasid);
domain->default_pasid = INVALID_IOASID;
}
 }
--- a/drivers/iommu/ioasid.c
+++ b/drivers/iommu/ioasid.c
@@ -2,7 +2,7 @@
 /*
  * I/O Address Space ID allocator. There is one global IOASID space, split into
  * subsets. Users create a subset with DECLARE_IOASID_SET, then allocate and
- * free IOASIDs with ioasid_alloc and ioasid_put.
+ * free IOASIDs with ioasid_alloc() and ioasid_free().
  */
 #include 
 #include 
@@ -15,7 +15,6 @@ struct ioasid_data {
struct ioasid_set *set;
void *private;
struct rcu_head rcu;
-   refcount_t refs;
 };
 
 /*
@@ -315,7 +314,6 @@ ioasid_t ioasid_alloc(struct ioasid_set
 
data->set = set;
data->private = private;
-   refcount_set(&data->refs, 1);
 
/*
 * Custom allocator needs allocator data to perform platform specific
@@ -348,17 +346,11 @@ ioasid_t ioasid_alloc(struct ioasid_set
 EXPORT_SYMBOL_GPL(ioasid_alloc);
 
 /**
- * ioasid_put - Release a reference to an ioasid
+ * ioasid_free - Free an ioasid
  * @ioasid: the ID to remove
- *
- * Put a reference to the IOASID, free it when the number of references drops to
- * zero.
- *
- * Return: %true if the IOASID was freed, %false otherwise.
  */
-bool ioasid_put(ioasid_t ioasid)
+void ioasid_free(ioasid_t ioasid)
 {
-   bool free = false;
struct ioasid_data *ioasid_data;
 
spin_lock(&ioasid_allocator_lock);
@@ -368,10 +360,6 @@ bool ioasid_put(ioasid_t ioasid)
goto exit_unlock;
}
 
-   free = refcount_dec_and_test(&ioasid_data->refs);
-   if (!free)
-   goto exit_unlock;
-
active_allocator->ops->free(ioasid, active_allocator->ops->pdata);
/* Custom allocator needs additional steps to free the xa element */
if (active_allocator->flags & IOASID_ALLOCATOR_CUSTOM) {
@@ -381,9 +369,8 @@ bool ioasid_put(ioasid_t ioasid)
 
 exit_unlock:
spin_unlock(&ioasid_allocator_lock);
-   return free;
 }
-EXPORT_SYMBOL_GPL(ioasid_put);
+EXPORT_SYMBOL_GPL(ioasid_free);
 
 /**
  * ioasid_find - Find IOASID data
--- a/include/linux/ioasid.h
+++ b/include/linux/ioasid.h
@@ -34,7 +34,7 @@ struct ioasid_allocator_ops {
 #if IS_ENABLED(CONFIG_IOASID)
 ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
  void *private);
-bool ioasid_put(ioasid_t ioasid);
+void ioasid_free(ioasid_t ioasid);
 void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
  bool (*getter)(void *));
 int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
@@ -52,10 +52,7 @@ static inline ioasid_t ioasid_alloc(stru
return INVALID_IOASID;
 }
 
-static inline bool i

Re: [PATCH v2 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-01-24 Thread Thomas Gleixner
On Mon, Jan 24 2022 at 12:52, Fenghua Yu wrote:
> On Mon, Jan 24, 2022 at 09:36:00PM +0100, Thomas Gleixner wrote:
>> On Mon, Jan 24 2022 at 21:21, Thomas Gleixner wrote:
> Ah. This patch should remove ioasid_get(). So I will change this patch
> as follows:
>
> 1. Remove ioasid_get() because it's not used any more when IOASID's
>    refcount is set to 1 once the IOASID is allocated and is freed on mm exit.
> 2. Change mm_pasid_get() to mm_pasid_set().

Yes. Just resend this one. No need to post the full queue again.

Thanks,

tglx


Re: [PATCH v2 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-01-24 Thread Thomas Gleixner
On Mon, Jan 24 2022 at 21:21, Thomas Gleixner wrote:
>
> Hrm. This is odd.
>
>> +/* Associate a PASID with an mm_struct: */
>> +static inline void mm_pasid_get(struct mm_struct *mm, u32 pasid)
>> +{
>> +mm->pasid = pasid;
>> +}
>
> This does not get anything. It sets the allocated PASID in the mm. The
> refcount on the PASID was already taken by the allocation. So this
> should be mm_pasid_set() or mm_pasid_install(), right?

And as a result of all this ioasid_get() is now left without users...

Thanks,

tglx


Re: [PATCH v2 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-01-24 Thread Thomas Gleixner
On Fri, Dec 17 2021 at 22:01, Fenghua Yu wrote:
> diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
> index bd41405d34e9..ee2294e02716 100644
> --- a/drivers/iommu/iommu-sva-lib.c
> +++ b/drivers/iommu/iommu-sva-lib.c
> @@ -18,8 +18,7 @@ static DECLARE_IOASID_SET(iommu_sva_pasid);
>   *
>   * Try to allocate a PASID for this mm, or take a reference to the existing one
>   * provided it fits within the [@min, @max] range. On success the PASID is
> - * available in mm->pasid, and must be released with iommu_sva_free_pasid().
> - * @min must be greater than 0, because 0 indicates an unused mm->pasid.
> + * available in mm->pasid and will be available for the lifetime of the mm.
>   *
>   * Returns 0 on success and < 0 on error.
>   */
> @@ -33,38 +32,24 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
>   return -EINVAL;
>  
>   mutex_lock(&iommu_sva_lock);
> - if (mm->pasid) {
> - if (mm->pasid >= min && mm->pasid <= max)
> - ioasid_get(mm->pasid);
> - else
> + /* Is a PASID already associated with this mm? */
> + if (pasid_valid(mm->pasid)) {
> + if (mm->pasid < min || mm->pasid >= max)
>   ret = -EOVERFLOW;
> - } else {
> -		pasid = ioasid_alloc(&iommu_sva_pasid, min, max, mm);
> - if (pasid == INVALID_IOASID)
> - ret = -ENOMEM;
> - else
> - mm->pasid = pasid;
> + goto out;
>   }
> +
> +	pasid = ioasid_alloc(&iommu_sva_pasid, min, max, mm);
> + if (!pasid_valid(pasid))
> + ret = -ENOMEM;
> + else
> + mm_pasid_get(mm, pasid);

Hrm. This is odd.

> +/* Associate a PASID with an mm_struct: */
> +static inline void mm_pasid_get(struct mm_struct *mm, u32 pasid)
> +{
> + mm->pasid = pasid;
> +}

This does not get anything. It sets the allocated PASID in the mm. The
refcount on the PASID was already taken by the allocation. So this
should be mm_pasid_set() or mm_pasid_install(), right?
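
For illustration, a minimal sketch of the suggested rename; the body is
taken verbatim from the quoted patch, only the name reflects what the
helper actually does:

/* Install the freshly allocated PASID; ioasid_alloc() already took the reference. */
static inline void mm_pasid_set(struct mm_struct *mm, u32 pasid)
{
	mm->pasid = pasid;
}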

Thanks,

tglx


Re: [patch V3 28/35] PCI/MSI: Simplify pci_irq_get_affinity()

2021-12-20 Thread Thomas Gleixner
On Sat, Dec 18 2021 at 21:25, Cédric Le Goater wrote:

> On 12/18/21 11:25, Thomas Gleixner wrote:
>> On Fri, Dec 17 2021 at 15:30, Nathan Chancellor wrote:
>>> On Fri, Dec 10, 2021 at 11:19:26PM +0100, Thomas Gleixner wrote:
>>> I just bisected a boot failure on my AMD test desktop to this patch as
>>> commit f48235900182 ("PCI/MSI: Simplify pci_irq_get_affinity()") in
>>> -next. It looks like there is a problem with the NVMe drive after this
>>> change according to the logs. Given that the hard drive is not getting
>>> mounted for journald to write logs to, I am not really sure how to get
>>> them from the machine so I have at least taken a picture of what I see
>>> on my screen; open to ideas on that front!
>> 
>> Bah. Fix below.
>
> That's a fix for the issue I was seeing on pseries with NVMe.
>
> Tested-by: Cédric Le Goater 

I had a faint memory that I've seen that issue before, but couldn't find
the mail in those massive threads.

Thanks for confirming!

   tglx

Re: [patch V3 28/35] PCI/MSI: Simplify pci_irq_get_affinity()

2021-12-18 Thread Thomas Gleixner
On Fri, Dec 17 2021 at 15:30, Nathan Chancellor wrote:
> On Fri, Dec 10, 2021 at 11:19:26PM +0100, Thomas Gleixner wrote:
> I just bisected a boot failure on my AMD test desktop to this patch as
> commit f48235900182 ("PCI/MSI: Simplify pci_irq_get_affinity()") in
> -next. It looks like there is a problem with the NVMe drive after this
> change according to the logs. Given that the hard drive is not getting
> mounted for journald to write logs to, I am not really sure how to get
> them from the machine so I have at least taken a picture of what I see
> on my screen; open to ideas on that front!

Bah. Fix below.

Thanks,

tglx
---
diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index 71802410e2ab..9b4910befeda 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -1100,7 +1100,7 @@ EXPORT_SYMBOL(pci_irq_vector);
  */
 const struct cpumask *pci_irq_get_affinity(struct pci_dev *dev, int nr)
 {
-   int irq = pci_irq_vector(dev, nr);
+   int idx, irq = pci_irq_vector(dev, nr);
struct msi_desc *desc;
 
if (WARN_ON_ONCE(irq <= 0))
@@ -1113,7 +1113,10 @@ const struct cpumask *pci_irq_get_affinity(struct 
pci_dev *dev, int nr)
 
if (WARN_ON_ONCE(!desc->affinity))
return NULL;
-	return &desc->affinity[nr].mask;
+
+   /* MSI has a mask array in the descriptor. */
+   idx = dev->msi_enabled ? nr : 0;
+	return &desc->affinity[idx].mask;
 }
 EXPORT_SYMBOL(pci_irq_get_affinity);
 



Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2

2021-12-16 Thread Thomas Gleixner
Nishanth,

On Wed, Dec 15 2021 at 19:45, Nishanth Menon wrote:
> On 17:35-20211215, Thomas Gleixner wrote:
> Thanks once again for your help. Hope we can roll in the fixes for
> part3.

Sure, it's only the one-liner for ti sci. Got it folded already.

Thanks for your help and testing!

   tglx


[patch V4 09-02/35] PCI/MSI: Allocate MSI device data on first use

2021-12-15 Thread Thomas Gleixner
Allocate MSI device data on first use, i.e. when a PCI driver invokes one
of the PCI/MSI enablement functions.

Add a wrapper function to ensure that the ordering vs. pcim_msi_release()
is correct.

Signed-off-by: Thomas Gleixner 
---
V4: Adopted to ensure devres ordering
---
 drivers/pci/msi/msi.c |   17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -366,6 +366,19 @@ static int pcim_setup_msi_release(struct
return ret;
 }
 
+/*
+ * Ordering vs. devres: msi device data has to be installed first so that
+ * pcim_msi_release() is invoked before it on device release.
+ */
+static int pci_setup_msi_context(struct pci_dev *dev)
+{
+	int ret = msi_setup_device_data(&dev->dev);
+
+   if (!ret)
+   ret = pcim_setup_msi_release(dev);
+   return ret;
+}
+
 static struct msi_desc *
 msi_setup_entry(struct pci_dev *dev, int nvec, struct irq_affinity *affd)
 {
@@ -909,7 +922,7 @@ static int __pci_enable_msi_range(struct
if (nvec > maxvec)
nvec = maxvec;
 
-   rc = pcim_setup_msi_release(dev);
+   rc = pci_setup_msi_context(dev);
if (rc)
return rc;
 
@@ -956,7 +969,7 @@ static int __pci_enable_msix_range(struc
if (WARN_ON_ONCE(dev->msix_enabled))
return -EINVAL;
 
-   rc = pcim_setup_msi_release(dev);
+   rc = pci_setup_msi_context(dev);
if (rc)
return rc;
 


[patch V4 09-01/35] PCI/MSI: Decouple MSI[-X] disable from pcim_release()

2021-12-15 Thread Thomas Gleixner
The MSI core will introduce runtime allocation of MSI related data. This
data will be devres managed and has to be set up before enabling
PCI/MSI[-X]. This would introduce an ordering issue vs. pcim_release().

The setup order is:

   pcim_enable_device()
devres_alloc(pcim_release...);
...
pci_irq_alloc()
  msi_setup_device_data()
 devres_alloc(msi_device_data_release, ...)

and once the device is released these release functions are invoked in the
opposite order:

msi_device_data_release()
...
pcim_release()
   pci_disable_msi[x]()

which is obviously wrong, because pci_disable_msi[x]() requires the MSI
data to be available to tear down the MSI[-X] interrupts.

Remove the MSI[-X] teardown from pcim_release() and add an explicit action
to be installed on the attempt of enabling PCI/MSI[-X].

This allows the MSI core data allocation to be ordered correctly in a
subsequent step.

Reported-by: Nishanth Menon 
Signed-off-by: Thomas Gleixner 
---
V4: New patch
---
 drivers/pci/msi/msi.c |   33 +
 drivers/pci/pci.c |5 -
 include/linux/pci.h   |3 ++-
 3 files changed, 35 insertions(+), 6 deletions(-)

--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -341,6 +341,31 @@ void pci_restore_msi_state(struct pci_de
 }
 EXPORT_SYMBOL_GPL(pci_restore_msi_state);
 
+static void pcim_msi_release(void *pcidev)
+{
+   struct pci_dev *dev = pcidev;
+
+   dev->is_msi_managed = false;
+   pci_free_irq_vectors(dev);
+}
+
+/*
+ * Needs to be separate from pcim_release to prevent an ordering problem
+ * vs. msi_device_data_release() in the MSI core code.
+ */
+static int pcim_setup_msi_release(struct pci_dev *dev)
+{
+   int ret;
+
+   if (!pci_is_managed(dev) || dev->is_msi_managed)
+   return 0;
+
+	ret = devm_add_action(&dev->dev, pcim_msi_release, dev);
+   if (!ret)
+   dev->is_msi_managed = true;
+   return ret;
+}
+
 static struct msi_desc *
 msi_setup_entry(struct pci_dev *dev, int nvec, struct irq_affinity *affd)
 {
@@ -884,6 +909,10 @@ static int __pci_enable_msi_range(struct
if (nvec > maxvec)
nvec = maxvec;
 
+   rc = pcim_setup_msi_release(dev);
+   if (rc)
+   return rc;
+
for (;;) {
if (affd) {
nvec = irq_calc_affinity_vectors(minvec, nvec, affd);
@@ -927,6 +956,10 @@ static int __pci_enable_msix_range(struc
if (WARN_ON_ONCE(dev->msix_enabled))
return -EINVAL;
 
+   rc = pcim_setup_msi_release(dev);
+   if (rc)
+   return rc;
+
for (;;) {
if (affd) {
nvec = irq_calc_affinity_vectors(minvec, nvec, affd);
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2024,11 +2024,6 @@ static void pcim_release(struct device *
struct pci_devres *this = res;
int i;
 
-   if (dev->msi_enabled)
-   pci_disable_msi(dev);
-   if (dev->msix_enabled)
-   pci_disable_msix(dev);
-
for (i = 0; i < DEVICE_COUNT_RESOURCE; i++)
if (this->region_mask & (1 << i))
pci_release_region(dev, i);
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -425,7 +425,8 @@ struct pci_dev {
unsigned intats_enabled:1;  /* Address Translation Svc */
unsigned intpasid_enabled:1;/* Process Address Space ID */
unsigned intpri_enabled:1;  /* Page Request Interface */
-   unsigned intis_managed:1;
+   unsigned intis_managed:1;   /* Managed via devres */
+   unsigned intis_msi_managed:1;   /* MSI release via devres 
installed */
unsigned intneeds_freset:1; /* Requires fundamental reset */
unsigned intstate_saved:1;
unsigned intis_physfn:1;


Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2

2021-12-15 Thread Thomas Gleixner
On Wed, Dec 15 2021 at 17:18, Thomas Gleixner wrote:

> On Tue, Dec 14 2021 at 22:19, Thomas Gleixner wrote:
>> On Tue, Dec 14 2021 at 14:56, Nishanth Menon wrote:
>>
>> thanks for trying. I'll have a look again with brain awake tomorrow
>> morning.
>
> Morning was busy with other things, but I found what my sleepy brain
> managed to do wrong yesterday evening.
>
> Let me reintegrate the pile and I'll send you an update.

   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git msi-v4.1-part-2
   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git msi-v4.2-part-3

That should cure the problem.



Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2

2021-12-15 Thread Thomas Gleixner
On Tue, Dec 14 2021 at 22:19, Thomas Gleixner wrote:
> On Tue, Dec 14 2021 at 14:56, Nishanth Menon wrote:
>
> thanks for trying. I'll have a look again with brain awake tomorrow
> morning.

Morning was busy with other things, but I found what my sleepy brain
managed to do wrong yesterday evening.

Let me reintegrate the pile and I'll send you an update.

Thanks,

tglx


Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2

2021-12-14 Thread Thomas Gleixner
Nishanth,

On Tue, Dec 14 2021 at 14:56, Nishanth Menon wrote:
> On 21:15-20211214, Thomas Gleixner wrote:
>> I think I managed to disentangle this. Can you please give:
>> 
>>git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git msi-v4-part-2
>
>
> Umm.. I am not entirely sure what is going on.. but all kinds of weird
> corruption seems to occur with msi-v4-part-2 that doesn't seem to be
> present in v5.16-rc5. (I use NFS since ethernet in K3 platforms use
> inta/intr and dma that is impacted by this series).
>
> I will try and rebase your patches on v5.16-rc4 to be sure as well and
> report back later today once i get some time.
>
> [1] https://gist.github.com/nmenon/a66e022926c4c15313c45d44313d860c 
> msi-v4-part-2
> [2] https://gist.github.com/nmenon/43085664d69ad846d596e76a06ed0656  v5.16-rc5

thanks for trying. I'll have a look again with brain awake tomorrow
morning.

Thanks,

tglx


Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2

2021-12-14 Thread Thomas Gleixner
Nishanth,

On Tue, Dec 14 2021 at 18:03, Thomas Gleixner wrote:
> msi_device_data_release()
> ...
> pcim_release()
>pci_disable_msi[x]()
>
> Groan

I think I managed to disentangle this. Can you please give:

   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git msi-v4-part-2

and/or the full pile:

   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git msi-v4-part-3

a test ride?

Thanks,

tglx


Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2

2021-12-14 Thread Thomas Gleixner
On Tue, Dec 14 2021 at 17:36, Thomas Gleixner wrote:
> On Tue, Dec 14 2021 at 10:22, Nishanth Menon wrote:
>> On 10:41-20211214, Thomas Gleixner wrote:
> [   13.478122] Call trace:
> [   13.509042]  msi_device_destroy_sysfs+0x18/0x88
> [   13.509058]  msi_domain_free_irqs+0x34/0x58
> [   13.509064]  pci_msi_teardown_msi_irqs+0x30/0x3c
> [   13.509072]  free_msi_irqs+0x78/0xd4
> [   13.509077]  pci_disable_msix+0x138/0x164
> [   13.529930]  pcim_release+0x70/0x238
> [   13.529942]  devres_release_all+0x9c/0xfc
> [   13.529951]  device_release_driver_internal+0x1a0/0x244
> [   13.542725]  device_release_driver+0x18/0x24
> [   13.542741]  iwl_req_fw_callback+0x1a28/0x1ddc [iwlwifi]
> [   13.552308]  request_firmware_work_func+0x50/0x9c
> [   13.552320]  process_one_work+0x194/0x25c
>
> That's not a driver problem, that's an ordering issue vs. the devres
> muck. Let me go back to the drawing board. Sigh...

Which is pretty obvious why:

   pcim_enable_device()
devres_alloc(pcim_release...);
...
pci_irq_alloc()
  msi_setup_device_data()
 devres_alloc(msi_device_data_release, ...)

and once the device is released:

msi_device_data_release()
...
pcim_release()
   pci_disable_msi[x]()

Groan
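
For reference, a minimal sketch of why this bites (all demo_* names are
invented for illustration): devres callbacks run in reverse order of
registration, so whatever is registered last is torn down first.

#include <linux/device.h>

static void demo_registered_first(void *data)
{
	struct device *dev = data;

	/* Registered first -> runs LAST on device release */
	dev_info(dev, "torn down last\n");
}

static void demo_registered_second(void *data)
{
	struct device *dev = data;

	/* Registered second -> runs FIRST on device release */
	dev_info(dev, "torn down first\n");
}

static int demo_setup(struct device *dev)
{
	int ret = devm_add_action(dev, demo_registered_first, dev);

	if (ret)
		return ret;
	return devm_add_action(dev, demo_registered_second, dev);
}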


Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2

2021-12-14 Thread Thomas Gleixner
On Tue, Dec 14 2021 at 10:22, Nishanth Menon wrote:
> On 10:41-20211214, Thomas Gleixner wrote:
> Agreed that the warning is fine, the null pointer exception that follows
> [1] [2] it however doesn't look right and it can be trivially fixed with the
> following fixup for ee90787487bc ("genirq/msi: Provide
> msi_device_populate/destroy_sysfs()") below, with that the log looks
> like [3] - the warn is good, the null pointer exception and resultant
> crash could be avoided (not saying this is the best solution):

Aaargh.

[   13.478122] Call trace:
[   13.509042]  msi_device_destroy_sysfs+0x18/0x88
[   13.509058]  msi_domain_free_irqs+0x34/0x58
[   13.509064]  pci_msi_teardown_msi_irqs+0x30/0x3c
[   13.509072]  free_msi_irqs+0x78/0xd4
[   13.509077]  pci_disable_msix+0x138/0x164
[   13.529930]  pcim_release+0x70/0x238
[   13.529942]  devres_release_all+0x9c/0xfc
[   13.529951]  device_release_driver_internal+0x1a0/0x244
[   13.542725]  device_release_driver+0x18/0x24
[   13.542741]  iwl_req_fw_callback+0x1a28/0x1ddc [iwlwifi]
[   13.552308]  request_firmware_work_func+0x50/0x9c
[   13.552320]  process_one_work+0x194/0x25c

That's not a driver problem, that's an ordering issue vs. the devres
muck. Let me go back to the drawing board. Sigh...

Thanks,

tglx



Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2

2021-12-14 Thread Thomas Gleixner
On Mon, Dec 13 2021 at 12:29, Nishanth Menon wrote:
> On 23:18-20211210, Thomas Gleixner wrote:
> Also while testing on TI K3 platforms, I noticed:
>
> msi_device_data_release/msi_device_destroy_sysfs in am64xx-evm / j7200

The warning complains about a device being released with MSI descriptors
still attached to the device. This was added by:

  5b012cede0f7 ("device: Add device::msi_data pointer and struct 
msi_device_data")

That's not a regression caused by this commit. The warning is just
exposing an already existing problem in the iwlwifi driver, which seems
to do:

   probe()
 setup_pci_msi[x]_interrupts()
 start_drv()
   if (try_to_load_firmware() == FAIL)
   device_release_driver()
...
msi_device_data_release()
WARN()
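
A hypothetical driver flow mirroring the above (all demo_* names invented):
the MSI vectors allocated in probe are still around when the firmware
callback unbinds the driver, so the devres managed msi_device_data_release()
finds leftover descriptors and warns.

#include <linux/device.h>
#include <linux/firmware.h>
#include <linux/module.h>
#include <linux/pci.h>

static void demo_fw_callback(const struct firmware *fw, void *context)
{
	struct pci_dev *pdev = context;

	if (!fw) {
		/* Firmware load failed: unbind the driver from the callback */
		device_release_driver(&pdev->dev);
		return;
	}
	release_firmware(fw);
}

static int demo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	int ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSIX | PCI_IRQ_MSI);

	if (ret < 0)
		return ret;

	/* Asynchronous firmware load; the error path runs in the callback */
	return request_firmware_nowait(THIS_MODULE, true, "demo.fw", &pdev->dev,
				       GFP_KERNEL, pdev, demo_fw_callback);
}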

Thanks,

tglx


RE: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()

2021-12-12 Thread Thomas Gleixner
Kevin,

On Sun, Dec 12 2021 at 01:56, Kevin Tian wrote:
>> From: Thomas Gleixner 
>> All I can find is drivers/iommu/virtio-iommu.c but I can't find anything
>> vIR related there.
>
> Well, virtio-iommu is a para-virtualized vIOMMU implementation.
>
> In reality there are also fully emulated vIOMMU implementations (e.g.
> Qemu fully emulates Intel/AMD/ARM IOMMUs). In those configurations
> the IR logic in existing iommu drivers just apply:
>
>   drivers/iommu/intel/irq_remapping.c
>   drivers/iommu/amd/iommu.c

thanks for the explanation. So that's a full IOMMU emulation. I was more
expecting a paravirtualized lightweight one.

Thanks,

tglx


RE: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()

2021-12-12 Thread Thomas Gleixner
Kevin,

On Sun, Dec 12 2021 at 02:14, Kevin Tian wrote:
>> From: Thomas Gleixner 
> I just continue the thought practice along that direction to see what
> the host flow will be like (step by step). Looking at the current 
> implementation is just one necessary step in my thought practice to 
> help refine the picture. When I found something which may be 
> worth being aligned then I shared to avoid follow a wrong direction 
> too far.
>
> If both of your think it simply adds noise to this discussion, I can
> surely hold back and focus on 'concept' only.

All good. We _want_ your participation for sure. Comparing and
contrasting it to the existing flow is fine.

Thanks,

tglx


RE: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()

2021-12-11 Thread Thomas Gleixner
Kevin,

On Sat, Dec 11 2021 at 07:52, Kevin Tian wrote:
>> From: Jason Gunthorpe 
>> > Then Qemu needs to find out the GSI number for the vIRTE handle.
>> > Again Qemu doesn't have such information since it doesn't know
>> > which MSI[-X] entry points to this handle due to no trap.
>> 
>> No this is already going wrong. qemu *cannot* know the MSI information
>> because there is no MSI information for IMS.
>
> I haven't thought of IMS at this step. The IR approach applies to
> all types of interrupt storages, thus I'm more interested in how it
> affects the storages which are already virtualized today (MSI[-X] 
> in my thought practice).

They are not any different. As I explained several times now IMS is
nothing new at all. It existed since the invention of Message Signaled
interrupts. Why?

The principle behind Message Signaled Interrupts is:

Device writes DATA to ADDRESS which raises an interrupt in a CPU

Message Signaled Interrupts obviously need some place to store the
ADDRESS/DATA pair so that the device can use it for raising an
interrupt, i.e. an

   Interrupt Message Store, short IMS.

PCI/MSI was the first implementation of this and the storage was defined
to be at a specified and therefore uniform and device independent place.

PCI/MSI-X followed the same approach. While it solved quite a few of the
shortcomings of PCI/MSI, it still has a specified and uniform and device
independent place to store the message (ADDRESS/DATA pair).

Now the PCI wizards figured out that PCI/MSI[-X] is no longer up to the
task for various reasons and came up with the revolutionary new concept
of IMS, aka Interrupt Message Store, where the device defines where the
message is stored.

IOW, this is coming back full circle to the original problem of where to
store the message, i.e. the ADDRESS/DATA pair so that the device can
raise an interrupt in a CPU, which requires - drum roll - an

   Interrupt Message Store, short IMS.

So you simply have to look at it from a pure MSI (not PCI/MSI) point
of view:

   MSI at the conceptual level requires storage for the ADDRESS/DATA
   pair at some place so that the device or the compute unit embedded in
   the device can write DATA to ADDRESS.

That's it. Not more, not less.
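
A conceptual sketch (demo_ims_slot is invented; struct msi_msg is the real
kernel representation of that pair): wherever the storage lives - PCI/MSI
config space, an MSI-X table entry, or some device specific IMS location -
programming it is nothing more than writing the ADDRESS/DATA pair.

#include <linux/io.h>
#include <linux/msi.h>

/* Device defined storage for one message - its location is up to the device */
struct demo_ims_slot {
	void __iomem *addr_lo;
	void __iomem *addr_hi;
	void __iomem *data;
};

static void demo_ims_write_msg(struct demo_ims_slot *slot,
			       const struct msi_msg *msg)
{
	writel(msg->address_lo, slot->addr_lo);
	writel(msg->address_hi, slot->addr_hi);
	writel(msg->data, slot->data);
}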

When you look at it from this perspective, then you'll realize that

 PCI/MSI and PCI/MSI-X are just implementations of IMS

Not more, not less. The fact that they have very strict rules about the
storage space and the fact that they are mutually exclusive does not
change that at all.

That's where a lot of the confusion comes from. If you go back to all
the IDXD/IMS discussions which happened over time then you'll figure out
that _all_ of us were coming from the same wrong assumption:

IMS is new and it's just another exclusive variant of PCI/MSI and
PCI/MSI-X.

It took _all_ of us quite some time to realize that we need to look at
it from the other way around.

There was surely some other conceptual confusion vs. subdevices, queues
and whatever involved which contributed to that. Water under the bridge.

Coming back to your initial question:

> I haven't thought of IMS at this step. The IR approach applies to
> all types of interrupt storages, thus I'm more interested in how it
> affects the storages which are already virtualized today (MSI[-X] 
> in my thought practice).

Stop focussing on implementation details. Focus on the general concept
instead. See above.

Thanks,

tglx



RE: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()

2021-12-11 Thread Thomas Gleixner
Kevin,

On Sat, Dec 11 2021 at 07:44, Kevin Tian wrote:
>> From: Thomas Gleixner 
>> On Fri, Dec 10 2021 at 08:39, Jason Gunthorpe wrote:
>> > It is clever, we don't have an vIOMMU that supplies vIR today, so by
>> > definition all guests are excluded and only bare metal works.
>> 
>> Dammit. Now you spilled the beans. :)
>
> Unfortunately we do have that today. Qemu supports IR for
> both AMD and Intel vIOMMU.

can you point me to the code?

All I can find is drivers/iommu/virtio-iommu.c but I can't find anything
vIR related there.

Thanks,

tglx


[patch V3 34/35] soc: ti: ti_sci_inta_msi: Get rid of ti_sci_inta_msi_get_virq()

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

Just use the core function msi_get_virq().

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
Cc: Peter Ujfalusi 
Cc: Vinod Koul 
Cc: dmaeng...@vger.kernel.org
---
 drivers/dma/ti/k3-udma-private.c   |6 ++
 drivers/dma/ti/k3-udma.c   |   10 --
 drivers/soc/ti/k3-ringacc.c|2 +-
 drivers/soc/ti/ti_sci_inta_msi.c   |   12 
 include/linux/soc/ti/ti_sci_inta_msi.h |1 -
 5 files changed, 7 insertions(+), 24 deletions(-)

--- a/drivers/dma/ti/k3-udma-private.c
+++ b/drivers/dma/ti/k3-udma-private.c
@@ -168,8 +168,7 @@ int xudma_pktdma_tflow_get_irq(struct ud
 {
 	const struct udma_oes_offsets *oes = &ud->soc_data->oes;
 
-   return ti_sci_inta_msi_get_virq(ud->dev, udma_tflow_id +
-   oes->pktdma_tchan_flow);
+   return msi_get_virq(ud->dev, udma_tflow_id + oes->pktdma_tchan_flow);
 }
 EXPORT_SYMBOL(xudma_pktdma_tflow_get_irq);
 
@@ -177,7 +176,6 @@ int xudma_pktdma_rflow_get_irq(struct ud
 {
 	const struct udma_oes_offsets *oes = &ud->soc_data->oes;
 
-   return ti_sci_inta_msi_get_virq(ud->dev, udma_rflow_id +
-   oes->pktdma_rchan_flow);
+   return msi_get_virq(ud->dev, udma_rflow_id + oes->pktdma_rchan_flow);
 }
 EXPORT_SYMBOL(xudma_pktdma_rflow_get_irq);
--- a/drivers/dma/ti/k3-udma.c
+++ b/drivers/dma/ti/k3-udma.c
@@ -2313,8 +2313,7 @@ static int udma_alloc_chan_resources(str
 
/* Event from UDMA (TR events) only needed for slave TR mode channels */
if (is_slave_direction(uc->config.dir) && !uc->config.pkt_mode) {
-   uc->irq_num_udma = ti_sci_inta_msi_get_virq(ud->dev,
-   irq_udma_idx);
+   uc->irq_num_udma = msi_get_virq(ud->dev, irq_udma_idx);
if (uc->irq_num_udma <= 0) {
dev_err(ud->dev, "Failed to get udma irq (index: %u)\n",
irq_udma_idx);
@@ -2486,7 +2485,7 @@ static int bcdma_alloc_chan_resources(st
uc->psil_paired = true;
}
 
-   uc->irq_num_ring = ti_sci_inta_msi_get_virq(ud->dev, irq_ring_idx);
+   uc->irq_num_ring = msi_get_virq(ud->dev, irq_ring_idx);
if (uc->irq_num_ring <= 0) {
dev_err(ud->dev, "Failed to get ring irq (index: %u)\n",
irq_ring_idx);
@@ -2503,8 +2502,7 @@ static int bcdma_alloc_chan_resources(st
 
/* Event from BCDMA (TR events) only needed for slave channels */
if (is_slave_direction(uc->config.dir)) {
-   uc->irq_num_udma = ti_sci_inta_msi_get_virq(ud->dev,
-   irq_udma_idx);
+   uc->irq_num_udma = msi_get_virq(ud->dev, irq_udma_idx);
if (uc->irq_num_udma <= 0) {
dev_err(ud->dev, "Failed to get bcdma irq (index: 
%u)\n",
irq_udma_idx);
@@ -2672,7 +2670,7 @@ static int pktdma_alloc_chan_resources(s
 
uc->psil_paired = true;
 
-   uc->irq_num_ring = ti_sci_inta_msi_get_virq(ud->dev, irq_ring_idx);
+   uc->irq_num_ring = msi_get_virq(ud->dev, irq_ring_idx);
if (uc->irq_num_ring <= 0) {
dev_err(ud->dev, "Failed to get ring irq (index: %u)\n",
irq_ring_idx);
--- a/drivers/soc/ti/k3-ringacc.c
+++ b/drivers/soc/ti/k3-ringacc.c
@@ -647,7 +647,7 @@ int k3_ringacc_get_ring_irq_num(struct k
if (!ring)
return -EINVAL;
 
-   irq_num = ti_sci_inta_msi_get_virq(ring->parent->dev, ring->ring_id);
+   irq_num = msi_get_virq(ring->parent->dev, ring->ring_id);
if (irq_num <= 0)
irq_num = -EINVAL;
return irq_num;
--- a/drivers/soc/ti/ti_sci_inta_msi.c
+++ b/drivers/soc/ti/ti_sci_inta_msi.c
@@ -148,15 +148,3 @@ void ti_sci_inta_msi_domain_free_irqs(st
ti_sci_inta_msi_free_descs(dev);
 }
 EXPORT_SYMBOL_GPL(ti_sci_inta_msi_domain_free_irqs);
-
-unsigned int ti_sci_inta_msi_get_virq(struct device *dev, u32 dev_index)
-{
-   struct msi_desc *desc;
-
-   for_each_msi_entry(desc, dev)
-   if (desc->msi_index == dev_index)
-   return desc->irq;
-
-   return -ENODEV;
-}
-EXPORT_SYMBOL_GPL(ti_sci_inta_msi_get_virq);
--- a/include/linux/soc/ti/ti_sci_inta_msi.h
+++ b/include/linux/soc/ti/ti_sci_inta_msi.h
@@ -18,6 +18,5 @@ struct irq_domain
   struct irq_domain *parent);
 int ti_sci_inta_msi_domain_alloc_irqs(struct device *dev,
  struct ti_sci_resource *res);
-unsigned int ti_sci_inta_msi_get_virq(struct device *dev,

[patch V3 35/35] dmaengine: qcom_hidma: Cleanup MSI handling

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

There is no reason to walk the MSI descriptors to retrieve the interrupt
number for a device. Use msi_get_virq() instead.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
Acked-by: Sinan Kaya 
Cc: dmaeng...@vger.kernel.org
---
 drivers/dma/qcom/hidma.c |   42 ++
 1 file changed, 18 insertions(+), 24 deletions(-)

--- a/drivers/dma/qcom/hidma.c
+++ b/drivers/dma/qcom/hidma.c
@@ -678,11 +678,13 @@ static void hidma_free_msis(struct hidma
 {
 #ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
struct device *dev = dmadev->ddev.dev;
-   struct msi_desc *desc;
+   int i, virq;
 
-   /* free allocated MSI interrupts above */
-   for_each_msi_entry(desc, dev)
-		devm_free_irq(dev, desc->irq, &dmadev->lldev);
+   for (i = 0; i < HIDMA_MSI_INTS; i++) {
+   virq = msi_get_virq(dev, i);
+   if (virq)
+			devm_free_irq(dev, virq, &dmadev->lldev);
+   }
 
platform_msi_domain_free_irqs(dev);
 #endif
@@ -692,45 +694,37 @@ static int hidma_request_msi(struct hidm
 struct platform_device *pdev)
 {
 #ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
-   int rc;
-   struct msi_desc *desc;
-   struct msi_desc *failed_desc = NULL;
+   int rc, i, virq;
 
 	rc = platform_msi_domain_alloc_irqs(&pdev->dev, HIDMA_MSI_INTS,
hidma_write_msi_msg);
if (rc)
return rc;
 
-	for_each_msi_entry(desc, &pdev->dev) {
-   if (!desc->msi_index)
-   dmadev->msi_virqbase = desc->irq;
-
-		rc = devm_request_irq(&pdev->dev, desc->irq,
+   for (i = 0; i < HIDMA_MSI_INTS; i++) {
+		virq = msi_get_virq(&pdev->dev, i);
+		rc = devm_request_irq(&pdev->dev, virq,
   hidma_chirq_handler_msi,
   0, "qcom-hidma-msi",
 				      &dmadev->lldev);
-   if (rc) {
-   failed_desc = desc;
+   if (rc)
break;
-   }
+   if (!i)
+   dmadev->msi_virqbase = virq;
}
 
if (rc) {
/* free allocated MSI interrupts above */
-		for_each_msi_entry(desc, &pdev->dev) {
-   if (desc == failed_desc)
-   break;
-			devm_free_irq(&pdev->dev, desc->irq,
-				      &dmadev->lldev);
+   for (--i; i >= 0; i--) {
+			virq = msi_get_virq(&pdev->dev, i);
+			devm_free_irq(&pdev->dev, virq, &dmadev->lldev);
}
+		dev_warn(&pdev->dev,
+"failed to request MSI irq, falling back to wired 
IRQ\n");
} else {
/* Add callback to free MSIs on teardown */
hidma_ll_setup_irq(dmadev->lldev, true);
-
}
-   if (rc)
-	dev_warn(&pdev->dev,
-"failed to request MSI irq, falling back to wired 
IRQ\n");
return rc;
 #else
return -EINVAL;



[patch V3 33/35] bus: fsl-mc: fsl-mc-allocator: Rework MSI handling

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

Storing a pointer to the MSI descriptor just to track the Linux interrupt
number is daft. Just store the interrupt number and be done with it.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
Cc: Stuart Yoder 
Cc: Laurentiu Tudor 
---
 drivers/bus/fsl-mc/dprc-driver.c|8 
 drivers/bus/fsl-mc/fsl-mc-allocator.c   |9 ++---
 drivers/bus/fsl-mc/fsl-mc-msi.c |6 +++---
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c|4 ++--
 drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c|4 +---
 drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c |5 ++---
 drivers/soc/fsl/dpio/dpio-driver.c  |8 
 drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c  |4 ++--
 include/linux/fsl/mc.h  |4 ++--
 9 files changed, 22 insertions(+), 30 deletions(-)

--- a/drivers/bus/fsl-mc/dprc-driver.c
+++ b/drivers/bus/fsl-mc/dprc-driver.c
@@ -400,7 +400,7 @@ static irqreturn_t dprc_irq0_handler_thr
struct fsl_mc_device *mc_dev = to_fsl_mc_device(dev);
struct fsl_mc_bus *mc_bus = to_fsl_mc_bus(mc_dev);
struct fsl_mc_io *mc_io = mc_dev->mc_io;
-   struct msi_desc *msi_desc = mc_dev->irqs[0]->msi_desc;
+   int irq = mc_dev->irqs[0]->virq;
 
dev_dbg(dev, "DPRC IRQ %d triggered on CPU %u\n",
irq_num, smp_processor_id());
@@ -409,7 +409,7 @@ static irqreturn_t dprc_irq0_handler_thr
return IRQ_HANDLED;
 
 	mutex_lock(&mc_bus->scan_mutex);
-   if (!msi_desc || msi_desc->irq != (u32)irq_num)
+   if (irq != (u32)irq_num)
goto out;
 
status = 0;
@@ -521,7 +521,7 @@ static int register_dprc_irq_handler(str
 * function that programs the MSI physically in the device
 */
 	error = devm_request_threaded_irq(&mc_dev->dev,
- irq->msi_desc->irq,
+ irq->virq,
  dprc_irq0_handler,
  dprc_irq0_handler_thread,
  IRQF_NO_SUSPEND | IRQF_ONESHOT,
@@ -771,7 +771,7 @@ static void dprc_teardown_irq(struct fsl
 
(void)disable_dprc_irq(mc_dev);
 
-	devm_free_irq(&mc_dev->dev, irq->msi_desc->irq, &mc_dev->dev);
+	devm_free_irq(&mc_dev->dev, irq->virq, &mc_dev->dev);
 
fsl_mc_free_irqs(mc_dev);
 }
--- a/drivers/bus/fsl-mc/fsl-mc-allocator.c
+++ b/drivers/bus/fsl-mc/fsl-mc-allocator.c
@@ -350,7 +350,6 @@ int fsl_mc_populate_irq_pool(struct fsl_
 unsigned int irq_count)
 {
unsigned int i;
-   struct msi_desc *msi_desc;
struct fsl_mc_device_irq *irq_resources;
struct fsl_mc_device_irq *mc_dev_irq;
int error;
@@ -388,16 +387,12 @@ int fsl_mc_populate_irq_pool(struct fsl_
mc_dev_irq->resource.type = res_pool->type;
mc_dev_irq->resource.data = mc_dev_irq;
mc_dev_irq->resource.parent_pool = res_pool;
+		mc_dev_irq->virq = msi_get_virq(&mc_bus_dev->dev, i);
+   mc_dev_irq->resource.id = mc_dev_irq->virq;
 		INIT_LIST_HEAD(&mc_dev_irq->resource.node);
 		list_add_tail(&mc_dev_irq->resource.node, &res_pool->free_list);
}
 
-	for_each_msi_entry(msi_desc, &mc_bus_dev->dev) {
-		mc_dev_irq = &irq_resources[msi_desc->msi_index];
-   mc_dev_irq->msi_desc = msi_desc;
-   mc_dev_irq->resource.id = msi_desc->irq;
-   }
-
res_pool->max_count = irq_count;
res_pool->free_count = irq_count;
mc_bus->irq_resources = irq_resources;
--- a/drivers/bus/fsl-mc/fsl-mc-msi.c
+++ b/drivers/bus/fsl-mc/fsl-mc-msi.c
@@ -58,11 +58,11 @@ static void fsl_mc_msi_update_dom_ops(st
 }
 
 static void __fsl_mc_msi_write_msg(struct fsl_mc_device *mc_bus_dev,
-  struct fsl_mc_device_irq *mc_dev_irq)
+  struct fsl_mc_device_irq *mc_dev_irq,
+  struct msi_desc *msi_desc)
 {
int error;
struct fsl_mc_device *owner_mc_dev = mc_dev_irq->mc_dev;
-   struct msi_desc *msi_desc = mc_dev_irq->msi_desc;
struct dprc_irq_cfg irq_cfg;
 
/*
@@ -129,7 +129,7 @@ static void fsl_mc_msi_write_msg(struct
/*
 * Program the MSI (paddr, value) pair in the device:
 */
-   __fsl_mc_msi_write_msg(mc_bus_dev, mc_dev_irq);
+   __fsl_mc_msi_write_msg(mc_bus_dev, mc_dev_irq, msi_desc);
 }
 
 static void fsl_mc_msi_update_chip_ops(struct msi_domain_info *info)
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
@@ -4246,7 +4246,7 @@ s

[patch V3 32/35] mailbox: bcm-flexrm-mailbox: Rework MSI interrupt handling

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

No point in retrieving the MSI descriptors. Just query the Linux interrupt
number.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
Cc: Jassi Brar 

---
 drivers/mailbox/bcm-flexrm-mailbox.c |7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

--- a/drivers/mailbox/bcm-flexrm-mailbox.c
+++ b/drivers/mailbox/bcm-flexrm-mailbox.c
@@ -1497,7 +1497,6 @@ static int flexrm_mbox_probe(struct plat
int index, ret = 0;
void __iomem *regs;
void __iomem *regs_end;
-   struct msi_desc *desc;
struct resource *iomem;
struct flexrm_ring *ring;
struct flexrm_mbox *mbox;
@@ -1608,10 +1607,8 @@ static int flexrm_mbox_probe(struct plat
goto fail_destroy_cmpl_pool;
 
/* Save alloced IRQ numbers for each ring */
-   for_each_msi_entry(desc, dev) {
-		ring = &mbox->rings[desc->msi_index];
-   ring->irq = desc->irq;
-   }
+   for (index = 0; index < mbox->num_rings; index++)
+   mbox->rings[index].irq = msi_get_virq(dev, index);
 
/* Check availability of debugfs */
if (!debugfs_initialized())



[patch V3 31/35] iommu/arm-smmu-v3: Use msi_get_virq()

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

Let the core code fiddle with the MSI descriptor retrieval.

Signed-off-by: Thomas Gleixner 
Tested-by: Robin Murphy 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
Cc: Will Deacon 
Cc: Joerg Roedel 
Cc: linux-arm-ker...@lists.infradead.org
Cc: iommu@lists.linux-foundation.org

---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |   19 +++
 1 file changed, 3 insertions(+), 16 deletions(-)

--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3154,7 +3154,6 @@ static void arm_smmu_write_msi_msg(struc
 
 static void arm_smmu_setup_msis(struct arm_smmu_device *smmu)
 {
-   struct msi_desc *desc;
int ret, nvec = ARM_SMMU_MAX_MSIS;
struct device *dev = smmu->dev;
 
@@ -3182,21 +3181,9 @@ static void arm_smmu_setup_msis(struct a
return;
}
 
-   for_each_msi_entry(desc, dev) {
-   switch (desc->msi_index) {
-   case EVTQ_MSI_INDEX:
-   smmu->evtq.q.irq = desc->irq;
-   break;
-   case GERROR_MSI_INDEX:
-   smmu->gerr_irq = desc->irq;
-   break;
-   case PRIQ_MSI_INDEX:
-   smmu->priq.q.irq = desc->irq;
-   break;
-   default:/* Unknown */
-   continue;
-   }
-   }
+   smmu->evtq.q.irq = msi_get_virq(dev, EVTQ_MSI_INDEX);
+   smmu->gerr_irq = msi_get_virq(dev, GERROR_MSI_INDEX);
+   smmu->priq.q.irq = msi_get_virq(dev, PRIQ_MSI_INDEX);
 
/* Add callback to free MSIs on teardown */
devm_add_action(dev, arm_smmu_free_msis, dev);



[patch V3 30/35] perf/smmuv3: Use msi_get_virq()

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

Let the core code fiddle with the MSI descriptor retrieval.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
Cc: Mark Rutland 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
---
 drivers/perf/arm_smmuv3_pmu.c |5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -684,7 +684,6 @@ static void smmu_pmu_write_msi_msg(struc
 
 static void smmu_pmu_setup_msi(struct smmu_pmu *pmu)
 {
-   struct msi_desc *desc;
struct device *dev = pmu->dev;
int ret;
 
@@ -701,9 +700,7 @@ static void smmu_pmu_setup_msi(struct sm
return;
}
 
-   desc = first_msi_entry(dev);
-   if (desc)
-   pmu->irq = desc->irq;
+   pmu->irq = msi_get_virq(dev, 0);
 
/* Add callback to free MSIs on teardown */
devm_add_action(dev, smmu_pmu_free_msis, dev);



[patch V3 29/35] dmaengine: mv_xor_v2: Get rid of msi_desc abuse

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

Storing a pointer to the MSI descriptor just to keep track of the Linux
interrupt number is daft. Use msi_get_virq() instead.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
Cc: Vinod Koul 
Cc: dmaeng...@vger.kernel.org
---
 drivers/dma/mv_xor_v2.c |   16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

--- a/drivers/dma/mv_xor_v2.c
+++ b/drivers/dma/mv_xor_v2.c
@@ -149,7 +149,7 @@ struct mv_xor_v2_descriptor {
  * @desc_size: HW descriptor size
  * @npendings: number of pending descriptors (for which tx_submit has
  * @hw_queue_idx: HW queue index
- * @msi_desc: local interrupt descriptor information
+ * @irq: The Linux interrupt number
  * been called, but not yet issue_pending)
  */
 struct mv_xor_v2_device {
@@ -168,7 +168,7 @@ struct mv_xor_v2_device {
int desc_size;
unsigned int npendings;
unsigned int hw_queue_idx;
-   struct msi_desc *msi_desc;
+   unsigned int irq;
 };
 
 /**
@@ -718,7 +718,6 @@ static int mv_xor_v2_probe(struct platfo
int i, ret = 0;
struct dma_device *dma_dev;
struct mv_xor_v2_sw_desc *sw_desc;
-   struct msi_desc *msi_desc;
 
BUILD_BUG_ON(sizeof(struct mv_xor_v2_descriptor) !=
 MV_XOR_V2_EXT_DESC_SIZE);
@@ -770,14 +769,9 @@ static int mv_xor_v2_probe(struct platfo
if (ret)
goto disable_clk;
 
-	msi_desc = first_msi_entry(&pdev->dev);
-   if (!msi_desc) {
-   ret = -ENODEV;
-   goto free_msi_irqs;
-   }
-   xor_dev->msi_desc = msi_desc;
+	xor_dev->irq = msi_get_virq(&pdev->dev, 0);
 
-	ret = devm_request_irq(&pdev->dev, msi_desc->irq,
+	ret = devm_request_irq(&pdev->dev, xor_dev->irq,
   mv_xor_v2_interrupt_handler, 0,
 			       dev_name(&pdev->dev), xor_dev);
if (ret)
@@ -892,7 +886,7 @@ static int mv_xor_v2_remove(struct platf
  xor_dev->desc_size * MV_XOR_V2_DESC_NUM,
  xor_dev->hw_desq_virt, xor_dev->hw_desq);
 
-	devm_free_irq(&pdev->dev, xor_dev->msi_desc->irq, xor_dev);
+	devm_free_irq(&pdev->dev, xor_dev->irq, xor_dev);
 
 	platform_msi_domain_free_irqs(&pdev->dev);
 



[patch V3 28/35] PCI/MSI: Simplify pci_irq_get_affinity()

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

Replace open coded MSI descriptor chasing and use the proper accessor
functions instead.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
---
 drivers/pci/msi/msi.c |   26 ++
 1 file changed, 10 insertions(+), 16 deletions(-)

--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -1061,26 +1061,20 @@ EXPORT_SYMBOL(pci_irq_vector);
  */
 const struct cpumask *pci_irq_get_affinity(struct pci_dev *dev, int nr)
 {
-   if (dev->msix_enabled) {
-   struct msi_desc *entry;
+   int irq = pci_irq_vector(dev, nr);
+   struct msi_desc *desc;
 
-   for_each_pci_msi_entry(entry, dev) {
-   if (entry->msi_index == nr)
-				return &entry->affinity->mask;
-   }
-   WARN_ON_ONCE(1);
+   if (WARN_ON_ONCE(irq <= 0))
return NULL;
-   } else if (dev->msi_enabled) {
-   struct msi_desc *entry = first_pci_msi_entry(dev);
 
-   if (WARN_ON_ONCE(!entry || !entry->affinity ||
-nr >= entry->nvec_used))
-   return NULL;
-
-		return &entry->affinity[nr].mask;
-   } else {
+   desc = irq_get_msi_desc(irq);
+   /* Non-MSI does not have the information handy */
+   if (!desc)
return cpu_possible_mask;
-   }
+
+   if (WARN_ON_ONCE(!desc->affinity))
+   return NULL;
+	return &desc->affinity[nr].mask;
 }
 EXPORT_SYMBOL(pci_irq_get_affinity);
 



[patch V3 27/35] PCI/MSI: Use __msi_get_virq() in pci_get_vector()

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

Use msi_get_virq() and handle the return value to be compatible.

No functional change intended.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
---
V2: Handle the INTx case directly instead of trying to be overly smart - Marc
---
 drivers/pci/msi/msi.c |   25 +
 1 file changed, 5 insertions(+), 20 deletions(-)

--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -1037,28 +1037,13 @@ EXPORT_SYMBOL(pci_free_irq_vectors);
  */
 int pci_irq_vector(struct pci_dev *dev, unsigned int nr)
 {
-   if (dev->msix_enabled) {
-   struct msi_desc *entry;
+   unsigned int irq;
 
-   for_each_pci_msi_entry(entry, dev) {
-   if (entry->msi_index == nr)
-   return entry->irq;
-   }
-   WARN_ON_ONCE(1);
-   return -EINVAL;
-   }
+   if (!dev->msi_enabled && !dev->msix_enabled)
+   return !nr ? dev->irq : -EINVAL;
 
-   if (dev->msi_enabled) {
-   struct msi_desc *entry = first_pci_msi_entry(dev);
-
-   if (WARN_ON_ONCE(nr >= entry->nvec_used))
-   return -EINVAL;
-   } else {
-   if (WARN_ON_ONCE(nr > 0))
-   return -EINVAL;
-   }
-
-   return dev->irq + nr;
+	irq = msi_get_virq(&dev->dev, nr);
+   return irq ? irq : -EINVAL;
 }
 EXPORT_SYMBOL(pci_irq_vector);
 



[patch V3 26/35] genirq/msi: Provide interface to retrieve Linux interrupt number

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

This allows drivers to retrieve the Linux interrupt number instead of
fiddling with MSI descriptors.

msi_get_virq() returns the Linux interrupt number or 0 in case that there
is no entry for the given MSI index.
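
A hypothetical usage sketch (demo_* names invented) of the new interface in
a driver that previously walked the MSI descriptor list:

#include <linux/device.h>
#include <linux/interrupt.h>
#include <linux/msi.h>

static int demo_request_first_msi(struct device *dev, irq_handler_t handler,
				  void *data)
{
	unsigned int virq = msi_get_virq(dev, 0);

	if (!virq)
		return -ENODEV;

	return devm_request_irq(dev, virq, handler, 0, dev_name(dev), data);
}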

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
---
V2: Simplify the implementation and let PCI deal with the PCI specialities - 
Marc
---
 include/linux/msi.h |2 ++
 kernel/irq/msi.c|   36 
 2 files changed, 38 insertions(+)

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -153,6 +153,8 @@ struct msi_device_data {
 
 int msi_setup_device_data(struct device *dev);
 
+unsigned int msi_get_virq(struct device *dev, unsigned int index);
+
 /* Helpers to hide struct msi_desc implementation details */
 #define msi_desc_to_dev(desc)  ((desc)->dev)
 #define dev_to_msi_list(dev)   (&(dev)->msi_list)
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -105,6 +105,42 @@ int msi_setup_device_data(struct device
return 0;
 }
 
+/**
+ * msi_get_virq - Return Linux interrupt number of a MSI interrupt
+ * @dev:   Device to operate on
+ * @index: MSI interrupt index to look for (0-based)
+ *
+ * Return: The Linux interrupt number on success (> 0), 0 if not found
+ */
+unsigned int msi_get_virq(struct device *dev, unsigned int index)
+{
+   struct msi_desc *desc;
+   bool pcimsi;
+
+   if (!dev->msi.data)
+   return 0;
+
+   pcimsi = dev_is_pci(dev) ? to_pci_dev(dev)->msi_enabled : false;
+
+   for_each_msi_entry(desc, dev) {
+   /* PCI-MSI has only one descriptor for multiple interrupts. */
+   if (pcimsi) {
+   if (desc->irq && index < desc->nvec_used)
+   return desc->irq + index;
+   break;
+   }
+
+   /*
+* PCI-MSIX and platform MSI use a descriptor per
+* interrupt.
+*/
+   if (desc->msi_index == index)
+   return desc->irq;
+   }
+   return 0;
+}
+EXPORT_SYMBOL_GPL(msi_get_virq);
+
 #ifdef CONFIG_SYSFS
 static ssize_t msi_mode_show(struct device *dev, struct device_attribute *attr,
 char *buf)



[patch V3 25/35] powerpc/pseries/msi: Let core code check for contiguous entries

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

Set the domain info flag and remove the check.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: "Cédric Le Goater" 
Cc: linuxppc-...@lists.ozlabs.org

---
V2: Remove it completely - Cedric
---
 arch/powerpc/platforms/pseries/msi.c |   33 -
 1 file changed, 8 insertions(+), 25 deletions(-)

--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -321,27 +321,6 @@ static int msi_quota_for_device(struct p
return request;
 }
 
-static int check_msix_entries(struct pci_dev *pdev)
-{
-   struct msi_desc *entry;
-   int expected;
-
-   /* There's no way for us to express to firmware that we want
-* a discontiguous, or non-zero based, range of MSI-X entries.
-* So we must reject such requests. */
-
-   expected = 0;
-   for_each_pci_msi_entry(entry, pdev) {
-   if (entry->msi_index != expected) {
-   pr_debug("rtas_msi: bad MSI-X entries.\n");
-   return -EINVAL;
-   }
-   expected++;
-   }
-
-   return 0;
-}
-
 static void rtas_hack_32bit_msi_gen2(struct pci_dev *pdev)
 {
u32 addr_hi, addr_lo;
@@ -380,9 +359,6 @@ static int rtas_prepare_msi_irqs(struct
if (quota && quota < nvec)
return quota;
 
-   if (type == PCI_CAP_ID_MSIX && check_msix_entries(pdev))
-   return -EINVAL;
-
/*
 * Firmware currently refuse any non power of two allocation
 * so we round up if the quota will allow it.
@@ -529,9 +505,16 @@ static struct irq_chip pseries_pci_msi_i
.irq_write_msi_msg  = pseries_msi_write_msg,
 };
 
+
+/*
+ * Set MSI_FLAG_MSIX_CONTIGUOUS as there is no way to express to
+ * firmware to request a discontiguous or non-zero based range of
+ * MSI-X entries. Core code will reject such setup attempts.
+ */
 static struct msi_domain_info pseries_msi_domain_info = {
.flags = (MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
- MSI_FLAG_MULTI_PCI_MSI  | MSI_FLAG_PCI_MSIX),
+ MSI_FLAG_MULTI_PCI_MSI  | MSI_FLAG_PCI_MSIX |
+ MSI_FLAG_MSIX_CONTIGUOUS),
.ops   = _pci_msi_domain_ops,
.chip  = _pci_msi_irq_chip,
 };


[patch V3 24/35] PCI/MSI: Provide MSI_FLAG_MSIX_CONTIGUOUS

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

Provide a domain info flag which makes the core code check for a contiguous
MSI-X index on allocation. That's simpler than checking it at some other
domain callback in architecture code.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
Acked-by: Bjorn Helgaas 

---
 drivers/pci/msi/irqdomain.c |   16 ++--
 include/linux/msi.h |2 ++
 2 files changed, 16 insertions(+), 2 deletions(-)

--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -89,9 +89,21 @@ static int pci_msi_domain_check_cap(stru
if (pci_msi_desc_is_multi_msi(desc) &&
!(info->flags & MSI_FLAG_MULTI_PCI_MSI))
return 1;
-   else if (desc->pci.msi_attrib.is_msix && !(info->flags & 
MSI_FLAG_PCI_MSIX))
-   return -ENOTSUPP;
 
+   if (desc->pci.msi_attrib.is_msix) {
+   if (!(info->flags & MSI_FLAG_PCI_MSIX))
+   return -ENOTSUPP;
+
+   if (info->flags & MSI_FLAG_MSIX_CONTIGUOUS) {
+   unsigned int idx = 0;
+
+   /* Check for gaps in the entry indices */
+   for_each_msi_entry(desc, dev) {
+   if (desc->msi_index != idx++)
+   return -ENOTSUPP;
+   }
+   }
+   }
return 0;
 }
 
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -361,6 +361,8 @@ enum {
MSI_FLAG_LEVEL_CAPABLE  = (1 << 6),
/* Populate sysfs on alloc() and destroy it on free() */
MSI_FLAG_DEV_SYSFS  = (1 << 7),
+   /* MSI-X entries must be contiguous */
+   MSI_FLAG_MSIX_CONTIGUOUS= (1 << 8),
 };
 
 int msi_domain_set_affinity(struct irq_data *data, const struct cpumask *mask,



[patch V3 23/35] PCI/MSI: Use msi_desc::msi_index

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

The usage of msi_desc::pci::entry_nr is confusing at best. It's the index
into the MSI[X] descriptor table.

Use msi_desc::msi_index which is shared between all MSI incarnations
instead of having a PCI specific storage for no value.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
Acked-by: Bjorn Helgaas 

---
 arch/powerpc/platforms/pseries/msi.c |4 ++--
 arch/x86/pci/xen.c   |2 +-
 drivers/pci/msi/irqdomain.c  |2 +-
 drivers/pci/msi/msi.c|   20 
 drivers/pci/xen-pcifront.c   |2 +-
 include/linux/msi.h  |2 --
 6 files changed, 13 insertions(+), 19 deletions(-)

--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -332,7 +332,7 @@ static int check_msix_entries(struct pci
 
expected = 0;
for_each_pci_msi_entry(entry, pdev) {
-   if (entry->pci.msi_attrib.entry_nr != expected) {
+   if (entry->msi_index != expected) {
pr_debug("rtas_msi: bad MSI-X entries.\n");
return -EINVAL;
}
@@ -579,7 +579,7 @@ static int pseries_irq_domain_alloc(stru
int hwirq;
int i, ret;
 
-   hwirq = rtas_query_irq_number(pci_get_pdn(pdev), 
desc->pci.msi_attrib.entry_nr);
+   hwirq = rtas_query_irq_number(pci_get_pdn(pdev), desc->msi_index);
if (hwirq < 0) {
dev_err(>dev, "Failed to query HW IRQ: %d\n", hwirq);
return hwirq;
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -306,7 +306,7 @@ static int xen_initdom_setup_msi_irqs(st
return -EINVAL;
 
map_irq.table_base = pci_resource_start(dev, bir);
-   map_irq.entry_nr = msidesc->pci.msi_attrib.entry_nr;
+   map_irq.entry_nr = msidesc->msi_index;
}
 
ret = -EINVAL;
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -57,7 +57,7 @@ static irq_hw_number_t pci_msi_domain_ca
 {
struct pci_dev *dev = msi_desc_to_pci_dev(desc);
 
-   return (irq_hw_number_t)desc->pci.msi_attrib.entry_nr |
+   return (irq_hw_number_t)desc->msi_index |
pci_dev_id(dev) << 11 |
(pci_domain_nr(dev->bus) & 0x) << 27;
 }
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -44,7 +44,7 @@ static inline void pci_msi_unmask(struct
 
 static inline void __iomem *pci_msix_desc_addr(struct msi_desc *desc)
 {
-   return desc->pci.mask_base + desc->pci.msi_attrib.entry_nr * 
PCI_MSIX_ENTRY_SIZE;
+   return desc->pci.mask_base + desc->msi_index * PCI_MSIX_ENTRY_SIZE;
 }
 
 /*
@@ -356,13 +356,10 @@ msi_setup_entry(struct pci_dev *dev, int
if (dev->dev_flags & PCI_DEV_FLAGS_HAS_MSI_MASKING)
control |= PCI_MSI_FLAGS_MASKBIT;
 
-   entry->pci.msi_attrib.is_msix   = 0;
-   entry->pci.msi_attrib.is_64 = !!(control & 
PCI_MSI_FLAGS_64BIT);
-   entry->pci.msi_attrib.is_virtual= 0;
-   entry->pci.msi_attrib.entry_nr  = 0;
+   entry->pci.msi_attrib.is_64 = !!(control & PCI_MSI_FLAGS_64BIT);
entry->pci.msi_attrib.can_mask  = !pci_msi_ignore_mask &&
  !!(control & PCI_MSI_FLAGS_MASKBIT);
-   entry->pci.msi_attrib.default_irq   = dev->irq; /* Save IOAPIC 
IRQ */
+   entry->pci.msi_attrib.default_irq = dev->irq;
entry->pci.msi_attrib.multi_cap = (control & PCI_MSI_FLAGS_QMASK) >> 1;
entry->pci.msi_attrib.multiple  = ilog2(__roundup_pow_of_two(nvec));
 
@@ -504,12 +501,11 @@ static int msix_setup_entries(struct pci
entry->pci.msi_attrib.is_64 = 1;
 
if (entries)
-   entry->pci.msi_attrib.entry_nr = entries[i].entry;
+   entry->msi_index = entries[i].entry;
else
-   entry->pci.msi_attrib.entry_nr = i;
+   entry->msi_index = i;
 
-   entry->pci.msi_attrib.is_virtual =
-   entry->pci.msi_attrib.entry_nr >= vec_count;
+   entry->pci.msi_attrib.is_virtual = entry->msi_index >= 
vec_count;
 
entry->pci.msi_attrib.can_mask  = !pci_msi_ignore_mask &&
  
!entry->pci.msi_attrib.is_virtual;
@@ -1045,7 +1041,7 @@ int pci_irq_vector(struct pci_dev *dev,
struct msi_desc *entry;
 
for_each_pci_msi_entry(entry, dev) {
-   if (entry->pci.msi_attrib.entry_nr == nr)
+   if (entry->msi_index == 

[patch V3 21/35] bus: fsl-mc-msi: Use msi_desc::msi_index

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

Use the common msi_index member and get rid of the pointless wrapper struct.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
Cc: Stuart Yoder 
Cc: Laurentiu Tudor 
---
 drivers/bus/fsl-mc/fsl-mc-allocator.c |2 +-
 drivers/bus/fsl-mc/fsl-mc-msi.c   |6 +++---
 include/linux/msi.h   |   10 --
 3 files changed, 4 insertions(+), 14 deletions(-)

--- a/drivers/bus/fsl-mc/fsl-mc-allocator.c
+++ b/drivers/bus/fsl-mc/fsl-mc-allocator.c
@@ -393,7 +393,7 @@ int fsl_mc_populate_irq_pool(struct fsl_
}
 
 	for_each_msi_entry(msi_desc, &mc_bus_dev->dev) {
-		mc_dev_irq = &irq_resources[msi_desc->fsl_mc.msi_index];
+		mc_dev_irq = &irq_resources[msi_desc->msi_index];
mc_dev_irq->msi_desc = msi_desc;
mc_dev_irq->resource.id = msi_desc->irq;
}
--- a/drivers/bus/fsl-mc/fsl-mc-msi.c
+++ b/drivers/bus/fsl-mc/fsl-mc-msi.c
@@ -29,7 +29,7 @@ static irq_hw_number_t fsl_mc_domain_cal
 	 * Make the base hwirq value for ICID*10000 so it is readable
 * as a decimal value in /proc/interrupts.
 */
-	return (irq_hw_number_t)(desc->fsl_mc.msi_index + (dev->icid * 10000));
+	return (irq_hw_number_t)(desc->msi_index + (dev->icid * 10000));
 }
 
 static void fsl_mc_msi_set_desc(msi_alloc_info_t *arg,
@@ -122,7 +122,7 @@ static void fsl_mc_msi_write_msg(struct
struct fsl_mc_device *mc_bus_dev = to_fsl_mc_device(msi_desc->dev);
struct fsl_mc_bus *mc_bus = to_fsl_mc_bus(mc_bus_dev);
struct fsl_mc_device_irq *mc_dev_irq =
-		&mc_bus->irq_resources[msi_desc->fsl_mc.msi_index];
+		&mc_bus->irq_resources[msi_desc->msi_index];
 
msi_desc->msg = *msg;
 
@@ -235,7 +235,7 @@ static int fsl_mc_msi_alloc_descs(struct
goto cleanup_msi_descs;
}
 
-   msi_desc->fsl_mc.msi_index = i;
+   msi_desc->msi_index = i;
 		INIT_LIST_HEAD(&msi_desc->list);
 		list_add_tail(&msi_desc->list, dev_to_msi_list(dev));
}
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -107,14 +107,6 @@ struct pci_msi_desc {
 };
 
 /**
- * fsl_mc_msi_desc - FSL-MC device specific msi descriptor data
- * @msi_index: The index of the MSI descriptor
- */
-struct fsl_mc_msi_desc {
-   u16 msi_index;
-};
-
-/**
  * ti_sci_inta_msi_desc - TISCI based INTA specific msi descriptor data
  * @dev_index: TISCI device index
  */
@@ -137,7 +129,6 @@ struct ti_sci_inta_msi_desc {
  *
  * @msi_index: Index of the msi descriptor
  * @pci:   [PCI]   PCI speficic msi descriptor data
- * @fsl_mc:[fsl-mc]FSL MC device specific msi descriptor data
  * @inta:  [INTA]  TISCI based INTA specific msi descriptor data
  */
 struct msi_desc {
@@ -158,7 +149,6 @@ struct msi_desc {
u16 msi_index;
union {
struct pci_msi_desc pci;
-   struct fsl_mc_msi_desc  fsl_mc;
struct ti_sci_inta_msi_desc inta;
};
 };



[patch V3 22/35] soc: ti: ti_sci_inta_msi: Use msi_desc::msi_index

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

Use the common msi_index member and get rid of the pointless wrapper struct.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
Cc: Nishanth Menon 
Cc: Tero Kristo 
Cc: Santosh Shilimkar 
Cc: Thomas Gleixner 
Cc: linux-arm-ker...@lists.infradead.org
---
 drivers/irqchip/irq-ti-sci-inta.c |2 +-
 drivers/soc/ti/ti_sci_inta_msi.c  |6 +++---
 include/linux/msi.h   |   16 ++--
 3 files changed, 6 insertions(+), 18 deletions(-)

--- a/drivers/irqchip/irq-ti-sci-inta.c
+++ b/drivers/irqchip/irq-ti-sci-inta.c
@@ -595,7 +595,7 @@ static void ti_sci_inta_msi_set_desc(msi
struct platform_device *pdev = to_platform_device(desc->dev);
 
arg->desc = desc;
-   arg->hwirq = TO_HWIRQ(pdev->id, desc->inta.dev_index);
+   arg->hwirq = TO_HWIRQ(pdev->id, desc->msi_index);
 }
 
 static struct msi_domain_ops ti_sci_inta_msi_ops = {
--- a/drivers/soc/ti/ti_sci_inta_msi.c
+++ b/drivers/soc/ti/ti_sci_inta_msi.c
@@ -84,7 +84,7 @@ static int ti_sci_inta_msi_alloc_descs(s
return -ENOMEM;
}
 
-   msi_desc->inta.dev_index = res->desc[set].start + i;
+   msi_desc->msi_index = res->desc[set].start + i;
 		INIT_LIST_HEAD(&msi_desc->list);
 		list_add_tail(&msi_desc->list, dev_to_msi_list(dev));
count++;
@@ -96,7 +96,7 @@ static int ti_sci_inta_msi_alloc_descs(s
return -ENOMEM;
}
 
-   msi_desc->inta.dev_index = res->desc[set].start_sec + i;
+   msi_desc->msi_index = res->desc[set].start_sec + i;
 		INIT_LIST_HEAD(&msi_desc->list);
 		list_add_tail(&msi_desc->list, dev_to_msi_list(dev));
count++;
@@ -154,7 +154,7 @@ unsigned int ti_sci_inta_msi_get_virq(st
struct msi_desc *desc;
 
for_each_msi_entry(desc, dev)
-   if (desc->inta.dev_index == dev_index)
+   if (desc->msi_index == dev_index)
return desc->irq;
 
return -ENODEV;
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -107,14 +107,6 @@ struct pci_msi_desc {
 };
 
 /**
- * ti_sci_inta_msi_desc - TISCI based INTA specific msi descriptor data
- * @dev_index: TISCI device index
- */
-struct ti_sci_inta_msi_desc {
-   u16 dev_index;
-};
-
-/**
  * struct msi_desc - Descriptor structure for MSI based interrupts
  * @list:  List head for management
  * @irq:   The base interrupt number
@@ -128,8 +120,7 @@ struct ti_sci_inta_msi_desc {
  * @write_msi_msg_data:Data parameter for the callback.
  *
  * @msi_index: Index of the msi descriptor
- * @pci:   [PCI]   PCI speficic msi descriptor data
- * @inta:  [INTA]  TISCI based INTA specific msi descriptor data
+ * @pci:   PCI specific msi descriptor data
  */
 struct msi_desc {
/* Shared device/bus type independent data */
@@ -147,10 +138,7 @@ struct msi_desc {
void *write_msi_msg_data;
 
u16 msi_index;
-   union {
-   struct pci_msi_desc pci;
-   struct ti_sci_inta_msi_desc inta;
-   };
+   struct pci_msi_desc pci;
 };
 
 /**



[patch V3 18/35] platform-msi: Store platform private data pointer in msi_device_data

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

Storing the platform private data in a MSI descriptor is sloppy at
best. The data belongs to the device and not to the descriptor.
Add a pointer to struct msi_device_data and store the pointer there.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
---
 drivers/base/platform-msi.c |   79 +---
 include/linux/msi.h |4 +-
 2 files changed, 34 insertions(+), 49 deletions(-)

--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -38,9 +38,7 @@ static DEFINE_IDA(platform_msi_devid_ida
  */
 static irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc)
 {
-   u32 devid;
-
-   devid = desc->platform.msi_priv_data->devid;
+   u32 devid = desc->dev->msi.data->platform_data->devid;
 
return (devid << (32 - DEV_ID_SHIFT)) | desc->platform.msi_index;
 }
@@ -85,11 +83,8 @@ static void platform_msi_update_dom_ops(
 static void platform_msi_write_msg(struct irq_data *data, struct msi_msg *msg)
 {
struct msi_desc *desc = irq_data_get_msi_desc(data);
-   struct platform_msi_priv_data *priv_data;
-
-   priv_data = desc->platform.msi_priv_data;
 
-   priv_data->write_msg(desc, msg);
+   desc->dev->msi.data->platform_data->write_msg(desc, msg);
 }
 
 static void platform_msi_update_chip_ops(struct msi_domain_info *info)
@@ -126,9 +121,7 @@ static void platform_msi_free_descs(stru
 }
 
 static int platform_msi_alloc_descs_with_irq(struct device *dev, int virq,
-int nvec,
-struct platform_msi_priv_data *data)
-
+int nvec)
 {
struct msi_desc *desc;
int i, base = 0;
@@ -144,7 +137,6 @@ static int platform_msi_alloc_descs_with
if (!desc)
break;
 
-   desc->platform.msi_priv_data = data;
desc->platform.msi_index = base + i;
desc->irq = virq ? virq + i : 0;
 
@@ -161,11 +153,9 @@ static int platform_msi_alloc_descs_with
return 0;
 }
 
-static int platform_msi_alloc_descs(struct device *dev, int nvec,
-   struct platform_msi_priv_data *data)
-
+static int platform_msi_alloc_descs(struct device *dev, int nvec)
 {
-   return platform_msi_alloc_descs_with_irq(dev, 0, nvec, data);
+   return platform_msi_alloc_descs_with_irq(dev, 0, nvec);
 }
 
 /**
@@ -199,9 +189,8 @@ struct irq_domain *platform_msi_create_i
return domain;
 }
 
-static struct platform_msi_priv_data *
-platform_msi_alloc_priv_data(struct device *dev, unsigned int nvec,
-irq_write_msi_msg_t write_msi_msg)
+static int platform_msi_alloc_priv_data(struct device *dev, unsigned int nvec,
+   irq_write_msi_msg_t write_msi_msg)
 {
struct platform_msi_priv_data *datap;
int err;
@@ -213,41 +202,44 @@ platform_msi_alloc_priv_data(struct devi
 * capable devices).
 */
if (!dev->msi.domain || !write_msi_msg || !nvec || nvec > MAX_DEV_MSIS)
-   return ERR_PTR(-EINVAL);
+   return -EINVAL;
 
if (dev->msi.domain->bus_token != DOMAIN_BUS_PLATFORM_MSI) {
dev_err(dev, "Incompatible msi_domain, giving up\n");
-   return ERR_PTR(-EINVAL);
+   return -EINVAL;
}
 
err = msi_setup_device_data(dev);
if (err)
-   return ERR_PTR(err);
+   return err;
 
-   /* Already had a helping of MSI? Greed... */
-   if (!list_empty(dev_to_msi_list(dev)))
-   return ERR_PTR(-EBUSY);
+   /* Already initialized? */
+   if (dev->msi.data->platform_data)
+   return -EBUSY;
 
datap = kzalloc(sizeof(*datap), GFP_KERNEL);
if (!datap)
-   return ERR_PTR(-ENOMEM);
+   return -ENOMEM;
 
datap->devid = ida_simple_get(&platform_msi_devid_ida,
  0, 1 << DEV_ID_SHIFT, GFP_KERNEL);
if (datap->devid < 0) {
err = datap->devid;
kfree(datap);
-   return ERR_PTR(err);
+   return err;
}
 
datap->write_msg = write_msi_msg;
datap->dev = dev;
-
-   return datap;
+   dev->msi.data->platform_data = datap;
+   return 0;
 }
 
-static void platform_msi_free_priv_data(struct platform_msi_priv_data *data)
+static void platform_msi_free_priv_data(struct device *dev)
 {
+   struct platform_msi_priv_data *data = dev->msi.data->platform_data;
+
+   dev->msi.data->platform_data = NULL;
ida_simple_remove(&platform_msi_devid_ida, data->devid);
kfree(data);
 }
@@ -264,1

[patch V3 20/35] platform-msi: Use msi_desc::msi_index

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

Use the common msi_index member and get rid of the pointless wrapper struct.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
---
 drivers/base/platform-msi.c |   10 +-
 drivers/dma/qcom/hidma.c|4 ++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |4 ++--
 drivers/mailbox/bcm-flexrm-mailbox.c|4 ++--
 include/linux/msi.h |   10 --
 5 files changed, 11 insertions(+), 21 deletions(-)

--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -40,7 +40,7 @@ static irq_hw_number_t platform_msi_calc
 {
u32 devid = desc->dev->msi.data->platform_data->devid;
 
-   return (devid << (32 - DEV_ID_SHIFT)) | desc->platform.msi_index;
+   return (devid << (32 - DEV_ID_SHIFT)) | desc->msi_index;
 }
 
 static void platform_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
@@ -112,8 +112,8 @@ static void platform_msi_free_descs(stru
struct msi_desc *desc, *tmp;
 
list_for_each_entry_safe(desc, tmp, dev_to_msi_list(dev), list) {
-   if (desc->platform.msi_index >= base &&
-   desc->platform.msi_index < (base + nvec)) {
+   if (desc->msi_index >= base &&
+   desc->msi_index < (base + nvec)) {
list_del(&desc->list);
free_msi_entry(desc);
}
@@ -129,7 +129,7 @@ static int platform_msi_alloc_descs_with
if (!list_empty(dev_to_msi_list(dev))) {
desc = list_last_entry(dev_to_msi_list(dev),
   struct msi_desc, list);
-   base = desc->platform.msi_index + 1;
+   base = desc->msi_index + 1;
}
 
for (i = 0; i < nvec; i++) {
@@ -137,7 +137,7 @@ static int platform_msi_alloc_descs_with
if (!desc)
break;
 
-   desc->platform.msi_index = base + i;
+   desc->msi_index = base + i;
desc->irq = virq ? virq + i : 0;
 
list_add_tail(&desc->list, dev_to_msi_list(dev));
--- a/drivers/dma/qcom/hidma.c
+++ b/drivers/dma/qcom/hidma.c
@@ -666,7 +666,7 @@ static void hidma_write_msi_msg(struct m
struct device *dev = msi_desc_to_dev(desc);
struct hidma_dev *dmadev = dev_get_drvdata(dev);
 
-   if (!desc->platform.msi_index) {
+   if (!desc->msi_index) {
writel(msg->address_lo, dmadev->dev_evca + 0x118);
writel(msg->address_hi, dmadev->dev_evca + 0x11C);
writel(msg->data, dmadev->dev_evca + 0x120);
@@ -702,7 +702,7 @@ static int hidma_request_msi(struct hidm
return rc;
 
for_each_msi_entry(desc, &pdev->dev) {
-   if (!desc->platform.msi_index)
+   if (!desc->msi_index)
dmadev->msi_virqbase = desc->irq;
 
rc = devm_request_irq(&pdev->dev, desc->irq,
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3142,7 +3142,7 @@ static void arm_smmu_write_msi_msg(struc
phys_addr_t doorbell;
struct device *dev = msi_desc_to_dev(desc);
struct arm_smmu_device *smmu = dev_get_drvdata(dev);
-   phys_addr_t *cfg = arm_smmu_msi_cfg[desc->platform.msi_index];
+   phys_addr_t *cfg = arm_smmu_msi_cfg[desc->msi_index];
 
doorbell = (((u64)msg->address_hi) << 32) | msg->address_lo;
doorbell &= MSI_CFG0_ADDR_MASK;
@@ -3183,7 +3183,7 @@ static void arm_smmu_setup_msis(struct a
}
 
for_each_msi_entry(desc, dev) {
-   switch (desc->platform.msi_index) {
+   switch (desc->msi_index) {
case EVTQ_MSI_INDEX:
smmu->evtq.q.irq = desc->irq;
break;
--- a/drivers/mailbox/bcm-flexrm-mailbox.c
+++ b/drivers/mailbox/bcm-flexrm-mailbox.c
@@ -1484,7 +1484,7 @@ static void flexrm_mbox_msi_write(struct
 {
struct device *dev = msi_desc_to_dev(desc);
struct flexrm_mbox *mbox = dev_get_drvdata(dev);
-   struct flexrm_ring *ring = &mbox->rings[desc->platform.msi_index];
+   struct flexrm_ring *ring = &mbox->rings[desc->msi_index];
 
/* Configure per-Ring MSI registers */
writel_relaxed(msg->address_lo, ring->regs + RING_MSI_ADDR_LS);
@@ -1609,7 +1609,7 @@ static int flexrm_mbox_probe(struct plat
 
/* Save alloced IRQ numbers for each ring */
for_each_msi_entry(desc, dev) {
-   ring = &mbox->rings[desc->platform.msi_index];
+   ring = &mbox->rings[desc->msi_index];
ring->irq = desc->irq;
}
 
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -107,14 +10

[patch V3 19/35] genirq/msi: Consolidate MSI descriptor data

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

All non PCI/MSI usage variants have data structures in struct msi_desc with
only one member: xxx_index. PCI/MSI has an entry_nr member.

Add a common msi_index member to struct msi_desc so all implementations can
share it which allows further consolidation.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
---
 include/linux/msi.h |2 ++
 1 file changed, 2 insertions(+)

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -143,6 +143,7 @@ struct ti_sci_inta_msi_desc {
  * address or data changes
  * @write_msi_msg_data:Data parameter for the callback.
  *
+ * @msi_index: Index of the msi descriptor
  * @pci:   [PCI]   PCI speficic msi descriptor data
  * @platform:  [platform]  Platform device specific msi descriptor data
  * @fsl_mc:[fsl-mc]FSL MC device specific msi descriptor data
@@ -163,6 +164,7 @@ struct msi_desc {
void (*write_msi_msg)(struct msi_desc *entry, void *data);
void *write_msi_msg_data;
 
+   u16 msi_index;
union {
struct pci_msi_desc pci;
struct platform_msi_desc platform;



[patch V3 17/35] platform-msi: Rename functions and clarify comments

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

It's hard to distinguish what platform_msi_domain_alloc() and
platform_msi_domain_alloc_irqs() are about. Make the distinction more
explicit and add comments which explain the use cases properly.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
---
 drivers/base/platform-msi.c |   36 +---
 drivers/irqchip/irq-mbigen.c|4 ++--
 drivers/irqchip/irq-mvebu-icu.c |6 +++---
 include/linux/msi.h |8 
 4 files changed, 30 insertions(+), 24 deletions(-)

--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -313,17 +313,18 @@ EXPORT_SYMBOL_GPL(platform_msi_domain_fr
  *  a platform-msi domain
  * @domain:The platform-msi domain
  *
- * Returns the private data provided when calling
- * platform_msi_create_device_domain.
+ * Return: The private data provided when calling
+ * platform_msi_create_device_domain().
  */
 void *platform_msi_get_host_data(struct irq_domain *domain)
 {
struct platform_msi_priv_data *data = domain->host_data;
+
return data->host_data;
 }
 
 /**
- * __platform_msi_create_device_domain - Create a platform-msi domain
+ * __platform_msi_create_device_domain - Create a platform-msi device domain
  *
  * @dev:   The device generating the MSIs
  * @nvec:  The number of MSIs that need to be allocated
@@ -332,7 +333,11 @@ void *platform_msi_get_host_data(struct
  * @ops:   The hierarchy domain operations to use
  * @host_data: Private data associated to this domain
  *
- * Returns an irqdomain for @nvec interrupts
+ * Return: An irqdomain for @nvec interrupts on success, NULL in case of error.
+ *
+ * This is for interrupt domains which stack on a platform-msi domain
+ * created by platform_msi_create_irq_domain(). @dev->msi.domain points to
+ * that platform-msi domain which is the parent for the new domain.
  */
 struct irq_domain *
 __platform_msi_create_device_domain(struct device *dev,
@@ -372,18 +377,19 @@ struct irq_domain *
 }
 
 /**
- * platform_msi_domain_free - Free interrupts associated with a platform-msi
- *domain
+ * platform_msi_device_domain_free - Free interrupts associated with a 
platform-msi
+ *  device domain
  *
- * @domain:The platform-msi domain
+ * @domain:The platform-msi device domain
  * @virq:  The base irq from which to perform the free operation
  * @nvec:  How many interrupts to free from @virq
  */
-void platform_msi_domain_free(struct irq_domain *domain, unsigned int virq,
- unsigned int nvec)
+void platform_msi_device_domain_free(struct irq_domain *domain, unsigned int virq,
+unsigned int nvec)
 {
struct platform_msi_priv_data *data = domain->host_data;
struct msi_desc *desc, *tmp;
+
for_each_msi_entry_safe(desc, tmp, data->dev) {
if (WARN_ON(!desc->irq || desc->nvec_used != 1))
return;
@@ -397,10 +403,10 @@ void platform_msi_domain_free(struct irq
 }
 
 /**
- * platform_msi_domain_alloc - Allocate interrupts associated with
- *a platform-msi domain
+ * platform_msi_device_domain_alloc - Allocate interrupts associated with
+ *   a platform-msi device domain
  *
- * @domain:The platform-msi domain
+ * @domain:The platform-msi device domain
  * @virq:  The base irq from which to perform the allocate operation
  * @nr_irqs:   How many interrupts to free from @virq
  *
@@ -408,8 +414,8 @@ void platform_msi_domain_free(struct irq
  * with irq_domain_mutex held (which can only be done as part of a
  * top-level interrupt allocation).
  */
-int platform_msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
- unsigned int nr_irqs)
+int platform_msi_device_domain_alloc(struct irq_domain *domain, unsigned int virq,
+unsigned int nr_irqs)
 {
struct platform_msi_priv_data *data = domain->host_data;
int err;
@@ -421,7 +427,7 @@ int platform_msi_domain_alloc(struct irq
err = msi_domain_populate_irqs(domain->parent, data->dev,
   virq, nr_irqs, &data->arg);
if (err)
-   platform_msi_domain_free(domain, virq, nr_irqs);
+   platform_msi_device_domain_free(domain, virq, nr_irqs);
 
return err;
 }
--- a/drivers/irqchip/irq-mbigen.c
+++ b/drivers/irqchip/irq-mbigen.c
@@ -207,7 +207,7 @@ static int mbigen_irq_domain_alloc(struc
if (err)
return err;
 
-   err = platform_msi_domain_alloc(domain, virq, nr_irqs);
+   err = platform_msi_device_domain_alloc(domain, virq, nr_irqs);
if (err)
return err;

[patch V3 16/35] genirq/msi: Remove the original sysfs interfaces

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

No more users. Refactor the core code accordingly and move the global
interface under CONFIG_PCI_MSI_ARCH_FALLBACKS.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
---
 include/linux/msi.h |   29 +++-
 kernel/irq/msi.c|   53 +++-
 2 files changed, 28 insertions(+), 54 deletions(-)

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -246,26 +246,6 @@ void __pci_write_msi_msg(struct msi_desc
 void pci_msi_mask_irq(struct irq_data *data);
 void pci_msi_unmask_irq(struct irq_data *data);
 
-#ifdef CONFIG_SYSFS
-int msi_device_populate_sysfs(struct device *dev);
-void msi_device_destroy_sysfs(struct device *dev);
-
-const struct attribute_group **msi_populate_sysfs(struct device *dev);
-void msi_destroy_sysfs(struct device *dev,
-  const struct attribute_group **msi_irq_groups);
-#else
-static inline int msi_device_populate_sysfs(struct device *dev) { return 0; }
-static inline void msi_device_destroy_sysfs(struct device *dev) { }
-
-static inline const struct attribute_group **msi_populate_sysfs(struct device *dev)
-{
-   return NULL;
-}
-static inline void msi_destroy_sysfs(struct device *dev, const struct attribute_group **msi_irq_groups)
-{
-}
-#endif
-
 /*
  * The arch hooks to setup up msi irqs. Default functions are implemented
  * as weak symbols so that they /can/ be overriden by architecture specific
@@ -279,7 +259,14 @@ int arch_setup_msi_irq(struct pci_dev *d
 void arch_teardown_msi_irq(unsigned int irq);
 int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
 void arch_teardown_msi_irqs(struct pci_dev *dev);
-#endif
+#ifdef CONFIG_SYSFS
+int msi_device_populate_sysfs(struct device *dev);
+void msi_device_destroy_sysfs(struct device *dev);
+#else /* CONFIG_SYSFS */
+static inline int msi_device_populate_sysfs(struct device *dev) { return 0; }
+static inline void msi_device_destroy_sysfs(struct device *dev) { }
+#endif /* !CONFIG_SYSFS */
+#endif /* CONFIG_PCI_MSI_ARCH_FALLBACKS */
 
 /*
  * The restore hook is still available even for fully irq domain based
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -118,12 +118,8 @@ static ssize_t msi_mode_show(struct devi
 /**
  * msi_populate_sysfs - Populate msi_irqs sysfs entries for devices
  * @dev:   The device(PCI, platform etc) who will get sysfs entries
- *
- * Return attribute_group ** so that specific bus MSI can save it to
- * somewhere during initilizing msi irqs. If devices has no MSI irq,
- * return NULL; if it fails to populate sysfs, return ERR_PTR
  */
-const struct attribute_group **msi_populate_sysfs(struct device *dev)
+static const struct attribute_group **msi_populate_sysfs(struct device *dev)
 {
const struct attribute_group **msi_irq_groups;
struct attribute **msi_attrs, *msi_attr;
@@ -214,41 +210,32 @@ int msi_device_populate_sysfs(struct dev
 }
 
 /**
- * msi_destroy_sysfs - Destroy msi_irqs sysfs entries for devices
- * @dev:   The device(PCI, platform etc) who will remove sysfs 
entries
- * @msi_irq_groups:attribute_group for device msi_irqs entries
- */
-void msi_destroy_sysfs(struct device *dev, const struct attribute_group **msi_irq_groups)
-{
-   struct device_attribute *dev_attr;
-   struct attribute **msi_attrs;
-   int count = 0;
-
-   if (msi_irq_groups) {
-   sysfs_remove_groups(&dev->kobj, msi_irq_groups);
-   msi_attrs = msi_irq_groups[0]->attrs;
-   while (msi_attrs[count]) {
-   dev_attr = container_of(msi_attrs[count],
-   struct device_attribute, attr);
-   kfree(dev_attr->attr.name);
-   kfree(dev_attr);
-   ++count;
-   }
-   kfree(msi_attrs);
-   kfree(msi_irq_groups[0]);
-   kfree(msi_irq_groups);
-   }
-}
-
-/**
  * msi_device_destroy_sysfs - Destroy msi_irqs sysfs entries for a device
  * @dev:   The device (PCI, platform etc) for which to remove
  * sysfs entries
  */
 void msi_device_destroy_sysfs(struct device *dev)
 {
-   msi_destroy_sysfs(dev, dev->msi.data->attrs);
+   const struct attribute_group **msi_irq_groups = dev->msi.data->attrs;
+   struct device_attribute *dev_attr;
+   struct attribute **msi_attrs;
+   int count = 0;
+
dev->msi.data->attrs = NULL;
+   if (!msi_irq_groups)
+   return;
+
+   sysfs_remove_groups(&dev->kobj, msi_irq_groups);
+   msi_attrs = msi_irq_groups[0]->attrs;
+   while (msi_attrs[count]) {
+   dev_attr = container_of(msi_attrs[count], struct device_attribute, attr);
+   kfree(dev_attr->attr.name);
+   kfree(dev_attr);
+   ++count;
+   }
+   kfree(msi_at

[patch V3 15/35] platform-msi: Let the core code handle sysfs groups

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

Set the domain info flag and remove the local sysfs code.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
---
 drivers/base/platform-msi.c |   11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -23,7 +23,6 @@
 struct platform_msi_priv_data {
struct device   *dev;
void *host_data;
-   const struct attribute_group **msi_irq_groups;
msi_alloc_info_t arg;
irq_write_msi_msg_t write_msg;
int devid;
@@ -191,6 +190,7 @@ struct irq_domain *platform_msi_create_i
platform_msi_update_dom_ops(info);
if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
platform_msi_update_chip_ops(info);
+   info->flags |= MSI_FLAG_DEV_SYSFS;
 
domain = msi_create_irq_domain(fwnode, info, parent);
if (domain)
@@ -279,16 +279,8 @@ int platform_msi_domain_alloc_irqs(struc
if (err)
goto out_free_desc;
 
-   priv_data->msi_irq_groups = msi_populate_sysfs(dev);
-   if (IS_ERR(priv_data->msi_irq_groups)) {
-   err = PTR_ERR(priv_data->msi_irq_groups);
-   goto out_free_irqs;
-   }
-
return 0;
 
-out_free_irqs:
-   msi_domain_free_irqs(dev->msi.domain, dev);
 out_free_desc:
platform_msi_free_descs(dev, 0, nvec);
 out_free_priv_data:
@@ -308,7 +300,6 @@ void platform_msi_domain_free_irqs(struc
struct msi_desc *desc;
 
desc = first_msi_entry(dev);
-   msi_destroy_sysfs(dev, desc->platform.msi_priv_data->msi_irq_groups);
platform_msi_free_priv_data(desc->platform.msi_priv_data);
}
 



[patch V3 14/35] PCI/MSI: Let the irq code handle sysfs groups

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

Set the domain info flag which makes the core code handle sysfs groups and
put an explicit invocation into the legacy code.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
Acked-by: Bjorn Helgaas 
---
 drivers/pci/msi/irqdomain.c |2 +-
 drivers/pci/msi/legacy.c|6 +-
 drivers/pci/msi/msi.c   |   23 ---
 include/linux/pci.h |1 -
 4 files changed, 6 insertions(+), 26 deletions(-)

--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -159,7 +159,7 @@ struct irq_domain *pci_msi_create_irq_do
if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
pci_msi_domain_update_chip_ops(info);
 
-   info->flags |= MSI_FLAG_ACTIVATE_EARLY;
+   info->flags |= MSI_FLAG_ACTIVATE_EARLY | MSI_FLAG_DEV_SYSFS;
if (IS_ENABLED(CONFIG_GENERIC_IRQ_RESERVATION_MODE))
info->flags |= MSI_FLAG_MUST_REACTIVATE;
 
--- a/drivers/pci/msi/legacy.c
+++ b/drivers/pci/msi/legacy.c
@@ -70,10 +70,14 @@ int pci_msi_legacy_setup_msi_irqs(struct
 {
int ret = arch_setup_msi_irqs(dev, nvec, type);
 
-   return pci_msi_setup_check_result(dev, type, ret);
+   ret = pci_msi_setup_check_result(dev, type, ret);
+   if (!ret)
+   ret = msi_device_populate_sysfs(&dev->dev);
+   return ret;
 }
 
 void pci_msi_legacy_teardown_msi_irqs(struct pci_dev *dev)
 {
+   msi_device_destroy_sysfs(&dev->dev);
arch_teardown_msi_irqs(dev);
 }
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -233,11 +233,6 @@ static void free_msi_irqs(struct pci_dev
for (i = 0; i < entry->nvec_used; i++)
BUG_ON(irq_has_action(entry->irq + i));
 
-   if (dev->msi_irq_groups) {
-   msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
-   dev->msi_irq_groups = NULL;
-   }
-
pci_msi_teardown_msi_irqs(dev);
 
list_for_each_entry_safe(entry, tmp, msi_list, list) {
@@ -417,7 +412,6 @@ static int msi_verify_entries(struct pci
 static int msi_capability_init(struct pci_dev *dev, int nvec,
   struct irq_affinity *affd)
 {
-   const struct attribute_group **groups;
struct msi_desc *entry;
int ret;
 
@@ -448,14 +442,6 @@ static int msi_capability_init(struct pc
if (ret)
goto err;
 
-   groups = msi_populate_sysfs(&dev->dev);
-   if (IS_ERR(groups)) {
-   ret = PTR_ERR(groups);
-   goto err;
-   }
-
-   dev->msi_irq_groups = groups;
-
/* Set MSI enabled bits */
pci_intx_for_msi(dev, 0);
pci_msi_set_enable(dev, 1);
@@ -584,7 +570,6 @@ static void msix_mask_all(void __iomem *
-static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
int nvec, struct irq_affinity *affd)
 {
-   const struct attribute_group **groups;
void __iomem *base;
int ret, tsize;
u16 control;
@@ -629,14 +614,6 @@ static int msix_capability_init(struct p
 
msix_update_entries(dev, entries);
 
-   groups = msi_populate_sysfs(&dev->dev);
-   if (IS_ERR(groups)) {
-   ret = PTR_ERR(groups);
-   goto out_free;
-   }
-
-   dev->msi_irq_groups = groups;
-
/* Disable INTX and unmask MSI-X */
pci_intx_for_msi(dev, 0);
pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -475,7 +475,6 @@ struct pci_dev {
 #ifdef CONFIG_PCI_MSI
void __iomem *msix_base;
raw_spinlock_t  msi_lock;
-   const struct attribute_group **msi_irq_groups;
 #endif
struct pci_vpd  vpd;
 #ifdef CONFIG_PCIE_DPC



[patch V3 12/35] soc: ti: ti_sci_inta_msi: Allocate MSI device data on first use

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

Allocate the MSI device data on first invocation of the allocation function.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
Cc: Nishanth Menon 
Cc: Tero Kristo 
Cc: Santosh Shilimkar 
Cc: linux-arm-ker...@lists.infradead.org
---
 drivers/soc/ti/ti_sci_inta_msi.c |4 
 1 file changed, 4 insertions(+)

--- a/drivers/soc/ti/ti_sci_inta_msi.c
+++ b/drivers/soc/ti/ti_sci_inta_msi.c
@@ -120,6 +120,10 @@ int ti_sci_inta_msi_domain_alloc_irqs(st
if (pdev->id < 0)
return -ENODEV;
 
+   ret = msi_setup_device_data(dev);
+   if (ret)
+   return ret;
+
nvec = ti_sci_inta_msi_alloc_descs(dev, res);
if (nvec <= 0)
return nvec;



[patch V3 13/35] genirq/msi: Provide msi_device_populate/destroy_sysfs()

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

Add new allocation functions which can be activated by domain info
flags. They store the groups pointer in struct msi_device_data.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
---
 include/linux/msi.h |4 
 kernel/irq/msi.c|   42 --
 2 files changed, 44 insertions(+), 2 deletions(-)

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -56,6 +56,8 @@ struct irq_data;
 struct msi_desc;
 struct pci_dev;
 struct platform_msi_priv_data;
+struct attribute_group;
+
 void __get_cached_msi_msg(struct msi_desc *entry, struct msi_msg *msg);
 #ifdef CONFIG_GENERIC_MSI_IRQ
 void get_cached_msi_msg(unsigned int irq, struct msi_msg *msg);
@@ -174,9 +176,11 @@ struct msi_desc {
 /**
  * msi_device_data - MSI per device data
  * @properties:MSI properties which are interesting to drivers
+ * @attrs: Pointer to the sysfs attribute group
  */
 struct msi_device_data {
unsigned long   properties;
+   const struct attribute_group **attrs;
 };
 
 int msi_setup_device_data(struct device *dev);
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -200,6 +200,20 @@ const struct attribute_group **msi_popul
 }
 
 /**
+ * msi_device_populate_sysfs - Populate msi_irqs sysfs entries for a device
+ * @dev:   The device (PCI, platform etc) which will get sysfs entries
+ */
+int msi_device_populate_sysfs(struct device *dev)
+{
+   const struct attribute_group **group = msi_populate_sysfs(dev);
+
+   if (IS_ERR(group))
+   return PTR_ERR(group);
+   dev->msi.data->attrs = group;
+   return 0;
+}
+
+/**
  * msi_destroy_sysfs - Destroy msi_irqs sysfs entries for devices
  * @dev:   The device(PCI, platform etc) who will remove sysfs entries
  * @msi_irq_groups:attribute_group for device msi_irqs entries
@@ -225,6 +239,17 @@ void msi_destroy_sysfs(struct device *de
kfree(msi_irq_groups);
}
 }
+
+/**
+ * msi_device_destroy_sysfs - Destroy msi_irqs sysfs entries for a device
+ * @dev:   The device (PCI, platform etc) for which to remove
+ * sysfs entries
+ */
+void msi_device_destroy_sysfs(struct device *dev)
+{
+   msi_destroy_sysfs(dev, dev->msi.data->attrs);
+   dev->msi.data->attrs = NULL;
+}
 #endif
 
 #ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
@@ -672,8 +697,19 @@ int msi_domain_alloc_irqs(struct irq_dom
 {
struct msi_domain_info *info = domain->host_data;
struct msi_domain_ops *ops = info->ops;
+   int ret;
 
-   return ops->domain_alloc_irqs(domain, dev, nvec);
+   ret = ops->domain_alloc_irqs(domain, dev, nvec);
+   if (ret)
+   return ret;
+
+   if (!(info->flags & MSI_FLAG_DEV_SYSFS))
+   return 0;
+
+   ret = msi_device_populate_sysfs(dev);
+   if (ret)
+   msi_domain_free_irqs(domain, dev);
+   return ret;
 }
 
 void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
@@ -712,7 +748,9 @@ void msi_domain_free_irqs(struct irq_dom
struct msi_domain_info *info = domain->host_data;
struct msi_domain_ops *ops = info->ops;
 
-   return ops->domain_free_irqs(domain, dev);
+   if (info->flags & MSI_FLAG_DEV_SYSFS)
+   msi_device_destroy_sysfs(dev);
+   ops->domain_free_irqs(domain, dev);
 }
 
 /**



[patch V3 11/35] bus: fsl-mc-msi: Allocate MSI device data on first use

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

Allocate the MSI device data on first invocation of the allocation function.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
Cc: Stuart Yoder 
Cc: Laurentiu Tudor 
---
 drivers/bus/fsl-mc/fsl-mc-msi.c |   14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

--- a/drivers/bus/fsl-mc/fsl-mc-msi.c
+++ b/drivers/bus/fsl-mc/fsl-mc-msi.c
@@ -253,6 +253,14 @@ int fsl_mc_msi_domain_alloc_irqs(struct
struct irq_domain *msi_domain;
int error;
 
+   msi_domain = dev_get_msi_domain(dev);
+   if (!msi_domain)
+   return -EINVAL;
+
+   error = msi_setup_device_data(dev);
+   if (error)
+   return error;
+
if (!list_empty(dev_to_msi_list(dev)))
return -EINVAL;
 
@@ -260,12 +268,6 @@ int fsl_mc_msi_domain_alloc_irqs(struct
if (error < 0)
return error;
 
-   msi_domain = dev_get_msi_domain(dev);
-   if (!msi_domain) {
-   error = -EINVAL;
-   goto cleanup_msi_descs;
-   }
-
/*
 * NOTE: Calling this function will trigger the invocation of the
 * its_fsl_mc_msi_prepare() callback



[patch V3 09/35] PCI/MSI: Allocate MSI device data on first use

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

Allocate MSI device data on first use, i.e. when a PCI driver invokes one
of the PCI/MSI enablement functions.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
Acked-by: Bjorn Helgaas 
---
 drivers/pci/msi/msi.c |   20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -900,10 +900,12 @@ static int __pci_enable_msi_range(struct
 /* deprecated, don't use */
 int pci_enable_msi(struct pci_dev *dev)
 {
-   int rc = __pci_enable_msi_range(dev, 1, 1, NULL);
-   if (rc < 0)
-   return rc;
-   return 0;
+   int rc = msi_setup_device_data(&dev->dev);
+
+   if (!rc)
+   rc = __pci_enable_msi_range(dev, 1, 1, NULL);
+
+   return rc < 0 ? rc : 0;
 }
 EXPORT_SYMBOL(pci_enable_msi);
 
@@ -958,7 +960,11 @@ static int __pci_enable_msix_range(struc
 int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries,
int minvec, int maxvec)
 {
-   return __pci_enable_msix_range(dev, entries, minvec, maxvec, NULL, 0);
+   int ret = msi_setup_device_data(&dev->dev);
+
+   if (!ret)
+   ret = __pci_enable_msix_range(dev, entries, minvec, maxvec, NULL, 0);
+   return ret;
 }
 EXPORT_SYMBOL(pci_enable_msix_range);
 
@@ -985,8 +991,12 @@ int pci_alloc_irq_vectors_affinity(struc
   struct irq_affinity *affd)
 {
struct irq_affinity msi_default_affd = {0};
+   int ret = msi_setup_device_data(&dev->dev);
int nvecs = -ENOSPC;
 
+   if (ret)
+   return ret;
+
if (flags & PCI_IRQ_AFFINITY) {
if (!affd)
affd = _default_affd;



[patch V3 10/35] platform-msi: Allocate MSI device data on first use

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

Allocate the MSI device data on first invocation of the allocation function
for platform MSI private data.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
---
 drivers/base/platform-msi.c |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -204,6 +204,8 @@ platform_msi_alloc_priv_data(struct devi
 irq_write_msi_msg_t write_msi_msg)
 {
struct platform_msi_priv_data *datap;
+   int err;
+
/*
 * Limit the number of interrupts to 2048 per device. Should we
 * need to bump this up, DEV_ID_SHIFT should be adjusted
@@ -218,6 +220,10 @@ platform_msi_alloc_priv_data(struct devi
return ERR_PTR(-EINVAL);
}
 
+   err = msi_setup_device_data(dev);
+   if (err)
+   return ERR_PTR(err);
+
/* Already had a helping of MSI? Greed... */
if (!list_empty(dev_to_msi_list(dev)))
return ERR_PTR(-EBUSY);
@@ -229,7 +235,7 @@ platform_msi_alloc_priv_data(struct devi
datap->devid = ida_simple_get(&platform_msi_devid_ida,
  0, 1 << DEV_ID_SHIFT, GFP_KERNEL);
if (datap->devid < 0) {
-   int err = datap->devid;
+   err = datap->devid;
kfree(datap);
return ERR_PTR(err);
}



[patch V3 08/35] device: Add device::msi_data pointer and struct msi_device_data

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

Create struct msi_device_data and add a pointer of that type to struct
dev_msi_info, which is part of struct device. Provide an allocator function
which can be invoked from the MSI interrupt allocation code paths.

Add a properties field to the data structure as a first member so the
allocation size is not zero bytes. The field will be used later on.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 
---
 include/linux/device.h |5 +
 include/linux/msi.h|   18 ++
 kernel/irq/msi.c   |   32 
 3 files changed, 55 insertions(+)

--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -45,6 +45,7 @@ struct iommu_ops;
 struct iommu_group;
 struct dev_pin_info;
 struct dev_iommu;
+struct msi_device_data;
 
 /**
  * struct subsys_interface - interfaces to device functions
@@ -374,11 +375,15 @@ struct dev_links_info {
 /**
  * struct dev_msi_info - Device data related to MSI
  * @domain:The MSI interrupt domain associated to the device
+ * @data:  Pointer to MSI device data
  */
 struct dev_msi_info {
 #ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
struct irq_domain   *domain;
 #endif
+#ifdef CONFIG_GENERIC_MSI_IRQ
+   struct msi_device_data  *data;
+#endif
 };
 
 /**
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -171,6 +171,16 @@ struct msi_desc {
};
 };
 
+/**
+ * msi_device_data - MSI per device data
+ * @properties:MSI properties which are interesting to drivers
+ */
+struct msi_device_data {
+   unsigned long   properties;
+};
+
+int msi_setup_device_data(struct device *dev);
+
 /* Helpers to hide struct msi_desc implementation details */
 #define msi_desc_to_dev(desc)  ((desc)->dev)
 #define dev_to_msi_list(dev)   (&(dev)->msi_list)
@@ -233,10 +243,16 @@ void pci_msi_mask_irq(struct irq_data *d
 void pci_msi_unmask_irq(struct irq_data *data);
 
 #ifdef CONFIG_SYSFS
+int msi_device_populate_sysfs(struct device *dev);
+void msi_device_destroy_sysfs(struct device *dev);
+
 const struct attribute_group **msi_populate_sysfs(struct device *dev);
 void msi_destroy_sysfs(struct device *dev,
   const struct attribute_group **msi_irq_groups);
 #else
+static inline int msi_device_populate_sysfs(struct device *dev) { return 0; }
+static inline void msi_device_destroy_sysfs(struct device *dev) { }
+
 static inline const struct attribute_group **msi_populate_sysfs(struct device *dev)
 {
return NULL;
@@ -384,6 +400,8 @@ enum {
MSI_FLAG_MUST_REACTIVATE= (1 << 5),
/* Is level-triggered capable, using two messages */
MSI_FLAG_LEVEL_CAPABLE  = (1 << 6),
+   /* Populate sysfs on alloc() and destroy it on free() */
+   MSI_FLAG_DEV_SYSFS  = (1 << 7),
 };
 
 int msi_domain_set_affinity(struct irq_data *data, const struct cpumask *mask,
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -73,6 +73,38 @@ void get_cached_msi_msg(unsigned int irq
 }
 EXPORT_SYMBOL_GPL(get_cached_msi_msg);
 
+static void msi_device_data_release(struct device *dev, void *res)
+{
+   WARN_ON_ONCE(!list_empty(&dev->msi_list));
+   dev->msi.data = NULL;
+}
+
+/**
+ * msi_setup_device_data - Setup MSI device data
+ * @dev:   Device for which MSI device data should be set up
+ *
+ * Return: 0 on success, appropriate error code otherwise
+ *
+ * This can be called more than once for @dev. If the MSI device data is
+ * already allocated the call succeeds. The allocated memory is
+ * automatically released when the device is destroyed.
+ */
+int msi_setup_device_data(struct device *dev)
+{
+   struct msi_device_data *md;
+
+   if (dev->msi.data)
+   return 0;
+
+   md = devres_alloc(msi_device_data_release, sizeof(*md), GFP_KERNEL);
+   if (!md)
+   return -ENOMEM;
+
+   dev->msi.data = md;
+   devres_add(dev, md);
+   return 0;
+}
+
 #ifdef CONFIG_SYSFS
 static ssize_t msi_mode_show(struct device *dev, struct device_attribute *attr,
 char *buf)



[patch V3 07/35] device: Move MSI related data into a struct

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

The only unconditional part of MSI data in struct device is the irqdomain
pointer. Everything else can be allocated on demand. Create a data
structure and move the irqdomain pointer into it. The other MSI specific
parts are going to be removed from struct device in later steps.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Jason Gunthorpe 

---
 drivers/base/platform-msi.c |   12 ++--
 drivers/dma/ti/k3-udma.c|4 ++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |2 +-
 drivers/irqchip/irq-mvebu-icu.c |6 +++---
 drivers/soc/ti/k3-ringacc.c |4 ++--
 drivers/soc/ti/ti_sci_inta_msi.c|2 +-
 include/linux/device.h  |   20 ++--
 7 files changed, 29 insertions(+), 21 deletions(-)

--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -210,10 +210,10 @@ platform_msi_alloc_priv_data(struct devi
 * accordingly (which would impact the max number of MSI
 * capable devices).
 */
-   if (!dev->msi_domain || !write_msi_msg || !nvec || nvec > MAX_DEV_MSIS)
+   if (!dev->msi.domain || !write_msi_msg || !nvec || nvec > MAX_DEV_MSIS)
return ERR_PTR(-EINVAL);
 
-   if (dev->msi_domain->bus_token != DOMAIN_BUS_PLATFORM_MSI) {
+   if (dev->msi.domain->bus_token != DOMAIN_BUS_PLATFORM_MSI) {
dev_err(dev, "Incompatible msi_domain, giving up\n");
return ERR_PTR(-EINVAL);
}
@@ -269,7 +269,7 @@ int platform_msi_domain_alloc_irqs(struc
if (err)
goto out_free_priv_data;
 
-   err = msi_domain_alloc_irqs(dev->msi_domain, dev, nvec);
+   err = msi_domain_alloc_irqs(dev->msi.domain, dev, nvec);
if (err)
goto out_free_desc;
 
@@ -282,7 +282,7 @@ int platform_msi_domain_alloc_irqs(struc
return 0;
 
 out_free_irqs:
-   msi_domain_free_irqs(dev->msi_domain, dev);
+   msi_domain_free_irqs(dev->msi.domain, dev);
 out_free_desc:
platform_msi_free_descs(dev, 0, nvec);
 out_free_priv_data:
@@ -306,7 +306,7 @@ void platform_msi_domain_free_irqs(struc
platform_msi_free_priv_data(desc->platform.msi_priv_data);
}
 
-   msi_domain_free_irqs(dev->msi_domain, dev);
+   msi_domain_free_irqs(dev->msi.domain, dev);
platform_msi_free_descs(dev, 0, MAX_DEV_MSIS);
 }
 EXPORT_SYMBOL_GPL(platform_msi_domain_free_irqs);
@@ -354,7 +354,7 @@ struct irq_domain *
return NULL;
 
data->host_data = host_data;
-   domain = irq_domain_create_hierarchy(dev->msi_domain, 0,
+   domain = irq_domain_create_hierarchy(dev->msi.domain, 0,
 is_tree ? 0 : nvec,
 dev->fwnode, ops, data);
if (!domain)
--- a/drivers/dma/ti/k3-udma.c
+++ b/drivers/dma/ti/k3-udma.c
@@ -5279,9 +5279,9 @@ static int udma_probe(struct platform_de
if (IS_ERR(ud->ringacc))
return PTR_ERR(ud->ringacc);
 
-   dev->msi_domain = of_msi_get_domain(dev, dev->of_node,
+   dev->msi.domain = of_msi_get_domain(dev, dev->of_node,
DOMAIN_BUS_TI_SCI_INTA_MSI);
-   if (!dev->msi_domain) {
+   if (!dev->msi.domain) {
dev_err(dev, "Failed to get MSI domain\n");
return -EPROBE_DEFER;
}
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3170,7 +3170,7 @@ static void arm_smmu_setup_msis(struct a
if (!(smmu->features & ARM_SMMU_FEAT_MSI))
return;
 
-   if (!dev->msi_domain) {
+   if (!dev->msi.domain) {
dev_info(smmu->dev, "msi_domain absent - falling back to wired 
irqs\n");
return;
}
--- a/drivers/irqchip/irq-mvebu-icu.c
+++ b/drivers/irqchip/irq-mvebu-icu.c
@@ -314,12 +314,12 @@ static int mvebu_icu_subset_probe(struct
msi_data->subset_data = of_device_get_match_data(dev);
}
 
-   dev->msi_domain = of_msi_get_domain(dev, dev->of_node,
+   dev->msi.domain = of_msi_get_domain(dev, dev->of_node,
DOMAIN_BUS_PLATFORM_MSI);
-   if (!dev->msi_domain)
+   if (!dev->msi.domain)
return -EPROBE_DEFER;
 
-   msi_parent_dn = irq_domain_get_of_node(dev->msi_domain);
+   msi_parent_dn = irq_domain_get_of_node(dev->msi.domain);
if (!msi_parent_dn)
return -ENODEV;
 
--- a/drivers/soc/ti/k3-ringacc.c
+++ b/drivers/soc/ti/k3-ringacc.c
@@ -1356,9 +1356,9 @@ static int k3_ringacc_init(struct platfo
struct resourc

[patch V3 06/35] powerpc/pseries/msi: Use PCI device properties

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

instead of fiddling with MSI descriptors.

Signed-off-by: Thomas Gleixner 
Cc: Michael Ellerman 
Cc: linuxppc-...@lists.ozlabs.org
---
V3: Use pci_dev->msix_enabled - Jason
---
 arch/powerpc/platforms/pseries/msi.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -448,8 +448,7 @@ static int pseries_msi_ops_prepare(struc
   int nvec, msi_alloc_info_t *arg)
 {
struct pci_dev *pdev = to_pci_dev(dev);
-   struct msi_desc *desc = first_pci_msi_entry(pdev);
-   int type = desc->pci.msi_attrib.is_msix ? PCI_CAP_ID_MSIX : PCI_CAP_ID_MSI;
+   int type = pdev->msix_enabled ? PCI_CAP_ID_MSIX : PCI_CAP_ID_MSI;
 
return rtas_prepare_msi_irqs(pdev, nvec, type, arg);
 }



[patch V3 05/35] powerpc/cell/axon_msi: Use PCI device property

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

instead of fiddling with MSI descriptors.

Signed-off-by: Thomas Gleixner 
Cc: Arnd Bergmann 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: linuxppc-...@lists.ozlabs.org
---
V3: Use pci_dev property - Jason
V2: Invoke the function with the correct number of arguments - Andy
---
 arch/powerpc/platforms/cell/axon_msi.c |5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/arch/powerpc/platforms/cell/axon_msi.c
+++ b/arch/powerpc/platforms/cell/axon_msi.c
@@ -199,7 +199,6 @@ static struct axon_msic *find_msi_transl
 static int setup_msi_msg_address(struct pci_dev *dev, struct msi_msg *msg)
 {
struct device_node *dn;
-   struct msi_desc *entry;
int len;
const u32 *prop;
 
@@ -209,10 +208,8 @@ static int setup_msi_msg_address(struct
return -ENODEV;
}
 
-   entry = first_pci_msi_entry(dev);
-
for (; dn; dn = of_get_next_parent(dn)) {
-   if (entry->pci.msi_attrib.is_64) {
+   if (!dev->no_64bit_msi) {
prop = of_get_property(dn, "msi-address-64", );
if (prop)
break;



[patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2

2021-12-10 Thread Thomas Gleixner
This is the second part of the [PCI]MSI refactoring which aims to provide the
ability to expand MSI-X vectors after enabling MSI-X.

This is based on the first part of this work which can be found here:

https://lore.kernel.org/r/20211206210147.872865...@linutronix.de

and has been applied to:

 git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/msi


This second part has the following important changes:

   1) Cleanup of the MSI related data in struct device

  struct device contains at the moment various MSI related parts. Some
  of them (the irq domain pointer) cannot be moved out, but the rest
  can be allocated on first use. This is in preparation of adding more
  per device MSI data later on.

   2) Consolidation of sysfs handling

  As a first step this moves the sysfs pointer from struct msi_desc
  into the new per device MSI data structure where it belongs.

  Later changes will cleanup this code further, but that's not possible
  at this point.

   3) Use PCI device properties instead of looking up MSI descriptors and
  analysing their data.

   4) Provide a function to retrieve the Linux interrupt number for a given
  MSI index similar to pci_irq_vector() and cleanup all open coded
  variants.
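
A minimal usage sketch of what 4) is aiming at, assuming the accessor ends
up with a signature like msi_get_virq(dev, index); the surrounding driver
code and the "example" names below are purely illustrative and not part of
the patches in this posting:

	/* Illustration only: request the interrupt behind MSI index 0 */
	#include <linux/msi.h>
	#include <linux/interrupt.h>

	static int example_setup_msi_handler(struct device *dev,
					     irq_handler_t handler, void *priv)
	{
		unsigned int virq = msi_get_virq(dev, 0);

		if (!virq)
			return -ENOENT;

		return request_irq(virq, handler, 0, "example-msi", priv);
	}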

It's also available from git:

 git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git msi-v3-part-2

Part 3 of this effort is available on top

 git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git msi-v3-part-3

 Part 3 is not going to be reposted as there is no change vs. V2.

V2 of part 2 can be found here:

https://lore.kernel.org/r/20211206210307.625116...@linutronix.de

Changes versus V2:

  - Use PCI device properties instead of creating a new set - Jason

  - Picked up Reviewed/Tested/Acked-by tags as appropriate

Thanks,

tglx
---
 arch/powerpc/platforms/cell/axon_msi.c  |5 
 arch/powerpc/platforms/pseries/msi.c|   38 +---
 arch/x86/kernel/apic/msi.c  |5 
 arch/x86/pci/xen.c  |   11 -
 drivers/base/platform-msi.c |  152 ---
 drivers/bus/fsl-mc/dprc-driver.c|8 -
 drivers/bus/fsl-mc/fsl-mc-allocator.c   |9 -
 drivers/bus/fsl-mc/fsl-mc-msi.c |   26 +--
 drivers/dma/mv_xor_v2.c |   16 --
 drivers/dma/qcom/hidma.c|   44 ++---
 drivers/dma/ti/k3-udma-private.c|6 
 drivers/dma/ti/k3-udma.c|   14 -
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |   23 --
 drivers/irqchip/irq-mbigen.c|4 
 drivers/irqchip/irq-mvebu-icu.c |   12 -
 drivers/irqchip/irq-ti-sci-inta.c   |2 
 drivers/mailbox/bcm-flexrm-mailbox.c|9 -
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c|4 
 drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c|4 
 drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c |5 
 drivers/pci/msi/irqdomain.c |   20 ++
 drivers/pci/msi/legacy.c|6 
 drivers/pci/msi/msi.c   |  133 ++--
 drivers/pci/xen-pcifront.c  |2 
 drivers/perf/arm_smmuv3_pmu.c   |5 
 drivers/soc/fsl/dpio/dpio-driver.c  |8 -
 drivers/soc/ti/k3-ringacc.c |6 
 drivers/soc/ti/ti_sci_inta_msi.c|   22 --
 drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c  |4 
 include/linux/device.h  |   25 ++-
 include/linux/fsl/mc.h  |4 
 include/linux/msi.h |   95 
 include/linux/pci.h |1 
 include/linux/soc/ti/ti_sci_inta_msi.h  |1 
 kernel/irq/msi.c|  158 +++-
 35 files changed, 429 insertions(+), 458 deletions(-)



[patch V3 03/35] x86/apic/msi: Use PCI device MSI property

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

instead of fiddling with MSI descriptors.

Signed-off-by: Thomas Gleixner 
---
V3: Use pci_dev->msix_enabled - Jason
---
 arch/x86/kernel/apic/msi.c |5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -160,11 +160,8 @@ static struct irq_chip pci_msi_controlle
 int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
msi_alloc_info_t *arg)
 {
-   struct pci_dev *pdev = to_pci_dev(dev);
-   struct msi_desc *desc = first_pci_msi_entry(pdev);
-
init_irq_alloc_info(arg, NULL);
-   if (desc->pci.msi_attrib.is_msix) {
+   if (to_pci_dev(dev)->msix_enabled) {
arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
} else {
arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;



[patch V3 04/35] genirq/msi: Use PCI device property

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

to determine whether this is MSI or MSIX instead of consulting MSI
descriptors.

Signed-off-by: Thomas Gleixner 
---
V2: Use PCI device property - Jason
---
 kernel/irq/msi.c |   17 ++---
 1 file changed, 2 insertions(+), 15 deletions(-)

--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -77,21 +77,8 @@ EXPORT_SYMBOL_GPL(get_cached_msi_msg);
 static ssize_t msi_mode_show(struct device *dev, struct device_attribute *attr,
 char *buf)
 {
-   struct msi_desc *entry;
-   bool is_msix = false;
-   unsigned long irq;
-   int retval;
-
retval = kstrtoul(attr->attr.name, 10, &irq);
-   if (retval)
-   return retval;
-
-   entry = irq_get_msi_desc(irq);
-   if (!entry)
-   return -ENODEV;
-
-   if (dev_is_pci(dev))
-   is_msix = entry->pci.msi_attrib.is_msix;
+   /* MSI vs. MSIX is per device not per interrupt */
+   bool is_msix = dev_is_pci(dev) ? to_pci_dev(dev)->msix_enabled : false;
 
return sysfs_emit(buf, "%s\n", is_msix ? "msix" : "msi");
 }



[patch V3 02/35] x86/pci/XEN: Use PCI device property

2021-12-10 Thread Thomas Gleixner
From: Thomas Gleixner 

instead of fiddling with MSI descriptors.

Signed-off-by: Thomas Gleixner 
Cc: Juergen Gross 
Cc: xen-de...@lists.xenproject.org
---
V3: Use pci_dev->msix_enabled.
---
 arch/x86/pci/xen.c |9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -399,9 +399,7 @@ static void xen_teardown_msi_irqs(struct
 
 static void xen_pv_teardown_msi_irqs(struct pci_dev *dev)
 {
-   struct msi_desc *msidesc = first_pci_msi_entry(dev);
-
-   if (msidesc->pci.msi_attrib.is_msix)
+   if (dev->msix_enabled)
xen_pci_frontend_disable_msix(dev);
else
xen_pci_frontend_disable_msi(dev);
@@ -417,10 +415,7 @@ static int xen_msi_domain_alloc_irqs(str
if (WARN_ON_ONCE(!dev_is_pci(dev)))
return -EINVAL;
 
-   if (first_msi_entry(dev)->pci.msi_attrib.is_msix)
-   type = PCI_CAP_ID_MSIX;
-   else
-   type = PCI_CAP_ID_MSI;
+   type = to_pci_dev(dev)->msix_enabled ? PCI_CAP_ID_MSIX : PCI_CAP_ID_MSI;
 
return xen_msi_ops.setup_msi_irqs(to_pci_dev(dev), nvec, type);
 }



[patch V3 01/35] PCI/MSI: Set pci_dev::msi[x]_enabled early

2021-12-10 Thread Thomas Gleixner
There are quite some places which retrieve the first MSI descriptor to
evaluate whether the setup is for MSI or MSI-X. That's required because
pci_dev::msi[x]_enabled is only set when the setup completed successfully.

There is no real reason why msi[x]_enabled can't be set at the beginning of
the setup sequence and cleared in case of a failure.

Implement that so the MSI descriptor evaluations can be converted to simple
property queries.

Signed-off-by: Thomas Gleixner 
---
V3: New patch
---
 drivers/pci/msi/msi.c |   23 +--
 1 file changed, 17 insertions(+), 6 deletions(-)

--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -421,11 +421,18 @@ static int msi_capability_init(struct pc
struct msi_desc *entry;
int ret;
 
-   pci_msi_set_enable(dev, 0); /* Disable MSI during set up */
+   /*
+* Disable MSI during setup in the hardware, but mark it enabled
+* so that setup code can evaluate it.
+*/
+   pci_msi_set_enable(dev, 0);
+   dev->msi_enabled = 1;
 
entry = msi_setup_entry(dev, nvec, affd);
-   if (!entry)
-   return -ENOMEM;
+   if (!entry) {
+   ret = -ENOMEM;
+   goto fail;
+   }
 
/* All MSIs are unmasked by default; mask them all */
pci_msi_mask(entry, msi_multi_mask(entry));
@@ -452,7 +459,6 @@ static int msi_capability_init(struct pc
/* Set MSI enabled bits */
pci_intx_for_msi(dev, 0);
pci_msi_set_enable(dev, 1);
-   dev->msi_enabled = 1;
 
pcibios_free_irq(dev);
dev->irq = entry->irq;
@@ -461,6 +467,8 @@ static int msi_capability_init(struct pc
 err:
pci_msi_unmask(entry, msi_multi_mask(entry));
free_msi_irqs(dev);
+fail:
+   dev->msi_enabled = 0;
return ret;
 }
 
@@ -589,6 +597,9 @@ static int msix_capability_init(struct p
pci_msix_clear_and_set_ctrl(dev, 0, PCI_MSIX_FLAGS_MASKALL |
PCI_MSIX_FLAGS_ENABLE);
 
+   /* Mark it enabled so setup functions can query it */
+   dev->msix_enabled = 1;
+
pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control);
/* Request & Map MSI-X table region */
tsize = msix_table_size(control);
@@ -626,9 +637,8 @@ static int msix_capability_init(struct p
 
dev->msi_irq_groups = groups;
 
-   /* Set MSI-X enabled bits and unmask the function */
+   /* Disable INTX and unmask MSI-X */
pci_intx_for_msi(dev, 0);
-   dev->msix_enabled = 1;
pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
 
pcibios_free_irq(dev);
@@ -638,6 +648,7 @@ static int msix_capability_init(struct p
free_msi_irqs(dev);
 
 out_disable:
+   dev->msix_enabled = 0;
pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
 
return ret;



Re: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()

2021-12-10 Thread Thomas Gleixner
Jason,

On Fri, Dec 10 2021 at 08:39, Jason Gunthorpe wrote:

> On Fri, Dec 10, 2021 at 07:29:01AM +, Tian, Kevin wrote:
>> >   5) It's not possible for the kernel to reliably detect whether it is
>> >  running on bare metal or not. Yes we talked about heuristics, but
>> >  that's something I really want to avoid.
>> 
>> How would the hypercall mechanism avoid such heuristics?
>
> It is clever, we don't have an vIOMMU that supplies vIR today, so by
> definition all guests are excluded and only bare metal works.

Dammit. Now you spilled the beans. :)

>> > The charm is that his works for everything from INTx to IMS because all
>> > of them go through the same procedure, except that INTx (IO/APIC) does
>> > not support the reservation mode dance.
>
> Do we even have vIOAPIC?

It does not matter much. INTx via IOAPIC is different anyway because
INTx is shared so it's unclear from which device it originates.

> It seems reasonable - do you have any idea how this all would work on
> ARM too? IMS on baremetal ARM is surely interesting. I assume they
> have a similar issue with trapping the MSI

On baremetal it should just work once ARM is converted over. No idea
about guests. Marc should know.

>> Then Qemu needs to find out the GSI number for the vIRTE handle. 
>> Again Qemu doesn't have such information since it doesn't know 
>> which MSI[-X] entry points to this handle due to no trap.
>
> No this is already going wrong. qemu *cannot* know the MSI information
> because there is no MSI information for IMS.
>
> All qemu should get is the origin device information and data about
> how the guest wants the interrupt setup.
>
> Forget about guests and all of this complexity, design how to make
> VFIO work with IMS in pure userspace like DPDK.
>
> We must have a VFIO ioctl to acquire a addr/data pair and link it to
> an event fd.
>
> I'm not sure exactly how this should be done, it is 90% of what IMS
> is, except the VFIO irq_chip cannot touch any real HW and certainly
> cannot do mask/unmask..
>
> Maybe that is OK now that it requires IR?

IR unfortunately does not allow masking, but we surely can come up with some
emergency button to press when e.g. an interrupt storm is detected.

Thanks,

tglx


RE: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()

2021-12-10 Thread Thomas Gleixner
Kevin,

On Fri, Dec 10 2021 at 07:29, Kevin Tian wrote:
>> From: Thomas Gleixner 
>>   4) For the guest side we agreed that we need an hypercall because the
>>  host can't trap the write to the MSI[-X] entry anymore.
>
> To be accurate I'd like to not call it "can't trap". The host still traps the 
> MSI/MSI-X entry if the hypercall is not used. This is for current guest 
> OS which doesn't have this hypercall mechanism. For future new guest
> OS which will support this machinery then a handshake process from
> such guest will disable the trap for MSI-X and map it for direct guest
> access in the fly.

Right. What I'm suggesting is a clear cut between the current approach,
which obviously needs to be preserved, and a consistent new approach
which handles MSI, MSI-X and IMS in exactly the same way.

> MSI has to be always trapped although the guest has acquired the right 
> data/addr pair via the hypercall, since it's a PCI config capability.
>
>> 
>>  Aside of the fact that this creates a special case for IMS which is
>>  undesirable in my opinion, it's not really obvious where the
>>  hypercall should be placed to work for all scenarios so that it can
>>  also solve the existing issue of silent failures.
>> 
>>   5) It's not possible for the kernel to reliably detect whether it is
>>  running on bare metal or not. Yes we talked about heuristics, but
>>  that's something I really want to avoid.
>
> How would the hypercall mechanism avoid such heuristics?

The availability of interrupt remapping (IR), where the irqdomain provided
by the remapping unit signals that it supports this new scheme:

|--IO/APIC
|--MSI
 vector -- IR --|--MSI-X
|--IMS

while the current scheme is:

|--IO/APIC
 vector -- IR --|--PCI/MSI[-X]

or 

|--IO/APIC
 vector |--PCI/MSI[-X]

So in the new scheme the IR domain will advertise new features which are
not available on older kernels. The availability of these new features
is the indicator for the interrupt subsystem and subsequently for PCI
whether IMS is supported or not.

Bootup either finds an IR unit or not. In the bare metal case that's the
usual hardware/firmware detection. In the guest case it's the
availability of vIR including the required hypercall protocol.

So for the guest case the initialization would look like this:

   qemu starts guest
  Method of interrupt management defaults to current scheme
  restricted to MSI/MSI-X

  guest initializes
  older guests do not check for the hypercall so everything
  continues as of today

  new guest initializes vIR, detects hypercall and requests
  from the hypervisor to switch over to the new scheme.

  The hypervisor grants or rejects the request, i.e. either both
  switch to the new scheme or stay with the old one.

The new scheme means that all variants, MSI, MSI-X and IMS, are handled in
the same way. Staying on the old scheme means that IMS is not available
to the guest.
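
To make that sequence a bit more concrete, a purely illustrative guest
side sketch. None of these helpers exist anywhere; the names are
invented and only show where the negotiation would hook in:

    /*
     * Hypothetical sketch: vir_hypercall_supported() and
     * vir_hypercall_enable_new_scheme() are made-up names.
     */
    static void __init vir_negotiate_msi_scheme(void)
    {
            /* Old guests never get here as they do not probe the hypercall */
            if (!vir_hypercall_supported())
                    return;

            /*
             * Ask the hypervisor to switch over. If it rejects the request,
             * both sides stay on the trapped MSI/MSI-X handling and IMS
             * remains unavailable to the guest.
             */
            if (vir_hypercall_enable_new_scheme())
                    pr_info("vIR: staying on legacy MSI/MSI-X handling\n");
    }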

Having that clear separation is in my opinion way better than trying to
make all of that a big maze of conditionals.

I'm going to make that clear cut in the PCI/MSI management layer as
well. Trying to do the hybrid we do today for irqdomain and
non-irqdomain based backends is just not feasible. The decision which way to
go will be very close to the driver exposed API, i.e.:

   pci_alloc_ims_vector()
if (new_scheme())
return new_scheme_alloc_ims();
else
return -ENOTSUPP;

and new_scheme_alloc_ims() will have:

   new_scheme_alloc_ims()
if (!system_supports_ims())
return -ENOTSUPP;


system_supports_ims() makes its decision based on the feature flags
exposed by the underlying base MSI irqdomain, i.e. either vector or IR
on x86.

Vector domain will not have that feature flag set, but IR will have on
bare metal and eventually in the guest when the vIR setup and hypercall
detection and negotiation succeeds.
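
For illustration, such a check could boil down to something like the
sketch below. MSI_FLAG_PCI_IMS is a made-up flag name; the point is
merely that the decision is a flag lookup on the MSI domain which
serves the device:

    /* Sketch only: the IMS feature flag is hypothetical */
    static bool system_supports_ims(struct pci_dev *pdev)
    {
            struct irq_domain *domain = dev_get_msi_domain(&pdev->dev);
            struct msi_domain_info *info;

            if (!domain)
                    return false;

            info = msi_get_domain_info(domain);

            /* Only domains advertising the (hypothetical) flag support IMS */
            return info && (info->flags & MSI_FLAG_PCI_IMS);
    }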

> Then Qemu needs to find out the GSI number for the vIRTE handle. 
> Again Qemu doesn't have such information since it doesn't know 
> which MSI[-X] entry points to this handle due to no trap.
>
> This implies that we may also need carry device ID, #msi entry, etc. 
> in the hypercall, so Qemu can associate the virtual routing info
> to the right [irqfd, gsi].
>
> In your model the hypercall is raised by IR domain. Do you see
> any problem of finding those information within IR domain?

IR has the following information available:

   Interrupt type
- MSI:   Device, index and number of vectors
- MSI-X: Device, index
- IMS:   Device, index

  Target APIC/vector pair

IMS: The index depends on the storage type:

 Fo

Re: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()

2021-12-09 Thread Thomas Gleixner
On Thu, Dec 09 2021 at 23:09, Thomas Gleixner wrote:
> On Thu, Dec 09 2021 at 16:58, Jason Gunthorpe wrote:
>> Okay, I think I get it. Would be nice to have someone from intel
>> familiar with the vIOMMU protocols and qemu code remark what the
>> hypervisor side can look like.
>>
>> There is a bit more work here, we'd have to change VFIO to somehow
>> entirely disconnect the kernel IRQ logic from the MSI table and
>> directly pass control of it to the guest after the hypervisor IOMMU IR
>> secures it. ie directly mmap the msi-x table into the guest
>
> That makes everything consistent and a clear cut on all levels, right?

Let me give a bit more rationale here about why I think this is the right
thing to do. There are several problems with IMS both on the host and on
the guest side:

  1) Contrary to MSI/MSI-X the address/data pair is not completely
 managed by the core. It's handed off to driver writers in the
 hope they get it right.

  2) Without interrupt remapping there is a fundamental issue on x86
 for the affinity setting case, as there is no guarantee that
 the magic protocol which we came up with (see msi_set_affinity()
 in the x86 code) is correctly implemented at the driver level or
 that the update is truly atomic so that the problem does not
 arise. My interest in chasing these things is exactly zero.

 With interrupt remapping the affinity change happens at the IRTE
 level and not at the device level. It's a one time setup for the
 device.

 Just for the record:

 The ATH11 thing does not have that problem by pure luck because
 multi-vector MSI is not supported on X86 unless interrupt
 remapping is enabled. 

 The switchtec NTB thing will fall apart w/o remapping AFAICT.

  3) With remapping the message for the device is constructed at
 allocation time. It does not change after that because the affinity
 change happens at the remapping level, which eliminates #2 above.

 That has another advantage for IMS because it does not require any
 synchronization with the queue or whatever is involved. The next
 interrupt after the change at the remapping level ends up on the
 new target.

  4) For the guest side we agreed that we need a hypercall because the
 host can't trap the write to the MSI[-X] entry anymore.

 Aside of the fact that this creates a special case for IMS which is
 undesirable in my opinion, it's not really obvious where the
 hypercall should be placed to work for all scenarios so that it can
 also solve the existing issue of silent failures.

  5) It's not possible for the kernel to reliably detect whether it is
 running on bare metal or not. Yes we talked about heuristics, but
 that's something I really want to avoid.

When looking at the above I came to the conclusion that the consistent
way is to make IMS depend on IR both on the host and the guest as this
solves all of the above in one go.

How would that work? With IR the irqdomain hierarchy looks like this:

   |--IO/APIC
   |--MSI
vector -- IR --|--MSI-X
   |--IMS

There are several contexts where this matters:

  1) Allocation of an interrupt, e.g. pci_alloc_irq_vectors().

  2) Activation of an interrupt which happens during allocation and/or
 at request_irq() time

  3) Interrupt affinity setting

#1 Allocation

   That allocates an IRTE, which can fail

#2 Activation

   That's the step where actually a CPU vector is allocated, where the
   IRTE is updated and where the device message is composed to target
   the IRTE.

   On X86 activation is happening twice:

   1) During allocation it allocates a special CPU vector which is
  handed out to all allocated interrupts. That's called reservation
  mode. This was introduced to prevent vector exhaustion for two
  cases:
  
   - Devices allocating tons of MSI-X vectors without using
 them. That obviously needs to be fixed at the device driver
 level, but due to the fact that post probe() allocation is not
 supported, that's not always possible

   - CPU hotunplug

 All vectors targeting the outgoing CPU need to be migrated to a
 new target CPU, which can result in exhaustion of the vector
 space.

 Reservation mode avoids that because it just uses a unique
 vector for all interrupts which are allocated but not
 requested.

2) On request_irq()

   As the vector assigned during allocation is just a placeholder
   to make the MSI hardware happy it needs to be replaced by a
   real vector.

   Both can fail and the error is propagated through the call chain
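
Seen from a driver, both failure points surface as plain error returns.
Roughly (a simplified sketch; device and handler names are made up):

    /* Sketch: my_irq_handler and the surrounding driver are invented */
    static int my_dev_request_irq(struct pci_dev *pdev, void *priv)
    {
            int nvec, ret;

            /* Allocation: IRTE plus reservation mode vector */
            nvec = pci_alloc_irq_vectors(pdev, 1, 1,
                                         PCI_IRQ_MSIX | PCI_IRQ_MSI);
            if (nvec < 0)
                    return nvec;

            /* request_irq(): the real CPU vector replaces the reservation one */
            ret = request_irq(pci_irq_vector(pdev, 0), my_irq_handler, 0,
                              "my_dev", priv);
            if (ret)
                    pci_free_irq_vectors(pdev);

            return ret;
    }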

#3 Changing the interrupt affinity

   This obviously needs to allocate a new target CPU vector and update
   the IRTE.

   Allocating a new target CPU vector can fail.

When looking at it from

Re: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()

2021-12-09 Thread Thomas Gleixner
On Thu, Dec 09 2021 at 16:58, Jason Gunthorpe wrote:
> On Thu, Dec 09, 2021 at 09:32:42PM +0100, Thomas Gleixner wrote:
>> That was my thought to avoid having different mechanisms.
>> 
>> The address/data pair is computed in two places:
>> 
>>   1) Activation of an interrupt
>>   2) Affinity setting on an interrupt
>> 
>> Both configure the IRTE when interrupt remapping is in place.
>> 
>> In both cases a vector is allocated in the vector domain and based on
>> the resulting target APIC / vector number pair the IRTE is
>> (re)configured.
>> 
>> So putting the hypercall into the vIRTE update is the obvious
>> place. Both activation and affinity setting can fail and propagate an
>> error code down to the originating caller.
>> 
>> Hmm?
>
> Okay, I think I get it. Would be nice to have someone from intel
> familiar with the vIOMMU protocols and qemu code remark what the
> hypervisor side can look like.
>
> There is a bit more work here, we'd have to change VFIO to somehow
> entirely disconnect the kernel IRQ logic from the MSI table and
> directly pass control of it to the guest after the hypervisor IOMMU IR
> secures it. ie directly mmap the msi-x table into the guest

That makes everything consistent and a clear cut on all levels, right?

Thanks,

tglx



Re: [patch V2 19/36] PCI/MSI: Store properties in device::msi::data

2021-12-09 Thread Thomas Gleixner
On Thu, Dec 09 2021 at 18:53, Thomas Gleixner wrote:
> On Wed, Dec 08 2021 at 11:58, Jason Gunthorpe wrote:
>> On Mon, Dec 06, 2021 at 11:39:26PM +0100, Thomas Gleixner wrote:
>>> Store the properties which are interesting for various places so the MSI
>>> descriptor fiddling can be removed.
>>> 
>>> Signed-off-by: Thomas Gleixner 
>>> ---
>>> V2: Use the setter function
>>> ---
>>>  drivers/pci/msi/msi.c |    8 ++++++++
>>>  1 file changed, 8 insertions(+)
>>
>> I took more time to look at this, to summarize my remarks on the other
>> patches
>>
>> I think we don't need properties. The info in the msi_desc can come
>> from the pci_dev which we have easy access to. This seems overall
>> clearer
>
> I fixed that now.

So much for the theory. dev->msi[x]_enabled are set after everything is
set up. Some of the places are part of the setup...

/me goes back to stare


Re: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()

2021-12-09 Thread Thomas Gleixner
On Thu, Dec 09 2021 at 12:21, Jason Gunthorpe wrote:
> On Thu, Dec 09, 2021 at 09:37:06AM +0100, Thomas Gleixner wrote:
> If we keep the MSI emulation in the hypervisor then MSI != IMS.  The
> MSI code needs to write an addr/data pair compatible with the emulation
> and the IMS code needs to write an addr/data pair from the
> hypercall. Seems like this scenario is best avoided!
>
> From this perspective I haven't connected how virtual interrupt
> remapping helps in the guest? Is this a way to provide the hypercall
> I'm imagining above?

That was my thought to avoid having different mechanisms.

The address/data pair is computed in two places:

  1) Activation of an interrupt
  2) Affinity setting on an interrupt

Both configure the IRTE when interrupt remapping is in place.

In both cases a vector is allocated in the vector domain and based on
the resulting target APIC / vector number pair the IRTE is
(re)configured.

So putting the hypercall into the vIRTE update is the obvious
place. Both activation and affinity setting can fail and propagate an
error code down to the originating caller.
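
For illustration, a rough sketch of the shape of such a hook.
vir_hypercall_update_irte() is entirely invented; the only point is
that the error is propagated:

    /* Hypothetical: called from activate and set_affinity via the IRTE update */
    static int vir_update_irte(struct irq_data *data, unsigned int apicid,
                               unsigned int vector)
    {
            int ret;

            /* Tell the hypervisor about the new APIC/vector target */
            ret = vir_hypercall_update_irte(irqd_to_hwirq(data), apicid, vector);
            if (ret)
                    pr_warn("vIR: IRTE update failed (%d)\n", ret);

            return ret;
    }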

Hmm?

Thanks,

tglx




Re: [patch V2 19/36] PCI/MSI: Store properties in device::msi::data

2021-12-09 Thread Thomas Gleixner
On Wed, Dec 08 2021 at 11:58, Jason Gunthorpe wrote:
> On Mon, Dec 06, 2021 at 11:39:26PM +0100, Thomas Gleixner wrote:
>> Store the properties which are interesting for various places so the MSI
>> descriptor fiddling can be removed.
>> 
>> Signed-off-by: Thomas Gleixner 
>> ---
>> V2: Use the setter function
>> ---
>>  drivers/pci/msi/msi.c |    8 ++++++++
>>  1 file changed, 8 insertions(+)
>
> I took more time to look at this, to summarize my remarks on the other
> patches
>
> I think we don't need properties. The info in the msi_desc can come
> from the pci_dev which we have easy access to. This seems overall
> clearer

I fixed that now.

> The notable one is the sysfs, but that is probably better handled by
> storing a
>
>   const char *sysfs_label
>
> in the dev->msi and emitting that instead of computing it.

I just compute it for now via is_pci_dev() and
to_pci_dev()->msi_enabled.

We are still debating whether to remove the whole thing completely.

Thanks,

tglx


RE: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()

2021-12-09 Thread Thomas Gleixner
On Thu, Dec 09 2021 at 12:17, Kevin Tian wrote:
>> From: Thomas Gleixner 
>> I think you are looking at that from the internal implementation details
>> of IDXD. But you can just model it in an IDXD implementation agnostic
>> way:
>> 
>>  ENQCMD(PASID, IMS-ENTRY, ...)
>
> Not exactly IMS-ENTRY. MSI-ENTRY also works.

Sure.

>> 
>> implies an on demand allocation of a virtual queue, which is deallocated
>> when the command completes. The PASID and IMS-ENTRY act as the 'queue'
>> identifier.
>> 
>> The implementation detail of IDXD that it executes these computations on
>> an internal shared workqueue does not change that.
>> 
>> Such a workqueue can only execute one enqueued command at a time,
>> which
>> means that during the execution of a particular command that IDXD
>> internal workqueue represents the 'virtual queue' which is identified by
>> the unique PASID/IMS-ENTRY pair.
>
> While it's one way of looking at this model, do we want to actually
> create some objects to represent this 'virtual queue' concept? That
> implies each ENQCMD must be moderated to create such short-lifespan
> objects and I'm not sure of the benefit of doing so.

You don't have to create anything. The PASID/ENTRY pair represents that
'virtual queue', right?

> If not then from driver p.o.v it's still one queue resource and driver 
> needs to manage its association with multiple interrupt entries and 
> PASIDs when it's connected to multiple clients.

That's correct, but there is nothing problematic with it. It's like
allocating multiple interrupts for any other hardware device or
subdevice, right?

What's probably more interesting is how the PASID/interrupt/RID
relations are managed.

Thanks,

tglx


