Re: [PATCH v9 06/19] x86: Add early SHA-1 support for Secure Launch early measurements
On Thu, Aug 15 2024 at 13:38, Daniel P. Smith wrote:
> On 5/31/24 09:54, Eric W. Biederman wrote:
>> Eric Biggers writes:
>>> That paragraph is also phrased as a hypothetical, "Even if we'd prefer
>>> to use SHA-256-only". That implies that you do not, in fact, prefer
>>> SHA-256 only. Is that the case? Sure, maybe there are situations where
>>> you *have* to use SHA-1, but why would you not at least *prefer*
>>> SHA-256?
>>
>> Yes. Please prefer to use SHA-256.
>>
>> Have you considered implementing I think it is SHA1-DC (as git has) that
>> is compatible with SHA1 but blocks the known class of attacks where
>> sha1 is actively broken at this point?
>
> We are using the kernel's implementation, addressing what the kernel
> provides is beyond our efforts. Perhaps someone who is interested in
> improving the kernel's SHA1 could submit a patch implementing/replacing
> it with SHA1-DC, as I am sure the maintainers would welcome the help.

Well, someone who is interested in getting his "secure" code merged should
have a vested interest in a non-broken SHA1 implementation if there is a
sensible requirement to use SHA1 in that new "secure" code, no?

Just for the record. The related maintainers can rightfully decide to
reject known broken "secure" code on a purely technical argument.

Thanks,

        tglx
Re: [PATCH v6 22/29] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector
On Fri, May 13 2022 at 15:16, Ricardo Neri wrote:
> On Mon, May 09, 2022 at 04:03:39PM +0200, Thomas Gleixner wrote:
>> > +		/* If we are here, IPI shorthands are enabled. */
>> > +		apic->send_IPI_allbutself(NMI_VECTOR);
>>
>> So if the monitored cpumask is a subset of online CPUs, which is the
>> case when isolation features are enabled, then you still send NMIs to
>> those isolated CPUs. I'm sure the isolation folks will be enthused.
>
> Yes, I acknowledged this limitation in the cover letter. I should also
> update Documentation/admin-guide/lockup-watchdogs.rst.
>
> This patchset proposes the HPET NMI watchdog as an opt-in feature.
>
> Perhaps the limitation might be mitigated by adding a check for
> non-housekeeping and non-monitored CPUs in exc_nmi(). However, that will
> not eliminate the problem of isolated CPUs also getting the NMI.

Right. It's a mess...

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v6 15/29] x86/hpet: Add helper function hpet_set_comparator_periodic()
On Fri, May 13 2022 at 14:19, Ricardo Neri wrote:
> On Fri, May 06, 2022 at 11:41:13PM +0200, Thomas Gleixner wrote:
>> The argument about not bloating the code with an "obvious???" function
>> which is quite small is slightly beyond my comprehension level.
>
> That obvious function would look like this:
>
> void hpet_set_comparator_one_shot(int channel, u32 delta)
> {
> 	u32 count;
>
> 	count = hpet_readl(HPET_COUNTER);
> 	count += delta;
> 	hpet_writel(count, HPET_Tn_CMP(channel));
> }

This function only works reliably when the delta is large. See
hpet_clkevt_set_next_event().

Thanks,

        tglx
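The race tglx is pointing at — the free-running counter can pass the freshly written comparator before the write takes effect, so a small delta silently misses the event — can be sketched outside the kernel. The following is a minimal userspace model, not kernel code: the register accessors are simulated stand-ins for real HPET MMIO, and the miss check mirrors the wrap-safe comparison style used by hpet_clkevt_set_next_event().

```c
#include <stdint.h>

/* Simulated HPET state -- stand-ins for real MMIO, for illustration only. */
uint32_t sim_counter;
uint32_t sim_cmp;

/* The counter keeps ticking while we compute; model that by advancing
 * it on every read. */
uint32_t hpet_read_counter(void) { return sim_counter += 8; }
void hpet_write_cmp(uint32_t v) { sim_cmp = v; }

/*
 * One-shot programming in the style of hpet_clkevt_set_next_event():
 * after writing the comparator, re-read the counter and check whether it
 * already passed the comparator. If so, the interrupt was missed and the
 * caller must retry with a larger delta. Returns 0 on success, -1 if the
 * event was programmed in the past.
 */
int hpet_set_oneshot(uint32_t delta)
{
        uint32_t cnt = hpet_read_counter() + delta;

        hpet_write_cmp(cnt);
        /* signed, wrap-safe: did the counter already pass cnt? */
        return ((int32_t)(hpet_read_counter() - cnt) >= 0) ? -1 : 0;
}
```

With the simulated counter advancing 8 ticks per read, a delta of 4 is always missed while a large delta succeeds — which is exactly why the "obvious" helper above is not reliable without this check.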
Re: [PATCH v6 05/29] x86/apic/vector: Do not allocate vectors for NMIs
On Fri, May 13 2022 at 16:45, Ricardo Neri wrote:
> On Fri, May 13, 2022 at 10:50:09PM +0200, Thomas Gleixner wrote:
>> > Also, if lapic_nmi_controller.irq_set_affinity() is NULL, then
>> > irq_chips INTEL-IR, AMD-IR, those using msi_domain_set_affinity()
>> > need to check for NULL. They currently unconditionally call the
>> > parent irq_chip's irq_set_affinity(). I see that there is a
>> > irq_chip_set_affinity_parent() function. Perhaps it can be used for
>> > this check?
>>
>> Yes, this lacks obviously a NMI specific set_affinity callback and this
>> can be very trivial and does not have any of the complexity of
>> interrupt affinity assignment. First online CPU in the mask with a
>> fallback to any online CPU.
>
> Why would we need a fallback to any online CPU? Shouldn't it fail if it
> cannot find an online CPU in the mask?

Might as well fail. Let me think about it.

>> I did not claim that this is complete. This was for illustration.
>
> In the reworked patch, may I add a Co-developed-by with your name and
> your SOB?

Suggested-by is good enough.

Thanks,

        tglx
Re: [PATCH v6 05/29] x86/apic/vector: Do not allocate vectors for NMIs
On Fri, May 13 2022 at 11:03, Ricardo Neri wrote:
> On Fri, May 06, 2022 at 11:12:20PM +0200, Thomas Gleixner wrote:
>> Why would a NMI ever end up in this code? There is no vector management
>> required and this find cpu exercise is pointless.
>
> But even if the NMI has a fixed vector, it is still necessary to
> determine which CPU will get the NMI. It is still necessary to determine
> what to write in the Destination ID field of the MSI message.
>
> irq_matrix_find_best_cpu() would find the CPU with the lowest number of
> managed vectors so that the NMI is directed to that CPU.

What's the point to send it to the CPU with the lowest number of
interrupts. It's not that this NMI happens every 50 microseconds. We
pick one online CPU and are done.

> In today's code, an NMI would end up here because we rely on the
> existing interrupt management infrastructure... Unless, the check is
> done the entry points as you propose.

Correct. We don't want to call into functions which are not designed for
NMIs.

>> > +
>> > +	if (apicd->hw_irq_cfg.delivery_mode == APIC_DELIVERY_MODE_NMI) {
>> > +		cpu = irq_matrix_find_best_cpu_managed(vector_matrix, dest);
>> > +		apicd->cpu = cpu;
>> > +		vector = 0;
>> > +		goto no_vector;
>> > +	}
>>
>> This code can never be reached for a NMI delivery. If so, then it's a
>> bug.
>>
>> This all is special purpose for that particular HPET NMI watchdog use
>> case and we are not exposing this to anything else at all.
>>
>> So why are you sprinkling this NMI nonsense all over the place? Just
>> because? There are well defined entry points to all of this where this
>> can be fenced off.
>
> I put the NMI checks in these points because assign_vector_locked() and
> assign_managed_vector() are reached through multiple paths and these are
> the two places where the allocation of the vector is requested and the
> destination CPU is determined.
>
> I do observe this code being reached for an NMI, but that is because
> this code still does not know about NMIs... Unless the checks for NMI
> are put in the entry points as you pointed out.
>
> The intent was to refactor the code in a generic manner and not to focus
> only in the NMI watchdog. That would have looked hacky IMO.

We don't want to have more of this really. Supporting NMIs on x86 in a
broader way is simply not reasonable because there is only one NMI vector
and we have no sensible way to get to the cause of the NMI without a
massive overhead.

Even if we get multiple NMI vectors some shiny day, this will be
fundamentally different than regular interrupts and certainly not exposed
broadly. There will be 99.99% fixed vectors for simplicity sake.

>> +	if (info->flags & X86_IRQ_ALLOC_AS_NMI) {
>> +		/*
>> +		 * NMIs have a fixed vector and need their own
>> +		 * interrupt chip so nothing can end up in the
>> +		 * regular local APIC management code except the
>> +		 * MSI message composing callback.
>> +		 */
>> +		irqd->chip = &lapic_nmi_controller;
>> +		/*
>> +		 * Don't allow affinity setting attempts for NMIs.
>> +		 * This cannot work with the regular affinity
>> +		 * mechanisms and for the intended HPET NMI
>> +		 * watchdog use case it's not required.
>
> But we do need the ability to set affinity, right? As stated above, we
> need to know what Destination ID to write in the MSI message or in the
> interrupt remapping table entry.
>
> It cannot be any CPU because only one specific CPU is supposed to handle
> the NMI from the HPET channel.
>
> We cannot hard-code a CPU for that because it may go offline (and ignore
> NMIs) or not be part of the monitored CPUs.
>
> Also, if lapic_nmi_controller.irq_set_affinity() is NULL, then irq_chips
> INTEL-IR, AMD-IR, those using msi_domain_set_affinity() need to check
> for NULL. They currently unconditionally call the parent irq_chip's
> irq_set_affinity(). I see that there is a irq_chip_set_affinity_parent()
> function. Perhaps it can be used for this check?

Yes, this lacks obviously a NMI specific set_affinity callback and this
can be very trivial and does not have any of the complexity of interrupt
affinity assignment. First online CPU in the mask with a fallback to any
online CPU.

I did not claim that this is complete. This was for illustration.

>> +		 */
>> +		irqd_set_no_balance(irqd);
>
> This code does not set apicd->hw_irq_cfg.delivery_mode as NMI, right?
> I had to add that to make it work.

I assumed you can figure that out on your own :)

Thanks,

        tglx
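The "first online CPU in the mask with a fallback to any online CPU" policy tglx describes is trivial to express. A sketch using plain 64-bit masks in place of the kernel's cpumask API — the helper name and the mask representation are illustrative stand-ins, not kernel interfaces:

```c
#include <stdint.h>

/*
 * Pick the target CPU for an NMI: the first CPU that is both requested
 * and online; if the requested mask has no online CPU, fall back to any
 * online CPU. Returns the CPU number, or -1 if nothing is online.
 * Bit n of each mask represents CPU n.
 */
int nmi_pick_cpu(uint64_t requested, uint64_t online)
{
        uint64_t pick = requested & online;

        if (!pick)
                pick = online;  /* fallback: any online CPU */
        if (!pick)
                return -1;
        return __builtin_ctzll(pick);  /* lowest set bit = first CPU */
}
```

This is the entire "complexity" of the NMI set_affinity callback: no vector allocation, no matrix accounting, just Destination ID selection. (Whether to fail instead of falling back, as Ricardo suggests, is a one-line change: drop the fallback assignment.)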
Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz
On Tue, May 10 2022 at 21:16, Nicholas Piggin wrote:
> Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
>> +	/*
>> +	 * If in use, the HPET hardlockup detector relies on tsc_khz.
>> +	 * Reconfigure it to make use of the refined tsc_khz.
>> +	 */
>> +	lockup_detector_reconfigure();
>
> I don't know if the API is conceptually good.
>
> You change something that the lockup detector is currently using,
> *while* the detector is running asynchronously, and then reconfigure
> it. What happens in the window? If this code is only used for small
> adjustments maybe it does not really matter but in principle it's
> a bad API to export.
>
> lockup_detector_reconfigure as an internal API is okay because it
> reconfigures things while the watchdog is stopped [actually that
> looks untrue for soft dog which uses watchdog_thresh in
> is_softlockup(), but that should be fixed].
>
> You're the arch so you're allowed to stop the watchdog and configure
> it, e.g., hardlockup_detector_perf_stop() is called in arch/.
>
> So you want to disable HPET watchdog if it was enabled, then update
> wherever you're using tsc_khz, then re-enable.

The real question is whether making this refined tsc_khz value
immediately effective matters at all.

IMO, it does not because up to that point the watchdog was happily using
the coarse calibrated value and the whole use TSC to assess whether the
HPET fired mechanism is just a guestimate anyway. So what's the point of
trying to guess 'more correct'.

Thanks,

        tglx
Re: [PATCH v6 22/29] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector
On Thu, May 05 2022 at 17:00, Ricardo Neri wrote:
> +	if (is_hpet_hld_interrupt(hdata)) {
> +		/*
> +		 * Kick the timer first. If the HPET channel is periodic, it
> +		 * helps to reduce the delta between the expected TSC value and
> +		 * its actual value the next time the HPET channel fires.
> +		 */
> +		kick_timer(hdata, !(hdata->has_periodic));
> +
> +		if (cpumask_weight(hld_data->monitored_cpumask) > 1) {
> +			/*
> +			 * Since we cannot know the source of an NMI, the best
> +			 * we can do is to use a flag to indicate to all online
> +			 * CPUs that they will get an NMI and that the source of
> +			 * that NMI is the hardlockup detector. Offline CPUs
> +			 * also receive the NMI but they ignore it.
> +			 *
> +			 * Even though we are in NMI context, we have concluded
> +			 * that the NMI came from the HPET channel assigned to
> +			 * the detector, an event that is infrequent and only
> +			 * occurs in the handling CPU. There should not be races
> +			 * with other NMIs.
> +			 */
> +			cpumask_copy(hld_data->inspect_cpumask,
> +				     cpu_online_mask);
> +
> +			/* If we are here, IPI shorthands are enabled. */
> +			apic->send_IPI_allbutself(NMI_VECTOR);

So if the monitored cpumask is a subset of online CPUs, which is the case
when isolation features are enabled, then you still send NMIs to those
isolated CPUs. I'm sure the isolation folks will be enthused.

Thanks,

        tglx
Re: [PATCH v6 21/29] x86/nmi: Add an NMI_WATCHDOG NMI handler category
On Thu, May 05 2022 at 17:00, Ricardo Neri wrote:
> Add a NMI_WATCHDOG as a new category of NMI handler. This new category
> is to be used with the HPET-based hardlockup detector. This detector
> does not have a direct way of checking if the HPET timer is the source
> of the NMI. Instead, it indirectly estimates it using the time-stamp
> counter.
>
> Therefore, we may have false-positives in case another NMI occurs within
> the estimated time window. For this reason, we want the handler of the
> detector to be called after all the NMI_LOCAL handlers. A simple way
> of achieving this with a new NMI handler category.
>
> @@ -379,6 +385,10 @@ static noinstr void default_do_nmi(struct pt_regs *regs)
> 	}
> 	raw_spin_unlock(&nmi_reason_lock);
>
> +	handled = nmi_handle(NMI_WATCHDOG, regs);
> +	if (handled == NMI_HANDLED)
> +		goto out;
> +

How is this supposed to work reliably?

If perf is active and the HPET NMI and the perf NMI come in around the
same time, then nmi_handle(LOCAL) can swallow the NMI and the watchdog
won't be checked. Because MSI is strictly edge and the message is only
sent once, this can result in a stale watchdog, no?

Thanks,

        tglx
Re: [PATCH v6 15/29] x86/hpet: Add helper function hpet_set_comparator_periodic()
On Fri, May 06 2022 at 23:41, Thomas Gleixner wrote:
> On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
>> Programming an HPET channel as periodic requires setting the
>> HPET_TN_SETVAL bit in the channel configuration. Plus, the comparator
>> register must be written twice (once for the comparator value and once
>> for the periodic value). Since this programming might be needed in
>> several places (e.g., the HPET clocksource and the HPET-based
>> hardlockup detector), add a helper function for this purpose.
>>
>> A helper function hpet_set_comparator_oneshot() could also be
>> implemented. However, such function would only program the comparator
>> register and the function would be quite small. Hence, it is better to
>> not bloat the code with such an obvious function.
>
> This word salad above does not provide a single reason why the periodic
> programming function is required and better suited for the NMI watchdog
> case and then goes on and blurbs about why a function which is not
> required is not implemented. The argument about not bloating the code
> with an "obvious???" function which is quite small is slightly beyond
> my comprehension level.

What's even more incomprehensible is that the patch which actually sets
up that NMI watchdog cruft has:

> +	if (hc->boot_cfg & HPET_TN_PERIODIC_CAP)
> +		hld_data->has_periodic = true;

So how the heck does that work with a HPET which does not support
periodic mode? That watchdog muck will still happily invoke that set
periodic function in the hope that it works by chance?

Thanks,

        tglx
Re: [PATCH v6 15/29] x86/hpet: Add helper function hpet_set_comparator_periodic()
On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> Programming an HPET channel as periodic requires setting the
> HPET_TN_SETVAL bit in the channel configuration. Plus, the comparator
> register must be written twice (once for the comparator value and once
> for the periodic value). Since this programming might be needed in
> several places (e.g., the HPET clocksource and the HPET-based hardlockup
> detector), add a helper function for this purpose.
>
> A helper function hpet_set_comparator_oneshot() could also be
> implemented. However, such function would only program the comparator
> register and the function would be quite small. Hence, it is better to
> not bloat the code with such an obvious function.

This word salad above does not provide a single reason why the periodic
programming function is required and better suited for the NMI watchdog
case and then goes on and blurbs about why a function which is not
required is not implemented. The argument about not bloating the code
with an "obvious???" function which is quite small is slightly beyond my
comprehension level.

Thanks,

        tglx
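The two-write sequence the changelog describes — arm HPET_TN_SETVAL, write the comparator value, then write the period — can be modeled in a few lines. The register behavior below is a deliberately simplified userspace stand-in for the real HPET MMIO semantics (in hardware, Tn_VAL_SET_CNF lets the next comparator write set the accumulator); only the bit value of HPET_TN_SETVAL matches the kernel header, everything else is illustrative:

```c
#include <stdint.h>

#define HPET_TN_SETVAL 0x040  /* Tn_VAL_SET_CNF, as in arch/x86/include/asm/hpet.h */

/* Simulated channel registers -- stand-ins for real MMIO. */
uint32_t cfg_reg, cmp_reg, period_reg;
static int setval_writes;

void write_cfg(uint32_t v) { cfg_reg = v; setval_writes = 0; }

void write_cmp(uint32_t v)
{
        /* Simplified model: with SETVAL armed, the first comparator
         * write sets the comparator, the second sets the period. */
        if ((cfg_reg & HPET_TN_SETVAL) && setval_writes++ == 0)
                cmp_reg = v;
        else
                period_reg = v;
}

/* Program the channel to fire at start+delta and every delta thereafter. */
void hpet_set_comparator_periodic_sketch(uint32_t start, uint32_t delta)
{
        write_cfg(cfg_reg | HPET_TN_SETVAL);
        write_cmp(start + delta);
        write_cmp(delta);
}
```

The point of the helper under discussion is exactly that this non-obvious two-write dance lives in one place instead of being open-coded at every periodic-programming site.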
Re: [PATCH v6 13/29] iommu/amd: Compose MSI messages for NMI irqs in non-IR format
On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> +	 *
> +	 * Also, NMIs do not have an associated vector. No need for cleanup.

They have a vector and what the heck is this cleanup comment for here?
There is nothing cleanup alike anywhere near. Adding confusing comments
is worse than adding no comments at all.

Thanks,

        tglx
Re: [PATCH v6 12/29] iommu/amd: Enable NMIPass when allocating an NMI irq
On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
>
> +	if (info->flags & X86_IRQ_ALLOC_AS_NMI) {
> +		/* Only one IRQ per NMI */
> +		if (nr_irqs != 1)
> +			return -EINVAL;

See previous reply.
Re: [PATCH v6 10/29] iommu/vt-d: Implement minor tweaks for NMI irqs
On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> The Intel IOMMU interrupt remapping driver already programs correctly the
> delivery mode of individual irqs as per their irq_data. Improve handling
> of NMIs. Allow only one irq per NMI. Also, it is not necessary to cleanup
> irq vectors after updating affinity.

Structuring a changelog in paragraphs might make it readable. New lines
exist for a reason.

> NMIs do not have associated vectors.

Again. NMI has a vector associated, but it is not subject to dynamic
vector management.

> diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
> index fb2d71bea98d..791a9331e257 100644
> --- a/drivers/iommu/intel/irq_remapping.c
> +++ b/drivers/iommu/intel/irq_remapping.c
> @@ -1198,8 +1198,12 @@ intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
> 	 * After this point, all the interrupts will start arriving
> 	 * at the new destination. So, time to cleanup the previous
> 	 * vector allocation.
> +	 *
> +	 * Do it only for non-NMI irqs. NMIs don't have associated
> +	 * vectors.

See above.

> 	 */
> -	send_cleanup_vector(cfg);
> +	if (cfg->delivery_mode != APIC_DELIVERY_MODE_NMI)
> +		send_cleanup_vector(cfg);

So this needs to be replicated for all invocations of
send_cleanup_vector(), right? Why can't you put it into that function?

> 	return IRQ_SET_MASK_OK_DONE;
> }
> @@ -1352,6 +1356,9 @@ static int intel_irq_remapping_alloc(struct irq_domain *domain,
> 	if (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI)
> 		info->flags &= ~X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
>
> +	if ((info->flags & X86_IRQ_ALLOC_AS_NMI) && nr_irqs != 1)
> +		return -EINVAL;

This cannot be reached when the vector allocation domain already rejected
it, but copy & pasta is wonderful and increases the line count.

Thanks,

        tglx
Re: [PATCH v6 05/29] x86/apic/vector: Do not allocate vectors for NMIs
On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> Vectors are meaningless when allocating IRQs with NMI as the delivery
> mode.

Vectors are not meaningless. NMI has a fixed vector. The point is that
for a fixed vector there is no vector management required. Can you spot
the difference?

> In such case, skip the reservation of IRQ vectors. Do it in the lowest-
> level functions where the actual IRQ reservation takes place.
>
> Since NMIs target specific CPUs, keep the functionality to find the best
> CPU.

Again. What for?

> +	if (apicd->hw_irq_cfg.delivery_mode == APIC_DELIVERY_MODE_NMI) {
> +		cpu = irq_matrix_find_best_cpu(vector_matrix, dest);
> +		apicd->cpu = cpu;
> +		vector = 0;
> +		goto no_vector;
> +	}

Why would a NMI ever end up in this code? There is no vector management
required and this find cpu exercise is pointless.

> 	vector = irq_matrix_alloc(vector_matrix, dest, resvd, &cpu);
> 	trace_vector_alloc(irqd->irq, vector, resvd, vector);
> 	if (vector < 0)
> 		return vector;
> 	apic_update_vector(irqd, vector, cpu);
> +
> +no_vector:
> 	apic_update_irq_cfg(irqd, vector, cpu);
>
> 	return 0;
> @@ -321,12 +330,22 @@ assign_managed_vector(struct irq_data *irqd, const struct cpumask *dest)
> 	/* set_affinity might call here for nothing */
> 	if (apicd->vector && cpumask_test_cpu(apicd->cpu, vector_searchmask))
> 		return 0;
> +
> +	if (apicd->hw_irq_cfg.delivery_mode == APIC_DELIVERY_MODE_NMI) {
> +		cpu = irq_matrix_find_best_cpu_managed(vector_matrix, dest);
> +		apicd->cpu = cpu;
> +		vector = 0;
> +		goto no_vector;
> +	}

This code can never be reached for a NMI delivery. If so, then it's a
bug.

This all is special purpose for that particular HPET NMI watchdog use
case and we are not exposing this to anything else at all.

So why are you sprinkling this NMI nonsense all over the place? Just
because? There are well defined entry points to all of this where this
can be fenced off.

If at some day the hardware people get their act together and provide a
proper vector space for NMIs then we have to revisit that, but then there
will be a separate NMI vector management too.

What you want is the below which also covers the next patch. Please keep
an eye on the comments I added/modified.

Thanks,

        tglx

---
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -42,6 +42,7 @@ EXPORT_SYMBOL_GPL(x86_vector_domain);
 static DEFINE_RAW_SPINLOCK(vector_lock);
 static cpumask_var_t vector_searchmask;
 static struct irq_chip lapic_controller;
+static struct irq_chip lapic_nmi_controller;
 static struct irq_matrix *vector_matrix;
 #ifdef CONFIG_SMP
 static DEFINE_PER_CPU(struct hlist_head, cleanup_list);
@@ -451,6 +452,10 @@ static int x86_vector_activate(struct ir
 	trace_vector_activate(irqd->irq, apicd->is_managed,
 			      apicd->can_reserve, reserve);
 
+	/* NMI has a fixed vector. No vector management required */
+	if (apicd->hw_irq_cfg.delivery_mode == APIC_DELIVERY_MODE_NMI)
+		return 0;
+
 	raw_spin_lock_irqsave(&vector_lock, flags);
 	if (!apicd->can_reserve && !apicd->is_managed)
 		assign_irq_vector_any_locked(irqd);
@@ -472,6 +477,10 @@ static void vector_free_reserved_and_man
 	trace_vector_teardown(irqd->irq, apicd->is_managed,
 			      apicd->has_reserved);
 
+	/* NMI has a fixed vector. No vector management required */
+	if (apicd->hw_irq_cfg.delivery_mode == APIC_DELIVERY_MODE_NMI)
+		return;
+
 	if (apicd->has_reserved)
 		irq_matrix_remove_reserved(vector_matrix);
 	if (apicd->is_managed)
@@ -568,6 +577,24 @@ static int x86_vector_alloc_irqs(struct
 		irqd->hwirq = virq + i;
 		irqd_set_single_target(irqd);
 
+		if (info->flags & X86_IRQ_ALLOC_AS_NMI) {
+			/*
+			 * NMIs have a fixed vector and need their own
+			 * interrupt chip so nothing can end up in the
+			 * regular local APIC management code except the
+			 * MSI message composing callback.
+			 */
+			irqd->chip = &lapic_nmi_controller;
+			/*
+			 * Don't allow affinity setting attempts for NMIs.
+			 * This cannot work with the regular affinity
+			 * mechanisms and for the intended HPET NMI
+			 * watchdog use case it's not required.
+			 */
+			irqd_set_no_balance(irqd);
+			continue;
+		}
+
 		/*
 		 * Prevent that any of these interrupts is invoked in
 		 * non interrupt context
Re: [PATCH v6 03/29] x86/apic/msi: Set the delivery mode individually for each IRQ
On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> There are no restrictions in hardware to set MSI messages with its
> own delivery mode.

"messages with its own" ? Plural/singular confusion.

> Use the mode specified in the provided IRQ hardware
> configuration data. Since most of the IRQs are configured to use the
> delivery mode of the APIC driver in use (set in all of them to
> APIC_DELIVERY_MODE_FIXED), the only functional changes are where
> IRQs are configured to use a specific delivery mode.

This does not parse. There are no functional changes due to this patch
and there is no point talking about functional changes in subsequent
patches here.

> Changing the utility function __irq_msi_compose_msg() takes care of
> implementing the change in the local APIC, PCI-MSI, and DMAR-MSI
> irq_chips.
>
> The IO-APIC irq_chip configures the entries in the interrupt redirection
> table using the delivery mode specified in the corresponding MSI
> message. Since the MSI message is composed by a higher irq_chip in the
> hierarchy, it does not need to be updated.

The point is that updating __irq_msi_compose_msg() covers _all_ MSI
consumers including IO-APIC.

I had to read that changelog 3 times to make sense of it. Something like
this perhaps:

  "x86/apic/msi: Use the delivery mode from irq_cfg for message
   composition

   irq_cfg provides a delivery mode for each interrupt. Use it instead
   of the hardcoded APIC_DELIVERY_MODE_FIXED. This allows to compose
   messages for NMI delivery mode which is required to implement a HPET
   based NMI watchdog.

   No functional change as the default delivery mode is set to
   APIC_DELIVERY_MODE_FIXED."

Thanks,

        tglx
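The change under discussion boils down to taking the delivery-mode bits of the MSI data word from the per-interrupt configuration instead of a hardcoded constant. A standalone sketch of that composition step, using the architectural x86 MSI data layout (vector in bits 0-7, delivery mode in bits 8-10); the function name is illustrative, not the kernel's:

```c
#include <stdint.h>

/* Architectural delivery mode encodings (Intel SDM): */
#define DELIVERY_MODE_FIXED 0x0
#define DELIVERY_MODE_NMI   0x4

/*
 * Compose the low MSI data word: vector in bits 0-7, delivery mode in
 * bits 8-10. Trigger-mode/level bits are omitted for brevity. The
 * delivery mode comes from the caller (i.e. per-interrupt config)
 * rather than being hardwired to FIXED.
 */
uint32_t compose_msi_data(uint8_t vector, uint8_t delivery_mode)
{
        return (uint32_t)vector | ((uint32_t)(delivery_mode & 0x7) << 8);
}
```

For a FIXED-mode interrupt the result is just the vector, which is why the patch is a no-op for everything except the NMI case, where bits 8-10 become 100b and the vector field is ignored by the CPU.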
Re: [PATCH v6 02/29] x86/apic: Add irq_cfg::delivery_mode
On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> Currently, the delivery mode of all interrupts is set to the mode of the
> APIC driver in use. There are no restrictions in hardware to configure
> the delivery mode of each interrupt individually. Also, certain IRQs
> need to be

s/IRQ/interrupt/ Changelogs can do without acronyms.

> configured with a specific delivery mode (e.g., NMI).
>
> Add a new member, delivery_mode, to struct irq_cfg. Subsequent
> changesets will update every irq_domain to set the delivery mode of
> each IRQ to that specified in its irq_cfg data.
>
> To keep the current behavior, when allocating an IRQ in the root
> domain

The root domain does not allocate an interrupt. The root domain allocates
a vector for an interrupt. There is a very clear and technical
distinction. Can you please be more careful about the wording?

> --- a/arch/x86/kernel/apic/vector.c
> +++ b/arch/x86/kernel/apic/vector.c
> @@ -567,6 +567,7 @@ static int x86_vector_alloc_irqs(struct irq_domain *domain, unsigned int virq,
> 		irqd->chip_data = apicd;
> 		irqd->hwirq = virq + i;
> 		irqd_set_single_target(irqd);
> +

Stray newline.

> 		/*
> 		 * Prevent that any of these interrupts is invoked in
> 		 * non interrupt context via e.g. generic_handle_irq()
> @@ -577,6 +578,14 @@ static int x86_vector_alloc_irqs(struct irq_domain *domain, unsigned int virq,
> 		/* Don't invoke affinity setter on deactivated interrupts */
> 		irqd_set_affinity_on_activate(irqd);
>
> +		/*
> +		 * Initialize the delivery mode of this irq to match the

s/irq/interrupt/

> +		 * default delivery mode of the APIC. Children irq domains
> +		 * may take the delivery mode from the individual irq
> +		 * configuration rather than from the APIC driver.
> +		 */
> +		apicd->hw_irq_cfg.delivery_mode = apic->delivery_mode;
> +
> 		/*
> 		 * Legacy vectors are already assigned when the IOAPIC
> 		 * takes them over. They stay on the same vector.

This is
Re: [PATCH v6 01/29] irq/matrix: Expose functions to allocate the best CPU for new vectors
Ricardo,

On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> Certain types of interrupts, such as NMI, do not have an associated
> vector. They, however, target specific CPUs. Thus, when assigning the
> destination CPU, it is beneficial to select the one with the lowest
> number of vectors.

Why is that beneficial especially in the context of a NMI watchdog which
then broadcasts the NMI to all other CPUs? That's wishful thinking
perhaps, but I don't see any benefit at all.

> Prepend the functions matrix_find_best_cpu_managed() and
> matrix_find_best_cpu_managed()

The same function prepended twice becomes two functions :)

> with the irq_ prefix and expose them for
> IRQ controllers to use when allocating and activating vector-less IRQs.

There is no such thing like a vectorless IRQ. NMIs have a vector.

Can we please describe facts and not pulled out of thin air concepts
which do not exist?

Thanks,

        tglx
Re: [PATCH v2] iommu/sva: Fix PASID use-after-free issue
On Fri, May 06 2022 at 09:49, Zhangfei Gao wrote:
> Will this be taken for 5.18?

It's queued in tip core/urgent and will go to Linus this week.
Re: [PATCH] x86/msi: Fix msi message data shadow struct
Reto,

On Wed, Apr 06 2022 at 10:36, Reto Buerki wrote:
> The x86 MSI message data is 32 bits in total and is either in
> compatibility or remappable format, see Intel Virtualization Technology
> for Directed I/O, section 5.1.2.
>
> Fixes: 6285aa50736 ("x86/msi: Provide msi message shadow structs")
> Signed-off-by: Reto Buerki
> Signed-off-by: Adrian-Ken Rueegsegger

This signed-off-by chain is incorrect. It would be correct if Adrian-Ken
had sent the patch. If you both worked on it then please use the
Co-developed-by tag according to Documentation/process/

Thanks,

        tglx
Re: [PATCH v3 12/35] x86/msi: Provide msi message shadow structs
On Wed, Apr 06 2022 at 10:36, Reto Buerki wrote:
> While working on some out-of-tree patches, we noticed that assignment to
> dmar_subhandle of struct x86_msi_data lead to a QEMU warning about
> reserved bits in MSI data being set:
>
>   qemu-system-x86_64: vtd_interrupt_remap_msi: invalid IR MSI (sid=256,
>   address=0xfee003d8, data=0x1)
>
> This message originates from hw/i386/intel_iommu.c in QEMU:
>
>   #define VTD_IR_MSI_DATA_RESERVED (0x)
>
>   if (origin->data & VTD_IR_MSI_DATA_RESERVED) { ... }
>
> Looking at struct x86_msi_data, it appears that it is actually 48-bits
> in size since the bitfield containing the vector, delivery_mode etc is
> 2 bytes wide followed by dmar_subhandle which is 32 bits. Thus
> assignment to dmar_subhandle leads to bits > 16 being set.
>
> If I am not mistaken, the MSI data field should be 32-bits wide for all
> platforms (struct msi_msg, include/linux/msi.h). Is this analysis
> correct or did I miss something wrt. handling of dmar_subhandle?

It's correct and I'm completely surprised that this went unnoticed for
more than a year.

Where is that brown paperbag...

Thanks,

        tglx
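The size analysis in the report can be checked with a compile-time assertion. Below is a hypothetical reconstruction of the two layouts — field names are approximated, not the exact kernel struct — showing why a 16-bit bitfield followed by a separate 32-bit dmar_subhandle overflows the 32-bit MSI data word, and how overlaying the subhandle in a union keeps the shadow struct at exactly 32 bits:

```c
#include <stdint.h>

/* Buggy shape: dmar_subhandle sits *after* the 16 bits of vector and
 * flags, so the struct exceeds 32 bits and a write to dmar_subhandle
 * lands in bits above bit 15 of the message data. */
struct msi_data_buggy {
        uint16_t vector_and_flags;  /* vector, delivery_mode, ... */
        uint32_t dmar_subhandle;
};

/* Fixed shape: the subhandle overlays the single 32-bit data word. */
struct msi_data_fixed {
        union {
                struct {
                        uint16_t vector_and_flags;
                        uint16_t reserved;
                };
                uint32_t dmar_subhandle;  /* shares the same 32 bits */
        };
};

/* A shadow struct for a 32-bit register must itself be 32 bits. */
_Static_assert(sizeof(struct msi_data_fixed) == sizeof(uint32_t),
               "MSI data shadow must be exactly 32 bits");
```

With typical alignment the buggy layout pads out to 8 bytes, so the static assert is exactly the guard that would have caught the original bug at build time.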
Re: [PATCH v4 11/11] docs: x86: Change documentation for SVA (Shared Virtual Addressing)
On Mon, Feb 07 2022 at 15:02, Fenghua Yu wrote:
> Since allocating, freeing and fixing up PASID are changed, the
> documentation is updated to reflect the changes.
>
> Originally-by: Ashok Raj
> Signed-off-by: Fenghua Yu
> Reviewed-by: Tony Luck

Acked-by: Thomas Gleixner
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Mon, Feb 07 2022 at 15:02, Fenghua Yu wrote:
> PASIDs are process wide. It was attempted to use refcounted PASIDs to
> free them when the last thread drops the refcount. This turned out to
> be complex and error prone. Given the fact that the PASID space is 20
> bits, which allows up to 1M processes to have a PASID associated
> concurrently, PASID resource exhaustion is not a realistic concern.
>
> Therefore it was decided to simplify the approach and stick with lazy
> on demand PASID allocation, but drop the eager free approach and make
> a allocated PASID lifetime bound to the life time of the process.
>
> Get rid of the refcounting mechanisms and replace/rename the interfaces
> to reflect this new approach.
>
> Suggested-by: Dave Hansen
> Signed-off-by: Fenghua Yu
> Reviewed-by: Tony Luck

Reviewed-by: Thomas Gleixner
Re: [PATCH v4 04/11] kernel/fork: Initialize mm's PASID
On Mon, Feb 07 2022 at 15:02, Fenghua Yu wrote: > A new mm doesn't have a PASID yet when it's created. Initialize > the mm's PASID on fork() or for init_mm to INVALID_IOASID (-1). > > INIT_PASID (0) is reserved for kernel legacy DMA PASID. It cannot be > allocated to a user process. Initializing the process's PASID to 0 may > cause confusion as to why the process uses the reserved kernel legacy DMA > PASID. Initializing the PASID to INVALID_IOASID (-1) explicitly > tells that the process doesn't have a valid PASID yet. > > Even though the only user of mm_pasid_init() is in fork.c, define it > in as the first of three mm/pasid life cycle > functions (init/set/drop) to keep these all together. > > Suggested-by: Dave Hansen > Signed-off-by: Fenghua Yu > Reviewed-by: Tony Luck Reviewed-by: Thomas Gleixner
Re: [PATCH v3 09/11] x86/cpufeatures: Re-enable ENQCMD
On Fri, Jan 28 2022 at 12:29, Fenghua Yu wrote: > Since ENQCMD is handled by #GP fix up, it can be re-enabled. > > The ENQCMD feature can only be used if CONFIG_INTEL_IOMMU_SVM is set. Add > X86_FEATURE_ENQCMD to the disabled features mask as appropriate so that > cpu_feature_enabled() can be used to check the feature. > > Signed-off-by: Fenghua Yu > Reviewed-by: Tony Luck Reviewed-by: Thomas Gleixner
Re: [PATCH v3 08/11] x86/traps: Demand-populate PASID MSR via #GP
On Fri, Jan 28 2022 at 12:29, Fenghua Yu wrote: > All tasks start with PASID state disabled. This means that the first > time they execute an ENQCMD instruction they will take a #GP fault. > > Modify the #GP fault handler to check if the "mm" for the task has > already been allocated a PASID. If so, try to fix the #GP fault by > loading the IA32_PASID MSR. > > Signed-off-by: Fenghua Yu > Reviewed-by: Tony Luck Reviewed-by: Thomas Gleixner
Re: [PATCH v3 07/11] sched: Define and initialize a flag to identify valid PASID in the task
On Fri, Jan 28 2022 at 12:29, Fenghua Yu wrote: > From: Peter Zijlstra > > Add a new single bit field to the task structure to track whether this task > has initialized the IA32_PASID MSR to the mm's PASID. > > Initialize the field to zero when creating a new task with fork/clone. > > Signed-off-by: Peter Zijlstra > Co-developed-by: Fenghua Yu > Signed-off-by: Fenghua Yu > Reviewed-by: Tony Luck Reviewed-by: Thomas Gleixner
Re: [PATCH v3 06/11] x86/fpu: Clear PASID when copying fpstate
On Fri, Jan 28 2022 at 12:29, Fenghua Yu wrote: > Signed-off-by: Fenghua Yu > Reviewed-by: Tony Luck Reviewed-by: Thomas Gleixner
Re: [PATCH v3 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Fri, Jan 28 2022 at 12:28, Fenghua Yu wrote: > To avoid complexity of updating each thread's PASID status (e.g. sending > IPI to update IA32_PASID MSR) on allocating and freeing PASID, once > allocated and assigned to an mm, the PASID stays with the mm for the > rest of the mm's lifetime. A reference to the PASID is taken on > allocating the PASID. Binding/unbinding the PASID won't change refcount. > The reference is dropped on mm exit and thus the PASID is freed. > > Two helpers mm_pasid_set() and mm_pasid_drop() are defined in mm because > the PASID operations handle the pasid member in mm_struct and should be > part of mm operations. Because IOASID's reference count is not used any > more and removed, unused ioasid_get() and iommu_sva_free_pasid() > are deleted and ioasid_put() is renamed to ioasid_free(). > > 20-bit PASID allows up to 1M processes bound to PASIDs at the same time. > With cgroups and other controls that might limit the number of process > creation, the limited number of PASIDs is not a realistic issue for > lazy PASID free. Please take a step back and think hard about whether that changelog makes sense to you a year from now. Let me walk you through: > To avoid complexity of updating each thread's PASID status (e.g. sending > IPI to update IA32_PASID MSR) on allocating and freeing PASID, once > allocated and assigned to an mm, the PASID stays with the mm for the > rest of the mm's lifetime. You are missing the opportunity to tell a story about the history of this decision here: PASIDs are process wide. It was attempted to use refcounted PASIDs to free them when the last thread drops the refcount. This turned out to be complex and error prone. Given the fact that the PASID space is 20 bits, which allows up to 1M processes to have a PASID associated concurrently, PASID resource exhaustion is not a realistic concern. 
Therefore it was decided to simplify the approach and stick with lazy on demand PASID allocation, but drop the eager free approach and make an allocated PASID's lifetime bound to the lifetime of the process. > A reference to the PASID is taken on allocating the > PASID. Binding/unbinding the PASID won't change refcount. The > reference is dropped on mm exit and thus the PASID is freed. There is no refcount in play anymore, right? So how does this part of the changelog make any sense? This is followed by: > Two helpers mm_pasid_set() and mm_pasid_drop() are defined in mm because > the PASID operations handle the pasid member in mm_struct and should be > part of mm operations. Because IOASID's reference count is not used any > more and removed, unused ioasid_get() and iommu_sva_free_pasid() > are deleted and ioasid_put() is renamed to ioasid_free(). which does not provide much rationale and just blurbs about _what_ the patch is doing and not about the why and the connection to the above. And the refcount removal section contradicts the stale text above. So this paragraph should be something like this: Get rid of the refcounting mechanisms and replace/rename the interfaces to reflect this new approach. Hmm? Thanks, tglx
Re: [PATCH v3 03/11] iommu/ioasid: Introduce a helper to check for valid PASIDs
On Fri, Jan 28 2022 at 12:28, Fenghua Yu wrote: > pasid_valid() is defined to check if a given PASID is valid. > > Suggested-by: Ashok Raj > Suggested-by: Jacob Pan > Signed-off-by: Fenghua Yu > Reviewed-by: Tony Luck Reviewed-by: Thomas Gleixner
Re: [PATCH v3 04/11] kernel/fork: Initialize mm's PASID
On Fri, Jan 28 2022 at 12:28, Fenghua Yu wrote: > A new mm doesn't have a PASID yet when it's created. Initialize > the mm's PASID on fork() or for init_mm to INVALID_IOASID (-1). I must be missing something here. > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h > index aa5f09ca5bcf..c74d1edbac2f 100644 > --- a/include/linux/sched/mm.h > +++ b/include/linux/sched/mm.h > @@ -8,6 +8,7 @@ > #include > #include > #include > +#include > > /* > * Routines for handling mm_structs > @@ -433,4 +434,13 @@ static inline void membarrier_update_current_mm(struct > mm_struct *next_mm) > } > #endif > > +#ifdef CONFIG_IOMMU_SVA > +static inline void mm_pasid_init(struct mm_struct *mm) > +{ > + mm->pasid = INVALID_IOASID; > +} > +#else > +static inline void mm_pasid_init(struct mm_struct *mm) {} > +#endif > + > #endif /* _LINUX_SCHED_MM_H */ So this adds mm_pasid_init() to linux/sched/mm.h which replaces: > -static void mm_init_pasid(struct mm_struct *mm) > -{ > -#ifdef CONFIG_IOMMU_SVA > - mm->pasid = INIT_PASID; > -#endif > -} > - I.e. already existing code which is initializing mm->pasid with INIT_PASID (0) while the replacement initializes it with INVALID_IOASID (-1). The change log does not have any information about why INIT_PASID is the wrong initialization value and why this change is not having any side effects. It neither mentions why having this in a globally available header makes sense when the only call site is in the C file from which the already existing function is removed. Thanks, tglx
Re: [PATCH v3 02/11] mm: Change CONFIG option for mm->pasid field
On Fri, Jan 28 2022 at 12:28, Fenghua Yu wrote: > This currently depends on CONFIG_IOMMU_SUPPORT. But it is only > needed when CONFIG_IOMMU_SVA option is enabled. > > Change the CONFIG guards around definition and initialization > of mm->pasid field. > > Suggested-by: Jacob Pan > Signed-off-by: Fenghua Yu > Reviewed-by: Tony Luck Reviewed-by: Thomas Gleixner
Re: [PATCH v3 01/11] iommu/sva: Rename CONFIG_IOMMU_SVA_LIB to CONFIG_IOMMU_SVA
On Fri, Jan 28 2022 at 12:28, Fenghua Yu wrote: > This CONFIG option originally only referred to the Shared > Virtual Address (SVA) library. But it is now also used for > non-library portions of code. > > Drop the "_LIB" suffix so that there is just one configuration > option for all code relating to SVA. > > Signed-off-by: Fenghua Yu > Reviewed-by: Tony Luck Reviewed-by: Thomas Gleixner
Re: [patch V3 28/35] PCI/MSI: Simplify pci_irq_get_affinity()
Guenter, On Mon, Jan 31 2022 at 07:21, Guenter Roeck wrote: > Sure. Please see http://server.roeck-us.net/qemu/x86/. > The logs are generated with v5.16.4. thanks for providing the data. It definitely helped me get out of the state of not seeing the wood for the trees. Fix below. Thanks, tglx --- Subject: PCI/MSI: Remove bogus warning in pci_irq_get_affinity() From: Thomas Gleixner Date: Mon, 31 Jan 2022 22:02:46 +0100 The recent overhaul of pci_irq_get_affinity() introduced a regression when pci_irq_get_affinity() is called for an MSI-X interrupt which was not allocated with affinity descriptor information. The original code just returned a NULL pointer in that case, but the rework added a WARN_ON() under the assumption that the corresponding WARN_ON() in the MSI case can be applied to MSI-X as well. In fact the MSI warning in the original code does not make sense either because it's legitimate to invoke pci_irq_get_affinity() for a MSI interrupt which was not allocated with affinity descriptor information. Remove it and just return NULL as the original code did. Fixes: f48235900182 ("PCI/MSI: Simplify pci_irq_get_affinity()") Reported-by: Guenter Roeck Signed-off-by: Thomas Gleixner --- drivers/pci/msi/msi.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) --- a/drivers/pci/msi/msi.c +++ b/drivers/pci/msi/msi.c @@ -,7 +,8 @@ const struct cpumask *pci_irq_get_affini if (!desc) return cpu_possible_mask; - if (WARN_ON_ONCE(!desc->affinity)) + /* MSI[X] interrupts can be allocated without affinity descriptor */ + if (!desc->affinity) return NULL; /*
Re: [patch V3 28/35] PCI/MSI: Simplify pci_irq_get_affinity()
On Sun, Jan 30 2022 at 09:12, Guenter Roeck wrote: > On Fri, Dec 10, 2021 at 11:19:26PM +0100, Thomas Gleixner wrote: > This patch results in the following runtime warning when booting x86 > (32 bit) nosmp images from NVME in qemu. > > [ 14.825482] nvme nvme0: 1/0/0 default/read/poll queues > ILLOPC: ca7c6d10: 0f 0b > [ 14.826188] [ cut here ] > [ 14.826307] WARNING: CPU: 0 PID: 7 at drivers/pci/msi/msi.c:1114 > pci_irq_get_affinity+0x80/0x90 This complains about msi_desc->affinity being NULL. > git bisect bad f48235900182d64537c6e8f8dc0932b57a1a0638 > # first bad commit: [f48235900182d64537c6e8f8dc0932b57a1a0638] PCI/MSI: > Simplify pci_irq_get_affinity() Hrm. Can you please provide dmesg and /proc/interrupts from a kernel before that commit? Thanks, tglx
Re: [PATCH v2 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Thu, Jan 27 2022 at 18:42, Fenghua Yu wrote: > On Wed, Jan 26, 2022 at 10:38:04PM +0100, Thomas Gleixner wrote: >> Against Linus tree please so that the bugfix applies. >> >> > I will fold the following patch into patch #5. The patch #11 (the doc >> > patch) >> > also needs to remove one paragraph talking about refcount. >> > >> > So I will send the whole patch set with the following changes: >> > 1. One new bug fix patch (the above patch) > > When I study your above aux_domain bug fix patch, I find more aux_domain bugs. > But then I find aux_domain will be completely removed because all aux_domain > related callbacks are not called and are dead code (no wonder there are > so many bugs in aux_domain). Please see this series: > https://lore.kernel.org/linux-iommu/20220124071103.2097118-4-baolu...@linux.intel.com/ > For the series, Baolu confirms that he is "pretty sure that should be part > of v5.18". And I don't find the series calls any IOASID function after > removing the aux_domain code. > > So that means we don't need to fix those issues in the dead aux_domain > code any more because it will be completely removed in 5.18, right? > > If you agree, I will not include the aux_domain fix patch or any other > aux_domain fix patches in the up-coming v3. Is that OK? Fair enough.
Re: [PATCH v2 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Wed, Jan 26 2022 at 09:36, Fenghua Yu wrote: > On Wed, Jan 26, 2022 at 03:23:42PM +0100, Thomas Gleixner wrote: >> On Tue, Jan 25 2022 at 07:18, Fenghua Yu wrote: >> While looking at ioasid_put() usage I tripped over the following UAF >> issue: >> >> --- a/drivers/iommu/intel/iommu.c >> +++ b/drivers/iommu/intel/iommu.c >> @@ -4817,8 +4817,10 @@ static int aux_domain_add_dev(struct dma >> auxiliary_unlink_device(domain, dev); >> link_failed: >> spin_unlock_irqrestore(&device_domain_lock, flags); >> -if (list_empty(&domain->subdevices) && domain->default_pasid > 0) >> +if (list_empty(&domain->subdevices) && domain->default_pasid > 0) { >> ioasid_put(domain->default_pasid); >> +domain->default_pasid = INVALID_IOASID; >> +} >> >> return ret; >> } >> @@ -4847,8 +4849,10 @@ static void aux_domain_remove_dev(struct >> >> spin_unlock_irqrestore(&device_domain_lock, flags); >> >> -if (list_empty(&domain->subdevices) && domain->default_pasid > 0) >> +if (list_empty(&domain->subdevices) && domain->default_pasid > 0) { >> ioasid_put(domain->default_pasid); >> +domain->default_pasid = INVALID_IOASID; >> +} >> } >> >> static int prepare_domain_attach_device(struct iommu_domain *domain, > > The above patch fixes an existing issue. I will put it in a separate patch, > right? Correct. > It cannot be applied cleanly to the upstream tree. Do you want me to base > the above patch (and the whole patch set) to the upstream tree or a specific > tip branch? Against Linus tree please so that the bugfix applies. > I will fold the following patch into patch #5. The patch #11 (the doc patch) > also needs to remove one paragraph talking about refcount. > > So I will send the whole patch set with the following changes: > 1. One new bug fix patch (the above patch) > 2. Updated patch #5 (with the following patch folded) > 3. Updated patch #11 (removing refcount description) Looks good. 
Thanks, tglx
Re: [PATCH v2 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Tue, Jan 25 2022 at 07:18, Fenghua Yu wrote: > On Mon, Jan 24, 2022 at 09:55:56PM +0100, Thomas Gleixner wrote: > /** > * ioasid_put - Release a reference to an ioasid > * @ioasid: the ID to remove which in turn makes ioasid_put() a misnomer and the whole refcounting of the ioasid a pointless exercise. While looking at ioasid_put() usage I tripped over the following UAF issue: --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4817,8 +4817,10 @@ static int aux_domain_add_dev(struct dma auxiliary_unlink_device(domain, dev); link_failed: spin_unlock_irqrestore(&device_domain_lock, flags); - if (list_empty(&domain->subdevices) && domain->default_pasid > 0) + if (list_empty(&domain->subdevices) && domain->default_pasid > 0) { ioasid_put(domain->default_pasid); + domain->default_pasid = INVALID_IOASID; + } return ret; } @@ -4847,8 +4849,10 @@ static void aux_domain_remove_dev(struct spin_unlock_irqrestore(&device_domain_lock, flags); - if (list_empty(&domain->subdevices) && domain->default_pasid > 0) + if (list_empty(&domain->subdevices) && domain->default_pasid > 0) { ioasid_put(domain->default_pasid); + domain->default_pasid = INVALID_IOASID; + } } static int prepare_domain_attach_device(struct iommu_domain *domain, Vs. 
ioasid_put() I think we should fold the following: --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4818,7 +4818,7 @@ static int aux_domain_add_dev(struct dma link_failed: spin_unlock_irqrestore(&device_domain_lock, flags); if (list_empty(&domain->subdevices) && domain->default_pasid > 0) { - ioasid_put(domain->default_pasid); + ioasid_free(domain->default_pasid); domain->default_pasid = INVALID_IOASID; } @@ -4850,7 +4850,7 @@ static void aux_domain_remove_dev(struct spin_unlock_irqrestore(&device_domain_lock, flags); if (list_empty(&domain->subdevices) && domain->default_pasid > 0) { - ioasid_put(domain->default_pasid); + ioasid_free(domain->default_pasid); domain->default_pasid = INVALID_IOASID; } } --- a/drivers/iommu/ioasid.c +++ b/drivers/iommu/ioasid.c @@ -2,7 +2,7 @@ /* * I/O Address Space ID allocator. There is one global IOASID space, split into * subsets. Users create a subset with DECLARE_IOASID_SET, then allocate and - * free IOASIDs with ioasid_alloc and ioasid_put. + * free IOASIDs with ioasid_alloc() and ioasid_free(). */ #include #include @@ -15,7 +15,6 @@ struct ioasid_data { struct ioasid_set *set; void *private; struct rcu_head rcu; - refcount_t refs; }; /* @@ -315,7 +314,6 @@ ioasid_t ioasid_alloc(struct ioasid_set data->set = set; data->private = private; - refcount_set(&data->refs, 1); /* * Custom allocator needs allocator data to perform platform specific @@ -348,17 +346,11 @@ ioasid_t ioasid_alloc(struct ioasid_set EXPORT_SYMBOL_GPL(ioasid_alloc); /** - * ioasid_put - Release a reference to an ioasid + * ioasid_free - Free an ioasid * @ioasid: the ID to remove - * - * Put a reference to the IOASID, free it when the number of references drops to - * zero. - * - * Return: %true if the IOASID was freed, %false otherwise. 
*/ -bool ioasid_put(ioasid_t ioasid) +void ioasid_free(ioasid_t ioasid) { - bool free = false; struct ioasid_data *ioasid_data; spin_lock(&ioasid_allocator_lock); @@ -368,10 +360,6 @@ bool ioasid_put(ioasid_t ioasid) goto exit_unlock; } - free = refcount_dec_and_test(&ioasid_data->refs); - if (!free) - goto exit_unlock; - active_allocator->ops->free(ioasid, active_allocator->ops->pdata); /* Custom allocator needs additional steps to free the xa element */ if (active_allocator->flags & IOASID_ALLOCATOR_CUSTOM) { @@ -381,9 +369,8 @@ bool ioasid_put(ioasid_t ioasid) exit_unlock: spin_unlock(&ioasid_allocator_lock); - return free; } -EXPORT_SYMBOL_GPL(ioasid_put); +EXPORT_SYMBOL_GPL(ioasid_free); /** * ioasid_find - Find IOASID data --- a/include/linux/ioasid.h +++ b/include/linux/ioasid.h @@ -34,7 +34,7 @@ struct ioasid_allocator_ops { #if IS_ENABLED(CONFIG_IOASID) ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max, void *private); -bool ioasid_put(ioasid_t ioasid); +void ioasid_free(ioasid_t ioasid); void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid, bool (*getter)(void *)); int ioasid_register_allocator(s
Re: [PATCH v2 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Mon, Jan 24 2022 at 12:52, Fenghua Yu wrote: > On Mon, Jan 24, 2022 at 09:36:00PM +0100, Thomas Gleixner wrote: >> On Mon, Jan 24 2022 at 21:21, Thomas Gleixner wrote: > Ah. This patch should remove ioasid_get(). So I will change this patch > as follows: > > 1. Remove ioasid_get() because it's not used any more when IOASID's > refcount is set to 1 once the IOASID is allocated and is freed on mm exit. > 2. Change mm_pasid_get() to mm_pasid_set(). Yes. Just resend this one. No need to post the full queue again. Thanks, tglx
Re: [PATCH v2 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Mon, Jan 24 2022 at 21:21, Thomas Gleixner wrote: > > Hrm. This is odd. > >> +/* Associate a PASID with an mm_struct: */ >> +static inline void mm_pasid_get(struct mm_struct *mm, u32 pasid) >> +{ >> +mm->pasid = pasid; >> +} > > This does not get anything. It sets the allocated PASID in the mm. The > refcount on the PASID was already taken by the allocation. So this > should be mm_pasid_set() or mm_pasid_install(), right? And as a result of all this ioasid_get() is now left without users... Thanks, tglx
Re: [PATCH v2 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Fri, Dec 17 2021 at 22:01, Fenghua Yu wrote: > diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c > index bd41405d34e9..ee2294e02716 100644 > --- a/drivers/iommu/iommu-sva-lib.c > +++ b/drivers/iommu/iommu-sva-lib.c > @@ -18,8 +18,7 @@ static DECLARE_IOASID_SET(iommu_sva_pasid); > * > * Try to allocate a PASID for this mm, or take a reference to the existing > one > * provided it fits within the [@min, @max] range. On success the PASID is > - * available in mm->pasid, and must be released with iommu_sva_free_pasid(). > - * @min must be greater than 0, because 0 indicates an unused mm->pasid. > + * available in mm->pasid and will be available for the lifetime of the mm. > * > * Returns 0 on success and < 0 on error. > */ > @@ -33,38 +32,24 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t > min, ioasid_t max) > return -EINVAL; > > mutex_lock(&iommu_sva_lock); > - if (mm->pasid) { > - if (mm->pasid >= min && mm->pasid <= max) > - ioasid_get(mm->pasid); > - else > + /* Is a PASID already associated with this mm? */ > + if (pasid_valid(mm->pasid)) { > + if (mm->pasid < min || mm->pasid >= max) > ret = -EOVERFLOW; > - } else { > - pasid = ioasid_alloc(&iommu_sva_pasid, min, max, mm); > - if (pasid == INVALID_IOASID) > - ret = -ENOMEM; > - else > - mm->pasid = pasid; > + goto out; > } > + > + pasid = ioasid_alloc(&iommu_sva_pasid, min, max, mm); > + if (!pasid_valid(pasid)) > + ret = -ENOMEM; > + else > + mm_pasid_get(mm, pasid); Hrm. This is odd. > +/* Associate a PASID with an mm_struct: */ > +static inline void mm_pasid_get(struct mm_struct *mm, u32 pasid) > +{ > + mm->pasid = pasid; > +} This does not get anything. It sets the allocated PASID in the mm. The refcount on the PASID was already taken by the allocation. So this should be mm_pasid_set() or mm_pasid_install(), right? Thanks, tglx
Re: [patch V3 28/35] PCI/MSI: Simplify pci_irq_get_affinity()
On Sat, Dec 18 2021 at 21:25, Cédric Le Goater wrote: > On 12/18/21 11:25, Thomas Gleixner wrote: >> On Fri, Dec 17 2021 at 15:30, Nathan Chancellor wrote: >>> On Fri, Dec 10, 2021 at 11:19:26PM +0100, Thomas Gleixner wrote: >>> I just bisected a boot failure on my AMD test desktop to this patch as >>> commit f48235900182 ("PCI/MSI: Simplify pci_irq_get_affinity()") in >>> -next. It looks like there is a problem with the NVMe drive after this >>> change according to the logs. Given that the hard drive is not getting >>> mounted for journald to write logs to, I am not really sure how to get >>> them from the machine so I have at least taken a picture of what I see >>> on my screen; open to ideas on that front! >> >> Bah. Fix below. > > That's a fix for the issue I was seeing on pseries with NVMe. > > Tested-by: Cédric Le Goater I had a faint memory that I've seen that issue before, but couldn't find the mail in those massive threads. Thanks for confirming! tglx
Re: [patch V3 28/35] PCI/MSI: Simplify pci_irq_get_affinity()
On Fri, Dec 17 2021 at 15:30, Nathan Chancellor wrote: > On Fri, Dec 10, 2021 at 11:19:26PM +0100, Thomas Gleixner wrote: > I just bisected a boot failure on my AMD test desktop to this patch as > commit f48235900182 ("PCI/MSI: Simplify pci_irq_get_affinity()") in > -next. It looks like there is a problem with the NVMe drive after this > change according to the logs. Given that the hard drive is not getting > mounted for journald to write logs to, I am not really sure how to get > them from the machine so I have at least taken a picture of what I see > on my screen; open to ideas on that front! Bah. Fix below. Thanks, tglx --- diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c index 71802410e2ab..9b4910befeda 100644 --- a/drivers/pci/msi/msi.c +++ b/drivers/pci/msi/msi.c @@ -1100,7 +1100,7 @@ EXPORT_SYMBOL(pci_irq_vector); */ const struct cpumask *pci_irq_get_affinity(struct pci_dev *dev, int nr) { - int irq = pci_irq_vector(dev, nr); + int idx, irq = pci_irq_vector(dev, nr); struct msi_desc *desc; if (WARN_ON_ONCE(irq <= 0)) @@ -1113,7 +1113,10 @@ const struct cpumask *pci_irq_get_affinity(struct pci_dev *dev, int nr) if (WARN_ON_ONCE(!desc->affinity)) return NULL; - return &desc->affinity[nr].mask; + + /* MSI has a mask array in the descriptor. */ + idx = dev->msi_enabled ? nr : 0; + return &desc->affinity[idx].mask; } EXPORT_SYMBOL(pci_irq_get_affinity);
Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2
Nishanth, On Wed, Dec 15 2021 at 19:45, Nishanth Menon wrote: > On 17:35-20211215, Thomas Gleixner wrote: > Thanks once again for your help. Hope we can roll in the fixes for > part3. Sure, it's only the one-liner for ti sci. Got it folded already. Thanks for your help and testing! tglx
[patch V4 09-02/35] PCI/MSI: Allocate MSI device data on first use
Allocate MSI device data on first use, i.e. when a PCI driver invokes one of the PCI/MSI enablement functions. Add a wrapper function to ensure that the ordering vs. pcim_msi_release() is correct. Signed-off-by: Thomas Gleixner --- V4: Adapted to ensure devres ordering --- drivers/pci/msi/msi.c | 17 +++-- 1 file changed, 15 insertions(+), 2 deletions(-) --- a/drivers/pci/msi/msi.c +++ b/drivers/pci/msi/msi.c @@ -366,6 +366,19 @@ static int pcim_setup_msi_release(struct return ret; } +/* + * Ordering vs. devres: msi device data has to be installed first so that + * pcim_msi_release() is invoked before it on device release. + */ +static int pci_setup_msi_context(struct pci_dev *dev) +{ + int ret = msi_setup_device_data(&dev->dev); + + if (!ret) + ret = pcim_setup_msi_release(dev); + return ret; +} + static struct msi_desc * msi_setup_entry(struct pci_dev *dev, int nvec, struct irq_affinity *affd) { @@ -909,7 +922,7 @@ static int __pci_enable_msi_range(struct if (nvec > maxvec) nvec = maxvec; - rc = pcim_setup_msi_release(dev); + rc = pci_setup_msi_context(dev); if (rc) return rc; @@ -956,7 +969,7 @@ static int __pci_enable_msix_range(struc if (WARN_ON_ONCE(dev->msix_enabled)) return -EINVAL; - rc = pcim_setup_msi_release(dev); + rc = pci_setup_msi_context(dev); if (rc) return rc;
[patch V4 09-01/35] PCI/MSI: Decouple MSI[-X] disable from pcim_release()
The MSI core will introduce runtime allocation of MSI related data. This data will be devres managed and has to be set up before enabling PCI/MSI[-X]. This would introduce an ordering issue vs. pcim_release(). The setup order is: pcim_enable_device() devres_alloc(pcim_release...); ... pci_irq_alloc() msi_setup_device_data() devres_alloc(msi_device_data_release, ...) and once the device is released these release functions are invoked in the opposite order: msi_device_data_release() ... pcim_release() pci_disable_msi[x]() which is obviously wrong, because pci_disable_msi[x]() requires the MSI data to be available to tear down the MSI[-X] interrupts. Remove the MSI[-X] teardown from pcim_release() and add an explicit action to be installed on the attempt of enabling PCI/MSI[-X]. This allows the MSI core data allocation to be ordered correctly in a subsequent step. Reported-by: Nishanth Menon Signed-off-by: Thomas Gleixner --- V4: New patch --- drivers/pci/msi/msi.c | 33 + drivers/pci/pci.c |5 - include/linux/pci.h |3 ++- 3 files changed, 35 insertions(+), 6 deletions(-) --- a/drivers/pci/msi/msi.c +++ b/drivers/pci/msi/msi.c @@ -341,6 +341,31 @@ void pci_restore_msi_state(struct pci_de } EXPORT_SYMBOL_GPL(pci_restore_msi_state); +static void pcim_msi_release(void *pcidev) +{ + struct pci_dev *dev = pcidev; + + dev->is_msi_managed = false; + pci_free_irq_vectors(dev); +} + +/* + * Needs to be separate from pcim_release to prevent an ordering problem + * vs. msi_device_data_release() in the MSI core code. 
+ */ +static int pcim_setup_msi_release(struct pci_dev *dev) +{ + int ret; + + if (!pci_is_managed(dev) || dev->is_msi_managed) + return 0; + + ret = devm_add_action(&dev->dev, pcim_msi_release, dev); + if (!ret) + dev->is_msi_managed = true; + return ret; +} + static struct msi_desc * msi_setup_entry(struct pci_dev *dev, int nvec, struct irq_affinity *affd) { @@ -884,6 +909,10 @@ static int __pci_enable_msi_range(struct if (nvec > maxvec) nvec = maxvec; + rc = pcim_setup_msi_release(dev); + if (rc) + return rc; + for (;;) { if (affd) { nvec = irq_calc_affinity_vectors(minvec, nvec, affd); @@ -927,6 +956,10 @@ static int __pci_enable_msix_range(struc if (WARN_ON_ONCE(dev->msix_enabled)) return -EINVAL; + rc = pcim_setup_msi_release(dev); + if (rc) + return rc; + for (;;) { if (affd) { nvec = irq_calc_affinity_vectors(minvec, nvec, affd); --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -2024,11 +2024,6 @@ static void pcim_release(struct device * struct pci_devres *this = res; int i; - if (dev->msi_enabled) - pci_disable_msi(dev); - if (dev->msix_enabled) - pci_disable_msix(dev); - for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) if (this->region_mask & (1 << i)) pci_release_region(dev, i); --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -425,7 +425,8 @@ struct pci_dev { unsigned intats_enabled:1; /* Address Translation Svc */ unsigned intpasid_enabled:1;/* Process Address Space ID */ unsigned intpri_enabled:1; /* Page Request Interface */ - unsigned intis_managed:1; + unsigned intis_managed:1; /* Managed via devres */ + unsigned intis_msi_managed:1; /* MSI release via devres installed */ unsigned intneeds_freset:1; /* Requires fundamental reset */ unsigned intstate_saved:1; unsigned intis_physfn:1;
Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2
On Wed, Dec 15 2021 at 17:18, Thomas Gleixner wrote:
> On Tue, Dec 14 2021 at 22:19, Thomas Gleixner wrote:
>> On Tue, Dec 14 2021 at 14:56, Nishanth Menon wrote:
>>
>> thanks for trying. I'll have a look again with brain awake tomorrow
>> morning.
>
> Morning was busy with other things, but I found what my sleepy brain
> managed to do wrong yesterday evening.
>
> Let me reintegrate the pile and I'll send you an update.

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git msi-v4.1-part-2

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git msi-v4.2-part-3

That should cure the problem.
Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2
On Tue, Dec 14 2021 at 22:19, Thomas Gleixner wrote:
> On Tue, Dec 14 2021 at 14:56, Nishanth Menon wrote:
>
> thanks for trying. I'll have a look again with brain awake tomorrow
> morning.

Morning was busy with other things, but I found what my sleepy brain
managed to do wrong yesterday evening.

Let me reintegrate the pile and I'll send you an update.

Thanks,

        tglx
Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2
Nishanth,

On Tue, Dec 14 2021 at 14:56, Nishanth Menon wrote:
> On 21:15-20211214, Thomas Gleixner wrote:
>> I think I managed to disentangle this. Can you please give:
>>
>>    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git msi-v4-part-2
>
> Umm.. I am not entirely sure what is going on.. but all kinds of weird
> corruption seems to occur with msi-v4-part-2 that doesn't seem to be
> present in v5.16-rc5. (I use NFS since ethernet in K3 platforms use
> inta/intr and dma that is impacted by this series).
>
> I will try and rebase your patches on v5.16-rc4 to be sure as well and
> report back later today once i get some time.
>
> [1] https://gist.github.com/nmenon/a66e022926c4c15313c45d44313d860c msi-v4-part-2
> [2] https://gist.github.com/nmenon/43085664d69ad846d596e76a06ed0656 v5.16-rc5

thanks for trying. I'll have a look again with brain awake tomorrow
morning.

Thanks,

        tglx
Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2
Nishanth,

On Tue, Dec 14 2021 at 18:03, Thomas Gleixner wrote:
>   msi_device_data_release()
>   ...
>   pcim_release()
>      pci_disable_msi[x]()
>
> Groan

I think I managed to disentangle this. Can you please give:

   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git msi-v4-part-2

and/or the full pile:

   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git msi-v4-part-3

a test ride?

Thanks,

        tglx
Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2
On Tue, Dec 14 2021 at 17:36, Thomas Gleixner wrote:
> On Tue, Dec 14 2021 at 10:22, Nishanth Menon wrote:
>> On 10:41-20211214, Thomas Gleixner wrote:
>
> [   13.478122] Call trace:
> [   13.509042]  msi_device_destroy_sysfs+0x18/0x88
> [   13.509058]  msi_domain_free_irqs+0x34/0x58
> [   13.509064]  pci_msi_teardown_msi_irqs+0x30/0x3c
> [   13.509072]  free_msi_irqs+0x78/0xd4
> [   13.509077]  pci_disable_msix+0x138/0x164
> [   13.529930]  pcim_release+0x70/0x238
> [   13.529942]  devres_release_all+0x9c/0xfc
> [   13.529951]  device_release_driver_internal+0x1a0/0x244
> [   13.542725]  device_release_driver+0x18/0x24
> [   13.542741]  iwl_req_fw_callback+0x1a28/0x1ddc [iwlwifi]
> [   13.552308]  request_firmware_work_func+0x50/0x9c
> [   13.552320]  process_one_work+0x194/0x25c
>
> That's not a driver problem, that's an ordering issue vs. the devres
> muck. Let me go back to the drawing board. Sigh...

Which is pretty obvious why:

  pcim_enable_device()
    devres_alloc(pcim_release...);
  ...
  pci_irq_alloc()
    msi_setup_device_data()
      devres_alloc(msi_device_data_release, ...)

and once the device is released:

  msi_device_data_release()
  ...
  pcim_release()
     pci_disable_msi[x]()

Groan
Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2
On Tue, Dec 14 2021 at 10:22, Nishanth Menon wrote:
> On 10:41-20211214, Thomas Gleixner wrote:
>
> Agreed that the warning is fine, the null pointer exception that follows
> it [1][2] however doesn't look right and it can be trivially fixed with the
> following fixup for ee90787487bc ("genirq/msi: Provide
> msi_device_populate/destroy_sysfs()") below, with that the log looks
> like [3] - the warn is good, the null pointer exception and resultant
> crash could be avoided (not saying this is the best solution):

Aaargh.

[   13.478122] Call trace:
[   13.509042]  msi_device_destroy_sysfs+0x18/0x88
[   13.509058]  msi_domain_free_irqs+0x34/0x58
[   13.509064]  pci_msi_teardown_msi_irqs+0x30/0x3c
[   13.509072]  free_msi_irqs+0x78/0xd4
[   13.509077]  pci_disable_msix+0x138/0x164
[   13.529930]  pcim_release+0x70/0x238
[   13.529942]  devres_release_all+0x9c/0xfc
[   13.529951]  device_release_driver_internal+0x1a0/0x244
[   13.542725]  device_release_driver+0x18/0x24
[   13.542741]  iwl_req_fw_callback+0x1a28/0x1ddc [iwlwifi]
[   13.552308]  request_firmware_work_func+0x50/0x9c
[   13.552320]  process_one_work+0x194/0x25c

That's not a driver problem, that's an ordering issue vs. the devres
muck. Let me go back to the drawing board. Sigh...

Thanks,

        tglx
Re: [patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2
On Mon, Dec 13 2021 at 12:29, Nishanth Menon wrote:
> On 23:18-20211210, Thomas Gleixner wrote:
>
> Also while testing on TI K3 platforms, I noticed:
>
> msi_device_data_release/msi_device_destroy_sysfs in am64xx-evm / j7200

The warning complains about a device being released with MSI descriptors
still attached to the device.

This was added by:

  5b012cede0f7 ("device: Add device::msi_data pointer and struct msi_device_data")

That's not a regression caused by this commit. The warning is just
exposing an already existing problem in the iwlwifi driver, which seems
to do:

  probe()
    setup_pci_msi[x]_interrupts()
    start_drv()
      if (try_to_load_firmware() == FAIL)
        device_release_driver()
          ...
            msi_device_data_release()
              WARN()

Thanks,

        tglx
RE: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()
Kevin,

On Sun, Dec 12 2021 at 01:56, Kevin Tian wrote:
>> From: Thomas Gleixner
>> All I can find is drivers/iommu/virtio-iommu.c but I can't find anything
>> vIR related there.
>
> Well, virtio-iommu is a para-virtualized vIOMMU implementation.
>
> In reality there are also fully emulated vIOMMU implementations (e.g.
> Qemu fully emulates Intel/AMD/ARM IOMMUs). In those configurations
> the IR logic in the existing iommu drivers just applies:
>
>   drivers/iommu/intel/irq_remapping.c
>   drivers/iommu/amd/iommu.c

thanks for the explanation. So that's a full IOMMU emulation. I was more
expecting a paravirtualized lightweight one.

Thanks,

        tglx
RE: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()
Kevin,

On Sun, Dec 12 2021 at 02:14, Kevin Tian wrote:
>> From: Thomas Gleixner
>
> I just continue the thought exercise along that direction to see what
> the host flow will be like (step by step). Looking at the current
> implementation is just one necessary step in that exercise to help
> refine the picture. When I find something which may be worth being
> aligned, I share it to avoid following a wrong direction too far.
>
> If both of you think it simply adds noise to this discussion, I can
> surely hold back and focus on 'concept' only.

All good. We _want_ your participation for sure. Comparing and
contrasting it to the existing flow is fine.

Thanks,

        tglx
RE: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()
Kevin,

On Sat, Dec 11 2021 at 07:52, Kevin Tian wrote:
>> From: Jason Gunthorpe
>>> Then Qemu needs to find out the GSI number for the vIRTE handle.
>>> Again Qemu doesn't have such information since it doesn't know
>>> which MSI[-X] entry points to this handle due to no trap.
>>
>> No this is already going wrong. qemu *cannot* know the MSI information
>> because there is no MSI information for IMS.
>
> I haven't thought of IMS at this step. The IR approach applies to
> all types of interrupt storage, thus I'm more interested in how it
> affects the storage which is already virtualized today (MSI[-X]
> in my thought exercise).

They are not any different. As I explained several times now, IMS is
nothing new at all. It has existed since the invention of Message
Signaled Interrupts. Why?

The principle behind Message Signaled Interrupts is:

    Device writes DATA to ADDRESS, which raises an interrupt in a CPU

Message Signaled Interrupts obviously need some place to store the
ADDRESS/DATA pair so that the device can use it for raising an
interrupt, i.e. an Interrupt Message Store, short IMS.

PCI/MSI was the first implementation of this and the storage was defined
to be at a specified and therefore uniform and device independent place.

PCI/MSI-X followed the same approach. While it solved quite some of the
shortcomings of PCI/MSI, it still has a specified, uniform and device
independent place to store the message (ADDRESS/DATA pair).

Now the PCI wizards figured out that PCI/MSI[-X] is no longer up to the
task for various reasons and came up with the revolutionary new concept
of IMS, aka Interrupt Message Store, where the device defines where the
message is stored.

IOW, this is coming back full circle to the original problem of where to
store the message, i.e. the ADDRESS/DATA pair, so that the device can
raise an interrupt in a CPU, which requires - drum roll - an Interrupt
Message Store, short IMS.

So you simply have to look at it from a pure MSI (not PCI/MSI) point of
view: MSI at the conceptual level requires storage for the ADDRESS/DATA
pair at some place so that the device or the compute unit embedded in
the device can write DATA to ADDRESS. That's it. Not more, not less.

When you look at it from this perspective, then you'll realize that
PCI/MSI and PCI/MSI-X are just implementations of IMS. Not more, not
less.

The fact that they have very strict rules about the storage space and
the fact that they are mutually exclusive does not change that at all.

That's where a lot of the confusion comes from. If you go back to all
the IDXD/IMS discussions which happened over time, then you'll figure
out that _all_ of us were coming from the same wrong assumption: IMS is
new and it's just another exclusive variant of PCI/MSI and PCI/MSI-X.

It took _all_ of us quite some time to realize that we need to look at
it from the other way around. There was surely some other conceptual
confusion vs. subdevices, queues and whatever involved which contributed
to that. Water under the bridge.

Coming back to your initial question:

> I haven't thought of IMS at this step. The IR approach applies to
> all types of interrupt storage, thus I'm more interested in how it
> affects the storage which is already virtualized today (MSI[-X]
> in my thought exercise).

Stop focussing on implementation details. Focus on the general concept
instead. See above.

Thanks,

        tglx
RE: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()
Kevin,

On Sat, Dec 11 2021 at 07:44, Kevin Tian wrote:
>> From: Thomas Gleixner
>> On Fri, Dec 10 2021 at 08:39, Jason Gunthorpe wrote:
>>> It is clever, we don't have an vIOMMU that supplies vIR today, so by
>>> definition all guests are excluded and only bare metal works.
>>
>> Dammit. Now you spilled the beans. :)
>
> Unfortunately we do have that today. Qemu supports IR for
> both AMD and Intel vIOMMU.

can you point me to the code? All I can find is
drivers/iommu/virtio-iommu.c but I can't find anything vIR related
there.

Thanks,

        tglx
[patch V3 34/35] soc: ti: ti_sci_inta_msi: Get rid of ti_sci_inta_msi_get_virq()
From: Thomas Gleixner Just use the core function msi_get_virq(). Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe Cc: Peter Ujfalusi Cc: Vinod Koul Cc: dmaeng...@vger.kernel.org --- drivers/dma/ti/k3-udma-private.c |6 ++ drivers/dma/ti/k3-udma.c | 10 -- drivers/soc/ti/k3-ringacc.c|2 +- drivers/soc/ti/ti_sci_inta_msi.c | 12 include/linux/soc/ti/ti_sci_inta_msi.h |1 - 5 files changed, 7 insertions(+), 24 deletions(-) --- a/drivers/dma/ti/k3-udma-private.c +++ b/drivers/dma/ti/k3-udma-private.c @@ -168,8 +168,7 @@ int xudma_pktdma_tflow_get_irq(struct ud { const struct udma_oes_offsets *oes = &ud->soc_data->oes; - return ti_sci_inta_msi_get_virq(ud->dev, udma_tflow_id + - oes->pktdma_tchan_flow); + return msi_get_virq(ud->dev, udma_tflow_id + oes->pktdma_tchan_flow); } EXPORT_SYMBOL(xudma_pktdma_tflow_get_irq); @@ -177,7 +176,6 @@ int xudma_pktdma_rflow_get_irq(struct ud { const struct udma_oes_offsets *oes = &ud->soc_data->oes; - return ti_sci_inta_msi_get_virq(ud->dev, udma_rflow_id + - oes->pktdma_rchan_flow); + return msi_get_virq(ud->dev, udma_rflow_id + oes->pktdma_rchan_flow); } EXPORT_SYMBOL(xudma_pktdma_rflow_get_irq); --- a/drivers/dma/ti/k3-udma.c +++ b/drivers/dma/ti/k3-udma.c @@ -2313,8 +2313,7 @@ static int udma_alloc_chan_resources(str /* Event from UDMA (TR events) only needed for slave TR mode channels */ if (is_slave_direction(uc->config.dir) && !uc->config.pkt_mode) { - uc->irq_num_udma = ti_sci_inta_msi_get_virq(ud->dev, - irq_udma_idx); + uc->irq_num_udma = msi_get_virq(ud->dev, irq_udma_idx); if (uc->irq_num_udma <= 0) { dev_err(ud->dev, "Failed to get udma irq (index: %u)\n", irq_udma_idx); @@ -2486,7 +2485,7 @@ static int bcdma_alloc_chan_resources(st uc->psil_paired = true; } - uc->irq_num_ring = ti_sci_inta_msi_get_virq(ud->dev, irq_ring_idx); + uc->irq_num_ring = msi_get_virq(ud->dev, irq_ring_idx); if (uc->irq_num_ring <= 0) { dev_err(ud->dev, "Failed to get ring irq (index: %u)\n", irq_ring_idx); 
@@ -2503,8 +2502,7 @@ static int bcdma_alloc_chan_resources(st /* Event from BCDMA (TR events) only needed for slave channels */ if (is_slave_direction(uc->config.dir)) { - uc->irq_num_udma = ti_sci_inta_msi_get_virq(ud->dev, - irq_udma_idx); + uc->irq_num_udma = msi_get_virq(ud->dev, irq_udma_idx); if (uc->irq_num_udma <= 0) { dev_err(ud->dev, "Failed to get bcdma irq (index: %u)\n", irq_udma_idx); @@ -2672,7 +2670,7 @@ static int pktdma_alloc_chan_resources(s uc->psil_paired = true; - uc->irq_num_ring = ti_sci_inta_msi_get_virq(ud->dev, irq_ring_idx); + uc->irq_num_ring = msi_get_virq(ud->dev, irq_ring_idx); if (uc->irq_num_ring <= 0) { dev_err(ud->dev, "Failed to get ring irq (index: %u)\n", irq_ring_idx); --- a/drivers/soc/ti/k3-ringacc.c +++ b/drivers/soc/ti/k3-ringacc.c @@ -647,7 +647,7 @@ int k3_ringacc_get_ring_irq_num(struct k if (!ring) return -EINVAL; - irq_num = ti_sci_inta_msi_get_virq(ring->parent->dev, ring->ring_id); + irq_num = msi_get_virq(ring->parent->dev, ring->ring_id); if (irq_num <= 0) irq_num = -EINVAL; return irq_num; --- a/drivers/soc/ti/ti_sci_inta_msi.c +++ b/drivers/soc/ti/ti_sci_inta_msi.c @@ -148,15 +148,3 @@ void ti_sci_inta_msi_domain_free_irqs(st ti_sci_inta_msi_free_descs(dev); } EXPORT_SYMBOL_GPL(ti_sci_inta_msi_domain_free_irqs); - -unsigned int ti_sci_inta_msi_get_virq(struct device *dev, u32 dev_index) -{ - struct msi_desc *desc; - - for_each_msi_entry(desc, dev) - if (desc->msi_index == dev_index) - return desc->irq; - - return -ENODEV; -} -EXPORT_SYMBOL_GPL(ti_sci_inta_msi_get_virq); --- a/include/linux/soc/ti/ti_sci_inta_msi.h +++ b/include/linux/soc/ti/ti_sci_inta_msi.h @@ -18,6 +18,5 @@ struct irq_domain struct irq_domain *parent); int ti_sci_inta_msi_domain_alloc_irqs(struct device *dev, struct ti_sci_resource *res); -unsigned int ti_sci_inta_msi_get_virq
[patch V3 35/35] dmaengine: qcom_hidma: Cleanup MSI handling
From: Thomas Gleixner There is no reason to walk the MSI descriptors to retrieve the interrupt number for a device. Use msi_get_virq() instead. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe Acked-by: Sinan Kaya Cc: dmaeng...@vger.kernel.org --- drivers/dma/qcom/hidma.c | 42 ++ 1 file changed, 18 insertions(+), 24 deletions(-) --- a/drivers/dma/qcom/hidma.c +++ b/drivers/dma/qcom/hidma.c @@ -678,11 +678,13 @@ static void hidma_free_msis(struct hidma { #ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN struct device *dev = dmadev->ddev.dev; - struct msi_desc *desc; + int i, virq; - /* free allocated MSI interrupts above */ - for_each_msi_entry(desc, dev) - devm_free_irq(dev, desc->irq, &dmadev->lldev); + for (i = 0; i < HIDMA_MSI_INTS; i++) { + virq = msi_get_virq(dev, i); + if (virq) + devm_free_irq(dev, virq, &dmadev->lldev); + } platform_msi_domain_free_irqs(dev); #endif @@ -692,45 +694,37 @@ static int hidma_request_msi(struct hidm struct platform_device *pdev) { #ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN - int rc; - struct msi_desc *desc; - struct msi_desc *failed_desc = NULL; + int rc, i, virq; rc = platform_msi_domain_alloc_irqs(&pdev->dev, HIDMA_MSI_INTS, hidma_write_msi_msg); if (rc) return rc; - for_each_msi_entry(desc, &pdev->dev) { - if (!desc->msi_index) - dmadev->msi_virqbase = desc->irq; - - rc = devm_request_irq(&pdev->dev, desc->irq, + for (i = 0; i < HIDMA_MSI_INTS; i++) { + virq = msi_get_virq(&pdev->dev, i); + rc = devm_request_irq(&pdev->dev, virq, hidma_chirq_handler_msi, 0, "qcom-hidma-msi", &dmadev->lldev); - if (rc) { - failed_desc = desc; + if (rc) break; - } + if (!i) + dmadev->msi_virqbase = virq; } if (rc) { /* free allocated MSI interrupts above */ - for_each_msi_entry(desc, &pdev->dev) { - if (desc == failed_desc) - break; - devm_free_irq(&pdev->dev, desc->irq, - &dmadev->lldev); + for (--i; i >= 0; i--) { + virq = msi_get_virq(&pdev->dev, i); + devm_free_irq(&pdev->dev, virq, &dmadev->lldev); } + 
dev_warn(&pdev->dev, +"failed to request MSI irq, falling back to wired IRQ\n"); } else { /* Add callback to free MSIs on teardown */ hidma_ll_setup_irq(dmadev->lldev, true); - } - if (rc) - dev_warn(&pdev->dev, -"failed to request MSI irq, falling back to wired IRQ\n"); return rc; #else return -EINVAL; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch V3 33/35] bus: fsl-mc: fsl-mc-allocator: Rework MSI handling
From: Thomas Gleixner Storing a pointer to the MSI descriptor just to track the Linux interrupt number is daft. Just store the interrupt number and be done with it. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe Cc: Stuart Yoder Cc: Laurentiu Tudor --- drivers/bus/fsl-mc/dprc-driver.c|8 drivers/bus/fsl-mc/fsl-mc-allocator.c |9 ++--- drivers/bus/fsl-mc/fsl-mc-msi.c |6 +++--- drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c|4 ++-- drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c|4 +--- drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c |5 ++--- drivers/soc/fsl/dpio/dpio-driver.c |8 drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c |4 ++-- include/linux/fsl/mc.h |4 ++-- 9 files changed, 22 insertions(+), 30 deletions(-) --- a/drivers/bus/fsl-mc/dprc-driver.c +++ b/drivers/bus/fsl-mc/dprc-driver.c @@ -400,7 +400,7 @@ static irqreturn_t dprc_irq0_handler_thr struct fsl_mc_device *mc_dev = to_fsl_mc_device(dev); struct fsl_mc_bus *mc_bus = to_fsl_mc_bus(mc_dev); struct fsl_mc_io *mc_io = mc_dev->mc_io; - struct msi_desc *msi_desc = mc_dev->irqs[0]->msi_desc; + int irq = mc_dev->irqs[0]->virq; dev_dbg(dev, "DPRC IRQ %d triggered on CPU %u\n", irq_num, smp_processor_id()); @@ -409,7 +409,7 @@ static irqreturn_t dprc_irq0_handler_thr return IRQ_HANDLED; mutex_lock(&mc_bus->scan_mutex); - if (!msi_desc || msi_desc->irq != (u32)irq_num) + if (irq != (u32)irq_num) goto out; status = 0; @@ -521,7 +521,7 @@ static int register_dprc_irq_handler(str * function that programs the MSI physically in the device */ error = devm_request_threaded_irq(&mc_dev->dev, - irq->msi_desc->irq, + irq->virq, dprc_irq0_handler, dprc_irq0_handler_thread, IRQF_NO_SUSPEND | IRQF_ONESHOT, @@ -771,7 +771,7 @@ static void dprc_teardown_irq(struct fsl (void)disable_dprc_irq(mc_dev); - devm_free_irq(&mc_dev->dev, irq->msi_desc->irq, &mc_dev->dev); + devm_free_irq(&mc_dev->dev, irq->virq, &mc_dev->dev); fsl_mc_free_irqs(mc_dev); } --- a/drivers/bus/fsl-mc/fsl-mc-allocator.c 
+++ b/drivers/bus/fsl-mc/fsl-mc-allocator.c @@ -350,7 +350,6 @@ int fsl_mc_populate_irq_pool(struct fsl_ unsigned int irq_count) { unsigned int i; - struct msi_desc *msi_desc; struct fsl_mc_device_irq *irq_resources; struct fsl_mc_device_irq *mc_dev_irq; int error; @@ -388,16 +387,12 @@ int fsl_mc_populate_irq_pool(struct fsl_ mc_dev_irq->resource.type = res_pool->type; mc_dev_irq->resource.data = mc_dev_irq; mc_dev_irq->resource.parent_pool = res_pool; + mc_dev_irq->virq = msi_get_virq(&mc_bus_dev->dev, i); + mc_dev_irq->resource.id = mc_dev_irq->virq; INIT_LIST_HEAD(&mc_dev_irq->resource.node); list_add_tail(&mc_dev_irq->resource.node, &res_pool->free_list); } - for_each_msi_entry(msi_desc, &mc_bus_dev->dev) { - mc_dev_irq = &irq_resources[msi_desc->msi_index]; - mc_dev_irq->msi_desc = msi_desc; - mc_dev_irq->resource.id = msi_desc->irq; - } - res_pool->max_count = irq_count; res_pool->free_count = irq_count; mc_bus->irq_resources = irq_resources; --- a/drivers/bus/fsl-mc/fsl-mc-msi.c +++ b/drivers/bus/fsl-mc/fsl-mc-msi.c @@ -58,11 +58,11 @@ static void fsl_mc_msi_update_dom_ops(st } static void __fsl_mc_msi_write_msg(struct fsl_mc_device *mc_bus_dev, - struct fsl_mc_device_irq *mc_dev_irq) + struct fsl_mc_device_irq *mc_dev_irq, + struct msi_desc *msi_desc) { int error; struct fsl_mc_device *owner_mc_dev = mc_dev_irq->mc_dev; - struct msi_desc *msi_desc = mc_dev_irq->msi_desc; struct dprc_irq_cfg irq_cfg; /* @@ -129,7 +129,7 @@ static void fsl_mc_msi_write_msg(struct /* * Program the MSI (paddr, value) pair in the device: */ - __fsl_mc_msi_write_msg(mc_bus_dev, mc_dev_irq); + __fsl_mc_msi_write_msg(mc_bus_dev, mc_dev_irq, msi_desc); } static void fsl_mc_msi_update_chip_ops(struct msi_domain_info *info) --- a/drivers/net/ethernet/freescale/d
[patch V3 32/35] mailbox: bcm-flexrm-mailbox: Rework MSI interrupt handling
From: Thomas Gleixner No point in retrieving the MSI descriptors. Just query the Linux interrupt number. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe Cc: Jassi Brar --- drivers/mailbox/bcm-flexrm-mailbox.c |7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) --- a/drivers/mailbox/bcm-flexrm-mailbox.c +++ b/drivers/mailbox/bcm-flexrm-mailbox.c @@ -1497,7 +1497,6 @@ static int flexrm_mbox_probe(struct plat int index, ret = 0; void __iomem *regs; void __iomem *regs_end; - struct msi_desc *desc; struct resource *iomem; struct flexrm_ring *ring; struct flexrm_mbox *mbox; @@ -1608,10 +1607,8 @@ static int flexrm_mbox_probe(struct plat goto fail_destroy_cmpl_pool; /* Save alloced IRQ numbers for each ring */ - for_each_msi_entry(desc, dev) { - ring = &mbox->rings[desc->msi_index]; - ring->irq = desc->irq; - } + for (index = 0; index < mbox->num_rings; index++) + mbox->rings[index].irq = msi_get_virq(dev, index); /* Check availability of debugfs */ if (!debugfs_initialized()) ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch V3 31/35] iommu/arm-smmu-v3: Use msi_get_virq()
From: Thomas Gleixner Let the core code fiddle with the MSI descriptor retrieval. Signed-off-by: Thomas Gleixner Tested-by: Robin Murphy Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe Cc: Will Deacon Cc: Joerg Roedel Cc: linux-arm-ker...@lists.infradead.org Cc: iommu@lists.linux-foundation.org --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 19 +++ 1 file changed, 3 insertions(+), 16 deletions(-) --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3154,7 +3154,6 @@ static void arm_smmu_write_msi_msg(struc static void arm_smmu_setup_msis(struct arm_smmu_device *smmu) { - struct msi_desc *desc; int ret, nvec = ARM_SMMU_MAX_MSIS; struct device *dev = smmu->dev; @@ -3182,21 +3181,9 @@ static void arm_smmu_setup_msis(struct a return; } - for_each_msi_entry(desc, dev) { - switch (desc->msi_index) { - case EVTQ_MSI_INDEX: - smmu->evtq.q.irq = desc->irq; - break; - case GERROR_MSI_INDEX: - smmu->gerr_irq = desc->irq; - break; - case PRIQ_MSI_INDEX: - smmu->priq.q.irq = desc->irq; - break; - default:/* Unknown */ - continue; - } - } + smmu->evtq.q.irq = msi_get_virq(dev, EVTQ_MSI_INDEX); + smmu->gerr_irq = msi_get_virq(dev, GERROR_MSI_INDEX); + smmu->priq.q.irq = msi_get_virq(dev, PRIQ_MSI_INDEX); /* Add callback to free MSIs on teardown */ devm_add_action(dev, arm_smmu_free_msis, dev); ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch V3 30/35] perf/smmuv3: Use msi_get_virq()
From: Thomas Gleixner Let the core code fiddle with the MSI descriptor retrieval. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe Cc: Mark Rutland Cc: Will Deacon Cc: linux-arm-ker...@lists.infradead.org --- drivers/perf/arm_smmuv3_pmu.c |5 + 1 file changed, 1 insertion(+), 4 deletions(-) --- a/drivers/perf/arm_smmuv3_pmu.c +++ b/drivers/perf/arm_smmuv3_pmu.c @@ -684,7 +684,6 @@ static void smmu_pmu_write_msi_msg(struc static void smmu_pmu_setup_msi(struct smmu_pmu *pmu) { - struct msi_desc *desc; struct device *dev = pmu->dev; int ret; @@ -701,9 +700,7 @@ static void smmu_pmu_setup_msi(struct sm return; } - desc = first_msi_entry(dev); - if (desc) - pmu->irq = desc->irq; + pmu->irq = msi_get_virq(dev, 0); /* Add callback to free MSIs on teardown */ devm_add_action(dev, smmu_pmu_free_msis, dev); ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch V3 29/35] dmaengine: mv_xor_v2: Get rid of msi_desc abuse
From: Thomas Gleixner Storing a pointer to the MSI descriptor just to keep track of the Linux interrupt number is daft. Use msi_get_virq() instead. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe Cc: Vinod Koul Cc: dmaeng...@vger.kernel.org --- drivers/dma/mv_xor_v2.c | 16 +--- 1 file changed, 5 insertions(+), 11 deletions(-) --- a/drivers/dma/mv_xor_v2.c +++ b/drivers/dma/mv_xor_v2.c @@ -149,7 +149,7 @@ struct mv_xor_v2_descriptor { * @desc_size: HW descriptor size * @npendings: number of pending descriptors (for which tx_submit has * @hw_queue_idx: HW queue index - * @msi_desc: local interrupt descriptor information + * @irq: The Linux interrupt number * been called, but not yet issue_pending) */ struct mv_xor_v2_device { @@ -168,7 +168,7 @@ struct mv_xor_v2_device { int desc_size; unsigned int npendings; unsigned int hw_queue_idx; - struct msi_desc *msi_desc; + unsigned int irq; }; /** @@ -718,7 +718,6 @@ static int mv_xor_v2_probe(struct platfo int i, ret = 0; struct dma_device *dma_dev; struct mv_xor_v2_sw_desc *sw_desc; - struct msi_desc *msi_desc; BUILD_BUG_ON(sizeof(struct mv_xor_v2_descriptor) != MV_XOR_V2_EXT_DESC_SIZE); @@ -770,14 +769,9 @@ static int mv_xor_v2_probe(struct platfo if (ret) goto disable_clk; - msi_desc = first_msi_entry(&pdev->dev); - if (!msi_desc) { - ret = -ENODEV; - goto free_msi_irqs; - } - xor_dev->msi_desc = msi_desc; + xor_dev->irq = msi_get_virq(&pdev->dev, 0); - ret = devm_request_irq(&pdev->dev, msi_desc->irq, + ret = devm_request_irq(&pdev->dev, xor_dev->irq, mv_xor_v2_interrupt_handler, 0, dev_name(&pdev->dev), xor_dev); if (ret) @@ -892,7 +886,7 @@ static int mv_xor_v2_remove(struct platf xor_dev->desc_size * MV_XOR_V2_DESC_NUM, xor_dev->hw_desq_virt, xor_dev->hw_desq); - devm_free_irq(&pdev->dev, xor_dev->msi_desc->irq, xor_dev); + devm_free_irq(&pdev->dev, xor_dev->irq, xor_dev); platform_msi_domain_free_irqs(&pdev->dev); ___ iommu mailing list iommu@lists.linux-foundation.org 
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch V3 28/35] PCI/MSI: Simplify pci_irq_get_affinity()
From: Thomas Gleixner Replace open coded MSI descriptor chasing and use the proper accessor functions instead. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe --- drivers/pci/msi/msi.c | 26 ++ 1 file changed, 10 insertions(+), 16 deletions(-) --- a/drivers/pci/msi/msi.c +++ b/drivers/pci/msi/msi.c @@ -1061,26 +1061,20 @@ EXPORT_SYMBOL(pci_irq_vector); */ const struct cpumask *pci_irq_get_affinity(struct pci_dev *dev, int nr) { - if (dev->msix_enabled) { - struct msi_desc *entry; + int irq = pci_irq_vector(dev, nr); + struct msi_desc *desc; - for_each_pci_msi_entry(entry, dev) { - if (entry->msi_index == nr) - return &entry->affinity->mask; - } - WARN_ON_ONCE(1); + if (WARN_ON_ONCE(irq <= 0)) return NULL; - } else if (dev->msi_enabled) { - struct msi_desc *entry = first_pci_msi_entry(dev); - if (WARN_ON_ONCE(!entry || !entry->affinity || -nr >= entry->nvec_used)) - return NULL; - - return &entry->affinity[nr].mask; - } else { + desc = irq_get_msi_desc(irq); + /* Non-MSI does not have the information handy */ + if (!desc) return cpu_possible_mask; - } + + if (WARN_ON_ONCE(!desc->affinity)) + return NULL; + return &desc->affinity[nr].mask; } EXPORT_SYMBOL(pci_irq_get_affinity); ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch V3 27/35] PCI/MSI: Use __msi_get_virq() in pci_get_vector()
From: Thomas Gleixner Use msi_get_vector() and handle the return value to be compatible. No functional change intended. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman --- V2: Handle the INTx case directly instead of trying to be overly smart - Marc --- drivers/pci/msi/msi.c | 25 + 1 file changed, 5 insertions(+), 20 deletions(-) --- a/drivers/pci/msi/msi.c +++ b/drivers/pci/msi/msi.c @@ -1037,28 +1037,13 @@ EXPORT_SYMBOL(pci_free_irq_vectors); */ int pci_irq_vector(struct pci_dev *dev, unsigned int nr) { - if (dev->msix_enabled) { - struct msi_desc *entry; + unsigned int irq; - for_each_pci_msi_entry(entry, dev) { - if (entry->msi_index == nr) - return entry->irq; - } - WARN_ON_ONCE(1); - return -EINVAL; - } + if (!dev->msi_enabled && !dev->msix_enabled) + return !nr ? dev->irq : -EINVAL; - if (dev->msi_enabled) { - struct msi_desc *entry = first_pci_msi_entry(dev); - - if (WARN_ON_ONCE(nr >= entry->nvec_used)) - return -EINVAL; - } else { - if (WARN_ON_ONCE(nr > 0)) - return -EINVAL; - } - - return dev->irq + nr; + irq = msi_get_virq(&dev->dev, nr); + return irq ? irq : -EINVAL; } EXPORT_SYMBOL(pci_irq_vector); ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch V3 26/35] genirq/msi: Provide interface to retrieve Linux interrupt number
From: Thomas Gleixner This allows drivers to retrieve the Linux interrupt number instead of fiddling with MSI descriptors. msi_get_virq() returns the Linux interrupt number or 0 in case that there is no entry for the given MSI index. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman --- V2: Simplify the implementation and let PCI deal with the PCI specialities - Marc --- include/linux/msi.h |2 ++ kernel/irq/msi.c| 36 2 files changed, 38 insertions(+) --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -153,6 +153,8 @@ struct msi_device_data { int msi_setup_device_data(struct device *dev); +unsigned int msi_get_virq(struct device *dev, unsigned int index); + /* Helpers to hide struct msi_desc implementation details */ #define msi_desc_to_dev(desc) ((desc)->dev) #define dev_to_msi_list(dev) (&(dev)->msi_list) --- a/kernel/irq/msi.c +++ b/kernel/irq/msi.c @@ -105,6 +105,42 @@ int msi_setup_device_data(struct device return 0; } +/** + * msi_get_virq - Return Linux interrupt number of a MSI interrupt + * @dev: Device to operate on + * @index: MSI interrupt index to look for (0-based) + * + * Return: The Linux interrupt number on success (> 0), 0 if not found + */ +unsigned int msi_get_virq(struct device *dev, unsigned int index) +{ + struct msi_desc *desc; + bool pcimsi; + + if (!dev->msi.data) + return 0; + + pcimsi = dev_is_pci(dev) ? to_pci_dev(dev)->msi_enabled : false; + + for_each_msi_entry(desc, dev) { + /* PCI-MSI has only one descriptor for multiple interrupts. */ + if (pcimsi) { + if (desc->irq && index < desc->nvec_used) + return desc->irq + index; + break; + } + + /* +* PCI-MSIX and platform MSI use a descriptor per +* interrupt. 
+*/ + if (desc->msi_index == index) + return desc->irq; + } + return 0; +} +EXPORT_SYMBOL_GPL(msi_get_virq); + #ifdef CONFIG_SYSFS static ssize_t msi_mode_show(struct device *dev, struct device_attribute *attr, char *buf)
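The lookup rules msi_get_virq() implements above — one descriptor covering nvec_used consecutive Linux irqs for multi-vector PCI/MSI, one descriptor per interrupt matched by msi_index for MSI-X and platform MSI — can be sketched outside the kernel. All names here (`model_desc`, `model_get_virq`) are hypothetical stand-ins for the kernel structures:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the msi_get_virq() lookup. */
struct model_desc {
	unsigned int irq;	/* base Linux irq, 0 = unallocated */
	unsigned int nvec_used;	/* consecutive irqs covered (PCI/MSI) */
	unsigned int msi_index;	/* descriptor index (MSI-X / platform) */
};

static unsigned int model_get_virq(const struct model_desc *descs, size_t n,
				   unsigned int index, int pcimsi)
{
	for (size_t i = 0; i < n; i++) {
		const struct model_desc *desc = &descs[i];

		if (pcimsi) {
			/* PCI/MSI: a single descriptor for all vectors */
			if (desc->irq && index < desc->nvec_used)
				return desc->irq + index;
			break;
		}
		/* MSI-X / platform MSI: one descriptor per interrupt */
		if (desc->msi_index == index)
			return desc->irq;
	}
	return 0;	/* no entry for this index */
}
```

Returning 0 for "not found" mirrors the documented contract, which is why pci_irq_vector() in the follow-up patch only needs `irq ? irq : -EINVAL`.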
[patch V3 25/35] powerpc/pseries/msi: Let core code check for contiguous entries
From: Thomas Gleixner Set the domain info flag and remove the check. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: "Cédric Le Goater" Cc: linuxppc-...@lists.ozlabs.org --- V2: Remove it completely - Cedric --- arch/powerpc/platforms/pseries/msi.c | 33 - 1 file changed, 8 insertions(+), 25 deletions(-) --- a/arch/powerpc/platforms/pseries/msi.c +++ b/arch/powerpc/platforms/pseries/msi.c @@ -321,27 +321,6 @@ static int msi_quota_for_device(struct p return request; } -static int check_msix_entries(struct pci_dev *pdev) -{ - struct msi_desc *entry; - int expected; - - /* There's no way for us to express to firmware that we want -* a discontiguous, or non-zero based, range of MSI-X entries. -* So we must reject such requests. */ - - expected = 0; - for_each_pci_msi_entry(entry, pdev) { - if (entry->msi_index != expected) { - pr_debug("rtas_msi: bad MSI-X entries.\n"); - return -EINVAL; - } - expected++; - } - - return 0; -} - static void rtas_hack_32bit_msi_gen2(struct pci_dev *pdev) { u32 addr_hi, addr_lo; @@ -380,9 +359,6 @@ static int rtas_prepare_msi_irqs(struct if (quota && quota < nvec) return quota; - if (type == PCI_CAP_ID_MSIX && check_msix_entries(pdev)) - return -EINVAL; - /* * Firmware currently refuse any non power of two allocation * so we round up if the quota will allow it. @@ -529,9 +505,16 @@ static struct irq_chip pseries_pci_msi_i .irq_write_msi_msg = pseries_msi_write_msg, }; + +/* + * Set MSI_FLAG_MSIX_CONTIGUOUS as there is no way to express to + * firmware to request a discontiguous or non-zero based range of + * MSI-X entries. Core code will reject such setup attempts. 
+ */ static struct msi_domain_info pseries_msi_domain_info = { .flags = (MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS | - MSI_FLAG_MULTI_PCI_MSI | MSI_FLAG_PCI_MSIX), + MSI_FLAG_MULTI_PCI_MSI | MSI_FLAG_PCI_MSIX | + MSI_FLAG_MSIX_CONTIGUOUS), .ops = &pseries_pci_msi_domain_ops, .chip = &pseries_pci_msi_irq_chip, };
[patch V3 24/35] PCI/MSI: Provide MSI_FLAG_MSIX_CONTIGUOUS
From: Thomas Gleixner Provide a domain info flag which makes the core code check for a contiguous MSI-X index on allocation. That's simpler than checking it at some other domain callback in architecture code. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe Acked-by: Bjorn Helgaas --- drivers/pci/msi/irqdomain.c | 16 ++-- include/linux/msi.h |2 ++ 2 files changed, 16 insertions(+), 2 deletions(-) --- a/drivers/pci/msi/irqdomain.c +++ b/drivers/pci/msi/irqdomain.c @@ -89,9 +89,21 @@ static int pci_msi_domain_check_cap(stru if (pci_msi_desc_is_multi_msi(desc) && !(info->flags & MSI_FLAG_MULTI_PCI_MSI)) return 1; - else if (desc->pci.msi_attrib.is_msix && !(info->flags & MSI_FLAG_PCI_MSIX)) - return -ENOTSUPP; + if (desc->pci.msi_attrib.is_msix) { + if (!(info->flags & MSI_FLAG_PCI_MSIX)) + return -ENOTSUPP; + + if (info->flags & MSI_FLAG_MSIX_CONTIGUOUS) { + unsigned int idx = 0; + + /* Check for gaps in the entry indices */ + for_each_msi_entry(desc, dev) { + if (desc->msi_index != idx++) + return -ENOTSUPP; + } + } + } return 0; } --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -361,6 +361,8 @@ enum { MSI_FLAG_LEVEL_CAPABLE = (1 << 6), /* Populate sysfs on alloc() and destroy it on free() */ MSI_FLAG_DEV_SYSFS = (1 << 7), + /* MSI-X entries must be contiguous */ + MSI_FLAG_MSIX_CONTIGUOUS= (1 << 8), }; int msi_domain_set_affinity(struct irq_data *data, const struct cpumask *mask, ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
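The rule MSI_FLAG_MSIX_CONTIGUOUS enforces — the requested MSI-X entry indices must be zero-based and gap-free — reduces to a single scan, as the loop added to pci_msi_domain_check_cap() shows. A userspace sketch of that check with hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the MSI_FLAG_MSIX_CONTIGUOUS check: entry
 * indices must start at 0 and increase without gaps, mirroring what
 * firmware interfaces like pseries RTAS can actually express. */
static int model_msix_contiguous(const unsigned int *indices, size_t n)
{
	unsigned int expected = 0;

	for (size_t i = 0; i < n; i++) {
		if (indices[i] != expected++)
			return 0;	/* gap or non-zero base: reject */
	}
	return 1;
}
```

Doing this once in the core allocation path is what lets the pseries patch in this series delete its private check_msix_entries() copy.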
[patch V3 23/35] PCI/MSI: Use msi_desc::msi_index
From: Thomas Gleixner The usage of msi_desc::pci::entry_nr is confusing at best. It's the index into the MSI[X] descriptor table. Use msi_desc::msi_index which is shared between all MSI incarnations instead of having a PCI specific storage for no value. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe Acked-by: Bjorn Helgaas --- arch/powerpc/platforms/pseries/msi.c |4 ++-- arch/x86/pci/xen.c |2 +- drivers/pci/msi/irqdomain.c |2 +- drivers/pci/msi/msi.c| 20 drivers/pci/xen-pcifront.c |2 +- include/linux/msi.h |2 -- 6 files changed, 13 insertions(+), 19 deletions(-) --- a/arch/powerpc/platforms/pseries/msi.c +++ b/arch/powerpc/platforms/pseries/msi.c @@ -332,7 +332,7 @@ static int check_msix_entries(struct pci expected = 0; for_each_pci_msi_entry(entry, pdev) { - if (entry->pci.msi_attrib.entry_nr != expected) { + if (entry->msi_index != expected) { pr_debug("rtas_msi: bad MSI-X entries.\n"); return -EINVAL; } @@ -579,7 +579,7 @@ static int pseries_irq_domain_alloc(stru int hwirq; int i, ret; - hwirq = rtas_query_irq_number(pci_get_pdn(pdev), desc->pci.msi_attrib.entry_nr); + hwirq = rtas_query_irq_number(pci_get_pdn(pdev), desc->msi_index); if (hwirq < 0) { dev_err(&pdev->dev, "Failed to query HW IRQ: %d\n", hwirq); return hwirq; --- a/arch/x86/pci/xen.c +++ b/arch/x86/pci/xen.c @@ -306,7 +306,7 @@ static int xen_initdom_setup_msi_irqs(st return -EINVAL; map_irq.table_base = pci_resource_start(dev, bir); - map_irq.entry_nr = msidesc->pci.msi_attrib.entry_nr; + map_irq.entry_nr = msidesc->msi_index; } ret = -EINVAL; --- a/drivers/pci/msi/irqdomain.c +++ b/drivers/pci/msi/irqdomain.c @@ -57,7 +57,7 @@ static irq_hw_number_t pci_msi_domain_ca { struct pci_dev *dev = msi_desc_to_pci_dev(desc); - return (irq_hw_number_t)desc->pci.msi_attrib.entry_nr | + return (irq_hw_number_t)desc->msi_index | pci_dev_id(dev) << 11 | (pci_domain_nr(dev->bus) & 0x) << 27; } --- a/drivers/pci/msi/msi.c +++ b/drivers/pci/msi/msi.c @@ -44,7 +44,7 
@@ static inline void pci_msi_unmask(struct static inline void __iomem *pci_msix_desc_addr(struct msi_desc *desc) { - return desc->pci.mask_base + desc->pci.msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE; + return desc->pci.mask_base + desc->msi_index * PCI_MSIX_ENTRY_SIZE; } /* @@ -356,13 +356,10 @@ msi_setup_entry(struct pci_dev *dev, int if (dev->dev_flags & PCI_DEV_FLAGS_HAS_MSI_MASKING) control |= PCI_MSI_FLAGS_MASKBIT; - entry->pci.msi_attrib.is_msix = 0; - entry->pci.msi_attrib.is_64 = !!(control & PCI_MSI_FLAGS_64BIT); - entry->pci.msi_attrib.is_virtual= 0; - entry->pci.msi_attrib.entry_nr = 0; + entry->pci.msi_attrib.is_64 = !!(control & PCI_MSI_FLAGS_64BIT); entry->pci.msi_attrib.can_mask = !pci_msi_ignore_mask && !!(control & PCI_MSI_FLAGS_MASKBIT); - entry->pci.msi_attrib.default_irq = dev->irq; /* Save IOAPIC IRQ */ + entry->pci.msi_attrib.default_irq = dev->irq; entry->pci.msi_attrib.multi_cap = (control & PCI_MSI_FLAGS_QMASK) >> 1; entry->pci.msi_attrib.multiple = ilog2(__roundup_pow_of_two(nvec)); @@ -504,12 +501,11 @@ static int msix_setup_entries(struct pci entry->pci.msi_attrib.is_64 = 1; if (entries) - entry->pci.msi_attrib.entry_nr = entries[i].entry; + entry->msi_index = entries[i].entry; else - entry->pci.msi_attrib.entry_nr = i; + entry->msi_index = i; - entry->pci.msi_attrib.is_virtual = - entry->pci.msi_attrib.entry_nr >= vec_count; + entry->pci.msi_attrib.is_virtual = entry->msi_index >= vec_count; entry->pci.msi_attrib.can_mask = !pci_msi_ignore_mask && !entry->pci.msi_attrib.is_virtual; @@ -1045,7 +1041,7 @@ int pci_irq_vector(struct pci_dev *dev, struct msi_desc *entry; for_each_pci_msi_entry(entry, dev) { - if (entry->pci.msi_attrib.entry_nr == nr) + if (entry->msi_
[patch V3 21/35] bus: fsl-mc-msi: Use msi_desc::msi_index
From: Thomas Gleixner Use the common msi_index member and get rid of the pointless wrapper struct. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe Cc: Stuart Yoder Cc: Laurentiu Tudor --- drivers/bus/fsl-mc/fsl-mc-allocator.c |2 +- drivers/bus/fsl-mc/fsl-mc-msi.c |6 +++--- include/linux/msi.h | 10 -- 3 files changed, 4 insertions(+), 14 deletions(-) --- a/drivers/bus/fsl-mc/fsl-mc-allocator.c +++ b/drivers/bus/fsl-mc/fsl-mc-allocator.c @@ -393,7 +393,7 @@ int fsl_mc_populate_irq_pool(struct fsl_ } for_each_msi_entry(msi_desc, &mc_bus_dev->dev) { - mc_dev_irq = &irq_resources[msi_desc->fsl_mc.msi_index]; + mc_dev_irq = &irq_resources[msi_desc->msi_index]; mc_dev_irq->msi_desc = msi_desc; mc_dev_irq->resource.id = msi_desc->irq; } --- a/drivers/bus/fsl-mc/fsl-mc-msi.c +++ b/drivers/bus/fsl-mc/fsl-mc-msi.c @@ -29,7 +29,7 @@ static irq_hw_number_t fsl_mc_domain_cal * Make the base hwirq value for ICID*1 so it is readable * as a decimal value in /proc/interrupts. 
*/ - return (irq_hw_number_t)(desc->fsl_mc.msi_index + (dev->icid * 1)); + return (irq_hw_number_t)(desc->msi_index + (dev->icid * 1)); } static void fsl_mc_msi_set_desc(msi_alloc_info_t *arg, @@ -122,7 +122,7 @@ static void fsl_mc_msi_write_msg(struct struct fsl_mc_device *mc_bus_dev = to_fsl_mc_device(msi_desc->dev); struct fsl_mc_bus *mc_bus = to_fsl_mc_bus(mc_bus_dev); struct fsl_mc_device_irq *mc_dev_irq = - &mc_bus->irq_resources[msi_desc->fsl_mc.msi_index]; + &mc_bus->irq_resources[msi_desc->msi_index]; msi_desc->msg = *msg; @@ -235,7 +235,7 @@ static int fsl_mc_msi_alloc_descs(struct goto cleanup_msi_descs; } - msi_desc->fsl_mc.msi_index = i; + msi_desc->msi_index = i; INIT_LIST_HEAD(&msi_desc->list); list_add_tail(&msi_desc->list, dev_to_msi_list(dev)); } --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -107,14 +107,6 @@ struct pci_msi_desc { }; /** - * fsl_mc_msi_desc - FSL-MC device specific msi descriptor data - * @msi_index: The index of the MSI descriptor - */ -struct fsl_mc_msi_desc { - u16 msi_index; -}; - -/** * ti_sci_inta_msi_desc - TISCI based INTA specific msi descriptor data * @dev_index: TISCI device index */ @@ -137,7 +129,6 @@ struct ti_sci_inta_msi_desc { * * @msi_index: Index of the msi descriptor * @pci: [PCI] PCI speficic msi descriptor data - * @fsl_mc:[fsl-mc]FSL MC device specific msi descriptor data * @inta: [INTA] TISCI based INTA specific msi descriptor data */ struct msi_desc { @@ -158,7 +149,6 @@ struct msi_desc { u16 msi_index; union { struct pci_msi_desc pci; - struct fsl_mc_msi_desc fsl_mc; struct ti_sci_inta_msi_desc inta; }; }; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch V3 22/35] soc: ti: ti_sci_inta_msi: Use msi_desc::msi_index
From: Thomas Gleixner Use the common msi_index member and get rid of the pointless wrapper struct. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe Cc: Nishanth Menon Cc: Tero Kristo Cc: Santosh Shilimkar Cc: Thomas Gleixner Cc: linux-arm-ker...@lists.infradead.org --- drivers/irqchip/irq-ti-sci-inta.c |2 +- drivers/soc/ti/ti_sci_inta_msi.c |6 +++--- include/linux/msi.h | 16 ++-- 3 files changed, 6 insertions(+), 18 deletions(-) --- a/drivers/irqchip/irq-ti-sci-inta.c +++ b/drivers/irqchip/irq-ti-sci-inta.c @@ -595,7 +595,7 @@ static void ti_sci_inta_msi_set_desc(msi struct platform_device *pdev = to_platform_device(desc->dev); arg->desc = desc; - arg->hwirq = TO_HWIRQ(pdev->id, desc->inta.dev_index); + arg->hwirq = TO_HWIRQ(pdev->id, desc->msi_index); } static struct msi_domain_ops ti_sci_inta_msi_ops = { --- a/drivers/soc/ti/ti_sci_inta_msi.c +++ b/drivers/soc/ti/ti_sci_inta_msi.c @@ -84,7 +84,7 @@ static int ti_sci_inta_msi_alloc_descs(s return -ENOMEM; } - msi_desc->inta.dev_index = res->desc[set].start + i; + msi_desc->msi_index = res->desc[set].start + i; INIT_LIST_HEAD(&msi_desc->list); list_add_tail(&msi_desc->list, dev_to_msi_list(dev)); count++; @@ -96,7 +96,7 @@ static int ti_sci_inta_msi_alloc_descs(s return -ENOMEM; } - msi_desc->inta.dev_index = res->desc[set].start_sec + i; + msi_desc->msi_index = res->desc[set].start_sec + i; INIT_LIST_HEAD(&msi_desc->list); list_add_tail(&msi_desc->list, dev_to_msi_list(dev)); count++; @@ -154,7 +154,7 @@ unsigned int ti_sci_inta_msi_get_virq(st struct msi_desc *desc; for_each_msi_entry(desc, dev) - if (desc->inta.dev_index == dev_index) + if (desc->msi_index == dev_index) return desc->irq; return -ENODEV; --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -107,14 +107,6 @@ struct pci_msi_desc { }; /** - * ti_sci_inta_msi_desc - TISCI based INTA specific msi descriptor data - * @dev_index: TISCI device index - */ -struct ti_sci_inta_msi_desc { - u16 dev_index; -}; - 
-/** * struct msi_desc - Descriptor structure for MSI based interrupts * @list: List head for management * @irq: The base interrupt number @@ -128,8 +120,7 @@ struct ti_sci_inta_msi_desc { * @write_msi_msg_data:Data parameter for the callback. * * @msi_index: Index of the msi descriptor - * @pci: [PCI] PCI speficic msi descriptor data - * @inta: [INTA] TISCI based INTA specific msi descriptor data + * @pci: PCI specific msi descriptor data */ struct msi_desc { /* Shared device/bus type independent data */ @@ -147,10 +138,7 @@ struct msi_desc { void *write_msi_msg_data; u16 msi_index; - union { - struct pci_msi_desc pci; - struct ti_sci_inta_msi_desc inta; - }; + struct pci_msi_desc pci; }; /** ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch V3 18/35] platform-msi: Store platform private data pointer in msi_device_data
From: Thomas Gleixner Storing the platform private data in a MSI descriptor is sloppy at best. The data belongs to the device and not to the descriptor. Add a pointer to struct msi_device_data and store the pointer there. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe --- drivers/base/platform-msi.c | 79 +--- include/linux/msi.h |4 +- 2 files changed, 34 insertions(+), 49 deletions(-) --- a/drivers/base/platform-msi.c +++ b/drivers/base/platform-msi.c @@ -38,9 +38,7 @@ static DEFINE_IDA(platform_msi_devid_ida */ static irq_hw_number_t platform_msi_calc_hwirq(struct msi_desc *desc) { - u32 devid; - - devid = desc->platform.msi_priv_data->devid; + u32 devid = desc->dev->msi.data->platform_data->devid; return (devid << (32 - DEV_ID_SHIFT)) | desc->platform.msi_index; } @@ -85,11 +83,8 @@ static void platform_msi_update_dom_ops( static void platform_msi_write_msg(struct irq_data *data, struct msi_msg *msg) { struct msi_desc *desc = irq_data_get_msi_desc(data); - struct platform_msi_priv_data *priv_data; - - priv_data = desc->platform.msi_priv_data; - priv_data->write_msg(desc, msg); + desc->dev->msi.data->platform_data->write_msg(desc, msg); } static void platform_msi_update_chip_ops(struct msi_domain_info *info) @@ -126,9 +121,7 @@ static void platform_msi_free_descs(stru } static int platform_msi_alloc_descs_with_irq(struct device *dev, int virq, -int nvec, -struct platform_msi_priv_data *data) - +int nvec) { struct msi_desc *desc; int i, base = 0; @@ -144,7 +137,6 @@ static int platform_msi_alloc_descs_with if (!desc) break; - desc->platform.msi_priv_data = data; desc->platform.msi_index = base + i; desc->irq = virq ? 
virq + i : 0; @@ -161,11 +153,9 @@ static int platform_msi_alloc_descs_with return 0; } -static int platform_msi_alloc_descs(struct device *dev, int nvec, - struct platform_msi_priv_data *data) - +static int platform_msi_alloc_descs(struct device *dev, int nvec) { - return platform_msi_alloc_descs_with_irq(dev, 0, nvec, data); + return platform_msi_alloc_descs_with_irq(dev, 0, nvec); } /** @@ -199,9 +189,8 @@ struct irq_domain *platform_msi_create_i return domain; } -static struct platform_msi_priv_data * -platform_msi_alloc_priv_data(struct device *dev, unsigned int nvec, -irq_write_msi_msg_t write_msi_msg) +static int platform_msi_alloc_priv_data(struct device *dev, unsigned int nvec, + irq_write_msi_msg_t write_msi_msg) { struct platform_msi_priv_data *datap; int err; @@ -213,41 +202,44 @@ platform_msi_alloc_priv_data(struct devi * capable devices). */ if (!dev->msi.domain || !write_msi_msg || !nvec || nvec > MAX_DEV_MSIS) - return ERR_PTR(-EINVAL); + return -EINVAL; if (dev->msi.domain->bus_token != DOMAIN_BUS_PLATFORM_MSI) { dev_err(dev, "Incompatible msi_domain, giving up\n"); - return ERR_PTR(-EINVAL); + return -EINVAL; } err = msi_setup_device_data(dev); if (err) - return ERR_PTR(err); + return err; - /* Already had a helping of MSI? Greed... */ - if (!list_empty(dev_to_msi_list(dev))) - return ERR_PTR(-EBUSY); + /* Already initialized? 
*/ + if (dev->msi.data->platform_data) + return -EBUSY; datap = kzalloc(sizeof(*datap), GFP_KERNEL); if (!datap) - return ERR_PTR(-ENOMEM); + return -ENOMEM; datap->devid = ida_simple_get(&platform_msi_devid_ida, 0, 1 << DEV_ID_SHIFT, GFP_KERNEL); if (datap->devid < 0) { err = datap->devid; kfree(datap); - return ERR_PTR(err); + return err; } datap->write_msg = write_msi_msg; datap->dev = dev; - - return datap; + dev->msi.data->platform_data = datap; + return 0; } -static void platform_msi_free_priv_data(struct platform_msi_priv_data *data) +static void platform_msi_free_priv_data(struct device *dev) { + struct platform_msi_priv_data *data = dev->msi.data->platform_data; + + dev->msi.data->platform_data = NULL; ida_simple_remove(&platform_msi_devid_ida, data->devid); kfree(data); } @
[patch V3 20/35] platform-msi: Use msi_desc::msi_index
From: Thomas Gleixner Use the common msi_index member and get rid of the pointless wrapper struct. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe --- drivers/base/platform-msi.c | 10 +- drivers/dma/qcom/hidma.c|4 ++-- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |4 ++-- drivers/mailbox/bcm-flexrm-mailbox.c|4 ++-- include/linux/msi.h | 10 -- 5 files changed, 11 insertions(+), 21 deletions(-) --- a/drivers/base/platform-msi.c +++ b/drivers/base/platform-msi.c @@ -40,7 +40,7 @@ static irq_hw_number_t platform_msi_calc { u32 devid = desc->dev->msi.data->platform_data->devid; - return (devid << (32 - DEV_ID_SHIFT)) | desc->platform.msi_index; + return (devid << (32 - DEV_ID_SHIFT)) | desc->msi_index; } static void platform_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc) @@ -112,8 +112,8 @@ static void platform_msi_free_descs(stru struct msi_desc *desc, *tmp; list_for_each_entry_safe(desc, tmp, dev_to_msi_list(dev), list) { - if (desc->platform.msi_index >= base && - desc->platform.msi_index < (base + nvec)) { + if (desc->msi_index >= base && + desc->msi_index < (base + nvec)) { list_del(&desc->list); free_msi_entry(desc); } @@ -129,7 +129,7 @@ static int platform_msi_alloc_descs_with if (!list_empty(dev_to_msi_list(dev))) { desc = list_last_entry(dev_to_msi_list(dev), struct msi_desc, list); - base = desc->platform.msi_index + 1; + base = desc->msi_index + 1; } for (i = 0; i < nvec; i++) { @@ -137,7 +137,7 @@ static int platform_msi_alloc_descs_with if (!desc) break; - desc->platform.msi_index = base + i; + desc->msi_index = base + i; desc->irq = virq ? 
virq + i : 0; list_add_tail(&desc->list, dev_to_msi_list(dev)); --- a/drivers/dma/qcom/hidma.c +++ b/drivers/dma/qcom/hidma.c @@ -666,7 +666,7 @@ static void hidma_write_msi_msg(struct m struct device *dev = msi_desc_to_dev(desc); struct hidma_dev *dmadev = dev_get_drvdata(dev); - if (!desc->platform.msi_index) { + if (!desc->msi_index) { writel(msg->address_lo, dmadev->dev_evca + 0x118); writel(msg->address_hi, dmadev->dev_evca + 0x11C); writel(msg->data, dmadev->dev_evca + 0x120); @@ -702,7 +702,7 @@ static int hidma_request_msi(struct hidm return rc; for_each_msi_entry(desc, &pdev->dev) { - if (!desc->platform.msi_index) + if (!desc->msi_index) dmadev->msi_virqbase = desc->irq; rc = devm_request_irq(&pdev->dev, desc->irq, --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3142,7 +3142,7 @@ static void arm_smmu_write_msi_msg(struc phys_addr_t doorbell; struct device *dev = msi_desc_to_dev(desc); struct arm_smmu_device *smmu = dev_get_drvdata(dev); - phys_addr_t *cfg = arm_smmu_msi_cfg[desc->platform.msi_index]; + phys_addr_t *cfg = arm_smmu_msi_cfg[desc->msi_index]; doorbell = (((u64)msg->address_hi) << 32) | msg->address_lo; doorbell &= MSI_CFG0_ADDR_MASK; @@ -3183,7 +3183,7 @@ static void arm_smmu_setup_msis(struct a } for_each_msi_entry(desc, dev) { - switch (desc->platform.msi_index) { + switch (desc->msi_index) { case EVTQ_MSI_INDEX: smmu->evtq.q.irq = desc->irq; break; --- a/drivers/mailbox/bcm-flexrm-mailbox.c +++ b/drivers/mailbox/bcm-flexrm-mailbox.c @@ -1484,7 +1484,7 @@ static void flexrm_mbox_msi_write(struct { struct device *dev = msi_desc_to_dev(desc); struct flexrm_mbox *mbox = dev_get_drvdata(dev); - struct flexrm_ring *ring = &mbox->rings[desc->platform.msi_index]; + struct flexrm_ring *ring = &mbox->rings[desc->msi_index]; /* Configure per-Ring MSI registers */ writel_relaxed(msg->address_lo, ring->regs + RING_MSI_ADDR_LS); @@ -1609,7 +1609,7 @@ static int flexrm_mbox_probe(struct plat /* Save 
alloced IRQ numbers for each ring */ for_each_msi_entry(desc, dev) { - ring = &mbox->rings[desc->platform.msi_index]; + ring = &mbox->rings[desc->msi_index]; ring->irq = desc
[patch V3 19/35] genirq/msi: Consolidate MSI descriptor data
From: Thomas Gleixner All non PCI/MSI usage variants have data structures in struct msi_desc with only one member: xxx_index. PCI/MSI has an entry_nr member. Add a common msi_index member to struct msi_desc so all implementations can share it, which allows further consolidation. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe --- include/linux/msi.h |2 ++ 1 file changed, 2 insertions(+) --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -143,6 +143,7 @@ struct ti_sci_inta_msi_desc { * address or data changes * @write_msi_msg_data:Data parameter for the callback. * + * @msi_index: Index of the msi descriptor * @pci: [PCI] PCI speficic msi descriptor data * @platform: [platform] Platform device specific msi descriptor data * @fsl_mc:[fsl-mc]FSL MC device specific msi descriptor data @@ -163,6 +164,7 @@ struct msi_desc { void (*write_msi_msg)(struct msi_desc *entry, void *data); void *write_msi_msg_data; + u16 msi_index; union { struct pci_msi_desc pci; struct platform_msi_descplatform;
[patch V3 17/35] platform-msi: Rename functions and clarify comments
From: Thomas Gleixner It's hard to distinguish what platform_msi_domain_alloc() and platform_msi_domain_alloc_irqs() are about. Make the distinction more explicit and add comments which explain the use cases properly. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe --- drivers/base/platform-msi.c | 36 +--- drivers/irqchip/irq-mbigen.c|4 ++-- drivers/irqchip/irq-mvebu-icu.c |6 +++--- include/linux/msi.h |8 4 files changed, 30 insertions(+), 24 deletions(-) --- a/drivers/base/platform-msi.c +++ b/drivers/base/platform-msi.c @@ -313,17 +313,18 @@ EXPORT_SYMBOL_GPL(platform_msi_domain_fr * a platform-msi domain * @domain:The platform-msi domain * - * Returns the private data provided when calling - * platform_msi_create_device_domain. + * Return: The private data provided when calling + * platform_msi_create_device_domain(). */ void *platform_msi_get_host_data(struct irq_domain *domain) { struct platform_msi_priv_data *data = domain->host_data; + return data->host_data; } /** - * __platform_msi_create_device_domain - Create a platform-msi domain + * __platform_msi_create_device_domain - Create a platform-msi device domain * * @dev: The device generating the MSIs * @nvec: The number of MSIs that need to be allocated @@ -332,7 +333,11 @@ void *platform_msi_get_host_data(struct * @ops: The hierarchy domain operations to use * @host_data: Private data associated to this domain * - * Returns an irqdomain for @nvec interrupts + * Return: An irqdomain for @nvec interrupts on success, NULL in case of error. + * + * This is for interrupt domains which stack on a platform-msi domain + * created by platform_msi_create_irq_domain(). @dev->msi.domain points to + * that platform-msi domain which is the parent for the new domain. 
*/ struct irq_domain * __platform_msi_create_device_domain(struct device *dev, @@ -372,18 +377,19 @@ struct irq_domain * } /** - * platform_msi_domain_free - Free interrupts associated with a platform-msi - *domain + * platform_msi_device_domain_free - Free interrupts associated with a platform-msi + * device domain * - * @domain:The platform-msi domain + * @domain:The platform-msi device domain * @virq: The base irq from which to perform the free operation * @nvec: How many interrupts to free from @virq */ -void platform_msi_domain_free(struct irq_domain *domain, unsigned int virq, - unsigned int nvec) +void platform_msi_device_domain_free(struct irq_domain *domain, unsigned int virq, +unsigned int nvec) { struct platform_msi_priv_data *data = domain->host_data; struct msi_desc *desc, *tmp; + for_each_msi_entry_safe(desc, tmp, data->dev) { if (WARN_ON(!desc->irq || desc->nvec_used != 1)) return; @@ -397,10 +403,10 @@ void platform_msi_domain_free(struct irq } /** - * platform_msi_domain_alloc - Allocate interrupts associated with - *a platform-msi domain + * platform_msi_device_domain_alloc - Allocate interrupts associated with + * a platform-msi device domain * - * @domain:The platform-msi domain + * @domain:The platform-msi device domain * @virq: The base irq from which to perform the allocate operation * @nr_irqs: How many interrupts to free from @virq * @@ -408,8 +414,8 @@ void platform_msi_domain_free(struct irq * with irq_domain_mutex held (which can only be done as part of a * top-level interrupt allocation). 
*/ -int platform_msi_domain_alloc(struct irq_domain *domain, unsigned int virq, - unsigned int nr_irqs) +int platform_msi_device_domain_alloc(struct irq_domain *domain, unsigned int virq, +unsigned int nr_irqs) { struct platform_msi_priv_data *data = domain->host_data; int err; @@ -421,7 +427,7 @@ int platform_msi_domain_alloc(struct irq err = msi_domain_populate_irqs(domain->parent, data->dev, virq, nr_irqs, &data->arg); if (err) - platform_msi_domain_free(domain, virq, nr_irqs); + platform_msi_device_domain_free(domain, virq, nr_irqs); return err; } --- a/drivers/irqchip/irq-mbigen.c +++ b/drivers/irqchip/irq-mbigen.c @@ -207,7 +207,7 @@ static int mbigen_irq_domain_alloc(struc if (err) return err; - err = platform_msi_domain_alloc(domain, virq, nr_irqs); + err = platform_msi_device_domain_alloc(domain, virq, nr_irqs); if (err) re
[patch V3 16/35] genirq/msi: Remove the original sysfs interfaces
From: Thomas Gleixner No more users. Refactor the core code accordingly and move the global interface under CONFIG_PCI_MSI_ARCH_FALLBACKS. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe --- include/linux/msi.h | 29 +++- kernel/irq/msi.c| 53 +++- 2 files changed, 28 insertions(+), 54 deletions(-) --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -246,26 +246,6 @@ void __pci_write_msi_msg(struct msi_desc void pci_msi_mask_irq(struct irq_data *data); void pci_msi_unmask_irq(struct irq_data *data); -#ifdef CONFIG_SYSFS -int msi_device_populate_sysfs(struct device *dev); -void msi_device_destroy_sysfs(struct device *dev); - -const struct attribute_group **msi_populate_sysfs(struct device *dev); -void msi_destroy_sysfs(struct device *dev, - const struct attribute_group **msi_irq_groups); -#else -static inline int msi_device_populate_sysfs(struct device *dev) { return 0; } -static inline void msi_device_destroy_sysfs(struct device *dev) { } - -static inline const struct attribute_group **msi_populate_sysfs(struct device *dev) -{ - return NULL; -} -static inline void msi_destroy_sysfs(struct device *dev, const struct attribute_group **msi_irq_groups) -{ -} -#endif - /* * The arch hooks to setup up msi irqs. 
Default functions are implemented * as weak symbols so that they /can/ be overriden by architecture specific @@ -279,7 +259,14 @@ int arch_setup_msi_irq(struct pci_dev *d void arch_teardown_msi_irq(unsigned int irq); int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type); void arch_teardown_msi_irqs(struct pci_dev *dev); -#endif +#ifdef CONFIG_SYSFS +int msi_device_populate_sysfs(struct device *dev); +void msi_device_destroy_sysfs(struct device *dev); +#else /* CONFIG_SYSFS */ +static inline int msi_device_populate_sysfs(struct device *dev) { return 0; } +static inline void msi_device_destroy_sysfs(struct device *dev) { } +#endif /* !CONFIG_SYSFS */ +#endif /* CONFIG_PCI_MSI_ARCH_FALLBACKS */ /* * The restore hook is still available even for fully irq domain based --- a/kernel/irq/msi.c +++ b/kernel/irq/msi.c @@ -118,12 +118,8 @@ static ssize_t msi_mode_show(struct devi /** * msi_populate_sysfs - Populate msi_irqs sysfs entries for devices * @dev: The device(PCI, platform etc) who will get sysfs entries - * - * Return attribute_group ** so that specific bus MSI can save it to - * somewhere during initilizing msi irqs. 
If devices has no MSI irq, - * return NULL; if it fails to populate sysfs, return ERR_PTR */ -const struct attribute_group **msi_populate_sysfs(struct device *dev) +static const struct attribute_group **msi_populate_sysfs(struct device *dev) { const struct attribute_group **msi_irq_groups; struct attribute **msi_attrs, *msi_attr; @@ -214,41 +210,32 @@ int msi_device_populate_sysfs(struct dev } /** - * msi_destroy_sysfs - Destroy msi_irqs sysfs entries for devices - * @dev: The device(PCI, platform etc) who will remove sysfs entries - * @msi_irq_groups:attribute_group for device msi_irqs entries - */ -void msi_destroy_sysfs(struct device *dev, const struct attribute_group **msi_irq_groups) -{ - struct device_attribute *dev_attr; - struct attribute **msi_attrs; - int count = 0; - - if (msi_irq_groups) { - sysfs_remove_groups(&dev->kobj, msi_irq_groups); - msi_attrs = msi_irq_groups[0]->attrs; - while (msi_attrs[count]) { - dev_attr = container_of(msi_attrs[count], - struct device_attribute, attr); - kfree(dev_attr->attr.name); - kfree(dev_attr); - ++count; - } - kfree(msi_attrs); - kfree(msi_irq_groups[0]); - kfree(msi_irq_groups); - } -} - -/** * msi_device_destroy_sysfs - Destroy msi_irqs sysfs entries for a device * @dev: The device (PCI, platform etc) for which to remove * sysfs entries */ void msi_device_destroy_sysfs(struct device *dev) { - msi_destroy_sysfs(dev, dev->msi.data->attrs); + const struct attribute_group **msi_irq_groups = dev->msi.data->attrs; + struct device_attribute *dev_attr; + struct attribute **msi_attrs; + int count = 0; + dev->msi.data->attrs = NULL; + if (!msi_irq_groups) + return; + + sysfs_remove_groups(&dev->kobj, msi_irq_groups); + msi_attrs = msi_irq_groups[0]->attrs; + while (msi_attrs[count]) { + dev_attr = container_of(msi_attrs[count], struct device_attribute, attr); + kfree(dev_attr->attr.name); + kfree(dev_attr); + ++count; + } + kfre
[patch V3 15/35] platform-msi: Let the core code handle sysfs groups
From: Thomas Gleixner Set the domain info flag and remove the local sysfs code. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe --- drivers/base/platform-msi.c | 11 +-- 1 file changed, 1 insertion(+), 10 deletions(-) --- a/drivers/base/platform-msi.c +++ b/drivers/base/platform-msi.c @@ -23,7 +23,6 @@ struct platform_msi_priv_data { struct device *dev; void*host_data; - const struct attribute_group**msi_irq_groups; msi_alloc_info_targ; irq_write_msi_msg_t write_msg; int devid; @@ -191,6 +190,7 @@ struct irq_domain *platform_msi_create_i platform_msi_update_dom_ops(info); if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS) platform_msi_update_chip_ops(info); + info->flags |= MSI_FLAG_DEV_SYSFS; domain = msi_create_irq_domain(fwnode, info, parent); if (domain) @@ -279,16 +279,8 @@ int platform_msi_domain_alloc_irqs(struc if (err) goto out_free_desc; - priv_data->msi_irq_groups = msi_populate_sysfs(dev); - if (IS_ERR(priv_data->msi_irq_groups)) { - err = PTR_ERR(priv_data->msi_irq_groups); - goto out_free_irqs; - } - return 0; -out_free_irqs: - msi_domain_free_irqs(dev->msi.domain, dev); out_free_desc: platform_msi_free_descs(dev, 0, nvec); out_free_priv_data: @@ -308,7 +300,6 @@ void platform_msi_domain_free_irqs(struc struct msi_desc *desc; desc = first_msi_entry(dev); - msi_destroy_sysfs(dev, desc->platform.msi_priv_data->msi_irq_groups); platform_msi_free_priv_data(desc->platform.msi_priv_data); } ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch V3 14/35] PCI/MSI: Let the irq code handle sysfs groups
From: Thomas Gleixner Set the domain info flag which makes the core code handle sysfs groups and put an explicit invocation into the legacy code. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe Acked-by: Bjorn Helgaas --- drivers/pci/msi/irqdomain.c |2 +- drivers/pci/msi/legacy.c|6 +- drivers/pci/msi/msi.c | 23 --- include/linux/pci.h |1 - 4 files changed, 6 insertions(+), 26 deletions(-) --- a/drivers/pci/msi/irqdomain.c +++ b/drivers/pci/msi/irqdomain.c @@ -159,7 +159,7 @@ struct irq_domain *pci_msi_create_irq_do if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS) pci_msi_domain_update_chip_ops(info); - info->flags |= MSI_FLAG_ACTIVATE_EARLY; + info->flags |= MSI_FLAG_ACTIVATE_EARLY | MSI_FLAG_DEV_SYSFS; if (IS_ENABLED(CONFIG_GENERIC_IRQ_RESERVATION_MODE)) info->flags |= MSI_FLAG_MUST_REACTIVATE; --- a/drivers/pci/msi/legacy.c +++ b/drivers/pci/msi/legacy.c @@ -70,10 +70,14 @@ int pci_msi_legacy_setup_msi_irqs(struct { int ret = arch_setup_msi_irqs(dev, nvec, type); - return pci_msi_setup_check_result(dev, type, ret); + ret = pci_msi_setup_check_result(dev, type, ret); + if (!ret) + ret = msi_device_populate_sysfs(&dev->dev); + return ret; } void pci_msi_legacy_teardown_msi_irqs(struct pci_dev *dev) { + msi_device_destroy_sysfs(&dev->dev); arch_teardown_msi_irqs(dev); } --- a/drivers/pci/msi/msi.c +++ b/drivers/pci/msi/msi.c @@ -233,11 +233,6 @@ static void free_msi_irqs(struct pci_dev for (i = 0; i < entry->nvec_used; i++) BUG_ON(irq_has_action(entry->irq + i)); - if (dev->msi_irq_groups) { - msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups); - dev->msi_irq_groups = NULL; - } - pci_msi_teardown_msi_irqs(dev); list_for_each_entry_safe(entry, tmp, msi_list, list) { @@ -417,7 +412,6 @@ static int msi_verify_entries(struct pci static int msi_capability_init(struct pci_dev *dev, int nvec, struct irq_affinity *affd) { - const struct attribute_group **groups; struct msi_desc *entry; int ret; @@ -448,14 +442,6 @@ static int 
msi_capability_init(struct pc if (ret) goto err; - groups = msi_populate_sysfs(&dev->dev); - if (IS_ERR(groups)) { - ret = PTR_ERR(groups); - goto err; - } - - dev->msi_irq_groups = groups; - /* Set MSI enabled bits */ pci_intx_for_msi(dev, 0); pci_msi_set_enable(dev, 1); @@ -584,7 +570,6 @@ static void msix_mask_all(void __iomem * static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries, int nvec, struct irq_affinity *affd) { - const struct attribute_group **groups; void __iomem *base; int ret, tsize; u16 control; @@ -629,14 +614,6 @@ static int msix_capability_init(struct p msix_update_entries(dev, entries); - groups = msi_populate_sysfs(&dev->dev); - if (IS_ERR(groups)) { - ret = PTR_ERR(groups); - goto out_free; - } - - dev->msi_irq_groups = groups; - /* Disable INTX and unmask MSI-X */ pci_intx_for_msi(dev, 0); pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0); --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -475,7 +475,6 @@ struct pci_dev { #ifdef CONFIG_PCI_MSI void __iomem*msix_base; raw_spinlock_t msi_lock; - const struct attribute_group **msi_irq_groups; #endif struct pci_vpd vpd; #ifdef CONFIG_PCIE_DPC ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch V3 13/35] genirq/msi: Provide msi_device_populate/destroy_sysfs()
From: Thomas Gleixner Add new allocation functions which can be activated by domain info flags. They store the groups pointer in struct msi_device_data. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe --- include/linux/msi.h |4 kernel/irq/msi.c| 42 -- 2 files changed, 44 insertions(+), 2 deletions(-) --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -56,6 +56,8 @@ struct irq_data; struct msi_desc; struct pci_dev; struct platform_msi_priv_data; +struct attribute_group; + void __get_cached_msi_msg(struct msi_desc *entry, struct msi_msg *msg); #ifdef CONFIG_GENERIC_MSI_IRQ void get_cached_msi_msg(unsigned int irq, struct msi_msg *msg); @@ -174,9 +176,11 @@ struct msi_desc { /** * msi_device_data - MSI per device data * @properties:MSI properties which are interesting to drivers + * @attrs: Pointer to the sysfs attribute group */ struct msi_device_data { unsigned long properties; + const struct attribute_group**attrs; }; int msi_setup_device_data(struct device *dev); --- a/kernel/irq/msi.c +++ b/kernel/irq/msi.c @@ -200,6 +200,20 @@ const struct attribute_group **msi_popul } /** + * msi_device_populate_sysfs - Populate msi_irqs sysfs entries for a device + * @dev: The device (PCI, platform etc) which will get sysfs entries + */ +int msi_device_populate_sysfs(struct device *dev) +{ + const struct attribute_group **group = msi_populate_sysfs(dev); + + if (IS_ERR(group)) + return PTR_ERR(group); + dev->msi.data->attrs = group; + return 0; +} + +/** * msi_destroy_sysfs - Destroy msi_irqs sysfs entries for devices * @dev: The device(PCI, platform etc) who will remove sysfs entries * @msi_irq_groups:attribute_group for device msi_irqs entries @@ -225,6 +239,17 @@ void msi_destroy_sysfs(struct device *de kfree(msi_irq_groups); } } + +/** + * msi_device_destroy_sysfs - Destroy msi_irqs sysfs entries for a device + * @dev: The device (PCI, platform etc) for which to remove + * sysfs entries + */ +void 
msi_device_destroy_sysfs(struct device *dev) +{ + msi_destroy_sysfs(dev, dev->msi.data->attrs); + dev->msi.data->attrs = NULL; +} #endif #ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN @@ -672,8 +697,19 @@ int msi_domain_alloc_irqs(struct irq_dom { struct msi_domain_info *info = domain->host_data; struct msi_domain_ops *ops = info->ops; + int ret; - return ops->domain_alloc_irqs(domain, dev, nvec); + ret = ops->domain_alloc_irqs(domain, dev, nvec); + if (ret) + return ret; + + if (!(info->flags & MSI_FLAG_DEV_SYSFS)) + return 0; + + ret = msi_device_populate_sysfs(dev); + if (ret) + msi_domain_free_irqs(domain, dev); + return ret; } void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev) @@ -712,7 +748,9 @@ void msi_domain_free_irqs(struct irq_dom struct msi_domain_info *info = domain->host_data; struct msi_domain_ops *ops = info->ops; - return ops->domain_free_irqs(domain, dev); + if (info->flags & MSI_FLAG_DEV_SYSFS) + msi_device_destroy_sysfs(dev); + ops->domain_free_irqs(domain, dev); } /** ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch V3 12/35] soc: ti: ti_sci_inta_msi: Allocate MSI device data on first use
From: Thomas Gleixner Allocate the MSI device data on first invocation of the allocation function. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe Cc: Nishanth Menon Cc: Tero Kristo Cc: Santosh Shilimkar Cc: linux-arm-ker...@lists.infradead.org --- drivers/soc/ti/ti_sci_inta_msi.c |4 1 file changed, 4 insertions(+) --- a/drivers/soc/ti/ti_sci_inta_msi.c +++ b/drivers/soc/ti/ti_sci_inta_msi.c @@ -120,6 +120,10 @@ int ti_sci_inta_msi_domain_alloc_irqs(st if (pdev->id < 0) return -ENODEV; + ret = msi_setup_device_data(dev); + if (ret) + return ret; + nvec = ti_sci_inta_msi_alloc_descs(dev, res); if (nvec <= 0) return nvec; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch V3 11/35] bus: fsl-mc-msi: Allocate MSI device data on first use
From: Thomas Gleixner Allocate the MSI device data on first invocation of the allocation function. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe Cc: Stuart Yoder Cc: Laurentiu Tudor --- drivers/bus/fsl-mc/fsl-mc-msi.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) --- a/drivers/bus/fsl-mc/fsl-mc-msi.c +++ b/drivers/bus/fsl-mc/fsl-mc-msi.c @@ -253,6 +253,14 @@ int fsl_mc_msi_domain_alloc_irqs(struct struct irq_domain *msi_domain; int error; + msi_domain = dev_get_msi_domain(dev); + if (!msi_domain) + return -EINVAL; + + error = msi_setup_device_data(dev); + if (error) + return error; + if (!list_empty(dev_to_msi_list(dev))) return -EINVAL; @@ -260,12 +268,6 @@ int fsl_mc_msi_domain_alloc_irqs(struct if (error < 0) return error; - msi_domain = dev_get_msi_domain(dev); - if (!msi_domain) { - error = -EINVAL; - goto cleanup_msi_descs; - } - /* * NOTE: Calling this function will trigger the invocation of the * its_fsl_mc_msi_prepare() callback ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch V3 09/35] PCI/MSI: Allocate MSI device data on first use
From: Thomas Gleixner Allocate MSI device data on first use, i.e. when a PCI driver invokes one of the PCI/MSI enablement functions. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe Acked-by: Bjorn Helgaas --- drivers/pci/msi/msi.c | 20 +++- 1 file changed, 15 insertions(+), 5 deletions(-) --- a/drivers/pci/msi/msi.c +++ b/drivers/pci/msi/msi.c @@ -900,10 +900,12 @@ static int __pci_enable_msi_range(struct /* deprecated, don't use */ int pci_enable_msi(struct pci_dev *dev) { - int rc = __pci_enable_msi_range(dev, 1, 1, NULL); - if (rc < 0) - return rc; - return 0; + int rc = msi_setup_device_data(&dev->dev); + + if (!rc) + rc = __pci_enable_msi_range(dev, 1, 1, NULL); + + return rc < 0 ? rc : 0; } EXPORT_SYMBOL(pci_enable_msi); @@ -958,7 +960,11 @@ static int __pci_enable_msix_range(struc int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int minvec, int maxvec) { - return __pci_enable_msix_range(dev, entries, minvec, maxvec, NULL, 0); + int ret = msi_setup_device_data(&dev->dev); + + if (!ret) + ret = __pci_enable_msix_range(dev, entries, minvec, maxvec, NULL, 0); + return ret; } EXPORT_SYMBOL(pci_enable_msix_range); @@ -985,8 +991,12 @@ int pci_alloc_irq_vectors_affinity(struc struct irq_affinity *affd) { struct irq_affinity msi_default_affd = {0}; + int ret = msi_setup_device_data(&dev->dev); int nvecs = -ENOSPC; + if (ret) + return ret; + if (flags & PCI_IRQ_AFFINITY) { if (!affd) affd = &msi_default_affd; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch V3 10/35] platform-msi: Allocate MSI device data on first use
From: Thomas Gleixner Allocate the MSI device data on first invocation of the allocation function for platform MSI private data. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe --- drivers/base/platform-msi.c |8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) --- a/drivers/base/platform-msi.c +++ b/drivers/base/platform-msi.c @@ -204,6 +204,8 @@ platform_msi_alloc_priv_data(struct devi irq_write_msi_msg_t write_msi_msg) { struct platform_msi_priv_data *datap; + int err; + /* * Limit the number of interrupts to 2048 per device. Should we * need to bump this up, DEV_ID_SHIFT should be adjusted @@ -218,6 +220,10 @@ platform_msi_alloc_priv_data(struct devi return ERR_PTR(-EINVAL); } + err = msi_setup_device_data(dev); + if (err) + return ERR_PTR(err); + /* Already had a helping of MSI? Greed... */ if (!list_empty(dev_to_msi_list(dev))) return ERR_PTR(-EBUSY); @@ -229,7 +235,7 @@ platform_msi_alloc_priv_data(struct devi datap->devid = ida_simple_get(&platform_msi_devid_ida, 0, 1 << DEV_ID_SHIFT, GFP_KERNEL); if (datap->devid < 0) { - int err = datap->devid; + err = datap->devid; kfree(datap); return ERR_PTR(err); } ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch V3 08/35] device: Add device::msi_data pointer and struct msi_device_data
From: Thomas Gleixner

Create struct msi_device_data and add a pointer of that type to struct
dev_msi_info, which is part of struct device. Provide an allocator
function which can be invoked from the MSI interrupt allocation code
paths.

Add a properties field to the data structure as a first member so the
allocation size is not zero bytes. The field will be used later on.

Signed-off-by: Thomas Gleixner
Reviewed-by: Greg Kroah-Hartman
Reviewed-by: Jason Gunthorpe
---
 include/linux/device.h |  5 +
 include/linux/msi.h    | 18 ++
 kernel/irq/msi.c       | 32 
 3 files changed, 55 insertions(+)

--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -45,6 +45,7 @@ struct iommu_ops;
 struct iommu_group;
 struct dev_pin_info;
 struct dev_iommu;
+struct msi_device_data;
 
 /**
  * struct subsys_interface - interfaces to device functions
@@ -374,11 +375,15 @@ struct dev_links_info {
 /**
  * struct dev_msi_info - Device data related to MSI
  * @domain:	The MSI interrupt domain associated to the device
+ * @data:	Pointer to MSI device data
  */
 struct dev_msi_info {
 #ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
 	struct irq_domain	*domain;
 #endif
+#ifdef CONFIG_GENERIC_MSI_IRQ
+	struct msi_device_data	*data;
+#endif
 };
 
 /**
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -171,6 +171,16 @@ struct msi_desc {
 	};
 };
 
+/**
+ * msi_device_data - MSI per device data
+ * @properties:	MSI properties which are interesting to drivers
+ */
+struct msi_device_data {
+	unsigned long	properties;
+};
+
+int msi_setup_device_data(struct device *dev);
+
 /* Helpers to hide struct msi_desc implementation details */
 #define msi_desc_to_dev(desc)		((desc)->dev)
 #define dev_to_msi_list(dev)		(&(dev)->msi_list)
@@ -233,10 +243,16 @@ void pci_msi_mask_irq(struct irq_data *d
 void pci_msi_unmask_irq(struct irq_data *data);
 
 #ifdef CONFIG_SYSFS
+int msi_device_populate_sysfs(struct device *dev);
+void msi_device_destroy_sysfs(struct device *dev);
+
 const struct attribute_group **msi_populate_sysfs(struct device *dev);
 void msi_destroy_sysfs(struct device *dev,
		       const struct attribute_group **msi_irq_groups);
 #else
+static inline int msi_device_populate_sysfs(struct device *dev) { return 0; }
+static inline void msi_device_destroy_sysfs(struct device *dev) { }
+
 static inline const struct attribute_group **msi_populate_sysfs(struct device *dev)
 {
 	return NULL;
@@ -384,6 +400,8 @@ enum {
 	MSI_FLAG_MUST_REACTIVATE	= (1 << 5),
 	/* Is level-triggered capable, using two messages */
 	MSI_FLAG_LEVEL_CAPABLE		= (1 << 6),
+	/* Populate sysfs on alloc() and destroy it on free() */
+	MSI_FLAG_DEV_SYSFS		= (1 << 7),
 };
 
 int msi_domain_set_affinity(struct irq_data *data, const struct cpumask *mask,
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -73,6 +73,38 @@ void get_cached_msi_msg(unsigned int irq
 }
 EXPORT_SYMBOL_GPL(get_cached_msi_msg);
 
+static void msi_device_data_release(struct device *dev, void *res)
+{
+	WARN_ON_ONCE(!list_empty(&dev->msi_list));
+	dev->msi.data = NULL;
+}
+
+/**
+ * msi_setup_device_data - Setup MSI device data
+ * @dev:	Device for which MSI device data should be set up
+ *
+ * Return: 0 on success, appropriate error code otherwise
+ *
+ * This can be called more than once for @dev. If the MSI device data is
+ * already allocated the call succeeds. The allocated memory is
+ * automatically released when the device is destroyed.
+ */
+int msi_setup_device_data(struct device *dev)
+{
+	struct msi_device_data *md;
+
+	if (dev->msi.data)
+		return 0;
+
+	md = devres_alloc(msi_device_data_release, sizeof(*md), GFP_KERNEL);
+	if (!md)
+		return -ENOMEM;
+
+	dev->msi.data = md;
+	devres_add(dev, md);
+	return 0;
+}
+
 #ifdef CONFIG_SYSFS
 static ssize_t msi_mode_show(struct device *dev, struct device_attribute *attr,
			     char *buf)
___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch V3 07/35] device: Move MSI related data into a struct
From: Thomas Gleixner The only unconditional part of MSI data in struct device is the irqdomain pointer. Everything else can be allocated on demand. Create a data structure and move the irqdomain pointer into it. The other MSI specific parts are going to be removed from struct device in later steps. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe --- drivers/base/platform-msi.c | 12 ++-- drivers/dma/ti/k3-udma.c|4 ++-- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |2 +- drivers/irqchip/irq-mvebu-icu.c |6 +++--- drivers/soc/ti/k3-ringacc.c |4 ++-- drivers/soc/ti/ti_sci_inta_msi.c|2 +- include/linux/device.h | 20 ++-- 7 files changed, 29 insertions(+), 21 deletions(-) --- a/drivers/base/platform-msi.c +++ b/drivers/base/platform-msi.c @@ -210,10 +210,10 @@ platform_msi_alloc_priv_data(struct devi * accordingly (which would impact the max number of MSI * capable devices). */ - if (!dev->msi_domain || !write_msi_msg || !nvec || nvec > MAX_DEV_MSIS) + if (!dev->msi.domain || !write_msi_msg || !nvec || nvec > MAX_DEV_MSIS) return ERR_PTR(-EINVAL); - if (dev->msi_domain->bus_token != DOMAIN_BUS_PLATFORM_MSI) { + if (dev->msi.domain->bus_token != DOMAIN_BUS_PLATFORM_MSI) { dev_err(dev, "Incompatible msi_domain, giving up\n"); return ERR_PTR(-EINVAL); } @@ -269,7 +269,7 @@ int platform_msi_domain_alloc_irqs(struc if (err) goto out_free_priv_data; - err = msi_domain_alloc_irqs(dev->msi_domain, dev, nvec); + err = msi_domain_alloc_irqs(dev->msi.domain, dev, nvec); if (err) goto out_free_desc; @@ -282,7 +282,7 @@ int platform_msi_domain_alloc_irqs(struc return 0; out_free_irqs: - msi_domain_free_irqs(dev->msi_domain, dev); + msi_domain_free_irqs(dev->msi.domain, dev); out_free_desc: platform_msi_free_descs(dev, 0, nvec); out_free_priv_data: @@ -306,7 +306,7 @@ void platform_msi_domain_free_irqs(struc platform_msi_free_priv_data(desc->platform.msi_priv_data); } - msi_domain_free_irqs(dev->msi_domain, dev); + 
msi_domain_free_irqs(dev->msi.domain, dev); platform_msi_free_descs(dev, 0, MAX_DEV_MSIS); } EXPORT_SYMBOL_GPL(platform_msi_domain_free_irqs); @@ -354,7 +354,7 @@ struct irq_domain * return NULL; data->host_data = host_data; - domain = irq_domain_create_hierarchy(dev->msi_domain, 0, + domain = irq_domain_create_hierarchy(dev->msi.domain, 0, is_tree ? 0 : nvec, dev->fwnode, ops, data); if (!domain) --- a/drivers/dma/ti/k3-udma.c +++ b/drivers/dma/ti/k3-udma.c @@ -5279,9 +5279,9 @@ static int udma_probe(struct platform_de if (IS_ERR(ud->ringacc)) return PTR_ERR(ud->ringacc); - dev->msi_domain = of_msi_get_domain(dev, dev->of_node, + dev->msi.domain = of_msi_get_domain(dev, dev->of_node, DOMAIN_BUS_TI_SCI_INTA_MSI); - if (!dev->msi_domain) { + if (!dev->msi.domain) { dev_err(dev, "Failed to get MSI domain\n"); return -EPROBE_DEFER; } --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3170,7 +3170,7 @@ static void arm_smmu_setup_msis(struct a if (!(smmu->features & ARM_SMMU_FEAT_MSI)) return; - if (!dev->msi_domain) { + if (!dev->msi.domain) { dev_info(smmu->dev, "msi_domain absent - falling back to wired irqs\n"); return; } --- a/drivers/irqchip/irq-mvebu-icu.c +++ b/drivers/irqchip/irq-mvebu-icu.c @@ -314,12 +314,12 @@ static int mvebu_icu_subset_probe(struct msi_data->subset_data = of_device_get_match_data(dev); } - dev->msi_domain = of_msi_get_domain(dev, dev->of_node, + dev->msi.domain = of_msi_get_domain(dev, dev->of_node, DOMAIN_BUS_PLATFORM_MSI); - if (!dev->msi_domain) + if (!dev->msi.domain) return -EPROBE_DEFER; - msi_parent_dn = irq_domain_get_of_node(dev->msi_domain); + msi_parent_dn = irq_domain_get_of_node(dev->msi.domain); if (!msi_parent_dn) return -ENODEV; --- a/drivers/soc/ti/k3-ringacc.c +++ b/drivers/soc/ti/k3-ringacc.c @@ -1356,9 +1356,9 @@ static int k3_ringacc_init(struct platfo struct resourc
[patch V3 06/35] powerpc/pseries/msi: Use PCI device properties
From: Thomas Gleixner instead of fiddling with MSI descriptors. Signed-off-by: Thomas Gleixner Cc: Michael Ellerman Cc: linuxppc-...@lists.ozlabs.org --- V3: Use pci_dev->msix_enabled - Jason --- arch/powerpc/platforms/pseries/msi.c |3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) --- a/arch/powerpc/platforms/pseries/msi.c +++ b/arch/powerpc/platforms/pseries/msi.c @@ -448,8 +448,7 @@ static int pseries_msi_ops_prepare(struc int nvec, msi_alloc_info_t *arg) { struct pci_dev *pdev = to_pci_dev(dev); - struct msi_desc *desc = first_pci_msi_entry(pdev); - int type = desc->pci.msi_attrib.is_msix ? PCI_CAP_ID_MSIX : PCI_CAP_ID_MSI; + int type = pdev->msix_enabled ? PCI_CAP_ID_MSIX : PCI_CAP_ID_MSI; return rtas_prepare_msi_irqs(pdev, nvec, type, arg); } ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch V3 05/35] powerpc/cell/axon_msi: Use PCI device property
From: Thomas Gleixner instead of fiddling with MSI descriptors. Signed-off-by: Thomas Gleixner Cc: Arnd Bergmann Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: linuxppc-...@lists.ozlabs.org --- V3: Use pci_dev property - Jason V2: Invoke the function with the correct number of arguments - Andy --- arch/powerpc/platforms/cell/axon_msi.c |5 + 1 file changed, 1 insertion(+), 4 deletions(-) --- a/arch/powerpc/platforms/cell/axon_msi.c +++ b/arch/powerpc/platforms/cell/axon_msi.c @@ -199,7 +199,6 @@ static struct axon_msic *find_msi_transl static int setup_msi_msg_address(struct pci_dev *dev, struct msi_msg *msg) { struct device_node *dn; - struct msi_desc *entry; int len; const u32 *prop; @@ -209,10 +208,8 @@ static int setup_msi_msg_address(struct return -ENODEV; } - entry = first_pci_msi_entry(dev); - for (; dn; dn = of_get_next_parent(dn)) { - if (entry->pci.msi_attrib.is_64) { + if (!dev->no_64bit_msi) { prop = of_get_property(dn, "msi-address-64", &len); if (prop) break; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch V3 00/35] genirq/msi, PCI/MSI: Spring cleaning - Part 2
This is the second part of [PCI]MSI refactoring which aims to provide
the ability to expand MSI-X vectors after enabling MSI-X.

This is based on the first part of this work which can be found here:

  https://lore.kernel.org/r/20211206210147.872865...@linutronix.de

and has been applied to:

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/msi

This second part has the following important changes:

  1) Cleanup of the MSI related data in struct device

     struct device contains at the moment various MSI related parts. Some
     of them (the irq domain pointer) cannot be moved out, but the rest
     can be allocated on first use. This is in preparation of adding more
     per device MSI data later on.

  2) Consolidation of sysfs handling

     As a first step this moves the sysfs pointer from struct msi_desc
     into the new per device MSI data structure where it belongs. Later
     changes will clean this code up further, but that's not possible at
     this point.

  3) Use PCI device properties instead of looking up MSI descriptors and
     analysing their data.

  4) Provide a function to retrieve the Linux interrupt number for a
     given MSI index similar to pci_irq_vector() and clean up all open
     coded variants.

It's also available from git:

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git msi-v3-part-2

Part 3 of this effort is available on top:

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git msi-v3-part-3

Part 3 is not going to be reposted as there is no change vs. V2.

V2 of part 2 can be found here:

  https://lore.kernel.org/r/20211206210307.625116...@linutronix.de

Changes versus V2:

  - Use PCI device properties instead of creating a new set - Jason
  - Picked up Reviewed/Tested/Acked-by tags as appropriate

Thanks,

	tglx
---
 arch/powerpc/platforms/cell/axon_msi.c              |   5
 arch/powerpc/platforms/pseries/msi.c                |  38 +---
 arch/x86/kernel/apic/msi.c                          |   5
 arch/x86/pci/xen.c                                  |  11 -
 drivers/base/platform-msi.c                         | 152 ---
 drivers/bus/fsl-mc/dprc-driver.c                    |   8 -
 drivers/bus/fsl-mc/fsl-mc-allocator.c               |   9 -
 drivers/bus/fsl-mc/fsl-mc-msi.c                     |  26 +--
 drivers/dma/mv_xor_v2.c                             |  16 --
 drivers/dma/qcom/hidma.c                            |  44 ++---
 drivers/dma/ti/k3-udma-private.c                    |   6
 drivers/dma/ti/k3-udma.c                            |  14 -
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c         |  23 --
 drivers/irqchip/irq-mbigen.c                        |   4
 drivers/irqchip/irq-mvebu-icu.c                     |  12 -
 drivers/irqchip/irq-ti-sci-inta.c                   |   2
 drivers/mailbox/bcm-flexrm-mailbox.c                |   9 -
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c    |   4
 drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c    |   4
 drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c |   5
 drivers/pci/msi/irqdomain.c                         |  20 ++
 drivers/pci/msi/legacy.c                            |   6
 drivers/pci/msi/msi.c                               | 133 ++--
 drivers/pci/xen-pcifront.c                          |   2
 drivers/perf/arm_smmuv3_pmu.c                       |   5
 drivers/soc/fsl/dpio/dpio-driver.c                  |   8 -
 drivers/soc/ti/k3-ringacc.c                         |   6
 drivers/soc/ti/ti_sci_inta_msi.c                    |  22 --
 drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c              |   4
 include/linux/device.h                              |  25 ++-
 include/linux/fsl/mc.h                              |   4
 include/linux/msi.h                                 |  95
 include/linux/pci.h                                 |   1
 include/linux/soc/ti/ti_sci_inta_msi.h              |   1
 kernel/irq/msi.c                                    | 158 +++-
 35 files changed, 429 insertions(+), 458 deletions(-)
___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch V3 03/35] x86/apic/msi: Use PCI device MSI property
From: Thomas Gleixner instead of fiddling with MSI descriptors. Signed-off-by: Thomas Gleixner --- V3: Use pci_dev->msix_enabled - Jason --- arch/x86/kernel/apic/msi.c |5 + 1 file changed, 1 insertion(+), 4 deletions(-) --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -160,11 +160,8 @@ static struct irq_chip pci_msi_controlle int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec, msi_alloc_info_t *arg) { - struct pci_dev *pdev = to_pci_dev(dev); - struct msi_desc *desc = first_pci_msi_entry(pdev); - init_irq_alloc_info(arg, NULL); - if (desc->pci.msi_attrib.is_msix) { + if (to_pci_dev(dev)->msix_enabled) { arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX; } else { arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSI; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch V3 04/35] genirq/msi: Use PCI device property
From: Thomas Gleixner to determine whether this is MSI or MSIX instead of consulting MSI descriptors. Signed-off-by: Thomas Gleixner --- V2: Use PCI device property - Jason --- kernel/irq/msi.c | 17 ++--- 1 file changed, 2 insertions(+), 15 deletions(-) --- a/kernel/irq/msi.c +++ b/kernel/irq/msi.c @@ -77,21 +77,8 @@ EXPORT_SYMBOL_GPL(get_cached_msi_msg); static ssize_t msi_mode_show(struct device *dev, struct device_attribute *attr, char *buf) { - struct msi_desc *entry; - bool is_msix = false; - unsigned long irq; - int retval; - - retval = kstrtoul(attr->attr.name, 10, &irq); - if (retval) - return retval; - - entry = irq_get_msi_desc(irq); - if (!entry) - return -ENODEV; - - if (dev_is_pci(dev)) - is_msix = entry->pci.msi_attrib.is_msix; + /* MSI vs. MSIX is per device not per interrupt */ + bool is_msix = dev_is_pci(dev) ? to_pci_dev(dev)->msix_enabled : false; return sysfs_emit(buf, "%s\n", is_msix ? "msix" : "msi"); } ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch V3 02/35] x86/pci/XEN: Use PCI device property
From: Thomas Gleixner instead of fiddling with MSI descriptors. Signed-off-by: Thomas Gleixner Cc: Juergen Gross Cc: xen-de...@lists.xenproject.org --- V3: Use pci_dev->msix_enabled. --- arch/x86/pci/xen.c |9 ++--- 1 file changed, 2 insertions(+), 7 deletions(-) --- a/arch/x86/pci/xen.c +++ b/arch/x86/pci/xen.c @@ -399,9 +399,7 @@ static void xen_teardown_msi_irqs(struct static void xen_pv_teardown_msi_irqs(struct pci_dev *dev) { - struct msi_desc *msidesc = first_pci_msi_entry(dev); - - if (msidesc->pci.msi_attrib.is_msix) + if (dev->msix_enabled) xen_pci_frontend_disable_msix(dev); else xen_pci_frontend_disable_msi(dev); @@ -417,10 +415,7 @@ static int xen_msi_domain_alloc_irqs(str if (WARN_ON_ONCE(!dev_is_pci(dev))) return -EINVAL; - if (first_msi_entry(dev)->pci.msi_attrib.is_msix) - type = PCI_CAP_ID_MSIX; - else - type = PCI_CAP_ID_MSI; + type = to_pci_dev(dev)->msix_enabled ? PCI_CAP_ID_MSIX : PCI_CAP_ID_MSI; return xen_msi_ops.setup_msi_irqs(to_pci_dev(dev), nvec, type); } ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch V3 01/35] PCI/MSI: Set pci_dev::msi[x]_enabled early
There are quite some places which retrieve the first MSI descriptor to evaluate whether the setup is for MSI or MSI-X. That's required because pci_dev::msi[x]_enabled is only set when the setup completed successfully. There is no real reason why msi[x]_enabled can't be set at the beginning of the setup sequence and cleared in case of a failure. Implement that so the MSI descriptor evaluations can be converted to simple property queries. Signed-off-by: Thomas Gleixner --- V3: New patch --- drivers/pci/msi/msi.c | 23 +-- 1 file changed, 17 insertions(+), 6 deletions(-) --- a/drivers/pci/msi/msi.c +++ b/drivers/pci/msi/msi.c @@ -421,11 +421,18 @@ static int msi_capability_init(struct pc struct msi_desc *entry; int ret; - pci_msi_set_enable(dev, 0); /* Disable MSI during set up */ + /* +* Disable MSI during setup in the hardware, but mark it enabled +* so that setup code can evaluate it. +*/ + pci_msi_set_enable(dev, 0); + dev->msi_enabled = 1; entry = msi_setup_entry(dev, nvec, affd); - if (!entry) - return -ENOMEM; + if (!entry) { + ret = -ENOMEM; + goto fail; + } /* All MSIs are unmasked by default; mask them all */ pci_msi_mask(entry, msi_multi_mask(entry)); @@ -452,7 +459,6 @@ static int msi_capability_init(struct pc /* Set MSI enabled bits */ pci_intx_for_msi(dev, 0); pci_msi_set_enable(dev, 1); - dev->msi_enabled = 1; pcibios_free_irq(dev); dev->irq = entry->irq; @@ -461,6 +467,8 @@ static int msi_capability_init(struct pc err: pci_msi_unmask(entry, msi_multi_mask(entry)); free_msi_irqs(dev); +fail: + dev->msi_enabled = 0; return ret; } @@ -589,6 +597,9 @@ static int msix_capability_init(struct p pci_msix_clear_and_set_ctrl(dev, 0, PCI_MSIX_FLAGS_MASKALL | PCI_MSIX_FLAGS_ENABLE); + /* Mark it enabled so setup functions can query it */ + dev->msix_enabled = 1; + pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control); /* Request & Map MSI-X table region */ tsize = msix_table_size(control); @@ -626,9 +637,8 @@ static int msix_capability_init(struct p 
dev->msi_irq_groups = groups; - /* Set MSI-X enabled bits and unmask the function */ + /* Disable INTX and unmask MSI-X */ pci_intx_for_msi(dev, 0); - dev->msix_enabled = 1; pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0); pcibios_free_irq(dev); @@ -638,6 +648,7 @@ static int msix_capability_init(struct p free_msi_irqs(dev); out_disable: + dev->msix_enabled = 0; pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0); return ret; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()
Jason, On Fri, Dec 10 2021 at 08:39, Jason Gunthorpe wrote: > On Fri, Dec 10, 2021 at 07:29:01AM +, Tian, Kevin wrote: >> > 5) It's not possible for the kernel to reliably detect whether it is >> > running on bare metal or not. Yes we talked about heuristics, but >> > that's something I really want to avoid. >> >> How would the hypercall mechanism avoid such heuristics? > > It is clever, we don't have an vIOMMU that supplies vIR today, so by > definition all guests are excluded and only bare metal works. Dammit. Now you spilled the beans. :) >> > The charm is that his works for everything from INTx to IMS because all >> > of them go through the same procedure, except that INTx (IO/APIC) does >> > not support the reservation mode dance. > > Do we even have vIOAPIC? It does not matter much. INTx via IOAPIC is different anyway because INTx is shared so it's unclear from which device it originates. > It seems reasonable - do you have any idea how this all would work on > ARM too? IMS on baremetal ARM is surely interesting. I assume they > have a similar issue with trapping the MSI On baremetal it should just work once ARM is converted over. No idea about guests. Marc should know. >> Then Qemu needs to find out the GSI number for the vIRTE handle. >> Again Qemu doesn't have such information since it doesn't know >> which MSI[-X] entry points to this handle due to no trap. > > No this is already going wrong. qemu *cannot* know the MSI information > because there is no MSI information for IMS. > > All qemu should get is the origin device information and data about > how the guest wants the interrupt setup. > > Forget about guests and all of this complexity, design how to make > VFIO work with IMS in pure userspace like DPDK. > > We must have a VFIO ioctl to acquire a addr/data pair and link it to > an event fd. 
> I'm not sure exactly how this should be done, it is 90% of what IMS
> is, except the VFIO irq_chip cannot touch any real HW and certainly
> cannot do mask/unmask..
>
> Maybe that is OK now that it requires IR?

IR unfortunately does not allow masking, but we surely can come up with
some emergency button to press when e.g. an interrupt storm is detected.

Thanks,

	tglx
RE: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()
Kevin,

On Fri, Dec 10 2021 at 07:29, Kevin Tian wrote:
>> From: Thomas Gleixner
>> 4) For the guest side we agreed that we need an hypercall because the
>>    host can't trap the write to the MSI[-X] entry anymore.
>
> To be accurate I'd like to not call it "can't trap". The host still traps the
> MSI/MSI-X entry if the hypercall is not used. This is for current guest
> OS which doesn't have this hypercall mechanism. For future new guest
> OS which will support this machinery then a handshake process from
> such guest will disable the trap for MSI-X and map it for direct guest
> access on the fly.

Right. What I'm suggesting is a clear cut between the current approach,
which obviously needs to be preserved, and a consistent new approach
which handles MSI, MSI-X and IMS in exactly the same way.

> MSI has to be always trapped although the guest has acquired the right
> data/addr pair via the hypercall, since it's a PCI config capability.
>
>> Aside of the fact that this creates a special case for IMS which is
>> undesirable in my opinion, it's not really obvious where the
>> hypercall should be placed to work for all scenarios so that it can
>> also solve the existing issue of silent failures.
>>
>> 5) It's not possible for the kernel to reliably detect whether it is
>>    running on bare metal or not. Yes we talked about heuristics, but
>>    that's something I really want to avoid.
>
> How would the hypercall mechanism avoid such heuristics?

The availability of IR remapping, where the irqdomain which is provided
by the remapping unit signals that it supports this new scheme:

                 |--IO/APIC
                 |--MSI
  vector -- IR --|--MSI-X
                 |--IMS

while the current scheme is:

                 |--IO/APIC
  vector -- IR --|--PCI/MSI[-X]

or

         |--IO/APIC
  vector-|--PCI/MSI[-X]

So in the new scheme the IR domain will advertise new features which are
not available on older kernels. The availability of these new features
is the indicator for the interrupt subsystem and subsequently for PCI
whether IMS is supported or not.
Bootup either finds an IR unit or not. In the bare metal case that's the
usual hardware/firmware detection. In the guest case it's the
availability of vIR including the required hypercall protocol.

So for the guest case the initialization would look like this:

  qemu starts guest

     Method of interrupt management defaults to current scheme
     restricted to MSI/MSI-X

  guest initializes

     older guests do not check for the hypercall so everything
     continues as of today

     new guest initializes vIR, detects the hypercall and requests from
     the hypervisor to switch over to the new scheme. The hypervisor
     grants or rejects the request, i.e. either both switch to the new
     scheme or stay with the old one.

The new scheme means that all variants, MSI, MSI-X, IMS, are handled in
the same way. Staying on the old scheme means that IMS is not available
to the guest.

Having that clear separation is in my opinion way better than trying to
make all of that a big maze of conditionals. I'm going to make that
clear cut in the PCI/MSI management layer as well. Trying to do that
hybrid we do today for irqdomain and non irqdomain based backends is
just not feasible.

The decision which way to go will be very close to the driver exposed
API, i.e.:

  pci_alloc_ims_vector()
        if (new_scheme())
                return new_scheme_alloc_ims();
        else
                return -ENOTSUPP;

and new_scheme_alloc_ims() will have:

  new_scheme_alloc_ims()
        if (!system_supports_ims())
                return -ENOTSUPP;

system_supports_ims() makes its decision based on the feature flags
exposed by the underlying base MSI irqdomain, i.e. either vector or IR
on x86. The vector domain will not have that feature flag set, but IR
will have it on bare metal and eventually in the guest when the vIR
setup and hypercall detection and negotiation succeeds.

> Then Qemu needs to find out the GSI number for the vIRTE handle.
> Again Qemu doesn't have such information since it doesn't know
> which MSI[-X] entry points to this handle due to no trap.
> This implies that we may also need to carry device ID, #msi entry, etc.
> in the hypercall, so Qemu can associate the virtual routing info
> to the right [irqfd, gsi].
>
> In your model the hypercall is raised by the IR domain. Do you see
> any problem finding that information within the IR domain?

IR has the following information available:

  Interrupt type
    - MSI:   Device, index and number of vectors
    - MSI-X: Device, index
    - IMS:   Device, index

  Target
Re: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()
On Thu, Dec 09 2021 at 23:09, Thomas Gleixner wrote:
> On Thu, Dec 09 2021 at 16:58, Jason Gunthorpe wrote:
>> Okay, I think I get it. Would be nice to have someone from intel
>> familiar with the vIOMMU protocols and qemu code remark what the
>> hypervisor side can look like.
>>
>> There is a bit more work here, we'd have to change VFIO to somehow
>> entirely disconnect the kernel IRQ logic from the MSI table and
>> directly pass control of it to the guest after the hypervisor IOMMU IR
>> secures it. ie directly mmap the msi-x table into the guest
>
> That makes everything consistent and a clear cut on all levels, right?

Let me give a bit more rationale here, why I think this is the right
thing to do.

There are several problems with IMS both on the host and on the guest
side:

 1) Contrary to MSI/MSI-X the address/data pair is not completely
    managed by the core. It's handed off to driver writers in the hope
    they get it right.

 2) Without interrupt remapping there is a fundamental issue on x86 for
    the affinity setting case, as there is no guarantee that the magic
    protocol which we came up with (see msi_set_affinity() in the x86
    code) is correctly implemented at the driver level or that the
    update is truly atomic so that the problem does not arise. My
    interest in chasing these things is exactly zero.

    With interrupt remapping the affinity change happens at the IRTE
    level and not at the device level. It's a one time setup for the
    device.

    Just for the record: The ATH11 thing does not have that problem by
    pure luck because multi-vector MSI is not supported on X86 unless
    interrupt remapping is enabled. The switchtec NTB thing will fall
    apart w/o remapping AFAICT.

 3) With remapping the message for the device is constructed at
    allocation time. It does not change after that because the affinity
    change happens at the remapping level, which eliminates #2 above.
    That has another advantage for IMS because it does not require any
    synchronization with the queue or whatever is involved. The next
    interrupt after the change at the remapping level ends up on the
    new target.

 4) For the guest side we agreed that we need an hypercall because the
    host can't trap the write to the MSI[-X] entry anymore.

    Aside of the fact that this creates a special case for IMS which is
    undesirable in my opinion, it's not really obvious where the
    hypercall should be placed to work for all scenarios so that it can
    also solve the existing issue of silent failures.

 5) It's not possible for the kernel to reliably detect whether it is
    running on bare metal or not. Yes we talked about heuristics, but
    that's something I really want to avoid.

When looking at the above I came to the conclusion that the consistent
way is to make IMS depend on IR both on the host and the guest as this
solves all of the above in one go.

How would that work? With IR the irqdomain hierarchy looks like this:

                 |--IO/APIC
                 |--MSI
  vector -- IR --|--MSI-X
                 |--IMS

There are several contexts where this matters:

 1) Allocation of an interrupt, e.g. pci_alloc_irq_vectors().

 2) Activation of an interrupt which happens during allocation and/or
    at request_irq() time.

 3) Interrupt affinity setting

#1 Allocation

   That allocates an IRTE, which can fail.

#2 Activation

   That's the step where actually a CPU vector is allocated, where the
   IRTE is updated and where the device message is composed to target
   the IRTE.

   On X86 activation happens twice:

   1) During allocation it allocates a special CPU vector which is
      handed out to all allocated interrupts. That's called reservation
      mode. This was introduced to prevent vector exhaustion for two
      cases:

      - Devices allocating tons of MSI-X vectors without using them.
        That obviously needs to be fixed at the device driver level,
        but due to the fact that post probe() allocation is not
        supported, that's not always possible.

      - CPU hotunplug

        All vectors targeting the outgoing CPU need to be migrated to a
        new target CPU, which can result in exhaustion of the vector
        space. Reservation mode avoids that because it just uses a
        unique vector for all interrupts which are allocated but not
        requested.

   2) On request_irq()

      As the vector assigned during allocation is just a place holder
      to make the MSI hardware happy, it needs to be replaced by a real
      vector.

   Both can fail and the error is propagated through the call chain.

#3 Changing the interrupt affinity

   This obviously needs to allocate a new target CPU vector and update
   the IRTE.

   Allocating a new ta
Re: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()
On Thu, Dec 09 2021 at 16:58, Jason Gunthorpe wrote:
> On Thu, Dec 09, 2021 at 09:32:42PM +0100, Thomas Gleixner wrote:
>> That was my thought to avoid having different mechanisms.
>>
>> The address/data pair is computed in two places:
>>
>>  1) Activation of an interrupt
>>  2) Affinity setting on an interrupt
>>
>> Both configure the IRTE when interrupt remapping is in place.
>>
>> In both cases a vector is allocated in the vector domain and based on
>> the resulting target APIC / vector number pair the IRTE is
>> (re)configured.
>>
>> So putting the hypercall into the vIRTE update is the obvious
>> place. Both activation and affinity setting can fail and propagate an
>> error code down to the originating caller.
>>
>> Hmm?
>
> Okay, I think I get it. Would be nice to have someone from intel
> familiar with the vIOMMU protocols and qemu code remark what the
> hypervisor side can look like.
>
> There is a bit more work here, we'd have to change VFIO to somehow
> entirely disconnect the kernel IRQ logic from the MSI table and
> directly pass control of it to the guest after the hypervisor IOMMU IR
> secures it. ie directly mmap the msi-x table into the guest

That makes everything consistent and a clear cut on all levels, right?

Thanks,

	tglx
Re: [patch V2 19/36] PCI/MSI: Store properties in device::msi::data
On Thu, Dec 09 2021 at 18:53, Thomas Gleixner wrote:
> On Wed, Dec 08 2021 at 11:58, Jason Gunthorpe wrote:
>> On Mon, Dec 06, 2021 at 11:39:26PM +0100, Thomas Gleixner wrote:
>>> Store the properties which are interesting for various places so the MSI
>>> descriptor fiddling can be removed.
>>>
>>> Signed-off-by: Thomas Gleixner
>>> ---
>>> V2: Use the setter function
>>> ---
>>>  drivers/pci/msi/msi.c |8
>>>  1 file changed, 8 insertions(+)
>>
>> I took more time to look at this, to summarize my remarks on the other
>> patches
>>
>> I think we don't need properties. The info in the msi_desc can come
>> from the pci_dev which we have easy access to. This seems overall
>> clearer
>
> I fixed that now.

So much for the theory. dev->msi[x]_enabled are set after everything is
set up. Some of the places are part of the setup...

/me goes back to stare
Re: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()
On Thu, Dec 09 2021 at 12:21, Jason Gunthorpe wrote:
> On Thu, Dec 09, 2021 at 09:37:06AM +0100, Thomas Gleixner wrote:
> If we keep the MSI emulation in the hypervisor then MSI != IMS. The
> MSI code needs to write a addr/data pair compatible with the emulation
> and the IMS code needs to write an addr/data pair from the
> hypercall. Seems like this scenario is best avoided!
>
> From this perspective I haven't connected how virtual interrupt
> remapping helps in the guest? Is this a way to provide the hypercall
> I'm imagining above?

That was my thought to avoid having different mechanisms.

The address/data pair is computed in two places:

 1) Activation of an interrupt
 2) Affinity setting on an interrupt

Both configure the IRTE when interrupt remapping is in place.

In both cases a vector is allocated in the vector domain and based on
the resulting target APIC / vector number pair the IRTE is
(re)configured.

So putting the hypercall into the vIRTE update is the obvious
place. Both activation and affinity setting can fail and propagate an
error code down to the originating caller.

Hmm?

Thanks,

	tglx
Re: [patch V2 19/36] PCI/MSI: Store properties in device::msi::data
On Wed, Dec 08 2021 at 11:58, Jason Gunthorpe wrote:
> On Mon, Dec 06, 2021 at 11:39:26PM +0100, Thomas Gleixner wrote:
>> Store the properties which are interesting for various places so the MSI
>> descriptor fiddling can be removed.
>>
>> Signed-off-by: Thomas Gleixner
>> ---
>> V2: Use the setter function
>> ---
>>  drivers/pci/msi/msi.c |8
>>  1 file changed, 8 insertions(+)
>
> I took more time to look at this, to summarize my remarks on the other
> patches
>
> I think we don't need properties. The info in the msi_desc can come
> from the pci_dev which we have easy access to. This seems overall
> clearer

I fixed that now.

> The notable one is the sysfs, but that is probably better handled by
> storing a
>
>   const char *sysfs_label
>
> in the dev->msi and emitting that instead of computing it.

I just compute it for now via is_pci_dev() and
to_pci_dev()->msi_enabled. We are still debating to remove the whole
thing completely.

Thanks,

	tglx