Re: PCI MSI issue with reinserting a driver
On 03/02/2021 17:23, Marc Zyngier wrote: On 2021-02-02 15:46, John Garry wrote: On 02/02/2021 14:48, Marc Zyngier wrote: Not sure. I also now notice an error for the SAS PCI driver on D06 when nr_cpus < 16, which means number of MSI vectors allocated < 32, so looks the same problem. There we try to allocate 16 + max(nr cpus, 16) MSI. Anyway, let me have a look today to see what is going wrong. Could this be the problem: nr_cpus=11 In alloc path, we have: its_alloc_device_irq(nvecs=27 = 16+11) bitmap_find_free_region(order = 5); In free path, we have: its_irq_domain_free(nvecs = 1) and free each 27 vecs bitmap_release_region(order = 0) So we allocate 32 bits, but only free 27. And 2nd alloc for 32 fails. [ ... ] Hi Marc, Just a friendly reminder on this issue. Let me know if anything required some our side. Cheers, John But I'm not sure that we have any requirement for those map bits to be consecutive. We can't really do that. All the events must be contiguous, and there is also a lot of assumptions in the ITS driver that LPI allocations is also contiguous. But there is also the fact that for Multi-MSI, we *must* allocate 32 vectors. Any driver could assume that if we have allocated 17 vectors, then there is another 15 available. My question still stand: how was this working with the previous behaviour? Because previously in this scenario we would allocate 32 bits and free 32 bits in the map; but now we allocate 32 bits, yet only free 27 - so leak 5 bits. And this comes from how irq_domain_free_irqs_hierarchy() now frees per-interrupt, instead of all irqs per domain. Before: In free path, we have: its_irq_domain_free(nvecs = 27) bitmap_release_region(count order = 5 == 32bits) Current: In free path, we have: its_irq_domain_free(nvecs = 1) for free each 27 vecs bitmap_release_region(count order = 0 == 1bit) Right. I was focusing on the patch and blindly ignored the explanation at the top of the email. Apologies for that. I'm not overly keen on handling this in the ITS though, and I'd rather we try and do it in the generic code. How about this (compile tested only)? Thanks, M. diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c index 6aacd342cd14..cfccad83c2df 100644 --- a/kernel/irq/irqdomain.c +++ b/kernel/irq/irqdomain.c @@ -1399,8 +1399,19 @@ static void irq_domain_free_irqs_hierarchy(struct irq_domain *domain, return; for (i = 0; i < nr_irqs; i++) { - if (irq_domain_get_irq_data(domain, irq_base + i)) - domain->ops->free(domain, irq_base + i, 1); + int n ; + + /* Find the largest possible span of IRQs to free in one go */ + for (n = 0; + ((i + n) < nr_irqs && + irq_domain_get_irq_data(domain, irq_base + i + n)); + n++); + + if (!n) + continue; + + domain->ops->free(domain, irq_base + i, n); + i += n; } }
Re: PCI MSI issue with reinserting a driver
On 03/02/2021 17:23, Marc Zyngier wrote: Before: In free path, we have: its_irq_domain_free(nvecs = 27) bitmap_release_region(count order = 5 == 32bits) Current: In free path, we have: its_irq_domain_free(nvecs = 1) for free each 27 vecs bitmap_release_region(count order = 0 == 1bit) Right. I was focusing on the patch and blindly ignored the explanation at the top of the email. Apologies for that. Yeah, it was a distraction. I'm not overly keen on handling this in the ITS though, and I'd rather we try and do it in the generic code. How about this (compile tested only)? diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c index 6aacd342cd14..cfccad83c2df 100644 --- a/kernel/irq/irqdomain.c +++ b/kernel/irq/irqdomain.c @@ -1399,8 +1399,19 @@ static void irq_domain_free_irqs_hierarchy(struct irq_domain *domain, return; for (i = 0; i < nr_irqs; i++) { - if (irq_domain_get_irq_data(domain, irq_base + i)) - domain->ops->free(domain, irq_base + i, 1); + int n ; + + /* Find the largest possible span of IRQs to free in one go */ + for (n = 0; + ((i + n) < nr_irqs && + irq_domain_get_irq_data(domain, irq_base + i + n)); + n++); + + if (!n) + continue; + + domain->ops->free(domain, irq_base + i, n); + i += n; } } That fixed my problem. For my benefit, if MSIs must be allocated in power of 2, could we check a flag for the dealloc method? Like, if MSI domain, then do as before 4615fbc3788d. But I'm not sure on the specific scenario which that commit fixed. Or even whether you want specifics like that in core code. Thanks, John
Re: PCI MSI issue with reinserting a driver
On 2021-02-02 15:46, John Garry wrote: On 02/02/2021 14:48, Marc Zyngier wrote: Not sure. I also now notice an error for the SAS PCI driver on D06 when nr_cpus < 16, which means number of MSI vectors allocated < 32, so looks the same problem. There we try to allocate 16 + max(nr cpus, 16) MSI. Anyway, let me have a look today to see what is going wrong. Could this be the problem: nr_cpus=11 In alloc path, we have: its_alloc_device_irq(nvecs=27 = 16+11) bitmap_find_free_region(order = 5); In free path, we have: its_irq_domain_free(nvecs = 1) and free each 27 vecs bitmap_release_region(order = 0) So we allocate 32 bits, but only free 27. And 2nd alloc for 32 fails. [ ... ] But I'm not sure that we have any requirement for those map bits to be consecutive. We can't really do that. All the events must be contiguous, and there is also a lot of assumptions in the ITS driver that LPI allocations is also contiguous. But there is also the fact that for Multi-MSI, we *must* allocate 32 vectors. Any driver could assume that if we have allocated 17 vectors, then there is another 15 available. My question still stand: how was this working with the previous behaviour? Because previously in this scenario we would allocate 32 bits and free 32 bits in the map; but now we allocate 32 bits, yet only free 27 - so leak 5 bits. And this comes from how irq_domain_free_irqs_hierarchy() now frees per-interrupt, instead of all irqs per domain. Before: In free path, we have: its_irq_domain_free(nvecs = 27) bitmap_release_region(count order = 5 == 32bits) Current: In free path, we have: its_irq_domain_free(nvecs = 1) for free each 27 vecs bitmap_release_region(count order = 0 == 1bit) Right. I was focusing on the patch and blindly ignored the explanation at the top of the email. Apologies for that. I'm not overly keen on handling this in the ITS though, and I'd rather we try and do it in the generic code. How about this (compile tested only)? Thanks, M. diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c index 6aacd342cd14..cfccad83c2df 100644 --- a/kernel/irq/irqdomain.c +++ b/kernel/irq/irqdomain.c @@ -1399,8 +1399,19 @@ static void irq_domain_free_irqs_hierarchy(struct irq_domain *domain, return; for (i = 0; i < nr_irqs; i++) { - if (irq_domain_get_irq_data(domain, irq_base + i)) - domain->ops->free(domain, irq_base + i, 1); + int n ; + + /* Find the largest possible span of IRQs to free in one go */ + for (n = 0; +((i + n) < nr_irqs && + irq_domain_get_irq_data(domain, irq_base + i + n)); +n++); + + if (!n) + continue; + + domain->ops->free(domain, irq_base + i, n); + i += n; } } -- Jazz is not dead. It just smells funny...
Re: PCI MSI issue with reinserting a driver
On 02/02/2021 14:48, Marc Zyngier wrote: Not sure. I also now notice an error for the SAS PCI driver on D06 when nr_cpus < 16, which means number of MSI vectors allocated < 32, so looks the same problem. There we try to allocate 16 + max(nr cpus, 16) MSI. Anyway, let me have a look today to see what is going wrong. Could this be the problem: nr_cpus=11 In alloc path, we have: its_alloc_device_irq(nvecs=27 = 16+11) bitmap_find_free_region(order = 5); In free path, we have: its_irq_domain_free(nvecs = 1) and free each 27 vecs bitmap_release_region(order = 0) So we allocate 32 bits, but only free 27. And 2nd alloc for 32 fails. [ ... ] But I'm not sure that we have any requirement for those map bits to be consecutive. We can't really do that. All the events must be contiguous, and there is also a lot of assumptions in the ITS driver that LPI allocations is also contiguous. But there is also the fact that for Multi-MSI, we *must* allocate 32 vectors. Any driver could assume that if we have allocated 17 vectors, then there is another 15 available. My question still stand: how was this working with the previous behaviour? Because previously in this scenario we would allocate 32 bits and free 32 bits in the map; but now we allocate 32 bits, yet only free 27 - so leak 5 bits. And this comes from how irq_domain_free_irqs_hierarchy() now frees per-interrupt, instead of all irqs per domain. Before: In free path, we have: its_irq_domain_free(nvecs = 27) bitmap_release_region(count order = 5 == 32bits) Current: In free path, we have: its_irq_domain_free(nvecs = 1) for free each 27 vecs bitmap_release_region(count order = 0 == 1bit) Cheers, John
Re: PCI MSI issue with reinserting a driver
On 2021-02-02 12:38, John Garry wrote: Here's my suspicion: two of the interrupts are mapped in the low-level domain (the ITS, I'd expect in your case), but they have never been mapped at the higher level. On teardown, we only get rid of the 30 that were actually mapped, and leave the last two dangling in the ITS domain, and thus the ITS device resources are never freed. On reload, we request another 32 interrupts, which can't be satisfied for this device. Assuming I got it right, the question is: why weren't these interrupts mapped in the PCI domain the first place. And if I got it wrong, I'm even more curious! Not sure. I also now notice an error for the SAS PCI driver on D06 when nr_cpus < 16, which means number of MSI vectors allocated < 32, so looks the same problem. There we try to allocate 16 + max(nr cpus, 16) MSI. Anyway, let me have a look today to see what is going wrong. Could this be the problem: nr_cpus=11 In alloc path, we have: its_alloc_device_irq(nvecs=27 = 16+11) bitmap_find_free_region(order = 5); In free path, we have: its_irq_domain_free(nvecs = 1) and free each 27 vecs bitmap_release_region(order = 0) So we allocate 32 bits, but only free 27. And 2nd alloc for 32 fails. FWIW, this hack seems to fix it: >8- diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index ac5412b284e6..458ea0ebea2b 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -3533,34 +3534,39 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, struct its_device *its_dev = info->scratchpad[0].ptr; struct its_node *its = its_dev->its; struct irq_data *irqd; - irq_hw_number_t hwirq; + irq_hw_number_t hwirq[nr_irqs]; //vla :( int err; int i; - err = its_alloc_device_irq(its_dev, nr_irqs, ); + for (i = 0; i < nr_irqs; i++) { + err = its_alloc_device_irq(its_dev, 1, [i]); + if (err) //tidy + return err; + } + - if (err) - return err; err = iommu_dma_prepare_msi(info->desc, its->get_msi_base(its_dev)); if (err) return err; for (i = 0; i < nr_irqs; i++) { - err = its_irq_gic_domain_alloc(domain, virq + i, hwirq + i); + err = its_irq_gic_domain_alloc(domain, virq + i, hwirq[i]); if (err) return err; irq_domain_set_hwirq_and_chip(domain, virq + i, - hwirq + i, _irq_chip, its_dev); + hwirq[i], _irq_chip, its_dev); irqd = irq_get_irq_data(virq + i); irqd_set_single_target(irqd); irqd_set_affinity_on_activate(irqd); pr_debug("ID:%d pID:%d vID:%d\n", -(int)(hwirq + i - its_dev->event_map.lpi_base), -(int)(hwirq + i), virq + i); +(int)(hwirq[i] - its_dev->event_map.lpi_base), +(int)(hwirq[i]), virq + i); } 8<- But I'm not sure that we have any requirement for those map bits to be consecutive. We can't really do that. All the events must be contiguous, and there is also a lot of assumptions in the ITS driver that LPI allocations is also contiguous. But there is also the fact that for Multi-MSI, we *must* allocate 32 vectors. Any driver could assume that if we have allocated 17 vectors, then there is another 15 available. My question still stand: how was this working with the previous behaviour? Thanks, M. -- Jazz is not dead. It just smells funny...
Re: PCI MSI issue with reinserting a driver
Here's my suspicion: two of the interrupts are mapped in the low-level domain (the ITS, I'd expect in your case), but they have never been mapped at the higher level. On teardown, we only get rid of the 30 that were actually mapped, and leave the last two dangling in the ITS domain, and thus the ITS device resources are never freed. On reload, we request another 32 interrupts, which can't be satisfied for this device. Assuming I got it right, the question is: why weren't these interrupts mapped in the PCI domain the first place. And if I got it wrong, I'm even more curious! Not sure. I also now notice an error for the SAS PCI driver on D06 when nr_cpus < 16, which means number of MSI vectors allocated < 32, so looks the same problem. There we try to allocate 16 + max(nr cpus, 16) MSI. Anyway, let me have a look today to see what is going wrong. Could this be the problem: nr_cpus=11 In alloc path, we have: its_alloc_device_irq(nvecs=27 = 16+11) bitmap_find_free_region(order = 5); In free path, we have: its_irq_domain_free(nvecs = 1) and free each 27 vecs bitmap_release_region(order = 0) So we allocate 32 bits, but only free 27. And 2nd alloc for 32 fails. FWIW, this hack seems to fix it: >8- diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index ac5412b284e6..458ea0ebea2b 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -3533,34 +3534,39 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, struct its_device *its_dev = info->scratchpad[0].ptr; struct its_node *its = its_dev->its; struct irq_data *irqd; - irq_hw_number_t hwirq; + irq_hw_number_t hwirq[nr_irqs]; //vla :( int err; int i; - err = its_alloc_device_irq(its_dev, nr_irqs, ); + for (i = 0; i < nr_irqs; i++) { + err = its_alloc_device_irq(its_dev, 1, [i]); + if (err) //tidy + return err; + } + - if (err) - return err; err = iommu_dma_prepare_msi(info->desc, its->get_msi_base(its_dev)); if (err) return err; for (i = 0; i < nr_irqs; i++) { - err = its_irq_gic_domain_alloc(domain, virq + i, hwirq + i); + err = its_irq_gic_domain_alloc(domain, virq + i, hwirq[i]); if (err) return err; irq_domain_set_hwirq_and_chip(domain, virq + i, - hwirq + i, _irq_chip, its_dev); + hwirq[i], _irq_chip, its_dev); irqd = irq_get_irq_data(virq + i); irqd_set_single_target(irqd); irqd_set_affinity_on_activate(irqd); pr_debug("ID:%d pID:%d vID:%d\n", -(int)(hwirq + i - its_dev->event_map.lpi_base), -(int)(hwirq + i), virq + i); +(int)(hwirq[i] - its_dev->event_map.lpi_base), +(int)(hwirq[i]), virq + i); } 8<- But I'm not sure that we have any requirement for those map bits to be consecutive. Thanks, John
Re: PCI MSI issue with reinserting a driver
On 01/02/2021 18:50, Marc Zyngier wrote: Hi Marc, Just a heads-up, by chance I noticed that I can't re-insert a specific driver on v5.11-rc6: [ 64.356023] hisi_dma :7b:00.0: Adding to iommu group 31 [ 64.368627] hisi_dma :7b:00.0: enabling device ( -> 0002) [ 64.384156] hisi_dma :7b:00.0: Failed to allocate MSI vectors! [ 64.397180] hisi_dma: probe of :7b:00.0 failed with error -28 That's with CONFIG_DEBUG_TEST_DRIVER_REMOVE=y Bisect tells me that this is the first bad commit: 4615fbc3788d genirq/irqdomain: Don't try to free an interrupt that has no mapping The relevant driver code is https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dma/hisi_dma.c#n547 That driver only allocates 30 MSI, so maybe there's a problem with not allocating (and freeing) all 32 MSI. Are they Multi-MSI (and not MSI-X)? multi-msi I'll have a bit more of a look tomorrow. Here's my suspicion: two of the interrupts are mapped in the low-level domain (the ITS, I'd expect in your case), but they have never been mapped at the higher level. On teardown, we only get rid of the 30 that were actually mapped, and leave the last two dangling in the ITS domain, and thus the ITS device resources are never freed. On reload, we request another 32 interrupts, which can't be satisfied for this device. Assuming I got it right, the question is: why weren't these interrupts mapped in the PCI domain the first place. And if I got it wrong, I'm even more curious! Not sure. I also now notice an error for the SAS PCI driver on D06 when nr_cpus < 16, which means number of MSI vectors allocated < 32, so looks the same problem. There we try to allocate 16 + max(nr cpus, 16) MSI. Anyway, let me have a look today to see what is going wrong. cheers, John
Re: PCI MSI issue with reinserting a driver
Hi John, On Mon, 01 Feb 2021 18:34:59 +, John Garry wrote: > > Just a heads-up, by chance I noticed that I can't re-insert a specific > driver on v5.11-rc6: > > [ 64.356023] hisi_dma :7b:00.0: Adding to iommu group 31 > [ 64.368627] hisi_dma :7b:00.0: enabling device ( -> 0002) > [ 64.384156] hisi_dma :7b:00.0: Failed to allocate MSI vectors! > [ 64.397180] hisi_dma: probe of :7b:00.0 failed with error -28 > > That's with CONFIG_DEBUG_TEST_DRIVER_REMOVE=y > > Bisect tells me that this is the first bad commit: > 4615fbc3788d genirq/irqdomain: Don't try to free an interrupt that has > no mapping > > The relevant driver code is > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dma/hisi_dma.c#n547 > > That driver only allocates 30 MSI, so maybe there's a problem with not > allocating (and freeing) all 32 MSI. Are they Multi-MSI (and not MSI-X)? > I'll have a bit more of a look tomorrow. Here's my suspicion: two of the interrupts are mapped in the low-level domain (the ITS, I'd expect in your case), but they have never been mapped at the higher level. On teardown, we only get rid of the 30 that were actually mapped, and leave the last two dangling in the ITS domain, and thus the ITS device resources are never freed. On reload, we request another 32 interrupts, which can't be satisfied for this device. Assuming I got it right, the question is: why weren't these interrupts mapped in the PCI domain the first place. And if I got it wrong, I'm even more curious! Thanks, M. -- Without deviation from the norm, progress is not possible.