Re: PCI MSI issue with reinserting a driver

2021-04-06 Thread John Garry

On 03/02/2021 17:23, Marc Zyngier wrote:

On 2021-02-02 15:46, John Garry wrote:

On 02/02/2021 14:48, Marc Zyngier wrote:


Not sure. I also now notice an error for the SAS PCI driver on D06 
when nr_cpus < 16, which means number of MSI vectors allocated < 
32, so looks the same problem. There we try to allocate 16 + max(nr 
cpus, 16) MSI.


Anyway, let me have a look today to see what is going wrong.


Could this be the problem:

nr_cpus=11

In alloc path, we have:
its_alloc_device_irq(nvecs=27 = 16+11)
  bitmap_find_free_region(order = 5);
In free path, we have:
its_irq_domain_free(nvecs = 1) and free each 27 vecs
  bitmap_release_region(order = 0)

So we allocate 32 bits, but only free 27. And 2nd alloc for 32 fails.


[ ... ]





Hi Marc,

Just a friendly reminder on this issue. Let me know if anything required 
some our side.


Cheers,
John



But I'm not sure that we have any requirement for those map bits to be
consecutive.


We can't really do that. All the events must be contiguous,
and there is also a lot of assumptions in the ITS driver that
LPI allocations is also contiguous.

But there is also the fact that for Multi-MSI, we *must*
allocate 32 vectors. Any driver could assume that if we have
allocated 17 vectors, then there is another 15 available.

My question still stand: how was this working with the previous
behaviour?


Because previously in this scenario we would allocate 32 bits and free
32 bits in the map; but now we allocate 32 bits, yet only free 27 - so
leak 5 bits. And this comes from how irq_domain_free_irqs_hierarchy()
now frees per-interrupt, instead of all irqs per domain.

Before:
 In free path, we have:
 its_irq_domain_free(nvecs = 27)
   bitmap_release_region(count order = 5 == 32bits)

Current:
 In free path, we have:
 its_irq_domain_free(nvecs = 1) for free each 27 vecs
   bitmap_release_region(count order = 0 == 1bit)


Right. I was focusing on the patch and blindly ignored the explanation
at the top of the email. Apologies for that.

I'm not overly keen on handling this in the ITS though, and I'd rather
we try and do it in the generic code. How about this (compile tested
only)?

Thanks,

     M.

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 6aacd342cd14..cfccad83c2df 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -1399,8 +1399,19 @@ static void irq_domain_free_irqs_hierarchy(struct 
irq_domain *domain,

  return;

  for (i = 0; i < nr_irqs; i++) {
-    if (irq_domain_get_irq_data(domain, irq_base + i))
-    domain->ops->free(domain, irq_base + i, 1);
+    int n ;
+
+    /* Find the largest possible span of IRQs to free in one go */
+    for (n = 0;
+ ((i + n) < nr_irqs &&
+  irq_domain_get_irq_data(domain, irq_base + i + n));
+ n++);
+
+    if (!n)
+    continue;
+
+    domain->ops->free(domain, irq_base + i, n);
+    i += n;
  }
  }






Re: PCI MSI issue with reinserting a driver

2021-02-04 Thread John Garry

On 03/02/2021 17:23, Marc Zyngier wrote:


Before:
 In free path, we have:
 its_irq_domain_free(nvecs = 27)
   bitmap_release_region(count order = 5 == 32bits)

Current:
 In free path, we have:
 its_irq_domain_free(nvecs = 1) for free each 27 vecs
   bitmap_release_region(count order = 0 == 1bit)


Right. I was focusing on the patch and blindly ignored the explanation
at the top of the email. Apologies for that.


Yeah, it was a distraction.



I'm not overly keen on handling this in the ITS though, and I'd rather
we try and do it in the generic code. How about this (compile tested
only)?

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 6aacd342cd14..cfccad83c2df 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -1399,8 +1399,19 @@ static void irq_domain_free_irqs_hierarchy(struct 
irq_domain *domain,

  return;

  for (i = 0; i < nr_irqs; i++) {
-    if (irq_domain_get_irq_data(domain, irq_base + i))
-    domain->ops->free(domain, irq_base + i, 1);
+    int n ;
+
+    /* Find the largest possible span of IRQs to free in one go */
+    for (n = 0;
+ ((i + n) < nr_irqs &&
+  irq_domain_get_irq_data(domain, irq_base + i + n));
+ n++);
+
+    if (!n)
+    continue;
+
+    domain->ops->free(domain, irq_base + i, n);
+    i += n;
  }
  }


That fixed my problem.

For my benefit, if MSIs must be allocated in power of 2, could we check 
a flag for the dealloc method? Like, if MSI domain, then do as before 
4615fbc3788d. But I'm not sure on the specific scenario which that 
commit fixed. Or even whether you want specifics like that in core code.


Thanks,
John


Re: PCI MSI issue with reinserting a driver

2021-02-03 Thread Marc Zyngier

On 2021-02-02 15:46, John Garry wrote:

On 02/02/2021 14:48, Marc Zyngier wrote:


Not sure. I also now notice an error for the SAS PCI driver on D06 
when nr_cpus < 16, which means number of MSI vectors allocated < 32, 
so looks the same problem. There we try to allocate 16 + max(nr 
cpus, 16) MSI.


Anyway, let me have a look today to see what is going wrong.


Could this be the problem:

nr_cpus=11

In alloc path, we have:
its_alloc_device_irq(nvecs=27 = 16+11)
  bitmap_find_free_region(order = 5);
In free path, we have:
its_irq_domain_free(nvecs = 1) and free each 27 vecs
  bitmap_release_region(order = 0)

So we allocate 32 bits, but only free 27. And 2nd alloc for 32 fails.


[ ... ]




But I'm not sure that we have any requirement for those map bits to 
be

consecutive.


We can't really do that. All the events must be contiguous,
and there is also a lot of assumptions in the ITS driver that
LPI allocations is also contiguous.

But there is also the fact that for Multi-MSI, we *must*
allocate 32 vectors. Any driver could assume that if we have
allocated 17 vectors, then there is another 15 available.

My question still stand: how was this working with the previous
behaviour?


Because previously in this scenario we would allocate 32 bits and free
32 bits in the map; but now we allocate 32 bits, yet only free 27 - so
leak 5 bits. And this comes from how irq_domain_free_irqs_hierarchy()
now frees per-interrupt, instead of all irqs per domain.

Before:
 In free path, we have:
 its_irq_domain_free(nvecs = 27)
   bitmap_release_region(count order = 5 == 32bits)

Current:
 In free path, we have:
 its_irq_domain_free(nvecs = 1) for free each 27 vecs
   bitmap_release_region(count order = 0 == 1bit)


Right. I was focusing on the patch and blindly ignored the explanation
at the top of the email. Apologies for that.

I'm not overly keen on handling this in the ITS though, and I'd rather
we try and do it in the generic code. How about this (compile tested
only)?

Thanks,

M.

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 6aacd342cd14..cfccad83c2df 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -1399,8 +1399,19 @@ static void irq_domain_free_irqs_hierarchy(struct 
irq_domain *domain,

return;

for (i = 0; i < nr_irqs; i++) {
-   if (irq_domain_get_irq_data(domain, irq_base + i))
-   domain->ops->free(domain, irq_base + i, 1);
+   int n ;
+
+   /* Find the largest possible span of IRQs to free in one go */
+   for (n = 0;
+((i + n) < nr_irqs &&
+ irq_domain_get_irq_data(domain, irq_base + i + n));
+n++);
+
+   if (!n)
+   continue;
+
+   domain->ops->free(domain, irq_base + i, n);
+   i += n;
}
 }


--
Jazz is not dead. It just smells funny...


Re: PCI MSI issue with reinserting a driver

2021-02-02 Thread John Garry

On 02/02/2021 14:48, Marc Zyngier wrote:


Not sure. I also now notice an error for the SAS PCI driver on D06 
when nr_cpus < 16, which means number of MSI vectors allocated < 32, 
so looks the same problem. There we try to allocate 16 + max(nr cpus, 
16) MSI.


Anyway, let me have a look today to see what is going wrong.


Could this be the problem:

nr_cpus=11

In alloc path, we have:
its_alloc_device_irq(nvecs=27 = 16+11)
  bitmap_find_free_region(order = 5);
In free path, we have:
its_irq_domain_free(nvecs = 1) and free each 27 vecs
  bitmap_release_region(order = 0)

So we allocate 32 bits, but only free 27. And 2nd alloc for 32 fails.


[ ... ]




But I'm not sure that we have any requirement for those map bits to be
consecutive.


We can't really do that. All the events must be contiguous,
and there is also a lot of assumptions in the ITS driver that
LPI allocations is also contiguous.

But there is also the fact that for Multi-MSI, we *must*
allocate 32 vectors. Any driver could assume that if we have
allocated 17 vectors, then there is another 15 available.

My question still stand: how was this working with the previous
behaviour?


Because previously in this scenario we would allocate 32 bits and free 
32 bits in the map; but now we allocate 32 bits, yet only free 27 - so 
leak 5 bits. And this comes from how irq_domain_free_irqs_hierarchy() 
now frees per-interrupt, instead of all irqs per domain.


Before:
 In free path, we have:
 its_irq_domain_free(nvecs = 27)
   bitmap_release_region(count order = 5 == 32bits)

Current:
 In free path, we have:
 its_irq_domain_free(nvecs = 1) for free each 27 vecs
   bitmap_release_region(count order = 0 == 1bit)

Cheers,
John


Re: PCI MSI issue with reinserting a driver

2021-02-02 Thread Marc Zyngier

On 2021-02-02 12:38, John Garry wrote:
Here's my suspicion: two of the interrupts are mapped in the 
low-level

domain (the ITS, I'd expect in your case), but they have never been
mapped at the higher level.

On teardown, we only get rid of the 30 that were actually mapped, and
leave the last two dangling in the ITS domain, and thus the ITS 
device

resources are never freed. On reload, we request another 32
interrupts, which can't be satisfied for this device.

Assuming I got it right, the question is: why weren't these 
interrupts

mapped in the PCI domain the first place. And if I got it wrong, I'm
even more curious!


Not sure. I also now notice an error for the SAS PCI driver on D06 
when nr_cpus < 16, which means number of MSI vectors allocated < 32, 
so looks the same problem. There we try to allocate 16 + max(nr cpus, 
16) MSI.


Anyway, let me have a look today to see what is going wrong.


Could this be the problem:

nr_cpus=11

In alloc path, we have:
its_alloc_device_irq(nvecs=27 = 16+11)
  bitmap_find_free_region(order = 5);
In free path, we have:
its_irq_domain_free(nvecs = 1) and free each 27 vecs
  bitmap_release_region(order = 0)

So we allocate 32 bits, but only free 27. And 2nd alloc for 32 fails.

FWIW, this hack seems to fix it:

>8-

diff --git a/drivers/irqchip/irq-gic-v3-its.c 
b/drivers/irqchip/irq-gic-v3-its.c

index ac5412b284e6..458ea0ebea2b 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c

@@ -3533,34 +3534,39 @@ static int its_irq_domain_alloc(struct
irq_domain *domain, unsigned int virq,
struct its_device *its_dev = info->scratchpad[0].ptr;
struct its_node *its = its_dev->its;
struct irq_data *irqd;
-   irq_hw_number_t hwirq;
+   irq_hw_number_t hwirq[nr_irqs]; //vla :(
int err;
int i;

-   err = its_alloc_device_irq(its_dev, nr_irqs, );
+   for (i = 0; i < nr_irqs; i++) {
+   err = its_alloc_device_irq(its_dev, 1, [i]);
+   if (err) //tidy
+   return err;
+   }
+
-   if (err)
-   return err;

err = iommu_dma_prepare_msi(info->desc, its->get_msi_base(its_dev));
if (err)
return err;

for (i = 0; i < nr_irqs; i++) {
-   err = its_irq_gic_domain_alloc(domain, virq + i, hwirq + i);
+   err = its_irq_gic_domain_alloc(domain, virq + i, hwirq[i]);
if (err)
return err;

irq_domain_set_hwirq_and_chip(domain, virq + i,
- hwirq + i, _irq_chip, 
its_dev);
+ hwirq[i], _irq_chip, its_dev);
irqd = irq_get_irq_data(virq + i);
irqd_set_single_target(irqd);
irqd_set_affinity_on_activate(irqd);
pr_debug("ID:%d pID:%d vID:%d\n",
-(int)(hwirq + i - its_dev->event_map.lpi_base),
-(int)(hwirq + i), virq + i);
+(int)(hwirq[i] - its_dev->event_map.lpi_base),
+(int)(hwirq[i]), virq + i);
}
8<-


But I'm not sure that we have any requirement for those map bits to be
consecutive.


We can't really do that. All the events must be contiguous,
and there is also a lot of assumptions in the ITS driver that
LPI allocations is also contiguous.

But there is also the fact that for Multi-MSI, we *must*
allocate 32 vectors. Any driver could assume that if we have
allocated 17 vectors, then there is another 15 available.

My question still stand: how was this working with the previous
behaviour?

Thanks,

M.
--
Jazz is not dead. It just smells funny...


Re: PCI MSI issue with reinserting a driver

2021-02-02 Thread John Garry

Here's my suspicion: two of the interrupts are mapped in the low-level
domain (the ITS, I'd expect in your case), but they have never been
mapped at the higher level.

On teardown, we only get rid of the 30 that were actually mapped, and
leave the last two dangling in the ITS domain, and thus the ITS device
resources are never freed. On reload, we request another 32
interrupts, which can't be satisfied for this device.

Assuming I got it right, the question is: why weren't these interrupts
mapped in the PCI domain the first place. And if I got it wrong, I'm
even more curious!


Not sure. I also now notice an error for the SAS PCI driver on D06 when 
nr_cpus < 16, which means number of MSI vectors allocated < 32, so looks 
the same problem. There we try to allocate 16 + max(nr cpus, 16) MSI.


Anyway, let me have a look today to see what is going wrong.


Could this be the problem:

nr_cpus=11

In alloc path, we have:
its_alloc_device_irq(nvecs=27 = 16+11)
  bitmap_find_free_region(order = 5);
In free path, we have:
its_irq_domain_free(nvecs = 1) and free each 27 vecs
  bitmap_release_region(order = 0)

So we allocate 32 bits, but only free 27. And 2nd alloc for 32 fails.

FWIW, this hack seems to fix it:

>8-

diff --git a/drivers/irqchip/irq-gic-v3-its.c 
b/drivers/irqchip/irq-gic-v3-its.c

index ac5412b284e6..458ea0ebea2b 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c

@@ -3533,34 +3534,39 @@ static int its_irq_domain_alloc(struct 
irq_domain *domain, unsigned int virq,

struct its_device *its_dev = info->scratchpad[0].ptr;
struct its_node *its = its_dev->its;
struct irq_data *irqd;
-   irq_hw_number_t hwirq;
+   irq_hw_number_t hwirq[nr_irqs]; //vla :(
int err;
int i;

-   err = its_alloc_device_irq(its_dev, nr_irqs, );
+   for (i = 0; i < nr_irqs; i++) {
+   err = its_alloc_device_irq(its_dev, 1, [i]);
+   if (err) //tidy
+   return err;
+   }
+   
-   if (err)
-   return err;

err = iommu_dma_prepare_msi(info->desc, its->get_msi_base(its_dev));
if (err)
return err;

for (i = 0; i < nr_irqs; i++) {
-   err = its_irq_gic_domain_alloc(domain, virq + i, hwirq + i);
+   err = its_irq_gic_domain_alloc(domain, virq + i, hwirq[i]);
if (err)
return err;

irq_domain_set_hwirq_and_chip(domain, virq + i,
- hwirq + i, _irq_chip, 
its_dev);
+ hwirq[i], _irq_chip, its_dev);
irqd = irq_get_irq_data(virq + i);
irqd_set_single_target(irqd);
irqd_set_affinity_on_activate(irqd);
pr_debug("ID:%d pID:%d vID:%d\n",
-(int)(hwirq + i - its_dev->event_map.lpi_base),
-(int)(hwirq + i), virq + i);
+(int)(hwirq[i] - its_dev->event_map.lpi_base),
+(int)(hwirq[i]), virq + i);
}
8<-


But I'm not sure that we have any requirement for those map bits to be 
consecutive.


Thanks,
John


Re: PCI MSI issue with reinserting a driver

2021-02-02 Thread John Garry

On 01/02/2021 18:50, Marc Zyngier wrote:

Hi Marc,


Just a heads-up, by chance I noticed that I can't re-insert a specific
driver on v5.11-rc6:

[   64.356023] hisi_dma :7b:00.0: Adding to iommu group 31
[   64.368627] hisi_dma :7b:00.0: enabling device ( -> 0002)
[   64.384156] hisi_dma :7b:00.0: Failed to allocate MSI vectors!
[   64.397180] hisi_dma: probe of :7b:00.0 failed with error -28

That's with CONFIG_DEBUG_TEST_DRIVER_REMOVE=y

Bisect tells me that this is the first bad commit:
4615fbc3788d genirq/irqdomain: Don't try to free an interrupt that has
no mapping

The relevant driver code is
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dma/hisi_dma.c#n547

That driver only allocates 30 MSI, so maybe there's a problem with not
allocating (and freeing) all 32 MSI.

Are they Multi-MSI (and not MSI-X)?


multi-msi




I'll have a bit more of a look tomorrow.

Here's my suspicion: two of the interrupts are mapped in the low-level
domain (the ITS, I'd expect in your case), but they have never been
mapped at the higher level.

On teardown, we only get rid of the 30 that were actually mapped, and
leave the last two dangling in the ITS domain, and thus the ITS device
resources are never freed. On reload, we request another 32
interrupts, which can't be satisfied for this device.

Assuming I got it right, the question is: why weren't these interrupts
mapped in the PCI domain the first place. And if I got it wrong, I'm
even more curious!


Not sure. I also now notice an error for the SAS PCI driver on D06 when 
nr_cpus < 16, which means number of MSI vectors allocated < 32, so looks 
the same problem. There we try to allocate 16 + max(nr cpus, 16) MSI.


Anyway, let me have a look today to see what is going wrong.

cheers,
John


Re: PCI MSI issue with reinserting a driver

2021-02-01 Thread Marc Zyngier
Hi John,

On Mon, 01 Feb 2021 18:34:59 +,
John Garry  wrote:
> 
> Just a heads-up, by chance I noticed that I can't re-insert a specific
> driver on v5.11-rc6:
> 
> [   64.356023] hisi_dma :7b:00.0: Adding to iommu group 31
> [   64.368627] hisi_dma :7b:00.0: enabling device ( -> 0002)
> [   64.384156] hisi_dma :7b:00.0: Failed to allocate MSI vectors!
> [   64.397180] hisi_dma: probe of :7b:00.0 failed with error -28
> 
> That's with CONFIG_DEBUG_TEST_DRIVER_REMOVE=y
> 
> Bisect tells me that this is the first bad commit:
> 4615fbc3788d genirq/irqdomain: Don't try to free an interrupt that has
> no mapping
> 
> The relevant driver code is
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dma/hisi_dma.c#n547
> 
> That driver only allocates 30 MSI, so maybe there's a problem with not
> allocating (and freeing) all 32 MSI.

Are they Multi-MSI (and not MSI-X)?

> I'll have a bit more of a look tomorrow.

Here's my suspicion: two of the interrupts are mapped in the low-level
domain (the ITS, I'd expect in your case), but they have never been
mapped at the higher level.

On teardown, we only get rid of the 30 that were actually mapped, and
leave the last two dangling in the ITS domain, and thus the ITS device
resources are never freed. On reload, we request another 32
interrupts, which can't be satisfied for this device.

Assuming I got it right, the question is: why weren't these interrupts
mapped in the PCI domain the first place. And if I got it wrong, I'm
even more curious!

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.