Re: 4.13.0-rc4 sparc64: can't allocate MSI-X affinity masks for 2 vectors

2017-08-22 Thread David Miller
From: Christoph Hellwig 
Date: Tue, 22 Aug 2017 18:39:16 +0200

> On Tue, Aug 22, 2017 at 09:31:39AM -0700, David Miller wrote:
>> > I fear my commit message (but not the code) might be wrong.
>> > irq_create_affinity_masks can return NULL any time we don't have any
>> > affinity masks.  I've already had a discussion about this elsewhere
>> > with Bjorn, and I suspect we need to kill the warning or move it
>> > to irq_create_affinity_masks only for genuine failure cases.
>> 
>> This is a rather large machine with 64 or more cpus and several NUMA
>> nodes.  Why wouldn't there be any affinity masks available?
> 
> The drivers only asked for two MSI-X vectors, and marked both of them
> as pre-vectors that should not be spread.  So there is no vector
> left that we actually want to spread.

Ok, now it makes more sense, and yes the warning should be removed.


Re: 4.13.0-rc4 sparc64: can't allocate MSI-X affinity masks for 2 vectors

2017-08-22 Thread David Miller
From: Meelis Roos 
Date: Tue, 22 Aug 2017 19:33:55 +0300 (EEST)

>> > On Mon, Aug 21, 2017 at 01:35:49PM -0700, David Miller wrote:
>> >> I ask because the commit log message indicates that this failure is
>> >> not expected to ever happen on SMP.
>> > 
>> > I fear my commit message (but not the code) might be wrong.
>> > irq_create_affinity_masks can return NULL any time we don't have any
>> > affinity masks.  I've already had a discussion about this elsewhere
>> > with Bjorn, and I suspect we need to kill the warning or move it
>> > to irq_create_affinity_masks only for genuine failure cases.
>> 
>> This is a rather large machine with 64 or more cpus and several NUMA
>> nodes.  Why wouldn't there be any affinity masks available?
> 
> T5120 with 1 slot and 32 threads total. I have not configured any NUMA
> on it; is there any reason for that?

Ok 32 cpus and 1 NUMA node, my bad :-)


Re: 4.13.0-rc4 sparc64: can't allocate MSI-X affinity masks for 2 vectors

2017-08-22 Thread Christoph Hellwig
On Tue, Aug 22, 2017 at 09:31:39AM -0700, David Miller wrote:
> > I fear my commit message (but not the code) might be wrong.
> > irq_create_affinity_masks can return NULL any time we don't have any
> > affinity masks.  I've already had a discussion about this elsewhere
> > with Bjorn, and I suspect we need to kill the warning or move it
> > to irq_create_affinity_masks only for genuine failure cases.
> 
> This is a rather large machine with 64 or more cpus and several NUMA
> nodes.  Why wouldn't there be any affinity masks available?

The drivers only asked for two MSI-X vectors, and marked both of them
as pre-vectors that should not be spread.  So there is no vector
left that we actually want to spread.


Re: 4.13.0-rc4 sparc64: can't allocate MSI-X affinity masks for 2 vectors

2017-08-22 Thread Meelis Roos
> > On Mon, Aug 21, 2017 at 01:35:49PM -0700, David Miller wrote:
> >> I ask because the commit log message indicates that this failure is
> >> not expected to ever happen on SMP.
> > 
> > I fear my commit message (but not the code) might be wrong.
> > irq_create_affinity_masks can return NULL any time we don't have any
> > affinity masks.  I've already had a discussion about this elsewhere
> > with Bjorn, and I suspect we need to kill the warning or move it
> > to irq_create_affinity_masks only for genuine failure cases.
> 
> This is a rather large machine with 64 or more cpus and several NUMA
> nodes.  Why wouldn't there be any affinity masks available?

T5120 with 1 slot and 32 threads total. I have not configured any NUMA
on it; is there any reason for that?

-- 
Meelis Roos (mr...@linux.ee)


Re: 4.13.0-rc4 sparc64: can't allocate MSI-X affinity masks for 2 vectors

2017-08-22 Thread David Miller
From: Christoph Hellwig 
Date: Tue, 22 Aug 2017 08:35:05 +0200

> On Mon, Aug 21, 2017 at 01:35:49PM -0700, David Miller wrote:
>> I ask because the commit log message indicates that this failure is
>> not expected to ever happen on SMP.
> 
> I fear my commit message (but not the code) might be wrong.
> irq_create_affinity_masks can return NULL any time we don't have any
> affinity masks.  I've already had a discussion about this elsewhere
> with Bjorn, and I suspect we need to kill the warning or move it
> to irq_create_affinity_masks only for genuine failure cases.

This is a rather large machine with 64 or more cpus and several NUMA
nodes.  Why wouldn't there be any affinity masks available?

That's why I want to root cause this.


Re: 4.13.0-rc4 sparc64: can't allocate MSI-X affinity masks for 2 vectors

2017-08-22 Thread Christoph Hellwig
On Mon, Aug 21, 2017 at 01:35:49PM -0700, David Miller wrote:
> I ask because the commit log message indicates that this failure is
> not expected to ever happen on SMP.

I fear my commit message (but not the code) might be wrong.
irq_create_affinity_masks can return NULL any time we don't have any
affinity masks.  I've already had a discussion about this elsewhere
with Bjorn, and I suspect we need to kill the warning or move it
to irq_create_affinity_masks only for genuine failure cases.

> 
> We really need to root cause this.
---end quoted text---


Re: 4.13.0-rc4 sparc64: can't allocate MSI-X affinity masks for 2 vectors

2017-08-21 Thread Meelis Roos
> 
> >> I think with this patch from -rc6 the symptoms should be cured:
> >> 
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c005390374957baacbc38eef96ea360559510aa7
> >> 
> >> if that theory is right.
> > 
> > The result with 4.13-rc6 is positive but mixed: the messages about MSI-X 
> > affinity masks are still there, but the rest of the detection works and 
> > the driver is loaded successfully:
> 
> Is this an SMP system?

Yes, T5120.

-- 
Meelis Roos (mr...@linux.ee)


Re: 4.13.0-rc4 sparc64: can't allocate MSI-X affinity masks for 2 vectors

2017-08-21 Thread David Miller
From: mr...@linux.ee
Date: Mon, 21 Aug 2017 22:20:22 +0300 (EEST)

>> I think with this patch from -rc6 the symptoms should be cured:
>> 
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c005390374957baacbc38eef96ea360559510aa7
>> 
>> if that theory is right.
> 
> The result with 4.13-rc6 is positive but mixed: the messages about MSI-X 
> affinity masks are still there, but the rest of the detection works and 
> the driver is loaded successfully:

Is this an SMP system?

I ask because the commit log message indicates that this failure is
not expected to ever happen on SMP.

We really need to root cause this.


Re: 4.13.0-rc4 sparc64: can't allocate MSI-X affinity masks for 2 vectors

2017-08-21 Thread mroos
> I think with this patch from -rc6 the symptoms should be cured:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c005390374957baacbc38eef96ea360559510aa7
> 
> if that theory is right.

The result with 4.13-rc6 is positive but mixed: the messages about MSI-X 
affinity masks are still there, but the rest of the detection works and 
the driver is loaded successfully:

[   29.924282] qla2xxx [:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 10.00.00.00-k.
[   29.924710] qla2xxx [:10:00.0]-001d: : Found an ISP2432 irq 21 iobase 0x00c100d0.
[   29.925581] qla2xxx :10:00.0: can't allocate MSI-X affinity masks for 2 vectors
[   30.483422] scsi host1: qla2xxx
[   35.495031] qla2xxx [:10:00.0]-00fb:1: QLogic QLE2462 - SG-(X)PCIE2FC-QF4, Sun StorageTek 4 Gb FC Enterprise PCI-Express Dual Channel H.
[   35.495274] qla2xxx [:10:00.0]-00fc:1: ISP2432: PCIe (2.5GT/s x4) @ :10:00.0 hdma- host#=1 fw=7.03.00 (9496).
[   35.495615] qla2xxx [:10:00.1]-001d: : Found an ISP2432 irq 22 iobase 0x00c100d04000.
[   35.496409] qla2xxx :10:00.1: can't allocate MSI-X affinity masks for 2 vectors
[   35.985355] scsi host2: qla2xxx
[   40.996991] qla2xxx [:10:00.1]-00fb:2: QLogic QLE2462 - SG-(X)PCIE2FC-QF4, Sun StorageTek 4 Gb FC Enterprise PCI-Express Dual Channel H.
[   40.997251] qla2xxx [:10:00.1]-00fc:2: ISP2432: PCIe (2.5GT/s x4) @ :10:00.1 hdma- host#=2 fw=7.03.00 (9496).
[   51.880945] qla2xxx [:10:00.0]-8038:1: Cable is unplugged...
[   57.402900] qla2xxx [:10:00.1]-8038:2: Cable is unplugged...

With Dave Miller's patch on top of 4.13-rc6, I see the following before 
both MSI-X messages:

irq_create_affinity_masks: nvecs[2] affd->pre_vectors[2] affd->post_vectors[0]

-- 
Meelis Roos (mr...@linux.ee)


Re: 4.13.0-rc4 sparc64: can't allocate MSI-X affinity masks for 2 vectors

2017-08-21 Thread Christoph Hellwig
I think with this patch from -rc6 the symptoms should be cured:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c005390374957baacbc38eef96ea360559510aa7

if that theory is right.


Re: 4.13.0-rc4 sparc64: can't allocate MSI-X affinity masks for 2 vectors

2017-08-21 Thread David Miller
From: Bjorn Helgaas 
Date: Wed, 16 Aug 2017 14:02:41 -0500

> On Wed, Aug 16, 2017 at 09:39:08PM +0300, Meelis Roos wrote:
>> > > > I noticed that in 4.13.0-rc4 there is a new error in dmesg on my
>> > > > sparc64 t5120 server: can't allocate MSI-X affinity masks.
>> > > > 
>> > > > [   30.274284] qla2xxx [:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 10.00.00.00-k.
>> > > > [   30.274648] qla2xxx [:10:00.0]-001d: : Found an ISP2432 irq 21 iobase 0x00c100d0.
>> > > > [   30.275447] qla2xxx :10:00.0: can't allocate MSI-X affinity masks for 2 vectors
>> > > > [   30.816882] scsi host1: qla2xxx
>> > > > [   30.877294] qla2xxx: probe of :10:00.0 failed with error -22
>> > > > [   30.877578] qla2xxx [:10:00.1]-001d: : Found an ISP2432 irq 22 iobase 0x00c100d04000.
>> > > > [   30.878387] qla2xxx :10:00.1: can't allocate MSI-X affinity masks for 2 vectors
>> > > > [   31.367083] scsi host1: qla2xxx
>> > > > [   31.427500] qla2xxx: probe of :10:00.1 failed with error -22
>> > > > 
>> > > > I do not know if the driver works since nothing is attached to the FC
>> > > > HBA at the moment, but from the error messages it looks like the
>> > > > driver fails to load.
>> > > > 
>> > > > I booted 4.12 and 4.11 - the red error is not there but the failure 
>> > > > seems to be the same error -22:
>> > 
>> > 4.10.0 works, 4.11.0 errors out with EINVAL and 4.13-rc4 errors out 
>> > with more verbose MSI messages. So something between 4.10 and 4.11 has 
>> > broken it.
>> 
>> I can not reproduce the older kernels that misbehave. I checked out 
>> earlier kernels and recompiled them (old config lost, nothing changed 
>> AFAIK), everything works up to 4.12 inclusive.
>> 
>> > Also, 4.13-rc4 is broken on another sun4v here (T1000). So it seems to 
>> > be sun4v interrupt related.
>> 
>> This still holds - 4.13-rc4 has MSI trouble on at least 2 of my sun4v 
>> machines.
> 
> IIUC, that means v4.12 works and v4.13-rc4 does not, so this is a
> regression we introduced this cycle.
> 
> If nobody steps up with a theory, bisecting might be the easiest path
> forward.

I suspect the test added by:

commit 6f9a22bc5775d231ab8fbe2c2f3c88e45e3e7c28
Author: Michael Hernandez 
Date:   Thu May 18 10:47:47 2017 -0700

PCI/MSI: Ignore affinity if pre/post vector count is more than min_vecs

is triggering.

The rest of the failure cases are memory allocation failures which should
not be happening here.

There have only been 5 commits to kernel/irq/affinity.c since v4.10.

I suppose we have been getting away with something that has silently
been allowed in the past, or something like that.

Meelis, can you run with the following debugging patch?

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index d69bd77252a7..d16c6326000a 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -110,6 +110,9 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	struct cpumask *masks;
 	cpumask_var_t nmsk, *node_to_present_cpumask;
 
+	pr_info("irq_create_affinity_masks: nvecs[%d] affd->pre_vectors[%d] "
+		"affd->post_vectors[%d]\n",
+		nvecs, affd->pre_vectors, affd->post_vectors);
 	/*
 	 * If there aren't any vectors left after applying the pre/post
 	 * vectors don't bother with assigning affinity.


Re: 4.13.0-rc4 sparc64: can't allocate MSI-X affinity masks for 2 vectors

2017-08-17 Thread Meelis Roos
> On Wed, Aug 16, 2017 at 09:39:08PM +0300, Meelis Roos wrote:
> > > > > I noticed that in 4.13.0-rc4 there is a new error in dmesg on my
> > > > > sparc64 t5120 server: can't allocate MSI-X affinity masks.
> > > > > 
> > > > > [   30.274284] qla2xxx [:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 10.00.00.00-k.
> > > > > [   30.274648] qla2xxx [:10:00.0]-001d: : Found an ISP2432 irq 21 iobase 0x00c100d0.
> > > > > [   30.275447] qla2xxx :10:00.0: can't allocate MSI-X affinity masks for 2 vectors
> > > > > [   30.816882] scsi host1: qla2xxx
> > > > > [   30.877294] qla2xxx: probe of :10:00.0 failed with error -22
> > > > > [   30.877578] qla2xxx [:10:00.1]-001d: : Found an ISP2432 irq 22 iobase 0x00c100d04000.
> > > > > [   30.878387] qla2xxx :10:00.1: can't allocate MSI-X affinity masks for 2 vectors
> > > > > [   31.367083] scsi host1: qla2xxx
> > > > > [   31.427500] qla2xxx: probe of :10:00.1 failed with error -22
> > > > > 

> IIUC, that means v4.12 works and v4.13-rc4 does not, so this is a
> regression we introduced this cycle.

Yes, I understand the same.

But under some circumstances/configs it has been problematic before too. 
I could not reproduce the circumstances.

> If nobody steps up with a theory, bisecting might be the easiest path
> forward.

I finished bisecting but was not successful. The pattern was strange:
good good skip good skip good skip  bad bad bad bad bad bad.

The first bad commit was an unrelated Xen merge. Reverting this commit 
does not fix the problem.

It looks like at some point it got broken by side effects (code size or 
whatever). The skips were mostly because of repeated problems in the sparc 
cpuidle code, and initially some in IOMMU-related code. This might bend the 
results, since some commits were not tested.

git bisect start
# good: [6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c] Linux 4.12
git bisect good 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c
# bad: [aae4e7a8bc44722fe70d58920a36916b1043195e] Linux 4.13-rc4
git bisect bad aae4e7a8bc44722fe70d58920a36916b1043195e
# good: [920f2ecdf6c3b3526f60fbd38c68597953cad3ee] Merge tag 'sound-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect good 920f2ecdf6c3b3526f60fbd38c68597953cad3ee
# skip: [af3c8d98508d37541d4bf57f13a984a7f73a328c] Merge tag 'drm-for-v4.13' of git://people.freedesktop.org/~airlied/linux
git bisect skip af3c8d98508d37541d4bf57f13a984a7f73a328c
# good: [d29cb3e45e923715f74d8a08d5c1ea996dce5a59] xfs: make _bmap_count_blocks consistent wrt delalloc extent behavior
git bisect good d29cb3e45e923715f74d8a08d5c1ea996dce5a59
# good: [fa6d095eb23a8b1aae78d221879032497f6e457f] drm/tegra: Add driver documentation
git bisect good fa6d095eb23a8b1aae78d221879032497f6e457f
# good: [37e51a7640c275999ea0c35410c42e6d896ff7fa] mm: clean up error handling in write_one_page
git bisect good 37e51a7640c275999ea0c35410c42e6d896ff7fa
# good: [4b9cdd96e7ea3dc2cd0edac67835f6f38c4f14c9] drm/omap: remove CLUT
git bisect good 4b9cdd96e7ea3dc2cd0edac67835f6f38c4f14c9
# good: [7f56c30bd0a232822aca38d288da475613bdff9b] vfio: Remove unnecessary uses of vfio_container.group_lock
git bisect good 7f56c30bd0a232822aca38d288da475613bdff9b
# good: [d7631e30434e7fcf025dd2a7cba879f203f7849b] switch compat_drm_getsareactx() to drm_ioctl_kernel()
git bisect good d7631e30434e7fcf025dd2a7cba879f203f7849b
# skip: [f991af3daabaecff34684fd51fac80319d1baad1] mqueue: fix a use-after-free in sys_mq_notify()
git bisect skip f991af3daabaecff34684fd51fac80319d1baad1
# good: [ecbb903c56745d59c301db26dd7d8b74b520eb84] NFS: Be more careful about mapping file permissions
git bisect good ecbb903c56745d59c301db26dd7d8b74b520eb84
# skip: [b49defe83659cefbb1763d541e779da32594ab10] kvm: avoid unused variable warning for UP builds
git bisect skip b49defe83659cefbb1763d541e779da32594ab10
# good: [b5ab16bf64347ebc9dbdc51a4f603511babda1e6] drm/amdgpu: properly byteswap gpu_info firmware
git bisect good b5ab16bf64347ebc9dbdc51a4f603511babda1e6
# good: [3941dae15ed90437396389e8bb7d2d5b3e63ba4a] drm_dp_aux_dev: switch to read_iter/write_iter
git bisect good 3941dae15ed90437396389e8bb7d2d5b3e63ba4a
# good: [f0d9c8924e2c33764dca0c3a4f693a345ecf6579] [media] media: imx: Add IC subdev drivers
git bisect good f0d9c8924e2c33764dca0c3a4f693a345ecf6579
# skip: [101dd590a7fa37954540cf3149a1c502c0acc524] powerpc/perf: Avoid spurious PMU interrupts after idle
git bisect skip 101dd590a7fa37954540cf3149a1c502c0acc524
# good: [eb0f0373e575822cf35949627b92533c7c41629c] drm/amdgpu: fix a typo in comment
git bisect good eb0f0373e575822cf35949627b92533c7c41629c
# skip: [3f0bd8dad0db73f5d71b355aec5ab33b374260ba] powerpc/perf: Add POWER9 alternate PM_RUN_CYC and PM_RUN_INST_CMPL events
git bisect skip 3f0bd8dad0db73f5d71b355aec5ab33b374260ba
# good: [96edd61dcf44362d3ef0bed1a5361e0ac7886a63] xen/balloon: don't online new memory initially
git bisect good 

PMU interrupts after idle
git bisect skip 101dd590a7fa37954540cf3149a1c502c0acc524
# good: [eb0f0373e575822cf35949627b92533c7c41629c] drm/amdgpu: fix a typo in 
comment
git bisect good eb0f0373e575822cf35949627b92533c7c41629c
# skip: [3f0bd8dad0db73f5d71b355aec5ab33b374260ba] powerpc/perf: Add POWER9 
alternate PM_RUN_CYC and PM_RUN_INST_CMPL events
git bisect skip 3f0bd8dad0db73f5d71b355aec5ab33b374260ba
# good: [96edd61dcf44362d3ef0bed1a5361e0ac7886a63] xen/balloon: don't online 
new memory initially
git bisect good 

Re: 4.13.0-rc4 sparc64: can't allocate MSI-X affinity masks for 2 vectors

2017-08-17 Thread Meelis Roos
> Just curious:  these are all SMP builds, right?

Yes. 32 threads on that CPU.

I am bisecting it slowly - some steps crash on boot for seemingly 
different reasons, and skipping them does not advance quickly.

-- 
Meelis Roos (mr...@linux.ee)


Re: 4.13.0-rc4 sparc64: can't allocate MSI-X affinity masks for 2 vectors

2017-08-17 Thread Christoph Hellwig
Just curious:  these are all SMP builds, right?

Just got burnt again by an UP kernel issue in that area that I sent
a patch for (to Jens) a long time ago, but that didn't get fixed.


Re: 4.13.0-rc4 sparc64: can't allocate MSI-X affinity masks for 2 vectors

2017-08-16 Thread Bjorn Helgaas
On Wed, Aug 16, 2017 at 09:39:08PM +0300, Meelis Roos wrote:
> > > > I noticed that in 4.13.0-rc4 there is a new error in dmesg on my 
> > > > sparc64 
> > > > t5120 server: can't allocate MSI-X affinity masks.
> > > > 
> > > > [   30.274284] qla2xxx [:00:00.0]-0005: : QLogic Fibre Channel HBA 
> > > > Driver: 10.00.00.00-k.
> > > > [   30.274648] qla2xxx [:10:00.0]-001d: : Found an ISP2432 irq 21 
> > > > iobase 0x00c100d0.
> > > > [   30.275447] qla2xxx :10:00.0: can't allocate MSI-X affinity 
> > > > masks for 2 vectors
> > > > [   30.816882] scsi host1: qla2xxx
> > > > [   30.877294] qla2xxx: probe of :10:00.0 failed with error -22
> > > > [   30.877578] qla2xxx [:10:00.1]-001d: : Found an ISP2432 irq 22 
> > > > iobase 0x00c100d04000.
> > > > [   30.878387] qla2xxx :10:00.1: can't allocate MSI-X affinity 
> > > > masks for 2 vectors
> > > > [   31.367083] scsi host1: qla2xxx
> > > > [   31.427500] qla2xxx: probe of :10:00.1 failed with error -22
> > > > 
> > > > I do not know if the driver works since nothing is attached to the FC 
> > > > HBA at the moment, but from the error messages it looks like the driver 
> > > > fails to load.
> > > > 
> > > > I booted 4.12 and 4.11 - the red error is not there but the failure 
> > > > seems to be the same error -22:
> > 
> > 4.10.0 works, 4.11.0 errors out with EINVAL and 4.13-rc4 errors out 
> > with more verbose MSI messages. So something between 4.10 and 4.11 has 
> > broken it.
> 
> I cannot reproduce the older kernels' misbehaviour. I checked out 
> earlier kernels and recompiled them (old config lost, nothing changed 
> AFAIK), and everything works up to 4.12 inclusive.
> 
> > Also, 4.13-rc4 is broken on another sun4v here (T1000). So it seems to 
> > be sun4v interrupt related.
> 
> This still holds - 4.13-rc4 has MSI trouble on at least 2 of my sun4v 
> machines.

IIUC, that means v4.12 works and v4.13-rc4 does not, so this is a
regression we introduced this cycle.

If nobody steps up with a theory, bisecting might be the easiest path
forward.


Re: 4.13.0-rc4 sparc64: can't allocate MSI-X affinity masks for 2 vectors

2017-08-16 Thread Meelis Roos
> > > I noticed that in 4.13.0-rc4 there is a new error in dmesg on my sparc64 
> > > t5120 server: can't allocate MSI-X affinity masks.
> > > 
> > > [   30.274284] qla2xxx [:00:00.0]-0005: : QLogic Fibre Channel HBA 
> > > Driver: 10.00.00.00-k.
> > > [   30.274648] qla2xxx [:10:00.0]-001d: : Found an ISP2432 irq 21 
> > > iobase 0x00c100d0.
> > > [   30.275447] qla2xxx :10:00.0: can't allocate MSI-X affinity masks 
> > > for 2 vectors
> > > [   30.816882] scsi host1: qla2xxx
> > > [   30.877294] qla2xxx: probe of :10:00.0 failed with error -22
> > > [   30.877578] qla2xxx [:10:00.1]-001d: : Found an ISP2432 irq 22 
> > > iobase 0x00c100d04000.
> > > [   30.878387] qla2xxx :10:00.1: can't allocate MSI-X affinity masks 
> > > for 2 vectors
> > > [   31.367083] scsi host1: qla2xxx
> > > [   31.427500] qla2xxx: probe of :10:00.1 failed with error -22
> > > 
> > > I do not know if the driver works since nothing is attached to the FC 
> > > HBA at the moment, but from the error messages it looks like the driver 
> > > fails to load.
> > > 
> > > I booted 4.12 and 4.11 - the red error is not there but the failure 
> > > seems to be the same error -22:
> 
> 4.10.0 works, 4.11.0 errors out with EINVAL and 4.13-rc4 errors out 
> with more verbose MSI messages. So something between 4.10 and 4.11 has 
> broken it.

I cannot reproduce the older kernels' misbehaviour. I checked out 
earlier kernels and recompiled them (old config lost, nothing changed 
AFAIK), and everything works up to 4.12 inclusive.

> Also, 4.13-rc4 is broken on another sun4v here (T1000). So it seems to 
> be sun4v interrupt related.

This still holds - 4.13-rc4 has MSI trouble on at least 2 of my sun4v 
machines.

-- 
Meelis Roos (mr...@linux.ee)


Re: 4.13.0-rc4 sparc64: can't allocate MSI-X affinity masks for 2 vectors

2017-08-15 Thread Meelis Roos
> On Tue, Aug 15, 2017 at 05:54:27PM +0300, Meelis Roos wrote:
> > I noticed that in 4.13.0-rc4 there is a new error in dmesg on my sparc64 
> > t5120 server: can't allocate MSI-X affinity masks.
> > 
> > [   30.274284] qla2xxx [:00:00.0]-0005: : QLogic Fibre Channel HBA 
> > Driver: 10.00.00.00-k.
> > [   30.274648] qla2xxx [:10:00.0]-001d: : Found an ISP2432 irq 21 
> > iobase 0x00c100d0.
> > [   30.275447] qla2xxx :10:00.0: can't allocate MSI-X affinity masks 
> > for 2 vectors
> > [   30.816882] scsi host1: qla2xxx
> > [   30.877294] qla2xxx: probe of :10:00.0 failed with error -22
> > [   30.877578] qla2xxx [:10:00.1]-001d: : Found an ISP2432 irq 22 
> > iobase 0x00c100d04000.
> > [   30.878387] qla2xxx :10:00.1: can't allocate MSI-X affinity masks 
> > for 2 vectors
> > [   31.367083] scsi host1: qla2xxx
> > [   31.427500] qla2xxx: probe of :10:00.1 failed with error -22
> > 
> > I do not know if the driver works since nothing is attached to the FC 
> > HBA at the moment, but from the error messages it looks like the driver 
> > fails to load.
> > 
> > I booted 4.12 and 4.11 - the red error is not there but the failure 
> > seems to be the same error -22:

4.10.0 works, 4.11.0 errors out with EINVAL and 4.13-rc4 errors out 
with more verbose MSI messages. So something between 4.10 and 4.11 has 
broken it.

Also, 4.13-rc4 is broken on another sun4v here (T1000). So it seems to 
be sun4v interrupt related.

-- 
Meelis Roos (mr...@linux.ee)


Re: 4.13.0-rc4 sparc64: can't allocate MSI-X affinity masks for 2 vectors

2017-08-15 Thread Bjorn Helgaas
[+cc Christoph]

On Tue, Aug 15, 2017 at 05:54:27PM +0300, Meelis Roos wrote:
> I noticed that in 4.13.0-rc4 there is a new error in dmesg on my sparc64 
> t5120 server: can't allocate MSI-X affinity masks.
> 
> [   30.274284] qla2xxx [:00:00.0]-0005: : QLogic Fibre Channel HBA 
> Driver: 10.00.00.00-k.
> [   30.274648] qla2xxx [:10:00.0]-001d: : Found an ISP2432 irq 21 iobase 
> 0x00c100d0.
> [   30.275447] qla2xxx :10:00.0: can't allocate MSI-X affinity masks for 
> 2 vectors
> [   30.816882] scsi host1: qla2xxx
> [   30.877294] qla2xxx: probe of :10:00.0 failed with error -22
> [   30.877578] qla2xxx [:10:00.1]-001d: : Found an ISP2432 irq 22 iobase 
> 0x00c100d04000.
> [   30.878387] qla2xxx :10:00.1: can't allocate MSI-X affinity masks for 
> 2 vectors
> [   31.367083] scsi host1: qla2xxx
> [   31.427500] qla2xxx: probe of :10:00.1 failed with error -22
> 
> I do not know if the driver works since nothing is attached to the FC 
> HBA at the moment, but from the error messages it looks like the driver 
> fails to load.
> 
> I booted 4.12 and 4.11 - the red error is not there but the failure 
> seems to be the same error -22:

-22 is -EINVAL, so not very specific.  Many failures probably use this
code.

There were several IRQ affinity changes between v4.12 and v4.13; it'll
probably be obvious to Christoph.

> [2478900.385223] qla2xxx [:00:00.0]-0005: : QLogic Fibre Channel HBA 
> Driver: 9.00.00.00-k.
> [2478900.385610] qla2xxx [:10:00.0]-001d: : Found an ISP2432 irq 21 
> iobase 0x00c100d0.
> [2478900.930517] scsi host1: qla2xxx
> [2478900.990939] qla2xxx: probe of :10:00.0 failed with error -22
> [2478900.991222] qla2xxx [:10:00.1]-001d: : Found an ISP2432 irq 22 
> iobase 0x00c100d04000.
> [2478901.510715] scsi host1: qla2xxx
> [2478901.581106] qla2xxx: probe of :10:00.1 failed with error -22
> 
> Will try older kernels too if it is useful for bisection.
> 
> On an older sparc64 (t1-200) with 4.13.0-rc4, qla2xxx loads fine (nothing is 
> attached there either):
> 
> [   30.590064] qla2xxx [:00:00.0]-0005: : QLogic Fibre Channel HBA 
> Driver: 10.00.00.00-k.
> [   30.699053] PCI: Enabling device: (:02:05.0), cmd 3
> [   30.699122] qla2xxx [:02:05.0]-001d: : Found an ISP2200 irq 12 iobase 
> 0x01ffa000.
> [   52.463403] scsi host2: qla2xxx
> [   52.545973] qla2xxx [:02:05.0]-4800:2: DPC handler sleeping.
> [   52.627163] qla2xxx [:02:05.0]-00fb:2: QLogic QLA22xx - .
> [   52.705428] qla2xxx [:02:05.0]-00fc:2: ISP2200: PCI (33 MHz) @ 
> :02:05.0 hdma- host#=2 fw=2.02.08 TP.
> [   53.503221] qla2xxx [:02:05.0]-480f:2: Loop resync scheduled.
> [   73.796964] qla2xxx [:02:05.0]-8038:2: Cable is unplugged...
> [   73.876036] qla2xxx [:02:05.0]-883a:2: fw_state=4 (, , , 
>  ) curr time=a61d.
> [   73.999845] qla2xxx [:02:05.0]-286c:2: qla2x00_loop_resync *** FAILED 
> ***.
> [   74.094861] qla2xxx [:02:05.0]-4810:2: Loop resync end.
> [   74.168188] qla2xxx [:02:05.0]-4800:2: DPC handler sleeping.
> 
> 
> -- 
> Meelis Roos (mr...@linux.ee)

