Bug#883938: RFT: Candidate fix for boot failure of Debian 8.10 on various x86 systems

2017-12-23 Thread Ben Hutchings
On Mon, 2017-12-18 at 18:40 +0000, Thomas Patrick Downes wrote:
> Is there going to be any kind of post-mortem analysis of how this
> happened?

I think it would be good to do this, but it doesn't seem to be
something that Debian does as a matter of course, and I'm unlikely to
be the right person to lead such an analysis.  It might be worth
proposing this to the release team (debian-release mailing list).

[...]
> It would also be helpful if you clearly stated whether this (a)
> affects all multi-socket systems and (b) whether it affected any
> single-socket systems. Between the sample bias of bug reports
> themselves and the “fog of war” neither conclusion is clear.

I'm afraid I still don't have a deep enough understanding of the bug to
say for sure.  Given that 'numa=off' appears to be a workaround, I
suspect that it is triggered by multiple NUMA nodes.  That would imply
that older multi-socket systems with a shared memory controller would
not be affected, while some single-socket systems with multiple memory
controllers would be affected.
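
One way to see which case a given machine falls into is to count the NUMA nodes the kernel exposes under sysfs. The following is only a minimal sketch, assuming the usual Linux layout of /sys/devices/system/node/nodeN entries; the function name and its path argument are illustrative and do not come from this thread. A system booted with 'numa=off' would be expected to show a single node regardless of hardware.

```python
import os
import re


def count_numa_nodes(node_dir="/sys/devices/system/node"):
    """Count nodeN entries under node_dir; return 0 if the directory is absent."""
    try:
        entries = os.listdir(node_dir)
    except FileNotFoundError:
        # No sysfs node directory (non-Linux system or NUMA support not built in).
        return 0
    # Only names such as node0, node1 count; files like "online" are skipped.
    return sum(1 for name in entries if re.fullmatch(r"node\d+", name))


if __name__ == "__main__":
    print(f"NUMA nodes visible to the kernel: {count_numa_nodes()}")
```

If the suspicion above is right, a count greater than 1 would mark a system as potentially affected, whatever its socket count.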

Ben.

-- 
Ben Hutchings
I say we take off; nuke the site from orbit.  It's the only way to be
sure.





Bug#883938: RFT: Candidate fix for boot failure of Debian 8.10 on various x86 systems

2017-12-18 Thread Thomas Patrick Downes
Is there going to be any kind of post-mortem analysis of how this happened? The 
changelogs indicate that this entered oldstable-proposed-updates around 4 
December. I’m not quite sure when it entered oldstable-updates or if it ever 
formally entered oldstable-updates prior to being incorporated into the 8.10 
release on 9 December.

I would not call 5 days an “extended testing period” as promised by the 
proposed-updates mechanism.

https://www.debian.org/releases/proposed-updates.html

It would also be helpful if you clearly stated whether this (a) affects all 
multi-socket systems and (b) whether it affected any single-socket systems. 
Between the sample bias of bug reports themselves and the “fog of war” neither 
conclusion is clear.

Yours,

--
Tom Downes
Senior Scientist
Center for Gravitation, Cosmology and Astrophysics
414.229.2678


Bug#883938: RFT: Candidate fix for boot failure of Debian 8.10 on various x86 systems

2017-12-18 Thread Jim Cobley
Thank you. The update has just been applied and my systems are now up 
and running again with no UUID error.




Bug#883938: RFT: Candidate fix for boot failure of Debian 8.10 on various x86 systems

2017-12-15 Thread Dr. Nagy Elemér Károly
Dear Ben,

Thank you, the fix works for me, both Sun Fires (X2200M2 and X4200M2) boot with 
3.16.51-3~a.test (2017-12-11).

Best wishes:
Elemér



Bug#883938: RFT: Candidate fix for boot failure of Debian 8.10 on various x86 systems

2017-12-14 Thread Chris Hofstaedtler
Hi,

* Ben Hutchings [171214 11:37]:
> Apologies for this regression.  Salvatore Bonaccorso has tracked down
> which change in 3.16-stable triggers the crash, and I identified some
> related upstream changes which appear to fix it.  An updated package is
> available at:
> 
> https://people.debian.org/~benh/packages/jessie-pu/linux-image-3.16.0-4-amd64_3.16.51-3~a.test_amd64.deb

We just ran into this same issue inside Proxmox VE 5.1-38 on a KVM
guest with 2 Sockets with NUMA enabled.
I can confirm that the test kernel makes the guest boot again.

Many thanks,
Chris





Bug#883938: RFT: Candidate fix for boot failure of Debian 8.10 on various x86 systems

2017-12-12 Thread David Loup

On Tue, 12 Dec 2017 01:57:48 +0000 Ben Hutchings wrote:
> [This message is bcc'd to all bug reporters.]
>
> Apologies for this regression. Salvatore Bonaccorso has tracked down
> which change in 3.16-stable triggers the crash, and I identified some
> related upstream changes which appear to fix it. An updated package is
> available at:
>
> https://people.debian.org/~benh/packages/jessie-pu/linux-image-3.16.0-4-amd64_3.16.51-3~a.test_amd64.deb
>
> There is a signed .changes file in the same directory that you can use
> to authenticate it.
>
> Please report back (to the bug address) whether this fixes the
> regression for you.
>
> If you need i386 packages, let me know and I will upload them too.
>
> Ben.
>
> --
> Ben Hutchings
> Unix is many things to many people,
> but it's never been everything to anybody.

The fix worked for me, thanks!


Bug#883938: RFT: Candidate fix for boot failure of Debian 8.10 on various x86 systems

2017-12-12 Thread Miquel van Smoorenburg
On 12/12/17 02:57, Ben Hutchings wrote:

> https://people.debian.org/~benh/packages/jessie-pu/linux-image-3.16.0-4-amd64_3.16.51-3~a.test_amd64.deb
>
> Please report back (to the bug address) whether this fixes the
> regression for you.
>
Fixes the problem on our servers. Thanks!

Mike.


Bug#883938: RFT: Candidate fix for boot failure of Debian 8.10 on various x86 systems

2017-12-12 Thread Bernhard Schmidt
Am 12.12.2017 um 02:57 schrieb Ben Hutchings:

Hi Ben,

> Apologies for this regression.  Salvatore Bonaccorso has tracked down
> which change in 3.16-stable triggers the crash, and I identified some
> related upstream changes which appear to fix it.  An updated package is
> available at:
> 
> https://people.debian.org/~benh/packages/jessie-pu/linux-image-3.16.0-4-amd64_3.16.51-3~a.test_amd64.deb
> 
> There is a signed .changes file in the same directory that you can use
> to authenticate it.
> 
> Please report back (to the bug address) whether this fixes the
> regression for you.

Fixes the regression on a HP DL380 Gen9.

Thanks for following up.

Bernhard



Bug#883938: RFT: Candidate fix for boot failure of Debian 8.10 on various x86 systems

2017-12-12 Thread Karsten Heiken

Hi Ben,

Ben Hutchings wrote:

> An updated package is available at:
>
> https://people.debian.org/~benh/packages/jessie-pu/linux-image-3.16.0-4-amd64_3.16.51-3~a.test_amd64.deb


I can also confirm that this build works fine on my problematic
machines.

Thanks for the fix!


Karsten





Bug#883938: RFT: Candidate fix for boot failure of Debian 8.10 on various x86 systems

2017-12-12 Thread Dominic Benson
3.16.51-3~a.test also works on my previously problematic box.

Bug#883938: RFT: Candidate fix for boot failure of Debian 8.10 on various x86 systems

2017-12-11 Thread Thomas Martin
On Tue, 12 Dec 2017 01:57:48 +0000 Ben Hutchings wrote:
> [This message is bcc'd to all bug reporters.]
>
> Apologies for this regression.  Salvatore Bonaccorso has tracked down
> which change in 3.16-stable triggers the crash, and I identified some
> related upstream changes which appear to fix it.  An updated package is
> available at:
>
> https://people.debian.org/~benh/packages/jessie-pu/linux-image-3.16.0-4-amd64_3.16.51-3~a.test_amd64.deb
>
> There is a signed .changes file in the same directory that you can use
> to authenticate it.
>
> Please report back (to the bug address) whether this fixes the
> regression for you.
>
> If you need i386 packages, let me know and I will upload them too.
>
> Ben.
>
> --
> Ben Hutchings
> Unix is many things to many people,
> but it's never been everything to anybody.

It worked for me (on a Dell PowerEdge R630); I'm now able to boot without
using maxcpus=1, nosmp or numa=off.
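
To confirm that a test boot really ran without any of these workarounds, the kernel command line can be inspected. This is a small sketch, assuming the command line is read from /proc/cmdline; the helper name and the WORKAROUNDS tuple are illustrative only.

```python
# Boot parameters mentioned in this thread as workarounds for the regression.
WORKAROUNDS = ("maxcpus=1", "nosmp", "numa=off")


def active_workarounds(cmdline):
    """Return the workaround parameters present in a kernel command line string."""
    params = cmdline.split()
    # Exact token match, so e.g. "maxcpus=16" is not mistaken for "maxcpus=1".
    return [w for w in WORKAROUNDS if w in params]
```

Passing the contents of /proc/cmdline to active_workarounds() should return an empty list on a boot that did not rely on any workaround.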

Thanks for everyone's work by the way!


Thomas



Bug#883938: RFT: Candidate fix for boot failure of Debian 8.10 on various x86 systems

2017-12-11 Thread Ben Hutchings
[This message is bcc'd to all bug reporters.]

Apologies for this regression.  Salvatore Bonaccorso has tracked down
which change in 3.16-stable triggers the crash, and I identified some
related upstream changes which appear to fix it.  An updated package is
available at:

https://people.debian.org/~benh/packages/jessie-pu/linux-image-3.16.0-4-amd64_3.16.51-3~a.test_amd64.deb

There is a signed .changes file in the same directory that you can use
to authenticate it.
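
As a rough illustration of that check: a .changes file lists each package file under a Checksums-Sha256 field, one " <sha256> <size> <filename>" entry per line. The sketch below compares a downloaded .deb against that field. It assumes the signature on the .changes file has already been verified separately (for example with gpg --verify); without that step the comparison proves nothing. All function names here are hypothetical.

```python
import hashlib


def sha256_entries(changes_text):
    """Return {filename: sha256 hex digest} from the Checksums-Sha256 field."""
    entries = {}
    in_field = False
    for line in changes_text.splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_field = True
            continue
        if in_field:
            if line.startswith(" "):
                parts = line.split()
                if len(parts) == 3:
                    digest, _size, name = parts
                    entries[name] = digest
            else:
                # A line that is not indented starts the next field.
                in_field = False
    return entries


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large packages need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def deb_matches_changes(deb_path, deb_name, changes_text):
    """True if the file at deb_path has the digest listed for deb_name."""
    expected = sha256_entries(changes_text).get(deb_name)
    return expected is not None and expected == file_sha256(deb_path)
```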

Please report back (to the bug address) whether this fixes the
regression for you.

If you need i386 packages, let me know and I will upload them too.

Ben.

-- 
Ben Hutchings
Unix is many things to many people,
but it's never been everything to anybody.

