Bug#994899: Bug#991967: Simply ACPI powerdown/reset issue?

2021-10-04 Thread Chuck Zmudzinski
As discussed in message #91, the submitter of this bug accepts the
package maintainer's fix, which will close this bug.




Bug#994899: Bug#991967: Simply ACPI powerdown/reset issue?

2021-10-04 Thread Chuck Zmudzinski

On 10/4/2021 6:57 AM, Diederik de Haas wrote:

> On Monday, 4 October 2021 11:46:54 CEST Hans van Kranenburg wrote:
>> The 4th one is not explicitly tagged with Fixes: 1c4aa69ca1e1, but I
>> agree with Diederik that we should keep them all together.
>
> Context: Those 4 are part of 1 patch-set posted here:
> https://lists.xen.org/archives/html/xen-devel/2020-11/msg01516.html
>
> The 5th was already debatable and I chose to include it in my MR, but I'm fine
> with not including that one.
>
> Cheers,
>   Diederik


As the submitter of #994899, I can confirm these 4 fix the bug
on my hardware. I agree this fix can close #994899 and #995341,
since as Hans noted, they are part of the upstream stable 4.15 branch
and I presume that will make them stable enough for bullseye.

Thank you Hans, Diederik, and Elliott.

All the best,

Chuck



Bug#994899: Bug#991967: Simply ACPI powerdown/reset issue?

2021-10-04 Thread Diederik de Haas
On Monday, 4 October 2021 17:27:22 CEST Chuck Zmudzinski wrote:
>  I can confirm these 4 fix the bug on my hardware.

\o/
Thanks for testing and reporting back :-)

Cheers,
  Diederik



Bug#994899: Bug#991967: Simply ACPI powerdown/reset issue?

2021-10-04 Thread Diederik de Haas
On Monday, 4 October 2021 11:46:54 CEST Hans van Kranenburg wrote:
> The 4th one is not explicitly tagged with Fixes: 1c4aa69ca1e1, but I
> agree with Diederik that we should keep them all together.

Context: Those 4 are part of 1 patch-set posted here:
https://lists.xen.org/archives/html/xen-devel/2020-11/msg01516.html

The 5th was already debatable and I chose to include it in my MR, but I'm fine
with not including that one.

Cheers,
  Diederik



Bug#991967: Simply ACPI powerdown/reset issue?

2021-10-04 Thread Hans van Kranenburg
Hi Elliott and others,

Also including #994899 for once, since that's the bug number for the Xen
issue now.

On 9/26/21 5:27 AM, Elliott Mitchell wrote:
> On Tue, Sep 21, 2021 at 06:33:20AM -0400, Chuck Zmudzinski wrote:
>> I presume you are suggesting I try booting 4.19.181-1 on the
>> current version of Xen-4.14 for bullseye as a dom0. I am not
>> inclined to try it until an official Debian developer endorses
>> your opinion that the bug I am seeing is distinct
>> from #991967, at which point I will report the bug I am
>> seeing as a new bug.
> 
> Chuck Zmudzinski you are getting rather close to my threshold for calling
> harassment.  You're not /quite/ there, but I'm concerned.
> 
> 
> Since the purpose of the bug reports is to find and diagnose bugs, I did
> a bit of experimentation and made some observations.
> 
> I checked out the Debian Xen source via git.  I got the current
> "master" branch which is presently the candidate 4.14.3-1 version,
> which includes urgent fixes.  The hash is:
> e7a17db0305c8de891b366ad3528e5a43015
> 
> On top of this I cherry-picked 3 commits from Xen's main branch:
> 5a4087004d1adbbb223925f3306db0e5824a2bdc
> 0f089bbf43ecce6f27576cb548ba4341d0ec46a8
> bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b
> 
> (these can be retrieved via Xen's gitweb at
> https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=<$hash> which is
> suitable for the `git am` command)
> 
> With these I built 4.14.3-1 and then tried kernels 4.19.181-1 and
> 4.19.194-3 (this system is presently mostly on oldstable).  The results
> were:
> 
> Xen 4.14.3-1 with Linux 4.19.181-1: system reboots were successful
> 
> Xen 4.14.3-1 with Linux 4.19.194-3: system reboots hung

Ok, so it included 0f089bbf43, which is probably the most important of
the 3 fixes that we need indeed. And, it's good that the above
difference is still visible afterwards, since it confirms that we're
looking at two distinct problems.

> Unfortunately I was too quick at installing the rebuilt 4.14.3-1 and I
> missed trying the vanilla Debian 4.14.2+25-gb6a8c4f72d-2 with
> Linux 4.19.181-1.  I believe this combination would have hung during
> reboot.

The Xen related breakage was introduced in 4.14.0+88-g1d1d1f5391-2, so
with that combination, I would expect you would experience both of the
bugs at the same time, yes.

> As such, I believe there are in fact two distinct bugs being observed.
> The presence of EITHER of these is sufficient to cause hangs during
> powerdown or reboot.
> 
> First, some patch originally from Linux's main branch that breaks Xen reboots
> was backported somewhere between 4.19.181-1 and 4.19.194-3.  This may
> either have been introduced before 5.10 diverged from main, or may also
> have been backported to 5.10.  THIS is Debian bug #991967.
> 
> Second, the Xen patch 3c428e9ecb1f290689080c11e0c37b793425bef1 which is
> valuable to ARM devices breaks reboots and powerdowns on x86.  This is
> correctly fixed by 0f089bbf43ecce6f27576cb548ba4341d0ec46a8.  Presently
> this has no Debian bug report.

Correct. Thanks a lot for your help with hunting down and confirming this.

And now we have #994899 for it. So, I would like to kindly ask everyone
to stop hijacking this one, #991967, for discussing the Xen problem.

> The first is presently unidentified, someone enthusiastic either needs to
> read git logs/source code, or bisect and build to find where it got
> broken.
> 
> The second we seem to have a fix.  The only question is how many patches
> to cherry pick?  bc141e8ca562 is non-urgent as it is merely superficial
> and not needed for functionality.
> 5a4087004d1a is a workaround for Linux kernel breakage, but how likely
> are we to see that fixed in the Linux kernel packages?  The fix is
> well-contained and needed for some highly popular ARM devices.

Diederik also helped with testing changes, and when combining results,
the best thing we can do is pick the 4 changes that were initially
posted in Nov 2020 as "x86: ACPI and DMI table mapping fixes", and ended
up in Xen 4.15 as well.

 >8 

commit 8b6d55c1261820bb9db8d867ce9ee77397d05203
Author: Jan Beulich
Date:   Tue Nov 24 11:26:02 2020 +0100

    x86/ACPI: fix mapping of FACS

commit f390941a92f102ece1b54be206a602187fd7
Author: Jan Beulich
Date:   Tue Nov 24 11:26:34 2020 +0100

    x86/DMI: fix table mapping when one lives above 1Mb

commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8
Author: Jan Beulich
Date:   Tue Jan 5 13:09:55 2021 +0100

    x86/ACPI: fix S3 wakeup vector mapping

commit 16ca5b3f873f17f4fbdaecf46c133e1aa3d623b2
Author: Jan Beulich
Date:   Tue Jan 5 13:11:04 2021 +0100

    x86/ACPI: don't invalidate S5 data when S3 wakeup vector cannot be
    determined

 >8 

The 4th one is not explicitly tagged with Fixes: 1c4aa69ca1e1, but I
agree with Diederik that we should keep them all together.

I do not know if this is also the thing Chuck tested in the end, but I'm
a bit lost in the walls of text that were produced in these two bugs.


Bug#991967: Simply ACPI powerdown/reset issue?

2021-09-29 Thread Chuck Zmudzinski

This corrects typos - I referenced the wrong bug # in
a few places.

On 9/25/2021 11:27 PM, Elliott Mitchell wrote:


> Since the purpose of the bug reports is to find and diagnose bugs, I did
> a bit of experimentation and made some observations.
>
> I checked out the Debian Xen source via git.  I got the current
> "master" branch which is presently the candidate 4.14.3-1 version,
> which includes urgent fixes.  The hash is:
> e7a17db0305c8de891b366ad3528e5a43015
>
> On top of this I cherry-picked 3 commits from Xen's main branch:
> 5a4087004d1adbbb223925f3306db0e5824a2bdc
> 0f089bbf43ecce6f27576cb548ba4341d0ec46a8
> bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b


By main branch, I presume you mean the unstable
4.16 branch of Xen. Correct?

> (these can be retrieved via Xen's gitweb at
> https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=<$hash> which is
> suitable for the `git am` command)
>
> With these I built 4.14.3-1 and then tried kernels 4.19.181-1 and
> 4.19.194-3 (this system is presently mostly on oldstable).  The results
> were:
>
> Xen 4.14.3-1 with Linux 4.19.181-1: system reboots were successful
>
> Xen 4.14.3-1 with Linux 4.19.194-3: system reboots hung



Interesting. Looks like you are homing in on solving this bug. I notice
at the beginning of this message you quoted an older message of mine
which does not take into account that I have reported a new bug
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=994899
because I did come to the conclusion, as you did, that there are
in fact two bugs.

I wonder if the results of your modified Xen 4.14.3-1 with
4.19.181-1 and 4.19.194-3 on my hardware would be of help.
I have, as you might recall, an older (Haswell) Intel system with
EFI boot and systemd for init/shutdown services.
If I get the same result, then I would agree we are seeing a
regression between those two versions of Linux. Otherwise,
there may also be some tests involving EFI vs. BIOS to
do. Or, based on what I have learned at #994899, we may also
need to check systemd vs. sysv-init. Do you want me to
do the test on my hardware?


> Unfortunately I was too quick at installing the rebuilt 4.14.3-1 and I
> missed trying the vanilla Debian 4.14.2+25-gb6a8c4f72d-2 with
> Linux 4.19.181-1. I believe this combination would have hung during
> reboot.


I can confirm it did hang on my hardware with this combination of
Xen and Linux versions.

> As such, I believe there are in fact two distinct bugs being observed.
> The presence of EITHER of these is sufficient to cause hangs during
> powerdown or reboot.


And we already have two distinct bugs on BTS.

> First, some patch originally from Linux's main branch that breaks Xen reboots
> was backported somewhere between 4.19.181-1 and 4.19.194-3.  This may
> either have been introduced before 5.10 diverged from main, or may also
> have been backported to 5.10.  THIS is Debian bug #991967.


I agree. I believe you.

> Second, the Xen patch 3c428e9ecb1f290689080c11e0c37b793425bef1 which is
> valuable to ARM devices breaks reboots and powerdowns on x86.  This is
> correctly fixed by 0f089bbf43ecce6f27576cb548ba4341d0ec46a8.  Presently
> this has no Debian bug report.


That looks a lot like #994899. Have you ruled out the possibility that
this bug is #994899 in disguise? If so, how? Or do you think #994899
is a third bug?

> The first is presently unidentified, someone enthusiastic either needs to
> read git logs/source code, or bisect and build to find where it got
> broken.


Yeah, that's a lot of work. That's how I found my solution for #994899.
For that bug, since the working version was Xen 4.11 and the broken
version was Xen 4.14, the cause could have been in 4.12, 4.13, or 4.14.
So that required a bit of detective work studying git logs, but in the
end, I just tested 4.12, and it was good, then 4.13 and it was good.
I also tested the first Debian version of 4.14, which was actually
experimental on Debian if I recall correctly. It did not include the
RPI4 patches, and it was good too. So I knew the bug was introduced
sometime after that, and I soon identified the RPI4 patches as the place
where the bug (#994899) first appeared on my hardware.
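
The search described above amounts to walking an ordered list of builds until the first bad one. A minimal sketch, assuming a hypothetical `shuts_down_cleanly` check that stands in for installing a build and trying a shutdown by hand; the build labels are illustrative, not exact package versions:

```python
from typing import Callable, Optional, Sequence


def first_bad(builds: Sequence[str], is_good: Callable[[str], bool]) -> Optional[str]:
    """Return the first build, in order, for which is_good() is False."""
    for build in builds:
        if not is_good(build):
            return build
    return None


# Illustrative ordering, oldest first (labels are assumptions, not package versions).
builds = [
    "4.11",
    "4.12",
    "4.13",
    "4.14 (experimental, no RPI4 patches)",
    "4.14 (with RPI4 patches)",
]


def shuts_down_cleanly(build: str) -> bool:
    # Stand-in for a manual test; the results mirror what this message reports.
    return build != "4.14 (with RPI4 patches)"


print(first_bad(builds, shuts_down_cleanly))
```

A linear walk is enough for a handful of builds; over a long commit range, `git bisect` would reach the same answer with fewer builds.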

> The second we seem to have a fix.  The only question is how many patches
> to cherry pick?  bc141e8ca562 is non-urgent as it is merely superficial
> and not needed for functionality.
> 5a4087004d1a is a workaround for Linux kernel breakage, but how likely
> are we to see that fixed in the Linux kernel packages?  The fix is
> well-contained and needed for some highly popular ARM devices.




When you decide what to do here, I would like to check it to
see if it works on my hardware and if you don't hear anything
from me, you can assume it worked fine on my hardware.

Cheers,

Chuck



Bug#991967: Simply ACPI powerdown/reset issue?

2021-09-27 Thread Chuck Zmudzinski

On 9/25/2021 11:27 PM, Elliott Mitchell wrote:


> I checked out the Debian Xen source via git.  I got the current
> "master" branch which is presently the candidate 4.14.3-1 version,
> which includes urgent fixes.  The hash is:
> e7a17db0305c8de891b366ad3528e5a43015
>
> On top of this I cherry-picked 3 commits from Xen's main branch:
> 5a4087004d1adbbb223925f3306db0e5824a2bdc
> 0f089bbf43ecce6f27576cb548ba4341d0ec46a8
> bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b
>
> (these can be retrieved via Xen's gitweb at
> https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=<$hash> which is
> suitable for the `git am` command)
>
> With these I built 4.14.3-1 and then tried kernels 4.19.181-1 and
> 4.19.194-3 (this system is presently mostly on oldstable).  The results
> were:
>
> Xen 4.14.3-1 with Linux 4.19.181-1: system reboots were successful
>
> Xen 4.14.3-1 with Linux 4.19.194-3: system reboots hung


I presume the Xen 4.14.3-1 you are referring to is not the
official version, but the one patched with the three extra
aforementioned commits. Note: I use quilt to manage the
packages, and quilt rejected the last commit because the
context within three lines of the patched code was changed.
A goto bad was changed to goto done by another commit
on the Xen unstable branch, so I fixed the patch file
and changed the 'done' to 'bad' to get the third patch to succeed.
Let's call this patched version of Xen version 4.14.3-1.1

I tried these on my hardware, which is a Haswell processor, EFI
boot, and systemd for init, and my results are:

Xen 4.14.3-1.1 with Linux 4.19.181-1: system reboots hung
Xen 4.14.3-1.1 with Linux 4.19.194-3: system reboots hung
Xen 4.14.3-1.1 with Linux 5.10.46-4: system reboots hung

I still cannot reproduce this result, not even with the extra three
commits. Perhaps it depends on differences in the BIOS or EFI, or
maybe systemd vs. sysv.

I share this result in case it is of help to you.

Regards,

Chuck Zmudzinski



Bug#991967: Simply ACPI powerdown/reset issue?

2021-09-26 Thread Chuck Zmudzinski

On 9/26/2021 8:46 AM, Chuck Zmudzinski wrote:

> On 9/25/2021 11:27 PM, Elliott Mitchell wrote:
>> Unfortunately I was too quick at installing the rebuilt 4.14.3-1 and I
>> missed trying the vanilla Debian 4.14.2+25-gb6a8c4f72d-2 with
>> Linux 4.19.181-1.  I believe this combination would have hung during
>> reboot.
>
> In light of what I discovered while investigating the cause of
> bug #994899, I would tend to think calling
> Debian 4.14.2+25-gb6a8c4f72d-2 "vanilla" an interesting
> choice of words. To me, vanilla connotes boring,
> uninteresting. But that version of Debian Xen, and
> also the current version in the stable distribution,
> bullseye, are not boring or uninteresting as I have
> studied these versions and concluded they actually
> are now a fork of upstream Xen's 4.14 version, since
> they contain patches from upstream Xen's 4.16 unstable
> branch to better support the Raspberry Pi 4, as noted
> in the changelogs of those versions.
>
> So I am adding the tag upstream,


Actually, I will add the upstream tag to the bug I reported in
Xen, #994899, since we are talking about upstream Xen, not
upstream Linux.



Bug#991967: Simply ACPI powerdown/reset issue?

2021-09-26 Thread Chuck Zmudzinski

On 9/25/2021 11:27 PM, Elliott Mitchell wrote:


> Unfortunately I was too quick at installing the rebuilt 4.14.3-1 and I
> missed trying the vanilla Debian 4.14.2+25-gb6a8c4f72d-2 with
> Linux 4.19.181-1.  I believe this combination would have hung during
> reboot.




In light of what I discovered while investigating the cause of
bug #994899, I would tend to think calling
Debian 4.14.2+25-gb6a8c4f72d-2 "vanilla" an interesting
choice of words. To me, vanilla connotes boring,
uninteresting. But that version of Debian Xen, and
also the current version in the stable distribution,
bullseye, are not boring or uninteresting as I have
studied these versions and concluded they actually
are now a fork of upstream Xen's 4.14 version, since
they contain patches from upstream Xen's 4.16 unstable
branch to better support the Raspberry Pi 4, as noted
in the changelogs of those versions.

So I am adding the tag upstream, and I suggest that
the Debian Xen Team notify upstream Xen that we
are planning a fork of Xen to better support popular
arm devices and we are already shipping a testing
version of it in our current bullseye release. We could
tell upstream we are willing to stop this fork if they
could assist us with backporting the reworking of the
xen/arm/acpi and xen/x86/acpi code that is in upstream
Xen 4.16 unstable to xen 4.14. We can tell
them if they are interested in what we are doing, they
can take a look at the work we are doing on our
public development servers (salsa).

For our own users, especially in the stable version,
we should make a note of this fact in a README.Debian
file and place it in an appropriate place of the binary
packages. We should also note that there are encouraging
results with this version for improved support on arm,
but some tests indicate that an annoying bug causing
problems shutting down Domain 0 appears to have
surfaced on x86 (amd64). For details, see bugs #991967
and #994899 on the Debian Bug Tracking System.

I think this is the BEST way to truly proceed in accordance
with the Debian Social Policy of courtesy and cooperation
with the free software projects that are available to the
public in our main repositories, and to properly inform
our users what we are doing in our current Xen packages
for unstable, testing, and stable.



Bug#991967: Simply ACPI powerdown/reset issue?

2021-09-26 Thread Diederik de Haas
Hi Elliott,

On Sunday, 26 September 2021 05:27:07 CEST Elliott Mitchell wrote:
> I checked out the Debian Xen source via git.  I got the current
> "master" branch which is presently the candidate 4.14.3-1 version,
> which includes urgent fixes.  The hash is:
> e7a17db0305c8de891b366ad3528e5a43015
> 
> On top of this I cherry-picked 3 commits from Xen's main branch:
> 5a4087004d1adbbb223925f3306db0e5824a2bdc
> 0f089bbf43ecce6f27576cb548ba4341d0ec46a8
> bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b

Shutdown on my Xen server broke for me between 4.14.0+80-gd101b417b7-1 and 
4.14.0+88-g1d1d1f5391-1 (too) and 'Knorrie' and I have been doing some 
experiments. We identified the 0f089bbf43 commit too, but also 2 other ones:

8b6d55c1261820bb9db8d867ce9ee77397d05203
f390941a92f102ece1b54be206a602187fd7

https://salsa.debian.org/xen-team/debian-xen/-/commits/knorrie/for-diederik-3-fixes/
is a branch Knorrie prepared for me with those 3 patches applied.
I did 'git checkout' on that branch and then a 'dpkg-buildpackage -b' and 
installed the built .deb files and rebooted. After that, shutdown worked again 
:)

So you may want to take a look at those patches too.

HTH,
  Diederik



Bug#991967: Simply ACPI powerdown/reset issue?

2021-09-26 Thread Chuck Zmudzinski

On 9/25/2021 11:27 PM, Elliott Mitchell wrote:


> The second we seem to have a fix.  The only question is how many patches
> to cherry pick?  bc141e8ca562 is non-urgent as it is merely superficial
> and not needed for functionality.
> 5a4087004d1a is a workaround for Linux kernel breakage, but how likely
> are we to see that fixed in the Linux kernel packages?  The fix is
> well-contained and needed for some highly popular ARM devices.


I suspect that depends on how highly motivated Debian is
to support those highly popular ARM devices not just with
Linux, but with Linux as a Xen Dom0 on those devices. Even
if they are highly popular devices, what matters, ultimately,
I think, is if there is a reason for them to be popular as
devices that run a Xen dom0. Then maybe there is a chance
to get some patches into the Linux kernel for this purpose.
Just my two cents, FWIW.



Bug#991967: Simply ACPI powerdown/reset issue?

2021-09-25 Thread Fr. Chuck Zmudzinski, C.P.M.

On 9/25/2021 11:27 PM, Elliott Mitchell wrote:

> On Tue, Sep 21, 2021 at 06:33:20AM -0400, Chuck Zmudzinski wrote:
>> I presume you are suggesting I try booting 4.19.181-1 on the
>> current version of Xen-4.14 for bullseye as a dom0. I am not
>> inclined to try it until an official Debian developer endorses
>> your opinion that the bug I am seeing is distinct
>> from #991967, at which point I will report the bug I am
>> seeing as a new bug.
>
> Chuck Zmudzinski you are getting rather close to my threshold for calling
> harassment.  You're not /quite/ there, but I'm concerned.


Sorry if I offended you in some way, I didn't mean to.


> Since the purpose of the bug reports is to find and diagnose bugs, I did
> a bit of experimentation and made some observations.
>
> I checked out the Debian Xen source via git.  I got the current
> "master" branch which is presently the candidate 4.14.3-1 version,
> which includes urgent fixes.  The hash is:
> e7a17db0305c8de891b366ad3528e5a43015
>
> On top of this I cherry-picked 3 commits from Xen's main branch:
> 5a4087004d1adbbb223925f3306db0e5824a2bdc
> 0f089bbf43ecce6f27576cb548ba4341d0ec46a8
> bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b


By main branch, I presume you mean the unstable
4.16 branch of Xen. Correct?

> (these can be retrieved via Xen's gitweb at
> https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=<$hash> which is
> suitable for the `git am` command)
>
> With these I built 4.14.3-1 and then tried kernels 4.19.181-1 and
> 4.19.194-3 (this system is presently mostly on oldstable).  The results
> were:
>
> Xen 4.14.3-1 with Linux 4.19.181-1: system reboots were successful
>
> Xen 4.14.3-1 with Linux 4.19.194-3: system reboots hung



Interesting. Looks like you are homing in on solving this bug. I notice
at the beginning of this message you quoted an older message of mine
which does not take into account that I have reported a new bug
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=994899
because I did come to the conclusion, as you did, that there are
in fact two bugs.

I wonder if the results of your modified Xen 4.14.3-1 with
4.19.181-1 and 4.19.194-3 on my hardware would be of help.
I have, as you might recall, an older (Haswell) Intel system with
EFI boot and systemd for init/shutdown services.
If I get the same result, then I would agree we are seeing a
regression between those two versions of Linux. Otherwise,
there may also be some tests involving EFI vs. BIOS to
do. Or, based on what I have learned at #994899, we may also
need to check systemd vs. sysv-init. Do you want me to
do the test on my hardware?


> Unfortunately I was too quick at installing the rebuilt 4.14.3-1 and I
> missed trying the vanilla Debian 4.14.2+25-gb6a8c4f72d-2 with
> Linux 4.19.181-1. I believe this combination would have hung during
> reboot.


I can confirm it did hang on my hardware with this combination of
Xen and Linux versions.


> As such, I believe there are in fact two distinct bugs being observed.
> The presence of EITHER of these is sufficient to cause hangs during
> powerdown or reboot.


And we already have two distinct bugs on BTS.

> First, some patch originally from Linux's main branch that breaks Xen reboots
> was backported somewhere between 4.19.181-1 and 4.19.194-3.  This may
> either have been introduced before 5.10 diverged from main, or may also
> have been backported to 5.10.  THIS is Debian bug #991967.


I agree. I believe you.

> Second, the Xen patch 3c428e9ecb1f290689080c11e0c37b793425bef1 which is
> valuable to ARM devices breaks reboots and powerdowns on x86.  This is
> correctly fixed by 0f089bbf43ecce6f27576cb548ba4341d0ec46a8.  Presently
> this has no Debian bug report.


That looks a lot like #994889. Have you ruled out the possibility that
this bug is #994889 in disguise? If so, how? Or do you think #994889
is a third bug?


> The first is presently unidentified, someone enthusiastic either needs to
> read git logs/source code, or bisect and build to find where it got
> broken.


Yeah, that's a lot of work. That's how I found my solution for #994889.
For that bug, since the working version was Xen 4.11 and the broken
version was Xen 4.14, the cause could have been in 4.12, 4.13, or 4.14.
So that required a bit of detective work studying git logs, but in the
end, I just tested 4.12, and it was good, then 4.13 and it was good.
I also tested the first Debian version of 4.14, which was actually
experimental on Debian if I recall correctly. It did not include the
RPI4 patches, and it was good too. So I knew the bug was introduced
sometime after that, and I soon identified the RPI4 patches as the place
where the bug (#994889) first appeared on my hardware.

> The second we seem to have a fix.  The only question is how many patches
> to cherry pick?  bc141e8ca562 is non-urgent as it is merely superficial
> and not needed for functionality.
> 5a4087004d1a is a workaround for Linux kernel breakage, but how likely
> are we to see that fixed in the Linux kernel packages?  The fix is
> well-contained and needed for some highly popular ARM devices.





Bug#991967: Simply ACPI powerdown/reset issue?

2021-09-25 Thread Elliott Mitchell
On Tue, Sep 21, 2021 at 06:33:20AM -0400, Chuck Zmudzinski wrote:
> I presume you are suggesting I try booting 4.19.181-1 on the
> current version of Xen-4.14 for bullseye as a dom0. I am not
> inclined to try it until an official Debian developer endorses
> your opinion that the bug I am seeing is distinct
> from #991967, at which point I will report the bug I am
> seeing as a new bug.

Chuck Zmudzinski you are getting rather close to my threshold for calling
harassment.  You're not /quite/ there, but I'm concerned.


Since the purpose of the bug reports is to find and diagnose bugs, I did
a bit of experimentation and made some observations.

I checked out the Debian Xen source via git.  I got the current
"master" branch which is presently the candidate 4.14.3-1 version,
which includes urgent fixes.  The hash is:
e7a17db0305c8de891b366ad3528e5a43015

On top of this I cherry-picked 3 commits from Xen's main branch:
5a4087004d1adbbb223925f3306db0e5824a2bdc
0f089bbf43ecce6f27576cb548ba4341d0ec46a8
bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b

(these can be retrieved via Xen's gitweb at
https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=<$hash> which is
suitable for the `git am` command)
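
A minimal sketch of how the patch URLs for those three commits could be assembled, assuming the gitweb URL template quoted above. The helper name `gitweb_patch_url` is hypothetical, and nothing here downloads or applies anything:

```python
# Hypothetical helper: fill the gitweb URL template quoted above with a commit hash.
GITWEB_PATCH_TEMPLATE = "https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h={commit}"

# The three commits cherry-picked in this message.
CHERRY_PICKED = [
    "5a4087004d1adbbb223925f3306db0e5824a2bdc",
    "0f089bbf43ecce6f27576cb548ba4341d0ec46a8",
    "bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b",
]


def gitweb_patch_url(commit: str) -> str:
    """Return the gitweb 'patch' URL for one commit hash."""
    return GITWEB_PATCH_TEMPLATE.format(commit=commit)


for commit in CHERRY_PICKED:
    print(gitweb_patch_url(commit))
```

Each printed URL should point at a patch that can be saved to a file and fed to `git am`, in the order listed.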

With these I built 4.14.3-1 and then tried kernels 4.19.181-1 and
4.19.194-3 (this system is presently mostly on oldstable).  The results
were:

Xen 4.14.3-1 with Linux 4.19.181-1: system reboots were successful

Xen 4.14.3-1 with Linux 4.19.194-3: system reboots hung

Unfortunately I was too quick at installing the rebuilt 4.14.3-1 and I
missed trying the vanilla Debian 4.14.2+25-gb6a8c4f72d-2 with
Linux 4.19.181-1.  I believe this combination would have hung during
reboot.


As such, I believe there are in fact two distinct bugs being observed.
The presence of EITHER of these is sufficient to cause hangs during
powerdown or reboot.

First, some patch originally from Linux's main branch that breaks Xen reboots
was backported somewhere between 4.19.181-1 and 4.19.194-3.  This may
either have been introduced before 5.10 diverged from main, or may also
have been backported to 5.10.  THIS is Debian bug #991967.

Second, the Xen patch 3c428e9ecb1f290689080c11e0c37b793425bef1 which is
valuable to ARM devices breaks reboots and powerdowns on x86.  This is
correctly fixed by 0f089bbf43ecce6f27576cb548ba4341d0ec46a8.  Presently
this has no Debian bug report.
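
The two-bug hypothesis above can be written down as a small truth table. This is only a sketch of the reasoning: the function name and the boolean flags are hypothetical labels, and which build carries which bug is taken from the observations reported in this message, not from any further testing.

```python
def reboot_hangs(xen_has_regression: bool, linux_has_regression: bool) -> bool:
    """Either bug alone is assumed sufficient to hang powerdown or reboot."""
    return xen_has_regression or linux_has_regression


# Observations from this message: the rebuilt Xen 4.14.3-1 carries the fix,
# so only the kernel differs between the two runs.
observed_hang = {
    "4.19.181-1": False,  # system reboots were successful
    "4.19.194-3": True,   # system reboots hung
}

for kernel, hung in observed_hang.items():
    # Under the hypothesis, only 4.19.194-3 carries the Linux-side regression.
    linux_regressed = kernel == "4.19.194-3"
    assert reboot_hangs(False, linux_regressed) == hung
```

Under the same assumption, an unfixed Xen would hang with either kernel, which matches the expectation stated above for the unmodified 4.14.2+25-gb6a8c4f72d-2 package.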


The first is presently unidentified, someone enthusiastic either needs to
read git logs/source code, or bisect and build to find where it got
broken.

The second we seem to have a fix.  The only question is how many patches
to cherry pick?  bc141e8ca562 is non-urgent as it is merely superficial
and not needed for functionality.
5a4087004d1a is a workaround for Linux kernel breakage, but how likely
are we to see that fixed in the Linux kernel packages?  The fix is
well-contained and needed for some highly popular ARM devices.


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445



Bug#991967: Simply ACPI powerdown/reset issue?

2021-09-19 Thread Diederik de Haas
Adding pkg-xen-de...@lists.alioth.debian.org into the loop.

Chuck Zmudzinski replied to the bug and later replied to his own reply. 
To give full context, I've added the original reply in full and Chuck's reply 
to that (as it only quoted part of the context there).

On Sunday, 19 September 2021 07:05:56 CEST Chuck Zmudzinski wrote:
> On Sat, 11 Sep 2021 13:29:12 +0200 Salvatore Bonaccorso wrote:
>  > Hi Elliott,
>  > 
>  > On Fri, Sep 10, 2021 at 06:47:12PM -0700, Elliott Mitchell wrote:
>  > > An experiment led to a potential alternative explanation for #991967.
>  > > The issue may be ACPI (non-UEFI) powerdown/reset was broken at
>  > > 4.19.194-3. Presence of Xen on the system may be unrelated.
>  > > 
>  > > Failing that, it could be Xen and non-UEFI systems are affected. (Xen
>  > > was tried on a UEFI system and the issue wasn't observed)
>  > 
>  > Following up on https://bugs.debian.org/991967#12
>  > 
>  > Did you succeed in bisecting the issue as you seem to have it
>  > reproducible?
>  > 
>  > Regards,
>  > Salvatore
> 
> Hello Elliott and Salvatore,
> 
> I noticed this bug on bullseye ever since I have been
> running bullseye as a dom0, but my testing indicates
> there is no problem with src:linux but the problem
> appeared in src:xen with the 4.14 version of xen on
> bullseye.
> 
> I ask Elliott if you are only seeing the problem on Debian's
> xen-4.14 hypervisor? Also, which architecture, arm or
> amd64? I only see the problem on the Debian xen-4.14
> hypervisor, and I have only tested on amd64, and I
> have found a fix for my amd64 system which is as
> follows:
> 
> Motherboard: ASRock B85M Pro4, BIOS P2.50 12/11/2015,
> with a Haswell CPU (core i5-4590S)
> 
> xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64
> 
> linux kernel version: 5.10.46-4 (the current amd64 kernel
> for bullseye)
> 
> Boot system: EFI, not using secure boot, booting xen
> hypervisor and dom0 bullseye with grub-efi package for
> bullseye, and it boots the xen-4.14-amd64.gz file, not
> the xen-4.14-amd64.efi file.
> 
> I also tested a buster dom0 with the 4.19 series kernel
> on the xen-4.14 hypervisor from bullseye and saw the
> problem, but I did not see the problem with either
> a buster (linux 4.19) or bullseye (linux 5.10) dom0 on
> the xen-4.11 hypervisor, so I think the problem is
> with the Debian version of the xen-4.14 hypervisor,
> not with src:linux.
> 
> I also found a fix in src:xen:
> 
> I noticed the series of patches in debian/patches of the
> 4.14.2+25-gb6a8c4f72d-2 version of src:xen (and
> earlier versions of xen-4.14 on Debian) have several patches
> backported from the unstable branch of xen upstream. By
> removing some of these patches from the patches
> series of the src:xen package, the dom0 shuts down
> as expected on my ASRock Haswell motherboard.
> 
> I rebuilt the src:xen package after removing the following
> patches from the debian/patches series and the result
> was that the computer shuts down as expected if I boot
> using the patched hypervisor:
> 
> 0027-xen-rpi4-implement-watchdog-based-reset.patch
> 0028-tools-python-Pass-linker-to-Python-build-process.patch
> 0029-xen-arm-acpi-Don-t-fail-if-SPCR-table-is-absent.patch
> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
> 0031-xen-arm-acpi-The-fixmap-area-should-always-be-cleare.patch
> 0032-xen-arm-Check-if-the-platform-is-not-using-ACPI-befo.patch
> 0033-xen-arm-Introduce-fw_unreserved_regions-and-use-it.patch
> 0034-xen-arm-acpi-add-BAD_MADT_GICC_ENTRY-macro.patch
> 0035-xen-arm-traps-Don-t-panic-when-receiving-an-unknown-.patch
> 
> Most of these patches seem unrelated to the amd64
> architecture and instead affect the arm architecture, and
> removing all these patches is probably more than is needed to
> fix this bug, but I removed them all because I could not find
> them upstream on the 4.14 branch but instead only saw them
> on the xen unstable branch upstream (I did not check if they are
> on the 4.15 branch upstream), and I wanted to test
> a true upstream 4.14 version without these seemingly
> aggressive patches added by Debian from the unstable
> branch of xen upstream, and I discovered that being
> more conservative and not adding these patches from the
> unstable branch upstream fixed the problem!
> 
> I suspect the following patch is the culprit for problems
> shutting down on the amd64 architecture:
> 
> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
> 
> The commit log for this patch states:
> 
> From: Julien Grall 
> Date: Sat, 26 Sep 2020 17:44:29 +0100
> Subject: xen/acpi: Rework acpi_os_map_memory() and acpi_os_unmap_memory()
> 
> The functions acpi_os_{un,}map_memory() are meant to be arch-agnostic
> while the __acpi_os_{un,}map_memory() are meant to be arch-specific.
> 
> Currently, the former are still containing x86 specific code.
> 
> To avoid this rather strange split, the generic helpers are reworked so
> they are arch-agnostic. This requires the introduction of