Bug#994899: Bug#991967: Simply ACPI powerdown/reset issue?
As discussed in message #91, the submitter of this bug accepts the package maintainer's fix which will close this bug.
Bug#994899: Bug#991967: Simply ACPI powerdown/reset issue?
On 10/4/2021 6:57 AM, Diederik de Haas wrote:
> On Monday, 4 October 2021 11:46:54 CEST Hans van Kranenburg wrote:
>> The 4th one is not explicitly tagged with Fixes: 1c4aa69ca1e1, but I
>> agree with Diederik that we should keep them all together.
>
> Context: Those 4 are part of 1 patch-set posted here:
> https://lists.xen.org/archives/html/xen-devel/2020-11/msg01516.html
>
> The 5th was already debatable and I chose to include it in my MR, but
> I'm fine with not including that one.
>
> Cheers,
> Diederik

As the submitter of #994899, I can confirm these 4 fix the bug on my hardware. I agree this fix can close #994899 and #995341, since, as Hans noted, they are part of the upstream stable 4.15 branch and I presume that will make them stable enough for bullseye.

Thank you Hans, Diederik, and Elliott.

All the best,
Chuck
Bug#994899: Bug#991967: Simply ACPI powerdown/reset issue?
On Monday, 4 October 2021 17:27:22 CEST Chuck Zmudzinski wrote:
> I can confirm these 4 fix the bug on my hardware.

\o/
Thanks for testing and reporting back :-)

Cheers,
Diederik
Bug#994899: Bug#991967: Simply ACPI powerdown/reset issue?
On Monday, 4 October 2021 11:46:54 CEST Hans van Kranenburg wrote:
> The 4th one is not explicitly tagged with Fixes: 1c4aa69ca1e1, but I
> agree with Diederik that we should keep them all together.

Context: Those 4 are part of 1 patch-set posted here:
https://lists.xen.org/archives/html/xen-devel/2020-11/msg01516.html

The 5th was already debatable and I chose to include it in my MR, but I'm fine with not including that one.

Cheers,
Diederik
Bug#991967: Simply ACPI powerdown/reset issue?
Hi Elliott and others,

Also including #994899 for once, since that's the bug number for the Xen issue now.

On 9/26/21 5:27 AM, Elliott Mitchell wrote:
> On Tue, Sep 21, 2021 at 06:33:20AM -0400, Chuck Zmudzinski wrote:
>> I presume you are suggesting I try booting 4.19.181-1 on the
>> current version of Xen-4.14 for bullseye as a dom0. I am not
>> inclined to try it until an official Debian developer endorses
>> your opinion that the bug I am seeing is distinct
>> from #991967, at which point I will report the bug I am
>> seeing as a new bug.
>
> Chuck Zmudzinski you are getting rather close to my threshold for calling
> harassment. You're not /quite/ there, but I'm concerned.
>
> Since the purpose of the bug reports is to find and diagnose bugs, I did
> a bit of experimentation and made some observations.
>
> I checked out the Debian Xen source via git. I got the current
> "master" branch which is presently the candidate 4.14.3-1 version,
> which includes urgent fixes. The hash is:
> e7a17db0305c8de891b366ad3528e5a43015
>
> On top of this I cherry-picked 3 commits from Xen's main branch:
> 5a4087004d1adbbb223925f3306db0e5824a2bdc
> 0f089bbf43ecce6f27576cb548ba4341d0ec46a8
> bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b
>
> (these can be retrieved via Xen's gitweb at
> https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=<$hash> which is
> suitable for the `git am` command)
>
> With these I built 4.14.3-1 and then tried kernels 4.19.181-1 and
> 4.19.194-3 (this system is presently mostly on oldstable). The results
> were:
>
> Xen 4.14.3-1 with Linux 4.19.181-1: system reboots were successful
>
> Xen 4.14.3-1 with Linux 4.19.194-3: system reboots hung

Ok, so it included 0f089bbf43, which is probably the most important of the 3 fixes that we need indeed. And, it's good that the above difference is still visible afterwards, since it confirms that we're looking at two distinct problems.

> Unfortunately I was too quick at installing the rebuilt 4.14.3-1 and I
> missed trying the vanilla Debian 4.14.2+25-gb6a8c4f72d-2 with
> Linux 4.19.181-1. I believe this combination would have hung during
> reboot.

The Xen related breakage was introduced in 4.14.0+88-g1d1d1f5391-2, so with that combination, I would expect you would experience both of the bugs at the same time, yes.

> As such, I believe there are in fact two distinct bugs being observed.
> The presence of EITHER of these is sufficient to cause hangs during
> powerdown or reboot.
>
> First, some patch originally from Linux's main branch breaks Xen reboots
> was backported somewhere between 4.19.181-1 and 4.19.194-3. This may
> either have been introduced before 5.10 diverged from main, or may also
> have been backported to 5.10. THIS is Debian bug #991967.
>
> Second, the Xen patch 3c428e9ecb1f290689080c11e0c37b793425bef1 which is
> valuable to ARM devices breaks reboots and powerdowns on x86. This is
> correctly fixed by 0f089bbf43ecce6f27576cb548ba4341d0ec46a8. Presently
> this has no Debian bug report.

Correct. Thanks a lot for your help with hunting down and confirming this. And now we have #994899 for it. So, I would like to kindly ask everyone to stop hijacking this one, #991967, for discussing the Xen problem.

> The first is presently unidentified, someone enthusiastic either needs to
> read git logs/source code, or bisect and build to find where it got
> broken.
>
> The second we seem to have a fix. The only question is how many patches
> to cherry pick? bc141e8ca562 is non-urgent as it is merely superficial
> and not needed for functionality.
> 5a4087004d1a is a workaround for Linux kernel breakage, but how likely
> are we to see that fixed in the Linux kernel packages? The fix is
> well-contained and needed for some highly popular ARM devices.

Diederik also helped with testing changes, and when combining results, the best thing we can do is pick the 4 changes that were initially posted in Nov 2020 as "x86: ACPI and DMI table mapping fixes", and ended up in Xen 4.15 as well.

>8
commit 8b6d55c1261820bb9db8d867ce9ee77397d05203
Author: Jan Beulich
Date:   Tue Nov 24 11:26:02 2020 +0100

    x86/ACPI: fix mapping of FACS

commit f390941a92f102ece1b54be206a602187fd7
Author: Jan Beulich
Date:   Tue Nov 24 11:26:34 2020 +0100

    x86/DMI: fix table mapping when one lives above 1Mb

commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8
Author: Jan Beulich
Date:   Tue Jan 5 13:09:55 2021 +0100

    x86/ACPI: fix S3 wakeup vector mapping

commit 16ca5b3f873f17f4fbdaecf46c133e1aa3d623b2
Author: Jan Beulich
Date:   Tue Jan 5 13:11:04 2021 +0100

    x86/ACPI: don't invalidate S5 data when S3 wakeup vector cannot be determined
>8

The 4th one is not explicitly tagged with Fixes: 1c4aa69ca1e1, but I agree with Diederik that we should keep them all together.

I do not know if this is also the thing Chuck tested in the end, but I'm a bit lost in the walls of text that were produced in these two bugs.
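For anyone wanting to fetch the four commits listed above, the gitweb URL pattern quoted earlier in this message can be filled in mechanically. The following is only a sketch: `patch_url` is a made-up helper name, the hashes are copied exactly as they appear in this thread (the second one is shorter than the usual 40 characters here and is left as-is), and nothing is downloaded or applied.

```python
# Sketch only: build gitweb patch URLs for the four commits listed above.
# The URL pattern is the one quoted in this thread; patch_url() is a
# hypothetical helper name, not part of any Xen or Debian tooling.
GITWEB_PATCH = "https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h={commit}"

# Hashes copied verbatim from the thread. The second one is shorter than
# 40 characters there and is deliberately not "completed" here.
COMMITS = [
    ("8b6d55c1261820bb9db8d867ce9ee77397d05203",
     "x86/ACPI: fix mapping of FACS"),
    ("f390941a92f102ece1b54be206a602187fd7",
     "x86/DMI: fix table mapping when one lives above 1Mb"),
    ("0f089bbf43ecce6f27576cb548ba4341d0ec46a8",
     "x86/ACPI: fix S3 wakeup vector mapping"),
    ("16ca5b3f873f17f4fbdaecf46c133e1aa3d623b2",
     "x86/ACPI: don't invalidate S5 data when S3 wakeup vector cannot be determined"),
]


def patch_url(commit: str) -> str:
    """Return the gitweb URL that serves `commit` as a patch."""
    return GITWEB_PATCH.format(commit=commit)


if __name__ == "__main__":
    for commit, subject in COMMITS:
        print(f"{subject}\n  {patch_url(commit)}")
```

As noted in the quoted text, what each URL returns is meant to be suitable for `git am`.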
Bug#991967: Simply ACPI powerdown/reset issue?
This corrects typos - I referenced the wrong bug # in a few places.

On 9/25/2021 11:27 PM, Elliott Mitchell wrote:
> Since the purpose of the bug reports is to find and diagnose bugs, I did
> a bit of experimentation and made some observations.
>
> I checked out the Debian Xen source via git. I got the current
> "master" branch which is presently the candidate 4.14.3-1 version,
> which includes urgent fixes. The hash is:
> e7a17db0305c8de891b366ad3528e5a43015
>
> On top of this I cherry-picked 3 commits from Xen's main branch:
> 5a4087004d1adbbb223925f3306db0e5824a2bdc
> 0f089bbf43ecce6f27576cb548ba4341d0ec46a8
> bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b

By main branch, I presume you mean the unstable 4.16 branch of Xen. Correct?

> (these can be retrieved via Xen's gitweb at
> https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=<$hash> which is
> suitable for the `git am` command)
>
> With these I built 4.14.3-1 and then tried kernels 4.19.181-1 and
> 4.19.194-3 (this system is presently mostly on oldstable). The results
> were:
>
> Xen 4.14.3-1 with Linux 4.19.181-1: system reboots were successful
>
> Xen 4.14.3-1 with Linux 4.19.194-3: system reboots hung

Interesting. Looks like you are homing in on solving this bug. I notice at the beginning of this message you quoted an older message of mine which does not take into account that I have reported a new bug

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=994899

because I did come to the conclusion, as you did, that there are in fact two bugs.

I wonder if the results of your modified Xen 4.14.3-1 with 4.19.181-1 and 4.19.194-3 on my hardware would be of help. I have, as you might recall, an older (Haswell) Intel system with EFI boot, and systemd for init/shutdown services. If I get the same result, then I would agree we are seeing a regression between those two versions of Linux. Otherwise, there may also be some tests involving EFI vs. BIOS to do. Or, based on what I have learned at #994899, possibly we also need to check systemd vs. sysv-init.

Do you want me to do the test on my hardware?

> Unfortunately I was too quick at installing the rebuilt 4.14.3-1 and I
> missed trying the vanilla Debian 4.14.2+25-gb6a8c4f72d-2 with
> Linux 4.19.181-1. I believe this combination would have hung during
> reboot.

I can confirm it did hang on my hardware with this combination of Xen and Linux versions.

> As such, I believe there are in fact two distinct bugs being observed.
> The presence of EITHER of these is sufficient to cause hangs during
> powerdown or reboot.

And we already have two distinct bugs on BTS.

> First, some patch originally from Linux's main branch breaks Xen reboots
> was backported somewhere between 4.19.181-1 and 4.19.194-3. This may
> either have been introduced before 5.10 diverged from main, or may also
> have been backported to 5.10. THIS is Debian bug #991967.

I agree. I believe you.

> Second, the Xen patch 3c428e9ecb1f290689080c11e0c37b793425bef1 which is
> valuable to ARM devices breaks reboots and powerdowns on x86. This is
> correctly fixed by 0f089bbf43ecce6f27576cb548ba4341d0ec46a8. Presently
> this has no Debian bug report.

That looks a lot like #994899. Have you ruled out the possibility that this bug is #994899 in disguise? If so, how? Or do you think #994899 is a third bug?

> The first is presently unidentified, someone enthusiastic either needs to
> read git logs/source code, or bisect and build to find where it got
> broken.

Yeah, that's a lot of work. That's how I found my solution for #994899. For that bug, since the working version was Xen 4.11 and the broken version was Xen 4.14, the cause could have been in 4.12, 4.13, or 4.14. So that required a bit of detective work studying git logs, but in the end, I just tested 4.12, and it was good, then 4.13 and it was good. I also tested the first Debian version of 4.14, which was actually experimental on Debian if I recall correctly. It did not include the RPI4 patches, and it was good too. So I knew the bug was introduced sometime after that, and I soon identified the RPI4 patches as the place where the bug (#994899) first appeared on my hardware.

> The second we seem to have a fix. The only question is how many patches
> to cherry pick? bc141e8ca562 is non-urgent as it is merely superficial
> and not needed for functionality.
> 5a4087004d1a is a workaround for Linux kernel breakage, but how likely
> are we to see that fixed in the Linux kernel packages? The fix is
> well-contained and needed for some highly popular ARM devices.

When you decide what to do here, I would like to check it to see if it works on my hardware and if you don't hear anything from me, you can assume it worked fine on my hardware.

Cheers,
Chuck
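The version hunt described above (4.11, 4.12 and 4.13 good, the first Debian 4.14 good, the 4.14 with the RPI4 patches bad) amounts to finding the first bad entry in an oldest-to-newest list. The message describes stepping through the versions in order; the sketch below swaps in a binary search, which is the usual shortcut when builds are expensive. It assumes a single good-to-bad transition, and both the version labels and the `REPORTED` lookup are stand-ins for actually building and booting each version.

```python
from typing import Callable, Optional, Sequence


def first_bad(versions: Sequence[str],
              is_good: Callable[[str], bool]) -> Optional[str]:
    """Binary-search an oldest-to-newest list for the first failing version.

    Assumes one good -> bad transition. Returns None if every version is good.
    """
    lo, hi = 0, len(versions)  # invariant: versions[:lo] good, versions[hi:] bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_good(versions[mid]):
            lo = mid + 1   # first bad version must be after mid
        else:
            hi = mid       # mid is bad, so the answer is mid or earlier
    return versions[lo] if lo < len(versions) else None


# Outcomes as reported in this message. The labels are informal
# descriptions, not real package version strings.
REPORTED = {
    "4.11": True,
    "4.12": True,
    "4.13": True,
    "4.14 (first Debian upload, no RPI4 patches)": True,
    "4.14 (with RPI4 patches)": False,
}

if __name__ == "__main__":
    print(first_bad(list(REPORTED), REPORTED.__getitem__))
```

With only five candidates the saving is small; the same loop pays off more when bisecting individual commits or patches in a series.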
Bug#991967: Simply ACPI powerdown/reset issue?
On 9/25/2021 11:27 PM, Elliott Mitchell wrote:
> I checked out the Debian Xen source via git. I got the current
> "master" branch which is presently the candidate 4.14.3-1 version,
> which includes urgent fixes. The hash is:
> e7a17db0305c8de891b366ad3528e5a43015
>
> On top of this I cherry-picked 3 commits from Xen's main branch:
> 5a4087004d1adbbb223925f3306db0e5824a2bdc
> 0f089bbf43ecce6f27576cb548ba4341d0ec46a8
> bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b
>
> (these can be retrieved via Xen's gitweb at
> https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=<$hash> which is
> suitable for the `git am` command)
>
> With these I built 4.14.3-1 and then tried kernels 4.19.181-1 and
> 4.19.194-3 (this system is presently mostly on oldstable). The results
> were:
>
> Xen 4.14.3-1 with Linux 4.19.181-1: system reboots were successful
>
> Xen 4.14.3-1 with Linux 4.19.194-3: system reboots hung

I presume the Xen 4.14.3-1 you are referring to is not the official version, but the one patched with the three extra aforementioned commits.

Note: I use quilt to manage the packages, and quilt rejected the last commit because the context within three lines of the patched code was changed. A `goto bad` was changed to `goto done` by another commit on the Xen unstable branch, so I fixed the patch file and changed the 'done' to 'bad' to get the third patch to succeed. Let's call this patched version of Xen version 4.14.3-1.1.

I tried these on my hardware, which is a Haswell processor, EFI boot, and systemd for init, and my results are:

Xen 4.14.3-1.1 with Linux 4.19.181-1: system reboots hung

Xen 4.14.3-1.1 with Linux 4.19.194-3: system reboots hung

Xen 4.14.3-1.1 with Linux 5.10.46-4: system reboots hung

I still cannot reproduce your result, not even with the extra three commits. Perhaps it depends on differences in the BIOS or EFI, or maybe systemd vs. sysv.

I share this result in case it is of help to you.

Regards,
Chuck Zmudzinski
Bug#991967: Simply ACPI powerdown/reset issue?
On 9/26/2021 8:46 AM, Chuck Zmudzinski wrote:
> On 9/25/2021 11:27 PM, Elliott Mitchell wrote:
>> Unfortunately I was too quick at installing the rebuilt 4.14.3-1 and I
>> missed trying the vanilla Debian 4.14.2+25-gb6a8c4f72d-2 with
>> Linux 4.19.181-1. I believe this combination would have hung during
>> reboot.
>
> In light of what I discovered while investigating the cause of bug #994899, I would tend to think calling Debian 4.14.2+25-gb6a8c4f72d-2 "vanilla" an interesting choice of words. To me, vanilla connotes boring, uninteresting. But that version of Debian Xen, and also the current version in the stable distribution, bullseye, are not boring or uninteresting as I have studied these versions and concluded they actually are now a fork of upstream Xen's 4.14 version, since they contain patches from upstream Xen's 4.16 unstable branch to better support the Raspberry Pi 4, as noted in the changelogs of those versions. So I am adding the tag upstream,

Actually, I will add the upstream tag to the bug I reported in Xen, #994899, since we are talking about upstream Xen, not upstream Linux.
Bug#991967: Simply ACPI powerdown/reset issue?
On 9/25/2021 11:27 PM, Elliott Mitchell wrote: Unfortunately I was too quick at installing the rebuilt 4.14.3-1 and I missed trying the vanilla Debian 4.14.2+25-gb6a8c4f72d-2 with Linux 4.19.181-1. I believe this combination would have hung during reboot. In light of what I discovered while investigating the cause of bug #994899, I would tend to think calling Debian 4.14.2+25-gb6a8c4f72d-2 "vanilla" an interesting choice of words. To me, vanilla connotes boring, uninteresting. But that version of Debian Xen, and also the current version in the stable distribution, bullseye, are not boring or uninteresting as I have studied these versions and concluded they actually are now a fork of upstream Xen's 4.14 version, since they contain patches from upstream Xen's 4.16 unstable branch to better support the Raspberry Pi 4, as noted in the changelogs of those versions. So I am adding the tag upstream, and I suggest that the Debian Xen Team notify upstream Xen that we are planning a fork of Xen to better support popular arm devices and we are already shipping a testing version of it in our current bullseye release. We could tell upstream we are willing to stop this fork if they could assist us with backporting the reworking of the xen/arm/acpi and xen/x86/acpi code that is in upstream Xen 4.16 unstable to xen 4.14. We can tell them if they are interested in what we are doing, they can take a look at the work we are doing on our public development servers (salsa). For our own users, especially in the stable version, we should make a note of this fact in a README.Debian file and place it in an appropriate place of the binary packages. We should also note that there are encouraging results with this version for improved support on arm, but some tests indicate an annoying bug causing problems shutting down Domain 0 appear to have surfaced on x86 (amd64). For details, see bugs #991967 and #994899 on the Debian Bug Tracking System. 
I think this is the BEST way to truly proceed in accordance with the Debian Social Policy of courtesy and cooperation with the free software projects that are available to the public in our main repositories, and to properly inform our users what we are doing in our current Xen packages for unstable, testing, and stable.
Bug#991967: Simply ACPI powerdown/reset issue?
Hi Elliott,

On Sunday, 26 September 2021 05:27:07 CEST Elliott Mitchell wrote:
> I checked out the Debian Xen source via git. I got the current
> "master" branch which is presently the candidate 4.14.3-1 version,
> which includes urgent fixes. The hash is:
> e7a17db0305c8de891b366ad3528e5a43015
>
> On top of this I cherry-picked 3 commits from Xen's main branch:
> 5a4087004d1adbbb223925f3306db0e5824a2bdc
> 0f089bbf43ecce6f27576cb548ba4341d0ec46a8
> bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b

Shutdown on my Xen server broke for me between 4.14.0+80-gd101b417b7-1 and 4.14.0+88-g1d1d1f5391-1 (too) and 'Knorrie' and I have been doing some experiments. We identified the 0f089bbf43 commit too, but also 2 other ones:
8b6d55c1261820bb9db8d867ce9ee77397d05203
f390941a92f102ece1b54be206a602187fd7

https://salsa.debian.org/xen-team/debian-xen/-/commits/knorrie/for-diederik-3-fixes/ is a branch Knorrie prepared for me with those 3 patches applied. I did 'git checkout' on that branch and then a 'dpkg-buildpackage -b' and installed the built .deb files and rebooted. After that, shutdown worked again :)

So you may want to take a look at those patches too.

HTH,
Diederik
Bug#991967: Simply ACPI powerdown/reset issue?
On 9/25/2021 11:27 PM, Elliott Mitchell wrote:
> The second we seem to have a fix. The only question is how many patches
> to cherry pick? bc141e8ca562 is non-urgent as it is merely superficial
> and not needed for functionality.
> 5a4087004d1a is a workaround for Linux kernel breakage, but how likely
> are we to see that fixed in the Linux kernel packages? The fix is
> well-contained and needed for some highly popular ARM devices.

I suspect that depends on how highly motivated Debian is to support those highly popular ARM devices not just with Linux, but with Linux as a Xen dom0 on those devices. Even if they are highly popular devices, what matters, ultimately, I think, is if there is a reason for them to be popular as devices that run a Xen dom0. Then maybe there is a chance to get some patches into the Linux kernel for this purpose.

Just my two cents, FWIW.
Bug#991967: Simply ACPI powerdown/reset issue?
On 9/25/2021 11:27 PM, Elliott Mitchell wrote: On Tue, Sep 21, 2021 at 06:33:20AM -0400, Chuck Zmudzinski wrote: I presume you are suggesting I try booting 4.19.181-1 on the current version of Xen-4.14 for bullseye as a dom0. I am not inclined to try it until an official Debian developer endorses your opinion that the bug I am seeing is distinct from #991967, at which point I will report the bug I am seeing as a new bug. Chuck Zmudzinski you are getting rather close to my threshold for calling harrassment. You're not /quite/ there, but I'm concerned. Sorry if I offended you in some way, I didn't mean to. Since the purpose of the bug reports is to find and diagnose bugs, I did a bit of experimentation and made some observations. I checked out the Debian Xen source via git. I got the current "master" branch which is presently the candidate 4.14.3-1 version, which includes urgent fixes. The hash is: e7a17db0305c8de891b366ad3528e5a43015 On top of this I cherry-picked 3 commits from Xen's main branch: 5a4087004d1adbbb223925f3306db0e5824a2bdc 0f089bbf43ecce6f27576cb548ba4341d0ec46a8 bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b By main branch, I presume you mean the unstable 4.16 branch of Xen. Correct? (these can be retrieved via Xen's gitweb at https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=<$hash> which is suitable for the `git am` command) With these I built 4.14.3-1 and then tried kernels 4.19.181-1 and 4.19.194-3 (this system is presently mostly on oldstable). The results were: Xen 4.14.3-1 with Linux 4.19.181-1: system reboots were successful Xen 4.14.3-1 with Linux 4.19.194-3: system reboots hung Interesting. Looks like you are honing in on solving this bug. I notice at the beginning of this message you quoted an older message of mine which does not take into account that I have reported a new bug https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=994899 because I did come to the conclusion, as you did, that there are in fact two bugs. 
I wonder if the results of your modified Xen 4.14.3-1 with 4.19.181-1 and 4.19.194-3 on my hardware would be of help. I have, as you might recall, older (Haswell) intel, EFI boot system, and systemd for init/shutdown services. If I get the same result, then I would agree we are seeing a regression between those two versions of Linux. Otherwise, then there may also be some tests involving EFI vs. BIOS to do. Or, based on what I have learned at #994899, also possibly we need to check systemd vs. sysv-init. Do you want me to do the test on my hardware? Unfortunately I was too quick at installing the rebuilt 4.14.3-1 and I missed trying the vanilla Debian 4.14.2+25-gb6a8c4f72d-2 with Linux 4.19.181-1. I believe this combination would have hung during reboot. I can confirm it did hang on my hardware with this combination of Xen and Linux versions. As such, I believe there are in fact two distinct bugs being observed. The presence of EITHER of these is sufficient to cause hangs during powerdown or reboot. And we already have two distinct bugs on BTS. First, some patch originally from Linux's main branch breaks Xen reboots was backported somewhere between 4.19.181-1 and 4.19.194-3. This may either have been introduced before 5.10 diverged from main, or may also have been backported to 5.10. THIS is Debian bug #991967. I agree. I believe you. Second, the Xen patch 3c428e9ecb1f290689080c11e0c37b793425bef1 which is valuable to ARM devices breaks reboots and powerdowns on x86. This is correctly fixed by 0f089bbf43ecce6f27576cb548ba4341d0ec46a8. Presently this has no Debian bug report. That looks a lot like #994889. Have you ruled out the possibility that this bug is #994889 in disguise? If so, how? Or do you think #994889 is a third bug? The first is presently unidentified, someone enthusiastic either needs to read git logs/source code, or bisect and build to find where it got broken. Yeah, that's alot of work. That's how I found my solution for #994889. 
For that bug, since the working version was Xen 4.11 and the broken version was Xen 4.14, the cause could have been in 4.12, 4.13, or 4.14. So that required a bit of detective work studying git logs, but in the end, I just tested 4.12, and it was good, then 4.13 and it was good. I also tested the first Debian version of 4.14, which was actually experimental on Debian if I recall correctly. It did not include the RPI4 patches, and it was good too. So I knew the bug was introduced sometime after that, and I soon identified the RPI4 patches as the place where the bug (#994889) first appeared on my hardware. The second we seem to have a fix. The only question is how many patches to cherry pick? bc141e8ca562 is non-urgent as it is merely superficial and not needed for functionality. 5a4087004d1a is a workaround for Linux kernel breakage, but how likely are we to see that fixed in the Linux kernel packages? The fix is well-contained and needed for some highly popular ARM devices.
Bug#991967: Simply ACPI powerdown/reset issue?
On Tue, Sep 21, 2021 at 06:33:20AM -0400, Chuck Zmudzinski wrote: > I presume you are suggesting I try booting 4.19.181-1 on the > current version of Xen-4.14 for bullseye as a dom0. I am not > inclined to try it until an official Debian developer endorses > your opinion that the bug I am seeing is distinct > from #991967, at which point I will report the bug I am > seeing as a new bug. Chuck Zmudzinski you are getting rather close to my threshold for calling harrassment. You're not /quite/ there, but I'm concerned. Since the purpose of the bug reports is to find and diagnose bugs, I did a bit of experimentation and made some observations. I checked out the Debian Xen source via git. I got the current "master" branch which is presently the candidate 4.14.3-1 version, which includes urgent fixes. The hash is: e7a17db0305c8de891b366ad3528e5a43015 On top of this I cherry-picked 3 commits from Xen's main branch: 5a4087004d1adbbb223925f3306db0e5824a2bdc 0f089bbf43ecce6f27576cb548ba4341d0ec46a8 bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b (these can be retrieved via Xen's gitweb at https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=<$hash> which is suitable for the `git am` command) With these I built 4.14.3-1 and then tried kernels 4.19.181-1 and 4.19.194-3 (this system is presently mostly on oldstable). The results were: Xen 4.14.3-1 with Linux 4.19.181-1: system reboots were successful Xen 4.14.3-1 with Linux 4.19.194-3: system reboots hung Unfortunately I was too quick at installing the rebuilt 4.14.3-1 and I missed trying the vanilla Debian 4.14.2+25-gb6a8c4f72d-2 with Linux 4.19.181-1. I believe this combination would have hung during reboot. As such, I believe there are in fact two distinct bugs being observed. The presence of EITHER of these is sufficient to cause hangs during powerdown or reboot. First, some patch originally from Linux's main branch breaks Xen reboots was backported somewhere between 4.19.181-1 and 4.19.194-3. 
This may either have been introduced before 5.10 diverged from main, or may also have been backported to 5.10. THIS is Debian bug #991967. Second, the Xen patch 3c428e9ecb1f290689080c11e0c37b793425bef1 which is valuable to ARM devices breaks reboots and powerdowns on x86. This is correctly fixed by 0f089bbf43ecce6f27576cb548ba4341d0ec46a8. Presently this has no Debian bug report. The first is presently unidentified, someone enthusiastic either needs to read git logs/source code, or bisect and build to find where it got broken. The second we seem to have a fix. The only question is how many patches to cherry pick? bc141e8ca562 is non-urgent as it is merely superficial and not needed for functionality. 5a4087004d1a is a workaround for Linux kernel breakage, but how likely are we to see that fixed in the Linux kernel packages? The fix is well-contained and needed for some highly popular ARM devices. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
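The two-bug argument in this message can be written down as a tiny truth table: a hang is expected whenever either the Xen-side or the Linux-side bug is present. This is a sketch of the reasoning only. The two boolean flags are a simplification introduced here, and the model says nothing about hardware differences such as EFI versus BIOS boot.

```python
from itertools import product


def reboot_hangs(xen_bug: bool, linux_bug: bool) -> bool:
    """Either bug alone is enough to hang powerdown or reboot."""
    return xen_bug or linux_bug


def truth_table():
    """All four combinations as (xen_bug, linux_bug, hangs) tuples."""
    return [(x, l, reboot_hangs(x, l))
            for x, l in product([False, True], repeat=2)]


if __name__ == "__main__":
    for xen_bug, linux_bug, hangs in truth_table():
        outcome = "hang" if hangs else "ok"
        print(f"xen_bug={xen_bug!s:5} linux_bug={linux_bug!s:5} -> {outcome}")
```

Under this reading, the patched Xen with Linux 4.19.181-1 corresponds to the (False, False) row, and the patched Xen with Linux 4.19.194-3 to the (False, True) row.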
Bug#991967: Simply ACPI powerdown/reset issue?
Adding pkg-xen-de...@lists.alioth.debian.org into the loop.

Chuck Zmudzinski replied to the bug and later replied to his own reply. To give full context, I've added the original reply in full and Chuck's reply to that (as it only quoted part of the context there).

On Sunday, 19 September 2021 07:05:56 CEST Chuck Zmudzinski wrote:
> On Sat, 11 Sep 2021 13:29:12 +0200 Salvatore Bonaccorso wrote:
> > Hi Elliott,
> >
> > On Fri, Sep 10, 2021 at 06:47:12PM -0700, Elliott Mitchell wrote:
> > > An experiment led to a potential alternative explanation for #991967.
> > > The issue may be ACPI (non-UEFI) powerdown/reset was broken at
> > > 4.19.194-3. Presence of Xen on the system may be unrelated.
> > >
> > > Failing that, it could be Xen and non-UEFI systems are affected. (Xen
> > > was tried on a UEFI system and the issue wasn't observed)
> >
> > Following up on https://bugs.debian.org/991967#12
> >
> > Did you succeed in bisecting the issue as you seem to have it
> > reproducible?
> >
> > Regards,
> > Salvatore
>
> Hello Elliott and Salvatore,
>
> I noticed this bug on bullseye ever since I have been
> running bullseye as a dom0, but my testing indicates
> there is no problem with src:linux but the problem
> appeared in src:xen with the 4.14 version of xen on
> bullseye.
>
> I ask Elliott if you are only seeing the problem on Debian's
> xen-4.14 hypervisor? Also, which architecture, arm or
> amd64? I only see the problem on the Debian xen-4.14
> hypervisor, and I have only tested on amd64, and I
> have found a fix for my amd64 system which is as
> follows:
>
> Motherboard: ASRock B85M Pro4, BIOS P2.50 12/11/2015,
> with a Haswell CPU (Core i5-4590S)
>
> xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64
>
> linux kernel version: 5.10.46-4 (the current amd64 kernel
> for bullseye)
>
> Boot system: EFI, not using secure boot, booting xen
> hypervisor and dom0 bullseye with grub-efi package for
> bullseye, and it boots the xen-4.14-amd64.gz file, not
> the xen-4.14-amd64.efi file.
>
> I also tested a buster dom0 with the 4.19 series kernel
> on the xen-4.14 hypervisor from bullseye and saw the
> problem, but I did not see the problem with either
> a buster (linux 4.19) or bullseye (linux 5.10) dom0 on
> the xen-4.11 hypervisor, so I think the problem is
> with the Debian version of the xen-4.14 hypervisor,
> not with src:linux.
>
> I also found a fix in src:xen:
>
> I noticed the series of patches in debian/patches of the
> 4.14.2+25-gb6a8c4f72d-2 version of src:xen (and
> earlier versions of xen-4.14 on Debian) have several patches
> backported from the unstable branch of xen upstream. By
> removing some of these patches from the patches
> series of the src:xen package, the dom0 shuts down
> as expected on my ASRock Haswell motherboard.
>
> I rebuilt the src:xen package after removing the following
> patches from the debian/patches series and the result
> was that the computer shuts down as expected if I boot
> using the patched hypervisor:
>
> 0027-xen-rpi4-implement-watchdog-based-reset.patch
> 0028-tools-python-Pass-linker-to-Python-build-process.patch
> 0029-xen-arm-acpi-Don-t-fail-if-SPCR-table-is-absent.patch
> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
> 0031-xen-arm-acpi-The-fixmap-area-should-always-be-cleare.patch
> 0032-xen-arm-Check-if-the-platform-is-not-using-ACPI-befo.patch
> 0033-xen-arm-Introduce-fw_unreserved_regions-and-use-it.patch
> 0034-xen-arm-acpi-add-BAD_MADT_GICC_ENTRY-macro.patch
> 0035-xen-arm-traps-Don-t-panic-when-receiving-an-unknown-.patch
>
> Most of these patches seem unrelated to the amd64
> architecture and instead affect the arm architecture, and
> removing all these patches is probably more than is needed to
> fix this bug, but I removed them all because I could not find
> them upstream on the 4.14 branch but instead only saw them
> on the xen unstable branch upstream (I did not check if they are
> on the 4.15 branch upstream), and I wanted to test
> a true upstream 4.14 version without these seemingly
> aggressive patches added by Debian from the unstable
> branch of xen upstream, and I discovered that being
> more conservative and not adding these patches from the
> unstable branch upstream fixed the problem!
>
> I suspect the following patch is the culprit for problems
> shutting down on the amd64 architecture:
>
> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
>
> The commit log for this patch states:
>
> From: Julien Grall
> Date: Sat, 26 Sep 2020 17:44:29 +0100
> Subject: xen/acpi: Rework acpi_os_map_memory() and acpi_os_unmap_memory()
>
> The functions acpi_os_{un,}map_memory() are meant to be arch-agnostic
> while the __acpi_os_{un,}map_memory() are meant to be arch-specific.
>
> Currently, the former are still containing x86 specific code.
>
> To avoid this rather strange split, the generic helpers are reworked so
> they are arch-agnostic. This requires the introduction of