Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-21 Thread Chuck Zmudzinski

On 9/21/2021 9:13 AM, Chuck Zmudzinski wrote:

On 9/20/2021 10:37 PM, Elliott Mitchell wrote:

On Mon, Sep 20, 2021 at 10:23:39PM -0400, Chuck Zmudzinski wrote:

On 9/20/21 7:39 PM, Diederik de Haas wrote:

On dinsdag 21 september 2021 01:15:15 CEST Elliott Mitchell wrote:

Merely having the path is a sufficiently strong indicator for me to
simply wave it past.  I though would suggest Debian should instead
cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8.

This is available as a patch at:

https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8 

You probably then also want the following commit, which is a fix on that patch:
https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b

Found that via the following url/query:
https://xenbits.xen.org/gitweb/?p=xen.git=search=HEAD=commit=x86%2FACPI

I don't know whether others should be used from that as well.

I tried these two commits (adapted for the xen-4.14 branch) but this
approach did not fix the bug - with these patches applied the dom0
did not power down.

My advice for the Debian Xen Team is to consult with upstream and
get their advice on whether or not it is advisable for Debian to
retain the patches from the Xen-4.16 branch that have been
added to the Debian 4.14 package in an attempt to support
some arm devices that panic on an unpatched Xen-4.14.
If upstream cannot help Debian backport fixes for arm panics
from Xen-4.16/unstable to Xen-4.14 stable, I think the Debian
Xen team should remove aggressive patches that really have now
turned the Debian Xen-4.14 package into a Frankenstein version
that is a mixture of Xen-4.14 and Xen-4.16, and decide that support
for those arm devices must wait until Debian gets Xen 4.16 up
and running on the unstable and hopefully soon, testing distribution.

It is still not established you're running into #991967.  Unless the one
you're pointing towards was backported to the Xen 4.11 packages (which I
doubt) it cannot explain #991967, since at the time 4.11 was in use.

Could be this is a second bug with symptoms similar to #991967. Now
that a fix for the second bug has been identified, you might try a
4.19.181-1 kernel and see whether that fixes things.




FWIW, I tried this.

Sorry, not only does this not fix things, when I shut down the dom0
running with the official Debian 4.19.181-1 kernel on the current
official Debian Xen-4.14 hypervisor, the dom0 not only did not
power off, it did not even reach the systemd poweroff target.

Slight correction - after a few minutes, it did finally reach the
systemd poweroff target, but the power did not turn off.
Yet, it works perfectly on the official Debian Xen-4.11 hypervisor. Again,
my tests cannot confirm that there is a bug in src:linux, the only
common denominator for this bug in all my testing is src:xen,
and it appears in all the 4.14 Xen versions for bullseye, for every single
Linux version tested.

Chuck




Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-21 Thread Chuck Zmudzinski

On 9/20/2021 10:37 PM, Elliott Mitchell wrote:

On Mon, Sep 20, 2021 at 10:23:39PM -0400, Chuck Zmudzinski wrote:

On 9/20/21 7:39 PM, Diederik de Haas wrote:

On dinsdag 21 september 2021 01:15:15 CEST Elliott Mitchell wrote:

Merely having the path is a sufficiently strong indicator for me to
simply wave it past.  I though would suggest Debian should instead
cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8.

This is available as a patch at:

https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8

You probably then also want the following commit, which is a fix on that patch:
https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b

Found that via the following url/query:
https://xenbits.xen.org/gitweb/?p=xen.git=search=HEAD=commit=x86%2FACPI

I don't know whether others should be used from that as well.

I tried these two commits (adapted for the xen-4.14 branch) but this
approach did not fix the bug - with these patches applied the dom0
did not power down.

My advice for the Debian Xen Team is to consult with upstream and
get their advice on whether or not it is advisable for Debian to
retain the patches from the Xen-4.16 branch that have been
added to the Debian 4.14 package in an attempt to support
some arm devices that panic on an unpatched Xen-4.14.
If upstream cannot help Debian backport fixes for arm panics
from Xen-4.16/unstable to Xen-4.14 stable, I think the Debian
Xen team should remove aggressive patches that really have now
turned the Debian Xen-4.14 package into a Frankenstein version
that is a mixture of Xen-4.14 and Xen-4.16, and decide that support
for those arm devices must wait until Debian gets Xen 4.16 up
and running on the unstable and hopefully soon, testing distribution.

It is still not established you're running into #991967.  Unless the one
you're pointing towards was backported to the Xen 4.11 packages (which I
doubt) it cannot explain #991967, since at the time 4.11 was in use.

Could be this is a second bug with symptoms similar to #991967.  Now
that a fix for the second bug has been identified, you might try a
4.19.181-1 kernel and see whether that fixes things.




FWIW, I tried this.

Sorry, not only does this not fix things, when I shut down the dom0
running with the official Debian 4.19.181-1 kernel on the current
official Debian Xen-4.14 hypervisor, the dom0 not only did not
power off, it did not even reach the systemd poweroff target. Yet,
it works perfectly on the official Debian Xen-4.11 hypervisor. Again,
my tests cannot confirm that there is a bug in src:linux, the only
common denominator for this bug in all my testing is src:xen,
and it appears in all the 4.14 Xen versions for bullseye, for every single
Linux version tested.

Chuck



Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-21 Thread Chuck Zmudzinski



On 9/20/21 10:12 PM, Chuck Zmudzinski wrote:

On 9/20/21 6:29 PM, Chuck Zmudzinski wrote:

On 9/20/21 1:43 PM, Chuck Zmudzinski wrote:


On 9/20/21 12:27 AM, Elliott Mitchell wrote:

On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote:


I suspect the following patch is the culprit for problems
shutting down on the amd64 architecture:

0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
This patch does affect amd64 acpi code, and is probably causing
the problem on my amd64 system, so my build of the xen-4.14
hypervisor without this patch fixed the problem.

Of the ones listed that is the only one which has any overlap with x86
code.  The next reproduction step is `apt-get source xen &&
patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
&& dpkg-buildpackage -b`.  Then try with this to confirm that patch
is what does it.

Thing is that delta is rather small.  I don't have a simulator, but that
is rather small to be the culprit.


I just tested the build with
patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
applied before building the package and I can confirm that this is the patch
causing the trouble for dom0 poweroff on x86/amd64. Reverting this patch
fixes it on my amd64 system. But this would probably break the arm build.

I think one possible fix would require modifying
0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
so it only applies at runtime to the arm architecture. I will try some
modifications to the patch instead of removing it, and if I get something
that works on amd64 and also might work on arm, I will post it
for Elliott to try.


I have an encouraging result. I found a very simple patch
to xen/arch/x86/acpi/lib.c that fixes the dom0 poweroff
bug on my system and it should not affect the arm patches
at all:
--
This patch partially reverts previous patch
0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch

This hopefully fixes #991967

--- a/xen/arch/x86/acpi/lib.c    2021-09-20 16:49:08.0 -0400
+++ b/xen/arch/x86/acpi/lib.c    2021-09-20 16:25:05.572038000 -0400
@@ -46,10 +46,6 @@
 if ((phys + size) <= (1 * 1024 * 1024))
     return __va(phys);

-    /* No further arch specific implementation after early boot */
-    if (system_state >= SYS_STATE_boot)
-        return NULL;
-
 offset = phys & (PAGE_SIZE - 1);
 mapped_size = PAGE_SIZE - offset;
 set_fixmap(FIX_ACPI_END, phys);
--




Further testing with this patch revealed a problem. Although
this simple patch causes dom0 to poweroff when shutting
down, on the next reboot the system dropped to single-user
shell because it mixed up my ssd and my hard disk. Normally
the system assigns my SSD as /dev/sda and my hard disk
as /dev/sdb. But on the first reboot after running the Xen
hypervisor, the system reversed them so my SSD was /dev/sdb
and my hard disk was /dev/sda. Since the EFI partition, which
is a vfat partition, is on the SSD and in /etc/fstab I ask to mount
it from the /dev/sda1 partition, it is now at /dev/sdb1, and
the first partition is not a vfat partition on the hard disk so
the system drops to a root shell for system maintenance.

This switching of the devices on the subsequent reboot is
another symptom of this bug I have seen in the past, and
usually the ordinary behavior is restored on the next reboot
or after resetting and powering off or unplugging from power.
So this patch does not really fix the bug reliably.


To clarify things, I saw this strange behavior of the system
switching the disk devices with this patch under the following
conditions:

1) Boot using this simple patch - dom0 shuts down properly

2) Boot using Elliott's suggested patch in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=991967#94

3) It was when booting using Elliott's suggested patch that
I saw the drop to single-user root for system maintenance.
Moreover, Elliott's suggested patch did not fix the dom0
power off bug.

So it might be the case that this simple patch would work
for both amd64 and arm devices nicely, but Elliott refuses
to test it with his arm devices. Sigh.



Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-21 Thread Chuck Zmudzinski

On 9/20/21 10:37 PM, Elliott Mitchell wrote:

On Mon, Sep 20, 2021 at 10:23:39PM -0400, Chuck Zmudzinski wrote:

On 9/20/21 7:39 PM, Diederik de Haas wrote:

On dinsdag 21 september 2021 01:15:15 CEST Elliott Mitchell wrote:

Merely having the path is a sufficiently strong indicator for me to
simply wave it past.  I though would suggest Debian should instead
cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8.

This is available as a patch at:

https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8

You probably then also want the following commit, which is a fix on that patch:
https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b

Found that via the following url/query:
https://xenbits.xen.org/gitweb/?p=xen.git=search=HEAD=commit=x86%2FACPI

I don't know whether others should be used from that as well.

I tried these two commits (adapted for the xen-4.14 branch) but this
approach did not fix the bug - with these patches applied the dom0
did not power down.

My advice for the Debian Xen Team is to consult with upstream and
get their advice on whether or not it is advisable for Debian to
retain the patches from the Xen-4.16 branch that have been
added to the Debian 4.14 package in an attempt to support
some arm devices that panic on an unpatched Xen-4.14.
If upstream cannot help Debian backport fixes for arm panics
from Xen-4.16/unstable to Xen-4.14 stable, I think the Debian
Xen team should remove aggressive patches that really have now
turned the Debian Xen-4.14 package into a Frankenstein version
that is a mixture of Xen-4.14 and Xen-4.16, and decide that support
for those arm devices must wait until Debian gets Xen 4.16 up
and running on the unstable and hopefully soon, testing distribution.

It is still not established you're running into #991967.  Unless the one
you're pointing towards was backported to the Xen 4.11 packages (which I
doubt) it cannot explain #991967, since at the time 4.11 was in use.

Could be this is a second bug with symptoms similar to #991967.  Now
that a fix for the second bug has been identified, you might try a
4.19.181-1 kernel and see whether that fixes things.




I presume you are suggesting I try booting 4.19.181-1 on the
current version of Xen-4.14 for bullseye as a dom0. I am not
inclined to try it until an official Debian developer endorses
your opinion that the bug I am seeing is distinct
from #991967, at which point I will report the bug I am
seeing as a new bug.

Regards,

Chuck Zmudzinski



Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-21 Thread Chuck Zmudzinski



On 9/20/21 7:39 PM, Diederik de Haas wrote:

On dinsdag 21 september 2021 01:15:15 CEST Elliott Mitchell wrote:

Merely having the path is a sufficiently strong indicator for me to
simply wave it past.  I though would suggest Debian should instead
cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8.

This is available as a patch at:

https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8

You probably then also want the following commit, which is a fix on that patch:
https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b

Found that via the following url/query:
https://xenbits.xen.org/gitweb/?p=xen.git=search=HEAD=commit=x86%2FACPI

I don't know whether others should be used from that as well.


I tried these two commits (adapted for the xen-4.14 branch) but this
approach did not fix the bug - with these patches applied the dom0
did not power down.

My advice for the Debian Xen Team is to consult with upstream and
get their advice on whether or not it is advisable for Debian to
retain the patches from the Xen-4.16 branch that have been
added to the Debian 4.14 package in an attempt to support
some arm devices that panic on an unpatched Xen-4.14.
If upstream cannot help Debian backport fixes for arm panics
from Xen-4.16/unstable to Xen-4.14 stable, I think the Debian
Xen team should remove aggressive patches that really have now
turned the Debian Xen-4.14 package into a Frankenstein version
that is a mixture of Xen-4.14 and Xen-4.16, and decide that support
for those arm devices must wait until Debian gets Xen 4.16 up
and running on the unstable and hopefully soon, testing distribution.



Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-21 Thread Chuck Zmudzinski

On 9/20/21 6:29 PM, Chuck Zmudzinski wrote:

On 9/20/21 1:43 PM, Chuck Zmudzinski wrote:


On 9/20/21 12:27 AM, Elliott Mitchell wrote:

On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote:


I suspect the following patch is the culprit for problems
shutting down on the amd64 architecture:

0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
This patch does affect amd64 acpi code, and is probably causing
the problem on my amd64 system, so my build of the xen-4.14
hypervisor without this patch fixed the problem.

Of the ones listed that is the only one which has any overlap with x86
code.  The next reproduction step is `apt-get source xen &&
patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
&& dpkg-buildpackage -b`.  Then try with this to confirm that patch
is what does it.

Thing is that delta is rather small.  I don't have a simulator, but that
is rather small to be the culprit.


I just tested the build with
patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
applied before building the package and I can confirm that this is the patch
causing the trouble for dom0 poweroff on x86/amd64. Reverting this patch
fixes it on my amd64 system. But this would probably break the arm build.

I think one possible fix would require modifying
0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
so it only applies at runtime to the arm architecture. I will try some
modifications to the patch instead of removing it, and if I get something
that works on amd64 and also might work on arm, I will post it
for Elliott to try.


I have an encouraging result. I found a very simple patch
to xen/arch/x86/acpi/lib.c that fixes the dom0 poweroff
bug on my system and it should not affect the arm patches
at all:
--
This patch partially reverts previous patch
0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch

This hopefully fixes #991967

--- a/xen/arch/x86/acpi/lib.c    2021-09-20 16:49:08.0 -0400
+++ b/xen/arch/x86/acpi/lib.c    2021-09-20 16:25:05.572038000 -0400
@@ -46,10 +46,6 @@
 if ((phys + size) <= (1 * 1024 * 1024))
     return __va(phys);

-    /* No further arch specific implementation after early boot */
-    if (system_state >= SYS_STATE_boot)
-        return NULL;
-
 offset = phys & (PAGE_SIZE - 1);
 mapped_size = PAGE_SIZE - offset;
 set_fixmap(FIX_ACPI_END, phys);
--




Further testing with this patch revealed a problem. Although
this simple patch causes dom0 to poweroff when shutting
down, on the next reboot the system dropped to single-user
shell because it mixed up my ssd and my hard disk. Normally
the system assigns my SSD as /dev/sda and my hard disk
as /dev/sdb. But on the first reboot after running the Xen
hypervisor, the system reversed them so my SSD was /dev/sdb
and my hard disk was /dev/sda. Since the EFI partition, which
is a vfat partition, is on the SSD and in /etc/fstab I ask to mount
it from the /dev/sda1 partition, it is now at /dev/sdb1, and
the first partition is not a vfat partition on the hard disk so
the system drops to a root shell for system maintenance.
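
One way to keep the mount independent of whether the SSD comes up as /dev/sda or /dev/sdb is to reference the partition by filesystem UUID in /etc/fstab. The following is only a sketch: the function name `fstab_use_uuid` is made up here, and the UUID is assumed to come from something like `blkid -s UUID -o value /dev/sda1` run while the device names are still correct.

```sh
#!/bin/sh
# Rewrite the device field of matching fstab lines to a UUID= reference.
#   $1: device path as currently written in fstab (e.g. /dev/sda1)
#   $2: filesystem UUID of that partition
# Reads fstab text on stdin and writes the rewritten text to stdout;
# it never touches /etc/fstab itself.
fstab_use_uuid() {
    awk -v dev="$1" -v uuid="$2" '
        # Only lines whose first field is exactly the device are changed;
        # comment lines and other devices pass through untouched.
        $1 == dev { sub(/[^ \t]+/, "UUID=" uuid) }
        { print }
    '
}

# Possible use (review the result before installing it):
#   fstab_use_uuid /dev/sda1 "$(blkid -s UUID -o value /dev/sda1)" \
#       < /etc/fstab > /tmp/fstab.new
```

This does not address why the devices get swapped after running the hypervisor; it only stops the swap from dropping the boot into the maintenance shell.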

This switching of the devices on the subsequent reboot is
another symptom of this bug I have seen in the past, and
usually the ordinary behavior is restored on the next reboot
or after resetting and powering off or unplugging from power.

So this patch does not really fix the bug reliably.



Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-20 Thread Elliott Mitchell
On Mon, Sep 20, 2021 at 10:23:39PM -0400, Chuck Zmudzinski wrote:
> 
> On 9/20/21 7:39 PM, Diederik de Haas wrote:
> > On dinsdag 21 september 2021 01:15:15 CEST Elliott Mitchell wrote:
> >> Merely having the path is a sufficiently strong indicator for me to
> >> simply wave it past.  I though would suggest Debian should instead
> >> cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8.
> >>
> >> This is available as a patch at:
> >>
> >> https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8
> > You probably then also want the following commit, which is a fix on that 
> > patch:
> > https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b
> >
> > Found that via the following url/query:
> > https://xenbits.xen.org/gitweb/?p=xen.git=search=HEAD=commit=x86%2FACPI
> >
> > I don't know whether others should be used from that as well.
> 
> I tried these two commits (adapted for the xen-4.14 branch) but this
> approach did not fix the bug - with these patches applied the dom0
> did not power down.
> 
> My advice for the Debian Xen Team is to consult with upstream and
> get their advice on whether or not it is advisable for Debian to
> retain the patches from the Xen-4.16 branch that have been
> added to the Debian 4.14 package in an attempt to support
> some arm devices that panic on an unpatched Xen-4.14.
> If upstream cannot help Debian backport fixes for arm panics
> from Xen-4.16/unstable to Xen-4.14 stable, I think the Debian
> Xen team should remove aggressive patches that really have now
> turned the Debian Xen-4.14 package into a Frankenstein version
> that is a mixture of Xen-4.14 and Xen-4.16, and decide that support
> for those arm devices must wait until Debian gets Xen 4.16 up
> and running on the unstable and hopefully soon, testing distribution.

It is still not established you're running into #991967.  Unless the one
you're pointing towards was backported to the Xen 4.11 packages (which I
doubt) it cannot explain #991967, since at the time 4.11 was in use.

Could be this is a second bug with symptoms similar to #991967.  Now
that a fix for the second bug has been identified, you might try a
4.19.181-1 kernel and see whether that fixes things.


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445



Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-20 Thread Chuck Zmudzinski

On 9/20/21 1:43 PM, Chuck Zmudzinski wrote:


On 9/20/21 12:27 AM, Elliott Mitchell wrote:

On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote:


I suspect the following patch is the culprit for problems
shutting down on the amd64 architecture:

0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
This patch does affect amd64 acpi code, and is probably causing
the problem on my amd64 system, so my build of the xen-4.14
hypervisor without this patch fixed the problem.

Of the ones listed that is the only one which has any overlap with x86
code.  The next reproduction step is `apt-get source xen &&
patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
&& dpkg-buildpackage -b`.  Then try with this to confirm that patch
is what does it.

Thing is that delta is rather small.  I don't have a simulator, but that
is rather small to be the culprit.


I just tested the build with
patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
applied before building the package and I can confirm that this is the patch
causing the trouble for dom0 poweroff on x86/amd64. Reverting this patch
fixes it on my amd64 system. But this would probably break the arm build.

I think one possible fix would require modifying
0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
so it only applies at runtime to the arm architecture. I will try some
modifications to the patch instead of removing it, and if I get something
that works on amd64 and also might work on arm, I will post it
for Elliott to try.


I have an encouraging result. I found a very simple patch
to xen/arch/x86/acpi/lib.c that fixes the dom0 poweroff
bug on my system and it should not affect the arm patches
at all:
--
This patch partially reverts previous patch
0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch

This hopefully fixes #991967

--- a/xen/arch/x86/acpi/lib.c    2021-09-20 16:49:08.0 -0400
+++ b/xen/arch/x86/acpi/lib.c    2021-09-20 16:25:05.572038000 -0400
@@ -46,10 +46,6 @@
 if ((phys + size) <= (1 * 1024 * 1024))
     return __va(phys);

-    /* No further arch specific implementation after early boot */
-    if (system_state >= SYS_STATE_boot)
-        return NULL;
-
 offset = phys & (PAGE_SIZE - 1);
 mapped_size = PAGE_SIZE - offset;
 set_fixmap(FIX_ACPI_END, phys);
--

Can you try this patch to src:xen and see if your
arm devices are OK with it?



Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-20 Thread Diederik de Haas
On dinsdag 21 september 2021 01:15:15 CEST Elliott Mitchell wrote:
> Merely having the path is a sufficiently strong indicator for me to
> simply wave it past.  I though would suggest Debian should instead
> cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8.
> 
> This is available as a patch at:
> 
> https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8

You probably then also want the following commit, which is a fix on that patch:
https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b

Found that via the following url/query:
https://xenbits.xen.org/gitweb/?p=xen.git=search=HEAD=commit=x86%2FACPI

I don't know whether others should be used from that as well.



Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-20 Thread Elliott Mitchell
On Mon, Sep 20, 2021 at 06:29:49PM -0400, Chuck Zmudzinski wrote:
> On 9/20/21 1:43 PM, Chuck Zmudzinski wrote:
> >
> > On 9/20/21 12:27 AM, Elliott Mitchell wrote:
> >> On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote:
> >>
> >>> I suspect the following patch is the culprit for problems
> >>> shutting down on the amd64 architecture:
> >>>
> >>> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
> >>> This patch does affect amd64 acpi code, and is probably causing
> >>> the problem on my amd64 system, so my build of the xen-4.14
> >>> hypervisor without this patch fixed the problem.
> >> Of the ones listed that is the only one which has any overlap with x86
> >> code.  The next reproduction step is `apt-get source xen &&
> >> patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
> >> && dpkg-buildpackage -b`.  Then try with this to confirm that patch
> >> is what does it.
> >>
> >> Thing is that delta is rather small.  I don't have a simulator, but that
> >> is rather small to be the culprit.
> >
> > I just tested the build with
> > patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
> > applied before building the package and I can confirm that this is the patch
> > causing the trouble for dom0 poweroff on x86/amd64. Reverting this patch
> > fixes it on my amd64 system. But this would probably break the arm build.
> >
> > I think one possible fix would require modifying
> > 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
> > so it only applies at runtime to the arm architecture. I will try some
> > modifications to the patch instead of removing it, and if I get something
> > that works on amd64 and also might work on arm, I will post it
> > for Elliott to try.
> 
> I have an encouraging result. I found a very simple patch
> to xen/arch/x86/acpi/lib.c that fixes the dom0 poweroff
> bug on my system and it should not affect the arm patches
> at all:
> --
> This patch partially reverts previous patch
> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
> 
> This hopefully fixes #991967
> 
> --- a/xen/arch/x86/acpi/lib.c    2021-09-20 16:49:08.0 -0400
> +++ b/xen/arch/x86/acpi/lib.c    2021-09-20 16:25:05.572038000 -0400
> @@ -46,10 +46,6 @@
>  if ((phys + size) <= (1 * 1024 * 1024))
>      return __va(phys);
> 
> -    /* No further arch specific implementation after early boot */
> -    if (system_state >= SYS_STATE_boot)
> -        return NULL;
> -
>  offset = phys & (PAGE_SIZE - 1);
>  mapped_size = PAGE_SIZE - offset;
>  set_fixmap(FIX_ACPI_END, phys);
> --
> 
> Can you try this patch to src:xen and see if your
> arm devices are OK with it?

Merely having the path is a sufficiently strong indicator for me to
simply wave it past.  I would, though, suggest Debian should instead
cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8.

This is available as a patch at:

https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8
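
For anyone wanting to pull more than one commit this way, the link above follows a fixed layout, so the URL can be built from the commit hash. This is a rough sketch only: the helper name `xen_patch_url` is invented for illustration, and the layout is simply copied from the link above rather than from any gitweb documentation.

```sh
#!/bin/sh
# Print the gitweb "patch" URL for a commit in xen.git.
#   $1: full commit hash
xen_patch_url() {
    printf 'https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=%s\n' "$1"
}

# Possible use (needs network access, so it is left commented out):
#   curl -fsSL "$(xen_patch_url 0f089bbf43ecce6f27576cb548ba4341d0ec46a8)" \
#       -o 0f089bbf.patch
```

The downloaded file would still need adapting to the 4.14 branch before it can go into the package's patch series.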


The other commit I would suggest being picked by src:xen is
5a4087004d1adbbb223925f3306db0e5824a2bdc

This is for device-tree funkiness which got added between linux-5.10.0
and linux-5.10.y (if the Debian kernel team wants to maintain a fix in
Debian's kernel source, that works too).

BTW have I mentioned I've become rather skeptical of device-trees being
a usable way of representing hardware information?


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445



Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-20 Thread Chuck Zmudzinski



On 9/20/21 12:27 AM, Elliott Mitchell wrote:

On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote:


I suspect the following patch is the culprit for problems
shutting down on the amd64 architecture:

0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
This patch does affect amd64 acpi code, and is probably causing
the problem on my amd64 system, so my build of the xen-4.14
hypervisor without this patch fixed the problem.

Of the ones listed that is the only one which has any overlap with x86
code.  The next reproduction step is `apt-get source xen &&
patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
&& dpkg-buildpackage -b`.  Then try with this to confirm that patch
is what does it.

Thing is that delta is rather small.  I don't have a simulator, but that
is rather small to be the culprit.


I just tested the build with
patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
applied before building the package and I can confirm that this is the patch
causing the trouble for dom0 poweroff on x86/amd64. Reverting this patch
fixes it on my amd64 system. But this would probably break the arm build.

I think one possible fix would require modifying
0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
so it only applies at runtime to the arm architecture. I will try some
modifications to the patch instead of removing it, and if I get something
that works on amd64 and also might work on arm, I will post it
for Elliott to try.



Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-20 Thread Chuck Zmudzinski



On 9/20/21 12:27 AM, Elliott Mitchell wrote:

On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote:

xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64

linux kernel version: 5.10.46-4 (the current amd64 kernel
for bullseye)

Boot system: EFI, not using secure boot, booting xen
hypervisor and dom0 bullseye with grub-efi package for
bullseye, and it boots the xen-4.14-amd64.gz file, not
the xen-4.14-amd64.efi file.
I also tested a buster dom0 with the 4.19 series kernel
on the xen-4.14 hypervisor from bullseye and saw the
problem, but I did not see the problem with either
a buster (linux 4.19) or bullseye (linux 5.10) dom0 on
the xen-4.11 hypervisor, so I think the problem is
with the Debian version of the xen-4.14 hypervisor,
not with src:linux.

You're referencing several software versions which are mismatches for
#991967.  #991967 was observed with Xen 4.11 and Linux kernel 4.19.194-3,
but not Linux kernel 4.19.181.

The fact it correlates with a Linux kernel update rather strongly points
to the Linux kernel.  I could believe the situation is partially the
fault of both though.


I don't see it with Xen-4.11 and Linux kernel 4.19.194-3 which is
the current default dom0 configuration on Debian buster, but I
do see it with Debian's version of Xen-4.14 and either Linux
kernel 4.19.194-3 from buster or Linux kernel 5.10.46-4 from
bullseye as the dom0. So I only saw it with the update of the
Xen hypervisor from 4.11 to 4.14. Of course you have different
hardware and a different acpi implementation which is also likely
to be a factor that determines whether or not the dom0 poweroff
bug manifests itself.
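
Since the comparison hinges on exactly which hypervisor and kernel were running for each test, it may help to record both on every boot. A small sketch follows, assuming `xl info` prints a `xen_version : ...` line in dom0; the function name `xen_version_from_xl_info` is made up for this example.

```sh
#!/bin/sh
# Extract the hypervisor version from `xl info` output supplied on stdin.
# Lines are expected in "key : value" form; only the xen_version key is used.
xen_version_from_xl_info() {
    awk -F' *: *' '$1 == "xen_version" { print $2; exit }'
}

# Possible use in dom0 (requires root and a running Xen, so commented out):
#   printf 'xen=%s kernel=%s\n' \
#       "$(xl info | xen_version_from_xl_info)" "$(uname -r)"
```

Note that `uname -r` reports the kernel release string, not the Debian package version such as 4.19.194-3, so the package version would still have to be noted separately.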




I suspect the following patch is the culprit for problems
shutting down on the amd64 architecture:

0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
This patch does affect amd64 acpi code, and is probably causing
the problem on my amd64 system, so my build of the xen-4.14
hypervisor without this patch fixed the problem.

Of the ones listed that is the only one which has any overlap with x86
code.  The next reproduction step is `apt-get source xen &&
patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
&& dpkg-buildpackage -b`.  Then try with this to confirm that patch
is what does it.

Thing is that delta is rather small.  I don't have a simulator, but that
is rather small to be the culprit.


I did try to remove this single patch from the xen build using
quilt, but quilt was not happy when it tried to apply the
subsequent arm patch, so I just removed all the subsequent
arm patches to keep quilt happy with my modified xen
src tree. I will try it now, though.

If it is this small a delta that is causing the problem
on x86/amd64, then maybe we can come up with a workaround
in src:xen that is acceptable for both arm and x86/amd64.




I think this bug should be re-classified as a bug in src:xen.

There could be a separate bug in src:xen, but that is not #991967.


I also would inquire with the Debian Xen Team about why they
are backporting patches from the upstream xen unstable
branch into Debian's 4.14 package that is currently shipping
on Debian stable (bullseye). IMHO, the aforementioned
patches that are not in the stable 4.14 branch upstream
should not be included in the xen package for Debian stable.

It was requested since someone trying to have Xen operational on a device
needed those for operation.  Rather a lot of bugfix or very small
standalone feature patches get cherry-picked.


Presently I haven't been convinced this is a Xen bug (though it does
affect Xen installations).

Any chance you've got the tools to build and try a 5.5.0 or 5.10.0 Linux
kernel?  I'm suspecting something got incorrectly backported on the Linux
side (alternatively the Xen project seems a bit poor at keeping needed
patches in Linux).




Yes, I recently built and tested a slightly modified Debian
bullseye kernel to test a fix for #983357:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=983357

If you have a patch for Debian's 5.10 bullseye kernel that
might fix the dom0 poweroff bug I am seeing on bullseye with
Debian's current Xen 4.14, I am willing to try it out on my
system as an alternate fix from the fix I discovered in
src:xen that unfortunately removes arm patches that are
needed by some devices.



Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-20 Thread Chuck Zmudzinski

On 9/19/2021 9:30 PM, Chuck Zmudzinski wrote:

On 9/19/2021 4:53 PM, Elliott Mitchell wrote:

On Sun, Sep 19, 2021 at 03:54:01PM -0400, Chuck Zmudzinski wrote:

On 9/19/2021 1:29 PM, Elliott Mitchell wrote:

Have you tried memory ballooning with PVH or HVM domains?

That combination has been reliably crashing Xen for me for a while.
Apparently few others have run into it, yet it is reliable for me.  Have
you tried the combination?  Works?  Panics?

I have not tried ballooning HVM or PVH domains. If the Xen
hypervisor is crashing when ballooning unprivileged domains,
doesn't that support my belief that there are bugs in src:xen
rather than in src:linux?

No.


I still think the patches to fix a panic on devices using the arm architecture
are a bit aggressive for the Debian Xen package for Debian stable. Those
patches upstream are intended for Xen unstable, which is currently
Xen 4.16. Such patches do not belong in a stable Xen 4.14 package for
Debian stable, especially after it can be proven they cause a regression
for Xen users of amd64 devices, the regression being that they break the
proper shutdown functioning of amd64 devices.

I think the correct Debian way to support the arm devices that
panic on a true upstream Xen 4.14 hypervisor without the
patches for arm that cause dom0 to not power off properly on
amd64 is by first testing the arm patches as part of a new Xen 4.16
unstable Xen package for Debian unstable, then follow ordinary
development procedures for porting Xen 4.16 to bookworm/testing,
and then finally a backport of Xen 4.16 to bullseye. That is the
only way I can see this being done without causing grief to
Xen users who want a stable Xen on a stable Debian, unless
upstream can help with porting the arm patches back to Xen 4.14
in such a way that they don't break things on amd64.

This was also deliberately not copied to #991967 since this is unrelated.
I'm concerned this second one might be Debian, but the small delta makes
me think it likely originates from upstream Xen.  I was wondering whether
you had seen it since I haven't found other reports.

(note, if you try recreating, this is a Xen panic, all domains get lost)




This is off-topic for bug #991968.

Regards,

Chuck


Also off-topic for bug #991967 - sorry about the typo.

Chuck



Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-19 Thread Chuck Zmudzinski

On 9/19/2021 4:53 PM, Elliott Mitchell wrote:

On Sun, Sep 19, 2021 at 03:54:01PM -0400, Chuck Zmudzinski wrote:

On 9/19/2021 1:29 PM, Elliott Mitchell wrote:

Have you tried memory ballooning with PVH or HVM domains?

That combination has been reliably crashing Xen for me for a while.
Apparently few others have run into it, yet it is reliable for me.  Have
you tried the combination?  Works?  Panics?

I have not tried ballooning HVM or PVH domains. If the Xen
hypervisor is crashing when ballooning unprivileged domains,
doesn't that support my belief that there are bugs in src:xen
rather than in src:linux?

No.


I still think the patches to fix a panic on devices using the arm architecture
are a bit aggressive for the Debian Xen package for Debian stable. Those
patches upstream are intended for Xen unstable, which is currently
Xen 4.16. Such patches do not belong in a stable Xen 4.14 package for
Debian stable, especially after it can be proven they cause a regression
for Xen users of amd64 devices, the regression being that they break the
proper shutdown functioning of amd64 devices.

I think the correct Debian way to support the arm devices that
panic on a true upstream Xen 4.14 hypervisor without the
patches for arm that cause dom0 to not power off properly on
amd64 is by first testing the arm patches as part of a new Xen 4.16
unstable Xen package for Debian unstable, then follow ordinary
development procedures for porting Xen 4.16 to bookworm/testing,
and then finally a backport of Xen 4.16 to bullseye. That is the
only way I can see this being done without causing grief to
Xen users who want a stable Xen on a stable Debian, unless
upstream can help with porting the arm patches back to Xen 4.14
in such a way that they don't break things on amd64.


This was also deliberately not copied to #991967 since this is unrelated.
I'm concerned this second one might be Debian, but the small delta makes
me think it likely originates from upstream Xen.  I was wondering whether
you had seen it since I haven't found other reports.

(note, if you try recreating, this is a Xen panic, all domains get lost)




This is off-topic for bug #991968.

Regards,

Chuck



Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-19 Thread Elliott Mitchell
On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote:
> xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64
> 
> linux kernel version: 5.10.46-4 (the current amd64 kernel
> for bullseye)
> 
> Boot system: EFI, not using secure boot, booting xen
> hypervisor and dom0 bullseye with grub-efi package for
> bullseye, and it boots the xen-4.14-amd64.gz file, not
> the xen-4.14-amd64.efi file.

> I also tested a buster dom0 with the 4.19 series kernel
> on the xen-4.14 hypervisor from bullseye and saw the
> problem, but I did not see the problem with either
> a buster (linux 4.19) or bullseye (linux 5.10) dom0 on
> the xen-4.11 hypervisor, so I think the problem is
> with the Debian version of the xen-4.14 hypervisor,
> not with src:linux.

You're referencing several software versions which are mismatches for
#991967.  #991967 was observed with Xen 4.11 and Linux kernel 4.19.194-3,
but not Linux kernel 4.19.181.

The fact it correlates with a Linux kernel update rather strongly points
to the Linux kernel.  I could believe the situation is partially the
fault of both though.


> I suspect the following patch is the culprit for problems
> shutting down on the amd64 architecture:
> 
> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch

> This patch does affect amd64 acpi code, and is probably causing
> the problem on my amd64 system, so my build of the xen-4.14
> hypervisor without this patch fixed the problem.

Of the ones listed that is the only one which has any overlap with x86
code.  The next reproduction step is `apt-get source xen &&
patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
&& dpkg-buildpackage -b`.  Then try with this to confirm that patch
is what does it.

Thing is that delta is rather small.  I don't have a simulator, but that
is rather small to be the culprit.


> I think this bug should be re-classified as a bug in src:xen.

There could be a separate bug in src:xen, but that is not #991967.

> I also would inquire with the Debian Xen Team about why they
> are backporting patches from the upstream xen unstable
> branch into Debian's 4.14 package that is currently shipping
> on Debian stable (bullseye). IMHO, the aforementioned
> patches that are not in the stable 4.14 branch upstream
> should not be included in the xen package for Debian stable.

It was requested since someone trying to have Xen operational on a device
needed those for operation.  Rather a lot of bugfix or very small
standalone feature patches get cherry-picked.


Presently I haven't been convinced this is a Xen bug (though it does
affect Xen installations).

Any chance you've got the tools to build and try a 5.5.0 or 5.10.0 Linux
kernel?  I'm suspecting something got incorrectly backported on the Linux
side (alternatively the Xen project seems a bit poor at keeping needed
patches in Linux).


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445



Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-19 Thread Chuck Zmudzinski

On 9/19/2021 1:29 PM, Elliott Mitchell wrote:

On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote:

I noticed this bug on bullseye ever since I have been
running bullseye as a dom0, but my testing indicates
there is no problem with src:linux but the problem
appeared in src:xen with the 4.14 version of xen on
bullseye.

I ask Elliott if you are only seeing the problem on Debian's
xen-4.14 hypervisor? Also, which architecture, arm or
amd64? I only see the problem on the Debian xen-4.14
hypervisor, and I have only tested on amd64, and I
have found a fix for my amd64 system which is as
follows:

Motherboard: ASRock B85M Pro4, BIOS P2.50 12/11/2015,
with a Haswell CPU (core i5-4590S)

xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64

linux kernel version: 5.10.46-4 (the current amd64 kernel
for bullseye)

Boot system: EFI, not using secure boot, booting xen
hypervisor and dom0 bullseye with grub-efi package for
bullseye, and it boots the xen-4.14-amd64.gz file, not
the xen-4.14-amd64.efi file.

Actually hardware which is pretty different from mine, so you may run
into distinct bugs.

Have you tried PVH or HVM domains?


HVM domains: Yes, and they work normally on all Debian versions
I have tried.

PVH domains: No, I have not tried these on Debian.


Have you tried memory ballooning with PVH or HVM domains?

That combination has been reliably crashing Xen for me for a while.
Apparently few others have run into it, yet it is reliable for me.  Have
you tried the combination?  Works?  Panics?


I have not tried ballooning HVM or PVH domains. If the Xen
hypervisor is crashing when ballooning unprivileged domains,
doesn't that support my belief that there are bugs in src:xen
rather than in src:linux?

Regards,

Chuck



Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-19 Thread Chuck Zmudzinski

On 9/19/2021 10:56 AM, Elliott Mitchell wrote:

On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote:

On Sat, 11 Sep 2021 13:29:12 +0200 Salvatore Bonaccorso wrote:
  >
  > On Fri, Sep 10, 2021 at 06:47:12PM -0700, Elliott Mitchell wrote:
  > > An experiment led to a potential alternative explanation for #991967.
  > > The issue may be that ACPI (non-UEFI) powerdown/reset was broken at
  > > 4.19.194-3. Presence of Xen on the system may be unrelated.
  > >
  > > Failing that, it could be that Xen and non-UEFI systems are affected. (Xen
  > > was tried on a UEFI system and the issue wasn't observed)
  >
  > Following up on https://bugs.debian.org/991967#12
  >
  > Did you succeed in bisecting the issue as you seem to have it
  > reproducible?

I noticed this bug on bullseye ever since I have been
running bullseye as a dom0, but my testing indicates
there is no problem with src:linux but the problem
appeared in src:xen with the 4.14 version of xen on
bullseye.

I ask Elliott if you are only seeing the problem on Debian's
xen-4.14 hypervisor? Also, which architecture, arm or
amd64? I only see the problem on the Debian xen-4.14
hypervisor, and I have only tested on amd64, and I
have found a fix for my amd64 system which is as
follows:

Motherboard: ASRock B85M Pro4, BIOS P2.50 12/11/2015,
with a Haswell CPU (core i5-4590S)

xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64

linux kernel version: 5.10.46-4 (the current amd64 kernel
for bullseye)

Nope.  As per the report the problem appeared with kernel 4.19.194-3 and
at the time using Xen 4.11.

The kernel you're listing is rather more recent, which might suggest a
patch which had been backported from 5.x to 4.19.

I could believe a Xen security update being the trigger though (I don't
recall there being one at the right time, but I wouldn't rule it out).



Boot system: EFI, not using secure boot, booting xen
hypervisor and dom0 bullseye with grub-efi package for
bullseye, and it boots the xen-4.14-amd64.gz file, not
the xen-4.14-amd64.efi file.

I also tested a buster dom0 with the 4.19 series kernel
on the xen-4.14 hypervisor from bullseye and saw the
problem, but I did not see the problem with either
a buster (linux 4.19) or bullseye (linux 5.10) dom0 on
the xen-4.11 hypervisor, so I think the problem is
with the Debian version of the xen-4.14 hypervisor,
not with src:linux.

Just to make sure, the kernel you were testing was 4.19.194-3?  The
issue didn't manifest with kernels earlier than that.


I will check again with a buster dom0 when I get a chance,
probably late tonight or tomorrow. I think it was 4.19.194-3
if that is the latest buster kernel because I don't think there
has been an update to the buster kernel since I tested it.


Could be we're seeing distinct bugs.


I could agree if the problem shows up on my system
with the 4.19.194-3 kernel dom0 on xen-4.11, but if not,
then it is probably the same bug, a bug that is in src:xen,
not src:linux.




This patch does affect amd64 acpi code, and is probably causing
the problem on my amd64 system, so my build of the xen-4.14
hypervisor without this patch fixed the problem.

While that commit modifies the code path the processor takes, the
modified path appears identical.



I also would inquire with the Debian Xen Team about why they
are backporting patches from the upstream xen unstable
branch into Debian's 4.14 package that is currently shipping
on Debian stable (bullseye). IMHO, the aforementioned
patches that are not in the stable 4.14 branch upstream
should not be included in the xen package for Debian stable.

Some people are asking for those.  Those are bugfixes for an extremely
popular device which panics on boot without the patches.


The raspberry pi, I presume.



Meanwhile, it turned out that between 5.10.0 and 5.10.30 the ARM64 device-trees
were modified in a way which broke Xen 4.14 on ARM64.  The change
violated Linux's own standards for device-trees, yet still appeared in a
stable branch.

In other news, if you see device-trees compared to ACPI tables, they're
not very comparable.  99% of ACPI tables work for all versions of all
OSes.  Any given device-tree is only likely to work for a single version
of a single OS.  While a useful abstraction for portions of kernel code,
device-trees are utter garbage compared to ACPI tables.




Well, now we are at Debian stable with 5.10.x for linux and 4.14.x for xen,
so we are kind of stuck with these versions on Debian stable now. I am all
for tweaking the Debian stable packages to support raspberry and amd64. The
question is, what is the quickest and least disturbing way to fix it now?

All the best,

Chuck



Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-19 Thread Chuck Zmudzinski

On 9/19/2021 1:05 AM, Chuck Zmudzinski wrote:


Hello Elliott and Salvatore,

I noticed this bug on bullseye ever since I have been
running bullseye as a dom0, but my testing indicates
there is no problem with src:linux but the problem
appeared in src:xen with the 4.14 version of xen on
bullseye.

I ask Elliott if you are only seeing the problem on Debian's
xen-4.14 hypervisor? Also, which architecture, arm or
amd64? I only see the problem on the Debian xen-4.14
hypervisor, and I have only tested on amd64, and I
have found a fix for my amd64 system which is as
follows:

Motherboard: ASRock B85M Pro4, BIOS P2.50 12/11/2015,
with a Haswell CPU (core i5-4590S)

xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64

linux kernel version: 5.10.46-4 (the current amd64 kernel
for bullseye)

Boot system: EFI, not using secure boot, booting xen
hypervisor and dom0 bullseye with grub-efi package for
bullseye, and it boots the xen-4.14-amd64.gz file, not
the xen-4.14-amd64.efi file.

I also tested a buster dom0 with the 4.19 series kernel
on the xen-4.14 hypervisor from bullseye and saw the
problem, but I did not see the problem with either
a buster (linux 4.19) or bullseye (linux 5.10) dom0 on
the xen-4.11 hypervisor, so I think the problem is
with the Debian version of the xen-4.14 hypervisor,
not with src:linux.

I also found a fix in src:xen:

I noticed the series of patches in debian/patches of the
4.14.2+25-gb6a8c4f72d-2 version of src:xen (and
earlier versions of xen-4.14 on Debian) have several patches
backported from the unstable branch of xen upstream. By
removing some of these patches from the patches
series of the src:xen package, the dom0 shuts down
as expected on my ASRock Haswell motherboard.

I rebuilt the src:xen package after removing the following
patches from the debian/patches series and the result
was that the computer shuts down as expected if I boot
using the patched hypervisor:

0027-xen-rpi4-implement-watchdog-based-reset.patch
0028-tools-python-Pass-linker-to-Python-build-process.patch
0029-xen-arm-acpi-Don-t-fail-if-SPCR-table-is-absent.patch
0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
0031-xen-arm-acpi-The-fixmap-area-should-always-be-cleare.patch
0032-xen-arm-Check-if-the-platform-is-not-using-ACPI-befo.patch
0033-xen-arm-Introduce-fw_unreserved_regions-and-use-it.patch
0034-xen-arm-acpi-add-BAD_MADT_GICC_ENTRY-macro.patch
0035-xen-arm-traps-Don-t-panic-when-receiving-an-unknown-.patch
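
As a rough illustration of the step above, the following Python sketch filters the contents of a quilt series file so that the listed patches are no longer applied. The function name is hypothetical, the entry 0026-hypothetical-earlier.patch is invented only to show that other entries are kept, and the code assumes the usual series layout of one patch name per line, optionally followed by options or a # comment.

```python
# Patch names exactly as listed in this thread.
REMOVED = [
    "0027-xen-rpi4-implement-watchdog-based-reset.patch",
    "0028-tools-python-Pass-linker-to-Python-build-process.patch",
    "0029-xen-arm-acpi-Don-t-fail-if-SPCR-table-is-absent.patch",
    "0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch",
    "0031-xen-arm-acpi-The-fixmap-area-should-always-be-cleare.patch",
    "0032-xen-arm-Check-if-the-platform-is-not-using-ACPI-befo.patch",
    "0033-xen-arm-Introduce-fw_unreserved_regions-and-use-it.patch",
    "0034-xen-arm-acpi-add-BAD_MADT_GICC_ENTRY-macro.patch",
    "0035-xen-arm-traps-Don-t-panic-when-receiving-an-unknown-.patch",
]

def drop_patches(series_text, unwanted):
    """Return the series text with the unwanted patch entries removed."""
    unwanted = set(unwanted)
    kept = []
    for line in series_text.splitlines():
        # The patch name is the first field; ignore anything after a '#'.
        fields = line.split("#", 1)[0].split()
        if fields and fields[0] in unwanted:
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"

# Example input: one invented earlier entry followed by the nine patches.
series = "0026-hypothetical-earlier.patch\n" + "\n".join(REMOVED) + "\n"
print(drop_patches(series, REMOVED), end="")
```

Dropping only a single entry from the middle of the series can leave later patches that no longer apply cleanly, which is the quilt failure mentioned elsewhere in this thread.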

Most of these patches seem unrelated to the amd64
architecture and instead affect the arm architecture, so
removing all of them is probably more than is needed to
fix this bug. I removed them all because I could not find
them upstream on the 4.14 branch but only saw them
on the xen unstable branch upstream (I did not check if they are
on the 4.15 branch upstream), and I wanted to test
a true upstream 4.14 version without these seemingly
aggressive patches added by Debian from the unstable
branch of xen upstream. I discovered that being
more conservative and not adding these patches from the
unstable branch upstream fixed the problem!

I suspect the following patch is the culprit for problems
shutting down on the amd64 architecture:

0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch

The commit log for this patch states:

From: Julien Grall 
Date: Sat, 26 Sep 2020 17:44:29 +0100
Subject: xen/acpi: Rework acpi_os_map_memory() and acpi_os_unmap_memory()

The functions acpi_os_{un,}map_memory() are meant to be arch-agnostic
while the __acpi_os_{un,}map_memory() are meant to be arch-specific.

Currently, the former are still containing x86 specific code.

To avoid this rather strange split, the generic helpers are reworked so
they are arch-agnostic. This requires the introduction of a new helper
__acpi_os_unmap_memory() that will undo any mapping done by
__acpi_os_map_memory().

Currently, the arch-helper for unmap is basically a no-op so it only
returns whether the mapping was arch specific. But this will change
in the future.

Note that the x86 version of acpi_os_map_memory() was already able to
able the 1MB region. Hence why there is no addition of new code.

Signed-off-by: Julien Grall 
Reviewed-by: Rahul Singh 
Reviewed-by: Jan Beulich 
Acked-by: Stefano Stabellini 
Tested-by: Rahul Singh 
Tested-by: Elliott Mitchell 
(cherry picked from commit 1c4aa69ca1e1fad20b2158051eb152276d1eb973)
---

This patch does affect amd64 acpi code, and is probably causing
the problem on my amd64 system, so my build of the xen-4.14
hypervisor without this patch fixed the problem.

I think this bug should be re-classified as a bug in src:xen.

I also would inquire with the Debian Xen Team about why they
are backporting patches from the upstream xen unstable
branch into Debian's 4.14 package that is currently shipping
on Debian stable (bullseye). IMHO, the aforementioned
patches that are not in the stable 4.14 branch 

Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-19 Thread Elliott Mitchell
On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote:
> On Sat, 11 Sep 2021 13:29:12 +0200 Salvatore Bonaccorso wrote:
>  >
>  > On Fri, Sep 10, 2021 at 06:47:12PM -0700, Elliott Mitchell wrote:
>  > > An experiment led to a potential alternative explanation for #991967.
>  > > The issue may be that ACPI (non-UEFI) powerdown/reset was broken at
>  > > 4.19.194-3. Presence of Xen on the system may be unrelated.
>  > >
>  > > Failing that, it could be that Xen and non-UEFI systems are affected. (Xen
>  > > was tried on a UEFI system and the issue wasn't observed)
>  >
>  > Following up on https://bugs.debian.org/991967#12
>  >
>  > Did you succeed in bisecting the issue as you seem to have it
>  > reproducible?
> 
> I noticed this bug on bullseye ever since I have been
> running bullseye as a dom0, but my testing indicates
> there is no problem with src:linux but the problem
> appeared in src:xen with the 4.14 version of xen on
> bullseye.
> 
> I ask Elliott if you are only seeing the problem on Debian's
> xen-4.14 hypervisor? Also, which architecture, arm or
> amd64? I only see the problem on the Debian xen-4.14
> hypervisor, and I have only tested on amd64, and I
> have found a fix for my amd64 system which is as
> follows:
> 
> Motherboard: ASRock B85M Pro4, BIOS P2.50 12/11/2015,
> with a Haswell CPU (core i5-4590S)
> 
> xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64
> 
> linux kernel version: 5.10.46-4 (the current amd64 kernel
> for bullseye)

Nope.  As per the report the problem appeared with kernel 4.19.194-3 and
at the time using Xen 4.11.

The kernel you're listing is rather more recent, which might suggest a
patch which had been backported from 5.x to 4.19.

I could believe a Xen security update being the trigger though (I don't
recall there being one at the right time, but I wouldn't rule it out).


> Boot system: EFI, not using secure boot, booting xen
> hypervisor and dom0 bullseye with grub-efi package for
> bullseye, and it boots the xen-4.14-amd64.gz file, not
> the xen-4.14-amd64.efi file.
> 
> I also tested a buster dom0 with the 4.19 series kernel
> on the xen-4.14 hypervisor from bullseye and saw the
> problem, but I did not see the problem with either
> a buster (linux 4.19) or bullseye (linux 5.10) dom0 on
> the xen-4.11 hypervisor, so I think the problem is
> with the Debian version of the xen-4.14 hypervisor,
> not with src:linux.

Just to make sure, the kernel you were testing was 4.19.194-3?  The
issue didn't manifest with kernels earlier than that.

Could be we're seeing distinct bugs.


> This patch does affect amd64 acpi code, and is probably causing
> the problem on my amd64 system, so my build of the xen-4.14
> hypervisor without this patch fixed the problem.

While that commit modifies the code path the processor takes, the
modified path appears identical.


> I also would inquire with the Debian Xen Team about why they
> are backporting patches from the upstream xen unstable
> branch into Debian's 4.14 package that is currently shipping
> on Debian stable (bullseye). IMHO, the aforementioned
> patches that are not in the stable 4.14 branch upstream
> should not be included in the xen package for Debian stable.

Some people are asking for those.  Those are bugfixes for an extremely
popular device which panics on boot without the patches.


Meanwhile, it turned out that between 5.10.0 and 5.10.30 the ARM64 device-trees
were modified in a way which broke Xen 4.14 on ARM64.  The change
violated Linux's own standards for device-trees, yet still appeared in a
stable branch.

In other news, if you see device-trees compared to ACPI tables, they're
not very comparable.  99% of ACPI tables work for all versions of all
OSes.  Any given device-tree is only likely to work for a single version
of a single OS.  While a useful abstraction for portions of kernel code,
device-trees are utter garbage compared to ACPI tables.


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445



Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-19 Thread Chuck Zmudzinski
On Sat, 11 Sep 2021 13:29:12 +0200 Salvatore Bonaccorso wrote:

> Hi Elliott,
>
> On Fri, Sep 10, 2021 at 06:47:12PM -0700, Elliott Mitchell wrote:
> > An experiment led to a potential alternative explanation for #991967.
> > The issue may be that ACPI (non-UEFI) powerdown/reset was broken at
> > 4.19.194-3. Presence of Xen on the system may be unrelated.
> >
> > Failing that, it could be that Xen and non-UEFI systems are affected. (Xen
> > was tried on a UEFI system and the issue wasn't observed)
>
> Following up on https://bugs.debian.org/991967#12
>
> Did you succeed in bisecting the issue as you seem to have it
> reproducible?
>
> Regards,
> Salvatore
>
>

Hello Elliott and Salvatore,

I noticed this bug on bullseye ever since I have been
running bullseye as a dom0, but my testing indicates
there is no problem with src:linux but the problem
appeared in src:xen with the 4.14 version of xen on
bullseye.

I ask Elliott if you are only seeing the problem on Debian's
xen-4.14 hypervisor? Also, which architecture, arm or
amd64? I only see the problem on the Debian xen-4.14
hypervisor, and I have only tested on amd64, and I
have found a fix for my amd64 system which is as
follows:

Motherboard: ASRock B85M Pro4, BIOS P2.50 12/11/2015,
with a Haswell CPU (core i5-4590S)

xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64

linux kernel version: 5.10.46-4 (the current amd64 kernel
for bullseye)

Boot system: EFI, not using secure boot, booting xen
hypervisor and dom0 bullseye with grub-efi package for
bullseye, and it boots the xen-4.14-amd64.gz file, not
the xen-4.14-amd64.efi file.

I also tested a buster dom0 with the 4.19 series kernel
on the xen-4.14 hypervisor from bullseye and saw the
problem, but I did not see the problem with either
a buster (linux 4.19) or bullseye (linux 5.10) dom0 on
the xen-4.11 hypervisor, so I think the problem is
with the Debian version of the xen-4.14 hypervisor,
not with src:linux.

I also found a fix in src:xen:

I noticed the series of patches in debian/patches of the
4.14.2+25-gb6a8c4f72d-2 version of src:xen (and
earlier versions of xen-4.14 on Debian) have several patches
backported from the unstable branch of xen upstream. By
removing some of these patches from the patches
series of the src:xen package, the dom0 shuts down
as expected on my ASRock Haswell motherboard.

I rebuilt the src:xen package after removing the following
patches from the debian/patches series and the result
was that the computer shuts down as expected if I boot
using the patched hypervisor:

0027-xen-rpi4-implement-watchdog-based-reset.patch
0028-tools-python-Pass-linker-to-Python-build-process.patch
0029-xen-arm-acpi-Don-t-fail-if-SPCR-table-is-absent.patch
0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
0031-xen-arm-acpi-The-fixmap-area-should-always-be-cleare.patch
0032-xen-arm-Check-if-the-platform-is-not-using-ACPI-befo.patch
0033-xen-arm-Introduce-fw_unreserved_regions-and-use-it.patch
0034-xen-arm-acpi-add-BAD_MADT_GICC_ENTRY-macro.patch
0035-xen-arm-traps-Don-t-panic-when-receiving-an-unknown-.patch

Most of these patches seem unrelated to the amd64
architecture and instead affect the arm architecture, so
removing all of them is probably more than is needed to
fix this bug. I removed them all because I could not find
them upstream on the 4.14 branch but only saw them
on the xen unstable branch upstream (I did not check if they are
on the 4.15 branch upstream), and I wanted to test
a true upstream 4.14 version without these seemingly
aggressive patches added by Debian from the unstable
branch of xen upstream. I discovered that being
more conservative and not adding these patches from the
unstable branch upstream fixed the problem!

I suspect the following patch is the culprit for problems
shutting down on the amd64 architecture:

0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch

The commit log for this patch states:

From: Julien Grall 
Date: Sat, 26 Sep 2020 17:44:29 +0100
Subject: xen/acpi: Rework acpi_os_map_memory() and acpi_os_unmap_memory()

The functions acpi_os_{un,}map_memory() are meant to be arch-agnostic
while the __acpi_os_{un,}map_memory() are meant to be arch-specific.

Currently, the former are still containing x86 specific code.

To avoid this rather strange split, the generic helpers are reworked so
they are arch-agnostic. This requires the introduction of a new helper
__acpi_os_unmap_memory() that will undo any mapping done by
__acpi_os_map_memory().

Currently, the arch-helper for unmap is basically a no-op so it only
returns whether the mapping was arch specific. But this will change
in the future.

Note that the x86 version of acpi_os_map_memory() was already able to
able the 1MB region. Hence why there is no addition of new code.

Signed-off-by: Julien Grall 
Reviewed-by: Rahul Singh 
Reviewed-by: Jan Beulich 
Acked-by: Stefano Stabellini 
Tested-by: Rahul Singh 
Tested-by: Elliott Mitchell 
(cherry picked from commit 

Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-12 Thread Elliott Mitchell
On Sat, Sep 11, 2021 at 01:29:12PM +0200, Salvatore Bonaccorso wrote:
> On Fri, Sep 10, 2021 at 06:47:12PM -0700, Elliott Mitchell wrote:
> > An experiment led to a potential alternative explanation for #991967.
> > The issue may be that ACPI (non-UEFI) powerdown/reset was broken at
> > 4.19.194-3.  Presence of Xen on the system may be unrelated.
> >
> > Failing that, it could be that Xen and non-UEFI systems are affected.  (Xen
> > was tried on a UEFI system and the issue wasn't observed)
>
> Following up on https://bugs.debian.org/991967#12
>
> Did you succeed in bisecting the issue as you seem to have it
> reproducible?

The problem is that bisecting takes rather a lot of kernel builds, which
also means a lot of downtime...   Right now the distribution update seems
worthy of greater attention.

The one notable bit is the one I sent in the last message.  The system
does NOT have UEFI, and a test system with UEFI seemed to have no
problem.
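
For a sense of how many builds a bisection would need, here is a minimal Python sketch of a binary search over the upstream 4.19.y point releases between the two kernels named in this thread. It assumes the regression stays present once introduced, and the pretend_is_bad function with its cutoff is a purely hypothetical stand-in for "build, boot and try to power off"; it says nothing about where the real regression lies. A real bisection would more likely use git bisect over individual commits.

```python
def bisect_first_bad(versions, is_bad):
    """Return (first_bad_version, number_of_tests).

    Assumes versions[0] is known good, versions[-1] is known bad, and that
    every version after the first bad one is also bad.
    """
    lo, hi = 0, len(versions) - 1   # lo: last known good, hi: first known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi], tests

# Upstream point releases from 4.19.181 (reported good) to 4.19.194 (bad).
versions = [f"4.19.{n}" for n in range(181, 195)]

# HYPOTHETICAL test result: the cutoff 188 is invented so the sketch runs.
def pretend_is_bad(version):
    return int(version.rsplit(".", 1)[1]) >= 188

first_bad, tests = bisect_first_bad(versions, pretend_is_bad)
print(first_bad, "found after", tests, "test builds")
```

With 14 point releases and both endpoints already known, the search needs at most four test builds, so the cost is mostly the downtime per build rather than the number of steps.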


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445



Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-11 Thread Salvatore Bonaccorso
Hi Elliott,

On Fri, Sep 10, 2021 at 06:47:12PM -0700, Elliott Mitchell wrote:
> An experiment led to a potential alternative explanation for #991967.
> The issue may be that ACPI (non-UEFI) powerdown/reset was broken at
> 4.19.194-3.  Presence of Xen on the system may be unrelated.
>
> Failing that, it could be that Xen and non-UEFI systems are affected.  (Xen
> was tried on a UEFI system and the issue wasn't observed)

Following up on https://bugs.debian.org/991967#12

Did you succeed in bisecting the issue as you seem to have it
reproducible?

Regards,
Salvatore



Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-10 Thread Elliott Mitchell
An experiment led to a potential alternative explanation for #991967.
The issue may be that ACPI (non-UEFI) powerdown/reset was broken at
4.19.194-3.  Presence of Xen on the system may be unrelated.

Failing that, it could be that Xen and non-UEFI systems are affected.  (Xen
was tried on a UEFI system and the issue wasn't observed)


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445