On Mon, Mar 17, 2025 at 04:56:15PM +0100, Roger Pau Monné wrote: > On Sat, Mar 15, 2025 at 12:02:50AM +0000, Andrew Cooper wrote: > > On 14/03/2025 11:53 pm, Marek Marczykowski-Górecki wrote: > > > On Fri, Mar 14, 2025 at 11:23:28PM +0100, Marek Marczykowski-Górecki > > > wrote: > > >> On Fri, Mar 14, 2025 at 02:19:19PM -0700, Stefano Stabellini wrote: > > >>> On Fri, 14 Mar 2025, Marek Marczykowski-Górecki wrote: > > >>>> This is AMD Zen2 (Ryzen 5 4500U specifically), in a HP Probook 445 G7. > > >>>> > > >>>> This one has working S3, so add a test for it here. > > >>>> > > >>>> Signed-off-by: Marek Marczykowski-Górecki > > >>>> <marma...@invisiblethingslab.com> > > >>>> --- > > >>>> Cc: Jan Beulich <jbeul...@suse.com> > > >>>> Cc: Andrew Cooper <andrew.coop...@citrix.com> > > >>>> > > >>>> The suspend test added here currently fails on staging[1], but passes > > >>>> on > > >>>> staging-4.19[2]. So the regression wants fixing before committing this > > >>>> patch. > > >>>> > > >>>> [1] > > >>>> https://gitlab.com/xen-project/people/marmarek/xen/-/jobs/9408437140 > > >>>> [2] > > >>>> https://gitlab.com/xen-project/people/marmarek/xen/-/jobs/9408943441 > > >>> We could commit the patch now without the s3 test. > > >>> > > >>> I don't know what the x86 maintainers think about fixing the suspend > > >>> bug, but one idea would be to run a bisection between 4.20 and 4.19. > > >> I'm on it already, but it's annoying. Lets convert this thread to > > >> discussion about the issue: > > >> > > >> So, I bisected it between staging-4.19 and master. The breakage is > > >> somewhere between (inclusive): > > >> eb21ce14d709 x86/boot: Rewrite EFI/MBI2 code partly in C > > >> and > > >> 47990ecef286 x86/boot: Improve MBI2 structure check > > >> > > >> But, the first one breaks booting on this system and it remains broken > > >> until the second commit (or its parent) - at which point S3 is already > > >> broken. So, there is a range of 71 commits that may be responsible... > > >> > > >> But then, based on a matrix chat and Jan's observation I've tried > > >> reverting f75780d26b2f "xen: move per-cpu area management into common > > >> code" just on top of 47990ecef286, and that fixed suspend. > > >> Applying "xen/percpu: don't initialize percpu on resume" on top of > > >> 47990ecef286 fixes suspend too. > > >> But applying it on top of master > > >> (91772f8420dfa2fcfe4db68480c216db5b79c512 specifically) does not fix it, > > >> but the failure mode is different than without the patch - system resets > > >> on S3 resume, with no crash message on the serial console (even with > > >> sync_console), instead of hanging. > > >> And one more data point: reverting f75780d26b2f on top of master is the > > >> same as applying "xen/percpu: don't initialize percpu on resume" on > > >> master - system reset on S3 resume. > > >> So, it looks like there are more issues... > > > Another bisection round and I have the second culprit: > > > > > > 8e60d47cf011 x86/iommu: avoid MSI address and data writes if IRT > > > index hasn't changed > > > > > > With master+"xen/percpu: don't initialize percpu on resume"+revert of > > > 8e60d47cf011 suspend works again on this AMD system. > > > > That's not surprising in the slightest. > > > > Caching hardware values in Xen isn't safe across S3, which QubesOS has > > found time and time again, and for which we still have outstanding bugs. > > > > S3 turns most of the system off. RAM gets preserved, but devices and > > plenty of internal registers don't. > > I think I've spotted the issue. enable_iommu() called on resume > (ab)uses set_msi_affinity() to force an MSI register write, as it's > previous behavior was to unconditionally propagate the values. With > my change it would no longer perform such writes on resume. > > I think the patch below should help.
It doesn't, I still got reboot on resume, with no crash message on
serial (even with sync_console).
And the other patch (setting force=true in all calls) gets me panic
on boot:
(XEN) [ 11.890757] Assertion '(msg->data & (entry[-nr].msi.nvec - 1)) == nr'
failed at arch/x86/msi.c:210
(XEN) [ 11.900440] ----[ Xen-4.21-unstable x86_64 debug=y Tainted: C
]----
(XEN) [ 11.908082] CPU: 3
(XEN) [ 11.910923] RIP: e008:[<ffff82d040304bdc>]
msi.c#write_msi_msg+0xf5/0x16c
(XEN) [ 11.918652] RFLAGS: 0000000000010002 CONTEXT: hypervisor (d0v3)
(XEN) [ 11.925404] rax: 0000000000000001 rbx: ffff830401ded840 rcx:
ffff8303f5367d38
(XEN) [ 11.933576] rdx: 0000000000000000 rsi: ffff8303f536e1d8 rdi:
8000000000000000
(XEN) [ 11.941750] rbp: ffff8303f5367d90 rsp: ffff8303f5367d68 r8:
0000000000000000
(XEN) [ 11.949924] r9: 0000000000000000 r10: ffff8303f5367d58 r11:
0000000000000000
(XEN) [ 11.958097] r12: ffff8303f5367da0 r13: 0000000000000000 r14:
0000000000000001
(XEN) [ 11.966269] r15: 0000000000000000 cr0: 0000000080050033 cr4:
00000000003506e0
(XEN) [ 11.974442] cr3: 00000003f2a0c000 cr2: ffff888100109890
(XEN) [ 11.980483] fsb: 0000000000000000 gsb: ffff8881352c0000 gss:
0000000000000000
(XEN) [ 11.988656] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs:
e008
(XEN) [ 11.996301] Xen code around <ffff82d040304bdc>
(msi.c#write_msi_msg+0xf5/0x16c):
(XEN) [ 12.004382] 41 bd 00 00 00 00 eb c2 <0f> 0b 4c 8b 73 28 44 0f b6 7b
01 41 8b 14 24 41
(XEN) [ 12.012998] Xen stack trace from rsp=ffff8303f5367d68:
(XEN) [ 12.018772] ffff8303f5582000 0000000000000000 ffff830401ded840
ffff8303f555e000
(XEN) [ 12.027036] 0000000000000000 ffff8303f5367dc8 ffff82d040304cd9
00000000fee00000
(XEN) [ 12.035299] 0000000000004061 ffff8303f5582000 ffff830401ded380
ffff82d0403c6340
(XEN) [ 12.043560] ffff830401ded9a0 ffff82d04030b00a 000000010000002a
ffff82d0403a5aaf
(XEN) [ 12.051821] ffff8303f5367ef8 ffffc90040087afc ffff8303f556f000
ffff8303f5569a80
(XEN) [ 12.060083] 000000000000002a 00000000000004d8 ffff82d04020ca6f
0000002aaaaaaaaa
(XEN) [ 12.068344] ffff830401ded380 ffff8303f555e000 00000000000004d8
aaaaaaaa00000000
(XEN) [ 12.076606] aaaaaaaaaaaaaaaa ffff8303f5367ef8 0000000000000020
ffff8303f554f000
(XEN) [ 12.084869] ffffc90040087afc 0000000000000000 0000000000000002
ffff82d0402eb7fc
(XEN) [ 12.093130] ffff88810ff6a228 ffff888100448c90 ffffffffffffffff
0000000000000000
(XEN) [ 12.101393] 0000000000000000 ffff8303f5367ef8 0000000000000000
ffff8303f554f000
(XEN) [ 12.109654] 0000000000000000 0000000000000000 0000000000000000
ffff8303f5367fff
(XEN) [ 12.117916] 0000000000000000 ffff82d0402012cd 0000000000000000
ffff88810ff6a2a4
(XEN) [ 12.126179] ffff88810ff6a360 0000000000000000 ffff88810236fde0
0000000000000060
(XEN) [ 12.134442] 0000000000000246 0000000000000000 ffffffff82b3cce0
ffff888100448c90
(XEN) [ 12.142703] 0000000000000020 ffffffff81df540a ffff88810ff6a228
ffffc90040087afc
(XEN) [ 12.150966] 0000000000000002 0000010000000000 ffffffff81df540a
000000000000e033
(XEN) [ 12.159227] 0000000000000246 ffffc90040087ae0 000000000000e02b
c2c2c2c2c2c2c2c2
(XEN) [ 12.167489] c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2 c2c2c2c2c2c2c2c2
0000e01000000003
(XEN) [ 12.175752] ffff8303f554f000 00000033b4d9f000 00000000003506e0
0000000000000000
(XEN) [ 12.184014] Xen call trace:
(XEN) [ 12.187388] [<ffff82d040304bdc>] R msi.c#write_msi_msg+0xf5/0x16c
(XEN) [ 12.194407] [<ffff82d040304cd9>] S set_msi_affinity+0x86/0x93
(XEN) [ 12.201071] [<ffff82d04030b00a>] S pirq_guest_bind+0x2c0/0x484
(XEN) [ 12.207823] [<ffff82d04020ca6f>] S do_event_channel_op+0xad1/0x11b0
(XEN) [ 12.215019] [<ffff82d0402eb7fc>] S pv_hypercall+0x55f/0x5dd
(XEN) [ 12.221504] [<ffff82d0402012cd>] S lstar_enter+0x13d/0x150
(XEN) [ 12.227899]
(XEN) [ 12.229943]
(XEN) [ 12.231987] ****************************************
(XEN) [ 12.237584] Panic on CPU 3:
(XEN) [ 12.240960] Assertion '(msg->data & (entry[-nr].msi.nvec - 1)) == nr'
failed at arch/x86/msi.c:210
(XEN) [ 12.250644] ****************************************
I wonder if similar crash happens on resume with this patch.
BTW, the -nr in the above assert looks suspicious, is it trying to fine
the first entry?
> I wonder if we should unconditionally propagate the write from
> __setup_msi_irq(), as it's also unlikely to make any difference to
> skip that write, and would further keep the previous behavior.
>
> Thanks, Roger.
> ---
> commit 1d9bfd0d45f6b547b19f0d2f752fc3bd10103971
> Author: Roger Pau Monne <roger....@citrix.com>
> Date: Mon Mar 17 15:40:11 2025 +0100
>
> x86/msi: always propagate MSI writes when not in active system mode
>
> Relax the limitation on MSI register writes, and only apply it when the
> system is in active state. For example AMD IOMMU drivers rely on using
> set_msi_affinity() to force an MSI register write on resume from
> suspension.
>
> The original patch intention was to reduce the number of MSI register
> writes when the system is in active state. Leave the other states to
> always perform the writes, as it's safer given the existing code, and it's
> expected to not make a difference performance wise.
>
> Reported-by: Marek Marczykowski-Górecki <marma...@invisiblethingslab.com>
> Fixes: ('8e60d47cf011 x86/iommu: avoid MSI address and data writes if IRT
> index hasn't changed')
> Signed-off-by: Roger Pau Monné <roger....@citrix.com>
>
> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
> index 163ccf874720..8bb3bb18af61 100644
> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -189,6 +189,15 @@ static int write_msi_msg(struct msi_desc *entry, struct
> msi_msg *msg,
> {
> entry->msg = *msg;
>
> + if ( unlikely(system_state != SYS_STATE_active) )
> + /*
> + * Always propagate writes when not in the 'active' state. The
> + * optimization to avoid the MSI address and data registers write is
> + * only relevant for runtime state, and drivers on resume (at least)
> + * rely on set_msi_affinity() to update the hardware state.
> + */
> + force = true;
> +
> if ( iommu_intremap != iommu_intremap_off )
> {
> int rc;
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
signature.asc
Description: PGP signature
