On 07/08/2025 6:29 pm, Marek Marczykowski-Górecki wrote:
> On Wed, Aug 06, 2025 at 12:46:36PM +0200, Marek Marczykowski-Górecki wrote:
>> On Wed, Aug 06, 2025 at 12:36:56PM +0200, Jan Beulich wrote:
>>> On 06.08.2025 12:23, Marek Marczykowski-Górecki wrote:
>>>> We've got several reports that S3 reliability recently regressed. We
>>>> identified it's definitely related to XSA-471 patches, and bisection
>>>> points at "x86/idle: Remove broken MWAIT implementation". I don't have
>>>> reliable reproduction steps, so I'm not 100% sure if it's really this
>>>> patch, or maybe an earlier one - but it's definitely already broken at
>>>> this point in the series. Most reports are about Xen 4.17 (as that's
>>>> what the stable Qubes OS version currently uses), but I think I've seen
>>>> somebody reporting the issue on 4.19 too (but I don't have clear
>>>> evidence, especially if it's the same issue).
>>> At the time we discussed the explicit raising of TIMER_SOFTIRQ
>>> in mwait_idle_with_hints() a lot. If it was now truly missing, that imo
>>> shouldn't cause problems only after resume, but then it may have covered
>>> for some omission during resume. As a far-fetched experiment, could you
>>> try putting that back (including the calculation of the "expires" local
>>> variable)?
>> Sure, I'll try.
> It appears this fixes the issue, at least in ~10 attempts so far
> (usually I could reproduce the issue after 2-3 attempts).
>

Can you show the exact code which seems to have made this stable?

We discussed this in the x86 maintainers meeting, and our best guess is
a timer that's not torn down or recreated properly on S3, but this is
largely speculation.

~Andrew
