Hi Jan, On Mon, Aug 18, 2025 at 1:15 PM Jan Beulich <jbeul...@suse.com> wrote: > > On 18.08.2025 10:49, Mykola Kvach wrote: > > --- a/xen/arch/arm/domain.c > > +++ b/xen/arch/arm/domain.c > > @@ -90,6 +90,16 @@ static void ctxt_switch_from(struct vcpu *p) > > if ( is_idle_vcpu(p) ) > > return; > > > > + /* Arch timer */ > > + p->arch.cntkctl = READ_SYSREG(CNTKCTL_EL1); > > + virt_timer_save(p); > > + > > + /* VGIC */ > > + gic_save_state(p); > > + > > + if ( test_bit(_VPF_suspended, &p->pause_flags) ) > > + return; > > As I had to look at the Arm side uses of the new bit to at least try to > follow the comment further down, I came across this. And now I wonder: > Why would one of the pause flags need special casing here?
Some kind of answer was given in a previous version of this patch series, see: https://patchew.org/Xen/cover.1751020456.git.mykola._5fkv...@epam.com/072270e0940b6bcc2743d56a336363f4719ba60a.1751020456.git.mykola._5fkv...@epam.com/#066c6e93-a478-4c8f-b161-d109bd0e6...@xen.org To clarify: We need to avoid updating the vCPU context when switching from a vCPU of a domain that is being suspended to a vCPU of another domain. After the current HVC trap, which handles the PSCI SYSTEM_SUSPEND call, completes, the scheduler will switch to a vCPU from another domain. At this point, the virtual CPU is marked as paused either via pause_count (domain_shutdown -> vcpu_pause_nosync) or by having the _VPF_suspended bit set in pause_flags. In both cases, vcpu_runnable() will return false, so control will not be returned to the guest OS from this trap, and the scheduler will switch to another domain's vCPU. During this context switch, we must not update the saved context of certain vCPU registers for the domain that has entered suspend. The vCPU context for suspend is set up in do_setup_vcpu_ctx(). If another function were to overwrite the saved values with the current ones in a physical CPU at this point, the domain would not be able to resume correctly after a resume command. As an alternative, all suspend-related actions could be performed in a tasklet, which would avoid the need for modifications in domain_pause and the introduction of the new flag. The tasklet would execute after the context switch, but this approach would increase code complexity and introduce synchronization issues, such as passing the suspend command context to the tasklet, adding extra locks, or creating a dedicated tasklet for each domain. Locking would also need to be reworked to handle cases where another domain might attempt to change the vCPU context concurrently and just domain_pause won't be enough. Since the trap is a synchronous call, the current approach greatly simplifies synchronization compared to the tasklet-based alternative. > > > @@ -1360,13 +1357,33 @@ void domain_resume(struct domain *d) > > > > for_each_vcpu ( d, v ) > > { > > + /* > > + * No need to conditionally clear _VPF_suspended here: > > + * - This bit is only set on Arm64, and only after a successful > > suspend. Note to self: s/Arm64/Arm/g > > + * - domain_resume_nopause() may also be called from paths other > > than > > + * the suspend/resume flow, such as "soft-reset" actions (e.g., > > + * on_poweroff), as part of the Xenstore control/shutdown > > protocol. > > + * These require guest acknowledgement to complete the operation. > > + * So clearing the bit unconditionally is safe. > > + */ > > + clear_bit(_VPF_suspended, &v->pause_flags); > > Seeing that you set this bit for a single vCPU in a domain only, I wonder why > it needs to be a per-vCPU flag. That's a good question. In earlier versions of this patch series, both I and some other contributors used existed fields from the domain structure, such as shutdown_code and is_shutting_down, for this purpose. However, I recall that in a previous review, this approach was not well received. See: https://lore.kernel.org/all/d24be446-af5a-7881-2db4-b25afac3e...@citrix.com/ Technically, there is nothing preventing me from storing this information in the domain structure. However, I do not see much benefit in introducing a new field to the domain struct when there is already an existing per-vCPU flags field that tracks powerdown and pause states. Using one more bit in the pause_flags field seems sufficient and avoids further bloating the domain structure. > > Apart from this, with the comment I still fear I wouldn't feel capable of > ack-ing this, as there's too much Arm-only interaction in here. It's not even > clear whether this could easily be re-used by another port. Thank you for your feedback. I understand your concern regarding the Arm-specific nature of this code and the potential challenges for reusing it on other architectures. The current implementation is focused on supporting PSCI SYSTEM_SUSPEND, which is an Arm-specific interface, so much of the logic is naturally tied to Arm. That said, I have tried to keep the changes as contained as possible, and where feasible, I have avoided making unnecessary modifications to common code. If there is interest or a use case for supporting similar suspend/resume flows on other architectures, I am open to suggestions on how to further abstract or generalize the implementation. If you have specific recommendations for making this code more portable or easier to adapt for other ports, I would be happy to discuss and consider them. > > Jan Best regards, Mykola