Hi Jan,

On Mon, Aug 18, 2025 at 1:15 PM Jan Beulich <jbeul...@suse.com> wrote:
>
> On 18.08.2025 10:49, Mykola Kvach wrote:
> > --- a/xen/arch/arm/domain.c
> > +++ b/xen/arch/arm/domain.c
> > @@ -90,6 +90,16 @@ static void ctxt_switch_from(struct vcpu *p)
> >      if ( is_idle_vcpu(p) )
> >          return;
> >
> > +    /* Arch timer */
> > +    p->arch.cntkctl = READ_SYSREG(CNTKCTL_EL1);
> > +    virt_timer_save(p);
> > +
> > +    /* VGIC */
> > +    gic_save_state(p);
> > +
> > +    if ( test_bit(_VPF_suspended, &p->pause_flags) )
> > +        return;
>
> As I had to look at the Arm side uses of the new bit to at least try to
> follow the comment further down, I came across this. And now I wonder:
> Why would one of the pause flags need special casing here?

Some kind of answer was given in a previous version of this patch series,
see:
https://patchew.org/Xen/cover.1751020456.git.mykola._5fkv...@epam.com/072270e0940b6bcc2743d56a336363f4719ba60a.1751020456.git.mykola._5fkv...@epam.com/#066c6e93-a478-4c8f-b161-d109bd0e6...@xen.org

To clarify:

We need to avoid updating the vCPU context when switching from a vCPU of a
domain that is being suspended to a vCPU of another domain. After the current
HVC trap, which handles the PSCI SYSTEM_SUSPEND call, completes, the scheduler
will switch to a vCPU from another domain. At this point, the virtual CPU is
marked as paused either via pause_count (domain_shutdown -> vcpu_pause_nosync)
or by having the _VPF_suspended bit set in pause_flags. In both cases,
vcpu_runnable() will return false, so control will not be returned to the guest
OS from this trap, and the scheduler will switch to another domain's vCPU.

During this context switch, we must not update the saved context of certain
vCPU registers for the domain that has entered suspend. The vCPU context for
suspend is set up in do_setup_vcpu_ctx(). If another function were to overwrite
the saved values with the current ones in a physical CPU at this point, the
domain would not be able to resume correctly after a resume command.


As an alternative, all suspend-related actions could be performed in a tasklet,
which would avoid the need for modifications in domain_pause and the
introduction of the new flag. The tasklet would execute after the context
switch, but this approach would increase code complexity and introduce
synchronization issues, such as passing the suspend command context to the
tasklet, adding extra locks, or creating a dedicated tasklet for each domain.
Locking would also need to be reworked to handle cases where another domain
might attempt to change the vCPU context concurrently and just domain_pause
won't be enough.

Since the trap is a synchronous call, the current approach greatly simplifies
synchronization compared to the tasklet-based alternative.

>
> > @@ -1360,13 +1357,33 @@ void domain_resume(struct domain *d)
> >
> >      for_each_vcpu ( d, v )
> >      {
> > +        /*
> > +         * No need to conditionally clear _VPF_suspended here:
> > +         * - This bit is only set on Arm64, and only after a successful 
> > suspend.

Note to self: s/Arm64/Arm/g

> > +         * - domain_resume_nopause() may also be called from paths other 
> > than
> > +         *   the suspend/resume flow, such as "soft-reset" actions (e.g.,
> > +         *   on_poweroff), as part of the Xenstore control/shutdown 
> > protocol.
> > +         *   These require guest acknowledgement to complete the operation.
> > +         * So clearing the bit unconditionally is safe.
> > +         */
> > +        clear_bit(_VPF_suspended, &v->pause_flags);
>
> Seeing that you set this bit for a single vCPU in a domain only, I wonder why
> it needs to be a per-vCPU flag.

That's a good question. In earlier versions of this patch series, both I and
some other contributors used existed fields from the domain structure, such as
shutdown_code and is_shutting_down, for this purpose. However, I recall that
in a previous review, this approach was not well received. See:
https://lore.kernel.org/all/d24be446-af5a-7881-2db4-b25afac3e...@citrix.com/

Technically, there is nothing preventing me from storing this information in
the domain structure. However, I do not see much benefit in introducing a new
field to the domain struct when there is already an existing per-vCPU flags
field that tracks powerdown and pause states. Using one more bit in the
pause_flags field seems sufficient and avoids further bloating the domain
structure.

>
> Apart from this, with the comment I still fear I wouldn't feel capable of
> ack-ing this, as there's too much Arm-only interaction in here. It's not even
> clear whether this could easily be re-used by another port.

Thank you for your feedback.

I understand your concern regarding the Arm-specific nature of this code and
the potential challenges for reusing it on other architectures. The current
implementation is focused on supporting PSCI SYSTEM_SUSPEND, which is an
Arm-specific interface, so much of the logic is naturally tied to Arm.

That said, I have tried to keep the changes as contained as possible, and
where feasible, I have avoided making unnecessary modifications to common code.
If there is interest or a use case for supporting similar suspend/resume flows
on other architectures, I am open to suggestions on how to further abstract or
generalize the implementation.

If you have specific recommendations for making this code more portable or
easier to adapt for other ports, I would be happy to discuss and consider
them.

>
> Jan

Best regards,
Mykola

Reply via email to