On Fri, Jan 12, 2024 at 3:41 PM Chen, Jiqian <jiqian.c...@amd.com> wrote:
>
> Hi all,
> Sorry for the late reply.
> I don't know if you still remember this problem, so let me briefly describe it.
> I am working on implementing the virtgpu S3 function on Xen.
> Currently on Xen, if we start a guest through qemu with virtgpu enabled and
> then suspend and resume the guest, the guest kernel comes back, but the
> display does not. It just shows a black screen.
> That is because during suspend the guest calls into qemu, and qemu destroys
> all display resources and resets the renderer. This makes the display gone
> after the guest resumes.
> So I added a new command to virtio-gpu: when the guest is suspending, it
> notifies qemu and sets a parameter (preserver_resource) to 1, and qemu then
> preserves its resources; when resuming, the guest notifies qemu to set the
> parameter back to 0, and qemu returns to its normal behavior. That lets the
> guest's display come back.
> When I upstreamed the above implementation, Parav and MST suggested that I
> use the PM capability to fix this problem instead of adding a new command or
> state bit.
> I have now tried the PM capability of virtio-gpu, and it can't be used to
> solve this problem.
> The reason is:
> during guest suspend, the guest writes the D3 state through the PM cap, so I
> can save the virtio-gpu resources on the qemu side (set preserver_resource
> to 1), but during resume the state of the PM cap is cleared by qemu's reset
> (qemu_system_wakeup -> qemu_devices_reset -> virtio_vga_base_reset ->
> virtio_gpu_gl_reset).
> As a result, when the guest reads the state from the PM cap, it finds that
> virtio-gpu is already in the D0 state, so the guest never writes D0 through
> the PM cap, and I can't know when to restore the resources (set
> preserver_resource to 0).
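To make the proposal in the quoted message concrete, here is a minimal sketch
of the guest-side S3 hooks it implies, assuming a hypothetical
VIRTIO_GPU_CMD_PRESERVE_RESOURCE command; virtio_gpu_cmd_preserve() and its
flag argument are illustrative only, not the actual patch:

/*
 * Sketch only: guest virtio-gpu S3 freeze/restore hooks that tell the
 * device (QEMU) to keep or release display resources across suspend.
 */
#include <linux/virtio.h>

struct virtio_gpu_device;			/* defined in the virtio-gpu driver */
void virtio_gpu_cmd_preserve(struct virtio_gpu_device *vgdev,
			     unsigned int preserve);	/* hypothetical helper */

static int virtio_gpu_freeze(struct virtio_device *vdev)
{
	struct virtio_gpu_device *vgdev = vdev->priv;

	/* Suspending: ask the device to preserve resources across S3. */
	virtio_gpu_cmd_preserve(vgdev, 1);
	return 0;
}

static int virtio_gpu_restore(struct virtio_device *vdev)
{
	struct virtio_gpu_device *vgdev = vdev->priv;

	/* Resuming: let the device go back to normal resource handling. */
	virtio_gpu_cmd_preserve(vgdev, 0);
	return 0;
}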
Looks like a bug to be fixed?

> Do you have any other suggestions?
> Or can I just fall back to the version that adds a new command
> (VIRTIO_GPU_CMD_PRESERVE_RESOURCE) to virtio-gpu? I think that way is more
> reasonable and feasible for virtio-gpu to protect display resources during
> S3. As for other devices, if necessary, they can also follow the virtio-gpu
> implementation and add new commands to prevent resource loss during S3.

Note that there's recently a fix for no_soft_reset; a well-behaved device
should not suffer from this issue anymore. If I understand this correctly,
there's no need for any extension to the spec either:

https://lore.kernel.org/lkml/CACGkMEs_MajTFxGVcK5R8oqQzTxuL4Pm=uunoondczwzqau...@mail.gmail.com/T/

Thanks

>
> On 2023/10/27 11:03, Chen, Jiqian wrote:
> > Hi Michael S. Tsirkin and Parav Pandit,
> > Thank you for your detailed explanation. I will try to use the PM cap to
> > fix this issue.
> >
> > On 2023/10/26 18:30, Michael S. Tsirkin wrote:
> >> On Thu, Oct 26, 2023 at 10:24:26AM +0000, Chen, Jiqian wrote:
> >>>
> >>> On 2023/10/25 11:51, Parav Pandit wrote:
> >>>>
> >>>>
> >>>>> From: Chen, Jiqian <jiqian.c...@amd.com>
> >>>>> Sent: Tuesday, October 24, 2023 5:44 PM
> >>>>>
> >>>>> On 2023/10/24 18:51, Parav Pandit wrote:
> >>>>>>
> >>>>>>> From: Chen, Jiqian <jiqian.c...@amd.com>
> >>>>>>> Sent: Tuesday, October 24, 2023 4:06 PM
> >>>>>>>
> >>>>>>> On 2023/10/23 21:35, Parav Pandit wrote:
> >>>>>>>>
> >>>>>>>>> From: Chen, Jiqian <jiqian.c...@amd.com>
> >>>>>>>>> Sent: Monday, October 23, 2023 4:09 PM
> >>>>>>>>
> >>>>>>>>>> I think this can be done without introducing the new register.
> >>>>>>>>>> Can you please check if the PM register itself can serve the
> >>>>>>>>>> purpose instead of a new virtio-level register?
> >>>>>>>>> Do you mean the system PM register?
> >>>>>>>> No, the device's PM register at the transport level.
> >>>>>>> I tried to find this register (at the pci level, virtio-pci level,
> >>>>>>> or virtio driver level), but I didn't find it in the Linux kernel or
> >>>>>>> Qemu code.
> >>>>>>> May I know which register you are referring to specifically? Or
> >>>>>>> which PM state bit did you mention below?
> >>>>>>>
> >>>>>> PCI spec's PCI Power Management Capability Structure in section 7.5.2.
> >>>>> Yes, what you point to is the PM capability for PCIe devices.
> >>>>> But the problem is still that the Qemu code checks the condition
> >>>>> (pci_bus_is_express or pci_is_express) for all virtio-pci devices in
> >>>>> virtio_pci_realize(); if a virtio device isn't a PCIe device, it will
> >>>>> not add the PM capability for it.
> >>>> PCI PM capability is a must for PCIe devices, so maybe the QEMU code
> >>>> has put it only under the is_pcie check.
> >>>> But it can be done outside of that check as well, because this
> >>>> capability has existed on PCI too for a long time and it is backward
> >>>> compatible.
> >>> Do you suggest that I implement the PM capability for virtio devices in
> >>> Qemu first, and then try whether the PM capability can work for this
> >>> scenario?
> >>
> >> virtio devices in qemu already have a PM capability.
> >>
> >>
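For comparison, a minimal sketch of the PM-capability route discussed in the
quoted thread, using the standard Linux PCI API; the wrapper function names
below are illustrative, and the problem reported above is precisely that
QEMU's wakeup reset clears this state before the guest ever writes D0:

/*
 * Sketch, not the actual virtio-pci code: moving the device between D0
 * and D3hot through the standard PCI PM capability (PCI spec 7.5.2).
 * pci_set_power_state() updates the PowerState field of the PM
 * Control/Status register, and the device model could key its
 * "preserve" / "restore" behaviour off these transitions.
 */
#include <linux/pci.h>

static int example_virtio_pci_suspend(struct pci_dev *pci_dev)
{
	/* D0 -> D3hot: the device side may start preserving its resources. */
	return pci_set_power_state(pci_dev, PCI_D3hot);
}

static int example_virtio_pci_resume(struct pci_dev *pci_dev)
{
	/* D3hot -> D0: the device side may restore the preserved resources. */
	return pci_set_power_state(pci_dev, PCI_D0);
}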
> >>> If so, we will complicate a simple problem. There is no other need to
> >>> add the PM capability to virtio devices for now; if we add it just for
> >>> preserving resources, it seems unnecessary and unreasonable. And we are
> >>> not sure whether there are other scenarios, outside of a PM state
> >>> change, that also need to preserve resources; if there are, the PM
> >>> register can't cover them, but a preserve_resources register can.
> >>
> >> One of the selling points of virtio is precisely reusing existing
> >> platform capabilities as opposed to coming up with our own.
> >> See abstract.tex
> >>
> >>
> >>> Can I add a note like "If the PM capability is implemented for virtio
> >>> devices, it may cover this scenario, and if there are no other scenarios
> >>> that PM can't cover, then we can remove this register" in the commit
> >>> message or spec description, and let us continue to add the
> >>> preserve_resources register?
> >>
> >> We can't remove registers.
> >>
> >>>>
> >>>>> And another problem: what about MMIO transport devices? Preserving
> >>>>> resources is needed for all transport types.
> >>>>>
> >>>> MMIO lacks such rich PM definitions. If in the future MMIO wants to
> >>>> support this, it will be extended to match other transports like PCI.
> >>>>
> >>>>>>>>
> >>>>>>>>> I think it is unreasonable to let a virtio device listen to the PM
> >>>>>>>>> state of the guest system.
> >>>>>>>> The guest driver performs any work on the guest system's PM
> >>>>>>>> callback events in the virtio driver.
> >>>>>>> I didn't find any PM state callback in the virtio driver.
> >>>>>>>
> >>>>>> There are virtio_suspend and virtio_resume in the case of Linux.
> >>>>> I think what you call virtio_suspend/resume are the freeze/restore
> >>>>> callbacks from "struct virtio_driver" or the suspend/resume callbacks
> >>>>> from "static const struct dev_pm_ops virtio_pci_pm_ops".
> >>>>> And yes, I agree, if virtio devices have the PM capability, maybe we
> >>>>> can set the PM state in those callback functions.
> >>>>>
> >>>>>>
> >>>>>>>>
> >>>>>>>>> It's more suitable that each device gets notifications from the
> >>>>>>>>> driver, and then does the resource-preserving operation.
> >>>>>>>> I agree that each device gets the notification from the driver.
> >>>>>>>> The question is, should it be the virtio driver, or is the existing
> >>>>>>>> pci driver, which transitions the state from d0->d3 and d3->d0,
> >>>>>>>> enough?
> >>>>>>> It seems there isn't an existing pci driver that transitions the d0
> >>>>>>> or d3 state. Could you please tell me which one it is specifically?
> >>>>>>> I am very willing to give it a try.
> >>>>>>>
> >>>>>> The virtio-pci modern driver of Linux should be able to.
> >>>>> Yes, I know, it is VIRTIO_PCI_FLAG_INIT_PM_BIT. But there are still
> >>>>> the two problems I described above.
> >>>>>
> >>>> Both can be resolved without switching to pcie.
> >>>>
> >>>>>>
> >>>>>>>> Can you please check that?
> >>>>>>>>
> >>>>>>>>>>> --- a/transport-pci.tex
> >>>>>>>>>>> +++ b/transport-pci.tex
> >>>>>>>>>>> @@ -325,6 +325,7 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
> >>>>>>>>>>>         /* About the administration virtqueue. */
> >>>>>>>>>>>         le16 admin_queue_index;         /* read-only for driver */
> >>>>>>>>>>>         le16 admin_queue_num;           /* read-only for driver */
> >>>>>>>>>>> +       le16 preserve_resources;        /* read-write */
> >>>>>>>>>> Preserving these resources in the device implementation takes a
> >>>>>>>>>> finite amount of time.
> >>>>>>>>>> Possibly more than 40 nsec (the time of a PCIe write TLP).
> >>>>>>>>>> Hence this register must be a polling register to indicate
> >>>>>>>>>> preservation_done.
> >>>>>>>>>> This will tell the guest when the preservation is done, and when
> >>>>>>>>>> the restoration is done, so that it can resume the upper layers.
> >>>>>>>>>>
> >>>>>>>>>> Please refer to the queue_reset definition to learn more about
> >>>>>>>>>> such a register definition.
> >>>>>>>>> Thanks, I will refer to "queue_reset". So I need three values: the
> >>>>>>>>> driver writes 1 to make the device preserve resources, the driver
> >>>>>>>>> writes 2 to make the device restore resources, and the device
> >>>>>>>>> writes 0 to tell the driver that preserving or restoring is done.
> >>>>>>>>> Am I right?
> >>>>>>>>>
> >>>>>>>> Right.
> >>>>>>>>
> >>>>>>>> And if the existing pcie pm state bits can do it, we can leverage
> >>>>>>>> that. If they cannot be used, let's add that reasoning in the
> >>>>>>>> commit log to describe this register.
> >>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Let's please make sure that the PCIe PM level registers are
> >>>>>>>>>> sufficient/not sufficient to decide the addition of this
> >>>>>>>>>> register.
> >>>>>>>>> But if the device is not a PCIe device, it doesn't have the PM
> >>>>>>>>> capability, and then this will not work. Actually, in my local
> >>>>>>>>> environment, pci_is_express() returns false in Qemu; they are not
> >>>>>>>>> PCIe devices.
> >>>>>>>> It is reasonable to ask to plug in as a PCIe device in 2023 to get
> >>>>>>>> new functionality, especially since you mentioned a gpu device. 😊
> >>>>>>>> Which does not have a very long history of backward compatibility.
> >>>>>>> Do you suggest that I add the PM capability for virtio-gpu, or
> >>>>>>> change virtio-gpu to a PCIe device?
> >>>>>>>
> >>>>>> The PCI Power Management Capability Structure does not seem to be
> >>>>>> limited to PCIe.
> >>>>> I am not sure, but in the current Qemu code I can see the
> >>>>> "pci_is_express" check for all virtio-pci devices. If we want to add
> >>>>> the PM capability for virtio-pci devices, I think we need to change
> >>>>> them to PCIe devices.
> >>>>>
> >>>> That is one option.
> >>>> The second option is to extend the PCI PM cap to non-PCIe devices,
> >>>> because the capability is supported on plain PCI as well.
> >>>>
> >>>>>>
> >>>>>
> >>>>> --
> >>>>> Best regards,
> >>>>> Jiqian Chen.
> >>>
> >>> --
> >>> Best regards,
> >>> Jiqian Chen.
> >>
> >
>
> --
> Best regards,
> Jiqian Chen.
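For reference, a minimal sketch of the polling flow agreed in the exchange
above (driver writes 1 or 2, device writes 0 when done), modeled loosely on
queue_reset; the preserve_resources register is only a proposal, and the
value names and accessors below are illustrative:

/*
 * Sketch of the proposed preserve_resources polling semantics (not part
 * of the current spec): the driver writes 1 to request preservation or
 * 2 to request restoration, then polls until the device writes 0 back.
 */
#include <linux/io.h>
#include <linux/types.h>

#define PRESERVE_DONE     0	/* written by the device when finished */
#define PRESERVE_START    1	/* driver: preserve resources before S3 */
#define PRESERVE_RESTORE  2	/* driver: restore resources after resume */

static void preserve_resources_request(void __iomem *reg, u16 op)
{
	iowrite16(op, reg);			/* start the operation */
	while (ioread16(reg) != PRESERVE_DONE)	/* wait for the device */
		cpu_relax();
}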