Hi Michael S. Tsirkin and Parav Pandit,

Thank you for your detailed explanations. I will try to use the PM capability to fix this issue.
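Concretely, that would mean the "second option" Parav describes at the end of this thread: adding the PM capability in QEMU outside of the pci_is_express() check, so that conventional-PCI virtio devices get it too. A rough sketch of what I have in mind, following the existing express-only code in hw/virtio/virtio-pci.c (the helper name virtio_pci_add_pm_cap() is made up here, and pci_add_capability()'s exact signature varies across QEMU versions, some take an extra Error **errp argument):

/* Add the PCI PM capability for the device, regardless of whether it
 * was realized as a conventional PCI or a PCIe function. */
static void virtio_pci_add_pm_cap(VirtIOPCIProxy *proxy)
{
    PCIDevice *pci_dev = &proxy->pci_dev;
    int pos;

    pos = pci_add_capability(pci_dev, PCI_CAP_ID_PM, 0, PCI_PM_SIZEOF);
    assert(pos > 0);
    pci_dev->exp.pm_cap = pos;

    /* Declare compliance with revision 1.2 of the PCI PM spec. */
    pci_set_word(pci_dev->config + pos + PCI_PM_PMC, 0x3);

    if (proxy->flags & VIRTIO_PCI_FLAG_INIT_PM) {
        /* Let the guest write the PowerState field (PMCSR bits 1:0). */
        pci_set_word(pci_dev->wmask + pos + PCI_PM_CTRL,
                     PCI_PM_CTRL_STATE_MASK);
    }
}

virtio_pci_realize() could then call this on both the express and the conventional PCI paths.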
On 2023/10/26 18:30, Michael S. Tsirkin wrote:
> On Thu, Oct 26, 2023 at 10:24:26AM +0000, Chen, Jiqian wrote:
>>
>> On 2023/10/25 11:51, Parav Pandit wrote:
>>>
>>>
>>>> From: Chen, Jiqian <jiqian.c...@amd.com>
>>>> Sent: Tuesday, October 24, 2023 5:44 PM
>>>>
>>>> On 2023/10/24 18:51, Parav Pandit wrote:
>>>>>
>>>>>> From: Chen, Jiqian <jiqian.c...@amd.com>
>>>>>> Sent: Tuesday, October 24, 2023 4:06 PM
>>>>>>
>>>>>> On 2023/10/23 21:35, Parav Pandit wrote:
>>>>>>>
>>>>>>>> From: Chen, Jiqian <jiqian.c...@amd.com>
>>>>>>>> Sent: Monday, October 23, 2023 4:09 PM
>>>>>>>
>>>>>>>>> I think this can be done without introducing the new register.
>>>>>>>>> Can you please check if the PM register itself can serve the
>>>>>>>>> purpose instead of a new virtio-level register?
>>>>>>>> Do you mean the system PM register?
>>>>>>> No, the device's PM register at the transport level.
>>>>>> I tried to find this register (at the PCI level, the virtio-pci level,
>>>>>> and the virtio driver level), but I didn't find it in the Linux kernel
>>>>>> or the QEMU code. May I know which register you are referring to
>>>>>> specifically? Or which PM state bit were you referring to below?
>>>>>>
>>>>> The PCI spec's PCI Power Management Capability Structure, in section 7.5.2.
>>>> Yes, what you point to is the PM capability for PCIe devices.
>>>> But the problem remains that the QEMU code checks the condition
>>>> (pci_bus_is_express or pci_is_express) for all virtio-pci devices in
>>>> virtio_pci_realize(); if a virtio device is not a PCIe device, QEMU will
>>>> not add the PM capability for it.
>>> The PCI PM capability is mandatory for PCIe devices, so the QEMU code may
>>> have put it only under the is_pcie check.
>>> But it can be done outside of that check as well, because this capability
>>> has existed on conventional PCI for a long time and is backward compatible.
>> Do you suggest that I implement the PM capability for virtio devices in
>> QEMU first, and then try whether the PM capability can work for this
>> scenario?
>
> virtio devices in qemu already have a PM capability.
>
>
>> If so, we will complicate a simple problem. There is currently no other
>> need to add a PM capability for virtio devices; if we add it just for
>> preserving resources, that seems unnecessary and unreasonable. And we are
>> not sure whether there are other scenarios, outside of a PM state change,
>> that also need to preserve resources; if there are, the PM register can't
>> cover them, but a preserve_resources register can.
>
> One of the selling points of virtio is precisely reusing existing
> platform capabilities as opposed to coming up with our own.
> See abstract.tex
>
>
>> Can I add a note like "If the PM capability is implemented for virtio
>> devices, it may cover this scenario, and if there are no other scenarios
>> that PM can't cover, then we can remove this register" to the commit
>> message or the spec description, and continue to add the
>> preserve_resources register?
>
> We can't remove registers.
>
>>>
>>>> And there is another problem: what about MMIO transport devices?
>>>> Preserving resources is needed for devices of all transport types.
>>>>
>>> MMIO lacks such rich PM definitions. If MMIO wants to support this in the
>>> future, it will be extended to match the other transports, like PCI.
>>>
>>>>>>>
>>>>>>>> I think it is unreasonable to make a virtio device listen to the PM
>>>>>>>> state of the guest system.
>>>>>>> The guest driver performs any such work in the virtio driver, on the
>>>>>>> guest system's PM callback events.
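The PM callbacks the following exchange converges on look roughly like this on the Linux side; a minimal sketch with placeholder my_gpu_* names, not the actual virtio-gpu code:

#include <linux/virtio.h>

/* freeze/restore callbacks of a virtio device driver; the virtio PCI
 * core invokes these from its dev_pm_ops (virtio_pci_pm_ops). */
static int my_gpu_freeze(struct virtio_device *vdev)
{
    /* Quiesce the device; this is where resources would have to be
     * preserved (or a preserve command issued) before S3/S4. */
    virtio_reset_device(vdev);
    return 0;
}

static int my_gpu_restore(struct virtio_device *vdev)
{
    /* Re-create the virtqueues; any preserved resources would be
     * restored here before upper layers resume. */
    return 0;
}

static struct virtio_driver my_gpu_driver = {
    .driver.name = "my-virtio-gpu",
    /* ... id_table, probe, remove ... */
#ifdef CONFIG_PM_SLEEP
    .freeze = my_gpu_freeze,
    .restore = my_gpu_restore,
#endif
};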
>>>>>> I didn't find any PM state callback in the virtio driver.
>>>>>>
>>>>> There are virtio_suspend and virtio_resume, in the case of Linux.
>>>> I think the virtio_suspend/resume you mention is the freeze/restore
>>>> callback from "struct virtio_driver", or the suspend/resume callback
>>>> from "static const struct dev_pm_ops virtio_pci_pm_ops".
>>>> And yes, I agree: if virtio devices have the PM capability, maybe we
>>>> can set the PM state in those callback functions.
>>>>
>>>>>
>>>>>>>
>>>>>>>> It is more suitable that each device gets a notification from its
>>>>>>>> driver and then performs the preserve-resources operation.
>>>>>>> I agree that each device gets the notification from the driver.
>>>>>>> The question is: should it be the virtio driver, or is the existing
>>>>>>> PCI driver, which transitions the state from D0->D3 and D3->D0,
>>>>>>> enough?
>>>>>> It seems there is no existing PCI driver that transitions the D0 or
>>>>>> D3 state. Could you please tell me which one it is specifically? I am
>>>>>> very willing to give it a try.
>>>>>>
>>>>> The modern virtio-pci driver of Linux should be able to.
>>>> Yes, I know, it is VIRTIO_PCI_FLAG_INIT_PM_BIT. But the two problems I
>>>> described above still remain.
>>>>
>>> Both can be resolved without switching to PCIe.
>>>
>>>>>
>>>>>>> Can you please check that?
>>>>>>>
>>>>>>>>>> --- a/transport-pci.tex
>>>>>>>>>> +++ b/transport-pci.tex
>>>>>>>>>> @@ -325,6 +325,7 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>>>>>>>>>         /* About the administration virtqueue. */
>>>>>>>>>>         le16 admin_queue_index; /* read-only for driver */
>>>>>>>>>>         le16 admin_queue_num; /* read-only for driver */
>>>>>>>>>> +       le16 preserve_resources; /* read-write */
>>>>>>>>> Preserving these resources in the device implementation takes a
>>>>>>>>> finite amount of time.
>>>>>>>>> Possibly more than 40 nsec (the time of a PCIe write TLP).
>>>>>>>>> Hence this register must be a polling register that indicates
>>>>>>>>> preservation_done.
>>>>>>>>> This will tell the guest when the preservation is done, and when
>>>>>>>>> the restoration is done, so that it can resume the upper layers.
>>>>>>>>>
>>>>>>>>> Please refer to the queue_reset definition to learn more about
>>>>>>>>> such a register definition.
>>>>>>>> Thanks, I will refer to "queue_reset". So I need three values: the
>>>>>>>> driver writes 1 to make the device preserve resources, the driver
>>>>>>>> writes 2 to make the device restore resources, and the device
>>>>>>>> writes 0 to tell the driver that preserving or restoring is done.
>>>>>>>> Am I right?
>>>>>>>>
>>>>>>> Right.
>>>>>>>
>>>>>>> And if the existing PCIe PM state bits can do this, we can leverage
>>>>>>> that. If they cannot be used, let's add that reasoning to the commit
>>>>>>> log to describe this register.
>>>>>>>
>>>>>>>>>
>>>>>>>>> Let's please make sure that the PCIe PM-level registers are
>>>>>>>>> sufficient or not sufficient, to decide on the addition of this
>>>>>>>>> register.
>>>>>>>> But if the device is not a PCIe device, it doesn't have the PM
>>>>>>>> capability, so this will not work. Actually, in my local
>>>>>>>> environment, pci_is_express() returns false in QEMU; they are not
>>>>>>>> PCIe devices.
>>>>>>> It is reasonable to ask to plug it in as a PCIe device in 2023 to
>>>>>>> get new functionality, especially since you mentioned a GPU device,
>>>>>>> 😊 which does not have a very long history of backward
>>>>>>> compatibility.
>>>>>> Do you suggest that I add the PM capability for virtio-gpu, or change
>>>>>> virtio-gpu to a PCIe device?
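To make the handshake agreed above concrete, the driver side of the proposed register would look roughly like this. A sketch only: the command values are illustrative since the field is not in the spec yet, and the polling is modeled on how the Linux driver polls queue_reset:

#include <linux/io.h>
#include <linux/delay.h>

#define VIRTIO_PRESERVE_DONE      0  /* written by the device when finished  */
#define VIRTIO_PRESERVE_RESOURCES 1  /* written by the driver before suspend */
#define VIRTIO_RESTORE_RESOURCES  2  /* written by the driver after resume   */

/* 'reg' would map the new le16 preserve_resources field in the common
 * configuration structure. */
static void vp_preserve_resources(void __iomem *reg, u16 cmd)
{
    writew(cmd, reg);

    /* Poll until the device writes 0 back, as with queue_reset; a real
     * driver would bound this loop with a timeout. */
    while (readw(reg) != VIRTIO_PRESERVE_DONE)
        usleep_range(1000, 2000);
}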
>>>>>>
>>>>> The PCI Power Management Capability Structure does not seem to be
>>>>> limited to PCIe.
>>>> I am not sure, but in the current QEMU code I can see the
>>>> "pci_is_express" check for all virtio-pci devices. If we want to add
>>>> the PM capability for virtio-pci devices, I think we need to change
>>>> them to PCIe devices.
>>>>
>>> That is one option.
>>> The second option is to extend the PCI PM capability to non-PCIe
>>> devices, because it is supported there as well.
>>>
>>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Jiqian Chen.
>>
>> --
>> Best regards,
>> Jiqian Chen.
>

--
Best regards,
Jiqian Chen.
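P.S. For reference, the capability structure in question is indeed defined identically for conventional PCI and PCIe functions; the offsets below follow Linux's include/uapi/linux/pci_regs.h. Note the No_Soft_Reset bit in the PMCSR, which is exactly about whether device state is preserved across a D3hot->D0 transition:

/* PCI Power Management capability layout (offsets from the capability
 * base), as defined by the PCI PM spec and mirrored in pci_regs.h. */
#define PCI_CAP_ID_PM               0x01
#define PCI_PM_PMC                  2       /* PM Capabilities (read-only)   */
#define PCI_PM_CTRL                 4       /* PMCSR: control/status         */
#define  PCI_PM_CTRL_STATE_MASK     0x0003  /* PowerState: D0=0 ... D3hot=3  */
#define  PCI_PM_CTRL_NO_SOFT_RESET  0x0008  /* state kept across D3hot->D0   */
#define  PCI_PM_CTRL_PME_ENABLE     0x0100
#define  PCI_PM_CTRL_PME_STATUS     0x8000
#define PCI_PM_SIZEOF               8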