Hi Michael S. Tsirkin and Parav Pandit,
Thank you for your detailed explanations. I will try to use the PM
capability to fix this issue.

On 2023/10/26 18:30, Michael S. Tsirkin wrote:
> On Thu, Oct 26, 2023 at 10:24:26AM +0000, Chen, Jiqian wrote:
>>
>> On 2023/10/25 11:51, Parav Pandit wrote:
>>>
>>>
>>>> From: Chen, Jiqian <jiqian.c...@amd.com>
>>>> Sent: Tuesday, October 24, 2023 5:44 PM
>>>>
>>>> On 2023/10/24 18:51, Parav Pandit wrote:
>>>>>
>>>>>> From: Chen, Jiqian <jiqian.c...@amd.com>
>>>>>> Sent: Tuesday, October 24, 2023 4:06 PM
>>>>>>
>>>>>> On 2023/10/23 21:35, Parav Pandit wrote:
>>>>>>>
>>>>>>>> From: Chen, Jiqian <jiqian.c...@amd.com>
>>>>>>>> Sent: Monday, October 23, 2023 4:09 PM
>>>>>>>
>>>>>>>>> I think this can be done without introducing the new register.
>>>>>>>>> Can you please check whether the PM register itself can serve
>>>>>>>>> the purpose instead of a new virtio-level register?
>>>>>>>> Do you mean the system PM register?
>>>>>>> No, the device's PM register at the transport level.
>>>>>> I tried to find this register (PCI level, virtio-pci level, or virtio
>>>>>> driver level), but I didn't find it in the Linux kernel or QEMU code.
>>>>>> May I know which register you are referring to specifically? Or which
>>>>>> PM state bit were you referring to below?
>>>>>>
>>>>> PCI spec's PCI Power Management Capability Structure in section 7.5.2.
>>>> Yes, what you point to is the PM capability for PCIe devices.
>>>> But the problem remains that the QEMU code checks the condition
>>>> (pci_bus_is_express or pci_is_express) for all virtio-pci devices in
>>>> virtio_pci_realize(); if a virtio device is not a PCIe device, QEMU
>>>> will not add the PM capability for it.
>>> The PCI PM capability is mandatory for PCIe devices, so maybe the QEMU
>>> code has put it only under the is_pcie check.
>>> But it can be done outside of that check as well, because this
>>> capability has existed on conventional PCI for a long time and is
>>> backward compatible.
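
If I follow this suggestion, moving the capability out of the
express-only path in virtio_pci_realize() would look roughly like the
sketch below. pci_add_capability(), pci_set_word() and the PCI_PM_*
constants are existing QEMU/PCI definitions; the placement and error
handling are my assumptions and untested:

    /* Sketch: add the PM capability unconditionally in
     * virtio_pci_realize(), instead of only under the
     * pci_is_express() check; the capability is defined for
     * conventional PCI too, so this stays backward compatible. */
    int pos = pci_add_capability(pci_dev, PCI_CAP_ID_PM, 0,
                                 PCI_PM_SIZEOF, errp);
    if (pos < 0) {
        return;
    }

    /* Let the guest write the PowerState field (D0..D3hot). */
    pci_set_word(pci_dev->wmask + pos + PCI_PM_CTRL,
                 PCI_PM_CTRL_STATE_MASK);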
>> Do you suggest that I implement the PM capability for virtio devices
>> in QEMU first, and then try whether the PM capability can work for
>> this scenario?
> 
> virtio devices in QEMU already have a PM capability.
> 
> 
>> If so, we would be complicating a simple problem. There is no other
>> need to add the PM capability for virtio devices right now; if we add
>> it just for preserving resources, it seems unnecessary and
>> unreasonable. And we are not sure whether there are other scenarios,
>> outside of a PM state change, that also need resources to be
>> preserved; if there are, the PM register can't cover them, but a
>> preserve_resources register can.
> 
> One of the selling points of virtio is precisely reusing existing
> platform capabilities as opposed to coming up with our own.
> See abstract.tex
> 
> 
>> Can I add a note like "If the PM capability is implemented for virtio
>> devices, it may cover this scenario; if there are no other scenarios
>> that PM can't cover, then we can remove this register" to the commit
>> message or the spec description, and continue adding the
>> preserve_resources register?
> 
> We can't remove registers.
> 
>>>
>>>> And another problem: what about MMIO transport devices? Preserving
>>>> resources is needed for devices of all transport types.
>>>>
>>> MMIO lacks such rich PM definitions. If MMIO wants to support this in
>>> the future, it will be extended to match the other transports like
>>> PCI.
>>>
>>>>>>>
>>>>>>>> I think it is unreasonable to have the virtio device listen to
>>>>>>>> the PM state of the guest system.
>>>>>>> The guest driver performs any such work in the virtio driver,
>>>>>>> from the guest system's PM callback events.
>>>>>> I didn't find any PM state callback in the virtio driver.
>>>>>>
>>>>> There are virtio_suspend and virtio_resume in the case of Linux.
>>>> I think what you call virtio_suspend/resume are the freeze/restore
>>>> callbacks from "struct virtio_driver", or the suspend/resume
>>>> callbacks from "static const struct dev_pm_ops virtio_pci_pm_ops".
>>>> And yes, I agree: if virtio devices have the PM capability, maybe we
>>>> can set the PM state in those callback functions.
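
For reference, the hooks being discussed look roughly like this on the
Linux side. The freeze/restore members of "struct virtio_driver" are
the real callbacks invoked around suspend/resume; the virtgpu_* names
here are only placeholders of mine:

    #include <linux/virtio.h>

    /* Sketch: a device-specific "preserve resources" request could be
     * issued here before the guest enters a low-power state, and
     * undone on the way back. */
    static int virtgpu_freeze(struct virtio_device *vdev)
    {
        /* quiesce virtqueues, then ask the device to preserve */
        return 0;
    }

    static int virtgpu_restore(struct virtio_device *vdev)
    {
        /* re-initialize virtqueues, then ask the device to restore */
        return 0;
    }

    static struct virtio_driver virtgpu_driver = {
        /* ... id_table, probe, remove ... */
    #ifdef CONFIG_PM_SLEEP
        .freeze  = virtgpu_freeze,
        .restore = virtgpu_restore,
    #endif
    };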
>>>>
>>>>>
>>>>>>>
>>>>>>>> It is more suitable for each device to get a notification from
>>>>>>>> the driver and then perform the preserve-resources operation.
>>>>>>> I agree that each device gets the notification from the driver.
>>>>>>> The question is: should it be the virtio driver, or is the
>>>>>>> existing PCI driver, which transitions the state from D0->D3 and
>>>>>>> D3->D0, enough?
>>>>>> It seems there is no existing PCI driver that transitions the D0
>>>>>> or D3 state. Could you please tell me which one it is
>>>>>> specifically? I am very willing to give it a try.
>>>>>>
>>>>> The Linux virtio-pci modern driver should be able to.
>>>> Yes, I know; it is VIRTIO_PCI_FLAG_INIT_PM_BIT. But the two problems
>>>> I described above still remain.
>>>>
>>> Both can be resolved without switching to PCIe.
>>>  
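
To make that concrete: once the device exposes the PM capability, the
D0<->D3 transitions come from the standard PCI core on the guest side,
roughly as below. pci_save_state(), pci_set_power_state() and
pci_restore_state() are existing Linux PCI APIs; the function names are
placeholders:

    #include <linux/pci.h>

    /* Sketch: the device can treat the D0 -> D3hot edge as "preserve
     * resources" and D3hot -> D0 as "restore", with no new virtio
     * register involved. */
    static int example_virtio_pci_suspend(struct pci_dev *pci_dev)
    {
        pci_save_state(pci_dev);
        return pci_set_power_state(pci_dev, PCI_D3hot);
    }

    static int example_virtio_pci_resume(struct pci_dev *pci_dev)
    {
        int rc = pci_set_power_state(pci_dev, PCI_D0);

        pci_restore_state(pci_dev);
        return rc;
    }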
>>>>>
>>>>>>> Can you please check that?
>>>>>>>
>>>>>>>>>> --- a/transport-pci.tex
>>>>>>>>>> +++ b/transport-pci.tex
>>>>>>>>>> @@ -325,6 +325,7 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>>>>>>>>>          /* About the administration virtqueue. */
>>>>>>>>>>          le16 admin_queue_index;         /* read-only for driver */
>>>>>>>>>>          le16 admin_queue_num;         /* read-only for driver */
>>>>>>>>>> +        le16 preserve_resources;        /* read-write */
>>>>>>>>> Preserving these resources in the device implementation takes
>>>>>>>>> a finite amount of time, possibly more than 40 ns (the time of
>>>>>>>>> a PCIe write TLP).
>>>>>>>>> Hence this register must be a polling register that indicates
>>>>>>>>> preservation_done.
>>>>>>>>> This will tell the guest when preservation is done, and when
>>>>>>>>> restoration is done, so that it can resume the upper layers.
>>>>>>>>>
>>>>>>>>> Please refer to the queue_reset definition to learn more about
>>>>>>>>> such a register definition.
>>>>>>>> Thanks, I will refer to "queue_reset". So I need three values:
>>>>>>>> the driver writes 1 to make the device preserve resources, the
>>>>>>>> driver writes 2 to make the device restore resources, and the
>>>>>>>> device writes 0 to tell the driver that preserving or restoring
>>>>>>>> is done. Am I right?
>>>>>>>>
>>>>>>> Right.
>>>>>>>
>>>>>>> And if the existing PCIe PM state bits can do it, we can leverage
>>>>>>> that.
>>>>>>> If they cannot be used, let's add that reasoning to the commit
>>>>>>> log to describe this register.
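
If we do keep the new register, the driver-side sequence I have in mind
is something like the sketch below. The preserve_resources field is the
one proposed in this patch; the constants and the helper name are made
up, while vp_iowrite16()/vp_ioread16() are the accessors the Linux
modern virtio-pci code already uses:

    #include <linux/delay.h>

    /* Proposed protocol: the driver writes 1 to request preservation,
     * 2 to request restoration; the device writes 0 when the operation
     * completes (the same pattern as queue_reset). */
    #define VIRTIO_RES_PRESERVE  1
    #define VIRTIO_RES_RESTORE   2
    #define VIRTIO_RES_DONE      0

    /* cfg points at a struct virtio_pci_common_cfg that includes the
     * proposed le16 preserve_resources field. */
    static void vp_preserve_resources(struct virtio_pci_common_cfg
                                      __iomem *cfg, u16 cmd)
    {
        vp_iowrite16(cmd, &cfg->preserve_resources);

        /* Poll until the device reports completion. */
        while (vp_ioread16(&cfg->preserve_resources) != VIRTIO_RES_DONE)
            usleep_range(100, 200);
    }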
>>>>>>>
>>>>>>>>>
>>>>>>>>> Let's please confirm whether the PCIe PM level registers are
>>>>>>>>> sufficient before deciding on the addition of this register.
>>>>>>>> But if the device is not a PCIe device, it doesn't have the PM
>>>>>>>> capability, and then this will not work. Actually, in my local
>>>>>>>> environment, pci_is_express() returns false in QEMU; they are
>>>>>>>> not PCIe devices.
>>>>>>> It is reasonable to ask to plug in as a PCIe device in 2023 to
>>>>>>> get new functionality, and you mentioned a GPU device too. 😊
>>>>>>> Which does not have a very long history of backward
>>>>>>> compatibility to maintain.
>>>>>> Do you suggest that I add the PM capability for virtio-gpu, or
>>>>>> change virtio-gpu to a PCIe device?
>>>>>>
>>>>> The PCI Power Management Capability Structure does not seem to be
>>>>> limited to PCIe.
>>>> I am not sure, but in the current QEMU code I can see the
>>>> "pci_is_express" check for all virtio-pci devices. If we want to add
>>>> the PM capability for virtio-pci devices, I think we need to change
>>>> them to PCIe devices.
>>>>
>>> That is one option.
>>> The second option is to extend the PCI PM cap to non-PCIe devices,
>>> because it is supported there as well.
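
For reference, the structure in question (PCI spec section 7.5.2) is
the same on conventional PCI and PCIe. An illustrative C view of its
layout, with comments naming the matching PCI_PM_* offsets from Linux's
pci_regs.h (the struct itself is only for illustration):

    /* Illustrative layout of the PCI Power Management Capability
     * (PCI spec sec. 7.5.2); it is not specific to PCIe. */
    struct pci_pm_cap {
        u8  cap_id;     /* PCI_CAP_ID_PM (0x01) */
        u8  next_ptr;   /* next capability in the list */
        u16 pmc;        /* PCI_PM_PMC: version, D1/D2 and PME support */
        u16 pmcsr;      /* PCI_PM_CTRL: PowerState selects D0..D3hot */
        u8  pmcsr_bse;  /* bridge support extensions */
        u8  data;       /* optional power-consumption data */
    };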
>>>
>>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Jiqian Chen.
>>
>> -- 
>> Best regards,
>> Jiqian Chen.
> 

-- 
Best regards,
Jiqian Chen.
