>-----Original Message-----
>From: Alex Williamson <alex.william...@redhat.com>
>Subject: Re: [RFC 0/2] hw/vfio/pci: Prevent BARs from being dma mapped in
>d3hot state
>
>On Thu, 20 Feb 2025 04:24:13 +0000
>"Duan, Zhenzhong" <zhenzhong.d...@intel.com> wrote:
>
>> >-----Original Message-----
>> >From: Alex Williamson <alex.william...@redhat.com>
>> >Subject: Re: [RFC 0/2] hw/vfio/pci: Prevent BARs from being dma mapped in
>> >d3hot state
>> >
>> >On Wed, 19 Feb 2025 18:58:58 +0100
>> >Eric Auger <eric.au...@redhat.com> wrote:
>> >
>> >> Since kernel commit:
>> >> 2b2c651baf1c ("vfio/pci: Invalidate mmaps and block the access
>> >> in D3hot power state")
>> >> any mmap access to a BAR while the device is in the D3hot state
>> >> generates a fault.
>> >>
>> >> On system_powerdown, if the VFIO device is translated by an IOMMU,
>> >> the device is moved to the D3hot state and then the vIOMMU gets disabled
>> >> by the guest. As a result of this latter operation, the address space is
>> >> swapped from translated to untranslated. When the aliased regions are
>> >> re-enabled, the RAM regions are DMA-mapped again, and this causes DMA_MAP
>> >> faults when the operation is attempted on the BARs.
>> >>
>> >> To avoid remapping those BARs, we check whether the device is in the
>> >> D3hot state and, if so, skip the DMA map.
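A minimal C sketch of the D3hot check described above, assuming it reads the PowerState field (bits 1:0) of the PCI PM control/status register (PMCSR), where the value 3 means D3hot. The helper names pmcsr_power_state() and should_skip_bar_dma_map() are hypothetical and not taken from the RFC patches; only the register layout comes from the PCI PM specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* PMCSR bits 1:0 hold PowerState (Linux/QEMU pci_regs.h call this mask
 * PCI_PM_CTRL_STATE_MASK). Redefined so the sketch is self-contained. */
#define PMCSR_STATE_MASK 0x0003u

enum pci_power_state { PCI_D0 = 0, PCI_D1 = 1, PCI_D2 = 2, PCI_D3HOT = 3 };

/* Hypothetical helper: extract the power state from a raw PMCSR value,
 * ignoring the other bits (NoSoftReset, PME_En, and so on). */
static enum pci_power_state pmcsr_power_state(uint16_t pmcsr)
{
    return (enum pci_power_state)(pmcsr & PMCSR_STATE_MASK);
}

/* Hypothetical policy matching the RFC's idea: skip the DMA map of a
 * BAR region while the device sits in D3hot. */
static bool should_skip_bar_dma_map(uint16_t pmcsr)
{
    return pmcsr_power_state(pmcsr) == PCI_D3HOT;
}
```

For example, a PMCSR value of 0x000b (NoSoftReset set, PowerState = D3hot) makes should_skip_bar_dma_map() return true, while 0x0008 (same bit, PowerState = D0) returns false. Note that this only suppresses the map; nothing in the check itself restores the mapping when the device returns to D0.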
>> >
>> >Thinking on this some more, the QEMU PCI code already manages whether the
>> >device BARs appear in the address space based on the memory enable bit in
>> >the command register. Should we do the same for the PM state?
>> >
>> >IOW, the device going into a low power state should remove the BARs from
>> >the AddressSpace, and waking the device should re-add them. The BAR DMA
>> >mapping should then always be consistent, whereas here nothing would
>> >remap the BARs when the device is woken.
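The rule proposed above can be sketched as a single predicate: a BAR is present in the AddressSpace (and therefore DMA-mapped) only when its decode bit in the command register is set and the device is in D0. This is an illustrative sketch under that assumption; bar_should_be_mapped() is a hypothetical name, not an existing QEMU function, and the bit values are from the PCI specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register bit values from the PCI specification (pci_regs.h names them
 * PCI_COMMAND_IO, PCI_COMMAND_MEMORY and PCI_PM_CTRL_STATE_MASK).
 * Redefined here so the sketch is self-contained. */
#define CMD_IO_ENABLE     0x0001u
#define CMD_MEMORY_ENABLE 0x0002u
#define PMCSR_STATE_MASK  0x0003u
#define PMCSR_STATE_D0    0x0000u

/* Hypothetical: should this BAR appear in the guest AddressSpace? */
static bool bar_should_be_mapped(uint16_t command, uint16_t pmcsr,
                                 bool is_io_bar)
{
    uint16_t enable = is_io_bar ? CMD_IO_ENABLE : CMD_MEMORY_ENABLE;

    /* Existing rule: the matching decode bit must be set in the
     * command register. */
    if (!(command & enable)) {
        return false;
    }
    /* Proposed additional rule: BARs only respond in D0. */
    return (pmcsr & PMCSR_STATE_MASK) == PMCSR_STATE_D0;
}
```

Because the predicate is recomputed from the current register state, a write that returns the device to D0 makes the BARs mappable again, which is the consistency property the skip-only approach lacks. Where QEMU would re-evaluate it (presumably alongside its existing command-register handling) is not shown here.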
>>
>> If BARs should be disabled before the D3hot transition, isn't it the
>> guest's responsibility to do that itself? That is what happens for FLR,
>> which calls pci_dev_save_and_disable().
>
>Nothing requires the guest to clear memory and IO from the command
>register before entering a low power state, nor are we going to get
>very far arguing that it's the guest's fault for triggering an error in
>the hypervisor. The PCI spec indicates that memory and IO BARs are only
>accessible when the device is in the D0 power state. On bare metal,
>accessing the BAR for a device in a low power state would generate an
>unsupported request.
Understood. Yes, it makes sense to remove the BARs from the AddressSpace while
the device is in D3hot.
>Therefore, why should QEMU map BARs of devices in
>low power states into the address space?
It should not.
Thanks
Zhenzhong