>-----Original Message-----
>From: Steven Sistare <steven.sist...@oracle.com>
>Subject: Re: [PATCH V2 2/2] vfio/pci: preserve pending interrupts
>
>On 7/21/2025 7:18 AM, Duan, Zhenzhong wrote:
>>> -----Original Message-----
>>> From: Steven Sistare <steven.sist...@oracle.com>
>>> Subject: Re: [PATCH V2 2/2] vfio/pci: preserve pending interrupts
>>>
>>> On 7/16/2025 10:43 PM, Duan, Zhenzhong wrote:
>>>>> -----Original Message-----
>>>>> From: Steve Sistare <steven.sist...@oracle.com>
>>>>> Subject: [PATCH V2 2/2] vfio/pci: preserve pending interrupts
>>>>>
>>>>> cpr-transfer may lose a VFIO interrupt because the KVM instance is
>>>>> destroyed and recreated. If an interrupt arrives in the middle, it is
>>>>> dropped. To fix, stop pending new interrupts during cpr save, and pick
>>>>> up the pieces. In more detail:
>>>>>
>>>>> Stop the VCPUs. Call kvm_irqchip_remove_irqfd_notifier_gsi --> KVM_IRQFD
>>>>> to deassign the irqfd gsi that routes interrupts directly to the VCPU
>>>>> and KVM. After this call, interrupts fall back to the kernel
>>>>> vfio_msihandler, which writes to QEMU's kvm_interrupt eventfd. CPR
>>>>> already preserves that eventfd. When the route is re-established in new
>>>>> QEMU, the kernel tests the eventfd and injects an interrupt to KVM if
>>>>> necessary.
>>>>
>>>> With this patch, producer is detached from the kvm consumer, do we still
>>>> need to close kvm fd on source QEMU?
>>>
>>> Good observation! I tested with this patch, without the kvm close patch,
>>> and indeed it works.
>>
>> Thanks for confirming.
>>
>>> However, I would like to keep the kvm close patch, because it has another
>>> benefit: it makes cpr-exec mode faster. In that mode, old QEMU directly
>>> exec's new QEMU, and it is faster because the kernel exec code does not
>>> have to traverse and examine kvm page mappings. That cost is linear with
>>> address space size. I use cpr-exec mode at Oracle, and I plan to submit
>>> it for consideration in QEMU 10.2.
>>
>> Sure, but I'd like to get clear on the reason.
>> What kvm page do you mean, guest memory pages?
>
>KVM has a slots data structure that it uses to track guest memory pages.
>During exec, slots is cleared page-by-page in the path
>  copy_page_range -> mmu_notifier_invalidate_range_start ->
>  kvm_mmu_notifier_invalidate_range_start
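By the way, just to confirm my reading of the interrupt side above: the
reason preserving the kvm_interrupt eventfd is enough seems to come down to
plain eventfd(2) semantics, roughly as in the sketch below. This is an
illustration only, not the QEMU code; the fd here merely stands in for the
preserved kvm_interrupt eventfd.

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <sys/eventfd.h>
  #include <unistd.h>

  int main(void)
  {
      /* Stands in for the preserved kvm_interrupt eventfd. */
      int efd = eventfd(0, 0);
      uint64_t one = 1, pending = 0;

      if (efd < 0) {
          return 1;
      }

      /* An "interrupt" arrives while the irqfd gsi route is torn down:
       * the count is latched in the eventfd, nothing is dropped. */
      if (write(efd, &one, sizeof(one)) != sizeof(one)) {
          return 1;
      }

      /* ... old QEMU hands the fd over, new QEMU re-attaches the route
       * and the consumer then sees the pending count ... */
      if (read(efd, &pending, sizeof(pending)) != sizeof(pending)) {
          return 1;
      }
      printf("pending = %" PRIu64 "\n", pending);   /* prints 1 */

      close(efd);
      return 0;
  }

So an interrupt signaled while the gsi route is down just sits in the
eventfd count until new QEMU re-attaches the route and the kernel finds it
pending. On the cpr-exec point: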
Understood, you want to avoid zapping EPT by closing the kvm fd.

>
>> When exec, old kvm_fd is closed with close-on-exec implicitly, I don't
>> understand why it is faster if kvm_fd is closed explicitly.
>
>The kernel closes close-on-exec fd's after copy_page_range, after the mmu
>notifier has done all the per-page work.

Clear now. For the whole series:

Reviewed-by: Zhenzhong Duan <zhenzhong.d...@intel.com>

Thanks
Zhenzhong