>-----Original Message-----
>From: Steven Sistare <steven.sist...@oracle.com>
>Subject: Re: [PATCH V2 2/2] vfio/pci: preserve pending interrupts
>
>On 7/21/2025 7:18 AM, Duan, Zhenzhong wrote:
>>> -----Original Message-----
>>> From: Steven Sistare <steven.sist...@oracle.com>
>>> Subject: Re: [PATCH V2 2/2] vfio/pci: preserve pending interrupts
>>>
>>> On 7/16/2025 10:43 PM, Duan, Zhenzhong wrote:
>>>>> -----Original Message-----
>>>>> From: Steve Sistare <steven.sist...@oracle.com>
>>>>> Subject: [PATCH V2 2/2] vfio/pci: preserve pending interrupts
>>>>>
>>>>> cpr-transfer may lose a VFIO interrupt because the KVM instance is
>>>>> destroyed and recreated. If an interrupt arrives in the middle, it is
>>>>> dropped. To fix, stop pending new interrupts during cpr save, and pick
>>>>> up the pieces. In more detail:
>>>>>
>>>>> Stop the VCPUs. Call kvm_irqchip_remove_irqfd_notifier_gsi --> KVM_IRQFD
>>>>> to deassign the irqfd gsi that routes interrupts directly to the VCPU
>>>>> and KVM. After this call, interrupts fall back to the kernel
>>>>> vfio_msihandler, which writes to QEMU's kvm_interrupt eventfd. CPR
>>>>> already preserves that eventfd. When the route is re-established in new
>>>>> QEMU, the kernel tests the eventfd and injects an interrupt to KVM if
>>>>> necessary.
>>>>
>>>> With this patch, producer is detached from the kvm consumer, do we still
>>>> need to close kvm fd on source QEMU?
>>>
>>> Good observation! I tested with this patch, without the kvm close patch,
>>> and indeed it works.
>>
>> Thanks for confirming.
>>
>>> However, I would like to keep the kvm close patch, because it has another
>>> benefit: it makes cpr-exec mode faster. In that mode, old QEMU directly
>>> exec's new QEMU, and it is faster because the kernel exec code does not
>>> have to traverse and examine kvm page mappings. That cost is linear with
>>> address space size. I use cpr-exec mode at Oracle, and I plan to submit
>>> it for consideration in QEMU 10.2.
>>
>> Sure, but I'd like to get clear on the reason.
>> What kvm page do you mean, guest memory pages?
>
>KVM has a slots data structure that it uses to track guest memory pages.
>During exec, slots is cleared page-by-page in the path
>  copy_page_range -> mmu_notifier_invalidate_range_start ->
>  kvm_mmu_notifier_invalidate_range_start
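By the way, just to confirm my reading of the interrupt side above: the
reason preserving the kvm_interrupt eventfd is enough seems to come down to
plain eventfd(2) semantics, roughly as in the sketch below. This is an
illustration only, not the QEMU code; the fd here merely stands in for the
preserved kvm_interrupt eventfd.

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <sys/eventfd.h>
  #include <unistd.h>

  int main(void)
  {
      /* Stands in for the preserved kvm_interrupt eventfd. */
      int efd = eventfd(0, 0);
      uint64_t one = 1, pending = 0;

      if (efd < 0) {
          return 1;
      }

      /* An "interrupt" arrives while the irqfd gsi route is torn down:
       * the count is latched in the eventfd, nothing is dropped. */
      if (write(efd, &one, sizeof(one)) != sizeof(one)) {
          return 1;
      }

      /* ... old QEMU hands the fd over, new QEMU re-attaches the route
       * and the consumer then sees the pending count ... */
      if (read(efd, &pending, sizeof(pending)) != sizeof(pending)) {
          return 1;
      }
      printf("pending = %" PRIu64 "\n", pending);   /* prints 1 */

      close(efd);
      return 0;
  }

So an interrupt signaled while the gsi route is down just sits in the
eventfd count until new QEMU re-attaches the route and the kernel finds it
pending. On the cpr-exec point: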
Understood, you want to avoid zapping EPT by closing the kvm fd.

>
>> When exec, old kvm_fd is closed with close-on-exec implicitly, I don't
>> understand why it is faster if kvm_fd is closed explicitly.
>
>The kernel closes close-on-exec fd's after copy_page_range, after the mmu
>notifier has done all the per-page work.

Clear now. For the whole series:

Reviewed-by: Zhenzhong Duan <zhenzhong.d...@intel.com>

Thanks
Zhenzhong