On Fri, Apr 19, 2024 at 11:07:21AM +0100, Daniel P. Berrangé wrote:
> On Thu, Apr 18, 2024 at 04:02:49PM -0400, Peter Xu wrote:
> > On Thu, Apr 18, 2024 at 08:14:15PM +0200, Maciej S. Szmigiero wrote:
> > > I think one of the reasons for these results is that mixed (RAM + device
> > > state) multifd channels participate in the RAM sync process
> > > (MULTIFD_FLAG_SYNC) whereas device state dedicated channels don't.
> >
> > Firstly, I'm wondering whether we can have better names for these new
> > hooks.  Currently (only comment on the async* stuff):
> >
> >   - complete_precopy_async
> >   - complete_precopy
> >   - complete_precopy_async_wait
> >
> > But perhaps better:
> >
> >   - complete_precopy_begin
> >   - complete_precopy
> >   - complete_precopy_end
> >
> > ?
> >
> > As I don't see why the device must do something with async in such hook.
> > To me it's more like you're splitting one process into multiple, then
> > begin/end sounds more generic.
> >
> > Then, if with that in mind, IIUC we can already split ram_save_complete()
> > into >1 phases too.  For example, I would be curious whether the
> > performance will go back to normal if we offloading
> > multifd_send_sync_main() into the complete_precopy_end(), because we
> > really only need one shot of that, and I am quite surprised it already
> > greatly affects VFIO dumping its own things.
> >
> > I would even ask one step further as what Dan was asking: have you thought
> > about dumping VFIO states via multifd even during iterations?  Would that
> > help even more than this series (which IIUC only helps during the blackout
> > phase)?
>
> To dump during RAM iteration, the VFIO device will need to have
> dirty tracking and iterate on its state, because the guest CPUs
> will still be running potentially changing VFIO state. That seems
> impractical in the general case.
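To illustrate the begin/end naming suggested in the quoted text above, a rough
sketch of the relevant part of SaveVMHandlers follows; the _begin/_end hook
names and signatures are hypothetical (assumed to mirror the existing
save_live_complete_precopy hook), not the actual hooks from the series:

  /* Sketch only: a hypothetical begin/complete/end split of the
   * completion phase hooks. */
  typedef struct QEMUFile QEMUFile;

  typedef struct SaveVMHandlers {
      /* ...existing handlers elided... */

      /* Kick off background work for the completion phase, e.g. spawn a
       * thread that starts streaming device state over its own channels. */
      int (*save_live_complete_precopy_begin)(QEMUFile *f, void *opaque);

      /* The existing synchronous completion hook (ram_save_complete() for
       * RAM, vfio_save_complete_precopy() for VFIO). */
      int (*save_live_complete_precopy)(QEMUFile *f, void *opaque);

      /* Wait for whatever _begin started to finish; also a natural home
       * for the one-shot multifd sync (multifd_send_sync_main()), per the
       * suggestion quoted above. */
      int (*save_live_complete_precopy_end)(QEMUFile *f, void *opaque);
  } SaveVMHandlers;

The point of moving the multifd sync there would be that it only needs to run
once per completion, rather than from within ram_save_complete() where it
interacts with the channels that are also carrying device state.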
We already do such iterations in vfio_save_iterate()?

My understanding is that the recent VFIO work is based on the fact that the
VFIO device can track device state changes more or less (besides being able
to save/load full states).  E.g. I still remember that in our QE tests some
older devices reported many more dirty pages than expected during the
iterations, back when we were looking into an issue where a huge number of
dirty pages was being reported.  But newer models seem to have fixed that and
report far fewer.

That issue was about GPUs, not NICs, though, and IIUC a major portion of such
tracking used to be for GPU vRAMs.  So maybe I'm mixing these up, and maybe
they work differently.

Thanks,

-- 
Peter Xu
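For reference, a heavily simplified sketch of the two precopy phases being
discussed; every name below is made up for illustration and does not match
the real migration code (the real flow lives in migration/savevm.c, with
vfio_save_iterate() as VFIO's instance of the iterative step):

  #include <stdint.h>

  /* Hypothetical per-device hooks, for illustration only. */
  struct dev_ops {
      /* Runs repeatedly while the guest is still executing; relies on the
       * device tracking what changed since the previous pass. */
      void (*save_iterate)(void *opaque);
      /* How much state the device still considers outstanding. */
      uint64_t (*pending)(void *opaque);
      /* Runs once during the blackout phase to flush the remainder. */
      void (*complete_precopy)(void *opaque);
  };

  /* Iterate until the outstanding state fits the downtime budget, then
   * stop the guest (not shown) and send whatever is left. */
  static void precopy_sketch(const struct dev_ops *ops, void *opaque,
                             uint64_t downtime_budget)
  {
      while (ops->pending(opaque) > downtime_budget) {
          ops->save_iterate(opaque);      /* iterative phase */
      }
      ops->complete_precopy(opaque);      /* blackout phase */
  }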