Hello Juan,

Sorry to go back this far in the discussion, but while reviewing for v9
I could not tell whether I have forgotten the reason or missed an
argument here. Could you please help me with this?

On Tue, Nov 2, 2021 at 9:32 AM Juan Quintela <quint...@redhat.com> wrote:
>
> Leonardo Bras <leob...@redhat.com> wrote:
> > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> > zerocopy interface.
> >
> > Change multifd_send_sync_main() so it can distinguish the last sync from
> > the setup and per-iteration ones, so a flush_zerocopy() can be called
> > at the last sync in order to make sure all RAM is sent before finishing
> > the migration.
>
> You need to do this after each iteration.  Otherwise it can happen that:
>
> channel 1:               channel 2:
>
>    send page 11
>
> next iteration
>                          send page 11
>
>                          this page arrives
>
> now arrives this old copy.
>
> After each iteration, one needs to be sure that no ram is inflight.
>
> This means that I think you don't need the last_sync parameter at all,
> as you have to do the flush() in every iteration.

The flush command guarantees that every packet queued before the flush
is actually sent before the flush returns.

I mean, flushing every iteration will not help with the situation
above, where the pages are sent in order but arrive at the target in a
different order.

There is a chance that in the above text you meant 'send page' as
"queue page for sending", and 'page arrives' as "actually send the
queued page".

If that is correct, then syncing every iteration should not be
necessary:
- On page queue, Linux saves the page address and size for sending.
- On actual send, Linux sends the data that is in the page at that
  moment.

So, in this example, if page 11 from iteration 'i' happens to be
'actually sent' after page 11 from iteration 'i+1', it would not be an
issue:

###
channel 1:                       channel 2:

iteration i
   queue page 11 (i)

iteration i+1
                                 queue page 11 (i+1)

                                 actually send page 11 (i+1)

   actually send page 11 (i)
###

That's because page 11 (i), being read later, will contain a newer (or
at least equal) version compared to page 11 (i+1).

tl;dr:
- The page content always depends on the send time, instead of the
  queue time.
- The iteration count describes the queue time.
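
To make the argument concrete, here is a minimal toy model in Python.
It is not QEMU or kernel code, and Channel, queue_page, actually_send
and run are names made up for this illustration. It assumes, as
described above, that a zerocopy send reads the page content at
actual-send time while a copying send snapshots it at queue time, and
it does not model any reordering on the network after the send.

```python
# Toy model only: every name here is hypothetical, not a QEMU/Linux API.

class Channel:
    def __init__(self, ram, zerocopy):
        self.ram = ram            # shared guest RAM: page number -> content
        self.zerocopy = zerocopy
        self.queue = []           # pending (page, payload) send requests

    def queue_page(self, page):
        # zerocopy: remember only which page to send (payload is None).
        # copy:     snapshot the page content right now.
        payload = None if self.zerocopy else self.ram[page]
        self.queue.append((page, payload))

    def actually_send(self, dest):
        page, payload = self.queue.pop(0)
        # zerocopy reads RAM at send time; copy sends the old snapshot.
        dest[page] = self.ram[page] if self.zerocopy else payload


def run(zerocopy):
    ram = {11: "v1"}
    dest = {}
    ch1 = Channel(ram, zerocopy)
    ch2 = Channel(ram, zerocopy)

    ch1.queue_page(11)        # iteration i:   queue page 11 (i)
    ram[11] = "v2"            # guest dirties page 11
    ch2.queue_page(11)        # iteration i+1: queue page 11 (i+1)

    ch2.actually_send(dest)   # page 11 (i+1) goes out first
    ch1.actually_send(dest)   # page 11 (i) goes out last
    return dest[11]
```

Under these assumptions, run(zerocopy=True) leaves the destination with
"v2" even though the older request is sent last, while
run(zerocopy=False) leaves it with the stale "v1".
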
(On non-zerocopy it's the opposite: the content depends on the queue
time, because the memory content is copied during enqueue.)

> [...]

Juan, could you please help me understand if I am missing a part of
your argument up there?
Also, is syncing every iteration still necessary / recommended?

Best regards,
Leo