Hello Juan,

Sorry to go back this early in the discussion, but I was reviewing for v9
and I am not sure whether I just cannot recall the reason, or whether I
missed an argument here.
Could you please help me with this?

On Tue, Nov 2, 2021 at 9:32 AM Juan Quintela <quint...@redhat.com> wrote:
>
> Leonardo Bras <leob...@redhat.com> wrote:
> > Implement zerocopy on nocomp_send_write(), by making use of QIOChannel
> > zerocopy interface.
> >
> > Change multifd_send_sync_main() so it can distinguish the last sync from
> > the setup and per-iteration ones, so a flush_zerocopy() can be called
> > at the last sync in order to make sure all RAM is sent before finishing
> > the migration.
>
> You need to do this after each iteration.  Otherwise it can happen that:
>
> channel 1:               channel 2:
>
>    send page 11
>
> next iteration
>                          send page 11
>
>                          this page arrives
>
> now arrives this old copy.
>
> After each iteration, one needs to be sure that no ram is inflight.
>
> This means that I think you don't need the last_sync parameter at all,
> as you have to do the flush() in every iteration.

The flush command is used to guarantee that every packet queued before
flush is actually sent before flush returns.
So flushing every iteration will not help with the situation above,
where the pages are sent in order, but arrive at the target in a
different order.

There is a chance that in the above text you meant 'send page' as
"queue page for sending", and 'page arrives' as "actually send the
queued page".
If that is correct, then syncing every iteration should not be necessary:
- On page queue, Linux saves the page address and size for sending.
- On actual send, Linux reads the page and sends whatever data it
  contains at that moment.

So, in this example, if page 11 from iteration 'i' happens to be
'actually sent' after page 11 from iteration 'i+1', it would not be an
issue:
###
channel 1:               channel 2:
Iteration i

queue page 11 (i)

iteration i+1
                          queue page 11 (i+1)
                          actually send page 11 (i+1)

actually send page 11 (i)
###

That's because page 11 (i), being actually sent later, will carry
content at least as new as what was sent for page 11 (i+1).

tl;dr:
- The page content always depends on the send time, instead of queue time.
- The iteration count describes the queue time.
(on non-zerocopy it's the opposite: it will depend on queue time,
because it copies the memory content during enqueue)
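
To make the difference concrete, here is a toy model in C. It is only a
sketch of my reasoning, not QEMU or kernel code: struct request, enqueue(),
transmit() and replay() are hypothetical names, and it assumes the
destination applies pages in the order they are actually sent. A zerocopy
request remembers only the page address, while a copying request takes a
snapshot at enqueue time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 8   /* tiny "page" so the example stays readable */
#define QUEUE_MAX  4

/* One queued send request (hypothetical structure, for illustration). */
struct request {
    const char *page;        /* zerocopy: only the address is remembered */
    char copy[PAGE_BYTES];   /* copying send: snapshot taken at enqueue */
    int zerocopy;
};

struct queue {
    struct request req[QUEUE_MAX];
    size_t len;
};

/* "Queue page for sending". */
static void enqueue(struct queue *q, const char *page, int zerocopy)
{
    assert(q->len < QUEUE_MAX);
    struct request *r = &q->req[q->len++];
    r->page = page;
    r->zerocopy = zerocopy;
    if (!zerocopy) {
        memcpy(r->copy, page, PAGE_BYTES);  /* content fixed at queue time */
    }
}

/* "Actually send" request idx: the payload lands in wire. */
static void transmit(const struct queue *q, size_t idx, char *wire)
{
    assert(idx < q->len);
    const struct request *r = &q->req[idx];
    /* zerocopy reads the page now; copying uses the old snapshot */
    memcpy(wire, r->zerocopy ? r->page : r->copy, PAGE_BYTES);
}

/*
 * Replay the scenario from the diagram: page 11 is queued in iteration i,
 * the guest dirties it, it is queued again in iteration i+1, and the
 * i+1 request is actually sent first. dest receives what the destination
 * ends up holding for page 11.
 */
static void replay(int zerocopy, char *dest)
{
    char page11[PAGE_BYTES] = "old";
    struct queue ch1 = {0}, ch2 = {0};

    enqueue(&ch1, page11, zerocopy);   /* iteration i, channel 1 */
    strcpy(page11, "new");             /* guest writes to the page */
    enqueue(&ch2, page11, zerocopy);   /* iteration i+1, channel 2 */

    transmit(&ch2, 0, dest);           /* i+1 is actually sent first */
    transmit(&ch1, 0, dest);           /* late send from iteration i */
}
```

With replay(1, buf) the destination ends up with "new", because the late
send from iteration i reads the page at send time. With replay(0, buf) it
ends up with "old", which is the stale-copy case from the quoted diagram.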

>
[...]

Juan, could you please help me understand if I am missing a part of
your argument up there?
Also, is syncing every iteration still necessary / recommended?

Best regards,
Leo

