On Mon, Jun 13, 2022 at 05:58:44PM -0300, Leonardo Bras Soares Passos wrote:
> Hello Peter,
> 
> On Wed, Jun 8, 2022 at 5:23 PM Peter Xu <pet...@redhat.com> wrote:
> [...]
> > > In a previous iteration of the patchset, it was made clear that it's
> > > desirable to detect when the kernel falls back to copying mechanism,
> > > so the user of 'QIOChannelSocket' can switch to copying and avoid the
> > > overhead. This was done by the return value of flush(), which is 1 if
> > > that occurs.
> >
> > Two questions..
> >
> >   1) When that happens, will MSG_ERRQUEUE keeps working just like zerocopy
> >      is functional?
> 
> I am not sure about what exactly you meant by 'like zerocopy is
> funcional', but the
> idea is that reading from MSG_ERRQUEUE should return a msg for each sendmsg
> syscall with MSG_ZEROCOPY that previously happened. This does not depend on
> the outcome (like falling back to the copying mechanism).
> btw, most of those messages may be batched to reduce overhead.
> 
> At some point, zero-copy may fail, and fall back to copying, so in
> those messages
> an error code SO_EE_CODE_ZEROCOPY_COPIED can be seen. Having only
> those messages in a flush will trigger the returning of 1 from the
> flush function.

Ah I think I missed the "reset ret==0 when !SO_EE_CODE_ZEROCOPY_COPIED"
path..  Sorry.

> 
> >
> >      If the answer is yes, I don't see how ret=1 will ever be
> >      returned.. because we'll also go into the same loop in
> >      qio_channel_socket_flush() anyway.
> 
> 
> We set ret to 1 at function entry and then for each message in the 
> MSG_ERRQUEUE,
> we test if it has error code different than SO_EE_CODE_ZEROCOPY_COPIED.
> If it ever have a different error code, we set ret=0.
> 
> So, in our previous example, if we have a net device not supporting
> the 'Scatter-Gather'
> feature (NETIF_F_SG), every error message will be
> SO_EE_CODE_ZEROCOPY_COPIED, and it will return 1.
> 
> 
> >
> >      If the answer is no, then since we'll have non-zero zero_copy_queued,
> >      will the loop in qio_channel_socket_flush() go into a dead one?  How
> >      could it return?
> 
> No, because it will go through all packets sent with MSG_ZEROCOPY, including 
> the
> ones that fell back to copying, so the counter should be fine. If any
> code disables
> zero-copy, it will both stop sending stuff wil MSG_ZEROCOPY and flushing, so 
> it
> should be fine.
> 
> >
> >   2) Even if we have the correct ret=1 returned when that happens, which
> >      caller is detecting that ret==1 and warn the admin?
> >
> 
> No caller is using that right now.
> It's supposed to be a QIOChannel interface feature, and any 
> user/implementation
> could use that information to warn if zero-copy is not being used, fall back 
> to
> copying directly (to avoid overhead of testing zero-copy) or even use
> it to cancel the
> sending if wanted.
> 
> It was a suggestion of Daniel on top of [PATCH v5 1/6] IIRC.

OK the detection makes sense, thanks for the details.

Then now I'm wondering whether we should have warned the admin already if
zero-copy send is not fully enabled in live migration.  Should we add a
error_report_once() somewhere for the ret==1 already?  After all the user
specify zero_copy_send=true explicitly.  Did I miss something again?

Thanks,

-- 
Peter Xu


Reply via email to