On Tue, Aug 05, 2025 at 10:44:41AM -0300, Fabiano Rosas wrote:
> Daniel P. Berrangé <berra...@redhat.com> writes:
> 
> > On Mon, Aug 04, 2025 at 04:27:45PM -0300, Fabiano Rosas wrote:
> >> Juraj Marcin <jmar...@redhat.com> writes:
> >> 
> >> > Hi Daniel,
> >> >
> >> > On 2025-08-01 18:02, Daniel P. Berrangé wrote:
> >> >> This is a followup to previously merged patches that claimed to
> >> >> workaround the gnutls bug impacting migration, but in fact were
> >> >> essentially non-functional. Juraj Marcin pointed this out, and
> >> >> this new patch tweaks the workaround to make it actually do
> >> >> something useful.
> >> >> 
> >> >> Daniel P. Berrangé (2):
> >> >>   migration: simplify error reporting after channel read
> >> >>   migration: fix workaround for gnutls thread safety
> >> >> 
> >> >>  crypto/tlssession.c   | 16 ----------------
> >> >>  migration/qemu-file.c | 22 +++++++++++++++++-----
> >> >>  2 files changed, 17 insertions(+), 21 deletions(-)
> >> >> 
> >> >
> >> > thanks for finding a fix for the workaround. I have tested it and it
> >> > resolves the issue.
> >> >
> >> > However, it significantly slows down migration, even with the workaround
> >> > disabled (and thus no locking). For benchmarking, I used the fixed
> >> > version of GNUTLS and a VM with 20GB of RAM that was fully written to
> >> > before starting a normal migration, with no workload during the
> >> > migration.
> >> >
> >> > Test cases:
> >> > [1]: before this patchset
> >> > [2]: with this patchset applied and GNUTLS workaround enabled
> >> > [3]: with this patchset applied and GNUTLS workaround disabled
> >> >
> >> >   | Total time | Throughput | Transferred bytes |
> >> > --+------------+------------+-------------------+
> >> > 1 |  31 192 ms |  5450 mbps |    21 230 973 763 |
> >> > 2 |  74 147 ms |  2291 mbps |    21 232 849 066 |
> >> > 3 |  72 426 ms |  2343 mbps |    21 215 009 392 |
> >> 
> >> Thanks for testing this. I had just managed to convince myself that there
> >> wouldn't be any performance issues.
> >> 
> >> The yield at every buffer fill on the incoming side is probably way more
> >> impactful than the poll on the RP.
> >
> > Yeah, that's an unacceptable penalty on the incoming side for sure.
> >
> > How about we simply change the outbound migration channel to be in
> > non-blocking mode?  I originally put it in blocking mode way back
> > in 9e4d2b98ee98f4cee50d671e500eceeefa751ee0, but if I look at the
> > QEMUFile impls of qemu_fill_buffer and qemu_fflush, both should be
> > coping with a non-blocking socket. qemu_fill_buffer has explicit
> > code to wait, and qemu_fflush uses the _all() variant which has
> > built-in support for waiting, so I'm not seeing an obvious need
> > to run the channel in blocking mode.
> >
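
For reference, the explicit wait in qemu_fill_buffer() being relied on
there looks roughly like this (paraphrased, not a verbatim quote of
qemu-file.c):

    /* A short read on a non-blocking channel reports
     * QIO_CHANNEL_ERR_BLOCK; the reader then waits for the channel
     * to become readable, yielding if it runs in a coroutine.
     */
    if (len == QIO_CHANNEL_ERR_BLOCK) {
        if (qemu_in_coroutine()) {
            qio_channel_yield(ioc, G_IO_IN);
        } else {
            qio_channel_wait(ioc, G_IO_IN);
        }
    }
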
> 
> It's definitely simpler and I think it works. It's uncomfortably late
> though to add a bunch of glib event loop code to the migration
> thread. Is the suggestion of moving the yield to tlssession.c even
> feasible?

Well that'll remove the burden for the non-TLS incoming migration,
but the incoming TLS migration will still have the redundant
yields and so still suffer a hit.
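
A hypothetical sketch of that, with invented names and signatures (not
taken from crypto/tlssession.c), just to show where the cost would
remain:

    /* Hypothetical: even if the yield moved into a TLS read wrapper,
     * every TLS buffer fill would still pay it; only non-TLS
     * channels would be spared.
     */
    static ssize_t tls_read_with_workaround(void *opaque, char *buf,
                                            size_t len)
    {
        if (qemu_in_coroutine()) {
            qemu_coroutine_yield(); /* the redundant yield */
        }
        return tls_read_real(opaque, buf, len); /* invented helper */
    }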

Given where we are in the freeze, I'm thinking we should just hard
disable the workaround for this release, re-attempt it in the next
cycle, and then bring it back to stable afterwards.
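
If we go that route, the hard disable could be as simple as forcing
the workaround predicate off (illustrative only; the real condition
in qemu-file.c may differ):

    /* Hypothetical: gnutls workaround forced off for this release,
     * regardless of any runtime setting; revisit next cycle.
     */
    static bool migration_tls_workaround_enabled(void)
    {
        return false;
    }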

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

