Hi all!

On 2025-08-05 10:44, Fabiano Rosas wrote:
> Daniel P. Berrangé <berra...@redhat.com> writes:
> 
> > On Mon, Aug 04, 2025 at 04:27:45PM -0300, Fabiano Rosas wrote:
> >> Juraj Marcin <jmar...@redhat.com> writes:
> >> 
> >> > Hi Daniel,
> >> >
> >> > On 2025-08-01 18:02, Daniel P. Berrangé wrote:
> >> >> This is a followup to previously merged patches that claimed to
> >> >> workaround the gnutls bug impacting migration, but in fact were
> >> >> essentially non-functional. Juraj Marcin pointed this out, and
> >> >> this new patch tweaks the workaround to make it actually do
> >> >> something useful.
> >> >> 
> >> >> Daniel P. Berrangé (2):
> >> >>   migration: simplify error reporting after channel read
> >> >>   migration: fix workaround for gnutls thread safety
> >> >> 
> >> >>  crypto/tlssession.c   | 16 ----------------
> >> >>  migration/qemu-file.c | 22 +++++++++++++++++-----
> >> >>  2 files changed, 17 insertions(+), 21 deletions(-)
> >> >> 
> >> >
> >> > thanks for finding a fix for the workaround. I have tested it and it
> >> > resolves the issue.
> >> >
> >> > However, it significantly slows down migration, even with the workaround
> >> > disabled (and thus no locking). When benchmarking, I used the fixed
> >> > version of GNUTLS, VM with 20GB of RAM which were fully written to
> >> > before starting a normal migration with no workload during the
> >> > migration.
> >> >
> >> > Test cases:
> >> > [1]: before this patchset
> >> > [2]: with this patchset applied and GNUTLS workaround enabled
> >> > [3]: with this patchset applied and GNUTLS workaround disabled
> >> >
> >> >   | Total time | Throughput | Transferred bytes |
> >> > --+------------+------------+-------------------+
> >> > 1 |  31 192 ms |  5450 mbps |    21 230 973 763 |
> >> > 2 |  74 147 ms |  2291 mbps |    21 232 849 066 |
> >> > 3 |  72 426 ms |  2343 mbps |    21 215 009 392 |
> >> 
> >> Thanks for testing this. I had just managed to convince myself that there
> >> wouldn't be any performance issues.
> >> 
> >> The yield at every buffer fill on the incoming side is probably way more
> >> impactful than the poll on the RP.
> >
> > Yeah, that's an unacceptable penalty on the incoming side for sure.
> >
> > How about we simply change the outbound migration channel to be in
> > non-blocking mode? I originally put it in blocking mode way back
> > in 9e4d2b98ee98f4cee50d671e500eceeefa751ee0, but if I look at the
> > QEMUFile impl of qemu_fill_buffer and qemu_fflush, both should be
> > coping with a non-blocking socket. qemu_fill_buffer has explicit
> > code to wait, and qemu_fflush uses the _all() variant which has
> > built-in support for waiting. So I'm not seeing an obvious need
> > to run the channel in blocking mode.
> >
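
For context, the wait logic Daniel refers to looks roughly like this (a
simplified sketch of the read path in qemu_fill_buffer() from
migration/qemu-file.c, paraphrased rather than quoted verbatim, so treat
it as illustrative only):

    /* Try to fill the buffer; when the (non-blocking) channel would
     * block, wait for it to become readable and retry. */
    do {
        struct iovec iov = { f->buf + pending, IO_BUF_SIZE - pending };
        len = qio_channel_readv_full(f->ioc, &iov, 1, pfds, pnfd, 0,
                                     &local_error);
        if (len == QIO_CHANNEL_ERR_BLOCK) {
            if (qemu_in_coroutine()) {
                /* incoming side runs in a coroutine: yield until readable */
                qio_channel_yield(f->ioc, G_IO_IN);
            } else {
                /* other threads: block on the channel's poll source */
                qio_channel_wait(f->ioc, G_IO_IN);
            }
        }
    } while (len == QIO_CHANNEL_ERR_BLOCK);

On the write side, qemu_fflush goes through the _all() helper, which as I
understand it has an equivalent retry-and-wait loop built in, so a
non-blocking channel should be handled there as well.
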
> 
> It's definitely simpler and I think it works. It's uncomfortably late
> though to add a bunch of glib event loop code to the migration
> thread. Is the suggestion of moving the yield to tlssession.c even
> feasible?
> 
> Juraj, are you able to get some numbers with this?

Yes, I have tested it with this patch and it still works, now without
any noticeable performance regressions.

Test cases:
[0]: before this patchset with GNUTLS workaround disabled
[1]: before this patchset with GNUTLS workaround enabled
[2]: with this patchset applied and GNUTLS workaround enabled
[3]: with this patchset applied and GNUTLS workaround disabled

  | Total time | Throughput  | Transferred bytes |
--+------------+-------------+-------------------+
0 |  32 464 ms |  5 228 mbps |    21 213 283 021 |
1 |  32 429 ms |  5 236 mbps |    21 224 062 348 |
2 |  32 934 ms |  5 150 mbps |    21 200 558 083 |
3 |  31 547 ms |  5 384 mbps |    21 229 225 792 |
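
(Quick sanity check on the numbers: 21 213 283 021 bytes is roughly
169.7 Gbit, which spread over 32.464 s comes out to about 5 228 Mbps,
matching the reported throughput for case 0.)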


Tested-by: Juraj Marcin <jmar...@redhat.com>


Best regards

Juraj Marcin

> 
> > Using non-blocking will prevent the return path thread sitting
> > in a read() call, so we won't have mutual exclusion between read
> > and write, which this patch was trying to avoid.
> >
> > ie, this delta on top of this patch:
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 10c216d25d..1eaabc1f19 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -4002,7 +4002,7 @@ void migration_connect(MigrationState *s, Error *error_in)
> >      }
> >  
> >      migration_rate_set(rate_limit);
> > -    qemu_file_set_blocking(s->to_dst_file, true);
> > +    qemu_file_set_blocking(s->to_dst_file, false);
> >  
> >      /*
> >       * Open the return path. For postcopy, it is used exclusively. For
> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> > index cf6115e699..8ee44c5ac9 100644
> > --- a/migration/qemu-file.c
> > +++ b/migration/qemu-file.c
> > @@ -338,22 +338,6 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
> >          return 0;
> >      }
> >  
> > -    /*
> > -     * This feature triggers acquisition of mutexes around every
> > -     * read and write. Thus we must not sit in a blocking read
> > -     * if this is set, but must instead poll proactively. This
> > -     * does not work with some channel types, however, so must
> > -     * only pre-poll when the featre is set.
> > -     */
> > -    if (qio_channel_has_feature(f->ioc,
> > -                                QIO_CHANNEL_FEATURE_CONCURRENT_IO)) {
> > -        if (qemu_in_coroutine()) {
> > -            qio_channel_yield(f->ioc, G_IO_IN);
> > -        } else {
> > -            qio_channel_wait(f->ioc, G_IO_IN);
> > -        }
> > -    }
> > -
> >      do {
> >          struct iovec iov = { f->buf + pending, IO_BUF_SIZE - pending };
> >          len = qio_channel_readv_full(f->ioc, &iov, 1, pfds, pnfd, 0,
> >
> >
> > With regards,
> > Daniel
> 

