Hi all!

On 2025-08-05 10:44, Fabiano Rosas wrote:
> Daniel P. Berrangé <berra...@redhat.com> writes:
>
> > On Mon, Aug 04, 2025 at 04:27:45PM -0300, Fabiano Rosas wrote:
> >> Juraj Marcin <jmar...@redhat.com> writes:
> >>
> >> > Hi Daniel,
> >> >
> >> > On 2025-08-01 18:02, Daniel P. Berrangé wrote:
> >> >> This is a followup to previously merged patches that claimed to
> >> >> work around the gnutls bug impacting migration, but in fact were
> >> >> essentially non-functional. Juraj Marcin pointed this out, and
> >> >> this new patch tweaks the workaround to make it actually do
> >> >> something useful.
> >> >>
> >> >> Daniel P. Berrangé (2):
> >> >>   migration: simplify error reporting after channel read
> >> >>   migration: fix workaround for gnutls thread safety
> >> >>
> >> >>  crypto/tlssession.c   | 16 ----------------
> >> >>  migration/qemu-file.c | 22 +++++++++++++++++-----
> >> >>  2 files changed, 17 insertions(+), 21 deletions(-)
> >> >>
> >> >
> >> > thanks for finding a fix for the workaround. I have tested it and
> >> > it resolves the issue.
> >> >
> >> > However, it significantly slows down migration, even with the
> >> > workaround disabled (and thus no locking). When benchmarking, I
> >> > used the fixed version of GNUTLS and a VM with 20GB of RAM, all
> >> > of which was written to before starting a normal migration with
> >> > no workload during the migration.
> >> >
> >> > Test cases:
> >> > [1]: before this patchset
> >> > [2]: with this patchset applied and GNUTLS workaround enabled
> >> > [3]: with this patchset applied and GNUTLS workaround disabled
> >> >
> >> >   | Total time | Throughput | Transferred bytes |
> >> > --+------------+------------+-------------------+
> >> > 1 | 31 192 ms  | 5450 mbps  | 21 230 973 763    |
> >> > 2 | 74 147 ms  | 2291 mbps  | 21 232 849 066    |
> >> > 3 | 72 426 ms  | 2343 mbps  | 21 215 009 392    |
> >>
> >> Thanks for testing this. I had just managed to convince myself that
> >> there wouldn't be any performance issues.
> >>
> >> The yield at every buffer fill on the incoming side is probably way
> >> more impactful than the poll on the RP.
> >
> > Yeah, that's an unacceptable penalty on the incoming side for sure.
> >
> > How about we simply change the outbound migration channel to be in
> > non-blocking mode? I originally put it in blocking mode way back in
> > 9e4d2b98ee98f4cee50d671e500eceeefa751ee0, but looking at the
> > QEMUFile impls of qemu_fill_buffer and qemu_fflush, both should be
> > coping with a non-blocking socket. qemu_fill_buffer has explicit
> > code to wait, and qemu_fflush uses the _all() variant which has
> > built-in support for waiting. So I'm not seeing an obvious need
> > to run the channel in blocking mode.
>
> It's definitely simpler and I think it works. It's uncomfortably late
> though to add a bunch of glib event loop code to the migration
> thread. Is the suggestion of moving the yield to tlssession.c even
> feasible?
>
> Juraj, are you able to get some numbers with this?
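(For context, the explicit wait Daniel mentions in qemu_fill_buffer is
roughly the pattern below; this is a paraphrased sketch of the loop in
migration/qemu-file.c, not the exact tree contents.)

/* Sketch: qemu_fill_buffer() already copes with a non-blocking channel.
 * A read that would block returns QIO_CHANNEL_ERR_BLOCK, and the loop
 * waits for the channel to become readable before retrying, yielding
 * back to the event loop when running in coroutine context. */
do {
    struct iovec iov = { f->buf + pending, IO_BUF_SIZE - pending };
    len = qio_channel_readv(f->ioc, &iov, 1, NULL);
    if (len == QIO_CHANNEL_ERR_BLOCK) {
        if (qemu_in_coroutine()) {
            qio_channel_yield(f->ioc, G_IO_IN);  /* suspend coroutine */
        } else {
            qio_channel_wait(f->ioc, G_IO_IN);   /* poll()-based wait */
        }
    }
} while (len == QIO_CHANNEL_ERR_BLOCK);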
Yes, I have tested it with this patch and it still works, now without
any noticeable performance regressions.

Test cases:
[0]: before this patchset with GNUTLS workaround disabled
[1]: before this patchset with GNUTLS workaround enabled
[2]: with this patchset applied and GNUTLS workaround enabled
[3]: with this patchset applied and GNUTLS workaround disabled

  | Total time | Throughput  | Transferred bytes |
--+------------+-------------+-------------------+
0 | 32 464 ms  | 5 228 mbps  | 21 213 283 021    |
1 | 32 429 ms  | 5 236 mbps  | 21 224 062 348    |
2 | 32 934 ms  | 5 150 mbps  | 21 200 558 083    |
3 | 31 547 ms  | 5 384 mbps  | 21 229 225 792    |

Tested-by: Juraj Marcin <jmar...@redhat.com>

Best regards

Juraj Marcin

> > Using non-blocking will prevent the return path thread from sitting
> > in a read() call, so we won't have the mutual exclusion between read
> > and write that this patch was trying to avoid.
> >
> > ie, this delta on top of this patch:
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 10c216d25d..1eaabc1f19 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -4002,7 +4002,7 @@ void migration_connect(MigrationState *s, Error *error_in)
> >      }
> >  
> >      migration_rate_set(rate_limit);
> > -    qemu_file_set_blocking(s->to_dst_file, true);
> > +    qemu_file_set_blocking(s->to_dst_file, false);
> >  
> >      /*
> >       * Open the return path. For postcopy, it is used exclusively. For
> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> > index cf6115e699..8ee44c5ac9 100644
> > --- a/migration/qemu-file.c
> > +++ b/migration/qemu-file.c
> > @@ -338,22 +338,6 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
> >          return 0;
> >      }
> >  
> > -    /*
> > -     * This feature triggers acquisition of mutexes around every
> > -     * read and write. Thus we must not sit in a blocking read
> > -     * if this is set, but must instead poll proactively. This
> > -     * does not work with some channel types, however, so must
> > -     * only pre-poll when the featre is set.
> > -     */
> > -    if (qio_channel_has_feature(f->ioc,
> > -                                QIO_CHANNEL_FEATURE_CONCURRENT_IO)) {
> > -        if (qemu_in_coroutine()) {
> > -            qio_channel_yield(f->ioc, G_IO_IN);
> > -        } else {
> > -            qio_channel_wait(f->ioc, G_IO_IN);
> > -        }
> > -    }
> > -
> >      do {
> >          struct iovec iov = { f->buf + pending, IO_BUF_SIZE - pending };
> >          len = qio_channel_readv_full(f->ioc, &iov, 1, pfds, pnfd, 0,
> >
> >
> > With regards,
> > Daniel
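(As a footnote on why flipping to_dst_file to non-blocking should be
safe on the sender side: the _all() variant Daniel refers to already
waits internally whenever the channel would block. Roughly, paraphrasing
the write loop in io/channel.c — a sketch, not the exact code:)

/* Sketch: qio_channel_writev_full_all() keeps writing until every iov
 * has been sent, waiting for writability whenever the non-blocking
 * channel returns QIO_CHANNEL_ERR_BLOCK, so callers such as
 * qemu_fflush() never see a short write. */
while (nlocal_iov > 0) {
    ssize_t len = qio_channel_writev_full(ioc, local_iov, nlocal_iov,
                                          fds, nfds, flags, errp);
    if (len == QIO_CHANNEL_ERR_BLOCK) {
        if (qemu_in_coroutine()) {
            qio_channel_yield(ioc, G_IO_OUT);  /* suspend coroutine */
        } else {
            qio_channel_wait(ioc, G_IO_OUT);   /* poll()-based wait */
        }
        continue;
    }
    if (len < 0) {
        return -1;
    }
    iov_discard_front(&local_iov, &nlocal_iov, len);  /* drop sent bytes */
}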