In previous commits

  eb3618e9 migration: activate TLS thread safety workaround
  edea8183 io: add support for activating TLS thread safety workaround
  24ad5e19 crypto: implement workaround for GNUTLS thread safety problems

an attempt was made to workaround broken gnutls thread safety when
TLS 1.3 rekeying is performed.

Those patches acquired locks before calling gnutls_record_{send|recv}
but temporarily dropped the locks in the push/pull functions, in the
mistaken belief that there was a race inside gnutls that did not cross
execution of the push/pull functions.

A non-deterministic reproducer mislead into thinking the workaround
was operating as expected, but this was wrong. Juraj demonstrated
that QEMU would still see errors from GNUTLS as well as crashes.

The issue is that a pointer to internal state is saved before the
the push/pull functions are called, and after they return this
saved pointer is potentially invalid. IOW, it is never safe to
temporarily drop the mutexes inside the push/pull functions. The
lock must be held throughout execution of gnutls_record_{send|recv}.

This would be possible with QEMU migration, except that the return
path thread sits in a blocking read waiting for data that very
rarely arrives from the destination QEMU. This blocks ability to
send any migration data in the other thread.

It is possible to workaround this issue, however, by proactively
calling poll() to check for available incoming data before trying
the qio_channel_read() call.

Reported-by: Juraj Marcin <jmar...@redhat.com>
Signed-off-by: Daniel P. Berrangé <berra...@redhat.com>
---
 crypto/tlssession.c   | 16 ----------------
 migration/qemu-file.c | 16 ++++++++++++++++
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/crypto/tlssession.c b/crypto/tlssession.c
index 86d407a142..7e11317528 100644
--- a/crypto/tlssession.c
+++ b/crypto/tlssession.c
@@ -95,19 +95,11 @@ qcrypto_tls_session_push(void *opaque, const void *buf, 
size_t len)
         return -1;
     };
 
-    if (session->lockEnabled) {
-        qemu_mutex_unlock(&session->lock);
-    }
-
     error_free(session->werr);
     session->werr = NULL;
 
     ret = session->writeFunc(buf, len, session->opaque, &session->werr);
 
-    if (session->lockEnabled) {
-        qemu_mutex_lock(&session->lock);
-    }
-
     if (ret == QCRYPTO_TLS_SESSION_ERR_BLOCK) {
         errno = EAGAIN;
         return -1;
@@ -134,16 +126,8 @@ qcrypto_tls_session_pull(void *opaque, void *buf, size_t 
len)
     error_free(session->rerr);
     session->rerr = NULL;
 
-    if (session->lockEnabled) {
-        qemu_mutex_unlock(&session->lock);
-    }
-
     ret = session->readFunc(buf, len, session->opaque, &session->rerr);
 
-    if (session->lockEnabled) {
-        qemu_mutex_lock(&session->lock);
-    }
-
     if (ret == QCRYPTO_TLS_SESSION_ERR_BLOCK) {
         errno = EAGAIN;
         return -1;
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 8ee44c5ac9..cf6115e699 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -338,6 +338,22 @@ static ssize_t coroutine_mixed_fn 
qemu_fill_buffer(QEMUFile *f)
         return 0;
     }
 
+    /*
+     * This feature triggers acquisition of mutexes around every
+     * read and write. Thus we must not sit in a blocking read
+     * if this is set, but must instead poll proactively. This
+     * does not work with some channel types, however, so must
+     * only pre-poll when the featre is set.
+     */
+    if (qio_channel_has_feature(f->ioc,
+                                QIO_CHANNEL_FEATURE_CONCURRENT_IO)) {
+        if (qemu_in_coroutine()) {
+            qio_channel_yield(f->ioc, G_IO_IN);
+        } else {
+            qio_channel_wait(f->ioc, G_IO_IN);
+        }
+    }
+
     do {
         struct iovec iov = { f->buf + pending, IO_BUF_SIZE - pending };
         len = qio_channel_readv_full(f->ioc, &iov, 1, pfds, pnfd, 0,
-- 
2.50.1


Reply via email to