client-connection: implement connection retry

Vladimir Sementsov-Ogievskiy Mon, 22 Nov 2021 09:18:11 -0800

22.11.2021 19:30, Eric Blake wrote:

Reviving this thread, as a good as place as any for my question:


On Thu, Jun 10, 2021 at 01:07:50PM +0300, Vladimir Sementsov-Ogievskiy wrote:

Add an option for a thread to retry connection until succeeds. We'll
use nbd/client-connection both for reconnect and for initial connection
in nbd_open(), so we need a possibility to use same NBDClientConnection
instance to connect once in nbd_open() and then use retry semantics for
reconnect.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
---
  include/block/nbd.h     |  2 ++
  nbd/client-connection.c | 56 +++++++++++++++++++++++++++++++----------
  2 files changed, 45 insertions(+), 13 deletions(-)

  NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr,
                                                 bool do_negotiation,
                                                 const char *export_name,
@@ -154,23 +164,43 @@ static void *connect_thread_func(void *opaque)
      NBDClientConnection *conn = opaque;
      int ret;
      bool do_free;
+    uint64_t timeout = 1;
+    uint64_t max_timeout = 16;

- conn->sioc = qio_channel_socket_new();

+    while (true) {
+        conn->sioc = qio_channel_socket_new();

- error_free(conn->err);

-    conn->err = NULL;
-    conn->updated_info = conn->initial_info;
+        error_free(conn->err);
+        conn->err = NULL;
+        conn->updated_info = conn->initial_info;

- ret = nbd_connect(conn->sioc, conn->saddr,

-                      conn->do_negotiation ? &conn->updated_info : NULL,
-                      conn->tlscreds, &conn->ioc, &conn->err);


This says that on each retry attempt, we reset whether to ask the
server for structured replies back to our original initial_info
values.

But when dealing with NBD retries in general, I suspect we have a bug.
Consider what happens if our first connection requests structured
replies and base:allocation block status, and we are successful.  But
later, the server disconnects, triggering a retry.  Suppose that on
our retry, we encounter a different server that no longer supports
structured replies.  We would no longer be justified in sending
NBD_CMD_BLOCK_STATUS requests to the reconnected server.  But I can't
find anywhere in the code base that ensures that on a reconnect, the
new server supplies at least as many extensions as the original
server, nor anywhere that we would be able to gracefully handle an
in-flight block status command that can no longer be successfully
continued because the reconnect landed on a downgraded server.

In general, you don't expect a server to downgrade its capabilities
across restarts, so assuming that a retried connection will hit a
server at least as capable as the original server is typical, even if
unsafe.  But it is easy enough to use nbdkit to write a server that
purposefully downgrades its abilities after the first client
connection, for testing how qemu falls apart if it continues making
assumptions about the current server based solely on what it learned
prior to retrying from the first server.

Is this something we need to address quickly for inclusion in 6.2?
Maybe by having a retry connect fail if the new server does not have
the same capabilities as the old?  Do we also need to care about a
server reporting a different size export than the old server?


Yes that's a problem. We previously noted it here 
https://lists.gnu.org/archive/html/qemu-block/2021-06/msg00458.html

Honestly, I didn't start any fix for that :(.. I agree, it would be good to fix 
it somehow in 6.2. I'll try to make something simple this week. Or did you 
already started doing some fix?


--
Best regards,
Vladimir

Re: [PATCH v4 20/32] nbd/client-connection: implement connection retry

Reply via email to