nbd: fix reconnect-delay

Eric Blake Wed, 23 Sep 2020 08:21:15 -0700

On 9/3/20 2:03 PM, Vladimir Sementsov-Ogievskiy wrote:

reconnect-delay has a design flaw: we handle it in the same loop where
we do connection attempt. So, reconnect-delay may be exceeded by
unpredictable time of connection attempt.


Let's instead use separate timer.

How to reproduce the bug:


This will make the connect() call of qemu-io at node2 take a long time.

And you'll see that read command in qemu-io will hang for a long time,
more than 15 seconds specified by reconnect-delay parameter. It's the
bug.

8. Don't forget to drop iptables rule on node1:

    sudo iptables -D INPUT -p tcp --dport 10809 -j DROP

Important note: Step [5] is necessary to reproduce _this_ bug. If we
miss step [5], the read command (step 6) will hang for a long time and
this commit doesn't help, because there will be not long connect() to
unreachable host, but long sendmsg() to unreachable host, which should
be fixed by enabling and adjusting keep-alive on the socket, which is a
thing for further patch set.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
---
  block/nbd.c | 59 +++++++++++++++++++++++++++++++++++++++++++++--------
  1 file changed, 50 insertions(+), 9 deletions(-)


Reviewed-by: Eric Blake <ebl...@redhat.com>

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 3/4] block/nbd: fix reconnect-delay

Reply via email to