Re: [PATCH 2/3] block/nbd: only enter connection coroutine if it's present

2021-01-28 Thread Vladimir Sementsov-Ogievskiy

28.01.2021 23:14, Roman Kagan wrote:

When an NBD block driver state is moved from one aio_context to another
(e.g. when doing a drain in a migration thread),
nbd_client_attach_aio_context_bh is executed that enters the connection
coroutine.

However, the assumption that ->connection_co is always present here
appears incorrect: the connection may have encountered an error other
than -EIO in the underlying transport, and thus may have decided to quit
rather than keep trying to reconnect, and therefore it may have
terminated the connection coroutine.  As a result an attempt to reassign
the client in this state (NBD_CLIENT_QUIT) to a different aio_context
leads to a null pointer dereference:

 at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-coroutine.c:109
 opaque=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block/nbd.c:164
 at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:55
 at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:136
 at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:164
 blocking=blocking@entry=true)
 at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-posix.c:650
 cb=, opaque=)
 at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:71
 bs=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6172
 new_context=new_context@entry=0x5618056c7580,
 ignore=ignore@entry=0x7f60e1e63780)
 at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6237
 bs=bs@entry=0x561805ed4c00, ctx=0x5618056c7580,
 ignore_child=, errp=)
 at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6332
 new_context=0x5618056c7580, update_root_node=update_root_node@entry=true,
 errp=errp@entry=0x0)
 at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:1989
 new_context=, errp=errp@entry=0x0)
 at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:2010
 at /build/qemu-6MF7tq/qemu-5.0.1/hw/block/dataplane/virtio-blk.c:292
 at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio-bus.c:245
 running=0, state=)
 at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio.c:3220
 state=state@entry=RUN_STATE_FINISH_MIGRATE)
 at /build/qemu-6MF7tq/qemu-5.0.1/softmmu/vl.c:1275
 send_stop=) at /build/qemu-6MF7tq/qemu-5.0.1/cpus.c:1032
 at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:2914
 at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3275
 at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3439
 at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-thread-posix.c:519
from /lib/x86_64-linux-gnu/libpthread.so.0

Fix it by checking that the connection coroutine is non-null before
trying to enter it.  If it is null, no entering is needed, as the
connection is probably going down anyway.

Signed-off-by: Roman Kagan 


Reviewed-by: Vladimir Sementsov-Ogievskiy 


---
  block/nbd.c | 16 +---
  1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index bcd6641e90..b3cbbeb4b0 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -250,13 +250,15 @@ static void nbd_client_attach_aio_context_bh(void *opaque)
  BlockDriverState *bs = opaque;
  BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
  
-/*

- * The node is still drained, so we know the coroutine has yielded in
- * nbd_read_eof(), the only place where bs->in_flight can reach 0, or it is
- * entered for the first time. Both places are safe for entering the
- * coroutine.
- */
-qemu_aio_coroutine_enter(bs->aio_context, s->connection_co);
+if (s->connection_co) {
+/*
+ * The node is still drained, so we know the coroutine has yielded in
+ * nbd_read_eof(), the only place where bs->in_flight can reach 0, or
+ * it is entered for the first time. Both places are safe for entering
+ * the coroutine.
+ */
+qemu_aio_coroutine_enter(bs->aio_context, s->connection_co);
+}
  bdrv_dec_in_flight(bs);
  }
  




--
Best regards,
Vladimir



[PATCH 2/3] block/nbd: only enter connection coroutine if it's present

2021-01-28 Thread Roman Kagan
When an NBD block driver state is moved from one aio_context to another
(e.g. when doing a drain in a migration thread),
nbd_client_attach_aio_context_bh is executed that enters the connection
coroutine.

However, the assumption that ->connection_co is always present here
appears incorrect: the connection may have encountered an error other
than -EIO in the underlying transport, and thus may have decided to quit
rather than keep trying to reconnect, and therefore it may have
terminated the connection coroutine.  As a result an attempt to reassign
the client in this state (NBD_CLIENT_QUIT) to a different aio_context
leads to a null pointer dereference:

at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-coroutine.c:109
opaque=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block/nbd.c:164
at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:55
at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:136
at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:164
blocking=blocking@entry=true)
at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-posix.c:650
cb=, opaque=)
at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:71
bs=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6172
new_context=new_context@entry=0x5618056c7580,
ignore=ignore@entry=0x7f60e1e63780)
at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6237
bs=bs@entry=0x561805ed4c00, ctx=0x5618056c7580,
ignore_child=, errp=)
at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6332
new_context=0x5618056c7580, update_root_node=update_root_node@entry=true,
errp=errp@entry=0x0)
at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:1989
new_context=, errp=errp@entry=0x0)
at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:2010
at /build/qemu-6MF7tq/qemu-5.0.1/hw/block/dataplane/virtio-blk.c:292
at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio-bus.c:245
running=0, state=)
at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio.c:3220
state=state@entry=RUN_STATE_FINISH_MIGRATE)
at /build/qemu-6MF7tq/qemu-5.0.1/softmmu/vl.c:1275
send_stop=) at /build/qemu-6MF7tq/qemu-5.0.1/cpus.c:1032
at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:2914
at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3275
at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3439
at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-thread-posix.c:519
   from /lib/x86_64-linux-gnu/libpthread.so.0

Fix it by checking that the connection coroutine is non-null before
trying to enter it.  If it is null, no entering is needed, as the
connection is probably going down anyway.

Signed-off-by: Roman Kagan 
---
 block/nbd.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index bcd6641e90..b3cbbeb4b0 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -250,13 +250,15 @@ static void nbd_client_attach_aio_context_bh(void *opaque)
 BlockDriverState *bs = opaque;
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 
-/*
- * The node is still drained, so we know the coroutine has yielded in
- * nbd_read_eof(), the only place where bs->in_flight can reach 0, or it is
- * entered for the first time. Both places are safe for entering the
- * coroutine.
- */
-qemu_aio_coroutine_enter(bs->aio_context, s->connection_co);
+if (s->connection_co) {
+/*
+ * The node is still drained, so we know the coroutine has yielded in
+ * nbd_read_eof(), the only place where bs->in_flight can reach 0, or
+ * it is entered for the first time. Both places are safe for entering
+ * the coroutine.
+ */
+qemu_aio_coroutine_enter(bs->aio_context, s->connection_co);
+}
 bdrv_dec_in_flight(bs);
 }
 
-- 
2.29.2