Re: [PATCH 2/3] block/nbd: only enter connection coroutine if it's present
28.01.2021 23:14, Roman Kagan wrote: When an NBD block driver state is moved from one aio_context to another (e.g. when doing a drain in a migration thread), nbd_client_attach_aio_context_bh is executed that enters the connection coroutine. However, the assumption that ->connection_co is always present here appears incorrect: the connection may have encountered an error other than -EIO in the underlying transport, and thus may have decided to quit rather than keep trying to reconnect, and therefore it may have terminated the connection coroutine. As a result an attempt to reassign the client in this state (NBD_CLIENT_QUIT) to a different aio_context leads to a null pointer dereference: at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-coroutine.c:109 opaque=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block/nbd.c:164 at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:55 at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:136 at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:164 blocking=blocking@entry=true) at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-posix.c:650 cb=, opaque=) at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:71 bs=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6172 new_context=new_context@entry=0x5618056c7580, ignore=ignore@entry=0x7f60e1e63780) at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6237 bs=bs@entry=0x561805ed4c00, ctx=0x5618056c7580, ignore_child=, errp=) at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6332 new_context=0x5618056c7580, update_root_node=update_root_node@entry=true, errp=errp@entry=0x0) at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:1989 new_context=, errp=errp@entry=0x0) at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:2010 at /build/qemu-6MF7tq/qemu-5.0.1/hw/block/dataplane/virtio-blk.c:292 at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio-bus.c:245 running=0, state=) at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio.c:3220 state=state@entry=RUN_STATE_FINISH_MIGRATE) at /build/qemu-6MF7tq/qemu-5.0.1/softmmu/vl.c:1275 send_stop=) at /build/qemu-6MF7tq/qemu-5.0.1/cpus.c:1032 at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:2914 at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3275 at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3439 at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-thread-posix.c:519 from /lib/x86_64-linux-gnu/libpthread.so.0 Fix it by checking that the connection coroutine is non-null before trying to enter it. If it is null, no entering is needed, as the connection is probably going down anyway. Signed-off-by: Roman Kagan Reviewed-by: Vladimir Sementsov-Ogievskiy --- block/nbd.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index bcd6641e90..b3cbbeb4b0 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -250,13 +250,15 @@ static void nbd_client_attach_aio_context_bh(void *opaque) BlockDriverState *bs = opaque; BDRVNBDState *s = (BDRVNBDState *)bs->opaque; -/* - * The node is still drained, so we know the coroutine has yielded in - * nbd_read_eof(), the only place where bs->in_flight can reach 0, or it is - * entered for the first time. Both places are safe for entering the - * coroutine. - */ -qemu_aio_coroutine_enter(bs->aio_context, s->connection_co); +if (s->connection_co) { +/* + * The node is still drained, so we know the coroutine has yielded in + * nbd_read_eof(), the only place where bs->in_flight can reach 0, or + * it is entered for the first time. Both places are safe for entering + * the coroutine. + */ +qemu_aio_coroutine_enter(bs->aio_context, s->connection_co); +} bdrv_dec_in_flight(bs); } -- Best regards, Vladimir
[PATCH 2/3] block/nbd: only enter connection coroutine if it's present
When an NBD block driver state is moved from one aio_context to another (e.g. when doing a drain in a migration thread), nbd_client_attach_aio_context_bh is executed that enters the connection coroutine. However, the assumption that ->connection_co is always present here appears incorrect: the connection may have encountered an error other than -EIO in the underlying transport, and thus may have decided to quit rather than keep trying to reconnect, and therefore it may have terminated the connection coroutine. As a result an attempt to reassign the client in this state (NBD_CLIENT_QUIT) to a different aio_context leads to a null pointer dereference: at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-coroutine.c:109 opaque=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block/nbd.c:164 at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:55 at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:136 at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:164 blocking=blocking@entry=true) at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-posix.c:650 cb=, opaque=) at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:71 bs=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6172 new_context=new_context@entry=0x5618056c7580, ignore=ignore@entry=0x7f60e1e63780) at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6237 bs=bs@entry=0x561805ed4c00, ctx=0x5618056c7580, ignore_child=, errp=) at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6332 new_context=0x5618056c7580, update_root_node=update_root_node@entry=true, errp=errp@entry=0x0) at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:1989 new_context=, errp=errp@entry=0x0) at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:2010 at /build/qemu-6MF7tq/qemu-5.0.1/hw/block/dataplane/virtio-blk.c:292 at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio-bus.c:245 running=0, state=) at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio.c:3220 state=state@entry=RUN_STATE_FINISH_MIGRATE) at /build/qemu-6MF7tq/qemu-5.0.1/softmmu/vl.c:1275 send_stop=) at /build/qemu-6MF7tq/qemu-5.0.1/cpus.c:1032 at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:2914 at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3275 at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3439 at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-thread-posix.c:519 from /lib/x86_64-linux-gnu/libpthread.so.0 Fix it by checking that the connection coroutine is non-null before trying to enter it. If it is null, no entering is needed, as the connection is probably going down anyway. Signed-off-by: Roman Kagan --- block/nbd.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index bcd6641e90..b3cbbeb4b0 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -250,13 +250,15 @@ static void nbd_client_attach_aio_context_bh(void *opaque) BlockDriverState *bs = opaque; BDRVNBDState *s = (BDRVNBDState *)bs->opaque; -/* - * The node is still drained, so we know the coroutine has yielded in - * nbd_read_eof(), the only place where bs->in_flight can reach 0, or it is - * entered for the first time. Both places are safe for entering the - * coroutine. - */ -qemu_aio_coroutine_enter(bs->aio_context, s->connection_co); +if (s->connection_co) { +/* + * The node is still drained, so we know the coroutine has yielded in + * nbd_read_eof(), the only place where bs->in_flight can reach 0, or + * it is entered for the first time. Both places are safe for entering + * the coroutine. + */ +qemu_aio_coroutine_enter(bs->aio_context, s->connection_co); +} bdrv_dec_in_flight(bs); } -- 2.29.2