On 09.02.2023 at 16:45, Emanuele Giuseppe Esposito wrote:
> When calling bdrv_getlength() in handle_aiocb_write_zeroes(), the
> function creates a new coroutine and then waits that it finishes using
> AIO_WAIT_WHILE.
> The problem is that this function could also run in a worker thread,
> that has a different AioContext from main loop and iothreads, therefore
> in AIO_WAIT_WHILE we will have in_aio_context_home_thread(ctx) == false
> and therefore
> assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> in the else branch will fail, crashing QEMU.
>
> Aside from that, bdrv_getlength() is wrong also conceptually, because
> it reads the BDS graph from another thread and is not protected by
> any lock.
>
> Replace it with raw_co_getlength, that doesn't create a coroutine and
> doesn't read the BDS graph.
>
> Signed-off-by: Emanuele Giuseppe Esposito <eespo...@redhat.com>
> ---
>  block/file-posix.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index d3073a7caa..9a99111f45 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1738,7 +1738,7 @@ static int handle_aiocb_write_zeroes(void *opaque)
>  #ifdef CONFIG_FALLOCATE
>      /* Last resort: we are trying to extend the file with zeroed data. This
>       * can be done via fallocate(fd, 0) */
> -    len = bdrv_getlength(aiocb->bs);
> +    len = raw_co_getlength(aiocb->bs);
>      if (s->has_fallocate && len >= 0 && aiocb->aio_offset >= len) {
>          int ret = do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes);
>          if (ret == 0 || ret != -ENOTSUP) {

Obviously this relies on the fact that raw_co_getlength() doesn't
actually depend on running in coroutine context.

Could be done in a separate patch, but I think we should rename it back
to raw_getlength() and remove the coroutine_fn annotation again. Seems
commit c86422c5549 was a little too eager.

Kevin