Re: [PATCH v13 3/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
On Mon, Dec 05, 2022 at 09:24:28PM +0800, Sam Li wrote: > Stefan Hajnoczi 于2022年12月5日周一 20:20写道: > > > > On Wed, Nov 30, 2022 at 10:24:10AM +0800, Sam Li wrote: > > > Stefan Hajnoczi 于2022年11月30日周三 10:01写道: > > > > On Thu, 27 Oct 2022 at 11:46, Sam Li wrote: > > > > > @@ -1374,9 +1428,11 @@ static int > > > > > hdev_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz) > > > > > int ret; > > > > > > > > > > /* If DASD, get blocksizes */ > > > > > +#ifndef CONFIG_BLKZONED > > > > > if (check_for_dasd(s->fd) < 0) { > > > > > return -ENOTSUP; > > > > > } > > > > > +#endif > > > > > > > > What is the purpose of this #ifndef? .bdrv_probe_blocksizes() should > > > > only return block sizes for s390 DASD devices. I don't think zoned > > > > storage needs block size probing here. > > > > > > Zoned storage needs to be virtualized with the correct physical block > > > size and logical block size. And the probing here can guarantee that. > > > Or virtio-blk may send wrong block size to the guest. If manually set > > > block size in the command line as before, it is somewhat inaccurate. > > > > I see. I/O won't work if the guest block size differs from the physical > > zoned device's block size. > > > > However, we must not do this for regular host_device BlockDriverStates. > > The block size is manually controlled from those devices and defaults to > > 512B. That way the blocksize doesn't change across live migration and > > break the guest. > > > > Please use a run-time check instead of an #ifdef. Only probe blocksizes > > for dasd and zoned devices. > > I see. Like this? > > #ifndef CONFIG_BLKZONED > static int hdev_probe_zbd_blocksizes(BlockDriverState *bs, BlockSizes *bsz){ > int ret; > /* check zbd */ > ... > /* probe zbd */ > } > +#endif Yes, I think that's the cleanest option. You don't need to check if it's a ZBD. Only the zoned_host_device driver will use this .bdrv_probe_blocksizes() callback. The regular host_device driver will use hdev_probe_blocksizes(). Stefan signature.asc Description: PGP signature
Re: [PATCH v13 3/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
Stefan Hajnoczi 于2022年12月5日周一 20:20写道: > > On Wed, Nov 30, 2022 at 10:24:10AM +0800, Sam Li wrote: > > Stefan Hajnoczi 于2022年11月30日周三 10:01写道: > > > On Thu, 27 Oct 2022 at 11:46, Sam Li wrote: > > > > @@ -1374,9 +1428,11 @@ static int > > > > hdev_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz) > > > > int ret; > > > > > > > > /* If DASD, get blocksizes */ > > > > +#ifndef CONFIG_BLKZONED > > > > if (check_for_dasd(s->fd) < 0) { > > > > return -ENOTSUP; > > > > } > > > > +#endif > > > > > > What is the purpose of this #ifndef? .bdrv_probe_blocksizes() should > > > only return block sizes for s390 DASD devices. I don't think zoned > > > storage needs block size probing here. > > > > Zoned storage needs to be virtualized with the correct physical block > > size and logical block size. And the probing here can guarantee that. > > Or virtio-blk may send wrong block size to the guest. If manually set > > block size in the command line as before, it is somewhat inaccurate. > > I see. I/O won't work if the guest block size differs from the physical > zoned device's block size. > > However, we must not do this for regular host_device BlockDriverStates. > The block size is manually controlled from those devices and defaults to > 512B. That way the blocksize doesn't change across live migration and > break the guest. > > Please use a run-time check instead of an #ifdef. Only probe blocksizes > for dasd and zoned devices. I see. Like this? #ifndef CONFIG_BLKZONED static int hdev_probe_zbd_blocksizes(BlockDriverState *bs, BlockSizes *bsz){ int ret; /* check zbd */ ... /* probe zbd */ } +#endif
Re: [PATCH v13 3/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
On Wed, Nov 30, 2022 at 10:24:10AM +0800, Sam Li wrote: > Stefan Hajnoczi 于2022年11月30日周三 10:01写道: > > On Thu, 27 Oct 2022 at 11:46, Sam Li wrote: > > > @@ -1374,9 +1428,11 @@ static int hdev_probe_blocksizes(BlockDriverState > > > *bs, BlockSizes *bsz) > > > int ret; > > > > > > /* If DASD, get blocksizes */ > > > +#ifndef CONFIG_BLKZONED > > > if (check_for_dasd(s->fd) < 0) { > > > return -ENOTSUP; > > > } > > > +#endif > > > > What is the purpose of this #ifndef? .bdrv_probe_blocksizes() should > > only return block sizes for s390 DASD devices. I don't think zoned > > storage needs block size probing here. > > Zoned storage needs to be virtualized with the correct physical block > size and logical block size. And the probing here can guarantee that. > Or virtio-blk may send wrong block size to the guest. If manually set > block size in the command line as before, it is somewhat inaccurate. I see. I/O won't work if the guest block size differs from the physical zoned device's block size. However, we must not do this for regular host_device BlockDriverStates. The block size is manually controlled from those devices and defaults to 512B. That way the blocksize doesn't change across live migration and break the guest. Please use a run-time check instead of an #ifdef. Only probe blocksizes for dasd and zoned devices. Stefan signature.asc Description: PGP signature
Re: [PATCH v13 3/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
Stefan Hajnoczi 于2022年11月30日周三 10:01写道: > > On Thu, 27 Oct 2022 at 11:46, Sam Li wrote: > > > > Add a new zoned_host_device BlockDriver. The zoned_host_device option > > accepts only zoned host block devices. By adding zone management > > operations in this new BlockDriver, users can use the new block > > layer APIs including Report Zone and four zone management operations > > (open, close, finish, reset, reset_all). > > > > Qemu-io uses the new APIs to perform zoned storage commands of the device: > > zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs), > > zone_finish(zf). > > > > For example, to test zone_report, use following command: > > $ ./build/qemu-io --image-opts -n driver=zoned_host_device, > > filename=/dev/nullb0 > > -c "zrp offset nr_zones" > > > > Signed-off-by: Sam Li > > Reviewed-by: Hannes Reinecke > > --- > > block/block-backend.c | 147 + > > block/file-posix.c| 348 ++ > > block/io.c| 41 > > include/block/block-io.h | 7 + > > include/block/block_int-common.h | 21 ++ > > include/block/raw-aio.h | 6 +- > > include/sysemu/block-backend-io.h | 18 ++ > > meson.build | 4 + > > qapi/block-core.json | 8 +- > > qemu-io-cmds.c| 149 + > > 10 files changed, 746 insertions(+), 3 deletions(-) > > > > diff --git a/block/block-backend.c b/block/block-backend.c > > index aa4adf06ae..731f23e816 100644 > > --- a/block/block-backend.c > > +++ b/block/block-backend.c > > @@ -1431,6 +1431,15 @@ typedef struct BlkRwCo { > > void *iobuf; > > int ret; > > BdrvRequestFlags flags; > > +union { > > +struct { > > +unsigned int *nr_zones; > > +BlockZoneDescriptor *zones; > > +} zone_report; > > +struct { > > +unsigned long op; > > +} zone_mgmt; > > +}; > > } BlkRwCo; > > > > int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags) > > @@ -1775,6 +1784,144 @@ int coroutine_fn blk_co_flush(BlockBackend *blk) > > return ret; > > } > > > > +static void coroutine_fn blk_aio_zone_report_entry(void *opaque) > > +{ > > +BlkAioEmAIOCB *acb = opaque; > > +BlkRwCo *rwco = >rwco; > > + > > +rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset, > > + rwco->zone_report.nr_zones, > > + rwco->zone_report.zones); > > +blk_aio_complete(acb); > > +} > > + > > +BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset, > > +unsigned int *nr_zones, > > +BlockZoneDescriptor *zones, > > +BlockCompletionFunc *cb, void *opaque) > > +{ > > +BlkAioEmAIOCB *acb; > > +Coroutine *co; > > +IO_CODE(); > > + > > +blk_inc_in_flight(blk); > > +acb = blk_aio_get(_aio_em_aiocb_info, blk, cb, opaque); > > +acb->rwco = (BlkRwCo) { > > +.blk= blk, > > +.offset = offset, > > +.ret= NOT_DONE, > > +.zone_report = { > > +.zones = zones, > > +.nr_zones = nr_zones, > > +}, > > +}; > > +acb->has_returned = false; > > + > > +co = qemu_coroutine_create(blk_aio_zone_report_entry, acb); > > +bdrv_coroutine_enter(blk_bs(blk), co); > > + > > +acb->has_returned = true; > > +if (acb->rwco.ret != NOT_DONE) { > > +replay_bh_schedule_oneshot_event(blk_get_aio_context(blk), > > + blk_aio_complete_bh, acb); > > +} > > + > > +return >common; > > +} > > + > > +static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque) > > +{ > > +BlkAioEmAIOCB *acb = opaque; > > +BlkRwCo *rwco = >rwco; > > + > > +rwco->ret = blk_co_zone_mgmt(rwco->blk, rwco->zone_mgmt.op, > > + rwco->offset, acb->bytes); > > +blk_aio_complete(acb); > > +} > > + > > +BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op, > > + int64_t offset, int64_t len, > > + BlockCompletionFunc *cb, void *opaque) { > > +BlkAioEmAIOCB *acb; > > +Coroutine *co; > > +IO_CODE(); > > + > > +blk_inc_in_flight(blk); > > +acb = blk_aio_get(_aio_em_aiocb_info, blk, cb, opaque); > > +acb->rwco = (BlkRwCo) { > > +.blk= blk, > > +.offset = offset, > > +.ret= NOT_DONE, > > +.zone_mgmt = { > > +.op = op, > > +}, > > +}; > > +acb->bytes = len; > > +acb->has_returned = false; > > + > > +co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb); > > +bdrv_coroutine_enter(blk_bs(blk), co); > > + > > +acb->has_returned = true; > > +if (acb->rwco.ret != NOT_DONE) { > > +replay_bh_schedule_oneshot_event(blk_get_aio_context(blk), >
Re: [PATCH v13 3/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
On Thu, 27 Oct 2022 at 11:46, Sam Li wrote: > > Add a new zoned_host_device BlockDriver. The zoned_host_device option > accepts only zoned host block devices. By adding zone management > operations in this new BlockDriver, users can use the new block > layer APIs including Report Zone and four zone management operations > (open, close, finish, reset, reset_all). > > Qemu-io uses the new APIs to perform zoned storage commands of the device: > zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs), > zone_finish(zf). > > For example, to test zone_report, use following command: > $ ./build/qemu-io --image-opts -n driver=zoned_host_device, > filename=/dev/nullb0 > -c "zrp offset nr_zones" > > Signed-off-by: Sam Li > Reviewed-by: Hannes Reinecke > --- > block/block-backend.c | 147 + > block/file-posix.c| 348 ++ > block/io.c| 41 > include/block/block-io.h | 7 + > include/block/block_int-common.h | 21 ++ > include/block/raw-aio.h | 6 +- > include/sysemu/block-backend-io.h | 18 ++ > meson.build | 4 + > qapi/block-core.json | 8 +- > qemu-io-cmds.c| 149 + > 10 files changed, 746 insertions(+), 3 deletions(-) > > diff --git a/block/block-backend.c b/block/block-backend.c > index aa4adf06ae..731f23e816 100644 > --- a/block/block-backend.c > +++ b/block/block-backend.c > @@ -1431,6 +1431,15 @@ typedef struct BlkRwCo { > void *iobuf; > int ret; > BdrvRequestFlags flags; > +union { > +struct { > +unsigned int *nr_zones; > +BlockZoneDescriptor *zones; > +} zone_report; > +struct { > +unsigned long op; > +} zone_mgmt; > +}; > } BlkRwCo; > > int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags) > @@ -1775,6 +1784,144 @@ int coroutine_fn blk_co_flush(BlockBackend *blk) > return ret; > } > > +static void coroutine_fn blk_aio_zone_report_entry(void *opaque) > +{ > +BlkAioEmAIOCB *acb = opaque; > +BlkRwCo *rwco = >rwco; > + > +rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset, > + rwco->zone_report.nr_zones, > + rwco->zone_report.zones); > +blk_aio_complete(acb); > +} > + > +BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset, > +unsigned int *nr_zones, > +BlockZoneDescriptor *zones, > +BlockCompletionFunc *cb, void *opaque) > +{ > +BlkAioEmAIOCB *acb; > +Coroutine *co; > +IO_CODE(); > + > +blk_inc_in_flight(blk); > +acb = blk_aio_get(_aio_em_aiocb_info, blk, cb, opaque); > +acb->rwco = (BlkRwCo) { > +.blk= blk, > +.offset = offset, > +.ret= NOT_DONE, > +.zone_report = { > +.zones = zones, > +.nr_zones = nr_zones, > +}, > +}; > +acb->has_returned = false; > + > +co = qemu_coroutine_create(blk_aio_zone_report_entry, acb); > +bdrv_coroutine_enter(blk_bs(blk), co); > + > +acb->has_returned = true; > +if (acb->rwco.ret != NOT_DONE) { > +replay_bh_schedule_oneshot_event(blk_get_aio_context(blk), > + blk_aio_complete_bh, acb); > +} > + > +return >common; > +} > + > +static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque) > +{ > +BlkAioEmAIOCB *acb = opaque; > +BlkRwCo *rwco = >rwco; > + > +rwco->ret = blk_co_zone_mgmt(rwco->blk, rwco->zone_mgmt.op, > + rwco->offset, acb->bytes); > +blk_aio_complete(acb); > +} > + > +BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op, > + int64_t offset, int64_t len, > + BlockCompletionFunc *cb, void *opaque) { > +BlkAioEmAIOCB *acb; > +Coroutine *co; > +IO_CODE(); > + > +blk_inc_in_flight(blk); > +acb = blk_aio_get(_aio_em_aiocb_info, blk, cb, opaque); > +acb->rwco = (BlkRwCo) { > +.blk= blk, > +.offset = offset, > +.ret= NOT_DONE, > +.zone_mgmt = { > +.op = op, > +}, > +}; > +acb->bytes = len; > +acb->has_returned = false; > + > +co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb); > +bdrv_coroutine_enter(blk_bs(blk), co); > + > +acb->has_returned = true; > +if (acb->rwco.ret != NOT_DONE) { > +replay_bh_schedule_oneshot_event(blk_get_aio_context(blk), > + blk_aio_complete_bh, acb); > +} > + > +return >common; > +} > + > +/* > + * Send a zone_report command. > + * offset is a byte offset from the start of the device. No alignment > + * required for offset. > + * nr_zones represents IN maximum and OUT actual. > + */ >
[PATCH v13 3/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
Add a new zoned_host_device BlockDriver. The zoned_host_device option accepts only zoned host block devices. By adding zone management operations in this new BlockDriver, users can use the new block layer APIs including Report Zone and four zone management operations (open, close, finish, reset, reset_all). Qemu-io uses the new APIs to perform zoned storage commands of the device: zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs), zone_finish(zf). For example, to test zone_report, use following command: $ ./build/qemu-io --image-opts -n driver=zoned_host_device, filename=/dev/nullb0 -c "zrp offset nr_zones" Signed-off-by: Sam Li Reviewed-by: Hannes Reinecke --- block/block-backend.c | 147 + block/file-posix.c| 348 ++ block/io.c| 41 include/block/block-io.h | 7 + include/block/block_int-common.h | 21 ++ include/block/raw-aio.h | 6 +- include/sysemu/block-backend-io.h | 18 ++ meson.build | 4 + qapi/block-core.json | 8 +- qemu-io-cmds.c| 149 + 10 files changed, 746 insertions(+), 3 deletions(-) diff --git a/block/block-backend.c b/block/block-backend.c index aa4adf06ae..731f23e816 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -1431,6 +1431,15 @@ typedef struct BlkRwCo { void *iobuf; int ret; BdrvRequestFlags flags; +union { +struct { +unsigned int *nr_zones; +BlockZoneDescriptor *zones; +} zone_report; +struct { +unsigned long op; +} zone_mgmt; +}; } BlkRwCo; int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags) @@ -1775,6 +1784,144 @@ int coroutine_fn blk_co_flush(BlockBackend *blk) return ret; } +static void coroutine_fn blk_aio_zone_report_entry(void *opaque) +{ +BlkAioEmAIOCB *acb = opaque; +BlkRwCo *rwco = >rwco; + +rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset, + rwco->zone_report.nr_zones, + rwco->zone_report.zones); +blk_aio_complete(acb); +} + +BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset, +unsigned int *nr_zones, +BlockZoneDescriptor *zones, +BlockCompletionFunc *cb, void *opaque) +{ +BlkAioEmAIOCB *acb; +Coroutine *co; +IO_CODE(); + +blk_inc_in_flight(blk); +acb = blk_aio_get(_aio_em_aiocb_info, blk, cb, opaque); +acb->rwco = (BlkRwCo) { +.blk= blk, +.offset = offset, +.ret= NOT_DONE, +.zone_report = { +.zones = zones, +.nr_zones = nr_zones, +}, +}; +acb->has_returned = false; + +co = qemu_coroutine_create(blk_aio_zone_report_entry, acb); +bdrv_coroutine_enter(blk_bs(blk), co); + +acb->has_returned = true; +if (acb->rwco.ret != NOT_DONE) { +replay_bh_schedule_oneshot_event(blk_get_aio_context(blk), + blk_aio_complete_bh, acb); +} + +return >common; +} + +static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque) +{ +BlkAioEmAIOCB *acb = opaque; +BlkRwCo *rwco = >rwco; + +rwco->ret = blk_co_zone_mgmt(rwco->blk, rwco->zone_mgmt.op, + rwco->offset, acb->bytes); +blk_aio_complete(acb); +} + +BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op, + int64_t offset, int64_t len, + BlockCompletionFunc *cb, void *opaque) { +BlkAioEmAIOCB *acb; +Coroutine *co; +IO_CODE(); + +blk_inc_in_flight(blk); +acb = blk_aio_get(_aio_em_aiocb_info, blk, cb, opaque); +acb->rwco = (BlkRwCo) { +.blk= blk, +.offset = offset, +.ret= NOT_DONE, +.zone_mgmt = { +.op = op, +}, +}; +acb->bytes = len; +acb->has_returned = false; + +co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb); +bdrv_coroutine_enter(blk_bs(blk), co); + +acb->has_returned = true; +if (acb->rwco.ret != NOT_DONE) { +replay_bh_schedule_oneshot_event(blk_get_aio_context(blk), + blk_aio_complete_bh, acb); +} + +return >common; +} + +/* + * Send a zone_report command. + * offset is a byte offset from the start of the device. No alignment + * required for offset. + * nr_zones represents IN maximum and OUT actual. + */ +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset, +unsigned int *nr_zones, +BlockZoneDescriptor *zones) +{ +int ret; +IO_CODE(); + +blk_inc_in_flight(blk); /* increase before waiting */ +blk_wait_while_drained(blk); +if