Re: [PATCH v13 3/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-12-05 Thread Stefan Hajnoczi
On Mon, Dec 05, 2022 at 09:24:28PM +0800, Sam Li wrote:
> Stefan Hajnoczi  于2022年12月5日周一 20:20写道:
> >
> > On Wed, Nov 30, 2022 at 10:24:10AM +0800, Sam Li wrote:
> > > Stefan Hajnoczi  于2022年11月30日周三 10:01写道:
> > > > On Thu, 27 Oct 2022 at 11:46, Sam Li  wrote:
> > > > > @@ -1374,9 +1428,11 @@ static int 
> > > > > hdev_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
> > > > >  int ret;
> > > > >
> > > > >  /* If DASD, get blocksizes */
> > > > > +#ifndef CONFIG_BLKZONED
> > > > >  if (check_for_dasd(s->fd) < 0) {
> > > > >  return -ENOTSUP;
> > > > >  }
> > > > > +#endif
> > > >
> > > > What is the purpose of this #ifndef? .bdrv_probe_blocksizes() should
> > > > only return block sizes for s390 DASD devices. I don't think zoned
> > > > storage needs block size probing here.
> > >
> > > Zoned storage needs to be virtualized with the correct physical block
> > > size and logical block size. And the probing here can guarantee that.
> > > Or virtio-blk may send wrong block size to the guest. If manually set
> > > block size in the command line as before, it is somewhat inaccurate.
> >
> > I see. I/O won't work if the guest block size differs from the physical
> > zoned device's block size.
> >
> > However, we must not do this for regular host_device BlockDriverStates.
> > The block size is manually controlled from those devices and defaults to
> > 512B. That way the blocksize doesn't change across live migration and
> > break the guest.
> >
> > Please use a run-time check instead of an #ifdef. Only probe blocksizes
> > for dasd and zoned devices.
> 
> I see. Like this?
> 
> #ifndef CONFIG_BLKZONED
> static int hdev_probe_zbd_blocksizes(BlockDriverState *bs, BlockSizes *bsz){
> int ret;
> /* check zbd */
> ...
> /* probe zbd */
> }
> +#endif

Yes, I think that's the cleanest option.

You don't need to check if it's a ZBD. Only the zoned_host_device driver
will use this .bdrv_probe_blocksizes() callback. The regular host_device
driver will use hdev_probe_blocksizes().

Stefan


signature.asc
Description: PGP signature


Re: [PATCH v13 3/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-12-05 Thread Sam Li
Stefan Hajnoczi  于2022年12月5日周一 20:20写道:
>
> On Wed, Nov 30, 2022 at 10:24:10AM +0800, Sam Li wrote:
> > Stefan Hajnoczi  于2022年11月30日周三 10:01写道:
> > > On Thu, 27 Oct 2022 at 11:46, Sam Li  wrote:
> > > > @@ -1374,9 +1428,11 @@ static int 
> > > > hdev_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
> > > >  int ret;
> > > >
> > > >  /* If DASD, get blocksizes */
> > > > +#ifndef CONFIG_BLKZONED
> > > >  if (check_for_dasd(s->fd) < 0) {
> > > >  return -ENOTSUP;
> > > >  }
> > > > +#endif
> > >
> > > What is the purpose of this #ifndef? .bdrv_probe_blocksizes() should
> > > only return block sizes for s390 DASD devices. I don't think zoned
> > > storage needs block size probing here.
> >
> > Zoned storage needs to be virtualized with the correct physical block
> > size and logical block size. And the probing here can guarantee that.
> > Or virtio-blk may send wrong block size to the guest. If manually set
> > block size in the command line as before, it is somewhat inaccurate.
>
> I see. I/O won't work if the guest block size differs from the physical
> zoned device's block size.
>
> However, we must not do this for regular host_device BlockDriverStates.
> The block size is manually controlled from those devices and defaults to
> 512B. That way the blocksize doesn't change across live migration and
> break the guest.
>
> Please use a run-time check instead of an #ifdef. Only probe blocksizes
> for dasd and zoned devices.

I see. Like this?

#ifndef CONFIG_BLKZONED
static int hdev_probe_zbd_blocksizes(BlockDriverState *bs, BlockSizes *bsz){
int ret;
/* check zbd */
...
/* probe zbd */
}
+#endif



Re: [PATCH v13 3/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-12-05 Thread Stefan Hajnoczi
On Wed, Nov 30, 2022 at 10:24:10AM +0800, Sam Li wrote:
> Stefan Hajnoczi  于2022年11月30日周三 10:01写道:
> > On Thu, 27 Oct 2022 at 11:46, Sam Li  wrote:
> > > @@ -1374,9 +1428,11 @@ static int hdev_probe_blocksizes(BlockDriverState 
> > > *bs, BlockSizes *bsz)
> > >  int ret;
> > >
> > >  /* If DASD, get blocksizes */
> > > +#ifndef CONFIG_BLKZONED
> > >  if (check_for_dasd(s->fd) < 0) {
> > >  return -ENOTSUP;
> > >  }
> > > +#endif
> >
> > What is the purpose of this #ifndef? .bdrv_probe_blocksizes() should
> > only return block sizes for s390 DASD devices. I don't think zoned
> > storage needs block size probing here.
> 
> Zoned storage needs to be virtualized with the correct physical block
> size and logical block size. And the probing here can guarantee that.
> Or virtio-blk may send wrong block size to the guest. If manually set
> block size in the command line as before, it is somewhat inaccurate.

I see. I/O won't work if the guest block size differs from the physical
zoned device's block size.

However, we must not do this for regular host_device BlockDriverStates.
The block size is manually controlled from those devices and defaults to
512B. That way the blocksize doesn't change across live migration and
break the guest.

Please use a run-time check instead of an #ifdef. Only probe blocksizes
for dasd and zoned devices.

Stefan


signature.asc
Description: PGP signature


Re: [PATCH v13 3/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-11-29 Thread Sam Li
Stefan Hajnoczi  于2022年11月30日周三 10:01写道:
>
> On Thu, 27 Oct 2022 at 11:46, Sam Li  wrote:
> >
> > Add a new zoned_host_device BlockDriver. The zoned_host_device option
> > accepts only zoned host block devices. By adding zone management
> > operations in this new BlockDriver, users can use the new block
> > layer APIs including Report Zone and four zone management operations
> > (open, close, finish, reset, reset_all).
> >
> > Qemu-io uses the new APIs to perform zoned storage commands of the device:
> > zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs),
> > zone_finish(zf).
> >
> > For example, to test zone_report, use following command:
> > $ ./build/qemu-io --image-opts -n driver=zoned_host_device, 
> > filename=/dev/nullb0
> > -c "zrp offset nr_zones"
> >
> > Signed-off-by: Sam Li 
> > Reviewed-by: Hannes Reinecke 
> > ---
> >  block/block-backend.c | 147 +
> >  block/file-posix.c| 348 ++
> >  block/io.c|  41 
> >  include/block/block-io.h  |   7 +
> >  include/block/block_int-common.h  |  21 ++
> >  include/block/raw-aio.h   |   6 +-
> >  include/sysemu/block-backend-io.h |  18 ++
> >  meson.build   |   4 +
> >  qapi/block-core.json  |   8 +-
> >  qemu-io-cmds.c| 149 +
> >  10 files changed, 746 insertions(+), 3 deletions(-)
> >
> > diff --git a/block/block-backend.c b/block/block-backend.c
> > index aa4adf06ae..731f23e816 100644
> > --- a/block/block-backend.c
> > +++ b/block/block-backend.c
> > @@ -1431,6 +1431,15 @@ typedef struct BlkRwCo {
> >  void *iobuf;
> >  int ret;
> >  BdrvRequestFlags flags;
> > +union {
> > +struct {
> > +unsigned int *nr_zones;
> > +BlockZoneDescriptor *zones;
> > +} zone_report;
> > +struct {
> > +unsigned long op;
> > +} zone_mgmt;
> > +};
> >  } BlkRwCo;
> >
> >  int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
> > @@ -1775,6 +1784,144 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
> >  return ret;
> >  }
> >
> > +static void coroutine_fn blk_aio_zone_report_entry(void *opaque)
> > +{
> > +BlkAioEmAIOCB *acb = opaque;
> > +BlkRwCo *rwco = >rwco;
> > +
> > +rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset,
> > +   rwco->zone_report.nr_zones,
> > +   rwco->zone_report.zones);
> > +blk_aio_complete(acb);
> > +}
> > +
> > +BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
> > +unsigned int *nr_zones,
> > +BlockZoneDescriptor  *zones,
> > +BlockCompletionFunc *cb, void *opaque)
> > +{
> > +BlkAioEmAIOCB *acb;
> > +Coroutine *co;
> > +IO_CODE();
> > +
> > +blk_inc_in_flight(blk);
> > +acb = blk_aio_get(_aio_em_aiocb_info, blk, cb, opaque);
> > +acb->rwco = (BlkRwCo) {
> > +.blk= blk,
> > +.offset = offset,
> > +.ret= NOT_DONE,
> > +.zone_report = {
> > +.zones = zones,
> > +.nr_zones = nr_zones,
> > +},
> > +};
> > +acb->has_returned = false;
> > +
> > +co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
> > +bdrv_coroutine_enter(blk_bs(blk), co);
> > +
> > +acb->has_returned = true;
> > +if (acb->rwco.ret != NOT_DONE) {
> > +replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
> > + blk_aio_complete_bh, acb);
> > +}
> > +
> > +return >common;
> > +}
> > +
> > +static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque)
> > +{
> > +BlkAioEmAIOCB *acb = opaque;
> > +BlkRwCo *rwco = >rwco;
> > +
> > +rwco->ret = blk_co_zone_mgmt(rwco->blk, rwco->zone_mgmt.op,
> > + rwco->offset, acb->bytes);
> > +blk_aio_complete(acb);
> > +}
> > +
> > +BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> > +  int64_t offset, int64_t len,
> > +  BlockCompletionFunc *cb, void *opaque) {
> > +BlkAioEmAIOCB *acb;
> > +Coroutine *co;
> > +IO_CODE();
> > +
> > +blk_inc_in_flight(blk);
> > +acb = blk_aio_get(_aio_em_aiocb_info, blk, cb, opaque);
> > +acb->rwco = (BlkRwCo) {
> > +.blk= blk,
> > +.offset = offset,
> > +.ret= NOT_DONE,
> > +.zone_mgmt = {
> > +.op = op,
> > +},
> > +};
> > +acb->bytes = len;
> > +acb->has_returned = false;
> > +
> > +co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
> > +bdrv_coroutine_enter(blk_bs(blk), co);
> > +
> > +acb->has_returned = true;
> > +if (acb->rwco.ret != NOT_DONE) {
> > +replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
> 

Re: [PATCH v13 3/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-11-29 Thread Stefan Hajnoczi
On Thu, 27 Oct 2022 at 11:46, Sam Li  wrote:
>
> Add a new zoned_host_device BlockDriver. The zoned_host_device option
> accepts only zoned host block devices. By adding zone management
> operations in this new BlockDriver, users can use the new block
> layer APIs including Report Zone and four zone management operations
> (open, close, finish, reset, reset_all).
>
> Qemu-io uses the new APIs to perform zoned storage commands of the device:
> zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs),
> zone_finish(zf).
>
> For example, to test zone_report, use following command:
> $ ./build/qemu-io --image-opts -n driver=zoned_host_device, 
> filename=/dev/nullb0
> -c "zrp offset nr_zones"
>
> Signed-off-by: Sam Li 
> Reviewed-by: Hannes Reinecke 
> ---
>  block/block-backend.c | 147 +
>  block/file-posix.c| 348 ++
>  block/io.c|  41 
>  include/block/block-io.h  |   7 +
>  include/block/block_int-common.h  |  21 ++
>  include/block/raw-aio.h   |   6 +-
>  include/sysemu/block-backend-io.h |  18 ++
>  meson.build   |   4 +
>  qapi/block-core.json  |   8 +-
>  qemu-io-cmds.c| 149 +
>  10 files changed, 746 insertions(+), 3 deletions(-)
>
> diff --git a/block/block-backend.c b/block/block-backend.c
> index aa4adf06ae..731f23e816 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1431,6 +1431,15 @@ typedef struct BlkRwCo {
>  void *iobuf;
>  int ret;
>  BdrvRequestFlags flags;
> +union {
> +struct {
> +unsigned int *nr_zones;
> +BlockZoneDescriptor *zones;
> +} zone_report;
> +struct {
> +unsigned long op;
> +} zone_mgmt;
> +};
>  } BlkRwCo;
>
>  int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
> @@ -1775,6 +1784,144 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
>  return ret;
>  }
>
> +static void coroutine_fn blk_aio_zone_report_entry(void *opaque)
> +{
> +BlkAioEmAIOCB *acb = opaque;
> +BlkRwCo *rwco = >rwco;
> +
> +rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset,
> +   rwco->zone_report.nr_zones,
> +   rwco->zone_report.zones);
> +blk_aio_complete(acb);
> +}
> +
> +BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
> +unsigned int *nr_zones,
> +BlockZoneDescriptor  *zones,
> +BlockCompletionFunc *cb, void *opaque)
> +{
> +BlkAioEmAIOCB *acb;
> +Coroutine *co;
> +IO_CODE();
> +
> +blk_inc_in_flight(blk);
> +acb = blk_aio_get(_aio_em_aiocb_info, blk, cb, opaque);
> +acb->rwco = (BlkRwCo) {
> +.blk= blk,
> +.offset = offset,
> +.ret= NOT_DONE,
> +.zone_report = {
> +.zones = zones,
> +.nr_zones = nr_zones,
> +},
> +};
> +acb->has_returned = false;
> +
> +co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
> +bdrv_coroutine_enter(blk_bs(blk), co);
> +
> +acb->has_returned = true;
> +if (acb->rwco.ret != NOT_DONE) {
> +replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
> + blk_aio_complete_bh, acb);
> +}
> +
> +return >common;
> +}
> +
> +static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque)
> +{
> +BlkAioEmAIOCB *acb = opaque;
> +BlkRwCo *rwco = >rwco;
> +
> +rwco->ret = blk_co_zone_mgmt(rwco->blk, rwco->zone_mgmt.op,
> + rwco->offset, acb->bytes);
> +blk_aio_complete(acb);
> +}
> +
> +BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> +  int64_t offset, int64_t len,
> +  BlockCompletionFunc *cb, void *opaque) {
> +BlkAioEmAIOCB *acb;
> +Coroutine *co;
> +IO_CODE();
> +
> +blk_inc_in_flight(blk);
> +acb = blk_aio_get(_aio_em_aiocb_info, blk, cb, opaque);
> +acb->rwco = (BlkRwCo) {
> +.blk= blk,
> +.offset = offset,
> +.ret= NOT_DONE,
> +.zone_mgmt = {
> +.op = op,
> +},
> +};
> +acb->bytes = len;
> +acb->has_returned = false;
> +
> +co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
> +bdrv_coroutine_enter(blk_bs(blk), co);
> +
> +acb->has_returned = true;
> +if (acb->rwco.ret != NOT_DONE) {
> +replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
> + blk_aio_complete_bh, acb);
> +}
> +
> +return >common;
> +}
> +
> +/*
> + * Send a zone_report command.
> + * offset is a byte offset from the start of the device. No alignment
> + * required for offset.
> + * nr_zones represents IN maximum and OUT actual.
> + */
> 

[PATCH v13 3/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-10-27 Thread Sam Li
Add a new zoned_host_device BlockDriver. The zoned_host_device option
accepts only zoned host block devices. By adding zone management
operations in this new BlockDriver, users can use the new block
layer APIs including Report Zone and four zone management operations
(open, close, finish, reset, reset_all).

Qemu-io uses the new APIs to perform zoned storage commands of the device:
zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs),
zone_finish(zf).

For example, to test zone_report, use following command:
$ ./build/qemu-io --image-opts -n driver=zoned_host_device, filename=/dev/nullb0
-c "zrp offset nr_zones"

Signed-off-by: Sam Li 
Reviewed-by: Hannes Reinecke 
---
 block/block-backend.c | 147 +
 block/file-posix.c| 348 ++
 block/io.c|  41 
 include/block/block-io.h  |   7 +
 include/block/block_int-common.h  |  21 ++
 include/block/raw-aio.h   |   6 +-
 include/sysemu/block-backend-io.h |  18 ++
 meson.build   |   4 +
 qapi/block-core.json  |   8 +-
 qemu-io-cmds.c| 149 +
 10 files changed, 746 insertions(+), 3 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index aa4adf06ae..731f23e816 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1431,6 +1431,15 @@ typedef struct BlkRwCo {
 void *iobuf;
 int ret;
 BdrvRequestFlags flags;
+union {
+struct {
+unsigned int *nr_zones;
+BlockZoneDescriptor *zones;
+} zone_report;
+struct {
+unsigned long op;
+} zone_mgmt;
+};
 } BlkRwCo;
 
 int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
@@ -1775,6 +1784,144 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
 return ret;
 }
 
+static void coroutine_fn blk_aio_zone_report_entry(void *opaque)
+{
+BlkAioEmAIOCB *acb = opaque;
+BlkRwCo *rwco = >rwco;
+
+rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset,
+   rwco->zone_report.nr_zones,
+   rwco->zone_report.zones);
+blk_aio_complete(acb);
+}
+
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
+unsigned int *nr_zones,
+BlockZoneDescriptor  *zones,
+BlockCompletionFunc *cb, void *opaque)
+{
+BlkAioEmAIOCB *acb;
+Coroutine *co;
+IO_CODE();
+
+blk_inc_in_flight(blk);
+acb = blk_aio_get(_aio_em_aiocb_info, blk, cb, opaque);
+acb->rwco = (BlkRwCo) {
+.blk= blk,
+.offset = offset,
+.ret= NOT_DONE,
+.zone_report = {
+.zones = zones,
+.nr_zones = nr_zones,
+},
+};
+acb->has_returned = false;
+
+co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
+bdrv_coroutine_enter(blk_bs(blk), co);
+
+acb->has_returned = true;
+if (acb->rwco.ret != NOT_DONE) {
+replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+ blk_aio_complete_bh, acb);
+}
+
+return >common;
+}
+
+static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque)
+{
+BlkAioEmAIOCB *acb = opaque;
+BlkRwCo *rwco = >rwco;
+
+rwco->ret = blk_co_zone_mgmt(rwco->blk, rwco->zone_mgmt.op,
+ rwco->offset, acb->bytes);
+blk_aio_complete(acb);
+}
+
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+  int64_t offset, int64_t len,
+  BlockCompletionFunc *cb, void *opaque) {
+BlkAioEmAIOCB *acb;
+Coroutine *co;
+IO_CODE();
+
+blk_inc_in_flight(blk);
+acb = blk_aio_get(_aio_em_aiocb_info, blk, cb, opaque);
+acb->rwco = (BlkRwCo) {
+.blk= blk,
+.offset = offset,
+.ret= NOT_DONE,
+.zone_mgmt = {
+.op = op,
+},
+};
+acb->bytes = len;
+acb->has_returned = false;
+
+co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
+bdrv_coroutine_enter(blk_bs(blk), co);
+
+acb->has_returned = true;
+if (acb->rwco.ret != NOT_DONE) {
+replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+ blk_aio_complete_bh, acb);
+}
+
+return >common;
+}
+
+/*
+ * Send a zone_report command.
+ * offset is a byte offset from the start of the device. No alignment
+ * required for offset.
+ * nr_zones represents IN maximum and OUT actual.
+ */
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
+unsigned int *nr_zones,
+BlockZoneDescriptor *zones)
+{
+int ret;
+IO_CODE();
+
+blk_inc_in_flight(blk); /* increase before waiting */
+blk_wait_while_drained(blk);
+if