Re: [RFC PATCH 06/22] qemu-nbd: Use raw block driver for --offset

2020-08-19 Thread Eric Blake

On 8/13/20 11:29 AM, Kevin Wolf wrote:

Instead of implementing qemu-nbd --offset in the NBD code, just put a
raw block node with the requested offset on top of the user image and
rely on that doing the job.

This does not only simplify the nbd_export_new() interface and bring it
closer to the set of options that the nbd-server-add QMP command offers,
but in fact it also eliminates a potential source for bugs in the NBD
code which previously had to add the offset manually in all relevant
places.


Yay!  This patch alone is worth having, regardless of the fate of the 
rest of the series: no change in end-user functionality, but by making 
qemu-nbd turn it into proper syntactic sugar, we've reduced the 
maintenance burden of duplicated code.




Signed-off-by: Kevin Wolf 
---
  include/block/nbd.h |  4 ++--
  blockdev-nbd.c  |  9 +
  nbd/server.c| 34 +-
  qemu-nbd.c  | 27 ---
  4 files changed, 32 insertions(+), 42 deletions(-)




+++ b/nbd/server.c
@@ -89,7 +89,6 @@ struct NBDExport {
  BlockBackend *blk;
  char *name;
  char *description;
-uint64_t dev_offset;
  uint64_t size;


I'm trying to figure out if we can also drop 'size' here.  If we do, the 
consequence would be that an NBD client could ask for beyond-EOF I/O, 
and presumably the block layer would reject that gracefully (although 
not necessarily with the same errno as NBD currently reports).  I'm fine 
leaving it alone in this patch, though.



@@ -1569,7 +1574,7 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t 
dev_offset,
  exp->nbdflags |= (NBD_FLAG_SEND_TRIM | NBD_FLAG_SEND_WRITE_ZEROES |
NBD_FLAG_SEND_FAST_ZERO);
  }
-assert(size <= INT64_MAX - dev_offset);
+assert(size <= INT64_MAX);


As Max caught, this is now dead code.


@@ -2386,8 +2388,7 @@ static coroutine_fn int nbd_handle_request(NBDClient 
*client,
  if (request->flags & NBD_CMD_FLAG_FAST_ZERO) {
  flags |= BDRV_REQ_NO_FALLBACK;
  }
-ret = blk_pwrite_zeroes(exp->blk, request->from + exp->dev_offset,
-request->len, flags);
+ret = blk_pwrite_zeroes(exp->blk, request->from, request->len, flags);
  return nbd_send_generic_reply(client, request->handle, ret,
"writing to file failed", errp);
  
@@ -2401,8 +2402,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,

"flush failed", errp);
  
  case NBD_CMD_TRIM:

-ret = blk_co_pdiscard(exp->blk, request->from + exp->dev_offset,
-  request->len);
+ret = blk_co_pdiscard(exp->blk, request->from, request->len);


Merge conflicts with 890cbccb08; should be obvious enough to resolve, 
though.



+++ b/qemu-nbd.c
@@ -523,7 +523,6 @@ int main(int argc, char **argv)
  const char *port = NULL;
  char *sockpath = NULL;
  char *device = NULL;
-int64_t fd_size;
  QemuOpts *sn_opts = NULL;
  const char *sn_id_or_name = NULL;
  const char *sopt = "hVb:o:p:rsnc:dvk:e:f:tl:x:T:D:B:L";
@@ -1028,6 +1027,17 @@ int main(int argc, char **argv)
  }
  bs = blk_bs(blk);
  
+if (dev_offset) {

+QDict *raw_opts = qdict_new();
+qdict_put_str(raw_opts, "driver", "raw");
+qdict_put_str(raw_opts, "file", bs->node_name);
+qdict_put_int(raw_opts, "offset", dev_offset);


Huh.  When 0bc16997f5 got rid of the --partition option, it also got rid 
of the only way that the NBD driver could clamp down requests to a range 
smaller than the end of the file.  Now that you are adding a raw driver 
in the mix, that ability to clamp the end of the range (aka a --size 
option, in addition to an --offset option) may be worth reinstating. 
But that can be done as a separate patch, if at all (and whether 
qemu-nbd should do it, or qemu-storage-daemon, or whether we just point 
people at 'nbdkit --filter=partition', is part of that discussion).  But 
for this patch, it looks like you are making a straight-across conversion.



+bs = bdrv_open(NULL, NULL, raw_opts, flags, _fatal);
+blk_remove_bs(blk);
+blk_insert_bs(blk, bs, _fatal);
+bdrv_unref(bs);
+}
+


Slick.

Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [RFC PATCH 06/22] qemu-nbd: Use raw block driver for --offset

2020-08-18 Thread Nir Soffer
On Tue, Aug 18, 2020 at 11:47 AM Kevin Wolf  wrote:

> Am 17.08.2020 um 19:19 hat Nir Soffer geschrieben:
> > On Thu, Aug 13, 2020 at 7:36 PM Kevin Wolf  wrote:
> >
> > > Instead of implementing qemu-nbd --offset in the NBD code, just put a
> > > raw block node with the requested offset on top of the user image and
> > > rely on that doing the job.
> > >
> > > This does not only simplify the nbd_export_new() interface and bring it
> > > closer to the set of options that the nbd-server-add QMP command
> offers,
> > > but in fact it also eliminates a potential source for bugs in the NBD
> > > code which previously had to add the offset manually in all relevant
> > > places.
> > >
> >
> > Just to make sure I understand this correctly -
> >
> > qemu-nbd can work with:
> >
> > $ qemu-nbd 'json:{"driver": "file", "filename": "test.raw"}'
> >
> > And:
> >
> > $ qemu-nbd 'json:{"driver": "raw", "file": {"driver": "file",
> > "filename": "test.raw"}}'
> >
> > I assumed that we always create the raw node?
>
> No, the first form creates only the 'file' node without a 'raw' node on
> top. For all practical matters, this should be the same in qemu-img or
> qemu-nbd. For actually running VMs, omitting the 'raw' node where it's
> not needed can improve performance a little.
>

We did not check if we have different performance with the extra raw node.
Since in our use case (copying images) small reads/writes are unlikely, I
don't
think it will make a difference.

What is true is that if you use a filename without specifying the driver
> (i.e.  you rely on format probing), you'll get a 'raw' node on top of
> the 'file' node.
>
> > oVirt always uses the second form to make it easier to support offset,
> > size, and backing.
> >
> https://github.com/oVirt/ovirt-imageio/blob/2021164d064227d7c5e03c8da087adc66e3a577e/daemon/ovirt_imageio/_internal/qemu_nbd.py#L104
> >
> > This also seems to be the way libvirt builds the nodes using -blockdev.
>
> libvirt actually has a BZ to avoid the 'raw' node for performance when
> it's not needed.
>
> > Do we have a way to visualize the internal node graph used by
> > qemu-nbd/qemu-img?
>
> No, but as long as you explicitly specify the driver, you get exactly
> what you specified.
>

So this is not really needed then.


> For exploring what happens, you can pass the same json: filename to QEMU
> (maybe with -hda) and then use the monitor to inspect the state.
>
> Kevin
>
>


Re: [RFC PATCH 06/22] qemu-nbd: Use raw block driver for --offset

2020-08-18 Thread Kevin Wolf
Am 17.08.2020 um 19:19 hat Nir Soffer geschrieben:
> On Thu, Aug 13, 2020 at 7:36 PM Kevin Wolf  wrote:
> 
> > Instead of implementing qemu-nbd --offset in the NBD code, just put a
> > raw block node with the requested offset on top of the user image and
> > rely on that doing the job.
> >
> > This does not only simplify the nbd_export_new() interface and bring it
> > closer to the set of options that the nbd-server-add QMP command offers,
> > but in fact it also eliminates a potential source for bugs in the NBD
> > code which previously had to add the offset manually in all relevant
> > places.
> >
> 
> Just to make sure I understand this correctly -
> 
> qemu-nbd can work with:
> 
> $ qemu-nbd 'json:{"driver": "file", "filename": "test.raw"}'
> 
> And:
> 
> $ qemu-nbd 'json:{"driver": "raw", "file": {"driver": "file",
> "filename": "test.raw"}}'
> 
> I assumed that we always create the raw node?

No, the first form creates only the 'file' node without a 'raw' node on
top. For all practical matters, this should be the same in qemu-img or
qemu-nbd. For actually running VMs, omitting the 'raw' node where it's
not needed can improve performance a little.

What is true is that if you use a filename without specifying the driver
(i.e.  you rely on format probing), you'll get a 'raw' node on top of
the 'file' node.

> oVirt always uses the second form to make it easier to support offset,
> size, and backing.
> https://github.com/oVirt/ovirt-imageio/blob/2021164d064227d7c5e03c8da087adc66e3a577e/daemon/ovirt_imageio/_internal/qemu_nbd.py#L104
> 
> This also seems to be the way libvirt builds the nodes using -blockdev.

libvirt actually has a BZ to avoid the 'raw' node for performance when
it's not needed.

> Do we have a way to visualize the internal node graph used by
> qemu-nbd/qemu-img?

No, but as long as you explicitly specify the driver, you get exactly
what you specified.

For exploring what happens, you can pass the same json: filename to QEMU
(maybe with -hda) and then use the monitor to inspect the state.

Kevin




Re: [RFC PATCH 06/22] qemu-nbd: Use raw block driver for --offset

2020-08-17 Thread Nir Soffer
On Thu, Aug 13, 2020 at 7:36 PM Kevin Wolf  wrote:

> Instead of implementing qemu-nbd --offset in the NBD code, just put a
> raw block node with the requested offset on top of the user image and
> rely on that doing the job.
>
> This does not only simplify the nbd_export_new() interface and bring it
> closer to the set of options that the nbd-server-add QMP command offers,
> but in fact it also eliminates a potential source for bugs in the NBD
> code which previously had to add the offset manually in all relevant
> places.
>

Just to make sure I understand this correctly -

qemu-nbd can work with:

$ qemu-nbd 'json:{"driver": "file", "filename": "test.raw"}'

And:

$ qemu-nbd 'json:{"driver": "raw", "file": {"driver": "file",
"filename": "test.raw"}}'

I assumed that we always create the raw node?

oVirt always uses the second form to make it easier to support offset,
size, and backing.
https://github.com/oVirt/ovirt-imageio/blob/2021164d064227d7c5e03c8da087adc66e3a577e/daemon/ovirt_imageio/_internal/qemu_nbd.py#L104

This also seems to be the way libvirt builds the nodes using -blockdev.

Do we have a way to visualize the internal node graph used by
qemu-nbd/qemu-img?

Nir

Signed-off-by: Kevin Wolf 
> ---
>  include/block/nbd.h |  4 ++--
>  blockdev-nbd.c  |  9 +
>  nbd/server.c| 34 +-
>  qemu-nbd.c  | 27 ---
>  4 files changed, 32 insertions(+), 42 deletions(-)
>
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index c8c5cb6b61..3846d2bac8 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -329,8 +329,8 @@ typedef struct NBDExport NBDExport;
>  typedef struct NBDClient NBDClient;
>
>  BlockExport *nbd_export_create(BlockExportOptions *exp_args, Error
> **errp);
> -NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
> -  uint64_t size, const char *name, const char
> *desc,
> +NBDExport *nbd_export_new(BlockDriverState *bs,
> +  const char *name, const char *desc,
>const char *bitmap, bool readonly, bool shared,
>void (*close)(NBDExport *), bool writethrough,
>BlockBackend *on_eject_blk, Error **errp);
> diff --git a/blockdev-nbd.c b/blockdev-nbd.c
> index a1dc11bdd7..16cda3b052 100644
> --- a/blockdev-nbd.c
> +++ b/blockdev-nbd.c
> @@ -154,7 +154,6 @@ BlockExport *nbd_export_create(BlockExportOptions
> *exp_args, Error **errp)
>  BlockDriverState *bs = NULL;
>  BlockBackend *on_eject_blk;
>  NBDExport *exp = NULL;
> -int64_t len;
>  AioContext *aio_context;
>
>  assert(exp_args->type == BLOCK_EXPORT_TYPE_NBD);
> @@ -192,12 +191,6 @@ BlockExport *nbd_export_create(BlockExportOptions
> *exp_args, Error **errp)
>
>  aio_context = bdrv_get_aio_context(bs);
>  aio_context_acquire(aio_context);
> -len = bdrv_getlength(bs);
> -if (len < 0) {
> -error_setg_errno(errp, -len,
> - "Failed to determine the NBD export's length");
> -goto out;
> -}
>
>  if (!arg->has_writable) {
>  arg->writable = false;
> @@ -206,7 +199,7 @@ BlockExport *nbd_export_create(BlockExportOptions
> *exp_args, Error **errp)
>  arg->writable = false;
>  }
>
> -exp = nbd_export_new(bs, 0, len, arg->name, arg->description,
> arg->bitmap,
> +exp = nbd_export_new(bs, arg->name, arg->description, arg->bitmap,
>   !arg->writable, !arg->writable,
>   NULL, false, on_eject_blk, errp);
>  if (!exp) {
> diff --git a/nbd/server.c b/nbd/server.c
> index 774325dbe5..92360d1f08 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -89,7 +89,6 @@ struct NBDExport {
>  BlockBackend *blk;
>  char *name;
>  char *description;
> -uint64_t dev_offset;
>  uint64_t size;
>  uint16_t nbdflags;
>  QTAILQ_HEAD(, NBDClient) clients;
> @@ -1507,8 +1506,8 @@ static void nbd_eject_notifier(Notifier *n, void
> *data)
>  aio_context_release(aio_context);
>  }
>
> -NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
> -  uint64_t size, const char *name, const char
> *desc,
> +NBDExport *nbd_export_new(BlockDriverState *bs,
> +  const char *name, const char *desc,
>const char *bitmap, bool readonly, bool shared,
>void (*close)(NBDExport *), bool writethrough,
>BlockBackend *on_eject_blk, Error **errp)
> @@ -1516,9 +1515,17 @@ NBDExport *nbd_export_new(BlockDriverState *bs,
> uint64_t dev_offset,
>  AioContext *ctx;
>  BlockBackend *blk;
>  NBDExport *exp;
> +int64_t size;
>  uint64_t perm;
>  int ret;
>
> +size = bdrv_getlength(bs);
> +if (size < 0) {
> +error_setg_errno(errp, -size,
> + "Failed to 

Re: [RFC PATCH 06/22] qemu-nbd: Use raw block driver for --offset

2020-08-17 Thread Max Reitz
On 13.08.20 18:29, Kevin Wolf wrote:
> Instead of implementing qemu-nbd --offset in the NBD code, just put a
> raw block node with the requested offset on top of the user image and
> rely on that doing the job.
> 
> This does not only simplify the nbd_export_new() interface and bring it
> closer to the set of options that the nbd-server-add QMP command offers,
> but in fact it also eliminates a potential source for bugs in the NBD
> code which previously had to add the offset manually in all relevant
> places.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/nbd.h |  4 ++--
>  blockdev-nbd.c  |  9 +
>  nbd/server.c| 34 +-
>  qemu-nbd.c  | 27 ---
>  4 files changed, 32 insertions(+), 42 deletions(-)

[...]

> diff --git a/nbd/server.c b/nbd/server.c
> index 774325dbe5..92360d1f08 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c

[...]

> @@ -1569,7 +1574,7 @@ NBDExport *nbd_export_new(BlockDriverState *bs, 
> uint64_t dev_offset,
>  exp->nbdflags |= (NBD_FLAG_SEND_TRIM | NBD_FLAG_SEND_WRITE_ZEROES |
>NBD_FLAG_SEND_FAST_ZERO);
>  }
> -assert(size <= INT64_MAX - dev_offset);
> +assert(size <= INT64_MAX);

Forgot to note: I think we can drop this assertion altogether now.

Max



signature.asc
Description: OpenPGP digital signature


Re: [RFC PATCH 06/22] qemu-nbd: Use raw block driver for --offset

2020-08-17 Thread Max Reitz
On 13.08.20 18:29, Kevin Wolf wrote:
> Instead of implementing qemu-nbd --offset in the NBD code, just put a
> raw block node with the requested offset on top of the user image and
> rely on that doing the job.
> 
> This does not only simplify the nbd_export_new() interface and bring it
> closer to the set of options that the nbd-server-add QMP command offers,
> but in fact it also eliminates a potential source for bugs in the NBD
> code which previously had to add the offset manually in all relevant
> places.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/nbd.h |  4 ++--
>  blockdev-nbd.c  |  9 +
>  nbd/server.c| 34 +-
>  qemu-nbd.c  | 27 ---
>  4 files changed, 32 insertions(+), 42 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


[RFC PATCH 06/22] qemu-nbd: Use raw block driver for --offset

2020-08-13 Thread Kevin Wolf
Instead of implementing qemu-nbd --offset in the NBD code, just put a
raw block node with the requested offset on top of the user image and
rely on that doing the job.

This does not only simplify the nbd_export_new() interface and bring it
closer to the set of options that the nbd-server-add QMP command offers,
but in fact it also eliminates a potential source for bugs in the NBD
code which previously had to add the offset manually in all relevant
places.

Signed-off-by: Kevin Wolf 
---
 include/block/nbd.h |  4 ++--
 blockdev-nbd.c  |  9 +
 nbd/server.c| 34 +-
 qemu-nbd.c  | 27 ---
 4 files changed, 32 insertions(+), 42 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index c8c5cb6b61..3846d2bac8 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -329,8 +329,8 @@ typedef struct NBDExport NBDExport;
 typedef struct NBDClient NBDClient;
 
 BlockExport *nbd_export_create(BlockExportOptions *exp_args, Error **errp);
-NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
-  uint64_t size, const char *name, const char *desc,
+NBDExport *nbd_export_new(BlockDriverState *bs,
+  const char *name, const char *desc,
   const char *bitmap, bool readonly, bool shared,
   void (*close)(NBDExport *), bool writethrough,
   BlockBackend *on_eject_blk, Error **errp);
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index a1dc11bdd7..16cda3b052 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -154,7 +154,6 @@ BlockExport *nbd_export_create(BlockExportOptions 
*exp_args, Error **errp)
 BlockDriverState *bs = NULL;
 BlockBackend *on_eject_blk;
 NBDExport *exp = NULL;
-int64_t len;
 AioContext *aio_context;
 
 assert(exp_args->type == BLOCK_EXPORT_TYPE_NBD);
@@ -192,12 +191,6 @@ BlockExport *nbd_export_create(BlockExportOptions 
*exp_args, Error **errp)
 
 aio_context = bdrv_get_aio_context(bs);
 aio_context_acquire(aio_context);
-len = bdrv_getlength(bs);
-if (len < 0) {
-error_setg_errno(errp, -len,
- "Failed to determine the NBD export's length");
-goto out;
-}
 
 if (!arg->has_writable) {
 arg->writable = false;
@@ -206,7 +199,7 @@ BlockExport *nbd_export_create(BlockExportOptions 
*exp_args, Error **errp)
 arg->writable = false;
 }
 
-exp = nbd_export_new(bs, 0, len, arg->name, arg->description, arg->bitmap,
+exp = nbd_export_new(bs, arg->name, arg->description, arg->bitmap,
  !arg->writable, !arg->writable,
  NULL, false, on_eject_blk, errp);
 if (!exp) {
diff --git a/nbd/server.c b/nbd/server.c
index 774325dbe5..92360d1f08 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -89,7 +89,6 @@ struct NBDExport {
 BlockBackend *blk;
 char *name;
 char *description;
-uint64_t dev_offset;
 uint64_t size;
 uint16_t nbdflags;
 QTAILQ_HEAD(, NBDClient) clients;
@@ -1507,8 +1506,8 @@ static void nbd_eject_notifier(Notifier *n, void *data)
 aio_context_release(aio_context);
 }
 
-NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
-  uint64_t size, const char *name, const char *desc,
+NBDExport *nbd_export_new(BlockDriverState *bs,
+  const char *name, const char *desc,
   const char *bitmap, bool readonly, bool shared,
   void (*close)(NBDExport *), bool writethrough,
   BlockBackend *on_eject_blk, Error **errp)
@@ -1516,9 +1515,17 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t 
dev_offset,
 AioContext *ctx;
 BlockBackend *blk;
 NBDExport *exp;
+int64_t size;
 uint64_t perm;
 int ret;
 
+size = bdrv_getlength(bs);
+if (size < 0) {
+error_setg_errno(errp, -size,
+ "Failed to determine the NBD export's length");
+return NULL;
+}
+
 exp = g_new0(NBDExport, 1);
 exp->common = (BlockExport) {
 .drv = _exp_nbd,
@@ -1553,8 +1560,6 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t 
dev_offset,
 exp->refcount = 1;
 QTAILQ_INIT(>clients);
 exp->blk = blk;
-assert(dev_offset <= INT64_MAX);
-exp->dev_offset = dev_offset;
 exp->name = g_strdup(name);
 assert(!desc || strlen(desc) <= NBD_MAX_STRING_SIZE);
 exp->description = g_strdup(desc);
@@ -1569,7 +1574,7 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t 
dev_offset,
 exp->nbdflags |= (NBD_FLAG_SEND_TRIM | NBD_FLAG_SEND_WRITE_ZEROES |
   NBD_FLAG_SEND_FAST_ZERO);
 }
-assert(size <= INT64_MAX - dev_offset);
+assert(size <= INT64_MAX);
 exp->size = QEMU_ALIGN_DOWN(size, BDRV_SECTOR_SIZE);
 
 if (bitmap) {
@@ -1928,8