Re: [Qemu-block] [PATCH 2/3] xen-disk: add support for multi-page shared rings

2017-06-20 Thread Stefano Stabellini
On Tue, 20 Jun 2017, Paul Durrant wrote:
> The blkif protocol has had provision for negotiation of multi-page shared
> rings for some time now and many guest OSes have support in their frontend
> drivers.
> 
> This patch makes the necessary modifications to xen-disk to support a shared
> ring up to order 4 (i.e. 16 pages).
> 
> Signed-off-by: Paul Durrant 

Thanks for the patch!

> ---
> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> ---
>  hw/block/xen_disk.c | 141 
> 
>  1 file changed, 110 insertions(+), 31 deletions(-)
> 
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 9b06e3aa81..a9942d32db 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -36,8 +36,6 @@
>  
>  static int batch_maps   = 0;
>  
> -static int max_requests = 32;
> -
>  /* - */
>  
>  #define BLOCK_SIZE  512
> @@ -84,6 +82,8 @@ struct ioreq {
>  BlockAcctCookie acct;
>  };
>  
> +#define MAX_RING_PAGE_ORDER 4
> +
>  struct XenBlkDev {
>  struct XenDevicexendev;  /* must be first */
>  char*params;
> @@ -94,7 +94,8 @@ struct XenBlkDev {
>  booldirectiosafe;
>  const char  *fileproto;
>  const char  *filename;
> -int ring_ref;
> +unsigned intring_ref[1 << MAX_RING_PAGE_ORDER];
> +unsigned intnr_ring_ref;
>  void*sring;
>  int64_t file_blk;
>  int64_t file_size;
> @@ -110,6 +111,7 @@ struct XenBlkDev {
>  int requests_total;
>  int requests_inflight;
>  int requests_finished;
> +unsigned intmax_requests;
>  
>  /* Persistent grants extension */
>  gbooleanfeature_discard;
> @@ -199,7 +201,7 @@ static struct ioreq *ioreq_start(struct XenBlkDev *blkdev)
>  struct ioreq *ioreq = NULL;
>  
>  if (QLIST_EMPTY(&blkdev->freelist)) {
> -if (blkdev->requests_total >= max_requests) {
> +if (blkdev->requests_total >= blkdev->max_requests) {
>  goto out;
>  }
>  /* allocate new struct */
> @@ -905,7 +907,7 @@ static void blk_handle_requests(struct XenBlkDev *blkdev)
>  ioreq_runio_qemu_aio(ioreq);
>  }
>  
> -if (blkdev->more_work && blkdev->requests_inflight < max_requests) {
> +if (blkdev->more_work && blkdev->requests_inflight < blkdev->max_requests) {
>  qemu_bh_schedule(blkdev->bh);
>  }
>  }
> @@ -918,15 +920,6 @@ static void blk_bh(void *opaque)
>  blk_handle_requests(blkdev);
>  }
>  
> -/*
> - * We need to account for the grant allocations requiring contiguous
> - * chunks; the worst case number would be
> - * max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
> - * but in order to keep things simple just use
> - * 2 * max_req * max_seg.
> - */
> -#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
> -
>  static void blk_alloc(struct XenDevice *xendev)
>  {
>  struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, 
> xendev);
> @@ -938,11 +931,6 @@ static void blk_alloc(struct XenDevice *xendev)
>  if (xen_mode != XEN_EMULATE) {
>  batch_maps = 1;
>  }
> -if (xengnttab_set_max_grants(xendev->gnttabdev,
> -MAX_GRANTS(max_requests, BLKIF_MAX_SEGMENTS_PER_REQUEST)) < 0) {
> -xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
> -  strerror(errno));
> -}
>  }
>  
>  static void blk_parse_discard(struct XenBlkDev *blkdev)
> @@ -1037,6 +1025,9 @@ static int blk_init(struct XenDevice *xendev)
>!blkdev->feature_grant_copy);
>  xenstore_write_be_int(>xendev, "info", info);
>  
> +xenstore_write_be_int(>xendev, "max-ring-page-order",
> +  MAX_RING_PAGE_ORDER);
> +
>  blk_parse_discard(blkdev);
>  
>  g_free(directiosafe);
> @@ -1058,12 +1049,25 @@ out_error:
>  return -1;
>  }
>  
> +/*
> + * We need to account for the grant allocations requiring contiguous
> + * chunks; the worst case number would be
> + * max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
> + * but in order to keep things simple just use
> + * 2 * max_req * max_seg.
> + */
> +#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
> +
>  static int blk_connect(struct XenDevice *xendev)
>  {
>  struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, 
> xendev);
>  int pers, index, qflags;
>  bool readonly = true;
>  bool writethrough = true;
> +int order, ring_ref;
> +unsigned int ring_size, max_grants;
> +unsigned int i;
> +uint32_t *domids;
>  
>  /* read-only ? */
>  if (blkdev->directiosafe) {
> 
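The grant accounting and ring geometry discussed in the patch above can be sanity-checked numerically. The following is a standalone sketch, not QEMU code: `max_grants_exact`/`max_grants_simple`/`ring_pages` are illustrative names, checking that the simplified `2 * max_req * max_seg` bound really covers the exact worst case quoted in the MAX_GRANTS comment, and that order 4 does give a 16-page ring.

```c
#include <assert.h>

/* Exact worst case quoted in the patch's comment:
 * max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1 */
static unsigned int max_grants_exact(unsigned int max_req, unsigned int max_seg)
{
    return max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1;
}

/* Simplified bound used by MAX_GRANTS(): 2 * max_req * max_seg */
static unsigned int max_grants_simple(unsigned int max_req, unsigned int max_seg)
{
    return 2 * max_req * max_seg;
}

/* Number of shared-ring pages for a given page order (order 4 => 16). */
static unsigned int ring_pages(unsigned int order)
{
    return 1u << order;
}
```

Since `2ab >= ab + (a-1)(b-1) + 1` whenever `a + b >= 2`, the simplified bound is always safe for at least one request with one segment.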

Re: [Qemu-block] [PATCH 1/3] xen-disk: only advertize feature-persistent if grant copy is not available

2017-06-20 Thread Stefano Stabellini
On Tue, 20 Jun 2017, Paul Durrant wrote:
> If grant copy is available then it will always be used in preference to
> persistent maps. In this case feature-persistent should not be advertized
> to the frontend, otherwise it may needlessly copy data into persistently
> granted buffers.
> 
> Signed-off-by: Paul Durrant 

CC'ing Roger.

It is true that using feature-persistent together with grant copies is
a very bad idea.

But this change establishes an explicit preference of
feature_grant_copy over feature-persistent in the xen_disk backend. It
is not obvious to me that it should be the case.

Why is feature_grant_copy (without feature-persistent) better than
feature-persistent (without feature_grant_copy)? Shouldn't we simply
avoid grant copies to copy data to persistent grants?


> ---
> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> ---
>  hw/block/xen_disk.c | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 3a22805fbc..9b06e3aa81 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -1023,11 +1023,18 @@ static int blk_init(struct XenDevice *xendev)
>  
>  blkdev->file_blk  = BLOCK_SIZE;
>  
> +blkdev->feature_grant_copy =
> +(xengnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 0);
> +
> +xen_pv_printf(&blkdev->xendev, 3, "grant copy operation %s\n",
> +  blkdev->feature_grant_copy ? "enabled" : "disabled");
> +
>  /* fill info
>   * blk_connect supplies sector-size and sectors
>   */
>  xenstore_write_be_int(>xendev, "feature-flush-cache", 1);
> -xenstore_write_be_int(>xendev, "feature-persistent", 1);
> +xenstore_write_be_int(>xendev, "feature-persistent",
> +  !blkdev->feature_grant_copy);
>  xenstore_write_be_int(>xendev, "info", info);
>  
>  blk_parse_discard(blkdev);
> @@ -1202,12 +1209,6 @@ static int blk_connect(struct XenDevice *xendev)
>  
>  xen_be_bind_evtchn(&blkdev->xendev);
>  
> -blkdev->feature_grant_copy =
> -(xengnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 0);
> -
> -xen_pv_printf(&blkdev->xendev, 3, "grant copy operation %s\n",
> -  blkdev->feature_grant_copy ? "enabled" : "disabled");
> -
>  xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, ring-ref %d, "
>"remote port %d, local port %d\n",
>blkdev->xendev.protocol, blkdev->ring_ref,
> -- 
> 2.11.0
> 



Re: [Qemu-block] [PATCH 3/3] xen-disk: use an IOThread per instance

2017-06-20 Thread Paul Durrant
> -Original Message-
> From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo
> Bonzini
> Sent: 20 June 2017 17:08
> To: Paul Durrant ; xen-de...@lists.xenproject.org;
> qemu-de...@nongnu.org; qemu-block@nongnu.org
> Cc: Anthony Perard ; Kevin Wolf
> ; Stefano Stabellini ; Max Reitz
> 
> Subject: Re: [PATCH 3/3] xen-disk: use an IOThread per instance
> 
> On 20/06/2017 15:47, Paul Durrant wrote:
> > This patch allocates an IOThread object for each xen_disk instance and
> > sets the AIO context appropriately on connect. This allows processing
> > of I/O to proceed in parallel.
> >
> > The patch also adds tracepoints into xen_disk to make it possible to
> > follow the state transitions of an instance in the log.
> >
> > Signed-off-by: Paul Durrant 
> 
> The QEMU block layer is not yet thread safe, but code running in
> IOThreads still has to take the AioContext lock.  You need to call
> aio_context_acquire/release in blk_bh and qemu_aio_complete.
> 

Ok, thanks. I'll update the patch and re-test.

Cheers,

  Paul

> Paolo
> 
> > ---
> > Cc: Stefano Stabellini 
> > Cc: Anthony Perard 
> > Cc: Kevin Wolf 
> > Cc: Max Reitz 
> > ---
> >  hw/block/trace-events |  7 +++
> >  hw/block/xen_disk.c   | 44
> +++-
> >  2 files changed, 50 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/block/trace-events b/hw/block/trace-events
> > index 65e83dc258..608b24ba66 100644
> > --- a/hw/block/trace-events
> > +++ b/hw/block/trace-events
> > @@ -10,3 +10,10 @@ virtio_blk_submit_multireq(void *mrb, int start, int
> num_reqs, uint64_t offset,
> >  # hw/block/hd-geometry.c
> >  hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p
> LCHS %d %d %d"
> >  hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t
> secs, int trans) "blk %p CHS %u %u %u trans %d"
> > +
> > +# hw/block/xen_disk.c
> > +xen_disk_alloc(char *name) "%s"
> > +xen_disk_init(char *name) "%s"
> > +xen_disk_connect(char *name) "%s"
> > +xen_disk_disconnect(char *name) "%s"
> > +xen_disk_free(char *name) "%s"
> > diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> > index a9942d32db..ec1085c802 100644
> > --- a/hw/block/xen_disk.c
> > +++ b/hw/block/xen_disk.c
> > @@ -27,10 +27,13 @@
> >  #include "hw/xen/xen_backend.h"
> >  #include "xen_blkif.h"
> >  #include "sysemu/blockdev.h"
> > +#include "sysemu/iothread.h"
> >  #include "sysemu/block-backend.h"
> >  #include "qapi/error.h"
> >  #include "qapi/qmp/qdict.h"
> >  #include "qapi/qmp/qstring.h"
> > +#include "qom/object_interfaces.h"
> > +#include "trace.h"
> >
> >  /* - */
> >
> > @@ -128,6 +131,9 @@ struct XenBlkDev {
> >  DriveInfo   *dinfo;
> >  BlockBackend*blk;
> >  QEMUBH  *bh;
> > +
> > +IOThread*iothread;
> > +AioContext  *ctx;
> >  };
> >
> >  /* - */
> > @@ -923,11 +929,31 @@ static void blk_bh(void *opaque)
> >  static void blk_alloc(struct XenDevice *xendev)
> >  {
> >  struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev,
> xendev);
> > +Object *obj;
> > +char *name;
> > +Error *err = NULL;
> > +
> > +trace_xen_disk_alloc(xendev->name);
> >
> >  QLIST_INIT(&blkdev->inflight);
> >  QLIST_INIT(&blkdev->finished);
> >  QLIST_INIT(&blkdev->freelist);
> > -blkdev->bh = qemu_bh_new(blk_bh, blkdev);
> > +
> > +obj = object_new(TYPE_IOTHREAD);
> > +name = g_strdup_printf("iothread-%s", xendev->name);
> > +
> > +object_property_add_child(object_get_objects_root(), name, obj, &err);
> > +assert(!err);
> > +
> > +g_free(name);
> > +
> > +user_creatable_complete(obj, &err);
> > +assert(!err);
> > +
> > +blkdev->iothread = (IOThread *)object_dynamic_cast(obj, TYPE_IOTHREAD);
> > +blkdev->ctx = iothread_get_aio_context(blkdev->iothread);
> > +blkdev->bh = aio_bh_new(blkdev->ctx, blk_bh, blkdev);
> > +
> >  if (xen_mode != XEN_EMULATE) {
> >  batch_maps = 1;
> >  }
> > @@ -954,6 +980,8 @@ static int blk_init(struct XenDevice *xendev)
> >  int info = 0;
> >  char *directiosafe = NULL;
> >
> > +trace_xen_disk_init(xendev->name);
> > +
> >  /* read xenstore entries */
> >  if (blkdev->params == NULL) {
> >  char *h = NULL;
> > @@ -1069,6 +1097,8 @@ static int blk_connect(struct XenDevice *xendev)
> >  unsigned int i;
> >  uint32_t *domids;
> >
> > +trace_xen_disk_connect(xendev->name);
> > +
> >  /* read-only ? */
> >  if (blkdev->directiosafe) {
> >  qflags = BDRV_O_NOCACHE | BDRV_O_NATIVE_AIO;
> > @@ -1285,6 +1315,8 @@ static int blk_connect(struct XenDevice *xendev)
> 

Re: [Qemu-block] [PATCH 3/3] xen-disk: use an IOThread per instance

2017-06-20 Thread Paolo Bonzini
On 20/06/2017 15:47, Paul Durrant wrote:
> This patch allocates an IOThread object for each xen_disk instance and
> sets the AIO context appropriately on connect. This allows processing
> of I/O to proceed in parallel.
> 
> The patch also adds tracepoints into xen_disk to make it possible to
> follow the state transitions of an instance in the log.
> 
> Signed-off-by: Paul Durrant 

The QEMU block layer is not yet thread safe, but code running in
IOThreads still has to take the AioContext lock.  You need to call
aio_context_acquire/release in blk_bh and qemu_aio_complete.
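The locking pattern Paolo asks for can be sketched outside QEMU. In this stand-in, a plain pthread mutex plays the role of the AioContext lock (the real code would call `aio_context_acquire()`/`aio_context_release()` on `blkdev->ctx`); `FakeBlkDev` and `fake_blk_bh` are illustrative names, not QEMU's.

```c
#include <pthread.h>

/* Stand-in types: a plain mutex models the AioContext lock. */
typedef struct FakeBlkDev {
    pthread_mutex_t ctx_lock;   /* models blkdev->ctx's AioContext lock */
    int requests_handled;       /* models the block-layer state touched */
} FakeBlkDev;

/* Bottom half: all work that touches block-layer state runs with the
 * context lock held, mirroring the suggested change to blk_bh(). */
static void fake_blk_bh(void *opaque)
{
    FakeBlkDev *blkdev = opaque;

    pthread_mutex_lock(&blkdev->ctx_lock);     /* aio_context_acquire(ctx) */
    blkdev->requests_handled++;                /* blk_handle_requests(...) */
    pthread_mutex_unlock(&blkdev->ctx_lock);   /* aio_context_release(ctx) */
}
```

The same bracketing would wrap the completion path (`qemu_aio_complete`) in the real patch.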

Paolo

> ---
> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> ---
>  hw/block/trace-events |  7 +++
>  hw/block/xen_disk.c   | 44 +++-
>  2 files changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index 65e83dc258..608b24ba66 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -10,3 +10,10 @@ virtio_blk_submit_multireq(void *mrb, int start, int 
> num_reqs, uint64_t offset,
>  # hw/block/hd-geometry.c
>  hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p 
> LCHS %d %d %d"
>  hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t secs, 
> int trans) "blk %p CHS %u %u %u trans %d"
> +
> +# hw/block/xen_disk.c
> +xen_disk_alloc(char *name) "%s"
> +xen_disk_init(char *name) "%s"
> +xen_disk_connect(char *name) "%s"
> +xen_disk_disconnect(char *name) "%s"
> +xen_disk_free(char *name) "%s"
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index a9942d32db..ec1085c802 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -27,10 +27,13 @@
>  #include "hw/xen/xen_backend.h"
>  #include "xen_blkif.h"
>  #include "sysemu/blockdev.h"
> +#include "sysemu/iothread.h"
>  #include "sysemu/block-backend.h"
>  #include "qapi/error.h"
>  #include "qapi/qmp/qdict.h"
>  #include "qapi/qmp/qstring.h"
> +#include "qom/object_interfaces.h"
> +#include "trace.h"
>  
>  /* - */
>  
> @@ -128,6 +131,9 @@ struct XenBlkDev {
>  DriveInfo   *dinfo;
>  BlockBackend*blk;
>  QEMUBH  *bh;
> +
> +IOThread*iothread;
> +AioContext  *ctx;
>  };
>  
>  /* - */
> @@ -923,11 +929,31 @@ static void blk_bh(void *opaque)
>  static void blk_alloc(struct XenDevice *xendev)
>  {
>  struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, 
> xendev);
> +Object *obj;
> +char *name;
> +Error *err = NULL;
> +
> +trace_xen_disk_alloc(xendev->name);
>  
>  QLIST_INIT(&blkdev->inflight);
>  QLIST_INIT(&blkdev->finished);
>  QLIST_INIT(&blkdev->freelist);
> -blkdev->bh = qemu_bh_new(blk_bh, blkdev);
> +
> +obj = object_new(TYPE_IOTHREAD);
> +name = g_strdup_printf("iothread-%s", xendev->name);
> +
> +object_property_add_child(object_get_objects_root(), name, obj, &err);
> +assert(!err);
> +
> +g_free(name);
> +
> +user_creatable_complete(obj, &err);
> +assert(!err);
> +
> +blkdev->iothread = (IOThread *)object_dynamic_cast(obj, TYPE_IOTHREAD);
> +blkdev->ctx = iothread_get_aio_context(blkdev->iothread);
> +blkdev->bh = aio_bh_new(blkdev->ctx, blk_bh, blkdev);
> +
>  if (xen_mode != XEN_EMULATE) {
>  batch_maps = 1;
>  }
> @@ -954,6 +980,8 @@ static int blk_init(struct XenDevice *xendev)
>  int info = 0;
>  char *directiosafe = NULL;
>  
> +trace_xen_disk_init(xendev->name);
> +
>  /* read xenstore entries */
>  if (blkdev->params == NULL) {
>  char *h = NULL;
> @@ -1069,6 +1097,8 @@ static int blk_connect(struct XenDevice *xendev)
>  unsigned int i;
>  uint32_t *domids;
>  
> +trace_xen_disk_connect(xendev->name);
> +
>  /* read-only ? */
>  if (blkdev->directiosafe) {
>  qflags = BDRV_O_NOCACHE | BDRV_O_NATIVE_AIO;
> @@ -1285,6 +1315,8 @@ static int blk_connect(struct XenDevice *xendev)
>  blkdev->persistent_gnt_count = 0;
>  }
>  
> +blk_set_aio_context(blkdev->blk, blkdev->ctx);
> +
>  xen_be_bind_evtchn(&blkdev->xendev);
>  
>  xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, nr-ring-ref %u, "
> @@ -1298,13 +1330,20 @@ static void blk_disconnect(struct XenDevice *xendev)
>  {
>  struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, 
> xendev);
>  
> +trace_xen_disk_disconnect(xendev->name);
> +
> +aio_context_acquire(blkdev->ctx);
> +
>  if (blkdev->blk) {
> +blk_set_aio_context(blkdev->blk, qemu_get_aio_context());
>  blk_detach_dev(blkdev->blk, blkdev);
>  blk_unref(blkdev->blk);
>  blkdev->blk = NULL;
>  }
>  xen_pv_unbind_evtchn(&blkdev->xendev);
>  
> +

Re: [Qemu-block] [PATCH] qcow2: Use offset_into_cluster() and offset_to_l2_index()

2017-06-20 Thread Kevin Wolf
Am 20.06.2017 um 15:01 hat Alberto Garcia geschrieben:
> We already have functions for doing these calculations, so let's use
> them instead of doing everything by hand. This makes the code a bit
> more readable.
> 
> Signed-off-by: Alberto Garcia 

Thanks, applied to the block branch.

Kevin



Re: [Qemu-block] [PATCH v4 0/7] Reduce the number of I/O ops when doing COW

2017-06-20 Thread Kevin Wolf
Am 19.06.2017 um 15:40 hat Alberto Garcia geschrieben:
> Hi all,
> 
> here's a patch series that rewrites the copy-on-write code in the
> qcow2 driver to reduce the number of I/O operations.
> 
> This is version v4, please refer to the original e-mail for a complete
> description:
> 
> https://lists.gnu.org/archive/html/qemu-block/2017-05/msg00882.html

Thanks, applied to the block branch.

Kevin



Re: [Qemu-block] [PATCH v9 07/20] block: deprecate "encryption=on" in favor of "encrypt.format=aes"

2017-06-20 Thread Alberto Garcia
On Tue 20 Jun 2017 02:02:06 PM CEST, Daniel P. Berrange wrote:
>> > +if (encryptfmt) {
>> > +buf = qemu_opt_get_del(opts, BLOCK_OPT_ENCRYPT);
>> > +if (buf != NULL) {
>> > +g_free(buf);
>> 
>> If you use qemu_opt_get() instead then you don't need "buf" at all,
>> do you?
>
> IIRC, we needed to delete the option from opts, otherwise something
> will later complain that there are opts that are not consumed.

I don't see how, since once you've reached this point it means that you
cannot create the image because the options are already wrong.

Anyway, there's a more important problem with this patch that I just
noticed: in this scenario you're freeing 'buf' twice, first in the snippet
I quoted above, and then at the end of qcow2_create().

qcow_create() from qcow.c is safe, but perhaps it's clearer if you put
the declaration of the 'buf' variable inside the 'if (encryptfmt)'
block.

That assuming that you really need to use qemu_opt_get_del() there. With
the double-free bug fixed, I don't see any difference when using
qemu_opt_get_del() vs qemu_opt_get().

Berto



[Qemu-block] [PATCH 1/3] xen-disk: only advertize feature-persistent if grant copy is not available

2017-06-20 Thread Paul Durrant
If grant copy is available then it will always be used in preference to
persistent maps. In this case feature-persistent should not be advertized
to the frontend, otherwise it may needlessly copy data into persistently
granted buffers.

Signed-off-by: Paul Durrant 
---
Cc: Stefano Stabellini 
Cc: Anthony Perard 
Cc: Kevin Wolf 
Cc: Max Reitz 
---
 hw/block/xen_disk.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 3a22805fbc..9b06e3aa81 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -1023,11 +1023,18 @@ static int blk_init(struct XenDevice *xendev)
 
 blkdev->file_blk  = BLOCK_SIZE;
 
+blkdev->feature_grant_copy =
+(xengnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 0);
+
+xen_pv_printf(&blkdev->xendev, 3, "grant copy operation %s\n",
+  blkdev->feature_grant_copy ? "enabled" : "disabled");
+
 /* fill info
  * blk_connect supplies sector-size and sectors
  */
 xenstore_write_be_int(>xendev, "feature-flush-cache", 1);
-xenstore_write_be_int(>xendev, "feature-persistent", 1);
+xenstore_write_be_int(>xendev, "feature-persistent",
+  !blkdev->feature_grant_copy);
 xenstore_write_be_int(>xendev, "info", info);
 
 blk_parse_discard(blkdev);
@@ -1202,12 +1209,6 @@ static int blk_connect(struct XenDevice *xendev)
 
 xen_be_bind_evtchn(&blkdev->xendev);
 
-blkdev->feature_grant_copy =
-(xengnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 0);
-
-xen_pv_printf(&blkdev->xendev, 3, "grant copy operation %s\n",
-  blkdev->feature_grant_copy ? "enabled" : "disabled");
-
 xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, ring-ref %d, "
   "remote port %d, local port %d\n",
   blkdev->xendev.protocol, blkdev->ring_ref,
-- 
2.11.0




[Qemu-block] [PATCH 0/3] xen-disk: performance improvements

2017-06-20 Thread Paul Durrant
Paul Durrant (3):
  xen-disk: only advertize feature-persistent if grant copy is not
available
  xen-disk: add support for multi-page shared rings
  xen-disk: use an IOThread per instance

 hw/block/trace-events |   7 ++
 hw/block/xen_disk.c   | 200 --
 2 files changed, 168 insertions(+), 39 deletions(-)

-- 
2.11.0




[Qemu-block] [PATCH 2/3] xen-disk: add support for multi-page shared rings

2017-06-20 Thread Paul Durrant
The blkif protocol has had provision for negotiation of multi-page shared
rings for some time now and many guest OSes have support in their frontend
drivers.

This patch makes the necessary modifications to xen-disk to support a shared
ring up to order 4 (i.e. 16 pages).

Signed-off-by: Paul Durrant 
---
Cc: Stefano Stabellini 
Cc: Anthony Perard 
Cc: Kevin Wolf 
Cc: Max Reitz 
---
 hw/block/xen_disk.c | 141 
 1 file changed, 110 insertions(+), 31 deletions(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 9b06e3aa81..a9942d32db 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -36,8 +36,6 @@
 
 static int batch_maps   = 0;
 
-static int max_requests = 32;
-
 /* - */
 
 #define BLOCK_SIZE  512
@@ -84,6 +82,8 @@ struct ioreq {
 BlockAcctCookie acct;
 };
 
+#define MAX_RING_PAGE_ORDER 4
+
 struct XenBlkDev {
 struct XenDevicexendev;  /* must be first */
 char*params;
@@ -94,7 +94,8 @@ struct XenBlkDev {
 booldirectiosafe;
 const char  *fileproto;
 const char  *filename;
-int ring_ref;
+unsigned intring_ref[1 << MAX_RING_PAGE_ORDER];
+unsigned intnr_ring_ref;
 void*sring;
 int64_t file_blk;
 int64_t file_size;
@@ -110,6 +111,7 @@ struct XenBlkDev {
 int requests_total;
 int requests_inflight;
 int requests_finished;
+unsigned intmax_requests;
 
 /* Persistent grants extension */
 gbooleanfeature_discard;
@@ -199,7 +201,7 @@ static struct ioreq *ioreq_start(struct XenBlkDev *blkdev)
 struct ioreq *ioreq = NULL;
 
 if (QLIST_EMPTY(&blkdev->freelist)) {
-if (blkdev->requests_total >= max_requests) {
+if (blkdev->requests_total >= blkdev->max_requests) {
 goto out;
 }
 /* allocate new struct */
@@ -905,7 +907,7 @@ static void blk_handle_requests(struct XenBlkDev *blkdev)
 ioreq_runio_qemu_aio(ioreq);
 }
 
-if (blkdev->more_work && blkdev->requests_inflight < max_requests) {
+if (blkdev->more_work && blkdev->requests_inflight < blkdev->max_requests) {
 qemu_bh_schedule(blkdev->bh);
 }
 }
@@ -918,15 +920,6 @@ static void blk_bh(void *opaque)
 blk_handle_requests(blkdev);
 }
 
-/*
- * We need to account for the grant allocations requiring contiguous
- * chunks; the worst case number would be
- * max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
- * but in order to keep things simple just use
- * 2 * max_req * max_seg.
- */
-#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
-
 static void blk_alloc(struct XenDevice *xendev)
 {
 struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
@@ -938,11 +931,6 @@ static void blk_alloc(struct XenDevice *xendev)
 if (xen_mode != XEN_EMULATE) {
 batch_maps = 1;
 }
-if (xengnttab_set_max_grants(xendev->gnttabdev,
-MAX_GRANTS(max_requests, BLKIF_MAX_SEGMENTS_PER_REQUEST)) < 0) {
-xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
-  strerror(errno));
-}
 }
 
 static void blk_parse_discard(struct XenBlkDev *blkdev)
@@ -1037,6 +1025,9 @@ static int blk_init(struct XenDevice *xendev)
   !blkdev->feature_grant_copy);
 xenstore_write_be_int(>xendev, "info", info);
 
+xenstore_write_be_int(>xendev, "max-ring-page-order",
+  MAX_RING_PAGE_ORDER);
+
 blk_parse_discard(blkdev);
 
 g_free(directiosafe);
@@ -1058,12 +1049,25 @@ out_error:
 return -1;
 }
 
+/*
+ * We need to account for the grant allocations requiring contiguous
+ * chunks; the worst case number would be
+ * max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
+ * but in order to keep things simple just use
+ * 2 * max_req * max_seg.
+ */
+#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
+
 static int blk_connect(struct XenDevice *xendev)
 {
 struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
 int pers, index, qflags;
 bool readonly = true;
 bool writethrough = true;
+int order, ring_ref;
+unsigned int ring_size, max_grants;
+unsigned int i;
+uint32_t *domids;
 
 /* read-only ? */
 if (blkdev->directiosafe) {
@@ -1138,9 +1142,39 @@ static int blk_connect(struct XenDevice *xendev)
 xenstore_write_be_int64(&blkdev->xendev, "sectors",
 blkdev->file_size / blkdev->file_blk);
 
-if (xenstore_read_fe_int(&blkdev->xendev, "ring-ref", &blkdev->ring_ref) == -1) {
+if (xenstore_read_fe_int(&blkdev->xendev, "ring-page-order",
+ &order) == 

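The negotiation that the (truncated) hunk above begins can be sketched outside QEMU. This is an illustrative stand-in, not the patch's code: if the frontend wrote "ring-page-order", the ring spans `1 << order` pages; otherwise the backend falls back to the legacy single-page "ring-ref" path, and an order beyond the advertised maximum is rejected.

```c
#define XC_PAGE_SIZE 4096
#define MAX_RING_PAGE_ORDER 4

/* have_order models whether "ring-page-order" was present in xenstore
 * (xenstore_read_fe_int() returns -1 when the key is missing). */
static unsigned int negotiated_ring_pages(int order, int have_order)
{
    if (!have_order) {
        return 1;               /* legacy single-page "ring-ref" */
    }
    if (order < 0 || order > MAX_RING_PAGE_ORDER) {
        return 0;               /* reject: outside the advertised range */
    }
    return 1u << order;
}

/* Total bytes of shared ring for the negotiated page count. */
static unsigned int ring_size_bytes(unsigned int pages)
{
    return pages * XC_PAGE_SIZE;
}
```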
[Qemu-block] [PATCH 3/3] xen-disk: use an IOThread per instance

2017-06-20 Thread Paul Durrant
This patch allocates an IOThread object for each xen_disk instance and
sets the AIO context appropriately on connect. This allows processing
of I/O to proceed in parallel.

The patch also adds tracepoints into xen_disk to make it possible to
follow the state transitions of an instance in the log.

Signed-off-by: Paul Durrant 
---
Cc: Stefano Stabellini 
Cc: Anthony Perard 
Cc: Kevin Wolf 
Cc: Max Reitz 
---
 hw/block/trace-events |  7 +++
 hw/block/xen_disk.c   | 44 +++-
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/hw/block/trace-events b/hw/block/trace-events
index 65e83dc258..608b24ba66 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -10,3 +10,10 @@ virtio_blk_submit_multireq(void *mrb, int start, int 
num_reqs, uint64_t offset,
 # hw/block/hd-geometry.c
 hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p LCHS 
%d %d %d"
 hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t secs, int 
trans) "blk %p CHS %u %u %u trans %d"
+
+# hw/block/xen_disk.c
+xen_disk_alloc(char *name) "%s"
+xen_disk_init(char *name) "%s"
+xen_disk_connect(char *name) "%s"
+xen_disk_disconnect(char *name) "%s"
+xen_disk_free(char *name) "%s"
diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index a9942d32db..ec1085c802 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -27,10 +27,13 @@
 #include "hw/xen/xen_backend.h"
 #include "xen_blkif.h"
 #include "sysemu/blockdev.h"
+#include "sysemu/iothread.h"
 #include "sysemu/block-backend.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qstring.h"
+#include "qom/object_interfaces.h"
+#include "trace.h"
 
 /* - */
 
@@ -128,6 +131,9 @@ struct XenBlkDev {
 DriveInfo   *dinfo;
 BlockBackend*blk;
 QEMUBH  *bh;
+
+IOThread*iothread;
+AioContext  *ctx;
 };
 
 /* - */
@@ -923,11 +929,31 @@ static void blk_bh(void *opaque)
 static void blk_alloc(struct XenDevice *xendev)
 {
 struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
+Object *obj;
+char *name;
+Error *err = NULL;
+
+trace_xen_disk_alloc(xendev->name);
 
 QLIST_INIT(&blkdev->inflight);
 QLIST_INIT(&blkdev->finished);
 QLIST_INIT(&blkdev->freelist);
-blkdev->bh = qemu_bh_new(blk_bh, blkdev);
+
+obj = object_new(TYPE_IOTHREAD);
+name = g_strdup_printf("iothread-%s", xendev->name);
+
+object_property_add_child(object_get_objects_root(), name, obj, &err);
+assert(!err);
+
+g_free(name);
+
+user_creatable_complete(obj, &err);
+assert(!err);
+
+blkdev->iothread = (IOThread *)object_dynamic_cast(obj, TYPE_IOTHREAD);
+blkdev->ctx = iothread_get_aio_context(blkdev->iothread);
+blkdev->bh = aio_bh_new(blkdev->ctx, blk_bh, blkdev);
+
 if (xen_mode != XEN_EMULATE) {
 batch_maps = 1;
 }
@@ -954,6 +980,8 @@ static int blk_init(struct XenDevice *xendev)
 int info = 0;
 char *directiosafe = NULL;
 
+trace_xen_disk_init(xendev->name);
+
 /* read xenstore entries */
 if (blkdev->params == NULL) {
 char *h = NULL;
@@ -1069,6 +1097,8 @@ static int blk_connect(struct XenDevice *xendev)
 unsigned int i;
 uint32_t *domids;
 
+trace_xen_disk_connect(xendev->name);
+
 /* read-only ? */
 if (blkdev->directiosafe) {
 qflags = BDRV_O_NOCACHE | BDRV_O_NATIVE_AIO;
@@ -1285,6 +1315,8 @@ static int blk_connect(struct XenDevice *xendev)
 blkdev->persistent_gnt_count = 0;
 }
 
+blk_set_aio_context(blkdev->blk, blkdev->ctx);
+
 xen_be_bind_evtchn(&blkdev->xendev);
 
 xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, nr-ring-ref %u, "
@@ -1298,13 +1330,20 @@ static void blk_disconnect(struct XenDevice *xendev)
 {
 struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
 
+trace_xen_disk_disconnect(xendev->name);
+
+aio_context_acquire(blkdev->ctx);
+
 if (blkdev->blk) {
+blk_set_aio_context(blkdev->blk, qemu_get_aio_context());
 blk_detach_dev(blkdev->blk, blkdev);
 blk_unref(blkdev->blk);
 blkdev->blk = NULL;
 }
 xen_pv_unbind_evtchn(&blkdev->xendev);
 
+aio_context_release(blkdev->ctx);
+
 if (blkdev->sring) {
 xengnttab_unmap(blkdev->xendev.gnttabdev, blkdev->sring,
 blkdev->nr_ring_ref);
@@ -1338,6 +1377,8 @@ static int blk_free(struct XenDevice *xendev)
 struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
 struct ioreq *ioreq;
 
+trace_xen_disk_free(xendev->name);
+
 if (blkdev->blk || blkdev->sring) {
 blk_disconnect(xendev);
 }
@@ -1355,6 +1396,7 @@ static int blk_free(struct XenDevice *xendev)
 

[Qemu-block] [PATCH] qcow2: Use offset_into_cluster() and offset_to_l2_index()

2017-06-20 Thread Alberto Garcia
We already have functions for doing these calculations, so let's use
them instead of doing everything by hand. This makes the code a bit
more readable.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 4 ++--
 block/qcow2.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index d779ea19cf..a2c612433d 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -548,7 +548,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t 
offset,
 
 /* find the cluster offset for the given disk offset */
 
-l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
+l2_index = offset_to_l2_index(s, offset);
 *cluster_offset = be64_to_cpu(l2_table[l2_index]);
 
 nb_clusters = size_to_clusters(s, bytes_needed);
@@ -685,7 +685,7 @@ static int get_cluster_table(BlockDriverState *bs, uint64_t 
offset,
 
 /* find the cluster offset for the given disk offset */
 
-l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
+l2_index = offset_to_l2_index(s, offset);
 
 *new_l2_table = l2_table;
 *new_l2_index = l2_index;
diff --git a/block/qcow2.c b/block/qcow2.c
index b3ba5daa93..adb604fc81 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -356,7 +356,7 @@ static int validate_table_offset(BlockDriverState *bs, 
uint64_t offset,
 }
 
 /* Tables must be cluster aligned */
-if (offset & (s->cluster_size - 1)) {
+if (offset_into_cluster(s, offset) != 0) {
 return -EINVAL;
 }
 
-- 
2.11.0
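The two helpers this patch substitutes are simple bit operations on the cluster geometry. The sketch below is standalone, not QEMU code: `FakeQcow2State` stands in for `BDRVQcow2State`, and the two functions mirror the open-coded expressions the patch removes (`offset & (cluster_size - 1)` and `(offset >> cluster_bits) & (l2_size - 1)`).

```c
#include <stdint.h>

/* Stand-in for the fields of BDRVQcow2State used by the two helpers. */
typedef struct FakeQcow2State {
    int cluster_bits;        /* log2 of the cluster size */
    uint64_t cluster_size;   /* 1 << cluster_bits */
    int l2_size;             /* number of entries in one L2 table */
} FakeQcow2State;

/* Byte offset within the containing cluster. */
static uint64_t offset_into_cluster(const FakeQcow2State *s, uint64_t offset)
{
    return offset & (s->cluster_size - 1);
}

/* Index of the L2 table entry covering this guest offset. */
static int offset_to_l2_index(const FakeQcow2State *s, uint64_t offset)
{
    return (offset >> s->cluster_bits) & (s->l2_size - 1);
}
```

With the default 64 KiB clusters (`cluster_bits = 16`), an L2 table holds 8192 eight-byte entries, so the index mask is 8191.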




Re: [Qemu-block] [PATCH v9 07/20] block: deprecate "encryption=on" in favor of "encrypt.format=aes"

2017-06-20 Thread Daniel P. Berrange
On Tue, Jun 20, 2017 at 01:44:50PM +0200, Alberto Garcia wrote:
> On Mon 19 Jun 2017 07:34:42 PM CEST, Daniel P. Berrange wrote:
> > Historically the qcow & qcow2 image formats supported a property
> > "encryption=on" to enable their built-in AES encryption. We'll
> > soon be supporting LUKS for qcow2, so we need a more general-purpose
> > way to enable encryption, with a choice of formats.
> >
> > This introduces an "encrypt.format" option, which will later be
> > joined by a number of other "encrypt.XXX" options. The use of
> > a "encrypt." prefix instead of "encrypt-" is done to facilitate
> > mapping to a nested QAPI schema at later date.
> >
> > e.g. the preferred syntax is now
> >
> >   qemu-img create -f qcow2 -o encrypt.format=aes demo.qcow2
> >
> > Signed-off-by: Daniel P. Berrange 
> > +if (encryptfmt) {
> > +buf = qemu_opt_get_del(opts, BLOCK_OPT_ENCRYPT);
> > +if (buf != NULL) {
> > +g_free(buf);
> 
> If you use qemu_opt_get() instead then you don't need "buf" at all, do
> you?

IIRC, we needed to delete the option from opts, otherwise something
will later complain that there are opts that are not consumed.

> 
> > +if (encryptfmt) {
> > +buf = qemu_opt_get_del(opts, BLOCK_OPT_ENCRYPT);
> > +if (buf != NULL) {
> > +g_free(buf);
> 
> Same here.
> 
> Everything else looks fine.
> 
> Reviewed-by: Alberto Garcia 

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-block] [PATCH v9 07/20] block: deprecate "encryption=on" in favor of "encrypt.format=aes"

2017-06-20 Thread Alberto Garcia
On Mon 19 Jun 2017 07:34:42 PM CEST, Daniel P. Berrange wrote:
> Historically the qcow & qcow2 image formats supported a property
> "encryption=on" to enable their built-in AES encryption. We'll
> soon be supporting LUKS for qcow2, so we need a more general-purpose
> way to enable encryption, with a choice of formats.
>
> This introduces an "encrypt.format" option, which will later be
> joined by a number of other "encrypt.XXX" options. The use of
> a "encrypt." prefix instead of "encrypt-" is done to facilitate
> mapping to a nested QAPI schema at later date.
>
> e.g. the preferred syntax is now
>
>   qemu-img create -f qcow2 -o encrypt.format=aes demo.qcow2
>
> Signed-off-by: Daniel P. Berrange 
> +if (encryptfmt) {
> +buf = qemu_opt_get_del(opts, BLOCK_OPT_ENCRYPT);
> +if (buf != NULL) {
> +g_free(buf);

If you use qemu_opt_get() instead then you don't need "buf" at all, do
you?

> +if (encryptfmt) {
> +buf = qemu_opt_get_del(opts, BLOCK_OPT_ENCRYPT);
> +if (buf != NULL) {
> +g_free(buf);

Same here.

Everything else looks fine.

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-block] [PULL 0/2] Block patches

2017-06-20 Thread Peter Maydell
On 14 June 2017 at 22:55, Jeff Cody  wrote:
> The following changes since commit 3f0602927b120a480b35dcf58cf6f95435b3ae91:
>
>   Merge remote-tracking branch 
> 'remotes/pmaydell/tags/pull-target-arm-20170613' into staging (2017-06-13 
> 15:49:07 +0100)
>
> are available in the git repository at:
>
>   git://github.com/codyprime/qemu-kvm-jtc.git tags/block-pull-request
>
> for you to fetch changes up to 5c3ad1a6a8fa041c57403dbe1fc5927eec0be66b:
>
>   block/iscsi: enable filename option and parsing (2017-06-14 17:39:46 -0400)
>
> 
> Block patches
> 
>
> Jeff Cody (2):
>   block/rbd: enable filename option and parsing
>   block/iscsi: enable filename option and parsing
>
>  block/iscsi.c | 22 +-
>  block/rbd.c   | 22 +-
>  2 files changed, 42 insertions(+), 2 deletions(-)

Applied, thanks.

-- PMM



Re: [Qemu-block] [PATCH] nvme: Add support for Read Data and Write Data in CMBs.

2017-06-20 Thread Kevin Wolf
Am 19.06.2017 um 23:24 hat Keith Busch geschrieben:
> On Tue, Jun 13, 2017 at 04:08:35AM -0600, sba...@raithlin.com wrote:
> > From: Stephen Bates 
> > 
> > Add the ability for the NVMe model to support both the RDS and WDS
> > modes in the Controller Memory Buffer.
> > 
> > Although not currently supported in the upstreamed Linux kernel a fork
> > with support exists [1] and user-space test programs that build on
> > this also exist [2].
> > 
> > Useful for testing CMB functionality in preparation for real CMB
> > enabled NVMe devices (coming soon).
> > 
> > [1] https://github.com/sbates130272/linux-p2pmem
> > [2] https://github.com/sbates130272/p2pmem-test
> > 
> > Signed-off-by: Stephen Bates 
> > Reviewed-by: Logan Gunthorpe 
> 
> This looks good to me.
> 
> Reviewed-by: Keith Busch 

Thanks, applied to the block branch.

Kevin