[PATCH v2] qemu-img: Explicit number replaced by a constant

2020-08-18 Thread Yi Li
Signed-off-by: Yi Li 
---
 qemu-img.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 5308773811..aa2e31c8ae 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1200,10 +1200,10 @@ static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum,
 *pnum = 0;
 return 0;
 }
-is_zero = buffer_is_zero(buf, 512);
+is_zero = buffer_is_zero(buf, BDRV_SECTOR_SIZE);
 for(i = 1; i < n; i++) {
-buf += 512;
-if (is_zero != buffer_is_zero(buf, 512)) {
+buf += BDRV_SECTOR_SIZE;
+if (is_zero != buffer_is_zero(buf, BDRV_SECTOR_SIZE)) {
 break;
 }
 }
@@ -2489,8 +2489,8 @@ static int img_convert(int argc, char **argv)
 }
 }
 
-qemu_opt_set_number(opts, BLOCK_OPT_SIZE, s.total_sectors * 512,
-&error_abort);
+qemu_opt_set_number(opts, BLOCK_OPT_SIZE,
+s.total_sectors * BDRV_SECTOR_SIZE, &error_abort);
 ret = add_old_style_options(out_fmt, opts, out_baseimg, NULL);
 if (ret < 0) {
 goto out;
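
For reference, BDRV_SECTOR_SIZE is an existing definition from
include/block/block.h (quoted here for context, not part of this patch);
it evaluates to the same 512:

    #define BDRV_SECTOR_BITS   9
    #define BDRV_SECTOR_SIZE   (1ULL << BDRV_SECTOR_BITS)   /* i.e. 512 */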
-- 
2.25.3






Re: [PATCH 00/18] hw/riscv: Add Microchip PolarFire SoC Icicle Kit board support

2020-08-18 Thread Bin Meng
On Tue, Aug 18, 2020 at 9:55 PM Anup Patel  wrote:
>
> On Tue, Aug 18, 2020 at 6:39 PM  wrote:
> >
> > On 8/18/20 7:17 AM, Anup Patel wrote:
> > >
> > > On Tue, Aug 18, 2020 at 1:23 AM  wrote:
> > >> On 8/17/20 8:28 PM, Alistair Francis wrote:
> > >>>
> > >>> On Mon, Aug 17, 2020 at 11:12 AM via  wrote:
> >  Hi Anup,
> > 
> >  On 8/17/20 11:30 AM, Bin Meng wrote:
> > >
> > > Hi Anup,
> > >
> > > On Sat, Aug 15, 2020 at 1:44 AM Anup Patel  
> > > wrote:
> > >> On Fri, Aug 14, 2020 at 10:12 PM Bin Meng  wrote:
> > >>> From: Bin Meng 
> > >>>
> > >>> This adds support for Microchip PolarFire SoC Icicle Kit board.
> > >>> The Icicle Kit board integrates a PolarFire SoC, with one SiFive
> > >>> E51 core plus four U54 cores, many on-chip peripherals, and an FPGA.
> > >> Nice Work !!! This is very helpful.
> > > Thanks!
> > >
> > >> The Microchip HSS is quite convoluted. It has:
> > >> 1. DDR Init
> > >> 2. Boot device support
> > >> 3. SBI support using OpenSBI as library
> > >> 4. Simple TEE support
> > >>
> > >> I think points 1) and 2) above should be part of U-Boot SPL.
> > >> The point 3) can be OpenSBI FW_DYNAMIC.
> > >>
> > >> Lastly,for point 4), we are working on a new OpenSBI feature using
> > >> which we can run independent Secure OS and Non-Secure OS using
> > >> U-Boot_SPL+OpenSBI (for both SiFive Unleashed and Microchip
> > >> PolarFire).
> > >>
> > >> Do you have plans for adding U-Boot SPL support for this board ??
> > > + Cyril Jean from Microchip
> > >
> > > I will have to leave this question to Cyril to comment.
> > >
> >  I currently do not have a plan to support U-Boot SPL. The idea of the
> >  HSS is to contain all the silicon specific initialization and
> >  configuration code within the HSS before jumping to U-Boot S-mode. I
> >  would rather keep all this within the HSS for the time being. I would
> >  wait until we reach production silicon before attempting to move this to
> >  U-Boot SPL as the HSS is likely to contain some opaque silicon related
> >  changes for another while.
> > >>> That is unfortunate, a lot of work has gone into making the boot flow
> > >>> simple and easy to use.
> > >>>
> > >>> QEMU now includes OpenSBI by default to make it easy for users to boot
> > >>> Linux. The Icicle Kit board is now the most difficult QEMU board to
> > >>> boot Linux on. Not to mention it makes it hard or impossible to
> > >>> support it in standard tool flows such as meta-riscv.
> > >>>
> > >>> Alistair
> > >> If it is such a problem we can add a U-Boot SPL stage and the HSS can be
> > >> treated as standard SoC ROM code.
> > > It's not mandatory for U-Boot SPL to have stable DRAM calibration settings
> > > from the start itself. The initial U-Boot SPL support for most
> > > platforms (across architectures) usually includes basic working DRAM
> > > calibration settings, which are later updated with separate
> > > patches. Also, we don't need all U-Boot
> > > drivers to be upstreamed in one-go as this can be done in phases.
> > >
> > > The advantage we have with PolarFire SoC Icicle board is that we already
> > > have a U-Boot S-mode, and we believe the OpenSBI generic platform will
> > > work fine for PolarFire SoC Icicle board, so the only thing missing
> > > right now
> > > is the U-Boot SPL support for OpenSource boot-flow.
> > >
> > > It will certainly accelerate open-source development if we have boot-flow
> > > U-Boot_SPL => OpenSBI (FW_DYNAMIC) => U-Boot_S-mode working
> > > on PolarFire SoC Icicle board. Initially, we just need DRAM, SD/eMMC,
> > > and Serial port support for U-Boot SPL and U-Boot S-mode. Later on,
> > > more patches can add ethernet and other booting device drivers to U-Boot.
> > >
> > > Regarding security services of HSS, we are working on a OpenSBI
> > > feature which will allow HSS security services to run as independent
> > > binary in M-mode (not linked to OpenSBI) and OpenSBI FW_DYNAMIC
> > > will be a separate binary acting as a secure monitor.
> > >
> > > Regards,
> > > Anup
> >
> > What I have in mind is that the external memory will be up and running
> > by the time we get to U-Boot SPL. In the case of PolarFire SoC the ROM
> > code equivalent brings up the DDR memory interface so we do not need to
> > worry about this as part of U-Boot.
>
> Keeping DRAM configuration as part of a separate ROM booting stage prior
> to the U-Boot SPL sounds good to me. This will lead to the following boot-flow:
>
> ROM/HSS (M-mode) => U-Boot SPL (M-mode) => OpenSBI 

[PATCH v6 0/4] Apply COR-filter to the block-stream permanently

2020-08-18 Thread Andrey Shinkevich
Note: this series is based on another one, "block: Deal with filters"
  by Max Reitz, that can be found in the branches:
  https://git.xanclic.moe/XanClic/qemu child-access-functions-v6
  https://github.com/XanClic/qemu child-access-functions-v6

v6:
  Rebased onto the series "block: Deal with filters".
  The number of patches was kept to a minimum.
  Not all the iotests were checked to pass.
  
  04: The test case iotests:030:test_stream_parallel was removed
  due to multiple errors.

Andrey Shinkevich (4):
  copy-on-read: Support preadv/pwritev_part functions
  copy-on-read: add filter append/drop functions
  qapi: add filter-node-name to block-stream
  block: apply COR-filter to block-stream jobs

 block/copy-on-read.c   | 131 +
 block/copy-on-read.h   |  36 +++
 block/monitor/block-hmp-cmds.c |   4 +-
 block/stream.c |  80 +
 blockdev.c |   8 ++-
 include/block/block_int.h  |   7 ++-
 qapi/block-core.json   |   6 ++
 tests/qemu-iotests/030 |  50 ++--
 tests/qemu-iotests/030.out |   4 +-
 9 files changed, 240 insertions(+), 86 deletions(-)
 create mode 100644 block/copy-on-read.h

-- 
1.8.3.1




[PATCH v6 4/4] block: apply COR-filter to block-stream jobs

2020-08-18 Thread Andrey Shinkevich
This patch completes the series by inserting the COR-filter into any
block-stream operation. It also makes changes to iotest 030.
The test case 'test_stream_parallel' was deleted due to multiple
errors.

Signed-off-by: Andrey Shinkevich 
---
 block/stream.c | 76 --
 tests/qemu-iotests/030 | 50 +++---
 tests/qemu-iotests/030.out |  4 +--
 3 files changed, 61 insertions(+), 69 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 8bf6b6d..0b11979 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -19,6 +19,7 @@
 #include "qapi/qmp/qerror.h"
 #include "qemu/ratelimit.h"
 #include "sysemu/block-backend.h"
+#include "block/copy-on-read.h"
 
 enum {
 /*
@@ -33,8 +34,11 @@ typedef struct StreamBlockJob {
 BlockJob common;
 BlockDriverState *base_overlay; /* COW overlay (stream from this) */
 BlockDriverState *above_base;   /* Node directly above the base */
+BlockDriverState *cor_filter_bs;
+BlockDriverState *target_bs;
 BlockdevOnError on_error;
 char *backing_file_str;
+char *base_fmt;
 bool bs_read_only;
 bool chain_frozen;
 } StreamBlockJob;
@@ -53,34 +57,26 @@ static void stream_abort(Job *job)
 StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
 
 if (s->chain_frozen) {
-BlockJob *bjob = &s->common;
-bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->above_base);
+bdrv_unfreeze_backing_chain(s->cor_filter_bs, s->above_base);
 }
 }
 
 static int stream_prepare(Job *job)
 {
 StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
-BlockJob *bjob = &s->common;
-BlockDriverState *bs = blk_bs(bjob->blk);
+BlockDriverState *bs = s->target_bs;
 BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
 BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
 Error *local_err = NULL;
 int ret = 0;
 
-bdrv_unfreeze_backing_chain(bs, s->above_base);
+bdrv_unfreeze_backing_chain(s->cor_filter_bs, s->above_base);
 s->chain_frozen = false;
 
 if (bdrv_cow_child(unfiltered_bs)) {
-const char *base_id = NULL, *base_fmt = NULL;
-if (base) {
-base_id = s->backing_file_str;
-if (base->drv) {
-base_fmt = base->drv->format_name;
-}
-}
 bdrv_set_backing_hd(unfiltered_bs, base, &local_err);
-ret = bdrv_change_backing_file(unfiltered_bs, base_id, base_fmt);
+ret = bdrv_change_backing_file(unfiltered_bs, s->backing_file_str,
+   s->base_fmt);
 if (local_err) {
 error_report_err(local_err);
 return -EPERM;
@@ -94,7 +90,9 @@ static void stream_clean(Job *job)
 {
 StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
 BlockJob *bjob = &s->common;
-BlockDriverState *bs = blk_bs(bjob->blk);
+BlockDriverState *bs = s->target_bs;
+
+bdrv_cor_filter_drop(s->cor_filter_bs);
 
 /* Reopen the image back in read-only mode if necessary */
 if (s->bs_read_only) {
@@ -104,13 +102,14 @@ static void stream_clean(Job *job)
 }
 
 g_free(s->backing_file_str);
+g_free(s->base_fmt);
 }
 
 static int coroutine_fn stream_run(Job *job, Error **errp)
 {
 StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
 BlockBackend *blk = s->common.blk;
-BlockDriverState *bs = blk_bs(blk);
+BlockDriverState *bs = s->target_bs;
 BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
 bool enable_cor = !bdrv_cow_child(s->base_overlay);
 int64_t len;
@@ -231,6 +230,12 @@ void stream_start(const char *job_id, BlockDriverState *bs,
 int basic_flags = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED;
 BlockDriverState *base_overlay = bdrv_find_overlay(bs, base);
 BlockDriverState *above_base;
+BlockDriverState *cor_filter_bs = NULL;
+char *base_fmt = NULL;
+
+if (base && base->drv) {
+base_fmt = g_strdup(base->drv->format_name);
+}
 
 if (!base_overlay) {
 error_setg(errp, "'%s' is not in the backing chain of '%s'",
@@ -264,17 +269,36 @@ void stream_start(const char *job_id, BlockDriverState *bs,
 }
 }
 
-/* Prevent concurrent jobs trying to modify the graph structure here, we
- * already have our own plans. Also don't allow resize as the image size is
- * queried only at the job start and then cached. */
-s = block_job_create(job_id, &stream_job_driver, NULL, bs,
- basic_flags | BLK_PERM_GRAPH_MOD,
- basic_flags | BLK_PERM_WRITE,
+cor_filter_bs = bdrv_cor_filter_append(bs, filter_node_name, errp);
+if (cor_filter_bs == NULL) {
+goto fail;
+}
+
+if (bdrv_freeze_backing_chain(cor_filter_bs, bs, errp) < 0) {
+bdrv_cor_filter_drop(cor_filter_bs);
+cor_filter_bs = NULL;
+goto fail;
+}
+
+s = 

[PATCH v6 2/4] copy-on-read: add filter append/drop functions

2020-08-18 Thread Andrey Shinkevich
Provide an API for COR-filter insertion and removal.
Also, drop the filter child permissions for an inactive state when the
filter node is being removed.

Signed-off-by: Andrey Shinkevich 
---
 block/copy-on-read.c | 103 +++
 block/copy-on-read.h |  36 ++
 2 files changed, 139 insertions(+)
 create mode 100644 block/copy-on-read.h
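
For illustration, a rough usage sketch of the new API from a hypothetical
caller (variable names invented here; error handling abbreviated):

    Error *local_err = NULL;
    BlockDriverState *filter_bs;

    /* Insert the COR filter above @bs (node name is an example) */
    filter_bs = bdrv_cor_filter_append(bs, "cor-filter0", &local_err);
    if (!filter_bs) {
        error_report_err(local_err);
        return;
    }
    /* ... read through filter_bs so COR populates @bs ... */
    bdrv_cor_filter_drop(filter_bs);  /* drop perms, restore the graph */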

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index cb03e0f..150d9b7 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -23,11 +23,21 @@
 #include "qemu/osdep.h"
 #include "block/block_int.h"
 #include "qemu/module.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qdict.h"
+#include "block/copy-on-read.h"
+
+
+typedef struct BDRVStateCOR {
+bool active;
+} BDRVStateCOR;
 
 
 static int cor_open(BlockDriverState *bs, QDict *options, int flags,
 Error **errp)
 {
+BDRVStateCOR *state = bs->opaque;
+
 bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
false, errp);
@@ -42,6 +52,8 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags,
 ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
 bs->file->bs->supported_zero_flags);
 
+state->active = true;
+
 return 0;
 }
 
@@ -57,6 +69,17 @@ static void cor_child_perm(BlockDriverState *bs, BdrvChild *c,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
 {
+BDRVStateCOR *s = bs->opaque;
+
+if (!s->active) {
+/*
+ * While the filter is being removed
+ */
+*nperm = 0;
+*nshared = BLK_PERM_ALL;
+return;
+}
+
 *nperm = perm & PERM_PASSTHROUGH;
 *nshared = (shared & PERM_PASSTHROUGH) | PERM_UNCHANGED;
 
@@ -135,6 +158,7 @@ static void cor_lock_medium(BlockDriverState *bs, bool locked)
 
 static BlockDriver bdrv_copy_on_read = {
 .format_name= "copy-on-read",
+.instance_size  = sizeof(BDRVStateCOR),
 
 .bdrv_open  = cor_open,
 .bdrv_child_perm= cor_child_perm,
@@ -159,4 +183,83 @@ static void bdrv_copy_on_read_init(void)
 bdrv_register(&bdrv_copy_on_read);
 }
 
+
+static BlockDriverState *create_filter_node(BlockDriverState *bs,
+const char *filter_node_name,
+Error **errp)
+{
+QDict *opts = qdict_new();
+
+qdict_put_str(opts, "driver", "copy-on-read");
+qdict_put_str(opts, "file", bdrv_get_node_name(bs));
+if (filter_node_name) {
+qdict_put_str(opts, "node-name", filter_node_name);
+}
+
+return bdrv_open(NULL, NULL, opts, BDRV_O_RDWR, errp);
+}
+
+
+BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs,
+ const char *filter_node_name,
+ Error **errp)
+{
+BlockDriverState *cor_filter_bs;
+BDRVStateCOR *state;
+Error *local_err = NULL;
+
+cor_filter_bs = create_filter_node(bs, filter_node_name, errp);
+if (cor_filter_bs == NULL) {
+error_prepend(errp, "Could not create filter node: ");
+return NULL;
+}
+
+if (!filter_node_name) {
+cor_filter_bs->implicit = true;
+}
+
+bdrv_drained_begin(bs);
+bdrv_replace_node(bs, cor_filter_bs, &local_err);
+bdrv_drained_end(bs);
+
+if (local_err) {
+bdrv_unref(cor_filter_bs);
+error_propagate(errp, local_err);
+return NULL;
+}
+
+state = cor_filter_bs->opaque;
+state->active = true;
+
+return cor_filter_bs;
+}
+
+
+void bdrv_cor_filter_drop(BlockDriverState *cor_filter_bs)
+{
+BdrvChild *child;
+BlockDriverState *bs;
+BDRVStateCOR *s = cor_filter_bs->opaque;
+
+child = bdrv_filter_child(cor_filter_bs);
+if (!child) {
+return;
+}
+bs = child->bs;
+
+/* Retain the BDS until we complete the graph change. */
+bdrv_ref(bs);
+/* Hold a guest back from writing while permissions are being reset. */
+bdrv_drained_begin(bs);
+/* Drop permissions before the graph change. */
+s->active = false;
+bdrv_child_refresh_perms(cor_filter_bs, child, &error_abort);
+bdrv_replace_node(cor_filter_bs, bs, &error_abort);
+
+bdrv_drained_end(bs);
+bdrv_unref(bs);
+bdrv_unref(cor_filter_bs);
+}
+
+
 block_init(bdrv_copy_on_read_init);
diff --git a/block/copy-on-read.h b/block/copy-on-read.h
new file mode 100644
index 000..db03c6c
--- /dev/null
+++ b/block/copy-on-read.h
@@ -0,0 +1,36 @@
+/*
+ * Copy-on-read filter block driver
+ *
+ * The filter driver performs Copy-On-Read (COR) operations
+ *
+ * Copyright (c) 2018-2020 Virtuozzo International GmbH.
+ *
+ * Author:
+ *   Andrey Shinkevich 
+ *
+ * This program is free software; you 

[PATCH v6 3/4] qapi: add filter-node-name to block-stream

2020-08-18 Thread Andrey Shinkevich
Provide the possibility to pass the 'filter-node-name' parameter to the
block-stream job, as is done for the commit block job. That will be
needed for further iotest implementations.

Signed-off-by: Andrey Shinkevich 
---
 block/monitor/block-hmp-cmds.c | 4 ++--
 block/stream.c | 4 +++-
 blockdev.c | 8 +++-
 include/block/block_int.h  | 7 ++-
 qapi/block-core.json   | 6 ++
 5 files changed, 24 insertions(+), 5 deletions(-)
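
For illustration, a call with the new argument might look like this (a
sketch with hypothetical values; JOB_DEFAULT is the existing
creation-flags default from include/qemu/job.h):

    Error *local_err = NULL;

    stream_start("stream0", bs, base_bs, NULL /* backing_file_str */,
                 JOB_DEFAULT, 0 /* unlimited speed */,
                 BLOCKDEV_ON_ERROR_REPORT, "cor-filter0", &local_err);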

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 4d3db5e..4e66775 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -507,8 +507,8 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict)
 
 qmp_block_stream(true, device, device, base != NULL, base, false, NULL,
  false, NULL, qdict_haskey(qdict, "speed"), speed, true,
- BLOCKDEV_ON_ERROR_REPORT, false, false, false, false,
- &error);
+ BLOCKDEV_ON_ERROR_REPORT, false, NULL, false, false, false,
+ false, &error);
 
 hmp_handle_error(mon, error);
 }
diff --git a/block/stream.c b/block/stream.c
index b9c1141..8bf6b6d 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -221,7 +221,9 @@ static const BlockJobDriver stream_job_driver = {
 void stream_start(const char *job_id, BlockDriverState *bs,
   BlockDriverState *base, const char *backing_file_str,
   int creation_flags, int64_t speed,
-  BlockdevOnError on_error, Error **errp)
+  BlockdevOnError on_error,
+  const char *filter_node_name,
+  Error **errp)
 {
 StreamBlockJob *s;
 BlockDriverState *iter;
diff --git a/blockdev.c b/blockdev.c
index 237fffb..800ecb3 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2476,6 +2476,7 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
   bool has_backing_file, const char *backing_file,
   bool has_speed, int64_t speed,
   bool has_on_error, BlockdevOnError on_error,
+  bool has_filter_node_name, const char *filter_node_name,
   bool has_auto_finalize, bool auto_finalize,
   bool has_auto_dismiss, bool auto_dismiss,
   Error **errp)
@@ -2491,6 +2492,10 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
 on_error = BLOCKDEV_ON_ERROR_REPORT;
 }
 
+if (!has_filter_node_name) {
+filter_node_name = NULL;
+}
+
 bs = bdrv_lookup_bs(device, device, errp);
 if (!bs) {
 return;
@@ -2558,7 +2563,8 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
 }
 
 stream_start(has_job_id ? job_id : NULL, bs, base_bs, base_name,
- job_flags, has_speed ? speed : 0, on_error, &local_err);
+ job_flags, has_speed ? speed : 0, on_error,
+ filter_node_name, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 goto out;
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 465a601..3efde33 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1122,6 +1122,9 @@ int is_windows_drive(const char *filename);
  *  See @BlockJobCreateFlags
  * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
  * @on_error: The action to take upon error.
+ * @filter_node_name: The node name that should be assigned to the filter
+ * driver that the stream job inserts into the graph above @bs. NULL means
+ * that a node name should be autogenerated.
  * @errp: Error object.
  *
  * Start a streaming operation on @bs.  Clusters that are unallocated
@@ -1134,7 +1137,9 @@ int is_windows_drive(const char *filename);
 void stream_start(const char *job_id, BlockDriverState *bs,
   BlockDriverState *base, const char *backing_file_str,
   int creation_flags, int64_t speed,
-  BlockdevOnError on_error, Error **errp);
+  BlockdevOnError on_error,
+  const char *filter_node_name,
+  Error **errp);
 
 /**
  * commit_start:
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 0b8ccd3..1db6ce1 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2524,6 +2524,11 @@
 #'stop' and 'enospc' can only be used if the block device
 #supports io-status (see BlockInfo).  Since 1.3.
 #
+# @filter-node-name: the node name that should be assigned to the
+#filter driver that the stream job inserts into the graph
+#above @device. If this option is not given, a node name is
+#autogenerated. (Since: 5.1)
+#
 # @auto-finalize: When false, this job will wait in a PENDING state after it has
 

[PATCH v6 1/4] copy-on-read: Support preadv/pwritev_part functions

2020-08-18 Thread Andrey Shinkevich
Add support for the recently introduced functions
bdrv_co_preadv_part()
and
bdrv_co_pwritev_part()
to the COR-filter driver.
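
For context, the _part variants differ from the plain ones only by an
extra qiov_offset argument, which lets a driver pass through a sub-range
of the caller's I/O vector without copying; simplified prototypes (from
include/block/block_int.h of this period):

    int coroutine_fn bdrv_co_preadv_part(BdrvChild *child, int64_t offset,
                                         unsigned int bytes,
                                         QEMUIOVector *qiov,
                                         size_t qiov_offset,
                                         BdrvRequestFlags flags);
    int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child, int64_t offset,
                                          unsigned int bytes,
                                          QEMUIOVector *qiov,
                                          size_t qiov_offset,
                                          BdrvRequestFlags flags);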

Signed-off-by: Andrey Shinkevich 
---
 block/copy-on-read.c | 28 
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index 2816e61..cb03e0f 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -74,21 +74,25 @@ static int64_t cor_getlength(BlockDriverState *bs)
 }
 
 
-static int coroutine_fn cor_co_preadv(BlockDriverState *bs,
-  uint64_t offset, uint64_t bytes,
-  QEMUIOVector *qiov, int flags)
+static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
+   uint64_t offset, uint64_t bytes,
+   QEMUIOVector *qiov,
+   size_t qiov_offset,
+   int flags)
 {
-return bdrv_co_preadv(bs->file, offset, bytes, qiov,
-  flags | BDRV_REQ_COPY_ON_READ);
+return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
+   flags | BDRV_REQ_COPY_ON_READ);
 }
 
 
-static int coroutine_fn cor_co_pwritev(BlockDriverState *bs,
-   uint64_t offset, uint64_t bytes,
-   QEMUIOVector *qiov, int flags)
+static int coroutine_fn cor_co_pwritev_part(BlockDriverState *bs,
+uint64_t offset,
+uint64_t bytes,
+QEMUIOVector *qiov,
+size_t qiov_offset, int flags)
 {
-
-return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
+return bdrv_co_pwritev_part(bs->file, offset, bytes, qiov, qiov_offset,
+flags);
 }
 
 
@@ -137,8 +141,8 @@ static BlockDriver bdrv_copy_on_read = {
 
 .bdrv_getlength = cor_getlength,
 
-.bdrv_co_preadv = cor_co_preadv,
-.bdrv_co_pwritev= cor_co_pwritev,
+.bdrv_co_preadv_part= cor_co_preadv_part,
+.bdrv_co_pwritev_part   = cor_co_pwritev_part,
 .bdrv_co_pwrite_zeroes  = cor_co_pwrite_zeroes,
 .bdrv_co_pdiscard   = cor_co_pdiscard,
 .bdrv_co_pwritev_compressed = cor_co_pwritev_compressed,
-- 
1.8.3.1




Re: [PATCH v7 14/47] stream: Deal with filters

2020-08-18 Thread Andrey Shinkevich

Reviewed-by: Andrey Shinkevich 


On 10.08.2020 14:04, Vladimir Sementsov-Ogievskiy wrote:

10.08.2020 11:12, Max Reitz wrote:

On 07.08.20 12:29, Vladimir Sementsov-Ogievskiy wrote:

16.07.2020 17:59, Max Reitz wrote:

On 10.07.20 19:41, Andrey Shinkevich wrote:

On 10.07.2020 18:24, Max Reitz wrote:

On 09.07.20 16:52, Andrey Shinkevich wrote:

On 25.06.2020 18:21, Max Reitz wrote:

Because of the (not so recent anymore) changes that make the stream job
independent of the base node and instead track the node above it, we
have to split that "bottom" node into two cases: the bottom COW node,
and the node directly above the base node (which may be an R/W filter
or the bottom COW node).

Signed-off-by: Max Reitz 
---
 qapi/block-core.json |  4 +++
 block/stream.c   | 63
 blockdev.c   |  4 ++-
 3 files changed, 53 insertions(+), 18 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index b20332e592..df87855429 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2486,6 +2486,10 @@
 # On successful completion the image file is updated to drop the backing file
 # and the BLOCK_JOB_COMPLETED event is emitted.
 #
+# In case @device is a filter node, block-stream modifies the first
+# non-filter overlay node below it to point to base's backing node (or
+# NULL if @base was not specified) instead of modifying @device itself.
+#
 # @job-id: identifier for the newly-created block job. If
 #  omitted, the device name will be used. (Since 2.7)
 #
diff --git a/block/stream.c b/block/stream.c
index aa2e7af98e..b9c1141656 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -31,7 +31,8 @@ enum {
   typedef struct StreamBlockJob {
 BlockJob common;
-    BlockDriverState *bottom;
+    BlockDriverState *base_overlay; /* COW overlay (stream from this) */
+    BlockDriverState *above_base;   /* Node directly above the base */

Keeping the base_overlay is enough to complete the stream job.
Depends on the definition.  If we decide it isn’t enough, then it isn’t
enough.

The above_base may disappear during the job and we can't rely on it.

In this version of this series, it may not, because the chain is frozen.
So the above_base cannot disappear.


Once we insert a filter above the top bs of the stream job, the parallel
jobs in the iotests #030 will fail with 'frozen link error'. It is
because of the independent parallel stream or commit jobs that
insert/remove their filters asynchronously.


I’m not sure whether that’s a problem with this series specifically.


We can discuss whether we should allow it to disappear, but I think
not.

The problem is, we need something to set as the backing file after
streaming.  How do we figure out what that should be?  My proposal is we
keep above_base and use its immediate child.


We can do the same with the base_overlay.

If the backing node turns out to be a filter, the proper backing child
will be set after the filter is removed. So, we shouldn't care.


And what if the user manually added some filter above the base (i.e.
below base_overlay) that they want to keep after the job?



It's automatically kept, if we use base_overlay->backing->bs as final
backing node.

You mean, that they want it to be dropped?


Er, yes.  Point is, the graph structure below with @base at the root may
be different than the one right below @base_overlay.


so, assuming the following:

top -(backing)-> manually-inserted-filter -(file)-> base

and user do stream with base=base, and expects filter to be removed by
stream job?

Hmm, yes, such use-case is broken with our proposed way...



Let me now clarify the problem we'll have with your way.

When the stream job doesn't have any filter, we can easily imagine two parallel
stream jobs:

top -(backing)-> mid1 -(backing)-> mid2 -(backing)-> base

stream1: top=top, base=mid2
stream2: top=mid2, base=NULL

final picture is obvious:

top (merged with mid1) -(backing)-> mid2 (merged with base)


Yes, and I don’t think this currently working case is broken by this 
series.



But we want the stream job to have its own filter, like mirror.


Which it does not have yet, right?  Which is why I was saying that I
don’t think this is a problem with this series.  We could try to address
it later.

Or do you think we can’t address it later because right now all filter
cases are broken anyway so now would be the time to make a breaking
change (which the suggestion to not use @base as the final backing 
node is)?


I think, we can address it later, but it would be good to fit into one 
release cycle with these series, to not make incompatible behavior 
changes later.





So the picture becomes more complex.

Assume stream2 starts first.

top -(backing)-> mid1 -(backing)-> stream2-filter -(backing)-> mid2
-(backing)-> base


stream2-filter would be on top of mid2, right?


Right. In my picture, "-(backing)->" means backing link. 

Re: [RFC PATCH v3 2/5] util/vfio-helpers: Report error on unsupported host architectures

2020-08-18 Thread Philippe Mathieu-Daudé
On 8/18/20 7:12 PM, Alex Williamson wrote:
> On Tue, 18 Aug 2020 18:45:06 +0200
> Philippe Mathieu-Daudé  wrote:
> 
>> The vfio-helpers implementation expects a TYPEv1 IOMMU, see
>> qemu_vfio_init_pci:
>>
>>   263 if (!ioctl(s->container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU)) {
>>   264 error_setg_errno(errp, errno, "VFIO IOMMU check failed");
>>
>> Thus POWER SPAPR IOMMU is obviously not supported.
>>
>> The implementation only cares about host page size alignment
>> (usually 4KB on X86), not the IOMMU one, which can be problematic
>> on Aarch64 when a 64KB page size is used. So Aarch64 is not
>> supported either.
>>
>> Report an error when the host architecture is different from X86:
>>
>>  $ qemu-system-aarch64 \
>> -drive file=nvme://0001:01:00.0/1,if=none,id=drive0 \
>> -device virtio-blk-pci,drive=drive0
>>   qemu-system-aarch64: -drive file=nvme://0001:01:00.0/1,if=none,id=drive0: QEMU VFIO utility is not supported on this architecture
>>
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>> Cc: Eric Auger 
>> Cc: Drew Jones 
>> Cc: Laurent Vivier 
>> Cc: David Gibson 
>> ---
>>  util/vfio-helpers.c | 26 +-
>>  1 file changed, 25 insertions(+), 1 deletion(-)
>>
>> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
>> index e399e330e26..60017936e3e 100644
>> --- a/util/vfio-helpers.c
>> +++ b/util/vfio-helpers.c
>> @@ -420,14 +420,38 @@ static void qemu_vfio_open_common(QEMUVFIOState *s)
>>  qemu_ram_foreach_block(qemu_vfio_init_ramblock, s);
>>  }
>>  
>> +/**
>> + * Return if the host architecture is supported.
>> + *
>> + * aarch64: IOMMU page alignment not respected
>> + * ppc64:   SPAPR IOMMU window not configured
>> + * x86-64:  Only architecture validated
>> + * other:   Untested
>> + */
>> +static bool qemu_vfio_arch_supported(void)
>> +{
>> +bool supported = false;
>> +
>> +#if defined(HOST_X86_64)
>> +supported = true;
>> +#endif
>> +
>> +return supported;
>> +}
> 
> Why does this need to be hard coded to specific architectures rather
> than probing for type1 IOMMU support and looking at the iova_pgsizes
> from VFIO_IOMMU_GET_INFO to see if there's a compatible size?  It
> requires us to get a bit deeper into the device initialization, but we
> should still be able to unwind out of the device realize.  Otherwise
> we're throwing out aarch64 running on 4KB pages for no reason, right?  Thanks,

Ah yes, much clever! Thanks Alex :)
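
A rough sketch of that probe (untested; types and flags are from
linux-headers/linux/vfio.h):

    struct vfio_iommu_type1_info iommu_info = { .argsz = sizeof(iommu_info) };

    if (ioctl(s->container, VFIO_IOMMU_GET_INFO, &iommu_info)) {
        error_setg_errno(errp, errno, "Failed to get IOMMU info");
        return false;
    }
    /* Require an IOMMU page size matching the host page size */
    if (!(iommu_info.flags & VFIO_IOMMU_INFO_PGSIZES) ||
        !(iommu_info.iova_pgsizes & qemu_real_host_page_size)) {
        error_setg(errp, "No IOMMU page size is compatible with the host page size");
        return false;
    }
    return true;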

> 
> Alex
> 
> 
>>  /**
>>   * Open a PCI device, e.g. ":00:01.0".
>>   */
>>  QEMUVFIOState *qemu_vfio_open_pci(const char *device, Error **errp)
>>  {
>>  int r;
>> -QEMUVFIOState *s = g_new0(QEMUVFIOState, 1);
>> +QEMUVFIOState *s;
>>  
>> +if (!qemu_vfio_arch_supported()) {
>> +error_setg(errp,
>> +   "QEMU VFIO utility is not supported on this architecture");
>> +return NULL;
>> +}
>> +s = g_new0(QEMUVFIOState, 1);
>>  r = qemu_vfio_init_pci(s, device, errp);
>>  if (r) {
>>  g_free(s);
> 




Re: [RFC PATCH v3 4/5] util/vfio-helpers: Introduce qemu_vfio_pci_init_msix_irqs()

2020-08-18 Thread Alex Williamson
On Tue, 18 Aug 2020 18:45:08 +0200
Philippe Mathieu-Daudé  wrote:

> qemu_vfio_pci_init_irq() allows us to initialize any type of IRQ,
> but only one. Introduce qemu_vfio_pci_init_msix_irqs() which is
> specific to MSIX IRQ type, and allow us to use multiple IRQs
> (thus passing multiple eventfd notifiers).
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  include/qemu/vfio-helpers.h |  2 ++
>  util/vfio-helpers.c | 53 +
>  2 files changed, 55 insertions(+)
> 
> diff --git a/include/qemu/vfio-helpers.h b/include/qemu/vfio-helpers.h
> index 1f057c2b9e4..63108ebc8da 100644
> --- a/include/qemu/vfio-helpers.h
> +++ b/include/qemu/vfio-helpers.h
> @@ -28,5 +28,7 @@ void qemu_vfio_pci_unmap_bar(QEMUVFIOState *s, int index, void *bar,
>   uint64_t offset, uint64_t size);
>  int qemu_vfio_pci_init_irq(QEMUVFIOState *s, EventNotifier *e,
> int irq_type, Error **errp);
> +int qemu_vfio_pci_init_msix_irqs(QEMUVFIOState *s, EventNotifier *e,
> + unsigned irq_count, Error **errp);
>  
>  #endif
> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
> index 696f2d51712..fb3a79a5bcb 100644
> --- a/util/vfio-helpers.c
> +++ b/util/vfio-helpers.c
> @@ -215,6 +215,59 @@ int qemu_vfio_pci_init_irq(QEMUVFIOState *s, EventNotifier *e,
>  return 0;
>  }
>  
> +/**
> + * Initialize device MSIX IRQs and register event notifiers.
> + * @irq_count: number of MSIX IRQs to initialize
> + * @e: Array of @irq_count notifiers (each corresponding to a MSIX IRQ)
> + */
> +int qemu_vfio_pci_init_msix_irqs(QEMUVFIOState *s, EventNotifier *e,
> + unsigned irq_count, Error **errp)
> +{
> +int r;
> +struct vfio_irq_set *irq_set;
> +size_t irq_set_size;
> +struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
> +
> +irq_info.index = VFIO_PCI_MSIX_IRQ_INDEX;

Nit, this could be initialized in the declaration with argsz.

> +if (ioctl(s->device, VFIO_DEVICE_GET_IRQ_INFO, &irq_info)) {
> +error_setg_errno(errp, errno, "Failed to get device interrupt info");
> +return -errno;
> +}
> +if (irq_info.count <= irq_count) {


Shouldn't this only test strictly less than?  The API seems to leave
the problem of determining how many vectors might be available as an
exercise for the caller.  Thanks,

Alex
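
Put together, the two suggestions above would look roughly like this (a
sketch, not tested):

    struct vfio_irq_info irq_info = {
        .argsz = sizeof(irq_info),
        .index = VFIO_PCI_MSIX_IRQ_INDEX,   /* initialized in the declaration */
    };

    if (ioctl(s->device, VFIO_DEVICE_GET_IRQ_INFO, &irq_info)) {
        error_setg_errno(errp, errno, "Failed to get device interrupt info");
        return -errno;
    }
    /* Strictly less than: requesting exactly irq_info.count vectors is fine */
    if (irq_info.count < irq_count) {
        error_setg(errp,
                   "Not enough device interrupts available (only %" PRIu32 ")",
                   irq_info.count);
        return -EINVAL;
    }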


> +error_setg(errp,
> +   "Not enough device interrupts available (only %" PRIu32 ")",
> +   irq_info.count);
> +return -EINVAL;
> +}
> +if (!(irq_info.flags & VFIO_IRQ_INFO_EVENTFD)) {
> +error_setg(errp, "Device interrupt doesn't support eventfd");
> +return -EINVAL;
> +}
> +
> +irq_set_size = sizeof(*irq_set) + irq_count * sizeof(int32_t);
> +irq_set = g_malloc0(irq_set_size);
> +
> +/* Get to a known IRQ state */
> +*irq_set = (struct vfio_irq_set) {
> +.argsz = irq_set_size,
> +.flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER,
> +.index = irq_info.index,
> +.start = 0,
> +.count = irq_count,
> +};
> +
> +for (unsigned i = 0; i < irq_count; i++) {
> +((int32_t *)&irq_set->data)[i] = event_notifier_get_fd(&e[i]);
> +}
> +r = ioctl(s->device, VFIO_DEVICE_SET_IRQS, irq_set);
> +g_free(irq_set);
> +if (r) {
> +error_setg_errno(errp, errno, "Failed to setup device interrupts");
> +return -errno;
> +}
> +return 0;
> +}
> +
>  static int qemu_vfio_pci_read_config(QEMUVFIOState *s, void *buf,
>   int size, int ofs)
>  {




Re: [PATCH v2 0/3] hw/sd: Add Cadence SDHCI emulation

2020-08-18 Thread Philippe Mathieu-Daudé
Cc'ing Sai Pavan for patches 1 and 2.

On 8/17/20 12:05 PM, Bin Meng wrote:
> This series is spun off from the following series as it is hw/sd
> centric, so that it can be picked up separately by Philippe.
> 
> http://patchwork.ozlabs.org/project/qemu-devel/list/?series=195648
> 
> This series fixed 2 SD card issues, and added a new model for
> Cadence SDHCI controller.
> 
> Patch "[09/18] hw/sd: sdhci: Make sdhci_poweron_reset() internal visible"
> in this series per the review comments.
> 
> Changes in v2:
> - remove the pointless zero initialization
> - fix SDSC size check in sd_set_csd() too
> - use 's' for the model state
> - call device_cold_reset() in cadence_sdhci_reset()
> - add .impl in cadence_sdhci_ops
> - move Cadence specific register defines to cadence_sdhci.c
> - use 'sdhci' instead of 'slot' to represent SDHCIState
> - use sysbus_mmio_get_region() to access SDHCI model's memory region
> - initialize TYPE_SYSBUS_SDHCI in the instance_init() so that users
>   of Cadence SDHCI do not have to do that themselves
> - propergate irq and 'sd-bus' from generic-sdhci
> 
> Bin Meng (3):
>   hw/sd: sd: Fix incorrect populated function switch status data
> structure
>   hw/sd: sd: Correct the maximum size of a Standard Capacity SD Memory
> Card
>   hw/sd: Add Cadence SDHCI emulation
> 
>  hw/sd/Kconfig |   4 +
>  hw/sd/Makefile.objs   |   1 +
>  hw/sd/cadence_sdhci.c | 200 ++
>  hw/sd/sd.c|   9 +-
>  include/hw/sd/cadence_sdhci.h |  46 ++
>  5 files changed, 257 insertions(+), 3 deletions(-)
>  create mode 100644 hw/sd/cadence_sdhci.c
>  create mode 100644 include/hw/sd/cadence_sdhci.h
> 



[RFC PATCH v3 5/5] block/nvme: Use qemu_vfio_pci_init_msix_irqs() to initialize our IRQ

2020-08-18 Thread Philippe Mathieu-Daudé
Instead of initializing one MSIX IRQ with the generic
qemu_vfio_pci_init_irq() function, use the MSIX specific one which
will allow us to use multiple IRQs. For now we provide an array of
a single IRQ.

Signed-off-by: Philippe Mathieu-Daudé 
---
 block/nvme.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index cdd16d451e7..cb86ba2518d 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -789,8 +789,8 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
 }
 }
 
-ret = qemu_vfio_pci_init_irq(s->vfio, s->irq_notifier,
- VFIO_PCI_MSIX_IRQ_INDEX, errp);
+ret = qemu_vfio_pci_init_msix_irqs(s->vfio, s->irq_notifier,
+   MSIX_IRQ_COUNT, errp);
 if (ret) {
 goto out;
 }
-- 
2.26.2




[RFC PATCH v3 4/5] util/vfio-helpers: Introduce qemu_vfio_pci_init_msix_irqs()

2020-08-18 Thread Philippe Mathieu-Daudé
qemu_vfio_pci_init_irq() allows us to initialize any type of IRQ,
but only one. Introduce qemu_vfio_pci_init_msix_irqs() which is
specific to MSIX IRQ type, and allow us to use multiple IRQs
(thus passing multiple eventfd notifiers).

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/qemu/vfio-helpers.h |  2 ++
 util/vfio-helpers.c | 53 +
 2 files changed, 55 insertions(+)

diff --git a/include/qemu/vfio-helpers.h b/include/qemu/vfio-helpers.h
index 1f057c2b9e4..63108ebc8da 100644
--- a/include/qemu/vfio-helpers.h
+++ b/include/qemu/vfio-helpers.h
@@ -28,5 +28,7 @@ void qemu_vfio_pci_unmap_bar(QEMUVFIOState *s, int index, void *bar,
  uint64_t offset, uint64_t size);
 int qemu_vfio_pci_init_irq(QEMUVFIOState *s, EventNotifier *e,
int irq_type, Error **errp);
+int qemu_vfio_pci_init_msix_irqs(QEMUVFIOState *s, EventNotifier *e,
+ unsigned irq_count, Error **errp);
 
 #endif
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 696f2d51712..fb3a79a5bcb 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -215,6 +215,59 @@ int qemu_vfio_pci_init_irq(QEMUVFIOState *s, EventNotifier *e,
 return 0;
 }
 
+/**
+ * Initialize device MSIX IRQs and register event notifiers.
+ * @irq_count: number of MSIX IRQs to initialize
+ * @e: Array of @irq_count notifiers (each corresponding to a MSIX IRQ)
+ */
+int qemu_vfio_pci_init_msix_irqs(QEMUVFIOState *s, EventNotifier *e,
+ unsigned irq_count, Error **errp)
+{
+int r;
+struct vfio_irq_set *irq_set;
+size_t irq_set_size;
+struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
+
+irq_info.index = VFIO_PCI_MSIX_IRQ_INDEX;
+if (ioctl(s->device, VFIO_DEVICE_GET_IRQ_INFO, &irq_info)) {
+error_setg_errno(errp, errno, "Failed to get device interrupt info");
+return -errno;
+}
+if (irq_info.count <= irq_count) {
+error_setg(errp,
+   "Not enough device interrupts available (only %" PRIu32 ")",
+   irq_info.count);
+return -EINVAL;
+}
+if (!(irq_info.flags & VFIO_IRQ_INFO_EVENTFD)) {
+error_setg(errp, "Device interrupt doesn't support eventfd");
+return -EINVAL;
+}
+
+irq_set_size = sizeof(*irq_set) + irq_count * sizeof(int32_t);
+irq_set = g_malloc0(irq_set_size);
+
+/* Get to a known IRQ state */
+*irq_set = (struct vfio_irq_set) {
+.argsz = irq_set_size,
+.flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER,
+.index = irq_info.index,
+.start = 0,
+.count = irq_count,
+};
+
+for (unsigned i = 0; i < irq_count; i++) {
+((int32_t *)&irq_set->data)[i] = event_notifier_get_fd(&e[i]);
+}
+r = ioctl(s->device, VFIO_DEVICE_SET_IRQS, irq_set);
+g_free(irq_set);
+if (r) {
+error_setg_errno(errp, errno, "Failed to setup device interrupts");
+return -errno;
+}
+return 0;
+}
+
 static int qemu_vfio_pci_read_config(QEMUVFIOState *s, void *buf,
  int size, int ofs)
 {
-- 
2.26.2




[RFC PATCH v3 3/5] util/vfio-helpers: Store eventfd using int32_t type

2020-08-18 Thread Philippe Mathieu-Daudé
Per the documentation in linux-headers/linux/vfio.h:

 VFIO_DEVICE_SET_IRQS

 * DATA_EVENTFD binds the specified ACTION to the provided __s32 eventfd.

Replace 'int' with 'int32_t' to match the documentation.
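
On the hosts QEMU supports this changes no behavior, only the declared
width; illustratively, an assertion like the following would hold there:

    QEMU_BUILD_BUG_ON(sizeof(int) != sizeof(int32_t));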

Fixes: 418026ca43 ("util: Introduce vfio helpers")
Signed-off-by: Philippe Mathieu-Daudé 
---
 util/vfio-helpers.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 60017936e3e..696f2d51712 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -193,7 +193,7 @@ int qemu_vfio_pci_init_irq(QEMUVFIOState *s, EventNotifier *e,
 return -EINVAL;
 }
 
-irq_set_size = sizeof(*irq_set) + sizeof(int);
+irq_set_size = sizeof(*irq_set) + sizeof(int32_t);
 irq_set = g_malloc0(irq_set_size);
 
 /* Get to a known IRQ state */
@@ -205,7 +205,7 @@ int qemu_vfio_pci_init_irq(QEMUVFIOState *s, EventNotifier *e,
 .count = 1,
 };
 
-*(int *)&irq_set->data = event_notifier_get_fd(e);
+*(int32_t *)&irq_set->data = event_notifier_get_fd(e);
 r = ioctl(s->device, VFIO_DEVICE_SET_IRQS, irq_set);
 g_free(irq_set);
 if (r) {
-- 
2.26.2




[RFC PATCH v3 0/5] util/vfio-helpers: Add support for multiple IRQs

2020-08-18 Thread Philippe Mathieu-Daudé
This series intends to set up the VFIO helper to allow
binding notifiers on different IRQs.

For the NVMe use case, we only care about MSIX interrupts.
To not disrupt other users, introduce the qemu_vfio_pci_init_msix_irqs
function to initialize multiple MSIX IRQs and attach eventfd to
them.

Since RFC v2:
- new patch to report vfio-helpers is not supported on AA64/POWER

(NVMe block driver series will follow).

Based-on: <20200812185014.18267-1-phi...@redhat.com>
"block/nvme: Various cleanups required to use multiple queues"
https://www.mail-archive.com/qemu-devel@nongnu.org/msg729395.html

Philippe Mathieu-Daudé (5):
  block/nvme: Use an array of EventNotifier
  util/vfio-helpers: Report error on unsupported host architectures
  util/vfio-helpers: Store eventfd using int32_t type
  util/vfio-helpers: Introduce qemu_vfio_pci_init_msix_irqs()
  block/nvme: Use qemu_vfio_pci_init_msix_irqs() to initialize our IRQ

 include/qemu/vfio-helpers.h |  2 +
 block/nvme.c| 30 +-
 util/vfio-helpers.c | 83 +++--
 3 files changed, 101 insertions(+), 14 deletions(-)

-- 
2.26.2




[RFC PATCH v3 1/5] block/nvme: Use an array of EventNotifier

2020-08-18 Thread Philippe Mathieu-Daudé
In preparation for using multiple IRQs (thus multiple eventfds),
make BDRVNVMeState::irq_notifier an array (for now of a single
element, the admin queue notifier).

Signed-off-by: Philippe Mathieu-Daudé 
---
 block/nvme.c | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index a61e86a83eb..cdd16d451e7 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -106,6 +106,9 @@ QEMU_BUILD_BUG_ON(offsetof(NVMeRegs, doorbells) != 0x1000);
 #define INDEX_ADMIN 0
 #define INDEX_IO(n) (1 + n)
 
+/* This driver shares a single MSIX IRQ for the admin and I/O queues */
+#define MSIX_IRQ_COUNT  1
+
 struct BDRVNVMeState {
 AioContext *aio_context;
 QEMUVFIOState *vfio;
@@ -120,7 +123,7 @@ struct BDRVNVMeState {
 /* How many uint32_t elements does each doorbell entry take. */
 size_t doorbell_scale;
 bool write_cache_supported;
-EventNotifier irq_notifier;
+EventNotifier irq_notifier[MSIX_IRQ_COUNT];
 
 uint64_t nsze; /* Namespace size reported by identify command */
 int nsid;  /* The namespace id to read/write data. */
@@ -631,7 +634,8 @@ static bool nvme_poll_queues(BDRVNVMeState *s)
 
 static void nvme_handle_event(EventNotifier *n)
 {
-BDRVNVMeState *s = container_of(n, BDRVNVMeState, irq_notifier);
+BDRVNVMeState *s = container_of(n, BDRVNVMeState,
+irq_notifier[INDEX_ADMIN]);
 
 trace_nvme_handle_event(s);
 event_notifier_test_and_clear(n);
@@ -683,7 +687,8 @@ out_error:
 static bool nvme_poll_cb(void *opaque)
 {
 EventNotifier *e = opaque;
-BDRVNVMeState *s = container_of(e, BDRVNVMeState, irq_notifier);
+BDRVNVMeState *s = container_of(e, BDRVNVMeState,
+irq_notifier[INDEX_ADMIN]);
 
 trace_nvme_poll_cb(s);
 return nvme_poll_queues(s);
@@ -705,7 +710,7 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
 s->device = g_strdup(device);
 s->nsid = namespace;
 s->aio_context = bdrv_get_aio_context(bs);
-ret = event_notifier_init(&s->irq_notifier, 0);
+ret = event_notifier_init(&s->irq_notifier[INDEX_ADMIN], 0);
 if (ret) {
 error_setg(errp, "Failed to init event notifier");
 return ret;
@@ -784,12 +789,13 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
 }
 }
 
-ret = qemu_vfio_pci_init_irq(s->vfio, &s->irq_notifier,
+ret = qemu_vfio_pci_init_irq(s->vfio, s->irq_notifier,
  VFIO_PCI_MSIX_IRQ_INDEX, errp);
 if (ret) {
 goto out;
 }
-aio_set_event_notifier(bdrv_get_aio_context(bs), &s->irq_notifier,
+aio_set_event_notifier(bdrv_get_aio_context(bs),
+   &s->irq_notifier[INDEX_ADMIN],
false, nvme_handle_event, nvme_poll_cb);
 
 nvme_identify(bs, namespace, _err);
@@ -872,9 +878,10 @@ static void nvme_close(BlockDriverState *bs)
 nvme_free_queue_pair(s->queues[i]);
 }
 g_free(s->queues);
-aio_set_event_notifier(bdrv_get_aio_context(bs), &s->irq_notifier,
+aio_set_event_notifier(bdrv_get_aio_context(bs),
+   &s->irq_notifier[INDEX_ADMIN],
 false, NULL, NULL);
-event_notifier_cleanup(&s->irq_notifier);
+event_notifier_cleanup(&s->irq_notifier[INDEX_ADMIN]);
 qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->regs, 0, NVME_BAR_SIZE);
 qemu_vfio_close(s->vfio);
 
@@ -1381,7 +1388,8 @@ static void nvme_detach_aio_context(BlockDriverState *bs)
 q->completion_bh = NULL;
 }
 
-aio_set_event_notifier(bdrv_get_aio_context(bs), &s->irq_notifier,
+aio_set_event_notifier(bdrv_get_aio_context(bs),
+   &s->irq_notifier[INDEX_ADMIN],
false, NULL, NULL);
 }
 
@@ -1391,7 +1399,7 @@ static void nvme_attach_aio_context(BlockDriverState *bs,
 BDRVNVMeState *s = bs->opaque;
 
 s->aio_context = new_context;
-aio_set_event_notifier(new_context, &s->irq_notifier,
+aio_set_event_notifier(new_context, &s->irq_notifier[INDEX_ADMIN],
false, nvme_handle_event, nvme_poll_cb);
 
 for (int i = 0; i < s->nr_queues; i++) {
-- 
2.26.2




[RFC PATCH v3 2/5] util/vfio-helpers: Report error on unsupported host architectures

2020-08-18 Thread Philippe Mathieu-Daudé
The vfio-helpers implementation expects a TYPEv1 IOMMU, see
qemu_vfio_init_pci:

  263 if (!ioctl(s->container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU)) {
  264 error_setg_errno(errp, errno, "VFIO IOMMU check failed");

Thus POWER SPAPR IOMMU is obviously not supported.

The implementation only cares about host page size alignment
(usually 4KB on X86), not the IOMMU one, which can be problematic
on Aarch64 when a 64KB page size is used. So Aarch64 is not
supported either.

Report an error when the host architecture is different from X86:

 $ qemu-system-aarch64 \
-drive file=nvme://0001:01:00.0/1,if=none,id=drive0 \
-device virtio-blk-pci,drive=drive0
  qemu-system-aarch64: -drive file=nvme://0001:01:00.0/1,if=none,id=drive0: QEMU VFIO utility is not supported on this architecture

Signed-off-by: Philippe Mathieu-Daudé 
---
Cc: Eric Auger 
Cc: Drew Jones 
Cc: Laurent Vivier 
Cc: David Gibson 
---
 util/vfio-helpers.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index e399e330e26..60017936e3e 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -420,14 +420,38 @@ static void qemu_vfio_open_common(QEMUVFIOState *s)
 qemu_ram_foreach_block(qemu_vfio_init_ramblock, s);
 }
 
+/**
+ * Return if the host architecture is supported.
+ *
+ * aarch64: IOMMU page alignment not respected
+ * ppc64:   SPAPR IOMMU window not configured
+ * x86-64:  Only architecture validated
+ * other:   Untested
+ */
+static bool qemu_vfio_arch_supported(void)
+{
+bool supported = false;
+
+#if defined(HOST_X86_64)
+supported = true;
+#endif
+
+return supported;
+}
 /**
  * Open a PCI device, e.g. ":00:01.0".
  */
 QEMUVFIOState *qemu_vfio_open_pci(const char *device, Error **errp)
 {
 int r;
-QEMUVFIOState *s = g_new0(QEMUVFIOState, 1);
+QEMUVFIOState *s;
 
+if (!qemu_vfio_arch_supported()) {
+error_setg(errp,
+   "QEMU VFIO utility is not supported on this architecture");
+return NULL;
+}
+s = g_new0(QEMUVFIOState, 1);
 r = qemu_vfio_init_pci(s, device, errp);
 if (r) {
 g_free(s);
-- 
2.26.2




[PATCH v6 7/7] vhost-user-blk-pci: default num_queues to -smp N

2020-08-18 Thread Stefan Hajnoczi
Automatically size the number of request virtqueues to match the number
of vCPUs.  This ensures that completion interrupts are handled on the
same vCPU that submitted the request.  No IPI is necessary to complete
an I/O request and performance is improved.  The maximum number of MSI-X
vectors and the virtqueue count limit are respected.
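
For reference, the helper added earlier in this series computes the
value roughly as follows (a sketch of virtio_pci_optimal_num_queues()
from hw/virtio/virtio-pci.c; the bounds are the MSI-X vector limit and
VIRTIO_QUEUE_MAX):

    unsigned virtio_pci_optimal_num_queues(unsigned fixed_queues)
    {
        /* one request virtqueue per vCPU, capped by vector/queue limits */
        return MIN(PCI_MSIX_FLAGS_QSIZE + 1 - fixed_queues,
                   MIN(VIRTIO_QUEUE_MAX - fixed_queues,
                       current_machine->smp.cpus));
    }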

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Cornelia Huck 
Reviewed-by: Raphael Norwitz 
---
 include/hw/virtio/vhost-user-blk.h | 2 ++
 hw/block/vhost-user-blk.c  | 6 +-
 hw/core/machine.c  | 1 +
 hw/virtio/vhost-user-blk-pci.c | 4 
 4 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h
index 34ad6f0c0e..292d17147c 100644
--- a/include/hw/virtio/vhost-user-blk.h
+++ b/include/hw/virtio/vhost-user-blk.h
@@ -25,6 +25,8 @@
 #define VHOST_USER_BLK(obj) \
 OBJECT_CHECK(VHostUserBlk, (obj), TYPE_VHOST_USER_BLK)
 
+#define VHOST_USER_BLK_AUTO_NUM_QUEUES UINT16_MAX
+
 typedef struct VHostUserBlk {
 VirtIODevice parent_obj;
 CharBackend chardev;
diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index a00b854736..39aec42dae 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -420,6 +420,9 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
 return;
 }
 
+if (s->num_queues == VHOST_USER_BLK_AUTO_NUM_QUEUES) {
+s->num_queues = 1;
+}
 if (!s->num_queues || s->num_queues > VIRTIO_QUEUE_MAX) {
 error_setg(errp, "vhost-user-blk: invalid number of IO queues");
 return;
@@ -531,7 +534,8 @@ static const VMStateDescription vmstate_vhost_user_blk = {
 
 static Property vhost_user_blk_properties[] = {
 DEFINE_PROP_CHR("chardev", VHostUserBlk, chardev),
-DEFINE_PROP_UINT16("num-queues", VHostUserBlk, num_queues, 1),
+DEFINE_PROP_UINT16("num-queues", VHostUserBlk, num_queues,
+   VHOST_USER_BLK_AUTO_NUM_QUEUES),
 DEFINE_PROP_UINT32("queue-size", VHostUserBlk, queue_size, 128),
 DEFINE_PROP_BIT("config-wce", VHostUserBlk, config_wce, 0, true),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 7f65fa8743..ea26d61237 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -30,6 +30,7 @@
 
 GlobalProperty hw_compat_5_1[] = {
 { "vhost-scsi", "num_queues", "1"},
+{ "vhost-user-blk", "num-queues", "1"},
 { "vhost-user-scsi", "num_queues", "1"},
 { "virtio-blk-device", "num-queues", "1"},
 { "virtio-scsi-device", "num_queues", "1"},
diff --git a/hw/virtio/vhost-user-blk-pci.c b/hw/virtio/vhost-user-blk-pci.c
index 4f5d5cbf44..a62a71e067 100644
--- a/hw/virtio/vhost-user-blk-pci.c
+++ b/hw/virtio/vhost-user-blk-pci.c
@@ -54,6 +54,10 @@ static void vhost_user_blk_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
 VHostUserBlkPCI *dev = VHOST_USER_BLK_PCI(vpci_dev);
 DeviceState *vdev = DEVICE(&dev->vdev);
 
+if (dev->vdev.num_queues == VHOST_USER_BLK_AUTO_NUM_QUEUES) {
+dev->vdev.num_queues = virtio_pci_optimal_num_queues(0);
+}
+
 if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
 vpci_dev->nvectors = dev->vdev.num_queues + 1;
 }
-- 
2.26.2



[PATCH v6 2/7] hw: add 5.2 machine types and 5.1 compat options

2020-08-18 Thread Stefan Hajnoczi
arm, i386, ppc, and s390x have versioned machine types and associated
compatibility options. Introduce new ones now that QEMU 5.1 has been
released.
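
Mechanically, each hw_compat_5_1 / pc_compat_5_1 entry pins a device
property to its old default when a 5.1 (or older) machine type is
selected. For example, patch 7/7 of this series adds:

    GlobalProperty hw_compat_5_1[] = {
        { "vhost-user-blk", "num-queues", "1" },
    };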

Signed-off-by: Stefan Hajnoczi 
---
 include/hw/boards.h|  3 +++
 include/hw/i386/pc.h   |  3 +++
 hw/arm/virt.c  |  9 -
 hw/core/machine.c  |  4 
 hw/i386/pc.c   |  4 
 hw/i386/pc_piix.c  | 14 +-
 hw/i386/pc_q35.c   | 13 -
 hw/ppc/spapr.c | 15 +--
 hw/s390x/s390-virtio-ccw.c | 14 +-
 9 files changed, 73 insertions(+), 6 deletions(-)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index 426ce5f625..bc5b82ad20 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -319,6 +319,9 @@ struct MachineState {
 } \
 type_init(machine_initfn##_register_types)
 
+extern GlobalProperty hw_compat_5_1[];
+extern const size_t hw_compat_5_1_len;
+
 extern GlobalProperty hw_compat_5_0[];
 extern const size_t hw_compat_5_0_len;
 
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 3d7ed3a55e..fe52e165b2 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -193,6 +193,9 @@ void pc_system_firmware_init(PCMachineState *pcms, 
MemoryRegion *rom_memory);
 void pc_madt_cpu_entry(AcpiDeviceIf *adev, int uid,
const CPUArchIdList *apic_ids, GArray *entry);
 
+extern GlobalProperty pc_compat_5_1[];
+extern const size_t pc_compat_5_1_len;
+
 extern GlobalProperty pc_compat_5_0[];
 extern const size_t pc_compat_5_0_len;
 
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ecfee362a1..acf9bfbece 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2546,10 +2546,17 @@ static void machvirt_machine_init(void)
 }
 type_init(machvirt_machine_init);
 
+static void virt_machine_5_2_options(MachineClass *mc)
+{
+}
+DEFINE_VIRT_MACHINE_AS_LATEST(5, 2)
+
 static void virt_machine_5_1_options(MachineClass *mc)
 {
+virt_machine_5_2_options(mc);
+compat_props_add(mc->compat_props, hw_compat_5_1, hw_compat_5_1_len);
 }
-DEFINE_VIRT_MACHINE_AS_LATEST(5, 1)
+DEFINE_VIRT_MACHINE(5, 1)
 
 static void virt_machine_5_0_options(MachineClass *mc)
 {
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 8d1a90c6cf..a6f7e4e8d7 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -28,6 +28,10 @@
 #include "hw/mem/nvdimm.h"
 #include "migration/vmstate.h"
 
+GlobalProperty hw_compat_5_1[] = {
+};
+const size_t hw_compat_5_1_len = G_N_ELEMENTS(hw_compat_5_1);
+
 GlobalProperty hw_compat_5_0[] = {
 { "pci-host-bridge", "x-config-reg-migration-enabled", "off" },
 { "virtio-balloon-device", "page-poison", "false" },
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 47c5ca3e34..4afcf17f99 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -97,6 +97,10 @@
 #include "fw_cfg.h"
 #include "trace.h"
 
+GlobalProperty pc_compat_5_1[] = {
+};
+const size_t pc_compat_5_1_len = G_N_ELEMENTS(pc_compat_5_1);
+
 GlobalProperty pc_compat_5_0[] = {
 };
 const size_t pc_compat_5_0_len = G_N_ELEMENTS(pc_compat_5_0);
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index b789e83f9a..9ce1f9c5d6 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -426,7 +426,7 @@ static void pc_i440fx_machine_options(MachineClass *m)
 machine_class_allow_dynamic_sysbus_dev(m, TYPE_VMBUS_BRIDGE);
 }
 
-static void pc_i440fx_5_1_machine_options(MachineClass *m)
+static void pc_i440fx_5_2_machine_options(MachineClass *m)
 {
 PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 pc_i440fx_machine_options(m);
@@ -435,6 +435,18 @@ static void pc_i440fx_5_1_machine_options(MachineClass *m)
 pcmc->default_cpu_version = 1;
 }
 
+DEFINE_I440FX_MACHINE(v5_2, "pc-i440fx-5.2", NULL,
+  pc_i440fx_5_2_machine_options);
+
+static void pc_i440fx_5_1_machine_options(MachineClass *m)
+{
+pc_i440fx_machine_options(m);
+m->alias = NULL;
+m->is_default = false;
+compat_props_add(m->compat_props, hw_compat_5_1, hw_compat_5_1_len);
+compat_props_add(m->compat_props, pc_compat_5_1, pc_compat_5_1_len);
+}
+
 DEFINE_I440FX_MACHINE(v5_1, "pc-i440fx-5.1", NULL,
   pc_i440fx_5_1_machine_options);
 
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index a3e607a544..4acaad557c 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -353,7 +353,7 @@ static void pc_q35_machine_options(MachineClass *m)
 m->max_cpus = 288;
 }
 
-static void pc_q35_5_1_machine_options(MachineClass *m)
+static void pc_q35_5_2_machine_options(MachineClass *m)
 {
 PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 pc_q35_machine_options(m);
@@ -361,6 +361,17 @@ static void pc_q35_5_1_machine_options(MachineClass *m)
 pcmc->default_cpu_version = 1;
 }
 
+DEFINE_Q35_MACHINE(v5_2, "pc-q35-5.2", NULL,
+   pc_q35_5_2_machine_options);
+
+static void pc_q35_5_1_machine_options(MachineClass *m)
+{
+pc_q35_machine_options(m);
+m->alias = NULL;
+compat_props_add(m->compat_props, hw_compat_5_1, 

Re: [RFC PATCH 17/22] block/export: Add blk_exp_close_all(_type)

2020-08-18 Thread Max Reitz
On 13.08.20 18:29, Kevin Wolf wrote:
> This adds a function to shut down all block exports, and another one to
> shut down the block exports of a single type. The latter is used for now
> when stopping the NBD server. As soon as we implement support for
> multiple NBD servers, we'll need a per-server list of exports and it
> will be replaced by a function using that.
> 
> As a side effect, the BlockExport layer has a list tracking all existing
> exports now. closed_exports loses its only user and can go away.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/export.h |  8 +++
>  include/block/nbd.h|  2 --
>  block.c|  2 +-
>  block/export/export.c  | 52 ++
>  blockdev-nbd.c |  2 +-
>  nbd/server.c   | 34 ---
>  qemu-nbd.c |  2 +-
>  7 files changed, 68 insertions(+), 34 deletions(-)

Reviewed-by: Max Reitz 

> diff --git a/block/export/export.c b/block/export/export.c
> index 9de108cbc1..675db9a8b9 100644
> --- a/block/export/export.c
> +++ b/block/export/export.c

[...]

> +/* type == BLOCK_EXPORT_TYPE__MAX for all types */
> +void blk_exp_close_all_type(BlockExportType type)
> +{
> +BlockExport *exp, *next;
> +
> +QLIST_FOREACH_SAFE(exp, &block_exports, next, next) {
> +if (type != BLOCK_EXPORT_TYPE__MAX && exp->drv->type != type) {
> +continue;
> +}
> +blk_exp_request_shutdown(exp);
> +}
> +
> +AIO_WAIT_WHILE(NULL, blk_exp_has_type(type));
> +}
> +
> +void blk_exp_close_all(void)
> +{
> +blk_exp_close_all_type(BLOCK_EXPORT_TYPE__MAX);

What’s interesting about this is that I saw from the header file that
you added both this and the type-specific function and wondered “Why not
just pass __MAX to close_all_type() to close all?”  And then I thought
“Because that would be stupid as an external interface”.

So I see you had the same thinking.



signature.asc
Description: OpenPGP digital signature


[PATCH v6 0/7] virtio-pci: enable blk and scsi multi-queue by default

2020-08-18 Thread Stefan Hajnoczi
v6:
 * Rebased onto QEMU 5.1 and added the now-necessary machine compat opts.

v4:
 * Sorry for the long delay. I considered replacing this series with a simpler
   approach. Real hardware ships with a fixed number of queues (e.g. 128). The
   equivalent can be done in QEMU too. That way we don't need to magically size
   num_queues. In the end I decided against this approach because the Linux
   virtio_blk.ko and virtio_scsi.ko guest drivers unconditionally initialized
   all available queues until recently (it was written with
   num_queues=num_vcpus in mind). It doesn't make sense for a 1 CPU guest to
   bring up 128 virtqueues (waste of resources and possibly weird performance
   effects with blk-mq).
 * Honor maximum number of MSI-X vectors and virtqueues [Daniel Berrange]
 * Update commit descriptions to mention maximum MSI-X vector and virtqueue
   caps [Raphael]
v3:
 * Introduce virtio_pci_optimal_num_queues() helper to enforce VIRTIO_QUEUE_MAX
   in one place
 * Use VIRTIO_SCSI_VQ_NUM_FIXED constant in all cases [Cornelia]
 * Update hw/core/machine.c compat properties for QEMU 5.0 [Michael]
v3:
 * Add new performance results that demonstrate the scalability
 * Mention that this is PCI-specific [Cornelia]
v2:
 * Let the virtio-DEVICE-pci device select num-queues because the optimal
   multi-queue configuration may differ between virtio-pci, virtio-mmio, and
   virtio-ccw [Cornelia]

Enabling multi-queue on virtio-pci storage devices improves performance on SMP
guests because the completion interrupt is handled on the vCPU that submitted
the I/O request.  This avoids IPIs inside the guest.

Note that performance is unchanged in these cases:
1. Uniprocessor guests.  They don't have IPIs.
2. Application threads might be scheduled on the sole vCPU that handles
   completion interrupts purely by chance.  (This is one reason why benchmark
   results can vary noticeably between runs.)
3. Users may bind the application to the vCPU that handles completion
   interrupts.

Set the number of queues to the number of vCPUs by default on virtio-blk and
virtio-scsi PCI devices.  Older machine types continue to default to 1 queue
for live migration compatibility.
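
The shape of the change is the same for each affected device; here is a
condensed sketch (simplified placeholder names like "ExampleDev", not the
actual hunks -- those are in the individual patches):

    #define VIRTIO_EXAMPLE_AUTO_NUM_QUEUES UINT16_MAX  /* sentinel default */

    static void example_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
    {
        ExampleDev *dev = EXAMPLE_DEV(vpci_dev);

        if (dev->conf.num_queues == VIRTIO_EXAMPLE_AUTO_NUM_QUEUES) {
            /* One request queue per vCPU, clamped to the MSI-X vector
             * and VIRTIO_QUEUE_MAX limits (see patch 3/7). */
            dev->conf.num_queues = virtio_pci_optimal_num_queues(0);
        }
    }

Machine compat properties such as { "virtio-blk-device", "num-queues", "1" }
pin the old default on pre-5.2 machine types.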

Random read performance:
       IOPS
q=1    78k
q=32   104k  +33%

Boot time:
       Duration
q=1    51s
q=32   1m41s  +98%

Guest configuration: 32 vCPUs, 101 virtio-blk-pci disks

Previously measured results on a 4 vCPU guest were also positive but showed a
smaller 1-4% performance improvement.  They are no longer valid because
significant event loop optimizations have been merged.

Peter Maydell (1):
  Open 5.2 development tree

Stefan Hajnoczi (6):
  hw: add 5.2 machine types and 5.1 compat options
  virtio-pci: add virtio_pci_optimal_num_queues() helper
  virtio-scsi: introduce a constant for fixed virtqueues
  virtio-scsi-pci: default num_queues to -smp N
  virtio-blk-pci: default num_queues to -smp N
  vhost-user-blk-pci: default num_queues to -smp N

 hw/virtio/virtio-pci.h |  9 +
 include/hw/boards.h|  3 +++
 include/hw/i386/pc.h   |  3 +++
 include/hw/virtio/vhost-user-blk.h |  2 ++
 include/hw/virtio/virtio-blk.h |  2 ++
 include/hw/virtio/virtio-scsi.h|  5 +
 hw/arm/virt.c  |  9 -
 hw/block/vhost-user-blk.c  |  6 +-
 hw/block/virtio-blk.c  |  6 +-
 hw/core/machine.c  |  9 +
 hw/i386/pc.c   |  4 
 hw/i386/pc_piix.c  | 14 -
 hw/i386/pc_q35.c   | 13 +++-
 hw/ppc/spapr.c | 15 --
 hw/s390x/s390-virtio-ccw.c | 14 -
 hw/scsi/vhost-scsi.c   |  3 ++-
 hw/scsi/vhost-user-scsi.c  |  5 +++--
 hw/scsi/virtio-scsi.c  | 13 
 hw/virtio/vhost-scsi-pci.c |  9 +++--
 hw/virtio/vhost-user-blk-pci.c |  4 
 hw/virtio/vhost-user-scsi-pci.c|  9 +++--
 hw/virtio/virtio-blk-pci.c |  7 ++-
 hw/virtio/virtio-pci.c | 32 ++
 hw/virtio/virtio-scsi-pci.c|  9 +++--
 VERSION|  2 +-
 25 files changed, 184 insertions(+), 23 deletions(-)

-- 
2.26.2



Re: [PATCH v5 5/5] iotests: add commit top->base cases to 274

2020-08-18 Thread Alberto Garcia
On Wed 10 Jun 2020 02:04:26 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
> These cases are fixed by previous patches around block_status and
> is_allocated.

Reviewed-by: Alberto Garcia 

Berto



Re: [RFC PATCH 18/22] block/export: Add 'id' option to block-export-add

2020-08-18 Thread Max Reitz
On 13.08.20 18:29, Kevin Wolf wrote:
> We'll need an id to identify block exports in monitor commands. This
> adds one.
> 
> Note that this is different from the 'name' option in the NBD server,
> which is the externally visible export name. While block export ids need
> to be unique in the whole process, export names must be unique only for
> the same server. Different export types or (potentially in the future)
> multiple NBD servers can have the same export name externally, but still
> need different block export ids internally.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  qapi/block-export.json |  3 +++
>  include/block/export.h |  3 +++
>  block/export/export.c  | 27 +++
>  qemu-nbd.c |  1 +
>  4 files changed, 34 insertions(+)

Looks good, just one thing:

> diff --git a/block/export/export.c b/block/export/export.c
> index 675db9a8b9..72f1fab975 100644
> --- a/block/export/export.c
> +++ b/block/export/export.c

[...]

> @@ -144,6 +170,7 @@ void qmp_nbd_server_add(NbdServerAddOptions *arg, Error 
> **errp)
>  BlockExportOptions *export_opts = g_new(BlockExportOptions, 1);
>  *export_opts = (BlockExportOptions) {
>  .type   = BLOCK_EXPORT_TYPE_NBD,
> +.id = g_strdup(arg->name ?: arg->device),

Maybe this behavior should be documented for nbd-server-add?

>  .device = g_strdup(arg->device),
>  .u.nbd = {
>  .has_name   = arg->has_name,



signature.asc
Description: OpenPGP digital signature


[PATCH v6 5/7] virtio-scsi-pci: default num_queues to -smp N

2020-08-18 Thread Stefan Hajnoczi
Automatically size the number of virtio-scsi-pci, vhost-scsi-pci, and
vhost-user-scsi-pci request virtqueues to match the number of vCPUs.
Other transports continue to default to 1 request virtqueue.

A 1:1 virtqueue:vCPU mapping ensures that completion interrupts are
handled on the same vCPU that submitted the request.  No IPI is
necessary to complete an I/O request and performance is improved.  The
maximum number of MSI-X vectors and virtqueues limit are respected.

Signed-off-by: Stefan Hajnoczi 
---
 include/hw/virtio/virtio-scsi.h |  2 ++
 hw/core/machine.c   |  3 +++
 hw/scsi/vhost-scsi.c|  3 ++-
 hw/scsi/vhost-user-scsi.c   |  3 ++-
 hw/scsi/virtio-scsi.c   |  6 +-
 hw/virtio/vhost-scsi-pci.c  | 10 +++---
 hw/virtio/vhost-user-scsi-pci.c | 10 +++---
 hw/virtio/virtio-scsi-pci.c | 10 +++---
 8 files changed, 35 insertions(+), 12 deletions(-)

diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 9f293bcb80..c0b8e4dd7e 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -39,6 +39,8 @@
 /* Number of virtqueues that are always present */
 #define VIRTIO_SCSI_VQ_NUM_FIXED2
 
+#define VIRTIO_SCSI_AUTO_NUM_QUEUES UINT32_MAX
+
 typedef struct virtio_scsi_cmd_req VirtIOSCSICmdReq;
 typedef struct virtio_scsi_cmd_resp VirtIOSCSICmdResp;
 typedef struct virtio_scsi_ctrl_tmf_req VirtIOSCSICtrlTMFReq;
diff --git a/hw/core/machine.c b/hw/core/machine.c
index a6f7e4e8d7..9ee2aa0f7b 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -29,6 +29,9 @@
 #include "migration/vmstate.h"
 
 GlobalProperty hw_compat_5_1[] = {
+{ "vhost-scsi", "num_queues", "1"},
+{ "vhost-user-scsi", "num_queues", "1"},
+{ "virtio-scsi-device", "num_queues", "1"},
 };
 const size_t hw_compat_5_1_len = G_N_ELEMENTS(hw_compat_5_1);
 
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 13b05af29b..a83ffeefc8 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -270,7 +270,8 @@ static Property vhost_scsi_properties[] = {
 DEFINE_PROP_STRING("vhostfd", VirtIOSCSICommon, conf.vhostfd),
 DEFINE_PROP_STRING("wwpn", VirtIOSCSICommon, conf.wwpn),
 DEFINE_PROP_UINT32("boot_tpgt", VirtIOSCSICommon, conf.boot_tpgt, 0),
-DEFINE_PROP_UINT32("num_queues", VirtIOSCSICommon, conf.num_queues, 1),
+DEFINE_PROP_UINT32("num_queues", VirtIOSCSICommon, conf.num_queues,
+   VIRTIO_SCSI_AUTO_NUM_QUEUES),
 DEFINE_PROP_UINT32("virtqueue_size", VirtIOSCSICommon, conf.virtqueue_size,
128),
 DEFINE_PROP_BOOL("seg_max_adjust", VirtIOSCSICommon, conf.seg_max_adjust,
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index a8b821466f..7c0631656c 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -162,7 +162,8 @@ static void vhost_user_scsi_unrealize(DeviceState *dev)
 static Property vhost_user_scsi_properties[] = {
 DEFINE_PROP_CHR("chardev", VirtIOSCSICommon, conf.chardev),
 DEFINE_PROP_UINT32("boot_tpgt", VirtIOSCSICommon, conf.boot_tpgt, 0),
-DEFINE_PROP_UINT32("num_queues", VirtIOSCSICommon, conf.num_queues, 1),
+DEFINE_PROP_UINT32("num_queues", VirtIOSCSICommon, conf.num_queues,
+   VIRTIO_SCSI_AUTO_NUM_QUEUES),
 DEFINE_PROP_UINT32("virtqueue_size", VirtIOSCSICommon, conf.virtqueue_size,
128),
 DEFINE_PROP_UINT32("max_sectors", VirtIOSCSICommon, conf.max_sectors,
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index eecdd05af5..3a71ea7097 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -891,6 +891,9 @@ void virtio_scsi_common_realize(DeviceState *dev,
 virtio_init(vdev, "virtio-scsi", VIRTIO_ID_SCSI,
 sizeof(VirtIOSCSIConfig));
 
+if (s->conf.num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
+s->conf.num_queues = 1;
+}
 if (s->conf.num_queues == 0 ||
 s->conf.num_queues > VIRTIO_QUEUE_MAX - VIRTIO_SCSI_VQ_NUM_FIXED) {
 error_setg(errp, "Invalid number of queues (= %" PRIu32 "), "
@@ -964,7 +967,8 @@ static void virtio_scsi_device_unrealize(DeviceState *dev)
 }
 
 static Property virtio_scsi_properties[] = {
-DEFINE_PROP_UINT32("num_queues", VirtIOSCSI, parent_obj.conf.num_queues, 
1),
+DEFINE_PROP_UINT32("num_queues", VirtIOSCSI, parent_obj.conf.num_queues,
+   VIRTIO_SCSI_AUTO_NUM_QUEUES),
 DEFINE_PROP_UINT32("virtqueue_size", VirtIOSCSI,
  parent_obj.conf.virtqueue_size, 256),
 DEFINE_PROP_BOOL("seg_max_adjust", VirtIOSCSI,
diff --git a/hw/virtio/vhost-scsi-pci.c b/hw/virtio/vhost-scsi-pci.c
index 06e814d30e..a6bb0dc60d 100644
--- a/hw/virtio/vhost-scsi-pci.c
+++ b/hw/virtio/vhost-scsi-pci.c
@@ -47,11 +47,15 @@ static void vhost_scsi_pci_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
 {
 VHostSCSIPCI *dev = VHOST_SCSI_PCI(vpci_dev);
DeviceState *vdev = DEVICE(&dev->vdev);

[PATCH v6 3/7] virtio-pci: add virtio_pci_optimal_num_queues() helper

2020-08-18 Thread Stefan Hajnoczi
Multi-queue devices achieve the best performance when each vCPU has a
dedicated queue. This ensures that virtqueue used notifications are
handled on the same vCPU that submitted virtqueue buffers.  When another
vCPU handles the notification, an IPI will be necessary to wake the
submission vCPU and this incurs a performance overhead.

Provide a helper function that virtio-pci devices will use in later
patches to automatically select the optimal number of queues.

The function handles guests with large numbers of CPUs by limiting the
number of queues to fit within the following constraints:
1. The maximum number of MSI-X vectors.
2. The maximum number of virtqueues.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Cornelia Huck 
---
 hw/virtio/virtio-pci.h |  9 +
 hw/virtio/virtio-pci.c | 32 
 2 files changed, 41 insertions(+)

diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index e2eaaa9182..91096f0291 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -243,4 +243,13 @@ typedef struct VirtioPCIDeviceTypeInfo {
 /* Register virtio-pci type(s).  @t must be static. */
 void virtio_pci_types_register(const VirtioPCIDeviceTypeInfo *t);
 
+/**
+ * virtio_pci_optimal_num_queues:
+ * @fixed_queues: number of queues that are always present
+ *
+ * Returns: The optimal number of queues for a multi-queue device, excluding
+ * @fixed_queues.
+ */
+unsigned virtio_pci_optimal_num_queues(unsigned fixed_queues);
+
 #endif
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index ccdf54e81c..fc69570dcc 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -19,6 +19,7 @@
 
 #include "exec/memop.h"
 #include "standard-headers/linux/virtio_pci.h"
+#include "hw/boards.h"
 #include "hw/virtio/virtio.h"
 #include "migration/qemu-file-types.h"
 #include "hw/pci/pci.h"
@@ -2058,6 +2059,37 @@ void virtio_pci_types_register(const 
VirtioPCIDeviceTypeInfo *t)
 g_free(base_name);
 }
 
+unsigned virtio_pci_optimal_num_queues(unsigned fixed_queues)
+{
+/*
+ * 1:1 vq to vCPU mapping is ideal because the same vCPU that submitted
+ * virtqueue buffers can handle their completion. When a different vCPU
+ * handles completion it may need to IPI the vCPU that submitted the
+ * request and this adds overhead.
+ *
+ * Virtqueues consume guest RAM and MSI-X vectors. This is wasteful in
+ * guests with very many vCPUs and a device that is only used by a few
+ * vCPUs. Unfortunately optimizing that case requires manual pinning inside
+ * the guest, so those users might as well manually set the number of
+ * queues. There is no upper limit that can be applied automatically and
+ * doing so arbitrarily would result in a sudden performance drop once the
+ * threshold number of vCPUs is exceeded.
+ */
+unsigned num_queues = current_machine->smp.cpus;
+
+/*
+ * The maximum number of MSI-X vectors is PCI_MSIX_FLAGS_QSIZE + 1, but the
+ * config change interrupt and the fixed virtqueues must be taken into
+ * account too.
+ */
+num_queues = MIN(num_queues, PCI_MSIX_FLAGS_QSIZE - fixed_queues);
+
+/*
+ * There is a limit to how many virtqueues a device can have.
+ */
+return MIN(num_queues, VIRTIO_QUEUE_MAX - fixed_queues);
+}
+
 /* virtio-pci-bus */
 
 static void virtio_pci_bus_new(VirtioBusState *bus, size_t bus_size,
-- 
2.26.2



[PATCH v6 6/7] virtio-blk-pci: default num_queues to -smp N

2020-08-18 Thread Stefan Hajnoczi
Automatically size the number of virtio-blk-pci request virtqueues to
match the number of vCPUs.  Other transports continue to default to 1
request virtqueue.

A 1:1 virtqueue:vCPU mapping ensures that completion interrupts are
handled on the same vCPU that submitted the request.  No IPI is
necessary to complete an I/O request and performance is improved.  The
maximum number of MSI-X vectors and virtqueues limit are respected.

Performance improves from 78k to 104k IOPS on a 32 vCPU guest with 101
virtio-blk-pci devices (ioengine=libaio, iodepth=1, bs=4k, rw=randread
with NVMe storage).

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Cornelia Huck 
Reviewed-by: Pankaj Gupta 
---
 include/hw/virtio/virtio-blk.h | 2 ++
 hw/block/virtio-blk.c  | 6 +-
 hw/core/machine.c  | 1 +
 hw/virtio/virtio-blk-pci.c | 7 ++-
 4 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index b1334c3904..7539c2b848 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -30,6 +30,8 @@ struct virtio_blk_inhdr
 unsigned char status;
 };
 
+#define VIRTIO_BLK_AUTO_NUM_QUEUES UINT16_MAX
+
 struct VirtIOBlkConf
 {
 BlockConf conf;
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 413783693c..2204ba149e 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1147,6 +1147,9 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 error_setg(errp, "Device needs media, but drive is empty");
 return;
 }
+if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
+conf->num_queues = 1;
+}
 if (!conf->num_queues) {
 error_setg(errp, "num-queues property must be larger than 0");
 return;
@@ -1281,7 +1284,8 @@ static Property virtio_blk_properties[] = {
 #endif
 DEFINE_PROP_BIT("request-merging", VirtIOBlock, conf.request_merging, 0,
 true),
-DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues, 1),
+DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues,
+   VIRTIO_BLK_AUTO_NUM_QUEUES),
 DEFINE_PROP_UINT16("queue-size", VirtIOBlock, conf.queue_size, 256),
 DEFINE_PROP_BOOL("seg-max-adjust", VirtIOBlock, conf.seg_max_adjust, true),
 DEFINE_PROP_LINK("iothread", VirtIOBlock, conf.iothread, TYPE_IOTHREAD,
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 9ee2aa0f7b..7f65fa8743 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -31,6 +31,7 @@
 GlobalProperty hw_compat_5_1[] = {
 { "vhost-scsi", "num_queues", "1"},
 { "vhost-user-scsi", "num_queues", "1"},
+{ "virtio-blk-device", "num-queues", "1"},
 { "virtio-scsi-device", "num_queues", "1"},
 };
 const size_t hw_compat_5_1_len = G_N_ELEMENTS(hw_compat_5_1);
diff --git a/hw/virtio/virtio-blk-pci.c b/hw/virtio/virtio-blk-pci.c
index 849cc7dfd8..37c6e0aeb4 100644
--- a/hw/virtio/virtio-blk-pci.c
+++ b/hw/virtio/virtio-blk-pci.c
@@ -50,9 +50,14 @@ static void virtio_blk_pci_realize(VirtIOPCIProxy *vpci_dev, 
Error **errp)
 {
 VirtIOBlkPCI *dev = VIRTIO_BLK_PCI(vpci_dev);
DeviceState *vdev = DEVICE(&dev->vdev);
+VirtIOBlkConf *conf = &dev->vdev.conf;
+
+if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
+conf->num_queues = virtio_pci_optimal_num_queues(0);
+}
 
 if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
-vpci_dev->nvectors = dev->vdev.conf.num_queues + 1;
+vpci_dev->nvectors = conf->num_queues + 1;
 }
 
qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
-- 
2.26.2



Re: [RFC PATCH 16/22] block/export: Allocate BlockExport in blk_exp_add()

2020-08-18 Thread Max Reitz
On 13.08.20 18:29, Kevin Wolf wrote:
> Instead of letting the driver allocate and return the BlockExport
> object, allocate it already in blk_exp_add() and pass it. This allows us
> to initialise the generic part before calling into the driver so that
> the driver can just use these values instead of having to parse the
> options a second time.
> 
> For symmetry, move freeing the BlockExport to blk_exp_unref().
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/export.h |  3 ++-
>  include/block/nbd.h| 11 ++-
>  block/export/export.c  | 17 -
>  blockdev-nbd.c | 24 +---
>  nbd/server.c   | 30 +-
>  5 files changed, 50 insertions(+), 35 deletions(-)

[...]

> diff --git a/block/export/export.c b/block/export/export.c
> index ef86bf892b..9de108cbc1 100644
> --- a/block/export/export.c
> +++ b/block/export/export.c

[...]

> @@ -46,7 +48,19 @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error 
> **errp)
>  return NULL;
>  }
>  
> -return drv->create(export, errp);

assert(drv->instance_size >= sizeof(BlockExport)) might be nice before
dereferencing *exp.

With that:

Reviewed-by: Max Reitz 

> +exp = g_malloc0(drv->instance_size);
> +*exp = (BlockExport) {
> +.drv= &blk_exp_nbd,
> +.refcount   = 1,
> +};



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v6 2/7] hw: add 5.2 machine types and 5.1 compat options

2020-08-18 Thread Cornelia Huck
On Tue, 18 Aug 2020 15:33:43 +0100
Stefan Hajnoczi  wrote:

> arm, i386, ppc, and s390x have versioned machine types and associated
> compatibility options. Introduce new ones now that QEMU 5.1 has been
> released.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/hw/boards.h|  3 +++
>  include/hw/i386/pc.h   |  3 +++
>  hw/arm/virt.c  |  9 -
>  hw/core/machine.c  |  4 
>  hw/i386/pc.c   |  4 
>  hw/i386/pc_piix.c  | 14 +-
>  hw/i386/pc_q35.c   | 13 -
>  hw/ppc/spapr.c | 15 +--
>  hw/s390x/s390-virtio-ccw.c | 14 +-
>  9 files changed, 73 insertions(+), 6 deletions(-)

https://lore.kernel.org/qemu-devel/20200728094645.272149-1-coh...@redhat.com/
is already out there :)




Re: [PATCH v7 14/47] stream: Deal with filters

2020-08-18 Thread Kevin Wolf
Am 25.06.2020 um 17:21 hat Max Reitz geschrieben:
> Because of the (not so recent anymore) changes that make the stream job
> independent of the base node and instead track the node above it, we
> have to split that "bottom" node into two cases: The bottom COW node,
> and the node directly above the base node (which may be an R/W filter
> or the bottom COW node).
> 
> Signed-off-by: Max Reitz 
> ---
>  qapi/block-core.json |  4 +++
>  block/stream.c   | 63 
>  blockdev.c   |  4 ++-
>  3 files changed, 53 insertions(+), 18 deletions(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index b20332e592..df87855429 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2486,6 +2486,10 @@
>  # On successful completion the image file is updated to drop the backing file
>  # and the BLOCK_JOB_COMPLETED event is emitted.
>  #
> +# In case @device is a filter node, block-stream modifies the first
> +# non-filter
> +# overlay node below it to point to base's backing node (or NULL if @base was
> +# not specified) instead of modifying @device itself.

Not to @base's backing node, but to @base itself (or actually, to
above_base's backing node, which is initially @base, but may have
changed when the job is completed).

Should we also document what using a filter node for @base means?

The code changes look good.

Kevin




[PATCH v6 4/7] virtio-scsi: introduce a constant for fixed virtqueues

2020-08-18 Thread Stefan Hajnoczi
The event and control virtqueues are always present, regardless of the
multi-queue configuration.  Define a constant so that virtqueue number
calculations are easier to read.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Cornelia Huck 
Reviewed-by: Pankaj Gupta 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Raphael Norwitz 
---
 include/hw/virtio/virtio-scsi.h | 3 +++
 hw/scsi/vhost-user-scsi.c   | 2 +-
 hw/scsi/virtio-scsi.c   | 7 ---
 hw/virtio/vhost-scsi-pci.c  | 3 ++-
 hw/virtio/vhost-user-scsi-pci.c | 3 ++-
 hw/virtio/virtio-scsi-pci.c | 3 ++-
 6 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 24e768909d..9f293bcb80 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -36,6 +36,9 @@
 #define VIRTIO_SCSI_MAX_TARGET  255
 #define VIRTIO_SCSI_MAX_LUN 16383
 
+/* Number of virtqueues that are always present */
+#define VIRTIO_SCSI_VQ_NUM_FIXED2
+
 typedef struct virtio_scsi_cmd_req VirtIOSCSICmdReq;
 typedef struct virtio_scsi_cmd_resp VirtIOSCSICmdResp;
 typedef struct virtio_scsi_ctrl_tmf_req VirtIOSCSICtrlTMFReq;
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index f2e524438a..a8b821466f 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -114,7 +114,7 @@ static void vhost_user_scsi_realize(DeviceState *dev, Error 
**errp)
 goto free_virtio;
 }
 
-vsc->dev.nvqs = 2 + vs->conf.num_queues;
+vsc->dev.nvqs = VIRTIO_SCSI_VQ_NUM_FIXED + vs->conf.num_queues;
 vsc->dev.vqs = g_new0(struct vhost_virtqueue, vsc->dev.nvqs);
 vsc->dev.vq_index = 0;
 vsc->dev.backend_features = 0;
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index b49775269e..eecdd05af5 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -191,7 +191,7 @@ static void virtio_scsi_save_request(QEMUFile *f, 
SCSIRequest *sreq)
 VirtIOSCSIReq *req = sreq->hba_private;
 VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(req->dev);
 VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
-uint32_t n = virtio_get_queue_index(req->vq) - 2;
+uint32_t n = virtio_get_queue_index(req->vq) - VIRTIO_SCSI_VQ_NUM_FIXED;
 
 assert(n < vs->conf.num_queues);
qemu_put_be32s(f, &n);
@@ -892,10 +892,11 @@ void virtio_scsi_common_realize(DeviceState *dev,
 sizeof(VirtIOSCSIConfig));
 
 if (s->conf.num_queues == 0 ||
-s->conf.num_queues > VIRTIO_QUEUE_MAX - 2) {
+s->conf.num_queues > VIRTIO_QUEUE_MAX - VIRTIO_SCSI_VQ_NUM_FIXED) {
 error_setg(errp, "Invalid number of queues (= %" PRIu32 "), "
  "must be a positive integer less than %d.",
-   s->conf.num_queues, VIRTIO_QUEUE_MAX - 2);
+   s->conf.num_queues,
+   VIRTIO_QUEUE_MAX - VIRTIO_SCSI_VQ_NUM_FIXED);
 virtio_cleanup(vdev);
 return;
 }
diff --git a/hw/virtio/vhost-scsi-pci.c b/hw/virtio/vhost-scsi-pci.c
index 095af23f3f..06e814d30e 100644
--- a/hw/virtio/vhost-scsi-pci.c
+++ b/hw/virtio/vhost-scsi-pci.c
@@ -50,7 +50,8 @@ static void vhost_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, 
Error **errp)
 VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
 
 if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
-vpci_dev->nvectors = vs->conf.num_queues + 3;
+vpci_dev->nvectors = vs->conf.num_queues +
+ VIRTIO_SCSI_VQ_NUM_FIXED + 1;
 }
 
qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
diff --git a/hw/virtio/vhost-user-scsi-pci.c b/hw/virtio/vhost-user-scsi-pci.c
index 4705cd54e8..ab6dfb71a9 100644
--- a/hw/virtio/vhost-user-scsi-pci.c
+++ b/hw/virtio/vhost-user-scsi-pci.c
@@ -56,7 +56,8 @@ static void vhost_user_scsi_pci_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
 VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
 
 if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
-vpci_dev->nvectors = vs->conf.num_queues + 3;
+vpci_dev->nvectors = vs->conf.num_queues +
+ VIRTIO_SCSI_VQ_NUM_FIXED + 1;
 }
 
qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
diff --git a/hw/virtio/virtio-scsi-pci.c b/hw/virtio/virtio-scsi-pci.c
index c23a134202..3ff9eb7ef6 100644
--- a/hw/virtio/virtio-scsi-pci.c
+++ b/hw/virtio/virtio-scsi-pci.c
@@ -51,7 +51,8 @@ static void virtio_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, 
Error **errp)
 char *bus_name;
 
 if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
-vpci_dev->nvectors = vs->conf.num_queues + 3;
+vpci_dev->nvectors = vs->conf.num_queues +
+ VIRTIO_SCSI_VQ_NUM_FIXED + 1;
 }
 
 /*
-- 
2.26.2



[PATCH v6 1/7] Open 5.2 development tree

2020-08-18 Thread Stefan Hajnoczi
From: Peter Maydell 

Signed-off-by: Peter Maydell 
---
 VERSION | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/VERSION b/VERSION
index 831446cbd2..7d40cb9d36 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-5.1.0
+5.1.50
-- 
2.26.2



Re: [PATCH v5 4/5] block/io: fix bdrv_is_allocated_above

2020-08-18 Thread Alberto Garcia
On Wed 10 Jun 2020 02:04:25 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
> bdrv_is_allocated_above wrongly handles short backing files: it reports
> after-EOF space as UNALLOCATED which is wrong, as on read the data is
> generated on the level of short backing file (if all overlays has
> unallocated area at that place).
>
> Reusing bdrv_common_block_status_above fixes the issue and unifies code
> path.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Eric Blake 

Reviewed-by: Alberto Garcia 

Berto



Re: [PATCH v5 3/5] block/io: bdrv_common_block_status_above: support bs == base

2020-08-18 Thread Alberto Garcia
On Wed 10 Jun 2020 02:04:24 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
> We are going to reuse bdrv_common_block_status_above in
> bdrv_is_allocated_above. bdrv_is_allocated_above may be called with
> include_base == false and still bs == base (for ex. from img_rebase()).
>
> So, support this corner case.

Reviewed-by: Alberto Garcia 

Berto



Re: [PATCH v5 2/5] block/io: bdrv_common_block_status_above: support include_base

2020-08-18 Thread Alberto Garcia
On Wed 10 Jun 2020 02:04:23 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
> In order to reuse bdrv_common_block_status_above in
> bdrv_is_allocated_above, let's support include_base parameter.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 

Reviewed-by: Alberto Garcia 

Berto



Re: [PATCH v5 1/5] block/io: fix bdrv_co_block_status_above

2020-08-18 Thread Alberto Garcia
On Wed 10 Jun 2020 02:04:22 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
> + * The top layer deferred to this layer, and because this layer is
> + * short, any zeroes that we synthesize beyond EOF behave as if they
> + * were allocated at this layer
>   */
> +assert(ret & BDRV_BLOCK_EOF);
>  *pnum = bytes;
> +if (file) {
> +*file = p;
> +}
> +return BDRV_BLOCK_ZERO | BDRV_BLOCK_ALLOCATED;

You don't add BDRV_BLOCK_EOF to the return code here ?

> +res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL);
> +offset += nr;
> +bytes -= nr;
> +} while (res >= 0 && (res & BDRV_BLOCK_ZERO) && nr && bytes);

About this last "... && nr && bytes", I think 'nr' already implies
'bytes', maybe you want to use an assertion instead?

Berto



Re: [PATCH 00/18] hw/riscv: Add Microchip PolarFire SoC Icicle Kit board support

2020-08-18 Thread Anup Patel
On Tue, Aug 18, 2020 at 6:39 PM  wrote:
>
> On 8/18/20 7:17 AM, Anup Patel wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you know the 
> > content is safe
> >
> > On Tue, Aug 18, 2020 at 1:23 AM  wrote:
> >> On 8/17/20 8:28 PM, Alistair Francis wrote:
> >>> EXTERNAL EMAIL: Do not click links or open attachments unless you know 
> >>> the content is safe
> >>>
> >>> On Mon, Aug 17, 2020 at 11:12 AM via  wrote:
>  Hi Anup,
> 
>  On 8/17/20 11:30 AM, Bin Meng wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you know 
> > the content is safe
> >
> > Hi Anup,
> >
> > On Sat, Aug 15, 2020 at 1:44 AM Anup Patel  wrote:
> >> On Fri, Aug 14, 2020 at 10:12 PM Bin Meng  wrote:
> >>> From: Bin Meng 
> >>>
> >>> This adds support for Microchip PolarFire SoC Icicle Kit board.
> >>> The Icicle Kit board integrates a PolarFire SoC, with one SiFive's
> >>> E51 plus four U54 cores and many on-chip peripherals and an FPGA.
> >> Nice Work !!! This is very helpful.
> > Thanks!
> >
> >> The Microchip HSS is quite convoluted. It has:
> >> 1. DDR Init
> >> 2. Boot device support
> >> 3. SBI support using OpenSBI as library
> >> 4. Simple TEE support
> >>
> >> I think point 1) and 2) above should be part of U-Boot SPL.
> >> The point 3) can be OpenSBI FW_DYNAMIC.
> >>
> >> Lastly,for point 4), we are working on a new OpenSBI feature using
> >> which we can run independent Secure OS and Non-Secure OS using
> >> U-Boot_SPL+OpenSBI (for both SiFive Unleashed and Microchip
> >> PolarFire).
> >>
> >> Do you have plans for adding U-Boot SPL support for this board ??
> > + Cyril Jean from Microchip
> >
> > I will have to leave this question to Cyril to comment.
> >
>  I currently do not have a plan to support U-Boot SPL. The idea of the
>  HSS is to contain all the silicon specific initialization and
>  configuration code within the HSS before jumping to U-Boot S-mode. I
>  would rather keep all this within the HSS for the time being. I would
>  wait until we reach production silicon before attempting to move this to
>  U-Boot SPL as the HSS is likely to contain some opaque silicon related
>  changes for another while.
> >>> That is unfortunate, a lot of work has gone into making the boot flow
> >>> simple and easy to use.
> >>>
> >>> QEMU now includes OpenSBI by default to make it easy for users to boot
> >>> Linux. The Icicle Kit board is now the most difficult QEMU board to
> >>> boot Linux on. Not to mention it makes it hard to impossible to
> >>> support it in standard tool flows such as meta-riscv.
> >>>
> >>> Alistair
> >> If it is such a problem we can add a U-Boot SPL stage and the HSS can be
> >> treated as standard SoC ROM code.
> > It's not mandatory for U-Boot SPL to have stable DRAM calibration settings
> > from the start itself. The initial U-Boot SPL support for most
> > platforms (across
> > architectures) usually include basic working DRAM calibration settings which
> > is later on updated with separate patches. Also, we don't need all U-Boot
> > drivers to be upstreamed in one-go as this can be done in phases.
> >
> > The advantage we have with PolarFire SoC Icicle board is that we already
> > have a U-Boot S-mode. and we believe the OpenSBI generic platform will
> > work fine for PolarFire SoC Icicle board so the only thing missing right now
> > is the U-Boot SPL support for OpenSource boot-flow.
> >
> > It will certainly accelerate open-source development if we have boot-flow
> > U-Boot_SPL => OpenSBI (FW_DYNAMIC) => U-Boot_S-mode working
> > on PolarFire SoC Icicle board. Initially, we just need DRAM, SD/eMMC,
> > and Serial port support for U-Boot SPL and U-Boot S-mode. Later on,
> > more patches can add ethernet and other booting device drivers to U-Boot.
> >
> > Regarding security services of HSS, we are working on an OpenSBI
> > feature which will allow HSS security services to run as independent
> > binary in M-mode (not linked to OpenSBI) and OpenSBI FW_DYNAMIC
> > will be a separate binary acting as a secure monitor.
> >
> > Regards,
> > Anup
>
> What I have in mind is that the external memory will be up and running
> by the time we get to U-Boot SPL. In the case of PolarFire SoC the ROM
> code equivalent brings up the DDR memory interface so we do not need to
> worry about this as part of U-Boot.

Keeping DRAM configuration as part of a separate ROM booting stage prior
to the U-Boot SPL sounds good to me. This will lead to following boot-flow:

ROM/HSS (M-mode) => U-Boot SPL (M-mode) => OpenSBI (M-mode) => U-Boot S-mode

>
> Sounds to me the component that needs to be upstreamed next is the eMMC
> support so that it can be used as part of U-Boot SPL. Correct?

Maybe a PHASE1 patch series for PolarFire U-Boot support can
target the following:
1. Minimal U-Boot board support for 

[PATCH v4 0/4] migration: Add block-bitmap-mapping parameter

2020-08-18 Thread Max Reitz
RFC v1: https://lists.nongnu.org/archive/html/qemu-block/2020-05/msg00912.html
RFC v2: https://lists.nongnu.org/archive/html/qemu-block/2020-05/msg00915.html
v1: https://lists.nongnu.org/archive/html/qemu-devel/2020-06/msg09792.html
v2: https://lists.nongnu.org/archive/html/qemu-block/2020-07/msg01179.html
v3: https://lists.nongnu.org/archive/html/qemu-block/2020-07/msg01385.html

Branch: https://github.com/XanClic/qemu.git migration-bitmap-mapping-v4
Branch: https://git.xanclic.moe/XanClic/qemu.git migration-bitmap-mapping-v4

Hi,

This new migration parameter allows mapping block node names and bitmap
names to aliases for the purpose of block dirty bitmap migration.

This way, management tools can use different node names on the source
and destination and pass the mapping of how bitmaps are to be
transferred to qemu (on the source, the destination, or even both with
arbitrary aliases in the migration stream).
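
Conceptually the per-name lookup is tiny. The following is only a sketch of
the idea (map_outgoing_name is a made-up helper; the real implementation in
migration/block-dirty-bitmap.c keeps per-node maps and full error handling):

    /* Outgoing side: return the alias to put on the wire, or NULL if the
     * bitmap is unmapped and must be skipped.  The incoming side performs
     * the same lookup in the opposite direction (alias -> local name). */
    static const char *map_outgoing_name(GHashTable *mapping, const char *name)
    {
        if (!mapping) {
            /* Parameter never set: names map to themselves. */
            return name;
        }
        return g_hash_table_lookup(mapping, name);
    }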


v4:
- Patch 1:
  - Rebased
  - %s/5.1/5.2/
  - The destination side will now ignore unmapped aliases, too (though
they are reported to stderr)
  - Check that the aliases given will fit in the migration stream (i.e.
are shorter than 256 bytes)
- On that note, fix a pre-existing bug where a bitmap with a name
  longer than 255 bytes would make qemu_put_counted_string() fail an
  assertion

- Patch 2:
  - import time (apparently, I forgot that in v3...?)

- Patch 3:
  - Added; I need this now

- Patch 4:
  - Migration rarely ever really fails now (just one case, where a
non-aliased bitmap name is too long, which can only be detected when
beginning migration)

  - Thus, we have to read most error messages from the VM log now, for
which I’ve added a new function (verify_dest_error)

  - Added tests for overly long bitmap names and node/bitmap aliases


git-backport-diff against v3:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/4:[0167] [FC] 'migration: Add block-bitmap-mapping parameter'
002/4:[0001] [FC] 'iotests.py: Add wait_for_runstate()'
003/4:[down] 'iotests.py: Let wait_migration() return on failure'
004/4:[0232] [FC] 'iotests: Test node/bitmap aliases during migration'


Max Reitz (4):
  migration: Add block-bitmap-mapping parameter
  iotests.py: Add wait_for_runstate()
  iotests.py: Let wait_migration() return on failure
  iotests: Test node/bitmap aliases during migration

 qapi/migration.json| 101 +-
 migration/migration.h  |   3 +
 migration/block-dirty-bitmap.c | 409 +++---
 migration/migration.c  |  30 ++
 monitor/hmp-cmds.c |  30 ++
 tests/qemu-iotests/300 | 595 +
 tests/qemu-iotests/300.out |   5 +
 tests/qemu-iotests/group   |   1 +
 tests/qemu-iotests/iotests.py  |  22 +-
 9 files changed, 1134 insertions(+), 62 deletions(-)
 create mode 100755 tests/qemu-iotests/300
 create mode 100644 tests/qemu-iotests/300.out

-- 
2.26.2




[PATCH v4 2/4] iotests.py: Add wait_for_runstate()

2020-08-18 Thread Max Reitz
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/iotests.py | 4 
 1 file changed, 4 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 717b5b652c..ee93cf22db 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -833,6 +833,10 @@ class VM(qtest.QEMUQtestMachine):
'Found node %s under %s (but expected %s)' % \
(node['name'], path, expected_node)
 
+def wait_for_runstate(self, runstate: str) -> None:
+while self.qmp('query-status')['return']['status'] != runstate:
+time.sleep(0.2)
+
 index_re = re.compile(r'([^\[]+)\[([^\]]+)\]')
 
 class QMPTestCase(unittest.TestCase):
-- 
2.26.2




[PATCH v4 4/4] iotests: Test node/bitmap aliases during migration

2020-08-18 Thread Max Reitz
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/300 | 595 +
 tests/qemu-iotests/300.out |   5 +
 tests/qemu-iotests/group   |   1 +
 3 files changed, 601 insertions(+)
 create mode 100755 tests/qemu-iotests/300
 create mode 100644 tests/qemu-iotests/300.out

diff --git a/tests/qemu-iotests/300 b/tests/qemu-iotests/300
new file mode 100755
index 00..c6d86b1dbc
--- /dev/null
+++ b/tests/qemu-iotests/300
@@ -0,0 +1,595 @@
+#!/usr/bin/env python3
+#
+# Copyright (C) 2020 Red Hat, Inc.
+#
+# Tests for dirty bitmaps migration with node aliases
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os
+import random
+import re
+import time
+from typing import Dict, List, Optional, Union
+import iotests
+import qemu
+
BlockBitmapMapping = List[Dict[str, Union[str, List[Dict[str, str]]]]]
+
+assert iotests.sock_dir is not None
+mig_sock = os.path.join(iotests.sock_dir, 'mig_sock')
+
+class TestDirtyBitmapMigration(iotests.QMPTestCase):
+src_node_name: str = ''
+dst_node_name: str = ''
+src_bmap_name: str = ''
+dst_bmap_name: str = ''
+
+def setUp(self) -> None:
+self.vm_a = iotests.VM(path_suffix='-a')
+self.vm_a.add_blockdev(f'node-name={self.src_node_name},' \
+   'driver=null-co')
+self.vm_a.launch()
+
+self.vm_b = iotests.VM(path_suffix='-b')
+self.vm_b.add_blockdev(f'node-name={self.dst_node_name},' \
+   'driver=null-co')
+self.vm_b.add_incoming(f'unix:{mig_sock}')
+self.vm_b.launch()
+
+result = self.vm_a.qmp('block-dirty-bitmap-add',
+   node=self.src_node_name,
+   name=self.src_bmap_name)
+self.assert_qmp(result, 'return', {})
+
+# Dirty some random megabytes
+for _ in range(9):
+mb_ofs = random.randrange(1024)
+self.vm_a.hmp_qemu_io(self.src_node_name, f'write {mb_ofs}M 1M')
+
+result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
+   node=self.src_node_name,
+   name=self.src_bmap_name)
+self.bitmap_hash_reference = result['return']['sha256']
+
+caps = [{'capability': name, 'state': True} \
+for name in ('dirty-bitmaps', 'events')]
+
+for vm in (self.vm_a, self.vm_b):
+result = vm.qmp('migrate-set-capabilities', capabilities=caps)
+self.assert_qmp(result, 'return', {})
+
+def tearDown(self) -> None:
+self.vm_a.shutdown()
+self.vm_b.shutdown()
+try:
+os.remove(mig_sock)
+except OSError:
+pass
+
+def check_bitmap(self, bitmap_name_valid: bool) -> None:
+result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
+   node=self.dst_node_name,
+   name=self.dst_bmap_name)
+if bitmap_name_valid:
+self.assert_qmp(result, 'return/sha256',
+self.bitmap_hash_reference)
+else:
+self.assert_qmp(result, 'error/desc',
+f"Dirty bitmap '{self.dst_bmap_name}' not found")
+
+def migrate(self, bitmap_name_valid: bool = True,
+migration_success: bool = True) -> None:
+result = self.vm_a.qmp('migrate', uri=f'unix:{mig_sock}')
+self.assert_qmp(result, 'return', {})
+
+with iotests.Timeout(5, 'Timeout waiting for migration to complete'):
+self.assertEqual(self.vm_a.wait_migration('postmigrate'),
+ migration_success)
+self.assertEqual(self.vm_b.wait_migration('running'),
+ migration_success)
+
+if migration_success:
+self.check_bitmap(bitmap_name_valid)
+
+def verify_dest_error(self, msg: Optional[str]) -> None:
+"""
+Check whether the given error message is present in vm_b's log.
+(vm_b is shut down to do so.)
+If @msg is None, check that there has not been any error.
+"""
+self.vm_b.shutdown()
+if msg is None:
+self.assertNotIn('qemu-system-', self.vm_b.get_log())
+else:
+self.assertIn(msg, self.vm_b.get_log())
+
+@staticmethod
+def 

[PATCH v4 3/4] iotests.py: Let wait_migration() return on failure

2020-08-18 Thread Max Reitz
Let wait_migration() return on failure (with the return value indicating
whether the migration was completed or has failed), so we can use it for
migrations that are expected to fail, too.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/iotests.py | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index ee93cf22db..f39fd580a6 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -729,16 +729,22 @@ class VM(qtest.QEMUQtestMachine):
 }
 ]))
 
-def wait_migration(self, expect_runstate):
+def wait_migration(self, expect_runstate: Optional[str]) -> bool:
 while True:
 event = self.event_wait('MIGRATION')
 log(event, filters=[filter_qmp_event])
-if event['data']['status'] == 'completed':
+if event['data']['status'] in ('completed', 'failed'):
 break
-# The event may occur in finish-migrate, so wait for the expected
-# post-migration runstate
-while self.qmp('query-status')['return']['status'] != expect_runstate:
-pass
+
+if event['data']['status'] == 'completed':
+# The event may occur in finish-migrate, so wait for the expected
+# post-migration runstate
+runstate = None
+while runstate != expect_runstate:
+runstate = self.qmp('query-status')['return']['status']
+return True
+else:
+return False
 
 def node_info(self, node_name):
 nodes = self.qmp('query-named-block-nodes')
-- 
2.26.2




[PATCH v4 1/4] migration: Add block-bitmap-mapping parameter

2020-08-18 Thread Max Reitz
This migration parameter allows mapping block node names and bitmap
names to aliases for the purpose of block dirty bitmap migration.

This way, management tools can use different node and bitmap names on
the source and destination and pass the mapping of how bitmaps are to be
transferred to qemu (on the source, the destination, or even both with
arbitrary aliases in the migration stream).

While touching this code, fix a bug where bitmap names longer than 255
bytes would fail an assertion in qemu_put_counted_string().
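
The guard for that is essentially a length check before writing the string
(approximate sketch, check_stream_name is a made-up name: the stream stores
string lengths in a single byte via qemu_put_counted_string(), hence the
UINT8_MAX limit):

    static bool check_stream_name(const char *name, Error **errp)
    {
        if (strlen(name) > UINT8_MAX) {
            error_setg(errp, "Name '%s' is longer than %u bytes and cannot "
                       "be put into the migration stream", name, UINT8_MAX);
            return false;
        }
        return true;
    }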

Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Max Reitz 
---
 qapi/migration.json| 101 +++-
 migration/migration.h  |   3 +
 migration/block-dirty-bitmap.c | 409 -
 migration/migration.c  |  30 +++
 monitor/hmp-cmds.c |  30 +++
 5 files changed, 517 insertions(+), 56 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index ea53b23dca..0c4ae102b1 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -508,6 +508,44 @@
   'data': [ 'none', 'zlib',
 { 'name': 'zstd', 'if': 'defined(CONFIG_ZSTD)' } ] }
 
+##
+# @BitmapMigrationBitmapAlias:
+#
+# @name: The name of the bitmap.
+#
+# @alias: An alias name for migration (for example the bitmap name on
+# the opposite site).
+#
+# Since: 5.2
+##
+{ 'struct': 'BitmapMigrationBitmapAlias',
+  'data': {
+  'name': 'str',
+  'alias': 'str'
+  } }
+
+##
+# @BitmapMigrationNodeAlias:
+#
+# Maps a block node name and the bitmaps it has to aliases for dirty
+# bitmap migration.
+#
+# @node-name: A block node name.
+#
+# @alias: An alias block node name for migration (for example the
+# node name on the opposite site).
+#
+# @bitmaps: Mappings for the bitmaps on this node.
+#
+# Since: 5.2
+##
+{ 'struct': 'BitmapMigrationNodeAlias',
+  'data': {
+  'node-name': 'str',
+  'alias': 'str',
+  'bitmaps': [ 'BitmapMigrationBitmapAlias' ]
+  } }
+
 ##
 # @MigrationParameter:
 #
@@ -642,6 +680,24 @@
 #  will consume more CPU.
 #  Defaults to 1. (Since 5.0)
 #
+# @block-bitmap-mapping: Maps block nodes and bitmaps on them to
+#  aliases for the purpose of dirty bitmap migration.  Such
+#  aliases may for example be the corresponding names on the
+#  opposite site.
+#  The mapping must be one-to-one, but not necessarily
+#  complete: On the source, unmapped bitmaps and all bitmaps
+#  on unmapped nodes will be ignored.  On the destination,
+#  all unmapped aliases in the incoming migration stream will
+#  be reported, but they will not result in failure.
+#  Note that the destination does not know about bitmaps it
+#  does not receive, so there is no limitation or requirement
+#  regarding the number of bitmaps received, or how they are
+#  named, or on which nodes they are placed.
+#  By default (when this parameter has never been set), bitmap
+#  names are mapped to themselves.  Nodes are mapped to their
+#  block device name if there is one, and to their node name
+#  otherwise. (Since 5.2)
+#
 # Since: 2.4
 ##
 { 'enum': 'MigrationParameter',
@@ -656,7 +712,8 @@
'multifd-channels',
'xbzrle-cache-size', 'max-postcopy-bandwidth',
'max-cpu-throttle', 'multifd-compression',
-   'multifd-zlib-level' ,'multifd-zstd-level' ] }
+   'multifd-zlib-level' ,'multifd-zstd-level',
+   'block-bitmap-mapping' ] }
 
 ##
 # @MigrateSetParameters:
@@ -782,6 +839,24 @@
 #  will consume more CPU.
 #  Defaults to 1. (Since 5.0)
 #
+# @block-bitmap-mapping: Maps block nodes and bitmaps on them to
+#  aliases for the purpose of dirty bitmap migration.  Such
+#  aliases may for example be the corresponding names on the
+#  opposite site.
+#  The mapping must be one-to-one, but not necessarily
+#  complete: On the source, unmapped bitmaps and all bitmaps
+#  on unmapped nodes will be ignored.  On the destination,
+#  all unmapped aliases in the incoming migration stream will
+#  be reported, but they will not result in failure.
+#  Note that the destination does not know about bitmaps it
+#  does not receive, so there is no limitation or requirement
+#  regarding the number of bitmaps received, or how they are
+#  named, or on which nodes they are placed.
+#  By default (when this parameter has never been set), bitmap
+#  names are mapped to themselves.  Nodes are mapped to their
+#  block device name if there is one, and to their node name
+#  otherwise. (Since 5.2)
+#
 # Since: 2.4
 ##
 # TODO either fuse back into MigrationParameters, or make
@@ -812,7 +887,8 @@
 '*max-cpu-throttle': 'int',
 '*multifd-compression': 'MultiFDCompression',
 

Re: [PATCH 00/18] hw/riscv: Add Microchip PolarFire SoC Icicle Kit board support

2020-08-18 Thread via
On 8/18/20 7:17 AM, Anup Patel wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the 
> content is safe
>
> On Tue, Aug 18, 2020 at 1:23 AM  wrote:
>> On 8/17/20 8:28 PM, Alistair Francis wrote:
>>> EXTERNAL EMAIL: Do not click links or open attachments unless you know the 
>>> content is safe
>>>
>>> On Mon, Aug 17, 2020 at 11:12 AM via  wrote:
 Hi Anup,

 On 8/17/20 11:30 AM, Bin Meng wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know 
> the content is safe
>
> Hi Anup,
>
> On Sat, Aug 15, 2020 at 1:44 AM Anup Patel  wrote:
>> On Fri, Aug 14, 2020 at 10:12 PM Bin Meng  wrote:
>>> From: Bin Meng 
>>>
>>> This adds support for Microchip PolarFire SoC Icicle Kit board.
>>> The Icicle Kit board integrates a PolarFire SoC, with one SiFive's
>>> E51 plus four U54 cores and many on-chip peripherals and an FPGA.
>> Nice Work !!! This is very helpful.
> Thanks!
>
>> The Microchip HSS is quite convoluted. It has:
>> 1. DDR Init
>> 2. Boot device support
>> 3. SBI support using OpenSBI as library
>> 4. Simple TEE support
>>
>> I think point 1) and 2) above should be part of U-Boot SPL.
>> The point 3) can be OpenSBI FW_DYNAMIC.
>>
>> Lastly,for point 4), we are working on a new OpenSBI feature using
>> which we can run independent Secure OS and Non-Secure OS using
>> U-Boot_SPL+OpenSBI (for both SiFive Unleashed and Microchip
>> PolarFire).
>>
>> Do you have plans for adding U-Boot SPL support for this board ??
> + Cyril Jean from Microchip
>
> I will have to leave this question to Cyril to comment.
>
 I currently do not have a plan to support U-Boot SPL. The idea of the
 HSS is to contain all the silicon specific initialization and
 configuration code within the HSS before jumping to U-Boot S-mode. I
 would rather keep all this within the HSS for the time being. I would
 wait until we reach production silicon before attempting to move this to
 U-Boot SPL as the HSS is likely to contain some opaque silicon related
 changes for another while.
>>> That is unfortunate, a lot of work has gone into making the boot flow
>>> simple and easy to use.
>>>
>>> QEMU now includes OpenSBI by default to make it easy for users to boot
>>> Linux. The Icicle Kit board is now the most difficult QEMU board to
>>> boot Linux on. Not to mention it makes it hard to impossible to
>>> support it in standard tool flows such as meta-riscv.
>>>
>>> Alistair
>> If it is such a problem we can add a U-Boot SPL stage and the HSS can be
>> treated as standard SoC ROM code.
> It's not mandatory for U-Boot SPL to have stable DRAM calibration settings
> from the start itself. The initial U-Boot SPL support for most
> platforms (across
> architectures) usually include basic working DRAM calibration settings which
> is later on updated with separate patches. Also, we don't need all U-Boot
> drivers to be upstreamed in one-go as this can be done in phases.
>
> The advantage we have with PolarFire SoC Icicle board is that we already
> have a U-Boot S-mode. and we believe the OpenSBI generic platform will
> work fine for PolarFire SoC Icicle board so the only thing missing right now
> is the U-Boot SPL support for OpenSource boot-flow.
>
> It will certainly accelerate open-source development if we have boot-flow
> U-Boot_SPL => OpenSBI (FW_DYNAMIC) => U-Boot_S-mode working
> on PolarFire SoC Icicle board. Initially, we just need DRAM, SD/eMMC,
> and Serial port support for U-Boot SPL and U-Boot S-mode. Later on,
> more patches can add ethernet and other booting device drivers to U-Boot.
>
> Regarding security services of HSS, we are working on an OpenSBI
> feature which will allow HSS security services to run as independent
> binary in M-mode (not linked to OpenSBI) and OpenSBI FW_DYNAMIC
> will be a separate binary acting as a secure monitor.
>
> Regards,
> Anup

What I have in mind is that the external memory will be up and running 
by the time we get to U-Boot SPL. In the case of PolarFire SoC the ROM 
code equivalent brings up the DDR memory interface so we do not need to 
worry about this as part of U-Boot.

Sounds to me the component that needs to be upstreamed next is the eMMC 
support so that it can be used as part of U-Boot SPL. Correct?


Regards,

Cyril.



Re: [PATCH v7 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu

2020-08-18 Thread Lukas Straub
On Tue, 4 Aug 2020 10:11:22 +0200
Lukas Straub  wrote:

> Hello Everyone,
> In many cases, if qemu has a network connection (qmp, migration, chardev, 
> etc.)
> to some other server and that server dies or hangs, qemu hangs too.
> These patches introduce the new 'yank' out-of-band qmp command to recover from
> these kinds of hangs. The different subsystems register callbacks which get
> executed with the yank command. For example the callback can shutdown() a
> socket. This is intended for the colo use-case, but it can be used for other
> things too of course.
> 
> Regards,
> Lukas Straub
> 
> v7:
>  -yank_register_instance now returns error via Error **errp instead of 
> aborting
>  -dropped "chardev/char.c: Check for duplicate id before  creating chardev"
> 
> v6:
>  -add Reviewed-by and Acked-by tags
>  -rebase on master
>  -lots of changes in nbd due to rebase
>  -only take maintainership of util/yank.c and include/qemu/yank.h (Daniel P. 
> Berrangé)
>  -fix a crash discovered by the newly added chardev test
>  -fix the test itself
> 
> v5:
>  -move yank.c to util/
>  -move yank.h to include/qemu/
>  -add license to yank.h
>  -use const char*
>  -nbd: use atomic_store_release and atomic_load_aqcuire
>  -io-channel: ensure thread-safety and document it
>  -add myself as maintainer for yank
> 
> v4:
>  -fix build errors...
> 
> v3:
>  -don't touch softmmu/vl.c, use __contructor__ attribute instead (Paolo 
> Bonzini)
>  -fix build errors
>  -rewrite migration patch so it actually passes all tests
> 
> v2:
>  -don't touch io/ code anymore
>  -always register yank functions
>  -'yank' now takes a list of instances to yank
>  -'query-yank' returns a list of yankable instances
> 
> Lukas Straub (8):
>   Introduce yank feature
>   block/nbd.c: Add yank feature
>   chardev/char-socket.c: Add yank feature
>   migration: Add yank feature
>   io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
>   io: Document thread-safety of qio_channel_shutdown
>   MAINTAINERS: Add myself as maintainer for yank feature
>   tests/test-char.c: Wait for the chardev to connect in
> char_socket_client_dupid_test
> 
>  MAINTAINERS   |   6 ++
>  block/nbd.c   | 129 +++-
>  chardev/char-socket.c |  31 ++
>  include/io/channel.h  |   2 +
>  include/qemu/yank.h   |  80 +++
>  io/channel-tls.c  |   6 +-
>  migration/channel.c   |  12 +++
>  migration/migration.c |  25 -
>  migration/multifd.c   |  10 ++
>  migration/qemu-file-channel.c |   6 ++
>  migration/savevm.c|   6 ++
>  qapi/misc.json|  45 +
>  tests/Makefile.include|   2 +-
>  tests/test-char.c |   1 +
>  util/Makefile.objs|   1 +
>  util/yank.c   | 184 ++
>  16 files changed, 493 insertions(+), 53 deletions(-)
>  create mode 100644 include/qemu/yank.h
>  create mode 100644 util/yank.c
> 
> --
> 2.20.1

Ping...
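
For anyone new to the series, the mechanism itself is small; a conceptual
sketch (the registration call below is made up for illustration -- the real
API is declared in include/qemu/yank.h, added by patch 1):

    /* A yank function just forces a stuck connection to make progress. */
    static void example_yank_fn(void *opaque)
    {
        int sockfd = *(int *)opaque;

        /* shutdown() makes blocked send()/recv() calls return immediately,
         * unwedging the main loop without closing the fd under its users. */
        shutdown(sockfd, SHUT_RDWR);
    }

    /* Hypothetical registration; the function is later run by the
     * out-of-band 'yank' QMP command:
     *
     *     register_yank_fn("nbd:drive0", example_yank_fn, &sockfd);
     */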


pgpPMt2f4Jdec.pgp
Description: OpenPGP digital signature


Re: [PATCH] qemu-img: Explicit number replaced by a constant

2020-08-18 Thread Max Reitz
On 18.08.20 09:19, Stefano Garzarella wrote:
> Hi Yi Li,
> thanks for this patch! Just a comment below:
> 
> On Mon, Aug 17, 2020 at 07:01:13PM +0800, Yi Li wrote:
>> Signed-off-by: Yi Li 
>> ---
>>  qemu-img.c | 12 ++--
>>  1 file changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/qemu-img.c b/qemu-img.c
>> index 5308773811..a0fbc2757c 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
>> @@ -1181,7 +1181,7 @@ static int64_t find_nonzero(const uint8_t *buf, 
>> int64_t n)
>>  }
>>  
>>  /*
>> - * Returns true iff the first sector pointed to by 'buf' contains at least
>> + * Returns true if the first sector pointed to by 'buf' contains at least
> 
> This change seems unrelated, please can you put this in a separate patch?

It’s also just not a correct change, as “iff” means “if and only if”.

Max



signature.asc
Description: OpenPGP digital signature


[RFC PATCH 0/9] block/curl: Add caching of data downloaded from the remote server

2020-08-18 Thread David Edmondson
When using qemu-img to convert an image that is hosted on an HTTP
server to some faster local (or pseudo-local) storage, the overall
performance can be improved by reading data from the HTTP server in
larger blocks and by caching and re-using blocks already read. This
set of patches implements both of these, and adds a further patch
allowing an offset to be added to all of the HTTP requests.

The first patch (block/curl: Add an 'offset' parameter, affecting all
range requests) allows the user to add an arbitrary offset to all
range requests sent to the HTTP server. This is useful if the image to
be read from the HTTP server is embedded in another file (for example
an uncompressed tar file). It avoids the need to first download the
file containing the source image and extract it (both of which will
require writing the image to local storage). It is logically distinct
from the rest of the patches and somewhat use-case specific.
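
As an illustration (the URL, offset and file names here are made up;
working out the offset of the embedded image within the tar file is
left to the user), such an image might be converted with something
like:

  $ qemu-img convert -p -O qcow2 \
      'json:{"driver": "raw",
             "file": {"driver": "https",
                      "url": "https://example.com/images.tar",
                      "offset": 1536}}' \
      local.qcow2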

The remaining patches implement block based retrieval of data from the
HTTP server and, optionally, caching of those blocks in memory.

The existing HTTP implementation simply reads whatever data is
requested by the caller, with the option for a user-specified amount
of readahead. This is poor for performance because most IO requests
(from QCOW2, for example) are for relatively small amounts of data,
typically no more than 64kB. This does not allow the underlying TCP
connections to achieve peak throughput.
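
(As a rough illustration, with made-up numbers: one synchronous 64kB
read per network round trip on a link with a 30ms RTT can never exceed
roughly 64kB / 0.03s, or about 2MB/s, however much bandwidth is
available.)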

The existing readahead mechanism is also intended to work in
conjunction with the HTTP driver's attempt to piggy-back a new IO
request on one that is already in flight. This works, but is often
defeated because it relies on the existing IO request *completely*
satisfying any subsequent request that might piggy-back onto it. This
is rarely the case and, particularly when used with "readahead", can
result in the same data being downloaded repeatedly.
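
(For example, with 256kB of readahead: a 64kB read at offset 0 fetches
bytes [0, 320kB). A later 64kB read at offset 300kB is not completely
contained in that buffer, so it triggers a fresh request for
[300kB, 620kB), re-downloading the 20kB overlap. The numbers are
illustrative only.)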

The observed performance will depend greatly on the environment, but
when using qemu-img to retrieve a 1GiB QCOW2 image from an HTTPS
server, the following was observed:

| approach                                   | time (hh:mm:ss) |
|--------------------------------------------+-----------------|
| QCOW2 over HTTPS (existing implementation) | 00:00:59        |
| 256kB blocks, 8 cached blocks              | 00:00:42        |
| 2MB blocks, 100 cached blocks              | 00:00:34        |
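
For reference, with the whole series applied the last configuration in
the table might be requested with something like the following (the URL
is made up; "blocksize" and "blockcount" are the options added by
patches 5 and 7):

  $ qemu-img convert -p -O raw \
      'json:{"driver": "qcow2",
             "file": {"driver": "https",
                      "url": "https://example.com/disk.qcow2",
                      "blocksize": 2097152, "blockcount": 100}}' \
      local.raw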

By way of comparison, aria2c (a dedicated HTTP download client) can
retrieve the same image in 19 seconds. Obviously this is without any
QCOW2 layer.

David Edmondson (9):
  block/curl: Add an 'offset' parameter, affecting all range requests
  block/curl: Remove readahead support
  block/curl: Tracing
  block/curl: Perform IO in fixed size chunks
  block/curl: Allow the blocksize to be specified by the user
  block/curl: Cache downloaded blocks
  block/curl: Allow the user to control the number of cache blocks
  block/curl: Allow 16 sockets/ACB
  block/curl: Add readahead support

 block/curl.c  | 515 ++
 block/io.c|   4 +
 block/linux-aio.c |   6 +
 block/trace-events|  18 +-
 docs/system/device-url-syntax.rst.inc |  15 +
 qapi/block-core.json  |  11 +-
 6 files changed, 488 insertions(+), 81 deletions(-)

-- 
2.27.0




[RFC PATCH 1/9] block/curl: Add an 'offset' parameter, affecting all range requests

2020-08-18 Thread David Edmondson
A new 'offset' parameter affects all range requests that are sent to
the remote server. The value, in bytes, is simply added to any byte
offset values passed in to the driver.

Signed-off-by: David Edmondson 
---
 block/curl.c  | 12 +++-
 docs/system/device-url-syntax.rst.inc |  4 
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/block/curl.c b/block/curl.c
index 4f907c47be..32ec760f66 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -74,6 +74,7 @@ static CURLMcode __curl_multi_socket_action(CURLM 
*multi_handle,
 #define CURL_BLOCK_OPT_PASSWORD_SECRET "password-secret"
 #define CURL_BLOCK_OPT_PROXY_USERNAME "proxy-username"
 #define CURL_BLOCK_OPT_PROXY_PASSWORD_SECRET "proxy-password-secret"
+#define CURL_BLOCK_OPT_OFFSET "offset"
 
 #define CURL_BLOCK_OPT_READAHEAD_DEFAULT (256 * 1024)
 #define CURL_BLOCK_OPT_SSLVERIFY_DEFAULT true
@@ -135,6 +136,7 @@ typedef struct BDRVCURLState {
 char *password;
 char *proxyusername;
 char *proxypassword;
+size_t offset;
 } BDRVCURLState;
 
 static void curl_clean_state(CURLState *s);
@@ -658,6 +660,11 @@ static QemuOptsList runtime_opts = {
 .type = QEMU_OPT_STRING,
 .help = "ID of secret used as password for HTTP proxy auth",
 },
+{
+.name = CURL_BLOCK_OPT_OFFSET,
+.type = QEMU_OPT_SIZE,
+.help = "Offset into underlying content"
+},
 { /* end of list */ }
 },
 };
@@ -769,6 +776,8 @@ static int curl_open(BlockDriverState *bs, QDict *options, 
int flags,
 }
 }
 
+s->offset = qemu_opt_get_size(opts, CURL_BLOCK_OPT_OFFSET, 0);
+
 trace_curl_open(file);
 qemu_co_queue_init(&s->free_state_waitq);
 s->aio_context = bdrv_get_aio_context(bs);
@@ -899,7 +908,8 @@ static void curl_setup_preadv(BlockDriverState *bs, 
CURLAIOCB *acb)
 }
 state->acb[0] = acb;
 
-snprintf(state->range, 127, "%" PRIu64 "-%" PRIu64, start, end);
+snprintf(state->range, 127, "%" PRIu64 "-%" PRIu64,
+ s->offset + start, s->offset + end);
 trace_curl_setup_preadv(acb->bytes, start, state->range);
 curl_easy_setopt(state->curl, CURLOPT_RANGE, state->range);
 
diff --git a/docs/system/device-url-syntax.rst.inc 
b/docs/system/device-url-syntax.rst.inc
index 88d7a372a7..33f1ddfe6d 100644
--- a/docs/system/device-url-syntax.rst.inc
+++ b/docs/system/device-url-syntax.rst.inc
@@ -197,6 +197,10 @@ These are specified using a special URL syntax.
   get the size of the image to be downloaded. If not set, the
   default timeout of 5 seconds is used.
 
+   ``offset``
+  Add an offset, in bytes, to all range requests sent to the
+  remote server.
+
Note that when passing options to qemu explicitly, ``driver`` is the
value of .
 
-- 
2.27.0




[RFC PATCH 9/9] block/curl: Add readahead support

2020-08-18 Thread David Edmondson
Re-add support for a readahead parameter, which is the number of bytes
added to the request from the upper layer before breaking the request
into blocks. The default is zero. The number of bytes specified has no
alignment requirements.

Signed-off-by: David Edmondson 
---
 block/curl.c  | 11 +++
 docs/system/device-url-syntax.rst.inc |  7 +++
 qapi/block-core.json  |  3 +++
 3 files changed, 21 insertions(+)

diff --git a/block/curl.c b/block/curl.c
index 8ee314739a..a182a55b93 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -65,6 +65,7 @@ static CURLMcode __curl_multi_socket_action(CURLM 
*multi_handle,
 #define CURL_TIMEOUT_MAX 1
 
 #define CURL_BLOCK_OPT_URL   "url"
+#define CURL_BLOCK_OPT_READAHEAD "readahead"
 #define CURL_BLOCK_OPT_SSLVERIFY "sslverify"
 #define CURL_BLOCK_OPT_TIMEOUT "timeout"
 #define CURL_BLOCK_OPT_COOKIE"cookie"
@@ -149,6 +150,7 @@ typedef struct BDRVCURLState {
 uint64_t len;
 CURLState states[CURL_NUM_STATES];
 char *url;
+size_t readahead_size;
 bool sslverify;
 uint64_t timeout;
 char *cookie;
@@ -881,6 +883,11 @@ static QemuOptsList runtime_opts = {
 .type = QEMU_OPT_STRING,
 .help = "URL to open",
 },
+{
+.name = CURL_BLOCK_OPT_READAHEAD,
+.type = QEMU_OPT_SIZE,
+.help = "Readahead size",
+},
 {
 .name = CURL_BLOCK_OPT_SSLVERIFY,
 .type = QEMU_OPT_BOOL,
@@ -976,6 +983,8 @@ static int curl_open(BlockDriverState *bs, QDict *options, 
int flags,
 goto out_noclean;
 }
 
+s->readahead_size = qemu_opt_get_size(opts, CURL_BLOCK_OPT_READAHEAD, 0);
+
 s->timeout = qemu_opt_get_number(opts, CURL_BLOCK_OPT_TIMEOUT,
  CURL_BLOCK_OPT_TIMEOUT_DEFAULT);
 if (s->timeout > CURL_TIMEOUT_MAX) {
@@ -1247,6 +1256,8 @@ static int coroutine_fn curl_co_preadv(BlockDriverState 
*bs,
 
 trace_curl_co_preadv(qemu_coroutine_self(), offset, bytes);
 
+bytes += s->readahead_size;
+
 while (bytes > 0) {
 uint64_t len = MIN(bytes, s->blocksize - curl_block_offset(s, off));
 CURLAIOCB acb = {
diff --git a/docs/system/device-url-syntax.rst.inc 
b/docs/system/device-url-syntax.rst.inc
index 56843cb38f..58245e017c 100644
--- a/docs/system/device-url-syntax.rst.inc
+++ b/docs/system/device-url-syntax.rst.inc
@@ -174,6 +174,13 @@ These are specified using a special URL syntax.
``url``
   The full URL when passing options to the driver explicitly.
 
+   ``readahead``
+  The amount of data to read ahead with each range request to the
+  remote server. This value may optionally have the suffix 'T', 'G',
+  'M', 'K', 'k' or 'b'. If it does not have a suffix, it will be
+  assumed to be in bytes. The value must be a multiple of 512 bytes.
+  It defaults to 256k.
+
``sslverify``
   Whether to verify the remote server's certificate when connecting
   over SSL. It can have the value 'on' or 'off'. It defaults to
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 91888166fa..f4092ccc14 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3752,6 +3752,8 @@
 #
 # @url: URL of the image file
 #
+# @readahead: Amount of read-ahead (defaults to 0)
+#
 # @timeout: Timeout for connections, in seconds (defaults to 5)
 #
 # @username: Username for authentication (defaults to none)
@@ -3776,6 +3778,7 @@
   'data': { 'url': 'str',
 '*blocksize': 'int',
 '*blockcount': 'int',
+'*readahead': 'int',
 '*timeout': 'int',
 '*username': 'str',
 '*password-secret': 'str',
-- 
2.27.0




[RFC PATCH 3/9] block/curl: Tracing

2020-08-18 Thread David Edmondson
Add some more trace functions to the IO path.

Signed-off-by: David Edmondson 
---
 block/curl.c   | 10 +-
 block/io.c |  4 
 block/linux-aio.c  |  6 ++
 block/trace-events | 13 -
 4 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index d0c74d7de5..d2f4de46c9 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -299,6 +299,8 @@ static bool curl_find_buf(BDRVCURLState *s, uint64_t start, 
uint64_t len,
 {
 char *buf = state->orig_buf + (start - state->buf_start);
 
+trace_curl_pending_hit(qemu_coroutine_self(),
+   start, len);
 qemu_iovec_from_buf(acb->qiov, 0, buf, clamped_len);
 if (clamped_len < len) {
 qemu_iovec_memset(acb->qiov, clamped_len, 0, len - 
clamped_len);
@@ -316,6 +318,8 @@ static bool curl_find_buf(BDRVCURLState *s, uint64_t start, 
uint64_t len,
 {
 int j;
 
+trace_curl_pending_piggyback(qemu_coroutine_self(),
+ start, len);
 acb->start = start - state->buf_start;
 acb->end = acb->start + clamped_len;
 
@@ -328,6 +332,8 @@ static bool curl_find_buf(BDRVCURLState *s, uint64_t start, 
uint64_t len,
 }
 }
 
+trace_curl_pending_miss(qemu_coroutine_self(), start, len);
+
 return false;
 }
 
@@ -894,7 +900,7 @@ static void curl_setup_preadv(BlockDriverState *bs, 
CURLAIOCB *acb)
 
 snprintf(state->range, 127, "%" PRIu64 "-%" PRIu64,
  s->offset + start, s->offset + end);
-trace_curl_setup_preadv(acb->bytes, start, state->range);
+trace_curl_setup_preadv(qemu_coroutine_self(), start, acb->bytes);
 curl_easy_setopt(state->curl, CURLOPT_RANGE, state->range);
 
 if (curl_multi_add_handle(s->multi, state->curl) != CURLM_OK) {
@@ -923,10 +929,12 @@ static int coroutine_fn curl_co_preadv(BlockDriverState 
*bs,
 .bytes = bytes
 };
 
+trace_curl_co_preadv(qemu_coroutine_self(), offset, bytes);
 curl_setup_preadv(bs, &acb);
 while (acb.ret == -EINPROGRESS) {
 qemu_coroutine_yield();
 }
+trace_curl_co_preadv_done(qemu_coroutine_self());
 return acb.ret;
 }
 
diff --git a/block/io.c b/block/io.c
index ad3a51ed53..f816a46bf0 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1194,6 +1194,8 @@ static int coroutine_fn 
bdrv_driver_pwritev(BlockDriverState *bs,
 return -ENOMEDIUM;
 }
 
+trace_bdrv_driver_pwritev(qemu_coroutine_self(), offset, bytes);
+
 if (drv->bdrv_co_pwritev_part) {
 ret = drv->bdrv_co_pwritev_part(bs, offset, bytes, qiov, qiov_offset,
 flags & bs->supported_write_flags);
@@ -1253,6 +1255,8 @@ emulate_flags:
 qemu_iovec_destroy(_qiov);
 }
 
+trace_bdrv_driver_pwritev_done(qemu_coroutine_self());
+
 return ret;
 }
 
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 3c0527c2bf..811e9ff94c 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -15,6 +15,7 @@
 #include "qemu/event_notifier.h"
 #include "qemu/coroutine.h"
 #include "qapi/error.h"
+#include "trace.h"
 
 #include 
 
@@ -391,6 +392,8 @@ int coroutine_fn laio_co_submit(BlockDriverState *bs, 
LinuxAioState *s, int fd,
 .qiov   = qiov,
 };
 
+trace_laio_co_submit(qemu_coroutine_self(), offset, qiov->size);
+
 ret = laio_do_submit(fd, , offset, type);
 if (ret < 0) {
 return ret;
@@ -399,6 +402,9 @@ int coroutine_fn laio_co_submit(BlockDriverState *bs, 
LinuxAioState *s, int fd,
 if (laiocb.ret == -EINPROGRESS) {
 qemu_coroutine_yield();
 }
+
+trace_laio_co_submit_done(qemu_coroutine_self());
+
 return laiocb.ret;
 }
 
diff --git a/block/trace-events b/block/trace-events
index 9158335061..0b52d2ca1d 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -17,6 +17,8 @@ bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, 
int flags) "bs %p off
 bdrv_co_do_copy_on_readv(void *bs, int64_t offset, unsigned int bytes, int64_t 
cluster_offset, int64_t cluster_bytes) "bs %p offset %"PRId64" bytes %u 
cluster_offset %"PRId64" cluster_bytes %"PRId64
 bdrv_co_copy_range_from(void *src, uint64_t src_offset, void *dst, uint64_t 
dst_offset, uint64_t bytes, int read_flags, int write_flags) "src %p offset 
%"PRIu64" dst %p offset %"PRIu64" bytes %"PRIu64" rw flags 0x%x 0x%x"
 bdrv_co_copy_range_to(void *src, uint64_t src_offset, void *dst, uint64_t 
dst_offset, uint64_t bytes, int read_flags, int write_flags) "src %p offset 
%"PRIu64" dst %p offset %"PRIu64" bytes %"PRIu64" rw flags 0x%x 0x%x"
+bdrv_driver_pwritev(void *co, uint64_t offset, uint64_t bytes) "co %p writes 
0x%"PRIx64" + 0x%"PRIx64
+bdrv_driver_pwritev_done(void *co) "co %p done"
 
 # stream.c
 stream_one_iteration(void *s, int64_t offset, uint64_t bytes, int 
is_allocated) "s %p offset %" PRId64 " bytes %" PRIu64 " is_allocated %d"
@@ -196,7 +198,12 @@ 

[RFC PATCH 6/9] block/curl: Cache downloaded blocks

2020-08-18 Thread David Edmondson
In the hope that they will be referred to multiple times, cache the
blocks downloaded from the remote server.

Signed-off-by: David Edmondson 
---
 block/curl.c   | 321 +++--
 block/trace-events |   3 +
 2 files changed, 287 insertions(+), 37 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index b2d02818a9..0ea9eedebd 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -81,11 +81,29 @@ static CURLMcode __curl_multi_socket_action(CURLM 
*multi_handle,
 /* Must be a non-zero power of 2. */
 #define CURL_BLOCK_OPT_BLOCKSIZE_DEFAULT (256 * 1024)
 
+/* The maximum number of blocks to store in the cache. */
+#define CURL_BLOCK_CACHE_MAX_BLOCKS 100
+/* The number of heads in the hash table. */
+#define CURL_BLOCK_CACHE_HASH 37
+
 struct BDRVCURLState;
 struct CURLState;
 
 static bool libcurl_initialized;
 
+typedef struct block {
+QLIST_ENTRY(block) hash; /* Blocks with the same hash value. */
+QLIST_ENTRY(block) free; /* Block free list. */
+QTAILQ_ENTRY(block) lru; /* LRU list. */
+bool hashed; /* block_t contains data and is hashed. */
+int use; /* Use count. */
+
+uint64_t start; /* Offset of first byte. */
+uint64_t count; /* Valid bytes. */
+
+char *buf;  /* Data. */
+} block_t;
+
 typedef struct CURLAIOCB {
 Coroutine *co;
 QEMUIOVector *qiov;
@@ -117,12 +135,11 @@ typedef struct CURLState
 CURLAIOCB *acb[CURL_NUM_ACB];
 CURL *curl;
 QLIST_HEAD(, CURLSocket) sockets;
-char *orig_buf;
-uint64_t buf_start;
 size_t buf_off;
 char range[128];
 char errmsg[CURL_ERROR_SIZE];
 char in_use;
+block_t *cache_block;
 } CURLState;
 
 typedef struct BDRVCURLState {
@@ -144,11 +161,17 @@ typedef struct BDRVCURLState {
 char *proxypassword;
 size_t offset;
 size_t blocksize;
+int cache_allocated; /* The number of block_t currently allocated. */
+QLIST_HEAD(, block) cache_free;
+QTAILQ_HEAD(, block) cache_lru;
+QLIST_HEAD(, block) * cache_hash;
 } BDRVCURLState;
 
 static void curl_clean_state(CURLState *s);
 static void curl_multi_do(void *arg);
 
+static void curl_cache_free(BDRVCURLState *s, block_t *b);
+
 /* Align "n" to the start of the containing block. */
 static inline uint64_t curl_block_align(BDRVCURLState *s, uint64_t n)
 {
@@ -161,6 +184,198 @@ static inline uint64_t curl_block_offset(BDRVCURLState 
*s, uint64_t n)
 return n & (s->blocksize - 1);
 }
 
+static uint64_t curl_cache_hash(BDRVCURLState *s, uint64_t n)
+{
+return curl_block_align(s, n) % CURL_BLOCK_CACHE_HASH;
+}
+
+static bool curl_cache_init(BDRVCURLState *s)
+{
+s->cache_allocated = 0;
+
+QLIST_INIT(&s->cache_free);
+QTAILQ_INIT(&s->cache_lru);
+
+s->cache_hash = g_try_malloc(CURL_BLOCK_CACHE_HASH * sizeof(s->cache_hash));
+if (!s->cache_hash) {
+return false;
+}
+
+for (int i = 0; i < CURL_BLOCK_CACHE_HASH; i++) {
+QLIST_INIT(>cache_hash[i]);
+}
+
+return true;
+}
+
+static void curl_cache_deinit(BDRVCURLState *s)
+{
+block_t *b;
+
+/*
+ * Cache blocks are either in the hash table or on the free list.
+ */
+for (int i = 0; i < CURL_BLOCK_CACHE_HASH; i++) {
+while (!QLIST_EMPTY(&s->cache_hash[i])) {
+b = QLIST_FIRST(&s->cache_hash[i]);
+QLIST_REMOVE(b, hash);
+b->hashed = false;
+curl_cache_free(s, b);
+}
+}
+
+while (!QLIST_EMPTY(&s->cache_free)) {
+b = QLIST_FIRST(&s->cache_free);
+QLIST_REMOVE(b, free);
+curl_cache_free(s, b);
+}
+
+assert(s->cache_allocated == 0);
+
+g_free(s->cache_hash);
+s->cache_hash = NULL;
+}
+
+static block_t *curl_cache_alloc(BDRVCURLState *s)
+{
+block_t *b = g_try_malloc0(sizeof(*b));
+
+if (!b) {
+return NULL;
+}
+
+b->buf = g_try_malloc(s->blocksize);
+if (!b->buf) {
+g_free(b);
+return NULL;
+}
+
+s->cache_allocated++;
+
+trace_curl_cache_alloc(s->cache_allocated);
+
+return b;
+}
+
+static void curl_cache_free(BDRVCURLState *s, block_t *b)
+{
+assert(b->use == 0);
+assert(!b->hashed);
+
+g_free(b->buf);
+g_free(b);
+
+s->cache_allocated--;
+
+trace_curl_cache_free(s->cache_allocated);
+}
+
+static block_t *curl_cache_get(BDRVCURLState *s)
+{
+block_t *b = NULL;
+
+/* If there is one on the free list, use it. */
+if (!QLIST_EMPTY(&s->cache_free)) {
+b = QLIST_FIRST(&s->cache_free);
+QLIST_REMOVE(b, free);
+
+assert(b->use == 0);
+assert(!b->hashed);
+
+b->use++;
+goto done;
+}
+
+/* If not at the limit, try get a new one. */
+if (s->cache_allocated < CURL_BLOCK_CACHE_MAX_BLOCKS) {
+b = curl_cache_alloc(s);
+if (b) {
+b->use++;
+goto done;
+}
+}
+
+/* Take one from the LRU list. */
+if (!QTAILQ_EMPTY(&s->cache_lru)) {
+b = QTAILQ_FIRST(&s->cache_lru);
+

[RFC PATCH 5/9] block/curl: Allow the blocksize to be specified by the user

2020-08-18 Thread David Edmondson
Rather than a fixed 256kB blocksize, allow the user to specify the
size used. It must be a non-zero power of two, defaulting to 256kB.

Signed-off-by: David Edmondson 
---
 block/curl.c  | 73 +--
 docs/system/device-url-syntax.rst.inc |  7 +++
 qapi/block-core.json  |  4 ++
 3 files changed, 58 insertions(+), 26 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index cfc518efda..b2d02818a9 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -74,17 +74,12 @@ static CURLMcode __curl_multi_socket_action(CURLM 
*multi_handle,
 #define CURL_BLOCK_OPT_PROXY_USERNAME "proxy-username"
 #define CURL_BLOCK_OPT_PROXY_PASSWORD_SECRET "proxy-password-secret"
 #define CURL_BLOCK_OPT_OFFSET "offset"
+#define CURL_BLOCK_OPT_BLOCKSIZE "blocksize"
 
 #define CURL_BLOCK_OPT_SSLVERIFY_DEFAULT true
 #define CURL_BLOCK_OPT_TIMEOUT_DEFAULT 5
-
 /* Must be a non-zero power of 2. */
-#define CURL_BLOCK_SIZE (256 * 1024)
-
-/* Align "n" to the start of the containing block. */
-#define CURL_BLOCK_ALIGN(n) ((n) & ~(CURL_BLOCK_SIZE - 1))
-/* The offset of "n" within its block. */
-#define CURL_BLOCK_OFFSET(n) ((n) & (CURL_BLOCK_SIZE - 1))
+#define CURL_BLOCK_OPT_BLOCKSIZE_DEFAULT (256 * 1024)
 
 struct BDRVCURLState;
 struct CURLState;
@@ -102,7 +97,7 @@ typedef struct CURLAIOCB {
 
 /*
  * start and end indicate the subset of the surrounding
- * CURL_BLOCK_SIZE sized block that is the subject of this
+ * BDRVCURLState.blocksize sized block that is the subject of this
  * IOCB. They are offsets from the beginning of the underlying
  * buffer.
  */
@@ -148,11 +143,24 @@ typedef struct BDRVCURLState {
 char *proxyusername;
 char *proxypassword;
 size_t offset;
+size_t blocksize;
 } BDRVCURLState;
 
 static void curl_clean_state(CURLState *s);
 static void curl_multi_do(void *arg);
 
+/* Align "n" to the start of the containing block. */
+static inline uint64_t curl_block_align(BDRVCURLState *s, uint64_t n)
+{
+return n & ~(s->blocksize - 1);
+}
+
+/* The offset of "n" within its block. */
+static inline uint64_t curl_block_offset(BDRVCURLState *s, uint64_t n)
+{
+return n & (s->blocksize - 1);
+}
+
 #ifdef NEED_CURL_TIMER_CALLBACK
 /* Called from curl_multi_do_locked, with s->mutex held.  */
 static int curl_timer_cb(CURLM *multi, long timeout_ms, void *opaque)
@@ -264,22 +272,23 @@ static size_t curl_header_cb(void *ptr, size_t size, 
size_t nmemb, void *opaque)
 /* Called from curl_multi_do_locked, with s->mutex held.  */
 static size_t curl_read_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
 {
-CURLState *s = ((CURLState*)opaque);
+CURLState *state = (CURLState *)opaque;
+BDRVCURLState *s = state->s;
 size_t realsize = size * nmemb;
 
 trace_curl_read_cb(realsize);
 
-if (!s || !s->orig_buf) {
+if (!state || !state->orig_buf) {
 goto read_end;
 }
 
-if (s->buf_off >= CURL_BLOCK_SIZE) {
+if (state->buf_off >= s->blocksize) {
 /* buffer full, read nothing */
 goto read_end;
 }
-realsize = MIN(realsize, CURL_BLOCK_SIZE - s->buf_off);
-memcpy(s->orig_buf + s->buf_off, ptr, realsize);
-s->buf_off += realsize;
+realsize = MIN(realsize, s->blocksize - state->buf_off);
+memcpy(state->orig_buf + state->buf_off, ptr, realsize);
+state->buf_off += realsize;
 
 read_end:
 /* curl will error out if we do not return this value */
@@ -300,7 +309,7 @@ static bool curl_find_buf(BDRVCURLState *s, uint64_t start, 
uint64_t len,
 /* The end of the currently valid data. */
 uint64_t buf_end = state->buf_start + state->buf_off;
 /* The end of the valid data when the IO completes. */
-uint64_t buf_fend = state->buf_start + CURL_BLOCK_SIZE;
+uint64_t buf_fend = state->buf_start + s->blocksize;
 
 if (!state->orig_buf)
 continue;
@@ -315,7 +324,7 @@ static bool curl_find_buf(BDRVCURLState *s, uint64_t start, 
uint64_t len,
 (clamped_end >= state->buf_start) &&
 (clamped_end <= buf_end))
 {
-char *buf = state->orig_buf + CURL_BLOCK_OFFSET(start);
+char *buf = state->orig_buf + curl_block_offset(s, start);
 
 trace_curl_pending_hit(qemu_coroutine_self(),
start, len);
@@ -343,7 +352,7 @@ static bool curl_find_buf(BDRVCURLState *s, uint64_t start, 
uint64_t len,
 
 trace_curl_pending_piggyback(qemu_coroutine_self(),
  start, len);
-acb->start = CURL_BLOCK_OFFSET(start);
+acb->start = curl_block_offset(s, start);
 acb->end = acb->start + clamped_len;
 
 for (j = 0; j < CURL_NUM_ACB; j++) {
@@ -688,6 +697,11 @@ static QemuOptsList runtime_opts = {
 .type = QEMU_OPT_SIZE,
 .help = "Offset into underlying content"
 },
+{
+.name = 

[RFC PATCH 2/9] block/curl: Remove readahead support

2020-08-18 Thread David Edmondson
Block based caching and the current readahead support do not interact
well, so remove readahead support before adding block
caching. Readahead will be re-added later.

Signed-off-by: David Edmondson 
---
 block/curl.c  | 23 ---
 docs/system/device-url-syntax.rst.inc |  7 ---
 qapi/block-core.json  |  4 
 3 files changed, 4 insertions(+), 30 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index 32ec760f66..d0c74d7de5 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -65,7 +65,6 @@ static CURLMcode __curl_multi_socket_action(CURLM 
*multi_handle,
 #define CURL_TIMEOUT_MAX 1
 
 #define CURL_BLOCK_OPT_URL   "url"
-#define CURL_BLOCK_OPT_READAHEAD "readahead"
 #define CURL_BLOCK_OPT_SSLVERIFY "sslverify"
 #define CURL_BLOCK_OPT_TIMEOUT "timeout"
 #define CURL_BLOCK_OPT_COOKIE"cookie"
@@ -76,7 +75,6 @@ static CURLMcode __curl_multi_socket_action(CURLM 
*multi_handle,
 #define CURL_BLOCK_OPT_PROXY_PASSWORD_SECRET "proxy-password-secret"
 #define CURL_BLOCK_OPT_OFFSET "offset"
 
-#define CURL_BLOCK_OPT_READAHEAD_DEFAULT (256 * 1024)
 #define CURL_BLOCK_OPT_SSLVERIFY_DEFAULT true
 #define CURL_BLOCK_OPT_TIMEOUT_DEFAULT 5
 
@@ -124,7 +122,6 @@ typedef struct BDRVCURLState {
 uint64_t len;
 CURLState states[CURL_NUM_STATES];
 char *url;
-size_t readahead_size;
 bool sslverify;
 uint64_t timeout;
 char *cookie;
@@ -615,11 +612,6 @@ static QemuOptsList runtime_opts = {
 .type = QEMU_OPT_STRING,
 .help = "URL to open",
 },
-{
-.name = CURL_BLOCK_OPT_READAHEAD,
-.type = QEMU_OPT_SIZE,
-.help = "Readahead size",
-},
 {
 .name = CURL_BLOCK_OPT_SSLVERIFY,
 .type = QEMU_OPT_BOOL,
@@ -705,14 +697,6 @@ static int curl_open(BlockDriverState *bs, QDict *options, 
int flags,
 goto out_noclean;
 }
 
-s->readahead_size = qemu_opt_get_size(opts, CURL_BLOCK_OPT_READAHEAD,
-  CURL_BLOCK_OPT_READAHEAD_DEFAULT);
-if ((s->readahead_size & 0x1ff) != 0) {
-error_setg(errp, "HTTP_READAHEAD_SIZE %zd is not a multiple of 512",
-   s->readahead_size);
-goto out_noclean;
-}
-
 s->timeout = qemu_opt_get_number(opts, CURL_BLOCK_OPT_TIMEOUT,
  CURL_BLOCK_OPT_TIMEOUT_DEFAULT);
 if (s->timeout > CURL_TIMEOUT_MAX) {
@@ -898,7 +882,7 @@ static void curl_setup_preadv(BlockDriverState *bs, 
CURLAIOCB *acb)
 state->buf_off = 0;
 g_free(state->orig_buf);
 state->buf_start = start;
-state->buf_len = MIN(acb->end + s->readahead_size, s->len - start);
+state->buf_len = MIN(acb->end, s->len - start);
 end = start + state->buf_len - 1;
 state->orig_buf = g_try_malloc(state->buf_len);
 if (state->buf_len && state->orig_buf == NULL) {
@@ -971,8 +955,9 @@ static void curl_refresh_filename(BlockDriverState *bs)
 {
 BDRVCURLState *s = bs->opaque;
 
-/* "readahead" and "timeout" do not change the guest-visible data,
- * so ignore them */
+/*
+ * "timeout" does not change the guest-visible data, so ignore it.
+ */
 if (s->sslverify != CURL_BLOCK_OPT_SSLVERIFY_DEFAULT ||
 s->cookie || s->username || s->password || s->proxyusername ||
 s->proxypassword)
diff --git a/docs/system/device-url-syntax.rst.inc 
b/docs/system/device-url-syntax.rst.inc
index 33f1ddfe6d..bc38b9df38 100644
--- a/docs/system/device-url-syntax.rst.inc
+++ b/docs/system/device-url-syntax.rst.inc
@@ -174,13 +174,6 @@ These are specified using a special URL syntax.
``url``
   The full URL when passing options to the driver explicitly.
 
-   ``readahead``
-  The amount of data to read ahead with each range request to the
-  remote server. This value may optionally have the suffix 'T', 'G',
-  'M', 'K', 'k' or 'b'. If it does not have a suffix, it will be
-  assumed to be in bytes. The value must be a multiple of 512 bytes.
-  It defaults to 256k.
-
``sslverify``
   Whether to verify the remote server's certificate when connecting
   over SSL. It can have the value 'on' or 'off'. It defaults to
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 197bdc1c36..d6f5e91cc3 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3752,9 +3752,6 @@
 #
 # @url: URL of the image file
 #
-# @readahead: Size of the read-ahead cache; must be a multiple of
-# 512 (defaults to 256 kB)
-#
 # @timeout: Timeout for connections, in seconds (defaults to 5)
 #
 # @username: Username for authentication (defaults to none)
@@ -3771,7 +3768,6 @@
 ##
 { 'struct': 'BlockdevOptionsCurlBase',
   'data': { 'url': 'str',
-'*readahead': 'int',
 '*timeout': 'int',
 '*username': 'str',
 '*password-secret': 'str',
-- 
2.27.0




[RFC PATCH 4/9] block/curl: Perform IO in fixed size chunks

2020-08-18 Thread David Edmondson
Do all IO requests to the remote server in 256kB chunks.

Signed-off-by: David Edmondson 
---
 block/curl.c   | 151 -
 block/trace-events |   2 +
 2 files changed, 109 insertions(+), 44 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index d2f4de46c9..cfc518efda 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -78,6 +78,14 @@ static CURLMcode __curl_multi_socket_action(CURLM 
*multi_handle,
 #define CURL_BLOCK_OPT_SSLVERIFY_DEFAULT true
 #define CURL_BLOCK_OPT_TIMEOUT_DEFAULT 5
 
+/* Must be a non-zero power of 2. */
+#define CURL_BLOCK_SIZE (256 * 1024)
+
+/* Align "n" to the start of the containing block. */
+#define CURL_BLOCK_ALIGN(n) ((n) & ~(CURL_BLOCK_SIZE - 1))
+/* The offset of "n" within its block. */
+#define CURL_BLOCK_OFFSET(n) ((n) & (CURL_BLOCK_SIZE - 1))
+
 struct BDRVCURLState;
 struct CURLState;
 
@@ -86,11 +94,18 @@ static bool libcurl_initialized;
 typedef struct CURLAIOCB {
 Coroutine *co;
 QEMUIOVector *qiov;
+uint64_t qiov_offset; /* Offset in qiov to place data. */
 
 uint64_t offset;
 uint64_t bytes;
 int ret;
 
+/*
+ * start and end indicate the subset of the surrounding
+ * CURL_BLOCK_SIZE sized block that is the subject of this
+ * IOCB. They are offsets from the beginning of the underlying
+ * buffer.
+ */
 size_t start;
 size_t end;
 } CURLAIOCB;
@@ -110,7 +125,6 @@ typedef struct CURLState
 char *orig_buf;
 uint64_t buf_start;
 size_t buf_off;
-size_t buf_len;
 char range[128];
 char errmsg[CURL_ERROR_SIZE];
 char in_use;
@@ -259,11 +273,11 @@ static size_t curl_read_cb(void *ptr, size_t size, size_t 
nmemb, void *opaque)
 goto read_end;
 }
 
-if (s->buf_off >= s->buf_len) {
+if (s->buf_off >= CURL_BLOCK_SIZE) {
 /* buffer full, read nothing */
 goto read_end;
 }
-realsize = MIN(realsize, s->buf_len - s->buf_off);
+realsize = MIN(realsize, CURL_BLOCK_SIZE - s->buf_off);
 memcpy(s->orig_buf + s->buf_off, ptr, realsize);
 s->buf_off += realsize;
 
@@ -281,35 +295,44 @@ static bool curl_find_buf(BDRVCURLState *s, uint64_t 
start, uint64_t len,
 uint64_t clamped_end = MIN(end, s->len);
 uint64_t clamped_len = clamped_end - start;
 
-for (i=0; i<CURL_NUM_STATES; i++) {
+for (i = 0; i < CURL_NUM_STATES; i++) {
 CURLState *state = &s->states[i];
-uint64_t buf_end = (state->buf_start + state->buf_off);
-uint64_t buf_fend = (state->buf_start + state->buf_len);
+/* The end of the currently valid data. */
+uint64_t buf_end = state->buf_start + state->buf_off;
+/* The end of the valid data when the IO completes. */
+uint64_t buf_fend = state->buf_start + CURL_BLOCK_SIZE;
 
 if (!state->orig_buf)
 continue;
 if (!state->buf_off)
 continue;
 
-// Does the existing buffer cover our section?
+/*
+ * Does the existing buffer cover our section?
+ */
 if ((start >= state->buf_start) &&
 (start <= buf_end) &&
 (clamped_end >= state->buf_start) &&
 (clamped_end <= buf_end))
 {
-char *buf = state->orig_buf + (start - state->buf_start);
+char *buf = state->orig_buf + CURL_BLOCK_OFFSET(start);
 
 trace_curl_pending_hit(qemu_coroutine_self(),
start, len);
-qemu_iovec_from_buf(acb->qiov, 0, buf, clamped_len);
+qemu_iovec_from_buf(acb->qiov, acb->qiov_offset, buf, clamped_len);
 if (clamped_len < len) {
-qemu_iovec_memset(acb->qiov, clamped_len, 0, len - 
clamped_len);
+qemu_iovec_memset(acb->qiov, acb->qiov_offset + clamped_len,
+  0, len - clamped_len);
 }
 acb->ret = 0;
 return true;
 }
 
-// Wait for unfinished chunks
+/*
+ * If an in-progress IO will provide the required data, wait
+ * for it to complete - the initiator will complete this
+ * aiocb.
+ */
 if (state->in_use &&
 (start >= state->buf_start) &&
 (start <= buf_fend) &&
@@ -320,10 +343,10 @@ static bool curl_find_buf(BDRVCURLState *s, uint64_t 
start, uint64_t len,
 
 trace_curl_pending_piggyback(qemu_coroutine_self(),
  start, len);
-acb->start = start - state->buf_start;
+acb->start = CURL_BLOCK_OFFSET(start);
 acb->end = acb->start + clamped_len;
 
-for (j=0; j<CURL_NUM_ACB; j++) {
+for (j = 0; j < CURL_NUM_ACB; j++) {
 if (!state->acb[j]) {
 state->acb[j] = acb;
 return true;
@@ -377,7 +400,7 @@ static void curl_multi_check_completion(BDRVCURLState *s)
 for (i = 0; i < CURL_NUM_ACB; i++) {
 CURLAIOCB *acb = state->acb[i];
 
-if (acb == NULL) {
+if (!acb) {
 continue;
 }
 
@@ -385,14 +408,15 @@ 

[RFC PATCH 8/9] block/curl: Allow 16 sockets/ACB

2020-08-18 Thread David Edmondson
qemu-img allows up to 16 coroutines when performing IO. Ensure that
there is a Curl socket and ACB available to each of them.

Signed-off-by: David Edmondson 
---
 block/curl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index 27fa77c351..8ee314739a 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -60,8 +60,8 @@ static CURLMcode __curl_multi_socket_action(CURLM 
*multi_handle,
 #define PROTOCOLS (CURLPROTO_HTTP | CURLPROTO_HTTPS | \
CURLPROTO_FTP | CURLPROTO_FTPS)
 
-#define CURL_NUM_STATES 8
-#define CURL_NUM_ACB8
+#define CURL_NUM_STATES 16
+#define CURL_NUM_ACB16
 #define CURL_TIMEOUT_MAX 1
 
 #define CURL_BLOCK_OPT_URL   "url"
-- 
2.27.0




[RFC PATCH 7/9] block/curl: Allow the user to control the number of cache blocks

2020-08-18 Thread David Edmondson
Rather than using a fixed number, allow the user to specify the number
of cache blocks allocated. This cannot be less than the number of Curl
states and defaults to that value.

Signed-off-by: David Edmondson 
---
 block/curl.c  | 20 +---
 docs/system/device-url-syntax.rst.inc |  4 
 qapi/block-core.json  |  4 
 3 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index 0ea9eedebd..27fa77c351 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -75,14 +75,15 @@ static CURLMcode __curl_multi_socket_action(CURLM 
*multi_handle,
 #define CURL_BLOCK_OPT_PROXY_PASSWORD_SECRET "proxy-password-secret"
 #define CURL_BLOCK_OPT_OFFSET "offset"
 #define CURL_BLOCK_OPT_BLOCKSIZE "blocksize"
+#define CURL_BLOCK_OPT_BLOCKCOUNT "blockcount"
 
 #define CURL_BLOCK_OPT_SSLVERIFY_DEFAULT true
 #define CURL_BLOCK_OPT_TIMEOUT_DEFAULT 5
 /* Must be a non-zero power of 2. */
 #define CURL_BLOCK_OPT_BLOCKSIZE_DEFAULT (256 * 1024)
+/* The default number of blocks to store in the cache. */
+#define CURL_BLOCK_OPT_BLOCKCOUNT_DEFAULT (CURL_NUM_STATES)
 
-/* The maximum number of blocks to store in the cache. */
-#define CURL_BLOCK_CACHE_MAX_BLOCKS 100
 /* The number of heads in the hash table. */
 #define CURL_BLOCK_CACHE_HASH 37
 
@@ -161,6 +162,7 @@ typedef struct BDRVCURLState {
 char *proxypassword;
 size_t offset;
 size_t blocksize;
+int cache_max;
 int cache_allocated; /* The number of block_t currently allocated. */
 QLIST_HEAD(, block) cache_free;
 QTAILQ_HEAD(, block) cache_lru;
@@ -287,7 +289,7 @@ static block_t *curl_cache_get(BDRVCURLState *s)
 }
 
 /* If not at the limit, try get a new one. */
-if (s->cache_allocated < CURL_BLOCK_CACHE_MAX_BLOCKS) {
+if (s->cache_allocated < s->cache_max) {
 b = curl_cache_alloc(s);
 if (b) {
 b->use++;
@@ -929,6 +931,11 @@ static QemuOptsList runtime_opts = {
 .type = QEMU_OPT_SIZE,
 .help = "Block size for IO requests"
 },
+{
+.name = CURL_BLOCK_OPT_BLOCKCOUNT,
+.type = QEMU_OPT_SIZE,
+.help = "Maximum number of cached blocks"
+},
 { /* end of list */ }
 },
 };
@@ -1039,6 +1046,13 @@ static int curl_open(BlockDriverState *bs, QDict 
*options, int flags,
 error_setg(errp, "blocksize must be a non-zero power of two");
 goto out_noclean;
 }
+s->cache_max = qemu_opt_get_size(opts, CURL_BLOCK_OPT_BLOCKCOUNT,
+ CURL_BLOCK_OPT_BLOCKCOUNT_DEFAULT);
+if (s->cache_max < CURL_NUM_STATES) {
+error_setg(errp, "blockcount must be larger than %d",
+CURL_NUM_STATES - 1);
+goto out_noclean;
+}
 
 trace_curl_open(file);
 qemu_co_queue_init(&s->free_state_waitq);
diff --git a/docs/system/device-url-syntax.rst.inc 
b/docs/system/device-url-syntax.rst.inc
index ee504ee41a..56843cb38f 100644
--- a/docs/system/device-url-syntax.rst.inc
+++ b/docs/system/device-url-syntax.rst.inc
@@ -201,6 +201,10 @@ These are specified using a special URL syntax.
   bytes. The value must be a non-zero power of two.  It defaults
   to 256kB.
 
+   ``blockcount``
+  The number of ``blocksize`` blocks that the system may allocate
+  to store data read from the remote server.
+
Note that when passing options to qemu explicitly, ``driver`` is the
value of .
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index cd16197e1e..91888166fa 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3767,11 +3767,15 @@
 # @blocksize: Size of all IO requests sent to the remote server; must
#     be a non-zero power of two (defaults to 256kB)
 #
+# @blockcount: The number of IO blocks used to cache data from the
+#  remote server.
+#
 # Since: 2.9
 ##
 { 'struct': 'BlockdevOptionsCurlBase',
   'data': { 'url': 'str',
 '*blocksize': 'int',
+'*blockcount': 'int',
 '*timeout': 'int',
 '*username': 'str',
 '*password-secret': 'str',
-- 
2.27.0




Re: [PATCH v7 10/47] mirror-top: Support compressed writes

2020-08-18 Thread Kevin Wolf
Am 25.06.2020 um 17:21 hat Max Reitz geschrieben:
> Signed-off-by: Max Reitz 
> ---
>  block/mirror.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index e8e8844afc..469acf4600 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1480,6 +1480,15 @@ static int coroutine_fn 
> bdrv_mirror_top_pdiscard(BlockDriverState *bs,
>  NULL, 0);
>  }
>  
> +static int coroutine_fn bdrv_mirror_top_pwritev_compressed(BlockDriverState 
> *bs,
> +   uint64_t offset,
> +   uint64_t bytes,
> +   QEMUIOVector 
> *qiov)
> +{
> +return bdrv_mirror_top_pwritev(bs, offset, bytes, qiov,
> +   BDRV_REQ_WRITE_COMPRESSED);
> +}

Hm, not sure if it's a problem, but bdrv_supports_compressed_writes()
will now return true for mirror-top. However, with an active mirror to a
target that doesn't support compression, trying to actually do a
compressed write will always return -ENOTSUP.

So I guess the set of nodes patch 7 looks at still isn't quite complete.
However, it's not obvious how to make it more complete without
delegating to the driver.

Maybe we need to use bs->supported_write_flags, which is set by the
driver, instead of looking at the presence of callbacks.
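
Perhaps something like this (untested sketch; whether all relevant
drivers reliably set BDRV_REQ_WRITE_COMPRESSED in supported_write_flags
is exactly the open question):

    bool bdrv_supports_compressed_writes(BlockDriverState *bs)
    {
        return !!(bs->supported_write_flags & BDRV_REQ_WRITE_COMPRESSED);
    }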

Of course, in the general case, we also should make sure that graph
changes will be reflected in bs->supported_write_flags, but we already
fail to do this in raw-format, so I guess ignoring it for now is good
enough here, too...

Kevin




Re: [RFC PATCH 06/22] qemu-nbd: Use raw block driver for --offset

2020-08-18 Thread Nir Soffer
On Tue, Aug 18, 2020 at 11:47 AM Kevin Wolf  wrote:

> Am 17.08.2020 um 19:19 hat Nir Soffer geschrieben:
> > On Thu, Aug 13, 2020 at 7:36 PM Kevin Wolf  wrote:
> >
> > > Instead of implementing qemu-nbd --offset in the NBD code, just put a
> > > raw block node with the requested offset on top of the user image and
> > > rely on that doing the job.
> > >
> > > This does not only simplify the nbd_export_new() interface and bring it
> > > closer to the set of options that the nbd-server-add QMP command
> offers,
> > > but in fact it also eliminates a potential source for bugs in the NBD
> > > code which previously had to add the offset manually in all relevant
> > > places.
> > >
> >
> > Just to make sure I understand this correctly -
> >
> > qemu-nbd can work with:
> >
> > $ qemu-nbd 'json:{"driver": "file", "filename": "test.raw"}'
> >
> > And:
> >
> > $ qemu-nbd 'json:{"driver": "raw", "file": {"driver": "file",
> > "filename": "test.raw"}}'
> >
> > I assumed that we always create the raw node?
>
> No, the first form creates only the 'file' node without a 'raw' node on
> top. For all practical matters, this should be the same in qemu-img or
> qemu-nbd. For actually running VMs, omitting the 'raw' node where it's
> not needed can improve performance a little.
>

We did not check if we have different performance with the extra raw node.
Since in our use case (copying images) small reads/writes are unlikely, I
don't
think it will make a difference.

What is true is that if you use a filename without specifying the driver
> (i.e.  you rely on format probing), you'll get a 'raw' node on top of
> the 'file' node.
>
> > oVirt always uses the second form to make it easier to support offset,
> > size, and backing.
> >
> https://github.com/oVirt/ovirt-imageio/blob/2021164d064227d7c5e03c8da087adc66e3a577e/daemon/ovirt_imageio/_internal/qemu_nbd.py#L104
> >
> > This also seems to be the way libvirt builds the nodes using -blockdev.
>
> libvirt actually has a BZ to avoid the 'raw' node for performance when
> it's not needed.
>
> > Do we have a way to visualize the internal node graph used by
> > qemu-nbd/qemu-img?
>
> No, but as long as you explicitly specify the driver, you get exactly
> what you specified.
>

So this is not really needed then.


> For exploring what happens, you can pass the same json: filename to QEMU
> (maybe with -hda) and then use the monitor to inspect the state.
>
> Kevin
>
>


Re: [RFC PATCH 06/22] qemu-nbd: Use raw block driver for --offset

2020-08-18 Thread Kevin Wolf
Am 17.08.2020 um 19:19 hat Nir Soffer geschrieben:
> On Thu, Aug 13, 2020 at 7:36 PM Kevin Wolf  wrote:
> 
> > Instead of implementing qemu-nbd --offset in the NBD code, just put a
> > raw block node with the requested offset on top of the user image and
> > rely on that doing the job.
> >
> > This does not only simplify the nbd_export_new() interface and bring it
> > closer to the set of options that the nbd-server-add QMP command offers,
> > but in fact it also eliminates a potential source for bugs in the NBD
> > code which previously had to add the offset manually in all relevant
> > places.
> >
> 
> Just to make sure I understand this correctly -
> 
> qemu-nbd can work with:
> 
> $ qemu-nbd 'json:{"driver": "file", "filename": "test.raw"}'
> 
> And:
> 
> $ qemu-nbd 'json:{"driver": "raw", "file": {"driver": "file",
> "filename": "test.raw"}}'
> 
> I assumed that we always create the raw node?

No, the first form creates only the 'file' node without a 'raw' node on
top. For all practical matters, this should be the same in qemu-img or
qemu-nbd. For actually running VMs, omitting the 'raw' node where it's
not needed can improve performance a little.

What is true is that if you use a filename without specifying the driver
(i.e.  you rely on format probing), you'll get a 'raw' node on top of
the 'file' node.

> oVirt always uses the second form to make it easier to support offset,
> size, and backing.
> https://github.com/oVirt/ovirt-imageio/blob/2021164d064227d7c5e03c8da087adc66e3a577e/daemon/ovirt_imageio/_internal/qemu_nbd.py#L104
> 
> This also seems to be the way libvirt builds the nodes using -blockdev.

libvirt actually has a BZ to avoid the 'raw' node for performance when
it's not needed.

> Do we have a way to visualize the internal node graph used by
> qemu-nbd/qemu-img?

No, but as long as you explicitly specify the driver, you get exactly
what you specified.

For exploring what happens, you can pass the same json: filename to QEMU
(maybe with -hda) and then use the monitor to inspect the state.
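
Something like this (illustrative, reusing the test.raw example from
above; "info block" then shows the node tree that was created):

  $ qemu-system-x86_64 -monitor stdio \
      -hda 'json:{"driver": "file", "filename": "test.raw"}'
  (qemu) info block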

Kevin




Re: [PATCH 0/1] qcow2: Skip copy-on-write when allocating a zero cluster

2020-08-18 Thread Kevin Wolf
Am 17.08.2020 um 20:18 hat Alberto Garcia geschrieben:
> On Mon 17 Aug 2020 05:53:07 PM CEST, Kevin Wolf wrote:
> > Maybe the difference is in allocating 64k at once instead of doing a
> > separate allocation for every 4k block? But with the extent size hint
> > patches to file-posix, we should allocate 1 MB at once by default now
> > (if your test image was newly created). Can you check whether this is
> > in effect for your image file?
> 
> Ehmm... is that hint supported in ext4 or only in xfs?

Hm, I had understood that ext4 supports this, but looking at the kernel
code, it doesn't look like it. :-(

Kevin




Re: [PATCH] qemu-img: Explicit number replaced by a constant

2020-08-18 Thread Stefano Garzarella
Hi Yi Li,
thanks for this patch! Just a comment below:

On Mon, Aug 17, 2020 at 07:01:13PM +0800, Yi Li wrote:
> Signed-off-by: Yi Li 
> ---
>  qemu-img.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index 5308773811..a0fbc2757c 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1181,7 +1181,7 @@ static int64_t find_nonzero(const uint8_t *buf, int64_t 
> n)
>  }
>  
>  /*
> - * Returns true iff the first sector pointed to by 'buf' contains at least
> + * Returns true if the first sector pointed to by 'buf' contains at least

This change seems unrelated, please can you put this in a separate patch?

Thanks,
Stefano

>   * a non-NUL byte.
>   *
>   * 'pnum' is set to the number of sectors (including and immediately 
> following
> @@ -1200,10 +1200,10 @@ static int is_allocated_sectors(const uint8_t *buf, 
> int n, int *pnum,
>  *pnum = 0;
>  return 0;
>  }
> -is_zero = buffer_is_zero(buf, 512);
> +is_zero = buffer_is_zero(buf, BDRV_SECTOR_SIZE);
>  for(i = 1; i < n; i++) {
> -buf += 512;
> -if (is_zero != buffer_is_zero(buf, 512)) {
> +buf += BDRV_SECTOR_SIZE;
> +if (is_zero != buffer_is_zero(buf, BDRV_SECTOR_SIZE)) {
>  break;
>  }
>  }
> @@ -2489,8 +2489,8 @@ static int img_convert(int argc, char **argv)
>  }
>  }
>  
> -qemu_opt_set_number(opts, BLOCK_OPT_SIZE, s.total_sectors * 512,
> -&error_abort);
> +qemu_opt_set_number(opts, BLOCK_OPT_SIZE,
> +s.total_sectors * BDRV_SECTOR_SIZE, &error_abort);
>  ret = add_old_style_options(out_fmt, opts, out_baseimg, NULL);
>  if (ret < 0) {
>  goto out;
> -- 
> 2.25.3
> 
> 
> 
> 




Re: [PATCH 00/18] hw/riscv: Add Microchip PolarFire SoC Icicle Kit board support

2020-08-18 Thread Anup Patel
On Tue, Aug 18, 2020 at 1:23 AM  wrote:
>
> On 8/17/20 8:28 PM, Alistair Francis wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you know the 
> > content is safe
> >
> > On Mon, Aug 17, 2020 at 11:12 AM via  wrote:
> >> Hi Anup,
> >>
> >> On 8/17/20 11:30 AM, Bin Meng wrote:
> >>> EXTERNAL EMAIL: Do not click links or open attachments unless you know 
> >>> the content is safe
> >>>
> >>> Hi Anup,
> >>>
> >>> On Sat, Aug 15, 2020 at 1:44 AM Anup Patel  wrote:
>  On Fri, Aug 14, 2020 at 10:12 PM Bin Meng  wrote:
> > From: Bin Meng 
> >
> > This adds support for Microchip PolarFire SoC Icicle Kit board.
> > The Icicle Kit board integrates a PolarFire SoC, with one SiFive's
> > E51 plus four U54 cores and many on-chip peripherals and an FPGA.
>  Nice Work !!! This is very helpful.
> >>> Thanks!
> >>>
>  The Microchip HSS is quite convoluted. It has:
>  1. DDR Init
>  2. Boot device support
>  3. SBI support using OpenSBI as library
>  4. Simple TEE support
> 
>  I think point 1) and 2) above should be part of U-Boot SPL.
>  The point 3) can be OpenSBI FW_DYNAMIC.
> 
>  Lastly,for point 4), we are working on a new OpenSBI feature using
>  which we can run independent Secure OS and Non-Secure OS using
>  U-Boot_SPL+OpenSBI (for both SiFive Unleashed and Microchip
>  PolarFire).
> 
>  Do you have plans for adding U-Boot SPL support for this board ??
> >>> + Cyril Jean from Microchip
> >>>
> >>> I will have to leave this question to Cyril to comment.
> >>>
> >> I currently do not have a plan to support U-Boot SPL. The idea of the
> >> HSS is to contain all the silicon specific initialization and
> >> configuration code within the HSS before jumping to U-Boot S-mode. I
> >> would rather keep all this within the HSS for the time being. I would
> >> wait until we reach production silicon before attempting to move this to
> >> U-Boot SPL as the HSS is likely to contain some opaque silicon related
> >> changes for another while.
> > That is unfortunate, a lot of work has gone into making the boot flow
> > simple and easy to use.
> >
> > QEMU now includes OpenSBI by default to make it easy for users to boot
> > Linux. The Icicle Kit board is now the most difficult QEMU board to
> > boot Linux on. Not to mention it makes it hard to impossible to
> > support it in standard tool flows such as meta-riscv.
> >
> > Alistair
>
> If it is such a problem we can add a U-Boot SPL stage and the HSS can be
> treated as standard SoC ROM code.

It's not mandatory for U-Boot SPL to have stable DRAM calibration settings
from the start. The initial U-Boot SPL support for most platforms (across
architectures) usually includes basic working DRAM calibration settings,
which are later updated with separate patches. Also, we don't need all
U-Boot drivers to be upstreamed in one go, as this can be done in phases.

The advantage we have with the PolarFire SoC Icicle board is that we
already have a U-Boot S-mode, and we believe the OpenSBI generic platform
will work fine for the PolarFire SoC Icicle board, so the only thing
missing right now is U-Boot SPL support for an open-source boot-flow.

It will certainly accelerate open-source development if we have boot-flow
U-Boot_SPL => OpenSBI (FW_DYNAMIC) => U-Boot_S-mode working
on PolarFire SoC Icicle board. Initially, we just need DRAM, SD/eMMC,
and Serial port support for U-Boot SPL and U-Boot S-mode. Later on,
more patches can add ethernet and other booting device drivers to U-Boot.

Regarding security services of HSS, we are working on a OpenSBI
feature which will allow HSS security services to run as independent
binary in M-mode (not linked to OpenSBI) and OpenSBI FW_DYNAMIC
will be a separate binary acting as a secure monitor.

Regards,
Anup



Re: What is bs->reqs_lock for?

2020-08-18 Thread Paolo Bonzini
On 13/08/20 18:34, Vladimir Sementsov-Ogievskiy wrote:
> I thought a bs is attached to one AioContext, and an AioContext is
> attached to one iothread.

For now yes, but with multiqueue there would be many iothreads sending
requests to the AioContext.  The BDS would still have a "home"
aiocontext to request socket readiness events, but
io_uring/linux_aio/threadpool requests could be issued from any iothread.

> And all normal request processing of the bs is done in this one iothread.
> And when we need to access bs externally, we do it in
> aio_context_acquire / aio_context_release, which protects from parallel
> access to BlockDriverState fields...
> 
> But you say, that block/io.c is not protected by AioContext lock..
> Does it mean that everything must be thread-safe in block/io.c and all
> block drivers?

Yes.
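
As a concrete example of such parallel access: with multiqueue, two
iothreads can submit requests to the same BDS at the same time, and
both end up inserting into bs->tracked_requests. That is why
tracked_request_begin() in block/io.c does the insertion under the
CoMutex (paraphrasing the existing code, not a new proposal):

    qemu_co_mutex_lock(&bs->reqs_lock);
    QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
    qemu_co_mutex_unlock(&bs->reqs_lock);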

> 
> Are tracked_requests different from other fields? A lot of other
> BlockDriverState
> fields are not protected by any mutex.. For example: total_sectors,
> file, backing..

Rules are documented in include/block/block_int.h.  It seems however
that never_freeze was blindly added at the end.

Paolo

> Could you give an example of parallel access to tracked_requests?
> 




Re: [PATCH] hw: add a number of SPI-flash's of m25p80 family

2020-08-18 Thread Cédric Le Goater
On 8/17/20 7:16 PM, i.kononenko wrote:
> No, the ext ID wasn't checked on real HW.
> I just copied it from the official U-Boot repository:
> https://github.com/u-boot/u-boot/blob/789bfb52668ee609b2043de645e2f94bbd24fd1f/drivers/mtd/spi/spi-nor-ids.c#L183


OK.

Reviewed-by: Cédric Le Goater 

> Do I need to take it from real HW and compare?
No. That's fine :)

Thanks,

C. 


> 
> On 12.08.2020 10:27, Cédric Le Goater wrote:
>> On 8/11/20 10:37 PM, Igor Kononenko wrote:
>>> Support a following SPI flashes:
>>> * mx66l51235f
>>> * mt25ql512ab
>>>
>>> Signed-off-by: Igor Kononenko 
>>> ---
>>>  hw/block/m25p80.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
>>> index 8227088441..bf1f833784 100644
>>> --- a/hw/block/m25p80.c
>>> +++ b/hw/block/m25p80.c
>>> @@ -219,6 +219,7 @@ static const FlashPartInfo known_devices[] = {
>>>  { INFO("mx25l12855e", 0xc22618,  0,  64 << 10, 256, 0) },
>>>  { INFO("mx25l25635e", 0xc22019,  0,  64 << 10, 512, 0) },
>>>  { INFO("mx25l25655e", 0xc22619,  0,  64 << 10, 512, 0) },
>>> +{ INFO("mx66l51235f", 0xc2201a,  0,  64 << 10, 1024, ER_4K | 
>>> ER_32K) },
>>>  { INFO("mx66u51235f", 0xc2253a,  0,  64 << 10, 1024, ER_4K | 
>>> ER_32K) },
>>>  { INFO("mx66u1g45g",  0xc2253b,  0,  64 << 10, 2048, ER_4K | 
>>> ER_32K) },
>>>  { INFO("mx66l1g45g",  0xc2201b,  0,  64 << 10, 2048, ER_4K | 
>>> ER_32K) },
>>> @@ -237,6 +238,7 @@ static const FlashPartInfo known_devices[] = {
>>>  { INFO("n25q128", 0x20ba18,  0,  64 << 10, 256, 0) },
>>>  { INFO("n25q256a",0x20ba19,  0,  64 << 10, 512, ER_4K) },
>>>  { INFO("n25q512a",0x20ba20,  0,  64 << 10, 1024, ER_4K) },
>>> +{ INFO("mt25ql512ab", 0x20ba20, 0x1044, 64 << 10, 1024, ER_4K | 
>>> ER_32K) },
>>
>> Have checked the extended ID on real HW ? 
>>
>> C. 
>>
>>>  { INFO_STACKED("n25q00",0x20ba21, 0x1000, 64 << 10, 2048, ER_4K, 
>>> 4) },
>>>  { INFO_STACKED("n25q00a",   0x20bb21, 0x1000, 64 << 10, 2048, ER_4K, 
>>> 4) },
>>>  { INFO_STACKED("mt25ql01g", 0x20ba21, 0x1040, 64 << 10, 2048, ER_4K, 
>>> 2) },
>>>
>>
>