Re: [Qemu-devel] [PATCH RFC 1/6] io: only allow return path for socket typed

2017-05-19 Thread Daniel P. Berrange
On Fri, May 19, 2017 at 02:43:27PM +0800, Peter Xu wrote:
> We don't really have a return path for the other types yet. Let's check
> this when .get_return_path() is called.
> 
> For this, we introduce a new feature bit, and set it up only for socket
> typed IO channels.
> 
> This will help detect earlier failure for postcopy, e.g., logically
> speaking postcopy cannot work with "exec:". Before this patch, when we
> try to migrate with "migrate -d exec:cat>out", we'll hang the system.
> With this patch, we'll get:
> 
> (qemu) migrate -d exec:cat>out
> Unable to open return-path for postcopy

This is wrong - post-copy migration *can* work with exec: - it just entirely
depends on what command you are running. Your example ran a command which is
unidirectional, but if you ran 'exec:socat ...' you would have a fully
bidirectional channel. Actually the channel is always bi-directional, but
'cat' simply won't ever send data back to QEMU.

If QEMU hangs when the other end doesn't send data back, that actually seems
like a potentially serious bug in migration code. Even if using the normal
'tcp' migration protocol, if the target QEMU server hangs and fails to
send data to QEMU on the return path, the source QEMU must never hang.
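
To illustrate the point that a silent peer must not block the source forever, here is a minimal sketch of a bounded wait on a file descriptor. It is not QEMU code: wait_readable() and its return convention are hypothetical, and only the standard poll(2) call is assumed.

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical helper, not part of QEMU: wait until fd becomes readable
 * or timeout_ms elapses.
 * Returns 1 if readable (or the peer hung up), 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret;

    do {
        /* A signal restarts the wait with the full timeout again. */
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0) {
        return -1;
    }
    if (ret == 0) {
        return 0;   /* peer never answered: report it instead of hanging */
    }
    return (pfd.revents & (POLLIN | POLLHUP)) ? 1 : -1;
}
```

A caller would treat a return value of 0 as "the return path is silent" and fail the migration cleanly, whatever the transport is.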

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-devel] [PATCH] usb: Deprecate HMP commands usb_add and usb_del

2017-05-19 Thread Dr. David Alan Gilbert
* Thomas Huth (th...@redhat.com) wrote:
> The commands 'device_add' and 'device_del' should be used
> nowadays instead.
> 
> Signed-off-by: Thomas Huth 

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  hmp-commands.hx | 6 ++++--
>  vl.c            | 6 ++++++
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index baeac47..e763606 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -676,7 +676,8 @@ ETEXI
>  STEXI
>  @item usb_add @var{devname}
>  @findex usb_add
> -Add the USB device @var{devname}.  For details of available devices see
> +Add the USB device @var{devname}. This command is deprecated, please
> +use @code{device_add} instead. For details of available devices see
>  @ref{usb_devices}
>  ETEXI
>  
> @@ -693,7 +694,8 @@ STEXI
>  @findex usb_del
>  Remove the USB device @var{devname} from the QEMU virtual USB
>  hub. @var{devname} has the syntax @code{bus.addr}. Use the monitor
> -command @code{info usb} to see the devices you can remove.
> +command @code{info usb} to see the devices you can remove. This
> +command is deprecated, please use @code{device_del} instead.
>  ETEXI
>  
>  {
> diff --git a/vl.c b/vl.c
> index 3ca6bd3..268a8d8 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1435,6 +1435,9 @@ static int usb_parse(const char *cmdline)
>  void hmp_usb_add(Monitor *mon, const QDict *qdict)
>  {
>  const char *devname = qdict_get_str(qdict, "devname");
> +
> +error_report("usb_add is deprecated, please use device_add instead");
> +
>  if (usb_device_add(devname) < 0) {
>  error_report("could not add USB device '%s'", devname);
>  }
> @@ -1443,6 +1446,9 @@ void hmp_usb_add(Monitor *mon, const QDict *qdict)
>  void hmp_usb_del(Monitor *mon, const QDict *qdict)
>  {
>  const char *devname = qdict_get_str(qdict, "devname");
> +
> +error_report("usb_del is deprecated, please use device_del instead");
> +
>  if (usb_device_del(devname) < 0) {
>  error_report("could not delete USB device '%s'", devname);
>  }
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



[Qemu-devel] [PULL 1/1] ui: egl-headless requires dmabuf support

2017-05-19 Thread Gerd Hoffmann
Reported-by: Thomas Huth 
Signed-off-by: Gerd Hoffmann 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20170517122744.3541-1-kra...@redhat.com
---
 vl.c | 4 ++--
 ui/Makefile.objs | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/vl.c b/vl.c
index 1512df6e9e..ca4a5d679c 100644
--- a/vl.c
+++ b/vl.c
@@ -2129,7 +2129,7 @@ static DisplayType select_display(const char *p)
 exit(1);
 }
 } else if (strstart(p, "egl-headless", )) {
-#ifdef CONFIG_OPENGL
+#ifdef CONFIG_OPENGL_DMABUF
 request_opengl = 1;
 display_opengl = 1;
 display = DT_EGL;
@@ -4668,7 +4668,7 @@ int main(int argc, char **argv, char **envp)
 qemu_spice_display_init();
 }
 
-#ifdef CONFIG_OPENGL
+#ifdef CONFIG_OPENGL_DMABUF
 if (display_type == DT_EGL) {
 egl_headless_init();
 }
diff --git a/ui/Makefile.objs b/ui/Makefile.objs
index aac6ae8bef..3369451285 100644
--- a/ui/Makefile.objs
+++ b/ui/Makefile.objs
@@ -33,7 +33,7 @@ common-obj-y += shader.o
 common-obj-y += console-gl.o
 common-obj-y += egl-helpers.o
 common-obj-y += egl-context.o
-common-obj-y += egl-headless.o
+common-obj-$(CONFIG_OPENGL_DMABUF) += egl-headless.o
 ifeq ($(CONFIG_GTK_GL),y)
 common-obj-$(CONFIG_GTK) += gtk-gl-area.o
 else
-- 
2.9.3




Re: [Qemu-devel] [PATCH 2/5] migration: Create block capability

2017-05-19 Thread Markus Armbruster
Juan Quintela  writes:

> Create one capability for block migration and one parameter for
> incremental block migration.
>
> Signed-off-by: Juan Quintela 
[...]
> diff --git a/include/migration/block.h b/include/migration/block.h
> index 41a1ac8..5225af9 100644
> --- a/include/migration/block.h
> +++ b/include/migration/block.h
> @@ -20,4 +20,6 @@ uint64_t blk_mig_bytes_transferred(void);
>  uint64_t blk_mig_bytes_remaining(void);
>  uint64_t blk_mig_bytes_total(void);
>  
> +void migrate_set_block_enabled(bool value, Error **errp);
> +
>  #endif /* MIGRATION_BLOCK_H */
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 49ec501..024a048 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -153,6 +153,9 @@ struct MigrationState
>  
>  /* The last error that occurred */
>  Error *error;
> +/* Do we have to clean up -b/-i from old migrate parameters */

The sentence is a question, so it should end with a '?'.

> +/* This feature is deprecated and will be removed */
> +bool must_remove_block_options;
>  };
>  
>  void migrate_set_state(int *state, int old_state, int new_state);
> @@ -265,6 +268,9 @@ bool migrate_colo_enabled(void);
>  
>  int64_t xbzrle_cache_resize(int64_t new_size);
>  
> +bool migrate_use_block(void);
> +bool migrate_use_block_incremental(void);
> +
>  bool migrate_use_compression(void);
>  int migrate_compress_level(void);
>  int migrate_compress_threads(void);
> diff --git a/migration/migration.c b/migration/migration.c
> index 0304c01..c13c0a2 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
[...]
> @@ -1207,6 +1242,24 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
>  return;
>  }
>  
> +if ((has_blk && blk) || (has_inc && inc)) {
> +if (migrate_use_block() || migrate_use_block_incremental()) {
> +error_setg(errp, "Command options are incompatible with "
> +   "current migration capabilities");
> +return;
> +}
> +migrate_set_block_enabled(true, &local_err);
> +if (local_err) {
> +error_propagate(errp, local_err);
> +return;
> +}
> +s->must_remove_block_options = true;
> +}
> +
> +if (has_inc && inc) {
> +migrate_set_block_incremental(s, true);
> +}
> +

Putting this within the previous conditional might be clearer.  Your
choice.
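
A sketch of what the suggested nesting might look like, written as a standalone function so it can be compiled on its own. The struct and function names are hypothetical stand-ins for the MigrationState fields and helpers in the patch; only the control flow is the point.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant migration state. */
struct block_opts {
    bool block_enabled;
    bool block_incremental;
    bool must_remove_block_options;
};

/* Returns 0 on success, -1 if the legacy -b/-i flags clash with
 * capabilities that are already set. */
static int apply_legacy_block_flags(struct block_opts *s,
                                    bool has_blk, bool blk,
                                    bool has_inc, bool inc)
{
    if ((has_blk && blk) || (has_inc && inc)) {
        if (s->block_enabled || s->block_incremental) {
            return -1;
        }
        s->block_enabled = true;
        s->must_remove_block_options = true;

        /* The incremental check lives inside the same conditional. */
        if (has_inc && inc) {
            s->block_incremental = true;
        }
    }
    return 0;
}
```

Behaviour is unchanged by the nesting; it only makes it visible that the incremental flag is never set without block migration being enabled first.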

>  s = migrate_init();
>  
>  if (strstart(uri, "tcp:", )) {
[...]

Only nitpicks, so
Reviewed-by: Markus Armbruster 



[Qemu-devel] [PATCH v1 03/13] qcow2: do not COW the empty areas

2017-05-19 Thread Anton Nefedov
If the COW area of the newly allocated cluster is all zeroes, there is no
reason to write zero sectors in perform_cow() again, now that whole clusters
are zeroed out in single chunks by handle_alloc_space().

Introduce a "reduced" field in the COW regions of QCowL2Meta, since the
existing fields (offset and nb_bytes) still have to keep other write
requests from writing to the area simultaneously.
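
The idea can be sketched outside QEMU as follows. The types and helpers here are simplified, hypothetical stand-ins (the real code uses Qcow2COWRegion and is_zero_sectors()); the sketch only shows how a "reduced" flag lets the copy be skipped while offset and nb_bytes stay intact.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a COW region; not the real QEMU type. */
struct cow_region {
    size_t offset;
    size_t nb_bytes;
    bool reduced;   /* true: area is known to be zero, the copy is skipped */
};

static bool all_zero(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Mark the region as reduced when its source data is entirely zero.
 * offset and nb_bytes are left untouched so the range stays reserved. */
static void mark_reduced(struct cow_region *r, const uint8_t *cluster)
{
    if (!r->reduced && r->nb_bytes != 0 &&
        all_zero(cluster + r->offset, r->nb_bytes)) {
        r->reduced = true;
    }
}

/* Number of bytes that actually have to be copied for this region. */
static size_t cow_bytes_to_copy(const struct cow_region *r)
{
    return (r->nb_bytes == 0 || r->reduced) ? 0 : r->nb_bytes;
}
```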

iotest 060:
a write to the discarded cluster does not trigger COW anymore, so break
on the write_aio event instead; that works for the test (but the write
won't fail anymore, so update the reference output)

iotest 066:
cluster-alignment areas that were not really COWed are now detected
as zeroes, hence the initial write has to be exactly the same size for
the maps to match

performance tests:

qemu-io,
  results in seconds to complete (less is better)
  random write 4k to empty image, no backing
HDD
  64k cluster
    128M over 128M image:   160 -> 160 ( x1    )
    128M over   2G image:    86 ->  84 ( x1    )
    128M over   8G image:    40 ->  29 ( x1.4  )
  1M cluster
     32M over   8G image:    58 ->  23 ( x2.5  )

SSD
  64k cluster
      2G over   2G image:    71 ->  38 ( x1.9  )
    512M over   8G image:    85 ->   8 ( x10.6 )
  1M cluster
    128M over  32G image:   314 ->   2 ( x157  )

  - the improvement grows with the cluster size,
  - the first data portions written to a fresh image benefit the most
    (more chance to hit an unallocated cluster),
  - the SSD improvement is close to the IO length reduction rate
    (e.g. writing only 4k instead of 64k gives a theoretical x16
    and a practical x10 improvement)
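
The "theoretical x16" figure is just the ratio of bytes written per request. As a tiny worked example (the function name is hypothetical, and it assumes cost is proportional to the number of bytes written):

```c
#include <stdint.h>

/* Upper bound on the speedup when only io_size bytes are written instead
 * of a whole cluster, assuming cost proportional to bytes written. */
static uint64_t io_length_reduction(uint64_t cluster_size, uint64_t io_size)
{
    return io_size ? cluster_size / io_size : 0;
}
```

For a 64k cluster and 4k writes this gives 16, and for a 1M cluster it gives 256, which is consistent with the measured SSD gains staying below those bounds.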

fio tests over xfs, empty image (cluster 64k), no backing,

  first megabytes of random writes:
randwrite 4k, size=8g:

  HDD (io_size=128m) :  730 ->  1050 IOPS ( x1.45)
  SSD (io_size=512m) : 1500 ->  7000 IOPS ( x4.7 )

  random writes io_size==image_size:
randwrite 4k, size=2g io_size=2g:
   HDD   : 200 IOPS (no difference)
   SSD   : 7500 ->  9500 IOPS ( x1.3 )

  sequential write:
seqwrite 4k, size=4g, iodepth=4
   SSD   : 7000 -> 18000 IOPS ( x2.6 )

  - numbers are similar to qemu-io tests, slightly less improvement
  (damped by fs?)

Signed-off-by: Anton Nefedov 
Signed-off-by: Denis V. Lunev 
---
 block/qcow2-cluster.c      |  4 +++-
 block/qcow2.c              | 23 +++++++++++++++++++++++
 block/qcow2.h              |  4 ++++
 tests/qemu-iotests/060     |  2 +-
 tests/qemu-iotests/060.out |  3 ++-
 tests/qemu-iotests/066     |  2 +-
 tests/qemu-iotests/066.out |  4 ++--
 7 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 347d94b..cf18dee 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -758,7 +758,7 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m, Qcow2COWRegion *r)
 BDRVQcow2State *s = bs->opaque;
 int ret;
 
-if (r->nb_bytes == 0) {
+if (r->nb_bytes == 0 || r->reduced) {
 return 0;
 }
 
@@ -1267,10 +1267,12 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
 .cow_start = {
 .offset = 0,
 .nb_bytes   = offset_into_cluster(s, guest_offset),
+.reduced= false,
 },
 .cow_end = {
 .offset = nb_bytes,
 .nb_bytes   = avail_bytes - nb_bytes,
+.reduced= false,
 },
 };
 qemu_co_queue_init(&(*m)->dependent_requests);
diff --git a/block/qcow2.c b/block/qcow2.c
index b885dfc..b438f22 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -64,6 +64,9 @@ typedef struct {
 #define  QCOW2_EXT_MAGIC_BACKING_FORMAT 0xE2792ACA
 #define  QCOW2_EXT_MAGIC_FEATURE_TABLE 0x6803f857
 
+static bool is_zero_sectors(BlockDriverState *bs, int64_t start,
+uint32_t count);
+
 static int qcow2_probe(const uint8_t *buf, int buf_size, const char *filename)
 {
 const QCowHeader *cow_header = (const void *)buf;
@@ -1575,6 +1578,25 @@ fail:
 return ret;
 }
 
+static void handle_cow_reduce(BlockDriverState *bs, QCowL2Meta *m)
+{
+if (bs->encrypted) {
+return;
+}
+if (!m->cow_start.reduced && m->cow_start.nb_bytes != 0 &&
+is_zero_sectors(bs,
+(m->offset + m->cow_start.offset) >> BDRV_SECTOR_BITS,
+m->cow_start.nb_bytes >> BDRV_SECTOR_BITS)) {
+m->cow_start.reduced = true;
+}
+if (!m->cow_end.reduced && m->cow_end.nb_bytes != 0 &&
+is_zero_sectors(bs,
+(m->offset + m->cow_end.offset) >> BDRV_SECTOR_BITS,
+m->cow_end.nb_bytes >> BDRV_SECTOR_BITS)) {
+m->cow_end.reduced = true;
+}
+}
+
 static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
 {
 BDRVQcow2State *s = bs->opaque;
@@ -1598,6 +1620,7 @@ static void handle_alloc_space(BlockDriverState 

[Qemu-devel] [PATCH v1 00/13] qcow2: space preallocation and COW improvements

2017-05-19 Thread Anton Nefedov
This patch series addresses a few performance problems of the qcow2 format:

  1. non cluster-aligned write requests (to unallocated clusters) explicitly
    pad data with zeroes if there is no backing data. This can be avoided:
    whole clusters can instead be preallocated and zeroed in a single
    efficient write_zeroes() operation, which also provides better host file
    continuity

  2. moreover, an efficient write_zeroes() operation can be used to
    preallocate space megabytes ahead, which gives a noticeable improvement
    on some storage types (e.g. distributed storage where the space
    allocation operation is expensive)

  3. preallocating/zeroing the clusters in advance makes it possible to
    enable simultaneous writes to the same unallocated cluster, which is
    beneficial for parallel sequential write operations which are not
    cluster-aligned
Performance test results are added to commit messages (see patch 3, 12)

Anton Nefedov (9):
  qcow2: is_zero_sectors(): return true if area is outside of backing
file
  qcow2: do not COW the empty areas
  qcow2: set inactive flag
  qcow2: handle_prealloc(): find out if area zeroed by earlier
preallocation
  qcow2: fix misleading comment about L2 linking
  qcow2-cluster: slightly refactor handle_dependencies()
  qcow2-cluster: make handle_dependencies() logic easier to follow
  qcow2: allow concurrent unaligned writes to the same clusters
  iotest 046: test simultaneous cluster write error case

Denis V. Lunev (3):
  qcow2: alloc space for COW in one chunk
  qcow2: preallocation at image expand
  qcow2: truncate preallocated space

Pavel Butsykin (1):
  qcow2: check space leak at the end of the image

 block/qcow2-cache.c|   3 +
 block/qcow2-cluster.c  | 216 +++-
 block/qcow2-refcount.c |  21 +++
 block/qcow2.c  | 286 -
 block/qcow2.h  |  26 
 tests/qemu-iotests/026.out | 104 ++
 tests/qemu-iotests/026.out.nocache | 104 ++
 tests/qemu-iotests/029.out |   5 +-
 tests/qemu-iotests/046 |  38 -
 tests/qemu-iotests/046.out |  23 +++
 tests/qemu-iotests/060 |   2 +-
 tests/qemu-iotests/060.out |  13 +-
 tests/qemu-iotests/061.out |   5 +-
 tests/qemu-iotests/066 |   2 +-
 tests/qemu-iotests/066.out |   9 +-
 tests/qemu-iotests/098.out |   7 +-
 tests/qemu-iotests/108.out |   5 +-
 tests/qemu-iotests/112.out |   5 +-
 tests/qemu-iotests/154.out |   4 +-
 19 files changed, 769 insertions(+), 109 deletions(-)

-- 
2.7.4




[Qemu-devel] [PATCH v1 04/13] qcow2: preallocation at image expand

2017-05-19 Thread Anton Nefedov
From: "Denis V. Lunev" 

This patch preallocates space when the image is expanded, to provide better
locality of the QCOW2 image file and to speed up expansion on some
distributed storage where it is slow.

Image expand requests have to be suspended until the allocation is
performed, which is done via a special QCowL2Meta.
This meta is invisible to the handle_dependencies() code.
This is the main reason for also calling preallocation before a metadata
write: it might intersect with preallocation triggered by another IO,
and has to yield.
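
The body of qcow2_handle_prealloc() is not visible in this excerpt, so the following is only a guess at the general shape of "allocate ahead on expand". All names and the policy itself are assumptions: track how far the host file is known to be allocated, and when a write crosses that point, compute a range to zero that extends a fixed amount past the write.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping, not the structure used by the patch. */
struct prealloc_state {
    uint64_t data_end;       /* host file is allocated up to this offset */
    uint64_t prealloc_size;  /* how far ahead of a write to allocate */
};

/* If [offset, offset + len) extends past data_end, report the range that
 * should be zeroed/allocated and advance data_end.
 * Returns true when an allocation is needed. */
static bool prealloc_range(struct prealloc_state *s,
                           uint64_t offset, uint64_t len,
                           uint64_t *zero_start, uint64_t *zero_len)
{
    uint64_t end = offset + len;

    if (end <= s->data_end) {
        return false;   /* already covered by an earlier preallocation */
    }
    *zero_start = s->data_end;
    *zero_len = end + s->prealloc_size - s->data_end;
    s->data_end = end + s->prealloc_size;
    return true;
}
```

With a policy like this, most writes that follow an expansion find their space already allocated, which is where the locality benefit would come from.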

Signed-off-by: Denis V. Lunev 
Signed-off-by: Anton Nefedov 
---
 block/qcow2-cache.c|   3 +
 block/qcow2-cluster.c  |   5 ++
 block/qcow2-refcount.c |  14 +
 block/qcow2.c  | 151 +
 block/qcow2.h  |   5 ++
 5 files changed, 178 insertions(+)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 1d25147..aa9da5f 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -204,6 +204,9 @@ static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
 return ret;
 }
 
+/* check and preallocate extra space if touching a fresh metadata cluster */
+qcow2_handle_prealloc(bs, c->entries[i].offset, s->cluster_size);
+
 if (c == s->refcount_block_cache) {
 BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_UPDATE_PART);
 } else if (c == s->l2_table_cache) {
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index cf18dee..a4b6d40 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -108,6 +108,9 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
 goto fail;
 }
 
+qcow2_handle_prealloc(bs, new_l1_table_offset,
+  QEMU_ALIGN_UP(new_l1_size2, s->cluster_size));
+
 BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_WRITE_TABLE);
 for(i = 0; i < s->l1_size; i++)
 new_l1_table[i] = cpu_to_be64(new_l1_table[i]);
@@ -1820,6 +1823,8 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
 goto fail;
 }
 
+qcow2_handle_prealloc(bs, offset, s->cluster_size);
+
 ret = bdrv_pwrite_zeroes(bs->file, offset, s->cluster_size, 0);
 if (ret < 0) {
 if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN) {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 7c06061..873a1d2 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -547,6 +547,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
 }
 
 /* Write refcount blocks to disk */
+qcow2_handle_prealloc(bs, meta_offset, blocks_clusters * s->cluster_size);
+
 BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_WRITE_BLOCKS);
 ret = bdrv_pwrite_sync(bs->file, meta_offset, new_blocks,
 blocks_clusters * s->cluster_size);
@@ -561,6 +563,10 @@ static int alloc_refcount_block(BlockDriverState *bs,
 cpu_to_be64s(&new_table[i]);
 }
 
+qcow2_handle_prealloc(bs, table_offset,
+  QEMU_ALIGN_UP(table_size * sizeof(uint64_t),
+s->cluster_size));
+
 BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_WRITE_TABLE);
 ret = bdrv_pwrite_sync(bs->file, table_offset, new_table,
 table_size * sizeof(uint64_t));
@@ -2104,6 +2110,8 @@ write_refblocks:
 goto fail;
 }
 
+qcow2_handle_prealloc(bs, refblock_offset, s->cluster_size);
+
 /* The size of *refcount_table is always cluster-aligned, therefore the
  * write operation will not overflow */
 on_disk_refblock = (void *)((char *) *refcount_table +
@@ -2158,6 +2166,8 @@ write_refblocks:
 }
 
 assert(reftable_size < INT_MAX / sizeof(uint64_t));
+qcow2_handle_prealloc(bs, reftable_offset,
+  reftable_size * sizeof(uint64_t));
 ret = bdrv_pwrite(bs->file, reftable_offset, on_disk_reftable,
   reftable_size * sizeof(uint64_t));
 if (ret < 0) {
@@ -2845,6 +2855,10 @@ int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
 cpu_to_be64s(&new_reftable[i]);
 }
 
+qcow2_handle_prealloc(bs, new_reftable_offset,
+  QEMU_ALIGN_UP(new_reftable_size * sizeof(uint64_t),
+s->cluster_size));
+
 ret = bdrv_pwrite(bs->file, new_reftable_offset, new_reftable,
   new_reftable_size * sizeof(uint64_t));
 
diff --git a/block/qcow2.c b/block/qcow2.c
index b438f22..6e7ce96 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -464,6 +464,11 @@ static QemuOptsList qcow2_runtime_opts = {
 .type = QEMU_OPT_NUMBER,
 .help = "Clean unused cache entries after this time (in seconds)",
 },
+{
+.name = QCOW2_OPT_PREALLOC_SIZE,
+.type = QEMU_OPT_SIZE,
+.help = 

[Qemu-devel] [PATCH v1 09/13] qcow2: fix misleading comment about L2 linking

2017-05-19 Thread Anton Nefedov
Signed-off-by: Anton Nefedov 
Signed-off-by: Denis V. Lunev 
---
 block/qcow2-cluster.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 25210cd..4204db9 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -827,12 +827,10 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
 
 assert(l2_index + m->nb_clusters <= s->l2_size);
 for (i = 0; i < m->nb_clusters; i++) {
-/* if two concurrent writes happen to the same unallocated cluster
- * each write allocates separate cluster and writes data concurrently.
- * The first one to complete updates l2 table with pointer to its
- * cluster the second one has to do RMW (which is done above by
- * perform_cow()), update l2 table with its cluster pointer and free
- * old cluster. This is what this loop does */
+/* handle_dependencies() protects from normal cluster allocation
+ * collision; still L2 entry might be !0 in case of zero or compressed
+ * cluster reusage or writing over the snapshot
+ */
 if (l2_table[l2_index + i] != 0) {
 old_cluster[j++] = l2_table[l2_index + i];
 }
-- 
2.7.4




Re: [Qemu-devel] [PATCH 3/6] vhost: Update rings information for IOTLB earlier

2017-05-19 Thread Maxime Coquelin



On 05/18/2017 05:24 PM, Michael S. Tsirkin wrote:

On Thu, May 18, 2017 at 04:45:23PM +0200, Maxime Coquelin wrote:

Hi Michael,

On 05/18/2017 09:35 AM, Maxime Coquelin wrote:



On 05/17/2017 06:41 PM, Michael S. Tsirkin wrote:

On Fri, May 12, 2017 at 01:21:18PM +0200, Maxime Coquelin wrote:


On 05/11/2017 07:33 PM, Michael S. Tsirkin wrote:

On Thu, May 11, 2017 at 02:32:43PM +0200, Maxime Coquelin wrote:

The vhost-kernel backend needs to receive IOTLB entries for ring
information early, but vhost-user needs the same information
even earlier, before VHOST_USER_SET_VRING_ADDR is sent.

Weird. What does VHOST_USER_SET_VRING_ADDR have to do with it?

According to "Starting and stopping rings" in the vhost-user spec,
vhost-user does not access anything until the ring is started and enabled.



This patch also triggers an IOTLB miss for all ring information
for robustness, even if in practice these addresses are on the
same page.

Actually, the DPDK vhost-user backend is compliant with the spec,
but when handling a VHOST_USER_SET_VRING_ADDR request, it translates the
guest addresses into backend VAs and checks that they are valid. I
will make the commit message clearer about this in the next revision.

The check could be done later, for example when the rings are started,
but it wouldn't change the need to trigger a miss at some point.

I think it should be done later, yes. As long as the ring is not
started, addresses should not be interpreted.



Ok, then I'll move these address translations to the
VHOST_USER_SET_VRING_KICK handler.

s/VHOST_USER_SET_VRING_KICK/VHOST_USER_SET_VRING_ENABLE/


Note that when protocol features are off, the ring is started in the
enabled state, but iommu requires protocol features.


OK, I will take care of this.

Note that currently in DPDK, the ring is created in the enabled state,
so it is enabled as soon as it is started, even with protocol features.
I have written a patch to fix this; it will be posted with the patch that
does the ring address translations only when starting/enabling the ring.

Also, note that disabling VHOST_USER_F_PROTOCOL_FEATURES with latest
DPDK and QEMU seems broken. I'll add this to my todo list to understand
where the problem is, but this is lower priority.


I just looked at implementing this change, but I'm not convinced this is
the right thing to do.

On the backend side, it means temporarily saving the vhost_vring_addr struct
in the vq struct, and moving everything that is currently done in the
SET_VRING_ADDR handler to the SET_VRING_ENABLE one.


Yes, and this is consistent with what the kernel does.
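
A rough sketch of the deferred scheme under discussion, with entirely hypothetical types (this is neither DPDK nor libvhost-user code): the SET_VRING_ADDR handler only stores the guest addresses, and translation plus validation happens when the ring is enabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical types for illustration only. */
struct ring_addrs { uint64_t desc, avail, used; };

struct vq_state {
    struct ring_addrs guest;  /* as received in SET_VRING_ADDR */
    struct ring_addrs host;   /* filled in when the ring is enabled */
    bool has_addrs;
    bool enabled;
};

/* Toy memory map: one contiguous guest region mapped at a host base. */
struct mem_region { uint64_t guest_base, size, host_base; };

static bool translate(const struct mem_region *m, uint64_t gpa, uint64_t *hva)
{
    if (gpa < m->guest_base || gpa - m->guest_base >= m->size) {
        return false;
    }
    *hva = m->host_base + (gpa - m->guest_base);
    return true;
}

/* SET_VRING_ADDR: store only, do not interpret the addresses yet. */
static void on_set_vring_addr(struct vq_state *vq, struct ring_addrs a)
{
    vq->guest = a;
    vq->has_addrs = true;
}

/* SET_VRING_ENABLE: translate and validate, then enable the ring. */
static bool on_set_vring_enable(struct vq_state *vq,
                                const struct mem_region *m)
{
    struct ring_addrs host;

    if (!vq->has_addrs ||
        !translate(m, vq->guest.desc, &host.desc) ||
        !translate(m, vq->guest.avail, &host.avail) ||
        !translate(m, vq->guest.used, &host.used)) {
        return false;
    }
    vq->host = host;
    vq->enabled = true;
    return true;
}
```

In this shape a temporarily invalid address is accepted at SET_VRING_ADDR time and only rejected if it is still invalid when the ring is enabled.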


My understanding of the "Starting and stopping rings" chapter of the
spec is that the ring must not be processed as long as not started and
enabled, not that the addresses passed should not be checked/translated
as it is done today both in DPDK and libvhost-user.

If the addresses are invalid, isn't it better to know as soon as
possible?

Cheers,
Maxime


There could be valid reasons to set an invalid address temporarily.
For example to make sure connection is reset.


Ok.

Thanks,
Maxime



[Qemu-devel] [Bug 1198350] Re: USB pass-through fails with USBDEVFS_DISCONNECT: Invalid argument

2017-05-19 Thread Thomas Huth
Triaging old bug tickets ... can you still reproduce this issue with the
latest version of QEMU (currently v2.9)?

** Changed in: qemu
   Status: New => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1198350

Title:
  USB pass-through fails with USBDEVFS_DISCONNECT: Invalid argument

Status in QEMU:
  Incomplete

Bug description:
  Host Gentoo linux 32bit
  Guest Windows XP SP3
  qemu 1.4.2 and
  a fresh git clone of qemu built 2013-07-04 (version 1.5.50)
  qemu command line

  qemu-system-i386 -enable-kvm localtime -m 2047 -boot d
  /archive3/qemu/WindowsXP.img -net nic,model=rtl8139 -net user -usb
  -device usb-ehci,id=ehci -usbdevice host:1493:19

  The device I am trying to use with the guest is an interface for the
  Suunto Ambit 2 GPS watch which has no linux support.

  When the USB device is plugged in qemu reports to the command line:

  USBDEVFS_DISCONNECT: Invalid argument
  Invalid argument

  dmesg shows

  [237755.495968] usb 2-1.5: new full-speed USB device number 34 using ehci-pci
  [237755.582778] usb 2-1.5: config 1 has an invalid interface number: 1 but max is 0
  [237755.582781] usb 2-1.5: config 1 has no interface number 0
  [237755.583628] usb 2-1.5: New USB device found, idVendor=1493, idProduct=0019
  [237755.583631] usb 2-1.5: New USB device strings: Mfr=1, Product=2, SerialNumber=3
  [237755.583633] usb 2-1.5: Product: Ambit
  [237755.583634] usb 2-1.5: Manufacturer: Suunto
  [237755.583636] usb 2-1.5: SerialNumber: CE8309511700
  [237756.584937] usb 2-1.5: reset full-speed USB device number 34 using ehci-pci
  [237756.832658] usb 2-1.5: reset full-speed USB device number 34 using ehci-pci
  [237757.143585] usb 2-1.5: usbfs: process 12684 (qemu-system-i38) did not claim interface 1 before use

  In the Windows guest's Device Manager an HID device is listed, but
  nothing else happens: no "Found New Hardware" dialog appears, and the
  Suunto software (which is sitting there waiting) is not triggered as
  it should be.

  I have tried successfully with several other devices (flash drive,
  mouse, printer and video capture device). Because this device pretends
  to be an HID device my kernel's hid-generic driver was picking it up
  first until I modified hid-core.c to ignore this vendorid/productid.
  But still no joy.

  I'm guessing it has something to do with the dmesg lines:

  [237755.582778] usb 2-1.5: config 1 has an invalid interface number: 1 but max is 0
  [237755.582781] usb 2-1.5: config 1 has no interface number 0

  But I have read that these warnings are not important, though I don't
  get them for other devices. Nor do I get:

  [237757.143585] usb 2-1.5: usbfs: process 12684 (qemu-system-i38) did
  not claim interface 1 before use

  I've done a lot of searching and I've run out of ideas. Any help would
  be great.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1198350/+subscriptions



[Qemu-devel] [Bug 1511887] Re: USB device 1.1 not correctly passedthru from Linux host to Windows guest

2017-05-19 Thread Thomas Huth
Triaging old bug tickets ... can you still reproduce this issue with the
latest version of QEMU (currently v2.9)?

** Changed in: qemu
   Status: New => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1511887

Title:
  USB device 1.1 not correctly passedthru from Linux host to Windows
  guest

Status in QEMU:
  Incomplete

Bug description:
  I have a USB digital oscilloscope which works great on a pure Windows
  machine but does not work on a virtualized one. I tried passing the
  device through from my Debian Jessie (64-bit) host to a Windows 7
  (32-bit) guest, but unfortunately it does not work very well. It looks
  like the device is passed through, so the Windows machine knows about
  the new device and loads the HID device driver for it, but the driver
  fails to start the device, and the error detail shown by Device
  Manager is "This device cannot start" (Code 10).

  Installed Qemu version: 2.1+dfsg-12+deb8u4 0

  USB device spec: Dynon Instruments ELAB-080, USB 1.1

  On linux host computer
  ---
  lsusb identify it as:
  Bus 003 Device 009: ID 13a3:0001 

  lsusb -t identify it as:
  /: Bus 03.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
  |__ Port 1: Dev 9, If 0, Class=Human Interface Device, Driver=usbhid, 12M

  This is how I started my Windows guest machine
  --
  kvm -cpu host \
  -m 2048MiB \
  -hda test.vdi \
  -ctrl-grab \
  -parallel /dev/parport0 \
  -usbdevice host:13a3:0001

  ...also instead of last line I tried this one:
  -device usb-host,vendorid=0x13a3,productid=0x0001

  Neither of them helped to handle my device properly inside the guest.
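
  The two forms tried above carry the same vendor and product IDs, both
  in hexadecimal. As a small illustration of that correspondence, here is
  a hypothetical helper (not part of QEMU) that rewrites a host:VVVV:PPPP
  spec into the usb-host property string used with -device:

```c
#include <stdio.h>

/* Hypothetical helper: parse "host:VVVV:PPPP" (hex IDs) and write the
 * matching "usb-host,vendorid=0x....,productid=0x...." string.
 * Returns 0 on success, -1 on a parse failure or if out is too small. */
static int usbdevice_to_device_arg(const char *spec, char *out,
                                   size_t out_len)
{
    unsigned int vendor, product;
    int n;

    if (sscanf(spec, "host:%x:%x", &vendor, &product) != 2) {
        return -1;
    }
    if (vendor > 0xffff || product > 0xffff) {
        return -1;   /* USB vendor/product IDs are 16 bits wide */
    }
    n = snprintf(out, out_len, "usb-host,vendorid=0x%04x,productid=0x%04x",
                 vendor, product);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

  Feeding it host:13a3:0001 yields the same property string as the
  second form above, so both options select the same device.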

  Only once did the Windows guest start the device properly, so that the
  oscilloscope software could identify the oscilloscope and measure for
  a while, but after roughly 5 seconds of measurement the device was
  disconnected from Windows and never worked again, even after several
  restarts of the guest and even after unplugging and replugging its USB
  cable and power cable.

  I searched for a solution or some clues to get it working, but none of
  my searching on the internet was successful. Because the device works
  on pure Windows but not on a virtualized one, I think there is a
  problem with handling USB devices that are not standard ones (such as
  sticks, keyboards, mice, etc.)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1511887/+subscriptions



[Qemu-devel] [Bug 685096] Re: USB Passthrough not working for Windows 7 guest

2017-05-19 Thread Thomas Huth
If I get the previous comments right, this is just about using the right
configuration, and not a real bug? If so, I assume we can close this
ticket nowadays?

** Changed in: qemu
   Status: Confirmed => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/685096

Title:
  USB Passthrough not working for Windows 7 guest

Status in QEMU:
  Incomplete
Status in qemu package in Ubuntu:
  Confirmed
Status in qemu package in Debian:
  New

Bug description:
  USB Passthrough from host to guest is not working for a 32-bit Windows
  7 guest, while it works perfectly for a 32-bit Windows XP guest.

  The device appears in the device manager of Windows 7, but with "Error
  code 10: device cannot start". I have tried this with numerous USB
  thumbdrives and a USB wireless NIC, all with the same result. The
  device name and functionality is recognized, so at least some USB
  negotiation is taking place.

  I am trying this with the latest git-pull of QEMU-KVM.

  The command line to launch qemu-kvm for win7 is:
  sudo /home/user/local_install/bin/qemu-system-x86_64 -cpu core2duo -m 1024 -smp 2 -vga std -hda ./disk_images/win7.qcow -vnc :1 -boot c -usb -usbdevice tablet -usbdevice host:0781:5150

  The command line to launch qemu-kvm for winxp is:
  sudo /home/user/local_install/bin/qemu-system-x86_64 -cpu core2duo -m 1024 -smp 2 -usb -vga std -hda ./winxpsp3.qcow -vnc :0 -boot c -usbdevice tablet -usbdevice host:0781:5150

  Any help is appreciated.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/685096/+subscriptions



[Qemu-devel] [PATCH 00/21] s390x changes

2017-05-19 Thread Cornelia Huck
Hi,

here's a fairly large s390x update for which I plan to send a pull
request next week.

The biggest part is the introduction of the new vfio-ccw support
to pass through ccw devices (the kernel part has been merged as of
4.12-rc1). There are also some changes to allow the ccw bios to boot
from scsi generic devices, and a bugfix for initial reset.

Christian Borntraeger (1):
  s390/kvm: do not reset riccb on initial cpu reset

Cornelia Huck (1):
  linux-headers: update

Dong Jia Shi (6):
  s390x/css: realize css_create_sch
  s390x/css: device support for s390-ccw passthrough
  vfio/ccw: get io region info
  vfio/ccw: get irqs info and set the eventfd fd
  vfio/ccw: update sense data if a unit check is pending
  MAINTAINERS: Add vfio-ccw maintainer

Eric Farman (8):
  pc-bios/s390-ccw: Remove duplicate blk_factor adjustment
  pc-bios/s390-ccw: Move SCSI block factor to outer read
  pc-bios/s390-ccw: Break up virtio-scsi read into multiples
  pc-bios/s390-ccw: Refactor scsi_inquiry function
  pc-bios/s390-ccw: Get list of supported VPD pages
  pc-bios/s390-ccw: Get Block Limits VPD device data
  pc-bios/s390-ccw: Build a reasonable max_sectors limit
  pc-bios/s390-ccw.img: rebuild image

Xiao Feng Ren (5):
  s390x/css: add s390-squash-mcss machine option
  s390x/css: realize css_sch_build_schib
  vfio/ccw: vfio based subchannel passthrough driver
  s390x/css: introduce and realize ccw-request callback
  s390x/css: ccw translation infrastructure

 MAINTAINERS|   8 +
 default-configs/s390x-softmmu.mak  |   1 +
 hw/s390x/3270-ccw.c|   6 +-
 hw/s390x/Makefile.objs |   1 +
 hw/s390x/css-bridge.c  |   2 +
 hw/s390x/css.c | 290 +-
 hw/s390x/s390-ccw.c| 153 
 hw/s390x/s390-virtio-ccw.c |  32 +-
 hw/s390x/virtio-ccw.c  |   7 +-
 hw/vfio/Makefile.objs  |   1 +
 hw/vfio/ccw.c  | 434 +
 include/hw/s390x/css-bridge.h  |   1 +
 include/hw/s390x/css.h |  67 ++--
 include/hw/s390x/s390-ccw.h|  39 ++
 include/hw/s390x/s390-virtio-ccw.h |   1 +
 include/hw/vfio/vfio-common.h  |   1 +
 include/standard-headers/asm-x86/hyperv.h  |   7 +-
 include/standard-headers/linux/input-event-codes.h |   1 +
 include/standard-headers/linux/input.h |  11 +-
 include/standard-headers/linux/pci_regs.h  |   3 +-
 linux-headers/asm-arm/kvm.h|  10 +-
 linux-headers/asm-arm/unistd-common.h  |   1 +
 linux-headers/asm-arm64/kvm.h  |  10 +-
 linux-headers/asm-powerpc/kvm.h|   3 +
 linux-headers/asm-powerpc/unistd.h |   1 +
 linux-headers/asm-s390/kvm.h   |  29 +-
 linux-headers/asm-s390/unistd.h|   4 +-
 linux-headers/asm-x86/kvm.h|   3 +
 linux-headers/asm-x86/unistd_32.h  |   2 +
 linux-headers/asm-x86/unistd_64.h  |   1 +
 linux-headers/asm-x86/unistd_x32.h |   1 +
 linux-headers/linux/kvm.h  |  25 ++
 linux-headers/linux/userfaultfd.h  |  11 +-
 linux-headers/linux/vfio.h |  18 +
 linux-headers/linux/vfio_ccw.h |  24 ++
 pc-bios/s390-ccw.img   | Bin 26472 -> 26480 bytes
 pc-bios/s390-ccw/s390-ccw.h|   7 +
 pc-bios/s390-ccw/scsi.h|  30 ++
 pc-bios/s390-ccw/virtio-scsi.c |  85 +++-
 pc-bios/s390-ccw/virtio-scsi.h |   2 +
 pc-bios/s390-ccw/virtio.h  |   1 +
 qemu-options.hx|   6 +-
 scripts/update-linux-headers.sh|   2 +-
 target/s390x/cpu.c |   7 +-
 target/s390x/cpu.h |  16 +-
 target/s390x/ioinst.c  |   9 +
 46 files changed, 1293 insertions(+), 81 deletions(-)
 create mode 100644 hw/s390x/s390-ccw.c
 create mode 100644 hw/vfio/ccw.c
 create mode 100644 include/hw/s390x/s390-ccw.h
 create mode 100644 linux-headers/linux/vfio_ccw.h

-- 
2.13.0




[Qemu-devel] [PATCH 02/21] pc-bios/s390-ccw: Move SCSI block factor to outer read

2017-05-19 Thread Cornelia Huck
From: Eric Farman 

Simple refactoring so that the blk_factor adjustment is
moved into the virtio_scsi_read_many routine, in preparation
for another change.

Signed-off-by: Eric Farman 
Message-Id: <20170510155359.32727-3-far...@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck 
---
 pc-bios/s390-ccw/virtio-scsi.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/pc-bios/s390-ccw/virtio-scsi.c b/pc-bios/s390-ccw/virtio-scsi.c
index 69b7a93b29..6d070e2f73 100644
--- a/pc-bios/s390-ccw/virtio-scsi.c
+++ b/pc-bios/s390-ccw/virtio-scsi.c
@@ -142,14 +142,13 @@ static bool scsi_report_luns(VDev *vdev, void *data, uint32_t data_size)
 }
 
 static bool scsi_read_10(VDev *vdev,
- ulong sector, int sectors, void *data)
+ ulong sector, int sectors, void *data,
+ unsigned int data_size)
 {
-int f = vdev->blk_factor;
-unsigned int data_size = sectors * virtio_get_block_size() * f;
 ScsiCdbRead10 cdb = {
 .command = 0x28,
-.lba = sector * f,
-.xfer_length = sectors * f,
+.lba = sector,
+.xfer_length = sectors,
 };
 VirtioCmd read_10[] = {
 { &req, sizeof(req), VRING_DESC_F_NEXT },
@@ -255,7 +254,10 @@ static void virtio_scsi_locate_device(VDev *vdev)
 int virtio_scsi_read_many(VDev *vdev,
   ulong sector, void *load_addr, int sec_num)
 {
-if (!scsi_read_10(vdev, sector, sec_num, load_addr)) {
+int f = vdev->blk_factor;
+unsigned int data_size = sec_num * virtio_get_block_size() * f;
+
+if (!scsi_read_10(vdev, sector * f, sec_num * f, load_addr, data_size)) {
 virtio_scsi_verify_response(, "virtio-scsi:read_many");
 }
 
-- 
2.13.0
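
As an illustration of the refactoring above (not part of the patch): the
caller now applies blk_factor once and hands already-scaled values to the
read helper. A minimal sketch, assuming a hypothetical DemoRequest struct
and scale_request() helper that do not exist in the s390-ccw bios:

```c
#include <assert.h>

/* Hypothetical container for the values handed to the READ(10) helper. */
typedef struct DemoRequest {
    unsigned long lba;       /* first device block to read */
    int xfer_length;         /* number of device blocks to read */
    unsigned int data_size;  /* number of bytes expected back */
} DemoRequest;

/*
 * Scale a request expressed in virtio blocks into device blocks,
 * mirroring the arithmetic the patch moves into the caller.
 */
static DemoRequest scale_request(unsigned long sector, int sec_num,
                                 int blk_factor, unsigned int block_size)
{
    DemoRequest r;

    assert(blk_factor > 0 && sec_num > 0);
    r.lba = sector * (unsigned long)blk_factor;
    r.xfer_length = sec_num * blk_factor;
    r.data_size = (unsigned int)sec_num * block_size
                  * (unsigned int)blk_factor;
    return r;
}
```

With the scaling done in one place, the read helper itself no longer needs
to know about blk_factor.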




[Qemu-devel] [PATCH 03/21] pc-bios/s390-ccw: Break up virtio-scsi read into multiples

2017-05-19 Thread Cornelia Huck
From: Eric Farman 

A virtio-scsi request that goes through the host sd driver and exceeds
the maximum transfer size is automatically broken up for us.  But the
equivalent request going to the sg driver presumes that any length
requirements have already been honored.

Let's use the max_sectors field on the virtio-scsi controller device,
and break up all requests (both sd and sg) to avoid this problem.

Signed-off-by: Eric Farman 
Message-Id: <20170510155359.32727-4-far...@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck 
---
 pc-bios/s390-ccw/s390-ccw.h|  7 +++
 pc-bios/s390-ccw/virtio-scsi.c | 20 +++-
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/pc-bios/s390-ccw/s390-ccw.h b/pc-bios/s390-ccw/s390-ccw.h
index 07d8cbcb20..2089274842 100644
--- a/pc-bios/s390-ccw/s390-ccw.h
+++ b/pc-bios/s390-ccw/s390-ccw.h
@@ -42,6 +42,13 @@ typedef unsigned long long __u64;
 #ifndef NULL
 #define NULL0
 #endif
+#ifndef MIN
+#define MIN(a, b) (((a) < (b)) ? (a) : (b))
+#endif
+#ifndef MIN_NON_ZERO
+#define MIN_NON_ZERO(a, b) ((a) == 0 ? (b) : \
+((b) == 0 ? (a) : (MIN(a, b))))
+#endif
 
 #include "cio.h"
 #include "iplb.h"
diff --git a/pc-bios/s390-ccw/virtio-scsi.c b/pc-bios/s390-ccw/virtio-scsi.c
index 6d070e2f73..ff65e2ee30 100644
--- a/pc-bios/s390-ccw/virtio-scsi.c
+++ b/pc-bios/s390-ccw/virtio-scsi.c
@@ -202,6 +202,7 @@ static void virtio_scsi_locate_device(VDev *vdev)
 debug_print_int("config.scsi.max_channel", vdev->config.scsi.max_channel);
 debug_print_int("config.scsi.max_target ", vdev->config.scsi.max_target);
 debug_print_int("config.scsi.max_lun", vdev->config.scsi.max_lun);
+debug_print_int("config.scsi.max_sectors", vdev->config.scsi.max_sectors);
 
 if (vdev->scsi_device_selected) {
 sdev->channel = vdev->selected_scsi_device.channel;
@@ -254,12 +255,21 @@ static void virtio_scsi_locate_device(VDev *vdev)
 int virtio_scsi_read_many(VDev *vdev,
   ulong sector, void *load_addr, int sec_num)
 {
+int sector_count;
 int f = vdev->blk_factor;
-unsigned int data_size = sec_num * virtio_get_block_size() * f;
-
-if (!scsi_read_10(vdev, sector * f, sec_num * f, load_addr, data_size)) {
-virtio_scsi_verify_response(, "virtio-scsi:read_many");
-}
+unsigned int data_size;
+
+do {
+sector_count = MIN_NON_ZERO(sec_num, vdev->config.scsi.max_sectors);
+data_size = sector_count * virtio_get_block_size() * f;
+if (!scsi_read_10(vdev, sector * f, sector_count * f, load_addr,
+  data_size)) {
+virtio_scsi_verify_response(, "virtio-scsi:read_many");
+}
+load_addr += data_size;
+sector += sector_count;
+sec_num -= sector_count;
+} while (sec_num > 0);
 
 return 0;
 }
-- 
2.13.0
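
The splitting loop described above can be sketched in isolation. This is
only an illustration: fake_read() and read_in_chunks() are hypothetical
stand-ins for scsi_read_10() and virtio_scsi_read_many(), and the sketch
assumes callers never request zero sectors.

```c
#include <assert.h>

/* Same semantics as the macros added by the patch: 0 means "no limit". */
#define MIN(a, b) (((a) < (b)) ? (a) : (b))
#define MIN_NON_ZERO(a, b) ((a) == 0 ? (b) : \
                            ((b) == 0 ? (a) : (MIN(a, b))))

#define MAX_CALLS 16
static int chunk_log[MAX_CALLS];  /* sector count of each request issued */
static int chunk_calls;

/* Hypothetical stand-in for the real read: only records the request. */
static void fake_read(unsigned long sector, int count)
{
    (void)sector;
    assert(chunk_calls < MAX_CALLS);
    chunk_log[chunk_calls++] = count;
}

/*
 * Issue reads of at most max_sectors each until sec_num sectors are
 * covered. Returns the number of requests that were issued.
 */
static int read_in_chunks(unsigned long sector, int sec_num, int max_sectors)
{
    assert(sec_num > 0);
    chunk_calls = 0;
    do {
        int n = MIN_NON_ZERO(sec_num, max_sectors);

        fake_read(sector, n);
        sector += (unsigned long)n;
        sec_num -= n;
    } while (sec_num > 0);
    return chunk_calls;
}
```

A 10-sector read with a limit of 4 is issued as 4 + 4 + 2, while a limit
of 0 (nothing advertised) leaves the request in one piece.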




[Qemu-devel] [PATCH 07/21] pc-bios/s390-ccw: Build a reasonable max_sectors limit

2017-05-19 Thread Cornelia Huck
From: Eric Farman 

Now that we've read all the possible limits that have been defined for
a virtio-scsi controller and the disk we're booting from, it's possible
that we are STILL going to exceed the limits of the host device.
For example, a "-device scsi-generic" device does not support the
Block Limits VPD page.

So, let's fall back to something that seems to work for most boot
configurations if larger values were specified (including if nothing
was explicitly specified, and we took default values).

Signed-off-by: Eric Farman 
Message-Id: <20170510155359.32727-8-far...@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck 
---
 pc-bios/s390-ccw/virtio-scsi.c | 9 +
 pc-bios/s390-ccw/virtio-scsi.h | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/pc-bios/s390-ccw/virtio-scsi.c b/pc-bios/s390-ccw/virtio-scsi.c
index b722f25ad7..f61ecf0205 100644
--- a/pc-bios/s390-ccw/virtio-scsi.c
+++ b/pc-bios/s390-ccw/virtio-scsi.c
@@ -399,6 +399,15 @@ void virtio_scsi_setup(VDev *vdev)
 vdev->max_transfer = evpd_bl->max_transfer;
 }
 
+/*
+ * The host sg driver will often be unhappy with particularly large
+ * I/Os that exceed the block iovec limits.  Let's enforce something
+ * reasonable, despite what the device configuration tells us.
+ */
+
+vdev->max_transfer = MIN_NON_ZERO(VIRTIO_SCSI_MAX_SECTORS,
+  vdev->max_transfer);
+
 if (!scsi_read_capacity(vdev, data, data_size)) {
 virtio_scsi_verify_response(, "virtio-scsi:setup:read_capacity");
 }
diff --git a/pc-bios/s390-ccw/virtio-scsi.h b/pc-bios/s390-ccw/virtio-scsi.h
index f50b38b18b..4c4f4bbc31 100644
--- a/pc-bios/s390-ccw/virtio-scsi.h
+++ b/pc-bios/s390-ccw/virtio-scsi.h
@@ -19,6 +19,8 @@
 #define VIRTIO_SCSI_CDB_SIZE   SCSI_DEFAULT_CDB_SIZE
 #define VIRTIO_SCSI_SENSE_SIZE SCSI_DEFAULT_SENSE_SIZE
 
+#define VIRTIO_SCSI_MAX_SECTORS 2048
+
 /* command-specific response values */
 #define VIRTIO_SCSI_S_OK 0x00
 #define VIRTIO_SCSI_S_BAD_TARGET 0x03
-- 
2.13.0
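
To make the three cases from the commit message concrete, here is a small
sketch of the clamp. clamp_max_transfer() is a hypothetical helper, not a
function in the bios; the 2048 limit and the macros are taken from the
patches in this series.

```c
#include <assert.h>
#include <stdint.h>

#define MIN(a, b) (((a) < (b)) ? (a) : (b))
#define MIN_NON_ZERO(a, b) ((a) == 0 ? (b) : \
                            ((b) == 0 ? (a) : (MIN(a, b))))

#define VIRTIO_SCSI_MAX_SECTORS 2048

/*
 * reported is whatever limit was collected so far; 0 means that neither
 * the controller nor the disk specified one.
 */
static uint32_t clamp_max_transfer(uint32_t reported)
{
    return MIN_NON_ZERO((uint32_t)VIRTIO_SCSI_MAX_SECTORS, reported);
}
```

A smaller reported limit is kept, a larger one is cut down to 2048, and an
unspecified one (0) becomes 2048.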




[Qemu-devel] [PATCH 21/21] s390/kvm: do not reset riccb on initial cpu reset

2017-05-19 Thread Cornelia Huck
From: Christian Borntraeger 

The riccb is kept unchanged during initial cpu reset. Move the data
structure to the other registers that are unchanged.

Signed-off-by: Christian Borntraeger 
Signed-off-by: Cornelia Huck 
---
 target/s390x/cpu.c | 7 ---
 target/s390x/cpu.h | 6 --
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/target/s390x/cpu.c b/target/s390x/cpu.c
index a1bf2ba5a7..a69005d9b5 100644
--- a/target/s390x/cpu.c
+++ b/target/s390x/cpu.c
@@ -92,9 +92,10 @@ static void s390_cpu_initial_reset(CPUState *s)
 int i;
 
 s390_cpu_reset(s);
-/* initial reset does not touch regs,fregs and aregs */
-memset(&env->fpc, 0, offsetof(CPUS390XState, end_reset_fields) -
- offsetof(CPUS390XState, fpc));
+/* initial reset does not clear everything! */
+memset(&env->start_initial_reset_fields, 0,
+offsetof(CPUS390XState, end_reset_fields) -
+offsetof(CPUS390XState, start_initial_reset_fields));
 
 /* architectured initial values for CR 0 and 14 */
 env->cregs[0] = CR0_RESET;
diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index e27d9d874a..c74b4193ee 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -88,6 +88,10 @@ typedef struct CPUS390XState {
  */
 CPU_DoubleU vregs[32][2];  /* vector registers */
 uint32_t aregs[16];/* access registers */
+uint8_t riccb[64]; /* runtime instrumentation control */
+
+/* Fields up to this point are not cleared by initial CPU reset */
+struct {} start_initial_reset_fields;
 
 uint32_t fpc;  /* floating-point control register */
 uint32_t cc_op;
@@ -137,8 +141,6 @@ typedef struct CPUS390XState {
 uint64_t gbea;
 uint64_t pp;
 
-uint8_t riccb[64];
-
 /* Fields up to this point are cleared by a CPU reset */
 struct {} end_reset_fields;
 
-- 
2.13.0
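
For readers unfamiliar with the pattern, a rough sketch of clearing only a
contiguous range of a state structure follows. DemoState is a hypothetical,
much smaller stand-in for CPUS390XState. The patch uses empty marker
structs (a GNU C extension) as range boundaries; this sketch uses the
offsets of ordinary fields instead so that it compiles as standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct DemoState {
    uint32_t aregs[4];   /* preserved across initial reset */
    uint8_t riccb[8];    /* preserved across initial reset */

    /* fields from here on are cleared by initial reset ... */
    uint32_t fpc;
    uint64_t psw_addr;

    /* ... up to, but not including, this field */
    uint32_t cpu_num;
} DemoState;

static void demo_initial_reset(DemoState *env)
{
    size_t start = offsetof(DemoState, fpc);
    size_t end = offsetof(DemoState, cpu_num);

    assert(end > start);
    /* Zero everything in [start, end), including any padding bytes. */
    memset((uint8_t *)env + start, 0, end - start);
}
```

Moving a field such as riccb above the start of the range is then enough
to exclude it from the reset, without touching the memset itself.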




[Qemu-devel] [PATCH 10/21] s390x/css: add s390-squash-mcss machine option

2017-05-19 Thread Cornelia Huck
From: Xiao Feng Ren 

We want to support real (i.e. not virtual) channel devices
even for guests that do not support MCSS-E (where guests may
see devices from any channel subsystem image at once). As all
virtio-ccw devices are in css 0xfe (and show up in the default
css 0 for guests not activating MCSS-E), we need an option to
squash both the virtio subchannels and e.g. passed-through
subchannels from their real css (0-3, or 0 for hosts not
activating MCSS-E) into the default css. This will be
exploited in a later patch.

Signed-off-by: Xiao Feng Ren 
Signed-off-by: Dong Jia Shi 
Message-Id: <20170517004813.58227-4-bjsdj...@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck 
---
 hw/s390x/s390-virtio-ccw.c | 21 +
 include/hw/s390x/s390-virtio-ccw.h |  1 +
 qemu-options.hx|  6 +-
 target/s390x/cpu.h | 10 ++
 4 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index fdd4384ff0..cd007ca8cf 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -303,6 +303,20 @@ static void machine_set_loadparm(Object *obj, const char *val, Error **errp)
 ms->loadparm[i] = ' '; /* pad right with spaces */
 }
 }
+static inline bool machine_get_squash_mcss(Object *obj, Error **errp)
+{
+S390CcwMachineState *ms = S390_CCW_MACHINE(obj);
+
+return ms->s390_squash_mcss;
+}
+
+static inline void machine_set_squash_mcss(Object *obj, bool value,
+   Error **errp)
+{
+S390CcwMachineState *ms = S390_CCW_MACHINE(obj);
+
+ms->s390_squash_mcss = value;
+}
 
 static inline void s390_machine_initfn(Object *obj)
 {
@@ -328,6 +342,13 @@ static inline void s390_machine_initfn(Object *obj)
 " to upper case) to pass to machine loader, boot manager,"
 " and guest kernel",
 NULL);
+object_property_add_bool(obj, "s390-squash-mcss",
+ machine_get_squash_mcss,
+ machine_set_squash_mcss, NULL);
+object_property_set_description(obj, "s390-squash-mcss",
+"enable/disable squashing subchannels into the default css",
+NULL);
+object_property_set_bool(obj, false, "s390-squash-mcss", NULL);
 }
 
 static const TypeInfo ccw_machine_info = {
diff --git a/include/hw/s390x/s390-virtio-ccw.h b/include/hw/s390x/s390-virtio-ccw.h
index 7b8a3e4d74..3027555f6d 100644
--- a/include/hw/s390x/s390-virtio-ccw.h
+++ b/include/hw/s390x/s390-virtio-ccw.h
@@ -29,6 +29,7 @@ typedef struct S390CcwMachineState {
 bool aes_key_wrap;
 bool dea_key_wrap;
 uint8_t loadparm[8];
+bool s390_squash_mcss;
 } S390CcwMachineState;
 
 typedef struct S390CcwMachineClass {
diff --git a/qemu-options.hx b/qemu-options.hx
index f07a310eb1..1e5382c1e1 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -42,7 +42,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
 "dea-key-wrap=on|off controls support for DEA key wrapping (default=on)\n"
 "suppress-vmdesc=on|off disables self-describing migration (default=off)\n"
 "nvdimm=on|off controls NVDIMM support (default=off)\n"
-"enforce-config-section=on|off enforce configuration section migration (default=off)\n",
+"enforce-config-section=on|off enforce configuration section migration (default=off)\n"
+"s390-squash-mcss=on|off controls support for squashing into default css (default=off)\n",
 QEMU_ARCH_ALL)
 STEXI
 @item -machine [type=]@var{name}[,prop=@var{value}[,...]]
@@ -81,6 +82,9 @@ controls whether DEA wrapping keys will be created to allow
 execution of DEA cryptographic functions.  The default is on.
 @item nvdimm=on|off
 Enables or disables NVDIMM support. The default is off.
+@item s390-squash-mcss=on|off
+Enables or disables squashing subchannels into the default css.
+The default is off.
 @end table
 ETEXI
 
diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index 240b8a5c22..e27d9d874a 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -1256,6 +1256,16 @@ static inline void s390_crypto_reset(void)
 }
 }
 
+static inline bool s390_get_squash_mcss(void)
+{
+if (object_property_get_bool(OBJECT(qdev_get_machine()), "s390-squash-mcss",
+ NULL)) {
+return true;
+}
+
+return false;
+}
+
 /* machine check interruption code */
 
 /* subclasses */
-- 
2.13.0




[Qemu-devel] [PATCH 19/21] vfio/ccw: update sense data if a unit check is pending

2017-05-19 Thread Cornelia Huck
From: Dong Jia Shi 

Concurrent-sense data is currently not delivered. This patch stores
the concurrent-sense data to the subchannel if a unit check is pending
and the concurrent-sense bit is enabled. Then a TSCH can retrieve the
right IRB data back to the guest.

Acked-by: Alex Williamson 
Signed-off-by: Dong Jia Shi 
Message-Id: <20170517004813.58227-13-bjsdj...@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck 
---
 hw/vfio/ccw.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 007ce435f1..12d0262336 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -94,6 +94,7 @@ static void vfio_ccw_io_notifier_handler(void *opaque)
 CcwDevice *ccw_dev = CCW_DEVICE(cdev);
 SubchDev *sch = ccw_dev->sch;
 SCSW *s = &sch->curr_status.scsw;
+PMCW *p = &sch->curr_status.pmcw;
 IRB irb;
 int size;
 
@@ -143,6 +144,12 @@ static void vfio_ccw_io_notifier_handler(void *opaque)
 /* Update control block via irb. */
 copy_scsw_to_guest(s, );
 
+/* If a unit check is pending, copy sense data. */
+if ((s->dstat & SCSW_DSTAT_UNIT_CHECK) &&
+(p->chars & PMCW_CHARS_MASK_CSENSE)) {
+memcpy(sch->sense_data, irb.ecw, sizeof(irb.ecw));
+}
+
 read_err:
 css_inject_io_interrupt(sch);
 }
-- 
2.13.0
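
The condition added by this patch can be sketched on its own. Everything
prefixed DEMO_ or Demo below is hypothetical: the bit values are
placeholders chosen for illustration, and the real SCSW/PMCW definitions
live in QEMU's css headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder bit values, for illustration only. */
#define DEMO_DSTAT_UNIT_CHECK 0x02u
#define DEMO_CHARS_CSENSE     0x01u

typedef struct DemoSubch {
    uint8_t dstat;           /* device status from the SCSW */
    uint32_t chars;          /* characteristics from the PMCW */
    uint8_t sense_data[32];  /* where concurrent-sense data is kept */
} DemoSubch;

/*
 * Copy the extended control words into the subchannel's sense buffer
 * only if a unit check is pending AND concurrent sense is enabled.
 * Returns true if the data was stored.
 */
static bool demo_store_sense(DemoSubch *sch, const uint8_t *ecw, size_t len)
{
    if ((sch->dstat & DEMO_DSTAT_UNIT_CHECK) &&
        (sch->chars & DEMO_CHARS_CSENSE)) {
        assert(len <= sizeof(sch->sense_data));
        memcpy(sch->sense_data, ecw, len);
        return true;
    }
    return false;
}
```

Both bits have to be set; a unit check without concurrent sense enabled
leaves the stored sense data untouched.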




[Qemu-devel] [PULL 01/20] mc146818rtc: update periodic timer only if it is needed

2017-05-19 Thread Paolo Bonzini
From: Xiao Guangrong 

Currently, the timer is updated whenever RegA or RegB is written,
even if the periodic-timer-related configuration has not changed.

This patch optimizes this slightly so that the update happens only
if the period or enable status has changed. Later patches also
depend on this optimization.

Signed-off-by: Xiao Guangrong 
Message-Id: <20170510083259.3900-2-xiaoguangr...@tencent.com>
Signed-off-by: Paolo Bonzini 
---
 hw/timer/mc146818rtc.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/hw/timer/mc146818rtc.c b/hw/timer/mc146818rtc.c
index 93de3e1..7d78391 100644
--- a/hw/timer/mc146818rtc.c
+++ b/hw/timer/mc146818rtc.c
@@ -391,6 +391,7 @@ static void cmos_ioport_write(void *opaque, hwaddr addr,
   uint64_t data, unsigned size)
 {
 RTCState *s = opaque;
+bool update_periodic_timer;
 
 if ((addr & 1) == 0) {
 s->cmos_index = data & 0x7f;
@@ -423,6 +424,8 @@ static void cmos_ioport_write(void *opaque, hwaddr addr,
 }
 break;
 case RTC_REG_A:
+update_periodic_timer = (s->cmos_data[RTC_REG_A] ^ data) & 0x0f;
+
 if ((data & 0x60) == 0x60) {
 if (rtc_running(s)) {
 rtc_update_time(s);
@@ -445,10 +448,17 @@ static void cmos_ioport_write(void *opaque, hwaddr addr,
 /* UIP bit is read only */
 s->cmos_data[RTC_REG_A] = (data & ~REG_A_UIP) |
 (s->cmos_data[RTC_REG_A] & REG_A_UIP);
-periodic_timer_update(s, qemu_clock_get_ns(rtc_clock));
+
+if (update_periodic_timer) {
+periodic_timer_update(s, qemu_clock_get_ns(rtc_clock));
+}
+
 check_update_timer(s);
 break;
 case RTC_REG_B:
+update_periodic_timer = (s->cmos_data[RTC_REG_B] ^ data)
+   & REG_B_PIE;
+
 if (data & REG_B_SET) {
 /* update cmos to when the rtc was stopping */
 if (rtc_running(s)) {
@@ -475,7 +485,11 @@ static void cmos_ioport_write(void *opaque, hwaddr addr,
 qemu_irq_lower(s->irq);
 }
 s->cmos_data[RTC_REG_B] = data;
-periodic_timer_update(s, qemu_clock_get_ns(rtc_clock));
+
+if (update_periodic_timer) {
+periodic_timer_update(s, qemu_clock_get_ns(rtc_clock));
+}
+
 check_update_timer(s);
 break;
 case RTC_REG_C:
-- 
1.8.3.1
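
The change detection used in this patch is the usual XOR-and-mask idiom.
A minimal sketch, where bits_changed() is a hypothetical helper name and
0x0f is the mask the patch applies to register A:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * XOR yields a 1 in every bit position where the old and new values
 * differ; the mask keeps only the bits the caller cares about.
 */
static bool bits_changed(uint8_t old_val, uint8_t new_val, uint8_t mask)
{
    assert(mask != 0);
    return ((old_val ^ new_val) & mask) != 0;
}
```

For example, going from 0x26 to 0x2a changes the low four bits and would
trigger a timer update, whereas going from 0x26 to 0x46 only touches the
upper bits and would not.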





[Qemu-devel] [PULL 12/20] nbd: add errp parameter to nbd_wr_syncv()

2017-05-19 Thread Paolo Bonzini
From: Vladimir Sementsov-Ogievskiy 

This will be used in the following patch to provide an actual error
message in some cases.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20170516094533.6160-4-vsement...@virtuozzo.com>
Signed-off-by: Paolo Bonzini 
---
 block/nbd-client.c  |  4 ++--
 include/block/nbd.h |  3 ++-
 nbd/common.c| 12 +---
 nbd/nbd-internal.h  |  4 ++--
 4 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 1e2952f..538d95e 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -136,7 +136,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
 rc = nbd_send_request(s->ioc, request);
 if (rc >= 0) {
 ret = nbd_wr_syncv(s->ioc, qiov->iov, qiov->niov, request->len,
-   false);
+   false, NULL);
 if (ret != request->len) {
 rc = -EIO;
 }
@@ -165,7 +165,7 @@ static void nbd_co_receive_reply(NBDClientSession *s,
 } else {
 if (qiov && reply->error == 0) {
 ret = nbd_wr_syncv(s->ioc, qiov->iov, qiov->niov, request->len,
-   true);
+   true, NULL);
 if (ret != request->len) {
 reply->error = EIO;
 }
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 0ed0775..9d385ea 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -127,7 +127,8 @@ ssize_t nbd_wr_syncv(QIOChannel *ioc,
  struct iovec *iov,
  size_t niov,
  size_t length,
- bool do_read);
+ bool do_read,
+ Error **errp);
 int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
   QCryptoTLSCreds *tlscreds, const char *hostname,
   QIOChannel **outioc,
diff --git a/nbd/common.c b/nbd/common.c
index 4db45b3..bd81637 100644
--- a/nbd/common.c
+++ b/nbd/common.c
@@ -28,10 +28,10 @@ ssize_t nbd_wr_syncv(QIOChannel *ioc,
  struct iovec *iov,
  size_t niov,
  size_t length,
- bool do_read)
+ bool do_read,
+ Error **errp)
 {
 ssize_t done = 0;
-Error *local_err = NULL;
 struct iovec *local_iov = g_new(struct iovec, niov);
 struct iovec *local_iov_head = local_iov;
 unsigned int nlocal_iov = niov;
@@ -41,19 +41,17 @@ ssize_t nbd_wr_syncv(QIOChannel *ioc,
 while (nlocal_iov > 0) {
 ssize_t len;
 if (do_read) {
-len = qio_channel_readv(ioc, local_iov, nlocal_iov, &local_err);
+len = qio_channel_readv(ioc, local_iov, nlocal_iov, errp);
 } else {
-len = qio_channel_writev(ioc, local_iov, nlocal_iov, &local_err);
+len = qio_channel_writev(ioc, local_iov, nlocal_iov, errp);
 }
 if (len == QIO_CHANNEL_ERR_BLOCK) {
+/* errp should not be set */
 assert(qemu_in_coroutine());
 qio_channel_yield(ioc, do_read ? G_IO_IN : G_IO_OUT);
 continue;
 }
 if (len < 0) {
-TRACE("I/O error: %s", error_get_pretty(local_err));
-error_free(local_err);
-/* XXX handle Error objects */
 done = -EIO;
 goto cleanup;
 }
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index e6bbc7c..1d479fe 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -108,7 +108,7 @@ static inline ssize_t read_sync_eof(QIOChannel *ioc, void *buffer, size_t size)
  * our request/reply.  Synchronization is done with recv_coroutine, so
  * that this is coroutine-safe.
  */
-return nbd_wr_syncv(ioc, , 1, size, true);
+return nbd_wr_syncv(ioc, , 1, size, true, NULL);
 }
 
 /* read_sync
@@ -132,7 +132,7 @@ static inline int write_sync(QIOChannel *ioc, const void *buffer, size_t size)
 {
 struct iovec iov = { .iov_base = (void *) buffer, .iov_len = size };
 
-ssize_t ret = nbd_wr_syncv(ioc, &iov, 1, size, false);
+ssize_t ret = nbd_wr_syncv(ioc, &iov, 1, size, false, NULL);
 
 assert(ret < 0 || ret == size);
 
-- 
1.8.3.1
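
The shape of this change (forward the caller's error pointer instead of
consuming a local one) can be sketched without QEMU. DemoError and all
demo_* functions below are hypothetical; QEMU's real Error type and its
helpers are richer than this.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal error object. */
typedef struct DemoError {
    char msg[64];
} DemoError;

/* Store a message for the caller, unless the caller passed NULL. */
static void demo_error_set(DemoError **errp, const char *msg)
{
    if (!errp) {
        return;  /* caller is not interested in details */
    }
    assert(*errp == NULL);  /* must not overwrite an earlier error */
    *errp = malloc(sizeof(**errp));
    if (*errp) {
        snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
    }
}

/* Lowest layer: the step that can actually fail. */
static int demo_io(int fail, DemoError **errp)
{
    if (fail) {
        demo_error_set(errp, "channel write failed");
        return -1;
    }
    return 0;
}

/*
 * Middle layer: passes errp straight through, so the detailed message
 * reaches whoever asked for it, and still returns a plain -EIO.
 */
static int demo_wr_sync(int fail, DemoError **errp)
{
    return demo_io(fail, errp) < 0 ? -EIO : 0;
}
```

Callers that pass NULL, as every call site in this patch does, keep the
old behaviour and only see the return code.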





[Qemu-devel] [PULL 07/20] msix: trace control bit write op

2017-05-19 Thread Paolo Bonzini
From: Peter Xu 

While at it, factor out a helper to detect the MSI-X masked bit.

Signed-off-by: Peter Xu 
Message-Id: <1494309644-18743-3-git-send-email-pet...@redhat.com>
Acked-by: Michael S. Tsirkin 
Reviewed-by: Michael S. Tsirkin 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Paolo Bonzini 
---
 hw/pci/msix.c   | 11 +--
 hw/pci/trace-events |  3 +++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index bb54e8b..fc5fe51 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -22,6 +22,7 @@
 #include "hw/xen/xen.h"
 #include "qemu/range.h"
 #include "qapi/error.h"
+#include "trace.h"
 
 #define MSIX_CAP_LENGTH 12
 
@@ -130,10 +131,14 @@ static void msix_handle_mask_update(PCIDevice *dev, int vector, bool was_masked)
 }
 }
 
+static bool msix_masked(PCIDevice *dev)
+{
+return dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] & MSIX_MASKALL_MASK;
+}
+
 static void msix_update_function_masked(PCIDevice *dev)
 {
-dev->msix_function_masked = !msix_enabled(dev) ||
-(dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] & MSIX_MASKALL_MASK);
+dev->msix_function_masked = !msix_enabled(dev) || msix_masked(dev);
 }
 
 /* Handle MSI-X capability config write. */
@@ -148,6 +153,8 @@ void msix_write_config(PCIDevice *dev, uint32_t addr,
 return;
 }
 
+trace_msix_write_config(dev->name, msix_enabled(dev), msix_masked(dev));
+
 was_masked = dev->msix_function_masked;
 msix_update_function_masked(dev);
 
diff --git a/hw/pci/trace-events b/hw/pci/trace-events
index 2b9cf24..83c8f5a 100644
--- a/hw/pci/trace-events
+++ b/hw/pci/trace-events
@@ -7,3 +7,6 @@ pci_update_mappings_add(void *d, uint32_t bus, uint32_t slot, uint32_t func, int
 # hw/pci/pci_host.c
 pci_cfg_read(const char *dev, unsigned devid, unsigned fnid, unsigned offs, unsigned val) "%s %02u:%u @0x%x -> 0x%x"
 pci_cfg_write(const char *dev, unsigned devid, unsigned fnid, unsigned offs, unsigned val) "%s %02u:%u @0x%x <- 0x%x"
+
+# hw/pci/msix.c
+msix_write_config(char *name, bool enabled, bool masked) "dev %s enabled %d masked %d"
-- 
1.8.3.1





Re: [Qemu-devel] [RFC PATCH V2 2/2] msi: Handle remappable format interrupt request

2017-05-19 Thread Anthony PERARD
On Thu, May 18, 2017 at 01:33:00AM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> According to VT-d spec Interrupt Remapping and Interrupt Posting ->
> Interrupt Remapping -> Interrupt Request Formats On Intel 64
> Platforms, fields of MSI data register have changed. This patch
> avoids wrongly regarding a remappable format interrupt request as
> an interrupt binded with an event channel.
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 
> ---
>  hw/pci/msi.c | 5 +++--
>  hw/pci/msix.c| 4 +++-
>  hw/xen/xen_pt_msi.c  | 2 +-
>  include/hw/xen/xen.h | 2 +-
>  xen-hvm-stub.c   | 2 +-
>  xen-hvm.c| 7 ++-
>  6 files changed, 15 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/pci/msi.c b/hw/pci/msi.c
> index a87b227..199cb47 100644
> --- a/hw/pci/msi.c
> +++ b/hw/pci/msi.c
> @@ -289,7 +289,7 @@ void msi_reset(PCIDevice *dev)
>  static bool msi_is_masked(const PCIDevice *dev, unsigned int vector)
>  {
>  uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
> -uint32_t mask, data;
> +uint32_t mask, data, addr_lo;
>  bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
>  assert(vector < PCI_MSI_VECTORS_MAX);
>  
> @@ -298,7 +298,8 @@ static bool msi_is_masked(const PCIDevice *dev, unsigned 
> int vector)
>  }
>  
>  data = pci_get_word(dev->config + msi_data_off(dev, msi64bit));
> -if (xen_is_pirq_msi(data)) {
> +addr_lo = pci_get_long(dev->config + msi_address_lo_off(dev));
> +if (xen_is_pirq_msi(data, addr_lo)) {
>  return false;
>  }
>  
> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
> index bb54e8b..efe2982 100644
> --- a/hw/pci/msix.c
> +++ b/hw/pci/msix.c
> @@ -82,9 +82,11 @@ static bool msix_vector_masked(PCIDevice *dev, unsigned 
> int vector, bool fmask)
>  {
>  unsigned offset = vector * PCI_MSIX_ENTRY_SIZE;
>  uint8_t *data = &dev->msix_table[offset + PCI_MSIX_ENTRY_DATA];
> +uint8_t *addr_lo = &dev->msix_table[offset + PCI_MSIX_ENTRY_LOWER_ADDR];
>  /* MSIs on Xen can be remapped into pirqs. In those cases, masking
>   * and unmasking go through the PV evtchn path. */
> -if (xen_enabled() && xen_is_pirq_msi(pci_get_long(data))) {
> +if (xen_enabled() && xen_is_pirq_msi(pci_get_long(data),
> + pci_get_long(addr_lo))) {
>  return false;
>  }
>  return fmask || dev->msix_table[offset + PCI_MSIX_ENTRY_VECTOR_CTRL] &
> diff --git a/hw/xen/xen_pt_msi.c b/hw/xen/xen_pt_msi.c
> index 5fab95e..45a9e9f 100644
> --- a/hw/xen/xen_pt_msi.c
> +++ b/hw/xen/xen_pt_msi.c
> @@ -114,7 +114,7 @@ static int msi_msix_setup(XenPCIPassthroughState *s,
>  
>  assert((!is_msix && msix_entry == 0) || is_msix);
>  
> -if (xen_is_pirq_msi(data)) {
> +if (xen_is_pirq_msi(data, addr)) {
>  *ppirq = msi_ext_dest_id(addr >> 32) | msi_dest_id(addr);
>  if (!*ppirq) {
>  /* this probably identifies an misconfiguration of the guest,
> diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
> index 09c2ce5..af759bc 100644
> --- a/include/hw/xen/xen.h
> +++ b/include/hw/xen/xen.h
> @@ -33,7 +33,7 @@ int xen_pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num);
>  void xen_piix3_set_irq(void *opaque, int irq_num, int level);
>  void xen_piix_pci_write_config_client(uint32_t address, uint32_t val, int 
> len);
>  void xen_hvm_inject_msi(uint64_t addr, uint32_t data);
> -int xen_is_pirq_msi(uint32_t msi_data);
> +int xen_is_pirq_msi(uint32_t msi_data, uint32_t msi_addr_lo);

Maybe swapping the arguments would be better, so that the address comes
first and then the data, as I think is often the case.
What do you think?

>  
>  qemu_irq *xen_interrupt_controller_init(void);
>  
> diff --git a/xen-hvm-stub.c b/xen-hvm-stub.c
> index c500325..dae421c 100644
> --- a/xen-hvm-stub.c
> +++ b/xen-hvm-stub.c
> @@ -31,7 +31,7 @@ void xen_hvm_inject_msi(uint64_t addr, uint32_t data)
>  {
>  }
>  
> -int xen_is_pirq_msi(uint32_t msi_data)
> +int xen_is_pirq_msi(uint32_t msi_data, uint32_t msi_addr_lo)
>  {
>  return 0;
>  }
> diff --git a/xen-hvm.c b/xen-hvm.c
> index 5043beb..db29121 100644
> --- a/xen-hvm.c
> +++ b/xen-hvm.c
> @@ -146,8 +146,13 @@ void xen_piix_pci_write_config_client(uint32_t address, 
> uint32_t val, int len)
>  }
>  }
>  
> -int xen_is_pirq_msi(uint32_t msi_data)
> +int xen_is_pirq_msi(uint32_t msi_data, uint32_t msi_addr_lo)
>  {
> +/* If msi address is configurate to remapping format, the msi will not
> + * remapped into a pirq.

What do you think of: "If the MSI address is configured in remappable
format, the MSI will not be remapped into a pirq." ?

> + */
> +if (msi_addr_lo & MSI_ADDR_IF_MASK)
> +return 0;
>  /* If vector is 0, the msi is remapped into a pirq, passed as
>   * dest_id.
>   */

Thanks,

-- 
Anthony PERARD



Re: [Qemu-devel] [PATCH RFC 1/6] io: only allow return path for socket typed

2017-05-19 Thread Daniel P. Berrange
On Fri, May 19, 2017 at 09:25:38AM +0100, Daniel P. Berrange wrote:
> On Fri, May 19, 2017 at 02:43:27PM +0800, Peter Xu wrote:
> > We don't really have a return path for the other types yet. Let's check
> > this when .get_return_path() is called.
> > 
> > For this, we introduce a new feature bit, and set it up only for socket
> > typed IO channels.
> > 
> > This will help detect earlier failure for postcopy, e.g., logically
> > speaking postcopy cannot work with "exec:". Before this patch, when we
> > try to migrate with "migrate -d exec:cat>out", we'll hang the system.
> > With this patch, we'll get:
> > 
> > (qemu) migrate -d exec:cat>out
> > Unable to open return-path for postcopy
> 
> This is wrong - post-copy migration *can* work with exec: - it just entirely
> depends on what command you are running. Your example ran a command which is
> unidirectional, but if you ran 'exec:socat ...' you would have a fully
> bidirectional channel. Actually the channel is always bi-directional, but
> 'cat' simply won't ever send data back to QEMU.
> 
> If QEMU hangs when the other end doesn't send data back, that actually seems
> like a potentially serious bug in migration code. Even if using the normal
> 'tcp' migration protocol, if the target QEMU server hangs and fails to
> send data to QEMU on the return path, the source QEMU must never hang.

BTW, if you want to simplify the code in this area at all, then arguably
we should get rid of the "get_return_path" helper method entirely. We're
not actually opening any new connections - we're just creating a second
QEMUFile that uses the same underlying QIOChannel object. All we would
need is for the QEMUFile to manage separate 'buf' fields for the read &
write directions.  Then all the code would be able to just use the single
QEMUFile for read & write, getting rid of this concept of "opening a
return path", which doesn't actually do anything at the underlying data
transport level.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-devel] [RFC PATCH 09/20] Memory: introduce iommu_ops->record_device

2017-05-19 Thread Tian, Kevin
> From: Liu, Yi L [mailto:yi.l@linux.intel.com]
> Sent: Friday, May 19, 2017 1:24 PM
> 
> Hi Alex,
> 
> What's your opinion with Tianyu's question? Is it accepatable
> to use VFIO API in intel_iommu emulator?

Do you actually need such a translation at all? The SID should be
filled in by the kernel IOMMU driver based on which device the
invalidation is requested for, regardless of which guest SID is
used in user space. QEMU only needs to know which fd corresponds
to the guest SID, and can then initiate an invalidation request
on that fd?

> 
> Thanks,
> Yi L
> On Fri, Apr 28, 2017 at 02:46:16PM +0800, Lan Tianyu wrote:
> > On 2017-04-26 18:06, Liu, Yi L wrote:
> > > With vIOMMU exposed to guest, vIOMMU emulator needs to do translation
> > > between host and guest. e.g. a device-selective TLB flush, vIOMMU
> > > emulator needs to replace guest SID with host SID so that to limit
> > > the invalidation. This patch introduces a new callback
> > > iommu_ops->record_device() to notify vIOMMU emulator to record necessary
> > > information about the assigned device.
> >
> > This patch is to prepare to translate guest sbdf to host sbdf.
> >
> > Alex:
> > Could we add a new vfio API to do such translation? This will be more
> > straight forward than storing host sbdf in the vIOMMU device model.
> >
> > >
> > > Signed-off-by: Liu, Yi L 
> > > ---
> > >  include/exec/memory.h | 11 +++
> > >  memory.c  | 12 
> > >  2 files changed, 23 insertions(+)
> > >
> > > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > > index 7bd13ab..49087ef 100644
> > > --- a/include/exec/memory.h
> > > +++ b/include/exec/memory.h
> > > @@ -203,6 +203,8 @@ struct MemoryRegionIOMMUOps {
> > >  IOMMUNotifierFlag new_flags);
> > >  /* Set this up to provide customized IOMMU replay function */
> > >  void (*replay)(MemoryRegion *iommu, IOMMUNotifier *notifier);
> > > +void (*record_device)(MemoryRegion *iommu,
> > > +  void *device_info);
> > >  };
> > >
> > >  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> > > @@ -708,6 +710,15 @@ void memory_region_notify_iommu(MemoryRegion *mr,
> > >  void memory_region_notify_one(IOMMUNotifier *notifier,
> > >IOMMUTLBEntry *entry);
> > >
> > > +/*
> > > + * memory_region_notify_device_record: notify IOMMU to record assign
> > > + * device.
> > > + * @mr: the memory region to notify
> > > + * @ device_info: device information
> > > + */
> > > +void memory_region_notify_device_record(MemoryRegion *mr,
> > > +void *info);
> > > +
> > >  /**
> > >   * memory_region_register_iommu_notifier: register a notifier for changes to
> > >   * IOMMU translation entries.
> > > diff --git a/memory.c b/memory.c
> > > index 0728e62..45ef069 100644
> > > --- a/memory.c
> > > +++ b/memory.c
> > > @@ -1600,6 +1600,18 @@ static void memory_region_update_iommu_notify_flags(MemoryRegion *mr)
> > >  mr->iommu_notify_flags = flags;
> > >  }
> > >
> > > +void memory_region_notify_device_record(MemoryRegion *mr,
> > > +void *info)
> > > +{
> > > +assert(memory_region_is_iommu(mr));
> > > +
> > > +if (mr->iommu_ops->record_device) {
> > > +mr->iommu_ops->record_device(mr, info);
> > > +}
> > > +
> > > +return;
> > > +}
> > > +
> > >  void memory_region_register_iommu_notifier(MemoryRegion *mr,
> > > IOMMUNotifier *n)
> > >  {
> > >
> >
> >


[Qemu-devel] [PATCH v1 12/13] qcow2: allow concurrent unaligned writes to the same clusters

2017-05-19 Thread Anton Nefedov
If the COW area of a write request to an unallocated cluster is empty,
concurrent write requests can be allowed with a little extra
synchronization, so they don't have to wait until L2 is filled.

Let qcow2-cluster.c::handle_dependencies() do most of the job:
  if there is an in-flight request to the same cluster,
  and the current request wants to write in its COW area,
  and its COW area is marked empty,
  - steal the allocated offset and write concurrently. Let the original
    request update L2 later when it likes.
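
The decision above can be sketched as a small standalone model. This is a
simplification under stated assumptions, not the patch's code:
DemoInflight and demo_check_dependency are hypothetical, the overlap
tests are reduced to the three cases named in the list, and the return
values mirror the 0 / 1 / -EAGAIN convention documented for
handle_dependencies().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of an in-flight allocating write (guest offsets). */
typedef struct DemoInflight {
    uint64_t cow_start;    /* start of the leading COW area */
    uint64_t data_start;   /* start of the data the request itself writes */
    uint64_t data_end;     /* end of that data */
    uint64_t cow_end;      /* end of the trailing COW area */
    bool cow_start_empty;  /* leading COW area has nothing to copy */
    bool cow_end_empty;    /* trailing COW area has nothing to copy */
    uint64_t host_offset;  /* host offset that corresponds to cow_start */
} DemoInflight;

/*
 * Returns 0 if [start, end) does not touch the in-flight request,
 * 1 if it may be written concurrently (*host_offset is filled in),
 * -EAGAIN if the caller has to wait for the in-flight request.
 */
static int demo_check_dependency(const DemoInflight *old,
                                 uint64_t start, uint64_t end,
                                 uint64_t *host_offset)
{
    if (end <= old->cow_start || start >= old->cow_end) {
        return 0;
    }

    bool hits_data = end > old->data_start && start < old->data_end;
    bool hits_cow_start = start < old->data_start && !old->cow_start_empty;
    bool hits_cow_end = end > old->data_end && !old->cow_end_empty;

    if (hits_data || hits_cow_start || hits_cow_end) {
        return -EAGAIN;
    }

    /* Reuse ("steal") the allocation made by the in-flight request. */
    *host_offset = old->host_offset + (start - old->cow_start);
    return 1;
}
```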

This gives an improvement for parallel misaligned writes to
unallocated clusters with no backing data:

HDD fio over xfs iodepth=4:
  seqwrite 4k:   18400 -> 22800 IOPS ( x1.24 )
  seqwrite 68k:   1600 ->  2300 IOPS ( x1.44 )

Signed-off-by: Anton Nefedov 
Signed-off-by: Denis V. Lunev 
---
 block/qcow2-cluster.c | 169 +++---
 block/qcow2.c |  28 -
 block/qcow2.h |  12 +++-
 3 files changed, 181 insertions(+), 28 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index c0974e8..7cffdd4 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -898,20 +898,32 @@ out:
 
 /*
  * Check if there already is an AIO write request in flight which allocates
- * the same cluster. In this case we need to wait until the previous
- * request has completed and updated the L2 table accordingly.
+ * the same cluster.
+ * In this case, check if that request has explicitly allowed to write
+ * in its COW area(s).
+ *   If yes - fill the meta to point to the same cluster.
+ *   If no  - we need to wait until the previous request has completed and
+ *updated the L2 table accordingly or
+ *has allowed writing in its COW area(s).
  * Returns:
  *   0   if there was no dependency. *cur_bytes indicates the number of
  *   bytes from guest_offset that can be read before the next
  *   dependency must be processed (or the request is complete).
- *   *m is not modified
+ *   *m, *host_offset are not modified
+ *
+ *   1   if there is a dependency but it is possible to write concurrently
+ *   *m is filled accordingly,
+ *   *cur_bytes may have decreased and describes
+ * the length of the area that can be written to,
+ *   *host_offset contains the starting host image offset to write to
  *
  *   -EAGAIN if we had to wait for another request. The caller
- *   must start over, so consider *cur_bytes undefined.
+ *   must start over, so consider *cur_bytes and *host_offset undefined.
  *   *m is not modified
  */
 static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
-uint64_t *cur_bytes, QCowL2Meta **m)
+   uint64_t *host_offset, uint64_t *cur_bytes,
+   QCowL2Meta **m)
 {
 BDRVQcow2State *s = bs->opaque;
 QCowL2Meta *old_alloc;
@@ -924,7 +936,7 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
 const uint64_t old_start = l2meta_cow_start(old_alloc);
 const uint64_t old_end = l2meta_cow_end(old_alloc);
 
-if (end <= old_start || start >= old_end) {
+if (end <= old_start || start >= old_end || old_alloc->piggybacked) {
 /* No intersection */
 continue;
 }
@@ -936,21 +948,95 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
 continue;
 }
 
-/* Stop if already an l2meta exists. After yielding, it wouldn't
- * be valid any more, so we'd have to clean up the old L2Metas
- * and deal with requests depending on them before starting to
- * gather new ones. Not worth the trouble. */
-if (*m) {
+/* offsets of the cluster we're intersecting in */
+const uint64_t cluster_start = start_of_cluster(s, start);
+const uint64_t cluster_end = cluster_start + s->cluster_size;
+
+const uint64_t old_data_start = old_start
++ old_alloc->cow_start.nb_bytes;
+const uint64_t old_data_end = old_alloc->offset
++ old_alloc->cow_end.offset;
+
+const bool conflict_in_data_area =
+end > old_data_start && start < old_data_end;
+const bool conflict_in_old_cow_start =
+/* 1). new write request area is before the old */
+start < old_data_start
+&& /* 2). old request did not allow writing in its cow area */
+!old_alloc->cow_start.reduced;
+const bool conflict_in_old_cow_end =
+/* 1). new write request area is after the old */
+start > old_data_start
+&& /* 2). old request did not allow writing in its cow area */
+!old_alloc->cow_end.reduced;
+
+if (conflict_in_data_area ||
+conflict_in_old_cow_start || 

Re: [Qemu-devel] [PATCH RFC 1/6] io: only allow return path for socket typed

2017-05-19 Thread Peter Xu
On Fri, May 19, 2017 at 09:30:10AM +0100, Daniel P. Berrange wrote:
> On Fri, May 19, 2017 at 09:25:38AM +0100, Daniel P. Berrange wrote:
> > On Fri, May 19, 2017 at 02:43:27PM +0800, Peter Xu wrote:
> > > We don't really have a return path for the other types yet. Let's check
> > > this when .get_return_path() is called.
> > > 
> > > For this, we introduce a new feature bit, and set it up only for socket
> > > typed IO channels.
> > > 
> > > This will help detect earlier failure for postcopy, e.g., logically
> > > speaking postcopy cannot work with "exec:". Before this patch, when we
> > > try to migrate with "migrate -d exec:cat>out", we'll hang the system.
> > > With this patch, we'll get:
> > > 
> > > (qemu) migrate -d exec:cat>out
> > > Unable to open return-path for postcopy
> > 
> > This is wrong - post-copy migration *can* work with exec: - it just entirely
> > depends on what command you are running. Your example ran a command which is
> > unidirectional, but if you ran 'exec:socat ...' you would have a fully
> > bidirectional channel. Actually the channel is always bi-directional, but
> > 'cat' simply won't ever send data back to QEMU.
> > 
> > If QEMU hangs when the other end doesn't send data back, that actually seems
> > like a potentially serious bug in migration code. Even if using the normal
> > 'tcp' migration protocol, if the target QEMU server hangs and fails to
> > send data to QEMU on the return path, the source QEMU must never hang.
> 
> BTW, if you want to simplify the code in this area at all, then arguably
> we should get rid of the "get_return_path" helper method entirely. We're
> not actually opening any new connections - we're just creating a second
> QEMUFile that uses the same underlying QIOChannel object. All we would
> need is for the QEMUFile to have a separate 'buf' field management in
> QEMUFile for the read & write directions.  Then all the code would be
> able to just use the single QEMUFile for read & write getting rid of this
> concept of "opening a return path" which doesn't actually do anything at
> the underlying data transport level.

Makes sense. Noted. Thanks,

-- 
Peter Xu



Re: [Qemu-devel] [PATCH] i386: fix read/write cr with icount option

2017-05-19 Thread Mihail Abakumov

Paolo Bonzini wrote on 2017-05-19 12:59:

On 19/05/2017 11:36, Mihail Abakumov wrote:

Running Windows with icount causes a crash on a CR write instruction.
This patch fixes it.

Reading and writing CR causes an icount read because the
cpu_get_apic_tpr and cpu_set_apic_tpr functions are called. So the
accesses need to be wrapped in gen_io_start()/gen_io_end() calls.


The patch looks good, but lacks a signoff.  Please read the Developer
Certificate of Origin[1] and reply to this email with "Signed-off-by:
Mihail Abakumov ".


[1] Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

Thanks,

Paolo


Signed-off-by: Mihail Abakumov 


---
 target/i386/translate.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 06d8833..3b009bd 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -7907,14 +7907,26 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
 gen_update_cc_op(s);
 gen_jmp_im(pc_start - s->cs_base);
 if (b & 2) {
+if (s->tb->cflags & CF_USE_ICOUNT) {
+gen_io_start();
+}
 gen_op_mov_v_reg(ot, cpu_T0, rm);
 gen_helper_write_crN(cpu_env, tcg_const_i32(reg),
  cpu_T0);
+if (s->tb->cflags & CF_USE_ICOUNT) {
+gen_io_end();
+}
 gen_jmp_im(s->pc - s->cs_base);
 gen_eob(s);
 } else {
+if (s->tb->cflags & CF_USE_ICOUNT) {
+gen_io_start();
+}
 gen_helper_read_crN(cpu_T0, cpu_env,
tcg_const_i32(reg));
 gen_op_mov_reg_v(ot, rm, cpu_T0);
+if (s->tb->cflags & CF_USE_ICOUNT) {
+gen_io_end();
+}
 }
 break;
 default:




Re: [Qemu-devel] [PATCH 3/5] migration: Remove use of old MigrationParams

2017-05-19 Thread Markus Armbruster
Juan Quintela  writes:

> We changed in the previous patch to use migration capabilities for
> it.  Notice that we continue using the old command line flags from
> the migrate command for the time being.  Remove the set_params method as
> it is now empty.
>
> For savevm, one can't do a:
>
> savevm -b/-i foo

Yes (savevm has no such options).

> but now one can do:
>
> migrate_set_capability block on
> savevm foo
>
> And we can't use block migration. We could disable block capability
> unconditionally, but it would not be much better.

This leaves me confused: what does the example do?  Reading ahead...
looks like it fails with "Block migration and snapshots are
incompatible".  What are you trying to say here?

> Signed-off-by: Juan Quintela 
> Reviewed-by: Eric Blake 

Patch looks good to me.



[Qemu-devel] [PULL 04/20] mc146818rtc: drop unnecessary '#ifdef TARGET_I386'

2017-05-19 Thread Paolo Bonzini
From: Xiao Guangrong 

If the code purely depends on LOST_TICK_POLICY_SLEW, we can simply
drop '#ifdef TARGET_I386' as only x86 can enable this tick policy

Signed-off-by: Xiao Guangrong 
Message-Id: <20170510083259.3900-5-xiaoguangr...@tencent.com>
Signed-off-by: Paolo Bonzini 
---
 hw/timer/mc146818rtc.c | 16 +++-
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/hw/timer/mc146818rtc.c b/hw/timer/mc146818rtc.c
index 4870a72..f9d6181 100644
--- a/hw/timer/mc146818rtc.c
+++ b/hw/timer/mc146818rtc.c
@@ -112,7 +112,6 @@ static uint64_t get_guest_rtc_ns(RTCState *s)
 guest_clock - s->last_update + s->offset;
 }
 
-#ifdef TARGET_I386
 static void rtc_coalesced_timer_update(RTCState *s)
 {
 if (s->irq_coalesced == 0) {
@@ -126,6 +125,7 @@ static void rtc_coalesced_timer_update(RTCState *s)
 }
 }
 
+#ifdef TARGET_I386
 static void rtc_coalesced_timer(void *opaque)
 {
 RTCState *s = opaque;
@@ -198,7 +198,6 @@ periodic_timer_update(RTCState *s, int64_t current_time, uint32_t old_period)
 assert(lost_clock >= 0);
 }
 
-#ifdef TARGET_I386
 /*
  * s->irq_coalesced can change for two reasons:
  *
@@ -227,9 +226,7 @@ periodic_timer_update(RTCState *s, int64_t current_time, uint32_t old_period)
   s->irq_coalesced, old_period, s->period);
 rtc_coalesced_timer_update(s);
 }
-} else
-#endif
-{
+} else {
/*
  * no way to compensate the interrupt if LOST_TICK_POLICY_SLEW
  * is not used, we should make the time progress anyway.
@@ -244,9 +241,7 @@ periodic_timer_update(RTCState *s, int64_t current_time, uint32_t old_period)
  RTC_CLOCK_RATE) + 1;
 timer_mod(s->periodic_timer, s->next_periodic_time);
 } else {
-#ifdef TARGET_I386
 s->irq_coalesced = 0;
-#endif
 timer_del(s->periodic_timer);
 }
 }
@@ -835,13 +830,11 @@ static int rtc_post_load(void *opaque, int version_id)
 }
 }
 
-#ifdef TARGET_I386
 if (version_id >= 2) {
 if (s->lost_tick_policy == LOST_TICK_POLICY_SLEW) {
 rtc_coalesced_timer_update(s);
 }
 }
-#endif
 return 0;
 }
 
@@ -898,11 +891,10 @@ static void rtc_notify_clock_reset(Notifier *notifier, void *data)
 rtc_set_date_from_host(ISA_DEVICE(s));
 periodic_timer_update(s, now, 0);
 check_update_timer(s);
-#ifdef TARGET_I386
+
 if (s->lost_tick_policy == LOST_TICK_POLICY_SLEW) {
 rtc_coalesced_timer_update(s);
 }
-#endif
 }
 
 /* set CMOS shutdown status register (index 0xF) as S3_resume(0xFE)
@@ -923,12 +915,10 @@ static void rtc_reset(void *opaque)
 
 qemu_irq_lower(s->irq);
 
-#ifdef TARGET_I386
 if (s->lost_tick_policy == LOST_TICK_POLICY_SLEW) {
 s->irq_coalesced = 0;
 s->irq_reinject_on_ack_count = 0;  
 }
-#endif
 }
 
 static const MemoryRegionOps cmos_ops = {
-- 
1.8.3.1





[Qemu-devel] [PULL 11/20] nbd: read_sync and friends: return 0 on success

2017-05-19 Thread Paolo Bonzini
From: Vladimir Sementsov-Ogievskiy 

The functions read_sync, drop_sync, write_sync, and also
nbd_negotiate_write, nbd_negotiate_read, nbd_negotiate_drop_sync
return the number of processed bytes. But what can this number be,
other than the requested number of bytes?

Actually, the underlying nbd_wr_syncv function returns a value >= 0 and
!= requested_bytes only on EOF during a read operation. So, firstly, it
is impossible on write (let's add an assert), and on read it actually
means that communication is broken (except for nbd_receive_reply, see
below).

Most callers operate like this:
   if (func(..., size) != size) {
       /* error path */
   }
, i.e.:
  1. They are not interested in partial success
  2. The size is duplicated in the code (duplicated magic numbers are
     especially bad)
  3. The user doesn't see the actual error message, as the return code is
     lost (this patch doesn't fix this point, but it makes fixing easier)

Several callers handle ret >= 0 and != requested-size separately, by
just returning EINVAL in this case. This patch makes read_sync and
friends return EINVAL in this case, so the final behavior is the same.

Only one caller, nbd_receive_reply(), does something less obvious. It
returns EINVAL for ret > 0 and != requested-size, like the previous
group, but for ret == 0 it returns 0. The only caller of
nbd_receive_reply(), nbd_read_reply_entry(), handles ret == 0 in the
same way as ret < 0, so for now it doesn't matter. However, in
following commits error path handling will be improved and we'll need
to distinguish success from failure in this case too. So, this patch
adds a separate helper for this case: read_sync_eof.
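
The return-value convention described above can be modelled outside QEMU
roughly as follows. This is only a sketch: DemoStream and the demo_*
functions are hypothetical stand-ins, and the 1 / 0 / -errno convention
chosen for the EOF-aware helper is an assumption made for the example,
not necessarily what read_sync_eof returns in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical in-memory stream standing in for the channel. */
typedef struct DemoStream {
    const unsigned char *data;
    size_t len;
    size_t pos;
} DemoStream;

/* Low-level read: returns bytes copied; a short count means EOF was hit. */
static ssize_t demo_read_full(DemoStream *s, void *buf, size_t size)
{
    size_t avail = s->len - s->pos;
    size_t n = size < avail ? size : avail;

    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return (ssize_t)n;
}

/* 1: everything read, 0: EOF before any byte, -errno: failure. */
static int demo_read_sync_eof(DemoStream *s, void *buf, size_t size)
{
    ssize_t ret = demo_read_full(s, buf, size);

    if (ret < 0) {
        return (int)ret;
    }
    if (ret == 0 && size > 0) {
        return 0;
    }
    if ((size_t)ret != size) {
        return -EINVAL;  /* partial read: the peer went away mid-message */
    }
    return 1;
}

/* 0 on success, -errno on failure; a clean EOF counts as failure here. */
static int demo_read_sync(DemoStream *s, void *buf, size_t size)
{
    int ret = demo_read_sync_eof(s, buf, size);

    if (ret == 0) {
        return -EINVAL;
    }
    return ret < 0 ? ret : 0;
}
```

Callers of the plain variant can then test `< 0` instead of comparing
against the size again, while the one caller that cares about a clean
EOF uses the EOF-aware variant.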

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20170516094533.6160-3-vsement...@virtuozzo.com>
Signed-off-by: Paolo Bonzini 
---
 nbd/client.c   | 63 
 nbd/nbd-internal.h | 34 +++---
 nbd/server.c   | 84 +-
 3 files changed, 88 insertions(+), 93 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index a58fb02..6b74a62 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -86,9 +86,9 @@ static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);
 
 */
 
-/* Discard length bytes from channel.  Return -errno on failure, or
- * the amount of bytes consumed. */
-static ssize_t drop_sync(QIOChannel *ioc, size_t size)
+/* Discard length bytes from channel.  Return -errno on failure and 0 on
+ * success*/
+static int drop_sync(QIOChannel *ioc, size_t size)
 {
 ssize_t ret = 0;
 char small[1024];
@@ -96,14 +96,13 @@ static ssize_t drop_sync(QIOChannel *ioc, size_t size)
 
 buffer = sizeof(small) >= size ? small : g_malloc(MIN(65536, size));
 while (size > 0) {
-ssize_t count = read_sync(ioc, buffer, MIN(65536, size));
+ssize_t count = MIN(65536, size);
+ret = read_sync(ioc, buffer, MIN(65536, size));
 
-if (count <= 0) {
+if (ret < 0) {
 goto cleanup;
 }
-assert(count <= size);
 size -= count;
-ret += count;
 }
 
  cleanup:
@@ -136,12 +135,12 @@ static int nbd_send_option_request(QIOChannel *ioc, uint32_t opt,
 stl_be_p(, opt);
 stl_be_p(, len);
 
-if (write_sync(ioc, , sizeof(req)) != sizeof(req)) {
+if (write_sync(ioc, , sizeof(req)) < 0) {
 error_setg(errp, "Failed to send option request header");
 return -1;
 }
 
-if (len && write_sync(ioc, (char *) data, len) != len) {
+if (len && write_sync(ioc, (char *) data, len) < 0) {
 error_setg(errp, "Failed to send option request data");
 return -1;
 }
@@ -170,7 +169,7 @@ static int nbd_receive_option_reply(QIOChannel *ioc, uint32_t opt,
 nbd_opt_reply *reply, Error **errp)
 {
 QEMU_BUILD_BUG_ON(sizeof(*reply) != 20);
-if (read_sync(ioc, reply, sizeof(*reply)) != sizeof(*reply)) {
+if (read_sync(ioc, reply, sizeof(*reply)) < 0) {
 error_setg(errp, "failed to read option reply");
 nbd_send_opt_abort(ioc);
 return -1;
@@ -219,7 +218,7 @@ static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,
 goto cleanup;
 }
 msg = g_malloc(reply->length + 1);
-if (read_sync(ioc, msg, reply->length) != reply->length) {
+if (read_sync(ioc, msg, reply->length) < 0) {
 error_setg(errp, "failed to read option error message");
 goto cleanup;
 }
@@ -321,7 +320,7 @@ static int nbd_receive_list(QIOChannel *ioc, const char *want, bool *match,
 nbd_send_opt_abort(ioc);
 return -1;
 }
-if (read_sync(ioc, , sizeof(namelen)) != sizeof(namelen)) {
+if (read_sync(ioc, , sizeof(namelen)) < 0) {
 error_setg(errp, "failed to read option name length");
 nbd_send_opt_abort(ioc);
 return -1;
@@ -334,7 +333,7 @@ 

[Qemu-devel] [PULL 08/20] kvm: irqchip: skip update msi when disabled

2017-05-19 Thread Paolo Bonzini
From: Peter Xu 

It's possible that a device keeps its irqfd/virq even when
MSI/MSIX is disabled globally for that device. One example is
virtio-net-pci (see commit f1d0f15a6 and virtio_pci_vq_vector_mask()).
It is used as a fast path to avoid allocating/releasing the irqfd/virq
frequently when the guest enables/disables MSIX.

However, this fast path brought a problem to msi_route_list: the
device's MSIRouteEntry is still dangling there even if MSIX is
disabled. Then we cannot know which message to fetch, and even if we
could, the messages are meaningless. In this case, we can simply
ignore this entry.

It's safe, since when MSIX is enabled again, we'll rebuild them no
matter what.
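
As a standalone illustration of the skip rule (not the QEMU code itself),
the loop below refreshes only entries whose device still has MSI or MSIX
enabled. DemoDevice, DemoRouteEntry and demo_update_routes are
hypothetical names made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state: only the two enable bits matter here. */
typedef struct DemoDevice {
    bool msi_enabled;
    bool msix_enabled;
} DemoDevice;

/* Hypothetical route entry that may outlive the device's MSI/MSIX state. */
typedef struct DemoRouteEntry {
    DemoDevice *dev;
    int vector;
    int updates;  /* how many times this route was refreshed */
} DemoRouteEntry;

/* Refresh every live route; returns the number of routes updated. */
static size_t demo_update_routes(DemoRouteEntry *entries, size_t n)
{
    size_t updated = 0;

    for (size_t i = 0; i < n; i++) {
        DemoDevice *dev = entries[i].dev;

        if (!dev->msix_enabled && !dev->msi_enabled) {
            /* Entry kept only as a fast path; its message is meaningless. */
            continue;
        }
        entries[i].updates++;
        updated++;
    }
    return updated;
}
```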

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1448813

Signed-off-by: Peter Xu 
Message-Id: <1494309644-18743-4-git-send-email-pet...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 target/i386/kvm.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 011d4a5..82c72d2 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -43,6 +43,7 @@
 #include "standard-headers/asm-x86/hyperv.h"
 #include "hw/pci/pci.h"
 #include "hw/pci/msi.h"
+#include "hw/pci/msix.h"
 #include "migration/blocker.h"
 #include "exec/memattrs.h"
 #include "trace.h"
@@ -3510,12 +3511,17 @@ static void kvm_update_msi_routes_all(void *private, bool global,
 int cnt = 0;
 MSIRouteEntry *entry;
 MSIMessage msg;
+PCIDevice *dev;
+
 /* TODO: explicit route update */
 QLIST_FOREACH(entry, &msi_route_list, list) {
 cnt++;
-msg = pci_get_msi_message(entry->dev, entry->vector);
-kvm_irqchip_update_msi_route(kvm_state, entry->virq,
- msg, entry->dev);
+dev = entry->dev;
+if (!msix_enabled(dev) && !msi_enabled(dev)) {
+continue;
+}
+msg = pci_get_msi_message(dev, entry->vector);
+kvm_irqchip_update_msi_route(kvm_state, entry->virq, msg, dev);
 }
 kvm_irqchip_commit_routes(kvm_state);
 trace_kvm_x86_update_msi_routes(cnt);
-- 
1.8.3.1





[Qemu-devel] [PULL 13/20] nbd: add errp to read_sync, write_sync and drop_sync

2017-05-19 Thread Paolo Bonzini
From: Vladimir Sementsov-Ogievskiy 

There are a lot of calls to these functions from callers which already
have an errp that they fill themselves. On the other hand, nbd_wr_syncv
has an errp parameter too, so it would be great to connect them.
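
The effect of connecting the two can be shown with a toy error object.
Everything here is a hypothetical mock written for this sketch
(DemoError, demo_error_set, demo_error_prepend and the two read
functions); it does not use QEMU's Error API. It only shows the pattern:
the low-level function records the cause, and the caller prepends
context instead of overwriting it.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Toy error object; a stand-in used only for this illustration. */
typedef struct DemoError {
    char msg[256];
    int set;
} DemoError;

static void demo_error_set(DemoError *err, const char *msg)
{
    if (err) {
        snprintf(err->msg, sizeof(err->msg), "%s", msg);
        err->set = 1;
    }
}

/* Put caller context in front of the message that is already stored. */
static void demo_error_prepend(DemoError *err, const char *prefix)
{
    char tmp[256];

    if (!err || !err->set) {
        return;
    }
    snprintf(tmp, sizeof(tmp), "%s%s", prefix, err->msg);
    memcpy(err->msg, tmp, sizeof(tmp));
}

/* Low-level step: reports the root cause itself. */
static int demo_low_level_read(int fail, DemoError *err)
{
    if (fail) {
        demo_error_set(err, "unexpected end of stream");
        return -EIO;
    }
    return 0;
}

/* Caller: adds context but keeps the root cause. */
static int demo_read_reply(int fail, DemoError *err)
{
    if (demo_low_level_read(fail, err) < 0) {
        demo_error_prepend(err, "failed to read option reply: ");
        return -1;
    }
    return 0;
}
```

In this mock the prefix carries its own ": " separator, since the
prepend helper simply concatenates the two strings.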

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20170516094533.6160-5-vsement...@virtuozzo.com>
Signed-off-by: Paolo Bonzini 
---
 nbd/client.c   | 76 +++---
 nbd/nbd-internal.h | 16 +++-
 nbd/server.c   | 12 -
 3 files changed, 54 insertions(+), 50 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 6b74a62..f102375 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -88,7 +88,7 @@ static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);
 
 /* Discard length bytes from channel.  Return -errno on failure and 0 on
  * success*/
-static int drop_sync(QIOChannel *ioc, size_t size)
+static int drop_sync(QIOChannel *ioc, size_t size, Error **errp)
 {
 ssize_t ret = 0;
 char small[1024];
@@ -97,7 +97,7 @@ static int drop_sync(QIOChannel *ioc, size_t size)
 buffer = sizeof(small) >= size ? small : g_malloc(MIN(65536, size));
 while (size > 0) {
 ssize_t count = MIN(65536, size);
-ret = read_sync(ioc, buffer, MIN(65536, size));
+ret = read_sync(ioc, buffer, MIN(65536, size), errp);
 
 if (ret < 0) {
 goto cleanup;
@@ -135,13 +135,13 @@ static int nbd_send_option_request(QIOChannel *ioc, uint32_t opt,
 stl_be_p(, opt);
 stl_be_p(, len);
 
-if (write_sync(ioc, , sizeof(req)) < 0) {
-error_setg(errp, "Failed to send option request header");
+if (write_sync(ioc, , sizeof(req), errp) < 0) {
+error_prepend(errp, "Failed to send option request header");
 return -1;
 }
 
-if (len && write_sync(ioc, (char *) data, len) < 0) {
-error_setg(errp, "Failed to send option request data");
+if (len && write_sync(ioc, (char *) data, len, errp) < 0) {
+error_prepend(errp, "Failed to send option request data");
 return -1;
 }
 
@@ -169,8 +169,8 @@ static int nbd_receive_option_reply(QIOChannel *ioc, uint32_t opt,
 nbd_opt_reply *reply, Error **errp)
 {
 QEMU_BUILD_BUG_ON(sizeof(*reply) != 20);
-if (read_sync(ioc, reply, sizeof(*reply)) < 0) {
-error_setg(errp, "failed to read option reply");
+if (read_sync(ioc, reply, sizeof(*reply), errp) < 0) {
+error_prepend(errp, "failed to read option reply");
 nbd_send_opt_abort(ioc);
 return -1;
 }
@@ -218,8 +218,8 @@ static int nbd_handle_reply_err(QIOChannel *ioc, nbd_opt_reply *reply,
 goto cleanup;
 }
 msg = g_malloc(reply->length + 1);
-if (read_sync(ioc, msg, reply->length) < 0) {
-error_setg(errp, "failed to read option error message");
+if (read_sync(ioc, msg, reply->length, errp) < 0) {
+error_prepend(errp, "failed to read option error message");
 goto cleanup;
 }
 msg[reply->length] = '\0';
@@ -320,8 +320,8 @@ static int nbd_receive_list(QIOChannel *ioc, const char *want, bool *match,
 nbd_send_opt_abort(ioc);
 return -1;
 }
-if (read_sync(ioc, , sizeof(namelen)) < 0) {
-error_setg(errp, "failed to read option name length");
+if (read_sync(ioc, , sizeof(namelen), errp) < 0) {
+error_prepend(errp, "failed to read option name length");
 nbd_send_opt_abort(ioc);
 return -1;
 }
@@ -333,8 +333,8 @@ static int nbd_receive_list(QIOChannel *ioc, const char *want, bool *match,
 return -1;
 }
 if (namelen != strlen(want)) {
-if (drop_sync(ioc, len) < 0) {
-error_setg(errp, "failed to skip export name with wrong length");
+if (drop_sync(ioc, len, errp) < 0) {
+error_prepend(errp, "failed to skip export name with wrong length");
 nbd_send_opt_abort(ioc);
 return -1;
 }
@@ -342,15 +342,15 @@ static int nbd_receive_list(QIOChannel *ioc, const char *want, bool *match,
 }
 
 assert(namelen < sizeof(name));
-if (read_sync(ioc, name, namelen) < 0) {
-error_setg(errp, "failed to read export name");
+if (read_sync(ioc, name, namelen, errp) < 0) {
+error_prepend(errp, "failed to read export name");
 nbd_send_opt_abort(ioc);
 return -1;
 }
 name[namelen] = '\0';
 len -= namelen;
-if (drop_sync(ioc, len) < 0) {
-error_setg(errp, "failed to read export description");
+if (drop_sync(ioc, len, errp) < 0) {
+error_prepend(errp, "failed to read export description");
 nbd_send_opt_abort(ioc);
 return -1;
 }
@@ -476,8 +476,8 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name, uint16_t *flags,
 goto fail;
 }
 
- 

[Qemu-devel] [PULL 1/3] audio: Move arch_init audio code to hw/audio/soundhw.c

2017-05-19 Thread Gerd Hoffmann
From: Eduardo Habkost 

There's no reason to keep the soundhw table in arch_init.c. Move
that code to a new hw/audio/soundhw.c file.

While moving the code, trivial coding style issues were fixed.

Signed-off-by: Eduardo Habkost 
Reviewed-by: David Gibson 
Reviewed-by: Thomas Huth 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20170508205735.23444-2-ehabk...@redhat.com
Signed-off-by: Gerd Hoffmann 
---
 include/hw/audio/audio.h   |   3 +
 include/sysemu/arch_init.h |   2 -
 arch_init.c| 124 ---
 hw/audio/soundhw.c | 156 +
 hw/ppc/prep.c  |   1 +
 vl.c   |   1 +
 hw/audio/Makefile.objs |   2 +
 7 files changed, 163 insertions(+), 126 deletions(-)
 create mode 100644 hw/audio/soundhw.c

diff --git a/include/hw/audio/audio.h b/include/hw/audio/audio.h
index 55d40f71bf..259bb2cf96 100644
--- a/include/hw/audio/audio.h
+++ b/include/hw/audio/audio.h
@@ -7,4 +7,7 @@ void isa_register_soundhw(const char *name, const char *descr,
 void pci_register_soundhw(const char *name, const char *descr,
   int (*init_pci)(PCIBus *bus));
 
+void audio_init(void);
+void select_soundhw(const char *optarg);
+
 #endif
diff --git a/include/sysemu/arch_init.h b/include/sysemu/arch_init.h
index 2bf16b203c..8751c468ed 100644
--- a/include/sysemu/arch_init.h
+++ b/include/sysemu/arch_init.h
@@ -28,8 +28,6 @@ enum {
 
 extern const uint32_t arch_type;
 
-void select_soundhw(const char *optarg);
-void audio_init(void);
 int kvm_available(void);
 int xen_available(void);
 
diff --git a/arch_init.c b/arch_init.c
index 0810116144..74ca62f508 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -85,130 +85,6 @@ int graphic_depth = 32;
 
 const uint32_t arch_type = QEMU_ARCH;
 
-struct soundhw {
-const char *name;
-const char *descr;
-int enabled;
-int isa;
-union {
-int (*init_isa) (ISABus *bus);
-int (*init_pci) (PCIBus *bus);
-} init;
-};
-
-static struct soundhw soundhw[9];
-static int soundhw_count;
-
-void isa_register_soundhw(const char *name, const char *descr,
-  int (*init_isa)(ISABus *bus))
-{
-assert(soundhw_count < ARRAY_SIZE(soundhw) - 1);
-soundhw[soundhw_count].name = name;
-soundhw[soundhw_count].descr = descr;
-soundhw[soundhw_count].isa = 1;
-soundhw[soundhw_count].init.init_isa = init_isa;
-soundhw_count++;
-}
-
-void pci_register_soundhw(const char *name, const char *descr,
-  int (*init_pci)(PCIBus *bus))
-{
-assert(soundhw_count < ARRAY_SIZE(soundhw) - 1);
-soundhw[soundhw_count].name = name;
-soundhw[soundhw_count].descr = descr;
-soundhw[soundhw_count].isa = 0;
-soundhw[soundhw_count].init.init_pci = init_pci;
-soundhw_count++;
-}
-
-void select_soundhw(const char *optarg)
-{
-struct soundhw *c;
-
-if (is_help_option(optarg)) {
-show_valid_cards:
-
-if (soundhw_count) {
- printf("Valid sound card names (comma separated):\n");
- for (c = soundhw; c->name; ++c) {
- printf ("%-11s %s\n", c->name, c->descr);
- }
- printf("\n-soundhw all will enable all of the above\n");
-} else {
- printf("Machine has no user-selectable audio hardware "
-"(it may or may not have always-present audio hardware).\n");
-}
-exit(!is_help_option(optarg));
-}
-else {
-size_t l;
-const char *p;
-char *e;
-int bad_card = 0;
-
-if (!strcmp(optarg, "all")) {
-for (c = soundhw; c->name; ++c) {
-c->enabled = 1;
-}
-return;
-}
-
-p = optarg;
-while (*p) {
-e = strchr(p, ',');
-l = !e ? strlen(p) : (size_t) (e - p);
-
-for (c = soundhw; c->name; ++c) {
-if (!strncmp(c->name, p, l) && !c->name[l]) {
-c->enabled = 1;
-break;
-}
-}
-
-if (!c->name) {
-if (l > 80) {
-error_report("Unknown sound card name (too big to show)");
-}
-else {
-error_report("Unknown sound card name `%.*s'",
- (int) l, p);
-}
-bad_card = 1;
-}
-p += l + (e != NULL);
-}
-
-if (bad_card) {
-goto show_valid_cards;
-}
-}
-}
-
-void audio_init(void)
-{
-struct soundhw *c;
-ISABus *isa_bus = (ISABus *) object_resolve_path_type("", TYPE_ISA_BUS, NULL);
-PCIBus *pci_bus = (PCIBus *) object_resolve_path_type("", TYPE_PCI_BUS, NULL);
-
-for (c = soundhw; c->name; 

Re: [Qemu-devel] [PATCH 6/6] spec/vhost-user spec: Add IOMMU support

2017-05-19 Thread Maxime Coquelin



On 05/19/2017 08:48 AM, Jason Wang wrote:



On 2017-05-17 22:10, Maxime Coquelin wrote:



On 05/17/2017 04:53 AM, Jason Wang wrote:



On 2017-05-16 23:16, Michael S. Tsirkin wrote:

On Mon, May 15, 2017 at 01:45:28PM +0800, Jason Wang wrote:


On 2017-05-13 08:02, Michael S. Tsirkin wrote:

On Fri, May 12, 2017 at 04:21:58PM +0200, Maxime Coquelin wrote:

On 05/11/2017 08:25 PM, Michael S. Tsirkin wrote:

On Thu, May 11, 2017 at 02:32:46PM +0200, Maxime Coquelin wrote:

This patch specifies and implements the master/slave communication
to support device IOTLB in slave.

The vhost_iotlb_msg structure introduced for kernel backends is
re-used, making the design close between the two backends.

An exception is the use of the secondary channel to enable the
slave to send IOTLB miss requests to the master.

Signed-off-by: Maxime Coquelin 
---
docs/specs/vhost-user.txt | 75 +++
hw/virtio/vhost-user.c    | 31 
2 files changed, 106 insertions(+)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 5fa7016..4a1f0c3 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -97,6 +97,23 @@ Depending on the request type, payload can be:
   log offset: offset from start of supplied file descriptor
   where logging starts (i.e. where guest address 0 
would be logged)

+ * An IOTLB message
+ -
+   | iova | size | user address | permissions flags | type |
+ -
+
+   IOVA: a 64-bit guest I/O virtual address

guest -> VM

Ok.


+   Size: a 64-bit size

How do you specify "all memory"? give special meaning to size 0?

Good point, it does not support all memory currently.
It is not vhost-user specific, but general to the vhost 
implementation.

But iommu needs it to support passthrough.

Probably not, we will just pass the mappings in vhost_memory_region to
vhost. Its memory_size is also a __u64.

Thanks

That's different since that's chunks of qemu virtual memory.

IOMMU maps IOVA to GPA.



But we in fact cache the IOVA -> HVA mapping in the remote IOTLB. When 
passthrough mode is enabled, IOVA == GPA, so passing mappings in 
vhost_memory_region should be fine.


Not sure this is a good idea, because when configured in passthrough,
QEMU will see the IOMMU as enabled, so the VIRTIO_F_IOMMU_PLATFORM
feature will be negotiated if both guest and backend support it.
So how will the backend know whether it should pick the translation
directly from the vhost_memory_region, or translate it
through the device IOTLB?


There is no need for the backend to know about this: since IOVA equals GPA, 
vhost_memory_region stores the IOVA -> HVA mapping. If we pass them, device 
IOTLB should work as usual?


Ok, I think there was a misunderstanding. I understood you said there
was no need to use the device IOTLB in this case.





Maybe the solution would be for QEMU to wrap "all memory" IOTLB updates
& invalidations to vhost_memory_regions, since the backend won't anyway
be able to perform accesses outside these regions?


This is just what I mean, you can refer to Peter's series.


The only possible "issue" with "all memory" is if you can not use a 
single TLB invalidation to invalidate all caches in remote TLB.


If needed, maybe we could introduce a new VHOST_IOTLB_INVALIDATE message
type?
For older kernel backend that doesn't support it, -EINVAL will be
returned, so QEMU could handle it another way in this case.


We could, but not sure it was really needed.


I meant VHOST_IOTLB_INVALIDATE_ALL, and yes, I'm not sure this is
needed. But this is an option we have if it turns out to be needed at some point.

Thanks,
Maxime



[Qemu-devel] [PATCH v1 13/13] iotest 046: test simultaneous cluster write error case

2017-05-19 Thread Anton Nefedov
Signed-off-by: Anton Nefedov 
---
 tests/qemu-iotests/046 | 38 +-
 tests/qemu-iotests/046.out | 23 +++
 2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/046 b/tests/qemu-iotests/046
index f2ebecf..c210b55 100755
--- a/tests/qemu-iotests/046
+++ b/tests/qemu-iotests/046
@@ -29,7 +29,8 @@ status=1  # failure is the default!
 
 _cleanup()
 {
-   _cleanup_test_img
+_cleanup_test_img
+rm "$TEST_DIR/blkdebug.conf"
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
 
@@ -188,6 +189,37 @@ overlay_io | $QEMU_IO blkdebug::"$TEST_IMG" | 
_filter_qemu_io |\
sed -e 's/bytes at offset [0-9]*/bytes at offset XXX/g'
 
 echo
+echo "== Concurrency error case =="
+
+# 1. 1st request allocated the cluster, stop before it writes and updates L2
+# 2. 2nd request starts at the same cluster must complete write and start
+#waiting for the 1st to update L2
+# 3. Resume the 1st request to make it fail (injected error)
+# 4. 2nd request must wake and fail as well
+#1 cluster will end up leaked
+cat > "$TEST_DIR/blkdebug.conf" <

[Qemu-devel] [PATCH v1 02/13] qcow2: is_zero_sectors(): return true if area is outside of backing file

2017-05-19 Thread Anton Nefedov
In such a case, bdrv_get_block_status() shall return 0 with *nr == 0.

iotest 154 updated accordingly: write-zeroes tail alignment can be detected
as zeroes now, so pwrite_zeroes succeeds
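
As an illustration only, the clamping described above can be sketched as a
standalone function. This is not the QEMU code: the helper name clamp_count()
and the plain int64_t types are assumptions made for the example.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the clamping in is_zero_sectors():
 * limit `count` so that [start, start + count) stays inside an image of
 * `total_sectors` sectors. A start at or past the end of the image yields
 * 0 rather than the out-of-range result of total_sectors - start.
 */
static int64_t clamp_count(int64_t start, int64_t count, int64_t total_sectors)
{
    if (start + count > total_sectors) {
        count = start < total_sectors ? total_sectors - start : 0;
    }
    return count;
}
```

A request that starts past the end of the image is clamped to zero sectors,
which is the case the patch then treats as reading zeroes.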

Signed-off-by: Anton Nefedov 
---
 block/qcow2.c  | 6 --
 tests/qemu-iotests/154.out | 4 ++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 2e6a0ec..b885dfc 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2482,7 +2482,7 @@ static bool is_zero_sectors(BlockDriverState *bs, int64_t 
start,
 int64_t res;
 
 if (start + count > bs->total_sectors) {
-count = bs->total_sectors - start;
+count = start < bs->total_sectors ? bs->total_sectors - start : 0;
 }
 
 if (!count) {
@@ -2490,7 +2490,9 @@ static bool is_zero_sectors(BlockDriverState *bs, int64_t 
start,
 }
 res = bdrv_get_block_status_above(bs, NULL, start, count,
   &nr, &file);
-return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == count;
+return res >= 0
+&& (((res & BDRV_BLOCK_ZERO) && nr == count)
+|| nr == 0);
 }
 
 static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
diff --git a/tests/qemu-iotests/154.out b/tests/qemu-iotests/154.out
index d8485ee..259340e 100644
--- a/tests/qemu-iotests/154.out
+++ b/tests/qemu-iotests/154.out
@@ -322,7 +322,7 @@ wrote 1024/1024 bytes at offset 134218240
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 2048/2048 bytes allocated at offset 128 MiB
 [{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
-{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, 
"offset": OFFSET}]
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}]
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 
backing_file=TEST_DIR/t.IMGFMT.base
 wrote 2048/2048 bytes at offset 134217728
 2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
@@ -348,7 +348,7 @@ wrote 1024/1024 bytes at offset 134218240
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 2048/2048 bytes allocated at offset 128 MiB
 [{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
-{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, 
"offset": OFFSET}]
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}]
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 
backing_file=TEST_DIR/t.IMGFMT.base
 wrote 2048/2048 bytes at offset 134217728
 2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-- 
2.7.4




[Qemu-devel] [PATCH v2 3/4] migration: avoid recursive AioContext locking in save_vmstate()

2017-05-19 Thread Stefan Hajnoczi
AioContext was designed to allow nested acquire/release calls.  It uses
a recursive mutex so callers don't need to worry about nesting...or so
we thought.

BDRV_POLL_WHILE() is used to wait for block I/O requests.  It releases
the AioContext temporarily around aio_poll().  This gives IOThreads a
chance to acquire the AioContext to process I/O completions.

It turns out that recursive locking and BDRV_POLL_WHILE() don't mix.
BDRV_POLL_WHILE() only releases the AioContext once, so the IOThread
will not be able to acquire the AioContext if it was acquired
multiple times.

Instead of trying to release AioContext n times in BDRV_POLL_WHILE(),
this patch simply avoids nested locking in save_vmstate().  It's the
simplest fix and we should step back to consider the big picture with
all the recent changes to block layer threading.

This patch is the final fix to solve 'savevm' hanging with -object
iothread.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Paolo Bonzini 
---
 migration/savevm.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index f5e8194..3ca319f 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2150,6 +2150,14 @@ int save_vmstate(const char *name, Error **errp)
 goto the_end;
 }
 
+/* The bdrv_all_create_snapshot() call that follows acquires the AioContext
+ * for itself.  BDRV_POLL_WHILE() does not support nested locking because
+ * it only releases the lock once.  Therefore synchronous I/O will deadlock
+ * unless we release the AioContext before bdrv_all_create_snapshot().
+ */
+aio_context_release(aio_context);
+aio_context = NULL;
+
 ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, &bs);
 if (ret < 0) {
 error_setg(errp, "Error while creating snapshot on '%s'",
@@ -2160,7 +2168,9 @@ int save_vmstate(const char *name, Error **errp)
 ret = 0;
 
  the_end:
-aio_context_release(aio_context);
+if (aio_context) {
+aio_context_release(aio_context);
+}
 if (saved_vm_running) {
 vm_start();
 }
-- 
2.9.3




[Qemu-devel] [PATCH v2 1/4] spapr_cpu_core: drop reference on ICP object during CPU realization

2017-05-19 Thread Greg Kurz
When a piece of code allocates an object, it implicitly gets a reference
on it. If it then makes that object a child property of another object, it
should drop its own reference at some point otherwise the child object can
never be finalized. The current code hence leaks one ICP object per CPU
when hot-removing a core.
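
The reference counting rule can be shown with a toy model. This is a sketch
under stated assumptions, not the QOM API: ToyObject and the toy_*()
functions are hypothetical and only mimic the "creator holds one reference,
parent holds another" behaviour described above.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a reference-counted object; not the QOM API. */
typedef struct ToyObject {
    int ref;
    bool *finalized;    /* set to true when the last reference is dropped */
} ToyObject;

static ToyObject *toy_new(bool *finalized)
{
    ToyObject *obj = malloc(sizeof(*obj));

    if (!obj) {
        abort();
    }
    obj->ref = 1;               /* the creator's implicit reference */
    obj->finalized = finalized;
    *finalized = false;
    return obj;
}

static void toy_unref(ToyObject *obj)
{
    if (--obj->ref == 0) {
        *obj->finalized = true;
        free(obj);
    }
}

/* The parent takes its own reference when the child is added... */
static void toy_add_child(ToyObject *child)
{
    child->ref++;
}

/* ...and drops only that reference when the child is unparented. */
static void toy_unparent(ToyObject *child)
{
    toy_unref(child);
}

/* Returns true if the child was finalized once it was unparented. */
static bool child_finalized_after_unparent(bool creator_drops_ref)
{
    bool finalized;
    ToyObject *child = toy_new(&finalized);

    toy_add_child(child);
    if (creator_drops_ref) {
        toy_unref(child);       /* what the patch adds */
    }
    toy_unparent(child);
    if (!finalized) {
        free(child);            /* leaked in real code; freed here for the demo */
    }
    return finalized;
}
```

Without the creator's unref the count never reaches zero after unparenting,
which corresponds to the leak of one ICP object per CPU.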

Failing to add a newly allocated ICP object to the CPU is a bug. While here,
let's ensure QEMU aborts if this ever happens.

Signed-off-by: Greg Kurz 
---
 hw/ppc/spapr_cpu_core.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 1df1404ea52d..ff7058ecc00e 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -143,7 +143,8 @@ static void spapr_cpu_core_realize_child(Object *child, 
Error **errp)
 Object *obj;
 
 obj = object_new(spapr->icp_type);
-object_property_add_child(OBJECT(cpu), "icp", obj, NULL);
+object_property_add_child(OBJECT(cpu), "icp", obj, &error_abort);
+object_unref(obj);
 object_property_add_const_link(obj, "xics", OBJECT(spapr), &error_abort);
 object_property_set_bool(obj, true, "realized", &local_err);
 if (local_err) {




[Qemu-devel] [PATCH v2 0/4] block: fix 'savevm' hang with -object iothread

2017-05-19 Thread Stefan Hajnoczi
v2:
 * New patch to use bdrv_drain_all_begin/end() in savevm/loadvm [Kevin]
   (All other patches unchanged)

The 'savevm' command hangs when -object iothread is used.  See patches for
details, but basically the vmstate read/write code didn't conform to the latest
block layer locking rules.

Stefan Hajnoczi (4):
  block: count bdrv_co_rw_vmstate() requests
  block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()
  migration: avoid recursive AioContext locking in save_vmstate()
  migration: use bdrv_drain_all_begin/end() instead bdrv_drain_all()

 block/io.c | 21 +
 migration/savevm.c | 21 +++--
 2 files changed, 32 insertions(+), 10 deletions(-)

-- 
2.9.3




[Qemu-devel] [PATCH v2 4/4] spapr: fix migration of ICP objects from/to older QEMU

2017-05-19 Thread Greg Kurz
Commit 5bc8d26de20c ("spapr: allocate the ICPState object from under
sPAPRCPUCore") moved ICP objects from the machine to CPU cores. This
is an improvement since we no longer allocate ICP objects that will
never be used. But it has the side-effect of breaking migration of
older machine types from older QEMU versions.

This patch introduces a compat flag in the sPAPR machine class so
that all pseries machines up to 2.9 keep the previous behavior
of pre-allocating ICP objects.
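
The way such a compat flag propagates to older machine types can be sketched
with a toy model. The names below (ToyMachineClass, toy_*_class_options) are
hypothetical and only imitate the chaining pattern where each older machine
version first applies the options of the next newer one; they are not the
sPAPR code.

```c
#include <stdbool.h>

/* Toy machine class carrying a single compat flag. */
typedef struct ToyMachineClass {
    bool pre_2_10_icp_allocation;
} ToyMachineClass;

/* Newest machine type: new behaviour, flag left off. */
static void toy_2_10_class_options(ToyMachineClass *mc)
{
    mc->pre_2_10_icp_allocation = false;
}

/* 2.9 inherits the 2.10 options, then switches the old behaviour on. */
static void toy_2_9_class_options(ToyMachineClass *mc)
{
    toy_2_10_class_options(mc);
    mc->pre_2_10_icp_allocation = true;
}

/* Anything older chains to 2.9 and therefore keeps the flag set. */
static void toy_2_8_class_options(ToyMachineClass *mc)
{
    toy_2_9_class_options(mc);
}
```

Because every older version chains through the 2.9 options, setting the flag
in one place is enough to cover all machine types up to 2.9.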

Signed-off-by: Greg Kurz 
---
v2: - s/void* /void * in xics_system_init()
- don't use "[*]" in the ICP object name
- use pre_2_10_ prefix in field names
- added xics_nr_servers() helper
---
 hw/ppc/spapr.c  |   40 +++-
 hw/ppc/spapr_cpu_core.c |   29 -
 include/hw/ppc/spapr.h  |2 ++
 3 files changed, 61 insertions(+), 10 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 1bb05a9a6b07..182262257c60 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -123,9 +123,15 @@ error:
 return NULL;
 }
 
+static inline int xics_nr_servers(void)
+{
+return ppc_cpu_dt_id_from_index(max_cpus);
+}
+
 static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp)
 {
 sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
+sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
 
 if (kvm_enabled()) {
 if (machine_kernel_irqchip_allowed(machine) &&
@@ -147,6 +153,35 @@ static void xics_system_init(MachineState *machine, int 
nr_irqs, Error **errp)
 return;
 }
 }
+
+if (smc->pre_2_10_icp_allocation) {
+int nr_servers = xics_nr_servers();
+Error *local_err = NULL;
+int i;
+
+spapr->pre_2_10_icps = g_malloc0(nr_servers * sizeof(ICPState));
+
+for (i = 0; i < nr_servers; i++) {
+void *obj = &spapr->pre_2_10_icps[i];
+char *name = g_strdup_printf("icp[%d]", i);
+
+object_initialize(obj, sizeof(ICPState), spapr->icp_type);
+object_property_add_child(OBJECT(spapr), name, obj, &error_abort);
+g_free(name);
+object_unref(obj);
+object_property_add_const_link(obj, "xics", OBJECT(spapr),
+   &error_abort);
+object_property_set_bool(obj, true, "realized", &local_err);
+if (local_err) {
+while (i--) {
+object_unparent(obj);
+}
+g_free(spapr->pre_2_10_icps);
+error_propagate(errp, local_err);
+break;
+}
+}
+}
 }
 
 static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
@@ -1020,7 +1055,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
 _FDT(fdt_setprop_cell(fdt, 0, "#size-cells", 2));
 
 /* /interrupt controller */
-spapr_dt_xics(ppc_cpu_dt_id_from_index(max_cpus), fdt, PHANDLE_XICP);
+spapr_dt_xics(xics_nr_servers(), fdt, PHANDLE_XICP);
 
 ret = spapr_populate_memory(spapr, fdt);
 if (ret < 0) {
@@ -3286,9 +3321,12 @@ static void 
spapr_machine_2_9_instance_options(MachineState *machine)
 
 static void spapr_machine_2_9_class_options(MachineClass *mc)
 {
+sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
+
 spapr_machine_2_10_class_options(mc);
 SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_9);
 mc->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
+smc->pre_2_10_icp_allocation = true;
 }
 
 DEFINE_SPAPR_MACHINE(2_9, "2.9", false);
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index ff7058ecc00e..13c4916aa5e6 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -119,6 +119,7 @@ static void spapr_cpu_core_unrealizefn(DeviceState *dev, 
Error **errp)
 size_t size = object_type_get_instance_size(typename);
 CPUCore *cc = CPU_CORE(dev);
 int i;
+sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
 
 for (i = 0; i < cc->nr_threads; i++) {
 void *obj = sc->threads + i * size;
@@ -127,7 +128,9 @@ static void spapr_cpu_core_unrealizefn(DeviceState *dev, 
Error **errp)
 PowerPCCPU *cpu = POWERPC_CPU(cs);
 
 spapr_cpu_destroy(cpu);
-object_unparent(cpu->intc);
+if (!spapr->pre_2_10_icps) {
+object_unparent(cpu->intc);
+}
 cpu_remove_sync(cs);
 object_unparent(obj);
 }
@@ -142,13 +145,19 @@ static void spapr_cpu_core_realize_child(Object *child, 
Error **errp)
 PowerPCCPU *cpu = POWERPC_CPU(cs);
 Object *obj;
 
-obj = object_new(spapr->icp_type);
-object_property_add_child(OBJECT(cpu), "icp", obj, &error_abort);
-object_unref(obj);
-object_property_add_const_link(obj, "xics", OBJECT(spapr), &error_abort);
-object_property_set_bool(obj, true, "realized", &local_err);
-if (local_err) {
-goto error;
+if (spapr->pre_2_10_icps) {
+int index = cpu->parent_obj.cpu_index;
+

Re: [Qemu-devel] [RFC v5 4/4] hw/intc/arm_gicv3_its: Allow save/restore

2017-05-19 Thread Shannon Zhao


On 2017/4/14 20:46, Eric Auger wrote:
> We change the restoration priority of both the GICv3 and ITS. The
> GICv3 must be restored before the ITS and the ITS needs to be restored
> before PCIe devices since it translates their MSI transactions.
> 
> Signed-off-by: Eric Auger 
> Reviewed-by: Juan Quintela 
> 
> ---
> v2 -> v3:
> - reword migration blocker message
> - remove unmigratable setting to false
> 
> v1 -> v2:
> - handle case where migrate_add_blocker fails
> - add comments along with ITS and GICv3 migration priorities
> ---
>  hw/intc/arm_gicv3_common.c |  1 +
>  hw/intc/arm_gicv3_its_common.c |  2 +-
>  hw/intc/arm_gicv3_its_kvm.c| 24 
>  include/migration/vmstate.h|  2 ++
>  4 files changed, 16 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
> index c6493d6..4228b7c 100644
> --- a/hw/intc/arm_gicv3_common.c
> +++ b/hw/intc/arm_gicv3_common.c
> @@ -145,6 +145,7 @@ static const VMStateDescription vmstate_gicv3 = {
>  .minimum_version_id = 1,
>  .pre_save = gicv3_pre_save,
>  .post_load = gicv3_post_load,
> +.priority = MIG_PRI_GICV3,
>  .fields = (VMStateField[]) {
>  VMSTATE_UINT32(gicd_ctlr, GICv3State),
>  VMSTATE_UINT32_ARRAY(gicd_statusr, GICv3State, 2),
> diff --git a/hw/intc/arm_gicv3_its_common.c b/hw/intc/arm_gicv3_its_common.c
> index efab8c7..22ce4c4 100644
> --- a/hw/intc/arm_gicv3_its_common.c
> +++ b/hw/intc/arm_gicv3_its_common.c
> @@ -48,7 +48,7 @@ static const VMStateDescription vmstate_its = {
>  .name = "arm_gicv3_its",
>  .pre_save = gicv3_its_pre_save,
>  .post_load = gicv3_its_post_load,
> -.unmigratable = true,
> +.priority = MIG_PRI_GICV3_ITS,
>  .fields = (VMStateField[]) {
>  VMSTATE_UINT32(ctlr, GICv3ITSState),
>  VMSTATE_UINT32(iidr, GICv3ITSState),
> diff --git a/hw/intc/arm_gicv3_its_kvm.c b/hw/intc/arm_gicv3_its_kvm.c
> index 7c5502c..8401d2f 100644
> --- a/hw/intc/arm_gicv3_its_kvm.c
> +++ b/hw/intc/arm_gicv3_its_kvm.c
> @@ -77,18 +77,6 @@ static void kvm_arm_its_realize(DeviceState *dev, Error 
> **errp)
>  GICv3ITSState *s = ARM_GICV3_ITS_COMMON(dev);
>  Error *local_err = NULL;
>  
> -/*
> - * Block migration of a KVM GICv3 ITS device: the API for saving and
> - * restoring the state in the kernel is not yet available
> - */
> -error_setg(&s->migration_blocker, "vITS migration is not implemented");
> -migrate_add_blocker(s->migration_blocker, &local_err);
> -if (local_err) {
> -error_propagate(errp, local_err);
> -error_free(s->migration_blocker);
> -return;
> -}
> -
>  s->dev_fd = kvm_create_device(kvm_state, KVM_DEV_TYPE_ARM_VGIC_ITS, 
> false);
>  if (s->dev_fd < 0) {
>  error_setg_errno(errp, -s->dev_fd, "error creating in-kernel ITS");
> @@ -105,6 +93,18 @@ static void kvm_arm_its_realize(DeviceState *dev, Error 
> **errp)
>  
>  gicv3_its_init_mmio(s, NULL);
>  
> +if (!kvm_device_check_attr(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS,
> +GITS_CTLR)) {
> +error_setg(&s->migration_blocker, "This operating system kernel "
> +   "does not support vGICv3 migration");
s/vGICv3/vITS
> +migrate_add_blocker(s->migration_blocker, &local_err);
> +if (local_err) {
> +error_propagate(errp, local_err);
> +error_free(s->migration_blocker);
> +return;
> +}
> +}
> +
>  kvm_msi_use_devid = true;
>  kvm_gsi_direct_mapping = false;
>  kvm_msi_via_irqfd_allowed = kvm_irqfds_enabled();
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index f2dbf84..8dab9c7 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -198,6 +198,8 @@ enum VMStateFlags {
>  typedef enum {
>  MIG_PRI_DEFAULT = 0,
>  MIG_PRI_IOMMU,  /* Must happen before PCI devices */
> +MIG_PRI_GICV3_ITS,  /* Must happen before PCI devices */
> +MIG_PRI_GICV3,  /* Must happen before the ITS */
>  MIG_PRI_MAX,
>  } MigrationPriority;
>  
> 

Thanks,
-- 
Shannon




[Qemu-devel] [PATCH 16/21] vfio/ccw: get irqs info and set the eventfd fd

2017-05-19 Thread Cornelia Huck
From: Dong Jia Shi 

vfio-ccw resorts to the eventfd mechanism to communicate with userspace.
We fetch the irqs info via the ioctl VFIO_DEVICE_GET_IRQ_INFO,
register an event notifier to get the eventfd fd which is sent
to kernel via the ioctl VFIO_DEVICE_SET_IRQS, then we can implement
read operation once kernel sends the signal.
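
The signalling mechanism itself can be tried in isolation with the Linux
eventfd(2) interface. The sketch below assumes Linux and uses a raw eventfd
rather than QEMU's EventNotifier or the VFIO ioctls; the function name
signal_and_drain() is made up for the example, and the writes stand in for
the kernel signalling an interrupt.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Signal an eventfd `times` times, then drain it once the way a notifier
 * handler would. Returns the counter value read, or -1 if nothing was
 * pending or an error occurred.
 */
static int64_t signal_and_drain(unsigned int times)
{
    uint64_t one = 1;
    uint64_t value = 0;
    unsigned int i;
    int fd = eventfd(0, EFD_NONBLOCK);

    if (fd < 0) {
        return -1;
    }

    /* Producer side: each write adds to the eventfd counter. */
    for (i = 0; i < times; i++) {
        if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
            close(fd);
            return -1;
        }
    }

    /* Consumer side: one read returns the sum and resets the counter. */
    if (read(fd, &value, sizeof(value)) != (ssize_t)sizeof(value)) {
        close(fd);          /* EAGAIN when nothing was signalled */
        return -1;
    }

    close(fd);
    return (int64_t)value;
}
```

Several signals that arrive before the handler runs are coalesced into one
read, which is why a handler only needs to test and clear the notifier once
per wakeup.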

Reviewed-by: Eric Auger 
Acked-by: Alex Williamson 
Signed-off-by: Dong Jia Shi 
Message-Id: <20170517004813.58227-10-bjsdj...@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck 
---
 hw/vfio/ccw.c | 101 ++
 1 file changed, 101 insertions(+)

diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 7ddcfd7767..689a7724b6 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -22,6 +22,7 @@
 #include "hw/vfio/vfio-common.h"
 #include "hw/s390x/s390-ccw.h"
 #include "hw/s390x/ccw-device.h"
+#include "qemu/error-report.h"
 
 #define TYPE_VFIO_CCW "vfio-ccw"
 typedef struct VFIOCCWDevice {
@@ -30,6 +31,7 @@ typedef struct VFIOCCWDevice {
 uint64_t io_region_size;
 uint64_t io_region_offset;
 struct ccw_io_region *io_region;
+EventNotifier io_notifier;
 } VFIOCCWDevice;
 
 static void vfio_ccw_compute_needs_reset(VFIODevice *vdev)
@@ -54,6 +56,97 @@ static void vfio_ccw_reset(DeviceState *dev)
 ioctl(vcdev->vdev.fd, VFIO_DEVICE_RESET);
 }
 
+static void vfio_ccw_io_notifier_handler(void *opaque)
+{
+VFIOCCWDevice *vcdev = opaque;
+
+if (!event_notifier_test_and_clear(&vcdev->io_notifier)) {
+return;
+}
+}
+
+static void vfio_ccw_register_io_notifier(VFIOCCWDevice *vcdev, Error **errp)
+{
+VFIODevice *vdev = &vcdev->vdev;
+struct vfio_irq_info *irq_info;
+struct vfio_irq_set *irq_set;
+size_t argsz;
+int32_t *pfd;
+
+if (vdev->num_irqs < VFIO_CCW_IO_IRQ_INDEX + 1) {
+error_setg(errp, "vfio: unexpected number of io irqs %u",
+   vdev->num_irqs);
+return;
+}
+
+argsz = sizeof(*irq_set);
+irq_info = g_malloc0(argsz);
+irq_info->index = VFIO_CCW_IO_IRQ_INDEX;
+irq_info->argsz = argsz;
+if (ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO,
+  irq_info) < 0 || irq_info->count < 1) {
+error_setg_errno(errp, errno, "vfio: Error getting irq info");
+goto out_free_info;
+}
+
+if (event_notifier_init(&vcdev->io_notifier, 0)) {
+error_setg_errno(errp, errno,
+ "vfio: Unable to init event notifier for IO");
+goto out_free_info;
+}
+
+argsz = sizeof(*irq_set) + sizeof(*pfd);
+irq_set = g_malloc0(argsz);
+irq_set->argsz = argsz;
+irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
+ VFIO_IRQ_SET_ACTION_TRIGGER;
+irq_set->index = VFIO_CCW_IO_IRQ_INDEX;
+irq_set->start = 0;
+irq_set->count = 1;
+pfd = (int32_t *) &irq_set->data;
+
+*pfd = event_notifier_get_fd(&vcdev->io_notifier);
+qemu_set_fd_handler(*pfd, vfio_ccw_io_notifier_handler, NULL, vcdev);
+if (ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set)) {
+error_setg(errp, "vfio: Failed to set up io notification");
+qemu_set_fd_handler(*pfd, NULL, NULL, vcdev);
+event_notifier_cleanup(&vcdev->io_notifier);
+}
+
+g_free(irq_set);
+
+out_free_info:
+g_free(irq_info);
+}
+
+static void vfio_ccw_unregister_io_notifier(VFIOCCWDevice *vcdev)
+{
+struct vfio_irq_set *irq_set;
+size_t argsz;
+int32_t *pfd;
+
+argsz = sizeof(*irq_set) + sizeof(*pfd);
+irq_set = g_malloc0(argsz);
+irq_set->argsz = argsz;
+irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
+ VFIO_IRQ_SET_ACTION_TRIGGER;
+irq_set->index = VFIO_CCW_IO_IRQ_INDEX;
+irq_set->start = 0;
+irq_set->count = 1;
+pfd = (int32_t *) &irq_set->data;
+*pfd = -1;
+
+if (ioctl(vcdev->vdev.fd, VFIO_DEVICE_SET_IRQS, irq_set)) {
+error_report("vfio: Failed to de-assign device io fd: %m");
+}
+
+qemu_set_fd_handler(event_notifier_get_fd(&vcdev->io_notifier),
+NULL, NULL, vcdev);
+event_notifier_cleanup(&vcdev->io_notifier);
+
+g_free(irq_set);
+}
+
 static void vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
 {
 VFIODevice *vdev = &vcdev->vdev;
@@ -173,8 +266,15 @@ static void vfio_ccw_realize(DeviceState *dev, Error 
**errp)
 goto out_region_err;
 }
 
+vfio_ccw_register_io_notifier(vcdev, &err);
+if (err) {
+goto out_notifier_err;
+}
+
 return;
 
+out_notifier_err:
+vfio_ccw_put_region(vcdev);
 out_region_err:
 vfio_put_device(vcdev);
 out_device_err:
@@ -195,6 +295,7 @@ static void vfio_ccw_unrealize(DeviceState *dev, Error 
**errp)
 S390CCWDeviceClass *cdc = S390_CCW_DEVICE_GET_CLASS(cdev);
 VFIOGroup *group = vcdev->vdev.group;
 
+vfio_ccw_unregister_io_notifier(vcdev);
 vfio_ccw_put_region(vcdev);
 vfio_put_device(vcdev);
 

[Qemu-devel] [PATCH 09/21] linux-headers: update

2017-05-19 Thread Cornelia Huck
Update against Linux v4.12-rc1.

Also include the new vfio_ccw.h header.

Signed-off-by: Cornelia Huck 
---
 include/standard-headers/asm-x86/hyperv.h  |  7 +-
 include/standard-headers/linux/input-event-codes.h |  1 +
 include/standard-headers/linux/input.h | 11 +---
 include/standard-headers/linux/pci_regs.h  |  3 ++-
 linux-headers/asm-arm/kvm.h| 10 +++-
 linux-headers/asm-arm/unistd-common.h  |  1 +
 linux-headers/asm-arm64/kvm.h  | 10 +++-
 linux-headers/asm-powerpc/kvm.h|  3 +++
 linux-headers/asm-powerpc/unistd.h |  1 +
 linux-headers/asm-s390/kvm.h   | 29 +++---
 linux-headers/asm-s390/unistd.h|  4 ++-
 linux-headers/asm-x86/kvm.h|  3 +++
 linux-headers/asm-x86/unistd_32.h  |  2 ++
 linux-headers/asm-x86/unistd_64.h  |  1 +
 linux-headers/asm-x86/unistd_x32.h |  1 +
 linux-headers/linux/kvm.h  | 25 +++
 linux-headers/linux/userfaultfd.h  | 11 +---
 linux-headers/linux/vfio.h | 18 ++
 linux-headers/linux/vfio_ccw.h | 24 ++
 scripts/update-linux-headers.sh|  2 +-
 20 files changed, 151 insertions(+), 16 deletions(-)
 create mode 100644 linux-headers/linux/vfio_ccw.h

diff --git a/include/standard-headers/asm-x86/hyperv.h 
b/include/standard-headers/asm-x86/hyperv.h
index eca9a2ca22..d0c6e0a079 100644
--- a/include/standard-headers/asm-x86/hyperv.h
+++ b/include/standard-headers/asm-x86/hyperv.h
@@ -124,7 +124,7 @@
   * Recommend using hypercall for address space switches rather
   * than MOV to CR3 instruction
   */
-#define HV_X64_MWAIT_RECOMMENDED   (1 << 0)
+#define HV_X64_AS_SWITCH_RECOMMENDED   (1 << 0)
 /* Recommend using hypercall for local TLB flushes rather
  * than INVLPG or MOV to CR3 instructions */
 #define HV_X64_LOCAL_TLB_FLUSH_RECOMMENDED (1 << 1)
@@ -148,6 +148,11 @@
 #define HV_X64_RELAXED_TIMING_RECOMMENDED  (1 << 5)
 
 /*
+ * Virtual APIC support
+ */
+#define HV_X64_DEPRECATING_AEOI_RECOMMENDED(1 << 9)
+
+/*
  * Crash notification flag.
  */
 #define HV_CRASH_CTL_CRASH_NOTIFY (1ULL << 63)
diff --git a/include/standard-headers/linux/input-event-codes.h 
b/include/standard-headers/linux/input-event-codes.h
index c8b3338375..29d463af37 100644
--- a/include/standard-headers/linux/input-event-codes.h
+++ b/include/standard-headers/linux/input-event-codes.h
@@ -641,6 +641,7 @@
  * e.g. teletext or data broadcast application (MHEG, MHP, HbbTV, etc.)
  */
 #define KEY_DATA   0x277
+#define KEY_ONSCREEN_KEYBOARD  0x278
 
 #define BTN_TRIGGER_HAPPY  0x2c0
 #define BTN_TRIGGER_HAPPY1 0x2c0
diff --git a/include/standard-headers/linux/input.h 
b/include/standard-headers/linux/input.h
index b472b8530c..666e201ddb 100644
--- a/include/standard-headers/linux/input.h
+++ b/include/standard-headers/linux/input.h
@@ -58,9 +58,14 @@ struct input_id {
  * Note that input core does not clamp reported values to the
  * [minimum, maximum] limits, such task is left to userspace.
  *
- * Resolution for main axes (ABS_X, ABS_Y, ABS_Z) is reported in
- * units per millimeter (units/mm), resolution for rotational axes
- * (ABS_RX, ABS_RY, ABS_RZ) is reported in units per radian.
+ * The default resolution for main axes (ABS_X, ABS_Y, ABS_Z)
+ * is reported in units per millimeter (units/mm), resolution
+ * for rotational axes (ABS_RX, ABS_RY, ABS_RZ) is reported
+ * in units per radian.
+ * When INPUT_PROP_ACCELEROMETER is set the resolution changes.
+ * The main axes (ABS_X, ABS_Y, ABS_Z) are then reported in
+ * in units per g (units/g) and in units per degree per second
+ * (units/deg/s) for rotational axes (ABS_RX, ABS_RY, ABS_RZ).
  */
 struct input_absinfo {
int32_t value;
diff --git a/include/standard-headers/linux/pci_regs.h 
b/include/standard-headers/linux/pci_regs.h
index 634c9c44ed..d56bb00510 100644
--- a/include/standard-headers/linux/pci_regs.h
+++ b/include/standard-headers/linux/pci_regs.h
@@ -114,7 +114,7 @@
 #define PCI_SUBSYSTEM_ID   0x2e
 #define PCI_ROM_ADDRESS0x30/* Bits 31..11 are address, 
10..1 reserved */
 #define  PCI_ROM_ADDRESS_ENABLE0x01
-#define PCI_ROM_ADDRESS_MASK   (~0x7ffUL)
+#define PCI_ROM_ADDRESS_MASK   (~0x7ffU)
 
 #define PCI_CAPABILITY_LIST0x34/* Offset of first capability list 
entry */
 
@@ -630,6 +630,7 @@
 #define  PCI_EXP_DEVCTL2_COMP_TIMEOUT  0x000f  /* Completion Timeout Value */
 #define  PCI_EXP_DEVCTL2_ARI   0x0020  /* Alternative Routing-ID */
 #define PCI_EXP_DEVCTL2_ATOMIC_REQ 0x0040  /* Set Atomic requests */
+#define PCI_EXP_DEVCTL2_ATOMIC_EGRESS_BLOCK 0x0080 /* Block atomic 

[Qemu-devel] [PATCH 20/21] MAINTAINERS: Add vfio-ccw maintainer

2017-05-19 Thread Cornelia Huck
From: Dong Jia Shi 

Add Cornelia Huck as the vfio-ccw maintainer.

Acked-by: Alex Williamson 
Signed-off-by: Dong Jia Shi 
Message-Id: <20170517004813.58227-14-bjsdj...@linux.vnet.ibm.com>
[CH: add tree]
Signed-off-by: Cornelia Huck 
---
 MAINTAINERS | 8 
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ef2ec58a94..7df088259b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1005,6 +1005,14 @@ S: Supported
 F: hw/vfio/*
 F: include/hw/vfio/
 
+vfio-ccw
+M: Cornelia Huck 
+S: Supported
+F: hw/vfio/ccw.c
+F: hw/s390x/s390-ccw.c
+F: include/hw/s390x/s390-ccw.h
+T: git git://github.com/cohuck/qemu.git s390-next
+
 vhost
 M: Michael S. Tsirkin 
 S: Supported
-- 
2.13.0




[Qemu-devel] [PULL 17/20] vhost-user-scsi: Introduce vhost-user-scsi host device

2017-05-19 Thread Paolo Bonzini
From: Felipe Franciosi 

This commit introduces a vhost-user device for SCSI. This is based
on the existing vhost-scsi implementation, but done over vhost-user
instead. It also uses a chardev to connect to the backend. Unlike
vhost-scsi (today), VMs using vhost-user-scsi can be live migrated.

To use it, start Qemu with a command line equivalent to:

qemu-system-x86_64 \
   -chardev socket,id=vus0,path=/tmp/vus.sock \
   -device vhost-user-scsi-pci,chardev=vus0,bus=pci.0,addr=...

A separate commit presents a sample application linked with libiscsi to
provide a backend for vhost-user-scsi.

Signed-off-by: Felipe Franciosi 
Message-Id: <1488479153-21203-4-git-send-email-fel...@nutanix.com>
---
 .gitignore  |   1 +
 default-configs/pci.mak |   1 +
 default-configs/s390x-softmmu.mak   |   1 +
 hw/scsi/Makefile.objs   |   1 +
 hw/scsi/vhost-user-scsi.c   | 215 
 hw/virtio/virtio-pci.c  |  54 +
 hw/virtio/virtio-pci.h  |  11 ++
 include/hw/virtio/vhost-user-scsi.h |  35 ++
 include/hw/virtio/virtio-scsi.h |   3 +
 9 files changed, 322 insertions(+)
 create mode 100644 hw/scsi/vhost-user-scsi.c
 create mode 100644 include/hw/virtio/vhost-user-scsi.h

diff --git a/.gitignore b/.gitignore
index 55a001e..fa96bd2 100644
--- a/.gitignore
+++ b/.gitignore
@@ -50,6 +50,7 @@
 /qemu-version.h.tmp
 /module_block.h
 /vscclient
+/vhost-user-scsi
 /fsdev/virtfs-proxy-helper
 *.[1-9]
 *.a
diff --git a/default-configs/pci.mak b/default-configs/pci.mak
index 60dc651..ada9c6f 100644
--- a/default-configs/pci.mak
+++ b/default-configs/pci.mak
@@ -42,3 +42,4 @@ CONFIG_VGA=y
 CONFIG_VGA_PCI=y
 CONFIG_IVSHMEM=$(CONFIG_EVENTFD)
 CONFIG_ROCKER=y
+CONFIG_VHOST_USER_SCSI=$(CONFIG_POSIX)
diff --git a/default-configs/s390x-softmmu.mak 
b/default-configs/s390x-softmmu.mak
index 9615a48..9a0b6d9 100644
--- a/default-configs/s390x-softmmu.mak
+++ b/default-configs/s390x-softmmu.mak
@@ -1,5 +1,6 @@
 CONFIG_PCI=y
 CONFIG_VIRTIO_PCI=y
+CONFIG_VHOST_USER_SCSI=y
 CONFIG_VIRTIO=y
 CONFIG_SCLPCONSOLE=y
 CONFIG_TERMINAL3270=y
diff --git a/hw/scsi/Makefile.objs b/hw/scsi/Makefile.objs
index 54d8754..b188f72 100644
--- a/hw/scsi/Makefile.objs
+++ b/hw/scsi/Makefile.objs
@@ -11,4 +11,5 @@ obj-$(CONFIG_PSERIES) += spapr_vscsi.o
 ifeq ($(CONFIG_VIRTIO),y)
 obj-y += virtio-scsi.o virtio-scsi-dataplane.o
 obj-$(CONFIG_VHOST_SCSI) += vhost-scsi-common.o vhost-scsi.o
+obj-$(CONFIG_VHOST_USER_SCSI) += vhost-scsi-common.o vhost-user-scsi.o
 endif
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
new file mode 100644
index 000..694a637
--- /dev/null
+++ b/hw/scsi/vhost-user-scsi.c
@@ -0,0 +1,215 @@
+/*
+ * vhost-user-scsi host device
+ *
+ * Copyright (c) 2016 Nutanix Inc. All rights reserved.
+ *
+ * Author:
+ *  Felipe Franciosi 
+ *
+ * This work is largely based on the "vhost-scsi" implementation by:
+ *  Stefan Hajnoczi
+ *  Nicholas Bellinger 
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "migration/vmstate.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "qemu/typedefs.h"
+#include "qom/object.h"
+#include "hw/fw-path-provider.h"
+#include "hw/qdev-core.h"
+#include "hw/virtio/vhost.h"
+#include "hw/virtio/vhost-backend.h"
+#include "hw/virtio/vhost-user-scsi.h"
+#include "hw/virtio/virtio.h"
+#include "hw/virtio/virtio-access.h"
+#include "sysemu/char.h"
+
+/* Features supported by the host application */
+static const int user_feature_bits[] = {
+VIRTIO_F_NOTIFY_ON_EMPTY,
+VIRTIO_RING_F_INDIRECT_DESC,
+VIRTIO_RING_F_EVENT_IDX,
+VIRTIO_SCSI_F_HOTPLUG,
+VHOST_INVALID_FEATURE_BIT
+};
+
+static void vhost_user_scsi_set_status(VirtIODevice *vdev, uint8_t status)
+{
+VHostUserSCSI *s = (VHostUserSCSI *)vdev;
+VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
+bool start = (status & VIRTIO_CONFIG_S_DRIVER_OK) && vdev->vm_running;
+
+if (vsc->dev.started == start) {
+return;
+}
+
+if (start) {
+int ret;
+
+ret = vhost_scsi_common_start(vsc);
+if (ret < 0) {
+error_report("unable to start vhost-user-scsi: %s", strerror(-ret));
+exit(1);
+}
+} else {
+vhost_scsi_common_stop(vsc);
+}
+}
+
+static void vhost_dummy_handle_output(VirtIODevice *vdev, VirtQueue *vq)
+{
+}
+
+static void vhost_user_scsi_save(QEMUFile *f, void *opaque)
+{
+VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
+virtio_save(vdev, f);
+}
+
+static int vhost_user_scsi_load(QEMUFile *f, void *opaque, int version_id)
+{
+VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
+return virtio_load(vdev, f, version_id);
+}
+
+static void 

[Qemu-devel] [PULL 18/20] vhost-user-scsi: Introduce a vhost-user-scsi sample application

2017-05-19 Thread Paolo Bonzini
From: Felipe Franciosi 

This commit introduces a vhost-user-scsi backend sample application. It
must be linked with libiscsi and libvhost-user.

To use it, compile with:
  $ make vhost-user-scsi

And run as follows:
  $ ./vhost-user-scsi -u vus.sock -i iscsi://uri_to_target/
  $ qemu-system-x86_64 --enable-kvm -m 512 \
  -object memory-backend-file,id=mem,size=512m,share=on,mem-path=guestmem \
  -numa node,memdev=mem \
  -chardev socket,id=vhost-user-scsi,path=vus.sock \
  -device vhost-user-scsi-pci,chardev=vhost-user-scsi \

The application is currently limited to a single LUN and it processes
requests synchronously (therefore achieving only QD1). The purpose of
the code is to show how a backend can be implemented and to test the
vhost-user-scsi QEMU implementation.

If a different instance of this vhost-user-scsi application is executed
at a remote host, a VM can be live migrated to such a host.

Signed-off-by: Felipe Franciosi 
Message-Id: <1488479153-21203-5-git-send-email-fel...@nutanix.com>
---
 Makefile  |   3 +
 Makefile.objs |   4 +
 contrib/vhost-user-scsi/Makefile.objs |   1 +
 contrib/vhost-user-scsi/vhost-user-scsi.c | 886 ++
 4 files changed, 894 insertions(+)
 create mode 100644 contrib/vhost-user-scsi/Makefile.objs
 create mode 100644 contrib/vhost-user-scsi/vhost-user-scsi.c

diff --git a/Makefile b/Makefile
index c830d7a..e14988d 100644
--- a/Makefile
+++ b/Makefile
@@ -269,6 +269,7 @@ dummy := $(call unnest-vars,, \
 ivshmem-client-obj-y \
 ivshmem-server-obj-y \
 libvhost-user-obj-y \
+vhost-user-scsi-obj-y \
 qga-vss-dll-obj-y \
 block-obj-y \
 block-obj-m \
@@ -473,6 +474,8 @@ ivshmem-client$(EXESUF): $(ivshmem-client-obj-y) 
$(COMMON_LDADDS)
$(call LINK, $^)
 ivshmem-server$(EXESUF): $(ivshmem-server-obj-y) $(COMMON_LDADDS)
$(call LINK, $^)
+vhost-user-scsi$(EXESUF): $(vhost-user-scsi-obj-y)
+   $(call LINK, $^)
 
 module_block.h: $(SRC_PATH)/scripts/modules/module_block.py config-host.mak
$(call quiet-command,$(PYTHON) $< $@ \
diff --git a/Makefile.objs b/Makefile.objs
index 2100845..1fa9450 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -112,6 +112,10 @@ qga-vss-dll-obj-y = qga/
 ivshmem-client-obj-y = contrib/ivshmem-client/
 ivshmem-server-obj-y = contrib/ivshmem-server/
 libvhost-user-obj-y = contrib/libvhost-user/
+vhost-user-scsi.o-cflags := $(LIBISCSI_CFLAGS)
+vhost-user-scsi.o-libs := $(LIBISCSI_LIBS)
+vhost-user-scsi-obj-y = contrib/vhost-user-scsi/
+vhost-user-scsi-obj-y += contrib/libvhost-user/libvhost-user.o
 
 ##
 trace-events-subdirs =
diff --git a/contrib/vhost-user-scsi/Makefile.objs b/contrib/vhost-user-scsi/Makefile.objs
new file mode 100644
index 000..e83a38a
--- /dev/null
+++ b/contrib/vhost-user-scsi/Makefile.objs
@@ -0,0 +1 @@
+vhost-user-scsi-obj-y = vhost-user-scsi.o
diff --git a/contrib/vhost-user-scsi/vhost-user-scsi.c b/contrib/vhost-user-scsi/vhost-user-scsi.c
new file mode 100644
index 000..e41bad0
--- /dev/null
+++ b/contrib/vhost-user-scsi/vhost-user-scsi.c
@@ -0,0 +1,886 @@
+/*
+ * vhost-user-scsi sample application
+ *
+ * Copyright (c) 2016 Nutanix Inc. All rights reserved.
+ *
+ * Author:
+ *  Felipe Franciosi 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 only.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "contrib/libvhost-user/libvhost-user.h"
+#include "hw/virtio/virtio-scsi.h"
+#include "iscsi/iscsi.h"
+
+#include <glib.h>
+
+/* Small compat shim from glib 2.32 */
+#ifndef G_SOURCE_CONTINUE
+#define G_SOURCE_CONTINUE TRUE
+#endif
+#ifndef G_SOURCE_REMOVE
+#define G_SOURCE_REMOVE FALSE
+#endif
+
+//#define VUS_DEBUG 1
+
+/** Log helpers **/
+
+#define PPRE  \
+struct timespec ts;   \
+char   timebuf[64];   \
+struct tm tm; \
+(void)clock_gettime(CLOCK_REALTIME, &ts); \
+(void)strftime(timebuf, 64, "%Y%m%d %T", gmtime_r(&ts.tv_sec, &tm))
+
+#define PEXT(lvl, msg, ...) do {  \
+PPRE; \
+fprintf(stderr, "%s.%06ld " lvl ": %s:%s():%d: " msg "\n",\
+timebuf, ts.tv_nsec/1000, \
+__FILE__, __FUNCTION__, __LINE__, ## __VA_ARGS__);\
+} while(0)
+
+#define PNOR(lvl, msg, ...) do {  \
+PPRE; \
+

[Qemu-devel] [Bug 1635339] Re: qxl_pre_save assertion failure on vm "save"

2017-05-19 Thread Frediano Ziglio
wddm dod 0.17 version released which fixes the issue guest side.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1635339

Title:
  qxl_pre_save assertion failure on vm "save"

Status in QEMU:
  Confirmed

Bug description:
  When I try and save my Windows 10 VM, I see an assertion failure, and
  the machine is shut down.

  I see the following in the log:

  main_channel_handle_parsed: agent start
  qemu-system-x86_64: /build/qemu-Zwynhi/qemu-2.5+dfsg/hw/display/qxl.c:2101: qxl_pre_save: Assertion `d->last_release_offset < d->vga.vram_size' failed.
  2016-10-20 11:52:42.713+: shutting down

  Please let me know what other information would be relevant!

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1635339/+subscriptions



Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration

2017-05-19 Thread Xiao Guangrong


I do not know why I was removed from the list.


On 05/19/2017 04:09 PM, Jay Zhou wrote:

Hi Paolo and Wanpeng,

On 2017/5/17 16:38, Wanpeng Li wrote:

2017-05-17 15:43 GMT+08:00 Paolo Bonzini :

Recently, I have tested the performance before migration and after migration
failure using SPEC CPU2006 (https://www.spec.org/cpu2006/), which is a standard
performance evaluation tool.

These are the steps:
==
  (1) the version of kmod is 4.4.11 (slightly modified) and the version of
  qemu is 2.6.0 (slightly modified); the kmod is applied with the following
  patch

diff --git a/source/x86/x86.c b/source/x86/x86.c
index 054a7d3..75a4bb3 100644
--- a/source/x86/x86.c
+++ b/source/x86/x86.c
@@ -8550,8 +8550,10 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
  */
 if ((change != KVM_MR_DELETE) &&
 (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
-   !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
-   kvm_mmu_zap_collapsible_sptes(kvm, new);
+   !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
+   printk(KERN_ERR "zj make KVM_REQ_MMU_RELOAD request\n");
+   kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
+   }

 /*
  * Set up write protection and/or dirty logging for the new slot.


Try these modifications to the setup:

1) set up 1G hugetlbfs hugepages and use those for the guest's memory

2) test both without and with the above patch.



In order to avoid random memory allocation issues, I reran the test cases:
(1) setup: start a 4U10G VM with memory preoccupied; each vcpu is pinned to its
own pcpu, and the resources (memory and pcpus) allocated to the VM all come
from NUMA node 0
(2) sequence: first, I run 429.mcf of SPEC CPU2006 before migration and get a
result. Then a migration failure is constructed. Finally, I run the test case
again and get another result.


I guess this case purely writes memory, which means the read-only mappings will
always be dropped by #PF, and then huge mappings are established.

If you benchmark memory reads, you should observe the difference.

Thanks!



[Qemu-devel] [PATCH v1 11/13] qcow2-cluster: make handle_dependencies() logic easier to follow

2017-05-19 Thread Anton Nefedov
Avoid complicated nested conditions; return or continue asap instead.
The logic is not changed.
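
The flattened control flow can be tried outside QEMU. What follows is a
simplified sketch, not the qcow2 code: struct range, check_deps() and the
DEP_* values are hypothetical names, and the l2meta check and the coroutine
wait from the patch are collapsed into a single "caller must wait" result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an in-flight allocation covering [start, end). */
struct range {
    uint64_t start;
    uint64_t end;
};

enum dep_result { DEP_OK, DEP_WAIT };

/*
 * Shrink *bytes so that [start, start + *bytes) stops before any in-flight
 * range that begins after start. If start lies inside an in-flight range,
 * set *bytes to 0 and tell the caller to wait.
 */
static enum dep_result check_deps(const struct range *inflight, size_t n,
                                  uint64_t start, uint64_t *bytes)
{
    assert(bytes != NULL);

    for (size_t i = 0; i < n; i++) {
        uint64_t end = start + *bytes;

        if (end <= inflight[i].start || start >= inflight[i].end) {
            /* No intersection */
            continue;
        }

        if (start < inflight[i].start) {
            /* Stop at the start of a running allocation, keep checking */
            *bytes = inflight[i].start - start;
            continue;
        }

        /* start is inside a running allocation */
        *bytes = 0;
        return DEP_WAIT;
    }

    return DEP_OK;
}
```

Each early "continue" or "return" handles one case completely, so the reader
never has to track a nested else branch.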

Signed-off-by: Anton Nefedov 
Signed-off-by: Denis V. Lunev 
---
 block/qcow2-cluster.c | 45 ++---
 1 file changed, 22 insertions(+), 23 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 03d6f7e..c0974e8 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -926,32 +926,31 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
 
 if (end <= old_start || start >= old_end) {
 /* No intersection */
-} else {
-if (start < old_start) {
-/* Stop at the start of a running allocation */
-bytes = old_start - start;
-} else {
-bytes = 0;
-}
+continue;
+}
 
-/* Stop if already an l2meta exists. After yielding, it wouldn't
- * be valid any more, so we'd have to clean up the old L2Metas
- * and deal with requests depending on them before starting to
- * gather new ones. Not worth the trouble. */
-if (bytes == 0 && *m) {
-/* start must be cluster aligned at this point */
-assert(start == start_of_cluster(s, start));
-*cur_bytes = 0;
-return 0;
-}
+if (start < old_start) {
+/* Stop at the start of a running allocation */
+bytes = old_start - start;
+/* ..if there is no other conflict, keep checking */
+continue;
+}
 
-if (bytes == 0) {
-/* Wait for the dependency to complete. We need to recheck
- * the free/allocated clusters when we continue. */
-qemu_co_queue_wait(&old_alloc->dependent_requests, &s->lock);
-return -EAGAIN;
-}
+/* Stop if already an l2meta exists. After yielding, it wouldn't
+ * be valid any more, so we'd have to clean up the old L2Metas
+ * and deal with requests depending on them before starting to
+ * gather new ones. Not worth the trouble. */
+if (*m) {
+/* start must be cluster aligned at this point */
+assert(start == start_of_cluster(s, start));
+*cur_bytes = 0;
+return 0;
 }
+
+/* Wait for the dependency to complete. We need to recheck
+ * the free/allocated clusters when we continue. */
+qemu_co_queue_wait(&old_alloc->dependent_requests, &s->lock);
+return -EAGAIN;
 }
 
 /* Make sure that existing clusters and new allocations are only used up to
-- 
2.7.4




Re: [Qemu-devel] [PATCH RFC 1/6] io: only allow return path for socket typed

2017-05-19 Thread Peter Xu
On Fri, May 19, 2017 at 09:25:38AM +0100, Daniel P. Berrange wrote:
> On Fri, May 19, 2017 at 02:43:27PM +0800, Peter Xu wrote:
> > We don't really have a return path for the other types yet. Let's check
> > this when .get_return_path() is called.
> > 
> > For this, we introduce a new feature bit, and set it up only for socket
> > typed IO channels.
> > 
> > This will help detect earlier failure for postcopy, e.g., logically
> > speaking postcopy cannot work with "exec:". Before this patch, when we
> > try to migrate with "migrate -d exec:cat>out", we'll hang the system.
> > With this patch, we'll get:
> > 
> > (qemu) migrate -d exec:cat>out
> > Unable to open return-path for postcopy
> 
> This is wrong - post-copy migration *can* work with exec: - it just entirely
> depends on what command you are running. Your example ran a command which is
> unidirectional, but if you ran 'exec:socat ...' you would have a fully
> bidirectional channel. Actually the channel is always bi-directional, but
> 'cat' simply won't ever send data back to QEMU.

Indeed. I should not block postcopy if the user used a TCP tunnel
between the source and destination in some way, using this exec: way.
Thanks for pointing that out.

However I still think the idea is needed here. Say, we'd better know
whether the transport would be able to respond (though current
approach of "assuming sockets are the only ones that can reply" is not
a good solution...). Please see below.

> 
> If QEMU hangs when the other end doesn't send data back, that actually seems
> like a potentially serious bug in migration code. Even if using the normal
> 'tcp' migration protocol, if the target QEMU server hangs and fails to
> send data to QEMU on the return path, the source QEMU must never hang.

Firstly I should not say it's a hang - it's actually by-design here
imho - migration thread is in the last phase now, waiting for a SHUT
message from destination (which I think is wise). But from the
behavior, indeed src VM is not usable during the time, just like what
happened for most postcopy cases on the source side. So, we can see
that postcopy "assumes" that destination side can reply now.

Meanwhile, I see it reasonable for postcopy to have such an
assumption. After all, postcopy means "start VM on destination before
pages are moved over completely", then there must be someone to reply
to source, no matter whether it'll be via some kind of io channel.

That's why I think we still need the general idea here, that we need
to know whether destination end is able to reply.

But, I still have no good idea (after knowing this patch won't work)
on how we can do this... Any further suggestions would be greatly
welcomed.

Thanks,

-- 
Peter Xu



[Qemu-devel] [PATCH 06/21] pc-bios/s390-ccw: Get Block Limits VPD device data

2017-05-19 Thread Cornelia Huck
From: Eric Farman 

The "Block Limits" Inquiry VPD page is optional for any SCSI device,
but if it's supported it provides a hint of the maximum I/O transfer
length for this particular device. If this page is supported by the
disk, let's issue that Inquiry and use the minimum of it and the
SCSI controller limit. That will cover this scenario:

  qemu-system-s390x ...
-device virtio-scsi-ccw,id=scsi0,max_sectors=32768 ...
-drive file=/dev/sda,if=none,id=drive0,format=raw ...
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,
drive=drive0,id=disk0,max_io_size=1048576

controller: 32768 sectors x 512 bytes/sector = 16777216 bytes
  disk: 1048576 bytes

Now that we have a limit for a virtio-scsi disk, compare that with the
limit for the virtio-scsi controller when we actually build the I/O.
The minimum of these two limits should be the one we use.
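
The arithmetic in the scenario above can be checked with a small sketch. It
is only an illustration under stated assumptions: min_non_zero() and
effective_limit_bytes() are hypothetical helpers that follow the intent of
the MIN_NON_ZERO() macro used in the patch (a value of 0 means "no limit
reported"), and both limits are expressed in bytes here for clarity.

```c
#include <assert.h>
#include <stdint.h>

/* Smaller of two limits, where 0 means "no limit reported". */
static uint64_t min_non_zero(uint64_t a, uint64_t b)
{
    if (a == 0) {
        return b;
    }
    if (b == 0) {
        return a;
    }
    return a < b ? a : b;
}

/* Combine the controller limit (in sectors) with the disk limit (in bytes). */
static uint64_t effective_limit_bytes(uint32_t ctrl_max_sectors,
                                      uint32_t sector_size,
                                      uint64_t disk_max_bytes)
{
    assert(sector_size != 0);
    return min_non_zero((uint64_t)ctrl_max_sectors * sector_size,
                        disk_max_bytes);
}
```

With the values from the example, effective_limit_bytes(32768, 512, 1048576)
picks the disk limit of 1048576 bytes, because the controller limit works
out to 16777216 bytes.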

Signed-off-by: Eric Farman 
Message-Id: <20170510155359.32727-7-far...@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck 
---
 pc-bios/s390-ccw/scsi.h| 14 ++
 pc-bios/s390-ccw/virtio-scsi.c | 21 -
 pc-bios/s390-ccw/virtio.h  |  1 +
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/pc-bios/s390-ccw/scsi.h b/pc-bios/s390-ccw/scsi.h
index 803eff8ae3..fe3fd5ac05 100644
--- a/pc-bios/s390-ccw/scsi.h
+++ b/pc-bios/s390-ccw/scsi.h
@@ -33,6 +33,7 @@
 /* SCSI Inquiry Pages */
 #define SCSI_INQUIRY_STANDARD_NONE  0x00U
 #define SCSI_INQUIRY_EVPD_SUPPORTED_PAGES   0x00U
+#define SCSI_INQUIRY_EVPD_BLOCK_LIMITS  0xb0U
 
 union ScsiLun {
 uint64_t v64;/* numeric shortcut */
@@ -87,6 +88,19 @@ struct ScsiInquiryEvpdPages {
 }  __attribute__((packed));
 typedef struct ScsiInquiryEvpdPages ScsiInquiryEvpdPages;
 
+struct ScsiInquiryEvpdBl {
+uint8_t peripheral_qdt; /* b0, use (b0 & 0x1f) to get SCSI_INQ_RDT  */
+uint8_t page_code;
+uint16_t page_length;
+uint8_t b4;
+uint8_t b5;
+uint16_t b6;
+uint32_t max_transfer;  /* b8   */
+uint32_t b12[7];/* b12..b43 (defined fields)*/
+uint32_t b44[5];/* b44..b63 (reserved fields)   */
+}  __attribute__((packed));
+typedef struct ScsiInquiryEvpdBl ScsiInquiryEvpdBl;
+
 struct ScsiCdbInquiry {
 uint8_t command; /* b0, == 0x12 */
 uint8_t b1;  /* b1, |= 0x01 (evpd)  */
diff --git a/pc-bios/s390-ccw/virtio-scsi.c b/pc-bios/s390-ccw/virtio-scsi.c
index e34755c4d4..b722f25ad7 100644
--- a/pc-bios/s390-ccw/virtio-scsi.c
+++ b/pc-bios/s390-ccw/virtio-scsi.c
@@ -20,6 +20,7 @@ static VirtioScsiCmdResp resp;
 
 static uint8_t scsi_inquiry_std_response[256];
 static ScsiInquiryEvpdPages scsi_inquiry_evpd_pages_response;
+static ScsiInquiryEvpdBl scsi_inquiry_evpd_bl_response;
 
 static inline void vs_assert(bool term, const char **msgs)
 {
@@ -262,9 +263,11 @@ int virtio_scsi_read_many(VDev *vdev,
 int sector_count;
 int f = vdev->blk_factor;
 unsigned int data_size;
+unsigned int max_transfer = MIN_NON_ZERO(vdev->config.scsi.max_sectors,
+ vdev->max_transfer);
 
 do {
-sector_count = MIN_NON_ZERO(sec_num, vdev->config.scsi.max_sectors);
+sector_count = MIN_NON_ZERO(sec_num, max_transfer);
 data_size = sector_count * virtio_get_block_size() * f;
 if (!scsi_read_10(vdev, sector * f, sector_count * f, load_addr,
   data_size)) {
@@ -321,6 +324,7 @@ void virtio_scsi_setup(VDev *vdev)
 uint8_t data[256];
 uint32_t data_size = sizeof(data);
 ScsiInquiryEvpdPages *evpd = &scsi_inquiry_evpd_pages_response;
+ScsiInquiryEvpdBl *evpd_bl = &scsi_inquiry_evpd_bl_response;
 int i;
 
 vdev->scsi_device = &default_scsi_device;
@@ -378,6 +382,21 @@ void virtio_scsi_setup(VDev *vdev)
 
 for (i = 0; i <= evpd->page_length; i++) {
 debug_print_int("supported EVPD page", evpd->byte[i]);
+
+if (evpd->byte[i] != SCSI_INQUIRY_EVPD_BLOCK_LIMITS) {
+continue;
+}
+
+if (!scsi_inquiry(vdev,
+  SCSI_INQUIRY_EVPD,
+  SCSI_INQUIRY_EVPD_BLOCK_LIMITS,
+  evpd_bl,
+  sizeof(*evpd_bl))) {
+virtio_scsi_verify_response(&resp, "virtio-scsi:setup:blocklimits");
+}
+
+debug_print_int("max transfer", evpd_bl->max_transfer);
+vdev->max_transfer = evpd_bl->max_transfer;
 }
 
 if (!scsi_read_capacity(vdev, data, data_size)) {
diff --git a/pc-bios/s390-ccw/virtio.h b/pc-bios/s390-ccw/virtio.h
index 3388a423e5..1eaf865b1f 100644
--- a/pc-bios/s390-ccw/virtio.h
+++ b/pc-bios/s390-ccw/virtio.h
@@ -277,6 +277,7 @@ struct VDev {
 bool scsi_device_selected;
 ScsiDevice selected_scsi_device;

[Qemu-devel] [PATCH 15/21] vfio/ccw: get io region info

2017-05-19 Thread Cornelia Huck
From: Dong Jia Shi 

vfio-ccw provides an MMIO region for I/O operations. We fetch its
information via ioctls here, then we can use it performing I/O
instructions and retrieving I/O results later on.

Reviewed-by: Eric Auger 
Acked-by: Alex Williamson 
Signed-off-by: Dong Jia Shi 
Message-Id: <20170517004813.58227-9-bjsdj...@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck 
---
 hw/vfio/ccw.c | 54 ++
 1 file changed, 54 insertions(+)

diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 7d2497cee6..7ddcfd7767 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -12,6 +12,7 @@
  */
 
 #include <linux/vfio.h>
+#include <linux/vfio_ccw.h>
 #include <sys/ioctl.h>
 
 #include "qemu/osdep.h"
@@ -26,6 +27,9 @@
 typedef struct VFIOCCWDevice {
 S390CCWDevice cdev;
 VFIODevice vdev;
+uint64_t io_region_size;
+uint64_t io_region_offset;
+struct ccw_io_region *io_region;
 } VFIOCCWDevice;
 
 static void vfio_ccw_compute_needs_reset(VFIODevice *vdev)
@@ -50,6 +54,48 @@ static void vfio_ccw_reset(DeviceState *dev)
 ioctl(vcdev->vdev.fd, VFIO_DEVICE_RESET);
 }
 
+static void vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
+{
+VFIODevice *vdev = &vcdev->vdev;
+struct vfio_region_info *info;
+int ret;
+
+/* Sanity check device */
+if (!(vdev->flags & VFIO_DEVICE_FLAGS_CCW)) {
+error_setg(errp, "vfio: Um, this isn't a vfio-ccw device");
+return;
+}
+
+if (vdev->num_regions < VFIO_CCW_CONFIG_REGION_INDEX + 1) {
+error_setg(errp, "vfio: Unexpected number of the I/O region %u",
+   vdev->num_regions);
+return;
+}
+
+ret = vfio_get_region_info(vdev, VFIO_CCW_CONFIG_REGION_INDEX, &info);
+if (ret) {
+error_setg_errno(errp, -ret, "vfio: Error getting config info");
+return;
+}
+
+vcdev->io_region_size = info->size;
+if (sizeof(*vcdev->io_region) != vcdev->io_region_size) {
+error_setg(errp, "vfio: Unexpected size of the I/O region");
+g_free(info);
+return;
+}
+
+vcdev->io_region_offset = info->offset;
+vcdev->io_region = g_malloc0(info->size);
+
+g_free(info);
+}
+
+static void vfio_ccw_put_region(VFIOCCWDevice *vcdev)
+{
+g_free(vcdev->io_region);
+}
+
 static void vfio_put_device(VFIOCCWDevice *vcdev)
 {
 g_free(vcdev->vdev.name);
@@ -122,8 +168,15 @@ static void vfio_ccw_realize(DeviceState *dev, Error **errp)
 goto out_device_err;
 }
 
+vfio_ccw_get_region(vcdev, &err);
+if (err) {
+goto out_region_err;
+}
+
 return;
 
+out_region_err:
+vfio_put_device(vcdev);
 out_device_err:
 vfio_put_group(group);
 out_group_err:
@@ -142,6 +195,7 @@ static void vfio_ccw_unrealize(DeviceState *dev, Error **errp)
 S390CCWDeviceClass *cdc = S390_CCW_DEVICE_GET_CLASS(cdev);
 VFIOGroup *group = vcdev->vdev.group;
 
+vfio_ccw_put_region(vcdev);
 vfio_put_device(vcdev);
 vfio_put_group(group);
 
-- 
2.13.0




[Qemu-devel] [PATCH 04/21] pc-bios/s390-ccw: Refactor scsi_inquiry function

2017-05-19 Thread Cornelia Huck
From: Eric Farman 

If we want to issue any of the SCSI Inquiry EVPD pages,
which we do, we could use this function to issue both types
of commands with a little bit of refactoring.
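
For illustration, here is how the two kinds of INQUIRY differ at the CDB
level. This is a sketch that assumes the usual 6-byte INQUIRY layout (opcode
0x12, EVPD bit in byte 1, page code in byte 2, big-endian allocation length
in bytes 3 and 4); demo_build_inquiry_cdb() and the DEMO_* names are
hypothetical and are not part of the s390-ccw BIOS.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_INQUIRY_OPCODE  0x12U
#define DEMO_INQUIRY_CDB_LEN 6

/* Fill a 6-byte INQUIRY CDB for a standard (evpd == 0) or EVPD request. */
static void demo_build_inquiry_cdb(uint8_t cdb[DEMO_INQUIRY_CDB_LEN],
                                   uint8_t evpd, uint8_t page,
                                   uint32_t data_size)
{
    /* The allocation length field is only 16 bits wide, so clamp it. */
    uint16_t alloc_len = data_size < 65535 ? (uint16_t)data_size : 65535;

    assert(cdb != NULL);
    memset(cdb, 0, DEMO_INQUIRY_CDB_LEN);

    cdb[0] = DEMO_INQUIRY_OPCODE;
    cdb[1] = evpd & 0x01;                 /* bit 0 selects a VPD page */
    cdb[2] = page;                        /* must be 0 for standard inquiry */
    cdb[3] = (uint8_t)(alloc_len >> 8);   /* allocation length, MSB first */
    cdb[4] = (uint8_t)(alloc_len & 0xff);
    /* cdb[5] is the control byte and stays 0 */
}
```

A standard inquiry passes evpd = 0 and page = 0, while an EVPD request sets
evpd = 1 and names the page it wants, so one function covers both cases.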

Signed-off-by: Eric Farman 
Message-Id: <20170510155359.32727-5-far...@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck 
---
 pc-bios/s390-ccw/scsi.h|  6 ++
 pc-bios/s390-ccw/virtio-scsi.c | 10 --
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/pc-bios/s390-ccw/scsi.h b/pc-bios/s390-ccw/scsi.h
index fc830f7e52..83ffaef54e 100644
--- a/pc-bios/s390-ccw/scsi.h
+++ b/pc-bios/s390-ccw/scsi.h
@@ -26,6 +26,12 @@
 #define SCSI_SENSE_KEY_NO_SENSE 0
 #define SCSI_SENSE_KEY_UNIT_ATTENTION   6
 
+/* SCSI Inquiry Types */
+#define SCSI_INQUIRY_STANDARD   0x00U
+
+/* SCSI Inquiry Pages */
+#define SCSI_INQUIRY_STANDARD_NONE  0x00U
+
 union ScsiLun {
 uint64_t v64;/* numeric shortcut */
 uint8_t  v8[8];  /* generic 8 bytes representation   */
diff --git a/pc-bios/s390-ccw/virtio-scsi.c b/pc-bios/s390-ccw/virtio-scsi.c
index ff65e2ee30..9d2e14cdf0 100644
--- a/pc-bios/s390-ccw/virtio-scsi.c
+++ b/pc-bios/s390-ccw/virtio-scsi.c
@@ -89,10 +89,13 @@ static void vs_run(const char *title, VirtioCmd *cmd, VDev *vdev,
 
 /* SCSI protocol implementation routines */
 
-static bool scsi_inquiry(VDev *vdev, void *data, uint32_t data_size)
+static bool scsi_inquiry(VDev *vdev, uint8_t evpd, uint8_t page,
+ void *data, uint32_t data_size)
 {
 ScsiCdbInquiry cdb = {
 .command = 0x12,
+.b1 = evpd,
+.b2 = page,
 .alloc_len = data_size < 65535 ? data_size : 65535,
 };
 VirtioCmd inquiry[] = {
@@ -346,7 +349,10 @@ void virtio_scsi_setup(VDev *vdev)
 }
 
 /* read and cache SCSI INQUIRY response */
-if (!scsi_inquiry(vdev, scsi_inquiry_std_response,
+if (!scsi_inquiry(vdev,
+  SCSI_INQUIRY_STANDARD,
+  SCSI_INQUIRY_STANDARD_NONE,
+  scsi_inquiry_std_response,
   sizeof(scsi_inquiry_std_response))) {
 virtio_scsi_verify_response(&resp, "virtio-scsi:setup:inquiry");
 }
-- 
2.13.0




Re: [Qemu-devel] [Qemu-ppc] [RESEND PATCH v10 1/5] hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineState

2017-05-19 Thread Daniel Henrique Barboza



On 05/19/2017 01:26 AM, David Gibson wrote:

On Thu, May 18, 2017 at 06:54:12PM -0300, Daniel Henrique Barboza wrote:

The LMB DRC release callback, spapr_lmb_release(), uses an opaque
parameter, a sPAPRDIMMState struct that stores the current LMBs that
are allocated to a DIMM (nr_lmbs). After each call to this callback,
the nr_lmbs is decremented by one and, when it reaches zero, the callback
proceeds with the qdev calls to hot unplug the LMB.

Using drc->detach_cb_opaque is problematic because it can't be migrated in
the future DRC migration work. This patch makes the following changes to
eliminate the usage of this opaque callback inside spapr_lmb_release:

- sPAPRDIMMState was moved from spapr.c and added to spapr.h. A new
attribute called 'addr' was added to it. This is used as a unique
identifier to associate a sPAPRDIMMState to a PCDIMM element.

- sPAPRMachineState now hosts a new QTAILQ called 'pending_dimm_unplugs'.
This queue of sPAPRDIMMState elements will store the DIMM state of DIMMs
that are currently undergoing an unplug process.

- spapr_lmb_release() will now retrieve the nr_lmbs value by getting the
corresponding sPAPRDIMMState. A helper function called spapr_dimm_get_address
was created to fetch the address of a PCDIMM device inside spapr_lmb_release.
When nr_lmbs reaches zero and the callback proceeds with the qdev hot unplug
calls, the sPAPRDIMMState struct is removed from spapr->pending_dimm_unplugs.

After these changes, the opaque argument for spapr_lmb_release is now
unused and is passed as NULL inside spapr_del_lmbs. This and the other
opaque arguments can now be safely removed from the code.
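
The bookkeeping described above can be sketched without QEMU's QTAILQ
macros. Everything named demo_* below is hypothetical and uses a plain
singly linked list; it only illustrates the "look up by address, decrement,
remove at zero" pattern, not the actual spapr code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-DIMM state, keyed by the DIMM base address. */
struct demo_dimm_state {
    uint64_t addr;
    uint32_t nr_lmbs;
    struct demo_dimm_state *next;
};

/* Hypothetical machine state holding the pending-unplug list. */
struct demo_machine {
    struct demo_dimm_state *pending;
};

static struct demo_dimm_state *demo_find(struct demo_machine *m, uint64_t addr)
{
    struct demo_dimm_state *ds;

    for (ds = m->pending; ds != NULL; ds = ds->next) {
        if (ds->addr == addr) {
            return ds;
        }
    }
    return NULL;
}

/* Register a DIMM that starts unplugging; returns 0 on success, -1 on OOM. */
static int demo_add(struct demo_machine *m, uint64_t addr, uint32_t nr_lmbs)
{
    struct demo_dimm_state *ds;

    assert(demo_find(m, addr) == NULL);   /* no duplicate entries */
    ds = calloc(1, sizeof(*ds));
    if (ds == NULL) {
        return -1;
    }
    ds->addr = addr;
    ds->nr_lmbs = nr_lmbs;
    ds->next = m->pending;
    m->pending = ds;
    return 0;
}

static void demo_remove(struct demo_machine *m, struct demo_dimm_state *ds)
{
    struct demo_dimm_state **pp = &m->pending;

    while (*pp != NULL && *pp != ds) {
        pp = &(*pp)->next;
    }
    if (*pp != NULL) {
        *pp = ds->next;
        free(ds);
    }
}

/* Release one LMB; returns 1 once the last LMB of the DIMM is gone. */
static int demo_lmb_release(struct demo_machine *m, uint64_t addr)
{
    struct demo_dimm_state *ds = demo_find(m, addr);

    assert(ds != NULL && ds->nr_lmbs > 0);
    if (--ds->nr_lmbs > 0) {
        return 0;
    }
    demo_remove(m, ds);
    return 1;
}
```

Because the state is found through the machine object rather than through
an opaque callback argument, the callback itself no longer has to carry it.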

Signed-off-by: Daniel Henrique Barboza 
---
  hw/ppc/spapr.c | 57 +-
  include/hw/ppc/spapr.h |  4 
  2 files changed, 56 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0980d73..b05abe5 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2050,6 +2050,7 @@ static void ppc_spapr_init(MachineState *machine)
  msi_nonbroken = true;
  
  QLIST_INIT(&spapr->phbs);

+QTAILQ_INIT(&spapr->pending_dimm_unplugs);
  
  /* Allocate RMA if necessary */

  rma_alloc_size = kvmppc_alloc_rma();
@@ -2603,20 +2604,63 @@ out:
  error_propagate(errp, local_err);
  }
  
-typedef struct sPAPRDIMMState {

+struct sPAPRDIMMState {
+uint64_t addr;

Since you're not trying to migrate this any more, you can index the
list by an actual PCDIMMDevice *, rather than the base address.
You're already passing the DeviceState * for the DIMM around, so this
will actually remove the address parameter from some functions.

Good idea.



I think that could actually be done as a preliminary cleanup.  It also
probably makes sense to merge spapr_del_lmbs() with
spapr_memory_unplug_request(), they're both very small.


Ok.





  uint32_t nr_lmbs;
-} sPAPRDIMMState;
+QTAILQ_ENTRY(sPAPRDIMMState) next;
+};
+
+static sPAPRDIMMState *spapr_pending_dimm_unplugs_find(sPAPRMachineState *s,
+   uint64_t addr)
+{
+sPAPRDIMMState *dimm_state = NULL;
+QTAILQ_FOREACH(dimm_state, &s->pending_dimm_unplugs, next) {
+if (dimm_state->addr == addr) {
+break;
+}
+}
+return dimm_state;
+}
+
+static void spapr_pending_dimm_unplugs_add(sPAPRMachineState *spapr,
+   sPAPRDIMMState *dimm_state)
+{
+g_assert(!spapr_pending_dimm_unplugs_find(spapr, dimm_state->addr));
+QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next);
+}
+
+static void spapr_pending_dimm_unplugs_remove(sPAPRMachineState *spapr,
+  sPAPRDIMMState *dimm_state)
+{
+QTAILQ_REMOVE(&spapr->pending_dimm_unplugs, dimm_state, next);
+g_free(dimm_state);
+}
+
+static uint64_t spapr_dimm_get_address(PCDIMMDevice *dimm)
+{
+Error *local_err = NULL;
+uint64_t addr;
+addr = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP,
+   &local_err);
+if (local_err) {
+error_propagate(&error_abort, local_err);
+return 0;
+}
+return addr;
+}
  
  static void spapr_lmb_release(DeviceState *dev, void *opaque)

  {
-sPAPRDIMMState *ds = (sPAPRDIMMState *)opaque;
  HotplugHandler *hotplug_ctrl;
+uint64_t addr = spapr_dimm_get_address(PC_DIMM(dev));
+sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());

I prefer not to access the machine as a global when possible.  I think
it's preferable to pass down the spapr object from above -
unplug_request() itself can get it from hotplug_dev.


I see that we have access to the hotplug_dev (HotplugHandler) at the end of
spapr_lmb_release:

hotplug_ctrl = qdev_get_hotplug_handler(dev);

One alternative would be to move this call up in the function and then
retrieve the machine as unplug_request() does:

hotplug_ctrl = 

[Qemu-devel] [PULL 14/20] nbd/client.c: use errp instead of LOG

2017-05-19 Thread Paolo Bonzini
From: Vladimir Sementsov-Ogievskiy 

Move to modern errp scheme from just LOGging errors.
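
As a rough illustration of the pattern (this is not QEMU's Error API), the
sketch below uses a hypothetical DemoError type and demo_error_set() helper.
The callee fills in an error object through a pointer-to-pointer and still
returns a negative errno, so the caller decides whether and how to report
it. The 32-bit sector counter is an assumption made here so that the
overflow check can trigger on any host.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical error object; a real implementation carries more state. */
typedef struct DemoError {
    char msg[128];
} DemoError;

/* Store a formatted message in *errp, if the caller asked for one. */
static void demo_error_set(DemoError **errp, const char *fmt, ...)
{
    va_list ap;
    DemoError *err;

    if (errp == NULL) {
        return;                 /* caller does not want the details */
    }
    assert(*errp == NULL);      /* an existing error must not be overwritten */

    err = calloc(1, sizeof(*err));
    if (err == NULL) {
        return;
    }
    va_start(ap, fmt);
    vsnprintf(err->msg, sizeof(err->msg), fmt, ap);
    va_end(ap);
    *errp = err;
}

/* Reject export sizes whose sector count does not fit in 32 bits. */
static int demo_nbd_init(long long size, DemoError **errp)
{
    uint32_t sectors = (uint32_t)(size / 512);

    if (size / 512 != (long long)sectors) {
        demo_error_set(errp, "Export size %lld too large for 32-bit kernel",
                       size);
        return -E2BIG;
    }
    return 0;
}
```

A caller that only needs the return code can pass NULL for errp, while a
caller that wants to print the message passes the address of a DemoError
pointer initialized to NULL and frees the object afterwards.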

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20170516094533.6160-6-vsement...@virtuozzo.com>
Signed-off-by: Paolo Bonzini 
---
 block/nbd-client.c  |  7 ++-
 include/block/nbd.h |  5 +++--
 nbd/client.c| 30 +-
 qemu-nbd.c  |  3 ++-
 4 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 538d95e..073032b 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -28,6 +28,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qapi/error.h"
 #include "nbd-client.h"
 
 #define HANDLE_TO_INDEX(bs, handle) ((handle) ^ ((uint64_t)(intptr_t)bs))
@@ -70,10 +71,14 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
 NBDClientSession *s = opaque;
 uint64_t i;
 int ret;
+Error *local_err;
 
 for (;;) {
 assert(s->reply.handle == 0);
-ret = nbd_receive_reply(s->ioc, &s->reply);
+ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
+if (ret < 0) {
+error_report_err(local_err);
+}
 if (ret <= 0) {
 break;
 }
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 9d385ea..416257a 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -133,9 +133,10 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char 
*name, uint16_t *flags,
   QCryptoTLSCreds *tlscreds, const char *hostname,
   QIOChannel **outioc,
   off_t *size, Error **errp);
-int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size);
+int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size,
+ Error **errp);
 ssize_t nbd_send_request(QIOChannel *ioc, NBDRequest *request);
-ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply);
+ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp);
 int nbd_client(int fd);
 int nbd_disconnect(int fd);
 
diff --git a/nbd/client.c b/nbd/client.c
index f102375..595d99e 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -627,11 +627,13 @@ fail:
 }
 
 #ifdef __linux__
-int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size)
+int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size,
+ Error **errp)
 {
 unsigned long sectors = size / BDRV_SECTOR_SIZE;
 if (size / BDRV_SECTOR_SIZE != sectors) {
-LOG("Export size %lld too large for 32-bit kernel", (long long) size);
+error_setg(errp, "Export size %lld too large for 32-bit kernel",
+   (long long) size);
 return -E2BIG;
 }
 
@@ -639,7 +641,7 @@ int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t 
flags, off_t size)
 
 if (ioctl(fd, NBD_SET_SOCK, (unsigned long) sioc->fd) < 0) {
 int serrno = errno;
-LOG("Failed to set NBD socket");
+error_setg(errp, "Failed to set NBD socket");
 return -serrno;
 }
 
@@ -647,7 +649,7 @@ int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t 
flags, off_t size)
 
 if (ioctl(fd, NBD_SET_BLKSIZE, (unsigned long)BDRV_SECTOR_SIZE) < 0) {
 int serrno = errno;
-LOG("Failed setting NBD block size");
+error_setg(errp, "Failed setting NBD block size");
 return -serrno;
 }
 
@@ -659,7 +661,7 @@ int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t 
flags, off_t size)
 
 if (ioctl(fd, NBD_SET_SIZE_BLOCKS, sectors) < 0) {
 int serrno = errno;
-LOG("Failed setting size (in blocks)");
+error_setg(errp, "Failed setting size (in blocks)");
 return -serrno;
 }
 
@@ -670,12 +672,12 @@ int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t 
flags, off_t size)
 
 if (ioctl(fd, BLKROSET, (unsigned long) &read_only) < 0) {
 int serrno = errno;
-LOG("Failed setting read-only attribute");
+error_setg(errp, "Failed setting read-only attribute");
 return -serrno;
 }
 } else {
 int serrno = errno;
-LOG("Failed setting flags");
+error_setg(errp, "Failed setting flags");
 return -serrno;
 }
 }
@@ -723,8 +725,10 @@ int nbd_disconnect(int fd)
 }
 
 #else
-int nbd_init(int fd, QIOChannelSocket *ioc, uint16_t flags, off_t size)
+int nbd_init(int fd, QIOChannelSocket *ioc, uint16_t flags, off_t size,
+Error **errp)
 {
+error_setg(errp, "nbd_init is only supported on Linux");
 return -ENOTSUP;
 }
 
@@ -758,19 +762,19 @@ ssize_t nbd_send_request(QIOChannel *ioc, NBDRequest 
*request)
 return write_sync(ioc, buf, sizeof(buf), NULL);
 }
 
-ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply)
+ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
 {
 uint8_t 

[Qemu-devel] [PULL 09/20] Check the return value of fcntl in qemu_set_cloexec

2017-05-19 Thread Paolo Bonzini
From: Stefano Stabellini 

Assert that the return value is not an error. This issue was found by
Coverity.

CID: 1374831

Signed-off-by: Stefano Stabellini 
CC: gr...@kaod.org
CC: pbonz...@redhat.com
CC: Eric Blake 
Message-Id: <1494356693-13190-2-git-send-email-sstabell...@kernel.org>
Signed-off-by: Paolo Bonzini 
---
 util/oslib-posix.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 4d9189e..16894ad 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -182,7 +182,9 @@ void qemu_set_cloexec(int fd)
 {
 int f;
 f = fcntl(fd, F_GETFD);
-fcntl(fd, F_SETFD, f | FD_CLOEXEC);
+assert(f != -1);
+f = fcntl(fd, F_SETFD, f | FD_CLOEXEC);
+assert(f != -1);
 }
 
 /*
-- 
1.8.3.1
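
For readers who want the same check without aborting, here is a minimal sketch that returns an error instead of asserting. The helper name set_cloexec_checked() is hypothetical and is not part of QEMU; only the fcntl() calls are taken from the patch.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not QEMU's qemu_set_cloexec): set FD_CLOEXEC on fd.
 * Returns 0 on success and -1 on failure, with errno left as set by fcntl().
 */
static int set_cloexec_checked(int fd)
{
    /* Read the current descriptor flags; -1 means fd is not usable. */
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1) {
        return -1;
    }
    /* Write the flags back with FD_CLOEXEC added, checking the result too. */
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1) {
        return -1;
    }
    return 0;
}
```

Whether to assert or to return an error is a design choice. The patch asserts because a failure here would indicate a programming error such as an invalid descriptor.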





[Qemu-devel] [PULL 3/3] audio: Rename hw/audio/audio.h to hw/audio/soundhw.h

2017-05-19 Thread Gerd Hoffmann
From: Eduardo Habkost 

All the functions in hw/audio/audio.h are called "soundhw_*()"
and live in hw/audio/soundhw.c. Rename the header file for
consistency.

Signed-off-by: Eduardo Habkost 
Reviewed-by: David Gibson 
Reviewed-by: Hervé Poussineau 
Message-id: 20170508205735.23444-4-ehabk...@redhat.com
Signed-off-by: Gerd Hoffmann 
---
 include/hw/audio/{audio.h => soundhw.h} | 0
 arch_init.c | 2 +-
 hw/audio/ac97.c | 2 +-
 hw/audio/adlib.c| 2 +-
 hw/audio/cs4231a.c  | 2 +-
 hw/audio/es1370.c   | 2 +-
 hw/audio/gus.c  | 2 +-
 hw/audio/intel-hda.c| 2 +-
 hw/audio/pcspk.c| 2 +-
 hw/audio/sb16.c | 2 +-
 hw/audio/soundhw.c  | 2 +-
 hw/ppc/prep.c   | 2 +-
 vl.c| 2 +-
 13 files changed, 12 insertions(+), 12 deletions(-)
 rename include/hw/audio/{audio.h => soundhw.h} (100%)

diff --git a/include/hw/audio/audio.h b/include/hw/audio/soundhw.h
similarity index 100%
rename from include/hw/audio/audio.h
rename to include/hw/audio/soundhw.h
diff --git a/arch_init.c b/arch_init.c
index 74ca62f508..a0b8ed6167 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -27,7 +27,7 @@
 #include "sysemu/sysemu.h"
 #include "sysemu/arch_init.h"
 #include "hw/pci/pci.h"
-#include "hw/audio/audio.h"
+#include "hw/audio/soundhw.h"
 #include "qemu/config-file.h"
 #include "qemu/error-report.h"
 #include "qmp-commands.h"
diff --git a/hw/audio/ac97.c b/hw/audio/ac97.c
index c30657501c..959c786261 100644
--- a/hw/audio/ac97.c
+++ b/hw/audio/ac97.c
@@ -19,7 +19,7 @@
 
 #include "qemu/osdep.h"
 #include "hw/hw.h"
-#include "hw/audio/audio.h"
+#include "hw/audio/soundhw.h"
 #include "audio/audio.h"
 #include "hw/pci/pci.h"
 #include "sysemu/dma.h"
diff --git a/hw/audio/adlib.c b/hw/audio/adlib.c
index 09b8248cda..c6e0f10c16 100644
--- a/hw/audio/adlib.c
+++ b/hw/audio/adlib.c
@@ -25,7 +25,7 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "hw/hw.h"
-#include "hw/audio/audio.h"
+#include "hw/audio/soundhw.h"
 #include "audio/audio.h"
 #include "hw/isa/isa.h"
 
diff --git a/hw/audio/cs4231a.c b/hw/audio/cs4231a.c
index 3ecd0582bf..096e8e98d7 100644
--- a/hw/audio/cs4231a.c
+++ b/hw/audio/cs4231a.c
@@ -23,7 +23,7 @@
  */
 #include "qemu/osdep.h"
 #include "hw/hw.h"
-#include "hw/audio/audio.h"
+#include "hw/audio/soundhw.h"
 #include "audio/audio.h"
 #include "hw/isa/isa.h"
 #include "hw/qdev.h"
diff --git a/hw/audio/es1370.c b/hw/audio/es1370.c
index fe64c1ac37..dd7c23d185 100644
--- a/hw/audio/es1370.c
+++ b/hw/audio/es1370.c
@@ -28,7 +28,7 @@
 
 #include "qemu/osdep.h"
 #include "hw/hw.h"
-#include "hw/audio/audio.h"
+#include "hw/audio/soundhw.h"
 #include "audio/audio.h"
 #include "hw/pci/pci.h"
 #include "sysemu/dma.h"
diff --git a/hw/audio/gus.c b/hw/audio/gus.c
index ec103a4db9..3e864cd36d 100644
--- a/hw/audio/gus.c
+++ b/hw/audio/gus.c
@@ -24,7 +24,7 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "hw/hw.h"
-#include "hw/audio/audio.h"
+#include "hw/audio/soundhw.h"
 #include "audio/audio.h"
 #include "hw/isa/isa.h"
 #include "gusemu.h"
diff --git a/hw/audio/intel-hda.c b/hw/audio/intel-hda.c
index 2c497eb174..06acc98f7b 100644
--- a/hw/audio/intel-hda.c
+++ b/hw/audio/intel-hda.c
@@ -22,7 +22,7 @@
 #include "hw/pci/pci.h"
 #include "hw/pci/msi.h"
 #include "qemu/timer.h"
-#include "hw/audio/audio.h"
+#include "hw/audio/soundhw.h"
 #include "intel-hda.h"
 #include "intel-hda-defs.h"
 #include "sysemu/dma.h"
diff --git a/hw/audio/pcspk.c b/hw/audio/pcspk.c
index 9b99358d87..f643b122bb 100644
--- a/hw/audio/pcspk.c
+++ b/hw/audio/pcspk.c
@@ -26,7 +26,7 @@
 #include "hw/hw.h"
 #include "hw/i386/pc.h"
 #include "hw/isa/isa.h"
-#include "hw/audio/audio.h"
+#include "hw/audio/soundhw.h"
 #include "audio/audio.h"
 #include "qemu/timer.h"
 #include "hw/timer/i8254.h"
diff --git a/hw/audio/sb16.c b/hw/audio/sb16.c
index 6b4427f242..6ab2f6f89a 100644
--- a/hw/audio/sb16.c
+++ b/hw/audio/sb16.c
@@ -23,7 +23,7 @@
  */
 #include "qemu/osdep.h"
 #include "hw/hw.h"
-#include "hw/audio/audio.h"
+#include "hw/audio/soundhw.h"
 #include "audio/audio.h"
 #include "hw/isa/isa.h"
 #include "hw/qdev.h"
diff --git a/hw/audio/soundhw.c b/hw/audio/soundhw.c
index 29565da93d..e698909d34 100644
--- a/hw/audio/soundhw.c
+++ b/hw/audio/soundhw.c
@@ -28,7 +28,7 @@
 #include "qom/object.h"
 #include "hw/isa/isa.h"
 #include "hw/pci/pci.h"
-#include "hw/audio/audio.h"
+#include "hw/audio/soundhw.h"
 
 struct soundhw {
 const char *name;
diff --git a/hw/ppc/prep.c b/hw/ppc/prep.c
index 4a7d2cfbe0..d16646c95d 100644
--- a/hw/ppc/prep.c
+++ b/hw/ppc/prep.c
@@ -36,7 +36,7 @@
 #include "hw/pci/pci_host.h"
 #include "hw/ppc/ppc.h"
 #include 

Re: [Qemu-devel] specify memory in QEMU with Virtio

2017-05-19 Thread Gerd Hoffmann
On Thu, 2017-05-18 at 22:43 -0400, jenia mtl wrote:
> Hello.
> 
> 
> How do I specify how much graphics memory the Windows client should have in
> QEMU? I have successfully installed Virtio and can launch the VM (Windows)
> with it. This doubles the memory from 8MB to 16MB. But I need 512. How can
> I set that?

virtio-vga doesn't need dedicated video memory, except for the
(unaccelerated) vga compatibility mode.  But given that there are no
virtio-vga drivers for windows (yet) it'll actually run in vga mode, so
you have no advantages over stdvga.

stdvga can be configured with up to 256M of video memory (for example,
-device VGA,vgamem_mb=64).  This will allow higher display resolutions.
It is still an unaccelerated framebuffer though.

cheers,
  Gerd
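
To get a feel for how much video memory a given resolution needs, here is a rough sketch. It assumes a single linear framebuffer of width * height * bytes-per-pixel, which is a simplification; the function names are made up for illustration, and a real device or guest driver may need more (for example for double buffering).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes needed for one linear framebuffer of the given mode. */
static uint64_t framebuffer_bytes(uint32_t width, uint32_t height,
                                  uint32_t bits_per_pixel)
{
    return (uint64_t)width * height * (bits_per_pixel / 8);
}

/* True if a single framebuffer of this mode fits in vgamem_mb MiB. */
static bool mode_fits(uint32_t width, uint32_t height,
                      uint32_t bits_per_pixel, uint32_t vgamem_mb)
{
    return framebuffer_bytes(width, height, bits_per_pixel)
           <= (uint64_t)vgamem_mb * 1024 * 1024;
}
```

Under that assumption, 1920x1080 at 32 bpp needs 8294400 bytes and fits in 16 MB, while 3840x2160 at 32 bpp needs 33177600 bytes and would want vgamem_mb=64.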




[Qemu-devel] [PATCH] i386: fix read/write cr with icount option

2017-05-19 Thread Mihail Abakumov
Running Windows with icount causes a crash on the instruction that
writes cr. This patch fixes it.


Reading and writing cr cause an icount read because the
cpu_get_apic_tpr and cpu_set_apic_tpr functions are called. So
gen_io_start()/gen_io_end() calls are needed.


---
 target/i386/translate.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 06d8833..3b009bd 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -7907,14 +7907,26 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
 gen_update_cc_op(s);
 gen_jmp_im(pc_start - s->cs_base);
 if (b & 2) {
+if (s->tb->cflags & CF_USE_ICOUNT) {
+gen_io_start();
+}
 gen_op_mov_v_reg(ot, cpu_T0, rm);
 gen_helper_write_crN(cpu_env, tcg_const_i32(reg),
  cpu_T0);
+if (s->tb->cflags & CF_USE_ICOUNT) {
+gen_io_end();
+}
 gen_jmp_im(s->pc - s->cs_base);
 gen_eob(s);
 } else {
+if (s->tb->cflags & CF_USE_ICOUNT) {
+gen_io_start();
+}
 gen_helper_read_crN(cpu_T0, cpu_env, tcg_const_i32(reg));
 gen_op_mov_reg_v(ot, rm, cpu_T0);
+if (s->tb->cflags & CF_USE_ICOUNT) {
+gen_io_end();
+}
 }
 break;
 default:
--
1.9.1



Re: [Qemu-devel] [PATCH] i386: fix read/write cr with icount option

2017-05-19 Thread Paolo Bonzini


On 19/05/2017 11:36, Mihail Abakumov wrote:
> Running Windows with icount causes a crash on the instruction that
> writes cr. This patch fixes it.
> 
> Reading and writing cr cause an icount read because the
> cpu_get_apic_tpr and cpu_set_apic_tpr functions are called. So
> gen_io_start()/gen_io_end() calls are needed.

The patch looks good, but lacks a signoff.  Please read the Developer 
Certificate of Origin[1] and reply to this email with "Signed-off-by: 
Mihail Abakumov ".


[1] Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or

(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or

(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.

(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.

Thanks,

Paolo
> ---
>  target/i386/translate.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/target/i386/translate.c b/target/i386/translate.c
> index 06d8833..3b009bd 100644
> --- a/target/i386/translate.c
> +++ b/target/i386/translate.c
> @@ -7907,14 +7907,26 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
>  gen_update_cc_op(s);
>  gen_jmp_im(pc_start - s->cs_base);
>  if (b & 2) {
> +if (s->tb->cflags & CF_USE_ICOUNT) {
> +gen_io_start();
> +}
>  gen_op_mov_v_reg(ot, cpu_T0, rm);
>  gen_helper_write_crN(cpu_env, tcg_const_i32(reg),
>   cpu_T0);
> +if (s->tb->cflags & CF_USE_ICOUNT) {
> +gen_io_end();
> +}
>  gen_jmp_im(s->pc - s->cs_base);
>  gen_eob(s);
>  } else {
> +if (s->tb->cflags & CF_USE_ICOUNT) {
> +gen_io_start();
> +}
>  gen_helper_read_crN(cpu_T0, cpu_env, tcg_const_i32(reg));
>  gen_op_mov_reg_v(ot, rm, cpu_T0);
> +if (s->tb->cflags & CF_USE_ICOUNT) {
> +gen_io_end();
> +}
>  }
>  break;
>  default:



[Qemu-devel] [PATCH v2 1/4] block: count bdrv_co_rw_vmstate() requests

2017-05-19 Thread Stefan Hajnoczi
Call bdrv_inc/dec_in_flight() for vmstate reads/writes.  This seems
unnecessary at first glance because vmstate reads/writes are done
synchronously while the guest is stopped.  But we need the bdrv_wakeup()
in bdrv_dec_in_flight() so the main loop sees request completion.
Besides, it's cleaner to count vmstate reads/writes like ordinary
read/write requests.

The bdrv_wakeup() partially fixes a 'savevm' hang with -object iothread.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Paolo Bonzini 
---
 block/io.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index fdd7485..cc56e90 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1988,17 +1988,24 @@ bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
bool is_read)
 {
 BlockDriver *drv = bs->drv;
+int ret = -ENOTSUP;
+
+bdrv_inc_in_flight(bs);
 
 if (!drv) {
-return -ENOMEDIUM;
+ret = -ENOMEDIUM;
 } else if (drv->bdrv_load_vmstate) {
-return is_read ? drv->bdrv_load_vmstate(bs, qiov, pos)
-   : drv->bdrv_save_vmstate(bs, qiov, pos);
+if (is_read) {
+ret = drv->bdrv_load_vmstate(bs, qiov, pos);
+} else {
+ret = drv->bdrv_save_vmstate(bs, qiov, pos);
+}
 } else if (bs->file) {
-return bdrv_co_rw_vmstate(bs->file->bs, qiov, pos, is_read);
+ret = bdrv_co_rw_vmstate(bs->file->bs, qiov, pos, is_read);
 }
 
-return -ENOTSUP;
+bdrv_dec_in_flight(bs);
+return ret;
 }
 
 static void coroutine_fn bdrv_co_rw_vmstate_entry(void *opaque)
-- 
2.9.3
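
The shape of the change, with every early return turned into an assignment so the in-flight counter is always decremented, can be sketched in isolation. Everything below is a toy model: ToyDevice and the toy_* functions are invented for illustration and are not QEMU APIs.

```c
#include <errno.h>

/* Toy stand-in for a block device; none of these names are QEMU's. */
typedef struct ToyDevice {
    int in_flight;         /* requests currently being processed */
    int wakeups;           /* times a waiter would have been notified */
    int has_driver;        /* 0 models a device with no driver attached */
    int supports_vmstate;  /* 0 models a driver without vmstate support */
} ToyDevice;

static void toy_inc_in_flight(ToyDevice *dev)
{
    dev->in_flight++;
}

static void toy_dec_in_flight(ToyDevice *dev)
{
    dev->in_flight--;
    dev->wakeups++;  /* models the wakeup issued when a request completes */
}

/* Single exit path, so the decrement runs for every outcome. */
static int toy_rw_vmstate(ToyDevice *dev)
{
    int ret = -ENOTSUP;

    toy_inc_in_flight(dev);

    if (!dev->has_driver) {
        ret = -ENODEV;
    } else if (dev->supports_vmstate) {
        ret = 0;  /* the real transfer would happen here */
    }

    toy_dec_in_flight(dev);
    return ret;
}
```

With early returns, each error branch would need its own decrement; collecting the result in ret keeps the increment and decrement visibly paired.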




[Qemu-devel] [PATCH 14/21] vfio/ccw: vfio based subchannel passthrough driver

2017-05-19 Thread Cornelia Huck
From: Xiao Feng Ren 

We use the IOMMU_TYPE1 of VFIO to realize subchannel passthrough,
and implement a vfio based subchannel passthrough driver called
"vfio-ccw".

Support qemu parameters in the style of:
"-device vfio-ccw,sysfsdev=$mdev_file_path,devno=xx.x."

Reviewed-by: Eric Auger 
Acked-by: Alex Williamson 
Signed-off-by: Xiao Feng Ren 
Signed-off-by: Dong Jia Shi 
Message-Id: <20170517004813.58227-8-bjsdj...@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck 
---
 default-configs/s390x-softmmu.mak |   1 +
 hw/vfio/Makefile.objs |   1 +
 hw/vfio/ccw.c | 187 ++
 include/hw/vfio/vfio-common.h |   1 +
 4 files changed, 190 insertions(+)
 create mode 100644 hw/vfio/ccw.c

diff --git a/default-configs/s390x-softmmu.mak b/default-configs/s390x-softmmu.mak
index 9615a48f80..18aed56fc0 100644
--- a/default-configs/s390x-softmmu.mak
+++ b/default-configs/s390x-softmmu.mak
@@ -5,4 +5,5 @@ CONFIG_SCLPCONSOLE=y
 CONFIG_TERMINAL3270=y
 CONFIG_S390_FLIC=y
 CONFIG_S390_FLIC_KVM=$(CONFIG_KVM)
+CONFIG_VFIO_CCW=$(CONFIG_LINUX)
 CONFIG_WDT_DIAG288=y
diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index 05e7fbb93f..c3ab9097f1 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -1,6 +1,7 @@
 ifeq ($(CONFIG_LINUX), y)
 obj-$(CONFIG_SOFTMMU) += common.o
 obj-$(CONFIG_PCI) += pci.o pci-quirks.o
+obj-$(CONFIG_VFIO_CCW) += ccw.o
 obj-$(CONFIG_SOFTMMU) += platform.o
 obj-$(CONFIG_VFIO_XGMAC) += calxeda-xgmac.o
 obj-$(CONFIG_VFIO_AMD_XGBE) += amd-xgbe.o
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
new file mode 100644
index 00..7d2497cee6
--- /dev/null
+++ b/hw/vfio/ccw.c
@@ -0,0 +1,187 @@
+/*
+ * vfio based subchannel assignment support
+ *
+ * Copyright 2017 IBM Corp.
+ * Author(s): Dong Jia Shi 
+ *Xiao Feng Ren 
+ *Pierre Morel 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or(at
+ * your option) any version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#include 
+#include 
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/sysbus.h"
+#include "hw/vfio/vfio.h"
+#include "hw/vfio/vfio-common.h"
+#include "hw/s390x/s390-ccw.h"
+#include "hw/s390x/ccw-device.h"
+
+#define TYPE_VFIO_CCW "vfio-ccw"
+typedef struct VFIOCCWDevice {
+S390CCWDevice cdev;
+VFIODevice vdev;
+} VFIOCCWDevice;
+
+static void vfio_ccw_compute_needs_reset(VFIODevice *vdev)
+{
+vdev->needs_reset = false;
+}
+
+/*
+ * We don't need vfio_hot_reset_multi and vfio_eoi operations for
+ * vfio_ccw device now.
+ */
+struct VFIODeviceOps vfio_ccw_ops = {
+.vfio_compute_needs_reset = vfio_ccw_compute_needs_reset,
+};
+
+static void vfio_ccw_reset(DeviceState *dev)
+{
+CcwDevice *ccw_dev = DO_UPCAST(CcwDevice, parent_obj, dev);
+S390CCWDevice *cdev = DO_UPCAST(S390CCWDevice, parent_obj, ccw_dev);
+VFIOCCWDevice *vcdev = DO_UPCAST(VFIOCCWDevice, cdev, cdev);
+
+ioctl(vcdev->vdev.fd, VFIO_DEVICE_RESET);
+}
+
+static void vfio_put_device(VFIOCCWDevice *vcdev)
+{
+g_free(vcdev->vdev.name);
+vfio_put_base_device(&vcdev->vdev);
+}
+
+static VFIOGroup *vfio_ccw_get_group(S390CCWDevice *cdev, Error **errp)
+{
+char *tmp, group_path[PATH_MAX];
+ssize_t len;
+int groupid;
+
+tmp = g_strdup_printf("/sys/bus/css/devices/%x.%x.%04x/%s/iommu_group",
+  cdev->hostid.cssid, cdev->hostid.ssid,
+  cdev->hostid.devid, cdev->mdevid);
+len = readlink(tmp, group_path, sizeof(group_path));
+g_free(tmp);
+
+if (len <= 0 || len >= sizeof(group_path)) {
+error_setg(errp, "vfio: no iommu_group found");
+return NULL;
+}
+
+group_path[len] = 0;
+
+if (sscanf(basename(group_path), "%d", &groupid) != 1) {
+error_setg(errp, "vfio: failed to read %s", group_path);
+return NULL;
+}
+
+return vfio_get_group(groupid, &address_space_memory, errp);
+}
+
+static void vfio_ccw_realize(DeviceState *dev, Error **errp)
+{
+VFIODevice *vbasedev;
+VFIOGroup *group;
+CcwDevice *ccw_dev = DO_UPCAST(CcwDevice, parent_obj, dev);
+S390CCWDevice *cdev = DO_UPCAST(S390CCWDevice, parent_obj, ccw_dev);
+VFIOCCWDevice *vcdev = DO_UPCAST(VFIOCCWDevice, cdev, cdev);
+S390CCWDeviceClass *cdc = S390_CCW_DEVICE_GET_CLASS(cdev);
+Error *err = NULL;
+
+/* Call the class init function for subchannel. */
+if (cdc->realize) {
+cdc->realize(cdev, vcdev->vdev.sysfsdev, &err);
+if (err) {
+goto out_err_propagate;
+}
+}
+
+group = vfio_ccw_get_group(cdev, &err);
+if (!group) {
+goto out_group_err;
+}
+
+vcdev->vdev.ops = &vfio_ccw_ops;
+vcdev->vdev.type = 

[Qemu-devel] [PATCH 05/21] pc-bios/s390-ccw: Get list of supported VPD pages

2017-05-19 Thread Cornelia Huck
From: Eric Farman 

The "Supported Pages" Inquiry EVPD page is mandatory for all SCSI devices,
and is used as a gateway for what VPD pages the device actually supports.
Let's issue this Inquiry, and dump that list with the debug facility.

Signed-off-by: Eric Farman 
Message-Id: <20170510155359.32727-6-far...@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck 
---
 pc-bios/s390-ccw/scsi.h| 10 ++
 pc-bios/s390-ccw/virtio-scsi.c | 17 +
 2 files changed, 27 insertions(+)

diff --git a/pc-bios/s390-ccw/scsi.h b/pc-bios/s390-ccw/scsi.h
index 83ffaef54e..803eff8ae3 100644
--- a/pc-bios/s390-ccw/scsi.h
+++ b/pc-bios/s390-ccw/scsi.h
@@ -28,9 +28,11 @@
 
 /* SCSI Inquiry Types */
 #define SCSI_INQUIRY_STANDARD   0x00U
+#define SCSI_INQUIRY_EVPD   0x01U
 
 /* SCSI Inquiry Pages */
 #define SCSI_INQUIRY_STANDARD_NONE  0x00U
+#define SCSI_INQUIRY_EVPD_SUPPORTED_PAGES   0x00U
 
 union ScsiLun {
 uint64_t v64;/* numeric shortcut */
@@ -77,6 +79,14 @@ struct ScsiInquiryStd {
 }  __attribute__((packed));
 typedef struct ScsiInquiryStd ScsiInquiryStd;
 
+struct ScsiInquiryEvpdPages {
+uint8_t peripheral_qdt; /* b0, use (b0 & 0x1f) to get SCSI_INQ_RDT  */
+uint8_t page_code;  /* b1   */
+uint16_t page_length;   /* b2..b3 length = N-3  */
+uint8_t byte[28];   /* b4..bN Supported EVPD pages (N=31 here)  */
+}  __attribute__((packed));
+typedef struct ScsiInquiryEvpdPages ScsiInquiryEvpdPages;
+
 struct ScsiCdbInquiry {
 uint8_t command; /* b0, == 0x12 */
 uint8_t b1;  /* b1, |= 0x01 (evpd)  */
diff --git a/pc-bios/s390-ccw/virtio-scsi.c b/pc-bios/s390-ccw/virtio-scsi.c
index 9d2e14cdf0..e34755c4d4 100644
--- a/pc-bios/s390-ccw/virtio-scsi.c
+++ b/pc-bios/s390-ccw/virtio-scsi.c
@@ -19,6 +19,7 @@ static VirtioScsiCmdReq req;
 static VirtioScsiCmdResp resp;
 
 static uint8_t scsi_inquiry_std_response[256];
+static ScsiInquiryEvpdPages scsi_inquiry_evpd_pages_response;
 
 static inline void vs_assert(bool term, const char **msgs)
 {
@@ -319,6 +320,8 @@ void virtio_scsi_setup(VDev *vdev)
 int retry_test_unit_ready = 3;
 uint8_t data[256];
 uint32_t data_size = sizeof(data);
+ScsiInquiryEvpdPages *evpd = &scsi_inquiry_evpd_pages_response;
+int i;
 
 vdev->scsi_device = &default_scsi_device;
 virtio_scsi_locate_device(vdev);
@@ -363,6 +366,20 @@ void virtio_scsi_setup(VDev *vdev)
 vdev->scsi_block_size = VIRTIO_ISO_BLOCK_SIZE;
 }
 
+if (!scsi_inquiry(vdev,
+  SCSI_INQUIRY_EVPD,
+  SCSI_INQUIRY_EVPD_SUPPORTED_PAGES,
+  evpd,
+  sizeof(*evpd))) {
+virtio_scsi_verify_response(&resp, "virtio-scsi:setup:supported_pages");
+}
+
+debug_print_int("EVPD length", evpd->page_length);
+
+for (i = 0; i <= evpd->page_length; i++) {
+debug_print_int("supported EVPD page", evpd->byte[i]);
+}
+
 if (!scsi_read_capacity(vdev, data, data_size)) {
 virtio_scsi_verify_response(&resp, "virtio-scsi:setup:read_capacity");
 }
-- 
2.13.0
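
As a standalone illustration of walking a "Supported VPD Pages" response, here is a small sketch. It assumes the layout used by the struct in the patch (a 16-bit big-endian length in bytes 2..3, page codes starting at byte 4); evpd_supported_pages() is a hypothetical name, and the sketch clamps the advertised length to the bytes actually received rather than trusting it.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy the supported page codes out of a "Supported VPD Pages" response.
 * Returns the number of codes written to out, or -1 if the buffer is too
 * short to hold the 4-byte header.
 */
static int evpd_supported_pages(const uint8_t *buf, size_t buf_len,
                                uint8_t *out, size_t out_cap)
{
    size_t page_length, i, n = 0;

    if (buf_len < 4) {
        return -1;
    }

    /* Bytes 2..3: big-endian length of the page list that follows. */
    page_length = ((size_t)buf[2] << 8) | buf[3];

    /* Never read past the bytes actually received. */
    if (page_length > buf_len - 4) {
        page_length = buf_len - 4;
    }

    for (i = 0; i < page_length && n < out_cap; i++) {
        out[n++] = buf[4 + i];
    }
    return (int)n;
}
```

Reading the length byte by byte keeps the sketch independent of host endianness, which the packed uint16_t field in the patch relies on.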




[Qemu-devel] [PATCH 13/21] s390x/css: device support for s390-ccw passthrough

2017-05-19 Thread Cornelia Huck
From: Dong Jia Shi 

In order to support subchannel pass-through, we introduce an s390
subchannel device called "s390-ccw" to hold the real subchannel info.
The s390-ccw devices inherit from the abstract CcwDevice, which connects
to the existing virtual-css-bus.

Reviewed-by: Eric Auger 
Signed-off-by: Dong Jia Shi 
Message-Id: <20170517004813.58227-7-bjsdj...@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck 
---
 hw/s390x/Makefile.objs  |   1 +
 hw/s390x/s390-ccw.c | 141 
 include/hw/s390x/s390-ccw.h |  38 
 3 files changed, 180 insertions(+)
 create mode 100644 hw/s390x/s390-ccw.c
 create mode 100644 include/hw/s390x/s390-ccw.h

diff --git a/hw/s390x/Makefile.objs b/hw/s390x/Makefile.objs
index 36bd4b1645..a8e5575a8a 100644
--- a/hw/s390x/Makefile.objs
+++ b/hw/s390x/Makefile.objs
@@ -14,3 +14,4 @@ obj-y += ccw-device.o
 obj-y += s390-pci-bus.o s390-pci-inst.o
 obj-y += s390-skeys.o
 obj-$(CONFIG_KVM) += s390-skeys-kvm.o
+obj-y += s390-ccw.o
diff --git a/hw/s390x/s390-ccw.c b/hw/s390x/s390-ccw.c
new file mode 100644
index 00..e2b1973fda
--- /dev/null
+++ b/hw/s390x/s390-ccw.c
@@ -0,0 +1,141 @@
+/*
+ * s390 CCW Assignment Support
+ *
+ * Copyright 2017 IBM Corp
+ * Author(s): Dong Jia Shi 
+ *Xiao Feng Ren 
+ *Pierre Morel 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2
+ * or (at your option) any later version. See the COPYING file in the
+ * top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/sysbus.h"
+#include "libgen.h"
+#include "hw/s390x/css.h"
+#include "hw/s390x/css-bridge.h"
+#include "hw/s390x/s390-ccw.h"
+
+static void s390_ccw_get_dev_info(S390CCWDevice *cdev,
+  char *sysfsdev,
+  Error **errp)
+{
+unsigned int cssid, ssid, devid;
+char dev_path[PATH_MAX] = {0}, *tmp;
+
+if (!sysfsdev) {
+error_setg(errp, "No host device provided");
+error_append_hint(errp,
+  "Use -device vfio-ccw,sysfsdev=PATH_TO_DEVICE\n");
+return;
+}
+
+if (!realpath(sysfsdev, dev_path)) {
+error_setg_errno(errp, errno, "Host device '%s' not found", sysfsdev);
+return;
+}
+
+cdev->mdevid = g_strdup(basename(dev_path));
+
+tmp = basename(dirname(dev_path));
+if (sscanf(tmp, "%2x.%1x.%4x", &cssid, &ssid, &devid) != 3) {
+error_setg_errno(errp, errno, "Failed to read %s", tmp);
+return;
+}
+
+cdev->hostid.cssid = cssid;
+cdev->hostid.ssid = ssid;
+cdev->hostid.devid = devid;
+cdev->hostid.valid = true;
+}
+
+static void s390_ccw_realize(S390CCWDevice *cdev, char *sysfsdev, Error **errp)
+{
+CcwDevice *ccw_dev = CCW_DEVICE(cdev);
+CCWDeviceClass *ck = CCW_DEVICE_GET_CLASS(ccw_dev);
+DeviceState *parent = DEVICE(ccw_dev);
+BusState *qbus = qdev_get_parent_bus(parent);
+VirtualCssBus *cbus = VIRTUAL_CSS_BUS(qbus);
+SubchDev *sch;
+int ret;
+Error *err = NULL;
+
+s390_ccw_get_dev_info(cdev, sysfsdev, &err);
+if (err) {
+goto out_err_propagate;
+}
+
+sch = css_create_sch(ccw_dev->devno, false, cbus->squash_mcss, &err);
+if (!sch) {
+goto out_mdevid_free;
+}
+sch->driver_data = cdev;
+
+ccw_dev->sch = sch;
+ret = css_sch_build_schib(sch, &cdev->hostid);
+if (ret) {
+error_setg_errno(&err, -ret, "%s: Failed to build initial schib",
+ __func__);
+goto out_err;
+}
+
+ck->realize(ccw_dev, &err);
+if (err) {
+goto out_err;
+}
+
+css_generate_sch_crws(sch->cssid, sch->ssid, sch->schid,
+  parent->hotplugged, 1);
+return;
+
+out_err:
+css_subch_assign(sch->cssid, sch->ssid, sch->schid, sch->devno, NULL);
+ccw_dev->sch = NULL;
+g_free(sch);
+out_mdevid_free:
+g_free(cdev->mdevid);
+out_err_propagate:
+error_propagate(errp, err);
+}
+
+static void s390_ccw_unrealize(S390CCWDevice *cdev, Error **errp)
+{
+CcwDevice *ccw_dev = CCW_DEVICE(cdev);
+SubchDev *sch = ccw_dev->sch;
+
+if (sch) {
+css_subch_assign(sch->cssid, sch->ssid, sch->schid, sch->devno, NULL);
+g_free(sch);
+ccw_dev->sch = NULL;
+}
+
+g_free(cdev->mdevid);
+}
+
+static void s390_ccw_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+S390CCWDeviceClass *cdc = S390_CCW_DEVICE_CLASS(klass);
+
+dc->bus_type = TYPE_VIRTUAL_CSS_BUS;
+cdc->realize = s390_ccw_realize;
+cdc->unrealize = s390_ccw_unrealize;
+}
+
+static const TypeInfo s390_ccw_info = {
+.name  = TYPE_S390_CCW,
+.parent= TYPE_CCW_DEVICE,
+.instance_size = sizeof(S390CCWDevice),
+
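
The device-id parsing done in s390_ccw_get_dev_info() can be tried on its own with a small sketch. The CcwId struct and parse_ccw_id() are hypothetical names for illustration; the format string is the one from the patch, and the three output arguments are assumed to be the addresses of the local cssid, ssid and devid variables.

```c
#include <stdio.h>

/* Hypothetical holder for the three parts of a "cssid.ssid.devid" name. */
typedef struct CcwId {
    unsigned int cssid;
    unsigned int ssid;
    unsigned int devid;
} CcwId;

/* Returns 0 on success, -1 if name does not start with three hex fields. */
static int parse_ccw_id(const char *name, CcwId *id)
{
    unsigned int cssid, ssid, devid;

    /* Field widths cap the input at 2, 1 and 4 hex digits respectively. */
    if (sscanf(name, "%2x.%1x.%4x", &cssid, &ssid, &devid) != 3) {
        return -1;
    }

    id->cssid = cssid;
    id->ssid = ssid;
    id->devid = devid;
    return 0;
}
```

Note that sscanf() does not set errno on a plain matching failure, so a caller reporting the error should not rely on errno being meaningful in that case.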

[Qemu-devel] [PULL 05/20] mc146818rtc: embrace all x86 specific code

2017-05-19 Thread Paolo Bonzini
From: Xiao Guangrong 

Introduce a function, rtc_policy_slew_deliver_irq(), which delivers
the irq if LOST_TICK_POLICY_SLEW is used. As this policy is only
supported on x86, calling it on other platforms will trigger an assert.

After that, we can move the x86 specific code to the common place.

Signed-off-by: Xiao Guangrong 
Message-Id: <20170510083259.3900-6-xiaoguangr...@tencent.com>
Signed-off-by: Paolo Bonzini 
---
 hw/timer/mc146818rtc.c | 60 ++
 1 file changed, 31 insertions(+), 29 deletions(-)

diff --git a/hw/timer/mc146818rtc.c b/hw/timer/mc146818rtc.c
index f9d6181..542cd09 100644
--- a/hw/timer/mc146818rtc.c
+++ b/hw/timer/mc146818rtc.c
@@ -125,17 +125,34 @@ static void rtc_coalesced_timer_update(RTCState *s)
 }
 }
 
+static QLIST_HEAD(, RTCState) rtc_devices =
+QLIST_HEAD_INITIALIZER(rtc_devices);
+
 #ifdef TARGET_I386
+void qmp_rtc_reset_reinjection(Error **errp)
+{
+RTCState *s;
+
+QLIST_FOREACH(s, &rtc_devices, link) {
+s->irq_coalesced = 0;
+}
+}
+
+static bool rtc_policy_slew_deliver_irq(RTCState *s)
+{
+apic_reset_irq_delivered();
+qemu_irq_raise(s->irq);
+return apic_get_irq_delivered();
+}
+
 static void rtc_coalesced_timer(void *opaque)
 {
 RTCState *s = opaque;
 
 if (s->irq_coalesced != 0) {
-apic_reset_irq_delivered();
 s->cmos_data[RTC_REG_C] |= 0xc0;
 DPRINTF_C("cmos: injecting from timer\n");
-qemu_irq_raise(s->irq);
-if (apic_get_irq_delivered()) {
+if (rtc_policy_slew_deliver_irq(s)) {
 s->irq_coalesced--;
 DPRINTF_C("cmos: coalesced irqs decreased to %d\n",
   s->irq_coalesced);
@@ -144,6 +161,12 @@ static void rtc_coalesced_timer(void *opaque)
 
 rtc_coalesced_timer_update(s);
 }
+#else
+static bool rtc_policy_slew_deliver_irq(RTCState *s)
+{
+assert(0);
+return false;
+}
 #endif
 
 static uint32_t rtc_periodic_clock_ticks(RTCState *s)
@@ -254,21 +277,17 @@ static void rtc_periodic_timer(void *opaque)
 s->cmos_data[RTC_REG_C] |= REG_C_PF;
 if (s->cmos_data[RTC_REG_B] & REG_B_PIE) {
 s->cmos_data[RTC_REG_C] |= REG_C_IRQF;
-#ifdef TARGET_I386
 if (s->lost_tick_policy == LOST_TICK_POLICY_SLEW) {
 if (s->irq_reinject_on_ack_count >= RTC_REINJECT_ON_ACK_COUNT)
-s->irq_reinject_on_ack_count = 0;  
-apic_reset_irq_delivered();
-qemu_irq_raise(s->irq);
-if (!apic_get_irq_delivered()) {
+s->irq_reinject_on_ack_count = 0;
+if (!rtc_policy_slew_deliver_irq(s)) {
 s->irq_coalesced++;
 rtc_coalesced_timer_update(s);
 DPRINTF_C("cmos: coalesced irqs increased to %d\n",
   s->irq_coalesced);
 }
 } else
-#endif
-qemu_irq_raise(s->irq);
+qemu_irq_raise(s->irq);
 }
 }
 
@@ -612,20 +631,6 @@ static void rtc_get_time(RTCState *s, struct tm *tm)
 rtc_from_bcd(s, s->cmos_data[RTC_CENTURY]) * 100 - 1900;
 }
 
-static QLIST_HEAD(, RTCState) rtc_devices =
-QLIST_HEAD_INITIALIZER(rtc_devices);
-
-#ifdef TARGET_I386
-void qmp_rtc_reset_reinjection(Error **errp)
-{
-RTCState *s;
-
-QLIST_FOREACH(s, &rtc_devices, link) {
-s->irq_coalesced = 0;
-}
-}
-#endif
-
 static void rtc_set_time(RTCState *s)
 {
 struct tm tm;
@@ -745,22 +750,19 @@ static uint64_t cmos_ioport_read(void *opaque, hwaddr addr,
 if (ret & (REG_C_UF | REG_C_AF)) {
 check_update_timer(s);
 }
-#ifdef TARGET_I386
+
 if(s->irq_coalesced &&
 (s->cmos_data[RTC_REG_B] & REG_B_PIE) &&
 s->irq_reinject_on_ack_count < RTC_REINJECT_ON_ACK_COUNT) {
 s->irq_reinject_on_ack_count++;
 s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_PF;
-apic_reset_irq_delivered();
 DPRINTF_C("cmos: injecting on ack\n");
-qemu_irq_raise(s->irq);
-if (apic_get_irq_delivered()) {
+if (rtc_policy_slew_deliver_irq(s)) {
 s->irq_coalesced--;
 DPRINTF_C("cmos: coalesced irqs decreased to %d\n",
   s->irq_coalesced);
 }
 }
-#endif
 break;
 default:
 ret = s->cmos_data[s->cmos_index];
-- 
1.8.3.1
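
The coalescing logic that the new helper wraps can be modelled in a few lines. This is only a toy sketch: ToyRtc and the toy_* functions are invented for illustration, and the guest_accepts flag stands in for the real "was the irq delivered" query against the APIC.

```c
#include <stdbool.h>

/* Toy RTC state; field names echo the patch but this is not QEMU code. */
typedef struct ToyRtc {
    int irq_coalesced;   /* ticks that still have to be re-injected */
    bool guest_accepts;  /* stand-in for "the APIC delivered the irq" */
} ToyRtc;

/* Stand-in for rtc_policy_slew_deliver_irq(): raise and report delivery. */
static bool toy_deliver_irq(ToyRtc *s)
{
    return s->guest_accepts;
}

/* Periodic tick: a tick that cannot be delivered is remembered. */
static void toy_periodic_tick(ToyRtc *s)
{
    if (!toy_deliver_irq(s)) {
        s->irq_coalesced++;
    }
}

/* Catch-up timer: retry one remembered tick, forget it once delivered. */
static void toy_coalesced_timer(ToyRtc *s)
{
    if (s->irq_coalesced != 0 && toy_deliver_irq(s)) {
        s->irq_coalesced--;
    }
}
```

Putting the delivery check behind one function is what lets the callers drop their #ifdef TARGET_I386 blocks; only the helper needs a per-target definition.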





Re: [Qemu-devel] [virtio-dev] Re: [virtio-dev] Re: [PATCH v2 00/16] Vhost-pci for inter-VM communication

2017-05-19 Thread Wei Wang

On 05/19/2017 11:10 AM, Jason Wang wrote:

On 2017-05-18 11:03, Wei Wang wrote:

On 05/17/2017 02:22 PM, Jason Wang wrote:

On 2017-05-17 14:16, Jason Wang wrote:

On 2017-05-16 15:12, Wei Wang wrote:

Hi:

Care to post the driver codes too?

OK. It may take some time to clean up the driver code before posting
it. You can first check the draft at the repo here:
https://github.com/wei-w-wang/vhost-pci-driver

Best,
Wei


Interesting, looks like there's one copy on tx side. We used to 
have zerocopy support for tun for VM2VM traffic. Could you please 
try to compare it with your vhost-pci-net by:


We can analyze the whole data path, from VM1's network stack sending
packets to VM2's network stack receiving them. The number of copies is
actually the same for both.


That's why I'm asking you to compare the performance. The only reason 
for vhost-pci is performance. You should prove it.




vhost-pci: one copy happens in VM1's driver xmit(), which copies packets
from its network stack to VM2's RX ring buffer. (We call it "zerocopy"
because there is no intermediate copy between VMs.)
zerocopy enabled vhost-net: one copy happens in tun's recvmsg, which
copies packets from VM1's TX ring buffer to VM2's RX ring buffer.


Actually, there's a major difference here. You do the copy in the guest,
which consumes the time slice of the vcpu thread on the host. Vhost_net
does this in its own thread. So I feel vhost_net is even faster here,
but maybe I am wrong.




The code path using vhost_net is much longer: the ping test shows around
0.237 ms with the zcopy based vhost_net, and around 0.06 ms with
vhost-pci. Due to an environment issue, I will report the throughput
numbers later.



That being said, we compared to vhost-user, instead of vhost_net,
because vhost-user is the one that is used in NFV, which we think is a
major use case for vhost-pci.


If this is true, why not draft a pmd driver instead of a kernel one? 


Yes, that's right. There are actually two directions for the vhost-pci
driver implementation: a kernel driver and a dpdk pmd. The QEMU side
device patches are posted first for discussion, because when the device
part is ready, we will be able to have the related team work on the pmd
driver as well. As usual, the pmd driver would give a much better
throughput.

So, I think at this stage we should focus on reviewing the device part,
and use the kernel driver to prove that the device part design and
implementation are reasonable and functional.


And do you use the virtio-net kernel driver to compare the performance?
If yes, has OVS dpdk been optimized for the kernel driver (I think not)?




We used the legacy OVS+DPDK.
Another thing with the existing OVS+DPDK usage is its centralization
property. With vhost-pci, we will be able to de-centralize the usage.

What's more important, if vhost-pci is faster, I think its kernel 
driver should be also faster than virtio-net, no?


Sorry about the confusion. We are actually not trying to use vhost-pci
to replace virtio-net. Rather, vhost-pci can be viewed as another type
of backend for virtio-net to be used in NFV (the communication channel
is vhost-pci-net<->virtio_net).


Best,
Wei



[Qemu-devel] [PATCH] target/i386: use multiple CPU AddressSpaces

2017-05-19 Thread Paolo Bonzini
This speeds up SMM switches.  Later on it may remove the need to take
the BQL, and it may also allow reusing code between TCG and KVM.

Signed-off-by: Paolo Bonzini 
---
 target/i386/cpu.c| 15 +-
 target/i386/cpu.h| 11 +-
 target/i386/helper.c | 54 
 target/i386/machine.c|  4 
 target/i386/smm_helper.c | 18 
 5 files changed, 47 insertions(+), 55 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 5e768404a1..1b3b77c96a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3239,7 +3239,7 @@ static void x86_cpu_machine_done(Notifier *n, void *unused)
 cpu->smram = g_new(MemoryRegion, 1);
 memory_region_init_alias(cpu->smram, OBJECT(cpu), "smram",
  smram, 0, 1ull << 32);
-memory_region_set_enabled(cpu->smram, false);
+memory_region_set_enabled(cpu->smram, true);
 memory_region_add_subregion_overlap(cpu->cpu_as_root, 0, cpu->smram, 1);
 }
 }
@@ -3619,7 +3619,9 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 
 #ifndef CONFIG_USER_ONLY
 if (tcg_enabled()) {
-AddressSpace *newas = g_new(AddressSpace, 1);
+AddressSpace *as_normal = address_space_init_shareable(cs->memory,
+   "cpu-memory");
+AddressSpace *as_smm = g_new(AddressSpace, 1);
 
 cpu->cpu_as_mem = g_new(MemoryRegion, 1);
 cpu->cpu_as_root = g_new(MemoryRegion, 1);
@@ -3635,9 +3637,11 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
  get_system_memory(), 0, ~0ull);
 memory_region_add_subregion_overlap(cpu->cpu_as_root, 0, cpu->cpu_as_mem, 0);
 memory_region_set_enabled(cpu->cpu_as_mem, true);
-address_space_init(newas, cpu->cpu_as_root, "CPU");
-cs->num_ases = 1;
-cpu_address_space_init(cs, newas, 0);
+address_space_init(as_smm, cpu->cpu_as_root, "CPU");
+
+cs->num_ases = 2;
+cpu_address_space_init(cs, as_normal, 0);
+cpu_address_space_init(cs, as_smm, 1);
 
 /* ... SMRAM with higher priority, linked from /machine/smram.  */
 cpu->machine_done.notify = x86_cpu_machine_done;
@@ -4053,6 +4057,7 @@ static void x86_cpu_common_class_init(ObjectClass *oc, void *data)
 #ifdef CONFIG_USER_ONLY
 cc->handle_mmu_fault = x86_cpu_handle_mmu_fault;
 #else
+cc->asidx_from_attrs = x86_asidx_from_attrs;
 cc->get_memory_mapping = x86_cpu_get_memory_mapping;
 cc->get_phys_page_debug = x86_cpu_get_phys_page_debug;
 cc->write_elf64_note = x86_cpu_write_elf64_note;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 32a3a0cb8f..c2e081c6e3 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1450,6 +1450,16 @@ int x86_cpu_handle_mmu_fault(CPUState *cpu, vaddr addr,
 void x86_cpu_set_a20(X86CPU *cpu, int a20_state);
 
 #ifndef CONFIG_USER_ONLY
+static inline int x86_asidx_from_attrs(CPUState *cs, MemTxAttrs attrs)
+{
+return !!attrs.secure;
+}
+
+static inline AddressSpace *cpu_addressspace(CPUState *cs, MemTxAttrs attrs)
+{
+return cpu_get_address_space(cs, cpu_asidx_from_attrs(cs, attrs));
+}
+
 uint8_t x86_ldub_phys(CPUState *cs, hwaddr addr);
 uint32_t x86_lduw_phys(CPUState *cs, hwaddr addr);
 uint32_t x86_ldl_phys(CPUState *cs, hwaddr addr);
@@ -1652,7 +1662,6 @@ void do_interrupt_x86_hardirq(CPUX86State *env, int intno, int is_hw);
 
 /* smm_helper.c */
 void do_smm_enter(X86CPU *cpu);
-void cpu_smm_update(X86CPU *cpu);
 
 /* apic.c */
 void cpu_report_tpr_access(CPUX86State *env, TPRAccess access);
diff --git a/target/i386/helper.c b/target/i386/helper.c
index 6c16e7cb53..d0daa1f882 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -1403,89 +1403,89 @@ uint8_t x86_ldub_phys(CPUState *cs, hwaddr addr)
 {
 X86CPU *cpu = X86_CPU(cs);
 CPUX86State *env = &cpu->env;
+MemTxAttrs attrs = cpu_get_mem_attrs(env);
+AddressSpace *as = cpu_addressspace(cs, attrs);
 
-return address_space_ldub(cs->as, addr,
-  cpu_get_mem_attrs(env),
-  NULL);
+return address_space_ldub(as, addr, attrs, NULL);
 }
 
 uint32_t x86_lduw_phys(CPUState *cs, hwaddr addr)
 {
 X86CPU *cpu = X86_CPU(cs);
 CPUX86State *env = &cpu->env;
+MemTxAttrs attrs = cpu_get_mem_attrs(env);
+AddressSpace *as = cpu_addressspace(cs, attrs);
 
-return address_space_lduw(cs->as, addr,
-  cpu_get_mem_attrs(env),
-  NULL);
+return address_space_lduw(as, addr, attrs, NULL);
 }
 
 uint32_t x86_ldl_phys(CPUState *cs, hwaddr addr)
 {
 X86CPU *cpu = X86_CPU(cs);
 CPUX86State *env = &cpu->env;
+MemTxAttrs attrs = cpu_get_mem_attrs(env);
+AddressSpace *as = cpu_addressspace(cs, attrs);
 
-return address_space_ldl(cs->as, addr,

[Qemu-devel] [PATCH v1 10/13] qcow2-cluster: slightly refactor handle_dependencies()

2017-05-19 Thread Anton Nefedov
  - assert the alignment on return if the allocation has to stop
(at the start of a running allocation)
  - make use of const specifiers for local variables
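
The two checks this hunk relies on can be sketched in isolation. This is a
minimal illustration, assuming a power-of-two cluster size passed in directly;
the `toy_` helpers are hypothetical and take the size as a parameter, whereas
the real `start_of_cluster()` takes the qcow2 state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round an offset down to the start of its cluster.
 * Assumes cluster_size is a non-zero power of two. */
static inline uint64_t toy_start_of_cluster(uint64_t cluster_size,
                                            uint64_t offset)
{
    assert(cluster_size != 0 && (cluster_size & (cluster_size - 1)) == 0);
    return offset & ~(cluster_size - 1);
}

/* Half-open ranges [start, end) and [old_start, old_end) intersect
 * unless one of them ends before the other one begins. */
static inline bool toy_ranges_intersect(uint64_t start, uint64_t end,
                                        uint64_t old_start, uint64_t old_end)
{
    return !(end <= old_start || start >= old_end);
}

/* The condition asserted by the patch: the offset sits on a cluster start. */
static inline bool toy_is_cluster_aligned(uint64_t cluster_size,
                                          uint64_t offset)
{
    return offset == toy_start_of_cluster(cluster_size, offset);
}
```

For example, with 64 KiB clusters an offset of 65636 rounds down to 65536, and
the ranges [0, 10) and [10, 20) do not intersect because they only touch.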

Signed-off-by: Anton Nefedov 
Signed-off-by: Denis V. Lunev 
---
 block/qcow2-cluster.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 4204db9..03d6f7e 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -900,15 +900,15 @@ out:
  * Check if there already is an AIO write request in flight which allocates
  * the same cluster. In this case we need to wait until the previous
  * request has completed and updated the L2 table accordingly.
- *
  * Returns:
  *   0   if there was no dependency. *cur_bytes indicates the number of
  *   bytes from guest_offset that can be read before the next
- *   dependency must be processed (or the request is complete)
+ *   dependency must be processed (or the request is complete).
+ *   *m is not modified
  *
- *   -EAGAIN if we had to wait for another request, previously gathered
- *   information on cluster allocation may be invalid now. The caller
- *   must start over anyway, so consider *cur_bytes undefined.
+ *   -EAGAIN if we had to wait for another request. The caller
+ *   must start over, so consider *cur_bytes undefined.
+ *   *m is not modified
  */
 static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
 uint64_t *cur_bytes, QCowL2Meta **m)
@@ -919,10 +919,10 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
 
 QLIST_FOREACH(old_alloc, &s->cluster_allocs, next_in_flight) {
 
-uint64_t start = guest_offset;
-uint64_t end = start + bytes;
-uint64_t old_start = l2meta_cow_start(old_alloc);
-uint64_t old_end = l2meta_cow_end(old_alloc);
+const uint64_t start = guest_offset;
+const uint64_t end = start + bytes;
+const uint64_t old_start = l2meta_cow_start(old_alloc);
+const uint64_t old_end = l2meta_cow_end(old_alloc);
 
 if (end <= old_start || start >= old_end) {
 /* No intersection */
@@ -939,6 +939,8 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
  * and deal with requests depending on them before starting to
  * gather new ones. Not worth the trouble. */
 if (bytes == 0 && *m) {
+/* start must be cluster aligned at this point */
+assert(start == start_of_cluster(s, start));
 *cur_bytes = 0;
 return 0;
 }
-- 
2.7.4




Re: [Qemu-devel] [PATCH RFC 1/6] io: only allow return path for socket typed

2017-05-19 Thread Daniel P. Berrange
On Fri, May 19, 2017 at 05:51:43PM +0800, Peter Xu wrote:
> On Fri, May 19, 2017 at 09:25:38AM +0100, Daniel P. Berrange wrote:
> > On Fri, May 19, 2017 at 02:43:27PM +0800, Peter Xu wrote:
> > > We don't really have a return path for the other types yet. Let's check
> > > this when .get_return_path() is called.
> > > 
> > > For this, we introduce a new feature bit, and set it up only for socket
> > > typed IO channels.
> > > 
> > > This will help detect earlier failure for postcopy, e.g., logically
> > > speaking postcopy cannot work with "exec:". Before this patch, when we
> > > try to migrate with "migrate -d exec:cat>out", we'll hang the system.
> > > With this patch, we'll get:
> > > 
> > > (qemu) migrate -d exec:cat>out
> > > Unable to open return-path for postcopy
> > 
> > This is wrong - post-copy migration *can* work with exec: - it just entirely
> > depends on what command you are running. Your example ran a command which is
> > unidirectional, but if you ran 'exec:socat ...' you would have a fully
> > bidirectional channel. Actually the channel is always bi-directional, but
> > 'cat' simply won't ever send data back to QEMU.
> 
> Indeed. I should not block postcopy if the user used a TCP tunnel
> between the source and destination in some way, using this exec: way.
> Thanks for pointing that out.
> 
> However I still think the idea is needed here. Say, we'd better know
> whether the transport would be able to respond (though current
> approach of "assuming sockets are the only ones that can reply" is not
> a good solution...). Please see below.
> 
> > 
> > If QEMU hangs when the other end doesn't send data back, that actually seems
> > like a potentially serious bug in migration code. Even if using the normal
> > 'tcp' migration protocol, if the target QEMU server hangs and fails to
> > send data to QEMU on the return path, the source QEMU must never hang.
> 
> Firstly I should not say it's a hang - it's actually by-design here
> imho - migration thread is in the last phase now, waiting for a SHUT
> message from destination (which I think is wise). But from the
> behavior, indeed src VM is not usable during the time, just like what
> happened for most postcopy cases on the source side. So, we can see
> that postcopy "assumes" that destination side can reply now.
> 
> Meanwhile, I see it reasonable for postcopy to have such an
> assumption. After all, postcopy means "start VM on destination before
> pages are moved over completely", then there must be someone to reply
> to source, no matter whether it'll be via some kind of io channel.
> 
> That's why I think we still need the general idea here, that we need
> to know whether destination end is able to reply.
> 
> But, I still have no good idea (after knowing this patch won't work)
> on how we can do this... Any further suggestions would be greatly
> welcomed.

IMHO this is nothing more than a documentation issue for the 'exec'
protocol. ie, document that you should provide a bi-directional
transport for live migration.

A uni-directional transport is arguably only valid if you're using
migrate to save/restore the VM state to a file.
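
For what it's worth, a generic way to keep a waiter from blocking forever on a
peer that never answers is a bounded wait. The sketch below is only an
illustration using POSIX poll(); it is not what QEMU's migration code does, and
`toy_wait_for_reply` and its return convention are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for the peer to send something on fd.
 * Returns 1 if data (or end-of-file) can be read, 0 on timeout,
 * and a negative errno value on error.
 */
static int toy_wait_for_reply(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int ret;

    /* Retry if a signal interrupts the wait; note that this simple
     * loop restarts the full timeout on every retry. */
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0) {
        return -errno;
    }
    if (ret == 0) {
        return 0;               /* the peer stayed silent */
    }
    return (pfd.revents & (POLLIN | POLLHUP)) ? 1 : -EIO;
}
```

A caller would treat a return of 0 as "no reply yet" and report that, rather
than blocking in a read on a transport such as 'cat' that never writes back.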

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



[Qemu-devel] [PATCH 11/21] s390x/css: realize css_sch_build_schib

2017-05-19 Thread Cornelia Huck
From: Xiao Feng Ren 

The S390 virtual css support already has a mechanism to build a
virtual subchannel information block (schib) and provide virtual
subchannels to the guest. However, to pass-through subchannels to
a guest, we need to introduce a new mechanism to build its schib
according to the real device information. Thus we realize a new css
sch_build_schib function to extract the path_masks, chpids, chpid
type from sysfs. To reuse the existing code, we refactor
css_add_virtual_chpid to css_add_chpid.
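
The parsing step can be tried without a real sysfs tree. The following is a
minimal sketch, assuming the attribute holds three space-separated hex values
as read by the patch's "%x %x %x" format; `ToyPathMasks` and
`toy_parse_path_masks` are hypothetical names, and the 8-bit range check is an
addition that the patch does not make.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical holder for the three path masks. */
typedef struct {
    uint8_t pim;
    uint8_t pam;
    uint8_t pom;
} ToyPathMasks;

/* Parse text such as "80 80 ff" into the three masks.
 * Returns 0 on success and -EINVAL on malformed input. */
static int toy_parse_path_masks(const char *text, ToyPathMasks *out)
{
    unsigned int pim, pam, pom;

    if (text == NULL || out == NULL) {
        return -EINVAL;
    }
    if (sscanf(text, "%x %x %x", &pim, &pam, &pom) != 3) {
        return -EINVAL;
    }
    /* Extra check (not in the patch): each mask must fit in 8 bits. */
    if (pim > 0xff || pam > 0xff || pom > 0xff) {
        return -EINVAL;
    }
    out->pim = (uint8_t)pim;
    out->pam = (uint8_t)pam;
    out->pom = (uint8_t)pom;
    return 0;
}
```

Reading into `unsigned int` temporaries and narrowing afterwards avoids passing
pointers to 8-bit fields to a `%x` conversion.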

Reviewed-by: Pierre Morel 
Signed-off-by: Xiao Feng Ren 
Signed-off-by: Dong Jia Shi 
Message-Id: <20170517004813.58227-5-bjsdj...@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck 
---
 hw/s390x/css.c | 152 -
 include/hw/s390x/css.h |  36 ++--
 2 files changed, 168 insertions(+), 20 deletions(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index 15c4f4b249..2c8d0e7219 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -13,6 +13,7 @@
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "hw/qdev.h"
+#include "qemu/error-report.h"
 #include "qemu/bitops.h"
 #include "exec/address-spaces.h"
 #include "cpu.h"
@@ -1326,7 +1327,8 @@ unsigned int css_find_free_chpid(uint8_t cssid)
 return MAX_CHPID + 1;
 }
 
-static int css_add_virtual_chpid(uint8_t cssid, uint8_t chpid, uint8_t type)
+static int css_add_chpid(uint8_t cssid, uint8_t chpid, uint8_t type,
+ bool is_virt)
 {
 CssImage *css;
 
@@ -1340,7 +1342,7 @@ static int css_add_virtual_chpid(uint8_t cssid, uint8_t chpid, uint8_t type)
 }
 css->chpids[chpid].in_use = 1;
 css->chpids[chpid].type = type;
-css->chpids[chpid].is_virtual = 1;
+css->chpids[chpid].is_virtual = is_virt;
 
 css_generate_chp_crws(cssid, chpid);
 
@@ -1364,7 +1366,7 @@ void css_sch_build_virtual_schib(SubchDev *sch, uint8_t chpid, uint8_t type)
 p->pam = 0x80;
 p->chpid[0] = chpid;
 if (!css->chpids[chpid].in_use) {
-css_add_virtual_chpid(sch->cssid, chpid, type);
+css_add_chpid(sch->cssid, chpid, type, true);
 }
 
 memset(s, 0, sizeof(SCSW));
@@ -1978,3 +1980,147 @@ SubchDev *css_create_virtual_sch(CssDevId bus_id, Error **errp)
 css_subch_assign(sch->cssid, sch->ssid, schid, sch->devno, sch);
 return sch;
 }
+
+static int css_sch_get_chpids(SubchDev *sch, CssDevId *dev_id)
+{
+char *fid_path;
+FILE *fd;
+uint32_t chpid[8];
+int i;
+PMCW *p = &sch->curr_status.pmcw;
+
+fid_path = g_strdup_printf("/sys/bus/css/devices/%x.%x.%04x/chpids",
+   dev_id->cssid, dev_id->ssid, dev_id->devid);
+fd = fopen(fid_path, "r");
+if (fd == NULL) {
+error_report("%s: open %s failed", __func__, fid_path);
+g_free(fid_path);
+return -EINVAL;
+}
+
+if (fscanf(fd, "%x %x %x %x %x %x %x %x",
+&chpid[0], &chpid[1], &chpid[2], &chpid[3],
+&chpid[4], &chpid[5], &chpid[6], &chpid[7]) != 8) {
+fclose(fd);
+g_free(fid_path);
+return -EINVAL;
+}
+
+for (i = 0; i < ARRAY_SIZE(p->chpid); i++) {
+p->chpid[i] = chpid[i];
+}
+
+fclose(fd);
+g_free(fid_path);
+
+return 0;
+}
+
+static int css_sch_get_path_masks(SubchDev *sch, CssDevId *dev_id)
+{
+char *fid_path;
+FILE *fd;
+uint32_t pim, pam, pom;
+PMCW *p = &sch->curr_status.pmcw;
+
+fid_path = g_strdup_printf("/sys/bus/css/devices/%x.%x.%04x/pimpampom",
+   dev_id->cssid, dev_id->ssid, dev_id->devid);
+fd = fopen(fid_path, "r");
+if (fd == NULL) {
+error_report("%s: open %s failed", __func__, fid_path);
+g_free(fid_path);
+return -EINVAL;
+}
+
+if (fscanf(fd, "%x %x %x", &pim, &pam, &pom) != 3) {
+fclose(fd);
+g_free(fid_path);
+return -EINVAL;
+}
+
+p->pim = pim;
+p->pam = pam;
+p->pom = pom;
+fclose(fd);
+g_free(fid_path);
+
+return 0;
+}
+
+static int css_sch_get_chpid_type(uint8_t chpid, uint32_t *type,
+  CssDevId *dev_id)
+{
+char *fid_path;
+FILE *fd;
+
+fid_path = g_strdup_printf("/sys/devices/css%x/chp0.%02x/type",
+   dev_id->cssid, chpid);
+fd = fopen(fid_path, "r");
+if (fd == NULL) {
+error_report("%s: open %s failed", __func__, fid_path);
+g_free(fid_path);
+return -EINVAL;
+}
+
+if (fscanf(fd, "%x", type) != 1) {
+fclose(fd);
+g_free(fid_path);
+return -EINVAL;
+}
+
+fclose(fd);
+g_free(fid_path);
+
+return 0;
+}
+
+/*
+ * We currently retrieve the real device information from sysfs to build the
+ * guest subchannel information block without considering the migration feature.
+ * We need to revisit this problem when we want to add migration support.
+ */
+int 

[Qemu-devel] [PATCH 18/21] s390x/css: ccw translation infrastructure

2017-05-19 Thread Cornelia Huck
From: Xiao Feng Ren 

Implement a basic infrastructure of handling channel I/O instruction
interception for passed through subchannels:
1. Branch the code path of instruction interception handling by
   SubChannel type.
2. For a passed-through subchannel, issue the ORB to kernel to do ccw
   translation and perform an I/O operation.
3. Assign different condition code based on the I/O result, or
   trigger a program check.
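
Step 3 can be pictured as a small mapping from the kernel's result to a
condition code. This is only a sketch with hypothetical `toy_` names. The
patch itself states just two things: an inaccessible host device (-EACCES) is
reported like a missing one, i.e. cc 3, and all other errors either trigger a
program check or set cc 1. The rows `0 -> cc 0` and `-EBUSY -> cc 2` are an
assumption based on the usual meaning of those condition codes, and the
program-check case is left out.

```c
#include <assert.h>
#include <errno.h>

/* From the patch: treat an inaccessible host device as "no device". */
static int toy_normalize_start_result(int ret)
{
    return ret == -EACCES ? -ENODEV : ret;
}

/* Assumed mapping from the normalized result to a condition code. */
static int toy_result_to_cc(int ret)
{
    switch (toy_normalize_start_result(ret)) {
    case 0:
        return 0;   /* assumption: function initiated */
    case -EBUSY:
        return 2;   /* assumption: subchannel busy */
    case -ENODEV:
        return 3;   /* patch: not operational / inaccessible */
    default:
        return 1;   /* patch: other errors set cc 1 (or a program check) */
    }
}
```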

Signed-off-by: Xiao Feng Ren 
Signed-off-by: Dong Jia Shi 
Message-Id: <20170517004813.58227-12-bjsdj...@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck 
---
 hw/s390x/css.c | 89 ++
 hw/s390x/s390-ccw.c| 12 +++
 hw/s390x/virtio-ccw.c  |  1 +
 include/hw/s390x/css.h |  4 +++
 target/s390x/ioinst.c  |  9 +
 5 files changed, 109 insertions(+), 6 deletions(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index 462a768f9e..1e2f26b65a 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -524,7 +524,7 @@ static int css_interpret_ccw(SubchDev *sch, hwaddr ccw_addr,
 return ret;
 }
 
-static void sch_handle_start_func(SubchDev *sch, ORB *orb)
+static void sch_handle_start_func_virtual(SubchDev *sch, ORB *orb)
 {
 
 PMCW *p = &sch->curr_status.pmcw;
@@ -626,13 +626,58 @@ static void sch_handle_start_func(SubchDev *sch, ORB *orb)
 
 }
 
+static int sch_handle_start_func_passthrough(SubchDev *sch, ORB *orb)
+{
+
+PMCW *p = &sch->curr_status.pmcw;
+SCSW *s = &sch->curr_status.scsw;
+int ret;
+
+if (!(s->ctrl & SCSW_ACTL_SUSP)) {
+assert(orb != NULL);
+p->intparm = orb->intparm;
+}
+
+/*
+ * Only support prefetch enable mode.
+ * Only support 64bit addressing idal.
+ */
+if (!(orb->ctrl0 & ORB_CTRL0_MASK_PFCH) ||
+!(orb->ctrl0 & ORB_CTRL0_MASK_C64)) {
+return -EINVAL;
+}
+
+ret = s390_ccw_cmd_request(orb, s, sch->driver_data);
+switch (ret) {
+/* Currently we don't update control block and just return the cc code. */
+case 0:
+break;
+case -EBUSY:
+break;
+case -ENODEV:
+break;
+case -EACCES:
+/* Let's reflect an inaccessible host device by cc 3. */
+ret = -ENODEV;
+break;
+default:
+   /*
+* All other return codes will trigger a program check,
+* or set cc to 1.
+*/
+   break;
+};
+
+return ret;
+}
+
 /*
  * On real machines, this would run asynchronously to the main vcpus.
  * We might want to make some parts of the ssch handling (interpreting
  * read/writes) asynchronous later on if we start supporting more than
  * our current very simple devices.
  */
-static void do_subchannel_work(SubchDev *sch, ORB *orb)
+int do_subchannel_work_virtual(SubchDev *sch, ORB *orb)
 {
 
 SCSW *s = &sch->curr_status.scsw;
@@ -643,12 +688,45 @@ static void do_subchannel_work(SubchDev *sch, ORB *orb)
 sch_handle_halt_func(sch);
 } else if (s->ctrl & SCSW_FCTL_START_FUNC) {
 /* Triggered by both ssch and rsch. */
-sch_handle_start_func(sch, orb);
+sch_handle_start_func_virtual(sch, orb);
 } else {
 /* Cannot happen. */
-return;
+return 0;
 }
 css_inject_io_interrupt(sch);
+return 0;
+}
+
+int do_subchannel_work_passthrough(SubchDev *sch, ORB *orb)
+{
+int ret;
+SCSW *s = &sch->curr_status.scsw;
+
+if (s->ctrl & SCSW_FCTL_CLEAR_FUNC) {
+/* TODO: Clear handling */
+sch_handle_clear_func(sch);
+ret = 0;
+} else if (s->ctrl & SCSW_FCTL_HALT_FUNC) {
+/* TODO: Halt handling */
+sch_handle_halt_func(sch);
+ret = 0;
+} else if (s->ctrl & SCSW_FCTL_START_FUNC) {
+ret = sch_handle_start_func_passthrough(sch, orb);
+} else {
+/* Cannot happen. */
+return -ENODEV;
+}
+
+return ret;
+}
+
+static int do_subchannel_work(SubchDev *sch, ORB *orb)
+{
+if (sch->do_subchannel_work) {
+return sch->do_subchannel_work(sch, orb);
+} else {
+return -EINVAL;
+}
 }
 
 static void copy_pmcw_to_guest(PMCW *dest, const PMCW *src)
@@ -967,8 +1045,7 @@ int css_do_ssch(SubchDev *sch, ORB *orb)
 s->ctrl |= (SCSW_FCTL_START_FUNC | SCSW_ACTL_START_PEND);
 s->flags &= ~SCSW_FLAGS_MASK_PNO;
 
-do_subchannel_work(sch, orb);
-ret = 0;
+ret = do_subchannel_work(sch, orb);
 
 out:
 return ret;
diff --git a/hw/s390x/s390-ccw.c b/hw/s390x/s390-ccw.c
index e2b1973fda..8614dda6f8 100644
--- a/hw/s390x/s390-ccw.c
+++ b/hw/s390x/s390-ccw.c
@@ -18,6 +18,17 @@
 #include "hw/s390x/css-bridge.h"
 #include "hw/s390x/s390-ccw.h"
 
+int s390_ccw_cmd_request(ORB *orb, SCSW *scsw, void *data)
+{
+S390CCWDeviceClass *cdc = S390_CCW_DEVICE_GET_CLASS(data);
+
+if (cdc->handle_request) {
+return cdc->handle_request(orb, scsw, data);
+} else {
+return -ENOSYS;
+

[Qemu-devel] [PATCH 08/21] pc-bios/s390-ccw.img: rebuild image

2017-05-19 Thread Cornelia Huck
From: Eric Farman 

Contains the following commits:
- pc-bios/s390-ccw: Remove duplicate blk_factor adjustment
- pc-bios/s390-ccw: Move SCSI block factor to outer read
- pc-bios/s390-ccw: Break up virtio-scsi read into multiples
- pc-bios/s390-ccw: Refactor scsi_inquiry function
- pc-bios/s390-ccw: Get list of supported EVPD pages
- pc-bios/s390-ccw: Get Block Limits VPD device data
- pc-bios/s390-ccw: Build a reasonable max_sectors limit

Signed-off-by: Eric Farman 
Message-Id: <20170510155359.32727-9-far...@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck 
---
 pc-bios/s390-ccw.img | Bin 26472 -> 26480 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)

diff --git a/pc-bios/s390-ccw.img b/pc-bios/s390-ccw.img
index 0b01d49495c607b67d3f1b2359395534631deb88..5ad056400073c7e1c5e862576c76f0e674ff3c60 100644
GIT binary patch
delta 7395
zcma)Be_T}6w%_N>FbqFEz%UHM2m_LlDB_P0lypGyk&%sxhDL>k7CuZ;?3LFV?#=5p
zuNAWG<}0)5W*xI17lgW4rc)2i-;Rc9MMaL9g)h$-sW+8#zk30_ZrwlLd_Keap7m?*
zz1G@m?=ze|Bx#4F`r(qcHv84^@=ivd^3JuL!zDLYq^HuRUhU=mno8?>`O-S>lh#{0
zqnfv#ze$^;fADnX!eM{5J+$Sqm0!=Lt5=A
zpA#ya@P7%M_9f3Ty(OpI!_SyXCz7+})hYOtAr-b>IhyR(QS9!ORQT2%W;_SM1fPb@QsnP?T#CWHGssg
z6AmMB4Udh=@Hor`6%|gw4nIdK9>Qlx1GUBBj2yLF#-QvXCHy;q9VdBK>_%uB;?%BwoqS#15A2OL=tM
zWV`JjLK6BFY28EkQ_^zr#c_+}#MAstT!tWuWr*x}l;>NL?Y4(W?{1-!hA8z)3rK
zt;J=x-Ab$xB-RLLlDL5%vSizB6zWh4b>hb)rcfs>u5NiR6rx30b*uI~aTSway?jjb|Y2s|cH<5S~-(p=*n2HYUV!a|*3GIYm
zB}Z^-GVaJD34IV7cfCR%02O$}GiS)87^1y=1YHtd9_;nzVgG*0Z0H{Ix|oa6
z>3&3bfiFZv>rPf^9wj&^{zKe)Tbf

[Qemu-devel] [PULL 02/20] mc146818rtc: precisely count the clock for periodic timer

2017-05-19 Thread Paolo Bonzini
From: Tai Yunfang 

There are two issues in the current code:
1) If the period is changed by reconfiguring RegA, the coalesced
   irq count is scaled to reflect the new period. However, the new
   interrupt count is calculated like this:
s->irq_coalesced = (s->irq_coalesced * s->period) / period;

   Clocks that do not add up to a whole new period are lost, which
   makes the VM clock run slow.

   To fix this, we calculate the interrupt window from the precise
   clock rather than from the period, so that the clocks lost while
   the period is scaled can be compensated properly.

2) If periodic_timer_update() is called due to RegA reconfiguration,
   i.e. the period is updated, the current time is not the start
   point for the next periodic timer. The timer should instead start
   from the last interrupt; otherwise the clock in the VM becomes slow.

   This patch takes the clocks from the last interrupt to the current
   clock into account and compensates for them in the next interrupt.
   In particular, if a complete interrupt was lost in this window, the
   time can be caught up by LOST_TICK_POLICY_SLEW.
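
A small worked example may make issue 1 concrete. This is a sketch under
stated assumptions: the `toy_` helpers and `ToyRescale` are hypothetical names,
the period-code rule mirrors rtc_periodic_clock_ticks() from the diff, and the
clock-based variant only shows the arithmetic, not the full timer update.

```c
#include <assert.h>
#include <stdint.h>

/* Period in 32 kHz clocks for a RegA rate code; 0 means "disabled". */
static uint32_t toy_period_from_code(unsigned int code)
{
    code &= 0x0f;
    if (code == 0) {
        return 0;
    }
    if (code <= 2) {
        code += 7;
    }
    return 1u << (code - 1);
}

/* Old scaling: integer division silently drops the remainder. */
static uint32_t toy_rescale_old(uint32_t coalesced, uint32_t old_period,
                                uint32_t new_period)
{
    assert(new_period != 0);
    return (uint32_t)(((uint64_t)coalesced * old_period) / new_period);
}

/* Hypothetical result holder for the clock-based scaling. */
typedef struct {
    uint32_t irqs;             /* whole interrupts still owed */
    uint64_t leftover_clocks;  /* clocks carried over instead of dropped */
} ToyRescale;

/* Clock-based scaling: count everything in clocks and keep the remainder. */
static ToyRescale toy_rescale_by_clock(uint32_t coalesced, uint32_t old_period,
                                       uint64_t lost_clock, uint32_t new_period)
{
    ToyRescale r;

    assert(new_period != 0);
    lost_clock += (uint64_t)coalesced * old_period;
    r.irqs = (uint32_t)(lost_clock / new_period);
    r.leftover_clocks = lost_clock % new_period;
    return r;
}
```

With three coalesced interrupts at a 32-clock period (96 clocks in total) and a
new period of 64 clocks, the old formula yields one interrupt and forgets 32
clocks, while the clock-based variant yields one interrupt and carries the 32
clocks forward.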

Signed-off-by: Tai Yunfang 
Signed-off-by: Xiao Guangrong 
Message-Id: <20170510083259.3900-3-xiaoguangr...@tencent.com>
Signed-off-by: Paolo Bonzini 
---
 hw/timer/mc146818rtc.c | 120 +++--
 1 file changed, 97 insertions(+), 23 deletions(-)

diff --git a/hw/timer/mc146818rtc.c b/hw/timer/mc146818rtc.c
index 7d78391..aeb60cc 100644
--- a/hw/timer/mc146818rtc.c
+++ b/hw/timer/mc146818rtc.c
@@ -146,31 +146,100 @@ static void rtc_coalesced_timer(void *opaque)
 }
 #endif
 
-/* handle periodic timer */
-static void periodic_timer_update(RTCState *s, int64_t current_time)
+static uint32_t rtc_periodic_clock_ticks(RTCState *s)
 {
-int period_code, period;
-int64_t cur_clock, next_irq_clock;
+int period_code;
+
+if (!(s->cmos_data[RTC_REG_B] & REG_B_PIE)) {
+return 0;
+ }
 
 period_code = s->cmos_data[RTC_REG_A] & 0x0f;
-if (period_code != 0
-&& (s->cmos_data[RTC_REG_B] & REG_B_PIE)) {
-if (period_code <= 2)
-period_code += 7;
-/* period in 32 Khz cycles */
-period = 1 << (period_code - 1);
-#ifdef TARGET_I386
-if (period != s->period) {
-s->irq_coalesced = (s->irq_coalesced * s->period) / period;
-DPRINTF_C("cmos: coalesced irqs scaled to %d\n", s->irq_coalesced);
-}
-s->period = period;
-#endif
+if (!period_code) {
+return 0;
+}
+
+if (period_code <= 2) {
+period_code += 7;
+}
+
+/* period in 32 Khz cycles */
+return 1 << (period_code - 1);
+}
+
+/*
+ * handle periodic timer. @old_period indicates the periodic timer update
+ * is just due to period adjustment.
+ */
+static void
+periodic_timer_update(RTCState *s, int64_t current_time, uint32_t old_period)
+{
+uint32_t period;
+int64_t cur_clock, next_irq_clock, lost_clock = 0;
+
+period = rtc_periodic_clock_ticks(s);
+
+if (period) {
 /* compute 32 khz clock */
 cur_clock =
 muldiv64(current_time, RTC_CLOCK_RATE, NANOSECONDS_PER_SECOND);
 
-next_irq_clock = (cur_clock & ~(period - 1)) + period;
+/*
+* if the periodic timer's update is due to period re-configuration,
+* we should count the clock since last interrupt.
+*/
+if (old_period) {
+int64_t last_periodic_clock, next_periodic_clock;
+
+next_periodic_clock = muldiv64(s->next_periodic_time,
+RTC_CLOCK_RATE, NANOSECONDS_PER_SECOND);
+last_periodic_clock = next_periodic_clock - old_period;
+lost_clock = cur_clock - last_periodic_clock;
+assert(lost_clock >= 0);
+}
+
+#ifdef TARGET_I386
+/*
+ * s->irq_coalesced can change for two reasons:
+ *
+ * a) if one or more periodic timer interrupts have been lost,
+ *lost_clock will be more than a period.
+ *
+ * b) when the period may be reconfigured, we expect the OS to
+ *treat delayed tick as the new period.  So, when switching
+ *from a shorter to a longer period, scale down the missing,
+ *because the OS will treat past delayed ticks as longer
+ *(leftovers are put back into lost_clock).  When switching
+ *to a shorter period, scale up the missing ticks since the
+ *OS handler will treat past delayed ticks as shorter.
+ */
+if (s->lost_tick_policy == LOST_TICK_POLICY_SLEW) {
+uint32_t old_irq_coalesced = s->irq_coalesced;
+
+s->period = period;
+lost_clock += old_irq_coalesced * old_period;
+s->irq_coalesced = lost_clock / s->period;
+  

[Qemu-devel] [PULL 00/20] Misc patches for 2017-05-19

2017-05-19 Thread Paolo Bonzini
The following changes since commit 56821559f0ba682fe6b367815572e6f974d329ab:

  Merge remote-tracking branch 'dgilbert/tags/pull-hmp-20170517' into staging (2017-05-18 13:36:15 +0100)

are available in the git repository at:


  git://github.com/bonzini/qemu.git tags/for-upstream

for you to fetch changes up to e10dc0ca6854c4f47cc5e9d47e20c62aa875f518:

  target/i386: use multiple CPU AddressSpaces (2017-05-19 13:01:32 +0200)


* virtio-scsi use-after-free fix (Fam)
* vhost-user-scsi support (Felipe)
* SMM fixes and improvements for TCG (myself)
* irqchip and AddressSpaceDispatch cleanups and fixes (Peter)
* Coverity fix (Stefano)
* NBD cleanups (Vladimir)
* RTC accuracy improvements and code cleanups (Guangrong+Yunfang)


Fam Zheng (1):
  virtio-scsi: Unset hotplug handler when unrealize

Felipe Franciosi (2):
  vhost-user-scsi: Introduce vhost-user-scsi host device
  vhost-user-scsi: Introduce a vhost-user-scsi sample application

Paolo Bonzini (2):
  target/i386: enable A20 automatically in system management mode
  target/i386: use multiple CPU AddressSpaces

Peter Xu (4):
  kvm: irqchip: trace changes on msi add/remove
  msix: trace control bit write op
  kvm: irqchip: skip update msi when disabled
  exec: simplify phys_page_find() params

Stefano Stabellini (1):
  Check the return value of fcntl in qemu_set_cloexec

Tai Yunfang (1):
  mc146818rtc: precisely count the clock for periodic timer

Vladimir Sementsov-Ogievskiy (5):
  nbd: strict nbd_wr_syncv
  nbd: read_sync and friends: return 0 on success
  nbd: add errp parameter to nbd_wr_syncv()
  nbd: add errp to read_sync, write_sync and drop_sync
  nbd/client.c: use errp instead of LOG

Xiao Guangrong (4):
  mc146818rtc: update periodic timer only if it is needed
  mc146818rtc: ensure LOST_TICK_POLICY_SLEW is only enabled on TARGET_I386
  mc146818rtc: drop unnecessary '#ifdef TARGET_I386'
  mc146818rtc: embrace all x86 specific code

 .gitignore|   1 +
 Makefile  |   3 +
 Makefile.objs |   4 +
 block/nbd-client.c|  11 +-
 contrib/vhost-user-scsi/Makefile.objs |   1 +
 contrib/vhost-user-scsi/vhost-user-scsi.c | 886 ++
 default-configs/pci.mak   |   1 +
 default-configs/s390x-softmmu.mak |   1 +
 exec.c|  13 +-
 hw/pci/msix.c |  11 +-
 hw/pci/trace-events   |   3 +
 hw/scsi/Makefile.objs |   1 +
 hw/scsi/vhost-user-scsi.c | 215 
 hw/scsi/virtio-scsi.c |   3 +
 hw/timer/mc146818rtc.c| 206 ---
 hw/virtio/virtio-pci.c|  54 ++
 hw/virtio/virtio-pci.h|  11 +
 include/block/nbd.h   |   8 +-
 include/hw/virtio/vhost-user-scsi.h   |  35 ++
 include/hw/virtio/virtio-scsi.h   |   3 +
 kvm-all.c |   4 +-
 nbd/client.c  | 125 ++---
 nbd/common.c  |  23 +-
 nbd/nbd-internal.h|  40 +-
 nbd/server.c  |  92 ++--
 qemu-nbd.c|   3 +-
 target/i386/arch_memory_mapping.c |  18 +-
 target/i386/cpu.c |  15 +-
 target/i386/cpu.h |  20 +-
 target/i386/helper.c  |  96 ++--
 target/i386/kvm.c |  12 +-
 target/i386/machine.c |   4 -
 target/i386/smm_helper.c  |  18 -
 trace-events  |   3 +-
 util/oslib-posix.c|   4 +-
 35 files changed, 1642 insertions(+), 306 deletions(-)
 create mode 100644 contrib/vhost-user-scsi/Makefile.objs
 create mode 100644 contrib/vhost-user-scsi/vhost-user-scsi.c
 create mode 100644 hw/scsi/vhost-user-scsi.c
 create mode 100644 include/hw/virtio/vhost-user-scsi.h
-- 
1.8.3.1




Re: [Qemu-devel] [Bug 1034423] Re: Guests running OpenIndiana (and relatives) fail to boot on AMD hardware

2017-05-19 Thread Owen Tuz
This is an old ticket! I had completely forgotten about it, but will test
when I get a chance and let you know.

Cheers,

Owen

On Fri, May 19, 2017 at 11:25 AM, Thomas Huth <1034...@bugs.launchpad.net>
wrote:

> Triaging old bug tickets ... can you still reproduce this issue with the
> latest version of QEMU (currently v2.9)?
>
> ** Changed in: qemu
>Status: New => Incomplete
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1034423
>
> Title:
>   Guests running OpenIndiana (and relatives) fail to boot on AMD
>   hardware
>
> Status in QEMU:
>   Incomplete
>
> Bug description:
>   First observed with OpenSolaris 2009.06, and also applies to the
>   latest OpenIndiana release.
>
>   Version: qemu-kvm 1.1.1
>
>   Hardware:
>
>   2 x AMD Opteron 6128 8-core processors, 64GB RAM.
>
>   These guests boot on equivalent Intel hardware.
>
>   To reproduce:
>
>   qemu-kvm -nodefaults -m 512 -cpu host -vga cirrus -usbdevice tablet
>   -vnc :99 -monitor stdio -hda drive.img -cdrom oi-dev-
>   151a5-live-x86.iso -boot order=dc
>
>   I've tested with "-vga std" and various different emulated CPU types,
>   to no effect.
>
>   What happens:
>
>   GRUB loads, and offers multiple boot options, but none work. Some kind
>   of kernel panic flies by very fast before restarting the VM, and
>   careful use of the screenshot button reveals that it reads as follows:
>
>   panic[cpu0]/thread=fec22de0: BAD TRAP: type=8 (#df Double fault)
>   rp=fec2b48c add r=0
>
>   #df Double fault
>   pid=0, pc=0xault
>   pid=0, pc=0xfe800377, sp=0xfec40090, eflags=0x202
>   cr0: 80050011 cr4:b8
>   cr2: 0cr3: ae2f000
> gs:1b0fs:  0   es:
>  160   ds:  160
>edi:0  esi:  0 ebp:
>  0 esp: fec2b4c4
>ebx: c0010015 edx:  0 ecx: 0 eax:
> fec40400
>trp: 8  err:  0 eip: fe800377
> cs:   158
>efl: 202 usp: fec40090  ss:   160
>   tss.tss_link: 0x0
>   tss.tss_esp0:   0x0
>   tss.tss_ss0: 0x160
>   tss.tss_esp1:   0x0
>   tss.tss_ss1:  0x0
>   tss.tss esp2: 0x0
>   tss.tss_ss2:  0x0
>   tss.tss_cr3:   0xae2f000
>   tss.tss_eip:   0xfec40400
>   tss.tss_eflags:  0x202
>   tss.tss_eax:  0xfec40400
>   tss.tss_ebx:  0xc0010015
>   tss.tss_ecx:  0xc001
>   tss.tss_edx:  0x0
>   tss.tss_esp:  0xfec40090
>
>   Warning - stack not written to the dumpbuf
>   fec2b3c8 unix:due+e4 (8, fec2b48c, 0, 0)
>   fec2b478 unix:trap+12fa (fec2b48c, 0, 0)
>   fec2b48c unix:_cmntrap+7c (1b0, 0, 160, 160, 0)
>
>   If there's any more, I haven't managed to catch it.
>
>   Solaris 11 does not seem to suffer from the same issue, although the
>   first message that appears at boot (after the version info) is "trap:
>   Unkown trap type 8 in user mode". Could be related?
>
>   As always, thanks in advance and please let me know if I can help to
>   test, or provide any more information.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/qemu/+bug/1034423/+subscriptions
>


[Qemu-devel] [PULL 2/3] audio: Rename audio_init() to soundhw_init()

2017-05-19 Thread Gerd Hoffmann
From: Eduardo Habkost 

To make it consistent with the remaining soundhw.c functions and
avoid confusion with the audio_init() function in audio/audio.c,
rename audio_init() to soundhw_init().

Signed-off-by: Eduardo Habkost 
Reviewed-by: David Gibson 
Message-id: 20170508205735.23444-3-ehabk...@redhat.com
Signed-off-by: Gerd Hoffmann 
---
 include/hw/audio/audio.h | 2 +-
 hw/audio/soundhw.c   | 2 +-
 hw/ppc/prep.c| 2 +-
 vl.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/hw/audio/audio.h b/include/hw/audio/audio.h
index 259bb2cf96..119f7d78d5 100644
--- a/include/hw/audio/audio.h
+++ b/include/hw/audio/audio.h
@@ -7,7 +7,7 @@ void isa_register_soundhw(const char *name, const char *descr,
 void pci_register_soundhw(const char *name, const char *descr,
   int (*init_pci)(PCIBus *bus));
 
-void audio_init(void);
+void soundhw_init(void);
 void select_soundhw(const char *optarg);
 
 #endif
diff --git a/hw/audio/soundhw.c b/hw/audio/soundhw.c
index 5e96b73c81..29565da93d 100644
--- a/hw/audio/soundhw.c
+++ b/hw/audio/soundhw.c
@@ -129,7 +129,7 @@ void select_soundhw(const char *optarg)
 }
 }
 
-void audio_init(void)
+void soundhw_init(void)
 {
 struct soundhw *c;
 ISABus *isa_bus = (ISABus *) object_resolve_path_type("", TYPE_ISA_BUS, NULL);
diff --git a/hw/ppc/prep.c b/hw/ppc/prep.c
index 96a4813b3f..4a7d2cfbe0 100644
--- a/hw/ppc/prep.c
+++ b/hw/ppc/prep.c
@@ -783,7 +783,7 @@ static void ibm_40p_init(MachineState *machine)
_checksum);
 
 /* initialize audio subsystem */
-audio_init();
+soundhw_init();
 
 /* add some more devices */
 if (defaults_enabled()) {
diff --git a/vl.c b/vl.c
index 6e46889cde..8f08f422a7 100644
--- a/vl.c
+++ b/vl.c
@@ -4575,7 +4575,7 @@ int main(int argc, char **argv, char **envp)
 
 realtime_init();
 
-audio_init();
+soundhw_init();
 
 if (hax_enabled()) {
 hax_sync_vcpus();
-- 
2.9.3




[Qemu-devel] [PULL 0/1] ui: egl-headless requires dmabuf support

2017-05-19 Thread Gerd Hoffmann
  Hi,

Little single-patch pull request to fix a build issue.

please pull,
  Gerd

The following changes since commit 56821559f0ba682fe6b367815572e6f974d329ab:

  Merge remote-tracking branch 'dgilbert/tags/pull-hmp-20170517' into staging (2017-05-18 13:36:15 +0100)

are available in the git repository at:

  git://git.kraxel.org/qemu tags/pull-ui-20170519-1

for you to fetch changes up to 371ec54e9f8415cd74af45acdcf67b413f50cce5:

  ui: egl-headless requires dmabuf support (2017-05-19 10:46:00 +0200)


ui: egl-headless requires dmabuf support


Gerd Hoffmann (1):
  ui: egl-headless requires dmabuf support

 vl.c | 4 ++--
 ui/Makefile.objs | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)



[Qemu-devel] [PATCH v1 05/13] qcow2: set inactive flag

2017-05-19 Thread Anton Nefedov
Qcow2State and BlockDriverState flags have to be in sync

Signed-off-by: Anton Nefedov 
---
 block/qcow2.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 6e7ce96..07c1706 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1939,6 +1939,7 @@ static int qcow2_inactivate(BlockDriverState *bs)
 
 if (result == 0) {
 qcow2_mark_clean(bs);
+s->flags |= BDRV_O_INACTIVE;
 }
 
 return result;
-- 
2.7.4




[Qemu-devel] [PATCH v1 07/13] qcow2: check space leak at the end of the image

2017-05-19 Thread Anton Nefedov
From: Pavel Butsykin 

Space preallocated in the image may remain unused after a failure. Add
the ability for qcow2_check to identify and fix this, so that the extra
space is not kept on the host.
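
For illustration only, the leak count that qcow2_check_extra_preallocation()
reports can be reproduced with the small sketch below. leaked_clusters() is
a hypothetical helper, not part of the patch; it only mirrors the
DIV_ROUND_UP arithmetic. The expected outputs in 026.out (1024 bytes and 1
cluster, 33280 bytes and 65 clusters) are consistent with cluster sizes of
1024 and 512 bytes respectively, which is an inference from the numbers and
not something the patch states.

```c
#include <assert.h>
#include <stdint.h>

/* Same rounding as QEMU's DIV_ROUND_UP macro: ceil(n / d) for positive d. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper (not part of the patch): number of clusters that
 * lie between the end of the used area and the end of the image file.
 */
static uint64_t leaked_clusters(uint64_t img_size, uint64_t image_end_offset,
                                uint64_t cluster_size)
{
    assert(cluster_size > 0);
    if (image_end_offset >= img_size) {
        return 0; /* nothing beyond the last used cluster */
    }
    return DIV_ROUND_UP(img_size - image_end_offset, cluster_size);
}
```

A partially used trailing cluster still counts as one leaked cluster,
because the difference is rounded up.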

Signed-off-by: Pavel Butsykin 
Signed-off-by: Denis V. Lunev 
Signed-off-by: Anton Nefedov 
---
 block/qcow2.c  |  31 +++
 tests/qemu-iotests/026.out | 104 -
 tests/qemu-iotests/026.out.nocache | 104 -
 tests/qemu-iotests/029.out |   5 +-
 tests/qemu-iotests/060.out |  10 +++-
 tests/qemu-iotests/061.out |   5 +-
 tests/qemu-iotests/066.out |   5 +-
 tests/qemu-iotests/098.out |   7 ++-
 tests/qemu-iotests/108.out |   5 +-
 tests/qemu-iotests/112.out |   5 +-
 10 files changed, 225 insertions(+), 56 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 7b4359b..503f0dc 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -322,6 +322,32 @@ int qcow2_mark_consistent(BlockDriverState *bs)
 return 0;
 }
 
+static int qcow2_check_extra_preallocation(BlockDriverState *bs,
+BdrvCheckResult *res, BdrvCheckMode fix)
+{
+BDRVQcow2State *s = bs->opaque;
+uint64_t img_size = bdrv_getlength(bs->file->bs);
+
+if (res->image_end_offset < img_size) {
+uint64_t count =
+DIV_ROUND_UP(img_size - res->image_end_offset, s->cluster_size);
+fprintf(stderr, "%s space leaked at the end of the image %jd\n",
+fix & BDRV_FIX_LEAKS ? "Repairing" : "ERROR",
+img_size - res->image_end_offset);
+res->leaks += count;
+if (fix & BDRV_FIX_LEAKS) {
+int ret = bdrv_truncate(bs->file, res->image_end_offset, NULL);
+if (ret < 0) {
+res->check_errors++;
+return ret;
+}
+res->leaks_fixed += count;
+}
+}
+
+return 0;
+}
+
 static int qcow2_check(BlockDriverState *bs, BdrvCheckResult *result,
BdrvCheckMode fix)
 {
@@ -330,6 +356,11 @@ static int qcow2_check(BlockDriverState *bs, BdrvCheckResult *result,
 return ret;
 }
 
+ret = qcow2_check_extra_preallocation(bs, result, fix);
+if (ret < 0) {
+return ret;
+}
+
 if (fix && result->check_errors == 0 && result->corruptions == 0) {
 ret = qcow2_mark_clean(bs);
 if (ret < 0) {
diff --git a/tests/qemu-iotests/026.out b/tests/qemu-iotests/026.out
index 86a50a2..e8cf348 100644
--- a/tests/qemu-iotests/026.out
+++ b/tests/qemu-iotests/026.out
@@ -5,7 +5,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_update; errno: 5; imm: off; once: on; write
 write failed: Input/output error
-No errors were found on the image.
+ERROR space leaked at the end of the image 1024
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_update; errno: 5; imm: off; once: on; write -b
@@ -33,7 +36,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_update; errno: 28; imm: off; once: on; write
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 1024
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_update; errno: 28; imm: off; once: on; write -b
@@ -181,7 +187,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l2_alloc_write; errno: 5; imm: off; once: on; write
 write failed: Input/output error
-No errors were found on the image.
+ERROR space leaked at the end of the image 1024
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l2_alloc_write; errno: 5; imm: off; once: on; write -b
@@ -207,7 +216,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l2_alloc_write; errno: 28; imm: off; once: on; write
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 1024
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l2_alloc_write; errno: 28; imm: off; once: on; write -b
@@ -468,20 +480,27 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_hookup; errno: 28; imm: off; once: on; write
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, 

[Qemu-devel] [PATCH v1 01/13] qcow2: alloc space for COW in one chunk

2017-05-19 Thread Anton Nefedov
From: "Denis V. Lunev" 

Currently each single write operation can result in 3 write operations
if guest offsets are not cluster aligned. One write is performed for the
real payload and two for COW-ed areas. Thus the data may be laid out
non-contiguously on the host filesystem, which significantly reduces
the performance of subsequent sequential reads.

The patch allocates the space in the file with cluster granularity,
ensuring
  1. better host offset locality
  2. fewer space allocation operations
     (which can be expensive on distributed storage)

Signed-off-by: Denis V. Lunev 
Signed-off-by: Anton Nefedov 
---
 block/qcow2.c | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index a8d61f0..2e6a0ec 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1575,6 +1575,32 @@ fail:
 return ret;
 }
 
+static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
+{
+BDRVQcow2State *s = bs->opaque;
+BlockDriverState *file = bs->file->bs;
+QCowL2Meta *m;
+int ret;
+
+for (m = l2meta; m != NULL; m = m->next) {
+uint64_t bytes = m->nb_clusters << s->cluster_bits;
+
+if (m->cow_start.nb_bytes == 0 && m->cow_end.nb_bytes == 0) {
+continue;
+}
+
+/* try to alloc host space in one chunk for better locality */
+ret = file->drv->bdrv_co_pwrite_zeroes(file, m->alloc_offset, bytes, 0);
+
+if (ret != 0) {
+continue;
+}
+
+file->total_sectors = MAX(file->total_sectors,
+  (m->alloc_offset + bytes) / BDRV_SECTOR_SIZE);
+}
+}
+
 static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
  uint64_t bytes, QEMUIOVector *qiov,
  int flags)
@@ -1656,8 +1682,12 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
 if (ret < 0) {
 goto fail;
 }
-
 qemu_co_mutex_unlock(&s->lock);
+
+if (bs->file->bs->drv->bdrv_co_pwrite_zeroes != NULL) {
+handle_alloc_space(bs, l2meta);
+}
+
 BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
 trace_qcow2_writev_data(qemu_coroutine_self(),
 cluster_offset + offset_in_cluster);
-- 
2.7.4
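
As a rough illustration of the arithmetic in handle_alloc_space() above,
the sketch below computes the size of the zero-filled region and the
resulting sector count. alloc_bytes() and grown_total_sectors() are
hypothetical names introduced here; only the shift, the MAX and the
division by BDRV_SECTOR_SIZE come from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define BDRV_SECTOR_SIZE 512ULL /* QEMU's sector size */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Bytes covered by an allocation of nb_clusters whole clusters. */
static uint64_t alloc_bytes(uint64_t nb_clusters, unsigned cluster_bits)
{
    assert(cluster_bits < 64);
    return nb_clusters << cluster_bits;
}

/* Sector count after zero-filling [alloc_offset, alloc_offset + bytes). */
static uint64_t grown_total_sectors(uint64_t total_sectors,
                                    uint64_t alloc_offset, uint64_t bytes)
{
    /* The file never shrinks: keep the larger of old and new end. */
    return MAX(total_sectors, (alloc_offset + bytes) / BDRV_SECTOR_SIZE);
}
```

With 64 KiB clusters (cluster_bits = 16), two clusters allocated at offset
65536 end at byte 196608, i.e. sector 384.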




Re: [Qemu-devel] [PATCH 3/3] numa: silence incomplete mapping warning under qtest

2017-05-19 Thread Stefan Hajnoczi
On Thu, May 18, 2017 at 7:20 PM, Eduardo Habkost  wrote:
> On Thu, May 18, 2017 at 10:09:31AM +0200, Igor Mammedov wrote:
>> Suggested-by: Markus Armbruster 
>> Signed-off-by: Igor Mammedov 
>
> Where exactly is the test code that triggers those messages and
> requires this patch? I would like to document that in the commit
> message.

$ make V=1 check
TEST: tests/numa-test... (pid=30376)
  /x86_64/numa/mon/default:OK
  /x86_64/numa/mon/cpus/explicit:  OK
  /x86_64/numa/mon/cpus/partial:
qemu-system-x86_64: warning: CPU(s) not present in any NUMA nodes: CPU
2 [socket-id: 2, core-id: 0, thread-id: 0], CPU 3 [socket-id: 3,
core-id: 0, thread-id: 0], CPU 6 [socket-id: 6, core-id: 0, thread-id:
0], CPU 7 [socket-id: 7, core-id: 0, thread-id: 0]
qemu-system-x86_64: warning: All CPU(s) up to maxcpus should be
described in NUMA config, ability to start up with partial NUMA
mappings is obsoleted and will be removed in future
OK



[Qemu-devel] [Bug 1255303] Re: ALSA underruns occurr when using QEMU

2017-05-19 Thread Thomas Huth
Triaging old bug tickets ... can you still reproduce this issue with the
latest version of QEMU (currently v2.9)?

** Changed in: qemu
   Status: New => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1255303

Title:
  ALSA underruns occurr when using QEMU

Status in QEMU:
  Incomplete

Bug description:
  I'm running QEMU 1.6.1 on a 64-bit Gentoo Linux system. The guest
  operating system is Windows 7 32-bit. I get multiple identical warning
  messages when using the ac97 or hda sound cards:

  > ALSA lib /var/tmp/portage/media-libs/alsa-lib-1.0.27.2/work/alsa-
  lib-1.0.27.2/src/pcm/pcm.c:7843:(snd_pcm_recover) underrun occurred

  The difference between ac97 and hda is that the former works well,
  while the latter causes the sound to be garbled.

  /var/tmp/portage is the directory where Portage, the Gentoo package
  manager, builds programs. I don't know why it is mentioned in the
  error message.

  I also don't know if this is an ALSA problem or a QEMU problem.

  The command I use is:

  > qemu-system-i386 -cpu host -m 1G -k it -drive
  file=~/QEMU/Windows_7_Privato.qcow2,media=disk,index=0 -vga std -net
  nic -net user -enable-kvm -display sdl -soundhw ac97 -device usb-
  ehci,id=ehci -usb -rtc base=localtime -usbdevice tablet

  My real sound card is an Intel HD Audio:

  > lspci | grep "Audio device"

  > 00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD
  Audio Controller (rev 02)

  Please tell me if you need any other information.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1255303/+subscriptions



[Qemu-devel] [PULL 03/20] mc146818rtc: ensure LOST_TICK_POLICY_SLEW is only enabled on TARGET_I386

2017-05-19 Thread Paolo Bonzini
From: Xiao Guangrong 

Any tick policy specified on platforms other than TARGET_I386 silently
falls back to LOST_TICK_POLICY_DISCARD. This patch makes sure that only
TARGET_I386 can enable LOST_TICK_POLICY_SLEW.

After that, we can enable LOST_TICK_POLICY_SLEW in the common code,
which no longer needs '#ifdef TARGET_I386' to keep that code x86
specific.

Signed-off-by: Xiao Guangrong 
Message-Id: <20170510083259.3900-4-xiaoguangr...@tencent.com>
Signed-off-by: Paolo Bonzini 
---
 hw/timer/mc146818rtc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/timer/mc146818rtc.c b/hw/timer/mc146818rtc.c
index aeb60cc..4870a72 100644
--- a/hw/timer/mc146818rtc.c
+++ b/hw/timer/mc146818rtc.c
@@ -974,19 +974,19 @@ static void rtc_realizefn(DeviceState *dev, Error **errp)
 
 rtc_set_date_from_host(isadev);
 
-#ifdef TARGET_I386
 switch (s->lost_tick_policy) {
+#ifdef TARGET_I386
 case LOST_TICK_POLICY_SLEW:
 s->coalesced_timer =
 timer_new_ns(rtc_clock, rtc_coalesced_timer, s);
 break;
+#endif
 case LOST_TICK_POLICY_DISCARD:
 break;
 default:
 error_setg(errp, "Invalid lost tick policy.");
 return;
 }
-#endif
 
 s->periodic_timer = timer_new_ns(rtc_clock, rtc_periodic_timer, s);
 s->update_timer = timer_new_ns(rtc_clock, rtc_update_timer, s);
-- 
1.8.3.1
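
The effect of moving the '#ifdef' inside the switch can be sketched in
isolation as follows. This is a simplified model, not the device code:
TickPolicy and rtc_realize_policy() are hypothetical stand-ins for
LostTickPolicy and the switch in rtc_realizefn(). When TARGET_I386 is not
defined, the SLEW case is compiled out and falls through to the default
branch, so it is rejected instead of being silently ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's LostTickPolicy values. */
typedef enum {
    POLICY_DISCARD,
    POLICY_SLEW,
} TickPolicy;

/*
 * Models the switch in rtc_realizefn(): returns 0 on success and -1 for
 * a policy that is invalid on this target. *coalesced_timer is set when
 * the x86-only coalesced timer would be created.
 */
static int rtc_realize_policy(TickPolicy policy, bool *coalesced_timer)
{
    assert(coalesced_timer != NULL);
    *coalesced_timer = false;

    switch (policy) {
#ifdef TARGET_I386
    case POLICY_SLEW:
        *coalesced_timer = true; /* only available on x86 targets */
        return 0;
#endif
    case POLICY_DISCARD:
        return 0;
    default:
        return -1; /* "Invalid lost tick policy." */
    }
}
```

Building the sketch with -DTARGET_I386 models an x86 target; without it,
POLICY_SLEW takes the default branch.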





[Qemu-devel] old_vm_running boolean variable

2017-05-19 Thread ali saeedi
Hello
What is the role of the "old_vm_running" variable in migration_thread?
thanks a lot


[Qemu-devel] [PULL 20/20] target/i386: use multiple CPU AddressSpaces

2017-05-19 Thread Paolo Bonzini
This speeds up SMM switches.  Later on it may remove the need to take
the BQL, and it may also allow code to be reused between TCG and KVM.
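
As an illustration of the idea, the address space is picked by an index
derived from the transaction attributes: 0 for normal accesses and 1 for
SMM. The sketch below uses a reduced, hypothetical MemTxAttrs with only
the "secure" bit; the enum names are likewise assumptions made for the
example.

```c
#include <assert.h>

/* Hypothetical, reduced version of MemTxAttrs: only the bit used here. */
typedef struct {
    unsigned int secure:1;
} MemTxAttrs;

/* Index 0 is the normal address space, index 1 the SMM one. */
enum { AS_NORMAL = 0, AS_SMM = 1, NUM_ASES = 2 };

static int asidx_from_attrs(MemTxAttrs attrs)
{
    int idx = !!attrs.secure; /* normalise to exactly 0 or 1 */

    assert(idx < NUM_ASES);
    return idx;
}
```

Because the index depends only on the attributes of each access, entering
or leaving SMM does not require changing the memory map.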

Signed-off-by: Paolo Bonzini 
---
 target/i386/cpu.c| 15 +-
 target/i386/cpu.h| 11 +-
 target/i386/helper.c | 54 
 target/i386/machine.c|  4 
 target/i386/smm_helper.c | 18 
 5 files changed, 47 insertions(+), 55 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index a41d595..a638832 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3239,7 +3239,7 @@ static void x86_cpu_machine_done(Notifier *n, void *unused)
 cpu->smram = g_new(MemoryRegion, 1);
 memory_region_init_alias(cpu->smram, OBJECT(cpu), "smram",
  smram, 0, 1ull << 32);
-memory_region_set_enabled(cpu->smram, false);
+memory_region_set_enabled(cpu->smram, true);
 memory_region_add_subregion_overlap(cpu->cpu_as_root, 0, cpu->smram, 1);
 }
 }
@@ -3619,7 +3619,9 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 
 #ifndef CONFIG_USER_ONLY
 if (tcg_enabled()) {
-AddressSpace *newas = g_new(AddressSpace, 1);
+AddressSpace *as_normal = address_space_init_shareable(cs->memory,
+   "cpu-memory");
+AddressSpace *as_smm = g_new(AddressSpace, 1);
 
 cpu->cpu_as_mem = g_new(MemoryRegion, 1);
 cpu->cpu_as_root = g_new(MemoryRegion, 1);
@@ -3635,9 +3637,11 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
  get_system_memory(), 0, ~0ull);
 memory_region_add_subregion_overlap(cpu->cpu_as_root, 0, cpu->cpu_as_mem, 0);
 memory_region_set_enabled(cpu->cpu_as_mem, true);
-address_space_init(newas, cpu->cpu_as_root, "CPU");
-cs->num_ases = 1;
-cpu_address_space_init(cs, newas, 0);
+address_space_init(as_smm, cpu->cpu_as_root, "CPU");
+
+cs->num_ases = 2;
+cpu_address_space_init(cs, as_normal, 0);
+cpu_address_space_init(cs, as_smm, 1);
 
 /* ... SMRAM with higher priority, linked from /machine/smram.  */
 cpu->machine_done.notify = x86_cpu_machine_done;
@@ -4053,6 +4057,7 @@ static void x86_cpu_common_class_init(ObjectClass *oc, void *data)
 #ifdef CONFIG_USER_ONLY
 cc->handle_mmu_fault = x86_cpu_handle_mmu_fault;
 #else
+cc->asidx_from_attrs = x86_asidx_from_attrs;
 cc->get_memory_mapping = x86_cpu_get_memory_mapping;
 cc->get_phys_page_debug = x86_cpu_get_phys_page_debug;
 cc->write_elf64_note = x86_cpu_write_elf64_note;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 32a3a0c..c2e081c 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1450,6 +1450,16 @@ int x86_cpu_handle_mmu_fault(CPUState *cpu, vaddr addr,
 void x86_cpu_set_a20(X86CPU *cpu, int a20_state);
 
 #ifndef CONFIG_USER_ONLY
+static inline int x86_asidx_from_attrs(CPUState *cs, MemTxAttrs attrs)
+{
+return !!attrs.secure;
+}
+
+static inline AddressSpace *cpu_addressspace(CPUState *cs, MemTxAttrs attrs)
+{
+return cpu_get_address_space(cs, cpu_asidx_from_attrs(cs, attrs));
+}
+
 uint8_t x86_ldub_phys(CPUState *cs, hwaddr addr);
 uint32_t x86_lduw_phys(CPUState *cs, hwaddr addr);
 uint32_t x86_ldl_phys(CPUState *cs, hwaddr addr);
@@ -1652,7 +1662,6 @@ void do_interrupt_x86_hardirq(CPUX86State *env, int intno, int is_hw);
 
 /* smm_helper.c */
 void do_smm_enter(X86CPU *cpu);
-void cpu_smm_update(X86CPU *cpu);
 
 /* apic.c */
 void cpu_report_tpr_access(CPUX86State *env, TPRAccess access);
diff --git a/target/i386/helper.c b/target/i386/helper.c
index 6c16e7c..d0daa1f 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -1403,89 +1403,89 @@ uint8_t x86_ldub_phys(CPUState *cs, hwaddr addr)
 {
 X86CPU *cpu = X86_CPU(cs);
 CPUX86State *env = &cpu->env;
+MemTxAttrs attrs = cpu_get_mem_attrs(env);
+AddressSpace *as = cpu_addressspace(cs, attrs);
 
-return address_space_ldub(cs->as, addr,
-  cpu_get_mem_attrs(env),
-  NULL);
+return address_space_ldub(as, addr, attrs, NULL);
 }
 
 uint32_t x86_lduw_phys(CPUState *cs, hwaddr addr)
 {
 X86CPU *cpu = X86_CPU(cs);
 CPUX86State *env = &cpu->env;
+MemTxAttrs attrs = cpu_get_mem_attrs(env);
+AddressSpace *as = cpu_addressspace(cs, attrs);
 
-return address_space_lduw(cs->as, addr,
-  cpu_get_mem_attrs(env),
-  NULL);
+return address_space_lduw(as, addr, attrs, NULL);
 }
 
 uint32_t x86_ldl_phys(CPUState *cs, hwaddr addr)
 {
 X86CPU *cpu = X86_CPU(cs);
 CPUX86State *env = &cpu->env;
+MemTxAttrs attrs = cpu_get_mem_attrs(env);
+AddressSpace *as = cpu_addressspace(cs, attrs);
 
-return address_space_ldl(cs->as, addr,
- 

Re: [Qemu-devel] [Xen-devel] [RFC PATCH V2 1/2] xen-pt: bind/unbind interrupt remapping format MSI

2017-05-19 Thread Jan Beulich
>>> On 19.05.17 at 13:16,  wrote:
> On Thu, May 18, 2017 at 01:32:59AM -0400, Lan Tianyu wrote:
>> --- a/include/hw/i386/apic-msidef.h
>> +++ b/include/hw/i386/apic-msidef.h
>> @@ -26,6 +26,7 @@
>>  
>>  #define MSI_ADDR_DEST_ID_SHIFT  12
>>  #define MSI_ADDR_DEST_IDX_SHIFT 4
>> -#define  MSI_ADDR_DEST_ID_MASK  0x000
>> +#define  MSI_ADDR_DEST_ID_MASK  0x000fff00
> 
> The value of MSI_ADDR_DEST_ID_MASK is changed here. I think the patch
> should be:
> +#define  MSI_ADDR_DEST_ID_MASK  0x0000

Judging from other sources, rather the other way around - the
mask needs to have further bits removed (should be 0x000ff000
afaict). Xen sources confirm this, and while Linux has the value
you suggest, that contradicts

#define MSI_ADDR_DEST_ID_SHIFT  12
#define  MSI_ADDR_DEST_ID(dest) (((dest) << MSI_ADDR_DEST_ID_SHIFT) & \
 MSI_ADDR_DEST_ID_MASK)

as well as

#define MSI_ADDR_EXT_DEST_ID(dest)  ((dest) & 0xff00)

chopping off just the low 8 bits.

Jan
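
To make the shift/mask argument concrete, here is a small sketch using the
shift from the quoted header and the mask value suggested in the reply
(0x000ff000). It only illustrates why a shift of 12 pairs with an 8-bit
field at bits 12..19; it does not claim which value the QEMU or Linux
headers actually carry. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MSI_ADDR_DEST_ID_SHIFT 12
/* Mask suggested in the reply above: 8 bits starting at bit 12. */
#define MSI_ADDR_DEST_ID_MASK  0x000ff000u

/* Extract the destination ID field from an MSI address. */
static uint32_t msi_addr_get_dest_id(uint32_t addr)
{
    return (addr & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
}

/* Encode a destination ID; anything above the low 8 bits is dropped. */
static uint32_t msi_addr_dest_id(uint32_t dest)
{
    uint32_t field = (dest << MSI_ADDR_DEST_ID_SHIFT) & MSI_ADDR_DEST_ID_MASK;

    /* The encoded field must round-trip to the low 8 bits of dest. */
    assert(msi_addr_get_dest_id(field) == (dest & 0xffu));
    return field;
}
```

With this mask, a destination of 0x1ab encodes the same field as 0xab, so
the upper bits would have to travel in a separate extended field.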




Re: [Qemu-devel] [Qemu-block] [PATCH 3/3] migration: avoid recursive AioContext locking in save_vmstate()

2017-05-19 Thread Stefan Hajnoczi
On Thu, May 18, 2017 at 10:18:46AM +0200, Kevin Wolf wrote:
> Am 17.05.2017 um 19:09 hat Stefan Hajnoczi geschrieben:
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index 7f66d58..a70ba20 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -2153,6 +2153,14 @@ int save_vmstate(const char *name)
> >  goto the_end;
> >  }
> >  
> > +/* The bdrv_all_create_snapshot() call that follows acquires the AioContext
> > + * for itself.  BDRV_POLL_WHILE() does not support nested locking because
> > + * it only releases the lock once.  Therefore synchronous I/O will deadlock
> > + * unless we release the AioContext before bdrv_all_create_snapshot().
> > + * unless we release the AioContext before bdrv_all_create_snapshot().
> > + */
> > +aio_context_release(aio_context);
> > +aio_context = NULL;
> > +
> >  ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, &bs);
> >  if (ret < 0) {
> >  error_report("Error while creating snapshot on '%s'",
> > @@ -2163,7 +2171,9 @@ int save_vmstate(const char *name)
> >  ret = 0;
> >  
> >   the_end:
> > -aio_context_release(aio_context);
> > +if (aio_context) {
> > +aio_context_release(aio_context);
> > +}
> >  if (saved_vm_running) {
> >  vm_start();
> >  }
> 
> It might actually even be true before this patch because the lock is
> already only taken for some parts of the function, but don't we need to
> call bdrv_drain_all_begin/end() around the whole function now?
> 
> We're stopping the VM, so hopefully no device is continuing to process
> requests, but can't we still have block jobs, NBD server requests etc.?
> 
> And the same is probably true for qemu_loadvm_state().

Yes, they currently rely on bdrv_drain_all() but that's not enough.
Thanks for the suggestion, will add a patch in v2.

Stefan
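
The deadlock described in the patch comment can be modelled without any
threads. The sketch below is only a toy model under stated assumptions:
RecLock is a hypothetical recursive lock reduced to an acquisition depth,
and poll_releasing_once() stands in for a wait loop that drops the lock
exactly once, which is what the comment says of BDRV_POLL_WHILE().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a recursive lock: just an acquisition depth. */
typedef struct {
    int depth;
} RecLock;

static void rec_acquire(RecLock *l)
{
    l->depth++;
}

static void rec_release(RecLock *l)
{
    assert(l->depth > 0);
    l->depth--;
}

/* Another thread can take the lock only once the depth is back to zero. */
static bool others_can_run(const RecLock *l)
{
    return l->depth == 0;
}

/*
 * Models a wait loop that releases the lock exactly once while polling.
 * Returns whether the thread that must complete the I/O could have run.
 */
static bool poll_releasing_once(RecLock *l)
{
    bool progress;

    rec_release(l);
    progress = others_can_run(l);
    rec_acquire(l);
    return progress;
}
```

With a single acquisition the poll lets the other thread run; with a
nested acquisition the lock is still held during the poll, so the wait
would never finish.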




Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration

2017-05-19 Thread Jay Zhou

Hi Xiao,

On 2017/5/19 16:32, Xiao Guangrong wrote:


I do not know why i was removed from the list.


I was CCed to you...
Your comments are very valuable to us, and thanks for your quick response.



On 05/19/2017 04:09 PM, Jay Zhou wrote:

Hi Paolo and Wanpeng,

On 2017/5/17 16:38, Wanpeng Li wrote:

2017-05-17 15:43 GMT+08:00 Paolo Bonzini :

Recently, I have tested the performance before migration and after
migration failure
using spec cpu2006 https://www.spec.org/cpu2006/, which is a standard
performance
evaluation tool.

These are the steps:
==
  (1) the version of kmod is 4.4.11 (slightly modified) and the version of
  qemu is 2.6.0 (slightly modified); the following patch is applied to
  the kmod

diff --git a/source/x86/x86.c b/source/x86/x86.c
index 054a7d3..75a4bb3 100644
--- a/source/x86/x86.c
+++ b/source/x86/x86.c
@@ -8550,8 +8550,10 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
  */
 if ((change != KVM_MR_DELETE) &&
 (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
-   !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
-   kvm_mmu_zap_collapsible_sptes(kvm, new);
+   !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
+   printk(KERN_ERR "zj make KVM_REQ_MMU_RELOAD request\n");
+   kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
+   }

 /*
  * Set up write protection and/or dirty logging for the new slot.


Try these modifications to the setup:

1) set up 1G hugetlbfs hugepages and use those for the guest's memory

2) test both without and with the above patch.



In order to avoid random memory allocation issues, I reran the test cases:
(1) setup: start a 4U10G VM with memory preoccupied, each vcpu is pinned to a
pcpu respectively, these resources(memory and pcpu) allocated to VM are all
from NUMA node 0
(2) sequence: first, I run 429.mcf from spec cpu2006 before migration
and get a result. Then a migration failure is constructed. Finally, I run
the test case again and get another result.


I guess this case purely writes the memory, that means the readonly mappings 
will


Yes, I printed out the speed of dirty page rate, it is about 1GB per second.


always be dropped by #PF, then huge mappings are established.

If you benchmark memory reads, you should observe the difference.



OK, thanks for your suggestion!

Regards,
Jay Zhou







[Qemu-devel] [PATCH v1 08/13] qcow2: handle_prealloc(): find out if area zeroed by earlier preallocation

2017-05-19 Thread Anton Nefedov
Signed-off-by: Anton Nefedov 
Signed-off-by: Denis V. Lunev 
---
 block/qcow2-cluster.c | 2 ++
 block/qcow2.c | 8 +++-
 block/qcow2.h | 4 
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index b2879b9..25210cd 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1143,6 +1143,7 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
 uint64_t *host_offset, uint64_t *bytes, QCowL2Meta **m)
 {
 BDRVQcow2State *s = bs->opaque;
+const uint64_t old_data_end = s->data_end;
 int l2_index;
 uint64_t *l2_table;
 uint64_t entry;
@@ -1264,6 +1265,7 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
 .alloc_offset   = alloc_cluster_offset,
 .offset = start_of_cluster(s, guest_offset),
 .nb_clusters= nb_clusters,
+.clusters_are_trailing = alloc_cluster_offset >= old_data_end,
 
 .keep_old_clusters  = keep_old_clusters,
 
diff --git a/block/qcow2.c b/block/qcow2.c
index 503f0dc..97a66a0 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1684,7 +1684,13 @@ restart:
 
 if (end <= bdrv_getlength(file)) {
 /* No need to care, file size will not be changed */
-return false;
+
+/* We're safe to assume that the area is zeroes if the area
+ * was allocated at the end of data (s->data_end).
+ * In this case, the only way for file length to be bigger is that
+ * the area was preallocated by another request.
+ */
+return m->clusters_are_trailing;
 }
 
 meta = g_alloca(sizeof(*meta));
diff --git a/block/qcow2.h b/block/qcow2.h
index e28c54a..2fd8510 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -333,6 +333,10 @@ typedef struct QCowL2Meta
 /** Do not free the old clusters */
 bool keep_old_clusters;
 
+/** True if the area is allocated after the end of data area
+ *  (i.e. >= s->data_end), which means that it is zeroed */
+bool clusters_are_trailing;
+
 /**
  * Requests that overlap with this allocation and wait to be restarted
  * when the allocating request has completed.
-- 
2.7.4




[Qemu-devel] [PATCH v1 06/13] qcow2: truncate preallocated space

2017-05-19 Thread Anton Nefedov
From: "Denis V. Lunev" 

This could be done after calculation of the end of data and metadata in
the qcow2 image.

Signed-off-by: Denis V. Lunev 
Signed-off-by: Anton Nefedov 
---
 block/qcow2-cluster.c  | 9 +
 block/qcow2-refcount.c | 7 +++
 block/qcow2.c  | 8 
 block/qcow2.h  | 3 +++
 4 files changed, 27 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index a4b6d40..b2879b9 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1957,3 +1957,12 @@ fail:
 g_free(l1_table);
 return ret;
 }
+
+void qcow2_update_data_end(BlockDriverState *bs, uint64_t off)
+{
+BDRVQcow2State *s = bs->opaque;
+
+if (s->data_end < off) {
+s->data_end = off;
+}
+}
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 873a1d2..8156466 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -744,6 +744,9 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
 ret = alloc_refcount_block(bs, cluster_index, &refcount_block);
 if (ret < 0) {
 goto fail;
+} else {
+qcow2_update_data_end(bs, s->refcount_table_offset +
+s->refcount_table_size * sizeof(uint64_t));
 }
 }
 old_table_index = table_index;
@@ -865,6 +868,8 @@ retry:
 s->free_cluster_index - 1 > (INT64_MAX >> s->cluster_bits))
 {
 return -EFBIG;
+} else {
+qcow2_update_data_end(bs, s->free_cluster_index << s->cluster_bits);
 }
 
 #ifdef DEBUG_ALLOC2
@@ -929,6 +934,8 @@ int64_t qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t offset,
 
 if (ret < 0) {
 return ret;
+} else {
+qcow2_update_data_end(bs, offset + (nb_clusters << s->cluster_bits));
 }
 
 return i;
diff --git a/block/qcow2.c b/block/qcow2.c
index 07c1706..7b4359b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1192,6 +1192,8 @@ static int qcow2_do_open(BlockDriverState *bs, QDict *options, int flags,
 }
 }
 
+s->data_end = bdrv_getlength(bs->file->bs);
+
 #ifdef DEBUG_ALLOC
 {
 BdrvCheckResult result = {0};
@@ -1948,12 +1950,18 @@ static int qcow2_inactivate(BlockDriverState *bs)
 static void qcow2_close(BlockDriverState *bs)
 {
 BDRVQcow2State *s = bs->opaque;
+
 qemu_vfree(s->l1_table);
 /* else pre-write overlap checks in cache_destroy may crash */
 s->l1_table = NULL;
 
 if (!(s->flags & BDRV_O_INACTIVE)) {
 qcow2_inactivate(bs);
+
+/* truncate preallocated space */
+if (!bs->read_only && s->data_end < bdrv_getlength(bs->file->bs)) {
+bdrv_truncate(bs->file, s->data_end, NULL);
+}
 }
 
 cache_clean_timer_del(bs);
diff --git a/block/qcow2.h b/block/qcow2.h
index a0d222d..e28c54a 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -297,6 +297,7 @@ typedef struct BDRVQcow2State {
 char *image_backing_format;
 
 uint64_t prealloc_size;
+uint64_t data_end;
 } BDRVQcow2State;
 
 typedef struct Qcow2COWRegion {
@@ -607,4 +608,6 @@ int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
 void **table);
 void qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
 
+void qcow2_update_data_end(BlockDriverState *bs, uint64_t off);
+
 #endif
-- 
2.7.4




Re: [Qemu-devel] [virtio-dev] Re: [virtio-dev] Re: [PATCH v2 00/16] Vhost-pci for inter-VM communication

2017-05-19 Thread Jason Wang



On 2017年05月19日 17:00, Wei Wang wrote:

On 05/19/2017 11:10 AM, Jason Wang wrote:



On 2017年05月18日 11:03, Wei Wang wrote:

On 05/17/2017 02:22 PM, Jason Wang wrote:



On 2017年05月17日 14:16, Jason Wang wrote:



On 2017年05月16日 15:12, Wei Wang wrote:




Hi:

Care to post the driver codes too?

OK. It may take some time to clean up the driver code before posting
it out. You can first

have a check of the draft at the repo here:
https://github.com/wei-w-wang/vhost-pci-driver

Best,
Wei


Interesting, looks like there's one copy on tx side. We used to 
have zerocopy support for tun for VM2VM traffic. Could you please 
try to compare it with your vhost-pci-net by:


We can analyze from the whole data path - from VM1's network stack 
to send packets -> VM2's
network stack to receive packets. The number of copies are actually 
the same for both.


That's why I'm asking you to compare the performance. The only reason 
for vhost-pci is performance. You should prove it.




vhost-pci: 1-copy happens in VM1's driver xmit(), which copies packets
from its network stack to VM2's
RX ring buffer. (we call it "zerocopy" because there is no
intermediate copy between VMs)
zerocopy enabled vhost-net: 1-copy happens in tun's recvmsg, which
copies packets from VM1's TX ring

buffer to VM2's RX ring buffer.


Actually, there's a major difference here. You do the copy in the guest,
which consumes the time slice of the vcpu thread on the host. Vhost_net
does this in its own thread. So I feel vhost_net is even faster here,
though maybe I am wrong.




The code path using vhost_net is much longer - the Ping test shows 
that the zcopy based vhost_net reports around 0.237ms,

while using vhost-pci it reports around 0.06 ms.
For some environment issue, I can report the throughput number later.


Yes, vhost-pci should have better latency by design. But we should
measure pps or packet sizes other than 64 as well. I agree vhost_net has
bad latency, but this does not mean it could not be improved (it is only
because few people have worked on improving this in the past), especially
since we know the destination is another VM.






That being said, we compared to vhost-user, instead of vhost_net, 
because vhost-user is the one

that is used in NFV, which we think is a major use case for vhost-pci.


If this is true, why not draft a pmd driver instead of a kernel one? 


Yes, that's right. There are actually two directions for the vhost-pci 
driver implementation: a kernel driver and a DPDK PMD. The QEMU-side 
device patches are posted first for discussion because, once the device 
part is ready, we will be able to have the related team work on the PMD 
driver as well. As usual, the PMD driver would give much better 
throughput.


I think pmd should be easier for a prototype than kernel driver.



So, I think at this stage we should focus on the device part review, 
and use the kernel driver to prove that the device part design and 
implementation is reasonable and functional.



Probably both.



And did you use the virtio-net kernel driver to compare the performance? 
If yes, has OVS-DPDK been optimized for the kernel driver (I think not)?




We used the legacy OVS+DPDK.
Another thing with the existing OVS+DPDK usage is its centralization 
property. With vhost-pci, we will be able to de-centralize the usage.



Right, so I think we should prove:

- For usage, prove or make vhost-pci better than existing shared-memory 
based solutions. (Or is virtio good at shared memory?)
- For performance, prove or make vhost-pci better than existing 
centralized solutions.


More importantly, if vhost-pci is faster, I think its kernel 
driver should also be faster than virtio-net, no?


Sorry about the confusion. We are actually not trying to use vhost-pci 
to replace virtio-net. Rather, vhost-pci can be viewed as another type 
of backend for virtio-net to be used in NFV (the communication channel 
is vhost-pci-net<->virtio_net).


My point is that performance numbers are important for proving the 
correctness of both the design and the engineering. If it's slow, it is 
less interesting for NFV.


Thanks




Best,
Wei





Re: [Qemu-devel] [RFC PATCH 09/20] Memory: introduce iommu_ops->record_device

2017-05-19 Thread Liu, Yi L
On Fri, May 19, 2017 at 09:07:49AM +, Tian, Kevin wrote:
> > From: Liu, Yi L [mailto:yi.l@linux.intel.com]
> > Sent: Friday, May 19, 2017 1:24 PM
> > 
> > Hi Alex,
> > 
> > What's your opinion on Tianyu's question? Is it acceptable
> > to use VFIO API in intel_iommu emulator?
> 
> Did you actually need such translation at all? SID should be
> filled by kernel IOMMU driver based on which device is
> requested with invalidation request, regardless of which 
> guest SID is used in user space. Qemu only needs to know
> which fd corresponds to guest SID, and then initiates an
> invalidation request on that fd?

Kevin,

It actually depends on the SVM binding behavior we expect on the host
IOMMU driver side. If we want the binding to be per-device, this
translation is needed in Qemu, either in VFIO or in the intel_iommu
emulator, so that the host SID can be used as a device selector when
looping over the devices in a group.

If we can use the VFIO API directly, we may also trigger the SVM bind/QI
propagation directly instead of using a notifier.

Thanks,
Yi L
 
> > 
> > Thanks,
> > Yi L
> > On Fri, Apr 28, 2017 at 02:46:16PM +0800, Lan Tianyu wrote:
> > > On 2017年04月26日 18:06, Liu, Yi L wrote:
> > > > With vIOMMU exposed to guest, vIOMMU emulator needs to do translation
> > > > between host and guest. e.g. for a device-selective TLB flush, vIOMMU
> > > > emulator needs to replace guest SID with host SID so as to limit
> > > > the invalidation. This patch introduces a new callback
> > > > iommu_ops->record_device() to notify vIOMMU emulator to record necessary
> > > > information about the assigned device.
> > >
> > > This patch is to prepare to translate guest sbdf to host sbdf.
> > >
> > > Alex:
> > >   Could we add a new vfio API to do such translation? This will be more
> > > straight forward than storing host sbdf in the vIOMMU device model.
> > >
> > > >
> > > > Signed-off-by: Liu, Yi L 
> > > > ---
> > > >  include/exec/memory.h | 11 +++
> > > >  memory.c  | 12 
> > > >  2 files changed, 23 insertions(+)
> > > >
> > > > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > > > index 7bd13ab..49087ef 100644
> > > > --- a/include/exec/memory.h
> > > > +++ b/include/exec/memory.h
> > > > @@ -203,6 +203,8 @@ struct MemoryRegionIOMMUOps {
> > > >  IOMMUNotifierFlag new_flags);
> > > >  /* Set this up to provide customized IOMMU replay function */
> > > >  void (*replay)(MemoryRegion *iommu, IOMMUNotifier *notifier);
> > > > +void (*record_device)(MemoryRegion *iommu,
> > > > +  void *device_info);
> > > >  };
> > > >
> > > >  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> > > > @@ -708,6 +710,15 @@ void
> > memory_region_notify_iommu(MemoryRegion *mr,
> > > >  void memory_region_notify_one(IOMMUNotifier *notifier,
> > > >IOMMUTLBEntry *entry);
> > > >
> > > > +/*
> > > > + * memory_region_notify_device_record: notify IOMMU to record
> > assign
> > > > + * device.
> > > > + * @mr: the memory region to notify
> > > > + * @ device_info: device information
> > > > + */
> > > > +void memory_region_notify_device_record(MemoryRegion *mr,
> > > > +void *info);
> > > > +
> > > >  /**
> > > >   * memory_region_register_iommu_notifier: register a notifier for
> > changes to
> > > >   * IOMMU translation entries.
> > > > diff --git a/memory.c b/memory.c
> > > > index 0728e62..45ef069 100644
> > > > --- a/memory.c
> > > > +++ b/memory.c
> > > > @@ -1600,6 +1600,18 @@ static void
> > memory_region_update_iommu_notify_flags(MemoryRegion *mr)
> > > >  mr->iommu_notify_flags = flags;
> > > >  }
> > > >
> > > > +void memory_region_notify_device_record(MemoryRegion *mr,
> > > > +void *info)
> > > > +{
> > > > +assert(memory_region_is_iommu(mr));
> > > > +
> > > > +if (mr->iommu_ops->record_device) {
> > > > +mr->iommu_ops->record_device(mr, info);
> > > > +}
> > > > +
> > > > +return;
> > > > +}
> > > > +
> > > >  void memory_region_register_iommu_notifier(MemoryRegion *mr,
> > > > IOMMUNotifier *n)
> > > >  {
> > > >
> > >
> > >



[Qemu-devel] [PATCH v2 3/4] target/ppc: consolidate CPU device-tree id computation in helper

2017-05-19 Thread Greg Kurz
For historical reasons, we compute CPU device-tree ids with non-trivial
logic. This patch consolidates the logic in a single helper to be used
in various places where it is currently open-coded.

It is okay to get rid of DIV_ROUND_UP() because we're sure that the number
of threads per core in the guest cannot exceed the number of threads per
core in the host.

Signed-off-by: Greg Kurz 
---
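For illustration only (not part of the patch): a standalone sketch of the
id computation that the helper consolidates. The function name
dt_id_from_index and the parameters guest_threads and host_threads are
made up for this note; they stand in for smp_threads and
kvmppc_smt_threads() so that the arithmetic can be tried outside QEMU.

```c
/*
 * Sketch of the device-tree id computation described in the commit
 * message. guest_threads stands in for smp_threads and host_threads for
 * kvmppc_smt_threads(); both names are illustrative assumptions.
 */
static int dt_id_from_index(int cpu_index, int guest_threads, int host_threads)
{
    int core = cpu_index / guest_threads;    /* guest core number */
    int thread = cpu_index % guest_threads;  /* thread within that core */

    /* Cores are spaced by the number of threads per host core. */
    return core * host_threads + thread;
}
```

With 2 guest threads per core on a host with 8 threads per core, cpu
indexes 0, 1, 2, 3 map to ids 0, 1, 8, 9.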
 hw/ppc/spapr.c  |6 ++
 target/ppc/cpu.h|   17 +
 target/ppc/translate_init.c |3 +--
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 75e298b4c6be..1bb05a9a6b07 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -981,7 +981,6 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
 void *fdt;
 sPAPRPHBState *phb;
 char *buf;
-int smt = kvmppc_smt_threads();
 
 fdt = g_malloc0(FDT_MAX_SIZE);
 _FDT((fdt_create_empty_tree(fdt, FDT_MAX_SIZE)));
@@ -1021,7 +1020,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
 _FDT(fdt_setprop_cell(fdt, 0, "#size-cells", 2));
 
 /* /interrupt controller */
-spapr_dt_xics(DIV_ROUND_UP(max_cpus * smt, smp_threads), fdt, PHANDLE_XICP);
+spapr_dt_xics(ppc_cpu_dt_id_from_index(max_cpus), fdt, PHANDLE_XICP);
 
 ret = spapr_populate_memory(spapr, fdt);
 if (ret < 0) {
@@ -1977,7 +1976,6 @@ static void spapr_init_cpus(sPAPRMachineState *spapr)
 MachineState *machine = MACHINE(spapr);
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 char *type = spapr_get_cpu_core_type(machine->cpu_model);
-int smt = kvmppc_smt_threads();
 const CPUArchIdList *possible_cpus;
 int boot_cores_nr = smp_cpus / smp_threads;
 int i;
@@ -2014,7 +2012,7 @@ static void spapr_init_cpus(sPAPRMachineState *spapr)
 sPAPRDRConnector *drc =
 spapr_dr_connector_new(OBJECT(spapr),
SPAPR_DR_CONNECTOR_TYPE_CPU,
-   (core_id / smp_threads) * smt);
+   ppc_cpu_dt_id_from_index(core_id));
 
 qemu_register_reset(spapr_drc_reset, drc);
 }
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 401e10e7dad8..47fe6c64698f 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -2529,4 +2529,21 @@ int ppc_get_vcpu_dt_id(PowerPCCPU *cpu);
 PowerPCCPU *ppc_get_vcpu_by_dt_id(int cpu_dt_id);
 
 void ppc_maybe_bswap_register(CPUPPCState *env, uint8_t *mem_buf, int len);
+
+#if !defined(CONFIG_USER_ONLY)
+#include "sysemu/cpus.h"
+#include "target/ppc/kvm_ppc.h"
+
+static inline int ppc_cpu_dt_id_from_index(int cpu_index)
+{
+/* POWER HV support has an historical limitation that different threads
+ * on a single core cannot be in different guests at the same time. In
+ * order to allow KVM to assign guest threads to host cores accordingly,
+ * CPU device tree ids are spaced by the number of threads per host cores.
+ */
+return (cpu_index / smp_threads) * kvmppc_smt_threads()
++ (cpu_index % smp_threads);
+}
+#endif
+
 #endif /* PPC_CPU_H */
diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
index 56a0ab22cfbe..837a9a496a65 100644
--- a/target/ppc/translate_init.c
+++ b/target/ppc/translate_init.c
@@ -9851,8 +9851,7 @@ static void ppc_cpu_realizefn(DeviceState *dev, Error **errp)
 }
 
 #if !defined(CONFIG_USER_ONLY)
-cpu->cpu_dt_id = (cs->cpu_index / smp_threads) * max_smt
-+ (cs->cpu_index % smp_threads);
+cpu->cpu_dt_id = ppc_cpu_dt_id_from_index(cs->cpu_index);
 
 if (kvm_enabled() && !kvm_vcpu_id_is_valid(cpu->cpu_dt_id)) {
 error_setg(errp, "Can't create CPU with id %d in KVM", cpu->cpu_dt_id);




[Qemu-devel] [PATCH v2 0/4] spapr/xics: fix migration of older machine types

2017-05-19 Thread Greg Kurz
v2: - some patches from v1 are already merged in ppc-for-2.10
- added a new fix to a potential memory leak (patch 1)
- consolidate dt_id computation (patch 3)
- see individual changelogs for patch 2 and 4

This series is based on:

https://github.com/dgibson/qemu.git ppc-for-2.10

I could successfully migrate from QEMU 2.9 and back, including when the guest
has different threads per core than the host and hotplugging cores between
each migration attempt.

--
Greg

---

Greg Kurz (4):
  spapr_cpu_core: drop reference on ICP object during CPU realization
  spapr: fix error reporting in xics_system_init()
  target/ppc: consolidate CPU device-tree id computation in helper
  spapr: fix migration of ICP objects from/to older QEMU


 hw/ppc/spapr.c  |   57 +++
 hw/ppc/spapr_cpu_core.c |   28 +++--
 include/hw/ppc/spapr.h  |2 ++
 target/ppc/cpu.h|   17 +
 target/ppc/translate_init.c |3 +-
 5 files changed, 86 insertions(+), 21 deletions(-)




[Qemu-devel] [PATCH v2 2/4] spapr: fix error reporting in xics_system_init()

2017-05-19 Thread Greg Kurz
If the user explicitly asked for kernel-irqchip support and "xics-kvm"
initialization fails, we shouldn't fall back to emulated "xics" as we
do now. It is also awkward to print an error message when we have an
errp pointer argument.

Let's use the errp argument to report the error and let the caller decide.
This simplifies the code as we don't need a local Error * here.

Signed-off-by: Greg Kurz 
---
v2: - total rewrite
---
 hw/ppc/spapr.c |   13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 91f7434861a8..75e298b4c6be 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -128,18 +128,14 @@ static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp)
 sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
 
 if (kvm_enabled()) {
-Error *err = NULL;
-
 if (machine_kernel_irqchip_allowed(machine) &&
 !xics_kvm_init(spapr, errp)) {
 spapr->icp_type = TYPE_KVM_ICP;
-spapr->ics = spapr_ics_create(spapr, TYPE_ICS_KVM, nr_irqs, );
+spapr->ics = spapr_ics_create(spapr, TYPE_ICS_KVM, nr_irqs, errp);
 }
 if (machine_kernel_irqchip_required(machine) && !spapr->ics) {
-error_reportf_err(err,
-  "kernel_irqchip requested but unavailable: ");
-} else {
-error_free(err);
+error_prepend(errp, "kernel_irqchip requested but unavailable: ");
+return;
 }
 }
 
@@ -147,6 +143,9 @@ static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp)
 xics_spapr_init(spapr);
 spapr->icp_type = TYPE_ICP;
 spapr->ics = spapr_ics_create(spapr, TYPE_ICS_SIMPLE, nr_irqs, errp);
+if (!spapr->ics) {
+return;
+}
 }
 }
 




Re: [Qemu-devel] [RFC PATCH V2 1/2] xen-pt: bind/unbind interrupt remapping format MSI

2017-05-19 Thread Anthony PERARD
On Thu, May 18, 2017 at 01:32:59AM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> If a vIOMMU is exposed to guest, guest will configure the msi to remapping
> format. The original code isn't suitable to the new format. A new pair
> bind/unbind interfaces are added for this usage. This patch recognizes
> this case and use new interfaces to bind/unbind msi.
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 
> ---
>  hw/xen/xen_pt_msi.c   | 50 
> ---
>  include/hw/i386/apic-msidef.h |  3 ++-
>  2 files changed, 39 insertions(+), 14 deletions(-)
> 
> diff --git a/hw/xen/xen_pt_msi.c b/hw/xen/xen_pt_msi.c
> index 62add06..5fab95e 100644
> --- a/hw/xen/xen_pt_msi.c
> +++ b/hw/xen/xen_pt_msi.c
> @@ -163,16 +163,24 @@ static int msi_msix_update(XenPCIPassthroughState *s,
>  int rc = 0;
>  uint64_t table_addr = 0;
>  
> -XEN_PT_LOG(d, "Updating MSI%s with pirq %d gvec %#x gflags %#x"
> -   " (entry: %#x)\n",
> -   is_msix ? "-X" : "", pirq, gvec, gflags, msix_entry);
> -
>  if (is_msix) {
>  table_addr = s->msix->mmio_base_addr;
>  }
>  
> -rc = xc_domain_update_msi_irq(xen_xc, xen_domid, gvec,
> -  pirq, gflags, table_addr);
> +if (addr & MSI_ADDR_IF_MASK) {
> +XEN_PT_LOG(d, "Updating MSI%s with addr %#" PRIx64 "data %#x\n",

With a space before "data", I think it will be easier to read the debug
log.

> +   is_msix ? "-X": "", addr, data);
> +rc = xc_domain_update_msi_irq_remapping(xen_xc, xen_domid, pirq,
> + d->devfn, data, addr, 
> table_addr);

We are going to need a stub function for
xc_domain_update_msi_irq_remapping(), when Xen does not have support for
it, so QEMU can compile in any case. (same for unbind version.)

I think the stub can just return -ENOSYS. That is going to require changes
in configure to detect a newer Xen version, and the stub can be in
xen_common.h.
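
A rough, untested sketch of such a stub follows. The
SKETCH_XEN_HAS_MSI_REMAPPING macro, the sketch_ names and the parameter
types are all placeholders made up for this reply; the real stubs should
mirror whatever prototypes libxc exports and be guarded by whatever
configure ends up defining.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in so the sketch compiles on its own; real code
 * would take the handle type from the Xen headers.
 */
typedef struct sketch_xc_interface sketch_xc_interface;

/*
 * SKETCH_XEN_HAS_MSI_REMAPPING is a made-up macro standing in for the
 * symbol configure would define when the installed Xen has the calls.
 */
#ifndef SKETCH_XEN_HAS_MSI_REMAPPING
static inline int sketch_update_msi_irq_remapping(sketch_xc_interface *xc,
                                                  uint32_t domid, uint32_t pirq,
                                                  uint32_t source_id,
                                                  uint32_t data, uint64_t addr,
                                                  uint64_t gtable)
{
    (void)xc; (void)domid; (void)pirq; (void)source_id;
    (void)data; (void)addr; (void)gtable;
    return -ENOSYS;  /* bind is not available with this Xen */
}

static inline int sketch_unbind_msi_irq_remapping(sketch_xc_interface *xc,
                                                  uint32_t domid, uint32_t pirq,
                                                  uint32_t source_id,
                                                  uint32_t data, uint64_t addr)
{
    (void)xc; (void)domid; (void)pirq; (void)source_id;
    (void)data; (void)addr;
    return -ENOSYS;  /* unbind is not available with this Xen */
}
#endif
```

The callers in msi_msix_update() and msi_msix_disable() already check rc,
so a stub that fails this way would show up through the existing error
paths.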

> +}
> +else {
> +XEN_PT_LOG(d, "Updating MSI%s with pirq %d gvec %#x gflags %#x"
> +   " (entry: %#x)\n",
> +   is_msix ? "-X" : "", pirq, gvec, gflags, msix_entry);
> +
> +rc = xc_domain_update_msi_irq(xen_xc, xen_domid, gvec,
> +  pirq, gflags, table_addr);
> +}
>  
>  if (rc) {
>  XEN_PT_ERR(d, "Updating of MSI%s failed. (err: %d)\n",
> @@ -204,13 +212,29 @@ static int msi_msix_disable(XenPCIPassthroughState *s,
>  }
>  
>  if (is_binded) {
> -XEN_PT_LOG(d, "Unbind MSI%s with pirq %d, gvec %#x\n",
> -   is_msix ? "-X" : "", pirq, gvec);
> -rc = xc_domain_unbind_msi_irq(xen_xc, xen_domid, gvec, pirq, gflags);
> -if (rc) {
> -XEN_PT_ERR(d, "Unbinding of MSI%s failed. (err: %d, pirq: %d, gvec: %#x)\n",
> -   is_msix ? "-X" : "", errno, pirq, gvec);
> -return rc;
> +if ( addr & MSI_ADDR_IF_MASK ) {
> +XEN_PT_LOG(d, "Unbinding of MSI%s . ( pirq: %d, data: %x, "
> +   "addr: %#" PRIx64 ")\n",
> +   is_msix ? "-X" : "", pirq, data, addr);
> +rc = xc_domain_unbind_msi_irq_remapping(xen_xc, xen_domid, pirq,
> +d->devfn, data, addr);
> +if (rc) {
> +XEN_PT_ERR(d, "Unbinding of MSI%s . (error: %d, pirq: %d, "
> +   "data: %x, addr: %#" PRIx64 ")\n",
> +   is_msix ? "-X" : "", rc, pirq, data, addr);
> +return rc;
> +}
> +
> +} else {
> +XEN_PT_LOG(d, "Unbind MSI%s with pirq %d, gvec %#x\n",
> +   is_msix ? "-X" : "", pirq, gvec);
> +rc = xc_domain_unbind_msi_irq(xen_xc, xen_domid, gvec, pirq, gflags);
> +if (rc) {
> +XEN_PT_ERR(d, "Unbinding of MSI%s failed. (err: %d, pirq: %d, "
> +   "gvec: %#x)\n",
> +   is_msix ? "-X" : "", errno, pirq, gvec);
> +return rc;
> +}
>  }
>  }
>  
> diff --git a/include/hw/i386/apic-msidef.h b/include/hw/i386/apic-msidef.h
> index 8b4d4cc..2c450f9 100644
> --- a/include/hw/i386/apic-msidef.h
> +++ b/include/hw/i386/apic-msidef.h
> @@ -26,6 +26,7 @@
>  
>  #define MSI_ADDR_DEST_ID_SHIFT  12
>  #define MSI_ADDR_DEST_IDX_SHIFT 4
> -#define  MSI_ADDR_DEST_ID_MASK  0x000
> +#define  MSI_ADDR_DEST_ID_MASK  0x000fff00

The value of MSI_ADDR_DEST_ID_MASK is changed here. I think the patch
should be:
+#define  MSI_ADDR_DEST_ID_MASK  0x0000


> +#define  MSI_ADDR_IF_MASK   0x0010
>  
>  #endif /* HW_APIC_MSIDEF_H */

Thanks,

-- 
Anthony PERARD



[Qemu-devel] [PULL 06/20] kvm: irqchip: trace changes on msi add/remove

2017-05-19 Thread Paolo Bonzini
From: Peter Xu 

It'll be nice to know which virq belongs to which device/vector when
adding msi routes, so add two more parameters to the add trace.

Meanwhile, releasing a virq had no tracing before. Add one for it.

Signed-off-by: Peter Xu 
Message-Id: <1494309644-18743-2-git-send-email-pet...@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Paolo Bonzini 
---
 kvm-all.c| 4 +++-
 trace-events | 3 ++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 90b8573..2598b1f 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1144,6 +1144,7 @@ void kvm_irqchip_release_virq(KVMState *s, int virq)
 }
 clear_gsi(s, virq);
 kvm_arch_release_virq_post(virq);
+trace_kvm_irqchip_release_virq(virq);
 }
 
 static unsigned int kvm_hash_msi(uint32_t data)
@@ -1287,7 +1288,8 @@ int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
 return -EINVAL;
 }
 
-trace_kvm_irqchip_add_msi_route(virq);
+trace_kvm_irqchip_add_msi_route(dev ? dev->name : (char *)"N/A",
+vector, virq);
 
 kvm_add_routing_entry(s, );
 kvm_arch_add_msi_route_post(, vector, dev);
diff --git a/trace-events b/trace-events
index e582d63..f01ec05 100644
--- a/trace-events
+++ b/trace-events
@@ -69,8 +69,9 @@ kvm_device_ioctl(int fd, int type, void *arg) "dev fd %d, type 0x%x, arg %p"
 kvm_failed_reg_get(uint64_t id, const char *msg) "Warning: Unable to retrieve ONEREG %" PRIu64 " from KVM: %s"
 kvm_failed_reg_set(uint64_t id, const char *msg) "Warning: Unable to set ONEREG %" PRIu64 " to KVM: %s"
 kvm_irqchip_commit_routes(void) ""
-kvm_irqchip_add_msi_route(int virq) "Adding MSI route virq=%d"
+kvm_irqchip_add_msi_route(char *name, int vector, int virq) "dev %s vector %d virq %d"
 kvm_irqchip_update_msi_route(int virq) "Updating MSI route virq=%d"
+kvm_irqchip_release_virq(int virq) "virq %d"
 
 # TCG related tracing (mostly disabled by default)
 # cpu-exec.c
-- 
1.8.3.1





[Qemu-devel] [PULL 15/20] exec: simplify phys_page_find() params

2017-05-19 Thread Paolo Bonzini
From: Peter Xu 

It really only plays with the dispatchers, so the parameter list does
not need that complexity. This helps for readability at least.

Signed-off-by: Peter Xu 
Message-Id: <1494838260-30439-2-git-send-email-pet...@redhat.com>
Reviewed-by: David Gibson 
Signed-off-by: Paolo Bonzini 
---
 exec.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/exec.c b/exec.c
index 96e3ac9..29633cd 100644
--- a/exec.c
+++ b/exec.c
@@ -373,10 +373,11 @@ static inline bool section_covers_addr(const MemoryRegionSection *section,
  int128_getlo(section->size), addr);
 }
 
-static MemoryRegionSection *phys_page_find(PhysPageEntry lp, hwaddr addr,
-   Node *nodes, MemoryRegionSection 
*sections)
+static MemoryRegionSection *phys_page_find(AddressSpaceDispatch *d, hwaddr addr)
 {
-PhysPageEntry *p;
+PhysPageEntry lp = d->phys_map, *p;
+Node *nodes = d->map.nodes;
+MemoryRegionSection *sections = d->map.sections;
 hwaddr index = addr >> TARGET_PAGE_BITS;
 int i;
 
@@ -414,8 +415,7 @@ static MemoryRegionSection *address_space_lookup_region(AddressSpaceDispatch *d,
 section_covers_addr(section, addr)) {
 update = false;
 } else {
-section = phys_page_find(d->phys_map, addr, d->map.nodes,
- d->map.sections);
+section = phys_page_find(d, addr);
 update = true;
 }
 if (resolve_subpage && section->mr->subpage) {
@@ -1283,8 +1283,7 @@ static void register_subpage(AddressSpaceDispatch *d, 
MemoryRegionSection *secti
 subpage_t *subpage;
 hwaddr base = section->offset_within_address_space
 & TARGET_PAGE_MASK;
-MemoryRegionSection *existing = phys_page_find(d->phys_map, base,
-   d->map.nodes, 
d->map.sections);
+MemoryRegionSection *existing = phys_page_find(d, base);
 MemoryRegionSection subsection = {
 .offset_within_address_space = base,
 .size = int128_make64(TARGET_PAGE_SIZE),
-- 
1.8.3.1





[Qemu-devel] [PULL 10/20] nbd: strict nbd_wr_syncv

2017-05-19 Thread Paolo Bonzini
From: Vladimir Sementsov-Ogievskiy 

nbd_wr_syncv is called either from a coroutine or from client negotiation
code, when the socket is in blocking mode. So, -EAGAIN is impossible.

Furthermore, EAGAIN is confusing: what should be read or written again?
With EAGAIN as a return code we don't know how much data has already
been read or written by the function, so in case of EAGAIN the whole
communication is broken.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20170516094533.6160-2-vsement...@virtuozzo.com>
Signed-off-by: Paolo Bonzini 
---
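For illustration only (not part of the patch): the "all or error" contract
argued for above, sketched with plain POSIX write() on a blocking
descriptor rather than with the QIOChannel API. The helper name write_all
is made up for this note.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write exactly len bytes to a blocking descriptor.
 * Returns len on success or -errno on failure. A partial transfer is
 * never reported, so the caller never has to guess how much went out.
 */
static ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR) {
                continue;  /* interrupted before any data moved: retry */
            }
            return -errno;
        }
        done += (size_t)n;
    }
    return (ssize_t)len;
}
```

A caller only has two outcomes to handle, which is the property an EAGAIN
return in the middle of a transfer would break.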
 nbd/common.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/nbd/common.c b/nbd/common.c
index dccbb8e..4db45b3 100644
--- a/nbd/common.c
+++ b/nbd/common.c
@@ -20,6 +20,10 @@
 #include "qapi/error.h"
 #include "nbd-internal.h"
 
+/* nbd_wr_syncv
+ * The function may be called from coroutine or from non-coroutine context.
+ * When called from non-coroutine context @ioc must be in blocking mode.
+ */
 ssize_t nbd_wr_syncv(QIOChannel *ioc,
  struct iovec *iov,
  size_t niov,
@@ -42,11 +46,8 @@ ssize_t nbd_wr_syncv(QIOChannel *ioc,
 len = qio_channel_writev(ioc, local_iov, nlocal_iov, _err);
 }
 if (len == QIO_CHANNEL_ERR_BLOCK) {
-if (qemu_in_coroutine()) {
-qio_channel_yield(ioc, do_read ? G_IO_IN : G_IO_OUT);
-} else {
-return -EAGAIN;
-}
+assert(qemu_in_coroutine());
+qio_channel_yield(ioc, do_read ? G_IO_IN : G_IO_OUT);
 continue;
 }
 if (len < 0) {
-- 
1.8.3.1




