date:20170518

Re: [Qemu-devel] [Qemu-block] [PATCH 09/18] throttle-groups: protect throttled requests with a CoMutex

2017-05-18 Thread Alberto Garcia

On Thu 11 May 2017 04:41:59 PM CEST, Paolo Bonzini wrote:

> --- a/include/sysemu/block-backend.h
> +++ b/include/sysemu/block-backend.h
> @@ -72,11 +72,8 @@ typedef struct BlockDevOps {
>   * fields that must be public. This is in particular for QLIST_ENTRY() and
>   * friends so that BlockBackends can be kept in lists outside 
> block-backend.c */
>  typedef struct BlockBackendPublic {
> -/* I/O throttling has its own locking, but also some fields are
> - * protected by the AioContext lock.
> - */
> -
> -/* Protected by AioContext lock.  */
> +/* throttled_reqs_lock protects the CoQueues for throttled requests.  */
> +CoMutex  throttled_reqs_lock;
>  CoQueue  throttled_reqs[2];
>  
>  /* Nonzero if the I/O limits are currently being ignored; generally
>   * it is zero.  */
>  unsigned int io_limits_disabled;

Does io_limits_disabled need locking too?

Berto

Re: [Qemu-devel] [PULL 6/6] hw/usb/dev-serial: Do not try to set vendorid or productid properties

2017-05-18 Thread Paolo Bonzini



On 12/05/2017 14:21, Gerd Hoffmann wrote:
> From: Thomas Huth 
> 
> When starting QEMU with the legacy USB serial device like this:
> 
>  qemu-system-x86_64 -usbdevice serial:vendorid=0x1234:stdio
> 
> it currently aborts since the vendorid property does not exist
> anymore (it has been removed by commit f29783f72ea77dfbd7ea0c9):
> 
>  Unexpected error in object_property_find() at qemu/qom/object.c:1008:
>  qemu-system-x86_64: -usbdevice serial:vendorid=0x1234:stdio: Property
>  '.vendorid' not found
>  Aborted (core dumped)
> 
> Fix this crash by issuing a more friendly error message instead
> (and simplify the code also a little bit this way).
> 
> Signed-off-by: Thomas Huth 
> Message-id: 1493883704-27604-1-git-send-email-th...@redhat.com
> Signed-off-by: Gerd Hoffmann 
> ---
>  hw/usb/dev-serial.c | 24 ++--
>  1 file changed, 6 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/usb/dev-serial.c b/hw/usb/dev-serial.c
> index 6d5137383b..83a4f0e6fb 100644
> --- a/hw/usb/dev-serial.c
> +++ b/hw/usb/dev-serial.c
> @@ -513,27 +513,18 @@ static USBDevice *usb_serial_init(USBBus *bus, const 
> char *filename)
>  {
>  USBDevice *dev;
>  Chardev *cdrv;
> -uint32_t vendorid = 0, productid = 0;
>  char label[32];
>  static int index;
>  
>  while (*filename && *filename != ':') {
>  const char *p;
> -char *e;
> +
>  if (strstart(filename, "vendorid=", )) {
> -vendorid = strtol(p, , 16);
> -if (e == p || (*e && *e != ',' && *e != ':')) {
> -error_report("bogus vendor ID %s", p);
> -return NULL;
> -}
> -filename = e;
> +error_report("vendorid is not supported anymore");
> +return NULL;
>  } else if (strstart(filename, "productid=", )) {
> -productid = strtol(p, , 16);
> -if (e == p || (*e && *e != ',' && *e != ':')) {
> -error_report("bogus product ID %s", p);
> -return NULL;
> -}
> -filename = e;
> +error_report("productid is not supported anymore");
> +return NULL;
>  } else {
>  error_report("unrecognized serial USB option %s", filename);
>  return NULL;

All breanches of the "if" now return NULL, so the "while" loop in turn
can become an

if (*filename && *filename != ':') {
}

and the "while (*filename == ',')" subloop can go away, replaced by just
"return NULL".

Even better, the "if (!*filename)" if just below can be moved first.

Paolo
> @@ -554,10 +545,7 @@ static USBDevice *usb_serial_init(USBBus *bus, const 
> char *filename)
>  
>  dev = usb_create(bus, "usb-serial");
>  qdev_prop_set_chr(>qdev, "chardev", cdrv);
> -if (vendorid)
> -qdev_prop_set_uint16(>qdev, "vendorid", vendorid);
> -if (productid)
> -qdev_prop_set_uint16(>qdev, "productid", productid);
> +
>  return dev;
>  }
>  
>

Re: [Qemu-devel] [RFC] qmp: Return 'user_creatable' & 'hotpluggable' fields on qom-list-types

2017-05-18 Thread Markus Armbruster

Eduardo Habkost  writes:

> Currently there's no way for QMP clients to get a list of device types
> that are really usable with -device.  This information would be useful
> for management software and test scripts (like the device-crash-test
> script I have submitted recently).  Interestingly, the "info qdm" HMP
> command provides this information, but no QMP command does.
>
> Add two new fields to the return value of qom-list-types:
> "user-creatable-device" and "hotpluggable-device".

Does the combination

"user-creatable-device": false,
"hotpluggable-device": true

make any sense?

Exposing information on user-creatable/hot-pluggable via QMP makes
sense.  The question is how.  This is a design question, and as so often
with design questions, I need more words to make my case than I'd like
to.  Please bear with me.

> Also, add extra
> arguments for filtering the list of types based on the new fields.

I consider the case for filtering weak.  Let's ignore this part for now.

We have a number of commands to introspect static information,
e.g. query-version, query-commands, query-qmp-schema, query-target,
query-machines, query-cpu-definitions, query-chardev-backends,
device-list-properties, qom-list-types.

Aside: us abandoning the convention to name such commands query-FOO is
regrettable.

In its basic form, i.e. without arguments, qom-list-types does what its
name suggests: it lists the QOM types.

It also permits finding out whether a type is abstract, but only in a
roundabout way: subtract the result of running it without arguments (or
with 'abstract':false) from the result with 'abstract':true.

It also permits finding the "implements" relation, but only in an even
more roundabout way: run it with 'implements':T for every abstract T,
then solve the resulting puzzle.

Unless there's a direct way to find both (and I'm not aware of any),
this is bad design.  The obvious fix is to extend its return type to
include the information.

With qom-list-types fixed that way, there's precedence for exposing
additional information on QOM types there.

Note that both the "abstract" bit and the "implements directly" list
apply to any QOM type, not just to certain subtypes.  As proposed,
"user-creatable-device' and "hotpluggable-device" apply only to the
"device" subtype.  There's no precedence for exposing information
specific to certain subtypes.

If we want to do it anyway, then qom-list-type should perhaps return a
union.  Taken to the logical conclusion, this becomes a nest of unions
mirroring the "direct subtype of" relation.  Hmm.

However, we already have a command to introspect device types:
device-list-properties.  Can we expose the information as read-only
property/ies of type "device" and be done?

But perhaps they aren't really specific to devices.  There are other QOM
types that can be created by users, e.g. with -object, so the
"user-creatable" bit applies more widely.  Are any of them only
creatable during initial startup?  If yes, then that applies more
widely, too.

> I'm not sure about the naming of the new command arguments.  Maybe the
> names are too long, but I believe that "user-creatable-devices-only=false"
> would have more obvious semantics than "user-creatable-devices=false"
> (the latter looks ambiguous: it could mean "return only
> non-user-creatable devices" or "return all devices").

I consider the filtering feature unnecessary complexity.  The filtering
we have got in against my objections.  I won't veto additional filtering
outright, but I will insist on test coverage.

The unfiltered output of qom-list-types is less than 10 KiB.  Even if we
extend it some, the need to filter it client-side seems dubious.  If a
management application really wants to save resources here, it should
cache the result and re-get it only when QEMU changes.

Re: [Qemu-devel] [PATCH 2/9] migration: Split migration/channel.c for channel operations

2017-05-18 Thread Peter Xu

On Wed, May 17, 2017 at 05:47:49PM +0200, Juan Quintela wrote:
> Create an include for its exported functions.
> 
> Signed-off-by: Juan Quintela 

Reviewed-by: Peter Xu 

> ---
>  include/migration/migration.h |  7 -
>  migration/Makefile.objs   |  2 +-
>  migration/channel.c   | 69 
> +++
>  migration/channel.h   | 25 
>  migration/exec.c  |  1 +
>  migration/fd.c|  1 +
>  migration/migration.c | 50 ---
>  migration/socket.c|  1 +
>  migration/tls.c   |  1 +
>  9 files changed, 99 insertions(+), 58 deletions(-)
>  create mode 100644 migration/channel.c
>  create mode 100644 migration/channel.h
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 7d1eef7..e831259 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -157,17 +157,10 @@ void migration_fd_process_incoming(QEMUFile *f);
>  
>  void qemu_start_incoming_migration(const char *uri, Error **errp);
>  
> -void migration_channel_process_incoming(MigrationState *s,
> -QIOChannel *ioc);
> -
>  void migration_tls_channel_process_incoming(MigrationState *s,
>  QIOChannel *ioc,
>  Error **errp);
>  
> -void migration_channel_connect(MigrationState *s,
> -   QIOChannel *ioc,
> -   const char *hostname);
> -
>  void migration_tls_channel_connect(MigrationState *s,
> QIOChannel *ioc,
> const char *hostname,
> diff --git a/migration/Makefile.objs b/migration/Makefile.objs
> index 00a3f4a..4e8ab0a 100644
> --- a/migration/Makefile.objs
> +++ b/migration/Makefile.objs
> @@ -1,5 +1,5 @@
>  common-obj-y += migration.o socket.o fd.o exec.o
> -common-obj-y += tls.o
> +common-obj-y += tls.o channel.o
>  common-obj-y += colo-comm.o colo.o colo-failover.o
>  common-obj-y += vmstate.o page_cache.o
>  common-obj-y += qemu-file.o
> diff --git a/migration/channel.c b/migration/channel.c
> new file mode 100644
> index 000..6104de5
> --- /dev/null
> +++ b/migration/channel.c
> @@ -0,0 +1,69 @@
> +/*
> + * QEMU live migration
> + *
> + * Copyright IBM, Corp. 2008
> + *
> + * Authors:
> + *  Anthony Liguori   
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + * Contributions after 2012-01-13 are licensed under the terms of the
> + * GNU GPL, version 2 or (at your option) any later version.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "channel.h"
> +#include "migration/migration.h"
> +#include "trace.h"
> +#include "qapi/error.h"
> +#include "io/channel-tls.h"
> +
> +void migration_channel_process_incoming(MigrationState *s,
> +QIOChannel *ioc)
> +{
> +trace_migration_set_incoming_channel(
> +ioc, object_get_typename(OBJECT(ioc)));
> +
> +if (s->parameters.tls_creds &&
> +*s->parameters.tls_creds &&
> +!object_dynamic_cast(OBJECT(ioc),
> + TYPE_QIO_CHANNEL_TLS)) {
> +Error *local_err = NULL;
> +migration_tls_channel_process_incoming(s, ioc, _err);
> +if (local_err) {
> +error_report_err(local_err);
> +}
> +} else {
> +QEMUFile *f = qemu_fopen_channel_input(ioc);
> +migration_fd_process_incoming(f);
> +}
> +}
> +
> +
> +void migration_channel_connect(MigrationState *s,
> +   QIOChannel *ioc,
> +   const char *hostname)
> +{
> +trace_migration_set_outgoing_channel(
> +ioc, object_get_typename(OBJECT(ioc)), hostname);
> +
> +if (s->parameters.tls_creds &&
> +*s->parameters.tls_creds &&
> +!object_dynamic_cast(OBJECT(ioc),
> + TYPE_QIO_CHANNEL_TLS)) {
> +Error *local_err = NULL;
> +migration_tls_channel_connect(s, ioc, hostname, _err);
> +if (local_err) {
> +migrate_fd_error(s, local_err);
> +error_free(local_err);
> +}
> +} else {
> +QEMUFile *f = qemu_fopen_channel_output(ioc);
> +
> +s->to_dst_file = f;
> +
> +migrate_fd_connect(s);
> +}
> +}
> diff --git a/migration/channel.h b/migration/channel.h
> new file mode 100644
> index 000..618acb7
> --- /dev/null
> +++ b/migration/channel.h
> @@ -0,0 +1,25 @@
> +/*
> + * QEMU live migration channel functions
> + *
> + * Copyright IBM, Corp. 2008
> + *
> + * Authors:
> + *  Anthony Liguori   
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in

Re: [Qemu-devel] [PATCH 1/9] migration: Create migration/xbzrle.h

2017-05-18 Thread Peter Xu

On Wed, May 17, 2017 at 05:47:48PM +0200, Juan Quintela wrote:
> Signed-off-by: Juan Quintela 

Reviewed-by: Peter Xu 

> ---
>  include/migration/migration.h |  4 
>  migration/ram.c   |  1 +
>  migration/xbzrle.c|  2 +-
>  migration/xbzrle.h| 21 +
>  tests/test-xbzrle.c   |  2 +-
>  5 files changed, 24 insertions(+), 6 deletions(-)
>  create mode 100644 migration/xbzrle.h
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index b80a6ed..7d1eef7 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -251,10 +251,6 @@ bool migrate_zero_blocks(void);
>  
>  bool migrate_auto_converge(void);
>  
> -int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen,
> - uint8_t *dst, int dlen);
> -int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen);
> -
>  int migrate_use_xbzrle(void);
>  int64_t migrate_xbzrle_cache_size(void);
>  bool migrate_colo_enabled(void);
> diff --git a/migration/ram.c b/migration/ram.c
> index 76c118c..2564c00 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -35,6 +35,7 @@
>  #include "qemu/bitmap.h"
>  #include "qemu/timer.h"
>  #include "qemu/main-loop.h"
> +#include "xbzrle.h"
>  #include "migration/migration.h"
>  #include "postcopy-ram.h"
>  #include "exec/address-spaces.h"
> diff --git a/migration/xbzrle.c b/migration/xbzrle.c
> index c858339..1ba482d 100644
> --- a/migration/xbzrle.c
> +++ b/migration/xbzrle.c
> @@ -12,7 +12,7 @@
>   */
>  #include "qemu/osdep.h"
>  #include "qemu/cutils.h"
> -#include "include/migration/migration.h"
> +#include "xbzrle.h"
>  
>  /*
>page = zrun nzrun
> diff --git a/migration/xbzrle.h b/migration/xbzrle.h
> new file mode 100644
> index 000..a0db507
> --- /dev/null
> +++ b/migration/xbzrle.h
> @@ -0,0 +1,21 @@
> +/*
> + * QEMU live migration
> + *
> + * Copyright IBM, Corp. 2008
> + *
> + * Authors:
> + *  Anthony Liguori   
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef QEMU_MIGRATION_XBZRLE_H
> +#define QEMU_MIGRATION_XBZRLE_H
> +
> +int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen,
> + uint8_t *dst, int dlen);
> +
> +int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen);
> +#endif
> diff --git a/tests/test-xbzrle.c b/tests/test-xbzrle.c
> index 49f6419..f5e08de 100644
> --- a/tests/test-xbzrle.c
> +++ b/tests/test-xbzrle.c
> @@ -13,7 +13,7 @@
>  #include "qemu/osdep.h"
>  #include "qemu-common.h"
>  #include "qemu/cutils.h"
> -#include "include/migration/migration.h"
> +#include "../migration/xbzrle.h"
>  
>  #define PAGE_SIZE 4096
>  
> -- 
> 2.9.3
> 

-- 
Peter Xu

Re: [Qemu-devel] [Qemu-block] [PATCH 08/18] throttle-groups: do not use qemu_co_enter_next

2017-05-18 Thread Alberto Garcia

On Thu 11 May 2017 04:41:58 PM CEST, Paolo Bonzini wrote:
> Prepare for removing this function; always restart throttled requests
> from coroutine context.  This will matter when restarting throttled
> requests will have to acquire a CoMutex.
>
> Signed-off-by: Paolo Bonzini 

Reviewed-by: Alberto Garcia 

Berto

Re: [Qemu-devel] [PATCH] trivial: Remove unneeded ifndef in memory.h

2017-05-18 Thread Dr. David Alan Gilbert

* Juan Quintela (quint...@redhat.com) wrote:
> All the file is surounded already by #ifndef CONFIG_USER_ONLY.
> 
> Signed-off-by: Juan Quintela 

Ah yes, the patch that added the inner ifndef added it to
every use of exec/hwaddr.h.

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  include/exec/memory.h | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 99e0f54..6b5b953 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -17,9 +17,7 @@
>  #ifndef CONFIG_USER_ONLY
>  
>  #include "exec/cpu-common.h"
> -#ifndef CONFIG_USER_ONLY
>  #include "exec/hwaddr.h"
> -#endif
>  #include "exec/memattrs.h"
>  #include "exec/ramlist.h"
>  #include "qemu/queue.h"
> -- 
> 2.9.3
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

[Qemu-devel] [PATCH v2] target/s390x: Add support for the TEST BLOCK instruction

2017-05-18 Thread Thomas Huth

TEST BLOCK was likely once used to execute basic memory
tests, but nowadays it's just a (slow) way to clear a page.

Signed-off-by: Thomas Huth 
---
 v2:
 - Use DEF_HELPER_FLAGS_2 instead for DEF_HELPER_2 for returning CC value
 - Convert real to absolute address
 - Added a check for valid RAM page
 - Added low-address protection check

 target/s390x/cpu.h |  1 +
 target/s390x/helper.h  |  1 +
 target/s390x/insn-data.def |  2 ++
 target/s390x/mem_helper.c  | 28 
 target/s390x/mmu_helper.c  |  2 +-
 target/s390x/translate.c   | 10 ++
 6 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index 240b8a5..4f38ba0 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -1082,6 +1082,7 @@ struct sysib_322 {
 #define SIGP_ORDER_MASK 0x00ff
 
 void load_psw(CPUS390XState *env, uint64_t mask, uint64_t addr);
+target_ulong mmu_real2abs(CPUS390XState *env, target_ulong raddr);
 int mmu_translate(CPUS390XState *env, target_ulong vaddr, int rw, uint64_t asc,
   target_ulong *raddr, int *flags, bool exc);
 int sclp_service_call(CPUS390XState *env, uint64_t sccb, uint32_t code);
diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 0b70770..1fae191 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -102,6 +102,7 @@ DEF_HELPER_FLAGS_4(lctl, TCG_CALL_NO_WG, void, env, i32, 
i64, i32)
 DEF_HELPER_FLAGS_4(lctlg, TCG_CALL_NO_WG, void, env, i32, i64, i32)
 DEF_HELPER_FLAGS_4(stctl, TCG_CALL_NO_WG, void, env, i32, i64, i32)
 DEF_HELPER_FLAGS_4(stctg, TCG_CALL_NO_WG, void, env, i32, i64, i32)
+DEF_HELPER_FLAGS_2(testblock, TCG_CALL_NO_WG, i32, env, i64)
 DEF_HELPER_FLAGS_2(tprot, TCG_CALL_NO_RWG, i32, i64, i64)
 DEF_HELPER_FLAGS_2(iske, TCG_CALL_NO_RWG_SE, i64, env, i64)
 DEF_HELPER_FLAGS_3(sske, TCG_CALL_NO_RWG, void, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 55a7c52..cac0f51 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -918,6 +918,8 @@
 /* STORE USING REAL ADDRESS */
 C(0xb246, STURA,   RRE,   Z,   r1_o, r2_o, 0, 0, stura, 0)
 C(0xb925, STURG,   RRE,   Z,   r1_o, r2_o, 0, 0, sturg, 0)
+/* TEST BLOCK */
+C(0xb22c, TB,  RRE,   Z,   0, r2_o, 0, 0, testblock, 0)
 /* TEST PROTECTION */
 C(0xe501, TPROT,   SSE,   Z,   la1, a2, 0, 0, tprot, 0)
 
diff --git a/target/s390x/mem_helper.c b/target/s390x/mem_helper.c
index f6e5bce..de0ecd4 100644
--- a/target/s390x/mem_helper.c
+++ b/target/s390x/mem_helper.c
@@ -20,6 +20,7 @@
 
 #include "qemu/osdep.h"
 #include "cpu.h"
+#include "exec/address-spaces.h"
 #include "exec/helper-proto.h"
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
@@ -973,6 +974,33 @@ void HELPER(stctl)(CPUS390XState *env, uint32_t r1, 
uint64_t a2, uint32_t r3)
 }
 }
 
+uint32_t HELPER(testblock)(CPUS390XState *env, uint64_t real_addr)
+{
+CPUState *cs = CPU(s390_env_get_cpu(env));
+uint64_t abs_addr;
+int i;
+
+real_addr = fix_address(env, real_addr);
+abs_addr = mmu_real2abs(env, real_addr) & TARGET_PAGE_MASK;
+if (!address_space_access_valid(_space_memory, abs_addr,
+TARGET_PAGE_SIZE, true)) {
+program_interrupt(env, PGM_ADDRESSING, 4);
+return 1;
+}
+
+/* Check low-address protection */
+if ((env->cregs[0] & CR0_LOWPROT) != 0 && real_addr < 0x2000) {
+program_interrupt(env, PGM_PROTECTION, 4);
+return 1;
+}
+
+for (i = 0; i < TARGET_PAGE_SIZE; i += 8) {
+stq_phys(cs->as, abs_addr + i, 0);
+}
+
+return 0;
+}
+
 uint32_t HELPER(tprot)(uint64_t a1, uint64_t a2)
 {
 /* XXX implement */
diff --git a/target/s390x/mmu_helper.c b/target/s390x/mmu_helper.c
index b11a027..31eb9ef 100644
--- a/target/s390x/mmu_helper.c
+++ b/target/s390x/mmu_helper.c
@@ -108,7 +108,7 @@ static void trigger_page_fault(CPUS390XState *env, 
target_ulong vaddr,
  * Translate real address to absolute (= physical)
  * address by taking care of the prefix mapping.
  */
-static target_ulong mmu_real2abs(CPUS390XState *env, target_ulong raddr)
+target_ulong mmu_real2abs(CPUS390XState *env, target_ulong raddr)
 {
 if (raddr < 0x2000) {
 return raddr + env->psa;/* Map the lowcore. */
diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index 4c48c59..61aa2c9 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -4057,6 +4057,15 @@ static ExitStatus op_tcxb(DisasContext *s, DisasOps *o)
 }
 
 #ifndef CONFIG_USER_ONLY
+
+static ExitStatus op_testblock(DisasContext *s, DisasOps *o)
+{
+check_privileged(s);
+gen_helper_testblock(cc_op, cpu_env, o->in2);
+set_cc_static(s);
+return NO_EXIT;
+}
+
 static ExitStatus op_tprot(DisasContext *s, DisasOps *o)
 {
 potential_page_fault(s);
@@ -4064,6 +4073,7 @@ static ExitStatus op_tprot(DisasContext *s, DisasOps *o)
 set_cc_static(s);
 return

[Qemu-devel] [RFC PATCH V2 1/2] xen-pt: bind/unbind interrupt remapping format MSI

2017-05-18 Thread Lan Tianyu

From: Chao Gao 

If a vIOMMU is exposed to guest, guest will configure the msi to remapping
format. The original code isn't suitable to the new format. A new pair
bind/unbind interfaces are added for this usage. This patch recognizes
this case and use new interfaces to bind/unbind msi.

Signed-off-by: Chao Gao 
Signed-off-by: Lan Tianyu 
---
 hw/xen/xen_pt_msi.c   | 50 ---
 include/hw/i386/apic-msidef.h |  3 ++-
 2 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/hw/xen/xen_pt_msi.c b/hw/xen/xen_pt_msi.c
index 62add06..5fab95e 100644
--- a/hw/xen/xen_pt_msi.c
+++ b/hw/xen/xen_pt_msi.c
@@ -163,16 +163,24 @@ static int msi_msix_update(XenPCIPassthroughState *s,
 int rc = 0;
 uint64_t table_addr = 0;
 
-XEN_PT_LOG(d, "Updating MSI%s with pirq %d gvec %#x gflags %#x"
-   " (entry: %#x)\n",
-   is_msix ? "-X" : "", pirq, gvec, gflags, msix_entry);
-
 if (is_msix) {
 table_addr = s->msix->mmio_base_addr;
 }
 
-rc = xc_domain_update_msi_irq(xen_xc, xen_domid, gvec,
-  pirq, gflags, table_addr);
+if (addr & MSI_ADDR_IF_MASK) {
+XEN_PT_LOG(d, "Updating MSI%s with addr %#" PRIx64 "data %#x\n",
+   is_msix ? "-X": "", addr, data);
+rc = xc_domain_update_msi_irq_remapping(xen_xc, xen_domid, pirq,
+ d->devfn, data, addr, table_addr);
+}
+else {
+XEN_PT_LOG(d, "Updating MSI%s with pirq %d gvec %#x gflags %#x"
+   " (entry: %#x)\n",
+   is_msix ? "-X" : "", pirq, gvec, gflags, msix_entry);
+
+rc = xc_domain_update_msi_irq(xen_xc, xen_domid, gvec,
+  pirq, gflags, table_addr);
+}
 
 if (rc) {
 XEN_PT_ERR(d, "Updating of MSI%s failed. (err: %d)\n",
@@ -204,13 +212,29 @@ static int msi_msix_disable(XenPCIPassthroughState *s,
 }
 
 if (is_binded) {
-XEN_PT_LOG(d, "Unbind MSI%s with pirq %d, gvec %#x\n",
-   is_msix ? "-X" : "", pirq, gvec);
-rc = xc_domain_unbind_msi_irq(xen_xc, xen_domid, gvec, pirq, gflags);
-if (rc) {
-XEN_PT_ERR(d, "Unbinding of MSI%s failed. (err: %d, pirq: %d, 
gvec: %#x)\n",
-   is_msix ? "-X" : "", errno, pirq, gvec);
-return rc;
+if ( addr & MSI_ADDR_IF_MASK ) {
+XEN_PT_LOG(d, "Unbinding of MSI%s . ( pirq: %d, data: %x, "
+   "addr: %#" PRIx64 ")\n",
+   is_msix ? "-X" : "", pirq, data, addr);
+rc = xc_domain_unbind_msi_irq_remapping(xen_xc, xen_domid, pirq,
+d->devfn, data, addr);
+if (rc) {
+XEN_PT_ERR(d, "Unbinding of MSI%s . (error: %d, pirq: %d, "
+   "data: %x, addr: %#" PRIx64 ")\n",
+   is_msix ? "-X" : "", rc, pirq, data, addr);
+return rc;
+}
+
+} else {
+XEN_PT_LOG(d, "Unbind MSI%s with pirq %d, gvec %#x\n",
+   is_msix ? "-X" : "", pirq, gvec);
+rc = xc_domain_unbind_msi_irq(xen_xc, xen_domid, gvec, pirq, 
gflags);
+if (rc) {
+XEN_PT_ERR(d, "Unbinding of MSI%s failed. (err: %d, pirq: %d, "
+   "gvec: %#x)\n",
+   is_msix ? "-X" : "", errno, pirq, gvec);
+return rc;
+}
 }
 }
 
diff --git a/include/hw/i386/apic-msidef.h b/include/hw/i386/apic-msidef.h
index 8b4d4cc..2c450f9 100644
--- a/include/hw/i386/apic-msidef.h
+++ b/include/hw/i386/apic-msidef.h
@@ -26,6 +26,7 @@
 
 #define MSI_ADDR_DEST_ID_SHIFT  12
 #define MSI_ADDR_DEST_IDX_SHIFT 4
-#define  MSI_ADDR_DEST_ID_MASK  0x000
+#define  MSI_ADDR_DEST_ID_MASK  0x000fff00
+#define  MSI_ADDR_IF_MASK   0x0010
 
 #endif /* HW_APIC_MSIDEF_H */
-- 
1.8.3.1

[Qemu-devel] [RFC PATCH V2 0/2] Qemu: Add Xen vIOMMU interrupt remapping function.

2017-05-18 Thread Lan Tianyu

Change since V1:
   1) Move create/destroy vIOMMU and query vIOMMU capabilities to tool 
stack.
   2) Fix some code style issue.

This patchset is to deal with MSI interrupt remapping request when guest
updates MSI registers.

Repo:
https://github.com/lantianyu/qemu/tree/xen_viommu_rfc_v2

Chao Gao (2):
  xen-pt: bind/unbind interrupt remapping format MSI
  msi: Handle remappable format interrupt request

 hw/pci/msi.c  |  5 +++--
 hw/pci/msix.c |  4 +++-
 hw/xen/xen_pt_msi.c   | 52 +++
 include/hw/i386/apic-msidef.h |  3 ++-
 include/hw/xen/xen.h  |  2 +-
 xen-hvm-stub.c|  2 +-
 xen-hvm.c |  7 +-
 7 files changed, 54 insertions(+), 21 deletions(-)

-- 
1.8.3.1

[Qemu-devel] [RFC PATCH V2 2/2] msi: Handle remappable format interrupt request

2017-05-18 Thread Lan Tianyu

From: Chao Gao 

According to VT-d spec Interrupt Remapping and Interrupt Posting ->
Interrupt Remapping -> Interrupt Request Formats On Intel 64
Platforms, fields of MSI data register have changed. This patch
avoids wrongly regarding a remappable format interrupt request as
an interrupt binded with an event channel.

Signed-off-by: Chao Gao 
Signed-off-by: Lan Tianyu 
---
 hw/pci/msi.c | 5 +++--
 hw/pci/msix.c| 4 +++-
 hw/xen/xen_pt_msi.c  | 2 +-
 include/hw/xen/xen.h | 2 +-
 xen-hvm-stub.c   | 2 +-
 xen-hvm.c| 7 ++-
 6 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/hw/pci/msi.c b/hw/pci/msi.c
index a87b227..199cb47 100644
--- a/hw/pci/msi.c
+++ b/hw/pci/msi.c
@@ -289,7 +289,7 @@ void msi_reset(PCIDevice *dev)
 static bool msi_is_masked(const PCIDevice *dev, unsigned int vector)
 {
 uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
-uint32_t mask, data;
+uint32_t mask, data, addr_lo;
 bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
 assert(vector < PCI_MSI_VECTORS_MAX);
 
@@ -298,7 +298,8 @@ static bool msi_is_masked(const PCIDevice *dev, unsigned 
int vector)
 }
 
 data = pci_get_word(dev->config + msi_data_off(dev, msi64bit));
-if (xen_is_pirq_msi(data)) {
+addr_lo = pci_get_long(dev->config + msi_address_lo_off(dev));
+if (xen_is_pirq_msi(data, addr_lo)) {
 return false;
 }
 
diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index bb54e8b..efe2982 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -82,9 +82,11 @@ static bool msix_vector_masked(PCIDevice *dev, unsigned int 
vector, bool fmask)
 {
 unsigned offset = vector * PCI_MSIX_ENTRY_SIZE;
 uint8_t *data = >msix_table[offset + PCI_MSIX_ENTRY_DATA];
+uint8_t *addr_lo = >msix_table[offset + PCI_MSIX_ENTRY_LOWER_ADDR];
 /* MSIs on Xen can be remapped into pirqs. In those cases, masking
  * and unmasking go through the PV evtchn path. */
-if (xen_enabled() && xen_is_pirq_msi(pci_get_long(data))) {
+if (xen_enabled() && xen_is_pirq_msi(pci_get_long(data),
+ pci_get_long(addr_lo))) {
 return false;
 }
 return fmask || dev->msix_table[offset + PCI_MSIX_ENTRY_VECTOR_CTRL] &
diff --git a/hw/xen/xen_pt_msi.c b/hw/xen/xen_pt_msi.c
index 5fab95e..45a9e9f 100644
--- a/hw/xen/xen_pt_msi.c
+++ b/hw/xen/xen_pt_msi.c
@@ -114,7 +114,7 @@ static int msi_msix_setup(XenPCIPassthroughState *s,
 
 assert((!is_msix && msix_entry == 0) || is_msix);
 
-if (xen_is_pirq_msi(data)) {
+if (xen_is_pirq_msi(data, addr)) {
 *ppirq = msi_ext_dest_id(addr >> 32) | msi_dest_id(addr);
 if (!*ppirq) {
 /* this probably identifies an misconfiguration of the guest,
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index 09c2ce5..af759bc 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -33,7 +33,7 @@ int xen_pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num);
 void xen_piix3_set_irq(void *opaque, int irq_num, int level);
 void xen_piix_pci_write_config_client(uint32_t address, uint32_t val, int len);
 void xen_hvm_inject_msi(uint64_t addr, uint32_t data);
-int xen_is_pirq_msi(uint32_t msi_data);
+int xen_is_pirq_msi(uint32_t msi_data, uint32_t msi_addr_lo);
 
 qemu_irq *xen_interrupt_controller_init(void);
 
diff --git a/xen-hvm-stub.c b/xen-hvm-stub.c
index c500325..dae421c 100644
--- a/xen-hvm-stub.c
+++ b/xen-hvm-stub.c
@@ -31,7 +31,7 @@ void xen_hvm_inject_msi(uint64_t addr, uint32_t data)
 {
 }
 
-int xen_is_pirq_msi(uint32_t msi_data)
+int xen_is_pirq_msi(uint32_t msi_data, uint32_t msi_addr_lo)
 {
 return 0;
 }
diff --git a/xen-hvm.c b/xen-hvm.c
index 5043beb..db29121 100644
--- a/xen-hvm.c
+++ b/xen-hvm.c
@@ -146,8 +146,13 @@ void xen_piix_pci_write_config_client(uint32_t address, 
uint32_t val, int len)
 }
 }
 
-int xen_is_pirq_msi(uint32_t msi_data)
+int xen_is_pirq_msi(uint32_t msi_data, uint32_t msi_addr_lo)
 {
+/* If msi address is configurate to remapping format, the msi will not
+ * remapped into a pirq.
+ */
+if (msi_addr_lo & MSI_ADDR_IF_MASK)
+return 0;
 /* If vector is 0, the msi is remapped into a pirq, passed as
  * dest_id.
  */
-- 
1.8.3.1

Re: [Qemu-devel] [RFC PATCH 5/8] VFIO: Add new IOTCL for PASID Table bind propagation

2017-05-18 Thread Jean-Philippe Brucker

On 17/05/17 11:27, Liu, Yi L wrote:
> On Fri, May 12, 2017 at 03:58:51PM -0600, Alex Williamson wrote:
>> On Wed, 26 Apr 2017 18:12:02 +0800
>> "Liu, Yi L"  wrote:
>>>  
>>> +/* IOCTL for Shared Virtual Memory Bind */
>>> +struct vfio_device_svm {
>>> +   __u32   argsz;
>>> +#define VFIO_SVM_BIND_PASIDTBL (1 << 0) /* Bind PASID Table */
>>> +#define VFIO_SVM_BIND_PASID(1 << 1) /* Bind PASID from userspace 
>>> driver */
>>> +#define VFIO_SVM_BIND_PGTABLE  (1 << 2) /* Bind guest mmu page table */
>>> +   __u32   flags;
>>> +   __u32   length;
>>> +   __u8data[];
>>
>> In the case of VFIO_SVM_BIND_PASIDTBL this is clearly struct
>> pasid_table_info?  So at a minimum this is a union including struct
>> pasid_table_info.  Furthermore how does a user learn what the opaque
>> data in struct pasid_table_info is without looking at the code?  A user
>> API needs to be clear and documented, not opaque and variable.  We
>> should also have references to the hardware spec for an Intel or ARM
>> PASID table in uapi.  flags should be defined as they're used, let's
>> not reserve them with the expectation of future use.
>>
> 
> Agree. would add description accordingly. For the flags, I would remove
> the last two as I wouldn't use. I think Jean would add them in his/her
> patchset. Anyhow, one of us need to do merge on the flags.

Yes, I can add the VFIO_SVM_BIND_PASID (or rather _TASK) flag as (1 << 1)
in my series if it helps the merge. The PGTABLE flag is for another series
which I don't plan to send out anytime soon, since there already is enough
pending work on this.

Thanks,
Jean

[Qemu-devel] [PATCH 4/5] migration: Remove old MigrationParams

2017-05-18 Thread Juan Quintela

Not used anymore after moving block migration to use capabilities.

Signed-off-by: Juan Quintela 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: zhanghailiang 
Reviewed-by: Peter Xu 
---
 include/migration/migration.h | 10 ++
 include/migration/vmstate.h   |  1 -
 include/qemu/typedefs.h   |  1 -
 include/sysemu/sysemu.h   |  3 +--
 migration/colo.c  |  2 +-
 migration/migration.c |  8 +++-
 migration/savevm.c| 16 +++-
 7 files changed, 10 insertions(+), 31 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 4dedc66..b80a6ed 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -38,10 +38,6 @@
 #define QEMU_VM_COMMAND  0x08
 #define QEMU_VM_SECTION_FOOTER   0x7e
 
-struct MigrationParams {
-bool unused; /* C doesn't allow empty structs */
-};
-
 /* Messages sent on the return path from destination to source */
 enum mig_rp_message_type {
 MIG_RP_MSG_INVALID = 0,  /* Must be 0 */
@@ -109,12 +105,10 @@ struct MigrationState
 QEMUBH *cleanup_bh;
 QEMUFile *to_dst_file;
 
-/* New style params from 'migrate-set-parameters' */
+/* params from 'migrate-set-parameters' */
 MigrationParameters parameters;
 
 int state;
-/* Old style params from 'migrate' command */
-MigrationParams params;
 
 /* State related to return path */
 struct {
@@ -207,7 +201,7 @@ void migrate_fd_connect(MigrationState *s);
 
 void add_migration_state_change_notifier(Notifier *notify);
 void remove_migration_state_change_notifier(Notifier *notify);
-MigrationState *migrate_init(const MigrationParams *params);
+MigrationState *migrate_init(void);
 bool migration_is_blocked(Error **errp);
 bool migration_in_setup(MigrationState *);
 bool migration_is_idle(void);
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 8489659..dacb052 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -37,7 +37,6 @@ typedef int LoadStateHandler(QEMUFile *f, void *opaque, int 
version_id);
 
 typedef struct SaveVMHandlers {
 /* This runs inside the iothread lock.  */
-void (*set_params)(const MigrationParams *params, void * opaque);
 SaveStateHandler *save_state;
 
 void (*cleanup)(void *opaque);
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 7d85057..33a6aa1 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -49,7 +49,6 @@ typedef struct MemoryRegion MemoryRegion;
 typedef struct MemoryRegionCache MemoryRegionCache;
 typedef struct MemoryRegionSection MemoryRegionSection;
 typedef struct MigrationIncomingState MigrationIncomingState;
-typedef struct MigrationParams MigrationParams;
 typedef struct MigrationState MigrationState;
 typedef struct Monitor Monitor;
 typedef struct MonitorDef MonitorDef;
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 83c1ceb..765358e 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -102,8 +102,7 @@ enum qemu_vm_cmd {
 #define MAX_VM_CMD_PACKAGED_SIZE (1ul << 24)
 
 bool qemu_savevm_state_blocked(Error **errp);
-void qemu_savevm_state_begin(QEMUFile *f,
- const MigrationParams *params);
+void qemu_savevm_state_begin(QEMUFile *f);
 void qemu_savevm_state_header(QEMUFile *f);
 int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy);
 void qemu_savevm_state_cleanup(void);
diff --git a/migration/colo.c b/migration/colo.c
index 8c86892..dd38fed 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -348,7 +348,7 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
 /* Disable block migration */
 migrate_set_block_enabled(false, _err);
 qemu_savevm_state_header(fb);
-qemu_savevm_state_begin(fb, >params);
+qemu_savevm_state_begin(fb);
 qemu_mutex_lock_iothread();
 qemu_savevm_state_complete_precopy(fb, false);
 qemu_mutex_unlock_iothread();
diff --git a/migration/migration.c b/migration/migration.c
index b3d300d..ed66158 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1118,7 +1118,7 @@ bool migration_is_idle(void)
 return false;
 }
 
-MigrationState *migrate_init(const MigrationParams *params)
+MigrationState *migrate_init(void)
 {
 MigrationState *s = migrate_get_current();
 
@@ -1132,7 +1132,6 @@ MigrationState *migrate_init(const MigrationParams 
*params)
 s->cleanup_bh = 0;
 s->to_dst_file = NULL;
 s->state = MIGRATION_STATUS_NONE;
-s->params = *params;
 s->rp_state.from_dst_file = NULL;
 s->rp_state.error = false;
 s->mbps = 0.0;
@@ -1221,7 +1220,6 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 {
 Error *local_err = NULL;
 MigrationState *s = migrate_get_current();
-MigrationParams params;
 const char *p;
 
 if

[Qemu-devel] [PATCH 3/5] migration: Remove use of old MigrationParams

2017-05-18 Thread Juan Quintela

We have change in the previous patch to use migration capabilities for
it.  Notice that we continue using the old command line flags from
migrate command from the time being.  Remove the set_params method as
now it is empty.

For savevm, one can't do a:

savevm -b/-i foo

but now one can do:

migrate_set_capability block on
savevm foo

And we can't use block migration. We could disable block capability
unconditionally, but it would not be much better.

Signed-off-by: Juan Quintela 
Reviewed-by: Eric Blake 

---
- Maintain shared/enabled dependency (Xu suggestion)
- Now we maintain the dependency on the setter functions
- improve error messages
---
 include/migration/migration.h |  3 +--
 migration/block.c | 17 ++---
 migration/colo.c  |  4 ++--
 migration/migration.c |  3 ---
 migration/savevm.c|  8 ++--
 5 files changed, 11 insertions(+), 24 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 024a048..4dedc66 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -39,8 +39,7 @@
 #define QEMU_VM_SECTION_FOOTER   0x7e
 
 struct MigrationParams {
-bool blk;
-bool shared;
+bool unused; /* C doesn't allow empty structs */
 };
 
 /* Messages sent on the return path from destination to source */
diff --git a/migration/block.c b/migration/block.c
index 060087f..5d22926 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -94,9 +94,6 @@ typedef struct BlkMigBlock {
 } BlkMigBlock;
 
 typedef struct BlkMigState {
-/* Written during setup phase.  Can be read without a lock.  */
-int blk_enable;
-int shared_base;
 QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list;
 int64_t total_sector_sum;
 bool zero_blocks;
@@ -425,7 +422,7 @@ static int init_blk_migration(QEMUFile *f)
 bmds->bulk_completed = 0;
 bmds->total_sectors = sectors;
 bmds->completed_sectors = 0;
-bmds->shared_base = block_mig_state.shared_base;
+bmds->shared_base = migrate_use_block_incremental();
 
 assert(i < num_bs);
 bmds_bs[i].bmds = bmds;
@@ -994,22 +991,12 @@ static int block_load(QEMUFile *f, void *opaque, int 
version_id)
 return 0;
 }
 
-static void block_set_params(const MigrationParams *params, void *opaque)
-{
-block_mig_state.blk_enable = params->blk;
-block_mig_state.shared_base = params->shared;
-
-/* shared base means that blk_enable = 1 */
-block_mig_state.blk_enable |= params->shared;
-}
-
 static bool block_is_active(void *opaque)
 {
-return block_mig_state.blk_enable == 1;
+return migrate_use_block();
 }
 
 static SaveVMHandlers savevm_block_handlers = {
-.set_params = block_set_params,
 .save_live_setup = block_save_setup,
 .save_live_iterate = block_save_iterate,
 .save_live_complete_precopy = block_save_complete,
diff --git a/migration/colo.c b/migration/colo.c
index 963c802..8c86892 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -14,6 +14,7 @@
 #include "qemu/timer.h"
 #include "sysemu/sysemu.h"
 #include "migration/colo.h"
+#include "migration/block.h"
 #include "io/channel-buffer.h"
 #include "trace.h"
 #include "qemu/error-report.h"
@@ -345,8 +346,7 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
 }
 
 /* Disable block migration */
-s->params.blk = 0;
-s->params.shared = 0;
+migrate_set_block_enabled(false, _err);
 qemu_savevm_state_header(fb);
 qemu_savevm_state_begin(fb, >params);
 qemu_mutex_lock_iothread();
diff --git a/migration/migration.c b/migration/migration.c
index c13c0a2..b3d300d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1224,9 +1224,6 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 MigrationParams params;
 const char *p;
 
-params.blk = has_blk && blk;
-params.shared = has_inc && inc;
-
 if (migration_is_setup_or_active(s->state) ||
 s->state == MIGRATION_STATUS_CANCELLING ||
 s->state == MIGRATION_STATUS_COLO) {
diff --git a/migration/savevm.c b/migration/savevm.c
index f5e8194..2f1f4eb 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1233,8 +1233,6 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
 {
 int ret;
 MigrationParams params = {
-.blk = 0,
-.shared = 0
 };
 MigrationState *ms = migrate_init();
 MigrationStatus status;
@@ -1245,6 +1243,12 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
 goto done;
 }
 
+if (migrate_use_block()) {
+error_setg(errp, "Block migration and snapshots are incompatible");
+ret = -EINVAL;
+goto done;
+}
+
 qemu_mutex_unlock_iothread();
 qemu_savevm_state_header(f);
 qemu_savevm_state_begin(f, );
-- 
2.9.3

[Qemu-devel] [PATCH 2/5] migration: Create block capability

2017-05-18 Thread Juan Quintela

Create one capability for block migration and one parameter for
incremental block migration.

Signed-off-by: Juan Quintela 

---

- address all Markus comments
- use Markus and Eric text descriptions
- change logic another time
- improve text messages
---
 hmp.c | 13 
 include/migration/block.h |  2 ++
 include/migration/migration.h |  6 
 migration/migration.c | 71 +++
 qapi-schema.json  | 28 +++--
 5 files changed, 117 insertions(+), 3 deletions(-)

diff --git a/hmp.c b/hmp.c
index acbbc5c..208154c 100644
--- a/hmp.c
+++ b/hmp.c
@@ -326,6 +326,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict 
*qdict)
 monitor_printf(mon, "%s: %" PRId64 "\n",
 MigrationParameter_lookup[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY],
 params->x_checkpoint_delay);
+assert(params->has_block_incremental);
+monitor_printf(mon, "%s: %s\n",
+MigrationParameter_lookup[MIGRATION_PARAMETER_BLOCK_INCREMENTAL],
+   params->block_incremental ? "on" : "off");
 }
 
 qapi_free_MigrationParameters(params);
@@ -1527,6 +1531,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict)
 Visitor *v = string_input_visitor_new(valuestr);
 uint64_t valuebw = 0;
 int64_t valueint = 0;
+bool valuebool = false;
 Error *err = NULL;
 bool use_int_value = false;
 int i, ret;
@@ -1581,6 +1586,14 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict)
 p.has_x_checkpoint_delay = true;
 use_int_value = true;
 break;
+case MIGRATION_PARAMETER_BLOCK_INCREMENTAL:
+p.has_block_incremental = true;
+visit_type_bool(v, param, , );
+if (err) {
+goto cleanup;
+}
+p.block_incremental = valuebool;
+break;
 }
 
 if (use_int_value) {
diff --git a/include/migration/block.h b/include/migration/block.h
index 41a1ac8..5225af9 100644
--- a/include/migration/block.h
+++ b/include/migration/block.h
@@ -20,4 +20,6 @@ uint64_t blk_mig_bytes_transferred(void);
 uint64_t blk_mig_bytes_remaining(void);
 uint64_t blk_mig_bytes_total(void);
 
+void migrate_set_block_enabled(bool value, Error **errp);
+
 #endif /* MIGRATION_BLOCK_H */
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 49ec501..024a048 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -153,6 +153,9 @@ struct MigrationState
 
 /* The last error that occurred */
 Error *error;
+/* Do we have to clean up -b/-i from old migrate parameters */
+/* This feature is deprecated and will be removed */
+bool must_remove_block_options;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
@@ -265,6 +268,9 @@ bool migrate_colo_enabled(void);
 
 int64_t xbzrle_cache_resize(int64_t new_size);
 
+bool migrate_use_block(void);
+bool migrate_use_block_incremental(void);
+
 bool migrate_use_compression(void);
 int migrate_compress_level(void);
 int migrate_compress_threads(void);
diff --git a/migration/migration.c b/migration/migration.c
index 0304c01..c13c0a2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -592,6 +592,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error 
**errp)
 params->downtime_limit = s->parameters.downtime_limit;
 params->has_x_checkpoint_delay = true;
 params->x_checkpoint_delay = s->parameters.x_checkpoint_delay;
+params->has_block_incremental = true;
+params->block_incremental = s->parameters.block_incremental;
 
 return params;
 }
@@ -900,6 +902,9 @@ void qmp_migrate_set_parameters(MigrationParameters 
*params, Error **errp)
 colo_checkpoint_notify(s);
 }
 }
+if (params->has_block_incremental) {
+s->parameters.block_incremental = params->block_incremental;
+}
 }
 
 
@@ -935,6 +940,33 @@ void migrate_set_state(int *state, int old_state, int 
new_state)
 }
 }
 
+void migrate_set_block_enabled(bool value, Error **errp)
+{
+MigrationCapabilityStatusList *cap;
+
+cap = g_new0(MigrationCapabilityStatusList, 1);
+cap->value = g_new0(MigrationCapabilityStatus, 1);
+cap->value->capability = MIGRATION_CAPABILITY_BLOCK;
+cap->value->state = value;
+qmp_migrate_set_capabilities(cap, errp);
+qapi_free_MigrationCapabilityStatusList(cap);
+}
+
+static void migrate_set_block_incremental(MigrationState *s, bool value)
+{
+s->parameters.block_incremental = value;
+}
+
+static void block_cleanup_parameters(MigrationState *s)
+{
+if (s->must_remove_block_options) {
+/* setting to false can never fail */
+migrate_set_block_enabled(false, _abort);
+migrate_set_block_incremental(s, false);
+

[Qemu-devel] [PATCH 0/5] Remove old MigrationParams

2017-05-18 Thread Juan Quintela

Hi

Changes from v4:
- make suggested logic change by eric
- remove extra space in error message
- allow migrate_set_capability block off when block migration is complied off
  (eric request)

Please review

[v4]

- Address Markus review
  * better documentation messages (thanks)
  * reorganize error checking to have small data
  * Use g_new0
  * Improve commints messages

[v4]

- patch 1 included again. We have a 'block' capability and a
  'block_incremental' parameter
  Make Dave happy: rename shared into incremental
  Make Eric and Markus happy:

block: off -> no block migration, we don't care about
  block_incremental value
block: on block_incremental: off -> no incremental block migration
block: on block_incremental: on  -> incremental block migration

- Use visitors.
  You ask if there is not a better way than hand coding a boolean parser.
  QAPI people (a.k.a. Markus) answer to use a visitor.
  Once there, I used it also for integers, not only booleans.

- Use -b/-i parameters.  OK, I bit the bullet: You can't use
  capabilities/parameters and -b/-i: it gives you one error If you use
  -b/-i, capability parameter is cleanup after the migration ends (Who
  would have guessed that there are cases where we call
  migration_fd_error(), but not migration_fd_cleanup()
  I think this should make Peter and Markus and Dave happy.

- Integrate migrate block compile time disabling
  Well, it conflicted left and right, so I fixed it
  Once there, I disable setting/getting block capability if it is disabled.

[v3]

- Patch 1 included in pull request
- disable block_shared when we disable block_enable
- enable block_enabled when we enable block_shared

[v2]

- make migrate_block_set_* take a boolean
- disable block migration in colo to maintain semantics.

[v1]
Upon a time there were MigrationParms (only used for block migration)
and then MigrationParams used for everything else.  This series:

- create migration capabilities for block parameters
- make the migrate command line parameters to use capabilities
- remove MigrationParams completely


Dr. David Alan Gilbert (1):
  block migration: Allow compile time disable

Juan Quintela (4):
  hmp: Use visitor api for hmp_migrate_set_parameter()
  migration: Create block capability
  migration: Remove use of old MigrationParams
  migration: Remove old MigrationParams

 configure | 11 +
 hmp.c | 23 +--
 include/migration/block.h | 24 +++
 include/migration/migration.h | 17 
 include/migration/vmstate.h   |  1 -
 include/qemu/typedefs.h   |  1 -
 include/sysemu/sysemu.h   |  3 +-
 migration/Makefile.objs   |  2 +-
 migration/block.c | 17 +---
 migration/colo.c  |  6 +--
 migration/migration.c | 96 +++
 migration/savevm.c| 24 ---
 qapi-schema.json  | 28 +++--
 13 files changed, 191 insertions(+), 62 deletions(-)

-- 
2.9.3

[Qemu-devel] [PATCH 5/5] block migration: Allow compile time disable

2017-05-18 Thread Juan Quintela

From: "Dr. David Alan Gilbert" 

Many users now prefer to use drive_mirror over NBD as an
alternative to the older migrate -b option; drive_mirror is
more complex to setup but gives you more options (e.g. only
migrating some of the disks if some of them are shared).

Allow the large chunk of block migration code to be compiled
out for those who don't use it.

Based on a downstream-patch we've had for a while by Jeff Cody.

Signed-off-by: Dr. David Alan Gilbert 

--

- When compiled out, allow seting block only with false value (eric)
---
 configure | 11 +++
 include/migration/block.h | 24 +++-
 migration/Makefile.objs   |  2 +-
 migration/migration.c | 14 ++
 4 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 57b5ae6..ace1a52 100755
--- a/configure
+++ b/configure
@@ -316,6 +316,7 @@ vte=""
 virglrenderer=""
 tpm="yes"
 libssh2=""
+live_block_migration="yes"
 numa=""
 tcmalloc="no"
 jemalloc="no"
@@ -1169,6 +1170,10 @@ for opt do
   ;;
   --enable-libssh2) libssh2="yes"
   ;;
+  --disable-live-block-migration) live_block_migration="no"
+  ;;
+  --enable-live-block-migration) live_block_migration="yes"
+  ;;
   --disable-numa) numa="no"
   ;;
   --enable-numa) numa="yes"
@@ -1401,6 +1406,7 @@ disabled with --disable-FEATURE, default is enabled if 
available:
   libnfs  nfs support
   smartcard   smartcard support (libcacard)
   libusb  libusb (for usb passthrough)
+  live-block-migration   Block migration in the main migration stream
   usb-redir   usb network redirection support
   lzo support of lzo compression library
   snappy  support of snappy compression library
@@ -5216,6 +5222,7 @@ echo "TPM support   $tpm"
 echo "libssh2 support   $libssh2"
 echo "TPM passthrough   $tpm_passthrough"
 echo "QOM debugging $qom_cast_debug"
+echo "Live block migration $live_block_migration"
 echo "lzo support   $lzo"
 echo "snappy support$snappy"
 echo "bzip2 support $bzip2"
@@ -5782,6 +5789,10 @@ if test "$libssh2" = "yes" ; then
   echo "LIBSSH2_LIBS=$libssh2_libs" >> $config_host_mak
 fi
 
+if test "$live_block_migration" = "yes" ; then
+  echo "CONFIG_LIVE_BLOCK_MIGRATION=y" >> $config_host_mak
+fi
+
 # USB host support
 if test "$libusb" = "yes"; then
   echo "HOST_USB=libusb legacy" >> $config_host_mak
diff --git a/include/migration/block.h b/include/migration/block.h
index 5225af9..28cff53 100644
--- a/include/migration/block.h
+++ b/include/migration/block.h
@@ -14,12 +14,34 @@
 #ifndef MIGRATION_BLOCK_H
 #define MIGRATION_BLOCK_H
 
+#ifdef CONFIG_LIVE_BLOCK_MIGRATION
 void blk_mig_init(void);
 int blk_mig_active(void);
 uint64_t blk_mig_bytes_transferred(void);
 uint64_t blk_mig_bytes_remaining(void);
 uint64_t blk_mig_bytes_total(void);
 
+#else
+static inline void blk_mig_init(void) { }
+static inline int blk_mig_active(void)
+{
+return false;
+}
+static inline uint64_t blk_mig_bytes_transferred(void)
+{
+return 0;
+}
+
+static inline uint64_t blk_mig_bytes_remaining(void)
+{
+return 0;
+}
+
+static inline uint64_t blk_mig_bytes_total(void)
+{
+return 0;
+}
+#endif /* CONFIG_LIVE_BLOCK_MIGRATION */
+
 void migrate_set_block_enabled(bool value, Error **errp);
-
 #endif /* MIGRATION_BLOCK_H */
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index c1920b6..00a3f4a 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -9,5 +9,5 @@ common-obj-y += qjson.o
 
 common-obj-$(CONFIG_RDMA) += rdma.o
 
-common-obj-y += block.o
+common-obj-$(CONFIG_LIVE_BLOCK_MIGRATION) += block.o
 
diff --git a/migration/migration.c b/migration/migration.c
index ed66158..2f8 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -547,6 +547,11 @@ MigrationCapabilityStatusList 
*qmp_query_migrate_capabilities(Error **errp)
 
 caps = NULL; /* silence compiler warning */
 for (i = 0; i < MIGRATION_CAPABILITY__MAX; i++) {
+#ifndef CONFIG_LIVE_BLOCK_MIGRATION
+if (i == MIGRATION_CAPABILITY_BLOCK) {
+continue;
+}
+#endif
 if (i == MIGRATION_CAPABILITY_X_COLO && !colo_supported()) {
 continue;
 }
@@ -763,6 +768,15 @@ void 
qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 }
 
 for (cap = params; cap; cap = cap->next) {
+#ifndef CONFIG_LIVE_BLOCK_MIGRATION
+if (cap->value->capability == MIGRATION_CAPABILITY_BLOCK
+&& cap->value->state) {
+error_setg(errp, "QEMU compiled without old-style (blk/-b, inc/-i) 
"
+   "block migration");
+error_append_hint(errp, "Use drive_mirror+NBD instead.\n");
+continue;
+}
+#endif
 if (cap->value->capability == MIGRATION_CAPABILITY_X_COLO) {
 if (!colo_supported()) {
 error_setg(errp, "COLO is not currently supported, please"
-- 
2.9.3

[Qemu-devel] [PATCH 2/4] move cputlb.c

2017-05-18 Thread Yang Zhong

move cputlb.c to accel/tcg/

Signed-off-by: Yang Zhong 
---
 Makefile.target |2 +-
 accel/tcg/Makefile.objs |1 +
 accel/tcg/cputlb.c  | 1051 +++
 cputlb.c| 1051 ---
 4 files changed, 1053 insertions(+), 1052 deletions(-)
 create mode 100644 accel/tcg/cputlb.c
 delete mode 100644 cputlb.c

diff --git a/Makefile.target b/Makefile.target
index ba893a8..3e19fe9 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -144,7 +144,7 @@ obj-y += qtest.o bootdevice.o
 obj-y += hw/
 obj-$(CONFIG_KVM) += kvm-all.o
 obj-y += accel/
-obj-y += memory.o cputlb.o
+obj-y += memory.o
 obj-y += memory_mapping.o
 obj-y += dump.o
 obj-y += migration/ram.o migration/savevm.o
diff --git a/accel/tcg/Makefile.objs b/accel/tcg/Makefile.objs
index 6e3211a..487570f 100644
--- a/accel/tcg/Makefile.objs
+++ b/accel/tcg/Makefile.objs
@@ -1 +1,2 @@
 obj-$(CONFIG_SOFTMMU) += tcg-all.o
+obj-$(CONFIG_SOFTMMU) += cputlb.o
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
new file mode 100644
index 000..743776a
--- /dev/null
+++ b/accel/tcg/cputlb.c
@@ -0,0 +1,1051 @@
+/*
+ *  Common CPU TLB handling
+ *
+ *  Copyright (c) 2003 Fabrice Bellard
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/main-loop.h"
+#include "cpu.h"
+#include "exec/exec-all.h"
+#include "exec/memory.h"
+#include "exec/address-spaces.h"
+#include "exec/cpu_ldst.h"
+#include "exec/cputlb.h"
+#include "exec/memory-internal.h"
+#include "exec/ram_addr.h"
+#include "tcg/tcg.h"
+#include "qemu/error-report.h"
+#include "exec/log.h"
+#include "exec/helper-proto.h"
+#include "qemu/atomic.h"
+
+/* DEBUG defines, enable DEBUG_TLB_LOG to log to the CPU_LOG_MMU target */
+/* #define DEBUG_TLB */
+/* #define DEBUG_TLB_LOG */
+
+#ifdef DEBUG_TLB
+# define DEBUG_TLB_GATE 1
+# ifdef DEBUG_TLB_LOG
+#  define DEBUG_TLB_LOG_GATE 1
+# else
+#  define DEBUG_TLB_LOG_GATE 0
+# endif
+#else
+# define DEBUG_TLB_GATE 0
+# define DEBUG_TLB_LOG_GATE 0
+#endif
+
+#define tlb_debug(fmt, ...) do { \
+if (DEBUG_TLB_LOG_GATE) { \
+qemu_log_mask(CPU_LOG_MMU, "%s: " fmt, __func__, \
+  ## __VA_ARGS__); \
+} else if (DEBUG_TLB_GATE) { \
+fprintf(stderr, "%s: " fmt, __func__, ## __VA_ARGS__); \
+} \
+} while (0)
+
+#define assert_cpu_is_self(this_cpu) do { \
+if (DEBUG_TLB_GATE) { \
+g_assert(!cpu->created || qemu_cpu_is_self(cpu)); \
+} \
+} while (0)
+
+/* run_on_cpu_data.target_ptr should always be big enough for a
+ * target_ulong even on 32 bit builds */
+QEMU_BUILD_BUG_ON(sizeof(target_ulong) > sizeof(run_on_cpu_data));
+
+/* We currently can't handle more than 16 bits in the MMUIDX bitmask.
+ */
+QEMU_BUILD_BUG_ON(NB_MMU_MODES > 16);
+#define ALL_MMUIDX_BITS ((1 << NB_MMU_MODES) - 1)
+
+/* flush_all_helper: run fn across all cpus
+ *
+ * If the wait flag is set then the src cpu's helper will be queued as
+ * "safe" work and the loop exited creating a synchronisation point
+ * where all queued work will be finished before execution starts
+ * again.
+ */
+static void flush_all_helper(CPUState *src, run_on_cpu_func fn,
+ run_on_cpu_data d)
+{
+CPUState *cpu;
+
+CPU_FOREACH(cpu) {
+if (cpu != src) {
+async_run_on_cpu(cpu, fn, d);
+}
+}
+}
+
+/* statistics */
+int tlb_flush_count;
+
+/* This is OK because CPU architectures generally permit an
+ * implementation to drop entries from the TLB at any time, so
+ * flushing more entries than required is only an efficiency issue,
+ * not a correctness issue.
+ */
+static void tlb_flush_nocheck(CPUState *cpu)
+{
+CPUArchState *env = cpu->env_ptr;
+
+/* The QOM tests will trigger tlb_flushes without setting up TCG
+ * so we bug out here in that case.
+ */
+if (!tcg_enabled()) {
+return;
+}
+
+assert_cpu_is_self(cpu);
+tlb_debug("(count: %d)\n", tlb_flush_count++);
+
+tb_lock();
+
+memset(env->tlb_table, -1, sizeof(env->tlb_table));
+memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
+memset(cpu->tb_jmp_cache, 0,

[Qemu-devel] [PATCH 1/5] hmp: Use visitor api for hmp_migrate_set_parameter()

2017-05-18 Thread Juan Quintela

We only use it for int64 at this point, I am not able to find a way to
parse an int with MiB units.

Signed-off-by: Juan Quintela 
Reviewed-by: Markus Armbruster 
---
 hmp.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/hmp.c b/hmp.c
index b9dd933..acbbc5c 100644
--- a/hmp.c
+++ b/hmp.c
@@ -29,6 +29,7 @@
 #include "monitor/qdev.h"
 #include "qapi/opts-visitor.h"
 #include "qapi/qmp/qerror.h"
+#include "qapi/string-input-visitor.h"
 #include "qapi/string-output-visitor.h"
 #include "qapi/util.h"
 #include "qapi-visit.h"
@@ -1523,8 +1524,9 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict)
 {
 const char *param = qdict_get_str(qdict, "parameter");
 const char *valuestr = qdict_get_str(qdict, "value");
+Visitor *v = string_input_visitor_new(valuestr);
 uint64_t valuebw = 0;
-long valueint = 0;
+int64_t valueint = 0;
 Error *err = NULL;
 bool use_int_value = false;
 int i, ret;
@@ -1582,9 +1584,8 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict)
 }
 
 if (use_int_value) {
-if (qemu_strtol(valuestr, NULL, 10, ) < 0) {
-error_setg(, "Unable to parse '%s' as an int",
-   valuestr);
+visit_type_int(v, param, , );
+if (err) {
 goto cleanup;
 }
 /* Set all integers; only one has_FOO will be set, and
@@ -1608,6 +1609,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict)
 }
 
  cleanup:
+visit_free(v);
 if (err) {
 error_report_err(err);
 }
-- 
2.9.3

[Qemu-devel] [PATCH 3/4] move cpu-exec.c

2017-05-18 Thread Yang Zhong

move cpu-exec.c to ./accel/tcg/

Signed-off-by: Yang Zhong 
---
 Makefile.objs   |   1 +
 Makefile.target |   4 +-
 accel/tcg/Makefile.objs |   1 +
 accel/tcg/cpu-exec.c| 686 
 accel/tcg/trace-events  |   7 +
 cpu-exec.c  | 685 ---
 trace-events|   5 -
 7 files changed, 697 insertions(+), 692 deletions(-)
 create mode 100644 accel/tcg/cpu-exec.c
 create mode 100644 accel/tcg/trace-events
 delete mode 100644 cpu-exec.c

diff --git a/Makefile.objs b/Makefile.objs
index 2a8de77..6a33874 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -163,6 +163,7 @@ trace-events-subdirs += target/ppc
 trace-events-subdirs += qom
 trace-events-subdirs += linux-user
 trace-events-subdirs += qapi
+trace-events-subdirs += accel/tcg
 
 trace-events-files = $(SRC_PATH)/trace-events 
$(trace-events-subdirs:%=$(SRC_PATH)/%/trace-events)
 
diff --git a/Makefile.target b/Makefile.target
index 3e19fe9..709d07a 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -88,7 +88,8 @@ all: $(PROGS) stap
 
 #
 # cpu emulator library
-obj-y = exec.o translate-all.o cpu-exec.o
+obj-y = exec.o translate-all.o
+obj-y += accel/
 obj-y += translate-common.o
 obj-y += cpu-exec-common.o
 obj-y += tcg/tcg.o tcg/tcg-op.o tcg/optimize.o
@@ -143,7 +144,6 @@ obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o 
ioport.o numa.o
 obj-y += qtest.o bootdevice.o
 obj-y += hw/
 obj-$(CONFIG_KVM) += kvm-all.o
-obj-y += accel/
 obj-y += memory.o
 obj-y += memory_mapping.o
 obj-y += dump.o
diff --git a/accel/tcg/Makefile.objs b/accel/tcg/Makefile.objs
index 487570f..6b75a31 100644
--- a/accel/tcg/Makefile.objs
+++ b/accel/tcg/Makefile.objs
@@ -1,2 +1,3 @@
 obj-$(CONFIG_SOFTMMU) += tcg-all.o
 obj-$(CONFIG_SOFTMMU) += cputlb.o
+obj-y += cpu-exec.o
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
new file mode 100644
index 000..2019160
--- /dev/null
+++ b/accel/tcg/cpu-exec.c
@@ -0,0 +1,686 @@
+/*
+ *  emulator main execution loop
+ *
+ *  Copyright (c) 2003-2005 Fabrice Bellard
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "trace.h"
+#include "disas/disas.h"
+#include "exec/exec-all.h"
+#include "tcg.h"
+#include "qemu/atomic.h"
+#include "sysemu/qtest.h"
+#include "qemu/timer.h"
+#include "exec/address-spaces.h"
+#include "qemu/rcu.h"
+#include "exec/tb-hash.h"
+#include "exec/log.h"
+#include "qemu/main-loop.h"
+#if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
+#include "hw/i386/apic.h"
+#endif
+#include "sysemu/cpus.h"
+#include "sysemu/replay.h"
+
+/* -icount align implementation. */
+
+typedef struct SyncClocks {
+int64_t diff_clk;
+int64_t last_cpu_icount;
+int64_t realtime_clock;
+} SyncClocks;
+
+#if !defined(CONFIG_USER_ONLY)
+/* Allow the guest to have a max 3ms advance.
+ * The difference between the 2 clocks could therefore
+ * oscillate around 0.
+ */
+#define VM_CLOCK_ADVANCE 300
+#define THRESHOLD_REDUCE 1.5
+#define MAX_DELAY_PRINT_RATE 20LL
+#define MAX_NB_PRINTS 100
+
+static void align_clocks(SyncClocks *sc, const CPUState *cpu)
+{
+int64_t cpu_icount;
+
+if (!icount_align_option) {
+return;
+}
+
+cpu_icount = cpu->icount_extra + cpu->icount_decr.u16.low;
+sc->diff_clk += cpu_icount_to_ns(sc->last_cpu_icount - cpu_icount);
+sc->last_cpu_icount = cpu_icount;
+
+if (sc->diff_clk > VM_CLOCK_ADVANCE) {
+#ifndef _WIN32
+struct timespec sleep_delay, rem_delay;
+sleep_delay.tv_sec = sc->diff_clk / 10LL;
+sleep_delay.tv_nsec = sc->diff_clk % 10LL;
+if (nanosleep(_delay, _delay) < 0) {
+sc->diff_clk = rem_delay.tv_sec * 10LL + rem_delay.tv_nsec;
+} else {
+sc->diff_clk = 0;
+}
+#else
+Sleep(sc->diff_clk / SCALE_MS);
+sc->diff_clk = 0;
+#endif
+}
+}
+
+static void print_delay(const SyncClocks *sc)
+{
+static float threshold_delay;
+static int64_t last_realtime_clock;
+static int nb_prints;
+
+if (icount_align_option &&
+sc->realtime_clock - last_realtime_clock >= MAX_DELAY_PRINT_RATE &&
+nb_prints < MAX_NB_PRINTS) {
+if

[Qemu-devel] [PATCH 4/4] move cpu-exec-common.c

2017-05-18 Thread Yang Zhong

move cpu-exec-common.c to accel/tcg

Signed-off-by: Yang Zhong 
---
 Makefile.target |  1 -
 accel/tcg/Makefile.objs |  2 +-
 accel/tcg/cpu-exec-common.c | 82 +
 cpu-exec-common.c   | 82 -
 4 files changed, 83 insertions(+), 84 deletions(-)
 create mode 100644 accel/tcg/cpu-exec-common.c
 delete mode 100644 cpu-exec-common.c

diff --git a/Makefile.target b/Makefile.target
index 709d07a..b083a76 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -91,7 +91,6 @@ all: $(PROGS) stap
 obj-y = exec.o translate-all.o
 obj-y += accel/
 obj-y += translate-common.o
-obj-y += cpu-exec-common.o
 obj-y += tcg/tcg.o tcg/tcg-op.o tcg/optimize.o
 obj-$(CONFIG_TCG_INTERPRETER) += tci.o
 obj-y += tcg/tcg-common.o
diff --git a/accel/tcg/Makefile.objs b/accel/tcg/Makefile.objs
index 6b75a31..940379b 100644
--- a/accel/tcg/Makefile.objs
+++ b/accel/tcg/Makefile.objs
@@ -1,3 +1,3 @@
 obj-$(CONFIG_SOFTMMU) += tcg-all.o
 obj-$(CONFIG_SOFTMMU) += cputlb.o
-obj-y += cpu-exec.o
+obj-y += cpu-exec.o cpu-exec-common.o
diff --git a/accel/tcg/cpu-exec-common.c b/accel/tcg/cpu-exec-common.c
new file mode 100644
index 000..e81da27
--- /dev/null
+++ b/accel/tcg/cpu-exec-common.c
@@ -0,0 +1,82 @@
+/*
+ *  emulator main execution loop
+ *
+ *  Copyright (c) 2003-2005 Fabrice Bellard
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "sysemu/cpus.h"
+#include "exec/exec-all.h"
+#include "exec/memory-internal.h"
+
+/* exit the current TB, but without causing any exception to be raised */
+void cpu_loop_exit_noexc(CPUState *cpu)
+{
+/* XXX: restore cpu registers saved in host registers */
+
+cpu->exception_index = -1;
+siglongjmp(cpu->jmp_env, 1);
+}
+
+#if defined(CONFIG_SOFTMMU)
+void cpu_reloading_memory_map(void)
+{
+if (qemu_in_vcpu_thread() && current_cpu->running) {
+/* The guest can in theory prolong the RCU critical section as long
+ * as it feels like. The major problem with this is that because it
+ * can do multiple reconfigurations of the memory map within the
+ * critical section, we could potentially accumulate an unbounded
+ * collection of memory data structures awaiting reclamation.
+ *
+ * Because the only thing we're currently protecting with RCU is the
+ * memory data structures, it's sufficient to break the critical 
section
+ * in this callback, which we know will get called every time the
+ * memory map is rearranged.
+ *
+ * (If we add anything else in the system that uses RCU to protect
+ * its data structures, we will need to implement some other mechanism
+ * to force TCG CPUs to exit the critical section, at which point this
+ * part of this callback might become unnecessary.)
+ *
+ * This pair matches cpu_exec's rcu_read_lock()/rcu_read_unlock(), 
which
+ * only protects cpu->as->dispatch. Since we know our caller is about
+ * to reload it, it's safe to split the critical section.
+ */
+rcu_read_unlock();
+rcu_read_lock();
+}
+}
+#endif
+
+void cpu_loop_exit(CPUState *cpu)
+{
+siglongjmp(cpu->jmp_env, 1);
+}
+
+void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
+{
+if (pc) {
+cpu_restore_state(cpu, pc);
+}
+siglongjmp(cpu->jmp_env, 1);
+}
+
+void cpu_loop_exit_atomic(CPUState *cpu, uintptr_t pc)
+{
+cpu->exception_index = EXCP_ATOMIC;
+cpu_loop_exit_restore(cpu, pc);
+}
diff --git a/cpu-exec-common.c b/cpu-exec-common.c
deleted file mode 100644
index e81da27..000
--- a/cpu-exec-common.c
+++ /dev/null
@@ -1,82 +0,0 @@
-/*
- *  emulator main execution loop
- *
- *  Copyright (c) 2003-2005 Fabrice Bellard
- *
- * This library is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2 of the License, or (at your option) any later version.
- *
- * This library is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

[Qemu-devel] [PATCH 1/4] split the tcg accelerator from accel.c file

2017-05-18 Thread Yang Zhong

there are two accelerators in qemu, kvm and tcg.  kvm
accelerator is defined in kvm-all.c, but tcg accelerator
is defined in accel.c file. we split tcg accelerator from
accel.c file and create one new accel directory,which will
include kvm and tcg accel files.

Signed-off-by: Yang Zhong 
---
 Makefile.objs   |   1 -
 Makefile.target |   1 +
 accel.c | 155 
 accel/Makefile.objs |   2 +
 accel/accel.c   | 128 +++
 accel/tcg/Makefile.objs |   1 +
 accel/tcg/tcg-all.c |  61 +++
 7 files changed, 193 insertions(+), 156 deletions(-)
 delete mode 100644 accel.c
 create mode 100644 accel/Makefile.objs
 create mode 100644 accel/accel.c
 create mode 100644 accel/tcg/Makefile.objs
 create mode 100644 accel/tcg/tcg-all.c

diff --git a/Makefile.objs b/Makefile.objs
index 6167e7b..2a8de77 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -55,7 +55,6 @@ common-obj-$(CONFIG_SPICE) += spice-qemu-char.o
 
 common-obj-y += audio/
 common-obj-y += hw/
-common-obj-y += accel.o
 
 common-obj-y += replay/
 
diff --git a/Makefile.target b/Makefile.target
index 465a633..ba893a8 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -143,6 +143,7 @@ obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o 
ioport.o numa.o
 obj-y += qtest.o bootdevice.o
 obj-y += hw/
 obj-$(CONFIG_KVM) += kvm-all.o
+obj-y += accel/
 obj-y += memory.o cputlb.o
 obj-y += memory_mapping.o
 obj-y += dump.o
diff --git a/accel.c b/accel.c
deleted file mode 100644
index 664bb88..000
--- a/accel.c
+++ /dev/null
@@ -1,155 +0,0 @@
-/*
- * QEMU System Emulator, accelerator interfaces
- *
- * Copyright (c) 2003-2008 Fabrice Bellard
- * Copyright (c) 2014 Red Hat Inc.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to 
deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- */
-
-#include "qemu/osdep.h"
-#include "sysemu/accel.h"
-#include "hw/boards.h"
-#include "qemu-common.h"
-#include "sysemu/arch_init.h"
-#include "sysemu/sysemu.h"
-#include "sysemu/kvm.h"
-#include "sysemu/qtest.h"
-#include "hw/xen/xen.h"
-#include "qom/object.h"
-
-int tcg_tb_size;
-static bool tcg_allowed = true;
-
-static int tcg_init(MachineState *ms)
-{
-tcg_exec_init(tcg_tb_size * 1024 * 1024);
-return 0;
-}
-
-static const TypeInfo accel_type = {
-.name = TYPE_ACCEL,
-.parent = TYPE_OBJECT,
-.class_size = sizeof(AccelClass),
-.instance_size = sizeof(AccelState),
-};
-
-/* Lookup AccelClass from opt_name. Returns NULL if not found */
-static AccelClass *accel_find(const char *opt_name)
-{
-char *class_name = g_strdup_printf(ACCEL_CLASS_NAME("%s"), opt_name);
-AccelClass *ac = ACCEL_CLASS(object_class_by_name(class_name));
-g_free(class_name);
-return ac;
-}
-
-static int accel_init_machine(AccelClass *acc, MachineState *ms)
-{
-ObjectClass *oc = OBJECT_CLASS(acc);
-const char *cname = object_class_get_name(oc);
-AccelState *accel = ACCEL(object_new(cname));
-int ret;
-ms->accelerator = accel;
-*(acc->allowed) = true;
-ret = acc->init_machine(ms);
-if (ret < 0) {
-ms->accelerator = NULL;
-*(acc->allowed) = false;
-object_unref(OBJECT(accel));
-}
-return ret;
-}
-
-void configure_accelerator(MachineState *ms)
-{
-const char *p;
-char buf[10];
-int ret;
-bool accel_initialised = false;
-bool init_failed = false;
-AccelClass *acc = NULL;
-
-p = qemu_opt_get(qemu_get_machine_opts(), "accel");
-if (p == NULL) {
-/* Use the default "accelerator", tcg */
-p = "tcg";
-}
-
-while (!accel_initialised && *p != '\0') {
-if (*p == ':') {
-p++;
-}
-p = get_opt_name(buf, sizeof(buf), p, ':');
-acc = accel_find(buf);
-if (!acc) {
-fprintf(stderr, "\"%s\" accelerator not found.\n", buf);
-continue;
-}
-if

[Qemu-devel] [PATCH v2 1/1] s390x/css: catch section mismatch on load

2017-05-18 Thread Halil Pasic

Prior to the virtio-ccw-2.7 machine (and commit 2a79eb1a), our virtio
devices residing under the virtual-css bus do not have qdev_path based
migration stream identifiers (because their qdev_path is NULL). The ids
are instead generated when the device is registered as a composition of
the so called idstr, which takes the vmsd name as its value, and an
instance_id, which is which is calculated as a maximal instance_id
registered with the same idstr plus one, or zero (if none was registered
previously).

That means, under certain circumstances, one device might try, and even
succeed, to load the state of a different device. This can lead to
trouble.

Let us fail the migration if the above problem is detected during load.

How to reproduce the problem:
1) start qemu-system-s390x making sure you have the following devices
   defined on your command line:
 -device virtio-rng-ccw,id=rng1,devno=fe.0.0001
 -device virtio-rng-ccw,id=rng2,devno=fe.0.0002
2) detach the devices and reattach in reverse order using the monitor:
 (qemu) device_del rng1
 (qemu) device_del rng2
 (qemu) device_add virtio-rng-ccw,id=rng2,devno=fe.0.0002
 (qemu) device_add virtio-rng-ccw,id=rng1,devno=fe.0.0001
3) save the state of the vm into a temporary file and quit QEMU:
 (qemu) migrate "exec:gzip -c > /tmp/tmp_vmstate.gz"
 (qemu) q
4) use your command line from step 1 with
 -incoming "exec:gzip -c -d /tmp/tmp_vmstate.gz"
   appended to reproduce the problem (while trying to to load the saved vm)

CC: qemu-sta...@nongnu.org
Signed-off-by: Halil Pasic 
Reviewed-by: Dong Jia Shi 
---

v1 --> v2:
* Fixed grammar in error message (s/prior/prior to/).

We (Connie and me) are still interested in learning how to do verbose
error messages with VMSTATE.
---
 hw/s390x/css.c| 14 ++
 hw/s390x/virtio-ccw.c |  6 +-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index 15c4f4b249..b8e6b0b648 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -14,6 +14,7 @@
 #include "qapi/visitor.h"
 #include "hw/qdev.h"
 #include "qemu/bitops.h"
+#include "qemu/error-report.h"
 #include "exec/address-spaces.h"
 #include "cpu.h"
 #include "hw/s390x/ioinst.h"
@@ -1721,13 +1722,26 @@ void subch_device_save(SubchDev *s, QEMUFile *f)
 int subch_device_load(SubchDev *s, QEMUFile *f)
 {
 SubchDev *old_s;
+Error *err = NULL;
 uint16_t old_schid = s->schid;
+uint16_t old_devno = s->devno;
 int i;
 
 s->cssid = qemu_get_byte(f);
 s->ssid = qemu_get_byte(f);
 s->schid = qemu_get_be16(f);
 s->devno = qemu_get_be16(f);
+if (s->devno != old_devno) {
+/* Only possible if machine < 2.7 (no css_dev_path) */
+
+error_setg(, "%x != %x", old_devno,  s->devno);
+error_append_hint(, "Devno mismatch, tried to load wrong section!"
+  " Likely reason: some sequences of plug and unplug"
+  " can break migration for machine versions prior to"
+  " 2.7 (known design flaw).\n");
+error_report_err(err);
+return -EINVAL;
+}
 /* Re-assign subch. */
 if (old_schid != s->schid) {
 old_s = channel_subsys.css[s->cssid]->sch_set[s->ssid]->sch[old_schid];
diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index e7167e3d05..4f7efa240d 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -1274,9 +1274,13 @@ static int virtio_ccw_load_config(DeviceState *d, 
QEMUFile *f)
 SubchDev *s = ccw_dev->sch;
 VirtIODevice *vdev = virtio_ccw_get_vdev(s);
 int len;
+int ret;
 
 s->driver_data = dev;
-subch_device_load(s, f);
+ret = subch_device_load(s, f);
+if (ret) {
+return ret;
+}
 /* Re-fill subch_id after loading the subchannel states.*/
 if (ck->refill_ids) {
 ck->refill_ids(ccw_dev);
-- 
2.11.2

Re: [Qemu-devel] [PATCH] qcow2: add allocated-size to image specific info

2017-05-18 Thread Vladimir Sementsov-Ogievskiy


18.05.2017 13:25, Kevin Wolf wrote:

Am 18.05.2017 um 12:09 hat Vladimir Sementsov-Ogievskiy geschrieben:

Shows, how much data qcow2 allocates in underlying file. This should
be helpful on non-sparse file systems, when qemu-img info "disk size"
doesn't provide this information.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
Hi all.

Here is an allocated-size feature for qemu-img info.

I'm not a fan of loading all L2 tables (can take some time) for
'qemu-img info' (which should be very quick). Why isn't the qemu-img
check output good enough?

Kevin

$ ./qemu-img check /tmp/test.qcow2
No errors were found on the image.
16164/491520 = 3.29% allocated, 11.98% fragmented, 0.00% compressed clusters
Image end offset: 1060044800
$ ./qemu-img check --output=json /tmp/test.qcow2
{
 "image-end-offset": 1060044800,
 "total-clusters": 491520,
 "check-errors": 0,
 "allocated-clusters": 16164,
 "filename": "/tmp/test.qcow2",
 "format": "qcow2",
 "fragmented-clusters": 1937
}



It is not the same, it shows guest clusters, but we need host clusters - 
including all metadata, dirty bitmaps, snapshots, etc..




--
Best regards,
Vladimir

Re: [Qemu-devel] [Bug 1691379] [NEW] NetBSD evbmips64el port installation doesn't work with qemu-system-mips64el.

2017-05-18 Thread Aurelien Jarno

On 2017-05-18 04:29, Kamil Rytarowski wrote:
> On 17.05.2017 19:58, Kamil Rytarowski wrote:
> > On 17.05.2017 10:10, Thomas Huth wrote:
> >> On 17.05.2017 09:52, Utkarsh Anand wrote:
> >>> Public bug reported:
> >>>
> >>> I successfully installed the NetBSD evbmips64el port on gxemul but was
> >>> unable to install it on qemu. Trying to boot it on qemu takes me to the
> >>> 'db>' prompt. Here's the output and backtrace:
> >>>
> >>> panic: pcib_isa_intr_string: bogus isa irq 0x0
> >>> kernel: breakpoint trap
> >>> Stopped in pid 0.1 (system) at  netbsd:cpu_Debugger+0x4:jr  ra
> >>> bdslot: nop
> >>> db> bt
> >>> 0x805977f0: cpu_Debugger+4 
> >>> (63061,9000180003f8,6,804c2290) ra 8030acd0 sz 0
> >>> 0x805977f0: vpanic+158 
> >>> (63061,9000180003f8,6,804c2290) ra 8030ad7c sz 64
> >>> 0x80597830: panic+34 (63061,803d65b0,0,40) ra 
> >>> 80109784 sz 96
> >>> 0x80597890: pcib_isa_intr_string+6c (63061,803d65b0,0,40) 
> >>> ra 80149bfc sz 16
> >>> 0x805978a0: uhci_pci_attach+16c (63061,803d65b0,0,40) ra 
> >>> 802f0400 sz 176
> >>> 0x80597950: config_attach_loc+1c8 (63061,803d65b0,0,40) 
> >>> ra 802f053c sz 64
> >>> 0x80597990: config_found_sm_loc+5c (63061,803d65b0,0,40) 
> >>> ra 80121354 sz 64
> >>> 0x805979d0: pci_probe_device+524 (63061,803d65b0,0,0) ra 
> >>> 80121548 sz 288
> >>> 0x80597af0: pci_enumerate_bus+1d0 (63061,803d65b0,0,0) ra 
> >>> 8012167c sz 160
> >>> 0x80597b90: pcirescan+5c (63061,803d65b0,0,0) ra 
> >>> 801218c4 sz 32
> >>> 0x80597bb0: pciattach+19c (63061,803d65b0,0,0) ra 
> >>> 802f0400 sz 80
> >>> 0x80597c00: config_attach_loc+1c8 (63061,803d65b0,0,0) ra 
> >>> 802f053c sz 64
> >>> 0x80597c40: config_found_sm_loc+5c (63061,803d65b0,0,0) 
> >>> ra 80108934 sz 64
> >>> 0x80597c80: gt_attach+7c (63061,803d65b0,0,0) ra 
> >>> 802f0400 sz 112   
> >>> 0x80597cf0: config_attach_loc+1c8 (63061,803d65b0,0,0) ra 
> >>> 802f053c sz 64
> >>> 0x80597d30: config_found_sm_loc+5c (63061,803d65b0,0,0) 
> >>> ra 801086ac sz 64
> >>> 0x80597d70: mainbus_attach+dc (63061,803d65b0,0,0) ra 
> >>> 802f0400 sz 96
> >>> 0x80597dd0: config_attach_loc+1c8 (63061,803d65b0,0,0) ra 
> >>> 80104bf8 sz 64
> >>> 0x80597e10: cpu_configure+28 (63061,803d65b0,0,0) ra 
> >>> 803d5f30 sz 16
> >>> 0x80597e20: main+3a0 (63061,803d65b0,0,0) ra 
> >>> 801000dc sz 128   
> >>> 0x80597ea0: kernel_text+dc (63061,803d65b0,0,0) ra 0 sz 0
> >>> User-level: pid 0.1
> >>>
> >>> Here's the command that I used:
> >>>
> >>> Build evbmips64el from source and then launch it from qemu (replace the
> >>> paths relative to your system):
> >>>
> >>> qemu-system-mips64el -cdrom
> >>> /extra/evbmips64/distrib/evbmips/cdroms/installcd/NetBSD-7.99.71
> >>> -evbmips-mips64el.iso -hda /extra/evbmips64.img -kernel
> >>> /extra/evbmips64/releasedir/evbmips/installation/netbsd-INSTALL_MALTA64
> >>> -nographic -M malta
> >>>
> >>> (I've decompressed the kernel)
> >>>
> >>> Here's the output for qemu-system-mips64el --version :
> >>>
> >>> QEMU emulator version 2.7.1(qemu-2.7.1-6.fc25), Copyright (c) 2003-2016
> >>> Fabrice Bellard and the QEMU Project developers
> >>>
> >>> This doesn't look like a NetBSD bug. I've attached a screenshot of the
> >>> working installation using gxemul in the attachments.
> >>
> >> When reporting such issues, please always use the latest release of QEMU
> >> first, so could you please try again with the latest upstream release of
> >> QEMU (currently v2.9.0)? Thanks!
> >>
> >>  Thomas
> >>
> > 
> > 7.99.71 is the most recent kernel ABI version (NetBSD-current).
> > 
> > Release engineering builds of NetBSD-current are hosted on nyftp.netbsd.org.
> > 
> > kernel:
> > 
> > http://nyftp.netbsd.org/pub/NetBSD-daily/HEAD/201705170540Z/evbmips-mips64el/installation/
> > 
> > installation medium:
> > 
> > http://nyftp.netbsd.org/pub/NetBSD-daily/HEAD/201705170540Z/images/
> > 
> > [In future there will be need to switch 201705170540Z to a newer
> > snapshot, as this one will be removed.]
> > 
> > I will have to a look tonight and try to reproduce locally (and take
> > MIPS 64-bit crash course).
> > 
> 
> I've reproduced it locally with qemu/NetBSD-7.99.71 ver. 2.9 git HEAD
> (rev. cdece0467c7cf8e3).
> 
> 464 const char *
> 465 pcib_isa_intr_string(void *v, int irq, char *buf, size_t len)
> 466 {
> 467   if (irq == 0 || irq >= ICU_LEN || irq == 2)
> 468   panic("%s: bogus isa irq 0x%x", __func__, irq);
> 469
> 470   snprintf(buf, len, "isa irq %d",

Re: [Qemu-devel] [RFC PATCH 5/8] VFIO: Add new IOTCL for PASID Table bind propagation

2017-05-18 Thread Liu, Yi L

On Fri, May 12, 2017 at 03:58:51PM -0600, Alex Williamson wrote:
> On Wed, 26 Apr 2017 18:12:02 +0800
> "Liu, Yi L"  wrote:
> 
> > From: "Liu, Yi L" 
> > 
> > This patch adds VFIO_IOMMU_SVM_BIND_TASK for potential PASID table
> > binding requests.
> > 
> > On VT-d, this IOCTL cmd would be used to link the guest PASID page table
> > to host. While for other vendors, it may also be used to support other
> > kind of SVM bind request. Previously, there is a discussion on it with
> > ARM engineer. It can be found by the link below. This IOCTL cmd may
> > support SVM PASID bind request from userspace driver, or page table(cr3)
> > bind request from guest. These SVM bind requests would be supported by
> > adding different flags. e.g. VFIO_SVM_BIND_PASID is added to support
> > PASID bind from userspace driver, VFIO_SVM_BIND_PGTABLE is added to
> > support page table bind from guest.
> > 
> > https://patchwork.kernel.org/patch/9594231/
> > 
> > Signed-off-by: Liu, Yi L 
> > ---
> >  include/uapi/linux/vfio.h | 17 +
> >  1 file changed, 17 insertions(+)
> > 
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 519eff3..6b97987 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -547,6 +547,23 @@ struct vfio_iommu_type1_dma_unmap {
> >  #define VFIO_IOMMU_ENABLE  _IO(VFIO_TYPE, VFIO_BASE + 15)
> >  #define VFIO_IOMMU_DISABLE _IO(VFIO_TYPE, VFIO_BASE + 16)
> >  
> > +/* IOCTL for Shared Virtual Memory Bind */
> > +struct vfio_device_svm {
> > +   __u32   argsz;
> > +#define VFIO_SVM_BIND_PASIDTBL (1 << 0) /* Bind PASID Table */
> > +#define VFIO_SVM_BIND_PASID(1 << 1) /* Bind PASID from userspace 
> > driver */
> > +#define VFIO_SVM_BIND_PGTABLE  (1 << 2) /* Bind guest mmu page table */
> > +   __u32   flags;
> > +   __u32   length;
> > +   __u8data[];
> 
> In the case of VFIO_SVM_BIND_PASIDTBL this is clearly struct
> pasid_table_info?  So at a minimum this is a union including struct
> pasid_table_info.  Furthermore how does a user learn what the opaque
> data in struct pasid_table_info is without looking at the code?  A user
> API needs to be clear and documented, not opaque and variable.  We
> should also have references to the hardware spec for an Intel or ARM
> PASID table in uapi.  flags should be defined as they're used, let's
> not reserve them with the expectation of future use.
> 

Agree. would add description accordingly. For the flags, I would remove
the last two as I wouldn't use. I think Jean would add them in his/her
patchset. Anyhow, one of us need to do merge on the flags.

Thanks,
Yi L

> > +};
> > +
> > +#define VFIO_SVM_TYPE_MASK (VFIO_SVM_BIND_PASIDTBL | \
> > +   VFIO_SVM_BIND_PASID | \
> > +   VFIO_SVM_BIND_PGTABLE)
> > +
> > +#define VFIO_IOMMU_SVM_BIND_TASK   _IO(VFIO_TYPE, VFIO_BASE + 22)
> > +
> >  /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
> >  
> >  /*
> 
>

Re: [Qemu-devel] [PATCH] e1000e: Fix a bug where guest hangs upon migration

2017-05-18 Thread Paolo Bonzini



On 18/05/2017 12:04, Dr. David Alan Gilbert wrote:
> * Sameeh Jubran (sam...@daynix.com) wrote:
>> The bug was caused by the "receive overrun" (bit #6 of the ICR register) 
>> interrupt
>> which would be triggered post migration in a heavy traffic environment. Even 
>> though the
>> "receive overrun" bit (#6) is masked out by the IMS register (refer to the 
>> log below)
>> the driver still receives an interrupt as the "receive overrun" bit (#6) 
>> causes the
>> "Other" - bit #24 of the ICR register - bit to be set as documented below. 
>> The driver
>> handles the interrupt and clears the "Other" bit (#24) but doesn't clear the
>> "receive overrun" bit (#6) which leads to an infinite loop. Apparently the 
>> Windows
>> driver expects that the "receive overrun" bit and other ones - documented 
>> below - to be
>> cleared when the "Other" bit (#24) is cleared.
>>
>> So to sum that up:
>> 1. Bit #6 of the ICR register is set by heavy traffic
>> 2. As a results of setting bit #6, bit #24 is set
>> 3. The driver receives an interrupt for bit 24 (it doesn't receieve an 
>> interrupt for bit #6 as it is masked out by IMS)
>> 4. The driver handles and clears the interrupt of bit #24
>> 5. Bit #6 is still set.
>> 6. 2 happens all over again
>>
>> The Interrupt Cause Read - ICR register:
>>
>> The ICR has the "Other" bit - bit #24 - that is set when one or more of the 
>> following
>> ICR register's bits are set:
>>
>> LSC - bit #2, RXO - bit #6, MDAC - bit #9, SRPD - bit #16, ACK - bit #17, 
>> MNG - bit #18
>>
>> Log sample of the storm:
>>
>> 27563@1494850819.411877:e1000e_irq_pending_interrupts ICR PENDING: 0x100 
>> (ICR: 0x815000c2, IMS: 0x1a4)
>> 27563@1494850819.411900:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
>> 0x815000c2, IMS: 0xa4)
>> 27563@1494850819.411915:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
>> 0x815000c2, IMS: 0xa4)
>> 27563@1494850819.412380:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
>> 0x815000c2, IMS: 0xa4)
>> 27563@1494850819.412395:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
>> 0x815000c2, IMS: 0xa4)
>> 27563@1494850819.412436:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
>> 0x815000c2, IMS: 0xa4)
>> 27563@1494850819.412441:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
>> 0x815000c2, IMS: 0xa4)
>> 27563@1494850819.412998:e1000e_irq_pending_interrupts ICR PENDING: 0x100 
>> (ICR: 0x815000c2, IMS: 0x1a4)
>>
>> This commit solves:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1447935
>> https://bugzilla.redhat.com/show_bug.cgi?id=1449490
>>
>> Signed-off-by: Sameeh Jubran 
> 
> Thanks, I tested this with our downstream and it does
> fix the reproducer for 1447935 for me, so:
> 
> 
> Tested-by: Dr. David Alan Gilbert 
> 
> Dave
> 
>> ---
>>  hw/net/e1000e_core.c | 7 +--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
>> index 28c5be1..8174b53 100644
>> --- a/hw/net/e1000e_core.c
>> +++ b/hw/net/e1000e_core.c
>> @@ -2454,14 +2454,17 @@ e1000e_set_ics(E1000ECore *core, int index, uint32_t 
>> val)
>>  static void
>>  e1000e_set_icr(E1000ECore *core, int index, uint32_t val)
>>  {
>> +uint32_t icr = 0;
>>  if ((core->mac[ICR] & E1000_ICR_ASSERTED) &&
>>  (core->mac[CTRL_EXT] & E1000_CTRL_EXT_IAME)) {
>>  trace_e1000e_irq_icr_process_iame();
>>  e1000e_clear_ims_bits(core, core->mac[IAM]);
>>  }
>>  
>> -trace_e1000e_irq_icr_write(val, core->mac[ICR], core->mac[ICR] & ~val);
>> -core->mac[ICR] &= ~val;
>> +icr = core->mac[ICR] & ~val;
>> +icr = (val & E1000_ICR_OTHER) ? (icr & ~E1000_ICR_OTHER_CAUSES) : icr;
>> +trace_e1000e_irq_icr_write(val, core->mac[ICR], icr);
>> +core->mac[ICR] = icr;
>>  e1000e_update_interrupt_state(core);
>>  }
>>  
>> -- 
>> 2.8.1.185.gdc0db2c
>>
>>
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 

Cc: qemu-sta...@nongnu.org

Re: [Qemu-devel] [RFC PATCH 4/8] iommu/vt-d: Add iommu do invalidate function

2017-05-18 Thread Liu, Yi L

On Fri, May 12, 2017 at 03:59:18PM -0600, Alex Williamson wrote:
> On Wed, 26 Apr 2017 18:12:01 +0800
> "Liu, Yi L"  wrote:
> 
> > From: Jacob Pan 
> > 
> > This patch adds Intel VT-d specific function to implement
> > iommu_do_invalidate API.
> > 
> > The use case is for supporting caching structure invalidation
> > of assigned SVM capable devices. Emulated IOMMU exposes queue
> > invalidation capability and passes down all descriptors from the guest
> > to the physical IOMMU.
> > 
> > The assumption is that guest to host device ID mapping should be
> > resolved prior to calling IOMMU driver. Based on the device handle,
> > host IOMMU driver can replace certain fields before submit to the
> > invalidation queue.
> > 
> > Signed-off-by: Liu, Yi L 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/iommu/intel-iommu.c | 43 
> > +++
> >  include/linux/intel-iommu.h | 11 +++
> >  2 files changed, 54 insertions(+)
> > 
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index 6d5b939..0b098ad 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5042,6 +5042,48 @@ static void intel_iommu_detach_device(struct 
> > iommu_domain *domain,
> > dmar_remove_one_dev_info(to_dmar_domain(domain), dev);
> >  }
> >  
> > +static int intel_iommu_do_invalidate(struct iommu_domain *domain,
> > +   struct device *dev, struct tlb_invalidate_info *inv_info)
> > +{
> > +   int ret = 0;
> > +   struct intel_iommu *iommu;
> > +   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > +   struct intel_invalidate_data *inv_data;
> > +   struct qi_desc *qi;
> > +   u16 did;
> > +   u8 bus, devfn;
> > +
> > +   if (!inv_info || !dmar_domain || (inv_info->model != INTEL_IOMMU))
> > +   return -EINVAL;
> > +
> > +   iommu = device_to_iommu(dev, , );
> > +   if (!iommu)
> > +   return -ENODEV;
> > +
> > +   inv_data = (struct intel_invalidate_data *)_info->opaque;
> > +
> > +   /* check SID */
> > +   if (PCI_DEVID(bus, devfn) != inv_data->sid)
> > +   return 0;
> > +
> > +   qi = _data->inv_desc;
> > +
> > +   switch (qi->low & QI_TYPE_MASK) {
> > +   case QI_DIOTLB_TYPE:
> > +   case QI_DEIOTLB_TYPE:
> > +   /* for device IOTLB, we just let it pass through */
> > +   break;
> > +   default:
> > +   did = dmar_domain->iommu_did[iommu->seq_id];
> > +   set_mask_bits(>low, QI_DID_MASK, QI_DID(did));
> > +   break;
> > +   }
> > +
> > +   ret = qi_submit_sync(qi, iommu);
> > +
> > +   return ret;
> 
> nit, ret variable is unnecessary.

yes, would remove it.
 
> > +}
> > +
> >  static int intel_iommu_map(struct iommu_domain *domain,
> >unsigned long iova, phys_addr_t hpa,
> >size_t size, int iommu_prot)
> > @@ -5416,6 +5458,7 @@ static int intel_iommu_unbind_pasid_table(struct 
> > iommu_domain *domain,
> >  #ifdef CONFIG_INTEL_IOMMU_SVM
> > .bind_pasid_table   = intel_iommu_bind_pasid_table,
> > .unbind_pasid_table = intel_iommu_unbind_pasid_table,
> > +   .do_invalidate  = intel_iommu_do_invalidate,
> >  #endif
> > .map= intel_iommu_map,
> > .unmap  = intel_iommu_unmap,
> > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > index ac04f28..9d6562c 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -29,6 +29,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  
> > @@ -271,6 +272,10 @@ enum {
> >  #define QI_PGRP_RESP_TYPE  0x9
> >  #define QI_PSTRM_RESP_TYPE 0xa
> >  
> > +#define QI_DID(did)(((u64)did & 0x) << 16)
> > +#define QI_DID_MASKGENMASK(31, 16)
> > +#define QI_TYPE_MASK   GENMASK(3, 0)
> > +
> >  #define QI_IEC_SELECTIVE   (((u64)1) << 4)
> >  #define QI_IEC_IIDEX(idx)  (((u64)(idx & 0x) << 32))
> >  #define QI_IEC_IM(m)   (((u64)(m & 0x1f) << 27))
> > @@ -529,6 +534,12 @@ struct intel_svm {
> >  extern struct intel_iommu *intel_svm_device_to_iommu(struct device *dev);
> >  #endif
> >  
> > +struct intel_invalidate_data {
> > +   u16 sid;
> > +   u32 pasid;
> > +   struct qi_desc inv_desc;
> > +};
> 
> This needs to be uapi since the vfio user is expected to create it, so
> we need a uapi version of qi_desc too.
>

yes, would do it.

Thx,
Yi L
 
> > +
> >  extern const struct attribute_group *intel_iommu_groups[];
> >  extern void intel_iommu_debugfs_init(void);
> >  extern struct context_entry *iommu_context_addr(struct intel_iommu *iommu,
>

Re: [Qemu-devel] [PATCH] qcow2: add allocated-size to image specific info

2017-05-18 Thread Denis V. Lunev

On 05/18/2017 01:25 PM, Kevin Wolf wrote:
> Am 18.05.2017 um 12:09 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> Shows, how much data qcow2 allocates in underlying file. This should
>> be helpful on non-sparse file systems, when qemu-img info "disk size"
>> doesn't provide this information.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>> Hi all.
>>
>> Here is an allocated-size feature for qemu-img info.
> I'm not a fan of loading all L2 tables (can take some time) for
> 'qemu-img info' (which should be very quick). Why isn't the qemu-img
> check output good enough?
>
> Kevin
>
> $ ./qemu-img check /tmp/test.qcow2 
> No errors were found on the image.
> 16164/491520 = 3.29% allocated, 11.98% fragmented, 0.00% compressed clusters
> Image end offset: 1060044800
> $ ./qemu-img check --output=json /tmp/test.qcow2 
> {
> "image-end-offset": 1060044800,
> "total-clusters": 491520,
> "check-errors": 0,
> "allocated-clusters": 16164,
> "filename": "/tmp/test.qcow2",
> "format": "qcow2",
> "fragmented-clusters": 1937
> }
this can be done, sure. Is this considered safe
for running VM too?

Den

Re: [Qemu-devel] [RFC PATCH 3/8] iommu: Introduce iommu do invalidate API function

2017-05-18 Thread Liu, Yi L

On Fri, May 12, 2017 at 03:59:24PM -0600, Alex Williamson wrote:
> On Wed, 26 Apr 2017 18:12:00 +0800
> "Liu, Yi L"  wrote:
> 

Hi Alex,

Pls refer to the open I mentioned in this email, I need your comments
on it to prepare the formal patchset for SVM virtualization. Thx.

> > From: "Liu, Yi L" 
> > 
> > When a SVM capable device is assigned to a guest, the first level page
> > tables are owned by the guest and the guest PASID table pointer is
> > linked to the device context entry of the physical IOMMU.
> > 
> > Host IOMMU driver has no knowledge of caching structure updates unless
> > the guest invalidation activities are passed down to the host. The
> > primary usage is derived from emulated IOMMU in the guest, where QEMU
> > can trap invalidation activities before pass them down the
> > host/physical IOMMU. There are IOMMU architectural specific actions
> > need to be taken which requires the generic APIs introduced in this
> > patch to have opaque data in the tlb_invalidate_info argument.
> > 
> > Signed-off-by: Liu, Yi L 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/iommu/iommu.c | 13 +
> >  include/linux/iommu.h | 16 
> >  2 files changed, 29 insertions(+)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index f2da636..ca7cff2 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1153,6 +1153,19 @@ int iommu_unbind_pasid_table(struct iommu_domain 
> > *domain, struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
> >  
> > +int iommu_do_invalidate(struct iommu_domain *domain,
> > +   struct device *dev, struct tlb_invalidate_info *inv_info)
> > +{
> > +   int ret = 0;
> > +
> > +   if (unlikely(domain->ops->do_invalidate == NULL))
> > +   return -ENODEV;
> > +
> > +   ret = domain->ops->do_invalidate(domain, dev, inv_info);
> > +   return ret;
> 
> nit, ret is unnecessary.

yes, would modify it. Thx.

> > +}
> > +EXPORT_SYMBOL_GPL(iommu_do_invalidate);
> > +
> >  static void __iommu_detach_device(struct iommu_domain *domain,
> >   struct device *dev)
> >  {
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 491a011..a48e3b75 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -140,6 +140,11 @@ struct pasid_table_info {
> > __u8opaque[];/* IOMMU-specific details */
> >  };
> >  
> > +struct tlb_invalidate_info {
> > +   __u32   model;
> > +   __u8opaque[];
> > +};
> 
> I'm wondering if 'model' is really necessary here, shouldn't this
> function only be called if a bind_pasid_table() succeeded, and then the
> model would be set at that time?

For this model, I'm thinking about another potential usage which
is from Tianyu's idea to use tlb_invalidate_info to pass invalidations
for iova related mappings. In such case, there would be no bind_pasid_table()
before it, so a model check would be needed. But I may remove it since this
patchset is focusing on SVM.

Here, I have an open to check with you. I defined the tlb_invalidate_info
with full opaque data. The opaque would include the invalidate info for
different vendors. But we have two choices for the tlb_invalidate_info
definition.

a) as proposed in this patchset, passing raw data to host. Host pIOMMU
   driver submits invalidation request after replacing specific fields.
   Reject if the IOMMU model is not correct.
   * Pros: no need to do parse and re-assembling, better performance
   * Cons: unable to support the scenarios which emulates an Intel IOMMU
   on an ARM platform.
b) parse the invalidation info into specific data, e.g. gran, addr,
   size, invalidation type etc. then fill the data in a generic
   structure. In host, pIOMMU driver re-assemble the invalidation
   request and submit to pIOMMU.
   * Pros: may be able to support the scenario above. But it is still in
   question since different vendor may have vendor specific
   invalidation info. This would make it difficult to have vendor
   agnostic invalidation propagation API.

   * Cons: needs additional complexity to do parse and re-assembling.
   The generic structure would be a hyper-set of all possible
   invalidate info, this may be hard to maintain in future.

As the pros/cons show, I proposed a) as an initial version. But it is an
open. Jean from ARM has gave some comments on it and inclined to the opaque
way with generic part defined explicitly. Jean's reply is in the link below.

http://www.spinics.net/lists/kvm/msg149884.html

I'd like to see your comments on it before moving forward. I'm fine with
Jean's idea. For VT-d, I may define it as "generic part" + "raw data".

Thanks,
Yi L

> This also needs to be uapi since you're expecting a user to provide it
> to vfio.  The opaque data needs to be fully specified (relative

[Qemu-devel] [PATCH] trivial: Remove unneeded ifndef in memory.h

2017-05-18 Thread Juan Quintela

All the file is surounded already by #ifndef CONFIG_USER_ONLY.

Signed-off-by: Juan Quintela 
---
 include/exec/memory.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 99e0f54..6b5b953 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -17,9 +17,7 @@
 #ifndef CONFIG_USER_ONLY
 
 #include "exec/cpu-common.h"
-#ifndef CONFIG_USER_ONLY
 #include "exec/hwaddr.h"
-#endif
 #include "exec/memattrs.h"
 #include "exec/ramlist.h"
 #include "qemu/queue.h"
-- 
2.9.3

[Qemu-devel] [PATCH] altera_timer: fix incorrect memset

2017-05-18 Thread Paolo Bonzini

Use sizeof instead of ARRAY_SIZE, fixing -Wmemset-elt-size with recent
GCC versions.

Signed-off-by: Paolo Bonzini 
---
 hw/timer/altera_timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/timer/altera_timer.c b/hw/timer/altera_timer.c
index 6d4862661d..c9a0fc5dca 100644
--- a/hw/timer/altera_timer.c
+++ b/hw/timer/altera_timer.c
@@ -204,7 +204,7 @@ static void altera_timer_reset(DeviceState *dev)
 
 ptimer_stop(t->ptimer);
 ptimer_set_limit(t->ptimer, 0x, 1);
-memset(t->regs, 0, ARRAY_SIZE(t->regs));
+memset(t->regs, 0, sizeof(t->regs));
 }
 
 static Property altera_timer_properties[] = {
-- 
2.13.0

Re: [Qemu-devel] [PATCH 0/2] Make migration/ram.c target independent

2017-05-18 Thread Peter Xu

On Wed, May 17, 2017 at 06:08:02PM +0200, Juan Quintela wrote:
> Hi
> 
> Only reason that ram.c is compiled by target is because it use
> TARGET_PAGE_BITS.  As we already have a function to export
> TARGET_PAGE_SIZE, do the same.
> After this, we can make it target independent.
> 
> Please, review.
> 
> Later, Juan.
> 
> 
> 
> 
> Juan Quintela (2):
>   exec: Create include for target_page_size()
>   migration: Make savevm.c target independent
> 
>  Makefile.target|  2 +-
>  exec.c | 10 ++
>  include/exec/target_page.h | 22 ++
>  include/sysemu/sysemu.h|  1 -
>  migration/Makefile.objs|  2 +-
>  migration/migration.c  |  1 +
>  migration/postcopy-ram.c   |  1 +
>  migration/savevm.c | 15 ---
>  8 files changed, 44 insertions(+), 10 deletions(-)
>  create mode 100644 include/exec/target_page.h

Though I am not 100% sure of the relationship between TARGET_PAGE_*,
qemu_target_page_*(), and exec/poison.h, but at least we already have
qemu_target_page_size(), and this patch moves savevm.c out of
arch-dependent list, which does make sense. So:

Reviewed-by: Peter Xu 

-- 
Peter Xu

Re: [Qemu-devel] [PATCH] virtio-scsi: Unset hotplug handler when unrealize

2017-05-18 Thread Paolo Bonzini



On 18/05/2017 12:28, Fam Zheng wrote:
> Like suggested by Paolo, it seems to make sense to do this clean up in
> bus_unparent, but given how confused I got when working on this bug, I'm
> not confident in my reasoning about the ref count balance in other
> callers.

Queued this for now.

Paolo

[Qemu-devel] [PATCH] virtio-scsi: Unset hotplug handler when unrealize

2017-05-18 Thread Fam Zheng

This matches the qbus_set_hotplug_handler in realize and is important to
release the final reference to the embedded VirtIODevice so that it is
properly finalized.

A use-after-free is fixed with this patch, indirectly:
virtio_device_instance_finalize wasn't called at hot-unplug, and the
vdev->listener would be a dangling pointer in the global and the per
address space listener list. See also RHBZ 1449031.

Thanks to Paolo for explaining the reference counting.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Fam Zheng 

---

Like suggested by Paolo, it seems to make sense to do this clean up in
bus_unparent, but given how confused I got when working on this bug, I'm
not confident in my reasoning about the ref count balance in other
callers.
---
 hw/scsi/virtio-scsi.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 46a3e3f..f46f06d 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -918,6 +918,9 @@ void virtio_scsi_common_unrealize(DeviceState *dev, Error 
**errp)
 
 static void virtio_scsi_device_unrealize(DeviceState *dev, Error **errp)
 {
+VirtIOSCSI *s = VIRTIO_SCSI(dev);
+
+qbus_set_hotplug_handler(BUS(>bus), NULL, _abort);
 virtio_scsi_common_unrealize(dev, errp);
 }
 
-- 
2.9.3

Re: [Qemu-devel] [PATCH] qcow2: add allocated-size to image specific info

2017-05-18 Thread Kevin Wolf

Am 18.05.2017 um 12:09 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Shows, how much data qcow2 allocates in underlying file. This should
> be helpful on non-sparse file systems, when qemu-img info "disk size"
> doesn't provide this information.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
> Hi all.
> 
> Here is an allocated-size feature for qemu-img info.

I'm not a fan of loading all L2 tables (can take some time) for
'qemu-img info' (which should be very quick). Why isn't the qemu-img
check output good enough?

Kevin

$ ./qemu-img check /tmp/test.qcow2 
No errors were found on the image.
16164/491520 = 3.29% allocated, 11.98% fragmented, 0.00% compressed clusters
Image end offset: 1060044800
$ ./qemu-img check --output=json /tmp/test.qcow2 
{
"image-end-offset": 1060044800,
"total-clusters": 491520,
"check-errors": 0,
"allocated-clusters": 16164,
"filename": "/tmp/test.qcow2",
"format": "qcow2",
"fragmented-clusters": 1937
}

[Qemu-devel] [PATCH] qcow2: add allocated-size to image specific info

2017-05-18 Thread Vladimir Sementsov-Ogievskiy

Shows, how much data qcow2 allocates in underlying file. This should
be helpful on non-sparse file systems, when qemu-img info "disk size"
doesn't provide this information.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

Hi all.

Here is an allocated-size feature for qemu-img info.

 block/qcow2-refcount.c | 60 ++
 block/qcow2.c  |  4 
 block/qcow2.h  |  2 ++
 qapi/block-core.json   |  6 -
 4 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 7c06061aae..a2f7696a0a 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2931,3 +2931,63 @@ done:
 qemu_vfree(new_refblock);
 return ret;
 }
+
+typedef struct NbAllocatedClustersCo {
+BlockDriverState *bs;
+int64_t ret;
+} NbAllocatedClustersCo;
+
+static void coroutine_fn qcow2_nb_allocated_clusters_co_entry(void *opaque)
+{
+NbAllocatedClustersCo *nbco = opaque;
+BlockDriverState *bs = nbco->bs;
+BDRVQcow2State *s = bs->opaque;
+int64_t size, nb_clusters, nb_allocated = 0, i;
+int ret = 0;
+
+size = bdrv_getlength(bs->file->bs);
+if (size < 0) {
+nbco->ret = size;
+return;
+}
+
+nb_clusters = size_to_clusters(s, size);
+
+qemu_co_mutex_lock(>lock);
+
+for (i = 0; i < nb_clusters; ++i) {
+uint64_t refcount;
+ret = qcow2_get_refcount(bs, i, );
+if (ret < 0) {
+nbco->ret = ret;
+qemu_co_mutex_unlock(>lock);
+return;
+}
+if (refcount > 0) {
+nb_allocated++;
+}
+}
+
+qemu_co_mutex_unlock(>lock);
+
+nbco->ret = nb_allocated;
+}
+
+int64_t qcow2_nb_allocated_clusters(BlockDriverState *bs)
+{
+NbAllocatedClustersCo nbco = {
+.bs = bs,
+.ret = -EINPROGRESS
+};
+
+if (qemu_in_coroutine()) {
+qcow2_nb_allocated_clusters_co_entry();
+} else {
+Coroutine *co =
+qemu_coroutine_create(qcow2_nb_allocated_clusters_co_entry, );
+qemu_coroutine_enter(co);
+BDRV_POLL_WHILE(bs, nbco.ret == -EINPROGRESS);
+}
+
+return nbco.ret;
+}
diff --git a/block/qcow2.c b/block/qcow2.c
index a8d61f0981..9a64b89ba2 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2927,6 +2927,8 @@ static ImageInfoSpecific 
*qcow2_get_specific_info(BlockDriverState *bs)
 .refcount_bits  = s->refcount_bits,
 };
 } else if (s->qcow_version == 3) {
+int64_t nb_allocated = qcow2_nb_allocated_clusters(bs);
+
 *spec_info->u.qcow2.data = (ImageInfoSpecificQCow2){
 .compat = g_strdup("1.1"),
 .lazy_refcounts = s->compatible_features &
@@ -2936,6 +2938,8 @@ static ImageInfoSpecific 
*qcow2_get_specific_info(BlockDriverState *bs)
   QCOW2_INCOMPAT_CORRUPT,
 .has_corrupt= true,
 .refcount_bits  = s->refcount_bits,
+.allocated_size = MAX(0, nb_allocated) << s->cluster_bits,
+.has_allocated_size = nb_allocated >= 0,
 };
 } else {
 /* if this assertion fails, this probably means a new version was
diff --git a/block/qcow2.h b/block/qcow2.h
index 1801dc30dc..a0f0bf9358 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -532,6 +532,8 @@ int qcow2_change_refcount_order(BlockDriverState *bs, int 
refcount_order,
 BlockDriverAmendStatusCB *status_cb,
 void *cb_opaque, Error **errp);
 
+int64_t qcow2_nb_allocated_clusters(BlockDriverState *bs);
+
 /* qcow2-cluster.c functions */
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
 bool exact_size);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 6b974b952f..dda830 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -44,6 +44,9 @@
 #
 # @refcount-bits: width of a refcount entry in bits (since 2.3)
 #
+# @allocated-size: number of allocated clusters (including metadata)
+#  if available; only valid for compat >= 1.1 (since v2.10)
+#
 # Since: 1.7
 ##
 { 'struct': 'ImageInfoSpecificQCow2',
@@ -51,7 +54,8 @@
   'compat': 'str',
   '*lazy-refcounts': 'bool',
   '*corrupt': 'bool',
-  'refcount-bits': 'int'
+  'refcount-bits': 'int',
+  '*allocated-size': 'uint64'
   } }
 
 ##
-- 
2.11.1

Re: [Qemu-devel] [PATCH] e1000e: Fix a bug where guest hangs upon migration

2017-05-18 Thread Dr. David Alan Gilbert

* Sameeh Jubran (sam...@daynix.com) wrote:
> The bug was caused by the "receive overrun" (bit #6 of the ICR register) 
> interrupt
> which would be triggered post migration in a heavy traffic environment. Even 
> though the
> "receive overrun" bit (#6) is masked out by the IMS register (refer to the 
> log below)
> the driver still receives an interrupt as the "receive overrun" bit (#6) 
> causes the
> "Other" - bit #24 of the ICR register - bit to be set as documented below. 
> The driver
> handles the interrupt and clears the "Other" bit (#24) but doesn't clear the
> "receive overrun" bit (#6) which leads to an infinite loop. Apparently the 
> Windows
> driver expects that the "receive overrun" bit and other ones - documented 
> below - to be
> cleared when the "Other" bit (#24) is cleared.
> 
> So to sum that up:
> 1. Bit #6 of the ICR register is set by heavy traffic
> 2. As a results of setting bit #6, bit #24 is set
> 3. The driver receives an interrupt for bit 24 (it doesn't receieve an 
> interrupt for bit #6 as it is masked out by IMS)
> 4. The driver handles and clears the interrupt of bit #24
> 5. Bit #6 is still set.
> 6. 2 happens all over again
> 
> The Interrupt Cause Read - ICR register:
> 
> The ICR has the "Other" bit - bit #24 - that is set when one or more of the 
> following
> ICR register's bits are set:
> 
> LSC - bit #2, RXO - bit #6, MDAC - bit #9, SRPD - bit #16, ACK - bit #17, MNG 
> - bit #18
> 
> Log sample of the storm:
> 
> 27563@1494850819.411877:e1000e_irq_pending_interrupts ICR PENDING: 0x100 
> (ICR: 0x815000c2, IMS: 0x1a4)
> 27563@1494850819.411900:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
> 0x815000c2, IMS: 0xa4)
> 27563@1494850819.411915:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
> 0x815000c2, IMS: 0xa4)
> 27563@1494850819.412380:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
> 0x815000c2, IMS: 0xa4)
> 27563@1494850819.412395:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
> 0x815000c2, IMS: 0xa4)
> 27563@1494850819.412436:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
> 0x815000c2, IMS: 0xa4)
> 27563@1494850819.412441:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 
> 0x815000c2, IMS: 0xa4)
> 27563@1494850819.412998:e1000e_irq_pending_interrupts ICR PENDING: 0x100 
> (ICR: 0x815000c2, IMS: 0x1a4)
> 
> This commit solves:
> https://bugzilla.redhat.com/show_bug.cgi?id=1447935
> https://bugzilla.redhat.com/show_bug.cgi?id=1449490
> 
> Signed-off-by: Sameeh Jubran 

Thanks, I tested this with our downstream and it does
fix the reproducer for 1447935 for me, so:


Tested-by: Dr. David Alan Gilbert 

Dave

> ---
>  hw/net/e1000e_core.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
> index 28c5be1..8174b53 100644
> --- a/hw/net/e1000e_core.c
> +++ b/hw/net/e1000e_core.c
> @@ -2454,14 +2454,17 @@ e1000e_set_ics(E1000ECore *core, int index, uint32_t 
> val)
>  static void
>  e1000e_set_icr(E1000ECore *core, int index, uint32_t val)
>  {
> +uint32_t icr = 0;
>  if ((core->mac[ICR] & E1000_ICR_ASSERTED) &&
>  (core->mac[CTRL_EXT] & E1000_CTRL_EXT_IAME)) {
>  trace_e1000e_irq_icr_process_iame();
>  e1000e_clear_ims_bits(core, core->mac[IAM]);
>  }
>  
> -trace_e1000e_irq_icr_write(val, core->mac[ICR], core->mac[ICR] & ~val);
> -core->mac[ICR] &= ~val;
> +icr = core->mac[ICR] & ~val;
> +icr = (val & E1000_ICR_OTHER) ? (icr & ~E1000_ICR_OTHER_CAUSES) : icr;
> +trace_e1000e_irq_icr_write(val, core->mac[ICR], icr);
> +core->mac[ICR] = icr;
>  e1000e_update_interrupt_state(core);
>  }
>  
> -- 
> 2.8.1.185.gdc0db2c
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH 0/2] Make migration/ram.c target independent

2017-05-18 Thread Juan Quintela

Juan Quintela  wrote:
> Hi

Hi

And I got it wrong subject.

What got target independent is savevm.c

ram.c needs to "cleanup" first include/exec/ram_addr.h

Sorry, Juan.

>
> Only reason that ram.c is compiled by target is because it use
> TARGET_PAGE_BITS.  As we already have a function to export
> TARGET_PAGE_SIZE, do the same.
> After this, we can make it target independent.
>
> Please, review.
>
> Later, Juan.
>
>
>
>
> Juan Quintela (2):
>   exec: Create include for target_page_size()
>   migration: Make savevm.c target independent
>
>  Makefile.target|  2 +-
>  exec.c | 10 ++
>  include/exec/target_page.h | 22 ++
>  include/sysemu/sysemu.h|  1 -
>  migration/Makefile.objs|  2 +-
>  migration/migration.c  |  1 +
>  migration/postcopy-ram.c   |  1 +
>  migration/savevm.c | 15 ---
>  8 files changed, 44 insertions(+), 10 deletions(-)
>  create mode 100644 include/exec/target_page.h

[Qemu-devel] [PATCH v2] gluster: add support for PREALLOC_MODE_FALLOC

2017-05-18 Thread Niels de Vos

Add missing support for "preallocation=falloc" to the Gluster block
driver. This change bases its logic on that of block/file-posix.c and
removed the gluster_supports_zerofill() and qemu_gluster_zerofill()
functiond in favour of #ifdef checks in an easy to read
switch-statement.

Both glfs_zerofill() and glfs_fallocate() have been introduced with
GlusterFS 3.5.0 (pkg-config glusterfs-api = 6). A #define for the
availability of glfs_fallocate() has been added to ./configure.

Reported-by: Satheesaran Sundaramoorthi 
URL: https://bugzilla.redhat.com/1450759
Signed-off-by: Niels de Vos 
---
v2 changes requested by Jeff Cody:
- add CONFIG_GLUSTERFS_FALLOCATE
- remove unneeded wrapper qemu_gluster_zerofill()

 block/gluster.c | 76 ++---
 configure   |  6 +
 2 files changed, 46 insertions(+), 36 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 7c76cd0..0610183 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -964,29 +964,6 @@ static coroutine_fn int 
qemu_gluster_co_pwrite_zeroes(BlockDriverState *bs,
 qemu_coroutine_yield();
 return acb.ret;
 }
-
-static inline bool gluster_supports_zerofill(void)
-{
-return 1;
-}
-
-static inline int qemu_gluster_zerofill(struct glfs_fd *fd, int64_t offset,
-int64_t size)
-{
-return glfs_zerofill(fd, offset, size);
-}
-
-#else
-static inline bool gluster_supports_zerofill(void)
-{
-return 0;
-}
-
-static inline int qemu_gluster_zerofill(struct glfs_fd *fd, int64_t offset,
-int64_t size)
-{
-return 0;
-}
 #endif
 
 static int qemu_gluster_create(const char *filename,
@@ -996,9 +973,10 @@ static int qemu_gluster_create(const char *filename,
 struct glfs *glfs;
 struct glfs_fd *fd;
 int ret = 0;
-int prealloc = 0;
+PreallocMode prealloc;
 int64_t total_size = 0;
 char *tmp = NULL;
+Error *local_err = NULL;
 
 gconf = g_new0(BlockdevOptionsGluster, 1);
 gconf->debug = qemu_opt_get_number_del(opts, GLUSTER_OPT_DEBUG,
@@ -1026,13 +1004,12 @@ static int qemu_gluster_create(const char *filename,
   BDRV_SECTOR_SIZE);
 
 tmp = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
-if (!tmp || !strcmp(tmp, "off")) {
-prealloc = 0;
-} else if (!strcmp(tmp, "full") && gluster_supports_zerofill()) {
-prealloc = 1;
-} else {
-error_setg(errp, "Invalid preallocation mode: '%s'"
- " or GlusterFS doesn't support zerofill API", tmp);
+prealloc = qapi_enum_parse(PreallocMode_lookup, tmp,
+   PREALLOC_MODE__MAX, PREALLOC_MODE_OFF,
+   _err);
+g_free(tmp);
+if (local_err) {
+error_propagate(errp, local_err);
 ret = -EINVAL;
 goto out;
 }
@@ -1041,21 +1018,48 @@ static int qemu_gluster_create(const char *filename,
 O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, S_IRUSR | 
S_IWUSR);
 if (!fd) {
 ret = -errno;
-} else {
+goto out;
+}
+
+switch (prealloc) {
+#ifdef CONFIG_GLUSTERFS_FALLOCATE
+case PREALLOC_MODE_FALLOC:
+if (!glfs_fallocate(fd, 0, 0, total_size)) {
+error_setg(errp, "Could not preallocate data for the new file");
+ret = -errno;
+}
+break;
+#endif /* CONFIG_GLUSTERFS_FALLOCATE */
+#ifdef CONFIG_GLUSTERFS_ZEROFILL
+case PREALLOC_MODE_FULL:
 if (!glfs_ftruncate(fd, total_size)) {
-if (prealloc && qemu_gluster_zerofill(fd, 0, total_size)) {
+if (glfs_zerofill(fd, 0, total_size)) {
+error_setg(errp, "Could not zerofill the new file");
 ret = -errno;
 }
 } else {
+error_setg(errp, "Could not resize file");
 ret = -errno;
 }
-
-if (glfs_close(fd) != 0) {
+break;
+#endif /* CONFIG_GLUSTERFS_ZEROFILL */
+case PREALLOC_MODE_OFF:
+if (glfs_ftruncate(fd, total_size) != 0) {
 ret = -errno;
+error_setg(errp, "Could not resize file");
 }
+break;
+default:
+ret = -EINVAL;
+error_setg(errp, "Unsupported preallocation mode: %s",
+   PreallocMode_lookup[prealloc]);
+break;
+}
+
+if (glfs_close(fd) != 0) {
+ret = -errno;
 }
 out:
-g_free(tmp);
 qapi_free_BlockdevOptionsGluster(gconf);
 glfs_clear_preopened(glfs);
 return ret;
diff --git a/configure b/configure
index 7c020c0..8b8291e 100755
--- a/configure
+++ b/configure
@@ -300,6 +300,7 @@ seccomp=""
 glusterfs=""
 glusterfs_xlator_opt="no"
 glusterfs_discard="no"
+glusterfs_fallocate="no"
 glusterfs_zerofill="no"
 gtk=""
 gtkabi=""
@@ -3576,6 +3577,7 @@ if test "$glusterfs" != "no" ; then
   glusterfs_discard="yes"
 fi
 if $pkg_config

Re: [Qemu-devel] [PATCH v4 2/3] net/rocker: Plug memory leak in pci_rocker_init()

2017-05-18 Thread Mao Zhongyi


Hi, Philippe
Thanks for your quick review:)

On 05/18/2017 01:22 AM, Philippe Mathieu-Daudé wrote:

Hi Mao,

On 05/17/2017 08:12 AM, Mao Zhongyi wrote:

pci_rocker_init() leaks a World when the name more than 9 chars,
then return a negative value directly, doesn't make a correct
cleanup. So add a new goto label to fix it.

Cc: jasow...@redhat.com
Cc: j...@resnulli.us
Cc: f4...@amsat.org
Cc: arm...@redhat.com
Signed-off-by: Mao Zhongyi 
---
 hw/net/rocker/rocker.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/net/rocker/rocker.c b/hw/net/rocker/rocker.c
index b2b6dc7..a382a6f 100644
--- a/hw/net/rocker/rocker.c
+++ b/hw/net/rocker/rocker.c
@@ -1371,7 +1371,8 @@ static int pci_rocker_init(PCIDevice *dev)
 fprintf(stderr,


While here can you update fprintf -> qemu_log_mask?


In the patch 3, fprintf() has been updated to err_setg() which output error 
message
for more user-friendly. I'm not very clear why convert it to qemu_log_mask(). 
Could
you give me more hints about this?

Meanwhile, updating to qemu_log_mask() results in a segmentation fault. Because 
the
World memory was freed for 2 times. The first time is when the name more than 9 
chars,
It was freed by the goto label "err_name_too_long" in the pci_rocker_realize(), 
the
second is freed by qemu_system_reset() which called in the main().

So I think convert it to Error maybe a little better. What do you think?

Thanks
Mao




 "rocker: name too long; please shorten to at most %d chars\n",
 MAX_ROCKER_NAME_LEN);
-return -EINVAL;
+err = -EINVAL;
+goto err_name_too_long;
 }

 if (memcmp(>fp_start_macaddr, , sizeof(zero)) == 0) {
@@ -1430,6 +1431,7 @@ static int pci_rocker_init(PCIDevice *dev)

 return 0;

+err_name_too_long:
 err_duplicate:
 rocker_msix_uninit(r);
 err_msix_init:

Re: [Qemu-devel] [PATCH v2 0/6] gdb updates and cputlb traces

2017-05-18 Thread Stefan Hajnoczi

On Wed, May 17, 2017 at 03:52:53PM +0100, Alex Bennée wrote:
> Hi,
> 
> Here is an update to the cputlb tracing and also a number of gdbstub
> updates. The main changes to the cputlb tracing are making each flush
> an explicit event instead of just dumping counts. This means you can
> so analysis on the delay from queuing work to scheduling at the cost
> of losing the raw count in the MMI interface.
> 
> Alex Bennée (6):
>   scripts/replay-dump.py: replay log dumper
>   scripts/qemu-gdb/timers.py: new helper to dump timer state
>   scripts/qemu-gdb/tcg: new helper to dump tcg state
>   cputlb: remove tlb_flush_count
>   cputlb: add trace events
>   new script/analyse-tlb-flushes-simpletrace.py
> 
>  cputlb.c   |  38 +++-
>  include/exec/cputlb.h  |   1 -
>  scripts/analyse-tlb-flushes-simpletrace.py | 144 +++
>  scripts/qemu-gdb.py|   4 +-
>  scripts/qemugdb/tcg.py |  46 +
>  scripts/qemugdb/timers.py  |  54 ++
>  scripts/replay-dump.py | 272 
> +
>  trace-events   |   7 +
>  translate-all.c|   1 -
>  9 files changed, 559 insertions(+), 8 deletions(-)
>  create mode 100755 scripts/analyse-tlb-flushes-simpletrace.py
>  create mode 100644 scripts/qemugdb/tcg.py
>  create mode 100644 scripts/qemugdb/timers.py
>  create mode 100755 scripts/replay-dump.py
> 
> -- 
> 2.11.0
> 

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PULL 00/13] pci, virtio, vhost: fixes

2017-05-18 Thread Stefan Hajnoczi

On Thu, May 18, 2017 at 12:44:59AM +0300, Michael S. Tsirkin wrote:
> This includes the previous pull request which still
> does not appear to be in - not rebased so merging twice
> will not cause conflicts. Note that patch 08 makes checkpatch
> complain, patch 9 fixes that.
> 
> The following changes since commit 76d20ea0f1b26ebd5da2f5fb2fdf3250cde887bb:
> 
>   Merge remote-tracking branch 'armbru/tags/pull-qapi-2017-05-04-v3' into 
> staging (2017-05-09 15:49:14 -0400)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
> 
> for you to fetch changes up to a764040cc831cfe5b8bf1c80e8341b9bf2de3ce8:
> 
>   exec: abstract address_space_do_translate() (2017-05-18 00:35:15 +0300)
> 
> 
> pci, virtio, vhost: fixes
> 
> A bunch of fixes that missed the release.
> Most notably we are reverting shpc back to enabled by default state
> as guests uses that as an indicator that hotplug is supported
> (even though it's unused). Unfortunately we can't fix this
> on the stable branch since that would break migration.
> 
> Signed-off-by: Michael S. Tsirkin 
> 
> 
> Ard Biesheuvel (2):
>   hw/acpi-defs: replace leading X with x_ in FADT field names
>   hw/arm/virt: generate 64-bit addressable ACPI objects
> 
> Bruce Rogers (1):
>   ACPI: don't call acpi_pcihp_device_plug_cb on xen
> 
> Eduardo Habkost (1):
>   iommu: Don't crash if machine is not PC_MACHINE
> 
> Greg Kurz (1):
>   virtio: allow broken device to notify guest
> 
> Herongguang (Stephen) (1):
>   pci: deassert intx when pci device unrealize
> 
> Igor Mammedov (1):
>   pc/fwcfg: unbreak migration from qemu-2.5 and qemu-2.6 during firmware 
> boot
> 
> Marc-André Lureau (1):
>   libvhost-user: fix crash when rings aren't ready
> 
> Marcel Apfelbaum (1):
>   Revert "hw/pci: disable pci-bridge's shpc by default"
> 
> Michael S. Tsirkin (1):
>   acpi-defs: clean up open brace usage
> 
> Peter Xu (2):
>   pc: add 2.10 machine type
>   exec: abstract address_space_do_translate()
> 
> Zhiyong Yang (1):
>   hw/virtio: fix vhost user fails to startup when MQ
> 
>  contrib/libvhost-user/libvhost-user.h |   6 +-
>  include/hw/acpi/acpi-defs.h   |  45 +++
>  include/hw/acpi/aml-build.h   |   3 +
>  include/hw/compat.h   |   6 +-
>  include/hw/i386/pc.h  |  10 ++--
>  contrib/libvhost-user/libvhost-user.c |  26 +++--
>  exec.c| 103 
> +++---
>  hw/acpi/aml-build.c   |  27 +
>  hw/acpi/piix4.c   |  11 +++-
>  hw/arm/virt-acpi-build.c  |  26 -
>  hw/i386/acpi-build.c  |   4 +-
>  hw/i386/amd_iommu.c   |  15 -
>  hw/i386/intel_iommu.c |  14 -
>  hw/i386/pc.c  |   9 ++-
>  hw/i386/pc_piix.c |  16 +-
>  hw/i386/pc_q35.c  |  14 -
>  hw/pci-bridge/pci_bridge_dev.c|   2 +-
>  hw/pci/pci.c  |   1 +
>  hw/virtio/vhost-user.c|  21 ---
>  hw/virtio/virtio.c|   4 +-
>  tests/bios-tables-test.c  |   4 +-
>  21 files changed, 253 insertions(+), 114 deletions(-)
> 
> 

Thanks, applied to my staging tree:
https://github.com/stefanha/qemu/commits/staging

Stefan


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PULL 0/3] hmp queue

2017-05-18 Thread Stefan Hajnoczi

On Wed, May 17, 2017 at 07:07:51PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> The following changes since commit cdece0467c7cf8e3f4b3c3f0b13bf2c4fea9:
> 
>   block/win32: fix 'ret not initialized' warning (2017-05-16 15:34:18 +0100)
> 
> are available in the git repository at:
> 
>   git://github.com/dagrh/qemu.git tags/pull-hmp-20170517
> 
> for you to fetch changes up to be9b23c4a539090da30b482015ee660850e8bb5f:
> 
>   ramblock: add new hmp command "info ramblock" (2017-05-17 17:31:16 +0100)
> 
> 
> HMP pull
> 
> 
> Peter Xu (3):
>   ramblock: add RAMBLOCK_FOREACH()
>   utils: provide size_to_str()
>   ramblock: add new hmp command "info ramblock"
> 
>  exec.c   | 44 
> +---
>  hmp-commands-info.hx | 14 ++
>  hmp.c|  6 ++
>  hmp.h|  1 +
>  include/exec/ramlist.h   |  6 ++
>  include/qemu-common.h|  1 +
>  migration/ram.c  | 13 +++--
>  qapi/string-output-visitor.c | 22 ++
>  util/cutils.c| 25 +
>  9 files changed, 99 insertions(+), 33 deletions(-)
> 

Peter has already merged pull requests from your RH2 PGP key.  I will
merge it even though gpg doesn't trust the key.

Thanks, applied to my staging tree:
https://github.com/stefanha/qemu/commits/staging

Stefan


signature.asc
Description: PGP signature

Re: [Qemu-devel] [RFC PATCH] target/s390x/cpu_models: Set some additional feature bits for the "qemu" CPU

2017-05-18 Thread Thomas Huth

On 18.05.2017 11:14, David Hildenbrand wrote:
> On 18.05.2017 11:05, Thomas Huth wrote:
>> On 18.05.2017 11:00, Christian Borntraeger wrote:
>>> On 05/18/2017 10:48 AM, David Hildenbrand wrote:
 On 18.05.2017 03:55, Thomas Huth wrote:
> On 17.05.2017 18:49, David Hildenbrand wrote:
>> On 17.05.2017 17:35, Thomas Huth wrote:
>>> Currently we only present the plain z900 feature bits to the guest,
>>> but QEMU already emulates some additional features (but not all of
>>> the next CPU generation, so we can not use the next CPU level as
>>> default yet). Since newer Linux kernels are checking the feature bits
>>> and refuse to work if a required feature is missing, we should present
>>> as much of the supported features as possible when we are running
>>> with the default "qemu" CPU.
>>> This patch now adds the "stfle", "extended immediate" and "stckf"
>>> facility bits to the "qemu" CPU, which are already supported facilities.
>>> It is unfortunately still not enough to run e.g. recent Fedora kernels,
>>> but at least it's a first step into the right direction.
>>>
>>
>> Three things:
>>
>> 1. Should we care about backwards compatibility? I think so. This should
>> be fixed up using compat machine properties. (qemu model is a
>> migration-safe model and could e.g. be used in KVM setups, too).
>
> Theoretically, I agree, but do we really care about backwards
> compatibility at this point in time? All major distro kernels (except
> Debian, I think) currently do not work in QEMU, so there is currently
> not that much that can be migrated...
> And currently, the "qemu" CPU is the very same as the "z900" CPU, so you
> might also get along with simply using "-cpu z900" on the destination
> instead, I guess.

 If possible, I would like to avoid changing migration safe CPU model.
 And I guess it shouldn't be too hard for now (unless we really change
 the base model to e.g. a z9, then some more work might have to be done)

 I think for now, setting "stfle=off" on "s390-cpu-qemu" for compat
 machines should do the trick.

>
>> 2. I would recommend to not enable STFLE for now. Why?
>>
>> It is/was an indication that the system is running on a z9 (and
>> implicitly has the basic features). This was not only done because
>> people were lazy, but because this bit was implicitly connected to other
>> machine properties.
>
> Uh, that's ugly!
>
>> One popular example is the "DAT-enhancement facility 2". It introduced
>> the "LOAD PAGE TABLE ENTRY ADDRESS" instruction, but no facility bit was
>> introduced. SO there is no way to check if the instruction is available
>> and actually working.
>
> Does the Linux kernel use this instruction at all? I just grep'ed
> through the kernel sources and did not find it. If the Linux kernel does
> not use it, I think we should ignore this interdependency and just
> provide the STFLE feature bit to the guest - since recent Linux kernels
> depend on it.

 Yes, current linux doesn't use it, I don't remember if previous versions
 did. Most likely not. The question is if they relied on the stfle==z9
 assumption. The STFLE facility really is special in that sense.

>
>> Please note that we added a feature representation for this facility,
>> because this would allow us later on to at least model removal of such a
>> facility (if HW actually would drop it) on a CPU model level.
>
> What about STFLE bit 78, according to my version of the POP, it says:
>
> "The enhanced-DAT facility 2 is installed in the
>  z/Architecture architectural mode."
>
> ?

 As Aurelien already mentioned, there seemed to be different ways to
 enhance DAT :) enhanced-DAT facility 2 is 2GB page support.

>
>> 3. This introduces some inconsistency. s390x/cpu_models.c:set_feature()
>> explicitly tests for such inconsistencies.
>>
>> So your QEMU CPU model would have a feature, but you would not be able
>> to run that model using QEMU when manually specifying it on the command
>> line. Especially, expanding the "qemu" model and feeding it back to QEMU
>> will fail.
>
> I've checked that I can also successfully disable the features again at
> the command line, using "-cpu qemu,eimm=false" for example, so not sure
> what exactly you're talking about here. Could you please elaborate?

 Assume libvirt/the user expands the CPU model name "qemu" via
 "qmp-expand-cpu-model "qemu", you will get something like

 "z900-base,.,stfle=on"

 If you feed that to QEMU using "-cpu z900-base,...,stfle=on", QEMU will
 detect the inconsistency when setting the property and abort. However,
 "-cpu qemu" will succeed. Please note that these checks actually make

Re: [Qemu-devel] [RFC PATCH] target/s390x/cpu_models: Set some additional feature bits for the "qemu" CPU

2017-05-18 Thread David Hildenbrand

On 18.05.2017 11:05, Thomas Huth wrote:
> On 18.05.2017 11:00, Christian Borntraeger wrote:
>> On 05/18/2017 10:48 AM, David Hildenbrand wrote:
>>> On 18.05.2017 03:55, Thomas Huth wrote:
 On 17.05.2017 18:49, David Hildenbrand wrote:
> On 17.05.2017 17:35, Thomas Huth wrote:
>> Currently we only present the plain z900 feature bits to the guest,
>> but QEMU already emulates some additional features (but not all of
>> the next CPU generation, so we can not use the next CPU level as
>> default yet). Since newer Linux kernels are checking the feature bits
>> and refuse to work if a required feature is missing, we should present
>> as much of the supported features as possible when we are running
>> with the default "qemu" CPU.
>> This patch now adds the "stfle", "extended immediate" and "stckf"
>> facility bits to the "qemu" CPU, which are already supported facilities.
>> It is unfortunately still not enough to run e.g. recent Fedora kernels,
>> but at least it's a first step into the right direction.
>>
>
> Three things:
>
> 1. Should we care about backwards compatibility? I think so. This should
> be fixed up using compat machine properties. (qemu model is a
> migration-safe model and could e.g. be used in KVM setups, too).

 Theoretically, I agree, but do we really care about backwards
 compatibility at this point in time? All major distro kernels (except
 Debian, I think) currently do not work in QEMU, so there is currently
 not that much that can be migrated...
 And currently, the "qemu" CPU is the very same as the "z900" CPU, so you
 might also get along with simply using "-cpu z900" on the destination
 instead, I guess.
>>>
>>> If possible, I would like to avoid changing migration safe CPU model.
>>> And I guess it shouldn't be too hard for now (unless we really change
>>> the base model to e.g. a z9, then some more work might have to be done)
>>>
>>> I think for now, setting "stfle=off" on "s390-cpu-qemu" for compat
>>> machines should do the trick.
>>>

> 2. I would recommend to not enable STFLE for now. Why?
>
> It is/was an indication that the system is running on a z9 (and
> implicitly has the basic features). This was not only done because
> people were lazy, but because this bit was implicitly connected to other
> machine properties.

 Uh, that's ugly!

> One popular example is the "DAT-enhancement facility 2". It introduced
> the "LOAD PAGE TABLE ENTRY ADDRESS" instruction, but no facility bit was
> introduced. SO there is no way to check if the instruction is available
> and actually working.

 Does the Linux kernel use this instruction at all? I just grep'ed
 through the kernel sources and did not find it. If the Linux kernel does
 not use it, I think we should ignore this interdependency and just
 provide the STFLE feature bit to the guest - since recent Linux kernels
 depend on it.
>>>
>>> Yes, current linux doesn't use it, I don't remember if previous versions
>>> did. Most likely not. The question is if they relied on the stfle==z9
>>> assumption. The STFLE facility really is special in that sense.
>>>

> Please note that we added a feature representation for this facility,
> because this would allow us later on to at least model removal of such a
> facility (if HW actually would drop it) on a CPU model level.

 What about STFLE bit 78, according to my version of the POP, it says:

 "The enhanced-DAT facility 2 is installed in the
  z/Architecture architectural mode."

 ?
>>>
>>> As Aurelien already mentioned, there seemed to be different ways to
>>> enhance DAT :) enhanced-DAT facility 2 is 2GB page support.
>>>

> 3. This introduces some inconsistency. s390x/cpu_models.c:set_feature()
> explicitly tests for such inconsistencies.
>
> So your QEMU CPU model would have a feature, but you would not be able
> to run that model using QEMU when manually specifying it on the command
> line. Especially, expanding the "qemu" model and feeding it back to QEMU
> will fail.

 I've checked that I can also successfully disable the features again at
 the command line, using "-cpu qemu,eimm=false" for example, so not sure
 what exactly you're talking about here. Could you please elaborate?
>>>
>>> Assume libvirt/the user expands the CPU model name "qemu" via
>>> "qmp-expand-cpu-model "qemu", you will get something like
>>>
>>> "z900-base,.,stfle=on"
>>>
>>> If you feed that to QEMU using "-cpu z900-base,...,stfle=on", QEMU will
>>> detect the inconsistency when setting the property and abort. However,
>>> "-cpu qemu" will succeed. Please note that these checks actually make
>>> sense for KVM:
>>>
>>
>> Jason (now on cc) has a patch prepared for other reasons that disabled 
>> features
>> for given machines. I kept the

Re: [Qemu-devel] [PATCH v2] .gdbinit: load QEMU sub-commands when gdb starts

2017-05-18 Thread Stefan Hajnoczi

On Wed, May 17, 2017 at 01:40:42PM +0100, Stefan Hajnoczi wrote:
> The scripts/qemu-gdb.py file is not easily discoverable.  Add a .gdbinit
> file so GDB either loads qemu-gdb.py automatically or prints a message
> informing the user how to enable them (some systems disable ./.gdbinit
> loading for security reasons).
> 
> Symlink .gdbinit and the scripts directory in order to make out-of-tree
> builds work.  The scripts directory is used to find the qemu-gdb.py file
> specified by a relative path in .gdbinit.
> 
> Suggested-by: Eric Blake 
> Signed-off-by: Stefan Hajnoczi 
> ---
> v2:
>  * Support out-of-tree builds [Daniel, Markus]
> 
>  configure | 1 +
>  .gdbinit  | 8 
>  2 files changed, 9 insertions(+)
>  create mode 100644 .gdbinit

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PULL 0/7] Migration pull request

2017-05-18 Thread Stefan Hajnoczi

On Wed, May 17, 2017 at 01:13:39PM +0200, Juan Quintela wrote:
> Hi
> 
> This pull request:
> - drop block capability patch (now they is being reviewed)
> - add following patches that got reviewed:
>   * page_cache.c into migration
>   * postcopy stuff into postocpy-ram.c
>   * check_migratable change
> 
> Please apply.
> 
> Thanks, Juan.
> 
> The following changes since commit cdece0467c7cf8e3f4b3c3f0b13bf2c4fea9:
> 
>   block/win32: fix 'ret not initialized' warning (2017-05-16 15:34:18 +0100)
> 
> are available in the git repository at:
> 
>   git://github.com/juanquintela/qemu.git tags/migration/20170517
> 
> for you to fetch changes up to 1bfe5f0586083747f1602931713111673849cd9d:
> 
>   migration: Move check_migratable() into qdev.c (2017-05-17 12:04:59 +0200)
> 
> 
> migration/next for 20170517
> 
> 
> Juan Quintela (7):
>   migration: Fix regression with compression threads
>   migration: Pass Error ** argument to {save,load}_vmstate
>   ram: Rename RAM_SAVE_FLAG_COMPRESS to RAM_SAVE_FLAG_ZERO
>   migration: Create migration/blocker.h
>   migration: Move page_cache.c to migration/
>   migration: Move postcopy stuff to postcopy-ram.c
>   migration: Move check_migratable() into qdev.c
> 
>  Makefile.objs |  1 -
>  block/qcow.c  |  2 +-
>  block/vdi.c   |  2 +-
>  block/vhdx.c  |  2 +-
>  block/vmdk.c  |  2 +-
>  block/vpc.c   |  2 +-
>  block/vvfat.c |  2 +-
>  hmp.c |  9 +++-
>  hw/9pfs/9p.c  |  2 +-
>  hw/core/qdev.c| 20 +++--
>  hw/display/qxl.c  |  2 +-
>  hw/display/virtio-gpu.c   |  2 +-
>  hw/intc/arm_gic_kvm.c |  2 +-
>  hw/intc/arm_gicv3_its_kvm.c   |  2 +-
>  hw/intc/arm_gicv3_kvm.c   |  2 +-
>  hw/misc/ivshmem.c |  2 +-
>  hw/scsi/vhost-scsi.c  |  2 +-
>  hw/virtio/vhost.c |  2 +-
>  include/migration/blocker.h   | 35 +++
>  include/migration/migration.h | 50 --
>  include/migration/vmstate.h   |  2 +
>  include/sysemu/sysemu.h   |  6 +--
>  migration/Makefile.objs   |  2 +-
>  migration/migration.c | 34 +--
>  page_cache.c => migration/page_cache.c|  0
>  {include/migration => migration}/page_cache.h |  0
>  migration/postcopy-ram.c  | 18 
>  migration/postcopy-ram.h  | 26 
>  migration/ram.c   | 35 +--
>  migration/savevm.c| 61 
> +++
>  replay/replay-snapshot.c  |  8 +++-
>  stubs/migr-blocker.c  |  2 +-
>  stubs/vmstate.c   |  5 +--
>  target/i386/kvm.c |  2 +-
>  tests/Makefile.include|  2 +-
>  vl.c  |  4 +-
>  36 files changed, 194 insertions(+), 158 deletions(-)
>  create mode 100644 include/migration/blocker.h
>  rename page_cache.c => migration/page_cache.c (100%)
>  rename {include/migration => migration}/page_cache.h (100%)
> 

Thanks, applied to my staging tree:
https://github.com/stefanha/qemu/commits/staging

Stefan


signature.asc
Description: PGP signature

Re: [Qemu-devel] [RFC PATCH] target/s390x/cpu_models: Set some additional feature bits for the "qemu" CPU

2017-05-18 Thread Thomas Huth

On 18.05.2017 11:00, Christian Borntraeger wrote:
> On 05/18/2017 10:48 AM, David Hildenbrand wrote:
>> On 18.05.2017 03:55, Thomas Huth wrote:
>>> On 17.05.2017 18:49, David Hildenbrand wrote:
 On 17.05.2017 17:35, Thomas Huth wrote:
> Currently we only present the plain z900 feature bits to the guest,
> but QEMU already emulates some additional features (but not all of
> the next CPU generation, so we can not use the next CPU level as
> default yet). Since newer Linux kernels are checking the feature bits
> and refuse to work if a required feature is missing, we should present
> as much of the supported features as possible when we are running
> with the default "qemu" CPU.
> This patch now adds the "stfle", "extended immediate" and "stckf"
> facility bits to the "qemu" CPU, which are already supported facilities.
> It is unfortunately still not enough to run e.g. recent Fedora kernels,
> but at least it's a first step into the right direction.
>

 Three things:

 1. Should we care about backwards compatibility? I think so. This should
 be fixed up using compat machine properties. (qemu model is a
 migration-safe model and could e.g. be used in KVM setups, too).
>>>
>>> Theoretically, I agree, but do we really care about backwards
>>> compatibility at this point in time? All major distro kernels (except
>>> Debian, I think) currently do not work in QEMU, so there is currently
>>> not that much that can be migrated...
>>> And currently, the "qemu" CPU is the very same as the "z900" CPU, so you
>>> might also get along with simply using "-cpu z900" on the destination
>>> instead, I guess.
>>
>> If possible, I would like to avoid changing migration safe CPU model.
>> And I guess it shouldn't be too hard for now (unless we really change
>> the base model to e.g. a z9, then some more work might have to be done)
>>
>> I think for now, setting "stfle=off" on "s390-cpu-qemu" for compat
>> machines should do the trick.
>>
>>>
 2. I would recommend to not enable STFLE for now. Why?

 It is/was an indication that the system is running on a z9 (and
 implicitly has the basic features). This was not only done because
 people were lazy, but because this bit was implicitly connected to other
 machine properties.
>>>
>>> Uh, that's ugly!
>>>
 One popular example is the "DAT-enhancement facility 2". It introduced
 the "LOAD PAGE TABLE ENTRY ADDRESS" instruction, but no facility bit was
 introduced. SO there is no way to check if the instruction is available
 and actually working.
>>>
>>> Does the Linux kernel use this instruction at all? I just grep'ed
>>> through the kernel sources and did not find it. If the Linux kernel does
>>> not use it, I think we should ignore this interdependency and just
>>> provide the STFLE feature bit to the guest - since recent Linux kernels
>>> depend on it.
>>
>> Yes, current linux doesn't use it, I don't remember if previous versions
>> did. Most likely not. The question is if they relied on the stfle==z9
>> assumption. The STFLE facility really is special in that sense.
>>
>>>
 Please note that we added a feature representation for this facility,
 because this would allow us later on to at least model removal of such a
 facility (if HW actually would drop it) on a CPU model level.
>>>
>>> What about STFLE bit 78, according to my version of the POP, it says:
>>>
>>> "The enhanced-DAT facility 2 is installed in the
>>>  z/Architecture architectural mode."
>>>
>>> ?
>>
>> As Aurelien already mentioned, there seemed to be different ways to
>> enhance DAT :) enhanced-DAT facility 2 is 2GB page support.
>>
>>>
 3. This introduces some inconsistency. s390x/cpu_models.c:set_feature()
 explicitly tests for such inconsistencies.

 So your QEMU CPU model would have a feature, but you would not be able
 to run that model using QEMU when manually specifying it on the command
 line. Especially, expanding the "qemu" model and feeding it back to QEMU
 will fail.
>>>
>>> I've checked that I can also successfully disable the features again at
>>> the command line, using "-cpu qemu,eimm=false" for example, so not sure
>>> what exactly you're talking about here. Could you please elaborate?
>>
>> Assume libvirt/the user expands the CPU model name "qemu" via
>> "qmp-expand-cpu-model "qemu", you will get something like
>>
>> "z900-base,.,stfle=on"
>>
>> If you feed that to QEMU using "-cpu z900-base,...,stfle=on", QEMU will
>> detect the inconsistency when setting the property and abort. However,
>> "-cpu qemu" will succeed. Please note that these checks actually make
>> sense for KVM:
>>
> 
> Jason (now on cc) has a patch prepared for other reasons that disabled 
> features
> for given machines. I kept the ESOP example in that patch.
> That would allow us to disable STFLE for old machines but enable it for 2.10
[...]
> Maybe we should split that out

Re: [Qemu-devel] [PULL 0/9] pci, virtio, vhost: fixes

2017-05-18 Thread Stefan Hajnoczi

On Thu, May 18, 2017 at 12:40:01AM +0300, Michael S. Tsirkin wrote:
> On Mon, May 15, 2017 at 04:04:33PM +0300, Michael S. Tsirkin wrote:
> > On Mon, May 15, 2017 at 01:58:40PM +0100, Stefan Hajnoczi wrote:
> > > On Wed, May 10, 2017 at 10:07:58PM +0300, Michael S. Tsirkin wrote:
> > > > The following changes since commit 
> > > > 76d20ea0f1b26ebd5da2f5fb2fdf3250cde887bb:
> > > > 
> > > >   Merge remote-tracking branch 'armbru/tags/pull-qapi-2017-05-04-v3' 
> > > > into staging (2017-05-09 15:49:14 -0400)
> > > > 
> > > > are available in the git repository at:
> > > > 
> > > >   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
> > > > 
> > > > for you to fetch changes up to 8b12e48950a3d59188489b2ff6c5ad9cc09e9866:
> > > > 
> > > >   acpi-defs: clean up open brace usage (2017-05-10 22:04:23 +0300)
> > > > 
> > > > 
> > > > pci, virtio, vhost: fixes
> > > > 
> > > > A bunch of fixes that missed the release.
> > > > 
> > > > Signed-off-by: Michael S. Tsirkin 
> > > 
> > > Please resend with the checkpatch.pl issue fixed.
> > 
> > 
> > There's no issue - new patch followed surrounding code which
> > was not refactored to match checkpatch.
> > Last patch of the series refactors all of that file to match.
> > 
> > We do not need to worry abut bisect for checkpatch issues IMHO.
> 
> Ping - is this going to get merged? I'm preparing the next pull request
> meanwhile ...

Sorry, I forgot about this one.  Applied now.

Stefan


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PULL 0/9] pci, virtio, vhost: fixes

2017-05-18 Thread Stefan Hajnoczi

On Wed, May 10, 2017 at 10:07:58PM +0300, Michael S. Tsirkin wrote:
> The following changes since commit 76d20ea0f1b26ebd5da2f5fb2fdf3250cde887bb:
> 
>   Merge remote-tracking branch 'armbru/tags/pull-qapi-2017-05-04-v3' into 
> staging (2017-05-09 15:49:14 -0400)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
> 
> for you to fetch changes up to 8b12e48950a3d59188489b2ff6c5ad9cc09e9866:
> 
>   acpi-defs: clean up open brace usage (2017-05-10 22:04:23 +0300)
> 
> 
> pci, virtio, vhost: fixes
> 
> A bunch of fixes that missed the release.
> 
> Signed-off-by: Michael S. Tsirkin 
> 
> 
> Ard Biesheuvel (2):
>   hw/acpi-defs: replace leading X with x_ in FADT field names
>   hw/arm/virt: generate 64-bit addressable ACPI objects
> 
> Bruce Rogers (1):
>   ACPI: don't call acpi_pcihp_device_plug_cb on xen
> 
> Eduardo Habkost (1):
>   iommu: Don't crash if machine is not PC_MACHINE
> 
> Igor Mammedov (1):
>   pc/fwcfg: unbreak migration from qemu-2.5 and qemu-2.6 during firmware 
> boot
> 
> Marc-André Lureau (1):
>   libvhost-user: fix crash when rings aren't ready
> 
> Michael S. Tsirkin (1):
>   acpi-defs: clean up open brace usage
> 
> Peter Xu (1):
>   pc: add 2.10 machine type
> 
> Zhiyong Yang (1):
>   hw/virtio: fix vhost user fails to startup when MQ
> 
>  contrib/libvhost-user/libvhost-user.h |  6 ++---
>  include/hw/acpi/acpi-defs.h   | 45 
> ++-
>  include/hw/acpi/aml-build.h   |  3 +++
>  include/hw/i386/pc.h  | 10 
>  contrib/libvhost-user/libvhost-user.c | 26 +++-
>  hw/acpi/aml-build.c   | 27 +
>  hw/acpi/piix4.c   | 11 ++---
>  hw/arm/virt-acpi-build.c  | 26 ++--
>  hw/i386/acpi-build.c  |  4 ++--
>  hw/i386/amd_iommu.c   | 15 +++-
>  hw/i386/intel_iommu.c | 14 +--
>  hw/i386/pc.c  |  9 ---
>  hw/i386/pc_piix.c | 16 ++---
>  hw/i386/pc_q35.c  | 14 +--
>  hw/virtio/vhost-user.c| 21 +---
>  tests/bios-tables-test.c  |  4 ++--
>  16 files changed, 175 insertions(+), 76 deletions(-)
> 
> 

Thanks, applied to my staging tree:
https://github.com/stefanha/qemu/commits/staging

Stefan


signature.asc
Description: PGP signature

Re: [Qemu-devel] [RFC PATCH] target/s390x/cpu_models: Set some additional feature bits for the "qemu" CPU

2017-05-18 Thread Christian Borntraeger

On 05/18/2017 10:48 AM, David Hildenbrand wrote:
> On 18.05.2017 03:55, Thomas Huth wrote:
>> On 17.05.2017 18:49, David Hildenbrand wrote:
>>> On 17.05.2017 17:35, Thomas Huth wrote:
 Currently we only present the plain z900 feature bits to the guest,
 but QEMU already emulates some additional features (but not all of
 the next CPU generation, so we can not use the next CPU level as
 default yet). Since newer Linux kernels are checking the feature bits
 and refuse to work if a required feature is missing, we should present
 as much of the supported features as possible when we are running
 with the default "qemu" CPU.
 This patch now adds the "stfle", "extended immediate" and "stckf"
 facility bits to the "qemu" CPU, which are already supported facilities.
 It is unfortunately still not enough to run e.g. recent Fedora kernels,
 but at least it's a first step into the right direction.

>>>
>>> Three things:
>>>
>>> 1. Should we care about backwards compatibility? I think so. This should
>>> be fixed up using compat machine properties. (qemu model is a
>>> migration-safe model and could e.g. be used in KVM setups, too).
>>
>> Theoretically, I agree, but do we really care about backwards
>> compatibility at this point in time? All major distro kernels (except
>> Debian, I think) currently do not work in QEMU, so there is currently
>> not that much that can be migrated...
>> And currently, the "qemu" CPU is the very same as the "z900" CPU, so you
>> might also get along with simply using "-cpu z900" on the destination
>> instead, I guess.
> 
> If possible, I would like to avoid changing migration safe CPU model.
> And I guess it shouldn't be too hard for now (unless we really change
> the base model to e.g. a z9, then some more work might have to be done)
> 
> I think for now, setting "stfle=off" on "s390-cpu-qemu" for compat
> machines should do the trick.
> 
>>
>>> 2. I would recommend to not enable STFLE for now. Why?
>>>
>>> It is/was an indication that the system is running on a z9 (and
>>> implicitly has the basic features). This was not only done because
>>> people were lazy, but because this bit was implicitly connected to other
>>> machine properties.
>>
>> Uh, that's ugly!
>>
>>> One popular example is the "DAT-enhancement facility 2". It introduced
>>> the "LOAD PAGE TABLE ENTRY ADDRESS" instruction, but no facility bit was
>>> introduced. SO there is no way to check if the instruction is available
>>> and actually working.
>>
>> Does the Linux kernel use this instruction at all? I just grep'ed
>> through the kernel sources and did not find it. If the Linux kernel does
>> not use it, I think we should ignore this interdependency and just
>> provide the STFLE feature bit to the guest - since recent Linux kernels
>> depend on it.
> 
> Yes, current linux doesn't use it, I don't remember if previous versions
> did. Most likely not. The question is if they relied on the stfle==z9
> assumption. The STFLE facility really is special in that sense.
> 
>>
>>> Please note that we added a feature representation for this facility,
>>> because this would allow us later on to at least model removal of such a
>>> facility (if HW actually would drop it) on a CPU model level.
>>
>> What about STFLE bit 78, according to my version of the POP, it says:
>>
>> "The enhanced-DAT facility 2 is installed in the
>>  z/Architecture architectural mode."
>>
>> ?
> 
> As Aurelien already mentioned, there seemed to be different ways to
> enhance DAT :) enhanced-DAT facility 2 is 2GB page support.
> 
>>
>>> 3. This introduces some inconsistency. s390x/cpu_models.c:set_feature()
>>> explicitly tests for such inconsistencies.
>>>
>>> So your QEMU CPU model would have a feature, but you would not be able
>>> to run that model using QEMU when manually specifying it on the command
>>> line. Especially, expanding the "qemu" model and feeding it back to QEMU
>>> will fail.
>>
>> I've checked that I can also successfully disable the features again at
>> the command line, using "-cpu qemu,eimm=false" for example, so not sure
>> what exactly you're talking about here. Could you please elaborate?
> 
> Assume libvirt/the user expands the CPU model name "qemu" via
> "qmp-expand-cpu-model "qemu", you will get something like
> 
> "z900-base,.,stfle=on"
> 
> If you feed that to QEMU using "-cpu z900-base,...,stfle=on", QEMU will
> detect the inconsistency when setting the property and abort. However,
> "-cpu qemu" will succeed. Please note that these checks actually make
> sense for KVM:
> 

Jason (now on cc) has a patch prepared for other reasons that disabled features
for given machines. I kept the ESOP example in that patch.
That would allow us to disable STFLE for old machines but enable it for 2.10

copy/pasted and hand edited to get rid of the unrelated changes:


diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 3c12735..26b0ac9 100644
---

Re: [Qemu-devel] [RFC PATCH] target/s390x/cpu_models: Set some additional feature bits for the "qemu" CPU

2017-05-18 Thread David Hildenbrand

On 17.05.2017 18:26, Aurelien Jarno wrote:
> On 2017-05-17 17:35, Thomas Huth wrote:
>> Currently we only present the plain z900 feature bits to the guest,
>> but QEMU already emulates some additional features (but not all of
>> the next CPU generation, so we can not use the next CPU level as
>> default yet). Since newer Linux kernels are checking the feature bits
>> and refuse to work if a required feature is missing, we should present
>> as much of the supported features as possible when we are running
>> with the default "qemu" CPU.
>> This patch now adds the "stfle", "extended immediate" and "stckf"
>> facility bits to the "qemu" CPU, which are already supported facilities.
>> It is unfortunately still not enough to run e.g. recent Fedora kernels,
>> but at least it's a first step into the right direction.
> 
> This indeed looks like a step in a good direction. At least it makes the
> recently added STFLE emulation useful :-).
> 
>> Signed-off-by: Thomas Huth 
>> ---
>>  target/s390x/cpu_models.c | 29 ++---
>>  1 file changed, 26 insertions(+), 3 deletions(-)
>>
>> diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
>> index 8d27363..18789ab 100644
>> --- a/target/s390x/cpu_models.c
>> +++ b/target/s390x/cpu_models.c
>> @@ -658,6 +658,24 @@ static void check_compatibility(const S390CPUModel 
>> *max_model,
>>"available in the configuration: ");
>>  }
>>  
>> +/**
>> + * The base TCG CPU model "qemu" is based on the z900. However, we already
>> + * can also emulate some additional features of later CPU generations, so
>> + * we add these additional feature bits here.
>> + */
>> +static void add_qemu_cpu_model_features(S390FeatBitmap fbm)
>> +{
>> +int i, feats[] = {
>> +S390_FEAT_STFLE,
>> +S390_FEAT_EXTENDED_IMMEDIATE,
>> +S390_FEAT_STORE_CLOCK_FAST
> 
> According to my list, QEMU also fully implements the following
> facilities:
> 
> S390_FEAT_LONG_DISPLACEMENT
> S390_FEAT_GENERAL_INSTRUCTIONS_EXT
> S390_FEAT_EXECUTE_EXT
> S390_FEAT_STFLE_45
> 

If S390_FEAT_LONG_DISPLACEMENT is supported, so is
S390_FEAT_LONG_DISPLACEMENT_FAST (because TCG obvious has high
performance ;) )

-- 

Thanks,

David

Re: [Qemu-devel] [Qemu-stable] [RESEND PATCH] virtio: allow broken device to notify guest

2017-05-18 Thread Fam Zheng

On Wed, 05/17 10:17, Greg Kurz wrote:
> According to section 2.1.2 of the virtio-1 specification:
> 
> "The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state that
> a reset is needed. If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET,
> the device MUST send a device configuration change notification to the
> driver."
> 
> Commit "f5ed36635d8f virtio: stop virtqueue processing if device is broken"
> introduced a virtio_error() call that just does that:
> 
> - internally mark the device as broken
> - set the DEVICE_NEEDS_RESET bit in the status
> - send a configuration change notification
> 
> Unfortunately, virtio_notify_vector(), called by virtio_notify_config(),
> returns right away when the device is marked as broken and the notification
> isn't sent in this case.
> 
> The spec doesn't say whether a broken device can send notifications
> in other situations or not. But since the driver isn't supposed to do
> anything but to reset the device, it makes sense to keep the check in
> virtio_notify_config().
> 
> Marking the device as broken AFTER the configuration change notification was
> sent is enough to fix the issue.
> 
> Signed-off-by: Greg Kurz 
> Reviewed-by: Cornelia Huck 
> ---
>  hw/virtio/virtio.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Re-sending this patch with Cornelia's Reviewed-by and Cc'ing qemu-stable
> since this is a spec violation, as suggested in:
> 
> Message-ID: <20170427183237-mutt-send-email-...@kernel.org>
> 
> Cheers,
> 
> --
> Greg
> 
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 03592c542a55..890b4d7eb751 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -2451,12 +2451,12 @@ void GCC_FMT_ATTR(2, 3) virtio_error(VirtIODevice 
> *vdev, const char *fmt, ...)
>  error_vreport(fmt, ap);
>  va_end(ap);
>  
> -vdev->broken = true;
> -
>  if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
>  virtio_set_status(vdev, vdev->status | VIRTIO_CONFIG_S_NEEDS_RESET);
>  virtio_notify_config(vdev);
>  }
> +
> +vdev->broken = true;
>  }
>  
>  static void virtio_memory_listener_commit(MemoryListener *listener)
> 
> 

Reviewed-by: Fam Zheng

Re: [Qemu-devel] [RFC PATCH] target/s390x/cpu_models: Set some additional feature bits for the "qemu" CPU

2017-05-18 Thread David Hildenbrand

On 18.05.2017 03:55, Thomas Huth wrote:
> On 17.05.2017 18:49, David Hildenbrand wrote:
>> On 17.05.2017 17:35, Thomas Huth wrote:
>>> Currently we only present the plain z900 feature bits to the guest,
>>> but QEMU already emulates some additional features (but not all of
>>> the next CPU generation, so we can not use the next CPU level as
>>> default yet). Since newer Linux kernels are checking the feature bits
>>> and refuse to work if a required feature is missing, we should present
>>> as much of the supported features as possible when we are running
>>> with the default "qemu" CPU.
>>> This patch now adds the "stfle", "extended immediate" and "stckf"
>>> facility bits to the "qemu" CPU, which are already supported facilities.
>>> It is unfortunately still not enough to run e.g. recent Fedora kernels,
>>> but at least it's a first step into the right direction.
>>>
>>
>> Three things:
>>
>> 1. Should we care about backwards compatibility? I think so. This should
>> be fixed up using compat machine properties. (qemu model is a
>> migration-safe model and could e.g. be used in KVM setups, too).
> 
> Theoretically, I agree, but do we really care about backwards
> compatibility at this point in time? All major distro kernels (except
> Debian, I think) currently do not work in QEMU, so there is currently
> not that much that can be migrated...
> And currently, the "qemu" CPU is the very same as the "z900" CPU, so you
> might also get along with simply using "-cpu z900" on the destination
> instead, I guess.

If possible, I would like to avoid changing migration safe CPU model.
And I guess it shouldn't be too hard for now (unless we really change
the base model to e.g. a z9, then some more work might have to be done)

I think for now, setting "stfle=off" on "s390-cpu-qemu" for compat
machines should do the trick.

> 
>> 2. I would recommend to not enable STFLE for now. Why?
>>
>> It is/was an indication that the system is running on a z9 (and
>> implicitly has the basic features). This was not only done because
>> people were lazy, but because this bit was implicitly connected to other
>> machine properties.
> 
> Uh, that's ugly!
> 
>> One popular example is the "DAT-enhancement facility 2". It introduced
>> the "LOAD PAGE TABLE ENTRY ADDRESS" instruction, but no facility bit was
>> introduced. SO there is no way to check if the instruction is available
>> and actually working.
> 
> Does the Linux kernel use this instruction at all? I just grep'ed
> through the kernel sources and did not find it. If the Linux kernel does
> not use it, I think we should ignore this interdependency and just
> provide the STFLE feature bit to the guest - since recent Linux kernels
> depend on it.

Yes, current linux doesn't use it, I don't remember if previous versions
did. Most likely not. The question is if they relied on the stfle==z9
assumption. The STFLE facility really is special in that sense.

> 
>> Please note that we added a feature representation for this facility,
>> because this would allow us later on to at least model removal of such a
>> facility (if HW actually would drop it) on a CPU model level.
> 
> What about STFLE bit 78, according to my version of the POP, it says:
> 
> "The enhanced-DAT facility 2 is installed in the
>  z/Architecture architectural mode."
> 
> ?

As Aurelien already mentioned, there seemed to be different ways to
enhance DAT :) enhanced-DAT facility 2 is 2GB page support.

> 
>> 3. This introduces some inconsistency. s390x/cpu_models.c:set_feature()
>> explicitly tests for such inconsistencies.
>>
>> So your QEMU CPU model would have a feature, but you would not be able
>> to run that model using QEMU when manually specifying it on the command
>> line. Especially, expanding the "qemu" model and feeding it back to QEMU
>> will fail.
> 
> I've checked that I can also successfully disable the features again at
> the command line, using "-cpu qemu,eimm=false" for example, so not sure
> what exactly you're talking about here. Could you please elaborate?

Assume libvirt/the user expands the CPU model name "qemu" via
"qmp-expand-cpu-model "qemu", you will get something like

"z900-base,.,stfle=on"

If you feed that to QEMU using "-cpu z900-base,...,stfle=on", QEMU will
detect the inconsistency when setting the property and abort. However,
"-cpu qemu" will succeed. Please note that these checks actually make
sense for KVM:

If you're on a z13 and configure a zEC12, you might be tempted to add
e.g. "vx=on". However, IBC (Instruction Blocking Control) in the machine
will block any attempt to execute a vx instruction. So these checks make
sure that only facilities really supported for a machine generation can
be enabled.

If we really want that, we might decide to drop such checks for models <
e.g. z9, because nobody will most likely care.

> 
>> So I am not sure if we should introduce such inconsistencies at that
>> point. Rather fix up the basics and then move the CPU model to a
>>

Re: [Qemu-devel] [PATCH 6/6] spec/vhost-user spec: Add IOMMU support

2017-05-18 Thread Maxime Coquelin




On 05/17/2017 06:48 PM, Michael S. Tsirkin wrote:

On Thu, May 11, 2017 at 02:32:46PM +0200, Maxime Coquelin wrote:

This patch specifies and implements the master/slave communication
to support device IOTLB in slave.

The vhost_iotlb_msg structure introduced for kernel backends is
re-used, making the design close between the two backends.

An exception is the use of the secondary channel to enable the
slave to send IOTLB miss requests to the master.

Signed-off-by: Maxime Coquelin 
---
  docs/specs/vhost-user.txt | 75 +++
  hw/virtio/vhost-user.c| 31 
  2 files changed, 106 insertions(+)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 5fa7016..4a1f0c3 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -97,6 +97,23 @@ Depending on the request type, payload can be:
 log offset: offset from start of supplied file descriptor
 where logging starts (i.e. where guest address 0 would be logged)
  
+ * An IOTLB message

+   -
+   | iova | size | user address | permissions flags | type |
+   -
+
+   IOVA: a 64-bit guest I/O virtual address
+   Size: a 64-bit size
+   User address: a 64-bit user address
+   Permissions flags: a 8-bit bit field:
+- Bit 0: Read access
+- Bit 1: Write access
+   Type: a 8-bit IOTLB message type:
+- 1: IOTLB miss
+- 2: IOTLB update
+- 3: IOTLB invalidate
+- 4: IOTLB access fail
+
  In QEMU the vhost-user message is implemented with the following struct:
  
  typedef struct VhostUserMsg {

@@ -109,6 +126,7 @@ typedef struct VhostUserMsg {
  struct vhost_vring_addr addr;
  VhostUserMemory memory;
  VhostUserLog log;
+struct vhost_iotlb_msg iotlb;
  };
  } QEMU_PACKED VhostUserMsg;
  
@@ -253,6 +271,31 @@ Once the source has finished migration, rings will be stopped by

  the source. No further update must be done before rings are
  restarted.
  
+IOMMU support

+-
+
+When the VIRTIO_F_IOMMU_PLATFORM feature has been negotiated, the master has
+to send IOTLB entries update & invalidation by sending VHOST_USER_IOTLB_MSG
+requests to the slave with a struct vhost_iotlb_msg payload. For update events,
+the iotlb payload has to be filled with the update message type (2), the I/O
+virtual address, the size, the user virtual address, and the permissions
+flags. For invalidation events, the iotlb payload has to be filled with the
+invalidation message type (3), the I/O virtual address and the size. On
+success, the slave is expected to reply with a zero payload, non-zero
+otherwise.
+
+When the VHOST_USER_PROTOCOL_F_SLAVE_REQ is supported by the slave, and the
+master initiated the slave to master communication channel using the
+VHOST_USER_SET_SLAVE_REQ_FD request, the slave can send IOTLB miss and access
+failure events by sending VHOST_USER_SLAVE_IOTLB_MSG requests to the master
+with a struct vhost_iotlb_msg payload. For miss events, the iotlb payload has
+to be filled with the miss message type (1), the I/O virtual address and the
+permissions flags. For access failure event, the iotlb payload has to be
+filled with the access failure message type (4), the I/O virtual address and
+the permissions flags.


I don't think slave should cache invalid entries. If it does not,
how can it detect access failure as opposed to a miss?


Of course, invalid cache entries should not be cached.
The VHOST_IOTLB_ACCESS_FAIL has been specified for the Kernel backend,
even if the latter does not implement it yet.

In the case of user backend, I think it could be used without caching
entries.

After the backend sends a miss requests, the master will send an update
requests.

If for some reasons, the update entry is invalid (e.g. is outside the
memory ranges shared by set_mem_table), it won't be inserted into the
cache. So, when looking again at the cache, the backend will still face
a miss, and so can notify the master of the failing access.

That said, I specified this message type to mimic the kernel backend, I
don't what the master could do of such information.




For synchronization purpose, the slave may rely on the
+reply-ack feature, so the master may send a reply when operation is completed


What does completed mean in this context?


Completed means either the master has sent the IOTLB entry update to the
slave through the master->slave channel, or the IOVA is invalid (so
nothing sent).




+if the reply-ack feature is negotiated and slaves requests a reply.
+


This is not very clear to me.
So slave sends an access to master. Master finds a pte that
overlaps. What does it send to guest? Initial PTE?
All PTEs to cover the request? Part of the PTE that overlaps
the request?


Actually, the slave only sends the IOVA and access permissions, not the
size. So the

Re: [Qemu-devel] [PATCH 3/5] 9pfs: local: simplify file opening

2017-05-18 Thread Greg Kurz

On Tue, 9 May 2017 11:23:05 +0200
Greg Kurz  wrote:

> On Fri, 5 May 2017 12:01:55 -0500
> Eric Blake  wrote:
> 
> > On 05/05/2017 09:37 AM, Greg Kurz wrote:  
> > > All paths in the virtfs directory now start with "./" (except the virtfs
> > > root itself which is exactly ".").
> > > 
> > > We hence don't need to skip leading '/' characters anymore, nor to handle
> > > the empty path case. Also, since virtfs will only ever be supported on
> > > linux+glibc hosts, we can use strchrnul() and come up with a much simplier
> > > code to walk through the path elements. And we don't need to dup() the
> > > passed directory fd.
> > > 
> > > Signed-off-by: Greg Kurz 
> > > ---
> > >  hw/9pfs/9p-local.c |5 -
> > >  hw/9pfs/9p-util.c  |   26 ++
> > >  2 files changed, 10 insertions(+), 21 deletions(-)
> > > 
> > > diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
> > > index 92262f3c3e37..bb6e296df317 100644
> > > --- a/hw/9pfs/9p-local.c
> > > +++ b/hw/9pfs/9p-local.c
> > > @@ -54,11 +54,6 @@ int local_open_nofollow(FsContext *fs_ctx, const char 
> > > *path, int flags,
> > >  {
> > >  LocalData *data = fs_ctx->private;
> > >  
> > > -/* All paths are relative to the path data->mountfd points to */
> > > -while (*path == '/') {
> > > -path++;
> > > -}
> > 
> > Is it worth adding any assert()s in place of the deleted code?
> >   
> 
> The assert() added by this patch ensures that we never pass an empty
> string to relative_openat_nofollow(), which isn't related to this
> hunk of deleted code... so I'm not sure I understand the question :-\
> 

Ping ?

> > Otherwise looks okay.
> >   
> 



pgpTZlfGVTktU.pgp
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 2/5] 9pfs: local: resolve special directories in paths

2017-05-18 Thread Greg Kurz

On Tue, 9 May 2017 11:12:58 +0200
Greg Kurz  wrote:

> On Fri, 5 May 2017 11:59:15 -0500
> Eric Blake  wrote:
> 
> > On 05/05/2017 09:37 AM, Greg Kurz wrote:  
> > > When using the mapped-file security mode, the creds of a path /foo/bar
> > > are stored in the /foo/.virtfs_metadata/bar file. This is okay for all
> > > paths unless they end with '.' or '..', because we cannot create the
> > > corresponding file in the metadata directory.
> > > 
> > > This patch ensures that '.' and '..' are resolved in all paths.
> > > 
> > > The core code only passes path elements (no '/') to the backend, with
> > > the notable exception of the '/' path, which refers to the virtfs root.
> > > This patch preserve the current behavior of converting it to '.' so
[...]
> > > +} else if (!strcmp(name, "..")) {
> > > +if (!strcmp(dir_path->data, ".")) {
> > > +/* ".." relative to the root is "." */
> > > +v9fs_path_sprintf(target, ".");
> > > +} else {
> > > +char *tmp = g_path_get_dirname(dir_path->data);
> > > +/* ".." relative to "foo/bar" is equivalent to "foo" */  
> > >   
> > 
> > True only if bar is not a symlink to some other directory.  What
> > guarantees do you have that you are not going to be inadvertently
> > skipping a traversal through symlinks and thereby picking the wrong
> > location for '..'?
> >   
> 
> My understanding is that symlinks are supposed to be resolved by the client,
> and we shouldn't follow them in the server.
> 

Eric,

Do you have any comment further comment or can I go on with this change ?

Cheers,

--
Greg


pgp8X30D_G7ce.pgp
Description: OpenPGP digital signature

Re: [Qemu-devel] [Qemu-block] [PATCH 2/3] block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()

2017-05-18 Thread Kevin Wolf

Am 18.05.2017 um 10:06 hat Stefan Hajnoczi geschrieben:
> On Thu, May 18, 2017 at 8:57 AM, Kevin Wolf  wrote:
> > Am 17.05.2017 um 22:16 hat Eric Blake geschrieben:
> >> On 05/17/2017 12:09 PM, Stefan Hajnoczi wrote:
> >> > Calling aio_poll() directly may have been fine previously, but this is
> >> > the future, man!
> >>
> >> lol
> >>
> >> >  The difference between an aio_poll() loop and
> >> > BDRV_POLL_WHILE() is that BDRV_POLL_WHILE() releases the AioContext
> >> > around aio_poll().
> >> >
> >> > This allows the IOThread to run fd handlers or BHs to complete the
> >> > request.  Failure to release the AioContext causes deadlocks.
> >> >
> >> > Using BDRV_POLL_WHILE() partially fixes a 'savevm' hang with -object
> >> > iothread.
> >>
> >> I'm surprised at how many separate hangs we actually had!
> >
> > How hard would it be to write some test cases for this? Dataplane has
> > a serious lack of automated testing.
> 
> And this hang doesn't even require in-flight guest I/O, so it would be
> easy to reproduce in qemu-iotests if we add an IOThread mode.

I don't think I would make it a separate mode (because people would run
only one or the other - when did you last run qcow2 v2 tests?), but just
add some test cases that make use of it during the normal -qcow2 or -raw
run.

Kevin

Re: [Qemu-devel] [PATCH 3/3] migration: avoid recursive AioContext locking in save_vmstate()

2017-05-18 Thread Kevin Wolf

Am 17.05.2017 um 19:09 hat Stefan Hajnoczi geschrieben:
> AioContext was designed to allow nested acquire/release calls.  It uses
> a recursive mutex so callers don't need to worry about nesting...or so
> we thought.
> 
> BDRV_POLL_WHILE() is used to wait for block I/O requests.  It releases
> the AioContext temporarily around aio_poll().  This gives IOThreads a
> chance to acquire the AioContext to process I/O completions.
> 
> It turns out that recursive locking and BDRV_POLL_WHILE() don't mix.
> BDRV_POLL_WHILE() only releases the AioContext once, so the IOThread
> will not be able to acquire the AioContext if it was acquired
> multiple times.
> 
> Instead of trying to release AioContext n times in BDRV_POLL_WHILE(),
> this patch simply avoids nested locking in save_vmstate().  It's the
> simplest fix and we should step back to consider the big picture with
> all the recent changes to block layer threading.
> 
> This patch is the final fix to solve 'savevm' hanging with -object
> iothread.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  migration/savevm.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 7f66d58..a70ba20 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2153,6 +2153,14 @@ int save_vmstate(const char *name)
>  goto the_end;
>  }
>  
> +/* The bdrv_all_create_snapshot() call that follows acquires the 
> AioContext
> + * for itself.  BDRV_POLL_WHILE() does not support nested locking because
> + * it only releases the lock once.  Therefore synchronous I/O will 
> deadlock
> + * unless we release the AioContext before bdrv_all_create_snapshot().
> + */
> +aio_context_release(aio_context);
> +aio_context = NULL;
> +
>  ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, );
>  if (ret < 0) {
>  error_report("Error while creating snapshot on '%s'",
> @@ -2163,7 +2171,9 @@ int save_vmstate(const char *name)
>  ret = 0;
>  
>   the_end:
> -aio_context_release(aio_context);
> +if (aio_context) {
> +aio_context_release(aio_context);
> +}
>  if (saved_vm_running) {
>  vm_start();
>  }

It might actually even be true before this patch because the lock is
already only taken for some parts of the function, but don't we need to
call bdrv_drain_all_begin/end() around the whole function now?

We're stopping the VM, so hopefully no device is continuing to process
requests, but can't we still have block jobs, NBD server requests etc.?

And the same is probably true for qemu_loadvm_state().

Kevin

[Qemu-devel] [PATCH 0/3] numa: code consolidation and test fixup

2017-05-18 Thread Igor Mammedov

git repo for testing:
   https://github.com/imammedo/qemu.git cphp_numa_cfg_follow_up_v3_cleanups_v1

CC: qemu-...@nongnu.org
CC: qemu-...@nongnu.org
CC: Eduardo Habkost 
CC: David Gibson 
CC: Andrew Jones 

Igor Mammedov (3):
  numa: consolidate cpu_preplug fixups/checks for pc/arm/spapr
  numa: move default mapping init to machine
  numa: silence incomplete mapping warning under qtest

 include/sysemu/numa.h |  1 +
 hw/arm/virt.c | 16 ++--
 hw/core/machine.c | 35 +++
 hw/i386/pc.c  | 17 +
 hw/ppc/spapr.c| 17 +
 numa.c| 48 ++--
 6 files changed, 50 insertions(+), 84 deletions(-)

-- 
2.7.4

[Qemu-devel] [PATCH 3/3] numa: silence incomplete mapping warning under qtest

2017-05-18 Thread Igor Mammedov

Suggested-by: Markus Armbruster 
Signed-off-by: Igor Mammedov 
---
CC: Markus Armbruster 

---
 hw/core/machine.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 2e91aa9..21ebef8 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -21,6 +21,7 @@
 #include "qemu/error-report.h"
 #include "qemu/cutils.h"
 #include "sysemu/numa.h"
+#include "sysemu/qtest.h"
 
 static char *machine_get_accel(Object *obj, Error **errp)
 {
@@ -732,7 +733,7 @@ static void machine_numa_finish_init(MachineState *machine)
 }
 }
 }
-if (s->len) {
+if (s->len && !qtest_enabled()) {
 error_report("warning: CPU(s) not present in any NUMA nodes: %s",
  s->str);
 error_report("warning: All CPU(s) up to maxcpus should be described "
-- 
2.7.4

[Qemu-devel] [PATCH 2/3] numa: move default mapping init to machine

2017-05-18 Thread Igor Mammedov

there is no need use cpu_index_to_instance_props() for setting
default cpu -> node mapping. Generic machine code can do it
without cpu_index by just enabling already preset defaults
in possible_cpus.

PS:
as bonus it makes one less user of cpu_index_to_instance_props()

Signed-off-by: Igor Mammedov 
---
 hw/core/machine.c | 32 +---
 numa.c| 26 --
 2 files changed, 21 insertions(+), 37 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index fd6a436..2e91aa9 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -700,26 +700,36 @@ static char *cpu_slot_to_string(const CPUArchId *cpu)
 return g_string_free(s, false);
 }
 
-static void machine_numa_validate(MachineState *machine)
+static void machine_numa_finish_init(MachineState *machine)
 {
-int i;
+int i, default_mapping;
 GString *s = g_string_new(NULL);
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(machine);
 
 assert(nb_numa_nodes);
+for (i = possible_cpus->len;
+ i && !possible_cpus->cpus[i - 1].props.has_node_id;
+ i--)
+;;
+default_mapping = !i; /* i == 0 : no explicit mapping provided by user */
+
 for (i = 0; i < possible_cpus->len; i++) {
 const CPUArchId *cpu_slot = _cpus->cpus[i];
 
-/* at this point numa mappings are initilized by CLI options
- * or with default mappings so it's sufficient to list
- * all not yet mapped CPUs here */
-/* TODO: make it hard error in future */
 if (!cpu_slot->props.has_node_id) {
-char *cpu_str = cpu_slot_to_string(cpu_slot);
-g_string_append_printf(s, "%sCPU %d [%s]", s->len ? ", " : "", i,
-   cpu_str);
-g_free(cpu_str);
+if (default_mapping) {
+/* fetch default mapping from board and enable it */
+CpuInstanceProperties props = cpu_slot->props;
+props.has_node_id = true;
+machine_set_cpu_numa_node(machine, , _fatal);
+} else {
+/* record slots with not set mapping */
+char *cpu_str = cpu_slot_to_string(cpu_slot);
+g_string_append_printf(s, "%sCPU %d [%s]",
+   s->len ? ", " : "", i, cpu_str);
+g_free(cpu_str);
+}
 }
 }
 if (s->len) {
@@ -737,7 +747,7 @@ void machine_run_board_init(MachineState *machine)
 MachineClass *machine_class = MACHINE_GET_CLASS(machine);
 
 if (nb_numa_nodes) {
-machine_numa_validate(machine);
+machine_numa_finish_init(machine);
 }
 machine_class->init(machine);
 }
diff --git a/numa.c b/numa.c
index 0115bfd..796cd7d 100644
--- a/numa.c
+++ b/numa.c
@@ -427,7 +427,6 @@ void numa_default_auto_assign_ram(MachineClass *mc, 
NodeInfo *nodes,
 void parse_numa_opts(MachineState *ms)
 {
 int i;
-const CPUArchIdList *possible_cpus;
 MachineClass *mc = MACHINE_GET_CLASS(ms);
 
 if (qemu_opts_foreach(qemu_find_opts("numa"), parse_numa, ms, NULL)) {
@@ -485,31 +484,6 @@ void parse_numa_opts(MachineState *ms)
 
 numa_set_mem_ranges();
 
-/* assign CPUs to nodes using board provided default mapping */
-if (!mc->cpu_index_to_instance_props || !mc->possible_cpu_arch_ids) {
-error_report("default CPUs to NUMA node mapping isn't supported");
-exit(1);
-}
-
-possible_cpus = mc->possible_cpu_arch_ids(ms);
-for (i = 0; i < possible_cpus->len; i++) {
-if (possible_cpus->cpus[i].props.has_node_id) {
-break;
-}
-}
-
-/* no CPUs are assigned to NUMA nodes */
-if (i == possible_cpus->len) {
-for (i = 0; i < max_cpus; i++) {
-CpuInstanceProperties props;
-/* fetch default mapping from board and enable it */
-props = mc->cpu_index_to_instance_props(ms, i);
-props.has_node_id = true;
-
-machine_set_cpu_numa_node(ms, , _fatal);
-}
-}
-
 /* QEMU needs at least all unique node pair distances to build
  * the whole NUMA distance table. QEMU treats the distance table
  * as symmetric by default, i.e. distance A->B == distance B->A.
-- 
2.7.4

[Qemu-devel] [PATCH 1/3] numa: consolidate cpu_preplug fixups/checks for pc/arm/spapr

2017-05-18 Thread Igor Mammedov

Signed-off-by: Igor Mammedov 
---
 include/sysemu/numa.h |  1 +
 hw/arm/virt.c | 16 ++--
 hw/i386/pc.c  | 17 +
 hw/ppc/spapr.c| 17 +
 numa.c| 22 ++
 5 files changed, 27 insertions(+), 46 deletions(-)

diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 7ffde5b..610eece 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -35,4 +35,5 @@ void numa_legacy_auto_assign_ram(MachineClass *mc, NodeInfo 
*nodes,
  int nb_nodes, ram_addr_t size);
 void numa_default_auto_assign_ram(MachineClass *mc, NodeInfo *nodes,
   int nb_nodes, ram_addr_t size);
+void numa_cpu_pre_plug(const CPUArchId *slot, DeviceState *dev, Error **errp);
 #endif
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index c7c8159..ce676df 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1351,7 +1351,6 @@ static void machvirt_init(MachineState *machine)
 for (n = 0; n < possible_cpus->len; n++) {
 Object *cpuobj;
 CPUState *cs;
-int node_id;
 
 if (n >= smp_cpus) {
 break;
@@ -1364,19 +1363,8 @@ static void machvirt_init(MachineState *machine)
 cs = CPU(cpuobj);
 cs->cpu_index = n;
 
-node_id = possible_cpus->cpus[cs->cpu_index].props.node_id;
-if (!possible_cpus->cpus[cs->cpu_index].props.has_node_id) {
-/* by default CPUState::numa_node was 0 if it's not set via CLI
- * keep it this way for now but in future we probably should
- * refuse to start up with incomplete numa mapping */
- node_id = 0;
-}
-if (cs->numa_node == CPU_UNSET_NUMA_NODE_ID) {
-cs->numa_node = node_id;
-} else {
-/* CPU isn't device_add compatible yet, this shouldn't happen */
-error_setg(_abort, "user set node-id not implemented");
-}
+numa_cpu_pre_plug(_cpus->cpus[cs->cpu_index], DEVICE(cpuobj),
+  _fatal);
 
 if (!vms->secure) {
 object_property_set_bool(cpuobj, false, "has_el3", NULL);
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index e36a375..d83c158 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1895,7 +1895,6 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
 DeviceState *dev, Error **errp)
 {
 int idx;
-int node_id;
 CPUState *cs;
 CPUArchId *cpu_slot;
 X86CPUTopoInfo topo;
@@ -1986,21 +1985,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
 cs = CPU(cpu);
 cs->cpu_index = idx;
 
-node_id = cpu_slot->props.node_id;
-if (!cpu_slot->props.has_node_id) {
-/* by default CPUState::numa_node was 0 if it's not set via CLI
- * keep it this way for now but in future we probably should
- * refuse to start up with incomplete numa mapping */
-node_id = 0;
-}
-if (cs->numa_node == CPU_UNSET_NUMA_NODE_ID) {
-cs->numa_node = node_id;
-} else if (cs->numa_node != node_id) {
-error_setg(errp, "node-id %d must match numa node specified"
-"with -numa option for cpu-index %d",
-cs->numa_node, cs->cpu_index);
-return;
-}
+numa_cpu_pre_plug(cpu_slot, dev, errp);
 }
 
 static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0980d73..c7fee8b 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2831,11 +2831,9 @@ static void spapr_core_pre_plug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 MachineClass *mc = MACHINE_GET_CLASS(hotplug_dev);
 Error *local_err = NULL;
 CPUCore *cc = CPU_CORE(dev);
-sPAPRCPUCore *sc = SPAPR_CPU_CORE(dev);
 char *base_core_type = spapr_get_cpu_core_type(machine->cpu_model);
 const char *type = object_get_typename(OBJECT(dev));
 CPUArchId *core_slot;
-int node_id;
 int index;
 
 if (dev->hotplugged && !mc->has_hotpluggable_cpus) {
@@ -2870,20 +2868,7 @@ static void spapr_core_pre_plug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 goto out;
 }
 
-node_id = core_slot->props.node_id;
-if (!core_slot->props.has_node_id) {
-/* by default CPUState::numa_node was 0 if it's not set via CLI
- * keep it this way for now but in future we probably should
- * refuse to start up with incomplete numa mapping */
-node_id = 0;
-}
-if (sc->node_id == CPU_UNSET_NUMA_NODE_ID) {
-sc->node_id = node_id;
-} else if (sc->node_id != node_id) {
-error_setg(_err, "node-id %d must match numa node specified"
-"with -numa option for cpu-index %d", sc->node_id, cc->core_id);
-goto out;
-}
+numa_cpu_pre_plug(core_slot, dev, _err);
 
 out:
 g_free(base_core_type);
diff --git a/numa.c b/numa.c
index ca73145..0115bfd 100644
---

Re: [Qemu-devel] [PATCH 2/2] postcopy: Require RAMBlocks that are whole pages

2017-05-18 Thread Dr. David Alan Gilbert

* Juan Quintela (quint...@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)"  wrote:
> > From: "Dr. David Alan Gilbert" 
> >
> > It turns out that it's legal to create a VM with RAMBlocks that aren't
> > a multiple of the pagesize in use; e.g. a 1025M main memory using
> > 2M host pages.  That breaks postcopy's atomic placement of pages,
> > so disallow it.
> >
> > Signed-off-by: Dr. David Alan Gilbert 
> 
> Reviewed-by: Juan Quintela 

Thanks

> >  }
> >  
> >  /* We don't support postcopy with shared RAM yet */
> > -if (qemu_ram_foreach_block(test_range_shared, NULL)) {
> > +if (qemu_ram_foreach_block(test_ramblock_postcopiable, NULL)) {
> 
> When I was looking at this code, I still don't know why
> qemu_ram_foreach_block() don't pass the block directly.  It needs it
> almost all callers.
> 
> When I saw it I was about to change it, but got sidetracked on other
> things :-p

I think originally it passed very little information at all, and
that RAMBlocks were these mystical things no one outside exec.c
was really supposed to know about.

Dave

> Later, Juan.
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH 3/3] pseries: Improve tracing of CPU compatibility negotiation

2017-05-18 Thread Greg Kurz

On Thu, 18 May 2017 15:45:22 +1000
David Gibson  wrote:

> This makes some improvements to the debug tracepoints around the
> negotiation of CPU compatibility mode during CAS.  The traces are
> reorganized to emphasise what the inputs and outputs of the process are,
> then distinguish the internal state of the two possible negotiation paths
> (current and pre-2.8 machine type compatibility).
> 
> Signed-off-by: David Gibson 
> ---

Reviewed-by: Greg Kurz 

>  hw/ppc/spapr_hcall.c | 14 --
>  hw/ppc/trace-events  |  6 --
>  2 files changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index a790da7..cea5d99 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1109,7 +1109,6 @@ static uint32_t cas_check_pvr_pre_2_9(PowerPCCPU *cpu, 
> target_ulong *addr,
>  break; /* Terminator record */
>  }
>  
> -trace_spapr_cas_pvr_try(pvr);
>  if (!max_lvl &&
>  ((cpu->env.spr[SPR_PVR] & pvr_mask) == (pvr & pvr_mask))) {
>  cpu_match = true;
> @@ -1120,10 +1119,10 @@ static uint32_t cas_check_pvr_pre_2_9(PowerPCCPU 
> *cpu, target_ulong *addr,
>  } else if (!cpu_match) {
>  cas_handle_compat_cpu(pcc, pvr, max_lvl, _lvl, 
> _pvr);
>  }
> +trace_cas_check_pvr_pre_2_9(pvr, pvr_mask, cpu_match,
> +compat_lvl, compat_pvr);
>  }
>  
> -trace_spapr_cas_pvr(cpu->compat_pvr, cpu_match, compat_pvr);
> -
>  return compat_pvr;
>  }
>  
> @@ -1158,6 +1157,7 @@ static uint32_t cas_check_pvr(PowerPCCPU *cpu, 
> target_ulong *addr,
>  best_compat = pvr;
>  }
>  }
> +trace_cas_check_pvr(pvr, pvr_mask, explicit_match, best_compat);
>  }
>  
>  if ((best_compat == 0) && (!explicit_match || max_compat)) {
> @@ -1168,9 +1168,6 @@ static uint32_t cas_check_pvr(PowerPCCPU *cpu, 
> target_ulong *addr,
>  return 0;
>  }
>  
> -/* Parsing finished */
> -trace_spapr_cas_pvr(cpu->compat_pvr, explicit_match, best_compat);
> -
>  return best_compat;
>  }
>  
> @@ -1188,6 +1185,9 @@ static target_ulong 
> h_client_architecture_support(PowerPCCPU *cpu,
>  bool guest_radix;
>  Error *local_err = NULL;
>  
> +trace_cas_check_pvr_start(cpu->compat_pvr, cpu->max_compat,
> +  cpu->env.spr[SPR_PVR]);
> +
>  if (smc->pre_2_9_cas_pvr) {
>  cas_pvr = cas_check_pvr_pre_2_9(cpu, , _err);
>  } else {
> @@ -1198,6 +1198,8 @@ static target_ulong 
> h_client_architecture_support(PowerPCCPU *cpu,
>  return H_HARDWARE;
>  }
>  
> +trace_cas_check_pvr_complete(cpu->compat_pvr, cas_pvr);
> +
>  /* Update CPUs */
>  if (cpu->compat_pvr != cas_pvr) {
>  ppc_set_compat_all(cas_pvr, _err);
> diff --git a/hw/ppc/trace-events b/hw/ppc/trace-events
> index 43d265f..5329740 100644
> --- a/hw/ppc/trace-events
> +++ b/hw/ppc/trace-events
> @@ -14,8 +14,10 @@ spapr_cas_failed(unsigned long n) "DT diff buffer is too 
> small: %ld bytes"
>  spapr_cas_continue(unsigned long n) "Copy changes to the guest: %ld bytes"
>  
>  # hw/ppc/spapr_hcall.c
> -spapr_cas_pvr_try(uint32_t pvr) "%x"
> -spapr_cas_pvr(uint32_t cur_pvr, bool explicit_match, uint32_t new_pvr) 
> "current=%x, explicit_match=%u, new=%x"
> +cas_check_pvr_pre_2_9(uint32_t pvr, uint32_t mask, bool match, unsigned lvl, 
> uint32_t compat_pvr) "0x%08x/0x%08x match=%d lvl=%d compat_pvr=0x%x"
> +cas_check_pvr(uint32_t pvr, uint32_t pvr_mask, bool explicit_match, uint32_t 
> best_compat) "0x%08x/0x%08x explicit_match=%d best_compat=0x%08x"
> +cas_check_pvr_start(uint32_t compat_pvr, uint32_t max_compat, uint32_t 
> phys_pvr) "Initial compat PVR 0x%08x, max compat 0x%08x (real PVR 0x%08x)"
> +cas_check_pvr_complete(uint32_t old_pvr, uint32_t new_pvr) "Compatibility 
> PVR was 0x%08x, now 0x%08x"
>  
>  # hw/ppc/spapr_iommu.c
>  spapr_iommu_put(uint64_t liobn, uint64_t ioba, uint64_t tce, uint64_t ret) 
> "liobn=%"PRIx64" ioba=0x%"PRIx64" tce=0x%"PRIx64" ret=%"PRId64



pgpb0ctIUJSI9.pgp
Description: OpenPGP digital signature

Re: [Qemu-devel] [Qemu-block] [PATCH 2/3] block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()

2017-05-18 Thread Stefan Hajnoczi

On Thu, May 18, 2017 at 8:57 AM, Kevin Wolf  wrote:
> Am 17.05.2017 um 22:16 hat Eric Blake geschrieben:
>> On 05/17/2017 12:09 PM, Stefan Hajnoczi wrote:
>> > Calling aio_poll() directly may have been fine previously, but this is
>> > the future, man!
>>
>> lol
>>
>> >  The difference between an aio_poll() loop and
>> > BDRV_POLL_WHILE() is that BDRV_POLL_WHILE() releases the AioContext
>> > around aio_poll().
>> >
>> > This allows the IOThread to run fd handlers or BHs to complete the
>> > request.  Failure to release the AioContext causes deadlocks.
>> >
>> > Using BDRV_POLL_WHILE() partially fixes a 'savevm' hang with -object
>> > iothread.
>>
>> I'm surprised at how many separate hangs we actually had!
>
> How hard would it be to write some test cases for this? Dataplane has
> a serious lack of automated testing.

And this hang doesn't even require in-flight guest I/O, so it would be
easy to reproduce in qemu-iotests if we add an IOThread mode.

Stefan

Re: [Qemu-devel] [Qemu-block] [PATCH 2/3] block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()

2017-05-18 Thread Stefan Hajnoczi

On Wed, May 17, 2017 at 9:16 PM, Eric Blake  wrote:
> On 05/17/2017 12:09 PM, Stefan Hajnoczi wrote:
>> diff --git a/block/io.c b/block/io.c
>> index cc56e90..f0041cd 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -2031,9 +2031,7 @@ bdrv_rw_vmstate(BlockDriverState *bs, QEMUIOVector 
>> *qiov, int64_t pos,
>>  Coroutine *co = qemu_coroutine_create(bdrv_co_rw_vmstate_entry, 
>> );
>>
>>  bdrv_coroutine_enter(bs, co);
>> -while (data.ret == -EINPROGRESS) {
>> -aio_poll(bdrv_get_aio_context(bs), true);
>> -}
>> +BDRV_POLL_WHILE(bs, data.ret == -EINPROGRESS);
>>  return data.ret;
>>  }
>
> Do we have other culprits (not necessarily for vmsave, but for other
> situations), where we should be using BDRV_POLL_WHILE in separate
> patches? For example, a quick grep show that at least hw/9pfs/9p.c makes
> a direct call to aio_poll(,true) in a while loop.

virtio-9p doesn't use IOThread (dataplane) so it is not affected.  It
also doesn't use BlockDriverState so it cannot use BDRV_POLL_WHILE().

I did grep for other aio_poll() callers and didn't spot any obvious
cases that need to be fixed.  They tend to run in situations where
this deadlock cannot occur.

Stefan

Re: [Qemu-devel] [PATCH 2/3] pseries: Restore PVR negotiation logic for pre-2.9 machine types

2017-05-18 Thread Greg Kurz

On Thu, 18 May 2017 15:45:21 +1000
David Gibson  wrote:

> "pseries" guests go through a hypervisor<->guest feature negotiation during
> early boot.  Part of this is finding a CPU compatibility mode which works
> for both.
> 
> In 152ef80 "pseries: Rewrite CAS PVR compatibility logic" this logic was
> changed to strongly prefer architecture defined CPU compatibility modes,
> rather than CPU "raw" modes.  However, this change was made universally,
> which introduces a guest visible change for the previously existing machine
> types (pseries-2.8 and earlier).  That's never supposed to happen.
> 
> This corrects the behaviour, reverting to the old PVR negotiation logic
> for machine types prior to pseries-2.9.
> 
> Fixes: 152ef803ceb1959e2380a1da7736b935b109222e
> Reported-by: Andrea Bolognani 
> Signed-off-by: David Gibson 
> Reviewed-by: Laurent Vivier 
> ---

Reviewed-by: Greg Kurz 

>  hw/ppc/spapr.c |  3 ++
>  hw/ppc/spapr_hcall.c   | 90 
> +-
>  include/hw/ppc/spapr.h |  1 +
>  3 files changed, 93 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 35dceb0..ad8ddf2 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3313,9 +3313,12 @@ static void 
> spapr_machine_2_8_instance_options(MachineState *machine)
>  
>  static void spapr_machine_2_8_class_options(MachineClass *mc)
>  {
> +sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
> +
>  spapr_machine_2_9_class_options(mc);
>  SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_8);
>  mc->numa_mem_align_shift = 23;
> +smc->pre_2_9_cas_pvr = true;
>  }
>  
>  DEFINE_SPAPR_MACHINE(2_8, "2.8", false);
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 77d2d66..a790da7 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1044,6 +1044,89 @@ static target_ulong h_signal_sys_reset(PowerPCCPU *cpu,
>  }
>  }
>  
> +/*
> + * Old logic for PVR negotiation, used old <2.9 machine types for
> + * compatibility with old qemu versions
> + */
> +#define get_compat_level(cpuver) (  \
> +((cpuver) == CPU_POWERPC_LOGICAL_2_05) ? 2050 : \
> +((cpuver) == CPU_POWERPC_LOGICAL_2_06) ? 2060 :  \
> +((cpuver) == CPU_POWERPC_LOGICAL_2_06_PLUS) ? 2061 :\
> +((cpuver) == CPU_POWERPC_LOGICAL_2_07) ? 2070 : 0)
> +
> +static void cas_handle_compat_cpu(PowerPCCPUClass *pcc, uint32_t pvr,
> +  unsigned max_lvl, unsigned *compat_lvl,
> +  unsigned *compat_pvr)
> +{
> +unsigned lvl = get_compat_level(pvr);
> +bool is205, is206, is207;
> +
> +if (!lvl) {
> +return;
> +}
> +
> +/* If it is a logical PVR, try to determine the highest level */
> +is205 = (pcc->pcr_supported & PCR_COMPAT_2_05) &&
> +(lvl == get_compat_level(CPU_POWERPC_LOGICAL_2_05));
> +is206 = (pcc->pcr_supported & PCR_COMPAT_2_06) &&
> +((lvl == get_compat_level(CPU_POWERPC_LOGICAL_2_06)) ||
> + (lvl == get_compat_level(CPU_POWERPC_LOGICAL_2_06_PLUS)));
> +is207 = (pcc->pcr_supported & PCR_COMPAT_2_07) &&
> +(lvl == get_compat_level(CPU_POWERPC_LOGICAL_2_07));
> +
> +if (is205 || is206 || is207) {
> +if (!max_lvl) {
> +/* User did not set the level, choose the highest */
> +if (*compat_lvl <= lvl) {
> +*compat_lvl = lvl;
> +*compat_pvr = pvr;
> +}
> +} else if (max_lvl >= lvl) {
> +/* User chose the level, don't set higher than this */
> +*compat_lvl = lvl;
> +*compat_pvr = pvr;
> +}
> +}
> +}
> +
> +static uint32_t cas_check_pvr_pre_2_9(PowerPCCPU *cpu, target_ulong *addr,
> +  Error **errp)
> +{
> +PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> +int counter;
> +unsigned max_lvl = get_compat_level(cpu->max_compat);
> +bool cpu_match = false;
> +unsigned compat_lvl = 0, compat_pvr = 0;
> +
> +for (counter = 0; counter < 512; ++counter) {
> +uint32_t pvr, pvr_mask;
> +
> +pvr_mask = ldl_be_phys(_space_memory, *addr);
> +pvr = ldl_be_phys(_space_memory, *addr + 4);
> +*addr += 8;
> +
> +if (~pvr_mask & pvr) {
> +break; /* Terminator record */
> +}
> +
> +trace_spapr_cas_pvr_try(pvr);
> +if (!max_lvl &&
> +((cpu->env.spr[SPR_PVR] & pvr_mask) == (pvr & pvr_mask))) {
> +cpu_match = true;
> +compat_pvr = 0;
> +} else if (pvr == cpu->compat_pvr) {
> +cpu_match = true;
> +compat_pvr = cpu->compat_pvr;
> +} else if (!cpu_match) {
> +cas_handle_compat_cpu(pcc, pvr, max_lvl, _lvl, 
> _pvr);
> +}
> +}
> +
> +

Re: [Qemu-devel] [PATCH 1/3] pseries: Split CAS PVR negotiation out into a separate function

2017-05-18 Thread Greg Kurz

On Thu, 18 May 2017 15:45:20 +1000
David Gibson  wrote:

> Guests of the qemu machine type go through a feature negotiation process
> known as "client architecture support" (CAS) during early boot.  This does
> a number of things, one of which is finding a CPU compatibility mode which
> can be supported by both guest and host.
> 
> In fact the CPU negotiation is probably the single most complex part of the
> CAS process, so this splits it out into a helper function.  We've recently
> made some mistakes in maintaining backward compatibility for old machine
> types here.  Splitting this out will also make it easier to fix this.
> 
> This also adds a possibly useful error message if the negotiation fails
> (i.e. if there isn't a CPU mode that's suitable for both guest and host).
> 
> Signed-off-by: David Gibson 
> ---

Reviewed-by: Greg Kurz 

>  hw/ppc/spapr_hcall.c | 49 -
>  1 file changed, 32 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 2daace4..77d2d66 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1044,19 +1044,13 @@ static target_ulong h_signal_sys_reset(PowerPCCPU 
> *cpu,
>  }
>  }
>  
> -static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> -  sPAPRMachineState *spapr,
> -  target_ulong opcode,
> -  target_ulong *args)
> +static uint32_t cas_check_pvr(PowerPCCPU *cpu, target_ulong *addr,
> +  Error **errp)
>  {
> -target_ulong list = ppc64_phys_to_real(args[0]);
> -target_ulong ov_table;
>  bool explicit_match = false; /* Matched the CPU's real PVR */
>  uint32_t max_compat = cpu->max_compat;
>  uint32_t best_compat = 0;
>  int i;
> -sPAPROptionVector *ov1_guest, *ov5_guest, *ov5_cas_old, *ov5_updates;
> -bool guest_radix;
>  
>  /*
>   * We scan the supplied table of PVRs looking for two things
> @@ -1066,9 +1060,9 @@ static target_ulong 
> h_client_architecture_support(PowerPCCPU *cpu,
>  for (i = 0; i < 512; ++i) {
>  uint32_t pvr, pvr_mask;
>  
> -pvr_mask = ldl_be_phys(_space_memory, list);
> -pvr = ldl_be_phys(_space_memory, list + 4);
> -list += 8;
> +pvr_mask = ldl_be_phys(_space_memory, *addr);
> +pvr = ldl_be_phys(_space_memory, *addr + 4);
> +*addr += 8;
>  
>  if (~pvr_mask & pvr) {
>  break; /* Terminator record */
> @@ -1087,17 +1081,38 @@ static target_ulong 
> h_client_architecture_support(PowerPCCPU *cpu,
>  /* We couldn't find a suitable compatibility mode, and either
>   * the guest doesn't support "raw" mode for this CPU, or raw
>   * mode is disabled because a maximum compat mode is set */
> -return H_HARDWARE;
> +error_setg(errp, "Couldn't negotiate a suitable PVR during CAS");
> +return 0;
>  }
>  
>  /* Parsing finished */
>  trace_spapr_cas_pvr(cpu->compat_pvr, explicit_match, best_compat);
>  
> -/* Update CPUs */
> -if (cpu->compat_pvr != best_compat) {
> -Error *local_err = NULL;
> +return best_compat;
> +}
>  
> -ppc_set_compat_all(best_compat, _err);
> +static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> +  sPAPRMachineState *spapr,
> +  target_ulong opcode,
> +  target_ulong *args)
> +{
> +/* Working address in data buffer */
> +target_ulong addr = ppc64_phys_to_real(args[0]);
> +target_ulong ov_table;
> +uint32_t cas_pvr;
> +sPAPROptionVector *ov1_guest, *ov5_guest, *ov5_cas_old, *ov5_updates;
> +bool guest_radix;
> +Error *local_err = NULL;
> +
> +cas_pvr = cas_check_pvr(cpu, , _err);
> +if (local_err) {
> +error_report_err(local_err);
> +return H_HARDWARE;
> +}
> +
> +/* Update CPUs */
> +if (cpu->compat_pvr != cas_pvr) {
> +ppc_set_compat_all(cas_pvr, _err);
>  if (local_err) {
>  error_report_err(local_err);
>  return H_HARDWARE;
> @@ -1105,7 +1120,7 @@ static target_ulong 
> h_client_architecture_support(PowerPCCPU *cpu,
>  }
>  
>  /* For the future use: here @ov_table points to the first option vector 
> */
> -ov_table = list;
> +ov_table = addr;
>  
>  ov1_guest = spapr_ovec_parse_vector(ov_table, 1);
>  ov5_guest = spapr_ovec_parse_vector(ov_table, 5);



pgpDq3ryyvcmh.pgp
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 2/3] block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()

2017-05-18 Thread Kevin Wolf

Am 17.05.2017 um 22:16 hat Eric Blake geschrieben:
> On 05/17/2017 12:09 PM, Stefan Hajnoczi wrote:
> > Calling aio_poll() directly may have been fine previously, but this is
> > the future, man!
> 
> lol
> 
> >  The difference between an aio_poll() loop and
> > BDRV_POLL_WHILE() is that BDRV_POLL_WHILE() releases the AioContext
> > around aio_poll().
> > 
> > This allows the IOThread to run fd handlers or BHs to complete the
> > request.  Failure to release the AioContext causes deadlocks.
> > 
> > Using BDRV_POLL_WHILE() partially fixes a 'savevm' hang with -object
> > iothread.
> 
> I'm surprised at how many separate hangs we actually had!

How hard would it be to write some test cases for this? Dataplane has
a serious lack of automated testing.

Kevin


pgp5kTyX144I1.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH v3 0/5] mc146818rtc: fix Windows VM clock faster

2017-05-18 Thread Paolo Bonzini


> > Looks great.  Thanks for following up quickly on the reviews.
> 
> Paolo, if it is okay to you, could you please consider to merge this
> patchset? ;)

Yes, I've already queued it, but I don't have many patches so I am
delaying the pull request a bit (until at least I have time to test
vhost-user-scsi...).

In the meanwhile, it would be great if you prepared a qtest for the
problem of too many interrupts when the periodic timer is reprogrammed
continuously.

Thanks,

Paolo

Re: [Qemu-devel] Add Markus Armbrusters code for Broadcom Perhiperals for ARM.

2017-05-18 Thread Markus Armbruster

IANAL, this wouldn't be legal advice even if I was one, yadda, yadda,
here goes anyway:

Eric Blake  writes:

> On 05/17/2017 04:25 PM, John Bradley wrote:
>> Well unfortunately Eric. I don't understand your "top posted" slang.
>
> To learn what top-posting is:
> http://lmgtfy.com/?q=what+is+top-posting
>
> and why we don't like it on technical lists:
> http://www.caliburn.nl/topposting.html
>
> Or more humorously:

Seconded.

> A: Yes.
>> >Q: Are you sure?
>>> >>A: Because it reverses the logical flow of conversation.
 >>>Q: Why is top posting frowned upon?
>
>> 
>> As for his "intent", it is quite irrelevant as I have gone over the code 
>> line by line and what every he intended to do, he has succeed, as far as I 
>> can tell , in matching you standards, to such an extent that I am happy that 
>
> You are correct that the GPL gives us legal rights to use Andrew's code
> without his permission.  And yes, YOU can fork qemu, and take whatever
> GPL patches you want without attribution, and you are probably still
> just fine legally (as long as you still abide by the GPL in that you
> distribute sources to anyone that has your binary).
>
> But our project rules do not allow us to live by just GPL (in part,
> because the license of qemu is sometimes tricky to determine due to a
> mix of GPLv2-only code and non-GPL code, even though most new code is
> GPLv2+).  Also, if we ever had a reason to change license (supposing it
> is even possible, although it might require ripping out or
> reimplementing portions of the code base), having S-o-b means that we
> cannot be accused of applying a license that someone did not agree to.
>
> Therefore, it is easier, pragmatically, even if not legally necessary,
> to enforce proper chain of authorship by getting Signed-off-by: tags on
> ALL patches, especially where a patch asserts a copyright owner, insofar
> as the original copyright owner is still alive and able to assent to the
> action.  It is not just about legalities, it is also about risk-avoidance.
>
> It may sound like we are being hard-nosed (and so be it), but there's a
> reason that we list proper Signed-off-by: rules as our number 1 item on
> the SubmitAPatch page.
> http://wiki.qemu.org/Contribute/SubmitAPatch

Yes, we require patch submitters to provide their Signed-off-by.

We encourage documenting a patch's provenance in full by having every
author provide their S-o-b, whenever practical.  But we don't require
it.

The core purpose of the S-o-b is "to improve tracking of who did what"
by making patch authors formally certify that they "wrote it or
otherwise have the right to pass it on as a open-source patch"[2].  Note
"improve" and "have the right to pass it on".

One of the (many) reasons we're making software free is to help our
neighbor[1] by letting him use our creation on a quid pro quo basis.
Always requiring all author's S-o-b could make incorporating otherwise
free code impractical, and thus would conflict with this mission.

When obtaining an S-o-b is impractical, documenting provenance in other
ways has to do.

John wrote upthread:

Andrew Baumann has and others have release the code under GNU
General Public License version 2 (GPLv2), the same as QEMU that
allows me to added it to QEMU as it is under the same license, by
signing it off this is what I am certifying.

I agree with the reasoning "if free software compatible with the GPLv2,
then I can incorporate it into QEMU as long as I certify by signing
off".

John, thank you for your contribution to QEMU.  I'm sorry your patch got
side-tracked into this non-technical swamp, and hope you understand why
we're rather careful when it comes to protecting the freedom of our
software.  I futher hope that we can put than behind us (along with
top-posting *grin*), and move on to the fun part.

[1] See "The four essential freedoms"
https://www.gnu.org/philosophy/free-sw.en.html
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/process/submitting-patches.rst

Re: [Qemu-devel] [PATCH 3/3] pseries: Improve tracing of CPU compatibility negotiation

2017-05-18 Thread Laurent Vivier

On 18/05/2017 07:45, David Gibson wrote:
> This makes some improvements to the debug tracepoints around the
> negotiation of CPU compatibility mode during CAS.  The traces are
> reorganized to emphasise what the inputs and outputs of the process are,
> then distinguish the internal state of the two possible negotiation paths
> (current and pre-2.8 machine type compatibility).
> 
> Signed-off-by: David Gibson 

Reviewed-by: Laurent Vivier 

> ---
>  hw/ppc/spapr_hcall.c | 14 --
>  hw/ppc/trace-events  |  6 --
>  2 files changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index a790da7..cea5d99 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1109,7 +1109,6 @@ static uint32_t cas_check_pvr_pre_2_9(PowerPCCPU *cpu, 
> target_ulong *addr,
>  break; /* Terminator record */
>  }
>  
> -trace_spapr_cas_pvr_try(pvr);
>  if (!max_lvl &&
>  ((cpu->env.spr[SPR_PVR] & pvr_mask) == (pvr & pvr_mask))) {
>  cpu_match = true;
> @@ -1120,10 +1119,10 @@ static uint32_t cas_check_pvr_pre_2_9(PowerPCCPU 
> *cpu, target_ulong *addr,
>  } else if (!cpu_match) {
>  cas_handle_compat_cpu(pcc, pvr, max_lvl, _lvl, 
> _pvr);
>  }
> +trace_cas_check_pvr_pre_2_9(pvr, pvr_mask, cpu_match,
> +compat_lvl, compat_pvr);
>  }
>  
> -trace_spapr_cas_pvr(cpu->compat_pvr, cpu_match, compat_pvr);
> -
>  return compat_pvr;
>  }
>  
> @@ -1158,6 +1157,7 @@ static uint32_t cas_check_pvr(PowerPCCPU *cpu, 
> target_ulong *addr,
>  best_compat = pvr;
>  }
>  }
> +trace_cas_check_pvr(pvr, pvr_mask, explicit_match, best_compat);
>  }
>  
>  if ((best_compat == 0) && (!explicit_match || max_compat)) {
> @@ -1168,9 +1168,6 @@ static uint32_t cas_check_pvr(PowerPCCPU *cpu, 
> target_ulong *addr,
>  return 0;
>  }
>  
> -/* Parsing finished */
> -trace_spapr_cas_pvr(cpu->compat_pvr, explicit_match, best_compat);
> -
>  return best_compat;
>  }
>  
> @@ -1188,6 +1185,9 @@ static target_ulong 
> h_client_architecture_support(PowerPCCPU *cpu,
>  bool guest_radix;
>  Error *local_err = NULL;
>  
> +trace_cas_check_pvr_start(cpu->compat_pvr, cpu->max_compat,
> +  cpu->env.spr[SPR_PVR]);
> +
>  if (smc->pre_2_9_cas_pvr) {
>  cas_pvr = cas_check_pvr_pre_2_9(cpu, , _err);
>  } else {
> @@ -1198,6 +1198,8 @@ static target_ulong 
> h_client_architecture_support(PowerPCCPU *cpu,
>  return H_HARDWARE;
>  }
>  
> +trace_cas_check_pvr_complete(cpu->compat_pvr, cas_pvr);
> +
>  /* Update CPUs */
>  if (cpu->compat_pvr != cas_pvr) {
>  ppc_set_compat_all(cas_pvr, _err);
> diff --git a/hw/ppc/trace-events b/hw/ppc/trace-events
> index 43d265f..5329740 100644
> --- a/hw/ppc/trace-events
> +++ b/hw/ppc/trace-events
> @@ -14,8 +14,10 @@ spapr_cas_failed(unsigned long n) "DT diff buffer is too 
> small: %ld bytes"
>  spapr_cas_continue(unsigned long n) "Copy changes to the guest: %ld bytes"
>  
>  # hw/ppc/spapr_hcall.c
> -spapr_cas_pvr_try(uint32_t pvr) "%x"
> -spapr_cas_pvr(uint32_t cur_pvr, bool explicit_match, uint32_t new_pvr) 
> "current=%x, explicit_match=%u, new=%x"
> +cas_check_pvr_pre_2_9(uint32_t pvr, uint32_t mask, bool match, unsigned lvl, 
> uint32_t compat_pvr) "0x%08x/0x%08x match=%d lvl=%d compat_pvr=0x%x"
> +cas_check_pvr(uint32_t pvr, uint32_t pvr_mask, bool explicit_match, uint32_t 
> best_compat) "0x%08x/0x%08x explicit_match=%d best_compat=0x%08x"
> +cas_check_pvr_start(uint32_t compat_pvr, uint32_t max_compat, uint32_t 
> phys_pvr) "Initial compat PVR 0x%08x, max compat 0x%08x (real PVR 0x%08x)"
> +cas_check_pvr_complete(uint32_t old_pvr, uint32_t new_pvr) "Compatibility 
> PVR was 0x%08x, now 0x%08x"
>  
>  # hw/ppc/spapr_iommu.c
>  spapr_iommu_put(uint64_t liobn, uint64_t ioba, uint64_t tce, uint64_t ret) 
> "liobn=%"PRIx64" ioba=0x%"PRIx64" tce=0x%"PRIx64" ret=%"PRId64
>

Re: [Qemu-devel] [PATCH 1/3] pseries: Split CAS PVR negotiation out into a separate function

2017-05-18 Thread Laurent Vivier

On 18/05/2017 07:45, David Gibson wrote:
> Guests of the qemu machine type go through a feature negotiation process
> known as "client architecture support" (CAS) during early boot.  This does
> a number of things, one of which is finding a CPU compatibility mode which
> can be supported by both guest and host.
> 
> In fact the CPU negotiation is probably the single most complex part of the
> CAS process, so this splits it out into a helper function.  We've recently
> made some mistakes in maintaining backward compatibility for old machine
> types here.  Splitting this out will also make it easier to fix this.
> 
> This also adds a possibly useful error message if the negotiation fails
> (i.e. if there isn't a CPU mode that's suitable for both guest and host).
> 
> Signed-off-by: David Gibson 

Reviewed-by: Laurent Vivier 

> ---
>  hw/ppc/spapr_hcall.c | 49 -
>  1 file changed, 32 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 2daace4..77d2d66 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1044,19 +1044,13 @@ static target_ulong h_signal_sys_reset(PowerPCCPU 
> *cpu,
>  }
>  }
>  
> -static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> -  sPAPRMachineState *spapr,
> -  target_ulong opcode,
> -  target_ulong *args)
> +static uint32_t cas_check_pvr(PowerPCCPU *cpu, target_ulong *addr,
> +  Error **errp)
>  {
> -target_ulong list = ppc64_phys_to_real(args[0]);
> -target_ulong ov_table;
>  bool explicit_match = false; /* Matched the CPU's real PVR */
>  uint32_t max_compat = cpu->max_compat;
>  uint32_t best_compat = 0;
>  int i;
> -sPAPROptionVector *ov1_guest, *ov5_guest, *ov5_cas_old, *ov5_updates;
> -bool guest_radix;
>  
>  /*
>   * We scan the supplied table of PVRs looking for two things
> @@ -1066,9 +1060,9 @@ static target_ulong 
> h_client_architecture_support(PowerPCCPU *cpu,
>  for (i = 0; i < 512; ++i) {
>  uint32_t pvr, pvr_mask;
>  
> -pvr_mask = ldl_be_phys(_space_memory, list);
> -pvr = ldl_be_phys(_space_memory, list + 4);
> -list += 8;
> +pvr_mask = ldl_be_phys(_space_memory, *addr);
> +pvr = ldl_be_phys(_space_memory, *addr + 4);
> +*addr += 8;
>  
>  if (~pvr_mask & pvr) {
>  break; /* Terminator record */
> @@ -1087,17 +1081,38 @@ static target_ulong 
> h_client_architecture_support(PowerPCCPU *cpu,
>  /* We couldn't find a suitable compatibility mode, and either
>   * the guest doesn't support "raw" mode for this CPU, or raw
>   * mode is disabled because a maximum compat mode is set */
> -return H_HARDWARE;
> +error_setg(errp, "Couldn't negotiate a suitable PVR during CAS");
> +return 0;
>  }
>  
>  /* Parsing finished */
>  trace_spapr_cas_pvr(cpu->compat_pvr, explicit_match, best_compat);
>  
> -/* Update CPUs */
> -if (cpu->compat_pvr != best_compat) {
> -Error *local_err = NULL;
> +return best_compat;
> +}
>  
> -ppc_set_compat_all(best_compat, _err);
> +static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> +  sPAPRMachineState *spapr,
> +  target_ulong opcode,
> +  target_ulong *args)
> +{
> +/* Working address in data buffer */
> +target_ulong addr = ppc64_phys_to_real(args[0]);
> +target_ulong ov_table;
> +uint32_t cas_pvr;
> +sPAPROptionVector *ov1_guest, *ov5_guest, *ov5_cas_old, *ov5_updates;
> +bool guest_radix;
> +Error *local_err = NULL;
> +
> +cas_pvr = cas_check_pvr(cpu, , _err);
> +if (local_err) {
> +error_report_err(local_err);
> +return H_HARDWARE;
> +}
> +
> +/* Update CPUs */
> +if (cpu->compat_pvr != cas_pvr) {
> +ppc_set_compat_all(cas_pvr, _err);
>  if (local_err) {
>  error_report_err(local_err);
>  return H_HARDWARE;
> @@ -1105,7 +1120,7 @@ static target_ulong 
> h_client_architecture_support(PowerPCCPU *cpu,
>  }
>  
>  /* For the future use: here @ov_table points to the first option vector 
> */
> -ov_table = list;
> +ov_table = addr;
>  
>  ov1_guest = spapr_ovec_parse_vector(ov_table, 1);
>  ov5_guest = spapr_ovec_parse_vector(ov_table, 5);
>

Re: [Qemu-devel] [PATCH 3/6] vhost: Update rings information for IOTLB earlier

2017-05-18 Thread Maxime Coquelin




On 05/17/2017 06:41 PM, Michael S. Tsirkin wrote:

On Fri, May 12, 2017 at 01:21:18PM +0200, Maxime Coquelin wrote:


On 05/11/2017 07:33 PM, Michael S. Tsirkin wrote:

On Thu, May 11, 2017 at 02:32:43PM +0200, Maxime Coquelin wrote:

Vhost-kernel backend need to receive IOTLB entries for rings
information early, but vhost-user need the same information
earlier, before VHOST_USER_SET_VRING_ADDR is sent.

Weird. What does VHOST_USER_SET_VRING_ADDR have to do with it?

According to
Starting and stopping rings
in vhost user spec, vhost user does not access
anything until ring is started and enabled.



This patch also trigger IOTLB miss for all rings informations
for robustness, even if in practice these adresses are on the
same page.

Actually, the DPDK vhost-user backend is compliant with the spec,
but when handling VHOST_USER_SET_VRING_ADDR request, it translates the
guest addresses into backend VAs, and check they are valid. I will make the
commit message clearer about this in next revision.

The check could be done later, for example when the ring are started,
but it wouldn't change the need to trigger a miss at some point.

I think it should be done later, yes. As long as ring is not
started addresses should not be interpreted.



Ok, then I'll move these addresses translations in the
VHOST_USER_SET_VRING_KICK handler.

Thanks,
Maxime

Re: [Qemu-devel] [PATCH V5 7/9] migration: calculate vCPU blocktime on dst side

2017-05-18 Thread Alexey

On Tue, May 16, 2017 at 12:34:16PM +0100, Dr. David Alan Gilbert wrote:
> * Alexey Perevalov (a.pereva...@samsung.com) wrote:
> > This patch provides blocktime calculation per vCPU,
> > as a summary and as a overlapped value for all vCPUs.
> > 
> > This approach was suggested by Peter Xu, as an improvements of
> > previous approch where QEMU kept tree with faulted page address and cpus 
> > bitmask
> > in it. Now QEMU is keeping array with faulted page address as value and vCPU
> > as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
> > list for blocktime per vCPU (could be traced with page_fault_addr)
> > 
> > Blocktime will not calculated if postcopy_blocktime field of
> > MigrationIncomingState wasn't initialized.
> > 
> > Signed-off-by: Alexey Perevalov 
> 
> I have some multi-threading/ordering worries still.
> 
> The fault thread receives faults over the ufd and calls
> mark_postcopy_blocktime_being.  That's fine.
> 
> The receiving thread receives pages, calls place page, and
> calls mark_postcopy_blocktime_end.  That's also fine.
> 
> However, remember that we send pages from the source without
> them being requested as background transfers; consider:
> 
> 
> Source   receive-thread  fault-thread
> 
>   1  Send A
>   2  Receive A
>   3Access A
>   4Report on UFD
>   5  Place
>   6Read UFD entry
> 
> 
>  Placing and reading UFD race - and up till now that's been fine;
> so we can read off the ufd an address that's already on it's way from
> the source, and which we might just be receiving, or that we might
> have already placed.
> 
> In this code at (6) won't you call mark_postcopy_blocktime_start
> even though it's already been placed at (5) ? Then that blocktime
> will stay set until the end of the run?
> 
> Perhaps that's not a problem; if mark_postcopy_blocktime_end is called
> for a different address it wont count the blocktime; and when
> mark_postcopy_blocktime_start is called for a different address it'll
> remove the addres that was a problem above - so perhaps that's fine?
It's not 100% fine, but I'm going to clarify my previous answer to that
email where I wrote "forever". That mechanism will think vCPU is blocked
until the same vCPU will block/page copied again.
Unfortunately we don't know vCPU index at *_end time, and I don't want
to extend struct uffdio_copy and add pid into it. But now I have only
expensive and robust or not expensive and not robust solution, like
keeping list of page addressed which was faulted (or just one page
address, the latest, taking into account _end, _start sequence should be
quick, and no other pages interpose, but it's assumption).

BTW with tree based solution, proposed in the first version, was possible to
lookup node by pageaddr in _end and mark it as populated.



> 
> 
> > ---
> >  migration/postcopy-ram.c | 87 
> > +++-
> >  migration/trace-events   |  5 ++-
> >  2 files changed, 90 insertions(+), 2 deletions(-)
> > 
> > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > index a1f1705..e2660ae 100644
> > --- a/migration/postcopy-ram.c
> > +++ b/migration/postcopy-ram.c
> > @@ -23,6 +23,7 @@
> >  #include "migration/postcopy-ram.h"
> >  #include "sysemu/sysemu.h"
> >  #include "sysemu/balloon.h"
> > +#include 
> >  #include "qemu/error-report.h"
> >  #include "trace.h"
> >  
> > @@ -542,6 +543,86 @@ static int ram_block_enable_notify(const char 
> > *block_name, void *host_addr,
> >  return 0;
> >  }
> >  
> > +static int get_mem_fault_cpu_index(uint32_t pid)
> > +{
> > +CPUState *cpu_iter;
> > +
> > +CPU_FOREACH(cpu_iter) {
> > +if (cpu_iter->thread_id == pid) {
> > +return cpu_iter->cpu_index;
> > +}
> > +}
> > +trace_get_mem_fault_cpu_index(pid);
> > +return -1;
> > +}
> > +
> > +static void mark_postcopy_blocktime_begin(uint64_t addr, int cpu)
> > +{
> > +MigrationIncomingState *mis = migration_incoming_get_current();
> > +PostcopyBlocktimeContext *dc;
> > +int64_t now_ms;
> > +if (!mis->blocktime_ctx || cpu < 0) {
> > +return;
> > +}
> 
> You might consider:
> 
>  PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
>  int64_t now_ms;
>  if (!dc || cpu < 0) {
>  return;
>  }
> 
> it gets rid of the two reads of mis->blocktime_ctx
> (You do something similar in a few places)
> 
> > +now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > +dc = mis->blocktime_ctx;
> > +if (dc->vcpu_addr[cpu] == 0) {
> > +atomic_inc(>smp_cpus_down);
> > +}
> > +
> > +atomic_xchg__nocheck(>vcpu_addr[cpu], addr);
> > +atomic_xchg__nocheck(>last_begin, now_ms);
> > +atomic_xchg__nocheck(>page_fault_vcpu_time[cpu], now_ms);
> > +
> > +

Re: [Qemu-devel] [RFC PATCH] target/s390x/cpu_models: Set some additional feature bits for the "qemu" CPU

2017-05-18 Thread Aurelien Jarno

On 2017-05-18 03:55, Thomas Huth wrote:
> On 17.05.2017 18:49, David Hildenbrand wrote:
> > On 17.05.2017 17:35, Thomas Huth wrote:
> > 2. I would recommend to not enable STFLE for now. Why?
> > 
> > It is/was an indication that the system is running on a z9 (and
> > implicitly has the basic features). This was not only done because
> > people were lazy, but because this bit was implicitly connected to other
> > machine properties.
> 
> Uh, that's ugly!
> 
> > One popular example is the "DAT-enhancement facility 2". It introduced
> > the "LOAD PAGE TABLE ENTRY ADDRESS" instruction, but no facility bit was
> > introduced. SO there is no way to check if the instruction is available
> > and actually working.
> 
> Does the Linux kernel use this instruction at all? I just grep'ed
> through the kernel sources and did not find it. If the Linux kernel does
> not use it, I think we should ignore this interdependency and just
> provide the STFLE feature bit to the guest - since recent Linux kernels
> depend on it.
> 
> > Please note that we added a feature representation for this facility,
> > because this would allow us later on to at least model removal of such a
> > facility (if HW actually would drop it) on a CPU model level.
> 
> What about STFLE bit 78, according to my version of the POP, it says:
> 
> "The enhanced-DAT facility 2 is installed in the
>  z/Architecture architectural mode."

No that is different. IBM has chosen very confusing names for the DAT
related facilities:
- DAT-Enhancement Facility 1 => bits 3, 4 & 5
- DAT-Enhancement Facility 2 
- Enhanced-DAT Facility 1=> bit 8
- Enhanced-DAT Facility 2=> bit 78

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [RFC 0/2] pseries: Correct behaviour change for older machine types

2017-05-18 Thread Andrea Bolognani

On Thu, 2017-05-18 at 14:51 +1000, David Gibson wrote:
> > I gave it a spin and I can no longer reproduce the issue,
> > so for the entire series:
> > 
> > Tested-by: Andrea Bolognani 
> 
> I'm actually kind of astonished it worked at all given the bug that
> Laurent pointed out.

¯\_(ツ)_/¯

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [Qemu-devel] [PATCH V5 4/9] migration: split ufd_version_check onto receive/request features part

2017-05-18 Thread Alexey

On Tue, May 16, 2017 at 11:32:51AM +0100, Dr. David Alan Gilbert wrote:
> * Alexey Perevalov (a.pereva...@samsung.com) wrote:
> > This modification is necessary for userfault fd features which are
> > required to be requested from userspace.
> > UFFD_FEATURE_THREAD_ID is a one of such "on demand" feature, which will
> > be introduced in the next patch.
> > 
> > QEMU need to use separate userfault file descriptor, due to
> > userfault context has internal state, and after first call of
> > ioctl UFFD_API it changes its state to UFFD_STATE_RUNNING (in case of
> > success), but
> > kernel while handling ioctl UFFD_API expects UFFD_STATE_WAIT_API. So
> > only one ioctl with UFFD_API is possible per ufd.
> > 
> > Signed-off-by: Alexey Perevalov 
> > ---
> >  migration/postcopy-ram.c | 82 
> > ++--
> >  1 file changed, 73 insertions(+), 9 deletions(-)
> > 
> > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > index 0f75700..c96d5f5 100644
> > --- a/migration/postcopy-ram.c
> > +++ b/migration/postcopy-ram.c
> > @@ -60,32 +60,96 @@ struct PostcopyDiscardState {
> >  #include 
> >  #include 
> >  
> > -static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> > +
> > +/*
> > + * Check userfault fd features, to request only supported features in
> > + * future.
> > + * __NR_userfaultfd - should be checked before
> > + * Return obtained features
> 
> That's not quite right;
>  * Returns: True on success, sets *features to supported features
> False on failure or if kernel doesn't support ufd
> 
yes, obtained features is out parameter,
but I want to keep false uncommented and just add error_report into
syscall check, because the possible reason of failure is:
1. No syscall userfaultfd, but function expects that syscall, it reflects in
comment
2  Within syscall:  exhausted fd or out of memory (file in kernel
is allocating)
3. Problem in ioctl due to internal state of UFFD, as example
UFFDIO_API after UFFDIO_REGISTER

Also I would prefer follow migration/ram.c comment style.
> > + */
> > +static bool receive_ufd_features(uint64_t *features)
> >  {
> > -struct uffdio_api api_struct;
> > -uint64_t ioctl_mask;
> > +struct uffdio_api api_struct = {0};
> > +int ufd;
> > +bool ret = true;
> > +
> > +/* if we are here __NR_userfaultfd should exists */
> > +ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
> > +if (ufd == -1) {
> > +return false;
> > +}
> >  
> > +/* ask features */
> >  api_struct.api = UFFD_API;
> >  api_struct.features = 0;
> >  if (ioctl(ufd, UFFDIO_API, _struct)) {
> > -error_report("%s: UFFDIO_API failed: %s", __func__
> > +error_report("%s: UFFDIO_API failed: %s", __func__,
> >   strerror(errno));
> > +ret = false;
> > +goto release_ufd;
> > +}
> > +
> > +*features = api_struct.features;
> > +
> > +release_ufd:
> > +close(ufd);
> > +return ret;
> > +}
> 
> Needs a comment; perhaps something like:
>   * Called once on a newly opened ufd, can request specific features.
>   * Returns: True on success
> 
> > +static bool request_ufd_features(int ufd, uint64_t features)
> > +{
> > +struct uffdio_api api_struct = {0};
> > +uint64_t ioctl_mask;
> > +
> > +api_struct.api = UFFD_API;
> > +api_struct.features = features;
> > +if (ioctl(ufd, UFFDIO_API, _struct)) {
> > +error_report("%s failed: UFFDIO_API failed: %s", __func__,
> > +strerror(errno));
> >  return false;
> >  }
> >  
> > -ioctl_mask = (__u64)1 << _UFFDIO_REGISTER |
> > - (__u64)1 << _UFFDIO_UNREGISTER;
> > +ioctl_mask = 1 << _UFFDIO_REGISTER |
> > + 1 << _UFFDIO_UNREGISTER;
> >  if ((api_struct.ioctls & ioctl_mask) != ioctl_mask) {
> >  error_report("Missing userfault features: %" PRIx64,
> >   (uint64_t)(~api_struct.ioctls & ioctl_mask));
> >  return false;
> >  }
> >  
> > +return true;
> > +}
> > +
> > +static bool ufd_check_and_apply(int ufd, MigrationIncomingState *mis)
> > +{
> > +uint64_t asked_features = 0;
> > +uint64_t supported_features;
> > +
> > +/*
> > + * it's not possible to
> > + * request UFFD_API twice per one fd
> > + */
> > +if (!receive_ufd_features(_features)) {
> > +error_report("%s failed", __func__);
> > +return false;
> > +}
> > +
> > +/*
> > + * request features, even if asked_features is 0, due to
> > + * kernel expects UFFD_API before UFFDIO_REGISTER, per
> > + * userfault file descriptor
> > + */
> > +if (!request_ufd_features(ufd, asked_features)) {
> > +error_report("%s failed: features %" PRIu64, __func__,
> > +asked_features);
> > +return false;
> > +}
> > +
> >  if (getpagesize() != ram_pagesize_summary()) {
> >  bool

Re: [Qemu-devel] [RFC PATCH v1 6/6] spapr: Fix migration of Radix guests

2017-05-18 Thread David Gibson

On Thu, May 18, 2017 at 10:33:05AM +0530, Bharata B Rao wrote:
> On Wed, May 17, 2017 at 05:20:31PM +1000, David Gibson wrote:
> > On Wed, May 17, 2017 at 12:45:39PM +0530, Bharata B Rao wrote:
> > > On Wed, May 17, 2017 at 05:00:49PM +1000, David Gibson wrote:
> > > > On Wed, May 17, 2017 at 09:19:22AM +0530, Bharata B Rao wrote:
> > > > > Fix migration of radix guests by ensuring that we issue
> > > > > KVM_PPC_CONFIGURE_V3_MMU for radix case post migration.
> > > > > 
> > > > > Reported-by: Nageswara R Sastry 
> > > > > Signed-off-by: Bharata B Rao 
> > > > > ---
> > > > >  hw/ppc/spapr.c | 15 +++
> > > > >  hw/ppc/spapr_hcall.c   |  1 +
> > > > >  include/hw/ppc/spapr.h |  1 +
> > > > >  3 files changed, 17 insertions(+)
> > > > > 
> > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > > > index 05abfc1..dd1d687 100644
> > > > > --- a/hw/ppc/spapr.c
> > > > > +++ b/hw/ppc/spapr.c
> > > > > @@ -1443,6 +1443,20 @@ static int spapr_post_load(void *opaque, int 
> > > > > version_id)
> > > > >  err = spapr_rtc_import_offset(>rtc, 
> > > > > spapr->rtc_offset);
> > > > >  }
> > > > >  
> > > > > +if (spapr->patb_entry) {
> > > > > +if (kvmppc_has_cap_mmu_radix() && kvm_enabled()) {
> > > > > +err = kvmppc_configure_v3_mmu(POWERPC_CPU(first_cpu),
> > > > > +  spapr->patb_flags &
> > > > > +  SPAPR_PROC_TABLE_RADIX,
> > > > > +  spapr->patb_flags &
> > > > > +  SPAPR_PROC_TABLE_GTSE,
> > > > 
> > > > You should be able to work out the things you need here from
> > > > patb_entry without adding the new patb_flags field.
> > > 
> > > kvmppc_configure_v3_mmu() needs two bools: radix and gtse. The radix
> > > bit can be implied from patb_entry, I needed patb_flags to get the
> > > gtse value. Not immediately obvious of how to get gtse bit from 
> > > patb_entry,
> > > but let me take a relook.
> > 
> > Oh, sorry, you can't get GTSE from the patb_entry, but you can get it
> > from the LPCR.
> 
> So I need to get GTSE from LPCR at the target after the LPCR has been updated
> from the incoming stream (via vmstate_ppc_cpu vmsd). However we are also
> in the middle of processing the incoming stream here (via spapr_post_load).
> Can we be sure that spapr_post_load() processing happens after all
> SPRs (and hence LPCR) have been updated via vmstate_ppc_cpu vmsd handlers ?

Yes, you can.  An object's post_load is called after all that object's
state - and that of all objects below it in the QOM composition tree -
is transferred.  Since the CPUs are below the machine in the
composition tree, you can rely on the LPCR being correct in the
machine post_load handler.

> Also, in addition to issuing KVM_PPC_CONFIGURE_V3_MMU, should we be
> walking all the CPUs and setting the LPCR_UPRT and LPCR_GTSE bits like how
> H_REGISTER_PROC_TBL does ?

Shouldn't be necessary - the LPCR state should already be transferred,
along with the rest of the SPRs.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH] xics_kvm: cache already enabled vCPU ids

2017-05-18 Thread David Gibson

On Wed, May 17, 2017 at 04:38:20PM +0200, Greg Kurz wrote:
> Since commit a45863bda90d ("xics_kvm: Don't enable KVM_CAP_IRQ_XICS if
> already enabled"), we were able to re-hotplug a vCPU that had been hot-
> unplugged ealier, thanks to a boolean flag in ICPState that we set when
> enabling KVM_CAP_IRQ_XICS.
> 
> This could work because the lifecycle of all ICPState objects was the
> same as the machine. Commit 5bc8d26de20c ("spapr: allocate the ICPState
> object from under sPAPRCPUCore") broke this assumption and now we always
> pass a freshly allocated ICPState object (ie, with the flag unset) to
> icp_kvm_cpu_setup().
> 
> This cause re-hotplug to fail with:
> 
> Unable to connect CPU8 to kernel XICS: Device or resource busy
> 
> Let's fix this by caching all the vCPU ids for which KVM_CAP_IRQ_XICS was
> enabled. This also drops the now useless boolean flag from ICPState.
> 
> Reported-by: Laurent Vivier 
> Signed-off-by: Greg Kurz 

Applied to ppc-for-2.10.

> ---
>  hw/intc/xics_kvm.c|   27 ---
>  include/hw/ppc/xics.h |1 -
>  2 files changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
> index dd93531ae376..dd7f29846235 100644
> --- a/hw/intc/xics_kvm.c
> +++ b/hw/intc/xics_kvm.c
> @@ -42,6 +42,14 @@
>  
>  static int kernel_xics_fd = -1;
>  
> +typedef struct KVMEnabledICP {
> +unsigned long vcpu_id;
> +QLIST_ENTRY(KVMEnabledICP) node;
> +} KVMEnabledICP;
> +
> +static QLIST_HEAD(, KVMEnabledICP)
> +kvm_enabled_icps = QLIST_HEAD_INITIALIZER(_enabled_icps);
> +
>  /*
>   * ICP-KVM
>   */
> @@ -121,6 +129,8 @@ static void icp_kvm_reset(void *dev)
>  static void icp_kvm_cpu_setup(ICPState *icp, PowerPCCPU *cpu)
>  {
>  CPUState *cs = CPU(cpu);
> +KVMEnabledICP *enabled_icp;
> +unsigned long vcpu_id = kvm_arch_vcpu_id(cs);
>  int ret;
>  
>  if (kernel_xics_fd == -1) {
> @@ -132,18 +142,21 @@ static void icp_kvm_cpu_setup(ICPState *icp, PowerPCCPU 
> *cpu)
>   * which was hot-removed earlier we don't have to renable
>   * KVM_CAP_IRQ_XICS capability again.
>   */
> -if (icp->cap_irq_xics_enabled) {
> -return;
> +QLIST_FOREACH(enabled_icp, _enabled_icps, node) {
> +if (enabled_icp->vcpu_id == vcpu_id) {
> +return;
> +}
>  }
>  
> -ret = kvm_vcpu_enable_cap(cs, KVM_CAP_IRQ_XICS, 0, kernel_xics_fd,
> -  kvm_arch_vcpu_id(cs));
> +ret = kvm_vcpu_enable_cap(cs, KVM_CAP_IRQ_XICS, 0, kernel_xics_fd, 
> vcpu_id);
>  if (ret < 0) {
> -error_report("Unable to connect CPU%ld to kernel XICS: %s",
> - kvm_arch_vcpu_id(cs), strerror(errno));
> +error_report("Unable to connect CPU%ld to kernel XICS: %s", vcpu_id,
> + strerror(errno));
>  exit(1);
>  }
> -icp->cap_irq_xics_enabled = true;
> +enabled_icp = g_malloc(sizeof(*enabled_icp));
> +enabled_icp->vcpu_id = vcpu_id;
> +QLIST_INSERT_HEAD(_enabled_icps, enabled_icp, node);
>  }
>  
>  static void icp_kvm_realize(DeviceState *dev, Error **errp)
> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> index d6cb51f3ad5d..a3073f90533a 100644
> --- a/include/hw/ppc/xics.h
> +++ b/include/hw/ppc/xics.h
> @@ -81,7 +81,6 @@ struct ICPState {
>  uint8_t pending_priority;
>  uint8_t mfrr;
>  qemu_irq output;
> -bool cap_irq_xics_enabled;
>  
>  XICSFabric *xics;
>  };
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

< 1 2 3

201 - 282 of 282 matches

Mail list logo