Re: [Qemu-devel] [PATCH 13/18] arch_init: adjust ram_save_setup() for migrate_is_localhost

2013-08-23 Thread Lei Li

On 08/21/2013 06:48 PM, Paolo Bonzini wrote:

Il 21/08/2013 09:18, Lei Li ha scritto:

Send all the ram blocks hooked by save_page, which will copy
ram page and MADV_DONTNEED the page just copied.

You should implement this entirely in the hook.

It will be a little less efficient because of the dirty bitmap overhead,
but you should aim at having *zero* changes in arch_init.c and migration.c.


Yes, the reason I modify the migration_thread() to have new process that send 
all
the ram pages in adjusted qemu_savevm_state_begin stage and send device states 
in
qemu_savevm_device_state stage for localhost migration is to avoid the bitmap 
thing,
which is a little less efficient just like you mentioned above.

The performance assurance is very important to this feature, our goal is 100ms
of downtime for a 1TB guest.



Paolo


Signed-off-by: Lei Li li...@linux.vnet.ibm.com
---
  arch_init.c |   19 +--
  1 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 434a4ca..cbbb4db 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -474,7 +474,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
  /* In doubt sent page as normal */
  bytes_sent = -1;
  ret = ram_control_save_page(f, block-offset,
-   offset, TARGET_PAGE_SIZE, bytes_sent);
+offset, TARGET_PAGE_SIZE, bytes_sent);
  
  if (ret != RAM_SAVE_CONTROL_NOT_SUPP) {

  if (ret != RAM_SAVE_CONTROL_DELAYED) {
@@ -613,11 +613,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
  RAMBlock *block;
  int64_t ram_pages = last_ram_offset()  TARGET_PAGE_BITS;
  
-migration_bitmap = bitmap_new(ram_pages);

-bitmap_set(migration_bitmap, 0, ram_pages);
-migration_dirty_pages = ram_pages;
-mig_throttle_on = false;
-dirty_rate_high_cnt = 0;
+if (!migrate_is_localhost()) {
+migration_bitmap = bitmap_new(ram_pages);
+bitmap_set(migration_bitmap, 0, ram_pages);
+migration_dirty_pages = ram_pages;
+mig_throttle_on = false;
+dirty_rate_high_cnt = 0;
+}
  
  if (migrate_use_xbzrle()) {

  XBZRLE.cache = cache_init(migrate_xbzrle_cache_size() /
@@ -641,6 +643,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
  migration_bitmap_sync();
  qemu_mutex_unlock_iothread();
  
+if (migrate_is_localhost()) {

+ram_save_blocks(f);
+return 0;
+}
+
  qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
  
  QTAILQ_FOREACH(block, ram_list.blocks, next) {







--
Lei




Re: [Qemu-devel] [PATCH 06/18] bugfix: wrong error set by ram_control_load_hook()

2013-08-23 Thread Lei Li

On 08/23/2013 01:34 PM, Paolo Bonzini wrote:

On 08/21/2013 06:40 PM, Paolo Bonzini wrote:

Il 21/08/2013 09:18, Lei Li ha scritto:

It should set negative error value if there has been an error.

Signed-off-by: Lei Li li...@linux.vnet.ibm.com
---
   savevm.c |2 +-
   1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/savevm.c b/savevm.c
index 1522d95..f10e031 100644
--- a/savevm.c
+++ b/savevm.c
@@ -649,7 +649,7 @@ void ram_control_after_iterate(QEMUFile *f, uint64_t
flags)
   
   void ram_control_load_hook(QEMUFile *f, uint64_t flags)

   {
-int ret = 0;
+int ret = -1;

-EINVAL

OK, thanks.

Can you extract the patches that I reviewed positively, plus this
one, and send them separately?


Sure, I will send it later. :)



Thanks!

Paolo




--
Lei




Re: [Qemu-devel] [PATCH] block: Fix race in gluster_finish_aiocb

2013-08-23 Thread Bharata B Rao
On Wed, Aug 21, 2013 at 05:40:11PM +0200, Paolo Bonzini wrote:
 
 We could just use a bottom half, too.  Add a bottom half to acb,
 schedule it in gluster_finish_aiocb, delete it in the bottom half's own
 callback.

I tried this approach (the patch at the end of this mail), but see this
occasional errors, doesn't happen always. Any clues on what am I missing here ?

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffcbfff700 (LWP 11301)]
0x5597156b in event_notifier_set (e=0xa4)
at util/event_notifier-posix.c:97
97  ret = write(e-wfd, value, sizeof(value));
(gdb) bt
#0  0x5597156b in event_notifier_set (e=0xa4)
at util/event_notifier-posix.c:97
#1  0x555dd3cd in aio_notify (ctx=0x0) at async.c:233
#2  0x555dd00f in qemu_bh_schedule (bh=0x7fff38e0) at async.c:128
#3  0x556002da in gluster_finish_aiocb (fd=0x562ae5d0, ret=4096, 
arg=0x7fffd00419c0) at block/gluster.c:409
#4  0x75e70cdc in glfs_io_async_cbk (ret=4096, frame=0x0, data=
0x56443ee0) at glfs-fops.c:567
#5  0x75c2843e in synctask_wrap (old_task=0x56365940)
at syncop.c:133
#6  0x73b03370 in ?? () from /lib64/libc.so.6
#7  0x in ?? ()
(gdb) up
#1  0x555dd3cd in aio_notify (ctx=0x0) at async.c:233
233 event_notifier_set(ctx-notifier);
(gdb) up
#2  0x555dd00f in qemu_bh_schedule (bh=0x7fff38e0) at async.c:128
128 aio_notify(bh-ctx);
(gdb) p *bh
$1 = {ctx = 0x0, cb = 0x555ffdcd qemu_gluster_aio_bh, opaque = 
0x7fffd00419c0, next = 0x56345e70, scheduled = false, idle = false, 
  deleted = true}


gluster: Use BH mechanism for AIO completion

Replace the existing pipe based AIO completion handing by BH based method.

Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
---
 block/gluster.c |   69 ++-
 1 file changed, 8 insertions(+), 61 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 46f36f8..598b335 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -30,7 +30,6 @@ typedef struct GlusterAIOCB {
 
 typedef struct BDRVGlusterState {
 struct glfs *glfs;
-int fds[2];
 struct glfs_fd *fd;
 int event_reader_pos;
 GlusterAIOCB *event_acb;
@@ -231,12 +230,13 @@ out:
 return NULL;
 }
 
-static void qemu_gluster_complete_aio(GlusterAIOCB *acb, BDRVGlusterState *s)
+static void qemu_gluster_aio_bh(void *opaque)
 {
 int ret;
+GlusterAIOCB *acb = opaque;
 bool *finished = acb-finished;
 BlockDriverCompletionFunc *cb = acb-common.cb;
-void *opaque = acb-common.opaque;
+void *cb_opaque = acb-common.opaque;
 
 if (!acb-ret || acb-ret == acb-size) {
 ret = 0; /* Success */
@@ -246,33 +246,15 @@ static void qemu_gluster_complete_aio(GlusterAIOCB *acb, 
BDRVGlusterState *s)
 ret = -EIO; /* Partial read/write - fail it */
 }
 
+qemu_bh_delete(acb-bh);
 qemu_aio_release(acb);
-cb(opaque, ret);
+
+cb(cb_opaque, ret);
 if (finished) {
 *finished = true;
 }
 }
 
-static void qemu_gluster_aio_event_reader(void *opaque)
-{
-BDRVGlusterState *s = opaque;
-ssize_t ret;
-
-do {
-char *p = (char *)s-event_acb;
-
-ret = read(s-fds[GLUSTER_FD_READ], p + s-event_reader_pos,
-   sizeof(s-event_acb) - s-event_reader_pos);
-if (ret  0) {
-s-event_reader_pos += ret;
-if (s-event_reader_pos == sizeof(s-event_acb)) {
-s-event_reader_pos = 0;
-qemu_gluster_complete_aio(s-event_acb, s);
-}
-}
-} while (ret  0  errno == EINTR);
-}
-
 /* TODO Convert to fine grained options */
 static QemuOptsList runtime_opts = {
 .name = gluster,
@@ -329,17 +311,7 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict 
*options,
 s-fd = glfs_open(s-glfs, gconf-image, open_flags);
 if (!s-fd) {
 ret = -errno;
-goto out;
-}
-
-ret = qemu_pipe(s-fds);
-if (ret  0) {
-ret = -errno;
-goto out;
 }
-fcntl(s-fds[GLUSTER_FD_READ], F_SETFL, O_NONBLOCK);
-qemu_aio_set_fd_handler(s-fds[GLUSTER_FD_READ],
-qemu_gluster_aio_event_reader, NULL, s);
 
 out:
 qemu_opts_del(opts);
@@ -417,31 +389,10 @@ static const AIOCBInfo gluster_aiocb_info = {
 static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret, void *arg)
 {
 GlusterAIOCB *acb = (GlusterAIOCB *)arg;
-BlockDriverState *bs = acb-common.bs;
-BDRVGlusterState *s = bs-opaque;
-int retval;
 
 acb-ret = ret;
-retval = qemu_write_full(s-fds[GLUSTER_FD_WRITE], acb, sizeof(acb));
-if (retval != sizeof(acb)) {
-/*
- * Gluster AIO callback thread failed to notify the waiting
- * QEMU thread about IO completion.
- *
- * Complete this IO request and make the disk inaccessible for
- * subsequent reads and 

Re: [Qemu-devel] VNC key presses not correct

2013-08-23 Thread Erik Rull
 On August 23, 2013 at 7:42 AM Markus Armbruster arm...@redhat.com wrote:


 Anthony Liguori anth...@codemonkey.ws writes:

  On Aug 22, 2013 4:55 PM, Erik Rull erik.r...@rdsoftware.de wrote:
 
  Markus Armbruster wrote:
 
  Erik Rull erik.r...@rdsoftware.de writes:
 
  Markus Armbruster wrote:
 
  Erik Rull erik.r...@rdsoftware.de writes:
 
  Hi all,
 
  I'm struggling with the QEMU VNC on qemu-kvm-1.2.0 a bit, the following
  two
  things are not working properly:
 
 
  Have you tried to reproduce on a current version?
 
 
  Hello Markus,
 
  yes, I was able to reproduce this with the released qemu-1.6.0 from
  the qemu-wiki website. Character mistakes are the same when opening a
  Notepad on Windows XP on the guest.
 
 
  Thanks!
 
  Maybe nobody cares because english is the layout used nearly everywhere
  :-)
 
 
  I nobody cared, we wouldn't ship it :)
 
  Cc'ing Gerd, who knows more about VNC than I do.
 
  And it is not related to the host operating system, I used a Debian
  6.0 and a self built linux - both with the same effect.
  And it is not related to the VNC client, I tried several clients on
  several operating systems, all with the same result.
 
  Do you see this effect with a non-english input layout on your side, too?
  When using SDL/direct input everything is great.
 
 
  Your initial report has details on some keypresses.  Please also tell us
  your complete QEMU command line, and your complete VNC client command
  line(s).
 
 
  Sure, here it is:
  ./qemu-system-x86_64 -drive file=../../../qemu-img/windows.img,cache=off
  -readconfig ../docs/ich9-ehci-uhci.cfg -device usb-tablet,bus=ehci.0 -boot
  c -no-acpi -monitor telnet::5100,server,nowait -vnc :1 -m 1024 --enable-kvm
  -cpu host
 
  You need to specify a keymap with -k

 Yes, because...

  Client is e.g. Ultra VNC 32 bit under Windows XP V 1.0.8.0 = Just enter
  the IP::Port and connect. Or RealVNC in the same way.
  Client OS has german keyboard and german keyboard layout - on the QEMU
  guest side it doesn't matter which input language is set.
  Guest is Windows XP SP3 32 bit.

 ... you use a VNC client that doesn't understand the extended key event
 extension.  The extension enables 100% faithful key interpretation, -k
 can only approximate it.

 https://www.berrange.com/posts/2010/07/04/more-than-you-or-i-ever-wanted-to-know-about-virtual-keyboard-handling/


Thanks, that's it! Updated to the latest VNC client version - works (btw.
TigerVNC crashes when sending Ctrl+Alt+Del). One thing has to be done due to the
default english keyboard layout: switch the input language on the client side to
english - then all keypresses are interpreted correctly. This is acceptable due
to the multilanguage user group of the guest and less effort than rebooting the
guest when a user with a different keyboard layout logs in :-)

Best regards,

Erik



[Qemu-devel] [PATCH] migration: Fix debug print type

2013-08-23 Thread Christoffer Dall
The printf args are uint64_t and with -Werr QEMU doesn't compile with
migration debugging turned on unless this is fixed.  Fix it.

Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
---
 migration.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration.c b/migration.c
index 1402fa7..0d8706f 100644
--- a/migration.c
+++ b/migration.c
@@ -566,7 +566,7 @@ static void *migration_thread(void *opaque)
 if (!qemu_file_rate_limit(s-file)) {
 DPRINTF(iterate\n);
 pending_size = qemu_savevm_state_pending(s-file, max_size);
-DPRINTF(pending size %lu max %lu\n, pending_size, max_size);
+DPRINTF(pending size %llu max %llu\n, pending_size, max_size);
 if (pending_size  pending_size = max_size) {
 qemu_savevm_state_iterate(s-file);
 } else {
-- 
1.7.10.4




Re: [Qemu-devel] [PATCH] migration: Fix debug print type

2013-08-23 Thread Stefan Weil
See my comment below.

Am 23.08.2013 09:06, schrieb Christoffer Dall:
 The printf args are uint64_t and with -Werr QEMU doesn't compile with
 migration debugging turned on unless this is fixed.  Fix it.

 Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
 ---
  migration.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/migration.c b/migration.c
 index 1402fa7..0d8706f 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -566,7 +566,7 @@ static void *migration_thread(void *opaque)
  if (!qemu_file_rate_limit(s-file)) {
  DPRINTF(iterate\n);
  pending_size = qemu_savevm_state_pending(s-file, max_size);
 -DPRINTF(pending size %lu max %lu\n, pending_size, max_size);
 +DPRINTF(pending size %llu max %llu\n, pending_size, max_size);

Please use PRIu64 for uint64_t.

  if (pending_size  pending_size = max_size) {
  qemu_savevm_state_iterate(s-file);
  } else {




Re: [Qemu-devel] [PATCH] block: Fix race in gluster_finish_aiocb

2013-08-23 Thread Paolo Bonzini
Il 23/08/2013 08:48, Bharata B Rao ha scritto:
 On Wed, Aug 21, 2013 at 05:40:11PM +0200, Paolo Bonzini wrote:

 We could just use a bottom half, too.  Add a bottom half to acb,
 schedule it in gluster_finish_aiocb, delete it in the bottom half's own
 callback.
 
 I tried this approach (the patch at the end of this mail), but see this
 occasional errors, doesn't happen always. Any clues on what am I missing here 
 ?
 
 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x7fffcbfff700 (LWP 11301)]
 0x5597156b in event_notifier_set (e=0xa4)
 at util/event_notifier-posix.c:97
 97ret = write(e-wfd, value, sizeof(value));
 (gdb) bt
 #0  0x5597156b in event_notifier_set (e=0xa4)
 at util/event_notifier-posix.c:97
 #1  0x555dd3cd in aio_notify (ctx=0x0) at async.c:233
 #2  0x555dd00f in qemu_bh_schedule (bh=0x7fff38e0) at async.c:128
 #3  0x556002da in gluster_finish_aiocb (fd=0x562ae5d0, ret=4096, 
 arg=0x7fffd00419c0) at block/gluster.c:409
 #4  0x75e70cdc in glfs_io_async_cbk (ret=4096, frame=0x0, data=
 0x56443ee0) at glfs-fops.c:567
 #5  0x75c2843e in synctask_wrap (old_task=0x56365940)
 at syncop.c:133
 #6  0x73b03370 in ?? () from /lib64/libc.so.6
 #7  0x in ?? ()
 (gdb) up
 #1  0x555dd3cd in aio_notify (ctx=0x0) at async.c:233
 233   event_notifier_set(ctx-notifier);
 (gdb) up
 #2  0x555dd00f in qemu_bh_schedule (bh=0x7fff38e0) at async.c:128
 128   aio_notify(bh-ctx);
 (gdb) p *bh
 $1 = {ctx = 0x0, cb = 0x555ffdcd qemu_gluster_aio_bh, opaque = 
 0x7fffd00419c0, next = 0x56345e70, scheduled = false, idle = false, 
   deleted = true}

This looks like a use-after-free, with bh-ctx corrupted when freeing
the bottom half.  But it's not at all obvious how it can happen.

I suggest using MALLOC_PERTURB_=42 to check this theory (if it is
correct, most fields will be something like 0x2a2a2a2a2a2a2a2a).  But I
don't see anything clearly wrong in the patch... Thus perhaps it is
simpler to just remove the unreachable error handling code.

Paolo

 
 
 gluster: Use BH mechanism for AIO completion
 
 Replace the existing pipe based AIO completion handing by BH based method.
 
 Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
 ---
  block/gluster.c |   69 
 ++-
  1 file changed, 8 insertions(+), 61 deletions(-)
 
 diff --git a/block/gluster.c b/block/gluster.c
 index 46f36f8..598b335 100644
 --- a/block/gluster.c
 +++ b/block/gluster.c
 @@ -30,7 +30,6 @@ typedef struct GlusterAIOCB {
  
  typedef struct BDRVGlusterState {
  struct glfs *glfs;
 -int fds[2];
  struct glfs_fd *fd;
  int event_reader_pos;
  GlusterAIOCB *event_acb;
 @@ -231,12 +230,13 @@ out:
  return NULL;
  }
  
 -static void qemu_gluster_complete_aio(GlusterAIOCB *acb, BDRVGlusterState *s)
 +static void qemu_gluster_aio_bh(void *opaque)
  {
  int ret;
 +GlusterAIOCB *acb = opaque;
  bool *finished = acb-finished;
  BlockDriverCompletionFunc *cb = acb-common.cb;
 -void *opaque = acb-common.opaque;
 +void *cb_opaque = acb-common.opaque;
  
  if (!acb-ret || acb-ret == acb-size) {
  ret = 0; /* Success */
 @@ -246,33 +246,15 @@ static void qemu_gluster_complete_aio(GlusterAIOCB 
 *acb, BDRVGlusterState *s)
  ret = -EIO; /* Partial read/write - fail it */
  }
  
 +qemu_bh_delete(acb-bh);
  qemu_aio_release(acb);
 -cb(opaque, ret);
 +
 +cb(cb_opaque, ret);
  if (finished) {
  *finished = true;
  }
  }
  
 -static void qemu_gluster_aio_event_reader(void *opaque)
 -{
 -BDRVGlusterState *s = opaque;
 -ssize_t ret;
 -
 -do {
 -char *p = (char *)s-event_acb;
 -
 -ret = read(s-fds[GLUSTER_FD_READ], p + s-event_reader_pos,
 -   sizeof(s-event_acb) - s-event_reader_pos);
 -if (ret  0) {
 -s-event_reader_pos += ret;
 -if (s-event_reader_pos == sizeof(s-event_acb)) {
 -s-event_reader_pos = 0;
 -qemu_gluster_complete_aio(s-event_acb, s);
 -}
 -}
 -} while (ret  0  errno == EINTR);
 -}
 -
  /* TODO Convert to fine grained options */
  static QemuOptsList runtime_opts = {
  .name = gluster,
 @@ -329,17 +311,7 @@ static int qemu_gluster_open(BlockDriverState *bs,  
 QDict *options,
  s-fd = glfs_open(s-glfs, gconf-image, open_flags);
  if (!s-fd) {
  ret = -errno;
 -goto out;
 -}
 -
 -ret = qemu_pipe(s-fds);
 -if (ret  0) {
 -ret = -errno;
 -goto out;
  }
 -fcntl(s-fds[GLUSTER_FD_READ], F_SETFL, O_NONBLOCK);
 -qemu_aio_set_fd_handler(s-fds[GLUSTER_FD_READ],
 -qemu_gluster_aio_event_reader, NULL, s);
  
  out:
  qemu_opts_del(opts);
 @@ -417,31 +389,10 @@ static const AIOCBInfo gluster_aiocb_info = {
  static void 

Re: [Qemu-devel] [PATCH 07/18] arch_init: export RAM_SAVE_xxx flags

2013-08-23 Thread Paolo Bonzini
Il 22/08/2013 22:14, Michael R. Hines ha scritto:
 On 08/21/2013 06:49 AM, Paolo Bonzini wrote:
 Il 21/08/2013 09:18, Lei Li ha scritto:
 Export RAM_SAVE_xxx flags for localhost migration.

 Signed-off-by: Lei Li li...@linux.vnet.ibm.com
 ---
   arch_init.c   |   12 
   include/migration/migration.h |   14 ++
   2 files changed, 14 insertions(+), 12 deletions(-)

 diff --git a/arch_init.c b/arch_init.c
 index 68a7ab7..1ea7c29 100644
 --- a/arch_init.c
 +++ b/arch_init.c
 @@ -108,18 +108,6 @@ static bool mig_throttle_on;
   static int dirty_rate_high_cnt;
   static void check_guest_throttling(void);
   -/***/
 -/* ram save/restore */
 -
 -#define RAM_SAVE_FLAG_FULL 0x01 /* Obsolete, not used anymore */
 -#define RAM_SAVE_FLAG_COMPRESS 0x02
 -#define RAM_SAVE_FLAG_MEM_SIZE 0x04
 -#define RAM_SAVE_FLAG_PAGE 0x08
 -#define RAM_SAVE_FLAG_EOS  0x10
 -#define RAM_SAVE_FLAG_CONTINUE 0x20
 -#define RAM_SAVE_FLAG_XBZRLE   0x40
 -/* 0x80 is reserved in migration.h start with 0x100 next */
 -
 static struct defconfig_file {
   const char *filename;
 diff --git a/include/migration/migration.h
 b/include/migration/migration.h
 index 6a24e65..5336117 100644
 --- a/include/migration/migration.h
 +++ b/include/migration/migration.h
 @@ -158,12 +158,26 @@ void ram_control_before_iterate(QEMUFile *f,
 uint64_t flags);
   void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
   void ram_control_load_hook(QEMUFile *f, uint64_t flags);
   +
 +/***/
 +/* ram save/restore */
 +
 +#define RAM_SAVE_FLAG_FULL 0x01 /* Obsolete, not used anymore */
 +#define RAM_SAVE_FLAG_COMPRESS 0x02
 +#define RAM_SAVE_FLAG_MEM_SIZE 0x04
 +#define RAM_SAVE_FLAG_PAGE 0x08
 +#define RAM_SAVE_FLAG_EOS  0x10
 +#define RAM_SAVE_FLAG_CONTINUE 0x20
 +#define RAM_SAVE_FLAG_XBZRLE   0x40
 +
   /* Whenever this is found in the data stream, the flags
* will be passed to ram_control_load_hook in the incoming-migration
* side. This lets before_ram_iterate/after_ram_iterate add
* transport-specific sections to the RAM migration data.
*/
   #define RAM_SAVE_FLAG_HOOK 0x80
 +/* Start with 0x100 next */
 +
 #define RAM_SAVE_CONTROL_NOT_SUPP -1000
   #define RAM_SAVE_CONTROL_DELAYED  -2000

 This also looks like an encapsulation violation.

 Localhost migration is not very different in concept from RDMA (except
 that it runs with the VM stopped, but that's just because you do
 MADV_DONTNEED---it's not specific to the migration transport), and it
 manages to do everything without touching arch_init.c and migration.c.
 
 I think maybe we need a private header file here. Multiple parties want
 to add potentially add to this flag list without modifying arch_init.c.

There is a mechanism for private flags, which is the page-load hook we
introduced for RDMA.  The existing RAM_SAVE_FLAG_* should not be needed
outside arch_init.c.

MIG_STATE_* might be needed elsewhere, but the right way to export it is
this one: change MigrationInfo's status field from a string to a QAPI
enum, and remove MIG_STATE_* altogether in favor of the QAPI enum.

Paolo



Re: [Qemu-devel] [PATCH 11/18] migration: introduce capability localhost

2013-08-23 Thread Paolo Bonzini
Il 22/08/2013 22:50, Michael R. Hines ha scritto:
 On 08/21/2013 11:18 AM, Paolo Bonzini wrote:
 Il 21/08/2013 09:18, Lei Li ha scritto:
   } else if (strstart(uri, unix:, p)) {
 +if (s-enabled_capabilities[MIGRATION_CAPABILITY_LOCALHOST]) {
 +local_start_outgoing_migration(s, p, local_err);
 +}
   unix_start_outgoing_migration(s, p, local_err);
   } else if (strstart(uri, fd:, p)) {
   fd_start_outgoing_migration(s, p, local_err);
 @@ -521,6 +524,15 @@ int migrate_use_xbzrle(void)
   return s-enabled_capabilities[MIGRATION_CAPABILITY_XBZRLE];
   }
   +bool migrate_is_localhost(void)
 +{
 +MigrationState *s;
 +
 +s = migrate_get_current();
 +
 +return s-enabled_capabilities[MIGRATION_CAPABILITY_LOCALHOST];
 +}
 I think this is a bad name, too.  There is nothing more local in this
 migration than in unix: migration.

 Let's call the capability according to what it does, for example
 unix-page-flipping.
 
 Why is there a capability at all? Isn't that what the local URI is for?

Because in these patches, the local URI is only present in the
destination (which is wrong: the destination should autodetect local
mode using the load-page hook).  As you can see above, a unix URI will
examine the capability and pick the appropriate migration method.

However, this is also the wrong place to look at the capability.  It is
the save-page hook that should examine the capability.  It will then do
nothing if it is disabled, and do page-flipping if the capability is on.

Paolo



Re: [Qemu-devel] [PATCH 08/18] migration-local: introduce qemu_fopen_local()

2013-08-23 Thread Lei Li

On 08/23/2013 04:42 AM, Michael R. Hines wrote:

On 08/21/2013 03:18 AM, Lei Li wrote:

Introduce read/write backend of QEMUFileLocal used by localhost
migration. The unix domain socket will be replaced by PIPE with
vmsplice mechanism.

Signed-off-by: Lei Li li...@linux.vnet.ibm.com
---
  Makefile.objs |1 +
  migration-local.c |  211 
+

  2 files changed, 212 insertions(+), 0 deletions(-)
  create mode 100644 migration-local.c

diff --git a/Makefile.objs b/Makefile.objs
index f46a4cd..30670cc 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -54,6 +54,7 @@ common-obj-y += migration.o migration-tcp.o
  common-obj-$(CONFIG_RDMA) += migration-rdma.o
  common-obj-y += qemu-char.o #aio.o
  common-obj-y += block-migration.o
+common-obj-y += migration-local.o
  common-obj-y += page_cache.o xbzrle.o

  common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o 
migration-fd.o

diff --git a/migration-local.c b/migration-local.c
new file mode 100644
index 000..93190fd
--- /dev/null
+++ b/migration-local.c
@@ -0,0 +1,211 @@
+/*
+ * QEMU localhost migration
+ *
+ * Copyright IBM, Corp. 2013
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.
+ *
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include config-host.h
+#include qemu-common.h
+#include migration/migration.h
+#include exec/cpu-common.h
+#include config.h
+#include exec/cpu-all.h
+#include monitor/monitor.h
+#include migration/qemu-file.h
+#include qemu/iov.h
+#include sysemu/arch_init.h
+#include sysemu/sysemu.h
+#include block/block.h
+#include qemu/sockets.h
+#include migration/block.h
+#include qemu/thread.h
+#include qmp-commands.h
+#include trace.h
+#include qemu/osdep.h
+
+//#define DEBUG_MIGRATION_LOCAL
+
+#ifdef DEBUG_MIGRATION_LOCAL
+#define DPRINTF(fmt, ...) \
+do { printf(migration-local:  fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+do { } while (0)
+#endif
+
+/*
+ * Interface for the local migration.
+ */
+typedef struct QEMUFileLocal {
+QEMUFile *file;
+int fd;
+int state;
+
+/*
+ * This is the last block from where we have sent data
+ * for local migration
+ */
+RAMBlock *last_block_sent;
+} QEMUFileLocal;
+
+
+static int qemu_local_get_buffer(void *opaque, uint8_t *buf,
+ int64_t pos, int size)
+{
+QEMUFileLocal *s = opaque;
+ssize_t len;
+
+for (;;) {
+len = qemu_recv(s-fd, buf, size, 0);
+if (len != -1) {
+break;
+}
+if (socket_error() == EAGAIN) {
+yield_until_fd_readable(s-fd);
+} else if (socket_error() != EINTR) {
+break;
+}
+}
+
+if (len == -1) {
+len = -socket_error();
+}
+return len;
+}
+


This looks like a line-for-line copy of socket_get_buffer()..

Since you're just going to end up replacing this with vmsplice(),
could you just call socket_get_buffer() temporarily until
your next patch is ready?


+static int qemu_local_get_fd(void *opaque)
+{
+QEMUFileLocal *s = opaque;
+
+return s-fd;
+}
+
+static int qemu_local_close(void *opaque)
+{
+QEMUFileLocal *s = opaque;
+
+closesocket(s-fd);
+g_free(s);
+
+return 0;
+}
+
+static size_t qemu_local_put_buffer(void *opaque, struct iovec *iov,
+int iovcnt, int64_t pos)
+{
+QEMUFileLocal *s = opaque;
+ssize_t len;
+ssize_t size = iov_size(iov, iovcnt);
+
+len = iov_send(s-fd, iov, iovcnt, 0, size);
+if (len  size) {
+len = -socket_error();
+}
+
+return len;
+}
+
+static size_t local_save_page(QEMUFile *f, RAMBlock *block,
+  ram_addr_t offset, int flags)
+{
+MemoryRegion *mr = block-mr;
+uint8_t *p;
+
+p = memory_region_get_ram_ptr(mr) + offset;
+
+if (buffer_find_nonzero_offset(p, TARGET_PAGE_SIZE)) {
+qemu_put_be64(f, offset | flags | RAM_SAVE_FLAG_COMPRESS);
+if (!flags) {
+qemu_put_byte(f, strlen(block-idstr));
+qemu_put_buffer(f, (uint8_t *)block-idstr,
+strlen(block-idstr));
+}
+qemu_put_byte(f, *p);
+return 0;
+}
+
+qemu_put_be64(f, offset | flags | RAM_SAVE_FLAG_PAGE);
+if (!flags) {
+qemu_put_byte(f, strlen(block-idstr));
+qemu_put_buffer(f, (uint8_t *)block-idstr,
+strlen(block-idstr));
+}
+qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
+
+return TARGET_PAGE_SIZE;
+}
+
+static size_t qemu_local_ram_save(QEMUFile *f, void *opaque,
+  ram_addr_t block_offset, 
ram_addr_t offset,

+  size_t size, int *bytes_sent)
+{
+QEMUFileLocal *s = opaque;
+uint64_t current_addr = block_offset + offset;
+RAMBlock *block = qemu_get_ram_block(current_addr);
+MemoryRegion *mr = block-mr;



Re: [Qemu-devel] [PATCH 13/18] arch_init: adjust ram_save_setup() for migrate_is_localhost

2013-08-23 Thread Paolo Bonzini
Il 23/08/2013 08:25, Lei Li ha scritto:
 On 08/21/2013 06:48 PM, Paolo Bonzini wrote:
 Il 21/08/2013 09:18, Lei Li ha scritto:
 Send all the ram blocks hooked by save_page, which will copy
 ram page and MADV_DONTNEED the page just copied.
 You should implement this entirely in the hook.

 It will be a little less efficient because of the dirty bitmap overhead,
 but you should aim at having *zero* changes in arch_init.c and
 migration.c.
 
 Yes, the reason I modify the migration_thread() to have new process that
 send all the ram pages in adjusted qemu_savevm_state_begin stage and send 
 device
 states in qemu_savevm_device_state stage for localhost migration is to avoid 
 the
 bitmap thing, which is a little less efficient just like you mentioned above.
 
 The performance assurance is very important to this feature, our goal is
 100ms of downtime for a 1TB guest.

Do not _start_ by introducing encapsulation violations all over the place.

Juan has been working on optimizing the dirty bitmap code.  His patches
could introduce a speedup of a factor of up to 64.  Thus it is possible
that his work will help you enough that you can work with the dirty bitmap.

Also, this feature (not looking at the dirty bitmap if the machine is
stopped) is not limited to localhost migration, add it later once the
basic vmsplice plumbing is in place.  This will also let you profile the
code and understand whether the goal is attainable.

I honestly doubt that 100ms of downtime is possible while the machine is
stopped.  A 1TB guest has 2^28 = 268*10^6 pages, which you want to
process in 100*10^6 nanoseconds.  Thus, your approach would require 0.4
nanoseconds per page, or roughly 2 clock cycles per page.  This is
impossible without _massive_ parallelization at all levels, starting
from the kernel.

As a matter of fact, 2^28 madvise system calls will take much, much
longer than 100ms.

Have you thought of using shared memory (with -mempath) instead of vmsplice?

Paolo



Re: [Qemu-devel] [PATCH 11/18] migration: introduce capability localhost

2013-08-23 Thread Lei Li

On 08/23/2013 03:40 PM, Paolo Bonzini wrote:

Il 22/08/2013 22:50, Michael R. Hines ha scritto:

On 08/21/2013 11:18 AM, Paolo Bonzini wrote:

Il 21/08/2013 09:18, Lei Li ha scritto:

   } else if (strstart(uri, unix:, p)) {
+if (s-enabled_capabilities[MIGRATION_CAPABILITY_LOCALHOST]) {
+local_start_outgoing_migration(s, p, local_err);
+}
   unix_start_outgoing_migration(s, p, local_err);
   } else if (strstart(uri, fd:, p)) {
   fd_start_outgoing_migration(s, p, local_err);
@@ -521,6 +524,15 @@ int migrate_use_xbzrle(void)
   return s-enabled_capabilities[MIGRATION_CAPABILITY_XBZRLE];
   }
   +bool migrate_is_localhost(void)
+{
+MigrationState *s;
+
+s = migrate_get_current();
+
+return s-enabled_capabilities[MIGRATION_CAPABILITY_LOCALHOST];
+}

I think this is a bad name, too.  There is nothing more local in this
migration than in unix: migration.

Let's call the capability according to what it does, for example
unix-page-flipping.

Why is there a capability at all? Isn't that what the local URI is for?

Because in these patches, the local URI is only present in the
destination (which is wrong: the destination should autodetect local
mode using the load-page hook).  As you can see above, a unix URI will
examine the capability and pick the appropriate migration method.


Hi Paolo,

I agree with that 'localhost' is a bad capability name. Since we just use unix
socket temporarily and will be ending up with vmsplice via pipe, how about
just page_flipping?



However, this is also the wrong place to look at the capability.  It is
the save-page hook that should examine the capability.  It will then do
nothing if it is disabled, and do page-flipping if the capability is on.

Paolo




--
Lei




Re: [Qemu-devel] [PATCH 13/18] arch_init: adjust ram_save_setup() for migrate_is_localhost

2013-08-23 Thread Alex Bligh



--On 23 August 2013 09:48:42 +0200 Paolo Bonzini pbonz...@redhat.com 
wrote:



As a matter of fact, 2^28 madvise system calls will take much, much
longer than 100ms.


Probably a stupid question, but why would you need to do one call per
page? It takes a 'size_t length' parameter.

--
Alex Bligh



[Qemu-devel] [PATCH] mips/malta: prevent writes to reset flash mapping faulting

2013-08-23 Thread Leon Alrae
From: James Hogan james.ho...@imgtec.com

Commit a427338 (mips_malta: correct reading MIPS revision at 0x1fc00010)
altered the behaviour of the monitor flash mapping at the reset address
by making it read only. However this causes data bus error exceptions
when it is written to since it is effectively unassigned memory for
writes. This isn't how the real hardware behaves. That memory can be
written to (even with the MFWR jumper not fitted) and the new value read
back from, but it doesn't get written back to the monitor flash so is
volatile.

This is fixed by converting the bios copy from read only ram to a bios
device with a nop write callback.

Signed-off-by: James Hogan james.ho...@imgtec.com
Cc: Paul Burton paul.bur...@imgtec.com
Cc: Leon Alrae leon.al...@imgtec.com
Cc: Aurelien Jarno aurel...@aurel32.net
Signed-off-by: Leon Alrae leon.al...@imgtec.com
---
 hw/mips/mips_malta.c |   14 --
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/hw/mips/mips_malta.c b/hw/mips/mips_malta.c
index f8d064c..9e721d3 100644
--- a/hw/mips/mips_malta.c
+++ b/hw/mips/mips_malta.c
@@ -873,6 +873,16 @@ static void cpu_request_exit(void *opaque, int irq, int 
level)
 }
 }
 
+static void monflash_copy_mem_write(void *opaque, hwaddr ram_addr,
+uint64_t val, unsigned size)
+{
+}
+
+static const MemoryRegionOps monflash_copy_mem_ops = {
+.write = monflash_copy_mem_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+};
+
 static
 void mips_malta_init(QEMUMachineInitArgs *args)
 {
@@ -1043,13 +1053,13 @@ void mips_malta_init(QEMUMachineInitArgs *args)
  * handled by an overlapping region as the resulting ROM code subpage
  * regions are not executable.
  */
-memory_region_init_ram(bios_copy, NULL, bios.1fc, BIOS_SIZE);
+memory_region_init_rom_device(bios_copy, NULL, monflash_copy_mem_ops, 
NULL,
+  bios.1fc, BIOS_SIZE);
 if (!rom_copy(memory_region_get_ram_ptr(bios_copy),
   FLASH_ADDRESS, BIOS_SIZE)) {
 memcpy(memory_region_get_ram_ptr(bios_copy),
memory_region_get_ram_ptr(bios), BIOS_SIZE);
 }
-memory_region_set_readonly(bios_copy, true);
 memory_region_add_subregion(system_memory, RESET_ADDRESS, bios_copy);
 
 /* Board ID = 0x420 (Malta Board with CoreLV) */
-- 
1.7.5.4





Re: [Qemu-devel] [PATCH 11/18] migration: introduce capability localhost

2013-08-23 Thread Paolo Bonzini
Il 23/08/2013 09:51, Lei Li ha scritto:
 Because in these patches, the local URI is only present in the
 destination (which is wrong: the destination should autodetect local
 mode using the load-page hook).  As you can see above, a unix URI will
 examine the capability and pick the appropriate migration method.
 
 Hi Paolo,
 
 I agree with that 'localhost' is a bad capability name. Since we just
 use unix
 socket temporarily and will be ending up with vmsplice via pipe, how about
 just page_flipping?

The basic migration method is still a unix socket (and a unix: URI).
I suggested unix-page-flipping to make it clear that it doesn't work
with tcp or rdma migration (you might make it work with fd migration).

Paolo



Re: [Qemu-devel] [PATCH 13/18] arch_init: adjust ram_save_setup() for migrate_is_localhost

2013-08-23 Thread Paolo Bonzini
Il 23/08/2013 09:57, Alex Bligh ha scritto:
 
 
 --On 23 August 2013 09:48:42 +0200 Paolo Bonzini pbonz...@redhat.com
 wrote:
 
 As a matter of fact, 2^28 madvise system calls will take much, much
 longer than 100ms.
 
 Probably a stupid question, but why would you need to do one call per
 page? It takes a 'size_t length' parameter.

Right now migration is done a page at a time, and so is madvise AFAIU.
However, even with a larger length parameter I suspect it would alone
take more than 2 cycles per page.

So one way to do this could be to add a flag to migrate that would
migrate devices only, and use shared memory in both the source and the
target.

There is still a problem, because we must make sure the destination
doesn't write to memory (e.g. read firmware) when initializing the
board, because that would overwrite the memory of the running instance.
 But it looks more promising than page flipping.

Paolo



Re: [Qemu-devel] [PATCH] block: Fix race in gluster_finish_aiocb

2013-08-23 Thread Bharata B Rao
On Fri, Aug 23, 2013 at 09:33:21AM +0200, Paolo Bonzini wrote:
  (gdb) p *bh
  $1 = {ctx = 0x0, cb = 0x555ffdcd qemu_gluster_aio_bh, opaque = 
  0x7fffd00419c0, next = 0x56345e70, scheduled = false, idle = false, 
deleted = true}
 
 This looks like a use-after-free, with bh-ctx corrupted when freeing
 the bottom half.  But it's not at all obvious how it can happen.
 
 I suggest using MALLOC_PERTURB_=42 to check this theory (if it is
 correct, most fields will be something like 0x2a2a2a2a2a2a2a2a).  But I
 don't see anything clearly wrong in the patch... Thus perhaps it is
 simpler to just remove the unreachable error handling code.

(gdb) p *bh
$1 = {ctx = 0x0, cb = 0x2a2a2a2a2a2a2a2a, opaque = 0x2a2a2a2a2a2a2a2a, next = 
0x2a2a2a2a2a2a2a2a, scheduled = false, idle = false, deleted = true}

May be as note above, I should just remove the unreachable error handling
code for now.

Regards,
Bharata.




[Qemu-devel] [PATCH V9 08/15] monitor: avoid direct use of global variable *mon_cmds

2013-08-23 Thread Wenchao Xia
New member *cmd_table is added in structure Monitor to avoid direct usage of
*mon_cmds. Now monitor have an associated command table, when global variable
*info_cmds is also discarded, structure Monitor would gain full control about
how to deal with user input.

Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
---
 monitor.c |   13 -
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/monitor.c b/monitor.c
index 3ac98ef..8152607 100644
--- a/monitor.c
+++ b/monitor.c
@@ -195,6 +195,7 @@ struct Monitor {
 CPUState *mon_cpu;
 BlockDriverCompletionFunc *password_completion_cb;
 void *password_opaque;
+mon_cmd_t *cmd_table;
 QError *error;
 QLIST_HEAD(,mon_fd_t) fds;
 QLIST_ENTRY(Monitor) entry;
@@ -687,6 +688,8 @@ static void monitor_data_init(Monitor *mon)
 {
 memset(mon, 0, sizeof(Monitor));
 mon-outbuf = qstring_new();
+/* Use *mon_cmds by default. */
+mon-cmd_table = mon_cmds;
 }
 
 static void monitor_data_destroy(Monitor *mon)
@@ -767,7 +770,7 @@ static void help_cmd(Monitor *mon, const char *name)
 if (name  !strcmp(name, info)) {
 help_cmd_dump(mon, info_cmds, info , NULL);
 } else {
-help_cmd_dump(mon, mon_cmds, , name);
+help_cmd_dump(mon, mon-cmd_table, , name);
 if (name  !strcmp(name, log)) {
 const QEMULogItem *item;
 monitor_printf(mon, Log items (comma separated):\n);
@@ -3994,7 +3997,7 @@ static void handle_user_command(Monitor *mon, const char 
*cmdline)
 
 qdict = qdict_new();
 
-cmd = monitor_parse_command(mon, cmdline, 0, mon_cmds, qdict);
+cmd = monitor_parse_command(mon, cmdline, 0, mon-cmd_table, qdict);
 if (!cmd)
 goto out;
 
@@ -4183,12 +4186,12 @@ static void monitor_find_completion(Monitor *mon,
 else
 cmdname = args[0];
 readline_set_completion_index(mon-rs, strlen(cmdname));
-for(cmd = mon_cmds; cmd-name != NULL; cmd++) {
+for (cmd = mon-cmd_table; cmd-name != NULL; cmd++) {
 cmd_completion(mon, cmdname, cmd-name);
 }
 } else {
 /* find the command */
-for (cmd = mon_cmds; cmd-name != NULL; cmd++) {
+for (cmd = mon-cmd_table; cmd-name != NULL; cmd++) {
 if (compare_cmd(args[0], cmd-name)) {
 break;
 }
@@ -4239,7 +4242,7 @@ static void monitor_find_completion(Monitor *mon,
 }
 } else if (!strcmp(cmd-name, help|?)) {
 readline_set_completion_index(mon-rs, strlen(str));
-for (cmd = mon_cmds; cmd-name != NULL; cmd++) {
+for (cmd = mon-cmd_table; cmd-name != NULL; cmd++) {
 cmd_completion(mon, str, cmd-name);
 }
 }
-- 
1.7.1





[Qemu-devel] [PATCH V9 03/15] monitor: avoid use of global *cur_mon in block_completion_it()

2013-08-23 Thread Wenchao Xia
Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 monitor.c |   18 ++
 1 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/monitor.c b/monitor.c
index 30819fa..6171c75 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4090,14 +4090,21 @@ static void file_completion(Monitor *mon, const char 
*input)
 closedir(ffs);
 }
 
+typedef struct MonitorBlockComplete {
+Monitor *mon;
+const char *input;
+} MonitorBlockComplete;
+
 static void block_completion_it(void *opaque, BlockDriverState *bs)
 {
 const char *name = bdrv_get_device_name(bs);
-const char *input = opaque;
+MonitorBlockComplete *mbc = opaque;
+Monitor *mon = mbc-mon;
+const char *input = mbc-input;
 
 if (input[0] == '\0' ||
 !strncmp(name, (char *)input, strlen(input))) {
-readline_add_completion(cur_mon-rs, name);
+readline_add_completion(mon-rs, name);
 }
 }
 
@@ -4141,6 +4148,7 @@ static void monitor_find_completion(const char *cmdline)
 const char *ptype, *str;
 const mon_cmd_t *cmd;
 Monitor *mon = cur_mon;
+MonitorBlockComplete mbs;
 
 parse_cmdline(cmdline, nb_args, args);
 #ifdef DEBUG_COMPLETION
@@ -4199,8 +4207,10 @@ static void monitor_find_completion(const char *cmdline)
 break;
 case 'B':
 /* block device name completion */
-readline_set_completion_index(cur_mon-rs, strlen(str));
-bdrv_iterate(block_completion_it, (void *)str);
+mbs.mon = mon;
+mbs.input = str;
+readline_set_completion_index(mon-rs, strlen(str));
+bdrv_iterate(block_completion_it, mbs);
 break;
 case 's':
 /* XXX: more generic ? */
-- 
1.7.1





[Qemu-devel] [PATCH V9 01/15] monitor: avoid use of global *cur_mon in cmd_completion()

2013-08-23 Thread Wenchao Xia
A new local variable *mon is added in monitor_find_completion()
to make compile pass, which will be removed later in
conversion patch for monitor_find_completion().

Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 monitor.c |   13 +++--
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/monitor.c b/monitor.c
index da9c9a2..e0154a8 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4008,7 +4008,7 @@ out:
 QDECREF(qdict);
 }
 
-static void cmd_completion(const char *name, const char *list)
+static void cmd_completion(Monitor *mon, const char *name, const char *list)
 {
 const char *p, *pstart;
 char cmd[128];
@@ -4026,7 +4026,7 @@ static void cmd_completion(const char *name, const char 
*list)
 memcpy(cmd, pstart, len);
 cmd[len] = '\0';
 if (name[0] == '\0' || !strncmp(name, cmd, strlen(name))) {
-readline_add_completion(cur_mon-rs, cmd);
+readline_add_completion(mon-rs, cmd);
 }
 if (*p == '\0')
 break;
@@ -4140,6 +4140,7 @@ static void monitor_find_completion(const char *cmdline)
 int nb_args, i, len;
 const char *ptype, *str;
 const mon_cmd_t *cmd;
+Monitor *mon = cur_mon;
 
 parse_cmdline(cmdline, nb_args, args);
 #ifdef DEBUG_COMPLETION
@@ -4165,7 +4166,7 @@ static void monitor_find_completion(const char *cmdline)
 cmdname = args[0];
 readline_set_completion_index(cur_mon-rs, strlen(cmdname));
 for(cmd = mon_cmds; cmd-name != NULL; cmd++) {
-cmd_completion(cmdname, cmd-name);
+cmd_completion(mon, cmdname, cmd-name);
 }
 } else {
 /* find the command */
@@ -4206,7 +4207,7 @@ static void monitor_find_completion(const char *cmdline)
 if (!strcmp(cmd-name, info)) {
 readline_set_completion_index(cur_mon-rs, strlen(str));
 for(cmd = info_cmds; cmd-name != NULL; cmd++) {
-cmd_completion(str, cmd-name);
+cmd_completion(mon, str, cmd-name);
 }
 } else if (!strcmp(cmd-name, sendkey)) {
 char *sep = strrchr(str, '-');
@@ -4214,12 +4215,12 @@ static void monitor_find_completion(const char *cmdline)
 str = sep + 1;
 readline_set_completion_index(cur_mon-rs, strlen(str));
 for (i = 0; i  Q_KEY_CODE_MAX; i++) {
-cmd_completion(str, QKeyCode_lookup[i]);
+cmd_completion(mon, str, QKeyCode_lookup[i]);
 }
 } else if (!strcmp(cmd-name, help|?)) {
 readline_set_completion_index(cur_mon-rs, strlen(str));
 for (cmd = mon_cmds; cmd-name != NULL; cmd++) {
-cmd_completion(str, cmd-name);
+cmd_completion(mon, str, cmd-name);
 }
 }
 break;
-- 
1.7.1





[Qemu-devel] [PATCH V9 00/15] monitor: support sub command group in auto completion and help

2013-08-23 Thread Wenchao Xia
This series make auto completion and help functions works normal for sub
command, by using reentrant functions. In order to do that, global variables
are not directly used in those functions any more. With this series, cmd_table
is a member of structure Monitor so it is possible to create a monitor with
different command table now, auto completion will work in that monitor. In
short, info is not treated as a special case now, this series ensure help
and auto complete function works normal for any sub command added in the future.

Patch 5 replaced cur_mon with rs-mon, it is safe because:
monitor_init() calls readline_init() which initialize mon-rs, result is
mon-rs-mon == mon. Then qemu_chr_add_handlers() is called, which make
monitor_read() function take *mon as its opaque. Later, when user input,
monitor_read() is called, where cur_mon is set to *mon by cur_mon = opaque.
If qemu's monitors run in one thread, then later in readline_handle_byte()
and readline_comletion(), cur_mon is actually equal to rs-mon, in another
word, it points to the monitor instance, so it is safe to replace *cur_mon
in those functions.

Thanks for Luiz and Eric for reviewing.

V2:
  General:
  To discard *info_comds more graceful, help related function is modified to 
support
sub command too.
  Patch 6/7 are added to improve help related functions.
  Patch 5: not directly return to make sure args are freed.

  Address Luiz's comments:
  Split patch into small series.
  struct mon_cmd_t was not moved into header file, instead mon_cmnd_t 
*cmd_table is
added as a member in struct Monitor.
  5/7: drop original code comments for info in monitor_find_completion().

v3:
  5/7: add parameter **args_cmdline in parse_cmdline() to tell next valid
parameter's position. This fix the issue in case command length in input is not
equal to its name's length such as help|?, and the case input start with
space such as   s. 
  7/7: better commit message.

v4:
  Address Eric's comments:
  1/7, 2/7, 4/7: better commit title and message.
  1/7 remove useless (char *) in old code, add space around for () in old 
code.
  3/7: separate code moving patch before usage.
  4/7: add space around for () in old code, add min(nb_args, MAX_ARGS) in free
to make code stronger.

v5:
  4/7: use a  b ? a : b instead of macro min.

v6:
  Address Luiz's comments:
  1/13 ~ 5/13: splitted small patches.
  5/13: added commit message about the correctness of replacing of cur_mon and
test result.
  6/13: better comments in code.
  7/13: added commit message about the reason of code moving.
  8/13: new patch to improve parse_cmdline(), since it is a more generic
function now.
  9/13: reworked the commit message, better commentes in code, use
free_cmdline_args() in clean. It is a bit hard to split this patch into
smaller meaning ful ones, so kepted this patch as a relative larger one,
with better commit message.
  12/13: put case 'S' with case 's' in monitor_find_completion_by_table().
moved this patch ahead of patch 13/13.
  13/13: this patch is moved behind patch 12/13.

  Generic change:
  10/13: splitted patch which moved out the reentrant part into a separate
function, make review easier. This also avoided re-parsing the command line
which does in previous version.
  11/13: splitted patch, which simply remove usage of info_cmds and support
sub command by re-enter the function.

v7:
  Address Luiz's comments:
  5/13: moved the comments why the change is safe, to cover-letter.
  8/13: use assert in free_cmdline_args(), fail when args in input exceed
the limit in parse_cmdline().

v8:
  Address Eric's comments:
  Fix typo in commit messages.

V9:
  General:
  6: new patch, I found that call can be optimized, so move it.
  Address Luiz's comments:
  7: split function to init the monitor.
  8: move the cmd_table init code to parse_cmdlinethe splitted function.
  11: directly return in fail calling parse_cmdline(), in help_cmd_dump().
  Other:
  After discuss with Luiz, it would be good to add test case for qmp fd-add
related stuff. Current test case is 045, which doesn't cover the case
that fd is added by SCM at runtime, and that case is related to monitor.
I have drafted a version, which can pass both on upstream and this series,
but it may require a bit more code, so send it later as another series.

Wenchao Xia (15):
  1 monitor: avoid use of global *cur_mon in cmd_completion()
  2 monitor: avoid use of global *cur_mon in file_completion()
  3 monitor: avoid use of global *cur_mon in block_completion_it()
  4 monitor: avoid use of global *cur_mon in monitor_find_completion()
  5 monitor: avoid use of global *cur_mon in readline_completion()
  6 monitor: call sortcmdlist() only one time
  7 monitor: split off monitor_data_init()
  8 monitor: avoid direct use of global variable *mon_cmds
  9 monitor: code move for parse_cmdline()
  10 monitor: refine parse_cmdline()
  11 monitor: support sub command in help
  12 monitor: refine monitor_find_completion()
  13 monitor: support 

[Qemu-devel] [PATCH V9 04/15] monitor: avoid use of global *cur_mon in monitor_find_completion()

2013-08-23 Thread Wenchao Xia
Parameter *mon is added, and local variable *mon added in previous patch
is removed. The caller readline_completion(), pass rs-mon as value, which
should be initialized in readline_init() called by monitor_init().

Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 include/monitor/readline.h |3 ++-
 monitor.c  |   18 +-
 readline.c |2 +-
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/include/monitor/readline.h b/include/monitor/readline.h
index fc9806e..0faf6e1 100644
--- a/include/monitor/readline.h
+++ b/include/monitor/readline.h
@@ -8,7 +8,8 @@
 #define READLINE_MAX_COMPLETIONS 256
 
 typedef void ReadLineFunc(Monitor *mon, const char *str, void *opaque);
-typedef void ReadLineCompletionFunc(const char *cmdline);
+typedef void ReadLineCompletionFunc(Monitor *mon,
+const char *cmdline);
 
 typedef struct ReadLineState {
 char cmd_buf[READLINE_CMD_BUF_SIZE + 1];
diff --git a/monitor.c b/monitor.c
index 6171c75..63a779f 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4140,20 +4140,20 @@ static const char *next_arg_type(const char *typestr)
 return (p != NULL ? ++p : typestr);
 }
 
-static void monitor_find_completion(const char *cmdline)
+static void monitor_find_completion(Monitor *mon,
+const char *cmdline)
 {
 const char *cmdname;
 char *args[MAX_ARGS];
 int nb_args, i, len;
 const char *ptype, *str;
 const mon_cmd_t *cmd;
-Monitor *mon = cur_mon;
 MonitorBlockComplete mbs;
 
 parse_cmdline(cmdline, nb_args, args);
 #ifdef DEBUG_COMPLETION
-for(i = 0; i  nb_args; i++) {
-monitor_printf(cur_mon, arg%d = '%s'\n, i, (char *)args[i]);
+for (i = 0; i  nb_args; i++) {
+monitor_printf(mon, arg%d = '%s'\n, i, args[i]);
 }
 #endif
 
@@ -4172,7 +4172,7 @@ static void monitor_find_completion(const char *cmdline)
 cmdname = ;
 else
 cmdname = args[0];
-readline_set_completion_index(cur_mon-rs, strlen(cmdname));
+readline_set_completion_index(mon-rs, strlen(cmdname));
 for(cmd = mon_cmds; cmd-name != NULL; cmd++) {
 cmd_completion(mon, cmdname, cmd-name);
 }
@@ -4202,7 +4202,7 @@ static void monitor_find_completion(const char *cmdline)
 switch(*ptype) {
 case 'F':
 /* file completion */
-readline_set_completion_index(cur_mon-rs, strlen(str));
+readline_set_completion_index(mon-rs, strlen(str));
 file_completion(mon, str);
 break;
 case 'B':
@@ -4215,7 +4215,7 @@ static void monitor_find_completion(const char *cmdline)
 case 's':
 /* XXX: more generic ? */
 if (!strcmp(cmd-name, info)) {
-readline_set_completion_index(cur_mon-rs, strlen(str));
+readline_set_completion_index(mon-rs, strlen(str));
 for(cmd = info_cmds; cmd-name != NULL; cmd++) {
 cmd_completion(mon, str, cmd-name);
 }
@@ -4223,12 +4223,12 @@ static void monitor_find_completion(const char *cmdline)
 char *sep = strrchr(str, '-');
 if (sep)
 str = sep + 1;
-readline_set_completion_index(cur_mon-rs, strlen(str));
+readline_set_completion_index(mon-rs, strlen(str));
 for (i = 0; i  Q_KEY_CODE_MAX; i++) {
 cmd_completion(mon, str, QKeyCode_lookup[i]);
 }
 } else if (!strcmp(cmd-name, help|?)) {
-readline_set_completion_index(cur_mon-rs, strlen(str));
+readline_set_completion_index(mon-rs, strlen(str));
 for (cmd = mon_cmds; cmd-name != NULL; cmd++) {
 cmd_completion(mon, str, cmd-name);
 }
diff --git a/readline.c b/readline.c
index 1c0f7ee..c91b324 100644
--- a/readline.c
+++ b/readline.c
@@ -285,7 +285,7 @@ static void readline_completion(ReadLineState *rs)
 cmdline = g_malloc(rs-cmd_buf_index + 1);
 memcpy(cmdline, rs-cmd_buf, rs-cmd_buf_index);
 cmdline[rs-cmd_buf_index] = '\0';
-rs-completion_finder(cmdline);
+rs-completion_finder(rs-mon, cmdline);
 g_free(cmdline);
 
 /* no completion found */
-- 
1.7.1





[Qemu-devel] [PATCH V9 06/15] monitor: call sortcmdlist() only one time

2013-08-23 Thread Wenchao Xia
It doesn't need to be done for every monitor, so change it.

Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
---
 monitor.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/monitor.c b/monitor.c
index 63a779f..bf019e2 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4762,6 +4762,7 @@ void monitor_init(CharDriverState *chr, int flags)
 
 if (is_first_init) {
 monitor_protocol_event_init();
+sortcmdlist();
 is_first_init = 0;
 }
 
@@ -4791,8 +4792,6 @@ void monitor_init(CharDriverState *chr, int flags)
 QLIST_INSERT_HEAD(mon_list, mon, entry);
 if (!default_mon || (flags  MONITOR_IS_DEFAULT))
 default_mon = mon;
-
-sortcmdlist();
 }
 
 static void bdrv_password_cb(Monitor *mon, const char *password, void *opaque)
-- 
1.7.1





[Qemu-devel] [PATCH V9 10/15] monitor: refine parse_cmdline()

2013-08-23 Thread Wenchao Xia
Since this function will be used by help_cmd() later, so improve
it to make it more generic and easier to use. free_cmdline_args()
is added too as paired function to free the result.

One change of this function is that, when the valid args in input
exceed the limit of MAX_ARGS, it fails now, instead of return with
MAX_ARGS of parsed args in old code. This should not impact much
since it is rare that user input many args in monitor's help and
auto complete scenario.

Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 monitor.c |   51 ---
 1 files changed, 40 insertions(+), 11 deletions(-)

diff --git a/monitor.c b/monitor.c
index 1db0117..8897291 100644
--- a/monitor.c
+++ b/monitor.c
@@ -821,9 +821,33 @@ static int get_str(char *buf, int buf_size, const char 
**pp)
 
 #define MAX_ARGS 16
 
-/* NOTE: this parser is an approximate form of the real command parser */
-static void parse_cmdline(const char *cmdline,
-  int *pnb_args, char **args)
+static void free_cmdline_args(char **args, int nb_args)
+{
+int i;
+
+assert(nb_args = MAX_ARGS);
+
+for (i = 0; i  nb_args; i++) {
+g_free(args[i]);
+}
+
+}
+
+/*
+ * Parse the command line to get valid args.
+ * @cmdline: command line to be parsed.
+ * @pnb_args: location to store the number of args, must NOT be NULL.
+ * @args: location to store the args, which should be freed by caller, must
+ *NOT be NULL.
+ *
+ * Returns 0 on success, negative on failure.
+ *
+ * NOTE: this parser is an approximate form of the real command parser. Number
+ *   of args have a limit of MAX_ARGS. If cmdline contains more, it will
+ *   return with failure.
+ */
+static int parse_cmdline(const char *cmdline,
+ int *pnb_args, char **args)
 {
 const char *p;
 int nb_args, ret;
@@ -839,16 +863,21 @@ static void parse_cmdline(const char *cmdline,
 break;
 }
 if (nb_args = MAX_ARGS) {
-break;
+goto fail;
 }
 ret = get_str(buf, sizeof(buf), p);
-args[nb_args] = g_strdup(buf);
-nb_args++;
 if (ret  0) {
-break;
+goto fail;
 }
+args[nb_args] = g_strdup(buf);
+nb_args++;
 }
 *pnb_args = nb_args;
+return 0;
+
+ fail:
+free_cmdline_args(args, nb_args);
+return -1;
 }
 
 static void help_cmd_dump(Monitor *mon, const mon_cmd_t *cmds,
@@ -4168,7 +4197,9 @@ static void monitor_find_completion(Monitor *mon,
 const mon_cmd_t *cmd;
 MonitorBlockComplete mbs;
 
-parse_cmdline(cmdline, nb_args, args);
+if (parse_cmdline(cmdline, nb_args, args)  0) {
+return;
+}
 #ifdef DEBUG_COMPLETION
 for (i = 0; i  nb_args; i++) {
 monitor_printf(mon, arg%d = '%s'\n, i, args[i]);
@@ -4258,9 +4289,7 @@ static void monitor_find_completion(Monitor *mon,
 }
 
 cleanup:
-for (i = 0; i  nb_args; i++) {
-g_free(args[i]);
-}
+free_cmdline_args(args, nb_args);
 }
 
 static int monitor_can_read(void *opaque)
-- 
1.7.1





[Qemu-devel] [PATCH V9 11/15] monitor: support sub command in help

2013-08-23 Thread Wenchao Xia
The old code in help_cmd() uses global 'info_cmds' and treats it as a
special case. Actually 'info_cmds' is a sub command group of 'mon_cmds',
in order to avoid direct use of it, help_cmd() needs to change its work
mechanism to support sub command and not treat it as a special case
any more.

To support sub command, help_cmd() will first parse the input and then call
help_cmd_dump(), which works as a reentrant function. When it meets a sub
command, it simply enters the function again. Since help dumping needs to
know whole input to printf full help message include prefix, for example,
help info block need to printf prefix info, so help_cmd_dump() takes all
args from input and extra parameter arg_index to identify the progress.
Another function help_cmd_dump_one() is introduced to printf the prefix
and command's help message.

Now help supports sub command, so later if another sub command group is
added in any depth, help will automatically work for it. Still help info
block will show error since command parser reject additional parameter,
which can be improved later. log is still treated as a special case.

Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 monitor.c |   63 +++-
 1 files changed, 53 insertions(+), 10 deletions(-)

diff --git a/monitor.c b/monitor.c
index 8897291..1ace29e 100644
--- a/monitor.c
+++ b/monitor.c
@@ -880,33 +880,76 @@ static int parse_cmdline(const char *cmdline,
 return -1;
 }
 
+static void help_cmd_dump_one(Monitor *mon,
+  const mon_cmd_t *cmd,
+  char **prefix_args,
+  int prefix_args_nb)
+{
+int i;
+
+for (i = 0; i  prefix_args_nb; i++) {
+monitor_printf(mon, %s , prefix_args[i]);
+}
+monitor_printf(mon, %s %s -- %s\n, cmd-name, cmd-params, cmd-help);
+}
+
+/* @args[@arg_index] is the valid command need to find in @cmds */
 static void help_cmd_dump(Monitor *mon, const mon_cmd_t *cmds,
-  const char *prefix, const char *name)
+  char **args, int nb_args, int arg_index)
 {
 const mon_cmd_t *cmd;
 
-for(cmd = cmds; cmd-name != NULL; cmd++) {
-if (!name || !strcmp(name, cmd-name))
-monitor_printf(mon, %s%s %s -- %s\n, prefix, cmd-name,
-   cmd-params, cmd-help);
+/* No valid arg need to compare with, dump all in *cmds */
+if (arg_index = nb_args) {
+for (cmd = cmds; cmd-name != NULL; cmd++) {
+help_cmd_dump_one(mon, cmd, args, arg_index);
+}
+return;
+}
+
+/* Find one entry to dump */
+for (cmd = cmds; cmd-name != NULL; cmd++) {
+if (compare_cmd(args[arg_index], cmd-name)) {
+if (cmd-sub_table) {
+/* continue with next arg */
+help_cmd_dump(mon, cmd-sub_table,
+  args, nb_args, arg_index + 1);
+} else {
+help_cmd_dump_one(mon, cmd, args, arg_index);
+}
+break;
+}
 }
 }
 
 static void help_cmd(Monitor *mon, const char *name)
 {
-if (name  !strcmp(name, info)) {
-help_cmd_dump(mon, info_cmds, info , NULL);
-} else {
-help_cmd_dump(mon, mon-cmd_table, , name);
-if (name  !strcmp(name, log)) {
+char *args[MAX_ARGS];
+int nb_args = 0;
+
+/* 1. parse user input */
+if (name) {
+/* special case for log, directly dump and return */
+if (!strcmp(name, log)) {
 const QEMULogItem *item;
 monitor_printf(mon, Log items (comma separated):\n);
 monitor_printf(mon, %-10s %s\n, none, remove all logs);
 for (item = qemu_log_items; item-mask != 0; item++) {
 monitor_printf(mon, %-10s %s\n, item-name, item-help);
 }
+return;
+}
+
+if (parse_cmdline(name, nb_args, args)  0) {
+return;
 }
 }
+
+/* 2. dump the contents according to parsed args */
+help_cmd_dump(mon, mon-cmd_table, args, nb_args, 0);
+
+cleanup:
+free_cmdline_args(args, nb_args);
 }
 
 static void do_help_cmd(Monitor *mon, const QDict *qdict)
-- 
1.7.1





[Qemu-devel] [PATCH V9 05/15] monitor: avoid use of global *cur_mon in readline_completion()

2013-08-23 Thread Wenchao Xia
Now all completion functions do not use *cur_mon any more, instead
they use rs-mon. In short, structure ReadLineState decide where
the complete action would be taken now.

Tested with the case that qemu have two telnet monitors, auto
completion function works normal.

Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 readline.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/readline.c b/readline.c
index c91b324..abf27dd 100644
--- a/readline.c
+++ b/readline.c
@@ -276,7 +276,6 @@ void readline_set_completion_index(ReadLineState *rs, int 
index)
 
 static void readline_completion(ReadLineState *rs)
 {
-Monitor *mon = cur_mon;
 int len, i, j, max_width, nb_cols, max_prefix;
 char *cmdline;
 
@@ -300,7 +299,7 @@ static void readline_completion(ReadLineState *rs)
 if (len  0  rs-completions[0][len - 1] != '/')
 readline_insert_char(rs, ' ');
 } else {
-monitor_printf(mon, \n);
+monitor_printf(rs-mon, \n);
 max_width = 0;
 max_prefix = 0;
 for(i = 0; i  rs-nb_completions; i++) {
-- 
1.7.1





[Qemu-devel] [PATCH V9 02/15] monitor: avoid use of global *cur_mon in file_completion()

2013-08-23 Thread Wenchao Xia
Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 monitor.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/monitor.c b/monitor.c
index e0154a8..30819fa 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4034,7 +4034,7 @@ static void cmd_completion(Monitor *mon, const char 
*name, const char *list)
 }
 }
 
-static void file_completion(const char *input)
+static void file_completion(Monitor *mon, const char *input)
 {
 DIR *ffs;
 struct dirent *d;
@@ -4057,7 +4057,7 @@ static void file_completion(const char *input)
 pstrcpy(file_prefix, sizeof(file_prefix), p + 1);
 }
 #ifdef DEBUG_COMPLETION
-monitor_printf(cur_mon, input='%s' path='%s' prefix='%s'\n,
+monitor_printf(mon, input='%s' path='%s' prefix='%s'\n,
input, path, file_prefix);
 #endif
 ffs = opendir(path);
@@ -4084,7 +4084,7 @@ static void file_completion(const char *input)
 if (stat(file, sb) == 0  S_ISDIR(sb.st_mode)) {
 pstrcat(file, sizeof(file), /);
 }
-readline_add_completion(cur_mon-rs, file);
+readline_add_completion(mon-rs, file);
 }
 }
 closedir(ffs);
@@ -4195,7 +4195,7 @@ static void monitor_find_completion(const char *cmdline)
 case 'F':
 /* file completion */
 readline_set_completion_index(cur_mon-rs, strlen(str));
-file_completion(str);
+file_completion(mon, str);
 break;
 case 'B':
 /* block device name completion */
-- 
1.7.1





[Qemu-devel] [PATCH V9 12/15] monitor: refine monitor_find_completion()

2013-08-23 Thread Wenchao Xia
In order to support sub command in auto completion, a reentrant function
is needed, so monitor_find_completion() is split into two parts. The
first part does parsing of user input which need to be done only once,
the second part does the auto completion job according to the parsing
result, which contains the necessary code to support sub command and
works as the reentrant function. The global info_cmds is still used
in second part, which will be replaced by sub command code later.

Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 monitor.c |   65 
 1 files changed, 39 insertions(+), 26 deletions(-)

diff --git a/monitor.c b/monitor.c
index 1ace29e..49a5a88 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4230,34 +4230,17 @@ static const char *next_arg_type(const char *typestr)
 return (p != NULL ? ++p : typestr);
 }
 
-static void monitor_find_completion(Monitor *mon,
-const char *cmdline)
+static void monitor_find_completion_by_table(Monitor *mon,
+ const mon_cmd_t *cmd_table,
+ char **args,
+ int nb_args)
 {
 const char *cmdname;
-char *args[MAX_ARGS];
-int nb_args, i, len;
+int i;
 const char *ptype, *str;
 const mon_cmd_t *cmd;
 MonitorBlockComplete mbs;
 
-if (parse_cmdline(cmdline, nb_args, args)  0) {
-return;
-}
-#ifdef DEBUG_COMPLETION
-for (i = 0; i  nb_args; i++) {
-monitor_printf(mon, arg%d = '%s'\n, i, args[i]);
-}
-#endif
-
-/* if the line ends with a space, it means we want to complete the
-   next arg */
-len = strlen(cmdline);
-if (len  0  qemu_isspace(cmdline[len - 1])) {
-if (nb_args = MAX_ARGS) {
-goto cleanup;
-}
-args[nb_args++] = g_strdup();
-}
 if (nb_args = 1) {
 /* command completion */
 if (nb_args == 0)
@@ -4265,18 +4248,18 @@ static void monitor_find_completion(Monitor *mon,
 else
 cmdname = args[0];
 readline_set_completion_index(mon-rs, strlen(cmdname));
-for (cmd = mon-cmd_table; cmd-name != NULL; cmd++) {
+for (cmd = cmd_table; cmd-name != NULL; cmd++) {
 cmd_completion(mon, cmdname, cmd-name);
 }
 } else {
 /* find the command */
-for (cmd = mon-cmd_table; cmd-name != NULL; cmd++) {
+for (cmd = cmd_table; cmd-name != NULL; cmd++) {
 if (compare_cmd(args[0], cmd-name)) {
 break;
 }
 }
 if (!cmd-name) {
-goto cleanup;
+return;
 }
 
 ptype = next_arg_type(cmd-args_type);
@@ -4321,7 +4304,7 @@ static void monitor_find_completion(Monitor *mon,
 }
 } else if (!strcmp(cmd-name, help|?)) {
 readline_set_completion_index(mon-rs, strlen(str));
-for (cmd = mon-cmd_table; cmd-name != NULL; cmd++) {
+for (cmd = cmd_table; cmd-name != NULL; cmd++) {
 cmd_completion(mon, str, cmd-name);
 }
 }
@@ -4330,6 +4313,36 @@ static void monitor_find_completion(Monitor *mon,
 break;
 }
 }
+}
+
+static void monitor_find_completion(Monitor *mon,
+const char *cmdline)
+{
+char *args[MAX_ARGS];
+int nb_args, len;
+
+/* 1. parse the cmdline */
+if (parse_cmdline(cmdline, nb_args, args)  0) {
+return;
+}
+#ifdef DEBUG_COMPLETION
+for (i = 0; i  nb_args; i++) {
+monitor_printf(mon, arg%d = '%s'\n, i, args[i]);
+}
+#endif
+
+/* if the line ends with a space, it means we want to complete the
+   next arg */
+len = strlen(cmdline);
+if (len  0  qemu_isspace(cmdline[len - 1])) {
+if (nb_args = MAX_ARGS) {
+goto cleanup;
+}
+args[nb_args++] = g_strdup();
+}
+
+/* 2. auto complete according to args */
+monitor_find_completion_by_table(mon, mon-cmd_table, args, nb_args);
 
 cleanup:
 free_cmdline_args(args, nb_args);
-- 
1.7.1





[Qemu-devel] [PATCH V9 07/15] monitor: split off monitor_data_init()

2013-08-23 Thread Wenchao Xia
In qmp_human_monitor_command(), the monitor need to initialized for
basic functionalities, and later more init code will be added, so
split off this function. Note that it is different with QMP mode
monitor which accept json string from monitor's input,
qmp_human_monitor_command() retrieve the human style command from
QMP input, then send the command to a normal mode monitor.

Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
---
 monitor.c |   20 +++-
 1 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/monitor.c b/monitor.c
index bf019e2..3ac98ef 100644
--- a/monitor.c
+++ b/monitor.c
@@ -683,14 +683,24 @@ static int do_qmp_capabilities(Monitor *mon, const QDict 
*params,
 
 static void handle_user_command(Monitor *mon, const char *cmdline);
 
+static void monitor_data_init(Monitor *mon)
+{
+memset(mon, 0, sizeof(Monitor));
+mon-outbuf = qstring_new();
+}
+
+static void monitor_data_destroy(Monitor *mon)
+{
+QDECREF(mon-outbuf);
+}
+
 char *qmp_human_monitor_command(const char *command_line, bool has_cpu_index,
 int64_t cpu_index, Error **errp)
 {
 char *output = NULL;
 Monitor *old_mon, hmp;
 
-memset(hmp, 0, sizeof(hmp));
-hmp.outbuf = qstring_new();
+monitor_data_init(hmp);
 hmp.skip_flush = true;
 
 old_mon = cur_mon;
@@ -716,7 +726,7 @@ char *qmp_human_monitor_command(const char *command_line, 
bool has_cpu_index,
 }
 
 out:
-QDECREF(hmp.outbuf);
+monitor_data_destroy(hmp);
 return output;
 }
 
@@ -4766,8 +4776,8 @@ void monitor_init(CharDriverState *chr, int flags)
 is_first_init = 0;
 }
 
-mon = g_malloc0(sizeof(*mon));
-mon-outbuf = qstring_new();
+mon = g_malloc(sizeof(*mon));
+monitor_data_init(mon);
 
 mon-chr = chr;
 mon-flags = flags;
-- 
1.7.1





[Qemu-devel] [PATCH V9 15/15] monitor: improve auto complete of help for single command in sub group

2013-08-23 Thread Wenchao Xia
Now special case help * in auto completion can work with sub commands,
such as help info u*.

Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 monitor.c |6 ++
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/monitor.c b/monitor.c
index b657c85..62fc433 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4330,10 +4330,8 @@ static void monitor_find_completion_by_table(Monitor 
*mon,
 cmd_completion(mon, str, QKeyCode_lookup[i]);
 }
 } else if (!strcmp(cmd-name, help|?)) {
-readline_set_completion_index(mon-rs, strlen(str));
-for (cmd = cmd_table; cmd-name != NULL; cmd++) {
-cmd_completion(mon, str, cmd-name);
-}
+monitor_find_completion_by_table(mon, cmd_table,
+ args[1], nb_args - 1);
 }
 break;
 default:
-- 
1.7.1





[Qemu-devel] [PATCH V9 09/15] monitor: code move for parse_cmdline()

2013-08-23 Thread Wenchao Xia
help_cmd() need this function later, so move it. get_str() is called by
parse_cmdline() so it is moved also. Some code style error reported by
check script, is also fixed.

Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 monitor.c |  191 +++--
 1 files changed, 98 insertions(+), 93 deletions(-)

diff --git a/monitor.c b/monitor.c
index 8152607..1db0117 100644
--- a/monitor.c
+++ b/monitor.c
@@ -753,6 +753,104 @@ static int compare_cmd(const char *name, const char *list)
 return 0;
 }
 
+static int get_str(char *buf, int buf_size, const char **pp)
+{
+const char *p;
+char *q;
+int c;
+
+q = buf;
+p = *pp;
+while (qemu_isspace(*p)) {
+p++;
+}
+if (*p == '\0') {
+fail:
+*q = '\0';
+*pp = p;
+return -1;
+}
+if (*p == '\') {
+p++;
+while (*p != '\0'  *p != '\') {
+if (*p == '\\') {
+p++;
+c = *p++;
+switch (c) {
+case 'n':
+c = '\n';
+break;
+case 'r':
+c = '\r';
+break;
+case '\\':
+case '\'':
+case '\':
+break;
+default:
+qemu_printf(unsupported escape code: '\\%c'\n, c);
+goto fail;
+}
+if ((q - buf)  buf_size - 1) {
+*q++ = c;
+}
+} else {
+if ((q - buf)  buf_size - 1) {
+*q++ = *p;
+}
+p++;
+}
+}
+if (*p != '\') {
+qemu_printf(unterminated string\n);
+goto fail;
+}
+p++;
+} else {
+while (*p != '\0'  !qemu_isspace(*p)) {
+if ((q - buf)  buf_size - 1) {
+*q++ = *p;
+}
+p++;
+}
+}
+*q = '\0';
+*pp = p;
+return 0;
+}
+
+#define MAX_ARGS 16
+
+/* NOTE: this parser is an approximate form of the real command parser */
+static void parse_cmdline(const char *cmdline,
+  int *pnb_args, char **args)
+{
+const char *p;
+int nb_args, ret;
+char buf[1024];
+
+p = cmdline;
+nb_args = 0;
+for (;;) {
+while (qemu_isspace(*p)) {
+p++;
+}
+if (*p == '\0') {
+break;
+}
+if (nb_args = MAX_ARGS) {
+break;
+}
+ret = get_str(buf, sizeof(buf), p);
+args[nb_args] = g_strdup(buf);
+nb_args++;
+if (ret  0) {
+break;
+}
+}
+*pnb_args = nb_args;
+}
+
 static void help_cmd_dump(Monitor *mon, const mon_cmd_t *cmds,
   const char *prefix, const char *name)
 {
@@ -3433,71 +3531,6 @@ static int get_double(Monitor *mon, double *pval, const 
char **pp)
 return 0;
 }
 
-static int get_str(char *buf, int buf_size, const char **pp)
-{
-const char *p;
-char *q;
-int c;
-
-q = buf;
-p = *pp;
-while (qemu_isspace(*p))
-p++;
-if (*p == '\0') {
-fail:
-*q = '\0';
-*pp = p;
-return -1;
-}
-if (*p == '\') {
-p++;
-while (*p != '\0'  *p != '\') {
-if (*p == '\\') {
-p++;
-c = *p++;
-switch(c) {
-case 'n':
-c = '\n';
-break;
-case 'r':
-c = '\r';
-break;
-case '\\':
-case '\'':
-case '\':
-break;
-default:
-qemu_printf(unsupported escape code: '\\%c'\n, c);
-goto fail;
-}
-if ((q - buf)  buf_size - 1) {
-*q++ = c;
-}
-} else {
-if ((q - buf)  buf_size - 1) {
-*q++ = *p;
-}
-p++;
-}
-}
-if (*p != '\') {
-qemu_printf(unterminated string\n);
-goto fail;
-}
-p++;
-} else {
-while (*p != '\0'  !qemu_isspace(*p)) {
-if ((q - buf)  buf_size - 1) {
-*q++ = *p;
-}
-p++;
-}
-}
-*q = '\0';
-*pp = p;
-return 0;
-}
-
 /*
  * Store the command-name in cmdname, and return a pointer to
  * the remaining of the command string.
@@ -3554,8 +3587,6 @@ static char *key_get_info(const char *type, char **key)
 static int default_fmt_format = 'x';
 static int default_fmt_size = 4;
 
-#define MAX_ARGS 16
-
 static int is_valid_option(const char *c, const char *typestr)
 {
 char option[3];
@@ 

[Qemu-devel] [PATCH V9 14/15] monitor: allow help show message for single command in sub group

2013-08-23 Thread Wenchao Xia
A new parameter type 'S' is introduced to allow user input any string.
help info block works normal now.

Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 hmp-commands.hx |2 +-
 monitor.c   |   27 +++
 2 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 8c6b91a..c161fe9 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -11,7 +11,7 @@ ETEXI
 
 {
 .name   = help|?,
-.args_type  = name:s?,
+.args_type  = name:S?,
 .params = [cmd],
 .help   = show the help,
 .mhandler.cmd = do_help_cmd,
diff --git a/monitor.c b/monitor.c
index 47791b4..b657c85 100644
--- a/monitor.c
+++ b/monitor.c
@@ -83,6 +83,7 @@
  * 'F'  filename
  * 'B'  block device name
  * 's'  string (accept optional quote)
+ * 'S'  it just appends the rest of the string (accept optional quote)
  * 'O'  option string of the form NAME=VALUE,...
  *  parsed according to QemuOptsList given by its name
  *  Example: 'device:O' uses qemu_device_opts.
@@ -4047,6 +4048,31 @@ static const mon_cmd_t *monitor_parse_command(Monitor 
*mon,
 }
 }
 break;
+case 'S':
+{
+/* package all remaining string */
+int len;
+
+while (qemu_isspace(*p)) {
+p++;
+}
+if (*typestr == '?') {
+typestr++;
+if (*p == '\0') {
+/* no remaining string: NULL argument */
+break;
+}
+}
+len = strlen(p);
+if (len = 0) {
+monitor_printf(mon, %s: string expected\n,
+   cmdname);
+break;
+}
+qdict_put(qdict, key, qstring_from_str(p));
+p += len;
+}
+break;
 default:
 bad_type:
 monitor_printf(mon, %s: unknown type '%c'\n, cmdname, c);
@@ -4294,6 +4320,7 @@ static void monitor_find_completion_by_table(Monitor *mon,
 bdrv_iterate(block_completion_it, mbs);
 break;
 case 's':
+case 'S':
 if (!strcmp(cmd-name, sendkey)) {
 char *sep = strrchr(str, '-');
 if (sep)
-- 
1.7.1





[Qemu-devel] [PATCH V9 13/15] monitor: support sub command in auto completion

2013-08-23 Thread Wenchao Xia
This patch allows auto completion work normal for sub command case,
info block [DEVICE] can auto complete now, by re-enter the completion
function. In original code info is treated as a special case, now it
is treated as a sub command group, global variable info_cmds is not used
any more.

help command is still treated as a special case, since it is not a sub
command group but want to auto complete command in root command table.

Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
 monitor.c |   14 +++---
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/monitor.c b/monitor.c
index 49a5a88..47791b4 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4262,6 +4262,12 @@ static void monitor_find_completion_by_table(Monitor 
*mon,
 return;
 }
 
+if (cmd-sub_table) {
+/* do the job again */
+return monitor_find_completion_by_table(mon, cmd-sub_table,
+args[1], nb_args - 1);
+}
+
 ptype = next_arg_type(cmd-args_type);
 for(i = 0; i  nb_args - 2; i++) {
 if (*ptype != '\0') {
@@ -4288,13 +4294,7 @@ static void monitor_find_completion_by_table(Monitor 
*mon,
 bdrv_iterate(block_completion_it, mbs);
 break;
 case 's':
-/* XXX: more generic ? */
-if (!strcmp(cmd-name, info)) {
-readline_set_completion_index(mon-rs, strlen(str));
-for(cmd = info_cmds; cmd-name != NULL; cmd++) {
-cmd_completion(mon, str, cmd-name);
-}
-} else if (!strcmp(cmd-name, sendkey)) {
+if (!strcmp(cmd-name, sendkey)) {
 char *sep = strrchr(str, '-');
 if (sep)
 str = sep + 1;
-- 
1.7.1





Re: [Qemu-devel] [PATCH] block: Fix race in gluster_finish_aiocb

2013-08-23 Thread Asias He
On Thu, Aug 22, 2013 at 11:51:00AM +0200, Paolo Bonzini wrote:
 Il 22/08/2013 11:50, Asias He ha scritto:
  On Wed, Aug 21, 2013 at 10:16:02AM +0200, Paolo Bonzini wrote:
  Il 21/08/2013 04:02, Asias He ha scritto:
  In block/gluster.c, we have
 
  gluster_finish_aiocb
  {
 if (retval != sizeof(acb)) {
qemu_mutex_lock_iothread(); /* We are in gluster thread context */
...
qemu_mutex_unlock_iothread();
 }
  }
 
  qemu tools, e.g. qemu-img, might race here because
  qemu_mutex_{lock,unlock}_iothread are a nop operation and
  gluster_finish_aiocb is in the gluster thread context.
 
  To fix, we introduce our own mutex for qemu tools.
 
  Signed-off-by: Asias He as...@redhat.com
  ---
   stubs/iothread-lock.c | 11 +++
   1 file changed, 11 insertions(+)
 
  diff --git a/stubs/iothread-lock.c b/stubs/iothread-lock.c
  index 5d8aca1..d5c6dec 100644
  --- a/stubs/iothread-lock.c
  +++ b/stubs/iothread-lock.c
  @@ -1,10 +1,21 @@
   #include qemu-common.h
   #include qemu/main-loop.h
   
  +static QemuMutex qemu_tools_mutex;
  +static pthread_once_t qemu_tools_once = PTHREAD_ONCE_INIT;
  
  Doesn't work on Windows, but you can just add
  
  Hmm, Any reasons, why it does not work on Windows?
 
 There are no pthreads on Windows.
 
 There is an emulation library, but we're not using it.

This one: http://www.sourceware.org/pthreads-win32 ?

I found other places use pthread_create. e.g.

[qemu]$ git grep pthread_create
audio/audio_pt_int.c:err = pthread_create (p-thread, NULL, func,
opaque);
audio/audio_pt_int.c:efunc = pthread_create;
clone_func, info);
qemu-nbd.c:pthread_create(show_parts_thread, NULL, show_parts,
device);
qemu-nbd.c:ret = pthread_create(client_thread, NULL,
nbd_client_thread, device);

These bits are also not working on Windows?

  __attribute__((__constructor__)) to qemu_tools_mutex_init.
  
  __attribute__((__constructor__)) works on Windows?
 
 Yes, it is part of the runtime library's support for C++.  We use it
 already, see include/qemu/module.h.

Hmm, nice. 

 Paolo
 
  Paolo
 
  +static void qemu_tools_mutex_init(void)
  +{
  +qemu_mutex_init(qemu_tools_mutex);
  +}
  +
   void qemu_mutex_lock_iothread(void)
   {
  +pthread_once(qemu_tools_once, qemu_tools_mutex_init);
  +qemu_mutex_lock(qemu_tools_mutex);
   }
   
   void qemu_mutex_unlock_iothread(void)
   {
  +qemu_mutex_unlock(qemu_tools_mutex);
   }
 
 
  
 

-- 
Asias



Re: [Qemu-devel] [PATCH V9 06/12] NUMA: Add Linux libnuma detection

2013-08-23 Thread Andrew Jones


- Original Message -
 Add detection of libnuma (mostly contained in the numactl package)
 to the configure script. Can be enabled or disabled on the command line,
 default is use if available.
 
 Signed-off-by: Andre Przywara andre.przyw...@amd.com
 Signed-off-by: Wanlong Gao gaowanl...@cn.fujitsu.com

Is this patch still necessary? I thought that dropping the
numa_num_configured_nodes() calls from patch 8/12 got rid
of the need for this library. Maybe I missed other uses?

drew



Re: [Qemu-devel] [PATCH V9 08/12] NUMA: set guest numa nodes memory policy

2013-08-23 Thread Andrew Jones


- Original Message -
 Set the guest numa nodes memory policies using the mbind(2)
 system call node by node.
 After this patch, we are able to set guest nodes memory policies
 through the QEMU options, this arms to solve the guest cross
 nodes memory access performance issue.
 And as you all know, if PCI-passthrough is used,
 direct-attached-device uses DMA transfer between device and qemu process.
 All pages of the guest will be pinned by get_user_pages().
 
 KVM_ASSIGN_PCI_DEVICE ioctl
   kvm_vm_ioctl_assign_device()
 =kvm_assign_device()
   = kvm_iommu_map_memslots()
 = kvm_iommu_map_pages()
= kvm_pin_pages()
 
 So, with direct-attached-device, all guest page's page count will be +1 and
 any page migration will not work. AutoNUMA won't too.
 
 So, we should set the guest nodes memory allocation policies before
 the pages are really mapped.
 
 Signed-off-by: Andre Przywara andre.przyw...@amd.com
 Signed-off-by: Wanlong Gao gaowanl...@cn.fujitsu.com
 ---
  numa.c | 90
  ++
  1 file changed, 90 insertions(+)
 
 diff --git a/numa.c b/numa.c
 index 4ccc6cb..4a9c368 100644
 --- a/numa.c
 +++ b/numa.c
 @@ -28,6 +28,16 @@
  #include qapi-visit.h
  #include qapi/opts-visitor.h
  #include qapi/dealloc-visitor.h
 +#include exec/memory.h
 +
 +#ifdef CONFIG_NUMA
 +#include numa.h
 +#include numaif.h
 +#ifndef MPOL_F_RELATIVE_NODES
 +#define MPOL_F_RELATIVE_NODES (1  14)
 +#define MPOL_F_STATIC_NODES   (1  15)
 +#endif
 +#endif
  
  QemuOptsList qemu_numa_opts = {
  .name = numa,
 @@ -219,6 +229,79 @@ void set_numa_nodes(void)
  }
  }
  
 +#ifdef CONFIG_NUMA
 +static int node_parse_bind_mode(unsigned int nodeid)
 +{
 +int bind_mode;
 +
 +switch (numa_info[nodeid].policy) {
 +case NUMA_NODE_POLICY_MEMBIND:
 +bind_mode = MPOL_BIND;
 +break;
 +case NUMA_NODE_POLICY_INTERLEAVE:
 +bind_mode = MPOL_INTERLEAVE;
 +break;
 +case NUMA_NODE_POLICY_PREFERRED:
 +bind_mode = MPOL_PREFERRED;
 +break;
 +case NUMA_NODE_POLICY_DEFAULT:
 +default:
 +bind_mode = MPOL_DEFAULT;
 +return bind_mode;
 +}
 +
 +bind_mode |= numa_info[nodeid].relative ?
 +MPOL_F_RELATIVE_NODES : MPOL_F_STATIC_NODES;
 +
 +return bind_mode;
 +}
 +#endif
 +
 +static int set_node_mem_policy(int nodeid)
 +{
 +#ifdef CONFIG_NUMA
 +void *ram_ptr;
 +RAMBlock *block;
 +ram_addr_t len, ram_offset = 0;
 +int bind_mode;
 +int i;
 +
 +QTAILQ_FOREACH(block, ram_list.blocks, next) {
 +if (!strcmp(block-mr-name, pc.ram)) {
 +break;
 +}
 +}
 +
 +if (block-host == NULL) {
 +return -1;
 +}
 +
 +ram_ptr = block-host;
 +for (i = 0; i  nodeid; i++) {
 +len = numa_info[i].node_mem;
 +ram_offset += len;
 +}
 +
 +len = numa_info[nodeid].node_mem;
 +bind_mode = node_parse_bind_mode(nodeid);
 +unsigned long *nodes = numa_info[nodeid].host_mem;
 +
 +/* This is a workaround for a long standing bug in Linux'
 + * mbind implementation, which cuts off the last specified
 + * node. To stay compatible should this bug be fixed, we
 + * specify one more node and zero this one out.
 + */
 +unsigned long maxnode = find_last_bit(nodes, MAX_CPUMASK_BITS);
 +clear_bit(maxnode + 1, nodes);

This clear_bit() isn't necessary. We know that maxnode+1 is certainly
already clear, because find_last_bit() just returned maxnode as the last
non-clear bit.

 +if (mbind(ram_ptr + ram_offset, len, bind_mode, nodes, maxnode + 1, 0))
 {
 +perror(mbind);
 +return -1;
 +}
 +#endif
 +
 +return 0;
 +}
 +
  void set_numa_modes(void)
  {
  CPUState *cpu;
 @@ -231,4 +314,11 @@ void set_numa_modes(void)
  }
  }
  }
 +
 +for (i = 0; i  nb_numa_nodes; i++) {
 +if (set_node_mem_policy(i) == -1) {
 +fprintf(stderr,
 +qemu: can not set host memory policy for node%d\n, i);
 +}
 +}
  }
 --
 1.8.4.rc4
 
 
 



Re: [Qemu-devel] pvpanic plans?

2013-08-23 Thread Paolo Bonzini
Il 22/08/2013 22:39, Anthony Liguori ha scritto:
 On Thu, Aug 22, 2013 at 3:36 PM, Laszlo Ersek ler...@redhat.com wrote:
 On 08/22/13 22:09, Anthony Liguori wrote:

 The difference is that ACPI or platform devices in general are
 unexpected to be added.  By definition it means that the motherboard has
 most likely been changed.

 You could encounter a new ACPI artifact after simply re-flashing your MB
 with an updated BIOS, without opening the chassis. If windows can't
 deal with that, their loss! :)
 
 I'm pretty sure does Windows boot up okay is on every major vendor's
 firmware test plan for shipping new updates...

For a firmware vendor it is perfectly okay to ship and require new
drivers for functionality introduced by a firmware update...

Paolo



Re: [Qemu-devel] [PATCH 13/18] arch_init: adjust ram_save_setup() for migrate_is_localhost

2013-08-23 Thread Lei Li

On 08/23/2013 03:48 PM, Paolo Bonzini wrote:

Il 23/08/2013 08:25, Lei Li ha scritto:

On 08/21/2013 06:48 PM, Paolo Bonzini wrote:

Il 21/08/2013 09:18, Lei Li ha scritto:

Send all the ram blocks hooked by save_page, which will copy
ram page and MADV_DONTNEED the page just copied.

You should implement this entirely in the hook.

It will be a little less efficient because of the dirty bitmap overhead,
but you should aim at having *zero* changes in arch_init.c and
migration.c.

Yes, the reason I modify the migration_thread() to have new process that
send all the ram pages in adjusted qemu_savevm_state_begin stage and send device
states in qemu_savevm_device_state stage for localhost migration is to avoid the
bitmap thing, which is a little less efficient just like you mentioned above.

The performance assurance is very important to this feature, our goal is
100ms of downtime for a 1TB guest.

Do not _start_ by introducing encapsulation violations all over the place.

Juan has been working on optimizing the dirty bitmap code.  His patches
could introduce a speedup of a factor of up to 64.  Thus it is possible
that his work will help you enough that you can work with the dirty bitmap.

Also, this feature (not looking at the dirty bitmap if the machine is
stopped) is not limited to localhost migration, add it later once the
basic vmsplice plumbing is in place.  This will also let you profile the
code and understand whether the goal is attainable.

I honestly doubt that 100ms of downtime is possible while the machine is
stopped.  A 1TB guest has 2^28 = 268*10^6 pages, which you want to
process in 100*10^6 nanoseconds.  Thus, your approach would require 0.4
nanoseconds per page, or roughly 2 clock cycles per page.  This is
impossible without _massive_ parallelization at all levels, starting
from the kernel.

As a matter of fact, 2^28 madvise system calls will take much, much
longer than 100ms.

Have you thought of using shared memory (with -mempath) instead of vmsplice?


Precisely!

Well, as Anthony mentioned in the version 1[1], there has been some work 
involved
regarding improvement of vmsplice() at kernel side by Robert Jennings[2].

And yes, shared memory is an alternative, I think the problem with shared 
memory is
that can't share anonymous memory. For this maybe Anthony can chime in as the 
original
idea him.  :-)


Reference links:

[1] Anthony's comments:
  https://lists.gnu.org/archive/html/qemu-devel/2013-06/msg02577.html

[2] vmpslice support for zero-copy gifting of pages:
  http://comments.gmane.org/gmane.linux.kernel.mm/103998



Paolo




--
Lei




[Qemu-devel] [PULL 0/9] KVM changes for 2013-08-23

2013-08-23 Thread Paolo Bonzini
Anthony,

The following changes since commit f03d07d4683b2e8325a7cb60b4e14b977b1a869c:

  Merge remote-tracking branch 'quintela/migration.next' into staging 
(2013-07-23 10:57:23 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master

for you to fetch changes up to 3f994214cd39cfdac57be32c4d0cf401a046b17f:

  kvm: shorten the parameter list for get_real_device() (2013-08-22 18:40:12 
+0200)

Paolo


Arthur Chunqi Li (1):
  Initialize IA32_FEATURE_CONTROL MSR in reset and migration

Jan Kiszka (1):
  kvm: Simplify kvm_handle_io

Liu Jinsong (1):
  kvm: x86: fix setting IA32_FEATURE_CONTROL with nested VMX disabled

Marcelo Tosatti (2):
  kvm-all.c: max_cpus should not exceed KVM vcpu limit
  kvm: i386: fix LAPIC TSC deadline timer save/restore

Paolo Bonzini (3):
  target-i386: remove tabs from target-i386/cpu.h
  kvm: migrate vPMU state
  kvm: shorten the parameter list for get_real_device()

Vincenzo Maffione (1):
  kvm: add KVM_IRQFD_FLAG_RESAMPLE support

 hw/i386/kvm/pci-assign.c |   9 +-
 hw/misc/vfio.c   |   4 +-
 hw/virtio/virtio-pci.c   |   2 +-
 include/sysemu/kvm.h |   3 +-
 kvm-all.c|  52 +---
 target-i386/cpu.h| 217 ++-
 target-i386/kvm.c| 139 --
 target-i386/machine.c|  66 ++
 8 files changed, 349 insertions(+), 143 deletions(-)
-- 
1.8.3.1




[Qemu-devel] [PULL 1/9] Initialize IA32_FEATURE_CONTROL MSR in reset and migration

2013-08-23 Thread Paolo Bonzini
From: Arthur Chunqi Li yzt...@gmail.com

The recent KVM patch adds IA32_FEATURE_CONTROL support. QEMU needs
to clear this MSR when reset vCPU and keep the value of it when
migration. This patch add this feature.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 target-i386/cpu.h |  2 ++
 target-i386/kvm.c |  4 
 target-i386/machine.c | 22 ++
 3 files changed, 28 insertions(+)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index cedefdc..3a52f94 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -301,6 +301,7 @@
 #define MSR_IA32_APICBASE_BSP   (18)
 #define MSR_IA32_APICBASE_ENABLE(111)
 #define MSR_IA32_APICBASE_BASE  (0xf12)
+#define MSR_IA32_FEATURE_CONTROL0x003a
 #define MSR_TSC_ADJUST  0x003b
 #define MSR_IA32_TSCDEADLINE0x6e0
 
@@ -813,6 +814,7 @@ typedef struct CPUX86State {
 
 uint64_t mcg_status;
 uint64_t msr_ia32_misc_enable;
+uint64_t msr_ia32_feature_control;
 
 /* exception/interrupt handling */
 int error_code;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 3c9d10a..84ac00a 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1121,6 +1121,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 if (hyperv_vapic_recommended()) {
 kvm_msr_entry_set(msrs[n++], HV_X64_MSR_APIC_ASSIST_PAGE, 0);
 }
+kvm_msr_entry_set(msrs[n++], MSR_IA32_FEATURE_CONTROL, 
env-msr_ia32_feature_control);
 }
 if (env-mcg_cap) {
 int i;
@@ -1345,6 +1346,7 @@ static int kvm_get_msrs(X86CPU *cpu)
 if (has_msr_misc_enable) {
 msrs[n++].index = MSR_IA32_MISC_ENABLE;
 }
+msrs[n++].index = MSR_IA32_FEATURE_CONTROL;
 
 if (!env-tsc_valid) {
 msrs[n++].index = MSR_IA32_TSC;
@@ -1443,6 +1445,8 @@ static int kvm_get_msrs(X86CPU *cpu)
 case MSR_IA32_MISC_ENABLE:
 env-msr_ia32_misc_enable = msrs[i].data;
 break;
+case MSR_IA32_FEATURE_CONTROL:
+env-msr_ia32_feature_control = msrs[i].data;
 default:
 if (msrs[i].index = MSR_MC0_CTL 
 msrs[i].index  MSR_MC0_CTL + (env-mcg_cap  0xff) * 4) {
diff --git a/target-i386/machine.c b/target-i386/machine.c
index f9ec581..0d2088e 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -435,6 +435,14 @@ static bool misc_enable_needed(void *opaque)
 return env-msr_ia32_misc_enable != MSR_IA32_MISC_ENABLE_DEFAULT;
 }
 
+static bool feature_control_needed(void *opaque)
+{
+X86CPU *cpu = opaque;
+CPUX86State *env = cpu-env;
+
+return env-msr_ia32_feature_control != 0;
+}
+
 static const VMStateDescription vmstate_msr_ia32_misc_enable = {
 .name = cpu/msr_ia32_misc_enable,
 .version_id = 1,
@@ -446,6 +454,17 @@ static const VMStateDescription 
vmstate_msr_ia32_misc_enable = {
 }
 };
 
+static const VMStateDescription vmstate_msr_ia32_feature_control = {
+.name = cpu/msr_ia32_feature_control,
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields  = (VMStateField []) {
+VMSTATE_UINT64(env.msr_ia32_feature_control, X86CPU),
+VMSTATE_END_OF_LIST()
+}
+};
+
 const VMStateDescription vmstate_x86_cpu = {
 .name = cpu,
 .version_id = 12,
@@ -571,6 +590,9 @@ const VMStateDescription vmstate_x86_cpu = {
 }, {
 .vmsd = vmstate_msr_ia32_misc_enable,
 .needed = misc_enable_needed,
+}, {
+.vmsd = vmstate_msr_ia32_feature_control,
+.needed = feature_control_needed,
 } , {
 /* empty */
 }
-- 
1.8.3.1





[Qemu-devel] [PULL 2/9] target-i386: remove tabs from target-i386/cpu.h

2013-08-23 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 target-i386/cpu.h | 192 +++---
 1 file changed, 96 insertions(+), 96 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 3a52f94..af4c0f7 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -37,9 +37,9 @@
 #define TARGET_HAS_ICE 1
 
 #ifdef TARGET_X86_64
-#define ELF_MACHINEEM_X86_64
+#define ELF_MACHINE EM_X86_64
 #else
-#define ELF_MACHINEEM_386
+#define ELF_MACHINE EM_386
 #endif
 
 #define CPUArchState struct CPUX86State
@@ -98,10 +98,10 @@
 #define DESC_TSS_BUSY_MASK (1  9)
 
 /* eflags masks */
-#define CC_C   0x0001
-#define CC_P   0x0004
-#define CC_A   0x0010
-#define CC_Z   0x0040
+#define CC_C0x0001
+#define CC_P0x0004
+#define CC_A0x0010
+#define CC_Z0x0040
 #define CC_S0x0080
 #define CC_O0x0800
 
@@ -109,14 +109,14 @@
 #define IOPL_SHIFT 12
 #define VM_SHIFT   17
 
-#define TF_MASK0x0100
-#define IF_MASK0x0200
-#define DF_MASK0x0400
-#define IOPL_MASK  0x3000
-#define NT_MASK0x4000
-#define RF_MASK0x0001
-#define VM_MASK0x0002
-#define AC_MASK0x0004
+#define TF_MASK 0x0100
+#define IF_MASK 0x0200
+#define DF_MASK 0x0400
+#define IOPL_MASK   0x3000
+#define NT_MASK 0x4000
+#define RF_MASK 0x0001
+#define VM_MASK 0x0002
+#define AC_MASK 0x0004
 #define VIF_MASK0x0008
 #define VIP_MASK0x0010
 #define ID_MASK 0x0020
@@ -238,28 +238,28 @@
 #define DR7_TYPE_IO_RW   0x2
 #define DR7_TYPE_DATA_RW 0x3
 
-#define PG_PRESENT_BIT 0
-#define PG_RW_BIT  1
-#define PG_USER_BIT2
-#define PG_PWT_BIT 3
-#define PG_PCD_BIT 4
-#define PG_ACCESSED_BIT5
-#define PG_DIRTY_BIT   6
-#define PG_PSE_BIT 7
-#define PG_GLOBAL_BIT  8
-#define PG_NX_BIT  63
+#define PG_PRESENT_BIT  0
+#define PG_RW_BIT   1
+#define PG_USER_BIT 2
+#define PG_PWT_BIT  3
+#define PG_PCD_BIT  4
+#define PG_ACCESSED_BIT 5
+#define PG_DIRTY_BIT6
+#define PG_PSE_BIT  7
+#define PG_GLOBAL_BIT   8
+#define PG_NX_BIT   63
 
 #define PG_PRESENT_MASK  (1  PG_PRESENT_BIT)
-#define PG_RW_MASK  (1  PG_RW_BIT)
-#define PG_USER_MASK(1  PG_USER_BIT)
-#define PG_PWT_MASK (1  PG_PWT_BIT)
-#define PG_PCD_MASK (1  PG_PCD_BIT)
+#define PG_RW_MASK   (1  PG_RW_BIT)
+#define PG_USER_MASK (1  PG_USER_BIT)
+#define PG_PWT_MASK  (1  PG_PWT_BIT)
+#define PG_PCD_MASK  (1  PG_PCD_BIT)
 #define PG_ACCESSED_MASK (1  PG_ACCESSED_BIT)
-#define PG_DIRTY_MASK   (1  PG_DIRTY_BIT)
-#define PG_PSE_MASK (1  PG_PSE_BIT)
-#define PG_GLOBAL_MASK  (1  PG_GLOBAL_BIT)
+#define PG_DIRTY_MASK(1  PG_DIRTY_BIT)
+#define PG_PSE_MASK  (1  PG_PSE_BIT)
+#define PG_GLOBAL_MASK   (1  PG_GLOBAL_BIT)
 #define PG_HI_USER_MASK  0x7ff0LL
-#define PG_NX_MASK  (1LL  PG_NX_BIT)
+#define PG_NX_MASK   (1LL  PG_NX_BIT)
 
 #define PG_ERROR_W_BIT 1
 
@@ -269,32 +269,32 @@
 #define PG_ERROR_RSVD_MASK 0x08
 #define PG_ERROR_I_D_MASK  0x10
 
-#define MCG_CTL_P  (1ULL8)   /* MCG_CAP register available */
-#define MCG_SER_P  (1ULL24) /* MCA recovery/new status bits */
+#define MCG_CTL_P   (1ULL8)   /* MCG_CAP register available */
+#define MCG_SER_P   (1ULL24) /* MCA recovery/new status bits */
 
-#define MCE_CAP_DEF(MCG_CTL_P|MCG_SER_P)
-#define MCE_BANKS_DEF  10
+#define MCE_CAP_DEF (MCG_CTL_P|MCG_SER_P)
+#define MCE_BANKS_DEF   10
 
-#define MCG_STATUS_RIPV(1ULL0)   /* restart ip valid */
-#define MCG_STATUS_EIPV(1ULL1)   /* ip points to correct instruction 
*/
-#define MCG_STATUS_MCIP(1ULL2)   /* machine check in progress */
+#define MCG_STATUS_RIPV (1ULL0)   /* restart ip valid */
+#define MCG_STATUS_EIPV (1ULL1)   /* ip points to correct instruction */
+#define MCG_STATUS_MCIP (1ULL2)   /* machine check in progress */
 
-#define MCI_STATUS_VAL (1ULL63)  /* valid error */
-#define MCI_STATUS_OVER(1ULL62)  /* previous errors lost */
-#define MCI_STATUS_UC  (1ULL61)  /* uncorrected error */
-#define MCI_STATUS_EN  (1ULL60)  /* error enabled */
-#define MCI_STATUS_MISCV (1ULL59) /* misc error reg. valid */
-#define MCI_STATUS_ADDRV (1ULL58) /* addr reg. valid */
-#define MCI_STATUS_PCC (1ULL57)  /* processor context corrupt */
-#define MCI_STATUS_S   (1ULL56)  /* Signaled machine check */
-#define MCI_STATUS_AR  (1ULL55)  /* Action required */
+#define MCI_STATUS_VAL   (1ULL63)  /* valid error */
+#define MCI_STATUS_OVER  (1ULL62)  /* previous errors lost */
+#define MCI_STATUS_UC(1ULL61)  /* uncorrected error */
+#define MCI_STATUS_EN(1ULL60)  /* 

[Qemu-devel] [PULL 4/9] kvm: add KVM_IRQFD_FLAG_RESAMPLE support

2013-08-23 Thread Paolo Bonzini
From: Vincenzo Maffione v.maffi...@gmail.com

Added an EventNotifier* parameter to
kvm-all.c:kvm_irqchip_add_irqfd_notifier(), in order to give KVM
another eventfd to be used as resamplefd. See the documentation
in the linux kernel sources in Documentation/virtual/kvm/api.txt
(section 4.75) for more details.
When the added parameter is passed NULL, the behaviour of the
function is unchanged with respect to the previous versions.

Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Vincenzo Maffione v.maffi...@gmail.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/misc/vfio.c |  4 ++--
 hw/virtio/virtio-pci.c |  2 +-
 include/sysemu/kvm.h   |  3 ++-
 kvm-all.c  | 17 +
 4 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index ad8ce77..54af34a 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -646,7 +646,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
unsigned int nr,
 vector-virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
 if (vector-virq  0 ||
 kvm_irqchip_add_irqfd_notifier(kvm_state, vector-interrupt,
-   vector-virq)  0) {
+   NULL, vector-virq)  0) {
 if (vector-virq = 0) {
 kvm_irqchip_release_virq(kvm_state, vector-virq);
 vector-virq = -1;
@@ -814,7 +814,7 @@ retry:
 vector-virq = kvm_irqchip_add_msi_route(kvm_state, msg);
 if (vector-virq  0 ||
 kvm_irqchip_add_irqfd_notifier(kvm_state, vector-interrupt,
-   vector-virq)  0) {
+   NULL, vector-virq)  0) {
 qemu_set_fd_handler(event_notifier_get_fd(vector-interrupt),
 vfio_msi_interrupt, NULL, vector);
 }
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index c38cfd1..c4db407 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -508,7 +508,7 @@ static int kvm_virtio_pci_irqfd_use(VirtIOPCIProxy *proxy,
 VirtQueue *vq = virtio_get_queue(proxy-vdev, queue_no);
 EventNotifier *n = virtio_queue_get_guest_notifier(vq);
 int ret;
-ret = kvm_irqchip_add_irqfd_notifier(kvm_state, n, irqfd-virq);
+ret = kvm_irqchip_add_irqfd_notifier(kvm_state, n, NULL, irqfd-virq);
 return ret;
 }
 
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index f8ac448..ce3efaf 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -309,7 +309,8 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg);
 int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg);
 void kvm_irqchip_release_virq(KVMState *s, int virq);
 
-int kvm_irqchip_add_irqfd_notifier(KVMState *s, EventNotifier *n, int virq);
+int kvm_irqchip_add_irqfd_notifier(KVMState *s, EventNotifier *n,
+   EventNotifier *rn, int virq);
 int kvm_irqchip_remove_irqfd_notifier(KVMState *s, EventNotifier *n, int virq);
 void kvm_pc_gsi_handler(void *opaque, int n, int level);
 void kvm_pc_setup_irq_routing(bool pci_enabled);
diff --git a/kvm-all.c b/kvm-all.c
index 4fb4ccb..bfa4aac 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1230,7 +1230,8 @@ int kvm_irqchip_update_msi_route(KVMState *s, int virq, 
MSIMessage msg)
 return kvm_update_routing_entry(s, kroute);
 }
 
-static int kvm_irqchip_assign_irqfd(KVMState *s, int fd, int virq, bool assign)
+static int kvm_irqchip_assign_irqfd(KVMState *s, int fd, int rfd, int virq,
+bool assign)
 {
 struct kvm_irqfd irqfd = {
 .fd = fd,
@@ -1238,6 +1239,11 @@ static int kvm_irqchip_assign_irqfd(KVMState *s, int fd, 
int virq, bool assign)
 .flags = assign ? 0 : KVM_IRQFD_FLAG_DEASSIGN,
 };
 
+if (rfd != -1) {
+irqfd.flags |= KVM_IRQFD_FLAG_RESAMPLE;
+irqfd.resamplefd = rfd;
+}
+
 if (!kvm_irqfds_enabled()) {
 return -ENOSYS;
 }
@@ -1276,14 +1282,17 @@ int kvm_irqchip_update_msi_route(KVMState *s, int virq, 
MSIMessage msg)
 }
 #endif /* !KVM_CAP_IRQ_ROUTING */
 
-int kvm_irqchip_add_irqfd_notifier(KVMState *s, EventNotifier *n, int virq)
+int kvm_irqchip_add_irqfd_notifier(KVMState *s, EventNotifier *n,
+   EventNotifier *rn, int virq)
 {
-return kvm_irqchip_assign_irqfd(s, event_notifier_get_fd(n), virq, true);
+return kvm_irqchip_assign_irqfd(s, event_notifier_get_fd(n),
+   rn ? event_notifier_get_fd(rn) : -1, virq, true);
 }
 
 int kvm_irqchip_remove_irqfd_notifier(KVMState *s, EventNotifier *n, int virq)
 {
-return kvm_irqchip_assign_irqfd(s, event_notifier_get_fd(n), virq, false);
+return kvm_irqchip_assign_irqfd(s, event_notifier_get_fd(n), -1, virq,
+   false);
 }
 
 static int kvm_irqchip_create(KVMState *s)
-- 
1.8.3.1





Re: [Qemu-devel] [PATCH] block: Fix race in gluster_finish_aiocb

2013-08-23 Thread Paolo Bonzini
Il 23/08/2013 10:32, Asias He ha scritto:
 On Thu, Aug 22, 2013 at 11:51:00AM +0200, Paolo Bonzini wrote:
 Il 22/08/2013 11:50, Asias He ha scritto:
 On Wed, Aug 21, 2013 at 10:16:02AM +0200, Paolo Bonzini wrote:
 Il 21/08/2013 04:02, Asias He ha scritto:
 In block/gluster.c, we have

 gluster_finish_aiocb
 {
if (retval != sizeof(acb)) {
   qemu_mutex_lock_iothread(); /* We are in gluster thread context */
   ...
   qemu_mutex_unlock_iothread();
}
 }

 qemu tools, e.g. qemu-img, might race here because
 qemu_mutex_{lock,unlock}_iothread are a nop operation and
 gluster_finish_aiocb is in the gluster thread context.

 To fix, we introduce our own mutex for qemu tools.

 Signed-off-by: Asias He as...@redhat.com
 ---
  stubs/iothread-lock.c | 11 +++
  1 file changed, 11 insertions(+)

 diff --git a/stubs/iothread-lock.c b/stubs/iothread-lock.c
 index 5d8aca1..d5c6dec 100644
 --- a/stubs/iothread-lock.c
 +++ b/stubs/iothread-lock.c
 @@ -1,10 +1,21 @@
  #include qemu-common.h
  #include qemu/main-loop.h
  
 +static QemuMutex qemu_tools_mutex;
 +static pthread_once_t qemu_tools_once = PTHREAD_ONCE_INIT;

 Doesn't work on Windows, but you can just add

 Hmm, Any reasons, why it does not work on Windows?

 There are no pthreads on Windows.

 There is an emulation library, but we're not using it.
 
 This one: http://www.sourceware.org/pthreads-win32 ?

Yes.

 I found other places use pthread_create. e.g.
 
 [qemu]$ git grep pthread_create
 audio/audio_pt_int.c:err = pthread_create (p-thread, NULL, func,
 opaque);
 audio/audio_pt_int.c:efunc = pthread_create;
 clone_func, info);
 qemu-nbd.c:pthread_create(show_parts_thread, NULL, show_parts,
 device);
 qemu-nbd.c:ret = pthread_create(client_thread, NULL,
 nbd_client_thread, device);
 
 These bits are also not working on Windows?

They are not used on Windows, no.  qemu-nbd is not compiled, audio uses
audio/audio_win_int.c.

Paolo

 __attribute__((__constructor__)) to qemu_tools_mutex_init.

 __attribute__((__constructor__)) works on Windows?

 Yes, it is part of the runtime library's support for C++.  We use it
 already, see include/qemu/module.h.
 
 Hmm, nice. 
 
 Paolo

 Paolo

 +static void qemu_tools_mutex_init(void)
 +{
 +qemu_mutex_init(qemu_tools_mutex);
 +}
 +
  void qemu_mutex_lock_iothread(void)
  {
 +pthread_once(qemu_tools_once, qemu_tools_mutex_init);
 +qemu_mutex_lock(qemu_tools_mutex);
  }
  
  void qemu_mutex_unlock_iothread(void)
  {
 +qemu_mutex_unlock(qemu_tools_mutex);
  }




 




[Qemu-devel] [PULL 5/9] kvm: x86: fix setting IA32_FEATURE_CONTROL with nested VMX disabled

2013-08-23 Thread Paolo Bonzini
From: Liu Jinsong jinsong@intel.com

This patch is to fix the bug https://bugs.launchpad.net/qemu-kvm/+bug/1207623

IA32_FEATURE_CONTROL is pointless if not expose VMX or SMX bits to
cpuid.1.ecx of vcpu. Current qemu-kvm will error return when kvm_put_msrs
or kvm_get_msrs.

Signed-off-by: Liu Jinsong jinsong@intel.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 target-i386/kvm.c | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 513ae52..7bb8455 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -65,6 +65,7 @@ static bool has_msr_star;
 static bool has_msr_hsave_pa;
 static bool has_msr_tsc_adjust;
 static bool has_msr_tsc_deadline;
+static bool has_msr_feature_control;
 static bool has_msr_async_pf_en;
 static bool has_msr_pv_eoi_en;
 static bool has_msr_misc_enable;
@@ -666,6 +667,12 @@ int kvm_arch_init_vcpu(CPUState *cs)
 
 qemu_add_vm_change_state_handler(cpu_update_state, env);
 
+c = cpuid_find_entry(cpuid_data.cpuid, 1, 0);
+if (c) {
+has_msr_feature_control = !!(c-ecx  CPUID_EXT_VMX) ||
+  !!(c-ecx  CPUID_EXT_SMX);
+}
+
 cpuid_data.cpuid.padding = 0;
 r = kvm_vcpu_ioctl(cs, KVM_SET_CPUID2, cpuid_data);
 if (r) {
@@ -1169,7 +1176,10 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 if (hyperv_vapic_recommended()) {
 kvm_msr_entry_set(msrs[n++], HV_X64_MSR_APIC_ASSIST_PAGE, 0);
 }
-kvm_msr_entry_set(msrs[n++], MSR_IA32_FEATURE_CONTROL, 
env-msr_ia32_feature_control);
+if (has_msr_feature_control) {
+kvm_msr_entry_set(msrs[n++], MSR_IA32_FEATURE_CONTROL,
+  env-msr_ia32_feature_control);
+}
 }
 if (env-mcg_cap) {
 int i;
@@ -1394,7 +1404,9 @@ static int kvm_get_msrs(X86CPU *cpu)
 if (has_msr_misc_enable) {
 msrs[n++].index = MSR_IA32_MISC_ENABLE;
 }
-msrs[n++].index = MSR_IA32_FEATURE_CONTROL;
+if (has_msr_feature_control) {
+msrs[n++].index = MSR_IA32_FEATURE_CONTROL;
+}
 
 if (!env-tsc_valid) {
 msrs[n++].index = MSR_IA32_TSC;
@@ -1509,6 +1521,7 @@ static int kvm_get_msrs(X86CPU *cpu)
 break;
 case MSR_IA32_FEATURE_CONTROL:
 env-msr_ia32_feature_control = msrs[i].data;
+break;
 default:
 if (msrs[i].index = MSR_MC0_CTL 
 msrs[i].index  MSR_MC0_CTL + (env-mcg_cap  0xff) * 4) {
-- 
1.8.3.1





[Qemu-devel] [PULL 3/9] kvm: migrate vPMU state

2013-08-23 Thread Paolo Bonzini
Reviewed-by: Gleb Natapov gnata...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 target-i386/cpu.h | 23 +
 target-i386/kvm.c | 93 ---
 target-i386/machine.c | 44 
 3 files changed, 155 insertions(+), 5 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index af4c0f7..31de265 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -305,6 +305,8 @@
 #define MSR_TSC_ADJUST  0x003b
 #define MSR_IA32_TSCDEADLINE0x6e0
 
+#define MSR_P6_PERFCTR0 0xc1
+
 #define MSR_MTRRcap 0xfe
 #define MSR_MTRRcap_VCNT8
 #define MSR_MTRRcap_FIXRANGE_SUPPORT(1  8)
@@ -318,6 +320,8 @@
 #define MSR_MCG_STATUS  0x17a
 #define MSR_MCG_CTL 0x17b
 
+#define MSR_P6_EVNTSEL0 0x186
+
 #define MSR_IA32_PERF_STATUS0x198
 
 #define MSR_IA32_MISC_ENABLE0x1a0
@@ -343,6 +347,14 @@
 
 #define MSR_MTRRdefType 0x2ff
 
+#define MSR_CORE_PERF_FIXED_CTR00x309
+#define MSR_CORE_PERF_FIXED_CTR10x30a
+#define MSR_CORE_PERF_FIXED_CTR20x30b
+#define MSR_CORE_PERF_FIXED_CTR_CTRL0x38d
+#define MSR_CORE_PERF_GLOBAL_STATUS 0x38e
+#define MSR_CORE_PERF_GLOBAL_CTRL   0x38f
+#define MSR_CORE_PERF_GLOBAL_OVF_CTRL   0x390
+
 #define MSR_MC0_CTL 0x400
 #define MSR_MC0_STATUS  0x401
 #define MSR_MC0_ADDR0x402
@@ -721,6 +733,9 @@ typedef struct {
 #define CPU_NB_REGS CPU_NB_REGS32
 #endif
 
+#define MAX_FIXED_COUNTERS 3
+#define MAX_GP_COUNTERS(MSR_IA32_PERF_STATUS - MSR_P6_EVNTSEL0)
+
 #define NB_MMU_MODES 3
 
 typedef enum TPRAccess {
@@ -816,6 +831,14 @@ typedef struct CPUX86State {
 uint64_t msr_ia32_misc_enable;
 uint64_t msr_ia32_feature_control;
 
+uint64_t msr_fixed_ctr_ctrl;
+uint64_t msr_global_ctrl;
+uint64_t msr_global_status;
+uint64_t msr_global_ovf_ctrl;
+uint64_t msr_fixed_counters[MAX_FIXED_COUNTERS];
+uint64_t msr_gp_counters[MAX_GP_COUNTERS];
+uint64_t msr_gp_evtsel[MAX_GP_COUNTERS];
+
 /* exception/interrupt handling */
 int error_code;
 int exception_is_int;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 84ac00a..513ae52 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -71,6 +71,9 @@ static bool has_msr_misc_enable;
 static bool has_msr_kvm_steal_time;
 static int lm_capable_kernel;
 
+static bool has_msr_architectural_pmu;
+static uint32_t num_architectural_pmu_counters;
+
 bool kvm_allows_irq0_override(void)
 {
 return !kvm_irqchip_in_kernel() || kvm_has_gsi_routing();
@@ -581,6 +584,25 @@ int kvm_arch_init_vcpu(CPUState *cs)
 break;
 }
 }
+
+if (limit = 0x0a) {
+uint32_t ver;
+
+cpu_x86_cpuid(env, 0x0a, 0, ver, unused, unused, unused);
+if ((ver  0xff)  0) {
+has_msr_architectural_pmu = true;
+num_architectural_pmu_counters = (ver  0xff00)  8;
+
+/* Shouldn't be more than 32, since that's the number of bits
+ * available in EBX to tell us _which_ counters are available.
+ * Play it safe.
+ */
+if (num_architectural_pmu_counters  MAX_GP_COUNTERS) {
+num_architectural_pmu_counters = MAX_GP_COUNTERS;
+}
+}
+}
+
 cpu_x86_cpuid(env, 0x8000, 0, limit, unused, unused, unused);
 
 for (i = 0x8000; i = limit; i++) {
@@ -1052,7 +1074,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 struct kvm_msr_entry entries[100];
 } msr_data;
 struct kvm_msr_entry *msrs = msr_data.entries;
-int n = 0;
+int n = 0, i;
 
 kvm_msr_entry_set(msrs[n++], MSR_IA32_SYSENTER_CS, env-sysenter_cs);
 kvm_msr_entry_set(msrs[n++], MSR_IA32_SYSENTER_ESP, env-sysenter_esp);
@@ -1094,9 +1116,8 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 }
 }
 /*
- * The following paravirtual MSRs have side effects on the guest or are
- * too heavy for normal writeback. Limit them to reset or full state
- * updates.
+ * The following MSRs have side effects on the guest or are too heavy
+ * for normal writeback. Limit them to reset or full state updates.
  */
 if (level = KVM_PUT_RESET_STATE) {
 kvm_msr_entry_set(msrs[n++], MSR_KVM_SYSTEM_TIME,
@@ -1114,6 +1135,33 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 kvm_msr_entry_set(msrs[n++], MSR_KVM_STEAL_TIME,
   env-steal_time_msr);
 }
+if (has_msr_architectural_pmu) {
+/* Stop the counter.  */
+kvm_msr_entry_set(msrs[n++], MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
+kvm_msr_entry_set(msrs[n++], MSR_CORE_PERF_GLOBAL_CTRL, 0);
+
+/* Set the counter values.  */
+for (i = 0; i  MAX_FIXED_COUNTERS; 

[Qemu-devel] [PULL 6/9] kvm: Simplify kvm_handle_io

2013-08-23 Thread Paolo Bonzini
From: Jan Kiszka jan.kis...@siemens.com

Now that cpu_in/out is just a wrapper around address_space_rw, we can
also call the latter directly. As host endianness == guest endianness,
there is no need for the memory access helpers st*_p/ld*_p as well.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 kvm-all.c | 28 ++--
 1 file changed, 2 insertions(+), 26 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index bfa4aac..ef52a0f 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1508,32 +1508,8 @@ static void kvm_handle_io(uint16_t port, void *data, int 
direction, int size,
 uint8_t *ptr = data;
 
 for (i = 0; i  count; i++) {
-if (direction == KVM_EXIT_IO_IN) {
-switch (size) {
-case 1:
-stb_p(ptr, cpu_inb(port));
-break;
-case 2:
-stw_p(ptr, cpu_inw(port));
-break;
-case 4:
-stl_p(ptr, cpu_inl(port));
-break;
-}
-} else {
-switch (size) {
-case 1:
-cpu_outb(port, ldub_p(ptr));
-break;
-case 2:
-cpu_outw(port, lduw_p(ptr));
-break;
-case 4:
-cpu_outl(port, ldl_p(ptr));
-break;
-}
-}
-
+address_space_rw(address_space_io, port, ptr, size,
+ direction == KVM_EXIT_IO_OUT);
 ptr += size;
 }
 }
-- 
1.8.3.1





[Qemu-devel] [PULL 7/9] kvm-all.c: max_cpus should not exceed KVM vcpu limit

2013-08-23 Thread Paolo Bonzini
From: Marcelo Tosatti mtosa...@redhat.com

maxcpus, which specifies the maximum number of hotpluggable CPUs,
should not exceed KVM's vcpu limit.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
[Reword message. - Paolo]
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 kvm-all.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/kvm-all.c b/kvm-all.c
index ef52a0f..a2d4978 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1400,6 +1400,13 @@ int kvm_init(void)
 goto err;
 }
 
+if (max_cpus  max_vcpus) {
+ret = -EINVAL;
+fprintf(stderr, Number of hotpluggable cpus requested (%d) exceeds 
max cpus 
+supported by KVM (%d)\n, max_cpus, max_vcpus);
+goto err;
+}
+
 s-vmfd = kvm_ioctl(s, KVM_CREATE_VM, 0);
 if (s-vmfd  0) {
 #ifdef TARGET_S390X
-- 
1.8.3.1





[Qemu-devel] [PULL 8/9] kvm: i386: fix LAPIC TSC deadline timer save/restore

2013-08-23 Thread Paolo Bonzini
From: Marcelo Tosatti mtosa...@redhat.com

The configuration of the timer represented by MSR_IA32_TSCDEADLINE depends on:

- APIC LVT Timer register.
- TSC value.

Change the order to respect the dependency.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 target-i386/kvm.c | 29 ++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 7bb8455..58f7bb7 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1073,6 +1073,26 @@ static void kvm_msr_entry_set(struct kvm_msr_entry 
*entry,
 entry-data = value;
 }
 
+static int kvm_put_tscdeadline_msr(X86CPU *cpu)
+{
+CPUX86State *env = cpu-env;
+struct {
+struct kvm_msrs info;
+struct kvm_msr_entry entries[1];
+} msr_data;
+struct kvm_msr_entry *msrs = msr_data.entries;
+
+if (!has_msr_tsc_deadline) {
+return 0;
+}
+
+kvm_msr_entry_set(msrs[0], MSR_IA32_TSCDEADLINE, env-tsc_deadline);
+
+msr_data.info.nmsrs = 1;
+
+return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, msr_data);
+}
+
 static int kvm_put_msrs(X86CPU *cpu, int level)
 {
 CPUX86State *env = cpu-env;
@@ -1096,9 +1116,6 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 if (has_msr_tsc_adjust) {
 kvm_msr_entry_set(msrs[n++], MSR_TSC_ADJUST, env-tsc_adjust);
 }
-if (has_msr_tsc_deadline) {
-kvm_msr_entry_set(msrs[n++], MSR_IA32_TSCDEADLINE, env-tsc_deadline);
-}
 if (has_msr_misc_enable) {
 kvm_msr_entry_set(msrs[n++], MSR_IA32_MISC_ENABLE,
   env-msr_ia32_misc_enable);
@@ -1808,6 +1825,12 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
 return ret;
 }
 }
+
+ret = kvm_put_tscdeadline_msr(x86_cpu);
+if (ret  0) {
+return ret;
+}
+
 ret = kvm_put_vcpu_events(x86_cpu, level);
 if (ret  0) {
 return ret;
-- 
1.8.3.1





[Qemu-devel] [PULL 9/9] kvm: shorten the parameter list for get_real_device()

2013-08-23 Thread Paolo Bonzini
get_real_device() has 5 parameters with the last 4 is contained in the first
structure.

This patch removes the last 4 parameters and directly use them from the first
parameter.

Acked-by: Alex Williamson alex.william...@redhat.com
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/i386/kvm/pci-assign.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index ff33dc8..73941b2 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -568,8 +568,7 @@ static int get_real_device_id(const char *devpath, uint16_t 
*val)
 return get_real_id(devpath, device, val);
 }
 
-static int get_real_device(AssignedDevice *pci_dev, uint16_t r_seg,
-   uint8_t r_bus, uint8_t r_dev, uint8_t r_func)
+static int get_real_device(AssignedDevice *pci_dev)
 {
 char dir[128], name[128];
 int fd, r = 0, v;
@@ -582,7 +581,8 @@ static int get_real_device(AssignedDevice *pci_dev, 
uint16_t r_seg,
 dev-region_number = 0;
 
 snprintf(dir, sizeof(dir), /sys/bus/pci/devices/%04x:%02x:%02x.%x/,
- r_seg, r_bus, r_dev, r_func);
+ pci_dev-host.domain, pci_dev-host.bus,
+ pci_dev-host.slot, pci_dev-host.function);
 
 snprintf(name, sizeof(name), %sconfig, dir);
 
@@ -1769,8 +1769,7 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 memcpy(dev-emulate_config_write, dev-emulate_config_read,
sizeof(dev-emulate_config_read));
 
-if (get_real_device(dev, dev-host.domain, dev-host.bus,
-dev-host.slot, dev-host.function)) {
+if (get_real_device(dev)) {
 error_report(pci-assign: Error: Couldn't get real device (%s)!,
  dev-dev.qdev.id);
 goto out;
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH 04/18] savevm: set right return value for qemu_file_rate_limit

2013-08-23 Thread Lei Li

On 08/23/2013 01:34 PM, Paolo Bonzini wrote:

Say, In ram_save_iterate(), the current logic is:

ret = qemu_file_rate_limit();
while(ret == 0) {
  save RAM blocks until no more to send.
}
if (ret  0) {
  return ret;
}
...

And in savevm layer, qemu_savevm_state_iterate() set an error if the return
value of ram_save_iterate  0.

But that is to report errors *not in the QEMUFile*.  Errors in the
QEMUFile are already reported by qemu_file_get_error(), and
qemu_savevm_state_iterate() will not overwrite them.


Sorry, a little confusion.
In qemu_savevm_state_iterate(), if the return of ram_save_iterate  0,
it will set error in QEMUFile by qemu_file_set_error(f, ret), the error
reported by qemu_file_get_error() in the QEMUFile is got from
qemu_file_set_error() by reading the f-last_error, right?

And now as qemu_file_rate_limit() never return negative value, what's the
meaning for the check: if (qemu_file_rate_limit(f)  0) in ram_save_iterate()?



qemu_file_rate_limit() returning 1 is enough to exit the loop,
which is all that is needed.

Obviously the return value of qemu_file_rate_limit() should have an negative
value for there has been an error. Otherwise we need to modify the logic
above.

It is not obvious to me... what is, again, the bug that you're observing?
I think it is happening only because you're modifying the migration thread's
body.  If you use the normal code of the migration thread, it will just work.

Paolo




--
Lei




Re: [Qemu-devel] [PATCH 13/18] arch_init: adjust ram_save_setup() for migrate_is_localhost

2013-08-23 Thread Paolo Bonzini
Il 23/08/2013 11:00, Lei Li ha scritto:
 The performance assurance is very important to this feature, our goal is
 100ms of downtime for a 1TB guest.

 I honestly doubt that 100ms of downtime is possible while the machine is
 stopped.  A 1TB guest has 2^28 = 268*10^6 pages, which you want to
 process in 100*10^6 nanoseconds.  Thus, your approach would require 0.4
 nanoseconds per page, or roughly 2 clock cycles per page.  This is
 impossible without _massive_ parallelization at all levels, starting
 from the kernel.

 Have you thought of using shared memory (with -mempath) instead of
 vmsplice?
 
 Precisely!
 
 Well, as Anthony mentioned in the version 1[1], there has been some work 
 involved
 regarding improvement of vmsplice() at kernel side by Robert Jennings[2].

Oh, finally!

 And yes, shared memory is an alternative, I think the problem with shared 
 memory is
 that can't share anonymous memory. For this maybe Anthony can chime in
 as the original idea him.  :-)

You could perhaps switch from normal to shared memory while the VM is
running?  Either use the dirty bitmap for this, or mmap the shared
memory in place on top of the anonymous memory.  Since it needs to be
done while the VM is stopped, you can do it perhaps 1G at a time and let
the VM run for some time before doing the next 1G.

The nice part is that because the VM is running, you can do it as slow
as you want. :)  Only the final flush and device save affects downtime.

Paolo
 
 Reference links:
 
 [1] Anthony's comments:
   https://lists.gnu.org/archive/html/qemu-devel/2013-06/msg02577.html
 
 [2] vmpslice support for zero-copy gifting of pages:
   http://comments.gmane.org/gmane.linux.kernel.mm/103998
 

 Paolo

 
 




Re: [Qemu-devel] [PATCH 04/18] savevm: set right return value for qemu_file_rate_limit

2013-08-23 Thread Paolo Bonzini
Il 23/08/2013 11:11, Lei Li ha scritto:
 
 In qemu_savevm_state_iterate(), if the return of ram_save_iterate  0,
 it will set error in QEMUFile by qemu_file_set_error(f, ret), the error
 reported by qemu_file_get_error() in the QEMUFile is got from
 qemu_file_set_error() by reading the f-last_error, right?
 
 And now as qemu_file_rate_limit() never return negative value, what's the
 meaning for the check: if (qemu_file_rate_limit(f)  0) in
 ram_save_iterate()?

I only see a while ((ret = qemu_file_rate_limit(f)) == 0), no
less-than-zero check.

Are we looking at the same code? :)

Paolo




Re: [Qemu-devel] [PATCH 04/18] savevm: set right return value for qemu_file_rate_limit

2013-08-23 Thread Lei Li

On 08/23/2013 05:14 PM, Paolo Bonzini wrote:

Il 23/08/2013 11:11, Lei Li ha scritto:

In qemu_savevm_state_iterate(), if the return of ram_save_iterate  0,
it will set error in QEMUFile by qemu_file_set_error(f, ret), the error
reported by qemu_file_get_error() in the QEMUFile is got from
qemu_file_set_error() by reading the f-last_error, right?

And now as qemu_file_rate_limit() never return negative value, what's the
meaning for the check: if (qemu_file_rate_limit(f)  0) in
ram_save_iterate()?

I only see a while ((ret = qemu_file_rate_limit(f)) == 0), no
less-than-zero check.

Are we looking at the same code? :)


I think so, hehe.
You might want to look a little more. After the while(..), there is a check:

if (ret  0) {
bytes_transferred += total_sent;
return ret;
}



Paolo




--
Lei




[Qemu-devel] [PATCH] spapr-vscsi: Adding VSCSI capabilities

2013-08-23 Thread Alexey Kardashevskiy
From: Nikunj A Dadhania nik...@linux.vnet.ibm.com

This implements capabilities exchange between host and client.
As at the moment no capability is supported, put zero flags everywhere
and return.

Signed-off-by: Nikunj A Dadhania nik...@linux.vnet.ibm.com
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 hw/scsi/spapr_vscsi.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/hw/scsi/spapr_vscsi.c b/hw/scsi/spapr_vscsi.c
index c5ff794..cc35b1b 100644
--- a/hw/scsi/spapr_vscsi.c
+++ b/hw/scsi/spapr_vscsi.c
@@ -919,6 +919,34 @@ static int vscsi_send_adapter_info(VSCSIState *s, 
vscsi_req *req)
 return vscsi_send_iu(s, req, sizeof(*sinfo), VIOSRP_MAD_FORMAT);
 }
 
+static int vscsi_send_capabilities(VSCSIState *s, vscsi_req *req)
+{
+struct viosrp_capabilities *vcap;
+struct capabilities cap;
+int rc;
+
+vcap = req-iu.mad.capabilities;
+rc = spapr_vio_dma_read(s-vdev, be64_to_cpu(vcap-buffer),
+cap, be16_to_cpu(vcap-common.length));
+if (rc)  {
+fprintf(stderr, vscsi_send_capabilities: DMA write failure !\n);
+}
+
+cap.flags = 0;
+cap.migration.ecl = 0;
+cap.reserve.type = 0;
+cap.migration.common.server_support = 0;
+cap.reserve.common.server_support = 0;
+rc = spapr_vio_dma_write(s-vdev, be64_to_cpu(vcap-buffer),
+ cap, be16_to_cpu(vcap-common.length));
+if (rc)  {
+fprintf(stderr, vscsi_send_capabilities: DMA write failure !\n);
+}
+vcap-common.status = rc ? cpu_to_be32(1) : 0;
+
+return vscsi_send_iu(s, req, sizeof(*vcap), VIOSRP_MAD_FORMAT);
+}
+
 static int vscsi_handle_mad_req(VSCSIState *s, vscsi_req *req)
 {
 union mad_iu *mad = req-iu.mad;
@@ -939,6 +967,9 @@ static int vscsi_handle_mad_req(VSCSIState *s, vscsi_req 
*req)
 mad-host_config.common.status = cpu_to_be16(1);
 vscsi_send_iu(s, req, sizeof(mad-host_config), VIOSRP_MAD_FORMAT);
 break;
+case VIOSRP_CAPABILITIES_TYPE:
+vscsi_send_capabilities(s, req);
+break;
 default:
 fprintf(stderr, VSCSI: Unknown MAD type %02x\n,
 be32_to_cpu(mad-empty_iu.common.type));
-- 
1.8.4.rc4




Re: [Qemu-devel] [PATCH 11/18] migration: introduce capability localhost

2013-08-23 Thread Lei Li

On 08/23/2013 04:01 PM, Paolo Bonzini wrote:

Il 23/08/2013 09:51, Lei Li ha scritto:

Because in these patches, the local URI is only present in the
destination (which is wrong: the destination should autodetect local
mode using the load-page hook).  As you can see above, a unix URI will
examine the capability and pick the appropriate migration method.

Hi Paolo,

I agree with that 'localhost' is a bad capability name. Since we just
use unix
socket temporarily and will be ending up with vmsplice via pipe, how about
just page_flipping?

The basic migration method is still a unix socket (and a unix: URI).
I suggested unix-page-flipping to make it clear that it doesn't work
with tcp or rdma migration (you might make it work with fd migration).


OK, make sense, thanks.



Paolo




--
Lei




Re: [Qemu-devel] [PATCH 04/18] savevm: set right return value for qemu_file_rate_limit

2013-08-23 Thread Paolo Bonzini
Il 23/08/2013 11:18, Lei Li ha scritto:

 And now as qemu_file_rate_limit() never return negative value, what's
 the
 meaning for the check: if (qemu_file_rate_limit(f)  0) in
 ram_save_iterate()?
 I only see a while ((ret = qemu_file_rate_limit(f)) == 0), no
 less-than-zero check.

 Are we looking at the same code? :)
 
 I think so, hehe.
 You might want to look a little more. After the while(..), there is a
 check:
 
 if (ret  0) {
 bytes_transferred += total_sent;
 return ret;
 }

Aha, there you are.  It's dead code, you can delete it.

Paolo



[Qemu-devel] [PATCH] spapr-vscsi: Report error on unsupported MAD requests

2013-08-23 Thread Alexey Kardashevskiy
The existing driver just dropped unsupported requests. This adds error
responses to those unhandled requests.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 hw/scsi/spapr_vscsi.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/scsi/spapr_vscsi.c b/hw/scsi/spapr_vscsi.c
index cc35b1b..9259d7e 100644
--- a/hw/scsi/spapr_vscsi.c
+++ b/hw/scsi/spapr_vscsi.c
@@ -957,24 +957,24 @@ static int vscsi_handle_mad_req(VSCSIState *s, vscsi_req 
*req)
 break;
 case VIOSRP_ERROR_LOG_TYPE:
 fprintf(stderr, Unsupported ERROR LOG MAD IU\n);
-mad-error_log.common.status = cpu_to_be16(1);
-vscsi_send_iu(s, req, sizeof(mad-error_log), VIOSRP_MAD_FORMAT);
 break;
 case VIOSRP_ADAPTER_INFO_TYPE:
 vscsi_send_adapter_info(s, req);
-break;
+return 1;
 case VIOSRP_HOST_CONFIG_TYPE:
-mad-host_config.common.status = cpu_to_be16(1);
-vscsi_send_iu(s, req, sizeof(mad-host_config), VIOSRP_MAD_FORMAT);
+fprintf(stderr, Unsupported HOST CONFIG TYPE MAD IU\n);
 break;
 case VIOSRP_CAPABILITIES_TYPE:
 vscsi_send_capabilities(s, req);
-break;
+return 1;
 default:
 fprintf(stderr, VSCSI: Unknown MAD type %02x\n,
 be32_to_cpu(mad-empty_iu.common.type));
 }
 
+mad-empty_iu.common.status = cpu_to_be16(VIOSRP_MAD_NOT_SUPPORTED);
+vscsi_send_iu(s, req, mad-empty_iu.common.length, VIOSRP_MAD_FORMAT);
+
 return 1;
 }
 
-- 
1.8.4.rc4




Re: [Qemu-devel] [PATCH 04/18] savevm: set right return value for qemu_file_rate_limit

2013-08-23 Thread Lei Li

On 08/23/2013 05:22 PM, Paolo Bonzini wrote:

Il 23/08/2013 11:18, Lei Li ha scritto:

And now as qemu_file_rate_limit() never return negative value, what's
the
meaning for the check: if (qemu_file_rate_limit(f)  0) in
ram_save_iterate()?

I only see a while ((ret = qemu_file_rate_limit(f)) == 0), no
less-than-zero check.

Are we looking at the same code? :)

I think so, hehe.
You might want to look a little more. After the while(..), there is a
check:

if (ret  0) {
 bytes_transferred += total_sent;
 return ret;
}

Aha, there you are.  It's dead code, you can delete it.


Sure, I will send it along with other two patches you reviewed positively. :)



Paolo




--
Lei




Re: [Qemu-devel] [PULL 9/9] kvm: shorten the parameter list for get_real_device()

2013-08-23 Thread Wei Yang
Paolo,

Is it necessary to add this?

From: Wei Yang weiy...@linux.vnet.ibm.com

On Fri, Aug 23, 2013 at 11:03:35AM +0200, Paolo Bonzini wrote:
get_real_device() has 5 parameters with the last 4 is contained in the first
structure.

This patch removes the last 4 parameters and directly use them from the first
parameter.

Acked-by: Alex Williamson alex.william...@redhat.com
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/i386/kvm/pci-assign.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index ff33dc8..73941b2 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -568,8 +568,7 @@ static int get_real_device_id(const char *devpath, 
uint16_t *val)
 return get_real_id(devpath, device, val);
 }

-static int get_real_device(AssignedDevice *pci_dev, uint16_t r_seg,
-   uint8_t r_bus, uint8_t r_dev, uint8_t r_func)
+static int get_real_device(AssignedDevice *pci_dev)
 {
 char dir[128], name[128];
 int fd, r = 0, v;
@@ -582,7 +581,8 @@ static int get_real_device(AssignedDevice *pci_dev, 
uint16_t r_seg,
 dev-region_number = 0;

 snprintf(dir, sizeof(dir), /sys/bus/pci/devices/%04x:%02x:%02x.%x/,
- r_seg, r_bus, r_dev, r_func);
+ pci_dev-host.domain, pci_dev-host.bus,
+ pci_dev-host.slot, pci_dev-host.function);

 snprintf(name, sizeof(name), %sconfig, dir);

@@ -1769,8 +1769,7 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 memcpy(dev-emulate_config_write, dev-emulate_config_read,
sizeof(dev-emulate_config_read));

-if (get_real_device(dev, dev-host.domain, dev-host.bus,
-dev-host.slot, dev-host.function)) {
+if (get_real_device(dev)) {
 error_report(pci-assign: Error: Couldn't get real device (%s)!,
  dev-dev.qdev.id);
 goto out;
-- 
1.8.3.1

-- 
Richard Yang
Help you, Help me




Re: [Qemu-devel] [PULL 9/9] kvm: shorten the parameter list for get_real_device()

2013-08-23 Thread Paolo Bonzini
Il 23/08/2013 11:37, Wei Yang ha scritto:
 Paolo,
 
 Is it necessary to add this?
 
 From: Wei Yang weiy...@linux.vnet.ibm.com

Right, I'll send v2 right away.

Paolo

 On Fri, Aug 23, 2013 at 11:03:35AM +0200, Paolo Bonzini wrote:
 get_real_device() has 5 parameters with the last 4 is contained in the first
 structure.

 This patch removes the last 4 parameters and directly use them from the first
 parameter.

 Acked-by: Alex Williamson alex.william...@redhat.com
 Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
 hw/i386/kvm/pci-assign.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

 diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
 index ff33dc8..73941b2 100644
 --- a/hw/i386/kvm/pci-assign.c
 +++ b/hw/i386/kvm/pci-assign.c
 @@ -568,8 +568,7 @@ static int get_real_device_id(const char *devpath, 
 uint16_t *val)
 return get_real_id(devpath, device, val);
 }

 -static int get_real_device(AssignedDevice *pci_dev, uint16_t r_seg,
 -   uint8_t r_bus, uint8_t r_dev, uint8_t r_func)
 +static int get_real_device(AssignedDevice *pci_dev)
 {
 char dir[128], name[128];
 int fd, r = 0, v;
 @@ -582,7 +581,8 @@ static int get_real_device(AssignedDevice *pci_dev, 
 uint16_t r_seg,
 dev-region_number = 0;

 snprintf(dir, sizeof(dir), /sys/bus/pci/devices/%04x:%02x:%02x.%x/,
 - r_seg, r_bus, r_dev, r_func);
 + pci_dev-host.domain, pci_dev-host.bus,
 + pci_dev-host.slot, pci_dev-host.function);

 snprintf(name, sizeof(name), %sconfig, dir);

 @@ -1769,8 +1769,7 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 memcpy(dev-emulate_config_write, dev-emulate_config_read,
sizeof(dev-emulate_config_read));

 -if (get_real_device(dev, dev-host.domain, dev-host.bus,
 -dev-host.slot, dev-host.function)) {
 +if (get_real_device(dev)) {
 error_report(pci-assign: Error: Couldn't get real device (%s)!,
  dev-dev.qdev.id);
 goto out;
 -- 
 1.8.3.1
 




[Qemu-devel] [PULL 0/9] KVM changes for 2013-08-23

2013-08-23 Thread Paolo Bonzini
Anthony,

The following changes since commit f03d07d4683b2e8325a7cb60b4e14b977b1a869c:

  Merge remote-tracking branch 'quintela/migration.next' into staging 
(2013-07-23 10:57:23 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master

for you to fetch changes up to 3f994214cd39cfdac57be32c4d0cf401a046b17f:

  kvm: shorten the parameter list for get_real_device() (2013-08-22 18:40:12 
+0200)

Paolo


Arthur Chunqi Li (1):
  Initialize IA32_FEATURE_CONTROL MSR in reset and migration

Jan Kiszka (1):
  kvm: Simplify kvm_handle_io

Liu Jinsong (1):
  kvm: x86: fix setting IA32_FEATURE_CONTROL with nested VMX disabled

Marcelo Tosatti (2):
  kvm-all.c: max_cpus should not exceed KVM vcpu limit
  kvm: i386: fix LAPIC TSC deadline timer save/restore

Paolo Bonzini (3):
  target-i386: remove tabs from target-i386/cpu.h
  kvm: migrate vPMU state
  kvm: shorten the parameter list for get_real_device()

Vincenzo Maffione (1):
  kvm: add KVM_IRQFD_FLAG_RESAMPLE support

 hw/i386/kvm/pci-assign.c |   9 +-
 hw/misc/vfio.c   |   4 +-
 hw/virtio/virtio-pci.c   |   2 +-
 include/sysemu/kvm.h |   3 +-
 kvm-all.c|  52 +---
 target-i386/cpu.h| 217 ++-
 target-i386/kvm.c| 139 --
 target-i386/machine.c|  66 ++
 8 files changed, 349 insertions(+), 143 deletions(-)
-- 
1.8.3.1




[Qemu-devel] [PULL 6/9] kvm: Simplify kvm_handle_io

2013-08-23 Thread Paolo Bonzini
From: Jan Kiszka jan.kis...@siemens.com

Now that cpu_in/out is just a wrapper around address_space_rw, we can
also call the latter directly. As host endianness == guest endianness,
there is no need for the memory access helpers st*_p/ld*_p as well.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 kvm-all.c | 28 ++--
 1 file changed, 2 insertions(+), 26 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index bfa4aac..ef52a0f 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1508,32 +1508,8 @@ static void kvm_handle_io(uint16_t port, void *data, int 
direction, int size,
 uint8_t *ptr = data;
 
 for (i = 0; i  count; i++) {
-if (direction == KVM_EXIT_IO_IN) {
-switch (size) {
-case 1:
-stb_p(ptr, cpu_inb(port));
-break;
-case 2:
-stw_p(ptr, cpu_inw(port));
-break;
-case 4:
-stl_p(ptr, cpu_inl(port));
-break;
-}
-} else {
-switch (size) {
-case 1:
-cpu_outb(port, ldub_p(ptr));
-break;
-case 2:
-cpu_outw(port, lduw_p(ptr));
-break;
-case 4:
-cpu_outl(port, ldl_p(ptr));
-break;
-}
-}
-
+address_space_rw(address_space_io, port, ptr, size,
+ direction == KVM_EXIT_IO_OUT);
 ptr += size;
 }
 }
-- 
1.8.3.1





[Qemu-devel] [PULL 1/9] Initialize IA32_FEATURE_CONTROL MSR in reset and migration

2013-08-23 Thread Paolo Bonzini
From: Arthur Chunqi Li yzt...@gmail.com

The recent KVM patch adds IA32_FEATURE_CONTROL support. QEMU needs
to clear this MSR when reset vCPU and keep the value of it when
migration. This patch add this feature.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
Signed-off-by: Gleb Natapov g...@redhat.com
---
 target-i386/cpu.h |  2 ++
 target-i386/kvm.c |  4 
 target-i386/machine.c | 22 ++
 3 files changed, 28 insertions(+)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index cedefdc..3a52f94 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -301,6 +301,7 @@
 #define MSR_IA32_APICBASE_BSP   (18)
 #define MSR_IA32_APICBASE_ENABLE(111)
 #define MSR_IA32_APICBASE_BASE  (0xf12)
+#define MSR_IA32_FEATURE_CONTROL0x003a
 #define MSR_TSC_ADJUST  0x003b
 #define MSR_IA32_TSCDEADLINE0x6e0
 
@@ -813,6 +814,7 @@ typedef struct CPUX86State {
 
 uint64_t mcg_status;
 uint64_t msr_ia32_misc_enable;
+uint64_t msr_ia32_feature_control;
 
 /* exception/interrupt handling */
 int error_code;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 3c9d10a..84ac00a 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1121,6 +1121,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 if (hyperv_vapic_recommended()) {
 kvm_msr_entry_set(msrs[n++], HV_X64_MSR_APIC_ASSIST_PAGE, 0);
 }
+kvm_msr_entry_set(msrs[n++], MSR_IA32_FEATURE_CONTROL, 
env-msr_ia32_feature_control);
 }
 if (env-mcg_cap) {
 int i;
@@ -1345,6 +1346,7 @@ static int kvm_get_msrs(X86CPU *cpu)
 if (has_msr_misc_enable) {
 msrs[n++].index = MSR_IA32_MISC_ENABLE;
 }
+msrs[n++].index = MSR_IA32_FEATURE_CONTROL;
 
 if (!env-tsc_valid) {
 msrs[n++].index = MSR_IA32_TSC;
@@ -1443,6 +1445,8 @@ static int kvm_get_msrs(X86CPU *cpu)
 case MSR_IA32_MISC_ENABLE:
 env-msr_ia32_misc_enable = msrs[i].data;
 break;
+case MSR_IA32_FEATURE_CONTROL:
+env-msr_ia32_feature_control = msrs[i].data;
 default:
 if (msrs[i].index = MSR_MC0_CTL 
 msrs[i].index  MSR_MC0_CTL + (env-mcg_cap  0xff) * 4) {
diff --git a/target-i386/machine.c b/target-i386/machine.c
index f9ec581..0d2088e 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -435,6 +435,14 @@ static bool misc_enable_needed(void *opaque)
 return env-msr_ia32_misc_enable != MSR_IA32_MISC_ENABLE_DEFAULT;
 }
 
+static bool feature_control_needed(void *opaque)
+{
+X86CPU *cpu = opaque;
+CPUX86State *env = cpu-env;
+
+return env-msr_ia32_feature_control != 0;
+}
+
 static const VMStateDescription vmstate_msr_ia32_misc_enable = {
 .name = cpu/msr_ia32_misc_enable,
 .version_id = 1,
@@ -446,6 +454,17 @@ static const VMStateDescription 
vmstate_msr_ia32_misc_enable = {
 }
 };
 
+static const VMStateDescription vmstate_msr_ia32_feature_control = {
+.name = cpu/msr_ia32_feature_control,
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields  = (VMStateField []) {
+VMSTATE_UINT64(env.msr_ia32_feature_control, X86CPU),
+VMSTATE_END_OF_LIST()
+}
+};
+
 const VMStateDescription vmstate_x86_cpu = {
 .name = cpu,
 .version_id = 12,
@@ -571,6 +590,9 @@ const VMStateDescription vmstate_x86_cpu = {
 }, {
 .vmsd = vmstate_msr_ia32_misc_enable,
 .needed = misc_enable_needed,
+}, {
+.vmsd = vmstate_msr_ia32_feature_control,
+.needed = feature_control_needed,
 } , {
 /* empty */
 }
-- 
1.8.3.1





[Qemu-devel] [PULL 7/9] kvm-all.c: max_cpus should not exceed KVM vcpu limit

2013-08-23 Thread Paolo Bonzini
From: Marcelo Tosatti mtosa...@redhat.com

maxcpus, which specifies the maximum number of hotpluggable CPUs,
should not exceed KVM's vcpu limit.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
[Reword message. - Paolo]
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 kvm-all.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/kvm-all.c b/kvm-all.c
index ef52a0f..a2d4978 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1400,6 +1400,13 @@ int kvm_init(void)
 goto err;
 }
 
+if (max_cpus  max_vcpus) {
+ret = -EINVAL;
+fprintf(stderr, Number of hotpluggable cpus requested (%d) exceeds 
max cpus 
+supported by KVM (%d)\n, max_cpus, max_vcpus);
+goto err;
+}
+
 s-vmfd = kvm_ioctl(s, KVM_CREATE_VM, 0);
 if (s-vmfd  0) {
 #ifdef TARGET_S390X
-- 
1.8.3.1





[Qemu-devel] [PULL 2/9] target-i386: remove tabs from target-i386/cpu.h

2013-08-23 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 target-i386/cpu.h | 192 +++---
 1 file changed, 96 insertions(+), 96 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 3a52f94..af4c0f7 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -37,9 +37,9 @@
 #define TARGET_HAS_ICE 1
 
 #ifdef TARGET_X86_64
-#define ELF_MACHINEEM_X86_64
+#define ELF_MACHINE EM_X86_64
 #else
-#define ELF_MACHINEEM_386
+#define ELF_MACHINE EM_386
 #endif
 
 #define CPUArchState struct CPUX86State
@@ -98,10 +98,10 @@
 #define DESC_TSS_BUSY_MASK (1  9)
 
 /* eflags masks */
-#define CC_C   0x0001
-#define CC_P   0x0004
-#define CC_A   0x0010
-#define CC_Z   0x0040
+#define CC_C0x0001
+#define CC_P0x0004
+#define CC_A0x0010
+#define CC_Z0x0040
 #define CC_S0x0080
 #define CC_O0x0800
 
@@ -109,14 +109,14 @@
 #define IOPL_SHIFT 12
 #define VM_SHIFT   17
 
-#define TF_MASK0x0100
-#define IF_MASK0x0200
-#define DF_MASK0x0400
-#define IOPL_MASK  0x3000
-#define NT_MASK0x4000
-#define RF_MASK0x0001
-#define VM_MASK0x0002
-#define AC_MASK0x0004
+#define TF_MASK 0x0100
+#define IF_MASK 0x0200
+#define DF_MASK 0x0400
+#define IOPL_MASK   0x3000
+#define NT_MASK 0x4000
+#define RF_MASK 0x0001
+#define VM_MASK 0x0002
+#define AC_MASK 0x0004
 #define VIF_MASK0x0008
 #define VIP_MASK0x0010
 #define ID_MASK 0x0020
@@ -238,28 +238,28 @@
 #define DR7_TYPE_IO_RW   0x2
 #define DR7_TYPE_DATA_RW 0x3
 
-#define PG_PRESENT_BIT 0
-#define PG_RW_BIT  1
-#define PG_USER_BIT2
-#define PG_PWT_BIT 3
-#define PG_PCD_BIT 4
-#define PG_ACCESSED_BIT5
-#define PG_DIRTY_BIT   6
-#define PG_PSE_BIT 7
-#define PG_GLOBAL_BIT  8
-#define PG_NX_BIT  63
+#define PG_PRESENT_BIT  0
+#define PG_RW_BIT   1
+#define PG_USER_BIT 2
+#define PG_PWT_BIT  3
+#define PG_PCD_BIT  4
+#define PG_ACCESSED_BIT 5
+#define PG_DIRTY_BIT6
+#define PG_PSE_BIT  7
+#define PG_GLOBAL_BIT   8
+#define PG_NX_BIT   63
 
 #define PG_PRESENT_MASK  (1  PG_PRESENT_BIT)
-#define PG_RW_MASK  (1  PG_RW_BIT)
-#define PG_USER_MASK(1  PG_USER_BIT)
-#define PG_PWT_MASK (1  PG_PWT_BIT)
-#define PG_PCD_MASK (1  PG_PCD_BIT)
+#define PG_RW_MASK   (1  PG_RW_BIT)
+#define PG_USER_MASK (1  PG_USER_BIT)
+#define PG_PWT_MASK  (1  PG_PWT_BIT)
+#define PG_PCD_MASK  (1  PG_PCD_BIT)
 #define PG_ACCESSED_MASK (1  PG_ACCESSED_BIT)
-#define PG_DIRTY_MASK   (1  PG_DIRTY_BIT)
-#define PG_PSE_MASK (1  PG_PSE_BIT)
-#define PG_GLOBAL_MASK  (1  PG_GLOBAL_BIT)
+#define PG_DIRTY_MASK(1  PG_DIRTY_BIT)
+#define PG_PSE_MASK  (1  PG_PSE_BIT)
+#define PG_GLOBAL_MASK   (1  PG_GLOBAL_BIT)
 #define PG_HI_USER_MASK  0x7ff0LL
-#define PG_NX_MASK  (1LL  PG_NX_BIT)
+#define PG_NX_MASK   (1LL  PG_NX_BIT)
 
 #define PG_ERROR_W_BIT 1
 
@@ -269,32 +269,32 @@
 #define PG_ERROR_RSVD_MASK 0x08
 #define PG_ERROR_I_D_MASK  0x10
 
-#define MCG_CTL_P  (1ULL8)   /* MCG_CAP register available */
-#define MCG_SER_P  (1ULL24) /* MCA recovery/new status bits */
+#define MCG_CTL_P   (1ULL8)   /* MCG_CAP register available */
+#define MCG_SER_P   (1ULL24) /* MCA recovery/new status bits */
 
-#define MCE_CAP_DEF(MCG_CTL_P|MCG_SER_P)
-#define MCE_BANKS_DEF  10
+#define MCE_CAP_DEF (MCG_CTL_P|MCG_SER_P)
+#define MCE_BANKS_DEF   10
 
-#define MCG_STATUS_RIPV(1ULL0)   /* restart ip valid */
-#define MCG_STATUS_EIPV(1ULL1)   /* ip points to correct instruction 
*/
-#define MCG_STATUS_MCIP(1ULL2)   /* machine check in progress */
+#define MCG_STATUS_RIPV (1ULL0)   /* restart ip valid */
+#define MCG_STATUS_EIPV (1ULL1)   /* ip points to correct instruction */
+#define MCG_STATUS_MCIP (1ULL2)   /* machine check in progress */
 
-#define MCI_STATUS_VAL (1ULL63)  /* valid error */
-#define MCI_STATUS_OVER(1ULL62)  /* previous errors lost */
-#define MCI_STATUS_UC  (1ULL61)  /* uncorrected error */
-#define MCI_STATUS_EN  (1ULL60)  /* error enabled */
-#define MCI_STATUS_MISCV (1ULL59) /* misc error reg. valid */
-#define MCI_STATUS_ADDRV (1ULL58) /* addr reg. valid */
-#define MCI_STATUS_PCC (1ULL57)  /* processor context corrupt */
-#define MCI_STATUS_S   (1ULL56)  /* Signaled machine check */
-#define MCI_STATUS_AR  (1ULL55)  /* Action required */
+#define MCI_STATUS_VAL   (1ULL63)  /* valid error */
+#define MCI_STATUS_OVER  (1ULL62)  /* previous errors lost */
+#define MCI_STATUS_UC(1ULL61)  /* uncorrected error */
+#define MCI_STATUS_EN(1ULL60)  /* 

[Qemu-devel] [PULL 4/9] kvm: add KVM_IRQFD_FLAG_RESAMPLE support

2013-08-23 Thread Paolo Bonzini
From: Vincenzo Maffione v.maffi...@gmail.com

Added an EventNotifier* parameter to
kvm-all.c:kvm_irqchip_add_irqfd_notifier(), in order to give KVM
another eventfd to be used as resamplefd. See the documentation
in the linux kernel sources in Documentation/virtual/kvm/api.txt
(section 4.75) for more details.
When the added parameter is passed NULL, the behaviour of the
function is unchanged with respect to the previous versions.

Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Vincenzo Maffione v.maffi...@gmail.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/misc/vfio.c |  4 ++--
 hw/virtio/virtio-pci.c |  2 +-
 include/sysemu/kvm.h   |  3 ++-
 kvm-all.c  | 17 +
 4 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index ad8ce77..54af34a 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -646,7 +646,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
unsigned int nr,
 vector-virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
 if (vector-virq  0 ||
 kvm_irqchip_add_irqfd_notifier(kvm_state, vector-interrupt,
-   vector-virq)  0) {
+   NULL, vector-virq)  0) {
 if (vector-virq = 0) {
 kvm_irqchip_release_virq(kvm_state, vector-virq);
 vector-virq = -1;
@@ -814,7 +814,7 @@ retry:
 vector-virq = kvm_irqchip_add_msi_route(kvm_state, msg);
 if (vector-virq  0 ||
 kvm_irqchip_add_irqfd_notifier(kvm_state, vector-interrupt,
-   vector-virq)  0) {
+   NULL, vector-virq)  0) {
 qemu_set_fd_handler(event_notifier_get_fd(vector-interrupt),
 vfio_msi_interrupt, NULL, vector);
 }
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index c38cfd1..c4db407 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -508,7 +508,7 @@ static int kvm_virtio_pci_irqfd_use(VirtIOPCIProxy *proxy,
 VirtQueue *vq = virtio_get_queue(proxy-vdev, queue_no);
 EventNotifier *n = virtio_queue_get_guest_notifier(vq);
 int ret;
-ret = kvm_irqchip_add_irqfd_notifier(kvm_state, n, irqfd-virq);
+ret = kvm_irqchip_add_irqfd_notifier(kvm_state, n, NULL, irqfd-virq);
 return ret;
 }
 
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index f8ac448..ce3efaf 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -309,7 +309,8 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg);
 int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg);
 void kvm_irqchip_release_virq(KVMState *s, int virq);
 
-int kvm_irqchip_add_irqfd_notifier(KVMState *s, EventNotifier *n, int virq);
+int kvm_irqchip_add_irqfd_notifier(KVMState *s, EventNotifier *n,
+   EventNotifier *rn, int virq);
 int kvm_irqchip_remove_irqfd_notifier(KVMState *s, EventNotifier *n, int virq);
 void kvm_pc_gsi_handler(void *opaque, int n, int level);
 void kvm_pc_setup_irq_routing(bool pci_enabled);
diff --git a/kvm-all.c b/kvm-all.c
index 4fb4ccb..bfa4aac 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1230,7 +1230,8 @@ int kvm_irqchip_update_msi_route(KVMState *s, int virq, 
MSIMessage msg)
 return kvm_update_routing_entry(s, kroute);
 }
 
-static int kvm_irqchip_assign_irqfd(KVMState *s, int fd, int virq, bool assign)
+static int kvm_irqchip_assign_irqfd(KVMState *s, int fd, int rfd, int virq,
+bool assign)
 {
 struct kvm_irqfd irqfd = {
 .fd = fd,
@@ -1238,6 +1239,11 @@ static int kvm_irqchip_assign_irqfd(KVMState *s, int fd, 
int virq, bool assign)
 .flags = assign ? 0 : KVM_IRQFD_FLAG_DEASSIGN,
 };
 
+if (rfd != -1) {
+irqfd.flags |= KVM_IRQFD_FLAG_RESAMPLE;
+irqfd.resamplefd = rfd;
+}
+
 if (!kvm_irqfds_enabled()) {
 return -ENOSYS;
 }
@@ -1276,14 +1282,17 @@ int kvm_irqchip_update_msi_route(KVMState *s, int virq, 
MSIMessage msg)
 }
 #endif /* !KVM_CAP_IRQ_ROUTING */
 
-int kvm_irqchip_add_irqfd_notifier(KVMState *s, EventNotifier *n, int virq)
+int kvm_irqchip_add_irqfd_notifier(KVMState *s, EventNotifier *n,
+   EventNotifier *rn, int virq)
 {
-return kvm_irqchip_assign_irqfd(s, event_notifier_get_fd(n), virq, true);
+return kvm_irqchip_assign_irqfd(s, event_notifier_get_fd(n),
+   rn ? event_notifier_get_fd(rn) : -1, virq, true);
 }
 
 int kvm_irqchip_remove_irqfd_notifier(KVMState *s, EventNotifier *n, int virq)
 {
-return kvm_irqchip_assign_irqfd(s, event_notifier_get_fd(n), virq, false);
+return kvm_irqchip_assign_irqfd(s, event_notifier_get_fd(n), -1, virq,
+   false);
 }
 
 static int kvm_irqchip_create(KVMState *s)
-- 
1.8.3.1





[Qemu-devel] [PULL 8/9] kvm: i386: fix LAPIC TSC deadline timer save/restore

2013-08-23 Thread Paolo Bonzini
From: Marcelo Tosatti mtosa...@redhat.com

The configuration of the timer represented by MSR_IA32_TSCDEADLINE depends on:

- APIC LVT Timer register.
- TSC value.

Change the order to respect the dependency.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 target-i386/kvm.c | 29 ++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 7bb8455..58f7bb7 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1073,6 +1073,26 @@ static void kvm_msr_entry_set(struct kvm_msr_entry 
*entry,
 entry-data = value;
 }
 
+static int kvm_put_tscdeadline_msr(X86CPU *cpu)
+{
+CPUX86State *env = cpu-env;
+struct {
+struct kvm_msrs info;
+struct kvm_msr_entry entries[1];
+} msr_data;
+struct kvm_msr_entry *msrs = msr_data.entries;
+
+if (!has_msr_tsc_deadline) {
+return 0;
+}
+
+kvm_msr_entry_set(msrs[0], MSR_IA32_TSCDEADLINE, env-tsc_deadline);
+
+msr_data.info.nmsrs = 1;
+
+return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, msr_data);
+}
+
 static int kvm_put_msrs(X86CPU *cpu, int level)
 {
 CPUX86State *env = cpu-env;
@@ -1096,9 +1116,6 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 if (has_msr_tsc_adjust) {
 kvm_msr_entry_set(msrs[n++], MSR_TSC_ADJUST, env-tsc_adjust);
 }
-if (has_msr_tsc_deadline) {
-kvm_msr_entry_set(msrs[n++], MSR_IA32_TSCDEADLINE, env-tsc_deadline);
-}
 if (has_msr_misc_enable) {
 kvm_msr_entry_set(msrs[n++], MSR_IA32_MISC_ENABLE,
   env-msr_ia32_misc_enable);
@@ -1808,6 +1825,12 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
 return ret;
 }
 }
+
+ret = kvm_put_tscdeadline_msr(x86_cpu);
+if (ret  0) {
+return ret;
+}
+
 ret = kvm_put_vcpu_events(x86_cpu, level);
 if (ret  0) {
 return ret;
-- 
1.8.3.1





[Qemu-devel] [PULL 9/9] kvm: shorten the parameter list for get_real_device()

2013-08-23 Thread Paolo Bonzini
get_real_device() has 5 parameters with the last 4 is contained in the first
structure.

This patch removes the last 4 parameters and directly use them from the first
parameter.

Acked-by: Alex Williamson alex.william...@redhat.com
Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/i386/kvm/pci-assign.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index ff33dc8..73941b2 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -568,8 +568,7 @@ static int get_real_device_id(const char *devpath, uint16_t 
*val)
 return get_real_id(devpath, device, val);
 }
 
-static int get_real_device(AssignedDevice *pci_dev, uint16_t r_seg,
-   uint8_t r_bus, uint8_t r_dev, uint8_t r_func)
+static int get_real_device(AssignedDevice *pci_dev)
 {
 char dir[128], name[128];
 int fd, r = 0, v;
@@ -582,7 +581,8 @@ static int get_real_device(AssignedDevice *pci_dev, 
uint16_t r_seg,
 dev-region_number = 0;
 
 snprintf(dir, sizeof(dir), /sys/bus/pci/devices/%04x:%02x:%02x.%x/,
- r_seg, r_bus, r_dev, r_func);
+ pci_dev-host.domain, pci_dev-host.bus,
+ pci_dev-host.slot, pci_dev-host.function);
 
 snprintf(name, sizeof(name), %sconfig, dir);
 
@@ -1769,8 +1769,7 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 memcpy(dev-emulate_config_write, dev-emulate_config_read,
sizeof(dev-emulate_config_read));
 
-if (get_real_device(dev, dev-host.domain, dev-host.bus,
-dev-host.slot, dev-host.function)) {
+if (get_real_device(dev)) {
 error_report(pci-assign: Error: Couldn't get real device (%s)!,
  dev-dev.qdev.id);
 goto out;
-- 
1.8.3.1




[Qemu-devel] [PULL 5/9] kvm: x86: fix setting IA32_FEATURE_CONTROL with nested VMX disabled

2013-08-23 Thread Paolo Bonzini
From: Liu Jinsong jinsong@intel.com

This patch is to fix the bug https://bugs.launchpad.net/qemu-kvm/+bug/1207623

IA32_FEATURE_CONTROL is pointless if not expose VMX or SMX bits to
cpuid.1.ecx of vcpu. Current qemu-kvm will error return when kvm_put_msrs
or kvm_get_msrs.

Signed-off-by: Liu Jinsong jinsong@intel.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 target-i386/kvm.c | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 513ae52..7bb8455 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -65,6 +65,7 @@ static bool has_msr_star;
 static bool has_msr_hsave_pa;
 static bool has_msr_tsc_adjust;
 static bool has_msr_tsc_deadline;
+static bool has_msr_feature_control;
 static bool has_msr_async_pf_en;
 static bool has_msr_pv_eoi_en;
 static bool has_msr_misc_enable;
@@ -666,6 +667,12 @@ int kvm_arch_init_vcpu(CPUState *cs)
 
 qemu_add_vm_change_state_handler(cpu_update_state, env);
 
+c = cpuid_find_entry(cpuid_data.cpuid, 1, 0);
+if (c) {
+has_msr_feature_control = !!(c-ecx  CPUID_EXT_VMX) ||
+  !!(c-ecx  CPUID_EXT_SMX);
+}
+
 cpuid_data.cpuid.padding = 0;
 r = kvm_vcpu_ioctl(cs, KVM_SET_CPUID2, cpuid_data);
 if (r) {
@@ -1169,7 +1176,10 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 if (hyperv_vapic_recommended()) {
 kvm_msr_entry_set(msrs[n++], HV_X64_MSR_APIC_ASSIST_PAGE, 0);
 }
-kvm_msr_entry_set(msrs[n++], MSR_IA32_FEATURE_CONTROL, 
env-msr_ia32_feature_control);
+if (has_msr_feature_control) {
+kvm_msr_entry_set(msrs[n++], MSR_IA32_FEATURE_CONTROL,
+  env-msr_ia32_feature_control);
+}
 }
 if (env-mcg_cap) {
 int i;
@@ -1394,7 +1404,9 @@ static int kvm_get_msrs(X86CPU *cpu)
 if (has_msr_misc_enable) {
 msrs[n++].index = MSR_IA32_MISC_ENABLE;
 }
-msrs[n++].index = MSR_IA32_FEATURE_CONTROL;
+if (has_msr_feature_control) {
+msrs[n++].index = MSR_IA32_FEATURE_CONTROL;
+}
 
 if (!env-tsc_valid) {
 msrs[n++].index = MSR_IA32_TSC;
@@ -1509,6 +1521,7 @@ static int kvm_get_msrs(X86CPU *cpu)
 break;
 case MSR_IA32_FEATURE_CONTROL:
 env-msr_ia32_feature_control = msrs[i].data;
+break;
 default:
 if (msrs[i].index = MSR_MC0_CTL 
 msrs[i].index  MSR_MC0_CTL + (env-mcg_cap  0xff) * 4) {
-- 
1.8.3.1





[Qemu-devel] [PULL 3/9] kvm: migrate vPMU state

2013-08-23 Thread Paolo Bonzini
Reviewed-by: Gleb Natapov gnata...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 target-i386/cpu.h | 23 +
 target-i386/kvm.c | 93 ---
 target-i386/machine.c | 44 
 3 files changed, 155 insertions(+), 5 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index af4c0f7..31de265 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -305,6 +305,8 @@
 #define MSR_TSC_ADJUST  0x003b
 #define MSR_IA32_TSCDEADLINE0x6e0
 
+#define MSR_P6_PERFCTR0 0xc1
+
 #define MSR_MTRRcap 0xfe
 #define MSR_MTRRcap_VCNT8
 #define MSR_MTRRcap_FIXRANGE_SUPPORT(1  8)
@@ -318,6 +320,8 @@
 #define MSR_MCG_STATUS  0x17a
 #define MSR_MCG_CTL 0x17b
 
+#define MSR_P6_EVNTSEL0 0x186
+
 #define MSR_IA32_PERF_STATUS0x198
 
 #define MSR_IA32_MISC_ENABLE0x1a0
@@ -343,6 +347,14 @@
 
 #define MSR_MTRRdefType 0x2ff
 
+#define MSR_CORE_PERF_FIXED_CTR00x309
+#define MSR_CORE_PERF_FIXED_CTR10x30a
+#define MSR_CORE_PERF_FIXED_CTR20x30b
+#define MSR_CORE_PERF_FIXED_CTR_CTRL0x38d
+#define MSR_CORE_PERF_GLOBAL_STATUS 0x38e
+#define MSR_CORE_PERF_GLOBAL_CTRL   0x38f
+#define MSR_CORE_PERF_GLOBAL_OVF_CTRL   0x390
+
 #define MSR_MC0_CTL 0x400
 #define MSR_MC0_STATUS  0x401
 #define MSR_MC0_ADDR0x402
@@ -721,6 +733,9 @@ typedef struct {
 #define CPU_NB_REGS CPU_NB_REGS32
 #endif
 
+#define MAX_FIXED_COUNTERS 3
+#define MAX_GP_COUNTERS(MSR_IA32_PERF_STATUS - MSR_P6_EVNTSEL0)
+
 #define NB_MMU_MODES 3
 
 typedef enum TPRAccess {
@@ -816,6 +831,14 @@ typedef struct CPUX86State {
 uint64_t msr_ia32_misc_enable;
 uint64_t msr_ia32_feature_control;
 
+uint64_t msr_fixed_ctr_ctrl;
+uint64_t msr_global_ctrl;
+uint64_t msr_global_status;
+uint64_t msr_global_ovf_ctrl;
+uint64_t msr_fixed_counters[MAX_FIXED_COUNTERS];
+uint64_t msr_gp_counters[MAX_GP_COUNTERS];
+uint64_t msr_gp_evtsel[MAX_GP_COUNTERS];
+
 /* exception/interrupt handling */
 int error_code;
 int exception_is_int;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 84ac00a..513ae52 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -71,6 +71,9 @@ static bool has_msr_misc_enable;
 static bool has_msr_kvm_steal_time;
 static int lm_capable_kernel;
 
+static bool has_msr_architectural_pmu;
+static uint32_t num_architectural_pmu_counters;
+
 bool kvm_allows_irq0_override(void)
 {
 return !kvm_irqchip_in_kernel() || kvm_has_gsi_routing();
@@ -581,6 +584,25 @@ int kvm_arch_init_vcpu(CPUState *cs)
 break;
 }
 }
+
+if (limit = 0x0a) {
+uint32_t ver;
+
+cpu_x86_cpuid(env, 0x0a, 0, ver, unused, unused, unused);
+if ((ver  0xff)  0) {
+has_msr_architectural_pmu = true;
+num_architectural_pmu_counters = (ver  0xff00)  8;
+
+/* Shouldn't be more than 32, since that's the number of bits
+ * available in EBX to tell us _which_ counters are available.
+ * Play it safe.
+ */
+if (num_architectural_pmu_counters  MAX_GP_COUNTERS) {
+num_architectural_pmu_counters = MAX_GP_COUNTERS;
+}
+}
+}
+
 cpu_x86_cpuid(env, 0x8000, 0, limit, unused, unused, unused);
 
 for (i = 0x8000; i = limit; i++) {
@@ -1052,7 +1074,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 struct kvm_msr_entry entries[100];
 } msr_data;
 struct kvm_msr_entry *msrs = msr_data.entries;
-int n = 0;
+int n = 0, i;
 
 kvm_msr_entry_set(msrs[n++], MSR_IA32_SYSENTER_CS, env-sysenter_cs);
 kvm_msr_entry_set(msrs[n++], MSR_IA32_SYSENTER_ESP, env-sysenter_esp);
@@ -1094,9 +1116,8 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 }
 }
 /*
- * The following paravirtual MSRs have side effects on the guest or are
- * too heavy for normal writeback. Limit them to reset or full state
- * updates.
+ * The following MSRs have side effects on the guest or are too heavy
+ * for normal writeback. Limit them to reset or full state updates.
  */
 if (level = KVM_PUT_RESET_STATE) {
 kvm_msr_entry_set(msrs[n++], MSR_KVM_SYSTEM_TIME,
@@ -1114,6 +1135,33 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 kvm_msr_entry_set(msrs[n++], MSR_KVM_STEAL_TIME,
   env-steal_time_msr);
 }
+if (has_msr_architectural_pmu) {
+/* Stop the counter.  */
+kvm_msr_entry_set(msrs[n++], MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
+kvm_msr_entry_set(msrs[n++], MSR_CORE_PERF_GLOBAL_CTRL, 0);
+
+/* Set the counter values.  */
+for (i = 0; i  MAX_FIXED_COUNTERS; 

Re: [Qemu-devel] [PATCH] kvm: warn if num cpus is greater than num recommended

2013-08-23 Thread Paolo Bonzini
Il 22/08/2013 17:39, Andrew Jones ha scritto:
 The comment in kvm_max_vcpus() states that it's using the recommended
 procedure from the kernel API documentation to get the max number
 of vcpus that kvm supports. It is, but by always returning the
 maximum number supported. The maximum number should only be used
 for development purposes. qemu should check KVM_CAP_NR_VCPUS for
 the recommended number of vcpus. This patch adds a warning if a user
 specifies a number of cpus between the recommended and max.
 
 Signed-off-by: Andrew Jones drjo...@redhat.com
 ---
  kvm-all.c | 45 +++--
  1 file changed, 27 insertions(+), 18 deletions(-)
 
 diff --git a/kvm-all.c b/kvm-all.c
 index 716860f617455..9092e13ae60ea 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -1313,24 +1313,24 @@ static int kvm_irqchip_create(KVMState *s)
  return 0;
  }
  
 -static int kvm_max_vcpus(KVMState *s)
 +/* Find number of supported CPUs using the recommended
 + * procedure from the kernel API documentation to cope with
 + * older kernels that may be missing capabilities.
 + */
 +static int kvm_recommended_vcpus(KVMState *s)
  {
  int ret;
  
 -/* Find number of supported CPUs using the recommended
 - * procedure from the kernel API documentation to cope with
 - * older kernels that may be missing capabilities.
 - */
 -ret = kvm_check_extension(s, KVM_CAP_MAX_VCPUS);
 -if (ret) {
 -return ret;
 -}
  ret = kvm_check_extension(s, KVM_CAP_NR_VCPUS);
 -if (ret) {
 -return ret;
 -}
 +return (ret) ? ret : 4;
 +}
  
 -return 4;
 +static int kvm_max_vcpus(KVMState *s)
 +{
 +int ret;
 +
 +ret = kvm_check_extension(s, KVM_CAP_MAX_VCPUS);
 +return (ret) ? ret : kvm_recommended_vcpus(s);
  }
  
  int kvm_init(void)
 @@ -1383,12 +1383,21 @@ int kvm_init(void)
  goto err;
  }
  
 -max_vcpus = kvm_max_vcpus(s);
 +max_vcpus = kvm_recommended_vcpus(s);
  if (smp_cpus  max_vcpus) {
 -ret = -EINVAL;
 -fprintf(stderr, Number of SMP cpus requested (%d) exceeds max cpus 
 -supported by KVM (%d)\n, smp_cpus, max_vcpus);
 -goto err;
 +fprintf(stderr,
 +Warning: Number of SMP cpus requested (%d) exceeds 
 +recommended cpus supported by KVM (%d)\n,
 +smp_cpus, max_vcpus);
 +
 +max_vcpus = kvm_max_vcpus(s);
 +if (smp_cpus  max_vcpus) {
 +ret = -EINVAL;
 +fprintf(stderr, Number of SMP cpus requested (%d) exceeds 
 +max cpus supported by KVM (%d)\n,
 +smp_cpus, max_vcpus);
 +goto err;
 +}

You print both error messages when smp_cpus is greater than the max cpus
supported; is it intentional?

Apart from this, the concept looks good.  However, please over
qemu-kvm.git's uq/master branch, where we already have Marcelo's patch
to check max_cpus too against kvm_max_vcpus(s).

Thanks,

Paolo



[Qemu-devel] [PATCH 1/3] savevm: add comments for qemu_file_get_error()

2013-08-23 Thread Lei Li
Add comments for qemu_file_get_error(), as its return value
is not very clear.

Signed-off-by: Lei Li li...@linux.vnet.ibm.com
---
 savevm.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/savevm.c b/savevm.c
index 03fc4d9..95a11f9 100644
--- a/savevm.c
+++ b/savevm.c
@@ -566,6 +566,13 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps 
*ops)
 return f;
 }
 
+/*
+ * Get last error for stream f
+ *
+ * Return negative error value if there has been an error on previous
+ * operations, return 0 if no error happened.
+ *
+ */
 int qemu_file_get_error(QEMUFile *f)
 {
 return f-last_error;
-- 
1.7.7.6




[Qemu-devel] [PATCH 2/3] savevm: wrong error set by ram_control_load_hook()

2013-08-23 Thread Lei Li
Signed-off-by: Lei Li li...@linux.vnet.ibm.com
---
 savevm.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/savevm.c b/savevm.c
index 95a11f9..fb17a6f 100644
--- a/savevm.c
+++ b/savevm.c
@@ -649,7 +649,7 @@ void ram_control_after_iterate(QEMUFile *f, uint64_t flags)
 
 void ram_control_load_hook(QEMUFile *f, uint64_t flags)
 {
-int ret = 0;
+int ret = EINVAL;
 
 if (f-ops-hook_ram_load) {
 ret = f-ops-hook_ram_load(f, f-opaque, flags);
-- 
1.7.7.6




[Qemu-devel] [PATCH 3/3] arch_init: right return for ram_save_iterate

2013-08-23 Thread Lei Li
Signed-off-by: Lei Li li...@linux.vnet.ibm.com
---
 arch_init.c |7 +--
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 94d45e1..a34437c 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -709,16 +709,11 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
  */
 ram_control_after_iterate(f, RAM_CONTROL_ROUND);
 
-if (ret  0) {
-bytes_transferred += total_sent;
-return ret;
-}
-
 qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 total_sent += 8;
 bytes_transferred += total_sent;
 
-return total_sent;
+return qemu_file_get_error(f);
 }
 
 static int ram_save_complete(QEMUFile *f, void *opaque)
-- 
1.7.7.6




Re: [Qemu-devel] [PULL 1/9] Initialize IA32_FEATURE_CONTROL MSR in reset and migration

2013-08-23 Thread Andreas Färber
Am 23.08.2013 11:39, schrieb Paolo Bonzini:
 From: Arthur Chunqi Li yzt...@gmail.com
 
 The recent KVM patch adds IA32_FEATURE_CONTROL support. QEMU needs
 to clear this MSR when reset vCPU and keep the value of it when
 migration. This patch add this feature.
 
 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 Signed-off-by: Gleb Natapov g...@redhat.com
 ---
  target-i386/cpu.h |  2 ++
  target-i386/kvm.c |  4 
  target-i386/machine.c | 22 ++
  3 files changed, 28 insertions(+)
 
 diff --git a/target-i386/cpu.h b/target-i386/cpu.h
 index cedefdc..3a52f94 100644
 --- a/target-i386/cpu.h
 +++ b/target-i386/cpu.h
 @@ -301,6 +301,7 @@
  #define MSR_IA32_APICBASE_BSP   (18)
  #define MSR_IA32_APICBASE_ENABLE(111)
  #define MSR_IA32_APICBASE_BASE  (0xf12)
 +#define MSR_IA32_FEATURE_CONTROL0x003a
  #define MSR_TSC_ADJUST  0x003b
  #define MSR_IA32_TSCDEADLINE0x6e0
  
 @@ -813,6 +814,7 @@ typedef struct CPUX86State {
  
  uint64_t mcg_status;
  uint64_t msr_ia32_misc_enable;
 +uint64_t msr_ia32_feature_control;
  
  /* exception/interrupt handling */
  int error_code;
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 3c9d10a..84ac00a 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -1121,6 +1121,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
  if (hyperv_vapic_recommended()) {
  kvm_msr_entry_set(msrs[n++], HV_X64_MSR_APIC_ASSIST_PAGE, 0);
  }
 +kvm_msr_entry_set(msrs[n++], MSR_IA32_FEATURE_CONTROL, 
 env-msr_ia32_feature_control);
  }
  if (env-mcg_cap) {
  int i;
 @@ -1345,6 +1346,7 @@ static int kvm_get_msrs(X86CPU *cpu)
  if (has_msr_misc_enable) {
  msrs[n++].index = MSR_IA32_MISC_ENABLE;
  }
 +msrs[n++].index = MSR_IA32_FEATURE_CONTROL;
  
  if (!env-tsc_valid) {
  msrs[n++].index = MSR_IA32_TSC;
 @@ -1443,6 +1445,8 @@ static int kvm_get_msrs(X86CPU *cpu)
  case MSR_IA32_MISC_ENABLE:
  env-msr_ia32_misc_enable = msrs[i].data;
  break;
 +case MSR_IA32_FEATURE_CONTROL:
 +env-msr_ia32_feature_control = msrs[i].data;

Shouldn't this patch be fixed to have the break that is being added in 5/9?

Andreas

  default:
  if (msrs[i].index = MSR_MC0_CTL 
  msrs[i].index  MSR_MC0_CTL + (env-mcg_cap  0xff) * 4) {
 diff --git a/target-i386/machine.c b/target-i386/machine.c
 index f9ec581..0d2088e 100644
 --- a/target-i386/machine.c
 +++ b/target-i386/machine.c
 @@ -435,6 +435,14 @@ static bool misc_enable_needed(void *opaque)
  return env-msr_ia32_misc_enable != MSR_IA32_MISC_ENABLE_DEFAULT;
  }
  
 +static bool feature_control_needed(void *opaque)
 +{
 +X86CPU *cpu = opaque;
 +CPUX86State *env = cpu-env;
 +
 +return env-msr_ia32_feature_control != 0;
 +}
 +
  static const VMStateDescription vmstate_msr_ia32_misc_enable = {
  .name = cpu/msr_ia32_misc_enable,
  .version_id = 1,
 @@ -446,6 +454,17 @@ static const VMStateDescription 
 vmstate_msr_ia32_misc_enable = {
  }
  };
  
 +static const VMStateDescription vmstate_msr_ia32_feature_control = {
 +.name = cpu/msr_ia32_feature_control,
 +.version_id = 1,
 +.minimum_version_id = 1,
 +.minimum_version_id_old = 1,
 +.fields  = (VMStateField []) {
 +VMSTATE_UINT64(env.msr_ia32_feature_control, X86CPU),
 +VMSTATE_END_OF_LIST()
 +}
 +};
 +
  const VMStateDescription vmstate_x86_cpu = {
  .name = cpu,
  .version_id = 12,
 @@ -571,6 +590,9 @@ const VMStateDescription vmstate_x86_cpu = {
  }, {
  .vmsd = vmstate_msr_ia32_misc_enable,
  .needed = misc_enable_needed,
 +}, {
 +.vmsd = vmstate_msr_ia32_feature_control,
 +.needed = feature_control_needed,
  } , {
  /* empty */
  }
 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] [PULL 1/9] Initialize IA32_FEATURE_CONTROL MSR in reset and migration

2013-08-23 Thread Paolo Bonzini
Il 23/08/2013 12:10, Andreas Färber ha scritto:
 Am 23.08.2013 11:39, schrieb Paolo Bonzini:
 From: Arthur Chunqi Li yzt...@gmail.com

 The recent KVM patch adds IA32_FEATURE_CONTROL support. QEMU needs
 to clear this MSR when reset vCPU and keep the value of it when
 migration. This patch add this feature.

 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 Signed-off-by: Gleb Natapov g...@redhat.com
 ---
  target-i386/cpu.h |  2 ++
  target-i386/kvm.c |  4 
  target-i386/machine.c | 22 ++
  3 files changed, 28 insertions(+)

 diff --git a/target-i386/cpu.h b/target-i386/cpu.h
 index cedefdc..3a52f94 100644
 --- a/target-i386/cpu.h
 +++ b/target-i386/cpu.h
 @@ -301,6 +301,7 @@
  #define MSR_IA32_APICBASE_BSP   (18)
  #define MSR_IA32_APICBASE_ENABLE(111)
  #define MSR_IA32_APICBASE_BASE  (0xf12)
 +#define MSR_IA32_FEATURE_CONTROL0x003a
  #define MSR_TSC_ADJUST  0x003b
  #define MSR_IA32_TSCDEADLINE0x6e0
  
 @@ -813,6 +814,7 @@ typedef struct CPUX86State {
  
  uint64_t mcg_status;
  uint64_t msr_ia32_misc_enable;
 +uint64_t msr_ia32_feature_control;
  
  /* exception/interrupt handling */
  int error_code;
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 3c9d10a..84ac00a 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -1121,6 +1121,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
  if (hyperv_vapic_recommended()) {
  kvm_msr_entry_set(msrs[n++], HV_X64_MSR_APIC_ASSIST_PAGE, 0);
  }
 +kvm_msr_entry_set(msrs[n++], MSR_IA32_FEATURE_CONTROL, 
 env-msr_ia32_feature_control);
  }
  if (env-mcg_cap) {
  int i;
 @@ -1345,6 +1346,7 @@ static int kvm_get_msrs(X86CPU *cpu)
  if (has_msr_misc_enable) {
  msrs[n++].index = MSR_IA32_MISC_ENABLE;
  }
 +msrs[n++].index = MSR_IA32_FEATURE_CONTROL;
  
  if (!env-tsc_valid) {
  msrs[n++].index = MSR_IA32_TSC;
 @@ -1443,6 +1445,8 @@ static int kvm_get_msrs(X86CPU *cpu)
  case MSR_IA32_MISC_ENABLE:
  env-msr_ia32_misc_enable = msrs[i].data;
  break;
 +case MSR_IA32_FEATURE_CONTROL:
 +env-msr_ia32_feature_control = msrs[i].data;
 
 Shouldn't this patch be fixed to have the break that is being added in 5/9?

We try not to rebase uq/master unless there are conflicts that Anthony
prefers not to handle.  (I did that once and Gleb scolded me... :)
perhaps I'll be wrong this time too...).

Paolo



Re: [Qemu-devel] [PATCH 2/3] savevm: wrong error set by ram_control_load_hook()

2013-08-23 Thread Paolo Bonzini
Il 23/08/2013 12:03, Lei Li ha scritto:
 Signed-off-by: Lei Li li...@linux.vnet.ibm.com
 ---
  savevm.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/savevm.c b/savevm.c
 index 95a11f9..fb17a6f 100644
 --- a/savevm.c
 +++ b/savevm.c
 @@ -649,7 +649,7 @@ void ram_control_after_iterate(QEMUFile *f, uint64_t 
 flags)
  
  void ram_control_load_hook(QEMUFile *f, uint64_t flags)
  {
 -int ret = 0;
 +int ret = EINVAL;

This should be -EINVAL.

Paolo

  if (f-ops-hook_ram_load) {
  ret = f-ops-hook_ram_load(f, f-opaque, flags);
 




Re: [Qemu-devel] [PATCH 0/3] Patchset description

2013-08-23 Thread Gerd Hoffmann
On Fr, 2013-08-23 at 03:18 +0200, Ákos Kovács wrote:
 Patch 1: Move sysbus and PCI specific code to a new files from 
 hw/usb/hcd-ohci.c. 

Looks good, except that it breaks ohci due to not compiling the new
files.

 Patch 2: Make them new configuration variables *_OHCI_SYSBUS and *_OHCI_PCI 
 and add
 them to the arm and sh4 default configuration.
 
 Patch 3: Create similar configuration variables for hcd-uhci-sysbus.c and 
 hcd-uhci-pci.c as
 *_UHCI_SYSBUS and *_UHCI_PCI.

Hmm, I think doing this fine-grained config should wait until we have
Kconfig, so we can create proper dependencies for this.

cheers,
  Gerd






Re: [Qemu-devel] [PATCH] qcow2: Change default for new images to compat=1.1

2013-08-23 Thread Ján Tomko
On 08/19/2013 11:25 AM, Kevin Wolf wrote:
 By the time that qemu 1.7 will be released, enough time will has passed
 since qemu 1.1, which is the first version to understand version 3
 images, that changing the default shouldn't hurt many people any more
 and the benefits of using the new format outweigh the pain.
 
 qemu-iotests already runs with compat=1.1 by default.
 
 Signed-off-by: Kevin Wolf kw...@redhat.com
 ---
  block/qcow2.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/block/qcow2.c b/block/qcow2.c
 index 3376901..42ea7ec 100644
 --- a/block/qcow2.c
 +++ b/block/qcow2.c
 @@ -1402,7 +1402,7 @@ static int qcow2_create(const char *filename, 
 QEMUOptionParameter *options)
  int flags = 0;
  size_t cluster_size = DEFAULT_CLUSTER_SIZE;
  int prealloc = 0;
 -int version = 2;
 +int version = 3;
  
  /* Read out options */
  while (options  options-name) {


This does not affect qemu-img (or bdrv_img_create), as it gets overwritten
with 2 when BLOCK_OPT_COMPAT_LEVEL is present in options, but the value is NULL.

Should this go into qcow2_create_options[] as well?

Jan



Re: [Qemu-devel] [PATCH] mips/malta: prevent writes to reset flash mapping faulting

2013-08-23 Thread Andreas Färber
Am 23.08.2013 09:59, schrieb Leon Alrae:
 From: James Hogan james.ho...@imgtec.com
 
 Commit a427338 (mips_malta: correct reading MIPS revision at 0x1fc00010)
 altered the behaviour of the monitor flash mapping at the reset address
 by making it read only. However this causes data bus error exceptions
 when it is written to since it is effectively unassigned memory for
 writes. This isn't how the real hardware behaves. That memory can be
 written to (even with the MFWR jumper not fitted) and the new value read
 back from, but it doesn't get written back to the monitor flash so is
 volatile.
 
 This is fixed by converting the bios copy from read only ram to a bios
 device with a nop write callback.

That sounds like a contradiction: The nop write will not have reads
return the new value, will it?

Why not just remove the _set_readonly and have it reloaded on reset for
volatility?

Anyway, having a MemoryRegionOps with just a .write looks dangerous, but
I guess you've tested read to work. We had been seeing assertions
elsewhere when either was missing.

Regards,
Andreas

 
 Signed-off-by: James Hogan james.ho...@imgtec.com
 Cc: Paul Burton paul.bur...@imgtec.com
 Cc: Leon Alrae leon.al...@imgtec.com
 Cc: Aurelien Jarno aurel...@aurel32.net
 Signed-off-by: Leon Alrae leon.al...@imgtec.com
 ---
  hw/mips/mips_malta.c |   14 --
  1 files changed, 12 insertions(+), 2 deletions(-)
 
 diff --git a/hw/mips/mips_malta.c b/hw/mips/mips_malta.c
 index f8d064c..9e721d3 100644
 --- a/hw/mips/mips_malta.c
 +++ b/hw/mips/mips_malta.c
 @@ -873,6 +873,16 @@ static void cpu_request_exit(void *opaque, int irq, int 
 level)
  }
  }
  
 +static void monflash_copy_mem_write(void *opaque, hwaddr ram_addr,
 +uint64_t val, unsigned size)
 +{
 +}
 +
 +static const MemoryRegionOps monflash_copy_mem_ops = {
 +.write = monflash_copy_mem_write,
 +.endianness = DEVICE_NATIVE_ENDIAN,
 +};
 +
  static
  void mips_malta_init(QEMUMachineInitArgs *args)
  {
 @@ -1043,13 +1053,13 @@ void mips_malta_init(QEMUMachineInitArgs *args)
   * handled by an overlapping region as the resulting ROM code subpage
   * regions are not executable.
   */
 -memory_region_init_ram(bios_copy, NULL, bios.1fc, BIOS_SIZE);
 +memory_region_init_rom_device(bios_copy, NULL, monflash_copy_mem_ops, 
 NULL,
 +  bios.1fc, BIOS_SIZE);
  if (!rom_copy(memory_region_get_ram_ptr(bios_copy),
FLASH_ADDRESS, BIOS_SIZE)) {
  memcpy(memory_region_get_ram_ptr(bios_copy),
 memory_region_get_ram_ptr(bios), BIOS_SIZE);
  }
 -memory_region_set_readonly(bios_copy, true);
  memory_region_add_subregion(system_memory, RESET_ADDRESS, bios_copy);
  
  /* Board ID = 0x420 (Malta Board with CoreLV) */
 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] [PATCH] pseries: Update SLOF firmware image

2013-08-23 Thread Alexey Kardashevskiy
On 08/23/2013 07:17 PM, Alexey Kardashevskiy wrote:
 This has reworked USB OHCI and adds support of USB EHCI,
 VIRTIO-SCSI and various fixes (IBM VSCSI, VGA and more).


Oh. Please ignore that, just found out that there are more patches for SLOF.


-- 
Alexey



Re: [Qemu-devel] [PATCH 1/3] hw/usb/hcd-ohci.c: Move sysbus and PCI code to new files

2013-08-23 Thread Andreas Färber
Am 23.08.2013 03:18, schrieb Ákos Kovács:
 Move the existing sysbus and PCI logic to hcd-ohci-sysbus.c and
 hcd-ohci-pci.c from hcd-ohci.c. Create a new hcd-ohci.h header for the
 shared declarations and macros.
 
 Signed-off-by: Ákos Kovács akoskov...@gmx.com
 ---
  hw/usb/hcd-ohci-pci.c|   96 ++
  hw/usb/hcd-ohci-sysbus.c |   84 +
  hw/usb/hcd-ohci.c|  434 
 +++---
  hw/usb/hcd-ohci.h|  311 +
  4 files changed, 514 insertions(+), 411 deletions(-)
  create mode 100644 hw/usb/hcd-ohci-pci.c
  create mode 100644 hw/usb/hcd-ohci-sysbus.c
  create mode 100644 hw/usb/hcd-ohci.h
 
 diff --git a/hw/usb/hcd-ohci-pci.c b/hw/usb/hcd-ohci-pci.c
 new file mode 100644
 index 000..ac50951
 --- /dev/null
 +++ b/hw/usb/hcd-ohci-pci.c
 @@ -0,0 +1,96 @@
 +/*
 + * QEMU USB OHCI Emulation
 + * Copyright (c) 2004 Gianni Tedesco
 + * Copyright (c) 2006 CodeSourcery
 + * Copyright (c) 2006 Openedhand Ltd.
 + *
 + * This library is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU Lesser General Public
 + * License as published by the Free Software Foundation; either
 + * version 2 of the License, or (at your option) any later version.
 + *
 + * This library is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 + * Lesser General Public License for more details.
 + *
 + * You should have received a copy of the GNU Lesser General Public
 + * License along with this library; if not, see 
 http://www.gnu.org/licenses/.
 + *
 + * TODO:
 + *  o Isochronous transfers
 + *  o Allocate bandwidth in frames properly
 + *  o Disable timers when nothing needs to be done, or remove timer usage
 + *all together.
 + *  o BIOS work to boot from USB storage
 +*/
[...]
 diff --git a/hw/usb/hcd-ohci-sysbus.c b/hw/usb/hcd-ohci-sysbus.c
 new file mode 100644
 index 000..b9d15db
 --- /dev/null
 +++ b/hw/usb/hcd-ohci-sysbus.c
 @@ -0,0 +1,84 @@
 +/*
 + * QEMU USB OHCI Emulation
 + * Copyright (c) 2004 Gianni Tedesco
 + * Copyright (c) 2006 CodeSourcery
 + * Copyright (c) 2006 Openedhand Ltd.
 + *
 + * This library is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU Lesser General Public
 + * License as published by the Free Software Foundation; either
 + * version 2 of the License, or (at your option) any later version.
 + *
 + * This library is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 + * Lesser General Public License for more details.
 + *
 + * You should have received a copy of the GNU Lesser General Public
 + * License along with this library; if not, see 
 http://www.gnu.org/licenses/.
 + *
 + * TODO:
 + *  o Isochronous transfers
 + *  o Allocate bandwidth in frames properly
 + *  o Disable timers when nothing needs to be done, or remove timer usage
 + *all together.
 + *  o BIOS work to boot from USB storage
 +*/
[...]
 diff --git a/hw/usb/hcd-ohci.h b/hw/usb/hcd-ohci.h
 new file mode 100644
 index 000..8be00f2
 --- /dev/null
 +++ b/hw/usb/hcd-ohci.h
 @@ -0,0 +1,311 @@
 +/*
 + * QEMU USB OHCI Emulation
 + * Copyright (c) 2004 Gianni Tedesco
 + * Copyright (c) 2006 CodeSourcery
 + * Copyright (c) 2006 Openedhand Ltd.
 + *
 + * This library is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU Lesser General Public
 + * License as published by the Free Software Foundation; either
 + * version 2 of the License, or (at your option) any later version.
 + *
 + * This library is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 + * Lesser General Public License for more details.
 + *
 + * You should have received a copy of the GNU Lesser General Public
 + * License along with this library; if not, see 
 http://www.gnu.org/licenses/.
 + *
 + * TODO:
 + *  o Isochronous transfers
 + *  o Allocate bandwidth in frames properly
 + *  o Disable timers when nothing needs to be done, or remove timer usage
 + *all together.
 + *  o BIOS work to boot from USB storage
 +*/

Is it really a good idea to duplicate the TODO list into four files?
I assume points 1-3 affect the core code that is not moving, so that all
four points could stay in hcd-ohci.c. No one is going to remember to
sync it otherwise.

Also there should probably be a space before */ for alignment. :)

When I create new files I usually add an up-to-date copyright line, not
sure if that's applicable here, depending on amount of changes.

The split itself seems like a good idea, thanks.

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, 

[Qemu-devel] [PATCH] spapr: support CPU hotplug

2013-08-23 Thread Alexey Kardashevskiy
PAPR+ requires two RTAS calls to be supported by the hypervisor in
order to allow hotplugging VCPUs from the guest. The start-cpu RTAS
call was already there but stop-self was not.

This adds the stop-self RTAS call.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 hw/ppc/spapr_rtas.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 394ce05..8a4cfa0 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -202,6 +202,19 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, 
sPAPREnvironment *spapr,
 rtas_st(rets, 0, -3);
 }
 
+static void rtas_stop_self(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+   uint32_t token, uint32_t nargs,
+   target_ulong args,
+   uint32_t nret, target_ulong rets)
+{
+CPUState *cs = CPU(cpu);
+CPUPPCState *env = cpu-env;
+
+cs-halted = 1;
+env-msr = 0;
+cs-exit_request = 1;
+}
+
 static struct rtas_call {
 const char *name;
 spapr_rtas_fn fn;
@@ -322,6 +335,7 @@ static void core_rtas_register_types(void)
 spapr_rtas_register(query-cpu-stopped-state,
 rtas_query_cpu_stopped_state);
 spapr_rtas_register(start-cpu, rtas_start_cpu);
+spapr_rtas_register(stop-self, rtas_stop_self);
 }
 
 type_init(core_rtas_register_types)
-- 
1.8.4.rc4




Re: [Qemu-devel] [PATCH] kvm: warn if num cpus is greater than num recommended

2013-08-23 Thread Andrew Jones


- Original Message -
 Am 22.08.2013 18:12, schrieb Eduardo Habkost:
  
  On 22/08/2013, at 12:39, Andrew Jones drjo...@redhat.com wrote:
  
  The comment in kvm_max_vcpus() states that it's using the recommended
  procedure from the kernel API documentation to get the max number
  of vcpus that kvm supports. It is, but by always returning the
  maximum number supported. The maximum number should only be used
  for development purposes. qemu should check KVM_CAP_NR_VCPUS for
  the recommended number of vcpus. This patch adds a warning if a user
  specifies a number of cpus between the recommended and max.
 
  Signed-off-by: Andrew Jones drjo...@redhat.com
  
  CCing libvir-list. It is probably interesting for libvirt to expose or warn
  about the recommended VCPU limit somehow, and in this case a simple
  warning on stderr won't be enough.
  
  ---
  kvm-all.c | 45 +++--
  1 file changed, 27 insertions(+), 18 deletions(-)
 
  diff --git a/kvm-all.c b/kvm-all.c
  index 716860f617455..9092e13ae60ea 100644
  --- a/kvm-all.c
  +++ b/kvm-all.c
  @@ -1313,24 +1313,24 @@ static int kvm_irqchip_create(KVMState *s)
  return 0;
  }
 
  -static int kvm_max_vcpus(KVMState *s)
  +/* Find number of supported CPUs using the recommended
  + * procedure from the kernel API documentation to cope with
  + * older kernels that may be missing capabilities.
  + */
  +static int kvm_recommended_vcpus(KVMState *s)
  {
  int ret;
 
  -/* Find number of supported CPUs using the recommended
  - * procedure from the kernel API documentation to cope with
  - * older kernels that may be missing capabilities.
  - */
  -ret = kvm_check_extension(s, KVM_CAP_MAX_VCPUS);
  -if (ret) {
  -return ret;
  -}
  ret = kvm_check_extension(s, KVM_CAP_NR_VCPUS);
  -if (ret) {
  -return ret;
  -}
  +return (ret) ? ret : 4;
  +}
 
  -return 4;
  +static int kvm_max_vcpus(KVMState *s)
  +{
  +int ret;
  +
  +ret = kvm_check_extension(s, KVM_CAP_MAX_VCPUS);
  +return (ret) ? ret : kvm_recommended_vcpus(s);
  }
 
  int kvm_init(void)
  @@ -1383,12 +1383,21 @@ int kvm_init(void)
  goto err;
  }
 
  -max_vcpus = kvm_max_vcpus(s);
  +max_vcpus = kvm_recommended_vcpus(s);
  if (smp_cpus  max_vcpus) {
  -ret = -EINVAL;
  -fprintf(stderr, Number of SMP cpus requested (%d) exceeds max
  cpus 
  -supported by KVM (%d)\n, smp_cpus, max_vcpus);
  -goto err;
  +fprintf(stderr,
  +Warning: Number of SMP cpus requested (%d) exceeds 
  +recommended cpus supported by KVM (%d)\n,
  +smp_cpus, max_vcpus);
  +
  +max_vcpus = kvm_max_vcpus(s);
  +if (smp_cpus  max_vcpus) {
  +ret = -EINVAL;
  +fprintf(stderr, Number of SMP cpus requested (%d) exceeds 
  +max cpus supported by KVM (%d)\n,
  +smp_cpus, max_vcpus);
  +goto err;
  +}
 
 Should at least the fatal one use the new error_report()?

So far kvm-all.c doesn't use error_report(). I'm inclined to leave it that way
for now, for the scope of this patch anyway. Maybe we should convert all of
kvm-all.c at some point though?

 
  }
 
  s-vmfd = kvm_ioctl(s, KVM_CREATE_VM, 0);
 
 I notice that only checks in kvm_init() based on smp_cpus are touched
 herein. Should we add similar checks to CPU hot-add code and thus
 possibly move that into some per-vCPU code path?
 

That's a better question for hot-plug folk. Does smp_cpus map to the current
number of cpus, or to the number of possible cpus? If it maps to the number
of possible cpus, then this is the right place. If the former, then I guess
it'll take more thought. I'ved added Igor (still on vacation) to this reply,
but regardless I vote we worry about hot-plug limit checking in different
patch.

thanks,
drew

 Regards,
 Andreas
 
 --
 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
 GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
 



Re: [Qemu-devel] [PATCH] kvm: warn if num cpus is greater than num recommended

2013-08-23 Thread Andrew Jones


- Original Message -
 Il 22/08/2013 17:39, Andrew Jones ha scritto:
  The comment in kvm_max_vcpus() states that it's using the recommended
  procedure from the kernel API documentation to get the max number
  of vcpus that kvm supports. It is, but by always returning the
  maximum number supported. The maximum number should only be used
  for development purposes. qemu should check KVM_CAP_NR_VCPUS for
  the recommended number of vcpus. This patch adds a warning if a user
  specifies a number of cpus between the recommended and max.
  
  Signed-off-by: Andrew Jones drjo...@redhat.com
  ---
   kvm-all.c | 45 +++--
   1 file changed, 27 insertions(+), 18 deletions(-)
  
  diff --git a/kvm-all.c b/kvm-all.c
  index 716860f617455..9092e13ae60ea 100644
  --- a/kvm-all.c
  +++ b/kvm-all.c
  @@ -1313,24 +1313,24 @@ static int kvm_irqchip_create(KVMState *s)
   return 0;
   }
   
  -static int kvm_max_vcpus(KVMState *s)
  +/* Find number of supported CPUs using the recommended
  + * procedure from the kernel API documentation to cope with
  + * older kernels that may be missing capabilities.
  + */
  +static int kvm_recommended_vcpus(KVMState *s)
   {
   int ret;
   
  -/* Find number of supported CPUs using the recommended
  - * procedure from the kernel API documentation to cope with
  - * older kernels that may be missing capabilities.
  - */
  -ret = kvm_check_extension(s, KVM_CAP_MAX_VCPUS);
  -if (ret) {
  -return ret;
  -}
   ret = kvm_check_extension(s, KVM_CAP_NR_VCPUS);
  -if (ret) {
  -return ret;
  -}
  +return (ret) ? ret : 4;
  +}
   
  -return 4;
  +static int kvm_max_vcpus(KVMState *s)
  +{
  +int ret;
  +
  +ret = kvm_check_extension(s, KVM_CAP_MAX_VCPUS);
  +return (ret) ? ret : kvm_recommended_vcpus(s);
   }
   
   int kvm_init(void)
  @@ -1383,12 +1383,21 @@ int kvm_init(void)
   goto err;
   }
   
  -max_vcpus = kvm_max_vcpus(s);
  +max_vcpus = kvm_recommended_vcpus(s);
   if (smp_cpus  max_vcpus) {
  -ret = -EINVAL;
  -fprintf(stderr, Number of SMP cpus requested (%d) exceeds max
  cpus 
  -supported by KVM (%d)\n, smp_cpus, max_vcpus);
  -goto err;
  +fprintf(stderr,
  +Warning: Number of SMP cpus requested (%d) exceeds 
  +recommended cpus supported by KVM (%d)\n,
  +smp_cpus, max_vcpus);
  +
  +max_vcpus = kvm_max_vcpus(s);
  +if (smp_cpus  max_vcpus) {
  +ret = -EINVAL;
  +fprintf(stderr, Number of SMP cpus requested (%d) exceeds 
  +max cpus supported by KVM (%d)\n,
  +smp_cpus, max_vcpus);
  +goto err;
  +}
 
 You print both error messages when smp_cpus is greater than the max cpus
 supported; is it intentional?
 

Yup. This way we can inform the user not only that they're greater than
hard-max, but also what the soft-max is. This allows the user to choose
how much lower to adjust when they correct for the hard-max, possibly going
down low enough to avoid blowing the soft-max as well.

 Apart from this, the concept looks good.  However, please over
 qemu-kvm.git's uq/master branch, where we already have Marcelo's patch
 to check max_cpus too against kvm_max_vcpus(s).
 

OK, will respin on uq/master.

thanks,
drew



Re: [Qemu-devel] [PATCH v3 2/8] xics: add pre_save/post_load/cpu_setup dispatchers

2013-08-23 Thread Andreas Färber
Am 23.08.2013 05:39, schrieb Alexey Kardashevskiy:
 On 08/19/2013 11:54 PM, Andreas Färber wrote:
 Am 19.08.2013 07:55, schrieb Alexey Kardashevskiy:
 The upcoming support of in-kernel XICS will redefine migration callbacks
 for both ICS and ICP so classes and callback pointers are added.

 This adds a cpu_setup callback to the XICS device class (as XICS-KVM
 will do it different) and xics_dispatch_cpu_setup(). This also moves
 the place where xics_dispatch_cpu_setup() is called a bit further to
 have VCPU initialized (required for XICS-KVM).

 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
 Changes:
 v3:
 * fixed local variables names
 ---
  hw/intc/xics.c| 61 
 +++
  hw/ppc/spapr.c|  4 ++--
  include/hw/ppc/xics.h | 46 +-
  3 files changed, 104 insertions(+), 7 deletions(-)

 diff --git a/hw/intc/xics.c b/hw/intc/xics.c
 index 6b3c071..e3a957d 100644
 --- a/hw/intc/xics.c
 +++ b/hw/intc/xics.c
 [...]
 @@ -674,10 +724,12 @@ static Property xics_properties[] = {
  static void xics_class_init(ObjectClass *oc, void *data)
  {
  DeviceClass *dc = DEVICE_CLASS(oc);
 +XICSStateClass *k = XICS_CLASS(oc);
  
  dc-realize = xics_realize;
  dc-props = xics_properties;
  dc-reset = xics_reset;
 +k-cpu_setup = xics_cpu_setup;
  
  spapr_rtas_register(ibm,set-xive, rtas_set_xive);
  spapr_rtas_register(ibm,get-xive, rtas_get_xive);

 This hunk is fixed up in 4/8, please squash that bit here.
 
 Thanks for the review, fixed this.
 
 
 Otherwise looks good.
 
 What exactly looks good? This patch only?

Yes. And another one has by Rb already I believe. I hope Alex can start
cherry-picking some more patches when he gets back to NUE.

I know you had some open questions I still need to reply to. Among
others, yes, I concur with mst that ppc/spapr_pci.c could be moved to
pci-host/spapr.c, but remember to update MAINTAINERS for you guys to
still get CC'ed on it then.

I had asked Anthony to reply to the KVM XICS patch because he said that
accessing the parent's method seemed wrong to him and the parent
implement should be put in charge of calling rather than the derived.
I.e., I will be dropping my proposed OBJECT_GET_PARENT_CLASS() macro and
started redoing several series. Haven't looked into how exactly your
code (CPU setup I think?) may need to change.

Andreas

 The whole series? If it is the
 whole series, can I put Reviewed-By: Andreas into them before repost them
 all? I am asking because Alex Graf won't even look at them before I get
 them reviewed/acked/signed by some one less ignorant than me :)
 Thanks!

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] [PATCH v3 0/8] xics: reworks and in-kernel support

2013-08-23 Thread Andreas Färber
Am 19.08.2013 07:55, schrieb Alexey Kardashevskiy:
 Yet another try with XICS-KVM.
 
 v2-v3:
 Addressed multiple comments from Andreas;
 Added 2 patches for XICS from Ben - I included them into the series as they
 are about XICS and they won't rebase automatically if moved before XICS rework
 so it seemed to me that it would be better to carry them toghether. If it is
 wrong, please let me know, I'll repost them separately.

Patches 7-8 never arrived on the list?

Andreas

 
 v1-v2:
 The main change is this adds xics-common parent for emulated XICS and 
 XICS-KVM.
 And many, many small changes, mostly to address Andreas comments.
 
 Migration from XICS to XICS-KVM and vice versa still works.
 
 
 Alexey Kardashevskiy (4):
   xics: add pre_save/post_load/cpu_setup dispatchers
   xics: move registration of global state to realize()
   xics: minor changes and cleanups
   xics: split to xics and xics-common
 
 Benjamin Herrenschmidt (2):
   xics: Add H_IPOLL implementation
   xics: Implement H_XIRR_X
 
 David Gibson (2):
   target-ppc: Add helper for KVM_PPC_RTAS_DEFINE_TOKEN
   xics-kvm: Support for in-kernel XICS interrupt controller
 
  default-configs/ppc64-softmmu.mak |   1 +
  hw/intc/Makefile.objs |   1 +
  hw/intc/xics.c| 358 +--
  hw/intc/xics_kvm.c| 492 
 ++
  hw/ppc/spapr.c|  27 ++-
  include/hw/ppc/spapr.h|   3 +-
  include/hw/ppc/xics.h |  68 +-
  target-ppc/kvm.c  |  14 ++
  target-ppc/kvm_ppc.h  |   7 +
  9 files changed, 894 insertions(+), 77 deletions(-)
  create mode 100644 hw/intc/xics_kvm.c
 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



[Qemu-devel] [PATCH v3] xics: Add H_IPOLL implementation

2013-08-23 Thread Alexey Kardashevskiy
From: Benjamin Herrenschmidt b...@kernel.crashing.org

This adds support for the H_IPOLL hypercall which the guest
uses to poll for a pending interrupt. This hypercall is
mandatory for PAPR+ and there is no way for the guest to
detect whether it is supported or not so just add it.

Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 hw/intc/xics.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index f439e8d..2e6111d 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -689,6 +689,18 @@ static target_ulong h_eoi(PowerPCCPU *cpu, 
sPAPREnvironment *spapr,
 return H_SUCCESS;
 }
 
+static target_ulong h_ipoll(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+target_ulong opcode, target_ulong *args)
+{
+CPUState *cs = CPU(cpu);
+ICPState *ss = spapr-icp-ss[cs-cpu_index];
+
+args[0] = ss-xirr;
+args[1] = ss-mfrr;
+
+return H_SUCCESS;
+}
+
 static void rtas_set_xive(PowerPCCPU *cpu, sPAPREnvironment *spapr,
   uint32_t token,
   uint32_t nargs, target_ulong args,
@@ -842,6 +854,7 @@ static void xics_realize(DeviceState *dev, Error **errp)
 spapr_register_hypercall(H_IPI, h_ipi);
 spapr_register_hypercall(H_XIRR, h_xirr);
 spapr_register_hypercall(H_EOI, h_eoi);
+spapr_register_hypercall(H_IPOLL, h_ipoll);
 
 object_property_set_bool(OBJECT(icp-ics), true, realized, error);
 if (error) {
-- 
1.8.4.rc4




[Qemu-devel] [PATCH v3] xics: Implement H_XIRR_X

2013-08-23 Thread Alexey Kardashevskiy
From: Benjamin Herrenschmidt b...@kernel.crashing.org

This implements H_XIRR_X hypercall in addition to H_XIRR as
it is mandatory for PAPR+ and there is no way for the guest to
detect whether it is supported or not so just add it.

As the Partition Adjunct Option is not supported at the moment,
the CPPR parameter of the hypercall is ignored.

Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 hw/intc/xics.c | 14 ++
 include/hw/ppc/spapr.h |  3 ++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 2e6111d..b7f849a 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -27,6 +27,7 @@
 
 #include hw/hw.h
 #include trace.h
+#include qemu/timer.h
 #include hw/ppc/spapr.h
 #include hw/ppc/xics.h
 #include qemu/error-report.h
@@ -679,6 +680,18 @@ static target_ulong h_xirr(PowerPCCPU *cpu, 
sPAPREnvironment *spapr,
 return H_SUCCESS;
 }
 
+static target_ulong h_xirr_x(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+ target_ulong opcode, target_ulong *args)
+{
+CPUState *cs = CPU(cpu);
+ICPState *ss = spapr-icp-ss + cs-cpu_index;
+uint32_t xirr = icp_accept(ss);
+
+args[0] = xirr;
+args[1] = cpu_get_real_ticks();
+return H_SUCCESS;
+}
+
 static target_ulong h_eoi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
   target_ulong opcode, target_ulong *args)
 {
@@ -853,6 +866,7 @@ static void xics_realize(DeviceState *dev, Error **errp)
 spapr_register_hypercall(H_CPPR, h_cppr);
 spapr_register_hypercall(H_IPI, h_ipi);
 spapr_register_hypercall(H_XIRR, h_xirr);
+spapr_register_hypercall(H_XIRR_X, h_xirr_x);
 spapr_register_hypercall(H_EOI, h_eoi);
 spapr_register_hypercall(H_IPOLL, h_ipoll);
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 9fc1972..a84b8ff 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -267,7 +267,8 @@ typedef struct sPAPREnvironment {
 #define H_GET_EM_PARMS  0x2B8
 #define H_SET_MPP   0x2D0
 #define H_GET_MPP   0x2D4
-#define MAX_HCALL_OPCODEH_GET_MPP
+#define H_XIRR_X0x2FC
+#define MAX_HCALL_OPCODEH_XIRR_X
 
 /* The hcalls above are standardized in PAPR and implemented by pHyp
  * as well.
-- 
1.8.4.rc4




Re: [Qemu-devel] [PATCH v3 0/8] xics: reworks and in-kernel support

2013-08-23 Thread Alexey Kardashevskiy
On 08/23/2013 09:56 PM, Andreas Färber wrote:
 Am 19.08.2013 07:55, schrieb Alexey Kardashevskiy:
 Yet another try with XICS-KVM.

 v2-v3:
 Addressed multiple comments from Andreas;
 Added 2 patches for XICS from Ben - I included them into the series as they
 are about XICS and they won't rebase automatically if moved before XICS 
 rework
 so it seemed to me that it would be better to carry them toghether. If it is
 wrong, please let me know, I'll repost them separately.
 
 Patches 7-8 never arrived on the list?

Weird. Posted them again as replies to the cover letter...


 
 Andreas
 

 v1-v2:
 The main change is this adds xics-common parent for emulated XICS and 
 XICS-KVM.
 And many, many small changes, mostly to address Andreas comments.

 Migration from XICS to XICS-KVM and vice versa still works.


 Alexey Kardashevskiy (4):
   xics: add pre_save/post_load/cpu_setup dispatchers
   xics: move registration of global state to realize()
   xics: minor changes and cleanups
   xics: split to xics and xics-common

 Benjamin Herrenschmidt (2):
   xics: Add H_IPOLL implementation
   xics: Implement H_XIRR_X

 David Gibson (2):
   target-ppc: Add helper for KVM_PPC_RTAS_DEFINE_TOKEN
   xics-kvm: Support for in-kernel XICS interrupt controller

  default-configs/ppc64-softmmu.mak |   1 +
  hw/intc/Makefile.objs |   1 +
  hw/intc/xics.c| 358 +--
  hw/intc/xics_kvm.c| 492 
 ++
  hw/ppc/spapr.c|  27 ++-
  include/hw/ppc/spapr.h|   3 +-
  include/hw/ppc/xics.h |  68 +-
  target-ppc/kvm.c  |  14 ++
  target-ppc/kvm_ppc.h  |   7 +
  9 files changed, 894 insertions(+), 77 deletions(-)
  create mode 100644 hw/intc/xics_kvm.c

 
 


-- 
Alexey



[Qemu-devel] [PATCH] kvm: sync cpu state on internal error before dump

2013-08-23 Thread James Hogan
When a KVM internal error occurs QEMU dumps the CPU state, however it
doesn't synchronise the state from KVM first so the dumped state is out
of date. Add the synchronisation calls before the dump in both locations
(which is used depends on whether the arch says to stop or not).

Note that x86 does a sync in its kvm_arch_stop_on_emulation_error()
function so at least for emulation errors is unaffected.

Signed-off-by: James Hogan james.ho...@imgtec.com
Cc: Gleb Natapov g...@redhat.com
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: k...@vger.kernel.org
---
 kvm-all.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kvm-all.c b/kvm-all.c
index b788fcd..803141a 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1546,6 +1546,7 @@ static int kvm_handle_internal_error(CPUState *cpu, 
struct kvm_run *run)
 if (run-internal.suberror == KVM_INTERNAL_ERROR_EMULATION) {
 fprintf(stderr, emulation failure\n);
 if (!kvm_arch_stop_on_emulation_error(cpu)) {
+kvm_cpu_synchronize_state(cpu);
 cpu_dump_state(cpu, stderr, fprintf, CPU_DUMP_CODE);
 return EXCP_INTERRUPT;
 }
@@ -1701,6 +1702,7 @@ int kvm_cpu_exec(CPUState *cpu)
 } while (ret == 0);
 
 if (ret  0) {
+kvm_cpu_synchronize_state(cpu);
 cpu_dump_state(cpu, stderr, fprintf, CPU_DUMP_CODE);
 vm_stop(RUN_STATE_INTERNAL_ERROR);
 }
-- 
1.8.1.2





Re: [Qemu-devel] [PATCH v4 0/6] Clean up bogus default boot order

2013-08-23 Thread Markus Armbruster
Michael S. Tsirkin m...@redhat.com writes:

 On Fri, Aug 16, 2013 at 01:13:44PM +0200, arm...@redhat.com wrote:
 From: Markus Armbruster arm...@redhat.com
 
 The first five patches are admittedly related to the stated purpose of
 this series pretty much only by I can't stand perpetuating this
 stupid crap.  Max Filippov and Peter Maydell already cleaned up
 Xtensa and ARM the same way.

 I picked up patches 3,4 and 5 on my tree.
 1 and 2 were rebased by Eduardo, I'm taking them
 from his patchset.
 6 needs to be rebased and comments addressed.

Applies fine with git-am -3.  Pushed to
http://repo.or.cz/w/qemu/armbru.git/shortlog/refs/heads/boot-order
for your convenience.

We discussed the patch at some length, but it's not 100% clear to me
what exactly you'd like me to address and how.  So let's recap briefly.

I think your main point was that PC machine type declarations are a bit
repetitive.  They all share two lines:

.max_cpus = 255,
DEFAULT_MACHINE_OPTIONS,

where DEFAULT_MACHINE_OPTIONS is defined as

#define DEFAULT_MACHINE_OPTIONS \
.boot_order = cad

Many of them also share one of these lines:

.desc = Standard PC (Q35 + ICH9, 2009),
.desc = Standard PC (i440FX + PIIX, 1996),
.desc = Standard PC,

My patch touches only the shared DEFAULT_MACHINE_OPTIONS line.  It
becomes

   .boot_order = cad

Commit message explains why:

We set default boot order cad in every single machine definition
except pseries and moxiesim, even though very few boards actually
care for boot order, and cad makes sense for even fewer.

Machines that care:

* pc and its variants

  Accept up to three letters 'a', 'b' (undocumented alias for 'a'),
  'c', 'd' and 'n'.  Reject all others (fatal with -boot).

[...]

Strip characters these machines ignore from their default boot order.

For all other machines, remove the unused default boot order
alltogether.

The change is systematic: if the machine uses .boot_order, strip the
characters it ignores from its initial value, else drop the initializer,
so .boot_order remains null.

I don't want to squash further cleanup into this one, because it's hard
enough to review as it is (and it already got competent review).  I
could be persuaded to do further cleanup on top, but you need to tell me
what cleanup you want.  Probably faster if you do it yourself :)



Re: [Qemu-devel] [PATCH 2/3] savevm: wrong error set by ram_control_load_hook()

2013-08-23 Thread Lei Li

On 08/23/2013 06:12 PM, Paolo Bonzini wrote:

Il 23/08/2013 12:03, Lei Li ha scritto:

Signed-off-by: Lei Li li...@linux.vnet.ibm.com
---
  savevm.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/savevm.c b/savevm.c
index 95a11f9..fb17a6f 100644
--- a/savevm.c
+++ b/savevm.c
@@ -649,7 +649,7 @@ void ram_control_after_iterate(QEMUFile *f, uint64_t flags)
  
  void ram_control_load_hook(QEMUFile *f, uint64_t flags)

  {
-int ret = 0;
+int ret = EINVAL;

This should be -EINVAL.


Ooops! Sorry for the rush as I was off line to go home...  :-[
Resend soon.



Paolo


  if (f-ops-hook_ram_load) {
  ret = f-ops-hook_ram_load(f, f-opaque, flags);






--
Lei




Re: [Qemu-devel] [PATCH] kvm: sync cpu state on internal error before dump

2013-08-23 Thread Gleb Natapov
On Fri, Aug 23, 2013 at 01:26:00PM +0100, James Hogan wrote:
 When a KVM internal error occurs QEMU dumps the CPU state, however it
 doesn't synchronise the state from KVM first so the dumped state is out
 of date. Add the synchronisation calls before the dump in both locations
 (which is used depends on whether the arch says to stop or not).
 
x86_cpu_dump_state() calls cpu_synchronize_state() already.

 Note that x86 does a sync in its kvm_arch_stop_on_emulation_error()
 function so at least for emulation errors is unaffected.
 
 Signed-off-by: James Hogan james.ho...@imgtec.com
 Cc: Gleb Natapov g...@redhat.com
 Cc: Paolo Bonzini pbonz...@redhat.com
 Cc: k...@vger.kernel.org
 ---
  kvm-all.c | 2 ++
  1 file changed, 2 insertions(+)
 
 diff --git a/kvm-all.c b/kvm-all.c
 index b788fcd..803141a 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -1546,6 +1546,7 @@ static int kvm_handle_internal_error(CPUState *cpu, 
 struct kvm_run *run)
  if (run-internal.suberror == KVM_INTERNAL_ERROR_EMULATION) {
  fprintf(stderr, emulation failure\n);
  if (!kvm_arch_stop_on_emulation_error(cpu)) {
 +kvm_cpu_synchronize_state(cpu);
  cpu_dump_state(cpu, stderr, fprintf, CPU_DUMP_CODE);
  return EXCP_INTERRUPT;
  }
 @@ -1701,6 +1702,7 @@ int kvm_cpu_exec(CPUState *cpu)
  } while (ret == 0);
  
  if (ret  0) {
 +kvm_cpu_synchronize_state(cpu);
  cpu_dump_state(cpu, stderr, fprintf, CPU_DUMP_CODE);
  vm_stop(RUN_STATE_INTERNAL_ERROR);
  }
 -- 
 1.8.1.2
 

--
Gleb.



[Qemu-devel] [PATCH v2] kvm: warn if num cpus is greater than num recommended

2013-08-23 Thread Andrew Jones
The comment in kvm_max_vcpus() states that it's using the recommended
procedure from the kernel API documentation to get the max number
of vcpus that kvm supports. It is, but by always returning the
maximum number supported. The maximum number should only be used
for development purposes. qemu should check KVM_CAP_NR_VCPUS for
the recommended number of vcpus. This patch adds a warning if a user
specifies a number of cpus between the recommended and max.

v2:
Incorporate tests for max_cpus, which specifies the maximum number
of hotpluggable cpus. An additional note is that the message for
the fail case was slightly changed, 'exceeds max cpus' to
'exceeds the maximum cpus'. If this is unacceptable change for
users like libvirt, then I'll need to spin a v3.

Signed-off-by: Andrew Jones drjo...@redhat.com
---
 kvm-all.c | 69 ---
 1 file changed, 40 insertions(+), 29 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index a2d49786365e3..021f5f47e53da 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1322,24 +1322,20 @@ static int kvm_irqchip_create(KVMState *s)
 return 0;
 }
 
-static int kvm_max_vcpus(KVMState *s)
+/* Find number of supported CPUs using the recommended
+ * procedure from the kernel API documentation to cope with
+ * older kernels that may be missing capabilities.
+ */
+static int kvm_recommended_vcpus(KVMState *s)
 {
-int ret;
-
-/* Find number of supported CPUs using the recommended
- * procedure from the kernel API documentation to cope with
- * older kernels that may be missing capabilities.
- */
-ret = kvm_check_extension(s, KVM_CAP_MAX_VCPUS);
-if (ret) {
-return ret;
-}
-ret = kvm_check_extension(s, KVM_CAP_NR_VCPUS);
-if (ret) {
-return ret;
-}
+int ret = kvm_check_extension(s, KVM_CAP_NR_VCPUS);
+return (ret) ? ret : 4;
+}
 
-return 4;
+static int kvm_max_vcpus(KVMState *s)
+{
+int ret = kvm_check_extension(s, KVM_CAP_MAX_VCPUS);
+return (ret) ? ret : kvm_recommended_vcpus(s);
 }
 
 int kvm_init(void)
@@ -1347,11 +1343,19 @@ int kvm_init(void)
 static const char upgrade_note[] =
 Please upgrade to at least kernel 2.6.29 or recent kvm-kmod\n
 (see http://sourceforge.net/projects/kvm).\n;
+struct {
+const char *name;
+int num;
+} num_cpus[] = {
+{ SMP,  smp_cpus },
+{ hotpluggable, max_cpus },
+{ NULL, }
+}, *nc = num_cpus;
+int soft_vcpus_limit, hard_vcpus_limit;
 KVMState *s;
 const KVMCapabilityInfo *missing_cap;
 int ret;
 int i;
-int max_vcpus;
 
 s = g_malloc0(sizeof(KVMState));
 
@@ -1392,19 +1396,26 @@ int kvm_init(void)
 goto err;
 }
 
-max_vcpus = kvm_max_vcpus(s);
-if (smp_cpus  max_vcpus) {
-ret = -EINVAL;
-fprintf(stderr, Number of SMP cpus requested (%d) exceeds max cpus 
-supported by KVM (%d)\n, smp_cpus, max_vcpus);
-goto err;
-}
+/* check the vcpu limits */
+soft_vcpus_limit = kvm_recommended_vcpus(s);
+hard_vcpus_limit = kvm_max_vcpus(s);
 
-if (max_cpus  max_vcpus) {
-ret = -EINVAL;
-fprintf(stderr, Number of hotpluggable cpus requested (%d) exceeds 
max cpus 
-supported by KVM (%d)\n, max_cpus, max_vcpus);
-goto err;
+while (nc-name) {
+if (nc-num  soft_vcpus_limit) {
+fprintf(stderr,
+Warning: Number of %s cpus requested (%d) exceeds 
+the recommended cpus supported by KVM (%d)\n,
+nc-name, nc-num, soft_vcpus_limit);
+
+if (nc-num  hard_vcpus_limit) {
+ret = -EINVAL;
+fprintf(stderr, Number of %s cpus requested (%d) exceeds 
+the maximum cpus supported by KVM (%d)\n,
+nc-name, nc-num, hard_vcpus_limit);
+goto err;
+}
+}
+nc++;
 }
 
 s-vmfd = kvm_ioctl(s, KVM_CREATE_VM, 0);
-- 
1.8.1.4




[Qemu-devel] [PATCH resend 1/3] savevm: add comments for qemu_file_get_error()

2013-08-23 Thread Lei Li
Add comments for qemu_file_get_error(), as its return value
is not very clear.

Signed-off-by: Lei Li li...@linux.vnet.ibm.com
---
 savevm.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/savevm.c b/savevm.c
index 03fc4d9..95a11f9 100644
--- a/savevm.c
+++ b/savevm.c
@@ -566,6 +566,13 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps 
*ops)
 return f;
 }
 
+/*
+ * Get last error for stream f
+ *
+ * Return negative error value if there has been an error on previous
+ * operations, return 0 if no error happened.
+ *
+ */
 int qemu_file_get_error(QEMUFile *f)
 {
 return f-last_error;
-- 
1.7.7.6




[Qemu-devel] [PATCH resend 3/3] arch_init: right return for ram_save_iterate

2013-08-23 Thread Lei Li
Signed-off-by: Lei Li li...@linux.vnet.ibm.com
---
 arch_init.c |7 +--
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 94d45e1..a34437c 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -709,16 +709,11 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
  */
 ram_control_after_iterate(f, RAM_CONTROL_ROUND);
 
-if (ret  0) {
-bytes_transferred += total_sent;
-return ret;
-}
-
 qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 total_sent += 8;
 bytes_transferred += total_sent;
 
-return total_sent;
+return qemu_file_get_error(f);
 }
 
 static int ram_save_complete(QEMUFile *f, void *opaque)
-- 
1.7.7.6




[Qemu-devel] [PATCH 0/2] object_initialize: check size of passed in memory

2013-08-23 Thread Peter Maydell
This patchset addresses a concern that came up with Andreas' recent
patches for using embedded objects in some of the ARM CPU devices:
object_initialize() doesn't check that there's actually enough space
for the type being added, so if you have:

struct MyDevice {
   ...
   SomeObject obj;
};

object_initialize(mydev-obj, some-object);

then there's no compile time or runtime check that SomeObject
is really big enough for the some-object object -- if the
implementation is changed later then there will be silent
memory corruption.

These patches make object_initialize() a macro which can then
use sizeof(*PTR) to pass the size into the implementation to
be checked.

The virtio patch is worth applying anyway -- it removes some
pointless casts which would otherwise have caused false
positives.

Disclaimer: I've eyeballed all the uses of object_initialize()
but I haven't necessarily tested them all.

Peter Maydell (2):
  virtio: Remove unnecessary OBJECT casts
  qom: Make object_initialize and object_initialize_with_type check
size

 hw/core/qdev.c |2 +-
 hw/s390x/s390-virtio-bus.c |   12 ++--
 hw/s390x/virtio-ccw.c  |   14 +++---
 hw/virtio/virtio-pci.c |   16 
 include/qom/object.h   |   36 ++--
 qom/object.c   |9 +
 6 files changed, 61 insertions(+), 28 deletions(-)

-- 
1.7.9.5




Re: [Qemu-devel] [PATCH] kvm: sync cpu state on internal error before dump

2013-08-23 Thread James Hogan
On 23/08/13 13:58, Gleb Natapov wrote:
 On Fri, Aug 23, 2013 at 01:26:00PM +0100, James Hogan wrote:
 When a KVM internal error occurs QEMU dumps the CPU state, however it
 doesn't synchronise the state from KVM first so the dumped state is out
 of date. Add the synchronisation calls before the dump in both locations
 (which is used depends on whether the arch says to stop or not).

 x86_cpu_dump_state() calls cpu_synchronize_state() already.

Ah yes, thanks. I hadn't noticed that.

Out of the arches that support KVM only x86 and ppc call it. arm, mips
(qemu support not upstream yet), and s390 don't. s390 never seems to
emit that exit code, and arm only does so for unsupported exceptions
(which should never happen).

I'll fix in mips_cpu_dump_state() instead.

Cheers
James




[Qemu-devel] [PATCH resend 2/3] savevm: fix wrong error set by ram_control_load_hook()

2013-08-23 Thread Lei Li
It should set negative error value if there has been an error.

Signed-off-by: Lei Li li...@linux.vnet.ibm.com
---
 savevm.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/savevm.c b/savevm.c
index 95a11f9..a0be109 100644
--- a/savevm.c
+++ b/savevm.c
@@ -649,7 +649,7 @@ void ram_control_after_iterate(QEMUFile *f, uint64_t flags)
 
 void ram_control_load_hook(QEMUFile *f, uint64_t flags)
 {
-int ret = 0;
+int ret = -EINVAL;
 
 if (f-ops-hook_ram_load) {
 ret = f-ops-hook_ram_load(f, f-opaque, flags);
-- 
1.7.7.6




  1   2   3   >