Re: [Qemu-devel] QEMU and Real Time OS

2015-02-02 Thread Marcelo Tosatti
On Fri, Jan 30, 2015 at 08:37:47AM +0100, Jan Kiszka wrote:
 On 2015-01-30 00:06, Paolo Bonzini wrote:
  
  
  On 29/01/2015 20:37, Marc MarĂ­ wrote:
  Is this an expected behaviour? I can't see why.
 
  I'd like to know if there is a certain reason why it doesn't work. Or
  if it should work and the problem is too much I/O overhead. Or any
  other hint to understand it.
  
  It is due to latencies in the host.  You need at least to use preempt-rt
  kernels in the host as well.
 
 That alone won't help much. You also need to fine-tune the guest to
 avoid running into QEMU locks that continuously synchronizes the guest
 on things like VGA or disk I/O emulation.
 
 When using KVM, thus being able to run VCPUs widely independent of each
 other and the device models, you need to push cyclictest on an isolated
 second virtual CPU of the guest. Luiz and Marcelo can probably confirm
 this based on their ongoing experiments.

Yes, we have achieved low latencies by using a dedicated pCPU for a
guest vCPU. 

This also avoids iothread - guest vCPU -RT priority issues.

 With TCG, we would first of all have to make it true SMP and independent
 of the I/O device lock. That's what Frederic is working on [1].
 
 Jan
 
 [1] http://permalink.gmane.org/gmane.comp.emulators.qemu/314406
 
 -- 
 Siemens AG, Corporate Technology, CT RTC ITP SES-DE
 Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH 1/2] glusterfs: fix max_discard

2015-02-02 Thread Denis V. Lunev

On 02/02/15 23:40, Peter Lieven wrote:

Am 02.02.2015 um 21:09 schrieb Denis V. Lunev:

qemu_gluster_co_discard calculates size to discard as follows
 size_t size = nb_sectors * BDRV_SECTOR_SIZE;
 ret = glfs_discard_async(s-fd, offset, size, gluster_finish_aiocb, acb);

glfs_discard_async is declared as follows:
   int glfs_discard_async (glfs_fd_t *fd, off_t length, size_t lent,
   glfs_io_cbk fn, void *data) __THROW
This is problematic on i686 as sizeof(size_t) == 4.

Set bl_max_discard to SIZE_MAX  BDRV_SECTOR_BITS to avoid overflow
on i386.

Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Kevin Wolf kw...@redhat.com
CC: Peter Lieven p...@kamp.de
---
  block/gluster.c | 9 +
  1 file changed, 9 insertions(+)

diff --git a/block/gluster.c b/block/gluster.c
index 1eb3a8c..8a8c153 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -622,6 +622,11 @@ out:
  return ret;
  }
  
+static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)

+{
+bs-bl.max_discard = MIN(SIZE_MAX  BDRV_SECTOR_BITS, INT_MAX);
+}
+

Looking at the gluster code bl.max_transfer_length should have the same limit, 
but thats a different patch.

ha, the same applies to nbd code too.

I'll do this stuff tomorrow and also I think that some
audit in other drivers could reveal something interesting.

Den



Re: [Qemu-devel] [PATCH v3 06/14] qemu-img: Use blk_new_open() in img_rebase()

2015-02-02 Thread Max Reitz

On 2015-02-02 at 14:00, Kevin Wolf wrote:

Am 26.01.2015 um 16:00 hat Max Reitz geschrieben:

Signed-off-by: Max Reitz mre...@redhat.com
---
  qemu-img.c | 57 -
  1 file changed, 24 insertions(+), 33 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index be1953d..0b23c87 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2430,7 +2430,6 @@ static int img_rebase(int argc, char **argv)
  {
  BlockBackend *blk = NULL, *blk_old_backing = NULL, *blk_new_backing = 
NULL;
  BlockDriverState *bs = NULL, *bs_old_backing = NULL, *bs_new_backing = 
NULL;
-BlockDriver *old_backing_drv, *new_backing_drv;
  char *filename;
  const char *fmt, *cache, *src_cache, *out_basefmt, *out_baseimg;
  int c, flags, src_flags, ret;
@@ -2524,54 +2523,46 @@ static int img_rebase(int argc, char **argv)
  }
  bs = blk_bs(blk);
  
-/* Find the right drivers for the backing files */

-old_backing_drv = NULL;
-new_backing_drv = NULL;
-
-if (!unsafe  bs-backing_format[0] != '\0') {
-old_backing_drv = bdrv_find_format(bs-backing_format);
-if (old_backing_drv == NULL) {
-error_report(Invalid format name: '%s', bs-backing_format);
-ret = -1;
-goto out;
-}
-}
-
-if (out_basefmt != NULL) {
-new_backing_drv = bdrv_find_format(out_basefmt);
-if (new_backing_drv == NULL) {
-error_report(Invalid format name: '%s', out_basefmt);
-ret = -1;
-goto out;
-}
-}

You're removing the validity check of the new backing file format:

[master] $ ./qemu-img rebase -u -b /tmp/backing.qcow2 -F foobar /tmp/test.qcow2
qemu-img: Invalid format name: 'foobar'
[master] $

[growable-v3] $ ./qemu-img rebase -u -b /tmp/backing.qcow2 -F foobar 
/tmp/test.qcow2
[growable-v3] $


Oops, right, I missed that this changes the unsafe path. Thanks for 
catching it! I'll fix it in v4.


Max



Re: [Qemu-devel] [PATCH v3 13/14] block: Remove growable from BDS

2015-02-02 Thread Kevin Wolf
Am 26.01.2015 um 16:00 hat Max Reitz geschrieben:
 Now that request clamping is done in the BlockBackend, the growable
 field can be removed from the BlockDriverState. All BDSs are now treated
 as being growable (that is, they are allowed to grow; they are not
 necessarily actually able to).
 
 Signed-off-by: Max Reitz mre...@redhat.com
 ---
  block.c   | 24 +---
  block/qcow2.c |  6 --
  block/raw-posix.c |  2 +-
  block/raw-win32.c |  2 +-
  block/sheepdog.c  |  2 +-
  include/block/block_int.h |  3 ---
  6 files changed, 8 insertions(+), 31 deletions(-)
 
 diff --git a/block.c b/block.c
 index d45e4dd..356a857 100644
 --- a/block.c
 +++ b/block.c
 @@ -970,7 +970,6 @@ static int bdrv_open_common(BlockDriverState *bs, 
 BlockDriverState *file,
  bs-zero_beyond_eof = true;
  open_flags = bdrv_open_flags(bs, flags);
  bs-read_only = !(open_flags  BDRV_O_RDWR);
 -bs-growable = !!(flags  BDRV_O_PROTOCOL);
  
  if (use_bdrv_whitelist  !bdrv_is_whitelisted(drv, bs-read_only)) {
  error_setg(errp,
 @@ -1885,7 +1884,6 @@ void bdrv_close(BlockDriverState *bs)
  bs-encrypted = 0;
  bs-valid_key = 0;
  bs-sg = 0;
 -bs-growable = 0;
  bs-zero_beyond_eof = false;
  QDECREF(bs-options);
  bs-options = NULL;
 @@ -2645,25 +2643,13 @@ exit:
  static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
 size_t size)
  {
 -int64_t len;
 -
  if (size  INT_MAX) {
  return -EIO;
  }
  
 -if (!bdrv_is_inserted(bs))
 +if (!bdrv_is_inserted(bs)) {
  return -ENOMEDIUM;
 -
 -if (bs-growable)
 -return 0;
 -
 -len = bdrv_getlength(bs);
 -
 -if (offset  0)
 -return -EIO;

Wouldn't it be better to keep this one, even though bs-growable used to
disable it?

Kevin



Re: [Qemu-devel] [PATCH 2/2] nbd: fix max_discard

2015-02-02 Thread Peter Lieven
Am 02.02.2015 um 19:29 schrieb Denis V. Lunev:
 nbd_co_discard calls nbd_client_session_co_discard which uses uint32_t
 as the length in bytes of the data to discard due to the following
 definition:

 struct nbd_request {
 uint32_t magic;
 uint32_t type;
 uint64_t handle;
 uint64_t from;
 uint32_t len; -- the length of data to be discarded, in bytes
 } QEMU_PACKED;

 Thus we should limit bl_max_discard to UINT32_MAX  BDRV_SECTOR_BITS to
 avoid overflow.

 Signed-off-by: Denis V. Lunev d...@openvz.org
 CC: Kevin Wolf kw...@redhat.com
 CC: Peter Lieven p...@kamp.de
 ---
  block/nbd.c | 8 
  1 file changed, 8 insertions(+)

 diff --git a/block/nbd.c b/block/nbd.c
 index 04cc845..99af713 100644
 --- a/block/nbd.c
 +++ b/block/nbd.c
 @@ -301,6 +301,11 @@ static int nbd_co_flush(BlockDriverState *bs)
  return nbd_client_session_co_flush(s-client);
  }
  
 +static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
 +{
 +bs-bl.max_discard = UINT32_MAX  BDRV_SECTOR_BITS;
 +}
 +
  static int nbd_co_discard(BlockDriverState *bs, int64_t sector_num,
int nb_sectors)
  {
 @@ -396,6 +401,7 @@ static BlockDriver bdrv_nbd = {
  .bdrv_close = nbd_close,
  .bdrv_co_flush_to_os= nbd_co_flush,
  .bdrv_co_discard= nbd_co_discard,
 +.bdrv_refresh_limits= nbd_refresh_limits,
  .bdrv_getlength = nbd_getlength,
  .bdrv_detach_aio_context= nbd_detach_aio_context,
  .bdrv_attach_aio_context= nbd_attach_aio_context,
 @@ -413,6 +419,7 @@ static BlockDriver bdrv_nbd_tcp = {
  .bdrv_close = nbd_close,
  .bdrv_co_flush_to_os= nbd_co_flush,
  .bdrv_co_discard= nbd_co_discard,
 +.bdrv_refresh_limits= nbd_refresh_limits,
  .bdrv_getlength = nbd_getlength,
  .bdrv_detach_aio_context= nbd_detach_aio_context,
  .bdrv_attach_aio_context= nbd_attach_aio_context,
 @@ -430,6 +437,7 @@ static BlockDriver bdrv_nbd_unix = {
  .bdrv_close = nbd_close,
  .bdrv_co_flush_to_os= nbd_co_flush,
  .bdrv_co_discard= nbd_co_discard,
 +.bdrv_refresh_limits= nbd_refresh_limits,
  .bdrv_getlength = nbd_getlength,
  .bdrv_detach_aio_context= nbd_detach_aio_context,
  .bdrv_attach_aio_context= nbd_attach_aio_context,

Reviewed-by: Peter Lieven p...@kamp.de



[Qemu-devel] [PATCH 1/2] glusterfs: fix max_discard

2015-02-02 Thread Denis V. Lunev
qemu_gluster_co_discard calculates size to discard as follows
size_t size = nb_sectors * BDRV_SECTOR_SIZE;
ret = glfs_discard_async(s-fd, offset, size, gluster_finish_aiocb, acb);

glfs_discard_async is declared as follows:
  int glfs_discard_async (glfs_fd_t *fd, off_t length, size_t lent,
  glfs_io_cbk fn, void *data) __THROW
This is problematic on i686 as sizeof(size_t) == 4.

Set bl_max_discard to SIZE_MAX  BDRV_SECTOR_BITS to avoid overflow
on i386.

Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Kevin Wolf kw...@redhat.com
CC: Peter Lieven p...@kamp.de
---
 block/gluster.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/block/gluster.c b/block/gluster.c
index 1eb3a8c..8a8c153 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -622,6 +622,11 @@ out:
 return ret;
 }
 
+static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+bs-bl.max_discard = MIN(SIZE_MAX  BDRV_SECTOR_BITS, INT_MAX);
+}
+
 #ifdef CONFIG_GLUSTERFS_DISCARD
 static coroutine_fn int qemu_gluster_co_discard(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors)
@@ -735,6 +740,7 @@ static BlockDriver bdrv_gluster = {
 #ifdef CONFIG_GLUSTERFS_DISCARD
 .bdrv_co_discard  = qemu_gluster_co_discard,
 #endif
+.bdrv_refresh_limits  = qemu_gluster_refresh_limits,
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
 .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes,
 #endif
@@ -762,6 +768,7 @@ static BlockDriver bdrv_gluster_tcp = {
 #ifdef CONFIG_GLUSTERFS_DISCARD
 .bdrv_co_discard  = qemu_gluster_co_discard,
 #endif
+.bdrv_refresh_limits  = qemu_gluster_refresh_limits,
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
 .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes,
 #endif
@@ -789,6 +796,7 @@ static BlockDriver bdrv_gluster_unix = {
 #ifdef CONFIG_GLUSTERFS_DISCARD
 .bdrv_co_discard  = qemu_gluster_co_discard,
 #endif
+.bdrv_refresh_limits  = qemu_gluster_refresh_limits,
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
 .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes,
 #endif
@@ -816,6 +824,7 @@ static BlockDriver bdrv_gluster_rdma = {
 #ifdef CONFIG_GLUSTERFS_DISCARD
 .bdrv_co_discard  = qemu_gluster_co_discard,
 #endif
+.bdrv_refresh_limits  = qemu_gluster_refresh_limits,
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
 .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes,
 #endif
-- 
1.9.1




Re: [Qemu-devel] [RFC v2 6/8] vfio_pci: fix a wrong check in vfio_pci_reset

2015-02-02 Thread Alex Williamson
On Wed, 2015-01-28 at 16:37 +0800, Chen Fan wrote:
 when vfio device support FLR, then when device reset,
 we call VFIO_DEVICE_RESET ioctl to reset the device first,
 at kernel side, we also can see the order of reset:
 3330 rc = pcie_flr(dev, probe);
 3331 if (rc != -ENOTTY)
 3332 goto done;
 
 3334 rc = pci_af_flr(dev, probe);
 3335 if (rc != -ENOTTY)
 3336 goto done;
 3337
 3338 rc = pci_pm_reset(dev, probe);
 3339 if (rc != -ENOTTY)
 3340 goto done;
 
 so when vfio has FLR, reset it directly.
 
 Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com
 ---
  hw/vfio/pci.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
 index 8c81bb3..54eb6b4 100644
 --- a/hw/vfio/pci.c
 +++ b/hw/vfio/pci.c
 @@ -3455,7 +3455,7 @@ static void vfio_pci_reset(DeviceState *dev)
  vfio_pci_pre_reset(vdev);
  
  if (vdev-vbasedev.reset_works 
 -(vdev-has_flr || !vdev-has_pm_reset) 
 +vdev-has_flr 
  !ioctl(vdev-vbasedev.fd, VFIO_DEVICE_RESET)) {
  trace_vfio_pci_reset_flr(vdev-vbasedev.name);
  goto post_reset;

Does this actually fix anything?  QEMU shouldn't rely on a specific
behavior of the kernel.  This test is de-prioritizing a PM reset because
they're often non-effective.  If the device supports FLR, the second
part of the OR is unreached, so what's the point of this change?
Thanks,

Alex




[Qemu-devel] [PATCH 0/3] nbd: Drop BDS backpointer

2015-02-02 Thread Max Reitz
Right now, bdrv_swap() on NBD BDSs results in a segmentation fault
pretty much all of the time. This series fixes this.

Note that this is not a common case, as bdrv_swap() is generally only
performed on root BDSs (there are exceptions, though) and NBD BDSs
normally have a format BDS above them. However, due to misconfiguration
(or maybe it is not even a misconfiguration, but just a strange
configuration) these cases may indeed occur.

For the iotest to succeed, this series relies on
iotests: Specify format for qemu-nbd.

I took the second patch in this series from my other series
block: Rework bdrv_close_all() (which has 21 patches itself and
depends on 64 other patches, so making this series rely on that one
probably would not have been a very good idea).


Max Reitz (3):
  nbd: Drop BDS backpointer
  iotests: Add wait functionality to _cleanup_qemu
  iotests: Add test for drive-mirror with NBD target

 block/nbd-client.c | 95 ++
 block/nbd-client.h | 20 -
 block/nbd.c| 37 +++-
 tests/qemu-iotests/094 | 81 +++
 tests/qemu-iotests/094.out | 11 +
 tests/qemu-iotests/common.qemu | 12 +-
 tests/qemu-iotests/group   |  1 +
 7 files changed, 177 insertions(+), 80 deletions(-)
 create mode 100755 tests/qemu-iotests/094
 create mode 100644 tests/qemu-iotests/094.out

-- 
2.1.0




[Qemu-devel] [PATCH 1/3] nbd: Drop BDS backpointer

2015-02-02 Thread Max Reitz
Before this patch, the opaque pointer in an NBD BDS points to a
BDRVNBDState, which contains an NbdClientSession object, which in turn
contains a pointer to the BDS. This pointer may become invalid due to
bdrv_swap(), so drop it, and instead pass the BDS directly to the
nbd-client.c functions which then retrieve the NbdClientSession object
from there.

Signed-off-by: Max Reitz mre...@redhat.com
---
 block/nbd-client.c | 95 --
 block/nbd-client.h | 20 ++--
 block/nbd.c| 37 -
 3 files changed, 73 insertions(+), 79 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 28bfb62..4ede714 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -43,20 +43,23 @@ static void nbd_recv_coroutines_enter_all(NbdClientSession 
*s)
 }
 }
 
-static void nbd_teardown_connection(NbdClientSession *client)
+static void nbd_teardown_connection(BlockDriverState *bs)
 {
+NbdClientSession *client = nbd_get_client_session(bs);
+
 /* finish any pending coroutines */
 shutdown(client-sock, 2);
 nbd_recv_coroutines_enter_all(client);
 
-nbd_client_session_detach_aio_context(client);
+nbd_client_session_detach_aio_context(bs);
 closesocket(client-sock);
 client-sock = -1;
 }
 
 static void nbd_reply_ready(void *opaque)
 {
-NbdClientSession *s = opaque;
+BlockDriverState *bs = opaque;
+NbdClientSession *s = nbd_get_client_session(bs);
 uint64_t i;
 int ret;
 
@@ -89,28 +92,29 @@ static void nbd_reply_ready(void *opaque)
 }
 
 fail:
-nbd_teardown_connection(s);
+nbd_teardown_connection(bs);
 }
 
 static void nbd_restart_write(void *opaque)
 {
-NbdClientSession *s = opaque;
+BlockDriverState *bs = opaque;
 
-qemu_coroutine_enter(s-send_coroutine, NULL);
+qemu_coroutine_enter(nbd_get_client_session(bs)-send_coroutine, NULL);
 }
 
-static int nbd_co_send_request(NbdClientSession *s,
-struct nbd_request *request,
-QEMUIOVector *qiov, int offset)
+static int nbd_co_send_request(BlockDriverState *bs,
+   struct nbd_request *request,
+   QEMUIOVector *qiov, int offset)
 {
+NbdClientSession *s = nbd_get_client_session(bs);
 AioContext *aio_context;
 int rc, ret;
 
 qemu_co_mutex_lock(s-send_mutex);
 s-send_coroutine = qemu_coroutine_self();
-aio_context = bdrv_get_aio_context(s-bs);
+aio_context = bdrv_get_aio_context(bs);
 aio_set_fd_handler(aio_context, s-sock,
-   nbd_reply_ready, nbd_restart_write, s);
+   nbd_reply_ready, nbd_restart_write, bs);
 if (qiov) {
 if (!s-is_unix) {
 socket_set_cork(s-sock, 1);
@@ -129,7 +133,7 @@ static int nbd_co_send_request(NbdClientSession *s,
 } else {
 rc = nbd_send_request(s-sock, request);
 }
-aio_set_fd_handler(aio_context, s-sock, nbd_reply_ready, NULL, s);
+aio_set_fd_handler(aio_context, s-sock, nbd_reply_ready, NULL, bs);
 s-send_coroutine = NULL;
 qemu_co_mutex_unlock(s-send_mutex);
 return rc;
@@ -195,10 +199,11 @@ static void nbd_coroutine_end(NbdClientSession *s,
 }
 }
 
-static int nbd_co_readv_1(NbdClientSession *client, int64_t sector_num,
+static int nbd_co_readv_1(BlockDriverState *bs, int64_t sector_num,
   int nb_sectors, QEMUIOVector *qiov,
   int offset)
 {
+NbdClientSession *client = nbd_get_client_session(bs);
 struct nbd_request request = { .type = NBD_CMD_READ };
 struct nbd_reply reply;
 ssize_t ret;
@@ -207,7 +212,7 @@ static int nbd_co_readv_1(NbdClientSession *client, int64_t 
sector_num,
 request.len = nb_sectors * 512;
 
 nbd_coroutine_start(client, request);
-ret = nbd_co_send_request(client, request, NULL, 0);
+ret = nbd_co_send_request(bs, request, NULL, 0);
 if (ret  0) {
 reply.error = -ret;
 } else {
@@ -218,15 +223,16 @@ static int nbd_co_readv_1(NbdClientSession *client, 
int64_t sector_num,
 
 }
 
-static int nbd_co_writev_1(NbdClientSession *client, int64_t sector_num,
+static int nbd_co_writev_1(BlockDriverState *bs, int64_t sector_num,
int nb_sectors, QEMUIOVector *qiov,
int offset)
 {
+NbdClientSession *client = nbd_get_client_session(bs);
 struct nbd_request request = { .type = NBD_CMD_WRITE };
 struct nbd_reply reply;
 ssize_t ret;
 
-if (!bdrv_enable_write_cache(client-bs) 
+if (!bdrv_enable_write_cache(bs) 
 (client-nbdflags  NBD_FLAG_SEND_FUA)) {
 request.type |= NBD_CMD_FLAG_FUA;
 }
@@ -235,7 +241,7 @@ static int nbd_co_writev_1(NbdClientSession *client, 
int64_t sector_num,
 request.len = nb_sectors * 512;
 
 nbd_coroutine_start(client, request);
-ret = nbd_co_send_request(client, request, qiov, offset);
+ret = nbd_co_send_request(bs, request, qiov, offset);
   

Re: [Qemu-devel] [PATCH v3 10/14] qemu-io: Remove growable option

2015-02-02 Thread Max Reitz

On 2015-02-02 at 14:36, Kevin Wolf wrote:

Am 27.01.2015 um 18:11 hat Max Reitz geschrieben:

On 2015-01-27 at 12:10, Eric Blake wrote:

On 01/27/2015 10:04 AM, Max Reitz wrote:

On 2015-01-27 at 11:59, Eric Blake wrote:

On 01/26/2015 08:00 AM, Max Reitz wrote:

Remove growable option from the open command and from the qemu-io
command line. qemu-io is about to be converted to BlockBackend which
will make sure that no request exceeds the image size, so the only way
to keep growable would be to use BlockBackend if it is not given and
to directly access the BDS if it is.

qemu-io is a debugging tool, therefore removing a rarely used option
will have only a very small impact, if any. There was only one
qemu-iotest which used the option; since it is not critical, this patch
just removes it.

Signed-off-by: Max Reitz mre...@redhat.com
---

Reviewed-by: Eric Blake ebl...@redhat.com

Do we want to ever reuse the test number that you are deleting?

Good question, I think I have talked about that with Kevin before. It
would not hurt too much if we were to accidentally reuse the test case
number, most certainly not here in upstream.

However, for all downstream versions of qemu, this might make adding the
new test 16 difficult; but certainly not impossible (if someone is
affected by this issue, he/she can just use 999 or something). So we may
want to keep in mind not to reuse number 16, but if someone does, so be it.

Is it worth a placeholder file that has a comment mentioning that the
test number is intentionally reserved (and if someone attempts to run,
always passes)?

Seems good to me. It's a minor effort now and may avert some hassle later.

How about just keeping a comment line in group?


Oh, that would be too simple.

Or maybe just simple enough for me; yes, that seems nicer indeed.

Max



Re: [Qemu-devel] [RFC v2 2/8] vfio-pci: add aer capability support

2015-02-02 Thread Alex Williamson
On Wed, 2015-01-28 at 16:37 +0800, Chen Fan wrote:
 if we detect the aer capability in vfio device, then
 we should initialize the vfio device aer rigister bits.
 so guest OS can set this bits as needed.

s/rigister/register/

 Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com
 ---
  hw/vfio/pci.c | 72 
 +++
  1 file changed, 72 insertions(+)
 
 diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
 index 014a92c..2072261 100644
 --- a/hw/vfio/pci.c
 +++ b/hw/vfio/pci.c
 @@ -2670,6 +2670,73 @@ static int vfio_add_capabilities(VFIOPCIDevice *vdev)
  return vfio_add_std_cap(vdev, pdev-config[PCI_CAPABILITY_LIST]);
  }
  
 +static void vfio_pci_aer_init(VFIOPCIDevice *vdev)
 +{
 +PCIDevice *pdev = vdev-pdev;
 +PCIExpressDevice *exp = pdev-exp;
 +uint16_t offset = exp-aer_cap;
 +
 +if (!offset) {
 +return;
 +}
 +

All of these need to be documented with comments.

 +memset(pdev-wmask + offset, 0, PCI_ERR_SIZEOF);
 +memset(pdev-w1cmask + offset, 0, PCI_ERR_SIZEOF);
 +memset(pdev-cmask + offset, 0xFF, PCI_ERR_SIZEOF);
 +
 +pci_long_test_and_set_mask(pdev-wmask + exp-exp_cap + PCI_EXP_DEVCTL,
 +   PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE |
 +   PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE);
 +pci_long_test_and_set_mask(pdev-w1cmask + exp-exp_cap + PCI_EXP_DEVSTA,
 +   PCI_EXP_DEVSTA_CED | PCI_EXP_DEVSTA_NFED |
 +   PCI_EXP_DEVSTA_FED | PCI_EXP_DEVSTA_URD);
 +
 +pci_set_long(pdev-w1cmask + offset + PCI_ERR_UNCOR_STATUS,
 + PCI_ERR_UNC_SUPPORTED);
 +pci_set_long(pdev-wmask + offset + PCI_ERR_UNCOR_SEVER,
 + PCI_ERR_UNC_SUPPORTED);
 +pci_long_test_and_set_mask(pdev-w1cmask + offset + PCI_ERR_COR_STATUS,
 +   PCI_ERR_COR_STATUS);
 +pci_set_long(pdev-wmask + offset + PCI_ERR_COR_MASK,
 + PCI_ERR_COR_SUPPORTED);
 +
 +pci_set_long(pdev-wmask + offset + PCI_ERR_CAP,
 + PCI_ERR_CAP_ECRC_GENE | PCI_ERR_CAP_ECRC_CHKE |
 + PCI_ERR_CAP_MHRE);
 +}
 +
 +static int vfio_add_ext_capabilities(VFIOPCIDevice *vdev)
 +{
 +PCIDevice *pdev = vdev-pdev;
 +PCIExpressDevice *exp;
 +uint32_t header;
 +uint16_t next = PCI_CONFIG_SPACE_SIZE;
 +
 +if (pci_config_size(pdev) = PCI_CONFIG_SPACE_SIZE) {
 +return 0;
 +}
 +
 +header = pci_get_long(pdev-config + next);
 +while (header) {
 +switch (PCI_EXT_CAP_ID(header)) {
 +case PCI_EXT_CAP_ID_ERR:
 + exp = pdev-exp;
 + exp-aer_cap = next;

Shouldn't we call pcie_aer_init() here?

 +
 + vfio_pci_aer_init(vdev);
 + break;
 +};
 +
 +next = PCI_EXT_CAP_NEXT(header);
 +if (!next) {
 +return 0;
 +}
 +header = pci_get_long(pdev-config + next);

I'd like to see this look more like vfio_add_std_cap(), registering
every capability with the QEMU PCIe-core and setting up emulation to
allow QEMU to skip capabilities that it doesn't want to expose.

 +}
 +
 +return 0;
 +}
 +
  static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
  {
  PCIDevice *pdev = vdev-pdev;
 @@ -3296,6 +3363,11 @@ static int vfio_initfn(PCIDevice *pdev)
  goto out_teardown;
  }
  
 +ret = vfio_add_ext_capabilities(vdev);
 +if (ret) {
 +goto out_teardown;
 +}
 +

Why not extend vfio_add_capabilities()?  It specifically calls
vfio_add_std_cap() in order that there could be a vfio_add_ext_cap().

  /* QEMU emulates all of MSI  MSIX */
  if (pdev-cap_present  QEMU_PCI_CAP_MSIX) {
  memset(vdev-emulated_config_bits + pdev-msix_cap, 0xff,






Re: [Qemu-devel] [RFC v2 7/8] vfio_pci: change vfio device features bit macro to enum definition

2015-02-02 Thread Alex Williamson
On Wed, 2015-01-28 at 16:37 +0800, Chen Fan wrote:


The why should go here

 Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com
 ---
  hw/vfio/pci.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)
 
 diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
 index 54eb6b4..65247ee 100644
 --- a/hw/vfio/pci.c
 +++ b/hw/vfio/pci.c
 @@ -134,6 +134,12 @@ typedef struct VFIOMSIXInfo {
  void *mmap;
  } VFIOMSIXInfo;
  
 +/* Bits in VFIOPCIDevice features field. */
 +enum {
 +#define VFIO_FEATURE_ENABLE_VGA_BIT 0
 +VFIO_FEATURE_ENABLE_VGA = (1  VFIO_FEATURE_ENABLE_VGA_BIT),
 +};

Rather strange to use a bitmap within an enum, isn't it?

 +
  typedef struct VFIOPCIDevice {
  PCIDevice pdev;
  VFIODevice vbasedev;
 @@ -154,8 +160,6 @@ typedef struct VFIOPCIDevice {
  PCIHostDeviceAddress host;
  EventNotifier err_notifier;
  uint32_t features;
 -#define VFIO_FEATURE_ENABLE_VGA_BIT 0
 -#define VFIO_FEATURE_ENABLE_VGA (1  VFIO_FEATURE_ENABLE_VGA_BIT)
  int32_t bootindex;
  uint8_t pm_cap;
  bool has_vga;






Re: [Qemu-devel] [PATCH 04/19] libqos/ahci: Add command header helpers

2015-02-02 Thread John Snow



On 02/02/2015 05:27 AM, Paolo Bonzini wrote:



On 30/01/2015 19:41, John Snow wrote:

+/* Set the #cx'th command of port #px. */
+void ahci_set_command_header(AHCIQState *ahci, uint8_t px,
+ uint8_t cx, AHCICommandHeader *cmd)
+{
+uint64_t ba = ahci-port[px].clb;
+ba += cx * sizeof(AHCICommandHeader);
+
+cmd-flags = cpu_to_le16(cmd-flags);
+cmd-prdtl = cpu_to_le16(cmd-prdtl);
+cmd-prdbc = cpu_to_le32(cmd-prdbc);
+cmd-ctba = cpu_to_le64(cmd-ctba);


Modifying cmd is uglyish, and especially it might hide bugs that only
happen on big endian system.  Please copy to a local AHCICommandHeader
variable before adjusting for host endianness.

Paolo


+memwrite(ba, cmd, sizeof(AHCICommandHeader));
+}




You're right. I had just never actually thought to reference the command 
after committing it before, so I hadn't considered it in that light.


If I intend to share goodies with other tests it would be nice if they 
didn't do secretly evil things :)




Re: [Qemu-devel] [PATCH v3 09/14] qemu-io: Use blk_new_open() in openfile()

2015-02-02 Thread Kevin Wolf
Am 26.01.2015 um 16:00 hat Max Reitz geschrieben:
 Signed-off-by: Max Reitz mre...@redhat.com
 ---
  qemu-io.c | 31 ---
  1 file changed, 12 insertions(+), 19 deletions(-)
 
 diff --git a/qemu-io.c b/qemu-io.c
 index 91a445a..81f8f64 100644
 --- a/qemu-io.c
 +++ b/qemu-io.c
 @@ -39,7 +39,6 @@ static ReadLineState *readline_state;
  static int close_f(BlockDriverState *bs, int argc, char **argv)
  {
  blk_unref(qemuio_blk);
 -qemuio_bs = NULL;
  qemuio_blk = NULL;
  return 0;
  }

This doesn't seem to be correct at this point in the series. qemuio_bs
is still passed to qemuio_command(), which checks whether non-global
commands are allowed to run using init_check_command().

At the end of the series, please remove qemuio_bs altogether. This
version of the series leaves it around without any reader.

Kevin



[Qemu-devel] [PATCH 1/1] Tracify migration/rdma.c

2015-02-02 Thread Dr. David Alan Gilbert (git)
From: Dr. David Alan Gilbert dgilb...@redhat.com

Turn all the D/DD/DDDPRINTFs into trace events
Turn most of the fprintf(stderr, into error_report

Signed-off-by: Dr. David Alan Gilbert dgilb...@redhat.com
---
 migration/rdma.c | 333 ---
 trace-events |  62 +++
 2 files changed, 206 insertions(+), 189 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index b32dbdf..fc351ea 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -26,34 +26,7 @@
 #include arpa/inet.h
 #include string.h
 #include rdma/rdma_cma.h
-
-//#define DEBUG_RDMA
-//#define DEBUG_RDMA_VERBOSE
-//#define DEBUG_RDMA_REALLY_VERBOSE
-
-#ifdef DEBUG_RDMA
-#define DPRINTF(fmt, ...) \
-do { printf(rdma:  fmt, ## __VA_ARGS__); } while (0)
-#else
-#define DPRINTF(fmt, ...) \
-do { } while (0)
-#endif
-
-#ifdef DEBUG_RDMA_VERBOSE
-#define DDPRINTF(fmt, ...) \
-do { printf(rdma:  fmt, ## __VA_ARGS__); } while (0)
-#else
-#define DDPRINTF(fmt, ...) \
-do { } while (0)
-#endif
-
-#ifdef DEBUG_RDMA_REALLY_VERBOSE
-#define DDDPRINTF(fmt, ...) \
-do { printf(rdma:  fmt, ## __VA_ARGS__); } while (0)
-#else
-#define DDDPRINTF(fmt, ...) \
-do { } while (0)
-#endif
+#include trace.h
 
 /*
  * Print and error on both the Monitor and the Log file.
@@ -104,8 +77,8 @@ static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL;
 do { \
 if (rdma-error_state) { \
 if (!rdma-error_reported) { \
-fprintf(stderr, RDMA is in an error state waiting migration \
- to abort!\n); \
+error_report(RDMA is in an error state waiting migration \
+ to abort!); \
 rdma-error_reported = 1; \
 } \
 return rdma-error_state; \
@@ -578,12 +551,13 @@ static int __qemu_rdma_add_block(RDMAContext *rdma, void 
*host_addr,
 
 g_hash_table_insert(rdma-blockmap, (void *) block_offset, block);
 
-DDPRINTF(Added Block: %d, addr: % PRIu64 , offset: % PRIu64
-length: % PRIu64  end: % PRIu64  bits % PRIu64  chunks 
%d\n,
-local-nb_blocks, (uint64_t) block-local_host_addr, block-offset,
-block-length, (uint64_t) (block-local_host_addr + block-length),
-BITS_TO_LONGS(block-nb_chunks) *
-sizeof(unsigned long) * 8, block-nb_chunks);
+trace___qemu_rdma_add_block(local-nb_blocks,
+   (uint64_t) block-local_host_addr, block-offset,
+   block-length,
+   (uint64_t) (block-local_host_addr + block-length),
+   BITS_TO_LONGS(block-nb_chunks) *
+   sizeof(unsigned long) * 8,
+   block-nb_chunks);
 
 local-nb_blocks++;
 
@@ -614,7 +588,7 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
 rdma-blockmap = g_hash_table_new(g_direct_hash, g_direct_equal);
 memset(local, 0, sizeof *local);
 qemu_ram_foreach_block(qemu_rdma_init_one_block, rdma);
-DPRINTF(Allocated %d local ram block structures\n, local-nb_blocks);
+trace_qemu_rdma_init_ram_blocks(local-nb_blocks);
 rdma-block = (RDMARemoteBlock *) g_malloc0(sizeof(RDMARemoteBlock) *
 rdma-local_ram_blocks.nb_blocks);
 local-init = true;
@@ -683,12 +657,12 @@ static int __qemu_rdma_delete_block(RDMAContext *rdma, 
ram_addr_t block_offset)
 local-block = NULL;
 }
 
-DDPRINTF(Deleted Block: %d, addr: % PRIu64 , offset: % PRIu64
-length: % PRIu64  end: % PRIu64  bits % PRIu64  chunks 
%d\n,
-local-nb_blocks, (uint64_t) block-local_host_addr, block-offset,
-block-length, (uint64_t) (block-local_host_addr + block-length),
-BITS_TO_LONGS(block-nb_chunks) *
-sizeof(unsigned long) * 8, block-nb_chunks);
+trace___qemu_rdma_delete_block(local-nb_blocks,
+   (uint64_t)block-local_host_addr,
+   block-offset, block-length,
+   (uint64_t)(block-local_host_addr + block-length),
+   BITS_TO_LONGS(block-nb_chunks) *
+   sizeof(unsigned long) * 8, block-nb_chunks);
 
 g_free(old);
 
@@ -713,7 +687,7 @@ static void qemu_rdma_dump_id(const char *who, struct 
ibv_context *verbs)
 struct ibv_port_attr port;
 
 if (ibv_query_port(verbs, 1, port)) {
-fprintf(stderr, FAILED TO QUERY PORT INFORMATION!\n);
+error_report(Failed to query port information);
 return;
 }
 
@@ -744,7 +718,7 @@ static void qemu_rdma_dump_gid(const char *who, struct 
rdma_cm_id *id)
 char dgid[33];
 inet_ntop(AF_INET6, id-route.addr.addr.ibaddr.sgid, sgid, sizeof sgid);
 inet_ntop(AF_INET6, id-route.addr.addr.ibaddr.dgid, dgid, sizeof dgid);
-DPRINTF(%s Source GID: %s, Dest GID: %s\n, who, sgid, 

Re: [Qemu-devel] [PATCH 1/2] glusterfs: fix max_discard

2015-02-02 Thread Denis V. Lunev

On 02/02/15 22:58, Peter Lieven wrote:

Am 02.02.2015 um 19:29 schrieb Denis V. Lunev:

qemu_gluster_co_discard calculates size to discard as follows
 size_t size = nb_sectors * BDRV_SECTOR_SIZE;
 ret = glfs_discard_async(s-fd, offset, size, gluster_finish_aiocb, acb);

glfs_discard_async is declared as follows:
   int glfs_discard_async (glfs_fd_t *fd, off_t length, size_t lent,
   glfs_io_cbk fn, void *data) __THROW
This is problematic on i686 as sizeof(size_t) == 4.

Set bl_max_discard to SIZE_MAX  BDRV_SECTOR_BITS to avoid overflow
on i386.

Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Kevin Wolf kw...@redhat.com
CC: Peter Lieven p...@kamp.de
---
  block/gluster.c | 11 +++
  1 file changed, 11 insertions(+)

diff --git a/block/gluster.c b/block/gluster.c
index 1eb3a8c..47bf92d 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -622,6 +622,13 @@ out:
  return ret;
  }
  
+static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)

+{
+#if SIZE_MAX == UINT_MAX
+bs-bl.max_discard = SIZE_MAX  BDRV_SECTOR_BITS;
+#endif

I would write:

bs-bl.max_discard = MIN(SIZE_MAX  BDRV_SECTOR_BITS, INT_MAX);

without the condition.


Peter


ok, respinned

Den



[Qemu-devel] [PATCH v3 0/2] fix max_discard for NBD/gluster block drivers

2015-02-02 Thread Denis V. Lunev
Changes from v2:
- dropped conditional in patch gluster code

Changes from v1:
- switched to UINT32_MAX for NBD
- limited max_discard for gluster to SIZE_MAX on i686 only

Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Kevin Wolf kw...@redhat.com
CC: Peter Lieven p...@kamp.de




Re: [Qemu-devel] [PATCH v11 03/13] qmp: Add block-dirty-bitmap-add and block-dirty-bitmap-remove

2015-02-02 Thread John Snow



On 02/02/2015 05:10 AM, Markus Armbruster wrote:

Kevin Wolf kw...@redhat.com writes:


Am 30.01.2015 um 18:04 hat John Snow geschrieben:



On 01/30/2015 09:32 AM, Kevin Wolf wrote:

Am 21.01.2015 um 10:34 hat Markus Armbruster geschrieben:

I'm afraid I forgot much of the discussion we had before the break, and
only now it's coming back, slowly.

Quoting myself on naming parameters identifying nodes[*]:

 John Snow pointed out to me that we still haven't spelled out how this
 single parameter should be named.

 On obvious option is calling it node-name, or FOO-node-name when we have
 several.  However, we'd then have two subtly different kinds of
 parameters called like that: the old ones accept *only* node names, the
 new ones also accept backend names, which automatically resolve to the
 backend's root node.

 Three ways to cope with that:

 * Find a better name.

 * Make the old ones accept backend names, too.  Only a few, not that
   much work.  However, there are exceptions:

   - blockdev-add's node-name *defines* the node name.

   - query-named-block-nodes's node-name *is* the node's name.

 * Stop worrying and embrace the inconsistency.  The affected commands
   are headed for deprecation anyway.

 I think I'd go with node or FOO-node for parameters that reference
 nodes and accept both node names and backend names, and refrain from
 touching the existing node-name parameters.


Wasn't the conclusion last time that we would try to find a better name
for new commands and leave old commands alone because they are going to
become deprecated? That is, a combination of your first and last option?



That was my impression, too: Use a new name for new commands and
then slowly phase out the old things. This makes the new name clear
as to what it supports (BOTH backends and nodes through a common
namespace) to external management utilities like libvirt.

That's why I just rolled 'node-ref.'


Yes, I agree with that in principle. I called it 'node' below because
that's shorter and doesn't include type information in the name, but if
everyone preferred 'node-ref', I wouldn't have a problem with it either.





Let's go through existing uses of @node-name again:

1. Define a node name

QMP commands blockdev-add (type BlockdevOptionsBase), drive-mirror

2. Report a node name

QMP command query-named-block-nodes (type BlockDeviceInfo)


Whatever name we end up using, 1. and 2. should probably use the same.


Should they? If these commands accept directly *node* names and have
no chance of referencing a backend, maybe they should use different
parameter names.


Maybe.  But even if we use different names for things that can only be
node names, never backend names, 1. and 2. should use the same name,
because they're both things that can only be node names.  That's what
Kevin said.


Note that these cases don't refer to a node, but they define/return the
name of a node. No way that could ever be a backend name, unless we
broke the code.


3. Node reference with backend names permitted for convenience

New QMP command block-dirty-bitmap-add (type BlockDirtyBitmapAdd) and
others

4. Node reference with backend names not permitted

QMP commands drive-mirror @replaces, change-backing-file
@image-node-name

We may want to support the backend name resolves to root node
convenience feature here, for consistency.  Then this moves under 3.

Note interface wart: change-backing-file additionally requires the
backend owning the node.  We need the backend to set op-blockers, we
can't easily find it from the node, so we make the user point it out
to us.


These shouldn't be existing. As you say, we should move them to 3.



Technically #3 here isn't a usage of node-name, because... I
didn't use node-name for these commands. Unless I am reading this
list wrong, but it's definitely not an existing use.

I don't have any opinions about #4; presumably that's something
we're aiming to phase out.


Yes. Where phasing out simply means to extend it so that it accepts
backend names. That should in theory be an easy change, except that
@image-node-name has a name that isn't quite compatible with it...


Our choice for 3. affects the phasing out of 4.

Our choice for 3 is a naming convention for parameters referencing nodes
that accept both node and backend names.

If, after extending the code to accept backend names, the old names for
4. follow the naming convention for 3., we're done.

Else, we still have an interface wart.  We can live with it, or we can
rename the parameter to follow the convention.


5. Pair of names node reference, specify exactly one

QMP commands block_passwd, block_resize, blockdev-snapshot-sync

We can ignore this one, because we intend to replace the commands and
deprecate the old ones.


Agreed, these shouldn't be existing either.


If I understand you correctly, you're 

Re: [Qemu-devel] [PATCH v3 04/14] block/xen: Use blk_new_open() in blk_connect()

2015-02-02 Thread Max Reitz

On 2015-02-02 at 13:27, Kevin Wolf wrote:

Am 26.01.2015 um 16:00 hat Max Reitz geschrieben:

Signed-off-by: Max Reitz mre...@redhat.com
---
  hw/block/xen_disk.c | 28 
  1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 21842a0..1b0257c 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -40,6 +40,8 @@
  #include xen_blkif.h
  #include sysemu/blockdev.h
  #include sysemu/block-backend.h
+#include qapi/qmp/qdict.h
+#include qapi/qmp/qstring.h
  
  /* - */
  
@@ -897,30 +899,24 @@ static int blk_connect(struct XenDevice *xendev)

  blkdev-dinfo = drive_get(IF_XEN, 0, index);
  if (!blkdev-dinfo) {
  Error *local_err = NULL;
-BlockBackend *blk;
-BlockDriver *drv;
-BlockDriverState *bs;
+QDict *options = NULL;
  
-/* setup via xenbus - create new block driver instance */

-xen_be_printf(blkdev-xendev, 2, create new bdrv (xenbus setup)\n);
-blk = blk_new_with_bs(blkdev-dev, NULL);
-if (!blk) {
-return -1;
+if (strcmp(blkdev-fileproto, unset)) {

xen_disk's usage of the string unset to mark configurations where no
driver is specific is quite ugly. I think it's possible for users to
pass this string as the driver name, and we would end up probing the
driver instead of returning an error.

Which was actually how any invalid driver name was handled before this
patch: bdrv_find_whitelisted_format() would return NULL, and instead of
erroring out, NULL would be passed to bdrv_open(), which probed the
driver then.

This patch improves the situation: Now any value that is not unset
is passed as the driver option, so invalid drivers will produce an
error now. Should be mentioned in the commit message.

However, if the user passes unset, that still means probing. Ugly.
Not sure why blkdev-fileproto == NULL wasn't used to specify that no
driver was given. We could change that in a patch before this one if we
wanted to clean it up. Or we could just feel reassured that xen_disk is
horrible code, that this patch already fixes most of it and leave it
alone.


I was glad enough that I made it out alive after digging just a bit into 
the code. I'm fully in favor of leaving this as-is and adding a note to 
the commit message.


Thank you for reviewing!

Max


+options = qdict_new();
+qdict_put_obj(options, driver,
+  QOBJECT(qstring_from_str(blkdev-fileproto)));
  }
-blkdev-blk = blk;
  
-bs = blk_bs(blk);

-drv = bdrv_find_whitelisted_format(blkdev-fileproto, readonly);
-if (bdrv_open(bs, blkdev-filename, NULL, NULL, qflags,
-  drv, local_err) != 0) {
+/* setup via xenbus - create new block driver instance */
+xen_be_printf(blkdev-xendev, 2, create new bdrv (xenbus setup)\n);
+blkdev-blk = blk_new_open(blkdev-dev, blkdev-filename, NULL, 
options,
+   qflags, local_err);
+if (!blkdev-blk) {
  xen_be_printf(blkdev-xendev, 0, error: %s\n,
error_get_pretty(local_err));
  error_free(local_err);
-blk_unref(blk);
-blkdev-blk = NULL;
  return -1;
  }
-assert(bs == blk_bs(blk));
  } else {
  /* setup via qemu cmdline - already setup for us */
  xen_be_printf(blkdev-xendev, 2, get configured bdrv (cmdline 
setup)\n);

Kevin





Re: [Qemu-devel] [PATCH v3 05/14] qemu-img: Use blk_new_open() in img_open()

2015-02-02 Thread Max Reitz

On 2015-02-02 at 13:35, Kevin Wolf wrote:

Am 26.01.2015 um 16:00 hat Max Reitz geschrieben:

Signed-off-by: Max Reitz mre...@redhat.com
---
  qemu-img.c | 20 ++--
  1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 4e9a7f5..be1953d 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -291,32 +291,24 @@ static BlockBackend *img_open(const char *id, const char 
*filename,
  {
  BlockBackend *blk;
  BlockDriverState *bs;
-BlockDriver *drv;
  char password[256];
  Error *local_err = NULL;
-int ret;
-
-blk = blk_new_with_bs(id, error_abort);
-bs = blk_bs(blk);
+QDict *options = NULL;
  
  if (fmt) {

-drv = bdrv_find_format(fmt);
-if (!drv) {
-error_report(Unknown file format '%s', fmt);
-goto fail;
-}
-} else {
-drv = NULL;
+options = qdict_new();
+qdict_put_obj(options, driver, QOBJECT(qstring_from_str(fmt)));

I'm only noticing it here, though you did the same in the previous
patches. This can be written shorter as:

 qdict_put(options, driver, qstring_from_str(fmt));


Indeed, will do.

Max


  }
  
-ret = bdrv_open(bs, filename, NULL, NULL, flags, drv, local_err);

-if (ret  0) {
+blk = blk_new_open(id, filename, NULL, options, flags, local_err);
+if (!blk) {
  error_report(Could not open '%s': %s, filename,
   error_get_pretty(local_err));
  error_free(local_err);
  goto fail;
  }
  
+bs = blk_bs(blk);

  if (bdrv_is_encrypted(bs)  require_io) {
  qprintf(quiet, Disk image '%s' is encrypted.\n, filename);
  if (read_password(password, sizeof(password))  0) {

Kevin





Re: [Qemu-devel] [PATCH v3 00/14] block: Remove growable, add blk_new_open()

2015-02-02 Thread Kevin Wolf
Am 26.01.2015 um 16:00 hat Max Reitz geschrieben:
 This series removes the growable field from the BlockDriverState
 object. Its use was to clamp guest requests against the limits of the
 BDS; however, this can now be done more easily by moving those checks
 into the BlockBackend functions.
 
 In a future series, growable may be reintroduced (maybe with a
 different name); it will then signify whether a BDS is able to grow (in
 contrast to the current growable, which signifies whether it is
 allowed to). Maybe I will add it to the BlockDriver instead of the BDS,
 though.
 
 To be able to remove that field, qemu-io needs to be converted to
 BlockBackend, which is done by this series as well. While working on
 that I decided to convert blk_new_with_bs()+bdrv_open() to
 blk_new_open(). I was skeptical about that decision at first, but it
 seems good now that I was able to replace nearly every blk_new_with_bs()
 call by blk_new_open(). In a future series I may try to convert some
 remaining bdrv_open() calls to blk_new_open() as well. (And, in fact, in
 a future series I *will* replace the last remaining blk_new_with_bs()
 outside of blk_new_open() by blk_new_open().)
 
 Finally, the question needs to be asked: If, after this series, every
 BDS is allowed to grow, are there any users which do not use BB, but
 should still be disallowed from reading/writing beyond a BDS's limits?
 The only users I could see were the block jobs. Some of them should
 indeed be converted to BB; but none of them takes a user-supplied offset
 or size, all work on the full BDS (or only on parts which have been
 modified, etc.). Therefore, it is by design impossible for them to
 exceed the BDS's limits, which makes making all BDS's growable safe.

Patches that I didn't comment on (all except 4, 5, 6, 9, 10, 13) are,
assuming that you address Eric's comments, if any:

Reviewed-by: Kevin Wolf kw...@redhat.com



Re: [Qemu-devel] [PATCH v3 09/14] qemu-io: Use blk_new_open() in openfile()

2015-02-02 Thread Max Reitz

On 2015-02-02 at 14:34, Kevin Wolf wrote:

Am 26.01.2015 um 16:00 hat Max Reitz geschrieben:

Signed-off-by: Max Reitz mre...@redhat.com
---
  qemu-io.c | 31 ---
  1 file changed, 12 insertions(+), 19 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index 91a445a..81f8f64 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -39,7 +39,6 @@ static ReadLineState *readline_state;
  static int close_f(BlockDriverState *bs, int argc, char **argv)
  {
  blk_unref(qemuio_blk);
-qemuio_bs = NULL;
  qemuio_blk = NULL;
  return 0;
  }

This doesn't seem to be correct at this point in the series. qemuio_bs
is still passed to qemuio_command(), which checks whether non-global
commands are allowed to run using init_check_command().


$ ./qemu-io -c close -c 'read 0 64k' test.qcow2
read failed: No medium found

Yep, will fix. Thanks for catching it!


At the end of the series, please remove qemuio_bs altogether. This
version of the series leaves it around without any reader.


Okay, will do.

Max



Re: [Qemu-devel] [PATCH 1/2] glusterfs: fix max_discard

2015-02-02 Thread Peter Lieven
Am 02.02.2015 um 19:29 schrieb Denis V. Lunev:
 qemu_gluster_co_discard calculates size to discard as follows
 size_t size = nb_sectors * BDRV_SECTOR_SIZE;
 ret = glfs_discard_async(s-fd, offset, size, gluster_finish_aiocb, acb);

 glfs_discard_async is declared as follows:
   int glfs_discard_async (glfs_fd_t *fd, off_t length, size_t lent,
   glfs_io_cbk fn, void *data) __THROW
 This is problematic on i686 as sizeof(size_t) == 4.

 Set bl_max_discard to SIZE_MAX  BDRV_SECTOR_BITS to avoid overflow
 on i386.

 Signed-off-by: Denis V. Lunev d...@openvz.org
 CC: Kevin Wolf kw...@redhat.com
 CC: Peter Lieven p...@kamp.de
 ---
  block/gluster.c | 11 +++
  1 file changed, 11 insertions(+)

 diff --git a/block/gluster.c b/block/gluster.c
 index 1eb3a8c..47bf92d 100644
 --- a/block/gluster.c
 +++ b/block/gluster.c
 @@ -622,6 +622,13 @@ out:
  return ret;
  }
  
 +static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)
 +{
 +#if SIZE_MAX == UINT_MAX
 +bs-bl.max_discard = SIZE_MAX  BDRV_SECTOR_BITS;
 +#endif

I would write:

bs-bl.max_discard = MIN(SIZE_MAX  BDRV_SECTOR_BITS, INT_MAX);

without the condition.


Peter



 +}
 +
  #ifdef CONFIG_GLUSTERFS_DISCARD
  static coroutine_fn int qemu_gluster_co_discard(BlockDriverState *bs,
  int64_t sector_num, int nb_sectors)
 @@ -735,6 +742,7 @@ static BlockDriver bdrv_gluster = {
  #ifdef CONFIG_GLUSTERFS_DISCARD
  .bdrv_co_discard  = qemu_gluster_co_discard,
  #endif
 +.bdrv_refresh_limits  = qemu_gluster_refresh_limits,
  #ifdef CONFIG_GLUSTERFS_ZEROFILL
  .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes,
  #endif
 @@ -762,6 +770,7 @@ static BlockDriver bdrv_gluster_tcp = {
  #ifdef CONFIG_GLUSTERFS_DISCARD
  .bdrv_co_discard  = qemu_gluster_co_discard,
  #endif
 +.bdrv_refresh_limits  = qemu_gluster_refresh_limits,
  #ifdef CONFIG_GLUSTERFS_ZEROFILL
  .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes,
  #endif
 @@ -789,6 +798,7 @@ static BlockDriver bdrv_gluster_unix = {
  #ifdef CONFIG_GLUSTERFS_DISCARD
  .bdrv_co_discard  = qemu_gluster_co_discard,
  #endif
 +.bdrv_refresh_limits  = qemu_gluster_refresh_limits,
  #ifdef CONFIG_GLUSTERFS_ZEROFILL
  .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes,
  #endif
 @@ -816,6 +826,7 @@ static BlockDriver bdrv_gluster_rdma = {
  #ifdef CONFIG_GLUSTERFS_DISCARD
  .bdrv_co_discard  = qemu_gluster_co_discard,
  #endif
 +.bdrv_refresh_limits  = qemu_gluster_refresh_limits,
  #ifdef CONFIG_GLUSTERFS_ZEROFILL
  .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes,
  #endif




[Qemu-devel] [PATCH 3/3] softfloat: expand out STATUS macro

2015-02-02 Thread Peter Maydell
Expand out and remove the STATUS macro.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 fpu/softfloat-specialize.h |  24 ++---
 fpu/softfloat.c| 239 ++---
 include/fpu/softfloat.h|  30 +++---
 3 files changed, 167 insertions(+), 126 deletions(-)

diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h
index 21d3ec2..b5b4c67 100644
--- a/fpu/softfloat-specialize.h
+++ b/fpu/softfloat-specialize.h
@@ -172,7 +172,7 @@ const float128 float128_default_nan
 
 void float_raise(int8 flags , float_status *status)
 {
-STATUS(float_exception_flags) |= flags;
+status-float_exception_flags |= flags;
 }
 
 /*
@@ -275,7 +275,7 @@ static float16 commonNaNToFloat16(commonNaNT a, 
float_status *status)
 {
 uint16_t mantissa = a.high54;
 
-if (STATUS(default_nan_mode)) {
+if (status-default_nan_mode) {
 return float16_default_nan;
 }
 
@@ -380,7 +380,7 @@ static float32 commonNaNToFloat32(commonNaNT a, 
float_status *status)
 {
 uint32_t mantissa = a.high41;
 
-if ( STATUS(default_nan_mode) ) {
+if (status-default_nan_mode) {
 return float32_default_nan;
 }
 
@@ -633,7 +633,7 @@ static float32 propagateFloat32NaN(float32 a, float32 b, 
float_status *status)
 float_raise(float_flag_invalid, status);
 }
 
-if ( STATUS(default_nan_mode) )
+if (status-default_nan_mode)
 return float32_default_nan;
 
 if ((uint32_t)(av1)  (uint32_t)(bv1)) {
@@ -684,7 +684,7 @@ static float32 propagateFloat32MulAddNaN(float32 a, float32 
b,
   bIsQuietNaN, bIsSignalingNaN,
   cIsQuietNaN, cIsSignalingNaN, infzero, status);
 
-if (STATUS(default_nan_mode)) {
+if (status-default_nan_mode) {
 /* Note that this check is after pickNaNMulAdd so that function
  * has an opportunity to set the Invalid flag.
  */
@@ -800,7 +800,7 @@ static float64 commonNaNToFloat64(commonNaNT a, 
float_status *status)
 {
 uint64_t mantissa = a.high12;
 
-if ( STATUS(default_nan_mode) ) {
+if (status-default_nan_mode) {
 return float64_default_nan;
 }
 
@@ -836,7 +836,7 @@ static float64 propagateFloat64NaN(float64 a, float64 b, 
float_status *status)
 float_raise(float_flag_invalid, status);
 }
 
-if ( STATUS(default_nan_mode) )
+if (status-default_nan_mode)
 return float64_default_nan;
 
 if ((uint64_t)(av1)  (uint64_t)(bv1)) {
@@ -887,7 +887,7 @@ static float64 propagateFloat64MulAddNaN(float64 a, float64 
b,
   bIsQuietNaN, bIsSignalingNaN,
   cIsQuietNaN, cIsSignalingNaN, infzero, status);
 
-if (STATUS(default_nan_mode)) {
+if (status-default_nan_mode) {
 /* Note that this check is after pickNaNMulAdd so that function
  * has an opportunity to set the Invalid flag.
  */
@@ -1019,7 +1019,7 @@ static floatx80 commonNaNToFloatx80(commonNaNT a, 
float_status *status)
 {
 floatx80 z;
 
-if ( STATUS(default_nan_mode) ) {
+if (status-default_nan_mode) {
 z.low = floatx80_default_nan_low;
 z.high = floatx80_default_nan_high;
 return z;
@@ -1057,7 +1057,7 @@ static floatx80 propagateFloatx80NaN(floatx80 a, floatx80 
b,
 float_raise(float_flag_invalid, status);
 }
 
-if ( STATUS(default_nan_mode) ) {
+if (status-default_nan_mode) {
 a.low = floatx80_default_nan_low;
 a.high = floatx80_default_nan_high;
 return a;
@@ -1176,7 +1176,7 @@ static float128 commonNaNToFloat128(commonNaNT a, 
float_status *status)
 {
 float128 z;
 
-if ( STATUS(default_nan_mode) ) {
+if (status-default_nan_mode) {
 z.low = float128_default_nan_low;
 z.high = float128_default_nan_high;
 return z;
@@ -1208,7 +1208,7 @@ static float128 propagateFloat128NaN(float128 a, float128 
b,
 float_raise(float_flag_invalid, status);
 }
 
-if ( STATUS(default_nan_mode) ) {
+if (status-default_nan_mode) {
 a.low = float128_default_nan_low;
 a.high = float128_default_nan_high;
 return a;
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index f56c618..f1170fe 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -151,7 +151,7 @@ static int32 roundAndPackInt32(flag zSign, uint64_t absZ, 
float_status *status)
 int8 roundIncrement, roundBits;
 int32_t z;
 
-roundingMode = STATUS(float_rounding_mode);
+roundingMode = status-float_rounding_mode;
 roundNearestEven = ( roundingMode == float_round_nearest_even );
 switch (roundingMode) {
 case float_round_nearest_even:
@@ -179,7 +179,9 @@ static int32 roundAndPackInt32(flag zSign, uint64_t absZ, 
float_status *status)
 float_raise(float_flag_invalid, status);
 return zSign ? (int32_t) 0x8000 : 0x7FFF;
 }
-if 

Re: [Qemu-devel] [PATCH 1/2] glusterfs: fix max_discard

2015-02-02 Thread Peter Lieven
Am 02.02.2015 um 21:09 schrieb Denis V. Lunev:
 qemu_gluster_co_discard calculates size to discard as follows
 size_t size = nb_sectors * BDRV_SECTOR_SIZE;
 ret = glfs_discard_async(s-fd, offset, size, gluster_finish_aiocb, acb);

 glfs_discard_async is declared as follows:
   int glfs_discard_async (glfs_fd_t *fd, off_t length, size_t lent,
   glfs_io_cbk fn, void *data) __THROW
 This is problematic on i686 as sizeof(size_t) == 4.

 Set bl_max_discard to SIZE_MAX  BDRV_SECTOR_BITS to avoid overflow
 on i386.

 Signed-off-by: Denis V. Lunev d...@openvz.org
 CC: Kevin Wolf kw...@redhat.com
 CC: Peter Lieven p...@kamp.de
 ---
  block/gluster.c | 9 +
  1 file changed, 9 insertions(+)

 diff --git a/block/gluster.c b/block/gluster.c
 index 1eb3a8c..8a8c153 100644
 --- a/block/gluster.c
 +++ b/block/gluster.c
 @@ -622,6 +622,11 @@ out:
  return ret;
  }
  
 +static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)
 +{
 +bs-bl.max_discard = MIN(SIZE_MAX  BDRV_SECTOR_BITS, INT_MAX);
 +}
 +

Looking at the gluster code bl.max_transfer_length should have the same limit, 
but thats a different patch.

  #ifdef CONFIG_GLUSTERFS_DISCARD
  static coroutine_fn int qemu_gluster_co_discard(BlockDriverState *bs,
  int64_t sector_num, int nb_sectors)
 @@ -735,6 +740,7 @@ static BlockDriver bdrv_gluster = {
  #ifdef CONFIG_GLUSTERFS_DISCARD
  .bdrv_co_discard  = qemu_gluster_co_discard,
  #endif
 +.bdrv_refresh_limits  = qemu_gluster_refresh_limits,
  #ifdef CONFIG_GLUSTERFS_ZEROFILL
  .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes,
  #endif
 @@ -762,6 +768,7 @@ static BlockDriver bdrv_gluster_tcp = {
  #ifdef CONFIG_GLUSTERFS_DISCARD
  .bdrv_co_discard  = qemu_gluster_co_discard,
  #endif
 +.bdrv_refresh_limits  = qemu_gluster_refresh_limits,
  #ifdef CONFIG_GLUSTERFS_ZEROFILL
  .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes,
  #endif
 @@ -789,6 +796,7 @@ static BlockDriver bdrv_gluster_unix = {
  #ifdef CONFIG_GLUSTERFS_DISCARD
  .bdrv_co_discard  = qemu_gluster_co_discard,
  #endif
 +.bdrv_refresh_limits  = qemu_gluster_refresh_limits,
  #ifdef CONFIG_GLUSTERFS_ZEROFILL
  .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes,
  #endif
 @@ -816,6 +824,7 @@ static BlockDriver bdrv_gluster_rdma = {
  #ifdef CONFIG_GLUSTERFS_DISCARD
  .bdrv_co_discard  = qemu_gluster_co_discard,
  #endif
 +.bdrv_refresh_limits  = qemu_gluster_refresh_limits,
  #ifdef CONFIG_GLUSTERFS_ZEROFILL
  .bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes,
  #endif

Reviewed-by: Peter Lieven p...@kamp.de



Re: [Qemu-devel] [PATCH 13/19] libqos/ahci: add ahci command size setters

2015-02-02 Thread John Snow



On 02/02/2015 05:35 AM, Paolo Bonzini wrote:



On 30/01/2015 19:42, John Snow wrote:

+void ahci_command_set_sizes(AHCICommand *cmd, uint64_t xbytes,
+unsigned prd_size)
+{
+/* Each PRD can describe up to 4MiB, and must not be odd. */
+g_assert_cmphex(prd_size, =, 4096 * 1024);
+g_assert_cmphex(prd_size  0x01, ==, 0x00);
+cmd-prd_size = prd_size;
+cmd-xbytes = xbytes;
+cmd-fis.count = cpu_to_le16(cmd-xbytes / AHCI_SECTOR_SIZE);


Why do you need cpu_to_le16 here, instead of having it in the function
that writes the command to guest memory?

Paolo


+cmd-header.prdtl = size_to_prdtl(cmd-xbytes, cmd-prd_size);
+}
+




In this case, only the command header had a utility written for it to 
flip the bits for me. This is part of the FIS, instead, which has no 
explicit flip-on-write mechanism inside of commit.


So, it's correct, but not terribly consistent.

I can write a fis write helper to make this more internally consistent 
about when we handle it for the user and when we don't.




Re: [Qemu-devel] [PATCH 0/3] softfloat: Remove STATUS macros

2015-02-02 Thread Richard Henderson
On 02/02/2015 12:31 PM, Peter Maydell wrote:
 Peter Maydell (3):
   softfloat: Expand out the STATUS_PARAM macro
   softfloat: expand out STATUS_VAR
   softfloat: expand out STATUS macro
 
  fpu/softfloat-specialize.h |  124 ++--
  fpu/softfloat.c| 1609 
 
  include/fpu/softfloat.h|  371 +-
  target-mips/msa_helper.c   |   74 +-
  4 files changed, 1188 insertions(+), 990 deletions(-)

Reviewed-by: Richard Henderson r...@twiddle.net

r~



Re: [Qemu-devel] [PATCH 1/3] softfloat: Expand out the STATUS_PARAM macro

2015-02-02 Thread Richard Henderson
On 02/02/2015 12:31 PM, Peter Maydell wrote:
 -void float_raise( int8 flags STATUS_PARAM )
 +void float_raise(int8 flags , float_status *status)

Extra space before comma.


r~



Re: [Qemu-devel] [PATCH v2 1/2] configure: Default to enable module build

2015-02-02 Thread Paolo Bonzini


On 13/01/2015 09:53, Fam Zheng wrote:
 We have module build support around for a while, but also had it bitrot
 several times. It probably makes sense to enable it by default so that
 people can notice and use it.
 
 Counterpart to --enable-modules, which is turned as default,
 --disable-modules is added to suppress it. If both are omitted, the
 support is guesses as usual.
 
 Signed-off-by: Fam Zheng f...@redhat.com

I squashed this to unbreak static compilation, but Peter reported that
module linking fails on ARM host:
  LINK  block/curl.so
/usr/bin/ld: block/curl.o: relocation R_ARM_THM_MOVW_ABS_NC against
`__stack_chk_guard' can not be used when making a shared object;
recompile with -fPIC
block/curl.o: could not read symbols: Bad value
collect2: error: ld returned 1 exit status

diff --git a/configure b/configure
index a9ae57a..4fae00a 100755
--- a/configure
+++ b/configure
@@ -1536,9 +1536,6 @@ if compile_prog -Werror -fno-gcse  ; then
 fi

 if test $static = yes ; then
-  if test $modules = yes ; then
-error_exit static and modules are mutually incompatible
-  fi
   if test $pie = yes ; then
 error_exit static and pie are mutually incompatible
   else
Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]? y
@@ -2763,6 +2760,14 @@ fi
 module_try_enable()
 {
   force=$1
+  if test $static = yes; then
+if $force; then
+  error_exit static and modules are mutually incompatible
+else
+  modules=no
+  return
+fi
+  fi
   shacmd_probe=sha1sum sha1 shasum
   for c in $shacmd_probe; do
 if has $c; then


Is the above ok?

Paolo



Re: [Qemu-devel] [PATCH 17/19] qtest/ahci: Add a macro bootup routine

2015-02-02 Thread John Snow



On 02/02/2015 05:37 AM, Paolo Bonzini wrote:



On 30/01/2015 19:42, John Snow wrote:

+/**
+ * Boot and fully enable the HBA device.
+ * @see ahci_boot, ahci_pci_enable and ahci_hba_enable.
+ */
+static AHCIQState *ahci_macro_bootup(void)


Ugly name...  I would just leave out this patch.

Paolo


+{
+AHCIQState *ahci;
+ahci = ahci_boot();
+
+ahci_pci_enable(ahci);
+ahci_hba_enable(ahci);
+
+return ahci;
+}




It comes in handy later for testing migration so I don't have to do a 
lot of boilerplate for each instance, though it is just a convenience 
subroutine with no logic of its own.


I like to cut down on boilerplate as much as possible to expose the 
logic of the test as much as possible.


Have a suggestion for a better name, or are you very adamant about 
culling it?


--js



Re: [Qemu-devel] [PATCH 04/19] libqos/ahci: Add command header helpers

2015-02-02 Thread John Snow



On 02/02/2015 05:25 AM, Paolo Bonzini wrote:



On 30/01/2015 19:41, John Snow wrote:

+/* Construct our Command Header (set_command_header handles endianness.) */
+memset(cmd, 0x00, sizeof(cmd));
+cmd.flags = 5; /* reg_h2d_fis is 5 double-words long */
+cmd.flags = 0x400; /* clear PxTFD.STS.BSY when done */


And here |= became = ?

Paolo


+cmd.prdtl = 1; /* One PRD table entry. */




A typo that testing didn't pick up on. (I wonder how?)
Thank you.



[Qemu-devel] [PATCH 2/3] iotests: Add wait functionality to _cleanup_qemu

2015-02-02 Thread Max Reitz
The qemu process does not always need to be killed, just waiting for it
can be fine, too. This introduces a way to do so.

Signed-off-by: Max Reitz mre...@redhat.com
---
 tests/qemu-iotests/common.qemu | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common.qemu b/tests/qemu-iotests/common.qemu
index 8e618b5..4e1996c 100644
--- a/tests/qemu-iotests/common.qemu
+++ b/tests/qemu-iotests/common.qemu
@@ -187,13 +187,23 @@ function _launch_qemu()
 
 
 # Silenty kills the QEMU process
+#
+# If $wait is set to anything other than the empty string, the process will not
+# be killed but only waited for, and any output will be forwarded to stdout. If
+# $wait is empty, the process will be killed and all output will be suppressed.
 function _cleanup_qemu()
 {
 # QEMU_PID[], QEMU_IN[], QEMU_OUT[] all use same indices
 for i in ${!QEMU_OUT[@]}
 do
-kill -KILL ${QEMU_PID[$i]} 2/dev/null
+if [ -z ${wait} ]; then
+kill -KILL ${QEMU_PID[$i]} 2/dev/null
+fi
 wait ${QEMU_PID[$i]} 2/dev/null # silent kill
+if [ -n ${wait} ]; then
+cat ${QEMU_OUT[$i]} | _filter_testdir | _filter_qemu \
+  | _filter_qemu_io | _filter_qmp
+fi
 rm -f ${QEMU_FIFO_IN}_${i} ${QEMU_FIFO_OUT}_${i}
 eval exec ${QEMU_IN[$i]}-   # close file descriptors
 eval exec ${QEMU_OUT[$i]}-
-- 
2.1.0




[Qemu-devel] [PATCH 3/3] iotests: Add test for drive-mirror with NBD target

2015-02-02 Thread Max Reitz
When the drive-mirror block job is completed, it will call bdrv_swap()
on the source and the target BDS; this should obviously not result in a
segmentation fault.

Signed-off-by: Max Reitz mre...@redhat.com
---
 tests/qemu-iotests/094 | 81 ++
 tests/qemu-iotests/094.out | 11 +++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 93 insertions(+)
 create mode 100755 tests/qemu-iotests/094
 create mode 100644 tests/qemu-iotests/094.out

diff --git a/tests/qemu-iotests/094 b/tests/qemu-iotests/094
new file mode 100755
index 000..27a2be2
--- /dev/null
+++ b/tests/qemu-iotests/094
@@ -0,0 +1,81 @@
+#!/bin/bash
+#
+# Test case for drive-mirror to NBD (especially bdrv_swap() on NBD BDS)
+#
+# Copyright (C) 2015 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see http://www.gnu.org/licenses/.
+#
+
+# creator
+owner=mre...@redhat.com
+
+seq=$(basename $0)
+echo QA output created by $seq
+
+here=$PWD
+tmp=/tmp/$$
+status=1   # failure is the default!
+
+trap exit \$status 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+_supported_fmt generic
+_supported_proto nbd
+_supported_os Linux
+_unsupported_imgopts subformat=monolithicFlat subformat=twoGbMaxExtentFlat
+
+_make_test_img 64M
+$QEMU_IMG create -f $IMGFMT $TEST_DIR/source.$IMGFMT 64M | _filter_img_create
+
+_launch_qemu -drive if=none,id=src,file=$TEST_DIR/source.$IMGFMT,format=raw \
+ -nodefaults
+
+_send_qemu_cmd $QEMU_HANDLE \
+{'execute': 'qmp_capabilities'} \
+'return'
+
+# 'format': 'nbd' is not actually correct, but this is probably the only way
+# to test bdrv_swap() on an NBD BDS
+_send_qemu_cmd $QEMU_HANDLE  \
+{'execute': 'drive-mirror',
+  'arguments': {'device': 'src',
+'target': '$TEST_IMG',
+'format': 'nbd',
+'sync':'full',
+'mode':'existing'}} \
+'BLOCK_JOB_READY'
+
+_send_qemu_cmd $QEMU_HANDLE  \
+{'execute': 'block-job-complete',
+  'arguments': {'device': 'src'}} \
+'BLOCK_JOB_COMPLETE'
+
+_send_qemu_cmd $QEMU_HANDLE \
+{'execute': 'quit'} \
+'return'
+
+wait=1 _cleanup_qemu
+
+_cleanup_test_img
+rm -f $TEST_DIR/source.$IMGFMT
+
+# success, all done
+echo '*** done'
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/094.out b/tests/qemu-iotests/094.out
new file mode 100644
index 000..b66dc07
--- /dev/null
+++ b/tests/qemu-iotests/094.out
@@ -0,0 +1,11 @@
+QA output created by 094
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+Formatting 'TEST_DIR/source.IMGFMT', fmt=IMGFMT size=67108864
+{return: {}}
+{return: {}}
+{timestamp: {seconds:  TIMESTAMP, microseconds:  TIMESTAMP}, event: 
BLOCK_JOB_READY, data: {device: src, len: 67108864, offset: 
67108864, speed: 0, type: mirror}}
+{return: {}}
+{timestamp: {seconds:  TIMESTAMP, microseconds:  TIMESTAMP}, event: 
BLOCK_JOB_COMPLETED, data: {device: src, len: 67108864, offset: 
67108864, speed: 0, type: mirror}}
+{return: {}}
+{timestamp: {seconds:  TIMESTAMP, microseconds:  TIMESTAMP}, event: 
SHUTDOWN}
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 4b2b93b..6e2447a 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -99,6 +99,7 @@
 090 rw auto quick
 091 rw auto
 092 rw auto quick
+094 rw auto quick
 095 rw auto quick
 097 rw auto backing
 098 rw auto backing quick
-- 
2.1.0




Re: [Qemu-devel] [PATCH v3 10/14] qemu-io: Remove growable option

2015-02-02 Thread Kevin Wolf
Am 27.01.2015 um 18:11 hat Max Reitz geschrieben:
 On 2015-01-27 at 12:10, Eric Blake wrote:
 On 01/27/2015 10:04 AM, Max Reitz wrote:
 On 2015-01-27 at 11:59, Eric Blake wrote:
 On 01/26/2015 08:00 AM, Max Reitz wrote:
 Remove growable option from the open command and from the qemu-io
 command line. qemu-io is about to be converted to BlockBackend which
 will make sure that no request exceeds the image size, so the only way
 to keep growable would be to use BlockBackend if it is not given and
 to directly access the BDS if it is.
 
 qemu-io is a debugging tool, therefore removing a rarely used option
 will have only a very small impact, if any. There was only one
 qemu-iotest which used the option; since it is not critical, this patch
 just removes it.
 
 Signed-off-by: Max Reitz mre...@redhat.com
 ---
 Reviewed-by: Eric Blake ebl...@redhat.com
 
 Do we want to ever reuse the test number that you are deleting?
 Good question, I think I have talked about that with Kevin before. It
 would not hurt too much if we were to accidentally reuse the test case
 number, most certainly not here in upstream.
 
 However, for all downstream versions of qemu, this might make adding the
 new test 16 difficult; but certainly not impossible (if someone is
 affected by this issue, he/she can just use 999 or something). So we may
 want to keep in mind not to reuse number 16, but if someone does, so be it.
 Is it worth a placeholder file that has a comment mentioning that the
 test number is intentionally reserved (and if someone attempts to run,
 always passes)?
 
 Seems good to me. It's a minor effort now and may avert some hassle later.

How about just keeping a comment line in group?

Kevin



[Qemu-devel] [PATCH 2/2] nbd: fix max_discard

2015-02-02 Thread Denis V. Lunev
nbd_co_discard calls nbd_client_session_co_discard which uses uint32_t
as the length in bytes of the data to discard due to the following
definition:

struct nbd_request {
uint32_t magic;
uint32_t type;
uint64_t handle;
uint64_t from;
uint32_t len; -- the length of data to be discarded, in bytes
} QEMU_PACKED;

Thus we should limit bl_max_discard to UINT32_MAX  BDRV_SECTOR_BITS to
avoid overflow.

Signed-off-by: Denis V. Lunev d...@openvz.org
Reviewed-by: Peter Lieven p...@kamp.de
CC: Kevin Wolf kw...@redhat.com
---
 block/nbd.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 04cc845..99af713 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -301,6 +301,11 @@ static int nbd_co_flush(BlockDriverState *bs)
 return nbd_client_session_co_flush(s-client);
 }
 
+static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+bs-bl.max_discard = UINT32_MAX  BDRV_SECTOR_BITS;
+}
+
 static int nbd_co_discard(BlockDriverState *bs, int64_t sector_num,
   int nb_sectors)
 {
@@ -396,6 +401,7 @@ static BlockDriver bdrv_nbd = {
 .bdrv_close = nbd_close,
 .bdrv_co_flush_to_os= nbd_co_flush,
 .bdrv_co_discard= nbd_co_discard,
+.bdrv_refresh_limits= nbd_refresh_limits,
 .bdrv_getlength = nbd_getlength,
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
@@ -413,6 +419,7 @@ static BlockDriver bdrv_nbd_tcp = {
 .bdrv_close = nbd_close,
 .bdrv_co_flush_to_os= nbd_co_flush,
 .bdrv_co_discard= nbd_co_discard,
+.bdrv_refresh_limits= nbd_refresh_limits,
 .bdrv_getlength = nbd_getlength,
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
@@ -430,6 +437,7 @@ static BlockDriver bdrv_nbd_unix = {
 .bdrv_close = nbd_close,
 .bdrv_co_flush_to_os= nbd_co_flush,
 .bdrv_co_discard= nbd_co_discard,
+.bdrv_refresh_limits= nbd_refresh_limits,
 .bdrv_getlength = nbd_getlength,
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
-- 
1.9.1




Re: [Qemu-devel] [RFC v2 8/8] vfio-pci: add VFIO_FEATURE_ENABLE_AER_CAP feature

2015-02-02 Thread Alex Williamson
On Wed, 2015-01-28 at 16:37 +0800, Chen Fan wrote:
 for old machine types, we should disable aercap feature.
 
 Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com
 ---
  hw/vfio/pci.c   | 13 ++---
  include/hw/compat.h |  4 
  2 files changed, 14 insertions(+), 3 deletions(-)
 
 diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
 index 65247ee..0d830e6 100644
 --- a/hw/vfio/pci.c
 +++ b/hw/vfio/pci.c
 @@ -138,6 +138,8 @@ typedef struct VFIOMSIXInfo {
  enum {
  #define VFIO_FEATURE_ENABLE_VGA_BIT 0
  VFIO_FEATURE_ENABLE_VGA = (1  VFIO_FEATURE_ENABLE_VGA_BIT),
 +#define VFIO_FEATURE_ENABLE_AER_CAP_BIT 1
 +VFIO_FEATURE_ENABLE_AER_CAP = (1  VFIO_FEATURE_ENABLE_AER_CAP_BIT),
  };
  
  typedef struct VFIOPCIDevice {
 @@ -2724,10 +2726,12 @@ static int vfio_add_ext_capabilities(VFIOPCIDevice 
 *vdev)
  while (header) {
  switch (PCI_EXT_CAP_ID(header)) {
  case PCI_EXT_CAP_ID_ERR:
 - exp = pdev-exp;
 - exp-aer_cap = next;
 + if (vdev-features  VFIO_FEATURE_ENABLE_AER_CAP) {
 + exp = pdev-exp;
 + exp-aer_cap = next;
  
 - vfio_pci_aer_init(vdev);
 + vfio_pci_aer_init(vdev);
 +  }
   break;
  };
  
 @@ -3498,6 +3502,9 @@ static Property vfio_pci_dev_properties[] = {
  DEFINE_PROP_BIT(x-vga, VFIOPCIDevice, features,
  VFIO_FEATURE_ENABLE_VGA_BIT, false),
  DEFINE_PROP_INT32(bootindex, VFIOPCIDevice, bootindex, -1),
 +DEFINE_PROP_BIT(aercap, VFIOPCIDevice, features,
 +VFIO_FEATURE_ENABLE_AER_CAP_BIT, true),
 +
  /*
   * TODO - support passed fds... is this necessary?
   * DEFINE_PROP_STRING(vfiofd, VFIOPCIDevice, vfiofd_name),
 diff --git a/include/hw/compat.h b/include/hw/compat.h
 index 313682a..72a2cdb 100644
 --- a/include/hw/compat.h
 +++ b/include/hw/compat.h
 @@ -30,6 +30,10 @@
  .driver   = virtio-pci,\
  .property = virtio-pci-bus-master-bug-migration,\
  .value= on,\
 +},{\
 +.driver   = vfio-pci,\
 +.property = aercap,\
 +.value= off,\

This will leave it enabled on both Q35 and 440FX afaict, so I'm not sure
what this fixes.  We don't care about migration compatibility with
vfio-pci.

  }
  
  #endif /* HW_COMPAT_H */






Re: [Qemu-devel] [RFC v2 4/8] vfio-pci: pass the aer error to guest

2015-02-02 Thread Alex Williamson
On Wed, 2015-01-28 at 16:37 +0800, Chen Fan wrote:
 when the vfio device encounters an uncorrectable error in host,
 the vfio_pci driver will signal the eventfd registered by this
 vfio device, the results in the qemu eventfd handler getting
 invoked.
 
 this patch is to pass the error to guest and have the guest driver
 recover from the error.
 
 Signed-off-by: Chen Fan chen.fan.f...@cn.fujitsu.com
 ---
  hw/vfio/pci.c | 34 --
  1 file changed, 28 insertions(+), 6 deletions(-)
 
 diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
 index 2072261..8c81bb3 100644
 --- a/hw/vfio/pci.c
 +++ b/hw/vfio/pci.c
 @@ -3141,18 +3141,40 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
  static void vfio_err_notifier_handler(void *opaque)
  {
  VFIOPCIDevice *vdev = opaque;
 +PCIDevice *dev = vdev-pdev;
 +PCIEAERMsg msg = {
 +.severity = 0,
 +.source_id = (pci_bus_num(dev-bus)  8) | dev-devfn,
 +};
  
  if (!event_notifier_test_and_clear(vdev-err_notifier)) {
  return;
  }
  
 +/* we should read the error details from the real hardware
 + * configuration spaces, here we only need to do is signaling
 + * to guest an uncorrectable error has occurred.
 + */
 +if (dev-exp.aer_cap) {
 +uint8_t *aer_cap = dev-config + dev-exp.aer_cap;
 +uint32_t uncor_status;
 +bool isfatal;
 +
 +uncor_status = vfio_pci_read_config(dev,
 +   dev-exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
 +
 +isfatal = uncor_status  pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
 +
 +msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
 + PCI_ERR_ROOT_CMD_NONFATAL_EN;
 +
 +pcie_aer_msg(dev, msg);
 +return;
 +}

What if the guest machine type is 440FX?  We've just killed the existing
vm_stop functionality for the majority of users.

 +
  /*
 - * TBD. Retrieve the error details and decide what action
 - * needs to be taken. One of the actions could be to pass
 - * the error to the guest and have the guest driver recover
 - * from the error. This requires that PCIe capabilities be
 - * exposed to the guest. For now, we just terminate the
 - * guest to contain the error.
 + * If the aer capability is not exposed to the guest. we just
 + * terminate the guest to contain the error.

Just because it's exposed doesn't mean the guest chipset allows access
to it, right?

   */
  
  error_report(%s(%04x:%02x:%02x.%x) Unrecoverable error detected.  






[Qemu-devel] [PATCH 0/3] softfloat: Remove STATUS macros

2015-02-02 Thread Peter Maydell
This patchset expands out all the uses of the softfloat STATUS_VAR,
STATUS_PARAM and STATUS macros and removes their definitions.

In my opinion these macros are basically just obfuscating the
passing around and use of a float_status pointer in the softfloat
code, and serve no useful purpose. I used to think that we should
retain them on the offchance we wanted to pull in new softfloat
upstream code, but this is wrong on three counts now:
 (1) there's never been any new softfloat upstream code to take
 (2) even if there were, we couldn't take it because of the license
 (3) these macros aren't even in the upstream version; they were
 added by Fabrice when he integrated softfloat into QEMU

We've never used whatever flexibility might in theory be
permitted by the indirection through this set of macros, so
this patchset just deletes them all.

The changes make an alarmingly large diffstat, but it's just
mechanical expansion plus fixing of coding style issues on the
affected lines. (Lots of style fixing because the softfloat
upstream coding is (a) a long way away from QEMU's and (b) bonkers.)

This obviously is likely to induce merge conflicts with any
other in-flight softfloat changes, for which I apologise in
advance.

[Next on the cleanup list: get rid of those 'int8/int32/int64'
types...]

Peter Maydell (3):
  softfloat: Expand out the STATUS_PARAM macro
  softfloat: expand out STATUS_VAR
  softfloat: expand out STATUS macro

 fpu/softfloat-specialize.h |  124 ++--
 fpu/softfloat.c| 1609 
 include/fpu/softfloat.h|  371 +-
 target-mips/msa_helper.c   |   74 +-
 4 files changed, 1188 insertions(+), 990 deletions(-)

-- 
1.9.1




Re: [Qemu-devel] [PATCH v2 01/11] cpu_ldst.h: Allow NB_MMU_MODES to be 7

2015-02-02 Thread Richard Henderson
On 01/29/2015 10:55 AM, Peter Maydell wrote:
 Support guest CPUs which need 7 MMU index values.
 Add a comment about what would be required to raise the limit
 further (trivial for 8, TCG backend rework for 9 or more).
 
 Signed-off-by: Peter Maydell peter.mayd...@linaro.org
 Reviewed-by: Greg Bellows greg.bell...@linaro.org
 ---
  include/exec/cpu_ldst.h | 28 +---
  1 file changed, 25 insertions(+), 3 deletions(-)

Reviewed-by: Richard Henderson r...@twiddle.net


r~




Re: [Qemu-devel] [v2][PATCH] libxl: add one machine property to support IGD GFX passthrough

2015-02-02 Thread Chen, Tiejun


On 2015/2/2 20:19, Wei Liu wrote:

On Mon, Feb 02, 2015 at 09:17:23AM +0800, Tiejun Chen wrote:

When we're working to support IGD GFX passthrough with qemu
upstream, instead of -gfx_passthru we'd like to make that
a machine option, -machine xxx,-igd-passthru=on. This need
to bring a change on tool side.

Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v2:

* Based on some discussions with Wei we'd like to keep both old
   option, -gfx_passthru, and new machine property option,
   -machine xxx,-igd-passthru=on at the same time but deprecate
   the old one. Then finally we remove the old one at that point
   that to give downstream (in this case, Xen) time to cope with the
   change.



My suggestion has one premise -- if upstream QEMU has already released
that -gfx_passthru option. If there is no old one (in upstream QEMU)
at all, then there is nothing to keep and deprecate.


Understood.




  tools/libxl/libxl_dm.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index c2b0487..8405f0b 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -701,6 +701,11 @@ static char ** 
libxl__build_device_model_args_new(libxl__gc *gc,


Note this function is upstream QEMU specfic.


Yeah.




  flexarray_append(dm_args, -net);
  flexarray_append(dm_args, none);
  }
+/*
+ * Although we already introduce 'igd-passthru', but we'd like
+ * to remove this until we give downstream time to cope with
+ * the change.
+ */
  if (libxl_defbool_val(b_info-u.hvm.gfx_passthru)) {
  flexarray_append(dm_args, -gfx_passthru);
  }


The comment contradicts what I know (or what I think I know). In our
last email exchange you said there was no -gfx_passthru in any version
of upstream QEMU.

So, shouldn't you just remove this `if' statement?


Right. So what about this?

libxl: add one machine property to support IGD GFX passthrough

When we're working to support IGD GFX passthrough with qemu
upstream, we'd like to introduce a machine option,
-machine xxx,igd-passthru=on, to enable/disable that feature.
And we also remove that old option, -gfx_passthru, just from
the case of LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN since actually
no any qemu stream version really need or use that.

Signed-off-by: Tiejun Chen tiejun.c...@intel.com

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index c2b0487..b888f19 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -701,9 +701,6 @@ static char ** 
libxl__build_device_model_args_new(libxl__gc *gc,

 flexarray_append(dm_args, -net);
 flexarray_append(dm_args, none);
 }
-if (libxl_defbool_val(b_info-u.hvm.gfx_passthru)) {
-flexarray_append(dm_args, -gfx_passthru);
-}
 } else {
 if (!sdl  !vnc) {
 flexarray_append(dm_args, -nographic);
@@ -748,6 +745,11 @@ static char ** 
libxl__build_device_model_args_new(libxl__gc *gc,

 machinearg, max_ram_below_4g);
 }
 }
+
+if (libxl_defbool_val(b_info-u.hvm.gfx_passthru)) {
+machinearg = libxl__sprintf(gc, %s,igd-passthru=on, 
machinearg);

+}
+
 flexarray_append(dm_args, machinearg);
 for (i = 0; b_info-extra_hvm  b_info-extra_hvm[i] != NULL; 
i++)

 flexarray_append(dm_args, b_info-extra_hvm[i]);

Thanks
Tiejun



Wei.


@@ -748,6 +753,11 @@ static char ** 
libxl__build_device_model_args_new(libxl__gc *gc,
  machinearg, max_ram_below_4g);
  }
  }
+
+if (libxl_defbool_val(b_info-u.hvm.gfx_passthru)) {
+machinearg = libxl__sprintf(gc, %s,igd-passthru=on, machinearg);
+}
+
  flexarray_append(dm_args, machinearg);
  for (i = 0; b_info-extra_hvm  b_info-extra_hvm[i] != NULL; i++)
  flexarray_append(dm_args, b_info-extra_hvm[i]);
--
1.9.1






Re: [Qemu-devel] [PATCH v4 07/18] spapr_iommu: Implement free_table() helper

2015-02-02 Thread Alexey Kardashevskiy
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 02/02/2015 05:37 PM, David Gibson wrote:
 On Thu, Jan 29, 2015 at 08:27:19PM +1100, Alexey Kardashevskiy wrote:
 Every sPAPRTCETable object holds an IOMMU memory region which holds 
 a referenced to the sPAPRTCETable instance. So if we want to free an
 sPAPRTCETable instance, calling object_unref() will not be enough as
 embedded memory region will hold the reference and we need to break 
 the loop.
 
 This adds a spapr_tce_free_table() helper which destroys the
 embedded memory region and then calls object_unref() on the
 sPAPRTCETable instance which will succeed now. The helper adds a new
 child under unique name derived from LIOBN.
 
 As we are here, fix spapr_tce_new_table() callers. At the moment
 spapr_tce_new_table() references sPAPRTCETable twice - once in
 object_new() and second time in object_property_add_child(). The
 callers of spapr_tce_new_table() should take care of correct 
 referencing.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 
 Reviewed-by: David Gibson da...@gibson.dropbear.id.au
 
 With the caveat that I don't know the QOM refcounting model well 
 enough to really analyze if this is correct.  It's not obviously 
 wrong, and doesn't look like it will break anything in the actual
 PAPR related code.

I am redoing this part to always keep sPAPRTCETable and not free them on
every reboot so you can ignore this type of problems now. Thanks!



- -- 
Alexey
-BEGIN PGP SIGNATURE-
Version: GnuPG v1

iQIcBAEBAgAGBQJU0CUvAAoJEIYTPdgrwSC5mpcP+wa3vulve++JX06UWbTsaarV
nXQhWsKHAFQZmQYzRzyUwl00JEIp14NZ9dzBXEaAIutQmiFDeNA7a3QCHpnkt2Ei
JJzzzK5ygjcX9H2RP+hzaoJkHxdFKMQ9TREKe86IsHYw34rREzfS/Li6PkTm+LSo
3jnnd1k0GGB9AX/hwvDHHcdQRpio2sI0lG7WH3R8FDbHAHo/YW3JvrPL3iTDpNDA
HYR9Hy4JynzQIGJhzYFkmODEjUbLtn/c/xIYVZJDNdDdLETF1sK+iTpbinxtPe5n
JsKGncsFmZGKotJ3ZAjPxogdYhqtvq5/VsUpc2jHlC4QKgEsVLqUm5cfu+hl5a00
w/4W9tB+lqbFFvTxNAOokYPnWaD+NJZ3DY03vsCKRk16/MrMjdfMpsj/m3ai6pAx
Pc/A2JfZ30SjqbFBLQJZKBDKQ9JARp7F/S60s7iW2D4ELRChdprcb/cE/gVGoWFV
DYKpPRtdzjNNtjMLNd6OuWVu3LItcfz42VQwxrwjyZOIIgpdpOM1hBBXqMjhOykW
gFtlf/zEDAxQ9cl0RrjH9gC1UwnP7w+slvspa5MNghnFKByHBOwUW9bNpDc2xcp2
9I6j1yjxI81Si96uGFj8Ba2LzBGaTzVIbiSsnV2cvZKirNAN/Gnm9lcgwOOMKA09
QAx4oiji73rE6GzBZxFx
=o1mo
-END PGP SIGNATURE-



Re: [Qemu-devel] [PATCH 4/6] vmport_rpc: Add QMP access to vmport_rpc object.

2015-02-02 Thread Don Slutz
On 01/30/15 17:32, Eric Blake wrote:
 On 01/30/2015 02:06 PM, Don Slutz wrote:
 This adds two new inject commands:

 inject-vmport-reboot
 inject-vmport-halt

 And three guest info commands:

 vmport-guestinfo-set
 vmport-guestinfo-get
 query-vmport-guestinfo

 More details in qmp-commands.hx

 Signed-off-by: Don Slutz dsl...@verizon.com
 ---
 
 +static int foreach_dynamic_VMPortRpc_device(FindVMPortRpcDeviceFunc *func,
 +void *arg)
 +{
 +VMPortRpcFind find = {
 +.func = func,
 +.arg = arg,
 +};
 +
 +/* Loop through all VMPortRpc devices that were spawened outside
 
 s/spawened/spawned/
 

Fixed.

 
  #ifdef VMPORT_RPC_DEBUG
  /*
   * Add helper function for traceing.  This routine will convert
 
 Pre-existing as of this patch, but while you are here:
 s/traceing/tracing/
 unless it occurred earlier in the series, in which case fix it there.
 

Fixed.

 
 +static void convert_local_rc(Error **errp, int rc)
 +{
 +switch (rc) {
 +case 0:
 +break;
 +case VMPORT_DEVICE_NOT_FOUND:
 +error_set(errp, QERR_DEVICE_NOT_FOUND, TYPE_VMPORT_RPC);
 +break;
 +case SEND_NOT_OPEN:
 +error_set(errp, ERROR_CLASS_GENERIC_ERROR, VMWare rpc not open);
 
 shorter as:
 error_setg(errp, VMWare rpc not open);
 
 and similar for all your other uses of ERROR_CLASS_GENERIC_ERROR
 

Fixed.


 +++ b/qapi-schema.json
 @@ -1271,6 +1271,101 @@
  { 'command': 'inject-nmi' }
  
  ##
 +# @inject-vmport-reboot:
 +#
 +# Injects a VMWare Tools reboot to the guest.
 +#
 +# Returns:  If successful, nothing
 +#
 +# Since:  2.3
 +#
 +##
 +{ 'command': 'inject-vmport-reboot' }
 +
 +##
 +# @inject-vmport-halt:
 +#
 +# Injects a VMWare Tools halt to the guest.
 +#
 +# Returns:  If successful, nothing
 +#
 +# Since:  2.3
 +#
 +##
 +{ 'command': 'inject-vmport-halt' }
 
 Why two commands?  Why not just 'inject-vmport-action' with a parameter
 that says whether the action is 'reboot', 'halt', or something else?
 

This is a corner case.  I will convert it to inject-vmport-action with
a parameter.


 +
 +##
 +# @vmport-guestinfo-set:
 +#
 +# Set a VMWare Tools guestinfo key to a value
 +#
 +# @key: the key to set
 +#
 +# @value: The data to set the key to
 +#
 +# @format: #optional value encoding (default 'utf8').
 +#  - base64: value must be base64 encoded text.  Its binary
 +#decoding gets set.
 +#Bug: invalid base64 is currently not rejected.
 
 We should fix the bug, rather than document it.
 

Well, this turns out to be complex.  If there is a bug it is in GLIB.
However the Glib reference manual refers to RFC 1421 and RFC 2045 and
MIME encoding.  Based on all that (which seems to match:

http://en.wikipedia.org/wiki/Base64

) MIME states that all characters outside the (base64) alphabet are
to be ignored.  Testing shows that g_base64_decode() does this.

The confusion is that most non-MIME uses reject a base64 string
that contain characters outside the alphabet.  I was just following
the other uses of base64 in this file.

DataFormat refers to RFC 3548, which has the info:

  Implementations MUST reject the encoding if it contains characters
   outside the base alphabet when interpreting base encoded data, unless
   the specification referring to this document explicitly states
   otherwise.  Such specifications may, as MIME does, instead state that
   characters outside the base encoding alphabet should simply be
   ignored when interpreting data (be liberal in what you accept).

So with GLIB going the MIME way, I do not think this is a QEMU bug
(you could consider this a GLIB bug, but the document I found says
that GLIB goes the MIME way and so does not reject anything).

Having now looked closer at this, I plan to drop this bug section
since I am happy with what GLIB is doing.


 +#Whitespace *is* invalid.
 +#  - utf8: value's UTF-8 encoding is written
 +#  - value itself is always Unicode regardless of format, like
 +#any other string.
 
 This was confusing to read - there are three bullets, so it looks like
 you meant to document three valid DataFormat enum values; but there are
 only two.  The comment about 'value' being supplied as valid JSON UTF8
 and only later decoded according to 'format' might belong better on
 'value', if at all.  On the other hand, I see you are just blindly
 copy-and-pasting from 'ringbuf-write'.


Yup.  Will just drop the - value...

 
 +#
 +# Returns: Nothing on success
 +#
 +# Since: 2.3
 +##
 +{ 'command': 'vmport-guestinfo-set',
 +  'data': {'key': 'str', 'value': 'str',
 +   '*format': 'DataFormat'} }
 +
 +##
 +# @vmport-guestinfo-get:
 +#
 +# Get a VMWare Tools guestinfo value for a key
 +#
 +# @key: the key to get
 +#
 +# @format: #optional data encoding (default 'utf8').
 +#  - base64: the value is returned in base64 encoding.
 +#  - utf8: the value is interpreted as UTF-8.
 +#Bug: can 

Re: [Qemu-devel] [v2][PATCH] libxl: add one machine property to support IGD GFX passthrough

2015-02-02 Thread Chen, Tiejun


On 2015/2/2 19:08, Ian Campbell wrote:

On Mon, 2015-02-02 at 09:17 +0800, Tiejun Chen wrote:

When we're working to support IGD GFX passthrough with qemu
upstream, instead of -gfx_passthru we'd like to make that
a machine option, -machine xxx,-igd-passthru=on. This need
to bring a change on tool side.



From which Qemu version is this new option accepted? What will a verison

of qemu prior to then do when presented with the new option?


Sorry, maybe I'm not describing this correctly.

Actually qemu upstream never own this option, '-gfx_passthru' at all. 
This just exists alone in qemu-xen-traditional. So here I'm trying to 
introduce a new stuff that doesn't clash anything in qemu upstream.


So I guess I should rephrase this as follows:

libxl: add one machine property to support IGD GFX passthrough

When we're working to support IGD GFX passthrough with qemu
upstream, we'd like to introduce a machine option,
-machine xxx,igd-passthru=on, to enable/disable that feature.
And we also remove that old option, -gfx_passthru, just from
the case of LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN since actually
no any qemu stream version really need or use that.



(nb: you have an extra '-' in the description I think)


Yeah, I will remove that.

Thanks
Tiejun





Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v2:

* Based on some discussions with Wei we'd like to keep both old
   option, -gfx_passthru, and new machine property option,
   -machine xxx,-igd-passthru=on at the same time but deprecate
   the old one. Then finally we remove the old one at that point
   that to give downstream (in this case, Xen) time to cope with the
   change.

  tools/libxl/libxl_dm.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index c2b0487..8405f0b 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -701,6 +701,11 @@ static char ** 
libxl__build_device_model_args_new(libxl__gc *gc,
  flexarray_append(dm_args, -net);
  flexarray_append(dm_args, none);
  }
+/*
+ * Although we already introduce 'igd-passthru', but we'd like
+ * to remove this until we give downstream time to cope with
+ * the change.
+ */
  if (libxl_defbool_val(b_info-u.hvm.gfx_passthru)) {
  flexarray_append(dm_args, -gfx_passthru);
  }
@@ -748,6 +753,11 @@ static char ** 
libxl__build_device_model_args_new(libxl__gc *gc,
  machinearg, max_ram_below_4g);
  }
  }
+
+if (libxl_defbool_val(b_info-u.hvm.gfx_passthru)) {
+machinearg = libxl__sprintf(gc, %s,igd-passthru=on, machinearg);
+}
+
  flexarray_append(dm_args, machinearg);
  for (i = 0; b_info-extra_hvm  b_info-extra_hvm[i] != NULL; i++)
  flexarray_append(dm_args, b_info-extra_hvm[i]);








Re: [Qemu-devel] [PATCH v2 1/2] configure: Default to enable module build

2015-02-02 Thread Fam Zheng
On Mon, 02/02 20:34, Paolo Bonzini wrote:
 
 
 On 13/01/2015 09:53, Fam Zheng wrote:
  We have module build support around for a while, but also had it bitrot
  several times. It probably makes sense to enable it by default so that
  people can notice and use it.
  
  Counterpart to --enable-modules, which is turned as default,
  --disable-modules is added to suppress it. If both are omitted, the
  support is guesses as usual.
  
  Signed-off-by: Fam Zheng f...@redhat.com
 
 I squashed this to unbreak static compilation, but Peter reported that
 module linking fails on ARM host:
   LINK  block/curl.so
 /usr/bin/ld: block/curl.o: relocation R_ARM_THM_MOVW_ABS_NC against
 `__stack_chk_guard' can not be used when making a shared object;
 recompile with -fPIC
 block/curl.o: could not read symbols: Bad value
 collect2: error: ld returned 1 exit status

I don't see how -fPIC is missed in ARM host :( Does the below patch fix this?

 
 diff --git a/configure b/configure
 index a9ae57a..4fae00a 100755
 --- a/configure
 +++ b/configure
 @@ -1536,9 +1536,6 @@ if compile_prog -Werror -fno-gcse  ; then
  fi
 
  if test $static = yes ; then
 -  if test $modules = yes ; then
 -error_exit static and modules are mutually incompatible
 -  fi
if test $pie = yes ; then
  error_exit static and pie are mutually incompatible
else
 Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]? y
 @@ -2763,6 +2760,14 @@ fi
  module_try_enable()
  {
force=$1
 +  if test $static = yes; then
 +if $force; then
 +  error_exit static and modules are mutually incompatible
 +else
 +  modules=no
 +  return
 +fi
 +  fi
shacmd_probe=sha1sum sha1 shasum
for c in $shacmd_probe; do
  if has $c; then
 
 
 Is the above ok?

Yes, it looks correct.

Fam



Re: [Qemu-devel] [PATCH 2/2] bootdevice: add check in restore_boot_order()

2015-02-02 Thread Gonglei
On 2015/2/2 17:37, Markus Armbruster wrote:

 Gonglei arei.gong...@huawei.com writes:
 
 On 2015/1/30 20:32, Markus Armbruster wrote:

 Gonglei arei.gong...@huawei.com writes:

 On 2015/1/30 20:01, Markus Armbruster wrote:

 Gonglei arei.gong...@huawei.com writes:

 On 2015/1/30 15:46, Markus Armbruster wrote:

 Gonglei arei.gong...@huawei.com writes:

 On 2015/1/30 0:03, Alexander Graf wrote:



 On 29.01.15 14:29, arei.gong...@huawei.com wrote:
 From: Gonglei arei.gong...@huawei.com

 If boot order is invaild or is set failed,
 exit qemu.

 Signed-off-by: Gonglei arei.gong...@huawei.com

 Do we really want to kill the machine only because the boot device
 string doesn't validate?


 Not all of the situation. If people want to change boot order by 
 qmp/hmp
 command, it just report an error, please see do_boot_set(). But
 if the boot
 order is set in qemu command line, it will exit qemu if the boot
 device string
 is invalidate, as this patch's situation, which follow the original
 processing
 way (commit ef3adf68).

 I think Alex isn't concerned about the monitor command, but what happens
 when boot order once is reset to order on system reset.

 -boot errors should have been detected during command line processing
 (strongly preferred) or initial startup (acceptable).  Detecting

 Yes, and it had done it just like that, please see main() of
 vl.c. So, actually
 it wouldn't fail in the check of restore_boot_order function's calling.
 The only possible fails will happen to call boot_set_handler(). Take
 x86 pc machine example, set_boot_dev() callback  may return errors.

 I don't like unreachable error messages.  If qemu_boot_set() can't fail
 in restore_boot_order(), then simply assert it doesn't fail, by passing
 error_abort.


 Sorry, I meant the validate_bootdevices() can't fail in 
 restore_boot_order(),
 but boot_set_handler(boot_set_opaque, boot_order, errp) may fail, such as
 set_boot_dev(). For example:
 x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 4096 -boot
 menu=on,order=nbcdep,once=c -monitor stdio -vnc :0
 QEMU 2.2.50 monitor - type 'help' for more information
 (qemu) system_reset
 (qemu) qemu-system-x86_64: Too many boot devices for PC

 The value of parameter order should be checked during command line
 processing (strongly preferred) or initial startup (acceptable) if at
 all possible.  Is it possible?

 Either 'once' option or 'order' option can take effect for -boot at
 the same time,
 that is say initial startup processing can check only one. Besides,
 the check is just for
 corresponding machine type, so command line processing also can't do it.
 
 I challenge your idea that we can't check this before the guest starts
 running.
 
 qemu_boot_set() can fail for two reasons:

There is a third reason that boot_set_handler is not null, but fails in really 
executing time.
You can see my above example about function set_boot_dev(), the handler of
pc machine.

 
 * validate_bootdevices() fails
 
   Should never happen, because we've called it in main() already,
   treating failure as fatal error.

Yes.

 

 * boot_set_handler is null
 
   MachineClass method init() may set this.  main() could *easily* test
   whether it did!  If it didn't, and -boot once is given, error out.
   Similar checks exist already, e.g. drive_check_orphaned(),
   net_check_clients().  They only warn, but that's detail.

I agree, just need to report the error message.

Regards,
-Gonglei




Re: [Qemu-devel] [v2][PATCH] libxl: add one machine property to support IGD GFX passthrough

2015-02-02 Thread Chen, Tiejun



On 2015/2/2 20:54, Ian Jackson wrote:

Wei Liu writes (Re: [v2][PATCH] libxl: add one machine property to support IGD GFX 
passthrough):

On Mon, Feb 02, 2015 at 09:17:23AM +0800, Tiejun Chen wrote:

When we're working to support IGD GFX passthrough with qemu
upstream, instead of -gfx_passthru we'd like to make that
a machine option, -machine xxx,-igd-passthru=on. This need
to bring a change on tool side.

...

My suggestion has one premise -- if upstream QEMU has already released
that -gfx_passthru option. If there is no old one (in upstream QEMU)
at all, then there is nothing to keep and deprecate.


I think the commit message of the xen.git commit should explain what
options are supported by which versions of qemu (including qemu
upstream's future plans).

That would provide (a) something which summarises the communication
etc. with qemu upstream and can be checked with them if necessary and
(b) something against which the libxl changes can be easily judged.



Sorry, looks I'm misleading this to everyone.

Here I picked my reply from another email:

Actually qemu upstream never own this option, '-gfx_passthru' at all. 
This just exists alone in qemu-xen-traditional. So here I'm trying to 
introduce a new stuff that doesn't clash anything in qemu upstream.


So I guess I should rephrase this as follows:

libxl: add one machine property to support IGD GFX passthrough

When we're working to support IGD GFX passthrough with qemu
upstream, we'd like to introduce a machine option,
-machine xxx,igd-passthru=on, to enable/disable that feature.
And we also remove that old option, -gfx_passthru, just from
the case of LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN since actually
no any qemu stream version really need or use that.

So is it good?

Thanks
Tiejun



Re: [Qemu-devel] [RFC PATCH v8 08/21] cpu: replay instructions sequence

2015-02-02 Thread Paolo Bonzini


On 02/02/2015 13:28, Pavel Dovgaluk wrote:
cpu-exception_index = EXCP_INTERRUPT;
next_tb = 0;
   +qemu_notify_event();
  
  Why is this needed?
 It is needed to wake up iothread in replay mode.
 Otherwise it waits for additional time and replay becomes too slow.

What event (something from a timerlist?) is ready, that has not been
notified to the iothread yet?  qemu_notify_event() should never be
needed in common case.  It's probably missing somewhere else.

Paolo



Re: [Qemu-devel] [RFC 09/10] cpu: remove exit_request global.

2015-02-02 Thread Paolo Bonzini


On 02/02/2015 14:12, Peter Maydell wrote:
  In particular, you can do it while you are under the big QEMU lock.
 
 ...but this is a signal handler, so we can't guarantee that the
 thread holds the big lock.
 
 ...and if we can iterate over CPU lists in signal handlers,
 the correct approach is probably to use the existing exit_request
 flag rather than adding another one. (Needs investigation to
 check and document semantics of that flag.)

I agree on both counts.

Paolo



Re: [Qemu-devel] [RFC PATCH v8 08/21] cpu: replay instructions sequence

2015-02-02 Thread Paolo Bonzini


On 02/02/2015 13:42, Pavel Dovgaluk wrote:
 From: Paolo Bonzini [mailto:pbonz...@redhat.com]
 On 02/02/2015 13:28, Pavel Dovgaluk wrote:
  cpu-exception_index = EXCP_INTERRUPT;
  next_tb = 0;
 +qemu_notify_event();

 Why is this needed?
 It is needed to wake up iothread in replay mode.
 Otherwise it waits for additional time and replay becomes too slow.

 What event (something from a timerlist?) is ready, that has not been
 notified to the iothread yet?  qemu_notify_event() should never be
 needed in common case.  It's probably missing somewhere else.
 
 I think in this case there are no events at all - just reading timers values
 that were made while recording.
 We have to replay these reads by waking iothread.

I think the right place for this is in replay_read_next_clock then.

Paolo



Re: [Qemu-devel] [PATCH 7/7] block/raw-posix: set max_write_zeroes to INT_MAX for regular files

2015-02-02 Thread Kevin Wolf
Am 30.01.2015 um 09:42 hat Denis V. Lunev geschrieben:
 fallocate() works fine and could handle properly with arbitrary size
 requests. There is no sense to reduce the amount of space to fallocate.
 The bigger is the size, the better is the performance as the amount of
 journal updates is reduced.
 
 The patch changes behavior for both generic filesystem and XFS codepaths,
 which are different in handle_aiocb_write_zeroes. The implementation
 of fallocate and xfsctl(XFS_IOC_ZERO_RANGE) for XFS are exactly the same
 thus the change is fine for both ways.
 
 Signed-off-by: Denis V. Lunev d...@openvz.org
 Reviewed-by: Max Reitz mre...@redhat.com
 CC: Kevin Wolf kw...@redhat.com
 CC: Stefan Hajnoczi stefa...@redhat.com
 CC: Peter Lieven p...@kamp.de
 CC: Fam Zheng f...@redhat.com
 ---
  block/raw-posix.c | 17 +
  1 file changed, 17 insertions(+)
 
 diff --git a/block/raw-posix.c b/block/raw-posix.c
 index 7b42f37..933c778 100644
 --- a/block/raw-posix.c
 +++ b/block/raw-posix.c
 @@ -293,6 +293,20 @@ static void raw_probe_alignment(BlockDriverState *bs, 
 int fd, Error **errp)
  }
  }
  
 +static void raw_probe_max_write_zeroes(BlockDriverState *bs)
 +{
 +BDRVRawState *s = bs-opaque;
 +struct stat st;
 +
 +if (fstat(s-fd, st)  0) {
 +return; /* no problem, keep default value */
 +}
 +if (!S_ISREG(st.st_mode) || !s-discard_zeroes) {
 +return;
 +}
 +bs-bl.max_write_zeroes = INT_MAX;
 +}

Peter, do you remember why INT_MAX isn't actually the default? I think
the most reasonable behaviour would be that a limitation is only used if
a block driver requests it, and otherwise unlimited is assumed.

We can take this patch to raw-posix, it is certainly not wrong. But any
format driver or filter will still, in most cases needlessly, apply
MAX_WRITE_ZEROES_DEFAULT, i.e. a 16 MB maximum, so I think we should
consider making a change to the default.

Kevin



[Qemu-devel] [PATCH v4 0/5] this series adds the long missing multiread support to virtio-blk.

2015-02-02 Thread Peter Lieven
some remarks:
 - i introduced rd_merged and wr_merged block accounting stats to
   blockstats as a generic interface which can be set from any
   driver that will introduce multirequest merging in the future.
 - the old multiwrite interface is still there and might be removed.

v3-v4:
 - added a patch to disable request merging in the virtio-blk device
 - changed some static function names [Max]
 - removed _identifiers [Max]
 - removed to +1 in the check for max iovectors merged [Max, Kevin]
 - changed type  0xff to type  ~(VIRTIO_BLK_T_OUT | VIRTIO_BLK_T_BARRIER) 
[Max]
 - initialize qiov before bounds check [Max]
 - added an assertion to avoid overflowing mrb-reqs array [Max]
 - added comments about the usage of qiov (internal vs. external) [Max]

v2-v3:
 - completely reworked Patch 4 as it turned out that depending on the
   kernel even sequential requests fly in out of order or we receive
   a mix from different software queues.

v1-v2:
 - add overflow checking for nb_sectors [Kevin]
 - do not change the name of the macro of max mergable requests. [Fam]

RFC v2-v1:
 - added Erics grammar corrections to Patch 4
 - Patch 4
  - reworked the IF tree to a SWITCH statement to make it easier
to understand in virtio_blk_handle_request
  - stop merging if requests switch from read to write or vice
versa. e..g, otherwise we could introduce big delays as a small
read request could be followed by a lot of write requests
and the read requests sits there until the queue is empty.

RFC v1-RFC v2:
 - completed Patch 1 by the example in qmp-commands.hx [Eric]
 - used bool for merge in Patch 4 [Max]
 - fixed a few typos in the commit msg of Patch 4 [Max]
 - do not start merging and directly pass req[0]-qiov in case of num_reqs == 1
 - avoid allocating memory for the multireq [Kevin]
 - do not import block_int.h and add appropiate iface to block-backend [Kevin]
 - removed debug output and added trace event for multireq [Kevin]
 - fixed alloc hint for the merge qiov [Kevin]
 - currently did not split virtio_submit_multireq into rw code since
   the redundant code would now be much bigger part than in the original patch.
 - added a merge_qiov to VirtioBlockRequest. Abusing the qiov was not possible
   because it is initialized externally with guest memory [Kevin]
 - added a pointer to VirtioBlockRequest to create a linked list
   of VirtioBlockBlockRequests. This list is used to complete all
   requests belonging to a multireq [Kevin]

Peter Lieven (5):
  block: add accounting for merged requests
  hw/virtio-blk: add a constant for max number of merged requests
  block-backend: expose bs-bl.max_transfer_length
  virtio-blk: introduce multiread
  virtio-blk: add a knob to disable request merging

 block.c |   2 +
 block/accounting.c  |   7 +
 block/block-backend.c   |   5 +
 block/qapi.c|   2 +
 hmp.c   |   6 +-
 hw/block/dataplane/virtio-blk.c |   8 +-
 hw/block/virtio-blk.c   | 299 +++-
 include/block/accounting.h  |   3 +
 include/hw/virtio/virtio-blk.h  |  18 ++-
 include/sysemu/block-backend.h  |   1 +
 qapi/block-core.json|   9 +-
 qmp-commands.hx |  22 ++-
 trace-events|   1 +
 13 files changed, 273 insertions(+), 110 deletions(-)

-- 
1.9.1




Re: [Qemu-devel] [PATCH 7/7] block/raw-posix: set max_write_zeroes to INT_MAX for regular files

2015-02-02 Thread Peter Lieven

Am 02.02.2015 um 15:16 schrieb Kevin Wolf:

Am 02.02.2015 um 15:12 hat Peter Lieven geschrieben:

Am 02.02.2015 um 15:04 schrieb Kevin Wolf:

Am 02.02.2015 um 14:55 hat Peter Lieven geschrieben:

Am 02.02.2015 um 14:23 schrieb Kevin Wolf:

Am 30.01.2015 um 09:42 hat Denis V. Lunev geschrieben:

fallocate() works fine and could handle properly with arbitrary size
requests. There is no sense to reduce the amount of space to fallocate.
The bigger is the size, the better is the performance as the amount of
journal updates is reduced.

The patch changes behavior for both generic filesystem and XFS codepaths,
which are different in handle_aiocb_write_zeroes. The implementation
of fallocate and xfsctl(XFS_IOC_ZERO_RANGE) for XFS are exactly the same
thus the change is fine for both ways.

Signed-off-by: Denis V. Lunev d...@openvz.org
Reviewed-by: Max Reitz mre...@redhat.com
CC: Kevin Wolf kw...@redhat.com
CC: Stefan Hajnoczi stefa...@redhat.com
CC: Peter Lieven p...@kamp.de
CC: Fam Zheng f...@redhat.com
---
  block/raw-posix.c | 17 +
  1 file changed, 17 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 7b42f37..933c778 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -293,6 +293,20 @@ static void raw_probe_alignment(BlockDriverState *bs, int 
fd, Error **errp)
  }
  }
+static void raw_probe_max_write_zeroes(BlockDriverState *bs)
+{
+BDRVRawState *s = bs-opaque;
+struct stat st;
+
+if (fstat(s-fd, st)  0) {
+return; /* no problem, keep default value */
+}
+if (!S_ISREG(st.st_mode) || !s-discard_zeroes) {
+return;
+}
+bs-bl.max_write_zeroes = INT_MAX;
+}

Peter, do you remember why INT_MAX isn't actually the default? I think
the most reasonable behaviour would be that a limitation is only used if
a block driver requests it, and otherwise unlimited is assumed.

The default (0) actually means unlimited or undefined. We introduced
that limit of 16MB in bdrv_co_write_zeroes to create only reasonable
sized requests because there is no guarantee that write zeroes is a
fast operation. We should set INT_MAX only if we know that write
zeroes of an arbitrary size is always fast.

Well, splitting it up doesn't make it any faster. I think we can assume
that drv-bdrv_co_write_zeroes() wants to know the full request size
unless the driver has explicitly set bs-bl.max_write_zeroes.

You mean sth like this:

Yes, I think that's what I meant.


I can't find the original discussion why we added this limit. It was actually 
the default
before we introduced BlockLimits. And, it was also the default in the 
unsupported path
of write zeroes which created big memory allocations. This might be the reason 
why
we introduced a limit.

Peter



Kevin


diff --git a/block.c b/block.c
index 61412e9..8272ef9 100644
--- a/block.c
+++ b/block.c
@@ -3192,10 +3192,7 @@ int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState 
*bs,
  BDRV_REQ_COPY_ON_READ);
  }

-/* if no limit is specified in the BlockLimits use a default
- * of 32768 512-byte sectors (16 MiB) per request.
- */
-#define MAX_WRITE_ZEROES_DEFAULT 32768
+#define MAX_WRITE_ZEROES_BOUNCE_BUFFER 32768

  static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
  int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
@@ -3206,7 +3203,7 @@ static int coroutine_fn 
bdrv_co_do_write_zeroes(BlockDriverState *bs,
  int ret = 0;

  int max_write_zeroes = bs-bl.max_write_zeroes ?
-   bs-bl.max_write_zeroes : MAX_WRITE_ZEROES_DEFAULT;
+   bs-bl.max_write_zeroes : INT_MAX;

  while (nb_sectors  0  !ret) {
  int num = nb_sectors;
@@ -3242,7 +3239,7 @@ static int coroutine_fn 
bdrv_co_do_write_zeroes(BlockDriverState *bs,
  if (ret == -ENOTSUP) {
  /* Fall back to bounce buffer if write zeroes is unsupported */
  int max_xfer_len = MIN_NON_ZERO(bs-bl.max_transfer_length,
- MAX_WRITE_ZEROES_DEFAULT);
+ MAX_WRITE_ZEROES_BOUNCE_BUFFER);
  num = MIN(num, max_xfer_len);
  iov.iov_len = num * BDRV_SECTOR_SIZE;
  if (iov.iov_base == NULL) {
@@ -5099,11 +5096,6 @@ static void coroutine_fn bdrv_discard_co_entry(void 
*opaque)
  rwco-ret = bdrv_co_discard(rwco-bs, rwco-sector_num, rwco-nb_sectors);
  }

-/* if no limit is specified in the BlockLimits use a default
- * of 32768 512-byte sectors (16 MiB) per request.
- */
-#define MAX_DISCARD_DEFAULT 32768
-
  int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
   int nb_sectors)
  {
@@ -5128,7 +5120,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
int64_t sector_num,
  return 0;
  }

-max_discard = bs-bl.max_discard ?  bs-bl.max_discard : 
MAX_DISCARD_DEFAULT;
+max_discard = bs-bl.max_discard ?  bs-bl.max_discard : INT_MAX;
  while (nb_sectors  0) {
  int ret;
  int num = 

Re: [Qemu-devel] [PATCH] vl.c: fix -usb option assertion failure in qemu_opt_get_bool_helper()

2015-02-02 Thread Marcel Apfelbaum

On 01/31/2015 11:23 AM, Jan Kiszka wrote:

On 2015-01-05 12:37, Jan Kiszka wrote:

On 2015-01-05 12:22, Stefan Hajnoczi wrote:

Commit 49d2e648e8087d154d8bf8b91f27c8e05e79d5a6 (machine: remove
qemu_machine_opts global list) removed option descriptions from the
-machine QemuOptsList to avoid repeating MachineState's QOM properties.

This change broke vl.c:usb_enabled() because qemu_opt_get_bool() cannot
be used on QemuOptsList without option descriptions since QemuOpts
doesn't know the type and therefore left an unparsed string value.

This patch avoids calling qemu_opt_get_bool() to fix the assertion
failure:

   $ qemu-system-x86_64 -usb
   qemu_opt_get_bool_helper: Assertion `opt-desc  opt-desc-type == 
QEMU_OPT_BOOL' failed.

Test the presence of -usb using qemu_opt_find() but use the
MachineState-usb field instead of qemu_opt_get_bool().

Cc: Marcel Apfelbaum marce...@redhat.com
Cc: Tiejun Chen tiejun.c...@intel.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
  vl.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/vl.c b/vl.c
index bea9656..6e8889c 100644
--- a/vl.c
+++ b/vl.c
@@ -999,8 +999,11 @@ static int parse_name(QemuOpts *opts, void *opaque)

  bool usb_enabled(bool default_usb)
  {
-return qemu_opt_get_bool(qemu_get_machine_opts(), usb,
- has_defaults  default_usb);
+if (qemu_opt_find(qemu_get_machine_opts(), usb)) {
+return current_machine-usb;
+} else {
+return has_defaults  default_usb;
+}
  }

  #ifndef _WIN32



That still leaves the other boolean machine options broken. A generic
fix would be good. Or revert the original commit until we have one.


Just pulled current master, and at least the iommu machine option is
still triggering an abort.

What is the status of fixing these fallouts? Was anything else addressed
by now, just this one forgotten?

No, is not forgotten. I waited to see that the approach to fix this issue
is accepted by the reviewers.

Thanks for the reminder, patches will be submitted soon.
Marcel




Jan







[Qemu-devel] [v4 12/13] migration: Add command to set migration parameter

2015-02-02 Thread Liang Li
Add the qmp and hmp commands to tune the parameters used in live
migration.

Signed-off-by: Liang Li liang.z...@intel.com
Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 hmp-commands.hx   | 15 ++
 hmp.c | 35 ++
 hmp.h |  3 ++
 include/migration/migration.h |  4 +--
 migration/migration.c | 69 +++
 monitor.c | 18 +++
 qapi-schema.json  | 52 
 qmp-commands.hx   | 25 
 8 files changed, 206 insertions(+), 15 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index e37bc8b..535b5ba 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -985,6 +985,21 @@ Enable/Disable the usage of a capability @var{capability} 
for migration.
 ETEXI
 
 {
+.name   = migrate_set_parameter,
+.args_type  = parameter:s,value:i,
+.params = parameter value,
+.help   = Set the parameter for migration,
+.mhandler.cmd = hmp_migrate_set_parameter,
+.command_completion = migrate_set_parameter_completion,
+},
+
+STEXI
+@item migrate_set_parameter @var{parameter} @var{value}
+@findex migrate_set_parameter
+Set the parameter @var{parameter} for migration.
+ETEXI
+
+{
 .name   = client_migrate_info,
 .args_type  = 
protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?,
 .params = protocol hostname port tls-port cert-subject,
diff --git a/hmp.c b/hmp.c
index 481be80..faab4b0 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1134,6 +1134,41 @@ void hmp_migrate_set_capability(Monitor *mon, const 
QDict *qdict)
 }
 }
 
+void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
+{
+const char *param = qdict_get_str(qdict, parameter);
+int value = qdict_get_int(qdict, value);
+Error *err = NULL;
+MigrationParameterStatusList *params = g_malloc0(sizeof(*params));
+MigrationParameterInt *data;
+int i;
+
+for (i = 0; i  MIGRATION_PARAMETER_MAX; i++) {
+if (strcmp(param, MigrationParameter_lookup[i]) == 0) {
+params-value = g_malloc0(sizeof(*params-value));
+params-value-kind = i;
+params-value-data = g_malloc0(sizeof(MigrationParameterInt));
+data = (MigrationParameterInt *)params-value-data;
+data-value = value;
+params-next = NULL;
+qmp_migrate_set_parameters(params, err);
+break;
+}
+}
+
+if (i == MIGRATION_PARAMETER_MAX) {
+error_set(err, QERR_INVALID_PARAMETER, param);
+}
+
+qapi_free_MigrationParameterStatusList(params);
+
+if (err) {
+monitor_printf(mon, migrate_set_parameter: %s\n,
+   error_get_pretty(err));
+error_free(err);
+}
+}
+
 void hmp_set_password(Monitor *mon, const QDict *qdict)
 {
 const char *protocol  = qdict_get_str(qdict, protocol);
diff --git a/hmp.h b/hmp.h
index 4bb5dca..429efea 100644
--- a/hmp.h
+++ b/hmp.h
@@ -63,6 +63,7 @@ void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict);
+void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
@@ -111,6 +112,8 @@ void watchdog_action_completion(ReadLineState *rs, int 
nb_args,
 const char *str);
 void migrate_set_capability_completion(ReadLineState *rs, int nb_args,
const char *str);
+void migrate_set_parameter_completion(ReadLineState *rs, int nb_args,
+  const char *str);
 void host_net_add_completion(ReadLineState *rs, int nb_args, const char *str);
 void host_net_remove_completion(ReadLineState *rs, int nb_args,
 const char *str);
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 0c4f21c..8e09b42 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -50,9 +50,7 @@ struct MigrationState
 QEMUBH *cleanup_bh;
 QEMUFile *file;
 QemuThread *compress_thread;
-int compress_thread_count;
-int decompress_thread_count;
-int compress_level;
+int parameters[MIGRATION_PARAMETER_MAX];
 
 int state;
 MigrationParams params;
diff --git a/migration/migration.c b/migration/migration.c
index cbbd455..b5055dc 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -66,9 +66,12 @@ MigrationState *migrate_get_current(void)
 .bandwidth_limit = MAX_THROTTLE,
 .xbzrle_cache_size = 

Re: [Qemu-devel] [PATCH v2 1/1] qemu-img: Add QEMU_PKGVERSION to QEMU_IMG_VERSION

2015-02-02 Thread Kevin Wolf
Am 09.01.2015 um 16:17 hat Don Slutz geschrieben:
 This is the same way vl.c handles this.
 
 Signed-off-by: Don Slutz dsl...@verizon.com

Thanks, applied to the block branch.

Kevin



Re: [Qemu-devel] [PATCH 7/7] block/raw-posix: set max_write_zeroes to INT_MAX for regular files

2015-02-02 Thread Peter Lieven

Am 02.02.2015 um 14:23 schrieb Kevin Wolf:

Am 30.01.2015 um 09:42 hat Denis V. Lunev geschrieben:

fallocate() works fine and could handle properly with arbitrary size
requests. There is no sense to reduce the amount of space to fallocate.
The bigger is the size, the better is the performance as the amount of
journal updates is reduced.

The patch changes behavior for both generic filesystem and XFS codepaths,
which are different in handle_aiocb_write_zeroes. The implementation
of fallocate and xfsctl(XFS_IOC_ZERO_RANGE) for XFS are exactly the same
thus the change is fine for both ways.

Signed-off-by: Denis V. Lunev d...@openvz.org
Reviewed-by: Max Reitz mre...@redhat.com
CC: Kevin Wolf kw...@redhat.com
CC: Stefan Hajnoczi stefa...@redhat.com
CC: Peter Lieven p...@kamp.de
CC: Fam Zheng f...@redhat.com
---
  block/raw-posix.c | 17 +
  1 file changed, 17 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 7b42f37..933c778 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -293,6 +293,20 @@ static void raw_probe_alignment(BlockDriverState *bs, int 
fd, Error **errp)
  }
  }
  
+static void raw_probe_max_write_zeroes(BlockDriverState *bs)

+{
+BDRVRawState *s = bs-opaque;
+struct stat st;
+
+if (fstat(s-fd, st)  0) {
+return; /* no problem, keep default value */
+}
+if (!S_ISREG(st.st_mode) || !s-discard_zeroes) {
+return;
+}
+bs-bl.max_write_zeroes = INT_MAX;
+}

Peter, do you remember why INT_MAX isn't actually the default? I think
the most reasonable behaviour would be that a limitation is only used if
a block driver requests it, and otherwise unlimited is assumed.


The default (0) actually means unlimited or undefined. We introduced
that limit of 16MB in bdrv_co_write_zeroes to create only reasonable
sized requests because there is no guarantee that write zeroes is a
fast operation. We should set INT_MAX only if we know that write
zeroes of an arbitrary size is always fast.

Peter




[Qemu-devel] [PATCH v4 1/5] block: add accounting for merged requests

2015-02-02 Thread Peter Lieven
Signed-off-by: Peter Lieven p...@kamp.de
Reviewed-by: Eric Blake ebl...@redhat.com
Reviewed-by: Max Reitz mre...@redhat.com
---
 block.c|  2 ++
 block/accounting.c |  7 +++
 block/qapi.c   |  2 ++
 hmp.c  |  6 +-
 include/block/accounting.h |  3 +++
 qapi/block-core.json   |  9 -
 qmp-commands.hx| 22 ++
 7 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index d45e4dd..61412e9 100644
--- a/block.c
+++ b/block.c
@@ -4562,6 +4562,8 @@ static int multiwrite_merge(BlockDriverState *bs, 
BlockRequest *reqs,
 }
 }
 
+block_acct_merge_done(bs-stats, BLOCK_ACCT_WRITE, num_reqs - outidx - 1);
+
 return outidx + 1;
 }
 
diff --git a/block/accounting.c b/block/accounting.c
index 18102f0..01d594f 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -54,3 +54,10 @@ void block_acct_highest_sector(BlockAcctStats *stats, 
int64_t sector_num,
 stats-wr_highest_sector = sector_num + nb_sectors - 1;
 }
 }
+
+void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type,
+  int num_requests)
+{
+assert(type  BLOCK_MAX_IOTYPE);
+stats-merged[type] += num_requests;
+}
diff --git a/block/qapi.c b/block/qapi.c
index 75c388e..d1a8917 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -335,6 +335,8 @@ static BlockStats *bdrv_query_stats(const BlockDriverState 
*bs,
 s-stats-wr_bytes = bs-stats.nr_bytes[BLOCK_ACCT_WRITE];
 s-stats-rd_operations = bs-stats.nr_ops[BLOCK_ACCT_READ];
 s-stats-wr_operations = bs-stats.nr_ops[BLOCK_ACCT_WRITE];
+s-stats-rd_merged = bs-stats.merged[BLOCK_ACCT_READ];
+s-stats-wr_merged = bs-stats.merged[BLOCK_ACCT_WRITE];
 s-stats-wr_highest_offset =
 bs-stats.wr_highest_sector * BDRV_SECTOR_SIZE;
 s-stats-flush_operations = bs-stats.nr_ops[BLOCK_ACCT_FLUSH];
diff --git a/hmp.c b/hmp.c
index 481be80..5a10154 100644
--- a/hmp.c
+++ b/hmp.c
@@ -474,6 +474,8 @@ void hmp_info_blockstats(Monitor *mon, const QDict *qdict)
 wr_total_time_ns=% PRId64
 rd_total_time_ns=% PRId64
 flush_total_time_ns=% PRId64
+rd_merged=% PRId64
+wr_merged=% PRId64
\n,
stats-value-stats-rd_bytes,
stats-value-stats-wr_bytes,
@@ -482,7 +484,9 @@ void hmp_info_blockstats(Monitor *mon, const QDict *qdict)
stats-value-stats-flush_operations,
stats-value-stats-wr_total_time_ns,
stats-value-stats-rd_total_time_ns,
-   stats-value-stats-flush_total_time_ns);
+   stats-value-stats-flush_total_time_ns,
+   stats-value-stats-rd_merged,
+   stats-value-stats-wr_merged);
 }
 
 qapi_free_BlockStatsList(stats_list);
diff --git a/include/block/accounting.h b/include/block/accounting.h
index 50b42b3..4c406cf 100644
--- a/include/block/accounting.h
+++ b/include/block/accounting.h
@@ -39,6 +39,7 @@ typedef struct BlockAcctStats {
 uint64_t nr_bytes[BLOCK_MAX_IOTYPE];
 uint64_t nr_ops[BLOCK_MAX_IOTYPE];
 uint64_t total_time_ns[BLOCK_MAX_IOTYPE];
+uint64_t merged[BLOCK_MAX_IOTYPE];
 uint64_t wr_highest_sector;
 } BlockAcctStats;
 
@@ -53,5 +54,7 @@ void block_acct_start(BlockAcctStats *stats, BlockAcctCookie 
*cookie,
 void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie);
 void block_acct_highest_sector(BlockAcctStats *stats, int64_t sector_num,
unsigned int nb_sectors);
+void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type,
+   int num_requests);
 
 #endif
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 80984d1..b7d9772 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -407,13 +407,20 @@
 # growable sparse files (like qcow2) that are used on top
 # of a physical device.
 #
+# @rd_merged: Number of read requests that have been merged into another
+# request (Since 2.3).
+#
+# @wr_merged: Number of write requests that have been merged into another
+# request (Since 2.3).
+#
 # Since: 0.14.0
 ##
 { 'type': 'BlockDeviceStats',
   'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'rd_operations': 'int',
'wr_operations': 'int', 'flush_operations': 'int',
'flush_total_time_ns': 'int', 'wr_total_time_ns': 'int',
-   'rd_total_time_ns': 'int', 'wr_highest_offset': 'int' } }
+   'rd_total_time_ns': 'int', 'wr_highest_offset': 'int',
+   'rd_merged': 'int', 'wr_merged': 'int' } }
 
 ##
 # @BlockStats:
diff --git a/qmp-commands.hx b/qmp-commands.hx
index c5f16dd..af3fd19 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx

Re: [Qemu-devel] [PATCH] qed: Really remove unused field QEDAIOCB.finished

2015-02-02 Thread Kevin Wolf
Am 28.01.2015 um 02:51 hat Fam Zheng geschrieben:
 The commit 533ffb17a that removed qed_aiocb_info.cancel said to remove
 this but didn't do it.
 
 Signed-off-by: Fam Zheng f...@redhat.com

Thanks, applied to the block branch.

Kevin



[Qemu-devel] [PATCH v4 5/5] virtio-blk: add a knob to disable request merging

2015-02-02 Thread Peter Lieven
this adds a knob to disable request merging for debugging or benchmarks if 
dedired.

Signed-off-by: Peter Lieven p...@kamp.de
---
 hw/block/virtio-blk.c  | 5 -
 include/hw/virtio/virtio-blk.h | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index d0a01a8..8c51a29 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -531,7 +531,8 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, 
MultiReqBuffer *mrb)
 /* merge would exceed maximum number of requests or IO direction
  * changes */
 if (mrb-num_reqs  0  (mrb-num_reqs == VIRTIO_BLK_MAX_MERGE_REQS ||
-  is_write != mrb-is_write)) {
+  is_write != mrb-is_write ||
+  !req-dev-conf.request_merging)) {
 virtio_blk_submit_multireq(req-dev-blk, mrb);
 }
 
@@ -950,6 +951,8 @@ static Property virtio_blk_properties[] = {
 #ifdef __linux__
 DEFINE_PROP_BIT(scsi, VirtIOBlock, conf.scsi, 0, true),
 #endif
+DEFINE_PROP_BIT(request-merging, VirtIOBlock, conf.request_merging, 0,
+true),
 DEFINE_PROP_BIT(x-data-plane, VirtIOBlock, conf.data_plane, 0, false),
 DEFINE_PROP_END_OF_LIST(),
 };
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 2027a64..fc7d311 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -113,6 +113,7 @@ struct VirtIOBlkConf
 uint32_t scsi;
 uint32_t config_wce;
 uint32_t data_plane;
+uint32_t request_merging;
 };
 
 struct VirtIOBlockDataPlane;
-- 
1.9.1




Re: [Qemu-devel] [PATCH 7/7] block/raw-posix: set max_write_zeroes to INT_MAX for regular files

2015-02-02 Thread Kevin Wolf
Am 02.02.2015 um 14:55 hat Peter Lieven geschrieben:
 Am 02.02.2015 um 14:23 schrieb Kevin Wolf:
 Am 30.01.2015 um 09:42 hat Denis V. Lunev geschrieben:
 fallocate() works fine and could handle properly with arbitrary size
 requests. There is no sense to reduce the amount of space to fallocate.
 The bigger is the size, the better is the performance as the amount of
 journal updates is reduced.
 
 The patch changes behavior for both generic filesystem and XFS codepaths,
 which are different in handle_aiocb_write_zeroes. The implementation
 of fallocate and xfsctl(XFS_IOC_ZERO_RANGE) for XFS are exactly the same
 thus the change is fine for both ways.
 
 Signed-off-by: Denis V. Lunev d...@openvz.org
 Reviewed-by: Max Reitz mre...@redhat.com
 CC: Kevin Wolf kw...@redhat.com
 CC: Stefan Hajnoczi stefa...@redhat.com
 CC: Peter Lieven p...@kamp.de
 CC: Fam Zheng f...@redhat.com
 ---
   block/raw-posix.c | 17 +
   1 file changed, 17 insertions(+)
 
 diff --git a/block/raw-posix.c b/block/raw-posix.c
 index 7b42f37..933c778 100644
 --- a/block/raw-posix.c
 +++ b/block/raw-posix.c
 @@ -293,6 +293,20 @@ static void raw_probe_alignment(BlockDriverState *bs, 
 int fd, Error **errp)
   }
   }
 +static void raw_probe_max_write_zeroes(BlockDriverState *bs)
 +{
 +BDRVRawState *s = bs-opaque;
 +struct stat st;
 +
 +if (fstat(s-fd, st)  0) {
 +return; /* no problem, keep default value */
 +}
 +if (!S_ISREG(st.st_mode) || !s-discard_zeroes) {
 +return;
 +}
 +bs-bl.max_write_zeroes = INT_MAX;
 +}
 Peter, do you remember why INT_MAX isn't actually the default? I think
 the most reasonable behaviour would be that a limitation is only used if
 a block driver requests it, and otherwise unlimited is assumed.
 
 The default (0) actually means unlimited or undefined. We introduced
 that limit of 16MB in bdrv_co_write_zeroes to create only reasonable
 sized requests because there is no guarantee that write zeroes is a
 fast operation. We should set INT_MAX only if we know that write
 zeroes of an arbitrary size is always fast.

Well, splitting it up doesn't make it any faster. I think we can assume
that drv-bdrv_co_write_zeroes() wants to know the full request size
unless the driver has explicitly set bs-bl.max_write_zeroes.

Only if we go on emulating the operation with a zero-filled buffer, I
understand that we might need to split it up so that our bounce buffer
doesn't become huge.

Kevin



Re: [Qemu-devel] [RFC PATCH v8 08/21] cpu: replay instructions sequence

2015-02-02 Thread Pavel Dovgaluk
 From: Paolo Bonzini [mailto:pbonz...@redhat.com]
 On 02/02/2015 13:28, Pavel Dovgaluk wrote:
 cpu-exception_index = EXCP_INTERRUPT;
 next_tb = 0;
+qemu_notify_event();
  
   Why is this needed?
  It is needed to wake up iothread in replay mode.
  Otherwise it waits for additional time and replay becomes too slow.
 
 What event (something from a timerlist?) is ready, that has not been
 notified to the iothread yet?  qemu_notify_event() should never be
 needed in common case.  It's probably missing somewhere else.

I think in this case there are no events at all - just reading timers values
that were made while recording.
We have to replay these reads by waking iothread.

Pavel Dovgalyuk




Re: [Qemu-devel] [PATCH v6 0/7] eliminate data write in bdrv_write_zeroes on Linux in raw-posix.c

2015-02-02 Thread Kevin Wolf
Am 30.01.2015 um 09:42 hat Denis V. Lunev geschrieben:
 I have performed several tests with non-aligned fallocate calls and
 in all cases (with non-aligned fallocates) Linux performs fine, i.e.
 areas are zeroed correctly. Checks were made on
 Linux 3.16.0-28-generic #38-Ubuntu SMP
 
 This should seriously increase performance of bdrv_write_zeroes

Thanks, applied all to the block branch.

Note that I'm hoping to remove patch 7/7 from the queue again if we come
to the conclusion that changing the global default instead makes more
sense.

Kevin



Re: [Qemu-devel] [PATCH v2 1/4] qemu-io: initialize progname with error_set_progname()

2015-02-02 Thread Hitoshi Mitake
At Mon, 2 Feb 2015 12:36:00 +0100,
Kevin Wolf wrote:
 
 Am 02.02.2015 um 10:51 hat Hitoshi Mitake geschrieben:
  At Thu, 22 Jan 2015 18:08:11 +0900,
  Hitoshi Mitake wrote:
   
   Calling error_get_progname() in the context of qemu-io can cause
   segmentation fault because qemu-io doesn't initialize its progname
   with error_set_progname(). This patch adds the initialization.
   
   Currently, the missing call of error_set_progname() doesn't cause any
   problems because qemu-io doesn't use error_get_progname(). This patch
   is a proactive action.
   
   Cc: Kevin Wolf kw...@redhat.com
   Cc: Stefan Hajnoczi stefa...@redhat.com
   Cc: Markus Armbruster arm...@redhat.com
   Signed-off-by: Hitoshi Mitake mitake.hito...@lab.ntt.co.jp
   ---
qemu-io.c | 1 +
1 file changed, 1 insertion(+)
  
  Hi Kevin, Stefan, could you pick this patch? sheepdog driver has a
  pending patch [1] which depends on it. I believe at least qemu-io
  should call error_set_progname() because block drivers can use the
  qemu-error infrastructure.
 
 So are you planning to get the individual patches merged on their own
 instead of the whole series as one?

I thought this patchset can be proactive fix. So I'm separating the
patch for sheepdog driver from this one.

 
  [1] 
  https://github.com/sheepdog/qemu/commit/a95c35e606a2a189e7dbaf645277c5c306b01e4b
 
 That patch looks wrong. Nobody guarantees that qemu-io is really called
 qemu-io. The user can name their binaries as they want and create
 symlinks with any name, and indeed names for qemu-img used by some
 distributions include 'qemu-img-kvm' and 'kvm-img'. You need to find a
 different way than checking binary file names.

I couldn't consider about the case, thanks for pointing! I'll seek
other method for distinguishing binary files.

Thanks,
Hitoshi

 
 Kevin



Re: [Qemu-devel] [PATCH RFC v6 18/20] virtio: support revision-specific features

2015-02-02 Thread Cornelia Huck
On Sun, 1 Feb 2015 23:29:20 +0200
Michael S. Tsirkin m...@redhat.com wrote:

 On Fri, Jan 30, 2015 at 03:08:08PM +0100, Cornelia Huck wrote:
  On Wed, 7 Jan 2015 21:10:07 +0200
  Michael S. Tsirkin m...@redhat.com wrote:
  
   On Wed, Jan 07, 2015 at 05:22:32PM +0100, Cornelia Huck wrote:
On Sun, 28 Dec 2014 10:32:06 +0200
Michael S. Tsirkin m...@redhat.com wrote:

 On Thu, Dec 11, 2014 at 02:25:20PM +0100, Cornelia Huck wrote:
  Devices may support different sets of feature bits depending on 
  which
  revision they're operating at. Let's give the transport a way to
  re-query the device about its features when the revision has been
  changed.
  
  Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
 
 So now we have both get_features and get_features_rev, and
 it's never clear which revision does host_features refer to.
 IMHO that's just too messy.
 Let's add get_legacy_features and host_legacy_features instead?

I wanted to avoid touching anything that does not support version 1.
And this interface might still work for later revisions, no?
   
   We can add _modern_ then, or rename host_features to host_legacy_features
   everywhere as preparation.
   
  
  OK, I've ditched the don't modify old stuff goal and introduced
  -get_features_legacy(). For now, devices will add VERSION_1 in their
  -get_features() callback when they support it. (For many devices, this
  will be the only difference between the two callbacks.)
  
  Two sets of host_features don't make much sense to me.
 
 It's the most natural implementation given that
 some features are only set for legacy and some (will be) only
 for modern interfaces.

We may be talking about different things: There's the host_features
field which is referenced by the different device types, and there are
the sets of features for legacy or modern devices. These are currently
set dynamically in the -get_features() callbacks, although we could
possibly convert them to pre-set values like in the kernel. Then I
agree, two sets would be natural. But I'd rather stick with the current
method, if only to avoid churn.




Re: [Qemu-devel] [PATCH 7/7] block/raw-posix: set max_write_zeroes to INT_MAX for regular files

2015-02-02 Thread Peter Lieven

Am 02.02.2015 um 15:04 schrieb Kevin Wolf:

Am 02.02.2015 um 14:55 hat Peter Lieven geschrieben:

Am 02.02.2015 um 14:23 schrieb Kevin Wolf:

Am 30.01.2015 um 09:42 hat Denis V. Lunev geschrieben:

fallocate() works fine and could handle properly with arbitrary size
requests. There is no sense to reduce the amount of space to fallocate.
The bigger is the size, the better is the performance as the amount of
journal updates is reduced.

The patch changes behavior for both generic filesystem and XFS codepaths,
which are different in handle_aiocb_write_zeroes. The implementation
of fallocate and xfsctl(XFS_IOC_ZERO_RANGE) for XFS are exactly the same
thus the change is fine for both ways.

Signed-off-by: Denis V. Lunev d...@openvz.org
Reviewed-by: Max Reitz mre...@redhat.com
CC: Kevin Wolf kw...@redhat.com
CC: Stefan Hajnoczi stefa...@redhat.com
CC: Peter Lieven p...@kamp.de
CC: Fam Zheng f...@redhat.com
---
  block/raw-posix.c | 17 +
  1 file changed, 17 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 7b42f37..933c778 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -293,6 +293,20 @@ static void raw_probe_alignment(BlockDriverState *bs, int 
fd, Error **errp)
  }
  }
+static void raw_probe_max_write_zeroes(BlockDriverState *bs)
+{
+BDRVRawState *s = bs-opaque;
+struct stat st;
+
+if (fstat(s-fd, st)  0) {
+return; /* no problem, keep default value */
+}
+if (!S_ISREG(st.st_mode) || !s-discard_zeroes) {
+return;
+}
+bs-bl.max_write_zeroes = INT_MAX;
+}

Peter, do you remember why INT_MAX isn't actually the default? I think
the most reasonable behaviour would be that a limitation is only used if
a block driver requests it, and otherwise unlimited is assumed.

The default (0) actually means unlimited or undefined. We introduced
that limit of 16MB in bdrv_co_write_zeroes to create only reasonable
sized requests because there is no guarantee that write zeroes is a
fast operation. We should set INT_MAX only if we know that write
zeroes of an arbitrary size is always fast.

Well, splitting it up doesn't make it any faster. I think we can assume
that drv-bdrv_co_write_zeroes() wants to know the full request size
unless the driver has explicitly set bs-bl.max_write_zeroes.


You mean sth like this:

diff --git a/block.c b/block.c
index 61412e9..8272ef9 100644
--- a/block.c
+++ b/block.c
@@ -3192,10 +3192,7 @@ int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState 
*bs,
 BDRV_REQ_COPY_ON_READ);
 }

-/* if no limit is specified in the BlockLimits use a default
- * of 32768 512-byte sectors (16 MiB) per request.
- */
-#define MAX_WRITE_ZEROES_DEFAULT 32768
+#define MAX_WRITE_ZEROES_BOUNCE_BUFFER 32768

 static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
@@ -3206,7 +3203,7 @@ static int coroutine_fn 
bdrv_co_do_write_zeroes(BlockDriverState *bs,
 int ret = 0;

 int max_write_zeroes = bs-bl.max_write_zeroes ?
-   bs-bl.max_write_zeroes : MAX_WRITE_ZEROES_DEFAULT;
+   bs-bl.max_write_zeroes : INT_MAX;

 while (nb_sectors  0  !ret) {
 int num = nb_sectors;
@@ -3242,7 +3239,7 @@ static int coroutine_fn 
bdrv_co_do_write_zeroes(BlockDriverState *bs,
 if (ret == -ENOTSUP) {
 /* Fall back to bounce buffer if write zeroes is unsupported */
 int max_xfer_len = MIN_NON_ZERO(bs-bl.max_transfer_length,
- MAX_WRITE_ZEROES_DEFAULT);
+ MAX_WRITE_ZEROES_BOUNCE_BUFFER);
 num = MIN(num, max_xfer_len);
 iov.iov_len = num * BDRV_SECTOR_SIZE;
 if (iov.iov_base == NULL) {
@@ -5099,11 +5096,6 @@ static void coroutine_fn bdrv_discard_co_entry(void 
*opaque)
 rwco-ret = bdrv_co_discard(rwco-bs, rwco-sector_num, rwco-nb_sectors);
 }

-/* if no limit is specified in the BlockLimits use a default
- * of 32768 512-byte sectors (16 MiB) per request.
- */
-#define MAX_DISCARD_DEFAULT 32768
-
 int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
  int nb_sectors)
 {
@@ -5128,7 +5120,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
int64_t sector_num,
 return 0;
 }

-max_discard = bs-bl.max_discard ?  bs-bl.max_discard : 
MAX_DISCARD_DEFAULT;
+max_discard = bs-bl.max_discard ?  bs-bl.max_discard : INT_MAX;
 while (nb_sectors  0) {
 int ret;
 int num = nb_sectors;



Peter



Re: [Qemu-devel] [PATCH V11 2/3] i386: Add a Virtual Machine Generation ID device

2015-02-02 Thread Igor Mammedov
On Sun, 01 Feb 2015 14:56:26 +0200
Gal Hammer gham...@redhat.com wrote:

 On 22/01/2015 15:52, Igor Mammedov wrote:
  On Tue, 16 Dec 2014 17:50:43 +0200
  Gal Hammer gham...@redhat.com wrote:
 
  Based on Microsoft's sepecifications (paper can be dowloaded from
  http://go.microsoft.com/fwlink/?LinkId=260709), add a device
  description to the SSDT ACPI table and its implementation.
 
  The GUID is set using a global vmgenid.uuid parameter.
 
  Signed-off-by: Gal Hammer gham...@redhat.com
 
 
  --- a/hw/i386/acpi-build.c
  +++ b/hw/i386/acpi-build.c
  @@ -257,6 +257,7 @@ static void acpi_get_pci_info(PcPciInfo *info)
#define ACPI_BUILD_TABLE_FILE etc/acpi/tables
#define ACPI_BUILD_RSDP_FILE etc/acpi/rsdp
#define ACPI_BUILD_TPMLOG_FILE etc/tpm/log
  +#define ACPI_BUILD_VMGENID_FILE etc/vm-generation-id
 
static void
build_header(GArray *linker, GArray *table_data,
  @@ -1068,6 +1069,8 @@ build_ssdt(GArray *table_data, GArray *linker,
{
MachineState *machine = MACHINE(qdev_get_machine());
uint32_t nr_mem = machine-ram_slots;
  +uint32_t vm_gid_physical_address;
  +uint32_t vm_gid_offset = 0;
unsigned acpi_cpus = guest_info-apic_id_limit;
int ssdt_start = table_data-len;
uint8_t *ssdt_ptr;
  @@ -1096,6 +1099,21 @@ build_ssdt(GArray *table_data, GArray *linker,
ACPI_BUILD_SET_LE(ssdt_ptr, sizeof(ssdp_misc_aml),
  ssdt_isa_pest[0], 16, misc-pvpanic_port);
 
  +if (vm_generation_id_set()) {
  +vm_gid_physical_address = ssdt_start + ssdt_acpi_vm_gid_addr[0];
  +bios_linker_loader_alloc(linker, ACPI_BUILD_VMGENID_FILE, 8, 
  true);
  +bios_linker_loader_add_pointer(linker, ACPI_BUILD_VMGENID_FILE,
  +   ACPI_BUILD_TABLE_FILE,
  +   table_data,
  +   vm_gid_offset,
  +   sizeof(vm_gid_offset));
  could some explain how this pointer magic works,
 
 I can try, but don't you think that a magic is gone once explained? ;-)
 
   From my weak understanding it seems broken.
  Lets see:
 
[1] vm_gid_offset - must be pointer inside of dest_file blob 
  (ACPI_BUILD_VMGENID_FILE)
[2] vm_gid_offset - should hold offset of the place inside of src_file
   (ACPI_BUILD_TABLE_FILE) where to pointer inside of 
  dest_file should point to
 
 The vm_gid_offset should point where in the ACPI_BUILD_VMGENID_FILE the 
 VM's GUID is stored. At the moment, it should always be zero because the 
 GUID is stored at the begging of the ACPI_BUILD_VMGENID_FILE.
 
 
  now:
 vm_gid_physical_address - holds [2] i.e. offset of VGIA constant in 
  inside SSDT in ACPI_BUILD_TABLE_FILE.
 
  +ACPI_BUILD_SET_LE(ssdt_ptr, sizeof(ssdp_misc_aml),
  +  ssdt_acpi_vm_gid_addr[0], 32, 
  vm_gid_physical_address);
  Then we write this offset into VGIA in ACPI_BUILD_TABLE_FILE.
 
 Yes. This offset is later patched by the linker to the full physical 
 address.
 
  After BIOS loads tables it's going to patch at
[3] ACPI_BUILD_VMGENID_FILE + (vm_gid_offset - table_data-data) /* only 
  god knows where it will be/
 
  and on top of it write in it value:
*(ACPI_BUILD_TABLE_FILE +  *[3])
 
 We know exactly where it is, no need to call for god's help :-).
 
  This approach in general of patching arbitrary place in AML blob
  to get PHY addr of buffer with UUID, is quite a hack, especially
  in light of that we are trying to hide all direct access to AML
  blobs with related pointer arithmetic and manual patching.
 
  Why not reserve some potion of RAM and pass to BIOS/guest
  a reservation so it won't be part of AddressRangeMemory or
  AddressRangeACPI as MS spec requires? Then you won't need
  jump all above hoops to just get buffer's PHY addr.
 
 I'll be glad to hear a new idea that I didn't already try in one of 
 other previous patches. The problem is that the specification requires 
 working with a physical address, so it must be allocated from inside the 
 guest. Since the OS is not exist in this stage and I also don't want to 
 write a special driver just to allocate this buffer I had to choose this 
 approach.
how about creating device which will map 4K MMIO region in PCI hole
address space and passing it as a reservation via e820 table we have in QEMU.
Then address could be directly built in ACPI tables as constant value
at the time of ACPI tables creation.

That way it would be possible to get address of buffer without
firmware + guest OS doing anything and going through quite complex
chain for getting buffer address (qemu-bios-OSPM-qemu).
If you go current route, it would be needed to teach linker a new command
to make reservation in E820 so that allocated buffer won't be part of
of AddressRangeMemory as required by spec or anything else.
Which would make already hard to understand/use correctly linker API
even more complex.



Re: [Qemu-devel] [v2][PATCH] libxl: add one machine property to support IGD GFX passthrough

2015-02-02 Thread Ian Jackson
Wei Liu writes (Re: [v2][PATCH] libxl: add one machine property to support IGD 
GFX passthrough):
 On Mon, Feb 02, 2015 at 09:17:23AM +0800, Tiejun Chen wrote:
  When we're working to support IGD GFX passthrough with qemu
  upstream, instead of -gfx_passthru we'd like to make that
  a machine option, -machine xxx,-igd-passthru=on. This need
  to bring a change on tool side.
...
 My suggestion has one premise -- if upstream QEMU has already released
 that -gfx_passthru option. If there is no old one (in upstream QEMU)
 at all, then there is nothing to keep and deprecate.

I think the commit message of the xen.git commit should explain what
options are supported by which versions of qemu (including qemu
upstream's future plans).

That would provide (a) something which summarises the communication
etc. with qemu upstream and can be checked with them if necessary and
(b) something against which the libxl changes can be easily judged.

Thanks,
Ian.



Re: [Qemu-devel] [RFC 09/10] cpu: remove exit_request global.

2015-02-02 Thread Peter Maydell
On 2 February 2015 at 10:03, Paolo Bonzini pbonz...@redhat.com wrote:
 You can't do this -- this code is a signal handler so it could
 get run at any time including while the list of CPUs is being
 updated. (This is why we have the exit_request flag in the
 first place rather than just setting the exit_request flag in
 each CPU...)

 Actually you can do this if you are careful.

 In particular, you can do it while you are under the big QEMU lock.

...but this is a signal handler, so we can't guarantee that the
thread holds the big lock.

  If
 you are not, basically you have to treat the CPU list as RCU-protected,
 and this is doable because the CPU object cannot be removed and added
 back into the CPU list.

 Unfortunately RCU doesn't support QTAILQ, at least not yet, so you'd
 have to convert the CPU list to QLIST.  But the basic idea of this patch
 can be done.

...and if we can iterate over CPU lists in signal handlers,
the correct approach is probably to use the existing exit_request
flag rather than adding another one. (Needs investigation to
check and document semantics of that flag.)

-- PMM



Re: [Qemu-devel] [PATCH V11 2/3] i386: Add a Virtual Machine Generation ID device

2015-02-02 Thread Gal Hammer

On 02/02/2015 14:46, Igor Mammedov wrote:

On Sun, 01 Feb 2015 14:56:26 +0200
Gal Hammer gham...@redhat.com wrote:


On 22/01/2015 15:52, Igor Mammedov wrote:

On Tue, 16 Dec 2014 17:50:43 +0200
Gal Hammer gham...@redhat.com wrote:


Based on Microsoft's sepecifications (paper can be dowloaded from
http://go.microsoft.com/fwlink/?LinkId=260709), add a device
description to the SSDT ACPI table and its implementation.

The GUID is set using a global vmgenid.uuid parameter.

Signed-off-by: Gal Hammer gham...@redhat.com




--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -257,6 +257,7 @@ static void acpi_get_pci_info(PcPciInfo *info)
   #define ACPI_BUILD_TABLE_FILE etc/acpi/tables
   #define ACPI_BUILD_RSDP_FILE etc/acpi/rsdp
   #define ACPI_BUILD_TPMLOG_FILE etc/tpm/log
+#define ACPI_BUILD_VMGENID_FILE etc/vm-generation-id

   static void
   build_header(GArray *linker, GArray *table_data,
@@ -1068,6 +1069,8 @@ build_ssdt(GArray *table_data, GArray *linker,
   {
   MachineState *machine = MACHINE(qdev_get_machine());
   uint32_t nr_mem = machine-ram_slots;
+uint32_t vm_gid_physical_address;
+uint32_t vm_gid_offset = 0;
   unsigned acpi_cpus = guest_info-apic_id_limit;
   int ssdt_start = table_data-len;
   uint8_t *ssdt_ptr;
@@ -1096,6 +1099,21 @@ build_ssdt(GArray *table_data, GArray *linker,
   ACPI_BUILD_SET_LE(ssdt_ptr, sizeof(ssdp_misc_aml),
 ssdt_isa_pest[0], 16, misc-pvpanic_port);

+if (vm_generation_id_set()) {
+vm_gid_physical_address = ssdt_start + ssdt_acpi_vm_gid_addr[0];
+bios_linker_loader_alloc(linker, ACPI_BUILD_VMGENID_FILE, 8, true);
+bios_linker_loader_add_pointer(linker, ACPI_BUILD_VMGENID_FILE,
+   ACPI_BUILD_TABLE_FILE,
+   table_data,
+   vm_gid_offset,
+   sizeof(vm_gid_offset));

could some explain how this pointer magic works,


I can try, but don't you think that a magic is gone once explained? ;-)


  From my weak understanding it seems broken.
Lets see:

   [1] vm_gid_offset - must be pointer inside of dest_file blob 
(ACPI_BUILD_VMGENID_FILE)
   [2] vm_gid_offset - should hold offset of the place inside of src_file
  (ACPI_BUILD_TABLE_FILE) where to pointer inside of 
dest_file should point to


The vm_gid_offset should point where in the ACPI_BUILD_VMGENID_FILE the
VM's GUID is stored. At the moment, it should always be zero because the
GUID is stored at the begging of the ACPI_BUILD_VMGENID_FILE.



now:
vm_gid_physical_address - holds [2] i.e. offset of VGIA constant in inside 
SSDT in ACPI_BUILD_TABLE_FILE.


+ACPI_BUILD_SET_LE(ssdt_ptr, sizeof(ssdp_misc_aml),
+  ssdt_acpi_vm_gid_addr[0], 32, vm_gid_physical_address);

Then we write this offset into VGIA in ACPI_BUILD_TABLE_FILE.


Yes. This offset is later patched by the linker to the full physical
address.


After BIOS loads tables it's going to patch at
   [3] ACPI_BUILD_VMGENID_FILE + (vm_gid_offset - table_data-data) /* only 
god knows where it will be/

and on top of it write in it value:
   *(ACPI_BUILD_TABLE_FILE +  *[3])


We know exactly where it is, no need to call for god's help :-).


This approach in general of patching arbitrary place in AML blob
to get PHY addr of buffer with UUID, is quite a hack, especially
in light of that we are trying to hide all direct access to AML
blobs with related pointer arithmetic and manual patching.

Why not reserve some potion of RAM and pass to BIOS/guest
a reservation so it won't be part of AddressRangeMemory or
AddressRangeACPI as MS spec requires? Then you won't need
jump all above hoops to just get buffer's PHY addr.


I'll be glad to hear a new idea that I didn't already try in one of
other previous patches. The problem is that the specification requires
working with a physical address, so it must be allocated from inside the
guest. Since the OS is not exist in this stage and I also don't want to
write a special driver just to allocate this buffer I had to choose this
approach.

how about creating device which will map 4K MMIO region in PCI hole
address space and passing it as a reservation via e820 table we have in QEMU.
Then address could be directly built in ACPI tables as constant value
at the time of ACPI tables creation.


Isn't this will cause a VMEXIT when the guest is reading the GUID? If it 
is then this idea was already presented and Michael didn't approve it.



That way it would be possible to get address of buffer without
firmware + guest OS doing anything and going through quite complex
chain for getting buffer address (qemu-bios-OSPM-qemu).
If you go current route, it would be needed to teach linker a new command
to make reservation in E820 so that allocated buffer won't be part of
of AddressRangeMemory as required by spec or anything else.
Which would make already hard to 

Re: [Qemu-devel] [RFC PATCH v8 09/21] replay: interrupts and exceptions

2015-02-02 Thread Pavel Dovgaluk
 From: Paolo Bonzini [mailto:pbonz...@redhat.com]
 On 22/01/2015 09:52, Pavel Dovgalyuk wrote:
  +if (replay_mode == REPLAY_MODE_RECORD) {
  +replay_save_instructions();
  +replay_put_event(EVENT_EXCEPTION);
  +return true;
 
 Missing mutex lock/unlock.

I think not. We just have to write number of already executed instructions.
This number is not linked to exception event. They could be read in replay while
processing some other event.
 
  @@ -1294,6 +1295,9 @@ bool x86_cpu_exec_interrupt(CPUState *cs, int 
  interrupt_request)
   if (interrupt_request  CPU_INTERRUPT_POLL) {
   cs-interrupt_request = ~CPU_INTERRUPT_POLL;
   apic_poll_irq(cpu-apic_state);
  +if (replay_mode != REPLAY_MODE_NONE) {
  +return true;
  +}
   }
   #endif
 
 Can you explain this?  It probably needs a comment.

Each function call should process one interrupt at once, because it corresponds 
to single event in the log.

Pavel Dovgalyuk




[Qemu-devel] [PATCH v4 4/5] virtio-blk: introduce multiread

2015-02-02 Thread Peter Lieven
this patch finally introduces multiread support to virtio-blk. While
multiwrite support was there for a long time, read support was missing.

The complete merge logic is moved into virtio-blk.c which has
been the only user of request merging ever since. This is required
to be able to merge chunks of requests and immediately invoke callbacks
for those requests. Secondly, this is required to switch to
direct invocation of coroutines which is planned at a later stage.

The following benchmarks show the performance of running fio with
4 worker threads on a local ram disk. The numbers show the average
of 10 test runs after 1 run as warmup phase.

  |4k|   64k|4k
MB/s  | rd seq | rd rand | rd seq | rd rand | wr seq | wr rand
--++-++-++
master| 1221   | 1187| 4178   | 4114| 1745   | 1213
multiread | 1829   | 1189| 4639   | 4110| 1894   | 1216

Signed-off-by: Peter Lieven p...@kamp.de
---
 hw/block/dataplane/virtio-blk.c |   8 +-
 hw/block/virtio-blk.c   | 296 +++-
 include/hw/virtio/virtio-blk.h  |  19 +--
 trace-events|   1 +
 4 files changed, 218 insertions(+), 106 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 39c5d71..be957d1 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -96,9 +96,7 @@ static void handle_notify(EventNotifier *e)
 event_notifier_test_and_clear(s-host_notifier);
 blk_io_plug(s-conf-conf.blk);
 for (;;) {
-MultiReqBuffer mrb = {
-.num_writes = 0,
-};
+MultiReqBuffer mrb = {};
 int ret;
 
 /* Disable guest-host notifies to avoid unnecessary vmexits */
@@ -120,7 +118,9 @@ static void handle_notify(EventNotifier *e)
 virtio_blk_handle_request(req, mrb);
 }
 
-virtio_submit_multiwrite(s-conf-conf.blk, mrb);
+if (mrb.num_reqs) {
+virtio_blk_submit_multireq(s-conf-conf.blk, mrb);
+}
 
 if (likely(ret == -EAGAIN)) { /* vring emptied */
 /* Re-enable guest-host notifies and stop processing the vring.
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index e04adb8..d0a01a8 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -34,6 +34,7 @@ VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
 req-dev = s;
 req-qiov.size = 0;
 req-next = NULL;
+req-mr_next = NULL;
 return req;
 }
 
@@ -84,20 +85,32 @@ static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, 
int error,
 
 static void virtio_blk_rw_complete(void *opaque, int ret)
 {
-VirtIOBlockReq *req = opaque;
+VirtIOBlockReq *next = opaque;
 
-trace_virtio_blk_rw_complete(req, ret);
+while (next) {
+VirtIOBlockReq *req = next;
+next = req-mr_next;
+trace_virtio_blk_rw_complete(req, ret);
 
-if (ret) {
-int p = virtio_ldl_p(VIRTIO_DEVICE(req-dev), req-out.type);
-bool is_read = !(p  VIRTIO_BLK_T_OUT);
-if (virtio_blk_handle_rw_error(req, -ret, is_read))
-return;
-}
+if (req-qiov.nalloc != -1) {
+/* If nalloc is != 1 req-qiov is a local copy of the original
+ * external iovec. It was allocated in submit_merged_requests
+ * to be able to merge requests. */
+qemu_iovec_destroy(req-qiov);
+}
 
-virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
-block_acct_done(blk_get_stats(req-dev-blk), req-acct);
-virtio_blk_free_request(req);
+if (ret) {
+int p = virtio_ldl_p(VIRTIO_DEVICE(req-dev), req-out.type);
+bool is_read = !(p  VIRTIO_BLK_T_OUT);
+if (virtio_blk_handle_rw_error(req, -ret, is_read)) {
+continue;
+}
+}
+
+virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
+block_acct_done(blk_get_stats(req-dev-blk), req-acct);
+virtio_blk_free_request(req);
+}
 }
 
 static void virtio_blk_flush_complete(void *opaque, int ret)
@@ -291,24 +304,127 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
 }
 }
 
-void virtio_submit_multiwrite(BlockBackend *blk, MultiReqBuffer *mrb)
+static inline void submit_requests(BlockBackend *blk, MultiReqBuffer *mrb,
+   int start, int num_reqs, int niov)
 {
-int i, ret;
+QEMUIOVector *qiov = mrb-reqs[start]-qiov;
+int64_t sector_num = mrb-reqs[start]-sector_num;
+int nb_sectors = mrb-reqs[start]-qiov.size / BDRV_SECTOR_SIZE;
+bool is_write = mrb-is_write;
+
+if (num_reqs  1) {
+int i;
+struct iovec *tmp_iov = qiov-iov;
+int tmp_niov = qiov-niov;
+
+/* mrb-reqs[start]-qiov was initialized from external so we can't
+ * modifiy it here. We need to initialize it locally and then add the
+ 

[Qemu-devel] [PATCH v4 2/5] hw/virtio-blk: add a constant for max number of merged requests

2015-02-02 Thread Peter Lieven
As it was not obvious (at least for me) where the 32 comes from;
add a constant for it.

Signed-off-by: Peter Lieven p...@kamp.de
Reviewed-by: Eric Blake ebl...@redhat.com
Reviewed-by: Max Reitz mre...@redhat.com
---
 hw/block/virtio-blk.c  | 2 +-
 include/hw/virtio/virtio-blk.h | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 4032fca..e04adb8 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -360,7 +360,7 @@ static void virtio_blk_handle_write(VirtIOBlockReq *req, 
MultiReqBuffer *mrb)
 block_acct_start(blk_get_stats(req-dev-blk), req-acct, req-qiov.size,
  BLOCK_ACCT_WRITE);
 
-if (mrb-num_writes == 32) {
+if (mrb-num_writes == VIRTIO_BLK_MAX_MERGE_REQS) {
 virtio_submit_multiwrite(req-dev-blk, mrb);
 }
 
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 4652b70..6ef3fa5 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -134,8 +134,10 @@ typedef struct VirtIOBlock {
 struct VirtIOBlockDataPlane *dataplane;
 } VirtIOBlock;
 
+#define VIRTIO_BLK_MAX_MERGE_REQS 32
+
 typedef struct MultiReqBuffer {
-BlockRequestblkreq[32];
+BlockRequestblkreq[VIRTIO_BLK_MAX_MERGE_REQS];
 unsigned intnum_writes;
 } MultiReqBuffer;
 
-- 
1.9.1




Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-02 Thread Steffen Klassert
On Mon, Feb 02, 2015 at 07:27:10AM +, David Woodhouse wrote:
 On Sun, 2015-02-01 at 21:07 -0800, David Miller wrote:
 
  We might as well have not have implemented the IPSEC stack at all,
  because as a result of the userland VPN stuff our IPSEC stack is
  largely unused except by a very narrow group of users.
 
 Well, I'd love to make better use of it if I can. I do suspect it makes
 most sense for userspace to continue to manage the probing of UDP
 connectivity, and the fallback to TCP mode — and I suspect it also makes
 sense to continue to use tun for passing packets up to the VPN client
 when it's using the TCP transport.
 
 So the question would be how we handle redirecting the packet flow to
 the optional UDP transport, when the VPN client determines that it's
 available. For the sake of the user setting up firewall and routing
 rules, I do think it's important that it continues to appear to
 userspace as the *same* device for the entire lifetime of the session,
 regardless of which transport the packets happen to be using at a given
 moment in time. It doesn't *have* to be tun, though. 
 
 You don't seem to like my suggestion of somehow pushing down an XFRM
 state to the tun device to direct the packets out there instead of up to
 userspace. Do you have an alternative suggestion... or a specific
 concern that would help me come up with something you like better?

Maybe you want to use a virtual tunnel interface (vti) what we have
already. Everything that is routed through such an interface is
guaranteed to be either encrypted if a matching xfrm state is present
or dropped. Same on the rceive side, everything that is received by
this interface is guaranteed to be IPsec processed. So you can do
a routing based decision about the IPsec processing.

While I'm sure it could handle the ESP in UDP encapsulation, I'm not that
sure about your TCP fallback because this requires a valid xfrm state
to allow packets to pass. Using the same interface for both is probably
not possible.



Re: [Qemu-devel] [PATCH 7/7] block/raw-posix: set max_write_zeroes to INT_MAX for regular files

2015-02-02 Thread Kevin Wolf
Am 02.02.2015 um 15:12 hat Peter Lieven geschrieben:
 Am 02.02.2015 um 15:04 schrieb Kevin Wolf:
 Am 02.02.2015 um 14:55 hat Peter Lieven geschrieben:
 Am 02.02.2015 um 14:23 schrieb Kevin Wolf:
 Am 30.01.2015 um 09:42 hat Denis V. Lunev geschrieben:
 fallocate() works fine and could handle properly with arbitrary size
 requests. There is no sense to reduce the amount of space to fallocate.
 The bigger is the size, the better is the performance as the amount of
 journal updates is reduced.
 
 The patch changes behavior for both generic filesystem and XFS codepaths,
 which are different in handle_aiocb_write_zeroes. The implementation
 of fallocate and xfsctl(XFS_IOC_ZERO_RANGE) for XFS are exactly the same
 thus the change is fine for both ways.
 
 Signed-off-by: Denis V. Lunev d...@openvz.org
 Reviewed-by: Max Reitz mre...@redhat.com
 CC: Kevin Wolf kw...@redhat.com
 CC: Stefan Hajnoczi stefa...@redhat.com
 CC: Peter Lieven p...@kamp.de
 CC: Fam Zheng f...@redhat.com
 ---
   block/raw-posix.c | 17 +
   1 file changed, 17 insertions(+)
 
 diff --git a/block/raw-posix.c b/block/raw-posix.c
 index 7b42f37..933c778 100644
 --- a/block/raw-posix.c
 +++ b/block/raw-posix.c
 @@ -293,6 +293,20 @@ static void raw_probe_alignment(BlockDriverState 
 *bs, int fd, Error **errp)
   }
   }
 +static void raw_probe_max_write_zeroes(BlockDriverState *bs)
 +{
 +BDRVRawState *s = bs-opaque;
 +struct stat st;
 +
 +if (fstat(s-fd, st)  0) {
 +return; /* no problem, keep default value */
 +}
 +if (!S_ISREG(st.st_mode) || !s-discard_zeroes) {
 +return;
 +}
 +bs-bl.max_write_zeroes = INT_MAX;
 +}
 Peter, do you remember why INT_MAX isn't actually the default? I think
 the most reasonable behaviour would be that a limitation is only used if
 a block driver requests it, and otherwise unlimited is assumed.
 The default (0) actually means unlimited or undefined. We introduced
 that limit of 16MB in bdrv_co_write_zeroes to create only reasonable
 sized requests because there is no guarantee that write zeroes is a
 fast operation. We should set INT_MAX only if we know that write
 zeroes of an arbitrary size is always fast.
 Well, splitting it up doesn't make it any faster. I think we can assume
 that drv-bdrv_co_write_zeroes() wants to know the full request size
 unless the driver has explicitly set bs-bl.max_write_zeroes.
 
 You mean sth like this:

Yes, I think that's what I meant.

Kevin

 diff --git a/block.c b/block.c
 index 61412e9..8272ef9 100644
 --- a/block.c
 +++ b/block.c
 @@ -3192,10 +3192,7 @@ int coroutine_fn 
 bdrv_co_copy_on_readv(BlockDriverState *bs,
  BDRV_REQ_COPY_ON_READ);
  }
 
 -/* if no limit is specified in the BlockLimits use a default
 - * of 32768 512-byte sectors (16 MiB) per request.
 - */
 -#define MAX_WRITE_ZEROES_DEFAULT 32768
 +#define MAX_WRITE_ZEROES_BOUNCE_BUFFER 32768
 
  static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
  int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
 @@ -3206,7 +3203,7 @@ static int coroutine_fn 
 bdrv_co_do_write_zeroes(BlockDriverState *bs,
  int ret = 0;
 
  int max_write_zeroes = bs-bl.max_write_zeroes ?
 -   bs-bl.max_write_zeroes : 
 MAX_WRITE_ZEROES_DEFAULT;
 +   bs-bl.max_write_zeroes : INT_MAX;
 
  while (nb_sectors  0  !ret) {
  int num = nb_sectors;
 @@ -3242,7 +3239,7 @@ static int coroutine_fn 
 bdrv_co_do_write_zeroes(BlockDriverState *bs,
  if (ret == -ENOTSUP) {
  /* Fall back to bounce buffer if write zeroes is unsupported */
  int max_xfer_len = MIN_NON_ZERO(bs-bl.max_transfer_length,
 - MAX_WRITE_ZEROES_DEFAULT);
 + MAX_WRITE_ZEROES_BOUNCE_BUFFER);
  num = MIN(num, max_xfer_len);
  iov.iov_len = num * BDRV_SECTOR_SIZE;
  if (iov.iov_base == NULL) {
 @@ -5099,11 +5096,6 @@ static void coroutine_fn bdrv_discard_co_entry(void 
 *opaque)
  rwco-ret = bdrv_co_discard(rwco-bs, rwco-sector_num, 
 rwco-nb_sectors);
  }
 
 -/* if no limit is specified in the BlockLimits use a default
 - * of 32768 512-byte sectors (16 MiB) per request.
 - */
 -#define MAX_DISCARD_DEFAULT 32768
 -
  int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
   int nb_sectors)
  {
 @@ -5128,7 +5120,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
 int64_t sector_num,
  return 0;
  }
 
 -max_discard = bs-bl.max_discard ?  bs-bl.max_discard : 
 MAX_DISCARD_DEFAULT;
 +max_discard = bs-bl.max_discard ?  bs-bl.max_discard : INT_MAX;
  while (nb_sectors  0) {
  int ret;
  int num = nb_sectors;
 
 
 
 Peter



[Qemu-devel] [PATCH v4 3/5] block-backend: expose bs-bl.max_transfer_length

2015-02-02 Thread Peter Lieven
Signed-off-by: Peter Lieven p...@kamp.de
Reviewed-by: Max Reitz mre...@redhat.com
---
 block/block-backend.c  | 5 +
 include/sysemu/block-backend.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index d00c129..c28e240 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -580,6 +580,11 @@ int blk_get_flags(BlockBackend *blk)
 return bdrv_get_flags(blk-bs);
 }
 
+int blk_get_max_transfer_length(BlockBackend *blk)
+{
+return blk-bs-bl.max_transfer_length;
+}
+
 void blk_set_guest_block_size(BlockBackend *blk, int align)
 {
 bdrv_set_guest_block_size(blk-bs, align);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 8871a02..aab12b9 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -127,6 +127,7 @@ int blk_is_inserted(BlockBackend *blk);
 void blk_lock_medium(BlockBackend *blk, bool locked);
 void blk_eject(BlockBackend *blk, bool eject_flag);
 int blk_get_flags(BlockBackend *blk);
+int blk_get_max_transfer_length(BlockBackend *blk);
 void blk_set_guest_block_size(BlockBackend *blk, int align);
 void *blk_blockalign(BlockBackend *blk, size_t size);
 bool blk_op_is_blocked(BlockBackend *blk, BlockOpType op, Error **errp);
-- 
1.9.1




Re: [Qemu-devel] [PATCH V11 2/3] i386: Add a Virtual Machine Generation ID device

2015-02-02 Thread Igor Mammedov
On Mon, 02 Feb 2015 15:13:39 +0200
Gal Hammer gham...@redhat.com wrote:

 On 02/02/2015 14:46, Igor Mammedov wrote:
  On Sun, 01 Feb 2015 14:56:26 +0200
  Gal Hammer gham...@redhat.com wrote:
 
  On 22/01/2015 15:52, Igor Mammedov wrote:
  On Tue, 16 Dec 2014 17:50:43 +0200
  Gal Hammer gham...@redhat.com wrote:
 
  Based on Microsoft's sepecifications (paper can be dowloaded from
  http://go.microsoft.com/fwlink/?LinkId=260709), add a device
  description to the SSDT ACPI table and its implementation.
 
  The GUID is set using a global vmgenid.uuid parameter.
 
  Signed-off-by: Gal Hammer gham...@redhat.com
 
 
  --- a/hw/i386/acpi-build.c
  +++ b/hw/i386/acpi-build.c
  @@ -257,6 +257,7 @@ static void acpi_get_pci_info(PcPciInfo *info)
 #define ACPI_BUILD_TABLE_FILE etc/acpi/tables
 #define ACPI_BUILD_RSDP_FILE etc/acpi/rsdp
 #define ACPI_BUILD_TPMLOG_FILE etc/tpm/log
  +#define ACPI_BUILD_VMGENID_FILE etc/vm-generation-id
 
 static void
 build_header(GArray *linker, GArray *table_data,
  @@ -1068,6 +1069,8 @@ build_ssdt(GArray *table_data, GArray *linker,
 {
 MachineState *machine = MACHINE(qdev_get_machine());
 uint32_t nr_mem = machine-ram_slots;
  +uint32_t vm_gid_physical_address;
  +uint32_t vm_gid_offset = 0;
 unsigned acpi_cpus = guest_info-apic_id_limit;
 int ssdt_start = table_data-len;
 uint8_t *ssdt_ptr;
  @@ -1096,6 +1099,21 @@ build_ssdt(GArray *table_data, GArray *linker,
 ACPI_BUILD_SET_LE(ssdt_ptr, sizeof(ssdp_misc_aml),
   ssdt_isa_pest[0], 16, misc-pvpanic_port);
 
  +if (vm_generation_id_set()) {
  +vm_gid_physical_address = ssdt_start + ssdt_acpi_vm_gid_addr[0];
  +bios_linker_loader_alloc(linker, ACPI_BUILD_VMGENID_FILE, 8, 
  true);
  +bios_linker_loader_add_pointer(linker, ACPI_BUILD_VMGENID_FILE,
  +   ACPI_BUILD_TABLE_FILE,
  +   table_data,
  +   vm_gid_offset,
  +   sizeof(vm_gid_offset));
  could some explain how this pointer magic works,
 
  I can try, but don't you think that a magic is gone once explained? ;-)
 
From my weak understanding it seems broken.
  Lets see:
 
 [1] vm_gid_offset - must be pointer inside of dest_file blob 
  (ACPI_BUILD_VMGENID_FILE)
 [2] vm_gid_offset - should hold offset of the place inside of src_file
(ACPI_BUILD_TABLE_FILE) where to pointer inside of 
  dest_file should point to
 
  The vm_gid_offset should point where in the ACPI_BUILD_VMGENID_FILE the
  VM's GUID is stored. At the moment, it should always be zero because the
  GUID is stored at the begging of the ACPI_BUILD_VMGENID_FILE.
 
 
  now:
  vm_gid_physical_address - holds [2] i.e. offset of VGIA constant in 
  inside SSDT in ACPI_BUILD_TABLE_FILE.
 
  +ACPI_BUILD_SET_LE(ssdt_ptr, sizeof(ssdp_misc_aml),
  +  ssdt_acpi_vm_gid_addr[0], 32, 
  vm_gid_physical_address);
  Then we write this offset into VGIA in ACPI_BUILD_TABLE_FILE.
 
  Yes. This offset is later patched by the linker to the full physical
  address.
 
  After BIOS loads tables it's going to patch at
 [3] ACPI_BUILD_VMGENID_FILE + (vm_gid_offset - table_data-data) /* 
  only god knows where it will be/
 
  and on top of it write in it value:
 *(ACPI_BUILD_TABLE_FILE +  *[3])
 
  We know exactly where it is, no need to call for god's help :-).
 
  This approach in general of patching arbitrary place in AML blob
  to get PHY addr of buffer with UUID, is quite a hack, especially
  in light of that we are trying to hide all direct access to AML
  blobs with related pointer arithmetic and manual patching.
 
  Why not reserve some potion of RAM and pass to BIOS/guest
  a reservation so it won't be part of AddressRangeMemory or
  AddressRangeACPI as MS spec requires? Then you won't need
  jump all above hoops to just get buffer's PHY addr.
 
  I'll be glad to hear a new idea that I didn't already try in one of
  other previous patches. The problem is that the specification requires
  working with a physical address, so it must be allocated from inside the
  guest. Since the OS is not exist in this stage and I also don't want to
  write a special driver just to allocate this buffer I had to choose this
  approach.
  how about creating device which will map 4K MMIO region in PCI hole
  address space and passing it as a reservation via e820 table we have in 
  QEMU.
  Then address could be directly built in ACPI tables as constant value
  at the time of ACPI tables creation.
 
 Isn't this will cause a VMEXIT when the guest is reading the GUID? If it 
 is then this idea was already presented and Michael didn't approve it.
It will, but is it performance critical? VM supposed to read it
at start-up and on getting notification. So I think VMEXIT in this case
is not sufficient to drop simple and 

[Qemu-devel] qemu 2.2.1 ?

2015-02-02 Thread Peter Lieven

Hi,

are there any plans to release a 2.2 maintainance release anytime soon?

Thanks,
Peter




Re: [Qemu-devel] [RFC PATCH v8 09/21] replay: interrupts and exceptions

2015-02-02 Thread Paolo Bonzini


On 02/02/2015 14:50, Pavel Dovgaluk wrote:
 I think not. We just have to write number of already executed instructions.
 This number is not linked to exception event. They could be read in replay 
 while
 processing some other event.
  

I was referring to replay_put_event(EVENT_EXCEPTION) only.

In principle you could run EVENT_EXCEPTION concurrently with other event
writes, for example:

+replay_mutex_lock();
+replay_put_event(EVENT_CLOCK + kind);
+replay_put_qword(clock);
+replay_mutex_unlock();

and get

EVENT_CLOCK + kind (1 byte)
EVENT_EXCEPTION (1 byte)
clock (8 bytes)

in the replay stream.

I know it's really unlikely, perhaps even impossible in the current QEMU
architecture that is dominated by the big QEMU lock.  But the right
thing to do is to lock the mutex around all event writes.  Even if the
write itself is atomic, it could happen in the middle of another write
if you do not lock the mutex.

Paolo



Re: [Qemu-devel] [PATCH] Fix ABI incompatibility between Qemu-aarch64 and Linux Kernel in signal handling.

2015-02-02 Thread Peter Maydell
On 2 February 2015 at 08:23, Maxim Ostapenko
m.ostape...@partner.samsung.com wrote:
 Sorry, missed ML.

You missed qemu-devel as well :-)

 On 02/02/2015 12:20 PM, Maxim Ostapenko wrote:

 Hi,

 this patch fixes https://bugs.launchpad.net/qemu/+bug/1416988.

 There is a small ABI incompatibility between Qemu-aarch64 and Linux Kernel
 3.19 caused by wrong size of struct target_siginfo in Qemu. This tiny patch
 fixes the issue.

 -Maxim

 From 1d662e325d004bce2d640cfea5b337c92c7feca2 Mon Sep 17 00:00:00 2001
 From: Max Ostapenko m.ostape...@partner.samsung.com
 Date: Mon, 2 Feb 2015 12:03:20 +0400
 Subject: [PATCH] linux-user: wrong TARGET_SI_PAD_SIZE value for some
 targets.

 Fix TARGET_SI_PAD_SIZE for S390X, SPARC, ALPHA and AARCH64 to correspond
 Linux Kernel 3.19 c59c961ca511dc7ee2f4f7e9c224d16f5c76ca6e revision.

Thanks for this patch. I'm afraid that to apply it we need a
Signed-off-by: line from you that certifies that you're OK to
contribute it. (This is the same as Linux kernel patches; see
the Patch emails must include a Signed-off-by: line section in
http://wiki.qemu.org/Contribute/SubmitAPatch ).

 ---
  linux-user/syscall_defs.h | 5 +
  1 file changed, 5 insertions(+)

 diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
 index ebb3be1..b95a0b2 100644
 --- a/linux-user/syscall_defs.h
 +++ b/linux-user/syscall_defs.h
 @@ -655,7 +655,12 @@ typedef struct {
  #endif

  #define TARGET_SI_MAX_SIZE128
 +#if defined(TARGET_S390X) || defined(TARGET_SPARC) \
 +   || defined(TARGET_ALPHA) || defined(TARGET_AARCH64)

I don't think this ifdef is correct. For instance for sparc
the kernel defines things differently for 32 and 64 bit.

In fact as far as I can tell for all architectures the
kernel defines __ARCH_SI_PREAMBLE_SIZE to (4 * sizeof(int))
for 64 bit and (3 * sizeof(int)) for 32 bit.

 +#define TARGET_SI_PAD_SIZE((TARGET_SI_MAX_SIZE/sizeof(int)) - 4)
 +#else
  #define TARGET_SI_PAD_SIZE((TARGET_SI_MAX_SIZE/sizeof(int)) - 3)
 +#endif

So I would suggest

#if TARGET_ABI_BITS == 32
#define TARGET_SI_PREAMBLE_SIZE (3 * sizeof(int))
#else
#define TARGET_SI_PREAMBLE_SIZE (4 * sizeof(int))
#endif

#define TARGET_SI_PAD_SIZE ((TARGET_SI_MAX_SIZE -
TARGET_SI_PREAMBLE_SIZE) / sizeof(int))

which should give us the correct answer for all archs and
configurations (including I think the 32-bit-on-64-bit
setups like sparc32plus). It also rephrases the
calculation to match what the kernel uses.

thanks
-- PMM



Re: [Qemu-devel] [PATCH 09/19] qtest/ahci: Demagic ahci tests.

2015-02-02 Thread Paolo Bonzini


On 30/01/2015 19:42, John Snow wrote:
  memset(cmd, 0x00, sizeof(cmd));
 -cmd.flags = 5; /* reg_h2d_fis is 5 double-words long */
 -cmd.flags = 0x400; /* clear PxTFD.STS.BSY when done */
 -cmd.prdtl = 1; /* One PRD table entry. */
 +cmd.flags = 5;/* reg_h2d_fis is 5 double-words long */
 +cmd.flags = CMDH_CLR_BSY; /* clear PxTFD.STS.BSY when done */

Same = vs. |= note as before.

Paolo



Re: [Qemu-devel] [PATCH 10/19] libqos/ahci: Add ide cmd properties

2015-02-02 Thread Paolo Bonzini


On 30/01/2015 19:42, John Snow wrote:
 The Invalid Command Sentinel here that caps the property array is an
 invalid ATA command, namely 0x01. 0x00 is NOP and 0xFF is reserved for
 vendor usage, so I chose the first invalid one instead.

You can use ARRAY_SIZE instead.

Paolo



[Qemu-devel] [v4 06/13] arch_init: Add and free data struct for decompression

2015-02-02 Thread Liang Li
Define the data structure and variables used to do multiple thread
decompression, and add the code to initialize and free them.

Signed-off-by: Liang Li liang.z...@intel.com
Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 arch_init.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch_init.c b/arch_init.c
index 87c4947..500f299 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -345,7 +345,12 @@ struct CompressParam {
 typedef struct CompressParam CompressParam;
 
 struct DecompressParam {
-/* To be done */
+bool busy;
+QemuMutex mutex;
+QemuCond cond;
+void *des;
+uint8 *compbuf;
+int len;
 };
 typedef struct DecompressParam DecompressParam;
 
@@ -1188,6 +1193,9 @@ void migrate_decompress_threads_create(int count)
 compressed_data_buf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
 quit_thread = false;
 for (i = 0; i  count; i++) {
+qemu_mutex_init(decomp_param[i].mutex);
+qemu_cond_init(decomp_param[i].cond);
+decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
 qemu_thread_create(decompress_threads + i, decompress,
do_data_decompress, decomp_param + i,
QEMU_THREAD_JOINABLE);
@@ -1202,6 +1210,9 @@ void migrate_decompress_threads_join(void)
 thread_count = migrate_decompress_threads();
 for (i = 0; i  thread_count; i++) {
 qemu_thread_join(decompress_threads + i);
+qemu_mutex_destroy(decomp_param[i].mutex);
+qemu_cond_destroy(decomp_param[i].cond);
+g_free(decomp_param[i].compbuf);
 }
 g_free(decompress_threads);
 g_free(decomp_param);
-- 
1.9.1




Re: [Qemu-devel] [PATCH 13/19] libqos/ahci: add ahci command size setters

2015-02-02 Thread Paolo Bonzini


On 30/01/2015 19:42, John Snow wrote:
 +void ahci_command_set_sizes(AHCICommand *cmd, uint64_t xbytes,
 +unsigned prd_size)
 +{
 +/* Each PRD can describe up to 4MiB, and must not be odd. */
 +g_assert_cmphex(prd_size, =, 4096 * 1024);
 +g_assert_cmphex(prd_size  0x01, ==, 0x00);
 +cmd-prd_size = prd_size;
 +cmd-xbytes = xbytes;
 +cmd-fis.count = cpu_to_le16(cmd-xbytes / AHCI_SECTOR_SIZE);

Why do you need cpu_to_le16 here, instead of having it in the function
that writes the command to guest memory?

Paolo

 +cmd-header.prdtl = size_to_prdtl(cmd-xbytes, cmd-prd_size);
 +}
 +



Re: [Qemu-devel] [PATCH 00/19] qtest/ahci: add dma test

2015-02-02 Thread Paolo Bonzini


On 30/01/2015 19:41, John Snow wrote:
 Add a simple DMA r/w test to ahci-test.
 
 Oh, and for the first 18 patches, refactor everything into helpers so
 that each ahci_test isn't a thousand lines long.
 
 This patch depends upon the ahci test preliminary refactoring series
 upstream, which shuffled a lot of libqos and malloc facilities to
 support this series.
 
 This patchset is a necessary step in checking in AHCI/DMA migration
 tests that I will later use as proof as suitability of enabling the
 ICH9 and AHCI migration flags.

Sent a few comments, everything that I didn't reply to has my Reviewed-by.

Paolo

 John Snow (19):
   libqos/ahci: Add ahci_port_select helper
   libqos/ahci: Add ahci_port_clear helper
   qtest/ahci: rename 'Command' to 'CommandHeader'
   libqos/ahci: Add command header helpers
   libqos/ahci: Add ahci_port_check_error helper
   libqos/ahci: Add ahci_port_check_interrupts helper
   libqos/ahci: Add port_check_nonbusy helper
   libqos/ahci: Add cmd response sanity check helpers
   qtest/ahci: Demagic ahci tests.
   libqos/ahci: Add ide cmd properties
   libqos/ahci: add ahci command functions
   libqos/ahci: add ahci command verify
   libqos/ahci: add ahci command size setters
   libqos/ahci: Add ahci_guest_io
   libqos/ahci: add ahci_io
   libqos/ahci: Add ahci_clean_mem
   qtest/ahci: Add a macro bootup routine
   qtest/ahci: Assert sector size in identify test
   qtest/ahci: Adding simple dma read-write test
 
  tests/ahci-test.c | 246 +-
  tests/libqos/ahci.c   | 554 
 ++
  tests/libqos/ahci.h   | 163 ---
  tests/libqos/malloc.c |   5 +
  tests/libqos/malloc.h |   1 +
  5 files changed, 798 insertions(+), 171 deletions(-)
 



[Qemu-devel] [v4 11/13] migration: Add interface to control compression

2015-02-02 Thread Liang Li
The multiple compression threads can be turned on/off through
qmp and hmp interface before doing live migration.

Signed-off-by: Liang Li liang.z...@intel.com
Signed-off-by: Yang Zhang yang.z.zh...@intel.com
Reviewed-by: Dr.David Alan Gilbert dgilb...@redhat.com
---
 migration/migration.c | 7 +--
 qapi-schema.json  | 7 ++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index a6f6e02..cbbd455 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -588,8 +588,11 @@ bool migrate_zero_blocks(void)
 
 bool migrate_use_compression(void)
 {
-/* Disable compression before the patch series are applied */
-return false;
+MigrationState *s;
+
+s = migrate_get_current();
+
+return s-enabled_capabilities[MIGRATION_CAPABILITY_COMPRESS];
 }
 
 int migrate_compress_level(void)
diff --git a/qapi-schema.json b/qapi-schema.json
index e16f8eb..0dfc4ce 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -491,13 +491,18 @@
 #  to enable the capability on the source VM. The feature is disabled 
by
 #  default. (since 1.6)
 #
+# @compress: Use multiple compression threads to accelerate live migration.
+#  This feature can help to reduce the migration traffic, by sending
+#  compressed pages. The feature is disabled by default. (since 2.3)
+#
 # @auto-converge: If enabled, QEMU will automatically throttle down the guest
 #  to speed up convergence of RAM migration. (since 1.6)
 #
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
-  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks'] }
+  'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
+   'compress'] }
 
 ##
 # @MigrationCapabilityStatus
-- 
1.9.1




Re: [Qemu-devel] [PATCH v2 1/4] qemu-io: initialize progname with error_set_progname()

2015-02-02 Thread Kevin Wolf
Am 02.02.2015 um 10:51 hat Hitoshi Mitake geschrieben:
 At Thu, 22 Jan 2015 18:08:11 +0900,
 Hitoshi Mitake wrote:
  
  Calling error_get_progname() in the context of qemu-io can cause
  segmentation fault because qemu-io doesn't initialize its progname
  with error_set_progname(). This patch adds the initialization.
  
  Currently, the missing call of error_set_progname() doesn't cause any
  problems because qemu-io doesn't use error_get_progname(). This patch
  is a proactive action.
  
  Cc: Kevin Wolf kw...@redhat.com
  Cc: Stefan Hajnoczi stefa...@redhat.com
  Cc: Markus Armbruster arm...@redhat.com
  Signed-off-by: Hitoshi Mitake mitake.hito...@lab.ntt.co.jp
  ---
   qemu-io.c | 1 +
   1 file changed, 1 insertion(+)
 
 Hi Kevin, Stefan, could you pick this patch? sheepdog driver has a
 pending patch [1] which depends on it. I believe at least qemu-io
 should call error_set_progname() because block drivers can use the
 qemu-error infrastructure.

So are you planning to get the individual patches merged on their own
instead of the whole series as one?

 [1] 
 https://github.com/sheepdog/qemu/commit/a95c35e606a2a189e7dbaf645277c5c306b01e4b

That patch looks wrong. Nobody guarantees that qemu-io is really called
qemu-io. The user can name their binaries as they want and create
symlinks with any name, and indeed names for qemu-img used by some
distributions include 'qemu-img-kvm' and 'kvm-img'. You need to find a
different way than checking binary file names.

Kevin



Re: [Qemu-devel] [RFC PATCH v8 08/21] cpu: replay instructions sequence

2015-02-02 Thread Pavel Dovgaluk
 From: Paolo Bonzini [mailto:pbonz...@redhat.com]
 On 22/01/2015 09:52, Pavel Dovgalyuk wrote:
  This patch adds calls to replay functions into the icount setup block.
  In record mode number of executed instructions is written to the log.
  In replay mode number of istructions to execute is taken from the replay 
  log.
 
  Signed-off-by: Pavel Dovgalyuk pavel.dovga...@ispras.ru
  ---
   cpu-exec.c  |1 +
   cpus.c  |   28 ++--
   replay/replay.c |   24 
   replay/replay.h |4 
   4 files changed, 47 insertions(+), 10 deletions(-)
 
  diff --git a/cpu-exec.c b/cpu-exec.c
  index 49f01f5..99a0993 100644
  --- a/cpu-exec.c
  +++ b/cpu-exec.c
  @@ -531,6 +531,7 @@ int cpu_exec(CPUArchState *env)
   }
   cpu-exception_index = EXCP_INTERRUPT;
   next_tb = 0;
  +qemu_notify_event();
 
 Why is this needed?

It is needed to wake up iothread in replay mode.
Otherwise it waits for additional time and replay becomes too slow.

 
   cpu_loop_exit(cpu);
   }
   break;

Pavel Dovgalyuk




Re: [Qemu-devel] [PATCH v2 4/4] linux-user: initialize progname with error_set_progname()

2015-02-02 Thread Hitoshi Mitake

Hi Peter,
Sorry for my late reply.

At Thu, 22 Jan 2015 10:07:27 +,
Peter Maydell wrote:
 
 On 22 January 2015 at 09:08, Hitoshi Mitake
 mitake.hito...@lab.ntt.co.jp wrote:
  Calling error_get_progname() in the context of qemu-x86_64 can cause
  segmentation fault because linux-user/main.c doesn't initialize its
  progname with error_set_progname(). This patch adds the
  initialization.
 
  Currently, the missing call of error_set_progname() doesn't cause any
  problems because qemu-x86_64 doesn't use error_get_progname(). This
  patch is a proactive action.
 
 I don't think this patch is right. The *-user binaries don't
 (and should not) use the qemu-error infrastructure (they don't
 have the monitor or any of the other stuff that uses). Code
 which tries to use error_get_progname() in *-user would be
 broken.

I just thought linux-user would be a potential user of the qemu-error
infrastructure because it is liked with util/qemu-error.o. Actually,
the changes of the patch (let linux-user/main.c include
qemu/error-report.h and call error_get_progname()) didn't cause any
problems.

But I'm not familiar with linux-user. If the command should never be a
user of the error infrastructure, I'd like to drop this patch, of
course.

Thanks,
Hitoshi






Re: [Qemu-devel] [RFC 09/10] cpu: remove exit_request global.

2015-02-02 Thread Paolo Bonzini


On 29/01/2015 16:52, Peter Maydell wrote:
  +CPU_FOREACH(cpu) {
  +cpu-exit_loop_request = 1;
  +}
   }
 You can't do this -- this code is a signal handler so it could
 get run at any time including while the list of CPUs is being
 updated. (This is why we have the exit_request flag in the
 first place rather than just setting the exit_request flag in
 each CPU...)

Actually you can do this if you are careful.

In particular, you can do it while you are under the big QEMU lock.  If
you are not, basically you have to treat the CPU list as RCU-protected,
and this is doable because the CPU object cannot be removed and added
back into the CPU list.

Unfortunately RCU doesn't support QTAILQ, at least not yet, so you'd
have to convert the CPU list to QLIST.  But the basic idea of this patch
can be done.

Paolo



[Qemu-devel] [v4 03/13] migration: Add the framework of multi-thread decompression

2015-02-02 Thread Liang Li
Add the code to create and destroy the multiple threads those will be
used to do data decompression. Left some functions empty just to keep
clearness, and the code will be added later.

Signed-off-by: Liang Li liang.z...@intel.com
Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 arch_init.c   | 75 +++
 include/migration/migration.h |  4 +++
 migration/migration.c | 16 +
 3 files changed, 95 insertions(+)

diff --git a/arch_init.c b/arch_init.c
index 1831f1a..ed34eb3 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -24,6 +24,7 @@
 #include stdint.h
 #include stdarg.h
 #include stdlib.h
+#include zlib.h
 #ifndef _WIN32
 #include sys/types.h
 #include sys/mman.h
@@ -126,6 +127,7 @@ static uint64_t bitmap_sync_count;
 #define RAM_SAVE_FLAG_CONTINUE 0x20
 #define RAM_SAVE_FLAG_XBZRLE   0x40
 /* 0x80 is reserved in migration.h start with 0x100 next */
+#define RAM_SAVE_FLAG_COMPRESS_PAGE0x100
 
 static struct defconfig_file {
 const char *filename;
@@ -337,8 +339,16 @@ struct CompressParam {
 };
 typedef struct CompressParam CompressParam;
 
+struct DecompressParam {
+/* To be done */
+};
+typedef struct DecompressParam DecompressParam;
+
 static CompressParam *comp_param;
 static bool quit_thread;
+static DecompressParam *decomp_param;
+static QemuThread *decompress_threads;
+static uint8_t *compressed_data_buf;
 
 static void *do_data_compress(void *opaque)
 {
@@ -1128,10 +1138,58 @@ void ram_handle_compressed(void *host, uint8_t ch, 
uint64_t size)
 }
 }
 
+static void *do_data_decompress(void *opaque)
+{
+while (!quit_thread) {
+/* To be done */
+}
+
+return NULL;
+}
+
+void migrate_decompress_threads_create(int count)
+{
+int i;
+
+decompress_threads = g_new0(QemuThread, count);
+decomp_param = g_new0(DecompressParam, count);
+compressed_data_buf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
+quit_thread = false;
+for (i = 0; i  count; i++) {
+qemu_thread_create(decompress_threads + i, decompress,
+   do_data_decompress, decomp_param + i,
+   QEMU_THREAD_JOINABLE);
+}
+}
+
+void migrate_decompress_threads_join(void)
+{
+int i, thread_count;
+
+quit_thread = true;
+thread_count = migrate_decompress_threads();
+for (i = 0; i  thread_count; i++) {
+qemu_thread_join(decompress_threads + i);
+}
+g_free(decompress_threads);
+g_free(decomp_param);
+g_free(compressed_data_buf);
+decompress_threads = NULL;
+decomp_param = NULL;
+compressed_data_buf = NULL;
+}
+
+static void decompress_data_with_multi_threads(uint8_t *compbuf,
+   void *host, int len)
+{
+/* To be done */
+}
+
 static int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
 int flags = 0, ret = 0;
 static uint64_t seq_iter;
+int len = 0;
 
 seq_iter++;
 
@@ -1208,6 +1266,23 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 
 qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
 break;
+case RAM_SAVE_FLAG_COMPRESS_PAGE:
+host = host_from_stream_offset(f, addr, flags);
+if (!host) {
+error_report(Invalid RAM offset  RAM_ADDR_FMT, addr);
+ret = -EINVAL;
+break;
+}
+
+len = qemu_get_be32(f);
+if (len  0 || len  compressBound(TARGET_PAGE_SIZE)) {
+error_report(Invalid compressed data length: %d, len);
+ret = -EINVAL;
+break;
+}
+qemu_get_buffer(f, compressed_data_buf, len);
+decompress_data_with_multi_threads(compressed_data_buf, host, len);
+break;
 case RAM_SAVE_FLAG_XBZRLE:
 host = host_from_stream_offset(f, addr, flags);
 if (!host) {
diff --git a/include/migration/migration.h b/include/migration/migration.h
index daf6c81..0c4f21c 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -51,6 +51,7 @@ struct MigrationState
 QEMUFile *file;
 QemuThread *compress_thread;
 int compress_thread_count;
+int decompress_thread_count;
 int compress_level;
 
 int state;
@@ -112,6 +113,8 @@ MigrationState *migrate_get_current(void);
 
 void migrate_compress_threads_create(MigrationState *s);
 void migrate_compress_threads_join(MigrationState *s);
+void migrate_decompress_threads_create(int count);
+void migrate_decompress_threads_join(void);
 uint64_t ram_bytes_remaining(void);
 uint64_t ram_bytes_transferred(void);
 uint64_t ram_bytes_total(void);
@@ -164,6 +167,7 @@ int64_t xbzrle_cache_resize(int64_t new_size);
 bool migrate_use_compression(void);
 int migrate_compress_level(void);
 int migrate_compress_threads(void);
+int migrate_decompress_threads(void);
 
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void 

[Qemu-devel] [v4 13/13] migration: Add command to query migration parameter

2015-02-02 Thread Liang Li
Add the qmp and hmp commands to query the parameters used in live
migration.

Signed-off-by: Liang Li liang.z...@intel.com
Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 hmp-commands.hx   |  2 ++
 hmp.c | 21 +
 hmp.h |  1 +
 migration/migration.c | 27 +++
 monitor.c |  7 +++
 qapi-schema.json  | 11 +++
 qmp-commands.hx   | 26 ++
 7 files changed, 95 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 535b5ba..ed0c06a 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1779,6 +1779,8 @@ show user network stack connection states
 show migration status
 @item info migrate_capabilities
 show current migration capabilities
+@item info migrate_parameters
+show current migration parameters
 @item info migrate_cache_size
 show current migration XBZRLE cache size
 @item info balloon
diff --git a/hmp.c b/hmp.c
index faab4b0..33c95b3 100644
--- a/hmp.c
+++ b/hmp.c
@@ -246,6 +246,27 @@ void hmp_info_migrate_capabilities(Monitor *mon, const 
QDict *qdict)
 qapi_free_MigrationCapabilityStatusList(caps);
 }
 
+void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
+{
+MigrationParameterStatusList *params, *p;
+MigrationParameterInt *data;
+
+params = qmp_query_migrate_parameters(NULL);
+
+if (params) {
+monitor_printf(mon, parameters:);
+for (p = params; p; p = p-next) {
+data = (MigrationParameterInt *)p-value-data;
+monitor_printf(mon,  %s: % PRId64,
+   MigrationParameter_lookup[p-value-kind],
+   data-value);
+}
+monitor_printf(mon, \n);
+}
+
+qapi_free_MigrationParameterStatusList(params);
+}
+
 void hmp_info_migrate_cache_size(Monitor *mon, const QDict *qdict)
 {
 monitor_printf(mon, xbzrel cache size: % PRId64  kbytes\n,
diff --git a/hmp.h b/hmp.h
index 429efea..b2b2d2c 100644
--- a/hmp.h
+++ b/hmp.h
@@ -28,6 +28,7 @@ void hmp_info_chardev(Monitor *mon, const QDict *qdict);
 void hmp_info_mice(Monitor *mon, const QDict *qdict);
 void hmp_info_migrate(Monitor *mon, const QDict *qdict);
 void hmp_info_migrate_capabilities(Monitor *mon, const QDict *qdict);
+void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict);
 void hmp_info_migrate_cache_size(Monitor *mon, const QDict *qdict);
 void hmp_info_cpus(Monitor *mon, const QDict *qdict);
 void hmp_info_block(Monitor *mon, const QDict *qdict);
diff --git a/migration/migration.c b/migration/migration.c
index b5055dc..f925130 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -179,6 +179,33 @@ MigrationCapabilityStatusList 
*qmp_query_migrate_capabilities(Error **errp)
 return head;
 }
 
+MigrationParameterStatusList *qmp_query_migrate_parameters(Error **errp)
+{
+MigrationParameterStatusList *head = NULL;
+MigrationParameterStatusList *params;
+MigrationState *s = migrate_get_current();
+MigrationParameterInt *data;
+int i;
+
+params = NULL; /* silence compiler warning */
+for (i = 0; i  MIGRATION_PARAMETER_MAX; i++) {
+if (head == NULL) {
+head = g_malloc0(sizeof(*params));
+params = head;
+} else {
+params-next = g_malloc0(sizeof(*params));
+params = params-next;
+}
+params-value = g_malloc(sizeof(*params-value));
+params-value-kind = i;
+params-value-data = g_malloc(sizeof(MigrationParameterInt));
+data = (MigrationParameterInt *)params-value-data;
+data-value = s-parameters[i];
+}
+
+return head;
+}
+
 static void get_xbzrle_cache_stats(MigrationInfo *info)
 {
 if (migrate_use_xbzrle()) {
diff --git a/monitor.c b/monitor.c
index fa8ebde..58dfa28 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2872,6 +2872,13 @@ static mon_cmd_t info_cmds[] = {
 .mhandler.cmd = hmp_info_migrate_capabilities,
 },
 {
+.name   = migrate_parameters,
+.args_type  = ,
+.params = ,
+.help   = show current migration parameters,
+.mhandler.cmd = hmp_info_migrate_parameters,
+},
+{
 .name   = migrate_cache_size,
 .args_type  = ,
 .params = ,
diff --git a/qapi-schema.json b/qapi-schema.json
index 273f991..af34f7f 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -593,6 +593,17 @@
 { 'command': 'migrate-set-parameters',
   'data': { 'parameters': ['MigrationParameterStatus'] } }
 ##
+# @query-migrate-parameters
+#
+# Returns information about the current migration parameters status
+#
+# Returns: @MigrationParametersStatus
+#
+# Since: 2.3
+##
+{ 'command': 'query-migrate-parameters',
+  'returns': ['MigrationParameterStatus'] }
+##
 ##
 # @MouseInfo:
 #
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 9d16386..bcfe823 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3303,6 +3303,32 @@ EQMP
  

[Qemu-devel] [v4 04/13] qemu-file: Add compression functions to QEMUFile

2015-02-02 Thread Liang Li
qemu_put_compression_data() compress the data and put it to QEMUFile.
qemu_put_qemu_file() put the data in the buffer of source QEMUFile to
destination QEMUFile.

Signed-off-by: Liang Li liang.z...@intel.com
Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 include/migration/qemu-file.h |  3 +++
 migration/qemu-file.c | 39 +++
 2 files changed, 42 insertions(+)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index d843c00..2204fb9 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -159,6 +159,9 @@ void qemu_put_be32(QEMUFile *f, unsigned int v);
 void qemu_put_be64(QEMUFile *f, uint64_t v);
 int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset);
 int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size);
+size_t qemu_put_compression_data(QEMUFile *f, const uint8_t *p, size_t size,
+ int level);
+int qemu_put_qemu_file(QEMUFile *f_des, QEMUFile *f_src);
 /*
  * Note that you can only peek continuous bytes from where the current pointer
  * is; you aren't guaranteed to be able to peak to +n bytes unless you've
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index edc2830..de2da2d 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -21,6 +21,7 @@
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
  * THE SOFTWARE.
  */
+#include zlib.h
 #include qemu-common.h
 #include qemu/iov.h
 #include qemu/sockets.h
@@ -529,3 +530,41 @@ uint64_t qemu_get_be64(QEMUFile *f)
 v |= qemu_get_be32(f);
 return v;
 }
+
+/* compress size bytes of data start at p with specific compression
+ * leve and store the compressed data to the buffer of f.
+ */
+
+size_t qemu_put_compression_data(QEMUFile *f, const uint8_t *p, size_t size,
+ int level)
+{
+size_t blen = IO_BUF_SIZE - f-buf_index - sizeof(int32_t);
+
+if (blen  compressBound(size)) {
+return 0;
+}
+if (compress2(f-buf + f-buf_index + sizeof(int32_t), blen, (Bytef *)p,
+  size, level) != Z_OK) {
+error_report(Compress Failed!);
+return 0;
+}
+qemu_put_be32(f, blen);
+f-buf_index += blen;
+return blen + sizeof(int32_t);
+}
+
+/* Put the data in the buffer of f_src to the buffer of f_des, and
+ * then reset the buf_index of f_src to 0.
+ */
+
+int qemu_put_qemu_file(QEMUFile *f_des, QEMUFile *f_src)
+{
+int len = 0;
+
+if (f_src-buf_index  0) {
+len = f_src-buf_index;
+qemu_put_buffer(f_des, f_src-buf, f_src-buf_index);
+f_src-buf_index = 0;
+}
+return len;
+}
-- 
1.9.1




[Qemu-devel] [v4 01/13] docs: Add a doc about multiple thread compression

2015-02-02 Thread Liang Li
Give some details about the multiple thread (de)compression and
how to use it in live migration.

Signed-off-by: Liang Li liang.z...@intel.com
Signed-off-by: Yang Zhang yang.z.zh...@intel.com
Reviewed-by: Dr.David Alan Gilbert dgilb...@redhat.com
---
 docs/multi-thread-compression.txt | 149 ++
 1 file changed, 149 insertions(+)
 create mode 100644 docs/multi-thread-compression.txt

diff --git a/docs/multi-thread-compression.txt 
b/docs/multi-thread-compression.txt
new file mode 100644
index 000..0d4d212
--- /dev/null
+++ b/docs/multi-thread-compression.txt
@@ -0,0 +1,149 @@
+Use multiple thread (de)compression in live migration
+=
+Copyright (C) 2015 Intel Corporation
+Author: Liang Li liang.z...@intel.com
+
+This work is licensed under the terms of the GNU GPLv2 or later. See
+the COPYING file in the top-level directory.
+
+Contents:
+=
+* Introduction
+* When to use
+* Performance
+* Usage
+* TODO
+
+Introduction
+
+Instead of sending the guest memory directly, this solution will
+compress the RAM page before sending; after receiving, the data will
+be decompressed. Using compression in live migration can help
+to reduce the data transferred about 60%, this is very useful when the
+bandwidth is limited, and the total migration time can also be reduced
+about 70% in a typical case. In addition to this, the VM downtime can be
+reduced about 50%. The benefit depends on data's compressibility in VM.
+
+The process of compression will consume additional CPU cycles, and the
+extra CPU cycles will increase the migration time. On the other hand,
+the amount of data transferred will decrease; this factor can reduce
+the total migration time. If the process of the compression is quick
+enough, then the total migration time can be reduced, and multiple
+thread compression can be used to accelerate the compression process.
+
+The decompression speed of Zlib is at least 4 times as quick as
+compression, if the source and destination CPU have equal speed,
+keeping the compression thread count 4 times the decompression
+thread count can avoid CPU waste.
+
+Compression level can be used to control the compression speed and the
+compression ratio. High compression ratio will take more time, level 0
+stands for no compression, level 1 stands for the best compression
+speed, and level 9 stands for the best compression ratio. Users can
+select a level number between 0 and 9.
+
+
+When to use the multiple thread compression in live migration
+=
+Compression of data will consume extra CPU cycles; so in a system with
+high overhead of CPU, avoid using this feature. When the network
+bandwidth is very limited and the CPU resource is adequate, use of
+multiple thread compression will be very helpful. If both the CPU and
+the network bandwidth are adequate, use of multiple thread compression
+can still help to reduce the migration time.
+
+Performance
+===
+Test environment:
+
+CPU: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
+Socket Count: 2
+RAM: 128G
+NIC: Intel I350 (10/100/1000Mbps)
+Host OS: CentOS 7 64-bit
+Guest OS: RHEL 6.5 64-bit
+Parameter: qemu-system-x86_64 -enable-kvm -smp 4 -m 4096
+ /share/ia32e_rhel6u5.qcow -monitor stdio
+
+There is no additional application is running on the guest when doing
+the test.
+
+
+Speed limit: 1000Gb/s
+---
+| original  | compress thread: 8
+|   way | decompress thread: 2
+|   | compression level: 1
+---
+total time(msec):   |   |  1833
+---
+downtime(msec): |100|   27
+---
+transferred ram(kB):|  363536   | 107819
+---
+throughput(mbps):   |  893.73   | 482.22
+---
+total ram(kB):  |  4211524  | 4211524
+---
+
+There is an application running on the guest which write random numbers
+to RAM block areas periodically.
+
+Speed limit: 1000Gb/s
+---
+| original  | compress thread: 8
+|   way | decompress thread: 2
+|   | compression level: 1
+---
+total time(msec):   |   37369   | 15989
+---
+downtime(msec): |337|  173
+---
+transferred ram(kB):|  4274143  | 1699824

Re: [Qemu-devel] [PATCH 2/2] bootdevice: add check in restore_boot_order()

2015-02-02 Thread Markus Armbruster
Gonglei arei.gong...@huawei.com writes:

 On 2015/1/30 20:32, Markus Armbruster wrote:

 Gonglei arei.gong...@huawei.com writes:
 
 On 2015/1/30 20:01, Markus Armbruster wrote:

 Gonglei arei.gong...@huawei.com writes:

 On 2015/1/30 15:46, Markus Armbruster wrote:

 Gonglei arei.gong...@huawei.com writes:

 On 2015/1/30 0:03, Alexander Graf wrote:



 On 29.01.15 14:29, arei.gong...@huawei.com wrote:
 From: Gonglei arei.gong...@huawei.com

 If boot order is invaild or is set failed,
 exit qemu.

 Signed-off-by: Gonglei arei.gong...@huawei.com

 Do we really want to kill the machine only because the boot device
 string doesn't validate?


 Not all of the situation. If people want to change boot order by qmp/hmp
 command, it just report an error, please see do_boot_set(). But
 if the boot
 order is set in qemu command line, it will exit qemu if the boot
 device string
 is invalidate, as this patch's situation, which follow the original
 processing
 way (commit ef3adf68).

 I think Alex isn't concerned about the monitor command, but what happens
 when boot order once is reset to order on system reset.

 -boot errors should have been detected during command line processing
 (strongly preferred) or initial startup (acceptable).  Detecting

 Yes, and it had done it just like that, please see main() of
 vl.c. So, actually
 it wouldn't fail in the check of restore_boot_order function's calling.
 The only possible fails will happen to call boot_set_handler(). Take
 x86 pc machine example, set_boot_dev() callback  may return errors.

 I don't like unreachable error messages.  If qemu_boot_set() can't fail
 in restore_boot_order(), then simply assert it doesn't fail, by passing
 error_abort.


 Sorry, I meant the validate_bootdevices() can't fail in 
 restore_boot_order(),
 but boot_set_handler(boot_set_opaque, boot_order, errp) may fail, such as
 set_boot_dev(). For example:
 x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 4096 -boot
 menu=on,order=nbcdep,once=c -monitor stdio -vnc :0
 QEMU 2.2.50 monitor - type 'help' for more information
 (qemu) system_reset
 (qemu) qemu-system-x86_64: Too many boot devices for PC
 
 The value of parameter order should be checked during command line
 processing (strongly preferred) or initial startup (acceptable) if at
 all possible.  Is it possible?

 Either 'once' option or 'order' option can take effect for -boot at
 the same time,
 that is say initial startup processing can check only one. Besides,
 the check is just for
 corresponding machine type, so command line processing also can't do it.

I challenge your idea that we can't check this before the guest starts
running.

qemu_boot_set() can fail for two reasons:

* validate_bootdevices() fails

  Should never happen, because we've called it in main() already,
  treating failure as fatal error.

* boot_set_handler is null

  MachineClass method init() may set this.  main() could *easily* test
  whether it did!  If it didn't, and -boot once is given, error out.
  Similar checks exist already, e.g. drive_check_orphaned(),
  net_check_clients().  They only warn, but that's detail.



Re: [Qemu-devel] [PATCH 04/19] libqos/ahci: Add command header helpers

2015-02-02 Thread Paolo Bonzini


On 30/01/2015 19:41, John Snow wrote:
 +/* Construct our Command Header (set_command_header handles endianness.) 
 */
 +memset(cmd, 0x00, sizeof(cmd));
 +cmd.flags = 5; /* reg_h2d_fis is 5 double-words long */
 +cmd.flags = 0x400; /* clear PxTFD.STS.BSY when done */

And here |= became = ?

Paolo

 +cmd.prdtl = 1; /* One PRD table entry. */



Re: [Qemu-devel] [PATCH 03/19] qtest/ahci: rename 'Command' to 'CommandHeader'

2015-02-02 Thread Paolo Bonzini


On 30/01/2015 19:41, John Snow wrote:
 @@ -703,8 +703,8 @@ static void ahci_test_identify(AHCIQState *ahci)
  /* Copy the existing Command #0 structure from the CLB into local memory,
   * and build a new command #0. */
  memread(ahci-port[i].clb, cmd, sizeof(cmd));
 -cmd.b1 = 5;/* reg_h2d_fis is 5 double-words long */
 -cmd.b2 = 0x04; /* clear PxTFD.STS.BSY when done */
 +cmd.flags = 5;/* reg_h2d_fis is 5 double-words long */
 +cmd.flags |= 0x400; /* clear PxTFD.STS.BSY when done */

Missing cpu_to_le16, I think.

Paolo



Re: [Qemu-devel] [PATCH 19/19] qtest/ahci: Adding simple dma read-write test

2015-02-02 Thread Paolo Bonzini


On 30/01/2015 19:42, John Snow wrote:
 +memwrite(ptr, tx, bufsize);
 +
 +/* Write this buffer to disk, then read it back to the DMA buffer. */
 +ahci_guest_io(ahci, px, CMD_WRITE_DMA, ptr, bufsize);

I would qmemset the buffer here.

Paolo

 +ahci_guest_io(ahci, px, CMD_READ_DMA, ptr, bufsize);
 +
 +/*** Read back the Data ***/
 +memread(ptr, rx, bufsize);
 +g_assert_cmphex(memcmp(tx, rx, bufsize), ==, 0);



Re: [Qemu-devel] [PATCH 17/19] qtest/ahci: Add a macro bootup routine

2015-02-02 Thread Paolo Bonzini


On 30/01/2015 19:42, John Snow wrote:
 +/**
 + * Boot and fully enable the HBA device.
 + * @see ahci_boot, ahci_pci_enable and ahci_hba_enable.
 + */
 +static AHCIQState *ahci_macro_bootup(void)

Ugly name...  I would just leave out this patch.

Paolo

 +{
 +AHCIQState *ahci;
 +ahci = ahci_boot();
 +
 +ahci_pci_enable(ahci);
 +ahci_hba_enable(ahci);
 +
 +return ahci;
 +}



[Qemu-devel] [v4 10/13] migration: Add the core code for decompression

2015-02-02 Thread Liang Li
Implement the core logic of multiple thread decompression,
the decompression can work now.

Signed-off-by: Liang Li liang.z...@intel.com
Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
 arch_init.c | 49 +++--
 1 file changed, 47 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 8ef0315..549fdbb 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -814,6 +814,13 @@ static inline void start_compression(CompressParam *param)
 qemu_mutex_unlock(param-mutex);
 }
 
+static inline void start_decompression(DecompressParam *param)
+{
+qemu_mutex_lock(param-mutex);
+param-busy = true;
+qemu_cond_signal(param-cond);
+qemu_mutex_unlock(param-mutex);
+}
 
 static uint64_t bytes_transferred;
 
@@ -1347,8 +1354,27 @@ void ram_handle_compressed(void *host, uint8_t ch, 
uint64_t size)
 
 static void *do_data_decompress(void *opaque)
 {
+DecompressParam *param = opaque;
+size_t pagesize;
+
 while (!quit_thread) {
-/* To be done */
+qemu_mutex_lock(param-mutex);
+while (!param-busy) {
+qemu_cond_wait(param-cond, param-mutex);
+if (quit_thread) {
+break;
+}
+pagesize = TARGET_PAGE_SIZE;
+/* uncompress() will return failed in some case, especially
+ * when the page is dirted when doing the compression, it's
+ * not a problem because the dirty page will be retransferred
+ * and uncompress() won't break the data in other pages.
+ */
+uncompress((Bytef *)param-des, pagesize,
+   (const Bytef *)param-compbuf, param-len);
+param-busy = false;
+}
+qemu_mutex_unlock(param-mutex);
 }
 
 return NULL;
@@ -1379,6 +1405,9 @@ void migrate_decompress_threads_join(void)
 quit_thread = true;
 thread_count = migrate_decompress_threads();
 for (i = 0; i  thread_count; i++) {
+qemu_cond_signal(decomp_param[i].cond);
+}
+for (i = 0; i  thread_count; i++) {
 qemu_thread_join(decompress_threads + i);
 qemu_mutex_destroy(decomp_param[i].mutex);
 qemu_cond_destroy(decomp_param[i].cond);
@@ -1395,7 +1424,23 @@ void migrate_decompress_threads_join(void)
 static void decompress_data_with_multi_threads(uint8_t *compbuf,
void *host, int len)
 {
-/* To be done */
+int idx, thread_count;
+
+thread_count = migrate_decompress_threads();
+while (true) {
+for (idx = 0; idx  thread_count; idx++) {
+if (!decomp_param[idx].busy) {
+memcpy(decomp_param[idx].compbuf, compbuf, len);
+decomp_param[idx].des = host;
+decomp_param[idx].len = len;
+start_decompression(decomp_param[idx]);
+break;
+}
+}
+if (idx  thread_count) {
+break;
+}
+}
 }
 
 static int ram_load(QEMUFile *f, void *opaque, int version_id)
-- 
1.9.1




[Qemu-devel] [PATCH v4 0/13] migration: Add a new feature to do live migration

2015-02-02 Thread Liang Li
This feature can help to reduce the data transferred about 60%, and the
migration time can also be reduced about 70%.

Summary of changed from v3-v4

-Update the test data in the document
-Fix some typo errors 
-Use compressBound instead of MACRO define
-Optimize the performance when compression co-work with XBZRLE
-Added some comments
-Rename some functions and variables
-Incorrect coding style fix
-QMP type change for compatibility

-- 
1.9.1



  1   2   >