Re: [Qemu-devel] [PATCH v2 3/4] block: New bdrv_add_key(), convert monitor to use it

2015-01-30 Thread Max Reitz

On 2015-01-29 at 04:37, Markus Armbruster wrote:

Signed-off-by: Markus Armbruster arm...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
  block.c   | 29 +
  blockdev.c| 24 ++--
  include/block/block.h |  1 +
  monitor.c | 16 +++-
  qmp.c |  8 
  5 files changed, 47 insertions(+), 31 deletions(-)


Reviewed-by: Max Reitz mre...@redhat.com



Re: [Qemu-devel] [PATCH v2 3/7] softfloat: Convert `*_default_nan' variables into inline functions

2015-01-30 Thread Maciej W. Rozycki
On Fri, 30 Jan 2015, Leon Alrae wrote:

  @@ -760,6 +760,6 @@ static inline int float128_is_any_nan(fl
   
  /*
   | The pattern for a default generated quadruple-precision NaN.
   
  **/
  -extern const float128 float128_default_nan;
  +__inline__ float128 float128_default_nan(void);
   
 
 Unfortunately clang complains about it and eventually it won't link:
 
   CCaarch64-softmmu/target-arm/helper-a64.o
 In file included from /qemu/target-arm/helper-a64.c:20:
 In file included from /qemu/target-arm/cpu.h:37:
 In file included from /qemu/include/qemu-common.h:120:
 In file included from /qemu/include/qemu/bswap.h:8:
 /qemu/include/fpu/softfloat.h:485:20: warning: inline function
 'float32_default_nan' is not defined [-Wundefined-inline]
 __inline__ float32 float32_default_nan(void);
^
 /qemu/target-arm/helper-a64.c:345:19: note: used here
 nan = float32_default_nan();
   ^
 In file included from /qemu/target-arm/helper-a64.c:20:
 In file included from /qemu/target-arm/cpu.h:37:
 In file included from /qemu/include/qemu-common.h:120:
 In file included from /qemu/include/qemu/bswap.h:8:
 /qemu/include/fpu/softfloat.h:597:20: warning: inline function
 'float64_default_nan' is not defined [-Wundefined-inline]
 __inline__ float64 float64_default_nan(void);
^
 /qemu/target-arm/helper-a64.c:374:19: note: used here
 nan = float64_default_nan();
   ^
 2 warnings generated.
   CCaarch64-softmmu/target-arm/gdbstub64.o
   CCaarch64-softmmu/target-arm/crypto_helper.o
   GEN   trace/generated-helpers.c
   CCaarch64-softmmu/trace/generated-helpers.o
   LINK  aarch64-softmmu/qemu-system-aarch64
 fpu/softfloat.o: In function `commonNaNToFloat64':
 /qemu/fpu/softfloat-specialize.h:796: undefined reference to
 `float64_default_nan'
 /qemu/fpu/softfloat-specialize.h:805: undefined reference to
 `float64_default_nan'
 fpu/softfloat.o: In function `commonNaNToFloatx80':
 /qemu/fpu/softfloat-specialize.h:1007: undefined reference to
 `floatx80_default_nan'
 /qemu/fpu/softfloat-specialize.h:1014: undefined reference to
 `floatx80_default_nan'

 Hmm, so perhaps my idea for a later improvement:

  Eventually we might want to move the new inline functions into a
 separate header to be included from softfloat.h instead of softfloat.c,
 but let's make changes one step at a time.

will actually have to be made right away.  I suspect GCC is more liberal 
here due to its convoluted extern/static/inline semantics history.  
Sigh...

  Maciej



Re: [Qemu-devel] [PATCH v2 00/17] ide: rerror and werror support for IDE and AHCI

2015-01-30 Thread John Snow



On 01/30/2015 04:38 AM, Paolo Bonzini wrote:



On 30/01/2015 01:44, John Snow wrote:

Post-holiday bump that this is sitting out there, awaiting love.
If this gets merged, we should be able to enable Q35 migration soon,
which would be nice.


Not sure how valuable my opinion is as the author of around 85% of the
series...  So I reviewed the rest.

Paolo



Understood, thank you!



[Qemu-devel] [PULL 02/11] rcu: add rcutorture

2015-01-30 Thread Paolo Bonzini
rcutorture is the unit test for rcu.

Reviewed-by: Fam Zheng f...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 tests/Makefile |   7 +-
 tests/rcutorture.c | 448 +
 2 files changed, 454 insertions(+), 1 deletion(-)
 create mode 100644 tests/rcutorture.c

diff --git a/tests/Makefile b/tests/Makefile
index c2e2e52..db5b3c3 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -60,6 +60,8 @@ gcov-files-test-mul64-y = util/host-utils.c
 check-unit-y += tests/test-int128$(EXESUF)
 # all code tested by test-int128 is inside int128.h
 gcov-files-test-int128-y =
+check-unit-y += tests/rcutorture$(EXESUF)
+gcov-files-rcutorture-y = util/rcu.c
 check-unit-y += tests/test-bitops$(EXESUF)
 check-unit-$(CONFIG_HAS_GLIB_SUBPROCESS_TESTS) += 
tests/test-qdev-global-props$(EXESUF)
 check-unit-y += tests/check-qom-interface$(EXESUF)
@@ -223,7 +225,8 @@ test-obj-y = tests/check-qint.o tests/check-qstring.o 
tests/check-qdict.o \
tests/test-qmp-input-visitor.o tests/test-qmp-input-strict.o \
tests/test-qmp-commands.o tests/test-visitor-serialization.o \
tests/test-x86-cpuid.o tests/test-mul64.o tests/test-int128.o \
-   tests/test-opts-visitor.o tests/test-qmp-event.o
+   tests/test-opts-visitor.o tests/test-qmp-event.o \
+   tests/rcutorture.o
 
 test-qapi-obj-y = tests/test-qapi-visit.o tests/test-qapi-types.o \
  tests/test-qapi-event.o
@@ -252,6 +255,8 @@ tests/test-x86-cpuid$(EXESUF): tests/test-x86-cpuid.o
 tests/test-xbzrle$(EXESUF): tests/test-xbzrle.o migration/xbzrle.o 
page_cache.o libqemuutil.a
 tests/test-cutils$(EXESUF): tests/test-cutils.o util/cutils.o
 tests/test-int128$(EXESUF): tests/test-int128.o
+tests/rcutorture$(EXESUF): tests/rcutorture.o libqemuutil.a
+
 tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \
hw/core/qdev.o hw/core/qdev-properties.o hw/core/hotplug.o\
hw/core/irq.o \
diff --git a/tests/rcutorture.c b/tests/rcutorture.c
new file mode 100644
index 000..e370843
--- /dev/null
+++ b/tests/rcutorture.c
@@ -0,0 +1,448 @@
+/*
+ * rcutorture.c: simple user-level performance/stress test of RCU.
+ *
+ * Usage:
+ * ./rcu nreaders rperf [ seconds ]
+ * Run a read-side performance test with the specified
+ * number of readers for seconds seconds.
+ * ./rcu nupdaters uperf [ seconds ]
+ * Run an update-side performance test with the specified
+ * number of updaters and specified duration.
+ * ./rcu nreaders perf [ seconds ]
+ * Run a combined read/update performance test with the specified
+ * number of readers and one updater and specified duration.
+ *
+ * The above tests produce output as follows:
+ *
+ * n_reads: 46008000  n_updates: 146026  nreaders: 2  nupdaters: 1 duration: 1
+ * ns/read: 43.4707  ns/update: 6848.1
+ *
+ * The first line lists the total number of RCU reads and updates executed
+ * during the test, the number of reader threads, the number of updater
+ * threads, and the duration of the test in seconds.  The second line
+ * lists the average duration of each type of operation in nanoseconds,
+ * or nan if the corresponding type of operation was not performed.
+ *
+ * ./rcu nreaders stress [ seconds ]
+ * Run a stress test with the specified number of readers and
+ * one updater.
+ *
+ * This test produces output as follows:
+ *
+ * n_reads: 114633217  n_updates: 3903415  n_mberror: 0
+ * rcu_stress_count: 114618391 14826 0 0 0 0 0 0 0 0 0
+ *
+ * The first line lists the number of RCU read and update operations
+ * executed, followed by the number of memory-ordering violations
+ * (which will be zero in a correct RCU implementation).  The second
+ * line lists the number of readers observing progressively more stale
+ * data.  A correct RCU implementation will have all but the first two
+ * numbers non-zero.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (c) 2008 Paul E. McKenney, IBM Corporation.
+ */
+
+/*
+ * Test variables.
+ */
+
+#include glib.h
+#include stdlib.h
+#include stdio.h
+#include string.h
+#include qemu/atomic.h
+#include qemu/rcu.h
+#include qemu/compiler.h
+#include qemu/thread.h
+
+long long n_reads = 0LL;
+long n_updates = 0L;

Re: [Qemu-devel] [PATCH 5/7] block/raw-posix: call plain fallocate in handle_aiocb_write_zeroes

2015-01-30 Thread Denis V. Lunev

On 30/01/15 18:42, Max Reitz wrote:

On 2015-01-30 at 10:41, Denis V. Lunev wrote:

On 30/01/15 17:58, Max Reitz wrote:

On 2015-01-30 at 03:42, Denis V. Lunev wrote:
There is a possibility that we are extending our image and thus 
writing

zeroes beyond the end of the file. In this case we do not need to care
about the hole to make sure that there is no data in the file under
this offset (pre-condition to fallocate(0) to work). We could 
simply call

fallocate(0).

This improves the performance of writing zeroes even on really old
platforms which do not have even FALLOC_FL_PUNCH_HOLE.

Before the patch do_fallocate was used when either
CONFIG_FALLOCATE_PUNCH_HOLE or CONFIG_FALLOCATE_ZERO_RANGE are 
defined.

Now the story is different. CONFIG_FALLOCATE is defined when Linux
fallocate is defined, posix_fallocate is completely different story
(CONFIG_POSIX_FALLOCATE). CONFIG_FALLOCATE is mandatory prerequite
for both CONFIG_FALLOCATE_PUNCH_HOLE and CONFIG_FALLOCATE_ZERO_RANGE
thus we are on the safe side.

Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Max Reitz mre...@redhat.com
CC: Kevin Wolf kw...@redhat.com
CC: Stefan Hajnoczi stefa...@redhat.com
CC: Peter Lieven p...@kamp.de
CC: Fam Zheng f...@redhat.com
---
  block/raw-posix.c | 14 +-
  1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 5a777e7..1c88ad8 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -147,6 +147,7 @@ typedef struct BDRVRawState {
  bool has_discard:1;
  bool has_write_zeroes:1;
  bool discard_zeroes:1;
+bool has_fallocate;
  bool needs_alignment;
  } BDRVRawState;
  @@ -452,6 +453,7 @@ static int raw_open_common(BlockDriverState 
*bs, QDict *options,

  }
  if (S_ISREG(st.st_mode)) {
  s-discard_zeroes = true;
+s-has_fallocate = true;


This could be moved upwards where has_discard and has_write_zeroes 
are initialized; but it won't matter in practice, I hope. Thus:


Reviewed-by: Max Reitz mre...@redhat.com


This does matter as has_discard and has_write_zeroes are bit fields
thus I can not insert something useful into the middle of those
fields.


Right, but I did not mean the placement inside of the structure but 
the placement of the initialization statement (s-has_fallocate = 
true) in raw_open_common().


Max

hmm, you are right. This is possible but I don't want
to have this bit set for block/character etc devices
even if they are not using this bit/code. With my
approach the assignment is made in a way to indicate
application area.

Thank you for a review :) It is somewhat difficult
to obtain feedback here in comparison with Linux
kernel.



Re: [Qemu-devel] [PATCH 5/7] block/raw-posix: call plain fallocate in handle_aiocb_write_zeroes

2015-01-30 Thread Denis V. Lunev

On 30/01/15 17:58, Max Reitz wrote:

On 2015-01-30 at 03:42, Denis V. Lunev wrote:

There is a possibility that we are extending our image and thus writing
zeroes beyond the end of the file. In this case we do not need to care
about the hole to make sure that there is no data in the file under
this offset (pre-condition to fallocate(0) to work). We could simply 
call

fallocate(0).

This improves the performance of writing zeroes even on really old
platforms which do not have even FALLOC_FL_PUNCH_HOLE.

Before the patch do_fallocate was used when either
CONFIG_FALLOCATE_PUNCH_HOLE or CONFIG_FALLOCATE_ZERO_RANGE are defined.
Now the story is different. CONFIG_FALLOCATE is defined when Linux
fallocate is defined, posix_fallocate is completely different story
(CONFIG_POSIX_FALLOCATE). CONFIG_FALLOCATE is mandatory prerequite
for both CONFIG_FALLOCATE_PUNCH_HOLE and CONFIG_FALLOCATE_ZERO_RANGE
thus we are on the safe side.

Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Max Reitz mre...@redhat.com
CC: Kevin Wolf kw...@redhat.com
CC: Stefan Hajnoczi stefa...@redhat.com
CC: Peter Lieven p...@kamp.de
CC: Fam Zheng f...@redhat.com
---
  block/raw-posix.c | 14 +-
  1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 5a777e7..1c88ad8 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -147,6 +147,7 @@ typedef struct BDRVRawState {
  bool has_discard:1;
  bool has_write_zeroes:1;
  bool discard_zeroes:1;
+bool has_fallocate;
  bool needs_alignment;
  } BDRVRawState;
  @@ -452,6 +453,7 @@ static int raw_open_common(BlockDriverState 
*bs, QDict *options,

  }
  if (S_ISREG(st.st_mode)) {
  s-discard_zeroes = true;
+s-has_fallocate = true;


This could be moved upwards where has_discard and has_write_zeroes are 
initialized; but it won't matter in practice, I hope. Thus:


Reviewed-by: Max Reitz mre...@redhat.com


This does matter as has_discard and has_write_zeroes are bit fields
thus I can not insert something useful into the middle of those
fields.



Re: [Qemu-devel] [PATCH 5/7] block/raw-posix: call plain fallocate in handle_aiocb_write_zeroes

2015-01-30 Thread Max Reitz

On 2015-01-30 at 10:41, Denis V. Lunev wrote:

On 30/01/15 17:58, Max Reitz wrote:

On 2015-01-30 at 03:42, Denis V. Lunev wrote:

There is a possibility that we are extending our image and thus writing
zeroes beyond the end of the file. In this case we do not need to care
about the hole to make sure that there is no data in the file under
this offset (pre-condition to fallocate(0) to work). We could simply 
call

fallocate(0).

This improves the performance of writing zeroes even on really old
platforms which do not have even FALLOC_FL_PUNCH_HOLE.

Before the patch do_fallocate was used when either
CONFIG_FALLOCATE_PUNCH_HOLE or CONFIG_FALLOCATE_ZERO_RANGE are defined.
Now the story is different. CONFIG_FALLOCATE is defined when Linux
fallocate is defined, posix_fallocate is completely different story
(CONFIG_POSIX_FALLOCATE). CONFIG_FALLOCATE is mandatory prerequite
for both CONFIG_FALLOCATE_PUNCH_HOLE and CONFIG_FALLOCATE_ZERO_RANGE
thus we are on the safe side.

Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Max Reitz mre...@redhat.com
CC: Kevin Wolf kw...@redhat.com
CC: Stefan Hajnoczi stefa...@redhat.com
CC: Peter Lieven p...@kamp.de
CC: Fam Zheng f...@redhat.com
---
  block/raw-posix.c | 14 +-
  1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 5a777e7..1c88ad8 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -147,6 +147,7 @@ typedef struct BDRVRawState {
  bool has_discard:1;
  bool has_write_zeroes:1;
  bool discard_zeroes:1;
+bool has_fallocate;
  bool needs_alignment;
  } BDRVRawState;
  @@ -452,6 +453,7 @@ static int raw_open_common(BlockDriverState 
*bs, QDict *options,

  }
  if (S_ISREG(st.st_mode)) {
  s-discard_zeroes = true;
+s-has_fallocate = true;


This could be moved upwards where has_discard and has_write_zeroes 
are initialized; but it won't matter in practice, I hope. Thus:


Reviewed-by: Max Reitz mre...@redhat.com


This does matter as has_discard and has_write_zeroes are bit fields
thus I can not insert something useful into the middle of those
fields.


Right, but I did not mean the placement inside of the structure but the 
placement of the initialization statement (s-has_fallocate = true) in 
raw_open_common().


Max



Re: [Qemu-devel] [PATCH v2 2/4] blockdev: Eliminate silly QERR_BLOCK_JOB_NOT_ACTIVE macro

2015-01-30 Thread Max Reitz

On 2015-01-29 at 04:36, Markus Armbruster wrote:

From: Markus Armbruster arm...@pond.sub.org


Intentional?


The QERR_ macros are leftovers from the days of rich error objects.
They're used with error_set() and qerror_report(), and expand into the
first *two* arguments.  This trickiness has become pointless.  Clean
this one up.

Signed-off-by: Markus Armbruster arm...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
  blockdev.c| 3 ++-
  include/qapi/qmp/qerror.h | 3 ---
  2 files changed, 2 insertions(+), 4 deletions(-)


Reviewed-by: Max Reitz mre...@redhat.com



Re: [Qemu-devel] [PATCH v7 5/5] qemu-iotests: Add 093 for IO throttling

2015-01-30 Thread Max Reitz

On 2015-01-29 at 21:49, Fam Zheng wrote:

This case utilizes qemu-io command aio_{read,write} -q to verify the
effectiveness of IO throttling options.

It's implemented by driving the vm timer from qtest protocol, so the
throttling timers are signaled with determinied time duration. Then we
verify the completed IO requests are within 10% error of bps and iops
limits.

null protocol is used as the disk backend so that no actual disk IO is
performed on host, this will make the blockstats much more
deterministic. Both null-aio and null-co are covered, which is also
a simple cross validation test for the driver code.

Signed-off-by: Fam Zheng f...@redhat.com
---
  tests/qemu-iotests/093 | 114 +
  tests/qemu-iotests/093.out |   5 ++
  tests/qemu-iotests/group   |   1 +
  3 files changed, 120 insertions(+)
  create mode 100755 tests/qemu-iotests/093
  create mode 100644 tests/qemu-iotests/093.out


Reviewed-by: Max Reitz mre...@redhat.com



[Qemu-devel] vm live storage migration approach.

2015-01-30 Thread Yaodong Yang
Hi all,

I'm investigating the current schemes for the VM live storage migration in
QEMU system. I have the following questions:

1. What is the functionality of drive_mirror in QEMU? Is it designed as a
VM live storage migration approach?

2. What's the difference between drive_mirror and vMotion? I learned that
vMotion employs IO Mirroring mechanism to migration a running VM with all
the virtual disk images. Are there any other mechanisms inside QEMU serve
this purpose as well?

I'm looking for a mechanism in QEMU, which is similar to vMotion ( IO
Mirroring) in ESX environment.

Thanks!
Yaodong


[Qemu-devel] [PULL 06/11] memory: protect current_map by RCU

2015-01-30 Thread Paolo Bonzini
Replace the flat_view_mutex with RCU, avoiding futex contention for
dataplane on large systems and many iothreads.

Reviewed-by: Fam Zheng f...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 include/exec/memory.h |  5 +
 memory.c  | 54 ++-
 2 files changed, 28 insertions(+), 31 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 0cd96b1..06ffa1d 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -33,6 +33,7 @@
 #include qemu/notify.h
 #include qapi/error.h
 #include qom/object.h
+#include qemu/rcu.h
 
 #define MAX_PHYS_ADDR_SPACE_BITS 62
 #define MAX_PHYS_ADDR(((hwaddr)1  MAX_PHYS_ADDR_SPACE_BITS) - 1)
@@ -207,9 +208,13 @@ struct MemoryListener {
  */
 struct AddressSpace {
 /* All fields are private. */
+struct rcu_head rcu;
 char *name;
 MemoryRegion *root;
+
+/* Accessed via RCU.  */
 struct FlatView *current_map;
+
 int ioeventfd_nb;
 struct MemoryRegionIoeventfd *ioeventfds;
 struct AddressSpaceDispatch *dispatch;
diff --git a/memory.c b/memory.c
index 8c3d8c0..a844ced 100644
--- a/memory.c
+++ b/memory.c
@@ -33,26 +33,12 @@ static bool memory_region_update_pending;
 static bool ioeventfd_update_pending;
 static bool global_dirty_log = false;
 
-/* flat_view_mutex is taken around reading as-current_map; the critical
- * section is extremely short, so I'm using a single mutex for every AS.
- * We could also RCU for the read-side.
- *
- * The BQL is taken around transaction commits, hence both locks are taken
- * while writing to as-current_map (with the BQL taken outside).
- */
-static QemuMutex flat_view_mutex;
-
 static QTAILQ_HEAD(memory_listeners, MemoryListener) memory_listeners
 = QTAILQ_HEAD_INITIALIZER(memory_listeners);
 
 static QTAILQ_HEAD(, AddressSpace) address_spaces
 = QTAILQ_HEAD_INITIALIZER(address_spaces);
 
-static void memory_init(void)
-{
-qemu_mutex_init(flat_view_mutex);
-}
-
 typedef struct AddrRange AddrRange;
 
 /*
@@ -242,6 +228,7 @@ struct FlatRange {
  * order.
  */
 struct FlatView {
+struct rcu_head rcu;
 unsigned ref;
 FlatRange *ranges;
 unsigned nr;
@@ -654,10 +641,10 @@ static FlatView *address_space_get_flatview(AddressSpace 
*as)
 {
 FlatView *view;
 
-qemu_mutex_lock(flat_view_mutex);
-view = as-current_map;
+rcu_read_lock();
+view = atomic_rcu_read(as-current_map);
 flatview_ref(view);
-qemu_mutex_unlock(flat_view_mutex);
+rcu_read_unlock();
 return view;
 }
 
@@ -766,10 +753,9 @@ static void address_space_update_topology(AddressSpace *as)
 address_space_update_topology_pass(as, old_view, new_view, false);
 address_space_update_topology_pass(as, old_view, new_view, true);
 
-qemu_mutex_lock(flat_view_mutex);
-flatview_unref(as-current_map);
-as-current_map = new_view;
-qemu_mutex_unlock(flat_view_mutex);
+/* Writes are protected by the BQL.  */
+atomic_rcu_set(as-current_map, new_view);
+call_rcu(old_view, flatview_unref, rcu);
 
 /* Note that all the old MemoryRegions are still alive up to this
  * point.  This relieves most MemoryListeners from the need to
@@ -1957,10 +1943,6 @@ void memory_listener_unregister(MemoryListener *listener)
 
 void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name)
 {
-if (QTAILQ_EMPTY(address_spaces)) {
-memory_init();
-}
-
 memory_region_transaction_begin();
 as-root = root;
 as-current_map = g_new(FlatView, 1);
@@ -1974,15 +1956,10 @@ void address_space_init(AddressSpace *as, MemoryRegion 
*root, const char *name)
 memory_region_transaction_commit();
 }
 
-void address_space_destroy(AddressSpace *as)
+static void do_address_space_destroy(AddressSpace *as)
 {
 MemoryListener *listener;
 
-/* Flush out anything from MemoryListeners listening in on this */
-memory_region_transaction_begin();
-as-root = NULL;
-memory_region_transaction_commit();
-QTAILQ_REMOVE(address_spaces, as, address_spaces_link);
 address_space_destroy_dispatch(as);
 
 QTAILQ_FOREACH(listener, memory_listeners, link) {
@@ -1994,6 +1971,21 @@ void address_space_destroy(AddressSpace *as)
 g_free(as-ioeventfds);
 }
 
+void address_space_destroy(AddressSpace *as)
+{
+/* Flush out anything from MemoryListeners listening in on this */
+memory_region_transaction_begin();
+as-root = NULL;
+memory_region_transaction_commit();
+QTAILQ_REMOVE(address_spaces, as, address_spaces_link);
+
+/* At this point, as-dispatch and as-current_map are dummy
+ * entries that the guest should never use.  Wait for the old
+ * values to expire before freeing the data.
+ */
+call_rcu(as, do_address_space_destroy, rcu);
+}
+
 bool io_mem_read(MemoryRegion *mr, hwaddr addr, uint64_t *pval, unsigned size)
 {
 return memory_region_dispatch_read(mr, addr, pval, size);
-- 
1.8.3.1





[Qemu-devel] [PULL 03/11] rcu: allow nesting of rcu_read_lock/rcu_read_unlock

2015-01-30 Thread Paolo Bonzini
Reviewed-by: Fam Zheng f...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 include/qemu/rcu.h | 15 ++-
 tests/rcutorture.c |  2 ++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
index cfef36e..da043f2 100644
--- a/include/qemu/rcu.h
+++ b/include/qemu/rcu.h
@@ -68,6 +68,9 @@ struct rcu_reader_data {
 unsigned long ctr;
 bool waiting;
 
+/* Data used by reader only */
+unsigned depth;
+
 /* Data used for registry, protected by rcu_gp_lock */
 QLIST_ENTRY(rcu_reader_data) node;
 };
@@ -77,8 +80,13 @@ extern __thread struct rcu_reader_data rcu_reader;
 static inline void rcu_read_lock(void)
 {
 struct rcu_reader_data *p_rcu_reader = rcu_reader;
+unsigned ctr;
+
+if (p_rcu_reader-depth++  0) {
+return;
+}
 
-unsigned ctr = atomic_read(rcu_gp_ctr);
+ctr = atomic_read(rcu_gp_ctr);
 atomic_xchg(p_rcu_reader-ctr, ctr);
 if (atomic_read(p_rcu_reader-waiting)) {
 atomic_set(p_rcu_reader-waiting, false);
@@ -90,6 +98,11 @@ static inline void rcu_read_unlock(void)
 {
 struct rcu_reader_data *p_rcu_reader = rcu_reader;
 
+assert(p_rcu_reader-depth != 0);
+if (--p_rcu_reader-depth  0) {
+return;
+}
+
 atomic_xchg(p_rcu_reader-ctr, 0);
 if (atomic_read(p_rcu_reader-waiting)) {
 atomic_set(p_rcu_reader-waiting, false);
diff --git a/tests/rcutorture.c b/tests/rcutorture.c
index e370843..93ec1b3 100644
--- a/tests/rcutorture.c
+++ b/tests/rcutorture.c
@@ -255,9 +255,11 @@ static void *rcu_read_stress_test(void *arg)
 if (p-mbtest == 0) {
 n_mberror++;
 }
+rcu_read_lock();
 for (i = 0; i  100; i++) {
 garbage++;
 }
+rcu_read_unlock();
 pc = p-pipe_count;
 rcu_read_unlock();
 if ((pc  RCU_STRESS_PIPE_LEN) || (pc  0)) {
-- 
1.8.3.1





[Qemu-devel] [PULL 11/11] configure: Default to enable module build

2015-01-30 Thread Paolo Bonzini
From: Fam Zheng f...@redhat.com

We have module build support around for a while, but also had it bitrot
several times. It probably makes sense to enable it by default so that
people can notice and use it.

Add --disable-modules as a counterpart to --enable-modules, which is
now turned on by default.  If both are omitted, support is guessed as
usual.

Signed-off-by: Fam Zheng f...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 .travis.yml |  2 +-
 configure   | 95 ++---
 2 files changed, 67 insertions(+), 30 deletions(-)

diff --git a/.travis.yml b/.travis.yml
index 0ac170b..12bf1db 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -99,5 +99,5 @@ matrix:
   EXTRA_CONFIG=--enable-trace-backends=ust
   compiler: gcc
 - env: TARGETS=i386-softmmu,x86_64-softmmu
-   EXTRA_CONFIG=--enable-modules
+   EXTRA_CONFIG=--disable-modules
   compiler: gcc
diff --git a/configure b/configure
index f185dd0..2c3a444 100755
--- a/configure
+++ b/configure
@@ -271,7 +271,7 @@ gcov_tool=gcov
 EXESUF=
 DSOSUF=.so
 LDFLAGS_SHARED=-shared
-modules=no
+modules=
 prefix=/usr/local
 mandir=\${prefix}/share/man
 datadir=\${prefix}/share
@@ -768,6 +768,9 @@ for opt do
   --enable-modules)
   modules=yes
   ;;
+  --disable-modules)
+  modules=no
+  ;;
   --cpu=*)
   ;;
   --target-list=*) target_list=$optarg
@@ -1259,7 +1262,8 @@ Advanced options (experts only):
   --sysconfdir=PATHinstall config in PATH$confsuffix
   --localstatedir=PATH install local state in PATH (set at runtime on 
win32)
   --with-confsuffix=SUFFIX suffix for QEMU data inside 
datadir/libdir/sysconfdir [$confsuffix]
-  --enable-modules enable modules support
+  --enable-modules enable modules support (default)
+  --disable-modulesenable modules support
   --enable-debug-tcg   enable TCG debugging
   --disable-debug-tcg  disable TCG debugging (default)
   --enable-debug-info  enable debugging information (default)
@@ -2725,22 +2729,25 @@ if test $mingw32 = yes; then
 else
 glib_req_ver=2.12
 fi
-glib_modules=gthread-2.0
-if test $modules = yes; then
-glib_modules=$glib_modules gmodule-2.0
-fi
 
-for i in $glib_modules; do
-if $pkg_config --atleast-version=$glib_req_ver $i; then
-glib_cflags=`$pkg_config --cflags $i`
-glib_libs=`$pkg_config --libs $i`
-CFLAGS=$glib_cflags $CFLAGS
-LIBS=$glib_libs $LIBS
-libs_qga=$glib_libs $libs_qga
-else
-error_exit glib-$glib_req_ver $i is required to compile QEMU
-fi
-done
+glib_pkg_config()
+{
+  if $pkg_config --atleast-version=$glib_req_ver $1; then
+local probe_cflags=$($pkg_config --cflags $1)
+local probe_libs=$($pkg_config --libs $1)
+CFLAGS=$probe_cflags $CFLAGS
+LIBS=$probe_libs $LIBS
+libs_qga=$probe_libs $libs_qga
+glib_cflags=$probe_cflags $glib_cflags
+glib_libs=$probe_libs $glib_libs
+return 0
+  else
+return 1
+  fi
+}
+
+glib_pkg_config gthread-2.0 || \
+  error_exit glib-$glib_req_ver gthread-2.0 is required to compile QEMU
 
 # g_test_trap_subprocess added in 2.38. Used by some tests.
 glib_subprocess=yes
@@ -2749,19 +2756,49 @@ if ! $pkg_config --atleast-version=2.38 glib-2.0; then
 fi
 
 ##
-# SHA command probe for modules
-if test $modules = yes; then
-shacmd_probe=sha1sum sha1 shasum
-for c in $shacmd_probe; do
-if has $c; then
-shacmd=$c
-break
-fi
-done
-if test $shacmd = ; then
-error_exit one of the checksum commands is required to enable 
modules: $shacmd_probe
+# SHA command and gmodule-2.0 probe for modules
+# return 0 if probe succeeds
+# $1: true - force mode, exit if probe fail
+# false - optoinal mode, return 1 if probe fail
+module_try_enable()
+{
+  force=$1
+  shacmd_probe=sha1sum sha1 shasum
+  for c in $shacmd_probe; do
+if has $c; then
+  shacmd=$c
+  break
 fi
-fi
+  done
+  if test $shacmd = ; then
+if $force; then
+  error_exit one of the checksum commands is required to enable modules: 
$shacmd_probe
+else
+  modules=no
+  return
+fi
+  fi
+  if ! glib_pkg_config gmodule-2.0; then
+if $force; then
+  error_exit glib-$glib_req_ver gmodule-2.0 is required to compile QEMU
+else
+  modules=no
+  return
+fi
+  fi
+  modules=yes
+}
+
+case $modules in
+  yes)
+module_try_enable true
+;;
+  )
+module_try_enable false
+;;
+  no)
+;;
+esac
 
 ##
 # pixman support probe
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH v2 3/7] softfloat: Convert `*_default_nan' variables into inline functions

2015-01-30 Thread Peter Maydell
On 30 January 2015 at 16:02, Maciej W. Rozycki ma...@linux-mips.org wrote:
  Hmm, so perhaps my idea for a later improvement:

  Eventually we might want to move the new inline functions into a
 separate header to be included from softfloat.h instead of softfloat.c,
 but let's make changes one step at a time.

 will actually have to be made right away.  I suspect GCC is more liberal
 here due to its convoluted extern/static/inline semantics history.
 Sigh...

I would suggest just using static inline, as we do elsewhere
for little utility functions.

-- PMM



Re: [Qemu-devel] [PATCH v3 4/4] virtio-blk: introduce multiread

2015-01-30 Thread Max Reitz

On 2015-01-30 at 09:33, Peter Lieven wrote:

this patch finally introduces multiread support to virtio-blk. While
multiwrite support was there for a long time, read support was missing.

The complete merge logic is moved into virtio-blk.c which has
been the only user of request merging ever since. This is required
to be able to merge chunks of requests and immediately invoke callbacks
for those requests. Secondly, this is required to switch to
direct invocation of coroutines which is planned at a later stage.

The following benchmarks show the performance of running fio with
4 worker threads on a local ram disk. The numbers show the average
of 10 test runs after 1 run as warmup phase.

   |4k|   64k|4k
MB/s  | rd seq | rd rand | rd seq | rd rand | wr seq | wr rand
--++-++-++
master| 1221   | 1187| 4178   | 4114| 1745   | 1213
multiread | 1829   | 1189| 4639   | 4110| 1894   | 1216

Signed-off-by: Peter Lieven p...@kamp.de
---
  hw/block/dataplane/virtio-blk.c |   8 +-
  hw/block/virtio-blk.c   | 288 +++-
  include/hw/virtio/virtio-blk.h  |  19 +--
  trace-events|   1 +
  4 files changed, 210 insertions(+), 106 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 39c5d71..93472e9 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -96,9 +96,7 @@ static void handle_notify(EventNotifier *e)
  event_notifier_test_and_clear(s-host_notifier);
  blk_io_plug(s-conf-conf.blk);
  for (;;) {
-MultiReqBuffer mrb = {
-.num_writes = 0,
-};
+MultiReqBuffer mrb = {};
  int ret;
  
  /* Disable guest-host notifies to avoid unnecessary vmexits */

@@ -120,7 +118,9 @@ static void handle_notify(EventNotifier *e)
  virtio_blk_handle_request(req, mrb);
  }
  
-virtio_submit_multiwrite(s-conf-conf.blk, mrb);

+if (mrb.num_reqs) {
+virtio_submit_multireq(s-conf-conf.blk, mrb);
+}
  
  if (likely(ret == -EAGAIN)) { /* vring emptied */

  /* Re-enable guest-host notifies and stop processing the vring.
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index e04adb8..77890a0 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -34,6 +34,7 @@ VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
  req-dev = s;
  req-qiov.size = 0;
  req-next = NULL;
+req-mr_next = NULL;
  return req;
  }
  
@@ -84,20 +85,29 @@ static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error,
  
  static void virtio_blk_rw_complete(void *opaque, int ret)

  {
-VirtIOBlockReq *req = opaque;
+VirtIOBlockReq *next = opaque;
  
-trace_virtio_blk_rw_complete(req, ret);

+while (next) {
+VirtIOBlockReq *req = next;
+next = req-mr_next;
+trace_virtio_blk_rw_complete(req, ret);
  
-if (ret) {

-int p = virtio_ldl_p(VIRTIO_DEVICE(req-dev), req-out.type);
-bool is_read = !(p  VIRTIO_BLK_T_OUT);
-if (virtio_blk_handle_rw_error(req, -ret, is_read))
-return;
-}
+if (req-qiov.nalloc != -1) {
+qemu_iovec_destroy(req-qiov);
+}
  
-virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);

-block_acct_done(blk_get_stats(req-dev-blk), req-acct);
-virtio_blk_free_request(req);
+if (ret) {
+int p = virtio_ldl_p(VIRTIO_DEVICE(req-dev), req-out.type);
+bool is_read = !(p  VIRTIO_BLK_T_OUT);
+if (virtio_blk_handle_rw_error(req, -ret, is_read)) {
+continue;
+}
+}
+
+virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
+block_acct_done(blk_get_stats(req-dev-blk), req-acct);
+virtio_blk_free_request(req);
+}
  }
  
  static void virtio_blk_flush_complete(void *opaque, int ret)

@@ -291,24 +301,125 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
  }
  }
  
-void virtio_submit_multiwrite(BlockBackend *blk, MultiReqBuffer *mrb)

+static inline void


Is the inline really worth it? This function is rather long...

(To my surprise, gcc actually does inline the function)


+virtio_submit_multireq2(BlockBackend *blk, MultiReqBuffer *mrb,


Hm, that's not a very descriptive name... Maybe submit_merged_requests 
or something like that (it's static, so you can drop the virtio_ prefix 
if you want to) would be better?



+int start, int num_reqs, int niov)
  {
-int i, ret;
+QEMUIOVector *qiov = mrb-reqs[start]-qiov;
+int64_t sector_num = mrb-reqs[start]-sector_num;
+int nb_sectors = mrb-reqs[start]-qiov.size / BDRV_SECTOR_SIZE;
+bool is_write = mrb-is_write;
+
+if (num_reqs  1) {
+int i;
+struct iovec *_iov = qiov-iov;
+

Re: [Qemu-devel] QEMU segfault: Booting an overlay with backing_file over NBD: nbd.c:nbd_receive_request():L756: read failed

2015-01-30 Thread Kevin Wolf
Am 29.01.2015 um 17:25 hat Kashyap Chamarthy geschrieben:
 A simple reproducer below.
 
 Export a disk image over NBD (I realize port 10809 is default, thought
 I'd explicitly mention anyhow):
 
   $ qemu-nbd --f qcow2 -p10809 \
 /var/lib/libvirt/images/cirros-0.3.3-x86_64-disk.img -t
 
 
 Create an overlay with backing file exported via NBD:
 
   $ qemu-img create -f qcow2 -F \
 nbd -o backing_file=nbd://localhost overlay1.qcow2
 Formatting 'overlay1.qcow2', fmt=qcow2 size=41126400 
 backing_file='nbd://localhost' backing_fmt='nbd' encryption=off 
 cluster_size=65536 lazy_refcounts=off
 
 
 Let's attempt to boot the overlay with a minimal QEMU:
 
   $ qemu-system-x86_64   \
  -nographic  \
  -nodefconfig\
  -nodefaults \
  -m 2048 \
  -device virtio-scsi-pci,id=scsi \
  -device virtio-serial-pci   \
  -serial stdio   \
  -drive file=./overlay1.qcow2,format=qcow2,if=virtio,cache=writeback
   Segmentation fault (core dumped)
 
 
 On the shell where `qemu-nbd` is running, I notice this
 
   nbd.c:nbd_receive_request():L756: read failed
 
 
 Haven't investigated further with GDB, thought I'd bring it up here
 first.
 
 
 Versions
 
 
   $ rpm -q qemu; uname -r
   qemu-2.1.2-7.fc21.x86_64
   3.17.8-300.fc21.x86_64

Copying Stefan because he's the master of AIO contexts and it is
bs-aio_context that becomes NULL. I couldn't see anything obvious.

In the meantime, could you retest on git master?

Kevin



[Qemu-devel] [PULL 07/11] memory: avoid ref/unref in memory_region_find

2015-01-30 Thread Paolo Bonzini
Do the entire lookup under RCU, which avoids atomic operations
in flatview_ref and flatview_unref.

Reviewed-by: Fam Zheng f...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 memory.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/memory.c b/memory.c
index a844ced..9b91243 100644
--- a/memory.c
+++ b/memory.c
@@ -1828,11 +1828,11 @@ MemoryRegionSection memory_region_find(MemoryRegion *mr,
 }
 range = addrrange_make(int128_make64(addr), int128_make64(size));
 
-view = address_space_get_flatview(as);
+rcu_read_lock();
+view = atomic_rcu_read(as-current_map);
 fr = flatview_lookup(view, range);
 if (!fr) {
-flatview_unref(view);
-return ret;
+goto out;
 }
 
 while (fr  view-ranges  addrrange_intersects(fr[-1].addr, range)) {
@@ -1849,8 +1849,8 @@ MemoryRegionSection memory_region_find(MemoryRegion *mr,
 ret.offset_within_address_space = int128_get64(range.start);
 ret.readonly = fr-readonly;
 memory_region_ref(ret.mr);
-
-flatview_unref(view);
+out:
+rcu_read_unlock();
 return ret;
 }
 
-- 
1.8.3.1





[Qemu-devel] [PULL 08/11] cpu-exec: simplify align_clocks

2015-01-30 Thread Paolo Bonzini
sc-diff_clk is already equal to sleep_delay (split in a second and a
nanosecond part).  If you subtract sleep_delay - rem_delay, the result
is exactly rem_delay.

Cc: Sebastian Tanase sebastian.tan...@openwide.fr
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 cpu-exec.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index a4f0eff..9dd1ca5 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -61,8 +61,7 @@ static void align_clocks(SyncClocks *sc, const CPUState *cpu)
 sleep_delay.tv_sec = sc-diff_clk / 10LL;
 sleep_delay.tv_nsec = sc-diff_clk % 10LL;
 if (nanosleep(sleep_delay, rem_delay)  0) {
-sc-diff_clk -= (sleep_delay.tv_sec - rem_delay.tv_sec) * 
10LL;
-sc-diff_clk -= sleep_delay.tv_nsec - rem_delay.tv_nsec;
+sc-diff_clk = rem_delay.tv_sec * 10LL + rem_delay.tv_nsec;
 } else {
 sc-diff_clk = 0;
 }
-- 
1.8.3.1





[Qemu-devel] [PULL 05/11] memory: remove assertion on memory_region_destroy

2015-01-30 Thread Paolo Bonzini
From: Jan Kiszka jan.kis...@siemens.com

Now that memory_region_destroy can be called from an RCU callback,
checking the BQL-protected global memory_region_transaction_depth
does not make much sense.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Reviewed-by: Fam Zheng f...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 memory.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/memory.c b/memory.c
index c343bf3..8c3d8c0 100644
--- a/memory.c
+++ b/memory.c
@@ -1263,7 +1263,6 @@ static void memory_region_finalize(Object *obj)
 MemoryRegion *mr = MEMORY_REGION(obj);
 
 assert(QTAILQ_EMPTY(mr-subregions));
-assert(memory_region_transaction_depth == 0);
 mr-destructor(mr);
 memory_region_clear_coalescing(mr);
 g_free((char *)mr-name);
-- 
1.8.3.1





Re: [Qemu-devel] [PATCH v3 3/4] block-backend: expose bs-bl.max_transfer_length

2015-01-30 Thread Max Reitz

On 2015-01-30 at 09:33, Peter Lieven wrote:

Signed-off-by: Peter Lieven p...@kamp.de
---
  block/block-backend.c  | 5 +
  include/sysemu/block-backend.h | 1 +
  2 files changed, 6 insertions(+)


Reviewed-by: Max Reitz mre...@redhat.com



Re: [Qemu-devel] [PATCH v2 4/4] block: Eliminate silly QERR_ macros used for encryption keys

2015-01-30 Thread Max Reitz

On 2015-01-29 at 04:37, Markus Armbruster wrote:

The QERR_ macros are leftovers from the days of rich error objects.
They're used with error_set() and qerror_report(), and expand into the
first *two* arguments.  This trickiness has become pointless.  Clean
up QERR_DEVICE_ENCRYPTED and QERR_DEVICE_NOT_ENCRYPTED.

Signed-off-by: Markus Armbruster arm...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
---
  block.c   | 5 +++--
  include/qapi/qmp/qerror.h | 6 --
  2 files changed, 3 insertions(+), 8 deletions(-)


Reviewed-by: Max Reitz mre...@redhat.com



Re: [Qemu-devel] [PATCH v11 03/13] qmp: Add block-dirty-bitmap-add and block-dirty-bitmap-remove

2015-01-30 Thread John Snow



On 01/30/2015 09:32 AM, Kevin Wolf wrote:

Am 21.01.2015 um 10:34 hat Markus Armbruster geschrieben:

I'm afraid I forgot much of the discussion we had before the break, and
only now it's coming back, slowly.

Quoting myself on naming parameters identifying nodes[*]:

 John Snow pointed out to me that we still haven't spelled out how this
 single parameter should be named.

 On obvious option is calling it node-name, or FOO-node-name when we have
 several.  However, we'd then have two subtly different kinds of
 parameters called like that: the old ones accept *only* node names, the
 new ones also accept backend names, which automatically resolve to the
 backend's root node.

 Three ways to cope with that:

 * Find a better name.

 * Make the old ones accept backend names, too.  Only a few, not that
   much work.  However, there are exceptions:

   - blockdev-add's node-name *defines* the node name.

   - query-named-block-nodes's node-name *is* the node's name.

 * Stop worrying and embrace the inconsistency.  The affected commands
   are headed for deprecation anyway.

 I think I'd go with node or FOO-node for parameters that reference
 nodes and accept both node names and backend names, and refrain from
 touching the existing node-name parameters.


Wasn't the conclusion last time that we would try to find a better name
for new commands and leave old commands alone because they are going to
become deprecated? That is, a combination of your first and last option?



That was my impression, too: Use a new name for new commands and then 
slowly phase out the old things. This makes the new name clear as to 
what it supports (BOTH backends and nodes through a common namespace) to 
external management utilities like libvirt.


That's why I just rolled 'node-ref.'


Let's go through existing uses of @node-name again:

1. Define a node name

QMP commands blockdev-add (type BlockdevOptionsBase), drive-mirror

2. Report a node name

QMP command query-named-block-nodes (type BlockDeviceInfo)


Whatever name we end up using, 1. and 2. should probably use the same.


Should they? If these commands accept directly *node* names and have no 
chance of referencing a backend, maybe they should use different 
parameter names.





3. Node reference with backend names permitted for convenience

New QMP command block-dirty-bitmap-add (type BlockDirtyBitmapAdd) and
others

4. Node reference with backend names not permitted

QMP commands drive-mirror @replaces, change-backing-file
@image-node-name

We may want to support the backend name resolves to root node
convenience feature here, for consistency.  Then this moves under 3.

Note interface wart: change-backing-file additionally requires the
backend owning the node.  We need the backend to set op-blockers, we
can't easily find it from the node, so we make the user point it out
to us.


These shouldn't be existing. As you say, we should move them to 3.



Technically #3 here isn't a usage of node-name, because... I didn't 
use node-name for these commands. Unless I am reading this list wrong, 
but it's definitely not an existing use.


I don't have any opinions about #4; presumably that's something we're 
aiming to phase out.



5. Pair of names node reference, specify exactly one

QMP commands block_passwd, block_resize, blockdev-snapshot-sync

We can ignore this one, because we intend to replace the commands and
deprecate the old ones.


Agreed, these shouldn't be existing either.


If I understand you correctly, you're proposing to use @node-name or
@FOO-node-name when the value must be a node name (items 1+2 and 4), and
@node-ref or @FOO-node-ref where we additionally support the backend
name resolves to root node convenience feature (item 3).

Is that a fair description of your proposal?

PRO: the name makes it clear when the convenience feature is supported.

CON: if we eliminate 4 by supporting the convenience feature, we either
create ugly exceptions to the naming convention, or rename the
parameters.

Opinions?


If we don't have any cases where node names are allowed, but backend
names are not, then there is no reason to have two different names. I've
yet to see a reason for having commands that can accept node names, but
not backend names.

It's a bit different when the command can already accept both, but uses
two separate arguments for it. But I think most of them will be
deprecated, so we can ignore them here.

As for the naming, I'm not that sure that it's even useful to add
something to the field name. After all, this is really the _type_ of the
object, not the name. We don't have fields like 'read-only-bool' either.

If we're more specifically looking at things that actually refer to
block devices, you already mentioned drive-mirrors @replaces, which is a
great name in my opinion. @replaces-node-ref wouldn't improve anything.

Re: [Qemu-devel] [PATCH v3 3/4] arm: Add PCIe host bridge in virt machine

2015-01-30 Thread Peter Maydell
On 29 January 2015 at 15:06, Alexander Graf ag...@suse.de wrote:
 Now that we have a working generic PCIe host bridge driver, we can plug
 it into ARM's virt machine to always have PCIe available to normal ARM VMs.
 -/* 0x1000 .. 0x4000 reserved for PCI */
 +/* PCIe region layout: [ MMIO | PIO | ECAM ] */
 +[VIRT_PCIE] =   { 0x1000, 0x3000 },

 +/*
 + * Align all the regions to their respective positions. Eventually
 + * we want to have:
 + *
 + *   MMIO space 
 + *   IO Port space 
 + *   ECAM space 
 + *
 + * all naturally aligned after each other. The MMIO region always starts
 + * at 0 and shrinks depending on the size of the other 2 regions.
 + */

This is still very opaque to me. Can't we just have a comment that
tells the reader how big the MMIO, IO and ECAM windows are without
requiring them to cross reference it with the previous array and
do a bunch of mental arithmetic?

For preference, this comment should live in the a15memmap[] array
where we document all the rest of the memory map.

thanks
-- PMM



[Qemu-devel] [PULL 01/11] rcu: add rcu library

2015-01-30 Thread Paolo Bonzini
This includes a (mangled) copy of the liburcu code.  The main changes
are: 1) removing dependencies on many other header files in liburcu; 2)
removing for simplicity the tentative busy waiting in synchronize_rcu,
which has limited performance effects; 3) replacing futexes in
synchronize_rcu with QemuEvents for Win32 portability.  The API is
the same as liburcu, so it should be possible in the future to require
liburcu on POSIX systems for example and use our copy only on Windows.

Among the various versions available I chose urcu-mb, which is the
least invasive implementation even though it does not have the
fastest rcu_read_{lock,unlock} implementation.  The urcu flavor can
be changed later, after benchmarking.

Reviewed-by: Fam Zheng f...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 docs/rcu.txt  | 285 ++
 hw/9pfs/virtio-9p-synth.c |   1 +
 include/qemu/atomic.h |  61 ++
 include/qemu/queue.h  |  13 +++
 include/qemu/rcu.h| 112 ++
 include/qemu/thread.h |   3 -
 util/Makefile.objs|   1 +
 util/rcu.c| 172 
 8 files changed, 645 insertions(+), 3 deletions(-)
 create mode 100644 docs/rcu.txt
 create mode 100644 include/qemu/rcu.h
 create mode 100644 util/rcu.c

diff --git a/docs/rcu.txt b/docs/rcu.txt
new file mode 100644
index 000..9938ad3
--- /dev/null
+++ b/docs/rcu.txt
@@ -0,0 +1,285 @@
+Using RCU (Read-Copy-Update) for synchronization
+
+
+Read-copy update (RCU) is a synchronization mechanism that is used to
+protect read-mostly data structures.  RCU is very efficient and scalable
+on the read side (it is wait-free), and thus can make the read paths
+extremely fast.
+
+RCU supports concurrency between a single writer and multiple readers,
+thus it is not used alone.  Typically, the write-side will use a lock to
+serialize multiple updates, but other approaches are possible (e.g.,
+restricting updates to a single task).  In QEMU, when a lock is used,
+this will often be the iothread mutex, also known as the big QEMU
+lock (BQL).  Also, restricting updates to a single task is done in
+QEMU using the bottom half API.
+
+RCU is fundamentally a wait-to-finish mechanism.  The read side marks
+sections of code with critical sections, and the update side will wait
+for the execution of all *currently running* critical sections before
+proceeding, or before asynchronously executing a callback.
+
+The key point here is that only the currently running critical sections
+are waited for; critical sections that are started _after_ the beginning
+of the wait do not extend the wait, despite running concurrently with
+the updater.  This is the reason why RCU is more scalable than,
+for example, reader-writer locks.  It is so much more scalable that
+the system will have a single instance of the RCU mechanism; a single
+mechanism can be used for an arbitrary number of things, without
+having to worry about things such as contention or deadlocks.
+
+How is this possible?  The basic idea is to split updates in two phases,
+removal and reclamation.  During removal, we ensure that subsequent
+readers will not be able to get a reference to the old data.  After
+removal has completed, a critical section will not be able to access
+the old data.  Therefore, critical sections that begin after removal
+do not matter; as soon as all previous critical sections have finished,
+there cannot be any readers who hold references to the data structure,
+and these can now be safely reclaimed (e.g., freed or unref'ed).
+
+Here is a picutre:
+
+thread 1  thread 2  thread 3
+------
+enter RCU crit.sec.
+   |finish removal phase
+   |begin wait
+   |  |enter RCU crit.sec.
+exit RCU crit.sec |   |
+complete wait |
+begin reclamation phase   |
+   exit RCU crit.sec.
+
+
+Note how thread 3 is still executing its critical section when thread 2
+starts reclaiming data.  This is possible, because the old version of the
+data structure was not accessible at the time thread 3 began executing
+that critical section.
+
+
+RCU API
+===
+
+The core RCU API is small:
+
+ void rcu_read_lock(void);
+
+Used by a reader to inform the reclaimer that the reader is
+entering an RCU read-side critical section.
+
+ void rcu_read_unlock(void);
+
+Used by a reader to inform the reclaimer that the reader is
+exiting an RCU read-side critical section.  Note that RCU
+read-side critical sections may be nested and/or 

[Qemu-devel] [PULL 09/11] cpu-exec: simplify init_delay_params

2015-01-30 Thread Paolo Bonzini
With the introduction of QEMU_CLOCK_VIRTUAL_RT, the computation of
sc-diff_clk can be simplified nicely:

qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) -
qemu_clock_get_ns(QEMU_CLOCK_REALTIME) +
cpu_get_clock_offset()

 =  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) -
(qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - cpu_get_clock_offset())

 =  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) -
(qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + timers_state.cpu_clock_offset)

 =  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) -
qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT)

Cc: Sebastian Tanase sebastian.tan...@openwide.fr
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 cpu-exec.c   |  6 ++
 cpus.c   | 17 -
 include/qemu/timer.h |  1 -
 3 files changed, 2 insertions(+), 22 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 9dd1ca5..fa506e6 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -100,10 +100,8 @@ static void init_delay_params(SyncClocks *sc,
 if (!icount_align_option) {
 return;
 }
-sc-realtime_clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
-sc-diff_clk = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) -
-   sc-realtime_clock +
-   cpu_get_clock_offset();
+sc-realtime_clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT);
+sc-diff_clk = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - sc-realtime_clock;
 sc-last_cpu_icount = cpu-icount_extra + cpu-icount_decr.u16.low;
 if (sc-diff_clk  max_delay) {
 max_delay = sc-diff_clk;
diff --git a/cpus.c b/cpus.c
index 3a5323b..0cdd1d7 100644
--- a/cpus.c
+++ b/cpus.c
@@ -229,23 +229,6 @@ int64_t cpu_get_clock(void)
 return ti;
 }
 
-/* return the offset between the host clock and virtual CPU clock */
-int64_t cpu_get_clock_offset(void)
-{
-int64_t ti;
-unsigned start;
-
-do {
-start = seqlock_read_begin(timers_state.vm_clock_seqlock);
-ti = timers_state.cpu_clock_offset;
-if (!timers_state.cpu_ticks_enabled) {
-ti -= get_clock();
-}
-} while (seqlock_read_retry(timers_state.vm_clock_seqlock, start));
-
-return -ti;
-}
-
 /* enable cpu_get_ticks()
  * Caller must hold BQL which server as mutex for vm_clock_seqlock.
  */
diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index ca5befb..eba8b21 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -838,7 +838,6 @@ static inline int64_t get_clock(void)
 int64_t cpu_get_icount_raw(void);
 int64_t cpu_get_icount(void);
 int64_t cpu_get_clock(void);
-int64_t cpu_get_clock_offset(void);
 int64_t cpu_icount_to_ns(int64_t icount);
 
 /***/
-- 
1.8.3.1





[Qemu-devel] [PULL 10/11] scsi: Fix scsi_req_cancel_async for no aiocb req

2015-01-30 Thread Paolo Bonzini
From: Fam Zheng f...@redhat.com

scsi_req_cancel_complete is responsible for releasing the request, so we
shouldn't skip it in any case. This doesn't affect the only existing
caller, virtio-scsi, but is useful for other devices once they use it.

Suggested-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Fam Zheng f...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/scsi/scsi-bus.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 9b740a3..db39ae0 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -1756,6 +1756,8 @@ void scsi_req_cancel_async(SCSIRequest *req, Notifier 
*notifier)
 req-io_canceled = true;
 if (req-aiocb) {
 blk_aio_cancel_async(req-aiocb);
+} else {
+scsi_req_cancel_complete(req);
 }
 }
 
-- 
1.8.3.1





Re: [Qemu-devel] [PATCH v3 1/4] block: add accounting for merged requests

2015-01-30 Thread Max Reitz

On 2015-01-30 at 09:32, Peter Lieven wrote:

Signed-off-by: Peter Lieven p...@kamp.de
Reviewed-by: Eric Blake ebl...@redhat.com
---
  block.c|  2 ++
  block/accounting.c |  7 +++
  block/qapi.c   |  2 ++
  hmp.c  |  6 +-
  include/block/accounting.h |  3 +++
  qapi/block-core.json   |  9 -
  qmp-commands.hx| 22 ++
  7 files changed, 45 insertions(+), 6 deletions(-)


Reviewed-by: Max Reitz mre...@redhat.com



Re: [Qemu-devel] [PATCH v3 2/4] hw/virtio-blk: add a constant for max number of merged requests

2015-01-30 Thread Max Reitz

On 2015-01-30 at 09:33, Peter Lieven wrote:

As it was not obvious (at least for me) where the 32 comes from;
add a constant for it.

Signed-off-by: Peter Lieven p...@kamp.de
Reviewed-by: Eric Blake ebl...@redhat.com
---
  hw/block/virtio-blk.c  | 2 +-
  include/hw/virtio/virtio-blk.h | 4 +++-
  2 files changed, 4 insertions(+), 2 deletions(-)


Reviewed-by: Max Reitz mre...@redhat.com



Re: [Qemu-devel] [PATCH 0/2] ATAPI migration fix/hack

2015-01-30 Thread Dr. David Alan Gilbert
* Dr. David Alan Gilbert (dgilb...@redhat.com) wrote:
 
 Oops, forgot to include Kevin and Stefan on cc for this.

Ping;
John R-b'd the two patches:
  https://lists.gnu.org/archive/html/qemu-devel/2014-12/msg01133.html
  http://lists.gnu.org/archive/html/qemu-devel/2015-01/msg01956.html

although hmm, possibly before I added both of you on
the cc.

Dave

 
 Dave
 
 * Dr. David Alan Gilbert (git) (dgilb...@redhat.com) wrote:
  From: Dr. David Alan Gilbert dgilb...@redhat.com
  
  This pair of patches fixes a problem where IDE/ATAPI cdrom
  reads get lost/corrupted over migration.
  
  The first of the patches (restore atapi_dma flag) is
  a simple fix that I think is safe; it no longer causes
  corruption in the case we saw, but does still trigger
  a long timeout.
  
  The second is a hack; it throws a medium error causing
  the guest to retry the command in the case where migration
  happens just between the IDE/ATAPI command being submitted
  and the bmdma being finished.   This recovers a lot
  faster than the timeout.
  
  Only tried on Linux guests so far; I think it might be possible
  to replace both of these by reparsing the command buffer for
  ATAPI; I'm just not confident I know when that's safe to do,
  and I wanted to see how disgusted people were by the 2nd hack.
  
  Dave
  
  Dr. David Alan Gilbert (2):
Restore atapi_dma flag across migration
atapi migration: Throw recoverable error to avoid recovery
  
   hw/ide/atapi.c| 17 +
   hw/ide/core.c |  1 +
   hw/ide/internal.h |  2 ++
   hw/ide/pci.c  | 11 +++
   4 files changed, 31 insertions(+)
  
  -- 
  2.1.0
  
  
 --
 Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



[Qemu-devel] [PULL 04/11] rcu: add call_rcu

2015-01-30 Thread Paolo Bonzini
Asynchronous callbacks provided by call_rcu are particularly important
for QEMU, because the BQL makes it hard to use synchronize_rcu.

In addition, the current RCU implementation is not particularly friendly
to multiple concurrent synchronize_rcu callers, making call_rcu even
more important.

Reviewed-by: Fam Zheng f...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 docs/rcu.txt   | 110 +++--
 include/qemu/rcu.h |  22 ++
 util/rcu.c | 119 +
 3 files changed, 247 insertions(+), 4 deletions(-)

diff --git a/docs/rcu.txt b/docs/rcu.txt
index 9938ad3..61752b9 100644
--- a/docs/rcu.txt
+++ b/docs/rcu.txt
@@ -82,7 +82,50 @@ The core RCU API is small:
 Note that it would be valid for another update to come while
 synchronize_rcu is running.  Because of this, it is better that
 the updater releases any locks it may hold before calling
-synchronize_rcu.
+synchronize_rcu.  If this is not possible (for example, because
+the updater is protected by the BQL), you can use call_rcu.
+
+ void call_rcu1(struct rcu_head * head,
+void (*func)(struct rcu_head *head));
+
+This function invokes func(head) after all pre-existing RCU
+read-side critical sections on all threads have completed.  This
+marks the end of the removal phase, with func taking care
+asynchronously of the reclamation phase.
+
+The foo struct needs to have an rcu_head structure added,
+perhaps as follows:
+
+struct foo {
+struct rcu_head rcu;
+int a;
+char b;
+long c;
+};
+
+so that the reclaimer function can fetch the struct foo address
+and free it:
+
+call_rcu1(foo.rcu, foo_reclaim);
+
+void foo_reclaim(struct rcu_head *rp)
+{
+struct foo *fp = container_of(rp, struct foo, rcu);
+g_free(fp);
+}
+
+For the common case where the rcu_head member is the first of the
+struct, you can use the following macro.
+
+ void call_rcu(T *p,
+   void (*func)(T *p),
+   field-name);
+
+call_rcu1 is typically used through this macro, in the common case
+where the struct rcu_head is the first field in the struct.  In
+the above case, one could have written simply:
+
+call_rcu(foo_reclaim, g_free, rcu);
 
  typeof(*p) atomic_rcu_read(p);
 
@@ -153,6 +196,11 @@ DIFFERENCES WITH LINUX
 - atomic_rcu_read and atomic_rcu_set replace rcu_dereference and
   rcu_assign_pointer.  They take a _pointer_ to the variable being accessed.
 
+- call_rcu is a macro that has an extra argument (the name of the first
+  field in the struct, which must be a struct rcu_head), and expects the
+  type of the callback's argument to be the type of the first argument.
+  call_rcu1 is the same as Linux's call_rcu.
+
 
 RCU PATTERNS
 
@@ -206,7 +254,47 @@ The write side looks simply like this (with appropriate 
locking):
 synchronize_rcu();
 free(old);
 
-Note that the same idiom would be possible with reader/writer
+If the processing cannot be done purely within the critical section, it
+is possible to combine this idiom with a real reference count:
+
+rcu_read_lock();
+p = atomic_rcu_read(foo);
+foo_ref(p);
+rcu_read_unlock();
+/* do something with p. */
+foo_unref(p);
+
+The write side can be like this:
+
+qemu_mutex_lock(foo_mutex);
+old = foo;
+atomic_rcu_set(foo, new);
+qemu_mutex_unlock(foo_mutex);
+synchronize_rcu();
+foo_unref(old);
+
+or with call_rcu:
+
+qemu_mutex_lock(foo_mutex);
+old = foo;
+atomic_rcu_set(foo, new);
+qemu_mutex_unlock(foo_mutex);
+call_rcu(foo_unref, old, rcu);
+
+In both cases, the write side only performs removal.  Reclamation
+happens when the last reference to a foo object is dropped.
+Using synchronize_rcu() is undesirably expensive, because the
+last reference may be dropped on the read side.  Hence you can
+use call_rcu() instead:
+
+ foo_unref(struct foo *p) {
+if (atomic_fetch_dec(p-refcount) == 1) {
+call_rcu(foo_destroy, p, rcu);
+}
+}
+
+
+Note that the same idioms would be possible with reader/writer
 locks:
 
 read_lock(foo_rwlock); write_mutex_lock(foo_rwlock);
@@ -216,13 +304,27 @@ locks:
 write_mutex_unlock(foo_rwlock);
 free(p);
 
+--
+
+read_lock(foo_rwlock); write_mutex_lock(foo_rwlock);
+p = foo;old = foo;
+foo_ref(p); foo = new;
+read_unlock(foo_rwlock);   foo_unref(old);
+/* do 

[Qemu-devel] [PULL 00/11] RCU, scsi, modules, icount changes for 2015-01-30

2015-01-30 Thread Paolo Bonzini
The following changes since commit 83761b9244ad2ed39d3cfabe8a0e901ab906f7bf:

  Merge remote-tracking branch 'remotes/riku/tags/pull-linux-user-20150127' 
into staging (2015-01-27 22:25:56 +)

are available in the git repository at:


  git://github.com/bonzini/qemu.git tags/for-upstream

for you to fetch changes up to 379e1ffb72c1feb7bd94467eb467351895afdcf2:

  configure: Default to enable module build (2015-01-28 11:29:04 +0100)


The important bits here are the first part of RCU and enabling
modules by default.  They have been tested with Travis for a few
days now, and things seem okay.


Fam Zheng (2):
  scsi: Fix scsi_req_cancel_async for no aiocb req
  configure: Default to enable module build

Jan Kiszka (1):
  memory: remove assertion on memory_region_destroy

Paolo Bonzini (8):
  rcu: add rcu library
  rcu: add rcutorture
  rcu: allow nesting of rcu_read_lock/rcu_read_unlock
  rcu: add call_rcu
  memory: protect current_map by RCU
  memory: avoid ref/unref in memory_region_find
  cpu-exec: simplify align_clocks
  cpu-exec: simplify init_delay_params

 .travis.yml   |   2 +-
 configure |  95 +++---
 cpu-exec.c|   9 +-
 cpus.c|  17 --
 docs/rcu.txt  | 387 +++
 hw/9pfs/virtio-9p-synth.c |   1 +
 hw/scsi/scsi-bus.c|   2 +
 include/exec/memory.h |   5 +
 include/qemu/atomic.h |  61 +++
 include/qemu/queue.h  |  13 ++
 include/qemu/rcu.h| 147 +++
 include/qemu/thread.h |   3 -
 include/qemu/timer.h  |   1 -
 memory.c  |  65 +++
 tests/Makefile|   7 +-
 tests/rcutorture.c| 450 ++
 util/Makefile.objs|   1 +
 util/rcu.c| 291 ++
 18 files changed, 1462 insertions(+), 95 deletions(-)
 create mode 100644 docs/rcu.txt
 create mode 100644 include/qemu/rcu.h
 create mode 100644 tests/rcutorture.c
 create mode 100644 util/rcu.c
-- 
1.8.3.1




[Qemu-devel] [PATCH 1/6] vmport.c: Fix vmport_cmd_ram_size

2015-01-30 Thread Don Slutz
Based on

https://sites.google.com/site/chitchatvmback/backdoor

and testing on ESXi, this should be in MB not bytes.

Signed-off-by: Don Slutz dsl...@verizon.com
---
 hw/misc/vmport.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/misc/vmport.c b/hw/misc/vmport.c
index 7fcc00d..6b350ce 100644
--- a/hw/misc/vmport.c
+++ b/hw/misc/vmport.c
@@ -110,7 +110,7 @@ static uint32_t vmport_cmd_ram_size(void *opaque, uint32_t 
addr)
 X86CPU *cpu = X86_CPU(current_cpu);
 
 cpu-env.regs[R_EBX] = 0x1177;
-return ram_size;
+return ram_size  20; /* in MB */
 }
 
 static uint32_t vmport_cmd_xtest(void *opaque, uint32_t addr)
-- 
1.8.4




[Qemu-devel] [PATCH 0/6] Add limited support of VMware's hyper-call rpc

2015-01-30 Thread Don Slutz
The support included is enough to allow VMware tools to install in a
guest and provide guestinfo support.  guestinfo support is provided
by what is known as VMware RPC support.

One of the better on-line references is:

https://sites.google.com/site/chitchatvmback/backdoor

As a place to get more accurate information by studying:

http://open-vm-tools.sourceforge.net/

With vmware tools installed, you can do:

---
Last login: Fri Jan 30 16:03:08 2015
[root@C63-min-tools ~]# vmtoolsd --cmd info-get guestinfo.joejoel
No value found
[root@C63-min-tools ~]# vmtoolsd --cmd info-set guestinfo.joejoel bar

[root@C63-min-tools ~]# vmtoolsd --cmd info-get guestinfo.joejoel
bar
[root@C63-min-tools ~]# 
---

to access guest info.  QMP access is also provided.

The live migration code is still in progress.

Don Slutz (6):
  vmport.c: Fix vmport_cmd_ram_size
  vmport_rpc: Add the object vmport_rpc
  vmport_rpc: Add limited support of VMware's hyper-call rpc
  vmport_rpc: Add QMP access to vmport_rpc object.
  vmport:  Add VMware all ring hack
  MAINTAINERS: add VMware port

 MAINTAINERS  |7 +
 hw/i386/pc.c |   34 +-
 hw/i386/pc_piix.c|2 +-
 hw/i386/pc_q35.c |2 +-
 hw/misc/Makefile.objs|1 +
 hw/misc/vmport.c |2 +-
 hw/misc/vmport_rpc.c | 1188 ++
 include/hw/i386/pc.h |6 +-
 qapi-schema.json |   95 
 qmp-commands.hx  |  141 ++
 target-i386/cpu.c|4 +
 target-i386/cpu.h|2 +
 target-i386/seg_helper.c |6 +
 trace-events |   20 +
 14 files changed, 1505 insertions(+), 5 deletions(-)
 create mode 100644 hw/misc/vmport_rpc.c

-- 
1.8.4




Re: [Qemu-devel] [PATCH v3 4/4] virtio-blk: introduce multiread

2015-01-30 Thread Kevin Wolf
Am 30.01.2015 um 22:05 hat Peter Lieven geschrieben:
 Am 30.01.2015 um 18:16 schrieb Max Reitz:
  On 2015-01-30 at 09:33, Peter Lieven wrote:
  this patch finally introduces multiread support to virtio-blk. While
  multiwrite support was there for a long time, read support was missing.
 
  The complete merge logic is moved into virtio-blk.c which has
  been the only user of request merging ever since. This is required
  to be able to merge chunks of requests and immediately invoke callbacks
  for those requests. Secondly, this is required to switch to
  direct invocation of coroutines which is planned at a later stage.
 
  The following benchmarks show the performance of running fio with
  4 worker threads on a local ram disk. The numbers show the average
  of 10 test runs after 1 run as warmup phase.
 
 |4k|   64k|4k
  MB/s  | rd seq | rd rand | rd seq | rd rand | wr seq | wr rand
  --++-++-++
  master| 1221   | 1187| 4178   | 4114| 1745   | 1213
  multiread | 1829   | 1189| 4639   | 4110| 1894   | 1216
 
  Signed-off-by: Peter Lieven p...@kamp.de
  ---
hw/block/dataplane/virtio-blk.c |   8 +-
hw/block/virtio-blk.c   | 288 
  +++-
include/hw/virtio/virtio-blk.h  |  19 +--
trace-events|   1 +
4 files changed, 210 insertions(+), 106 deletions(-)

  +int64_t sector_num = 0;
  +
  +if (mrb-num_reqs == 1) {
  +virtio_submit_multireq2(blk, mrb, 0, 1, -1);
  +mrb-num_reqs = 0;
return;
}
-ret = blk_aio_multiwrite(blk, mrb-blkreq, mrb-num_writes);
  -if (ret != 0) {
  -for (i = 0; i  mrb-num_writes; i++) {
  -if (mrb-blkreq[i].error) {
  -virtio_blk_rw_complete(mrb-blkreq[i].opaque, -EIO);
  +max_xfer_len = blk_get_max_transfer_length(mrb-reqs[0]-dev-blk);
  +max_xfer_len = MIN_NON_ZERO(max_xfer_len, INT_MAX);
  +
  +qsort(mrb-reqs, mrb-num_reqs, sizeof(*mrb-reqs),
  +  virtio_multireq_compare);
  +
  +for (i = 0; i  mrb-num_reqs; i++) {
  +VirtIOBlockReq *req = mrb-reqs[i];
  +if (num_reqs  0) {
  +bool merge = true;
  +
  +/* merge would exceed maximum number of IOVs */
  +if (niov + req-qiov.niov + 1  IOV_MAX) {
 
  Hm, why the +1?
 
 A really good question. I copied this piece from the old merge routine. It 
 seems
 definetely wrong.

The old code merged requests even if they were overlapping. This could
result in one area being split in two.

I think you don't support this here, so removing the + 1 is probably
okay.

Kevin



Re: [Qemu-devel] [PATCH v3 4/4] virtio-blk: introduce multiread

2015-01-30 Thread Max Reitz

On 2015-01-30 at 16:05, Peter Lieven wrote:

Am 30.01.2015 um 18:16 schrieb Max Reitz:

On 2015-01-30 at 09:33, Peter Lieven wrote:

this patch finally introduces multiread support to virtio-blk. While
multiwrite support was there for a long time, read support was missing.

The complete merge logic is moved into virtio-blk.c which has
been the only user of request merging ever since. This is required
to be able to merge chunks of requests and immediately invoke callbacks
for those requests. Secondly, this is required to switch to
direct invocation of coroutines which is planned at a later stage.

The following benchmarks show the performance of running fio with
4 worker threads on a local ram disk. The numbers show the average
of 10 test runs after 1 run as warmup phase.

|4k|   64k|4k
MB/s  | rd seq | rd rand | rd seq | rd rand | wr seq | wr rand
--++-++-++
master| 1221   | 1187| 4178   | 4114| 1745   | 1213
multiread | 1829   | 1189| 4639   | 4110| 1894   | 1216

Signed-off-by: Peter Lieven p...@kamp.de
---
   hw/block/dataplane/virtio-blk.c |   8 +-
   hw/block/virtio-blk.c   | 288 
+++-
   include/hw/virtio/virtio-blk.h  |  19 +--
   trace-events|   1 +
   4 files changed, 210 insertions(+), 106 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 39c5d71..93472e9 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -96,9 +96,7 @@ static void handle_notify(EventNotifier *e)
   event_notifier_test_and_clear(s-host_notifier);
   blk_io_plug(s-conf-conf.blk);
   for (;;) {
-MultiReqBuffer mrb = {
-.num_writes = 0,
-};
+MultiReqBuffer mrb = {};
   int ret;
 /* Disable guest-host notifies to avoid unnecessary vmexits */
@@ -120,7 +118,9 @@ static void handle_notify(EventNotifier *e)
   virtio_blk_handle_request(req, mrb);
   }
   -virtio_submit_multiwrite(s-conf-conf.blk, mrb);
+if (mrb.num_reqs) {
+virtio_submit_multireq(s-conf-conf.blk, mrb);
+}
 if (likely(ret == -EAGAIN)) { /* vring emptied */
   /* Re-enable guest-host notifies and stop processing the vring.
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index e04adb8..77890a0 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -34,6 +34,7 @@ VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
   req-dev = s;
   req-qiov.size = 0;
   req-next = NULL;
+req-mr_next = NULL;
   return req;
   }
   @@ -84,20 +85,29 @@ static int virtio_blk_handle_rw_error(VirtIOBlockReq 
*req, int error,
 static void virtio_blk_rw_complete(void *opaque, int ret)
   {
-VirtIOBlockReq *req = opaque;
+VirtIOBlockReq *next = opaque;
   -trace_virtio_blk_rw_complete(req, ret);
+while (next) {
+VirtIOBlockReq *req = next;
+next = req-mr_next;
+trace_virtio_blk_rw_complete(req, ret);
   -if (ret) {
-int p = virtio_ldl_p(VIRTIO_DEVICE(req-dev), req-out.type);
-bool is_read = !(p  VIRTIO_BLK_T_OUT);
-if (virtio_blk_handle_rw_error(req, -ret, is_read))
-return;
-}
+if (req-qiov.nalloc != -1) {
+qemu_iovec_destroy(req-qiov);
+}
   -virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
-block_acct_done(blk_get_stats(req-dev-blk), req-acct);
-virtio_blk_free_request(req);
+if (ret) {
+int p = virtio_ldl_p(VIRTIO_DEVICE(req-dev), req-out.type);
+bool is_read = !(p  VIRTIO_BLK_T_OUT);
+if (virtio_blk_handle_rw_error(req, -ret, is_read)) {
+continue;
+}
+}
+
+virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
+block_acct_done(blk_get_stats(req-dev-blk), req-acct);
+virtio_blk_free_request(req);
+}
   }
 static void virtio_blk_flush_complete(void *opaque, int ret)
@@ -291,24 +301,125 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
   }
   }
   -void virtio_submit_multiwrite(BlockBackend *blk, MultiReqBuffer *mrb)
+static inline void

Is the inline really worth it? This function is rather long...

I tried really hard to not add a regression for random reads and the difference
between good and bad here is sometimes just a matter of a few small changes.
If gcc inlines on its own anyway would you mind to keep the inline? The function
is actually only called from 2 places.


Okay, feel free to keep it.


(To my surprise, gcc actually does inline the function)


+virtio_submit_multireq2(BlockBackend *blk, MultiReqBuffer *mrb,

Hm, that's not a very descriptive name... Maybe submit_merged_requests or 
something like that (it's static, so you can drop the virtio_ prefix if you 

Re: [Qemu-devel] [PATCH 4/6] vmport_rpc: Add QMP access to vmport_rpc object.

2015-01-30 Thread Eric Blake
On 01/30/2015 02:06 PM, Don Slutz wrote:
 This adds two new inject commands:
 
 inject-vmport-reboot
 inject-vmport-halt
 
 And three guest info commands:
 
 vmport-guestinfo-set
 vmport-guestinfo-get
 query-vmport-guestinfo
 
 More details in qmp-commands.hx
 
 Signed-off-by: Don Slutz dsl...@verizon.com
 ---

 +static int foreach_dynamic_VMPortRpc_device(FindVMPortRpcDeviceFunc *func,
 +void *arg)
 +{
 +VMPortRpcFind find = {
 +.func = func,
 +.arg = arg,
 +};
 +
 +/* Loop through all VMPortRpc devices that were spawened outside

s/spawened/spawned/


  #ifdef VMPORT_RPC_DEBUG
  /*
   * Add helper function for traceing.  This routine will convert

Pre-existing as of this patch, but while you are here:
s/traceing/tracing/
unless it occurred earlier in the series, in which case fix it there.


 +static void convert_local_rc(Error **errp, int rc)
 +{
 +switch (rc) {
 +case 0:
 +break;
 +case VMPORT_DEVICE_NOT_FOUND:
 +error_set(errp, QERR_DEVICE_NOT_FOUND, TYPE_VMPORT_RPC);
 +break;
 +case SEND_NOT_OPEN:
 +error_set(errp, ERROR_CLASS_GENERIC_ERROR, VMWare rpc not open);

shorter as:
error_setg(errp, VMWare rpc not open);

and similar for all your other uses of ERROR_CLASS_GENERIC_ERROR

 +++ b/qapi-schema.json
 @@ -1271,6 +1271,101 @@
  { 'command': 'inject-nmi' }
  
  ##
 +# @inject-vmport-reboot:
 +#
 +# Injects a VMWare Tools reboot to the guest.
 +#
 +# Returns:  If successful, nothing
 +#
 +# Since:  2.3
 +#
 +##
 +{ 'command': 'inject-vmport-reboot' }
 +
 +##
 +# @inject-vmport-halt:
 +#
 +# Injects a VMWare Tools halt to the guest.
 +#
 +# Returns:  If successful, nothing
 +#
 +# Since:  2.3
 +#
 +##
 +{ 'command': 'inject-vmport-halt' }

Why two commands?  Why not just 'inject-vmport-action' with a parameter
that says whether the action is 'reboot', 'halt', or something else?

 +
 +##
 +# @vmport-guestinfo-set:
 +#
 +# Set a VMWare Tools guestinfo key to a value
 +#
 +# @key: the key to set
 +#
 +# @value: The data to set the key to
 +#
 +# @format: #optional value encoding (default 'utf8').
 +#  - base64: value must be base64 encoded text.  Its binary
 +#decoding gets set.
 +#Bug: invalid base64 is currently not rejected.

We should fix the bug, rather than document it.

 +#Whitespace *is* invalid.
 +#  - utf8: value's UTF-8 encoding is written
 +#  - value itself is always Unicode regardless of format, like
 +#any other string.

This was confusing to read - there are three bullets, so it looks like
you meant to document three valid DataFormat enum values; but there are
only two.  The comment about 'value' being supplied as valid JSON UTF8
and only later decoded according to 'format' might belong better on
'value', if at all.  On the other hand, I see you are just blindly
copy-and-pasting from 'ringbuf-write'.

 +#
 +# Returns: Nothing on success
 +#
 +# Since: 2.3
 +##
 +{ 'command': 'vmport-guestinfo-set',
 +  'data': {'key': 'str', 'value': 'str',
 +   '*format': 'DataFormat'} }
 +
 +##
 +# @vmport-guestinfo-get:
 +#
 +# Get a VMWare Tools guestinfo value for a key
 +#
 +# @key: the key to get
 +#
 +# @format: #optional data encoding (default 'utf8').
 +#  - base64: the value is returned in base64 encoding.
 +#  - utf8: the value is interpreted as UTF-8.
 +#Bug: can screw up when the buffer contains invalid UTF-8
 +#sequences, NUL characters.
 +#  - The return value is always Unicode regardless of format,
 +#like any other string.

Similar comments, but again sourced by copy-and-pasting from an existing
interface.

 +#
 +# Returns: value for the guest info key
 +#
 +# Since: 2.3
 +##
 +{ 'command': 'vmport-guestinfo-get',
 +  'data': {'key': 'str', '*format': 'DataFormat'},
 +  'returns': 'str' }
 +
 +##
 +# @VmportGuestInfo:
 +#
 +# Information about a single VMWare Tools guestinfo
 +#
 +# @key: The known key
 +#
 +# Since: 2.3
 +##
 +{ 'type': 'VmportGuestInfo', 'data': {'key': 'str'} }
 +
 +##
 +# @query-vmport-guestinfo:
 +#
 +# Returns information about VMWare Tools guestinfo
 +#
 +# Returns: a list of @VmportGuestInfo
 +#
 +# Since: 2.3
 +##
 +{ 'command': 'query-vmport-guestinfo', 'returns': ['VmportGuestInfo'] }
 +
 +##

Interface seems more or less okay, since it copies from existing idioms.


-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] QEMU segfault: Booting an overlay with backing_file over NBD: nbd.c:nbd_receive_request():L756: read failed

2015-01-30 Thread Kashyap Chamarthy
On Fri, Jan 30, 2015 at 02:32:25PM -0500, Max Reitz wrote:
 On 2015-01-30 at 13:41, Kashyap Chamarthy wrote:
 On Fri, Jan 30, 2015 at 06:15:21PM +0100, Kevin Wolf wrote:
 Am 29.01.2015 um 17:25 hat Kashyap Chamarthy geschrieben:

[. . .]

 Copying Stefan because he's the master of AIO contexts and it is
 bs-aio_context that becomes NULL. I couldn't see anything obvious.
 
 
 In the meantime, could you retest on git master?
 Just tested from git, and I can still reproduce it.
 
 That's the commit I'm at:
 
$ git describe
v2.2.0-682-g16017c4
 
 
 Run the NBD server, from git:
 
$ /home/kashyapc/build/qemu/qemu-nbd -f qcow2 \
-p10809 ./f21vm.qcow2 -t
 
 
 Create the overlay:
 
$ /home/kashyapc/build/qemu/qemu-img create \
-f qcow2 -F nbd -o backing_file=nbd://localhost 
  overlay2-of-f21vm.qcow2
Segmentation fault (core dumped)
 
 You want to use -F raw. The file format is raw, not nbd (nbd is the protocol
 over which the data is read, which is in format raw).

Noted, thanks for this detail. However, the `qemu-img` binary from the
system (version noted earlier in the thread) honors the -F nbd option
just fine:

  $ qemu-img create -f qcow2 -F nbd \
 -o backing_file=nbd://localhost overlay3-of-f21vm.qcow2
  Formatting 'overlay3-of-f21vm.qcow2', fmt=qcow2 size=42949672960 
backing_file='nbd://localhost' backing_fmt='nbd' encryption=off 
cluster_size=65536 lazy_refcounts=off

Then, the segfault with the git-compiled `qemu-img` binary seems like
some kind of an incorrect regression (if you could call it so). 

Anyhow, the `qemu-img` from git works correctly as per your reasoning:

  $ /home/kashyapc/build/qemu/qemu-img create \
  -f qcow2 -F raw -o backing_file=nbd://localhost

 Anyway, -F nbd shouldn't result in a segfault. One way to prevent this is to
 check whether the backing file format specified (or any format given to
 qemu-img in general) is a real format or the name of a protocol driver and
 then error out if it's the latter; but that would be more of a hotfix.
 
 Kevin, Stefan: The real problem is that block/nbd.c stores a BDRVNBDState
 object in bs-opaque and passes BDRVNBDState.client (an NbdClientSession
 object) to the block/nbd-client.c functions. Those functions then receive
 the BDS pointer from client-bs. If an NBD BDS is a root BDS (as in this
 case), at some point a bdrv_swap() may happen (and it does happen here)
 which leads to ((BDRVNBDState *)bs-opaque)-client.bs != bs, and that's
 where the segfault comes from (bdrv_get_aio_context() returns NULL).
 
 One way to fix this real problem is to remove the BDS pointer from the
 NbdClientSession and to always pass the BDS explicitly to the
 block/nbd-client.c functions; the other is to always update the BDS pointer
 in NbdClientSession in block/nbd.c. I'll try the former, and if it doesn't
 work, will do the latter (if you don't object).
 
Thank you for investigating.


-- 
/kashyap



[Qemu-devel] [PATCH 1/2] kvm_stat: Define RESET ioctl number for PPC

2015-01-30 Thread Wei Huang
This patch defines the RESET ioctl number for PPC architecture. Without it,
the reset() function of Event class can potentionally cause exception on
PPC.

Signed-off-by: Wei Huang w...@redhat.com
---
 scripts/kvm/kvm_stat | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/kvm/kvm_stat b/scripts/kvm/kvm_stat
index c0c4ff0..7af5947 100755
--- a/scripts/kvm/kvm_stat
+++ b/scripts/kvm/kvm_stat
@@ -207,6 +207,7 @@ def ppc_init():
 'SET_FILTER' : 0x80002406 | (ctypes.sizeof(ctypes.c_char_p)  16),
 'ENABLE' : 0x20002400,
 'DISABLE': 0x20002401,
+'RESET'  : 0x20002403,
 }
 })
 
-- 
1.8.3.1




[Qemu-devel] [PATCH 2/2] kvm_stat: Add kvm_exit reasons for aarch64

2015-01-30 Thread Wei Huang
This patch defines the list of kvm_exit reasons for aarch64. This list is
based on the Exception Class (EC) field of HSR register. With this patch
users can trace the execution of guest VMs better. A sample output from
command kvm_stat -1 -t is shown as the following:
...
kvm_exit(WATCHPT_HYP)  0 0
kvm_exit(WFI)   9422  9361

NOTE: This patch requires TRACE_EVENT(kvm_exit) to include exit_reason
field in TP_ARGS. A patch to upstream kernel has been submitted.

Signed-off-by: Wei Huang w...@redhat.com
---
 scripts/kvm/kvm_stat | 42 +-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/scripts/kvm/kvm_stat b/scripts/kvm/kvm_stat
index 7af5947..2a9ca98 100755
--- a/scripts/kvm/kvm_stat
+++ b/scripts/kvm/kvm_stat
@@ -145,6 +145,45 @@ svm_exit_reasons = {
 0x400: 'NPF',
 }
 
+# EC definition of HSR (from arch/arm64/include/asm/kvm_arm.h)
+aarch64_exit_reasons = {
+0x00: 'UNKNOWN',
+0x01: 'WFI',
+0x03: 'CP15_32',
+0x04: 'CP15_64',
+0x05: 'CP14_MR',
+0x06: 'CP14_LS',
+0x07: 'FP_ASIMD',
+0x08: 'CP10_ID',
+0x0C: 'CP14_64',
+0x0E: 'ILL_ISS',
+0x11: 'SVC32',
+0x12: 'HVC32',
+0x13: 'SMC32',
+0x15: 'SVC64',
+0x16: 'HVC64',
+0x17: 'SMC64',
+0x18: 'SYS64',
+0x20: 'IABT',
+0x21: 'IABT_HYP',
+0x22: 'PC_ALIGN',
+0x24: 'DABT',
+0x25: 'DABT_HYP',
+0x26: 'SP_ALIGN',
+0x28: 'FP_EXC32',
+0x2C: 'FP_EXC64',
+0x2F: 'SERROR',
+0x30: 'BREAKPT',
+0x31: 'BREAKPT_HYP',
+0x32: 'SOFTSTP',
+0x33: 'SOFTSTP_HYP',
+0x34: 'WATCHPT',
+0x35: 'WATCHPT_HYP',
+0x38: 'BKPT32',
+0x3A: 'VECTOR32',
+0x3C: 'BRK64',
+}
+
 # From include/uapi/linux/kvm.h, KVM_EXIT_xxx
 userspace_exit_reasons = {
  0: 'UNKNOWN',
@@ -213,7 +252,8 @@ def ppc_init():
 
 def aarch64_init():
 globals().update({
-'sc_perf_evt_open' : 241
+'sc_perf_evt_open' : 241,
+'exit_reasons' : aarch64_exit_reasons,
 })
 
 def detect_platform():
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH 1/1] block: enforce minimal 4096 alignment in qemu_blockalign

2015-01-30 Thread Denis V. Lunev

On 29/01/15 16:49, Denis V. Lunev wrote:

On 29/01/15 16:18, Kevin Wolf wrote:

Am 29.01.2015 um 11:50 hat Denis V. Lunev geschrieben:

The following sequence
 int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
 for (i = 0; i  10; i++)
 write(fd, buf, 4096);
performs 5% better if buf is aligned to 4096 bytes rather then to
512 bytes on HDD with 512/4096 logical/physical sector size.

The difference is quite reliable.

On the other hand we do not want at the moment to enforce bounce
buffering if guest request is aligned to 512 bytes. This patch
forces page alignment when we really forced to perform memory
allocation.

Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Paolo Bonzini pbonz...@redhat.com
CC: Kevin Wolf kw...@redhat.com
CC: Stefan Hajnoczi stefa...@redhat.com
---
  block.c | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index d45e4dd..38cf73f 100644
--- a/block.c
+++ b/block.c
@@ -5293,7 +5293,11 @@ void 
bdrv_set_guest_block_size(BlockDriverState *bs, int align)

void *qemu_blockalign(BlockDriverState *bs, size_t size)
  {
-return qemu_memalign(bdrv_opt_mem_align(bs), size);
+size_t align = bdrv_opt_mem_align(bs);
+if (align  4096) {
+align = 4096;
+}
+return qemu_memalign(align, size);
  }
void *qemu_blockalign0(BlockDriverState *bs, size_t size)
@@ -5307,6 +5311,9 @@ void *qemu_try_blockalign(BlockDriverState 
*bs, size_t size)

/* Ensure that NULL is never returned on success */
  assert(align  0);
+if (align  4096) {
+align = 4096;
+}
  if (size == 0) {
  size = align;
  }

This is the wrong place to make this change. First you're duplicating
logic in the callers of bdrv_opt_mem_align() instead of making it return
the right thing in the first place.

This has been actually done in the first iteration. bdrv_opt_mem_align
is called actually three times in:
  qemu_blockalign
  qemu_try_blockalign
  bdrv_qiov_is_aligned
Paolo says that he does not want to have bdrv_qiov_is_aligned affected
to avoid extra bounce buffering.

From my point of view this extra bounce buffering is better than 
unaligned
pointer during write to the disk as 512/4096 logical/physical sectors 
size

disks are mainstream now. Though I don't want to specially argue here.
Normal guest operations results in page aligned requests and this is not
a problem at all. The amount of 512 aligned requests from guest side is
quite negligible.

  Second, you're arguing with numbers
from a simple test case for O_DIRECT on Linux, but you're changing the
alignment for everyone instead of just the raw-posix driver which is
responsible for accessing Linux files.

This should not be a real problem. We are allocation memory for the
buffer. A little bit stricter alignment is not a big overhead for any 
libc
implementation thus this kludge will not produce any significant 
overhead.

Also, what's the real reason for the performance improvement? Having
page alignment? If so, actually querying the page size instead of
assuming 4k might be worth a thought.

Kevin

Most likely the problem comes from the read-modify-write pattern
either in kernel or in disk. Actually my experience says that it is a
bad idea to supply 512 byte aligned buffer for O_DIRECT IO.
ABI technically allows this but in general it is much less tested.

Yes, this synthetic test shows some difference here. In terms of
qemu-io the result is also visible, but less
  qemu-img create -f qcow2 ./1.img 64G
  qemu-io -n -c 'write -P 0xaa 0 1G' 1.img
performs 1% better.

There is also similar kludge here
size_t bdrv_opt_mem_align(BlockDriverState *bs)
{
if (!bs || !bs-drv) {
/* 4k should be on the safe side */
return 4096;
}

return bs-bl.opt_mem_alignment;
}
which just uses 4096 constant.

Yes, I could agree that queering page size could be a good idea, but
I do not know at the moment how to do that. Can you pls share your
opinion if you have any.

Regards,
Den

Paolo, Kevin,

I have spent a bit more time digging the issue and found some
additional information. The same 5% difference if the buffer is
aligned to 512/4096 is observed for the following devices/filesystems

1) ext4 with block size equals to 1024 over 512/512 physical/logical
   sector size SSD disk
2) ext4 with block size equals to 4096 over 512/512 physical/logical
   sector size SSD disk
3) ext4 with block size equals to 4096 over 512/4096 physical/logical
   sector size rotational disk (WDC WD20EZRX)
4) with block size equals to 4096 over 512/512 physical/logical
   sector size SSD disk

This means that only page size (4k) matters.

Guys, you propose quite different approaches. I can extend this patch
to use sysconf(_SC_PAGESIZE) to detect page size and drop hardcoded
4096. This is not a problem. But you have different opinion about
the place to insert the check.

Could you please come into agreement?

Proper defines/configuration 

[Qemu-devel] [PATCH 09/19] qtest/ahci: Demagic ahci tests.

2015-01-30 Thread John Snow
Add human-readable command names and other miscellaneous #defines
to help make the code more readable.

Some of these definitions are not yet used in this current series,
but for convenience and sanity they have been lumped together here,
as it's more trouble than it is worth in a test suite to hand-pick,
one-by-one, which preprocessor definitions are useful per-each test.

These definitions include:

ATA Command Mnemonics
Current expected AHCI sector size
FIS magic bytes
REG_H2D_FIS flags
Command Header flags

Signed-off-by: John Snow js...@redhat.com
---
 tests/ahci-test.c   | 12 ++--
 tests/libqos/ahci.h | 53 +
 2 files changed, 59 insertions(+), 6 deletions(-)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index b21d4ad..c862525 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -705,9 +705,9 @@ static void ahci_test_identify(AHCIQState *ahci)
 
 /* Construct our Command Header (set_command_header handles endianness.) */
 memset(cmd, 0x00, sizeof(cmd));
-cmd.flags = 5; /* reg_h2d_fis is 5 double-words long */
-cmd.flags = 0x400; /* clear PxTFD.STS.BSY when done */
-cmd.prdtl = 1; /* One PRD table entry. */
+cmd.flags = 5;/* reg_h2d_fis is 5 double-words long */
+cmd.flags = CMDH_CLR_BSY; /* clear PxTFD.STS.BSY when done */
+cmd.prdtl = 1;/* One PRD table entry. */
 cmd.prdbc = 0;
 cmd.ctba = table;
 
@@ -719,10 +719,10 @@ static void ahci_test_identify(AHCIQState *ahci)
 
 /* Construct our Command FIS, Based on http://wiki.osdev.org/AHCI */
 memset(fis, 0x00, sizeof(fis));
-fis.fis_type = 0x27; /* Register Host-to-Device FIS */
-fis.command = 0xEC;  /* IDENTIFY */
+fis.fis_type = REG_H2D_FIS;  /* Register Host-to-Device FIS */
+fis.command = CMD_IDENTIFY;
 fis.device = 0;
-fis.flags = 0x80;/* Indicate this is a command FIS */
+fis.flags = REG_H2D_FIS_CMD; /* Indicate this is a command FIS */
 
 /* We've committed nothing yet, no interrupts should be posted yet. */
 g_assert_cmphex(ahci_px_rreg(ahci, i, AHCI_PX_IS), ==, 0);
diff --git a/tests/libqos/ahci.h b/tests/libqos/ahci.h
index 3b4e1ef..0d12582 100644
--- a/tests/libqos/ahci.h
+++ b/tests/libqos/ahci.h
@@ -243,6 +243,59 @@
 #define AHCI_VERSION_1_2 (0x00010200)
 #define AHCI_VERSION_1_3 (0x00010300)
 
+#define AHCI_SECTOR_SIZE(512)
+
+/* FIS types */
+enum {
+REG_H2D_FIS = 0x27,
+REG_D2H_FIS = 0x34,
+DMA_ACTIVATE_FIS = 0x39,
+DMA_SETUP_FIS = 0x41,
+DATA_FIS = 0x46,
+BIST_ACTIVATE_FIS = 0x58,
+PIO_SETUP_FIS = 0x5F,
+SDB_FIS = 0xA1
+};
+
+/* FIS flags */
+#define REG_H2D_FIS_CMD  0x80
+
+/* ATA Commands */
+enum {
+/* DMA */
+CMD_READ_DMA  = 0xC8,
+CMD_READ_DMA_EXT  = 0x25,
+CMD_WRITE_DMA = 0xCA,
+CMD_WRITE_DMA_EXT = 0x35,
+/* PIO */
+CMD_READ_PIO  = 0x20,
+CMD_READ_PIO_EXT  = 0x24,
+CMD_WRITE_PIO = 0x30,
+CMD_WRITE_PIO_EXT = 0x34,
+/* Misc */
+CMD_READ_MAX  = 0xF8,
+CMD_READ_MAX_EXT  = 0x27,
+CMD_FLUSH_CACHE   = 0xE7,
+CMD_IDENTIFY  = 0xEC
+};
+
+/* AHCI Command Header Flags  Masks*/
+#define CMDH_CFL(0x1F)
+#define CMDH_ATAPI  (0x20)
+#define CMDH_WRITE  (0x40)
+#define CMDH_PREFETCH   (0x80)
+#define CMDH_RESET (0x100)
+#define CMDH_BIST  (0x200)
+#define CMDH_CLR_BSY   (0x400)
+#define CMDH_RES   (0x800)
+#define CMDH_PMP  (0xF000)
+
+/* ATA device register masks */
+#define ATA_DEVICE_MAGIC 0xA0
+#define ATA_DEVICE_LBA   0x40
+#define ATA_DEVICE_DRIVE 0x10
+#define ATA_DEVICE_HEAD  0x0F
+
 /*** Structures ***/
 
 typedef struct AHCIPortQState {
-- 
1.9.3




[Qemu-devel] [PATCH] integrator/cp: Wire up MMC card detection

2015-01-30 Thread Jan Kiszka
This allows to use MMC emulation with the Integrator/CP model. Well,
mostly. There seems to be timing issues with Linux so that the card is
not always detected (but most of the time).

Note that the read-only pin is intentionally left unconnected because
the PIC model could deliver it incorrectly as interrupt to the guest
while the spec says that this pin is just for status reading.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/arm/integratorcp.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/arm/integratorcp.c b/hw/arm/integratorcp.c
index 8c48b68..0efadc4 100644
--- a/hw/arm/integratorcp.c
+++ b/hw/arm/integratorcp.c
@@ -477,7 +477,7 @@ static void integratorcp_init(MachineState *machine)
 MemoryRegion *ram = g_new(MemoryRegion, 1);
 MemoryRegion *ram_alias = g_new(MemoryRegion, 1);
 qemu_irq pic[32];
-DeviceState *dev;
+DeviceState *dev, *sic;
 int i;
 Error *err = NULL;
 
@@ -535,7 +535,7 @@ static void integratorcp_init(MachineState *machine)
 for (i = 0; i  32; i++) {
 pic[i] = qdev_get_gpio_in(dev, i);
 }
-sysbus_create_simple(TYPE_INTEGRATOR_PIC, 0xca00, pic[26]);
+sic = sysbus_create_simple(TYPE_INTEGRATOR_PIC, 0xca00, pic[26]);
 sysbus_create_varargs(integrator_pit, 0x1300,
   pic[5], pic[6], pic[7], NULL);
 sysbus_create_simple(pl031, 0x1500, pic[8]);
@@ -545,7 +545,10 @@ static void integratorcp_init(MachineState *machine)
 sysbus_create_simple(pl050_keyboard, 0x1800, pic[3]);
 sysbus_create_simple(pl050_mouse, 0x1900, pic[4]);
 sysbus_create_simple(TYPE_INTEGRATOR_DEBUG, 0x1a00, 0);
-sysbus_create_varargs(pl181, 0x1c00, pic[23], pic[24], NULL);
+dev = sysbus_create_varargs(pl181, 0x1c00, pic[23], pic[24], NULL);
+/* Wire up MMC card detect */
+qdev_connect_gpio_out(dev, 1, qdev_get_gpio_in(sic, 3));
+
 if (nd_table[0].used)
 smc91c111_init(nd_table[0], 0xc800, pic[27]);
 
-- 
2.1.4



[Qemu-devel] [PATCH 2/6] vmport_rpc: Add the object vmport_rpc

2015-01-30 Thread Don Slutz
This is the 1st part of Add limited support of VMware's hyper-call
rpc.

This patch uses existing infrastructure used by vmmouse.c (provided
by vmport.c) to handle the VMware backdoor command 30.

One of the better on-line references is:

https://sites.google.com/site/chitchatvmback/backdoor

More in next patch.

Signed-off-by: Don Slutz dsl...@verizon.com
---
 hw/i386/pc.c  |   6 +++
 hw/misc/Makefile.objs |   1 +
 hw/misc/vmport_rpc.c  | 126 ++
 3 files changed, 133 insertions(+)
 create mode 100644 hw/misc/vmport_rpc.c

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index c7af6aa..efae4d5 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1435,8 +1435,14 @@ void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi,
 i8042 = isa_create_simple(isa_bus, i8042);
 i8042_setup_a20_line(i8042, a20_line[0]);
 if (!no_vmport) {
+ISADevice *vmport_rpc;
+
 vmport_init(isa_bus);
 vmmouse = isa_try_create(isa_bus, vmmouse);
+vmport_rpc = isa_try_create(isa_bus, vmport_rpc);
+if (vmport_rpc) {
+qdev_init_nofail(DEVICE(vmport_rpc));
+}
 } else {
 vmmouse = NULL;
 }
diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index 029a56f..5496992 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -7,6 +7,7 @@ common-obj-$(CONFIG_ISA_TESTDEV) += pc-testdev.o
 common-obj-$(CONFIG_PCI_TESTDEV) += pci-testdev.o
 
 obj-$(CONFIG_VMPORT) += vmport.o
+obj-$(CONFIG_VMPORT) += vmport_rpc.o
 
 # ARM devices
 common-obj-$(CONFIG_PL310) += arm_l2x0.o
diff --git a/hw/misc/vmport_rpc.c b/hw/misc/vmport_rpc.c
new file mode 100644
index 000..da724a4
--- /dev/null
+++ b/hw/misc/vmport_rpc.c
@@ -0,0 +1,126 @@
+/*
+ * QEMU VMPORT RPC emulation
+ *
+ * Copyright (C) 2015 Verizon Corporation
+ *
+ * This file is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License Version 2 (GPLv2)
+ * as published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details. http://www.gnu.org/licenses/.
+ */
+
+/*
+ * One of the better on-line references is:
+ *
+ * https://sites.google.com/site/chitchatvmback/backdoor
+ *
+ * Which points you to:
+ *
+ * http://open-vm-tools.sourceforge.net/
+ *
+ * as a place to get more accurate information by studying.
+ */
+
+#include hw/hw.h
+#include hw/i386/pc.h
+#include hw/qdev.h
+#include trace.h
+#include qmp-commands.h
+#include qapi/qmp/qerror.h
+
+/* #define VMPORT_RPC_DEBUG */
+
+#define TYPE_VMPORT_RPC vmport_rpc
+#define VMPORT_RPC(obj) OBJECT_CHECK(VMPortRpcState, (obj), TYPE_VMPORT_RPC)
+
+/* VMPORT RPC Command */
+#define VMPORT_RPC_COMMAND  30
+
+/* The vmport_rpc object. */
+typedef struct VMPortRpcState {
+ISADevice parent_obj;
+
+/* Properties */
+uint64_t reset_time;
+uint64_t build_number_value;
+uint64_t build_number_time;
+
+/* Private data */
+} VMPortRpcState;
+
+typedef struct {
+uint32_t eax;
+uint32_t ebx;
+uint32_t ecx;
+uint32_t edx;
+uint32_t esi;
+uint32_t edi;
+} vregs;
+
+static uint32_t vmport_rpc_ioport_read(void *opaque, uint32_t addr)
+{
+VMPortRpcState *s = opaque;
+union {
+uint32_t data[6];
+vregs regs;
+} ur;
+
+vmmouse_get_data(ur.data);
+
+s-build_number_time++;
+
+vmmouse_set_data(ur.data);
+return ur.data[0];
+}
+
+static void vmport_rpc_reset(DeviceState *d)
+{
+VMPortRpcState *s = VMPORT_RPC(d);
+
+s-reset_time = 14;
+s-build_number_value = 0;
+s-build_number_time = 0;
+}
+
+static void vmport_rpc_realizefn(DeviceState *dev, Error **errp)
+{
+VMPortRpcState *s = VMPORT_RPC(dev);
+
+vmport_register(VMPORT_RPC_COMMAND, vmport_rpc_ioport_read, s);
+}
+
+static Property vmport_rpc_properties[] = {
+DEFINE_PROP_UINT64(reset-time, VMPortRpcState, reset_time, 14),
+DEFINE_PROP_UINT64(build-number-value, VMPortRpcState,
+   build_number_value, 0),
+DEFINE_PROP_UINT64(build-number-time, VMPortRpcState,
+   build_number_time, 0),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void vmport_rpc_class_initfn(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+dc-realize = vmport_rpc_realizefn;
+dc-reset = vmport_rpc_reset;
+dc-props = vmport_rpc_properties;
+}
+
+static const TypeInfo vmport_rpc_info = {
+.name  = TYPE_VMPORT_RPC,
+.parent= TYPE_ISA_DEVICE,
+.instance_size = sizeof(VMPortRpcState),
+.class_init= vmport_rpc_class_initfn,
+};
+
+static void vmport_rpc_register_types(void)
+{
+type_register_static(vmport_rpc_info);
+}
+
+type_init(vmport_rpc_register_types)
-- 
1.8.4




[Qemu-devel] [PATCH 6/6] MAINTAINERS: add VMware port

2015-01-30 Thread Don Slutz
Signed-off-by: Don Slutz dsl...@verizon.com
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index fd335a4..b60ee6a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -741,6 +741,13 @@ S: Maintained
 F: hw/net/vmxnet*
 F: hw/scsi/vmw_pvscsi*
 
+VMware port
+M: Don Slutz dsl...@verizon.com
+S: Maintained
+F: hw/misc/vmport.c
+F: hw/input/vmmouse.c
+F: hw/misc/vmport_rpc.c
+
 Subsystems
 --
 Audio
-- 
1.8.4




[Qemu-devel] [PATCH 3/6] vmport_rpc: Add limited support of VMware's hyper-call rpc

2015-01-30 Thread Don Slutz
The support included is enough to allow VMware tools to install in a
guest and provide guestinfo support.  guestinfo support is provided
by what is known as VMware RPC support.

If the guest is running VMware tools, then the build version of
the tools is also available via the property build-number-value of
the vmport_rpc object.  The build-number-time property is changed
every time the VMware tools running in the guest sends the
build version via the rpc.

The property reset-time controls how often to request them in
seconds minus 1.  The minus 1 is to handle to 0 case.  I.E. the
fastest that can be selected is every second.  The default is 4
times a minute.

The VMware RPC support includes the notion of channels that are
opened, active and closed.  All RPC messages sent via a channel
starts with normal ASCII text.  The message some times does include
binary data.

Currently there are 2 protocols defined for VMware RPC.  They
determine the direction for data flow, guest to tool stack or tool
stack to guest.

There is no provided interrupt for VMware RPC.

Getting VMPORT_RPC_DEBUG will provide a higher level trace calls
that are simpler to understand.

Signed-off-by: Don Slutz dsl...@verizon.com
---
 hw/misc/vmport_rpc.c | 789 ++-
 trace-events |  20 ++
 2 files changed, 808 insertions(+), 1 deletion(-)

diff --git a/hw/misc/vmport_rpc.c b/hw/misc/vmport_rpc.c
index da724a4..e12f912 100644
--- a/hw/misc/vmport_rpc.c
+++ b/hw/misc/vmport_rpc.c
@@ -40,6 +40,125 @@
 /* VMPORT RPC Command */
 #define VMPORT_RPC_COMMAND  30
 
+/* Limits on amount of non guest memory to use */
+#define MAX_KEY_LEN  128
+#define MIN_VAL_LEN  64
+#define MAX_VAL_LEN  8192
+#define MAX_NUM_KEY  256
+#define MAX_BKTS 4
+/* Max number of channels. */
+#define GUESTMSG_MAX_CHANNEL 8
+
+/*
+ * All of VMware's rpc is based on 32bit registers.  So this is the
+ * number of bytes in ebx.
+ */
+#define CHAR_PER_CALL   sizeof(uint32_t)
+/* Room for basic commands */
+#define EXTRA_SEND 22
+/* Status code and NULL */
+#define EXTRA_RECV 2
+#define MAX_SEND_BUF DIV_ROUND_UP(EXTRA_SEND + MAX_KEY_LEN + MAX_VAL_LEN, \
+  CHAR_PER_CALL)
+#define MAX_RECV_BUF DIV_ROUND_UP(EXTRA_RECV + MAX_VAL_LEN, CHAR_PER_CALL)
+#define MIN_SEND_BUF DIV_ROUND_UP(EXTRA_SEND + MAX_KEY_LEN + MIN_VAL_LEN, \
+  CHAR_PER_CALL)
+
+/* Reply statuses */
+/*  The basic request succeeded */
+#define MESSAGE_STATUS_SUCCESS  0x0001
+/*  vmware has a message available for its party */
+#define MESSAGE_STATUS_DORECV   0x0002
+/*  The channel has been closed */
+#define MESSAGE_STATUS_CLOSED   0x0004
+/*  vmware removed the message before the party fetched it */
+#define MESSAGE_STATUS_UNSENT   0x0008
+/*  A checkpoint occurred */
+#define MESSAGE_STATUS_CPT  0x0010
+/*  An underlying device is powering off */
+#define MESSAGE_STATUS_POWEROFF 0x0020
+/*  vmware has detected a timeout on the channel */
+#define MESSAGE_STATUS_TIMEOUT  0x0040
+/*  vmware supports high-bandwidth for sending and receiving the payload */
+#define MESSAGE_STATUS_HB   0x0080
+
+/* Max number of channels. */
+#define GUESTMSG_MAX_CHANNEL 8
+
+/* Flags to open a channel. */
+#define GUESTMSG_FLAG_COOKIE 0x8000
+
+/* Data to guest */
+#define VMWARE_PROTO_TO_GUEST   0x4f4c4354
+/* Data from guest */
+#define VMWARE_PROTO_FROM_GUEST 0x49435052
+
+/*
+ * Error return values used only in this file.  The routine
+ * convert_local_rc() is used to convert these to an Error
+ * object.
+ */
+#define VMPORT_DEVICE_NOT_FOUND -1
+#define SEND_NOT_OPEN   -2
+#define SEND_SKIPPED-3
+#define SEND_TRUCATED   -4
+#define SEND_NO_MEMORY  -5
+#define GUESTINFO_NOTFOUND  -6
+#define GUESTINFO_VALTOOLONG-7
+#define GUESTINFO_KEYTOOLONG-8
+#define GUESTINFO_TOOMANYKEYS   -9
+#define GUESTINFO_NO_MEMORY -10
+
+
+/* The VMware RPC guest info storage . */
+typedef struct {
+char *key_data;
+char *val_data;
+uint16_t val_len;
+uint16_t val_max;
+uint8_t key_len;
+} guestinfo_t;
+
+/* The VMware RPC bucket control. */
+typedef struct {
+uint16_t recv_len;
+uint16_t recv_idx;
+} bucket_control_t;
+
+/* The VMware RPC bucket info. */
+typedef struct {
+union {
+uint32_t *words;
+char *bytes;
+} recv;
+bucket_control_t ctl;
+uint16_t recv_buf_max;
+} bucket_t;
+
+
+/* The VMware RPC channel control. */
+typedef struct {
+uint64_t active_time;
+uint32_t chan_id;
+uint32_t cookie;
+uint32_t proto_num;
+uint16_t send_len;
+uint16_t send_idx;
+uint8_t recv_read;
+uint8_t recv_write;
+} channel_control_t;
+
+/* The VMware RPC channel info. */
+typedef struct {
+union {
+uint32_t *words;
+char *bytes;
+} send;
+channel_control_t ctl;
+bucket_t recv_bkt[MAX_BKTS];
+uint16_t 

[Qemu-devel] [PATCH 5/6] vmport: Add VMware all ring hack

2015-01-30 Thread Don Slutz
This is done by adding a new machine property vmware-port-ring3 that
needs to be enabled to have any effect.  It only effects accel=tcg
mode.  It is needed if you want to use VMware tools in accel=tcg
mode.

Signed-off-by: Don Slutz dsl...@verizon.com
(cherry picked from commit 6d99c91fc9ae27b476e89a8cc880b4a46e237536)
---
 hw/i386/pc.c | 28 +++-
 hw/i386/pc_piix.c|  2 +-
 hw/i386/pc_q35.c |  2 +-
 include/hw/i386/pc.h |  6 +-
 target-i386/cpu.c|  4 
 target-i386/cpu.h|  2 ++
 target-i386/seg_helper.c |  6 ++
 7 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index efae4d5..3999bbf 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1012,7 +1012,9 @@ void pc_hot_add_cpu(const int64_t id, Error **errp)
 pc_new_cpu(current_cpu_model, apic_id, icc_bridge, errp);
 }
 
-void pc_cpus_init(const char *cpu_model, DeviceState *icc_bridge)
+/* vmware_port_ring3 true says enable VMware port access in ring3. */
+void pc_cpus_init(const char *cpu_model, DeviceState *icc_bridge,
+  bool vmware_port_ring3)
 {
 int i;
 X86CPU *cpu = NULL;
@@ -1044,6 +1046,9 @@ void pc_cpus_init(const char *cpu_model, DeviceState 
*icc_bridge)
 error_free(error);
 exit(1);
 }
+if (vmware_port_ring3) {
+cpu-env.hflags2 |= HF2_VMPORT_HACK_MASK;
+}
 }
 
 /* map APIC MMIO area if CPU has APIC */
@@ -1774,6 +1779,21 @@ static bool pc_machine_get_aligned_dimm(Object *obj, 
Error **errp)
 return pcms-enforce_aligned_dimm;
 }
 
+static bool pc_machine_get_vmware_port_ring3(Object *obj, Error **errp)
+{
+PCMachineState *pcms = PC_MACHINE(obj);
+
+return pcms-vmware_port_ring3;
+}
+
+static void pc_machine_set_vmware_port_ring3(Object *obj, bool value,
+ Error **errp)
+{
+PCMachineState *pcms = PC_MACHINE(obj);
+
+pcms-vmware_port_ring3 = value;
+}
+
 static void pc_machine_initfn(Object *obj)
 {
 PCMachineState *pcms = PC_MACHINE(obj);
@@ -1804,6 +1824,12 @@ static void pc_machine_initfn(Object *obj)
 object_property_add_bool(obj, PC_MACHINE_ENFORCE_ALIGNED_DIMM,
  pc_machine_get_aligned_dimm,
  NULL, NULL);
+
+pcms-vmware_port_ring3 = false;
+object_property_add_bool(obj, PC_MACHINE_VMWARE_PORT_RING3,
+ pc_machine_get_vmware_port_ring3,
+ pc_machine_set_vmware_port_ring3,
+ NULL);
 }
 
 static void pc_machine_class_init(ObjectClass *oc, void *data)
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 38b42b0..8434c04 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -146,7 +146,7 @@ static void pc_init1(MachineState *machine,
 object_property_add_child(qdev_get_machine(), icc-bridge,
   OBJECT(icc_bridge), NULL);
 
-pc_cpus_init(machine-cpu_model, icc_bridge);
+pc_cpus_init(machine-cpu_model, icc_bridge, 
pc_machine-vmware_port_ring3);
 
 if (kvm_enabled()  kvmclock_enabled) {
 kvmclock_create();
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 63027ee..d952fa1 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -135,7 +135,7 @@ static void pc_q35_init(MachineState *machine)
 object_property_add_child(qdev_get_machine(), icc-bridge,
   OBJECT(icc_bridge), NULL);
 
-pc_cpus_init(machine-cpu_model, icc_bridge);
+pc_cpus_init(machine-cpu_model, icc_bridge, 
pc_machine-vmware_port_ring3);
 pc_acpi_init(q35-acpi-dsdt.aml);
 
 kvmclock_create();
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 69d9cf8..d31f157 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -40,6 +40,7 @@ struct PCMachineState {
 
 uint64_t max_ram_below_4g;
 OnOffAuto vmport;
+bool vmware_port_ring3;
 bool enforce_aligned_dimm;
 };
 
@@ -48,6 +49,7 @@ struct PCMachineState {
 #define PC_MACHINE_MAX_RAM_BELOW_4G max-ram-below-4g
 #define PC_MACHINE_VMPORT   vmport
 #define PC_MACHINE_ENFORCE_ALIGNED_DIMM enforce-aligned-dimm
+#define PC_MACHINE_VMWARE_PORT_RING3 vmware-port-ring3
 
 /**
  * PCMachineClass:
@@ -177,7 +179,9 @@ extern int fd_bootchk;
 void pc_register_ferr_irq(qemu_irq irq);
 void pc_acpi_smi_interrupt(void *opaque, int irq, int level);
 
-void pc_cpus_init(const char *cpu_model, DeviceState *icc_bridge);
+/* vmware_port_ring3 true says enable VMware port access in ring3. */
+void pc_cpus_init(const char *cpu_model, DeviceState *icc_bridge,
+  bool vmware_port_ring3);
 void pc_hot_add_cpu(const int64_t id, Error **errp);
 void pc_acpi_init(const char *default_dsdt);
 
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 3a9b32e..a787599 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2594,6 +2594,7 @@ static void x86_cpu_reset(CPUState 

Re: [Qemu-devel] [PATCH v3 4/4] virtio-blk: introduce multiread

2015-01-30 Thread Peter Lieven
Am 30.01.2015 um 18:16 schrieb Max Reitz:
 On 2015-01-30 at 09:33, Peter Lieven wrote:
 this patch finally introduces multiread support to virtio-blk. While
 multiwrite support was there for a long time, read support was missing.

 The complete merge logic is moved into virtio-blk.c which has
 been the only user of request merging ever since. This is required
 to be able to merge chunks of requests and immediately invoke callbacks
 for those requests. Secondly, this is required to switch to
 direct invocation of coroutines which is planned at a later stage.

 The following benchmarks show the performance of running fio with
 4 worker threads on a local ram disk. The numbers show the average
 of 10 test runs after 1 run as warmup phase.

|4k|   64k|4k
 MB/s  | rd seq | rd rand | rd seq | rd rand | wr seq | wr rand
 --++-++-++
 master| 1221   | 1187| 4178   | 4114| 1745   | 1213
 multiread | 1829   | 1189| 4639   | 4110| 1894   | 1216

 Signed-off-by: Peter Lieven p...@kamp.de
 ---
   hw/block/dataplane/virtio-blk.c |   8 +-
   hw/block/virtio-blk.c   | 288 
 +++-
   include/hw/virtio/virtio-blk.h  |  19 +--
   trace-events|   1 +
   4 files changed, 210 insertions(+), 106 deletions(-)

 diff --git a/hw/block/dataplane/virtio-blk.c 
 b/hw/block/dataplane/virtio-blk.c
 index 39c5d71..93472e9 100644
 --- a/hw/block/dataplane/virtio-blk.c
 +++ b/hw/block/dataplane/virtio-blk.c
 @@ -96,9 +96,7 @@ static void handle_notify(EventNotifier *e)
   event_notifier_test_and_clear(s-host_notifier);
   blk_io_plug(s-conf-conf.blk);
   for (;;) {
 -MultiReqBuffer mrb = {
 -.num_writes = 0,
 -};
 +MultiReqBuffer mrb = {};
   int ret;
 /* Disable guest-host notifies to avoid unnecessary vmexits */
 @@ -120,7 +118,9 @@ static void handle_notify(EventNotifier *e)
   virtio_blk_handle_request(req, mrb);
   }
   -virtio_submit_multiwrite(s-conf-conf.blk, mrb);
 +if (mrb.num_reqs) {
 +virtio_submit_multireq(s-conf-conf.blk, mrb);
 +}
 if (likely(ret == -EAGAIN)) { /* vring emptied */
   /* Re-enable guest-host notifies and stop processing the 
 vring.
 diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
 index e04adb8..77890a0 100644
 --- a/hw/block/virtio-blk.c
 +++ b/hw/block/virtio-blk.c
 @@ -34,6 +34,7 @@ VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
   req-dev = s;
   req-qiov.size = 0;
   req-next = NULL;
 +req-mr_next = NULL;
   return req;
   }
   @@ -84,20 +85,29 @@ static int virtio_blk_handle_rw_error(VirtIOBlockReq 
 *req, int error,
 static void virtio_blk_rw_complete(void *opaque, int ret)
   {
 -VirtIOBlockReq *req = opaque;
 +VirtIOBlockReq *next = opaque;
   -trace_virtio_blk_rw_complete(req, ret);
 +while (next) {
 +VirtIOBlockReq *req = next;
 +next = req-mr_next;
 +trace_virtio_blk_rw_complete(req, ret);
   -if (ret) {
 -int p = virtio_ldl_p(VIRTIO_DEVICE(req-dev), req-out.type);
 -bool is_read = !(p  VIRTIO_BLK_T_OUT);
 -if (virtio_blk_handle_rw_error(req, -ret, is_read))
 -return;
 -}
 +if (req-qiov.nalloc != -1) {
 +qemu_iovec_destroy(req-qiov);
 +}
   -virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
 -block_acct_done(blk_get_stats(req-dev-blk), req-acct);
 -virtio_blk_free_request(req);
 +if (ret) {
 +int p = virtio_ldl_p(VIRTIO_DEVICE(req-dev), req-out.type);
 +bool is_read = !(p  VIRTIO_BLK_T_OUT);
 +if (virtio_blk_handle_rw_error(req, -ret, is_read)) {
 +continue;
 +}
 +}
 +
 +virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
 +block_acct_done(blk_get_stats(req-dev-blk), req-acct);
 +virtio_blk_free_request(req);
 +}
   }
 static void virtio_blk_flush_complete(void *opaque, int ret)
 @@ -291,24 +301,125 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq 
 *req)
   }
   }
   -void virtio_submit_multiwrite(BlockBackend *blk, MultiReqBuffer *mrb)
 +static inline void

 Is the inline really worth it? This function is rather long...

I tried really hard to not add a regression for random reads and the difference
between good and bad here is sometimes just a matter of a few small changes.
If gcc inlines on its own anyway would you mind to keep the inline? The function
is actually only called from 2 places.


 (To my surprise, gcc actually does inline the function)

 +virtio_submit_multireq2(BlockBackend *blk, MultiReqBuffer *mrb,

 Hm, that's not a very descriptive name... Maybe submit_merged_requests or 
 something like that (it's static, so you can drop the virtio_ prefix if you 

[Qemu-devel] [PATCH 4/6] vmport_rpc: Add QMP access to vmport_rpc object.

2015-01-30 Thread Don Slutz
This adds two new inject commands:

inject-vmport-reboot
inject-vmport-halt

And three guest info commands:

vmport-guestinfo-set
vmport-guestinfo-get
query-vmport-guestinfo

More details in qmp-commands.hx

Signed-off-by: Don Slutz dsl...@verizon.com
---
 hw/misc/vmport_rpc.c | 275 +++
 qapi-schema.json |  95 ++
 qmp-commands.hx  | 141 ++
 3 files changed, 511 insertions(+)

diff --git a/hw/misc/vmport_rpc.c b/hw/misc/vmport_rpc.c
index e12f912..74cc95e 100644
--- a/hw/misc/vmport_rpc.c
+++ b/hw/misc/vmport_rpc.c
@@ -218,6 +218,56 @@ typedef struct {
 uint32_t edi;
 } vregs;
 
+/*
+ * Run func() for every VMPortRpc device, traverse the tree for
+ * everything else.  Note: This routine expects that opaque is a
+ * VMPortRpcFind pointer and not NULL.
+ */
+static int find_VMPortRpc_device(Object *obj, void *opaque)
+{
+VMPortRpcFind *find = opaque;
+Object *dev;
+VMPortRpcState *s;
+
+if (find-found) {
+return 0;
+}
+dev = object_dynamic_cast(obj, TYPE_VMPORT_RPC);
+s = (VMPortRpcState *)dev;
+
+if (!s) {
+/* Container, traverse it for children */
+return object_child_foreach(obj, find_VMPortRpc_device, opaque);
+}
+
+find-found++;
+find-rc = find-func(s, find-arg);
+
+return 0;
+}
+
+/*
+ * Loop through all dynamically created VMPortRpc devices and call
+ * func() for each instance.
+ */
+static int foreach_dynamic_VMPortRpc_device(FindVMPortRpcDeviceFunc *func,
+void *arg)
+{
+VMPortRpcFind find = {
+.func = func,
+.arg = arg,
+};
+
+/* Loop through all VMPortRpc devices that were spawened outside
+ * the machine */
+find_VMPortRpc_device(qdev_get_machine(), find);
+if (find.found) {
+return find.rc;
+} else {
+return VMPORT_DEVICE_NOT_FOUND;
+}
+}
+
 #ifdef VMPORT_RPC_DEBUG
 /*
  * Add helper function for traceing.  This routine will convert
@@ -465,6 +515,26 @@ static int get_guestinfo(VMPortRpcState *s,
 return GUESTINFO_NOTFOUND;
 }
 
+static int get_qmp_guestinfo(VMPortRpcState *s,
+ unsigned int a_key_len, char *a_info_key,
+ unsigned int *a_value_len, void **a_value_data)
+{
+unsigned int i;
+
+for (i = 0; i  s-used_guestinfo; i++) {
+if (s-guestinfo[i] 
+(s-guestinfo[i]-key_len == a_key_len) 
+(memcmp(a_info_key, s-guestinfo[i]-key_data,
+s-guestinfo[i]-key_len) == 0)) {
+*a_value_len = s-guestinfo[i]-val_len;
+*a_value_data = s-guestinfo[i]-val_data;
+return 0;
+}
+}
+
+return GUESTINFO_NOTFOUND;
+}
+
 static int set_guestinfo(VMPortRpcState *s, int a_key_len,
  unsigned int a_val_len, char *a_info_key, char *val)
 {
@@ -864,6 +934,210 @@ static uint32_t vmport_rpc_ioport_read(void *opaque, 
uint32_t addr)
 return ur.data[0];
 }
 
+static int find_send(VMPortRpcState *s, void *arg)
+{
+return ctrl_send(s, arg);
+}
+
+static void convert_local_rc(Error **errp, int rc)
+{
+switch (rc) {
+case 0:
+break;
+case VMPORT_DEVICE_NOT_FOUND:
+error_set(errp, QERR_DEVICE_NOT_FOUND, TYPE_VMPORT_RPC);
+break;
+case SEND_NOT_OPEN:
+error_set(errp, ERROR_CLASS_GENERIC_ERROR, VMWare rpc not open);
+break;
+case SEND_SKIPPED:
+error_set(errp, ERROR_CLASS_GENERIC_ERROR, VMWare rpc send skipped);
+break;
+case SEND_TRUCATED:
+error_set(errp, ERROR_CLASS_GENERIC_ERROR, VMWare rpc send trucated);
+break;
+case SEND_NO_MEMORY:
+error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+  VMWare rpc send out of memory);
+break;
+case GUESTINFO_NOTFOUND:
+error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+  VMWare guestinfo not found);
+break;
+case GUESTINFO_VALTOOLONG:
+error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+  VMWare guestinfo value too long);
+break;
+case GUESTINFO_KEYTOOLONG:
+error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+  VMWare guestinfo key too long);
+break;
+case GUESTINFO_TOOMANYKEYS:
+error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+  VMWare guestinfo too many keys);
+break;
+case GUESTINFO_NO_MEMORY:
+error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+  VMWare guestinfo out of memory);
+break;
+default:
+error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+  VMWare rpc send rc=%d unknown, rc);
+break;
+}
+}
+
+void qmp_inject_vmport_reboot(Error **errp)
+{
+int rc = foreach_dynamic_VMPortRpc_device(find_send, (void *)OS_Reboot);
+
+convert_local_rc(errp, rc);
+}
+
+void qmp_inject_vmport_halt(Error **errp)
+{
+

Re: [Qemu-devel] [PATCH RESEND 03/50] hw/block/fdc: Implement tray status

2015-01-30 Thread John Snow



On 01/27/2015 02:45 PM, Max Reitz wrote:

The tray of an FDD is open iff there is no medium inserted (there are
only two states for an FDD: medium inserted or no medium inserted).

This results in the tray being reported as open if qemu has been started
with the default floppy drive, which breaks some tests. Fix them.

Signed-off-by: Max Reitz mre...@redhat.com
---
  hw/block/fdc.c | 20 +---
  tests/fdc-test.c   |  4 +---
  tests/qemu-iotests/067.out | 60 +++---
  tests/qemu-iotests/071.out |  2 --
  tests/qemu-iotests/081.out |  1 -
  tests/qemu-iotests/087.out |  5 
  6 files changed, 26 insertions(+), 66 deletions(-)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 2bf87c9..0c5a6b4 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -192,6 +192,8 @@ typedef struct FDrive {
  uint8_t ro;   /* Is read-only   */
  uint8_t media_changed;/* Is media changed   */
  uint8_t media_rate;   /* Data rate of medium*/
+
+bool media_inserted;  /* Is there a medium in the tray */
  } FDrive;

  static void fd_init(FDrive *drv)
@@ -261,7 +263,7 @@ static int fd_seek(FDrive *drv, uint8_t head, uint8_t 
track, uint8_t sect,
  #endif
  drv-head = head;
  if (drv-track != track) {
-if (drv-blk != NULL  blk_is_inserted(drv-blk)) {
+if (drv-media_inserted) {
  drv-media_changed = 0;
  }
  ret = 1;
@@ -270,7 +272,7 @@ static int fd_seek(FDrive *drv, uint8_t head, uint8_t 
track, uint8_t sect,
  drv-sect = sect;
  }

-if (drv-blk == NULL || !blk_is_inserted(drv-blk)) {
+if (!drv-media_inserted) {
  ret = 2;
  }

@@ -296,7 +298,7 @@ static void fd_revalidate(FDrive *drv)
  ro = blk_is_read_only(drv-blk);
  pick_geometry(drv-blk, nb_heads, max_track,
last_sect, drv-drive, drive, rate);
-if (!blk_is_inserted(drv-blk)) {
+if (!drv-media_inserted) {
  FLOPPY_DPRINTF(No disk in drive\n);
  } else {
  FLOPPY_DPRINTF(Floppy disk (%d h %d t %d s) %s\n, nb_heads,
@@ -2062,12 +2064,21 @@ static void fdctrl_change_cb(void *opaque, bool load)
  {
  FDrive *drive = opaque;

+drive-media_inserted = load  drive-blk  blk_is_inserted(drive-blk);
+
  drive-media_changed = 1;
  fd_revalidate(drive);
  }

+static bool fdctrl_is_tray_open(void *opaque)
+{
+FDrive *drive = opaque;
+return !drive-media_inserted;
+}
+
  static const BlockDevOps fdctrl_block_ops = {
  .change_media_cb = fdctrl_change_cb,
+.is_tray_open = fdctrl_is_tray_open,
  };

  /* Init functions */
@@ -2095,6 +2106,9 @@ static void fdctrl_connect_drives(FDCtrl *fdctrl, Error 
**errp)
  fdctrl_change_cb(drive, 0);
  if (drive-blk) {
  blk_set_dev_ops(drive-blk, fdctrl_block_ops, drive);
+if (blk_is_inserted(drive-blk)) {
+drive-media_inserted = true;
+}
  }
  }
  }
diff --git a/tests/fdc-test.c b/tests/fdc-test.c
index 3c6c83c..f287c10 100644
--- a/tests/fdc-test.c
+++ b/tests/fdc-test.c
@@ -293,9 +293,7 @@ static void test_media_insert(void)
  qmp_discard_response({'execute':'change', 'arguments':{
'device':'floppy0', 'target': %s, 'arg': 'raw' }},
   test_image);
-qmp_discard_response(); /* ignore event
- (FIXME open - open transition?!) */
-qmp_discard_response(); /* ignore event */
+qmp_discard_response(); /* ignore event (open - close) */

  dir = inb(FLOPPY_BASE + reg_dir);
  assert_bit_set(dir, DSKCHG);
diff --git a/tests/qemu-iotests/067.out b/tests/qemu-iotests/067.out
index 00b3eae..42bae32 100644
--- a/tests/qemu-iotests/067.out
+++ b/tests/qemu-iotests/067.out
@@ -69,7 +69,7 @@ Testing: -drive 
file=TEST_DIR/t.qcow2,format=qcow2,if=none,id=disk -device virti
  device: floppy0,
  locked: false,
  removable: true,
-tray_open: false,
+tray_open: true,
  type: unknown
  },
  {
@@ -131,7 +131,7 @@ Testing: -drive 
file=TEST_DIR/t.qcow2,format=qcow2,if=none,id=disk -device virti
  device: floppy0,
  locked: false,
  removable: true,
-tray_open: false,
+tray_open: true,
  type: unknown
  },
  {
@@ -165,17 +165,6 @@ Testing: -drive 
file=TEST_DIR/t.qcow2,format=qcow2,if=none,id=disk -device virti
  tray-open: true
  }
  }
-{
-timestamp: {
-seconds:  TIMESTAMP,
-microseconds:  TIMESTAMP
-},
-event: DEVICE_TRAY_MOVED,
-data: {
-device: floppy0,
-tray-open: true
-}
-}


  === -drive/device_add and device_del ===
@@ -246,7 +235,7 @@ Testing: -drive 

[Qemu-devel] value of VIRTQUEUE_MAX_SIZE

2015-01-30 Thread Peter Lieven
Hi,

Just wondering if VIRTQUEUE_MAX_SIZE in include/hw/virtio/virtio.h should not 
be equal to IOV_MAX instead of the hardcoded 1024?

Thanks,
Peter




Re: [Qemu-devel] [PATCH 7/8] qmp: print dirty bitmap

2015-01-30 Thread Eric Blake
On 01/30/2015 02:06 AM, Vladimir Sementsov-Ogievskiy wrote:
 is it better to add qmp_query_dirty_bitmap with underlying
 bdrv_query_dirty_bitmap, or to modify (add dirty regions information)
 existing qmp_query_block/qmp_query_dirty_bitmapS?

[please don't top-post on technical lists]

Extending an existing command may be reasonable, if the amount of
information being added is compact (if you are adding several kilobytes
of information, that might justify a new command, if only so that old
callers don't waste time receiving large amounts of data on the wire
just to sift through and discard it all to get at the older content they
cared about).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v11 00/13] block: Incremental backup series

2015-01-30 Thread John Snow



On 01/30/2015 05:24 AM, Vladimir Sementsov-Ogievskiy wrote:

About added functions for BdrvDirtyBitmap:

some functions has needless BlockDriverState* parameter, and others -
doesn't:

with needless *bs:
bdrv_dirty_bitmap_make_anon
bdrv_dirty_bitmap_granularity
bdrv_clear_dirty_bitmap

without *bs:
bdrv_dirty_bitmap_enabled
bdrv_disable_dirty_bitmap
bdrv_enable_dirty_bitmap

I think, to be consistent, onlly one of two ways should be chosen:
1) all bdrv_dirty_bitmap_* functions has BlockDriverState* bs
parameter, maybe needless
2) only bdrv_dirty_bitmap_* function that need BlockDriverState* bs
parameter has it

Personally, I prefer the second one.

Best regards,
Vladimir


Fair enough, it is inconsistent. I was mostly following Fam's lead for 
his prototypes, so I could go either way with it.


Before I cull the unused parameters, though:

Fam, do we have a strong reason for why we want to maintain the unused 
BDS parameter for some of these functions? (Like making sure a caller 
HAS a BDS before they call these functions?)


Thanks,
--js



On 12.01.2015 19:30, John Snow wrote:

Welcome to version 11. I hope you are enjoying our regular newsletter.

This patchset enables the in-memory part of the incremental backup
feature. A patchset by Vladimir Sementsov-Ogievskiy enables the
migration of in-memory dirty bitmaps, and a future patchset will
enable the storage and retrieval of dirty bitmaps to and from permanent
storage.

Enough changes have been made that most Reviewed-By lines from
previous iterations have been removed. (Sorry!)

This series was originally authored by Fam Zheng;
his cover letter is included below.

~John Snow

=

This is the in memory part of the incremental backup feature.

With the added commands, we can create a bitmap on a block
backend, from which point of time all the writes are tracked by
the bitmap, marking sectors as dirty. Later, we call drive-backup
and pass the bitmap to it, to do an incremental backup.

See the last patch which adds some tests for this use case.

Fam

=

For convenience, this patchset is available on github:
 https://github.com/jnsnow/qemu/commits/dbm-backup

v11:

  - Instead of copying BdrvDirtyBitmaps and keeping a pointer to the
object we were copied from, we instead freeze a bitmap in-place
without copying it. On success, we thaw and delete the bitmap.
On failure, we merge the bitmap with a successor, which is an
anonymous bitmap attached as a child that records writes for us
for the duration of the backup operation.

This means that incremental backups can NEVER BE RETRIED in a
deterministic fashion. If an incremental backup fails on a set
of dirty sectors {x}, and a new set of dirty sectors {y} are
introduced during the backup, then any possible retry action
on an incremental backup can only operate on {x,y}. There is no
way to get an incremental backup as it would have been.

So, the failure mode for incremental backup is to try again,
and the resulting image will simply be a differential from the
last successful dirty bitmap backup.

  - Removed hbitmap_copy and bdrv_dirty_bitmap_copy.

  - Added a small fixup patch:
- Update all granularity fields to be uint64_t.
- Update documentation around BdrvDirtyBitmap structure.

  - Modified transactions to obey frozen attribute of BdrvDirtyBitmaps.

  - Added frozen attribute to the info query.

v10 (Overview):

  - I've included one of Vladimir's bitmap fixes as patch #1.

  - QMP commands and transactions are now protected via
aio_context functions.

  - QMP commands use node-ref as a parameter name now. Reasoning
is thus: Either a device name or a node name can be used to
reference a BDS, which is the key item we are actually acting
on with these bitmap commands. Thus, I refer to this unified
parameter as a Node Reference, or node-ref.

We could also argue for backend-ref or device-ref for the
reverse semantics: where we accept a unified parameter, but we
intend to resolve it to the BlockBackend instead of resolving
the parameter given to the BlockDriverState.

Or, we could use reference for both cases, and to match the
existing BlockdevRef command.

  - All QMP commands added are now per-node via a unified parameter
name. Though backup only operates on devices, you are free to
create bitmaps for any arbitrary node you wish. It is obviously
only useful for the root node, currently.

  - Bitmap Sync modes (CONSUME, RESET) are removed. See below
(changelog, RFC questions) for more details.

  - Code to recover the bitmap after a failure has been added,
but I have some major questions about drive_backup guarantees.
See RFC questions.

v10 (Detailed Changelog):

(1/13) New Patch:
  - Included Vladimir Sementsov-Ogievskiy's 

Re: [Qemu-devel] [PATCH 14/21] blockdev: Remove blk_hide_on_behalf_of_do_drive_del()

2015-01-30 Thread Eric Blake
On 01/26/2015 12:27 PM, Max Reitz wrote:
 This function first removed the BlockBackend from the blk_backends list
 and cleared its name so it would no longer be found by blk_name(); since
 blk_next() now iterates through monitor_block_backends (which the BB is
 removed from in do_drive_del()), this is no longer necessary.
 
 Second, bdrv_make_anon() was called on the BDS. This was intended for
 cases where the BDS was owned by that BB alone; in which case the BDS
 will no longer exist at this point thanks to the blk_remove_bs() in
 do_drive_del().
 
 Therefore, this function does nothing useful anymore. Remove it.
 
 Signed-off-by: Max Reitz mre...@redhat.com
 ---
  block/block-backend.c  | 25 ++---
  blockdev.c |  1 -
  include/sysemu/block-backend.h |  2 --
  3 files changed, 2 insertions(+), 26 deletions(-)

Reviewed-by: Eric Blake ebl...@redhat.com

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 15/21] block: Make bdrv_drain_one() public

2015-01-30 Thread Eric Blake
On 01/26/2015 12:27 PM, Max Reitz wrote:
 We will need it in block/block-backend.c.
 
 Signed-off-by: Max Reitz mre...@redhat.com
 ---
  block.c   | 2 +-
  include/block/block_int.h | 2 ++
  2 files changed, 3 insertions(+), 1 deletion(-)

Reviewed-by: Eric Blake ebl...@redhat.com


-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v11 03/13] qmp: Add block-dirty-bitmap-add and block-dirty-bitmap-remove

2015-01-30 Thread Kevin Wolf
Am 30.01.2015 um 18:04 hat John Snow geschrieben:
 
 
 On 01/30/2015 09:32 AM, Kevin Wolf wrote:
 Am 21.01.2015 um 10:34 hat Markus Armbruster geschrieben:
 I'm afraid I forgot much of the discussion we had before the break, and
 only now it's coming back, slowly.
 
 Quoting myself on naming parameters identifying nodes[*]:
 
  John Snow pointed out to me that we still haven't spelled out how this
  single parameter should be named.
 
  On obvious option is calling it node-name, or FOO-node-name when we 
  have
  several.  However, we'd then have two subtly different kinds of
  parameters called like that: the old ones accept *only* node names, the
  new ones also accept backend names, which automatically resolve to the
  backend's root node.
 
  Three ways to cope with that:
 
  * Find a better name.
 
  * Make the old ones accept backend names, too.  Only a few, not that
much work.  However, there are exceptions:
 
- blockdev-add's node-name *defines* the node name.
 
- query-named-block-nodes's node-name *is* the node's name.
 
  * Stop worrying and embrace the inconsistency.  The affected commands
are headed for deprecation anyway.
 
  I think I'd go with node or FOO-node for parameters that reference
  nodes and accept both node names and backend names, and refrain from
  touching the existing node-name parameters.
 
 Wasn't the conclusion last time that we would try to find a better name
 for new commands and leave old commands alone because they are going to
 become deprecated? That is, a combination of your first and last option?
 
 
 That was my impression, too: Use a new name for new commands and
 then slowly phase out the old things. This makes the new name clear
 as to what it supports (BOTH backends and nodes through a common
 namespace) to external management utilities like libvirt.
 
 That's why I just rolled 'node-ref.'

Yes, I agree with that in principle. I called it 'node' below because
that's shorter and doesn't include type information in the name, but if
everyone preferred 'node-ref', I wouldn't have a problem with it either.

 Let's go through existing uses of @node-name again:
 
 1. Define a node name
 
 QMP commands blockdev-add (type BlockdevOptionsBase), drive-mirror
 
 2. Report a node name
 
 QMP command query-named-block-nodes (type BlockDeviceInfo)
 
 Whatever name we end up using, 1. and 2. should probably use the same.
 
 Should they? If these commands accept directly *node* names and have
 no chance of referencing a backend, maybe they should use different
 parameter names.

Note that these cases don't refer to a node, but they define/return the
name of a node. No way that could ever be a backend name, unless we
broke the code.

 3. Node reference with backend names permitted for convenience
 
 New QMP command block-dirty-bitmap-add (type BlockDirtyBitmapAdd) and
 others
 
 4. Node reference with backend names not permitted
 
 QMP commands drive-mirror @replaces, change-backing-file
 @image-node-name
 
 We may want to support the backend name resolves to root node
 convenience feature here, for consistency.  Then this moves under 3.
 
 Note interface wart: change-backing-file additionally requires the
 backend owning the node.  We need the backend to set op-blockers, we
 can't easily find it from the node, so we make the user point it out
 to us.
 
 These shouldn't be existing. As you say, we should move them to 3.
 
 
 Technically #3 here isn't a usage of node-name, because... I
 didn't use node-name for these commands. Unless I am reading this
 list wrong, but it's definitely not an existing use.
 
 I don't have any opinions about #4; presumably that's something
 we're aiming to phase out.

Yes. Where phasing out simply means to extend it so that it accepts
backend names. That should in theory be an easy change, except that
@image-node-name has a name that isn't quite compatible with it...

 5. Pair of names node reference, specify exactly one
 
 QMP commands block_passwd, block_resize, blockdev-snapshot-sync
 
 We can ignore this one, because we intend to replace the commands and
 deprecate the old ones.
 
 Agreed, these shouldn't be existing either.
 
 If I understand you correctly, you're proposing to use @node-name or
 @FOO-node-name when the value must be a node name (items 1+2 and 4), and
 @node-ref or @FOO-node-ref where we additionally support the backend
 name resolves to root node convenience feature (item 3).
 
 Is that a fair description of your proposal?
 
 PRO: the name makes it clear when the convenience feature is supported.
 
 CON: if we eliminate 4 by supporting the convenience feature, we either
 create ugly exceptions to the naming convention, or rename the
 parameters.
 
 Opinions?
 
 If we don't have any cases where node names are allowed, but backend
 names are not, then there is no reason to have two 

[Qemu-devel] [PATCH 10/19] libqos/ahci: Add ide cmd properties

2015-01-30 Thread John Snow
Add a structure that defines some properties of various IDE commands.
These will be used to simplify the interface to the libqos AHCI calls,
lessening the redundancy of specifying and respecifying properties of
commands to various helper functions.

The Invalid Command Sentinel here that caps the property array is an
invalid ATA command, namely 0x01. 0x00 is NOP and 0xFF is reserved for
vendor usage, so I chose the first invalid one instead.

Signed-off-by: John Snow js...@redhat.com
---
 tests/libqos/ahci.c | 42 ++
 1 file changed, 42 insertions(+)

diff --git a/tests/libqos/ahci.c b/tests/libqos/ahci.c
index 924b9f0..f0e2a27 100644
--- a/tests/libqos/ahci.c
+++ b/tests/libqos/ahci.c
@@ -34,6 +34,48 @@
 #include hw/pci/pci_ids.h
 #include hw/pci/pci_regs.h
 
+typedef struct AHCICommandProp {
+uint8_t  cmd;/* Command Code */
+bool data;   /* Data transfer command? */
+bool pio;
+bool dma;
+bool lba28;
+bool lba48;
+bool read;
+bool write;
+bool atapi;
+bool ncq;
+uint64_t size;   /* Static transfer size, for commands like IDENTIFY. 
*/
+uint32_t interrupts; /* Expected interrupts for this command. */
+} AHCICommandProp;
+
+#define CMD_INVALID_SENTINEL 0x01
+
+AHCICommandProp ahci_command_properties[] = {
+{ .cmd = CMD_READ_PIO,  .data = true,  .pio = true,
+.lba28 = true, .read = true },
+{ .cmd = CMD_WRITE_PIO, .data = true,  .pio = true,
+.lba28 = true, .write = true },
+{ .cmd = CMD_READ_PIO_EXT,  .data = true,  .pio = true,
+.lba48 = true, .read = true },
+{ .cmd = CMD_WRITE_PIO_EXT, .data = true,  .pio = true,
+.lba48 = true, .write = true },
+{ .cmd = CMD_READ_DMA,  .data = true,  .dma = true,
+.lba28 = true, .read = true },
+{ .cmd = CMD_WRITE_DMA, .data = true,  .dma = true,
+.lba28 = true, .write = true },
+{ .cmd = CMD_READ_DMA_EXT,  .data = true,  .dma = true,
+.lba48 = true, .read = true },
+{ .cmd = CMD_WRITE_DMA_EXT, .data = true,  .dma = true,
+.lba48 = true, .write = true },
+{ .cmd = CMD_IDENTIFY,  .data = true,  .pio = true,
+.size = 512,   .read = true },
+{ .cmd = CMD_READ_MAX,  .lba28 = true },
+{ .cmd = CMD_READ_MAX_EXT,  .lba48 = true },
+{ .cmd = CMD_FLUSH_CACHE,   .data = false },
+{ .cmd = CMD_INVALID_SENTINEL }
+};
+
 /**
  * Allocate space in the guest using information in the AHCIQState object.
  */
-- 
1.9.3




[Qemu-devel] [PATCH 11/19] libqos/ahci: add ahci command functions

2015-01-30 Thread John Snow
This patch adds the AHCICommand structure, and a set of functions to
operate on the structure.

ahci_command_create - Initialize and create a new AHCICommand in memory
ahci_command_free - Destroy this object.
ahci_command_set_buffer - Set where the guest memory DMA buffer is.
ahci_command_commit - Write this command to the AHCI HBA.
ahci_command_issue - Issue the committed command synchronously.
ahci_command_issue_async - Issue the committed command asynchronously.
ahci_command_wait - Wait for an asynchronous command to finish.
ahci_command_slot - Get the number of the command slot we committed to.

Helpers:
size_to_prdtl   - Calculate the required minimum PRDTL size from
  a buffer size.
ahci_command_find   - Given an ATA command mnemonic, look it up in the
  properties table to obtain info about the command.
command_header_init - Initialize the command header with sane values.
command_table_init  - Initialize the command table with sane values.

Signed-off-by: John Snow js...@redhat.com
---
 tests/ahci-test.c   |  73 +--
 tests/libqos/ahci.c | 202 
 tests/libqos/ahci.h |  15 
 3 files changed, 234 insertions(+), 56 deletions(-)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index c862525..0834020 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -657,30 +657,28 @@ static void ahci_test_port_spec(AHCIQState *ahci, uint8_t 
port)
  */
 static void ahci_test_identify(AHCIQState *ahci)
 {
-RegH2DFIS fis;
-AHCICommandHeader cmd;
-PRD prd;
 uint32_t data_ptr;
 uint16_t buff[256];
 unsigned i;
 int rc;
+AHCICommand *cmd;
 uint8_t cx;
-uint64_t table;
 
 g_assert(ahci != NULL);
 
 /* We need to:
- * (1) Create a Command Table Buffer and update the Command List Slot #0
- * to point to this buffer.
- * (2) Construct an FIS host-to-device command structure, and write it to
+ * (1) Create a data buffer for the IDENTIFY response to be sent to,
+ * (2) Create a Command Table Buffer
+ * (3) Construct an FIS host-to-device command structure, and write it to
  * the top of the command table buffer.
- * (3) Create a data buffer for the IDENTIFY response to be sent to
  * (4) Create a Physical Region Descriptor that points to the data buffer,
  * and write it to the bottom (offset 0x80) of the command table.
- * (5) Now, PxCLB points to the command list, command 0 points to
+ * (5) Obtain a Command List slot, and update this header to point to
+ * the Command Table we built above.
+ * (6) Now, PxCLB points to the command list, command 0 points to
  * our table, and our table contains an FIS instruction and a
  * PRD that points to our rx buffer.
- * (6) We inform the HBA via PxCI that there is a command ready in slot #0.
+ * (7) We inform the HBA via PxCI that there is a command ready in slot #0.
  */
 
 /* Pick the first implemented and running port */
@@ -690,61 +688,24 @@ static void ahci_test_identify(AHCIQState *ahci)
 /* Clear out the FIS Receive area and any pending interrupts. */
 ahci_port_clear(ahci, i);
 
-/* Create a Command Table buffer. 0x80 is the smallest with a PRDTL of 0. 
*/
-/* We need at least one PRD, so round up to the nearest 0x80 multiple.
*/
-table = ahci_alloc(ahci, CMD_TBL_SIZ(1));
-g_assert(table);
-ASSERT_BIT_CLEAR(table, 0x7F);
-
-/* Create a data buffer ... where we will dump the IDENTIFY data to. */
+/* Create a data buffer where we will dump the IDENTIFY data to. */
 data_ptr = ahci_alloc(ahci, 512);
 g_assert(data_ptr);
 
-/* pick a command slot (should be 0!) */
-cx = ahci_pick_cmd(ahci, i);
-
-/* Construct our Command Header (set_command_header handles endianness.) */
-memset(cmd, 0x00, sizeof(cmd));
-cmd.flags = 5;/* reg_h2d_fis is 5 double-words long */
-cmd.flags = CMDH_CLR_BSY; /* clear PxTFD.STS.BSY when done */
-cmd.prdtl = 1;/* One PRD table entry. */
-cmd.prdbc = 0;
-cmd.ctba = table;
-
-/* Construct our PRD, noting that DBC is 0-indexed. */
-prd.dba = cpu_to_le64(data_ptr);
-prd.res = 0;
-/* 511+1 bytes, request DPS interrupt */
-prd.dbc = cpu_to_le32(511 | 0x8000);
-
-/* Construct our Command FIS, Based on http://wiki.osdev.org/AHCI */
-memset(fis, 0x00, sizeof(fis));
-fis.fis_type = REG_H2D_FIS;  /* Register Host-to-Device FIS */
-fis.command = CMD_IDENTIFY;
-fis.device = 0;
-fis.flags = REG_H2D_FIS_CMD; /* Indicate this is a command FIS */
-
-/* We've committed nothing yet, no interrupts should be posted yet. */
-g_assert_cmphex(ahci_px_rreg(ahci, i, AHCI_PX_IS), ==, 0);
-
-/* Commit the Command FIS to the Command Table */
-memwrite(table, fis, sizeof(fis));
-
-/* Commit the PRD entry to the Command Table */
-

Re: [Qemu-devel] [PATCH 1/1] block: enforce minimal 4096 alignment in qemu_blockalign

2015-01-30 Thread Denis V. Lunev

On 30/01/15 22:48, Kevin Wolf wrote:

Am 30.01.2015 um 19:39 hat Denis V. Lunev geschrieben:

On 29/01/15 16:49, Denis V. Lunev wrote:

On 29/01/15 16:18, Kevin Wolf wrote:

Am 29.01.2015 um 11:50 hat Denis V. Lunev geschrieben:

The following sequence
 int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
 for (i = 0; i  10; i++)
 write(fd, buf, 4096);
performs 5% better if buf is aligned to 4096 bytes rather then to
512 bytes on HDD with 512/4096 logical/physical sector size.

The difference is quite reliable.

On the other hand we do not want at the moment to enforce bounce
buffering if guest request is aligned to 512 bytes. This patch
forces page alignment when we really forced to perform memory
allocation.

Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Paolo Bonzini pbonz...@redhat.com
CC: Kevin Wolf kw...@redhat.com
CC: Stefan Hajnoczi stefa...@redhat.com
---
  block.c | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index d45e4dd..38cf73f 100644
--- a/block.c
+++ b/block.c
@@ -5293,7 +5293,11 @@ void
bdrv_set_guest_block_size(BlockDriverState *bs, int align)
void *qemu_blockalign(BlockDriverState *bs, size_t size)
  {
-return qemu_memalign(bdrv_opt_mem_align(bs), size);
+size_t align = bdrv_opt_mem_align(bs);
+if (align  4096) {
+align = 4096;
+}
+return qemu_memalign(align, size);
  }
void *qemu_blockalign0(BlockDriverState *bs, size_t size)
@@ -5307,6 +5311,9 @@ void
*qemu_try_blockalign(BlockDriverState *bs, size_t size)
/* Ensure that NULL is never returned on success */
  assert(align  0);
+if (align  4096) {
+align = 4096;
+}
  if (size == 0) {
  size = align;
  }

This is the wrong place to make this change. First you're duplicating
logic in the callers of bdrv_opt_mem_align() instead of making it return
the right thing in the first place.

This has been actually done in the first iteration. bdrv_opt_mem_align
is called actually three times in:
  qemu_blockalign
  qemu_try_blockalign
  bdrv_qiov_is_aligned
Paolo says that he does not want to have bdrv_qiov_is_aligned affected
to avoid extra bounce buffering.


From my point of view this extra bounce buffering is better than

unaligned
pointer during write to the disk as 512/4096 logical/physical
sectors size
disks are mainstream now. Though I don't want to specially argue here.
Normal guest operations results in page aligned requests and this is not
a problem at all. The amount of 512 aligned requests from guest side is
quite negligible.

  Second, you're arguing with numbers

from a simple test case for O_DIRECT on Linux, but you're changing the

alignment for everyone instead of just the raw-posix driver which is
responsible for accessing Linux files.

This should not be a real problem. We are allocation memory for the
buffer. A little bit stricter alignment is not a big overhead for
any libc
implementation thus this kludge will not produce any significant
overhead.

Also, what's the real reason for the performance improvement? Having
page alignment? If so, actually querying the page size instead of
assuming 4k might be worth a thought.

Kevin

Most likely the problem comes from the read-modify-write pattern
either in kernel or in disk. Actually my experience says that it is a
bad idea to supply 512 byte aligned buffer for O_DIRECT IO.
ABI technically allows this but in general it is much less tested.

Yes, this synthetic test shows some difference here. In terms of
qemu-io the result is also visible, but less
  qemu-img create -f qcow2 ./1.img 64G
  qemu-io -n -c 'write -P 0xaa 0 1G' 1.img
performs 1% better.

There is also similar kludge here
size_t bdrv_opt_mem_align(BlockDriverState *bs)
{
if (!bs || !bs-drv) {
/* 4k should be on the safe side */
return 4096;
}

return bs-bl.opt_mem_alignment;
}
which just uses 4096 constant.

Yes, I could agree that queering page size could be a good idea, but
I do not know at the moment how to do that. Can you pls share your
opinion if you have any.

Regards,
Den

Paolo, Kevin,

I have spent a bit more time digging the issue and found some
additional information. The same 5% difference if the buffer is
aligned to 512/4096 is observed for the following devices/filesystems

1) ext4 with block size equals to 1024 over 512/512 physical/logical
sector size SSD disk
2) ext4 with block size equals to 4096 over 512/512 physical/logical
sector size SSD disk
3) ext4 with block size equals to 4096 over 512/4096 physical/logical
sector size rotational disk (WDC WD20EZRX)
4) with block size equals to 4096 over 512/512 physical/logical
sector size SSD disk

This means that only page size (4k) matters.

Guys, you propose quite different approaches. I can extend this patch
to use sysconf(_SC_PAGESIZE) to detect page size and drop hardcoded
4096. This is not a problem. But you have different opinion about
the 

Re: [Qemu-devel] [PATCH v2] net: cadence_gem: Set initial MAC address

2015-01-30 Thread Peter Crosthwaite
On Thu, Jan 29, 2015 at 9:48 PM, Sebastian Huber
sebastian.hu...@embedded-brains.de wrote:
 Set initial MAC address to the one specified by the command line.

 Signed-off-by: Sebastian Huber sebastian.hu...@embedded-brains.de
 Reviewed-by: Jason Wang jasow...@redhat.com

 v2: Remove superfluous whitespace change.

The inter-spin change-logs should not go into commit messages. This
should be below the ---. With this removed (placed below the ---):

Reviewed-by: Peter Crosthwaite peter.crosthwa...@xilinx.com

 ---

Change-logs go here.

Regards,
Peter

  hw/net/cadence_gem.c | 6 ++
  1 file changed, 6 insertions(+)

 diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c
 index de26609..f9a7af1 100644
 --- a/hw/net/cadence_gem.c
 +++ b/hw/net/cadence_gem.c
 @@ -1005,6 +1005,7 @@ static void gem_reset(DeviceState *d)
  {
  int i;
  GemState *s = GEM(d);
 +const uint8_t *a;

  DB_PRINT(\n);

 @@ -1023,6 +1024,11 @@ static void gem_reset(DeviceState *d)
  s-regs[GEM_DESCONF5] = 0x002f2145;
  s-regs[GEM_DESCONF6] = 0x0200;

 +/* Set MAC address */
 +a = s-conf.macaddr.a[0];
 +s-regs[GEM_SPADDR1LO] = a[0] | (a[1]  8) | (a[2]  16) | (a[3]  
 24);
 +s-regs[GEM_SPADDR1HI] = a[4] | (a[5]  8);
 +
  for (i = 0; i  4; i++) {
  s-sar_active[i] = false;
  }
 --
 1.8.4.5





[Qemu-devel] [PATCH 06/19] libqos/ahci: Add ahci_port_check_interrupts helper

2015-01-30 Thread John Snow
A helper that compares a given port's current interrupts and checks them
against a supplied list of expected interrupt bits, and throws an error
if they do not match.

The helper then resets the requested interrupts on this port, and asserts
that the interrupt register is now empty.

Signed-off-by: John Snow js...@redhat.com
---
 tests/ahci-test.c   | 13 ++---
 tests/libqos/ahci.c | 14 ++
 tests/libqos/ahci.h |  2 ++
 3 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index 63ecf73..e744f44 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -747,19 +747,10 @@ static void ahci_test_identify(AHCIQState *ahci)
 while (BITSET(ahci_px_rreg(ahci, i, AHCI_PX_TFD), AHCI_PX_TFD_STS_BSY)) {
 usleep(50);
 }
+/* Check registers for post-command consistency */
 ahci_port_check_error(ahci, i);
-
-/* Check for expected interrupts */
-reg = ahci_px_rreg(ahci, i, AHCI_PX_IS);
-ASSERT_BIT_SET(reg, AHCI_PX_IS_DHRS);
-ASSERT_BIT_SET(reg, AHCI_PX_IS_PSS);
 /* BUG: we expect AHCI_PX_IS_DPS to be set. */
-ASSERT_BIT_CLEAR(reg, AHCI_PX_IS_DPS);
-
-/* Clear expected interrupts and assert all interrupts now cleared. */
-ahci_px_wreg(ahci, i, AHCI_PX_IS,
- AHCI_PX_IS_DHRS | AHCI_PX_IS_PSS | AHCI_PX_IS_DPS);
-g_assert_cmphex(ahci_px_rreg(ahci, i, AHCI_PX_IS), ==, 0);
+ahci_port_check_interrupts(ahci, i, AHCI_PX_IS_DHRS | AHCI_PX_IS_PSS);
 
 /* Investigate the CMD, assert that we read 512 bytes */
 ahci_get_command_header(ahci, i, cx, cmd);
diff --git a/tests/libqos/ahci.c b/tests/libqos/ahci.c
index 35de9af..3318a54 100644
--- a/tests/libqos/ahci.c
+++ b/tests/libqos/ahci.c
@@ -333,6 +333,20 @@ void ahci_port_check_error(AHCIQState *ahci, uint8_t px)
 ASSERT_BIT_CLEAR(reg, AHCI_PX_TFD_ERR);
 }
 
+void ahci_port_check_interrupts(AHCIQState *ahci, uint8_t px,
+uint32_t intr_mask)
+{
+uint32_t reg;
+
+/* Check for expected interrupts */
+reg = ahci_px_rreg(ahci, px, AHCI_PX_IS);
+ASSERT_BIT_SET(reg, intr_mask);
+
+/* Clear expected interrupts and assert all interrupts now cleared. */
+ahci_px_wreg(ahci, px, AHCI_PX_IS, intr_mask);
+g_assert_cmphex(ahci_px_rreg(ahci, px, AHCI_PX_IS), ==, 0);
+}
+
 /* Get the #cx'th command of port #px. */
 void ahci_get_command_header(AHCIQState *ahci, uint8_t px,
  uint8_t cx, AHCICommandHeader *cmd)
diff --git a/tests/libqos/ahci.h b/tests/libqos/ahci.h
index b384cbc..4b6696a 100644
--- a/tests/libqos/ahci.h
+++ b/tests/libqos/ahci.h
@@ -434,6 +434,8 @@ void ahci_hba_enable(AHCIQState *ahci);
 unsigned ahci_port_select(AHCIQState *ahci);
 void ahci_port_clear(AHCIQState *ahci, uint8_t px);
 void ahci_port_check_error(AHCIQState *ahci, uint8_t px);
+void ahci_port_check_interrupts(AHCIQState *ahci, uint8_t px,
+uint32_t intr_mask);
 void ahci_get_command_header(AHCIQState *ahci, uint8_t px,
  uint8_t cx, AHCICommandHeader *cmd);
 void ahci_set_command_header(AHCIQState *ahci, uint8_t px,
-- 
1.9.3




Re: [Qemu-devel] QEMU segfault: Booting an overlay with backing_file over NBD: nbd.c:nbd_receive_request():L756: read failed

2015-01-30 Thread Kashyap Chamarthy
On Fri, Jan 30, 2015 at 06:15:21PM +0100, Kevin Wolf wrote:
 Am 29.01.2015 um 17:25 hat Kashyap Chamarthy geschrieben:

$ qemu-system-x86_64   \
   -nographic  \
   -nodefconfig\
   -nodefaults \
   -m 2048 \
   -device virtio-scsi-pci,id=scsi \
   -device virtio-serial-pci   \
   -serial stdio   \
   -drive file=./overlay1.qcow2,format=qcow2,if=virtio,cache=writeback
Segmentation fault (core dumped)
  
  
  On the shell where `qemu-nbd` is running, I notice this
  
nbd.c:nbd_receive_request():L756: read failed
  
  
  Haven't investigated further with GDB, thought I'd bring it up here
  first.
  
  
  Versions
  
  
$ rpm -q qemu; uname -r
qemu-2.1.2-7.fc21.x86_64
3.17.8-300.fc21.x86_64
 
 Copying Stefan because he's the master of AIO contexts and it is
 bs-aio_context that becomes NULL. I couldn't see anything obvious.

 
 In the meantime, could you retest on git master?

Just tested from git, and I can still reproduce it.

That's the commit I'm at:

  $ git describe 
  v2.2.0-682-g16017c4


Run the NBD server, from git:

  $ /home/kashyapc/build/qemu/qemu-nbd -f qcow2 \
  -p10809 ./f21vm.qcow2 -t


Create the overlay:

  $ /home/kashyapc/build/qemu/qemu-img create \
  -f qcow2 -F nbd -o backing_file=nbd://localhost overlay2-of-f21vm.qcow2
  Segmentation fault (core dumped)

Creating the overlay from the  git-compiled `qemu-img` binary fails.

So, let's create the overlay using the `qemu-img` binary from the system
(RPM version noted above) and boot the overlay from the just compiled
QEMU x86_64 binary from git, still core dumps:

  $ /home/kashyapc/build/qemu/x86_64-softmmu/qemu-system-x86_64 \
  -nographic  \
  -nodefconfig\
  -nodefaults \
  -m 2048 \
  -device virtio-scsi-pci,id=scsi \
  -device virtio-serial-pci   \
  -serial stdio   \
  -drive file=./overlay2-f21vm.qcow2,format=qcow2,if=virtio,cache=writeback
  Segmentation fault (core dumped)


PS: I'm traveling, so I'll be a little slow to respond here, but can
provide more debugging info from the coredump of `qemu-img` binary as I
have access to a real computer.


-- 
/kashyap



Re: [Qemu-devel] vm live storage migration approach.

2015-01-30 Thread Stefan Hajnoczi
On Fri, Jan 30, 2015 at 8:13 PM, Yaodong Yang yaodong.ya...@gmail.com wrote:
 An follow up questions.

 Suppose I have a running VM with two virtual disks, I would like to migrate
 the vm from host A to host B. Both host A and host B have their own isolated
 storage devices. Is there anyway to migrate the vm's memory, two virtual
 disk images and other states together from host A to host B? Can
 drive_mirror command itself finish this job? I noticed that drive_mirror
 only mirror for one virtual disk and require both the source and destination
 share the same storage namespace. I do not know how to migrate the whole VM
 (memory, storage, network ) together from host A to host B, given that host
 A and host B have NO shared storage resource.

 Could you show me an example, if possible?

 I know migrate -b works well for this purpose. But the downside is
 migrate -b does not mirror Write Requests to both host A and host B during
 migration. In this case, migrate -b has a higher VM downtime during the
 migration.

Hi Yaodong,
The answers to these questions are in libvirt's source code.  It
orchestrates live migration between two hosts.

Multiple independent drive_mirror jobs can run.  That's how you handle
multiple disks.

The regular migrate command (without -b) can be used if drive_mirror
is in place and fully synced.

Libvirt sets up an NBD server on the destination host.  The source
host runs drive_mirror to copy over the contents of the disk images.
Once the drive_mirror command has completed syncing data the regular
'migrate' command can be used to send device state and RAM over to the
destination host.

See qemuMigrationDriveMirror() in libvirt.

migrate -b is a legacy feature that is being replaced by drive_mirror.

Stefan



[Qemu-devel] Virtual timer (future) status

2015-01-30 Thread Stefano Angeleri
Hi,

I need to syncronize the simulation time of QEMU with another tool.
In order to do so I'd need to know in advance at least a rough estimation
of how much the virtual timer will progress upon executing the next block
of instruction(s) (QEMU_CLOCK_VIRTUAL_RT/QEMU_CLOCK_VIRTUAL)
in qemu/cpus.c seems there are some function which sound like might give me
an approximate result (by knowing how many instructions are going to be
executed next) like cpu_get_icount with cpu_icount_to_ns. Is that a right
approach?
On OVP there is a lookup table in order to figure out how much an
instruction is going to take is there something similar in QEMU? Or a more
appropriate way to obtain these results?
Thanks in advance for any help.
Stefano


[Qemu-devel] Pci device example

2015-01-30 Thread rku slov
Hi, I've created for fun a pci dummy device in Qemu and the associated
linux driver.
Some friends wanted to know how I had done it so I did a write up.

If this can be of any use to someone who want to create his own device I
would be glad.
Also, if there are some error or some things implemented wrongly, I would
be more than happy to correct my examples.

http://tic-le-polard.blogspot.fr/2015/01/emulate-pci-device-with-qemu.html

Have a good day !


[Qemu-devel] [PATCH 01/19] libqos/ahci: Add ahci_port_select helper

2015-01-30 Thread John Snow
This helper identifies which port of the
AHCI HBA has a device we may run tests on.

Signed-off-by: John Snow js...@redhat.com
---
 tests/ahci-test.c   | 19 ++-
 tests/libqos/ahci.c | 27 +++
 tests/libqos/ahci.h |  1 +
 3 files changed, 30 insertions(+), 17 deletions(-)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index fca33d2..c689b62 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -662,7 +662,7 @@ static void ahci_test_identify(AHCIQState *ahci)
 RegH2DFIS fis;
 AHCICommand cmd;
 PRD prd;
-uint32_t ports, reg, table, data_ptr;
+uint32_t reg, table, data_ptr;
 uint16_t buff[256];
 unsigned i;
 int rc;
@@ -684,22 +684,7 @@ static void ahci_test_identify(AHCIQState *ahci)
  */
 
 /* Pick the first implemented and running port */
-ports = ahci_rreg(ahci, AHCI_PI);
-for (i = 0; i  32; ports = 1, ++i) {
-if (ports == 0) {
-i = 32;
-}
-
-if (!(ports  0x01)) {
-continue;
-}
-
-reg = ahci_px_rreg(ahci, i, AHCI_PX_CMD);
-if (BITSET(reg, AHCI_PX_CMD_ST)) {
-break;
-}
-}
-g_assert_cmphex(i, , 32);
+i = ahci_port_select(ahci);
 g_test_message(Selected port %u for test, i);
 
 /* Clear out this port's interrupts (ignore the init register d2h fis) */
diff --git a/tests/libqos/ahci.c b/tests/libqos/ahci.c
index 3056fb1..8874790 100644
--- a/tests/libqos/ahci.c
+++ b/tests/libqos/ahci.c
@@ -267,3 +267,30 @@ void ahci_hba_enable(AHCIQState *ahci)
  * In the future, a small test-case to inspect the Register D2H FIS
  * and clear the initial interrupts might be good. */
 }
+
+/**
+ * Pick the first implemented and running port
+ */
+unsigned ahci_port_select(AHCIQState *ahci)
+{
+uint32_t ports, reg;
+unsigned i;
+
+ports = ahci_rreg(ahci, AHCI_PI);
+for (i = 0; i  32; ports = 1, ++i) {
+if (ports == 0) {
+i = 32;
+}
+
+if (!(ports  0x01)) {
+continue;
+}
+
+reg = ahci_px_rreg(ahci, i, AHCI_PX_CMD);
+if (BITSET(reg, AHCI_PX_CMD_ST)) {
+break;
+}
+}
+g_assert(i  32);
+return i;
+}
diff --git a/tests/libqos/ahci.h b/tests/libqos/ahci.h
index 77f2055..b3992e1 100644
--- a/tests/libqos/ahci.h
+++ b/tests/libqos/ahci.h
@@ -431,5 +431,6 @@ void free_ahci_device(QPCIDevice *dev);
 void ahci_pci_enable(AHCIQState *ahci);
 void start_ahci_device(AHCIQState *ahci);
 void ahci_hba_enable(AHCIQState *ahci);
+unsigned ahci_port_select(AHCIQState *ahci);
 
 #endif
-- 
1.9.3




[Qemu-devel] [PATCH 07/19] libqos/ahci: Add port_check_nonbusy helper

2015-01-30 Thread John Snow
A simple helper that asserts a given port is not busy processing any
commands via the TFD, Command Issue and SACT registers.

Signed-off-by: John Snow js...@redhat.com
---
 tests/ahci-test.c   |  1 +
 tests/libqos/ahci.c | 18 ++
 tests/libqos/ahci.h |  1 +
 3 files changed, 20 insertions(+)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index e744f44..77d0f97 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -751,6 +751,7 @@ static void ahci_test_identify(AHCIQState *ahci)
 ahci_port_check_error(ahci, i);
 /* BUG: we expect AHCI_PX_IS_DPS to be set. */
 ahci_port_check_interrupts(ahci, i, AHCI_PX_IS_DHRS | AHCI_PX_IS_PSS);
+ahci_port_check_nonbusy(ahci, i, cx);
 
 /* Investigate the CMD, assert that we read 512 bytes */
 ahci_get_command_header(ahci, i, cx, cmd);
diff --git a/tests/libqos/ahci.c b/tests/libqos/ahci.c
index 3318a54..0016aad 100644
--- a/tests/libqos/ahci.c
+++ b/tests/libqos/ahci.c
@@ -347,6 +347,24 @@ void ahci_port_check_interrupts(AHCIQState *ahci, uint8_t 
px,
 g_assert_cmphex(ahci_px_rreg(ahci, px, AHCI_PX_IS), ==, 0);
 }
 
+void ahci_port_check_nonbusy(AHCIQState *ahci, uint8_t px, uint8_t cx)
+{
+uint32_t reg;
+
+/* Assert that the command slot is no longer busy (NCQ) */
+reg = ahci_px_rreg(ahci, px, AHCI_PX_SACT);
+ASSERT_BIT_CLEAR(reg, (1  cx));
+
+/* Non-NCQ */
+reg = ahci_px_rreg(ahci, px, AHCI_PX_CI);
+ASSERT_BIT_CLEAR(reg, (1  cx));
+
+/* And assert that we are generally not busy. */
+reg = ahci_px_rreg(ahci, px, AHCI_PX_TFD);
+ASSERT_BIT_CLEAR(reg, AHCI_PX_TFD_STS_BSY);
+ASSERT_BIT_CLEAR(reg, AHCI_PX_TFD_STS_DRQ);
+}
+
 /* Get the #cx'th command of port #px. */
 void ahci_get_command_header(AHCIQState *ahci, uint8_t px,
  uint8_t cx, AHCICommandHeader *cmd)
diff --git a/tests/libqos/ahci.h b/tests/libqos/ahci.h
index 4b6696a..7c504a5 100644
--- a/tests/libqos/ahci.h
+++ b/tests/libqos/ahci.h
@@ -436,6 +436,7 @@ void ahci_port_clear(AHCIQState *ahci, uint8_t px);
 void ahci_port_check_error(AHCIQState *ahci, uint8_t px);
 void ahci_port_check_interrupts(AHCIQState *ahci, uint8_t px,
 uint32_t intr_mask);
+void ahci_port_check_nonbusy(AHCIQState *ahci, uint8_t px, uint8_t cx);
 void ahci_get_command_header(AHCIQState *ahci, uint8_t px,
  uint8_t cx, AHCICommandHeader *cmd);
 void ahci_set_command_header(AHCIQState *ahci, uint8_t px,
-- 
1.9.3




[Qemu-devel] [PATCH 00/19] qtest/ahci: add dma test

2015-01-30 Thread John Snow
Add a simple DMA r/w test to ahci-test.

Oh, and for the first 18 patches, refactor everything into helpers so
that each ahci_test isn't a thousand lines long.

This patch depends upon the ahci test preliminary refactoring series
upstream, which shuffled a lot of libqos and malloc facilities to
support this series.

This patchset is a necessary step in checking in AHCI/DMA migration
tests that I will later use as proof as suitability of enabling the
ICH9 and AHCI migration flags.

~John

John Snow (19):
  libqos/ahci: Add ahci_port_select helper
  libqos/ahci: Add ahci_port_clear helper
  qtest/ahci: rename 'Command' to 'CommandHeader'
  libqos/ahci: Add command header helpers
  libqos/ahci: Add ahci_port_check_error helper
  libqos/ahci: Add ahci_port_check_interrupts helper
  libqos/ahci: Add port_check_nonbusy helper
  libqos/ahci: Add cmd response sanity check helpers
  qtest/ahci: Demagic ahci tests.
  libqos/ahci: Add ide cmd properties
  libqos/ahci: add ahci command functions
  libqos/ahci: add ahci command verify
  libqos/ahci: add ahci command size setters
  libqos/ahci: Add ahci_guest_io
  libqos/ahci: add ahci_io
  libqos/ahci: Add ahci_clean_mem
  qtest/ahci: Add a macro bootup routine
  qtest/ahci: Assert sector size in identify test
  qtest/ahci: Adding simple dma read-write test

 tests/ahci-test.c | 246 +-
 tests/libqos/ahci.c   | 554 ++
 tests/libqos/ahci.h   | 163 ---
 tests/libqos/malloc.c |   5 +
 tests/libqos/malloc.h |   1 +
 5 files changed, 798 insertions(+), 171 deletions(-)

-- 
1.9.3




Re: [Qemu-devel] [PATCH 1/1] block: enforce minimal 4096 alignment in qemu_blockalign

2015-01-30 Thread Kevin Wolf
Am 30.01.2015 um 19:39 hat Denis V. Lunev geschrieben:
 On 29/01/15 16:49, Denis V. Lunev wrote:
 On 29/01/15 16:18, Kevin Wolf wrote:
 Am 29.01.2015 um 11:50 hat Denis V. Lunev geschrieben:
 The following sequence
  int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
  for (i = 0; i  10; i++)
  write(fd, buf, 4096);
 performs 5% better if buf is aligned to 4096 bytes rather then to
 512 bytes on HDD with 512/4096 logical/physical sector size.
 
 The difference is quite reliable.
 
 On the other hand we do not want at the moment to enforce bounce
 buffering if guest request is aligned to 512 bytes. This patch
 forces page alignment when we really forced to perform memory
 allocation.
 
 Signed-off-by: Denis V. Lunev d...@openvz.org
 CC: Paolo Bonzini pbonz...@redhat.com
 CC: Kevin Wolf kw...@redhat.com
 CC: Stefan Hajnoczi stefa...@redhat.com
 ---
   block.c | 9 -
   1 file changed, 8 insertions(+), 1 deletion(-)
 
 diff --git a/block.c b/block.c
 index d45e4dd..38cf73f 100644
 --- a/block.c
 +++ b/block.c
 @@ -5293,7 +5293,11 @@ void
 bdrv_set_guest_block_size(BlockDriverState *bs, int align)
 void *qemu_blockalign(BlockDriverState *bs, size_t size)
   {
 -return qemu_memalign(bdrv_opt_mem_align(bs), size);
 +size_t align = bdrv_opt_mem_align(bs);
 +if (align  4096) {
 +align = 4096;
 +}
 +return qemu_memalign(align, size);
   }
 void *qemu_blockalign0(BlockDriverState *bs, size_t size)
 @@ -5307,6 +5311,9 @@ void
 *qemu_try_blockalign(BlockDriverState *bs, size_t size)
 /* Ensure that NULL is never returned on success */
   assert(align  0);
 +if (align  4096) {
 +align = 4096;
 +}
   if (size == 0) {
   size = align;
   }
 This is the wrong place to make this change. First you're duplicating
 logic in the callers of bdrv_opt_mem_align() instead of making it return
 the right thing in the first place.
 This has been actually done in the first iteration. bdrv_opt_mem_align
 is called actually three times in:
   qemu_blockalign
   qemu_try_blockalign
   bdrv_qiov_is_aligned
 Paolo says that he does not want to have bdrv_qiov_is_aligned affected
 to avoid extra bounce buffering.
 
 From my point of view this extra bounce buffering is better than
 unaligned
 pointer during write to the disk as 512/4096 logical/physical
 sectors size
 disks are mainstream now. Though I don't want to specially argue here.
 Normal guest operations results in page aligned requests and this is not
 a problem at all. The amount of 512 aligned requests from guest side is
 quite negligible.
   Second, you're arguing with numbers
 from a simple test case for O_DIRECT on Linux, but you're changing the
 alignment for everyone instead of just the raw-posix driver which is
 responsible for accessing Linux files.
 This should not be a real problem. We are allocation memory for the
 buffer. A little bit stricter alignment is not a big overhead for
 any libc
 implementation thus this kludge will not produce any significant
 overhead.
 Also, what's the real reason for the performance improvement? Having
 page alignment? If so, actually querying the page size instead of
 assuming 4k might be worth a thought.
 
 Kevin
 Most likely the problem comes from the read-modify-write pattern
 either in kernel or in disk. Actually my experience says that it is a
 bad idea to supply 512 byte aligned buffer for O_DIRECT IO.
 ABI technically allows this but in general it is much less tested.
 
 Yes, this synthetic test shows some difference here. In terms of
 qemu-io the result is also visible, but less
   qemu-img create -f qcow2 ./1.img 64G
   qemu-io -n -c 'write -P 0xaa 0 1G' 1.img
 performs 1% better.
 
 There is also similar kludge here
 size_t bdrv_opt_mem_align(BlockDriverState *bs)
 {
 if (!bs || !bs-drv) {
 /* 4k should be on the safe side */
 return 4096;
 }
 
 return bs-bl.opt_mem_alignment;
 }
 which just uses 4096 constant.
 
 Yes, I could agree that queering page size could be a good idea, but
 I do not know at the moment how to do that. Can you pls share your
 opinion if you have any.
 
 Regards,
 Den
 Paolo, Kevin,
 
 I have spent a bit more time digging the issue and found some
 additional information. The same 5% difference if the buffer is
 aligned to 512/4096 is observed for the following devices/filesystems
 
 1) ext4 with block size equals to 1024 over 512/512 physical/logical
sector size SSD disk
 2) ext4 with block size equals to 4096 over 512/512 physical/logical
sector size SSD disk
 3) ext4 with block size equals to 4096 over 512/4096 physical/logical
sector size rotational disk (WDC WD20EZRX)
 4) with block size equals to 4096 over 512/512 physical/logical
sector size SSD disk
 
 This means that only page size (4k) matters.
 
 Guys, you propose quite different approaches. I can extend this patch
 to use sysconf(_SC_PAGESIZE) to detect page size and drop hardcoded

Re: [Qemu-devel] New IOREQ type -- IOREQ_TYPE_VMWARE_PORT

2015-01-30 Thread Don Slutz
On 01/30/15 05:23, Paul Durrant wrote:
 -Original Message-
 From: Don Slutz [mailto:dsl...@verizon.com]
 Sent: 29 January 2015 19:41
 To: Paul Durrant; Don Slutz; qemu-devel@nongnu.org; Stefano Stabellini
 Cc: Peter Maydell; Olaf Hering; Alexey Kardashevskiy; Stefan Weil; Michael
 Tokarev; Alexander Graf; Gerd Hoffmann; Stefan Hajnoczi; Paolo Bonzini;
 xen-devel
 Subject: New IOREQ type -- IOREQ_TYPE_VMWARE_PORT

 On 01/29/15 07:09, Paul Durrant wrote:
 ...
 Given that IIRC you are using a new dedicated IOREQ type, I
 think there needs to be something that allows an emulator to
 register for this IOREQ type. How about adding a new type to
 those defined for HVMOP_map_io_range_to_ioreq_server for your
 case? (In your case the start and end values in the hypercall
 would be meaningless but it could be used to steer
 hvm_select_ioreq_server() into sending all emulation requests or
 your new type to QEMU.


 This is an interesting idea.  Will need to spend more time on it.


This does look very doable.  The only issue I see is that it requires
a QEMU change in order to work.  This would prevent Xen 4.6 from using
QEMU 2.2.0 and vmport (vmware-tools, vmware-mouse).

What makes sense to me is to invert it  I.E. the default is to send
IOREQ_TYPE_VMWARE_PORT via io_range, and an ioreq server can say stop
sending them.

The reason this is safe so far is that IOREQ_TYPE_VMWARE_PORT can only
be sent if vmport is configured on.  And in that case QEMU will be
started with vmport=on which will cause all QEMUs that do not support
IOREQ_TYPE_VMWARE_PORT to crash.





 Actually such a mechanism could be used
 to steer IOREQ_TYPE_TIMEOFFSET requests as, with the new QEMU
 patches, they are going nowhere. Upstream QEMU (as default) used
 to ignore them anyway, which is why I didn't bother with such a
 patch to Xen before but since you now need one maybe you could
 add that too?


 I think it would not be that hard.  Will look into adding it.


 Currently I do not see how hvm_do_resume() works with 2 ioreq servers.
 It looks like to me that if a vpcu (like 0) needs to wait for the
 2nd ioreq server, hvm_do_resume() will check the 1st ioreq server
 and return as if the ioreq is done.  What am I missing?

 
 hvm_do_resume() walks the ioreq server list looking at the IOREQ state in the 
 shared page of each server in turn. If no IOREQ was sent to that server then 
 then state will be IOREQ_NONE and hvm_wait_for_io() will return 1 immediately 
 so the outer loop in hvm_do_resume() will move on to the next server. If a 
 state of IOREQ_READY or IOREQ_INPROCESS is found then the vcpu blocks on the 
 relevant event channel until the state transitions to IORESP_READY. The IOREQ 
 is then completed and the loop moves on to the next server.
 Normally an IOREQ would only be directed at one server and indeed IOREQs that 
 are issued for emulation requests (i.e. when io_state != HVMIO_none) fall 
 into this category but there is one case of a broadcast IOREQ, which is the 
 INVALIDATE IOREQ (sent to tell emulators to invalidate any mappings of guest 
 memory they may have cached) and that is why hvm_do_resume() has to iterate 
 over all servers.
 
 Does that make sense?
 

Thanks for the clear info.  It does make sense.

   -Don Slutz

   Paul
 
-Don Slutz
 



[Qemu-devel] [PATCH 12/19] libqos/ahci: add ahci command verify

2015-01-30 Thread John Snow
Helps to verify that a command completed successfully.

Signed-off-by: John Snow js...@redhat.com
---
 tests/ahci-test.c   | 16 ++--
 tests/libqos/ahci.c | 12 
 tests/libqos/ahci.h |  1 +
 3 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index 0834020..6e7b765 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -662,7 +662,6 @@ static void ahci_test_identify(AHCIQState *ahci)
 unsigned i;
 int rc;
 AHCICommand *cmd;
-uint8_t cx;
 
 g_assert(ahci != NULL);
 
@@ -702,20 +701,9 @@ static void ahci_test_identify(AHCIQState *ahci)
  * so we should find that there are no pending interrupts yet. */
 g_assert_cmphex(ahci_px_rreg(ahci, i, AHCI_PX_IS), ==, 0);
 
-/* Issue Command #cx via PxCI */
+/* Issue command and sanity check response. */
 ahci_command_issue(ahci, cmd);
-cx = ahci_command_slot(cmd);
-
-/* Check registers for post-command consistency */
-ahci_port_check_error(ahci, i);
-/* BUG: we expect AHCI_PX_IS_DPS to be set. */
-ahci_port_check_interrupts(ahci, i, AHCI_PX_IS_DHRS | AHCI_PX_IS_PSS);
-ahci_port_check_nonbusy(ahci, i, cx);
-/* Investigate the CMD, assert that we read 512 bytes */
-ahci_port_check_cmd_sanity(ahci, i, cx, 512);
-/* Investigate FIS responses */
-ahci_port_check_d2h_sanity(ahci, i, cx);
-ahci_port_check_pio_sanity(ahci, i, cx, 512);
+ahci_command_verify(ahci, cmd);
 
 /* Last, but not least: Investigate the IDENTIFY response data. */
 memread(data_ptr, buff, 512);
diff --git a/tests/libqos/ahci.c b/tests/libqos/ahci.c
index 87cc9dc..594c821 100644
--- a/tests/libqos/ahci.c
+++ b/tests/libqos/ahci.c
@@ -725,6 +725,18 @@ void ahci_command_issue(AHCIQState *ahci, AHCICommand *cmd)
 ahci_command_wait(ahci, cmd);
 }
 
+void ahci_command_verify(AHCIQState *ahci, AHCICommand *cmd)
+{
+ahci_port_check_error(ahci, cmd-px);
+ahci_port_check_interrupts(ahci, cmd-px, cmd-interrupts);
+ahci_port_check_nonbusy(ahci, cmd-px, cmd-cx);
+ahci_port_check_cmd_sanity(ahci, cmd-px, cmd-cx, cmd-xbytes);
+ahci_port_check_d2h_sanity(ahci, cmd-px, cmd-cx);
+if (cmd-props-pio) {
+ahci_port_check_pio_sanity(ahci, cmd-px, cmd-cx, cmd-xbytes);
+}
+}
+
 uint8_t ahci_command_slot(AHCICommand *cmd)
 {
 return cmd-cx;
diff --git a/tests/libqos/ahci.h b/tests/libqos/ahci.h
index 99ac820..46b7bf3 100644
--- a/tests/libqos/ahci.h
+++ b/tests/libqos/ahci.h
@@ -527,6 +527,7 @@ void ahci_command_commit(AHCIQState *ahci, AHCICommand 
*cmd, uint8_t px);
 void ahci_command_issue(AHCIQState *ahci, AHCICommand *cmd);
 void ahci_command_issue_async(AHCIQState *ahci, AHCICommand *cmd);
 void ahci_command_wait(AHCIQState *ahci, AHCICommand *cmd);
+void ahci_command_verify(AHCIQState *ahci, AHCICommand *cmd);
 void ahci_command_free(AHCICommand *cmd);
 /* Command adjustments */
 void ahci_command_set_buffer(AHCICommand *cmd, uint64_t buffer);
-- 
1.9.3




[Qemu-devel] [PATCH 02/19] libqos/ahci: Add ahci_port_clear helper

2015-01-30 Thread John Snow
Add a helper that assists in clearing out potentially old error and FIS
information from an AHCI port's data structures. This ensures we always
start with a blank slate for interrupt and FIS receipt information.

Signed-off-by: John Snow js...@redhat.com
---
 tests/ahci-test.c   |  9 ++---
 tests/libqos/ahci.c | 16 
 tests/libqos/ahci.h |  1 +
 3 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index c689b62..90647f2 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -687,13 +687,8 @@ static void ahci_test_identify(AHCIQState *ahci)
 i = ahci_port_select(ahci);
 g_test_message(Selected port %u for test, i);
 
-/* Clear out this port's interrupts (ignore the init register d2h fis) */
-reg = ahci_px_rreg(ahci, i, AHCI_PX_IS);
-ahci_px_wreg(ahci, i, AHCI_PX_IS, reg);
-g_assert_cmphex(ahci_px_rreg(ahci, i, AHCI_PX_IS), ==, 0);
-
-/* Wipe the FIS-Receive Buffer */
-qmemset(ahci-port[i].fb, 0x00, 0x100);
+/* Clear out the FIS Receive area and any pending interrupts. */
+ahci_port_clear(ahci, i);
 
 /* Create a Command Table buffer. 0x80 is the smallest with a PRDTL of 0. 
*/
 /* We need at least one PRD, so round up to the nearest 0x80 multiple.
*/
diff --git a/tests/libqos/ahci.c b/tests/libqos/ahci.c
index 8874790..9195bd6 100644
--- a/tests/libqos/ahci.c
+++ b/tests/libqos/ahci.c
@@ -294,3 +294,19 @@ unsigned ahci_port_select(AHCIQState *ahci)
 g_assert(i  32);
 return i;
 }
+
+/**
+ * Clear a port's interrupts and status information prior to a test.
+ */
+void ahci_port_clear(AHCIQState *ahci, uint8_t px)
+{
+uint32_t reg;
+
+/* Clear out this port's interrupts (ignore the init register d2h fis) */
+reg = ahci_px_rreg(ahci, px, AHCI_PX_IS);
+ahci_px_wreg(ahci, px, AHCI_PX_IS, reg);
+g_assert_cmphex(ahci_px_rreg(ahci, px, AHCI_PX_IS), ==, 0);
+
+/* Wipe the FIS-Recieve Buffer */
+qmemset(ahci-port[px].fb, 0x00, 0x100);
+}
diff --git a/tests/libqos/ahci.h b/tests/libqos/ahci.h
index b3992e1..49ddee5 100644
--- a/tests/libqos/ahci.h
+++ b/tests/libqos/ahci.h
@@ -432,5 +432,6 @@ void ahci_pci_enable(AHCIQState *ahci);
 void start_ahci_device(AHCIQState *ahci);
 void ahci_hba_enable(AHCIQState *ahci);
 unsigned ahci_port_select(AHCIQState *ahci);
+void ahci_port_clear(AHCIQState *ahci, uint8_t px);
 
 #endif
-- 
1.9.3




[Qemu-devel] [PATCH 03/19] qtest/ahci: rename 'Command' to 'CommandHeader'

2015-01-30 Thread John Snow
The structure name is a bit of a misnomer; the structure currently named
command is actually the commandheader. A future patch in this series
will add an actual Command structure, so we'll rename it now before the
rest of the functions in this series try to use it.

In addition, rename the b1 and b2 fields
to be a unified uint16_t named flags.

Signed-off-by: John Snow js...@redhat.com
---
 tests/ahci-test.c   | 6 +++---
 tests/libqos/ahci.h | 7 +++
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index 90647f2..85e5761 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -660,7 +660,7 @@ static void ahci_test_identify(AHCIQState *ahci)
 RegD2HFIS *d2h = g_malloc0(0x20);
 RegD2HFIS *pio = g_malloc0(0x20);
 RegH2DFIS fis;
-AHCICommand cmd;
+AHCICommandHeader cmd;
 PRD prd;
 uint32_t reg, table, data_ptr;
 uint16_t buff[256];
@@ -703,8 +703,8 @@ static void ahci_test_identify(AHCIQState *ahci)
 /* Copy the existing Command #0 structure from the CLB into local memory,
  * and build a new command #0. */
 memread(ahci-port[i].clb, cmd, sizeof(cmd));
-cmd.b1 = 5;/* reg_h2d_fis is 5 double-words long */
-cmd.b2 = 0x04; /* clear PxTFD.STS.BSY when done */
+cmd.flags = 5;/* reg_h2d_fis is 5 double-words long */
+cmd.flags |= 0x400; /* clear PxTFD.STS.BSY when done */
 cmd.prdtl = cpu_to_le16(1); /* One PRD table entry. */
 cmd.prdbc = 0;
 cmd.ctba = cpu_to_le32(table);
diff --git a/tests/libqos/ahci.h b/tests/libqos/ahci.h
index 49ddee5..e3ada6e 100644
--- a/tests/libqos/ahci.h
+++ b/tests/libqos/ahci.h
@@ -329,15 +329,14 @@ typedef struct RegH2DFIS {
  * Command List entry structure.
  * The command list contains between 1-32 of these structures.
  */
-typedef struct AHCICommand {
-uint8_t b1;
-uint8_t b2;
+typedef struct AHCICommandHeader {
+uint16_t flags; /* Cmd-Fis-Len, PMP#, and flags. */
 uint16_t prdtl; /* Phys Region Desc. Table Length */
 uint32_t prdbc; /* Phys Region Desc. Byte Count */
 uint32_t ctba;  /* Command Table Descriptor Base Address */
 uint32_t ctbau; /*'' Upper */
 uint32_t res[4];
-} __attribute__((__packed__)) AHCICommand;
+} __attribute__((__packed__)) AHCICommandHeader;
 
 /**
  * Physical Region Descriptor; pointed to by the Command List Header,
-- 
1.9.3




[Qemu-devel] [PATCH 13/19] libqos/ahci: add ahci command size setters

2015-01-30 Thread John Snow
Adds setters for size, prd_size and both via set_sizes.

Signed-off-by: John Snow js...@redhat.com
---
 tests/libqos/ahci.c | 22 ++
 tests/libqos/ahci.h |  5 +
 2 files changed, 27 insertions(+)

diff --git a/tests/libqos/ahci.c b/tests/libqos/ahci.c
index 594c821..3375d54 100644
--- a/tests/libqos/ahci.c
+++ b/tests/libqos/ahci.c
@@ -648,6 +648,28 @@ void ahci_command_set_buffer(AHCICommand *cmd, uint64_t 
buffer)
 cmd-buffer = buffer;
 }
 
+void ahci_command_set_sizes(AHCICommand *cmd, uint64_t xbytes,
+unsigned prd_size)
+{
+/* Each PRD can describe up to 4MiB, and must not be odd. */
+g_assert_cmphex(prd_size, =, 4096 * 1024);
+g_assert_cmphex(prd_size  0x01, ==, 0x00);
+cmd-prd_size = prd_size;
+cmd-xbytes = xbytes;
+cmd-fis.count = cpu_to_le16(cmd-xbytes / AHCI_SECTOR_SIZE);
+cmd-header.prdtl = size_to_prdtl(cmd-xbytes, cmd-prd_size);
+}
+
+void ahci_command_set_size(AHCICommand *cmd, uint64_t xbytes)
+{
+ahci_command_set_sizes(cmd, xbytes, cmd-prd_size);
+}
+
+void ahci_command_set_prd_size(AHCICommand *cmd, unsigned prd_size)
+{
+ahci_command_set_sizes(cmd, cmd-xbytes, prd_size);
+}
+
 void ahci_command_commit(AHCIQState *ahci, AHCICommand *cmd, uint8_t px)
 {
 uint16_t i, prdtl;
diff --git a/tests/libqos/ahci.h b/tests/libqos/ahci.h
index 46b7bf3..1300967 100644
--- a/tests/libqos/ahci.h
+++ b/tests/libqos/ahci.h
@@ -531,6 +531,11 @@ void ahci_command_verify(AHCIQState *ahci, AHCICommand 
*cmd);
 void ahci_command_free(AHCICommand *cmd);
 /* Command adjustments */
 void ahci_command_set_buffer(AHCICommand *cmd, uint64_t buffer);
+void ahci_command_set_size(AHCICommand *cmd, uint64_t xbytes);
+void ahci_command_set_prd_size(AHCICommand *cmd, unsigned prd_size);
+void ahci_command_set_sizes(AHCICommand *cmd, uint64_t xbytes,
+unsigned prd_size);
 
 uint8_t ahci_command_slot(AHCICommand *cmd);
+
 #endif
-- 
1.9.3




[Qemu-devel] [PATCH 15/19] libqos/ahci: add ahci_io

2015-01-30 Thread John Snow
ahci_io is a wrapper around ahci_guest_io that takes a pointer to host
memory instead, and will create a guest memory buffer and copy the data
to/from as needed and as appropriate for a read/write command, such that
after a read, the guest data will be in a host buffer, and for a write,
the data will be transmitted to guest memory prior to the block operation.

Now that we have all the syntactic sugar functions in place for AHCI,
we can convert the identify test to be very, very short.

Signed-off-by: John Snow js...@redhat.com
---
 tests/ahci-test.c   | 63 +
 tests/libqos/ahci.c | 25 +
 tests/libqos/ahci.h |  2 ++
 3 files changed, 52 insertions(+), 38 deletions(-)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index 6e7b765..47491fe 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -657,56 +657,43 @@ static void ahci_test_port_spec(AHCIQState *ahci, uint8_t 
port)
  */
 static void ahci_test_identify(AHCIQState *ahci)
 {
-uint32_t data_ptr;
 uint16_t buff[256];
-unsigned i;
+unsigned px;
 int rc;
-AHCICommand *cmd;
+const size_t buffsize = 512;
 
 g_assert(ahci != NULL);
 
-/* We need to:
- * (1) Create a data buffer for the IDENTIFY response to be sent to,
- * (2) Create a Command Table Buffer
+/**
+ * This serves as a bit of a tutorial on AHCI device programming:
+ *
+ * (1) Create a data buffer for the IDENTIFY response to be sent to
+ * (2) Create a Command Table buffer, where we will store the
+ * command and PRDT (Physical Region Descriptor Table)
  * (3) Construct an FIS host-to-device command structure, and write it to
- * the top of the command table buffer.
- * (4) Create a Physical Region Descriptor that points to the data buffer,
- * and write it to the bottom (offset 0x80) of the command table.
- * (5) Obtain a Command List slot, and update this header to point to
- * the Command Table we built above.
- * (6) Now, PxCLB points to the command list, command 0 points to
- * our table, and our table contains an FIS instruction and a
- * PRD that points to our rx buffer.
- * (7) We inform the HBA via PxCI that there is a command ready in slot #0.
+ * the top of the Command Table buffer.
+ * (4) Create one or more Physical Region Descriptors (PRDs) that describe
+ * a location in memory where data may be stored/retrieved.
+ * (5) Write these PRDTs to the bottom (offset 0x80) of the Command Table.
+ * (6) Each AHCI port has up to 32 command slots. Each slot contains a
+ * header that points to a Command Table buffer. Pick an unused slot
+ * and update it to point to the Command Table we have built.
+ * (7) Now: Command #n points to our Command Table, and our Command Table
+ * contains the FIS (that describes our command) and the PRDTL, which
+ * describes our buffer.
+ * (8) We inform the HBA via PxCI (Command Issue) that the command in slot
+ * #n is ready for processing.
  */
 
 /* Pick the first implemented and running port */
-i = ahci_port_select(ahci);
-g_test_message(Selected port %u for test, i);
+px = ahci_port_select(ahci);
+g_test_message(Selected port %u for test, px);
 
 /* Clear out the FIS Receive area and any pending interrupts. */
-ahci_port_clear(ahci, i);
+ahci_port_clear(ahci, px);
 
-/* Create a data buffer where we will dump the IDENTIFY data to. */
-data_ptr = ahci_alloc(ahci, 512);
-g_assert(data_ptr);
-
-/* Construct the Command Table (FIS and PRDT) and Command Header */
-cmd = ahci_command_create(CMD_IDENTIFY);
-ahci_command_set_buffer(cmd, data_ptr);
-/* Write the command header and PRDT to guest memory */
-ahci_command_commit(ahci, cmd, i);
-
-/* Everything is in place, but we haven't given the go-ahead yet,
- * so we should find that there are no pending interrupts yet. */
-g_assert_cmphex(ahci_px_rreg(ahci, i, AHCI_PX_IS), ==, 0);
-
-/* Issue command and sanity check response. */
-ahci_command_issue(ahci, cmd);
-ahci_command_verify(ahci, cmd);
-
-/* Last, but not least: Investigate the IDENTIFY response data. */
-memread(data_ptr, buff, 512);
+/* Read 512 bytes using CMD_IDENTIFY into the host buffer. */
+ahci_io(ahci, px, CMD_IDENTIFY, buff, buffsize);
 
 /* Check serial number/version in the buffer */
 /* NB: IDENTIFY strings are packed in 16bit little endian chunks.
diff --git a/tests/libqos/ahci.c b/tests/libqos/ahci.c
index d55dab8..e73205b 100644
--- a/tests/libqos/ahci.c
+++ b/tests/libqos/ahci.c
@@ -580,6 +580,31 @@ static AHCICommandProp *ahci_command_find(uint8_t 
command_name)
 }
 }
 
+/* Given a HOST buffer, create a buffer address and perform an IO operation. */
+void ahci_io(AHCIQState *ahci, uint8_t px, uint8_t ide_cmd,
+ void *buffer, 

[Qemu-devel] [PATCH] qemu-sockets: Fix buffer overflow in inet_parse()

2015-01-30 Thread Kevin Wolf
The size of the stack allocated host[] array didn't account for the
terminating '\0' byte that sscanf() writes. Fix the array size.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 util/qemu-sockets.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index a76bb3c..aacf1fc 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -512,7 +512,7 @@ InetSocketAddress *inet_parse(const char *str, Error **errp)
 {
 InetSocketAddress *addr;
 const char *optstr, *h;
-char host[64];
+char host[65];
 char port[33];
 int to;
 int pos;
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH] qemu-sockets: Fix buffer overflow in inet_parse()

2015-01-30 Thread John Snow



On 01/30/2015 02:37 PM, Kevin Wolf wrote:

The size of the stack allocated host[] array didn't account for the
terminating '\0' byte that sscanf() writes. Fix the array size.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
  util/qemu-sockets.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index a76bb3c..aacf1fc 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -512,7 +512,7 @@ InetSocketAddress *inet_parse(const char *str, Error **errp)
  {
  InetSocketAddress *addr;
  const char *optstr, *h;
-char host[64];
+char host[65];
  char port[33];
  int to;
  int pos;



You don't really need reviews for trivial, right?
*shrug*

Reviewed-by: John Snow js...@redhat.com



Re: [Qemu-devel] vm live storage migration approach.

2015-01-30 Thread Yaodong Yang
Thank you Stefan! Now I understand the workflow.

-Yaodong

On Fri, Jan 30, 2015 at 2:37 PM, Stefan Hajnoczi stefa...@gmail.com wrote:

 On Fri, Jan 30, 2015 at 8:13 PM, Yaodong Yang yaodong.ya...@gmail.com
 wrote:
  An follow up questions.
 
  Suppose I have a running VM with two virtual disks, I would like to
 migrate
  the vm from host A to host B. Both host A and host B have their own
 isolated
  storage devices. Is there anyway to migrate the vm's memory, two virtual
  disk images and other states together from host A to host B? Can
  drive_mirror command itself finish this job? I noticed that drive_mirror
  only mirror for one virtual disk and require both the source and
 destination
  share the same storage namespace. I do not know how to migrate the whole
 VM
  (memory, storage, network ) together from host A to host B, given that
 host
  A and host B have NO shared storage resource.
 
  Could you show me an example, if possible?
 
  I know migrate -b works well for this purpose. But the downside is
  migrate -b does not mirror Write Requests to both host A and host B
 during
  migration. In this case, migrate -b has a higher VM downtime during the
  migration.

 Hi Yaodong,
 The answers to these questions are in libvirt's source code.  It
 orchestrates live migration between two hosts.

 Multiple independent drive_mirror jobs can run.  That's how you handle
 multiple disks.

 The regular migrate command (without -b) can be used if drive_mirror
 is in place and fully synced.

 Libvirt sets up an NBD server on the destination host.  The source
 host runs drive_mirror to copy over the contents of the disk images.
 Once the drive_mirror command has completed syncing data the regular
 'migrate' command can be used to send device state and RAM over to the
 destination host.

 See qemuMigrationDriveMirror() in libvirt.

 migrate -b is a legacy feature that is being replaced by drive_mirror.

 Stefan




-- 
Yaodong Yang
yaodong.ya...@gmail.com or yy...@cse.unl.edu
Computer Science and Engineering Department
University of Nebraska-Lincoln
Lincoln, NE, USA


[Qemu-devel] [PATCH 16/19] libqos/ahci: Add ahci_clean_mem

2015-01-30 Thread John Snow
Clean up guest memory being used in ahci_clean_mem, to be
called during ahci_shutdown. With all guest memory leaks removed,
add an option to the allocator to throw an assertion if a leak
occurs.

This test adds some sanity to both the AHCI library and the
allocator.

Signed-off-by: John Snow js...@redhat.com
---
 tests/ahci-test.c |  3 +++
 tests/libqos/ahci.c   | 18 ++
 tests/libqos/ahci.h   |  1 +
 tests/libqos/malloc.c |  5 +
 tests/libqos/malloc.h |  1 +
 5 files changed, 28 insertions(+)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index 47491fe..3a0131a 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -86,6 +86,7 @@ static AHCIQState *ahci_boot(void)
 -device ide-hd,drive=drive0 
 -global ide-hd.ver=%s;
 s-parent = qtest_pc_boot(cli, tmp_path, testdisk, version);
+alloc_set_flags(s-parent-alloc, ALLOC_LEAK_ASSERT);
 
 /* Verify that we have an AHCI device present. */
 s-dev = get_ahci_device(s-fingerprint);
@@ -99,6 +100,8 @@ static AHCIQState *ahci_boot(void)
 static void ahci_shutdown(AHCIQState *ahci)
 {
 QOSState *qs = ahci-parent;
+
+ahci_clean_mem(ahci);
 free_ahci_device(ahci-dev);
 g_free(ahci);
 qtest_shutdown(qs);
diff --git a/tests/libqos/ahci.c b/tests/libqos/ahci.c
index e73205b..87a9e62 100644
--- a/tests/libqos/ahci.c
+++ b/tests/libqos/ahci.c
@@ -133,6 +133,24 @@ void free_ahci_device(QPCIDevice *dev)
 qpci_free_pc(pcibus);
 }
 
+/* Free all memory in-use by the AHCI device. */
+void ahci_clean_mem(AHCIQState *ahci)
+{
+uint8_t px, cx;
+
+for (px = 0; px  32; ++px) {
+if (ahci-port[px].fb) {
+ahci_free(ahci, ahci-port[px].fb);
+}
+if (ahci-port[px].clb) {
+for (cx = 0; cx  32; cx++) {
+ahci_destroy_command(ahci, px, cx);
+}
+ahci_free(ahci, ahci-port[px].clb);
+}
+}
+}
+
 /*** Logical Device Initialization ***/
 
 /**
diff --git a/tests/libqos/ahci.h b/tests/libqos/ahci.h
index 3c4f478..2d25957 100644
--- a/tests/libqos/ahci.h
+++ b/tests/libqos/ahci.h
@@ -499,6 +499,7 @@ uint64_t ahci_alloc(AHCIQState *ahci, size_t bytes);
 void ahci_free(AHCIQState *ahci, uint64_t addr);
 QPCIDevice *get_ahci_device(uint32_t *fingerprint);
 void free_ahci_device(QPCIDevice *dev);
+void ahci_clean_mem(AHCIQState *ahci);
 void ahci_pci_enable(AHCIQState *ahci);
 void start_ahci_device(AHCIQState *ahci);
 void ahci_hba_enable(AHCIQState *ahci);
diff --git a/tests/libqos/malloc.c b/tests/libqos/malloc.c
index 8cce1ba..ad283f6 100644
--- a/tests/libqos/malloc.c
+++ b/tests/libqos/malloc.c
@@ -324,3 +324,8 @@ void alloc_set_page_size(QGuestAllocator *allocator, size_t 
page_size)
 g_assert(is_power_of_2(page_size));
 allocator-page_size = page_size;
 }
+
+void alloc_set_flags(QGuestAllocator *allocator, QAllocOpts opts)
+{
+allocator-opts |= opts;
+}
diff --git a/tests/libqos/malloc.h b/tests/libqos/malloc.h
index a39dba4..71ac407 100644
--- a/tests/libqos/malloc.h
+++ b/tests/libqos/malloc.h
@@ -36,5 +36,6 @@ QGuestAllocator *alloc_init(uint64_t start, uint64_t end);
 QGuestAllocator *alloc_init_flags(QAllocOpts flags,
   uint64_t start, uint64_t end);
 void alloc_set_page_size(QGuestAllocator *allocator, size_t page_size);
+void alloc_set_flags(QGuestAllocator *allocator, QAllocOpts opts);
 
 #endif
-- 
1.9.3




[Qemu-devel] [PATCH 14/19] libqos/ahci: Add ahci_guest_io

2015-01-30 Thread John Snow
ahci_guest_io is a shorthand function that will, in one shot,
execute a data command on the guest to the specified guest buffer
location, in the requested amount.

Signed-off-by: John Snow js...@redhat.com
---
 tests/libqos/ahci.c | 15 +++
 tests/libqos/ahci.h |  2 ++
 2 files changed, 17 insertions(+)

diff --git a/tests/libqos/ahci.c b/tests/libqos/ahci.c
index 3375d54..d55dab8 100644
--- a/tests/libqos/ahci.c
+++ b/tests/libqos/ahci.c
@@ -536,6 +536,21 @@ inline unsigned size_to_prdtl(unsigned bytes, unsigned 
bytes_per_prd)
 return (bytes + bytes_per_prd - 1) / bytes_per_prd;
 }
 
+/* Given a guest buffer address, perform an IO operation */
+void ahci_guest_io(AHCIQState *ahci, uint8_t px, uint8_t ide_cmd,
+   uint64_t buffer, size_t bufsize)
+{
+AHCICommand *cmd;
+
+cmd = ahci_command_create(ide_cmd);
+ahci_command_set_buffer(cmd, buffer);
+ahci_command_set_size(cmd, bufsize);
+ahci_command_commit(ahci, cmd, px);
+ahci_command_issue(ahci, cmd);
+ahci_command_verify(ahci, cmd);
+ahci_command_free(cmd);
+}
+
 typedef struct AHCICommand {
 /* Test Management Data */
 uint8_t name;
diff --git a/tests/libqos/ahci.h b/tests/libqos/ahci.h
index 1300967..cbde141 100644
--- a/tests/libqos/ahci.h
+++ b/tests/libqos/ahci.h
@@ -520,6 +520,8 @@ void ahci_set_command_header(AHCIQState *ahci, uint8_t px,
 void ahci_destroy_command(AHCIQState *ahci, uint8_t px, uint8_t cx);
 unsigned ahci_pick_cmd(AHCIQState *ahci, uint8_t px);
 unsigned size_to_prdtl(unsigned bytes, unsigned bytes_per_prd);
+void ahci_guest_io(AHCIQState *ahci, uint8_t px, uint8_t ide_cmd,
+   uint64_t gbuffer, size_t size);
 
 /* Command Lifecycle */
 AHCICommand *ahci_command_create(uint8_t command_name);
-- 
1.9.3




[Qemu-devel] [PATCH 08/19] libqos/ahci: Add cmd response sanity check helpers

2015-01-30 Thread John Snow
This patch adds a few helpers to help sanity-check the response of the
AHCI device after a command.

ahci_d2h_check_sanity inspects the D2H Register FIS,
ahci_pio_check_sanity inspects the PIO Setup FIS, and
ahci_cmd_check_sanity inspects the command header.

To support the PIO sanity check, a new structure is added for the
PIO Setup FIS type. Existing FIS types (H2D and D2H) have had their
members renamed slightly to condense reserved members into fewer
fields; and LBA fields are now represented by arrays of 8 byte chunks
instead of independent variables.

Signed-off-by: John Snow js...@redhat.com
---
 tests/ahci-test.c   | 29 
 tests/libqos/ahci.c | 47 ++
 tests/libqos/ahci.h | 54 -
 3 files changed, 88 insertions(+), 42 deletions(-)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index 77d0f97..b21d4ad 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -657,12 +657,10 @@ static void ahci_test_port_spec(AHCIQState *ahci, uint8_t 
port)
  */
 static void ahci_test_identify(AHCIQState *ahci)
 {
-RegD2HFIS *d2h = g_malloc0(0x20);
-RegD2HFIS *pio = g_malloc0(0x20);
 RegH2DFIS fis;
 AHCICommandHeader cmd;
 PRD prd;
-uint32_t reg, data_ptr;
+uint32_t data_ptr;
 uint16_t buff[256];
 unsigned i;
 int rc;
@@ -752,27 +750,11 @@ static void ahci_test_identify(AHCIQState *ahci)
 /* BUG: we expect AHCI_PX_IS_DPS to be set. */
 ahci_port_check_interrupts(ahci, i, AHCI_PX_IS_DHRS | AHCI_PX_IS_PSS);
 ahci_port_check_nonbusy(ahci, i, cx);
-
 /* Investigate the CMD, assert that we read 512 bytes */
-ahci_get_command_header(ahci, i, cx, cmd);
-g_assert_cmphex(512, ==, cmd.prdbc);
-
+ahci_port_check_cmd_sanity(ahci, i, cx, 512);
 /* Investigate FIS responses */
-memread(ahci-port[i].fb + 0x20, pio, 0x20);
-memread(ahci-port[i].fb + 0x40, d2h, 0x20);
-g_assert_cmphex(pio-fis_type, ==, 0x5f);
-g_assert_cmphex(d2h-fis_type, ==, 0x34);
-g_assert_cmphex(pio-flags, ==, d2h-flags);
-g_assert_cmphex(pio-status, ==, d2h-status);
-g_assert_cmphex(pio-error, ==, d2h-error);
-
-reg = ahci_px_rreg(ahci, i, AHCI_PX_TFD);
-g_assert_cmphex((reg  AHCI_PX_TFD_ERR), ==, pio-error);
-g_assert_cmphex((reg  AHCI_PX_TFD_STS), ==, pio-status);
-/* The PIO Setup FIS contains a bytes read field, which is a
- * 16-bit value. The Physical Region Descriptor Byte Count is
- * 32-bit, but for small transfers using one PRD, it should match. */
-g_assert_cmphex(le16_to_cpu(pio-res4), ==, cmd.prdbc);
+ahci_port_check_d2h_sanity(ahci, i, cx);
+ahci_port_check_pio_sanity(ahci, i, cx, 512);
 
 /* Last, but not least: Investigate the IDENTIFY response data. */
 memread(data_ptr, buff, 512);
@@ -789,9 +771,6 @@ static void ahci_test_identify(AHCIQState *ahci)
 string_bswap16(buff[23], 8);
 rc = memcmp(buff[23], version , 8);
 g_assert_cmphex(rc, ==, 0);
-
-g_free(d2h);
-g_free(pio);
 }
 
 
/**/
diff --git a/tests/libqos/ahci.c b/tests/libqos/ahci.c
index 0016aad..924b9f0 100644
--- a/tests/libqos/ahci.c
+++ b/tests/libqos/ahci.c
@@ -365,6 +365,53 @@ void ahci_port_check_nonbusy(AHCIQState *ahci, uint8_t px, 
uint8_t cx)
 ASSERT_BIT_CLEAR(reg, AHCI_PX_TFD_STS_DRQ);
 }
 
+void ahci_port_check_d2h_sanity(AHCIQState *ahci, uint8_t px, uint8_t cx)
+{
+RegD2HFIS *d2h = g_malloc0(0x20);
+uint32_t reg;
+
+memread(ahci-port[px].fb + 0x40, d2h, 0x20);
+g_assert_cmphex(d2h-fis_type, ==, 0x34);
+
+reg = ahci_px_rreg(ahci, px, AHCI_PX_TFD);
+g_assert_cmphex((reg  AHCI_PX_TFD_ERR)  8, ==, d2h-error);
+g_assert_cmphex((reg  AHCI_PX_TFD_STS), ==, d2h-status);
+
+g_free(d2h);
+}
+
+void ahci_port_check_pio_sanity(AHCIQState *ahci, uint8_t px,
+uint8_t cx, size_t buffsize)
+{
+PIOSetupFIS *pio = g_malloc0(0x20);
+
+/* We cannot check the Status or E_Status registers, becuase
+ * the status may have again changed between the PIO Setup FIS
+ * and the conclusion of the command with the D2H Register FIS. */
+memread(ahci-port[px].fb + 0x20, pio, 0x20);
+g_assert_cmphex(pio-fis_type, ==, 0x5f);
+
+/* BUG: PIO Setup FIS as utilized by QEMU tries to fit the entire
+ * transfer size in a uint16_t field. The maximum transfer size can
+ * eclipse this; the field is meant to convey the size of data per
+ * each Data FIS, not the entire operation as a whole. For now,
+ * we will sanity check the broken case where applicable. */
+if (buffsize = UINT16_MAX) {
+g_assert_cmphex(le16_to_cpu(pio-tx_count), ==, buffsize);
+}
+
+g_free(pio);
+}
+
+void ahci_port_check_cmd_sanity(AHCIQState *ahci, uint8_t px,
+uint8_t cx, size_t buffsize)
+{
+AHCICommandHeader cmd;

Re: [Qemu-devel] [PATCH 13/21] blockdev: Add list of monitor-owned BlockBackends

2015-01-30 Thread Eric Blake
On 01/26/2015 12:27 PM, Max Reitz wrote:
 The monitor does hold references to some BlockBackends so it should have
 a list of those BBs; blk_backends is a different list, as it contains
 references to all BBs (after a follow-up patch, that is), and that
 should not be changed because we do need such a list.
 
 Signed-off-by: Max Reitz mre...@redhat.com
 ---
  block/block-backend.c  | 27 ++-
  blockdev.c | 14 ++
  include/sysemu/block-backend.h |  2 ++
  3 files changed, 42 insertions(+), 1 deletion(-)
 

Reviewed-by: Eric Blake ebl...@redhat.com

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH 05/19] libqos/ahci: Add ahci_port_check_error helper

2015-01-30 Thread John Snow
ahci_port_check_error checks a given port's error registers and asserts
that everything from the port-level view is still OK.

Signed-off-by: John Snow js...@redhat.com
---
 tests/ahci-test.c   |  8 +---
 tests/libqos/ahci.c | 22 ++
 tests/libqos/ahci.h |  1 +
 3 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index 17aee37..63ecf73 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -747,6 +747,7 @@ static void ahci_test_identify(AHCIQState *ahci)
 while (BITSET(ahci_px_rreg(ahci, i, AHCI_PX_TFD), AHCI_PX_TFD_STS_BSY)) {
 usleep(50);
 }
+ahci_port_check_error(ahci, i);
 
 /* Check for expected interrupts */
 reg = ahci_px_rreg(ahci, i, AHCI_PX_IS);
@@ -760,13 +761,6 @@ static void ahci_test_identify(AHCIQState *ahci)
  AHCI_PX_IS_DHRS | AHCI_PX_IS_PSS | AHCI_PX_IS_DPS);
 g_assert_cmphex(ahci_px_rreg(ahci, i, AHCI_PX_IS), ==, 0);
 
-/* Check for errors. */
-reg = ahci_px_rreg(ahci, i, AHCI_PX_SERR);
-g_assert_cmphex(reg, ==, 0);
-reg = ahci_px_rreg(ahci, i, AHCI_PX_TFD);
-ASSERT_BIT_CLEAR(reg, AHCI_PX_TFD_STS_ERR);
-ASSERT_BIT_CLEAR(reg, AHCI_PX_TFD_ERR);
-
 /* Investigate the CMD, assert that we read 512 bytes */
 ahci_get_command_header(ahci, i, cx, cmd);
 g_assert_cmphex(512, ==, cmd.prdbc);
diff --git a/tests/libqos/ahci.c b/tests/libqos/ahci.c
index f5e8864..35de9af 100644
--- a/tests/libqos/ahci.c
+++ b/tests/libqos/ahci.c
@@ -311,6 +311,28 @@ void ahci_port_clear(AHCIQState *ahci, uint8_t px)
 qmemset(ahci-port[px].fb, 0x00, 0x100);
 }
 
+/**
+ * Check a port for errors.
+ */
+void ahci_port_check_error(AHCIQState *ahci, uint8_t px)
+{
+uint32_t reg;
+
+/* The upper 9 bits of the IS register all indicate errors. */
+reg = ahci_px_rreg(ahci, px, AHCI_PX_IS);
+reg = 23;
+g_assert_cmphex(reg, ==, 0);
+
+/* The Sata Error Register should be empty. */
+reg = ahci_px_rreg(ahci, px, AHCI_PX_SERR);
+g_assert_cmphex(reg, ==, 0);
+
+/* The TFD also has two error sections. */
+reg = ahci_px_rreg(ahci, px, AHCI_PX_TFD);
+ASSERT_BIT_CLEAR(reg, AHCI_PX_TFD_STS_ERR);
+ASSERT_BIT_CLEAR(reg, AHCI_PX_TFD_ERR);
+}
+
 /* Get the #cx'th command of port #px. */
 void ahci_get_command_header(AHCIQState *ahci, uint8_t px,
  uint8_t cx, AHCICommandHeader *cmd)
diff --git a/tests/libqos/ahci.h b/tests/libqos/ahci.h
index d27e75f..b384cbc 100644
--- a/tests/libqos/ahci.h
+++ b/tests/libqos/ahci.h
@@ -433,6 +433,7 @@ void start_ahci_device(AHCIQState *ahci);
 void ahci_hba_enable(AHCIQState *ahci);
 unsigned ahci_port_select(AHCIQState *ahci);
 void ahci_port_clear(AHCIQState *ahci, uint8_t px);
+void ahci_port_check_error(AHCIQState *ahci, uint8_t px);
 void ahci_get_command_header(AHCIQState *ahci, uint8_t px,
  uint8_t cx, AHCICommandHeader *cmd);
 void ahci_set_command_header(AHCIQState *ahci, uint8_t px,
-- 
1.9.3




[Qemu-devel] [PATCH 04/19] libqos/ahci: Add command header helpers

2015-01-30 Thread John Snow
Adds command header helper functions:
-ahci_command_header_set
-ahci_command_header_get,
-ahci_command_destroy, and
-ahci_cmd_pick

These helpers help to quickly manage the command header information in
the AHCI device.

ahci_command_header_set and get will store or retrieve an AHCI command
header, respectively.

ahci_cmd_pick chooses the first available but least recently used
command slot to allow us to cycle through the available command slots.

ahci_command_destroy obliterates all information contained within a
given slot's command header, and frees its associated command table,
but not its DMA buffer!

Lastly, the command table pointer fields (dba and dbau) are merged into
a single 64bit value to make managing 64bit tests simpler.

Signed-off-by: John Snow js...@redhat.com
---
 tests/ahci-test.c   | 43 ---
 tests/libqos/ahci.c | 74 +
 tests/libqos/ahci.h | 17 
 3 files changed, 109 insertions(+), 25 deletions(-)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index 85e5761..17aee37 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -662,10 +662,12 @@ static void ahci_test_identify(AHCIQState *ahci)
 RegH2DFIS fis;
 AHCICommandHeader cmd;
 PRD prd;
-uint32_t reg, table, data_ptr;
+uint32_t reg, data_ptr;
 uint16_t buff[256];
 unsigned i;
 int rc;
+uint8_t cx;
+uint64_t table;
 
 g_assert(ahci != NULL);
 
@@ -700,19 +702,19 @@ static void ahci_test_identify(AHCIQState *ahci)
 data_ptr = ahci_alloc(ahci, 512);
 g_assert(data_ptr);
 
-/* Copy the existing Command #0 structure from the CLB into local memory,
- * and build a new command #0. */
-memread(ahci-port[i].clb, cmd, sizeof(cmd));
-cmd.flags = 5;/* reg_h2d_fis is 5 double-words long */
-cmd.flags |= 0x400; /* clear PxTFD.STS.BSY when done */
-cmd.prdtl = cpu_to_le16(1); /* One PRD table entry. */
+/* pick a command slot (should be 0!) */
+cx = ahci_pick_cmd(ahci, i);
+
+/* Construct our Command Header (set_command_header handles endianness.) */
+memset(cmd, 0x00, sizeof(cmd));
+cmd.flags = 5; /* reg_h2d_fis is 5 double-words long */
+cmd.flags = 0x400; /* clear PxTFD.STS.BSY when done */
+cmd.prdtl = 1; /* One PRD table entry. */
 cmd.prdbc = 0;
-cmd.ctba = cpu_to_le32(table);
-cmd.ctbau = 0;
+cmd.ctba = table;
 
 /* Construct our PRD, noting that DBC is 0-indexed. */
-prd.dba = cpu_to_le32(data_ptr);
-prd.dbau = 0;
+prd.dba = cpu_to_le64(data_ptr);
 prd.res = 0;
 /* 511+1 bytes, request DPS interrupt */
 prd.dbc = cpu_to_le32(511 | 0x8000);
@@ -733,14 +735,15 @@ static void ahci_test_identify(AHCIQState *ahci)
 /* Commit the PRD entry to the Command Table */
 memwrite(table + 0x80, prd, sizeof(prd));
 
-/* Commit Command #0, pointing to the Table, to the Command List Buffer. */
-memwrite(ahci-port[i].clb, cmd, sizeof(cmd));
+/* Commit Command #cx, pointing to the Table, to the Command List Buffer. 
*/
+ahci_set_command_header(ahci, i, cx, cmd);
 
-/* Everything is in place, but we haven't given the go-ahead yet. */
+/* Everything is in place, but we haven't given the go-ahead yet,
+ * so we should find that there are no pending interrupts yet. */
 g_assert_cmphex(ahci_px_rreg(ahci, i, AHCI_PX_IS), ==, 0);
 
-/* Issue Command #0 via PxCI */
-ahci_px_wreg(ahci, i, AHCI_PX_CI, (1  0));
+/* Issue Command #cx via PxCI */
+ahci_px_wreg(ahci, i, AHCI_PX_CI, (1  cx));
 while (BITSET(ahci_px_rreg(ahci, i, AHCI_PX_TFD), AHCI_PX_TFD_STS_BSY)) {
 usleep(50);
 }
@@ -764,9 +767,9 @@ static void ahci_test_identify(AHCIQState *ahci)
 ASSERT_BIT_CLEAR(reg, AHCI_PX_TFD_STS_ERR);
 ASSERT_BIT_CLEAR(reg, AHCI_PX_TFD_ERR);
 
-/* Investigate CMD #0, assert that we read 512 bytes */
-memread(ahci-port[i].clb, cmd, sizeof(cmd));
-g_assert_cmphex(512, ==, le32_to_cpu(cmd.prdbc));
+/* Investigate the CMD, assert that we read 512 bytes */
+ahci_get_command_header(ahci, i, cx, cmd);
+g_assert_cmphex(512, ==, cmd.prdbc);
 
 /* Investigate FIS responses */
 memread(ahci-port[i].fb + 0x20, pio, 0x20);
@@ -783,7 +786,7 @@ static void ahci_test_identify(AHCIQState *ahci)
 /* The PIO Setup FIS contains a bytes read field, which is a
  * 16-bit value. The Physical Region Descriptor Byte Count is
  * 32-bit, but for small transfers using one PRD, it should match. */
-g_assert_cmphex(le16_to_cpu(pio-res4), ==, le32_to_cpu(cmd.prdbc));
+g_assert_cmphex(le16_to_cpu(pio-res4), ==, cmd.prdbc);
 
 /* Last, but not least: Investigate the IDENTIFY response data. */
 memread(data_ptr, buff, 512);
diff --git a/tests/libqos/ahci.c b/tests/libqos/ahci.c
index 9195bd6..f5e8864 100644
--- a/tests/libqos/ahci.c
+++ b/tests/libqos/ahci.c
@@ -310,3 +310,77 @@ void ahci_port_clear(AHCIQState *ahci, 

[Qemu-devel] [PATCH 17/19] qtest/ahci: Add a macro bootup routine

2015-01-30 Thread John Snow
Add a routine that can be used to engage the AHCI
device at a not-granular level so that bringing up
the functionality of the HBA is easy in future tests
that are not concerned with testing the bring-up process.

Signed-off-by: John Snow js...@redhat.com
---
 tests/ahci-test.c | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index 3a0131a..9207e73 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -107,6 +107,21 @@ static void ahci_shutdown(AHCIQState *ahci)
 qtest_shutdown(qs);
 }
 
+/**
+ * Boot and fully enable the HBA device.
+ * @see ahci_boot, ahci_pci_enable and ahci_hba_enable.
+ */
+static AHCIQState *ahci_macro_bootup(void)
+{
+AHCIQState *ahci;
+ahci = ahci_boot();
+
+ahci_pci_enable(ahci);
+ahci_hba_enable(ahci);
+
+return ahci;
+}
+
 /*** Specification Adherence Tests ***/
 
 /**
@@ -787,9 +802,7 @@ static void test_identify(void)
 {
 AHCIQState *ahci;
 
-ahci = ahci_boot();
-ahci_pci_enable(ahci);
-ahci_hba_enable(ahci);
+ahci = ahci_macro_bootup();
 ahci_test_identify(ahci);
 ahci_shutdown(ahci);
 }
-- 
1.9.3




[Qemu-devel] [PATCH 19/19] qtest/ahci: Adding simple dma read-write test

2015-01-30 Thread John Snow
Adds a test case for AHCI wherein we write a 4K
block of a changing pattern to sector 0, then
read back that 4K and compare the transmit and
receive buffers.

Signed-off-by: John Snow js...@redhat.com
---
 tests/ahci-test.c | 52 
 1 file changed, 52 insertions(+)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index cae94b5..1f02fa8 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -731,6 +731,45 @@ static void ahci_test_identify(AHCIQState *ahci)
 g_assert_cmphex(sect_size, ==, 0x200);
 }
 
+static void ahci_test_dma_rw_simple(AHCIQState *ahci)
+{
+uint64_t ptr;
+uint8_t px;
+unsigned i;
+const unsigned bufsize = 4096;
+unsigned char *tx = g_malloc(bufsize);
+unsigned char *rx = g_malloc0(bufsize);
+
+g_assert(ahci != NULL);
+
+/* Pick the first running port and clear it. */
+px = ahci_port_select(ahci);
+ahci_port_clear(ahci, px);
+
+/*** Create pattern and transfer to guest ***/
+/* Data buffer in the guest */
+ptr = ahci_alloc(ahci, bufsize);
+g_assert(ptr);
+
+/* Write some indicative pattern to our 4K buffer. */
+for (i = 0; i  bufsize; i++) {
+tx[i] = (bufsize - i);
+}
+memwrite(ptr, tx, bufsize);
+
+/* Write this buffer to disk, then read it back to the DMA buffer. */
+ahci_guest_io(ahci, px, CMD_WRITE_DMA, ptr, bufsize);
+ahci_guest_io(ahci, px, CMD_READ_DMA, ptr, bufsize);
+
+/*** Read back the Data ***/
+memread(ptr, rx, bufsize);
+g_assert_cmphex(memcmp(tx, rx, bufsize), ==, 0);
+
+ahci_free(ahci, ptr);
+g_free(tx);
+g_free(rx);
+}
+
 
/**/
 /* Test Interfaces
*/
 
/**/
@@ -811,6 +850,18 @@ static void test_identify(void)
 ahci_shutdown(ahci);
 }
 
+/**
+ * Perform a simple DMA R/W test, using a single PRD and non-NCQ commands.
+ */
+static void test_dma_rw_simple(void)
+{
+AHCIQState *ahci;
+
+ahci = ahci_macro_bootup();
+ahci_test_dma_rw_simple(ahci);
+ahci_shutdown(ahci);
+}
+
 
/**/
 
 int main(int argc, char **argv)
@@ -866,6 +917,7 @@ int main(int argc, char **argv)
 qtest_add_func(/ahci/hba_spec,   test_hba_spec);
 qtest_add_func(/ahci/hba_enable, test_hba_enable);
 qtest_add_func(/ahci/identify,   test_identify);
+qtest_add_func(/ahci/dma/simple, test_dma_rw_simple);
 
 ret = g_test_run();
 
-- 
1.9.3




[Qemu-devel] [PATCH 18/19] qtest/ahci: Assert sector size in identify test

2015-01-30 Thread John Snow
A minor sanity check to assert that the sector size is 512.
The current block layer code deeply assumes that the IDE
sector size will be 512 bytes, so we carry forward that assumption
here.

This is useful for the DMA tests, which currently assume that
a sector will always be 512 bytes.

Signed-off-by: John Snow js...@redhat.com
---
 tests/ahci-test.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index 9207e73..cae94b5 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -678,6 +678,7 @@ static void ahci_test_identify(AHCIQState *ahci)
 uint16_t buff[256];
 unsigned px;
 int rc;
+uint16_t sect_size;
 const size_t buffsize = 512;
 
 g_assert(ahci != NULL);
@@ -725,6 +726,9 @@ static void ahci_test_identify(AHCIQState *ahci)
 string_bswap16(buff[23], 8);
 rc = memcmp(buff[23], version , 8);
 g_assert_cmphex(rc, ==, 0);
+
+sect_size = le16_to_cpu(*((uint16_t *)(buff[5])));
+g_assert_cmphex(sect_size, ==, 0x200);
 }
 
 
/**/
-- 
1.9.3




Re: [Qemu-devel] QEMU segfault: Booting an overlay with backing_file over NBD: nbd.c:nbd_receive_request():L756: read failed

2015-01-30 Thread Max Reitz

On 2015-01-30 at 13:41, Kashyap Chamarthy wrote:

On Fri, Jan 30, 2015 at 06:15:21PM +0100, Kevin Wolf wrote:

Am 29.01.2015 um 17:25 hat Kashyap Chamarthy geschrieben:

   $ qemu-system-x86_64   \
  -nographic  \
  -nodefconfig\
  -nodefaults \
  -m 2048 \
  -device virtio-scsi-pci,id=scsi \
  -device virtio-serial-pci   \
  -serial stdio   \
  -drive file=./overlay1.qcow2,format=qcow2,if=virtio,cache=writeback
   Segmentation fault (core dumped)


On the shell where `qemu-nbd` is running, I notice this

   nbd.c:nbd_receive_request():L756: read failed


Haven't investigated further with GDB, thought I'd bring it up here
first.


Versions


   $ rpm -q qemu; uname -r
   qemu-2.1.2-7.fc21.x86_64
   3.17.8-300.fc21.x86_64

Copying Stefan because he's the master of AIO contexts and it is
bs-aio_context that becomes NULL. I couldn't see anything obvious.


In the meantime, could you retest on git master?

Just tested from git, and I can still reproduce it.

That's the commit I'm at:

   $ git describe
   v2.2.0-682-g16017c4


Run the NBD server, from git:

   $ /home/kashyapc/build/qemu/qemu-nbd -f qcow2 \
   -p10809 ./f21vm.qcow2 -t


Create the overlay:

   $ /home/kashyapc/build/qemu/qemu-img create \
   -f qcow2 -F nbd -o backing_file=nbd://localhost overlay2-of-f21vm.qcow2
   Segmentation fault (core dumped)


You want to use -F raw. The file format is raw, not nbd (nbd is the 
protocol over which the data is read, which is in format raw).


Anyway, -F nbd shouldn't result in a segfault. One way to prevent this 
is to check whether the backing file format specified (or any format 
given to qemu-img in general) is a real format or the name of a protocol 
driver and then error out if it's the latter; but that would be more of 
a hotfix.


Kevin, Stefan: The real problem is that block/nbd.c stores a 
BDRVNBDState object in bs-opaque and passes BDRVNBDState.client (an 
NbdClientSession object) to the block/nbd-client.c functions. Those 
functions then receive the BDS pointer from client-bs. If an NBD BDS is 
a root BDS (as in this case), at some point a bdrv_swap() may happen 
(and it does happen here) which leads to ((BDRVNBDState 
*)bs-opaque)-client.bs != bs, and that's where the segfault comes from 
(bdrv_get_aio_context() returns NULL).


One way to fix this real problem is to remove the BDS pointer from the 
NbdClientSession and to always pass the BDS explicitly to the 
block/nbd-client.c functions; the other is to always update the BDS 
pointer in NbdClientSession in block/nbd.c. I'll try the former, and if 
it doesn't work, will do the latter (if you don't object).


Max


Creating the overlay from the  git-compiled `qemu-img` binary fails.

So, let's create the overlay using the `qemu-img` binary from the system
(RPM version noted above) and boot the overlay from the just compiled
QEMU x86_64 binary from git, still core dumps:

   $ /home/kashyapc/build/qemu/x86_64-softmmu/qemu-system-x86_64 \
   -nographic  \
   -nodefconfig\
   -nodefaults \
   -m 2048 \
   -device virtio-scsi-pci,id=scsi \
   -device virtio-serial-pci   \
   -serial stdio   \
   -drive file=./overlay2-f21vm.qcow2,format=qcow2,if=virtio,cache=writeback
   Segmentation fault (core dumped)


PS: I'm traveling, so I'll be a little slow to respond here, but can
provide more debugging info from the coredump of `qemu-img` binary as I
have access to a real computer.







Re: [Qemu-devel] vm live storage migration approach.

2015-01-30 Thread Yaodong Yang
An follow up questions.

Suppose I have a running VM with two virtual disks, I would like to migrate
the vm from host A to host B. Both host A and host B have their own
isolated storage devices. Is there anyway to migrate the vm's memory, two
virtual disk images and other states together from host A to host B? Can
drive_mirror command itself finish this job? I noticed that drive_mirror
only mirror for one virtual disk and require both the source and
destination share the same storage namespace. I do not know how to migrate
the whole VM (memory, storage, network ) together from host A to host B,
given that host A and host B have NO shared storage resource.

Could you show me an example, if possible?

I know migrate -b works well for this purpose. But the downside is
migrate -b does not mirror Write Requests to both host A and host B
during migration. In this case, migrate -b has a higher VM downtime
during the migration.

Yaodong

On Fri, Jan 30, 2015 at 1:21 PM, Eric Blake ebl...@redhat.com wrote:

 On 01/30/2015 09:25 AM, Yaodong Yang wrote:
  Hi all,
 
  I'm investigating the current schemes for the VM live storage migration
 in
  QEMU system. I have the following questions:
 
  1. What is the functionality of drive_mirror in QEMU? Is it designed as a
  VM live storage migration approach?

 Yes. drive-mirror was added precisely to support live storage migration.

 
  2. What's the difference between drive_mirror and vMotion? I learned that
  vMotion employs IO Mirroring mechanism to migration a running VM with all
  the virtual disk images. Are there any other mechanisms inside QEMU serve
  this purpose as well?

 vMotion is not part of qemu, so I'm not sure what it does.  Therefore, I
 cannot make a fair comparison.

 
  I'm looking for a mechanism in QEMU, which is similar to vMotion ( IO
  Mirroring) in ESX environment.

 Look at drive-mirror, block-backup, quorum drives, and lots of recent
 threads on this list about adding redundant processing (such as the term
 COLO) for how to piece together lower-level blocks into whatever drive
 mirroring scheme you can think of.


 --
 Eric Blake   eblake redhat com+1-919-301-3266
 Libvirt virtualization library http://libvirt.org




-- 
Yaodong Yang
yaodong.ya...@gmail.com or yy...@cse.unl.edu
Computer Science and Engineering Department
University of Nebraska-Lincoln
Lincoln, NE, USA


[Qemu-devel] [PATCH] vfio-pci: unparent BAR subregions

2015-01-30 Thread Alex Williamson
Commit d8d95814609e replaced a number of memory_region_destroy()
calls with object_unparent() calls.  The logic appears to be that
subregions need to be unparented, but the base region is destroyed
with the device object.  Doing hotplug testing with vfio-pci I
occasionally get a segfault from object_finalize_child_property()
due to completely bogus class pointers on the child Object.  Adding
the explicit object_unparent() for these subregions resolves the
problem, however I question the sanity of the Memory API now where
we sometimes need to destroy MemoryRegions, but the rules aren't
clear and there's no longer a memory_region_destroy() function, so
we need to reach over to some other random QEMU API and unparent
an object that we barely know about and certainly didn't explicitly
parent previously.

Signed-off-by: Alex Williamson alex.william...@redhat.com
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: qemu-sta...@nongnu.org
---

 hw/vfio/pci.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 014a92c..c71499e 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2294,10 +2294,12 @@ static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr)
 
 memory_region_del_subregion(bar-region.mem, bar-region.mmap_mem);
 munmap(bar-region.mmap, memory_region_size(bar-region.mmap_mem));
+object_unparent(OBJECT(bar-region.mmap_mem));
 
 if (vdev-msix  vdev-msix-table_bar == nr) {
 memory_region_del_subregion(bar-region.mem, vdev-msix-mmap_mem);
 munmap(vdev-msix-mmap, memory_region_size(vdev-msix-mmap_mem));
+object_unparent(OBJECT(vdev-msix-mmap_mem));
 }
 }
 




Re: [Qemu-devel] [Xen-devel] [RFC][PATCH 1/1] libxl: add one machine property to support IGD GFX passthrough

2015-01-30 Thread Xu, Quan


 -Original Message-
 From: xen-devel-boun...@lists.xen.org
 [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of Wei Liu
 Sent: Friday, January 30, 2015 8:26 PM
 To: Chen, Tiejun
 Cc: Wei Liu; ian.campb...@citrix.com; m...@redhat.com; Ian Jackson;
 qemu-devel@nongnu.org; xen-de...@lists.xen.org; Gerd Hoffmann
 Subject: Re: [Xen-devel] [Qemu-devel] [RFC][PATCH 1/1] libxl: add one machine
 property to support IGD GFX passthrough
 
 On Fri, Jan 30, 2015 at 08:56:48AM +0800, Chen, Tiejun wrote:
 [...]
  
  Just remember to handle old option in libxl if your old option is
  already released by some older version of QEMUs.
  
  I just drop that old option, -gfx_passthru, if we're under qemu
  upstream circumstance, like this,
  
  
  The question is, is there any version of qemu upstream that has been
  released that has the old option (-gfx-passthru)?
 
  No. Just now we're starting to support IGD passthrough in qemu upstream.
 
 
 Right, as of QEMU 2.2.0 there's no support of IGD passthrough in QMEU
 upstream.
 

Just a question:
   Now what features do vt-d support? Thanks.

-Quan

  
  This gives us a situation that we need to support both the old
  (-gfx-passthru) and new (-igd-passthru) options. Presumably we
  (libxl) would need to fork a qemu process to determine which option
  it has and pass the right one.
  
  Or you can try to keep both old and new option at the same time but
 
  Yeah, actually I also have considered to keep both two options at the
  same time. Its really friendly to any qemu version.
 
  deprecate the old one. Then in a few qemu release cycles later (or
 
  This should be like 'accel=kvm' versus 'enable-kvm' in qemu upstream.
  They're coexisted now but just the former is a modern option.
 
  probably one year or two?) you can finally remove the old one. The
  point is that to give downstream (in this case, Xen) time to cope
  with the change.
 
  Here I'm fine to this way.
 
  So Gerd,
 
 
 So you don't actually need to ask Gerd this question because there is no old
 option to keep in qemu upstream.
 
 Libxl (or any sensible toolstack) will just do the right thing to either pass
 -igd-passthru (or whatever you guys agree upon) to qemu upstream or pass
 -gfx-passthru to qemu traditional. :-)
 
 Wei.
 
  What about this?
 
 
 
 
 ___
 Xen-devel mailing list
 xen-de...@lists.xen.org
 http://lists.xen.org/xen-devel



Re: [Qemu-devel] [RFC PATCH v1 10/13] cpus, spapr: reclaim allocated vCPU objects

2015-01-30 Thread David Gibson
On Fri, Jan 30, 2015 at 01:53:47PM +0530, Bharata B Rao wrote:
 On Thu, Jan 29, 2015 at 12:48:39PM +1100, David Gibson wrote:
  On Thu, Jan 08, 2015 at 11:40:17AM +0530, Bharata B Rao wrote:
   From: Gu Zheng guz.f...@cn.fujitsu.com
  
  This needs a commit message, it's not at all clear from the 1-line 
  description.
 
 Borrowed patch, but I should have put in a description.
 
   diff --git a/kvm-all.c b/kvm-all.c
   index 18cc6b4..6f543ce 100644
   --- a/kvm-all.c
   +++ b/kvm-all.c
   @@ -71,6 +71,12 @@ typedef struct KVMSlot

typedef struct kvm_dirty_log KVMDirtyLog;

   +struct KVMParkedVcpu {
   +unsigned long vcpu_id;
   +int kvm_fd;
   +QLIST_ENTRY(KVMParkedVcpu) node;
   +};
   +
struct KVMState
{
AccelState parent_obj;
   @@ -107,6 +113,7 @@ struct KVMState
QTAILQ_HEAD(msi_hashtab, KVMMSIRoute) 
   msi_hashtab[KVM_MSI_HASHTAB_SIZE];
bool direct_msi;
#endif
   +QLIST_HEAD(, KVMParkedVcpu) kvm_parked_vcpus;
};

#define TYPE_KVM_ACCEL ACCEL_CLASS_NAME(kvm)
   @@ -247,6 +254,53 @@ static int kvm_set_user_memory_region(KVMState *s, 
   KVMSlot *slot)
return kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, mem);
}

   +int kvm_destroy_vcpu(CPUState *cpu)
   +{
   +KVMState *s = kvm_state;
   +long mmap_size;
   +struct KVMParkedVcpu *vcpu = NULL;
   +int ret = 0;
   +
   +DPRINTF(kvm_destroy_vcpu\n);
   +
   +mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
   +if (mmap_size  0) {
   +ret = mmap_size;
   +DPRINTF(kvm_destroy_vcpu failed\n);
   +goto err;
   +}
   +
   +ret = munmap(cpu-kvm_run, mmap_size);
   +if (ret  0) {
   +goto err;
   +}
   +
   +vcpu = g_malloc0(sizeof(*vcpu));
   +vcpu-vcpu_id = kvm_arch_vcpu_id(cpu);
   +vcpu-kvm_fd = cpu-kvm_fd;
   +QLIST_INSERT_HEAD(kvm_state-kvm_parked_vcpus, vcpu, node);
  
  What's the reason for parking vcpus rather than removing / recreating
  them at the kvm level?
 
 Since KVM isn't equipped to handle closure of vcpu fd from userspace(QEMU)
 correctly, certain work arounds have to be employed to allow reuse of
 vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
 proposed workaround is to park the vcpu fd in userspace during cpu unplug
 and reuse it later during next hotplug.
 
 Some details can be found here:
 KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
 QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html

Ok, that makes sense but it definitely needs comment, both in the code
and in the commit message.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpvhbIplswqJ.pgp
Description: PGP signature


[Qemu-devel] [PATCH 0/2] xen_pt: fix two Coverity defects

2015-01-30 Thread arei.gonglei
From: Gonglei arei.gong...@huawei.com


Gonglei (2):
  xen-pt: fix Negative array index read
  xen-pt: fix Out-of-bounds read

 hw/xen/xen_pt_config_init.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

-- 
1.7.12.4





[Qemu-devel] [PATCH 1/2] xen-pt: fix Negative array index read

2015-01-30 Thread arei.gonglei
From: Gonglei arei.gong...@huawei.com

Coverity spot:
Function xen_pt_bar_offset_to_index() may returns a negative
number (-1) value index, which as an index to array d-io_regions.

Let's directly and simply pass index as an argument to
xen_pt_bar_reg_parse().

Signed-off-by: Gonglei arei.gong...@huawei.com
---
 hw/xen/xen_pt_config_init.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index de9a20f..710fe50 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -360,15 +360,13 @@ static uint64_t xen_pt_get_bar_size(PCIIORegion *r)
 }
 
 static XenPTBarFlag xen_pt_bar_reg_parse(XenPCIPassthroughState *s,
- XenPTRegInfo *reg)
+ int index)
 {
 PCIDevice *d = s-dev;
 XenPTRegion *region = NULL;
 PCIIORegion *r;
-int index = 0;
 
 /* check 64bit BAR */
-index = xen_pt_bar_offset_to_index(reg-offset);
 if ((0  index)  (index  PCI_ROM_SLOT)) {
 int type = s-real_device.io_regions[index - 1].type;
 
@@ -422,7 +420,7 @@ static int xen_pt_bar_reg_init(XenPCIPassthroughState *s, 
XenPTRegInfo *reg,
 }
 
 /* set BAR flag */
-s-bases[index].bar_flag = xen_pt_bar_reg_parse(s, reg);
+s-bases[index].bar_flag = xen_pt_bar_reg_parse(s, index);
 if (s-bases[index].bar_flag == XEN_PT_BAR_FLAG_UNUSED) {
 reg_field = XEN_PT_INVALID_REG;
 }
-- 
1.7.12.4





[Qemu-devel] [PATCH 2/2] xen-pt: fix Out-of-bounds read

2015-01-30 Thread arei.gonglei
From: Gonglei arei.gong...@huawei.com

The array length of s-real_device.io_regions[] is
PCI_NUM_REGIONS - 1. Add a check, just make Coverity happy.

Signed-off-by: Gonglei arei.gong...@huawei.com
---
 hw/xen/xen_pt_config_init.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 710fe50..3c8b0f1 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -443,6 +443,11 @@ static int xen_pt_bar_reg_read(XenPCIPassthroughState *s, 
XenPTReg *cfg_entry,
 return -1;
 }
 
+if (index == PCI_ROM_SLOT) {
+XEN_PT_ERR(s-dev, Internal error: Access violation at ROM BAR.\n);
+return -1;
+}
+
 /* use fixed-up value from kernel sysfs */
 *value = base_address_with_flags(s-real_device.io_regions[index]);
 
-- 
1.7.12.4





Re: [Qemu-devel] [PATCH RFC v6 18/20] virtio: support revision-specific features

2015-01-30 Thread Cornelia Huck
On Wed, 7 Jan 2015 21:10:07 +0200
Michael S. Tsirkin m...@redhat.com wrote:

 On Wed, Jan 07, 2015 at 05:22:32PM +0100, Cornelia Huck wrote:
  On Sun, 28 Dec 2014 10:32:06 +0200
  Michael S. Tsirkin m...@redhat.com wrote:
  
   On Thu, Dec 11, 2014 at 02:25:20PM +0100, Cornelia Huck wrote:
Devices may support different sets of feature bits depending on which
revision they're operating at. Let's give the transport a way to
re-query the device about its features when the revision has been
changed.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
   
   So now we have both get_features and get_features_rev, and
   it's never clear which revision does host_features refer to.
   IMHO that's just too messy.
   Let's add get_legacy_features and host_legacy_features instead?
  
  I wanted to avoid touching anything that does not support version 1.
  And this interface might still work for later revisions, no?
 
 We can add _modern_ then, or rename host_features to host_legacy_features
 everywhere as preparation.
 

OK, I've ditched the don't modify old stuff goal and introduced
-get_features_legacy(). For now, devices will add VERSION_1 in their
-get_features() callback when they support it. (For many devices, this
will be the only difference between the two callbacks.)

Two sets of host_features don't make much sense to me.

I've hacked up something and will play with it a bit; I might post a
new patch series next week.




Re: [Qemu-devel] [PATCH RFC v6 19/20] virtio-blk: revision specific feature bits

2015-01-30 Thread Cornelia Huck
On Wed, 7 Jan 2015 21:11:44 +0200
Michael S. Tsirkin m...@redhat.com wrote:

 On Wed, Jan 07, 2015 at 05:29:49PM +0100, Cornelia Huck wrote:
  On Sun, 28 Dec 2014 12:24:46 +0200
  Michael S. Tsirkin m...@redhat.com wrote:
  
   On Thu, Dec 11, 2014 at 02:25:21PM +0100, Cornelia Huck wrote:
Wire up virtio-blk to provide different feature bit sets depending
on whether legacy or v1.0 has been requested.

Note that VERSION_1 is still disabled due to missing ANY_LAYOUT support.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
   
   So we need some way for devices to tell transports
   not to negotiate rev 1.
   Does clearing VERSION_1 have this effect?
   
  I just noticed that my patch is running in circles here.
  
  What we need is probably the transport-dependent maximum revision
  checker (which at least for ccw is acting on a device) pass in the
  requested revision and check if the feature bits for the revision
  include VERSION_1. Does that make sense?
 
 Just make devices set 'rev 1 supported' flag?

I'm now using the -get_features() callback to check for VERSION_1
(assuming every device that supports it adds the bit in its callback)
and only allow rev 1 if it is present. Will play with this a bit as
well.




Re: [Qemu-devel] [PATCH v11 03/13] qmp: Add block-dirty-bitmap-add and block-dirty-bitmap-remove

2015-01-30 Thread Kevin Wolf
Am 21.01.2015 um 10:34 hat Markus Armbruster geschrieben:
 I'm afraid I forgot much of the discussion we had before the break, and
 only now it's coming back, slowly.
 
 Quoting myself on naming parameters identifying nodes[*]:
 
 John Snow pointed out to me that we still haven't spelled out how this
 single parameter should be named.
 
 On obvious option is calling it node-name, or FOO-node-name when we have
 several.  However, we'd then have two subtly different kinds of
 parameters called like that: the old ones accept *only* node names, the
 new ones also accept backend names, which automatically resolve to the
 backend's root node.
 
 Three ways to cope with that:
 
 * Find a better name.
 
 * Make the old ones accept backend names, too.  Only a few, not that
   much work.  However, there are exceptions:
 
   - blockdev-add's node-name *defines* the node name.
 
   - query-named-block-nodes's node-name *is* the node's name.
 
 * Stop worrying and embrace the inconsistency.  The affected commands
   are headed for deprecation anyway.
 
 I think I'd go with node or FOO-node for parameters that reference
 nodes and accept both node names and backend names, and refrain from
 touching the existing node-name parameters.

Wasn't the conclusion last time that we would try to find a better name
for new commands and leave old commands alone because they are going to
become deprecated? That is, a combination of your first and last option?

 Let's go through existing uses of @node-name again:
 
 1. Define a node name
 
QMP commands blockdev-add (type BlockdevOptionsBase), drive-mirror
 
 2. Report a node name
 
QMP command query-named-block-nodes (type BlockDeviceInfo)

Whatever name we end up using, 1. and 2. should probably use the same.

 3. Node reference with backend names permitted for convenience
 
New QMP command block-dirty-bitmap-add (type BlockDirtyBitmapAdd) and
others
 
 4. Node reference with backend names not permitted
   
QMP commands drive-mirror @replaces, change-backing-file
@image-node-name
 
We may want to support the backend name resolves to root node
convenience feature here, for consistency.  Then this moves under 3.
 
Note interface wart: change-backing-file additionally requires the
backend owning the node.  We need the backend to set op-blockers, we
can't easily find it from the node, so we make the user point it out
to us.

These shouldn't be existing. As you say, we should move them to 3.

 5. Pair of names node reference, specify exactly one
 
QMP commands block_passwd, block_resize, blockdev-snapshot-sync
 
We can ignore this one, because we intend to replace the commands and
deprecate the old ones.

Agreed, these shouldn't be existing either.

 If I understand you correctly, you're proposing to use @node-name or
 @FOO-node-name when the value must be a node name (items 1+2 and 4), and
 @node-ref or @FOO-node-ref where we additionally support the backend
 name resolves to root node convenience feature (item 3).
 
 Is that a fair description of your proposal?
 
 PRO: the name makes it clear when the convenience feature is supported.
 
 CON: if we eliminate 4 by supporting the convenience feature, we either
 create ugly exceptions to the naming convention, or rename the
 parameters.
 
 Opinions?

If we don't have any cases where node names are allowed, but backend
names are not, then there is no reason to have two different names. I've
yet to see a reason for having commands that can accept node names, but
not backend names.

It's a bit different when the command can already accept both, but uses
two separate arguments for it. But I think most of them will be
deprecated, so we can ignore them here.

As for the naming, I'm not that sure that it's even useful to add
something to the field name. After all, this is really the _type_ of the
object, not the name. We don't have fields like 'read-only-bool' either.

If we're more specifically looking at things that actually refer to
block devices, you already mentioned drive-mirrors @replaces, which is a
great name in my opinion. @replaces-node-ref wouldn't improve anything.
Likewise, blockdev-add already refers to 'file' and 'backing' instead of
'file-node' or 'backing-node-ref'.

This probably means that FOO-node-{ref,name} shouldn't exist, because
just FOO is as good or better. The question is a bit harder where there
is only one node involved and we don't have a nice word to describe its
role for the command. This is where we used to use 'device' in the past,
when node-level addressing didn't exist yet. I think just 'node' would
be fine there.

Kevin



[Qemu-devel] [PATCH v3 3/4] block-backend: expose bs-bl.max_transfer_length

2015-01-30 Thread Peter Lieven
Signed-off-by: Peter Lieven p...@kamp.de
---
 block/block-backend.c  | 5 +
 include/sysemu/block-backend.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index d00c129..c28e240 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -580,6 +580,11 @@ int blk_get_flags(BlockBackend *blk)
 return bdrv_get_flags(blk-bs);
 }
 
+int blk_get_max_transfer_length(BlockBackend *blk)
+{
+return blk-bs-bl.max_transfer_length;
+}
+
 void blk_set_guest_block_size(BlockBackend *blk, int align)
 {
 bdrv_set_guest_block_size(blk-bs, align);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 8871a02..aab12b9 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -127,6 +127,7 @@ int blk_is_inserted(BlockBackend *blk);
 void blk_lock_medium(BlockBackend *blk, bool locked);
 void blk_eject(BlockBackend *blk, bool eject_flag);
 int blk_get_flags(BlockBackend *blk);
+int blk_get_max_transfer_length(BlockBackend *blk);
 void blk_set_guest_block_size(BlockBackend *blk, int align);
 void *blk_blockalign(BlockBackend *blk, size_t size);
 bool blk_op_is_blocked(BlockBackend *blk, BlockOpType op, Error **errp);
-- 
1.9.1




[Qemu-devel] [PATCH v3 1/4] block: add accounting for merged requests

2015-01-30 Thread Peter Lieven
Signed-off-by: Peter Lieven p...@kamp.de
Reviewed-by: Eric Blake ebl...@redhat.com
---
 block.c|  2 ++
 block/accounting.c |  7 +++
 block/qapi.c   |  2 ++
 hmp.c  |  6 +-
 include/block/accounting.h |  3 +++
 qapi/block-core.json   |  9 -
 qmp-commands.hx| 22 ++
 7 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index d45e4dd..61412e9 100644
--- a/block.c
+++ b/block.c
@@ -4562,6 +4562,8 @@ static int multiwrite_merge(BlockDriverState *bs, 
BlockRequest *reqs,
 }
 }
 
+block_acct_merge_done(bs-stats, BLOCK_ACCT_WRITE, num_reqs - outidx - 1);
+
 return outidx + 1;
 }
 
diff --git a/block/accounting.c b/block/accounting.c
index 18102f0..01d594f 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -54,3 +54,10 @@ void block_acct_highest_sector(BlockAcctStats *stats, 
int64_t sector_num,
 stats-wr_highest_sector = sector_num + nb_sectors - 1;
 }
 }
+
+void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type,
+  int num_requests)
+{
+assert(type  BLOCK_MAX_IOTYPE);
+stats-merged[type] += num_requests;
+}
diff --git a/block/qapi.c b/block/qapi.c
index 75c388e..d1a8917 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -335,6 +335,8 @@ static BlockStats *bdrv_query_stats(const BlockDriverState 
*bs,
 s-stats-wr_bytes = bs-stats.nr_bytes[BLOCK_ACCT_WRITE];
 s-stats-rd_operations = bs-stats.nr_ops[BLOCK_ACCT_READ];
 s-stats-wr_operations = bs-stats.nr_ops[BLOCK_ACCT_WRITE];
+s-stats-rd_merged = bs-stats.merged[BLOCK_ACCT_READ];
+s-stats-wr_merged = bs-stats.merged[BLOCK_ACCT_WRITE];
 s-stats-wr_highest_offset =
 bs-stats.wr_highest_sector * BDRV_SECTOR_SIZE;
 s-stats-flush_operations = bs-stats.nr_ops[BLOCK_ACCT_FLUSH];
diff --git a/hmp.c b/hmp.c
index 481be80..5a10154 100644
--- a/hmp.c
+++ b/hmp.c
@@ -474,6 +474,8 @@ void hmp_info_blockstats(Monitor *mon, const QDict *qdict)
 wr_total_time_ns=% PRId64
 rd_total_time_ns=% PRId64
 flush_total_time_ns=% PRId64
+rd_merged=% PRId64
+wr_merged=% PRId64
\n,
stats-value-stats-rd_bytes,
stats-value-stats-wr_bytes,
@@ -482,7 +484,9 @@ void hmp_info_blockstats(Monitor *mon, const QDict *qdict)
stats-value-stats-flush_operations,
stats-value-stats-wr_total_time_ns,
stats-value-stats-rd_total_time_ns,
-   stats-value-stats-flush_total_time_ns);
+   stats-value-stats-flush_total_time_ns,
+   stats-value-stats-rd_merged,
+   stats-value-stats-wr_merged);
 }
 
 qapi_free_BlockStatsList(stats_list);
diff --git a/include/block/accounting.h b/include/block/accounting.h
index 50b42b3..4c406cf 100644
--- a/include/block/accounting.h
+++ b/include/block/accounting.h
@@ -39,6 +39,7 @@ typedef struct BlockAcctStats {
 uint64_t nr_bytes[BLOCK_MAX_IOTYPE];
 uint64_t nr_ops[BLOCK_MAX_IOTYPE];
 uint64_t total_time_ns[BLOCK_MAX_IOTYPE];
+uint64_t merged[BLOCK_MAX_IOTYPE];
 uint64_t wr_highest_sector;
 } BlockAcctStats;
 
@@ -53,5 +54,7 @@ void block_acct_start(BlockAcctStats *stats, BlockAcctCookie 
*cookie,
 void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie);
 void block_acct_highest_sector(BlockAcctStats *stats, int64_t sector_num,
unsigned int nb_sectors);
+void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type,
+   int num_requests);
 
 #endif
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 80984d1..b7d9772 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -407,13 +407,20 @@
 # growable sparse files (like qcow2) that are used on top
 # of a physical device.
 #
+# @rd_merged: Number of read requests that have been merged into another
+# request (Since 2.3).
+#
+# @wr_merged: Number of write requests that have been merged into another
+# request (Since 2.3).
+#
 # Since: 0.14.0
 ##
 { 'type': 'BlockDeviceStats',
   'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'rd_operations': 'int',
'wr_operations': 'int', 'flush_operations': 'int',
'flush_total_time_ns': 'int', 'wr_total_time_ns': 'int',
-   'rd_total_time_ns': 'int', 'wr_highest_offset': 'int' } }
+   'rd_total_time_ns': 'int', 'wr_highest_offset': 'int',
+   'rd_merged': 'int', 'wr_merged': 'int' } }
 
 ##
 # @BlockStats:
diff --git a/qmp-commands.hx b/qmp-commands.hx
index c5f16dd..af3fd19 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2303,6 +2303,10 @@ Each json-object 

Re: [Qemu-devel] [PATCH v2 3/7] softfloat: Convert `*_default_nan' variables into inline functions

2015-01-30 Thread Leon Alrae
On 12/12/2014 19:34, Maciej W. Rozycki wrote:
 Mechanically replace `*_default_nan' variables with inline functions and 
 convert references accordingly.  Use `__inline__' rather than `inline' 
 so that the latter does not cause the definitions to become static as a 
 result of macro expansion, the functions are best inlined when referred 
 to from softfloat.c, but external copies must be also produced for 
 external callers.
 
 Signed-off-by: Thomas Schwinge tho...@codesourcery.com
 Signed-off-by: Maciej W. Rozycki ma...@codesourcery.com
snip
 Index: qemu-git-trunk/include/fpu/softfloat.h
 ===
 --- qemu-git-trunk.orig/include/fpu/softfloat.h   2014-12-11 
 18:15:02.0 +
 +++ qemu-git-trunk/include/fpu/softfloat.h2014-12-11 18:18:00.918095644 
 +
 @@ -370,7 +370,7 @@ static inline int float16_is_any_nan(flo
  
 /*
  | The pattern for a default generated half-precision NaN.
  
 **/
 -extern const float16 float16_default_nan;
 +__inline__ float16 float16_default_nan(void);
  
  
 /*
  | Software IEC/IEEE single-precision conversion routines.
 @@ -482,7 +482,7 @@ static inline float32 float32_set_sign(f
  
 /*
  | The pattern for a default generated single-precision NaN.
  
 **/
 -extern const float32 float32_default_nan;
 +__inline__ float32 float32_default_nan(void);
  
  
 /*
  | Software IEC/IEEE double-precision conversion routines.
 @@ -594,7 +594,7 @@ static inline float64 float64_set_sign(f
  
 /*
  | The pattern for a default generated double-precision NaN.
  
 **/
 -extern const float64 float64_default_nan;
 +__inline__ float64 float64_default_nan(void);
  
  
 /*
  | Software IEC/IEEE extended double-precision conversion routines.
 @@ -679,7 +679,7 @@ static inline int floatx80_is_any_nan(fl
  
 /*
  | The pattern for a default generated extended double-precision NaN.
  
 **/
 -extern const floatx80 floatx80_default_nan;
 +__inline__ floatx80 floatx80_default_nan(void);
  
  
 /*
  | Software IEC/IEEE quadruple-precision conversion routines.
 @@ -760,6 +760,6 @@ static inline int float128_is_any_nan(fl
  
 /*
  | The pattern for a default generated quadruple-precision NaN.
  
 **/
 -extern const float128 float128_default_nan;
 +__inline__ float128 float128_default_nan(void);
  

Unfortunately clang complains about it and eventually it won't link:

  CCaarch64-softmmu/target-arm/helper-a64.o
In file included from /qemu/target-arm/helper-a64.c:20:
In file included from /qemu/target-arm/cpu.h:37:
In file included from /qemu/include/qemu-common.h:120:
In file included from /qemu/include/qemu/bswap.h:8:
/qemu/include/fpu/softfloat.h:485:20: warning: inline function
'float32_default_nan' is not defined [-Wundefined-inline]
__inline__ float32 float32_default_nan(void);
   ^
/qemu/target-arm/helper-a64.c:345:19: note: used here
nan = float32_default_nan();
  ^
In file included from /qemu/target-arm/helper-a64.c:20:
In file included from /qemu/target-arm/cpu.h:37:
In file included from /qemu/include/qemu-common.h:120:
In file included from /qemu/include/qemu/bswap.h:8:
/qemu/include/fpu/softfloat.h:597:20: warning: inline function
'float64_default_nan' is not defined [-Wundefined-inline]
__inline__ float64 float64_default_nan(void);
   ^
/qemu/target-arm/helper-a64.c:374:19: note: used here
nan = float64_default_nan();
  ^
2 warnings generated.
  CCaarch64-softmmu/target-arm/gdbstub64.o
  CCaarch64-softmmu/target-arm/crypto_helper.o
  GEN   trace/generated-helpers.c
  CCaarch64-softmmu/trace/generated-helpers.o
  LINK  aarch64-softmmu/qemu-system-aarch64
fpu/softfloat.o: In function `commonNaNToFloat64':
/qemu/fpu/softfloat-specialize.h:796: undefined reference to
`float64_default_nan'
/qemu/fpu/softfloat-specialize.h:805: undefined reference to
`float64_default_nan'

  1   2   3   >