Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-21 Thread Stefan Hajnoczi
On Mon, May 20, 2013 at 09:23:43AM +0200, Paolo Bonzini wrote:
 On 20/05/2013 08:24, Stefan Hajnoczi wrote:
   You only need to fdatasync() before every guest flush, no?
  No, you need to set the dirty bit before issuing the write on the
  host.  Otherwise the image data may be modified without setting the
  appropriate dirty bit.  That would allow data modifications to go
  undetected!
 
 But data modifications can go undetected until the guest flush returns,
 can't they?

You are thinking about it from the guest perspective - if a flush has
not completed yet then there is no guarantee that the write has reached
disk.

But from a host perspective the dirty bitmap should be conservative so
that the backup application can always restore a bit-for-bit identical
copy of the disk image.  It would be weird if writes can sneak in
unnoticed.

Stefan



Re: [Qemu-devel] [PATCH v2 2/2] net: introduce command to query mac-table information

2013-05-21 Thread Stefan Hajnoczi
On Tue, May 21, 2013 at 12:46:09PM +0800, Amos Kong wrote:
 On Fri, May 17, 2013 at 09:39:31AM +0200, Stefan Hajnoczi wrote:
  On Thu, May 16, 2013 at 07:07:25PM +0800, Amos Kong wrote:
 
 Hi Stefan,
 
   @@ -961,6 +961,44 @@ void print_net_client(Monitor *mon, NetClientState *nc)
    nc->info_str);
}

    +MacTableInfoList *qmp_query_mac_table(bool has_name, const char *name,
    +  Error **errp)
    +{
    +NetClientState *nc;
    +MacTableInfoList *table_list = NULL, *last_entry = NULL;
    +
    +QTAILQ_FOREACH(nc, &net_clients, next) {
    +MacTableInfoList *entry;
    +MacTableInfo *info;
    +
    +if (nc->info->type != NET_CLIENT_OPTIONS_KIND_NIC) {
    +continue;
    +}
  
    +if (has_name && strcmp(nc->name, name) != 0) {
    +continue;
    +}
  
   if (has_name && strcmp(nc->name, name) != 0) {
   error_setg(errp, "invalid net client name: %s", name);
   break;
   }

We still need to search the other NICs.  Bailing early doesn't work:
imagine we have nic1 and nic2.  If the user invokes query-mac-table nic2
then this code would return an error!

Stefan



Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-21 Thread Stefan Hajnoczi
On Tue, May 21, 2013 at 11:25:01AM +0800, Wenchao Xia wrote:
 On 2013-5-17 17:14, Stefan Hajnoczi wrote:
 On Fri, May 17, 2013 at 02:58:57PM +0800, Wenchao Xia wrote:
 On 2013-5-16 15:47, Stefan Hajnoczi wrote:
 On Thu, May 16, 2013 at 02:16:20PM +0800, Wenchao Xia wrote:
After checking the code, I found it possible to add delta data backup
 support also, If an additional dirty bitmap was added.
 
 I've been thinking about this.  Incremental backups need to know which
 blocks have changed, but keeping a persistent dirty bitmap is expensive
 and unnecessary.
 
Yes, it would likely be another block layer, so I hope not to do that.
 
 Not at all, persistent dirty bitmaps need to be part of the block layer
 since they need to support any image type - qcow2, Gluster, raw LVM,
 etc.
 
 I don't consider block jobs to be qemu device layer.  It sounds like
 you think the code should be in bdrv_co_do_writev()?
 
I feel a trend toward fragility from the different solutions,
 and COW is a key feature that the block layer provides, so I wonder if it
 can be moved under the block layer later
 
 The generic block layer includes more than just block.c.  It also
 includes block jobs and the qcow2 metadata cache that Dong Xu has
 extracted recently, for example.  Therefore you need to be more specific
 about what and why.
 
 This copy-on-write backup approach is available as a block job which
 runs on top of any BlockDriverState.  What concrete change are you
 proposing?
 
   Since it is hard to hide it behind BlockDriverState now, I suggest adding
 some documentation in qemu about the three snapshot types: qcow2 internal,
 backing chain, and drive-backup, which are all qemu software-based snapshot
 implementations, so users can understand the differences more easily.
 
   In the long term, I hope to form a library exposing those in a unified
 format, perhaps calling qmp_transaction internally, and to make it easier
 to offload if possible, so I hope for an abstract-driver structure.

Okay, just keep in mind they have different behavior.  That means these
snapshot types solve different problems and may be inappropriate for
some use cases.

Stefan



Re: [Qemu-devel] [PATCH] hw/9pfs: use O_NOFOLLOW for mapped readlink operation

2013-05-21 Thread Stefan Hajnoczi
On Mon, May 20, 2013 at 11:04:29PM +0530, Aneesh Kumar K.V wrote:
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 With mapped security models like mapped-xattr and mapped-file, we save the
 symlink target as file contents. Now if we ever expose a normal directory
 with mapped security model and find real symlinks in export path, never
 follow them and return proper error.
 
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 ---
  hw/9pfs/virtio-9p-local.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com



Re: [Qemu-devel] [PATCH 0/2] coroutine: dataplane support

2013-05-21 Thread Stefan Hajnoczi
On Fri, May 17, 2013 at 04:08:36PM +0100, Peter Maydell wrote:
 On 17 May 2013 14:51, Stefan Hajnoczi stefa...@redhat.com wrote:
  There is ongoing work to enable multiple event loop threads.  This will 
  allow
  QEMU itself to take advantage of SMP and reduce Big QEMU Lock (BQL) 
  contention.
  This series is one step in that effort.
 
  These patches make coroutines safe in a multi-event loop/multi-threaded 
  world.
  I have successfully tested them running qcow2 in a dataplane thread (further
  patches are required which I'll be sending soon).
 
 Did you test the sigaltstack backend as well as ucontext? I know
 in theory they should be the same but still :-)

Yes, I have tested sigaltstack successfully with make check and
qemu-iotests.

Stefan



Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-21 Thread Stefan Hajnoczi
On Tue, May 21, 2013 at 10:30:22AM +0200, Paolo Bonzini wrote:
 On 21/05/2013 09:31, Stefan Hajnoczi wrote:
  On Mon, May 20, 2013 at 09:23:43AM +0200, Paolo Bonzini wrote:
  On 20/05/2013 08:24, Stefan Hajnoczi wrote:
  You only need to fdatasync() before every guest flush, no?
  No, you need to set the dirty bit before issuing the write on the
  host.  Otherwise the image data may be modified without setting the
  appropriate dirty bit.  That would allow data modifications to go
  undetected!
 
  But data modifications can go undetected until the guest flush returns,
  can't they?
  
  You are thinking about it from the guest perspective - if a flush has
  not completed yet then there is no guarantee that the write has reached
  disk.
  
  But from a host perspective the dirty bitmap should be conservative so
  that the backup application can always restore a bit-for-bit identical
  copy of the disk image.  It would be weird if writes can sneak in
  unnoticed.
 
 True, but that would happen only in case the host crashes.  Even for a
 QEMU crash the changes would be safe, I think.  They would be written
 back when the persistent dirty bitmap's mmap() area is unmapped, during
 process exit.

I'd err on the side of caution, mark the persistent dirty bitmap while
QEMU is running.  Discard the file if there was a power failure.

It really depends what the dirty bitmap users are doing.  It could be
okay to have a tiny chance of missing a modification but it might not.

Stefan



[Qemu-devel] [PATCH v2] tests: set MALLOC_PERTURB_ to expose memory bugs

2013-05-21 Thread Stefan Hajnoczi
glibc wipes malloc(3) memory when the MALLOC_PERTURB_ environment
variable is set.  The value of the environment variable determines the
bit pattern used to wipe memory.  For more information, see
http://udrepper.livejournal.com/11429.html.

Set MALLOC_PERTURB_ for gtester and qemu-iotests.  Note we pick a random
value from 1 to 255 to expose more bugs.  If you need to reproduce a
crash use 'show environment' in gdb to extract the MALLOC_PERTURB_
value from a core dump.

Both make check and qemu-iotests pass with MALLOC_PERTURB_ enabled.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 tests/Makefile   | 5 -
 tests/qemu-iotests/check | 3 ++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/tests/Makefile b/tests/Makefile
index a307d5a..24880c6 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -171,6 +171,7 @@ GCOV_OPTIONS = -n $(if $(V),-f,)
 $(patsubst %, check-qtest-%, $(QTEST_TARGETS)): check-qtest-%: $(check-qtest-y)
$(if $(CONFIG_GCOV),@rm -f *.gcda */*.gcda */*/*.gcda */*/*/*.gcda,)
$(call quiet-command,QTEST_QEMU_BINARY=$*-softmmu/qemu-system-$* \
+   MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(($RANDOM % 255 + 1))} \
gtester $(GTESTER_OPTIONS) -m=$(SPEED) 
$(check-qtest-$*-y),GTESTER $@)
$(if $(CONFIG_GCOV),@for f in $(gcov-files-$*-y); do \
  echo Gcov report for $$f:;\
@@ -180,7 +181,9 @@ $(patsubst %, check-qtest-%, $(QTEST_TARGETS)): 
check-qtest-%: $(check-qtest-y)
 .PHONY: $(patsubst %, check-%, $(check-unit-y))
 $(patsubst %, check-%, $(check-unit-y)): check-%: %
$(if $(CONFIG_GCOV),@rm -f *.gcda */*.gcda */*/*.gcda */*/*/*.gcda,)
-   $(call quiet-command,gtester $(GTESTER_OPTIONS) -m=$(SPEED) $*,GTESTER 
$*)
+   $(call quiet-command, \
+   MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(($RANDOM % 255 + 1))} \
+   gtester $(GTESTER_OPTIONS) -m=$(SPEED) $*,GTESTER $*)
$(if $(CONFIG_GCOV),@for f in $(gcov-files-$(subst tests/,,$*)-y); do \
  echo Gcov report for $$f:;\
  $(GCOV) $(GCOV_OPTIONS) $$f -o `dirname $$f`; \
diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 432732c..74628ae 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -214,7 +214,8 @@ do
	start=`_wallclock`
	$timestamp && echo -n " [`date +%T`]"
	[ ! -x $seq ] && chmod u+x $seq # ensure we can run it
-	./$seq >$tmp.out 2>&1
+	MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(($RANDOM % 255 + 1))} \
+	    ./$seq >$tmp.out 2>&1
	sts=$?
	$timestamp && _timestamp
	stop=`_wallclock`
-- 
1.8.1.4




Re: [Qemu-devel] [PATCH v3 2/8] block: add basic backup support to block driver

2013-05-21 Thread Stefan Hajnoczi
On Mon, May 20, 2013 at 01:30:37PM +0200, Paolo Bonzini wrote:
 On 15/05/2013 16:34, Stefan Hajnoczi wrote:
   +wait_for_overlapping_requests(job, start, end);
   +cow_request_begin(&cow_request, job, start, end);
   +
   +for (; start < end; start++) {
   +if (hbitmap_get(job->bitmap, start)) {
   +DPRINTF("brdv_co_backup_cow skip C%" PRId64 "\n", start);
   +continue; /* already copied */
   +}
   +
   +/* immediately set bitmap (avoid coroutine race) */
   +hbitmap_set(job->bitmap, start, 1);
  +
 
 HBitmap already has code for finding the next set bit, but you're not
 using it.
 
 If you reverse the direction of the bitmap, you can use an HBitmapIter here:
 
  
   +start = 0;
   +end = DIV_ROUND_UP(bdrv_getlength(bs) / BDRV_SECTOR_SIZE,
   +   BACKUP_BLOCKS_PER_CLUSTER);
   +
   +job->bitmap = hbitmap_alloc(end, 0);
   +
   +before_write = bdrv_add_before_write_cb(bs, backup_before_write);
   +
   +DPRINTF("backup_run start %s %" PRId64 " %" PRId64 "\n",
   +bdrv_get_device_name(bs), start, end);
   +
   +for (; start < end; start++) {
 
 ... instead of iterating through each item manually.

/**
 * hbitmap_iter_init:
[...]
 * position of the iterator is also okay.  However, concurrent resetting of
 * bits can lead to unexpected behavior if the iterator has not yet reached
 * those bits. 
 */ 
void hbitmap_iter_init(HBitmapIter *hbi, const HBitmap *hb, uint64_t first);

This worries me.  We would initialize the bitmap to all 1s.  Backing up
a cluster resets the bit.  But the documentation says it is not safe to
reset bits while iterating?

Stefan



Re: [Qemu-devel] [PATCH v1 07/14] net: hub use lock to protect ports list

2013-05-21 Thread Stefan Hajnoczi
On Tue, May 07, 2013 at 01:46:55PM +0800, Liu Ping Fan wrote:
 From: Liu Ping Fan pingf...@linux.vnet.ibm.com
 
 Hub ports will run on multi-threads, so use lock to protect them.
 
 Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
 ---
  net/hub.c |   25 -
  1 files changed, 24 insertions(+), 1 deletions(-)

Does this rely on net queue re-entrancy detection?  Otherwise we'll
deadlock when slirp re-enters the net layer, for example for ARP
requests.

I suggest moving this later in the patch series when the net queue is
protected already.

Stefan



Re: [Qemu-devel] [PATCH v1 08/14] net: introduce lock to protect NetQueue

2013-05-21 Thread Stefan Hajnoczi
On Tue, May 07, 2013 at 01:46:56PM +0800, Liu Ping Fan wrote:
 From: Liu Ping Fan pingf...@linux.vnet.ibm.com
 
 NetQueue will be accessed by nc and its peers at the same time,
 need lock to protect it.
 
 Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
 ---
  net/queue.c |   11 +++
  1 files changed, 11 insertions(+), 0 deletions(-)
 
 diff --git a/net/queue.c b/net/queue.c
 index 859d02a..2856c1d 100644
 --- a/net/queue.c
 +++ b/net/queue.c
 @@ -53,6 +53,7 @@ struct NetQueue {
  uint32_t nq_maxlen;
  uint32_t nq_count;
  
 +QemuMutex lock;
  QTAILQ_HEAD(packets, NetPacket) packets;
  
  unsigned delivering : 1;
 @@ -68,6 +69,7 @@ NetQueue *qemu_new_net_queue(void *opaque)
  queue->nq_maxlen = 1;
  queue->nq_count = 0;
  
 +qemu_mutex_init(&queue->lock);
  QTAILQ_INIT(&queue->packets);
  
  queue->delivering = 0;
 @@ -107,7 +109,9 @@ static void qemu_net_queue_append(NetQueue *queue,
  memcpy(packet->data, buf, size);
  
  queue->nq_count++;
 +qemu_mutex_lock(&queue->lock);
  QTAILQ_INSERT_TAIL(&queue->packets, packet, entry);
 +qemu_mutex_unlock(&queue->lock);

nq_count must be protected too.



Re: [Qemu-devel] [PATCH v2] tests: set MALLOC_PERTURB_ to expose memory bugs

2013-05-21 Thread Stefan Hajnoczi
On Tue, May 21, 2013 at 06:56:07AM -0600, Eric Blake wrote:
 On 05/21/2013 06:43 AM, Stefan Hajnoczi wrote:
  glibc wipes malloc(3) memory when the MALLOC_PERTURB_ environment
  variable is set.  The value of the environment variable determines the
  bit pattern used to wipe memory.  For more information, see
  http://udrepper.livejournal.com/11429.html.
  
  Set MALLOC_PERTURB_ for gtester and qemu-iotests.  Note we pick a random
  value from 1 to 255 to expose more bugs.  If you need to reproduce a
  crash use 'show environment' in gdb to extract the MALLOC_PERTURB_
  value from a core dump.
  
  Both make check and qemu-iotests pass with MALLOC_PERTURB_ enabled.
  
  Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
  ---
   tests/Makefile   | 5 -
   tests/qemu-iotests/check | 3 ++-
   2 files changed, 6 insertions(+), 2 deletions(-)
  
  diff --git a/tests/Makefile b/tests/Makefile
  index a307d5a..24880c6 100644
  --- a/tests/Makefile
  +++ b/tests/Makefile
  @@ -171,6 +171,7 @@ GCOV_OPTIONS = -n $(if $(V),-f,)
   $(patsubst %, check-qtest-%, $(QTEST_TARGETS)): check-qtest-%: 
  $(check-qtest-y)
  $(if $(CONFIG_GCOV),@rm -f *.gcda */*.gcda */*/*.gcda */*/*/*.gcda,)
  $(call quiet-command,QTEST_QEMU_BINARY=$*-softmmu/qemu-system-$* \
  +   MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(($RANDOM % 255 + 1))} \
 
 This is a Makefile; don't you need to use $$ instead of $ (three instances)?
 
 $RANDOM is a bash-ism.  If make is run with SHELL as /bin/sh on a
 platform where dash is the primary shell, it will fail:
 
 $ dash -c 'echo $(($RANDOM % 255))'
 dash: 1: arithmetic expression: expecting primary:  % 255
 
 HOWEVER: you can exploit the fact that inside $(()), you don't need $ to
 use the value of a defined variable, and also the fact that unless set
 -u is in effect, an undefined variable name silently evaluates as 0:
 
 $ dash -c 'echo $((RANDOM % 255))'
 0
 
 then you could write the shell code:
 
   MALLOC_PERTURB_=${MALLOC_PERTURB_:-$((RANDOM % 255 + 1))}
 
 or the Makefile code:
 
   MALLOC_PERTURB_=$${MALLOC_PERTURB_:-$$((RANDOM % 255 + 1))}
 
 and things will at least work on /bin/sh as dash (even though there will
 be no randomness and you are always testing with 1 in that case).

Silly me.  I did test it but it silently worked.

Will resend.

Stefan



[Qemu-devel] [PATCH v3] tests: set MALLOC_PERTURB_ to expose memory bugs

2013-05-22 Thread Stefan Hajnoczi
glibc wipes malloc(3) memory when the MALLOC_PERTURB_ environment
variable is set.  The value of the environment variable determines the
bit pattern used to wipe memory.  For more information, see
http://udrepper.livejournal.com/11429.html.

Set MALLOC_PERTURB_ for gtester and qemu-iotests.  Note we pick a random
value from 1 to 255 to expose more bugs.  If you need to reproduce a
crash use 'show environment' in gdb to extract the MALLOC_PERTURB_
value from a core dump.

Both make check and qemu-iotests pass with MALLOC_PERTURB_ enabled.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
v3:
 * Use $ escaping in tests/Makefile [eblake]

v2:
 * Randomize MALLOC_PERTURB_ value [armbru]
 * Preserve existing MALLOC_PERTURB_ variable, if set [danpb]

 tests/Makefile   | 5 -
 tests/qemu-iotests/check | 3 ++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/tests/Makefile b/tests/Makefile
index a307d5a..7844cd7 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -171,6 +171,7 @@ GCOV_OPTIONS = -n $(if $(V),-f,)
 $(patsubst %, check-qtest-%, $(QTEST_TARGETS)): check-qtest-%: $(check-qtest-y)
$(if $(CONFIG_GCOV),@rm -f *.gcda */*.gcda */*/*.gcda */*/*/*.gcda,)
$(call quiet-command,QTEST_QEMU_BINARY=$*-softmmu/qemu-system-$* \
+   MALLOC_PERTURB_=$${MALLOC_PERTURB_:-$$(($$RANDOM % 255 + 1))} \
gtester $(GTESTER_OPTIONS) -m=$(SPEED) 
$(check-qtest-$*-y),GTESTER $@)
$(if $(CONFIG_GCOV),@for f in $(gcov-files-$*-y); do \
  echo Gcov report for $$f:;\
@@ -180,7 +181,9 @@ $(patsubst %, check-qtest-%, $(QTEST_TARGETS)): 
check-qtest-%: $(check-qtest-y)
 .PHONY: $(patsubst %, check-%, $(check-unit-y))
 $(patsubst %, check-%, $(check-unit-y)): check-%: %
$(if $(CONFIG_GCOV),@rm -f *.gcda */*.gcda */*/*.gcda */*/*/*.gcda,)
-   $(call quiet-command,gtester $(GTESTER_OPTIONS) -m=$(SPEED) $*,GTESTER 
$*)
+   $(call quiet-command, \
+   MALLOC_PERTURB_=$${MALLOC_PERTURB_:-$$(($$RANDOM % 255 + 1))} \
+   gtester $(GTESTER_OPTIONS) -m=$(SPEED) $*,GTESTER $*)
$(if $(CONFIG_GCOV),@for f in $(gcov-files-$(subst tests/,,$*)-y); do \
  echo Gcov report for $$f:;\
  $(GCOV) $(GCOV_OPTIONS) $$f -o `dirname $$f`; \
diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 432732c..74628ae 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -214,7 +214,8 @@ do
	start=`_wallclock`
	$timestamp && echo -n " [`date +%T`]"
	[ ! -x $seq ] && chmod u+x $seq # ensure we can run it
-	./$seq >$tmp.out 2>&1
+	MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(($RANDOM % 255 + 1))} \
+	    ./$seq >$tmp.out 2>&1
	sts=$?
	$timestamp && _timestamp
	stop=`_wallclock`
-- 
1.8.1.4




Re: [Qemu-devel] [PATCH] kvm: add detail error message when fail to add ioeventfd

2013-05-22 Thread Stefan Hajnoczi
On Wed, May 22, 2013 at 12:57:35PM +0800, Amos Kong wrote:
 I try to hotplug 28 * 8 multiple-function devices to guest with
 old host kernel, ioeventfds in host kernel will be exhausted, then
 qemu fails to allocate ioeventfds for blk/nic devices.
 
 It's better to add detail error here.
 
 Signed-off-by: Amos Kong ak...@redhat.com
 ---
  kvm-all.c |4 
  1 files changed, 4 insertions(+), 0 deletions(-)

It would be nice to make kvm bus scalable so that the hardcoded
in-kernel I/O device limit can be lifted.

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com



Re: [Qemu-devel] 'qemu-nbd' explicit flush

2013-05-22 Thread Stefan Hajnoczi
On Tue, May 21, 2013 at 08:01:10PM +, Mark Trumpold wrote:
 Linux kernel 3.3.1 with Qemu patch to enable kernel flushing:
 http://thread.gmane.org/gmane.linux.drivers.nbd.general/1108

Did you check that the kernel is sending NBD_FLUSH commands?  You can
use tcpdump and then check the captured network traffic.

 Usage example:
 'qemu-nbd --cache=writeback -c /dev/nbd0 /images/my-qcow.img'
 'mount /dev/nbd0 /my-mount-point'
 
 Everything does flush correctly when I first unmount and then disconnect the 
 device; however, in my case I am not able to unmount things before 
 snapshotting.
 
 I tried several approaches externally to flush the device.  For example:
 'mount -o remount,ro /dev/nbd0'
 'blockdev --flushbufs /dev/nbd0'

Did you try plain old sync(1)?

 I have been looking at the Qemu source code and in user space 'nbd.c' in 
 routine 'nbd_trip' I see the case 'NBD_CMD_FLUSH' which looks to be called 
from the NBD socket interface.  Here I see 'bdrv_co_flush(exp->bs)' which 
 looks promising; however, I don't know how to setup the 'bs' pointer for the 
 call.

bs is the block device which was exported using:

exp = nbd_export_new(bs, dev_offset, fd_size, nbdflags, nbd_export_closed);

in qemu-nbd.c:main().

 Ideally, I would like to add a command line parm to 'qemu-nbd.c' to 
 explicitely do the flush, but so far no luck.

Doing that is a little tricky, I think there are two options:

1. Add a signal handler (like SIGHUP or SIGUSR1) to qemu-nbd which
   flushes all exports.

2. Instantiate a block/nbd.c client that connects to the running
   qemu-nbd server (make sure format=raw).  Then call bdrv_flush() on
   the NBD client.  You must use the qemu-nbd --shared=2 option.

Stefan



Re: [Qemu-devel] [PATCH] makefile: detect corrupted elf files

2013-05-22 Thread Stefan Hajnoczi
On Wed, May 22, 2013 at 12:46:45AM +0300, Michael S. Tsirkin wrote:
 Once in a while make gets killed and doesn't
 clean up partial object files after it.
 Result is nasty errors from link.
 This hack checks object is well formed before linking,
 and rebuilds it if not.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
 
 Is below useful for others?
 
  Makefile.target | 7 +++
  1 file changed, 7 insertions(+)
 
 diff --git a/Makefile.target b/Makefile.target
 index ce4391f..4dddee5 100644
 --- a/Makefile.target
 +++ b/Makefile.target
 @@ -191,3 +191,10 @@ endif
  
  GENERATED_HEADERS += config-target.h
  Makefile: $(GENERATED_HEADERS)
 +
 +.SECONDEXPANSION:
 +
 +.PHONY: CORRUPTBINARY
 +
 +$(all-obj-y): % : $$(if $$(shell size %), , CORRUPTBINARY)

How does size(1) establish the validity of the ELF file?  Is it possible
to sneak past a truncated file (which I think is the only type of
corruption you're trying to protect against)?

Stefan



[Qemu-devel] [PATCH] rtl8139: flush queued packets when RxBufPtr is written

2013-05-22 Thread Stefan Hajnoczi
Net queues support efficient receive disable.  For example, tap's file
descriptor will not be polled while its peer has receive disabled.  This
saves CPU cycles for needlessly copying and then dropping packets which
the peer cannot receive.

rtl8139 is missing the qemu_flush_queued_packets() call that wakes the
queue up when receive becomes possible again.

As a result, the Windows 7 guest driver reaches a state where the
rtl8139 cannot receive packets.  The driver has actually refilled the
receive buffer but we never resume reception.

The bug can be reproduced by running a large FTP 'get' inside a Windows
7 guest:

  $ qemu -netdev tap,id=tap0,...
 -device rtl8139,netdev=tap0

The Linux guest driver does not trigger the bug, probably due to a
different buffer management strategy.

Reported-by: Oliver Francke oliver.fran...@filoo.de
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/net/rtl8139.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
index 9369507..7993f9f 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -2575,6 +2575,9 @@ static void rtl8139_RxBufPtr_write(RTL8139State *s, uint32_t val)
 /* this value is off by 16 */
 s->RxBufPtr = MOD2(val + 0x10, s->RxBufferSize);
 
+/* more buffer space may be available so try to receive */
+qemu_flush_queued_packets(qemu_get_queue(s->nic));
+
 DPRINTF(" CAPR write: rx buffer length %d head 0x%04x read 0x%04x\n",
 s->RxBufferSize, s->RxBufAddr, s->RxBufPtr);
 }
-- 
1.8.1.4




Re: [Qemu-devel] Fwd: [Qemu-stable] connectivity problem with Windows 7 + heavy network-traffic

2013-05-22 Thread Stefan Hajnoczi
On Wed, May 22, 2013 at 11:27 AM, Oliver Francke oliver.fran...@filoo.de wrote:

  Hi Stefan,

 thanks for your attention. See all infos below including complete
 command-line.


Hi Oliver,
I just sent a fix for qemu.git/master.  If you need to backport it, replace
qemu_get_queue(s->nic) with s->nic->nc.

Please let me know if the fix solves the issue for you.

Thanks,
Stefan


Re: [Qemu-devel] [PATCH] rtl8139: flush queued packets when RxBufPtr is written

2013-05-22 Thread Stefan Hajnoczi
On Wed, May 22, 2013 at 2:53 PM, Andreas Färber afaer...@suse.de wrote:
 On 22.05.2013 14:50, Stefan Hajnoczi wrote:
 Net queues support efficient receive disable.  For example, tap's file
 descriptor will not be polled while its peer has receive disabled.  This
 saves CPU cycles for needlessly copying and then dropping packets which
 the peer cannot receive.

 rtl8139 is missing the qemu_flush_queued_packets() call that wakes the
 queue up when receive becomes possible again.

 As a result, the Windows 7 guest driver reaches a state where the
 rtl8139 cannot receive packets.  The driver has actually refilled the
 receive buffer but we never resume reception.

 The bug can be reproduced by running a large FTP 'get' inside a Windows
 7 guest:

   $ qemu -netdev tap,id=tap0,...
  -device rtl8139,netdev=tap0

 The Linux guest driver does not trigger the bug, probably due to a
 different buffer management strategy.

 Reported-by: Oliver Francke oliver.fran...@filoo.de
 Signed-off-by: Stefan Hajnoczi stefa...@redhat.com

 Sounds as if we should

 Cc: qemu-sta...@nongnu.org

Yes, please.  Oliver just confirmed that it fixes the issue for him on
IRC so this is good for QEMU 1.5.1.

Stefan



Re: [Qemu-devel] [PATCH v3 2/8] block: add basic backup support to block driver

2013-05-22 Thread Stefan Hajnoczi
On Tue, May 21, 2013 at 06:46:39PM +0200, Paolo Bonzini wrote:
 On 21/05/2013 18:26, Dietmar Maurer wrote:
  Hmm, right.  But do we need the bitmap at all?  We can just use
   bdrv_is_allocated like bdrv_co_do_readv does.
  Does that works with a nbd driver?
 
 Ah, right.  That's the answer.

Yes, the target may not support allocation info.

  Or does that add another RPC call (slow down)?
 
 It doesn't work at all.  Thus Stefan's patch is fine (except if you
 don't use HBitmapIter, there's no advantage in using HBitmap; you can
 use qemu/bitmap.h instead).

I chose HBitmap because bitmap.h uses the int type instead of uint64_t,
which makes it risky to use in the block layer.

Stefan



Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-22 Thread Stefan Hajnoczi
On Tue, May 21, 2013 at 10:58:47AM +, Dietmar Maurer wrote:
   True, but that would happen only in case the host crashes.  Even for
   a QEMU crash the changes would be safe, I think.  They would be
   written back when the persistent dirty bitmap's mmap() area is
   unmapped, during process exit.
  
   I'd err on the side of caution, mark the persistent dirty bitmap while
   QEMU is running.  Discard the file if there was a power failure.
  
  Agreed.  Though this is something that management must do manually, isn't 
  it?
  QEMU cannot distinguish a SIGKILL from a power failure, while management
  can afford treating SIGKILL as a power failure.
  
   It really depends what the dirty bitmap users are doing.  It could be
   okay to have a tiny chance of missing a modification but it might not.
 
 I just want to mention that there is another way to do incremental backups. 
 Instead
 of using a dirty bitmap, you can compare the content, usually using a digest 
 (SHA1) on clusters.

Reading gigabytes of data from disk is expensive though.  I guess they
keep a Merkle tree so it's easy to find out which parts of the image
must be transferred without re-reading the entire image.

That sounds like more work than a persistent dirty bitmap.  The
advantage is that while dirty bitmaps are consumed by a single user, the
Merkle tree can be used to sync up any number of replicas.

 That way you can also implement async replication to a remote site (like MS 
 do).

Sounds like rsync.

Stefan



Re: [Qemu-devel] [PATCH v4 2/8] block: add basic backup support to block driver

2013-05-22 Thread Stefan Hajnoczi
On Wed, May 22, 2013 at 11:56:45AM +0200, Kevin Wolf wrote:
 On 22.05.2013 at 11:54, Paolo Bonzini wrote:
  On 22/05/2013 11:38, Kevin Wolf wrote:
   +
   +DPRINTF("brdv_co_backup_cow done C%" PRId64 "\n", start);
   +}
   +
   +out:
   +if (bounce_buffer) {
   +qemu_vfree(bounce_buffer);
   +}
   +
   +cow_request_end(&cow_request);
   +
   +qemu_co_rwlock_unlock(&job->flush_rwlock);
   +
   +return ret;
   +}
   +
   +static void coroutine_fn backup_before_write_notify(Notifier *notifier,
   +void *opaque)
   +{
   +BdrvTrackedRequest *req = opaque;
   +backup_do_cow(req->bs, req->sector_num, req->nb_sectors);
   +}
   
   I don't think you can ignore errors here. Not sure if we can stop the VM
   and resume later or something like that, but if we can't, the backup
   will be invalid and we must fail the job.
  
  Yes, there is rerror/werror machinery for jobs that this patch is not using.
 
 This is not enough here. The guest write can't continue before the old
 content is saved to the backup image.

Are you saying just the vm_stop(RUN_STATE_IO_ERROR) call is missing?

I think the reason it's not there is because there's an assumption that
block jobs are in the background and do not affect the guest.  The guest
continues running while the block job is in an error state.

But perhaps we can make the vm_stop() call optional so that drive-backup
can use it.

Stefan



Re: [Qemu-devel] [PATCH v4 4/8] qemu-iotests: add 055 drive-backup test case

2013-05-22 Thread Stefan Hajnoczi
On Wed, May 22, 2013 at 01:19:58PM +0200, Kevin Wolf wrote:
 On 16.05.2013 at 10:36, Stefan Hajnoczi wrote:
  Testing drive-backup is similar to image streaming and drive mirroring.
  This test case is based on 041.
  
  Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
  ---
   tests/qemu-iotests/055 | 230 
  +
   tests/qemu-iotests/055.out |   5 +
   tests/qemu-iotests/group   |   1 +
   3 files changed, 236 insertions(+)
   create mode 100755 tests/qemu-iotests/055
   create mode 100644 tests/qemu-iotests/055.out
  
  diff --git a/tests/qemu-iotests/055 b/tests/qemu-iotests/055
  new file mode 100755
  index 000..bc2eebf
  --- /dev/null
  +++ b/tests/qemu-iotests/055
  @@ -0,0 +1,230 @@
  +#!/usr/bin/env python
  +#
  +# Tests for drive-backup
  +#
  +# Copyright (C) 2013 Red Hat, Inc.
  +#
  +# Based on 041.
  +#
  +# This program is free software; you can redistribute it and/or modify
  +# it under the terms of the GNU General Public License as published by
  +# the Free Software Foundation; either version 2 of the License, or
  +# (at your option) any later version.
  +#
  +# This program is distributed in the hope that it will be useful,
  +# but WITHOUT ANY WARRANTY; without even the implied warranty of
  +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  +# GNU General Public License for more details.
  +#
  +# You should have received a copy of the GNU General Public License
  +# along with this program.  If not, see http://www.gnu.org/licenses/.
  +#
  +
  +import time
  +import os
  +import iotests
  +from iotests import qemu_img, qemu_io
  +
  +test_img = os.path.join(iotests.test_dir, 'test.img')
  +target_img = os.path.join(iotests.test_dir, 'target.img')
  +
  +class DriveBackupTestCase(iotests.QMPTestCase):
  +'''Abstract base class for drive-backup test cases'''
  +
  +def assert_no_active_backups(self):
  +result = self.vm.qmp('query-block-jobs')
  +self.assert_qmp(result, 'return', [])
  +
  +def cancel_and_wait(self, drive='drive0'):
  +'''Cancel a block job and wait for it to finish'''
  +result = self.vm.qmp('block-job-cancel', device=drive)
  +self.assert_qmp(result, 'return', {})
  +
  +cancelled = False
  +while not cancelled:
  +for event in self.vm.get_qmp_events(wait=True):
  +if event['event'] == 'BLOCK_JOB_COMPLETED' or \
  +   event['event'] == 'BLOCK_JOB_CANCELLED':
  +self.assert_qmp(event, 'data/type', 'backup')
  +self.assert_qmp(event, 'data/device', drive)
  +cancelled = True
  +
  +self.assert_no_active_backups()
  +
  +def complete_and_wait(self):
  +completed = False
  +while not completed:
  +for event in self.vm.get_qmp_events(wait=True):
  +if event['event'] == 'BLOCK_JOB_COMPLETED':
  +self.assert_qmp(event, 'data/type', 'backup')
  +self.assert_qmp(event, 'data/device', 'drive0')
  +self.assert_qmp(event, 'data/offset', self.image_len)
  +self.assert_qmp(event, 'data/len', self.image_len)
  +completed = True
  +self.assert_no_active_backups()
  +
  +def compare_images(self, img1, img2):
  +try:
   +qemu_img('convert', '-f', iotests.imgfmt, '-O', 'raw', img1, img1 + '.raw')
   +qemu_img('convert', '-f', iotests.imgfmt, '-O', 'raw', img2, img2 + '.raw')
  +file1 = open(img1 + '.raw', 'r')
  +file2 = open(img2 + '.raw', 'r')
  +return file1.read() == file2.read()
  +finally:
  +if file1 is not None:
  +file1.close()
  +if file2 is not None:
  +file2.close()
  +try:
  +os.remove(img1 + '.raw')
  +except OSError:
  +pass
  +try:
  +os.remove(img2 + '.raw')
  +except OSError:
  +pass
 
 compare_images() is an exact copy of its 041 counterparts,
 complete_and_wait() and cancel_and_wait() are the same as in 041 in the
 wait_ready = False case. Sounds like candidates for factoring out.

Will fix.
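For what it's worth, factoring the helper out would also be a chance to fix a latent bug in the version above: if the first qemu_img() call throws, file1 is never bound and the finally block itself raises NameError. A rough sketch of a shared helper using context managers (illustrative only, not the actual iotests code; it compares the already-converted raw files):

```python
import os

def compare_raw_files(path1, path2, chunk_size=1 << 20):
    """Return True if two files have identical content.

    Reads in chunks so large images need not fit in memory; 'with'
    guarantees the files are closed even on error, unlike the
    try/finally in the patch above, which can hit NameError if the
    first open never ran.
    """
    if os.path.getsize(path1) != os.path.getsize(path2):
        return False
    with open(path1, 'rb') as f1, open(path2, 'rb') as f2:
        while True:
            b1 = f1.read(chunk_size)
            b2 = f2.read(chunk_size)
            if b1 != b2:
                return False
            if not b1:          # both at EOF with no mismatch
                return True
```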



Re: [Qemu-devel] [PATCH v4 3/8] block: add drive-backup QMP command

2013-05-22 Thread Stefan Hajnoczi
On Wed, May 22, 2013 at 11:53:44AM +0200, Kevin Wolf wrote:
 On 16.05.2013 at 10:36, Stefan Hajnoczi wrote:
  +proto_drv = bdrv_find_protocol(target);
  +if (!proto_drv) {
  +error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
  +return;
  +}
 
 I see that other block job commands are doing the same, but there are
 two things unclear for me about this:
 
 1. Shouldn't bdrv_img_create() and/or bdrv_open() already return an error
if the protocol can't be found? The error check is the only thing
that proto_drv is used for.
 
 2. Why do we complain about a wrong _format_ here? If I ask for a qcow2
image called foo:bar.qcow2, qemu won't find the protocol foo and
complain that qcow2 is an invalid format.

You are right.  It doesn't seem necessary, we'll get an error message
later from bdrv_open() or bdrv_img_create().

Will drop in the next revision.

  +
  +bdrv_get_geometry(bs, &size);
  +size *= 512;
 
 I was going to suggest BDRV_SECTOR_SIZE at first, but...
 
 Why not use bdrv_getlength() in the first place if you need it in bytes
 anyway? As a nice bonus you would get -errno instead of size 0 on
 failure.

Yes.  Will fix and update qmp_drive_mirror() too.

  +if (mode != NEW_IMAGE_MODE_EXISTING) {
  +assert(format && drv);
  +bdrv_img_create(target, format,
  +NULL, NULL, NULL, size, flags, local_err, false);
  +}
  +
  +if (error_is_set(&local_err)) {
  +error_propagate(errp, local_err);
  +return;
  +}
  +
  +target_bs = bdrv_new();
  +ret = bdrv_open(target_bs, target, NULL, flags, drv);
  +
  +if (ret < 0) {
 
 Why the empty line between bdrv_open and checking its return value?

A dramatic pause, to build up suspense :).

Will drop in the next revision and update qmp_drive_mirror() too.



Re: [Qemu-devel] [PATCH v4 00/10] curl: fix curl read

2013-05-22 Thread Stefan Hajnoczi
On Wed, May 22, 2013 at 12:12:24PM +0100, Richard W.M. Jones wrote:
 On Wed, May 22, 2013 at 01:04:51PM +0200, Paolo Bonzini wrote:
  Something is trying to write, but there's no write operation defined for
  CURL.
  
  I guess curl (and other backends too) should reject being opened for
  write.  Alternatively, block.c could do that for them.
 
 Yes, I'd just got to that conclusion as well :-)
 
 The attached patch fixes the crash for me.

Please post a top-level thread so that your patch gets picked up by
scripts and noticed by humans too :).

Alternatively Fam can include it in the next revision of this series.

Stefan



Re: [Qemu-devel] [PATCH] hw/9pfs: Use O_NOFOLLOW when opening files on server

2013-05-22 Thread Stefan Hajnoczi
On Wed, May 22, 2013 at 04:52:54PM +0530, Aneesh Kumar K.V wrote:
 diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c
 index fe8e0ed..e2a89e3 100644
 --- a/hw/9pfs/virtio-9p-handle.c
 +++ b/hw/9pfs/virtio-9p-handle.c
 @@ -608,7 +608,7 @@ static int handle_init(FsContext *ctx)
  struct file_handle fh;
  struct handle_data *data = g_malloc(sizeof(struct handle_data));
  
  -data->mountfd = open(ctx->fs_root, O_DIRECTORY);
  +data->mountfd = open(ctx->fs_root, O_DIRECTORY | O_NOFOLLOW);

Why is the root path not allowed to be a symlink?

If it shouldn't be, it would be more user-friendly to resolve the path before
opening it.  That way we don't need to bother the user with an error here.
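The resolve-then-open idea, sketched in Python for brevity (the real code is C in hw/9pfs, and `open_mount_dir` is a made-up name):

```python
import os

def open_mount_dir(path):
    """Resolve symlinks up front, then open the real directory.

    Canonicalizing first keeps O_NOFOLLOW from rejecting a fs_root
    that merely *is* a symlink, while still refusing to follow one
    at the final open.
    """
    real = os.path.realpath(path)  # follow symlinks once, explicitly
    flags = os.O_DIRECTORY | getattr(os, 'O_NOFOLLOW', 0)
    return os.open(real, flags), real
```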



Re: [Qemu-devel] [PATCH] kvm: add detail error message when fail to add ioeventfd

2013-05-23 Thread Stefan Hajnoczi
On Wed, May 22, 2013 at 09:48:21PM +0800, Amos Kong wrote:
 On Wed, May 22, 2013 at 11:32:27AM +0200, Stefan Hajnoczi wrote:
  On Wed, May 22, 2013 at 12:57:35PM +0800, Amos Kong wrote:
   I tried to hotplug 28 * 8 multi-function devices to a guest with an
   old host kernel; the ioeventfds in the host kernel were exhausted, and
   qemu failed to allocate ioeventfds for blk/nic devices.
   
   It's better to add a detailed error message here.
   
   Signed-off-by: Amos Kong ak...@redhat.com
   ---
kvm-all.c |4 
1 files changed, 4 insertions(+), 0 deletions(-)
  
  It would be nice to make kvm bus scalable so that the hardcoded
  in-kernel I/O device limit can be lifted.
 
 I had increased the kernel NR_IOBUS_DEVS to 1000 (a limit is needed for
 security) last March, and made resizing of the kvm_io_range array dynamic.

The maximum should not be hardcoded.  File descriptor, maximum memory,
etc are all controlled by rlimits.  And since ioeventfds are file
descriptors they are already limited by the maximum number of file
descriptors.

Why is there a need to impose a hardcoded limit?
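To illustrate the rlimit point: ioeventfds are ordinary file descriptors, so a process can already compute its remaining fd headroom without any extra hardcoded cap. A Python sketch (my own; the /proc scan is Linux-specific and an assumption here):

```python
import os
import resource

def remaining_fd_budget():
    """Roughly how many more file descriptors (ioeventfds included)
    this process may still create under RLIMIT_NOFILE."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return float('inf')
    fd_dir = '/proc/self/fd'  # Linux-only way to count open fds
    in_use = len(os.listdir(fd_dir)) if os.path.isdir(fd_dir) else 0
    return soft - in_use
```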

Stefan



Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection

2013-05-23 Thread Stefan Hajnoczi
On Wed, May 22, 2013 at 11:46:15PM +0200, Paolo Bonzini wrote:
 On 22/05/2013 22:47, Richard W.M. Jones wrote:
   
   I meant if there was interest in reading from a disk that isn't fully
   synchronized
   (yet) to the original disk (it might have old blocks).  Or would you only
   want to
   connect once a (complete) snapshot is available (synchronized completely 
   to
   some point-in.
  IIUC a disk which wasn't fully synchronized wouldn't necessarily be
  interpretable by libguestfs, so I guess we would need the complete
  snapshot.
 
 In the case of point-in-time backups (Stefan's block-backup) the plan is
 to have the snapshot complete from the beginning.

The way it will work is that the drive-backup target is a qcow2 image
with the guest's disk as its backing file.  When the guest writes to the
disk, drive-backup copies the original data to the qcow2 image.

The qcow2 image is exported over NBD so a client can connect to access
the read-only point-in-time snapshot.  It is not necessary to populate
the qcow2 file since it uses the guest disk as its backing file - all
reads to unpopulated clusters go to the backing file.
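To make the copy-before-write idea concrete, here is a toy Python model (my own sketch, not QEMU code) of a point-in-time backup target with backing-file read semantics:

```python
class PointInTimeBackup:
    """Toy model of drive-backup: before the guest overwrites a
    cluster, the old data is copied into a sparse 'target'; snapshot
    reads of untouched clusters fall through to the live disk, like
    qcow2 reads falling through to a backing file."""

    def __init__(self, disk):
        self.disk = disk     # live guest disk: list of cluster data
        self.target = {}     # only clusters copied on first write

    def guest_write(self, idx, data):
        if idx not in self.target:             # first write since snapshot
            self.target[idx] = self.disk[idx]  # preserve point-in-time data
        self.disk[idx] = data

    def snapshot_read(self, idx):
        # populated cluster -> target; unpopulated -> backing (live disk)
        return self.target.get(idx, self.disk[idx])
```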

Stefan



Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-23 Thread Stefan Hajnoczi
On Wed, May 22, 2013 at 03:34:18PM +, Dietmar Maurer wrote:
  That sounds like more work than a persistent dirty bitmap.  The advantage 
  is that
  while dirty bitmaps are consumed by a single user, the Merkle tree can be 
  used
  to sync up any number of replicas.
 
 I also consider it safer, because you make sure the data exists (using hash 
 keys like SHA1).
 
 I am unsure how you can check if a dirty bitmap contains errors, or is out of 
 date?
 
 Also, you can compare arbitrary Merkle trees, whereas a dirty bitmap is 
 always related to a single image.
 (consider the user remove the latest backup from the backup target). 

One disadvantage of Merkle trees is that the client becomes stateful -
the client needs to store its own Merkle tree and this requires fancier
client-side code.

It is also more expensive to update hashes than a dirty bitmap.  Not
because you need to hash data but because a small write (e.g. 1 sector)
requires that you read the surrounding sectors to recompute a hash for
the cluster.  Therefore you can expect worse guest I/O performance than
with a dirty bitmap.
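A toy comparison of the two update paths (my sketch; 8-byte clusters are unrealistically small, just to keep the example readable):

```python
import hashlib

CLUSTER = 8  # toy cluster size in bytes; real systems use KB-sized clusters

def update_dirty_bitmap(bitmap, offset, length):
    # dirty-bitmap path: just set bits, no data reads required
    for c in range(offset // CLUSTER, (offset + length - 1) // CLUSTER + 1):
        bitmap.add(c)

def update_merkle_leaf(disk, hashes, offset, data):
    # Merkle path: even a tiny write forces re-reading the surrounding
    # cluster to recompute its hash (read-modify-hash); assumes the
    # write stays within one cluster
    disk[offset:offset + len(data)] = data
    c = offset // CLUSTER
    cluster = bytes(disk[c * CLUSTER:(c + 1) * CLUSTER])
    hashes[c] = hashlib.sha1(cluster).hexdigest()
```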

I still think it's a cool idea.  Making it work well will require a lot
more effort than a dirty bitmap.

Stefan



Re: [Qemu-devel] QMP interface for drive-add (or even blockdev-add)

2013-05-23 Thread Stefan Hajnoczi
On Wed, May 22, 2013 at 03:53:05PM +0200, Kevin Wolf wrote:
 On 16.05.2013 at 21:05, Eric Blake wrote:
  On 05/16/2013 02:24 AM, Kevin Wolf wrote:
 The other thing that I'm not sure about is whether we should teach QAPI
 to parse certain data structures just into QDicts instead of C structs,
 or if dealing with the big unions inside the block layer actually makes
 sense.

This is an interesting question.  It's very convenient from the code
side - we don't have to worry about laying down a schema.

However, the point of QAPI is to offer that schema that allows for us to
reason about things like compatibility (hard to sneak in a patch that
modifies the schema, easy to sneak in a patch that modifies block driver
parameter code) and eliminates the boilerplate of type-checking/basic
input validation.

Even if it requires some effort, I think we should avoid tunneling
schema-less data over QAPI.
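A miniature illustration of the trade-off (not QAPI itself; the schema and names are hypothetical): with a declared schema, shared infrastructure can reject bad input before it ever reaches driver code, whereas tunneling raw dicts defers every check to each consumer.

```python
# Hypothetical declared schema: parameter name -> expected type.
BLOCKDEV_SCHEMA = {'device': str, 'read-only': bool}

def validate(args, schema):
    """Generic validation that a schema makes possible: missing,
    mistyped, and unknown parameters are all caught centrally."""
    for key, typ in schema.items():
        if key not in args:
            raise ValueError('missing parameter: %s' % key)
        if not isinstance(args[key], typ):
            raise TypeError('%s must be of type %s' % (key, typ.__name__))
    unknown = set(args) - set(schema)
    if unknown:
        raise ValueError('unknown parameters: %s' % ', '.join(sorted(unknown)))
```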

Stefan



Re: [Qemu-devel] [PATCH 0/2] coroutine: dataplane support

2013-05-23 Thread Stefan Hajnoczi
On Fri, May 17, 2013 at 03:51:24PM +0200, Stefan Hajnoczi wrote:
 There is ongoing work to enable multiple event loop threads.  This will allow
 QEMU itself to take advantage of SMP and reduce Big QEMU Lock (BQL) 
 contention.
 This series is one step in that effort.
 
 These patches make coroutines safe in a multi-event loop/multi-threaded world.
 I have successfully tested them running qcow2 in a dataplane thread (further
 patches are required which I'll be sending soon).
 
 Patch 1 protects the global coroutine freelist with a lock :).
 
 Patch 2 drops the CoQueue dependency on AioContext.  This allows CoMutex and
 CoRwlock to operate in a dataplane thread whereas previously we always
 scheduled coroutines in the QEMU iothread.
 
 Stefan Hajnoczi (2):
   coroutine: protect global pool with a mutex
   coroutine: stop using AioContext in CoQueue
 
  include/block/coroutine_int.h |  4 
  qemu-coroutine-lock.c | 56 
 ---
  qemu-coroutine.c  | 23 --
  trace-events  |  2 +-
  4 files changed, 47 insertions(+), 38 deletions(-)
 
 -- 
 1.8.1.4
 
 

Applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan



Re: [Qemu-devel] [Bug 1182490] [NEW] [qemu-1.5] coroutine-win32.c broken on NULL pointer

2013-05-23 Thread Stefan Hajnoczi
On Tue, May 21, 2013 at 02:11:05PM -, Cauchy Song wrote:
 Public bug reported:
 
 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 4340.0x163c]
 qemu_coroutine_switch (action=COROUTINE_TERMINATE, to_=0x0, from_=0x3ba1c80)
 at /home/cauchy/vcs/git/qemu/coroutine-win32.c:47
 (gdb) bt
 #0  qemu_coroutine_switch (action=COROUTINE_TERMINATE, to_=0x0,
 from_=0x3ba1c80) at /home/cauchy/vcs/git/qemu/coroutine-win32.c:47
 #1  coroutine_trampoline (co_=0x3ba1c80)
 at /home/cauchy/vcs/git/qemu/coroutine-win32.c:58
 #2  0x77098fed in ?? ()
 #3  0x in ?? ()

What is the command-line?

How do you reproduce the crash?

Stefan



Re: [Qemu-devel] [RFC PATCH v3 02/11] Fix errors and warnings while compiling with C++ compiler

2013-05-23 Thread Stefan Hajnoczi
On Tue, May 21, 2013 at 11:33:41AM -0400, Tomoki Sekiyama wrote:
 Add C++ keywords to avoid errors when compiling with a C++ compiler.
 This also renames class member of PciDeviceInfo to q_class.
 
 Signed-off-by: Tomoki Sekiyama tomoki.sekiy...@hds.com
 ---
  hmp.c   |2 +-
  hw/pci/pci.c|2 +-
  scripts/qapi.py |9 -
  3 files changed, 10 insertions(+), 3 deletions(-)

Please also extend scripts/checkpatch.pl.  Otherwise it is very likely
that C++ keywords will be introduced again in the future.  Most people
will not build the VSS code and therefore checkpatch.pl needs to ensure
that patches with C++ keywords will not be accepted.
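A rough sketch of what such a check could look like (Python here for brevity; the real checkpatch.pl is Perl, and this keyword list is deliberately incomplete):

```python
import re

# Illustrative subset of C++ keywords that are legal C identifiers.
CXX_KEYWORDS = {'class', 'new', 'delete', 'template', 'typename',
                'namespace', 'private', 'public', 'protected', 'this'}

def added_lines_with_cxx_keywords(patch_text):
    """Scan a unified diff for added lines that use C++ keywords as
    identifiers, which would break a C++ build of the same sources."""
    hits = []
    for line in patch_text.splitlines():
        if not line.startswith('+') or line.startswith('+++'):
            continue  # only inspect added lines, skip the +++ header
        for word in re.findall(r'\b[A-Za-z_]\w*\b', line):
            if word in CXX_KEYWORDS:
                hits.append((word, line))
    return hits
```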

Stefan



Re: [Qemu-devel] [RFC PATCH v3 00/11] qemu-ga: fsfreeze on Windows using VSS

2013-05-23 Thread Stefan Hajnoczi
On Tue, May 21, 2013 at 11:33:33AM -0400, Tomoki Sekiyama wrote:
 * How to build & run qemu-ga with VSS support
 
  - Download Microsoft VSS SDK from:
http://www.microsoft.com/en-us/download/details.aspx?id=23490
 
  - Setup the SDK
scripts/extract-vsssdk-headers setup.exe (on POSIX-systems)
 
  - Specify installed SDK directory to configure option as:
./configure -with-vss-sdk=path/to/VSS SDK 
 --cross-prefix=i686-w64-mingw32-

Are there any plans to make this more developer-friendly?  In the Linux
world it's unusual to download third-party SDKs; development headers are
available as packages from the distro.

I haven't looked at the SDK license but hopefully the VSS headers can be
added to the mingw cross-build toolchain?

Stefan



Re: [Qemu-devel] [RFC PATCH v3 11/11] QMP/qmp.py: set locale for exceptions to display non-ascii messages correctly

2013-05-23 Thread Stefan Hajnoczi
On Tue, May 21, 2013 at 11:34:16AM -0400, Tomoki Sekiyama wrote:
 qemu-ga on Windows may return error messages with multibyte characters
 when the guest OS language is set to something other than English. To
 display such messages correctly, this encodes the message based on the
 locale settings.
 
 Signed-off-by: Tomoki Sekiyama tomoki.sekiy...@hds.com
 ---
  QMP/qmp.py |4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)
 
 diff --git a/QMP/qmp.py b/QMP/qmp.py
 index c551df1..ee21819 100644
 --- a/QMP/qmp.py
 +++ b/QMP/qmp.py
 @@ -11,6 +11,7 @@
  import json
  import errno
  import socket
 +import locale
  
  class QMPError(Exception):
  pass
 @@ -133,7 +134,8 @@ class QEMUMonitorProtocol:
  def command(self, cmd, **kwds):
  ret = self.cmd(cmd, kwds)
  if ret.has_key('error'):
 -raise Exception(ret['error']['desc'])
 +enc = locale.getpreferredencoding()
 +raise Exception(ret['error']['desc'].encode(enc))

You should not need to explicitly encode the error descriptor.  The
error description should be UTF-8 on the wire and a Unicode Python
string in this script.

I think the real problem is:

1. Guest qga is writing strings in local encoding onto the wire.

or

2. qmp.py isn't UTF-8-decoding strings received over the wire.

Either or both bugs could be present.  Once they are fixed you shouldn't
see encoding problems.
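In other words, decode once at the transport boundary and the rest of the script deals only in Unicode. A minimal sketch of that boundary (my own, not the actual qmp.py code):

```python
import json

def decode_qmp_message(raw_bytes):
    """QMP is JSON over a socket and the wire encoding is UTF-8.
    Decoding at the transport boundary yields ordinary Unicode
    strings, so error descriptions need no locale-dependent
    re-encoding afterwards."""
    return json.loads(raw_bytes.decode('utf-8'))
```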

Stefan



[Qemu-devel] Designing QMP APIs at KVM Forum

2013-05-23 Thread Stefan Hajnoczi
With better QMP introspection on the horizon and work in various
subsystems pushing QMP boundaries it would be useful to bring together
the latest best practices for designing QMP APIs.

There are design rules for keeping QMP APIs extensible and for
allowing clients to detect the presence of features.  There is also
QEMU-side infrastructure like event rate-limiting, which developers
should make use of where appropriate.

Is anyone willing to bring together the best practices and present
them at KVM Forum this year?

I think that could help set the standard for QMP APIs.  A set of
slides or wiki page can be a reference to developers that stops us
working from first pricinples every time a new API is added.

Stefan



Re: [Qemu-devel] qemu seabios issue with vhost-scsi

2013-05-23 Thread Stefan Hajnoczi
On Wed, May 22, 2013 at 05:36:08PM -0700, Badari wrote:
 Hi,
 
 While testing vhost-scsi in the current qemu git, ran into an earlier issue
 with seabios. I had to disable scsi support in seabios to get it working.
 
 I was hoping this issue got resolved when vhost-scsi support got
 merged into qemu. Is this still being worked on ?
 
 Thanks,
 Badari
 
 [root ~]# gdb /root/qemu/x86_64-softmmu/qemu-system-x86_64
 GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6)
 Copyright (C) 2010 Free Software Foundation, Inc.
 License GPLv3+: GNU GPL version 3 or later
 http://gnu.org/licenses/gpl.html
 This is free software: you are free to change and redistribute it.
 There is NO WARRANTY, to the extent permitted by law.  Type show copying
 and show warranty for details.
 This GDB was configured as x86_64-redhat-linux-gnu.
 For bug reporting instructions, please see:
 http://www.gnu.org/software/gdb/bugs/...
 Reading symbols from /root/qemu/x86_64-softmmu/qemu-system-x86_64...done.
 (gdb) run  --cpu qemu64 --enable-kvm  -m 4096 -drive
 file=/var/lib/libvirt/images/lnx.img,if=ide,cache=writethrough
 -device vhost-scsi-pci,wwpn=naa.6001405bd4e8476d,event_idx=off -vnc
 :10 -boot d
 Starting program: /root/qemu/x86_64-softmmu/qemu-system-x86_64 --cpu
 qemu64 --enable-kvm  -m 4096 -drive
 file=/var/lib/libvirt/images/window.img,if=ide,cache=writethrough
 -device vhost-scsi-pci,wwpn=naa.6001405bd4e8476d,event_idx=off -vnc
 :10 -boot d
 warning: no loadable sections found in added symbol-file
 system-supplied DSO at 0x77ffa000
 [Thread debugging using libthread_db enabled]
 [New Thread 0x71c1c700 (LWP 4725)]
 [New Thread 0x71239700 (LWP 4726)]
 [New Thread 0x7fffeb7ff700 (LWP 4729)]
 
 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x71239700 (LWP 4726)]
 0x556b3191 in scsi_device_find (bus=0x565abb50, channel=0, id=0,
 lun=0) at hw/scsi/scsi-bus.c:1744
 1744    QTAILQ_FOREACH_REVERSE(kid, &bus->qbus.children,
 ChildrenHead, sibling) {
 Missing separate debuginfos, use: debuginfo-install
 cyrus-sasl-gssapi-2.1.23-13.el6_3.1.x86_64
 cyrus-sasl-lib-2.1.23-13.el6_3.1.x86_64
 cyrus-sasl-md5-2.1.23-13.el6_3.1.x86_64
 cyrus-sasl-plain-2.1.23-13.el6_3.1.x86_64 db4-4.7.25-17.el6.x86_64
 glib2-2.22.5-7.el6.x86_64 glibc-2.12-1.107.el6.x86_64
 gnutls-2.8.5-10.el6.x86_64 keyutils-libs-1.4-4.el6.x86_64
 krb5-libs-1.10.3-10.el6.x86_64 libcom_err-1.41.12-14.el6.x86_64
 libcurl-7.19.7-35.el6.x86_64 libgcrypt-1.4.5-9.el6_2.2.x86_64
 libgpg-error-1.7-4.el6.x86_64 libidn-1.18-2.el6.x86_64
 libpng-1.2.49-1.el6_2.x86_64 libselinux-2.0.94-5.3.el6.x86_64
 libssh2-1.4.2-1.el6.x86_64 libtasn1-2.3-3.el6_2.1.x86_64
 ncurses-libs-5.7-3.20090208.el6.x86_64 nspr-4.9.2-1.el6.x86_64
 nss-3.14.0.0-12.el6.x86_64 nss-softokn-freebl-3.12.9-11.el6.x86_64
 nss-util-3.14.0.0-2.el6.x86_64 openldap-2.4.23-31.el6.x86_64
 openssl-1.0.0-27.el6.x86_64 pixman-0.26.2-4.el6.x86_64
 zlib-1.2.3-29.el6.x86_64
 (gdb) bt
 #0  0x556b3191 in scsi_device_find (bus=0x565abb50,
 channel=0, id=
 0, lun=0) at hw/scsi/scsi-bus.c:1744
 #1  0x557a59f0 in virtio_scsi_device_find (vdev=0x565aba38, vq=
 0x565d1150) at /root/qemu/hw/scsi/virtio-scsi.c:56
 #2  virtio_scsi_handle_cmd (vdev=0x565aba38, vq=0x565d1150)
 at /root/qemu/hw/scsi/virtio-scsi.c:376

We should never get here with vhost-scsi.  This function is processing
the command virtqueue in QEMU userspace - if vhost is active then we
shouldn't reach this.

AFAICT the s->bus was not initialized in the vhost codepath.  Therefore
the crash in scsi_device_find(bus, ...).

Can you check vhost_scsi_set_status() was called and if it successfully
enabled vhost?

Is it possible that the guest is notifying the virtqueue before setting
the status register to DRIVER_OK?  That would explain why vhost hasn't
been activated yet.
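A toy model of the suspected ordering problem (hypothetical names, not the QEMU implementation): a kick that arrives before DRIVER_OK must not be handled by the userspace virtio-scsi path, whose bus state is uninitialized when vhost is in use.

```python
DRIVER_OK = 0x4  # virtio device status bit

class VhostScsiModel:
    """Toy model: vhost takes over queue processing once the guest
    sets DRIVER_OK; premature kicks are ignored rather than routed
    to the (uninitialized) userspace handler."""

    def __init__(self):
        self.status = 0
        self.vhost_active = False
        self.userspace_kicks = 0

    def set_status(self, val):
        self.status = val
        if (val & DRIVER_OK) and not self.vhost_active:
            self.vhost_active = True   # start vhost before handling kicks

    def kick(self):
        if not (self.status & DRIVER_OK):
            return                     # ignore premature notification
        if not self.vhost_active:
            self.userspace_kicks += 1  # the path that crashes in QEMU
```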

Stefan



Re: [Qemu-devel] [RFC PATCH v3 05/11] qemu-ga: Add Windows VSS provider to quiesce applications on fsfreeze

2013-05-23 Thread Stefan Hajnoczi
On Tue, May 21, 2013 at 11:33:52AM -0400, Tomoki Sekiyama wrote:
 Implements a basic stub of a software VSS provider. Currently, this module
 only relays events between qemu-guest-agent and Windows VSS when VSS has
 finished a filesystem freeze and when the qemu snapshot is done.
 
 In the future, this module could be extended to support the other VSS
 functions, such as query for snapshot volumes and recovery.
 
 Signed-off-by: Tomoki Sekiyama tomoki.sekiy...@hds.com
 ---
  Makefile.objs   |2 
  configure   |5 
  qga/Makefile.objs   |6 
  qga/vss-win32-provider.h|   26 ++
  qga/vss-win32-provider/Makefile.objs|   21 +
  qga/vss-win32-provider/install.cpp  |  494 
 +++
  qga/vss-win32-provider/provider.cpp |  474 ++
  qga/vss-win32-provider/qga-provider.def |   10 +
  qga/vss-win32-provider/qga-provider.idl |   20 +
  qga/vss-win32.h |   85 +
  10 files changed, 1142 insertions(+), 1 deletion(-)
  create mode 100644 qga/vss-win32-provider.h
  create mode 100644 qga/vss-win32-provider/Makefile.objs
  create mode 100644 qga/vss-win32-provider/install.cpp
  create mode 100644 qga/vss-win32-provider/provider.cpp
  create mode 100644 qga/vss-win32-provider/qga-provider.def
  create mode 100644 qga/vss-win32-provider/qga-provider.idl
  create mode 100644 qga/vss-win32.h

Please run scripts/checkpatch.pl and use QEMU coding style (see
./CODING_STYLE and ./HACKING).

 diff --git a/qga/vss-win32-provider/Makefile.objs 
 b/qga/vss-win32-provider/Makefile.objs
 new file mode 100644
 index 000..73ef752
 --- /dev/null
 +++ b/qga/vss-win32-provider/Makefile.objs
 @@ -0,0 +1,21 @@
 +# rules to build qga-provider.dll
 +
 +qga-obj-y += qga-provider.dll
 +qga-prv-obj-y += provider.o install.o
 +
 +obj-qga-prv-obj-y = $(addprefix $(obj)/, $(qga-prv-obj-y))
 +$(obj-qga-prv-obj-y): QEMU_CXXFLAGS = $(filter-out -Wstrict-prototypes 
 -Wmissing-prototypes -Wnested-externs -Wold-style-declaration 
 -Wold-style-definition -Wredundant-decls -fstack-protector-all, 
 $(QEMU_CFLAGS)) -Wno-unknown-pragmas -Wno-delete-non-virtual-dtor
 +
 +$(obj)/qga-provider.dll: LDFLAGS = -shared 
 -Wl,--add-stdcall-alias,--enable-stdcall-fixup -lole32 -loleaut32 -lshlwapi 
 -luuid -static
 +$(obj)/qga-provider.dll: $(obj-qga-prv-obj-y) 
 $(SRC_PATH)/$(obj)/qga-provider.def $(obj)/qga-provider.tlb
 + $(call quiet-command,$(CXX) -o $@ $(qga-prv-obj-y) 
 $(SRC_PATH)/qga/vss-win32-provider/qga-provider.def $(CXXFLAGS) $(LDFLAGS),  
 LINK  $(TARGET_DIR)$@)
 +
 +
 +# rules to build qga-provider.tlb
 +# Currently, only native build is supported because building .tlb
 +# (TypeLibrary) from .idl requires WindowsSDK and MIDL (included in VC++).
 +MIDL=midl
 +WINSDK=C:\\Program Files\\Microsoft SDKs\\Windows\\v7.1\\Include

This needs to be a ./configure option.

 diff --git a/qga/vss-win32.h b/qga/vss-win32.h
 new file mode 100644
 index 000..7600087
 --- /dev/null
 +++ b/qga/vss-win32.h
 @@ -0,0 +1,85 @@
 +/*
 + * QEMU Guest Agent win32 VSS common declarations
 + *
 + * Copyright Hitachi Data Systems Corp. 2013
 + *
 + * Authors:
 + *  Tomoki Sekiyama   tomoki.sekiy...@hds.com
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + */
 +
 +#ifndef VSS_WIN32_H
 +#define VSS_WIN32_H
 +
 +#define __MIDL_user_allocate_free_DEFINED__
 +#include config-host.h
 +#include windows.h
 +#include shlwapi.h
 +
 +/* Reduce warnings to include vss.h */
 +#define __in  IN
 +#define __out OUT
 +#define __RPC_unique_pointer
 +#define __RPC_string
 +#define __RPC__deref_inout_opt
 +#define __RPC__out
 +#ifndef __RPC__out_ecount_part
 +#define __RPC__out_ecount_part(x, y)
 +#endif
 +#define _declspec(x)
 +#undef uuid
 +#define uuid(x)

This looks hacky.  Why are you stubbing out macros that vss.h uses?



Re: [Qemu-devel] [RFC PATCH v3 00/11] qemu-ga: fsfreeze on Windows using VSS

2013-05-23 Thread Stefan Hajnoczi
On Thu, May 23, 2013 at 3:20 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 23/05/2013 14:15, Stefan Hajnoczi wrote:
 On Tue, May 21, 2013 at 11:33:33AM -0400, Tomoki Sekiyama wrote:
 * How to build & run qemu-ga with VSS support

  - Download Microsoft VSS SDK from:
http://www.microsoft.com/en-us/download/details.aspx?id=23490

  - Setup the SDK
scripts/extract-vsssdk-headers setup.exe (on POSIX-systems)

  - Specify installed SDK directory to configure option as:
./configure -with-vss-sdk=path/to/VSS SDK 
 --cross-prefix=i686-w64-mingw32-

 Are there any plans to make this more developer-friendly?  In the Linux
 world it's unusual to download third-party SDKs; development headers are
 available as packages from the distro.

 I haven't looked at the SDK license but hopefully the VSS headers can be
 added to the mingw cross-build toolchain?

 No, the mingw headers are (re)written by hand, or something like that.
 The VSS license does not permit redistribution. :/

Can we follow the same process for VSS as for the other mingw headers?

TBH I don't see how one can write them by hand without deriving it
from the SDK, which seems dubious.  But if it worked for the rest of
mingw (which comes with Windows and DirectX headers, for example) then
maybe it can work here too.

Stefan



Re: [Qemu-devel] qemu seabios issue with vhost-scsi

2013-05-23 Thread Stefan Hajnoczi
On Thu, May 23, 2013 at 11:48 AM, Gleb Natapov g...@redhat.com wrote:
 On Thu, May 23, 2013 at 08:53:55AM +0800, Asias He wrote:
 On Wed, May 22, 2013 at 05:36:08PM -0700, Badari wrote:
  Hi,
 
  While testing vhost-scsi in the current qemu git, ran into an earlier issue
  with seabios. I had to disable scsi support in seabios to get it working.
 
  I was hoping this issue got resolved when vhost-scsi support got
  merged into qemu. Is this still being worked on ?

 Hmm, can you try seabios.git? Not sure if seabios shipped by qemu picked
 up the fixes for vhost-scsi.

 Nothing in seabios should crash qemu.

Agreed.  I'm pretty sure it's the scenario that I posted in my other
reply to this thread.

The common virtio-scsi code in QEMU should guard against this.  In
virtio-blk data plane I hit a similar case and ended up starting the
data plane thread (equivalent to vhost here) *before* the status
register is set to DRIVER_OK.

Stefan



Re: [Qemu-devel] [PATCH v5 01/11] curl: introduce CURLSockInfo to BDRVCURLState.

2013-05-23 Thread Stefan Hajnoczi
On Thu, May 23, 2013 at 11:37:59AM +0800, Fam Zheng wrote:
 @@ -90,7 +98,16 @@ static int curl_aio_flush(void *opaque);
  static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
  void *s, void *sp)
  {
 +BDRVCURLState *bs = s;

bs is used for BlockDriverState in block/* code.  I find this confusing.
Usually the custom state is called 's'.  Perhaps just rename the void*
arguments s_ and sp_.



Re: [Qemu-devel] [PATCH v5 05/11] curl: add timer to BDRVCURLState

2013-05-23 Thread Stefan Hajnoczi
On Thu, May 23, 2013 at 11:38:03AM +0800, Fam Zheng wrote:
 diff --git a/block/curl.c b/block/curl.c
 index fc464ad..4fd5bb9 100644
 --- a/block/curl.c
 +++ b/block/curl.c
 @@ -89,6 +89,7 @@ typedef struct BDRVCURLState {
  QLIST_HEAD(, CURLSockInfo) socks;
  char *url;
  size_t readahead_size;
 +QEMUTimer *timer;
  /* Whether http server accept range in header */
  bool accept_range;
  } BDRVCURLState;
 @@ -148,6 +149,38 @@ static size_t curl_header_cb(void *ptr, size_t size, 
 size_t nmemb, void *opaque)
  return realsize;
  }
  
 +static void curl_timer_cb(void *opaque)
 +{
 +int running;
 +BDRVCURLState *bs = (BDRVCURLState *)opaque;

Please call it 's'.  'bs' is for BlockDriverState*.

Also, there is no need to cast void* to BDRVCURLState*, the conversion
is implicit.

 +DPRINTF("curl timeout!\n");
 +curl_multi_socket_action(bs->multi, CURL_SOCKET_TIMEOUT, 0, &running);
 +}
 +
 +/* Call back for curl_multi interface */
 +static int curl_multi_timer_cb(CURLM *multi, long timeout_ms, void *s)
 +{
 +BDRVCURLState *bs = (BDRVCURLState *)s;

Same here.



Re: [Qemu-devel] [PATCH v5 06/11] curl: introduce CURLDataCache

2013-05-23 Thread Stefan Hajnoczi
On Thu, May 23, 2013 at 11:38:04AM +0800, Fam Zheng wrote:
 +typedef struct CURLDataCache {
 +char *data;
 +size_t base_pos;

Must be int64_t.  QEMU compiled on 32-bit hosts would only allow 4 GB
images with size_t!

 +size_t data_len;
 +size_t write_pos;
 +/* Ref count for CURLState */
 +int use_count;

It's better to introduce this field when you add code to use it.  When
possible, don't add unused code in a patch, it makes it harder to
review.

 +static void curl_complete_io(BDRVCURLState *bs, CURLAIOCB *acb,
 + CURLDataCache *cache)
 +{
 +size_t aio_base = acb->sector_num * SECTOR_SIZE;

int64_t

 +size_t aio_bytes = acb->nb_sectors * SECTOR_SIZE;
 +size_t off = aio_base - cache->base_pos;
 +
 +qemu_iovec_from_buf(acb->qiov, 0, cache->data + off, aio_bytes);
 +acb->common.cb(acb->common.opaque, 0);
 +DPRINTF("AIO Request OK: %10zd %10zd\n", aio_base, aio_bytes);

PRId64 for 64-bit aio_base

 @@ -589,26 +577,24 @@ static const AIOCBInfo curl_aiocb_info = {
  static void curl_readv_bh_cb(void *p)
  {
  CURLState *state;
 -
 +CURLDataCache *cache = NULL;
  CURLAIOCB *acb = p;
 BDRVCURLState *s = acb->common.bs->opaque;
 +size_t aio_base, aio_bytes;

int64_t aio_base;



Re: [Qemu-devel] [PATCH v5 07/11] curl: make use of CURLDataCache.

2013-05-23 Thread Stefan Hajnoczi
On Thu, May 23, 2013 at 11:38:05AM +0800, Fam Zheng wrote:
 @@ -221,31 +215,35 @@ static void curl_complete_io(BDRVCURLState *bs, 
 CURLAIOCB *acb,
  
  static size_t curl_read_cb(void *ptr, size_t size, size_t nmemb, void 
 *opaque)
  {
 -CURLState *s = ((CURLState*)opaque);
 +CURLState *s = (CURLState *)opaque;

You can drop the cast completely because opaque is void*.

 +CURLDataCache *c = s-cache;
  size_t realsize = size * nmemb;
 -int i;
 -
 -DPRINTF(CURL: Just reading %zd bytes\n, realsize);
 +CURLAIOCB *acb;
  
 -if (!s || !s->orig_buf)
 +if (!c || !c->data) {
  goto read_end;
 +}
 +if (c->write_pos >= c->data_len) {
 +goto read_end;
 +}
 +memcpy(c->data + c->write_pos, ptr,
 +   MIN(realsize, c->data_len - c->write_pos));
 +c->write_pos += realsize;
 +if (c->write_pos >= c->data_len) {
 +c->write_pos = c->data_len;
 +}
  
 -memcpy(s->orig_buf + s->buf_off, ptr, realsize);
 -s->buf_off += realsize;

Why did you add MIN(realsize, c->data_len - c->write_pos)?  The original
code trusts realsize to be within s->orig_buf.
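The guard under discussion is the usual clamp pattern, sketched here in Python (my own illustration, not the patch's C code):

```python
def bounded_copy(dst, write_pos, src):
    """Copy src into dst at write_pos, clamped to the end of dst.

    Mirrors the MIN(realsize, data_len - write_pos) guard: an
    oversized callback payload must not overrun the fixed-size
    cache buffer."""
    n = min(len(src), len(dst) - write_pos)
    if n > 0:
        dst[write_pos:write_pos + n] = src[:n]
    return max(n, 0)  # number of bytes actually copied
```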

 -
 -for(i=0; i<CURL_NUM_ACB; i++) {
 -CURLAIOCB *acb = s->acb[i];
 -
 -if (!acb)
 -continue;
 -
 -if ((s->buf_off >= acb->end)) {
 -qemu_iovec_from_buf(acb->qiov, 0, s->orig_buf + acb->start,
 -acb->end - acb->start);
 -acb->common.cb(acb->common.opaque, 0);
 -qemu_aio_release(acb);
 -s->acb[i] = NULL;
 +acb = QLIST_FIRST(&s->s->acbs);
 +while (acb) {
 +size_t aio_base = acb->sector_num * SECTOR_SIZE;

int64_t

 @@ -600,29 +596,41 @@ static void curl_readv_bh_cb(void *p)
  // No cache found, so let's start a new request
  state = curl_init_state(s);
  if (!state) {
 -acb->common.cb(acb->common.opaque, -EIO);
 -qemu_aio_release(acb);
 -return;
 +goto err_release;
  }
  
 -acb->start = 0;
 -acb->end = (acb->nb_sectors * SECTOR_SIZE);
 -
 -state->buf_off = 0;
 -if (state->orig_buf)
 -g_free(state->orig_buf);
 -state->buf_start = start;
 -state->buf_len = acb->end + s->readahead_size;
 -end = MIN(start + state->buf_len, s->len) - 1;
 -state->orig_buf = g_malloc(state->buf_len);
 -state->acb[0] = acb;
 -
 -snprintf(state->range, sizeof(state->range) - 1, "%zd-%zd", start, end);
 -DPRINTF("CURL (AIO): Reading %d at %zd (%s)\n",
 -(acb->nb_sectors * SECTOR_SIZE), start, state->range);
 -curl_easy_setopt(state->curl, CURLOPT_RANGE, state->range);
 +cache = g_malloc0(sizeof(CURLDataCache));
 +cache-base_pos = acb-sector_num * SECTOR_SIZE;
 +cache-data_len = aio_bytes + s-readahead_size;
 +cache-write_pos = 0;
 +cache-data = g_malloc(cache-data_len);
  
 +QLIST_INSERT_HEAD(s-acbs, acb, next);
 +snprintf(state-range, sizeof(state-range) - 1, %zd-%zd, 
 cache-base_pos,
 + cache-base_pos + cache-data_len);
 +DPRINTF(Reading range: %s\n, state-range);
 +curl_easy_setopt(state-curl, CURLOPT_RANGE, state-range);
 +QLIST_INSERT_HEAD(s-cache, cache, next);
 +state-cache = cache;
 +cache-use_count++;

I don't see where you bump the use_count when a cache lookup is
successful.  Maybe I just missed it in the other patches.



Re: [Qemu-devel] [PATCH v5 08/11] curl: use list to store CURLState

2013-05-23 Thread Stefan Hajnoczi
On Thu, May 23, 2013 at 11:38:06AM +0800, Fam Zheng wrote:
 @@ -660,9 +651,13 @@ static BlockDriverAIOCB *curl_aio_readv(BlockDriverState *bs,
  static void curl_close(BlockDriverState *bs)
  {
  BDRVCURLState *s = bs->opaque;
 -int i;
  
  DPRINTF("CURL: Close\n");
 +if (s->timer) {
 +qemu_del_timer(s->timer);
 +qemu_free_timer(s->timer);
 +s->timer = NULL;
 +}

This hunk is duplicated, see the next line :-).

 @@ -670,16 +665,26 @@ static void curl_close(BlockDriverState *bs)
  s->timer = NULL;
  }
  
 -for (i=0; i<CURL_NUM_STATES; i++) {
 -if (s->states[i].in_use)
 -curl_clean_state(&s->states[i]);
 -if (s->states[i].curl) {
 -curl_easy_cleanup(s->states[i].curl);
 -s->states[i].curl = NULL;
 -}
 +while (!QLIST_EMPTY(&s->curl_states)) {
 +CURLState *state = QLIST_FIRST(&s->curl_states);
 +/* Remove and clean curl easy handles */
 +curl_clean_state(state);
 +QLIST_REMOVE(state, next);
 +g_free(state);
 +state = NULL;
  }
 -if (s->multi)
 +
 +if (s->multi) {
  curl_multi_cleanup(s->multi);
 +}
 +
 +while (!QLIST_EMPTY(&s->acbs)) {
 +CURLAIOCB *acb = QLIST_FIRST(&s->acbs);
 +acb->common.cb(acb->common.opaque, -EIO);
 +QLIST_REMOVE(acb, next);
 +qemu_aio_release(acb);
 +acb = NULL;
 +}

Duplicated?  See line below.

  
 while (!QLIST_EMPTY(&s->acbs)) {
 CURLAIOCB *acb = QLIST_FIRST(&s->acbs);
 @@ -709,6 +714,7 @@ static void curl_close(BlockDriverState *bs)
  }
  
  g_free(s->url);
 +s->url = NULL;
  }
  

Can s->url = NULL be merged into a previous patch which adds
g_free(s->url)?



Re: [Qemu-devel] [PATCH v5 00/11] curl: fix curl read

2013-05-23 Thread Stefan Hajnoczi
On Thu, May 23, 2013 at 11:37:58AM +0800, Fam Zheng wrote:
 CURL library API has changed, the current curl driver is not working.
 This patch rewrites the use of API as well as the structure of internal
 states.
 
 BDRVCURLState holds the pointer to curl multi interface (man 3
 libcurl-multi), and 4 lists for internal states:
  - CURLState holds state for libcurl connection (man 3 libcurl-easy)
  - CURLSockInfo holds information for libcurl socket interface (man 3
curl_multi_socket_action).
  - CURLDataCache holds the user data read from libcurl, it is in a list
ordered by access, the used cache is moved to list head on access, so
the tail element is freed first. BDRVCURLState.cache_quota is the
threshold to start freeing cache.
  - CURLAIOCB holds ongoing aio information.

Looking pretty good.

It's not clear to me if block/curl.c was broken, besides failing with
HTTP servers that do not support Range:, or which part of this series
fixes the bug(s).  Can you clarify that in the patch description?

I'll test the next revision and audit a little more for memory leaks,
since you are introducing several heap-allocated structures and lists.



Re: [Qemu-devel] [PATCH v5 09/11] curl: add cache quota.

2013-05-23 Thread Stefan Hajnoczi
On Thu, May 23, 2013 at 11:38:07AM +0800, Fam Zheng wrote:
 Introduce a cache quota: BDRVCURLState.cache_quota.
 When adding new CURLDataCache to BDRVCURLState, if number of existing
 CURLDataCache is larger than CURL_CACHE_QUOTA, try to release some first
 to limit the in memory cache size.
 
 A least used entry is selected for releasing.
 
 Signed-off-by: Fam Zheng f...@redhat.com
 ---
  block/curl.c | 27 +++
  1 file changed, 27 insertions(+)
 
 diff --git a/block/curl.c b/block/curl.c
 index 3ea2552..5adbc84 100644
 --- a/block/curl.c
 +++ b/block/curl.c
 @@ -40,6 +40,7 @@
  
  #define SECTOR_SIZE 512
  #define READ_AHEAD_SIZE (256 * 1024)
 +#define CURL_CACHE_QUOTA 10
  
  struct BDRVCURLState;
  
 @@ -90,6 +91,8 @@ typedef struct BDRVCURLState {
  QEMUTimer *timer;
  /* List of data cache ordered by access, freed from tail */
  QLIST_HEAD(, CURLDataCache) cache;
 +/* Threshold to release unused cache when cache list is longer than it */
 +int cache_quota;
  /* Whether http server accept range in header */
  bool accept_range;
  } BDRVCURLState;
 @@ -526,6 +529,8 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags)
  curl_multi_socket_action(s->multi, CURL_SOCKET_TIMEOUT, 0, &running);
  
  qemu_opts_del(opts);
 +s->cache_quota = CURL_CACHE_QUOTA;
 +
  return 0;
  
  out:
 @@ -595,6 +600,27 @@ static void curl_readv_bh_cb(void *p)
  cache->data_len = aio_bytes + s->readahead_size;
  cache->write_pos = 0;
  cache->data = g_malloc(cache->data_len);
 +/* Try to release some cache */
 +while (0 && s->cache_quota >= 0) {

while (0) :(



Re: [Qemu-devel] QMP interface for drive-add (or even blockdev-add)

2013-05-23 Thread Stefan Hajnoczi
On Thu, May 23, 2013 at 4:41 PM, Kevin Wolf kw...@redhat.com wrote:
 On 23.05.2013 at 13:57, Stefan Hajnoczi wrote:
 On Wed, May 22, 2013 at 03:53:05PM +0200, Kevin Wolf wrote:
  On 16.05.2013 at 21:05, Eric Blake wrote:
   On 05/16/2013 02:24 AM, Kevin Wolf wrote:
  The other thing that I'm not sure about is whether we should teach QAPI
  to parse certain data structures just into QDicts instead of C structs,
  or if dealing with the big unions inside the block layer actually makes
  sense.

 This is an interesting question.  It's very convenient from the code
 side - we don't have to worry about laying down a schema.

 However, the point of QAPI is to offer that schema that allows for us to
 reason about things like compatibility (hard to sneak in a patch that
 modifies the schema, easy to sneak in a patch that modifies block driver
 parameter code) and eliminates the boilerplate of type-checking/basic
 input validation.

 Even if it requires some effort, I think we should avoid tunneling
 schema-less data over QAPI.

 Note that I'm talking _only_ about the C side here. Everything that goes
 through QMP is an external API and must described by a schema, I fully
 agree there.

 The question is whether QAPI must, after validating the input against
 the schema, parse it into C structs or whether it should be able to fill
 QDicts and pass those around.

I see.  In that case using a QDict seems fine.

Stefan



Re: [Qemu-devel] qemu seabios issue with vhost-scsi

2013-05-23 Thread Stefan Hajnoczi
On Thu, May 23, 2013 at 6:47 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 23/05/2013 18:38, Badari Pulavarty wrote:
 If that is with the old SeaBIOS, then SIGABRT is intended. :)  The guest
 is buggy, the problem in QEMU only lies in _how_ it fails.

 Paolo



 I am confused now. Without above changes, seabios fix makes the
 guest boot. But with the above changes, I run into this (even with
 seabios fix)

 Ah, okay.  I understood it crashed only with the SeaBIOS fix.

 I'll work on a more proper fix then.

Maybe use the same approach as data plane - activate vhost when
userspace sees the first virtqueue kick.  That way it works with weird
guests and never aborts.

Stefan



Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command

2013-05-24 Thread Stefan Hajnoczi
On Thu, May 23, 2013 at 08:11:42AM +, Dietmar Maurer wrote:
   I also consider it safer, because you make sure the data exists (using 
   hash keys
  like SHA1).
  
   I am unsure how you can check if a dirty bitmap contains errors, or is 
   out of
  date?
  
   Also, you can compare arbitrary Merkle trees, whereas a dirty bitmap is 
   always
  related to a single image.
   (consider the user remove the latest backup from the backup target).
  
  One disadvantage of Merkle trees is that the client becomes stateful - the 
  client
  needs to store its own Merkle tree and this requires fancier client-side 
  code.
 
 What 'client' do you talk about here?

A backup application, for example.  Previously it could simply use
api.getDirtyBlocks() -> [Sector] and it would translate into a single
QMP API call.

Now a Merkle tree needs to be stored on the client side and synced with
the server.  The client-side library becomes more complex.

 But sure, the code gets more complex, and needs a considerable amount of RAM
 to store the hash keys.
  
  It is also more expensive to update hashes than a dirty bitmap.  Not 
  because you
  need to hash data but because a small write (e.g. 1 sector) requires that 
  you
  read the surrounding sectors to recompute a hash for the cluster.  
  Therefore you
  can expect worse guest I/O performance than with a dirty bitmap.
 
 There is no need to update any hash - You only need to do that on backup - in 
 fact, all
 those things can be done by the backup driver.

The problem is that if you leave hash calculation until backup time then
you need to read in the entire disk image (100s of GB) from disk.  That
is slow and drains I/O resources.

Maybe the best approach is to maintain a dirty bitmap while the guest is
running, which is fairly cheap.  Then you can use the dirty bitmap to
only hash modified clusters when building the Merkle tree - this avoids
reading the entire disk image.
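The hybrid approach above (a cheap dirty bitmap while the guest runs, hashing only the modified clusters at backup time) can be sketched in stand-alone C. This is illustrative only, not QEMU code: the bitmap is a plain word array and FNV-1a stands in for whatever hash the Merkle tree would actually use.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Placeholder hash (FNV-1a); the real tree would use something stronger. */
static uint64_t fnv1a(const uint8_t *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) {
        h = (h ^ data[i]) * 0x100000001b3ULL;
    }
    return h;
}

static bool bitmap_test(const unsigned long *map, size_t bit)
{
    return (map[bit / (8 * sizeof(unsigned long))] >>
            (bit % (8 * sizeof(unsigned long)))) & 1;
}

/* Rehash only clusters whose dirty bit is set; returns number rehashed.
 * This is the step that avoids reading the entire disk image. */
static size_t refresh_hashes(const uint8_t *image, size_t cluster_size,
                             size_t nb_clusters, const unsigned long *dirty,
                             uint64_t *hashes)
{
    size_t updated = 0;
    for (size_t c = 0; c < nb_clusters; c++) {
        if (bitmap_test(dirty, c)) {
            hashes[c] = fnv1a(image + c * cluster_size, cluster_size);
            updated++;
        }
    }
    return updated;
}
```

After this pass, only the leaf hashes that changed need to be propagated up the Merkle tree.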

  I still think it's a cool idea.  Making it work well will require a lot 
  more effort than
  a dirty bitmap.
 
 How do you re-generate a dirty bitmap after a server crash?

The dirty bitmap is invalid after crash.  A full backup is required, all
clusters are considered dirty.

The simplest way to implement this is to mark the persistent bitmap
invalid upon the first guest write.  When QEMU is terminated cleanly,
flush all dirty bitmap updates to disk and then mark the file valid
again.  If QEMU finds the file is invalid on startup, start from
scratch.
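A minimal stand-alone sketch of that validity protocol; the names and the toy one-word bitmap are assumptions, and a real implementation would fdatasync() the bitmap file at the commented points:

```c
#include <stdbool.h>

/* Sketch: mark the on-disk dirty bitmap invalid on the first guest write,
 * flush and mark it valid again on clean shutdown, and fall back to
 * "all clusters dirty" when the file was left invalid (crash). */
typedef struct PersistentBitmap {
    bool file_valid;        /* validity marker as stored on disk */
    bool marked_invalid;    /* already invalidated during this run? */
    unsigned long dirty;    /* in-memory dirty bits (toy: one word) */
} PersistentBitmap;

/* Startup: an invalid file means a full backup is required. */
static void bitmap_load(PersistentBitmap *bm, bool file_was_valid)
{
    bm->file_valid = file_was_valid;
    bm->marked_invalid = false;
    bm->dirty = file_was_valid ? 0 : ~0UL;   /* all dirty after a crash */
}

/* Guest write path: invalidate the on-disk copy before dirtying bits. */
static void bitmap_on_guest_write(PersistentBitmap *bm, unsigned cluster)
{
    if (!bm->marked_invalid) {
        bm->file_valid = false;              /* would fdatasync() here */
        bm->marked_invalid = true;
    }
    bm->dirty |= 1UL << cluster;
}

/* Clean shutdown: flush the bits to disk, then mark the file valid. */
static void bitmap_close(PersistentBitmap *bm)
{
    bm->file_valid = true;                   /* after flushing bm->dirty */
}
```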

Stefan



Re: [Qemu-devel] [RFC PATCH v3 02/11] Fix errors and warnings while compiling with c++ compilier

2013-05-24 Thread Stefan Hajnoczi
On Thu, May 23, 2013 at 06:34:43PM +, Tomoki Sekiyama wrote:
 On 5/23/13 8:12 , Stefan Hajnoczi stefa...@gmail.com wrote:
 
 On Tue, May 21, 2013 at 11:33:41AM -0400, Tomoki Sekiyama wrote:
  Add C++ keywords to avoid errors in compiling with c++ compiler.
  This also renames class member of PciDeviceInfo to q_class.
  
  Signed-off-by: Tomoki Sekiyama tomoki.sekiy...@hds.com
  ---
   hmp.c   |2 +-
   hw/pci/pci.c|2 +-
   scripts/qapi.py |9 -
   3 files changed, 10 insertions(+), 3 deletions(-)
 
 Please also extend scripts/checkpatch.pl.  Otherwise it is very likely
 that C++ keywords will be introduced again in the future.  Most people
 will not build the VSS code and therefore checkpatch.pl needs to ensure
 that patches with C++ keywords will not be accepted.
 
 Stefan
 
 I think it would be difficult to identify problematic C++ keywords usage
 from patches because headers can legally contain C++ keywords and
 checkpatch.pl doesn't know where it should be used.
 Do you have any good ideas?

We can ignore false warnings for 0.1% of patches (the ones that touch
VSS C++ code).  But for the remaining 99.9% of patches it's worth
guarding against VSS bitrot.

Remember not many people will compile it and therefore they won't notice
when they break it.  I really think it's worth putting some effort in
now so VSS doesn't periodically break.

checkpatch.pl is a hacky sort of C parser.  It already checks for a
bunch of similar things and it knows about comments, macros, and
strings.  It does not perform #include expansion, so there is no risk of
including system headers that have C++ code.

Stefan



Re: [Qemu-devel] [RFC PATCH v3 05/11] qemu-ga: Add Windows VSS provider to quiesce applications on fsfreeze

2013-05-24 Thread Stefan Hajnoczi
On Thu, May 23, 2013 at 06:36:35PM +, Tomoki Sekiyama wrote:
 On 5/23/13 8:22 , Stefan Hajnoczi stefa...@gmail.com wrote:
 
 On Tue, May 21, 2013 at 11:33:52AM -0400, Tomoki Sekiyama wrote:
  diff --git a/qga/vss-win32.h b/qga/vss-win32.h
  new file mode 100644
  index 000..7600087
  --- /dev/null
  +++ b/qga/vss-win32.h
  @@ -0,0 +1,85 @@
  +/*
  + * QEMU Guest Agent win32 VSS common declarations
  + *
  + * Copyright Hitachi Data Systems Corp. 2013
  + *
  + * Authors:
  + *  Tomoki Sekiyama   tomoki.sekiy...@hds.com
  + *
   + * This work is licensed under the terms of the GNU GPL, version 2 or later.
   + * See the COPYING file in the top-level directory.
  + */
  +
  +#ifndef VSS_WIN32_H
  +#define VSS_WIN32_H
  +
  +#define __MIDL_user_allocate_free_DEFINED__
   +#include "config-host.h"
   +#include <windows.h>
   +#include <shlwapi.h>
  +
  +/* Reduce warnings to include vss.h */
  +#define __in  IN
  +#define __out OUT
  +#define __RPC_unique_pointer
  +#define __RPC_string
  +#define __RPC__deref_inout_opt
  +#define __RPC__out
  +#ifndef __RPC__out_ecount_part
  +#define __RPC__out_ecount_part(x, y)
  +#endif
  +#define _declspec(x)
  +#undef uuid
  +#define uuid(x)
 
 This looks hacky.  Why are you stubbing out macros that vss.h uses?
 
 Because these macros are internally defined/used by windows SDK's
 headers and not fully covered by mingw (some of them is only
 meaningful with Microsoft IDE), so it don't compile without
 these define's.
 
 This could be better if mingw supports these VSS headers, but
 I have no idea how to achieve that...

Please add a comment explaining this.

BTW, did you look at the macro definitions to see if they define
anything important?  In other words, is there a difference when compiled
using Visual C++ where the macros actually have meaning?

Stefan



Re: [Qemu-devel] [RFC PATCH v3 00/11] qemu-ga: fsfreeze on Windows using VSS

2013-05-24 Thread Stefan Hajnoczi
On Thu, May 23, 2013 at 03:50:19PM +0200, Paolo Bonzini wrote:
  On 23/05/2013 15:28, Stefan Hajnoczi wrote:
  On Thu, May 23, 2013 at 3:20 PM, Paolo Bonzini pbonz...@redhat.com wrote:
   On 23/05/2013 14:15, Stefan Hajnoczi wrote:
  On Tue, May 21, 2013 at 11:33:33AM -0400, Tomoki Sekiyama wrote:
   * How to build & run qemu-ga with VSS support
 
   - Download Microsoft VSS SDK from:
 http://www.microsoft.com/en-us/download/details.aspx?id=23490
 
   - Setup the SDK
 scripts/extract-vsssdk-headers setup.exe (on POSIX-systems)
 
   - Specify installed SDK directory to configure option as:
 ./configure --with-vss-sdk="path/to/VSS SDK" --cross-prefix=i686-w64-mingw32-
 
  Are there any plans to make this more developer-friendly?  In the Linux
  world it's unusual to download third-party SDKs; development headers are
  available as packages from the distro.
 
  I haven't looked at the SDK license but hopefully the VSS headers can be
  added to the mingw cross-build toolchain?
 
  No, the mingw headers are (re)written by hand, or something like that.
  The VSS license does not permit redistribution. :/
  
  Can we follow the same process for VSS as for the other mingw headers?
 
 I have no idea.  Many of them are automatically generated, but some of
 the others have content seemed non-trivial.
 
  TBH I don't see how one can write them by hand without deriving it
  from the SDK, which seems dubious.  But if it worked for the rest of
  mingw (which comes with Windows and DirectX headers, for example) then
  maybe it can work here too.
 
 I'd rather avoid muddy legal waters...

I'll try to figure out mingw's process.  If mingw has a process that is
clean, then let's use it for VSS headers too.

What gives me hope is that mingw headers are shipped by Fedora and
Debian so the legal situation must be somewhat reasonable.

Stefan



Re: [Qemu-devel] 'qemu-nbd' explicit flush

2013-05-24 Thread Stefan Hajnoczi
On Thu, May 23, 2013 at 09:58:31PM +, Mark Trumpold wrote:
 I have a working configuration using the signal approach suggested by Stefan.
 
 'qemu-nbd.c' is patched as follows:
 
 do {
 main_loop_wait(false);
 +   if (sighup_reported) {
 +   sighup_reported = false;
 +   bdrv_drain_all();
 +   bdrv_flush_all();
 +   }
 } while (!sigterm_reported && (persistent || !nbd_started || nb_fds > 0));
 
 The driving script was patched as follows:
 
  mount -o remount,ro /dev/nbd0
  blockdev --flushbufs /dev/nbd0
 +kill -HUP <qemu-nbd process id>
 
 I needed to retain 'blockdev --flushbufs' for things to work.  Seems the 
 'bdrv_flush_all' is flushing what is being missed by the blockdev flush.  I 
 did not go back and retest with 'fsync' or other approaches I had tried before.

Okay, that makes sense:

'blockdev --flushbufs' is writing dirty pages to the NBD device.

bdrv_drain_all() + bdrv_flush_all() ensures that image file writes reach
the physical disk.

One thing to be careful of is whether these operations are asynchronous.
The signal is asynchronous, you have no way of knowing when qemu-nbd is
finished flushing to the physical disk.

I didn't check blockdev(8) but it could be the same there.

So watch out, otherwise your script is timing-dependent and may not
actually have finished flushing when you take the snapshot.
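One way to close that race is an explicit completion notification: the flushing side writes a byte to a pipe (or touches a file) once bdrv_flush_all() has returned, and the script blocks until it arrives. A toy single-process sketch of the pattern follows; none of these names or the pipe mechanism exist in qemu-nbd, this only illustrates the handshake.

```c
#include <unistd.h>
#include <signal.h>
#include <stdbool.h>

/* Request flag set from the signal handler, as in the patch above. */
static volatile sig_atomic_t flush_requested;

static void sighup_handler(int sig)
{
    (void)sig;
    flush_requested = 1;     /* main loop picks this up */
}

/* Main-loop side: perform the flush, then signal completion. */
static void service_flush(int done_fd)
{
    if (flush_requested) {
        flush_requested = 0;
        /* bdrv_drain_all(); bdrv_flush_all(); would run here */
        char byte = 1;
        (void)!write(done_fd, &byte, 1);
    }
}

/* Client side: returns true once the flush has really completed,
 * so the caller is no longer timing-dependent. */
static bool wait_for_flush(int done_fd)
{
    char byte;
    return read(done_fd, &byte, 1) == 1;
}
```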

Stefan



Re: [Qemu-devel] 'qemu-nbd' explicit flush

2013-05-24 Thread Stefan Hajnoczi
On Thu, May 23, 2013 at 11:35:24PM +, Mark Trumpold wrote:
 I had one question I forgot to ask..
 
 Is it possible to switch from '--cache=writeback' functionality
 to '--cache=writethrough' (and visa versa) while qemu-nbd is
 connected to the '/dev/nbdx' device?

No.  The block layer APIs to do that are available in QEMU but qemu-nbd
doesn't have run-time cache mode changing.

Stefan



Re: [Qemu-devel] [RFC PATCH v3 00/11] qemu-ga: fsfreeze on Windows using VSS

2013-05-24 Thread Stefan Hajnoczi
On Thu, May 23, 2013 at 03:50:19PM +0200, Paolo Bonzini wrote:
  On 23/05/2013 15:28, Stefan Hajnoczi wrote:
  On Thu, May 23, 2013 at 3:20 PM, Paolo Bonzini pbonz...@redhat.com wrote:
   On 23/05/2013 14:15, Stefan Hajnoczi wrote:
  On Tue, May 21, 2013 at 11:33:33AM -0400, Tomoki Sekiyama wrote:
   * How to build & run qemu-ga with VSS support
 
   - Download Microsoft VSS SDK from:
 http://www.microsoft.com/en-us/download/details.aspx?id=23490
 
   - Setup the SDK
 scripts/extract-vsssdk-headers setup.exe (on POSIX-systems)
 
   - Specify installed SDK directory to configure option as:
 ./configure --with-vss-sdk="path/to/VSS SDK" --cross-prefix=i686-w64-mingw32-
 
  Are there any plans to make this more developer-friendly?  In the Linux
  world it's unusual to download third-party SDKs; development headers are
  available as packages from the distro.
 
  I haven't looked at the SDK license but hopefully the VSS headers can be
  added to the mingw cross-build toolchain?
 
  No, the mingw headers are (re)written by hand, or something like that.
  The VSS license does not permit redistribution. :/
  
  Can we follow the same process for VSS as for the other mingw headers?
 
 I have no idea.  Many of them are automatically generated, but some of
 the others have content seemed non-trivial.

Here we go:

For missing declarations or import functions in the mingw runtime or
w32api it is essential to specify the source of the documentation on
which you based your patch, or to otherwise provide proof that you have
developed your solution through your own experimental effort. Never
resolve an issue by looking at someone else's work (unless it is public
domain). We only accept patches based on publicly available
documentation, (MSDN is an acceptable source, as is any BSD or similarly
licensed source; no GPL or LGPL licensed source is acceptable, unless
your patch relates to a MinGW package which is already distributed under
one of these licences), or on individual experimental effort, and never
based on others' work, (particularly where that work may be
proprietary). Note, in particular, that we adopt a much stricter
standpoint on extracting information from Microsoft's SDK headers, than
that taken by some other projects, such as Wine or ReactOS; consequently
we will always reject, unconditionally, patches based on information
extracted from files published by such projects.

http://www.mingw.org/wiki/SubmitPatches

So basically they reimplement headers by studying MSDN and maybe
experimenting in Visual C++ without peeking at the official SDK.

This would take extra effort.  It would be a nice thing to have in mingw
but is out of scope here.

Stefan



Re: [Qemu-devel] [PATCH v5 07/11] curl: make use of CURLDataCache.

2013-05-24 Thread Stefan Hajnoczi
On Fri, May 24, 2013 at 11:10:26AM +0800, Fam Zheng wrote:
 On Thu, 05/23 16:23, Stefan Hajnoczi wrote:
  On Thu, May 23, 2013 at 11:38:05AM +0800, Fam Zheng wrote:
   +CURLDataCache *c = s->cache;
size_t realsize = size * nmemb;
   -int i;
   -
   -DPRINTF("CURL: Just reading %zd bytes\n", realsize);
   +CURLAIOCB *acb;

   -if (!s || !s->orig_buf)
   +if (!c || !c->data) {
goto read_end;
   +}
   +if (c->write_pos >= c->data_len) {
   +goto read_end;
   +}
   +memcpy(c->data + c->write_pos, ptr,
   +   MIN(realsize, c->data_len - c->write_pos));
   +c->write_pos += realsize;
   +if (c->write_pos >= c->data_len) {
   +c->write_pos = c->data_len;
   +}

   -memcpy(s->orig_buf + s->buf_off, ptr, realsize);
   -s->buf_off += realsize;
  
  Why did you add MIN(realsize, c->data_len - c->write_pos)?  The original
  code trusts realsize to be within s->orig_buf.
 
 I don't see an evidence why it's safe here. CURL certainly doesn't know
 how much buffer do we have. (man 3 curl_easy_setopt, section
 CURLOPT_WRITEFUNCTION)

The HTTP request included a Range: header so we should know the total
number of bytes we'll receive.

That said, libcurl may not check so this is defensive programming.  A
malicious server shouldn't be able to overflow our buffer.  A comment or
note in the commit description would be nice to explain semantic changes
like this.
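A sketch of what that defensive programming looks like in a CURLOPT_WRITEFUNCTION-shaped callback (man 3 curl_easy_setopt). The struct is a simplified stand-in for CURLDataCache, and clamping plus returning a short count, which makes libcurl abort the transfer, is one reasonable policy rather than necessarily what the patch does.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for CURLDataCache. */
typedef struct {
    char *data;
    size_t data_len;    /* buffer capacity */
    size_t write_pos;   /* bytes received so far */
} DataCache;

/* Never trust the peer to honour the Range: header; clamp every copy
 * to the space left in the cache buffer. */
static size_t cache_write_cb(char *ptr, size_t size, size_t nmemb,
                             void *opaque)
{
    DataCache *c = opaque;
    size_t realsize = size * nmemb;
    size_t space = c->data_len - c->write_pos;
    size_t tocopy = realsize < space ? realsize : space;

    memcpy(c->data + c->write_pos, ptr, tocopy);
    c->write_pos += tocopy;
    /* Returning less than realsize makes libcurl abort the transfer,
     * which is a sane reaction to an overlong response. */
    return tocopy;
}
```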

   @@ -600,29 +596,41 @@ static void curl_readv_bh_cb(void *p)
 // No cache found, so let's start a new request
 state = curl_init_state(s);
 if (!state) {
   -acb->common.cb(acb->common.opaque, -EIO);
   -qemu_aio_release(acb);
   -return;
   +goto err_release;
 }
 
   -acb->start = 0;
   -acb->end = (acb->nb_sectors * SECTOR_SIZE);
   -
   -state->buf_off = 0;
   -if (state->orig_buf)
   -g_free(state->orig_buf);
   -state->buf_start = start;
   -state->buf_len = acb->end + s->readahead_size;
   -end = MIN(start + state->buf_len, s->len) - 1;
   -state->orig_buf = g_malloc(state->buf_len);
   -state->acb[0] = acb;
   -
   -snprintf(state->range, sizeof(state->range) - 1, "%zd-%zd", start, end);
   -DPRINTF("CURL (AIO): Reading %d at %zd (%s)\n",
   -(acb->nb_sectors * SECTOR_SIZE), start, state->range);
   -curl_easy_setopt(state->curl, CURLOPT_RANGE, state->range);
   +cache = g_malloc0(sizeof(CURLDataCache));
   +cache->base_pos = acb->sector_num * SECTOR_SIZE;
   +cache->data_len = aio_bytes + s->readahead_size;
   +cache->write_pos = 0;
   +cache->data = g_malloc(cache->data_len);
 
   +QLIST_INSERT_HEAD(&s->acbs, acb, next);
   +snprintf(state->range, sizeof(state->range) - 1, "%zd-%zd", cache->base_pos,
   + cache->base_pos + cache->data_len);
   +DPRINTF("Reading range: %s\n", state->range);
   +curl_easy_setopt(state->curl, CURLOPT_RANGE, state->range);
   +QLIST_INSERT_HEAD(&s->cache, cache, next);
   +state->cache = cache;
   +cache->use_count++;
  
  I don't see where you bump the use_count when a cache lookup is
  successful.  Maybe I just missed it in the other patches.
 
 Use count is for serving as the receiving buffer for submitted CURL
 requests. It's not necessary to bump use_count when cache lookup is
 successful, since data is immediately copied to guest, no ref to the
 cache hold.

You're right.

Stefan



Re: [Qemu-devel] Designing QMP APIs at KVM Forum

2013-05-24 Thread Stefan Hajnoczi
On Thu, May 23, 2013 at 07:59:17AM -0600, Eric Blake wrote:
 On 05/23/2013 06:46 AM, Luiz Capitulino wrote:
  On Thu, 23 May 2013 13:51:22 +0200
  Stefan Hajnoczi stefa...@gmail.com wrote:
  
  With better QMP introspection on the horizon and work in various
  subsystems pushing QMP boundaries it would be useful to bring together
  the latest best practices for designing QMP APIs.
 
  There are design rules for keeping QMP APIs extensible and for
  allowing clients to detect the presence of features.  There is also
  QEMU-side infrastructure like event rate-limiting, which developers
  should make use of where appropriate.
 
  Is anyone willing to bring together the best practices and present
  them at KVM Forum this year?
  
  I think this is a great idea and I vote for Eric to prepare a presentation.
  Eric is doing an exceptional work on QMP command review, he is also
  experienced on the client side.
 
 Indeed, it looks like I have a good topic for presentation.  Yes, I'll
 take on that task for the KVM forum.

Great, looking forward to your presentation.

Stefan



Re: [Qemu-devel] [PATCH 0/7] VNVRAM persistent storage

2013-05-24 Thread Stefan Hajnoczi
On Thu, May 23, 2013 at 01:44:40PM -0400, Corey Bryant wrote:
 This patch series provides VNVRAM persistent storage support that
 QEMU can use internally.  The initial target user will be a software
 vTPM 1.2 backend that needs to store keys in VNVRAM and be able to
 reboot/migrate and retain the keys.
 
 This support uses QEMU's block driver to provide persistent storage
 by reading/writing VNVRAM data from/to a drive image.  The VNVRAM
 drive image is provided with the -drive command line option just like
 any other drive image and the vnvram_create() API will find it.
 
 The APIs allow for VNVRAM entries to be registered, one at a time,
 each with a maximum blob size.  Entry blobs can then be read/written
 from/to an entry on the drive.  Here's an example of usage:
 
 VNVRAM *vnvram;
 int errcode;
 const VNVRAMEntryName entry_name;
 const char *blob_w = "blob data";
 char *blob_r;
 uint32_t blob_r_size;
 
 vnvram = vnvram_create("drive-ide0-0-0", false, &errcode);
 strcpy((char *)entry_name, "first-entry");

VNVRAMEntryName is very prone to buffer overflow.  I hope real code
doesn't use strcpy().  The cast is ugly, please don't hide the type.
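For illustration, a bounded alternative to the strcpy() above. The 32-byte size and the helper name are assumptions, not the actual VNVRAM API; the point is that the copy is length-checked and the caller learns about truncation.

```c
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

/* Hypothetical fixed-size entry name type. */
typedef char VNVRAMEntryName[32];

/* Bounded copy: returns false (and leaves a truncated, still
 * NUL-terminated name) when src does not fit. */
static bool entry_name_set(VNVRAMEntryName name, const char *src)
{
    int n = snprintf(name, sizeof(VNVRAMEntryName), "%s", src);
    return n >= 0 && (size_t)n < sizeof(VNVRAMEntryName);
}
```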

 vnvram_register_entry(vnvram, entry_name, 1024);
 vnvram_write_entry(vnvram, entry_name, (char *)blob_w, strlen(blob_w)+1);
 vnvram_read_entry(vnvram, entry_name, &blob_r, &blob_r_size);

These are synchronous functions.  If I/O is involved then this is a
problem: QEMU will be blocked waiting for host I/O to complete and the
big QEMU lock is held.  This can cause poor guest interactivity and poor
scalability because vcpus cannot make progress, neither can the QEMU
monitor respond.

Stefan



Re: [Qemu-devel] [PATCH V2 2/5] block: move snapshot code in block.c to block/snapshot.c

2013-05-24 Thread Stefan Hajnoczi
On Thu, May 23, 2013 at 04:47:13PM +0800, Wenchao Xia wrote:
 -int bdrv_is_snapshot(BlockDriverState *bs)
 -{
 -return !!(bs->open_flags & BDRV_O_SNAPSHOT);
 -}

No need to respin, but this function has nothing to do with the other
functions.  This function is about -drive snapshot=on, the others are
about internal snapshots.

It should not go into block/snapshot.c.  At least it should be
documented as having nothing to do with internal snapshots.

Stefan



Re: [Qemu-devel] [PATCH V2 4/5] util: add new function message_printf()

2013-05-24 Thread Stefan Hajnoczi
On Thu, May 23, 2013 at 04:47:15PM +0800, Wenchao Xia wrote:
 This function takes an input parameter *output, which can be specified by
 caller as stderr, stdout or a monitor. error_vprintf() now calls 
 message_vprintf(),
 which is a static function added in this patch.
 
 Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
 ---
  include/qemu/error-report.h |   13 +
  util/qemu-error.c   |   28 ++--
  2 files changed, 39 insertions(+), 2 deletions(-)
 
 diff --git a/include/qemu/error-report.h b/include/qemu/error-report.h
 index c902cc1..cdde78b 100644
 --- a/include/qemu/error-report.h
 +++ b/include/qemu/error-report.h
 @@ -14,6 +14,8 @@
  #define QEMU_ERROR_H
  
  #include <stdarg.h>
 +#include <stdio.h>
 +#include "qemu/typedefs.h"
  
  typedef struct Location {
  /* all members are private to qemu-error.c */
 @@ -32,6 +34,17 @@ void loc_set_none(void);
  void loc_set_cmdline(char **argv, int idx, int cnt);
  void loc_set_file(const char *fname, int lno);
  
 +typedef struct QemuOutput {
 +enum { OUTPUT_STREAM, OUTPUT_MONITOR } kind;
 +union {
 +FILE *stream;
 +Monitor *monitor;
 +};
 +} QemuOutput;
 +
 +void message_printf(const QemuOutput *output, const char *fmt, ...)
 +GCC_FMT_ATTR(2, 3);

This is introducing a slightly different solution for fprintf_function,
which is already widely used:

  $ git grep fprintf_function | wc -l
  101

Please reuse fprintf_function.

Stefan



Re: [Qemu-devel] use O_DIRECT to open disk images for IDE failed under xen-4.1.2 and qemu upstream

2013-05-24 Thread Stefan Hajnoczi
On Fri, May 24, 2013 at 02:59:05AM +, Gonglei (Arei) wrote:
 Hi,
 
  
  On Thu, 23 May 2013, Gonglei (Arei) wrote:
   Hi, all
  
   I use O_DIRECT to open disk images for IDE, but I'm failed. After debug, 
   I get
  the below logs:
   [2013-05-22 23:25:46] ide: CMD=c8
   [2013-05-22 23:25:46] bmdma: readb 0x00 : 0x08
   [2013-05-22 23:25:46] bmdma: writeb 0x00 : 0x09
   [2013-05-22 23:25:46] bmdma_cmd_writeb: 0x0009
   [2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01
   [2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01
   [2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01
   [2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01
   [2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01
   [2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01
   [2013-05-22 23:26:39] bmdma: writeb 0x00 : 0x08
   [2013-05-22 23:26:39] bmdma_cmd_writeb: 0x0008
   [2013-05-22 23:26:56] == offset:0 buf:0x7ff100f21c00 count:512
  aio_offset:0
   [2013-05-22 23:31:30] == offset:0 buf:0x7ff100f21c00 count:512
  aio_offset:0
   [2013-05-22 23:31:30] == handle_aiocb_rw_linear errno: -14
   [2013-05-22 23:31:30] == paio_complete errno=14
   [2013-05-22 23:31:30] == ide_dma_error!!!
   [2013-05-22 23:31:30] ide: read status addr=0x3f6 val=41
  
   QEMU command line :
   qemu-system-i386 -xen-domid 837 -chardev
  socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-837,server,nowait -mon
  chardev=libxl-cmd,mode=control -name suse11 -vnc 0.0.0.0:1 -serial pty -boot
  order=c -usb -usbdevice tablet -smp 2,maxcpus=2 -device
  rtl8139,id=nic0,netdev=net0,mac=00:16:3e:13:d3:72 -netdev
  type=tap,id=net0,ifname=tap837.0,downscript=no -M xenfv -m 2040 -drive
  file=/mnt/sdd/image/suse.image,if=ide,index=0,media=disk,format=raw,cache
  =none
  
   errno 14 shows Bad Address. And I find QEMU_AIO_MISALIGNED flag bit is
  not set through debug.
  
   /*
* If O_DIRECT is used the buffer needs to be aligned on a sector
* boundary.  Check if this is the case or tell the low-level
* driver that it needs to copy the buffer.
*/
   if ((bs->open_flags & BDRV_O_NOCACHE)) {
   if (!bdrv_qiov_is_aligned(bs, qiov)) { //if the address is aligned-512, it will not meet the condition
   type |= QEMU_AIO_MISALIGNED;
   #ifdef CONFIG_LINUX_AIO
   } else if (s->use_aio) {
   return laio_submit(bs, s->aio_ctx, s->fd, sector_num, qiov,
  nb_sectors, cb, opaque, type);
   #endif
  
   Next process:
   static ssize_t handle_aiocb_rw(struct qemu_paiocb *aiocb)
   {
   ssize_t nbytes;
   char *buf;
  
   if (!(aiocb->aio_type & QEMU_AIO_MISALIGNED)) {
   /*
* If there is just a single buffer, and it is properly aligned
* we can just use plain pread/pwrite without any problems.
*/
   if (aiocb->aio_niov == 1)
return handle_aiocb_rw_linear(aiocb,
  aiocb->aio_iov->iov_base); //this way, and reports errno 14 next
  
   Anyone have a good method to resolve this bug? Thanks!
  
  I know that this is not the answer you are looking for but why do you
  want to use O_DIRECT with IDE?
  It should be perfectly safe to use write-back.
 
 A few days ago, I asked a question about the IDE FLUSH time of guest: 
 http://lists.nongnu.org/archive/html/qemu-devel/2013-05/msg02642.html
 
 finally I found that because Qemu use write-back flag to open disk images by 
 default. 
 so I hope to use O_DIRECT to avoid meeting that problem, but I'm failed under 
 Xen platform with Qemu upstream.

cache=none does not eliminate the need for flush.  If you want to do
that, try cache=writethrough or cache=directsync.

Regarding the EFAULT you are seeing, did you check if the I/O buffer
address is valid?  Try breaking in gdb and inspecting /proc/<pid>/maps
or just x <buffer-address> in gdb to see if the buffer is readable.
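The alignment rule behind the EFAULT can be demonstrated with posix_memalign(): with O_DIRECT the buffer (and the offset and length) must be aligned to the device's logical sector size. The 512 here is an assumption; the real requirement depends on the device and can be 4096.

```c
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>

#define SECTOR_SIZE 512   /* assumed logical sector size */

/* O_DIRECT requires the buffer address and length to be sector-aligned. */
static bool buf_is_aligned(const void *buf, size_t len)
{
    return ((uintptr_t)buf % SECTOR_SIZE) == 0 && (len % SECTOR_SIZE) == 0;
}

/* Allocate a buffer safe to hand to pread()/pwrite() on an O_DIRECT fd. */
static void *alloc_dio_buffer(size_t len)
{
    void *buf = NULL;
    if (posix_memalign(&buf, SECTOR_SIZE, len) != 0) {
        return NULL;
    }
    return buf;
}
```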

Stefan



Re: [Qemu-devel] [PATCH 0/7] VNVRAM persistent storage

2013-05-24 Thread Stefan Hajnoczi
On Fri, May 24, 2013 at 08:13:27AM -0400, Stefan Berger wrote:
 On 05/24/2013 05:59 AM, Stefan Hajnoczi wrote:
 On Thu, May 23, 2013 at 01:44:40PM -0400, Corey Bryant wrote:
 This patch series provides VNVRAM persistent storage support that
 QEMU can use internally.  The initial target user will be a software
 vTPM 1.2 backend that needs to store keys in VNVRAM and be able to
 reboot/migrate and retain the keys.
 
 This support uses QEMU's block driver to provide persistent storage
 by reading/writing VNVRAM data from/to a drive image.  The VNVRAM
 drive image is provided with the -drive command line option just like
 any other drive image and the vnvram_create() API will find it.
 
 The APIs allow for VNVRAM entries to be registered, one at a time,
 each with a maximum blob size.  Entry blobs can then be read/written
 from/to an entry on the drive.  Here's an example of usage:
 
 VNVRAM *vnvram;
 int errcode;
 const VNVRAMEntryName entry_name;
 const char *blob_w = "blob data";
 char *blob_r;
 uint32_t blob_r_size;
 
 vnvram = vnvram_create("drive-ide0-0-0", false, &errcode);
 strcpy((char *)entry_name, "first-entry");
 VNVRAMEntryName is very prone to buffer overflow.  I hope real code
 doesn't use strcpy().  The cast is ugly, please don't hide the type.
 
 vnvram_register_entry(vnvram, entry_name, 1024);
 vnvram_write_entry(vnvram, entry_name, (char *)blob_w, strlen(blob_w)+1);
 vnvram_read_entry(vnvram, entry_name, &blob_r, &blob_r_size);
 These are synchronous functions.  If I/O is involved then this is a
 problem: QEMU will be blocked waiting for host I/O to complete and the
 big QEMU lock is held.  This can cause poor guest interactivity and poor
 scalability because vcpus cannot make progress, neither can the QEMU
 monitor respond.
 
 The vTPM is going to run as a thread and will have to write state
 blobs into a bdrv. The above functions will typically be called from
 this thread. When I originally wrote the code, the vTPM thread could
 not write the blobs into bdrv directly, so I had to resort to
 sending a message to the main QEMU thread to write the data to the
 bdrv. How else could we do this?

How else: use asynchronous APIs like bdrv_aio_writev() or the coroutine
versions (which eliminate the need for callbacks) like bdrv_co_writev().

I'm preparing patches that allow the QEMU block layer to be used safely
outside the QEMU global mutex.  Once this is possible it would be okay
to use synchronous methods.

Stefan



[Qemu-devel] [PULL 00/11] Block patches

2013-05-24 Thread Stefan Hajnoczi
The following changes since commit 64afc2b4d48fb21e085517c38a59a3f61a11283c:

  Merge remote-tracking branch 'luiz/queue/qmp' into staging (2013-05-23 
14:16:35 -0500)

are available in the git repository at:


  git://github.com/stefanha/qemu.git block

for you to fetch changes up to 02ffb504485f0920cfc75a0982a602f824a9a4f4:

  coroutine: stop using AioContext in CoQueue (2013-05-24 16:17:56 +0200)


Kevin Wolf (4):
  blockdev: Rename BlockdevAction - TransactionAction
  qemu-io: Fix 'map' output
  qcow2.py: Subcommand for changing header fields
  qemu-iotests: Try creating huge qcow2 image

Stefan Hajnoczi (2):
  coroutine: protect global pool with a mutex
  coroutine: stop using AioContext in CoQueue

Wenchao Xia (5):
  block: package preparation code in qmp_transaction()
  block: move input parsing code in qmp_transaction()
  block: package committing code in qmp_transaction()
  block: package rollback code in qmp_transaction()
  block: make all steps in qmp_transaction() as callback

 blockdev.c| 280 ++
 include/block/coroutine_int.h |   4 +
 qapi-schema.json  |  21 ++--
 qemu-coroutine-lock.c |  56 -
 qemu-coroutine.c  |  23 +++-
 qemu-io.c |  46 ++-
 tests/qemu-iotests/054|  58 +
 tests/qemu-iotests/054.out|  10 ++
 tests/qemu-iotests/common.rc  |   2 +-
 tests/qemu-iotests/group  |   1 +
 tests/qemu-iotests/qcow2.py   |  17 +++
 trace-events  |   2 +-
 12 files changed, 361 insertions(+), 159 deletions(-)
 create mode 100755 tests/qemu-iotests/054
 create mode 100644 tests/qemu-iotests/054.out

-- 
1.8.1.4




[Qemu-devel] [PATCH 02/11] block: move input parsing code in qmp_transaction()

2013-05-24 Thread Stefan Hajnoczi
From: Wenchao Xia xiaw...@linux.vnet.ibm.com

The code is moved into the preparation function, and changed
a bit to show more clearly what it is doing.

Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
Reviewed-by: Kevin Wolf kw...@redhat.com
Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
Signed-off-by: Kevin Wolf kw...@redhat.com
---
 blockdev.c | 38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 1cb9640..0115f46 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -785,10 +785,7 @@ typedef struct BlkTransactionStates {
 QSIMPLEQ_ENTRY(BlkTransactionStates) entry;
 } BlkTransactionStates;
 
-static void external_snapshot_prepare(const char *device,
-  const char *format,
-  const char *new_image_file,
-  enum NewImageMode mode,
+static void external_snapshot_prepare(BlockdevAction *action,
   BlkTransactionStates *states,
   Error **errp)
 {
@@ -796,7 +793,24 @@ static void external_snapshot_prepare(const char *device,
 BlockDriver *drv;
 int flags, ret;
 Error *local_err = NULL;
+const char *device;
+const char *new_image_file;
+const char *format = qcow2;
+enum NewImageMode mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
 
+/* get parameters */
+g_assert(action->kind == BLOCKDEV_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC);
+
+device = action->blockdev_snapshot_sync->device;
+new_image_file = action->blockdev_snapshot_sync->snapshot_file;
+if (action->blockdev_snapshot_sync->has_format) {
+format = action->blockdev_snapshot_sync->format;
+}
+if (action->blockdev_snapshot_sync->has_mode) {
+mode = action->blockdev_snapshot_sync->mode;
+}
+
+/* start processing */
 drv = bdrv_find_format(format);
 if (!drv) {
 error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
@@ -877,10 +891,6 @@ void qmp_transaction(BlockdevActionList *dev_list, Error 
**errp)
 /* We don't do anything in this loop that commits us to the snapshot */
 while (NULL != dev_entry) {
 BlockdevAction *dev_info = NULL;
-enum NewImageMode mode;
-const char *new_image_file;
-const char *device;
-const char *format = qcow2;
 
 dev_info = dev_entry-value;
 dev_entry = dev_entry-next;
@@ -890,17 +900,7 @@ void qmp_transaction(BlockdevActionList *dev_list, Error 
**errp)
 
 switch (dev_info-kind) {
 case BLOCKDEV_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC:
-device = dev_info->blockdev_snapshot_sync->device;
-if (!dev_info->blockdev_snapshot_sync->has_mode) {
-dev_info->blockdev_snapshot_sync->mode = 
NEW_IMAGE_MODE_ABSOLUTE_PATHS;
-}
-new_image_file = dev_info->blockdev_snapshot_sync->snapshot_file;
-if (dev_info->blockdev_snapshot_sync->has_format) {
-format = dev_info->blockdev_snapshot_sync->format;
-}
-mode = dev_info->blockdev_snapshot_sync->mode;
-external_snapshot_prepare(device, format, new_image_file,
-  mode, states, &local_err);
+external_snapshot_prepare(dev_info, states, errp);
 if (error_is_set(&local_err)) {
 error_propagate(errp, local_err);
 goto delete_and_fail;
-- 
1.8.1.4




[Qemu-devel] [PATCH 03/11] block: package committing code in qmp_transaction()

2013-05-24 Thread Stefan Hajnoczi
From: Wenchao Xia xiaw...@linux.vnet.ibm.com

The code is simply moved into a separate function.

Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
Reviewed-by: Kevin Wolf kw...@redhat.com
Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
Signed-off-by: Kevin Wolf kw...@redhat.com
---
 blockdev.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 0115f46..76f0532 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -871,6 +871,17 @@ static void external_snapshot_prepare(BlockdevAction 
*action,
 }
 }
 
+static void external_snapshot_commit(BlkTransactionStates *states)
+{
+/* This removes our old bs from the bdrv_states, and adds the new bs */
+bdrv_append(states->new_bs, states->old_bs);
+/* We don't need (or want) to use the transactional
+ * bdrv_reopen_multiple() across all the entries at once, because we
+ * don't want to abort all of them if one of them fails the reopen */
+bdrv_reopen(states->new_bs, states->new_bs->open_flags & ~BDRV_O_RDWR,
+NULL);
+}
+
 /*
  * 'Atomic' group snapshots.  The snapshots are taken as a set, and if any fail
  *  then we do not pivot any of the devices in the group, and abandon the
@@ -916,13 +927,7 @@ void qmp_transaction(BlockdevActionList *dev_list, Error 
**errp)
 /* Now we are going to do the actual pivot.  Everything up to this point
  * is reversible, but we are committed at this point */
 QSIMPLEQ_FOREACH(states, snap_bdrv_states, entry) {
-/* This removes our old bs from the bdrv_states, and adds the new bs */
-bdrv_append(states->new_bs, states->old_bs);
-/* We don't need (or want) to use the transactional
- * bdrv_reopen_multiple() across all the entries at once, because we
- * don't want to abort all of them if one of them fails the reopen */
-bdrv_reopen(states->new_bs, states->new_bs->open_flags & ~BDRV_O_RDWR,
-NULL);
+external_snapshot_commit(states);
 }
 
 /* success */
-- 
1.8.1.4




[Qemu-devel] [PATCH 07/11] qemu-io: Fix 'map' output

2013-05-24 Thread Stefan Hajnoczi
From: Kevin Wolf kw...@redhat.com

The output of the 'map' command in qemu-io used to directly resemble
bdrv_is_allocated() and could contain many lines for small chunks that
all have the same allocation status. After this patch, they will be
coalesced into a single output line for a large chunk.

As a side effect, the command gains some error handling.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 qemu-io.c | 46 +-
 1 file changed, 41 insertions(+), 5 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index 475a8bd..5e6680b 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -1635,12 +1635,43 @@ static const cmdinfo_t alloc_cmd = {
 .oneline= "checks if a sector is present in the file",
 };
 
+
+static int map_is_allocated(int64_t sector_num, int64_t nb_sectors, int64_t 
*pnum)
+{
+int num, num_checked;
+int ret, firstret;
+
+num_checked = MIN(nb_sectors, INT_MAX);
+ret = bdrv_is_allocated(bs, sector_num, num_checked, &num);
+if (ret < 0) {
+return ret;
+}
+
+firstret = ret;
+*pnum = num;
+
+while (nb_sectors > 0 && ret == firstret) {
+sector_num += num;
+nb_sectors -= num;
+
+num_checked = MIN(nb_sectors, INT_MAX);
+ret = bdrv_is_allocated(bs, sector_num, num_checked, &num);
+if (ret == firstret) {
+*pnum += num;
+} else {
+break;
+}
+}
+
+return firstret;
+}
+
 static int map_f(int argc, char **argv)
 {
 int64_t offset;
 int64_t nb_sectors;
 char s1[64];
-int num, num_checked;
+int64_t num;
 int ret;
 const char *retstr;
 
@@ -1648,12 +1679,17 @@ static int map_f(int argc, char **argv)
 nb_sectors = bs-total_sectors;
 
 do {
-num_checked = MIN(nb_sectors, INT_MAX);
-ret = bdrv_is_allocated(bs, offset, num_checked, &num);
+ret = map_is_allocated(offset, nb_sectors, &num);
+if (ret < 0) {
+error_report("Failed to get allocation status: %s", 
strerror(-ret));
+return 0;
+}
+
 retstr = ret ? "allocated" : "not allocated";
 cvtstr(offset << 9ULL, s1, sizeof(s1));
-printf("[% 24" PRId64 "] % 8d/% 8d sectors %s at offset %s (%d)\n",
-   offset << 9ULL, num, num_checked, retstr, s1, ret);
+printf("[% 24" PRId64 "] % 8" PRId64 "/% 8" PRId64 " sectors %s "
+   "at offset %s (%d)\n",
+   offset << 9ULL, num, nb_sectors, retstr, s1, ret);
 
 offset += num;
 nb_sectors -= num;
-- 
1.8.1.4




[Qemu-devel] [PATCH 05/11] block: make all steps in qmp_transaction() as callback

2013-05-24 Thread Stefan Hajnoczi
From: Wenchao Xia xiaw...@linux.vnet.ibm.com

Make it easier to add other operations to qmp_transaction() by using
callbacks, with external snapshots serving as an example implementation
of the callbacks.

Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
Reviewed-by: Kevin Wolf kw...@redhat.com
Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
Signed-off-by: Kevin Wolf kw...@redhat.com
---
 blockdev.c | 95 ++
 1 file changed, 71 insertions(+), 24 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 2131e13..617501c 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -779,14 +779,43 @@ void qmp_blockdev_snapshot_sync(const char *device, const 
char *snapshot_file,
 
 
 /* New and old BlockDriverState structs for group snapshots */
-typedef struct BlkTransactionStates {
+
+typedef struct BlkTransactionStates BlkTransactionStates;
+
+/* Only prepare() may fail. In a single transaction, only one of commit() or
+   abort() will be called; clean() will always be called if it is present. */
+typedef struct BdrvActionOps {
+/* Size of state struct, in bytes. */
+size_t instance_size;
+/* Prepare the work, must NOT be NULL. */
+void (*prepare)(BlkTransactionStates *common, Error **errp);
+/* Commit the changes, must NOT be NULL. */
+void (*commit)(BlkTransactionStates *common);
+/* Abort the changes on fail, can be NULL. */
+void (*abort)(BlkTransactionStates *common);
+/* Clean up resource in the end, can be NULL. */
+void (*clean)(BlkTransactionStates *common);
+} BdrvActionOps;
+
+/*
+ * This structure must be arranged as first member in child type, assuming
+ * that compiler will also arrange it to the same address with parent instance.
+ * Later it will be used in free().
+ */
+struct BlkTransactionStates {
+BlockdevAction *action;
+const BdrvActionOps *ops;
+QSIMPLEQ_ENTRY(BlkTransactionStates) entry;
+};
+
+/* external snapshot private data */
+typedef struct ExternalSnapshotStates {
+BlkTransactionStates common;
 BlockDriverState *old_bs;
 BlockDriverState *new_bs;
-QSIMPLEQ_ENTRY(BlkTransactionStates) entry;
-} BlkTransactionStates;
+} ExternalSnapshotStates;
 
-static void external_snapshot_prepare(BlockdevAction *action,
-  BlkTransactionStates *states,
+static void external_snapshot_prepare(BlkTransactionStates *common,
   Error **errp)
 {
 BlockDriver *proto_drv;
@@ -797,6 +826,9 @@ static void external_snapshot_prepare(BlockdevAction 
*action,
 const char *new_image_file;
 const char *format = qcow2;
 enum NewImageMode mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
+ExternalSnapshotStates *states =
+ DO_UPCAST(ExternalSnapshotStates, common, common);
+BlockdevAction *action = common->action;
 
 /* get parameters */
+g_assert(action->kind == BLOCKDEV_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC);
@@ -871,8 +903,11 @@ static void external_snapshot_prepare(BlockdevAction 
*action,
 }
 }
 
-static void external_snapshot_commit(BlkTransactionStates *states)
+static void external_snapshot_commit(BlkTransactionStates *common)
 {
+ExternalSnapshotStates *states =
+ DO_UPCAST(ExternalSnapshotStates, common, common);
+
 /* This removes our old bs from the bdrv_states, and adds the new bs */
 bdrv_append(states->new_bs, states->old_bs);
 /* We don't need (or want) to use the transactional
@@ -882,13 +917,24 @@ static void external_snapshot_commit(BlkTransactionStates 
*states)
 NULL);
 }
 
-static void external_snapshot_abort(BlkTransactionStates *states)
+static void external_snapshot_abort(BlkTransactionStates *common)
 {
+ExternalSnapshotStates *states =
+ DO_UPCAST(ExternalSnapshotStates, common, common);
 if (states->new_bs) {
 bdrv_delete(states->new_bs);
 }
 }
 
+static const BdrvActionOps actions[] = {
+[BLOCKDEV_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC] = {
+.instance_size = sizeof(ExternalSnapshotStates),
+.prepare  = external_snapshot_prepare,
+.commit   = external_snapshot_commit,
+.abort = external_snapshot_abort,
+},
+};
+
 /*
  * 'Atomic' group snapshots.  The snapshots are taken as a set, and if any fail
  *  then we do not pivot any of the devices in the group, and abandon the
@@ -909,32 +955,28 @@ void qmp_transaction(BlockdevActionList *dev_list, Error 
**errp)
 /* We don't do anything in this loop that commits us to the snapshot */
 while (NULL != dev_entry) {
 BlockdevAction *dev_info = NULL;
+const BdrvActionOps *ops;
 
 dev_info = dev_entry-value;
 dev_entry = dev_entry-next;
 
-states = g_malloc0(sizeof(BlkTransactionStates));
+assert(dev_info->kind < ARRAY_SIZE(actions));
+
+ops = &actions[dev_info->kind

[Qemu-devel] [PATCH 09/11] qemu-iotests: Try creating huge qcow2 image

2013-05-24 Thread Stefan Hajnoczi
From: Kevin Wolf kw...@redhat.com

It's supposed to fail gracefully instead of segfaulting.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 tests/qemu-iotests/054   | 58 
 tests/qemu-iotests/054.out   | 10 
 tests/qemu-iotests/common.rc |  2 +-
 tests/qemu-iotests/group |  1 +
 4 files changed, 70 insertions(+), 1 deletion(-)
 create mode 100755 tests/qemu-iotests/054
 create mode 100644 tests/qemu-iotests/054.out

diff --git a/tests/qemu-iotests/054 b/tests/qemu-iotests/054
new file mode 100755
index 000..b360429
--- /dev/null
+++ b/tests/qemu-iotests/054
@@ -0,0 +1,58 @@
+#!/bin/bash
+#
+# Test huge qcow2 images
+#
+# Copyright (C) 2013 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see http://www.gnu.org/licenses/.
+#
+
+# creator
+owner=kw...@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto generic
+_supported_os Linux
+
+echo
+echo "creating too large image (1 EB)"
+_make_test_img $((1024*1024))T
+
+echo
+echo "creating too large image (1 EB) using qcow2.py"
+_make_test_img 4G
+./qcow2.py $TEST_IMG set-header size $((1024 ** 6))
+_check_test_img
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/054.out b/tests/qemu-iotests/054.out
new file mode 100644
index 000..0b2fe30
--- /dev/null
+++ b/tests/qemu-iotests/054.out
@@ -0,0 +1,10 @@
+QA output created by 054
+
+creating too large image (1 EB)
+qemu-img: The image size is too large for file format 'qcow2'
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1152921504606846976 
+
+creating too large image (1 EB) using qcow2.py
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4294967296 
+qemu-img: Could not open 'TEST_DIR/t.qcow2': File too large
+*** done
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 31eb62b..e9ba358 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -167,7 +167,7 @@ _cleanup_test_img()
 
 _check_test_img()
 {
-$QEMU_IMG check $@ -f $IMGFMT $TEST_IMG 2>&1 | \
+$QEMU_IMG check $@ -f $IMGFMT $TEST_IMG 2>&1 | _filter_testdir | \
 sed -e '/allocated.*fragmented.*compressed clusters/d' \
 -e 's/qemu-img: This image format does not support checks/No 
errors were found on the image./' \
 -e '/Image end offset: [0-9]\+/d'
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index bf944d9..387b050 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -60,3 +60,4 @@
 #051 rw auto
 052 rw auto backing
 053 rw auto
+054 rw auto
-- 
1.8.1.4




[Qemu-devel] [PATCH 08/11] qcow2.py: Subcommand for changing header fields

2013-05-24 Thread Stefan Hajnoczi
From: Kevin Wolf kw...@redhat.com

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 tests/qemu-iotests/qcow2.py | 17 +
 1 file changed, 17 insertions(+)

diff --git a/tests/qemu-iotests/qcow2.py b/tests/qemu-iotests/qcow2.py
index fecf5b9..44a2b45 100755
--- a/tests/qemu-iotests/qcow2.py
+++ b/tests/qemu-iotests/qcow2.py
@@ -149,6 +149,22 @@ def cmd_dump_header(fd):
 h.dump()
 h.dump_extensions()
 
+def cmd_set_header(fd, name, value):
+    try:
+        value = int(value, 0)
+    except:
+        print "'%s' is not a valid number" % value
+        sys.exit(1)
+
+    fields = (field[2] for field in QcowHeader.fields)
+    if not name in fields:
+        print "'%s' is not a known header field" % name
+        sys.exit(1)
+
+    h = QcowHeader(fd)
+    h.__dict__[name] = value
+    h.update(fd)
+
 def cmd_add_header_ext(fd, magic, data):
 try:
 magic = int(magic, 0)
@@ -205,6 +221,7 @@ def cmd_set_feature_bit(fd, group, bit):
 
 cmds = [
 [ 'dump-header',cmd_dump_header,0, 'Dump image header and header 
extensions' ],
+[ 'set-header', cmd_set_header, 2, 'Set a field in the header'],
 [ 'add-header-ext', cmd_add_header_ext, 2, 'Add a header extension' ],
 [ 'del-header-ext', cmd_del_header_ext, 1, 'Delete a header extension' ],
 [ 'set-feature-bit', cmd_set_feature_bit, 2, 'Set a feature bit'],
-- 
1.8.1.4




[Qemu-devel] [PATCH 06/11] blockdev: Rename BlockdevAction - TransactionAction

2013-05-24 Thread Stefan Hajnoczi
From: Kevin Wolf kw...@redhat.com

There's no reason to restrict transactions to operations related to
block devices, so rename the type now before schema introspection stops
us from doing so.

Also change the schema documentation of 'transaction' to not refer to
block devices or snapshots any more.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 blockdev.c   | 22 +++---
 qapi-schema.json | 21 ++---
 2 files changed, 21 insertions(+), 22 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 617501c..d1ec99a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -750,8 +750,8 @@ void do_commit(Monitor *mon, const QDict *qdict)
 
 static void blockdev_do_action(int kind, void *data, Error **errp)
 {
-BlockdevAction action;
-BlockdevActionList list;
+TransactionAction action;
+TransactionActionList list;
 
 action.kind = kind;
 action.data = data;
@@ -773,8 +773,8 @@ void qmp_blockdev_snapshot_sync(const char *device, const 
char *snapshot_file,
 .has_mode = has_mode,
 .mode = mode,
 };
-blockdev_do_action(BLOCKDEV_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC, &snapshot,
-   errp);
+blockdev_do_action(TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC,
+   &snapshot, errp);
 }
 
 
@@ -803,7 +803,7 @@ typedef struct BdrvActionOps {
  * Later it will be used in free().
  */
 struct BlkTransactionStates {
-BlockdevAction *action;
+TransactionAction *action;
 const BdrvActionOps *ops;
 QSIMPLEQ_ENTRY(BlkTransactionStates) entry;
 };
@@ -828,10 +828,10 @@ static void 
external_snapshot_prepare(BlkTransactionStates *common,
 enum NewImageMode mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
 ExternalSnapshotStates *states =
  DO_UPCAST(ExternalSnapshotStates, common, common);
-BlockdevAction *action = common->action;
+TransactionAction *action = common->action;
 
 /* get parameters */
-g_assert(action->kind == BLOCKDEV_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC);
+g_assert(action->kind == TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC);
 
 device = action->blockdev_snapshot_sync->device;
 new_image_file = action->blockdev_snapshot_sync->snapshot_file;
@@ -927,7 +927,7 @@ static void external_snapshot_abort(BlkTransactionStates 
*common)
 }
 
 static const BdrvActionOps actions[] = {
-[BLOCKDEV_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC] = {
+[TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC] = {
 .instance_size = sizeof(ExternalSnapshotStates),
 .prepare  = external_snapshot_prepare,
 .commit   = external_snapshot_commit,
@@ -940,9 +940,9 @@ static const BdrvActionOps actions[] = {
  *  then we do not pivot any of the devices in the group, and abandon the
  *  snapshots
  */
-void qmp_transaction(BlockdevActionList *dev_list, Error **errp)
+void qmp_transaction(TransactionActionList *dev_list, Error **errp)
 {
-BlockdevActionList *dev_entry = dev_list;
+TransactionActionList *dev_entry = dev_list;
 BlkTransactionStates *states, *next;
 Error *local_err = NULL;
 
@@ -954,7 +954,7 @@ void qmp_transaction(BlockdevActionList *dev_list, Error 
**errp)
 
 /* We don't do anything in this loop that commits us to the snapshot */
 while (NULL != dev_entry) {
-BlockdevAction *dev_info = NULL;
+TransactionAction *dev_info = NULL;
 const BdrvActionOps *ops;
 
 dev_info = dev_entry-value;
diff --git a/qapi-schema.json b/qapi-schema.json
index 664b31f..ef1f657 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1609,12 +1609,12 @@
 '*mode': 'NewImageMode' } }
 
 ##
-# @BlockdevAction
+# @TransactionAction
 #
 # A discriminated record of operations that can be performed with
 # @transaction.
 ##
-{ 'union': 'BlockdevAction',
+{ 'union': 'TransactionAction',
   'data': {
'blockdev-snapshot-sync': 'BlockdevSnapshot'
} }
@@ -1622,25 +1622,24 @@
 ##
 # @transaction
 #
-# Atomically operate on a group of one or more block devices.  If
-# any operation fails, then the entire set of actions will be
-# abandoned and the appropriate error returned.  The only operation
-# supported is currently blockdev-snapshot-sync.
+# Executes a number of transactionable QMP commands atomically. If any
+# operation fails, then the entire set of actions will be abandoned and the
+# appropriate error returned.
 #
 #  List of:
-#  @BlockdevAction: information needed for the device snapshot
+#  @TransactionAction: information needed for the respective operation
 #
 # Returns: nothing on success
-#  If @device is not a valid block device, DeviceNotFound
+#  Errors depend on the operations of the transaction
 #
-# Note: The transaction aborts on the first failure.  Therefore, there will
-# be only one device or snapshot file returned in an error condition, and
+# Note: The transaction aborts on the first failure.  Therefore, there will be
+# information on only one failed operation 

[Qemu-devel] [PATCH 10/11] coroutine: protect global pool with a mutex

2013-05-24 Thread Stefan Hajnoczi
The coroutine freelist is a global pool of unused coroutines.  It avoids
the setup/teardown overhead associated with the coroutine lifecycle.
Since the pool is global, we need to synchronize access so that
coroutines can be used outside the BQL.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 qemu-coroutine.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/qemu-coroutine.c b/qemu-coroutine.c
index 25a14c6..60ac79e 100644
--- a/qemu-coroutine.c
+++ b/qemu-coroutine.c
@@ -14,6 +14,7 @@
 
 #include "trace.h"
 #include "qemu-common.h"
+#include "qemu/thread.h"
 #include "block/coroutine.h"
 #include "block/coroutine_int.h"
 
@@ -23,6 +24,7 @@ enum {
 };
 
 /** Free list to speed up creation */
+static QemuMutex pool_lock;
 static QSLIST_HEAD(, Coroutine) pool = QSLIST_HEAD_INITIALIZER(pool);
 static unsigned int pool_size;
 
@@ -30,11 +32,15 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
 {
 Coroutine *co;
 
+qemu_mutex_lock(&pool_lock);
 co = QSLIST_FIRST(&pool);
 if (co) {
 QSLIST_REMOVE_HEAD(&pool, pool_next);
 pool_size--;
-} else {
+}
+qemu_mutex_unlock(&pool_lock);
+
+if (!co) {
 co = qemu_coroutine_new();
 }
 
@@ -44,17 +50,25 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
 
 static void coroutine_delete(Coroutine *co)
 {
+qemu_mutex_lock(&pool_lock);
 if (pool_size < POOL_MAX_SIZE) {
 QSLIST_INSERT_HEAD(&pool, co, pool_next);
 co->caller = NULL;
 pool_size++;
+qemu_mutex_unlock(&pool_lock);
 return;
 }
+qemu_mutex_unlock(&pool_lock);
 
 qemu_coroutine_delete(co);
 }
 
-static void __attribute__((destructor)) coroutine_cleanup(void)
+static void __attribute__((constructor)) coroutine_pool_init(void)
+{
+qemu_mutex_init(pool_lock);
+}
+
+static void __attribute__((destructor)) coroutine_pool_cleanup(void)
 {
 Coroutine *co;
 Coroutine *tmp;
@@ -63,6 +77,8 @@ static void __attribute__((destructor)) 
coroutine_cleanup(void)
 QSLIST_REMOVE_HEAD(&pool, pool_next);
 qemu_coroutine_delete(co);
 }
+
+qemu_mutex_destroy(&pool_lock);
 }
 
 static void coroutine_swap(Coroutine *from, Coroutine *to)
-- 
1.8.1.4




[Qemu-devel] [PATCH 01/11] block: package preparation code in qmp_transaction()

2013-05-24 Thread Stefan Hajnoczi
From: Wenchao Xia xiaw...@linux.vnet.ibm.com

The code that runs before really committing is moved into a function. Most
code is simply moved from qmp_transaction(), except that on failure it
now just returns. Other code, such as input parsing, is not touched,
to make review easier.

Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
Reviewed-by: Kevin Wolf kw...@redhat.com
Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
Signed-off-by: Kevin Wolf kw...@redhat.com
---
 blockdev.c | 139 ++---
 1 file changed, 77 insertions(+), 62 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 7c9d8dd..1cb9640 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -785,6 +785,78 @@ typedef struct BlkTransactionStates {
 QSIMPLEQ_ENTRY(BlkTransactionStates) entry;
 } BlkTransactionStates;
 
+static void external_snapshot_prepare(const char *device,
+  const char *format,
+  const char *new_image_file,
+  enum NewImageMode mode,
+  BlkTransactionStates *states,
+  Error **errp)
+{
+BlockDriver *proto_drv;
+BlockDriver *drv;
+int flags, ret;
+Error *local_err = NULL;
+
+drv = bdrv_find_format(format);
+if (!drv) {
+error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
+return;
+}
+
+states->old_bs = bdrv_find(device);
+if (!states->old_bs) {
+error_set(errp, QERR_DEVICE_NOT_FOUND, device);
+return;
+}
+
+if (!bdrv_is_inserted(states->old_bs)) {
+error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
+return;
+}
+
+if (bdrv_in_use(states->old_bs)) {
+error_set(errp, QERR_DEVICE_IN_USE, device);
+return;
+}
+
+if (!bdrv_is_read_only(states->old_bs)) {
+if (bdrv_flush(states->old_bs)) {
+error_set(errp, QERR_IO_ERROR);
+return;
+}
+}
+
+flags = states->old_bs->open_flags;
+
+proto_drv = bdrv_find_protocol(new_image_file);
+if (!proto_drv) {
+error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
+return;
+}
+
+/* create new image w/backing file */
+if (mode != NEW_IMAGE_MODE_EXISTING) {
+bdrv_img_create(new_image_file, format,
+states->old_bs->filename,
+states->old_bs->drv->format_name,
+NULL, -1, flags, &local_err, false);
+if (error_is_set(&local_err)) {
+error_propagate(errp, local_err);
+return;
+}
+}
+
+/* We will manually add the backing_hd field to the bs later */
+states->new_bs = bdrv_new();
+/* TODO Inherit bs->options or only take explicit options with an
+ * extended QMP command? */
+ret = bdrv_open(states->new_bs, new_image_file, NULL,
+flags | BDRV_O_NO_BACKING, drv);
+if (ret != 0) {
+error_set(errp, QERR_OPEN_FILE_FAILED, new_image_file);
+}
+}
+
 /*
  * 'Atomic' group snapshots.  The snapshots are taken as a set, and if any fail
  *  then we do not pivot any of the devices in the group, and abandon the
@@ -792,7 +864,6 @@ typedef struct BlkTransactionStates {
  */
 void qmp_transaction(BlockdevActionList *dev_list, Error **errp)
 {
-int ret = 0;
 BlockdevActionList *dev_entry = dev_list;
 BlkTransactionStates *states, *next;
 Error *local_err = NULL;
@@ -806,9 +877,6 @@ void qmp_transaction(BlockdevActionList *dev_list, Error 
**errp)
 /* We don't do anything in this loop that commits us to the snapshot */
 while (NULL != dev_entry) {
 BlockdevAction *dev_info = NULL;
-BlockDriver *proto_drv;
-BlockDriver *drv;
-int flags;
 enum NewImageMode mode;
 const char *new_image_file;
 const char *device;
@@ -831,70 +899,17 @@ void qmp_transaction(BlockdevActionList *dev_list, Error 
**errp)
 format = dev_info-blockdev_snapshot_sync-format;
 }
 mode = dev_info-blockdev_snapshot_sync-mode;
-break;
-default:
-abort();
-}
-
-drv = bdrv_find_format(format);
-if (!drv) {
-error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
-goto delete_and_fail;
-}
-
-states->old_bs = bdrv_find(device);
-if (!states->old_bs) {
-error_set(errp, QERR_DEVICE_NOT_FOUND, device);
-goto delete_and_fail;
-}
-
-if (!bdrv_is_inserted(states->old_bs)) {
-error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
-goto delete_and_fail;
-}
-
-if (bdrv_in_use(states->old_bs)) {
-error_set(errp, QERR_DEVICE_IN_USE, device);
-goto delete_and_fail;
-}
-
-if (!bdrv_is_read_only(states->old_bs

[Qemu-devel] [PATCH 2/2] rtl8139: flush queued packets when RxBufPtr is written

2013-05-24 Thread Stefan Hajnoczi
Net queues support efficient receive disable.  For example, tap's file
descriptor will not be polled while its peer has receive disabled.  This
saves CPU cycles for needlessly copying and then dropping packets which
the peer cannot receive.

rtl8139 is missing the qemu_flush_queued_packets() call that wakes the
queue up when receive becomes possible again.

As a result, the Windows 7 guest driver reaches a state where the
rtl8139 cannot receive packets.  The driver has actually refilled the
receive buffer but we never resume reception.

The bug can be reproduced by running a large FTP 'get' inside a Windows
7 guest:

  $ qemu -netdev tap,id=tap0,...
 -device rtl8139,netdev=tap0

The Linux guest driver does not trigger the bug, probably due to a
different buffer management strategy.

Reported-by: Oliver Francke oliver.fran...@filoo.de
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/net/rtl8139.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
index 9369507..7993f9f 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -2575,6 +2575,9 @@ static void rtl8139_RxBufPtr_write(RTL8139State *s, 
uint32_t val)
 /* this value is off by 16 */
s->RxBufPtr = MOD2(val + 0x10, s->RxBufferSize);
 
+/* more buffer space may be available so try to receive */
+qemu_flush_queued_packets(qemu_get_queue(s->nic));
+
DPRINTF("CAPR write: rx buffer length %d head 0x%04x read 0x%04x\n",
s->RxBufferSize, s->RxBufAddr, s->RxBufPtr);
}
-- 
1.8.1.4




[Qemu-devel] [PATCH 1/2] net: support for bridged networking on Mac OS X

2013-05-24 Thread Stefan Hajnoczi
From: Alasdair McLeay alasdair.mcl...@me.com

tun tap can be implemented on Mac OS X using
http://tuntaposx.sourceforge.net

It behaves in the same way as FreeBSD/OpenBSD implementations, but Qemu
needs a patch to use the OpenBSD/FreeBSD code.

As per the patch listed in this forum thread:
http://forum.gns3.net/post17679.html#p17679

And also as used in the MacPorts installation:
https://trac.macports.org/browser/trunk/dports/emulators/qemu/files/patch-net-tap-interface.diff

Signed-off-by: Alasdair McLeay alasdair.mcl...@me.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 net/tap-bsd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index bcdb268..f61d580 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -44,7 +44,8 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
 struct stat s;
 #endif
 
-#if defined(__FreeBSD__) || defined(__FreeBSD_kernel__) || defined(__OpenBSD__)
+#if defined(__FreeBSD__) || defined(__FreeBSD_kernel__) || \
+defined(__OpenBSD__) || defined(__APPLE__)
 /* if no ifname is given, always start the search from tap0/tun0. */
 int i;
 char dname[100];
-- 
1.8.1.4




[Qemu-devel] [PULL 0/2] Net patches

2013-05-24 Thread Stefan Hajnoczi
The following changes since commit 64afc2b4d48fb21e085517c38a59a3f61a11283c:

  Merge remote-tracking branch 'luiz/queue/qmp' into staging (2013-05-23 
14:16:35 -0500)

are available in the git repository at:


  git://github.com/stefanha/qemu.git net

for you to fetch changes up to 00b7ade807b5ce6779ddd86ce29c5521ec5c529a:

  rtl8139: flush queued packets when RxBufPtr is written (2013-05-24 16:34:13 
+0200)


Alasdair McLeay (1):
  net: support for bridged networking on Mac OS X

Stefan Hajnoczi (1):
  rtl8139: flush queued packets when RxBufPtr is written

 hw/net/rtl8139.c | 3 +++
 net/tap-bsd.c| 3 ++-
 2 files changed, 5 insertions(+), 1 deletion(-)

-- 
1.8.1.4




[Qemu-devel] [PATCH 11/11] coroutine: stop using AioContext in CoQueue

2013-05-24 Thread Stefan Hajnoczi
qemu_co_queue_next(queue) arranges that the next queued coroutine is
run at a later point in time.  This deferred restart is useful because
the caller may not want to transfer control yet.

This behavior was implemented using QEMUBH in the past, which meant that
CoQueue (and hence CoMutex and CoRwlock) had a dependency on the
AioContext event loop.  This hidden dependency causes trouble when we
move to a world with multiple event loops - now qemu_co_queue_next()
needs to know which event loop to schedule the QEMUBH in.

After pondering how to stash AioContext I realized the best solution is
to not use AioContext at all.  This patch implements the deferred
restart behavior purely in terms of coroutines and no longer uses
QEMUBH.

Here is how it works:

Each Coroutine has a wakeup queue that starts out empty.  When
qemu_co_queue_next() is called, the next coroutine is added to our
wakeup queue.  The wakeup queue is processed when we yield or terminate.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 include/block/coroutine_int.h |  4 
 qemu-coroutine-lock.c | 56 ---
 qemu-coroutine.c  |  3 +++
 trace-events  |  2 +-
 4 files changed, 29 insertions(+), 36 deletions(-)

diff --git a/include/block/coroutine_int.h b/include/block/coroutine_int.h
index 17eb71e..f133d65 100644
--- a/include/block/coroutine_int.h
+++ b/include/block/coroutine_int.h
@@ -38,6 +38,9 @@ struct Coroutine {
 void *entry_arg;
 Coroutine *caller;
 QSLIST_ENTRY(Coroutine) pool_next;
+
+/* Coroutines that should be woken up when we yield or terminate */
+QTAILQ_HEAD(, Coroutine) co_queue_wakeup;
 QTAILQ_ENTRY(Coroutine) co_queue_next;
 };
 
@@ -45,5 +48,6 @@ Coroutine *qemu_coroutine_new(void);
 void qemu_coroutine_delete(Coroutine *co);
 CoroutineAction qemu_coroutine_switch(Coroutine *from, Coroutine *to,
   CoroutineAction action);
+void coroutine_fn qemu_co_queue_run_restart(Coroutine *co);
 
 #endif
diff --git a/qemu-coroutine-lock.c b/qemu-coroutine-lock.c
index 86efe1f..d9fea49 100644
--- a/qemu-coroutine-lock.c
+++ b/qemu-coroutine-lock.c
@@ -26,39 +26,11 @@
#include "block/coroutine.h"
#include "block/coroutine_int.h"
#include "qemu/queue.h"
-#include "block/aio.h"
#include "trace.h"
 
-/* Coroutines are awoken from a BH to allow the current coroutine to complete
- * its flow of execution.  The BH may run after the CoQueue has been destroyed,
- * so keep BH data in a separate heap-allocated struct.
- */
-typedef struct {
-QEMUBH *bh;
-QTAILQ_HEAD(, Coroutine) entries;
-} CoQueueNextData;
-
-static void qemu_co_queue_next_bh(void *opaque)
-{
-CoQueueNextData *data = opaque;
-Coroutine *next;
-
-trace_qemu_co_queue_next_bh();
-while ((next = QTAILQ_FIRST(&data->entries))) {
-QTAILQ_REMOVE(&data->entries, next, co_queue_next);
-qemu_coroutine_enter(next, NULL);
-}
-
-qemu_bh_delete(data->bh);
-g_slice_free(CoQueueNextData, data);
-}
-
 void qemu_co_queue_init(CoQueue *queue)
 {
QTAILQ_INIT(&queue->entries);
-
-/* This will be exposed to callers once there are multiple AioContexts */
-queue->ctx = qemu_get_aio_context();
 }
 
 void coroutine_fn qemu_co_queue_wait(CoQueue *queue)
@@ -77,23 +49,37 @@ void coroutine_fn qemu_co_queue_wait_insert_head(CoQueue 
*queue)
 assert(qemu_in_coroutine());
 }
 
+/**
+ * qemu_co_queue_run_restart:
+ *
+ * Enter each coroutine that was previously marked for restart by
+ * qemu_co_queue_next() or qemu_co_queue_restart_all().  This function is
+ * invoked by the core coroutine code when the current coroutine yields or
+ * terminates.
+ */
+void qemu_co_queue_run_restart(Coroutine *co)
+{
+Coroutine *next;
+
+trace_qemu_co_queue_run_restart(co);
+while ((next = QTAILQ_FIRST(&co->co_queue_wakeup))) {
+QTAILQ_REMOVE(&co->co_queue_wakeup, next, co_queue_next);
+qemu_coroutine_enter(next, NULL);
+}
+}
+
 static bool qemu_co_queue_do_restart(CoQueue *queue, bool single)
 {
+Coroutine *self = qemu_coroutine_self();
 Coroutine *next;
-CoQueueNextData *data;
 
if (QTAILQ_EMPTY(&queue->entries)) {
 return false;
 }
 
-data = g_slice_new(CoQueueNextData);
-data->bh = aio_bh_new(queue->ctx, qemu_co_queue_next_bh, data);
-QTAILQ_INIT(&data->entries);
-qemu_bh_schedule(data->bh);
-
while ((next = QTAILQ_FIRST(&queue->entries)) != NULL) {
QTAILQ_REMOVE(&queue->entries, next, co_queue_next);
-QTAILQ_INSERT_TAIL(&data->entries, next, co_queue_next);
+QTAILQ_INSERT_TAIL(&self->co_queue_wakeup, next, co_queue_next);
 trace_qemu_co_queue_next(next);
 if (single) {
 break;
diff --git a/qemu-coroutine.c b/qemu-coroutine.c
index 60ac79e..423430d 100644
--- a/qemu-coroutine.c
+++ b/qemu-coroutine.c
@@ -45,6 +45,7 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
 }
 
co->entry = entry

[Qemu-devel] [PATCH 04/11] block: package rollback code in qmp_transaction()

2013-05-24 Thread Stefan Hajnoczi
From: Wenchao Xia xiaw...@linux.vnet.ibm.com

Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
Reviewed-by: Kevin Wolf kw...@redhat.com
Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
Signed-off-by: Kevin Wolf kw...@redhat.com
---
 blockdev.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 76f0532..2131e13 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -882,6 +882,13 @@ static void external_snapshot_commit(BlkTransactionStates 
*states)
 NULL);
 }
 
+static void external_snapshot_abort(BlkTransactionStates *states)
+{
+if (states->new_bs) {
+bdrv_delete(states->new_bs);
+}
+}
+
 /*
  * 'Atomic' group snapshots.  The snapshots are taken as a set, and if any fail
  *  then we do not pivot any of the devices in the group, and abandon the
@@ -939,9 +946,7 @@ delete_and_fail:
 * the original bs for all images
 */
 QSIMPLEQ_FOREACH(states, snap_bdrv_states, entry) {
-if (states->new_bs) {
- bdrv_delete(states->new_bs);
-}
+external_snapshot_abort(states);
 }
 exit:
 QSIMPLEQ_FOREACH_SAFE(states, snap_bdrv_states, entry, next) {
-- 
1.8.1.4




Re: [Qemu-devel] [PATCH] rtl8139: flush queued packets when RxBufPtr is written

2013-05-24 Thread Stefan Hajnoczi
On Wed, May 22, 2013 at 02:50:18PM +0200, Stefan Hajnoczi wrote:
 Net queues support efficient receive disable.  For example, tap's file
 descriptor will not be polled while its peer has receive disabled.  This
 saves CPU cycles for needlessly copying and then dropping packets which
 the peer cannot receive.
 
 rtl8139 is missing the qemu_flush_queued_packets() call that wakes the
 queue up when receive becomes possible again.
 
 As a result, the Windows 7 guest driver reaches a state where the
 rtl8139 cannot receive packets.  The driver has actually refilled the
 receive buffer but we never resume reception.
 
 The bug can be reproduced by running a large FTP 'get' inside a Windows
 7 guest:
 
   $ qemu -netdev tap,id=tap0,...
  -device rtl8139,netdev=tap0
 
 The Linux guest driver does not trigger the bug, probably due to a
 different buffer management strategy.
 
 Reported-by: Oliver Francke oliver.fran...@filoo.de
 Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
 ---
  hw/net/rtl8139.c | 3 +++
  1 file changed, 3 insertions(+)

Applied to my net tree:
https://github.com/stefanha/qemu/commits/net

Stefan



Re: [Qemu-devel] [PATCH] rtl8139: flush queued packets when RxBufPtr is written

2013-05-27 Thread Stefan Hajnoczi
On Mon, May 27, 2013 at 08:15:42AM +0200, Peter Lieven wrote:
 I occasionally have seen a probably related problem in the past. It mainly 
 happened with rtl8139 under
 WinXP, where we most likely use rtl8139 due to the lack of shipped e1000 drivers.
 
 My question is if you see increasing dropped packets on the tap device if 
 this problem occurs?
 
 tap36 Link encap:Ethernet  HWaddr b2:84:23:c0:e2:c0
   inet6 addr: fe80::b084:23ff:fec0:e2c0/64 Scope:Link
   UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
   RX packets:5816096 errors:0 dropped:0 overruns:0 frame:0
   TX packets:3878744 errors:0 dropped:13775 overruns:0 carrier:0
   collisions:0 txqueuelen:500
   RX bytes:5161769434 (5.1 GB)  TX bytes:380415916 (380.4 MB)

My reading of the tun code is that you will see TX dropped increase.  This
is because tun keeps a finite size queue of tx packets.  Since QEMU
userspace is not monitoring the tap fd anymore we'll never drain the
queue and soon enough the TX dropped counter will begin incrementing.
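The counter ifconfig reports is also exported per-interface in sysfs, which is handy for watching whether the queue is stuck. A small sketch (the interface name is a placeholder; "lo" is only a safe default for trying the script, substitute the tap device attached to your guest, e.g. tap36):

```shell
#!/bin/sh
# Print the TX dropped counter for a network interface from sysfs.
# "lo" is just a safe default; pass your guest's tap device instead.
IFACE="${1:-lo}"
cat "/sys/class/net/${IFACE}/statistics/tx_dropped"
```

Run it repeatedly (for example under `watch`) while reproducing: a steadily climbing value while the guest appears idle matches the stuck-receive symptom described above.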

 In my case as well the only option to recover without shutting down the whole 
 vServer is Live Migration
 to another Node.
 
 However, I also see this problem under qemu-kvm-1.2.0 while Oliver reported 
 it does not happen there.

Yes, the patch that exposes this problem was only merged in 1.2.1.

Can you still reproduce the problem now that the patch has been merged
into qemu.git/master?

Stefan



Re: [Qemu-devel] [PATCH v3 2/2] net: introduce command to query rx-filter information

2013-05-27 Thread Stefan Hajnoczi
On Sun, May 26, 2013 at 10:38:14AM +0300, Michael S. Tsirkin wrote:
 On Fri, May 24, 2013 at 04:00:42PM -0400, Luiz Capitulino wrote:
  On Fri, 24 May 2013 12:05:12 -0600
  Eric Blake ebl...@redhat.com wrote:
  
   On 05/24/2013 10:12 AM, Michael S. Tsirkin wrote:
   
Event message contains the net client name, management might only 
want
to query the single net client.
   
The client can do the filtering itself.
   
   
I'm not sure I buy the responsiveness argument.  Sure, the fastest I/O
is no I/O, but whether you read and parse 100 bytes or 1000 from a Unix
domain socket once in a great while shouldn't make a difference.
   
   And the time spent malloc'ing the larger message to send from qemu, as
   well as the time spent malloc'ing the libvirt side that parses the qemu
   string into C code for use, and the time spent strcmp'ing every entry to
   find the right one...
   
   It really IS more efficient to filter as low down in the stack as
   possible, once it is determined that filtering is desirable.
   
   Whether filtering makes a difference in performance is a different
   question - you may be right that always returning the entire list and
   making libvirt do its own filtering will still not add any more
   noticeable delay compared to libvirt doing a filtered query, if the
   bottleneck lies elsewhere (such as libvirt telling macvtap its new
   configration).
   
   
My main concern is to keep the external interface simple.  I'm rather
reluctant to have query commands grow options.
   
In a case where we need the "give me everything" query anyway, the "give
me this particular part" option is additional complexity.  Needs
justification, say arguments involving throughput, latency or client
complexity.
   
Perhaps cases exist where we never want to ask for everything.  Then 
the
"give me everything" query is useless, and the option should be
mandatory.
   
   For this _particular_ interface, I'm not sure whether libvirt will ever
   use an unfiltered query -
  
  If having the argument is useful for libvirt, then it's fine to have it.
  
  But I'd be very reluctant to buy any performance argument w/o real
  numbers to back them up.
 
 Me too. I think it's more convenience than performance.

Agreed.  I suggested filtering on a NIC for usability rather than
performance reasons.

QMP should be easy to use.  Requiring every client to fish for the right
NIC in a bunch of output that gets discarded is not convenient.
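For illustration, the filtered form under discussion would look roughly like this on the wire (the command and argument names come from the patch series being reviewed, so treat them as tentative):

```json
{ "execute": "query-rx-filter", "arguments": { "name": "net0" } }
```

Omitting the optional "name" argument would return the list for every NIC, which is the "give me everything" query debated above.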

Stefan



Re: [Qemu-devel] [PATCH 0/7] VNVRAM persistent storage

2013-05-27 Thread Stefan Hajnoczi
On Fri, May 24, 2013 at 11:39:09AM -0400, Corey Bryant wrote:
 
 
 On 05/24/2013 08:36 AM, Stefan Hajnoczi wrote:
 On Fri, May 24, 2013 at 08:13:27AM -0400, Stefan Berger wrote:
 On 05/24/2013 05:59 AM, Stefan Hajnoczi wrote:
 On Thu, May 23, 2013 at 01:44:40PM -0400, Corey Bryant wrote:
 This patch series provides VNVRAM persistent storage support that
 QEMU can use internally.  The initial target user will be a software
 vTPM 1.2 backend that needs to store keys in VNVRAM and be able to
 reboot/migrate and retain the keys.
 
 This support uses QEMU's block driver to provide persistent storage
 by reading/writing VNVRAM data from/to a drive image.  The VNVRAM
 drive image is provided with the -drive command line option just like
 any other drive image and the vnvram_create() API will find it.
 
 The APIs allow for VNVRAM entries to be registered, one at a time,
 each with a maximum blob size.  Entry blobs can then be read/written
 from/to an entry on the drive.  Here's an example of usage:
 
 VNVRAM *vnvram;
int errcode;
 const VNVRAMEntryName entry_name;
const char *blob_w = "blob data";
 char *blob_r;
 uint32_t blob_r_size;
 
vnvram = vnvram_create("drive-ide0-0-0", false, &errcode);
strcpy((char *)entry_name, "first-entry");
 VNVRAMEntryName is very prone to buffer overflow.  I hope real code
 doesn't use strcpy().  The cast is ugly, please don't hide the type.
 
 vnvram_register_entry(vnvram, entry_name, 1024);
 vnvram_write_entry(vnvram, entry_name, (char *)blob_w, strlen(blob_w)+1);
vnvram_read_entry(vnvram, entry_name, &blob_r, &blob_r_size);
 These are synchronous functions.  If I/O is involved then this is a
 problem: QEMU will be blocked waiting for host I/O to complete and the
 big QEMU lock is held.  This can cause poor guest interactivity and poor
 scalability because vcpus cannot make progress, neither can the QEMU
 monitor respond.
 
 The vTPM is going to run as a thread and will have to write state
 blobs into a bdrv. The above functions will typically be called from
 this thead. When I originally wrote the code, the vTPM thread could
 not write the blobs into bdrv directly, so I had to resort to
 sending a message to the main QEMU thread to write the data to the
 bdrv. How else could we do this?
 
 How else: use asynchronous APIs like bdrv_aio_writev() or the coroutine
 versions (which eliminate the need for callbacks) like bdrv_co_writev().
 
 I'm preparing patches that allow the QEMU block layer to be used safely
 outside the QEMU global mutex.  Once this is possible it would be okay
 to use synchronous methods.
 
 Ok thanks.  I'll use aio APIs next time around.  Just to be clear,
 does eliminating the callback mean I don't have to use a
 bottom-half if I use coroutine reads/writes?

I've only skimmed the patches but I think vTPM runs in its own thread
and uses a BH to kick off I/O requests since the block layer must be
called with the QEMU global mutex held.

In this case you still need the BH since its purpose is to run block
layer code in a thread that holds the QEMU global mutex.

Stefan



Re: [Qemu-devel] [PATCH 2/2] rtl8139: flush queued packets when RxBufPtr is written

2013-05-27 Thread Stefan Hajnoczi
On Fri, May 24, 2013 at 05:18:09PM +0200, Paolo Bonzini wrote:
 Il 24/05/2013 16:38, Stefan Hajnoczi ha scritto:
  diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
  index 9369507..7993f9f 100644
  --- a/hw/net/rtl8139.c
  +++ b/hw/net/rtl8139.c
  @@ -2575,6 +2575,9 @@ static void rtl8139_RxBufPtr_write(RTL8139State *s, 
  uint32_t val)
   /* this value is off by 16 */
  s->RxBufPtr = MOD2(val + 0x10, s->RxBufferSize);
   
  +/* more buffer space may be available so try to receive */
  +qemu_flush_queued_packets(qemu_get_queue(s->nic));
  +
  DPRINTF("CAPR write: rx buffer length %d head 0x%04x read 0x%04x\n",
  s->RxBufferSize, s->RxBufAddr, s->RxBufPtr);
  }
  
 
 Do you have time to update the branch with a Cc: qemu-sta...@nongnu.org?

It has been merged.  Will try to CC qemu-stable in the future.

Stefan



Re: [Qemu-devel] snabbswitch integration with QEMU for userspace ethernet I/O

2013-05-27 Thread Stefan Hajnoczi
On Sun, May 26, 2013 at 11:32:49AM +0200, Luke Gorrie wrote:
 Stefan put us onto the highly promising track of vhost/virtio. We have
 implemented this between Snabb Switch and the Linux kernel, but not
 directly between Snabb Switch and QEMU guests. The roadblock we have hit
is embarrassingly basic: QEMU is using user-to-kernel system calls to set up
 vhost (open /dev/net/tun and /dev/vhost-net, ioctl()s) and I haven't found
 a good way to map these towards Snabb Switch instead of the kernel.

vhost_net is about connecting a virtio-net speaking process to a
tun-like device.  The problem you are trying to solve is connecting a
virtio-net speaking process to Snabb Switch.

Either you need to replace vhost or you need a tun-like device
interface.

Replacing vhost would mean that your switch implements virtio-net,
shares guest RAM with the guest, and shares the ioeventfd and irqfd
which are used to signal with the guest.  At that point your switch is
similar to the virtio-net data plane work that Ping Fan Liu is working
on but your switch is in a separate process rather than a thread.

How does your switch talk to hardware?  If you have userspace NIC
drivers that bypass the Linux network stack then the approach I
mentioned fits well.

If you are using the Linux network stack then it might be better to
integrate with vhost maybe as a tun-like device driver.

Stefan



Re: [Qemu-devel] [PATCH] kvm: exclude ioeventfd from counting kvm_io_range limit

2013-05-27 Thread Stefan Hajnoczi
On Sat, May 25, 2013 at 06:44:15AM +0800, Amos Kong wrote:
 We can easily reach the 1000 limit by starting a VM with a couple
 hundred I/O devices (multifunction=on). The hardcoded limit has
 already been adjusted 3 times (6 ~ 200 ~ 300 ~ 1000).
 
 In userspace, we already have maximum file descriptor to
 limit ioeventfd count. But kvm_io_bus devices also are used
 for pit, pic, ioapic, coalesced_mmio. They couldn't be limited
 by maximum file descriptor.
 
 Currently only ioeventfds take too much kvm_io_bus devices,
 so just exclude it from counting kvm_io_range limit.
 
 Also fixed one indent issue in kvm_host.h
 
 Signed-off-by: Amos Kong ak...@redhat.com
 ---
  include/linux/kvm_host.h | 3 ++-
  virt/kvm/eventfd.c   | 2 ++
  virt/kvm/kvm_main.c  | 3 ++-
  3 files changed, 6 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com



Re: [Qemu-devel] IDE disk FLUSH take more than 30 secs, the SUSE guest reports lost interrupt and the file system becomes read-only

2013-05-27 Thread Stefan Hajnoczi
On Sun, May 26, 2013 at 07:08:02PM +0200, Andreas Färber wrote:
 Am 21.05.2013 09:12, schrieb Gonglei (Arei):
  Through analysis, I found that the fdatasync system call issued by 
  QEMU took over 30s; 
  after the guest's kernel thread detected that the I/O had timed out, 
  it went to check the IDE disk state. 
  But the IDE disk status was 0x50, rather than the BSY status, and it 
  then entered error processing...
  
  the path of kernel's action is :
  scsi_softirq_done
   scsi_eh_scmd_add
 scsi_error_handler
  shost->transportt->eh_strategy_handler 
  ata_scsi_error 
 ap->ops->lost_interrupt
  ata_sff_lost_interrupt
  Finally, the file system becomes read-only.
  
  Why not set the IDE disk to the BSY status when the 0xe7 command is 
  executed in QEMU?
 
 Have you actually tried that out with a patch such as the following?
 
 diff --git a/hw/ide/core.c b/hw/ide/core.c
 index c7a8041..bf1ff18 100644
 --- a/hw/ide/core.c
 +++ b/hw/ide/core.c
 @@ -795,6 +795,8 @@ static void ide_flush_cb(void *opaque, int ret)
  {
  IDEState *s = opaque;
 
 +s->status &= ~BUSY_STAT;
 +
  if (ret < 0) {
  /* XXX: What sector number to set here? */
  if (ide_handle_rw_error(s, -ret, BM_STATUS_RETRY_FLUSH)) {
 @@ -814,6 +816,7 @@ void ide_flush_cache(IDEState *s)
  return;
  }
 
 +s->status |= BUSY_STAT;
  bdrv_acct_start(s->bs, &s->acct, 0, BDRV_ACCT_FLUSH);
  bdrv_aio_flush(s->bs, ide_flush_cb, s);
  }
 
 No clue if this is spec-compliant. ;)
 
 Note however that qemu_fdatasync() is done in the flush callback of
 block/raw-posix.c, so IIUC everything calling bdrv_aio_flush() or
 bdrv_flush_all() may potentially run into issues beyond just ATA:

This is an IDE emulation bug.  virtio-blk, for example, doesn't have
this kind of busy status bit.  It's probably not an issue with SCSI
either.

Stefan



Re: [Qemu-devel] [RFC PATCH v3 02/11] Fix errors and warnings while compiling with c++ compilier

2013-05-27 Thread Stefan Hajnoczi
On Fri, May 24, 2013 at 05:25:21PM +0200, Markus Armbruster wrote:
 Tomoki Sekiyama tomoki.sekiy...@hds.com writes:
 
  On 5/24/13 4:52 , Stefan Hajnoczi stefa...@gmail.com wrote:
 
 On Thu, May 23, 2013 at 06:34:43PM +, Tomoki Sekiyama wrote:
  On 5/23/13 8:12 , Stefan Hajnoczi stefa...@gmail.com wrote:
  
  On Tue, May 21, 2013 at 11:33:41AM -0400, Tomoki Sekiyama wrote:
   Add C++ keywords to avoid errors in compiling with c++ compiler.
   This also renames class member of PciDeviceInfo to q_class.
   
   Signed-off-by: Tomoki Sekiyama tomoki.sekiy...@hds.com
   ---
hmp.c   |2 +-
hw/pci/pci.c|2 +-
scripts/qapi.py |9 -
3 files changed, 10 insertions(+), 3 deletions(-)
  
  Please also extend scripts/checkpatch.pl.  Otherwise it is very likely
  that C++ keywords will be introduced again in the future.  Most people
  will not build the VSS code and therefore checkpatch.pl needs to ensure
  that patches with C++ keywords will not be accepted.
  
  Stefan
  
  I think it would be difficult to identify problematic C++ keywords usage
  from patches because headers can legally contain C++ keywords and
  checkpatch.pl doesn't know where it should be used.
  Do you have any good ideas?
 
 We can ignore false warnings for 0.1% of patches (the ones that touch
 VSS C++ code).  But for the remaining 99.9% of patches it's worth
 guarding against VSS bitrot.
 
 Remember not many people will compile it and therefore they won't notice
 when they break it.  I really think it's worth putting some effort in
 now so VSS doesn't periodically break.
 
 checkpatch.pl is a hacky sort of C parser.  It already checks for a
 bunch of similar things and it knows about comments, macros, and
 strings.  It does not perform #include expansion, so there is no risk of
 including system headers that have C++ code.
 
 Stefan
 
  Thanks for your comment.
 
  I'm still wondering because it actually causes a lot of false positives
  (not just 0.1%...) even for the patches not touching VSS.
 
  For example, keyword 'class' is used in qdev-monitor.c, qom/object.c,
  and a lot of files in hw/*/*.c and include/{hw,qom}/*.h, but
  they have nothing to do with qemu-ga. Qemu-ga is just a part of whole
  qemu source code, so I don't want to warn around the other parts.
 
 And I appreciate that.  Purging some other language's keywords feels
 like pointless churn to me.

I see what you guys are saying now.  You want to protect only qemu-ga.
I was proposing protecting the entire codebase so we never get into a
situation where changing a header breaks qemu-ga.

It's not that hard to use "klass" instead of "class", but if it's
unpopular then we'll just have to live with VSS breakage.

Stefan



Re: [Qemu-devel] [RFC PATCH v3 11/11] QMP/qmp.py: set locale for exceptions to display non-ascii messages correctly

2013-05-27 Thread Stefan Hajnoczi
On Fri, May 24, 2013 at 06:55:21PM +, Tomoki Sekiyama wrote:
 On 5/23/13 8:30 , Stefan Hajnoczi stefa...@gmail.com wrote:
 Either or both bugs could be present.  Once they are fixed you shouldn't
 see encoding problems.
 
 Stefan
 
 Anyway, as I understood this patch for qmp.py is not correct way to fix
 this issue, I'm going to drop it from the series.

Interesting, thanks for showing that that QEMU and qmp.py are behaving
correctly.

For reference, the Python unicode exception bug is here and doesn't seem
to be fixable in Python 2.x:

http://bugs.python.org/issue2517

Stefan



Re: [Qemu-devel] 'qemu-nbd' explicit flush

2013-05-27 Thread Stefan Hajnoczi
On Sat, May 25, 2013 at 09:42:08AM -0800, Mark Trumpold wrote:
 On 5/24/13 1:05 AM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Thu, May 23, 2013 at 09:58:31PM +, Mark Trumpold wrote:
 One thing to be careful of is whether these operations are asynchronous.
 The signal is asynchronous, you have no way of knowing when qemu-nbd is
 finished flushing to the physical disk.
 
 Right, of course.  I missed the obvious.

I missed something too.  Paolo may have already hinted at this when he
posted a dd oflag=sync command-line option:

blockdev --flushbufs is the wrong tool because ioctl(BLKFLSBUF) only
writes out dirty pages to the block device.  It does *not* guarantee to
send a flush request to the device.

Therefore, the underlying image file may not be put into an up-to-date
state by qemu-nbd.


I suggest trying the following instead of blockdev --flushbufs:

  python -c 'import os; os.fsync(open("/dev/loopX", "r+b"))'

This should do the same as blockdev --flushbufs *plus* it sends and
waits for the NBD FLUSH command.

You may have to play with this command-line a little but the main idea
is to open the block device and fsync it.
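A slightly fuller sketch of the same idea (the function name is mine, and a regular file stands in for the loop device here; substitute the real `/dev/loopX` path in practice):

```python
import os

def flush_block_device(path):
    """Open a block device (or file) read-write and fsync it.

    Unlike ioctl(BLKFLSBUF), which only writes out dirty pages,
    fsync() also sends a flush request down the storage stack, so an
    NBD-backed loop device should receive an NBD FLUSH command and
    wait for it to complete.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

# Example against the real device (requires root and the device to exist):
# flush_block_device("/dev/loop0")
```

Because fsync() blocks until the flush completes, this also solves the earlier problem of not knowing when qemu-nbd has finished flushing.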

Stefan



Re: [Qemu-devel] [Bug 1184089] [NEW] [Feature request] loadvm snapshot as read-only

2013-05-27 Thread Stefan Hajnoczi
On Sat, May 25, 2013 at 08:29:11AM -, Michael Coppola wrote:
 There are many ways to take and manage snapshots in QEMU, but one main
 feature that's missing is the ability to 'loadvm' a LIVE snapshot and
 have all future changes redirected to a temporary file.  This would
 effectively be combining the -loadvm and -snapshot switches and make the
 snapshot read-only.  With this feature, users would be provided a
 sandbox and be able to start and restart the same live snapshot
 without corrupting the image in doing so.

This should be possible soon.  Wenchao Xia is working on new monitor
commands that allow you to combine internal snapshots (loadvm/savevm)
with external snapshots (blockdev-snapshot-sync).

You would submit a QMP 'transaction' command that specifies a loadvm
followed by a blockdev-snapshot-sync.  This operation is atomic.

Note that internal snapshots do not destroy the snapshot.  Therefore,
when you loadvm an internal snapshot and write to the disk, you are not
modifying the internal snapshot only the current state of the disk.  You
can loadvm it again later.
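The external-snapshot half of that combination is already available today; an atomic transaction taking one external snapshot looks like this (the device name and overlay path are examples):

```json
{ "execute": "transaction",
  "arguments": {
    "actions": [
      { "type": "blockdev-snapshot-sync",
        "data": { "device": "drive-ide0-0-0",
                  "snapshot-file": "/tmp/overlay.qcow2",
                  "format": "qcow2" } }
    ]
  }
}
```

Once the work described above lands, the internal-snapshot load would appear as an additional entry in the same "actions" array; its exact name and shape are still under review.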

Stefan



Re: [Qemu-devel] use O_DIRECT to open disk images for IDE failed under xen-4.1.2 and qemu upstream

2013-05-27 Thread Stefan Hajnoczi
On Sat, May 25, 2013 at 07:32:55AM +, Gonglei (Arei) wrote:
 
 
  -Original Message-
  From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
  Sent: Friday, May 24, 2013 8:20 PM
  To: Gonglei (Arei)
  Cc: Stefano Stabellini; Hanweidong; Luonengjun; qemu-devel@nongnu.org;
  Wangzhenguo; xen-de...@lists.xen.org; Huangweidong (Hardware)
  Subject: Re: [Qemu-devel] use O_DIRECT to open disk images for IDE failed
  under xen-4.1.2 and qemu upstream
  
  On Fri, May 24, 2013 at 02:59:05AM +, Gonglei (Arei) wrote:
   Hi,
  
   
On Thu, 23 May 2013, Gonglei (Arei) wrote:
 Hi, all

 I use O_DIRECT to open disk images for IDE, but I'm failed. After 
 debug, I
  get
the below logs:
 [2013-05-22 23:25:46] ide: CMD=c8
 [2013-05-22 23:25:46] bmdma: readb 0x00 : 0x08
 [2013-05-22 23:25:46] bmdma: writeb 0x00 : 0x09
 [2013-05-22 23:25:46] bmdma_cmd_writeb: 0x0009
 [2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01
 [2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01
 [2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01
 [2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01
 [2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01
 [2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01
 [2013-05-22 23:26:39] bmdma: writeb 0x00 : 0x08
 [2013-05-22 23:26:39] bmdma_cmd_writeb: 0x0008
 [2013-05-22 23:26:56] == offset:0 buf:0x7ff100f21c00 count:512
aio_offset:0
 [2013-05-22 23:31:30] == offset:0 buf:0x7ff100f21c00 count:512
aio_offset:0
 [2013-05-22 23:31:30] == handle_aiocb_rw_linear errno: -14
 [2013-05-22 23:31:30] == paio_complete errno=14
 [2013-05-22 23:31:30] == ide_dma_error!!!
 [2013-05-22 23:31:30] ide: read status addr=0x3f6 val=41

 QEMU command line :
 qemu-system-i386 -xen-domid 837 -chardev
socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-837,server,nowait -mon
chardev=libxl-cmd,mode=control -name suse11 -vnc 0.0.0.0:1 -serial pty
  -boot
order=c -usb -usbdevice tablet -smp 2,maxcpus=2 -device
rtl8139,id=nic0,netdev=net0,mac=00:16:3e:13:d3:72 -netdev
type=tap,id=net0,ifname=tap837.0,downscript=no -M xenfv -m 2040 -drive
   
  file=/mnt/sdd/image/suse.image,if=ide,index=0,media=disk,format=raw,cache
=none

 errno 14 means "Bad address". And through debugging I find that the
 QEMU_AIO_MISALIGNED flag bit
is not set.

 /*
  * If O_DIRECT is used the buffer needs to be aligned on a sector
  * boundary.  Check if this is the case or tell the low-level
  * driver that it needs to copy the buffer.
  */
 if ((bs->open_flags & BDRV_O_NOCACHE)) {
 if (!bdrv_qiov_is_aligned(bs, qiov)) { /* if the address is
 512-byte aligned, this condition is not met */
 type |= QEMU_AIO_MISALIGNED;
 #ifdef CONFIG_LINUX_AIO
 } else if (s->use_aio) {
 return laio_submit(bs, s->aio_ctx, s->fd, sector_num, qiov,
 nb_sectors, cb, opaque, type);
 #endif

 Next process:
 static ssize_t handle_aiocb_rw(struct qemu_paiocb *aiocb)
 {
 ssize_t nbytes;
 char *buf;

 if (!(aiocb->aio_type & QEMU_AIO_MISALIGNED)) {
     /*
      * If there is just a single buffer, and it is properly aligned
      * we can just use plain pread/pwrite without any problems.
      */
     if (aiocb->aio_niov == 1)
         return handle_aiocb_rw_linear(aiocb,
                                       aiocb->aio_iov->iov_base); // this path is taken, and reports errno 14 next

 Anyone have a good method to resolve this bug? Thanks!
   
I know that this is not the answer you are looking for but why do you
want to use O_DIRECT with IDE?
It should be perfectly safe to use write-back.
  
   A few days ago, I asked a question about the IDE FLUSH time of guest:
   http://lists.nongnu.org/archive/html/qemu-devel/2013-05/msg02642.html
  
   finally I found that it is because QEMU opens disk images with the
   write-back flag by default, so I hoped to use O_DIRECT to avoid that
   problem, but it fails under the Xen platform with QEMU upstream.
  
  cache=none does not eliminate the need for flush.  If you want to do
  that, try cache=writethrough or cache=directsync.
  
 I had tried it, but I got the same error.

Perhaps that's because the busy bit also needs to be set correctly for
other IDE commands (like read/write requests).

  Regarding the EFAULT you are seeing, did you check if the I/O buffer
  address is valid?  Try breaking in gdb and inspecting /proc/<pid>/maps
  or just x <buffer-address> in gdb to see if the buffer is readable.
  
  Stefan
 
 Hi, Stefan
 
 The follows are details in gdb:
 handle_aiocb_rw_linear (aiocb=0x13a0f80, buf=0x7f0a637d7c00 Address 
 0x7f0a637d7c00 out of bounds)
 at /mnt/sdd/gonglei/xen-4.1.2/tools/qemu-xen/posix-aio-compat.c:213
 213 {
 (gdb) bt
 #0  handle_aiocb_rw_linear (aiocb=0x13a0f80

[Qemu-devel] [PATCH v4] tests: set MALLOC_PERTURB_ to expose memory bugs

2013-05-27 Thread Stefan Hajnoczi
glibc wipes malloc(3) memory when the MALLOC_PERTURB_ environment
variable is set.  The value of the environment variable determines the
bit pattern used to wipe memory.  For more information, see
http://udrepper.livejournal.com/11429.html.

Set MALLOC_PERTURB_ for gtester and qemu-iotests.  Note we pick a random
value from 1 to 255 to expose more bugs.  If you need to reproduce a
crash use 'show environment' in gdb to extract the MALLOC_PERTURB_
value from a core dump.

Both make check and qemu-iotests pass with MALLOC_PERTURB_ enabled.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
Embarrassing how long it's taking to get this right :P

v4:
 * Drop dash-incompatible $$RANDOM [eblake]

v3:
 * Use $ escaping in tests/Makefile [eblake]

v2:
 * Randomize MALLOC_PERTURB_ value [armbru]
 * Preserve existing MALLOC_PERTURB_ variable, if set [danpb]

 tests/Makefile   | 5 ++++-
 tests/qemu-iotests/check | 3 ++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/tests/Makefile b/tests/Makefile
index a307d5a..c107489 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -171,6 +171,7 @@ GCOV_OPTIONS = -n $(if $(V),-f,)
 $(patsubst %, check-qtest-%, $(QTEST_TARGETS)): check-qtest-%: $(check-qtest-y)
$(if $(CONFIG_GCOV),@rm -f *.gcda */*.gcda */*/*.gcda */*/*/*.gcda,)
$(call quiet-command,QTEST_QEMU_BINARY=$*-softmmu/qemu-system-$* \
+   MALLOC_PERTURB_=$${MALLOC_PERTURB_:-$$((RANDOM % 255 + 1))} \
gtester $(GTESTER_OPTIONS) -m=$(SPEED) 
$(check-qtest-$*-y),GTESTER $@)
$(if $(CONFIG_GCOV),@for f in $(gcov-files-$*-y); do \
  echo Gcov report for $$f:;\
@@ -180,7 +181,9 @@ $(patsubst %, check-qtest-%, $(QTEST_TARGETS)): 
check-qtest-%: $(check-qtest-y)
 .PHONY: $(patsubst %, check-%, $(check-unit-y))
 $(patsubst %, check-%, $(check-unit-y)): check-%: %
$(if $(CONFIG_GCOV),@rm -f *.gcda */*.gcda */*/*.gcda */*/*/*.gcda,)
-   $(call quiet-command,gtester $(GTESTER_OPTIONS) -m=$(SPEED) $*,GTESTER 
$*)
+   $(call quiet-command, \
+   MALLOC_PERTURB_=$${MALLOC_PERTURB_:-$$((RANDOM % 255 + 1))} \
+   gtester $(GTESTER_OPTIONS) -m=$(SPEED) $*,GTESTER $*)
$(if $(CONFIG_GCOV),@for f in $(gcov-files-$(subst tests/,,$*)-y); do \
  echo Gcov report for $$f:;\
  $(GCOV) $(GCOV_OPTIONS) $$f -o `dirname $$f`; \
diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 432732c..74628ae 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -214,7 +214,8 @@ do
start=`_wallclock`
	$timestamp && echo -n " [`date +%T`]"
	[ ! -x $seq ] && chmod u+x $seq # ensure we can run it
-	./$seq >$tmp.out 2>&1
+	MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(($RANDOM % 255 + 1))} \
+	./$seq >$tmp.out 2>&1
sts=$?
	$timestamp && _timestamp
stop=`_wallclock`
-- 
1.8.1.4
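The fallback expression the patch uses can be exercised standalone; a quick sketch (bash, since plain dash has no $RANDOM, which is why v4 dropped it from the iotests hunk):

```shell
#!/usr/bin/env bash
# Mirror the patch's fallback: keep an existing MALLOC_PERTURB_, else
# pick a random wipe pattern in the documented 1-255 range.
val=${MALLOC_PERTURB_:-$((RANDOM % 255 + 1))}
echo "MALLOC_PERTURB_=$val"
# Sanity-check the range: RANDOM % 255 gives 0-254, so +1 gives 1-255.
[ "$val" -ge 1 ] && [ "$val" -le 255 ] && echo "in range"
```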




Re: [Qemu-devel] [PATCH] rtl8139: flush queued packets when RxBufPtr is written

2013-05-27 Thread Stefan Hajnoczi
On Mon, May 27, 2013 at 04:24:59PM +0200, Peter Lieven wrote:
 On 27.05.2013 16:07, Oliver Francke wrote:
 Well,
 
 Am 27.05.2013 um 08:15 schrieb Peter Lieven lieven-li...@dlhnet.de:
 
 Hi all,
 
 I have occasionally seen a probably related problem in the past. It mainly
 happened with rtl8139 under WinXP, where we most likely use rtl8139 due to
 the lack of shipped e1000 drivers.
 
 My question is if you see increasing dropped packets on the tap device if 
 this problem occurs?
 
 tap36 Link encap:Ethernet  HWaddr b2:84:23:c0:e2:c0
   inet6 addr: fe80::b084:23ff:fec0:e2c0/64 Scope:Link
   UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
   RX packets:5816096 errors:0 dropped:0 overruns:0 frame:0
   TX packets:3878744 errors:0 dropped:13775 overruns:0 carrier:0
   collisions:0 txqueuelen:500
   RX bytes:5161769434 (5.1 GB)  TX bytes:380415916 (380.4 MB)
 
 In my case as well the only option to recover without shutting down the 
 whole vServer is Live Migration
 to another Node.
 
 ACK, tried it, and every network device seems to have been re-created into a
 defined state qemu-wise.
 
 However, I also see this problem under qemu-kvm-1.2.0 while Oliver reported 
 it does not happen there.
 
 Neither I nor any affected customers have ever seen such failures in
 qemu-1.2.0, so this was my last known good ;)
 I cherry-picked
 
 net: add receive_disabled logic to iov delivery path

This one exposes the bug that Oliver reported:

commit a9d8f7b1c41a8a346f4cf5a0c6963a79fbd1249e
Author: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Date:   Mon Aug 20 13:35:23 2012 +0100

net: do not report queued packets as sent



Re: [Qemu-devel] [Bug 1184616] [NEW] undefined reference to `trace_qemu_anon_ram_alloc'

2013-05-27 Thread Stefan Hajnoczi
On Mon, May 27, 2013 at 4:02 PM, Nigel Horne 1184...@bugs.launchpad.net wrote:
 The latest git version (commit 6a4e17711442849bf2cc731ccddef5a2a2d92d29)
 fails to compile:

 ...
   LINK  qemu-ga
 libqemuutil.a(oslib-posix.o): In function `qemu_anon_ram_alloc':
 oslib-posix.c:(.text+0x154): undefined reference to 
 `trace_qemu_anon_ram_alloc'
 libqemuutil.a(oslib-posix.o): In function `qemu_anon_ram_free':
 oslib-posix.c:(.text+0x1e7): undefined reference to `trace_qemu_anon_ram_free'
 collect2: error: ld returned 1 exit status
 make: *** [qemu-ga] Error 1

 This is on Ubuntu 13.04, gcc 4.7.3, configure flags:
 './configure' '--enable-linux-aio' '--enable-kvm' '--enable-vhost-net'

Please try:
make distclean && ./configure --enable-linux-aio --enable-kvm
--enable-vhost-net && make

Stefan



Re: [Qemu-devel] [PATCH V3 1/4] block: drop bs_snapshots global variable

2013-05-28 Thread Stefan Hajnoczi
On Mon, May 27, 2013 at 05:25:16PM +0200, Kevin Wolf wrote:
 Am 25.05.2013 um 05:09 hat Wenchao Xia geschrieben:
  From: Stefan Hajnoczi stefa...@redhat.com
  
  The bs_snapshots global variable points to the BlockDriverState which
  will be used to save vmstate.  This is really a savevm.c concept but was
  moved into block.c:bdrv_snapshots() when it became clear that hotplug
  could result in a dangling pointer.
  
  While auditing the block layer's global state I came upon bs_snapshots
  and realized that a variable is not necessary here.  Simply find the
  first BlockDriverState capable of internal snapshots each time this is
  needed.
  
  The behavior of bdrv_snapshots() is preserved across hotplug because new
  drives are always appended to the bdrv_states list.  This means that
  calling the new find_vmstate_bs() function is idempotent - it returns
  the same BlockDriverState unless it was hot-unplugged.
  
  Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
  Reviewed-by: Eric Blake ebl...@redhat.com
  Reviewed-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
  Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
  ---
   block.c   |   28 
   include/block/block.h |1 -
   savevm.c  |   19 +++
   3 files changed, 15 insertions(+), 33 deletions(-)
  
  diff --git a/block.c b/block.c
  index 3f87489..478a3b2 100644
  --- a/block.c
  +++ b/block.c
  @@ -99,9 +99,6 @@ static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
   static QLIST_HEAD(, BlockDriver) bdrv_drivers =
   QLIST_HEAD_INITIALIZER(bdrv_drivers);
   
  -/* The device to use for VM snapshots */
  -static BlockDriverState *bs_snapshots;
  -
   /* If non-zero, use only whitelisted block drivers */
   static int use_bdrv_whitelist;
   
  @@ -1357,9 +1354,6 @@ void bdrv_close(BlockDriverState *bs)
   notifier_list_notify(&bs->close_notifiers, bs);
   
   if (bs->drv) {
  -if (bs == bs_snapshots) {
  -bs_snapshots = NULL;
  -}
   if (bs->backing_hd) {
   bdrv_delete(bs->backing_hd);
   bs->backing_hd = NULL;
  @@ -1591,7 +1585,6 @@ void bdrv_delete(BlockDriverState *bs)
   
   bdrv_close(bs);
   
  -assert(bs != bs_snapshots);
   g_free(bs);
   }
   
  @@ -1635,9 +1628,6 @@ void bdrv_set_dev_ops(BlockDriverState *bs, const 
  BlockDevOps *ops,
   {
   bs->dev_ops = ops;
   bs->dev_opaque = opaque;
  -if (bdrv_dev_has_removable_media(bs) && bs == bs_snapshots) {
  -bs_snapshots = NULL;
  -}
 
 This hunk isn't replaced by any other code. If I understand correctly
 what it's doing, it prevented you from saving the VM state to a
 removable device, which would be allowed after this patch.
 
 Is this a change we want to make? Why isn't it described in the commit
 message?

My understanding of this change is different.  Markus is on CC so maybe
he can confirm.

The point of bs_snapshots = NULL is not to prevent you from saving
snapshots.  It's simply to reset the pointer to the next snapshottable
device (used by bdrv_snapshots()).

See the bdrv_close() hunk above which does the same thing, as well as
bdrv_snapshots() which iterates bdrv_states and updates bs_snapshots.

So what this hunk does is to reset the bdrv_snapshots() iterator when a
removable device is hooked up to an emulated storage controller.  It's
no longer necessary since this patch drops the global state
(bs_snapshots) and users will always iterate from scratch.

The whole stateful approach was not necessary.

Stefan



Re: [Qemu-devel] [PATCH] block: add read only to whitelist

2013-05-28 Thread Stefan Hajnoczi
On Tue, May 28, 2013 at 02:44:26PM +0800, Fam Zheng wrote:
 We may want to include a driver in the whitelist for read-only tasks
 such as diagnosing or exporting guest data (libguestfs being a good
 example). This patch introduces the magic prefix ^ to include a driver
 in the whitelist but only allow qemu to open that image format
 read-only, returning -ENOTSUP for read-write opening.
 
 Example: To include vmdk readonly, and others read+write:
 ./configure --block-drv-whitelist=qcow2,raw,file,qed,^vmdk

This is great, thanks for tackling this.  block/vmdk.c isn't suitable
for running real VMs (read-write) since it's not optimized for
concurrent I/O but it is usable for libguestfs (read-only).

 Signed-off-by: Fam Zheng f...@redhat.com
 ---
  block.c   | 43 +++
  blockdev.c|  4 ++--
  configure |  2 ++
  include/block/block.h |  3 ++-
  scripts/create_config |  9 -
  5 files changed, 41 insertions(+), 20 deletions(-)
 
 diff --git a/block.c b/block.c
 index 3f87489..e6a7270 100644
 --- a/block.c
 +++ b/block.c
 @@ -328,28 +328,40 @@ BlockDriver *bdrv_find_format(const char *format_name)
  return NULL;
  }
  
 -static int bdrv_is_whitelisted(BlockDriver *drv)
 +static int bdrv_is_whitelisted(BlockDriver *drv, bool read_only)
  {
 -static const char *whitelist[] = {
 -CONFIG_BDRV_WHITELIST
 +static const char *whitelist_rw[] = {
 +CONFIG_BDRV_WHITELIST_RW
 +};
 +static const char *whitelist_ro[] = {
 +CONFIG_BDRV_WHITELIST_RO
  };
  const char **p;

Please also make the ./configure lists separate.  The funky ^ syntax is
not obvious.  Better to have:

  ./configure --block-drv-whitelist-rw=qcow2,raw,file,qed \
  --block-drv-whitelist-ro=vmdk,vpc

  
 -if (!whitelist[0])
  +if (!whitelist_rw[0] && !whitelist_ro[0]) {
  return 1;   /* no whitelist, anything goes */
 +}
  
 -for (p = whitelist; *p; p++) {
 +for (p = whitelist_rw; *p; p++) {
   if (!strcmp(drv->format_name, *p)) {
  return 1;
  }
  }
 +if (read_only) {
 +for (p = whitelist_ro; *p; p++) {
  +if (!strcmp(drv->format_name, *p)) {
 +return 1;
 +}
 +}
 +}
  return 0;
  }
  
 -BlockDriver *bdrv_find_whitelisted_format(const char *format_name)
 +BlockDriver *bdrv_find_whitelisted_format(const char *format_name,
 +  bool read_only)
  {
  BlockDriver *drv = bdrv_find_format(format_name);
  -return drv && bdrv_is_whitelisted(drv) ? drv : NULL;
  +return drv && bdrv_is_whitelisted(drv, read_only) ? drv : NULL;
  }
  
  typedef struct CreateCo {
 @@ -684,10 +696,6 @@ static int bdrv_open_common(BlockDriverState *bs, 
 BlockDriverState *file,
  
   trace_bdrv_open_common(bs, filename ?: "", flags, drv->format_name);
  
  -if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv)) {
 -return -ENOTSUP;
 -}
 -
  /* bdrv_open() with directly using a protocol as drv. This layer is 
 already
   * opened, so assign it to bs (while file becomes a closed 
 BlockDriverState)
   * and return immediately. */
if (file != NULL && drv->bdrv_file_open) {
bdrv_swap(file, bs);
return 0;
}

I think your change is okay.  You moved the check after this early
return, but file is already opened so we passed the whitelist check
already.  This is a little tricky but it seems fine.

Stefan



Re: [Qemu-devel] kvm process disappears

2013-05-28 Thread Stefan Hajnoczi
On Mon, May 27, 2013 at 11:09:51PM +0200, Stefan Priebe wrote:
 Am 10.05.2013 13:09, schrieb Stefan Hajnoczi:
 On Fri, May 10, 2013 at 11:07 AM, Stefan Priebe - Profihost AG
 s.pri...@profihost.ag wrote:
 Am 10.05.2013 09:42, schrieb Stefan Hajnoczi:
 On Fri, May 10, 2013 at 08:12:39AM +0200, Stefan Priebe - Profihost AG 
 wrote:
 3. Either use gdb or an LD_PRELOAD library that catches exit(3) and
 _exit(2) and dumps core using abort(3).  Make sure core dumps are
 enabled.
 
 LD_PRELOAD sounds good can you point me to such a lib?
 
 $ cat /tmp/catchexit.c
 #include <unistd.h>
 #include <stdlib.h>
 
 void exit(int status)
 {
  const char msg[] = "*** CAUGHT EXIT, DUMPING CORE ***\n";
  write(2, msg, sizeof msg);
  abort();
 }
 
 void _exit(int status) __attribute__((alias("exit")));
 
 $ gcc -o catchexit.so -shared -fPIC -std=gnu99 catchexit.c
 
 $ LD_PRELOAD=/tmp/catchexit.so x86_64-softmmu/qemu-system-x86_64 -m
 1024 -enable-kvm -cpu host -vga asdf
 Unknown vga type: asdf
 *** CAUGHT EXIT, DUMPING CORE ***
 Aborted (core dumped)
 
 Make sure to give the absolute path to catchexit.so.  Also keep in
 mind that this does not catch a normal return from main() or possibly
 other ways of terminating the process.
 
 You can hook more library functions, if necessary.
 
 Stefan
 
 I'm really sorry for bothering you. It turned out to be a host
 kernel bug. Without NUMA Balancing turned on (kernel 3.8.13) i see
 no vm crashes at all...

No worries.  Glad you found the solution.

Stefan



Re: [Qemu-devel] [Bug 1184089] Re: [Feature request] loadvm snapshot as read-only

2013-05-28 Thread Stefan Hajnoczi
On Mon, May 27, 2013 at 10:42:17PM -, Michael Coppola wrote:
 Awesome, looking forward to it.  I may be misunderstanding what's
 happening under the hood, but at least for me, calling 'loadvm' on a
 single snapshot over and over seems to work the first few times and then
 immediately blue screens the WinXP guest with PFN_LIST_CORRUPT.  I was
 under the assumption that all runtime modifications were being written
 back to the image, effectively corrupting something (whether it was
 changes to the snapshot or the backing image causing things to break).

savevm/loadvm does not use backing images.  It relies on internal
snapshot which are stored inside the existing qcow2 image file.

If you *are* using backing images then you're right - modifying the
backing image is likely to trigger weird guest behavior.

 Until then, I've seemed to have found a workaround for the feature
 itself.  Instead of creating a snapshot with 'savevm', I can start the
 VM with -snapshot and then call:
 
 migrate "exec: gzip -c > snapshot.gz"
 
 in QMP and it saves the live image to a compressed file.  Make sure
 migration has completed (check with "info migrate") before exiting.  Subsequently
 loading the snapshot with:
 
 qemu-* whatever flags -incoming "exec: gzip -c -d snapshot.gz"
 -snapshot
 
 will load the live snapshot and redirect all runtime modifications to a
 temp file.  http://www.linux-kvm.org/page/Migration says not to use
 -snapshot, but who follows the rules anyways? ;)  It seems to work so
 far and things haven't exploded yet.  Running md5sum on the qcow2 image
 and gzip snapshot before and after shows no changes to either files.

The reason that -snapshot isn't used together with migration is because
the disk state will be discarded and not migrated.

Stefan



Re: [Qemu-devel] [Bug 1184616] [NEW] undefined reference to`trace_qemu_anon_ram_alloc'

2013-05-28 Thread Stefan Hajnoczi
On Mon, May 27, 2013 at 09:04:26PM -, Nigel Horne wrote:
  On Mon, May 27, 2013 at 4:02 PM, Nigel Horne 1184...@bugs.launchpad.net 
  wrote:
  The latest git version (commit 6a4e17711442849bf2cc731ccddef5a2a2d92d29)
  fails to compile:
 
  ...
LINK  qemu-ga
  libqemuutil.a(oslib-posix.o): In function `qemu_anon_ram_alloc':
  oslib-posix.c:(.text+0x154): undefined reference to 
  `trace_qemu_anon_ram_alloc'
  libqemuutil.a(oslib-posix.o): In function `qemu_anon_ram_free':
  oslib-posix.c:(.text+0x1e7): undefined reference to 
  `trace_qemu_anon_ram_free'
  collect2: error: ld returned 1 exit status
  make: *** [qemu-ga] Error 1
 
  This is on Ubuntu 13.04, gcc 4.7.3, configure flags:
  './configure' '--enable-linux-aio' '--enable-kvm' '--enable-vhost-net'
 
  Please try:
  make distclean && ./configure --enable-linux-aio --enable-kvm
  --enable-vhost-net && make
 
 I tried that before I raised the bug.  However I tried it again to be sure, 
 and yes I still get the same error.

Please post the output of git status.  I wonder if there are stale
files because the build works fine here.

Stefan



[Qemu-devel] [PATCH] qemu-iotests: fix 054 cluster size help output

2013-05-28 Thread Stefan Hajnoczi
Commit f3f4d2c09b9cf46903ba38425ec46c44185162bd added a hint to increase
the cluster size when a large image cannot be created.  Test 054 now has
outdated output and fails because the golden output does not match.

This patch updates the 054 golden output.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 tests/qemu-iotests/054.out | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/054.out b/tests/qemu-iotests/054.out
index 0b2fe30..2f357c2 100644
--- a/tests/qemu-iotests/054.out
+++ b/tests/qemu-iotests/054.out
@@ -1,7 +1,7 @@
 QA output created by 054
 
 creating too large image (1 EB)
-qemu-img: The image size is too large for file format 'qcow2'
+qemu-img: The image size is too large for file format 'qcow2' (try using a 
larger cluster size)
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1152921504606846976 
 
 creating too large image (1 EB) using qcow2.py
-- 
1.8.1.4




Re: [Qemu-devel] snabbswitch integration with QEMU for userspace ethernet I/O

2013-05-28 Thread Stefan Hajnoczi
On Tue, May 28, 2013 at 12:10 PM, Luke Gorrie luk...@gmail.com wrote:
 On 27 May 2013 11:34, Stefan Hajnoczi stefa...@redhat.com wrote:

 vhost_net is about connecting a virtio-net speaking process to a
 tun-like device.  The problem you are trying to solve is connecting a
 virtio-net speaking process to Snabb Switch.


 Yep!


 Either you need to replace vhost or you need a tun-like device
 interface.

 Replacing vhost would mean that your switch implements virtio-net,
 shares guest RAM with the guest, and shares the ioeventfd and irqfd
 which are used to signal with the guest.


 This would be a great solution from my perspective. This is the design that
 I am now struggling to find a good implementation strategy for.


 At that point your switch is similar to the virtio-net data plane work
 that Ping Fan Liu is working
 on but your switch is in a separate process rather than a thread.


 Thanks for the reference! I was not aware of this work and it sounds highly
 relevant.

 How does your switch talk to hardware?  If you have userspace NIC
 drivers that bypass the Linux network stack then the approach I
 mentioned fits well.


 The switch talks to hardware using a built-in userspace (kernel bypass)
 device driver.

BTW there is an effort to get low-latency networking integrated into Linux:

http://thread.gmane.org/gmane.linux.kernel/1493276

Stefan


