Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
On Mon, May 20, 2013 at 09:23:43AM +0200, Paolo Bonzini wrote: On 20/05/2013 08:24, Stefan Hajnoczi wrote: You only need to fdatasync() before every guest flush, no? No, you need to set the dirty bit before issuing the write on the host. Otherwise the image data may be modified without setting the appropriate dirty bit. That would allow data modifications to go undetected! But data modifications can go undetected until the guest flush returns, can't they? You are thinking about it from the guest perspective - if a flush has not completed yet then there is no guarantee that the write has reached disk. But from a host perspective the dirty bitmap should be conservative so that the backup application can always restore a bit-for-bit identical copy of the disk image. It would be weird if writes could sneak in unnoticed. Stefan
Re: [Qemu-devel] [PATCH v2 2/2] net: introduce command to query mac-table information
On Tue, May 21, 2013 at 12:46:09PM +0800, Amos Kong wrote: On Fri, May 17, 2013 at 09:39:31AM +0200, Stefan Hajnoczi wrote: On Thu, May 16, 2013 at 07:07:25PM +0800, Amos Kong wrote: Hi Stefan, @@ -961,6 +961,44 @@ void print_net_client(Monitor *mon, NetClientState *nc) nc->info_str); } +MacTableInfoList *qmp_query_mac_table(bool has_name, const char *name, + Error **errp) +{ +NetClientState *nc; +MacTableInfoList *table_list = NULL, *last_entry = NULL; + +QTAILQ_FOREACH(nc, &net_clients, next) { +MacTableInfoList *entry; +MacTableInfo *info; + +if (nc->info->type != NET_CLIENT_OPTIONS_KIND_NIC) { +continue; +} +if (has_name && strcmp(nc->name, name) != 0) { +continue; +} if (has_name && strcmp(nc->name, name) != 0) { error_setg(errp, "invalid net client name: %s", name); break; } We still need to search the other NICs. Bailing early doesn't work: imagine we have nic1 and nic2. If the user invokes query-mac-table nic2 then this code would return an error! Stefan
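The review comment can be illustrated with a small Python model of the loop. The names and helpers here are illustrative only, not QEMU code — a sketch of why bailing out on the first non-matching NIC is wrong:

```python
# Hypothetical Python model of the filtering loop above; client names
# and helper functions are illustrative, not QEMU code.

def query_mac_table_broken(clients, name=None):
    """Errors out on the first non-matching NIC (the reviewed bug)."""
    for client in clients:
        if name is not None and client != name:
            return None  # analogous to error_setg() + break
        return client
    return None

def query_mac_table_fixed(clients, name=None):
    """Skips non-matching NICs and keeps searching the whole list."""
    for client in clients:
        if name is not None and client != name:
            continue  # keep looking at the remaining NICs
        return client  # found the requested NIC
    return None  # report "not found" only after the full scan

clients = ["nic1", "nic2"]
# The broken version fails as soon as it sees nic1:
assert query_mac_table_broken(clients, "nic2") is None
# The fixed version keeps scanning and finds nic2:
assert query_mac_table_fixed(clients, "nic2") == "nic2"
```

The "not found" error belongs after the loop, once every net client has been examined.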
Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
On Tue, May 21, 2013 at 11:25:01AM +0800, Wenchao Xia wrote: On 2013-5-17 17:14, Stefan Hajnoczi wrote: On Fri, May 17, 2013 at 02:58:57PM +0800, Wenchao Xia wrote: On 2013-5-16 15:47, Stefan Hajnoczi wrote: On Thu, May 16, 2013 at 02:16:20PM +0800, Wenchao Xia wrote: After checking the code, I found it possible to add delta data backup support also, if an additional dirty bitmap was added. I've been thinking about this. Incremental backups need to know which blocks have changed, but keeping a persistent dirty bitmap is expensive and unnecessary. Yes, it would likely be another block layer, so I hope not to do that. Not at all, persistent dirty bitmaps need to be part of the block layer since they need to support any image type - qcow2, Gluster, raw LVM, etc. I don't consider block jobs to be the qemu device layer. It sounds like you think the code should be in bdrv_co_do_writev()? I feel a trend of growing fragility from the different solutions, and COW is a key feature that the block layer provides, so I wonder if it can be adjusted under the block layer later. The generic block layer includes more than just block.c. It also includes block jobs and the qcow2 metadata cache that Dong Xu has extracted recently, for example. Therefore you need to be more specific about what and why. This copy-on-write backup approach is available as a block job which runs on top of any BlockDriverState. What concrete change are you proposing? Since it is hard to hide it behind BlockDriverState now, I suggest adding some documentation in qemu about the three snapshot types: qcow2 internal, backing chain, drive-backup, which are all qemu software-based snapshot implementations, so users can tell the difference more easily. In the long term, I hope to form a library exposing those in a unified format, perhaps calling qmp_transaction internally, and make it easier to offload if possible, so I hope for an abstract-driver structure. Okay, just keep in mind they have different behavior.
That means these snapshot types solve different problems and may be inappropriate for some use cases. Stefan
Re: [Qemu-devel] [PATCH] hw/9pfs: use O_NOFOLLOW for mapped readlink operation
On Mon, May 20, 2013 at 11:04:29PM +0530, Aneesh Kumar K.V wrote: From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com With mapped security models like mapped-xattr and mapped-file, we save the symlink target as file contents. Now if we ever expose a normal directory with mapped security model and find real symlinks in export path, never follow them and return proper error. Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- hw/9pfs/virtio-9p-local.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
Re: [Qemu-devel] [PATCH 0/2] coroutine: dataplane support
On Fri, May 17, 2013 at 04:08:36PM +0100, Peter Maydell wrote: On 17 May 2013 14:51, Stefan Hajnoczi stefa...@redhat.com wrote: There is ongoing work to enable multiple event loop threads. This will allow QEMU itself to take advantage of SMP and reduce Big QEMU Lock (BQL) contention. This series is one step in that effort. These patches make coroutines safe in a multi-event loop/multi-threaded world. I have successfully tested them running qcow2 in a dataplane thread (further patches are required which I'll be sending soon). Did you test the sigaltstack backend as well as ucontext? I know in theory they should be the same but still :-) Yes, I have tested sigaltstack successfully with make check and qemu-iotests. Stefan
Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
On Tue, May 21, 2013 at 10:30:22AM +0200, Paolo Bonzini wrote: On 21/05/2013 09:31, Stefan Hajnoczi wrote: On Mon, May 20, 2013 at 09:23:43AM +0200, Paolo Bonzini wrote: On 20/05/2013 08:24, Stefan Hajnoczi wrote: You only need to fdatasync() before every guest flush, no? No, you need to set the dirty bit before issuing the write on the host. Otherwise the image data may be modified without setting the appropriate dirty bit. That would allow data modifications to go undetected! But data modifications can go undetected until the guest flush returns, can't they? You are thinking about it from the guest perspective - if a flush has not completed yet then there is no guarantee that the write has reached disk. But from a host perspective the dirty bitmap should be conservative so that the backup application can always restore a bit-for-bit identical copy of the disk image. It would be weird if writes could sneak in unnoticed. True, but that would happen only in case the host crashes. Even for a QEMU crash the changes would be safe, I think. They would be written back when the persistent dirty bitmap's mmap() area is unmapped, during process exit. I'd err on the side of caution: mark the persistent dirty bitmap while QEMU is running, and discard the file if there was a power failure. It really depends what the dirty bitmap users are doing. It could be okay to have a tiny chance of missing a modification, but it might not. Stefan
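The ordering argument in this thread — set the dirty bit before issuing the host write, so the bitmap only ever errs toward false positives — can be sketched in a few lines of Python. This is an illustrative model, not QEMU code:

```python
# Illustrative sketch (not QEMU code) of why the dirty bit must be set
# before the host write is issued: if the process dies between the two
# steps, a set-but-unwritten bit is a harmless false positive, while a
# written-but-unset bit would let a modification escape the backup.

class Disk:
    def __init__(self, nsectors):
        self.data = [0] * nsectors
        self.dirty = [False] * nsectors

    def write(self, sector, value, crash_after_mark=False):
        self.dirty[sector] = True      # conservative: mark first
        if crash_after_mark:
            return                     # simulate a crash mid-operation
        self.data[sector] = value

    def blocks_to_back_up(self):
        return [i for i, d in enumerate(self.dirty) if d]

disk = Disk(4)
disk.write(2, 0xAB)
disk.write(3, 0xCD, crash_after_mark=True)  # data never hit "disk"
# Sector 3 is a false positive, but every modified sector is covered:
assert disk.blocks_to_back_up() == [2, 3]
assert all(disk.dirty[i] for i, v in enumerate(disk.data) if v != 0)
```

Reversing the two steps in write() would make the opposite failure possible: modified data with a clear dirty bit, which the backup application could never detect.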
[Qemu-devel] [PATCH v2] tests: set MALLOC_PERTURB_ to expose memory bugs
glibc wipes malloc(3) memory when the MALLOC_PERTURB_ environment variable is set. The value of the environment variable determines the bit pattern used to wipe memory. For more information, see http://udrepper.livejournal.com/11429.html. Set MALLOC_PERTURB_ for gtester and qemu-iotests. Note we pick a random value from 1 to 255 to expose more bugs. If you need to reproduce a crash use 'show environment' in gdb to extract the MALLOC_PERTURB_ value from a core dump. Both make check and qemu-iotests pass with MALLOC_PERTURB_ enabled. Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- tests/Makefile | 5 - tests/qemu-iotests/check | 3 ++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/tests/Makefile b/tests/Makefile index a307d5a..24880c6 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -171,6 +171,7 @@ GCOV_OPTIONS = -n $(if $(V),-f,) $(patsubst %, check-qtest-%, $(QTEST_TARGETS)): check-qtest-%: $(check-qtest-y) $(if $(CONFIG_GCOV),@rm -f *.gcda */*.gcda */*/*.gcda */*/*/*.gcda,) $(call quiet-command,QTEST_QEMU_BINARY=$*-softmmu/qemu-system-$* \ + MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(($RANDOM % 255 + 1))} \ gtester $(GTESTER_OPTIONS) -m=$(SPEED) $(check-qtest-$*-y),GTESTER $@) $(if $(CONFIG_GCOV),@for f in $(gcov-files-$*-y); do \ echo Gcov report for $$f:;\ @@ -180,7 +181,9 @@ $(patsubst %, check-qtest-%, $(QTEST_TARGETS)): check-qtest-%: $(check-qtest-y) .PHONY: $(patsubst %, check-%, $(check-unit-y)) $(patsubst %, check-%, $(check-unit-y)): check-%: % $(if $(CONFIG_GCOV),@rm -f *.gcda */*.gcda */*/*.gcda */*/*/*.gcda,) - $(call quiet-command,gtester $(GTESTER_OPTIONS) -m=$(SPEED) $*,GTESTER $*) + $(call quiet-command, \ + MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(($RANDOM % 255 + 1))} \ + gtester $(GTESTER_OPTIONS) -m=$(SPEED) $*,GTESTER $*) $(if $(CONFIG_GCOV),@for f in $(gcov-files-$(subst tests/,,$*)-y); do \ echo Gcov report for $$f:;\ $(GCOV) $(GCOV_OPTIONS) $$f -o `dirname $$f`; \ diff --git a/tests/qemu-iotests/check 
b/tests/qemu-iotests/check index 432732c..74628ae 100755 --- a/tests/qemu-iotests/check +++ b/tests/qemu-iotests/check @@ -214,7 +214,8 @@ do start=`_wallclock` $timestamp && echo -n "[`date +%T`]" [ ! -x $seq ] && chmod u+x $seq # ensure we can run it - ./$seq >$tmp.out 2>&1 + MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(($RANDOM % 255 + 1))} \ + ./$seq >$tmp.out 2>&1 sts=$? $timestamp && _timestamp stop=`_wallclock` -- 1.8.1.4
Re: [Qemu-devel] [PATCH v3 2/8] block: add basic backup support to block driver
On Mon, May 20, 2013 at 01:30:37PM +0200, Paolo Bonzini wrote: On 15/05/2013 16:34, Stefan Hajnoczi wrote: +wait_for_overlapping_requests(job, start, end); +cow_request_begin(&cow_request, job, start, end); + +for (; start < end; start++) { +if (hbitmap_get(job->bitmap, start)) { +DPRINTF("brdv_co_backup_cow skip C%" PRId64 "\n", start); +continue; /* already copied */ +} + +/* immediately set bitmap (avoid coroutine race) */ +hbitmap_set(job->bitmap, start, 1); + HBitmap already has code for finding the next set bit, but you're not using it. If you reverse the direction of the bitmap, you can use an HBitmapIter here: +start = 0; +end = DIV_ROUND_UP(bdrv_getlength(bs) / BDRV_SECTOR_SIZE, + BACKUP_BLOCKS_PER_CLUSTER); + +job->bitmap = hbitmap_alloc(end, 0); + +before_write = bdrv_add_before_write_cb(bs, backup_before_write); + +DPRINTF("backup_run start %s %" PRId64 " %" PRId64 "\n", +bdrv_get_device_name(bs), start, end); + +for (; start < end; start++) { ... instead of iterating through each item manually. /** * hbitmap_iter_init: [...] * position of the iterator is also okay. However, concurrent resetting of * bits can lead to unexpected behavior if the iterator has not yet reached * those bits. */ void hbitmap_iter_init(HBitmapIter *hbi, const HBitmap *hb, uint64_t first); This worries me. We would initialize the bitmap to all 1s. Backing up a cluster resets the bit. But the documentation says it is not safe to reset bits while iterating? Stefan
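The concern about resetting bits ahead of the iterator can be made concrete with a toy model. This is not the real HBitmap implementation — just a naive live-state iterator, sketched to show the interaction the documentation warns about:

```python
# Toy model (not the real HBitmap) of iterating over set bits while
# another code path clears bits the iterator has not yet reached.

def iter_set_bits(bits):
    """Naive iterator over indices of set bits, reading live state."""
    for i in range(len(bits)):
        if bits[i]:
            yield i

bits = [True] * 4        # all clusters still need backing up
visited = []
for i in iter_set_bits(bits):
    visited.append(i)
    bits[i] = False      # clearing the *current* bit is fine
    if i == 0:
        bits[2] = False  # concurrent clear ahead of the iterator

# Cluster 2 was cleared before the iterator reached it, so it is
# skipped - the behavior the copy-on-write path would rely on, but
# which the HBitmap documentation does not promise.
assert visited == [0, 1, 3]
```

For this naive iterator the concurrent reset simply skips the cluster, which is what the backup job would want; the question raised above is whether the real multi-level HBitmap iterator guarantees the same.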
Re: [Qemu-devel] [PATCH v1 07/14] net: hub use lock to protect ports list
On Tue, May 07, 2013 at 01:46:55PM +0800, Liu Ping Fan wrote: From: Liu Ping Fan pingf...@linux.vnet.ibm.com Hub ports will run on multi-threads, so use lock to protect them. Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com --- net/hub.c | 25 - 1 files changed, 24 insertions(+), 1 deletions(-) Does this rely on net queue re-entrancy detection? Otherwise we'll deadlock when slirp re-enters the net layer, for example for ARP requests. I suggest moving this later in the patch series when the net queue is protected already. Stefan
Re: [Qemu-devel] [PATCH v1 08/14] net: introduce lock to protect NetQueue
On Tue, May 07, 2013 at 01:46:56PM +0800, Liu Ping Fan wrote: From: Liu Ping Fan pingf...@linux.vnet.ibm.com NetQueue will be accessed by nc and its peers at the same time, need lock to protect it. Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com --- net/queue.c | 11 +++ 1 files changed, 11 insertions(+), 0 deletions(-) diff --git a/net/queue.c b/net/queue.c index 859d02a..2856c1d 100644 --- a/net/queue.c +++ b/net/queue.c @@ -53,6 +53,7 @@ struct NetQueue { uint32_t nq_maxlen; uint32_t nq_count; +QemuMutex lock; QTAILQ_HEAD(packets, NetPacket) packets; unsigned delivering : 1; @@ -68,6 +69,7 @@ NetQueue *qemu_new_net_queue(void *opaque) queue->nq_maxlen = 1; queue->nq_count = 0; +qemu_mutex_init(&queue->lock); QTAILQ_INIT(&queue->packets); queue->delivering = 0; @@ -107,7 +109,9 @@ static void qemu_net_queue_append(NetQueue *queue, memcpy(packet->data, buf, size); queue->nq_count++; +qemu_mutex_lock(&queue->lock); QTAILQ_INSERT_TAIL(&queue->packets, packet, entry); +qemu_mutex_unlock(&queue->lock); nq_count must be protected too.
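The review point is that the counter and the list form a single invariant, so both updates must sit inside the critical section. A hypothetical Python rendering of the corrected append path (illustrative, not QEMU code):

```python
# Hypothetical Python model of the review comment: nq_count and the
# packet list form one invariant, so both updates belong under the lock.
import threading

class NetQueue:
    def __init__(self):
        self.lock = threading.Lock()
        self.nq_count = 0
        self.packets = []

    def append(self, packet):
        with self.lock:            # protects nq_count AND the list
            self.nq_count += 1
            self.packets.append(packet)

q = NetQueue()
threads = [threading.Thread(
               target=lambda: [q.append(b"pkt") for _ in range(1000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# The invariant nq_count == len(packets) holds under concurrency:
assert q.nq_count == len(q.packets) == 4000
```

With the increment outside the lock, as in the patch hunk above, two threads can interleave and leave nq_count out of sync with the queue contents.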
Re: [Qemu-devel] [PATCH v2] tests: set MALLOC_PERTURB_ to expose memory bugs
On Tue, May 21, 2013 at 06:56:07AM -0600, Eric Blake wrote: On 05/21/2013 06:43 AM, Stefan Hajnoczi wrote: glibc wipes malloc(3) memory when the MALLOC_PERTURB_ environment variable is set. The value of the environment variable determines the bit pattern used to wipe memory. For more information, see http://udrepper.livejournal.com/11429.html. Set MALLOC_PERTURB_ for gtester and qemu-iotests. Note we pick a random value from 1 to 255 to expose more bugs. If you need to reproduce a crash use 'show environment' in gdb to extract the MALLOC_PERTURB_ value from a core dump. Both make check and qemu-iotests pass with MALLOC_PERTURB_ enabled. Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- tests/Makefile | 5 - tests/qemu-iotests/check | 3 ++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/tests/Makefile b/tests/Makefile index a307d5a..24880c6 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -171,6 +171,7 @@ GCOV_OPTIONS = -n $(if $(V),-f,) $(patsubst %, check-qtest-%, $(QTEST_TARGETS)): check-qtest-%: $(check-qtest-y) $(if $(CONFIG_GCOV),@rm -f *.gcda */*.gcda */*/*.gcda */*/*/*.gcda,) $(call quiet-command,QTEST_QEMU_BINARY=$*-softmmu/qemu-system-$* \ + MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(($RANDOM % 255 + 1))} \ This is a Makefile; don't you need to use $$ instead of $ (three instances)? $RANDOM is a bash-ism. 
If make is run with SHELL as /bin/sh on a platform where dash is the primary shell, it will fail: $ dash -c 'echo $(($RANDOM % 255))' dash: 1: arithmetic expression: expecting primary: % 255 HOWEVER: you can exploit the fact that inside $(()), you don't need $ to use the value of a defined variable, and also the fact that unless set -u is in effect, an undefined variable name silently evaluates as 0: $ dash -c 'echo $((RANDOM % 255))' 0 then you could write the shell code: MALLOC_PERTURB_=${MALLOC_PERTURB_:-$((RANDOM % 255 + 1))} or the Makefile code: MALLOC_PERTURB_=$${MALLOC_PERTURB_:-$$((RANDOM % 255 + 1))} and things will at least work on /bin/sh as dash (even though there will be no randomness and you are always testing with 1 in that case). Silly me. I did test it but it silently worked. Will resend. Stefan
[Qemu-devel] [PATCH v3] tests: set MALLOC_PERTURB_ to expose memory bugs
glibc wipes malloc(3) memory when the MALLOC_PERTURB_ environment variable is set. The value of the environment variable determines the bit pattern used to wipe memory. For more information, see http://udrepper.livejournal.com/11429.html. Set MALLOC_PERTURB_ for gtester and qemu-iotests. Note we pick a random value from 1 to 255 to expose more bugs. If you need to reproduce a crash use 'show environment' in gdb to extract the MALLOC_PERTURB_ value from a core dump. Both make check and qemu-iotests pass with MALLOC_PERTURB_ enabled. Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- v3: * Use $ escaping in tests/Makefile [eblake] v2: * Randomize MALLOC_PERTURB_ value [armbru] * Preserve existing MALLOC_PERTURB_ variable, if set [danpb] tests/Makefile | 5 - tests/qemu-iotests/check | 3 ++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/tests/Makefile b/tests/Makefile index a307d5a..7844cd7 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -171,6 +171,7 @@ GCOV_OPTIONS = -n $(if $(V),-f,) $(patsubst %, check-qtest-%, $(QTEST_TARGETS)): check-qtest-%: $(check-qtest-y) $(if $(CONFIG_GCOV),@rm -f *.gcda */*.gcda */*/*.gcda */*/*/*.gcda,) $(call quiet-command,QTEST_QEMU_BINARY=$*-softmmu/qemu-system-$* \ + MALLOC_PERTURB_=$${MALLOC_PERTURB_:-$$(($$RANDOM % 255 + 1))} \ gtester $(GTESTER_OPTIONS) -m=$(SPEED) $(check-qtest-$*-y),GTESTER $@) $(if $(CONFIG_GCOV),@for f in $(gcov-files-$*-y); do \ echo Gcov report for $$f:;\ @@ -180,7 +181,9 @@ $(patsubst %, check-qtest-%, $(QTEST_TARGETS)): check-qtest-%: $(check-qtest-y) .PHONY: $(patsubst %, check-%, $(check-unit-y)) $(patsubst %, check-%, $(check-unit-y)): check-%: % $(if $(CONFIG_GCOV),@rm -f *.gcda */*.gcda */*/*.gcda */*/*/*.gcda,) - $(call quiet-command,gtester $(GTESTER_OPTIONS) -m=$(SPEED) $*,GTESTER $*) + $(call quiet-command, \ + MALLOC_PERTURB_=$${MALLOC_PERTURB_:-$$(($$RANDOM % 255 + 1))} \ + gtester $(GTESTER_OPTIONS) -m=$(SPEED) $*,GTESTER $*) $(if $(CONFIG_GCOV),@for f in 
$(gcov-files-$(subst tests/,,$*)-y); do \ echo "Gcov report for $$f:";\ $(GCOV) $(GCOV_OPTIONS) $$f -o `dirname $$f`; \ diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check index 432732c..74628ae 100755 --- a/tests/qemu-iotests/check +++ b/tests/qemu-iotests/check @@ -214,7 +214,8 @@ do start=`_wallclock` $timestamp && echo -n "[`date +%T`]" [ ! -x $seq ] && chmod u+x $seq # ensure we can run it - ./$seq >$tmp.out 2>&1 + MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(($RANDOM % 255 + 1))} \ + ./$seq >$tmp.out 2>&1 sts=$? $timestamp && _timestamp stop=`_wallclock` -- 1.8.1.4
Re: [Qemu-devel] [PATCH] kvm: add detail error message when fail to add ioeventfd
On Wed, May 22, 2013 at 12:57:35PM +0800, Amos Kong wrote: I try to hotplug 28 * 8 multiple-function devices to guest with old host kernel, ioeventfds in host kernel will be exhausted, then qemu fails to allocate ioeventfds for blk/nic devices. It's better to add detail error here. Signed-off-by: Amos Kong ak...@redhat.com --- kvm-all.c |4 1 files changed, 4 insertions(+), 0 deletions(-) It would be nice to make kvm bus scalable so that the hardcoded in-kernel I/O device limit can be lifted. Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
Re: [Qemu-devel] 'qemu-nbd' explicit flush
On Tue, May 21, 2013 at 08:01:10PM +, Mark Trumpold wrote: Linux kernel 3.3.1 with Qemu patch to enable kernel flushing: http://thread.gmane.org/gmane.linux.drivers.nbd.general/1108 Did you check that the kernel is sending NBD_FLUSH commands? You can use tcpdump and then check the captured network traffic. Usage example: 'qemu-nbd --cache=writeback -c /dev/nbd0 /images/my-qcow.img' 'mount /dev/nbd0 /my-mount-point' Everything does flush correctly when I first unmount and then disconnect the device; however, in my case I am not able to unmount things before snapshotting. I tried several approaches externally to flush the device. For example: 'mount -o remount,ro /dev/nbd0' 'blockdev --flushbufs /dev/nbd0' Did you try plain old sync(1)? I have been looking at the Qemu source code and in user space 'nbd.c' in routine 'nbd_trip' I see the case 'NBD_CMD_FLUSH' which looks to be called from the NBD socket interface. Here I see 'bdrv_co_flush(exp->bs)' which looks promising; however, I don't know how to set up the 'bs' pointer for the call. bs is the block device which was exported using: exp = nbd_export_new(bs, dev_offset, fd_size, nbdflags, nbd_export_closed); in qemu-nbd.c:main(). Ideally, I would like to add a command line parm to 'qemu-nbd.c' to explicitly do the flush, but so far no luck. Doing that is a little tricky, I think there are two options: 1. Add a signal handler (like SIGHUP or SIGUSR1) to qemu-nbd which flushes all exports. 2. Instantiate a block/nbd.c client that connects to the running qemu-nbd server (make sure format=raw). Then call bdrv_flush() on the NBD client. You must use the qemu-nbd --shared=2 option. Stefan
Re: [Qemu-devel] [PATCH] makefile: detect corrupted elf files
On Wed, May 22, 2013 at 12:46:45AM +0300, Michael S. Tsirkin wrote: Once in a while make gets killed and doesn't clean up partial object files after it. Result is nasty errors from link. This hack checks object is well formed before linking, and rebuilds it if not. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Is below useful for others? Makefile.target | 7 +++ 1 file changed, 7 insertions(+) diff --git a/Makefile.target b/Makefile.target index ce4391f..4dddee5 100644 --- a/Makefile.target +++ b/Makefile.target @@ -191,3 +191,10 @@ endif GENERATED_HEADERS += config-target.h Makefile: $(GENERATED_HEADERS) + +.SECONDEXPANSION: + +.PHONY: CORRUPTBINARY + +$(all-obj-y): % : $$(if $$(shell size %), , CORRUPTBINARY) How does size(1) establish the validity of the ELF file? Is it possible to sneak past a truncated file (which I think is the only type of corruption you're trying to protect against)? Stefan
[Qemu-devel] [PATCH] rtl8139: flush queued packets when RxBufPtr is written
Net queues support efficient receive disable. For example, tap's file descriptor will not be polled while its peer has receive disabled. This saves CPU cycles for needlessly copying and then dropping packets which the peer cannot receive. rtl8139 is missing the qemu_flush_queued_packets() call that wakes the queue up when receive becomes possible again. As a result, the Windows 7 guest driver reaches a state where the rtl8139 cannot receive packets. The driver has actually refilled the receive buffer but we never resume reception. The bug can be reproduced by running a large FTP 'get' inside a Windows 7 guest: $ qemu -netdev tap,id=tap0,... -device rtl8139,netdev=tap0 The Linux guest driver does not trigger the bug, probably due to a different buffer management strategy. Reported-by: Oliver Francke oliver.fran...@filoo.de Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- hw/net/rtl8139.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c index 9369507..7993f9f 100644 --- a/hw/net/rtl8139.c +++ b/hw/net/rtl8139.c @@ -2575,6 +2575,9 @@ static void rtl8139_RxBufPtr_write(RTL8139State *s, uint32_t val) /* this value is off by 16 */ s->RxBufPtr = MOD2(val + 0x10, s->RxBufferSize); +/* more buffer space may be available so try to receive */ +qemu_flush_queued_packets(qemu_get_queue(s->nic)); + DPRINTF("CAPR write: rx buffer length %d head 0x%04x read 0x%04x\n", s->RxBufferSize, s->RxBufAddr, s->RxBufPtr); } -- 1.8.1.4
Re: [Qemu-devel] Fwd: [Qemu-stable] connectivity problem with Windows 7 + heavy network-traffic
On Wed, May 22, 2013 at 11:27 AM, Oliver Francke oliver.fran...@filoo.dewrote: Hi Stefan, thanks for your attention. See all infos below including complete command-line. Hi Oliver, I just sent a fix for qemu.git/master. If you need to backport it, replace qemu_get_queue(s->nic) with s->nic->nc. Please let me know if the fix solves the issue for you. Thanks, Stefan
Re: [Qemu-devel] [PATCH] rtl8139: flush queued packets when RxBufPtr is written
On Wed, May 22, 2013 at 2:53 PM, Andreas Färber afaer...@suse.de wrote: Am 22.05.2013 14:50, schrieb Stefan Hajnoczi: Net queues support efficient receive disable. For example, tap's file descriptor will not be polled while its peer has receive disabled. This saves CPU cycles for needlessly copying and then dropping packets which the peer cannot receive. rtl8139 is missing the qemu_flush_queued_packets() call that wakes the queue up when receive becomes possible again. As a result, the Windows 7 guest driver reaches a state where the rtl8139 cannot receive packets. The driver has actually refilled the receive buffer but we never resume reception. The bug can be reproduced by running a large FTP 'get' inside a Windows 7 guest: $ qemu -netdev tap,id=tap0,... -device rtl8139,netdev=tap0 The Linux guest driver does not trigger the bug, probably due to a different buffer management strategy. Reported-by: Oliver Francke oliver.fran...@filoo.de Signed-off-by: Stefan Hajnoczi stefa...@redhat.com Sounds as if we should Cc: qemu-sta...@nongnu.org Yes, please. Oliver just confirmed that it fixes the issue for him on IRC so this is good for QEMU 1.5.1. Stefan
Re: [Qemu-devel] [PATCH v3 2/8] block: add basic backup support to block driver
On Tue, May 21, 2013 at 06:46:39PM +0200, Paolo Bonzini wrote: On 21/05/2013 18:26, Dietmar Maurer wrote: Hmm, right. But do we need the bitmap at all? We can just use bdrv_is_allocated like bdrv_co_do_readv does. Does that work with an nbd driver? Ah, right. That's the answer. Yes, the target may not support allocation info. Or does that add another RPC call (slow down)? It doesn't work at all. Thus Stefan's patch is fine (except if you don't use HBitmapIter, there's no advantage in using HBitmap; you can use qemu/bitmap.h instead). I chose HBitmap because bitmap.h uses the int type instead of uint64_t, which makes it risky to use in the block layer. Stefan
Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
On Tue, May 21, 2013 at 10:58:47AM +, Dietmar Maurer wrote: True, but that would happen only in case the host crashes. Even for a QEMU crash the changes would be safe, I think. They would be written back when the persistent dirty bitmap's mmap() area is unmapped, during process exit. I'd err on the side of caution: mark the persistent dirty bitmap while QEMU is running, and discard the file if there was a power failure. Agreed. Though this is something that management must do manually, isn't it? QEMU cannot distinguish a SIGKILL from a power failure, while management can afford treating SIGKILL as a power failure. It really depends what the dirty bitmap users are doing. It could be okay to have a tiny chance of missing a modification, but it might not. I just want to mention that there is another way to do incremental backups. Instead of using a dirty bitmap, you can compare the content, usually using a digest (SHA1) on clusters. Reading gigabytes of data from disk is expensive though. I guess they keep a Merkle tree so it's easy to find out which parts of the image must be transferred without re-reading the entire image. That sounds like more work than a persistent dirty bitmap. The advantage is that while dirty bitmaps are consumed by a single user, the Merkle tree can be used to sync up any number of replicas. That way you can also implement async replication to a remote site (like MS do). Sounds like rsync. Stefan
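The Merkle-tree idea mentioned here can be sketched briefly. This is a hypothetical layout, not any vendor's actual implementation: hash each cluster, hash pairs of hashes up to a root, and let two replicas compare roots before looking at any leaves:

```python
# Sketch of the Merkle-tree approach (hypothetical layout): identical
# images are detected with a single root comparison; otherwise only
# clusters whose leaf hashes differ need to be transferred.
import hashlib

def merkle(hashes):
    """Build tree levels bottom-up; returns list of levels, root last."""
    levels = [hashes]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([hashlib.sha1(prev[i] + prev[min(i + 1, len(prev) - 1)]).digest()
                       for i in range(0, len(prev), 2)])
    return levels

def cluster_hashes(clusters):
    return [hashlib.sha1(c).digest() for c in clusters]

def diff(a, b):
    """Indices of clusters whose leaf hashes differ (root checked first)."""
    ta, tb = merkle(a), merkle(b)
    if ta[-1] == tb[-1]:
        return []                  # identical images: one comparison
    return [i for i, (x, y) in enumerate(zip(ta[0], tb[0])) if x != y]

primary = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
replica = [b"AAAA", b"BBBB", b"XXXX", b"DDDD"]
assert diff(cluster_hashes(primary), cluster_hashes(primary)) == []
assert diff(cluster_hashes(primary), cluster_hashes(replica)) == [2]
```

A real implementation would descend the tree to avoid comparing every leaf, and would keep the tree persistent so the image need not be re-read; this sketch only shows why multiple replicas can share one tree where a dirty bitmap serves a single consumer.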
Re: [Qemu-devel] [PATCH v4 2/8] block: add basic backup support to block driver
On Wed, May 22, 2013 at 11:56:45AM +0200, Kevin Wolf wrote: On 22.05.2013 at 11:54, Paolo Bonzini wrote: On 22/05/2013 11:38, Kevin Wolf wrote: + +DPRINTF("brdv_co_backup_cow done C%" PRId64 "\n", start); +} + +out: +if (bounce_buffer) { +qemu_vfree(bounce_buffer); +} + +cow_request_end(&cow_request); + +qemu_co_rwlock_unlock(&job->flush_rwlock); + +return ret; +} + +static void coroutine_fn backup_before_write_notify(Notifier *notifier, +void *opaque) +{ +BdrvTrackedRequest *req = opaque; +backup_do_cow(req->bs, req->sector_num, req->nb_sectors); +} I don't think you can ignore errors here. Not sure if we can stop the VM and resume later or something like that, but if we can't, the backup will be invalid and we must fail the job. Yes, there is rerror/werror machinery for jobs that this patch is not using. This is not enough here. The guest write can't continue before the old content is saved to the backup image. Are you saying just the vm_stop(RUN_STATE_IO_ERROR) call is missing? I think the reason it's not there is because there's an assumption that block jobs are in the background and do not affect the guest. The guest continues running while the block job is in an error state. But perhaps we can make the vm_stop() call optional so that drive-backup can use it. Stefan
Re: [Qemu-devel] [PATCH v4 4/8] qemu-iotests: add 055 drive-backup test case
On Wed, May 22, 2013 at 01:19:58PM +0200, Kevin Wolf wrote: Am 16.05.2013 um 10:36 hat Stefan Hajnoczi geschrieben: Testing drive-backup is similar to image streaming and drive mirroring. This test case is based on 041. Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- tests/qemu-iotests/055 | 230 + tests/qemu-iotests/055.out | 5 + tests/qemu-iotests/group | 1 + 3 files changed, 236 insertions(+) create mode 100755 tests/qemu-iotests/055 create mode 100644 tests/qemu-iotests/055.out diff --git a/tests/qemu-iotests/055 b/tests/qemu-iotests/055 new file mode 100755 index 000..bc2eebf --- /dev/null +++ b/tests/qemu-iotests/055 @@ -0,0 +1,230 @@ +#!/usr/bin/env python +# +# Tests for drive-backup +# +# Copyright (C) 2013 Red Hat, Inc. +# +# Based on 041. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. 
+# + +import time +import os +import iotests +from iotests import qemu_img, qemu_io + +test_img = os.path.join(iotests.test_dir, 'test.img') +target_img = os.path.join(iotests.test_dir, 'target.img') + +class DriveBackupTestCase(iotests.QMPTestCase): +'''Abstract base class for drive-backup test cases''' + +def assert_no_active_backups(self): +result = self.vm.qmp('query-block-jobs') +self.assert_qmp(result, 'return', []) + +def cancel_and_wait(self, drive='drive0'): +'''Cancel a block job and wait for it to finish''' +result = self.vm.qmp('block-job-cancel', device=drive) +self.assert_qmp(result, 'return', {}) + +cancelled = False +while not cancelled: +for event in self.vm.get_qmp_events(wait=True): +if event['event'] == 'BLOCK_JOB_COMPLETED' or \ + event['event'] == 'BLOCK_JOB_CANCELLED': +self.assert_qmp(event, 'data/type', 'backup') +self.assert_qmp(event, 'data/device', drive) +cancelled = True + +self.assert_no_active_backups() + +def complete_and_wait(self): +completed = False +while not completed: +for event in self.vm.get_qmp_events(wait=True): +if event['event'] == 'BLOCK_JOB_COMPLETED': +self.assert_qmp(event, 'data/type', 'backup') +self.assert_qmp(event, 'data/device', 'drive0') +self.assert_qmp(event, 'data/offset', self.image_len) +self.assert_qmp(event, 'data/len', self.image_len) +completed = True +self.assert_no_active_backups() + +def compare_images(self, img1, img2): +try: +qemu_img('convert', '-f', iotests.imgfmt, '-O', 'raw', img1, img1 + '.raw') +qemu_img('convert', '-f', iotests.imgfmt, '-O', 'raw', img2, img2 + '.raw') +file1 = open(img1 + '.raw', 'r') +file2 = open(img2 + '.raw', 'r') +return file1.read() == file2.read() +finally: +if file1 is not None: +file1.close() +if file2 is not None: +file2.close() +try: +os.remove(img1 + '.raw') +except OSError: +pass +try: +os.remove(img2 + '.raw') +except OSError: +pass compare_images() is an exact copy of its 041 counterparts, complete_and_wait() and cancel_and_wait() are the same as in 041 in 
the wait_ready = False case. Sounds like candidates for factoring out. Will fix.
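One possible shape for the factored-out comparison helper discussed above (hypothetical — the actual qemu-iotests refactoring may differ), shown with plain files instead of qemu-img conversions so the sketch is self-contained:

```python
# Hypothetical factored-out helper for iotests.py: compare two raw
# files chunk by chunk instead of reading both fully into memory.
import os
import tempfile

def compare_images(img1, img2, read_chunk=1 << 20):
    """Return True if two raw files have identical content."""
    if os.path.getsize(img1) != os.path.getsize(img2):
        return False
    with open(img1, 'rb') as f1, open(img2, 'rb') as f2:
        while True:
            c1, c2 = f1.read(read_chunk), f2.read(read_chunk)
            if c1 != c2:
                return False
            if not c1:          # both files exhausted simultaneously
                return True

with tempfile.TemporaryDirectory() as d:
    a, b, c = (os.path.join(d, n) for n in 'abc')
    for path, data in ((a, b'x' * 4096), (b, b'x' * 4096), (c, b'y' * 4096)):
        with open(path, 'wb') as f:
            f.write(data)
    assert compare_images(a, b)
    assert not compare_images(a, c)
```

In the real test suite the two images would first be converted to raw with qemu_img('convert', ...) as in the patch, and the helper would live in iotests.py so 041 and 055 share one copy.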
Re: [Qemu-devel] [PATCH v4 3/8] block: add drive-backup QMP command
On Wed, May 22, 2013 at 11:53:44AM +0200, Kevin Wolf wrote: On 16.05.2013 at 10:36, Stefan Hajnoczi wrote: +proto_drv = bdrv_find_protocol(target); +if (!proto_drv) { +error_set(errp, QERR_INVALID_BLOCK_FORMAT, format); +return; +} I see that other block job commands are doing the same, but there are two things unclear for me about this: 1. Shouldn't bdrv_img_create() and/or bdrv_open() already return an error if the protocol can't be found? The error check is the only thing that proto_drv is used for. 2. Why do we complain about a wrong _format_ here? If I ask for a qcow2 image called foo:bar.qcow2, qemu won't find the protocol foo and complain that qcow2 is an invalid format. You are right. It doesn't seem necessary, we'll get an error message later from bdrv_open() or bdrv_img_create(). Will drop in the next revision. + +bdrv_get_geometry(bs, &size); +size *= 512; I was going to suggest BDRV_SECTOR_SIZE at first, but... Why not use bdrv_getlength() in the first place if you need it in bytes anyway? As a nice bonus you would get -errno instead of size 0 on failure. Yes. Will fix and update qmp_drive_mirror() too. +if (mode != NEW_IMAGE_MODE_EXISTING) { +assert(format && drv); +bdrv_img_create(target, format, +NULL, NULL, NULL, size, flags, &local_err, false); +} + +if (error_is_set(&local_err)) { +error_propagate(errp, local_err); +return; +} + +target_bs = bdrv_new(); +ret = bdrv_open(target_bs, target, NULL, flags, drv); + +if (ret < 0) { Why the empty line between bdrv_open and checking its return value? A dramatic pause, to build up suspense :). Will drop in the next revision and update qmp_drive_mirror() too.
Re: [Qemu-devel] [PATCH v4 00/10] curl: fix curl read
On Wed, May 22, 2013 at 12:12:24PM +0100, Richard W.M. Jones wrote: On Wed, May 22, 2013 at 01:04:51PM +0200, Paolo Bonzini wrote: Something is trying to write, but there's no write operation defined for CURL. I guess curl (and other backends too) should reject being opened for write. Alternatively, block.c could do that for them. Yes, I'd just got to that conclusion as well :-) The attached patch fixes the crash for me. Please post a top-level thread so that your patch gets picked up by scripts and noticed by humans too :). Alternatively Fam can include it in the next revision of this series. Stefan
Re: [Qemu-devel] [PATCH] hw/9pfs: Use O_NOFOLLOW when opening files on server
On Wed, May 22, 2013 at 04:52:54PM +0530, Aneesh Kumar K.V wrote: diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c index fe8e0ed..e2a89e3 100644 --- a/hw/9pfs/virtio-9p-handle.c +++ b/hw/9pfs/virtio-9p-handle.c @@ -608,7 +608,7 @@ static int handle_init(FsContext *ctx) struct file_handle fh; struct handle_data *data = g_malloc(sizeof(struct handle_data)); -data->mountfd = open(ctx->fs_root, O_DIRECTORY); +data->mountfd = open(ctx->fs_root, O_DIRECTORY | O_NOFOLLOW); Why is the root path not allowed to be a symlink? If that is intentional, it would be more user-friendly to resolve the path before opening it. That way we don't need to bother the user with an error here.
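The resolve-then-open suggestion, sketched in Python for illustration (the real patch is C in hw/9pfs/virtio-9p-handle.c; `os.path.realpath` stands in for whatever resolution the C code would do before the O_NOFOLLOW open):

```python
import os

def open_fs_root(path):
    """Resolve any symlinks in the configured root first, then open with
    O_NOFOLLOW so only the final, already-resolved component is checked.

    A user who points fs_root at a symlink gets a working mount fd
    instead of an error.
    """
    resolved = os.path.realpath(path)
    return os.open(resolved, os.O_DIRECTORY | os.O_NOFOLLOW)
```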
Re: [Qemu-devel] [PATCH] kvm: add detail error message when fail to add ioeventfd
On Wed, May 22, 2013 at 09:48:21PM +0800, Amos Kong wrote: On Wed, May 22, 2013 at 11:32:27AM +0200, Stefan Hajnoczi wrote: On Wed, May 22, 2013 at 12:57:35PM +0800, Amos Kong wrote: I try to hotplug 28 * 8 multiple-function devices to guest with old host kernel, ioeventfds in host kernel will be exhausted, then qemu fails to allocate ioeventfds for blk/nic devices. It's better to add detail error here. Signed-off-by: Amos Kong ak...@redhat.com --- kvm-all.c |4 1 files changed, 4 insertions(+), 0 deletions(-) It would be nice to make kvm bus scalable so that the hardcoded in-kernel I/O device limit can be lifted. I had increased kernel NR_IOBUS_DEVS to 1000 (a limitation is needed for security) in last Mar, and make resizing kvm_io_range array dynamical. The maximum should not be hardcoded. File descriptor, maximum memory, etc are all controlled by rlimits. And since ioeventfds are file descriptors they are already limited by the maximum number of file descriptors. Why is there a need to impose a hardcoded limit? Stefan
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Wed, May 22, 2013 at 11:46:15PM +0200, Paolo Bonzini wrote: Il 22/05/2013 22:47, Richard W.M. Jones ha scritto: I meant if there was interest in reading from a disk that isn't fully synchronized (yet) to the original disk (it might have old blocks). Or would you only want to connect once a (complete) snapshot is available (synchronized completely to some point-in. IIUC a disk which wasn't fully synchronized wouldn't necessarily be interpretable by libguestfs, so I guess we would need the complete snapshot. In the case of point-in-time backups (Stefan's block-backup) the plan is to have the snapshot complete from the beginning. The way it will work is that the drive-backup target is a qcow2 image with the guest's disk as its backing file. When the guest writes to the disk, drive-backup copies the original data to the qcow2 image. The qcow2 image is exported over NBD so a client can connect to access the read-only point-in-time snapshot. It is not necessary to populate the qcow2 file since it uses the guest disk as its backing file - all reads to unpopulated clusters go to the backing file. Stefan
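The copy-before-write scheme described above can be modelled with a toy sketch (plain Python, no QEMU involved; cluster granularity and names are illustrative only):

```python
class PointInTimeSnapshot:
    """Toy model of the drive-backup scheme: the snapshot target stores
    only clusters the guest has overwritten since the snapshot was taken;
    all other snapshot reads fall through to the live disk, just like a
    qcow2 image that uses the guest disk as its backing file."""

    def __init__(self, disk):
        self.disk = disk     # live guest disk: mutable list of clusters
        self.copied = {}     # cluster index -> original (point-in-time) data

    def guest_write(self, idx, data):
        # Copy-before-write: preserve the original data on first touch.
        if idx not in self.copied:
            self.copied[idx] = self.disk[idx]
        self.disk[idx] = data

    def snapshot_read(self, idx):
        # Populated clusters come from the target, the rest from the
        # backing file (the live disk).
        return self.copied.get(idx, self.disk[idx])
```

This is why the snapshot is complete from the beginning: every read has a well-defined source even though almost nothing has been copied yet.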
Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
On Wed, May 22, 2013 at 03:34:18PM +, Dietmar Maurer wrote: That sounds like more work than a persistent dirty bitmap. The advantage is that while dirty bitmaps are consumed by a single user, the Merkle tree can be used to sync up any number of replicas. I also consider it safer, because you make sure the data exists (using hash keys like SHA1). I am unsure how you can check if a dirty bitmap contains errors, or is out of date? Also, you can compare arbitrary Merkle trees, whereas a dirty bitmap is always related to a single image. (consider the user remove the latest backup from the backup target). One disadvantage of Merkle trees is that the client becomes stateful - the client needs to store its own Merkle tree and this requires fancier client-side code. It is also more expensive to update hashes than a dirty bitmap. Not because you need to hash data but because a small write (e.g. 1 sector) requires that you read the surrounding sectors to recompute a hash for the cluster. Therefore you can expect worse guest I/O performance than with a dirty bitmap. I still think it's a cool idea. Making it work well will require a lot more effort than a dirty bitmap. Stefan
Re: [Qemu-devel] QMP interface for drive-add (or even blockdev-add)
On Wed, May 22, 2013 at 03:53:05PM +0200, Kevin Wolf wrote: Am 16.05.2013 um 21:05 hat Eric Blake geschrieben: On 05/16/2013 02:24 AM, Kevin Wolf wrote: The other thing that I'm not sure about is whether we should teach QAPI to parse certain data structures just into QDicts instead of C structs, or if dealing with the big unions inside the block layer actually makes sense. This is an interesting question. It's very convenient from the code side - we don't have to worry about laying down a schema. However, the point of QAPI is to offer that schema that allows for us to reason about things like compatibility (hard to sneak in a patch that modifies the schema, easy to sneak in a patch that modifies block driver parameter code) and eliminates the boilerplate of type-checking/basic input validation. Even if it requires some effort, I think we should avoid tunneling schema-less data over QAPI. Stefan
Re: [Qemu-devel] [PATCH 0/2] coroutine: dataplane support
On Fri, May 17, 2013 at 03:51:24PM +0200, Stefan Hajnoczi wrote: There is ongoing work to enable multiple event loop threads. This will allow QEMU itself to take advantage of SMP and reduce Big QEMU Lock (BQL) contention. This series is one step in that effort. These patches make coroutines safe in a multi-event loop/multi-threaded world. I have successfully tested them running qcow2 in a dataplane thread (further patches are required which I'll be sending soon). Patch 1 protects the global coroutine freelist with a lock :). Patch 2 drops the CoQueue dependency on AioContext. This allows CoMutex and CoRwlock to operate in a dataplane thread whereas previously we always scheduled coroutines in the QEMU iothread. Stefan Hajnoczi (2): coroutine: protect global pool with a mutex coroutine: stop using AioContext in CoQueue include/block/coroutine_int.h | 4 qemu-coroutine-lock.c | 56 --- qemu-coroutine.c | 23 -- trace-events | 2 +- 4 files changed, 47 insertions(+), 38 deletions(-) -- 1.8.1.4 Applied to my block tree: https://github.com/stefanha/qemu/commits/block Stefan
Re: [Qemu-devel] [Bug 1182490] [NEW] [qemu-1.5] coroutine-win32.c broken on NULL pointer
On Tue, May 21, 2013 at 02:11:05PM -, Cauchy Song wrote: Public bug reported: Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 4340.0x163c] qemu_coroutine_switch (action=COROUTINE_TERMINATE, to_=0x0, from_=0x3ba1c80) at /home/cauchy/vcs/git/qemu/coroutine-win32.c:47 (gdb) bt #0 qemu_coroutine_switch (action=COROUTINE_TERMINATE, to_=0x0, from_=0x3ba1c80) at /home/cauchy/vcs/git/qemu/coroutine-win32.c:47 #1 coroutine_trampoline (co_=0x3ba1c80) at /home/cauchy/vcs/git/qemu/coroutine-win32.c:58 #2 0x77098fed in ?? () #3 0x in ?? () What is the command-line? How do you reproduce the crash? Stefan
Re: [Qemu-devel] [RFC PATCH v3 02/11] Fix errors and warnings while compiling with c++ compiler
On Tue, May 21, 2013 at 11:33:41AM -0400, Tomoki Sekiyama wrote: Add C++ keywords to avoid errors in compiling with c++ compiler. This also renames class member of PciDeviceInfo to q_class. Signed-off-by: Tomoki Sekiyama tomoki.sekiy...@hds.com --- hmp.c |2 +- hw/pci/pci.c|2 +- scripts/qapi.py |9 - 3 files changed, 10 insertions(+), 3 deletions(-) Please also extend scripts/checkpatch.pl. Otherwise it is very likely that C++ keywords will be introduced again in the future. Most people will not build the VSS code and therefore checkpatch.pl needs to ensure that patches with C++ keywords will not be accepted. Stefan
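The kind of check being suggested could look roughly like this. It is sketched in Python purely for illustration: the real change would go into scripts/checkpatch.pl, which is Perl, and would need a more complete keyword list.

```python
import re

# A few C++ keywords that are legal identifiers in C; the real list
# would need to be exhaustive.
CXX_KEYWORDS = {'class', 'new', 'delete', 'template', 'namespace',
                'private', 'public', 'protected', 'typename'}

def added_lines_with_cxx_keywords(patch_text):
    """Return (line_number, keyword) pairs for added patch lines
    ('+' prefix, excluding the '+++' file header) that use a C++
    keyword as a plain identifier."""
    hits = []
    for n, line in enumerate(patch_text.splitlines(), 1):
        if not line.startswith('+') or line.startswith('+++'):
            continue
        for word in re.findall(r'\b[A-Za-z_][A-Za-z0-9_]*\b', line):
            if word in CXX_KEYWORDS:
                hits.append((n, word))
    return hits
```

A check like this would have caught the `PciDeviceInfo` member `class` that the patch renames to `q_class`.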
Re: [Qemu-devel] [RFC PATCH v3 00/11] qemu-ga: fsfreeze on Windows using VSS
On Tue, May 21, 2013 at 11:33:33AM -0400, Tomoki Sekiyama wrote: * How to build run qemu-ga with VSS support - Download Microsoft VSS SDK from: http://www.microsoft.com/en-us/download/details.aspx?id=23490 - Setup the SDK scripts/extract-vsssdk-headers setup.exe (on POSIX-systems) - Specify installed SDK directory to configure option as: ./configure -with-vss-sdk=path/to/VSS SDK --cross-prefix=i686-w64-mingw32- Are there any plans to make this more developer-friendly? In the Linux world it's unusual to download third-party SDKs; development headers are available as packages from the distro. I haven't looked at the SDK license but hopefully the VSS headers can be added to the mingw cross-build toolchain? Stefan
Re: [Qemu-devel] [RFC PATCH v3 11/11] QMP/qmp.py: set locale for exceptions to display non-ascii messages correctly
On Tue, May 21, 2013 at 11:34:16AM -0400, Tomoki Sekiyama wrote: qemu-ga in Windows may return error message with multibyte characters when the guest OS language is set to other than English. To display such messages correctly, this encodes the message based on the locale settings. Signed-off-by: Tomoki Sekiyama tomoki.sekiy...@hds.com --- QMP/qmp.py |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/QMP/qmp.py b/QMP/qmp.py index c551df1..ee21819 100644 --- a/QMP/qmp.py +++ b/QMP/qmp.py @@ -11,6 +11,7 @@ import json import errno import socket +import locale class QMPError(Exception): pass @@ -133,7 +134,8 @@ class QEMUMonitorProtocol: def command(self, cmd, **kwds): ret = self.cmd(cmd, kwds) if ret.has_key('error'): -raise Exception(ret['error']['desc']) +enc = locale.getpreferredencoding() +raise Exception(ret['error']['desc'].encode(enc)) You should not need to explicitly encode the error descriptor. The error description should be UTF-8 on the wire and a Unicode Python string in this script. I think the real problem is: 1. Guest qga is writing strings in local encoding onto the wire. or 2. qmp.py isn't UTF-8-decoding strings received over the wire. Either or both bugs could be present. Once they are fixed you shouldn't see encoding problems. Stefan
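A sketch of the suggested fix (Python; function names are hypothetical, the real change would go into qmp.py's receive path): decode bytes once at the socket boundary as UTF-8 and keep Unicode strings internally, so an error description never needs locale-dependent re-encoding before being raised.

```python
import json

def parse_qmp_message(wire_bytes):
    """Decode one QMP/qga message from the socket.

    The payload is JSON and therefore UTF-8 on the wire; decoding once
    at the boundary yields proper Unicode strings.
    """
    return json.loads(wire_bytes.decode('utf-8'))

class QMPError(Exception):
    pass

def check_response(resp):
    # The desc is already Unicode here; raise it as-is.
    if 'error' in resp:
        raise QMPError(resp['error']['desc'])
    return resp['return']
```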
[Qemu-devel] Designing QMP APIs at KVM Forum
With better QMP introspection on the horizon and work in various subsystems pushing QMP boundaries it would be useful to bring together the latest best practices for designing QMP APIs. There are design rules for keeping QMP APIs extensible and for allowing clients to detect the presence of features. There is also QEMU-side infrastructure like event rate-limiting, which developers should make use of where appropriate. Is anyone willing to bring together the best practices and present them at KVM Forum this year? I think that could help set the standard for QMP APIs. A set of slides or wiki page can be a reference for developers that stops us working from first principles every time a new API is added. Stefan
Re: [Qemu-devel] qemu seabios issue with vhost-scsi
On Wed, May 22, 2013 at 05:36:08PM -0700, Badari wrote: Hi, While testing vhost-scsi in the current qemu git, ran into an earlier issue with seabios. I had to disable scsi support in seabios to get it working. I was hoping this issue got resolved when vhost-scsi support got merged into qemu. Is this still being worked on ? Thanks, Badari [root ~]# gdb /root/qemu/x86_64-softmmu/qemu-system-x86_64 GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6) Copyright (C) 2010 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type show copying and show warranty for details. This GDB was configured as x86_64-redhat-linux-gnu. For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/... Reading symbols from /root/qemu/x86_64-softmmu/qemu-system-x86_64...done. (gdb) run --cpu qemu64 --enable-kvm -m 4096 -drive file=/var/lib/libvirt/images/lnx.img,if=ide,cache=writethrough -device vhost-scsi-pci,wwpn=naa.6001405bd4e8476d,event_idx=off -vnc :10 -boot d Starting program: /root/qemu/x86_64-softmmu/qemu-system-x86_64 --cpu qemu64 --enable-kvm -m 4096 -drive file=/var/lib/libvirt/images/window.img,if=ide,cache=writethrough -device vhost-scsi-pci,wwpn=naa.6001405bd4e8476d,event_idx=off -vnc :10 -boot d warning: no loadable sections found in added symbol-file system-supplied DSO at 0x77ffa000 [Thread debugging using libthread_db enabled] [New Thread 0x71c1c700 (LWP 4725)] [New Thread 0x71239700 (LWP 4726)] [New Thread 0x7fffeb7ff700 (LWP 4729)] Program received signal SIGSEGV, Segmentation fault. 
[Switching to Thread 0x71239700 (LWP 4726)] 0x556b3191 in scsi_device_find (bus=0x565abb50, channel=0, id=0, lun=0) at hw/scsi/scsi-bus.c:1744 1744QTAILQ_FOREACH_REVERSE(kid, bus-qbus.children, ChildrenHead, sibling) { Missing separate debuginfos, use: debuginfo-install cyrus-sasl-gssapi-2.1.23-13.el6_3.1.x86_64 cyrus-sasl-lib-2.1.23-13.el6_3.1.x86_64 cyrus-sasl-md5-2.1.23-13.el6_3.1.x86_64 cyrus-sasl-plain-2.1.23-13.el6_3.1.x86_64 db4-4.7.25-17.el6.x86_64 glib2-2.22.5-7.el6.x86_64 glibc-2.12-1.107.el6.x86_64 gnutls-2.8.5-10.el6.x86_64 keyutils-libs-1.4-4.el6.x86_64 krb5-libs-1.10.3-10.el6.x86_64 libcom_err-1.41.12-14.el6.x86_64 libcurl-7.19.7-35.el6.x86_64 libgcrypt-1.4.5-9.el6_2.2.x86_64 libgpg-error-1.7-4.el6.x86_64 libidn-1.18-2.el6.x86_64 libpng-1.2.49-1.el6_2.x86_64 libselinux-2.0.94-5.3.el6.x86_64 libssh2-1.4.2-1.el6.x86_64 libtasn1-2.3-3.el6_2.1.x86_64 ncurses-libs-5.7-3.20090208.el6.x86_64 nspr-4.9.2-1.el6.x86_64 nss-3.14.0.0-12.el6.x86_64 nss-softokn-freebl-3.12.9-11.el6.x86_64 nss-util-3.14.0.0-2.el6.x86_64 openldap-2.4.23-31.el6.x86_64 openssl-1.0.0-27.el6.x86_64 pixman-0.26.2-4.el6.x86_64 zlib-1.2.3-29.el6.x86_64 (gdb) bt #0 0x556b3191 in scsi_device_find (bus=0x565abb50, channel=0, id= 0, lun=0) at hw/scsi/scsi-bus.c:1744 #1 0x557a59f0 in virtio_scsi_device_find (vdev=0x565aba38, vq= 0x565d1150) at /root/qemu/hw/scsi/virtio-scsi.c:56 #2 virtio_scsi_handle_cmd (vdev=0x565aba38, vq=0x565d1150) at /root/qemu/hw/scsi/virtio-scsi.c:376 We should never get here with vhost-scsi. This function is processing the command virtqueue in QEMU userspace - if vhost is active then we shouldn't reach this. AFAICT the s-bus was not initialized in the vhost codepath. Therefore the crash in scsi_device_find(bus, ...). Can you check vhost_scsi_set_status() was called and if it successfully enabled vhost? Is it possible that the guest is notifying the virtqueue before setting the status register to DRIVER_OK? That would explain why vhost hasn't been activated yet. Stefan
Re: [Qemu-devel] [RFC PATCH v3 05/11] qemu-ga: Add Windows VSS provider to quiesce applications on fsfreeze
On Tue, May 21, 2013 at 11:33:52AM -0400, Tomoki Sekiyama wrote: Implements a basic stub of software VSS provider. Currently, this modules only provides a relay function of events between qemu-guest-agent and Windows VSS when VSS finished filesystem freeze and when qemu snapshot is done. In the future, this module could be extended to support the other VSS functions, such as query for snapshot volumes and recovery. Signed-off-by: Tomoki Sekiyama tomoki.sekiy...@hds.com --- Makefile.objs |2 configure |5 qga/Makefile.objs |6 qga/vss-win32-provider.h| 26 ++ qga/vss-win32-provider/Makefile.objs| 21 + qga/vss-win32-provider/install.cpp | 494 +++ qga/vss-win32-provider/provider.cpp | 474 ++ qga/vss-win32-provider/qga-provider.def | 10 + qga/vss-win32-provider/qga-provider.idl | 20 + qga/vss-win32.h | 85 + 10 files changed, 1142 insertions(+), 1 deletion(-) create mode 100644 qga/vss-win32-provider.h create mode 100644 qga/vss-win32-provider/Makefile.objs create mode 100644 qga/vss-win32-provider/install.cpp create mode 100644 qga/vss-win32-provider/provider.cpp create mode 100644 qga/vss-win32-provider/qga-provider.def create mode 100644 qga/vss-win32-provider/qga-provider.idl create mode 100644 qga/vss-win32.h Please run scripts/checkpatch.pl and use QEMU coding style (see ./CODING_STYLE and ./HACKING). 
diff --git a/qga/vss-win32-provider/Makefile.objs b/qga/vss-win32-provider/Makefile.objs new file mode 100644 index 000..73ef752 --- /dev/null +++ b/qga/vss-win32-provider/Makefile.objs @@ -0,0 +1,21 @@ +# rules to build qga-provider.dll + +qga-obj-y += qga-provider.dll +qga-prv-obj-y += provider.o install.o + +obj-qga-prv-obj-y = $(addprefix $(obj)/, $(qga-prv-obj-y)) +$(obj-qga-prv-obj-y): QEMU_CXXFLAGS = $(filter-out -Wstrict-prototypes -Wmissing-prototypes -Wnested-externs -Wold-style-declaration -Wold-style-definition -Wredundant-decls -fstack-protector-all, $(QEMU_CFLAGS)) -Wno-unknown-pragmas -Wno-delete-non-virtual-dtor + +$(obj)/qga-provider.dll: LDFLAGS = -shared -Wl,--add-stdcall-alias,--enable-stdcall-fixup -lole32 -loleaut32 -lshlwapi -luuid -static +$(obj)/qga-provider.dll: $(obj-qga-prv-obj-y) $(SRC_PATH)/$(obj)/qga-provider.def $(obj)/qga-provider.tlb + $(call quiet-command,$(CXX) -o $@ $(qga-prv-obj-y) $(SRC_PATH)/qga/vss-win32-provider/qga-provider.def $(CXXFLAGS) $(LDFLAGS), LINK $(TARGET_DIR)$@) + + +# rules to build qga-provider.tlb +# Currently, only native build is supported because building .tlb +# (TypeLibrary) from .idl requires WindowsSDK and MIDL (included in VC++). +MIDL=midl +WINSDK=C:\\Program Files\\Microsoft SDKs\\Windows\\v7.1\\Include This needs to be a ./configure option. diff --git a/qga/vss-win32.h b/qga/vss-win32.h new file mode 100644 index 000..7600087 --- /dev/null +++ b/qga/vss-win32.h @@ -0,0 +1,85 @@ +/* + * QEMU Guest Agent win32 VSS common declarations + * + * Copyright Hitachi Data Systems Corp. 2013 + * + * Authors: + * Tomoki Sekiyama tomoki.sekiy...@hds.com + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +#ifndef VSS_WIN32_H +#define VSS_WIN32_H + +#define __MIDL_user_allocate_free_DEFINED__ +#include config-host.h +#include windows.h +#include shlwapi.h + +/* Reduce warnings to include vss.h */ +#define __in IN +#define __out OUT +#define __RPC_unique_pointer +#define __RPC_string +#define __RPC__deref_inout_opt +#define __RPC__out +#ifndef __RPC__out_ecount_part +#define __RPC__out_ecount_part(x, y) +#endif +#define _declspec(x) +#undef uuid +#define uuid(x) This looks hacky. Why are you stubbing out macros that vss.h uses?
Re: [Qemu-devel] [RFC PATCH v3 00/11] qemu-ga: fsfreeze on Windows using VSS
On Thu, May 23, 2013 at 3:20 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 23/05/2013 14:15, Stefan Hajnoczi ha scritto: On Tue, May 21, 2013 at 11:33:33AM -0400, Tomoki Sekiyama wrote: * How to build run qemu-ga with VSS support - Download Microsoft VSS SDK from: http://www.microsoft.com/en-us/download/details.aspx?id=23490 - Setup the SDK scripts/extract-vsssdk-headers setup.exe (on POSIX-systems) - Specify installed SDK directory to configure option as: ./configure -with-vss-sdk=path/to/VSS SDK --cross-prefix=i686-w64-mingw32- Are there any plans to make this more developer-friendly? In the Linux world it's unusual to download third-party SDKs; development headers are available as packages from the distro. I haven't looked at the SDK license but hopefully the VSS headers can be added to the mingw cross-build toolchain? No, the mingw headers are (re)written by hand, or something like that. The VSS license does not permit redistribution. :/ Can we follow the same process for VSS as for the other mingw headers? TBH I don't see how one can write them by hand without deriving it from the SDK, which seems dubious. But if it worked for the rest of mingw (which comes with Windows and DirectX headers, for example) then maybe it can work here too. Stefan
Re: [Qemu-devel] qemu seabios issue with vhost-scsi
On Thu, May 23, 2013 at 11:48 AM, Gleb Natapov g...@redhat.com wrote: On Thu, May 23, 2013 at 08:53:55AM +0800, Asias He wrote: On Wed, May 22, 2013 at 05:36:08PM -0700, Badari wrote: Hi, While testing vhost-scsi in the current qemu git, ran into an earlier issue with seabios. I had to disable scsi support in seabios to get it working. I was hoping this issue got resolved when vhost-scsi support got merged into qemu. Is this still being worked on ? Hmm, can you try seabios.git? Not sure if seabios shipped by qemu picked up the fixes for vhost-scsi. Nothing in seabios should crash qemu. Agreed. I'm pretty sure it's the scenario that I posted in my other reply to this thread. The common virtio-scsi code in QEMU should guard against this. In virtio-blk data plane I hit a similar case and ended up starting the data plane thread (equivalent to vhost here) *before* the status register is set to DRIVER_OK. Stefan
Re: [Qemu-devel] [PATCH v5 01/11] curl: introduce CURLSockInfo to BDRVCURLState.
On Thu, May 23, 2013 at 11:37:59AM +0800, Fam Zheng wrote: @@ -90,7 +98,16 @@ static int curl_aio_flush(void *opaque); static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action, void *s, void *sp) { +BDRVCURLState *bs = s; bs is used for BlockDriverState in block/* code. I find this confusing. Usually the custom state is called 's'. Perhaps just rename the void* arguments s_ and sp_.
Re: [Qemu-devel] [PATCH v5 05/11] curl: add timer to BDRVCURLState
On Thu, May 23, 2013 at 11:38:03AM +0800, Fam Zheng wrote: diff --git a/block/curl.c b/block/curl.c index fc464ad..4fd5bb9 100644 --- a/block/curl.c +++ b/block/curl.c @@ -89,6 +89,7 @@ typedef struct BDRVCURLState { QLIST_HEAD(, CURLSockInfo) socks; char *url; size_t readahead_size; +QEMUTimer *timer; /* Whether http server accept range in header */ bool accept_range; } BDRVCURLState; @@ -148,6 +149,38 @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque) return realsize; } +static void curl_timer_cb(void *opaque) +{ +int running; +BDRVCURLState *bs = (BDRVCURLState *)opaque; Please call it 's'. 'bs' is for BlockDriverState*. Also, there is no need to cast void* to BDRVCURLState*; the conversion is implicit. +DPRINTF("curl timeout!\n"); +curl_multi_socket_action(bs->multi, CURL_SOCKET_TIMEOUT, 0, &running); +} + +/* Callback for curl_multi interface */ +static int curl_multi_timer_cb(CURLM *multi, long timeout_ms, void *s) +{ +BDRVCURLState *bs = (BDRVCURLState *)s; Same here.
Re: [Qemu-devel] [PATCH v5 06/11] curl: introduce CURLDataCache
On Thu, May 23, 2013 at 11:38:04AM +0800, Fam Zheng wrote: +typedef struct CURLDataCache { +char *data; +size_t base_pos; Must be int64_t. QEMU compiled on 32-bit hosts would only allow 4 GB images with size_t! +size_t data_len; +size_t write_pos; +/* Ref count for CURLState */ +int use_count; It's better to introduce this field when you add code to use it. When possible, don't add unused code in a patch, it makes it harder to review. +static void curl_complete_io(BDRVCURLState *bs, CURLAIOCB *acb, + CURLDataCache *cache) +{ +size_t aio_base = acb-sector_num * SECTOR_SIZE; int64_t +size_t aio_bytes = acb-nb_sectors * SECTOR_SIZE; +size_t off = aio_base - cache-base_pos; + +qemu_iovec_from_buf(acb-qiov, 0, cache-data + off, aio_bytes); +acb-common.cb(acb-common.opaque, 0); +DPRINTF(AIO Request OK: %10zd %10zd\n, aio_base, aio_bytes); PRId64 for 64-bit aio_base @@ -589,26 +577,24 @@ static const AIOCBInfo curl_aiocb_info = { static void curl_readv_bh_cb(void *p) { CURLState *state; - +CURLDataCache *cache = NULL; CURLAIOCB *acb = p; BDRVCURLState *s = acb-common.bs-opaque; +size_t aio_base, aio_bytes; int64_t aio_base;
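Why `size_t` matters here, made concrete (a Python simulation of a 32-bit `size_t` via masking; constants are illustrative):

```python
SECTOR_SIZE = 512
SIZE_T_32_MASK = 0xFFFFFFFF  # range of a 32-bit size_t

def byte_offset_32bit(sector_num):
    """Simulate storing sector_num * 512 in a 32-bit size_t: values
    above 4 GB silently wrap around."""
    return (sector_num * SECTOR_SIZE) & SIZE_T_32_MASK

def byte_offset_64bit(sector_num):
    """int64_t keeps the full byte offset on any host."""
    return sector_num * SECTOR_SIZE
```

With a 32-bit `size_t`, a read at the 10 GB mark would quietly become a read at the 2 GB mark, corrupting the cache lookup.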
Re: [Qemu-devel] [PATCH v5 07/11] curl: make use of CURLDataCache.
On Thu, May 23, 2013 at 11:38:05AM +0800, Fam Zheng wrote: @@ -221,31 +215,35 @@ static void curl_complete_io(BDRVCURLState *bs, CURLAIOCB *acb, static size_t curl_read_cb(void *ptr, size_t size, size_t nmemb, void *opaque) { -CURLState *s = ((CURLState*)opaque); +CURLState *s = (CURLState *)opaque; You can drop the cast completely because opaque is void*. +CURLDataCache *c = s-cache; size_t realsize = size * nmemb; -int i; - -DPRINTF(CURL: Just reading %zd bytes\n, realsize); +CURLAIOCB *acb; -if (!s || !s-orig_buf) +if (!c || !c-data) { goto read_end; +} +if (c-write_pos = c-data_len) { +goto read_end; +} +memcpy(c-data + c-write_pos, ptr, + MIN(realsize, c-data_len - c-write_pos)); +c-write_pos += realsize; +if (c-write_pos = c-data_len) { +c-write_pos = c-data_len; +} -memcpy(s-orig_buf + s-buf_off, ptr, realsize); -s-buf_off += realsize; Why did you add MIN(realsize, c-data_len - c-write_pos)? The original code trusts realsize to be within s-orig_buf. - -for(i=0; iCURL_NUM_ACB; i++) { -CURLAIOCB *acb = s-acb[i]; - -if (!acb) -continue; - -if ((s-buf_off = acb-end)) { -qemu_iovec_from_buf(acb-qiov, 0, s-orig_buf + acb-start, -acb-end - acb-start); -acb-common.cb(acb-common.opaque, 0); -qemu_aio_release(acb); -s-acb[i] = NULL; +acb = QLIST_FIRST(s-s-acbs); +while (acb) { +size_t aio_base = acb-sector_num * SECTOR_SIZE; int64_t @@ -600,29 +596,41 @@ static void curl_readv_bh_cb(void *p) // No cache found, so let's start a new request state = curl_init_state(s); if (!state) { -acb-common.cb(acb-common.opaque, -EIO); -qemu_aio_release(acb); -return; +goto err_release; } -acb-start = 0; -acb-end = (acb-nb_sectors * SECTOR_SIZE); - -state-buf_off = 0; -if (state-orig_buf) -g_free(state-orig_buf); -state-buf_start = start; -state-buf_len = acb-end + s-readahead_size; -end = MIN(start + state-buf_len, s-len) - 1; -state-orig_buf = g_malloc(state-buf_len); -state-acb[0] = acb; - -snprintf(state-range, sizeof(state-range) - 1, %zd-%zd, start, end); -DPRINTF(CURL (AIO): 
Reading %d at %zd (%s)\n, -(acb-nb_sectors * SECTOR_SIZE), start, state-range); -curl_easy_setopt(state-curl, CURLOPT_RANGE, state-range); +cache = g_malloc0(sizeof(CURLDataCache)); +cache-base_pos = acb-sector_num * SECTOR_SIZE; +cache-data_len = aio_bytes + s-readahead_size; +cache-write_pos = 0; +cache-data = g_malloc(cache-data_len); +QLIST_INSERT_HEAD(s-acbs, acb, next); +snprintf(state-range, sizeof(state-range) - 1, %zd-%zd, cache-base_pos, + cache-base_pos + cache-data_len); +DPRINTF(Reading range: %s\n, state-range); +curl_easy_setopt(state-curl, CURLOPT_RANGE, state-range); +QLIST_INSERT_HEAD(s-cache, cache, next); +state-cache = cache; +cache-use_count++; I don't see where you bump the use_count when a cache lookup is successful. Maybe I just missed it in the other patches.
Re: [Qemu-devel] [PATCH v5 08/11] curl: use list to store CURLState
On Thu, May 23, 2013 at 11:38:06AM +0800, Fam Zheng wrote: @@ -660,9 +651,13 @@ static BlockDriverAIOCB *curl_aio_readv(BlockDriverState *bs, static void curl_close(BlockDriverState *bs) { BDRVCURLState *s = bs-opaque; -int i; DPRINTF(CURL: Close\n); +if (s-timer) { +qemu_del_timer(s-timer); +qemu_free_timer(s-timer); +s-timer = NULL; +} This hunk is duplicated, see the next line :-). @@ -670,16 +665,26 @@ static void curl_close(BlockDriverState *bs) s-timer = NULL; } -for (i=0; iCURL_NUM_STATES; i++) { -if (s-states[i].in_use) -curl_clean_state(s-states[i]); -if (s-states[i].curl) { -curl_easy_cleanup(s-states[i].curl); -s-states[i].curl = NULL; -} +while (!QLIST_EMPTY(s-curl_states)) { +CURLState *state = QLIST_FIRST(s-curl_states); +/* Remove and clean curl easy handles */ +curl_clean_state(state); +QLIST_REMOVE(state, next); +g_free(state); +state = NULL; } -if (s-multi) + +if (s-multi) { curl_multi_cleanup(s-multi); +} + +while (!QLIST_EMPTY(s-acbs)) { +CURLAIOCB *acb = QLIST_FIRST(s-acbs); +acb-common.cb(acb-common.opaque, -EIO); +QLIST_REMOVE(acb, next); +qemu_aio_release(acb); +acb = NULL; +} Duplicated? See line below. while (!QLIST_EMPTY(s-acbs)) { CURLAIOCB *acb = QLIST_FIRST(s-acbs); @@ -709,6 +714,7 @@ static void curl_close(BlockDriverState *bs) } g_free(s-url); +s-url = NULL; } Can s-url = NULL be merged into a previous patch which adds g_free(s-url)?
Re: [Qemu-devel] [PATCH v5 00/11] curl: fix curl read
On Thu, May 23, 2013 at 11:37:58AM +0800, Fam Zheng wrote: CURL library API has changed, the current curl driver is not working. This patch rewrites the use of API as well as the structure of internal states. BDRVCURLState holds the pointer to curl multi interface (man 3 libcurl-multi), and 4 lists for internal states: - CURLState holds state for libcurl connection (man 3 libcurl-easy) - CURLSockInfo holds information for libcurl socket interface (man 3 curl_multi_socket_action). - CURLDataCache holds the user data read from libcurl, it is in a list ordered by access, the used cache is moved to list head on access, so the tail element is freed first. BDRVCURLState.cache_quota is the threshold to start freeing cache. - CURLAIOCB holds ongoing aio information. Looking pretty good. It's not clear to me if block/curl.c was broken, besides failing with HTTP servers that do not support Range:, or which part of this series fixes the bug(s). Can you clarify that in the patch description? I'll test the next revision and audit a little more for memory leaks, since you are introducing several heap-allocated structures and lists.
Re: [Qemu-devel] [PATCH v5 09/11] curl: add cache quota.
On Thu, May 23, 2013 at 11:38:07AM +0800, Fam Zheng wrote: Introduce a cache quota: BDRVCURLState.cache_quota. When adding a new CURLDataCache to BDRVCURLState, if the number of existing CURLDataCache entries is larger than CURL_CACHE_QUOTA, try to release some first to limit the in-memory cache size. A least-used entry is selected for releasing. Signed-off-by: Fam Zheng f...@redhat.com --- block/curl.c | 27 +++ 1 file changed, 27 insertions(+) diff --git a/block/curl.c b/block/curl.c index 3ea2552..5adbc84 100644 --- a/block/curl.c +++ b/block/curl.c @@ -40,6 +40,7 @@ #define SECTOR_SIZE 512 #define READ_AHEAD_SIZE (256 * 1024) +#define CURL_CACHE_QUOTA 10 struct BDRVCURLState; @@ -90,6 +91,8 @@ typedef struct BDRVCURLState { QEMUTimer *timer; /* List of data cache ordered by access, freed from tail */ QLIST_HEAD(, CURLDataCache) cache; +/* Threshold to release unused cache when cache list is longer than it */ +int cache_quota; /* Whether http server accept range in header */ bool accept_range; } BDRVCURLState; @@ -526,6 +529,8 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags) curl_multi_socket_action(s->multi, CURL_SOCKET_TIMEOUT, 0, &running); qemu_opts_del(opts); +s->cache_quota = CURL_CACHE_QUOTA; + return 0; out: @@ -595,6 +600,27 @@ static void curl_readv_bh_cb(void *p) cache->data_len = aio_bytes + s->readahead_size; cache->write_pos = 0; cache->data = g_malloc(cache->data_len); +/* Try to release some cache */ +while (0 && s->cache_quota >= 0) { while (0) :(
Re: [Qemu-devel] QMP interface for drive-add (or even blockdev-add)
On Thu, May 23, 2013 at 4:41 PM, Kevin Wolf kw...@redhat.com wrote: Am 23.05.2013 um 13:57 hat Stefan Hajnoczi geschrieben: On Wed, May 22, 2013 at 03:53:05PM +0200, Kevin Wolf wrote: Am 16.05.2013 um 21:05 hat Eric Blake geschrieben: On 05/16/2013 02:24 AM, Kevin Wolf wrote: The other thing that I'm not sure about is whether we should teach QAPI to parse certain data structures just into QDicts instead of C structs, or if dealing with the big unions inside the block layer actually makes sense. This is an interesting question. It's very convenient from the code side - we don't have to worry about laying down a schema. However, the point of QAPI is to offer that schema that allows for us to reason about things like compatibility (hard to sneak in a patch that modifies the schema, easy to sneak in a patch that modifies block driver parameter code) and eliminates the boilerplate of type-checking/basic input validation. Even if it requires some effort, I think we should avoid tunneling schema-less data over QAPI. Note that I'm talking _only_ about the C side here. Everything that goes through QMP is an external API and must described by a schema, I fully agree there. The question is whether QAPI must, after validating the input against the schema, parse it into C structs or whether it should be able to fill QDicts and pass those around. I see. In that case using a QDict seems fine. Stefan
Re: [Qemu-devel] qemu seabios issue with vhost-scsi
On Thu, May 23, 2013 at 6:47 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 23/05/2013 18:38, Badari Pulavarty ha scritto: If that is with the old SeaBIOS, then SIGABRT is intended. :) The guest is buggy, the problem in QEMU only lies in _how_ it fails. Paolo I am confused now. Without above changes, seabios fix makes the guest boot. But with the above changes, I run into this (even with seabios fix) Ah, okay. I understood it crashed only with the SeaBIOS fix. I'll work on a more proper fix then. Maybe use the same approach as data plane - activate vhost when userspace sees the first virtqueue kick. That way it works with weird guests and never aborts. Stefan
Re: [Qemu-devel] [PATCH v3 0/8] block: drive-backup live backup command
On Thu, May 23, 2013 at 08:11:42AM +, Dietmar Maurer wrote: I also consider it safer, because you make sure the data exists (using hash keys like SHA1). I am unsure how you can check if a dirty bitmap contains errors, or is out of date? Also, you can compare arbitrary Merkle trees, whereas a dirty bitmap is always related to a single image. (consider the user remove the latest backup from the backup target). One disadvantage of Merkle trees is that the client becomes stateful - the client needs to store its own Merkle tree and this requires fancier client-side code. What 'client' do you talk about here? A backup application, for example. Previously it could simply use api.getDirtyBlocks() - [Sector] and it would translate into a single QMP API call. Now a Merkle tree needs to be stored on the client side and synced with the server. The client-side library becomes more complex. But sure, the code gets more complex, and needs considerable amount of RAM to store the hash keys . It is also more expensive to update hashes than a dirty bitmap. Not because you need to hash data but because a small write (e.g. 1 sector) requires that you read the surrounding sectors to recompute a hash for the cluster. Therefore you can expect worse guest I/O performance than with a dirty bitmap. There is no need to update any hash - You only need to do that on backup - in fact, all those things can be done by the backup driver. The problem is that if you leave hash calculation until backup time then you need to read in the entire disk image (100s of GB) from disk. That is slow and drains I/O resources. Maybe the best approach is to maintain a dirty bitmap while the guest is running, which is fairly cheap. Then you can use the dirty bitmap to only hash modified clusters when building the Merkle tree - this avoids reading the entire disk image. I still think it's a cool idea. Making it work well will require a lot more effort than a dirty bitmap. 
How do you re-generate a dirty bitmap after a server crash? The dirty bitmap is invalid after crash. A full backup is required, all clusters are considered dirty. The simplest way to implement this is to mark the persistent bitmap invalid upon the first guest write. When QEMU is terminated cleanly, flush all dirty bitmap updates to disk and then mark the file valid again. If QEMU finds the file is invalid on startup, start from scratch. Stefan
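The crash-recovery protocol described above can be modeled in a few lines. This is an illustrative sketch with invented names, not QEMU code: the persistent bitmap is marked invalid before the first write is issued, and only a clean shutdown marks it valid again, so a crash always degrades safely to a full backup.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory model of a persistent dirty bitmap with a
 * validity flag, following the protocol in the discussion above. */
#define NB_CLUSTERS 64

typedef struct DirtyBitmap {
    uint64_t bits;        /* one bit per cluster; NB_CLUSTERS <= 64 here */
    int on_disk_valid;    /* state of the persistent on-disk copy */
} DirtyBitmap;

static void bitmap_guest_write(DirtyBitmap *b, int cluster)
{
    /* Invalidate the persistent copy before the write is issued, so a
     * crash can never leave a modification unaccounted for. */
    b->on_disk_valid = 0;
    b->bits |= (uint64_t)1 << cluster;
}

static void bitmap_clean_shutdown(DirtyBitmap *b)
{
    /* Flush all bitmap updates to disk (elided here), then mark the
     * file valid again. */
    b->on_disk_valid = 1;
}

/* At backup time: if the persistent copy was found invalid, fall back
 * to a full backup by treating every cluster as dirty. */
static uint64_t bitmap_for_backup(const DirtyBitmap *b)
{
    if (!b->on_disk_valid) {
        return ~(uint64_t)0 >> (64 - NB_CLUSTERS);
    }
    return b->bits;
}
```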
Re: [Qemu-devel] [RFC PATCH v3 02/11] Fix errors and warnings while compiling with c++ compilier
On Thu, May 23, 2013 at 06:34:43PM +, Tomoki Sekiyama wrote: On 5/23/13 8:12 , Stefan Hajnoczi stefa...@gmail.com wrote: On Tue, May 21, 2013 at 11:33:41AM -0400, Tomoki Sekiyama wrote: Add C++ keywords to avoid errors in compiling with c++ compiler. This also renames class member of PciDeviceInfo to q_class. Signed-off-by: Tomoki Sekiyama tomoki.sekiy...@hds.com --- hmp.c |2 +- hw/pci/pci.c|2 +- scripts/qapi.py |9 - 3 files changed, 10 insertions(+), 3 deletions(-) Please also extend scripts/checkpatch.pl. Otherwise it is very likely that C++ keywords will be introduced again in the future. Most people will not build the VSS code and therefore checkpatch.pl needs to ensure that patches with C++ keywords will not be accepted. Stefan I think it would be difficult to identify problematic C++ keywords usage from patches because headers can legally contain C++ keywords and checkpatch.pl doesn't know where it should be used. Do you have any good ideas? We can ignore false warnings for 0.1% of patches (the ones that touch VSS C++ code). But for the remaining 99.9% of patches it's worth guarding against VSS bitrot. Remember not many people will compile it and therefore they won't notice when they break it. I really think it's worth putting some effort in now so VSS doesn't periodically break. checkpatch.pl is a hacky sort of C parser. It already checks for a bunch of similar things and it knows about comments, macros, and strings. It does not perform #include expansion, so there is no risk of including system headers that have C++ code. Stefan
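As a rough illustration of what such a checkpatch.pl rule would have to do (the real script is Perl and already understands comments, strings and macros), here is a hypothetical token scan for C++ reserved words:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Illustrative only: flag C++ reserved words appearing as standalone
 * identifiers in a line of C source.  A real checkpatch rule would
 * additionally skip comments, string literals and VSS (C++) files. */
static int is_cxx_keyword(const char *tok, size_t len)
{
    static const char *kw[] = { "class", "new", "delete", "template",
                                "typename", "namespace", "private" };
    for (size_t i = 0; i < sizeof(kw) / sizeof(kw[0]); i++) {
        if (strlen(kw[i]) == len && memcmp(kw[i], tok, len) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Return 1 if the line contains a C++ keyword as a whole token. */
static int line_has_cxx_keyword(const char *line)
{
    const char *p = line;
    while (*p) {
        if (isalpha((unsigned char)*p) || *p == '_') {
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_') {
                p++;
            }
            if (is_cxx_keyword(start, (size_t)(p - start))) {
                return 1;
            }
        } else {
            p++;
        }
    }
    return 0;
}
```

Note how the whole-token check avoids false positives on identifiers like q_class, the rename made in the patch under review.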
Re: [Qemu-devel] [RFC PATCH v3 05/11] qemu-ga: Add Windows VSS provider to quiesce applications on fsfreeze
On Thu, May 23, 2013 at 06:36:35PM +, Tomoki Sekiyama wrote: On 5/23/13 8:22 , Stefan Hajnoczi stefa...@gmail.com wrote: On Tue, May 21, 2013 at 11:33:52AM -0400, Tomoki Sekiyama wrote: diff --git a/qga/vss-win32.h b/qga/vss-win32.h new file mode 100644 index 000..7600087 --- /dev/null +++ b/qga/vss-win32.h @@ -0,0 +1,85 @@ +/* + * QEMU Guest Agent win32 VSS common declarations + * + * Copyright Hitachi Data Systems Corp. 2013 + * + * Authors: + * Tomoki Sekiyama tomoki.sekiy...@hds.com + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#ifndef VSS_WIN32_H +#define VSS_WIN32_H + +#define __MIDL_user_allocate_free_DEFINED__ +#include config-host.h +#include windows.h +#include shlwapi.h + +/* Reduce warnings to include vss.h */ +#define __in IN +#define __out OUT +#define __RPC_unique_pointer +#define __RPC_string +#define __RPC__deref_inout_opt +#define __RPC__out +#ifndef __RPC__out_ecount_part +#define __RPC__out_ecount_part(x, y) +#endif +#define _declspec(x) +#undef uuid +#define uuid(x) This looks hacky. Why are you stubbing out macros that vss.h uses? Because these macros are internally defined/used by windows SDK's headers and not fully covered by mingw (some of them is only meaningful with Microsoft IDE), so it don't compile without these define's. This could be better if mingw supports these VSS headers, but I have no idea how to achieve that... Please add a comment explaining this. BTW, did you look at the macro definitions to see if they define anything important? In other words, is there a difference when compiled using Visual C++ where the macros actually have meaning? Stefan
Re: [Qemu-devel] [RFC PATCH v3 00/11] qemu-ga: fsfreeze on Windows using VSS
On Thu, May 23, 2013 at 03:50:19PM +0200, Paolo Bonzini wrote: Il 23/05/2013 15:28, Stefan Hajnoczi ha scritto: On Thu, May 23, 2013 at 3:20 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 23/05/2013 14:15, Stefan Hajnoczi ha scritto: On Tue, May 21, 2013 at 11:33:33AM -0400, Tomoki Sekiyama wrote: * How to build run qemu-ga with VSS support - Download Microsoft VSS SDK from: http://www.microsoft.com/en-us/download/details.aspx?id=23490 - Setup the SDK scripts/extract-vsssdk-headers setup.exe (on POSIX-systems) - Specify installed SDK directory to configure option as: ./configure -with-vss-sdk=path/to/VSS SDK --cross-prefix=i686-w64-mingw32- Are there any plans to make this more developer-friendly? In the Linux world it's unusual to download third-party SDKs; development headers are available as packages from the distro. I haven't looked at the SDK license but hopefully the VSS headers can be added to the mingw cross-build toolchain? No, the mingw headers are (re)written by hand, or something like that. The VSS license does not permit redistribution. :/ Can we follow the same process for VSS as for the other mingw headers? I have no idea. Many of them are automatically generated, but some of the others have content seemed non-trivial. TBH I don't see how one can write them by hand without deriving it from the SDK, which seems dubious. But if it worked for the rest of mingw (which comes with Windows and DirectX headers, for example) then maybe it can work here too. I'd rather avoid muddy legal waters... I'll try to figure out mingw's process. If mingw has a process that is clean, then let's use it for VSS headers too. What gives me hope is that mingw headers are shipped by Fedora and Debian so the legal situation must be somewhat reasonable. Stefan
Re: [Qemu-devel] 'qemu-nbd' explicit flush
On Thu, May 23, 2013 at 09:58:31PM +0000, Mark Trumpold wrote: I have a working configuration using the signal approach suggested by Stefan. 'qemu-nbd.c' is patched as follows: do { main_loop_wait(false); + if (sighup_reported) { + sighup_reported = false; + bdrv_drain_all(); + bdrv_flush_all(); + } } while (!sigterm_reported && (persistent || !nbd_started || nb_fds > 0)); The driving script was patched as follows: mount -o remount,ro /dev/nbd0 blockdev --flushbufs /dev/nbd0 +kill -HUP <qemu-nbd process id> I needed to retain 'blockdev --flushbufs' for things to work. Seems the 'bdrv_flush_all' is flushing what is being missed by the blockdev flush. I did not go back and retest with 'fsync' or other approaches I had tried before. Okay, that makes sense: 'blockdev --flushbufs' is writing dirty pages to the NBD device. bdrv_drain_all() + bdrv_flush_all() ensures that image file writes reach the physical disk. One thing to be careful of is whether these operations are asynchronous. The signal is asynchronous: you have no way of knowing when qemu-nbd has finished flushing to the physical disk. I didn't check blockdev(8) but it could be the same there. So watch out, otherwise your script is timing-dependent and may not actually have finished flushing when you take the snapshot. Stefan
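The patch above follows the standard async-signal-safe pattern: the handler only sets a sig_atomic_t flag, and the main loop polls the flag and does the real work. A minimal model of that pattern, with a counter standing in for bdrv_drain_all()/bdrv_flush_all():

```c
#include <assert.h>
#include <signal.h>

/* The handler only sets a flag; sig_atomic_t is the one type
 * guaranteed safe to write from a signal handler. */
static volatile sig_atomic_t sighup_reported;
static int flush_count;  /* stands in for bdrv_drain_all() + bdrv_flush_all() */

static void sighup_handler(int sig)
{
    (void)sig;
    sighup_reported = 1;  /* no I/O, no malloc, no locks in a handler */
}

/* One iteration of a main loop that services the flag. */
static void main_loop_iteration(void)
{
    if (sighup_reported) {
        sighup_reported = 0;
        flush_count++;    /* real code would drain and flush here */
    }
}
```

As noted in the reply, this gives the script no completion notification; a caller that must know when the flush has finished would need a reply channel, for example qemu-nbd writing a byte to a pipe after flushing.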
Re: [Qemu-devel] 'qemu-nbd' explicit flush
On Thu, May 23, 2013 at 11:35:24PM +0000, Mark Trumpold wrote: I had one question I forgot to ask: Is it possible to switch from '--cache=writeback' functionality to '--cache=writethrough' (and vice versa) while qemu-nbd is connected to the '/dev/nbdx' device? No. The block layer APIs to do that are available in QEMU but qemu-nbd doesn't have run-time cache mode changing. Stefan
Re: [Qemu-devel] [RFC PATCH v3 00/11] qemu-ga: fsfreeze on Windows using VSS
On Thu, May 23, 2013 at 03:50:19PM +0200, Paolo Bonzini wrote: Il 23/05/2013 15:28, Stefan Hajnoczi ha scritto: On Thu, May 23, 2013 at 3:20 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 23/05/2013 14:15, Stefan Hajnoczi ha scritto: On Tue, May 21, 2013 at 11:33:33AM -0400, Tomoki Sekiyama wrote: * How to build run qemu-ga with VSS support - Download Microsoft VSS SDK from: http://www.microsoft.com/en-us/download/details.aspx?id=23490 - Setup the SDK scripts/extract-vsssdk-headers setup.exe (on POSIX-systems) - Specify installed SDK directory to configure option as: ./configure -with-vss-sdk=path/to/VSS SDK --cross-prefix=i686-w64-mingw32- Are there any plans to make this more developer-friendly? In the Linux world it's unusual to download third-party SDKs; development headers are available as packages from the distro. I haven't looked at the SDK license but hopefully the VSS headers can be added to the mingw cross-build toolchain? No, the mingw headers are (re)written by hand, or something like that. The VSS license does not permit redistribution. :/ Can we follow the same process for VSS as for the other mingw headers? I have no idea. Many of them are automatically generated, but some of the others have content seemed non-trivial. Here we go: For missing declarations or import functions in the mingw runtime or w32api it is essential to specify the source of the documentation on which you based your patch, or to otherwise provide proof that you have developed your solution through your own experimental effort. Never resolve an issue by looking at someone else's work (unless it is public domain). 
We only accept patches based on publicly available documentation, (MSDN is an acceptable source, as is any BSD or similarly licensed source; no GPL or LPGL licensed source is acceptable, unless your patch relates to a MinGW package which is already distributed under one of these licences), or on individual experimental effort, and never based on others' work, (particularly where that work may be proprietary). Note, in particular, that we adopt a much stricter standpoint on extracting information from Microsoft's SDK headers, than that taken by some other projects, such as Wine or ReactOS; consequently we will always reject, unconditionally, patches based on information extracted from files published by such projects. http://www.mingw.org/wiki/SubmitPatches So basically they reimplement headers by studying MSDN and maybe experimenting in Visual C++ without peeking at the official SDK. This would take extra effort. It would be a nice thing to have in mingw but is out of scope here. Stefan
Re: [Qemu-devel] [PATCH v5 07/11] curl: make use of CURLDataCache.
On Fri, May 24, 2013 at 11:10:26AM +0800, Fam Zheng wrote: On Thu, 05/23 16:23, Stefan Hajnoczi wrote: On Thu, May 23, 2013 at 11:38:05AM +0800, Fam Zheng wrote: +CURLDataCache *c = s-cache; size_t realsize = size * nmemb; -int i; - -DPRINTF(CURL: Just reading %zd bytes\n, realsize); +CURLAIOCB *acb; -if (!s || !s-orig_buf) +if (!c || !c-data) { goto read_end; +} +if (c-write_pos = c-data_len) { +goto read_end; +} +memcpy(c-data + c-write_pos, ptr, + MIN(realsize, c-data_len - c-write_pos)); +c-write_pos += realsize; +if (c-write_pos = c-data_len) { +c-write_pos = c-data_len; +} -memcpy(s-orig_buf + s-buf_off, ptr, realsize); -s-buf_off += realsize; Why did you add MIN(realsize, c-data_len - c-write_pos)? The original code trusts realsize to be within s-orig_buf. I don't see an evidence why it's safe here. CURL certainly doesn't know how much buffer do we have. (man 3 curl_easy_setopt, section CURLOPT_WRITEFUNCTION) The HTTP request included a Range: header so we should know the total number of bytes we'll receive. That said, libcurl may not check so this is defensive programming. A malicious server shouldn't be able to overflow our buffer. A comment or note in the commit description would be nice to explain semantic changes like this. 
@@ -600,29 +596,41 @@ static void curl_readv_bh_cb(void *p) // No cache found, so let's start a new request state = curl_init_state(s); if (!state) { -acb-common.cb(acb-common.opaque, -EIO); -qemu_aio_release(acb); -return; +goto err_release; } -acb-start = 0; -acb-end = (acb-nb_sectors * SECTOR_SIZE); - -state-buf_off = 0; -if (state-orig_buf) -g_free(state-orig_buf); -state-buf_start = start; -state-buf_len = acb-end + s-readahead_size; -end = MIN(start + state-buf_len, s-len) - 1; -state-orig_buf = g_malloc(state-buf_len); -state-acb[0] = acb; - -snprintf(state-range, sizeof(state-range) - 1, %zd-%zd, start, end); -DPRINTF(CURL (AIO): Reading %d at %zd (%s)\n, -(acb-nb_sectors * SECTOR_SIZE), start, state-range); -curl_easy_setopt(state-curl, CURLOPT_RANGE, state-range); +cache = g_malloc0(sizeof(CURLDataCache)); +cache-base_pos = acb-sector_num * SECTOR_SIZE; +cache-data_len = aio_bytes + s-readahead_size; +cache-write_pos = 0; +cache-data = g_malloc(cache-data_len); +QLIST_INSERT_HEAD(s-acbs, acb, next); +snprintf(state-range, sizeof(state-range) - 1, %zd-%zd, cache-base_pos, + cache-base_pos + cache-data_len); +DPRINTF(Reading range: %s\n, state-range); +curl_easy_setopt(state-curl, CURLOPT_RANGE, state-range); +QLIST_INSERT_HEAD(s-cache, cache, next); +state-cache = cache; +cache-use_count++; I don't see where you bump the use_count when a cache lookup is successful. Maybe I just missed it in the other patches. Use count is for serving as the receiving buffer for submitted CURL requests. It's not necessary to bump use_count when cache lookup is successful, since data is immediately copied to guest, no ref to the cache hold. You're right. Stefan
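The defensive-programming point in this exchange, never trusting realsize to fit the buffer even though the Range: header bounds the expected data, can be shown in isolation. The types below are simplified stand-ins for CURLDataCache, not the patch's code:

```c
#include <assert.h>
#include <string.h>

/* Simplified receive buffer, standing in for CURLDataCache. */
typedef struct Cache {
    char   *data;
    size_t  data_len;    /* capacity */
    size_t  write_pos;   /* bytes received so far (may be clamped) */
} Cache;

/* Receive-callback body: clamp the copy so a misbehaving server can
 * never overflow the buffer, whatever realsize claims. */
static size_t cache_write_cb(const char *ptr, size_t realsize, Cache *c)
{
    size_t room;

    if (c->write_pos >= c->data_len) {
        return realsize;              /* buffer full: discard excess */
    }
    room = c->data_len - c->write_pos;
    memcpy(c->data + c->write_pos, ptr, realsize < room ? realsize : room);
    c->write_pos += realsize;
    if (c->write_pos > c->data_len) {
        c->write_pos = c->data_len;   /* never index past the buffer */
    }
    return realsize;                  /* tell the transport all bytes were consumed */
}
```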
Re: [Qemu-devel] Designing QMP APIs at KVM Forum
On Thu, May 23, 2013 at 07:59:17AM -0600, Eric Blake wrote: On 05/23/2013 06:46 AM, Luiz Capitulino wrote: On Thu, 23 May 2013 13:51:22 +0200 Stefan Hajnoczi stefa...@gmail.com wrote: With better QMP introspection on the horizon and work in various subsystems pushing QMP boundaries it would be useful to bring together the latest best practices for designing QMP APIs. There are design rules for keeping QMP APIs extensible and for allowing clients to detect the presence of features. There is also QEMU-side infrastructure like event rate-limiting, which developers should make use of where appropriate. Is anyone willing to bring together the best practices and present them at KVM Forum this year? I think this is a great idea and I vote for Eric to prepare a presentation. Eric is doing an exceptional work on QMP command review, he is also experienced on the client side. Indeed, it looks like I have a good topic for presentation. Yes, I'll take on that task for the KVM forum. Great, looking forward to your presentation. Stefan
Re: [Qemu-devel] [PATCH 0/7] VNVRAM persistent storage
On Thu, May 23, 2013 at 01:44:40PM -0400, Corey Bryant wrote: This patch series provides VNVRAM persistent storage support that QEMU can use internally. The initial target user will be a software vTPM 1.2 backend that needs to store keys in VNVRAM and be able to reboot/migrate and retain the keys. This support uses QEMU's block driver to provide persistent storage by reading/writing VNVRAM data from/to a drive image. The VNVRAM drive image is provided with the -drive command line option just like any other drive image and the vnvram_create() API will find it. The APIs allow for VNVRAM entries to be registered, one at a time, each with a maximum blob size. Entry blobs can then be read/written from/to an entry on the drive. Here's an example of usage: VNVRAM *vnvram; int errcode; const VNVRAMEntryName entry_name; const char *blob_w = "blob data"; char *blob_r; uint32_t blob_r_size; vnvram = vnvram_create("drive-ide0-0-0", false, &errcode); strcpy((char *)entry_name, "first-entry"); VNVRAMEntryName is very prone to buffer overflow. I hope real code doesn't use strcpy(). The cast is ugly, please don't hide the type. vnvram_register_entry(vnvram, entry_name, 1024); vnvram_write_entry(vnvram, entry_name, (char *)blob_w, strlen(blob_w)+1); vnvram_read_entry(vnvram, entry_name, &blob_r, &blob_r_size); These are synchronous functions. If I/O is involved then this is a problem: QEMU will be blocked waiting for host I/O to complete and the big QEMU lock is held. This can cause poor guest interactivity and poor scalability because vcpus cannot make progress, neither can the QEMU monitor respond. Stefan
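To illustrate the strcpy() concern: a fixed-size entry name plus strcpy() overflows as soon as a caller passes a long string. Below is a sketch of a bounds-checked setter; the type and size are hypothetical, not the series' actual definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fixed-size entry name, mirroring the shape of the
 * series' VNVRAMEntryName but with an invented capacity. */
#define VNVRAM_ENTRY_NAME_MAX 256

typedef struct VNVRAMEntryName {
    char buf[VNVRAM_ENTRY_NAME_MAX];
} VNVRAMEntryName;

/* Returns 0 on success, -1 if the name would not fit.  Rejecting an
 * over-long name is safer than silently truncating it, because a
 * truncated name could collide with another entry. */
static int entry_name_set(VNVRAMEntryName *n, const char *name)
{
    size_t len = strlen(name);

    if (len >= sizeof(n->buf)) {
        return -1;
    }
    memcpy(n->buf, name, len + 1);
    return 0;
}
```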
Re: [Qemu-devel] [PATCH V2 2/5] block: move snapshot code in block.c to block/snapshot.c
On Thu, May 23, 2013 at 04:47:13PM +0800, Wenchao Xia wrote: -int bdrv_is_snapshot(BlockDriverState *bs) -{ -    return !!(bs->open_flags & BDRV_O_SNAPSHOT); -} No need to respin, but this function has nothing to do with the other functions. This function is about -drive snapshot=on, the others are about internal snapshots. It should not go into block/snapshot.c. At least it should be documented as having nothing to do with internal snapshots. Stefan
Re: [Qemu-devel] [PATCH V2 4/5] util: add new function message_printf()
On Thu, May 23, 2013 at 04:47:15PM +0800, Wenchao Xia wrote: This function takes an input parameter *output, which can be specified by caller as stderr, stdout or a monitor. error_vprintf() now calls message_vprintf(), which is a static function added in this patch. Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com --- include/qemu/error-report.h | 13 + util/qemu-error.c | 28 ++-- 2 files changed, 39 insertions(+), 2 deletions(-) diff --git a/include/qemu/error-report.h b/include/qemu/error-report.h index c902cc1..cdde78b 100644 --- a/include/qemu/error-report.h +++ b/include/qemu/error-report.h @@ -14,6 +14,8 @@ #define QEMU_ERROR_H #include stdarg.h +#include stdio.h +#include qemu/typedefs.h typedef struct Location { /* all members are private to qemu-error.c */ @@ -32,6 +34,17 @@ void loc_set_none(void); void loc_set_cmdline(char **argv, int idx, int cnt); void loc_set_file(const char *fname, int lno); +typedef struct QemuOutput { +enum { OUTPUT_STREAM, OUTPUT_MONITOR } kind; +union { +FILE *stream; +Monitor *monitor; +}; +} QemuOutput; + +void message_printf(const QemuOutput *output, const char *fmt, ...) +GCC_FMT_ATTR(2, 3); This is introducing a slightly different solution for fprintf_function, which is already widely used: $ git grep fprintf_function | wc -l 101 Please reuse fprintf_function. Stefan
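The review suggestion can be sketched as follows: one vprintf-style callback plus an opaque pointer replaces the QemuOutput union, so stdout, stderr and a monitor all go through the same path. Names and signatures below are invented for illustration; QEMU's real fprintf_function differs in detail.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Invented callback convention in the spirit of fprintf_function:
 * the opaque pointer selects the destination. */
typedef int (*output_vprintf_fn)(void *opaque, const char *fmt, va_list ap);

static int message_printf(void *opaque, output_vprintf_fn out,
                          const char *fmt, ...)
{
    va_list ap;
    int ret;

    va_start(ap, fmt);
    ret = out(opaque, fmt, ap);
    va_end(ap);
    return ret;
}

enum { DEMO_BUF_CAP = 64 };  /* fixed capacity for the demo backend */

/* A backend that appends into a fixed-size buffer, standing in for a
 * monitor; a FILE* backend would simply wrap vfprintf(). */
static int buffer_vprintf(void *opaque, const char *fmt, va_list ap)
{
    char *buf = opaque;
    size_t used = strlen(buf);
    return vsnprintf(buf + used, DEMO_BUF_CAP - used, fmt, ap);
}
```

The design point is that no new type is needed: callers already pass (callback, opaque) pairs throughout QEMU, so an extra tagged union only duplicates that convention.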
Re: [Qemu-devel] use O_DIRECT to open disk images for IDE failed under xen-4.1.2 and qemu upstream
On Fri, May 24, 2013 at 02:59:05AM +, Gonglei (Arei) wrote: Hi, On Thu, 23 May 2013, Gonglei (Arei) wrote: Hi, all I use O_DIRECT to open disk images for IDE, but I'm failed. After debug, I get the below logs: [2013-05-22 23:25:46] ide: CMD=c8 [2013-05-22 23:25:46] bmdma: readb 0x00 : 0x08 [2013-05-22 23:25:46] bmdma: writeb 0x00 : 0x09 [2013-05-22 23:25:46] bmdma_cmd_writeb: 0x0009 [2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01 [2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01 [2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01 [2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01 [2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01 [2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01 [2013-05-22 23:26:39] bmdma: writeb 0x00 : 0x08 [2013-05-22 23:26:39] bmdma_cmd_writeb: 0x0008 [2013-05-22 23:26:56] == offset:0 buf:0x7ff100f21c00 count:512 aio_offset:0 [2013-05-22 23:31:30] == offset:0 buf:0x7ff100f21c00 count:512 aio_offset:0 [2013-05-22 23:31:30] == handle_aiocb_rw_linear errno: -14 [2013-05-22 23:31:30] == paio_complete errno=14 [2013-05-22 23:31:30] == ide_dma_error!!! [2013-05-22 23:31:30] ide: read status addr=0x3f6 val=41 QEMU command line : qemu-system-i386 -xen-domid 837 -chardev socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-837,server,nowait -mon chardev=libxl-cmd,mode=control -name suse11 -vnc 0.0.0.0:1 -serial pty -boot order=c -usb -usbdevice tablet -smp 2,maxcpus=2 -device rtl8139,id=nic0,netdev=net0,mac=00:16:3e:13:d3:72 -netdev type=tap,id=net0,ifname=tap837.0,downscript=no -M xenfv -m 2040 -drive file=/mnt/sdd/image/suse.image,if=ide,index=0,media=disk,format=raw,cache =none errno 14 shows Bad Address. And I find QEMU_AIO_MISALIGNED flag bit is not set through debug. /* * If O_DIRECT is used the buffer needs to be aligned on a sector * boundary. Check if this is the case or tell the low-level * driver that it needs to copy the buffer. 
*/ if ((bs-open_flags BDRV_O_NOCACHE)) { if (!bdrv_qiov_is_aligned(bs, qiov)) {//if the address is aligned-512, will no meet the conditions type |= QEMU_AIO_MISALIGNED; #ifdef CONFIG_LINUX_AIO } else if (s-use_aio) { return laio_submit(bs, s-aio_ctx, s-fd, sector_num, qiov, nb_sectors, cb, opaque, type); #endif Next process: static ssize_t handle_aiocb_rw(struct qemu_paiocb *aiocb) { ssize_t nbytes; char *buf; if (!(aiocb-aio_type QEMU_AIO_MISALIGNED)) { /* * If there is just a single buffer, and it is properly aligned * we can just use plain pread/pwrite without any problems. */ if (aiocb-aio_niov == 1) return handle_aiocb_rw_linear(aiocb, aiocb-aio_iov-iov_base); //this way, and reports errno 14 next Anyone have a good method to resolve this bug? Thanks! I know that this is not the answer you are looking for but why do you want to use O_DIRECT with IDE? It should be perfectly safe to use write-back. A few days ago, I asked a question about the IDE FLUSH time of guest: http://lists.nongnu.org/archive/html/qemu-devel/2013-05/msg02642.html finally I found that because Qemu use write-back flag to open disk images by default. so I hope to use O_DIRECT to avoid meeting that problem, but I'm failed under Xen platform with Qemu upstream. cache=none does not eliminate the need for flush. If you want to do that, try cache=writethrough or cache=directsync. Regarding the EFAULT you are seeing, did you check if the I/O buffer address is valid? Try breaking in gdb and inspecting /proc/pid/maps or just x buffer-address in gdb to see if the buffer is readable. Stefan
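The rule the thread is tracing: with O_DIRECT (cache=none or cache=directsync), the I/O buffer's address and length must be sector-aligned, which is what QEMU's misaligned-buffer check tests before deciding whether to bounce the request. A sketch of allocating and checking such a buffer, with 512-byte alignment assumed for illustration (the real requirement depends on the device and filesystem):

```c
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SECTOR_SIZE 512

/* Mirrors the idea of bdrv_qiov_is_aligned(): both the address and
 * the length must be multiples of the sector size for O_DIRECT. */
static int buffer_is_aligned(const void *buf, size_t len)
{
    return ((uintptr_t)buf % SECTOR_SIZE) == 0 && (len % SECTOR_SIZE) == 0;
}

/* posix_memalign() returns a buffer whose address satisfies the
 * alignment requirement; plain malloc() gives no such guarantee. */
static void *alloc_io_buffer(size_t len)
{
    void *buf = NULL;

    if (posix_memalign(&buf, SECTOR_SIZE, len) != 0) {
        return NULL;
    }
    return buf;
}
```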
Re: [Qemu-devel] [PATCH 0/7] VNVRAM persistent storage
On Fri, May 24, 2013 at 08:13:27AM -0400, Stefan Berger wrote: On 05/24/2013 05:59 AM, Stefan Hajnoczi wrote: On Thu, May 23, 2013 at 01:44:40PM -0400, Corey Bryant wrote: This patch series provides VNVRAM persistent storage support that QEMU can use internally. The initial target user will be a software vTPM 1.2 backend that needs to store keys in VNVRAM and be able to reboot/migrate and retain the keys. This support uses QEMU's block driver to provide persistent storage by reading/writing VNVRAM data from/to a drive image. The VNVRAM drive image is provided with the -drive command line option just like any other drive image and the vnvram_create() API will find it. The APIs allow for VNVRAM entries to be registered, one at a time, each with a maximum blob size. Entry blobs can then be read/written from/to an entry on the drive. Here's an example of usage: VNVRAM *vnvram; int errcode const VNVRAMEntryName entry_name; const char *blob_w = blob data; char *blob_r; uint32_t blob_r_size; vnvram = vnvram_create(drive-ide0-0-0, false, errcode); strcpy((char *)entry_name, first-entry); VNVRAMEntryName is very prone to buffer overflow. I hope real code doesn't use strcpy(). The cast is ugly, please don't hide the type. vnvram_register_entry(vnvram, entry_name, 1024); vnvram_write_entry(vnvram, entry_name, (char *)blob_w, strlen(blob_w)+1); vnvram_read_entry(vnvram, entry_name, blob_r, blob_r_size); These are synchronous functions. If I/O is involved then this is a problem: QEMU will be blocked waiting for host I/O to complete and the big QEMU lock is held. This can cause poor guest interactivity and poor scalability because vcpus cannot make progress, neither can the QEMU monitor respond. The vTPM is going to run as a thread and will have to write state blobs into a bdrv. The above functions will typically be called from this thead. 
When I originally wrote the code, the vTPM thread could not write the blobs into bdrv directly, so I had to resort to sending a message to the main QEMU thread to write the data to the bdrv. How else could we do this? How else: use asynchronous APIs like bdrv_aio_writev() or the coroutine versions (which eliminate the need for callbacks) like bdrv_co_writev(). I'm preparing patches that allow the QEMU block layer to be used safely outside the QEMU global mutex. Once this is possible it would be okay to use synchronous methods. Stefan
[Qemu-devel] [PULL 00/11] Block patches
The following changes since commit 64afc2b4d48fb21e085517c38a59a3f61a11283c: Merge remote-tracking branch 'luiz/queue/qmp' into staging (2013-05-23 14:16:35 -0500) are available in the git repository at: git://github.com/stefanha/qemu.git block for you to fetch changes up to 02ffb504485f0920cfc75a0982a602f824a9a4f4: coroutine: stop using AioContext in CoQueue (2013-05-24 16:17:56 +0200) Kevin Wolf (4): blockdev: Rename BlockdevAction - TransactionAction qemu-io: Fix 'map' output qcow2.py: Subcommand for changing header fields qemu-iotests: Try creating huge qcow2 image Stefan Hajnoczi (2): coroutine: protect global pool with a mutex coroutine: stop using AioContext in CoQueue Wenchao Xia (5): block: package preparation code in qmp_transaction() block: move input parsing code in qmp_transaction() block: package committing code in qmp_transaction() block: package rollback code in qmp_transaction() block: make all steps in qmp_transaction() as callback blockdev.c| 280 ++ include/block/coroutine_int.h | 4 + qapi-schema.json | 21 ++-- qemu-coroutine-lock.c | 56 - qemu-coroutine.c | 23 +++- qemu-io.c | 46 ++- tests/qemu-iotests/054| 58 + tests/qemu-iotests/054.out| 10 ++ tests/qemu-iotests/common.rc | 2 +- tests/qemu-iotests/group | 1 + tests/qemu-iotests/qcow2.py | 17 +++ trace-events | 2 +- 12 files changed, 361 insertions(+), 159 deletions(-) create mode 100755 tests/qemu-iotests/054 create mode 100644 tests/qemu-iotests/054.out -- 1.8.1.4
[Qemu-devel] [PATCH 02/11] block: move input parsing code in qmp_transaction()
From: Wenchao Xia xiaw...@linux.vnet.ibm.com The code is moved into preparation function, and changed a bit to tip more clearly what it is doing. Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com Reviewed-by: Kevin Wolf kw...@redhat.com Reviewed-by: Stefan Hajnoczi stefa...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com Signed-off-by: Kevin Wolf kw...@redhat.com --- blockdev.c | 38 +++--- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/blockdev.c b/blockdev.c index 1cb9640..0115f46 100644 --- a/blockdev.c +++ b/blockdev.c @@ -785,10 +785,7 @@ typedef struct BlkTransactionStates { QSIMPLEQ_ENTRY(BlkTransactionStates) entry; } BlkTransactionStates; -static void external_snapshot_prepare(const char *device, - const char *format, - const char *new_image_file, - enum NewImageMode mode, +static void external_snapshot_prepare(BlockdevAction *action, BlkTransactionStates *states, Error **errp) { @@ -796,7 +793,24 @@ static void external_snapshot_prepare(const char *device, BlockDriver *drv; int flags, ret; Error *local_err = NULL; +const char *device; +const char *new_image_file; +const char *format = qcow2; +enum NewImageMode mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS; +/* get parameters */ +g_assert(action-kind == BLOCKDEV_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC); + +device = action-blockdev_snapshot_sync-device; +new_image_file = action-blockdev_snapshot_sync-snapshot_file; +if (action-blockdev_snapshot_sync-has_format) { +format = action-blockdev_snapshot_sync-format; +} +if (action-blockdev_snapshot_sync-has_mode) { +mode = action-blockdev_snapshot_sync-mode; +} + +/* start processing */ drv = bdrv_find_format(format); if (!drv) { error_set(errp, QERR_INVALID_BLOCK_FORMAT, format); @@ -877,10 +891,6 @@ void qmp_transaction(BlockdevActionList *dev_list, Error **errp) /* We don't do anything in this loop that commits us to the snapshot */ while (NULL != dev_entry) { BlockdevAction *dev_info = NULL; -enum NewImageMode mode; -const char *new_image_file; 
-const char *device; -const char *format = qcow2; dev_info = dev_entry-value; dev_entry = dev_entry-next; @@ -890,17 +900,7 @@ void qmp_transaction(BlockdevActionList *dev_list, Error **errp) switch (dev_info-kind) { case BLOCKDEV_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC: -device = dev_info-blockdev_snapshot_sync-device; -if (!dev_info-blockdev_snapshot_sync-has_mode) { -dev_info-blockdev_snapshot_sync-mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS; -} -new_image_file = dev_info-blockdev_snapshot_sync-snapshot_file; -if (dev_info-blockdev_snapshot_sync-has_format) { -format = dev_info-blockdev_snapshot_sync-format; -} -mode = dev_info-blockdev_snapshot_sync-mode; -external_snapshot_prepare(device, format, new_image_file, - mode, states, local_err); +external_snapshot_prepare(dev_info, states, errp); if (error_is_set(local_err)) { error_propagate(errp, local_err); goto delete_and_fail; -- 1.8.1.4
[Qemu-devel] [PATCH 03/11] block: package committing code in qmp_transaction()
From: Wenchao Xia xiaw...@linux.vnet.ibm.com The code is simply moved into a separate function. Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com Reviewed-by: Kevin Wolf kw...@redhat.com Reviewed-by: Stefan Hajnoczi stefa...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com Signed-off-by: Kevin Wolf kw...@redhat.com --- blockdev.c | 19 --- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/blockdev.c b/blockdev.c index 0115f46..76f0532 100644 --- a/blockdev.c +++ b/blockdev.c @@ -871,6 +871,17 @@ static void external_snapshot_prepare(BlockdevAction *action, } } +static void external_snapshot_commit(BlkTransactionStates *states) +{ +/* This removes our old bs from the bdrv_states, and adds the new bs */ +bdrv_append(states-new_bs, states-old_bs); +/* We don't need (or want) to use the transactional + * bdrv_reopen_multiple() across all the entries at once, because we + * don't want to abort all of them if one of them fails the reopen */ +bdrv_reopen(states-new_bs, states-new_bs-open_flags ~BDRV_O_RDWR, +NULL); +} + /* * 'Atomic' group snapshots. The snapshots are taken as a set, and if any fail * then we do not pivot any of the devices in the group, and abandon the @@ -916,13 +927,7 @@ void qmp_transaction(BlockdevActionList *dev_list, Error **errp) /* Now we are going to do the actual pivot. Everything up to this point * is reversible, but we are committed at this point */ QSIMPLEQ_FOREACH(states, snap_bdrv_states, entry) { -/* This removes our old bs from the bdrv_states, and adds the new bs */ -bdrv_append(states-new_bs, states-old_bs); -/* We don't need (or want) to use the transactional - * bdrv_reopen_multiple() across all the entries at once, because we - * don't want to abort all of them if one of them fails the reopen */ -bdrv_reopen(states-new_bs, states-new_bs-open_flags ~BDRV_O_RDWR, -NULL); +external_snapshot_commit(states); } /* success */ -- 1.8.1.4
[Qemu-devel] [PATCH 07/11] qemu-io: Fix 'map' output
From: Kevin Wolf kw...@redhat.com

The output of the 'map' command in qemu-io used to directly resemble
bdrv_is_allocated() and could contain many lines for small chunks that all
have the same allocation status. After this patch, they will be coalesced
into a single output line for a large chunk. As a side effect, the command
gains some error handling.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 qemu-io.c | 46 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 41 insertions(+), 5 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index 475a8bd..5e6680b 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -1635,12 +1635,43 @@ static const cmdinfo_t alloc_cmd = {
     .oneline    = "checks if a sector is present in the file",
 };
 
+
+static int map_is_allocated(int64_t sector_num, int64_t nb_sectors,
+                            int64_t *pnum)
+{
+    int num, num_checked;
+    int ret, firstret;
+
+    num_checked = MIN(nb_sectors, INT_MAX);
+    ret = bdrv_is_allocated(bs, sector_num, num_checked, &num);
+    if (ret < 0) {
+        return ret;
+    }
+
+    firstret = ret;
+    *pnum = num;
+
+    while (nb_sectors > 0 && ret == firstret) {
+        sector_num += num;
+        nb_sectors -= num;
+
+        num_checked = MIN(nb_sectors, INT_MAX);
+        ret = bdrv_is_allocated(bs, sector_num, num_checked, &num);
+        if (ret == firstret) {
+            *pnum += num;
+        } else {
+            break;
+        }
+    }
+
+    return firstret;
+}
+
 static int map_f(int argc, char **argv)
 {
     int64_t offset;
     int64_t nb_sectors;
     char s1[64];
-    int num, num_checked;
+    int64_t num;
     int ret;
     const char *retstr;
 
@@ -1648,12 +1679,17 @@ static int map_f(int argc, char **argv)
     nb_sectors = bs->total_sectors;
 
     do {
-        num_checked = MIN(nb_sectors, INT_MAX);
-        ret = bdrv_is_allocated(bs, offset, num_checked, &num);
+        ret = map_is_allocated(offset, nb_sectors, &num);
+        if (ret < 0) {
+            error_report("Failed to get allocation status: %s",
+                         strerror(-ret));
+            return 0;
+        }
+
         retstr = ret ? "    allocated" : "not allocated";
         cvtstr(offset << 9ULL, s1, sizeof(s1));
-        printf("[% 24" PRId64 "] % 8d/% 8d sectors %s at offset %s (%d)\n",
-               offset << 9ULL, num, num_checked, retstr, s1, ret);
+        printf("[% 24" PRId64 "] % 8" PRId64 "/% 8" PRId64 " sectors %s "
+               "at offset %s (%d)\n",
+               offset << 9ULL, num, nb_sectors, retstr, s1, ret);
 
         offset += num;
         nb_sectors -= num;
-- 
1.8.1.4
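The coalescing loop in map_is_allocated() can be sketched independently of the block layer: query fixed-size chunks and extend the run while each chunk's status matches the first chunk's status. This is an illustrative sketch (the array of per-chunk flags stands in for repeated bdrv_is_allocated() calls; none of these names are QEMU APIs):

```c
#include <assert.h>

/* Coalesce consecutive chunks that share the status of the first one,
 * mirroring the structure of map_is_allocated(). Returns the status of
 * the first run and stores the run's length in *run_len. */
static int coalesce_status(const int *status, long nb_chunks, long *run_len)
{
    int first = status[0];
    long i;

    /* Extend the run until the status changes or we run out of chunks */
    for (i = 1; i < nb_chunks; i++) {
        if (status[i] != first) {
            break;
        }
    }
    *run_len = i;
    return first;
}
```

A caller would invoke this in a loop, advancing by *run_len each time, which is exactly how map_f() now prints one line per run instead of one line per chunk.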
[Qemu-devel] [PATCH 05/11] block: make all steps in qmp_transaction() as callback
From: Wenchao Xia xiaw...@linux.vnet.ibm.com Make it easier to add other operations to qmp_transaction() by using callbacks, with external snapshots serving as an example implementation of the callbacks. Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com Reviewed-by: Kevin Wolf kw...@redhat.com Reviewed-by: Stefan Hajnoczi stefa...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com Signed-off-by: Kevin Wolf kw...@redhat.com --- blockdev.c | 95 ++ 1 file changed, 71 insertions(+), 24 deletions(-) diff --git a/blockdev.c b/blockdev.c index 2131e13..617501c 100644 --- a/blockdev.c +++ b/blockdev.c @@ -779,14 +779,43 @@ void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file, /* New and old BlockDriverState structs for group snapshots */ -typedef struct BlkTransactionStates { + +typedef struct BlkTransactionStates BlkTransactionStates; + +/* Only prepare() may fail. In a single transaction, only one of commit() or + abort() will be called, clean() will always be called if it present. */ +typedef struct BdrvActionOps { +/* Size of state struct, in bytes. */ +size_t instance_size; +/* Prepare the work, must NOT be NULL. */ +void (*prepare)(BlkTransactionStates *common, Error **errp); +/* Commit the changes, must NOT be NULL. */ +void (*commit)(BlkTransactionStates *common); +/* Abort the changes on fail, can be NULL. */ +void (*abort)(BlkTransactionStates *common); +/* Clean up resource in the end, can be NULL. */ +void (*clean)(BlkTransactionStates *common); +} BdrvActionOps; + +/* + * This structure must be arranged as first member in child type, assuming + * that compiler will also arrange it to the same address with parent instance. + * Later it will be used in free(). 
+ */ +struct BlkTransactionStates { +BlockdevAction *action; +const BdrvActionOps *ops; +QSIMPLEQ_ENTRY(BlkTransactionStates) entry; +}; + +/* external snapshot private data */ +typedef struct ExternalSnapshotStates { +BlkTransactionStates common; BlockDriverState *old_bs; BlockDriverState *new_bs; -QSIMPLEQ_ENTRY(BlkTransactionStates) entry; -} BlkTransactionStates; +} ExternalSnapshotStates; -static void external_snapshot_prepare(BlockdevAction *action, - BlkTransactionStates *states, +static void external_snapshot_prepare(BlkTransactionStates *common, Error **errp) { BlockDriver *proto_drv; @@ -797,6 +826,9 @@ static void external_snapshot_prepare(BlockdevAction *action, const char *new_image_file; const char *format = qcow2; enum NewImageMode mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS; +ExternalSnapshotStates *states = + DO_UPCAST(ExternalSnapshotStates, common, common); +BlockdevAction *action = common-action; /* get parameters */ g_assert(action-kind == BLOCKDEV_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC); @@ -871,8 +903,11 @@ static void external_snapshot_prepare(BlockdevAction *action, } } -static void external_snapshot_commit(BlkTransactionStates *states) +static void external_snapshot_commit(BlkTransactionStates *common) { +ExternalSnapshotStates *states = + DO_UPCAST(ExternalSnapshotStates, common, common); + /* This removes our old bs from the bdrv_states, and adds the new bs */ bdrv_append(states-new_bs, states-old_bs); /* We don't need (or want) to use the transactional @@ -882,13 +917,24 @@ static void external_snapshot_commit(BlkTransactionStates *states) NULL); } -static void external_snapshot_abort(BlkTransactionStates *states) +static void external_snapshot_abort(BlkTransactionStates *common) { +ExternalSnapshotStates *states = + DO_UPCAST(ExternalSnapshotStates, common, common); if (states-new_bs) { bdrv_delete(states-new_bs); } } +static const BdrvActionOps actions[] = { +[BLOCKDEV_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC] = { +.instance_size = 
sizeof(ExternalSnapshotStates), +.prepare = external_snapshot_prepare, +.commit = external_snapshot_commit, +.abort = external_snapshot_abort, +}, +}; + /* * 'Atomic' group snapshots. The snapshots are taken as a set, and if any fail * then we do not pivot any of the devices in the group, and abandon the @@ -909,32 +955,28 @@ void qmp_transaction(BlockdevActionList *dev_list, Error **errp) /* We don't do anything in this loop that commits us to the snapshot */ while (NULL != dev_entry) { BlockdevAction *dev_info = NULL; +const BdrvActionOps *ops; dev_info = dev_entry-value; dev_entry = dev_entry-next; -states = g_malloc0(sizeof(BlkTransactionStates)); +assert(dev_info-kind ARRAY_SIZE(actions)); + +ops = actions[dev_info-kind
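The prepare/commit/abort flow that the BdrvActionOps table introduces can be modeled in miniature: every action is prepared first, and only if all prepares succeed are the actions committed; otherwise the already-prepared ones are aborted. This is a hedged sketch of the pattern only — the struct and function names are illustrative, not QEMU code:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of the qmp_transaction() ops-table pattern:
 * prepare everything, then commit all or abort the prepared subset. */
typedef struct Action {
    bool prepared;
    bool committed;
    bool aborted;
    bool fail_prepare;  /* simulate a prepare() failure */
} Action;

static bool run_transaction(Action *acts, int n)
{
    int i, prepared = 0;

    /* Phase 1: prepare; nothing is committed yet */
    for (i = 0; i < n; i++) {
        if (acts[i].fail_prepare) {
            goto abort_all;
        }
        acts[i].prepared = true;
        prepared++;
    }

    /* Phase 2: all prepares succeeded, commit everything */
    for (i = 0; i < n; i++) {
        acts[i].committed = true;
    }
    return true;

abort_all:
    /* Roll back only the actions that were successfully prepared */
    for (i = 0; i < prepared; i++) {
        acts[i].aborted = true;
    }
    return false;
}
```

The real code additionally supports an optional clean() callback that runs unconditionally at the end, which this sketch omits.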
[Qemu-devel] [PATCH 09/11] qemu-iotests: Try creating huge qcow2 image
From: Kevin Wolf kw...@redhat.com It's supposed to fail gracefully instead of segfaulting. Signed-off-by: Kevin Wolf kw...@redhat.com --- tests/qemu-iotests/054 | 58 tests/qemu-iotests/054.out | 10 tests/qemu-iotests/common.rc | 2 +- tests/qemu-iotests/group | 1 + 4 files changed, 70 insertions(+), 1 deletion(-) create mode 100755 tests/qemu-iotests/054 create mode 100644 tests/qemu-iotests/054.out diff --git a/tests/qemu-iotests/054 b/tests/qemu-iotests/054 new file mode 100755 index 000..b360429 --- /dev/null +++ b/tests/qemu-iotests/054 @@ -0,0 +1,58 @@ +#!/bin/bash +# +# Test huge qcow2 images +# +# Copyright (C) 2013 Red Hat, Inc. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. +# + +# creator +owner=kw...@redhat.com + +seq=`basename $0` +echo QA output created by $seq + +here=`pwd` +tmp=/tmp/$$ +status=1 # failure is the default! + +_cleanup() +{ + _cleanup_test_img +} +trap _cleanup; exit \$status 0 1 2 3 15 + +# get standard environment, filters and checks +. ./common.rc +. 
./common.filter + +_supported_fmt qcow2 +_supported_proto generic +_supported_os Linux + +echo +echo creating too large image (1 EB) +_make_test_img $((1024*1024))T + +echo +echo creating too large image (1 EB) using qcow2.py +_make_test_img 4G +./qcow2.py $TEST_IMG set-header size $((1024 ** 6)) +_check_test_img + +# success, all done +echo *** done +rm -f $seq.full +status=0 diff --git a/tests/qemu-iotests/054.out b/tests/qemu-iotests/054.out new file mode 100644 index 000..0b2fe30 --- /dev/null +++ b/tests/qemu-iotests/054.out @@ -0,0 +1,10 @@ +QA output created by 054 + +creating too large image (1 EB) +qemu-img: The image size is too large for file format 'qcow2' +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1152921504606846976 + +creating too large image (1 EB) using qcow2.py +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4294967296 +qemu-img: Could not open 'TEST_DIR/t.qcow2': File too large +*** done diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc index 31eb62b..e9ba358 100644 --- a/tests/qemu-iotests/common.rc +++ b/tests/qemu-iotests/common.rc @@ -167,7 +167,7 @@ _cleanup_test_img() _check_test_img() { -$QEMU_IMG check $@ -f $IMGFMT $TEST_IMG 21 | \ +$QEMU_IMG check $@ -f $IMGFMT $TEST_IMG 21 | _filter_testdir | \ sed -e '/allocated.*fragmented.*compressed clusters/d' \ -e 's/qemu-img: This image format does not support checks/No errors were found on the image./' \ -e '/Image end offset: [0-9]\+/d' diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group index bf944d9..387b050 100644 --- a/tests/qemu-iotests/group +++ b/tests/qemu-iotests/group @@ -60,3 +60,4 @@ #051 rw auto 052 rw auto backing 053 rw auto +054 rw auto -- 1.8.1.4
[Qemu-devel] [PATCH 08/11] qcow2.py: Subcommand for changing header fields
From: Kevin Wolf kw...@redhat.com

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 tests/qemu-iotests/qcow2.py | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/tests/qemu-iotests/qcow2.py b/tests/qemu-iotests/qcow2.py
index fecf5b9..44a2b45 100755
--- a/tests/qemu-iotests/qcow2.py
+++ b/tests/qemu-iotests/qcow2.py
@@ -149,6 +149,22 @@ def cmd_dump_header(fd):
     h.dump()
     h.dump_extensions()
 
+def cmd_set_header(fd, name, value):
+    try:
+        value = int(value, 0)
+    except:
+        print "'%s' is not a valid number" % value
+        sys.exit(1)
+
+    fields = (field[2] for field in QcowHeader.fields)
+    if not name in fields:
+        print "'%s' is not a known header field" % name
+        sys.exit(1)
+
+    h = QcowHeader(fd)
+    h.__dict__[name] = value
+    h.update(fd)
+
 def cmd_add_header_ext(fd, magic, data):
     try:
         magic = int(magic, 0)
@@ -205,6 +221,7 @@ def cmd_set_feature_bit(fd, group, bit):
 
 cmds = [
     [ 'dump-header',    cmd_dump_header,    0, 'Dump image header and header extensions' ],
+    [ 'set-header',     cmd_set_header,     2, 'Set a field in the header'],
     [ 'add-header-ext', cmd_add_header_ext, 2, 'Add a header extension' ],
     [ 'del-header-ext', cmd_del_header_ext, 1, 'Delete a header extension' ],
     [ 'set-feature-bit', cmd_set_feature_bit, 2, 'Set a feature bit'],
-- 
1.8.1.4
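The subcommand validates the field name against the header's field table before assigning. The same name-to-slot lookup can be sketched in C (purely illustrative — qcow2.py keeps its table as Python tuples, and none of these types exist in QEMU):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative C analogue of cmd_set_header(): map a field name to a
 * slot in a header struct via a lookup table. */
typedef struct Header {
    uint32_t version;
    uint64_t size;
} Header;

typedef struct Field {
    const char *name;
    size_t offset;
    size_t width;   /* in bytes */
} Field;

static const Field fields[] = {
    { "version", offsetof(Header, version), sizeof(uint32_t) },
    { "size",    offsetof(Header, size),    sizeof(uint64_t) },
};

/* Returns 0 on success, -1 if the name is not a known header field. */
static int set_header_field(Header *h, const char *name, uint64_t value)
{
    size_t i;

    for (i = 0; i < sizeof(fields) / sizeof(fields[0]); i++) {
        if (strcmp(fields[i].name, name) == 0) {
            if (fields[i].width == sizeof(uint32_t)) {
                uint32_t v = (uint32_t)value;
                memcpy((char *)h + fields[i].offset, &v, sizeof(v));
            } else {
                memcpy((char *)h + fields[i].offset, &value, sizeof(value));
            }
            return 0;
        }
    }
    return -1;
}
```

Rejecting unknown names up front is what lets the next patch's test drive the header to an out-of-range size without silently creating garbage fields.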
[Qemu-devel] [PATCH 06/11] blockdev: Rename BlockdevAction - TransactionAction
From: Kevin Wolf kw...@redhat.com There's no reason to restrict transactions to operations related to block devices, so rename the type now before schema introspection stops us from doing so. Also change the schema documentation of 'transaction' to not refer to block devices or snapshots any more. Signed-off-by: Kevin Wolf kw...@redhat.com --- blockdev.c | 22 +++--- qapi-schema.json | 21 ++--- 2 files changed, 21 insertions(+), 22 deletions(-) diff --git a/blockdev.c b/blockdev.c index 617501c..d1ec99a 100644 --- a/blockdev.c +++ b/blockdev.c @@ -750,8 +750,8 @@ void do_commit(Monitor *mon, const QDict *qdict) static void blockdev_do_action(int kind, void *data, Error **errp) { -BlockdevAction action; -BlockdevActionList list; +TransactionAction action; +TransactionActionList list; action.kind = kind; action.data = data; @@ -773,8 +773,8 @@ void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file, .has_mode = has_mode, .mode = mode, }; -blockdev_do_action(BLOCKDEV_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC, snapshot, - errp); +blockdev_do_action(TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC, + snapshot, errp); } @@ -803,7 +803,7 @@ typedef struct BdrvActionOps { * Later it will be used in free(). 
*/ struct BlkTransactionStates { -BlockdevAction *action; +TransactionAction *action; const BdrvActionOps *ops; QSIMPLEQ_ENTRY(BlkTransactionStates) entry; }; @@ -828,10 +828,10 @@ static void external_snapshot_prepare(BlkTransactionStates *common, enum NewImageMode mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS; ExternalSnapshotStates *states = DO_UPCAST(ExternalSnapshotStates, common, common); -BlockdevAction *action = common-action; +TransactionAction *action = common-action; /* get parameters */ -g_assert(action-kind == BLOCKDEV_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC); +g_assert(action-kind == TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC); device = action-blockdev_snapshot_sync-device; new_image_file = action-blockdev_snapshot_sync-snapshot_file; @@ -927,7 +927,7 @@ static void external_snapshot_abort(BlkTransactionStates *common) } static const BdrvActionOps actions[] = { -[BLOCKDEV_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC] = { +[TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC] = { .instance_size = sizeof(ExternalSnapshotStates), .prepare = external_snapshot_prepare, .commit = external_snapshot_commit, @@ -940,9 +940,9 @@ static const BdrvActionOps actions[] = { * then we do not pivot any of the devices in the group, and abandon the * snapshots */ -void qmp_transaction(BlockdevActionList *dev_list, Error **errp) +void qmp_transaction(TransactionActionList *dev_list, Error **errp) { -BlockdevActionList *dev_entry = dev_list; +TransactionActionList *dev_entry = dev_list; BlkTransactionStates *states, *next; Error *local_err = NULL; @@ -954,7 +954,7 @@ void qmp_transaction(BlockdevActionList *dev_list, Error **errp) /* We don't do anything in this loop that commits us to the snapshot */ while (NULL != dev_entry) { -BlockdevAction *dev_info = NULL; +TransactionAction *dev_info = NULL; const BdrvActionOps *ops; dev_info = dev_entry-value; diff --git a/qapi-schema.json b/qapi-schema.json index 664b31f..ef1f657 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1609,12 +1609,12 
@@ '*mode': 'NewImageMode' } } ## -# @BlockdevAction +# @TransactionAction # # A discriminated record of operations that can be performed with # @transaction. ## -{ 'union': 'BlockdevAction', +{ 'union': 'TransactionAction', 'data': { 'blockdev-snapshot-sync': 'BlockdevSnapshot' } } @@ -1622,25 +1622,24 @@ ## # @transaction # -# Atomically operate on a group of one or more block devices. If -# any operation fails, then the entire set of actions will be -# abandoned and the appropriate error returned. The only operation -# supported is currently blockdev-snapshot-sync. +# Executes a number of transactionable QMP commands atomically. If any +# operation fails, then the entire set of actions will be abandoned and the +# appropriate error returned. # # List of: -# @BlockdevAction: information needed for the device snapshot +# @TransactionAction: information needed for the respective operation # # Returns: nothing on success -# If @device is not a valid block device, DeviceNotFound +# Errors depend on the operations of the transaction # -# Note: The transaction aborts on the first failure. Therefore, there will -# be only one device or snapshot file returned in an error condition, and +# Note: The transaction aborts on the first failure. Therefore, there will be +# information on only one failed operation
[Qemu-devel] [PATCH 10/11] coroutine: protect global pool with a mutex
The coroutine freelist is a global pool of unused coroutines. It avoids the
setup/teardown overhead associated with the coroutine lifecycle.

Since the pool is global, we need to synchronize access so that coroutines
can be used outside the BQL.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 qemu-coroutine.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/qemu-coroutine.c b/qemu-coroutine.c
index 25a14c6..60ac79e 100644
--- a/qemu-coroutine.c
+++ b/qemu-coroutine.c
@@ -14,6 +14,7 @@
 
 #include "trace.h"
 #include "qemu-common.h"
+#include "qemu/thread.h"
 #include "block/coroutine.h"
 #include "block/coroutine_int.h"
 
@@ -23,6 +24,7 @@ enum {
 };
 
 /** Free list to speed up creation */
+static QemuMutex pool_lock;
 static QSLIST_HEAD(, Coroutine) pool = QSLIST_HEAD_INITIALIZER(pool);
 static unsigned int pool_size;
 
@@ -30,11 +32,15 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
 {
     Coroutine *co;
 
+    qemu_mutex_lock(&pool_lock);
     co = QSLIST_FIRST(&pool);
     if (co) {
         QSLIST_REMOVE_HEAD(&pool, pool_next);
         pool_size--;
-    } else {
+    }
+    qemu_mutex_unlock(&pool_lock);
+
+    if (!co) {
         co = qemu_coroutine_new();
     }
 
@@ -44,17 +50,25 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
 
 static void coroutine_delete(Coroutine *co)
 {
+    qemu_mutex_lock(&pool_lock);
     if (pool_size < POOL_MAX_SIZE) {
         QSLIST_INSERT_HEAD(&pool, co, pool_next);
         co->caller = NULL;
         pool_size++;
+        qemu_mutex_unlock(&pool_lock);
         return;
     }
+    qemu_mutex_unlock(&pool_lock);
 
     qemu_coroutine_delete(co);
 }
 
-static void __attribute__((destructor)) coroutine_cleanup(void)
+static void __attribute__((constructor)) coroutine_pool_init(void)
+{
+    qemu_mutex_init(&pool_lock);
+}
+
+static void __attribute__((destructor)) coroutine_pool_cleanup(void)
 {
     Coroutine *co;
     Coroutine *tmp;
@@ -63,6 +77,8 @@ static void __attribute__((destructor)) coroutine_cleanup(void)
         QSLIST_REMOVE_HEAD(&pool, pool_next);
         qemu_coroutine_delete(co);
     }
+
+    qemu_mutex_destroy(&pool_lock);
 }
 
 static void coroutine_swap(Coroutine *from, Coroutine *to)
-- 
1.8.1.4
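The locking pattern in this patch — pop from a bounded global freelist under a mutex, fall back to fresh allocation outside the lock — can be sketched with plain pthreads. A minimal sketch under stated assumptions: the `Obj` type and function names are illustrative, not QEMU APIs, and the allocation fallback here is a plain calloc() rather than coroutine creation:

```c
#include <pthread.h>
#include <stdlib.h>

/* Bounded, mutex-protected freelist, mirroring the pool in this patch */
#define POOL_MAX 4

typedef struct Obj {
    struct Obj *next;
} Obj;

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static Obj *pool_head;
static unsigned pool_size;

static Obj *obj_get(void)
{
    Obj *o = NULL;

    pthread_mutex_lock(&pool_lock);
    if (pool_head) {
        o = pool_head;
        pool_head = o->next;
        pool_size--;
    }
    pthread_mutex_unlock(&pool_lock);

    /* Allocate outside the lock, like qemu_coroutine_new() is called
     * outside pool_lock in the patch */
    return o ? o : calloc(1, sizeof(*o));
}

static void obj_put(Obj *o)
{
    pthread_mutex_lock(&pool_lock);
    if (pool_size < POOL_MAX) {
        o->next = pool_head;
        pool_head = o;
        pool_size++;
        pthread_mutex_unlock(&pool_lock);
        return;
    }
    pthread_mutex_unlock(&pool_lock);

    free(o);    /* pool is full: really release the object */
}
```

Note the same subtlety as in the patch: the expensive path (allocation/teardown) runs with the lock dropped, so the critical section only covers list manipulation.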
[Qemu-devel] [PATCH 01/11] block: package preparation code in qmp_transaction()
From: Wenchao Xia xiaw...@linux.vnet.ibm.com The code before really committing is moved into a function. Most code is simply moved from qmp_transaction(), except that on fail it just returns now. Other code such as input parsing is not touched, to make it easier in review. Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com Reviewed-by: Kevin Wolf kw...@redhat.com Reviewed-by: Stefan Hajnoczi stefa...@redhat.com Reviewed-by: Eric Blake ebl...@redhat.com Signed-off-by: Kevin Wolf kw...@redhat.com --- blockdev.c | 139 ++--- 1 file changed, 77 insertions(+), 62 deletions(-) diff --git a/blockdev.c b/blockdev.c index 7c9d8dd..1cb9640 100644 --- a/blockdev.c +++ b/blockdev.c @@ -785,6 +785,78 @@ typedef struct BlkTransactionStates { QSIMPLEQ_ENTRY(BlkTransactionStates) entry; } BlkTransactionStates; +static void external_snapshot_prepare(const char *device, + const char *format, + const char *new_image_file, + enum NewImageMode mode, + BlkTransactionStates *states, + Error **errp) +{ +BlockDriver *proto_drv; +BlockDriver *drv; +int flags, ret; +Error *local_err = NULL; + +drv = bdrv_find_format(format); +if (!drv) { +error_set(errp, QERR_INVALID_BLOCK_FORMAT, format); +return; +} + +states-old_bs = bdrv_find(device); +if (!states-old_bs) { +error_set(errp, QERR_DEVICE_NOT_FOUND, device); +return; +} + +if (!bdrv_is_inserted(states-old_bs)) { +error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device); +return; +} + +if (bdrv_in_use(states-old_bs)) { +error_set(errp, QERR_DEVICE_IN_USE, device); +return; +} + +if (!bdrv_is_read_only(states-old_bs)) { +if (bdrv_flush(states-old_bs)) { +error_set(errp, QERR_IO_ERROR); +return; +} +} + +flags = states-old_bs-open_flags; + +proto_drv = bdrv_find_protocol(new_image_file); +if (!proto_drv) { +error_set(errp, QERR_INVALID_BLOCK_FORMAT, format); +return; +} + +/* create new image w/backing file */ +if (mode != NEW_IMAGE_MODE_EXISTING) { +bdrv_img_create(new_image_file, format, +states-old_bs-filename, +states-old_bs-drv-format_name, 
+NULL, -1, flags, local_err, false); +if (error_is_set(local_err)) { +error_propagate(errp, local_err); +return; +} +} + +/* We will manually add the backing_hd field to the bs later */ +states-new_bs = bdrv_new(); +/* TODO Inherit bs-options or only take explicit options with an + * extended QMP command? */ +ret = bdrv_open(states-new_bs, new_image_file, NULL, +flags | BDRV_O_NO_BACKING, drv); +if (ret != 0) { +error_set(errp, QERR_OPEN_FILE_FAILED, new_image_file); +} +} + /* * 'Atomic' group snapshots. The snapshots are taken as a set, and if any fail * then we do not pivot any of the devices in the group, and abandon the @@ -792,7 +864,6 @@ typedef struct BlkTransactionStates { */ void qmp_transaction(BlockdevActionList *dev_list, Error **errp) { -int ret = 0; BlockdevActionList *dev_entry = dev_list; BlkTransactionStates *states, *next; Error *local_err = NULL; @@ -806,9 +877,6 @@ void qmp_transaction(BlockdevActionList *dev_list, Error **errp) /* We don't do anything in this loop that commits us to the snapshot */ while (NULL != dev_entry) { BlockdevAction *dev_info = NULL; -BlockDriver *proto_drv; -BlockDriver *drv; -int flags; enum NewImageMode mode; const char *new_image_file; const char *device; @@ -831,70 +899,17 @@ void qmp_transaction(BlockdevActionList *dev_list, Error **errp) format = dev_info-blockdev_snapshot_sync-format; } mode = dev_info-blockdev_snapshot_sync-mode; -break; -default: -abort(); -} - -drv = bdrv_find_format(format); -if (!drv) { -error_set(errp, QERR_INVALID_BLOCK_FORMAT, format); -goto delete_and_fail; -} - -states-old_bs = bdrv_find(device); -if (!states-old_bs) { -error_set(errp, QERR_DEVICE_NOT_FOUND, device); -goto delete_and_fail; -} - -if (!bdrv_is_inserted(states-old_bs)) { -error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device); -goto delete_and_fail; -} - -if (bdrv_in_use(states-old_bs)) { -error_set(errp, QERR_DEVICE_IN_USE, device); -goto delete_and_fail; -} - -if (!bdrv_is_read_only(states-old_bs
[Qemu-devel] [PATCH 2/2] rtl8139: flush queued packets when RxBufPtr is written
Net queues support efficient receive disable. For example, tap's file
descriptor will not be polled while its peer has receive disabled. This
saves CPU cycles for needlessly copying and then dropping packets which the
peer cannot receive.

rtl8139 is missing the qemu_flush_queued_packets() call that wakes the queue
up when receive becomes possible again. As a result, the Windows 7 guest
driver reaches a state where the rtl8139 cannot receive packets. The driver
has actually refilled the receive buffer but we never resume reception.

The bug can be reproduced by running a large FTP 'get' inside a Windows 7
guest:

  $ qemu -netdev tap,id=tap0,... -device rtl8139,netdev=tap0

The Linux guest driver does not trigger the bug, probably due to a different
buffer management strategy.

Reported-by: Oliver Francke oliver.fran...@filoo.de
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/net/rtl8139.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
index 9369507..7993f9f 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -2575,6 +2575,9 @@ static void rtl8139_RxBufPtr_write(RTL8139State *s, uint32_t val)
     /* this value is off by 16 */
     s->RxBufPtr = MOD2(val + 0x10, s->RxBufferSize);
 
+    /* more buffer space may be available so try to receive */
+    qemu_flush_queued_packets(qemu_get_queue(s->nic));
+
     DPRINTF(" CAPR write: rx buffer length %d head 0x%04x read 0x%04x\n",
             s->RxBufferSize, s->RxBufAddr, s->RxBufPtr);
 }
-- 
1.8.1.4
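The failure mode can be modeled abstractly: packets queue up in the net layer while the device's ring is full, and without an explicit flush when the guest frees buffer space, the queued packets stay stuck forever. This is an illustrative model only — the struct and helpers below are not QEMU APIs:

```c
#include <stdbool.h>

/* Toy model of receive-disable + flush-on-refill */
#define RING_SIZE 2

typedef struct Nic {
    int used;       /* filled ring slots */
    int queued;     /* packets parked in the net layer */
    int received;   /* packets delivered to the guest */
} Nic;

static bool nic_can_receive(Nic *n)
{
    return n->used < RING_SIZE;
}

static void nic_send(Nic *n)
{
    if (nic_can_receive(n)) {
        n->used++;
        n->received++;
    } else {
        n->queued++;    /* net layer holds the packet; polling stops */
    }
}

/* Guest driver returns buffer space (the CAPR write in the real device).
 * Without the flush loop, n->queued packets would remain stuck. */
static void nic_buffer_freed(Nic *n)
{
    n->used--;
    while (n->queued > 0 && nic_can_receive(n)) {   /* the flush */
        n->queued--;
        n->used++;
        n->received++;
    }
}
```

Dropping the while loop reproduces the bug in miniature: the third packet queues and is never delivered, which is exactly the stuck state the Windows 7 driver reached.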
[Qemu-devel] [PATCH 1/2] net: support for bridged networking on Mac OS X
From: Alasdair McLeay alasdair.mcl...@me.com

tun/tap can be implemented on Mac OS X using
http://tuntaposx.sourceforge.net

It behaves in the same way as the FreeBSD/OpenBSD implementations, but QEMU
needs a patch to use the OpenBSD/FreeBSD code.

As per the patch listed in this forum thread:
http://forum.gns3.net/post17679.html#p17679

And also as used in the MacPorts installation:
https://trac.macports.org/browser/trunk/dports/emulators/qemu/files/patch-net-tap-interface.diff

Signed-off-by: Alasdair McLeay alasdair.mcl...@me.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 net/tap-bsd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index bcdb268..f61d580 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -44,7 +44,8 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
     struct stat s;
 #endif
 
-#if defined(__FreeBSD__) || defined(__FreeBSD_kernel__) || defined(__OpenBSD__)
+#if defined(__FreeBSD__) || defined(__FreeBSD_kernel__) || \
+    defined(__OpenBSD__) || defined(__APPLE__)
     /* if no ifname is given, always start the search from tap0/tun0. */
     int i;
     char dname[100];
-- 
1.8.1.4
[Qemu-devel] [PULL 0/2] Net patches
The following changes since commit 64afc2b4d48fb21e085517c38a59a3f61a11283c: Merge remote-tracking branch 'luiz/queue/qmp' into staging (2013-05-23 14:16:35 -0500) are available in the git repository at: git://github.com/stefanha/qemu.git net for you to fetch changes up to 00b7ade807b5ce6779ddd86ce29c5521ec5c529a: rtl8139: flush queued packets when RxBufPtr is written (2013-05-24 16:34:13 +0200) Alasdair McLeay (1): net: support for bridged networking on Mac OS X Stefan Hajnoczi (1): rtl8139: flush queued packets when RxBufPtr is written hw/net/rtl8139.c | 3 +++ net/tap-bsd.c| 3 ++- 2 files changed, 5 insertions(+), 1 deletion(-) -- 1.8.1.4
[Qemu-devel] [PATCH 11/11] coroutine: stop using AioContext in CoQueue
qemu_co_queue_next(queue) arranges that the next queued coroutine is run at a later point in time. This deferred restart is useful because the caller may not want to transfer control yet. This behavior was implemented using QEMUBH in the past, which meant that CoQueue (and hence CoMutex and CoRwlock) had a dependency on the AioContext event loop. This hidden dependency causes trouble when we move to a world with multiple event loops - now qemu_co_queue_next() needs to know which event loop to schedule the QEMUBH in. After pondering how to stash AioContext I realized the best solution is to not use AioContext at all. This patch implements the deferred restart behavior purely in terms of coroutines and no longer uses QEMUBH. Here is how it works: Each Coroutine has a wakeup queue that starts out empty. When qemu_co_queue_next() is called, the next coroutine is added to our wakeup queue. The wakeup queue is processed when we yield or terminate. Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- include/block/coroutine_int.h | 4 qemu-coroutine-lock.c | 56 --- qemu-coroutine.c | 3 +++ trace-events | 2 +- 4 files changed, 29 insertions(+), 36 deletions(-) diff --git a/include/block/coroutine_int.h b/include/block/coroutine_int.h index 17eb71e..f133d65 100644 --- a/include/block/coroutine_int.h +++ b/include/block/coroutine_int.h @@ -38,6 +38,9 @@ struct Coroutine { void *entry_arg; Coroutine *caller; QSLIST_ENTRY(Coroutine) pool_next; + +/* Coroutines that should be woken up when we yield or terminate */ +QTAILQ_HEAD(, Coroutine) co_queue_wakeup; QTAILQ_ENTRY(Coroutine) co_queue_next; }; @@ -45,5 +48,6 @@ Coroutine *qemu_coroutine_new(void); void qemu_coroutine_delete(Coroutine *co); CoroutineAction qemu_coroutine_switch(Coroutine *from, Coroutine *to, CoroutineAction action); +void coroutine_fn qemu_co_queue_run_restart(Coroutine *co); #endif diff --git a/qemu-coroutine-lock.c b/qemu-coroutine-lock.c index 86efe1f..d9fea49 100644 --- a/qemu-coroutine-lock.c +++ 
b/qemu-coroutine-lock.c @@ -26,39 +26,11 @@ #include block/coroutine.h #include block/coroutine_int.h #include qemu/queue.h -#include block/aio.h #include trace.h -/* Coroutines are awoken from a BH to allow the current coroutine to complete - * its flow of execution. The BH may run after the CoQueue has been destroyed, - * so keep BH data in a separate heap-allocated struct. - */ -typedef struct { -QEMUBH *bh; -QTAILQ_HEAD(, Coroutine) entries; -} CoQueueNextData; - -static void qemu_co_queue_next_bh(void *opaque) -{ -CoQueueNextData *data = opaque; -Coroutine *next; - -trace_qemu_co_queue_next_bh(); -while ((next = QTAILQ_FIRST(data-entries))) { -QTAILQ_REMOVE(data-entries, next, co_queue_next); -qemu_coroutine_enter(next, NULL); -} - -qemu_bh_delete(data-bh); -g_slice_free(CoQueueNextData, data); -} - void qemu_co_queue_init(CoQueue *queue) { QTAILQ_INIT(queue-entries); - -/* This will be exposed to callers once there are multiple AioContexts */ -queue-ctx = qemu_get_aio_context(); } void coroutine_fn qemu_co_queue_wait(CoQueue *queue) @@ -77,23 +49,37 @@ void coroutine_fn qemu_co_queue_wait_insert_head(CoQueue *queue) assert(qemu_in_coroutine()); } +/** + * qemu_co_queue_run_restart: + * + * Enter each coroutine that was previously marked for restart by + * qemu_co_queue_next() or qemu_co_queue_restart_all(). This function is + * invoked by the core coroutine code when the current coroutine yields or + * terminates. 
+ */ +void qemu_co_queue_run_restart(Coroutine *co) +{ +Coroutine *next; + +trace_qemu_co_queue_run_restart(co); +while ((next = QTAILQ_FIRST(co-co_queue_wakeup))) { +QTAILQ_REMOVE(co-co_queue_wakeup, next, co_queue_next); +qemu_coroutine_enter(next, NULL); +} +} + static bool qemu_co_queue_do_restart(CoQueue *queue, bool single) { +Coroutine *self = qemu_coroutine_self(); Coroutine *next; -CoQueueNextData *data; if (QTAILQ_EMPTY(queue-entries)) { return false; } -data = g_slice_new(CoQueueNextData); -data-bh = aio_bh_new(queue-ctx, qemu_co_queue_next_bh, data); -QTAILQ_INIT(data-entries); -qemu_bh_schedule(data-bh); - while ((next = QTAILQ_FIRST(queue-entries)) != NULL) { QTAILQ_REMOVE(queue-entries, next, co_queue_next); -QTAILQ_INSERT_TAIL(data-entries, next, co_queue_next); +QTAILQ_INSERT_TAIL(self-co_queue_wakeup, next, co_queue_next); trace_qemu_co_queue_next(next); if (single) { break; diff --git a/qemu-coroutine.c b/qemu-coroutine.c index 60ac79e..423430d 100644 --- a/qemu-coroutine.c +++ b/qemu-coroutine.c @@ -45,6 +45,7 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry) } co-entry = entry
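The deferred-restart mechanism described above — append waiters to the current coroutine's wakeup list, drain it when the current coroutine yields or terminates — can be sketched without real coroutines. In this hedged model the `Task` struct and both helper names are illustrative stand-ins, and "entering" a task is simulated by bumping a counter:

```c
#include <stddef.h>

/* Toy model of the wakeup queue: a singly linked list per task */
typedef struct Task Task;
struct Task {
    int entered;         /* times this task was restarted */
    Task *wakeup_head;   /* tasks to enter when we yield/terminate */
    Task *wakeup_next;
};

/* qemu_co_queue_next() analogue: mark 'next' for deferred restart.
 * Nothing runs yet; control stays with 'current'. */
static void defer_wakeup(Task *current, Task *next)
{
    next->wakeup_next = current->wakeup_head;
    current->wakeup_head = next;
}

/* qemu_co_queue_run_restart() analogue: invoked when 'current' yields
 * or terminates, entering every deferred waiter. */
static void run_restart(Task *current)
{
    while (current->wakeup_head) {
        Task *t = current->wakeup_head;
        current->wakeup_head = t->wakeup_next;
        t->entered++;
    }
}
```

The key property the patch relies on is visible here: defer_wakeup() never transfers control, so the caller keeps running until it explicitly yields — the same guarantee the QEMUBH previously provided, but with no event-loop dependency.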
[Qemu-devel] [PATCH 04/11] block: package rollback code in qmp_transaction()
From: Wenchao Xia xiaw...@linux.vnet.ibm.com

Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
Reviewed-by: Kevin Wolf kw...@redhat.com
Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
Signed-off-by: Kevin Wolf kw...@redhat.com
---
 blockdev.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 76f0532..2131e13 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -882,6 +882,13 @@ static void external_snapshot_commit(BlkTransactionStates *states)
                 NULL);
 }
 
+static void external_snapshot_abort(BlkTransactionStates *states)
+{
+    if (states->new_bs) {
+        bdrv_delete(states->new_bs);
+    }
+}
+
 /*
  * 'Atomic' group snapshots. The snapshots are taken as a set, and if any fail
  * then we do not pivot any of the devices in the group, and abandon the
@@ -939,9 +946,7 @@ delete_and_fail:
      * the original bs for all images
      */
     QSIMPLEQ_FOREACH(states, snap_bdrv_states, entry) {
-        if (states->new_bs) {
-            bdrv_delete(states->new_bs);
-        }
+        external_snapshot_abort(states);
     }
 exit:
     QSIMPLEQ_FOREACH_SAFE(states, snap_bdrv_states, entry, next) {
-- 
1.8.1.4
Re: [Qemu-devel] [PATCH] rtl8139: flush queued packets when RxBufPtr is written
On Wed, May 22, 2013 at 02:50:18PM +0200, Stefan Hajnoczi wrote: Net queues support efficient receive disable. For example, tap's file descriptor will not be polled while its peer has receive disabled. This saves CPU cycles for needlessly copying and then dropping packets which the peer cannot receive. rtl8139 is missing the qemu_flush_queued_packets() call that wakes the queue up when receive becomes possible again. As a result, the Windows 7 guest driver reaches a state where the rtl8139 cannot receive packets. The driver has actually refilled the receive buffer but we never resume reception. The bug can be reproduced by running a large FTP 'get' inside a Windows 7 guest: $ qemu -netdev tap,id=tap0,... -device rtl8139,netdev=tap0 The Linux guest driver does not trigger the bug, probably due to a different buffer management strategy. Reported-by: Oliver Francke oliver.fran...@filoo.de Signed-off-by: Stefan Hajnoczi stefa...@redhat.com --- hw/net/rtl8139.c | 3 +++ 1 file changed, 3 insertions(+) Applied to my net tree: https://github.com/stefanha/qemu/commits/net Stefan
Re: [Qemu-devel] [PATCH] rtl8139: flush queued packets when RxBufPtr is written
On Mon, May 27, 2013 at 08:15:42AM +0200, Peter Lieven wrote:

I occasionally have seen a probably related problem in the past. It mainly
happened with rtl8139 under WinXP, where we most likely use rtl8139 due to
the lack of shipped e1000 drivers. My question is whether you see increasing
dropped packet counts on the tap device when this problem occurs?

tap36     Link encap:Ethernet  HWaddr b2:84:23:c0:e2:c0
          inet6 addr: fe80::b084:23ff:fec0:e2c0/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:5816096 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3878744 errors:0 dropped:13775 overruns:0 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:5161769434 (5.1 GB)  TX bytes:380415916 (380.4 MB)

My reading of the tun code is that you will see the TX dropped counter
increase. This is because tun keeps a finite-size queue of tx packets.
Since QEMU userspace is no longer monitoring the tap fd, we never drain the
queue and soon enough the TX dropped counter begins incrementing.

In my case as well, the only option to recover without shutting down the
whole vServer is live migration to another node. However, I also see this
problem under qemu-kvm-1.2.0, while Oliver reported it does not happen there.

Yes, the patch that exposes this problem was only merged in 1.2.1. Can you
still reproduce the problem now that the patch has been merged into
qemu.git/master?

Stefan
Re: [Qemu-devel] [PATCH v3 2/2] net: introduce command to query rx-filter information
On Sun, May 26, 2013 at 10:38:14AM +0300, Michael S. Tsirkin wrote:
On Fri, May 24, 2013 at 04:00:42PM -0400, Luiz Capitulino wrote:
On Fri, 24 May 2013 12:05:12 -0600 Eric Blake ebl...@redhat.com wrote:
On 05/24/2013 10:12 AM, Michael S. Tsirkin wrote:

The event message contains the net client name; management might only want
to query that single net client.

The client can do the filtering itself. I'm not sure I buy the
responsiveness argument. Sure, the fastest I/O is no I/O, but whether you
read and parse 100 bytes or 1000 from a Unix domain socket once in a great
while shouldn't make a difference.

And the time spent malloc'ing the larger message to send from qemu, as well
as the time spent malloc'ing on the libvirt side that parses the qemu string
into C code for use, and the time spent strcmp'ing every entry to find the
right one... It really IS more efficient to filter as low down in the stack
as possible, once it is determined that filtering is desirable. Whether
filtering makes a difference in performance is a different question - you
may be right that always returning the entire list and making libvirt do its
own filtering will still not add any noticeable delay compared to libvirt
doing a filtered query, if the bottleneck lies elsewhere (such as libvirt
telling macvtap its new configuration).

My main concern is to keep the external interface simple. I'm rather
reluctant to have query commands grow options. In a case where we need the
"give me everything" query anyway, the "give me this particular part" option
is additional complexity. It needs justification, say arguments involving
throughput, latency or client complexity. Perhaps cases exist where we never
want to ask for everything. Then the "give me everything" query is useless,
and the option should be mandatory.

For this _particular_ interface, I'm not sure whether libvirt will ever use
an unfiltered query - if having the argument is useful for libvirt, then
it's fine to have it.
But I'd be very reluctant to buy any performance argument without real
numbers to back it up.

Me too. I think it's more convenience than performance.

Agreed. I suggested filtering on a NIC for usability rather than performance
reasons. QMP should be easy to use. Requiring every client to fish for the
right NIC in a bunch of output that gets discarded is not convenient.

Stefan
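For comparison, the client-side filtering that an argument-less query would force on every consumer is only a few lines — which supports the point that this is about convenience rather than performance. A hypothetical sketch (the entry layout shown is illustrative, not the exact query-rx-filter schema):

```python
def find_rx_filter(entries, name):
    # Scan the full query result for one NIC.  Everything else was
    # serialized by QEMU, read and parsed by the client, and is now
    # simply thrown away -- the "fishing" Stefan describes.
    for entry in entries:
        if entry.get("name") == name:
            return entry
    return None
```

With a `name` argument on the QMP command itself, this loop lives in QEMU once instead of in every management client.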
Re: [Qemu-devel] [PATCH 0/7] VNVRAM persistent storage
On Fri, May 24, 2013 at 11:39:09AM -0400, Corey Bryant wrote:
On 05/24/2013 08:36 AM, Stefan Hajnoczi wrote:
On Fri, May 24, 2013 at 08:13:27AM -0400, Stefan Berger wrote:
On 05/24/2013 05:59 AM, Stefan Hajnoczi wrote:
On Thu, May 23, 2013 at 01:44:40PM -0400, Corey Bryant wrote:

This patch series provides VNVRAM persistent storage support that QEMU can
use internally. The initial target user will be a software vTPM 1.2 backend
that needs to store keys in VNVRAM and be able to reboot/migrate and retain
the keys.

This support uses QEMU's block driver to provide persistent storage by
reading/writing VNVRAM data from/to a drive image. The VNVRAM drive image
is provided with the -drive command line option just like any other drive
image and the vnvram_create() API will find it.

The APIs allow VNVRAM entries to be registered, one at a time, each with a
maximum blob size. Entry blobs can then be read/written from/to an entry on
the drive. Here's an example of usage:

  VNVRAM *vnvram;
  int errcode;
  const VNVRAMEntryName entry_name;
  const char *blob_w = "blob data";
  char *blob_r;
  uint32_t blob_r_size;

  vnvram = vnvram_create("drive-ide0-0-0", false, &errcode);
  strcpy((char *)entry_name, "first-entry");

VNVRAMEntryName is very prone to buffer overflow. I hope real code doesn't
use strcpy(). The cast is ugly, please don't hide the type.

  vnvram_register_entry(vnvram, entry_name, 1024);
  vnvram_write_entry(vnvram, entry_name, (char *)blob_w, strlen(blob_w)+1);
  vnvram_read_entry(vnvram, entry_name, &blob_r, &blob_r_size);

These are synchronous functions. If I/O is involved then this is a problem:
QEMU will be blocked waiting for host I/O to complete and the big QEMU lock
is held. This can cause poor guest interactivity and poor scalability,
because vcpus cannot make progress and neither can the QEMU monitor respond.

The vTPM is going to run as a thread and will have to write state blobs into
a bdrv. The above functions will typically be called from this thread.
When I originally wrote the code, the vTPM thread could not write the blobs
into a bdrv directly, so I had to resort to sending a message to the main
QEMU thread to write the data to the bdrv. How else could we do this?

How else: use asynchronous APIs like bdrv_aio_writev() or the coroutine
versions (which eliminate the need for callbacks) like bdrv_co_writev().
I'm preparing patches that allow the QEMU block layer to be used safely
outside the QEMU global mutex. Once this is possible it would be okay to
use synchronous methods.

Ok, thanks. I'll use the aio APIs next time around. Just to be clear, does
eliminating the callback mean I don't have to use a bottom half if I use
coroutine reads/writes?

I've only skimmed the patches, but I think vTPM runs in its own thread and
uses a BH to kick off I/O requests, since the block layer must be called
with the QEMU global mutex held. In this case you still need the BH, since
its purpose is to run block layer code in a thread that holds the QEMU
global mutex.

Stefan
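The thread/bottom-half constraint behind this exchange can be sketched generically. This is plain Python threading, not QEMU's BH or block APIs — purely an illustration of the rule: the worker thread never calls into the "block layer" itself, it only queues a request that the lock-holding main loop services.

```python
import queue
import threading

requests = queue.Queue()        # stands in for the scheduled bottom half
global_mutex = threading.Lock() # stands in for the big QEMU lock
written = []

def block_layer_write(blob):
    # the QEMU rule: block layer code only runs with the global mutex held
    assert global_mutex.locked()
    written.append(blob)

def vtpm_thread():
    # worker thread: cannot call block_layer_write() directly because it
    # does not hold the global mutex, so it hands the request over instead
    requests.put(b"tpm-state-blob")

def main_loop_iteration():
    # main loop: picks up the "BH", takes the big lock, does the I/O
    blob = requests.get()
    with global_mutex:
        block_layer_write(blob)

t = threading.Thread(target=vtpm_thread)
t.start()
t.join()
main_loop_iteration()
```

Coroutines remove the completion *callback*, but not this hand-off: the BH exists to move execution into a thread that holds the lock, which is why Stefan says it is still needed.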
Re: [Qemu-devel] [PATCH 2/2] rtl8139: flush queued packets when RxBufPtr is written
On Fri, May 24, 2013 at 05:18:09PM +0200, Paolo Bonzini wrote:
Il 24/05/2013 16:38, Stefan Hajnoczi ha scritto:

diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
index 9369507..7993f9f 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -2575,6 +2575,9 @@ static void rtl8139_RxBufPtr_write(RTL8139State *s, uint32_t val)
     /* this value is off by 16 */
     s->RxBufPtr = MOD2(val + 0x10, s->RxBufferSize);
 
+    /* more buffer space may be available so try to receive */
+    qemu_flush_queued_packets(qemu_get_queue(s->nic));
+
     DPRINTF(" CAPR write: rx buffer length %d head 0x%04x read 0x%04x\n",
             s->RxBufferSize, s->RxBufAddr, s->RxBufPtr);
 }

Do you have time to update the branch with a Cc: qemu-sta...@nongnu.org?

It has been merged. Will try to CC qemu-stable in the future.

Stefan
Re: [Qemu-devel] snabbswitch integration with QEMU for userspace ethernet I/O
On Sun, May 26, 2013 at 11:32:49AM +0200, Luke Gorrie wrote:

Stefan put us onto the highly promising track of vhost/virtio. We have
implemented this between Snabb Switch and the Linux kernel, but not directly
between Snabb Switch and QEMU guests. The roadblock we have hit is
embarrassingly basic: QEMU uses user-to-kernel system calls to set up vhost
(open /dev/net/tun and /dev/vhost-net, ioctl()s) and I haven't found a good
way to map these towards Snabb Switch instead of the kernel.

vhost_net is about connecting a virtio-net speaking process to a tun-like
device. The problem you are trying to solve is connecting a virtio-net
speaking process to Snabb Switch. Either you need to replace vhost or you
need a tun-like device interface.

Replacing vhost would mean that your switch implements virtio-net, shares
guest RAM with the guest, and shares the ioeventfd and irqfd which are used
to signal with the guest. At that point your switch is similar to the
virtio-net data plane work that Ping Fan Liu is working on, except that your
switch is in a separate process rather than a thread.

How does your switch talk to hardware? If you have userspace NIC drivers
that bypass the Linux network stack, then the approach I mentioned fits
well. If you are using the Linux network stack, then it might be better to
integrate with vhost, maybe as a tun-like device driver.

Stefan
Re: [Qemu-devel] [PATCH] kvm: exclude ioeventfd from counting kvm_io_range limit
On Sat, May 25, 2013 at 06:44:15AM +0800, Amos Kong wrote:

We can easily reach the 1000 limit by starting a VM with a couple hundred
I/O devices (multifunction=on). The hardcoded limit has already been
adjusted three times (6 -> 200 -> 300 -> 1000).

In userspace, we already have a maximum file descriptor limit that bounds
the ioeventfd count. But kvm_io_bus devices are also used for the pit, pic,
ioapic and coalesced_mmio, which are not limited by the maximum file
descriptor count. Currently only ioeventfds take up too many kvm_io_bus
devices, so just exclude them when counting against the kvm_io_range limit.
Also fixed one indentation issue in kvm_host.h.

Signed-off-by: Amos Kong ak...@redhat.com
---
 include/linux/kvm_host.h | 3 ++-
 virt/kvm/eventfd.c       | 2 ++
 virt/kvm/kvm_main.c      | 3 ++-
 3 files changed, 6 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com
Re: [Qemu-devel] IDE disk FLUSH take more than 30 secs, the SUSE guest reports lost interrupt and the file system becomes read-only
On Sun, May 26, 2013 at 07:08:02PM +0200, Andreas Färber wrote:
Am 21.05.2013 09:12, schrieb Gonglei (Arei):

Through analysis, I found that the fdatasync system call in QEMU took over
30s. After the guest's kernel thread detected that the I/O transfer had
timed out, it went to check the IDE disk state. But the IDE disk status was
0x50 rather than the BSY status, so it entered the error path. The path of
the kernel's action is:

  scsi_softirq_done
    scsi_eh_scmd_add
      scsi_error_handler
        shost->transportt->eh_strategy_handler (ata_scsi_error)
          ap->ops->lost_interrupt (ata_sff_lost_interrupt)

Finally, the file system becomes read-only. Why not set the BSY status on
the IDE disk while the 0xe7 (FLUSH CACHE) command is executed in QEMU?

Have you actually tried that out with a patch such as the following?

diff --git a/hw/ide/core.c b/hw/ide/core.c
index c7a8041..bf1ff18 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -795,6 +795,8 @@ static void ide_flush_cb(void *opaque, int ret)
 {
     IDEState *s = opaque;
 
+    s->status &= ~BUSY_STAT;
+
     if (ret < 0) {
         /* XXX: What sector number to set here? */
         if (ide_handle_rw_error(s, -ret, BM_STATUS_RETRY_FLUSH)) {
@@ -814,6 +816,7 @@ void ide_flush_cache(IDEState *s)
         return;
     }
 
+    s->status |= BUSY_STAT;
     bdrv_acct_start(s->bs, &s->acct, 0, BDRV_ACCT_FLUSH);
     bdrv_aio_flush(s->bs, ide_flush_cb, s);
 }

No clue if this is spec-compliant. ;) Note however that qemu_fdatasync() is
done in the flush callback of block/raw-posix.c, so IIUC everything calling
bdrv_aio_flush() or bdrv_flush_all() may potentially run into issues beyond
just ATA.

This is an IDE emulation bug. virtio-blk, for example, doesn't have this
kind of busy status bit. It's probably not an issue with SCSI either.

Stefan
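The busy-bit handling in Andreas's sketch reduces to ordinary flag arithmetic: set BSY when the (potentially slow) host flush starts, clear it in the completion callback. A minimal Python model — `BUSY_STAT` here is ATA's BSY bit (0x80), and 0x50 is the ready status the guest observed:

```python
BUSY_STAT = 0x80

class IDEState:
    """Toy model of the status-register change proposed in the patch."""

    def __init__(self):
        self.status = 0x50  # DRDY|DSC: the "idle" status the guest saw

    def flush_cache(self):
        # mark the device busy while the long host fdatasync runs,
        # so the guest's timeout handler sees BSY instead of 0x50
        self.status |= BUSY_STAT

    def flush_cb(self):
        # completion callback: clear BSY again
        self.status &= ~BUSY_STAT
```

With BSY visible, the guest's error handler would treat the device as still working on the command rather than declaring a lost interrupt.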
Re: [Qemu-devel] [RFC PATCH v3 02/11] Fix errors and warnings while compiling with c++ compilier
On Fri, May 24, 2013 at 05:25:21PM +0200, Markus Armbruster wrote:
Tomoki Sekiyama tomoki.sekiy...@hds.com writes:
On 5/24/13 4:52 , Stefan Hajnoczi stefa...@gmail.com wrote:
On Thu, May 23, 2013 at 06:34:43PM +0000, Tomoki Sekiyama wrote:
On 5/23/13 8:12 , Stefan Hajnoczi stefa...@gmail.com wrote:
On Tue, May 21, 2013 at 11:33:41AM -0400, Tomoki Sekiyama wrote:

Add C++ keywords to avoid errors when compiling with a C++ compiler. This
also renames the 'class' member of PciDeviceInfo to q_class.

Signed-off-by: Tomoki Sekiyama tomoki.sekiy...@hds.com
---
 hmp.c           | 2 +-
 hw/pci/pci.c    | 2 +-
 scripts/qapi.py | 9 ++++++++-
 3 files changed, 10 insertions(+), 3 deletions(-)

Please also extend scripts/checkpatch.pl. Otherwise it is very likely that
C++ keywords will be introduced again in the future. Most people will not
build the VSS code and therefore checkpatch.pl needs to ensure that patches
with C++ keywords will not be accepted.

Stefan

I think it would be difficult to identify problematic C++ keyword usage from
patches, because headers can legally contain C++ keywords and checkpatch.pl
doesn't know where they should be used. Do you have any good ideas?

We can ignore false warnings for 0.1% of patches (the ones that touch VSS
C++ code). But for the remaining 99.9% of patches it's worth guarding
against VSS bitrot. Remember not many people will compile it and therefore
they won't notice when they break it. I really think it's worth putting
some effort in now so VSS doesn't periodically break.

checkpatch.pl is a hacky sort of C parser. It already checks for a bunch of
similar things and it knows about comments, macros, and strings. It does
not perform #include expansion, so there is no risk of including system
headers that have C++ code.

Stefan

Thanks for your comment. I'm still wondering, because it actually causes a
lot of false positives (not just 0.1%...) even for patches not touching VSS.
For example, the keyword 'class' is used in qdev-monitor.c, qom/object.c,
and a lot of files in hw/*/*.c and include/{hw,qom}/*.h, but they have
nothing to do with qemu-ga. qemu-ga is just a part of the whole qemu source
code, so I don't want to warn about the other parts.

And I appreciate that. Purging some other language's keywords feels like
pointless churn to me.

I see what you guys are saying now. You want to protect only qemu-ga. I
was proposing protecting the entire codebase so we never get into a
situation where changing a header breaks qemu-ga. It's not that hard to use
klass instead of class, but if it's unpopular then we'll just have to live
with VSS breakage.

Stefan
Re: [Qemu-devel] [RFC PATCH v3 11/11] QMP/qmp.py: set locale for exceptions to display non-ascii messages correctly
On Fri, May 24, 2013 at 06:55:21PM +0000, Tomoki Sekiyama wrote:
On 5/23/13 8:30 , Stefan Hajnoczi stefa...@gmail.com wrote:

Either or both bugs could be present. Once they are fixed you shouldn't see
encoding problems.

Stefan

Anyway, as I understand it, this patch for qmp.py is not the correct way to
fix this issue, so I'm going to drop it from the series.

Interesting, thanks for showing that QEMU and qmp.py are behaving correctly.
For reference, the Python unicode exception bug is here and doesn't seem to
be fixable in Python 2.x:

http://bugs.python.org/issue2517

Stefan
Re: [Qemu-devel] 'qemu-nbd' explicit flush
On Sat, May 25, 2013 at 09:42:08AM -0800, Mark Trumpold wrote:
On 5/24/13 1:05 AM, Stefan Hajnoczi stefa...@gmail.com wrote:
On Thu, May 23, 2013 at 09:58:31PM +0000, Mark Trumpold wrote:

One thing to be careful of is whether these operations are asynchronous.
The signal is asynchronous: you have no way of knowing when qemu-nbd has
finished flushing to the physical disk.

Right, of course. I missed the obvious.

I missed something too. Paolo may have already hinted at this when he
posted a dd oflag=sync command-line option: blockdev --flushbufs is the
wrong tool because ioctl(BLKFLSBUF) only writes out dirty pages to the block
device. It does *not* guarantee to send a flush request to the device.
Therefore, the underlying image file may not be put into an up-to-date state
by qemu-nbd.

I suggest trying the following instead of blockdev --flushbufs:

  python -c 'import os; os.fsync(open("/dev/loopX", "r+b"))'

This should do the same as blockdev --flushbufs *plus* it sends and waits
for the NBD FLUSH command. You may have to play with this command-line a
little, but the main idea is to open the block device and fsync it.

Stefan
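A slightly more explicit version of that one-liner, as a sketch. The device path is whatever loop device is bound to the NBD export in your setup — `/dev/loop0` below is only a placeholder:

```python
import os

def flush_block_device(path):
    # Unlike ioctl(BLKFLSBUF), fsync(2) on a block device both writes
    # back dirty pages *and* sends a flush request down the stack --
    # for an NBD-backed device that means an NBD FLUSH that the server
    # (qemu-nbd) must complete before fsync returns.
    fd = os.open(path, os.O_RDWR)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)
```

Usage would be e.g. `flush_block_device("/dev/loop0")` once guest-side data is quiesced; the call also works on regular files, which makes it easy to try out.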
Re: [Qemu-devel] [Bug 1184089] [NEW] [Feature request] loadvm snapshot as read-only
On Sat, May 25, 2013 at 08:29:11AM -0000, Michael Coppola wrote:

There are many ways to take and manage snapshots in QEMU, but one main
feature that's missing is the ability to 'loadvm' a LIVE snapshot and have
all future changes redirected to a temporary file. This would effectively
combine the -loadvm and -snapshot switches and make the snapshot read-only.
With this feature, users would be provided a sandbox and be able to start
and restart the same live snapshot without corrupting the image in doing so.

This should be possible soon. Wenchao Xia is working on new monitor
commands that allow you to combine internal snapshots (loadvm/savevm) with
external snapshots (blockdev-snapshot-sync). You would submit a QMP
'transaction' command that specifies a loadvm followed by a
blockdev-snapshot-sync. This operation is atomic.

Note that loading an internal snapshot does not destroy the snapshot.
Therefore, when you loadvm an internal snapshot and write to the disk, you
are not modifying the internal snapshot, only the current state of the disk.
You can loadvm it again later.

Stefan
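The shape of such a QMP 'transaction' request can be sketched as data. Note the hedging: `blockdev-snapshot-sync` is an existing transaction action, but the internal-snapshot-load action named below is hypothetical — the exact action name depends on Wenchao Xia's in-progress series:

```python
# Sketch of the proposed atomic QMP transaction.  "drive0",
# "clean-state" and the overlay path are example values; the
# "blockdev-snapshot-internal-load" action name is HYPOTHETICAL.
transaction = {
    "execute": "transaction",
    "arguments": {
        "actions": [
            {"type": "blockdev-snapshot-internal-load",   # hypothetical
             "data": {"device": "drive0", "name": "clean-state"}},
            {"type": "blockdev-snapshot-sync",
             "data": {"device": "drive0",
                      "snapshot-file": "/tmp/overlay.qcow2",
                      "format": "qcow2"}},
        ]
    }
}
```

After such a transaction, all guest writes would land in the overlay file, leaving the internal snapshot untouched — the sandbox behavior the bug report asks for.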
Re: [Qemu-devel] use O_DIRECT to open disk images for IDE failed under xen-4.1.2 and qemu upstream
On Sat, May 25, 2013 at 07:32:55AM +0000, Gonglei (Arei) wrote:
-----Original Message-----
From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
Sent: Friday, May 24, 2013 8:20 PM
To: Gonglei (Arei)
Cc: Stefano Stabellini; Hanweidong; Luonengjun; qemu-devel@nongnu.org;
Wangzhenguo; xen-de...@lists.xen.org; Huangweidong (Hardware)
Subject: Re: [Qemu-devel] use O_DIRECT to open disk images for IDE failed
under xen-4.1.2 and qemu upstream

On Fri, May 24, 2013 at 02:59:05AM +0000, Gonglei (Arei) wrote:
On Thu, 23 May 2013, Gonglei (Arei) wrote:

Hi, all

I use O_DIRECT to open disk images for IDE, but I failed. After debugging,
I got the logs below:

[2013-05-22 23:25:46] ide: CMD=c8
[2013-05-22 23:25:46] bmdma: readb 0x00 : 0x08
[2013-05-22 23:25:46] bmdma: writeb 0x00 : 0x09
[2013-05-22 23:25:46] bmdma_cmd_writeb: 0x0009
[2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01
[2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01
[2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01
[2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01
[2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01
[2013-05-22 23:25:46] bmdma: readb 0x02 : 0x01
[2013-05-22 23:26:39] bmdma: writeb 0x00 : 0x08
[2013-05-22 23:26:39] bmdma_cmd_writeb: 0x0008
[2013-05-22 23:26:56] == offset:0 buf:0x7ff100f21c00 count:512 aio_offset:0
[2013-05-22 23:31:30] == offset:0 buf:0x7ff100f21c00 count:512 aio_offset:0
[2013-05-22 23:31:30] == handle_aiocb_rw_linear errno: -14
[2013-05-22 23:31:30] == paio_complete errno=14
[2013-05-22 23:31:30] == ide_dma_error!!!
[2013-05-22 23:31:30] ide: read status addr=0x3f6 val=41

QEMU command line:

qemu-system-i386 -xen-domid 837 -chardev
socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-837,server,nowait -mon
chardev=libxl-cmd,mode=control -name suse11 -vnc 0.0.0.0:1 -serial pty
-boot order=c -usb -usbdevice tablet -smp 2,maxcpus=2 -device
rtl8139,id=nic0,netdev=net0,mac=00:16:3e:13:d3:72 -netdev
type=tap,id=net0,ifname=tap837.0,downscript=no -M xenfv -m 2040 -drive
file=/mnt/sdd/image/suse.image,if=ide,index=0,media=disk,format=raw,cache=none

errno 14 is "Bad Address", and through debugging I find that the
QEMU_AIO_MISALIGNED flag bit is not set:

    /*
     * If O_DIRECT is used the buffer needs to be aligned on a sector
     * boundary. Check if this is the case or tell the low-level
     * driver that it needs to copy the buffer.
     */
    if ((bs->open_flags & BDRV_O_NOCACHE)) {
        if (!bdrv_qiov_is_aligned(bs, qiov)) {
            /* if the buffer is 512-byte aligned, this branch is not taken */
            type |= QEMU_AIO_MISALIGNED;
#ifdef CONFIG_LINUX_AIO
        } else if (s->use_aio) {
            return laio_submit(bs, s->aio_ctx, s->fd, sector_num, qiov,
                               nb_sectors, cb, opaque, type);
#endif

Next it goes through:

static ssize_t handle_aiocb_rw(struct qemu_paiocb *aiocb)
{
    ssize_t nbytes;
    char *buf;

    if (!(aiocb->aio_type & QEMU_AIO_MISALIGNED)) {
        /*
         * If there is just a single buffer, and it is properly aligned
         * we can just use plain pread/pwrite without any problems.
         */
        if (aiocb->aio_niov == 1)
            /* this path is taken, and reports errno 14 next */
            return handle_aiocb_rw_linear(aiocb, aiocb->aio_iov->iov_base);

Does anyone have a good method to resolve this bug? Thanks!

I know that this is not the answer you are looking for, but why do you want
to use O_DIRECT with IDE? It should be perfectly safe to use write-back.

A few days ago I asked a question about the IDE FLUSH time of a guest:
http://lists.nongnu.org/archive/html/qemu-devel/2013-05/msg02642.html
Finally I found that it's because QEMU uses the write-back flag to open disk
images by default, so I hoped to use O_DIRECT to avoid hitting that problem,
but I failed under the Xen platform with QEMU upstream.

cache=none does not eliminate the need for flush. If you want to do that,
try cache=writethrough or cache=directsync.

I had tried it, but I got the same error.

Perhaps that's because the busy bit also needs to be set correctly for other
IDE commands (like read/write requests). Regarding the EFAULT you are
seeing, did you check whether the I/O buffer address is valid? Try breaking
in gdb and inspecting /proc/<pid>/maps, or just "x buffer-address" in gdb to
see if the buffer is readable.

Stefan

Hi, Stefan

The following are the details in gdb:

handle_aiocb_rw_linear (aiocb=0x13a0f80, buf=0x7f0a637d7c00 <Address
0x7f0a637d7c00 out of bounds>) at
/mnt/sdd/gonglei/xen-4.1.2/tools/qemu-xen/posix-aio-compat.c:213
213     {
(gdb) bt
#0 handle_aiocb_rw_linear (aiocb=0x13a0f80
[Qemu-devel] [PATCH v4] tests: set MALLOC_PERTURB_ to expose memory bugs
glibc wipes malloc(3) memory when the MALLOC_PERTURB_ environment variable
is set. The value of the environment variable determines the bit pattern
used to wipe memory. For more information, see
http://udrepper.livejournal.com/11429.html.

Set MALLOC_PERTURB_ for gtester and qemu-iotests. Note we pick a random
value from 1 to 255 to expose more bugs. If you need to reproduce a crash,
use 'show environment' in gdb to extract the MALLOC_PERTURB_ value from a
core dump.

Both "make check" and qemu-iotests pass with MALLOC_PERTURB_ enabled.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
Embarrassing how long it's taking to get this right :P

v4:
 * Drop dash-incompatible $$RANDOM [eblake]
v3:
 * Use $$ escaping in tests/Makefile [eblake]
v2:
 * Randomize MALLOC_PERTURB_ value [armbru]
 * Preserve existing MALLOC_PERTURB_ variable, if set [danpb]

 tests/Makefile           | 5 ++++-
 tests/qemu-iotests/check | 3 ++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/tests/Makefile b/tests/Makefile
index a307d5a..c107489 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -171,6 +171,7 @@ GCOV_OPTIONS = -n $(if $(V),-f,)
 $(patsubst %, check-qtest-%, $(QTEST_TARGETS)): check-qtest-%: $(check-qtest-y)
 	$(if $(CONFIG_GCOV),@rm -f *.gcda */*.gcda */*/*.gcda */*/*/*.gcda,)
 	$(call quiet-command,QTEST_QEMU_BINARY=$*-softmmu/qemu-system-$* \
+		MALLOC_PERTURB_=$${MALLOC_PERTURB_:-$$((RANDOM % 255 + 1))} \
		gtester $(GTESTER_OPTIONS) -m=$(SPEED) $(check-qtest-$*-y),"GTESTER $@")
 	$(if $(CONFIG_GCOV),@for f in $(gcov-files-$*-y); do \
 	  echo "Gcov report for $$f:";\
@@ -180,7 +181,9 @@ $(patsubst %, check-qtest-%, $(QTEST_TARGETS)): check-qtest-%: $(check-qtest-y)
 .PHONY: $(patsubst %, check-%, $(check-unit-y))
 $(patsubst %, check-%, $(check-unit-y)): check-%: %
 	$(if $(CONFIG_GCOV),@rm -f *.gcda */*.gcda */*/*.gcda */*/*/*.gcda,)
-	$(call quiet-command,gtester $(GTESTER_OPTIONS) -m=$(SPEED) $*,"GTESTER $*")
+	$(call quiet-command, \
+		MALLOC_PERTURB_=$${MALLOC_PERTURB_:-$$((RANDOM % 255 + 1))} \
+		gtester $(GTESTER_OPTIONS) -m=$(SPEED) $*,"GTESTER $*")
 	$(if $(CONFIG_GCOV),@for f in $(gcov-files-$(subst tests/,,$*)-y); do \
 	  echo "Gcov report for $$f:";\
 	  $(GCOV) $(GCOV_OPTIONS) $$f -o `dirname $$f`; \
diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 432732c..74628ae 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -214,7 +214,8 @@ do
 	start=`_wallclock`
 	$timestamp && echo -n "["`date +%T`"]"
 	[ ! -x $seq ] && chmod u+x $seq # ensure we can run it
-	./$seq >$tmp.out 2>&1
+	MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(($RANDOM % 255 + 1))} \
+		./$seq >$tmp.out 2>&1
 	sts=$?
 	$timestamp && _timestamp
 	stop=`_wallclock`
-- 
1.8.1.4
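The value-selection rule the Makefile hunks encode — honour a caller-supplied MALLOC_PERTURB_ so a crash stays reproducible, otherwise pick a random byte in 1..255 (never 0, which would disable wiping) — restated in Python for clarity; `choose_perturb` is just an illustrative name:

```python
import random

def choose_perturb(env):
    """Mirror of the shell expansion
    ${MALLOC_PERTURB_:-$((RANDOM % 255 + 1))}."""
    existing = env.get("MALLOC_PERTURB_")
    if existing:
        # keep an explicit setting so a crash can be re-run with the
        # exact same fill pattern ('show environment' in gdb recovers it)
        return existing
    # otherwise randomize so repeated runs exercise different patterns
    return str(random.randint(1, 255))
```

The same helper would work for any wrapper script that launches the test suite.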
Re: [Qemu-devel] [PATCH] rtl8139: flush queued packets when RxBufPtr is written
On Mon, May 27, 2013 at 04:24:59PM +0200, Peter Lieven wrote:
On 27.05.2013 16:07, Oliver Francke wrote:

Well,

Am 27.05.2013 um 08:15 schrieb Peter Lieven lieven-li...@dlhnet.de:

Hi all,

I occasionally have seen a probably related problem in the past. It mainly
happened with rtl8139 under WinXP, where we most likely use rtl8139 due to
the lack of shipped e1000 drivers. My question is whether you see increasing
dropped packet counts on the tap device when this problem occurs?

tap36     Link encap:Ethernet  HWaddr b2:84:23:c0:e2:c0
          inet6 addr: fe80::b084:23ff:fec0:e2c0/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:5816096 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3878744 errors:0 dropped:13775 overruns:0 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:5161769434 (5.1 GB)  TX bytes:380415916 (380.4 MB)

In my case as well, the only option to recover without shutting down the
whole vServer is live migration to another node.

ACK, tried it, and every network device has been re-created into a defined
state qemu-wise.

However, I also see this problem under qemu-kvm-1.2.0, while Oliver reported
it does not happen there.

Neither me nor any affected customers have ever seen such failures in
qemu-1.2.0, so this was my last-known-good ;)

I cherry-picked "net: add receive_disabled logic to iov delivery path".

This one exposes the bug that Oliver reported:

commit a9d8f7b1c41a8a346f4cf5a0c6963a79fbd1249e
Author: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Date:   Mon Aug 20 13:35:23 2012 +0100

    net: do not report queued packets as sent
Re: [Qemu-devel] [Bug 1184616] [NEW] undefined reference to `trace_qemu_anon_ram_alloc'
On Mon, May 27, 2013 at 4:02 PM, Nigel Horne 1184...@bugs.launchpad.net wrote:

The latest git version (commit 6a4e17711442849bf2cc731ccddef5a2a2d92d29)
fails to compile:

  ...
  LINK  qemu-ga
  libqemuutil.a(oslib-posix.o): In function `qemu_anon_ram_alloc':
  oslib-posix.c:(.text+0x154): undefined reference to `trace_qemu_anon_ram_alloc'
  libqemuutil.a(oslib-posix.o): In function `qemu_anon_ram_free':
  oslib-posix.c:(.text+0x1e7): undefined reference to `trace_qemu_anon_ram_free'
  collect2: error: ld returned 1 exit status
  make: *** [qemu-ga] Error 1

This is on Ubuntu 13.04, gcc 4.7.3, configure flags: './configure'
'--enable-linux-aio' '--enable-kvm' '--enable-vhost-net'

Please try:

  make distclean
  ./configure --enable-linux-aio --enable-kvm --enable-vhost-net
  make

Stefan
Re: [Qemu-devel] [PATCH V3 1/4] block: drop bs_snapshots global variable
On Mon, May 27, 2013 at 05:25:16PM +0200, Kevin Wolf wrote:
Am 25.05.2013 um 05:09 hat Wenchao Xia geschrieben:
From: Stefan Hajnoczi stefa...@redhat.com

The bs_snapshots global variable points to the BlockDriverState which will
be used to save vmstate. This is really a savevm.c concept but was moved
into block.c:bdrv_snapshots() when it became clear that hotplug could
result in a dangling pointer.

While auditing the block layer's global state I came upon bs_snapshots and
realized that a variable is not necessary here. Simply find the first
BlockDriverState capable of internal snapshots each time this is needed.

The behavior of bdrv_snapshots() is preserved across hotplug because new
drives are always appended to the bdrv_states list. This means that calling
the new find_vmstate_bs() function is idempotent - it returns the same
BlockDriverState unless it was hot-unplugged.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
Reviewed-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
---
 block.c               | 28 ++++++----------------------
 include/block/block.h |  1 -
 savevm.c              | 19 +++++++++----------
 3 files changed, 15 insertions(+), 33 deletions(-)

diff --git a/block.c b/block.c
index 3f87489..478a3b2 100644
--- a/block.c
+++ b/block.c
@@ -99,9 +99,6 @@ static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
 static QLIST_HEAD(, BlockDriver) bdrv_drivers =
     QLIST_HEAD_INITIALIZER(bdrv_drivers);
 
-/* The device to use for VM snapshots */
-static BlockDriverState *bs_snapshots;
-
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
 
@@ -1357,9 +1354,6 @@ void bdrv_close(BlockDriverState *bs)
     notifier_list_notify(&bs->close_notifiers, bs);
 
     if (bs->drv) {
-        if (bs == bs_snapshots) {
-            bs_snapshots = NULL;
-        }
         if (bs->backing_hd) {
             bdrv_delete(bs->backing_hd);
             bs->backing_hd = NULL;
@@ -1591,7 +1585,6 @@ void bdrv_delete(BlockDriverState *bs)
 
     bdrv_close(bs);
 
-    assert(bs != bs_snapshots);
     g_free(bs);
 }
 
@@ -1635,9 +1628,6 @@ void bdrv_set_dev_ops(BlockDriverState *bs, const BlockDevOps *ops,
 {
     bs->dev_ops = ops;
     bs->dev_opaque = opaque;
-    if (bdrv_dev_has_removable_media(bs) && bs == bs_snapshots) {
-        bs_snapshots = NULL;
-    }

This hunk isn't replaced by any other code. If I understand correctly what
it's doing, it prevented you from saving the VM state to a removable device,
which would be allowed after this patch. Is this a change we want to make?
Why isn't it described in the commit message?

My understanding of this change is different. Markus is on CC so maybe he
can confirm. The point of bs_snapshots = NULL is not to prevent you from
saving snapshots. It's simply to reset the pointer to the next snapshottable
device (used by bdrv_snapshots()). See the bdrv_close() hunk above, which
does the same thing, as well as bdrv_snapshots(), which iterates bdrv_states
and updates bs_snapshots.

So what this hunk does is reset the bdrv_snapshots() iterator when a
removable device is hooked up to an emulated storage controller. It's no
longer necessary since this patch drops the global state (bs_snapshots) and
users will always iterate from scratch. The whole stateful approach was not
necessary.

Stefan
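The stateless replacement for bs_snapshots is easy to picture. A Python stand-in for the C loop — `can_snapshot` here is a stand-in for the real capability check (bdrv_can_snapshot() in QEMU):

```python
def find_vmstate_bs(bdrv_states, can_snapshot):
    # Scan from scratch every time.  There is no cached pointer to
    # reset on close/hot-unplug, and because new drives are appended
    # to bdrv_states, repeated calls return the same device --
    # i.e. the function is idempotent, as the commit message says.
    for bs in bdrv_states:
        if can_snapshot(bs):
            return bs
    return None
```

The stateful version needed the `bs_snapshots = NULL` resets in three places precisely because the cached pointer could dangle; the scan makes all three hunks unnecessary.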
Re: [Qemu-devel] [PATCH] block: add read only to whitelist
On Tue, May 28, 2013 at 02:44:26PM +0800, Fam Zheng wrote: We may want to include a driver in the whitelist for read only tasks such as diagnosing or exporting guest data (with libguestfs as a good example). This patch introduces the magic prefix ^ to include a driver to the whitelist, but only enables qemu to open specific image format readonly, and returns -ENOTSUP for RW opening. Example: To include vmdk readonly, and others read+write: ./configure --block-drv-whitelist=qcow2,raw,file,qed,^vmdk This is great, thanks for tackling this. block/vmdk.c isn't suitable for running real VMs (read-write) since it's not optimized for concurrent I/O but it is usable for libguestfs (read-only). Signed-off-by: Fam Zheng f...@redhat.com --- block.c | 43 +++ blockdev.c| 4 ++-- configure | 2 ++ include/block/block.h | 3 ++- scripts/create_config | 9 - 5 files changed, 41 insertions(+), 20 deletions(-) diff --git a/block.c b/block.c index 3f87489..e6a7270 100644 --- a/block.c +++ b/block.c @@ -328,28 +328,40 @@ BlockDriver *bdrv_find_format(const char *format_name) return NULL; } -static int bdrv_is_whitelisted(BlockDriver *drv) +static int bdrv_is_whitelisted(BlockDriver *drv, bool read_only) { -static const char *whitelist[] = { -CONFIG_BDRV_WHITELIST +static const char *whitelist_rw[] = { +CONFIG_BDRV_WHITELIST_RW +}; +static const char *whitelist_ro[] = { +CONFIG_BDRV_WHITELIST_RO }; const char **p; Please also make the ./configure lists separate. The funky ^ syntax is not obvious. 
Better to have:

  ./configure --block-drv-whitelist-rw=qcow2,raw,file,qed \
              --block-drv-whitelist-ro=vmdk,vpc

-    if (!whitelist[0])
+    if (!whitelist_rw[0] && !whitelist_ro[0]) {
         return 1; /* no whitelist, anything goes */
+    }
 
-    for (p = whitelist; *p; p++) {
+    for (p = whitelist_rw; *p; p++) {
         if (!strcmp(drv->format_name, *p)) {
             return 1;
         }
     }
+    if (read_only) {
+        for (p = whitelist_ro; *p; p++) {
+            if (!strcmp(drv->format_name, *p)) {
+                return 1;
+            }
+        }
+    }
     return 0;
 }
 
-BlockDriver *bdrv_find_whitelisted_format(const char *format_name)
+BlockDriver *bdrv_find_whitelisted_format(const char *format_name,
+                                          bool read_only)
 {
     BlockDriver *drv = bdrv_find_format(format_name);
-    return drv && bdrv_is_whitelisted(drv) ? drv : NULL;
+    return drv && bdrv_is_whitelisted(drv, read_only) ? drv : NULL;
 }
 
 typedef struct CreateCo {
@@ -684,10 +696,6 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file,
     trace_bdrv_open_common(bs, filename ?: "", flags, drv->format_name);
 
-    if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv)) {
-        return -ENOTSUP;
-    }
-
     /* bdrv_open() with directly using a protocol as drv. This layer is already
      * opened, so assign it to bs (while file becomes a closed BlockDriverState)
      * and return immediately. */
     if (file != NULL && drv->bdrv_file_open) {
         bdrv_swap(file, bs);
         return 0;
     }

I think your change is okay. You moved the check after this early return, but file is already opened, so we passed the whitelist check already. This is a little tricky, but it seems fine.

Stefan
Re: [Qemu-devel] kvm process disappears
On Mon, May 27, 2013 at 11:09:51PM +0200, Stefan Priebe wrote:

On 10.05.2013 13:09, Stefan Hajnoczi wrote:

On Fri, May 10, 2013 at 11:07 AM, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote:

On 10.05.2013 09:42, Stefan Hajnoczi wrote:

On Fri, May 10, 2013 at 08:12:39AM +0200, Stefan Priebe - Profihost AG wrote:

3. Either use gdb or an LD_PRELOAD library that catches exit(3) and _exit(2) and dumps core using abort(3). Make sure core dumps are enabled.

LD_PRELOAD sounds good - can you point me to such a lib?

  $ cat /tmp/catchexit.c
  #include <unistd.h>
  #include <stdlib.h>

  void exit(int status)
  {
      const char msg[] = "*** CAUGHT EXIT, DUMPING CORE ***\n";
      write(2, msg, sizeof msg);
      abort();
  }

  void _exit(int status) __attribute__((alias("exit")));

  $ gcc -o catchexit.so -shared -fPIC -std=gnu99 catchexit.c
  $ LD_PRELOAD=/tmp/catchexit.so x86_64-softmmu/qemu-system-x86_64 -m 1024 -enable-kvm -cpu host -vga asdf
  Unknown vga type: asdf
  *** CAUGHT EXIT, DUMPING CORE ***
  Aborted (core dumped)

Make sure to give the absolute path to catchexit.so. Also keep in mind that this does not catch a normal return from main() or possibly other ways of terminating the process. You can hook more library functions, if necessary.

Stefan

I'm really sorry for bothering you. It turned out to be a host kernel bug. With NUMA balancing turned off (kernel 3.8.13) I see no VM crashes at all...

No worries. Glad you found the solution.

Stefan
Re: [Qemu-devel] [Bug 1184089] Re: [Feature request] loadvm snapshot as read-only
On Mon, May 27, 2013 at 10:42:17PM -0000, Michael Coppola wrote:

Awesome, looking forward to it. I may be misunderstanding what's happening under the hood, but at least for me, calling 'loadvm' on a single snapshot over and over seems to work the first few times and then immediately blue screens the WinXP guest with PFN_LIST_CORRUPT. I was under the assumption that all runtime modifications were being written back to the image, effectively corrupting something (whether it was changes to the snapshot or the backing image causing things to break).

savevm/loadvm does not use backing images. It relies on internal snapshots, which are stored inside the existing qcow2 image file.

If you *are* using backing images, then you're right - modifying the backing image is likely to trigger weird guest behavior.

Until then, I seem to have found a workaround for the feature itself. Instead of creating a snapshot with 'savevm', I can start the VM with -snapshot and then call:

  migrate "exec: gzip -c > snapshot.gz"

in QMP, and it saves the live image to a compressed file. Make sure migration has completed (check with 'info migrate') before exiting. Subsequently loading the snapshot with:

  qemu-* [whatever flags] -incoming "exec: gzip -c -d snapshot.gz" -snapshot

will load the live snapshot and redirect all runtime modifications to a temp file. http://www.linux-kvm.org/page/Migration says not to use -snapshot, but who follows the rules anyway? ;) It seems to work so far and things haven't exploded yet. Running md5sum on the qcow2 image and the gzip snapshot before and after shows no changes to either file.

The reason -snapshot isn't used together with migration is that the disk state will be discarded and not migrated.

Stefan
Re: [Qemu-devel] [Bug 1184616] [NEW] undefined reference to`trace_qemu_anon_ram_alloc'
On Mon, May 27, 2013 at 09:04:26PM -0000, Nigel Horne wrote:

On Mon, May 27, 2013 at 4:02 PM, Nigel Horne 1184...@bugs.launchpad.net wrote:

The latest git version (commit 6a4e17711442849bf2cc731ccddef5a2a2d92d29) fails to compile:

  ...
  LINK  qemu-ga
  libqemuutil.a(oslib-posix.o): In function `qemu_anon_ram_alloc':
  oslib-posix.c:(.text+0x154): undefined reference to `trace_qemu_anon_ram_alloc'
  libqemuutil.a(oslib-posix.o): In function `qemu_anon_ram_free':
  oslib-posix.c:(.text+0x1e7): undefined reference to `trace_qemu_anon_ram_free'
  collect2: error: ld returned 1 exit status
  make: *** [qemu-ga] Error 1

This is on Ubuntu 13.04, gcc 4.7.3, configure flags: './configure' '--enable-linux-aio' '--enable-kvm' '--enable-vhost-net'

Please try:

  make distclean
  ./configure --enable-linux-aio --enable-kvm --enable-vhost-net
  make

I tried that before I raised the bug. However, I tried it again to be sure, and yes, I still get the same error.

Please post the output of 'git status'. I wonder if there are stale files, because the build works fine here.

Stefan
[Qemu-devel] [PATCH] qemu-iotests: fix 054 cluster size help output
Commit f3f4d2c09b9cf46903ba38425ec46c44185162bd added a hint to increase the cluster size when a large image cannot be created. Test 054 now has outdated output and fails because the golden output does not match.

This patch updates the 054 golden output.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 tests/qemu-iotests/054.out | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/054.out b/tests/qemu-iotests/054.out
index 0b2fe30..2f357c2 100644
--- a/tests/qemu-iotests/054.out
+++ b/tests/qemu-iotests/054.out
@@ -1,7 +1,7 @@
 QA output created by 054
 creating too large image (1 EB)
-qemu-img: The image size is too large for file format 'qcow2'
+qemu-img: The image size is too large for file format 'qcow2' (try using a larger cluster size)
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1152921504606846976
 creating too large image (1 EB) using qcow2.py
-- 
1.8.1.4
Re: [Qemu-devel] snabbswitch integration with QEMU for userspace ethernet I/O
On Tue, May 28, 2013 at 12:10 PM, Luke Gorrie luk...@gmail.com wrote:

On 27 May 2013 11:34, Stefan Hajnoczi stefa...@redhat.com wrote:

vhost_net is about connecting a virtio-net-speaking process to a tun-like device. The problem you are trying to solve is connecting a virtio-net-speaking process to Snabb Switch.

Yep!

Either you need to replace vhost or you need a tun-like device interface. Replacing vhost would mean that your switch implements virtio-net, shares guest RAM with the guest, and shares the ioeventfd and irqfd that are used to signal the guest.

This would be a great solution from my perspective. This is the design that I am now struggling to find a good implementation strategy for.

At that point your switch is similar to the virtio-net data plane work that Ping Fan Liu is working on, but your switch is in a separate process rather than a thread.

Thanks for the reference! I was not aware of this work, and it sounds highly relevant.

How does your switch talk to hardware? If you have userspace NIC drivers that bypass the Linux network stack, then the approach I mentioned fits well.

The switch talks to hardware using a built-in userspace (kernel-bypass) device driver.

BTW there is an effort to get low-latency networking integrated into Linux: http://thread.gmane.org/gmane.linux.kernel/1493276

Stefan