Re: [Qemu-devel] [PATCH v3 for-2.2 0/8] don't use Yoda conditions
Gonglei (Arei) arei.gong...@huawei.com writes: Hi, $WHATEVER: don't use 'Yoda conditions' 'Yoda conditions' are not part of idiomatic QEMU coding style, so rewrite them in the more usual order. OK but why stop at these files? How about this instead? I just search c files by using key words like NULL == etc. I don't think we should change conditional statements like and =. Eric pointed out it's actually incorrect for NaNs. If you want to touch inequalities, separate patch(es) please, because they need more thorough review, both for correctness and for style. BTW, just using like value == NULL instead of NULL == value in all files is not a good idea, which we have discussed in my patch serials v2. So, I posted v3, add change log imitate nearby code about using '!value' or value == NULL' at every patch . Re not a good idea: I think rewriting NULL == value to value == NULL *is* a good idea, but rewriting it to !value where that blends in with surrounding code is a *better* idea. Gonglei's patches do that, Michael's don't, but are more complete. Therefore: So, maybe you can post patches for those files I have missed in the serials, but not simply instead all by semantic script IMO, thanks! Easy: apply Gonglei's patches before you run the script. You may have to split patches along subsystem boundaries to get them in. Bothersome, as it involves guessing boundaries. Not a request from me, just a warning of possible misfortune :)
[Qemu-devel] [PATCH v2 2/4] monitor: fix access freed memory
The function monitor_fdset_dup_fd_find_remove() references member of 'mon_fdset' which may be freed in function monitor_fdset_cleanup() Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com --- monitor.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/monitor.c b/monitor.c index 5bc70a6..41e46a6 100644 --- a/monitor.c +++ b/monitor.c @@ -2532,8 +2532,10 @@ static int monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove) { MonFdset *mon_fdset; MonFdsetFd *mon_fdset_fd_dup; +int64_t id = -1; QLIST_FOREACH(mon_fdset, mon_fdsets, next) { +id = mon_fdset-id; QLIST_FOREACH(mon_fdset_fd_dup, mon_fdset-dup_fds, next) { if (mon_fdset_fd_dup-fd == dup_fd) { if (remove) { @@ -2542,7 +2544,7 @@ static int monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove) monitor_fdset_cleanup(mon_fdset); } } -return mon_fdset-id; +return id; } } } -- 1.7.12.4
[Qemu-devel] [PATCH v2 1/4] l2cap: fix access freed memory
Pointer 'ch' will be used in function 'l2cap_channel_open_req_msg' after it was previously freed in 'l2cap_channel_open'. Assigned it to NULL after it is freed. Reviewed-by: Alex Bennée alex.ben...@linaro.org Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com --- hw/bt/l2cap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/bt/l2cap.c b/hw/bt/l2cap.c index 2301d6f..591e047 100644 --- a/hw/bt/l2cap.c +++ b/hw/bt/l2cap.c @@ -429,7 +429,7 @@ static struct l2cap_chan_s *l2cap_channel_open(struct l2cap_instance_s *l2cap, status = L2CAP_CS_NO_INFO; } else { g_free(ch); - +ch = NULL; result = L2CAP_CR_NO_MEM; status = L2CAP_CS_NO_INFO; } -- 1.7.12.4
[Qemu-devel] [PATCH v2 4/4] ivshmem: check the value returned by fstat()
The function fstat() may fail, so check its return value. Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com --- hw/misc/ivshmem.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c index 768e528..5d939d2 100644 --- a/hw/misc/ivshmem.c +++ b/hw/misc/ivshmem.c @@ -324,7 +324,11 @@ static int check_shm_size(IVShmemState *s, int fd) { struct stat buf; -fstat(fd, buf); +if (fstat(fd, buf) 0) { +fprintf(stderr, ivshmem: exiting for fstat fd '%d' failed: %s\n, +fd, strerror(errno)); +return -1; +} if (s-ivshmem_size buf.st_size) { fprintf(stderr, -- 1.7.12.4
Re: [Qemu-devel] qemu live migration error from 2.0 to 2.1
William Dauchy wdau...@gmail.com writes: On Tue, Aug 5, 2014 at 8:57 PM, Dr. David Alan Gilbert dgilb...@redhat.com wrote: Can you confirm this is on the final 2.1 release (there was a fix that went in just around rc5). for the receiver, I'm using 2.1 # qemu-system-x86_64 --version QEMU emulator version 2.1.0, Copyright (c) 2003-2008 Fabrice Bellar for the sender side, here is the qmp result: (QEMU) query-version { u'return': { u'package': u'', u'qemu': { u'major': 2, u'micro': 50, u'minor': 0}}} What's your command line on both ends? the receiver: qemu-system-x86_64 -m 2048 -cpu host,level=9 -nodefconfig -nodefaults -nographic -readconfig /var/lib/qemu/VM_A/config -pidfile /var/lib/qemu/VM_A/pid -serial pty -D /var/lib/qemu/VM_A/log -d unimp,guest_errors -incoming tcp:0:32768 /var/lib/qemu/VM_A/config: [name] guest = VM_A process = VM_A [chardev compat_monitor0] backend = socket path = /var/lib/qemu/VM_A/sock server = on wait = off [device] driver = virtio-balloon [rtc] base = utc [mon compat_monitor0] mode = control chardev = compat_monitor0 default = on [machine] kernel = /boot/bzImage append = root=/dev/sda console=ttyS0 ro accel = kvm [device] driver = virtio-rng-pci [device scsi0] driver = virtio-scsi-pci hotplug = on [...] vif config [...] disk config [smp-opts] cpus = 2 sockets = 1 cores = 1 threads = 1 maxcpus = 12 the sender has the exact same command (without tcp receiver) and also the same config; I'm just issuing a migrate command through qmp. Looks like you're not specifying a machine type. If that's the case, your source machine uses the old QEMU's default machine type, probably pc-i440fx-2.0, and your destination machine uses the new default, probably pc-i440fx-2.1. Migration requires identical machine types on source and destination. If they differ, migration may work anyway (this is the very lucky case), fail outright (lucky case), or it may succeed, then make the guest misbehave or blow up on the destination, possibly after some delay. Please try with 'type = pc-i440fx-2.0' in your '[machine]' section.
[Qemu-devel] [PATCH v2 3/4] virtio-blk: fix reference a pointer which might be freed
In function virtio_blk_handle_request, it may freed memory pointed by req, So do not access member of req after calling this function. Reviewed-by: Stefan Hajnoczi stefa...@redhat.com Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com --- hw/block/virtio-blk.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index c241c50..54a853a 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -458,7 +458,7 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq) static void virtio_blk_dma_restart_bh(void *opaque) { VirtIOBlock *s = opaque; -VirtIOBlockReq *req = s-rq; +VirtIOBlockReq *req = s-rq, *next = NULL; MultiReqBuffer mrb = { .num_writes = 0, }; @@ -469,8 +469,9 @@ static void virtio_blk_dma_restart_bh(void *opaque) s-rq = NULL; while (req) { +next = req-next; virtio_blk_handle_request(req, mrb); -req = req-next; +req = next; } virtio_submit_multiwrite(s-bs, mrb); -- 1.7.12.4
Re: [Qemu-devel] [v3][PATCH 2/4] hw:i386:pc_piix: split pc_init1()
On 2014/8/4 21:48, Michael S. Tsirkin wrote: On Thu, Jul 31, 2014 at 08:09:32PM +0800, Tiejun Chen wrote: We'd like to split pc_init1 and then we can share something with other stuff. Signed-off-by: Tiejun Chen tiejun.c...@intel.com With patch 1 in place, this should not be necessary - just propage the correct types around. Will rebase this in next version. Thanks Tiejun
[Qemu-devel] [PATCH v2 0/4] fix several bugs about use-after-free and an api abuse
v1 - v2: -ivshmem: modified the log message according to reviewing suggestion of Michael zhanghailiang (4): l2cap: fix access freed memory monitor: fix access freed memory virtio-blk: fix reference a pointer which might be freed ivshmem: check the value returned by fstat() hw/block/virtio-blk.c | 5 +++-- hw/bt/l2cap.c | 2 +- hw/misc/ivshmem.c | 6 +- monitor.c | 4 +++- 4 files changed, 12 insertions(+), 5 deletions(-) -- 1.7.12.4
Re: [Qemu-devel] [questions] about qemu log
Zhang Haoyu zhan...@sangfor.com writes: The output is on qemu's stderr. You are in control of what that stderr is. I don't get why we can configure -D /path/to/unique/file/name.log but we also have to redirect stderr (I didn't checked if the daemonize option was closing it). What's the purpose of this logfile option? Well -D will log to file only loggable (i.e. qemu_log()) information (which has all sorts of options and switches). Stderr, is a little more static and should in theory be limited to genuine errors. But if you want a combined log of both you can simply omit -D to default qemu_log output to stderr. This gives you a combined log that you can redirect anywhere. To be honest, this is what I do as a matter of course (2 foo rather than -D foo). Maybe we can introduce a new qemu option to specify a error logfile where stderr be redirected, like below, DEF(elogfile, HAS_ARG, QEMU_OPTION_elogfile, \ -elogfile logfile redirect stderr log to logfile(default /var/log/qemu/vm name##.log)\n, QEMU_ARCH_ALL) STEXI @item -elogfile @var{logfile} @findex -elogfile redirect stderr in @var{logfile} ETEXI then we can set the error log file through qemu command, /var/log/qemu/vm name##.log as default. This sounds out-of-scope for QEMU to me and makes a standard flow non-standard. If prints are going to stderr where should be going elsewhere they probably should be fixed. Do you have specific examples of information going to stderr that you would rather go to a log (be it an error log or something else?). I use proxmox to manage vm, it dose not redirect qemu's stderr, and start vm with -daemonize option, so the error log disappeared. I want to redirect the error log of qemu to a specified logfile, if fault happened, I can use the error log to analyze the fault. And, why qemu output the error log to stderr instead of a error logfile which can be configure? Because the code is a mess in that regard. You don't fix that by redirecting stderr wholesale, because that just adds to the mess. You fix it at the root, one ill-advised fprintf() at a time, as Peter advises: [...] There's plently of tree wide work to clean up the cases where stderr is used where qemu_log should be. If you are finding that log information is going to stderr instead of the log, patches would be welcome. If you want to redirect stderr in the interim, do it in whatever runs QEMU.
Re: [Qemu-devel] [v3][PATCH 1/4] i440fx: make types configurable at run-time
On 2014/8/4 21:48, Michael S. Tsirkin wrote: On Thu, Jul 31, 2014 at 08:09:31PM +0800, Tiejun Chen wrote: Xen wants to supply a different pci and host devices, inheriting i440fx devices. Make types configurable. Signed-off-by: Michael S. Tsirkin m...@redhat.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com You should have a From: line too, otherwise git will set the Author field incorrectly. Will update as follows: From: Michael S. Tsirkin m...@redhat.com Signed-off-by: Michael S. Tsirkin m...@redhat.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com Thanks Tiejun
Re: [Qemu-devel] [v3][PATCH 3/4] xen:hw:pci-host:piix: create host bridge to passthrough
On 2014/8/4 21:50, Michael S. Tsirkin wrote: On Thu, Jul 31, 2014 at 08:09:33PM +0800, Tiejun Chen wrote: Implement a pci host bridge specific to passthrough. Actually this just inherits the standard one. This is based on http://patchwork.ozlabs.org/patch/363810/. Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- hw/pci-host/piix.c | 41 + include/hw/i386/pc.h | 2 ++ 2 files changed, 43 insertions(+) v3: * Rebase v2: * Just add prefix with XEN_IGD_PASSTHROUGH/xen_igd_passthrough diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c index 0cd82b8..26ba1e5 100644 --- a/hw/pci-host/piix.c +++ b/hw/pci-host/piix.c @@ -93,6 +93,9 @@ typedef struct PIIX3State { #define I440FX_PCI_DEVICE(obj) \ OBJECT_CHECK(PCII440FXState, (obj), TYPE_I440FX_PCI_DEVICE) +#define XEN_IGD_PASSTHROUGH_I440FX_PCI_DEVICE(obj) \ +OBJECT_CHECK(PCII440FXState, (obj), TYPE_XEN_IGD_PASSTHROUGH_I440FX_PCI_DEVICE) + struct PCII440FXState { /* private */ PCIDevice parent_obj; @@ -303,6 +306,16 @@ static int i440fx_initfn(PCIDevice *dev) return 0; } +static int xen_igd_passthrough_i440fx_initfn(PCIDevice *dev) +{ +PCII440FXState *d = XEN_IGD_PASSTHROUGH_I440FX_PCI_DEVICE(dev); + +dev-config[I440FX_SMRAM] = 0x02; + +cpu_smm_register(i440fx_set_smm, d); +return 0; +} + This is exactly i440fx_initfn. Don't copy and paste code like this, reuse existing functions. We can't reuse i440fx_initfn() simply after we introduce XEN_IGD_PASSTHROUGH_I440FX_PCI_DEVICE(). But with a further concern I think we also pass type into I440FX_PCI_DEVIC() to index different objects here. Please see next version. Thanks Tiejun
Re: [Qemu-devel] [v3][PATCH 0/5] xen: introduce new machine for IGD passthrough
On 2014/8/4 21:51, Michael S. Tsirkin wrote: On Thu, Jul 31, 2014 at 08:09:30PM +0800, Tiejun Chen wrote: v3: * Drop patch #4 * Add one patch #1 from Michael * Rebase You added my patch but don't use it, so most of my comment weren't addressed. I guess I should cover those comments and posted some reply as well. But maybe something is confused. Do you plan to send v4 to address them? Anyway, I will send v4 to narrow down that gap to your expectoration. Thanks Tiejun
[Qemu-devel] [Bug 1353149] [NEW] qemu 2.1.0 fails to start if number of cores is greater than 1.
Public bug reported: qemu (kvm) 2.1.0 (built from sources) fails to start if number of cores is greater than 1. relevant part of commandline arguments: /usr/bin/qemu-system-x86_64 -name test3 -S -machine pc- i440fx-2.1,accel=kvm,usb=off -cpu Westmere -m 4096 -realtime mlock=off -smp 1,maxcpus=4,sockets=1,cores=4,threads=1 the error reported is: qemu-system-x86_64: /home/asavah/pkgbuild/qemu-2.1.0/hw/i386/smbios.c:825: smbios_get_tables: Assertion `smbios_smp_sockets = 1' failed. 2014-08-05 21:45:35.825+: shutting down however setting 4 sockets with 1 core each allows me to start the machine just fine. the system is debian wheezy Linux hostname 3.16.0-hostname2 #2 SMP Mon Aug 4 17:02:16 EEST 2014 x86_64 GNU/Linux libvirt 1.2.7 (built from sources) ** Affects: qemu Importance: Undecided Status: New -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1353149 Title: qemu 2.1.0 fails to start if number of cores is greater than 1. Status in QEMU: New Bug description: qemu (kvm) 2.1.0 (built from sources) fails to start if number of cores is greater than 1. relevant part of commandline arguments: /usr/bin/qemu-system-x86_64 -name test3 -S -machine pc- i440fx-2.1,accel=kvm,usb=off -cpu Westmere -m 4096 -realtime mlock=off -smp 1,maxcpus=4,sockets=1,cores=4,threads=1 the error reported is: qemu-system-x86_64: /home/asavah/pkgbuild/qemu-2.1.0/hw/i386/smbios.c:825: smbios_get_tables: Assertion `smbios_smp_sockets = 1' failed. 2014-08-05 21:45:35.825+: shutting down however setting 4 sockets with 1 core each allows me to start the machine just fine. the system is debian wheezy Linux hostname 3.16.0-hostname2 #2 SMP Mon Aug 4 17:02:16 EEST 2014 x86_64 GNU/Linux libvirt 1.2.7 (built from sources) To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1353149/+subscriptions
Re: [Qemu-devel] about -enable-kvm options
Thanks Richard :) On Sat, Aug 2, 2014 at 4:47 AM, Richard W.M. Jones rjo...@redhat.com wrote: On Fri, Aug 01, 2014 at 11:15:29AM +0800, Gareth wrote: Hi all What does '-enable-kvm' option mean? I have heard two versions of answers: It's a shortcut for: $qemu -machine accel=kvm a) guest OS would have /dev/kvm device and which could help vm in guest OS (nested vm) That's nested KVM, which is enabled as a module option in the kernel module (on the host), eg: # modprobe kvm_intel nested=1 On AMD it's enabled by default. b) use /dev/kvm and intel-vt on host OS which could help vm run more fast than pure emulator. Nearly. It uses /dev/kvm on the host, which may or may not be implemented using Intel VT, or a variety of other techniques. You can also have fallbacks to software emulation (TCG): $qemu -machine accel=kvm:tcg Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html -- Gareth *Cloud Computing, OpenStack, Distributed Storage, Fitness, Basketball* *OpenStack contributor, kun_huang@freenode* *My promise: if you find any spelling or grammar mistakes in my email from Mar 1 2013, notify me * *and I'll donate $1 or ¥1 to an open organization you specify.*
[Qemu-devel] [v4][PATCH 1/5] i440fx: make types configurable at run-time
From: Michael S. Tsirkin m...@redhat.com Xen wants to supply a different pci and host devices, inheriting i440fx devices. Make types configurable. Signed-off-by: Michael S. Tsirkin m...@redhat.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- hw/i386/pc_piix.c| 4 +++- hw/pci-host/piix.c | 9 - include/hw/i386/pc.h | 6 +- 3 files changed, 12 insertions(+), 7 deletions(-) v4: * Just add From: Michael S. Tsirkin m...@redhat.com v3: * New patch from Michael diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 4f22be8..bf26550 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -201,7 +201,9 @@ static void pc_init1(MachineState *machine, } if (pci_enabled) { -pci_bus = i440fx_init(i440fx_state, piix3_devfn, isa_bus, gsi, +pci_bus = i440fx_init(TYPE_I440FX_PCI_HOST_BRIDGE, + TYPE_I440FX_PCI_DEVICE, + i440fx_state, piix3_devfn, isa_bus, gsi, system_memory, system_io, machine-ram_size, below_4g_mem_size, above_4g_mem_size, diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c index e0e0946..0cd82b8 100644 --- a/hw/pci-host/piix.c +++ b/hw/pci-host/piix.c @@ -40,7 +40,6 @@ * http://download.intel.com/design/chipsets/datashts/29054901.pdf */ -#define TYPE_I440FX_PCI_HOST_BRIDGE i440FX-pcihost #define I440FX_PCI_HOST_BRIDGE(obj) \ OBJECT_CHECK(I440FXState, (obj), TYPE_I440FX_PCI_HOST_BRIDGE) @@ -91,7 +90,6 @@ typedef struct PIIX3State { MemoryRegion rcr_mem; } PIIX3State; -#define TYPE_I440FX_PCI_DEVICE i440FX #define I440FX_PCI_DEVICE(obj) \ OBJECT_CHECK(PCII440FXState, (obj), TYPE_I440FX_PCI_DEVICE) @@ -305,7 +303,8 @@ static int i440fx_initfn(PCIDevice *dev) return 0; } -PCIBus *i440fx_init(PCII440FXState **pi440fx_state, +PCIBus *i440fx_init(const char *host_type, const char *pci_type, +PCII440FXState **pi440fx_state, int *piix3_devfn, ISABus **isa_bus, qemu_irq *pic, MemoryRegion *address_space_mem, @@ -325,7 +324,7 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state, unsigned i; I440FXState *i440fx; -dev = qdev_create(NULL, TYPE_I440FX_PCI_HOST_BRIDGE); +dev = qdev_create(NULL, host_type); s = PCI_HOST_BRIDGE(dev); b = pci_bus_new(dev, NULL, pci_address_space, address_space_io, 0, TYPE_PCI_BUS); @@ -333,7 +332,7 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state, object_property_add_child(qdev_get_machine(), i440fx, OBJECT(dev), NULL); qdev_init_nofail(dev); -d = pci_create_simple(b, 0, TYPE_I440FX_PCI_DEVICE); +d = pci_create_simple(b, 0, pci_type); *pi440fx_state = I440FX_PCI_DEVICE(d); f = *pi440fx_state; f-system_memory = address_space_mem; diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 863eefb..11fb72f 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -235,7 +235,11 @@ extern int no_hpet; struct PCII440FXState; typedef struct PCII440FXState PCII440FXState; -PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int *piix_devfn, +#define TYPE_I440FX_PCI_HOST_BRIDGE i440FX-pcihost +#define TYPE_I440FX_PCI_DEVICE i440FX + +PCIBus *i440fx_init(const char *host_type, const char *pci_type, +PCII440FXState **pi440fx_state, int *piix_devfn, ISABus **isa_bus, qemu_irq *pic, MemoryRegion *address_space_mem, MemoryRegion *address_space_io, -- 1.9.1
[Qemu-devel] [v4][PATCH 0/5] xen: introduce new machine for IGD passthrough
v4: * Rebase on latest tree * Drop patch #2 * Regenerate patches after Michael introduce patch #1 * We need to use this pci_type as a index to reuse I440FX_PCI_DEVICE() * Test: boot with a preinstalled winxp ./i386-softmmu/qemu-system-i386 -hda winxp-32.img -m 2560 -boot c -machine pc v3: * Drop patch #4 * Add one patch #1 from Michael * Rebase * In./i386-softmmu/qemu-system-i386 -hda test.img -m 2560 -boot c -machine pc v2: * Fix some coding style * New patch to separate i440fx_init * Just add prefix with XEN_IGD_PASSTHROUGH/xen_igd_passthrough * Based on patch #2 to regenerate * Unify prefix with XEN_IGD_PASSTHROUGH/xen_igd_passthrough like patch #3 * Test: boot with a preinstalled ubuntu 14.04 ./i386-softmmu/qemu-system-i386 -hda test.img -m 2560 -boot c -machine pc As we discussed we need to create a separate machine to support current IGD passthrough. Michael S. Tsirkin (1): i440fx: make types configurable at run-time Tiejun Chen (4): pc_init1: pass parameters just with types I440FX_PCI_DEVICE: add pci_type to index xen:hw:pci-host:piix: create host bridge to passthrough xen:hw:i386:pc_piix: introduce new machine for IGD passthrough hw/i386/pc_piix.c| 60 +++- hw/pci-host/piix.c | 58 -- include/hw/i386/pc.h | 8 +++- 3 files changed, 110 insertions(+), 16 deletions(-) Thanks Tiejun
[Qemu-devel] [v4][PATCH 4/5] xen:hw:pci-host:piix: create host bridge to passthrough
Implement a pci host bridge specific to passthrough. Actually this just inherits the standard one. This is based on http://patchwork.ozlabs.org/patch/363810/. Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- hw/pci-host/piix.c | 39 +++ include/hw/i386/pc.h | 2 ++ 2 files changed, 41 insertions(+) v4: * Rebase v3: * Rebase v2: * Just add prefix with XEN_IGD_PASSTHROUGH/xen_igd_passthrough diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c index 4330599..5cac398 100644 --- a/hw/pci-host/piix.c +++ b/hw/pci-host/piix.c @@ -303,6 +303,17 @@ static int i440fx_initfn(PCIDevice *dev) return 0; } +static int xen_igd_passthrough_i440fx_initfn(PCIDevice *dev) +{ +PCII440FXState *d = I440FX_PCI_DEVICE(dev, + TYPE_XEN_IGD_PASSTHROUGH_I440FX_PCI_DEVICE); + +dev-config[I440FX_SMRAM] = 0x02; + +cpu_smm_register(i440fx_set_smm, d); +return 0; +} + PCIBus *i440fx_init(const char *host_type, const char *pci_type, PCII440FXState **pi440fx_state, int *piix3_devfn, @@ -703,6 +714,33 @@ static const TypeInfo i440fx_info = { .class_init= i440fx_class_init, }; +static void xen_igd_passthrough_i440fx_class_init(ObjectClass *klass, void *data) +{ +DeviceClass *dc = DEVICE_CLASS(klass); +PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); + +k-init = xen_igd_passthrough_i440fx_initfn; +k-vendor_id = PCI_VENDOR_ID_INTEL; +k-device_id = PCI_DEVICE_ID_INTEL_82441; +k-revision = 0x02; +k-class_id = PCI_CLASS_BRIDGE_HOST; +dc-desc = IGD PT XEN Host bridge; +dc-vmsd = vmstate_i440fx; +/* + * PCI-facing part of the host bridge, not usable without the + * host-facing part, which can't be device_add'ed, yet. + */ +dc-cannot_instantiate_with_device_add_yet = true; +dc-hotpluggable = false; +} + +static const TypeInfo xen_igd_passthrough_i440fx_info = { +.name = TYPE_XEN_IGD_PASSTHROUGH_I440FX_PCI_DEVICE, +.parent= TYPE_PCI_DEVICE, +.instance_size = sizeof(PCII440FXState), +.class_init= xen_igd_passthrough_i440fx_class_init, +}; + static const char *i440fx_pcihost_root_bus_path(PCIHostState *host_bridge, PCIBus *rootbus) { @@ -744,6 +782,7 @@ static const TypeInfo i440fx_pcihost_info = { static void i440fx_register_types(void) { type_register_static(i440fx_info); +type_register_static(xen_igd_passthrough_i440fx_info); type_register_static(piix3_info); type_register_static(piix3_xen_info); type_register_static(i440fx_pcihost_info); diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 11fb72f..de34aa6 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -238,6 +238,8 @@ typedef struct PCII440FXState PCII440FXState; #define TYPE_I440FX_PCI_HOST_BRIDGE i440FX-pcihost #define TYPE_I440FX_PCI_DEVICE i440FX +#define TYPE_XEN_IGD_PASSTHROUGH_I440FX_PCI_DEVICE xen-igd-passthrough-i440FX + PCIBus *i440fx_init(const char *host_type, const char *pci_type, PCII440FXState **pi440fx_state, int *piix_devfn, ISABus **isa_bus, qemu_irq *pic, -- 1.9.1
Re: [Qemu-devel] [PATCH v3 for-2.2 0/8] don't use Yoda conditions
On Tue, Aug 05, 2014 at 07:53:51PM -0600, Eric Blake wrote: On 08/05/2014 08:02 AM, Michael S. Tsirkin wrote: On Fri, Aug 01, 2014 at 03:46:08PM +0800, arei.gong...@huawei.com wrote: From: Gonglei arei.gong...@huawei.com $WHATEVER: don't use 'Yoda conditions' 'Yoda conditions' are not part of idiomatic QEMU coding style, so rewrite them in the more usual order. OK but why stop at these files? How about this instead? @ disable commeq @ expression E; constant C; @@ - C == E + E == C @ disable commeq @ expression E; constant C; @@ - C == E + E == C Why is this listed twice? @ disable gtr_lss @ expression E; constant C; @@ - C E + E C This is wrong for floating point (think NaN); you'd have to audit the results to make sure only integers are commuted. I did, take a look at the results :) @ disable gtr_lss_eq @ expression E; constant C; @@ - C = E + E = C Ditto. Signed-off-by: Michael S. Tsirkin m...@redhat.com The idea seems okay to me, but I haven't closely reviewed the patch yet. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] [questions] about qemu log
The output is on qemu's stderr. You are in control of what that stderr is. I don't get why we can configure -D /path/to/unique/file/name.log but we also have to redirect stderr (I didn't checked if the daemonize option was closing it). What's the purpose of this logfile option? Well -D will log to file only loggable (i.e. qemu_log()) information (which has all sorts of options and switches). Stderr, is a little more static and should in theory be limited to genuine errors. But if you want a combined log of both you can simply omit -D to default qemu_log output to stderr. This gives you a combined log that you can redirect anywhere. To be honest, this is what I do as a matter of course (2 foo rather than -D foo). Maybe we can introduce a new qemu option to specify a error logfile where stderr be redirected, like below, DEF(elogfile, HAS_ARG, QEMU_OPTION_elogfile, \ -elogfile logfile redirect stderr log to logfile(default /var/log/qemu/vm name##.log)\n, QEMU_ARCH_ALL) STEXI @item -elogfile @var{logfile} @findex -elogfile redirect stderr in @var{logfile} ETEXI then we can set the error log file through qemu command, /var/log/qemu/vm name##.log as default. This sounds out-of-scope for QEMU to me and makes a standard flow non-standard. If prints are going to stderr where should be going elsewhere they probably should be fixed. Do you have specific examples of information going to stderr that you would rather go to a log (be it an error log or something else?). I use proxmox to manage vm, it dose not redirect qemu's stderr, and start vm with -daemonize option, so the error log disappeared. I want to redirect the error log of qemu to a specified logfile, if fault happened, I can use the error log to analyze the fault. And, why qemu output the error log to stderr instead of a error logfile which can be configure? Because the code is a mess in that regard. You don't fix that by redirecting stderr wholesale, because that just adds to the mess. You fix it at the root, one ill-advised fprintf() at a time, as Peter advises: Sorry, I'm afraid I misunderstand what you mean, should I replace all of fprintf(stderr, ...) with qemu_log() ? or only some cases where stderr is used where qemu_log should be, as Perter advises? If so, should I still need to redirect the stderr to specified logfile in qemu's parent shell/process ? Thanks, Zhang Haoyu [...] There's plently of tree wide work to clean up the cases where stderr is used where qemu_log should be. If you are finding that log information is going to stderr instead of the log, patches would be welcome. If you want to redirect stderr in the interim, do it in whatever runs QEMU.
Re: [Qemu-devel] [Bug 1353149] [NEW] qemu 2.1.0 fails to start if number of cores is greater than 1.
I can confirm this in Centos 7, also. Qemu 2.1.0 with latest libvirt. David Cruz 2014-08-06 0:12 GMT+02:00 asavah irher...@gmail.com: Public bug reported: qemu (kvm) 2.1.0 (built from sources) fails to start if number of cores is greater than 1. relevant part of commandline arguments: /usr/bin/qemu-system-x86_64 -name test3 -S -machine pc- i440fx-2.1,accel=kvm,usb=off -cpu Westmere -m 4096 -realtime mlock=off -smp 1,maxcpus=4,sockets=1,cores=4,threads=1 the error reported is: qemu-system-x86_64: /home/asavah/pkgbuild/qemu-2.1.0/hw/i386/smbios.c:825: smbios_get_tables: Assertion `smbios_smp_sockets = 1' failed. 2014-08-05 21:45:35.825+: shutting down however setting 4 sockets with 1 core each allows me to start the machine just fine. the system is debian wheezy Linux hostname 3.16.0-hostname2 #2 SMP Mon Aug 4 17:02:16 EEST 2014 x86_64 GNU/Linux libvirt 1.2.7 (built from sources) ** Affects: qemu Importance: Undecided Status: New -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1353149 Title: qemu 2.1.0 fails to start if number of cores is greater than 1. Status in QEMU: New Bug description: qemu (kvm) 2.1.0 (built from sources) fails to start if number of cores is greater than 1. relevant part of commandline arguments: /usr/bin/qemu-system-x86_64 -name test3 -S -machine pc- i440fx-2.1,accel=kvm,usb=off -cpu Westmere -m 4096 -realtime mlock=off -smp 1,maxcpus=4,sockets=1,cores=4,threads=1 the error reported is: qemu-system-x86_64: /home/asavah/pkgbuild/qemu-2.1.0/hw/i386/smbios.c:825: smbios_get_tables: Assertion `smbios_smp_sockets = 1' failed. 2014-08-05 21:45:35.825+: shutting down however setting 4 sockets with 1 core each allows me to start the machine just fine. the system is debian wheezy Linux hostname 3.16.0-hostname2 #2 SMP Mon Aug 4 17:02:16 EEST 2014 x86_64 GNU/Linux libvirt 1.2.7 (built from sources) To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1353149/+subscriptions
Re: [Qemu-devel] [PATCH 1/2] virtio-serial: create a linked list of all active devices
Amit Shah amit.s...@redhat.com writes: On (Mon) 04 Aug 2014 [13:33:56], Markus Armbruster wrote: Amit Shah amit.s...@redhat.com writes: To ensure two virtserialports don't get added to the system with the same 'name' parameter, we need to access all the ports on all the devices added, and compare the names. We currently don't have a list of all VirtIOSerial devices added to the system. This commit adds a simple linked list in which devices are put when they're initialized, and removed when they go away. snip +struct VirtIOSerialDevices { +QLIST_HEAD(, VirtIOSerial) devices; +} vserdevices; + Any particular reason for stuffing the list into a struct? Not strictly needed for this patch alone, but I think it's cleaner to keep it this way, and when something else comes up that needs a per-device variable, this list will be around. Adding the struct when it's needed will be easy, so why pay the notational overhead now? Also, this is also the way it's done in the kernel, so that uniformity helps as well. Two uglies make a pretty? Anyway, matter of taste. [...]
Re: [Qemu-devel] [PATCH 0/2] virtio-serial: Prevent addition of ports with same name
Amit Shah amit.s...@redhat.com writes: Hi, The 2nd patch prevents adding ports to the system with conflicts in the 'name' parameter. This is done by going through all the ports in the system, including ports on other virtio-serial devices. The first patch prepares for this by creating a linked list of all available virtio-serial devices, so they can be walked over to check their ports' names. Please review. No advice from Andreas on better ways to iterate over devices, yet (vacation time?). If there is one, we can always use it on top. Reviewed-by: Markus Armbruster arm...@redhat.com
[Qemu-devel] [Bug 1353149] Re: qemu 2.1.0 fails to start if number of cores is greater than 1.
-smp 1,maxcpus=4,sockets=1,cores=4,threads=1 should be -smp 4,maxcpus=4,sockets=1,cores=4,threads=1 although more human-friendly error is more appropriate there (better than a silent fallback to either 1- or 4- core topology) -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1353149 Title: qemu 2.1.0 fails to start if number of cores is greater than 1. Status in QEMU: New Bug description: qemu (kvm) 2.1.0 (built from sources) fails to start if number of cores is greater than 1. relevant part of commandline arguments: /usr/bin/qemu-system-x86_64 -name test3 -S -machine pc- i440fx-2.1,accel=kvm,usb=off -cpu Westmere -m 4096 -realtime mlock=off -smp 1,maxcpus=4,sockets=1,cores=4,threads=1 the error reported is: qemu-system-x86_64: /home/asavah/pkgbuild/qemu-2.1.0/hw/i386/smbios.c:825: smbios_get_tables: Assertion `smbios_smp_sockets = 1' failed. 2014-08-05 21:45:35.825+: shutting down however setting 4 sockets with 1 core each allows me to start the machine just fine. the system is debian wheezy Linux hostname 3.16.0-hostname2 #2 SMP Mon Aug 4 17:02:16 EEST 2014 x86_64 GNU/Linux libvirt 1.2.7 (built from sources) To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1353149/+subscriptions
Re: [Qemu-devel] [PATCH 1/2] virtio-serial: create a linked list of all active devices
On (Wed) 06 Aug 2014 [09:27:02], Markus Armbruster wrote: Amit Shah amit.s...@redhat.com writes: On (Mon) 04 Aug 2014 [13:33:56], Markus Armbruster wrote: Amit Shah amit.s...@redhat.com writes: To ensure two virtserialports don't get added to the system with the same 'name' parameter, we need to access all the ports on all the devices added, and compare the names. We currently don't have a list of all VirtIOSerial devices added to the system. This commit adds a simple linked list in which devices are put when they're initialized, and removed when they go away. snip +struct VirtIOSerialDevices { +QLIST_HEAD(, VirtIOSerial) devices; +} vserdevices; + Any particular reason for stuffing the list into a struct? Not strictly needed for this patch alone, but I think it's cleaner to keep it this way, and when something else comes up that needs a per-device variable, this list will be around. Adding the struct when it's needed will be easy, so why pay the notational overhead now? True. Also, this is also the way it's done in the kernel, so that uniformity helps as well. Two uglies make a pretty? Hah! Amit
Re: [Qemu-devel] [PATCH 0/2] virtio-serial: Prevent addition of ports with same name
On (Wed) 06 Aug 2014 [09:27:04], Markus Armbruster wrote: Amit Shah amit.s...@redhat.com writes: Hi, The 2nd patch prevents adding ports to the system with conflicts in the 'name' parameter. This is done by going through all the ports in the system, including ports on other virtio-serial devices. The first patch prepares for this by creating a linked list of all available virtio-serial devices, so they can be walked over to check their ports' names. Please review. No advice from Andreas on better ways to iterate over devices, yet (vacation time?). If there is one, we can always use it on top. Reviewed-by: Markus Armbruster arm...@redhat.com Thanks! I'll wait for a week before submitting a pull req. Amit
Re: [Qemu-devel] [questions] about qemu log
Zhang Haoyu zhan...@sangfor.com writes: The output is on qemu's stderr. You are in control of what that stderr is. I don't get why we can configure -D /path/to/unique/file/name.log but we also have to redirect stderr (I didn't checked if the daemonize option was closing it). What's the purpose of this logfile option? Well -D will log to file only loggable (i.e. qemu_log()) information (which has all sorts of options and switches). Stderr, is a little more static and should in theory be limited to genuine errors. But if you want a combined log of both you can simply omit -D to default qemu_log output to stderr. This gives you a combined log that you can redirect anywhere. To be honest, this is what I do as a matter of course (2 foo rather than -D foo). Maybe we can introduce a new qemu option to specify a error logfile where stderr be redirected, like below, DEF(elogfile, HAS_ARG, QEMU_OPTION_elogfile, \ -elogfile logfile redirect stderr log to logfile(default /var/log/qemu/vm name##.log)\n, QEMU_ARCH_ALL) STEXI @item -elogfile @var{logfile} @findex -elogfile redirect stderr in @var{logfile} ETEXI then we can set the error log file through qemu command, /var/log/qemu/vm name##.log as default. This sounds out-of-scope for QEMU to me and makes a standard flow non-standard. If prints are going to stderr where should be going elsewhere they probably should be fixed. Do you have specific examples of information going to stderr that you would rather go to a log (be it an error log or something else?). I use proxmox to manage vm, it dose not redirect qemu's stderr, and start vm with -daemonize option, so the error log disappeared. I want to redirect the error log of qemu to a specified logfile, if fault happened, I can use the error log to analyze the fault. And, why qemu output the error log to stderr instead of a error logfile which can be configure? Because the code is a mess in that regard. You don't fix that by redirecting stderr wholesale, because that just adds to the mess. You fix it at the root, one ill-advised fprintf() at a time, as Peter advises: Sorry, I'm afraid I misunderstand what you mean, should I replace all of fprintf(stderr, ...) with qemu_log() ? or only some cases where stderr is used where qemu_log should be, as Perter advises? I didn't mean to suggest blind conversion from fprintf() to qemu_log(). Each instance of fprintf() needs to be reviewed before conversion. I think that's exactly Peter's advice, too. If so, should I still need to redirect the stderr to specified logfile in qemu's parent shell/process ? Probably. Libvirt does it.
Re: [Qemu-devel] qemu live migration error from 2.0 to 2.1
Il 05/08/2014 20:57, Dr. David Alan Gilbert ha scritto: (CC Paolo) * William Dauchy (wdau...@gmail.com) wrote: Hello, For a qemu live migration from 2.0 to 2.1 I'm getting: qemu-system-x86_64: Length mismatch: /rom@etc/acpi/tables: 3000 in != 2 qemu-system-x86_64: Unknown ramblock /table-loader, cannot accept migration qemu-system-x86_64: Ack, bad migration stream! qemu-system-x86_64: Illegal RAM offset f0e2d500f06000 qemu: warning: error while loading state for instance 0x0 of device 'ram' qemu-system-x86_64: load of migration failed: Invalid argument I was wondering if it was a bug or if I was supposed to do something to avoid this error. That should work. I saw the bug report for qemu1.7 but it's not my case. Can you confirm this is on the final 2.1 release (there was a fix that went in just around rc5). What's your command line on both ends? I suspect he's using -M pc on both. You must use -M pc-i440fx-2.0 on both if you're migrating from 2.0 to a different version. Paolo
Re: [Qemu-devel] qemu live migration error from 2.0 to 2.1
Il 06/08/2014 08:16, Markus Armbruster ha scritto: for the receiver, I'm using 2.1 # qemu-system-x86_64 --version QEMU emulator version 2.1.0, Copyright (c) 2003-2008 Fabrice Bellar for the sender side, here is the qmp result: (QEMU) query-version { u'return': { u'package': u'', u'qemu': { u'major': 2, u'micro': 50, u'minor': 0}}} This is a development version between 2.0.0 and 2.0.0-rc1; it may have bugs such as the one that was fixed in 2.0.0-rc5. Migration from development versions is not supported. You must either upgrade to 2.1.0, or downgrade to the released 2.0.0 version. Paolo
Re: [Qemu-devel] [PATCH v3 for-2.2 0/8] don't use Yoda conditions
Michael S. Tsirkin m...@redhat.com writes: On Wed, Aug 06, 2014 at 08:05:46AM +0200, Markus Armbruster wrote: Gonglei (Arei) arei.gong...@huawei.com writes: Hi, $WHATEVER: don't use 'Yoda conditions' 'Yoda conditions' are not part of idiomatic QEMU coding style, so rewrite them in the more usual order. OK but why stop at these files? How about this instead? I just search c files by using key words like NULL == etc. I don't think we should change conditional statements like and =. Eric pointed out it's actually incorrect for NaNs. If you want to touch inequalities, separate patch(es) please, because they need more thorough review, both for correctness and for style. BTW, just using like value == NULL instead of NULL == value in all files is not a good idea, which we have discussed in my patch serials v2. So, I posted v3, add change log imitate nearby code about using '!value' or value == NULL' at every patch . Re not a good idea: I think rewriting NULL == value to value == NULL *is* a good idea, but rewriting it to !value where that blends in with surrounding code is a *better* idea. Gonglei's patches do that, Michael's don't, but are more complete. Therefore: Yes but it's unrelated to Yoda: we have x != NULL without Yoda in a lot of places. So this seems, to me, an unrelated issue. Actually, the relation is clear to me. The patch cleans up style, by converting Yoda conditionals into the locally appropriate form. Locally appropriate because the optimal choice between x == NULL and !x depends on which form the context uses. Gonglei made those choices. It's only a small improvement, and I wouldn't demand it, but since we already have it, let's not throw it away. If people feel this == NULL - !x is desired, it's better to do it all at once IMHO, and do x != NULL - x at the same time. Nope, because there's no consensus. See thread leading to https://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg00033.html Easy to run another script to do it on top. So, maybe you can post patches for those files I have missed in the serials, but not simply instead all by semantic script IMO, thanks! Easy: apply Gonglei's patches before you run the script. You may have to split patches along subsystem boundaries to get them in. Bothersome, as it involves guessing boundaries. Not a request from me, just a warning of possible misfortune :) It's going in through trivial tree, I don't think split-up is necessary. Hope you have better luck than me, because when I try to route tree-wide cleanups via -trivial, I usually don't have any.
[Qemu-devel] [PATCH] linux-user: redirect openat calls
From: Riku Voipio riku.voi...@linaro.org While Mikhail fixed /proc/self/maps, it was noticed openat calls are not redirected currently. Some archs don't have open at all, so openat needs to be redirected. Fix this by consolidating open/openat code to do_openat - open is implemented using openat(AT_FDCWD, ... ), which according to open(2) man page is identical. Since all targets now have openat, remove the ifdef around sys_openat and openat: case in do_syscall. Cc: Mikhail Ilin m.i...@samsung.com Signed-off-by: Riku Voipio riku.voi...@linaro.org --- linux-user/syscall.c | 23 +-- 1 file changed, 9 insertions(+), 14 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index c8c2b4c..dd77673 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -294,7 +294,6 @@ static int sys_getcwd1(char *buf, size_t size) return strlen(buf)+1; } -#ifdef TARGET_NR_openat static int sys_openat(int dirfd, const char *pathname, int flags, mode_t mode) { /* @@ -306,7 +305,6 @@ static int sys_openat(int dirfd, const char *pathname, int flags, mode_t mode) } return (openat(dirfd, pathname, flags)); } -#endif #ifdef TARGET_NR_utimensat #ifdef CONFIG_UTIMENSAT @@ -5274,7 +5272,7 @@ static int open_net_route(void *cpu_env, int fd) } #endif -static int do_open(void *cpu_env, const char *pathname, int flags, mode_t mode) +static int do_openat(void *cpu_env, int dirfd, const char *pathname, int flags, mode_t mode) { struct fake_open { const char *filename; @@ -5295,7 +5293,7 @@ static int do_open(void *cpu_env, const char *pathname, int flags, mode_t mode) if (is_proc_myself(pathname, exe)) { int execfd = qemu_getauxval(AT_EXECFD); -return execfd ? execfd : get_errno(open(exec_path, flags, mode)); +return execfd ? execfd : get_errno(sys_openat(dirfd, exec_path, flags, mode)); } for (fake_open = fakes; fake_open-filename; fake_open++) { @@ -5329,7 +5327,7 @@ static int do_open(void *cpu_env, const char *pathname, int flags, mode_t mode) return fd; } -return get_errno(open(path(pathname), flags, mode)); +return get_errno(sys_openat(dirfd, path(pathname), flags, mode)); } /* do_syscall() should always have a single exit point at the end so @@ -5404,22 +5402,19 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1, case TARGET_NR_open: if (!(p = lock_user_string(arg1))) goto efault; -ret = get_errno(do_open(cpu_env, p, -target_to_host_bitmask(arg2, fcntl_flags_tbl), -arg3)); +ret = get_errno(do_openat(cpu_env, AT_FDCWD, p, + target_to_host_bitmask(arg2, fcntl_flags_tbl), + arg3)); unlock_user(p, arg1, 0); break; -#if defined(TARGET_NR_openat) defined(__NR_openat) case TARGET_NR_openat: if (!(p = lock_user_string(arg2))) goto efault; -ret = get_errno(sys_openat(arg1, - path(p), - target_to_host_bitmask(arg3, fcntl_flags_tbl), - arg4)); +ret = get_errno(do_openat(cpu_env, arg1, p, + target_to_host_bitmask(arg3, fcntl_flags_tbl), + arg4)); unlock_user(p, arg2, 0); break; -#endif case TARGET_NR_close: ret = get_errno(close(arg1)); break; -- 2.0.1
Re: [Qemu-devel] [PATCH] linux-user: /proc/self/maps content
On Tue, Aug 05, 2014 at 05:24:27PM +0400, Mikhail Ilin wrote: I've tested the sample for Aarch64 myself and found that the approach should also work fine. Translation layout: $ qemu-aarch64 -strace /tmp/busybox-static cat /proc/self/maps startend size prot 0040-005ba000 001ba000 r-x 005c9000-005d3000 a000 rw- 0040-00401000 1000 --- 00401000-004000801000 0080 rw- /proc/self/maps output: 0040-005ba000 r-xp 08:01 28837016 /tmp/busybox-static 005ba000-005c9000 ---p 00:00 0 005c9000-005cc000 rw-p 001b9000 08:01 28837016 /tmp/busybox-static 005cc000-005f4000 rw-p 00:00 0 6000-602eb000 r-xp 08:01 55578769 /home/michail/my1/bin/qemu-aarch64 604eb000-604f6000 rw-p 002eb000 08:01 55578769 /home/michail/my1/bin/qemu-aarch64 604f6000-6054a000 rw-p 00:00 0 6054a000-6254b000 rwxp 00:00 0 6254b000-62577000 rw-p 00:00 0 63396000-633da000 rw-p 00:00 0 [heap] 40-401000 ---p 00:00 0 401000-4000801000 rw-p 00:00 0 7ff830cab000-7ff8348fb000 rw-p 00:00 0 7fffb26ed000-7fffb270e000 rw-p 00:00 0 [stack] 7fffb27bb000-7fffb27bd000 r-xp 00:00 0 [vdso] ff60-ff601000 r-xp 00:00 0 [vsyscall] And the reason why it doesn't work for Aarch64 is openat call which is used instead of open. $ qemu-aarch64 -strace /tmp/busybox-static cat /proc/self/maps 483 setgid(1000,0,47,45,0,274886296116) = 0 483 setuid(1000,0,47,45,0,274886296116) = 0 483 openat(AT_FDCWD,/proc/self/maps,O_RDONLY) = 3 483 read(3,0x7febf0,4096) = 1071 this call doesn't have additional preprocessing and called directly. case TARGET_NR_openat: if (!(p = lock_user_string(arg2))) goto efault; ret = get_errno(sys_openat(arg1, path(p), target_to_host_bitmask(arg3, fcntl_flags_tbl), arg4)); I believe OpenRISC case looks the same. Thanks for looking into it. I just sent a patch that adds preprocessing to openat, and seems to clear the issue for both aarch64 and OpenRISC. Riku On 05.08.2014 15:47, Riku Voipio wrote: Hi, On Tue, Aug 05, 2014 at 03:10:07PM +0400, Mikhail Ilyin wrote: Build /proc/self/maps doing a match against guest memory translation table. Output only that map records which are valid for guest memory layout. This is clear improvement, for most archs. But seems aarch64, openrisc still leak host maps. It's not a regression, same issue before the patch. + /home/voipio/linaro/qemu/obj/alpha-linux-user/qemu-alpha /home/voipio/linaro/qemu-smoke/alpha/busybox cat /proc/self/maps 00012000-0001201cc000 r-xp fe:00 8784862 /home/voipio/linaro/qemu-smoke/alpha/busybox 0001201dc000-0001201e rw-p 001cc000 fe:00 8784862 /home/voipio/linaro/qemu-smoke/alpha/busybox 0001201e-00012020a000 rw-p 00:00 0 0040-00402000 ---p 00:00 0 00402000-004000802000 rw-p 00:00 0 [stack] + /home/voipio/linaro/qemu/obj/arm-linux-user/qemu-arm /home/voipio/linaro/qemu-smoke/armel/busybox cat /proc/self/maps 8000-0014b000 r-xp fe:00 8784905 /home/voipio/linaro/qemu-smoke/armel/busybox 00153000-00154000 rw-p 00143000 fe:00 8784905 /home/voipio/linaro/qemu-smoke/armel/busybox 00154000-0017b000 rw-p 00:00 0 f67ff000-f680 ---p 00:00 0 f680-f700 rw-p 00:00 0[stack] + /home/voipio/linaro/qemu/obj/aarch64-linux-user/qemu-aarch64 /home/voipio/linaro/qemu-smoke/arm64/busybox cat /proc/self/maps 0040-00572000 r-xp fe:00 8784917 /home/voipio/linaro/qemu-smoke/arm64/busybox 00572000-00581000 ---p 00:00 0 00581000-00584000 rw-p 00171000 fe:00 8784917 /home/voipio/linaro/qemu-smoke/arm64/busybox 00584000-005ac000 rw-p 00:00 0 40-401000 ---p 00:00 0 401000-4000811000 rw-p 00:00 0 7f38e312b000-7f38e6d2b000 rw-p 00:00 0 7f38e6d2b000-7f38e6d86000 r-xp fe:00 5242918 /lib/x86_64-linux-gnu/libpcre.so.3.13.1 7f38e6d86000-7f38e6f86000 ---p 0005b000 fe:00 5242918 /lib/x86_64-linux-gnu/libpcre.so.3.13.1 7f38e6f86000-7f38e6f87000 rw-p 0005b000 fe:00 5242918 /lib/x86_64-linux-gnu/libpcre.so.3.13.1 7f38e6f87000-7f38e7126000 r-xp fe:00 5248993 /lib/x86_64-linux-gnu/libc-2.19.so 7f38e7126000-7f38e7326000 ---p 0019f000 fe:00 5248993 /lib/x86_64-linux-gnu/libc-2.19.so
Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
On Wed, Aug 6, 2014 at 3:45 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 06/08/2014 07:33, Ming Lei ha scritto: I played a bit with the following, I hope it's not too naive. I couldn't see a difference with your patches, but at least one reason for this is probably that my laptop SSD isn't fast enough to make the CPU the bottleneck. Haven't tried ramdisk yet, that would probably be the next thing. (I actually wrote the patch up just for some profiling on my own, not for comparing throughput, but it should be usable for that as well.) This might not be good for the test since it is basically a sequential read test, which can be optimized a lot by kernel. And I always use randread benchmark. A microbenchmark already exists in tests/test-coroutine.c, and doesn't really tell us much; it's obvious that coroutines execute more code, the question is why it affects the iops performance. Could you take a look at the coroutine benchmark I worte? The running result shows coroutine does decrease performance a lot compared with bypass coroutine like the patchset is doing. The sequential read should be the right workload. For fio, you want to get as many iops as possible to QEMU and so you need randread. But qemu-img is not run in a guest and if the kernel optimizes sequential reads then the bypass should have even more benefits because it makes userspace proportionally more expensive. In any case, the patches as written have no hope of being accepted. If you invert the logic from aio-co to co-aio, that would be much better even if it's tedious. Let's not talk the bypass patch, and see if the coroutine is really the cause of performance drop first. Thanks,
[Qemu-devel] [Bug 1353346] [NEW] ARMv7-M software-triggered interrupts-- unexpected behaviour
Public bug reported: The handling of the NVIC Software Triggered Interrupt Register in qemu-2.1.0/hw/intc/armv7m_nvic.c:375 isn't quite right. As things stand, writing a zero to the STIR ends up transferring control to vector table entry zero, which, on ARMv7-M, holds the reset value of the stack pointer. That's what I see with lm3s811evb emulation, and that's not what happens on my STM NUCLEO-F103RB board (Cortex-M3). Seems like an oversight-- the handler probably wants armv7m_nvic_set_pending(), not gic_set_pending_private(), and the IRQ number needs 16 added onto it to get the exception number for the interrupt. ARM DUI 0552A (Cortex-M3 Devices: Generic User's Guide), p. 134: Interrupt ID of the interrupt to trigger, in the range 0-239. For example, a value of 0x03 specifies interrupt IRQ3. Cheers, Boris ** Affects: qemu Importance: Undecided Status: New -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1353346 Title: ARMv7-M software-triggered interrupts-- unexpected behaviour Status in QEMU: New Bug description: The handling of the NVIC Software Triggered Interrupt Register in qemu-2.1.0/hw/intc/armv7m_nvic.c:375 isn't quite right. As things stand, writing a zero to the STIR ends up transferring control to vector table entry zero, which, on ARMv7-M, holds the reset value of the stack pointer. That's what I see with lm3s811evb emulation, and that's not what happens on my STM NUCLEO-F103RB board (Cortex-M3). Seems like an oversight-- the handler probably wants armv7m_nvic_set_pending(), not gic_set_pending_private(), and the IRQ number needs 16 added onto it to get the exception number for the interrupt. ARM DUI 0552A (Cortex-M3 Devices: Generic User's Guide), p. 134: Interrupt ID of the interrupt to trigger, in the range 0-239. For example, a value of 0x03 specifies interrupt IRQ3. Cheers, Boris To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1353346/+subscriptions
Re: [Qemu-devel] [RFC PATCH v2 09/10] virtio-scsi-dataplane: Code to run virtio-scsi on iothread
Il 06/08/2014 07:35, Fam Zheng ha scritto: ifeq ($(CONFIG_VIRTIO),y) -obj-y += virtio-scsi.o +obj-y += virtio-scsi.o virtio-scsi-dataplane.o obj-$(CONFIG_VHOST_SCSI) += vhost-scsi.o endif I first thought that this must be conditional on CONFIG_VIRTIO_BLK_DATA_PLANE. However, CONFIG_VIRTIO_BLK_DATA_PLANE is itself obsolete: ## # adjust virtio-blk-data-plane based on linux-aio if test $virtio_blk_data_plane = yes -a \ $linux_aio != yes ; then error_exit virtio-blk-data-plane requires Linux AIO, please try --enable-linux-aio elif test -z $virtio_blk_data_plane ; then virtio_blk_data_plane=$linux_aio fi and there's no requirement to have Linux AIO anymore. Can you prepare a patch to drop CONFIG_VIRTIO_BLK_DATA_PLANE, and replace patch 1 with it? We can leave --disable-virtio-blk-data-plane and --enable-virtio-blk-data-plane in for a couple of releases. Paolo
Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
Am 06.08.2014 um 07:33 hat Ming Lei geschrieben: Hi Kevin, On Tue, Aug 5, 2014 at 10:47 PM, Kevin Wolf kw...@redhat.com wrote: Am 05.08.2014 um 15:48 hat Stefan Hajnoczi geschrieben: I have been wondering how to prove that the root cause is the ucontext coroutine mechanism (stack switching). Here is an idea: Hack your bypass code path to run the request inside a coroutine. That way you can compare bypass without coroutine against bypass with coroutine. Right now I think there are doubts because the bypass code path is indeed a different (and not 100% correct) code path. So this approach might prove that the coroutines are adding the overhead and not something that you bypassed. My doubts aren't only that the overhead might not come from the coroutines, but also whether any coroutine-related overhead is really unavoidable. If we can optimise coroutines, I'd strongly prefer to do just that instead of introducing additional code paths. OK, thank you for taking look at the problem, and hope we can figure out the root cause, :-) Another thought I had was this: If the performance difference is indeed only coroutines, then that is completely inside the block layer and we don't actually need a VM to test it. We could instead have something like a simple qemu-img based benchmark and should be observing the same. Even it is simpler to run a coroutine-only benchmark, and I just wrote a raw one, and looks coroutine does decrease performance a lot, please see the attachment patch, and thanks for your template to help me add the 'co_bench' command in qemu-img. Yes, we can look at coroutines microbenchmarks in isolation. I actually did do that yesterday with the yield test from tests/test-coroutine.c. And in fact profiling immediately showed something to optimise: pthread_getspecific() was quite high, replacing it by __thread on systems where it works is more efficient and helped the numbers a bit. Also, a lot of time seems to be spent in pthread_mutex_lock/unlock (even in qemu-img bench), maybe there's even something that can be done here. However, I just wasn't sure whether a change on this level would be relevant in a realistic environment. This is the reason why I wanted to get a benchmark involving the block layer and some I/O. From the profiling data in below link: http://pastebin.com/YwH2uwbq With coroutine, the running time for same loading is increased ~50%(1.325s vs. 0.903s), and dcache load events is increased ~35%(693M vs. 512M), insns per cycle is decreased by ~50%( 1.35 vs. 1.63), compared with bypassing coroutine(-b parameter). The bypass code in the benchmark is very similar with the approach used in the bypass patch, since linux-aio with O_DIRECT seldom blocks in the the kernel I/O path. Maybe the benchmark is a bit extremely, but given modern storage device may reach millions of IOPS, and it is very easy to slow down the I/O by coroutine. I think in order to optimise coroutines, such benchmarks are fair game. It's just not guaranteed that the effects are exactly the same on real workloads, so we should take the results with a grain of salt. Anyhow, the coroutine version of your benchmark is buggy, it leaks all coroutines instead of exiting them, so it can't make any use of the coroutine pool. On my laptop, I get this (where fixed coroutine is a version that simply removes the yield at the end): | bypass| fixed coro| buggy coro +---+---+-- time| 1.09s | 1.10s | 1.62s L1-dcache-loads | 921,836,360 | 932,781,747 | 1,298,067,438 insns per cycle | 2.39 | 2.39 | 1.90 Begs the question whether you see a similar effect on a real qemu and the coroutine pool is still not big enough? With correct use of coroutines, the difference seems to be barely measurable even without any I/O involved. I played a bit with the following, I hope it's not too naive. I couldn't see a difference with your patches, but at least one reason for this is probably that my laptop SSD isn't fast enough to make the CPU the bottleneck. Haven't tried ramdisk yet, that would probably be the next thing. (I actually wrote the patch up just for some profiling on my own, not for comparing throughput, but it should be usable for that as well.) This might not be good for the test since it is basically a sequential read test, which can be optimized a lot by kernel. And I always use randread benchmark. Yes, I shortly pondered whether I should implement random offsets instead. But then I realised that a quicker kernel operation would only help the benchmark because we want it to test the CPU consumption in userspace. So the faster the kernel gets, the better for us, because it should make the impact of coroutines bigger. Kevin
Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
Il 06/08/2014 10:38, Ming Lei ha scritto: On Wed, Aug 6, 2014 at 3:45 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 06/08/2014 07:33, Ming Lei ha scritto: I played a bit with the following, I hope it's not too naive. I couldn't see a difference with your patches, but at least one reason for this is probably that my laptop SSD isn't fast enough to make the CPU the bottleneck. Haven't tried ramdisk yet, that would probably be the next thing. (I actually wrote the patch up just for some profiling on my own, not for comparing throughput, but it should be usable for that as well.) This might not be good for the test since it is basically a sequential read test, which can be optimized a lot by kernel. And I always use randread benchmark. A microbenchmark already exists in tests/test-coroutine.c, and doesn't really tell us much; it's obvious that coroutines execute more code, the question is why it affects the iops performance. Could you take a look at the coroutine benchmark I worte? The running result shows coroutine does decrease performance a lot compared with bypass coroutine like the patchset is doing. Your benchmark is synchronous, while disk I/O is asynchronous. Your benchmark doesn't add much compared to time tests/test-coroutine -m perf -p /perf/yield. It takes 8 seconds on my machine, and 10^8 function calls obviously take less than 8 seconds. I've sent a patch to add a baseline function call benchmark to test-coroutine. The sequential read should be the right workload. For fio, you want to get as many iops as possible to QEMU and so you need randread. But qemu-img is not run in a guest and if the kernel optimizes sequential reads then the bypass should have even more benefits because it makes userspace proportionally more expensive. Do you agree with this? Paolo
Re: [Qemu-devel] qemu live migration error from 2.0 to 2.1
On Wed, Aug 6, 2014 at 9:47 AM, Paolo Bonzini pbonz...@redhat.com wrote: I suspect he's using -M pc on both. Is it the default value? because it's not the case in my command line. You must use -M pc-i440fx-2.0 on both if you're migrating from 2.0 to a different version. wow. I wasn't expecting such behavior; i.e migration between qemu version does not seem trivial and it removes the benefits; better using stop/start. thanks for the info. -- William
Re: [Qemu-devel] [RFC PATCH v2 09/10] virtio-scsi-dataplane: Code to run virtio-scsi on iothread
On Wed, 08/06 10:45, Paolo Bonzini wrote: Il 06/08/2014 07:35, Fam Zheng ha scritto: ifeq ($(CONFIG_VIRTIO),y) -obj-y += virtio-scsi.o +obj-y += virtio-scsi.o virtio-scsi-dataplane.o obj-$(CONFIG_VHOST_SCSI) += vhost-scsi.o endif I first thought that this must be conditional on CONFIG_VIRTIO_BLK_DATA_PLANE. However, CONFIG_VIRTIO_BLK_DATA_PLANE is itself obsolete: ## # adjust virtio-blk-data-plane based on linux-aio if test $virtio_blk_data_plane = yes -a \ $linux_aio != yes ; then error_exit virtio-blk-data-plane requires Linux AIO, please try --enable-linux-aio elif test -z $virtio_blk_data_plane ; then virtio_blk_data_plane=$linux_aio fi and there's no requirement to have Linux AIO anymore. Can you prepare a patch to drop CONFIG_VIRTIO_BLK_DATA_PLANE, and replace patch 1 with it? We can leave --disable-virtio-blk-data-plane and --enable-virtio-blk-data-plane in for a couple of releases. Yes. Sounds a good idea. Fam
Re: [Qemu-devel] Are -cdrom/-hda (or -drive if=ide) supposed to work in q35?
Am 05.08.2014 um 23:14 hat John Snow geschrieben: On 08/05/2014 04:30 AM, Kevin Wolf wrote: Am 01.08.2014 um 22:10 hat John Snow geschrieben: On 06/12/2014 05:03 AM, Markus Armbruster wrote: -drive mixes up configuration of backend and frontend (a.k.a. device model), as follows: 1. It always defines a backend. info block shows them. 2. It always defines a few frontend configuration bits for the device models to pick up. 3. Except with if=none, it posts a request to board code to create a suitable frontend. It's up to the board code to honor, reject or ignore the request. The i440FX boards honor it, the Q35 boards ignore it. Nobody has gotten around to making the Q35 boards honor it, in part because there has been some confusion on what if=ide is supposed to mean on Q35. Should it connect an ide-hd / ide-cd in SATA mode or in legacy PATA mode? I've always argued for SATA, because for me if=ide does *not* imply a specific kind of HBA any more than if=scsi does, and the natural HBA for Q35 is AHCI in SATA mode. Kevin (cc'ed) has argued for a way to make it connect in legacy PATA mode. I'd be fine with that, as long as it's off by default. Patches welcome. Kevin, (or anyone else with an opinion for that matter), what is the reasoning behind wanting -cdrom to use the old PATA interfaces? The assumption that makes it a problem is that sooner or later we'll want to make Q35 the default. Most of the changes this brings in will make the guest see different, but generally still compatible hardware. AHCI however is not compatible with IDE in the sense that an OS that has only an IDE driver won't work with AHCI. It wouldn't be reasonable to break things like '-hda winxp.qcow2'. And in fact, if the internet is right, even newer Windows version give you trouble when you suddenly change from IDE to AHCI. So after an upgrade many users would find their existing guests to be broken. For at least the immediate future, the AHCI device doesn't support the mixed-mode SATA/PATA access models, though I suppose we could, it seems like a more obvious and simple solution to just allow the shorthand syntactic sugar commands to use the native bus of the system until you specify otherwise. I think I will probably begin writing a patch under this assumption unless there is a better technical reason not to. I think the solution that was generally agreed on was to introduce a machine option that decides whether to provide an IDE or AHCI interface (similar to the BIOS option that commonly exists on real hardware). We just need someone to implement it. Kevin OK, though for what it's worth I think that on real hardware, that BIOS switch toggles configuration bits on the AHCI device that actually changes its signature into a new device. I'm not completely sure, but afaik these bits aren't actually in the AHCI standard, but vendor-specific extensions. If you toggle them, apparently the device magically turns into a different one, including different PCI IDs and things like that. I think we can safely take the shortcut of directly creating the right device and leaving the BIOS alone. I suppose in our case we could create a machine option that toggled the behavior of the AHCI device appropriately to be either IDE or AHCI and treated the syntactic sugar commands appropriately, though you still may run into problems with Windows guests if you ever change the default machine type -- I have a hunch that Windows would get rather grumpy if you swapped out its IDE device from under it with even the emulated ICH9 one in legacy mode, but I suppose we can cross that bridge when we get there. For now, in the meantime, I will assume that -M q35 also implies the usage of AHCI, and treat the shorthands accordingly. Okay. Kevin
Re: [Qemu-devel] qemu live migration error from 2.0 to 2.1
On Wed, Aug 6, 2014 at 9:49 AM, Paolo Bonzini pbonz...@redhat.com wrote: This is a development version between 2.0.0 and 2.0.0-rc1; it may have bugs such as the one that was fixed in 2.0.0-rc5. you mean 2.1.0-rc5 Migration from development versions is not supported. You must either upgrade to 2.1.0, or downgrade to the released 2.0.0 version. in all case I would have the same issue with 2.0.0 version since the fix has been pushed in 2.1.0-rc5 -- William
Re: [Qemu-devel] qemu live migration error from 2.0 to 2.1
* William Dauchy (wdau...@gmail.com) wrote: On Wed, Aug 6, 2014 at 9:47 AM, Paolo Bonzini pbonz...@redhat.com wrote: I suspect he's using -M pc on both. Is it the default value? because it's not the case in my command line. You must use -M pc-i440fx-2.0 on both if you're migrating from 2.0 to a different version. wow. I wasn't expecting such behavior; i.e migration between qemu version does not seem trivial and it removes the benefits; better using stop/start. thanks for the info. The trick is to pick a -M value and stick with it; then you should be able to keep migrating to newer QEMU versions easily (just don't go with dev versions because things are often broken in them). -M pc is special, don't use that if you want to be able to migrate Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] qemu live migration error from 2.0 to 2.1
On Wed, Aug 6, 2014 at 11:17 AM, Dr. David Alan Gilbert dgilb...@redhat.com wrote: The trick is to pick a -M value and stick with it; then you should be able to keep migrating to newer QEMU versions easily (just don't go with dev versions because things are often broken in them). ok for dev versions, I'm just cherry picking interesting fixes which are not yet released in stable. -M pc is special, don't use that if you want to be able to migrate understood, I guess it's the default value -- William
[Qemu-devel] [PATCH] test-coroutine: add baseline test that times the cost of function calls
This can be used to compute the cost of coroutine operations. In the end the cost of the function call is a few clock cycles, so it's pretty cheap for now, but it may become more relevant as the coroutine code is optimized. For example, here are the results on my machine: Function call 1 iterations: 0.173884 s Yield 1 iterations: 8.445064 s Lifecycle 100 iterations: 0.098445 s Nesting 1 iterations of 1000 depth each: 7.406431 s One yield takes 83 nanoseconds, one enter takes 97 nanoseconds, one coroutine allocation takes (roughly, since some of the allocations in the nesting test do hit the pool) 739 nanoseconds: (8.445064 - 0.173884) * 10^9 / 1 = 82.7 (0.098445 * 100 - 0.173884) * 10^9 / 1 = 96.7 (7.406431 * 10 - 0.173884) * 10^9 / 1 = 738.9 Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- tests/test-coroutine.c | 24 1 files changed, 24 insertions(+) diff --git a/tests/test-coroutine.c b/tests/test-coroutine.c index 760636d..6e634f4 100644 --- a/tests/test-coroutine.c +++ b/tests/test-coroutine.c @@ -288,6 +288,29 @@ static void perf_yield(void) maxcycles, duration); } +static __attribute__((noinline)) void dummy(unsigned *i) +{ +(*i)--; +} + +static void perf_baseline(void) +{ +unsigned int i, maxcycles; +double duration; + +maxcycles = 1; +i = maxcycles; + +g_test_timer_start(); +while (i 0) { +dummy(i); +} +duration = g_test_timer_elapsed(); + +g_test_message(Function call %u iterations: %f s\n, +maxcycles, duration); +} + int main(int argc, char **argv) { g_test_init(argc, argv, NULL); @@ -301,6 +324,7 @@ int main(int argc, char **argv) g_test_add_func(/perf/lifecycle, perf_lifecycle); g_test_add_func(/perf/nesting, perf_nesting); g_test_add_func(/perf/yield, perf_yield); +g_test_add_func(/perf/function-call, perf_baseline); } return g_test_run(); } -- 1.9.3
Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
On Wed, Aug 6, 2014 at 4:48 PM, Kevin Wolf kw...@redhat.com wrote: Am 06.08.2014 um 07:33 hat Ming Lei geschrieben: Hi Kevin, On Tue, Aug 5, 2014 at 10:47 PM, Kevin Wolf kw...@redhat.com wrote: Am 05.08.2014 um 15:48 hat Stefan Hajnoczi geschrieben: I have been wondering how to prove that the root cause is the ucontext coroutine mechanism (stack switching). Here is an idea: Hack your bypass code path to run the request inside a coroutine. That way you can compare bypass without coroutine against bypass with coroutine. Right now I think there are doubts because the bypass code path is indeed a different (and not 100% correct) code path. So this approach might prove that the coroutines are adding the overhead and not something that you bypassed. My doubts aren't only that the overhead might not come from the coroutines, but also whether any coroutine-related overhead is really unavoidable. If we can optimise coroutines, I'd strongly prefer to do just that instead of introducing additional code paths. OK, thank you for taking look at the problem, and hope we can figure out the root cause, :-) Another thought I had was this: If the performance difference is indeed only coroutines, then that is completely inside the block layer and we don't actually need a VM to test it. We could instead have something like a simple qemu-img based benchmark and should be observing the same. Even it is simpler to run a coroutine-only benchmark, and I just wrote a raw one, and looks coroutine does decrease performance a lot, please see the attachment patch, and thanks for your template to help me add the 'co_bench' command in qemu-img. Yes, we can look at coroutines microbenchmarks in isolation. I actually did do that yesterday with the yield test from tests/test-coroutine.c. And in fact profiling immediately showed something to optimise: pthread_getspecific() was quite high, replacing it by __thread on systems where it works is more efficient and helped the numbers a bit. Also, a lot of time seems to be spent in pthread_mutex_lock/unlock (even in qemu-img bench), maybe there's even something that can be done here. The lock/unlock in dataplane is often from memory_region_find(), and Paolo should have done lots of work on that. However, I just wasn't sure whether a change on this level would be relevant in a realistic environment. This is the reason why I wanted to get a benchmark involving the block layer and some I/O. From the profiling data in below link: http://pastebin.com/YwH2uwbq With coroutine, the running time for same loading is increased ~50%(1.325s vs. 0.903s), and dcache load events is increased ~35%(693M vs. 512M), insns per cycle is decreased by ~50%( 1.35 vs. 1.63), compared with bypassing coroutine(-b parameter). The bypass code in the benchmark is very similar with the approach used in the bypass patch, since linux-aio with O_DIRECT seldom blocks in the the kernel I/O path. Maybe the benchmark is a bit extremely, but given modern storage device may reach millions of IOPS, and it is very easy to slow down the I/O by coroutine. I think in order to optimise coroutines, such benchmarks are fair game. It's just not guaranteed that the effects are exactly the same on real workloads, so we should take the results with a grain of salt. Anyhow, the coroutine version of your benchmark is buggy, it leaks all coroutines instead of exiting them, so it can't make any use of the coroutine pool. On my laptop, I get this (where fixed coroutine is a version that simply removes the yield at the end): | bypass| fixed coro| buggy coro +---+---+-- time| 1.09s | 1.10s | 1.62s L1-dcache-loads | 921,836,360 | 932,781,747 | 1,298,067,438 insns per cycle | 2.39 | 2.39 | 1.90 Begs the question whether you see a similar effect on a real qemu and the coroutine pool is still not big enough? With correct use of coroutines, the difference seems to be barely measurable even without any I/O involved. When I comment qemu_coroutine_yield(), looks result of bypass and fixed coro is very similar as your test, and I am just wondering if stack is always switched in qemu_coroutine_enter() without calling qemu_coroutine_yield(). Without the yield, the benchmark can't emulate coroutine usage in bdrv_aio_readv/writev() path any more, and bypass in the patchset skips two qemu_coroutine_enter() and one qemu_coroutine_yield() for each bdrv_aio_readv/writev(). I played a bit with the following, I hope it's not too naive. I couldn't see a difference with your patches, but at least one reason for this is probably that my laptop SSD isn't fast enough to make the CPU the bottleneck. Haven't tried ramdisk yet, that would probably be the next thing. (I actually wrote the patch up just for some profiling on my
Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
On Wed, Aug 06, 2014 at 01:33:36PM +0800, Ming Lei wrote: With coroutine, the running time for same loading is increased ~50%(1.325s vs. 0.903s), and dcache load events is increased I agree with Paolo about microbenchmarks. We need to do I/O to get a realistic picture of performance, since there is little point is optimizing something that is not a significant factor in overall performance. But I also wanted to say that these benchmark durations are so short that they can be greatly affected by outliers (e.g. scheduler behavior, system background activity, etc). Run benchmarks for 2 minutes to reduce variance and give the system time to warm up. pgpdVv9EmCnOH.pgp Description: PGP signature
[Qemu-devel] [Bug 1353149] Re: qemu 2.1.0 fails to start if number of cores is greater than 1.
I forgot to mention that VM was created and managed remotely via virt-manager 0.9.5 from another host. I used custom cpu topology. however this config worked fine on qemu 2.0.0 with virt-manager 0.9.5 just tried virt-manager 1.0.1 - it creates the proper argument -smp 4,maxcpus=4,sockets=1,cores=4,threads=1 so this was a virt-manager bug already fixed upstream. this bug can be closed :) -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1353149 Title: qemu 2.1.0 fails to start if number of cores is greater than 1. Status in QEMU: New Bug description: qemu (kvm) 2.1.0 (built from sources) fails to start if number of cores is greater than 1. relevant part of commandline arguments: /usr/bin/qemu-system-x86_64 -name test3 -S -machine pc- i440fx-2.1,accel=kvm,usb=off -cpu Westmere -m 4096 -realtime mlock=off -smp 1,maxcpus=4,sockets=1,cores=4,threads=1 the error reported is: qemu-system-x86_64: /home/asavah/pkgbuild/qemu-2.1.0/hw/i386/smbios.c:825: smbios_get_tables: Assertion `smbios_smp_sockets = 1' failed. 2014-08-05 21:45:35.825+: shutting down however setting 4 sockets with 1 core each allows me to start the machine just fine. the system is debian wheezy Linux hostname 3.16.0-hostname2 #2 SMP Mon Aug 4 17:02:16 EEST 2014 x86_64 GNU/Linux libvirt 1.2.7 (built from sources) To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1353149/+subscriptions
Re: [Qemu-devel] [v4][PATCH 4/5] xen:hw:pci-host:piix: create host bridge to passthrough
On Wed, Aug 06, 2014 at 02:50:34PM +0800, Tiejun Chen wrote: Implement a pci host bridge specific to passthrough. Actually this just inherits the standard one. This is based on http://patchwork.ozlabs.org/patch/363810/. Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- hw/pci-host/piix.c | 39 +++ include/hw/i386/pc.h | 2 ++ 2 files changed, 41 insertions(+) v4: * Rebase v3: * Rebase v2: * Just add prefix with XEN_IGD_PASSTHROUGH/xen_igd_passthrough diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c index 4330599..5cac398 100644 --- a/hw/pci-host/piix.c +++ b/hw/pci-host/piix.c @@ -303,6 +303,17 @@ static int i440fx_initfn(PCIDevice *dev) return 0; } +static int xen_igd_passthrough_i440fx_initfn(PCIDevice *dev) +{ +PCII440FXState *d = I440FX_PCI_DEVICE(dev, + TYPE_XEN_IGD_PASSTHROUGH_I440FX_PCI_DEVICE); + +dev-config[I440FX_SMRAM] = 0x02; + +cpu_smm_register(i440fx_set_smm, d); +return 0; +} + PCIBus *i440fx_init(const char *host_type, const char *pci_type, PCII440FXState **pi440fx_state, int *piix3_devfn, don't duplicate code. Reuse the function from regular piix. @@ -703,6 +714,33 @@ static const TypeInfo i440fx_info = { .class_init= i440fx_class_init, }; +static void xen_igd_passthrough_i440fx_class_init(ObjectClass *klass, void *data) +{ +DeviceClass *dc = DEVICE_CLASS(klass); +PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); + +k-init = xen_igd_passthrough_i440fx_initfn; +k-vendor_id = PCI_VENDOR_ID_INTEL; +k-device_id = PCI_DEVICE_ID_INTEL_82441; +k-revision = 0x02; +k-class_id = PCI_CLASS_BRIDGE_HOST; +dc-desc = IGD PT XEN Host bridge; +dc-vmsd = vmstate_i440fx; +/* + * PCI-facing part of the host bridge, not usable without the + * host-facing part, which can't be device_add'ed, yet. + */ +dc-cannot_instantiate_with_device_add_yet = true; +dc-hotpluggable = false; +} + +static const TypeInfo xen_igd_passthrough_i440fx_info = { +.name = TYPE_XEN_IGD_PASSTHROUGH_I440FX_PCI_DEVICE, +.parent= TYPE_PCI_DEVICE, +.instance_size = sizeof(PCII440FXState), +.class_init= xen_igd_passthrough_i440fx_class_init, +}; + static const char *i440fx_pcihost_root_bus_path(PCIHostState *host_bridge, PCIBus *rootbus) { @@ -744,6 +782,7 @@ static const TypeInfo i440fx_pcihost_info = { static void i440fx_register_types(void) { type_register_static(i440fx_info); +type_register_static(xen_igd_passthrough_i440fx_info); type_register_static(piix3_info); type_register_static(piix3_xen_info); type_register_static(i440fx_pcihost_info); diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 11fb72f..de34aa6 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -238,6 +238,8 @@ typedef struct PCII440FXState PCII440FXState; #define TYPE_I440FX_PCI_HOST_BRIDGE i440FX-pcihost #define TYPE_I440FX_PCI_DEVICE i440FX +#define TYPE_XEN_IGD_PASSTHROUGH_I440FX_PCI_DEVICE xen-igd-passthrough-i440FX + PCIBus *i440fx_init(const char *host_type, const char *pci_type, PCII440FXState **pi440fx_state, int *piix_devfn, ISABus **isa_bus, qemu_irq *pic, -- 1.9.1
Re: [Qemu-devel] [PATCH v2 00/30] AHCI test suite framework
On Mon, Aug 04, 2014 at 05:11:01PM -0400, John Snow wrote: This patch series introduces a number of small fixes and tweaks to help support an AHCI test suite that in the future I hope to expand to a fuller regression suite to help guide the development of the AHCI device support under, in particular, the Q35 machine type in QEMU. Paolo Bonzini has contributed a number of cleanup and refactoring patches that support changes to the PIO setup FIS packet construction code, which is necessary for testing ths specification adherence of the IDENTIFY command, which issues its data exclusively via PIO mechanisms. The ahci-test code being checked in represents a minimum of functionality needed in order to issue and receive commands from the AHCI HBA under the libqos / qtest environment. In V2, as detailed below, these tests are not currently expected to pass. I will post a complementary patch outside of this set that highlights the exact set of tests that will not pass, which can help verify at least the portions of these tests that do work correctly. Assertions that currently fail: - Ordering of PCI capabilities as defined by either AHCI or Intel ICH9 - Boot-time values of the PxTFD register, which should not have valid data until after a D2H FIS is received, but does in Qemu 2.1 - Boot-time values of the PxSIG register, which should have a specific placeholder signature until the first D2H FIS is received, but is currently blank. - The Descriptor Processed interrupt is expected after the IDENTIFY command exhausts the given PRDT, but is not seen. I guess these are the assertion failures: ERROR:tests/ahci-test.c:777:ahci_test_pci_spec: assertion failed ((data 0xFF) == PCI_CAP_ID_MSI): (0x0012 == 0x0005) GTester: last random seed: R02Sd92815a5d013e8433808b903b2b13fb0 ** ERROR:tests/ahci-test.c:1165:ahci_test_port_spec: assertion failed ((reg) ((0x01)) == ((0x01))): (0x == 0x0001) GTester: last random seed: R02S4d6c05e864dc777e64141cdc6d2a18cf ** ERROR:tests/ahci-test.c:1360:ahci_test_identify: assertion failed ((reg) ((0x20)) == ((0x20))): (0x == 0x0020) GTester: last random seed: R02S2b3b330b83a66badb24da80b48120b1d Why publish this patch series if the test fails? We can't merge failing tests. pgpxF10PzmRWz.pgp Description: PGP signature
Re: [Qemu-devel] [v4][PATCH 3/5] I440FX_PCI_DEVICE: add pci_type to index
On Wed, Aug 06, 2014 at 02:50:33PM +0800, Tiejun Chen wrote: We need to use this index to reuse this macro later Signed-off-by: Tiejun Chen tiejun.c...@intel.com Which index? Most users don't need to change. Just open-code OBJECT_CHECK where necessary, or add a new wrapper. --- hw/pci-host/piix.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) v4: * New patch to extend I440FX_PCI_DEVICE diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c index 0cd82b8..4330599 100644 --- a/hw/pci-host/piix.c +++ b/hw/pci-host/piix.c @@ -90,8 +90,8 @@ typedef struct PIIX3State { MemoryRegion rcr_mem; } PIIX3State; -#define I440FX_PCI_DEVICE(obj) \ -OBJECT_CHECK(PCII440FXState, (obj), TYPE_I440FX_PCI_DEVICE) +#define I440FX_PCI_DEVICE(obj, type) \ +OBJECT_CHECK(PCII440FXState, (obj), type) struct PCII440FXState { /* private */ @@ -155,7 +155,7 @@ static void i440fx_set_smm(int val, void *arg) static void i440fx_write_config(PCIDevice *dev, uint32_t address, uint32_t val, int len) { -PCII440FXState *d = I440FX_PCI_DEVICE(dev); +PCII440FXState *d = I440FX_PCI_DEVICE(dev, TYPE_I440FX_PCI_DEVICE); /* XXX: implement SMRAM.D_LOCK */ pci_default_write_config(dev, address, val, len); @@ -295,7 +295,7 @@ static void i440fx_pcihost_realize(DeviceState *dev, Error **errp) static int i440fx_initfn(PCIDevice *dev) { -PCII440FXState *d = I440FX_PCI_DEVICE(dev); +PCII440FXState *d = I440FX_PCI_DEVICE(dev, TYPE_I440FX_PCI_DEVICE); dev-config[I440FX_SMRAM] = 0x02; @@ -333,7 +333,7 @@ PCIBus *i440fx_init(const char *host_type, const char *pci_type, qdev_init_nofail(dev); d = pci_create_simple(b, 0, pci_type); -*pi440fx_state = I440FX_PCI_DEVICE(d); +*pi440fx_state = I440FX_PCI_DEVICE(d, pci_type); f = *pi440fx_state; f-system_memory = address_space_mem; f-pci_address_space = pci_address_space; -- 1.9.1
Re: [Qemu-devel] [v4][PATCH 4/5] xen:hw:pci-host:piix: create host bridge to passthrough
On 2014/8/6 17:42, Michael S. Tsirkin wrote: On Wed, Aug 06, 2014 at 02:50:34PM +0800, Tiejun Chen wrote: Implement a pci host bridge specific to passthrough. Actually this just inherits the standard one. This is based on http://patchwork.ozlabs.org/patch/363810/. Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- hw/pci-host/piix.c | 39 +++ include/hw/i386/pc.h | 2 ++ 2 files changed, 41 insertions(+) v4: * Rebase v3: * Rebase v2: * Just add prefix with XEN_IGD_PASSTHROUGH/xen_igd_passthrough diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c index 4330599..5cac398 100644 --- a/hw/pci-host/piix.c +++ b/hw/pci-host/piix.c @@ -303,6 +303,17 @@ static int i440fx_initfn(PCIDevice *dev) return 0; } +static int xen_igd_passthrough_i440fx_initfn(PCIDevice *dev) +{ +PCII440FXState *d = I440FX_PCI_DEVICE(dev, + TYPE_XEN_IGD_PASSTHROUGH_I440FX_PCI_DEVICE); Sorry, I don't understand what you mean. i440fx_initfn(PCIDevice *dev) just have one parameter, dev, how to multiplex that here? Thanks Tiejun + +dev-config[I440FX_SMRAM] = 0x02; + +cpu_smm_register(i440fx_set_smm, d); +return 0; +} + PCIBus *i440fx_init(const char *host_type, const char *pci_type, PCII440FXState **pi440fx_state, int *piix3_devfn, don't duplicate code. Reuse the function from regular piix. @@ -703,6 +714,33 @@ static const TypeInfo i440fx_info = { .class_init= i440fx_class_init, }; +static void xen_igd_passthrough_i440fx_class_init(ObjectClass *klass, void *data) +{ +DeviceClass *dc = DEVICE_CLASS(klass); +PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); + +k-init = xen_igd_passthrough_i440fx_initfn; +k-vendor_id = PCI_VENDOR_ID_INTEL; +k-device_id = PCI_DEVICE_ID_INTEL_82441; +k-revision = 0x02; +k-class_id = PCI_CLASS_BRIDGE_HOST; +dc-desc = IGD PT XEN Host bridge; +dc-vmsd = vmstate_i440fx; +/* + * PCI-facing part of the host bridge, not usable without the + * host-facing part, which can't be device_add'ed, yet. + */ +dc-cannot_instantiate_with_device_add_yet = true; +dc-hotpluggable = false; +} + +static const TypeInfo xen_igd_passthrough_i440fx_info = { +.name = TYPE_XEN_IGD_PASSTHROUGH_I440FX_PCI_DEVICE, +.parent= TYPE_PCI_DEVICE, +.instance_size = sizeof(PCII440FXState), +.class_init= xen_igd_passthrough_i440fx_class_init, +}; + static const char *i440fx_pcihost_root_bus_path(PCIHostState *host_bridge, PCIBus *rootbus) { @@ -744,6 +782,7 @@ static const TypeInfo i440fx_pcihost_info = { static void i440fx_register_types(void) { type_register_static(i440fx_info); +type_register_static(xen_igd_passthrough_i440fx_info); type_register_static(piix3_info); type_register_static(piix3_xen_info); type_register_static(i440fx_pcihost_info); diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 11fb72f..de34aa6 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -238,6 +238,8 @@ typedef struct PCII440FXState PCII440FXState; #define TYPE_I440FX_PCI_HOST_BRIDGE i440FX-pcihost #define TYPE_I440FX_PCI_DEVICE i440FX +#define TYPE_XEN_IGD_PASSTHROUGH_I440FX_PCI_DEVICE xen-igd-passthrough-i440FX + PCIBus *i440fx_init(const char *host_type, const char *pci_type, PCII440FXState **pi440fx_state, int *piix_devfn, ISABus **isa_bus, qemu_irq *pic, -- 1.9.1
[Qemu-devel] [PATCH v3] po: Add Chinese translation
Signed-off-by: Fam Zheng f...@redhat.com Reviewed-by: Amos Kong ak...@redhat.com Reviewed-by: Dongsheng Song songdongsh...@live.cn Reviewed-by: Wei Huang wehu...@redhat.com --- v3: Add a missing space for msgid - Press Ctrl+Alt+G to release grab As pointed out by Amos. Meanwhile adding people's review-by's. [Hope no one in the rev-by list is against with that change. To Wei: I changed your Acked-by to Reviewed-by which I think is more appropriate here.] Cc'ing qemu-trivial. v2: Fix typo for _View. --- po/zh_CN.po | 86 + 1 file changed, 86 insertions(+) create mode 100644 po/zh_CN.po diff --git a/po/zh_CN.po b/po/zh_CN.po new file mode 100644 index 000..2b1d42e --- /dev/null +++ b/po/zh_CN.po @@ -0,0 +1,86 @@ +# Chinese translation for QEMU. +# This file is put in the public domain. +# +# Fam Zheng f...@redhat.com, 2014 +msgid +msgstr +Project-Id-Version: QEMU 2.2\n +Report-Msgid-Bugs-To: qemu-devel@nongnu.org\n +POT-Creation-Date: 2014-07-31 10:03+0800\n +PO-Revision-Date: 2014-07-31 10:00+0800\n +Last-Translator: Fam Zheng f...@redhat.com\n +Language-Team: Chinese z...@li.org\n +Language: zh\n +MIME-Version: 1.0\n +Content-Type: text/plain; charset=UTF-8\n +Content-Transfer-Encoding: 8bit\n +Plural-Forms: nplurals=2; plural=n != 1;\n +X-Generator: Lokalize 1.4\n + +#: ui/gtk.c:321 +msgid - Press Ctrl+Alt+G to release grab +msgstr - 按下 Ctrl+Alt+G 取消捕获 + +#: ui/gtk.c:325 +msgid [Paused] +msgstr [已暂停] + +#: ui/gtk.c:1601 +msgid _Pause +msgstr 暂停(_P) + +#: ui/gtk.c:1607 +msgid _Reset +msgstr 重置(_R) + +#: ui/gtk.c:1610 +msgid Power _Down +msgstr 关闭电源(_D) + +#: ui/gtk.c:1616 +msgid _Quit +msgstr 退出(_Q) + +#: ui/gtk.c:1692 +msgid _Fullscreen +msgstr 全屏(_F) + +#: ui/gtk.c:1702 +msgid Zoom _In +msgstr 放大(_I) + +#: ui/gtk.c:1709 +msgid Zoom _Out +msgstr 缩小(_O) + +#: ui/gtk.c:1716 +msgid Best _Fit +msgstr 最合适大小(_F) + +#: ui/gtk.c:1723 +msgid Zoom To _Fit +msgstr 缩放以适应大小(_F) + +#: ui/gtk.c:1729 +msgid Grab On _Hover +msgstr 鼠标经过时捕获(_H) + +#: ui/gtk.c:1732 +msgid _Grab Input +msgstr 捕获输入(_G) + +#: ui/gtk.c:1761 +msgid Show _Tabs +msgstr 显示标签页(_T) + +#: ui/gtk.c:1764 +msgid Detach Tab +msgstr 分离标签页 + +#: ui/gtk.c:1778 +msgid _Machine +msgstr 虚拟机(_M) + +#: ui/gtk.c:1783 +msgid _View +msgstr 视图(_V) -- 2.0.3
Re: [Qemu-devel] [questions] about KVM asaMicrosoft-compatiblehypervisor
On Mon, 2014-08-04 at 14:29 +0800, Zhang Haoyu wrote: Hi Zhang, No I haven't seen such problem Which kernel version are you running? Host kernel: RHEL7-RC1(linux-3.10.0). Does it include the latest lazy eli changes? lazy eli or lazy eoi? EOI How to confirm whether lazy eli has been included? not in linux-3.10.0 Btw, hv_spinlocks=0xfff is a pretty huge value. which value do you advise to use? MS seems to be using 0x as a default. best regards, Vadim. Thanks, Zhang Haoyu Best regards, Vadim.
Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
Am 06.08.2014 um 11:37 hat Ming Lei geschrieben: On Wed, Aug 6, 2014 at 4:48 PM, Kevin Wolf kw...@redhat.com wrote: Am 06.08.2014 um 07:33 hat Ming Lei geschrieben: Hi Kevin, On Tue, Aug 5, 2014 at 10:47 PM, Kevin Wolf kw...@redhat.com wrote: Am 05.08.2014 um 15:48 hat Stefan Hajnoczi geschrieben: I have been wondering how to prove that the root cause is the ucontext coroutine mechanism (stack switching). Here is an idea: Hack your bypass code path to run the request inside a coroutine. That way you can compare bypass without coroutine against bypass with coroutine. Right now I think there are doubts because the bypass code path is indeed a different (and not 100% correct) code path. So this approach might prove that the coroutines are adding the overhead and not something that you bypassed. My doubts aren't only that the overhead might not come from the coroutines, but also whether any coroutine-related overhead is really unavoidable. If we can optimise coroutines, I'd strongly prefer to do just that instead of introducing additional code paths. OK, thank you for taking look at the problem, and hope we can figure out the root cause, :-) Another thought I had was this: If the performance difference is indeed only coroutines, then that is completely inside the block layer and we don't actually need a VM to test it. We could instead have something like a simple qemu-img based benchmark and should be observing the same. Even it is simpler to run a coroutine-only benchmark, and I just wrote a raw one, and looks coroutine does decrease performance a lot, please see the attachment patch, and thanks for your template to help me add the 'co_bench' command in qemu-img. Yes, we can look at coroutines microbenchmarks in isolation. I actually did do that yesterday with the yield test from tests/test-coroutine.c. And in fact profiling immediately showed something to optimise: pthread_getspecific() was quite high, replacing it by __thread on systems where it works is more efficient and helped the numbers a bit. Also, a lot of time seems to be spent in pthread_mutex_lock/unlock (even in qemu-img bench), maybe there's even something that can be done here. The lock/unlock in dataplane is often from memory_region_find(), and Paolo should have done lots of work on that. However, I just wasn't sure whether a change on this level would be relevant in a realistic environment. This is the reason why I wanted to get a benchmark involving the block layer and some I/O. From the profiling data in below link: http://pastebin.com/YwH2uwbq With coroutine, the running time for same loading is increased ~50%(1.325s vs. 0.903s), and dcache load events is increased ~35%(693M vs. 512M), insns per cycle is decreased by ~50%( 1.35 vs. 1.63), compared with bypassing coroutine(-b parameter). The bypass code in the benchmark is very similar with the approach used in the bypass patch, since linux-aio with O_DIRECT seldom blocks in the the kernel I/O path. Maybe the benchmark is a bit extremely, but given modern storage device may reach millions of IOPS, and it is very easy to slow down the I/O by coroutine. I think in order to optimise coroutines, such benchmarks are fair game. It's just not guaranteed that the effects are exactly the same on real workloads, so we should take the results with a grain of salt. Anyhow, the coroutine version of your benchmark is buggy, it leaks all coroutines instead of exiting them, so it can't make any use of the coroutine pool. On my laptop, I get this (where fixed coroutine is a version that simply removes the yield at the end): | bypass| fixed coro| buggy coro +---+---+-- time| 1.09s | 1.10s | 1.62s L1-dcache-loads | 921,836,360 | 932,781,747 | 1,298,067,438 insns per cycle | 2.39 | 2.39 | 1.90 Begs the question whether you see a similar effect on a real qemu and the coroutine pool is still not big enough? With correct use of coroutines, the difference seems to be barely measurable even without any I/O involved. When I comment qemu_coroutine_yield(), looks result of bypass and fixed coro is very similar as your test, and I am just wondering if stack is always switched in qemu_coroutine_enter() without calling qemu_coroutine_yield(). Yes, definitely. qemu_coroutine_enter() always involves calling qemu_coroutine_switch(), which is the stack switch. Without the yield, the benchmark can't emulate coroutine usage in bdrv_aio_readv/writev() path any more, and bypass in the patchset skips two qemu_coroutine_enter() and one qemu_coroutine_yield() for each bdrv_aio_readv/writev(). It's not completely comparable anyway because you're not going through a main loop and callbacks
Re: [Qemu-devel] [v4][PATCH 3/5] I440FX_PCI_DEVICE: add pci_type to index
On 2014/8/6 17:45, Michael S. Tsirkin wrote: On Wed, Aug 06, 2014 at 02:50:33PM +0800, Tiejun Chen wrote: We need to use this index to reuse this macro later Signed-off-by: Tiejun Chen tiejun.c...@intel.com Which index? Most users don't need to change. Just open-code OBJECT_CHECK where necessary, or add a new wrapper. Okay so what about this? hw:pci-host:piix: define I440FX_PCI_DEVICE_FROM_TYPE We need to introduce I440FX_PCI_DEVICE_FROM_TYPE to get object with type, then we can reuse i440fx_init() simply. Signed-off-by: Tiejun Chen tiejun.c...@intel.com diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c index 0cd82b8..8c74653 100644 --- a/hw/pci-host/piix.c +++ b/hw/pci-host/piix.c @@ -93,6 +93,9 @@ typedef struct PIIX3State { #define I440FX_PCI_DEVICE(obj) \ OBJECT_CHECK(PCII440FXState, (obj), TYPE_I440FX_PCI_DEVICE) +#define I440FX_PCI_DEVICE_FROM_TYPE(obj, type) \ +OBJECT_CHECK(PCII440FXState, (obj), type) + struct PCII440FXState { /* private */ PCIDevice parent_obj; @@ -333,7 +336,7 @@ PCIBus *i440fx_init(const char *host_type, const char *pci_type, qdev_init_nofail(dev); d = pci_create_simple(b, 0, pci_type); -*pi440fx_state = I440FX_PCI_DEVICE(d); +*pi440fx_state = I440FX_PCI_DEVICE_FROM_TYPE(d, pci_type); f = *pi440fx_state; f-system_memory = address_space_mem; f-pci_address_space = pci_address_space; Thanks Tiejun --- hw/pci-host/piix.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) v4: * New patch to extend I440FX_PCI_DEVICE diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c index 0cd82b8..4330599 100644 --- a/hw/pci-host/piix.c +++ b/hw/pci-host/piix.c @@ -90,8 +90,8 @@ typedef struct PIIX3State { MemoryRegion rcr_mem; } PIIX3State; -#define I440FX_PCI_DEVICE(obj) \ -OBJECT_CHECK(PCII440FXState, (obj), TYPE_I440FX_PCI_DEVICE) +#define I440FX_PCI_DEVICE(obj, type) \ +OBJECT_CHECK(PCII440FXState, (obj), type) struct PCII440FXState { /* private */ @@ -155,7 +155,7 @@ static void i440fx_set_smm(int val, void *arg) static void i440fx_write_config(PCIDevice *dev, uint32_t address, uint32_t val, int len) { -PCII440FXState *d = I440FX_PCI_DEVICE(dev); +PCII440FXState *d = I440FX_PCI_DEVICE(dev, TYPE_I440FX_PCI_DEVICE); /* XXX: implement SMRAM.D_LOCK */ pci_default_write_config(dev, address, val, len); @@ -295,7 +295,7 @@ static void i440fx_pcihost_realize(DeviceState *dev, Error **errp) static int i440fx_initfn(PCIDevice *dev) { -PCII440FXState *d = I440FX_PCI_DEVICE(dev); +PCII440FXState *d = I440FX_PCI_DEVICE(dev, TYPE_I440FX_PCI_DEVICE); dev-config[I440FX_SMRAM] = 0x02; @@ -333,7 +333,7 @@ PCIBus *i440fx_init(const char *host_type, const char *pci_type, qdev_init_nofail(dev); d = pci_create_simple(b, 0, pci_type); -*pi440fx_state = I440FX_PCI_DEVICE(d); +*pi440fx_state = I440FX_PCI_DEVICE(d, pci_type); f = *pi440fx_state; f-system_memory = address_space_mem; f-pci_address_space = pci_address_space; -- 1.9.1
Re: [Qemu-devel] [edk2] license for binary drivers
On 08/06/14 09:40, Reza Jelveh wrote: Hello, EDK2 integrates FAT as a binary driver. What is the license of the FAT driver? https://svn.code.sf.net/p/edk2/code/trunk/edk2/FatBinPkg/License.txt What are the guidelines for use of binary drivers with EDK2? Specifically if you want to bundle an OVMF firmware with qemu? I'm not a lawyer. You should consult a lawyer. That said, the binary or source code nature of the FAT driver is inconsequential in this instance. You can actually find the source code for the FAT driver, and build it yourself, you just need to import it in the edk2 tree, and patch OvmfPkg*.{dsc,fdf) slightly. https://svn.code.sf.net/p/edk2/code/trunk/edk2/FatBinPkg/ReadMe.txt https://github.com/tianocore/edk2-FatPkg/ The main issue is that the FAT driver is not free software. It doesn't satisfy the four freedom requirements listed here: https://www.gnu.org/philosophy/free-sw.html [...] The freedom to run the program as you wish, for any purpose (freedom 0). [...] The FAT driver license does not give you this freedom. The FAT driver's license is more restrictive than the 3-clause BSDL that it is based upon. The 3-clause BSDL is GPL compatible: http://en.wikipedia.org/wiki/BSD_licenses#3-clause_license_.28.22Revised_BSD_License.22.2C_.22New_BSD_License.22.2C_or_.22Modified_BSD_License.22.29 But the additional restriction in the FAT driver's license makes it incompatible with the GPLv2 (which QEMU as a whole us released under, see LICENSE). The additional restriction most likely originates from http://msdn.microsoft.com/en-us/gg463080.aspx, the Microsoft EFI FAT32 File System Specification: Note: The download license agreement permits you to use the Microsoft EFI FAT32 File System Specification only in connection with a firmware implementation of the Extensible Firmware Initiative Specification, v. 1.0. If you plan to implement the FAT32 File System specification for other purposes, you must obtain an additional license from Microsoft. For example, you must obtain an additional license in order to create a file system for reading or reading and writing FAT32 in digital cameras recording to flash media, in computer operating systems reading and writing internal/external hard disks or flash media, or in set-top boxes reading FAT-formatted media. (This is precisely what freedom 0 would be about.) So no, you can't ship an OVMF binary (or source tarball) that contains the FAT driver, bundled as part of the GPLv2 (+compatible) QEMU distribution, either in source or in binary form. Linux distributions have moved OVMF to their non-free channels accordingly. https://packages.debian.org/sid/ovmf https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=745698 http://packages.ubuntu.com/trusty/misc/ovmf What you can do is build an OVMF binary that doesn't include the FAT driver. The result will be then covered by 3-clause BSDL (further restricted by the OpenSSL license, if you include -D SECURE_BOOT), and that one you can bundle with QEMU. However, such an OVMF binary will be useless, unless users themselves can augment it somehow at build time or at runtime with some FAT driver. (The UEFI specification requires FAT32 support.) UEFI should have standardized a preexistent, but unencumbered, file system. That horse is out of the barn. Disclaimer: I am not a lawyer, and this is my personal opinion only, not necessarily my employer's. Laszlo
Re: [Qemu-devel] [questions] about qemu log
On Wed, Aug 6, 2014 at 12:40 AM, Peter Crosthwaite peter.crosthwa...@xilinx.com wrote: Well -D will log to file only loggable (i.e. qemu_log()) information (which has all sorts of options and switches). Stderr, is a little more static and should in theory be limited to genuine errors. But if you want a combined log of both you can simply omit -D to default qemu_log output to stderr. This gives you a combined log that you can redirect anywhere. To be honest, this is what I do as a matter of course (2 foo rather than -D foo). understood; this make it incompatible with -daemonize option. there should be a possibility to detach the process and also redirect stderr somewhere. -- William
Re: [Qemu-devel] [PATCH RFC v2 01/12] QEMUSizedBuffer/QEMUFile
* Eric Blake (ebl...@redhat.com) wrote: On 07/25/2014 09:39 AM, Sanidhya Kashyap wrote: From: Dr. David Alan Gilbert dgilb...@redhat.com Stefan Berger's to create a QEMUFile that goes to a memory buffer; Missing something. Maybe you meant: This is based on Stefan Berger's patch to create ... I'll update that in my version. from: http://lists.gnu.org/archive/html/qemu-devel/2013-03/msg05036.html Using the QEMUFile interface, this patch adds support functions for operating on in-memory sized buffers that can be written to or read from. Odd line breaking. Signed-off-by: Stefan Berger stef...@linux.vnet.ibm.com Signed-off-by: Joel Schopp jsch...@linux.vnet.ibm.com Still looks weird to have David as author but not listed in S-o-B. I'm thinking of trying to put this code in via a stand-alone patch (with a testcase use); since we've got 3+ users of this code in unmerged things and at least 3 implementations of something similar, it would be best to get it in before someone writes a 4th. --- include/migration/qemu-file.h | 27 +++ qemu-file.c | 411 ++ 2 files changed, 438 insertions(+) +/** + * Set the length of the buffer; The primary usage of this s/The/the/ + * function is to truncate the number of used bytes in the buffer. + * The size will not be extended beyond the current number of no need for double space + * allocated bytes in the QEMUSizedBuffer. + * + * @qsb: A QEMUSizedBuffer + * @new_len : The new length of bytes in the buffer + * + * Returns the number of bytes the buffer was trucated or extended s/trucated/truncated/ I've picked up these typos in my world. Thanks. + +QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input) +{ +QEMUBuffer *s; + +if (mode == NULL || (mode[0] != 'r' mode[0] != 'w') || mode[1] != 0) { +fprintf(stderr, qemu_bufopen: Argument validity check failed\n); Should this function be directly printing to stderr, or should it be converted to use Error **errp semantics? It's consistent with the open functions for other file types ( qemu_fdopen, qemu_popen_cmd, qemu_fopen ). Dave -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
[Qemu-devel] QEMU with KVM does not start Win8 on kernel 3.4.67 and core2duo
Hi all, I did already several tests and I'm not completely sure what's going wrong, but here my scenario: When I start up QEMU w/ KVM 1.7.0 on a Core2Duo machine running a vanilla kernel 3.4.67 to run a Windows 8.0 guest, the guest freezes at boot without any error. When I dump the CPU registers via info registers, nothing changes, that means the system really stalled. Same happens with QEMU 2.0.0. But - when I run the very same guest using Kernel 2.6.32.12 and QEMU 1.7.0 on the host side it works on the Core2Duo. Also the system above but just with an i3 or i5 CPU it works, too. I already disabled networking and USB for the guest and changed the graphics card - no effect. I assume that some mean bits and bytes have to be set up properly to get the thing running. Any hint what to change / test would be really appreciated. Thanks in advance, Best regards, Erik
Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
On Wed, Aug 6, 2014 at 6:09 PM, Kevin Wolf kw...@redhat.com wrote: Am 06.08.2014 um 11:37 hat Ming Lei geschrieben: On Wed, Aug 6, 2014 at 4:48 PM, Kevin Wolf kw...@redhat.com wrote: Am 06.08.2014 um 07:33 hat Ming Lei geschrieben: Hi Kevin, On Tue, Aug 5, 2014 at 10:47 PM, Kevin Wolf kw...@redhat.com wrote: Am 05.08.2014 um 15:48 hat Stefan Hajnoczi geschrieben: I have been wondering how to prove that the root cause is the ucontext coroutine mechanism (stack switching). Here is an idea: Hack your bypass code path to run the request inside a coroutine. That way you can compare bypass without coroutine against bypass with coroutine. Right now I think there are doubts because the bypass code path is indeed a different (and not 100% correct) code path. So this approach might prove that the coroutines are adding the overhead and not something that you bypassed. My doubts aren't only that the overhead might not come from the coroutines, but also whether any coroutine-related overhead is really unavoidable. If we can optimise coroutines, I'd strongly prefer to do just that instead of introducing additional code paths. OK, thank you for taking look at the problem, and hope we can figure out the root cause, :-) Another thought I had was this: If the performance difference is indeed only coroutines, then that is completely inside the block layer and we don't actually need a VM to test it. We could instead have something like a simple qemu-img based benchmark and should be observing the same. Even it is simpler to run a coroutine-only benchmark, and I just wrote a raw one, and looks coroutine does decrease performance a lot, please see the attachment patch, and thanks for your template to help me add the 'co_bench' command in qemu-img. Yes, we can look at coroutines microbenchmarks in isolation. I actually did do that yesterday with the yield test from tests/test-coroutine.c. And in fact profiling immediately showed something to optimise: pthread_getspecific() was quite high, replacing it by __thread on systems where it works is more efficient and helped the numbers a bit. Also, a lot of time seems to be spent in pthread_mutex_lock/unlock (even in qemu-img bench), maybe there's even something that can be done here. The lock/unlock in dataplane is often from memory_region_find(), and Paolo should have done lots of work on that. However, I just wasn't sure whether a change on this level would be relevant in a realistic environment. This is the reason why I wanted to get a benchmark involving the block layer and some I/O. From the profiling data in below link: http://pastebin.com/YwH2uwbq With coroutine, the running time for same loading is increased ~50%(1.325s vs. 0.903s), and dcache load events is increased ~35%(693M vs. 512M), insns per cycle is decreased by ~50%( 1.35 vs. 1.63), compared with bypassing coroutine(-b parameter). The bypass code in the benchmark is very similar with the approach used in the bypass patch, since linux-aio with O_DIRECT seldom blocks in the the kernel I/O path. Maybe the benchmark is a bit extremely, but given modern storage device may reach millions of IOPS, and it is very easy to slow down the I/O by coroutine. I think in order to optimise coroutines, such benchmarks are fair game. It's just not guaranteed that the effects are exactly the same on real workloads, so we should take the results with a grain of salt. Anyhow, the coroutine version of your benchmark is buggy, it leaks all coroutines instead of exiting them, so it can't make any use of the coroutine pool. On my laptop, I get this (where fixed coroutine is a version that simply removes the yield at the end): | bypass| fixed coro| buggy coro +---+---+-- time| 1.09s | 1.10s | 1.62s L1-dcache-loads | 921,836,360 | 932,781,747 | 1,298,067,438 insns per cycle | 2.39 | 2.39 | 1.90 Begs the question whether you see a similar effect on a real qemu and the coroutine pool is still not big enough? With correct use of coroutines, the difference seems to be barely measurable even without any I/O involved. When I comment qemu_coroutine_yield(), looks result of bypass and fixed coro is very similar as your test, and I am just wondering if stack is always switched in qemu_coroutine_enter() without calling qemu_coroutine_yield(). Yes, definitely. qemu_coroutine_enter() always involves calling qemu_coroutine_switch(), which is the stack switch. Without the yield, the benchmark can't emulate coroutine usage in bdrv_aio_readv/writev() path any more, and bypass in the patchset skips two qemu_coroutine_enter() and one qemu_coroutine_yield() for each bdrv_aio_readv/writev(). It's not completely
Re: [Qemu-devel] [PATCH v2 00/30] AHCI test suite framework
Stefan Hajnoczi stefa...@redhat.com writes: On Mon, Aug 04, 2014 at 05:11:01PM -0400, John Snow wrote: This patch series introduces a number of small fixes and tweaks to help support an AHCI test suite that in the future I hope to expand to a fuller regression suite to help guide the development of the AHCI device support under, in particular, the Q35 machine type in QEMU. Paolo Bonzini has contributed a number of cleanup and refactoring patches that support changes to the PIO setup FIS packet construction code, which is necessary for testing ths specification adherence of the IDENTIFY command, which issues its data exclusively via PIO mechanisms. The ahci-test code being checked in represents a minimum of functionality needed in order to issue and receive commands from the AHCI HBA under the libqos / qtest environment. In V2, as detailed below, these tests are not currently expected to pass. I will post a complementary patch outside of this set that highlights the exact set of tests that will not pass, which can help verify at least the portions of these tests that do work correctly. Assertions that currently fail: - Ordering of PCI capabilities as defined by either AHCI or Intel ICH9 - Boot-time values of the PxTFD register, which should not have valid data until after a D2H FIS is received, but does in Qemu 2.1 - Boot-time values of the PxSIG register, which should have a specific placeholder signature until the first D2H FIS is received, but is currently blank. - The Descriptor Processed interrupt is expected after the IDENTIFY command exhausts the given PRDT, but is not seen. I guess these are the assertion failures: ERROR:tests/ahci-test.c:777:ahci_test_pci_spec: assertion failed ((data 0xFF) == PCI_CAP_ID_MSI): (0x0012 == 0x0005) GTester: last random seed: R02Sd92815a5d013e8433808b903b2b13fb0 ** ERROR:tests/ahci-test.c:1165:ahci_test_port_spec: assertion failed ((reg) ((0x01)) == ((0x01))): (0x == 0x0001) GTester: last random seed: R02S4d6c05e864dc777e64141cdc6d2a18cf ** ERROR:tests/ahci-test.c:1360:ahci_test_identify: assertion failed ((reg) ((0x20)) == ((0x20))): (0x == 0x0020) GTester: last random seed: R02S2b3b330b83a66badb24da80b48120b1d Why publish this patch series if the test fails? We can't merge failing tests. Correct. What I do when I want to start some bug fixing work with tests is to write the tests to expect the actual, incorrect behavior, with a greppable comment documenting the correct behavior. Then clean that up as the bugs get fixed.
Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
On Wed, Aug 6, 2014 at 7:28 PM, Ming Lei ming@canonical.com wrote: On Wed, Aug 6, 2014 at 6:09 PM, Kevin Wolf kw...@redhat.com wrote: I use the /dev/nullb0 block device to test, which is available in linux kernel 3.13+, and follows the difference, which looks not very big( 10%): And I added two parameter to your img-bench patch: -c CNT # which is passed to 'data.n' -b #enable bypass coroutine introduced in this patchset Another difference is that dataplane uses its own thread, and this bench takes main_loop. ming@:~/git/qemu$ sudo ~/bin/perf stat -e L1-dcache-loads,L1-dcache-load-misses,cpu-cycles,instructions,branch-instructions,branch-misses,branch-loads,branch-load-misses,dTLB-loads,dTLB-load-misses ./qemu-img bench -f raw -t off -n -c 1000 -b /dev/nullb0 read time: 58024ms Performance counter stats for './qemu-img bench -f raw -t off -n -c 1000 -b /dev/nullb0': 34,874,462,357 L1-dcache-loads [40.00%] 714,018,039 L1-dcache-load-misses #2.05% of all L1-dcache hits [40.00%] 133,897,794,677 cpu-cycles[40.05%] 116,714,230,004 instructions #0.87 insns per cycle [50.02%] 22,689,223,546 branch-instructions [50.01%] 391,673,952 branch-misses #1.73% of all branches [50.00%] 22,726,856,215 branch-loads [50.01%] 18,570,766,783 branch-load-misses [49.98%] 34,944,839,907 dTLB-loads [39.99%] 24,405,944 dTLB-load-misses #0.07% of all dTLB cache hits [39.99%] 58.040785989 seconds time elapsed ming@:~/git/qemu$ sudo ~/bin/perf stat -e L1-dcache-loads,L1-dcache-load-misses,cpu-cycles,instructions,branch-instructions,branch-misses,branch-loads,branch-load-misses,dTLB-loads,dTLB-load-misses ./qemu-img bench -f raw -t off -n -c 1000 /dev/nullb0 read time: 63369ms BTW, Stefan's coroutine resize patch is applied in both the tests(qem-img bench). Thanks,
Re: [Qemu-devel] [questions] about qemu log
On Wed, Aug 6, 2014 at 5:42 PM, Markus Armbruster arm...@redhat.com wrote: Zhang Haoyu zhan...@sangfor.com writes: The output is on qemu's stderr. You are in control of what that stderr is. I don't get why we can configure -D /path/to/unique/file/name.log but we also have to redirect stderr (I didn't checked if the daemonize option was closing it). What's the purpose of this logfile option? Well -D will log to file only loggable (i.e. qemu_log()) information (which has all sorts of options and switches). Stderr, is a little more static and should in theory be limited to genuine errors. But if you want a combined log of both you can simply omit -D to default qemu_log output to stderr. This gives you a combined log that you can redirect anywhere. To be honest, this is what I do as a matter of course (2 foo rather than -D foo). Maybe we can introduce a new qemu option to specify a error logfile where stderr be redirected, like below, DEF(elogfile, HAS_ARG, QEMU_OPTION_elogfile, \ -elogfile logfile redirect stderr log to logfile(default /var/log/qemu/vm name##.log)\n, QEMU_ARCH_ALL) STEXI @item -elogfile @var{logfile} @findex -elogfile redirect stderr in @var{logfile} ETEXI then we can set the error log file through qemu command, /var/log/qemu/vm name##.log as default. This sounds out-of-scope for QEMU to me and makes a standard flow non-standard. If prints are going to stderr where should be going elsewhere they probably should be fixed. Do you have specific examples of information going to stderr that you would rather go to a log (be it an error log or something else?). I use proxmox to manage vm, it dose not redirect qemu's stderr, and start vm with -daemonize option, so the error log disappeared. I want to redirect the error log of qemu to a specified logfile, if fault happened, I can use the error log to analyze the fault. And, why qemu output the error log to stderr instead of a error logfile which can be configure? Because the code is a mess in that regard. You don't fix that by redirecting stderr wholesale, because that just adds to the mess. You fix it at the root, one ill-advised fprintf() at a time, as Peter advises: Sorry, I'm afraid I misunderstand what you mean, should I replace all of fprintf(stderr, ...) with qemu_log() ? or only some cases where stderr is used where qemu_log should be, as Perter advises? I didn't mean to suggest blind conversion from fprintf() to qemu_log(). Each instance of fprintf() needs to be reviewed before conversion. Yes i'm afraid there is no quick one-shot fix to your problem. The best is the suggested workarounds using stderr redirection. I think that's exactly Peter's advice, too. Correct. Regards, Peter If so, should I still need to redirect the stderr to specified logfile in qemu's parent shell/process ? Probably. Libvirt does it.
Re: [Qemu-devel] [PATCH v5 1/6] exec: add parameter errp to qemu_ram_alloc and qemu_ram_alloc_from_ptr
On Wed, Aug 6, 2014 at 3:36 PM, Hu Tao hu...@cn.fujitsu.com wrote: Add parameter errp to qemu_ram_alloc and qemu_ram_alloc_from_ptr so that we can handler errors. handle Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- exec.c | 32 +++- include/exec/ram_addr.h | 4 ++-- memory.c| 6 +++--- 3 files changed, 28 insertions(+), 14 deletions(-) diff --git a/exec.c b/exec.c index 765bd94..7e60a44 100644 --- a/exec.c +++ b/exec.c @@ -1224,7 +1224,7 @@ static int memory_try_enable_merging(void *addr, size_t len) return qemu_madvise(addr, len, QEMU_MADV_MERGEABLE); } -static ram_addr_t ram_block_add(RAMBlock *new_block) +static ram_addr_t ram_block_add(RAMBlock *new_block, Error **errp) { RAMBlock *block; ram_addr_t old_ram_size, new_ram_size; @@ -1241,9 +1241,11 @@ static ram_addr_t ram_block_add(RAMBlock *new_block) } else { new_block-host = phys_mem_alloc(new_block-length); if (!new_block-host) { -fprintf(stderr, Cannot set up guest memory '%s': %s\n, -new_block-mr-name, strerror(errno)); -exit(1); +error_setg_errno(errp, errno, + cannot set up guest memory '%s', + new_block-mr-name); +qemu_mutex_unlock_ramlist(); +return -1; } memory_try_enable_merging(new_block-host, new_block-length); } @@ -1294,6 +1296,7 @@ ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, Error **errp) { RAMBlock *new_block; +ram_addr_t addr; if (xen_enabled()) { error_setg(errp, -mem-path not supported with Xen); @@ -1323,14 +1326,20 @@ ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, return -1; } -return ram_block_add(new_block); +addr = ram_block_add(new_block, errp); +if (errp *errp) { +g_free(new_block); The free being conditional on errp will cause a leak if clients (validly) pass a NULL errp in. This free needs to be unconditional. The way to achieve that is the local_err error_propagate pattern. +return -1; +} +return addr; } #endif ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host, - MemoryRegion *mr) + MemoryRegion *mr, Error **errp) { RAMBlock *new_block; +ram_addr_t addr; size = TARGET_PAGE_ALIGN(size); new_block = g_malloc0(sizeof(*new_block)); @@ -1341,12 +1350,17 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host, if (host) { new_block-flags |= RAM_PREALLOC; } -return ram_block_add(new_block); +addr = ram_block_add(new_block, errp); +if (errp *errp) { +g_free(new_block); ditto. Regards, Peter +return -1; +} +return addr; } -ram_addr_t qemu_ram_alloc(ram_addr_t size, MemoryRegion *mr) +ram_addr_t qemu_ram_alloc(ram_addr_t size, MemoryRegion *mr, Error **errp) { -return qemu_ram_alloc_from_ptr(size, NULL, mr); +return qemu_ram_alloc_from_ptr(size, NULL, mr, errp); } void qemu_ram_free_from_ptr(ram_addr_t addr) diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h index 6593be1..cf1d4c7 100644 --- a/include/exec/ram_addr.h +++ b/include/exec/ram_addr.h @@ -26,8 +26,8 @@ ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, bool share, const char *mem_path, Error **errp); ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host, - MemoryRegion *mr); -ram_addr_t qemu_ram_alloc(ram_addr_t size, MemoryRegion *mr); + MemoryRegion *mr, Error **errp); +ram_addr_t qemu_ram_alloc(ram_addr_t size, MemoryRegion *mr, Error **errp); int qemu_get_ram_fd(ram_addr_t addr); void *qemu_get_ram_block_host_ptr(ram_addr_t addr); void *qemu_get_ram_ptr(ram_addr_t addr); diff --git a/memory.c b/memory.c index 64d7176..59d9935 100644 --- a/memory.c +++ b/memory.c @@ -1169,7 +1169,7 @@ void memory_region_init_ram(MemoryRegion *mr, mr-ram = true; mr-terminates = true; mr-destructor = memory_region_destructor_ram; -mr-ram_addr = qemu_ram_alloc(size, mr); +mr-ram_addr = qemu_ram_alloc(size, mr, error_abort); } #ifdef __linux__ @@ -1199,7 +1199,7 @@ void memory_region_init_ram_ptr(MemoryRegion *mr, mr-ram = true; mr-terminates = true; mr-destructor = memory_region_destructor_ram_from_ptr; -mr-ram_addr = qemu_ram_alloc_from_ptr(size, ptr, mr); +mr-ram_addr = qemu_ram_alloc_from_ptr(size, ptr, mr, error_abort); } void
Re: [Qemu-devel] [PATCH v5 2/6] memory: add parameter errp to memory_region_init_ram
On Wed, Aug 6, 2014 at 3:36 PM, Hu Tao hu...@cn.fujitsu.com wrote: Add parameter errp to memory_region_init_ram and update all call sites to pass in error_abort. Signed-off-by: Hu Tao hu...@cn.fujitsu.com Liking it. Reviewed-by: Peter Crosthwaite peter.crosthwa...@xilinx.com --- backends/hostmem-ram.c | 2 +- hw/alpha/typhoon.c | 3 ++- hw/arm/armv7m.c | 7 --- hw/arm/cubieboard.c | 2 +- hw/arm/digic_boards.c| 2 +- hw/arm/exynos4210.c | 9 + hw/arm/highbank.c| 5 +++-- hw/arm/integratorcp.c| 5 +++-- hw/arm/kzm.c | 4 ++-- hw/arm/mainstone.c | 3 ++- hw/arm/musicpal.c| 6 -- hw/arm/omap1.c | 6 -- hw/arm/omap2.c | 6 -- hw/arm/omap_sx1.c| 6 -- hw/arm/palm.c| 3 ++- hw/arm/pxa2xx.c | 11 +++ hw/arm/realview.c| 9 ++--- hw/arm/spitz.c | 2 +- hw/arm/strongarm.c | 3 ++- hw/arm/tosa.c| 2 +- hw/arm/versatilepb.c | 3 ++- hw/arm/vexpress.c| 15 ++- hw/arm/virt.c| 3 ++- hw/arm/xilinx_zynq.c | 6 -- hw/block/onenand.c | 2 +- hw/core/loader.c | 2 +- hw/cris/axis_dev88.c | 6 -- hw/display/cg3.c | 6 -- hw/display/qxl.c | 6 +++--- hw/display/sm501.c | 2 +- hw/display/tc6393xb.c| 3 ++- hw/display/tcx.c | 5 +++-- hw/display/vga.c | 3 ++- hw/display/vmware_vga.c | 3 ++- hw/i386/kvm/pci-assign.c | 3 ++- hw/i386/pc.c | 3 ++- hw/i386/pc_sysfw.c | 5 +++-- hw/input/milkymist-softusb.c | 4 ++-- hw/lm32/lm32_boards.c| 6 -- hw/lm32/milkymist.c | 3 ++- hw/m68k/an5206.c | 4 ++-- hw/m68k/dummy_m68k.c | 2 +- hw/m68k/mcf5208.c| 4 ++-- hw/microblaze/petalogix_ml605_mmu.c | 5 +++-- hw/microblaze/petalogix_s3adsp1800_mmu.c | 6 -- hw/mips/mips_fulong2e.c | 5 +++-- hw/mips/mips_jazz.c | 8 +--- hw/mips/mips_malta.c | 6 -- hw/mips/mips_mipssim.c | 6 -- hw/mips/mips_r4k.c | 5 +++-- hw/moxie/moxiesim.c | 4 ++-- hw/net/milkymist-minimac2.c | 2 +- hw/openrisc/openrisc_sim.c | 2 +- hw/pci-host/prep.c | 3 ++- hw/pci/pci.c | 2 +- hw/ppc/mac_newworld.c| 3 ++- hw/ppc/mac_oldworld.c| 3 ++- hw/ppc/ppc405_boards.c | 8 +--- hw/ppc/ppc405_uc.c | 3 ++- hw/s390x/s390-virtio-ccw.c | 2 +- hw/s390x/s390-virtio.c | 2 +- hw/sh4/r2d.c | 2 +- hw/sh4/shix.c| 8 +--- hw/sparc/leon3.c | 4 ++-- hw/sparc/sun4m.c | 10 ++ hw/sparc64/sun4u.c | 6 -- hw/unicore32/puv3.c | 3 ++- hw/xtensa/sim.c | 4 ++-- hw/xtensa/xtfpga.c | 8 +--- include/exec/memory.h| 4 +++- memory.c | 5 +++-- numa.c | 4 ++-- xen-hvm.c| 3 ++- 73 files changed, 203 insertions(+), 128 deletions(-) diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c index d9a8290..e55d066 100644 --- a/backends/hostmem-ram.c +++ b/backends/hostmem-ram.c @@ -27,7 +27,7 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) path = object_get_canonical_path_component(OBJECT(backend)); memory_region_init_ram(backend-mr, OBJECT(backend), path, - backend-size); + backend-size, error_abort); g_free(path); } diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c index 67a1070..058b723 100644 --- a/hw/alpha/typhoon.c +++ b/hw/alpha/typhoon.c @@ -843,7 +843,8 @@ PCIBus *typhoon_init(ram_addr_t ram_size, ISABus **isa_bus, /* Main memory
Re: [Qemu-devel] [PATCH v5 3/6] memory: add parameter errp to memory_region_init_ram_ptr
On Wed, Aug 6, 2014 at 3:36 PM, Hu Tao hu...@cn.fujitsu.com wrote: Add parameter errp to memory_region_init_ram_ptr and update all call sites to pass in error_abort. Signed-off-by: Hu Tao hu...@cn.fujitsu.com Reviewed-by: Peter Crosthwaite peter.crosthwa...@xilinx.com --- hw/display/g364fb.c | 2 +- hw/i386/kvm/pci-assign.c | 3 ++- hw/misc/ivshmem.c| 5 +++-- hw/misc/vfio.c | 3 ++- hw/ppc/spapr.c | 2 +- include/exec/memory.h| 4 +++- memory.c | 5 +++-- 7 files changed, 15 insertions(+), 9 deletions(-) diff --git a/hw/display/g364fb.c b/hw/display/g364fb.c index 46f7b41..cce33ae 100644 --- a/hw/display/g364fb.c +++ b/hw/display/g364fb.c @@ -487,7 +487,7 @@ static void g364fb_init(DeviceState *dev, G364State *s) memory_region_init_io(s-mem_ctrl, NULL, g364fb_ctrl_ops, s, ctrl, 0x18); memory_region_init_ram_ptr(s-mem_vram, NULL, vram, - s-vram_size, s-vram); + s-vram_size, s-vram, error_abort); vmstate_register_ram(s-mem_vram, dev); memory_region_set_coalescing(s-mem_vram); } diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c index 5dcd2d5..d2013af 100644 --- a/hw/i386/kvm/pci-assign.c +++ b/hw/i386/kvm/pci-assign.c @@ -456,7 +456,8 @@ static void assigned_dev_register_regions(PCIRegion *io_regions, object_get_typename(OBJECT(pci_dev)), i); memory_region_init_ram_ptr(pci_dev-v_addrs[i].real_iomem, OBJECT(pci_dev), name, - cur_region-size, virtbase); + cur_region-size, virtbase, + error_abort); vmstate_register_ram(pci_dev-v_addrs[i].real_iomem, pci_dev-dev.qdev); } diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c index 768e528..0949c15 100644 --- a/hw/misc/ivshmem.c +++ b/hw/misc/ivshmem.c @@ -348,7 +348,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) { ptr = mmap(0, s-ivshmem_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); memory_region_init_ram_ptr(s-ivshmem, OBJECT(s), ivshmem.bar2, - s-ivshmem_size, ptr); + s-ivshmem_size, ptr, error_abort); vmstate_register_ram(s-ivshmem, DEVICE(s)); memory_region_add_subregion(s-bar, 0, s-ivshmem); @@ -476,7 +476,8 @@ static void ivshmem_read(void *opaque, const uint8_t * buf, int flags) map_ptr = mmap(0, s-ivshmem_size, PROT_READ|PROT_WRITE, MAP_SHARED, incoming_fd, 0); memory_region_init_ram_ptr(s-ivshmem, OBJECT(s), - ivshmem.bar2, s-ivshmem_size, map_ptr); + ivshmem.bar2, s-ivshmem_size, map_ptr, + error_abort); vmstate_register_ram(s-ivshmem, DEVICE(s)); IVSHMEM_DPRINTF(guest h/w addr = % PRIu64 , size = % PRIu64 \n, diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c index 0b9eba0..91d2c95 100644 --- a/hw/misc/vfio.c +++ b/hw/misc/vfio.c @@ -2894,7 +2894,8 @@ static int vfio_mmap_bar(VFIODevice *vdev, VFIOBAR *bar, goto empty_region; } -memory_region_init_ram_ptr(submem, OBJECT(vdev), name, size, *map); +memory_region_init_ram_ptr(submem, OBJECT(vdev), name, size, *map, + error_abort); } else { empty_region: /* Create a zero sized sub-region to make cleanup easy. */ diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index d01978f..4dfe40a 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1342,7 +1342,7 @@ static void ppc_spapr_init(MachineState *machine) if (rma_alloc_size rma) { rma_region = g_new(MemoryRegion, 1); memory_region_init_ram_ptr(rma_region, NULL, ppc_spapr.rma, - rma_alloc_size, rma); + rma_alloc_size, rma, error_abort); vmstate_register_ram_global(rma_region); memory_region_add_subregion(sysmem, 0, rma_region); } diff --git a/include/exec/memory.h b/include/exec/memory.h index ec6299b..caa988d 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -351,12 +351,14 @@ void memory_region_init_ram_from_file(MemoryRegion *mr, * @name: the name of the region. * @size: size of the region. * @ptr: memory to be mapped; must contain at least @size bytes. + * @errp: pointer to Error*, to store an error if it happens. */ void memory_region_init_ram_ptr(MemoryRegion *mr, struct Object *owner, const char *name, uint64_t
[Qemu-devel] [Bug 1353456] [NEW] qemu-io: Failure on a qcow2 image with the fuzzed refcount table
Public bug reported: 'qemu-io -c write' and 'qemu-io -c aio_write' crashes on a qcow2 image with a fuzzed refcount table. Sequence: 1. Unpack the attached archive, make a copy of test.img 2. Put copy.img and backing_img.file in the same directory 3. Execute qemu-io copy.img -c write 279552 322560 or qemu-io copy.img -c aio_write 836608 166400 Result: qemu-io was killed by SIGIOT with the reason: qemu-io: block/qcow2-cluster.c:1291: qcow2_alloc_cluster_offset: Assertion `*host_offset != 0' failed. qemu.git HEAD 69f87f713069f1f ** Affects: qemu Importance: Undecided Status: New ** Attachment added: traces.n.images.tar.gz https://bugs.launchpad.net/bugs/1353456/+attachment/4171103/+files/traces.n.images.tar.gz -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1353456 Title: qemu-io: Failure on a qcow2 image with the fuzzed refcount table Status in QEMU: New Bug description: 'qemu-io -c write' and 'qemu-io -c aio_write' crashes on a qcow2 image with a fuzzed refcount table. Sequence: 1. Unpack the attached archive, make a copy of test.img 2. Put copy.img and backing_img.file in the same directory 3. Execute qemu-io copy.img -c write 279552 322560 or qemu-io copy.img -c aio_write 836608 166400 Result: qemu-io was killed by SIGIOT with the reason: qemu-io: block/qcow2-cluster.c:1291: qcow2_alloc_cluster_offset: Assertion `*host_offset != 0' failed. qemu.git HEAD 69f87f713069f1f To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1353456/+subscriptions
Re: [Qemu-devel] [PATCH v2 4/4] ivshmem: check the value returned by fstat()
The function fstat() may fail, so check its return value. Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com --- hw/misc/ivshmem.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c index 768e528..5d939d2 100644 --- a/hw/misc/ivshmem.c +++ b/hw/misc/ivshmem.c @@ -324,7 +324,11 @@ static int check_shm_size(IVShmemState *s, int fd) { struct stat buf; -fstat(fd, buf); +if (fstat(fd, buf) 0) { +fprintf(stderr, ivshmem: exiting for fstat fd '%d' failed: %s\n, +fd, strerror(errno)); exiting for fstat? I would go with something like this: ivshmem: exiting: fstat on fd %d failed: %s ... or something among the lines. Thanks! Levente Kurusa
Re: [Qemu-devel] [PATCH v5 4/6] memory: add parameter errp to memory_region_init_rom_device
On Wed, Aug 6, 2014 at 3:36 PM, Hu Tao hu...@cn.fujitsu.com wrote: Add parameter errp to memory_region_init_rom_device and update all call sites to pass in error_abort. Signed-off-by: Hu Tao hu...@cn.fujitsu.com Reviewed-by: Peter Crosthwaite peter.crosthwa...@xilinx.com --- hw/block/pflash_cfi01.c | 2 +- hw/block/pflash_cfi02.c | 2 +- include/exec/memory.h | 4 +++- memory.c| 5 +++-- 4 files changed, 8 insertions(+), 5 deletions(-) diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c index f9507b4..649565d 100644 --- a/hw/block/pflash_cfi01.c +++ b/hw/block/pflash_cfi01.c @@ -770,7 +770,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp) memory_region_init_rom_device( pfl-mem, OBJECT(dev), pfl-be ? pflash_cfi01_ops_be : pflash_cfi01_ops_le, pfl, -pfl-name, total_len); +pfl-name, total_len, error_abort); vmstate_register_ram(pfl-mem, DEVICE(pfl)); pfl-storage = memory_region_get_ram_ptr(pfl-mem); sysbus_init_mmio(SYS_BUS_DEVICE(dev), pfl-mem); diff --git a/hw/block/pflash_cfi02.c b/hw/block/pflash_cfi02.c index 8d4b828..49db02d 100644 --- a/hw/block/pflash_cfi02.c +++ b/hw/block/pflash_cfi02.c @@ -608,7 +608,7 @@ static void pflash_cfi02_realize(DeviceState *dev, Error **errp) memory_region_init_rom_device(pfl-orig_mem, OBJECT(pfl), pfl-be ? pflash_cfi02_ops_be : pflash_cfi02_ops_le, - pfl, pfl-name, chip_len); + pfl, pfl-name, chip_len, error_abort); We probably should take the opportunity to error_propagate in these cases, to prepare support for hotplug of devs like this. But I think your blind conversions are a good first step as they will preserve existing behaviour. So lets call that follow up. Regards, Peter vmstate_register_ram(pfl-orig_mem, DEVICE(pfl)); pfl-storage = memory_region_get_ram_ptr(pfl-orig_mem); pfl-chip_len = chip_len; diff --git a/include/exec/memory.h b/include/exec/memory.h index caa988d..71bed47 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -388,13 +388,15 @@ void memory_region_init_alias(MemoryRegion *mr, * @ops: callbacks for write access handling. * @name: the name of the region. * @size: size of the region. + * @errp: pointer to Error*, to store an error if it happens. */ void memory_region_init_rom_device(MemoryRegion *mr, struct Object *owner, const MemoryRegionOps *ops, void *opaque, const char *name, - uint64_t size); + uint64_t size, + Error **errp); /** * memory_region_init_reservation: Initialize a memory region that reserves diff --git a/memory.c b/memory.c index bcebfd8..06a7e1b 100644 --- a/memory.c +++ b/memory.c @@ -1223,7 +1223,8 @@ void memory_region_init_rom_device(MemoryRegion *mr, const MemoryRegionOps *ops, void *opaque, const char *name, - uint64_t size) + uint64_t size, + Error **errp) { memory_region_init(mr, owner, name, size); mr-ops = ops; @@ -1231,7 +1232,7 @@ void memory_region_init_rom_device(MemoryRegion *mr, mr-terminates = true; mr-rom_device = true; mr-destructor = memory_region_destructor_rom_device; -mr-ram_addr = qemu_ram_alloc(size, mr, error_abort); +mr-ram_addr = qemu_ram_alloc(size, mr, errp); } void memory_region_init_iommu(MemoryRegion *mr, -- 1.9.3
Re: [Qemu-devel] [PATCH v5 5/6] hostmem-ram: don't exit qemu if size of memory-backend-ram is way too big
On Wed, Aug 6, 2014 at 3:36 PM, Hu Tao hu...@cn.fujitsu.com wrote: When using monitor command object_add to add a memory backend whose size is way too big to allocate memory for it, qemu just exits. In the case we'd better give an error message and keep guest running. The problem can be reproduced as follows: 1. run qemu 2. (monitor)object_add memory-backend-ram,size=10G,id=ram0 Signed-off-by: Hu Tao hu...@cn.fujitsu.com Reviewed-by: Peter Crosthwaite peter.crosthwa...@xilinx.com --- backends/hostmem-ram.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c index e55d066..a67a134 100644 --- a/backends/hostmem-ram.c +++ b/backends/hostmem-ram.c @@ -27,7 +27,7 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) path = object_get_canonical_path_component(OBJECT(backend)); memory_region_init_ram(backend-mr, OBJECT(backend), path, - backend-size, error_abort); + backend-size, errp); g_free(path); } -- 1.9.3
Re: [Qemu-devel] [PATCH v5 6/6] exec: improve error handling and reporting in file_ram_alloc() and gethugepagesize()
You subject line is excessively long. How about just improve RAM file error handling and elaborate in a commit msg para? On Wed, Aug 6, 2014 at 3:36 PM, Hu Tao hu...@cn.fujitsu.com wrote: This patch fixes two problems of memory-backend-file: It looks like two self contained changes. Any reason to not split? 1. If user adds a memory-backend-file object using object_add command, specifying a non-existing directory for property mem-path, qemu will core dump with message: /nonexistingdir: No such file or directory Bad ram offset f000 Aborted (core dumped) with this patch, qemu reports error message like: qemu-system-x86_64: -object memory-backend-file,mem-path=/nonexistingdir,id=mem-file0,size=128M: failed to stat file /nonexistingdir: No such file or directory 2. If user adds a memory-backend-file object using object_add command, specifying a size that is less than huge page size, qemu will core dump with message: Bad ram offset f000 Aborted (core dumped) with this patch, qemu reports error message like: qemu-system-x86_64: -object memory-backend-file,mem-path=/hugepages,id=mem-file0,size=1M: memory size 0x10 should be euqal or larger than huge page size 0x20 equal. Signed-off-by: Hu Tao hu...@cn.fujitsu.com --- exec.c | 21 - 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/exec.c b/exec.c index 7e60a44..6512820 100644 --- a/exec.c +++ b/exec.c @@ -996,7 +996,7 @@ void qemu_mutex_unlock_ramlist(void) #define HUGETLBFS_MAGIC 0x958458f6 -static long gethugepagesize(const char *path) +static long gethugepagesize(const char *path, Error **errp) { struct statfs fs; int ret; @@ -1006,7 +1006,7 @@ static long gethugepagesize(const char *path) } while (ret != 0 errno == EINTR); if (ret != 0) { -perror(path); +error_setg_errno(errp, errno, failed to get size of file %s, path); I think your error message is imprecise. It's not the file size you are trying to get its the page size for that file (or its underlying file system I think). return 0; } @@ -1024,17 +1024,20 @@ static void *file_ram_alloc(RAMBlock *block, char *filename; char *sanitized_name; char *c; -void *area; +void *area = NULL; int fd; -unsigned long hpagesize; +uint64_t hpagesize; -hpagesize = gethugepagesize(path); -if (!hpagesize) { +hpagesize = gethugepagesize(path, errp); +if (errp *errp) { More flow control dependent on non NULL errp. I think you want a local_err for safety here. goto error; } if (memory hpagesize) { -return NULL; +error_setg(errp, memory size 0x RAM_ADDR_FMT must be euqal to equal Regards, Peter + or larger than huge page size 0x% PRIx64, + memory, hpagesize); +goto error; } if (kvm_enabled() !kvm_has_sync_mmu()) { @@ -1094,8 +1097,8 @@ static void *file_ram_alloc(RAMBlock *block, return area; error: -if (mem_prealloc) { -exit(1); +if (area area != MAP_FAILED) { +munmap(area, memory); } return NULL; } -- 1.9.3
[Qemu-devel] [PATCH V5 0/5] tests: Add the image fuzzer with qcow2 support
This patch series introduces the image fuzzer, a tool for stability and reliability testing. Its approach is to run large amount of tests in background. During every test a program (e.g. qemu-img) is called to read or modify an invalid test image. A test image has valid inner structure defined by its format specification with some fields having random invalid values. Patch 1 contains documentation for the image fuzzer, patch 2 is the test runner and remaining ones relate to the image generator for qcow2 format. This patch series was created for the 'block-next' branch. v4 - v5: Runner: * Added a warning message if a backing file failed to be created (based on the review of Fam Zheng) * Back up a test image before every test command * Fixed always logged messages of a program under test * Added a warning message if a wrong name of a program under test is specified * Made offset and length qemu-io arguments multiple of a sector size * Print all errors to stderr Layout: * Fixed estimation of the feature name table length Fuzz: * Added fuzzing vector for 8bit values Overall: * Missed white spaces (based on reviews of Fam Zheng and Stefan Hajnoczi) * Simplified attribute calls (based on the review of Stefan Hajnoczi) Maria Kustova (5): docs: Specification for the image fuzzer runner: Tool for fuzz tests execution fuzz: Fuzzing functions for qcow2 images layout: Generator of fuzzed qcow2 images package: Public API for image-fuzzer/runner/runner.py tests/image-fuzzer/docs/image-fuzzer.txt | 239 ++ tests/image-fuzzer/qcow2/__init__.py | 1 + tests/image-fuzzer/qcow2/fuzz.py | 327 + tests/image-fuzzer/qcow2/layout.py | 369 tests/image-fuzzer/runner/runner.py | 405 +++ 5 files changed, 1341 insertions(+) create mode 100644 tests/image-fuzzer/docs/image-fuzzer.txt create mode 100644 tests/image-fuzzer/qcow2/__init__.py create mode 100644 tests/image-fuzzer/qcow2/fuzz.py create mode 100644 tests/image-fuzzer/qcow2/layout.py create mode 100755 tests/image-fuzzer/runner/runner.py -- 1.9.3
[Qemu-devel] [PATCH V5 5/5] package: Public API for image-fuzzer/runner/runner.py
__init__.py provides the public API required by the test runner Signed-off-by: Maria Kustova mari...@catit.be --- tests/image-fuzzer/qcow2/__init__.py | 1 + 1 file changed, 1 insertion(+) create mode 100644 tests/image-fuzzer/qcow2/__init__.py diff --git a/tests/image-fuzzer/qcow2/__init__.py b/tests/image-fuzzer/qcow2/__init__.py new file mode 100644 index 000..e2ebe19 --- /dev/null +++ b/tests/image-fuzzer/qcow2/__init__.py @@ -0,0 +1 @@ +from layout import create_image -- 1.9.3
[Qemu-devel] [PATCH V5 1/5] docs: Specification for the image fuzzer
'Overall fuzzer requirements' chapter contains the current product vision and features done and to be done. This chapter is still in progress. Signed-off-by: Maria Kustova mari...@catit.be --- tests/image-fuzzer/docs/image-fuzzer.txt | 239 +++ 1 file changed, 239 insertions(+) create mode 100644 tests/image-fuzzer/docs/image-fuzzer.txt diff --git a/tests/image-fuzzer/docs/image-fuzzer.txt b/tests/image-fuzzer/docs/image-fuzzer.txt new file mode 100644 index 000..efe0ed4 --- /dev/null +++ b/tests/image-fuzzer/docs/image-fuzzer.txt @@ -0,0 +1,239 @@ +# Specification for the fuzz testing tool +# +# Copyright (C) 2014 Maria Kustova mari...@catit.be +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. + + +Image fuzzer + + +Description +--- + +The goal of the image fuzzer is to catch crashes of qemu-io/qemu-img +by providing to them randomly corrupted images. +Test images are generated from scratch and have valid inner structure with some +elements, e.g. L1/L2 tables, having random invalid values. + + +Test runner +--- + +The test runner generates test images, executes tests utilizing generated +images, indicates their results and collects all test related artifacts (logs, +core dumps, test images). +The test means execution of all available commands under test with the same +generated test image. +By default, the test runner generates new tests and executes them until +keyboard interruption. But if a test seed is specified via the '--seed' runner +parameter, then only one test with this seed will be executed, after its finish +the runner will exit. + +The runner uses an external image fuzzer to generate test images. An image +generator should be specified as a mandatory parameter of the test runner. +Details about interactions between the runner and fuzzers see Module +interfaces. + +The runner activates generation of core dumps during test executions, but it +assumes that core dumps will be generated in the current working directory. +For comprehensive test results, please, set up your test environment +properly. + +Paths to binaries under test (SUTs) qemu-img and qemu-io are retrieved from +environment variables. If the environment check fails the runner will +use SUTs installed in system paths. +qemu-img is required for creation of backing files, so it's mandatory to set +the related environment variable if it's not installed in the system path. +For details about environment variables see qemu-iotests/check. + +The runner accepts via the '--config' argument a JSON array of fields expected +to be fuzzed, e.g. + + '[[feature_name_table], [header, l1_table_offset]]' + +Each sublist can be have one or two strings defining image structure elements. +In the latter case a parent element should be placed on the first position, +and a field name on the second one. + +The runner accepts a list of commands under test as a JSON array via +the '--command' argument. Each command is a list containing a SUT and all its +arguments, e.g. + + runner.py -c '[[qemu-io, $test_img, -c, write $off $len]]' + /tmp/test ../qcow2 + +For variable arguments next aliases can be used: +- $test_img for a fuzzed img +- $off for an offset in the fuzzed image +- $len for a data size + +Values for last two aliases will be generated based on the size of the virtal +disk in the fuzzed image. +In case when no commands are specified the runner will execute commands from +the default list: +- qemu-img check +- qemu-img info +- qemu-img convert +- qemu-io -c read +- qemu-io -c write +- qemu-io -c aio_read +- qemu-io -c aio_write +- qemu-io -c flush +- qemu-io -c discard +- qemu-io -c truncate + + +Qcow2 image generator +- + +The 'qcow2' generator is a Python package providing 'create_image' method as +a single public API. See details in 'Test runner/image fuzzer' in 'Module +interfaces'. + +Qcow2 contains two submodules: fuzz.py and layout.py. + +'fuzz.py' contains all fuzzing functions, one per image field. It's assumed +that after code analysis every field will have own constraints for its value. +For now only universal potentially dangerous values are used, e.g. type limits +for integers or unsafe symbols as '%s' for strings. For bitmasks random amount +of bits are set to ones. All fuzzed values are checked on
[Qemu-devel] [PATCH V5 4/5] layout: Generator of fuzzed qcow2 images
The layout submodule of the qcow2 package creates a random valid image, randomly selects some amount of its fields, fuzzes them and write the fuzzed image to the file. Fuzzing process can be controlled by an external configuration. Signed-off-by: Maria Kustova mari...@catit.be --- tests/image-fuzzer/qcow2/layout.py | 369 + 1 file changed, 369 insertions(+) create mode 100644 tests/image-fuzzer/qcow2/layout.py diff --git a/tests/image-fuzzer/qcow2/layout.py b/tests/image-fuzzer/qcow2/layout.py new file mode 100644 index 000..4c08202 --- /dev/null +++ b/tests/image-fuzzer/qcow2/layout.py @@ -0,0 +1,369 @@ +# Generator of fuzzed qcow2 images +# +# Copyright (C) 2014 Maria Kustova mari...@catit.be +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. +# + +import random +import struct +import fuzz + +MAX_IMAGE_SIZE = 10 * (1 20) +# Standard sizes +UINT32_S = 4 +UINT64_S = 8 + + +class Field(object): + +Atomic image element (field). + +The class represents an image field as quadruple of a data format +of value necessary for its packing to binary form, an offset from +the beginning of the image, a value and a name. + +The field can be iterated as a list [format, offset, value]. + + +__slots__ = ('fmt', 'offset', 'value', 'name') + +def __init__(self, fmt, offset, val, name): +self.fmt = fmt +self.offset = offset +self.value = val +self.name = name + +def __iter__(self): +return iter([self.fmt, self.offset, self.value]) + +def __repr__(self): +return Field(fmt='%s', offset=%d, value=%s, name=%s) % \ +(self.fmt, self.offset, str(self.value), self.name) + + +class FieldsList(object): + +List of fields. + +The class allows access to a field in the list by its name and joins +several list in one via in-place addition. + + +def __init__(self, meta_data=None): +if meta_data is None: +self.data = [] +else: +self.data = [Field(f[0], f[1], f[2], f[3]) + for f in meta_data] + +def __getitem__(self, name): +return [x for x in self.data if x.name == name] + +def __iter__(self): +return iter(self.data) + +def __iadd__(self, other): +self.data += other.data +return self + +def __len__(self): +return len(self.data) + + +class Image(object): + + Qcow2 image object. + +This class allows to create qcow2 images with random valid structures and +values, fuzz them via external qcow2.fuzz module and write the result to +a file. + + +@staticmethod +def _size_params(): +Generate a random image size aligned to a random correct +cluster size. + +cluster_bits = random.randrange(9, 21) +cluster_size = 1 cluster_bits +img_size = random.randrange(0, MAX_IMAGE_SIZE + 1, cluster_size) +return (cluster_bits, img_size) + +@staticmethod +def _header(cluster_bits, img_size, backing_file_name=None): +Generate a random valid header. +meta_header = [ +['4s', 0, QFI\xfb, 'magic'], +['I', 4, random.randint(2, 3), 'version'], +['Q', 8, 0, 'backing_file_offset'], +['I', 16, 0, 'backing_file_size'], +['I', 20, cluster_bits, 'cluster_bits'], +['Q', 24, img_size, 'size'], +['I', 32, 0, 'crypt_method'], +['I', 36, 0, 'l1_size'], +['Q', 40, 0, 'l1_table_offset'], +['Q', 48, 0, 'refcount_table_offset'], +['I', 56, 0, 'refcount_table_clusters'], +['I', 60, 0, 'nb_snapshots'], +['Q', 64, 0, 'snapshots_offset'], +['Q', 72, 0, 'incompatible_features'], +['Q', 80, 0, 'compatible_features'], +['Q', 88, 0, 'autoclear_features'], +# Only refcount_order = 4 is supported by current (07.2014) +# implementation of QEMU +['I', 96, 4, 'refcount_order'], +['I', 100, 0, 'header_length'] +] +v_header = FieldsList(meta_header) + +if v_header['version'][0].value == 2: +v_header['header_length'][0].value = 72 +else: +v_header['incompatible_features'][0].value = random.getrandbits(2) +
[Qemu-devel] [PATCH V5 2/5] runner: Tool for fuzz tests execution
The purpose of the test runner is to prepare the test environment (e.g. create a work directory, a test image, etc), execute a program under test with parameters, indicate a test failure if the program was killed during the test execution and collect core dumps, logs and other test artifacts. The test runner doesn't depend on an image format or a program will be tested, so it can be used with any external image generator and program under test. Signed-off-by: Maria Kustova mari...@catit.be --- tests/image-fuzzer/runner/runner.py | 405 1 file changed, 405 insertions(+) create mode 100755 tests/image-fuzzer/runner/runner.py diff --git a/tests/image-fuzzer/runner/runner.py b/tests/image-fuzzer/runner/runner.py new file mode 100755 index 000..3fa7fca --- /dev/null +++ b/tests/image-fuzzer/runner/runner.py @@ -0,0 +1,405 @@ +#!/usr/bin/env python + +# Tool for running fuzz tests +# +# Copyright (C) 2014 Maria Kustova mari...@catit.be +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. +# + +import sys +import os +import signal +import subprocess +import random +import shutil +from itertools import count +import getopt +import StringIO +import resource + +try: +import json +except ImportError: +try: +import simplejson as json +except ImportError: +print sys.stderr, \ +Warning: Module for JSON processing is not found.\n \ +'--config' and '--command' options are not supported. + +# Backing file sizes in MB +MAX_BACKING_FILE_SIZE = 10 +MIN_BACKING_FILE_SIZE = 1 + + +def multilog(msg, *output): + Write an object to all of specified file descriptors. +for fd in output: +fd.write(msg) +fd.flush() + + +def str_signal(sig): + Convert a numeric value of a system signal to the string one +defined by the current operational system. + +for k, v in signal.__dict__.items(): +if v == sig: +return k + + +def run_app(fd, q_args): +Start an application with specified arguments and return its exit code +or kill signal depending on the result of execution. + +devnull = open('/dev/null', 'r+') +process = subprocess.Popen(q_args, stdin=devnull, + stdout=subprocess.PIPE, + stderr=subprocess.PIPE) +out, err = process.communicate() +fd.write(out) +fd.write(err) +return process.returncode + + +class TestException(Exception): +Exception for errors risen by TestEnv objects. +pass + + +class TestEnv(object): + +Test object. + +The class sets up test environment, generates backing and test images +and executes application under tests with specified arguments and a test +image provided. + +All logs are collected. + +The summary log will contain short descriptions and statuses of tests in +a run. + +The test log will include application (e.g. 'qemu-img') logs besides info +sent to the summary log. + + +def __init__(self, test_id, seed, work_dir, run_log, + cleanup=True, log_all=False): +Set test environment in a specified work directory. + +Path to qemu-img and qemu-io will be retrieved from 'QEMU_IMG' and +'QEMU_IO' environment variables. + +if seed is not None: +self.seed = seed +else: +self.seed = str(random.randint(0, sys.maxint)) +random.seed(self.seed) + +self.init_path = os.getcwd() +self.work_dir = work_dir +self.current_dir = os.path.join(work_dir, 'test-' + test_id) +self.qemu_img = os.environ.get('QEMU_IMG', 'qemu-img')\ + .strip().split(' ') +self.qemu_io = os.environ.get('QEMU_IO', 'qemu-io').strip().split(' ') +self.commands = [['qemu-img', 'check', '-f', 'qcow2', '$test_img'], + ['qemu-img', 'info', '-f', 'qcow2', '$test_img'], + ['qemu-io', '$test_img', '-c', 'read $off $len'], + ['qemu-io', '$test_img', '-c', 'write $off $len'], + ['qemu-io', '$test_img', '-c', + 'aio_read $off $len'], + ['qemu-io', '$test_img', '-c', + 'aio_write $off $len'], + ['qemu-io', '$test_img', '-c',
[Qemu-devel] [PATCH V5 3/5] fuzz: Fuzzing functions for qcow2 images
The fuzz submodule of the qcow2 image generator contains fuzzing functions for image fields. Each fuzzing function contains a list of constraints and a call of a helper function that randomly selects a fuzzed value satisfied to one of constraints. For now constraints include only known as invalid or potentially dangerous values. But after investigation of code coverage by fuzz tests they will be expanded by heuristic values based on inner checks and flows of a program under test. Now fuzzing of a header, header extensions and a backing file name is supported. Signed-off-by: Maria Kustova mari...@catit.be --- tests/image-fuzzer/qcow2/fuzz.py | 327 +++ 1 file changed, 327 insertions(+) create mode 100644 tests/image-fuzzer/qcow2/fuzz.py diff --git a/tests/image-fuzzer/qcow2/fuzz.py b/tests/image-fuzzer/qcow2/fuzz.py new file mode 100644 index 000..a53c84f --- /dev/null +++ b/tests/image-fuzzer/qcow2/fuzz.py @@ -0,0 +1,327 @@ +# Fuzzing functions for qcow2 fields +# +# Copyright (C) 2014 Maria Kustova mari...@catit.be +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see http://www.gnu.org/licenses/. +# + +import random + + +UINT8 = 0xff +UINT32 = 0x +UINT64 = 0x +# Most significant bit orders +UINT32_M = 31 +UINT64_M = 63 +# Fuzz vectors +UINT8_V = [0, 0x10, UINT8/4, UINT8/2 - 1, UINT8/2, UINT8/2 + 1, UINT8 - 1, + UINT8] +UINT32_V = [0, 0x100, 0x1000, 0x1, 0x10, UINT32/4, UINT32/2 - 1, +UINT32/2, UINT32/2 + 1, UINT32 - 1, UINT32] +UINT64_V = UINT32_V + [0x100, 0x1000, 0x1, UINT64/4, + UINT64/2 - 1, UINT64/2, UINT64/2 + 1, UINT64 - 1, + UINT64] +STRING_V = ['%s%p%x%d', '.1024d', '%.2049d', '%p%p%p%p', '%x%x%x%x', +'%d%d%d%d', '%s%s%s%s', '%999s', '%08x', '%%20d', '%%20n', +'%%20x', '%%20s', '%s%s%s%s%s%s%s%s%s%s', '%p%p%p%p%p%p%p%p%p%p', +'%#0123456x%08x%x%s%p%d%n%o%u%c%h%l%q%j%z%Z%t%i%e%g%f%a%C%S%08x%%', +'%s x 129', '%x x 257'] + + +def random_from_intervals(intervals): +Select a random integer number from the list of specified intervals. + +Each interval is a tuple of lower and upper limits of the interval. The +limits are included. Intervals in a list should not overlap. + +total = reduce(lambda x, y: x + y[1] - y[0] + 1, intervals, 0) +r = random.randint(0, total - 1) + intervals[0][0] +for x in zip(intervals, intervals[1:]): +r = r + (r x[0][1]) * (x[1][0] - x[0][1] - 1) +return r + + +def random_bits(bit_ranges): +Generate random binary mask with ones in the specified bit ranges. + +Each bit_ranges is a list of tuples of lower and upper limits of bit +positions will be fuzzed. The limits are included. Random amount of bits +in range limits will be set to ones. The mask is returned in decimal +integer format. + +bit_numbers = [] +# Select random amount of random positions in bit_ranges +for rng in bit_ranges: +bit_numbers += random.sample(range(rng[0], rng[1] + 1), + random.randint(0, rng[1] - rng[0] + 1)) +val = 0 +# Set bits on selected positions to ones +for bit in bit_numbers: +val |= 1 bit +return val + + +def truncate_string(strings, length): +Return strings truncated to specified length. +if type(strings) == list: +return [s[:length] for s in strings] +else: +return strings[:length] + + +def validator(current, pick, choices): +Return a value not equal to the current selected by the pick +function from choices. + +while True: +val = pick(choices) +if not val == current: +return val + + +def int_validator(current, intervals): +Return a random value from intervals not equal to the current. + +This function is useful for selection from valid values except current one. + +return validator(current, random_from_intervals, intervals) + + +def bit_validator(current, bit_ranges): +Return a random bit mask not equal to the current. + +This function is useful for selection from valid values except current one. + +return validator(current, random_bits, bit_ranges) + + +def string_validator(current, strings): +Return a random string value from the list not equal to the current. + +This
Re: [Qemu-devel] [edk2] license for binary drivers
Il 06/08/2014 12:34, Laszlo Ersek ha scritto: So no, you can't ship an OVMF binary (or source tarball) that contains the FAT driver, bundled as part of the GPLv2 (+compatible) QEMU distribution, either in source or in binary form. What Laszlo said is mostly my understanding too (IANAL etc.). Just one thing: the GPL does allow you to merely aggregate the OVMF binaries with QEMU sources or QEMU binaries; and a lawyer could also tell you the if QEMU were to ship OVMF binaries in its tarball, it would be mere aggregation. It would then be allowed by the GPL. I wouldn't disagree; the OVMF binaries are just data as far as QEMU is concerned. However, the non-free nature of the OVMF binaries mean that QEMU will never ever ship OVMF binaries until the license is fixed for the offending FAT driver. Not only because we don't want to get into legal minefields, but also because QEMU is free software and wants to keep its distributed releases entirely free. Paolo
Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
On Wed, Aug 6, 2014 at 4:50 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 06/08/2014 10:38, Ming Lei ha scritto: On Wed, Aug 6, 2014 at 3:45 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 06/08/2014 07:33, Ming Lei ha scritto: I played a bit with the following, I hope it's not too naive. I couldn't see a difference with your patches, but at least one reason for this is probably that my laptop SSD isn't fast enough to make the CPU the bottleneck. Haven't tried ramdisk yet, that would probably be the next thing. (I actually wrote the patch up just for some profiling on my own, not for comparing throughput, but it should be usable for that as well.) This might not be good for the test since it is basically a sequential read test, which can be optimized a lot by kernel. And I always use randread benchmark. A microbenchmark already exists in tests/test-coroutine.c, and doesn't really tell us much; it's obvious that coroutines execute more code, the question is why it affects the iops performance. Could you take a look at the coroutine benchmark I worte? The running result shows coroutine does decrease performance a lot compared with bypass coroutine like the patchset is doing. Your benchmark is synchronous, while disk I/O is asynchronous. It can be thought as asynchronous too, since it won't sleep like synchronous I/O. Basically the IO thread is CPU bound type in case of linux-aio since both submission and completion won't block CPU mostly, so my benchmark still fits if we thought the completion as nop. The current problem is that from single coroutine benchmark, looks it doesn't hurt performance with stack switch, but in Kevin's block aio benchmark, bypass coroutine can still obtain observable improvement. Your benchmark doesn't add much compared to time tests/test-coroutine -m perf -p /perf/yield. It takes 8 seconds on my machine, and 10^8 function calls obviously take less than 8 seconds. I've sent a patch to add a baseline function call benchmark to test-coroutine. The sequential read should be the right workload. For fio, you want to get as many iops as possible to QEMU and so you need randread. But qemu-img is not run in a guest and if the kernel optimizes sequential reads then the bypass should have even more benefits because it makes userspace proportionally more expensive. Do you agree with this? Yes, I have posted the result of the benchmark, and looks the result is basically similar with my previous test on dataplane. Thanks,
[Qemu-devel] [PATCH V2 0/3] image-fuzzer: Support L1/L2 tables in the qcow2 image generator
This patch series adds support of L1/L2 tables to the qcow2 image generator. This patch series was created for the 'block-next' branch and based on the next series: [PATCH V5 0/5] tests: Add the image fuzzer with qcow2 support. v1 - v2: * Rebased to the new version of the parent patch series * Fixed wrong maximum number of L2 tables * Fixed missed whitespaces (based on the review of Stefan Hajnoczi) Maria Kustova (3): docs: Expand the list of supported image elements with L1/L2 tables fuzz: Add fuzzing functions for L1/L2 table entries layout: Add generators of L1/L2 tables tests/image-fuzzer/docs/image-fuzzer.txt | 3 +- tests/image-fuzzer/qcow2/fuzz.py | 28 tests/image-fuzzer/qcow2/layout.py | 273 --- 3 files changed, 240 insertions(+), 64 deletions(-) -- 1.9.3
[Qemu-devel] [PATCH V2 1/3] docs: Expand the list of supported image elements with L1/L2 tables
Signed-off-by: Maria Kustova mari...@catit.be --- tests/image-fuzzer/docs/image-fuzzer.txt | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/tests/image-fuzzer/docs/image-fuzzer.txt b/tests/image-fuzzer/docs/image-fuzzer.txt index efe0ed4..2e8e3b9 100644 --- a/tests/image-fuzzer/docs/image-fuzzer.txt +++ b/tests/image-fuzzer/docs/image-fuzzer.txt @@ -125,8 +125,7 @@ If a fuzzer configuration is specified, then it has the next interpretation: will be always fuzzed for every test. This case is useful for regression testing. -For now only header fields and header extensions are generated. - +For now only header fields, header extensions and L1/L2 tables are generated. Module interfaces - -- 1.9.3
[Qemu-devel] [PATCH V2 3/3] layout: Add generators of L1/L2 tables
Valid L2 entries contain offsets to image clusters filled with random data. L2 entries have random positions inside L2 tables. L1 entries contain offsets to generated L2 tables and also have random positions inside the L1 table. Clusters for L1/L2 tables and guest data are selected randomly. Signed-off-by: Maria Kustova mari...@catit.be --- tests/image-fuzzer/qcow2/layout.py | 273 - 1 file changed, 211 insertions(+), 62 deletions(-) diff --git a/tests/image-fuzzer/qcow2/layout.py b/tests/image-fuzzer/qcow2/layout.py index 4c08202..7839d2c 100644 --- a/tests/image-fuzzer/qcow2/layout.py +++ b/tests/image-fuzzer/qcow2/layout.py @@ -19,6 +19,8 @@ import random import struct import fuzz +from math import ceil +from os import urandom MAX_IMAGE_SIZE = 10 * (1 20) # Standard sizes @@ -102,7 +104,66 @@ class Image(object): return (cluster_bits, img_size) @staticmethod -def _header(cluster_bits, img_size, backing_file_name=None): +def _get_available_clusters(used, number): +Return a set of indices of not allocated clusters. + +'used' contains indices of currently allocated clusters. +All clusters that cannot be allocated between 'used' clusters will have +indices appended to the end of 'used'. + +append_id = max(used) + 1 +free = set(range(1, append_id)) - used +if len(free) = number: +return set(random.sample(free, number)) +else: +return free | set(range(append_id, append_id + number - len(free))) + +@staticmethod +def _get_adjacent_clusters(used, size): +Return an index of the first cluster in the sequence of free ones. + +'used' contains indices of currently allocated clusters. 'size' is the +length of the sequence of free clusters. +If the sequence of 'size' is not available between 'used' clusters, its +first index will be append to the end of 'used'. + +def get_cluster_id(lst, length): +Return the first index of the sequence of the specified length +or None if the sequence cannot be inserted in the list. + +if len(lst) != 0: +pairs = [] +pair = (lst[0], 1) +for i in range(1, len(lst)): +if lst[i] == lst[i-1] + 1: +pair = (lst[i], pair[1] + 1) +else: +pairs.append(pair) +pair = (lst[i], 1) +pairs.append(pair) +random.shuffle(pairs) +for x, s in pairs: +if s = length: +return x - length + 1 +return None + +append_id = max(used) + 1 +free = list(set(range(1, append_id)) - used) +idx = get_cluster_id(free, size) +if idx is None: +return append_id +else: +return idx + +@staticmethod +def _alloc_data(img_size, cluster_size): +Return a set of random indices of clusters allocated for guest data. + +num_of_cls = img_size/cluster_size +return set(random.sample(range(1, num_of_cls + 1), + random.randint(0, num_of_cls))) + +def create_header(self, cluster_bits, backing_file_name=None): Generate a random valid header. meta_header = [ ['4s', 0, QFI\xfb, 'magic'], @@ -110,7 +171,7 @@ class Image(object): ['Q', 8, 0, 'backing_file_offset'], ['I', 16, 0, 'backing_file_size'], ['I', 20, cluster_bits, 'cluster_bits'], -['Q', 24, img_size, 'size'], +['Q', 24, self.image_size, 'size'], ['I', 32, 0, 'crypt_method'], ['I', 36, 0, 'l1_size'], ['Q', 40, 0, 'l1_table_offset'], @@ -126,63 +187,59 @@ class Image(object): ['I', 96, 4, 'refcount_order'], ['I', 100, 0, 'header_length'] ] -v_header = FieldsList(meta_header) +self.header = FieldsList(meta_header) -if v_header['version'][0].value == 2: -v_header['header_length'][0].value = 72 +if self.header['version'][0].value == 2: +self.header['header_length'][0].value = 72 else: -v_header['incompatible_features'][0].value = random.getrandbits(2) -v_header['compatible_features'][0].value = random.getrandbits(1) -v_header['header_length'][0].value = 104 - -max_header_len = struct.calcsize(v_header['header_length'][0].fmt) + \ - v_header['header_length'][0].offset +self.header['incompatible_features'][0].value = \ +random.getrandbits(2) +self.header['compatible_features'][0].value = random.getrandbits(1) +
[Qemu-devel] [PATCH V2 2/3] fuzz: Add fuzzing functions for L1/L2 table entries
Signed-off-by: Maria Kustova mari...@catit.be --- tests/image-fuzzer/qcow2/fuzz.py | 28 1 file changed, 28 insertions(+) diff --git a/tests/image-fuzzer/qcow2/fuzz.py b/tests/image-fuzzer/qcow2/fuzz.py index a53c84f..57527f9 100644 --- a/tests/image-fuzzer/qcow2/fuzz.py +++ b/tests/image-fuzzer/qcow2/fuzz.py @@ -325,3 +325,31 @@ def feature_name(current): truncate_string(STRING_V, 46) # Fuzz padding (field length = 46) ] return selector(current, constraints, string_validator) + + +def l1_entry(current): +Fuzz an entry of the L1 table. +constraints = UINT64_V +# Reserved bits are ignored +# Added a possibility when only flags are fuzzed +offset = 0x7fff random.choice([selector(current, + constraints), + current]) +is_cow = random.randint(0, 1) +return offset + (is_cow UINT64_M) + + +def l2_entry(current): +Fuzz an entry of an L2 table. +constraints = UINT64_V +# Reserved bits are ignored +# Add a possibility when only flags are fuzzed +offset = 0x3ffe random.choice([selector(current, + constraints), + current]) +is_compressed = random.randint(0, 1) +is_cow = random.randint(0, 1) +is_zero = random.randint(0, 1) +value = offset + (is_cow UINT64_M) + \ +(is_compressed UINT64_M - 1) + is_zero +return value -- 1.9.3
Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
Am 06.08.2014 um 13:28 hat Ming Lei geschrieben: On Wed, Aug 6, 2014 at 6:09 PM, Kevin Wolf kw...@redhat.com wrote: Am 06.08.2014 um 11:37 hat Ming Lei geschrieben: On Wed, Aug 6, 2014 at 4:48 PM, Kevin Wolf kw...@redhat.com wrote: Am 06.08.2014 um 07:33 hat Ming Lei geschrieben: Hi Kevin, On Tue, Aug 5, 2014 at 10:47 PM, Kevin Wolf kw...@redhat.com wrote: Am 05.08.2014 um 15:48 hat Stefan Hajnoczi geschrieben: I have been wondering how to prove that the root cause is the ucontext coroutine mechanism (stack switching). Here is an idea: Hack your bypass code path to run the request inside a coroutine. That way you can compare bypass without coroutine against bypass with coroutine. Right now I think there are doubts because the bypass code path is indeed a different (and not 100% correct) code path. So this approach might prove that the coroutines are adding the overhead and not something that you bypassed. My doubts aren't only that the overhead might not come from the coroutines, but also whether any coroutine-related overhead is really unavoidable. If we can optimise coroutines, I'd strongly prefer to do just that instead of introducing additional code paths. OK, thank you for taking look at the problem, and hope we can figure out the root cause, :-) Another thought I had was this: If the performance difference is indeed only coroutines, then that is completely inside the block layer and we don't actually need a VM to test it. We could instead have something like a simple qemu-img based benchmark and should be observing the same. Even it is simpler to run a coroutine-only benchmark, and I just wrote a raw one, and looks coroutine does decrease performance a lot, please see the attachment patch, and thanks for your template to help me add the 'co_bench' command in qemu-img. Yes, we can look at coroutines microbenchmarks in isolation. I actually did do that yesterday with the yield test from tests/test-coroutine.c. And in fact profiling immediately showed something to optimise: pthread_getspecific() was quite high, replacing it by __thread on systems where it works is more efficient and helped the numbers a bit. Also, a lot of time seems to be spent in pthread_mutex_lock/unlock (even in qemu-img bench), maybe there's even something that can be done here. The lock/unlock in dataplane is often from memory_region_find(), and Paolo should have done lots of work on that. qemu-img bench doesn't run that code. We have a few more locks that are taken, and one of them (the coroutine pool lock) is avoided by your bypass patches. However, I just wasn't sure whether a change on this level would be relevant in a realistic environment. This is the reason why I wanted to get a benchmark involving the block layer and some I/O. From the profiling data in below link: http://pastebin.com/YwH2uwbq With coroutine, the running time for same loading is increased ~50%(1.325s vs. 0.903s), and dcache load events is increased ~35%(693M vs. 512M), insns per cycle is decreased by ~50%( 1.35 vs. 1.63), compared with bypassing coroutine(-b parameter). The bypass code in the benchmark is very similar with the approach used in the bypass patch, since linux-aio with O_DIRECT seldom blocks in the the kernel I/O path. Maybe the benchmark is a bit extremely, but given modern storage device may reach millions of IOPS, and it is very easy to slow down the I/O by coroutine. I think in order to optimise coroutines, such benchmarks are fair game. It's just not guaranteed that the effects are exactly the same on real workloads, so we should take the results with a grain of salt. Anyhow, the coroutine version of your benchmark is buggy, it leaks all coroutines instead of exiting them, so it can't make any use of the coroutine pool. On my laptop, I get this (where fixed coroutine is a version that simply removes the yield at the end): | bypass| fixed coro| buggy coro +---+---+-- time| 1.09s | 1.10s | 1.62s L1-dcache-loads | 921,836,360 | 932,781,747 | 1,298,067,438 insns per cycle | 2.39 | 2.39 | 1.90 Begs the question whether you see a similar effect on a real qemu and the coroutine pool is still not big enough? With correct use of coroutines, the difference seems to be barely measurable even without any I/O involved. When I comment qemu_coroutine_yield(), looks result of bypass and fixed coro is very similar as your test, and I am just wondering if stack is always switched in qemu_coroutine_enter() without calling qemu_coroutine_yield(). Yes, definitely. qemu_coroutine_enter() always involves
[Qemu-devel] [PATCH] layout: Reduce number of generator functions in __init__
Some issues can be found only when a fuzzed image has a partial structure, e.g. has L1/L2 tables but no refcount ones. Generation of an entirely defined image limits these cases. Now the Image constructor creates only a header and a backing file name (if any), other image elements are generated in the 'create_image' API. This patch series was created for the 'block-next' branch and based on the next series: [PATCH V2 0/3] image-fuzzer: Support L1/L2 tables in the qcow2 image generator Signed-off-by: Maria Kustova mari...@catit.be --- tests/image-fuzzer/qcow2/layout.py | 92 -- 1 file changed, 49 insertions(+), 43 deletions(-) diff --git a/tests/image-fuzzer/qcow2/layout.py b/tests/image-fuzzer/qcow2/layout.py index 7839d2c..98673c4 100644 --- a/tests/image-fuzzer/qcow2/layout.py +++ b/tests/image-fuzzer/qcow2/layout.py @@ -156,6 +156,14 @@ class Image(object): return idx @staticmethod +def _get_cluster_ids(fields_list, cluster_size): +Return indices of clusters allocated by fields of fields_list. +ids = set() +for x in fields_list: +ids.add(x.offset/cluster_size) +return ids + +@staticmethod def _alloc_data(img_size, cluster_size): Return a set of random indices of clusters allocated for guest data. @@ -196,12 +204,12 @@ class Image(object): random.getrandbits(2) self.header['compatible_features'][0].value = random.getrandbits(1) self.header['header_length'][0].value = 104 - -max_header_len = struct.calcsize( +# Extensions starts at the header last field offset and the field size +self.ext_offset = struct.calcsize( self.header['header_length'][0].fmt) + \ self.header['header_length'][0].offset end_of_extension_area_len = 2 * UINT32_S -free_space = self.cluster_size - max_header_len - \ +free_space = self.cluster_size - self.ext_offset - \ end_of_extension_area_len # If the backing file name specified and there is enough space for it # in the first cluster, then it's placed in the very end of the first @@ -224,24 +232,18 @@ class Image(object): [data_fmt, self.header['backing_file_offset'][0].value, backing_file_name, 'bf_name'] ]) -else: -self.backing_file_name = FieldsList() def set_backing_file_format(self, backing_file_fmt=None): Generate the header extension for the backing file format. -self.backing_file_format = FieldsList() -offset = struct.calcsize(self.header['header_length'][0].fmt) + \ - self.header['header_length'][0].offset - if backing_file_fmt is not None: # Calculation of the free space available in the first cluster end_of_extension_area_len = 2 * UINT32_S high_border = (self.header['backing_file_offset'][0].value or (self.cluster_size - 1)) - \ end_of_extension_area_len -free_space = high_border - offset +free_space = high_border - self.ext_offset ext_size = 2 * UINT32_S + ((len(backing_file_fmt) + 7) ~7) if free_space = ext_size: @@ -249,18 +251,19 @@ class Image(object): ext_data_fmt = '' + str(ext_data_len) + 's' ext_padding_len = 7 - (ext_data_len - 1) % 8 self.backing_file_format = FieldsList([ -['I', offset, 0xE2792ACA, 'ext_magic'], -['I', offset + UINT32_S, ext_data_len, 'ext_length'], -[ext_data_fmt, offset + UINT32_S * 2, backing_file_fmt, - 'bf_format'] +['I', self.ext_offset, 0xE2792ACA, 'ext_magic'], +['I', self.ext_offset + UINT32_S, ext_data_len, + 'ext_length'], +[ext_data_fmt, self.ext_offset + UINT32_S * 2, + backing_file_fmt, 'bf_format'] ]) -offset = self.backing_file_format['bf_format'][0].offset + \ - struct.calcsize(self.backing_file_format[ - 'bf_format'][0].fmt) + ext_padding_len - -return offset +self.ext_offset = \ +struct.calcsize( +self.backing_file_format['bf_format'][0].fmt) + \ +ext_padding_len + \ +self.backing_file_format['bf_format'][0].offset -def create_feature_name_table(self, offset): +def create_feature_name_table(self): Generate a random header extension for names of features used in the image. @@ -272,7 +275,7 @@ class Image(object): high_border =
Re: [Qemu-devel] [PATCH v2 1/2] hw/misc/arm_sp810: Create SP810 device
On 06 Aug 2014, at 01:03, Peter Crosthwaite peter.crosthwa...@xilinx.com wrote: On Tue, Aug 5, 2014 at 7:32 PM, Fabian Aggeler aggel...@ethz.ch wrote: This adds a device model for the PrimeXsys System Controller (SP810) which is present in the Versatile Express motherboards. It is so far read-only but allows to set the SCCTRL register. Signed-off-by: Fabian Aggeler aggel...@ethz.ch --- default-configs/arm-softmmu.mak | 1 + hw/misc/Makefile.objs | 1 + hw/misc/arm_sp810.c | 98 + include/hw/misc/arm_sp810.h | 85 +++ 4 files changed, 185 insertions(+) create mode 100644 hw/misc/arm_sp810.c create mode 100644 include/hw/misc/arm_sp810.h diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak index f3513fa..67ba99a 100644 --- a/default-configs/arm-softmmu.mak +++ b/default-configs/arm-softmmu.mak @@ -55,6 +55,7 @@ CONFIG_PL181=y CONFIG_PL190=y CONFIG_PL310=y CONFIG_PL330=y +CONFIG_SP810=y CONFIG_CADENCE=y CONFIG_XGMAC=y CONFIG_EXYNOS4=y diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs index 979e532..49d023b 100644 --- a/hw/misc/Makefile.objs +++ b/hw/misc/Makefile.objs @@ -13,6 +13,7 @@ common-obj-$(CONFIG_PL310) += arm_l2x0.o common-obj-$(CONFIG_INTEGRATOR_DEBUG) += arm_integrator_debug.o common-obj-$(CONFIG_A9SCU) += a9scu.o common-obj-$(CONFIG_ARM11SCU) += arm11scu.o +common-obj-$(CONFIG_SP810) += arm_sp810.o # PKUnity SoC devices common-obj-$(CONFIG_PUV3) += puv3_pm.o diff --git a/hw/misc/arm_sp810.c b/hw/misc/arm_sp810.c new file mode 100644 index 000..21eb816 --- /dev/null +++ b/hw/misc/arm_sp810.c @@ -0,0 +1,98 @@ +/* + * ARM PrimeXsys System Controller (SP810) + * + * Copyright (C) 2014 Fabian Aggeler aggel...@ethz.ch + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include hw/misc/arm_sp810.h + +static const VMStateDescription vmstate_arm_sysctl = { +.name = arm_sp810, +.version_id = 1, +.minimum_version_id = 1, +.fields = (VMStateField[]) { +VMSTATE_UINT32(sc_ctrl, arm_sp810_state), +VMSTATE_END_OF_LIST() +} +}; + +static uint64_t arm_sp810_read(void *opaque, hwaddr offset, unsigned size) +{ +arm_sp810_state *s = opaque; + +switch (offset) { +case SCCTRL: +return s-sc_ctrl; +default: +qemu_log_mask(LOG_UNIMP, +arm_sp810_read: Register not yet implemented: %s\n, +HWADDR_PRIx); +return 0; +} +} + +static void arm_sp810_write(void *opaque, hwaddr offset, +uint64_t val, unsigned size) +{ +switch (offset) { +default: +qemu_log_mask(LOG_UNIMP, +arm_sp810_write: Register not yet implemented: %s\n, +HWADDR_PRIx); +return; +} +} + +static const MemoryRegionOps arm_sp810_ops = { +.read = arm_sp810_read, +.write = arm_sp810_write, +.endianness = DEVICE_NATIVE_ENDIAN, +}; + +static void arm_sp810_init(Object *obj) +{ +arm_sp810_state *s = ARM_SP810(obj); + +memory_region_init_io(s-iomem, obj, arm_sp810_ops, s, arm-sp810, + 0x1000); +sysbus_init_mmio(SYS_BUS_DEVICE(obj), s-iomem); +} + +static Property arm_sp810_properties[] = { +DEFINE_PROP_UINT32(sc-ctrl, arm_sp810_state, sc_ctrl, 0x09), +DEFINE_PROP_END_OF_LIST(), +}; + So it looks to me like the SCCTRL contains multiple register fields for multiple configurable options. Although i'm having trouble getting my head around it as I could not easily find public docco for this core (have a link handy?). Unfortunately not as it seems ARM marked it as obsolete. We found the document offline. I could not find it on the web either. See also http://lists.infradead.org/pipermail/linux-arm-kernel/2011-January/038722.html Indeed SCCTRL has many configuration bits, not just these 4 configuration options. Anyways, the thinking is you should define multiple configurable options as invididual QOM properties, rather than have board level code init registers. This would mean that you DEFINE_PROP_UINT8(timeren0-sel, ...) DEFINE_PROP_UINT8(timeren1-sel, ...) Have a look at hw/dma/pl330.c for an example of invidividual configs getting packed into a single sw visible register. Okay, I will have a look at pl330 then. However these look like software configurable
[Qemu-devel] [Bug 1285708] Re: FreeBSD Guest crash on boot due to xsave instruction issue
** Changed in: linux (Ubuntu Precise) Status: New = Won't Fix -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1285708 Title: FreeBSD Guest crash on boot due to xsave instruction issue Status in QEMU: New Status in qemu-kvm: New Status in “linux” package in Ubuntu: Fix Released Status in “linux” source package in Precise: Won't Fix Status in “linux” source package in Trusty: In Progress Bug description: When trying to boot a working FreeBSD 9.1/9.2 guest on a kvm/qemu host with the following command: kvm -m 256 -cdrom FreeBSD-9.2-RELEASE-amd64-disc1.iso -drive file=FreeBSD-9.2-RELEASE-amd64.qcow2,if=virtio -net nic,model=virtio -net user -nographic -vnc :10 -enable-kvm -balloon virtio -cpu core2duo,+xsave The FreeBSD Guest will kernel crash on boot with the following error: panic: CPU0 does not support X87 or SSE: 0 When launching the guest without the cpu flags, it works just fine. This bug has been resolved in source: https://lkml.org/lkml/2014/2/22/58 Can this fix be included in Precise ASAP! To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1285708/+subscriptions
[Qemu-devel] [PULL 02/11] icount: put icount variables into TimerState.
From: KONRAD Frederic fred.kon...@greensocs.com This puts qemu_icount and qemu_icount_bias into TimerState structure to allow them to be migrated. Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com Reviewed-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- cpus.c | 29 - 1 file changed, 16 insertions(+), 13 deletions(-) diff --git a/cpus.c b/cpus.c index 5e7f2cf..127de1c 100644 --- a/cpus.c +++ b/cpus.c @@ -102,17 +102,12 @@ static bool all_cpu_threads_idle(void) /* Protected by TimersState seqlock */ -/* Compensate for varying guest execution speed. */ -static int64_t qemu_icount_bias; static int64_t vm_clock_warp_start; /* Conversion factor from emulated instructions to virtual clock ticks. */ static int icount_time_shift; /* Arbitrarily pick 1MIPS as the minimum allowable speed. */ #define MAX_ICOUNT_SHIFT 10 -/* Only written by TCG thread */ -static int64_t qemu_icount; - static QEMUTimer *icount_rt_timer; static QEMUTimer *icount_vm_timer; static QEMUTimer *icount_warp_timer; @@ -129,6 +124,11 @@ typedef struct TimersState { int64_t cpu_clock_offset; int32_t cpu_ticks_enabled; int64_t dummy; + +/* Compensate for varying guest execution speed. */ +int64_t qemu_icount_bias; +/* Only written by TCG thread */ +int64_t qemu_icount; } TimersState; static TimersState timers_state; @@ -139,14 +139,14 @@ static int64_t cpu_get_icount_locked(void) int64_t icount; CPUState *cpu = current_cpu; -icount = qemu_icount; +icount = timers_state.qemu_icount; if (cpu) { if (!cpu_can_do_io(cpu)) { fprintf(stderr, Bad clock read\n); } icount -= (cpu-icount_decr.u16.low + cpu-icount_extra); } -return qemu_icount_bias + (icount icount_time_shift); +return timers_state.qemu_icount_bias + (icount icount_time_shift); } int64_t cpu_get_icount(void) @@ -284,7 +284,8 @@ static void icount_adjust(void) icount_time_shift++; } last_delta = delta; -qemu_icount_bias = cur_icount - (qemu_icount icount_time_shift); +timers_state.qemu_icount_bias = cur_icount + - (timers_state.qemu_icount icount_time_shift); seqlock_write_unlock(timers_state.vm_clock_seqlock); } @@ -333,7 +334,7 @@ static void icount_warp_rt(void *opaque) int64_t delta = cur_time - cur_icount; warp_delta = MIN(warp_delta, delta); } -qemu_icount_bias += warp_delta; +timers_state.qemu_icount_bias += warp_delta; } vm_clock_warp_start = -1; seqlock_write_unlock(timers_state.vm_clock_seqlock); @@ -351,7 +352,7 @@ void qtest_clock_warp(int64_t dest) int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL); int64_t warp = qemu_soonest_timeout(dest - clock, deadline); seqlock_write_lock(timers_state.vm_clock_seqlock); -qemu_icount_bias += warp; +timers_state.qemu_icount_bias += warp; seqlock_write_unlock(timers_state.vm_clock_seqlock); qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL); @@ -1250,7 +1251,8 @@ static int tcg_cpu_exec(CPUArchState *env) int64_t count; int64_t deadline; int decr; -qemu_icount -= (cpu-icount_decr.u16.low + cpu-icount_extra); +timers_state.qemu_icount -= (cpu-icount_decr.u16.low ++ cpu-icount_extra); cpu-icount_decr.u16.low = 0; cpu-icount_extra = 0; deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL); @@ -1265,7 +1267,7 @@ static int tcg_cpu_exec(CPUArchState *env) } count = qemu_icount_round(deadline); -qemu_icount += count; +timers_state.qemu_icount += count; decr = (count 0x) ? 0x : count; count -= decr; cpu-icount_decr.u16.low = decr; @@ -1278,7 +1280,8 @@ static int tcg_cpu_exec(CPUArchState *env) if (use_icount) { /* Fold pending instructions back into the instruction counter, and clear the interrupt flag. */ -qemu_icount -= (cpu-icount_decr.u16.low + cpu-icount_extra); +timers_state.qemu_icount -= (cpu-icount_decr.u16.low ++ cpu-icount_extra); cpu-icount_decr.u32 = 0; cpu-icount_extra = 0; } -- 1.9.3
[Qemu-devel] [PULL 00/11] KVM, icount changes for 2014-08-06
The following changes since commit 41a1a9c42c4e0fb5f1b94aa8b72e42f66ebde3d9: po: Update German translation (2014-07-28 23:37:17 +0200) are available in the git repository at: git://github.com/bonzini/qemu.git tags/for-upstream for you to fetch changes up to 92627748f7f7355bb2ea676a45791bd66b84aee0: target-mips: Ignore unassigned accesses with KVM (2014-08-06 17:53:07 +0200) KVM changes include a MIPS patch and the testdev backend used by the ARM kvm-unit-tests. icount include the first part of reverse execution and Sebastian Tanase's patches to slow down -icount execution to the desired speed of the target. James Hogan (1): target-mips: Ignore unassigned accesses with KVM KONRAD Frederic (3): icount: put icount variables into TimerState. migration: migrate icount fields. timer: add cpu_icount_to_ns function. Paolo Bonzini (1): backends: Introduce chr-testdev Sebastian Tanase (6): icount: Fix virtual clock start value on ARM icount: Add QemuOpts for icount icount: Add align option to icount cpu-exec: Add sleeping algorithm cpu-exec: Print to console if the guest is late monitor: Add drift info to 'info jit' backends/Makefile.objs | 2 +- backends/testdev.c | 131 cpu-exec.c | 116 ++ cpus.c | 114 ++--- include/qemu-common.h | 8 ++- include/qemu/timer.h| 2 + include/sysemu/char.h | 3 ++ monitor.c | 1 + qapi-schema.json| 3 +- qemu-char.c | 4 ++ qemu-options.hx | 17 +-- qtest.c | 13 - stubs/Makefile.objs | 1 + stubs/chr-testdev.c | 7 +++ target-mips/op_helper.c | 11 vl.c| 39 +++--- 16 files changed, 440 insertions(+), 32 deletions(-) create mode 100644 backends/testdev.c create mode 100644 stubs/chr-testdev.c -- 1.9.3
[Qemu-devel] [PULL 03/11] migration: migrate icount fields.
From: KONRAD Frederic fred.kon...@greensocs.com This fixes a bug where qemu_icount and qemu_icount_bias are not migrated. It adds a subsection timer/icount to vmstate_timers so icount is migrated only when needed. Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com Reviewed-by: Amit Shah amit.s...@redhat.com Reviewed-by: Juan Quintela quint...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- cpus.c | 27 +++ 1 file changed, 27 insertions(+) diff --git a/cpus.c b/cpus.c index 127de1c..a6b6557 100644 --- a/cpus.c +++ b/cpus.c @@ -429,6 +429,25 @@ void qemu_clock_warp(QEMUClockType type) } } +static bool icount_state_needed(void *opaque) +{ +return use_icount; +} + +/* + * This is a subsection for icount migration. + */ +static const VMStateDescription icount_vmstate_timers = { +.name = timer/icount, +.version_id = 1, +.minimum_version_id = 1, +.fields = (VMStateField[]) { +VMSTATE_INT64(qemu_icount_bias, TimersState), +VMSTATE_INT64(qemu_icount, TimersState), +VMSTATE_END_OF_LIST() +} +}; + static const VMStateDescription vmstate_timers = { .name = timer, .version_id = 2, @@ -438,6 +457,14 @@ static const VMStateDescription vmstate_timers = { VMSTATE_INT64(dummy, TimersState), VMSTATE_INT64_V(cpu_clock_offset, TimersState, 2), VMSTATE_END_OF_LIST() +}, +.subsections = (VMStateSubsection[]) { +{ +.vmsd = icount_vmstate_timers, +.needed = icount_state_needed, +}, { +/* empty */ +} } }; -- 1.9.3
[Qemu-devel] [PULL 08/11] cpu-exec: Add sleeping algorithm
From: Sebastian Tanase sebastian.tan...@openwide.fr The goal is to sleep qemu whenever the guest clock is in advance compared to the host clock (we use the monotonic clocks). The amount of time to sleep is calculated in the execution loop in cpu_exec. At first, we tried to approximate at each for loop the real time elapsed while searching for a TB (generating or retrieving from cache) and executing it. We would then approximate the virtual time corresponding to the number of virtual instructions executed. The difference between these 2 values would allow us to know if the guest is in advance or delayed. However, the function used for measuring the real time (qemu_clock_get_ns(QEMU_CLOCK_REALTIME)) proved to be very expensive. We had an added overhead of 13% of the total run time. Therefore, we modified the algorithm and only take into account the difference between the 2 clocks at the begining of the cpu_exec function. During the for loop we try to reduce the advance of the guest only by computing the virtual time elapsed and sleeping if necessary. The overhead is thus reduced to 3%. Even though this method still has a noticeable overhead, it no longer is a bottleneck in trying to achieve a better guest frequency for which the guest clock is faster than the host one. As for the the alignement of the 2 clocks, with the first algorithm the guest clock was oscillating between -1 and 1ms compared to the host clock. Using the second algorithm we notice that the guest is 5ms behind the host, which is still acceptable for our use case. The tests where conducted using fio and stress. The host machine in an i5 CPU at 3.10GHz running Debian Jessie (kernel 3.12). The guest machine is an arm versatile-pb built with buildroot. Currently, on our test machine, the lowest icount we can achieve that is suitable for aligning the 2 clocks is 6. However, we observe that the IO tests (using fio) are slower than the cpu tests (using stress). Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr Tested-by: Camille Bégué camille.be...@openwide.fr Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- cpu-exec.c | 79 cpus.c | 17 +++ include/qemu/timer.h | 1 + 3 files changed, 97 insertions(+) diff --git a/cpu-exec.c b/cpu-exec.c index 38e5f02..68f82b6 100644 --- a/cpu-exec.c +++ b/cpu-exec.c @@ -22,6 +22,72 @@ #include tcg.h #include qemu/atomic.h #include sysemu/qtest.h +#include qemu/timer.h + +/* -icount align implementation. */ + +typedef struct SyncClocks { +int64_t diff_clk; +int64_t last_cpu_icount; +} SyncClocks; + +#if !defined(CONFIG_USER_ONLY) +/* Allow the guest to have a max 3ms advance. + * The difference between the 2 clocks could therefore + * oscillate around 0. + */ +#define VM_CLOCK_ADVANCE 300 + +static void align_clocks(SyncClocks *sc, const CPUState *cpu) +{ +int64_t cpu_icount; + +if (!icount_align_option) { +return; +} + +cpu_icount = cpu-icount_extra + cpu-icount_decr.u16.low; +sc-diff_clk += cpu_icount_to_ns(sc-last_cpu_icount - cpu_icount); +sc-last_cpu_icount = cpu_icount; + +if (sc-diff_clk VM_CLOCK_ADVANCE) { +#ifndef _WIN32 +struct timespec sleep_delay, rem_delay; +sleep_delay.tv_sec = sc-diff_clk / 10LL; +sleep_delay.tv_nsec = sc-diff_clk % 10LL; +if (nanosleep(sleep_delay, rem_delay) 0) { +sc-diff_clk -= (sleep_delay.tv_sec - rem_delay.tv_sec) * 10LL; +sc-diff_clk -= sleep_delay.tv_nsec - rem_delay.tv_nsec; +} else { +sc-diff_clk = 0; +} +#else +Sleep(sc-diff_clk / SCALE_MS); +sc-diff_clk = 0; +#endif +} +} + +static void init_delay_params(SyncClocks *sc, + const CPUState *cpu) +{ +if (!icount_align_option) { +return; +} +sc-diff_clk = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - + qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + + cpu_get_clock_offset(); +sc-last_cpu_icount = cpu-icount_extra + cpu-icount_decr.u16.low; +} +#else +static void align_clocks(SyncClocks *sc, const CPUState *cpu) +{ +} + +static void init_delay_params(SyncClocks *sc, const CPUState *cpu) +{ +} +#endif /* CONFIG USER ONLY */ void cpu_loop_exit(CPUState *cpu) { @@ -227,6 +293,8 @@ int cpu_exec(CPUArchState *env) TranslationBlock *tb; uint8_t *tc_ptr; uintptr_t next_tb; +SyncClocks sc; + /* This must be volatile so it is not trashed by longjmp() */ volatile bool have_tb_lock = false; @@ -283,6 +351,13 @@ int cpu_exec(CPUArchState *env) #endif cpu-exception_index = -1; +/* Calculate difference between guest clock and host clock. + * This delay includes the delay of the last cycle, so + * what we have to do is sleep until it is 0. As for the + * advance/delay we gain here, we try to fix it next time. +
[Qemu-devel] [PULL 05/11] icount: Fix virtual clock start value on ARM
From: Sebastian Tanase sebastian.tan...@openwide.fr When using the icount option on ARM, the virtual clock starts counting at realtime clock but it should start at 0. The reason why the virtual clock starts at realtime clock is because the first time we call qemu_clock_warp (which calls icount_warp_rt) in tcg_exec_all, qemu_icount_bias (which is part of the virtual time computation mechanism) will increment by realtime - vm_clock_warp_start, with vm_clock_warp_start being 0 (see icount_warp_rt in cpus.c). By changing the value of vm_clock_warp_start from 0 to -1, the first time we call qemu_clock_warp which calls icount_warp_rt, we will return immediatly because icount_warp_rt first checks if vm_clock_warp_start is -1 and if it's the case it returns. Therefore, qemu_icount_bias will first be incremented by the value of a virtual timer deadline when the virtual cpu goes from active to inactive. The virtual time will start at 0 and increment based on the instruction counter when the vcpu is active or the qemu_icount_bias value when inactive. Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- cpus.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cpus.c b/cpus.c index 62636a6..bbb8d4e 100644 --- a/cpus.c +++ b/cpus.c @@ -102,7 +102,7 @@ static bool all_cpu_threads_idle(void) /* Protected by TimersState seqlock */ -static int64_t vm_clock_warp_start; +static int64_t vm_clock_warp_start = -1; /* Conversion factor from emulated instructions to virtual clock ticks. */ static int icount_time_shift; /* Arbitrarily pick 1MIPS as the minimum allowable speed. */ -- 1.9.3
[Qemu-devel] [PULL 04/11] timer: add cpu_icount_to_ns function.
From: KONRAD Frederic fred.kon...@greensocs.com This adds cpu_icount_to_ns function which is needed for reverse execution. It returns the time for a specific instruction. Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com Reviewed-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- cpus.c | 7 ++- include/qemu/timer.h | 1 + 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/cpus.c b/cpus.c index a6b6557..62636a6 100644 --- a/cpus.c +++ b/cpus.c @@ -146,7 +146,7 @@ static int64_t cpu_get_icount_locked(void) } icount -= (cpu-icount_decr.u16.low + cpu-icount_extra); } -return timers_state.qemu_icount_bias + (icount icount_time_shift); +return timers_state.qemu_icount_bias + cpu_icount_to_ns(icount); } int64_t cpu_get_icount(void) @@ -162,6 +162,11 @@ int64_t cpu_get_icount(void) return icount; } +int64_t cpu_icount_to_ns(int64_t icount) +{ +return icount icount_time_shift; +} + /* return the host CPU cycle counter and handle stop/restart */ /* Caller must hold the BQL */ int64_t cpu_get_ticks(void) diff --git a/include/qemu/timer.h b/include/qemu/timer.h index 7f9a074..e12c714 100644 --- a/include/qemu/timer.h +++ b/include/qemu/timer.h @@ -745,6 +745,7 @@ static inline int64_t get_clock(void) /* icount */ int64_t cpu_get_icount(void); int64_t cpu_get_clock(void); +int64_t cpu_icount_to_ns(int64_t icount); /***/ /* host CPU ticks (if available) */ -- 1.9.3
[Qemu-devel] [PULL 06/11] icount: Add QemuOpts for icount
From: Sebastian Tanase sebastian.tan...@openwide.fr Make icount parameter use QemuOpts style options in order to easily add other suboptions. Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr Tested-by: Camille Bégué camille.be...@openwide.fr Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- cpus.c| 10 +- include/qemu-common.h | 3 ++- qemu-options.hx | 4 ++-- qtest.c | 13 +++-- vl.c | 35 --- 5 files changed, 52 insertions(+), 13 deletions(-) diff --git a/cpus.c b/cpus.c index bbb8d4e..8291044 100644 --- a/cpus.c +++ b/cpus.c @@ -473,13 +473,21 @@ static const VMStateDescription vmstate_timers = { } }; -void configure_icount(const char *option) +void configure_icount(QemuOpts *opts, Error **errp) { +const char *option; + seqlock_init(timers_state.vm_clock_seqlock, NULL); vmstate_register(NULL, 0, vmstate_timers, timers_state); +option = qemu_opt_get(opts, shift); if (!option) { return; } +/* When using -icount shift, the shift option will be + misinterpreted as a boolean */ +if (strcmp(option, on) == 0 || strcmp(option, off) == 0) { +error_setg(errp, The shift option must be a number or auto); +} icount_warp_timer = timer_new_ns(QEMU_CLOCK_REALTIME, icount_warp_rt, NULL); diff --git a/include/qemu-common.h b/include/qemu-common.h index 6ef8282..04b0769 100644 --- a/include/qemu-common.h +++ b/include/qemu-common.h @@ -41,6 +41,7 @@ #include assert.h #include signal.h #include glib-compat.h +#include qemu/option.h #ifdef _WIN32 #include sysemu/os-win32.h @@ -105,7 +106,7 @@ static inline char *realpath(const char *path, char *resolved_path) #endif /* icount */ -void configure_icount(const char *option); +void configure_icount(QemuOpts *opts, Error **errp); extern int use_icount; #include qemu/osdep.h diff --git a/qemu-options.hx b/qemu-options.hx index 1549625..5a1b001 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -3011,11 +3011,11 @@ re-inject them. ETEXI DEF(icount, HAS_ARG, QEMU_OPTION_icount, \ --icount [N|auto]\n \ +-icount [shift=N|auto]\n \ enable virtual instruction counter with 2^N clock ticks per\n \ instruction\n, QEMU_ARCH_ALL) STEXI -@item -icount [@var{N}|auto] +@item -icount [shift=@var{N}|auto] @findex -icount Enable virtual instruction counter. The virtual cpu will execute one instruction every 2^@var{N} ns of virtual time. If @code{auto} is specified diff --git a/qtest.c b/qtest.c index 04a6dc1..ef0d991 100644 --- a/qtest.c +++ b/qtest.c @@ -19,6 +19,9 @@ #include hw/irq.h #include sysemu/sysemu.h #include sysemu/cpus.h +#include qemu/config-file.h +#include qemu/option.h +#include qemu/error-report.h #define MAX_IRQ 256 @@ -509,10 +512,16 @@ static void qtest_event(void *opaque, int event) } } -int qtest_init_accel(MachineClass *mc) +static void configure_qtest_icount(const char *options) { -configure_icount(0); +QemuOpts *opts = qemu_opts_parse(qemu_find_opts(icount), options, 1); +configure_icount(opts, error_abort); +qemu_opts_del(opts); +} +int qtest_init_accel(MachineClass *mc) +{ +configure_qtest_icount(0); return 0; } diff --git a/vl.c b/vl.c index fe451aa..f2621a5 100644 --- a/vl.c +++ b/vl.c @@ -537,6 +537,20 @@ static QemuOptsList qemu_mem_opts = { }, }; +static QemuOptsList qemu_icount_opts = { +.name = icount, +.implied_opt_name = shift, +.merge_lists = true, +.head = QTAILQ_HEAD_INITIALIZER(qemu_icount_opts.head), +.desc = { +{ +.name = shift, +.type = QEMU_OPT_STRING, +}, +{ /* end of list */ } +}, +}; + /** * Get machine options * @@ -2908,13 +2922,12 @@ int main(int argc, char **argv, char **envp) { int i; int snapshot, linux_boot; -const char *icount_option = NULL; const char *initrd_filename; const char *kernel_filename, *kernel_cmdline; const char *boot_order; DisplayState *ds; int cyls, heads, secs, translation; -QemuOpts *hda_opts = NULL, *opts, *machine_opts; +QemuOpts *hda_opts = NULL, *opts, *machine_opts, *icount_opts = NULL; QemuOptsList *olist; int optind; const char *optarg; @@ -2979,6 +2992,7 @@ int main(int argc, char **argv, char **envp) qemu_add_opts(qemu_msg_opts); qemu_add_opts(qemu_name_opts); qemu_add_opts(qemu_numa_opts); +qemu_add_opts(qemu_icount_opts); runstate_init(); @@ -3830,7 +3844,11 @@ int main(int argc, char **argv, char **envp) } break; case QEMU_OPTION_icount: -icount_option = optarg; +icount_opts = qemu_opts_parse(qemu_find_opts(icount), + optarg, 1); +if
[Qemu-devel] [PULL 09/11] cpu-exec: Print to console if the guest is late
From: Sebastian Tanase sebastian.tan...@openwide.fr If the align option is enabled, we print to the user whenever the guest clock is behind the host clock in order for he/she to have a hint about the actual performance. The maximum print interval is 2s and we limit the number of messages to 100. If desired, this can be changed in cpu-exec.c Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr Tested-by: Camille Bégué camille.be...@openwide.fr Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- cpu-exec.c | 33 - 1 file changed, 32 insertions(+), 1 deletion(-) diff --git a/cpu-exec.c b/cpu-exec.c index 68f82b6..3c14502 100644 --- a/cpu-exec.c +++ b/cpu-exec.c @@ -29,6 +29,7 @@ typedef struct SyncClocks { int64_t diff_clk; int64_t last_cpu_icount; +int64_t realtime_clock; } SyncClocks; #if !defined(CONFIG_USER_ONLY) @@ -37,6 +38,9 @@ typedef struct SyncClocks { * oscillate around 0. */ #define VM_CLOCK_ADVANCE 300 +#define THRESHOLD_REDUCE 1.5 +#define MAX_DELAY_PRINT_RATE 20LL +#define MAX_NB_PRINTS 100 static void align_clocks(SyncClocks *sc, const CPUState *cpu) { @@ -68,16 +72,43 @@ static void align_clocks(SyncClocks *sc, const CPUState *cpu) } } +static void print_delay(const SyncClocks *sc) +{ +static float threshold_delay; +static int64_t last_realtime_clock; +static int nb_prints; + +if (icount_align_option +sc-realtime_clock - last_realtime_clock = MAX_DELAY_PRINT_RATE +nb_prints MAX_NB_PRINTS) { +if ((-sc-diff_clk / (float)10LL threshold_delay) || +(-sc-diff_clk / (float)10LL + (threshold_delay - THRESHOLD_REDUCE))) { +threshold_delay = (-sc-diff_clk / 10LL) + 1; +printf(Warning: The guest is now late by %.1f to %.1f seconds\n, + threshold_delay - 1, + threshold_delay); +nb_prints++; +last_realtime_clock = sc-realtime_clock; +} +} +} + static void init_delay_params(SyncClocks *sc, const CPUState *cpu) { if (!icount_align_option) { return; } +sc-realtime_clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); sc-diff_clk = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - - qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + + sc-realtime_clock + cpu_get_clock_offset(); sc-last_cpu_icount = cpu-icount_extra + cpu-icount_decr.u16.low; + +/* Print every 2s max if the guest is late. We limit the number + of printed messages to NB_PRINT_MAX(currently 100) */ +print_delay(sc); } #else static void align_clocks(SyncClocks *sc, const CPUState *cpu) -- 1.9.3
[Qemu-devel] [PULL 01/11] backends: Introduce chr-testdev
From: Paolo Bonzini pbonz...@redhat.com chr-testdev enables a virtio serial channel to be used for guest initiated qemu exits. hw/misc/debugexit already enables guest initiated qemu exits, but only for PC targets. chr-testdev supports any virtio-capable target. kvm-unit-tests/arm is already making use of this backend. Currently there is a single command implemented, q. It takes a (prefix) argument for the exit code, thus an exit is implemented by writing, e.g. 1q, to the virtio-serial port. It can be used as: $QEMU ... \ -device virtio-serial-device \ -device virtserialport,chardev=ctd -chardev testdev,id=ctd or, use: $QEMU ... \ -device virtio-serial-device \ -device virtconsole,chardev=ctd -chardev testdev,id=ctd to bind it to virtio-serial port0. Signed-off-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Andrew Jones drjo...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- backends/Makefile.objs | 2 +- backends/testdev.c | 131 + include/sysemu/char.h | 3 ++ qapi-schema.json | 3 +- qemu-char.c| 4 ++ stubs/Makefile.objs| 1 + stubs/chr-testdev.c| 7 +++ 7 files changed, 149 insertions(+), 2 deletions(-) create mode 100644 backends/testdev.c create mode 100644 stubs/chr-testdev.c diff --git a/backends/Makefile.objs b/backends/Makefile.objs index 506a46c..31a3a89 100644 --- a/backends/Makefile.objs +++ b/backends/Makefile.objs @@ -1,7 +1,7 @@ common-obj-y += rng.o rng-egd.o common-obj-$(CONFIG_POSIX) += rng-random.o -common-obj-y += msmouse.o +common-obj-y += msmouse.o testdev.o common-obj-$(CONFIG_BRLAPI) += baum.o baum.o-cflags := $(SDL_CFLAGS) diff --git a/backends/testdev.c b/backends/testdev.c new file mode 100644 index 000..70d63b3 --- /dev/null +++ b/backends/testdev.c @@ -0,0 +1,131 @@ +/* + * QEMU Char Device for testsuite control + * + * Copyright (c) 2014 Red Hat, Inc. + * + * Author: Paolo Bonzini pbonz...@redhat.com + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the Software), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ +#include qemu-common.h +#include sysemu/char.h + +#define BUF_SIZE 32 + +typedef struct { +CharDriverState *chr; +uint8_t in_buf[32]; +int in_buf_used; +} TestdevCharState; + +/* Try to interpret a whole incoming packet */ +static int testdev_eat_packet(TestdevCharState *testdev) +{ +const uint8_t *cur = testdev-in_buf; +int len = testdev-in_buf_used; +uint8_t c; +int arg; + +#define EAT(c) do { \ +if (!len--) { \ +return 0; \ +} \ +c = *cur++; \ +} while (0) + +EAT(c); + +while (isspace(c)) { +EAT(c); +} + +arg = 0; +while (isdigit(c)) { +arg = arg * 10 + c - '0'; +EAT(c); +} + +while (isspace(c)) { +EAT(c); +} + +switch (c) { +case 'q': +exit((arg 1) | 1); +break; +default: +break; +} +return cur - testdev-in_buf; +} + +/* The other end is writing some data. Store it and try to interpret */ +static int testdev_write(CharDriverState *chr, const uint8_t *buf, int len) +{ +TestdevCharState *testdev = chr-opaque; +int tocopy, eaten, orig_len = len; + +while (len) { +/* Complete our buffer as much as possible */ +tocopy = MIN(len, BUF_SIZE - testdev-in_buf_used); + +memcpy(testdev-in_buf + testdev-in_buf_used, buf, tocopy); +testdev-in_buf_used += tocopy; +buf += tocopy; +len -= tocopy; + +/* Interpret it as much as possible */ +while (testdev-in_buf_used 0 + (eaten = testdev_eat_packet(testdev)) 0) { +memmove(testdev-in_buf, testdev-in_buf + eaten, +testdev-in_buf_used - eaten); +testdev-in_buf_used -= eaten; +} +} +return orig_len; +} + +static void testdev_close(struct CharDriverState *chr) +{ +TestdevCharState
[Qemu-devel] [PULL 07/11] icount: Add align option to icount
From: Sebastian Tanase sebastian.tan...@openwide.fr The align option is used for activating the align algorithm in order to synchronise the host clock and the guest clock. Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr Tested-by: Camille Bégué camille.be...@openwide.fr Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- cpus.c| 19 --- include/qemu-common.h | 1 + qemu-options.hx | 15 +-- vl.c | 4 4 files changed, 30 insertions(+), 9 deletions(-) diff --git a/cpus.c b/cpus.c index 8291044..7e09538 100644 --- a/cpus.c +++ b/cpus.c @@ -476,25 +476,30 @@ static const VMStateDescription vmstate_timers = { void configure_icount(QemuOpts *opts, Error **errp) { const char *option; +char *rem_str = NULL; seqlock_init(timers_state.vm_clock_seqlock, NULL); vmstate_register(NULL, 0, vmstate_timers, timers_state); option = qemu_opt_get(opts, shift); if (!option) { +if (qemu_opt_get(opts, align) != NULL) { +error_setg(errp, Please specify shift option when using align); +} return; } -/* When using -icount shift, the shift option will be - misinterpreted as a boolean */ -if (strcmp(option, on) == 0 || strcmp(option, off) == 0) { -error_setg(errp, The shift option must be a number or auto); -} - +icount_align_option = qemu_opt_get_bool(opts, align, false); icount_warp_timer = timer_new_ns(QEMU_CLOCK_REALTIME, icount_warp_rt, NULL); if (strcmp(option, auto) != 0) { -icount_time_shift = strtol(option, NULL, 0); +errno = 0; +icount_time_shift = strtol(option, rem_str, 0); +if (errno != 0 || *rem_str != '\0' || !strlen(option)) { +error_setg(errp, icount: Invalid shift value); +} use_icount = 1; return; +} else if (icount_align_option) { +error_setg(errp, shift=auto and align=on are incompatible); } use_icount = 2; diff --git a/include/qemu-common.h b/include/qemu-common.h index 04b0769..5d10ac2 100644 --- a/include/qemu-common.h +++ b/include/qemu-common.h @@ -108,6 +108,7 @@ static inline char *realpath(const char *path, char *resolved_path) /* icount */ void configure_icount(QemuOpts *opts, Error **errp); extern int use_icount; +extern int icount_align_option; #include qemu/osdep.h #include qemu/bswap.h diff --git a/qemu-options.hx b/qemu-options.hx index 5a1b001..96516c1 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -3011,9 +3011,9 @@ re-inject them. ETEXI DEF(icount, HAS_ARG, QEMU_OPTION_icount, \ --icount [shift=N|auto]\n \ +-icount [shift=N|auto][,align=on|off]\n \ enable virtual instruction counter with 2^N clock ticks per\n \ -instruction\n, QEMU_ARCH_ALL) +instruction and enable aligning the host and virtual clocks\n, QEMU_ARCH_ALL) STEXI @item -icount [shift=@var{N}|auto] @findex -icount @@ -3026,6 +3026,17 @@ Note that while this option can give deterministic behavior, it does not provide cycle accurate emulation. Modern CPUs contain superscalar out of order cores with complex cache hierarchies. The number of instructions executed often has little or no correlation with actual performance. + +@option{align=on} will activate the delay algorithm which will try to +to synchronise the host clock and the virtual clock. The goal is to +have a guest running at the real frequency imposed by the shift option. +Whenever the guest clock is behind the host clock and if +@option{align=on} is specified then we print a messsage to the user +to inform about the delay. +Currently this option does not work when @option{shift} is @code{auto}. +Note: The sync algorithm will work for those shift values for which +the guest clock runs ahead of the host clock. Typically this happens +when the shift value is high (how high depends on the host machine). ETEXI DEF(watchdog, HAS_ARG, QEMU_OPTION_watchdog, \ diff --git a/vl.c b/vl.c index f2621a5..a8029d5 100644 --- a/vl.c +++ b/vl.c @@ -183,6 +183,7 @@ uint8_t *boot_splash_filedata; size_t boot_splash_filedata_size; uint8_t qemu_extra_params_fw[2]; +int icount_align_option; typedef struct FWBootEntry FWBootEntry; struct FWBootEntry { @@ -546,6 +547,9 @@ static QemuOptsList qemu_icount_opts = { { .name = shift, .type = QEMU_OPT_STRING, +}, { +.name = align, +.type = QEMU_OPT_BOOL, }, { /* end of list */ } }, -- 1.9.3
[Qemu-devel] [PULL 11/11] target-mips: Ignore unassigned accesses with KVM
From: James Hogan james.ho...@imgtec.com MIPS registers an unassigned access handler which raises a guest bus error exception. However this causes QEMU to crash when KVM is enabled as it isn't called from the main execution loop so longjmp() gets called without a corresponding setjmp(). Until the KVM API can be updated to trigger a guest exception in response to an MMIO exit, prevent the bus error exception being raised from mips_cpu_unassigned_access() if KVM is enabled. The check is at run time since the do_unassigned_access callback is initialised before it is known whether KVM will be enabled. The problem can be triggered with Malta emulation by making the guest write to the reset region at physical address 0x1bf0, since it is marked read-only which is treated as unassigned for writes. Signed-off-by: James Hogan james.ho...@imgtec.com Reviewed-by: Aurelien Jarno aurel...@aurel32.net Cc: Peter Maydell peter.mayd...@linaro.org Cc: Paolo Bonzini pbonz...@redhat.com Cc: Gleb Natapov g...@redhat.com Cc: Christoffer Dall christoffer.d...@linaro.org Cc: Sanjay Lal sanj...@kymasys.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- target-mips/op_helper.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c index 27651a4..df97b35 100644 --- a/target-mips/op_helper.c +++ b/target-mips/op_helper.c @@ -21,6 +21,7 @@ #include qemu/host-utils.h #include exec/helper-proto.h #include exec/cpu_ldst.h +#include sysemu/kvm.h #ifndef CONFIG_USER_ONLY static inline void cpu_mips_tlb_flush (CPUMIPSState *env, int flush_global); @@ -2168,6 +2169,16 @@ void mips_cpu_unassigned_access(CPUState *cs, hwaddr addr, MIPSCPU *cpu = MIPS_CPU(cs); CPUMIPSState *env = cpu-env; +/* + * Raising an exception with KVM enabled will crash because it won't be from + * the main execution loop so the longjmp won't have a matching setjmp. + * Until we can trigger a bus error exception through KVM lets just ignore + * the access. + */ +if (kvm_enabled()) { +return; +} + if (is_exec) { helper_raise_exception(env, EXCP_IBE); } else { -- 1.9.3
[Qemu-devel] [PULL 10/11] monitor: Add drift info to 'info jit'
From: Sebastian Tanase sebastian.tan...@openwide.fr Show in 'info jit' the current delay between the host clock and the guest clock. In addition, print the maximum advance and delay of the guest compared to the host. Signed-off-by: Sebastian Tanase sebastian.tan...@openwide.fr Tested-by: Camille Bégué camille.be...@openwide.fr Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- cpu-exec.c| 6 ++ cpus.c| 15 +++ include/qemu-common.h | 4 monitor.c | 1 + 4 files changed, 26 insertions(+) diff --git a/cpu-exec.c b/cpu-exec.c index 3c14502..cbc8067 100644 --- a/cpu-exec.c +++ b/cpu-exec.c @@ -105,6 +105,12 @@ static void init_delay_params(SyncClocks *sc, sc-realtime_clock + cpu_get_clock_offset(); sc-last_cpu_icount = cpu-icount_extra + cpu-icount_decr.u16.low; +if (sc-diff_clk max_delay) { +max_delay = sc-diff_clk; +} +if (sc-diff_clk max_advance) { +max_advance = sc-diff_clk; +} /* Print every 2s max if the guest is late. We limit the number of printed messages to NB_PRINT_MAX(currently 100) */ diff --git a/cpus.c b/cpus.c index 19245e9..a5f78c1 100644 --- a/cpus.c +++ b/cpus.c @@ -64,6 +64,8 @@ #endif /* CONFIG_LINUX */ static CPUState *next_cpu; +int64_t max_delay; +int64_t max_advance; bool cpu_is_stopped(CPUState *cpu) { @@ -1552,3 +1554,16 @@ void qmp_inject_nmi(Error **errp) error_set(errp, QERR_UNSUPPORTED); #endif } + +void dump_drift_info(FILE *f, fprintf_function cpu_fprintf) +{ +cpu_fprintf(f, Host - Guest clock %ld(ms)\n, +(cpu_get_clock() - cpu_get_icount())/SCALE_MS); +if (icount_align_option) { +cpu_fprintf(f, Max guest delay %ld(ms)\n, -max_delay/SCALE_MS); +cpu_fprintf(f, Max guest advance %ld(ms)\n, max_advance/SCALE_MS); +} else { +cpu_fprintf(f, Max guest delay NA\n); +cpu_fprintf(f, Max guest advance NA\n); +} +} diff --git a/include/qemu-common.h b/include/qemu-common.h index 5d10ac2..bcf7a6a 100644 --- a/include/qemu-common.h +++ b/include/qemu-common.h @@ -109,6 +109,10 @@ static inline char *realpath(const char *path, char *resolved_path) void configure_icount(QemuOpts *opts, Error **errp); extern int use_icount; extern int icount_align_option; +/* drift information for info jit command */ +extern int64_t max_delay; +extern int64_t max_advance; +void dump_drift_info(FILE *f, fprintf_function cpu_fprintf); #include qemu/osdep.h #include qemu/bswap.h diff --git a/monitor.c b/monitor.c index 5bc70a6..cdbaa60 100644 --- a/monitor.c +++ b/monitor.c @@ -1047,6 +1047,7 @@ static void do_info_registers(Monitor *mon, const QDict *qdict) static void do_info_jit(Monitor *mon, const QDict *qdict) { dump_exec_info((FILE *)mon, monitor_fprintf); +dump_drift_info((FILE *)mon, monitor_fprintf); } static void do_info_history(Monitor *mon, const QDict *qdict) -- 1.9.3
[Qemu-devel] [Bug 1285708] Re: FreeBSD Guest crash on boot due to xsave instruction issue
You don't need +xsave, -cpu host just works. The bug is invalid. You are requesting a CPU that doesn't exist (a core2duo that supports XSAVE), and the guest's behavior is probably not going to be well defined. ** Changed in: qemu Status: New = Invalid -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1285708 Title: FreeBSD Guest crash on boot due to xsave instruction issue Status in QEMU: Invalid Status in qemu-kvm: New Status in “linux” package in Ubuntu: Fix Released Status in “linux” source package in Precise: Won't Fix Status in “linux” source package in Trusty: In Progress Bug description: When trying to boot a working FreeBSD 9.1/9.2 guest on a kvm/qemu host with the following command: kvm -m 256 -cdrom FreeBSD-9.2-RELEASE-amd64-disc1.iso -drive file=FreeBSD-9.2-RELEASE-amd64.qcow2,if=virtio -net nic,model=virtio -net user -nographic -vnc :10 -enable-kvm -balloon virtio -cpu core2duo,+xsave The FreeBSD Guest will kernel crash on boot with the following error: panic: CPU0 does not support X87 or SSE: 0 When launching the guest without the cpu flags, it works just fine. This bug has been resolved in source: https://lkml.org/lkml/2014/2/22/58 Can this fix be included in Precise ASAP! To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1285708/+subscriptions
Re: [Qemu-devel] [PATCH v2] vmdk: improve streamOptimized vmdk support
On Tue, Aug 5, 2014 at 9:44 PM, Fam Zheng f...@redhat.com wrote: On Tue, 08/05 12:44, Milos Vyletel wrote: On Tue, Aug 5, 2014 at 1:27 AM, Fam Zheng f...@redhat.com wrote: Does putting a monolithicSparse into the OVA work in this case? It does not. I did not try to import it to OVM but ESXi server complained about the format of vmdk in OVA archive (don't have exact error message anymore). I did look in OVF specifications and it does not state that the vmdk must be streamOptimized but it looks like VMWare is enforcing it. I can try to create OVA with monolithicSparse image and retry if you want to know exact error I got. I'm curious about the tool you're using to create OVA, because there are xml files in OVA/OVF that qemu has no knowledge about. Is it some tool from VMware? Fam Nope. It's just perl script that creates OVF, manifest and packs them together with vmdk. Qemu just does the conversion to vmdk in this case. Milos