Re: [Qemu-devel] [PATCH] kvm: Set default accelerator to kvm if the host supports it
On 02.10.2012 11:46, Markus Armbruster wrote:
> Daniel P. Berrange berra...@redhat.com writes:
>> IMHO, default to KVM, fallback to TCG is the most friendly default
>> behaviour.
>
> Friendly perhaps, generating an infinite series of questions "why is
> my guest slow as molasses?" certainly.

With a warning about "switching to slow emulation mode because .."
printed at startup, that becomes a non-issue: there's no reason to ask
more questions about why it is slow, because it already said why. Yes,
some may ask what to do, which is different. Every howto nowadays
mentions kvm modules and /dev/kvm device permissions.

> And for each instance of the question, there's an unknown number of
> users who give QEMU a quick try, screw up KVM unknowingly, observe the
> glacial speed, and conclude it's crap.

This is, again, I think, unfair. With the warning message it becomes
more or less obvious.

If you're talking about users who run it with the -daemonize argument,
this is
 a) stupid to do when TRYING it out, so it's not a big deal to lose
    another stupid user, and
 b) qemu should init everything first and throw all warnings and fatal
    errors before daemonizing; if this is not the case, it should be
    fixed in the code.

And if you're talking about management software (libvirt and others),
it controls all the required privileges already and explicitly requests
acceleration and other stuff.

So the best thing to do is what Daniel, Aurelien, Paolo and others have
suggested: accel=kvm:tcg with a warning.

Thanks,

/mjt
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] virtio-net: inline header support
Michael S. Tsirkin m...@redhat.com writes:

> Thinking about Sasha's patches, we can reduce ring usage
> for virtio net small packets dramatically if we put
> virtio net header inline with the data.
> This can be done for free in case guest net stack allocated
> extra head room for the packet, and I don't see
> why would this have any downsides.

I've been wanting to do this for the longest time... but...

> Even though with my recent patches qemu
> no longer requires header to be the first s/g element,
> we need a new feature bit to detect this.
> A trivial qemu patch will be sent separately.

There's a reason I haven't done this. I really, really dislike "my
implementation isn't broken" feature bits. We could have an infinite
number of them, for each bug in each device.

So my plan was to tie this assumption to the new PCI layout. And have a
stress-testing patch like the one below in the kernel (see my virtio-wip
branch for stuff like this). Turn it on at boot with
"virtio_ring.torture" on the kernel commandline.

BTW, I've fixed lguest, but my kvm here (Ubuntu precise, kvm-qemu 1.0) is
too old. Building the latest git now...

Cheers,
Rusty.

Subject: virtio: CONFIG_VIRTIO_DEVICE_TORTURE

Virtio devices are not supposed to depend on the framing of the
scatter-gather lists, but various implementations did. Safeguard this
in future by adding an option to deliberately create perverse
descriptors.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 8d5bddb..930a4ea 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -5,6 +5,15 @@ config VIRTIO
 	  bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
 	  CONFIG_RPMSG or CONFIG_S390_GUEST.
 
+config VIRTIO_DEVICE_TORTURE
+	bool "Virtio device torture tests"
+	depends on VIRTIO && DEBUG_KERNEL
+	help
+	  This makes the virtio_ring implementation creatively change
+	  the format of requests to make sure that devices are
+	  properly implemented.  This will make your virtual machine
+	  slow *and* unreliable!  Say N.
+
 menu "Virtio drivers"
 
 config VIRTIO_PCI
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index e639584..8893753 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -124,6 +124,149 @@ struct vring_virtqueue
 
 #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 
+#ifdef CONFIG_VIRTIO_DEVICE_TORTURE
+static bool torture;
+module_param(torture, bool, 0644);
+
+struct torture {
+	unsigned int orig_out, orig_in;
+	void *orig_data;
+	struct scatterlist sg[4];
+	struct scatterlist orig_sg[];
+};
+
+static size_t tot_len(struct scatterlist sg[], unsigned num)
+{
+	size_t len, i;
+
+	for (len = 0, i = 0; i < num; i++)
+		len += sg[i].length;
+
+	return len;
+}
+
+static void copy_sg_data(const struct scatterlist *dst, unsigned dnum,
+			 const struct scatterlist *src, unsigned snum)
+{
+	unsigned len;
+	struct scatterlist s, d;
+
+	s = *src;
+	d = *dst;
+
+	while (snum && dnum) {
+		len = min(s.length, d.length);
+		memcpy(sg_virt(&d), sg_virt(&s), len);
+		d.offset += len;
+		d.length -= len;
+		s.offset += len;
+		s.length -= len;
+		if (!s.length) {
+			BUG_ON(snum == 0);
+			src++;
+			snum--;
+			s = *src;
+		}
+		if (!d.length) {
+			BUG_ON(dnum == 0);
+			dst++;
+			dnum--;
+			d = *dst;
+		}
+	}
+}
+
+static bool torture_replace(struct scatterlist **sg,
+			    unsigned int *out,
+			    unsigned int *in,
+			    void **data,
+			    gfp_t gfp)
+{
+	static size_t seed;
+	struct torture *t;
+	size_t outlen, inlen, ourseed, len1;
+	void *buf;
+
+	if (!torture)
+		return true;
+
+	outlen = tot_len(*sg, *out);
+	inlen = tot_len(*sg + *out, *in);
+
+	/* This will break horribly on large block requests. */
+	t = kmalloc(sizeof(*t) + (*out + *in) * sizeof(t->orig_sg[1])
+		    + outlen + 1 + inlen + 1, gfp);
+	if (!t)
+		return false;
+
+	sg_init_table(t->sg, 4);
+	buf = &t->orig_sg[*out + *in];
+
+	memcpy(t->orig_sg, *sg, sizeof(**sg) * (*out + *in));
+	t->orig_out = *out;
+	t->orig_in = *in;
+	t->orig_data = *data;
+	*data = t;
+
+	ourseed = ACCESS_ONCE(seed);
+	seed++;
+
+	*sg = t->sg;
+	if (outlen) {
+		/* Split outbuf into two parts, one byte apart. */
+		*out = 2;
+		len1 = ourseed %
Re: usr/include/linux/kvm_para.h:26: included file 'asm-m68k/kvm_para.h' is not exported
On Wed, Oct 3, 2012 at 3:44 AM, Fengguang Wu fengguang...@intel.com wrote:
> FYI, something goes wrong since commit:
>
> 2bbc89a8e9c652ee71c6c3b2e0679b7ecedb1a09 m68k: Use Kbuild logic to import asm-generic headers
>
> config: m68k-allmodconfig
>
> All error/warnings:
>
> usr/include/linux/kexec.h:49: userspace cannot reference function or variable defined in the kernel
> usr/include/linux/kvm_para.h:26: included file 'asm-m68k/kvm_para.h' is not exported
> usr/include/linux/soundcard.h:1054: userspace cannot reference function or variable defined in the kernel

Yes, this is a known issue, cfr. e.g. https://lkml.org/lkml/2012/9/16/77

The kvm and kbuild people have to get their act together and agree on a
solution.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
    -- Linus Torvalds
Re: [PATCH] kvm: Set default accelerator to kvm if the host supports it
On 2012-10-01 18:20, Anthony Liguori wrote:
> Jan Kiszka jan.kis...@siemens.com writes:
>
>> If we built a target for a host that supports KVM in principle, set the
>> default accelerator to KVM as well. This also means the start of QEMU
>> will fail to start if KVM support turns out to be unavailable at
>> runtime.
>>
>> Signed-off-by: Jan Kiszka jan.kis...@siemens.com
>> ---
>>  kvm-all.c  |    1 +
>>  kvm-stub.c |    1 +
>>  kvm.h      |    1 +
>>  vl.c       |    4 ++--
>>  4 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/kvm-all.c b/kvm-all.c
>> index 92a7137..4d5f86c 100644
>> --- a/kvm-all.c
>> +++ b/kvm-all.c
>> @@ -103,6 +103,7 @@ struct KVMState
>>  #endif
>>  };
>>
>> +bool kvm_configured = true;
>>  KVMState *kvm_state;
>>  bool kvm_kernel_irqchip;
>>  bool kvm_async_interrupts_allowed;
>> diff --git a/kvm-stub.c b/kvm-stub.c
>> index 3c52eb5..86a6451 100644
>> --- a/kvm-stub.c
>> +++ b/kvm-stub.c
>> @@ -17,6 +17,7 @@
>>  #include "gdbstub.h"
>>  #include "kvm.h"
>>
>> +bool kvm_configured;
>>  KVMState *kvm_state;
>>  bool kvm_kernel_irqchip;
>>  bool kvm_async_interrupts_allowed;
>> diff --git a/kvm.h b/kvm.h
>> index dea2998..9936e5f 100644
>> --- a/kvm.h
>> +++ b/kvm.h
>> @@ -22,6 +22,7 @@
>>  #include <linux/kvm.h>
>>  #endif
>>
>> +extern bool kvm_configured;
>>  extern int kvm_allowed;
>>  extern bool kvm_kernel_irqchip;
>>  extern bool kvm_async_interrupts_allowed;
>> diff --git a/vl.c b/vl.c
>> index 8d305ca..f557bd1 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -2215,8 +2215,8 @@ static int configure_accelerator(void)
>>      }
>>
>>      if (p == NULL) {
>> -        /* Use the default accelerator, tcg */
>> -        p = "tcg";
>> +        /* The default accelerator depends on the availability of KVM. */
>> +        p = kvm_configured ? "kvm" : "tcg";
>>      }
>
> How about making this an arch_init() function call and then using a #if
> defined(KVM_CONFIG) in arch_init.c?
>
> I hate to introduce another global variable if we can avoid it...

Hacked too quickly. In fact, kvm_configured is simply kvm_available().
However, resistance appears to be too high here.

Jan

> Otherwise:
>
> Acked-by: Anthony Liguori aligu...@us.ibm.com
>
> Blue/Aurelien, any objections?
>
> Regards,
>
> Anthony Liguori
>
>>
>>      while (!accel_initialised && *p != '\0') {
>> --
>> 1.7.3.4
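The "accel=kvm:tcg with a warning" behaviour proposed in this thread can be sketched roughly as below. This is only an illustration of the fallback order, not QEMU's configure_accelerator(); the function name pick_accelerator, the `usable` set (standing in for runtime probing of /dev/kvm) and the warning text are all assumptions made for the example.

```python
import sys


def _warn(message):
    print(message, file=sys.stderr)


def pick_accelerator(spec, usable, warn=_warn):
    """Return the first usable accelerator from a colon-separated list.

    `spec` mimics the "kvm:tcg" syntax mentioned in the thread; `usable`
    is a hypothetical set of accelerators that initialised successfully.
    """
    candidates = spec.split(":")
    for position, name in enumerate(candidates):
        if name in usable:
            if position > 0:
                # Tell the user why the guest may be slower than expected.
                skipped = ", ".join(candidates[:position])
                warn("Warning: %s not available, falling back to %s"
                     % (skipped, name))
            return name
    raise RuntimeError("no usable accelerator among: " + spec)
```

With a spec of "kvm:tcg" and only TCG usable, the call returns "tcg" and emits exactly one warning, which is the point made above: the startup message already answers the "why is it slow" question.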
Re: [Qemu-devel] qemu-kvm: remove boot=on|off drive parameter compatibility
On Mon, Oct 01, 2012 at 03:26:05PM +0200, Jan Kiszka wrote:
> On 2012-10-01 15:19, Anthony Liguori wrote:
>> Jan Kiszka jan.kis...@siemens.com writes:
>>> On 2012-10-01 11:31, Marcelo Tosatti wrote:
>>>
>>> It's not just about default configs. We need to validate if the
>>> migration formats are truly compatible (qemu-kvm -> QEMU, the other
>>> way around definitely not). For the command line switches, we could
>>> provide a wrapper script that translates them into upstream format or
>>> simply ignores them. That should be harmless to carry upstream.
>>
>> qemu-kvm has:
>>
>>  -no-kvm
>>  -no-kvm-irqchip
>>  -no-kvm-pit
>>  -no-kvm-pit-reinjection
>>  -tdf - does nothing
>>
>> There are replacements for all of the above. If we need to add them to
>> qemu.git, it's not a big deal to add them.
>
> But I don't think we should add them to the source code. This can
> perfectly be handled by a (disposable) script layer on top of
> qemu-system-x86_64 - the namespace (qemu-kvm in most cases) is also free.
>
>>  -drive ...,boot= - this is ignored
>>
>> cpu_set command for CPU hotplug which is known broken in qemu-kvm.
>
> Right, so nothing is lost when migrating to QEMU.
>
>> testdev which is nice but only used for development

Jan, do you have a plan for testdev device? It would be a pity to have
qemu-kvm just for that.

>> Default nic is rtl8139 vs. e1000.
>>
>> Some logic to change the default VGA ram size to 16mb for pc-1.2
>> (QEMU uses 16mb by default now too).
>
> Also nicely manageable in a wrapper.
>
>> I think at this point, none of this matters but I added the various
>> distro maintainers to the thread.
>>
>> I think it's time for the distros to drop qemu-kvm and just ship
>> qemu.git.
>
> +1
>
> Jan
>
>> Is there anything else that needs to happen to make that switch?
>>
>> Regards,
>>
>> Anthony Liguori
>
> --
> Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
> Corporate Competence Center Embedded Linux

--
			Gleb.
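The disposable wrapper layer suggested above could look roughly like the sketch below. It is an assumption-laden illustration, not an existing tool: the names REPLACEMENTS, IGNORED, strip_boot and translate are hypothetical, the kernel_irqchip=off and kvm-pit lost_tick_policy=discard mappings follow the qemu-kvm compat patches posted to this list, and the accel=tcg mapping for -no-kvm is assumed. It also does not merge with a -machine option the user already passed.

```python
import sys

# Legacy qemu-kvm switches and assumed upstream equivalents.
REPLACEMENTS = {
    "-no-kvm": ["-machine", "accel=tcg"],  # assumption, not from the thread
    "-no-kvm-irqchip": ["-machine", "kernel_irqchip=off"],
    "-no-kvm-pit-reinjection": ["-global", "kvm-pit.lost_tick_policy=discard"],
}

# Switches that no longer have any effect and are dropped with a warning.
IGNORED = {"-no-kvm-pit", "-tdf"}


def strip_boot(drive_spec):
    """Remove a boot=on|off field from a -drive option string."""
    fields = [f for f in drive_spec.split(",") if not f.startswith("boot=")]
    return ",".join(fields)


def translate(argv):
    """Translate a qemu-kvm style argument list into upstream style."""
    out = []
    args = iter(argv)
    for arg in args:
        if arg in REPLACEMENTS:
            out.extend(REPLACEMENTS[arg])
        elif arg in IGNORED:
            print("warning: %s has no effect, ignoring" % arg, file=sys.stderr)
        elif arg == "-drive":
            out.append(arg)
            spec = next(args, None)  # the option value, if present
            if spec is not None:
                out.append(strip_boot(spec))
        else:
            out.append(arg)
    return out
```

A real wrapper would finish by exec'ing qemu-system-x86_64 with the translated list; that step is left out so the translation itself stays easy to check.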
Re: [kvmarm] [PATCH v2 08/10] ARM: KVM: VGIC initialisation code
On Tue, Oct 02, 2012 at 08:45:54PM +0100, Peter Maydell wrote:
> On 2 October 2012 20:28, Will Deacon will.dea...@arm.com wrote:
>> On Tue, Oct 02, 2012 at 07:31:43PM +0100, Peter Maydell wrote:
>>> We probably want to be passing in the base of the cpu-internal
>>> peripherals, rather than base of the GIC specifically. For the
>>> A15 these are the same thing, but that's not inherent [compare
>>> the A9 which has more devices at fixed offsets from a configurable
>>> base address].
>>
>> If you do that, userspace will need a way to probe the emulated CPU so
>> that it knows exactly which set of peripherals there are and which ones
>> it needs to emulate. This feels pretty nasty, given that the vgic is
>> handled more or less completely by the kernel-side of things.
>
> Userspace knows what the emulated CPU is because it tells the kernel
> which CPU to provide -- the kernel can say yes or no but it can't
> provide a different CPU to the one we ask for, or one with bits
> missing...

Aha, ok, I didn't realise that's how it works. Does userspace just pass
the CPUID or is there an identifier provided by kvm?

/me jumps back into the code.

Thanks,

Will
Re: [Qemu-devel] qemu-kvm: remove boot=on|off drive parameter compatibility
On Wed, Oct 03, 2012 at 12:06:57PM +0200, Jan Kiszka wrote:
> On 2012-10-03 11:55, Gleb Natapov wrote:
>> On Mon, Oct 01, 2012 at 03:26:05PM +0200, Jan Kiszka wrote:
>>> On 2012-10-01 15:19, Anthony Liguori wrote:
>>>> Jan Kiszka jan.kis...@siemens.com writes:
>>>>> On 2012-10-01 11:31, Marcelo Tosatti wrote:
>>>>>
>>>>> It's not just about default configs. We need to validate if the
>>>>> migration formats are truly compatible (qemu-kvm -> QEMU, the other
>>>>> way around definitely not). For the command line switches, we could
>>>>> provide a wrapper script that translates them into upstream format
>>>>> or simply ignores them. That should be harmless to carry upstream.
>>>>
>>>> qemu-kvm has:
>>>>
>>>>  -no-kvm
>>>>  -no-kvm-irqchip
>>>>  -no-kvm-pit
>>>>  -no-kvm-pit-reinjection
>>>>  -tdf - does nothing
>>>>
>>>> There are replacements for all of the above. If we need to add them
>>>> to qemu.git, it's not a big deal to add them.
>>>
>>> But I don't think we should add them to the source code. This can
>>> perfectly be handled by a (disposable) script layer on top of
>>> qemu-system-x86_64 - the namespace (qemu-kvm in most cases) is also
>>> free.
>>>
>>>>  -drive ...,boot= - this is ignored
>>>>
>>>> cpu_set command for CPU hotplug which is known broken in qemu-kvm.
>>>
>>> Right, so nothing is lost when migrating to QEMU.
>>>
>>>> testdev which is nice but only used for development
>>
>> Jan, do you have a plan for testdev device? It would be a pity to have
>> qemu-kvm just for that.
>
> Nope, not on my schedule.

Understood :)

--
			Gleb.
Error: KVM Guest with virtio network driver loses network connectivity
Hi all,

I set up a host running CentOS 6.0 (64-bit) and a guest running CentOS 5.3
(64-bit, kernel updated). I have installed qemu-kvm-tools
http://rpmfind.net/linux/RPM/centos/updates/6.3/x86_64/Packages/qemu-kvm-tools-0.12.1.2-2.295.el6_3.1.x86_64.html
which includes the fix for bug 804578.

But I still can't fix this error: the KVM guest with the virtio network
driver loses network connectivity.

Please help me :(

Thanks,
Hung
Re: Error: KVM Guest with virtio network driver loses network connectivity
On 03.10.2012 14:32, hung -cuncon wrote:
> Hi all,
> I setup Host with centos 6.0 - 64bits, Guest with centos5.3 - 64bits
> (kernel updated),
> I have installed qemu-kvm-tool
> http://rpmfind.net/linux/RPM/centos/updates/6.3/x86_64/Packages/qemu-kvm-tools-0.12.1.2-2.295.el6_3.1.x86_64.html
> with Bug Fix: 804578

Please address this to redhat support staff. It is unrealistic for
people on this list to be able to deal with ancient and heavily patched
kernel and qemu-kvm where only redhat knows the changes they've made.

Thank you.

/mjt
Re: [PATCH 0/3] virtio-net: inline header support
On 03/10/2012 08:44, Rusty Russell wrote:
> There's a reason I haven't done this. I really, really dislike "my
> implementation isn't broken" feature bits. We could have an infinite
> number of them, for each bug in each device.

However, this bug affects (almost) all implementations and (almost) all
devices. It even makes sense to reserve a transport feature bit for it
instead of a device feature bit.

Paolo
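To make the feature-bit idea concrete, here is a minimal sketch of how a driver might gate the inline header on a negotiated bit. The constant F_ANY_HEADER_LAYOUT and its bit position are purely hypothetical (no such bit is defined in this thread), and the two helper functions are illustrative names, not virtio API.

```python
# Hypothetical bit, for illustration only; not a value taken from the thread.
F_ANY_HEADER_LAYOUT = 1 << 27


def negotiate(device_features, driver_features):
    """Only features offered by the device and accepted by the driver survive."""
    return device_features & driver_features


def may_inline_header(negotiated):
    """The header may share an s/g element with data only if the bit was agreed."""
    return bool(negotiated & F_ANY_HEADER_LAYOUT)
```

A device that still assumes the header is the first s/g element would simply not offer the bit, so an updated driver falls back to the separate-header layout without any per-device quirk.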
[patch 3/6] Use machine options to emulate -no-kvm-pit
Commit e81dda195556e72f8cd294998296c1051aab30a8 from qemu-kvm.git.

From: Jan Kiszka jan.kis...@siemens.com

Leave the related command line option in place, just issuing a warning
that it has no function anymore.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu-compat-kvm/vl.c
===================================================================
--- qemu-compat-kvm.orig/vl.c
+++ qemu-compat-kvm/vl.c
@@ -3066,7 +3066,11 @@ int main(int argc, char **argv, char **e
                 qemu_opts_parse(olist, "kernel_irqchip=off", 0);
                 break;
             }
-
+            case QEMU_OPTION_no_kvm_pit: {
+                fprintf(stderr, "Warning: KVM PIT can no longer be disabled "
+                                "separately.\n");
+                break;
+            }
             case QEMU_OPTION_usb:
                 usb_enabled = 1;
                 break;
Index: qemu-compat-kvm/qemu-options.hx
===================================================================
--- qemu-compat-kvm.orig/qemu-options.hx
+++ qemu-compat-kvm/qemu-options.hx
@@ -2841,6 +2841,10 @@ ETEXI
 DEF("no-kvm-irqchip", 0, QEMU_OPTION_no_kvm_irqchip,
     "-no-kvm-irqchip disable KVM kernel mode PIC/IOAPIC/LAPIC\n",
     QEMU_ARCH_I386)
 
+DEF("no-kvm-pit", 0, QEMU_OPTION_no_kvm_pit,
+    "-no-kvm-pit     disable KVM kernel mode PIT\n",
+    QEMU_ARCH_I386)
+
 HXCOMM This is the last statement. Insert new options before this line!
 STEXI
[patch 4/6] Use global properties to emulate -no-kvm-pit-reinjection
Commit 80019541e9c13fab476bee35edcef3e11646222c from qemu-kvm.git.

From: Jan Kiszka jan.kis...@siemens.com

Use global properties to emulate -no-kvm-pit-reinjection

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu-compat-kvm/vl.c
===================================================================
--- qemu-compat-kvm.orig/vl.c
+++ qemu-compat-kvm/vl.c
@@ -3071,6 +3071,21 @@ int main(int argc, char **argv, char **e
                                 "separately.\n");
                 break;
             }
+            case QEMU_OPTION_no_kvm_pit_reinjection: {
+                static GlobalProperty kvm_pit_lost_tick_policy[] = {
+                    {
+                        .driver   = "kvm-pit",
+                        .property = "lost_tick_policy",
+                        .value    = "discard",
+                    },
+                    { /* end of list */ }
+                };
+
+                fprintf(stderr, "Warning: option deprecated, use "
+                        "lost_tick_policy property of kvm-pit instead.\n");
+                qdev_prop_register_global_list(kvm_pit_lost_tick_policy);
+                break;
+            }
             case QEMU_OPTION_usb:
                 usb_enabled = 1;
                 break;
Index: qemu-compat-kvm/qemu-options.hx
===================================================================
--- qemu-compat-kvm.orig/qemu-options.hx
+++ qemu-compat-kvm/qemu-options.hx
@@ -2844,7 +2844,10 @@ DEF("no-kvm-irqchip", 0, QEMU_OPTION_no_
 DEF("no-kvm-pit", 0, QEMU_OPTION_no_kvm_pit,
     "-no-kvm-pit     disable KVM kernel mode PIT\n",
     QEMU_ARCH_I386)
-
+DEF("no-kvm-pit-reinjection", 0, QEMU_OPTION_no_kvm_pit_reinjection,
+    "-no-kvm-pit-reinjection\n"
+    "                disable KVM kernel mode PIT interrupt reinjection\n",
+    QEMU_ARCH_I386)
 
 HXCOMM This is the last statement. Insert new options before this line!
 STEXI
[patch 2/6] Use machine options to emulate -no-kvm-irqchip
Commit 3ad763fcba5bd0ec5a79d4a9b6baeef119dd4a3d from qemu-kvm.git.

From: Jan Kiszka jan.kis...@siemens.com

Upstream is moving towards this mechanism, so start using it in qemu-kvm
already to configure the specific defaults: kvm enabled on, just like
in-kernel irqchips.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu-compat-kvm/vl.c
===================================================================
--- qemu-compat-kvm.orig/vl.c
+++ qemu-compat-kvm/vl.c
@@ -3061,6 +3061,12 @@ int main(int argc, char **argv, char **e
                     machine = machine_parse(optarg);
                 }
                 break;
+            case QEMU_OPTION_no_kvm_irqchip: {
+                olist = qemu_find_opts("machine");
+                qemu_opts_parse(olist, "kernel_irqchip=off", 0);
+                break;
+            }
+
             case QEMU_OPTION_usb:
                 usb_enabled = 1;
                 break;
Index: qemu-compat-kvm/qemu-options.hx
===================================================================
--- qemu-compat-kvm.orig/qemu-options.hx
+++ qemu-compat-kvm/qemu-options.hx
@@ -2838,6 +2838,10 @@ STEXI
 Enable FIPS 140-2 compliance mode.
 ETEXI
 
+DEF("no-kvm-irqchip", 0, QEMU_OPTION_no_kvm_irqchip,
+    "-no-kvm-irqchip disable KVM kernel mode PIC/IOAPIC/LAPIC\n",
+    QEMU_ARCH_I386)
+
 HXCOMM This is the last statement. Insert new options before this line!
 STEXI
 @end table
[patch 6/6] Emulate qemu-kvms -tdf option
Commit d527b774878defc27f317cdde19b5c54fd0d5666 from qemu-kvm.git.

From: Jan Kiszka jan.kis...@siemens.com

Add a warning that there is no effect anymore.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu-compat-kvm/vl.c
===================================================================
--- qemu-compat-kvm.orig/vl.c
+++ qemu-compat-kvm/vl.c
@@ -3169,6 +3169,10 @@ int main(int argc, char **argv, char **e
             case QEMU_OPTION_semihosting:
                 semihosting_enabled = 1;
                 break;
+            case QEMU_OPTION_tdf:
+                fprintf(stderr, "Warning: user space PIT time drift fix "
+                                "is no longer supported.\n");
+                break;
             case QEMU_OPTION_name:
                 qemu_name = g_strdup(optarg);
                 {
Index: qemu-compat-kvm/qemu-options.hx
===================================================================
--- qemu-compat-kvm.orig/qemu-options.hx
+++ qemu-compat-kvm/qemu-options.hx
@@ -2849,6 +2849,10 @@ DEF("no-kvm-pit-reinjection", 0, QEMU_OP
     "                disable KVM kernel mode PIT interrupt reinjection\n",
     QEMU_ARCH_I386)
 
+DEF("tdf", 0, QEMU_OPTION_tdf,
+    "-tdf            time drift fix (deprecated)\n",
+    QEMU_ARCH_ALL)
+
 HXCOMM This is the last statement. Insert new options before this line!
 STEXI
 @end table
[patch 1/6] cirrus_vga: allow configurable vram size
Allow RAM size to be configurable for cirrus, to allow migration
compatibility from qemu-kvm.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu-compat-kvm/hw/cirrus_vga.c
===================================================================
--- qemu-compat-kvm.orig/hw/cirrus_vga.c
+++ qemu-compat-kvm/hw/cirrus_vga.c
@@ -43,8 +43,6 @@
 //#define DEBUG_CIRRUS
 //#define DEBUG_BITBLT
 
-#define VGA_RAM_SIZE (8192 * 1024)
-
 /***
  *
  *  definitions
@@ -2853,7 +2851,8 @@ static void cirrus_init_common(CirrusVGA
 
     /* I/O handler for LFB */
     memory_region_init_io(&s->cirrus_linear_io, &cirrus_linear_io_ops, s,
-                          "cirrus-linear-io", VGA_RAM_SIZE);
+                          "cirrus-linear-io", s->vga.vram_size_mb
+                                              * 1024 * 1024);
 
     /* I/O handler for LFB */
     memory_region_init_io(&s->cirrus_linear_bitblt_io,
@@ -2893,7 +2892,6 @@ static int vga_initfn(ISADevice *dev)
     ISACirrusVGAState *d = DO_UPCAST(ISACirrusVGAState, dev, dev);
     VGACommonState *s = &d->cirrus_vga.vga;
 
-    s->vram_size_mb = VGA_RAM_SIZE >> 20;
     vga_common_init(s);
     cirrus_init_common(&d->cirrus_vga, CIRRUS_ID_CLGD5430, 0,
                        isa_address_space(dev));
@@ -2906,6 +2904,12 @@ static int vga_initfn(ISADevice *dev)
     return 0;
 }
 
+static Property isa_vga_cirrus_properties[] = {
+    DEFINE_PROP_UINT32("vgamem_mb", struct ISACirrusVGAState,
+                       cirrus_vga.vga.vram_size_mb, 8),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
 static void isa_cirrus_vga_class_init(ObjectClass *klass, void *data)
 {
     ISADeviceClass *k = ISA_DEVICE_CLASS(klass);
@@ -2913,6 +2917,7 @@ static void isa_cirrus_vga_class_init(Ob
 
     dc->vmsd = &vmstate_cirrus_vga;
     k->init = vga_initfn;
+    dc->props = isa_vga_cirrus_properties;
 }
 
 static TypeInfo isa_cirrus_vga_info = {
@@ -2936,7 +2941,6 @@ static int pci_cirrus_vga_initfn(PCIDevi
     int16_t device_id = pc->device_id;
 
     /* setup VGA */
-    s->vga.vram_size_mb = VGA_RAM_SIZE >> 20;
     vga_common_init(&s->vga);
     cirrus_init_common(s, device_id, 1, pci_address_space(dev));
     s->vga.ds = graphic_console_init(s->vga.update, s->vga.invalidate,
@@ -2968,6 +2972,12 @@ DeviceState *pci_cirrus_vga_init(PCIBus
     return &pci_create_simple(bus, -1, "cirrus-vga")->qdev;
 }
 
+static Property pci_vga_cirrus_properties[] = {
+    DEFINE_PROP_UINT32("vgamem_mb", struct PCICirrusVGAState,
+                       cirrus_vga.vga.vram_size_mb, 8),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
 static void cirrus_vga_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -2981,6 +2991,7 @@ static void cirrus_vga_class_init(Object
     k->class_id = PCI_CLASS_DISPLAY_VGA;
     dc->desc = "Cirrus CLGD 54xx VGA";
     dc->vmsd = &vmstate_pci_cirrus_vga;
+    dc->props = pci_vga_cirrus_properties;
 }
 
 static TypeInfo cirrus_vga_info = {
[patch 5/6] Emulate qemu-kvms drive parameter boot=on|off
Commit 841280b6c224ea2c6edc2f5afc2add513c85181d from qemu-kvm.git.

From: Jan Kiszka jan.kis...@siemens.com

We do not want to maintain this option forever. It will be removed after
a grace period of a few releases. So warn the user that this option has
no effect and will become invalid soon.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu-compat-kvm/blockdev.c
===================================================================
--- qemu-compat-kvm.orig/blockdev.c
+++ qemu-compat-kvm/blockdev.c
@@ -432,6 +432,12 @@ DriveInfo *drive_init(QemuOpts *opts, in
         return NULL;
     }
 
+    if (qemu_opt_get(opts, "boot") != NULL) {
+        fprintf(stderr, "qemu-kvm: boot=on|off is deprecated and will be "
+                "ignored. Future versions will reject this parameter. Please "
+                "update your scripts.\n");
+    }
+
     on_write_error = BLOCK_ERR_STOP_ENOSPC;
     if ((buf = qemu_opt_get(opts, "werror")) != NULL) {
         if (type != IF_IDE && type != IF_SCSI && type != IF_VIRTIO && type != IF_NONE) {
Index: qemu-compat-kvm/qemu-config.c
===================================================================
--- qemu-compat-kvm.orig/qemu-config.c
+++ qemu-compat-kvm/qemu-config.c
@@ -114,6 +114,10 @@ static QemuOptsList qemu_drive_opts = {
             .name = "copy-on-read",
             .type = QEMU_OPT_BOOL,
             .help = "copy read data from backing file into image file",
+        },{
+            .name = "boot",
+            .type = QEMU_OPT_BOOL,
+            .help = "(deprecated, ignored)",
         },
         { /* end of list */ }
     },
[patch 0/6] qemu-kvm compat
As discussed on yesterday's qemu call, here are the qemu-kvm compat
patches for qemu:

- command line compatibility
- allow configurable ram size for cirrus
Re: [Qemu-devel] qemu-kvm: remove boot=on|off drive parameter compatibility
On 10/03/2012 06:55 AM, Gleb Natapov wrote:
> On Mon, Oct 01, 2012 at 03:26:05PM +0200, Jan Kiszka wrote:
>> On 2012-10-01 15:19, Anthony Liguori wrote:
>>> Jan Kiszka jan.kis...@siemens.com writes:
>>>> On 2012-10-01 11:31, Marcelo Tosatti wrote:
>>>>
>>>> It's not just about default configs. We need to validate if the
>>>> migration formats are truly compatible (qemu-kvm -> QEMU, the other
>>>> way around definitely not). For the command line switches, we could
>>>> provide a wrapper script that translates them into upstream format
>>>> or simply ignores them. That should be harmless to carry upstream.
>>>
>>> qemu-kvm has:
>>>
>>>  -no-kvm
>>>  -no-kvm-irqchip
>>>  -no-kvm-pit
>>>  -no-kvm-pit-reinjection
>>>  -tdf - does nothing
>>>
>>> There are replacements for all of the above. If we need to add them
>>> to qemu.git, it's not a big deal to add them.
>>
>> But I don't think we should add them to the source code. This can
>> perfectly be handled by a (disposable) script layer on top of
>> qemu-system-x86_64 - the namespace (qemu-kvm in most cases) is also
>> free.
>>
>>>  -drive ...,boot= - this is ignored
>>>
>>> cpu_set command for CPU hotplug which is known broken in qemu-kvm.
>>
>> Right, so nothing is lost when migrating to QEMU.
>>
>>> testdev which is nice but only used for development
>
> Jan, do you have a plan for testdev device? It would be a pity to have
> qemu-kvm just for that.

Yep, I did send patches with the testdev device present in qemu-kvm.git
to qemu.git a while ago, but there were many comments on the review. I
ended up not implementing everything that was asked for, and the patches
were archived.

If nobody wants to step up to port it, I'll re-read the original thread
and spin up new patches (and try to see it through to the end).

Executing the KVM unittests is something that we can't afford to lose,
so I'd say it's important in this last-mile effort to get rid of
qemu-kvm.
[virt][PATCH 1/3] virt: Adds OpenVSwitch support to virt tests.
When autotest tries add tap to bridge then test recognize if test is bridge is standard linux or OpenVSwitch. And adds some utils for bridge manipulation. Signed-off-by: Jiří Župka jzu...@redhat.com --- virttest/utils_misc.py | 473 ++-- 1 files changed, 459 insertions(+), 14 deletions(-) diff --git a/virttest/utils_misc.py b/virttest/utils_misc.py index d37cf87..f03e922 100644 --- a/virttest/utils_misc.py +++ b/virttest/utils_misc.py @@ -8,9 +8,10 @@ import time, string, random, socket, os, signal, re, logging, commands, cPickle import fcntl, shelve, ConfigParser, sys, UserDict, inspect, tarfile import struct, shutil, glob, HTMLParser, urllib, traceback, platform from autotest.client import utils, os_dep -from autotest.client.shared import error, logging_config +from autotest.client.shared import error, logging_config, openvswitch from autotest.client.shared import logging_manager, git + try: import koji KOJI_INSTALLED = True @@ -25,6 +26,7 @@ if ARCH == ppc64: SIOCSIFFLAGS = 0x8914 SIOCGIFINDEX = 0x8933 SIOCBRADDIF= 0x89a2 +SIOCBRDELIF= 0x89a3 # From linux/include/linux/if_tun.h TUNSETIFF = 0x800454ca TUNGETIFF = 0x400454d2 @@ -38,9 +40,10 @@ else: # From include/linux/sockios.h SIOCSIFHWADDR = 0x8924 SIOCGIFHWADDR = 0x8927 -SIOCSIFFLAGS = 0x8914 -SIOCGIFINDEX = 0x8933 -SIOCBRADDIF = 0x89a2 +SIOCSIFFLAGS = 0x8914 +SIOCGIFINDEX = 0x8933 +SIOCBRADDIF = 0x89a2 +SIOCBRDELIF = 0x89a3 # From linux/include/linux/if_tun.h TUNSETIFF = 0x400454ca TUNGETIFF = 0x800454d2 @@ -52,6 +55,110 @@ else: IFF_UP = 0x1 +class Bridge(object): +def get_structure(self): + +Get bridge list. 
+ +br_i = re.compile(^(\S+).*?(\S+)$, re.MULTILINE) +nbr_i = re.compile(^\s+(\S+)$, re.MULTILINE) +out_line = utils.run(brctl show, verbose=False).stdout.splitlines() +result = dict() +bridge = None +iface = None +for line in out_line[1:]: +try: +(tmpbr, iface) = br_i.findall(line)[0] +bridge = tmpbr +result[bridge] = [] +except IndexError: +iface = nbr_i.findall(line)[0] + +if iface: # add interface to bridge +result[bridge].append(iface) + +return result + + +def list_br(self): +return self.get_structure().keys() + + +def port_to_br(self, port_name): + +Return bridge which contain port. + +@param port_name: Name of port. +@return: Bridge name or None if there is no bridge which contain port. + +bridge = None +for (br, ifaces) in self.get_structure().iteritems(): +if port_name in ifaces: +bridge = br +return bridge + + +def _br_ioctl(self, io_cmd, brname, ifname): +ctrl_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, 0) +index = if_nametoindex(ifname) +if index == 0: +raise TAPNotExistError(ifname) +ifr = struct.pack(16si, brname, index) +_ = fcntl.ioctl(ctrl_sock, io_cmd, ifr) +ctrl_sock.close() + + +def add_port(self, brname, ifname): + +Add a device to bridge + +@param ifname: Name of TAP device +@param brname: Name of the bridge + +try: +self._br_ioctl(SIOCBRADDIF, brname, ifname) +except IOError, details: +raise BRAddIfError(ifname, brname, details) + + +def del_port(self, brname, ifname): + +Remove a TAP device from bridge + +@param ifname: Name of TAP device +@param brname: Name of the bridge + +try: +self._br_ioctl(SIOCBRDELIF, brname, ifname) +except IOError, details: +raise BRDelIfError(ifname, brname, details) + + +def __init_openvswitch(func): + +Decorator used for late init of __ovs variable. + +def wrap_init(*args, **kargs): +global __ovs +if __ovs is None: +try: +__ovs = openvswitch.OpenVSwitchSystem() +__ovs.init_system() +if (not __ovs.check()): +raise Exception(Check of OpenVSwitch failed.) 
+except Exception, e: +logging.debug(System not support OpenVSwitch:) +logging.debug(e) + +return func(*args, **kargs) +return wrap_init + + +#Global variable for OpenVSwitch +__ovs = None +__bridge = Bridge() + + def lock_file(filename, mode=fcntl.LOCK_EX): f = open(filename, w) fcntl.lockf(f, mode) @@ -123,6 +230,74 @@ class BRAddIfError(NetError): (self.ifname, self.brname, self.details)) +class BRDelIfError(NetError): +def __init__(self, ifname, brname, details): +NetError.__init__(self,
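The get_structure() and port_to_br() helpers in the patch above turn the text printed by brctl show into a bridge-to-ports mapping. What follows is only a minimal Python 3 sketch of that idea, not the code from the patch: it parses a string instead of running the command, the function names and the SAMPLE text are made up for illustration, and the column layout of the sample is an assumption about typical brctl output.

```python
def parse_brctl_show(output):
    """Map each bridge name to the list of ports attached to it."""
    result = {}
    bridge = None
    # Skip the header row (bridge name / bridge id / STP enabled / interfaces).
    for line in output.splitlines()[1:]:
        if not line.strip():
            continue
        fields = line.split()
        if line[0].isspace():
            # Continuation row: holds only one more port of the last bridge.
            if bridge is not None:
                result[bridge].append(fields[0])
        else:
            # Bridge row: name, id, STP flag, then an optional first port.
            bridge = fields[0]
            result[bridge] = fields[3:4]
    return result


def port_to_br(structure, port_name):
    """Return the bridge that contains port_name, or None."""
    for bridge, ports in structure.items():
        if port_name in ports:
            return bridge
    return None


# Hypothetical sample text; the layout is assumed, not captured from a host.
SAMPLE = (
    "bridge name\tbridge id\t\tSTP enabled\tinterfaces\n"
    "virbr0\t\t8000.525400aabbcc\tyes\t\tvirbr0-nic\n"
    "\t\t\t\t\t\t\tvnet0\n"
    "br1\t\t8000.000000000000\tno\n"
)

structure = parse_brctl_show(SAMPLE)
print(structure)
print(port_to_br(structure, "vnet0"))
```

Splitting on columns rather than taking the last token of the row means a bridge without ports ends up with an empty list instead of picking up its STP flag as a port name.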
[virt][PATCH 2/3] virt: Adds functionality for vms.
Allow creating of machine with tap devices which are not connected to bridge. Add function for fill virtnet object with address. Signed-off-by: Jiří Župka jzu...@redhat.com --- virttest/kvm_vm.py |9 +++-- virttest/utils_misc.py | 45 + virttest/utils_misc_unittest.py | 59 +++ virttest/virt_vm.py | 20 + 4 files changed, 110 insertions(+), 23 deletions(-) diff --git a/virttest/kvm_vm.py b/virttest/kvm_vm.py index 9877d55..7d4f93f 100644 --- a/virttest/kvm_vm.py +++ b/virttest/kvm_vm.py @@ -958,7 +958,7 @@ class VM(virt_vm.BaseVM): qemu_cmd += add_name(hlp, name) # no automagic devices please defaults = params.get(defaults, no) -if has_option(hlp,nodefaults) and defaults != yes: +if has_option(hlp, nodefaults) and defaults != yes: qemu_cmd += -nodefaults # Add monitors for monitor_name in params.objects(monitors): @@ -1074,7 +1074,7 @@ class VM(virt_vm.BaseVM): for nic in vm.virtnet: # setup nic parameters as needed -nic = vm.add_nic(**dict(nic)) # add_netdev if netdev_id not set +nic = vm.add_nic(**dict(nic)) # add_netdev if netdev_id not set # gather set values or None if unset vlan = int(nic.get('vlan')) netdev_id = nic.get('netdev_id') @@ -2073,7 +2073,7 @@ class VM(virt_vm.BaseVM): nic.set_if_none('nettype', 'bridge') if nic.nettype == 'bridge': # implies tap # destination is required, hard-code reasonable default if unset -nic.set_if_none('netdst', 'virbr0') +# nic.set_if_none('netdst', 'virbr0') # tapfd allocated/set in activate because requires system resources nic.set_if_none('tapfd_id', utils_misc.generate_random_id()) elif nic.nettype == 'user': @@ -2151,7 +2151,8 @@ class VM(virt_vm.BaseVM): error.context(Raising bridge for + msg_sfx + attach_cmd, logging.debug) # assume this will puke if netdst unset -utils_misc.add_to_bridge(nic.ifname, nic.netdst) +if not nic.netdst is None: +utils_misc.add_to_bridge(nic.ifname, nic.netdst) elif nic.nettype == 'user': attach_cmd += user,name=%s % nic.ifname else: # unsupported nettype diff --git a/virttest/utils_misc.py 
b/virttest/utils_misc.py index f03e922..4376f44 100644 --- a/virttest/utils_misc.py +++ b/virttest/utils_misc.py @@ -62,7 +62,7 @@ class Bridge(object): br_i = re.compile(^(\S+).*?(\S+)$, re.MULTILINE) nbr_i = re.compile(^\s+(\S+)$, re.MULTILINE) -out_line = utils.run(brctl show, verbose=False).stdout.splitlines() +out_line = (utils.run(brctl show, verbose=False).stdout.splitlines()) result = dict() bridge = None iface = None @@ -226,7 +226,7 @@ class BRAddIfError(NetError): self.details = details def __str__(self): -return (Can not add if %s to bridge %s: %s % +return (Can't remove interface %s from bridge %s: %s % (self.ifname, self.brname, self.details)) @@ -249,7 +249,7 @@ class IfNotInBridgeError(NetError): self.details = details def __str__(self): -return (If %s in any bridge: %s % +return (Interface %s is not present on any bridge: %s % (self.ifname, self.details)) @@ -260,7 +260,7 @@ class BRNotExistError(NetError): self.details = details def __str__(self): -return (Bridge %s not exists: %s % (self.brname, self.details)) +return (Bridge %s does not exist: %s % (self.brname, self.details)) class IfChangeBrError(NetError): @@ -272,7 +272,7 @@ class IfChangeBrError(NetError): self.details = details def __str__(self): -return (Can not change if %s from bridge %s to bridge %s: %s % +return (Can't move interface %s from bridge %s to bridge %s: %s % (self.ifname, self.new_brname, self.oldbrname, self.details)) @@ -284,7 +284,7 @@ class IfChangeAddrError(NetError): self.details = details def __str__(self): -return (Can not change if %s from bridge %s to bridge %s: %s % +return (Can't change interface IP address %s from interface %s: %s % (self.ifname, self.ipaddr, self.details)) @@ -294,8 +294,9 @@ class BRIpError(NetError): self.brname = brname def __str__(self): -return (Bridge %s doesn't have assigned any ip address. It is - impossible to start dnsmasq for this bridge. % (self.brname)) +return (Bridge %s doesn't have an IP address assigned. 
It's + impossible to start dnsmasq for this bridge. % + (self.brname)) class HwAddrSetError(NetError): @@
[Autotest][PATCH] Autotest: Add utils for OpenVSwitch patch
pull-request https://github.com/autotest/autotest/pull/569 ForAllxx: run object method on every object in list ForAll[a,b,c].print() Signed-off-by: Jiří Župka jzu...@redhat.com --- client/shared/base_utils.py | 81 +- client/shared/openvswitch.py | 578 ++ client/tests |2 +- 3 files changed, 646 insertions(+), 15 deletions(-) create mode 100644 client/shared/openvswitch.py diff --git a/client/shared/base_utils.py b/client/shared/base_utils.py index 0734742..573b907 100644 --- a/client/shared/base_utils.py +++ b/client/shared/base_utils.py @@ -1224,6 +1224,56 @@ def system_output_parallel(commands, timeout=None, ignore_status=False, return out +class ForAll(list): +def __getattr__(self, name): +def wrapper(*args, **kargs): +return map(lambda o: o.__getattribute__(name)(*args, **kargs), self) +return wrapper + + +class ForAllP(list): + +Parallel version of ForAll + +def __getattr__(self, name): +def wrapper(*args, **kargs): +threads = [] +for o in self: +threads.append(InterruptedThread(o.__getattribute__(name), +args=args, kwargs=kargs)) +for t in threads: +t.start() +return map(lambda t: t.join(), threads) +return wrapper + + +class ForAllPSE(list): + +Parallel version of and suppress exception. 
+ +def __getattr__(self, name): +def wrapper(*args, **kargs): +threads = [] +for o in self: +threads.append(InterruptedThread(o.__getattribute__(name), +args=args, kwargs=kargs)) +for t in threads: +t.start() + +result = [] +for t in threads: +ret = {} +try: +ret[return] = t.join() +except Exception: +ret[exception] = sys.exc_info() +ret[args] = args +ret[kargs] = kargs +result.append(ret) +return result +return wrapper + + def etraceback(prep, exc_info): Enhanced Traceback formats traceback into lines prep: line\nname: line @@ -1733,9 +1783,12 @@ def import_site_function(path, module, funcname, dummy, modulefile=None): return import_site_symbol(path, module, funcname, dummy, modulefile) -def _get_pid_path(program_name): -pid_files_dir = GLOBAL_CONFIG.get_config_value(SERVER, 'pid_files_dir', - default=) +def get_pid_path(program_name, pid_files_dir=None): +if pid_files_dir is None: +pid_files_dir = GLOBAL_CONFIG.get_config_value(SERVER, + 'pid_files_dir', + default=) + if not pid_files_dir: base_dir = os.path.dirname(__file__) pid_path = os.path.abspath(os.path.join(base_dir, .., .., @@ -1746,25 +1799,25 @@ def _get_pid_path(program_name): return pid_path -def write_pid(program_name): +def write_pid(program_name, pid_files_dir=None): Try to drop program_name.pid in the main autotest directory. Args: program_name: prefix for file name -pidfile = open(_get_pid_path(program_name), w) +pidfile = open(get_pid_path(program_name, pid_files_dir), w) try: pidfile.write(%s\n % os.getpid()) finally: pidfile.close() -def delete_pid_file_if_exists(program_name): +def delete_pid_file_if_exists(program_name, pid_files_dir=None): Tries to remove program_name.pid from the main autotest directory. 
-pidfile_path = _get_pid_path(program_name) +pidfile_path = get_pid_path(program_name, pid_files_dir) try: os.remove(pidfile_path) @@ -1774,18 +1827,18 @@ def delete_pid_file_if_exists(program_name): raise -def get_pid_from_file(program_name): +def get_pid_from_file(program_name, pid_files_dir=None): Reads the pid from program_name.pid in the autotest directory. @param program_name the name of the program @return the pid if the file exists, None otherwise. -pidfile_path = _get_pid_path(program_name) +pidfile_path = get_pid_path(program_name, pid_files_dir) if not os.path.exists(pidfile_path): return None -pidfile = open(_get_pid_path(program_name), 'r') +pidfile = open(get_pid_path(program_name, pid_files_dir), 'r') try: try: @@ -1808,27 +1861,27 @@ def get_process_name(pid): return get_field(read_file(/proc/%d/stat % pid), 1)[1:-1] -def program_is_alive(program_name): +def program_is_alive(program_name, pid_files_dir=None): Checks if the process is alive and not in Zombie state. @param program_name the
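The ForAll helper described at the top of this patch ("run object method on every object in list") can be illustrated with a small Python 3 sketch. It is a simplified stand-in rather than the patch's code: it uses a list comprehension because map() is lazy in Python 3, and the Port class is a hypothetical example object.

```python
class ForAll(list):
    """List that forwards unknown method calls to every item it holds."""

    def __getattr__(self, name):
        # __getattr__ is consulted only when normal lookup fails, so the
        # regular list methods such as append() keep working.
        def wrapper(*args, **kwargs):
            return [getattr(item, name)(*args, **kwargs) for item in self]
        return wrapper


class Port(object):
    """Hypothetical object used only to demonstrate ForAll."""

    def __init__(self, name):
        self.name = name
        self.up = False

    def bring_up(self):
        self.up = True
        return self.name


ports = ForAll([Port("tap0"), Port("tap1")])
names = ports.bring_up()  # calls bring_up() on each Port in order
print(names)
```

The parallel variants in the patch follow the same pattern but start one thread per item and collect the results from join().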
Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler
* Avi Kivity a...@redhat.com [2012-09-24 17:41:19]:

On 09/21/2012 08:24 PM, Raghavendra K T wrote:
On 09/21/2012 06:32 PM, Rik van Riel wrote:
On 09/21/2012 08:00 AM, Raghavendra K T wrote:

From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

When total number of VCPUs of system is less than or equal to physical CPUs, PLE exits become costly since each VCPU can have dedicated PCPU, and trying to find a target VCPU to yield_to just burns time in PLE handler. This patch reduces overhead, by simply doing a return in such scenarios by checking the length of current cpu runqueue.

I am not convinced this is the way to go. The VCPU that is holding the lock, and is not releasing it, probably got scheduled out. That implies that VCPU is on a runqueue with at least one other task.

I see your point here, we have two cases:

case 1)
rq1 : vcpu1->wait(lockA) (spinning)
rq2 : vcpu2->holding(lockA) (running)

Here ideally vcpu1 should not enter the PLE handler, since it would surely get the lock within the ple_window cycle (assuming ple_window is tuned for that workload perfectly). Maybe this explains why we are not seeing a benefit with kernbench.

On the other side, since we cannot have a perfect ple_window tuned for all types of workloads, we gain for those workloads which may need more than 4096 cycles. Is that what we are seeing in the benefited cases?

Maybe we need to increase the ple window regardless. 4096 cycles is 2 microseconds or less (call it t_spin). The overhead from kvm_vcpu_on_spin() and the associated task switches is at least a few microseconds, increasing as contention is added (call it t_yield). The time for a natural context switch is several milliseconds (call it t_slice). There is also the time the lock holder owns the lock, assuming no contention (t_hold).

If t_yield > t_spin, then in the undercommitted case it dominates t_spin. If t_hold > t_spin we lose badly.

If t_spin > t_yield, then the undercommitted case doesn't suffer as much as most of the spinning happens in the guest instead of the host, so it can pick up the unlock timely. We don't lose too much in the overcommitted case provided the values aren't too far apart (say a factor of 3).

Obviously t_spin must be significantly smaller than t_slice, otherwise it accomplishes nothing.

Regarding t_hold: if it is small, then a larger t_spin helps avoid false exits. If it is large, then we're not very sensitive to t_spin. It doesn't matter if it takes us 2 usec or 20 usec to yield, if we end up yielding for several milliseconds.

So I think it's worth trying again with ple_window of 2-4.

Hi Avi,

I ran different benchmarks with increased ple_window, and the results do not seem encouraging for increasing ple_window.

Results: 16 core PLE machine with 16 vcpu guest.

base kernel = 3.6-rc5 + ple handler optimization patch
base_pleopt_8k = base kernel + ple window = 8k
base_pleopt_16k = base kernel + ple window = 16k
base_pleopt_32k = base kernel + ple window = 32k

Percentage improvements of benchmarks w.r.t. base_pleopt with ple_window = 4096:

                base_pleopt_8k   base_pleopt_16k   base_pleopt_32k
-------------------------------------------------------------------
kernbench_1x       -5.54915         -15.94529         -44.31562
kernbench_2x       -7.89399         -17.75039         -37.73498
-------------------------------------------------------------------
sysbench_1x         0.45955          -0.98778           0.05252
sysbench_2x         1.44071          -0.81625           1.35620
sysbench_3x         0.45549           1.51795          -0.41573
-------------------------------------------------------------------
hackbench_1x       -3.80272         -13.91456         -40.79059
hackbench_2x       -4.78999          -7.61382          -7.24475
-------------------------------------------------------------------
ebizzy_1x          -2.54626         -16.86050         -38.46109
ebizzy_2x          -8.75526         -19.29116         -48.33314
-------------------------------------------------------------------

I also got perf top output to analyse the difference. The difference comes from flushtlb (and also spinlock).
Ebizzy run for 4k ple_window

-  87.20%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
      - 100.00% _raw_spin_unlock_irqrestore
         + 52.89% release_pages
         + 47.10% pagevec_lru_move_fn
-   5.71%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
      + 86.03% default_send_IPI_mask_allbutself_phys
      + 13.96% default_send_IPI_mask_sequence_phys
-   3.10%  [kernel]  [k] smp_call_function_many
     smp_call_function_many

Ebizzy run for 32k ple_window

-  91.40%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
      - 100.00% _raw_spin_unlock_irqrestore
         + 53.13%
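The timing argument earlier in this message compares a spin window counted in cycles with overheads measured in microseconds. As a rough aid, the conversion can be sketched as below. The 2 GHz clock is purely an assumption chosen to match the "4096 cycles is 2 microseconds or less" estimate, and cycles_to_usec is an illustrative name, not something from the kernel.

```python
def cycles_to_usec(cycles, cpu_ghz):
    """Convert a cycle count to microseconds at the given clock rate."""
    # 1 GHz corresponds to 1000 cycles per microsecond.
    return cycles / (cpu_ghz * 1000.0)


ASSUMED_GHZ = 2.0  # assumption, not a measured value for the test machine

for window in (4096, 8192, 16384, 32768):
    print("ple_window=%d -> t_spin ~ %.3f usec"
          % (window, cycles_to_usec(window, ASSUMED_GHZ)))
```

Under that assumption even the 32k window spins for well under a millisecond, so t_spin stays far below the several-millisecond t_slice mentioned in the analysis.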
Re: qemu-kvm: remove boot=on|off drive parameter compatibility
On 03/10/2012 12:57, Lucas Meneghel Rodrigues wrote:

Yep, I did send patches with the testdev device present on qemu-kvm.git to qemu.git a while ago, but there were many comments on the review. I ended up not implementing everything that was asked, and the patches were archived. If nobody wants to step up to port it, I'll re-read the original thread and spin up new patches (and try to see it through to the end). Executing the KVM unittests is something that we can't afford to lose, so I'd say it's important in this last-mile effort to get rid of qemu-kvm.

Absolutely. IIRC the problem was that testdev did a little bit of everything... let's see what the functionality of testdev is:

- write (port 0xf1), can be replaced in autotest with: -device isa-debugcon,iobase=0xf1,chardev=...

- exit code (port 0xf4), see this series: http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg00818.html

- ram size (port 0xd1). If we can also patch kvm-unittests, the memory is available in the CMOS or in fwcfg. Here is the SeaBIOS code:

    u32 rs = ((inb_cmos(0x34) << 16) | (inb_cmos(0x35) << 24));
    if (rs)
        rs += 16 * 1024 * 1024;
    else
        rs = (((inb_cmos(0x30) << 10) | (inb_cmos(0x31) << 18))
              + 1 * 1024 * 1024);

The rest (ports 0xe0..0xe7, 0x2000..0x2017, MMIO) can be left in testdev.

Paolo
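To make the arithmetic in the SeaBIOS snippet quoted above concrete, here is a small Python sketch of the same computation. It assumes the shifts in that snippet are left shifts, and it replaces inb_cmos() with a plain dictionary of register values; ram_size_from_cmos and the sample register contents are illustrative only.

```python
MIB = 1024 * 1024


def ram_size_from_cmos(cmos):
    """Compute the RAM size in bytes from a dict of CMOS register values."""
    # Registers 0x34/0x35: memory above 16 MiB, counted in 64 KiB units.
    rs = (cmos.get(0x34, 0) << 16) | (cmos.get(0x35, 0) << 24)
    if rs:
        rs += 16 * MIB
    else:
        # Registers 0x30/0x31: memory above 1 MiB, counted in KiB.
        rs = ((cmos.get(0x30, 0) << 10) | (cmos.get(0x31, 0) << 18)) + 1 * MIB
    return rs


# Hypothetical register contents for two guest sizes.
print(ram_size_from_cmos({0x34: 0x00, 0x35: 0x01}) // MIB)
print(ram_size_from_cmos({0x30: 0x00, 0x31: 0x3C}) // MIB)
```

The first call takes the "above 16 MiB" branch, the second falls back to the extended-memory registers because 0x34 and 0x35 read as zero.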
Re: qemu-kvm: remove boot=on|off drive parameter compatibility
On Wed, Oct 03, 2012 at 03:19:56PM +0200, Paolo Bonzini wrote:
On 03/10/2012 12:57, Lucas Meneghel Rodrigues wrote:

Yep, I did send patches with the testdev device present on qemu-kvm.git to qemu.git a while ago, but there were many comments on the review. I ended up not implementing everything that was asked, and the patches were archived. If nobody wants to step up to port it, I'll re-read the original thread and spin up new patches (and try to see it through to the end). Executing the KVM unittests is something that we can't afford to lose, so I'd say it's important in this last-mile effort to get rid of qemu-kvm.

Absolutely. IIRC the problem was that testdev did a little bit of everything... let's see what the functionality of testdev is:

- write (port 0xf1), can be replaced in autotest with: -device isa-debugcon,iobase=0xf1,chardev=...

kvm-unit-tests no longer uses 0xf1 for output. It uses serial.

- exit code (port 0xf4), see this series: http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg00818.html

- ram size (port 0xd1). If we can also patch kvm-unittests, the memory is available in the CMOS or in fwcfg. Here is the SeaBIOS code:

    u32 rs = ((inb_cmos(0x34) << 16) | (inb_cmos(0x35) << 24));
    if (rs)
        rs += 16 * 1024 * 1024;
    else
        rs = (((inb_cmos(0x30) << 10) | (inb_cmos(0x31) << 18))
              + 1 * 1024 * 1024);

The rest (ports 0xe0..0xe7, 0x2000..0x2017, MMIO) can be left in testdev.

Paolo

--
Gleb.
Re: [PATCH 1/1] kvmclock: fix guest stop notification
On Sun, Sep 30, 2012 at 09:50:07PM -0400, Amos Kong wrote: - Original Message - On Thu, Sep 20, 2012 at 09:46:41AM -0300, Marcelo Tosatti wrote: On Thu, Sep 20, 2012 at 01:55:20PM +0530, Amit Shah wrote: Commit f349c12c0434e29c79ecde89029320c4002f7253 added the guest stop In commitlog of f349c12c0434e29c79ecde89029320c4002f7253: ## This patch uses the qemu Notifier system to tell the guest it _is about to be_ stopped notification, but it did it in a way that the stop notification would never reach the kernel. The kvm_vm_state_changed() function gets a value of 0 for the 'running' parameter when the VM is stopped, making all the code added previously dead code. This patch reworks the code so that it's called when 'running' is 0, which indicates the VM was stopped. Amit, did you touch any real issue? guest gets call trace with current code? which kind of context? Someone told me he got call trace when shutdown guest by 'init 0', I didn't verify this issue. CC: Eric B Munson emun...@mgebm.net CC: Raghavendra K T raghavendra...@linux.vnet.ibm.com CC: Andreas Färber afaer...@suse.de CC: Marcelo Tosatti mtosa...@redhat.com CC: Paolo Bonzini pbonz...@redhat.com CC: Laszlo Ersek ler...@redhat.com Signed-off-by: Amit Shah amit.s...@redhat.com --- hw/kvm/clock.c | 21 +++-- 1 files changed, 11 insertions(+), 10 deletions(-) diff --git a/hw/kvm/clock.c b/hw/kvm/clock.c index 824b978..f3427eb 100644 --- a/hw/kvm/clock.c +++ b/hw/kvm/clock.c @@ -71,18 +71,19 @@ static void kvmclock_vm_state_change(void *opaque, int running, I found this function is only called when resume vm (here running is 1, it means vm is already resumed? we don't call that ioctl _before_ resume). kvmclock_vm_state_change() is not called when I stop vm through qemu monitor command. 
void vm_start(void)
{
    if (!runstate_is_running()) {
        cpu_enable_ticks();
        runstate_set(RUN_STATE_RUNNING);
        vm_state_notify(1, RUN_STATE_RUNNING);
        resume_all_vcpus();
        monitor_protocol_event(QEVENT_RESUME, NULL);
    }
}

'running' is a bad name that causes confusion, because it refers to the present moment (which is not precise). IMO, a better name would be 'new_state'.

     if (running) {
         s->clock_valid = false;
+        return;
+    }

-        if (!cap_clock_ctrl) {
-            return;
-        }
-        for (penv = first_cpu; penv != NULL; penv = penv->next_cpu) {
-            ret = kvm_vcpu_ioctl(penv, KVM_KVMCLOCK_CTRL, 0);
-            if (ret) {
-                if (ret != -EINVAL) {
-                    fprintf(stderr, "%s: %s\n", __func__, strerror(-ret));
-                }
-                return;
+    if (!cap_clock_ctrl) {
+        return;
+    }
+    for (penv = first_cpu; penv != NULL; penv = penv->next_cpu) {
+        ret = kvm_vcpu_ioctl(penv, KVM_KVMCLOCK_CTRL, 0);
+        if (ret) {
+            if (ret != -EINVAL) {
+                fprintf(stderr, "%s: %s\n", __func__, strerror(-ret));
             }
+            return;
         }
     }
 }
--
1.7.7.6

ACK

Avi, please merge through uq/master.

NACK, guest should be notified when the VM is starting, not when stopping.

# from api.txt
ioctl (KVM_CAP_KVMCLOCK_CTRL) can be called any time _after_ pausing the vcpu, but _before_ it is resumed.

This is before it's actually resumed. From the QEMU code POV, "actually resumed" would be the point where it calls ioctl(vcpu_fd, KVM_RUN).
Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler
* Avi Kivity a...@redhat.com [2012-09-30 13:13:09]:

On 09/30/2012 01:07 PM, Gleb Natapov wrote:
On Sun, Sep 30, 2012 at 10:18:17AM +0200, Avi Kivity wrote:
On 09/28/2012 08:16 AM, Raghavendra K T wrote:

+struct pv_sched_info {
+       unsigned long sched_bitmap;

Thinking, whether we need something similar to cpumask here? Only thing is we are representing guest (v)cpumask.

DECLARE_BITMAP(sched_bitmap, KVM_MAX_VCPUS)

vcpu_id can be greater than KVM_MAX_VCPUS.

Use the index into the vcpu table as the bitmap index then. In fact it's better because then the lookup to get the vcpu pointer is trivial.

Did you mean, while setting the bitmap, we should do

    for (i = 1..n)
        if (kvm->vcpus[i] == vcpu)
            set ith position in bitmap?

I just wanted to know whether there is any easy way to convert from vcpu pointer to index in kvm vcpu table.
Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler
* Avi Kivity a...@redhat.com [2012-09-27 14:03:59]:

On 09/27/2012 01:23 PM, Raghavendra K T wrote:

[...]

2) looking at the result (comparing A and C), I do feel we have significant overhead in iterating over vcpus (when compared to even vmexit), so we would still need the undercommit fix suggested by PeterZ (improving by 140%)?

Looking only at the current runqueue? My worry is that it misses a lot of cases. Maybe try the current runqueue first and then others.

Okay. Do you mean we can have something like

+       if (rq->nr_running == 1 && p_rq->nr_running == 1) {
+               yielded = -ESRCH;
+               goto out_irq;
+       }

in Peter's patch? (I thought a lot about && versus ||. Both seem to have their own cons.) But that should be only when we have a short-term imbalance, as PeterZ said.

I am experimenting with all of these for the V2 patch. Will come back with analysis and patch.

Or were you referring to something else?
Re: [patch 1/6] cirrus_vga: allow configurable vram size
Marcelo Tosatti mtosa...@redhat.com writes:

Allow RAM size to be configurable for cirrus, to allow migration compatibility from qemu-kvm.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Reviewed-by: Anthony Liguori aligu...@us.ibm.com

Regards,

Anthony Liguori

Index: qemu-compat-kvm/hw/cirrus_vga.c
===
--- qemu-compat-kvm.orig/hw/cirrus_vga.c
+++ qemu-compat-kvm/hw/cirrus_vga.c
@@ -43,8 +43,6 @@
 //#define DEBUG_CIRRUS
 //#define DEBUG_BITBLT

-#define VGA_RAM_SIZE (8192 * 1024)
-
 /***
  *
  *  definitions
@@ -2853,7 +2851,8 @@ static void cirrus_init_common(CirrusVGA

     /* I/O handler for LFB */
     memory_region_init_io(&s->cirrus_linear_io, &cirrus_linear_io_ops, s,
-                          "cirrus-linear-io", VGA_RAM_SIZE);
+                          "cirrus-linear-io", s->vga.vram_size_mb
+                                              * 1024 * 1024);

     /* I/O handler for LFB */
     memory_region_init_io(&s->cirrus_linear_bitblt_io,
@@ -2893,7 +2892,6 @@ static int vga_initfn(ISADevice *dev)
     ISACirrusVGAState *d = DO_UPCAST(ISACirrusVGAState, dev, dev);
     VGACommonState *s = &d->cirrus_vga.vga;

-    s->vram_size_mb = VGA_RAM_SIZE >> 20;
     vga_common_init(s);
     cirrus_init_common(&d->cirrus_vga, CIRRUS_ID_CLGD5430, 0,
                        isa_address_space(dev));
@@ -2906,6 +2904,12 @@ static int vga_initfn(ISADevice *dev)
     return 0;
 }

+static Property isa_vga_cirrus_properties[] = {
+    DEFINE_PROP_UINT32("vgamem_mb", struct ISACirrusVGAState,
+                       cirrus_vga.vga.vram_size_mb, 8),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
 static void isa_cirrus_vga_class_init(ObjectClass *klass, void *data)
 {
     ISADeviceClass *k = ISA_DEVICE_CLASS(klass);
@@ -2913,6 +2917,7 @@ static void isa_cirrus_vga_class_init(Ob

     dc->vmsd = &vmstate_cirrus_vga;
     k->init = vga_initfn;
+    dc->props = isa_vga_cirrus_properties;
 }

 static TypeInfo isa_cirrus_vga_info = {
@@ -2936,7 +2941,6 @@ static int pci_cirrus_vga_initfn(PCIDevi
     int16_t device_id = pc->device_id;

     /* setup VGA */
-    s->vga.vram_size_mb = VGA_RAM_SIZE >> 20;
     vga_common_init(&s->vga);
     cirrus_init_common(s, device_id, 1, pci_address_space(dev));
     s->vga.ds = graphic_console_init(s->vga.update, s->vga.invalidate,
@@ -2968,6 +2972,12 @@ DeviceState *pci_cirrus_vga_init(PCIBus
     return &pci_create_simple(bus, -1, "cirrus-vga")->qdev;
 }

+static Property pci_vga_cirrus_properties[] = {
+    DEFINE_PROP_UINT32("vgamem_mb", struct PCICirrusVGAState,
+                       cirrus_vga.vga.vram_size_mb, 8),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
 static void cirrus_vga_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -2981,6 +2991,7 @@ static void cirrus_vga_class_init(Object
     k->class_id = PCI_CLASS_DISPLAY_VGA;
     dc->desc = "Cirrus CLGD 54xx VGA";
     dc->vmsd = &vmstate_pci_cirrus_vga;
+    dc->props = pci_vga_cirrus_properties;
 }

 static TypeInfo cirrus_vga_info = {
Re: [patch 3/6] Use machine options to emulate -no-kvm-pit
Marcelo Tosatti mtosa...@redhat.com writes:

Commit e81dda195556e72f8cd294998296c1051aab30a8 from qemu-kvm.git.

From: Jan Kiszka jan.kis...@siemens.com

Leave the related command line option in place, just issuing a warning that it has no function anymore.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Reviewed-by: Anthony Liguori aligu...@us.ibm.com

Regards,

Anthony Liguori

Index: qemu-compat-kvm/vl.c
===
--- qemu-compat-kvm.orig/vl.c
+++ qemu-compat-kvm/vl.c
@@ -3066,7 +3066,11 @@ int main(int argc, char **argv, char **e
                 qemu_opts_parse(olist, "kernel_irqchip=off", 0);
                 break;
             }
-
+            case QEMU_OPTION_no_kvm_pit: {
+                fprintf(stderr, "Warning: KVM PIT can no longer be disabled "
+                                "separately.\n");
+                break;
+            }
             case QEMU_OPTION_usb:
                 usb_enabled = 1;
                 break;
Index: qemu-compat-kvm/qemu-options.hx
===
--- qemu-compat-kvm.orig/qemu-options.hx
+++ qemu-compat-kvm/qemu-options.hx
@@ -2841,6 +2841,10 @@ ETEXI
 DEF("no-kvm-irqchip", 0, QEMU_OPTION_no_kvm_irqchip,
     "-no-kvm-irqchip disable KVM kernel mode PIC/IOAPIC/LAPIC\n",
     QEMU_ARCH_I386)

+DEF("no-kvm-pit", 0, QEMU_OPTION_no_kvm_pit,
+    "-no-kvm-pit disable KVM kernel mode PIT\n",
+    QEMU_ARCH_I386)
+
 HXCOMM This is the last statement. Insert new options before this line!
 STEXI
Re: [patch 2/6] Use machine options to emulate -no-kvm-irqchip
Marcelo Tosatti mtosa...@redhat.com writes:

Commit 3ad763fcba5bd0ec5a79d4a9b6baeef119dd4a3d from qemu-kvm.git.

From: Jan Kiszka jan.kis...@siemens.com

Upstream is moving towards this mechanism, so start using it in qemu-kvm already to configure the specific defaults: kvm enabled on, just like in-kernel irqchips.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Reviewed-by: Anthony Liguori aligu...@us.ibm.com

Although it's a little odd to have From: Jan without a SoB...

Regards,

Anthony Liguori

Index: qemu-compat-kvm/vl.c
===
--- qemu-compat-kvm.orig/vl.c
+++ qemu-compat-kvm/vl.c
@@ -3061,6 +3061,12 @@ int main(int argc, char **argv, char **e
                     machine = machine_parse(optarg);
                 }
                 break;
+            case QEMU_OPTION_no_kvm_irqchip: {
+                olist = qemu_find_opts("machine");
+                qemu_opts_parse(olist, "kernel_irqchip=off", 0);
+                break;
+            }
+
             case QEMU_OPTION_usb:
                 usb_enabled = 1;
                 break;
Index: qemu-compat-kvm/qemu-options.hx
===
--- qemu-compat-kvm.orig/qemu-options.hx
+++ qemu-compat-kvm/qemu-options.hx
@@ -2838,6 +2838,10 @@ STEXI
 Enable FIPS 140-2 compliance mode.
 ETEXI

+DEF("no-kvm-irqchip", 0, QEMU_OPTION_no_kvm_irqchip,
+    "-no-kvm-irqchip disable KVM kernel mode PIC/IOAPIC/LAPIC\n",
+    QEMU_ARCH_I386)
+
 HXCOMM This is the last statement. Insert new options before this line!
 STEXI
 @end table
Re: [patch 6/6] Emulate qemu-kvm's -tdf option
Marcelo Tosatti mtosa...@redhat.com writes:

Commit d527b774878defc27f317cdde19b5c54fd0d5666 from qemu-kvm.git.

From: Jan Kiszka jan.kis...@siemens.com

Add a warning that there is no effect anymore.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Reviewed-by: Anthony Liguori aligu...@us.ibm.com

Regards,

Anthony Liguori

Index: qemu-compat-kvm/vl.c
===
--- qemu-compat-kvm.orig/vl.c
+++ qemu-compat-kvm/vl.c
@@ -3169,6 +3169,10 @@ int main(int argc, char **argv, char **e
             case QEMU_OPTION_semihosting:
                 semihosting_enabled = 1;
                 break;
+            case QEMU_OPTION_tdf:
+                fprintf(stderr, "Warning: user space PIT time drift fix "
+                                "is no longer supported.\n");
+                break;
             case QEMU_OPTION_name:
                 qemu_name = g_strdup(optarg);
                 {
Index: qemu-compat-kvm/qemu-options.hx
===
--- qemu-compat-kvm.orig/qemu-options.hx
+++ qemu-compat-kvm/qemu-options.hx
@@ -2849,6 +2849,10 @@ DEF("no-kvm-pit-reinjection", 0, QEMU_OP
     "disable KVM kernel mode PIT interrupt reinjection\n",
     QEMU_ARCH_I386)

+DEF("tdf", 0, QEMU_OPTION_tdf,
+    "-tdf time drift fix (deprecated)\n",
+    QEMU_ARCH_ALL)
+
 HXCOMM This is the last statement. Insert new options before this line!
 STEXI
 @end table
Re: [patch 5/6] Emulate qemu-kvm's drive parameter boot=on|off
Marcelo Tosatti mtosa...@redhat.com writes:

Commit 841280b6c224ea2c6edc2f5afc2add513c85181d from qemu-kvm.git.

From: Jan Kiszka jan.kis...@siemens.com

We do not want to maintain this option forever. It will be removed after a grace period of a few releases. So warn the user that this option has no effect and will become invalid soon.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Reviewed-by: Anthony Liguori aligu...@us.ibm.com

Regards,

Anthony Liguori

Index: qemu-compat-kvm/blockdev.c
===
--- qemu-compat-kvm.orig/blockdev.c
+++ qemu-compat-kvm/blockdev.c
@@ -432,6 +432,12 @@ DriveInfo *drive_init(QemuOpts *opts, in
         return NULL;
     }

+    if (qemu_opt_get(opts, "boot") != NULL) {
+        fprintf(stderr, "qemu-kvm: boot=on|off is deprecated and will be "
+                "ignored. Future versions will reject this parameter. Please "
+                "update your scripts.\n");
+    }
+
     on_write_error = BLOCK_ERR_STOP_ENOSPC;
     if ((buf = qemu_opt_get(opts, "werror")) != NULL) {
         if (type != IF_IDE && type != IF_SCSI && type != IF_VIRTIO && type != IF_NONE) {
Index: qemu-compat-kvm/qemu-config.c
===
--- qemu-compat-kvm.orig/qemu-config.c
+++ qemu-compat-kvm/qemu-config.c
@@ -114,6 +114,10 @@ static QemuOptsList qemu_drive_opts = {
             .name = "copy-on-read",
             .type = QEMU_OPT_BOOL,
             .help = "copy read data from backing file into image file",
+        },{
+            .name = "boot",
+            .type = QEMU_OPT_BOOL,
+            .help = "(deprecated, ignored)",
         },
         { /* end of list */ }
     },
Re: [patch 4/6] Use global properties to emulate -no-kvm-pit-reinjection
Marcelo Tosatti mtosa...@redhat.com writes:

Commit 80019541e9c13fab476bee35edcef3e11646222c from qemu-kvm.git.

From: Jan Kiszka jan.kis...@siemens.com

Use global properties to emulate -no-kvm-pit-reinjection

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Reviewed-by: Anthony Liguori aligu...@us.ibm.com

Regards,

Anthony Liguori

Index: qemu-compat-kvm/vl.c
===
--- qemu-compat-kvm.orig/vl.c
+++ qemu-compat-kvm/vl.c
@@ -3071,6 +3071,21 @@ int main(int argc, char **argv, char **e
                                 "separately.\n");
                 break;
             }
+            case QEMU_OPTION_no_kvm_pit_reinjection: {
+                static GlobalProperty kvm_pit_lost_tick_policy[] = {
+                    {
+                        .driver   = "kvm-pit",
+                        .property = "lost_tick_policy",
+                        .value    = "discard",
+                    },
+                    { /* end of list */ }
+                };
+
+                fprintf(stderr, "Warning: option deprecated, use "
+                        "lost_tick_policy property of kvm-pit instead.\n");
+                qdev_prop_register_global_list(kvm_pit_lost_tick_policy);
+                break;
+            }
             case QEMU_OPTION_usb:
                 usb_enabled = 1;
                 break;
Index: qemu-compat-kvm/qemu-options.hx
===
--- qemu-compat-kvm.orig/qemu-options.hx
+++ qemu-compat-kvm/qemu-options.hx
@@ -2844,7 +2844,10 @@ DEF("no-kvm-irqchip", 0, QEMU_OPTION_no_
 DEF("no-kvm-pit", 0, QEMU_OPTION_no_kvm_pit,
     "-no-kvm-pit disable KVM kernel mode PIT\n",
     QEMU_ARCH_I386)
-
+DEF("no-kvm-pit-reinjection", 0, QEMU_OPTION_no_kvm_pit_reinjection,
+    "-no-kvm-pit-reinjection\n"
+    "disable KVM kernel mode PIT interrupt reinjection\n",
+    QEMU_ARCH_I386)

 HXCOMM This is the last statement. Insert new options before this line!
 STEXI
Re: [patch 0/6] qemu-kvm compat
Marcelo Tosatti mtosa...@redhat.com writes:

As discussed on yesterday's qemu call, here are the qemu-kvm compat patches for qemu:
- command line compatibility
- allow configurable ram size for cirrus

Whole thing looks good. I'll apply it directly to get it into qemu.git faster. Thanks.

Regards,
Anthony Liguori
Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler
On 10/03/2012 04:17 PM, Raghavendra K T wrote:

* Avi Kivity a...@redhat.com [2012-09-30 13:13:09]:

On 09/30/2012 01:07 PM, Gleb Natapov wrote:

On Sun, Sep 30, 2012 at 10:18:17AM +0200, Avi Kivity wrote:

On 09/28/2012 08:16 AM, Raghavendra K T wrote:

+struct pv_sched_info {
+ unsigned long sched_bitmap;

Thinking, whether we need something similar to cpumask here? Only thing is we are representing guest (v)cpumask.

DECLARE_BITMAP(sched_bitmap, KVM_MAX_VCPUS)

vcpu_id can be greater than KVM_MAX_VCPUS.

Use the index into the vcpu table as the bitmap index then. In fact it's better because then the lookup to get the vcpu pointer is trivial.

Did you mean, while setting the bitmap, we should do
for (i = 1..n) if (kvm->vcpus[i] == vcpu) set ith position in bitmap?

You can store i in the vcpu itself:

set_bit(vcpu->index, kvm->preempted);

I just wanted to know whether there is any easy way to convert from vcpu pointer to index in kvm vcpu table.

-- error compiling committee.c: too many arguments to function
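The suggestion above (store the table slot in the vcpu and use it as the bitmap index) can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct vm`, `struct vcpu`, `MAX_VCPUS`, and every helper name below are hypothetical stand-ins, not the KVM structures or the kernel bitmap API.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; all names are illustrative. */
#define MAX_VCPUS     64
#define BITS_PER_WORD ((int)(sizeof(unsigned long) * CHAR_BIT))
#define BITMAP_WORDS  ((MAX_VCPUS + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct vm;

struct vcpu {
    int vcpu_id;       /* chosen by userspace, may be larger than MAX_VCPUS */
    int index;         /* slot in vm->vcpus[], always below MAX_VCPUS */
    struct vm *vm;
};

struct vm {
    struct vcpu *vcpus[MAX_VCPUS];
    int nr_vcpus;
    unsigned long preempted[BITMAP_WORDS];
};

/* Register a vcpu and record its table slot; returns the slot or -1 if full. */
static int vm_add_vcpu(struct vm *vm, struct vcpu *vcpu, int vcpu_id)
{
    if (vm->nr_vcpus >= MAX_VCPUS)
        return -1;
    vcpu->vcpu_id = vcpu_id;
    vcpu->index = vm->nr_vcpus;
    vcpu->vm = vm;
    vm->vcpus[vm->nr_vcpus++] = vcpu;
    return vcpu->index;
}

/* Setting the bit needs no search: the vcpu already knows its own slot. */
static void mark_preempted(struct vcpu *vcpu)
{
    assert(vcpu->index >= 0 && vcpu->index < MAX_VCPUS);
    vcpu->vm->preempted[vcpu->index / BITS_PER_WORD] |=
        1UL << (vcpu->index % BITS_PER_WORD);
}

static void clear_preempted(struct vcpu *vcpu)
{
    assert(vcpu->index >= 0 && vcpu->index < MAX_VCPUS);
    vcpu->vm->preempted[vcpu->index / BITS_PER_WORD] &=
        ~(1UL << (vcpu->index % BITS_PER_WORD));
}

static bool is_preempted(const struct vcpu *vcpu)
{
    return (vcpu->vm->preempted[vcpu->index / BITS_PER_WORD] >>
            (vcpu->index % BITS_PER_WORD)) & 1UL;
}

/* The reverse lookup (bit position to vcpu pointer) is a plain array access. */
static struct vcpu *first_preempted(struct vm *vm)
{
    int i;

    for (i = 0; i < vm->nr_vcpus; i++) {
        if (is_preempted(vm->vcpus[i]))
            return vm->vcpus[i];
    }
    return NULL;
}
```

Because the bitmap is indexed by table slot rather than by vcpu_id, a vcpu created with a large id still maps to a small bit position, which is the point raised in the thread.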
Re: [kvmarm] [PATCH v2 08/10] ARM: KVM: VGIC initialisation code
On Wed, Oct 3, 2012 at 6:02 AM, Will Deacon will.dea...@arm.com wrote:

On Tue, Oct 02, 2012 at 08:45:54PM +0100, Peter Maydell wrote:

On 2 October 2012 20:28, Will Deacon will.dea...@arm.com wrote:

On Tue, Oct 02, 2012 at 07:31:43PM +0100, Peter Maydell wrote:

We probably want to be passing in the base of the cpu-internal peripherals, rather than base of the GIC specifically. For the A15 these are the same thing, but that's not inherent [compare the A9 which has more devices at fixed offsets from a configurable base address].

If you do that, userspace will need a way to probe the emulated CPU so that it knows exactly which set of peripherals there are and which ones it needs to emulate. This feels pretty nasty, given that the vgic is handled more or less completely by the kernel-side of things.

Userspace knows what the emulated CPU is because it tells the kernel which CPU to provide -- the kernel can say yes or no but it can't provide a different CPU to the one we ask for, or one with bits missing...

Aha, ok, I didn't realise that's how it works. Does userspace just pass the CPUID or is there an identifier provided by kvm? /me jumps back into the code.

Userspace provides an identifier (0 for Cortex-A15). This changed in the last patch series, so as to only have one (public and internal) identifier used to index into the array of core-specific coprocessor handlings.
Re: [patch 2/6] Use machine options to emulate -no-kvm-irqchip
On Wed, Oct 03, 2012 at 09:40:17AM -0500, Anthony Liguori wrote:

Marcelo Tosatti mtosa...@redhat.com writes:

Commit 3ad763fcba5bd0ec5a79d4a9b6baeef119dd4a3d from qemu-kvm.git.

From: Jan Kiszka jan.kis...@siemens.com

Upstream is moving towards this mechanism, so start using it in qemu-kvm already to configure the specific defaults: kvm enabled on, just like in-kernel irqchips.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Reviewed-by: Anthony Liguori aligu...@us.ibm.com

Although it's a little odd to have From: Jan without a SoB...

Agree, Jan can you ACK?
Re: [patch 0/6] qemu-kvm compat
On Wed, Oct 03, 2012 at 09:45:07AM -0500, Anthony Liguori wrote:

Marcelo Tosatti mtosa...@redhat.com writes:

As discussed on yesterday's qemu call, here are the qemu-kvm compat patches for qemu:
- command line compatibility
- allow configurable ram size for cirrus

Whole thing looks good. I'll apply it directly to get it into qemu.git faster.

Great. I'll test migration later today. You will take care of the default options matching qemu-kvm, as agreed, yes? Via machine options?
Re: [Qemu-devel] [patch 0/6] qemu-kvm compat
Marcelo Tosatti mtosa...@redhat.com writes:

On Wed, Oct 03, 2012 at 09:45:07AM -0500, Anthony Liguori wrote:

Marcelo Tosatti mtosa...@redhat.com writes:

As discussed on yesterday's qemu call, here are the qemu-kvm compat patches for qemu:
- command line compatibility
- allow configurable ram size for cirrus

Whole thing looks good. I'll apply it directly to get it into qemu.git faster.

Great. I'll test migration later today. You will take care of the default options matching qemu-kvm, as agreed, yes? Via machine options?

Yup.

Regards,
Anthony Liguori
Re: [patch 2/6] Use machine options to emulate -no-kvm-irqchip
On 2012-10-03 17:46, Jan Kiszka wrote:

On 2012-10-03 17:03, Marcelo Tosatti wrote:

On Wed, Oct 03, 2012 at 09:40:17AM -0500, Anthony Liguori wrote:

Marcelo Tosatti mtosa...@redhat.com writes:

Commit 3ad763fcba5bd0ec5a79d4a9b6baeef119dd4a3d from qemu-kvm.git.

From: Jan Kiszka jan.kis...@siemens.com

Upstream is moving towards this mechanism, so start using it in qemu-kvm already to configure the specific defaults: kvm enabled on, just like in-kernel irqchips.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Reviewed-by: Anthony Liguori aligu...@us.ibm.com

Although it's a little odd to have From: Jan without a SoB...

Agree, Jan can you ACK?

I wasn't able to join the call yesterday: Is there a removal schedule associated with those switches? Also, why pushing things upstream, even when only for one release, that have been loudly deprecated for a while in qemu-kvm? Some switches are lacking deprecated warnings on the console, and -no-kvm is missing completely. I tend to focus on patch 1 & 5, dropping the rest - based on relevance for production use.

I guess patch 4 is fine as well, so consider 4 & 5 ack'ed.

Jan
Re: [patch 2/6] Use machine options to emulate -no-kvm-irqchip
On 2012-10-03 17:03, Marcelo Tosatti wrote:

On Wed, Oct 03, 2012 at 09:40:17AM -0500, Anthony Liguori wrote:

Marcelo Tosatti mtosa...@redhat.com writes:

Commit 3ad763fcba5bd0ec5a79d4a9b6baeef119dd4a3d from qemu-kvm.git.

From: Jan Kiszka jan.kis...@siemens.com

Upstream is moving towards this mechanism, so start using it in qemu-kvm already to configure the specific defaults: kvm enabled on, just like in-kernel irqchips.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Reviewed-by: Anthony Liguori aligu...@us.ibm.com

Although it's a little odd to have From: Jan without a SoB...

Agree, Jan can you ACK?

I wasn't able to join the call yesterday: Is there a removal schedule associated with those switches? Also, why pushing things upstream, even when only for one release, that have been loudly deprecated for a while in qemu-kvm? Some switches are lacking deprecated warnings on the console, and -no-kvm is missing completely. I tend to focus on patch 1 & 5, dropping the rest - based on relevance for production use.

Jan
Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler
On 10/03/2012 02:22 PM, Raghavendra K T wrote:

So I think it's worth trying again with ple_window of 2-4.

Hi Avi, I ran different benchmarks increasing ple_window, and results does not seem to be encouraging for increasing ple_window.

Thanks for testing! Comments below.

Results: 16 core PLE machine with 16 vcpu guest.

base kernel     = 3.6-rc5 + ple handler optimization patch
base_pleopt_8k  = base kernel + ple window = 8k
base_pleopt_16k = base kernel + ple window = 16k
base_pleopt_32k = base kernel + ple window = 32k

Percentage improvements of benchmarks w.r.t base_pleopt with ple_window = 4096

               base_pleopt_8k   base_pleopt_16k   base_pleopt_32k
kernbench_1x      -5.54915         -15.94529         -44.31562
kernbench_2x      -7.89399         -17.75039         -37.73498

So, 44% degradation even with no overcommit? That's surprising.

I also got perf top output to analyse the difference. Difference comes because of flushtlb (and also spinlock).

That's in the guest, yes?

Ebizzy run for 4k ple_window

-  87.20%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
      - 100.00% _raw_spin_unlock_irqrestore
         + 52.89% release_pages
         + 47.10% pagevec_lru_move_fn
-   5.71%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
      + 86.03% default_send_IPI_mask_allbutself_phys
      + 13.96% default_send_IPI_mask_sequence_phys
-   3.10%  [kernel]  [k] smp_call_function_many
     smp_call_function_many

Ebizzy run for 32k ple_window

-  91.40%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
      - 100.00% _raw_spin_unlock_irqrestore
         + 53.13% release_pages
         + 46.86% pagevec_lru_move_fn
-   4.38%  [kernel]  [k] smp_call_function_many
     smp_call_function_many
-   2.51%  [kernel]  [k] arch_local_irq_restore
   - arch_local_irq_restore
      + 90.76% default_send_IPI_mask_allbutself_phys
      + 9.24% default_send_IPI_mask_sequence_phys

Both the 4k and the 32k results are crazy. Why is arch_local_irq_restore() so prominent? Do you have a very high interrupt rate in the guest?
Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip
Jan Kiszka jan.kis...@web.de writes:

On 2012-10-03 17:03, Marcelo Tosatti wrote:

On Wed, Oct 03, 2012 at 09:40:17AM -0500, Anthony Liguori wrote:

Marcelo Tosatti mtosa...@redhat.com writes:

Commit 3ad763fcba5bd0ec5a79d4a9b6baeef119dd4a3d from qemu-kvm.git.

From: Jan Kiszka jan.kis...@siemens.com

Upstream is moving towards this mechanism, so start using it in qemu-kvm already to configure the specific defaults: kvm enabled on, just like in-kernel irqchips.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Reviewed-by: Anthony Liguori aligu...@us.ibm.com

Although it's a little odd to have From: Jan without a SoB...

Agree, Jan can you ACK?

I wasn't able to join the call yesterday: Is there a removal schedule associated with those switches? Also, why pushing things upstream, even when only for one release, that have been loudly deprecated for a while in qemu-kvm? Some switches are lacking deprecated warnings on the console, and -no-kvm is missing completely. I tend to focus on patch 1 & 5, dropping the rest - based on relevance for production use.

The distros need to keep these flags to do the switch. I see no point in deprecating them since they're trivially easy to maintain. So we'd just support them forever.

Regards,
Anthony Liguori

Jan
Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip
On 2012-10-03 19:16, Anthony Liguori wrote:

Jan Kiszka jan.kis...@web.de writes:

On 2012-10-03 17:03, Marcelo Tosatti wrote:

On Wed, Oct 03, 2012 at 09:40:17AM -0500, Anthony Liguori wrote:

Marcelo Tosatti mtosa...@redhat.com writes:

Commit 3ad763fcba5bd0ec5a79d4a9b6baeef119dd4a3d from qemu-kvm.git.

From: Jan Kiszka jan.kis...@siemens.com

Upstream is moving towards this mechanism, so start using it in qemu-kvm already to configure the specific defaults: kvm enabled on, just like in-kernel irqchips.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Reviewed-by: Anthony Liguori aligu...@us.ibm.com

Although it's a little odd to have From: Jan without a SoB...

Agree, Jan can you ACK?

I wasn't able to join the call yesterday: Is there a removal schedule associated with those switches? Also, why pushing things upstream, even when only for one release, that have been loudly deprecated for a while in qemu-kvm? Some switches are lacking deprecated warnings on the console, and -no-kvm is missing completely. I tend to focus on patch 1 & 5, dropping the rest - based on relevance for production use.

The distros need to keep these flags to do the switch.

Why? Should be documented in commit log.

I see no point in deprecating them since they're trivially easy to maintain.

Given the level of cr** we already have in the command line, they are kind of noise, yes. But even then, these patches are not consistent as pointed out above. Also, they should not be documented to avoid being spread. That's what we did with other deprecated switches in QEMU.

Jan
Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler
On 10/03/2012 04:29 PM, Raghavendra K T wrote:

* Avi Kivity a...@redhat.com [2012-09-27 14:03:59]:

On 09/27/2012 01:23 PM, Raghavendra K T wrote:

[...]

2) looking at the result (comparing A & C) , I do feel we have significant in iterating over vcpus (when compared to even vmexit) so We still would need undercommit fix suggested by PeterZ (improving by 140%). ?

Looking only at the current runqueue? My worry is that it misses a lot of cases. Maybe try the current runqueue first and then others.

Okay. Do you mean we can have something like

+ if (rq->nr_running == 1 && p_rq->nr_running == 1) {
+ yielded = -ESRCH;
+ goto out_irq;
+ }

in the Peter's patch ? ( I thought lot about && or || . Both seem to have their own cons ). But that should be only when we have short term imbalance, as PeterZ told.

I'm missing the context. What is p_rq? What I mean was:

if can_yield_to_process_in_current_rq
   do that
else if can_yield_to_process_in_other_rq
   do that
else
   return -ESRCH

-- error compiling committee.c: too many arguments to function
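The three-way pseudocode above can be written out as a small standalone function. This is a sketch only: `struct task`, its `rq` field, and `pick_yield_target` are hypothetical names for illustration and have nothing to do with the scheduler's real data structures or yield_to().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical candidate descriptor; not a kernel structure. */
struct task {
    int runnable;   /* non-zero if the task could accept the yield */
    int rq;         /* runqueue the task currently sits on */
};

/*
 * Prefer a runnable candidate on the caller's own runqueue, fall back to
 * any other runqueue, and report -ESRCH when nothing is eligible.
 * Returns the index of the chosen task on success.
 */
static int pick_yield_target(const struct task *tasks, size_t n, int current_rq)
{
    size_t i;

    assert(tasks != NULL || n == 0);

    /* First pass: candidates on the current runqueue. */
    for (i = 0; i < n; i++) {
        if (tasks[i].runnable && tasks[i].rq == current_rq)
            return (int)i;
    }

    /* Second pass: candidates on any other runqueue. */
    for (i = 0; i < n; i++) {
        if (tasks[i].runnable)
            return (int)i;
    }

    return -ESRCH;
}
```

The two passes keep the cheap local case first while still covering the remote candidates that a current-runqueue-only check would miss.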
Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip
Jan Kiszka jan.kis...@web.de writes:

On 2012-10-03 19:16, Anthony Liguori wrote:

Jan Kiszka jan.kis...@web.de writes:

On 2012-10-03 17:03, Marcelo Tosatti wrote:

On Wed, Oct 03, 2012 at 09:40:17AM -0500, Anthony Liguori wrote:

Marcelo Tosatti mtosa...@redhat.com writes:

Commit 3ad763fcba5bd0ec5a79d4a9b6baeef119dd4a3d from qemu-kvm.git.

From: Jan Kiszka jan.kis...@siemens.com

Upstream is moving towards this mechanism, so start using it in qemu-kvm already to configure the specific defaults: kvm enabled on, just like in-kernel irqchips.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Reviewed-by: Anthony Liguori aligu...@us.ibm.com

Although it's a little odd to have From: Jan without a SoB...

Agree, Jan can you ACK?

I wasn't able to join the call yesterday: Is there a removal schedule associated with those switches? Also, why pushing things upstream, even when only for one release, that have been loudly deprecated for a while in qemu-kvm? Some switches are lacking deprecated warnings on the console, and -no-kvm is missing completely. I tend to focus on patch 1 & 5, dropping the rest - based on relevance for production use.

The distros need to keep these flags to do the switch.

Why? Should be documented in commit log.

I see no point in deprecating them since they're trivially easy to maintain.

Given the level of cr** we already have in the command line, they are kind of noise, yes. But even then, these patches are not consistent as pointed out above. Also, they should not be documented to avoid being spread. That's what we did with other deprecated switches in QEMU.

The patchset isn't checkpatch clean so I'll fix that, remove the docs, and send a new version tomorrow along with the machine changes.

Regards,
Anthony Liguori

Jan
Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip
On Wed, Oct 03, 2012 at 07:52:57AM -0300, Marcelo Tosatti wrote:

Commit 3ad763fcba5bd0ec5a79d4a9b6baeef119dd4a3d from qemu-kvm.git.

From: Jan Kiszka jan.kis...@siemens.com

Upstream is moving towards this mechanism, so start using it in qemu-kvm already to configure the specific defaults: kvm enabled on, just like in-kernel irqchips.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu-compat-kvm/vl.c
===
--- qemu-compat-kvm.orig/vl.c
+++ qemu-compat-kvm/vl.c
@@ -3061,6 +3061,12 @@ int main(int argc, char **argv, char **e
                     machine = machine_parse(optarg);
                 }
                 break;
+            case QEMU_OPTION_no_kvm_irqchip: {
+                olist = qemu_find_opts("machine");
+                qemu_opts_parse(olist, "kernel_irqchip=off", 0);
+                break;
+            }
+
             case QEMU_OPTION_usb:
                 usb_enabled = 1;
                 break;
Index: qemu-compat-kvm/qemu-options.hx
===
--- qemu-compat-kvm.orig/qemu-options.hx
+++ qemu-compat-kvm/qemu-options.hx
@@ -2838,6 +2838,10 @@ STEXI
 Enable FIPS 140-2 compliance mode.
 ETEXI

+DEF("no-kvm-irqchip", 0, QEMU_OPTION_no_kvm_irqchip,
+    "-no-kvm-irqchip disable KVM kernel mode PIC/IOAPIC/LAPIC\n",
+    QEMU_ARCH_I386)
+
 HXCOMM This is the last statement. Insert new options before this line!
 STEXI
 @end table

As far as I understand, this option was not in QEMU because this syntax is considered deprecated. Can we also output a warning message in that case?

-- Aurelien Jarno GPG: 1024D/F1BCDB73 aurel...@aurel32.net http://www.aurel32.net
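The warning asked for above could be driven from a small translation table. The following is a minimal standalone sketch, not QEMU code: `struct legacy_option` and `translate_legacy_option` are hypothetical, and the replacement spellings are an assumption based on the machine option (kernel_irqchip=off) and the kvm-pit lost_tick_policy property named in the patches of this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mapping from a legacy qemu-kvm switch to its replacement. */
struct legacy_option {
    const char *flag;
    const char *replacement;
};

/* Replacement strings are assumptions derived from the patches in this thread. */
static const struct legacy_option legacy_options[] = {
    { "-no-kvm-irqchip",         "-machine kernel_irqchip=off" },
    { "-no-kvm-pit-reinjection", "-global kvm-pit.lost_tick_policy=discard" },
};

/*
 * Look up a legacy switch. On a match, optionally print a deprecation
 * warning to `warn` and return the replacement; return NULL otherwise.
 */
static const char *translate_legacy_option(const char *flag, FILE *warn)
{
    size_t i;

    assert(flag != NULL);

    for (i = 0; i < sizeof(legacy_options) / sizeof(legacy_options[0]); i++) {
        if (strcmp(flag, legacy_options[i].flag) == 0) {
            if (warn != NULL) {
                fprintf(warn, "Warning: %s is deprecated, use '%s' instead.\n",
                        flag, legacy_options[i].replacement);
            }
            return legacy_options[i].replacement;
        }
    }
    return NULL;
}
```

Calling translate_legacy_option("-no-kvm-irqchip", stderr) would print one warning line and hand back the replacement string, so every legacy switch gets the same treatment from a single place.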
Re: [Qemu-devel] [patch 2/6] Use machine options to emulate -no-kvm-irqchip
On Wed, Oct 03, 2012 at 07:24:48PM +0200, Jan Kiszka wrote:

On 2012-10-03 19:16, Anthony Liguori wrote:

Jan Kiszka jan.kis...@web.de writes:

On 2012-10-03 17:03, Marcelo Tosatti wrote:

On Wed, Oct 03, 2012 at 09:40:17AM -0500, Anthony Liguori wrote:

Marcelo Tosatti mtosa...@redhat.com writes:

Commit 3ad763fcba5bd0ec5a79d4a9b6baeef119dd4a3d from qemu-kvm.git.

From: Jan Kiszka jan.kis...@siemens.com

Upstream is moving towards this mechanism, so start using it in qemu-kvm already to configure the specific defaults: kvm enabled on, just like in-kernel irqchips.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Reviewed-by: Anthony Liguori aligu...@us.ibm.com

Although it's a little odd to have From: Jan without a SoB...

Agree, Jan can you ACK?

I wasn't able to join the call yesterday: Is there a removal schedule associated with those switches? Also, why pushing things upstream, even when only for one release, that have been loudly deprecated for a while in qemu-kvm? Some switches are lacking deprecated warnings on the console, and -no-kvm is missing completely. I tend to focus on patch 1 & 5, dropping the rest - based on relevance for production use.

The distros need to keep these flags to do the switch.

Why? Should be documented in commit log.

I see no point in deprecating them since they're trivially easy to maintain.

Given the level of cr** we already have in the command line, they are kind of noise, yes. But even then, these patches are not consistent as pointed out above. Also, they should not be documented to avoid being spread. That's what we did with other deprecated switches in QEMU.

Jan

Jan,

Your comments on the patch are:
- No documentation.
- Expiration date.
- Changelog explaining what?? (didn't get that). Perhaps better changelog in general?

Please help me understand.
Re: [PATCH] kvm: Set default accelerator to kvm if the host supports it
On Mon, Oct 1, 2012 at 4:20 PM, Anthony Liguori anth...@codemonkey.ws wrote:

Jan Kiszka jan.kis...@siemens.com writes:

If we built a target for a host that supports KVM in principle, set the default accelerator to KVM as well. This also means that QEMU will fail to start if KVM support turns out to be unavailable at runtime.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 kvm-all.c  |    1 +
 kvm-stub.c |    1 +
 kvm.h      |    1 +
 vl.c       |    4 ++--
 4 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 92a7137..4d5f86c 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -103,6 +103,7 @@ struct KVMState
 #endif
 };

+bool kvm_configured = true;
 KVMState *kvm_state;
 bool kvm_kernel_irqchip;
 bool kvm_async_interrupts_allowed;
diff --git a/kvm-stub.c b/kvm-stub.c
index 3c52eb5..86a6451 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -17,6 +17,7 @@
 #include "gdbstub.h"
 #include "kvm.h"

+bool kvm_configured;
 KVMState *kvm_state;
 bool kvm_kernel_irqchip;
 bool kvm_async_interrupts_allowed;
diff --git a/kvm.h b/kvm.h
index dea2998..9936e5f 100644
--- a/kvm.h
+++ b/kvm.h
@@ -22,6 +22,7 @@
 #include <linux/kvm.h>
 #endif

+extern bool kvm_configured;
 extern int kvm_allowed;
 extern bool kvm_kernel_irqchip;
 extern bool kvm_async_interrupts_allowed;
diff --git a/vl.c b/vl.c
index 8d305ca..f557bd1 100644
--- a/vl.c
+++ b/vl.c
@@ -2215,8 +2215,8 @@ static int configure_accelerator(void)
     }

     if (p == NULL) {
-        /* Use the default accelerator, tcg */
-        p = "tcg";
+        /* The default accelerator depends on the availability of KVM. */
+        p = kvm_configured ? "kvm" : "tcg";
     }

How about making this an arch_init() function call and then using a #if defined(KVM_CONFIG) in arch_init.c? I hate to introduce another global variable if we can avoid it...

Otherwise:

Acked-by: Anthony Liguori aligu...@us.ibm.com

Blue/Aurelien, any objections?

No, maybe a message could be printed that says that the default has changed, for a few releases.

Regards,
Anthony Liguori

     while (!accel_initialised && *p != '\0') {
--
1.7.3.4
Re: [Qemu-devel] [PATCH] kvm: Set default accelerator to kvm if the host supports it
On 3 October 2012 21:01, Blue Swirl blauwir...@gmail.com wrote:

On Mon, Oct 1, 2012 at 4:20 PM, Anthony Liguori anth...@codemonkey.ws wrote:

Jan Kiszka jan.kis...@siemens.com writes:

+/* The default accelerator depends on the availability of KVM. */
+p = kvm_configured ? "kvm" : "tcg";
}

Blue/Aurelien, any objections?

No, maybe a message could be printed that says that the default has changed, for a few releases.

I've lost track of the conversation, are we currently proposing the accelerator default to be kvm (as per the original patch you quote here) or kvm:tcg ? I'm not entirely sure which I prefer from an ARM perspective.

For some time to come and for a lot of targets (ie any target CPU except A15), having a default of kvm is going to cause existing working commandlines to stop working. [I expect that ARM-host qemu binaries will be built with CONFIG_KVM once ARM KVM support lands, but the same binary will be run on hosts without virtualization extensions.]

On the other hand, perhaps there just aren't really very many people who run QEMU on ARM hosts, and so we can ignore them :-)

-- PMM
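The difference between a default of kvm and a default of kvm:tcg discussed in this thread is easiest to see with a small fallback loop. This is a hedged sketch and not QEMU's configure_accelerator(): `struct accel`, `configure_accel`, the warning text, and the `demo_*` stubs are all hypothetical names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical accelerator descriptor: a name plus an init hook. */
struct accel {
    const char *name;
    bool (*init)(void);   /* returns true if the accelerator came up */
};

/*
 * Walk a colon-separated list such as "kvm:tcg", trying each accelerator
 * in order. A failure produces a warning on `log` (if non-NULL) and the
 * next entry is tried. Returns the name that initialised, or NULL.
 */
static const char *configure_accel(const char *list, const struct accel *accels,
                                   size_t n, FILE *log)
{
    const char *p = list;

    assert(list != NULL);

    while (*p != '\0') {
        size_t len = strcspn(p, ":");
        size_t i;

        for (i = 0; i < n; i++) {
            if (strlen(accels[i].name) == len &&
                strncmp(accels[i].name, p, len) == 0) {
                if (accels[i].init())
                    return accels[i].name;
                if (log != NULL)
                    fprintf(log, "warning: accelerator '%s' unavailable, "
                            "trying the next one\n", accels[i].name);
                break;
            }
        }
        if (i == n && log != NULL)
            fprintf(log, "warning: unknown accelerator '%.*s'\n", (int)len, p);

        p += len;
        if (*p == ':')
            p++;
    }
    return NULL;
}

/* Demo stubs: pretend KVM is missing on this host and TCG always works. */
static bool demo_kvm_init(void) { return false; }
static bool demo_tcg_init(void) { return true; }

static const struct accel demo_accels[] = {
    { "kvm", demo_kvm_init },
    { "tcg", demo_tcg_init },
};
```

With the demo stubs, a list of "kvm" alone fails outright, while "kvm:tcg" warns once and then runs under the slower emulation, which is the behaviour several posters in this thread argue for.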
Re: [PATCH 0/3] virtio-net: inline header support
Rusty Russell ru...@rustcorp.com.au writes:

Michael S. Tsirkin m...@redhat.com writes:

Thinking about Sasha's patches, we can reduce ring usage for virtio net small packets dramatically if we put virtio net header inline with the data. This can be done for free in case guest net stack allocated extra head room for the packet, and I don't see why would this have any downsides.

I've been wanting to do this for the longest time... but...

Even though with my recent patches qemu no longer requires header to be the first s/g element,

Breaks for me; see why I hate bug features? Now we'd need another one...

qemu-system-i386: virtio: trying to map MMIO memory

Please try my patch.

Cheers,
Rusty.
Re: [PATCH 2/3] virtio-net: correct capacity math on ring full
Michael S. Tsirkin m...@redhat.com writes:

Capacity math on ring full is wrong: we are looking at num_sg but that might be optimistic because of indirect buffer use. The implementation also penalizes fast path with extra memory accesses for the benefit of ring full condition handling which is slow path. It's easy to query ring capacity so let's do just that.

This patch will reduce the actual queue use to worst-case assumptions. With bufferbloat maybe that's a good thing, but it's true.

If we do this, the code is now wrong:

/* This can happen with OOM and indirect buffers. */
if (unlikely(capacity < 0)) {

Because this should now *never* happen.

But I do like the cleanup; returning capacity from add_buf() was always hacky. I've got an idea, we'll see what it looks like...

Cheers,
Rusty.
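The capacity argument above can be illustrated with a toy ring. This is a sketch under loud assumptions: `struct ring`, `ring_add`, and `ring_nearly_full` are hypothetical and far simpler than virtio_ring; the only point is that an indirect add consumes a single slot, so the space left over after it says nothing about whether a later direct add of a worst-case packet will fit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical descriptor ring: only the free-slot accounting is modelled. */
struct ring {
    unsigned int size;
    unsigned int num_free;
};

/*
 * Add a buffer of `nsegs` segments. With an indirect table the ring itself
 * spends one descriptor; without it, one descriptor per segment.
 * Returns 0 on success or -ENOSPC if the ring cannot hold the buffer.
 */
static int ring_add(struct ring *r, unsigned int nsegs, bool indirect)
{
    unsigned int needed = indirect ? 1 : nsegs;

    assert(nsegs > 0);

    if (needed > r->num_free)
        return -ENOSPC;
    r->num_free -= needed;
    return 0;
}

/*
 * Query the ring directly and compare against the worst-case segment count
 * of the next packet, instead of trusting a value derived from the last add.
 */
static bool ring_nearly_full(const struct ring *r, unsigned int worst_case_segs)
{
    return r->num_free < worst_case_segs;
}
```

A transmit path built this way would stop its queue as soon as ring_nearly_full() reports true, which is the pessimistic behaviour described above: the direct-add failure then never needs to be handled after the fact.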
Re: [PATCH 0/3] virtio-net: inline header support
Rusty Russell ru...@rustcorp.com.au writes:

Michael S. Tsirkin m...@redhat.com writes:

Thinking about Sasha's patches, we can reduce ring usage for virtio net small packets dramatically if we put virtio net header inline with the data. This can be done for free in case guest net stack allocated extra head room for the packet, and I don't see why would this have any downsides.

I've been wanting to do this for the longest time... but...

Even though with my recent patches qemu no longer requires header to be the first s/g element, we need a new feature bit to detect this. A trivial qemu patch will be sent separately.

There's a reason I haven't done this. I really, really dislike "my implementation isn't broken" feature bits. We could have an infinite number of them, for each bug in each device.

This is a bug in the specification. The QEMU implementation pre-dates the specification. All of the actual implementations of virtio relied on the semantics of s/g elements and still do. What's in the specification really doesn't matter when it doesn't agree with all of the existing implementations. Users use implementations, not specifications. The specification really ought to be changed here.

Regards,
Anthony Liguori
Re: [PATCH 0/3] virtio-net: inline header support
Paolo Bonzini pbonz...@redhat.com writes:

On 03/10/2012 08:44, Rusty Russell wrote:

There's a reason I haven't done this. I really, really dislike "my implementation isn't broken" feature bits. We could have an infinite number of them, for each bug in each device.

However, this bug affects (almost) all implementations and (almost) all devices. It even makes sense to reserve a transport feature bit for it instead of a device feature bit.

Paolo

Perhaps, but we have to fix the bugs first! As I said, my torture patch broke qemu immediately. Since no one has leapt onto fixing that, I'll take a look now...

Cheers,
Rusty.
Re: [PATCH 0/3] virtio-net: inline header support
Rusty Russell ru...@rustcorp.com.au writes:

Michael S. Tsirkin m...@redhat.com writes:

There's a reason I haven't done this. I really, really dislike "my implementation isn't broken" feature bits. We could have an infinite number of them, for each bug in each device. So my plan was to tie this assumption to the new PCI layout. And have a stress-testing patch like the one below in the kernel (see my virtio-wip branch for stuff like this). Turn it on at boot with virtio_ring.torture on the kernel commandline. BTW, I've fixed lguest, but my kvm here (Ubuntu precise, kvm-qemu 1.0) is too old. Building the latest git now...

Cheers,
Rusty.

Subject: virtio: CONFIG_VIRTIO_DEVICE_TORTURE

Virtio devices are not supposed to depend on the framing of the scatter-gather lists, but various implementations did. Safeguard this in future by adding an option to deliberately create perverse descriptors.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au

Ignoring framing is really a bad idea. You want backends to enforce reasonable framing because guests shouldn't do silly things with framing. For instance, with virtio-blk, if you want decent performance, you absolutely want to avoid bouncing the data. If you're using O_DIRECT in the host to submit I/O requests, then it's critical that all of the s/g elements are aligned to a sector boundary and sized to a sector boundary. Yes, QEMU can handle it if that's not the case, but it would be insanely stupid for a guest not to do this.

This is the sort of thing that ought to be enforced in the specification because a guest cannot perform well if it doesn't follow these rules. A spec isn't terribly useful if the result is guest drivers that are slow. There's very little to gain by not enforcing rules around framing and there's a lot to lose if a guest frames incorrectly. In the rare case where we want to make a framing change, we should use feature bits like Michael is proposing.
In this case, we should simply say that with the feature bit, the vnet header can be in the same element as the data but not allow the header to be spread across multiple elements. Regards, Anthony Liguori diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig index 8d5bddb..930a4ea 100644 --- a/drivers/virtio/Kconfig +++ b/drivers/virtio/Kconfig @@ -5,6 +5,15 @@ config VIRTIO bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST, CONFIG_RPMSG or CONFIG_S390_GUEST. +config VIRTIO_DEVICE_TORTURE + bool Virtio device torture tests + depends on VIRTIO DEBUG_KERNEL + help + This makes the virtio_ring implementation creatively change + the format of requests to make sure that devices are + properly implemented. This will make your virtual machine + slow *and* unreliable! Say N. + menu Virtio drivers config VIRTIO_PCI diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index e639584..8893753 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -124,6 +124,149 @@ struct vring_virtqueue #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq) +#ifdef CONFIG_VIRTIO_DEVICE_TORTURE +static bool torture; +module_param(torture, bool, 0644); + +struct torture { + unsigned int orig_out, orig_in; + void *orig_data; + struct scatterlist sg[4]; + struct scatterlist orig_sg[]; +}; + +static size_t tot_len(struct scatterlist sg[], unsigned num) +{ + size_t len, i; + + for (len = 0, i = 0; i num; i++) + len += sg[i].length; + + return len; +} + +static void copy_sg_data(const struct scatterlist *dst, unsigned dnum, + const struct scatterlist *src, unsigned snum) +{ + unsigned len; + struct scatterlist s, d; + + s = *src; + d = *dst; + + while (snum dnum) { + len = min(s.length, d.length); + memcpy(sg_virt(d), sg_virt(s), len); + d.offset += len; + d.length -= len; + s.offset += len; + s.length -= len; + if (!s.length) { + BUG_ON(snum == 0); + src++; + snum--; + s = *src; + } + if (!d.length) { + BUG_ON(dnum == 
0); + dst++; + dnum--; + d = *dst; + } + } +} + +static bool torture_replace(struct scatterlist **sg, + unsigned int *out, + unsigned int *in, + void **data, + gfp_t gfp) +{ + static size_t seed; + struct torture *t; + size_t outlen, inlen, ourseed, len1; + void *buf; + + if (!torture) + return true; + + outlen = tot_len(*sg, *out); + inlen = tot_len(*sg +
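To make the O_DIRECT argument concrete, here is a minimal sketch in plain C of the kind of check a backend could run to decide whether a request can be submitted without bouncing. The struct sg_elem type, the function name and the 512-byte SECTOR_SIZE are assumptions for illustration only; none of them come from QEMU or the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sector size; real devices may report a different value. */
#define SECTOR_SIZE 512u

/* Hypothetical stand-in for one scatter-gather element. */
struct sg_elem {
    uintptr_t addr; /* start address of the buffer */
    size_t len;     /* length of the buffer in bytes */
};

/*
 * Return true only if every element both starts on a sector boundary
 * and has a length that is a whole number of sectors. An empty list
 * is trivially aligned.
 */
static bool sg_is_sector_aligned(const struct sg_elem *sg, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (sg[i].addr % SECTOR_SIZE != 0 || sg[i].len % SECTOR_SIZE != 0)
            return false;
    }
    return true;
}
```

A backend that gets false here can still serve the request through a bounce buffer, which is the slow path the mail warns about.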
[PATCH] hw: Add test device for unittests execution
Add a test device which supports the kvmctl ioports, so one can run the KVM unittest suite [1].

Usage:

qemu -device testdev

1) Removed port 0xf1, since now kvm-unit-tests use serial

2) Removed exit code port 0xf4, since that can be replaced by

-device isa-debugexit,iobase=0xf4,access-size=2

3) Removed ram size port 0xd1, since guest memory size can be retrieved from firmware, there's a patch for kvm-unit-tests including an API to retrieve that value.

[1] Preliminary versions of this patch were posted to the mailing list about a year ago, I re-read the comments of the thread, and had guidance from Paolo about which ports to remove from the test device.

CC: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Gerd Hoffmann kra...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 hw/i386/Makefile.objs | 1 +
 hw/testdev.c | 131 ++
 2 files changed, 132 insertions(+)
 create mode 100644 hw/testdev.c

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index 8c764bb..64d2787 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -11,5 +11,6 @@ obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen-host-pci-device.o
 obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pt.o xen_pt_config_init.o xen_pt_msi.o
 obj-y += kvm/
 obj-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
+obj-y += testdev.o
 
 obj-y := $(addprefix ../,$(obj-y))
diff --git a/hw/testdev.c b/hw/testdev.c
new file mode 100644
index 000..44070f2
--- /dev/null
+++ b/hw/testdev.c
@@ -0,0 +1,131 @@
+#include <sys/mman.h>
+#include "hw.h"
+#include "qdev.h"
+#include "isa.h"
+
+struct testdev {
+    ISADevice dev;
+    MemoryRegion iomem;
+    CharDriverState *chr;
+};
+
+#define TYPE_TESTDEV "testdev"
+#define TESTDEV(obj) \
+     OBJECT_CHECK(struct testdev, (obj), TYPE_TESTDEV)
+
+static void test_device_irq_line(void *opaque, uint32_t addr, uint32_t data)
+{
+    struct testdev *dev = opaque;
+
+    qemu_set_irq(isa_get_irq(&dev->dev, addr - 0x2000), !!data);
+}
+
+static uint32 test_device_ioport_data;
+
+static void test_device_ioport_write(void *opaque, uint32_t addr, uint32_t data)
+{
+    test_device_ioport_data = data;
+}
+
+static uint32_t test_device_ioport_read(void *opaque, uint32_t addr)
+{
+    return test_device_ioport_data;
+}
+
+static void test_device_flush_page(void *opaque, uint32_t addr, uint32_t data)
+{
+    target_phys_addr_t len = 4096;
+    void *a = cpu_physical_memory_map(data & ~0xffful, &len, 0);
+
+    mprotect(a, 4096, PROT_NONE);
+    mprotect(a, 4096, PROT_READ|PROT_WRITE);
+    cpu_physical_memory_unmap(a, len, 0, 0);
+}
+
+static char *iomem_buf;
+
+static uint32_t test_iomem_readb(void *opaque, target_phys_addr_t addr)
+{
+    return iomem_buf[addr];
+}
+
+static uint32_t test_iomem_readw(void *opaque, target_phys_addr_t addr)
+{
+    return *(uint16_t*)(iomem_buf + addr);
+}
+
+static uint32_t test_iomem_readl(void *opaque, target_phys_addr_t addr)
+{
+    return *(uint32_t*)(iomem_buf + addr);
+}
+
+static void test_iomem_writeb(void *opaque, target_phys_addr_t addr, uint32_t val)
+{
+    iomem_buf[addr] = val;
+}
+
+static void test_iomem_writew(void *opaque, target_phys_addr_t addr, uint32_t val)
+{
+    *(uint16_t*)(iomem_buf + addr) = val;
+}
+
+static void test_iomem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
+{
+    *(uint32_t*)(iomem_buf + addr) = val;
+}
+
+static const MemoryRegionOps test_iomem_ops = {
+    .old_mmio = {
+        .read = { test_iomem_readb, test_iomem_readw, test_iomem_readl, },
+        .write = { test_iomem_writeb, test_iomem_writew, test_iomem_writel, },
+    },
+    .endianness = DEVICE_LITTLE_ENDIAN,
+};
+
+static int init_test_device(ISADevice *isa)
+{
+    struct testdev *dev = DO_UPCAST(struct testdev, dev, isa);
+
+    register_ioport_read(0xe0, 1, 1, test_device_ioport_read, dev);
+    register_ioport_write(0xe0, 1, 1, test_device_ioport_write, dev);
+    register_ioport_read(0xe0, 1, 2, test_device_ioport_read, dev);
+    register_ioport_write(0xe0, 1, 2, test_device_ioport_write, dev);
+    register_ioport_read(0xe0, 1, 4, test_device_ioport_read, dev);
+    register_ioport_write(0xe0, 1, 4, test_device_ioport_write, dev);
+    register_ioport_write(0xe4, 1, 4, test_device_flush_page, dev);
+    register_ioport_write(0x2000, 24, 1, test_device_irq_line, NULL);
+    iomem_buf = g_malloc0(0x1);
+    memory_region_init_io(&dev->iomem, &test_iomem_ops, dev,
+                          "testdev", 0x1);
+    memory_region_add_subregion(isa_address_space(&dev->dev), 0xff00,
+                                &dev->iomem);
+    return 0;
+}
+
+static Property testdev_isa_properties[] = {
+    DEFINE_PROP_CHR("chardev", struct testdev, chr),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void testdev_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc =
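Two small pieces of arithmetic in the patch are easy to misread: the page mask applied to the value written to port 0xe4, and the mapping from the 24 ports starting at 0x2000 to IRQ line numbers. The sketch below restates both in standalone C. The helper names and the -1 error return are assumptions made for illustration and are not part of the patch; the constants 0x2000, 24 and the 4096-byte page come from it.

```c
#include <assert.h>
#include <stdint.h>

#define TESTDEV_IRQ_BASE  0x2000u /* first IRQ-line port in the patch */
#define TESTDEV_IRQ_COUNT 24u     /* number of ports registered there */

/* Round a guest physical address down to the start of its 4096-byte page. */
static uint32_t page_base(uint32_t guest_addr)
{
    return guest_addr & ~UINT32_C(0xfff);
}

/*
 * Translate an I/O port in the IRQ window to an IRQ line index.
 * Returns -1 for ports outside the window (hypothetical convention;
 * the patch itself does no range check because the port registration
 * already limits which addresses reach the handler).
 */
static int irq_index_for_port(uint32_t port)
{
    if (port < TESTDEV_IRQ_BASE || port >= TESTDEV_IRQ_BASE + TESTDEV_IRQ_COUNT)
        return -1;
    return (int)(port - TESTDEV_IRQ_BASE);
}
```

So a guest write to port 0x2005 would target IRQ line 5, and a flush request for address 0x12345 would touch the page starting at 0x12000.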
Re: [PATCH 0/3] virtio-net: inline header support
Anthony Liguori anth...@codemonkey.ws writes:

> Rusty Russell ru...@rustcorp.com.au writes:
>
>> Michael S. Tsirkin m...@redhat.com writes:
>>
>>> Thinking about Sasha's patches, we can reduce ring usage for virtio net small packets dramatically if we put virtio net header inline with the data. This can be done for free in case guest net stack allocated extra head room for the packet, and I don't see why this would have any downsides.
>>
>> I've been wanting to do this for the longest time... but...
>>
>>> Even though with my recent patches qemu no longer requires header to be the first s/g element, we need a new feature bit to detect this. A trivial qemu patch will be sent separately.
>>
>> There's a reason I haven't done this. I really, really dislike "my implementation isn't broken" feature bits. We could have an infinite number of them, for each bug in each device.
>
> This is a bug in the specification.
>
> The QEMU implementation pre-dates the specification. All of the actual implementations of virtio relied on the semantics of s/g elements and still do.

lguest fix is pending in my queue. lkvm and qemu are broken; lkvm isn't ever going to be merged, so I'm not sure what its status is? But I'm determined to fix qemu, and hence my torture patch to make sure this doesn't creep in again.

> What's in the specification really doesn't matter when it doesn't agree with all of the existing implementations.
>
> Users use implementations, not specifications. The specification really ought to be changed here.

I'm sorely tempted, except that we're losing a real optimization because of this :(

The specification has long contained the footnote:

    The current qemu device implementations mistakenly insist that the first descriptor cover the header in these cases exactly, so a cautious driver should arrange it so.

I'd like to tie this caveat to the PCI capability change, so this note will move to the appendix with the old PCI layout.

Cheers,
Rusty.
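The footnote and the "devices must not depend on framing" position can be contrasted in a few lines of C. This is only a sketch under stated assumptions: struct buf_elem and both function names are hypothetical and do not come from the virtio specification, QEMU or the kernel. The first helper is the strict check the footnote describes; the second shows how a device can read a header of a given length no matter how the driver split it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one readable descriptor in a chain. */
struct buf_elem {
    const unsigned char *base;
    size_t len;
};

/* The strict assumption from the footnote: element 0 is exactly the header. */
static int first_elem_is_exact_header(const struct buf_elem *elems, size_t n,
                                      size_t hdr_len)
{
    return n > 0 && elems[0].len == hdr_len;
}

/*
 * Framing-independent alternative: copy the first `want` bytes of the
 * chain into dst, crossing element boundaries as needed. Returns the
 * number of bytes copied, which is less than `want` if the chain is short.
 */
static size_t gather_prefix(const struct buf_elem *elems, size_t n,
                            unsigned char *dst, size_t want)
{
    size_t done = 0;

    for (size_t i = 0; i < n && done < want; i++) {
        size_t take = elems[i].len;

        if (take > want - done)
            take = want - done;
        if (take)
            memcpy(dst + done, elems[i].base, take);
        done += take;
    }
    return done;
}
```

A device written in the second style accepts a header laid out in one element, split over several, or sharing an element with packet data, at the cost of one small copy.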
Re: [PATCH] hw: Add test device for unittests execution
On 10/04/2012 12:49 AM, Lucas Meneghel Rodrigues wrote:

> Add a test device which supports the kvmctl ioports, so one can run the KVM unittest suite [1].
>
> Usage:
>
> qemu -device testdev
>
> 1) Removed port 0xf1, since now kvm-unit-tests use serial
>
> 2) Removed exit code port 0xf4, since that can be replaced by
>
> -device isa-debugexit,iobase=0xf4,access-size=2

I forgot to mention that this would work *if* the isa-debugexit device gets upstream. Paolo pointed this thread:

http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg00818.html

But it appears that no consensus was reached.

> 3) Removed ram size port 0xd1, since guest memory size can be retrieved from firmware, there's a patch for kvm-unit-tests including an API to retrieve that value.
>
> [1] Preliminary versions of this patch were posted to the mailing list about a year ago, I re-read the comments of the thread, and had guidance from Paolo about which ports to remove from the test device.
>
> CC: Paolo Bonzini pbonz...@redhat.com
> Signed-off-by: Gerd Hoffmann kra...@redhat.com
> Signed-off-by: Avi Kivity a...@redhat.com
> Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
> Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
> ---
>  hw/i386/Makefile.objs | 1 +
>  hw/testdev.c | 131 ++
>  2 files changed, 132 insertions(+)
>  create mode 100644 hw/testdev.c
Re: [PATCH 0/3] virtio-net: inline header support
Rusty Russell ru...@rustcorp.com.au writes:

> Anthony Liguori anth...@codemonkey.ws writes:
>
>> Rusty Russell ru...@rustcorp.com.au writes:
>>
>>> Michael S. Tsirkin m...@redhat.com writes:
>>>
>>>> Thinking about Sasha's patches, we can reduce ring usage for virtio net small packets dramatically if we put virtio net header inline with the data. This can be done for free in case guest net stack allocated extra head room for the packet, and I don't see why this would have any downsides.
>>>
>>> I've been wanting to do this for the longest time... but...
>>>
>>>> Even though with my recent patches qemu no longer requires header to be the first s/g element, we need a new feature bit to detect this. A trivial qemu patch will be sent separately.
>>>
>>> There's a reason I haven't done this. I really, really dislike "my implementation isn't broken" feature bits. We could have an infinite number of them, for each bug in each device.
>>
>> This is a bug in the specification.
>>
>> The QEMU implementation pre-dates the specification. All of the actual implementations of virtio relied on the semantics of s/g elements and still do.
>
> lguest fix is pending in my queue. lkvm and qemu are broken; lkvm isn't ever going to be merged, so I'm not sure what its status is? But I'm determined to fix qemu, and hence my torture patch to make sure this doesn't creep in again.

There are even more implementations out there and I'd wager they all rely on framing.

>> What's in the specification really doesn't matter when it doesn't agree with all of the existing implementations.
>>
>> Users use implementations, not specifications. The specification really ought to be changed here.
>
> I'm sorely tempted, except that we're losing a real optimization because of this :(

What optimizations? What Michael is proposing is still achievable with a device feature. Are there other optimizations that can be achieved by changing framing that we can't achieve with feature bits?

As I mentioned in another note, bad framing decisions can cause performance issues too...

> The specification has long contained the footnote:
>
>     The current qemu device implementations mistakenly insist that the first descriptor cover the header in these cases exactly, so a cautious driver should arrange it so.

I seem to recall this being a compromise between you and I.. I think I objected strongly to this back when you first wrote the spec and you added this to appease me ;-)

Regards,

Anthony Liguori

> I'd like to tie this caveat to the PCI capability change, so this note will move to the appendix with the old PCI layout.
>
> Cheers,
> Rusty.
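For readers following the feature-bit side of the argument, a rough sketch of what "achievable with a device feature" could look like from the driver's point of view. The bit HYP_NET_F_INLINE_HDR and the function name are hypothetical; the thread does not name or number such a bit, and the value below is made up purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bit, chosen arbitrarily for this sketch. */
#define HYP_NET_F_INLINE_HDR (UINT32_C(1) << 31)

/*
 * Number of ring elements a driver would use to send one small packet
 * that fits in a single data buffer: one element when the device
 * negotiated the inline-header feature and the packet buffer has spare
 * headroom for the header, otherwise a separate header element plus
 * the data element.
 */
static unsigned tx_elems_for_small_packet(uint32_t negotiated, bool has_headroom)
{
    if ((negotiated & HYP_NET_F_INLINE_HDR) && has_headroom)
        return 1;
    return 2;
}
```

Halving the element count for small packets is the ring-usage saving Michael's cover letter is after; the open question in the thread is only whether a feature bit or the new PCI layout should signal that the device tolerates it.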
Re: [PATCH 0/3] virtio-net: inline header support
Anthony Liguori anth...@codemonkey.ws writes:

> Rusty Russell ru...@rustcorp.com.au writes:
>
>> Michael S. Tsirkin m...@redhat.com writes:
>>
>> There's a reason I haven't done this. I really, really dislike "my implementation isn't broken" feature bits. We could have an infinite number of them, for each bug in each device.
>>
>> So my plan was to tie this assumption to the new PCI layout. And have a stress-testing patch like the one below in the kernel (see my virtio-wip branch for stuff like this). Turn it on at boot with virtio_ring.torture on the kernel commandline.
>>
>> BTW, I've fixed lguest, but my kvm here (Ubuntu precise, kvm-qemu 1.0) is too old. Building the latest git now...
>>
>> Cheers,
>> Rusty.
>>
>> Subject: virtio: CONFIG_VIRTIO_DEVICE_TORTURE
>>
>> Virtio devices are not supposed to depend on the framing of the scatter-gather lists, but various implementations did. Safeguard this in future by adding an option to deliberately create perverse descriptors.
>>
>> Signed-off-by: Rusty Russell ru...@rustcorp.com.au
>
> Ignoring framing is really a bad idea. You want backends to enforce reasonable framing because guests shouldn't do silly things with framing.
>
> For instance, with virtio-blk, if you want decent performance, you absolutely want to avoid bouncing the data. If you're using O_DIRECT in the host to submit I/O requests, then it's critical that all of the s/g elements are aligned to a sector boundary and sized to a sector boundary.
>
> Yes, QEMU can handle it if that's not the case, but it would be insanely stupid for a guest not to do this. This is the sort of thing that ought to be enforced in the specification because a guest cannot perform well if it doesn't follow these rules.

Lack of imagination is what got us into trouble in the first place; when presented with one counter-example, it's useful to look for others. That's our job, not to dismiss them as insanely stupid.

For example:

1) Perhaps the guest isn't trying to perform well, it's trying to be a tiny bootloader?

2) Perhaps the guest is the direct consumer, and aligning buffers is redundant.

> A spec isn't terribly useful if the result is guest drivers that are slow. There's very little to gain by not enforcing rules around framing and there's a lot to lose if a guest frames incorrectly.

The guest has the flexibility, and gets to decide. The spec is not forcing them to perform badly.

> In the rare case where we want to make a framing change, we should use feature bits like Michael is proposing.
>
> In this case, we should simply say that with the feature bit, the vnet header can be in the same element as the data but not allow the header to be spread across multiple elements.

I'd love to split struct virtio_net_hdr_mrg_rxbuf, so the num_buffers ends up somewhere else. The simplest rules are "never" or "always".

Cheers,
Rusty.

PS. Inserting zero-length buffers is something I'd be prepared to rule out, my current patch does it just for yuks...
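The idea behind the torture option, re-framing a request without changing its bytes, can be shown in userspace. The sketch below is not Rusty's kernel code: struct chunk, reframe() and flatten() are hypothetical names, and the fixed 1, 2, 3 size cycle is an arbitrary stand-in for whatever perverse split the real patch chooses. In line with the PS, it never emits a zero-length chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one scatter-gather element. */
struct chunk {
    const unsigned char *base;
    size_t len;
};

/*
 * Split [data, data + len) into chunks whose sizes cycle 1, 2, 3, 1, ...
 * (the last chunk is shortened to fit). Returns the number of chunks
 * written, or 0 if len is 0 or max_out is too small to cover the data.
 */
static size_t reframe(const unsigned char *data, size_t len,
                      struct chunk *out, size_t max_out)
{
    size_t n = 0, off = 0;

    while (off < len && n < max_out) {
        size_t take = (n % 3) + 1;

        if (take > len - off)
            take = len - off;
        out[n].base = data + off;
        out[n].len = take;
        off += take;
        n++;
    }
    return off == len ? n : 0;
}

/*
 * What a framing-independent device effectively does: concatenate the
 * chunks into dst. Returns bytes written, or 0 if dst is too small.
 */
static size_t flatten(const struct chunk *in, size_t n,
                      unsigned char *dst, size_t cap)
{
    size_t done = 0;

    for (size_t i = 0; i < n; i++) {
        if (in[i].len > cap - done)
            return 0;
        memcpy(dst + done, in[i].base, in[i].len);
        done += in[i].len;
    }
    return done;
}
```

A device that only ever looks at the flattened byte stream sees the same request before and after reframe(); a device that assumes, say, "element 0 is the header" does not, which is exactly the class of bug the option is meant to flush out.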