Re: [Qemu-devel] [PATCH v3 0/9] HyperV equivalent of pvpanic driver
On Tue, Jun 30, 2015 at 02:33:18PM +0300, Denis V. Lunev wrote:
> Windows 2012 guests can notify the hypervisor about a guest crash
> (Windows bugcheck (BSOD)) by writing specific Hyper-V MSRs. This patch does
> handling of these MSRs by KVM and sends a notification to user space that
> allows QEMU/libvirt to gather a Windows guest crash dump.
>
> The idea is to provide functionality equal to the pvpanic device without
> the QEMU guest agent for Windows.

That's nice - do you know if the Linux kernel (or any other non-Win2k12
kernels) has support for notifying hypervisors via this Hyper-V MSR when
running as a guest?

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Announcing qboot, a minimal x86 firmware for QEMU
On Fri, May 22, 2015 at 12:21:27PM +0100, Peter Maydell wrote:
> On 22 May 2015 at 12:12, Daniel P. Berrange wrote:
> > Yep, it is hard saying no - but I'd think as long as it was possible to add
> > the extra features using -device, it ought to be practical to keep a "virt"
> > machine type's "-nodefaults -nodefconfig" base setup pretty minimal.
>
> Mmm, but -device only works for pluggable devices really. We don't
> have a coherent mechanism for saying "put the PS/2 keyboard controller
> into the system at its usual IO ports" on the command line.

Oh, I didn't necessarily mean that we'd need the ability to add a PS/2
keyboard via -device. I meant that there just needs to be a way to add
/some/ kind of keyboard, e.g. we have a usb-kbd device that could
potentially fill that role. Likewise for a mouse pointer, serial ports, etc.

Regards,
Daniel
Re: [Qemu-devel] Announcing qboot, a minimal x86 firmware for QEMU
On Fri, May 22, 2015 at 12:04:54PM +0100, Peter Maydell wrote:
> On 22 May 2015 at 12:01, Daniel P. Berrange wrote:
> > On the QEMU side of things I wonder if there is scope for taking AArch64's
> > 'virt' machine type concept and duplicating it on all architectures.
>
> Experience suggests that holding the line on "minimal" is really
> quite tricky, though -- there's always one more thing that
> somebody really wants to add...

Yep, it is hard saying no - but I'd think as long as it was possible to add
the extra features using -device, it ought to be practical to keep a "virt"
machine type's "-nodefaults -nodefconfig" base setup pretty minimal. In
particular I don't see why we need to have a SATA controller and ISA/LPC
bridge in every virt machine - a root PCI bus only should be possible, as
you can provide disks via virtio-blk or virtio-scsi, and serial, parallel,
mouse and floppy via PCI devices and/or by adding a USB bus in the cases
where you really need one.

Regards,
Daniel
Re: [Qemu-devel] Announcing qboot, a minimal x86 firmware for QEMU
On Thu, May 21, 2015 at 03:51:43PM +0200, Paolo Bonzini wrote:
> Some of you may have heard about the "Clear Containers" initiative from
> Intel, which couples KVM with various kernel tricks to create extremely
> lightweight virtual machines. The experimental Clear Containers setup
> requires only 18-20 MB to launch a virtual machine, and needs about 60
> ms to boot.
>
> Now, as all of you probably know, "QEMU is great for running Windows or
> legacy Linux guests, but that flexibility comes at a hefty price. Not
> only does all of the emulation consume memory, it also requires some
> form of low-level firmware in the guest as well. All of this adds quite
> a bit to virtual-machine startup times (500 to 700 milliseconds is not
> unusual)".
>
> Right? In fact, it's for this reason that Clear Containers uses kvmtool
> instead of QEMU.
>
> No, wrong! In fact, reporting bad performance is pretty much the same
> as throwing down the gauntlet.

On the QEMU side of things I wonder if there is scope for taking AArch64's
'virt' machine type concept and duplicating it on all architectures. It
would be nice to have a common minimal machine type on all architectures
that discards all legacy platform stuff and focuses on the minimum needed
to run a modern virtualization-optimized guest OS. People would always know
that a machine type called 'virt' was the minimal virtualization platform,
while the others all target emulation of real-world (legacy) baremetal
platforms.

Regards,
Daniel
Re: [Qemu-devel] [PATCH v2 1/2] contrib: add ivshmem client and server
On Mon, Jul 21, 2014 at 08:21:21AM -0600, Eric Blake wrote:
> On 07/20/2014 03:38 AM, David Marchand wrote:
> > When using ivshmem devices, notifications between guests can be sent as
> > interrupts using an ivshmem-server (typical use described in documentation).
> > The client is provided as a debug tool.
> >
> > Signed-off-by: Olivier Matz
> > Signed-off-by: David Marchand
> > ---
> >  contrib/ivshmem-client/Makefile | 26 ++
>
> > +++ b/contrib/ivshmem-client/Makefile
> > @@ -0,0 +1,26 @@
> > +# Copyright 2014 6WIND S.A.
> > +# All rights reserved
>
> This file has no other license, and is therefore incompatible with
> GPLv2. You'll need to resubmit under an appropriately open license.
>
> > +++ b/contrib/ivshmem-client/ivshmem-client.h
> > @@ -0,0 +1,238 @@
> > +/*
> > + * Copyright(c) 2014 6WIND S.A.
> > + * All rights reserved.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2. See
> > + * the COPYING file in the top-level directory.
>
> I'm not a lawyer, but to me, this license is self-contradictory. You
> can't have "All rights reserved" and still be GPL, because the point of
> the GPL is that you are NOT reserving all rights, but explicitly
> granting your user various rights (on condition that they likewise grant
> those rights to others). But you're not the only file in the qemu code
> base with this questionable mix.

In any case, adding the term 'All rights reserved' is said to be redundant
and obsolete these days:

https://en.wikipedia.org/wiki/All_rights_reserved#Obsolescence

Regards,
Daniel
Xen hypervisor inside KVM guest with x2apic CPU feature fails to boot
I'm running

  kernel-3.14.4-200.fc20.x86_64
  qemu-1.6.2-5.fc20.x86_64
  xen-4.4.0-4.fc21

In the process of trying to get a Xen hypervisor running inside a KVM guest
I found that there's a problem with x2apic. NB I do *not* use nested VMX
here; I am just trying to get plain Xen paravirt working before trying to
do nested HVM.

Any time I enable the 'x2apic' CPU flag for the KVM guest, the Xen
hypervisor running inside the guest will fail to boot. The QEMU/KVM -cpu
arg is

  -cpu core2duo,+erms,+smep,+fsgsbase,+lahf_lm,+rdtscp,+rdrand,+f16c,+avx,+osxsave,+xsave,+aes,+tsc-deadline,+popcnt,+x2apic,+pcid,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds

The Xen logs indicate it isn't liking the x2apic feature and is disabling
it, but then it obviously fails to set up the non-x2apic codepath it is
following - even though the non-x2apic codepath works fine if you don't
have +x2apic set for the KVM guest.

(XEN) Not enabling x2APIC: depends on iommu_supports_eim.
(XEN) XSM Framework v1.0.0 initialized
(XEN) Flask: Initializing.
(XEN) AVC INITIALIZED
(XEN) Flask: Starting in permissive mode.
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2693.939 MHz processor.
(XEN) Initing memory sharing.
(XEN) traps.c:3071: GPF (): 82d0801b83c7 -> 82d08023386b
(XEN) mce_intel.c:717: MCA Capability: BCAST 1 SER 1 CMCI 0 firstbank 1 extended MCE MSR 0
(XEN) Intel machine check reporting enabled
(XEN) I/O virtualisation disabled
(XEN) Getting VERSION: 1050014
(XEN) Getting VERSION: 1050014
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) Getting ID: 0
(XEN) Getting LVT0: 8700
(XEN) Getting LVT1: 8400
(XEN) Suppress EOI broadcast on CPU#0
(XEN) enabled ExtINT on CPU#0
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using old ACK method
(XEN) init IO_APIC IRQs
(XEN) IO-APIC (apicid-pin) 0-0, 0-16, 0-17, 0-18, 0-19, 0-20, 0-21, 0-22, 0-23 not connected.
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) ..MP-BIOS bug: 8254 timer not connected to IO-APIC
(XEN) ...trying to set up timer (IRQ0) through the 8259A ... failed.
(XEN) ...trying to set up timer as Virtual Wire IRQ... failed.
(XEN) ...trying to set up timer as ExtINT IRQ... failed :(.
(XEN)
(XEN)
(XEN) Panic on CPU 0:
(XEN) IO-APIC + timer doesn't work! Boot with apic_verbosity=debug and send a report. Then try booting with the 'noapic' option
(XEN)

I will attach the full non-trimmed Xen log to this mail, along with a log
showing a successful boot when 'x2apic' isn't given to KVM. I'm unclear
whether this is a Xen bug, a KVM bug, a QEMU bug, or a combination of them.

Regards,
Daniel

Xen 4.4.0-4.fc21
(XEN) Xen version 4.4.0 (mockbuild@[unknown]) (gcc (GCC) 4.9.0 20140506 (Red Hat 4.9.0-3)) debug=n Mon May 12 18:38:23 UTC 2014
(XEN) Latest ChangeSet:
(XEN) Bootloader: GRUB 2.00
(XEN) Command line: placeholder loglvl=all guest_loglvl=all com1=115200,8n1 console=com1,vga apic_verbosity=debug
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN) Disc information:
(XEN)  Found 1 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)   - 0009fc00 (usable)
(XEN)  0009fc00 - 000a (reserved)
(XEN)  000f - 0010 (reserved)
(XEN)  0010 - 5dbfe000 (usable)
(XEN)  5dbfe000 - 5dc0 (reserved)
(XEN)  feffc000 - ff00 (reserved)
(XEN)  fffc - 0001 (reserved)
(XEN) System RAM: 1499MB (1535604kB)
(XEN) ACPI: RSDP 000F1690, 0014 (r0 BOCHS )
(XEN) ACPI: RSDT 5DBFE4A0, 0030 (r1 BOCHS BXPCRSDT1 BXPC1)
(XEN) ACPI: FACP 5DBFFF80, 0074 (r1 BOCHS BXPCFACP1 BXPC1)
(XEN) ACPI: DSDT 5DBFE4D0, 1137 (r1 BXPC BXDSDT1 INTL 20140114)
(XEN) ACPI: FACS 5DBFFF40, 0040
(XEN) ACPI: SSDT 5DBFF700, 0838 (r1 BOCHS BXPCSSDT1 BXPC1)
(XEN) ACPI: APIC 5DBFF610, 0078 (r1 BOCHS BXPCAPIC1 BXPC1)
(XEN) No NUMA configuration found
(XEN) Faking a node at -5dbfe000
(XEN) Domain heap initialised
(XEN) found SMP MP-table at 000f17f0
(XEN) DMI 2.4 present.
(XEN) APIC boot state is 'xapic'
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0xb008
(XEN) ACPI: SLEEP INFO: pm1x_cnt[b004,0], pm1x_evt[b000,0]
(XEN) ACPI: wakeup_vec[5dbfff4c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee0
(XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
(XEN) Processor #0 6:15 APIC version 20
(XEN) ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
(XEN) ACP
Re: [Qemu-devel] KVM call agenda for 2014-04-28
On Tue, Apr 29, 2014 at 02:33:58PM +0200, Markus Armbruster wrote:
> Peter Maydell writes:
>
> > On 29 April 2014 11:09, Michael S. Tsirkin wrote:
> >> Let's just make clear how to contact us securely, when to contact that
> >> list, and what we'll do with the info. I cobbled together the
> >> following:
> >> http://wiki.qemu.org/SecurityProcess
> >
> > Looks generally OK I guess. I'd drop the 'how to use pgp' section --
> > anybody who cares will already know how to send us PGP email.
>
> The first paragraph under "How to Contact Us Securely" is fine, the rest
> seems redundant for readers familiar with PGP, yet hardly sufficient for
> the rest.
>
> One thing I like about Libvirt's Security Process page[*] is they give
> an idea on embargo duration.

FWIW, I picked the "2 weeks" length myself as a completely arbitrary
timeframe. We haven't stuck to that strictly - we consider the needs of
each vulnerability as it is triaged to determine the minimum practical
embargo time. So think of "2 weeks" as more of a guiding principle to show
the world that we don't believe in keeping issues under embargo for very
long periods of time.

Regards,
Daniel
Re: Help regarding virsh domifstat
On Thu, Oct 31, 2013 at 08:30:30PM -0500, Rohit Bhat wrote:
> Hi,
>
> I need some small help. I am working on a project where I have to monitor
> the network activity of a VM running on KVM.
>
> I am interested in how much data is going into the VM and how much
> data is coming out of the VM. I checked on the net and found out virsh
> domifstat is the way to go about it.
>
> 1. But it looks like these stats also include bytes related to control
> traffic for the VM. Is there a way to exclude that? I just want the
> size of actual data transfers.
>
> 2. Is there a way by which I can report the data transfer of the VM with
> the outside world (outside the hypervisor) only, while excluding data
> transfer with any other VM on the same host?
>
> Please let me know if this is not the right group for such queries.

The libvirt-users mailing list is a better place for virsh-related
questions:

http://libvirt.org/contact.html#email

Regards,
Daniel
Re: qemu, numa: non-contiguous cpusets
On Sun, Sep 29, 2013 at 05:10:44PM +0200, Borislav Petkov wrote:
> Btw,
>
> while I got your attention, on a not-really related topic: how do we
> feel about adding support for specifying a non-contiguous set of cpus
> for a numa node in qemu with the -numa option? I.e., like this, for
> example:
>
> x86_64-softmmu/qemu-system-x86_64 -smp 8 \
>     -numa node,nodeid=0,cpus=0\;2\;4-5 \
>     -numa node,nodeid=1,cpus=1\;3\;6-7
>
> The ';' needs to be escaped from the shell but I'm open for better
> suggestions.

Use a ':' instead.

Daniel
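For illustration, here is a minimal sketch of how the colon-separated
variant suggested above could be expanded into a CPU set. This is not
QEMU's actual option parser (which is C code inside the -numa handling);
the helper name and spec strings are hypothetical.

```python
def parse_cpu_spec(spec):
    # Hypothetical helper: expand "0:2:4-5" into {0, 2, 4, 5}.
    # ':' separates entries; 'a-b' denotes an inclusive range.
    cpus = set()
    for part in spec.split(":"):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

# The two node specs from the example command line, using ':' so no
# shell escaping is needed.
print(sorted(parse_cpu_spec("0:2:4-5")))  # [0, 2, 4, 5]
print(sorted(parse_cpu_spec("1:3:6-7")))  # [1, 3, 6, 7]
```

The appeal of ':' over ';' is visible here: the spec can be typed on the
command line unquoted, with no escaping.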
Re: [libvirt-users] Questions on how to reset ID numbers for virt Guests.
On Wed, Sep 11, 2013 at 09:47:07AM +0200, Paolo Bonzini wrote:
> On 11/09/2013 00:27, James Sparenberg wrote:
> > I'm doing some experimenting in our development lab and as a result
> > I'm kickstarting virtual guests over and over. This is of course
> > causing the guest ID to increment by one with each test. I've
> > googled around and tried searching the list but have not found out
> > how (if at all) it would be possible to reset the ID number back to 1
> > more than is in use. Also, is there a limit where I run out of IDs?
> > (for example, does it only go up to 99?)
>
> No, there is no limit.

Well, 'int' will wrap eventually, but you'd need to have created a hell of
a lot of guests for that to be a problem :-)

> I don't know the answer to your other question, so I'm adding the
> libvirt-users mailing list.

If you restart libvirtd, it resets itself to start allocating IDs at the
maximum currently used ID of any running guest.

Daniel
Re: Disabling mergeable rx buffers for the guest
On Tue, Jul 16, 2013 at 10:40:28AM +0000, Naor Shlomo wrote:
> Hi Paolo,
>
> For some unknown reason it suddenly started to accept the changes to the XML
> and the strings you gave me are now in place.
> Upon machine start I now receive the following error messages:
>
> virsh # start NaorDev
> error: Failed to start domain NaorDev
> error: internal error Process exited while reading console log output: kvm:
> -global: requires an argument
>
> Here's the XML:

Presumably what you wanted to do was ... rather than setting an
environment variable.

Regards,
Daniel
Re: [libvirt] Bugs filed in the week for Upstream Qemu and Libvirt
On Wed, Jul 10, 2013 at 06:45:08PM +0530, chandrashekar shastri wrote:
> Hi,
>
> Below are the bugs filed this week for upstream qemu and libvirt:
>
> Qemu in Launchpad:
>
> https://bugs.launchpad.net/opensuse/+bug/1199416
> Hot-add qcow2 [virtio-scsi] devices doesn't work in SLES-11-SP2 guest
>
> Libvirt Bugs:
>
> Bug 982224 - Attaching of the Virtio-scsi [qcow2] drives fails with
> "error: internal error No more available PCI addresses"
> Bug 982455 - RHEL Guest fails to boot after attaching 200+ scsi
> devices [virtio-scsi qcow2]
> Bug 980954 - Virtio-scsi drives in Windows7 shows yellow bang in
> device manager though virtio scsi pass through driver is installed
> Bug 982630 - Documentation : virsh attach-disk --help should be
> updated with proper examples for --type and --driver

We really don't need lists of bugs emailed to the libvirt-list mailing
list. People already monitor bugzilla & have email alerts from bugzilla as
they desire.

Daniel
Re: kernel 3.9.x kvm hangs after seabios
On Wed, May 08, 2013 at 02:08:55PM +0200, Tomas Papan wrote:
> Hi,
>
> I found this in the libvirt log (but those messages are the same in 3.8.x):
>
> anakin libvirt # cat libvirtd.log
> 2013-05-08 11:59:29.645+: 3750: info : libvirt version: 1.0.5
> 2013-05-08 11:59:29.645+: 3750: error : udevGetDMIData:1548 :
> Failed to get udev device for syspath '/sys/devices/virtual/dmi/id' or
> '/sys/class/dmi/id'
> 2013-05-08 11:59:29.680+: 3750: warning :
> ebiptablesDriverInitCLITools:4225 : Could not find 'ebtables'
> executable

You need to look at /var/log/libvirt/qemu/$GUESTNAME.log for QEMU-related
messages. The libvirtd.log file only has the libvirt-related messages.

Daniel
Re: [okeanos-dev] Re: KVM versions, machine types and failed migrations
On Wed, Jan 09, 2013 at 03:27:53PM +0200, Vangelis Koukis wrote:
> On Wed, Jan 09, 2013 at 01:10:45PM +0000, Daniel P. Berrange wrote:
> > When doing migration, the fundamental requirement is that the guest
> > OS visible machine ABI must not change. Thus there are three key
> > things to take care of when launching QEMU on the migration target
> > host.
> >
> > - The device PCI/USB addresses must be identical to the source
> > - The machine type must be identical to the source
> > - The CPU model must be identical to the source
>
> Thanks for the detailed list of requirements, we'll take it into account
> for the relevant Ganeti patch.
>
> > If you don't follow those requirements, either QEMU or the guest OS
> > or both will crash & burn during migration & you get to keep both
> > pieces :-)
>
> My point is, are these requirements left up to the caller of "kvm
> -incoming" to satisfy? Since the migration will most probably break,
> wouldn't it be best for QEMU to detect this and complain loudly, instead
> of continuing with the migration, failing silently and destroying the
> VM?
>
> Sure there could be some "yes, do it, I know it is going to break"
> option, which will make QEMU proceed with the migration. However, in 99%
> of the cases this is just user error, e.g. the user has upgraded the
> version on the other end and has not specified -M explicitly. It would
> be best if QEMU was able to detect and warn the user about what is going
> to happen, because it does lead to the VM dying.

What you describe is certainly desirable, but it is quite hard to achieve
with current QEMU. Much of the work in moving to the new QEMU object model
& configuration descriptions has been motivated by a desire to enable
improvements in migration handling. As you suggest, the goal is that the
source QEMU be able to send a complete & reliable hardware description to
the destination QEMU during migration. It is getting closer, but we're not
there yet.
Regards,
Daniel
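The three ABI requirements quoted above lend themselves to a pre-flight
check in the management layer before the destination QEMU is launched.
The following is a sketch with made-up field names - not a real Ganeti or
libvirt schema - just to show the shape of such a check.

```python
# ABI-relevant parts of a planned QEMU launch; the key names here are
# hypothetical, chosen to mirror the three requirements in the mail.
ABI_KEYS = ("machine_type", "cpu_model", "device_addresses")

def abi_compatible(src, dst):
    # src and dst are dicts describing the source and destination
    # launch configurations; all ABI-relevant fields must match.
    return all(src.get(k) == dst.get(k) for k in ABI_KEYS)

src = {"machine_type": "pc-1.0", "cpu_model": "core2duo",
       "device_addresses": {"virtio-blk": "0000:00:04.0"}}
dst = dict(src, machine_type="pc-1.1")  # destination drifted to pc-1.1

print(abi_compatible(src, src), abi_compatible(src, dst))  # True False
```

A management app that fails this check can refuse to start the migration,
giving the loud error the thread asks for instead of a silently dead VM.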
Re: KVM versions, machine types and failed migrations
On Wed, Jan 09, 2013 at 02:23:50PM +0200, Vangelis Koukis wrote:
> Hello,
>
> I'd like to ask a few questions about the way migrations work in KVM
> among different emulated machine types and different versions of the
> qemu-kvm package. I am sending to both the kvm@ and qemu-devel@ lists,
> please redirect me if I was wrong in doing so.
>
> In a nutshell: while trying to live-migrate a VM on ~okeanos [1], we
> see VM migrations fail silently if going from kvm 1.0 to kvm 1.1.
> The source VM is frozen, "info migrate" on the source monitor reports
> success, but the VM is dead upon arrival on the destination process.
> Please see [3] for the exact package versions for qemu-kvm we have
> tested with.
>
> Migration works if the destination kvm has been started with the same
> machine type as the source VM, e.g., using "-M pc-1.0" specifically on
> the destination, when migrating a pc-1.0 machine from kvm 1.0 to
> kvm 1.1.
>
> How does the machine type specified with -M work in the case of
> migrations? Are migrations expected to fail if the machine type is
> different between source and destination process? If yes, shouldn't KVM
> be able to detect this and abort the migration instead of failing
> silently?

When doing migration, the fundamental requirement is that the guest OS
visible machine ABI must not change. Thus there are three key things to
take care of when launching QEMU on the migration target host:

- The device PCI/USB addresses must be identical to the source
- The machine type must be identical to the source
- The CPU model must be identical to the source

If you don't follow those requirements, either QEMU or the guest OS or
both will crash & burn during migration & you get to keep both pieces :-)

> Regarding different package versions of qemu-kvm, it seems migrations do
> not work from source 0.12.5 to any other version *even* if -M pc-0.12 is
> specified at the incoming KVM process.
> For versions >= 1.0 everything
> works provided the machine type on the destination is the same as on the
> source.

Some older versions of QEMU were buggy, causing the machine type to not
correctly preserve the ABI.

> Our goal is to patch Ganeti [2] so that it sets the destination machine
> type to that of the source specifically, ensuring migrations work
> seamlessly after a KVM upgrade. Is there a way to retrieve the machine
> type of a running KVM process through a monitor command?

IIRC there is not a monitor command for this. The general approach to
dealing with migration stability should be to launch QEMU with a canonical
hardware configuration. This means explicitly setting a machine type, CPU
model and PCI/USB device addresses upfront. NB you should not use 'pc' as
a machine type - if you query the list of machine types from QEMU, it will
tell you what 'pc' corresponds to (pc-1.2) and then you can use the
versioned type so you have a known machine type.

Regards,
Daniel
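The advice above - query QEMU for what the 'pc' alias resolves to and pin
the versioned name - can be sketched as follows. QMP's query-machines
command returns a list of entries with a "name" field and, for aliased
types, an "alias" field; the canned reply below is an assumed example of
that shape, not output captured from a live QEMU, and the helper name is
made up.

```python
def resolve_machine_alias(machines, alias="pc"):
    # `machines` is the "return" payload of QMP's query-machines command.
    # Return the versioned machine name the alias maps to, or None.
    for m in machines:
        if m.get("alias") == alias:
            return m["name"]
    return None

# Canned reply shaped like a QEMU 1.2-era response (values assumed);
# in practice this would arrive over the QMP monitor socket.
reply = [
    {"name": "pc-1.2", "alias": "pc", "is-default": True},
    {"name": "pc-1.1"},
    {"name": "pc-1.0"},
]
print(resolve_machine_alias(reply))  # pc-1.2
```

A management app would run this once when a guest is first defined, store
the versioned name (e.g. pc-1.2), and pass it via -M on every subsequent
launch, including the migration target.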
Re: qemu-kvm: remove "boot=on|off" drive parameter compatibility
On Mon, Oct 01, 2012 at 08:19:29AM -0500, Anthony Liguori wrote:
> Jan Kiszka writes:
>
> I think at this point, none of this matters but I added the various
> distro maintainers to the thread.
>
> I think it's time for the distros to drop qemu-kvm and just ship
> qemu.git. Is there anything else that needs to happen to make that
> switch?

If that is upstream's recommendation, then I see no issue with switching
Fedora 19 / RHEL-7 to use qemu.git instead of qemu-kvm.git.

Regards,
Daniel
Re: [Qemu-devel] [PATCH] kvm: Set default accelerator to "kvm" if the host supports it
On Mon, Oct 01, 2012 at 06:43:00PM +0200, Andreas Färber wrote:
> Hello Jan,
>
> On 01.10.2012 16:34, Jan Kiszka wrote:
> > If we built a target for a host that supports KVM in principle, set the
> > default accelerator to KVM as well. This also means QEMU will fail to
> > start if KVM support turns out to be unavailable at runtime.
>
> From a distro point of view this of course means that we will build
> against KVM and that the new KVM default will start to fail for users on
> very old hardware. Can't we do a runtime check to select the default?

NB, this is *not* only about old hardware. There are plenty of users who
use QEMU inside VMs. One very common usage I know of is image building
tools which are run inside Amazon VMs, using libguestfs & QEMU.

IMHO, default to KVM with fallback to TCG is the most friendly default
behaviour.

Daniel
Re: [libvirt] TSC scaling interface to management
On Wed, Sep 12, 2012 at 12:39:39PM -0300, Marcelo Tosatti wrote:
>
> HW TSC scaling is a feature of AMD processors that allows a multiplier
> to be specified for the TSC frequency exposed to the guest.
>
> KVM also contains provision to trap TSC ("KVM: Infrastructure for
> software and hardware based TSC rate scaling" cc578287e3224d0da)
> or advance TSC frequency.
>
> This is useful when migrating to a host with different frequency and
> the guest is possibly using direct RDTSC instructions for purposes
> other than measuring cycles (that is, it previously calculated
> cycles-per-second, and uses that information which is stale after
> migration).
>
> "qemu-x86: Set tsc_khz in kvm when supported" (e7429073ed1a76518)
> added support for the tsc_khz= option in QEMU.
>
> I am proposing the following changes so that management applications
> can work with this:
>
> 1) New option for tsc_khz, which is tsc_khz=host (QEMU command line
> option). Host means that QEMU is responsible for retrieving the
> TSC frequency of the host processor and using that.
> The management application does not have to deal with the burden.

FYI, libvirt already has support for expressing a number of different TSC
related config options, to support Xen's and VMWare's capabilities in this
area. What we currently allow for is

In this context the frequency attribute provides the Hz value to provide
to the guest, and the mode is one of:

- auto == emulate if TSC is unstable, else allow native TSC access
- native == always allow native TSC access
- emulate == always emulate TSC
- smpsafe == always emulate TSC, and interlock SMP

> Therefore it appears that this "tsc_khz=auto" option can be specified
> only if the user specifies so (it can be a per-guest flag hidden
> in the management configuration/manual).
>
> Sending this email to gather suggestions (or objections)
> to this interface.
Daniel
Re: [PATCH v8] kvm: notify host when the guest is panicked
On Mon, Aug 13, 2012 at 03:21:32PM -0300, Marcelo Tosatti wrote: > On Wed, Aug 08, 2012 at 10:43:01AM +0800, Wen Congyang wrote: > > We can know the guest is panicked when the guest runs on xen. > > But we do not have such a feature on kvm. > > > > Another purpose of this feature is: management app (for example: > > libvirt) can do auto dump when the guest is panicked. If management > > app does not do auto dump, the guest's user can do dump by hand if > > he sees the guest is panicked. > > > > We have three solutions to implement this feature: > > 1. use vmcall > > 2. use I/O port > > 3. use virtio-serial. > > > > We have decided to avoid touching the hypervisor. The reason why I > > chose the I/O port is: > > 1. it is easier to implement > > 2. it does not depend on any virtual device > > 3. it can work when starting the kernel > > How about searching for the "Kernel panic - not syncing" string > in the guest's serial output? Say libvirtd could take an action upon > that? No, this is not satisfactory. It depends on the guest OS being configured to use the serial port for console output, which we cannot mandate, since it may well be required for other purposes. Daniel
Re: First shot at adding IPMI to qemu
On Mon, Jul 09, 2012 at 08:23:11AM -0500, Corey Minyard wrote: > I haven't heard anything about these patches. Any comments, good or > bad? Has anyone tried these? You really ought to post this to the qemu-devel mailing list, since that's where the majority of QEMU developers hang out. This KVM list is primarily for KVM specific development tasks in QEMU. Daniel
Re: [PATCH] qemu-kvm: Fix default machine options
On Fri, Jul 06, 2012 at 06:21:06PM +0200, Jan Kiszka wrote: > qemu-kvm-specific machine defaults were missing for pc-0.15 to pc-1.1. > Then Daniel noted that --disable-kvm caused problems as the generated > binaries would be unable to run. As we are at it, we can drop the > kernel_irqchip=on that is now enable by default in upstream. > > CC: Daniel P. Berrange > Signed-off-by: Jan Kiszka ACK, looks good to me. > Noticed that there was more to do. Can you take care of stable-1.1, > Daniel? TIA. Yep, will post a patch for stable-1.1 when this is accepted into master. > hw/pc_piix.c | 23 --- > 1 files changed, 16 insertions(+), 7 deletions(-) > > diff --git a/hw/pc_piix.c b/hw/pc_piix.c > index 98a06fa..5860d52 100644 > --- a/hw/pc_piix.c > +++ b/hw/pc_piix.c > @@ -353,6 +353,12 @@ static void pc_xen_hvm_init(ram_addr_t ram_size, > } > #endif > > +#ifdef CONFIG_KVM_OPTIONS > +#define KVM_MACHINE_OPTIONS "accel=kvm" > +#else > +#define KVM_MACHINE_OPTIONS "" > +#endif > + > static QEMUMachine pc_machine_v1_2 = { > .name = "pc-1.2", > .alias = "pc", > @@ -360,7 +366,7 @@ static QEMUMachine pc_machine_v1_2 = { > .init = pc_init_pci, > .max_cpus = 255, > .is_default = 1, > -.default_machine_opts = "accel=kvm,kernel_irqchip=on", > +.default_machine_opts = KVM_MACHINE_OPTIONS, > }; > > #define PC_COMPAT_1_1 \ > @@ -387,6 +393,7 @@ static QEMUMachine pc_machine_v1_1 = { > .desc = "Standard PC", > .init = pc_init_pci, > .max_cpus = 255, > +.default_machine_opts = KVM_MACHINE_OPTIONS, > .compat_props = (GlobalProperty[]) { > PC_COMPAT_1_1, > { /* end of list */ } > @@ -422,6 +429,7 @@ static QEMUMachine pc_machine_v1_0 = { > .desc = "Standard PC", > .init = pc_init_pci, > .max_cpus = 255, > +.default_machine_opts = KVM_MACHINE_OPTIONS, > .compat_props = (GlobalProperty[]) { > PC_COMPAT_1_0, > { /* end of list */ } > @@ -437,6 +445,7 @@ static QEMUMachine pc_machine_v0_15 = { > .desc = "Standard PC", > .init = pc_init_pci, > .max_cpus = 255, > +.default_machine_opts = 
KVM_MACHINE_OPTIONS, > .compat_props = (GlobalProperty[]) { > PC_COMPAT_0_15, > { /* end of list */ } > @@ -469,7 +478,7 @@ static QEMUMachine pc_machine_v0_14 = { > .desc = "Standard PC", > .init = pc_init_pci, > .max_cpus = 255, > -.default_machine_opts = "accel=kvm,kernel_irqchip=on", > +.default_machine_opts = KVM_MACHINE_OPTIONS, > .compat_props = (GlobalProperty[]) { > PC_COMPAT_0_14, > { > @@ -503,7 +512,7 @@ static QEMUMachine pc_machine_v0_13 = { > .desc = "Standard PC", > .init = pc_init_pci_no_kvmclock, > .max_cpus = 255, > -.default_machine_opts = "accel=kvm,kernel_irqchip=on", > +.default_machine_opts = KVM_MACHINE_OPTIONS, > .compat_props = (GlobalProperty[]) { > PC_COMPAT_0_13, > { > @@ -541,7 +550,7 @@ static QEMUMachine pc_machine_v0_12 = { > .desc = "Standard PC", > .init = pc_init_pci_no_kvmclock, > .max_cpus = 255, > -.default_machine_opts = "accel=kvm,kernel_irqchip=on", > +.default_machine_opts = KVM_MACHINE_OPTIONS, > .compat_props = (GlobalProperty[]) { > PC_COMPAT_0_12, > { > @@ -575,7 +584,7 @@ static QEMUMachine pc_machine_v0_11 = { > .desc = "Standard PC, qemu 0.11", > .init = pc_init_pci_no_kvmclock, > .max_cpus = 255, > -.default_machine_opts = "accel=kvm,kernel_irqchip=on", > +.default_machine_opts = KVM_MACHINE_OPTIONS, > .compat_props = (GlobalProperty[]) { > PC_COMPAT_0_11, > { > @@ -597,7 +606,7 @@ static QEMUMachine pc_machine_v0_10 = { > .desc = "Standard PC, qemu 0.10", > .init = pc_init_pci_no_kvmclock, > .max_cpus = 255, > -.default_machine_opts = "accel=kvm,kernel_irqchip=on", > +.default_machine_opts = KVM_MACHINE_OPTIONS, > .compat_props = (GlobalProperty[]) { > PC_COMPAT_0_11, > { > @@ -631,7 +640,7 @@ static QEMUMachine isapc_machine = { > .desc = "ISA-only PC", > .init = pc_init_isa, > .max_cpus = 1, > -.default_machine_opts = "accel=kvm,kernel_irqchip=on", > +.default_machine_opts = KVM_MACHINE_OPTIONS, > .compat_props = (GlobalProperty[]) { > { > .driver = "pc-sysfw", > -- > 1.7.3.4 Daniel -- |: http://berrange.com 
[PATCH] Fix default accelerator when building with --disable-kvm
From: "Daniel P. Berrange" The following commit commit 3ad763fcba5bd0ec5a79d4a9b6baeef119dd4a3d Author: Jan Kiszka Date: Fri Mar 2 10:30:43 2012 +0100 qemu-kvm: Use machine options to configure qemu-kvm defaults Upstream is moving towards this mechanism, so start using it in qemu-kvm already to configure the specific defaults: kvm enabled on, just like in-kernel irqchips. prevents qemu from starting when it has been built with the --disable-kvm argument, because the accelerator is hardcoded to 'kvm'. This is a regression previously fixed by commit ce967f6610dcd7b7762dbad5a639fecf42d5c76d Author: Daniel P. Berrange Date: Fri Aug 5 09:50:29 2011 +0100 Fix default accelerator when configured with --disable-kvm The default accelerator is hardcoded to 'kvm'. This is a fine default for qemu-kvm normally, but if the user built with ./configure --disable-kvm, then the resulting binaries will not work by default. The fix is again to make this conditional on CONFIG_KVM_OPTIONS Signed-off-by: Daniel P. 
Berrange --- hw/pc_piix.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 98a06fa..35202dd 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -360,7 +360,9 @@ static QEMUMachine pc_machine_v1_2 = { .init = pc_init_pci, .max_cpus = 255, .is_default = 1, +#ifdef CONFIG_KVM_OPTIONS .default_machine_opts = "accel=kvm,kernel_irqchip=on", +#endif }; #define PC_COMPAT_1_1 \ @@ -469,7 +471,9 @@ static QEMUMachine pc_machine_v0_14 = { .desc = "Standard PC", .init = pc_init_pci, .max_cpus = 255, +#ifdef CONFIG_KVM_OPTIONS .default_machine_opts = "accel=kvm,kernel_irqchip=on", +#endif .compat_props = (GlobalProperty[]) { PC_COMPAT_0_14, { @@ -503,7 +507,9 @@ static QEMUMachine pc_machine_v0_13 = { .desc = "Standard PC", .init = pc_init_pci_no_kvmclock, .max_cpus = 255, +#ifdef CONFIG_KVM_OPTIONS .default_machine_opts = "accel=kvm,kernel_irqchip=on", +#endif .compat_props = (GlobalProperty[]) { PC_COMPAT_0_13, { @@ -541,7 +547,9 @@ static QEMUMachine pc_machine_v0_12 = { .desc = "Standard PC", .init = pc_init_pci_no_kvmclock, .max_cpus = 255, +#ifdef CONFIG_KVM_OPTIONS .default_machine_opts = "accel=kvm,kernel_irqchip=on", +#endif .compat_props = (GlobalProperty[]) { PC_COMPAT_0_12, { @@ -575,7 +583,9 @@ static QEMUMachine pc_machine_v0_11 = { .desc = "Standard PC, qemu 0.11", .init = pc_init_pci_no_kvmclock, .max_cpus = 255, +#ifdef CONFIG_KVM_OPTIONS .default_machine_opts = "accel=kvm,kernel_irqchip=on", +#endif .compat_props = (GlobalProperty[]) { PC_COMPAT_0_11, { @@ -597,7 +607,9 @@ static QEMUMachine pc_machine_v0_10 = { .desc = "Standard PC, qemu 0.10", .init = pc_init_pci_no_kvmclock, .max_cpus = 255, +#ifdef CONFIG_KVM_OPTIONS .default_machine_opts = "accel=kvm,kernel_irqchip=on", +#endif .compat_props = (GlobalProperty[]) { PC_COMPAT_0_11, { @@ -631,7 +643,9 @@ static QEMUMachine isapc_machine = { .desc = "ISA-only PC", .init = pc_init_isa, .max_cpus = 1, +#ifdef CONFIG_KVM_OPTIONS .default_machine_opts = 
"accel=kvm,kernel_irqchip=on", +#endif .compat_props = (GlobalProperty[]) { { .driver = "pc-sysfw", -- 1.7.10.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)
On Mon, Jul 02, 2012 at 04:54:03PM -0300, Eduardo Habkost wrote: > On Mon, Jul 02, 2012 at 07:56:58PM +0100, Daniel P. Berrange wrote: > > On Mon, Jul 02, 2012 at 03:06:32PM -0300, Eduardo Habkost wrote: > > > Resending series, after fixing some coding style issues. Does anybody has > > > any > > > feedback about this proposal? > > > > > > Changes v1 -> v2: > > > - Coding style fixes > > > > > > Original cover letter: > > > > > > I was investigating if there are any mechanisms that allow manually > > > pinning of > > > guest RAM to specific host NUMA nodes, in the case of multi-node KVM > > > guests, and > > > noticed that -mem-path could be used for that, except that it currently > > > removes > > > any files it creates (using mkstemp()) immediately, not allowing numactl > > > to be > > > used on the backing files, as a result. This patches add a > > > -keep-mem-path-files > > > option to make QEMU create the files inside -mem-path with more > > > predictable > > > names, and not remove them after creation. > > > > > > Some previous discussions about the subject, for reference: > > > - Message-ID: <1281534738-8310-1-git-send-email-andre.przyw...@amd.com> > > >http://article.gmane.org/gmane.comp.emulators.kvm.devel/57684 > > > - Message-ID: <4c7d7c2a.7000...@codemonkey.ws> > > >http://article.gmane.org/gmane.comp.emulators.kvm.devel/58835 > > > > > > A more recent thread can be found at: > > > - Message-ID: <20111029184502.gh11...@in.ibm.com> > > >http://article.gmane.org/gmane.comp.emulators.qemu/123001 > > > > > > Note that this is just a mechanism to facilitate manual static binding > > > using > > > numactl on hugetlbfs later, for optimization. This may be especially > > > useful for > > > single large multi-node guests use-cases (and, of course, has to be used > > > with > > > care). > > > > > > I don't know if it is a good idea to use the memory range names as a > > > publicly- > > > visible interface. 
Another option may be to use a single file instead, > > > and mmap > > > different regions inside the same file for each memory region. I an open > > > to > > > comments and suggestions. > > > > > > Example (untested) usage to bind manually each half of the RAM of a guest > > > to a > > > different NUMA node: > > > > > > $ qemu-system-x86_64 [...] -m 2048 -smp 4 \ > > >-numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \ > > >-mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO > > > $ numactl --offset=1G --length=1G --membind=1 --file > > > /mnt/hugetlbfs/FOO/pc.ram > > > $ numactl --offset=0 --length=1G --membind=2 --file > > > /mnt/hugetlbfs/FOO/pc.ram > > > > I'd suggest that instead of making the memory file name into a > > public ABI QEMU needs to maintain, QEMU could expose the info > > via a monitor command. eg > > > >$ qemu-system-x86_64 [...] -m 2048 -smp 4 \ > > -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \ > > -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \ > > -monitor stdio > >(qemu) info mem-nodes > > node0: file=/proc/self/fd/3, offset=0G, length=1G > > node1: file=/proc/self/fd/3, offset=1G, length=1G > > > > This example takes advantage of the fact that with Linux, you can > > still access a deleted file via /proc/self/fd/NNN, which AFAICT, > > would avoid the need for a --keep-mem-path-files. > > I like the suggestion. > > But other processes still need to be able to open those files if we want > to do anything useful with them. In this case, I guess it's better to > let QEMU itself build a "/proc//fd/" string instead of > using "/proc/self" and forcing the client to find out what's the right > PID? > > Anyway, even if we want to avoid file-descriptor and /proc tricks, we > can still use the interface you suggest. Then we wouldn't need to have > any filename assumptions: the filenames could be completly random, as > they would be reported using the new monitor command. Opps, yes of course. 
I did intend that client apps could use the files, so I should have used /proc/$PID and not /proc/self > > > > > By returning info via a monitor command you also avoid hardcoding > > the use of 1 single file for all
Re: [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)
On Mon, Jul 02, 2012 at 03:06:32PM -0300, Eduardo Habkost wrote: > Resending series, after fixing some coding style issues. Does anybody has any > feedback about this proposal? > > Changes v1 -> v2: > - Coding style fixes > > Original cover letter: > > I was investigating if there are any mechanisms that allow manually pinning of > guest RAM to specific host NUMA nodes, in the case of multi-node KVM guests, > and > noticed that -mem-path could be used for that, except that it currently > removes > any files it creates (using mkstemp()) immediately, not allowing numactl to be > used on the backing files, as a result. This patches add a > -keep-mem-path-files > option to make QEMU create the files inside -mem-path with more predictable > names, and not remove them after creation. > > Some previous discussions about the subject, for reference: > - Message-ID: <1281534738-8310-1-git-send-email-andre.przyw...@amd.com> >http://article.gmane.org/gmane.comp.emulators.kvm.devel/57684 > - Message-ID: <4c7d7c2a.7000...@codemonkey.ws> >http://article.gmane.org/gmane.comp.emulators.kvm.devel/58835 > > A more recent thread can be found at: > - Message-ID: <20111029184502.gh11...@in.ibm.com> >http://article.gmane.org/gmane.comp.emulators.qemu/123001 > > Note that this is just a mechanism to facilitate manual static binding using > numactl on hugetlbfs later, for optimization. This may be especially useful > for > single large multi-node guests use-cases (and, of course, has to be used with > care). > > I don't know if it is a good idea to use the memory range names as a publicly- > visible interface. Another option may be to use a single file instead, and > mmap > different regions inside the same file for each memory region. I an open to > comments and suggestions. > > Example (untested) usage to bind manually each half of the RAM of a guest to a > different NUMA node: > > $ qemu-system-x86_64 [...] 
-m 2048 -smp 4 \ >-numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \ >-mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO > $ numactl --offset=1G --length=1G --membind=1 --file > /mnt/hugetlbfs/FOO/pc.ram > $ numactl --offset=0 --length=1G --membind=2 --file > /mnt/hugetlbfs/FOO/pc.ram I'd suggest that instead of making the memory file name into a public ABI QEMU needs to maintain, QEMU could expose the info via a monitor command. eg $ qemu-system-x86_64 [...] -m 2048 -smp 4 \ -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \ -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \ -monitor stdio (qemu) info mem-nodes node0: file=/proc/self/fd/3, offset=0G, length=1G node1: file=/proc/self/fd/3, offset=1G, length=1G This example takes advantage of the fact that with Linux, you can still access a deleted file via /proc/self/fd/NNN, which, AFAICT, would avoid the need for --keep-mem-path-files. By returning info via a monitor command you also avoid hardcoding the use of one single file for all of memory. You also avoid hardcoding the fact that QEMU stores the nodes in contiguous order inside the file. eg QEMU could easily return data like this: $ qemu-system-x86_64 [...] -m 2048 -smp 4 \ -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \ -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \ -monitor stdio (qemu) info mem-nodes node0: file=/proc/self/fd/3, offset=0G, length=1G node1: file=/proc/self/fd/4, offset=0G, length=1G or more ingenious options. Regards, Daniel
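The deleted-file trick behind the `/proc/self/fd/NNN` suggestion is plain Linux /proc semantics and easy to demonstrate outside QEMU (tmpfs stands in for hugetlbfs here):

```shell
# Create a file, hold an fd open, delete the file, then re-open it via
# /proc/self/fd. This is the same mechanism that would let a client reach
# QEMU's unlinked -mem-path backing file without --keep-mem-path-files.
tmp=$(mktemp)
exec 3<>"$tmp"
echo "backing data" >&3
rm -f "$tmp"                   # the name is gone, but the inode lives on via fd 3
data=$(cat /proc/self/fd/3)    # a fresh open of the deleted file, from offset 0
exec 3>&-
echo "$data"
```

A client process would use `/proc/<qemu-pid>/fd/NNN` rather than `/proc/self/...`, as the follow-up message in this thread points out.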
Re: [Qemu-devel] [PATCH 5/6 v5] deal with guest panicked event according to -onpanic parameter
On Wed, Jun 27, 2012 at 04:52:32PM +0200, Cornelia Huck wrote: > On Wed, 27 Jun 2012 15:02:23 +0800 > Wen Congyang wrote: > > > When the guest is panicked, it will write 0x1 to the port KVM_PV_PORT. > > So if qemu reads 0x1 from this port, we can do the following four > > things according to the parameter -onpanic: > > 1. emit QEVENT_GUEST_PANICKED only > > 2. emit QEVENT_GUEST_PANICKED and pause the guest > > 3. emit QEVENT_GUEST_PANICKED and poweroff the guest > > 4. emit QEVENT_GUEST_PANICKED and reset the guest > > Would it be useful to add some "dump the guest" actions here? Better off leaving that to the mgmt layer using QEMU. If you tried to directly handle "dump the guest" in the context of the panic notifier then you would add all sorts of extra complexity to this otherwise simple feature. For a start, you need to tell it what filename to use, which is not something you can necessarily decide at the time QEMU starts - you might want a separate filename each time a panic occurs. The mgmt app might not even want QEMU to dump to a file - it might want to use a socket, or pass in a file descriptor at time of dump. All in all, it is better to keep the panic notifier simple, and let the mgmt app then decide whether to take a dump separately, using existing QEMU monitor commands and features. Daniel
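The mgmt-side policy Daniel describes — choosing a fresh dump target per panic event rather than baking one filename into QEMU — can be sketched in a few lines. The directory and guest name below are hypothetical, purely for illustration:

```shell
# Hypothetical mgmt-app helper: derive a unique dump path per panic event,
# so QEMU itself never needs to be told a filename up front.
dump_dir=${DUMP_DIR:-/var/lib/guest-dumps}
guest=demo-guest
stamp=$(date +%Y%m%d-%H%M%S)
dump_file="$dump_dir/$guest-$stamp.core"
echo "$dump_file"
# the app would then drive an existing monitor command against $dump_file
# (or hand QEMU a socket / inherited fd instead of a file)
```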
Re: [Qemu-devel] [PATCH 3/3] deal with guest panicked event
On Tue, Jun 12, 2012 at 09:35:04AM -0300, Luiz Capitulino wrote: > On Tue, 12 Jun 2012 14:55:37 +0800 > Wen Congyang wrote: > > > >> +static void panicked_perform_action(void) > > >> +{ > > >> +switch(panicked_action) { > > >> +case PANICKED_REPORT: > > >> +panicked_mon_event("report"); > > >> +break; > > >> + > > >> +case PANICKED_PAUSE: > > >> +panicked_mon_event("pause"); > > >> +vm_stop(RUN_STATE_GUEST_PANICKED); > > >> +break; > > >> + > > >> +case PANICKED_QUIT: > > >> +panicked_mon_event("quit"); > > >> +exit(0); > > >> +break; > > >> +} > > > > > > Having the data argument is not needed/wanted. The mngt app can guess it > > > if it > > > needs to know it, but I think it doesn't want to. > > > > Libvirt will do something when the kernel is panicked, so it should know > > the action > > in qemu side. > > But the action will be set by libvirt itself, no? Sure, but the whole world isn't libvirt. If the process listening to the monitor is not the same as the process which launched the VM, then I think including the action is worthwhile. Besides, the way Wen has done this is identical to what we already do with QEVENT_WATCHDOG and I think it is desirable to keep consistency here. Daniel
Re: [Qemu-devel] KVM call agenda for June, Tuesday 15th
On Tue, May 15, 2012 at 08:44:14AM -0500, Anthony Liguori wrote: > On 05/15/2012 03:51 AM, Kevin Wolf wrote: > >Currently we have a very simple unidirectional structure: > >qemu is a standalone program that keeps running on its own. libvirt is > >the user of qemu. Often enough it's already hard to get things working > >correctly in error cases with this simple structure - do you really want > >to have qemu depend on an RPC to libvirt? > > Yes. We're relying on libvirt for a *syscall* that the kernel isn't > processing correctly. I'm not advocating a general mechanism where > we defer large parts of QEMU to libvirt. This is specifically the > open() syscall. > > >You're right that the proper fix would be in the kernel, but in qemu a > >much better solution than RPCs to libvirt is allowing all QMP commands > >that open new files to pass a block device description that can contain > >a fd. > > I don't agree that this is an obviously better solution. For > example, it mandates that libvirt parse image formats to determine > the backing file chains. I think that the question of parsing image formats is tangential to this QEMU impl choice. > OTOH, the open() RPC allows libvirt to avoid parsing image formats. > It could do something as simple as have the user specify a white > list of image files the guest is allowed to access in the domain XML > and validate against that. > > It removes considerable complexity from libvirt as it doesn't have > to construct a potentially complex set of blockdev arguments. I don't really think this QEMU approach to a callback for arbitrary files simplifies libvirt's life in any way. In fact I think it will actually complicate our life, because instead of being able to provide all the information/resources required at one time, we have to wait to get async callbacks some time later. We then have to try and figure out whether the file being requested is actually allowed by the config. 
> >This would be much better than first getting an open command via QMP > >and then using an RPC to ask back what we're really meant to open. > > > >To the full extent we're going to get this with blockdev-add (which is > >what we should really start working on now rather than on hacks like > >-open-fd-hook), but if you like hacks, much (if not all) of it is > >already possible today with the 'existing' mode of live snapshots. > > I really don't think that blockdev is an elegant solution to this > problem. It pushes an awful lot of complexity to libvirt (or any > management tool). > > I actually think Avi's original idea of a filename dictionary is a > better approach than blockdev for solving this problem. While I raise blockdev as an alternative approach, I am open to other alternative ways to provide this config via the CLI or monitor. Basically anything that isn't this generic file open callback. Daniel
Re: smp option of qemu-kvm
On Thu, Apr 05, 2012 at 02:52:40PM -0400, Steven wrote: > Hi, Daniel, > Thanks for your quick response. However, ps -eLf shows 4 threads > for the VM, and I checked that the 4 threads have the same tgid. > But the VM I created is with the -smp 2 option. Could you explain this? Thanks. As well as the vCPU threads, QEMU creates other threads as needed, typically for I/O - indeed the count of threads may vary over time. Daniel
Re: smp option of qemu-kvm
On Thu, Apr 05, 2012 at 02:28:51PM -0400, Steven wrote: > Hi, > I started a kvm VM by adding the -smp 2 option. From inside the guest, I > can see that /proc/cpuinfo outputs 2 cores. > However, in the host, I only observe one qemu-kvm process for that VM. > Does that mean this VM is actually running on one core? > If so, how to make a VM run on 2 or more cores? Thanks. Each VCPU in KVM corresponds to a separate thread in the process. The 'ps' command only ever shows the thread leader by default - so you don't see those VCPU threads in the process list; e.g. run ps -eLf to see all threads. Daniel
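The thread-leader behaviour is ordinary Linux semantics and can be checked on any process: every thread shares the tgid, and `/proc/<pid>/task` lists one entry per thread — this is the same data `ps -eLf` reads. A quick check against the current shell (which has only one thread, but the mechanism is identical for a qemu-kvm process with its vCPU and I/O threads):

```shell
# List the threads of a process via /proc; for a qemu-kvm pid this is where
# the vCPU threads (hidden from plain 'ps') show up.
pid=$$
threads=$(ls /proc/$pid/task | wc -l)
echo "pid=$pid threads=$threads"
```

Substituting a qemu-kvm pid for `$$` would show the vCPU count plus the I/O and worker threads Daniel mentions.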
Re: [PATCH 0/2 v3] kvm: notify host when guest panicked
On Wed, Mar 21, 2012 at 06:25:16PM +0200, Avi Kivity wrote: > On 03/21/2012 06:18 PM, Corey Minyard wrote: > > > >> Look at drivers/char/ipmi/ipmi_msghandler.c. It has code to send panic > >> event over IPMI. The code is pretty complex. Of course if we are going to > >> implement something more complex than a simple hypercall for panic > >> notification we better do something more interesting with it than just > >> saying "panic happened", like sending stack traces on all cpus for > >> instance. > > > > I doubt that's the best example, unfortunately. The IPMI event log > > has limited space and it has to be sent a little piece at a time since > > each log entry is 14 bytes. It just prints the panic string, nothing > > else. Not that it isn't useful, it has saved my butt before. > > > > You have lots of interesting options with paravirtualization. You > > could, for instance, create a console driver that delivered all > > console output efficiently through a hypercall. That would be really > > easy. Or, as you mention, a custom way to deliver panic information. > > Collecting information like stack traces would be harder to > > accomplish, as I don't think there is currently a way to get it except > > by sending it to printk. > > That already exists; virtio-console (or serial console emulation) can do > the job. > > In fact the feature can be implemented 100% host side by searching for a > panic string signature in the console logs. 
You can even go one better and search for the panic string in the guest memory directly, which is what virt-dmesg does :-) http://people.redhat.com/~rjones/virt-dmesg/ Daniel
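The host-side log-scanning idea is trivial to sketch; here a fabricated log file stands in for the guest's serial output:

```shell
# Watch a (simulated) guest serial log for the kernel panic signature --
# roughly what a libvirtd-side notifier based on console output would do.
log=$(mktemp)
printf 'systemd[1]: started\nKernel panic - not syncing: Fatal exception\n' > "$log"
if grep -q 'Kernel panic - not syncing' "$log"; then
    status=panicked
else
    status=running
fi
rm -f "$log"
echo "$status"
```

As the thread notes elsewhere, this only works when the guest actually routes console output to the serial port, which is exactly the limitation that motivates the hypercall/I/O-port approaches.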
Re: [PATCH 0/2 v3] kvm: notify host when guest panicked
On Wed, Mar 14, 2012 at 07:06:50PM +0800, Wen Congyang wrote: > At 03/14/2012 06:59 PM, Daniel P. Berrange Wrote: > > On Wed, Mar 14, 2012 at 06:58:47PM +0800, Wen Congyang wrote: > >> At 03/14/2012 06:52 PM, Avi Kivity Wrote: > >>> On 03/14/2012 12:52 PM, Wen Congyang wrote: > >>>>> > >>>>>> If so, is this channel visible to guest userspace? If the channle is > >>>>>> visible to guest > >>>>>> userspace, the program running in userspace may write the same message > >>>>>> to the channel. > >>>>> > >>>>> Access control is via permissions. You can have udev scripts assign > >>>>> whatever uid and gid to the port of your interest. By default, all > >>>>> ports are only accessible to the root user. > >>>> > >>>> We should also prevent root user writing message to this channel if it is > >>>> used for panicked notification. > >>>> > >>> > >>> Why? root can easily cause a panic. > >>> > >> > >> root user can write the same message to virtio-serial while the guest is > >> running... > > > > Unless you are running a MAC policy which strictly confines the root > > account, root can cause a kernel panic regardless of virtio-serial > > permissions in the guest: > > > > echo c > /proc/sysrq-trigger > > Yes, root user can cause a kernel panic. But if he writes the same message to > virtio-serial, > the host will see the guest is panicked while the guest is not panicked. The > host is cheated. The host mgmt layer must *ALWAYS* expect that any information originating from the guest is bogus. It must never trust the guest info. So regardless of the implementation, you have to expect that the guest might have lied to you about it being crashed. The same is true even of Xen's panic notifier. So if an application is automatically triggering core dumps based on this panic notification, it needs to be aware that the guest can lie and take steps to avoid the guest causing a DOS attack on the host. 
Most likely by rate limiting the frequency of core dumps per guest, and/or setting a max core dump storage quota per guest. Regards, Daniel
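The rate-limiting policy suggested above can be sketched in a few lines; the per-guest state file and the 60-second window are arbitrary illustrative choices:

```shell
# Allow at most one dump per guest per RATE_WINDOW seconds; further panic
# notifications inside the window are recorded as suppressed.
RATE_WINDOW=60
state=$(mktemp)                  # stands in for a per-guest persistent state file
now=$(date +%s)
last=$(cat "$state"); [ -n "$last" ] || last=0
if [ $((now - last)) -ge "$RATE_WINDOW" ]; then
    echo "$now" > "$state"       # record this dump's timestamp
    decision="dump allowed"
else
    decision="dump suppressed"   # guest is panicking (or lying) too often
fi
echo "$decision"
rm -f "$state"
```

A real mgmt app would key the state by guest UUID and also enforce the storage quota Daniel mentions before writing any dump.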
Re: [PATCH 0/2 v3] kvm: notify host when guest panicked
On Wed, Mar 14, 2012 at 06:58:47PM +0800, Wen Congyang wrote: > At 03/14/2012 06:52 PM, Avi Kivity Wrote: > > On 03/14/2012 12:52 PM, Wen Congyang wrote: > >>> > If so, is this channel visible to guest userspace? If the channle is > visible to guest > userspace, the program running in userspace may write the same message > to the channel. > >>> > >>> Access control is via permissions. You can have udev scripts assign > >>> whatever uid and gid to the port of your interest. By default, all > >>> ports are only accessible to the root user. > >> > >> We should also prevent root user writing message to this channel if it is > >> used for panicked notification. > >> > > > > Why? root can easily cause a panic. > > > > root user can write the same message to virtio-serial while the guest is > running... Unless you are running a MAC policy which strictly confines the root account, root can cause a kernel panic regardless of virtio-serial permissions in the guest: echo c > /proc/sysrq-trigger Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2 v3] kvm: notify host when guest panicked
On Wed, Mar 14, 2012 at 03:21:14PM +0530, Amit Shah wrote: > On (Wed) 14 Mar 2012 [16:29:50], Wen Congyang wrote: > > At 03/13/2012 06:47 PM, Avi Kivity Wrote: > > > On 03/13/2012 11:18 AM, Daniel P. Berrange wrote: > > >> On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote: > > >>> On 03/12/2012 11:04 AM, Wen Congyang wrote: > > >>>> Do you have any other comments about this patch? > > >>>> > > >>> > > >>> Not really, but I'm not 100% convinced the patch is worthwhile. It's > > >>> likely to only be used by Linux, which has kexec facilities, and you can > > >>> put talk to management via virtio-serial and describe the crash in more > > >>> details than a simple hypercall. > > >> > > >> As mentioned before, I don't think virtio-serial is a good fit for this. > > >> We want something that is simple & guaranteed always available. Using > > >> virtio-serial requires significant setup work on both the host and guest. > > > > > > So what? It needs to be done anyway for the guest agent. > > > > > >> Many management application won't know to make a vioserial device > > >> available > > >> to all guests they create. > > > > > > Then they won't know to deal with the panic event either. > > > > > >> Most administrators won't even configure kexec, > > >> let alone virtio serial on top of it. > > > > > > It should be done by the OS vendor, not the individual admin. > > > > > >> The hypercall requires zero host > > >> side config, and zero guest side config, which IMHO is what we need for > > >> this feature. > > > > > > If it was this one feature, yes. But we keep getting more and more > > > features like that and we bloat the hypervisor. There's a reason we > > > have a host-to-guest channel, we should use it. > > > > > > > I donot know how to use virtio-serial. > > > > I start vm like this: > > qemu ...\ > >-device virtio-serial \ > > -chardev socket,path=/tmp/foo,server,nowait,id=foo \ > > -device virtserialport,chardev=foo,name=port1 ... > > This is sufficient. 
On the host, you can open /tmp/foo using a custom > program or nc (nc -U /tmp/foo). On the guest, you can just open > /dev/virtio-ports/port1 and read/write into it. > > See the following links for more details. > > https://fedoraproject.org/wiki/Features/VirtioSerial#How_To_Test > http://www.linux-kvm.org/page/Virtio-serial_API > > > You said that there are too many channels. Does it mean /tmp/foo is a > > channel? > > You can have several such -device virtserialport. The -device part > describes what the guest will see. The -chardev part ties that to the > host-side part of the channel. > > /tmp/foo is the host end-point for the channel, in the example above, > and /dev/virtio-ports/port1 is the guest-side end-point. If we do choose to use virtio-serial for panics (which I don't think we should), then we should not expose it in the host filesystem. The host side should be a virtual chardev backend internal to QEMU, in the same way that 'spicevmc' is handled. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
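For completeness, the same channel expressed as libvirt domain XML would look roughly like this (socket path and port name taken from the qemu example above; a sketch, not copied from the original mail):

```xml
<channel type='unix'>
  <!-- host end-point: UNIX socket, like -chardev socket,path=/tmp/foo -->
  <source mode='bind' path='/tmp/foo'/>
  <!-- guest end-point: appears as /dev/virtio-ports/port1 in the guest -->
  <target type='virtio' name='port1'/>
</channel>
```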
Re: [PATCH 0/2 v3] kvm: notify host when guest panicked
On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote: > On 03/12/2012 11:04 AM, Wen Congyang wrote: > > Do you have any other comments about this patch? > > > > Not really, but I'm not 100% convinced the patch is worthwhile. It's > likely to only be used by Linux, which has kexec facilities, and you can > put talk to management via virtio-serial and describe the crash in more > details than a simple hypercall. As mentioned before, I don't think virtio-serial is a good fit for this. We want something that is simple & guaranteed always available. Using virtio-serial requires significant setup work on both the host and guest. Many management applications won't know to make a vioserial device available to all guests they create. Most administrators won't even configure kexec, let alone virtio serial on top of it. The hypercall requires zero host side config, and zero guest side config, which IMHO is what we need for this feature. Daniel
Re: [RESEND][PATCH 2/2 v3] deal with guest panicked event
On Thu, Mar 08, 2012 at 01:52:45PM +0200, Avi Kivity wrote: > On 03/08/2012 01:36 PM, Daniel P. Berrange wrote: > > On Thu, Mar 08, 2012 at 01:28:56PM +0200, Avi Kivity wrote: > > > On 03/08/2012 12:15 PM, Wen Congyang wrote: > > > > When the host knows the guest is panicked, it will set > > > > exit_reason to KVM_EXIT_GUEST_PANICKED. So if qemu receive > > > > this exit_reason, we can send a event to tell management > > > > application that the guest is panicked and set the guest > > > > status to RUN_STATE_PANICKED. > > > > > > > > Signed-off-by: Wen Congyang > > > > --- > > > > kvm-all.c|5 + > > > > monitor.c|3 +++ > > > > monitor.h|1 + > > > > qapi-schema.json |2 +- > > > > qmp.c|3 ++- > > > > vl.c |1 + > > > > 6 files changed, 13 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/kvm-all.c b/kvm-all.c > > > > index 77eadf6..b3c9a83 100644 > > > > --- a/kvm-all.c > > > > +++ b/kvm-all.c > > > > @@ -1290,6 +1290,11 @@ int kvm_cpu_exec(CPUState *env) > > > > (uint64_t)run->hw.hardware_exit_reason); > > > > ret = -1; > > > > break; > > > > +case KVM_EXIT_GUEST_PANICKED: > > > > +monitor_protocol_event(QEVENT_GUEST_PANICKED, NULL); > > > > +vm_stop(RUN_STATE_PANICKED); > > > > +ret = -1; > > > > +break; > > > > > > > > > > If the management application is not aware of this event, then it will > > > never resume the guest, so it will appear hung. > > > > Even if the mgmt app doesn't know about the QEVENT_GUEST_PANICKED, it should > > still see a QEVENT_STOP event emitted by vm_stop() surely ? So it will > > know the guest CPUs have been stopped, even if it isn't aware of the > > reason why, which seems fine to me. > > No. The guest is stopped, and there's no reason to suppose that the > management app will restart it. Behaviour has changed. > > Suppose the guest has reboot_on_panic set; now the behaviour change is > even more visible - service will stop completely instead of being > interrupted for a bit while the guest reboots. 
Hmm, so this calls for a new command line argument to control behaviour, similar to what we do for disk werror, eg something like --onpanic "report|pause|stop|..." where

report - emit QEVENT_GUEST_PANICKED only
pause - emit QEVENT_GUEST_PANICKED and pause VM
stop - emit QEVENT_GUEST_PANICKED and quit VM

This would map fairly well into libvirt, where we already have config parameters for controlling what to do with a guest when it panics. Regards, Daniel
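The libvirt per-guest crash policy referred to here is the <on_crash> lifecycle element; a qemu --onpanic flag would presumably map onto actions like these (a sketch of the libvirt side, not from the original mail):

```xml
<!-- libvirt domain XML: action taken when the guest crashes.
     Other values include destroy, restart, preserve and
     coredump-destroy. -->
<on_crash>coredump-restart</on_crash>
```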
Re: [RESEND][PATCH 2/2 v3] deal with guest panicked event
On Thu, Mar 08, 2012 at 01:28:56PM +0200, Avi Kivity wrote: > On 03/08/2012 12:15 PM, Wen Congyang wrote: > > When the host knows the guest is panicked, it will set > > exit_reason to KVM_EXIT_GUEST_PANICKED. So if qemu receive > > this exit_reason, we can send a event to tell management > > application that the guest is panicked and set the guest > > status to RUN_STATE_PANICKED. > > > > Signed-off-by: Wen Congyang > > --- > > kvm-all.c|5 + > > monitor.c|3 +++ > > monitor.h|1 + > > qapi-schema.json |2 +- > > qmp.c|3 ++- > > vl.c |1 + > > 6 files changed, 13 insertions(+), 2 deletions(-) > > > > diff --git a/kvm-all.c b/kvm-all.c > > index 77eadf6..b3c9a83 100644 > > --- a/kvm-all.c > > +++ b/kvm-all.c > > @@ -1290,6 +1290,11 @@ int kvm_cpu_exec(CPUState *env) > > (uint64_t)run->hw.hardware_exit_reason); > > ret = -1; > > break; > > +case KVM_EXIT_GUEST_PANICKED: > > +monitor_protocol_event(QEVENT_GUEST_PANICKED, NULL); > > +vm_stop(RUN_STATE_PANICKED); > > +ret = -1; > > +break; > > > > If the management application is not aware of this event, then it will > never resume the guest, so it will appear hung. Even if the mgmt app doesn't know about the QEVENT_GUEST_PANICKED, it should still see a QEVENT_STOP event emitted by vm_stop() surely ? So it will know the guest CPUs have been stopped, even if it isn't aware of the reason why, which seems fine to me. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Use getaddrinfo for migration
On Fri, Mar 02, 2012 at 02:25:36PM +0400, Michael Tokarev wrote: > Not a reply to the patch but a general observation. > > I noticed that the tcp migration uses gethostname > (or getaddrinfo after this patch) from the main > thread - is it really the way to go? Note that > DNS query which is done may block for a large amount > of time. Is it really safe in this context? Should > it resolve the name in a separate thread, allowing > guest to run while it is doing that? > > This question is important for me because right now > I'm evaluating a network-connected block device driver > which should do failover, so it will have to resolve > alternative name(s) at runtime (especially since list > of available targets is dynamic). > > From one point, _usually_, the delay there is very > small since it is unlikely you'll do migration or > failover overseas, so most likely you'll have the > answer from DNS handy. But from another point, if > the DNS is malfunctioning right at that time (eg, > one of the two DNS resolvers is being rebooted), > the delay even from local DNS may be noticeable. Yes, I think you are correct - QEMU should take care to ensure that DNS resolution can not block the QEMU event loop thread. There is the glibc extension (getaddrinfo_a) which does async DNS resolution, but for the sake of portability it is probably better to use a thread to do it. Regards, Daniel
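The thread-based approach can be sketched as follows (illustrative only, and in Python rather than QEMU's C; a real event loop would poll the queue instead of blocking on it):

```python
import queue
import socket
import threading

def resolve_in_thread(host, port, timeout=5.0):
    """Resolve host:port without blocking the calling (event-loop) thread.

    The blocking getaddrinfo() runs in a worker thread; the caller collects
    the result from a queue. Here we block on the queue for brevity."""
    result_q = queue.Queue()

    def worker():
        try:
            result_q.put(socket.getaddrinfo(host, port,
                                            type=socket.SOCK_STREAM))
        except OSError as exc:
            result_q.put(exc)  # hand errors back to the caller, too

    threading.Thread(target=worker, daemon=True).start()
    # A real event loop would poll result_q instead of blocking here.
    return result_q.get(timeout=timeout)

addrs = resolve_in_thread("localhost", 80)
```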
Re: [PATCH] kvm: notify host when guest paniced
On Wed, Feb 29, 2012 at 12:05:32PM +0200, Avi Kivity wrote: > On 02/29/2012 11:58 AM, Daniel P. Berrange wrote: > > > > > > How about using a virtio-serial channel for this? You can transfer any > > > amount of information (including the dump itself). > > > > When the guest OS has crashed, any dumps will be done from the host > > OS using libvirt's core dump mechanism. The guest OS isn't involved > > and is likely too dead to be of any use anyway. Likewise it is > > quite probably too dead to work a virtio-serial channel or any > > similarly complex device. We're really just after the simplest > > possible notification that the guest kernel has paniced. > > If it's alive enough to panic, it's alive enough to kexec its kdump > kernel. After that it can do anything. > > Guest-internal dumps are more useful IMO that host-initiated dumps. In > a cloud, the host-initiated dump is left on the host, outside the reach > of the guest admin, outside the guest image where all the symbols are, > and sometimes not even on the same host if a live migration occurred. > It's more useful in small setups, or if the problem is in the > hypervisor, not the guest. I don't think guest vs host dumps should be considered mutually exclusive, they both have pluses+minuses. Configuring kexec+kdump requires non-negligible guest admin configuration work before it's usable, and this work is guest OS specific, if it is possible at all. A permanent panic notifier that's built in the kernel by default requires zero guest admin config, and can allow host admin to automate collection of dumps across all their hosts/guests. The KVM hypercall notification is fairly trivially ported to any OS kernel, by comparison with a full virtio + virtio-serial impl. 
Regards, Daniel
Re: [PATCH] kvm: notify host when guest paniced
On Wed, Feb 29, 2012 at 11:49:58AM +0200, Avi Kivity wrote: > On 02/29/2012 03:29 AM, Wen Congyang wrote: > > At 02/28/2012 07:23 PM, Avi Kivity Wrote: > > > On 02/27/2012 05:01 AM, Wen Congyang wrote: > > >> We can know the guest is paniced when the guest runs on xen. > > >> But we do not have such feature on kvm. This patch implemnts > > >> this feature, and the implementation is the same as xen: > > >> register panic notifier, and call hypercall when the guest > > >> is paniced. > > > > > > What's the motivation for this? "Xen does this" is insufficient. > > > > Another purpose is: management app(for example: libvirt) can do auto > > dump when the guest is crashed. If management app does not do auto > > dump, the guest's user can do dump by hand if he sees the guest is > > paniced. > > > > I am thinking about another status: dumping. This status tells > > the guest's user that the guest is paniced, and the OS's dump function > > is working. > > > > These two status can tell the guest's user whether the guest is pancied, > > and what should he do if the guest is paniced. > > > > How about using a virtio-serial channel for this? You can transfer any > amount of information (including the dump itself). When the guest OS has crashed, any dumps will be done from the host OS using libvirt's core dump mechanism. The guest OS isn't involved and is likely too dead to be of any use anyway. Likewise it is quite probably too dead to work a virtio-serial channel or any similarly complex device. We're really just after the simplest possible notification that the guest kernel has paniced. 
Regards, Daniel
Re: [libvirt] QEMU applying for Google Summer of Code 2012
On Fri, Feb 10, 2012 at 10:30:24AM +, Stefan Hajnoczi wrote: > This year's Google Summer of Code has been announced: > > http://www.google-melange.com/gsoc/events/google/gsoc2012 > > For those who haven't heard of GSoC before, it funds university > students to work on open source projects during the summer. > Organizations, such as QEMU, can participate to attract students who > will tackle projects for 12 weeks this summer. The GSoC program has > been very successful because it gives students real open source > experience and organizations can grow their development community. > > QEMU has participated for several years and I would like to organize > our participation this year. Luiz was QEMU organization administrator > last year and contacted me because he will not have time this year. I > will prepare the application form for QEMU so that we will be > considered for 2012. > > Umbrella organization > - > Like last year, we can provide a home for KVM kernel module and > libvirt projects too if those organizations prefer not to apply to > GSoC themselves. Please let us know so we can work together! To maximise the spirit of collaboration between libvirt & QEMU/KVM communities I think it would make sense for us to work together under the same GSoC Umbrella organization. Regards, Daniel
Re: [Qemu-devel] qemu-kvm upstreaming: Do we need -no-kvm-pit and -no-kvm-pit-reinjection semantics?
On Fri, Jan 20, 2012 at 02:02:03PM +0100, Jan Kiszka wrote: > On 2012-01-20 13:54, Daniel P. Berrange wrote: > > On Fri, Jan 20, 2012 at 01:51:20PM +0100, Jan Kiszka wrote: > >> On 2012-01-20 13:42, Daniel P. Berrange wrote: > >>> On Fri, Jan 20, 2012 at 01:00:06PM +0100, Jan Kiszka wrote: > >>>> On 2012-01-20 12:45, Daniel P. Berrange wrote: > >>>>> On Fri, Jan 20, 2012 at 12:13:48PM +0100, Jan Kiszka wrote: > >>>>>> On 2012-01-20 11:25, Daniel P. Berrange wrote: > >>>>>>> On Fri, Jan 20, 2012 at 11:22:27AM +0100, Jan Kiszka wrote: > >>>>>>>> On 2012-01-20 11:14, Marcelo Tosatti wrote: > >>>>>>>>> On Thu, Jan 19, 2012 at 07:01:44PM +0100, Jan Kiszka wrote: > >>>>>>>>>> On 2012-01-19 18:53, Marcelo Tosatti wrote: > >>>>>>>>>>>> What problems does it cause, and in which scenarios? Can't they > >>>>>>>>>>>> be > >>>>>>>>>>>> fixed? > >>>>>>>>>>> > >>>>>>>>>>> If the guest compensates for lost ticks, and KVM reinjects them, > >>>>>>>>>>> guest > >>>>>>>>>>> time advances faster then it should, to the extent where NTP > >>>>>>>>>>> fails to > >>>>>>>>>>> correct it. This is the case with RHEL4. > >>>>>>>>>>> > >>>>>>>>>>> But for example v2.4 kernel (or Windows with non-acpi HAL) do not > >>>>>>>>>>> compensate. In that case you want KVM to reinject. > >>>>>>>>>>> > >>>>>>>>>>> I don't know of any other way to fix this. > >>>>>>>>>> > >>>>>>>>>> OK, i see. The old unsolved problem of guessing what is being > >>>>>>>>>> executed. > >>>>>>>>>> > >>>>>>>>>> Then the next question is how and where to control this. > >>>>>>>>>> Conceptually, > >>>>>>>>>> there should rather be a global switch say "compensate for lost > >>>>>>>>>> ticks of > >>>>>>>>>> periodic timers: yes/no" - instead of a per-timer knob. Didn't we > >>>>>>>>>> discussed something like this before? > >>>>>>>>> > >>>>>>>>> I don't see the advantage of a global control versus per device > >>>>>>>>> control (in fact it lowers flexibility). > >>>>>>>> > >>>>>>>> Usability. 
Users should not have to care about individual tick-based > >>>>>>>> clocks. They care about "my OS requires lost ticks compensation, yes > >>>>>>>> or no". > >>>>>>> > >>>>>>> FYI, at the libvirt level we model policy against individual timers, > >>>>>>> for > >>>>>>> example: > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>> > >>>>>> Are the various modes of tickpolicy fully specified somewhere? > >>>>> > >>>>> There are some (not all that great) docs here: > >>>>> > >>>>> http://libvirt.org/formatdomain.html#elementsTime > >>>>> > >>>>> The meaning of the 4 policies are: > >>>>> > >>>>> delay: continue to deliver at normal rate > >>>> > >>>> What does this mean? The timer stops ticking until the guest accepts its > >>>> ticks again? > >>> > >>> It means that the hypervisor will not attempt to do any compensation, > >>> so the guest will see delays in its ticks being delivered & gradually > >>> drift over time. > >> > >> Still, is the logic as I described? Or what is the difference to "discard". > > > > With 'discard', the delayed tick will be thrown away. In 'delay', the > > delayed tick will still be injected to the guest, possibly well after > > the intended injection time though, and there will be no attempt to > > compensate by speeding up delivery of later ticks. > > OK, let's see if I got it: > > delay - all lost ticks are replayed in a row once the guest accepts > them again > catchup - lost ticks are gradually replayed at a higher frequency than > the original tick > merge - at most one tick is replayed once the guest accepts it again > discard - no lost ticks compensation Yes, I think that is a good understanding. 
Daniel
Re: [Qemu-devel] qemu-kvm upstreaming: Do we need -no-kvm-pit and -no-kvm-pit-reinjection semantics?
On Fri, Jan 20, 2012 at 01:51:20PM +0100, Jan Kiszka wrote: > On 2012-01-20 13:42, Daniel P. Berrange wrote: > > On Fri, Jan 20, 2012 at 01:00:06PM +0100, Jan Kiszka wrote: > >> On 2012-01-20 12:45, Daniel P. Berrange wrote: > >>> On Fri, Jan 20, 2012 at 12:13:48PM +0100, Jan Kiszka wrote: > >>>> On 2012-01-20 11:25, Daniel P. Berrange wrote: > >>>>> On Fri, Jan 20, 2012 at 11:22:27AM +0100, Jan Kiszka wrote: > >>>>>> On 2012-01-20 11:14, Marcelo Tosatti wrote: > >>>>>>> On Thu, Jan 19, 2012 at 07:01:44PM +0100, Jan Kiszka wrote: > >>>>>>>> On 2012-01-19 18:53, Marcelo Tosatti wrote: > >>>>>>>>>> What problems does it cause, and in which scenarios? Can't they be > >>>>>>>>>> fixed? > >>>>>>>>> > >>>>>>>>> If the guest compensates for lost ticks, and KVM reinjects them, > >>>>>>>>> guest > >>>>>>>>> time advances faster then it should, to the extent where NTP fails > >>>>>>>>> to > >>>>>>>>> correct it. This is the case with RHEL4. > >>>>>>>>> > >>>>>>>>> But for example v2.4 kernel (or Windows with non-acpi HAL) do not > >>>>>>>>> compensate. In that case you want KVM to reinject. > >>>>>>>>> > >>>>>>>>> I don't know of any other way to fix this. > >>>>>>>> > >>>>>>>> OK, i see. The old unsolved problem of guessing what is being > >>>>>>>> executed. > >>>>>>>> > >>>>>>>> Then the next question is how and where to control this. > >>>>>>>> Conceptually, > >>>>>>>> there should rather be a global switch say "compensate for lost > >>>>>>>> ticks of > >>>>>>>> periodic timers: yes/no" - instead of a per-timer knob. Didn't we > >>>>>>>> discussed something like this before? > >>>>>>> > >>>>>>> I don't see the advantage of a global control versus per device > >>>>>>> control (in fact it lowers flexibility). > >>>>>> > >>>>>> Usability. Users should not have to care about individual tick-based > >>>>>> clocks. They care about "my OS requires lost ticks compensation, yes > >>>>>> or no". 
> >>>>> > >>>>> FYI, at the libvirt level we model policy against individual timers, for > >>>>> example: > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>> > >>>> Are the various modes of tickpolicy fully specified somewhere? > >>> > >>> There are some (not all that great) docs here: > >>> > >>> http://libvirt.org/formatdomain.html#elementsTime > >>> > >>> The meaning of the 4 policies are: > >>> > >>> delay: continue to deliver at normal rate > >> > >> What does this mean? The timer stops ticking until the guest accepts its > >> ticks again? > > > > It means that the hypervisor will not attempt to do any compensation, > > so the guest will see delays in its ticks being delivered & gradually > > drift over time. > > Still, is the logic as I described? Or what is the difference to "discard". With 'discard', the delayed tick will be thrown away. In 'delay', the delayed tick will still be injected to the guest, possibly well after the intended injection time though, and there will be no attempt to compensate by speeding up delivery of later ticks. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] qemu-kvm upstreaming: Do we need -no-kvm-pit and -no-kvm-pit-reinjection semantics?
On Fri, Jan 20, 2012 at 01:00:06PM +0100, Jan Kiszka wrote: > On 2012-01-20 12:45, Daniel P. Berrange wrote: > > On Fri, Jan 20, 2012 at 12:13:48PM +0100, Jan Kiszka wrote: > >> On 2012-01-20 11:25, Daniel P. Berrange wrote: > >>> On Fri, Jan 20, 2012 at 11:22:27AM +0100, Jan Kiszka wrote: > >>>> On 2012-01-20 11:14, Marcelo Tosatti wrote: > >>>>> On Thu, Jan 19, 2012 at 07:01:44PM +0100, Jan Kiszka wrote: > >>>>>> On 2012-01-19 18:53, Marcelo Tosatti wrote: > >>>>>>>> What problems does it cause, and in which scenarios? Can't they be > >>>>>>>> fixed? > >>>>>>> > >>>>>>> If the guest compensates for lost ticks, and KVM reinjects them, guest > >>>>>>> time advances faster then it should, to the extent where NTP fails to > >>>>>>> correct it. This is the case with RHEL4. > >>>>>>> > >>>>>>> But for example v2.4 kernel (or Windows with non-acpi HAL) do not > >>>>>>> compensate. In that case you want KVM to reinject. > >>>>>>> > >>>>>>> I don't know of any other way to fix this. > >>>>>> > >>>>>> OK, i see. The old unsolved problem of guessing what is being executed. > >>>>>> > >>>>>> Then the next question is how and where to control this. Conceptually, > >>>>>> there should rather be a global switch say "compensate for lost ticks > >>>>>> of > >>>>>> periodic timers: yes/no" - instead of a per-timer knob. Didn't we > >>>>>> discussed something like this before? > >>>>> > >>>>> I don't see the advantage of a global control versus per device > >>>>> control (in fact it lowers flexibility). > >>>> > >>>> Usability. Users should not have to care about individual tick-based > >>>> clocks. They care about "my OS requires lost ticks compensation, yes or > >>>> no". > >>> > >>> FYI, at the libvirt level we model policy against individual timers, for > >>> example: > >>> > >>> > >>> > >>> > >>> > >> > >> Are the various modes of tickpolicy fully specified somewhere? 
> > > > There are some (not all that great) docs here: > > > > http://libvirt.org/formatdomain.html#elementsTime > > > > The meaning of the 4 policies are: > > > > delay: continue to deliver at normal rate > > What does this mean? The timer stops ticking until the guest accepts its > ticks again? It means that the hypervisor will not attempt to do any compensation, so the guest will see delays in its ticks being delivered & gradually drift over time. > > catchup: deliver at higher rate to catchup > > merge: ticks merged into 1 single tick > > discard: all missed ticks are discarded > > But those interpretations aren't stated in the docs. That makes it hard > to map them on individual hypervisors - or model proper new hypervisor > interfaces accordingly. That's not a real problem; now that I notice they are missing from the docs, I can just add them in. Daniel
Re: [Qemu-devel] qemu-kvm upstreaming: Do we need -no-kvm-pit and -no-kvm-pit-reinjection semantics?
On Fri, Jan 20, 2012 at 12:13:48PM +0100, Jan Kiszka wrote: > On 2012-01-20 11:25, Daniel P. Berrange wrote: > > On Fri, Jan 20, 2012 at 11:22:27AM +0100, Jan Kiszka wrote: > >> On 2012-01-20 11:14, Marcelo Tosatti wrote: > >>> On Thu, Jan 19, 2012 at 07:01:44PM +0100, Jan Kiszka wrote: > >>>> On 2012-01-19 18:53, Marcelo Tosatti wrote: > >>>>>> What problems does it cause, and in which scenarios? Can't they be > >>>>>> fixed? > >>>>> > >>>>> If the guest compensates for lost ticks, and KVM reinjects them, guest > >>>>> time advances faster then it should, to the extent where NTP fails to > >>>>> correct it. This is the case with RHEL4. > >>>>> > >>>>> But for example v2.4 kernel (or Windows with non-acpi HAL) do not > >>>>> compensate. In that case you want KVM to reinject. > >>>>> > >>>>> I don't know of any other way to fix this. > >>>> > >>>> OK, i see. The old unsolved problem of guessing what is being executed. > >>>> > >>>> Then the next question is how and where to control this. Conceptually, > >>>> there should rather be a global switch say "compensate for lost ticks of > >>>> periodic timers: yes/no" - instead of a per-timer knob. Didn't we > >>>> discussed something like this before? > >>> > >>> I don't see the advantage of a global control versus per device > >>> control (in fact it lowers flexibility). > >> > >> Usability. Users should not have to care about individual tick-based > >> clocks. They care about "my OS requires lost ticks compensation, yes or > >> no". > > > > FYI, at the libvirt level we model policy against individual timers, for > > example: > > > > > > > > > > > > Are the various modes of tickpolicy fully specified somewhere? 
There are some (not all that great) docs here: http://libvirt.org/formatdomain.html#elementsTime The meanings of the 4 policies are:

delay: continue to deliver at normal rate
catchup: deliver at higher rate to catchup
merge: ticks merged into 1 single tick
discard: all missed ticks are discarded

The original design rationale was here, though beware that some things changed between this design & the actual implementation libvirt has: https://www.redhat.com/archives/libvir-list/2010-March/msg00304.html Regards, Daniel
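For reference, this is roughly how those policies are expressed in a libvirt domain definition (a sketch following the formatdomain docs linked above; which timer names accept which policies depends on the hypervisor and libvirt version):

```xml
<clock offset='utc'>
  <!-- deliver missed ticks later, at the normal rate -->
  <timer name='pit' tickpolicy='delay'/>
  <!-- deliver at a higher rate until the guest catches up -->
  <timer name='rtc' tickpolicy='catchup'/>
  <!-- throw missed ticks away -->
  <timer name='hpet' tickpolicy='discard'/>
</clock>
```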
Re: [Qemu-devel] qemu-kvm upstreaming: Do we need -no-kvm-pit and -no-kvm-pit-reinjection semantics?
On Fri, Jan 20, 2012 at 11:22:27AM +0100, Jan Kiszka wrote: > On 2012-01-20 11:14, Marcelo Tosatti wrote: > > On Thu, Jan 19, 2012 at 07:01:44PM +0100, Jan Kiszka wrote: > >> On 2012-01-19 18:53, Marcelo Tosatti wrote: > What problems does it cause, and in which scenarios? Can't they be > fixed? > >>> > >>> If the guest compensates for lost ticks, and KVM reinjects them, guest > >>> time advances faster then it should, to the extent where NTP fails to > >>> correct it. This is the case with RHEL4. > >>> > >>> But for example v2.4 kernel (or Windows with non-acpi HAL) do not > >>> compensate. In that case you want KVM to reinject. > >>> > >>> I don't know of any other way to fix this. > >> > >> OK, i see. The old unsolved problem of guessing what is being executed. > >> > >> Then the next question is how and where to control this. Conceptually, > >> there should rather be a global switch say "compensate for lost ticks of > >> periodic timers: yes/no" - instead of a per-timer knob. Didn't we > >> discussed something like this before? > > > > I don't see the advantage of a global control versus per device > > control (in fact it lowers flexibility). > > Usability. Users should not have to care about individual tick-based > clocks. They care about "my OS requires lost ticks compensation, yes or no". FYI, at the libvirt level we model policy against individual timers, for example: Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SPEC-file for making RPMs (with rpmbuild)
On Fri, Jan 06, 2012 at 11:11:21AM +0100, Guido Winkelmann wrote: > Hi, > > Is there a spec-file somewhere for creating RPMs from the newest qemu-kvm > release? The current Fedora RPM specfiles are always a good bet to start off with: http://pkgs.fedoraproject.org/gitweb/?p=qemu.git;a=blob;f=qemu.spec;hb=HEAD By default these will build all QEMU targets, and a dedicated qemu-kvm binary too. There is a flag to restrict it to x86 only for cases where you don't want all archs. Regards, Daniel
Re: 5x slower guest disk performance with virtio disk
On Thu, Dec 15, 2011 at 07:16:22PM +0200, Sasha Levin wrote: > On Thu, 2011-12-15 at 11:55 -0500, Brian J. Murrell wrote: > > So, about 2/3 of host speed now -- which is much better. Is 2/3 about > > normal or should I be looking for more? > > aio=native > > Thats the qemu setting, I'm not sure where libvirt hides that. ... Regards, Daniel
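The libvirt equivalent of QEMU's aio=native is the io attribute on the disk's driver element — roughly as below (check the formatdomain docs for your libvirt version; io='native' is generally combined with cache='none'):

```xml
<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source file='/var/lib/libvirt/images/guest.img'/>
  <target dev='vda' bus='virtio'/>
</disk>
```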
Re: [PATCH] kvm tools: Allow the user to pass a FD to use as a TAP device
On Wed, Dec 07, 2011 at 06:28:12PM +0200, Pekka Enberg wrote: > On Wed, Dec 7, 2011 at 11:37 AM, Sasha Levin wrote: > > This allows users to pass a pre-configured fd to use for the network > > interface. > > > > For example: > > kvm run -n mode=tap,fd=3 3<>/dev/net/tap3 > > > > Cc: Daniel P. Berrange > > Cc: Osier Yang > > Signed-off-by: Sasha Levin > > Daniel, Osier, I assume this is useful for libvirt? Yes, this works. I don't know if kvmtool supports the VNET_HDR extension yet, but if it does, then we can make libvirt pass in a pre-opened FD for that too. Daniel
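The shell redirection in Sasha's example (`3<>/dev/net/tap3`) simply arranges for the device to already be open on fd 3 when the tool starts. The same idiom can be sketched in Python, with an ordinary file standing in for the TAP device and `cat` standing in for the kvmtool binary (illustration only, not kvmtool or libvirt code):

```python
import os
import subprocess
import tempfile

# A plain file stands in for the pre-configured /dev/net/tapN device.
tmp_fd, path = tempfile.mkstemp()
os.write(tmp_fd, b"pre-opened resource\n")
os.close(tmp_fd)

# Open it and pin it to fd 3, like the shell's "3<>/dev/net/tap3".
fd = os.open(path, os.O_RDONLY)
os.dup2(fd, 3)

# The child inherits fd 3 and is told to use it (cf. "mode=tap,fd=3").
# pass_fds keeps fd 3 open across the exec despite close_fds=True.
out = subprocess.run(["cat", "/dev/fd/3"], pass_fds=[3],
                     capture_output=True, text=True).stdout
print(out, end="")
os.remove(path)
```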
Re: [libvirt] (no subject)
On Wed, Dec 07, 2011 at 08:21:06AM +0200, Sasha Levin wrote: > On Tue, 2011-12-06 at 14:38 +0000, Daniel P. Berrange wrote: > > On Fri, Nov 11, 2011 at 07:56:58PM +0800, Osier Yang wrote: > > > * KVM tool manages the network completely itself (with DHCP support?), > > > no way to configure, except specify the modes (user|tap|none). I > > > have not test it yet, but it should need explicit script to setup > > > the network rules(e.g. NAT) for the guest access outside world. > > > Anyway, there is no way for libvirt to control the guest network. > > > > If KVM tool support TAP devices, can't be do whatever we like with > > that just by passing in a configured TAP device from libvir ? > > KVM tool currently creates and configures the TAP devices it uses, it > shouldn't be an issue to have it use a TAP fd passed to it either. > > How does libvirt do it? Create a /dev/tapX on it's own and pass the fd > to the hypervisor? Yes, libvirt opens a /dev/tap device (or a macvtap device for VEPA mode), adds it to the necessary bridge, and/or configures VEPA, etc and then passes the FD to the hypervisor, with an ARGV parameter to tell the HV which FD is being passed. > > > * console connection is implemented by setup ptys in libvirt, > > > stdout/stderr > > > of kvm tool process is redirected to the master pty, and libvirt > > > connects > > > to the slave pty. This works fine now, but it might be better if kvm > > > tool could provide more advanced console mechanisms. Just like QEMU > > > does? > > > > This sounds good enough for now. > > KVM tools does a redirection to a PTY, which at that point could be > redirected to anywhere the user wants. > > What features might be interesting to do on top of that?
I presume that Osier is just comparing with the features QEMU has available for chardev config, which include:

- PTYs
- UNIX sockets
- TCP sockets
- UDP sockets
- FIFO pipe
- Plain file (output only obviously, but useful for logging)

libvirt doesn't specifically need any of them, but it can support those options if they exist. > > > * Not much ways existed yet for external apps or user to query the guest > > > informations. But this might be changed soon per KVM tool grows up > > > quickly. > > > > What sort of guest info are you thinking about ? The most immediate > > pieces of info I can imagine we need are > > > > - Mapping between PIDs and vCPU threads > > - Current balloon driver value > > Those are pretty easily added using the IPC interface I've mentioned > above. For example, 'kvm balloon' and 'kvm stat' will return a lot of > info out of the balloon driver (including the memory stats VQ - which afaik we're probably the only ones who actually do that (but I might be wrong) :) Ok, that sounds sufficient for the balloon info. Regards, Daniel
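The console redirection discussed in this thread boils down to standard PTY plumbing: the guest-facing tool writes on the slave side of a pseudo-terminal, and whichever management process holds the master side reads the output. A minimal generic sketch of that mechanism (not kvmtool or libvirt code):

```python
import os
import tty

master, slave = os.openpty()
tty.setraw(slave)   # raw mode: disable output post-processing on the slave

# The guest-facing tool writes its console output to the slave end...
os.write(slave, b"kernel booting...\n")

# ...and the management side reads the same bytes from the master end.
data = os.read(master, 1024)
print(data.decode(), end="")
```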
Re: [libvirt] [PATCH 7/7] kvmtool: Implementation for kvm tool driver
On Fri, Nov 11, 2011 at 07:57:06PM +0800, Osier Yang wrote: > Basically, the drivers is implemented by using kvm tool binary > currently, (see ./kvm help for more info). > > Current implementation supports define/undefine, start/destroy/, > suspend/resume, connect to guest console via "virsh console", > and balloon memory with with "virsh setmem" (using ./kvm balloon > command). Also as it supports cgroup controllers "cpuacct", and > "memory", so some other commands like "schedinfo", "memtune" can > also work. Some other commands such as "domid", "domname", "dumpxml" > ,"autostart", etc. are supported, as the driver is designed > as a "stateful" driver, those APIs just need to talk with libvirtd > simply. > > As Native Linux KVM Tool is designed for both non-root and root users, > the driver is designed just like QEMU, supports two modes of the > connection: > > kvmtool:///system > kvmtool+unix:///system > > kvmtool:///session > kvmtool+unix:///session > > An example of the domain XML (all the XMLs supported currently are > listed): > > % virsh -c kvm:///system dumpxml kvm_test > > kvm_test > 88bf38f1-b6ab-cfa6-ab53-4b4c0993d894 > 524288 > 524288 > 1 > > hvm > /boot/bzImage > > > > destroy > restart > restart > > /usr/bin/kvmtool > > > > > > > > > > > > > > > --- > cfg.mk |1 + > daemon/Makefile.am |4 + > daemon/libvirtd.c|7 + > po/POTFILES.in |2 + > src/Makefile.am | 36 +- > src/kvmtool/kvmtool_conf.c | 130 ++ > src/kvmtool/kvmtool_conf.h | 66 + > src/kvmtool/kvmtool_driver.c | 3079 > ++ > src/kvmtool/kvmtool_driver.h | 29 + My main suggestion here would be to split up the kvmtool_driver.c file into 3 parts as we did with the QEMU driver. 
kvmtool_driver.c  -> Basic libvirt API glue
kvmtool_command.c -> ARGV generation
kvmtool_process.c -> KVMtool process start/stop/autostart/autodestroy

Regards, Daniel
Re: [libvirt] [PATCH 4/7] kvmtool: Add hook support for kvmtool domain
On Fri, Nov 11, 2011 at 07:57:03PM +0800, Osier Yang wrote: > Just like QEMU and LXC, kvm driver intends to support running hook > script before domain starting and after domain shutdown too. > --- > src/util/hooks.c | 11 ++- > src/util/hooks.h |8 > 2 files changed, 18 insertions(+), 1 deletions(-) > > diff --git a/src/util/hooks.c b/src/util/hooks.c > index 110a94b..765cb68 100644 > --- a/src/util/hooks.c > +++ b/src/util/hooks.c > @@ -52,12 +52,14 @@ VIR_ENUM_DECL(virHookDaemonOp) > VIR_ENUM_DECL(virHookSubop) > VIR_ENUM_DECL(virHookQemuOp) > VIR_ENUM_DECL(virHookLxcOp) > +VIR_ENUM_DECL(virHookKvmToolOp) > > VIR_ENUM_IMPL(virHookDriver, >VIR_HOOK_DRIVER_LAST, >"daemon", >"qemu", > - "lxc") > + "lxc", > + "kvmtool") > > VIR_ENUM_IMPL(virHookDaemonOp, VIR_HOOK_DAEMON_OP_LAST, >"start", > @@ -79,6 +81,10 @@ VIR_ENUM_IMPL(virHookLxcOp, VIR_HOOK_LXC_OP_LAST, >"start", >"stopped") > > +VIR_ENUM_IMPL(virHookKvmToolOp, VIR_HOOK_KVMTOOL_OP_LAST, > + "start", > + "stopped") > + > static int virHooksFound = -1; > > /** > @@ -230,6 +236,9 @@ virHookCall(int driver, const char *id, int op, int > sub_op, const char *extra, > case VIR_HOOK_DRIVER_LXC: > opstr = virHookLxcOpTypeToString(op); > break; > +case VIR_HOOK_DRIVER_KVMTOOL: > +opstr = virHookKvmToolOpTypeToString(op); > +break; > } > if (opstr == NULL) { > virHookReportError(VIR_ERR_INTERNAL_ERROR, > diff --git a/src/util/hooks.h b/src/util/hooks.h > index fd7411c..69081c4 100644 > --- a/src/util/hooks.h > +++ b/src/util/hooks.h > @@ -31,6 +31,7 @@ enum virHookDriverType { > VIR_HOOK_DRIVER_DAEMON = 0,/* Daemon related events */ > VIR_HOOK_DRIVER_QEMU, /* QEmu domains related events */ > VIR_HOOK_DRIVER_LXC, /* LXC domains related events */ > +VIR_HOOK_DRIVER_KVMTOOL, /* KVMTOOL domains related events */ > > VIR_HOOK_DRIVER_LAST, > }; > @@ -67,6 +68,13 @@ enum virHookLxcOpType { > VIR_HOOK_LXC_OP_LAST, > }; > > +enum virHookKvmToolOpType { > +VIR_HOOK_KVMTOOL_OP_START,/* domain is about to start */ > 
+VIR_HOOK_KVMTOOL_OP_STOPPED, /* domain has stopped */ > + > +VIR_HOOK_KVMTOOL_OP_LAST, > +}; > + > int virHookInitialize(void); > > int virHookPresent(int driver); Trivial, ACK Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [libvirt] [PATCH 3/7] kvmtool: Add new enums and error codes for the driver
On Fri, Nov 11, 2011 at 07:57:02PM +0800, Osier Yang wrote: > --- > include/libvirt/virterror.h |1 + > src/driver.h|1 + > src/util/virterror.c|3 +++ > 3 files changed, 5 insertions(+), 0 deletions(-) > > diff --git a/include/libvirt/virterror.h b/include/libvirt/virterror.h > index a8549b7..deda42d 100644 > --- a/include/libvirt/virterror.h > +++ b/include/libvirt/virterror.h > @@ -84,6 +84,7 @@ typedef enum { > VIR_FROM_LIBXL = 41, /* Error from libxenlight driver */ > VIR_FROM_LOCKING = 42, /* Error from lock manager */ > VIR_FROM_HYPERV = 43,/* Error from Hyper-V driver */ > +VIR_FROM_KVMTOOL = 44, /* Error from kvm tool driver */ > } virErrorDomain; > > > diff --git a/src/driver.h b/src/driver.h > index 4c14aaa..158a13c 100644 > --- a/src/driver.h > +++ b/src/driver.h > @@ -30,6 +30,7 @@ typedef enum { > VIR_DRV_VMWARE = 13, > VIR_DRV_LIBXL = 14, > VIR_DRV_HYPERV = 15, > +VIR_DRV_KVMTOOL = 16, > } virDrvNo; > > > diff --git a/src/util/virterror.c b/src/util/virterror.c > index 5006fa2..abb5b5a 100644 > --- a/src/util/virterror.c > +++ b/src/util/virterror.c > @@ -175,6 +175,9 @@ static const char *virErrorDomainName(virErrorDomain > domain) { > case VIR_FROM_HYPERV: > dom = "Hyper-V "; > break; > +case VIR_FROM_KVMTOOL: > +dom = "KVMTOOL "; > +break; > } > return(dom); > } Trivial, ACK Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [libvirt] [PATCH 2/7] kvmtool: Add documents
On Fri, Nov 11, 2011 at 07:57:01PM +0800, Osier Yang wrote: > The document is rather rough now, but at least contains an domain > config example of all the current supported XMLs, and tells how to > play with the driver. > --- > docs/drivers.html.in|1 + > docs/drvkvmtool.html.in | 87 > +++ > docs/index.html.in |3 ++ > docs/sitemap.html.in|4 ++ > src/README |3 +- > 5 files changed, 97 insertions(+), 1 deletions(-) > create mode 100644 docs/drvkvmtool.html.in > > diff --git a/docs/drivers.html.in b/docs/drivers.html.in > index 75038fc..249c137 100644 > --- a/docs/drivers.html.in > +++ b/docs/drivers.html.in > @@ -29,6 +29,7 @@ >VMware > Workstation/Player >Xen >Microsoft > Hyper-V > + Native Linux KVM > Tool > > > Storage drivers > diff --git a/docs/drvkvmtool.html.in b/docs/drvkvmtool.html.in > new file mode 100644 > index 000..1b6acdf > --- /dev/null > +++ b/docs/drvkvmtool.html.in > @@ -0,0 +1,87 @@ > + > + > +KVM tool driver > + > + > + > + > + The libvirt KVMTOOL driver manages hypervisor Native Linux KVM Tool, > + it's implemented by using command line of kvm tool binary. > + > + > +Project Links > + > + > + > +The Native Linux > KVM Tool Native > +Linux KVM Tool > + > + > + > +Connections to the KVMTOOL driver > + > + The libvirt KVMTOOL driver is a multi-instance driver, providing a > single > + system wide privileged driver (the "system" instance), and per-user > + unprivileged drivers (the "session" instance). The URI driver protocol > + is "kvmtool". Some example conection URIs for the libvirt driver are: > + > + > + > + kvmtool:///session (local access to per-user > instance) > + kvmtool+unix:///session (local access to per-user > instance) > + > + kvmtool:///system (local access to system > instance) > + kvmtool+unix:///system (local access to system > instance) > + > + > + cgroups controllers "cpuacct", and "memory" are supported currently. 
> + > + > + Example config > + > + > +As mentioned in a later patch, we should just use type='kvm' here still Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [libvirt] [PATCH 5/7] kvmtool: Add new domain type
On Fri, Nov 11, 2011 at 07:57:04PM +0800, Osier Yang wrote: > It's named as "kvmtool". > --- > src/conf/domain_conf.c |4 +++- > src/conf/domain_conf.h |1 + > 2 files changed, 4 insertions(+), 1 deletions(-) > > diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c > index 58f4d0f..55121d8 100644 > --- a/src/conf/domain_conf.c > +++ b/src/conf/domain_conf.c > @@ -91,7 +91,8 @@ VIR_ENUM_IMPL(virDomainVirt, VIR_DOMAIN_VIRT_LAST, >"hyperv", >"vbox", >"one", > - "phyp") > + "phyp", > + "kvmtool") > > VIR_ENUM_IMPL(virDomainBoot, VIR_DOMAIN_BOOT_LAST, >"fd", > @@ -4018,6 +4019,7 @@ virDomainChrDefParseXML(virCapsPtr caps, > if (type == NULL) { > def->source.type = VIR_DOMAIN_CHR_TYPE_PTY; > } else if ((def->source.type = virDomainChrTypeFromString(type)) < 0) { > +VIR_WARN("type = %s", type); > virDomainReportError(VIR_ERR_XML_ERROR, > _("unknown type presented to host for character > device: %s"), > type); > diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h > index a3cb834..001bc46 100644 > --- a/src/conf/domain_conf.h > +++ b/src/conf/domain_conf.h > @@ -59,6 +59,7 @@ enum virDomainVirtType { > VIR_DOMAIN_VIRT_VBOX, > VIR_DOMAIN_VIRT_ONE, > VIR_DOMAIN_VIRT_PHYP, > +VIR_DOMAIN_VIRT_KVMTOOL, > > VIR_DOMAIN_VIRT_LAST, > }; IMHO this patch is not required. The domain type is referring to the hypervisor used for the domain, which is still 'kvm'. What is different here is just the userspace device model. If you look at the 3 different Xen user spaces we support, all of them use type='xen' still. So just use type='kvm' here for kvmtool.
Regards, Daniel
Re: [libvirt] [PATCH] kvm tools: Introduce an ENV variable for the state dir
On Fri, Nov 11, 2011 at 07:57:00PM +0800, Osier Yang wrote: > Which is named as "KVMTOOL_STATE_DIR", so that the user can > configure the path of state directly as he wants. > --- > tools/kvm/main.c |7 ++- > 1 files changed, 6 insertions(+), 1 deletions(-) > > diff --git a/tools/kvm/main.c b/tools/kvm/main.c > index 05bc82c..37b2b1d 100644 > --- a/tools/kvm/main.c > +++ b/tools/kvm/main.c > @@ -13,7 +13,12 @@ static int handle_kvm_command(int argc, char **argv) > > int main(int argc, char *argv[]) > { > - kvm__set_dir("%s/%s", HOME_DIR, KVM_PID_FILE_PATH); > + char *state_dir = getenv("KVMTOOL_STATE_DIR"); > + > + if (state_dir) > + kvm__set_dir("%s", state_dir); > + else > + kvm__set_dir("%s/%s", HOME_DIR, KVM_PID_FILE_PATH); > > return handle_kvm_command(argc - 1, &argv[1]); > } As per my comments in the first patch, I don't think this is critical for libvirt's needs. We should just honour the default location that the KVM tool uses, rather than forcing a libvirt specific location. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [libvirt] (no subject)
On Fri, Nov 11, 2011 at 07:56:58PM +0800, Osier Yang wrote: > Hi, all > > This is a basic implementation of libvirt Native Linux KVM > Tool driver. Note that this is just made with my own interest > and spare time, it's not an endorsement/effort by Red Hat, > and it isn't supported by Red Hat officially. > > Basically, the driver is designed as *stateful*, as KVM tool > doesn't maintain any info about the guest except a socket which > for its own IPC. And it's implemented by using KVM tool binary, > which is name "kvm" currently, along with cgroup controllers > "cpuacct", and "memory" support. And as one of KVM tool's > pricinple is to allow both the non-root and root user to play with. > The driver is designed to support root and non-root too, just > like QEMU does. Example of the connection URI: > > virsh -c kvmtool:///system > virsh -c kvmtool:///session > virsh -c kvmtool+unix:///system > virsh -c kvmtool+unix:///session > > The implementation can support more or less than 15 virsh commands > currently, including basic domain cycle operations (define/undefine, > start/destroy, suspend/resume, console, setmem, schedinfo, dumpxml, > ,autostart, dominfo, etc.) > > About the domain configuration: > * "kernel": must be specified as KVM tool only support boots > from the kernel currently (no integration with BIOS app yet). > > * "disk": only virtio bus is supported, and device type must be 'disk'. > > * "serial/console": only one console is supported, of type serial or > virtio (can extend to support multiple console as long as kvm tool > supports, libvirt already supported mutiple console, see upstream > commit 0873b688c). > > * "p9fs": only support specifying the source dir, and mount tag, only > type of 'mount' is supported. > > * "memballoon": only virtio is supported, and there is no way > to config the addr. > > * Multiple "disk" and "p9fs" is supported. > > * Graphics and network are not supported, will explain below. 
> > Please see "[PATCH 7/8]" for an example of the domain config. (which > contains all the XMLs supported by current implementation). > > The problems of Native Linux KVM Tool from libvirt p.o.v: > > * Some destros package "qemu-kvm" as "kvm", also "kvm" is a long > established name for "KVM" itself, so naming the project as > "kvm" might be not a good idea. I assume it will be named > as "kvmtool" in this implementation, never mind this if you > don't like that, it can be updated easily. :-) Yeah, naming the binary 'kvm' is just madness. I'd strongly recommend using 'kvmtool' as the binary name to avoid confusion with existing 'kvm' binaries based on QEMU. > * It still doesn't have an official package yet, even no "make install". > means we have no way to check the dependancy and do the checking > when 'configure'. I assume it will be installed as "/usr/bin/kvmtool" > in this implementation. This is the main reason which can prevents > upstream libvirt accepting the patches I guess. Ok, not really a problem - we do similar for the regular QEMU driver. > * Lacks of options for user's configuration, such as "-vnc", there > is no option for user to configure the properties for the "vnc", > such as the port. It hides things, doesn't provide ways to query > the properties too, this causes problems for libvirt to add the > vnc support, as vnc clients such as virt-manager, virt-viewer, > have no way to connect the guest. Even vncviewer can't. Being able to specify a VNC port of libvirt's choosing is pretty much mandatory to be able to support that.In addition being able to specify the bind address is important to be able to control security. eg to only bind to 127.0.0.1, or only to certain NICs in a multi-NIC host. > * KVM tool manages the network completely itself (with DHCP support?), > no way to configure, except specify the modes (user|tap|none). I > have not test it yet, but it should need explicit script to setup > the network rules(e.g. 
NAT) for the guest access outside world. > Anyway, there is no way for libvirt to control the guest network. If KVM tool supports TAP devices, can't we do whatever we like with that just by passing in a configured TAP device from libvirt? > * There is a gap about the domain status between KVM tool and libvirt, > it's caused by KVM tool unlink()s the guest socket when user exits > from console (both text and graphic), but libvirt still think the > guest is running. Being able to reliably detect shutdown/exit of the KVM tool is a very important task, and we can't rely on waitpid/SIG_CHLD because we want to daemonize all instances wrt libvirtd. In the QEMU driver we keep open a socket to the monitor, and when we see an I/O error / POLLHUP on the socket we know that QEMU has quit. What is this guest socket used for ? Could libvirt keep open a connection to it ? One other option would
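The monitor-socket technique mentioned above — keep a connection open to the hypervisor process and treat POLLHUP as process death — can be sketched generically. In this toy model (not the actual libvirt QEMU driver code) the "hypervisor" is a forked child that simply exits, dropping its end of a socketpair:

```python
import os
import select
import socket

parent_end, child_end = socket.socketpair()

pid = os.fork()
if pid == 0:                 # child: stands in for the hypervisor process
    parent_end.close()
    os._exit(0)              # exiting closes child_end; the peer sees POLLHUP

child_end.close()            # parent keeps only its own end open

poller = select.poll()
poller.register(parent_end.fileno(), select.POLLIN)
events = poller.poll(5000)   # POLLHUP/POLLERR are reported regardless of mask

# On Linux, the peer closing a socketpair raises POLLIN (EOF) and POLLHUP.
died = any(ev & (select.POLLHUP | select.POLLIN) for _, ev in events)
print("hypervisor exited:", died)
os.waitpid(pid, 0)
```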
Re: [Qemu-devel] KVM call minutes for November 29
On Wed, Nov 30, 2011 at 11:22:37AM +0200, Alon Levy wrote: > On Tue, Nov 29, 2011 at 04:59:51PM -0600, Anthony Liguori wrote: > > On 11/29/2011 10:59 AM, Avi Kivity wrote: > > >On 11/29/2011 05:51 PM, Juan Quintela wrote: > > >>How to do high level stuff? > > >>- python? > > >> > > > > > >One of the disadvantages of the various scripting languages is the lack > > >of static type checking, which makes it harder to do full sweeps of the > > >source for API changes, relying on the compiler to catch type (or other) > > >errors. > > > > This is less interesting to me (figuring out the perfectest language to > > use). > > > > I think what's more interesting is the practical execution of > > something like this. Just assuming we used python (since that's > > what I know best), I think we could do something like this: > > > > 1) We could write a binding layer to expose the QMP interface as a > > python module. This would be very little binding code but would > > bring a bunch of functionality to python bits. > > If going this route, I would propose to use gobject-introspection [1] > instead of directly binding to python. You should be able to get > multiple languages support this way, including python. I think it > requires using glib 3.0, but I haven't tested it myself (yet). Maybe > someone more knowledgable can shoot it down. > > [1] http://live.gnome.org/GObjectIntrospection/ > > Actually this might make sense for the whole of QEMU. I think for a > defined interface like QMP implementing the interface directly in python > makes more sense. But having qemu itself GObject'ified and scriptable > is cool. It would also lend it self to 4) without going through 2), but > also make 2) possible (with any language, not just python). I think taking advantage of GObject introspection is fine idea - I certainly don't want to manually create python (or any other language) bindings for any C code ever again. 
GObject + introspection takes away all the burden of supporting access to C code from non-C languages. Given that QEMU has already adopted GLib as mandatory infrastructure, going down the GObject route seems like a very natural fit/direction to take. If people like the idea of a higher level language for QEMU, but are concerned about performance / overhead of embedding a scripting language in QEMU, then GObject introspection opens the possibility of writing in Vala, which is a higher level language which compiles straight down to machine code like C does. Regards, Daniel
Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
On Mon, Nov 14, 2011 at 01:56:36PM +0200, Michael S. Tsirkin wrote: > On Mon, Nov 14, 2011 at 11:37:27AM +0000, Daniel P. Berrange wrote: > > On Mon, Nov 14, 2011 at 01:34:15PM +0200, Michael S. Tsirkin wrote: > > > On Mon, Nov 14, 2011 at 11:29:18AM +, Daniel P. Berrange wrote: > > > > On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote: > > > > > Am 14.11.2011 12:08, schrieb Daniel P. Berrange: > > > > > > On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote: > > > > > >> On Mon, Nov 14, 2011 at 10:16:10AM +, Daniel P. Berrange wrote: > > > > > >>> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote: > > > > > >>>> On 11/11/2011 12:15 PM, Kevin Wolf wrote: > > > > > >>>>> Am 10.11.2011 22:30, schrieb Anthony Liguori: > > > > > >>>>>> Live migration with qcow2 or any other image format is just > > > > > >>>>>> not going to work > > > > > >>>>>> right now even with proper clustered storage. I think doing a > > > > > >>>>>> block level flush > > > > > >>>>>> cache interface and letting block devices decide how to do it > > > > > >>>>>> is the best approach. > > > > > >>>>> > > > > > >>>>> I would really prefer reusing the existing open/close code. It > > > > > >>>>> means > > > > > >>>>> less (duplicated) code, is existing code that is well tested > > > > > >>>>> and doesn't > > > > > >>>>> make migration much of a special case. > > > > > >>>>> > > > > > >>>>> If you want to avoid reopening the file on the OS level, we can > > > > > >>>>> reopen > > > > > >>>>> only the topmost layer (i.e. the format, but not the protocol) > > > > > >>>>> for now > > > > > >>>>> and in 1.1 we can use bdrv_reopen(). > > > > > >>>>> > > > > > >>>> > > > > > >>>> Intuitively I dislike _reopen style interfaces. If the second > > > > > >>>> open > > > > > >>>> yields different results from the first, does it invalidate any > > > > > >>>> computations in between? > > > > > >>>> > > > > > >>>> What's wrong with just delaying the open? 
> > > > > >>> > > > > > >>> If you delay the 'open' until the mgmt app issues 'cont', then > > > > > >>> you loose > > > > > >>> the ability to rollback to the source host upon open failure for > > > > > >>> most > > > > > >>> deployed versions of libvirt. We only fairly recently switched to > > > > > >>> a five > > > > > >>> stage migration handshake to cope with rollback when 'cont' fails. > > > > > >>> > > > > > >>> Daniel > > > > > >> > > > > > >> I guess reopen can fail as well, so this seems to me to be an > > > > > >> important > > > > > >> fix but not a blocker. > > > > > > > > > > > > If if the initial open succeeds, then it is far more likely that a > > > > > > later > > > > > > re-open will succeed too, because you have already elminated the > > > > > > possibility > > > > > > of configuration mistakes, and will have caught most storage > > > > > > runtime errors > > > > > > too. So there is a very significant difference in reliability > > > > > > between doing > > > > > > an 'open at startup + reopen at cont' vs just 'open at cont' > > > > > > > > > > > > Based on the bug reports I see, we want to be very good at > > > > > > detecting and > > > > > > gracefully handling open errors because they are pretty frequent. > > > > > > > > > > Do you have some more details on the kind of errors? Missing files, > > > > > permissions, something like this? Or rather something related to the > > > > > actual content of an image file? > > > > > > > > Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI > > > > setup. Access permissions due to incorrect user / group setup, or read > > > > only mounts, or SELinux denials. Actual I/O errors are less common and > > > > are not so likely to cause QEMU to fail to start any, since QEMU is > > > > likely to just report them to the guest OS instead. > > > > > > Do you run qemu with -S, then give a 'cont' command to start it? 
> > > > Yes > > OK, so let's go back one step now - how is this related to > 'rollback to source host'? In the old libvirt migration protocol, by the time we run 'cont' on the destination, the source QEMU has already been killed off, so there's nothing to resume on failure. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
On Mon, Nov 14, 2011 at 01:51:40PM +0200, Michael S. Tsirkin wrote: > On Mon, Nov 14, 2011 at 11:37:27AM +0000, Daniel P. Berrange wrote: > > On Mon, Nov 14, 2011 at 01:34:15PM +0200, Michael S. Tsirkin wrote: > > > On Mon, Nov 14, 2011 at 11:29:18AM +, Daniel P. Berrange wrote: > > > > On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote: > > > > > Am 14.11.2011 12:08, schrieb Daniel P. Berrange: > > > > > > On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote: > > > > > >> On Mon, Nov 14, 2011 at 10:16:10AM +, Daniel P. Berrange wrote: > > > > > >>> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote: > > > > > >>>> On 11/11/2011 12:15 PM, Kevin Wolf wrote: > > > > > >>>>> Am 10.11.2011 22:30, schrieb Anthony Liguori: > > > > > >>>>>> Live migration with qcow2 or any other image format is just > > > > > >>>>>> not going to work > > > > > >>>>>> right now even with proper clustered storage. I think doing a > > > > > >>>>>> block level flush > > > > > >>>>>> cache interface and letting block devices decide how to do it > > > > > >>>>>> is the best approach. > > > > > >>>>> > > > > > >>>>> I would really prefer reusing the existing open/close code. It > > > > > >>>>> means > > > > > >>>>> less (duplicated) code, is existing code that is well tested > > > > > >>>>> and doesn't > > > > > >>>>> make migration much of a special case. > > > > > >>>>> > > > > > >>>>> If you want to avoid reopening the file on the OS level, we can > > > > > >>>>> reopen > > > > > >>>>> only the topmost layer (i.e. the format, but not the protocol) > > > > > >>>>> for now > > > > > >>>>> and in 1.1 we can use bdrv_reopen(). > > > > > >>>>> > > > > > >>>> > > > > > >>>> Intuitively I dislike _reopen style interfaces. If the second > > > > > >>>> open > > > > > >>>> yields different results from the first, does it invalidate any > > > > > >>>> computations in between? > > > > > >>>> > > > > > >>>> What's wrong with just delaying the open? 
> > > > > >>> > > > > > >>> If you delay the 'open' until the mgmt app issues 'cont', then > > > > > >>> you loose > > > > > >>> the ability to rollback to the source host upon open failure for > > > > > >>> most > > > > > >>> deployed versions of libvirt. We only fairly recently switched to > > > > > >>> a five > > > > > >>> stage migration handshake to cope with rollback when 'cont' fails. > > > > > >>> > > > > > >>> Daniel > > > > > >> > > > > > >> I guess reopen can fail as well, so this seems to me to be an > > > > > >> important > > > > > >> fix but not a blocker. > > > > > > > > > > > > If if the initial open succeeds, then it is far more likely that a > > > > > > later > > > > > > re-open will succeed too, because you have already elminated the > > > > > > possibility > > > > > > of configuration mistakes, and will have caught most storage > > > > > > runtime errors > > > > > > too. So there is a very significant difference in reliability > > > > > > between doing > > > > > > an 'open at startup + reopen at cont' vs just 'open at cont' > > > > > > > > > > > > Based on the bug reports I see, we want to be very good at > > > > > > detecting and > > > > > > gracefully handling open errors because they are pretty frequent. > > > > > > > > > > Do you have some more details on the kind of errors? Missing files, > > > > > permissions, something like this? Or rather something related to the > > > > > actual content of an image file? > > > > > > > > Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI > > > > setup. Access permissions due to incorrect user / group setup, or read > > > > only mounts, or SELinux denials. Actual I/O errors are less common and > > > > are not so likely to cause QEMU to fail to start any, since QEMU is > > > > likely to just report them to the guest OS instead. > > > > > > Do you run qemu with -S, then give a 'cont' command to start it? > > Probably in an attempt to improve reliability :) Not really. 
We can't simply let QEMU start its own CPUs, because there are various tasks that need performing after the migration transfer finishes, but before the CPUs are allowed to be started, e.g.:

 - Finish 802.1Qb{g,h} (VEPA) network port profile association on the target
 - Release leases for any resources associated with the source QEMU via a configured lock manager (e.g. sanlock)
 - Acquire leases for any resources associated with the target QEMU via a configured lock manager (e.g. sanlock)

Daniel
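The ordering constraint described here — all post-migration tasks must complete before 'cont' starts the guest CPUs — can be sketched in a few lines of Python. This is purely illustrative: the function names and the list-of-callables shape are stand-ins, not libvirt's actual internals.

```python
# Illustrative sketch (not libvirt code): the destination side must run
# its post-migration tasks *between* the end of the transfer and the
# monitor 'cont' command that starts the guest CPUs.

def resume_on_destination(tasks, monitor):
    """Run post-migration tasks, then start vCPUs via the monitor.

    `tasks` is a list of callables (port-profile association, lease
    release/acquisition, ...); any failure aborts before 'cont' is
    issued, so the management layer can still roll back to the source.
    """
    for task in tasks:
        task()                    # raises on failure -> no 'cont' issued
    monitor.append("cont")        # CPUs start only after all tasks pass

log = []
resume_on_destination(
    [lambda: log.append("vepa-associate"),
     lambda: log.append("release-source-leases"),
     lambda: log.append("acquire-target-leases")],
    log,
)
print(log)   # 'cont' appears last, after every task has run
```

If any task raises, 'cont' is never sent, which is exactly the property the five-stage handshake relies on for rollback.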
Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
On Mon, Nov 14, 2011 at 01:34:15PM +0200, Michael S. Tsirkin wrote: > On Mon, Nov 14, 2011 at 11:29:18AM +0000, Daniel P. Berrange wrote: > > On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote: > > > Am 14.11.2011 12:08, schrieb Daniel P. Berrange: > > > > On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote: > > > >> On Mon, Nov 14, 2011 at 10:16:10AM +, Daniel P. Berrange wrote: > > > >>> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote: > > > >>>> On 11/11/2011 12:15 PM, Kevin Wolf wrote: > > > >>>>> Am 10.11.2011 22:30, schrieb Anthony Liguori: > > > >>>>>> Live migration with qcow2 or any other image format is just not > > > >>>>>> going to work > > > >>>>>> right now even with proper clustered storage. I think doing a > > > >>>>>> block level flush > > > >>>>>> cache interface and letting block devices decide how to do it is > > > >>>>>> the best approach. > > > >>>>> > > > >>>>> I would really prefer reusing the existing open/close code. It means > > > >>>>> less (duplicated) code, is existing code that is well tested and > > > >>>>> doesn't > > > >>>>> make migration much of a special case. > > > >>>>> > > > >>>>> If you want to avoid reopening the file on the OS level, we can > > > >>>>> reopen > > > >>>>> only the topmost layer (i.e. the format, but not the protocol) for > > > >>>>> now > > > >>>>> and in 1.1 we can use bdrv_reopen(). > > > >>>>> > > > >>>> > > > >>>> Intuitively I dislike _reopen style interfaces. If the second open > > > >>>> yields different results from the first, does it invalidate any > > > >>>> computations in between? > > > >>>> > > > >>>> What's wrong with just delaying the open? > > > >>> > > > >>> If you delay the 'open' until the mgmt app issues 'cont', then you > > > >>> loose > > > >>> the ability to rollback to the source host upon open failure for most > > > >>> deployed versions of libvirt. 
We only fairly recently switched to a > > > >>> five > > > >>> stage migration handshake to cope with rollback when 'cont' fails. > > > >>> > > > >>> Daniel > > > >> > > > >> I guess reopen can fail as well, so this seems to me to be an important > > > >> fix but not a blocker. > > > > > > > > If if the initial open succeeds, then it is far more likely that a later > > > > re-open will succeed too, because you have already elminated the > > > > possibility > > > > of configuration mistakes, and will have caught most storage runtime > > > > errors > > > > too. So there is a very significant difference in reliability between > > > > doing > > > > an 'open at startup + reopen at cont' vs just 'open at cont' > > > > > > > > Based on the bug reports I see, we want to be very good at detecting and > > > > gracefully handling open errors because they are pretty frequent. > > > > > > Do you have some more details on the kind of errors? Missing files, > > > permissions, something like this? Or rather something related to the > > > actual content of an image file? > > > > Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI > > setup. Access permissions due to incorrect user / group setup, or read > > only mounts, or SELinux denials. Actual I/O errors are less common and > > are not so likely to cause QEMU to fail to start any, since QEMU is > > likely to just report them to the guest OS instead. > > Do you run qemu with -S, then give a 'cont' command to start it? Yes Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote: > Am 14.11.2011 12:08, schrieb Daniel P. Berrange: > > On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote: > >> On Mon, Nov 14, 2011 at 10:16:10AM +, Daniel P. Berrange wrote: > >>> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote: > >>>> On 11/11/2011 12:15 PM, Kevin Wolf wrote: > >>>>> Am 10.11.2011 22:30, schrieb Anthony Liguori: > >>>>>> Live migration with qcow2 or any other image format is just not going > >>>>>> to work > >>>>>> right now even with proper clustered storage. I think doing a block > >>>>>> level flush > >>>>>> cache interface and letting block devices decide how to do it is the > >>>>>> best approach. > >>>>> > >>>>> I would really prefer reusing the existing open/close code. It means > >>>>> less (duplicated) code, is existing code that is well tested and doesn't > >>>>> make migration much of a special case. > >>>>> > >>>>> If you want to avoid reopening the file on the OS level, we can reopen > >>>>> only the topmost layer (i.e. the format, but not the protocol) for now > >>>>> and in 1.1 we can use bdrv_reopen(). > >>>>> > >>>> > >>>> Intuitively I dislike _reopen style interfaces. If the second open > >>>> yields different results from the first, does it invalidate any > >>>> computations in between? > >>>> > >>>> What's wrong with just delaying the open? > >>> > >>> If you delay the 'open' until the mgmt app issues 'cont', then you loose > >>> the ability to rollback to the source host upon open failure for most > >>> deployed versions of libvirt. We only fairly recently switched to a five > >>> stage migration handshake to cope with rollback when 'cont' fails. > >>> > >>> Daniel > >> > >> I guess reopen can fail as well, so this seems to me to be an important > >> fix but not a blocker. 
> > > > If if the initial open succeeds, then it is far more likely that a later > > re-open will succeed too, because you have already elminated the possibility > > of configuration mistakes, and will have caught most storage runtime errors > > too. So there is a very significant difference in reliability between doing > > an 'open at startup + reopen at cont' vs just 'open at cont' > > > > Based on the bug reports I see, we want to be very good at detecting and > > gracefully handling open errors because they are pretty frequent. > > Do you have some more details on the kind of errors? Missing files, > permissions, something like this? Or rather something related to the > actual content of an image file? Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI setup. Access permissions due to incorrect user / group setup, or read-only mounts, or SELinux denials. Actual I/O errors are less common and are not so likely to cause QEMU to fail to start at all, since QEMU is likely to just report them to the guest OS instead. Daniel
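The error categories listed here map fairly directly onto errno values from open(2). The sketch below is not QEMU or libvirt code — just an illustrative taxonomy under that assumption:

```python
import errno

def classify_open_error(e: OSError) -> str:
    """Rough taxonomy of open() failures, mirroring the thread:
    missing files (bad NFS/SAN/iSCSI setup), permissions (user/group
    setup, read-only mounts, SELinux), and genuine runtime I/O errors.
    """
    if e.errno == errno.ENOENT:
        return "missing"        # wrong/missing NFS mount, bad path
    if e.errno in (errno.EACCES, errno.EPERM, errno.EROFS):
        return "permissions"    # user/group setup, ro mount, SELinux denial
    if e.errno == errno.EIO:
        return "io-error"       # rare at open time; usually hits the guest
    return "other"

print(classify_open_error(OSError(errno.EACCES, "denied")))  # permissions
```

The first two categories are configuration mistakes that an early open catches once and for all, which is why "open at startup + reopen at cont" is so much more reliable than "open at cont".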
Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote: > On Mon, Nov 14, 2011 at 10:16:10AM +0000, Daniel P. Berrange wrote: > > On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote: > > > On 11/11/2011 12:15 PM, Kevin Wolf wrote: > > > > Am 10.11.2011 22:30, schrieb Anthony Liguori: > > > > > Live migration with qcow2 or any other image format is just not going > > > > > to work > > > > > right now even with proper clustered storage. I think doing a block > > > > > level flush > > > > > cache interface and letting block devices decide how to do it is the > > > > > best approach. > > > > I would really prefer reusing the existing open/close code. It means > > > > less (duplicated) code, is existing code that is well tested and doesn't > > > > make migration much of a special case. > > > > If you want to avoid reopening the file on the OS level, we can reopen > > > > only the topmost layer (i.e. the format, but not the protocol) for now > > > > and in 1.1 we can use bdrv_reopen(). > > > Intuitively I dislike _reopen style interfaces. If the second open > > > yields different results from the first, does it invalidate any > > > computations in between? > > > What's wrong with just delaying the open? > > If you delay the 'open' until the mgmt app issues 'cont', then you loose > > the ability to rollback to the source host upon open failure for most > > deployed versions of libvirt. We only fairly recently switched to a five > > stage migration handshake to cope with rollback when 'cont' fails. > > Daniel > I guess reopen can fail as well, so this seems to me to be an important > fix but not a blocker. If the initial open succeeds, then it is far more likely that a later re-open will succeed too, because you have already eliminated the possibility of configuration mistakes, and will have caught most storage runtime errors too.
So there is a very significant difference in reliability between doing an 'open at startup + reopen at cont' vs just 'open at cont'. Based on the bug reports I see, we want to be very good at detecting and gracefully handling open errors because they are pretty frequent. Regards, Daniel
Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote: > On 11/11/2011 12:15 PM, Kevin Wolf wrote: > > Am 10.11.2011 22:30, schrieb Anthony Liguori: > > > Live migration with qcow2 or any other image format is just not going to > > > work > > > right now even with proper clustered storage. I think doing a block > > > level flush > > > cache interface and letting block devices decide how to do it is the best > > > approach. > > I would really prefer reusing the existing open/close code. It means > > less (duplicated) code, is existing code that is well tested and doesn't > > make migration much of a special case. > > If you want to avoid reopening the file on the OS level, we can reopen > > only the topmost layer (i.e. the format, but not the protocol) for now > > and in 1.1 we can use bdrv_reopen(). > Intuitively I dislike _reopen style interfaces. If the second open > yields different results from the first, does it invalidate any > computations in between? > What's wrong with just delaying the open? If you delay the 'open' until the mgmt app issues 'cont', then you lose the ability to roll back to the source host upon open failure for most deployed versions of libvirt. We only fairly recently switched to a five-stage migration handshake to cope with rollback when 'cont' fails. Daniel
Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
On Thu, Nov 10, 2011 at 01:11:42PM -0600, Anthony Liguori wrote: > On 11/10/2011 12:42 PM, Daniel P. Berrange wrote: > >On Thu, Nov 10, 2011 at 12:27:30PM -0600, Anthony Liguori wrote: > >>What does libvirt actually do in the monitor prior to migration > >>completing on the destination? The least invasive way of doing > >>delayed open of block devices is probably to make -incoming create a > >>monitor and run a main loop before the block devices (and full > >>device model) is initialized. Since this isolates the changes > >>strictly to migration, I'd feel okay doing this for 1.0 (although it > >>might need to be in the stable branch). > > > >The way migration works with libvirt wrt QEMU interactions is now > >as follows > > > > 1. Destination. > >Run qemu -incoming ...args... > >Query chardevs via monitor > >Query vCPU threads via monitor > >Set disk / vnc passwords > > Since RHEL carries Juan's patch, and Juan's patch doesn't handle > disk passwords gracefully, how does libvirt cope with that? No idea, that's the first I've heard of any patch that causes problems with passwords in QEMU. Daniel
Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
On Thu, Nov 10, 2011 at 12:27:30PM -0600, Anthony Liguori wrote: > What does libvirt actually do in the monitor prior to migration > completing on the destination? The least invasive way of doing > delayed open of block devices is probably to make -incoming create a > monitor and run a main loop before the block devices (and full > device model) is initialized. Since this isolates the changes > strictly to migration, I'd feel okay doing this for 1.0 (although it > might need to be in the stable branch). The way migration works with libvirt wrt QEMU interactions is now as follows:

 1. Destination
      Run qemu -incoming ...args...
      Query chardevs via monitor
      Query vCPU threads via monitor
      Set disk / vnc passwords
      Set netdev link states
      Set balloon target

 2. Source
      Set migration speed
      Set migration max downtime
      Run migrate command (detached)
      while 1
          Query migration status
          if status is failed or success
              break;

 3. Destination
      If final status was success
          Run 'cont' in monitor
      else
          kill QEMU process

 4. Source
      If final status was success and 'cont' on dest succeeded
          kill QEMU process
      else
          Run 'cont' in monitor

In older libvirt, the bits from step 4 would actually take place at the end of step 2. This meant we could end up with no QEMU on either the source or dest, if starting CPUs on the dest QEMU failed for some reason. We would still really like to have a 'query-migrate' command for the destination, so that we can confirm that the destination has consumed all incoming migrate data successfully, rather than just blindly starting CPUs and hoping for the best.
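The four-step flow above can be modelled in a few lines of plain Python. This is an illustrative sketch only — the dict "monitors" and status values are stand-ins; real libvirt drives two QEMU monitors over QMP/HMP, and step 4 additionally checks whether 'cont' on the destination succeeded:

```python
# Illustrative model of the 4-step libvirt migration flow.  The dicts
# record which monitor commands each side would receive.

def migrate(src, dst):
    """Return which side ends up running the guest.

    src/dst are dicts with a "cmds" list; dst["migration_ok"] fakes
    the outcome of the transfer (step 2's status polling loop).
    """
    dst["cmds"] += ["query-chardev", "query-cpus", "set-passwords",
                    "set_link", "balloon"]                  # step 1
    src["cmds"] += ["migrate_set_speed", "migrate_set_downtime",
                    "migrate -d"]                           # step 2
    status = "success" if dst["migration_ok"] else "failed"
    if status == "success":                                 # step 3
        dst["cmds"].append("cont")
        src["cmds"].append("quit")                          # step 4: kill src
        return "destination"
    else:
        dst["cmds"].append("quit")
        src["cmds"].append("cont")                          # roll back to src
        return "source"

src = {"cmds": []}
dst = {"cmds": [], "migration_ok": False}
print(migrate(src, dst))   # transfer failed -> guest resumes on source
```

The key property is that exactly one side runs 'cont' and the other is killed, so a failure never leaves the guest dead on both hosts — which is what the older end-of-step-2 ordering could not guarantee.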
Regards, Daniel
Re: [Qemu-devel] KVM call agenda for October 25
On Wed, Oct 26, 2011 at 01:23:05PM +0200, Kevin Wolf wrote: > Am 26.10.2011 11:57, schrieb Daniel P. Berrange: > > On Wed, Oct 26, 2011 at 10:48:12AM +0200, Markus Armbruster wrote: > >> Kevin Wolf writes: > >> > >>> Am 25.10.2011 16:06, schrieb Anthony Liguori: > >>>> On 10/25/2011 08:56 AM, Kevin Wolf wrote: > >>>>> Am 25.10.2011 15:05, schrieb Anthony Liguori: > >>>>>> I'd be much more open to changing the default mode to cache=none FWIW > >>>>>> since the > >>>>>> risk of data loss there is much, much lower. > >>>>> > >>>>> I think people said that they'd rather not have cache=none as default > >>>>> because O_DIRECT doesn't work everywhere. > >>>> > >>>> Where doesn't it work these days? I know it doesn't work on tmpfs. I > >>>> know it > >>>> works on ext[234], btrfs, nfs. > >>> > >>> Besides file systems (and probably OSes) that don't support O_DIRECT, > >>> there's another case: Our defaults don't work on 4k sector disks today. > >>> You need to explicitly specify the logical_block_size qdev property for > >>> cache=none to work on them. > >>> > >>> And changing this default isn't trivial as the right value doesn't only > >>> depend on the host disk, but it's also guest visible. The only way out > >>> would be bounce buffers, but I'm not sure that doing that silently is a > >>> good idea... > >> > >> Sector size is a device property. > >> > >> If the user asks for a 4K sector disk, and the backend can't support > >> that, we need to reject the configuration. Just like we reject > >> read-only backends for read/write disks. > > > > I don't see why we need to reject a guest disk with 4k sectors, > > just because the host disk only has 512 byte sectors. A guest > > sector size that's a larger multiple of host sector size should > > work just fine. It just means any guest sector write will update > > 8 host sectors at a time. 
We only have problems if guest sector > > size is not a multiple of host sector size, in which case bounce > > buffers are the only option (other than rejecting the config > > which is not too nice). > > > > IIUC, current QEMU behaviour is
> >
> >              Guest 512   Guest 4k
> >   Host 512   *OK         OK
> >   Host 4k    *I/O Err    OK
> >
> > '*' marks defaults
> >
> > IMHO, QEMU needs to work withot I/O errors in all of these > > combinations, even if this means having to use bounce buffers > > in some of them. That said, IMHO the default should be for > > QEMU to avoid bounce buffers, which implies it should either > > chose guest sector size to match host sector size, or it > > should unconditionally use 4k guest. IMHO we need the former
> >
> >              Guest 512   Guest 4k
> >   Host 512   *OK         OK
> >   Host 4k    OK          *OK
> >
> I'm not sure if a 4k host should imply a 4k guest by default. This means > that some guests wouldn't be able to run on a 4k host. On the other > hand, for those guests that can do 4k, it would be the much better option. > > So I think this decision is the hard thing about it. I guess it somewhat depends whether we want to strive for:

 1. Give the user the fastest working config by default
 2. Give the user a working config by default
 3. Give the user the fastest (possibly broken) config by default

IMHO 3 is not a serious option, but I could see 2 as a reasonable tradeoff to avoid complexity in choosing QEMU defaults. The user would have a working config with 512 sectors, but sub-optimal perf on 4k hosts due to bounce buffering. Ideally libvirt or other higher-level app would be setting the best block size that a guest can support by default, so bounce buffers would rarely be needed.
So only people using QEMU directly without setting a block size would ordinarily suffer the bounce buffer perf hit on a 4k host. Daniel
Re: [Qemu-devel] KVM call agenda for October 25
On Wed, Oct 26, 2011 at 10:48:12AM +0200, Markus Armbruster wrote: > Kevin Wolf writes: > > > Am 25.10.2011 16:06, schrieb Anthony Liguori: > >> On 10/25/2011 08:56 AM, Kevin Wolf wrote: > >>> Am 25.10.2011 15:05, schrieb Anthony Liguori: > I'd be much more open to changing the default mode to cache=none FWIW > since the > risk of data loss there is much, much lower. > >>> > >>> I think people said that they'd rather not have cache=none as default > >>> because O_DIRECT doesn't work everywhere. > >> > >> Where doesn't it work these days? I know it doesn't work on tmpfs. I > >> know it > >> works on ext[234], btrfs, nfs. > > > > Besides file systems (and probably OSes) that don't support O_DIRECT, > > there's another case: Our defaults don't work on 4k sector disks today. > > You need to explicitly specify the logical_block_size qdev property for > > cache=none to work on them. > > > > And changing this default isn't trivial as the right value doesn't only > > depend on the host disk, but it's also guest visible. The only way out > > would be bounce buffers, but I'm not sure that doing that silently is a > > good idea... > > Sector size is a device property. > > If the user asks for a 4K sector disk, and the backend can't support > that, we need to reject the configuration. Just like we reject > read-only backends for read/write disks. I don't see why we need to reject a guest disk with 4k sectors, just because the host disk only has 512 byte sectors. A guest sector size that's a larger multiple of host sector size should work just fine. It just means any guest sector write will update 8 host sectors at a time. We only have problems if guest sector size is not a multiple of host sector size, in which case bounce buffers are the only option (other than rejecting the config which is not too nice). 
IIUC, current QEMU behaviour is:

             Guest 512   Guest 4k
  Host 512   *OK         OK
  Host 4k    *I/O Err    OK

  '*' marks defaults

IMHO, QEMU needs to work without I/O errors in all of these combinations, even if this means having to use bounce buffers in some of them. That said, IMHO the default should be for QEMU to avoid bounce buffers, which implies it should either choose guest sector size to match host sector size, or it should unconditionally use 4k guest. IMHO we need the former:

             Guest 512   Guest 4k
  Host 512   *OK         OK
  Host 4k    OK          *OK

Yes, I know there are other weird sector sizes besides 512 and 4k, but the same general principles apply of either one being a multiple of the other, or needing to use bounce buffers. > If the backend can only support it by using bounce buffers, I'd say > reject it unless the user explicitly permits bounce buffers. But that's > debatable. I don't think it really adds value for QEMU to force the user to specify some extra magic flag in order to make the user's requested config actually be honoured. If a config needs bounce buffers, QEMU should just do it, without needing 'use-bounce-buffers=1'. A higher-level mgmt app is in a better position to inform users about the consequences. Daniel
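The multiple-of rule behind those tables is easy to state in code. This is a hedged sketch of the decision only — QEMU's real logic lives in its block layer and is considerably more involved:

```python
def io_strategy(guest_sector: int, host_sector: int) -> str:
    """Classify a guest/host logical block size pairing.

    If the guest size is a multiple of the host size, every guest write
    maps to a whole number of host sectors and can go straight through
    (O_DIRECT-friendly); if the host size is the larger multiple, a
    guest sector update touches part of a host sector and needs a
    read-modify-write bounce buffer.
    """
    if guest_sector % host_sector == 0:
        return "direct"    # e.g. 4k guest on 512b host: 8 host sectors/write
    if host_sector % guest_sector == 0:
        return "bounce"    # e.g. 512b guest on 4k host: RMW needed
    return "reject"        # neither divides the other

print(io_strategy(4096, 512))   # direct
print(io_strategy(512, 4096))   # bounce
```

Note that "reject" only arises for exotic sizes where neither value divides the other, matching the remark that the same principle covers weird sector sizes too.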
Re: [PATCH 05/11] virt: Introducing libvirt VM class
On Tue, Oct 11, 2011 at 06:07:11PM -0300, Lucas Meneghel Rodrigues wrote: > This is a first attempt at providing a libvirt VM class, > in order to implement the needed methods for virt testing. > With this class, we will be able to implement a libvirt > test, that behaves similarly to the KVM test. > > As of implementation details, libvirt_vm uses virsh > (a userspace program written on top of libvirt) to > do domain start, stop, verification of status and > other common operations. The reason why virsh was > used is to get more coverage of the userspace stack > that libvirt offers, and also to catch issues that > virsh users would catch. Personally I would have recommended that you use the libvirt Python API. virsh is a very thin layer over the libvirt API, which mostly avoids adding any logic of its own, so once it has been tested once, there's not much value in doing more. By using the Python API directly, you will be able to do more intelligent handling of errors, since you'll get the full libvirt Python exception object instead of a blob of stuff on stderr. Not to mention that it is so much more efficient, and robust against any future changes in virsh. Regards, Daniel
Re: [libvirt] Qemu/KVM is 3x slower under libvirt (due to vhost=on)
On Wed, Sep 28, 2011 at 12:19:09PM +0200, Reeted wrote: > On 09/28/11 11:53, Daniel P. Berrange wrote: > >On Wed, Sep 28, 2011 at 11:49:01AM +0200, Reeted wrote: > >>YES! > >>It's the vhost. With vhost=on it takes about 12 seconds more time to boot. > >> > >>...meaning? :-) > >I've no idea. I was always under the impression that 'vhost=on' was > >the 'make it go much faster' switch. So something is going wrong > >here that I cna't explain. > > > >Perhaps one of the network people on this list can explain... > > > >To turn vhost off in the libvirt XML, you should be able to use > ><driver name='qemu'/> for the interface in question, eg > >
> >  <interface type='network'>
> >    ...
> >    <model type='virtio'/>
> >    <driver name='qemu'/>
> >  </interface>
> >
> Ok that seems to work: it removes the vhost part in the virsh launch hence cutting down 12secs of boot time. > > If nobody comes out with an explanation of why, I will open another thread on the kvm list for this. I would probably need to test disk performance on vhost=on to see if it degrades or it's for another reason that boot time is increased. Be sure to CC the qemu-devel mailing list too next time, since that has a wider audience who might be able to help Daniel
Re: [libvirt] Qemu/KVM is 3x slower under libvirt (due to vhost=on)
On Wed, Sep 28, 2011 at 11:49:01AM +0200, Reeted wrote: > On 09/28/11 11:28, Daniel P. Berrange wrote: > >On Wed, Sep 28, 2011 at 11:19:43AM +0200, Reeted wrote: > >>On 09/28/11 09:51, Daniel P. Berrange wrote: > >>>>This is my bash commandline: > >>>> > >>>>/opt/qemu-kvm-0.14.1/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm > >>>>-m 2002 -smp 2,sockets=2,cores=1,threads=1 -name vmname1-1 -uuid > >>>>ee75e28a-3bf3-78d9-3cba-65aa63973380 -nodefconfig -nodefaults > >>>>-chardev > >>>>socket,id=charmonitor,path=/var/lib/libvirt/qemu/vmname1-1.monitor,server,nowait > >>>>-mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc > >>>>-boot order=dc,menu=on -drive > >>>>file=/dev/mapper/vgPtpVM-lvVM_Vmname1_d1,if=none,id=drive-virtio-disk0,boot=on,format=raw,cache=none,aio=native > >>>>-device > >>>>virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 > >>>>-drive > >>>>if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,cache=none,aio=native > >>>>-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 > >>>>-net nic,model=virtio -net tap,ifname=tap0,script=no,downscript=no > >>>>-usb -vnc 127.0.0.1:0 -vga cirrus -device > >>>>virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 > >>>This shows KVM is being requested, but we should validate that KVM is > >>>definitely being activated when under libvirt. You can test this by > >>>doing: > >>> > >>> virsh qemu-monitor-command vmname1 'info kvm' > >>kvm support: enabled > >> > >>I think I would see a higher impact if it was KVM not enabled. > >> > >>>>Which was taken from libvirt's command line. 
The only modifications > >>>>I did to the original libvirt commandline (seen with ps aux) were: > > > >>>>- Network was: -netdev tap,fd=17,id=hostnet0,vhost=on,vhostfd=18 > >>>>-device > >>>>virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3 > >>>>Has been simplified to: -net nic,model=virtio -net > >>>>tap,ifname=tap0,script=no,downscript=no > >>>>and manual bridging of the tap0 interface. > >>>You could have equivalently used > >>> > >>> -netdev tap,ifname=tap0,script=no,downscript=no,id=hostnet0,vhost=on > >>> -device > >>> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3 > >>It's this! It's this!! (thanks for the line) > >> > >>It raises boot time by 10-13 seconds > >Ok, that is truely bizarre and I don't really have any explanation > >for why that is. I guess you could try 'vhost=off' too and see if that > >makes the difference. > > YES! > It's the vhost. With vhost=on it takes about 12 seconds more time to boot. > > ...meaning? :-) I've no idea. I was always under the impression that 'vhost=on' was the 'make it go much faster' switch. So something is going wrong here that I can't explain. Perhaps one of the network people on this list can explain... To turn vhost off in the libvirt XML, you should be able to use for the interface in question, e.g. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [libvirt] Qemu/KVM is 3x slower under libvirt
On Wed, Sep 28, 2011 at 11:19:43AM +0200, Reeted wrote: > On 09/28/11 09:51, Daniel P. Berrange wrote: > >>This is my bash commandline: > >> > >>/opt/qemu-kvm-0.14.1/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm > >>-m 2002 -smp 2,sockets=2,cores=1,threads=1 -name vmname1-1 -uuid > >>ee75e28a-3bf3-78d9-3cba-65aa63973380 -nodefconfig -nodefaults > >>-chardev > >>socket,id=charmonitor,path=/var/lib/libvirt/qemu/vmname1-1.monitor,server,nowait > >>-mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc > >>-boot order=dc,menu=on -drive > >>file=/dev/mapper/vgPtpVM-lvVM_Vmname1_d1,if=none,id=drive-virtio-disk0,boot=on,format=raw,cache=none,aio=native > >>-device > >>virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 > >>-drive > >>if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,cache=none,aio=native > >>-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 > >>-net nic,model=virtio -net tap,ifname=tap0,script=no,downscript=no > >>-usb -vnc 127.0.0.1:0 -vga cirrus -device > >>virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 > > > >This shows KVM is being requested, but we should validate that KVM is > >definitely being activated when under libvirt. You can test this by > >doing: > > > > virsh qemu-monitor-command vmname1 'info kvm' > > kvm support: enabled > > I think I would see a higher impact if it was KVM not enabled. > > >>Which was taken from libvirt's command line. The only modifications > >>I did to the original libvirt commandline (seen with ps aux) were: > >>- Network was: -netdev tap,fd=17,id=hostnet0,vhost=on,vhostfd=18 > >>-device > >>virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3 > >>Has been simplified to: -net nic,model=virtio -net > >>tap,ifname=tap0,script=no,downscript=no > >>and manual bridging of the tap0 interface. 
> >You could have equivalently used > > > > -netdev tap,ifname=tap0,script=no,downscript=no,id=hostnet0,vhost=on > > -device > > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3 > > It's this! It's this!! (thanks for the line) > > It raises boot time by 10-13 seconds Ok, that is truly bizarre and I don't really have any explanation for why that is. I guess you could try 'vhost=off' too and see if that makes the difference. > > But now I don't know where to look During boot there is a pause > usually between /scripts/init-bottom (Ubuntu 11.04 guest) and the > appearance of login prompt, however that is not really meaningful > because there is probably much background activity going on there, > with init etc. which don't display messages > > > init-bottom does just this > > - > #!/bin/sh -e > # initramfs init-bottom script for udev > > PREREQ="" > > # Output pre-requisites > prereqs() > { > echo "$PREREQ" > } > > case "$1" in > prereqs) > prereqs > exit 0 > ;; > esac > > > # Stop udevd, we'll miss a few events while we run init, but we catch up > pkill udevd > > # Move /dev to the real filesystem > mount -n -o move /dev ${rootmnt}/dev > - > > It doesn't look like it should take time to execute. > So there is probably some other background activity going on... and > that is slower, but I don't know what that is. > > > Another thing that can be noticed is that the dmesg message: > > [ 13.290173] eth0: no IPv6 routers present > > (which is also the last message) > > happens on average 1 (one) second earlier in the fast case (-net) > than in the slow case (-netdev) Hmm, none of that looks particularly suspect. So I don't really have much idea what else to try apart from the 'vhost=off' possibility. 
Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [libvirt] Qemu/KVM is 3x slower under libvirt
On Tue, Sep 27, 2011 at 08:10:21PM +0200, Reeted wrote: > I repost this, this time by also including the libvirt mailing list. > > Info on my libvirt: it's the version in Ubuntu 11.04 Natty which is > 0.8.8-1ubuntu6.5 . I didn't recompile this one, while Kernel and > qemu-kvm are vanilla and compiled by hand as described below. > > My original message follows: > > This is really strange. > > I just installed a new host with kernel 3.0.3 and Qemu-KVM 0.14.1 > compiled by me. > > I have created the first VM. > This is on LVM, virtio etc... if I run it directly from bash > console, it boots in 8 seconds (it's a bare ubuntu with no > graphics), while if I boot it under virsh (libvirt) it boots in > 20-22 seconds. This is the time from after Grub to the login prompt, > or from after Grub to the ssh-server up. > > I was almost able to replicate the whole libvirt command line on the > bash console, and it still goes almost 3x faster when launched from > bash than with virsh start vmname. The part I wasn't able to > replicate is the -netdev part because I still haven't understood the > semantics of it. -netdev is just an alternative way of setting up networking that avoids QEMU's nasty VLAN concept. Using -netdev allows QEMU to use more efficient codepaths for networking, which should improve the network performance. 
> This is my bash commandline: > > /opt/qemu-kvm-0.14.1/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm > -m 2002 -smp 2,sockets=2,cores=1,threads=1 -name vmname1-1 -uuid > ee75e28a-3bf3-78d9-3cba-65aa63973380 -nodefconfig -nodefaults > -chardev > socket,id=charmonitor,path=/var/lib/libvirt/qemu/vmname1-1.monitor,server,nowait > -mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc > -boot order=dc,menu=on -drive > file=/dev/mapper/vgPtpVM-lvVM_Vmname1_d1,if=none,id=drive-virtio-disk0,boot=on,format=raw,cache=none,aio=native > -device > virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 > -drive > if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,cache=none,aio=native > -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 > -net nic,model=virtio -net tap,ifname=tap0,script=no,downscript=no > -usb -vnc 127.0.0.1:0 -vga cirrus -device > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 This shows KVM is being requested, but we should validate that KVM is definitely being activated when under libvirt. You can test this by doing: virsh qemu-monitor-command vmname1 'info kvm' > Which was taken from libvirt's command line. The only modifications > I did to the original libvirt commandline (seen with ps aux) were: > > - Removed -S Fine, has no effect on performance. > - Network was: -netdev tap,fd=17,id=hostnet0,vhost=on,vhostfd=18 > -device > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3 > Has been simplified to: -net nic,model=virtio -net > tap,ifname=tap0,script=no,downscript=no > and manual bridging of the tap0 interface. 
You could have equivalently used -netdev tap,ifname=tap0,script=no,downscript=no,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3 That said, I don't expect this has anything to do with the performance since booting a guest rarely involves much network I/O unless you're doing something odd like NFS-root / iSCSI-root. > Firstly I had thought that this could be fault of the VNC: I have > compiled qemu-kvm with no separate vnc thread. I thought that > libvirt might have connected to the vnc server at all times and this > could have slowed down the whole VM. > But then I also tried connecting vith vncviewer to the KVM machine > launched directly from bash, and the speed of it didn't change. So > no, it doesn't seem to be that. Yeah, I have never seen VNC be responsible for the kind of slowdown you describe. > BTW: is the slowdown of the VM on "no separate vnc thread" only in > effect when somebody is actually connected to VNC, or always? Probably, but again I don't think it is likely to be relevant here. > Also, note that the time difference is not visible in dmesg once the > machine has booted. So it's not a slowdown in detecting devices. > Devices are always detected within the first 3 seconds, according to > dmesg, at 3.6 seconds the first ext4 mount begins. It seems to be > really the OS boot that is slow... it seems an hard disk performance > problem. There are a couple of things that would be different between running the VM directly, vs via libvirt. - Security drivers - SELinux/AppArmour - CGroups If it was AppArmour causing this slowdown I don't think you would have been the first person to complain, so let's ignore that. Which leaves cgroups as a likely culprit. Do a 'grep cgroup /proc/mounts' to see if any of them are mounted; if so, then for each cgroup mount in turn, - Umount the cgroup - Restart libvirtd - Test your guest boot performance Regards, Daniel -- |: http://berrange.com -o-h
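The first step of that procedure can be scripted. A small self-contained sketch follows; it parses sample /proc/mounts-style text rather than probing a live system, so the mount paths shown are illustrative:

```python
def cgroup_mounts(mounts_text):
    """Return mount points whose filesystem type is cgroup, given
    text in the whitespace-separated format of /proc/mounts."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2].startswith("cgroup"):
            points.append(fields[1])
    return points

sample = """\
proc /proc proc rw,nosuid,nodev,noexec 0 0
cgroup /sys/fs/cgroup/cpu cgroup rw,relatime,cpu 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,relatime,blkio 0 0
"""
print(cgroup_mounts(sample))
```

On a real host you would feed it open('/proc/mounts').read() instead of the sample, then unmount each returned path in turn and retest as described above.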
Re: How many threads should a kvm vm be starting?
On Tue, Sep 27, 2011 at 04:04:41PM -0600, Thomas Fjellstrom wrote: > On September 27, 2011, Avi Kivity wrote: > > On 09/27/2011 03:29 AM, Thomas Fjellstrom wrote: > > > I just noticed something interesting, a virtual machine on one of my > > > servers seems to have 69 threads (including the main thread). Other > > > guests on the machine only have a couple threads. > > > > > > Is this normal? or has something gone horribly wrong? > > > > It's normal if the guest does a lot of I/O. The thread count should go > > down when the guest idles. > > Ah, that would make sense. Though it kind of defeats assigning a vm a single > cpu/core. A single VM can now DOS an entire multi-core-cpu server. It pretty > much pegged my dual core (with HT) server for a couple hours. You can mitigate these problems by putting each KVM process in its own cgroup, and using the 'cpu_shares' tunable to ensure that each KVM process gets the same relative ratio of CPU time, regardless of how many threads it is running. With newer kernels there are other CPU tunables for placing hard caps on CPU utilization of the process as a whole too. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
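Note that cpu_shares is a relative weight, not an absolute cap: under full contention each cgroup gets its share divided by the sum of all shares. A toy calculation of that semantics (the cgroup names and share values here are made up):

```python
def cpu_fraction(shares):
    """Map each cgroup to its fraction of CPU time under full
    contention, given per-cgroup cpu.shares weights."""
    total = sum(shares.values())
    return {name: weight / total for name, weight in shares.items()}

# three KVM guests: equal weights for vm1/vm2, double weight for vm3
print(cpu_fraction({"vm1": 1024, "vm2": 1024, "vm3": 2048}))
```

With newer kernels, hard caps (cpu.cfs_quota_us / cpu.cfs_period_us) can bound a guest's absolute usage regardless of idle capacity.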
Re: [PATCH 1/3] Avoid the use of deprecated gnutls gnutls_*_set_priority functions.
On Thu, Aug 25, 2011 at 11:54:41AM +0100, Stefan Hajnoczi wrote: > On Mon, Jul 4, 2011 at 11:00 PM, Raghavendra D Prabhu > wrote: > > The gnutls_*_set_priority family of functions has been marked deprecated > > in 2.12.x. These functions have been superceded by > > gnutls_priority_set_direct(). > > > > Signed-off-by: Raghavendra D Prabhu > > --- > > ui/vnc-tls.c | 20 +--- > > 1 files changed, 1 insertions(+), 19 deletions(-) > > > > diff --git a/ui/vnc-tls.c b/ui/vnc-tls.c > > index dec626c..33a5d8c 100644 > > --- a/ui/vnc-tls.c > > +++ b/ui/vnc-tls.c > > @@ -286,10 +286,6 @@ int vnc_tls_validate_certificate(struct VncState *vs) > > > > int vnc_tls_client_setup(struct VncState *vs, > > int needX509Creds) { > > - static const int cert_type_priority[] = { GNUTLS_CRT_X509, 0 }; > > - static const int protocol_priority[]= { GNUTLS_TLS1_1, GNUTLS_TLS1_0, > > GNUTLS_SSL3, 0 }; > > - static const int kx_anon[] = {GNUTLS_KX_ANON_DH, 0}; > > - static const int kx_x509[] = {GNUTLS_KX_DHE_DSS, GNUTLS_KX_RSA, > > GNUTLS_KX_DHE_RSA, GNUTLS_KX_SRP, 0}; > > > > VNC_DEBUG("Do TLS setup\n"); > > if (vnc_tls_initialize() < 0) { > > @@ -310,21 +306,7 @@ int vnc_tls_client_setup(struct VncState *vs, > > return -1; > > } > > > > - if (gnutls_kx_set_priority(vs->tls.session, needX509Creds ? > > kx_x509 : kx_anon) < 0) { > > - gnutls_deinit(vs->tls.session); > > - vs->tls.session = NULL; > > - vnc_client_error(vs); > > - return -1; > > - } > > - > > - if (gnutls_certificate_type_set_priority(vs->tls.session, > > cert_type_priority) < 0) { > > - gnutls_deinit(vs->tls.session); > > - vs->tls.session = NULL; > > - vnc_client_error(vs); > > - return -1; > > - } > > - > > - if (gnutls_protocol_set_priority(vs->tls.session, > > protocol_priority) < 0) { > > + if (gnutls_priority_set_direct(vs->tls.session, needX509Creds ? 
> > "NORMAL" : "NORMAL:+ANON-DH", NULL) < 0) { > > gnutls_deinit(vs->tls.session); > > vs->tls.session = NULL; > > vnc_client_error(vs); > > -- > > 1.7.6 > > Daniel, > This patch looks good to me but I don't know much about gnutls or > crypto in general. Would you be willing to review this? ACK, this approach is different from what I did in libvirt, but it matches the recommendations in the GNUTLS manual for setting priority, so I believe it is good. Signed-off-by: Daniel P. Berrange Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: DMI BIOS String
On Mon, Aug 22, 2011 at 03:52:19PM +1200, Derek wrote: > Hi Folks, > > I could not track down any solid info on modifying the DMI BIOS string. > > For example, in VirtualBox you can use 'vboxmanage setsextradata' to > set the BIOS product and vendor string per VM. > > Any ideas if this is possible with KVM? If using QEMU directly you can use '-smbios' args. eg -smbios "type=0,vendor=LENOVO,version=6FET82WW (3.12 )" -smbios "type=1,manufacturer=Fedora,product=Virt-Manager,version=0.8.2-3.fc14,serial=32dfcb37-5af1-552b-357c-be8c3aa38310,uuid=c7a5fdbd-edaf-9455-926a-d65c16db1809,sku=1234567890,family=Red Hat" If using QEMU via libvirt you can use the following: http://libvirt.org/formatdomain.html#elementsSysinfo Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
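For the libvirt route, the domain XML equivalent of the -smbios example above is a sysinfo block, enabled from the os element (an illustrative fragment reusing the same sample strings):

```xml
<os>
  <type arch='x86_64'>hvm</type>
  <smbios mode='sysinfo'/>
</os>
<sysinfo type='smbios'>
  <bios>
    <entry name='vendor'>LENOVO</entry>
    <entry name='version'>6FET82WW (3.12 )</entry>
  </bios>
  <system>
    <entry name='manufacturer'>Fedora</entry>
    <entry name='product'>Virt-Manager</entry>
    <entry name='version'>0.8.2-3.fc14</entry>
  </system>
</sysinfo>
```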
[PATCH master+STABLE-0.15] Fix default accelerator when configured with --disable-kvm
From: "Daniel P. Berrange" The default accelerator is hardcoded to 'kvm'. This is a fine default for qemu-kvm normally, but if the user built with ./configure --disable-kvm, then the resulting binaries will not work by default * vl.c: Default to 'tcg' unless CONFIG_KVM is defined Signed-off-by: Daniel P. Berrange --- vl.c |5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/vl.c b/vl.c index 7ae549e..28fd2f3 100644 --- a/vl.c +++ b/vl.c @@ -1953,8 +1953,13 @@ static int configure_accelerator(void) } if (p == NULL) { +#ifdef CONFIG_KVM /* Use the default "accelerator", kvm */ p = "kvm"; +#else +/* Use the default "accelerator", tcg */ +p = "tcg"; +#endif } while (!accel_initalised && *p != '\0') { -- 1.7.6 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Introduce panic hypercall
On Mon, Jun 20, 2011 at 06:31:23PM +0300, Avi Kivity wrote: > On 06/20/2011 04:38 PM, Daniel Gollub wrote: > >Introduce panic hypercall to enable the crashing guest to notify the > >host. This enables the host to run some actions as soon a guest > >crashed (kernel panic). > > > >This patch series introduces the panic hypercall at the host end. > >As well as the hypercall for KVM paravirtuliazed Linux guests, by > >registering the hypercall to the panic_notifier_list. > > > >The basic idea is to create KVM crashdump automatically as soon the > >guest paniced and power-cycle the VM (e.g. libvirt). > > This would be more easily done via a "panic device" (I/O port or > memory-mapped address) that the guest hits. It would be intercepted > by qemu without any new code in kvm. > > However, I'm not sure I see the gain. Most enterprisey guests > already contain in-guest crash dumpers which provide more > information than a qemu memory dump could, since they know exact > load addresses etc. and are integrated with crash analysis tools. > What do you have in mind? Well libvirt can capture a "core" file by doing 'virsh dump $GUESTNAME'. This actually uses the QEMU monitor migration command to capture the entirety of QEMU memory. The 'crash' command line tool actually knows how to analyse this data format as it would a normal kernel crashdump. I think having a way for a guest OS to notify the host that it has crashed would be useful. libvirt could automatically do a crash dump of the QEMU memory, or at least pause the guest CPUs and notify the management app of the crash, which can then decide what to do. You can also use tools like 'virt-dmesg' which uses libvirt to peek into guest memory to extract the most recent kernel dmesg logs (even if the guest OS itself is crashed & didn't manage to send them out via netconsole or something else). 
This series does need to introduce a QMP event notification upon crash, so that the crash notification can be propagated to mgmt layers above QEMU. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
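Such a QMP event would arrive as an asynchronous message on the monitor socket. A hypothetical shape for it (the event name and fields here are illustrative, not something this series defines):

```json
{ "event": "GUEST_PANICKED",
  "data": { "action": "pause" },
  "timestamp": { "seconds": 1308582843, "microseconds": 0 } }
```

A management layer like libvirt would subscribe to the monitor, and on receipt trigger the dump / pause / notify policy described above.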
Re: [Qemu-devel][RFC]QEMU disk I/O limits
On Tue, May 31, 2011 at 10:10:37AM -0400, Vivek Goyal wrote: > On Tue, May 31, 2011 at 02:56:46PM +0100, Daniel P. Berrange wrote: > > On Tue, May 31, 2011 at 09:45:37AM -0400, Vivek Goyal wrote: > > > On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote: > > > > Hello, all, > > > > > > > > I have prepared to work on a feature called "Disk I/O limits" for > > > > qemu-kvm projeect. > > > > This feature will enable the user to cap disk I/O amount performed > > > > by a VM.It is important for some storage resources to be shared among > > > > multi-VMs. As you've known, if some of VMs are doing excessive disk > > > > I/O, they will hurt the performance of other VMs. > > > > > > > > > > Hi Zhiyong, > > > > > > Why not use kernel blkio controller for this and why reinvent the wheel > > > and implement the feature again in qemu? > > > > The finest level of granularity offered by cgroups apply limits per QEMU > > process. So the blkio controller can't be used to apply controls directly > > to individual disks used by QEMU, only the VM as a whole. > > So are multiple VMs using same disk. Then put multiple VMs in same > cgroup and apply the limit on that disk. > > Or if you want to put a system wide limit on a disk, then put all > VMs in root cgroup and put limit on root cgroups. > > I fail to understand what's the exact requirement here. I thought > the biggest use case was isolation one VM from other which might > be sharing same device. Hence we were interested in putting > per VM limit on disk and not a system wide limit on disk (independent > of VM). No, it isn't about putting limits on a disk independent of a VM. It is about one VM having multiple disks, and wanting to set different policies for each of its virtual disks. e.g. qemu-kvm -drive file=/dev/sda1 -drive file=/dev/sdb3 and wanting to say that sda1 is limited to 10 MB/s, while sdb3 is limited to 50 MB/s. 
You can't do that kind of thing with cgroups, because it can only control the entire process, not individual resources within the process. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
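The per-drive limits being discussed are essentially one token bucket per virtual disk. A minimal sketch of the idea (illustrative only; the class name and exact algorithm are not from this thread, and QEMU's eventual implementation differs):

```python
class TokenBucket:
    """One bucket per virtual disk: requests proceed while tokens
    (bytes) remain; tokens refill at the configured rate."""
    def __init__(self, rate, burst, now=0.0):
        self.rate = float(rate)    # bytes refilled per second
        self.burst = float(burst)  # bucket capacity in bytes
        self.tokens = float(burst)
        self.last = now

    def allow(self, nbytes, now):
        # refill for elapsed time, capped at the burst size
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # caller must queue or delay the request

# e.g. the sda1 drive above capped at 10 MB/s
sda1 = TokenBucket(rate=10 << 20, burst=10 << 20)
print(sda1.allow(8 << 20, now=0.0))  # True: within the initial burst
print(sda1.allow(8 << 20, now=0.1))  # False: only ~1 MB refilled since
```

Because each drive gets its own bucket, sda1 and sdb3 can carry different rates inside one QEMU process, which is exactly what a per-process cgroup controller cannot express.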
Re: [Qemu-devel][RFC]QEMU disk I/O limits
On Tue, May 31, 2011 at 09:45:37AM -0400, Vivek Goyal wrote: > On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote: > > Hello, all, > > > > I have prepared to work on a feature called "Disk I/O limits" for > > qemu-kvm projeect. > > This feature will enable the user to cap disk I/O amount performed by a > > VM.It is important for some storage resources to be shared among multi-VMs. > > As you've known, if some of VMs are doing excessive disk I/O, they will > > hurt the performance of other VMs. > > > > Hi Zhiyong, > > Why not use kernel blkio controller for this and why reinvent the wheel > and implement the feature again in qemu? The finest level of granularity offered by cgroups applies limits per QEMU process. So the blkio controller can't be used to apply controls directly to individual disks used by QEMU, only the VM as a whole. With networking we can use the 'net_cls' cgroups controller for the process as a whole, or attach 'tc' to individual TAP devices for per-NIC throttling, both of which ultimately use the same kernel functionality. I don't see an equivalent option for throttling individual disks that would reuse functionality from the blkio controller. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: drop -enable-nesting
On Mon, May 30, 2011 at 06:19:14PM +0300, Avi Kivity wrote: > On 05/30/2011 06:15 PM, Jan Kiszka wrote: > >On 2011-05-30 17:10, Roedel, Joerg wrote: > >> On Mon, May 30, 2011 at 11:04:02AM -0400, Jan Kiszka wrote: > >>> On 2011-05-30 16:38, Nadav Har'El wrote: > On Mon, May 30, 2011, Jan Kiszka wrote about "drop -enable-nesting > (was: [PATCH 3/7] cpu model bug fixes and definition corrections...)": > > On 2011-05-30 10:18, Roedel, Joerg wrote: > >> On Sat, May 28, 2011 at 04:39:13AM -0400, Jan Kiszka wrote: > >> > >>> Jörg, how to deal with -enable-nesting in qemu-kvm to align behavior > >>> with upstream? > >> > >> My personal preference is to just remove it. In upstream-qemu it is > >> enabled/disabled by +/-svm. -enable-nesting is just a historic thing > >> which can be wiped out. > > "-enable-nesting" could remain as a synonym for enabling either VMX or > SVM > in the guest, depending on what was available in the host (because KVM > now > supports both nested SVM and nested VMX, but not SVM-on-VMX or vice > versa). > >>> > >>> Why? Once nesting is stable (I think SVM already is), there is no reason > >>> for an explicit enable. And you can always mask it out via -cpu. > >>> > >>> BTW, what are the defaults for SVM right now in qemu-kvm and upstream? > >>> Enable if the modeled CPU supports it? > >> > >> qemu-kvm still needs -enable-nesting, otherwise it is disabled. Upstream > >> qemu should enable it unconditionally (can be disabled with -cpu ,-svm). > > > >Then let's start with aligning qemu-kvm defaults to upstream? I guess > >that's what the diff I was citing yesterday is responsible for. > > > >In the same run, -enable-nesting could dump a warning on the console > >that this switch is obsolete and will be removed from future versions. > > I think it's safe to drop -enable-nesting immediately. Dan, does > libvirt make use of it? Yes, but it should be safe to drop it. 
Currently, if the user specifies a CPU with the 'svm' flag present in libvirt guest XML, then we will pass args '-cpu +svm -enable-nesting'. So if we drop --enable-nesting, then libvirt will simply omit it and everything should still work because we have still got +svm set. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
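For context, the guest XML that makes libvirt emit those arguments is the CPU feature element, roughly like this illustrative fragment of libvirt domain XML (model name chosen arbitrarily):

```xml
<cpu match='exact'>
  <model>qemu64</model>
  <feature policy='require' name='svm'/>
</cpu>
```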
Re: [Qemu-devel] [PATCH 2/2 V7] qemu,qmp: add inject-nmi qmp command
On Wed, Apr 13, 2011 at 10:56:21PM +0300, Blue Swirl wrote: > On Wed, Apr 13, 2011 at 4:08 PM, Luiz Capitulino > wrote: > > On Tue, 12 Apr 2011 21:31:18 +0300 > > Blue Swirl wrote: > > > >> On Tue, Apr 12, 2011 at 10:52 AM, Avi Kivity wrote: > >> > On 04/11/2011 08:15 PM, Blue Swirl wrote: > >> >> > >> >> On Mon, Apr 11, 2011 at 10:01 AM, Markus Armbruster > >> >> wrote: > >> >> > Avi Kivity writes: > >> >> > > >> >> >> On 04/08/2011 12:41 AM, Anthony Liguori wrote: > >> >> >>> > >> >> >>> And it's a good thing to have, but exposing this as the only API to > >> >> >>> do something as simple as generating a guest crash dump is not the > >> >> >>> friendliest thing in the world to do to users. > >> >> >> > >> >> >> nmi is a fine name for something that corresponds to a real-life nmi > >> >> >> button (often labeled "NMI"). > >> >> > > >> >> > Agree. > >> >> > >> >> We could also introduce an alias mechanism for user friendly names, so > >> >> nmi could be used in addition of full path. Aliases could be useful > >> >> for device paths as well. > >> > > >> > Yes. Perhaps limited to the human monitor. > >> > >> I'd limit all debugging commands (including NMI) to the human monitor. > > > > Why? > > Do they have any real use in production environment? Also, we should > have the freedom to change the debugging facilities (for example, to > improve some internal implementation) as we want without regard to > compatibility to previous versions. We have users of libvirt requesting that we add an API for triggering a NMI. They want this for support in a production environment, to be able to initiate Windows crash dumps. We really don't want to have to use HMP passthrough for this, instead of a proper QMP command. More generally I don't want to see stuff in HMP, that isn't in the QMP. 
We already have far too much that we have to do via HMP passthrough in libvirt due to lack of QMP commands, to the extent that we might as well have just ignored QMP and continued with HMP for everything. If we want the flexibility to change the debugging commands between releases then we should come up with a plan to do this within the scope of QMP, not restrict them to HMP only. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 2/2 V7] qemu,qmp: add inject-nmi qmp command
On Mon, Mar 07, 2011 at 05:46:28PM +0800, Lai Jiangshan wrote: > From: Lai Jiangshan > Date: Mon, 7 Mar 2011 17:05:15 +0800 > Subject: [PATCH 2/2] qemu,qmp: add inject-nmi qmp command > > inject-nmi command injects an NMI on all CPUs of guest. > It is only supported for x86 guest currently, it will > returns "Unsupported" error for non-x86 guest. > > --- > hmp-commands.hx |2 +- > monitor.c | 18 +- > qmp-commands.hx | 29 + > 3 files changed, 47 insertions(+), 2 deletions(-) Does anyone have any feedback on this addition, or are all new QMP patch proposals blocked pending Anthony's QAPI work ? We'd like to support it in libvirt and thus want it to be available in QMP, as well as HMP. > @@ -2566,6 +2566,22 @@ static void do_inject_nmi(Monitor *mon, const QDict > *qdict) > break; > } > } > + > +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject > **ret_data) > +{ > +CPUState *env; > + > +for (env = first_cpu; env != NULL; env = env->next_cpu) > +cpu_interrupt(env, CPU_INTERRUPT_NMI); > + > +return 0; > +} > +#else > +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject > **ret_data) > +{ > +qerror_report(QERR_UNSUPPORTED); > +return -1; > +} > #endif > Interesting that with HMP you need to specify a single CPU index, but with QMP it is injecting to all CPUs at once. Is there any compelling reason why we'd ever need the ability to only inject to a single CPU from an app developer POV ? Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
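For context, a QMP client would presumably drive the proposed command roughly like this. This is a sketch only: the socket path and the use of socat are illustrative assumptions; "inject-nmi" is the command name taken from the patch, and qmp_capabilities is the standard QMP negotiation step.

```shell
# Hypothetical QMP socket path - substitute the real one.
QMP_SOCK=/var/lib/libvirt/qemu/guest.qmp

# A QMP session must negotiate capabilities before issuing commands.
OUT=$(printf '%s\n' \
  '{ "execute": "qmp_capabilities" }' \
  '{ "execute": "inject-nmi" }')
echo "$OUT"

# In real use the two lines would be fed into the socket, e.g.:
#   echo "$OUT" | socat - UNIX-CONNECT:"$QMP_SOCK"
```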
Re: [libvirt] [Qemu-devel] KVM call minutes for Mar 15
On Tue, Mar 15, 2011 at 12:06:06PM -0700, Chris Wright wrote: > * Anthony Liguori (anth...@codemonkey.ws) wrote: > > On 03/15/2011 09:53 AM, Chris Wright wrote: > > > QAPI > > > >- c library implementation is critical to have unit tests and test > > > driven development > > > - thread safe? > > > - no shared state, no statics. > > > - threading model requires lock for the qmp session > > > - licensing? > > > - LGPL > > > - forwards/backwards compat? > > > - designed with that in mind see wiki: > > > > > > http://wiki.qemu.org/Features/QAPI > > > > One neat feature of libqmp is that once libvirt has a better QMP > > passthrough interface, we can create a QmpSession that uses libvirt. > > > > It would look something like: > > > > QmpSession *libqmp_session_new_libvirt(virDomainPtr dom); > > Looks like you mean this? > >-> request QmpSession -> > client libvirt ><- return QmpSession <- > > client -> QmpSession -> QMP -> QEMU > > So bypassing libvirt completely to actually use the session? > > Currently, it's more like: > > client -> QemuMonitorCommand -> libvirt -> QMP -> QEMU > > > The QmpSession returned by this call can then be used with all of > > the libqmp interfaces. This means we can still exercise our test > > suite with a guest launched through libvirt. It also should make > > the libvirt pass through interface a bit easier to consume by third > > parties. > > This sounds like it's something libvirt folks should be involved with. > At the very least, this mode is there now and considered basically > unstable/experimental/developer use: > > "Qemu monitor command '%s' executed; libvirt results may be unpredictable!" > > So likely some concern about making it easier to use, esp. assuming > that third parties above are mgmt apps, not just developers.

Although we provide monitor and command line passthrough in libvirt, our recommendation is that mgmt apps do not develop against these APIs.
Our goal / policy is that apps should be able to do anything they need using the formally modelled libvirt public APIs. The primary intended usage for the monitor/command line passthrough is debugging & experimentation, and as a very short-term hack/workaround for mgmt apps while formal APIs are added to libvirt. In other words, we provide the feature because we don't want libvirt to be a roadblock, but we still strongly discourage its usage until all other options have been exhausted.

In the same way as loading binary-only modules into the kernel sets a 'tainted' flag, we plan that direct usage of monitor/command line passthrough will set a tainted flag against a VM. This will allow distro maintainers to identify usage & decide how they wish to support these features in products (if at all).

Regards, Daniel
Re: Configuring the bridge interface: why assign an IP?
On Mon, Mar 14, 2011 at 11:24:40AM -0600, Ben Beuchler wrote: > Most of the examples for setting up the bridge interface on a VM host > suggest assigning the IP address to the bridge. Assigning the IP to > the bridge leaves you open to the MAC address of the bridge changing > as you add/remove guests from the host, resulting in a brief (~20 > second) loss of connectivity to the host. (I am aware that I can > manually set the MAC of the bridge to avoid unexpected changes. That's > my current workaround.)

You don't need to manually set a MAC on the bridge - indeed you can't set an arbitrary MAC on it - it must have a MAC that matches one of the enslaved interfaces. The key is that the MAC of the enslaved ethernet device should be numerically smaller than that of any guest TAP devices. The kernel gives TAP devices a completely random MAC by default, so you need to make a little change to that. Two options:

 - Take the random host TAP device MAC and simply set the first byte to 0xFE
 - Take the guest NIC MAC, set the first byte to 0xFE and give that to the host TAP device

Recent releases of libvirt follow the second approach and it has worked out well, eliminating any connectivity loss on guest startup/shutdown.

Daniel
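A minimal sketch of the second option. The guest MAC below is a made-up example, and the tap device name is an assumption:

```shell
# Derive a host TAP MAC from the guest NIC MAC by forcing the first
# byte to 0xFE, so the TAP MAC always sorts above the enslaved NIC's
# MAC and the bridge keeps the NIC's address.
GUEST_MAC="52:54:00:12:34:56"      # example guest MAC
TAP_MAC="fe:${GUEST_MAC#*:}"       # replace the first byte with fe
echo "$TAP_MAC"

# Then assign it to the tap device before enslaving it, e.g.:
#   ip link set dev tap0 address "$TAP_MAC"
```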
Re: Problem with bridged tap interface
On Wed, Feb 23, 2011 at 12:34:45PM +0100, andreas.a...@de.transport.bombardier.com wrote: > Hi all, > > sorry for the previous partial e-mail, I hit the send button accidentally > ;-). > > I have a setup with a kvm-based virtual machine running a stock RedHat 6.1 > (yes, that old) on a rather current debian host. > > 1. uname in host: 2.6.26-2-amd64 #1 SMP Wed May 12 18:03:14 UTC 2010 > x86_64 GNU/Linux > > 2. uname in guest: 2.2.12-20 #1 Mon Sep 27 10:40:35 EDT 1999 i686 unknown > > eth0 of the guest is connected via tap0 to a kernel bridge, that is in > turn connected via the host's eth1 to a Gigabit link. On the kvm > command-line I configure the guest-nic as "model=ne2k_pci". > > The problem is, that I frequently loose network access from/to the guest. There have been QEMU NIC model implementation bugs that exhibit that characteristic. If you have the drivers available in the guest, then I'd recommend trying out different NIC models than ne2k, since that's probably the least actively maintained NIC model. At least try rtl8139, but ideally the e1000 too. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
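For illustration, switching models is just a matter of the model= property on the NIC option. The disk image and tap name in this command line are placeholders, and the exact option syntax varies between QEMU versions:

```shell
# Dry-run sketch: build and print the command rather than run it.
MODEL=rtl8139      # try e1000 next if rtl8139 doesn't help
CMDLINE="kvm -hda guest.img -net nic,model=$MODEL -net tap,ifname=tap0"
echo "$CMDLINE"
```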
Re: [Qemu-devel] [PATCH 02/18] Introduce read() to FdMigrationState.
On Thu, Feb 10, 2011 at 07:23:33PM +0900, Yoshiaki Tamura wrote: > 2011/2/10 Daniel P. Berrange : > > On Thu, Feb 10, 2011 at 10:54:01AM +0100, Anthony Liguori wrote: > >> On 02/10/2011 10:30 AM, Yoshiaki Tamura wrote: > >> >Currently FdMigrationState doesn't support read(), and this patch > >> >introduces it to get response from the other side. > >> > > >> >Signed-off-by: Yoshiaki Tamura > >> > >> Migration is unidirectional. Changing this is fundamental and not > >> something to be done lightly. > > > > Making it bi-directional might break libvirt's save/restore > > to file support which uses migration, passing a unidirectional > > FD for the file. It could also break libvirt's secure tunnelled > > migration support which is currently only expecting to have > > data sent in one direction on the socket. > > Hi Daniel, > > IIUC, this patch isn't something to make existing live migration > bi-directional. Just opens up a way for Kemari to use it. Do > you think it's dangerous for libvirt still? The key is for it to be a no-op for any usage of the existing 'migrate' command. I had thought this was wiring up read into the event loop too, so it would be poll()ing for reads, but after re-reading I see this isn't the case here. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 02/18] Introduce read() to FdMigrationState.
On Thu, Feb 10, 2011 at 10:54:01AM +0100, Anthony Liguori wrote: > On 02/10/2011 10:30 AM, Yoshiaki Tamura wrote: > >Currently FdMigrationState doesn't support read(), and this patch > >introduces it to get response from the other side. > > > >Signed-off-by: Yoshiaki Tamura > > Migration is unidirectional. Changing this is fundamental and not > something to be done lightly. Making it bi-directional might break libvirt's save/restore to file support which uses migration, passing a unidirectional FD for the file. It could also break libvirt's secure tunnelled migration support which is currently only expecting to have data sent in one direction on the socket. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PCI Passthrough, error: The driver 'pci-stub' is occupying your device 0000:08:06.2
On Sat, Feb 05, 2011 at 04:34:01PM +0000, James Neave wrote: > Hi, > > I'm trying to pass a NOVA-T-500 TV Tuner card through to a guest VM. > I'm getting the error "The driver 'pci-stub' is occupying your device > 0000:08:06.2"

This is a rather misleading error message. It is *expected* that pci-stub will occupy the device. Unfortunately the rest of the error messages QEMU is printing aren't much help either, but ultimately something is returning -EBUSY in the PCI device assign step.

Regards, Daniel
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On Thu, Jan 20, 2011 at 09:44:05AM +0100, Gerd Hoffmann wrote: > Hi, > > >For (2), you cannot use bus=X,addr=Y because it makes assumptions about > >the PCI topology which may change in newer -M pc's. > > Why should the PCI topology for 'pc' ever change? > > We'll probably get q35 support some day, but when this lands I > expect we'll see a new machine type 'q35', so '-m q35' will pick the > ich9 chipset (which will have a different pci topology of course) > and '-m pc' will pick the existing piix chipset (which will continue > to look like it looks today).

If the topology does ever change (eg in the kind of way Anthony suggests, first bus only has the graphics card), I think libvirt is going to need a little work to adapt to the new topology, regardless of whether we currently specify a bus= arg to -device or not. I'm not sure there's anything we could do to future-proof us against that kind of change.

Regards, Daniel
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On Wed, Jan 19, 2011 at 11:42:18AM -0600, Anthony Liguori wrote: > On 01/19/2011 11:35 AM, Daniel P. Berrange wrote: > >On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote: > >>On 01/19/2011 03:48 AM, Gerd Hoffmann wrote: > >>>On 01/18/11 18:09, Anthony Liguori wrote: > >>>>On 01/18/2011 10:56 AM, Jan Kiszka wrote: > >>>>>>The device model topology is 100% a hidden architectural detail. > >>>>>This is true for the sysbus, it is obviously not the case for PCI and > >>>>>similarly discoverable buses. There we have a guest-explorable topology > >>>>>that is currently equivalent to the the qdev layout. > >>>>But we also don't do PCI passthrough so we really haven't even explored > >>>>how that maps in qdev. I don't know if qemu-kvm has attempted to > >>>>qdev-ify it. > >>>It is qdev-ified. It is a normal pci device from qdev's point of view. > >>> > >>>BTW: is there any reason why (vfio-based) pci passthrough couldn't > >>>work with tcg? > >>> > >>>>The -device interface is a stable interface. Right now, you don't > >>>>specify any type of identifier of the pci bus when you create a PCI > >>>>device. It's implied in the interface. > >>>Wrong. You can specify the bus you want attach the device to via > >>>bus=. This is true for *every* device, including all pci > >>>devices. If unspecified qdev uses the first bus it finds. > >>> > >>>As long as there is a single pci bus only there is simply no need > >>>to specify it, thats why nobody does that today. > >>Right. In terms of specifying bus=, what are we promising re: > >>compatibility? Will there always be a pci.0? If we add some > >>PCI-to-PCI bridges in order to support more devices, is libvirt > >>support to parse the hierarchy and figure out which bus to put the > >>device on? > >The answer to your questions probably differ depending on > >whether '-nodefconfig' and '-nodefaults' are set on the > >command line. 
If they are set, then I'd expect to only > >ever see one PCI bus with name pci.0 forever more, unless > >i explicitly ask for more. If they are not set, then you > >might expect to see multiple PCI buses by appear by magic > > Yeah, we can't promise that. If you use -M pc, you aren't > guaranteed a stable PCI bus topology even with > -nodefconfig/-nodefaults. That's why we never use '-M pc' when actually invoking QEMU. If the user specifies 'pc' in the XML, we canonicalize that to the versioned alternative like 'pc-0.12' before invoking QEMU. We also expose the list of versioned machines to apps so they can do canonicalization themselves if desired. Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On Wed, Jan 19, 2011 at 11:51:58AM -0600, Anthony Liguori wrote: > On 01/19/2011 11:01 AM, Daniel P. Berrange wrote: > > > >The reason we specify 'bus' is that we wanted to be flexible wrt > >upgrades of libvirt, without needing restarts of QEMU instances > >it manages. That way we can introduce new functionality into > >libvirt that relies on it having previously set 'bus' on all > >active QEMUs. > > > >If QEMU adds PCI-to-PCI bridges, then I wouldn't expect QEMU to > >be adding the extra bridges. I'd expect that QEMU provided just > >the first bridge and then libvirt would specify how many more > >bridges to create at boot or hotplug them later. So it wouldn't > >ever need to parse topology. > > Yeah, but replacing the main chipset will certainly change the PCI > topology such that if you're specifying bus=X and addr=X and then > also using -M pc, unless you're parsing the default topology to come > up with the addressing, it will break in the future. We never use a bare '-M pc' though, we always canonicalize to one of the versioned forms. So if we run '-M pc-0.12', then neither the main PCI chipset nor topology would have changed in newer QEMU. Of course if we deployed a new VM with '-M pc-0.20' that might have new PCI chipset, so bus=pci.0 might have different meaning that it did when used with '-M pc-0.12', but I don't think that's an immediate problem Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote: > On 01/19/2011 03:48 AM, Gerd Hoffmann wrote: > >On 01/18/11 18:09, Anthony Liguori wrote: > >>On 01/18/2011 10:56 AM, Jan Kiszka wrote: > >>> > The device model topology is 100% a hidden architectural detail. > >>>This is true for the sysbus, it is obviously not the case for PCI and > >>>similarly discoverable buses. There we have a guest-explorable topology > >>>that is currently equivalent to the the qdev layout. > >> > >>But we also don't do PCI passthrough so we really haven't even explored > >>how that maps in qdev. I don't know if qemu-kvm has attempted to > >>qdev-ify it. > > > >It is qdev-ified. It is a normal pci device from qdev's point of view. > > > >BTW: is there any reason why (vfio-based) pci passthrough couldn't > >work with tcg? > > > >>The -device interface is a stable interface. Right now, you don't > >>specify any type of identifier of the pci bus when you create a PCI > >>device. It's implied in the interface. > > > >Wrong. You can specify the bus you want attach the device to via > >bus=. This is true for *every* device, including all pci > >devices. If unspecified qdev uses the first bus it finds. > > > >As long as there is a single pci bus only there is simply no need > >to specify it, thats why nobody does that today. > > Right. In terms of specifying bus=, what are we promising re: > compatibility? Will there always be a pci.0? If we add some > PCI-to-PCI bridges in order to support more devices, is libvirt > support to parse the hierarchy and figure out which bus to put the > device on? The answer to your questions probably differ depending on whether '-nodefconfig' and '-nodefaults' are set on the command line. If they are set, then I'd expect to only ever see one PCI bus with name pci.0 forever more, unless i explicitly ask for more. 
If they are not set, then you might expect to see multiple PCI buses appear by magic.

Daniel
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On Wed, Jan 19, 2011 at 10:54:10AM -0600, Anthony Liguori wrote: > On 01/19/2011 07:11 AM, Markus Armbruster wrote: > >Gerd Hoffmann writes: > > > >>On 01/18/11 18:09, Anthony Liguori wrote: > >>>On 01/18/2011 10:56 AM, Jan Kiszka wrote: > >The device model topology is 100% a hidden architectural detail. > This is true for the sysbus, it is obviously not the case for PCI and > similarly discoverable buses. There we have a guest-explorable topology > that is currently equivalent to the the qdev layout. > >>>But we also don't do PCI passthrough so we really haven't even explored > >>>how that maps in qdev. I don't know if qemu-kvm has attempted to > >>>qdev-ify it. > >>It is qdev-ified. It is a normal pci device from qdev's point of view. > >> > >>BTW: is there any reason why (vfio-based) pci passthrough couldn't > >>work with tcg? > >> > >>>The -device interface is a stable interface. Right now, you don't > >>>specify any type of identifier of the pci bus when you create a PCI > >>>device. It's implied in the interface. > >>Wrong. You can specify the bus you want attach the device to via > >>bus=. This is true for *every* device, including all pci > >>devices. If unspecified qdev uses the first bus it finds. > >> > >>As long as there is a single pci bus only there is simply no need to > >>specify it, thats why nobody does that today. Once q35 finally > >>arrives this will change of course. > >As far as I know, libvirt does it already. > > I think that's a bad idea from a forward compatibility perspective. In our past experiance though, *not* specifying attributes like these has also been pretty bad from a forward compatibility perspective too. We're kind of damned either way, so on balance we decided we'd specify every attribute in qdev that's related to unique identification of devices & their inter-relationships. By strictly locking down the topology we were defining, we ought to have a more stable ABI in face of future changes. 
I accept this might not always work out, so we may have to adjust things over time still. Predicting the future is hard :-)

Daniel
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote: > On 01/19/2011 03:48 AM, Gerd Hoffmann wrote: > >On 01/18/11 18:09, Anthony Liguori wrote: > >>On 01/18/2011 10:56 AM, Jan Kiszka wrote: > >>> > The device model topology is 100% a hidden architectural detail. > >>>This is true for the sysbus, it is obviously not the case for PCI and > >>>similarly discoverable buses. There we have a guest-explorable topology > >>>that is currently equivalent to the the qdev layout. > >> > >>But we also don't do PCI passthrough so we really haven't even explored > >>how that maps in qdev. I don't know if qemu-kvm has attempted to > >>qdev-ify it. > > > >It is qdev-ified. It is a normal pci device from qdev's point of view. > > > >BTW: is there any reason why (vfio-based) pci passthrough couldn't > >work with tcg? > > > >>The -device interface is a stable interface. Right now, you don't > >>specify any type of identifier of the pci bus when you create a PCI > >>device. It's implied in the interface. > > > >Wrong. You can specify the bus you want attach the device to via > >bus=. This is true for *every* device, including all pci > >devices. If unspecified qdev uses the first bus it finds. > > > >As long as there is a single pci bus only there is simply no need > >to specify it, thats why nobody does that today. > > Right. In terms of specifying bus=, what are we promising re: > compatibility? Will there always be a pci.0? If we add some > PCI-to-PCI bridges in order to support more devices, is libvirt > support to parse the hierarchy and figure out which bus to put the > device on? The reason we specify 'bus' is that we wanted to be flexible wrt upgrades of libvirt, without needing restarts of QEMU instances it manages. That way we can introduce new functionality into libvirt that relies on it having previously set 'bus' on all active QEMUs. If QEMU adds PCI-to-PCI bridges, then I wouldn't expect QEMU to be adding the extra bridges. 
I'd expect that QEMU provided just the first bridge and then libvirt would specify how many more bridges to create at boot or hotplug them later. So it wouldn't ever need to parse topology. Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] VM stuck in interrupt-loop after suspend to/resumed from file, or no interrupts at all
On Wed, Jan 12, 2011 at 03:51:13PM +0100, Philipp Hahn wrote: > Hello, > > libvirt implements a manages save, which suspens a VM to a file, from which > it > can be resumed later. This uses Qemus/Kvms "migrate exec:" feature. > This doesn't work reliable for me: In may cases the resumed VM seems to be > stuck: its VNC console is restored, but no key presses or network packages > are accepted. This both happens with Windows XP, 7, 2008 and Linux 2.6.32 > systems. > > Using the debugging cycle described below in more detail I was able to track > the problem down to interrupt handling: Either the Linux-guest-kernel > constantly receives an interrupt for the 8139cp network adapter, or no > interrupts at all (neither network nor keyboard nor timer); only sending a > NMI works and shows that at least the Linux-Kernel is still alive. > > If I add the -no-kvm-irqchip Option, it seems to work; I was not able to > reproduce a hang. I remember reporting a bug with that scenario 4/5 months back and it being fixed in the host kernel IIRC. Daniel -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [libvirt] cgroup limits only affect kvm guest under certain conditions
On Thu, Jan 06, 2011 at 02:15:37PM +0100, Dominik Klein wrote: > Hi > > I am playing with cgroups and try to limit block io for guests. > > The proof of concept is: > > # mkdir /dev/cgroup/blkio > # mount -t cgroup -o blkio blkio /dev/cgroup/blkio/ > # cd blkio/ > # mkdir test > # cd test/ > # ls -l /dev/vdisks/kirk > lrwxrwxrwx 1 root root 7 2011-01-06 13:46 /dev/vdisks/kirk -> ../dm-5 > # ls -l /dev/dm-5 > brw-rw 1 root disk 253, 5 2011-01-06 13:36 /dev/dm-5 > # echo "253:5 1048576" > blkio.throttle.write_bps_device > # echo $$ > tasks > # dd if=/dev/zero of=/dev/dm-5 bs=1M count=20 > 20+0 records in > 20+0 records out > 20971520 bytes (21 MB) copied, 20.0223 s, 1.0 MB/s > > So limit applies to the dd child of my shell. > > Now I assign /dev/dm-5 (/dev/vdisks/kirk) to a vm and echo the qemu-kvm > pid into tasks. Limits are not applied, the guest can happily use max io > bandwidth. Did you just echo the main qemu-kvm PID, or did you also add the PIDs of every thread too ? From this description of the problem, I'd guess you've only confined the main process thread and thus the I/O & VCPU threads are not confined. Daniel -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
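A sketch of what confining every thread looks like, following the cgroup layout set up above. The shell's own pid is used here as a stand-in for the real qemu-kvm pid, and the actual write into the tasks file is left commented:

```shell
# Each qemu-kvm thread (I/O, VCPUs) has its own entry under
# /proc/<pid>/task; every one of them must be written into the
# cgroup's tasks file, not just the main pid.
QEMU_PID=$$                       # stand-in: use the real qemu-kvm pid
for t in /proc/"$QEMU_PID"/task/*; do
    tid=${t##*/}
    echo "$tid"
    # echo "$tid" > /dev/cgroup/blkio/test/tasks   # the actual confinement
done
```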
Re: qemu-kvm-0.13.0 - winsows 2008 - chkdisk too slow
On Thu, Jan 06, 2011 at 12:19:21PM +0200, Avi Kivity wrote: > On 01/06/2011 11:42 AM, Nikola Ciprich wrote: > >> - run trace-cmd record -e kvm -b 10 -P pid1 -P pid2, ctrl-C after a > >seems like it's not possible to specify multiple pids, so > > Did you get 'overrun: something' reports from trace-cmd, where > something != 0? > > If you're not sure, please run the trace again. Also try adding '-r > 10' to the command line. > > >I've run 4 commands in parallel. Also I can't get monitor information > >since vm is started using libvirt, so I've just used all machine's qemu-kvm > >pids.. > > Dan, is there a way to hijack the monitor so we can run some > commands on it? Things like 'info registers' and disassembly.

Depends on the libvirt version. For most, you'll need to look for the monitor path in the QEMU argv:

  -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/vmwts02.monitor,server,nowait
  -mon chardev=monitor,mode=readline

Then 'service libvirtd stop', and now you can connect to the monitor at that path & run the commands you want, and then disconnect and start libvirtd again. If you run any commands that change the VM state, things may well get confused when you start libvirtd again, but if it's just 'info registers' etc it should be pretty safe.

If you have a new enough libvirt, then you can also send commands directly using 'virsh qemu-monitor-command' (checking whether you need JSON or HMP syntax first - in this case you can see it needs HMP).

Regards, Daniel
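To make the hijack step concrete - the socket path below is the one from the argv above, and using socat as the client is an assumption (any UNIX socket client works):

```shell
# With libvirtd stopped, HMP commands can be written straight to the
# monitor socket in readline mode.
MON=/var/lib/libvirt/qemu/vmwts02.monitor
HMP_CMD='info registers'
echo "$HMP_CMD"
# Delivery would look like:
#   printf '%s\n' "$HMP_CMD" | socat - UNIX-CONNECT:"$MON"
```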
Re: [PATCH v2] device-assignment: chmod the rom file before opening read/write
On Wed, Jan 05, 2011 at 05:14:55PM +0200, Avi Kivity wrote: > On 01/05/2011 04:57 PM, Alex Williamson wrote: > >A valid argument. I think it could also be argued that the user is > >providing ownership of the file and writing to the file is part of the > >low level details of the sysfs rom file API and should be handled by the > >user of that API. We basically have 3 places we could put this: > > > > A. kernel - Why is this file mode 0400 by default anyway if using > > it requires write access? Set it to mode 0600 here by default. > > B. libvirt - Already does chown, why not do chmod too? chmod and > > restore here. > > C. qemu - Owns file, chmod is trivial and part of the sysfs rom > > file API? chmod around usage. > > > > qemu might not actually own the file, just have rw permissions. Or > it might own the file and selinux may prevent it from changing the > permissions. Or it may die before the reverse chmod and leave > things not as they were.

Agreed, I don't think we can rely on QEMU being able to chmod() the file in general.

> > >I chose qemu because it seemed to have the least chance of side-effects > >and has the smallest usage window. Do you prefer libvirt or kernel? > > No idea really. What's the kernel's motivation for keeping it ro? Sanity? > > I'd guess libvirt is the one to do it, but someone more familiar > with device assignment / pci (you?) should weigh in on this.

I've no real objection to libvirt setting the 0600 permissions on it, if that's required for correct operation. BTW, what is the failure scenario seen when the file is 0400 ? I want to know how to diagnose/triage this if it gets reported by users in BZ...

Regards, Daniel
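For reference, the sysfs rom file API being discussed requires a write before the ROM can be read at all, which is why a 0400 default mode gets in the way. The device address below is a placeholder, and the dangerous steps are left commented:

```shell
# Reading a PCI option ROM through sysfs: write "1" to enable the ROM,
# read out the image, then write "0" to disable it again.
ROM=/sys/bus/pci/devices/0000:04:08.0/rom   # placeholder device address
echo "$ROM"
# The dance itself (needs write permission on the rom file):
#   echo 1 > "$ROM"
#   cat "$ROM" > rom.bin
#   echo 0 > "$ROM"
```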
Re: USB Passthrough 1.1 performance problem...
On Tue, Dec 14, 2010 at 12:55:04PM +0100, Kenni Lund wrote: > 2010/12/14 Erik Brakkee : > >> From: Kenni Lund > >> 2010/12/14 Erik Brakkee : > > From: Kenni Lund > > > > Does this mean I have a chance now that PCI passthrough of my WinTV > > PVR-500 > > might work now? > > Passthrough of a PVR-500 has been working for a long time. I've been > running with passthrough of a PVR-500 in my HTPC, since > November/December 2009...so it should work with any recent kernel and > any recent version of qemu-kvm you can find today - No patching > needed. The only issue I had with the PVR-500 card, was when *I* > didn't free up the shared interrupts...once I fixed that, it "just > worked". > >>> > >>> How did you free up those shared interrupts then? I tried different slots > >>> but always get conflicts with the USB irqs. > >> > >> I did an unbind of the conflicting device (eg. disabled it). I moved > >> the PVR-500 card around in the different slots and once I got a > >> conflict with the integrated sound card, I left the PVR-500 card in > >> that slot (it's a headless machine, so no need for sound) and > >> configured unbind of the sound card at boot time. On my old system I > >> think it was conflicting with one of the USB controllers as well, but > >> it didn't really matter, as I only lost a few of the ports on the back > >> of the computer for that particular USB controller - I still had > >> plenty of USB ports left and if I really needed more ports, I could > >> just plug in an extra USB PCI card. 
> >>
> >> My /etc/rc.local boot script looks like the following today:
> >> --
> >> #Remove HDA conflicting with ivtv1
> >> echo "0000:00:1b.0" > /sys/bus/pci/drivers/HDA\ Intel/unbind
> >>
> >> # ivtv0
> >> echo " 0016" > /sys/bus/pci/drivers/pci-stub/new_id
> >> echo "0000:04:08.0" > /sys/bus/pci/drivers/ivtv/unbind
> >> echo "0000:04:08.0" > /sys/bus/pci/drivers/pci-stub/bind
> >> echo " 0016" > /sys/bus/pci/drivers/pci-stub/remove_id
> >>
> >> # ivtv1
> >> echo " 0016" > /sys/bus/pci/drivers/pci-stub/new_id
> >> echo "0000:04:09.0" > /sys/bus/pci/drivers/ivtv/unbind
> >> echo "0000:04:09.0" > /sys/bus/pci/drivers/pci-stub/bind
> >> echo " 0016" > /sys/bus/pci/drivers/pci-stub/remove_id
> >
> > I did not try unbinding the usb device so I can also try that.
> >
> > I don't understand what is happening with the 0016. I configured the
> > pci card in kvm and I believe kvm does the binding to pci-stub in recent
> > versions. Where is the 0016 coming from?
>
> Okay, qemu-kvm might do it today, I don't know - I haven't changed
> that script for the past year. But are you sure that it's not
> libvirt/virsh/virt-manager which does that for you?

If you use the managed="yes" attribute on the <hostdev> in libvirt XML, then libvirt will automatically do the pci-stub bind/unbind, followed by a device reset at guest startup & the reverse at shutdown. If you have conflicting devices on the bus though, libvirt won't attempt to unbind them, unless you had also explicitly assigned all those conflicting devices to the same guest.

Daniel
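For reference, the libvirt XML in question looks roughly like this. The PCI address matches the ivtv0 device from the script above; treat the whole fragment as an illustrative example rather than a complete domain definition:

```xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x04' slot='0x08' function='0x0'/>
  </source>
</hostdev>
```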