Re: [Qemu-devel] [PATCH v3 0/9] HyperV equivalent of pvpanic driver
On Tue, Jun 30, 2015 at 02:33:18PM +0300, Denis V. Lunev wrote:
> Windows 2012 guests can notify the hypervisor about a guest crash
> (Windows bugcheck (BSOD)) by writing specific Hyper-V MSRs. This patch does
> handling of these MSRs by KVM and sends a notification to user space that
> allows QEMU/libvirt to gather a Windows guest crash dump.
>
> The idea is to provide functionality equal to the pvpanic device without
> the QEMU guest agent for Windows.

That's nice - do you know if the Linux kernel (or any other non-Win2k12
kernels) has support for notifying hypervisors via this Hyper-V MSR when
running as a guest?

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Announcing qboot, a minimal x86 firmware for QEMU
On Fri, May 22, 2015 at 12:21:27PM +0100, Peter Maydell wrote:
> On 22 May 2015 at 12:12, Daniel P. Berrange wrote:
> > Yep, it is hard saying no - but I'd think as long as it was possible to add
> > the extra features using -device, it ought to be practical to keep a "virt"
> > machine type's "-nodefaults -nodefconfig" base setup pretty minimal.
>
> Mmm, but -device only works for pluggable devices really. We don't
> have a coherent mechanism for saying "put the PS/2 keyboard controller
> into the system at its usual IO ports" on the command line.

Oh, I didn't necessarily mean that we'd need the ability to add a PS/2
keyboard via -device. I meant that there just needs to be a way to add
/some/ kind of keyboard, e.g. we have a usb-kbd device that could
potentially fill that role. Likewise for a mouse pointer, serial ports, etc.

Regards,
Daniel
Re: [Qemu-devel] Announcing qboot, a minimal x86 firmware for QEMU
On Fri, May 22, 2015 at 12:04:54PM +0100, Peter Maydell wrote:
> On 22 May 2015 at 12:01, Daniel P. Berrange wrote:
> > On the QEMU side of things I wonder if there is scope for taking AArch64's
> > 'virt' machine type concept and duplicating it on all architectures.
>
> Experience suggests that holding the line on "minimal" is really
> quite tricky, though -- there's always one more thing that
> somebody really wants to add...

Yep, it is hard saying no - but I'd think as long as it was possible to add
the extra features using -device, it ought to be practical to keep a "virt"
machine type's "-nodefaults -nodefconfig" base setup pretty minimal. In
particular I don't see why we need to have a SATA controller and ISA/LPC
bridge in every virt machine - a root PCI bus only should be possible, as
you can provide disks via virtio-blk or virtio-scsi, and serial, parallel,
mouse and floppy via PCI devices and/or by adding a USB bus in the cases
where you really need one.

Regards,
Daniel
Re: [Qemu-devel] Announcing qboot, a minimal x86 firmware for QEMU
On Thu, May 21, 2015 at 03:51:43PM +0200, Paolo Bonzini wrote:
> Some of you may have heard about the "Clear Containers" initiative from
> Intel, which couples KVM with various kernel tricks to create extremely
> lightweight virtual machines. The experimental Clear Containers setup
> requires only 18-20 MB to launch a virtual machine, and needs about 60
> ms to boot.
>
> Now, as all of you probably know, "QEMU is great for running Windows or
> legacy Linux guests, but that flexibility comes at a hefty price. Not
> only does all of the emulation consume memory, it also requires some
> form of low-level firmware in the guest as well. All of this adds quite
> a bit to virtual-machine startup times (500 to 700 milliseconds is not
> unusual)".
>
> Right? In fact, it's for this reason that Clear Containers uses kvmtool
> instead of QEMU.
>
> No, wrong! In fact, reporting bad performance is pretty much the same
> as throwing down the gauntlet.

On the QEMU side of things I wonder if there is scope for taking AArch64's
'virt' machine type concept and duplicating it on all architectures. It
would be nice to have a common minimal machine type on all architectures
that discards all legacy platform stuff and focuses on the minimum needed
to run a modern virtualization-optimized guest OS. People would always know
that a machine type called 'virt' was the minimal virtualization platform,
while the others all target emulation of real-world (legacy) baremetal
platforms.

Regards,
Daniel
Re: [Qemu-devel] [PATCH v2 1/2] contrib: add ivshmem client and server
On Mon, Jul 21, 2014 at 08:21:21AM -0600, Eric Blake wrote:
> On 07/20/2014 03:38 AM, David Marchand wrote:
> > When using ivshmem devices, notifications between guests can be sent as
> > interrupts using an ivshmem-server (typical use described in documentation).
> > The client is provided as a debug tool.
> >
> > Signed-off-by: Olivier Matz
> > Signed-off-by: David Marchand
> > ---
> >  contrib/ivshmem-client/Makefile | 26 ++
>
> > +++ b/contrib/ivshmem-client/Makefile
> > @@ -0,0 +1,26 @@
> > +# Copyright 2014 6WIND S.A.
> > +# All rights reserved
>
> This file has no other license, and is therefore incompatible with
> GPLv2. You'll need to resubmit under an appropriately open license.
>
> > +++ b/contrib/ivshmem-client/ivshmem-client.h
> > @@ -0,0 +1,238 @@
> > +/*
> > + * Copyright(c) 2014 6WIND S.A.
> > + * All rights reserved.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2. See
> > + * the COPYING file in the top-level directory.
>
> I'm not a lawyer, but to me, this license is self-contradictory. You
> can't have "All rights reserved" and still be GPL, because the point of
> the GPL is that you are NOT reserving all rights, but explicitly
> granting your user various rights (on condition that they likewise grant
> those rights to others). But you're not the only file in the qemu code
> base with this questionable mix.

In any case, adding the term 'All rights reserved' is said to be redundant
and obsolete these days:

https://en.wikipedia.org/wiki/All_rights_reserved#Obsolescence

Regards,
Daniel
Xen hypervisor inside KVM guest with x2apic CPU feature fails to boot
I'm running

  kernel-3.14.4-200.fc20.x86_64
  qemu-1.6.2-5.fc20.x86_64
  xen-4.4.0-4.fc21

In the process of trying to get a Xen hypervisor running inside a KVM guest
I found that there's a problem with x2apic. NB I do *not* use nested VMX
here; I am just trying to get plain Xen paravirt working before trying to
do nested HVM.

Any time I enable the 'x2apic' CPU flag for the KVM guest, the Xen
hypervisor running inside the guest will fail to boot. The QEMU/KVM -cpu
arg is

  -cpu core2duo,+erms,+smep,+fsgsbase,+lahf_lm,+rdtscp,+rdrand,+f16c,+avx,+osxsave,+xsave,+aes,+tsc-deadline,+popcnt,+x2apic,+pcid,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds

The Xen logs indicate it isn't liking the x2apic feature and is disabling
it, but then it obviously fails to set up the non-x2apic codepath it is
following - even though the non-x2apic codepath works fine if you don't
have +x2apic set for the KVM guest.

(XEN) Not enabling x2APIC: depends on iommu_supports_eim.
(XEN) XSM Framework v1.0.0 initialized
(XEN) Flask: Initializing.
(XEN) AVC INITIALIZED
(XEN) Flask: Starting in permissive mode.
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2693.939 MHz processor.
(XEN) Initing memory sharing.
(XEN) traps.c:3071: GPF (): 82d0801b83c7 -> 82d08023386b
(XEN) mce_intel.c:717: MCA Capability: BCAST 1 SER 1 CMCI 0 firstbank 1 extended MCE MSR 0
(XEN) Intel machine check reporting enabled
(XEN) I/O virtualisation disabled
(XEN) Getting VERSION: 1050014
(XEN) Getting VERSION: 1050014
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) Getting ID: 0
(XEN) Getting LVT0: 8700
(XEN) Getting LVT1: 8400
(XEN) Suppress EOI broadcast on CPU#0
(XEN) enabled ExtINT on CPU#0
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using old ACK method
(XEN) init IO_APIC IRQs
(XEN) IO-APIC (apicid-pin) 0-0, 0-16, 0-17, 0-18, 0-19, 0-20, 0-21, 0-22, 0-23 not connected.
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) ..MP-BIOS bug: 8254 timer not connected to IO-APIC
(XEN) ...trying to set up timer (IRQ0) through the 8259A ... failed.
(XEN) ...trying to set up timer as Virtual Wire IRQ... failed.
(XEN) ...trying to set up timer as ExtINT IRQ... failed :(.
(XEN)
(XEN)
(XEN) Panic on CPU 0:
(XEN) IO-APIC + timer doesn't work! Boot with apic_verbosity=debug and send a report. Then try booting with the 'noapic' option
(XEN)

I will attach the full non-trimmed Xen log to this mail, along with a log
showing a successful boot when 'x2apic' isn't given to KVM. I'm unclear
whether this is a Xen bug, a KVM bug, a QEMU bug, or a combination of them.

Regards,
Daniel

Xen 4.4.0-4.fc21
(XEN) Xen version 4.4.0 (mockbuild@[unknown]) (gcc (GCC) 4.9.0 20140506 (Red Hat 4.9.0-3)) debug=n Mon May 12 18:38:23 UTC 2014
(XEN) Latest ChangeSet:
(XEN) Bootloader: GRUB 2.00
(XEN) Command line: placeholder loglvl=all guest_loglvl=all com1=115200,8n1 console=com1,vga apic_verbosity=debug
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN) Disc information:
(XEN)  Found 1 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)   - 0009fc00 (usable)
(XEN)  0009fc00 - 000a (reserved)
(XEN)  000f - 0010 (reserved)
(XEN)  0010 - 5dbfe000 (usable)
(XEN)  5dbfe000 - 5dc0 (reserved)
(XEN)  feffc000 - ff00 (reserved)
(XEN)  fffc - 0001 (reserved)
(XEN) System RAM: 1499MB (1535604kB)
(XEN) ACPI: RSDP 000F1690, 0014 (r0 BOCHS )
(XEN) ACPI: RSDT 5DBFE4A0, 0030 (r1 BOCHS BXPCRSDT1 BXPC1)
(XEN) ACPI: FACP 5DBFFF80, 0074 (r1 BOCHS BXPCFACP1 BXPC1)
(XEN) ACPI: DSDT 5DBFE4D0, 1137 (r1 BXPC BXDSDT1 INTL 20140114)
(XEN) ACPI: FACS 5DBFFF40, 0040
(XEN) ACPI: SSDT 5DBFF700, 0838 (r1 BOCHS BXPCSSDT1 BXPC1)
(XEN) ACPI: APIC 5DBFF610, 0078 (r1 BOCHS BXPCAPIC1 BXPC1)
(XEN) No NUMA configuration found
(XEN) Faking a node at -5dbfe000
(XEN) Domain heap initialised
(XEN) found SMP MP-table at 000f17f0
(XEN) DMI 2.4 present.
(XEN) APIC boot state is 'xapic'
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0xb008
(XEN) ACPI: SLEEP INFO: pm1x_cnt[b004,0], pm1x_evt[b000,0]
(XEN) ACPI: wakeup_vec[5dbfff4c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee0
(XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
(XEN) Processor #0 6:15 APIC version 20
(XEN) ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
(XEN) ACP
Re: [Qemu-devel] KVM call agenda for 2014-04-28
On Tue, Apr 29, 2014 at 02:33:58PM +0200, Markus Armbruster wrote:
> Peter Maydell writes:
>
> > On 29 April 2014 11:09, Michael S. Tsirkin wrote:
> >> Let's just make clear how to contact us securely, when to contact that
> >> list, and what we'll do with the info. I cobbled together the
> >> following:
> >> http://wiki.qemu.org/SecurityProcess
> >
> > Looks generally OK I guess. I'd drop the 'how to use pgp' section --
> > anybody who cares will already know how to send us PGP email.
>
> The first paragraph under "How to Contact Us Securely" is fine, the rest
> seems redundant for readers familiar with PGP, yet hardly sufficient for
> the rest.
>
> One thing I like about Libvirt's Security Process page[*] is they give
> an idea on embargo duration.

FWIW, I picked the "2 weeks" length myself as a completely arbitrary
timeframe. We haven't stuck to that strictly - we consider the needs of
each vulnerability as it is triaged to determine the minimum practical
embargo time. So think of "2 weeks" as more of a guiding principle to show
the world that we don't believe in keeping issues under embargo for very
long periods of time.

Regards,
Daniel
Re: Help regarding virsh domifstat
On Thu, Oct 31, 2013 at 08:30:30PM -0500, Rohit Bhat wrote:
> Hi,
>
> I need some small help. I am working on a project where I have to monitor
> the network activity of a VM running on KVM.
>
> I am interested in how much data is going into the VM and how much
> data is coming out of the VM. I checked on the net and found out virsh
> domifstat is the way to go about it.
>
> 1. But it looks like these stats also include bytes related to control
> traffic for the VM. Is there a way to exclude that? I just want the
> size of actual data transfers.
>
> 2. Is there a way by which I can report the data transfer of the VM with
> the outside world (outside the hypervisor) only, while excluding data
> transfer with any other VM on the same host?
>
> Please let me know if this is not the right group for such queries.

The libvirt-users mailing list is a better place for virsh-related
questions:

http://libvirt.org/contact.html#email

Regards,
Daniel
Re: qemu, numa: non-contiguous cpusets
On Sun, Sep 29, 2013 at 05:10:44PM +0200, Borislav Petkov wrote:
> Btw,
>
> while I got your attention, on a not-really related topic: how do we
> feel about adding support for specifying a non-contiguous set of cpus
> for a numa node in qemu with the -numa option? I.e., like this, for
> example:
>
> x86_64-softmmu/qemu-system-x86_64 -smp 8 \
>     -numa node,nodeid=0,cpus=0\;2\;4-5 \
>     -numa node,nodeid=1,cpus=1\;3\;6-7
>
> The ';' needs to be escaped from the shell but I'm open for better
> suggestions.

Use a ':' instead.

Daniel
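For illustration, here is a minimal sketch of how the colon-separated
variant suggested above could be expanded into a CPU set. This is not
QEMU's actual option parser (which is C code inside the -numa handling);
the helper name and spec strings are hypothetical.

```python
def parse_cpu_spec(spec):
    # Hypothetical helper: expand "0:2:4-5" into {0, 2, 4, 5}.
    # ':' separates entries; 'a-b' denotes an inclusive range.
    cpus = set()
    for part in spec.split(":"):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

# The two node specs from the example command line, using ':' so no
# shell escaping is needed.
print(sorted(parse_cpu_spec("0:2:4-5")))  # [0, 2, 4, 5]
print(sorted(parse_cpu_spec("1:3:6-7")))  # [1, 3, 6, 7]
```

The appeal of ':' over ';' is visible here: the spec can be typed on the
command line unquoted, with no escaping.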
Re: [libvirt-users] Questions on how to reset ID numbers for virt Guests.
On Wed, Sep 11, 2013 at 09:47:07AM +0200, Paolo Bonzini wrote:
> On 11/09/2013 00:27, James Sparenberg wrote:
> > I'm doing some experimenting in our development lab and as a result
> > I'm kickstarting virtual guests over and over. This is of course
> > causing the guest ID to increment by one with each test. I've
> > googled around and tried searching the list but have not found out
> > how (if at all) it would be possible to reset the ID number back to 1
> > more than is in use. Also, is there a limit where I run out of IDs?
> > (for example, does it only go up to 99?)
>
> No, there is no limit.

Well, 'int' will wrap eventually, but you'd need to have created a hell of
a lot of guests for that to be a problem :-)

> I don't know the answer to your other question, so I'm adding the
> libvirt-users mailing list.

If you restart libvirtd, it resets itself to start allocating IDs at the
maximum currently used ID of any running guest.

Daniel
Re: Disabling mergeable rx buffers for the guest
On Tue, Jul 16, 2013 at 10:40:28AM +0000, Naor Shlomo wrote:
> Hi Paolo,
>
> For some unknown reason it suddenly started to accept the changes to the XML
> and the strings you gave me are now in place.
> Upon machine start I now receive the following error messages:
>
> virsh # start NaorDev
> error: Failed to start domain NaorDev
> error: internal error Process exited while reading console log output: kvm:
> -global: requires an argument
>
> Here's the XML:

Presumably what you wanted to do was ... rather than setting an
environment variable.

Regards,
Daniel
Re: [libvirt] Bugs filed in the week for Upstream Qemu and Libvirt
On Wed, Jul 10, 2013 at 06:45:08PM +0530, chandrashekar shastri wrote:
> Hi,
>
> Below are the bugs filed this week for upstream qemu and libvirt:
>
> Qemu in Launchpad:
>
> https://bugs.launchpad.net/opensuse/+bug/1199416
> Hot-add qcow2 [virtio-scsi] devices doesn't work in SLES-11-SP2 guest
>
> Libvirt Bugs:
>
> Bug 982224 - Attaching of the Virtio-scsi [qcow2] drives fails with
> "error: internal error No more available PCI addresses"
> Bug 982455 - RHEL Guest fails to boot after attaching 200+ scsi
> devices [virtio-scsi qcow2]
> Bug 980954 - Virtio-scsi drives in Windows7 shows yellow bang in
> device manager though virtio scsi pass through driver is installed
> Bug 982630 - Documentation : virsh attach-disk --help should be
> updated with proper examples for --type and --driver

We really don't need lists of bugs emailed to the libvirt-list mailing
list. People already monitor bugzilla & have email alerts from bugzilla as
they desire.

Daniel
Re: kernel 3.9.x kvm hangs after seabios
On Wed, May 08, 2013 at 02:08:55PM +0200, Tomas Papan wrote:
> Hi,
>
> I found this in the libvirt log (but those messages are the same in 3.8.x):
>
> anakin libvirt # cat libvirtd.log
> 2013-05-08 11:59:29.645+: 3750: info : libvirt version: 1.0.5
> 2013-05-08 11:59:29.645+: 3750: error : udevGetDMIData:1548 :
> Failed to get udev device for syspath '/sys/devices/virtual/dmi/id' or
> '/sys/class/dmi/id'
> 2013-05-08 11:59:29.680+: 3750: warning :
> ebiptablesDriverInitCLITools:4225 : Could not find 'ebtables'
> executable

You need to look at /var/log/libvirt/qemu/$GUESTNAME.log for QEMU-related
messages. The libvirtd.log file only has the libvirt-related messages.

Daniel
Re: [okeanos-dev] Re: KVM versions, machine types and failed migrations
On Wed, Jan 09, 2013 at 03:27:53PM +0200, Vangelis Koukis wrote:
> On Wed, Jan 09, 2013 at 01:10:45PM +0000, Daniel P. Berrange wrote:
> > When doing migration, the fundamental requirement is that the guest
> > OS visible machine ABI must not change. Thus there are three key
> > things to take care of when launching QEMU on the migration target
> > host.
> >
> > - The device PCI/USB addresses must be identical to the source
> > - The machine type must be identical to the source
> > - The CPU model must be identical to the source
>
> Thanks for the detailed list of requirements, we'll take it into account
> for the relevant Ganeti patch.
>
> > If you don't follow those requirements, either QEMU or the guest OS
> > or both will crash & burn during migration & you get to keep both
> > pieces :-)
>
> My point is, are these requirements left up to the caller of "kvm
> -incoming" to satisfy? Since the migration will most probably break,
> wouldn't it be best for QEMU to detect this and complain loudly, instead
> of continuing with the migration, failing silently and destroying the
> VM?
>
> Sure there could be some "yes, do it, I know it is going to break"
> option, which will make QEMU proceed with the migration. However, in 99%
> of the cases this is just user error, e.g. the user has upgraded the
> version on the other end and has not specified -M explicitly. It would
> be best if QEMU was able to detect and warn the user about what is going
> to happen, because it does lead to the VM dying.

What you describe is certainly desirable, but it is quite hard to achieve
with current QEMU. Much of the work in moving to the new QEMU object model
& configuration descriptions has been motivated by a desire to enable
improvements in migration handling. As you suggest, the goal is that the
source QEMU be able to send a complete & reliable hardware description to
the destination QEMU during migration. It is getting closer, but we're not
there yet.
Regards,
Daniel
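The three ABI requirements quoted above lend themselves to a pre-flight
check in the management layer before the destination QEMU is launched.
The following is a sketch with made-up field names - not a real Ganeti or
libvirt schema - just to show the shape of such a check.

```python
# ABI-relevant parts of a planned QEMU launch; the key names here are
# hypothetical, chosen to mirror the three requirements in the mail.
ABI_KEYS = ("machine_type", "cpu_model", "device_addresses")

def abi_compatible(src, dst):
    # src and dst are dicts describing the source and destination
    # launch configurations; all ABI-relevant fields must match.
    return all(src.get(k) == dst.get(k) for k in ABI_KEYS)

src = {"machine_type": "pc-1.0", "cpu_model": "core2duo",
       "device_addresses": {"virtio-blk": "0000:00:04.0"}}
dst = dict(src, machine_type="pc-1.1")  # destination drifted to pc-1.1

print(abi_compatible(src, src), abi_compatible(src, dst))  # True False
```

A management app that fails this check can refuse to start the migration,
giving the loud error the thread asks for instead of a silently dead VM.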
Re: KVM versions, machine types and failed migrations
On Wed, Jan 09, 2013 at 02:23:50PM +0200, Vangelis Koukis wrote:
> Hello,
>
> I'd like to ask a few questions about the way migrations work in KVM
> among different emulated machine types and different versions of the
> qemu-kvm package. I am sending to both the kvm@ and qemu-devel@ lists,
> please redirect me if I was wrong in doing so.
>
> In a nutshell: while trying to live-migrate a VM on ~okeanos [1], we
> see VM migrations fail silently if going from kvm 1.0 to kvm 1.1.
> The source VM is frozen, "info migrate" on the source monitor reports
> success, but the VM is dead upon arrival on the destination process.
> Please see [3] for the exact package versions for qemu-kvm we have
> tested with.
>
> Migration works if the destination kvm has been started with the same
> machine type as the source VM, e.g., using "-M pc-1.0" specifically on
> the destination, when migrating a pc-1.0 machine from kvm 1.0 to
> kvm 1.1.
>
> How does the machine type specified with -M work in the case of
> migrations? Are migrations expected to fail if the machine type is
> different between source and destination process? If yes, shouldn't KVM
> be able to detect this and abort the migration instead of failing
> silently?

When doing migration, the fundamental requirement is that the guest OS
visible machine ABI must not change. Thus there are three key things to
take care of when launching QEMU on the migration target host:

- The device PCI/USB addresses must be identical to the source
- The machine type must be identical to the source
- The CPU model must be identical to the source

If you don't follow those requirements, either QEMU or the guest OS or
both will crash & burn during migration & you get to keep both pieces :-)

> Regarding different package versions of qemu-kvm, it seems migrations do
> not work from source 0.12.5 to any other version *even* if -M pc-0.12 is
> specified at the incoming KVM process.
> For versions >= 1.0 everything
> works provided the machine type on the destination is the same as on the
> source.

Some older versions of QEMU were buggy, causing the machine type to not
correctly preserve the ABI.

> Our goal is to patch Ganeti [2] so that it sets the destination machine
> type to that of the source specifically, ensuring migrations work
> seamlessly after a KVM upgrade. Is there a way to retrieve the machine
> type of a running KVM process through a monitor command?

IIRC there is not a monitor command for this. The general approach to
dealing with migration stability should be to launch QEMU with a canonical
hardware configuration. This means explicitly setting a machine type, CPU
model and PCI/USB device addresses upfront. NB you should not use 'pc' as
a machine type - if you query the list of machine types from QEMU, it will
tell you what 'pc' corresponds to (pc-1.2) and then you can use the
versioned type so you have a known machine type.

Regards,
Daniel
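The advice above - query QEMU for what the 'pc' alias resolves to and pin
the versioned name - can be sketched as follows. QMP's query-machines
command returns a list of entries with a "name" field and, for aliased
types, an "alias" field; the canned reply below is an assumed example of
that shape, not output captured from a live QEMU, and the helper name is
made up.

```python
def resolve_machine_alias(machines, alias="pc"):
    # `machines` is the "return" payload of QMP's query-machines command.
    # Return the versioned machine name the alias maps to, or None.
    for m in machines:
        if m.get("alias") == alias:
            return m["name"]
    return None

# Canned reply shaped like a QEMU 1.2-era response (values assumed);
# in practice this would arrive over the QMP monitor socket.
reply = [
    {"name": "pc-1.2", "alias": "pc", "is-default": True},
    {"name": "pc-1.1"},
    {"name": "pc-1.0"},
]
print(resolve_machine_alias(reply))  # pc-1.2
```

A management app would run this once when a guest is first defined, store
the versioned name (e.g. pc-1.2), and pass it via -M on every subsequent
launch, including the migration target.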
Re: qemu-kvm: remove "boot=on|off" drive parameter compatibility
On Mon, Oct 01, 2012 at 08:19:29AM -0500, Anthony Liguori wrote:
> Jan Kiszka writes:
>
> I think at this point, none of this matters but I added the various
> distro maintainers to the thread.
>
> I think it's time for the distros to drop qemu-kvm and just ship
> qemu.git. Is there anything else that needs to happen to make that
> switch?

If that is upstream's recommendation, then I see no issue with switching
Fedora 19 / RHEL-7 to use qemu.git instead of qemu-kvm.git.

Regards,
Daniel
Re: [Qemu-devel] [PATCH] kvm: Set default accelerator to "kvm" if the host supports it
On Mon, Oct 01, 2012 at 06:43:00PM +0200, Andreas Färber wrote:
> Hello Jan,
>
> On 01.10.2012 16:34, Jan Kiszka wrote:
> > If we built a target for a host that supports KVM in principle, set the
> > default accelerator to KVM as well. This also means QEMU will fail to
> > start if KVM support turns out to be unavailable at runtime.
>
> From a distro point of view this of course means that we will build
> against KVM and that the new KVM default will start to fail for users on
> very old hardware. Can't we do a runtime check to select the default?

NB, this is *not* only about old hardware. There are plenty of users who
use QEMU inside VMs. One very common usage I know of is image building
tools which are run inside Amazon VMs, using libguestfs & QEMU.

IMHO, default to KVM with fallback to TCG is the most friendly default
behaviour.

Daniel
Re: [libvirt] TSC scaling interface to management
On Wed, Sep 12, 2012 at 12:39:39PM -0300, Marcelo Tosatti wrote:
>
> HW TSC scaling is a feature of AMD processors that allows a multiplier
> to be specified for the TSC frequency exposed to the guest.
>
> KVM also contains provision to trap TSC ("KVM: Infrastructure for
> software and hardware based TSC rate scaling" cc578287e3224d0da)
> or advance TSC frequency.
>
> This is useful when migrating to a host with different frequency and
> the guest is possibly using direct RDTSC instructions for purposes
> other than measuring cycles (that is, it previously calculated
> cycles-per-second, and uses that information which is stale after
> migration).
>
> "qemu-x86: Set tsc_khz in kvm when supported" (e7429073ed1a76518)
> added support for the tsc_khz= option in QEMU.
>
> I am proposing the following changes so that management applications
> can work with this:
>
> 1) New option for tsc_khz, which is tsc_khz=host (QEMU command line
> option). Host means that QEMU is responsible for retrieving the
> TSC frequency of the host processor and using that.
> The management application does not have to deal with the burden.

FYI, libvirt already has support for expressing a number of different TSC
related config options, to support Xen's and VMWare's capabilities in this
area. What we currently allow for is

In this context the frequency attribute provides the Hz value to provide
to the guest, and the mode is one of:

- auto == emulate if TSC is unstable, else allow native TSC access
- native == always allow native TSC access
- emulate == always emulate TSC
- smpsafe == always emulate TSC, and interlock SMP

> Therefore it appears that this "tsc_khz=auto" option can be specified
> only if the user specifies so (it can be a per-guest flag hidden
> in the management configuration/manual).
>
> Sending this email to gather suggestions (or objections)
> to this interface.
Daniel
Re: [PATCH v8] kvm: notify host when the guest is panicked
On Mon, Aug 13, 2012 at 03:21:32PM -0300, Marcelo Tosatti wrote: > On Wed, Aug 08, 2012 at 10:43:01AM +0800, Wen Congyang wrote: > > We can know the guest is panicked when the guest runs on xen. > > But we do not have such a feature on kvm. > > > > Another purpose of this feature is: management app (for example: > > libvirt) can do auto dump when the guest is panicked. If management > > app does not do auto dump, the guest's user can do dump by hand if > > he sees the guest is panicked. > > > > We have three solutions to implement this feature: > > 1. use vmcall > > 2. use I/O port > > 3. use virtio-serial. > > > > We have decided to avoid touching the hypervisor. The reason why I > > chose the I/O port is: > > 1. it is easier to implement > > 2. it does not depend on any virtual device > > 3. it can work when starting the kernel > > How about searching for the "Kernel panic - not syncing" string > in the guest's serial output? Say libvirtd could take an action upon > that? No, this is not satisfactory. It depends on the guest OS being configured to use the serial port for console output, which we cannot mandate, since it may well be required for other purposes. Daniel
Re: First shot at adding IPMI to qemu
On Mon, Jul 09, 2012 at 08:23:11AM -0500, Corey Minyard wrote: > I haven't heard anything about these patches. Any comments, good or > bad? Has anyone tried these? You really ought to post this to the qemu-devel mailing list, since that's where the majority of QEMU developers hang out. This KVM list is primarily for KVM specific development tasks in QEMU. Daniel
Re: [PATCH] qemu-kvm: Fix default machine options
On Fri, Jul 06, 2012 at 06:21:06PM +0200, Jan Kiszka wrote: > qemu-kvm-specific machine defaults were missing for pc-0.15 to pc-1.1. > Then Daniel noted that --disable-kvm caused problems as the generated > binaries would be unable to run. As we are at it, we can drop the > kernel_irqchip=on that is now enable by default in upstream. > > CC: Daniel P. Berrange > Signed-off-by: Jan Kiszka ACK, looks good to me. > Noticed that there was more to do. Can you take care of stable-1.1, > Daniel? TIA. Yep, will post a patch for stable-1.1 when this is accepted into master. > hw/pc_piix.c | 23 --- > 1 files changed, 16 insertions(+), 7 deletions(-) > > diff --git a/hw/pc_piix.c b/hw/pc_piix.c > index 98a06fa..5860d52 100644 > --- a/hw/pc_piix.c > +++ b/hw/pc_piix.c > @@ -353,6 +353,12 @@ static void pc_xen_hvm_init(ram_addr_t ram_size, > } > #endif > > +#ifdef CONFIG_KVM_OPTIONS > +#define KVM_MACHINE_OPTIONS "accel=kvm" > +#else > +#define KVM_MACHINE_OPTIONS "" > +#endif > + > static QEMUMachine pc_machine_v1_2 = { > .name = "pc-1.2", > .alias = "pc", > @@ -360,7 +366,7 @@ static QEMUMachine pc_machine_v1_2 = { > .init = pc_init_pci, > .max_cpus = 255, > .is_default = 1, > -.default_machine_opts = "accel=kvm,kernel_irqchip=on", > +.default_machine_opts = KVM_MACHINE_OPTIONS, > }; > > #define PC_COMPAT_1_1 \ > @@ -387,6 +393,7 @@ static QEMUMachine pc_machine_v1_1 = { > .desc = "Standard PC", > .init = pc_init_pci, > .max_cpus = 255, > +.default_machine_opts = KVM_MACHINE_OPTIONS, > .compat_props = (GlobalProperty[]) { > PC_COMPAT_1_1, > { /* end of list */ } > @@ -422,6 +429,7 @@ static QEMUMachine pc_machine_v1_0 = { > .desc = "Standard PC", > .init = pc_init_pci, > .max_cpus = 255, > +.default_machine_opts = KVM_MACHINE_OPTIONS, > .compat_props = (GlobalProperty[]) { > PC_COMPAT_1_0, > { /* end of list */ } > @@ -437,6 +445,7 @@ static QEMUMachine pc_machine_v0_15 = { > .desc = "Standard PC", > .init = pc_init_pci, > .max_cpus = 255, > +.default_machine_opts = 
KVM_MACHINE_OPTIONS, > .compat_props = (GlobalProperty[]) { > PC_COMPAT_0_15, > { /* end of list */ } > @@ -469,7 +478,7 @@ static QEMUMachine pc_machine_v0_14 = { > .desc = "Standard PC", > .init = pc_init_pci, > .max_cpus = 255, > -.default_machine_opts = "accel=kvm,kernel_irqchip=on", > +.default_machine_opts = KVM_MACHINE_OPTIONS, > .compat_props = (GlobalProperty[]) { > PC_COMPAT_0_14, > { > @@ -503,7 +512,7 @@ static QEMUMachine pc_machine_v0_13 = { > .desc = "Standard PC", > .init = pc_init_pci_no_kvmclock, > .max_cpus = 255, > -.default_machine_opts = "accel=kvm,kernel_irqchip=on", > +.default_machine_opts = KVM_MACHINE_OPTIONS, > .compat_props = (GlobalProperty[]) { > PC_COMPAT_0_13, > { > @@ -541,7 +550,7 @@ static QEMUMachine pc_machine_v0_12 = { > .desc = "Standard PC", > .init = pc_init_pci_no_kvmclock, > .max_cpus = 255, > -.default_machine_opts = "accel=kvm,kernel_irqchip=on", > +.default_machine_opts = KVM_MACHINE_OPTIONS, > .compat_props = (GlobalProperty[]) { > PC_COMPAT_0_12, > { > @@ -575,7 +584,7 @@ static QEMUMachine pc_machine_v0_11 = { > .desc = "Standard PC, qemu 0.11", > .init = pc_init_pci_no_kvmclock, > .max_cpus = 255, > -.default_machine_opts = "accel=kvm,kernel_irqchip=on", > +.default_machine_opts = KVM_MACHINE_OPTIONS, > .compat_props = (GlobalProperty[]) { > PC_COMPAT_0_11, > { > @@ -597,7 +606,7 @@ static QEMUMachine pc_machine_v0_10 = { > .desc = "Standard PC, qemu 0.10", > .init = pc_init_pci_no_kvmclock, > .max_cpus = 255, > -.default_machine_opts = "accel=kvm,kernel_irqchip=on", > +.default_machine_opts = KVM_MACHINE_OPTIONS, > .compat_props = (GlobalProperty[]) { > PC_COMPAT_0_11, > { > @@ -631,7 +640,7 @@ static QEMUMachine isapc_machine = { > .desc = "ISA-only PC", > .init = pc_init_isa, > .max_cpus = 1, > -.default_machine_opts = "accel=kvm,kernel_irqchip=on", > +.default_machine_opts = KVM_MACHINE_OPTIONS, > .compat_props = (GlobalProperty[]) { > { > .driver = "pc-sysfw", > -- > 1.7.3.4 Daniel -- |: http://berrange.com 
[PATCH] Fix default accelerator when building with --disable-kvm
From: "Daniel P. Berrange" The following commit commit 3ad763fcba5bd0ec5a79d4a9b6baeef119dd4a3d Author: Jan Kiszka Date: Fri Mar 2 10:30:43 2012 +0100 qemu-kvm: Use machine options to configure qemu-kvm defaults Upstream is moving towards this mechanism, so start using it in qemu-kvm already to configure the specific defaults: kvm enabled on, just like in-kernel irqchips. prevents qemu from starting when it has been built with the --disable-kvm argument, because the accelerator is hardcoded to 'kvm'. This is a regression previously fixed by commit ce967f6610dcd7b7762dbad5a639fecf42d5c76d Author: Daniel P. Berrange Date: Fri Aug 5 09:50:29 2011 +0100 Fix default accelerator when configured with --disable-kvm The default accelerator is hardcoded to 'kvm'. This is a fine default for qemu-kvm normally, but if the user built with ./configure --disable-kvm, then the resulting binaries will not work by default. The fix is again to make this conditional on CONFIG_KVM_OPTIONS Signed-off-by: Daniel P. 
Berrange --- hw/pc_piix.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 98a06fa..35202dd 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -360,7 +360,9 @@ static QEMUMachine pc_machine_v1_2 = { .init = pc_init_pci, .max_cpus = 255, .is_default = 1, +#ifdef CONFIG_KVM_OPTIONS .default_machine_opts = "accel=kvm,kernel_irqchip=on", +#endif }; #define PC_COMPAT_1_1 \ @@ -469,7 +471,9 @@ static QEMUMachine pc_machine_v0_14 = { .desc = "Standard PC", .init = pc_init_pci, .max_cpus = 255, +#ifdef CONFIG_KVM_OPTIONS .default_machine_opts = "accel=kvm,kernel_irqchip=on", +#endif .compat_props = (GlobalProperty[]) { PC_COMPAT_0_14, { @@ -503,7 +507,9 @@ static QEMUMachine pc_machine_v0_13 = { .desc = "Standard PC", .init = pc_init_pci_no_kvmclock, .max_cpus = 255, +#ifdef CONFIG_KVM_OPTIONS .default_machine_opts = "accel=kvm,kernel_irqchip=on", +#endif .compat_props = (GlobalProperty[]) { PC_COMPAT_0_13, { @@ -541,7 +547,9 @@ static QEMUMachine pc_machine_v0_12 = { .desc = "Standard PC", .init = pc_init_pci_no_kvmclock, .max_cpus = 255, +#ifdef CONFIG_KVM_OPTIONS .default_machine_opts = "accel=kvm,kernel_irqchip=on", +#endif .compat_props = (GlobalProperty[]) { PC_COMPAT_0_12, { @@ -575,7 +583,9 @@ static QEMUMachine pc_machine_v0_11 = { .desc = "Standard PC, qemu 0.11", .init = pc_init_pci_no_kvmclock, .max_cpus = 255, +#ifdef CONFIG_KVM_OPTIONS .default_machine_opts = "accel=kvm,kernel_irqchip=on", +#endif .compat_props = (GlobalProperty[]) { PC_COMPAT_0_11, { @@ -597,7 +607,9 @@ static QEMUMachine pc_machine_v0_10 = { .desc = "Standard PC, qemu 0.10", .init = pc_init_pci_no_kvmclock, .max_cpus = 255, +#ifdef CONFIG_KVM_OPTIONS .default_machine_opts = "accel=kvm,kernel_irqchip=on", +#endif .compat_props = (GlobalProperty[]) { PC_COMPAT_0_11, { @@ -631,7 +643,9 @@ static QEMUMachine isapc_machine = { .desc = "ISA-only PC", .init = pc_init_isa, .max_cpus = 1, +#ifdef CONFIG_KVM_OPTIONS .default_machine_opts = 
"accel=kvm,kernel_irqchip=on", +#endif .compat_props = (GlobalProperty[]) { { .driver = "pc-sysfw", -- 1.7.10.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)
On Mon, Jul 02, 2012 at 04:54:03PM -0300, Eduardo Habkost wrote: > On Mon, Jul 02, 2012 at 07:56:58PM +0100, Daniel P. Berrange wrote: > > On Mon, Jul 02, 2012 at 03:06:32PM -0300, Eduardo Habkost wrote: > > > Resending series, after fixing some coding style issues. Does anybody has > > > any > > > feedback about this proposal? > > > > > > Changes v1 -> v2: > > > - Coding style fixes > > > > > > Original cover letter: > > > > > > I was investigating if there are any mechanisms that allow manually > > > pinning of > > > guest RAM to specific host NUMA nodes, in the case of multi-node KVM > > > guests, and > > > noticed that -mem-path could be used for that, except that it currently > > > removes > > > any files it creates (using mkstemp()) immediately, not allowing numactl > > > to be > > > used on the backing files, as a result. This patches add a > > > -keep-mem-path-files > > > option to make QEMU create the files inside -mem-path with more > > > predictable > > > names, and not remove them after creation. > > > > > > Some previous discussions about the subject, for reference: > > > - Message-ID: <1281534738-8310-1-git-send-email-andre.przyw...@amd.com> > > >http://article.gmane.org/gmane.comp.emulators.kvm.devel/57684 > > > - Message-ID: <4c7d7c2a.7000...@codemonkey.ws> > > >http://article.gmane.org/gmane.comp.emulators.kvm.devel/58835 > > > > > > A more recent thread can be found at: > > > - Message-ID: <20111029184502.gh11...@in.ibm.com> > > >http://article.gmane.org/gmane.comp.emulators.qemu/123001 > > > > > > Note that this is just a mechanism to facilitate manual static binding > > > using > > > numactl on hugetlbfs later, for optimization. This may be especially > > > useful for > > > single large multi-node guests use-cases (and, of course, has to be used > > > with > > > care). > > > > > > I don't know if it is a good idea to use the memory range names as a > > > publicly- > > > visible interface. 
Another option may be to use a single file instead, > > > and mmap > > > different regions inside the same file for each memory region. I an open > > > to > > > comments and suggestions. > > > > > > Example (untested) usage to bind manually each half of the RAM of a guest > > > to a > > > different NUMA node: > > > > > > $ qemu-system-x86_64 [...] -m 2048 -smp 4 \ > > >-numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \ > > >-mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO > > > $ numactl --offset=1G --length=1G --membind=1 --file > > > /mnt/hugetlbfs/FOO/pc.ram > > > $ numactl --offset=0 --length=1G --membind=2 --file > > > /mnt/hugetlbfs/FOO/pc.ram > > > > I'd suggest that instead of making the memory file name into a > > public ABI QEMU needs to maintain, QEMU could expose the info > > via a monitor command. eg > > > >$ qemu-system-x86_64 [...] -m 2048 -smp 4 \ > > -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \ > > -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \ > > -monitor stdio > >(qemu) info mem-nodes > > node0: file=/proc/self/fd/3, offset=0G, length=1G > > node1: file=/proc/self/fd/3, offset=1G, length=1G > > > > This example takes advantage of the fact that with Linux, you can > > still access a deleted file via /proc/self/fd/NNN, which AFAICT, > > would avoid the need for a --keep-mem-path-files. > > I like the suggestion. > > But other processes still need to be able to open those files if we want > to do anything useful with them. In this case, I guess it's better to > let QEMU itself build a "/proc//fd/" string instead of > using "/proc/self" and forcing the client to find out what's the right > PID? > > Anyway, even if we want to avoid file-descriptor and /proc tricks, we > can still use the interface you suggest. Then we wouldn't need to have > any filename assumptions: the filenames could be completly random, as > they would be reported using the new monitor command. Opps, yes of course. 
I did intend that client apps could use the files, so I should have used /proc/$PID and not /proc/self > > > > > By returning info via a monitor command you also avoid hardcoding > > the use of 1 single file for all
Re: [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)
On Mon, Jul 02, 2012 at 03:06:32PM -0300, Eduardo Habkost wrote: > Resending series, after fixing some coding style issues. Does anybody has any > feedback about this proposal? > > Changes v1 -> v2: > - Coding style fixes > > Original cover letter: > > I was investigating if there are any mechanisms that allow manually pinning of > guest RAM to specific host NUMA nodes, in the case of multi-node KVM guests, > and > noticed that -mem-path could be used for that, except that it currently > removes > any files it creates (using mkstemp()) immediately, not allowing numactl to be > used on the backing files, as a result. This patches add a > -keep-mem-path-files > option to make QEMU create the files inside -mem-path with more predictable > names, and not remove them after creation. > > Some previous discussions about the subject, for reference: > - Message-ID: <1281534738-8310-1-git-send-email-andre.przyw...@amd.com> >http://article.gmane.org/gmane.comp.emulators.kvm.devel/57684 > - Message-ID: <4c7d7c2a.7000...@codemonkey.ws> >http://article.gmane.org/gmane.comp.emulators.kvm.devel/58835 > > A more recent thread can be found at: > - Message-ID: <20111029184502.gh11...@in.ibm.com> >http://article.gmane.org/gmane.comp.emulators.qemu/123001 > > Note that this is just a mechanism to facilitate manual static binding using > numactl on hugetlbfs later, for optimization. This may be especially useful > for > single large multi-node guests use-cases (and, of course, has to be used with > care). > > I don't know if it is a good idea to use the memory range names as a publicly- > visible interface. Another option may be to use a single file instead, and > mmap > different regions inside the same file for each memory region. I an open to > comments and suggestions. > > Example (untested) usage to bind manually each half of the RAM of a guest to a > different NUMA node: > > $ qemu-system-x86_64 [...] 
-m 2048 -smp 4 \ >-numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \ >-mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO > $ numactl --offset=1G --length=1G --membind=1 --file > /mnt/hugetlbfs/FOO/pc.ram > $ numactl --offset=0 --length=1G --membind=2 --file > /mnt/hugetlbfs/FOO/pc.ram I'd suggest that instead of making the memory file name into a public ABI QEMU needs to maintain, QEMU could expose the info via a monitor command. eg $ qemu-system-x86_64 [...] -m 2048 -smp 4 \ -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \ -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \ -monitor stdio (qemu) info mem-nodes node0: file=/proc/self/fd/3, offset=0G, length=1G node1: file=/proc/self/fd/3, offset=1G, length=1G This example takes advantage of the fact that with Linux, you can still access a deleted file via /proc/self/fd/NNN, which, AFAICT, would avoid the need for --keep-mem-path-files. By returning info via a monitor command you also avoid hardcoding the use of one single file for all of memory. You also avoid hardcoding the fact that QEMU stores the nodes in contiguous order inside the file. eg QEMU could easily return data like this: $ qemu-system-x86_64 [...] -m 2048 -smp 4 \ -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \ -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \ -monitor stdio (qemu) info mem-nodes node0: file=/proc/self/fd/3, offset=0G, length=1G node1: file=/proc/self/fd/4, offset=0G, length=1G or more ingenious options. Regards, Daniel
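The deleted-file trick behind the `/proc/self/fd/NNN` suggestion is plain Linux /proc semantics and easy to demonstrate outside QEMU (tmpfs stands in for hugetlbfs here):

```shell
# Create a file, hold an fd open, delete the file, then re-open it via
# /proc/self/fd. This is the same mechanism that would let a client reach
# QEMU's unlinked -mem-path backing file without --keep-mem-path-files.
tmp=$(mktemp)
exec 3<>"$tmp"
echo "backing data" >&3
rm -f "$tmp"                   # the name is gone, but the inode lives on via fd 3
data=$(cat /proc/self/fd/3)    # a fresh open of the deleted file, from offset 0
exec 3>&-
echo "$data"
```

A client process would use `/proc/<qemu-pid>/fd/NNN` rather than `/proc/self/...`, as the follow-up message in this thread points out.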
Re: [Qemu-devel] [PATCH 5/6 v5] deal with guest panicked event according to -onpanic parameter
On Wed, Jun 27, 2012 at 04:52:32PM +0200, Cornelia Huck wrote: > On Wed, 27 Jun 2012 15:02:23 +0800 > Wen Congyang wrote: > > > When the guest is panicked, it will write 0x1 to the port KVM_PV_PORT. > > So if qemu reads 0x1 from this port, we can do the following four > > things according to the parameter -onpanic: > > 1. emit QEVENT_GUEST_PANICKED only > > 2. emit QEVENT_GUEST_PANICKED and pause the guest > > 3. emit QEVENT_GUEST_PANICKED and poweroff the guest > > 4. emit QEVENT_GUEST_PANICKED and reset the guest > > Would it be useful to add some "dump the guest" actions here? Better off leaving that to the mgmt layer using QEMU. If you tried to directly handle "dump the guest" in the context of the panic notifier then you would add all sorts of extra complexity to this otherwise simple feature. For a start, you need to tell it what filename to use, which is not something you can necessarily decide at the time QEMU starts - you might want a separate filename each time a panic occurs. The mgmt app might not even want QEMU to dump to a file - it might want to use a socket, or pass in a file descriptor at time of dump. All in all, it is better to keep the panic notifier simple, and let the mgmt app then decide whether to take a dump separately, using existing QEMU monitor commands and features. Daniel
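The mgmt-side policy Daniel describes — choosing a fresh dump target per panic event rather than baking one filename into QEMU — can be sketched in a few lines. The directory and guest name below are hypothetical, purely for illustration:

```shell
# Hypothetical mgmt-app helper: derive a unique dump path per panic event,
# so QEMU itself never needs to be told a filename up front.
dump_dir=${DUMP_DIR:-/var/lib/guest-dumps}
guest=demo-guest
stamp=$(date +%Y%m%d-%H%M%S)
dump_file="$dump_dir/$guest-$stamp.core"
echo "$dump_file"
# the app would then drive an existing monitor command against $dump_file
# (or hand QEMU a socket / inherited fd instead of a file)
```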
Re: [Qemu-devel] [PATCH 3/3] deal with guest panicked event
On Tue, Jun 12, 2012 at 09:35:04AM -0300, Luiz Capitulino wrote: > On Tue, 12 Jun 2012 14:55:37 +0800 > Wen Congyang wrote: > > > >> +static void panicked_perform_action(void) > > >> +{ > > >> +switch(panicked_action) { > > >> +case PANICKED_REPORT: > > >> +panicked_mon_event("report"); > > >> +break; > > >> + > > >> +case PANICKED_PAUSE: > > >> +panicked_mon_event("pause"); > > >> +vm_stop(RUN_STATE_GUEST_PANICKED); > > >> +break; > > >> + > > >> +case PANICKED_QUIT: > > >> +panicked_mon_event("quit"); > > >> +exit(0); > > >> +break; > > >> +} > > > > > > Having the data argument is not needed/wanted. The mngt app can guess it > > > if it > > > needs to know it, but I think it doesn't want to. > > > > Libvirt will do something when the kernel is panicked, so it should know > > the action > > in qemu side. > > But the action will be set by libvirt itself, no? Sure, but the whole world isn't libvirt. If the process listening to the monitor is not the same as the process which launched the VM, then I think including the action is worthwhile. Besides, the way Wen has done this is identical to what we already do with QEVENT_WATCHDOG and I think it is desirable to keep consistency here. Daniel
Re: [Qemu-devel] KVM call agenda for June, Tuesday 15th
On Tue, May 15, 2012 at 08:44:14AM -0500, Anthony Liguori wrote: > On 05/15/2012 03:51 AM, Kevin Wolf wrote: > >Currently we have a very simple unidirectional structure: > >qemu is a standalone program that keeps running on its own. libvirt is > >the user of qemu. Often enough it's already hard to get things working > >correctly in error cases with this simple structure - do you really want > >to have qemu depend on an RPC to libvirt? > > Yes. We're relying on libvirt for a *syscall* that the kernel isn't > processing correctly. I'm not advocating a general mechanism where > we defer large parts of QEMU to libvirt. This is specifically the > open() syscall. > > >You're right that the proper fix would be in the kernel, but in qemu a > >much better solution than RPCs to libvirt is allowing all QMP commands > >that open new files to pass a block device description that can contain > >a fd. > > I don't agree that this is an obviously better solution. For > example, it mandates that libvirt parse image formats to determine > the backing file chains. I think that the question of parsing image formats is tangential to this QEMU impl choice. > OTOH, the open() RPC allows libvirt to avoid parsing image formats. > It could do something as simple as have the user specify a white > list of image files the guest is allowed to access in the domain XML > and validate against that. > > It removes considerable complexity from libvirt as it doesn't have > to construct a potentially complex set of blockdev arguments. I don't really think this QEMU approach to a callback for arbitrary files simplifies libvirt's life in any way. In fact I think it will actually complicate our life, because instead of being able to provide all the information/resources required at one time, we have to wait to get async callbacks some time later. We then have to try and figure out whether the file being requested is actually allowed by the config. 
> >This would be much better than first getting an open command via QMP > >and then using an RPC to ask back what we're really meant to open. > > > >To the full extent we're going to get this with blockdev-add (which is > >what we should really start working on now rather than on hacks like > >-open-fd-hook), but if you like hacks, much (if not all) of it is > >already possible today with the 'existing' mode of live snapshots. > > I really don't think that blockdev is an elegant solution to this > problem. It pushes an awful lot of complexity to libvirt (or any > management tool). > > I actually think Avi's original idea of a filename dictionary is a > better approach than blockdev for solving this problem. While I raise blockdev as an alternative approach, I am open to other alternative ways to provide this config via the CLI or monitor. Basically anything that isn't this generic file open callback. Daniel
Re: smp option of qemu-kvm
On Thu, Apr 05, 2012 at 02:52:40PM -0400, Steven wrote: > Hi, Daniel, > Thanks for your quick response. However, ps -eLf shows 4 threads > for the VM, and I checked that the 4 threads have the same tgid. > But the VM I created is with the -smp 2 option. Could you explain this? Thanks. As well as the vCPU threads, QEMU creates other threads as needed, typically for I/O - indeed the count of threads may vary over time. Daniel
Re: smp option of qemu-kvm
On Thu, Apr 05, 2012 at 02:28:51PM -0400, Steven wrote: > Hi, > I started a kvm VM by adding the -smp 2 option. From inside the guest, I > can see that /proc/cpuinfo outputs 2 cores. > However, in the host, I only observe one qemu-kvm process for that VM. > Does that mean this VM is actually running on one core? > If so, how to make a VM run on 2 or more cores? Thanks. Each VCPU in KVM corresponds to a separate thread in the process. The 'ps' command only ever shows the thread leader by default - so you don't see those VCPU threads in the process list; e.g. run ps -eLf to see all threads. Daniel
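The thread-leader behaviour is ordinary Linux semantics and can be checked on any process: every thread shares the tgid, and `/proc/<pid>/task` lists one entry per thread — this is the same data `ps -eLf` reads. A quick check against the current shell (which has only one thread, but the mechanism is identical for a qemu-kvm process with its vCPU and I/O threads):

```shell
# List the threads of a process via /proc; for a qemu-kvm pid this is where
# the vCPU threads (hidden from plain 'ps') show up.
pid=$$
threads=$(ls /proc/$pid/task | wc -l)
echo "pid=$pid threads=$threads"
```

Substituting a qemu-kvm pid for `$$` would show the vCPU count plus the I/O and worker threads Daniel mentions.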
Re: [PATCH 0/2 v3] kvm: notify host when guest panicked
On Wed, Mar 21, 2012 at 06:25:16PM +0200, Avi Kivity wrote: > On 03/21/2012 06:18 PM, Corey Minyard wrote: > > > >> Look at drivers/char/ipmi/ipmi_msghandler.c. It has code to send panic > >> event over IPMI. The code is pretty complex. Of course if we are going to > >> implement something more complex than a simple hypercall for panic > >> notification we better do something more interesting with it than just > >> saying "panic happened", like sending stack traces on all cpus for > >> instance. > > > > I doubt that's the best example, unfortunately. The IPMI event log > > has limited space and it has to be sent a little piece at a time since > > each log entry is 14 bytes. It just prints the panic string, nothing > > else. Not that it isn't useful, it has saved my butt before. > > > > You have lots of interesting options with paravirtualization. You > > could, for instance, create a console driver that delivered all > > console output efficiently through a hypercall. That would be really > > easy. Or, as you mention, a custom way to deliver panic information. > > Collecting information like stack traces would be harder to > > accomplish, as I don't think there is currently a way to get it except > > by sending it to printk. > > That already exists; virtio-console (or serial console emulation) can do > the job. > > In fact the feature can be implemented 100% host side by searching for a > panic string signature in the console logs. 
You can even go one better and search for the panic string in the guest memory directly, which is what virt-dmesg does :-) http://people.redhat.com/~rjones/virt-dmesg/ Daniel
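The host-side log-scanning idea is trivial to sketch; here a fabricated log file stands in for the guest's serial output:

```shell
# Watch a (simulated) guest serial log for the kernel panic signature --
# roughly what a libvirtd-side notifier based on console output would do.
log=$(mktemp)
printf 'systemd[1]: started\nKernel panic - not syncing: Fatal exception\n' > "$log"
if grep -q 'Kernel panic - not syncing' "$log"; then
    status=panicked
else
    status=running
fi
rm -f "$log"
echo "$status"
```

As the thread notes elsewhere, this only works when the guest actually routes console output to the serial port, which is exactly the limitation that motivates the hypercall/I/O-port approaches.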
Re: [PATCH 0/2 v3] kvm: notify host when guest panicked
On Wed, Mar 14, 2012 at 07:06:50PM +0800, Wen Congyang wrote: > At 03/14/2012 06:59 PM, Daniel P. Berrange Wrote: > > On Wed, Mar 14, 2012 at 06:58:47PM +0800, Wen Congyang wrote: > >> At 03/14/2012 06:52 PM, Avi Kivity Wrote: > >>> On 03/14/2012 12:52 PM, Wen Congyang wrote: > >>>>> > >>>>>> If so, is this channel visible to guest userspace? If the channle is > >>>>>> visible to guest > >>>>>> userspace, the program running in userspace may write the same message > >>>>>> to the channel. > >>>>> > >>>>> Access control is via permissions. You can have udev scripts assign > >>>>> whatever uid and gid to the port of your interest. By default, all > >>>>> ports are only accessible to the root user. > >>>> > >>>> We should also prevent root user writing message to this channel if it is > >>>> used for panicked notification. > >>>> > >>> > >>> Why? root can easily cause a panic. > >>> > >> > >> root user can write the same message to virtio-serial while the guest is > >> running... > > > > Unless you are running a MAC policy which strictly confines the root > > account, root can cause a kernel panic regardless of virtio-serial > > permissions in the guest: > > > > echo c > /proc/sysrq-trigger > > Yes, root user can cause a kernel panic. But if he writes the same message to > virtio-serial, > the host will see the guest is panicked while the guest is not panicked. The > host is cheated. The host mgmt layer must *ALWAYS* expect that any information originating from the guest is bogus. It must never trust the guest info. So regardless of the implementation, you have to expect that the guest might have lied to you about it being crashed. The same is true even of Xen's panic notifier. So if an application is automatically triggering core dumps based on this panic notification, it needs to be aware that the guest can lie and take steps to avoid the guest causing a DOS attack on the host. 
Most likely by rate limiting the frequency of core dumps per guest, and/or setting a max core dump storage quota per guest. Regards, Daniel
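The rate-limiting policy suggested above can be sketched in a few lines; the per-guest state file and the 60-second window are arbitrary illustrative choices:

```shell
# Allow at most one dump per guest per RATE_WINDOW seconds; further panic
# notifications inside the window are recorded as suppressed.
RATE_WINDOW=60
state=$(mktemp)                  # stands in for a per-guest persistent state file
now=$(date +%s)
last=$(cat "$state"); [ -n "$last" ] || last=0
if [ $((now - last)) -ge "$RATE_WINDOW" ]; then
    echo "$now" > "$state"       # record this dump's timestamp
    decision="dump allowed"
else
    decision="dump suppressed"   # guest is panicking (or lying) too often
fi
echo "$decision"
rm -f "$state"
```

A real mgmt app would key the state by guest UUID and also enforce the storage quota Daniel mentions before writing any dump.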
Re: [PATCH 0/2 v3] kvm: notify host when guest panicked
On Wed, Mar 14, 2012 at 06:58:47PM +0800, Wen Congyang wrote: > At 03/14/2012 06:52 PM, Avi Kivity Wrote: > > On 03/14/2012 12:52 PM, Wen Congyang wrote: > >>> > If so, is this channel visible to guest userspace? If the channle is > visible to guest > userspace, the program running in userspace may write the same message > to the channel. > >>> > >>> Access control is via permissions. You can have udev scripts assign > >>> whatever uid and gid to the port of your interest. By default, all > >>> ports are only accessible to the root user. > >> > >> We should also prevent root user writing message to this channel if it is > >> used for panicked notification. > >> > > > > Why? root can easily cause a panic. > > > > root user can write the same message to virtio-serial while the guest is > running... Unless you are running a MAC policy which strictly confines the root account, root can cause a kernel panic regardless of virtio-serial permissions in the guest: echo c > /proc/sysrq-trigger Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2 v3] kvm: notify host when guest panicked
On Wed, Mar 14, 2012 at 03:21:14PM +0530, Amit Shah wrote: > On (Wed) 14 Mar 2012 [16:29:50], Wen Congyang wrote: > > At 03/13/2012 06:47 PM, Avi Kivity Wrote: > > > On 03/13/2012 11:18 AM, Daniel P. Berrange wrote: > > >> On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote: > > >>> On 03/12/2012 11:04 AM, Wen Congyang wrote: > > >>>> Do you have any other comments about this patch? > > >>>> > > >>> > > >>> Not really, but I'm not 100% convinced the patch is worthwhile. It's > > >>> likely to only be used by Linux, which has kexec facilities, and you can > > >>> put talk to management via virtio-serial and describe the crash in more > > >>> details than a simple hypercall. > > >> > > >> As mentioned before, I don't think virtio-serial is a good fit for this. > > >> We want something that is simple & guaranteed always available. Using > > >> virtio-serial requires significant setup work on both the host and guest. > > > > > > So what? It needs to be done anyway for the guest agent. > > > > > >> Many management application won't know to make a vioserial device > > >> available > > >> to all guests they create. > > > > > > Then they won't know to deal with the panic event either. > > > > > >> Most administrators won't even configure kexec, > > >> let alone virtio serial on top of it. > > > > > > It should be done by the OS vendor, not the individual admin. > > > > > >> The hypercall requires zero host > > >> side config, and zero guest side config, which IMHO is what we need for > > >> this feature. > > > > > > If it was this one feature, yes. But we keep getting more and more > > > features like that and we bloat the hypervisor. There's a reason we > > > have a host-to-guest channel, we should use it. > > > > > > > I donot know how to use virtio-serial. > > > > I start vm like this: > > qemu ...\ > >-device virtio-serial \ > > -chardev socket,path=/tmp/foo,server,nowait,id=foo \ > > -device virtserialport,chardev=foo,name=port1 ... > > This is sufficient. 
On the host, you can open /tmp/foo using a custom > program or nc (nc -U /tmp/foo). On the guest, you can just open > /dev/virtio-ports/port1 and read/write into it. > > See the following links for more details. > > https://fedoraproject.org/wiki/Features/VirtioSerial#How_To_Test > http://www.linux-kvm.org/page/Virtio-serial_API > > > You said that there are too many channels. Does it mean /tmp/foo is a > > channel? > > You can have several such -device virtserialport. The -device part > describes what the guest will see. The -chardev part ties that to the > host-side part of the channel. > > /tmp/foo is the host end-point for the channel, in the example above, > and /dev/virtio-ports/port1 is the guest-side end-point. If we do choose to use virtio-serial for panics (which I don't think we should), then we should not expose it in the host filesystem. The host side should be a virtual chardev backend internal to QEMU, in the same way that 'spicevmc' is handled. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
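For completeness, the same channel expressed as libvirt domain XML would look roughly like this (socket path and port name taken from the qemu example above; a sketch, not copied from the original mail):

```xml
<channel type='unix'>
  <!-- host end-point: UNIX socket, like -chardev socket,path=/tmp/foo -->
  <source mode='bind' path='/tmp/foo'/>
  <!-- guest end-point: appears as /dev/virtio-ports/port1 in the guest -->
  <target type='virtio' name='port1'/>
</channel>
```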
Re: [PATCH 0/2 v3] kvm: notify host when guest panicked
On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote: > On 03/12/2012 11:04 AM, Wen Congyang wrote: > > Do you have any other comments about this patch? > > > > Not really, but I'm not 100% convinced the patch is worthwhile. It's > likely to only be used by Linux, which has kexec facilities, and you can > put talk to management via virtio-serial and describe the crash in more > details than a simple hypercall. As mentioned before, I don't think virtio-serial is a good fit for this. We want something that is simple & guaranteed always available. Using virtio-serial requires significant setup work on both the host and guest. Many management applications won't know to make a vioserial device available to all guests they create. Most administrators won't even configure kexec, let alone virtio serial on top of it. The hypercall requires zero host side config, and zero guest side config, which IMHO is what we need for this feature. Daniel
Re: [RESEND][PATCH 2/2 v3] deal with guest panicked event
On Thu, Mar 08, 2012 at 01:52:45PM +0200, Avi Kivity wrote: > On 03/08/2012 01:36 PM, Daniel P. Berrange wrote: > > On Thu, Mar 08, 2012 at 01:28:56PM +0200, Avi Kivity wrote: > > > On 03/08/2012 12:15 PM, Wen Congyang wrote: > > > > When the host knows the guest is panicked, it will set > > > > exit_reason to KVM_EXIT_GUEST_PANICKED. So if qemu receive > > > > this exit_reason, we can send a event to tell management > > > > application that the guest is panicked and set the guest > > > > status to RUN_STATE_PANICKED. > > > > > > > > Signed-off-by: Wen Congyang > > > > --- > > > > kvm-all.c|5 + > > > > monitor.c|3 +++ > > > > monitor.h|1 + > > > > qapi-schema.json |2 +- > > > > qmp.c|3 ++- > > > > vl.c |1 + > > > > 6 files changed, 13 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/kvm-all.c b/kvm-all.c > > > > index 77eadf6..b3c9a83 100644 > > > > --- a/kvm-all.c > > > > +++ b/kvm-all.c > > > > @@ -1290,6 +1290,11 @@ int kvm_cpu_exec(CPUState *env) > > > > (uint64_t)run->hw.hardware_exit_reason); > > > > ret = -1; > > > > break; > > > > +case KVM_EXIT_GUEST_PANICKED: > > > > +monitor_protocol_event(QEVENT_GUEST_PANICKED, NULL); > > > > +vm_stop(RUN_STATE_PANICKED); > > > > +ret = -1; > > > > +break; > > > > > > > > > > If the management application is not aware of this event, then it will > > > never resume the guest, so it will appear hung. > > > > Even if the mgmt app doesn't know about the QEVENT_GUEST_PANICKED, it should > > still see a QEVENT_STOP event emitted by vm_stop() surely ? So it will > > know the guest CPUs have been stopped, even if it isn't aware of the > > reason why, which seems fine to me. > > No. The guest is stopped, and there's no reason to suppose that the > management app will restart it. Behaviour has changed. > > Suppose the guest has reboot_on_panic set; now the behaviour change is > even more visible - service will stop completely instead of being > interrupted for a bit while the guest reboots. 
Hmm, so this calls for a new command line argument to control behaviour, similar to what we do for disk werror, eg something like --onpanic "report|pause|stop|..." where

report - emit QEVENT_GUEST_PANICKED only
pause - emit QEVENT_GUEST_PANICKED and pause VM
stop - emit QEVENT_GUEST_PANICKED and quit VM

This would map fairly well into libvirt, where we already have config parameters for controlling what to do with a guest when it panics. Regards, Daniel
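The libvirt per-guest crash policy referred to here is the <on_crash> lifecycle element; a qemu --onpanic flag would presumably map onto actions like these (a sketch of the libvirt side, not from the original mail):

```xml
<!-- libvirt domain XML: action taken when the guest crashes.
     Other values include destroy, restart, preserve and
     coredump-destroy. -->
<on_crash>coredump-restart</on_crash>
```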
Re: [RESEND][PATCH 2/2 v3] deal with guest panicked event
On Thu, Mar 08, 2012 at 01:28:56PM +0200, Avi Kivity wrote: > On 03/08/2012 12:15 PM, Wen Congyang wrote: > > When the host knows the guest is panicked, it will set > > exit_reason to KVM_EXIT_GUEST_PANICKED. So if qemu receive > > this exit_reason, we can send a event to tell management > > application that the guest is panicked and set the guest > > status to RUN_STATE_PANICKED. > > > > Signed-off-by: Wen Congyang > > --- > > kvm-all.c|5 + > > monitor.c|3 +++ > > monitor.h|1 + > > qapi-schema.json |2 +- > > qmp.c|3 ++- > > vl.c |1 + > > 6 files changed, 13 insertions(+), 2 deletions(-) > > > > diff --git a/kvm-all.c b/kvm-all.c > > index 77eadf6..b3c9a83 100644 > > --- a/kvm-all.c > > +++ b/kvm-all.c > > @@ -1290,6 +1290,11 @@ int kvm_cpu_exec(CPUState *env) > > (uint64_t)run->hw.hardware_exit_reason); > > ret = -1; > > break; > > +case KVM_EXIT_GUEST_PANICKED: > > +monitor_protocol_event(QEVENT_GUEST_PANICKED, NULL); > > +vm_stop(RUN_STATE_PANICKED); > > +ret = -1; > > +break; > > > > If the management application is not aware of this event, then it will > never resume the guest, so it will appear hung. Even if the mgmt app doesn't know about the QEVENT_GUEST_PANICKED, it should still see a QEVENT_STOP event emitted by vm_stop() surely ? So it will know the guest CPUs have been stopped, even if it isn't aware of the reason why, which seems fine to me. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Use getaddrinfo for migration
On Fri, Mar 02, 2012 at 02:25:36PM +0400, Michael Tokarev wrote: > Not a reply to the patch but a general observation. > > I noticed that the tcp migration uses gethostname > (or getaddrinfo after this patch) from the main > thread - is it really the way to go? Note that > DNS query which is done may block for a large amount > of time. Is it really safe in this context? Should > it resolve the name in a separate thread, allowing > guest to run while it is doing that? > > This question is important for me because right now > I'm evaluating a network-connected block device driver > which should do failover, so it will have to resolve > alternative name(s) at runtime (especially since list > of available targets is dynamic). > > From one point, _usually_, the delay there is very > small since it is unlikely you'll do migration or > failover overseas, so most likely you'll have the > answer from DNS handy. But from another point, if > the DNS is malfunctioning right at that time (eg, > one of the two DNS resolvers is being rebooted), > the delay even from local DNS may be noticeable. Yes, I think you are correct - QEMU should take care to ensure that DNS resolution can not block the QEMU event loop thread. There is the glibc extension (getaddrinfo_a) which does async DNS resolution, but for the sake of portability it is probably better to use a thread to do it. Regards, Daniel
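The thread-based approach can be sketched as follows (illustrative only, and in Python rather than QEMU's C; a real event loop would poll the queue instead of blocking on it):

```python
import queue
import socket
import threading

def resolve_in_thread(host, port, timeout=5.0):
    """Resolve host:port without blocking the calling (event-loop) thread.

    The blocking getaddrinfo() runs in a worker thread; the caller collects
    the result from a queue. Here we block on the queue for brevity."""
    result_q = queue.Queue()

    def worker():
        try:
            result_q.put(socket.getaddrinfo(host, port,
                                            type=socket.SOCK_STREAM))
        except OSError as exc:
            result_q.put(exc)  # hand errors back to the caller, too

    threading.Thread(target=worker, daemon=True).start()
    # A real event loop would poll result_q instead of blocking here.
    return result_q.get(timeout=timeout)

addrs = resolve_in_thread("localhost", 80)
```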
Re: [PATCH] kvm: notify host when guest paniced
On Wed, Feb 29, 2012 at 12:05:32PM +0200, Avi Kivity wrote: > On 02/29/2012 11:58 AM, Daniel P. Berrange wrote: > > > > > > How about using a virtio-serial channel for this? You can transfer any > > > amount of information (including the dump itself). > > > > When the guest OS has crashed, any dumps will be done from the host > > OS using libvirt's core dump mechanism. The guest OS isn't involved > > and is likely too dead to be of any use anyway. Likewise it is > > quite probably too dead to work a virtio-serial channel or any > > similarly complex device. We're really just after the simplest > > possible notification that the guest kernel has paniced. > > If it's alive enough to panic, it's alive enough to kexec its kdump > kernel. After that it can do anything. > > Guest-internal dumps are more useful IMO that host-initiated dumps. In > a cloud, the host-initiated dump is left on the host, outside the reach > of the guest admin, outside the guest image where all the symbols are, > and sometimes not even on the same host if a live migration occurred. > It's more useful in small setups, or if the problem is in the > hypervisor, not the guest. I don't think guest vs host dumps should be considered mutually exclusive, they both have pluses+minuses. Configuring kexec+kdump requires non-negligible guest admin configuration work before it's usable, and this work is guest OS specific, if it is possible at all. A permanent panic notifier that's built in the kernel by default requires zero guest admin config, and can allow host admin to automate collection of dumps across all their hosts/guests. The KVM hypercall notification is fairly trivially ported to any OS kernel, by comparison with a full virtio + virtio-serial impl. 
Regards, Daniel
Re: [PATCH] kvm: notify host when guest paniced
On Wed, Feb 29, 2012 at 11:49:58AM +0200, Avi Kivity wrote: > On 02/29/2012 03:29 AM, Wen Congyang wrote: > > At 02/28/2012 07:23 PM, Avi Kivity Wrote: > > > On 02/27/2012 05:01 AM, Wen Congyang wrote: > > >> We can know the guest is paniced when the guest runs on xen. > > >> But we do not have such feature on kvm. This patch implemnts > > >> this feature, and the implementation is the same as xen: > > >> register panic notifier, and call hypercall when the guest > > >> is paniced. > > > > > > What's the motivation for this? "Xen does this" is insufficient. > > > > Another purpose is: management app(for example: libvirt) can do auto > > dump when the guest is crashed. If management app does not do auto > > dump, the guest's user can do dump by hand if he sees the guest is > > paniced. > > > > I am thinking about another status: dumping. This status tells > > the guest's user that the guest is paniced, and the OS's dump function > > is working. > > > > These two status can tell the guest's user whether the guest is pancied, > > and what should he do if the guest is paniced. > > > > How about using a virtio-serial channel for this? You can transfer any > amount of information (including the dump itself). When the guest OS has crashed, any dumps will be done from the host OS using libvirt's core dump mechanism. The guest OS isn't involved and is likely too dead to be of any use anyway. Likewise it is quite probably too dead to work a virtio-serial channel or any similarly complex device. We're really just after the simplest possible notification that the guest kernel has paniced. 
Regards, Daniel
Re: [libvirt] QEMU applying for Google Summer of Code 2012
On Fri, Feb 10, 2012 at 10:30:24AM +, Stefan Hajnoczi wrote: > This year's Google Summer of Code has been announced: > > http://www.google-melange.com/gsoc/events/google/gsoc2012 > > For those who haven't heard of GSoC before, it funds university > students to work on open source projects during the summer. > Organizations, such as QEMU, can participate to attract students who > will tackle projects for 12 weeks this summer. The GSoC program has > been very successful because it gives students real open source > experience and organizations can grow their development community. > > QEMU has participated for several years and I would like to organize > our participation this year. Luiz was QEMU organization administrator > last year and contacted me because he will not have time this year. I > will prepare the application form for QEMU so that we will be > considered for 2012. > > Umbrella organization > - > Like last year, we can provide a home for KVM kernel module and > libvirt projects too if those organizations prefer not to apply to > GSoC themselves. Please let us know so we can work together! To maximise the spirit of collaboration between libvirt & QEMU/KVM communities I think it would make sense for us to work together under the same GSoC Umbrella organization. Regards, Daniel
Re: [Qemu-devel] qemu-kvm upstreaming: Do we need -no-kvm-pit and -no-kvm-pit-reinjection semantics?
On Fri, Jan 20, 2012 at 02:02:03PM +0100, Jan Kiszka wrote: > On 2012-01-20 13:54, Daniel P. Berrange wrote: > > On Fri, Jan 20, 2012 at 01:51:20PM +0100, Jan Kiszka wrote: > >> On 2012-01-20 13:42, Daniel P. Berrange wrote: > >>> On Fri, Jan 20, 2012 at 01:00:06PM +0100, Jan Kiszka wrote: > >>>> On 2012-01-20 12:45, Daniel P. Berrange wrote: > >>>>> On Fri, Jan 20, 2012 at 12:13:48PM +0100, Jan Kiszka wrote: > >>>>>> On 2012-01-20 11:25, Daniel P. Berrange wrote: > >>>>>>> On Fri, Jan 20, 2012 at 11:22:27AM +0100, Jan Kiszka wrote: > >>>>>>>> On 2012-01-20 11:14, Marcelo Tosatti wrote: > >>>>>>>>> On Thu, Jan 19, 2012 at 07:01:44PM +0100, Jan Kiszka wrote: > >>>>>>>>>> On 2012-01-19 18:53, Marcelo Tosatti wrote: > >>>>>>>>>>>> What problems does it cause, and in which scenarios? Can't they > >>>>>>>>>>>> be > >>>>>>>>>>>> fixed? > >>>>>>>>>>> > >>>>>>>>>>> If the guest compensates for lost ticks, and KVM reinjects them, > >>>>>>>>>>> guest > >>>>>>>>>>> time advances faster then it should, to the extent where NTP > >>>>>>>>>>> fails to > >>>>>>>>>>> correct it. This is the case with RHEL4. > >>>>>>>>>>> > >>>>>>>>>>> But for example v2.4 kernel (or Windows with non-acpi HAL) do not > >>>>>>>>>>> compensate. In that case you want KVM to reinject. > >>>>>>>>>>> > >>>>>>>>>>> I don't know of any other way to fix this. > >>>>>>>>>> > >>>>>>>>>> OK, i see. The old unsolved problem of guessing what is being > >>>>>>>>>> executed. > >>>>>>>>>> > >>>>>>>>>> Then the next question is how and where to control this. > >>>>>>>>>> Conceptually, > >>>>>>>>>> there should rather be a global switch say "compensate for lost > >>>>>>>>>> ticks of > >>>>>>>>>> periodic timers: yes/no" - instead of a per-timer knob. Didn't we > >>>>>>>>>> discussed something like this before? > >>>>>>>>> > >>>>>>>>> I don't see the advantage of a global control versus per device > >>>>>>>>> control (in fact it lowers flexibility). > >>>>>>>> > >>>>>>>> Usability. 
Users should not have to care about individual tick-based > >>>>>>>> clocks. They care about "my OS requires lost ticks compensation, yes > >>>>>>>> or no". > >>>>>>> > >>>>>>> FYI, at the libvirt level we model policy against individual timers, > >>>>>>> for > >>>>>>> example: > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>> > >>>>>> Are the various modes of tickpolicy fully specified somewhere? > >>>>> > >>>>> There are some (not all that great) docs here: > >>>>> > >>>>> http://libvirt.org/formatdomain.html#elementsTime > >>>>> > >>>>> The meaning of the 4 policies are: > >>>>> > >>>>> delay: continue to deliver at normal rate > >>>> > >>>> What does this mean? The timer stops ticking until the guest accepts its > >>>> ticks again? > >>> > >>> It means that the hypervisor will not attempt to do any compensation, > >>> so the guest will see delays in its ticks being delivered & gradually > >>> drift over time. > >> > >> Still, is the logic as I described? Or what is the difference to "discard". > > > > With 'discard', the delayed tick will be thrown away. In 'delay', the > > delayed tick will still be injected to the guest, possibly well after > > the intended injection time though, and there will be no attempt to > > compensate by speeding up delivery of later ticks. > > OK, let's see if I got it: > > delay - all lost ticks are replayed in a row once the guest accepts > them again > catchup - lost ticks are gradually replayed at a higher frequency than > the original tick > merge - at most one tick is replayed once the guest accepts it again > discard - no lost ticks compensation Yes, I think that is a good understanding. 
Daniel
Re: [Qemu-devel] qemu-kvm upstreaming: Do we need -no-kvm-pit and -no-kvm-pit-reinjection semantics?
On Fri, Jan 20, 2012 at 01:51:20PM +0100, Jan Kiszka wrote: > On 2012-01-20 13:42, Daniel P. Berrange wrote: > > On Fri, Jan 20, 2012 at 01:00:06PM +0100, Jan Kiszka wrote: > >> On 2012-01-20 12:45, Daniel P. Berrange wrote: > >>> On Fri, Jan 20, 2012 at 12:13:48PM +0100, Jan Kiszka wrote: > >>>> On 2012-01-20 11:25, Daniel P. Berrange wrote: > >>>>> On Fri, Jan 20, 2012 at 11:22:27AM +0100, Jan Kiszka wrote: > >>>>>> On 2012-01-20 11:14, Marcelo Tosatti wrote: > >>>>>>> On Thu, Jan 19, 2012 at 07:01:44PM +0100, Jan Kiszka wrote: > >>>>>>>> On 2012-01-19 18:53, Marcelo Tosatti wrote: > >>>>>>>>>> What problems does it cause, and in which scenarios? Can't they be > >>>>>>>>>> fixed? > >>>>>>>>> > >>>>>>>>> If the guest compensates for lost ticks, and KVM reinjects them, > >>>>>>>>> guest > >>>>>>>>> time advances faster then it should, to the extent where NTP fails > >>>>>>>>> to > >>>>>>>>> correct it. This is the case with RHEL4. > >>>>>>>>> > >>>>>>>>> But for example v2.4 kernel (or Windows with non-acpi HAL) do not > >>>>>>>>> compensate. In that case you want KVM to reinject. > >>>>>>>>> > >>>>>>>>> I don't know of any other way to fix this. > >>>>>>>> > >>>>>>>> OK, i see. The old unsolved problem of guessing what is being > >>>>>>>> executed. > >>>>>>>> > >>>>>>>> Then the next question is how and where to control this. > >>>>>>>> Conceptually, > >>>>>>>> there should rather be a global switch say "compensate for lost > >>>>>>>> ticks of > >>>>>>>> periodic timers: yes/no" - instead of a per-timer knob. Didn't we > >>>>>>>> discussed something like this before? > >>>>>>> > >>>>>>> I don't see the advantage of a global control versus per device > >>>>>>> control (in fact it lowers flexibility). > >>>>>> > >>>>>> Usability. Users should not have to care about individual tick-based > >>>>>> clocks. They care about "my OS requires lost ticks compensation, yes > >>>>>> or no". 
> >>>>> > >>>>> FYI, at the libvirt level we model policy against individual timers, for > >>>>> example: > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>> > >>>> Are the various modes of tickpolicy fully specified somewhere? > >>> > >>> There are some (not all that great) docs here: > >>> > >>> http://libvirt.org/formatdomain.html#elementsTime > >>> > >>> The meaning of the 4 policies are: > >>> > >>> delay: continue to deliver at normal rate > >> > >> What does this mean? The timer stops ticking until the guest accepts its > >> ticks again? > > > > It means that the hypervisor will not attempt to do any compensation, > > so the guest will see delays in its ticks being delivered & gradually > > drift over time. > > Still, is the logic as I described? Or what is the difference to "discard". With 'discard', the delayed tick will be thrown away. In 'delay', the delayed tick will still be injected to the guest, possibly well after the intended injection time though, and there will be no attempt to compensate by speeding up delivery of later ticks. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] qemu-kvm upstreaming: Do we need -no-kvm-pit and -no-kvm-pit-reinjection semantics?
On Fri, Jan 20, 2012 at 01:00:06PM +0100, Jan Kiszka wrote: > On 2012-01-20 12:45, Daniel P. Berrange wrote: > > On Fri, Jan 20, 2012 at 12:13:48PM +0100, Jan Kiszka wrote: > >> On 2012-01-20 11:25, Daniel P. Berrange wrote: > >>> On Fri, Jan 20, 2012 at 11:22:27AM +0100, Jan Kiszka wrote: > >>>> On 2012-01-20 11:14, Marcelo Tosatti wrote: > >>>>> On Thu, Jan 19, 2012 at 07:01:44PM +0100, Jan Kiszka wrote: > >>>>>> On 2012-01-19 18:53, Marcelo Tosatti wrote: > >>>>>>>> What problems does it cause, and in which scenarios? Can't they be > >>>>>>>> fixed? > >>>>>>> > >>>>>>> If the guest compensates for lost ticks, and KVM reinjects them, guest > >>>>>>> time advances faster then it should, to the extent where NTP fails to > >>>>>>> correct it. This is the case with RHEL4. > >>>>>>> > >>>>>>> But for example v2.4 kernel (or Windows with non-acpi HAL) do not > >>>>>>> compensate. In that case you want KVM to reinject. > >>>>>>> > >>>>>>> I don't know of any other way to fix this. > >>>>>> > >>>>>> OK, i see. The old unsolved problem of guessing what is being executed. > >>>>>> > >>>>>> Then the next question is how and where to control this. Conceptually, > >>>>>> there should rather be a global switch say "compensate for lost ticks > >>>>>> of > >>>>>> periodic timers: yes/no" - instead of a per-timer knob. Didn't we > >>>>>> discussed something like this before? > >>>>> > >>>>> I don't see the advantage of a global control versus per device > >>>>> control (in fact it lowers flexibility). > >>>> > >>>> Usability. Users should not have to care about individual tick-based > >>>> clocks. They care about "my OS requires lost ticks compensation, yes or > >>>> no". > >>> > >>> FYI, at the libvirt level we model policy against individual timers, for > >>> example: > >>> > >>> > >>> > >>> > >>> > >> > >> Are the various modes of tickpolicy fully specified somewhere? 
> > > > There are some (not all that great) docs here: > > > > http://libvirt.org/formatdomain.html#elementsTime > > > > The meaning of the 4 policies are: > > > > delay: continue to deliver at normal rate > > What does this mean? The timer stops ticking until the guest accepts its > ticks again? It means that the hypervisor will not attempt to do any compensation, so the guest will see delays in its ticks being delivered & gradually drift over time. > > catchup: deliver at higher rate to catchup > > merge: ticks merged into 1 single tick > > discard: all missed ticks are discarded > > But those interpretations aren't stated in the docs. That makes it hard > to map them on individual hypervisors - or model proper new hypervisor > interfaces accordingly. That's not a real problem; now that I notice they are missing from the docs, I can just add them in. Daniel
Re: [Qemu-devel] qemu-kvm upstreaming: Do we need -no-kvm-pit and -no-kvm-pit-reinjection semantics?
On Fri, Jan 20, 2012 at 12:13:48PM +0100, Jan Kiszka wrote: > On 2012-01-20 11:25, Daniel P. Berrange wrote: > > On Fri, Jan 20, 2012 at 11:22:27AM +0100, Jan Kiszka wrote: > >> On 2012-01-20 11:14, Marcelo Tosatti wrote: > >>> On Thu, Jan 19, 2012 at 07:01:44PM +0100, Jan Kiszka wrote: > >>>> On 2012-01-19 18:53, Marcelo Tosatti wrote: > >>>>>> What problems does it cause, and in which scenarios? Can't they be > >>>>>> fixed? > >>>>> > >>>>> If the guest compensates for lost ticks, and KVM reinjects them, guest > >>>>> time advances faster then it should, to the extent where NTP fails to > >>>>> correct it. This is the case with RHEL4. > >>>>> > >>>>> But for example v2.4 kernel (or Windows with non-acpi HAL) do not > >>>>> compensate. In that case you want KVM to reinject. > >>>>> > >>>>> I don't know of any other way to fix this. > >>>> > >>>> OK, i see. The old unsolved problem of guessing what is being executed. > >>>> > >>>> Then the next question is how and where to control this. Conceptually, > >>>> there should rather be a global switch say "compensate for lost ticks of > >>>> periodic timers: yes/no" - instead of a per-timer knob. Didn't we > >>>> discussed something like this before? > >>> > >>> I don't see the advantage of a global control versus per device > >>> control (in fact it lowers flexibility). > >> > >> Usability. Users should not have to care about individual tick-based > >> clocks. They care about "my OS requires lost ticks compensation, yes or > >> no". > > > > FYI, at the libvirt level we model policy against individual timers, for > > example: > > > > > > > > > > > > Are the various modes of tickpolicy fully specified somewhere? 
There are some (not all that great) docs here: http://libvirt.org/formatdomain.html#elementsTime The meanings of the 4 policies are:

delay: continue to deliver at normal rate
catchup: deliver at higher rate to catchup
merge: ticks merged into 1 single tick
discard: all missed ticks are discarded

The original design rationale was here, though beware that some things changed between this design & the actual implementation libvirt has: https://www.redhat.com/archives/libvir-list/2010-March/msg00304.html Regards, Daniel
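For reference, this is roughly how those policies are expressed in a libvirt domain definition (a sketch following the formatdomain docs linked above; which timer names accept which policies depends on the hypervisor and libvirt version):

```xml
<clock offset='utc'>
  <!-- deliver missed ticks later, at the normal rate -->
  <timer name='pit' tickpolicy='delay'/>
  <!-- deliver at a higher rate until the guest catches up -->
  <timer name='rtc' tickpolicy='catchup'/>
  <!-- throw missed ticks away -->
  <timer name='hpet' tickpolicy='discard'/>
</clock>
```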
Re: [Qemu-devel] qemu-kvm upstreaming: Do we need -no-kvm-pit and -no-kvm-pit-reinjection semantics?
On Fri, Jan 20, 2012 at 11:22:27AM +0100, Jan Kiszka wrote: > On 2012-01-20 11:14, Marcelo Tosatti wrote: > > On Thu, Jan 19, 2012 at 07:01:44PM +0100, Jan Kiszka wrote: > >> On 2012-01-19 18:53, Marcelo Tosatti wrote: > What problems does it cause, and in which scenarios? Can't they be > fixed? > >>> > >>> If the guest compensates for lost ticks, and KVM reinjects them, guest > >>> time advances faster then it should, to the extent where NTP fails to > >>> correct it. This is the case with RHEL4. > >>> > >>> But for example v2.4 kernel (or Windows with non-acpi HAL) do not > >>> compensate. In that case you want KVM to reinject. > >>> > >>> I don't know of any other way to fix this. > >> > >> OK, i see. The old unsolved problem of guessing what is being executed. > >> > >> Then the next question is how and where to control this. Conceptually, > >> there should rather be a global switch say "compensate for lost ticks of > >> periodic timers: yes/no" - instead of a per-timer knob. Didn't we > >> discussed something like this before? > > > > I don't see the advantage of a global control versus per device > > control (in fact it lowers flexibility). > > Usability. Users should not have to care about individual tick-based > clocks. They care about "my OS requires lost ticks compensation, yes or no". FYI, at the libvirt level we model policy against individual timers, for example: Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SPEC-file for making RPMs (with rpmbuild)
On Fri, Jan 06, 2012 at 11:11:21AM +0100, Guido Winkelmann wrote: > Hi, > > Is there a spec-file somewhere for creating RPMs from the newest qemu-kvm > release? The current Fedora RPM specfiles are always a good bet to start off with: http://pkgs.fedoraproject.org/gitweb/?p=qemu.git;a=blob;f=qemu.spec;hb=HEAD By default these will build all QEMU targets, and a dedicated qemu-kvm binary too. There is a flag to restrict it to x86 only for cases where you don't want all archs. Regards, Daniel
Re: 5x slower guest disk performance with virtio disk
On Thu, Dec 15, 2011 at 07:16:22PM +0200, Sasha Levin wrote: > On Thu, 2011-12-15 at 11:55 -0500, Brian J. Murrell wrote: > > So, about 2/3 of host speed now -- which is much better. Is 2/3 about > > normal or should I be looking for more? > > aio=native > > Thats the qemu setting, I'm not sure where libvirt hides that. ... Regards, Daniel
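The libvirt equivalent of QEMU's aio=native is the io attribute on the disk's driver element — roughly as below (check the formatdomain docs for your libvirt version; io='native' is generally combined with cache='none'):

```xml
<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source file='/var/lib/libvirt/images/guest.img'/>
  <target dev='vda' bus='virtio'/>
</disk>
```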
Re: [PATCH] kvm tools: Allow the user to pass a FD to use as a TAP device
On Wed, Dec 07, 2011 at 06:28:12PM +0200, Pekka Enberg wrote: > On Wed, Dec 7, 2011 at 11:37 AM, Sasha Levin wrote: > > This allows users to pass a pre-configured fd to use for the network > > interface. > > > > For example: > > kvm run -n mode=tap,fd=3 3<>/dev/net/tap3 > > > > Cc: Daniel P. Berrange > > Cc: Osier Yang > > Signed-off-by: Sasha Levin > > Daniel, Osier, I assume this is useful for libvirt? Yes, this works. I don't know if kvmtool supports the VNET_HDR extension yet, but if it does, then we can make libvirt pass in a pre-opened FD for that too. Daniel
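The shell redirection in Sasha's example (`3<>/dev/net/tap3`) simply arranges for the device to already be open on fd 3 when the tool starts. The same idiom can be sketched in Python, with an ordinary file standing in for the TAP device and `cat` standing in for the kvmtool binary (illustration only, not kvmtool or libvirt code):

```python
import os
import subprocess
import tempfile

# A plain file stands in for the pre-configured /dev/net/tapN device.
tmp_fd, path = tempfile.mkstemp()
os.write(tmp_fd, b"pre-opened resource\n")
os.close(tmp_fd)

# Open it and pin it to fd 3, like the shell's "3<>/dev/net/tap3".
fd = os.open(path, os.O_RDONLY)
os.dup2(fd, 3)

# The child inherits fd 3 and is told to use it (cf. "mode=tap,fd=3").
# pass_fds keeps fd 3 open across the exec despite close_fds=True.
out = subprocess.run(["cat", "/dev/fd/3"], pass_fds=[3],
                     capture_output=True, text=True).stdout
print(out, end="")
os.remove(path)
```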
Re: [libvirt] (no subject)
On Wed, Dec 07, 2011 at 08:21:06AM +0200, Sasha Levin wrote: > On Tue, 2011-12-06 at 14:38 +0000, Daniel P. Berrange wrote: > > On Fri, Nov 11, 2011 at 07:56:58PM +0800, Osier Yang wrote: > > > * KVM tool manages the network completely itself (with DHCP support?), > > > no way to configure, except specify the modes (user|tap|none). I > > > have not test it yet, but it should need explicit script to setup > > > the network rules(e.g. NAT) for the guest access outside world. > > > Anyway, there is no way for libvirt to control the guest network. > > > > If KVM tool support TAP devices, can't be do whatever we like with > > that just by passing in a configured TAP device from libvir ? > > KVM tool currently creates and configures the TAP devices it uses, it > shouldn't be an issue to have it use a TAP fd passed to it either. > > How does libvirt do it? Create a /dev/tapX on it's own and pass the fd > to the hypervisor? Yes, libvirt opens a /dev/tap device (or a macvtap device for VEPA mode), adds it to the necessary bridge, and/or configures VEPA, etc and then passes the FD to the hypervisor, with an ARGV parameter to tell the HV which FD is being passed. > > > * console connection is implemented by setup ptys in libvirt, > > > stdout/stderr > > > of kvm tool process is redirected to the master pty, and libvirt > > > connects > > > to the slave pty. This works fine now, but it might be better if kvm > > > tool could provide more advanced console mechanisms. Just like QEMU > > > does? > > > > This sounds good enough for now. > > KVM tools does a redirection to a PTY, which at that point could be > redirected to anywhere the user wants. > > What features might be interesting to do on top of that?
I presume that Osier is just comparing with the features QEMU has available for chardev config, which include:

- PTYs
- UNIX sockets
- TCP sockets
- UDP sockets
- FIFO pipe
- Plain file (output only obviously, but useful for logging)

libvirt doesn't specifically need any of them, but it can support those options if they exist. > > > * Not much ways existed yet for external apps or user to query the guest > > > informations. But this might be changed soon per KVM tool grows up > > > quickly. > > > > What sort of guest info are you thinking about ? The most immediate > > pieces of info I can imagine we need are > > > > - Mapping between PIDs and vCPU threads > > - Current balloon driver value > > Those are pretty easily added using the IPC interface I've mentioned > above. For example, 'kvm balloon' and 'kvm stat' will return a lot of > info out of the balloon driver (including the memory stats VQ - which afaik we're probably the only ones who actually do that (but I might be wrong) :) Ok, that sounds sufficient for the balloon info. Regards, Daniel
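The console redirection discussed in this thread boils down to standard PTY plumbing: the guest-facing tool writes on the slave side of a pseudo-terminal, and whichever management process holds the master side reads the output. A minimal generic sketch of that mechanism (not kvmtool or libvirt code):

```python
import os
import tty

master, slave = os.openpty()
tty.setraw(slave)   # raw mode: disable output post-processing on the slave

# The guest-facing tool writes its console output to the slave end...
os.write(slave, b"kernel booting...\n")

# ...and the management side reads the same bytes from the master end.
data = os.read(master, 1024)
print(data.decode(), end="")
```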
Re: [libvirt] [PATCH 7/7] kvmtool: Implementation for kvm tool driver
On Fri, Nov 11, 2011 at 07:57:06PM +0800, Osier Yang wrote: > Basically, the drivers is implemented by using kvm tool binary > currently, (see ./kvm help for more info). > > Current implementation supports define/undefine, start/destroy/, > suspend/resume, connect to guest console via "virsh console", > and balloon memory with with "virsh setmem" (using ./kvm balloon > command). Also as it supports cgroup controllers "cpuacct", and > "memory", so some other commands like "schedinfo", "memtune" can > also work. Some other commands such as "domid", "domname", "dumpxml" > ,"autostart", etc. are supported, as the driver is designed > as a "stateful" driver, those APIs just need to talk with libvirtd > simply. > > As Native Linux KVM Tool is designed for both non-root and root users, > the driver is designed just like QEMU, supports two modes of the > connection: > > kvmtool:///system > kvmtool+unix:///system > > kvmtool:///session > kvmtool+unix:///session > > An example of the domain XML (all the XMLs supported currently are > listed): > > % virsh -c kvm:///system dumpxml kvm_test > > kvm_test > 88bf38f1-b6ab-cfa6-ab53-4b4c0993d894 > 524288 > 524288 > 1 > > hvm > /boot/bzImage > > > > destroy > restart > restart > > /usr/bin/kvmtool > > > > > > > > > > > > > > > --- > cfg.mk |1 + > daemon/Makefile.am |4 + > daemon/libvirtd.c|7 + > po/POTFILES.in |2 + > src/Makefile.am | 36 +- > src/kvmtool/kvmtool_conf.c | 130 ++ > src/kvmtool/kvmtool_conf.h | 66 + > src/kvmtool/kvmtool_driver.c | 3079 > ++ > src/kvmtool/kvmtool_driver.h | 29 + My main suggestion here would be to split up the kvmtool_driver.c file into 3 parts as we did with the QEMU driver. 
kvmtool_driver.c  -> Basic libvirt API glue
kvmtool_command.c -> ARGV generation
kvmtool_process.c -> KVMtool process start/stop/autostart/autodestroy

Regards, Daniel
Re: [libvirt] [PATCH 4/7] kvmtool: Add hook support for kvmtool domain
On Fri, Nov 11, 2011 at 07:57:03PM +0800, Osier Yang wrote: > Just like QEMU and LXC, kvm driver intends to support running hook > script before domain starting and after domain shutdown too. > --- > src/util/hooks.c | 11 ++- > src/util/hooks.h |8 > 2 files changed, 18 insertions(+), 1 deletions(-) > > diff --git a/src/util/hooks.c b/src/util/hooks.c > index 110a94b..765cb68 100644 > --- a/src/util/hooks.c > +++ b/src/util/hooks.c > @@ -52,12 +52,14 @@ VIR_ENUM_DECL(virHookDaemonOp) > VIR_ENUM_DECL(virHookSubop) > VIR_ENUM_DECL(virHookQemuOp) > VIR_ENUM_DECL(virHookLxcOp) > +VIR_ENUM_DECL(virHookKvmToolOp) > > VIR_ENUM_IMPL(virHookDriver, >VIR_HOOK_DRIVER_LAST, >"daemon", >"qemu", > - "lxc") > + "lxc", > + "kvmtool") > > VIR_ENUM_IMPL(virHookDaemonOp, VIR_HOOK_DAEMON_OP_LAST, >"start", > @@ -79,6 +81,10 @@ VIR_ENUM_IMPL(virHookLxcOp, VIR_HOOK_LXC_OP_LAST, >"start", >"stopped") > > +VIR_ENUM_IMPL(virHookKvmToolOp, VIR_HOOK_KVMTOOL_OP_LAST, > + "start", > + "stopped") > + > static int virHooksFound = -1; > > /** > @@ -230,6 +236,9 @@ virHookCall(int driver, const char *id, int op, int > sub_op, const char *extra, > case VIR_HOOK_DRIVER_LXC: > opstr = virHookLxcOpTypeToString(op); > break; > +case VIR_HOOK_DRIVER_KVMTOOL: > +opstr = virHookKvmToolOpTypeToString(op); > +break; > } > if (opstr == NULL) { > virHookReportError(VIR_ERR_INTERNAL_ERROR, > diff --git a/src/util/hooks.h b/src/util/hooks.h > index fd7411c..69081c4 100644 > --- a/src/util/hooks.h > +++ b/src/util/hooks.h > @@ -31,6 +31,7 @@ enum virHookDriverType { > VIR_HOOK_DRIVER_DAEMON = 0,/* Daemon related events */ > VIR_HOOK_DRIVER_QEMU, /* QEmu domains related events */ > VIR_HOOK_DRIVER_LXC, /* LXC domains related events */ > +VIR_HOOK_DRIVER_KVMTOOL, /* KVMTOOL domains related events */ > > VIR_HOOK_DRIVER_LAST, > }; > @@ -67,6 +68,13 @@ enum virHookLxcOpType { > VIR_HOOK_LXC_OP_LAST, > }; > > +enum virHookKvmToolOpType { > +VIR_HOOK_KVMTOOL_OP_START,/* domain is about to start */ > 
+VIR_HOOK_KVMTOOL_OP_STOPPED, /* domain has stopped */ > + > +VIR_HOOK_KVMTOOL_OP_LAST, > +}; > + > int virHookInitialize(void); > > int virHookPresent(int driver); Trivial, ACK Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [libvirt] [PATCH 3/7] kvmtool: Add new enums and error codes for the driver
On Fri, Nov 11, 2011 at 07:57:02PM +0800, Osier Yang wrote: > --- > include/libvirt/virterror.h |1 + > src/driver.h|1 + > src/util/virterror.c|3 +++ > 3 files changed, 5 insertions(+), 0 deletions(-) > > diff --git a/include/libvirt/virterror.h b/include/libvirt/virterror.h > index a8549b7..deda42d 100644 > --- a/include/libvirt/virterror.h > +++ b/include/libvirt/virterror.h > @@ -84,6 +84,7 @@ typedef enum { > VIR_FROM_LIBXL = 41, /* Error from libxenlight driver */ > VIR_FROM_LOCKING = 42, /* Error from lock manager */ > VIR_FROM_HYPERV = 43,/* Error from Hyper-V driver */ > +VIR_FROM_KVMTOOL = 44, /* Error from kvm tool driver */ > } virErrorDomain; > > > diff --git a/src/driver.h b/src/driver.h > index 4c14aaa..158a13c 100644 > --- a/src/driver.h > +++ b/src/driver.h > @@ -30,6 +30,7 @@ typedef enum { > VIR_DRV_VMWARE = 13, > VIR_DRV_LIBXL = 14, > VIR_DRV_HYPERV = 15, > +VIR_DRV_KVMTOOL = 16, > } virDrvNo; > > > diff --git a/src/util/virterror.c b/src/util/virterror.c > index 5006fa2..abb5b5a 100644 > --- a/src/util/virterror.c > +++ b/src/util/virterror.c > @@ -175,6 +175,9 @@ static const char *virErrorDomainName(virErrorDomain > domain) { > case VIR_FROM_HYPERV: > dom = "Hyper-V "; > break; > +case VIR_FROM_KVMTOOL: > +dom = "KVMTOOL "; > +break; > } > return(dom); > } Trivial, ACK Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [libvirt] [PATCH 2/7] kvmtool: Add documents
On Fri, Nov 11, 2011 at 07:57:01PM +0800, Osier Yang wrote: > The document is rather rough now, but at least contains an domain > config example of all the current supported XMLs, and tells how to > play with the driver. > --- > docs/drivers.html.in|1 + > docs/drvkvmtool.html.in | 87 > +++ > docs/index.html.in |3 ++ > docs/sitemap.html.in|4 ++ > src/README |3 +- > 5 files changed, 97 insertions(+), 1 deletions(-) > create mode 100644 docs/drvkvmtool.html.in > > diff --git a/docs/drivers.html.in b/docs/drivers.html.in > index 75038fc..249c137 100644 > --- a/docs/drivers.html.in > +++ b/docs/drivers.html.in > @@ -29,6 +29,7 @@ >VMware > Workstation/Player >Xen >Microsoft > Hyper-V > + Native Linux KVM > Tool > > > Storage drivers > diff --git a/docs/drvkvmtool.html.in b/docs/drvkvmtool.html.in > new file mode 100644 > index 000..1b6acdf > --- /dev/null > +++ b/docs/drvkvmtool.html.in > @@ -0,0 +1,87 @@ > + > + > +KVM tool driver > + > + > + > + > + The libvirt KVMTOOL driver manages hypervisor Native Linux KVM Tool, > + it's implemented by using command line of kvm tool binary. > + > + > +Project Links > + > + > + > +The Native Linux > KVM Tool Native > +Linux KVM Tool > + > + > + > +Connections to the KVMTOOL driver > + > + The libvirt KVMTOOL driver is a multi-instance driver, providing a > single > + system wide privileged driver (the "system" instance), and per-user > + unprivileged drivers (the "session" instance). The URI driver protocol > + is "kvmtool". Some example conection URIs for the libvirt driver are: > + > + > + > + kvmtool:///session (local access to per-user > instance) > + kvmtool+unix:///session (local access to per-user > instance) > + > + kvmtool:///system (local access to system > instance) > + kvmtool+unix:///system (local access to system > instance) > + > + > + cgroups controllers "cpuacct", and "memory" are supported currently. 
> + > + > + Example config > + > + > +As mentioned in a later patch, we should just use type='kvm' here still Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [libvirt] [PATCH 5/7] kvmtool: Add new domain type
On Fri, Nov 11, 2011 at 07:57:04PM +0800, Osier Yang wrote: > It's named as "kvmtool". > --- > src/conf/domain_conf.c |4 +++- > src/conf/domain_conf.h |1 + > 2 files changed, 4 insertions(+), 1 deletions(-) > > diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c > index 58f4d0f..55121d8 100644 > --- a/src/conf/domain_conf.c > +++ b/src/conf/domain_conf.c > @@ -91,7 +91,8 @@ VIR_ENUM_IMPL(virDomainVirt, VIR_DOMAIN_VIRT_LAST, >"hyperv", >"vbox", >"one", > - "phyp") > + "phyp", > + "kvmtool") > > VIR_ENUM_IMPL(virDomainBoot, VIR_DOMAIN_BOOT_LAST, >"fd", > @@ -4018,6 +4019,7 @@ virDomainChrDefParseXML(virCapsPtr caps, > if (type == NULL) { > def->source.type = VIR_DOMAIN_CHR_TYPE_PTY; > } else if ((def->source.type = virDomainChrTypeFromString(type)) < 0) { > +VIR_WARN("type = %s", type); > virDomainReportError(VIR_ERR_XML_ERROR, > _("unknown type presented to host for character > device: %s"), > type); > diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h > index a3cb834..001bc46 100644 > --- a/src/conf/domain_conf.h > +++ b/src/conf/domain_conf.h > @@ -59,6 +59,7 @@ enum virDomainVirtType { > VIR_DOMAIN_VIRT_VBOX, > VIR_DOMAIN_VIRT_ONE, > VIR_DOMAIN_VIRT_PHYP, > +VIR_DOMAIN_VIRT_KVMTOOL, > > VIR_DOMAIN_VIRT_LAST, > }; IMHO this patch is not required. The domain type is referring to the hypervisor used for the domain, which is still 'kvm'. What is different here is just the userspace device model. If you look at the 3 different Xen user spaces we support, all of them use type='xen' still. So just use type='kvm' here for kvmtool.
Regards, Daniel
Re: [libvirt] [PATCH] kvm tools: Introduce an ENV variable for the state dir
On Fri, Nov 11, 2011 at 07:57:00PM +0800, Osier Yang wrote: > Which is named as "KVMTOOL_STATE_DIR", so that the user can > configure the path of state directly as he wants. > --- > tools/kvm/main.c |7 ++- > 1 files changed, 6 insertions(+), 1 deletions(-) > > diff --git a/tools/kvm/main.c b/tools/kvm/main.c > index 05bc82c..37b2b1d 100644 > --- a/tools/kvm/main.c > +++ b/tools/kvm/main.c > @@ -13,7 +13,12 @@ static int handle_kvm_command(int argc, char **argv) > > int main(int argc, char *argv[]) > { > - kvm__set_dir("%s/%s", HOME_DIR, KVM_PID_FILE_PATH); > + char *state_dir = getenv("KVMTOOL_STATE_DIR"); > + > + if (state_dir) > + kvm__set_dir("%s", state_dir); > + else > + kvm__set_dir("%s/%s", HOME_DIR, KVM_PID_FILE_PATH); > > return handle_kvm_command(argc - 1, &argv[1]); > } As per my comments in the first patch, I don't think this is critical for libvirt's needs. We should just honour the default location that the KVM tool uses, rather than forcing a libvirt specific location. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [libvirt] (no subject)
On Fri, Nov 11, 2011 at 07:56:58PM +0800, Osier Yang wrote: > Hi, all > > This is a basic implementation of libvirt Native Linux KVM > Tool driver. Note that this is just made with my own interest > and spare time, it's not an endorsement/effort by Red Hat, > and it isn't supported by Red Hat officially. > > Basically, the driver is designed as *stateful*, as KVM tool > doesn't maintain any info about the guest except a socket which > for its own IPC. And it's implemented by using KVM tool binary, > which is name "kvm" currently, along with cgroup controllers > "cpuacct", and "memory" support. And as one of KVM tool's > pricinple is to allow both the non-root and root user to play with. > The driver is designed to support root and non-root too, just > like QEMU does. Example of the connection URI: > > virsh -c kvmtool:///system > virsh -c kvmtool:///session > virsh -c kvmtool+unix:///system > virsh -c kvmtool+unix:///session > > The implementation can support more or less than 15 virsh commands > currently, including basic domain cycle operations (define/undefine, > start/destroy, suspend/resume, console, setmem, schedinfo, dumpxml, > ,autostart, dominfo, etc.) > > About the domain configuration: > * "kernel": must be specified as KVM tool only support boots > from the kernel currently (no integration with BIOS app yet). > > * "disk": only virtio bus is supported, and device type must be 'disk'. > > * "serial/console": only one console is supported, of type serial or > virtio (can extend to support multiple console as long as kvm tool > supports, libvirt already supported mutiple console, see upstream > commit 0873b688c). > > * "p9fs": only support specifying the source dir, and mount tag, only > type of 'mount' is supported. > > * "memballoon": only virtio is supported, and there is no way > to config the addr. > > * Multiple "disk" and "p9fs" is supported. > > * Graphics and network are not supported, will explain below. 
> > Please see "[PATCH 7/8]" for an example of the domain config. (which > contains all the XMLs supported by current implementation). > > The problems of Native Linux KVM Tool from libvirt p.o.v: > > * Some destros package "qemu-kvm" as "kvm", also "kvm" is a long > established name for "KVM" itself, so naming the project as > "kvm" might be not a good idea. I assume it will be named > as "kvmtool" in this implementation, never mind this if you > don't like that, it can be updated easily. :-) Yeah, naming the binary 'kvm' is just madness. I'd strongly recommend using 'kvmtool' as the binary name to avoid confusion with existing 'kvm' binaries based on QEMU. > * It still doesn't have an official package yet, even no "make install". > means we have no way to check the dependancy and do the checking > when 'configure'. I assume it will be installed as "/usr/bin/kvmtool" > in this implementation. This is the main reason which can prevents > upstream libvirt accepting the patches I guess. Ok, not really a problem - we do similar for the regular QEMU driver. > * Lacks of options for user's configuration, such as "-vnc", there > is no option for user to configure the properties for the "vnc", > such as the port. It hides things, doesn't provide ways to query > the properties too, this causes problems for libvirt to add the > vnc support, as vnc clients such as virt-manager, virt-viewer, > have no way to connect the guest. Even vncviewer can't. Being able to specify a VNC port of libvirt's choosing is pretty much mandatory to be able to support that.In addition being able to specify the bind address is important to be able to control security. eg to only bind to 127.0.0.1, or only to certain NICs in a multi-NIC host. > * KVM tool manages the network completely itself (with DHCP support?), > no way to configure, except specify the modes (user|tap|none). I > have not test it yet, but it should need explicit script to setup > the network rules(e.g. 
NAT) for the guest access outside world. > Anyway, there is no way for libvirt to control the guest network. If KVM tool supports TAP devices, can't we do whatever we like with that just by passing in a configured TAP device from libvirt? > * There is a gap about the domain status between KVM tool and libvirt, > it's caused by KVM tool unlink()s the guest socket when user exits > from console (both text and graphic), but libvirt still think the > guest is running. Being able to reliably detect shutdown/exit of the KVM tool is a very important task, and we can't rely on waitpid/SIG_CHLD because we want to daemonize all instances wrt libvirtd. In the QEMU driver we keep open a socket to the monitor, and when we see an I/O error / POLLHUP on the socket we know that QEMU has quit. What is this guest socket used for ? Could libvirt keep open a connection to it ? One other option would
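The monitor-socket technique mentioned above — keep a connection open to the hypervisor process and treat POLLHUP as process death — can be sketched generically. In this toy model (not the actual libvirt QEMU driver code) the "hypervisor" is a forked child that simply exits, dropping its end of a socketpair:

```python
import os
import select
import socket

parent_end, child_end = socket.socketpair()

pid = os.fork()
if pid == 0:                 # child: stands in for the hypervisor process
    parent_end.close()
    os._exit(0)              # exiting closes child_end; the peer sees POLLHUP

child_end.close()            # parent keeps only its own end open

poller = select.poll()
poller.register(parent_end.fileno(), select.POLLIN)
events = poller.poll(5000)   # POLLHUP/POLLERR are reported regardless of mask

# On Linux, the peer closing a socketpair raises POLLIN (EOF) and POLLHUP.
died = any(ev & (select.POLLHUP | select.POLLIN) for _, ev in events)
print("hypervisor exited:", died)
os.waitpid(pid, 0)
```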
Re: [Qemu-devel] KVM call minutes for November 29
On Wed, Nov 30, 2011 at 11:22:37AM +0200, Alon Levy wrote: > On Tue, Nov 29, 2011 at 04:59:51PM -0600, Anthony Liguori wrote: > > On 11/29/2011 10:59 AM, Avi Kivity wrote: > > >On 11/29/2011 05:51 PM, Juan Quintela wrote: > > >>How to do high level stuff? > > >>- python? > > >> > > > > > >One of the disadvantages of the various scripting languages is the lack > > >of static type checking, which makes it harder to do full sweeps of the > > >source for API changes, relying on the compiler to catch type (or other) > > >errors. > > > > This is less interesting to me (figuring out the perfectest language to > > use). > > > > I think what's more interesting is the practical execution of > > something like this. Just assuming we used python (since that's > > what I know best), I think we could do something like this: > > > > 1) We could write a binding layer to expose the QMP interface as a > > python module. This would be very little binding code but would > > bring a bunch of functionality to python bits. > > If going this route, I would propose to use gobject-introspection [1] > instead of directly binding to python. You should be able to get > multiple languages support this way, including python. I think it > requires using glib 3.0, but I haven't tested it myself (yet). Maybe > someone more knowledgable can shoot it down. > > [1] http://live.gnome.org/GObjectIntrospection/ > > Actually this might make sense for the whole of QEMU. I think for a > defined interface like QMP implementing the interface directly in python > makes more sense. But having qemu itself GObject'ified and scriptable > is cool. It would also lend it self to 4) without going through 2), but > also make 2) possible (with any language, not just python). I think taking advantage of GObject introspection is fine idea - I certainly don't want to manually create python (or any other language) bindings for any C code ever again. 
GObject + introspection takes away all the burden of supporting access to C code from non-C languages. Given that QEMU has already adopted GLib as mandatory infrastructure, going down the GObject route seems like a very natural fit/direction to take. If people like the idea of a higher level language for QEMU, but are concerned about performance / overhead of embedding a scripting language in QEMU, then GObject introspection opens the possibility of writing in Vala, which is a higher level language which compiles straight down to machine code like C does. Regards, Daniel
Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
On Mon, Nov 14, 2011 at 01:56:36PM +0200, Michael S. Tsirkin wrote: > On Mon, Nov 14, 2011 at 11:37:27AM +0000, Daniel P. Berrange wrote: > > On Mon, Nov 14, 2011 at 01:34:15PM +0200, Michael S. Tsirkin wrote: > > > On Mon, Nov 14, 2011 at 11:29:18AM +, Daniel P. Berrange wrote: > > > > On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote: > > > > > Am 14.11.2011 12:08, schrieb Daniel P. Berrange: > > > > > > On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote: > > > > > >> On Mon, Nov 14, 2011 at 10:16:10AM +, Daniel P. Berrange wrote: > > > > > >>> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote: > > > > > >>>> On 11/11/2011 12:15 PM, Kevin Wolf wrote: > > > > > >>>>> Am 10.11.2011 22:30, schrieb Anthony Liguori: > > > > > >>>>>> Live migration with qcow2 or any other image format is just > > > > > >>>>>> not going to work > > > > > >>>>>> right now even with proper clustered storage. I think doing a > > > > > >>>>>> block level flush > > > > > >>>>>> cache interface and letting block devices decide how to do it > > > > > >>>>>> is the best approach. > > > > > >>>>> > > > > > >>>>> I would really prefer reusing the existing open/close code. It > > > > > >>>>> means > > > > > >>>>> less (duplicated) code, is existing code that is well tested > > > > > >>>>> and doesn't > > > > > >>>>> make migration much of a special case. > > > > > >>>>> > > > > > >>>>> If you want to avoid reopening the file on the OS level, we can > > > > > >>>>> reopen > > > > > >>>>> only the topmost layer (i.e. the format, but not the protocol) > > > > > >>>>> for now > > > > > >>>>> and in 1.1 we can use bdrv_reopen(). > > > > > >>>>> > > > > > >>>> > > > > > >>>> Intuitively I dislike _reopen style interfaces. If the second > > > > > >>>> open > > > > > >>>> yields different results from the first, does it invalidate any > > > > > >>>> computations in between? > > > > > >>>> > > > > > >>>> What's wrong with just delaying the open? 
> > > > > >>> > > > > > >>> If you delay the 'open' until the mgmt app issues 'cont', then > > > > > >>> you loose > > > > > >>> the ability to rollback to the source host upon open failure for > > > > > >>> most > > > > > >>> deployed versions of libvirt. We only fairly recently switched to > > > > > >>> a five > > > > > >>> stage migration handshake to cope with rollback when 'cont' fails. > > > > > >>> > > > > > >>> Daniel > > > > > >> > > > > > >> I guess reopen can fail as well, so this seems to me to be an > > > > > >> important > > > > > >> fix but not a blocker. > > > > > > > > > > > > If if the initial open succeeds, then it is far more likely that a > > > > > > later > > > > > > re-open will succeed too, because you have already elminated the > > > > > > possibility > > > > > > of configuration mistakes, and will have caught most storage > > > > > > runtime errors > > > > > > too. So there is a very significant difference in reliability > > > > > > between doing > > > > > > an 'open at startup + reopen at cont' vs just 'open at cont' > > > > > > > > > > > > Based on the bug reports I see, we want to be very good at > > > > > > detecting and > > > > > > gracefully handling open errors because they are pretty frequent. > > > > > > > > > > Do you have some more details on the kind of errors? Missing files, > > > > > permissions, something like this? Or rather something related to the > > > > > actual content of an image file? > > > > > > > > Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI > > > > setup. Access permissions due to incorrect user / group setup, or read > > > > only mounts, or SELinux denials. Actual I/O errors are less common and > > > > are not so likely to cause QEMU to fail to start any, since QEMU is > > > > likely to just report them to the guest OS instead. > > > > > > Do you run qemu with -S, then give a 'cont' command to start it? 
> > > > Yes > > OK, so let's go back one step now - how is this related to > 'rollback to source host'? In the old libvirt migration protocol, by the time we run 'cont' on the destination, the source QEMU has already been killed off, so there's nothing to resume on failure. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
On Mon, Nov 14, 2011 at 01:51:40PM +0200, Michael S. Tsirkin wrote: > On Mon, Nov 14, 2011 at 11:37:27AM +0000, Daniel P. Berrange wrote: > > On Mon, Nov 14, 2011 at 01:34:15PM +0200, Michael S. Tsirkin wrote: > > > On Mon, Nov 14, 2011 at 11:29:18AM +, Daniel P. Berrange wrote: > > > > On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote: > > > > > Am 14.11.2011 12:08, schrieb Daniel P. Berrange: > > > > > > On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote: > > > > > >> On Mon, Nov 14, 2011 at 10:16:10AM +, Daniel P. Berrange wrote: > > > > > >>> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote: > > > > > >>>> On 11/11/2011 12:15 PM, Kevin Wolf wrote: > > > > > >>>>> Am 10.11.2011 22:30, schrieb Anthony Liguori: > > > > > >>>>>> Live migration with qcow2 or any other image format is just > > > > > >>>>>> not going to work > > > > > >>>>>> right now even with proper clustered storage. I think doing a > > > > > >>>>>> block level flush > > > > > >>>>>> cache interface and letting block devices decide how to do it > > > > > >>>>>> is the best approach. > > > > > >>>>> > > > > > >>>>> I would really prefer reusing the existing open/close code. It > > > > > >>>>> means > > > > > >>>>> less (duplicated) code, is existing code that is well tested > > > > > >>>>> and doesn't > > > > > >>>>> make migration much of a special case. > > > > > >>>>> > > > > > >>>>> If you want to avoid reopening the file on the OS level, we can > > > > > >>>>> reopen > > > > > >>>>> only the topmost layer (i.e. the format, but not the protocol) > > > > > >>>>> for now > > > > > >>>>> and in 1.1 we can use bdrv_reopen(). > > > > > >>>>> > > > > > >>>> > > > > > >>>> Intuitively I dislike _reopen style interfaces. If the second > > > > > >>>> open > > > > > >>>> yields different results from the first, does it invalidate any > > > > > >>>> computations in between? > > > > > >>>> > > > > > >>>> What's wrong with just delaying the open? 
> > > > > >>> > > > > > >>> If you delay the 'open' until the mgmt app issues 'cont', then > > > > > >>> you loose > > > > > >>> the ability to rollback to the source host upon open failure for > > > > > >>> most > > > > > >>> deployed versions of libvirt. We only fairly recently switched to > > > > > >>> a five > > > > > >>> stage migration handshake to cope with rollback when 'cont' fails. > > > > > >>> > > > > > >>> Daniel > > > > > >> > > > > > >> I guess reopen can fail as well, so this seems to me to be an > > > > > >> important > > > > > >> fix but not a blocker. > > > > > > > > > > > > If if the initial open succeeds, then it is far more likely that a > > > > > > later > > > > > > re-open will succeed too, because you have already elminated the > > > > > > possibility > > > > > > of configuration mistakes, and will have caught most storage > > > > > > runtime errors > > > > > > too. So there is a very significant difference in reliability > > > > > > between doing > > > > > > an 'open at startup + reopen at cont' vs just 'open at cont' > > > > > > > > > > > > Based on the bug reports I see, we want to be very good at > > > > > > detecting and > > > > > > gracefully handling open errors because they are pretty frequent. > > > > > > > > > > Do you have some more details on the kind of errors? Missing files, > > > > > permissions, something like this? Or rather something related to the > > > > > actual content of an image file? > > > > > > > > Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI > > > > setup. Access permissions due to incorrect user / group setup, or read > > > > only mounts, or SELinux denials. Actual I/O errors are less common and > > > > are not so likely to cause QEMU to fail to start any, since QEMU is > > > > likely to just report them to the guest OS instead. > > > > > > Do you run qemu with -S, then give a 'cont' command to start it? > > Probably in an attempt to improve reliability :) Not really. 
We can't simply let QEMU start its own CPUs, because there are various tasks that need performing after the migration transfer finishes, but before the CPUs are allowed to be started, e.g.:

 - Finish 802.1Qb{g,h} (VEPA) network port profile association on the target
 - Release leases for any resources associated with the source QEMU via a configured lock manager (e.g. sanlock)
 - Acquire leases for any resources associated with the target QEMU via a configured lock manager (e.g. sanlock)

Daniel
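The ordering constraint described here — all post-migration tasks must complete before 'cont' starts the guest CPUs — can be sketched in a few lines of Python. This is purely illustrative: the function names and the list-of-callables shape are stand-ins, not libvirt's actual internals.

```python
# Illustrative sketch (not libvirt code): the destination side must run
# its post-migration tasks *between* the end of the transfer and the
# monitor 'cont' command that starts the guest CPUs.

def resume_on_destination(tasks, monitor):
    """Run post-migration tasks, then start vCPUs via the monitor.

    `tasks` is a list of callables (port-profile association, lease
    release/acquisition, ...); any failure aborts before 'cont' is
    issued, so the management layer can still roll back to the source.
    """
    for task in tasks:
        task()                    # raises on failure -> no 'cont' issued
    monitor.append("cont")        # CPUs start only after all tasks pass

log = []
resume_on_destination(
    [lambda: log.append("vepa-associate"),
     lambda: log.append("release-source-leases"),
     lambda: log.append("acquire-target-leases")],
    log,
)
print(log)   # 'cont' appears last, after every task has run
```

If any task raises, 'cont' is never sent, which is exactly the property the five-stage handshake relies on for rollback.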
Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
On Mon, Nov 14, 2011 at 01:34:15PM +0200, Michael S. Tsirkin wrote: > On Mon, Nov 14, 2011 at 11:29:18AM +0000, Daniel P. Berrange wrote: > > On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote: > > > Am 14.11.2011 12:08, schrieb Daniel P. Berrange: > > > > On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote: > > > >> On Mon, Nov 14, 2011 at 10:16:10AM +, Daniel P. Berrange wrote: > > > >>> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote: > > > >>>> On 11/11/2011 12:15 PM, Kevin Wolf wrote: > > > >>>>> Am 10.11.2011 22:30, schrieb Anthony Liguori: > > > >>>>>> Live migration with qcow2 or any other image format is just not > > > >>>>>> going to work > > > >>>>>> right now even with proper clustered storage. I think doing a > > > >>>>>> block level flush > > > >>>>>> cache interface and letting block devices decide how to do it is > > > >>>>>> the best approach. > > > >>>>> > > > >>>>> I would really prefer reusing the existing open/close code. It means > > > >>>>> less (duplicated) code, is existing code that is well tested and > > > >>>>> doesn't > > > >>>>> make migration much of a special case. > > > >>>>> > > > >>>>> If you want to avoid reopening the file on the OS level, we can > > > >>>>> reopen > > > >>>>> only the topmost layer (i.e. the format, but not the protocol) for > > > >>>>> now > > > >>>>> and in 1.1 we can use bdrv_reopen(). > > > >>>>> > > > >>>> > > > >>>> Intuitively I dislike _reopen style interfaces. If the second open > > > >>>> yields different results from the first, does it invalidate any > > > >>>> computations in between? > > > >>>> > > > >>>> What's wrong with just delaying the open? > > > >>> > > > >>> If you delay the 'open' until the mgmt app issues 'cont', then you > > > >>> loose > > > >>> the ability to rollback to the source host upon open failure for most > > > >>> deployed versions of libvirt. 
We only fairly recently switched to a > > > >>> five > > > >>> stage migration handshake to cope with rollback when 'cont' fails. > > > >>> > > > >>> Daniel > > > >> > > > >> I guess reopen can fail as well, so this seems to me to be an important > > > >> fix but not a blocker. > > > > > > > > If if the initial open succeeds, then it is far more likely that a later > > > > re-open will succeed too, because you have already elminated the > > > > possibility > > > > of configuration mistakes, and will have caught most storage runtime > > > > errors > > > > too. So there is a very significant difference in reliability between > > > > doing > > > > an 'open at startup + reopen at cont' vs just 'open at cont' > > > > > > > > Based on the bug reports I see, we want to be very good at detecting and > > > > gracefully handling open errors because they are pretty frequent. > > > > > > Do you have some more details on the kind of errors? Missing files, > > > permissions, something like this? Or rather something related to the > > > actual content of an image file? > > > > Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI > > setup. Access permissions due to incorrect user / group setup, or read > > only mounts, or SELinux denials. Actual I/O errors are less common and > > are not so likely to cause QEMU to fail to start any, since QEMU is > > likely to just report them to the guest OS instead. > > Do you run qemu with -S, then give a 'cont' command to start it? Yes Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote: > Am 14.11.2011 12:08, schrieb Daniel P. Berrange: > > On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote: > >> On Mon, Nov 14, 2011 at 10:16:10AM +, Daniel P. Berrange wrote: > >>> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote: > >>>> On 11/11/2011 12:15 PM, Kevin Wolf wrote: > >>>>> Am 10.11.2011 22:30, schrieb Anthony Liguori: > >>>>>> Live migration with qcow2 or any other image format is just not going > >>>>>> to work > >>>>>> right now even with proper clustered storage. I think doing a block > >>>>>> level flush > >>>>>> cache interface and letting block devices decide how to do it is the > >>>>>> best approach. > >>>>> > >>>>> I would really prefer reusing the existing open/close code. It means > >>>>> less (duplicated) code, is existing code that is well tested and doesn't > >>>>> make migration much of a special case. > >>>>> > >>>>> If you want to avoid reopening the file on the OS level, we can reopen > >>>>> only the topmost layer (i.e. the format, but not the protocol) for now > >>>>> and in 1.1 we can use bdrv_reopen(). > >>>>> > >>>> > >>>> Intuitively I dislike _reopen style interfaces. If the second open > >>>> yields different results from the first, does it invalidate any > >>>> computations in between? > >>>> > >>>> What's wrong with just delaying the open? > >>> > >>> If you delay the 'open' until the mgmt app issues 'cont', then you loose > >>> the ability to rollback to the source host upon open failure for most > >>> deployed versions of libvirt. We only fairly recently switched to a five > >>> stage migration handshake to cope with rollback when 'cont' fails. > >>> > >>> Daniel > >> > >> I guess reopen can fail as well, so this seems to me to be an important > >> fix but not a blocker. 
> > > > If if the initial open succeeds, then it is far more likely that a later > > re-open will succeed too, because you have already elminated the possibility > > of configuration mistakes, and will have caught most storage runtime errors > > too. So there is a very significant difference in reliability between doing > > an 'open at startup + reopen at cont' vs just 'open at cont' > > > > Based on the bug reports I see, we want to be very good at detecting and > > gracefully handling open errors because they are pretty frequent. > > Do you have some more details on the kind of errors? Missing files, > permissions, something like this? Or rather something related to the > actual content of an image file? Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI setup. Access permissions due to incorrect user / group setup, or read-only mounts, or SELinux denials. Actual I/O errors are less common and are not so likely to cause QEMU to fail to start at all, since QEMU is likely to just report them to the guest OS instead. Daniel
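The error categories listed here map fairly directly onto errno values from open(2). The sketch below is not QEMU or libvirt code — just an illustrative taxonomy under that assumption:

```python
import errno

def classify_open_error(e: OSError) -> str:
    """Rough taxonomy of open() failures, mirroring the thread:
    missing files (bad NFS/SAN/iSCSI setup), permissions (user/group
    setup, read-only mounts, SELinux), and genuine runtime I/O errors.
    """
    if e.errno == errno.ENOENT:
        return "missing"        # wrong/missing NFS mount, bad path
    if e.errno in (errno.EACCES, errno.EPERM, errno.EROFS):
        return "permissions"    # user/group setup, ro mount, SELinux denial
    if e.errno == errno.EIO:
        return "io-error"       # rare at open time; usually hits the guest
    return "other"

print(classify_open_error(OSError(errno.EACCES, "denied")))  # permissions
```

The first two categories are configuration mistakes that an early open catches once and for all, which is why "open at startup + reopen at cont" is so much more reliable than "open at cont".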
Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote: > On Mon, Nov 14, 2011 at 10:16:10AM +0000, Daniel P. Berrange wrote: > > On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote: > > > On 11/11/2011 12:15 PM, Kevin Wolf wrote: > > > > Am 10.11.2011 22:30, schrieb Anthony Liguori: > > > > > Live migration with qcow2 or any other image format is just not going > > > > > to work > > > > > right now even with proper clustered storage. I think doing a block > > > > > level flush > > > > > cache interface and letting block devices decide how to do it is the > > > > > best approach. > > > > I would really prefer reusing the existing open/close code. It means > > > > less (duplicated) code, is existing code that is well tested and doesn't > > > > make migration much of a special case. > > > > If you want to avoid reopening the file on the OS level, we can reopen > > > > only the topmost layer (i.e. the format, but not the protocol) for now > > > > and in 1.1 we can use bdrv_reopen(). > > > Intuitively I dislike _reopen style interfaces. If the second open > > > yields different results from the first, does it invalidate any > > > computations in between? > > > What's wrong with just delaying the open? > > If you delay the 'open' until the mgmt app issues 'cont', then you loose > > the ability to rollback to the source host upon open failure for most > > deployed versions of libvirt. We only fairly recently switched to a five > > stage migration handshake to cope with rollback when 'cont' fails. > > Daniel > I guess reopen can fail as well, so this seems to me to be an important > fix but not a blocker. If the initial open succeeds, then it is far more likely that a later re-open will succeed too, because you have already eliminated the possibility of configuration mistakes, and will have caught most storage runtime errors too.
So there is a very significant difference in reliability between doing an 'open at startup + reopen at cont' vs just 'open at cont'. Based on the bug reports I see, we want to be very good at detecting and gracefully handling open errors because they are pretty frequent. Regards, Daniel
Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote: > On 11/11/2011 12:15 PM, Kevin Wolf wrote: > > Am 10.11.2011 22:30, schrieb Anthony Liguori: > > > Live migration with qcow2 or any other image format is just not going to > > > work > > > right now even with proper clustered storage. I think doing a block > > > level flush > > > cache interface and letting block devices decide how to do it is the best > > > approach. > > I would really prefer reusing the existing open/close code. It means > > less (duplicated) code, is existing code that is well tested and doesn't > > make migration much of a special case. > > If you want to avoid reopening the file on the OS level, we can reopen > > only the topmost layer (i.e. the format, but not the protocol) for now > > and in 1.1 we can use bdrv_reopen(). > Intuitively I dislike _reopen style interfaces. If the second open > yields different results from the first, does it invalidate any > computations in between? > What's wrong with just delaying the open? If you delay the 'open' until the mgmt app issues 'cont', then you lose the ability to roll back to the source host upon open failure for most deployed versions of libvirt. We only fairly recently switched to a five-stage migration handshake to cope with rollback when 'cont' fails. Daniel
Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
On Thu, Nov 10, 2011 at 01:11:42PM -0600, Anthony Liguori wrote: > On 11/10/2011 12:42 PM, Daniel P. Berrange wrote: > >On Thu, Nov 10, 2011 at 12:27:30PM -0600, Anthony Liguori wrote: > >>What does libvirt actually do in the monitor prior to migration > >>completing on the destination? The least invasive way of doing > >>delayed open of block devices is probably to make -incoming create a > >>monitor and run a main loop before the block devices (and full > >>device model) is initialized. Since this isolates the changes > >>strictly to migration, I'd feel okay doing this for 1.0 (although it > >>might need to be in the stable branch). > > > >The way migration works with libvirt wrt QEMU interactions is now > >as follows > > > > 1. Destination. > >Run qemu -incoming ...args... > >Query chardevs via monitor > >Query vCPU threads via monitor > >Set disk / vnc passwords > > Since RHEL carries Juan's patch, and Juan's patch doesn't handle > disk passwords gracefully, how does libvirt cope with that? No idea, that's the first I've heard of any patch that causes problems with passwords in QEMU. Daniel
Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
On Thu, Nov 10, 2011 at 12:27:30PM -0600, Anthony Liguori wrote: > What does libvirt actually do in the monitor prior to migration > completing on the destination? The least invasive way of doing > delayed open of block devices is probably to make -incoming create a > monitor and run a main loop before the block devices (and full > device model) is initialized. Since this isolates the changes > strictly to migration, I'd feel okay doing this for 1.0 (although it > might need to be in the stable branch). The way migration works with libvirt wrt QEMU interactions is now as follows:

 1. Destination
      Run qemu -incoming ...args...
      Query chardevs via monitor
      Query vCPU threads via monitor
      Set disk / vnc passwords
      Set netdev link states
      Set balloon target

 2. Source
      Set migration speed
      Set migration max downtime
      Run migrate command (detached)
      while 1
          Query migration status
          if status is failed or success
              break;

 3. Destination
      If final status was success
          Run 'cont' in monitor
      else
          kill QEMU process

 4. Source
      If final status was success and 'cont' on dest succeeded
          kill QEMU process
      else
          Run 'cont' in monitor

In older libvirt, the bits from step 4 would actually take place at the end of step 2. This meant we could end up with no QEMU on either the source or dest, if starting CPUs on the dest QEMU failed for some reason. We would still really like to have a 'query-migrate' command for the destination, so that we can confirm that the destination has consumed all incoming migrate data successfully, rather than just blindly starting CPUs and hoping for the best.
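The four-step flow above can be modelled in a few lines of plain Python. This is an illustrative sketch only — the dict "monitors" and status values are stand-ins; real libvirt drives two QEMU monitors over QMP/HMP, and step 4 additionally checks whether 'cont' on the destination succeeded:

```python
# Illustrative model of the 4-step libvirt migration flow.  The dicts
# record which monitor commands each side would receive.

def migrate(src, dst):
    """Return which side ends up running the guest.

    src/dst are dicts with a "cmds" list; dst["migration_ok"] fakes
    the outcome of the transfer (step 2's status polling loop).
    """
    dst["cmds"] += ["query-chardev", "query-cpus", "set-passwords",
                    "set_link", "balloon"]                  # step 1
    src["cmds"] += ["migrate_set_speed", "migrate_set_downtime",
                    "migrate -d"]                           # step 2
    status = "success" if dst["migration_ok"] else "failed"
    if status == "success":                                 # step 3
        dst["cmds"].append("cont")
        src["cmds"].append("quit")                          # step 4: kill src
        return "destination"
    else:
        dst["cmds"].append("quit")
        src["cmds"].append("cont")                          # roll back to src
        return "source"

src = {"cmds": []}
dst = {"cmds": [], "migration_ok": False}
print(migrate(src, dst))   # transfer failed -> guest resumes on source
```

The key property is that exactly one side runs 'cont' and the other is killed, so a failure never leaves the guest dead on both hosts — which is what the older end-of-step-2 ordering could not guarantee.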
Regards, Daniel
Re: [Qemu-devel] KVM call agenda for October 25
On Wed, Oct 26, 2011 at 01:23:05PM +0200, Kevin Wolf wrote: > Am 26.10.2011 11:57, schrieb Daniel P. Berrange: > > On Wed, Oct 26, 2011 at 10:48:12AM +0200, Markus Armbruster wrote: > >> Kevin Wolf writes: > >> > >>> Am 25.10.2011 16:06, schrieb Anthony Liguori: > >>>> On 10/25/2011 08:56 AM, Kevin Wolf wrote: > >>>>> Am 25.10.2011 15:05, schrieb Anthony Liguori: > >>>>>> I'd be much more open to changing the default mode to cache=none FWIW > >>>>>> since the > >>>>>> risk of data loss there is much, much lower. > >>>>> > >>>>> I think people said that they'd rather not have cache=none as default > >>>>> because O_DIRECT doesn't work everywhere. > >>>> > >>>> Where doesn't it work these days? I know it doesn't work on tmpfs. I > >>>> know it > >>>> works on ext[234], btrfs, nfs. > >>> > >>> Besides file systems (and probably OSes) that don't support O_DIRECT, > >>> there's another case: Our defaults don't work on 4k sector disks today. > >>> You need to explicitly specify the logical_block_size qdev property for > >>> cache=none to work on them. > >>> > >>> And changing this default isn't trivial as the right value doesn't only > >>> depend on the host disk, but it's also guest visible. The only way out > >>> would be bounce buffers, but I'm not sure that doing that silently is a > >>> good idea... > >> > >> Sector size is a device property. > >> > >> If the user asks for a 4K sector disk, and the backend can't support > >> that, we need to reject the configuration. Just like we reject > >> read-only backends for read/write disks. > > > > I don't see why we need to reject a guest disk with 4k sectors, > > just because the host disk only has 512 byte sectors. A guest > > sector size that's a larger multiple of host sector size should > > work just fine. It just means any guest sector write will update > > 8 host sectors at a time. 
We only have problems if guest sector > > size is not a multiple of host sector size, in which case bounce > > buffers are the only option (other than rejecting the config > > which is not too nice). > > > > IIUC, current QEMU behaviour is
> >
> >              Guest 512   Guest 4k
> >   Host 512   *OK         OK
> >   Host 4k    *I/O Err    OK
> >
> > '*' marks defaults
> >
> > IMHO, QEMU needs to work withot I/O errors in all of these > > combinations, even if this means having to use bounce buffers > > in some of them. That said, IMHO the default should be for > > QEMU to avoid bounce buffers, which implies it should either > > chose guest sector size to match host sector size, or it > > should unconditionally use 4k guest. IMHO we need the former
> >
> >              Guest 512   Guest 4k
> >   Host 512   *OK         OK
> >   Host 4k    OK          *OK
> >
> I'm not sure if a 4k host should imply a 4k guest by default. This means > that some guests wouldn't be able to run on a 4k host. On the other > hand, for those guests that can do 4k, it would be the much better option. > > So I think this decision is the hard thing about it. I guess it somewhat depends whether we want to strive for:

 1. Give the user the fastest working config by default
 2. Give the user a working config by default
 3. Give the user the fastest (possibly broken) config by default

IMHO 3 is not a serious option, but I could see 2 as a reasonable tradeoff to avoid complexity in choosing QEMU defaults. The user would have a working config with 512 sectors, but sub-optimal perf on 4k hosts due to bounce buffering. Ideally libvirt or other higher-level app would be setting the best block size that a guest can support by default, so bounce buffers would rarely be needed.
So only people using QEMU directly without setting a block size would ordinarily suffer the bounce buffer perf hit on a 4k host. Daniel
Re: [Qemu-devel] KVM call agenda for October 25
On Wed, Oct 26, 2011 at 10:48:12AM +0200, Markus Armbruster wrote: > Kevin Wolf writes: > > > Am 25.10.2011 16:06, schrieb Anthony Liguori: > >> On 10/25/2011 08:56 AM, Kevin Wolf wrote: > >>> Am 25.10.2011 15:05, schrieb Anthony Liguori: > I'd be much more open to changing the default mode to cache=none FWIW > since the > risk of data loss there is much, much lower. > >>> > >>> I think people said that they'd rather not have cache=none as default > >>> because O_DIRECT doesn't work everywhere. > >> > >> Where doesn't it work these days? I know it doesn't work on tmpfs. I > >> know it > >> works on ext[234], btrfs, nfs. > > > > Besides file systems (and probably OSes) that don't support O_DIRECT, > > there's another case: Our defaults don't work on 4k sector disks today. > > You need to explicitly specify the logical_block_size qdev property for > > cache=none to work on them. > > > > And changing this default isn't trivial as the right value doesn't only > > depend on the host disk, but it's also guest visible. The only way out > > would be bounce buffers, but I'm not sure that doing that silently is a > > good idea... > > Sector size is a device property. > > If the user asks for a 4K sector disk, and the backend can't support > that, we need to reject the configuration. Just like we reject > read-only backends for read/write disks. I don't see why we need to reject a guest disk with 4k sectors, just because the host disk only has 512 byte sectors. A guest sector size that's a larger multiple of host sector size should work just fine. It just means any guest sector write will update 8 host sectors at a time. We only have problems if guest sector size is not a multiple of host sector size, in which case bounce buffers are the only option (other than rejecting the config which is not too nice). 
IIUC, current QEMU behaviour is:

             Guest 512   Guest 4k
  Host 512   *OK         OK
  Host 4k    *I/O Err    OK

  '*' marks defaults

IMHO, QEMU needs to work without I/O errors in all of these combinations, even if this means having to use bounce buffers in some of them. That said, IMHO the default should be for QEMU to avoid bounce buffers, which implies it should either choose guest sector size to match host sector size, or it should unconditionally use 4k guest. IMHO we need the former:

             Guest 512   Guest 4k
  Host 512   *OK         OK
  Host 4k    OK          *OK

Yes, I know there are other weird sector sizes besides 512 and 4k, but the same general principles apply of either one being a multiple of the other, or needing to use bounce buffers. > If the backend can only support it by using bounce buffers, I'd say > reject it unless the user explicitly permits bounce buffers. But that's > debatable. I don't think it really adds value for QEMU to force the user to specify some extra magic flag in order to make the user's requested config actually be honoured. If a config needs bounce buffers, QEMU should just do it, without needing 'use-bounce-buffers=1'. A higher-level mgmt app is in a better position to inform users about the consequences. Daniel
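The multiple-of rule behind those tables is easy to state in code. This is a hedged sketch of the decision only — QEMU's real logic lives in its block layer and is considerably more involved:

```python
def io_strategy(guest_sector: int, host_sector: int) -> str:
    """Classify a guest/host logical block size pairing.

    If the guest size is a multiple of the host size, every guest write
    maps to a whole number of host sectors and can go straight through
    (O_DIRECT-friendly); if the host size is the larger multiple, a
    guest sector update touches part of a host sector and needs a
    read-modify-write bounce buffer.
    """
    if guest_sector % host_sector == 0:
        return "direct"    # e.g. 4k guest on 512b host: 8 host sectors/write
    if host_sector % guest_sector == 0:
        return "bounce"    # e.g. 512b guest on 4k host: RMW needed
    return "reject"        # neither divides the other

print(io_strategy(4096, 512))   # direct
print(io_strategy(512, 4096))   # bounce
```

Note that "reject" only arises for exotic sizes where neither value divides the other, matching the remark that the same principle covers weird sector sizes too.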
Re: [PATCH 05/11] virt: Introducing libvirt VM class
On Tue, Oct 11, 2011 at 06:07:11PM -0300, Lucas Meneghel Rodrigues wrote: > This is a first attempt at providing a libvirt VM class, > in order to implement the needed methods for virt testing. > With this class, we will be able to implement a libvirt > test, that behaves similarly to the KVM test. > > As of implementation details, libvirt_vm uses virsh > (a userspace program written on top of libvirt) to > do domain start, stop, verification of status and > other common operations. The reason why virsh was > used is to get more coverage of the userspace stack > that libvirt offers, and also to catch issues that > virsh users would catch. Personally I would have recommended that you use the libvirt Python API. virsh is a very thin layer over the libvirt API, which mostly avoids adding any logic of its own, so once it has been tested once, there's not much value in doing more. By using the Python API directly, you will be able to do more intelligent handling of errors, since you'll get the full libvirt Python exception object instead of a blob of stuff on stderr. Not to mention that it is so much more efficient, and robust against any future changes in virsh. Regards, Daniel
Re: [libvirt] Qemu/KVM is 3x slower under libvirt (due to vhost=on)
On Wed, Sep 28, 2011 at 12:19:09PM +0200, Reeted wrote: > On 09/28/11 11:53, Daniel P. Berrange wrote: > >On Wed, Sep 28, 2011 at 11:49:01AM +0200, Reeted wrote: > >>YES! > >>It's the vhost. With vhost=on it takes about 12 seconds more time to boot. > >> > >>...meaning? :-) > >I've no idea. I was always under the impression that 'vhost=on' was > >the 'make it go much faster' switch. So something is going wrong > >here that I cna't explain. > > > >Perhaps one of the network people on this list can explain... > > > >To turn vhost off in the libvirt XML, you should be able to use > ><driver name='qemu'/> for the interface in question, eg > >
> >  <interface type='network'>
> >    ...
> >    <model type='virtio'/>
> >    <driver name='qemu'/>
> >  </interface>
> >
> Ok that seems to work: it removes the vhost part in the virsh launch hence cutting down 12secs of boot time. > > If nobody comes out with an explanation of why, I will open another thread on the kvm list for this. I would probably need to test disk performance on vhost=on to see if it degrades or it's for another reason that boot time is increased. Be sure to CC the qemu-devel mailing list too next time, since that has a wider audience who might be able to help Daniel
Re: [libvirt] Qemu/KVM is 3x slower under libvirt (due to vhost=on)
On Wed, Sep 28, 2011 at 11:49:01AM +0200, Reeted wrote: > On 09/28/11 11:28, Daniel P. Berrange wrote: > >On Wed, Sep 28, 2011 at 11:19:43AM +0200, Reeted wrote: > >>On 09/28/11 09:51, Daniel P. Berrange wrote: > >>>>This is my bash commandline: > >>>> > >>>>/opt/qemu-kvm-0.14.1/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm > >>>>-m 2002 -smp 2,sockets=2,cores=1,threads=1 -name vmname1-1 -uuid > >>>>ee75e28a-3bf3-78d9-3cba-65aa63973380 -nodefconfig -nodefaults > >>>>-chardev > >>>>socket,id=charmonitor,path=/var/lib/libvirt/qemu/vmname1-1.monitor,server,nowait > >>>>-mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc > >>>>-boot order=dc,menu=on -drive > >>>>file=/dev/mapper/vgPtpVM-lvVM_Vmname1_d1,if=none,id=drive-virtio-disk0,boot=on,format=raw,cache=none,aio=native > >>>>-device > >>>>virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 > >>>>-drive > >>>>if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,cache=none,aio=native > >>>>-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 > >>>>-net nic,model=virtio -net tap,ifname=tap0,script=no,downscript=no > >>>>-usb -vnc 127.0.0.1:0 -vga cirrus -device > >>>>virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 > >>>This shows KVM is being requested, but we should validate that KVM is > >>>definitely being activated when under libvirt. You can test this by > >>>doing: > >>> > >>> virsh qemu-monitor-command vmname1 'info kvm' > >>kvm support: enabled > >> > >>I think I would see a higher impact if it was KVM not enabled. > >> > >>>>Which was taken from libvirt's command line. 
The only modifications > >>>>I did to the original libvirt commandline (seen with ps aux) were: > > > >>>>- Network was: -netdev tap,fd=17,id=hostnet0,vhost=on,vhostfd=18 > >>>>-device > >>>>virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3 > >>>>Has been simplified to: -net nic,model=virtio -net > >>>>tap,ifname=tap0,script=no,downscript=no > >>>>and manual bridging of the tap0 interface. > >>>You could have equivalently used > >>> > >>> -netdev tap,ifname=tap0,script=no,downscript=no,id=hostnet0,vhost=on > >>> -device > >>> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3 > >>It's this! It's this!! (thanks for the line) > >> > >>It raises boot time by 10-13 seconds > >Ok, that is truely bizarre and I don't really have any explanation > >for why that is. I guess you could try 'vhost=off' too and see if that > >makes the difference. > > YES! > It's the vhost. With vhost=on it takes about 12 seconds more time to boot. > > ...meaning? :-) I've no idea. I was always under the impression that 'vhost=on' was the 'make it go much faster' switch. So something is going wrong here that I can't explain. Perhaps one of the network people on this list can explain... To turn vhost off in the libvirt XML, you should be able to use for the interface in question, e.g. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [libvirt] Qemu/KVM is 3x slower under libvirt
On Wed, Sep 28, 2011 at 11:19:43AM +0200, Reeted wrote: > On 09/28/11 09:51, Daniel P. Berrange wrote: > >>This is my bash commandline: > >> > >>/opt/qemu-kvm-0.14.1/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm > >>-m 2002 -smp 2,sockets=2,cores=1,threads=1 -name vmname1-1 -uuid > >>ee75e28a-3bf3-78d9-3cba-65aa63973380 -nodefconfig -nodefaults > >>-chardev > >>socket,id=charmonitor,path=/var/lib/libvirt/qemu/vmname1-1.monitor,server,nowait > >>-mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc > >>-boot order=dc,menu=on -drive > >>file=/dev/mapper/vgPtpVM-lvVM_Vmname1_d1,if=none,id=drive-virtio-disk0,boot=on,format=raw,cache=none,aio=native > >>-device > >>virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 > >>-drive > >>if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,cache=none,aio=native > >>-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 > >>-net nic,model=virtio -net tap,ifname=tap0,script=no,downscript=no > >>-usb -vnc 127.0.0.1:0 -vga cirrus -device > >>virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 > > > >This shows KVM is being requested, but we should validate that KVM is > >definitely being activated when under libvirt. You can test this by > >doing: > > > > virsh qemu-monitor-command vmname1 'info kvm' > > kvm support: enabled > > I think I would see a higher impact if it was KVM not enabled. > > >>Which was taken from libvirt's command line. The only modifications > >>I did to the original libvirt commandline (seen with ps aux) were: > >>- Network was: -netdev tap,fd=17,id=hostnet0,vhost=on,vhostfd=18 > >>-device > >>virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3 > >>Has been simplified to: -net nic,model=virtio -net > >>tap,ifname=tap0,script=no,downscript=no > >>and manual bridging of the tap0 interface. 
> >You could have equivalently used > > > > -netdev tap,ifname=tap0,script=no,downscript=no,id=hostnet0,vhost=on > > -device > > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3 > > It's this! It's this!! (thanks for the line) > > It raises boot time by 10-13 seconds Ok, that is truly bizarre and I don't really have any explanation for why that is. I guess you could try 'vhost=off' too and see if that makes the difference. > > But now I don't know where to look During boot there is a pause > usually between /scripts/init-bottom (Ubuntu 11.04 guest) and the > appearance of login prompt, however that is not really meaningful > because there is probably much background activity going on there, > with init etc. which don't display messages > > > init-bottom does just this > > - > #!/bin/sh -e > # initramfs init-bottom script for udev > > PREREQ="" > > # Output pre-requisites > prereqs() > { > echo "$PREREQ" > } > > case "$1" in > prereqs) > prereqs > exit 0 > ;; > esac > > > # Stop udevd, we'll miss a few events while we run init, but we catch up > pkill udevd > > # Move /dev to the real filesystem > mount -n -o move /dev ${rootmnt}/dev > - > > It doesn't look like it should take time to execute. > So there is probably some other background activity going on... and > that is slower, but I don't know what that is. > > > Another thing that can be noticed is that the dmesg message: > > [ 13.290173] eth0: no IPv6 routers present > > (which is also the last message) > > happens on average 1 (one) second earlier in the fast case (-net) > than in the slow case (-netdev) Hmm, none of that looks particularly suspect. So I don't really have much idea what else to try apart from the 'vhost=off' possibility. 
Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [libvirt] Qemu/KVM is 3x slower under libvirt
On Tue, Sep 27, 2011 at 08:10:21PM +0200, Reeted wrote: > I repost this, this time by also including the libvirt mailing list. > > Info on my libvirt: it's the version in Ubuntu 11.04 Natty which is > 0.8.8-1ubuntu6.5 . I didn't recompile this one, while Kernel and > qemu-kvm are vanilla and compiled by hand as described below. > > My original message follows: > > This is really strange. > > I just installed a new host with kernel 3.0.3 and Qemu-KVM 0.14.1 > compiled by me. > > I have created the first VM. > This is on LVM, virtio etc... if I run it directly from bash > console, it boots in 8 seconds (it's a bare ubuntu with no > graphics), while if I boot it under virsh (libvirt) it boots in > 20-22 seconds. This is the time from after Grub to the login prompt, > or from after Grub to the ssh-server up. > > I was almost able to replicate the whole libvirt command line on the > bash console, and it still goes almost 3x faster when launched from > bash than with virsh start vmname. The part I wasn't able to > replicate is the -netdev part because I still haven't understood the > semantics of it. -netdev is just an alternative way of setting up networking that avoids QEMU's nasty VLAN concept. Using -netdev allows QEMU to use more efficient codepaths for networking, which should improve the network performance. 
> This is my bash commandline: > > /opt/qemu-kvm-0.14.1/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm > -m 2002 -smp 2,sockets=2,cores=1,threads=1 -name vmname1-1 -uuid > ee75e28a-3bf3-78d9-3cba-65aa63973380 -nodefconfig -nodefaults > -chardev > socket,id=charmonitor,path=/var/lib/libvirt/qemu/vmname1-1.monitor,server,nowait > -mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc > -boot order=dc,menu=on -drive > file=/dev/mapper/vgPtpVM-lvVM_Vmname1_d1,if=none,id=drive-virtio-disk0,boot=on,format=raw,cache=none,aio=native > -device > virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 > -drive > if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,cache=none,aio=native > -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 > -net nic,model=virtio -net tap,ifname=tap0,script=no,downscript=no > -usb -vnc 127.0.0.1:0 -vga cirrus -device > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 This shows KVM is being requested, but we should validate that KVM is definitely being activated when under libvirt. You can test this by doing: virsh qemu-monitor-command vmname1 'info kvm' > Which was taken from libvirt's command line. The only modifications > I did to the original libvirt commandline (seen with ps aux) were: > > - Removed -S Fine, has no effect on performance. > - Network was: -netdev tap,fd=17,id=hostnet0,vhost=on,vhostfd=18 > -device > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3 > Has been simplified to: -net nic,model=virtio -net > tap,ifname=tap0,script=no,downscript=no > and manual bridging of the tap0 interface. 
You could have equivalently used -netdev tap,ifname=tap0,script=no,downscript=no,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3 That said, I don't expect this has anything to do with the performance since booting a guest rarely involves much network I/O unless you're doing something odd like NFS-root / iSCSI-root. > Firstly I had thought that this could be fault of the VNC: I have > compiled qemu-kvm with no separate vnc thread. I thought that > libvirt might have connected to the vnc server at all times and this > could have slowed down the whole VM. > But then I also tried connecting vith vncviewer to the KVM machine > launched directly from bash, and the speed of it didn't change. So > no, it doesn't seem to be that. Yeah, I have never seen VNC be responsible for the kind of slowdown you describe. > BTW: is the slowdown of the VM on "no separate vnc thread" only in > effect when somebody is actually connected to VNC, or always? Probably, but again I don't think it is likely to be relevant here. > Also, note that the time difference is not visible in dmesg once the > machine has booted. So it's not a slowdown in detecting devices. > Devices are always detected within the first 3 seconds, according to > dmesg, at 3.6 seconds the first ext4 mount begins. It seems to be > really the OS boot that is slow... it seems an hard disk performance > problem. There are a couple of things that would be different between running the VM directly, vs via libvirt. - Security drivers - SELinux/AppArmour - CGroups If it was AppArmour causing this slowdown I don't think you would have been the first person to complain, so let's ignore that. Which leaves cgroups as a likely culprit. Do a 'grep cgroup /proc/mounts' to see if any of them are mounted; if so, then for each cgroup mount in turn, - Umount the cgroup - Restart libvirtd - Test your guest boot performance Regards, Daniel -- |: http://berrange.com -o-h
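The first step of that procedure can be scripted. A small self-contained sketch follows; it parses sample /proc/mounts-style text rather than probing a live system, so the mount paths shown are illustrative:

```python
def cgroup_mounts(mounts_text):
    """Return mount points whose filesystem type is cgroup, given
    text in the whitespace-separated format of /proc/mounts."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2].startswith("cgroup"):
            points.append(fields[1])
    return points

sample = """\
proc /proc proc rw,nosuid,nodev,noexec 0 0
cgroup /sys/fs/cgroup/cpu cgroup rw,relatime,cpu 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,relatime,blkio 0 0
"""
print(cgroup_mounts(sample))
```

On a real host you would feed it open('/proc/mounts').read() instead of the sample, then unmount each returned path in turn and retest as described above.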
Re: How many threads should a kvm vm be starting?
On Tue, Sep 27, 2011 at 04:04:41PM -0600, Thomas Fjellstrom wrote: > On September 27, 2011, Avi Kivity wrote: > > On 09/27/2011 03:29 AM, Thomas Fjellstrom wrote: > > > I just noticed something interesting, a virtual machine on one of my > > > servers seems to have 69 threads (including the main thread). Other > > > guests on the machine only have a couple threads. > > > > > > Is this normal? or has something gone horribly wrong? > > > > It's normal if the guest does a lot of I/O. The thread count should go > > down when the guest idles. > > Ah, that would make sense. Though it kind of defeats assigning a vm a single > cpu/core. A single VM can now DOS an entire multi-core-cpu server. It pretty > much pegged my dual core (with HT) server for a couple hours. You can mitigate these problems by putting each KVM process in its own cgroup, and using the 'cpu_shares' tunable to ensure that each KVM process gets the same relative ratio of CPU time, regardless of how many threads it is running. With newer kernels there are other CPU tunables for placing hard caps on CPU utilization of the process as a whole too. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
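Note that cpu_shares is a relative weight, not an absolute cap: under full contention each cgroup gets its share divided by the sum of all shares. A toy calculation of that semantics (the cgroup names and share values here are made up):

```python
def cpu_fraction(shares):
    """Map each cgroup to its fraction of CPU time under full
    contention, given per-cgroup cpu.shares weights."""
    total = sum(shares.values())
    return {name: weight / total for name, weight in shares.items()}

# three KVM guests: equal weights for vm1/vm2, double weight for vm3
print(cpu_fraction({"vm1": 1024, "vm2": 1024, "vm3": 2048}))
```

With newer kernels, hard caps (cpu.cfs_quota_us / cpu.cfs_period_us) can bound a guest's absolute usage regardless of idle capacity.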
Re: [PATCH 1/3] Avoid the use of deprecated gnutls gnutls_*_set_priority functions.
On Thu, Aug 25, 2011 at 11:54:41AM +0100, Stefan Hajnoczi wrote: > On Mon, Jul 4, 2011 at 11:00 PM, Raghavendra D Prabhu > wrote: > > The gnutls_*_set_priority family of functions has been marked deprecated > > in 2.12.x. These functions have been superceded by > > gnutls_priority_set_direct(). > > > > Signed-off-by: Raghavendra D Prabhu > > --- > > ui/vnc-tls.c | 20 +--- > > 1 files changed, 1 insertions(+), 19 deletions(-) > > > > diff --git a/ui/vnc-tls.c b/ui/vnc-tls.c > > index dec626c..33a5d8c 100644 > > --- a/ui/vnc-tls.c > > +++ b/ui/vnc-tls.c > > @@ -286,10 +286,6 @@ int vnc_tls_validate_certificate(struct VncState *vs) > > > > int vnc_tls_client_setup(struct VncState *vs, > > int needX509Creds) { > > - static const int cert_type_priority[] = { GNUTLS_CRT_X509, 0 }; > > - static const int protocol_priority[]= { GNUTLS_TLS1_1, GNUTLS_TLS1_0, > > GNUTLS_SSL3, 0 }; > > - static const int kx_anon[] = {GNUTLS_KX_ANON_DH, 0}; > > - static const int kx_x509[] = {GNUTLS_KX_DHE_DSS, GNUTLS_KX_RSA, > > GNUTLS_KX_DHE_RSA, GNUTLS_KX_SRP, 0}; > > > > VNC_DEBUG("Do TLS setup\n"); > > if (vnc_tls_initialize() < 0) { > > @@ -310,21 +306,7 @@ int vnc_tls_client_setup(struct VncState *vs, > > return -1; > > } > > > > - if (gnutls_kx_set_priority(vs->tls.session, needX509Creds ? > > kx_x509 : kx_anon) < 0) { > > - gnutls_deinit(vs->tls.session); > > - vs->tls.session = NULL; > > - vnc_client_error(vs); > > - return -1; > > - } > > - > > - if (gnutls_certificate_type_set_priority(vs->tls.session, > > cert_type_priority) < 0) { > > - gnutls_deinit(vs->tls.session); > > - vs->tls.session = NULL; > > - vnc_client_error(vs); > > - return -1; > > - } > > - > > - if (gnutls_protocol_set_priority(vs->tls.session, > > protocol_priority) < 0) { > > + if (gnutls_priority_set_direct(vs->tls.session, needX509Creds ? 
> > "NORMAL" : "NORMAL:+ANON-DH", NULL) < 0) { > > gnutls_deinit(vs->tls.session); > > vs->tls.session = NULL; > > vnc_client_error(vs); > > -- > > 1.7.6 > > Daniel, > This patch looks good to me but I don't know much about gnutls or > crypto in general. Would you be willing to review this? ACK, this approach is different from what I did in libvirt, but it matches the recommendations in the GNUTLS manual for setting priority, so I believe it is good. Signed-off-by: Daniel P. Berrange Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: DMI BIOS String
On Mon, Aug 22, 2011 at 03:52:19PM +1200, Derek wrote: > Hi Folks, > > I could not track down any solid info on modifying the DMI BIOS string. > > For example, in VirtualBox you can use 'vboxmanage setsextradata' to > set the BIOS product and vendor string per VM. > > Any ideas if this is possible with KVM? If using QEMU directly you can use '-smbios' args. eg -smbios "type=0,vendor=LENOVO,version=6FET82WW (3.12 )" -smbios "type=1,manufacturer=Fedora,product=Virt-Manager,version=0.8.2-3.fc14,serial=32dfcb37-5af1-552b-357c-be8c3aa38310,uuid=c7a5fdbd-edaf-9455-926a-d65c16db1809,sku=1234567890,family=Red Hat" If using QEMU via libvirt you can use the following: http://libvirt.org/formatdomain.html#elementsSysinfo Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
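For the libvirt route, the domain XML equivalent of the -smbios example above is a sysinfo block, enabled from the os element (an illustrative fragment reusing the same sample strings):

```xml
<os>
  <type arch='x86_64'>hvm</type>
  <smbios mode='sysinfo'/>
</os>
<sysinfo type='smbios'>
  <bios>
    <entry name='vendor'>LENOVO</entry>
    <entry name='version'>6FET82WW (3.12 )</entry>
  </bios>
  <system>
    <entry name='manufacturer'>Fedora</entry>
    <entry name='product'>Virt-Manager</entry>
    <entry name='version'>0.8.2-3.fc14</entry>
  </system>
</sysinfo>
```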
[PATCH master+STABLE-0.15] Fix default accelerator when configured with --disable-kvm
From: "Daniel P. Berrange" The default accelerator is hardcoded to 'kvm'. This is a fine default for qemu-kvm normally, but if the user built with ./configure --disable-kvm, then the resulting binaries will not work by default * vl.c: Default to 'tcg' unless CONFIG_KVM is defined Signed-off-by: Daniel P. Berrange --- vl.c |5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/vl.c b/vl.c index 7ae549e..28fd2f3 100644 --- a/vl.c +++ b/vl.c @@ -1953,8 +1953,13 @@ static int configure_accelerator(void) } if (p == NULL) { +#ifdef CONFIG_KVM /* Use the default "accelerator", kvm */ p = "kvm"; +#else +/* Use the default "accelerator", tcg */ +p = "tcg"; +#endif } while (!accel_initalised && *p != '\0') { -- 1.7.6 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] Introduce panic hypercall
On Mon, Jun 20, 2011 at 06:31:23PM +0300, Avi Kivity wrote: > On 06/20/2011 04:38 PM, Daniel Gollub wrote: > >Introduce panic hypercall to enable the crashing guest to notify the > >host. This enables the host to run some actions as soon a guest > >crashed (kernel panic). > > > >This patch series introduces the panic hypercall at the host end. > >As well as the hypercall for KVM paravirtuliazed Linux guests, by > >registering the hypercall to the panic_notifier_list. > > > >The basic idea is to create KVM crashdump automatically as soon the > >guest paniced and power-cycle the VM (e.g. libvirt). > > This would be more easily done via a "panic device" (I/O port or > memory-mapped address) that the guest hits. It would be intercepted > by qemu without any new code in kvm. > > However, I'm not sure I see the gain. Most enterprisey guests > already contain in-guest crash dumpers which provide more > information than a qemu memory dump could, since they know exact > load addresses etc. and are integrated with crash analysis tools. > What do you have in mind? Well libvirt can capture a "core" file by doing 'virsh dump $GUESTNAME'. This actually uses the QEMU monitor migration command to capture the entirety of QEMU memory. The 'crash' command line tool actually knows how to analyse this data format as it would a normal kernel crashdump. I think having a way for a guest OS to notify the host that it has crashed would be useful. libvirt could automatically do a crash dump of the QEMU memory, or at least pause the guest CPUs and notify the management app of the crash, which can then decide what to do. You can also use tools like 'virt-dmesg' which uses libvirt to peek into guest memory to extract the most recent kernel dmesg logs (even if the guest OS itself is crashed & didn't manage to send them out via netconsole or something else). 
This series does need to introduce a QMP event notification upon crash, so that the crash notification can be propagated to mgmt layers above QEMU. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
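Such a QMP event would arrive as an asynchronous message on the monitor socket. A hypothetical shape for it (the event name and fields here are illustrative, not something this series defines):

```json
{ "event": "GUEST_PANICKED",
  "data": { "action": "pause" },
  "timestamp": { "seconds": 1308582843, "microseconds": 0 } }
```

A management layer like libvirt would subscribe to the monitor, and on receipt trigger the dump / pause / notify policy described above.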
Re: [Qemu-devel][RFC]QEMU disk I/O limits
On Tue, May 31, 2011 at 10:10:37AM -0400, Vivek Goyal wrote: > On Tue, May 31, 2011 at 02:56:46PM +0100, Daniel P. Berrange wrote: > > On Tue, May 31, 2011 at 09:45:37AM -0400, Vivek Goyal wrote: > > > On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote: > > > > Hello, all, > > > > > > > > I have prepared to work on a feature called "Disk I/O limits" for > > > > qemu-kvm projeect. > > > > This feature will enable the user to cap disk I/O amount performed > > > > by a VM.It is important for some storage resources to be shared among > > > > multi-VMs. As you've known, if some of VMs are doing excessive disk > > > > I/O, they will hurt the performance of other VMs. > > > > > > > > > > Hi Zhiyong, > > > > > > Why not use kernel blkio controller for this and why reinvent the wheel > > > and implement the feature again in qemu? > > > > The finest level of granularity offered by cgroups apply limits per QEMU > > process. So the blkio controller can't be used to apply controls directly > > to individual disks used by QEMU, only the VM as a whole. > > So are multiple VMs using same disk. Then put multiple VMs in same > cgroup and apply the limit on that disk. > > Or if you want to put a system wide limit on a disk, then put all > VMs in root cgroup and put limit on root cgroups. > > I fail to understand what's the exact requirement here. I thought > the biggest use case was isolation one VM from other which might > be sharing same device. Hence we were interested in putting > per VM limit on disk and not a system wide limit on disk (independent > of VM). No, it isn't about putting limits on a disk independent of a VM. It is about one VM having multiple disks, and wanting to set different policies for each of its virtual disks. e.g. qemu-kvm -drive file=/dev/sda1 -drive file=/dev/sdb3 and wanting to say that sda1 is limited to 10 MB/s, while sdb3 is limited to 50 MB/s. 
You can't do that kind of thing with cgroups, because it can only control the entire process, not individual resources within the process. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
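The per-drive limits being discussed are essentially one token bucket per virtual disk. A minimal sketch of the idea (illustrative only; the class name and exact algorithm are not from this thread, and QEMU's eventual implementation differs):

```python
class TokenBucket:
    """One bucket per virtual disk: requests proceed while tokens
    (bytes) remain; tokens refill at the configured rate."""
    def __init__(self, rate, burst, now=0.0):
        self.rate = float(rate)    # bytes refilled per second
        self.burst = float(burst)  # bucket capacity in bytes
        self.tokens = float(burst)
        self.last = now

    def allow(self, nbytes, now):
        # refill for elapsed time, capped at the burst size
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # caller must queue or delay the request

# e.g. the sda1 drive above capped at 10 MB/s
sda1 = TokenBucket(rate=10 << 20, burst=10 << 20)
print(sda1.allow(8 << 20, now=0.0))  # True: within the initial burst
print(sda1.allow(8 << 20, now=0.1))  # False: only ~1 MB refilled since
```

Because each drive gets its own bucket, sda1 and sdb3 can carry different rates inside one QEMU process, which is exactly what a per-process cgroup controller cannot express.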
Re: [Qemu-devel][RFC]QEMU disk I/O limits
On Tue, May 31, 2011 at 09:45:37AM -0400, Vivek Goyal wrote: > On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote: > > Hello, all, > > > > I have prepared to work on a feature called "Disk I/O limits" for > > qemu-kvm projeect. > > This feature will enable the user to cap disk I/O amount performed by a > > VM.It is important for some storage resources to be shared among multi-VMs. > > As you've known, if some of VMs are doing excessive disk I/O, they will > > hurt the performance of other VMs. > > > > Hi Zhiyong, > > Why not use kernel blkio controller for this and why reinvent the wheel > and implement the feature again in qemu? The finest level of granularity offered by cgroups applies limits per QEMU process. So the blkio controller can't be used to apply controls directly to individual disks used by QEMU, only the VM as a whole. With networking we can use the 'net_cls' cgroups controller for the process as a whole, or attach 'tc' to individual TAP devices for per-NIC throttling, both of which ultimately use the same kernel functionality. I don't see an equivalent option for throttling individual disks that would reuse functionality from the blkio controller. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: drop -enable-nesting
On Mon, May 30, 2011 at 06:19:14PM +0300, Avi Kivity wrote: > On 05/30/2011 06:15 PM, Jan Kiszka wrote: > >On 2011-05-30 17:10, Roedel, Joerg wrote: > >> On Mon, May 30, 2011 at 11:04:02AM -0400, Jan Kiszka wrote: > >>> On 2011-05-30 16:38, Nadav Har'El wrote: > On Mon, May 30, 2011, Jan Kiszka wrote about "drop -enable-nesting > (was: [PATCH 3/7] cpu model bug fixes and definition corrections...)": > > On 2011-05-30 10:18, Roedel, Joerg wrote: > >> On Sat, May 28, 2011 at 04:39:13AM -0400, Jan Kiszka wrote: > >> > >>> Jörg, how to deal with -enable-nesting in qemu-kvm to align behavior > >>> with upstream? > >> > >> My personal preference is to just remove it. In upstream-qemu it is > >> enabled/disabled by +/-svm. -enable-nesting is just a historic thing > >> which can be wiped out. > > "-enable-nesting" could remain as a synonym for enabling either VMX or > SVM > in the guest, depending on what was available in the host (because KVM > now > supports both nested SVM and nested VMX, but not SVM-on-VMX or vice > versa). > >>> > >>> Why? Once nesting is stable (I think SVM already is), there is no reason > >>> for an explicit enable. And you can always mask it out via -cpu. > >>> > >>> BTW, what are the defaults for SVM right now in qemu-kvm and upstream? > >>> Enable if the modeled CPU supports it? > >> > >> qemu-kvm still needs -enable-nesting, otherwise it is disabled. Upstream > >> qemu should enable it unconditionally (can be disabled with -cpu ,-svm). > > > >Then let's start with aligning qemu-kvm defaults to upstream? I guess > >that's what the diff I was citing yesterday is responsible for. > > > >In the same run, -enable-nesting could dump a warning on the console > >that this switch is obsolete and will be removed from future versions. > > I think it's safe to drop -enable-nesting immediately. Dan, does > libvirt make use of it? Yes, but it should be safe to drop it. 
Currently, if the user specifies a CPU with the 'svm' flag present in libvirt guest XML, then we will pass args '-cpu +svm -enable-nesting'. So if we drop --enable-nesting, then libvirt will simply omit it and everything should still work because we have still got +svm set. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
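For context, the guest XML that makes libvirt emit those arguments is the CPU feature element, roughly like this illustrative fragment of libvirt domain XML (model name chosen arbitrarily):

```xml
<cpu match='exact'>
  <model>qemu64</model>
  <feature policy='require' name='svm'/>
</cpu>
```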
Re: [Qemu-devel] [PATCH 2/2 V7] qemu,qmp: add inject-nmi qmp command
On Wed, Apr 13, 2011 at 10:56:21PM +0300, Blue Swirl wrote: > On Wed, Apr 13, 2011 at 4:08 PM, Luiz Capitulino > wrote: > > On Tue, 12 Apr 2011 21:31:18 +0300 > > Blue Swirl wrote: > > > >> On Tue, Apr 12, 2011 at 10:52 AM, Avi Kivity wrote: > >> > On 04/11/2011 08:15 PM, Blue Swirl wrote: > >> >> > >> >> On Mon, Apr 11, 2011 at 10:01 AM, Markus Armbruster > >> >> wrote: > >> >> > Avi Kivity writes: > >> >> > > >> >> >> On 04/08/2011 12:41 AM, Anthony Liguori wrote: > >> >> >>> > >> >> >>> And it's a good thing to have, but exposing this as the only API to > >> >> >>> do something as simple as generating a guest crash dump is not the > >> >> >>> friendliest thing in the world to do to users. > >> >> >> > >> >> >> nmi is a fine name for something that corresponds to a real-life nmi > >> >> >> button (often labeled "NMI"). > >> >> > > >> >> > Agree. > >> >> > >> >> We could also introduce an alias mechanism for user friendly names, so > >> >> nmi could be used in addition of full path. Aliases could be useful > >> >> for device paths as well. > >> > > >> > Yes. Perhaps limited to the human monitor. > >> > >> I'd limit all debugging commands (including NMI) to the human monitor. > > > > Why? > > Do they have any real use in production environment? Also, we should > have the freedom to change the debugging facilities (for example, to > improve some internal implementation) as we want without regard to > compatibility to previous versions. We have users of libvirt requesting that we add an API for triggering a NMI. They want this for support in a production environment, to be able to initiate Windows crash dumps. We really don't want to have to use HMP passthrough for this, instead of a proper QMP command. More generally I don't want to see stuff in HMP, that isn't in the QMP. 
We already have far too much that we have to do via HMP passthrough in libvirt due to lack of QMP commands, to the extent that we might as well have just ignored QMP and continued with HMP for everything. If we want the flexibility to change the debugging commands between releases then we should come up with a plan to do this within the scope of QMP, not restrict them to HMP only. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 2/2 V7] qemu,qmp: add inject-nmi qmp command
On Mon, Mar 07, 2011 at 05:46:28PM +0800, Lai Jiangshan wrote: > From: Lai Jiangshan > Date: Mon, 7 Mar 2011 17:05:15 +0800 > Subject: [PATCH 2/2] qemu,qmp: add inject-nmi qmp command > > inject-nmi command injects an NMI on all CPUs of guest. > It is only supported for x86 guest currently, it will > returns "Unsupported" error for non-x86 guest. > > --- > hmp-commands.hx |2 +- > monitor.c | 18 +- > qmp-commands.hx | 29 + > 3 files changed, 47 insertions(+), 2 deletions(-) Does anyone have any feedback on this addition, or are all new QMP patch proposals blocked pending Anthony's QAPI work ? We'd like to support it in libvirt and thus want it to be available in QMP, as well as HMP. > @@ -2566,6 +2566,22 @@ static void do_inject_nmi(Monitor *mon, const QDict > *qdict) > break; > } > } > + > +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject > **ret_data) > +{ > +CPUState *env; > + > +for (env = first_cpu; env != NULL; env = env->next_cpu) > +cpu_interrupt(env, CPU_INTERRUPT_NMI); > + > +return 0; > +} > +#else > +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject > **ret_data) > +{ > +qerror_report(QERR_UNSUPPORTED); > +return -1; > +} > #endif > Interesting that with HMP you need to specify a single CPU index, but with QMP it is injecting to all CPUs at once. Is there any compelling reason why we'd ever need the ability to only inject to a single CPU from an app developer POV ? Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
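For context, a QMP client would presumably drive the proposed command roughly like this. This is a sketch only: the socket path and the use of socat are illustrative assumptions; "inject-nmi" is the command name taken from the patch, and qmp_capabilities is the standard QMP negotiation step.

```shell
# Hypothetical QMP socket path - substitute the real one.
QMP_SOCK=/var/lib/libvirt/qemu/guest.qmp

# A QMP session must negotiate capabilities before issuing commands.
OUT=$(printf '%s\n' \
  '{ "execute": "qmp_capabilities" }' \
  '{ "execute": "inject-nmi" }')
echo "$OUT"

# In real use the two lines would be fed into the socket, e.g.:
#   echo "$OUT" | socat - UNIX-CONNECT:"$QMP_SOCK"
```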
Re: [libvirt] [Qemu-devel] KVM call minutes for Mar 15
On Tue, Mar 15, 2011 at 12:06:06PM -0700, Chris Wright wrote: > * Anthony Liguori (anth...@codemonkey.ws) wrote: > > On 03/15/2011 09:53 AM, Chris Wright wrote: > > > QAPI > > > >- c library implementation is critical to have unit tests and test > > > driven development > > > - thread safe? > > > - no shared state, no statics. > > > - threading model requires lock for the qmp session > > > - licensing? > > > - LGPL > > > - forwards/backwards compat? > > > - designed with that in mind see wiki: > > > > > > http://wiki.qemu.org/Features/QAPI > > > > One neat feature of libqmp is that once libvirt has a better QMP > > passthrough interface, we can create a QmpSession that uses libvirt. > > > > It would look something like: > > > > QmpSession *libqmp_session_new_libvirt(virDomainPtr dom); > > Looks like you mean this? > >-> request QmpSession -> > client libvirt ><- return QmpSession <- > > client -> QmpSession -> QMP -> QEMU > > So bypassing libvirt completely to actually use the session? > > Currently, it's more like: > > client -> QemuMonitorCommand -> libvirt -> QMP -> QEMU > > > The QmpSession returned by this call can then be used with all of > > the libqmp interfaces. This means we can still exercise our test > > suite with a guest launched through libvirt. It also should make > > the libvirt pass through interface a bit easier to consume by third > > parties. > > This sounds like it's something libvirt folks should be involved with. > At the very least, this mode is there now and considered basically > unstable/experimental/developer use: > > "Qemu monitor command '%s' executed; libvirt results may be unpredictable!" > > So likely some concern about making it easier to use, esp. assuming > that third parties above are mgmt apps, not just developers.

Although we provide monitor and command line passthrough in libvirt, our recommendation is that mgmt apps do not develop against these APIs.
Our goal / policy is that apps should be able to do anything they need using the formally modelled libvirt public APIs. The primary intended usage for the monitor/command line passthrough is debugging & experimentation, and as a very short-term hack/workaround for mgmt apps while formal APIs are added to libvirt. In other words, we provide the feature because we don't want libvirt to be a roadblock, but we still strongly discourage its usage until all other options have been exhausted.

In the same way as loading binary-only modules into the kernel sets a 'tainted' flag, we plan that direct usage of monitor/command line passthrough will set a tainted flag against a VM. This will allow distro maintainers to identify usage & decide how they wish to support these features in products (if at all).

Regards, Daniel
Re: Configuring the bridge interface: why assign an IP?
On Mon, Mar 14, 2011 at 11:24:40AM -0600, Ben Beuchler wrote: > Most of the examples for setting up the bridge interface on a VM host > suggest assigning the IP address to the bridge. Assigning the IP to > the bridge leaves you open to the MAC address of the bridge changing > as you add/remove guests from the host, resulting in a brief (~20 > second) loss of connectivity to the host. (I am aware that I can > manually set the MAC of the bridge to avoid unexpected changes. That's > my current workaround.)

You don't need to manually set a MAC on the bridge - indeed you can't set an arbitrary MAC on it - it must have a MAC that matches one of the enslaved interfaces. The key is that the MAC of the enslaved ethernet device should be numerically smaller than that of any guest TAP devices. The kernel gives TAP devices a completely random MAC by default, so you need to make a little change to that. Two options:

 - Take the random host TAP device MAC and simply set the first byte to 0xFE
 - Take the guest NIC MAC, set the first byte to 0xFE and give that to the host TAP device

Recent releases of libvirt follow the second approach and it has worked out well, eliminating any connectivity loss on guest startup/shutdown.

Daniel
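A minimal sketch of the second option. The guest MAC below is a made-up example, and the tap device name is an assumption:

```shell
# Derive a host TAP MAC from the guest NIC MAC by forcing the first
# byte to 0xFE, so the TAP MAC always sorts above the enslaved NIC's
# MAC and the bridge keeps the NIC's address.
GUEST_MAC="52:54:00:12:34:56"      # example guest MAC
TAP_MAC="fe:${GUEST_MAC#*:}"       # replace the first byte with fe
echo "$TAP_MAC"

# Then assign it to the tap device before enslaving it, e.g.:
#   ip link set dev tap0 address "$TAP_MAC"
```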
Re: Problem with bridged tap interface
On Wed, Feb 23, 2011 at 12:34:45PM +0100, andreas.a...@de.transport.bombardier.com wrote: > Hi all, > > sorry for the previous partial e-mail, I hit the send button accidentally > ;-). > > I have a setup with a kvm-based virtual machine running a stock RedHat 6.1 > (yes, that old) on a rather current debian host. > > 1. uname in host: 2.6.26-2-amd64 #1 SMP Wed May 12 18:03:14 UTC 2010 > x86_64 GNU/Linux > > 2. uname in guest: 2.2.12-20 #1 Mon Sep 27 10:40:35 EDT 1999 i686 unknown > > eth0 of the guest is connected via tap0 to a kernel bridge, that is in > turn connected via the host's eth1 to a Gigabit link. On the kvm > command-line I configure the guest-nic as "model=ne2k_pci". > > The problem is, that I frequently loose network access from/to the guest. There have been QEMU NIC model implementation bugs that exhibit that characteristic. If you have the drivers available in the guest, then I'd recommend trying out different NIC models than ne2k, since that's probably the least actively maintained NIC model. At least try rtl8139, but ideally the e1000 too. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
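For illustration, switching models is just a matter of the model= property on the NIC option. The disk image and tap name in this command line are placeholders, and the exact option syntax varies between QEMU versions:

```shell
# Dry-run sketch: build and print the command rather than run it.
MODEL=rtl8139      # try e1000 next if rtl8139 doesn't help
CMDLINE="kvm -hda guest.img -net nic,model=$MODEL -net tap,ifname=tap0"
echo "$CMDLINE"
```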
Re: [Qemu-devel] [PATCH 02/18] Introduce read() to FdMigrationState.
On Thu, Feb 10, 2011 at 07:23:33PM +0900, Yoshiaki Tamura wrote: > 2011/2/10 Daniel P. Berrange : > > On Thu, Feb 10, 2011 at 10:54:01AM +0100, Anthony Liguori wrote: > >> On 02/10/2011 10:30 AM, Yoshiaki Tamura wrote: > >> >Currently FdMigrationState doesn't support read(), and this patch > >> >introduces it to get response from the other side. > >> > > >> >Signed-off-by: Yoshiaki Tamura > >> > >> Migration is unidirectional. Changing this is fundamental and not > >> something to be done lightly. > > > > Making it bi-directional might break libvirt's save/restore > > to file support which uses migration, passing a unidirectional > > FD for the file. It could also break libvirt's secure tunnelled > > migration support which is currently only expecting to have > > data sent in one direction on the socket. > > Hi Daniel, > > IIUC, this patch isn't something to make existing live migration > bi-directional. Just opens up a way for Kemari to use it. Do > you think it's dangerous for libvirt still? The key is for it to be a no-op for any usage of the existing 'migrate' command. I had thought this was wiring up read into the event loop too, so it would be poll()ing for reads, but after re-reading I see this isn't the case here. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 02/18] Introduce read() to FdMigrationState.
On Thu, Feb 10, 2011 at 10:54:01AM +0100, Anthony Liguori wrote: > On 02/10/2011 10:30 AM, Yoshiaki Tamura wrote: > >Currently FdMigrationState doesn't support read(), and this patch > >introduces it to get response from the other side. > > > >Signed-off-by: Yoshiaki Tamura > > Migration is unidirectional. Changing this is fundamental and not > something to be done lightly. Making it bi-directional might break libvirt's save/restore to file support which uses migration, passing a unidirectional FD for the file. It could also break libvirt's secure tunnelled migration support which is currently only expecting to have data sent in one direction on the socket. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PCI Passthrough, error: The driver 'pci-stub' is occupying your device 0000:08:06.2
On Sat, Feb 05, 2011 at 04:34:01PM +0000, James Neave wrote: > Hi, > > I'm trying to pass a NOVA-T-500 TV Tuner card through to a guest VM. > I'm getting the error "The driver 'pci-stub' is occupying your device > 0000:08:06.2"

This is a rather misleading error message. It is *expected* that pci-stub will occupy the device. Unfortunately the rest of the error messages QEMU is printing aren't much help either, but ultimately something is returning -EBUSY in the PCI device assign step.

Regards, Daniel
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On Thu, Jan 20, 2011 at 09:44:05AM +0100, Gerd Hoffmann wrote: > Hi, > > >For (2), you cannot use bus=X,addr=Y because it makes assumptions about > >the PCI topology which may change in newer -M pc's. > > Why should the PCI topology for 'pc' ever change? > > We'll probably get q35 support some day, but when this lands I > expect we'll see a new machine type 'q35', so '-m q35' will pick the > ich9 chipset (which will have a different pci topology of course) > and '-m pc' will pick the existing piix chipset (which will continue > to look like it looks today).

If the topology does ever change (eg in the kind of way Anthony suggests, first bus only has the graphics card), I think libvirt is going to need a little work to adapt to the new topology, regardless of whether we currently specify a bus= arg to -device or not. I'm not sure there's anything we could do to future-proof us against that kind of change.

Regards, Daniel
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On Wed, Jan 19, 2011 at 11:42:18AM -0600, Anthony Liguori wrote: > On 01/19/2011 11:35 AM, Daniel P. Berrange wrote: > >On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote: > >>On 01/19/2011 03:48 AM, Gerd Hoffmann wrote: > >>>On 01/18/11 18:09, Anthony Liguori wrote: > >>>>On 01/18/2011 10:56 AM, Jan Kiszka wrote: > >>>>>>The device model topology is 100% a hidden architectural detail. > >>>>>This is true for the sysbus, it is obviously not the case for PCI and > >>>>>similarly discoverable buses. There we have a guest-explorable topology > >>>>>that is currently equivalent to the the qdev layout. > >>>>But we also don't do PCI passthrough so we really haven't even explored > >>>>how that maps in qdev. I don't know if qemu-kvm has attempted to > >>>>qdev-ify it. > >>>It is qdev-ified. It is a normal pci device from qdev's point of view. > >>> > >>>BTW: is there any reason why (vfio-based) pci passthrough couldn't > >>>work with tcg? > >>> > >>>>The -device interface is a stable interface. Right now, you don't > >>>>specify any type of identifier of the pci bus when you create a PCI > >>>>device. It's implied in the interface. > >>>Wrong. You can specify the bus you want attach the device to via > >>>bus=. This is true for *every* device, including all pci > >>>devices. If unspecified qdev uses the first bus it finds. > >>> > >>>As long as there is a single pci bus only there is simply no need > >>>to specify it, thats why nobody does that today. > >>Right. In terms of specifying bus=, what are we promising re: > >>compatibility? Will there always be a pci.0? If we add some > >>PCI-to-PCI bridges in order to support more devices, is libvirt > >>support to parse the hierarchy and figure out which bus to put the > >>device on? > >The answer to your questions probably differ depending on > >whether '-nodefconfig' and '-nodefaults' are set on the > >command line. 
If they are set, then I'd expect to only > >ever see one PCI bus with name pci.0 forever more, unless > >i explicitly ask for more. If they are not set, then you > >might expect to see multiple PCI buses by appear by magic > > Yeah, we can't promise that. If you use -M pc, you aren't > guaranteed a stable PCI bus topology even with > -nodefconfig/-nodefaults. That's why we never use '-M pc' when actually invoking QEMU. If the user specifies 'pc' in the XML, we canonicalize that to the versioned alternative like 'pc-0.12' before invoking QEMU. We also expose the list of versioned machines to apps so they can do canonicalization themselves if desired. Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On Wed, Jan 19, 2011 at 11:51:58AM -0600, Anthony Liguori wrote: > On 01/19/2011 11:01 AM, Daniel P. Berrange wrote: > > > >The reason we specify 'bus' is that we wanted to be flexible wrt > >upgrades of libvirt, without needing restarts of QEMU instances > >it manages. That way we can introduce new functionality into > >libvirt that relies on it having previously set 'bus' on all > >active QEMUs. > > > >If QEMU adds PCI-to-PCI bridges, then I wouldn't expect QEMU to > >be adding the extra bridges. I'd expect that QEMU provided just > >the first bridge and then libvirt would specify how many more > >bridges to create at boot or hotplug them later. So it wouldn't > >ever need to parse topology. > > Yeah, but replacing the main chipset will certainly change the PCI > topology such that if you're specifying bus=X and addr=X and then > also using -M pc, unless you're parsing the default topology to come > up with the addressing, it will break in the future. We never use a bare '-M pc' though, we always canonicalize to one of the versioned forms. So if we run '-M pc-0.12', then neither the main PCI chipset nor topology would have changed in newer QEMU. Of course if we deployed a new VM with '-M pc-0.20' that might have new PCI chipset, so bus=pci.0 might have different meaning that it did when used with '-M pc-0.12', but I don't think that's an immediate problem Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote: > On 01/19/2011 03:48 AM, Gerd Hoffmann wrote: > >On 01/18/11 18:09, Anthony Liguori wrote: > >>On 01/18/2011 10:56 AM, Jan Kiszka wrote: > >>> > The device model topology is 100% a hidden architectural detail. > >>>This is true for the sysbus, it is obviously not the case for PCI and > >>>similarly discoverable buses. There we have a guest-explorable topology > >>>that is currently equivalent to the the qdev layout. > >> > >>But we also don't do PCI passthrough so we really haven't even explored > >>how that maps in qdev. I don't know if qemu-kvm has attempted to > >>qdev-ify it. > > > >It is qdev-ified. It is a normal pci device from qdev's point of view. > > > >BTW: is there any reason why (vfio-based) pci passthrough couldn't > >work with tcg? > > > >>The -device interface is a stable interface. Right now, you don't > >>specify any type of identifier of the pci bus when you create a PCI > >>device. It's implied in the interface. > > > >Wrong. You can specify the bus you want attach the device to via > >bus=. This is true for *every* device, including all pci > >devices. If unspecified qdev uses the first bus it finds. > > > >As long as there is a single pci bus only there is simply no need > >to specify it, thats why nobody does that today. > > Right. In terms of specifying bus=, what are we promising re: > compatibility? Will there always be a pci.0? If we add some > PCI-to-PCI bridges in order to support more devices, is libvirt > support to parse the hierarchy and figure out which bus to put the > device on? The answer to your questions probably differ depending on whether '-nodefconfig' and '-nodefaults' are set on the command line. If they are set, then I'd expect to only ever see one PCI bus with name pci.0 forever more, unless i explicitly ask for more. 
If they are not set, then you might expect to see multiple PCI buses appear by magic.

Daniel
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On Wed, Jan 19, 2011 at 10:54:10AM -0600, Anthony Liguori wrote: > On 01/19/2011 07:11 AM, Markus Armbruster wrote: > >Gerd Hoffmann writes: > > > >>On 01/18/11 18:09, Anthony Liguori wrote: > >>>On 01/18/2011 10:56 AM, Jan Kiszka wrote: > >The device model topology is 100% a hidden architectural detail. > This is true for the sysbus, it is obviously not the case for PCI and > similarly discoverable buses. There we have a guest-explorable topology > that is currently equivalent to the the qdev layout. > >>>But we also don't do PCI passthrough so we really haven't even explored > >>>how that maps in qdev. I don't know if qemu-kvm has attempted to > >>>qdev-ify it. > >>It is qdev-ified. It is a normal pci device from qdev's point of view. > >> > >>BTW: is there any reason why (vfio-based) pci passthrough couldn't > >>work with tcg? > >> > >>>The -device interface is a stable interface. Right now, you don't > >>>specify any type of identifier of the pci bus when you create a PCI > >>>device. It's implied in the interface. > >>Wrong. You can specify the bus you want attach the device to via > >>bus=. This is true for *every* device, including all pci > >>devices. If unspecified qdev uses the first bus it finds. > >> > >>As long as there is a single pci bus only there is simply no need to > >>specify it, thats why nobody does that today. Once q35 finally > >>arrives this will change of course. > >As far as I know, libvirt does it already. > > I think that's a bad idea from a forward compatibility perspective. In our past experiance though, *not* specifying attributes like these has also been pretty bad from a forward compatibility perspective too. We're kind of damned either way, so on balance we decided we'd specify every attribute in qdev that's related to unique identification of devices & their inter-relationships. By strictly locking down the topology we were defining, we ought to have a more stable ABI in face of future changes. 
I accept this might not always work out, so we may have to adjust things over time still. Predicting the future is hard :-)

Daniel
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote: > On 01/19/2011 03:48 AM, Gerd Hoffmann wrote: > >On 01/18/11 18:09, Anthony Liguori wrote: > >>On 01/18/2011 10:56 AM, Jan Kiszka wrote: > >>> > The device model topology is 100% a hidden architectural detail. > >>>This is true for the sysbus, it is obviously not the case for PCI and > >>>similarly discoverable buses. There we have a guest-explorable topology > >>>that is currently equivalent to the the qdev layout. > >> > >>But we also don't do PCI passthrough so we really haven't even explored > >>how that maps in qdev. I don't know if qemu-kvm has attempted to > >>qdev-ify it. > > > >It is qdev-ified. It is a normal pci device from qdev's point of view. > > > >BTW: is there any reason why (vfio-based) pci passthrough couldn't > >work with tcg? > > > >>The -device interface is a stable interface. Right now, you don't > >>specify any type of identifier of the pci bus when you create a PCI > >>device. It's implied in the interface. > > > >Wrong. You can specify the bus you want attach the device to via > >bus=. This is true for *every* device, including all pci > >devices. If unspecified qdev uses the first bus it finds. > > > >As long as there is a single pci bus only there is simply no need > >to specify it, thats why nobody does that today. > > Right. In terms of specifying bus=, what are we promising re: > compatibility? Will there always be a pci.0? If we add some > PCI-to-PCI bridges in order to support more devices, is libvirt > support to parse the hierarchy and figure out which bus to put the > device on? The reason we specify 'bus' is that we wanted to be flexible wrt upgrades of libvirt, without needing restarts of QEMU instances it manages. That way we can introduce new functionality into libvirt that relies on it having previously set 'bus' on all active QEMUs. If QEMU adds PCI-to-PCI bridges, then I wouldn't expect QEMU to be adding the extra bridges. 
I'd expect that QEMU provided just the first bridge and then libvirt would specify how many more bridges to create at boot or hotplug them later. So it wouldn't ever need to parse topology. Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] VM stuck in interrupt-loop after suspend to/resumed from file, or no interrupts at all
On Wed, Jan 12, 2011 at 03:51:13PM +0100, Philipp Hahn wrote: > Hello, > > libvirt implements a manages save, which suspens a VM to a file, from which > it > can be resumed later. This uses Qemus/Kvms "migrate exec:" feature. > This doesn't work reliable for me: In may cases the resumed VM seems to be > stuck: its VNC console is restored, but no key presses or network packages > are accepted. This both happens with Windows XP, 7, 2008 and Linux 2.6.32 > systems. > > Using the debugging cycle described below in more detail I was able to track > the problem down to interrupt handling: Either the Linux-guest-kernel > constantly receives an interrupt for the 8139cp network adapter, or no > interrupts at all (neither network nor keyboard nor timer); only sending a > NMI works and shows that at least the Linux-Kernel is still alive. > > If I add the -no-kvm-irqchip Option, it seems to work; I was not able to > reproduce a hang. I remember reporting a bug with that scenario 4/5 months back and it being fixed in the host kernel IIRC. Daniel -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [libvirt] cgroup limits only affect kvm guest under certain conditions
On Thu, Jan 06, 2011 at 02:15:37PM +0100, Dominik Klein wrote: > Hi > > I am playing with cgroups and try to limit block io for guests. > > The proof of concept is: > > # mkdir /dev/cgroup/blkio > # mount -t cgroup -o blkio blkio /dev/cgroup/blkio/ > # cd blkio/ > # mkdir test > # cd test/ > # ls -l /dev/vdisks/kirk > lrwxrwxrwx 1 root root 7 2011-01-06 13:46 /dev/vdisks/kirk -> ../dm-5 > # ls -l /dev/dm-5 > brw-rw 1 root disk 253, 5 2011-01-06 13:36 /dev/dm-5 > # echo "253:5 1048576" > blkio.throttle.write_bps_device > # echo $$ > tasks > # dd if=/dev/zero of=/dev/dm-5 bs=1M count=20 > 20+0 records in > 20+0 records out > 20971520 bytes (21 MB) copied, 20.0223 s, 1.0 MB/s > > So limit applies to the dd child of my shell. > > Now I assign /dev/dm-5 (/dev/vdisks/kirk) to a vm and echo the qemu-kvm > pid into tasks. Limits are not applied, the guest can happily use max io > bandwidth. Did you just echo the main qemu-kvm PID, or did you also add the PIDs of every thread too ? From this description of the problem, I'd guess you've only confined the main process thread and thus the I/O & VCPU threads are not confined. Daniel -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
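A sketch of what confining every thread looks like, following the cgroup layout set up above. The shell's own pid is used here as a stand-in for the real qemu-kvm pid, and the actual write into the tasks file is left commented:

```shell
# Each qemu-kvm thread (I/O, VCPUs) has its own entry under
# /proc/<pid>/task; every one of them must be written into the
# cgroup's tasks file, not just the main pid.
QEMU_PID=$$                       # stand-in: use the real qemu-kvm pid
for t in /proc/"$QEMU_PID"/task/*; do
    tid=${t##*/}
    echo "$tid"
    # echo "$tid" > /dev/cgroup/blkio/test/tasks   # the actual confinement
done
```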
Re: qemu-kvm-0.13.0 - winsows 2008 - chkdisk too slow
On Thu, Jan 06, 2011 at 12:19:21PM +0200, Avi Kivity wrote: > On 01/06/2011 11:42 AM, Nikola Ciprich wrote: > >> - run trace-cmd record -e kvm -b 10 -P pid1 -P pid2, ctrl-C after a > >seems like it's not possible to specify multiple pids, so > > Did you get 'overrun: something' reports from trace-cmd, where > something != 0? > > If you're not sure, please run the trace again. Also try adding '-r > 10' to the command line. > > >I've run 4 commands in parallel. Also I can't get monitor information > >since vm is started using libvirt, so I've just used all machine's qemu-kvm > >pids.. > > Dan, is there a way to hijack the monitor so we can run some > commands on it? Things like 'info registers' and disassembly.

Depends on the libvirt version. For most, you'll need to look for the monitor path in the QEMU argv:

  -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/vmwts02.monitor,server,nowait
  -mon chardev=monitor,mode=readline

Then 'service libvirtd stop', and now you can connect to the monitor at that path & run the commands you want, and then disconnect and start libvirtd again. If you run any commands that change the VM state, things may well get confused when you start libvirtd again, but if it's just 'info registers' etc it should be pretty safe.

If you have a new enough libvirt, then you can also send commands directly using 'virsh qemu-monitor-command' (checking whether you need JSON or HMP syntax first - in this case you can see it needs HMP).

Regards, Daniel
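To make the hijack step concrete - the socket path below is the one from the argv above, and using socat as the client is an assumption (any UNIX socket client works):

```shell
# With libvirtd stopped, HMP commands can be written straight to the
# monitor socket in readline mode.
MON=/var/lib/libvirt/qemu/vmwts02.monitor
HMP_CMD='info registers'
echo "$HMP_CMD"
# Delivery would look like:
#   printf '%s\n' "$HMP_CMD" | socat - UNIX-CONNECT:"$MON"
```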
Re: [PATCH v2] device-assignment: chmod the rom file before opening read/write
On Wed, Jan 05, 2011 at 05:14:55PM +0200, Avi Kivity wrote: > On 01/05/2011 04:57 PM, Alex Williamson wrote: > >A valid argument. I think it could also be argued that the user is > >providing ownership of the file and writing to the file is part of the > >low level details of the sysfs rom file API and should be handled by the > >user of that API. We basically have 3 places we could put this: > > > > A. kernel - Why is this file mode 0400 by default anyway if using > > it requires write access? Set it to mode 0600 here by default. > > B. libvirt - Already does chown, why not do chmod too? chmod and > > restore here. > > C. qemu - Owns file, chmod is trivial and part of the sysfs rom > > file API? chmod around usage. > > > > qemu might not actually own the file, just have rw permissions. Or > it might own the file and selinux may prevent it from changing the > permissions. Or it may die before the reverse chmod and leave > things not as they were.

Agreed, I don't think we can rely on QEMU being able to chmod() the file in general.

> > >I chose qemu because it seemed to have the least chance of side-effects > >and has the smallest usage window. Do you prefer libvirt or kernel? > > No idea really. What's the kernel's motivation for keeping it ro? Sanity? > > I'd guess libvirt is the one to do it, but someone more familiar > with device assignment / pci (you?) should weigh in on this.

I've no real objection to libvirt setting the 0600 permissions on it, if that's required for correct operation. BTW, what is the failure scenario seen when the file is 0400 ? I want to know how to diagnose/triage this if it gets reported by users in BZ...

Regards, Daniel
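For reference, the sysfs rom file API being discussed requires a write before the ROM can be read at all, which is why a 0400 default mode gets in the way. The device address below is a placeholder, and the dangerous steps are left commented:

```shell
# Reading a PCI option ROM through sysfs: write "1" to enable the ROM,
# read out the image, then write "0" to disable it again.
ROM=/sys/bus/pci/devices/0000:04:08.0/rom   # placeholder device address
echo "$ROM"
# The dance itself (needs write permission on the rom file):
#   echo 1 > "$ROM"
#   cat "$ROM" > rom.bin
#   echo 0 > "$ROM"
```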
Re: USB Passthrough 1.1 performance problem...
On Tue, Dec 14, 2010 at 12:55:04PM +0100, Kenni Lund wrote: > 2010/12/14 Erik Brakkee : > >> From: Kenni Lund > >> 2010/12/14 Erik Brakkee : > > From: Kenni Lund > > > > Does this mean I have a chance now that PCI passthrough of my WinTV > > PVR-500 > > might work now? > > Passthrough of a PVR-500 has been working for a long time. I've been > running with passthrough of a PVR-500 in my HTPC, since > November/December 2009...so it should work with any recent kernel and > any recent version of qemu-kvm you can find today - No patching > needed. The only issue I had with the PVR-500 card, was when *I* > didn't free up the shared interrupts...once I fixed that, it "just > worked". > >>> > >>> How did you free up those shared interrupts then? I tried different slots > >>> but always get conflicts with the USB irqs. > >> > >> I did an unbind of the conflicting device (eg. disabled it). I moved > >> the PVR-500 card around in the different slots and once I got a > >> conflict with the integrated sound card, I left the PVR-500 card in > >> that slot (it's a headless machine, so no need for sound) and > >> configured unbind of the sound card at boot time. On my old system I > >> think it was conflicting with one of the USB controllers as well, but > >> it didn't really matter, as I only lost a few of the ports on the back > >> of the computer for that particular USB controller - I still had > >> plenty of USB ports left and if I really needed more ports, I could > >> just plug in an extra USB PCI card. 
> >>
> >> My /etc/rc.local boot script looks like the following today:
> >> --
> >> #Remove HDA conflicting with ivtv1
> >> echo "0000:00:1b.0" > /sys/bus/pci/drivers/HDA\ Intel/unbind
> >>
> >> # ivtv0
> >> echo " 0016" > /sys/bus/pci/drivers/pci-stub/new_id
> >> echo "0000:04:08.0" > /sys/bus/pci/drivers/ivtv/unbind
> >> echo "0000:04:08.0" > /sys/bus/pci/drivers/pci-stub/bind
> >> echo " 0016" > /sys/bus/pci/drivers/pci-stub/remove_id
> >>
> >> # ivtv1
> >> echo " 0016" > /sys/bus/pci/drivers/pci-stub/new_id
> >> echo "0000:04:09.0" > /sys/bus/pci/drivers/ivtv/unbind
> >> echo "0000:04:09.0" > /sys/bus/pci/drivers/pci-stub/bind
> >> echo " 0016" > /sys/bus/pci/drivers/pci-stub/remove_id
> >
> > I did not try unbinding the usb device so I can also try that.
> >
> > I don't understand what is happening with the 0016. I configured the
> > pci card in kvm and I believe kvm does the binding to pci-stub in recent
> > versions. Where is the 0016 coming from?
>
> Okay, qemu-kvm might do it today, I don't know - I haven't changed
> that script for the past year. But are you sure that it's not
> libvirt/virsh/virt-manager which does that for you?

If you use the managed="yes" attribute on the <hostdev> in libvirt XML, then libvirt will automatically do the pci-stub bind/unbind, followed by a device reset at guest startup & the reverse at shutdown. If you have conflicting devices on the bus though, libvirt won't attempt to unbind them, unless you had also explicitly assigned all those conflicting devices to the same guest.

Daniel
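For reference, the libvirt XML in question looks roughly like this. The PCI address matches the ivtv0 device from the script above; treat the whole fragment as an illustrative example rather than a complete domain definition:

```xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x04' slot='0x08' function='0x0'/>
  </source>
</hostdev>
```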