Re: [Qemu-devel] [PATCH v3 0/9] HyperV equivalent of pvpanic driver

2015-06-30 Thread Daniel P. Berrange
On Tue, Jun 30, 2015 at 02:33:18PM +0300, Denis V. Lunev wrote:
 Windows 2012 guests can notify the hypervisor about a guest crash
 (a Windows bugcheck/BSOD) by writing specific Hyper-V MSRs. This patch
 series handles these MSRs in KVM and sends a notification to user
 space, allowing QEMU/libvirt to gather a Windows guest crash dump.
 
 The idea is to provide functionality equal to pvpanic device without
 QEMU guest agent for Windows.

That's nice - do you know if the Linux kernel (or any other non-Win2k12
kernels) have support for notifying hypervisors via this Hyper-V MSR
when running as a guest?

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Announcing qboot, a minimal x86 firmware for QEMU

2015-05-22 Thread Daniel P. Berrange
On Thu, May 21, 2015 at 03:51:43PM +0200, Paolo Bonzini wrote:
 Some of you may have heard about the Clear Containers initiative from
 Intel, which couples KVM with various kernel tricks to create extremely
 lightweight virtual machines.  The experimental Clear Containers setup
 requires only 18-20 MB to launch a virtual machine, and needs about 60
 ms to boot.
 
 Now, as all of you probably know, QEMU is great for running Windows or
 legacy Linux guests, but that flexibility comes at a hefty price. Not
 only does all of the emulation consume memory, it also requires some
 form of low-level firmware in the guest as well. All of this adds quite
 a bit to virtual-machine startup times (500 to 700 milliseconds is not
 unusual).
 
 Right?  In fact, it's for this reason that Clear Containers uses kvmtool
 instead of QEMU.
 
 No, wrong!  In fact, reporting bad performance is pretty much the same
 as throwing down the gauntlet.

On the QEMU side of things I wonder if there is scope for taking AArch64's
'virt' machine type concept and duplicating it on all architectures. It
would be nice to have a common minimal machine type on all architectures
that discards all legacy platform stuff and focuses on the minimum needed
to run modern, virtualization-optimized guest OSes. People would always know
that a machine type called 'virt' was the minimal virtualization platform,
while the others all target emulation of real-world (legacy) baremetal
platforms.

Regards,
Daniel


Re: [Qemu-devel] Announcing qboot, a minimal x86 firmware for QEMU

2015-05-22 Thread Daniel P. Berrange
On Fri, May 22, 2015 at 12:04:54PM +0100, Peter Maydell wrote:
 On 22 May 2015 at 12:01, Daniel P. Berrange berra...@redhat.com wrote:
  On the QEMU side of things I wonder if there is scope for taking AArch64's
  'virt' machine type concept and duplicating it on all architectures.
 
 Experience suggests that holding the line on minimal is really
 quite tricky, though -- there's always one more thing that
 somebody really wants to add...

Yep, it is hard saying no - but I'd think as long as it was possible to add
the extra features using -device, it ought to be practical to keep a virt
machine type's -nodefaults -nodefconfig base setup pretty minimal. In
particular I don't see why we need to have a SATA controller and ISA/LPC
bridge in every virt machine - root PCI bus only should be possible, as you
can provide disks via virtio-blk or virtio-scsi, and serial, parallel, mouse,
floppy via PCI devices and/or by adding a USB bus in the cases where you
really need one.
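To make the idea concrete, here is a hypothetical sketch of what launching
such a minimal machine might look like on x86. Note that no x86 'virt'
machine type actually exists at this point, so the -M value and device set
below are assumptions, modeled on the AArch64 'virt' board:

```
# Hypothetical sketch: '-M virt' on x86 is assumed, not a real machine type.
# Everything hangs off the root PCI bus via virtio devices.
qemu-system-x86_64 -M virt -nodefaults -nodefconfig \
    -enable-kvm -cpu host -m 1024 \
    -drive if=none,id=disk0,file=guest.img,format=raw \
    -device virtio-blk-pci,drive=disk0 \
    -netdev user,id=net0 \
    -device virtio-net-pci,netdev=net0 \
    -chardev stdio,id=con0 \
    -device virtio-serial-pci \
    -device virtconsole,chardev=con0
```

A USB bus and a usb-kbd could then be added with further -device options
only in the cases where they are actually needed.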

Regards,
Daniel


Re: [Qemu-devel] Announcing qboot, a minimal x86 firmware for QEMU

2015-05-22 Thread Daniel P. Berrange
On Fri, May 22, 2015 at 12:21:27PM +0100, Peter Maydell wrote:
 On 22 May 2015 at 12:12, Daniel P. Berrange berra...@redhat.com wrote:
  Yep, it is hard saying no - but I'd think as long as it was possible to add
  the extra features using -device, it ought to be practical to keep a virt
  machine type's -nodefaults -nodefconfig base setup pretty minimal.
 
 Mmm, but -device only works for pluggable devices really. We don't
 have a coherent mechanism for saying put the PS/2 keyboard controller
 into the system at its usual IO ports on the command line.

Oh, I didn't necessarily mean that we'd need the ability to add a
PS/2 keyboard via -device. I meant that there just needs to be a way
to add /some/ kind of keyboard, e.g. we have a usb-kbd device that
could potentially fill that role. Likewise for the mouse pointer,
serial ports, etc.

Regards,
Daniel


Re: [Qemu-devel] [PATCH v2 1/2] contrib: add ivshmem client and server

2014-07-21 Thread Daniel P. Berrange
On Mon, Jul 21, 2014 at 08:21:21AM -0600, Eric Blake wrote:
 On 07/20/2014 03:38 AM, David Marchand wrote:
  When using ivshmem devices, notifications between guests can be sent as
  interrupts using an ivshmem-server (typical use is described in the
  documentation). The client is provided as a debug tool.
  
  Signed-off-by: Olivier Matz olivier.m...@6wind.com
  Signed-off-by: David Marchand david.march...@6wind.com
  ---
   contrib/ivshmem-client/Makefile |   26 ++
 
  +++ b/contrib/ivshmem-client/Makefile
  @@ -0,0 +1,26 @@
  +# Copyright 2014 6WIND S.A.
  +# All rights reserved
 
 This file has no other license, and is therefore incompatible with
 GPLv2.  You'll need to resubmit under an appropriately open license.
 
  +++ b/contrib/ivshmem-client/ivshmem-client.h
  @@ -0,0 +1,238 @@
  +/*
  + * Copyright(c) 2014 6WIND S.A.
  + * All rights reserved.
  + *
  + * This work is licensed under the terms of the GNU GPL, version 2.  See
  + * the COPYING file in the top-level directory.
 
 I'm not a lawyer, but to me, this license is self-contradictory.  You
 can't have "All rights reserved" and still be GPL, because the point of
 the GPL is that you are NOT reserving all rights, but explicitly
 granting your users various rights (on condition that they likewise grant
 those rights to others).  But you're not the only file in the qemu code
 base with this questionable mix.

In any case, adding the term 'All rights reserved' is said to be redundant
and obsolete these days:

  https://en.wikipedia.org/wiki/All_rights_reserved#Obsolescence

Regards,
Daniel


Xen hypervisor inside KVM guest with x2apic CPU feature fails to boot

2014-06-02 Thread Daniel P. Berrange
I'm running

 kernel-3.14.4-200.fc20.x86_64
 qemu-1.6.2-5.fc20.x86_64
 xen-4.4.0-4.fc21

In the process of trying to get a Xen hypervisor running inside a KVM guest,
I found that there's a problem with x2apic. NB I do *not* use nested VMX
here; I'm just trying to get plain Xen paravirt working before trying to do
nested HVM.

Any time I enable the 'x2apic' CPU flag for the KVM guest, the Xen hypervisor
running inside the guest will fail to boot:

The QEMU/KVM -cpu arg is:

  -cpu core2duo,+erms,+smep,+fsgsbase,+lahf_lm,+rdtscp,+rdrand,+f16c,+avx,+osxsave,+xsave,+aes,+tsc-deadline,+popcnt,+x2apic,+pcid,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds

The Xen logs indicate it doesn't like the x2apic feature and disables
it, but then it evidently fails to set up the non-x2apic codepath it
falls back to - even though the non-x2apic codepath works fine if you
don't have +x2apic set for the KVM guest.

(XEN) Not enabling x2APIC: depends on iommu_supports_eim.
(XEN) XSM Framework v1.0.0 initialized
(XEN) Flask:  Initializing.
(XEN) AVC INITIALIZED
(XEN) Flask:  Starting in permissive mode.
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2693.939 MHz processor.
(XEN) Initing memory sharing.
(XEN) traps.c:3071: GPF (): 82d0801b83c7 - 82d08023386b
(XEN) mce_intel.c:717: MCA Capability: BCAST 1 SER 1 CMCI 0 firstbank 1 
extended MCE MSR 0
(XEN) Intel machine check reporting enabled
(XEN) I/O virtualisation disabled
(XEN) Getting VERSION: 1050014
(XEN) Getting VERSION: 1050014
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) Getting ID: 0
(XEN) Getting LVT0: 8700
(XEN) Getting LVT1: 8400
(XEN) Suppress EOI broadcast on CPU#0
(XEN) enabled ExtINT on CPU#0
(XEN) ENABLING IO-APIC IRQs
(XEN)  - Using old ACK method
(XEN) init IO_APIC IRQs
(XEN)  IO-APIC (apicid-pin) 0-0, 0-16, 0-17, 0-18, 0-19, 0-20, 0-21, 0-22, 0-23 
not connected.
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) ..MP-BIOS bug: 8254 timer not connected to IO-APIC
(XEN) ...trying to set up timer (IRQ0) through the 8259A ...  failed.
(XEN) ...trying to set up timer as Virtual Wire IRQ... failed.
(XEN) ...trying to set up timer as ExtINT IRQ... failed :(.
(XEN) 
(XEN) 
(XEN) Panic on CPU 0:
(XEN) IO-APIC + timer doesn't work!  Boot with apic_verbosity=debug and send a 
report.  Then try booting with the 'noapic' option
(XEN) 

Will attach the full non-trimmed Xen log to this mail, along with a log
showing a successful boot when 'x2apic' isn't given to KVM.

I'm unclear whether this is a Xen bug, a KVM bug, a QEMU bug, or a
combination of them.

Regards,
Daniel
 Xen 4.4.0-4.fc21
(XEN) Xen version 4.4.0 (mockbuild@[unknown]) (gcc (GCC) 4.9.0 20140506 (Red 
Hat 4.9.0-3)) debug=n Mon May 12 18:38:23 UTC 2014
(XEN) Latest ChangeSet: 
(XEN) Bootloader: GRUB 2.00
(XEN) Command line: placeholder loglvl=all guest_loglvl=all com1=115200,8n1 
console=com1,vga apic_verbosity=debug
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN) Disc information:
(XEN)  Found 1 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)   - 0009fc00 (usable)
(XEN)  0009fc00 - 000a (reserved)
(XEN)  000f - 0010 (reserved)
(XEN)  0010 - 5dbfe000 (usable)
(XEN)  5dbfe000 - 5dc0 (reserved)
(XEN)  feffc000 - ff00 (reserved)
(XEN)  fffc - 0001 (reserved)
(XEN) System RAM: 1499MB (1535604kB)
(XEN) ACPI: RSDP 000F1690, 0014 (r0 BOCHS )
(XEN) ACPI: RSDT 5DBFE4A0, 0030 (r1 BOCHS  BXPCRSDT1 BXPC1)
(XEN) ACPI: FACP 5DBFFF80, 0074 (r1 BOCHS  BXPCFACP1 BXPC1)
(XEN) ACPI: DSDT 5DBFE4D0, 1137 (r1   BXPC   BXDSDT1 INTL 20140114)
(XEN) ACPI: FACS 5DBFFF40, 0040
(XEN) ACPI: SSDT 5DBFF700, 0838 (r1 BOCHS  BXPCSSDT1 BXPC1)
(XEN) ACPI: APIC 5DBFF610, 0078 (r1 BOCHS  BXPCAPIC1 BXPC1)
(XEN) No NUMA configuration found
(XEN) Faking a node at -5dbfe000
(XEN) Domain heap initialised
(XEN) found SMP MP-table at 000f17f0
(XEN) DMI 2.4 present.
(XEN) APIC boot state is 'xapic'
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0xb008
(XEN) ACPI: SLEEP INFO: pm1x_cnt[b004,0], pm1x_evt[b000,0]
(XEN) ACPI: wakeup_vec[5dbfff4c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee0
(XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
(XEN) Processor #0 6:15 APIC version 20
(XEN) ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
(XEN) 

Re: [Qemu-devel] KVM call agenda for 2014-04-28

2014-04-29 Thread Daniel P. Berrange
On Tue, Apr 29, 2014 at 02:33:58PM +0200, Markus Armbruster wrote:
 Peter Maydell peter.mayd...@linaro.org writes:
 
  On 29 April 2014 11:09, Michael S. Tsirkin m...@redhat.com wrote:
  Let's just make clear how to contact us securely, when to contact that
  list, and what we'll do with the info.  I cobbled together the
  following:
  http://wiki.qemu.org/SecurityProcess
 
  Looks generally OK I guess. I'd drop the 'how to use pgp' section --
  anybody who cares will already know how to send us PGP email.
 
 The first paragraph under How to Contact Us Securely is fine, the rest
 seems redundant for readers familiar with PGP, yet hardly sufficient for
 the rest.
 
 One thing I like about Libvirt's Security Process page[*] is they give
 an idea on embargo duration.

FWIW I picked the 2 week length myself as a completely arbitrary timeframe.
We haven't stuck to that strictly - we consider the needs of each vulnerability
as it is triaged to determine the minimum practical embargo time. So think
of 2 weeks as more of a guiding principle to show the world that we don't
believe in keeping issues under embargo for very long periods of time.

Regards,
Daniel


Re: Help regarding virsh domifstat

2013-11-01 Thread Daniel P. Berrange
On Thu, Oct 31, 2013 at 08:30:30PM -0500, Rohit Bhat wrote:
 Hi,
 
 I need some help. I am working on a project where I have to monitor
 the network activity of a VM running on KVM.
 
 I am interested in how much data is going into the VM and how much
 data is coming out of the VM. I checked on the net and found out virsh
 domifstat is the way to go about it.
 
 1. But looks like these stats also include bytes related to control
 traffic for the VM. Is there a way to exclude that? I just want the
 size of actual data transfers.
 
 2. Is there a way by which I can report the data transfer of the VM with
 the outside world (outside hypervisor) only while excluding data
 transfer with any other VM on the same host?
 
 Please let me know if this is a not the right group for such queries.

The libvirt-users mailing list is a better place for virsh-related
questions:

  http://libvirt.org/contact.html#email

Regards,
Daniel


Re: qemu, numa: non-contiguous cpusets

2013-09-30 Thread Daniel P. Berrange
On Sun, Sep 29, 2013 at 05:10:44PM +0200, Borislav Petkov wrote:
 Btw,
 
 while I got your attention, on a not-really related topic: how do we
 feel about adding support for specifying a non-contiguous set of cpus
 for a numa node in qemu with the -numa option? I.e., like this, for
 example:
 
 x86_64-softmmu/qemu-system-x86_64 -smp 8 -numa node,nodeid=0,cpus=0\;2\;4-5 
 -numa node,nodeid=1,cpus=1\;3\;6-7
 
 The ';' needs to be escaped from the shell but I'm open for better
 suggestions.

Use a ':' instead.
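As an illustration of the syntax under discussion, here is a small parser
sketch for a ':'-separated non-contiguous CPU set (using ':' as Daniel
suggests, since it needs no shell escaping). The function name and grammar
are hypothetical, not QEMU's actual option parser:

```python
def parse_cpuset(spec, sep=":"):
    """Parse a non-contiguous CPU set like '0:2:4-5' into a set of ints.

    Each separator-delimited field is either a single CPU index or an
    inclusive 'lo-hi' range. Hypothetical grammar for illustration only.
    """
    cpus = set()
    for field in spec.split(sep):
        if "-" in field:
            lo, hi = field.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(field))
    return cpus
```

With this grammar, the example from the mail would become
`-numa node,nodeid=0,cpus=0:2:4-5 -numa node,nodeid=1,cpus=1:3:6-7`,
with no shell escaping required.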

Daniel


Re: [libvirt-users] Questions on how to reset ID numbers for virt Guests.

2013-09-11 Thread Daniel P. Berrange
On Wed, Sep 11, 2013 at 09:47:07AM +0200, Paolo Bonzini wrote:
 On 11/09/2013 00:27, James Sparenberg wrote:
  I'm doing some experimenting in our Development lab and as a result
  I'm kickstarting over and over Virtual guests.  This is of course
  causing the guest Id to increment by one with each test.  I've
  googled around and tried searching the list but have not found out
  how (if at all) it would be possible to reset the ID number back to 1
  more than is in use.  Also is there  a limit where I run out of ID's?
  (for example does it only go up to 99?)
 
 No, there is no limit.

Well, 'int' will wrap eventually, but you'd need to have created
a hell of a lot of guests for that to be a problem :-)

 I don't know the answer to your other question, so I'm adding the
 libvirt-users mailing list.

If you restart libvirtd, it resets itself to start allocating IDs
from the maximum ID currently used by any running guest.
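A minimal sketch of the behaviour described above (IDs increment
monotonically while the daemon runs; a restart resumes just above the
highest ID still held by a running guest). The class and method names are
illustrative only, not libvirt's actual code:

```python
class DomainIdAllocator:
    """Illustrates the described ID behaviour; names are hypothetical."""

    def __init__(self, running_ids=()):
        # On (re)start, resume just above the max ID of any running guest;
        # with no guests running, allocation starts again from 1.
        self.next_id = max(running_ids, default=0) + 1

    def allocate(self):
        allocated = self.next_id
        self.next_id += 1
        return allocated
```

So the only way to get IDs back down to 1 is a daemon restart with no
guests running, matching what James observed.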


Daniel


Re: Disabling mergeable rx buffers for the guest

2013-07-16 Thread Daniel P. Berrange
On Tue, Jul 16, 2013 at 10:40:28AM +, Naor Shlomo wrote:
 Hi Paolo,
 
 For some unknown reason it suddenly started to accept the changes to the XML 
 and the strings you gave me are now in place.
 Upon machine start I now receive the following error messages:
 
 virsh # start NaorDev
 error: Failed to start domain NaorDev
 error: internal error Process exited while reading console log output: kvm: 
 -global: requires an argument
 
 Here's the XML:
 

   <qemu:commandline>
     <qemu:arg value='-global'/>
     <qemu:env name='mrg_rxbuf' value='off'/>
   </qemu:commandline>

Presumably what you wanted to do was:

   <qemu:commandline>
     <qemu:arg value='-global'/>
     <qemu:arg value='mrg_rxbuf=off'/>
   </qemu:commandline>

rather than setting an environment variable.
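As a side note (my addition, not from the thread): QEMU's -global option
expects a driver.property=value triple, so the argument would likely also
need the device type spelled out, along these lines:

```
   <qemu:commandline>
     <qemu:arg value='-global'/>
     <qemu:arg value='virtio-net-pci.mrg_rxbuf=off'/>
   </qemu:commandline>
```

Note also that libvirt only accepts qemu:commandline elements when the
domain element declares the qemu XML namespace
(xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0').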

Regards,
Daniel


Re: kernel 3.9.x kvm hangs after seabios

2013-05-08 Thread Daniel P. Berrange
On Wed, May 08, 2013 at 02:08:55PM +0200, Tomas Papan wrote:
 Hi,
 
 I found this in the libvirt log (but those messages are the same in 3.8.x)
 anakin libvirt # cat libvirtd.log
 2013-05-08 11:59:29.645+0000: 3750: info : libvirt version: 1.0.5
 2013-05-08 11:59:29.645+0000: 3750: error : udevGetDMIData:1548 :
 Failed to get udev device for syspath '/sys/devices/virtual/dmi/id' or
 '/sys/class/dmi/id'
 2013-05-08 11:59:29.680+0000: 3750: warning :
 ebiptablesDriverInitCLITools:4225 : Could not find 'ebtables'
 executable

You need to look at /var/log/libvirt/qemu/$GUESTNAME.log for
QEMU-related messages. The libvirtd.log file only contains the
libvirt-related messages.

Daniel


Re: [okeanos-dev] Re: KVM versions, machine types and failed migrations

2013-01-10 Thread Daniel P. Berrange
On Wed, Jan 09, 2013 at 03:27:53PM +0200, Vangelis Koukis wrote:
 On Wed, Jan 09, 2013 at 01:10:45PM +0000, Daniel P. Berrange wrote:
  When doing migration, the fundamental requirement is that the guest
  OS visible machine ABI must not change. Thus there are three key
  things to take care of when launching QEMU on the migration target
  host.
  
   - The device PCI/USB addresses must be identical to the source
   - The machine type must be identical to the source
   - The CPU model must be identical to the source
  
 
 Thanks for the detailed list of requirements, we'll take it into account
 for the relevant Ganeti patch.
 
  If you don't follow those requirements, either QEMU or the guest OS
  or both will crash & burn during migration & you get to keep both
  pieces :-)
  
 
 My point is, are these requirements left up to the caller of kvm
 -incoming to satisfy? Since the migration will most probably break,
 wouldn't it be best for QEMU to detect this and complain loudly, instead
 of continuing with the migration, failing silently and destroying the
 VM?
 
 Sure there could be some "yes, do it, I know it is going to break"
 option, which will make QEMU proceed with the migration. However, in 99%
 of the cases this is just user error, e.g. the user has upgraded the
 version on the other end and has not specified -M explicitly. It would
 be best if QEMU was able to detect and warn the user about what is going
 to happen, because it does lead to the VM dying.

What you describe is certainly desirable, but it is quite hard to achieve
with current QEMU. Much of the work on moving to the new QEMU object
model & configuration descriptions has been motivated by a desire to
enable improvements in migration handling. As you suggest, the goal is that
the source QEMU be able to send a complete & reliable hardware description
to the destination QEMU during migration. It is getting closer, but we're
not there yet.

Regards,
Daniel


Re: KVM versions, machine types and failed migrations

2013-01-09 Thread Daniel P. Berrange
On Wed, Jan 09, 2013 at 02:23:50PM +0200, Vangelis Koukis wrote:
 Hello,
 
 I'd like to ask a few questions about the way migrations work in KVM
 among different emulated machine types and different versions of the
 qemu-kvm package. I am sending to both the kvm@ and qemu-devel@ lists,
 please redirect me if I was wrong in doing so.
 
 In a nutshell: while trying to live-migrate a VM on ~okeanos [1], we
 see VM migrations fail silently if going from kvm 1.0 to kvm 1.1.
 The source VM is frozen, "info migrate" on the source monitor reports
 success, but the VM is dead upon arrival at the destination process.
 Please see [3] for the exact package versions for qemu-kvm we have
 tested with.
 
 Migration works if the destination kvm has been started with the same
 machine type as the source VM, e.g., using -M pc-1.0 specifically on
 the destination, when migrating a pc-1.0 machine from kvm 1.0 to
 kvm 1.1.
 
 How does the machine type specified with -M work in the case of
 migrations? Are migrations expected to fail if the machine type is
 different between source and destination process? If yes, shouldn't KVM be
 able to detect this and abort the migration instead of failing silently?

When doing migration, the fundamental requirement is that the guest
OS visible machine ABI must not change. Thus there are three key
things to take care of when launching QEMU on the migration target
host.

 - The device PCI/USB addresses must be identical to the source
 - The machine type must be identical to the source
 - The CPU model must be identical to the source

If you don't follow those requirements, either QEMU or the guest OS
or both will crash & burn during migration & you get to keep both
pieces :-)

 Regarding different package versions of qemu-kvm, it seems migrations do
 not work from source 0.12.5 to any other version *even* if -M pc-0.12 is
 specified at the incoming KVM process. For versions >= 1.0 everything
 works provided the machine type on the destination is the same as on the
 source.

Some older versions of QEMU were buggy, causing the machine type to
not correctly preserve the ABI.

 Our goal is to patch Ganeti [2] so that it sets the destination machine
 type to that of the source specifically, ensuring migrations work
 seamlessly after a KVM upgrade. Is there a way to retrieve the machine
 type of a running KVM process through a monitor command?

IIRC there is not a monitor command for this. The general approach
to dealing with migration stability should be to launch QEMU with a
canonical hardware configuration. This means explicitly setting a machine
type, CPU model and PCI/USB device addresses upfront. NB you should not
use 'pc' as a machine type - if you query the list of machine types from
QEMU, it will tell you what 'pc' corresponds to (pc-1.2); then use that
versioned type so you have a known machine type.
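As a sketch of what Daniel describes, the alias-to-canonical-name mapping
can be derived from the reply to QMP's query-machines command. The sample
data below is invented for illustration; only the 'name'/'alias' field
names follow the QMP schema:

```python
def resolve_machine_alias(machines, alias):
    """Map a machine-type alias (e.g. 'pc') to its versioned canonical
    name, given the list returned by QMP 'query-machines'. Returns the
    alias unchanged if no entry carries it."""
    for m in machines:
        if m.get("alias") == alias:
            return m["name"]
    return alias

# Invented sample in the shape of a query-machines reply:
sample = [
    {"name": "pc-1.2", "alias": "pc", "is-default": True},
    {"name": "pc-1.1"},
    {"name": "pc-1.0"},
]
```

Ganeti could run such a lookup once against the source host's QEMU and
then pass the resolved versioned type via -M on the destination.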

Regards,
Daniel


Re: qemu-kvm: remove boot=on|off drive parameter compatibility

2012-10-02 Thread Daniel P. Berrange
On Mon, Oct 01, 2012 at 08:19:29AM -0500, Anthony Liguori wrote:
 Jan Kiszka jan.kis...@siemens.com writes:
 I think at this point, none of this matters but I added the various
 distro maintainers to the thread.
 
 I think it's time for the distros to drop qemu-kvm and just ship
 qemu.git.  Is there anything else that needs to happen to make that
 switch?

If that is upstream's recommendation, then I see no issue with switching
Fedora 19 / RHEL-7 to use qemu.git instead of qemu-kvm.git.

Regards,
Daniel


Re: [Qemu-devel] [PATCH] kvm: Set default accelerator to kvm if the host supports it

2012-10-01 Thread Daniel P. Berrange
On Mon, Oct 01, 2012 at 06:43:00PM +0200, Andreas Färber wrote:
 Hello Jan,
 
 On 01.10.2012 16:34, Jan Kiszka wrote:
  If we built a target for a host that supports KVM in principle, set the
  default accelerator to KVM as well. This also means QEMU will fail
  to start if KVM support turns out to be unavailable at runtime.
 
 From a distro point of view this of course means that we will build
 against KVM and that the new KVM default will start to fail for users on
 very old hardware. Can't we do a runtime check to select the default?

NB, this is *not* only about old hardware. There are plenty of users who
use QEMU inside VMs. One very common usage I know of is image building
tools which are run inside Amazon VMs, using libguestfs & QEMU.

IMHO, "default to KVM, fallback to TCG" is the most friendly default
behaviour.

Daniel


Re: [libvirt] TSC scaling interface to management

2012-09-25 Thread Daniel P. Berrange
On Wed, Sep 12, 2012 at 12:39:39PM -0300, Marcelo Tosatti wrote:
 
 
 HW TSC scaling is a feature of AMD processors that allows a
 multiplier to be specified to the TSC frequency exposed to the guest.
 
 KVM also contains provision to trap TSC ("KVM: Infrastructure for
 software and hardware based TSC rate scaling", commit cc578287e3224d0da)
 or advance TSC frequency.
 
 This is useful when migrating to a host with different frequency and
 the guest is possibly using direct RDTSC instructions for purposes
 other than measuring cycles (that is, it previously calculated
 cycles-per-second, and uses that information which is stale after
 migration).
 
 "qemu-x86: Set tsc_khz in kvm when supported" (commit e7429073ed1a76518)
 added support for the tsc_khz= option in QEMU.
 
 I am proposing the following changes so that management applications
 can work with this:
 
 1) New option for tsc_khz, which is tsc_khz=host (QEMU command line
 option). "Host" means that QEMU is responsible for retrieving the
 TSC frequency of the host processor and using that, so the management
 application does not have to deal with the burden.

FYI, libvirt already has support for expressing a number of different
TSC-related config options, to support Xen's and VMWare's capabilities
in this area. What we currently allow for is:

   <timer name='tsc' frequency='NNN' mode='auto|native|emulate|smpsafe'/>

In this context the frequency attribute provides the Hz value to
provide to the guest.

  - auto == Emulate if TSC is unstable, else allow native TSC access
  - native == Always allow native TSC access
  - emulate == Always emulate TSC
  - smpsafe == Always emulate TSC, and interlock SMP
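For context, this timer element sits inside the domain's <clock> element;
the frequency value below is purely illustrative:

```
   <clock offset='utc'>
     <timer name='tsc' frequency='3504000000' mode='native'/>
   </clock>
```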

 Therefore it appears that this tsc_khz=auto option can be specified
 only if the user specifies so (it can be a per-guest flag hidden
 in the management configuration/manual).
 
 Sending this email to gather suggestions (or objections)
 to this interface.


Daniel


Re: [PATCH v8] kvm: notify host when the guest is panicked

2012-08-14 Thread Daniel P. Berrange
On Mon, Aug 13, 2012 at 03:21:32PM -0300, Marcelo Tosatti wrote:
 On Wed, Aug 08, 2012 at 10:43:01AM +0800, Wen Congyang wrote:
  We can know the guest is panicked when the guest runs on xen.
  But we do not have such a feature on kvm.
  
  Another purpose of this feature is: management app (for example:
  libvirt) can do auto dump when the guest is panicked. If management
  app does not do auto dump, the guest's user can do dump by hand if
  he sees the guest is panicked.
  
  We have three solutions to implement this feature:
  1. use vmcall
  2. use I/O port
  3. use virtio-serial.
  
  We have decided to avoid touching the hypervisor. The reason why I chose
  the I/O port is:
  1. it is easier to implement
  2. it does not depend on any virtual device
  3. it can work when starting the kernel
 
 How about searching for the "Kernel panic - not syncing" string
 in the guest's serial output? Say libvirtd could take an action upon
 that?

No, this is not satisfactory. It depends on the guest OS being
configured to use the serial port for console output which we
cannot mandate, since it may well be required for other purposes.


Daniel


Re: First shot at adding IPMI to qemu

2012-07-09 Thread Daniel P. Berrange
On Mon, Jul 09, 2012 at 08:23:11AM -0500, Corey Minyard wrote:
 I haven't heard anything about these patches.  Any comments, good or
 bad?  Has anyone tried these?

You really ought to post this to the qemu-devel mailing list,
since that's where the majority of QEMU developers hang out.
This KVM list is primarily for KVM specific development tasks
in QEMU.

Daniel


[PATCH] Fix default accelerator when building with --disable-kvm

2012-07-06 Thread Daniel P. Berrange
From: Daniel P. Berrange berra...@redhat.com

The following commit

  commit 3ad763fcba5bd0ec5a79d4a9b6baeef119dd4a3d
  Author: Jan Kiszka jan.kis...@siemens.com
  Date:   Fri Mar 2 10:30:43 2012 +0100

qemu-kvm: Use machine options to configure qemu-kvm defaults

Upstream is moving towards this mechanism, so start using it in qemu-kvm
already to configure the specific defaults: kvm enabled on, just like
in-kernel irqchips.

prevents qemu from starting when it has been build with the
--disable-kvm argument, because the accelerator is hardcoded
to 'kvm'.  This is a regression previously fixed by

  commit ce967f6610dcd7b7762dbad5a639fecf42d5c76d
  Author: Daniel P. Berrange berra...@redhat.com
  Date:   Fri Aug 5 09:50:29 2011 +0100

Fix default accelerator when configured with --disable-kvm

The default accelerator is hardcoded to 'kvm'. This is a fine
default for qemu-kvm normally, but if the user built with
./configure --disable-kvm, then the resulting binaries will
not work by default

The fix is again to make this conditional on CONFIG_KVM_OPTIONS

Signed-off-by: Daniel P. Berrange berra...@redhat.com
---
 hw/pc_piix.c |   14 ++
 1 file changed, 14 insertions(+)

diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 98a06fa..35202dd 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -360,7 +360,9 @@ static QEMUMachine pc_machine_v1_2 = {
     .init = pc_init_pci,
     .max_cpus = 255,
     .is_default = 1,
+#ifdef CONFIG_KVM_OPTIONS
     .default_machine_opts = "accel=kvm,kernel_irqchip=on",
+#endif
 };
 
 #define PC_COMPAT_1_1 \
@@ -469,7 +471,9 @@ static QEMUMachine pc_machine_v0_14 = {
     .desc = "Standard PC",
     .init = pc_init_pci,
     .max_cpus = 255,
+#ifdef CONFIG_KVM_OPTIONS
     .default_machine_opts = "accel=kvm,kernel_irqchip=on",
+#endif
     .compat_props = (GlobalProperty[]) {
         PC_COMPAT_0_14,
         {
@@ -503,7 +507,9 @@ static QEMUMachine pc_machine_v0_13 = {
     .desc = "Standard PC",
     .init = pc_init_pci_no_kvmclock,
     .max_cpus = 255,
+#ifdef CONFIG_KVM_OPTIONS
     .default_machine_opts = "accel=kvm,kernel_irqchip=on",
+#endif
     .compat_props = (GlobalProperty[]) {
         PC_COMPAT_0_13,
         {
@@ -541,7 +547,9 @@ static QEMUMachine pc_machine_v0_12 = {
     .desc = "Standard PC",
     .init = pc_init_pci_no_kvmclock,
     .max_cpus = 255,
+#ifdef CONFIG_KVM_OPTIONS
     .default_machine_opts = "accel=kvm,kernel_irqchip=on",
+#endif
     .compat_props = (GlobalProperty[]) {
         PC_COMPAT_0_12,
         {
@@ -575,7 +583,9 @@ static QEMUMachine pc_machine_v0_11 = {
     .desc = "Standard PC, qemu 0.11",
     .init = pc_init_pci_no_kvmclock,
     .max_cpus = 255,
+#ifdef CONFIG_KVM_OPTIONS
     .default_machine_opts = "accel=kvm,kernel_irqchip=on",
+#endif
     .compat_props = (GlobalProperty[]) {
         PC_COMPAT_0_11,
         {
@@ -597,7 +607,9 @@ static QEMUMachine pc_machine_v0_10 = {
     .desc = "Standard PC, qemu 0.10",
     .init = pc_init_pci_no_kvmclock,
     .max_cpus = 255,
+#ifdef CONFIG_KVM_OPTIONS
     .default_machine_opts = "accel=kvm,kernel_irqchip=on",
+#endif
     .compat_props = (GlobalProperty[]) {
         PC_COMPAT_0_11,
         {
@@ -631,7 +643,9 @@ static QEMUMachine isapc_machine = {
     .desc = "ISA-only PC",
     .init = pc_init_isa,
     .max_cpus = 1,
+#ifdef CONFIG_KVM_OPTIONS
     .default_machine_opts = "accel=kvm,kernel_irqchip=on",
+#endif
     .compat_props = (GlobalProperty[]) {
         {
             .driver   = "pc-sysfw",
-- 
1.7.10.2



Re: [PATCH] qemu-kvm: Fix default machine options

2012-07-06 Thread Daniel P. Berrange
On Fri, Jul 06, 2012 at 06:21:06PM +0200, Jan Kiszka wrote:
 qemu-kvm-specific machine defaults were missing for pc-0.15 to pc-1.1.
 Then Daniel noted that --disable-kvm caused problems as the generated
 binaries would be unable to run. As we are at it, we can drop the
 kernel_irqchip=on that is now enabled by default in upstream.
 
 CC: Daniel P. Berrange berra...@redhat.com
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

ACK, looks good to me.

 Noticed that there was more to do. Can you take care of stable-1.1,
 Daniel? TIA.

Yep, will post a patch for stable-1.1 when this is accepted
into master.

  hw/pc_piix.c |   23 +++++++++++++++-------
  1 files changed, 16 insertions(+), 7 deletions(-)
 
 diff --git a/hw/pc_piix.c b/hw/pc_piix.c
 index 98a06fa..5860d52 100644
 --- a/hw/pc_piix.c
 +++ b/hw/pc_piix.c
 @@ -353,6 +353,12 @@ static void pc_xen_hvm_init(ram_addr_t ram_size,
  }
  #endif
  
 +#ifdef CONFIG_KVM_OPTIONS
 +#define KVM_MACHINE_OPTIONS "accel=kvm"
 +#else
 +#define KVM_MACHINE_OPTIONS ""
 +#endif
 +
  static QEMUMachine pc_machine_v1_2 = {
      .name = "pc-1.2",
      .alias = "pc",
 @@ -360,7 +366,7 @@ static QEMUMachine pc_machine_v1_2 = {
      .init = pc_init_pci,
      .max_cpus = 255,
      .is_default = 1,
 -    .default_machine_opts = "accel=kvm,kernel_irqchip=on",
 +    .default_machine_opts = KVM_MACHINE_OPTIONS,
  };
  
  #define PC_COMPAT_1_1 \
 @@ -387,6 +393,7 @@ static QEMUMachine pc_machine_v1_1 = {
      .desc = "Standard PC",
      .init = pc_init_pci,
      .max_cpus = 255,
 +    .default_machine_opts = KVM_MACHINE_OPTIONS,
      .compat_props = (GlobalProperty[]) {
          PC_COMPAT_1_1,
          { /* end of list */ }
 @@ -422,6 +429,7 @@ static QEMUMachine pc_machine_v1_0 = {
      .desc = "Standard PC",
      .init = pc_init_pci,
      .max_cpus = 255,
 +    .default_machine_opts = KVM_MACHINE_OPTIONS,
      .compat_props = (GlobalProperty[]) {
          PC_COMPAT_1_0,
          { /* end of list */ }
 @@ -437,6 +445,7 @@ static QEMUMachine pc_machine_v0_15 = {
      .desc = "Standard PC",
      .init = pc_init_pci,
      .max_cpus = 255,
 +    .default_machine_opts = KVM_MACHINE_OPTIONS,
      .compat_props = (GlobalProperty[]) {
          PC_COMPAT_0_15,
          { /* end of list */ }
 @@ -469,7 +478,7 @@ static QEMUMachine pc_machine_v0_14 = {
      .desc = "Standard PC",
      .init = pc_init_pci,
      .max_cpus = 255,
 -    .default_machine_opts = "accel=kvm,kernel_irqchip=on",
 +    .default_machine_opts = KVM_MACHINE_OPTIONS,
      .compat_props = (GlobalProperty[]) {
          PC_COMPAT_0_14,
          {
 @@ -503,7 +512,7 @@ static QEMUMachine pc_machine_v0_13 = {
      .desc = "Standard PC",
      .init = pc_init_pci_no_kvmclock,
      .max_cpus = 255,
 -    .default_machine_opts = "accel=kvm,kernel_irqchip=on",
 +    .default_machine_opts = KVM_MACHINE_OPTIONS,
      .compat_props = (GlobalProperty[]) {
          PC_COMPAT_0_13,
          {
 @@ -541,7 +550,7 @@ static QEMUMachine pc_machine_v0_12 = {
      .desc = "Standard PC",
      .init = pc_init_pci_no_kvmclock,
      .max_cpus = 255,
 -    .default_machine_opts = "accel=kvm,kernel_irqchip=on",
 +    .default_machine_opts = KVM_MACHINE_OPTIONS,
      .compat_props = (GlobalProperty[]) {
          PC_COMPAT_0_12,
          {
 @@ -575,7 +584,7 @@ static QEMUMachine pc_machine_v0_11 = {
      .desc = "Standard PC, qemu 0.11",
      .init = pc_init_pci_no_kvmclock,
      .max_cpus = 255,
 -    .default_machine_opts = "accel=kvm,kernel_irqchip=on",
 +    .default_machine_opts = KVM_MACHINE_OPTIONS,
      .compat_props = (GlobalProperty[]) {
          PC_COMPAT_0_11,
          {
 @@ -597,7 +606,7 @@ static QEMUMachine pc_machine_v0_10 = {
      .desc = "Standard PC, qemu 0.10",
      .init = pc_init_pci_no_kvmclock,
      .max_cpus = 255,
 -    .default_machine_opts = "accel=kvm,kernel_irqchip=on",
 +    .default_machine_opts = KVM_MACHINE_OPTIONS,
      .compat_props = (GlobalProperty[]) {
          PC_COMPAT_0_11,
          {
 @@ -631,7 +640,7 @@ static QEMUMachine isapc_machine = {
      .desc = "ISA-only PC",
      .init = pc_init_isa,
      .max_cpus = 1,
 -    .default_machine_opts = "accel=kvm,kernel_irqchip=on",
 +    .default_machine_opts = KVM_MACHINE_OPTIONS,
      .compat_props = (GlobalProperty[]) {
          {
              .driver   = "pc-sysfw",
 -- 
 1.7.3.4

Daniel


Re: [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)

2012-07-03 Thread Daniel P. Berrange
On Mon, Jul 02, 2012 at 04:54:03PM -0300, Eduardo Habkost wrote:
 On Mon, Jul 02, 2012 at 07:56:58PM +0100, Daniel P. Berrange wrote:
  On Mon, Jul 02, 2012 at 03:06:32PM -0300, Eduardo Habkost wrote:
   Resending series, after fixing some coding style issues. Does anybody have
   any
   feedback about this proposal?
   
   Changes v1 - v2:
- Coding style fixes
   
   Original cover letter:
   
   I was investigating if there are any mechanisms that allow manually 
   pinning of
   guest RAM to specific host NUMA nodes, in the case of multi-node KVM 
   guests, and
   noticed that -mem-path could be used for that, except that it currently 
   removes
   any files it creates (using mkstemp()) immediately, not allowing numactl 
   to be
   used on the backing files, as a result. This patch adds a
   -keep-mem-path-files
   option to make QEMU create the files inside -mem-path with more 
   predictable
   names, and not remove them after creation.
   
   Some previous discussions about the subject, for reference:
- Message-ID: 1281534738-8310-1-git-send-email-andre.przyw...@amd.com
  http://article.gmane.org/gmane.comp.emulators.kvm.devel/57684
- Message-ID: 4c7d7c2a.7000...@codemonkey.ws
  http://article.gmane.org/gmane.comp.emulators.kvm.devel/58835
   
   A more recent thread can be found at:
- Message-ID: 20111029184502.gh11...@in.ibm.com
  http://article.gmane.org/gmane.comp.emulators.qemu/123001
   
   Note that this is just a mechanism to facilitate manual static binding 
   using
   numactl on hugetlbfs later, for optimization. This may be especially 
   useful for
   single large multi-node guests use-cases (and, of course, has to be used 
   with
   care).
   
   I don't know if it is a good idea to use the memory range names as a 
   publicly-
   visible interface. Another option may be to use a single file instead, 
   and mmap
   different regions inside the same file for each memory region. I am open
   to
   comments and suggestions.
   
   Example (untested) usage to bind manually each half of the RAM of a guest 
   to a
   different NUMA node:
   
$ qemu-system-x86_64 [...] -m 2048 -smp 4 \
  -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
  -mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO
$ numactl --offset=1G --length=1G --membind=1 --file 
   /mnt/hugetlbfs/FOO/pc.ram
$ numactl --offset=0  --length=1G --membind=2 --file 
   /mnt/hugetlbfs/FOO/pc.ram
  
  I'd suggest that instead of making the memory file name into a
  public ABI QEMU needs to maintain, QEMU could expose the info
  via a monitor command. eg
  
 $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
   -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
   -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
   -monitor stdio
 (qemu) info mem-nodes
  node0: file=/proc/self/fd/3, offset=0G, length=1G
  node1: file=/proc/self/fd/3, offset=1G, length=1G
  
  This example takes advantage of the fact that with Linux, you can
  still access a deleted file via /proc/self/fd/NNN, which AFAICT,
  would avoid the need for a --keep-mem-path-files.
 
 I like the suggestion.
 
 But other processes still need to be able to open those files if we want
 to do anything useful with them. In this case, I guess it's better to
 let QEMU itself build a /proc/<getpid()>/fd/<fd> string instead of
 using /proc/self and forcing the client to find out what's the right
 PID?
 
 Anyway, even if we want to avoid file-descriptor and /proc tricks, we
 can still use the interface you suggest. Then we wouldn't need to have
 any filename assumptions: the filenames could be completly random, as
 they would be reported using the new monitor command.

Opps, yes of course. I did intend that client apps could use the
files, so I should have used  /proc/$PID and not /proc/self

 
  
  By returning info via a monitor command you also avoid hardcoding
  the use of 1 single file for all of memory. You also avoid hardcoding
  the fact that QEMU stores the nodes in contiguous order inside the
  node. eg QEMU could easily return data like this
  
  
 $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
   -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
   -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
   -monitor stdio
 (qemu) info mem-nodes
  node0: file=/proc/self/fd/3, offset=0G, length=1G
  node1: file=/proc/self/fd/4, offset=0G, length=1G
  
  or more ingenious options
 
 Sounds good.
 
 -- 
 Eduardo


Re: [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)

2012-07-02 Thread Daniel P. Berrange
On Mon, Jul 02, 2012 at 03:06:32PM -0300, Eduardo Habkost wrote:
 Resending series, after fixing some coding style issues. Does anybody have any
 feedback about this proposal?
 
 Changes v1 - v2:
  - Coding style fixes
 
 Original cover letter:
 
 I was investigating if there are any mechanisms that allow manually pinning of
 guest RAM to specific host NUMA nodes, in the case of multi-node KVM guests, 
 and
 noticed that -mem-path could be used for that, except that it currently 
 removes
 any files it creates (using mkstemp()) immediately, not allowing numactl to be
 used on the backing files, as a result. This patch adds a
 -keep-mem-path-files
 option to make QEMU create the files inside -mem-path with more predictable
 names, and not remove them after creation.
 
 Some previous discussions about the subject, for reference:
  - Message-ID: 1281534738-8310-1-git-send-email-andre.przyw...@amd.com
http://article.gmane.org/gmane.comp.emulators.kvm.devel/57684
  - Message-ID: 4c7d7c2a.7000...@codemonkey.ws
http://article.gmane.org/gmane.comp.emulators.kvm.devel/58835
 
 A more recent thread can be found at:
  - Message-ID: 20111029184502.gh11...@in.ibm.com
http://article.gmane.org/gmane.comp.emulators.qemu/123001
 
 Note that this is just a mechanism to facilitate manual static binding using
 numactl on hugetlbfs later, for optimization. This may be especially useful 
 for
 single large multi-node guests use-cases (and, of course, has to be used with
 care).
 
 I don't know if it is a good idea to use the memory range names as a publicly-
 visible interface. Another option may be to use a single file instead, and 
 mmap
 different regions inside the same file for each memory region. I am open to
 comments and suggestions.
 
 Example (untested) usage to bind manually each half of the RAM of a guest to a
 different NUMA node:
 
  $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
-numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
-mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO
  $ numactl --offset=1G --length=1G --membind=1 --file 
 /mnt/hugetlbfs/FOO/pc.ram
  $ numactl --offset=0  --length=1G --membind=2 --file 
 /mnt/hugetlbfs/FOO/pc.ram

I'd suggest that instead of making the memory file name into a
public ABI QEMU needs to maintain, QEMU could expose the info
via a monitor command. eg

   $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
 -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
 -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
 -monitor stdio
   (qemu) info mem-nodes
node0: file=/proc/self/fd/3, offset=0G, length=1G
node1: file=/proc/self/fd/3, offset=1G, length=1G

This example takes advantage of the fact that with Linux, you can
still access a deleted file via /proc/self/fd/NNN, which AFAICT,
would avoid the need for a --keep-mem-path-files.
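The unlink-then-reopen trick can be seen with a few shell commands; this is a rough sketch on Linux, using an ordinary temp file in place of a guest RAM backing file:

```shell
# Create a file, keep a descriptor open, delete the file, then read it
# back through /proc/self/fd - on Linux the data stays reachable until
# the last open descriptor is closed.
tmp=$(mktemp)
echo "guest-ram-placeholder" > "$tmp"
exec 3< "$tmp"        # hold a read-only descriptor on the file
rm "$tmp"             # unlink; the inode survives while fd 3 is open
cat /proc/self/fd/3   # prints: guest-ram-placeholder
exec 3<&-             # close the descriptor, releasing the inode
```

This is exactly why a client handed `/proc/$PID/fd/NNN` paths can still open and bind the memory file even though QEMU deleted it.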

By returning info via a monitor command you also avoid hardcoding
the use of 1 single file for all of memory. You also avoid hardcoding
the fact that QEMU stores the nodes in contiguous order inside the
node. eg QEMU could easily return data like this


   $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
 -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
 -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
 -monitor stdio
   (qemu) info mem-nodes
node0: file=/proc/self/fd/3, offset=0G, length=1G
node1: file=/proc/self/fd/4, offset=0G, length=1G

or more ingenious options

Regards,
Daniel


Re: [Qemu-devel] [PATCH 5/6 v5] deal with guest panicked event accoring to -onpanic parameter

2012-06-27 Thread Daniel P. Berrange
On Wed, Jun 27, 2012 at 04:52:32PM +0200, Cornelia Huck wrote:
 On Wed, 27 Jun 2012 15:02:23 +0800
 Wen Congyang we...@cn.fujitsu.com wrote:
 
  When the guest is panicked, it will write 0x1 to the port KVM_PV_PORT.
  So if qemu reads 0x1 from this port, we can do one of the following
  things according to the parameter -onpanic:
  1. emit QEVENT_GUEST_PANICKED only
  2. emit QEVENT_GUEST_PANICKED and pause the guest
  3. emit QEVENT_GUEST_PANICKED and poweroff the guest
  4. emit QEVENT_GUEST_PANICKED and reset the guest
 
 Would it be useful to add some "dump the guest" actions here?

Better off leaving that to the mgmt layer using QEMU. If you
tried to directly handle "dump the guest" in the context of
the panic notifier then you add all sorts of extra complexity
to this otherwise simple feature. For a start you need to
tell it what filename to use, which is not something you can
necessarily decide at the time QEMU starts - you might want
a separate filename each time a panic occurs. The mgmt app
might not even want QEMU to dump to a file - it might want
to use a socket, or pass in a file descriptor at time of
dump. All in all, it is better to keep the panic notifier
simple, and let the mgmt app then decide whether to take
a dump separately, using existing QEMU monitor commands
and features.

Daniel


Re: [Qemu-devel] [PATCH 3/3] deal with guest panicked event

2012-06-12 Thread Daniel P. Berrange
On Tue, Jun 12, 2012 at 09:35:04AM -0300, Luiz Capitulino wrote:
 On Tue, 12 Jun 2012 14:55:37 +0800
 Wen Congyang we...@cn.fujitsu.com wrote:
 
   +static void panicked_perform_action(void)
   +{
   +    switch (panicked_action) {
   +    case PANICKED_REPORT:
   +        panicked_mon_event("report");
   +        break;
   +
   +    case PANICKED_PAUSE:
   +        panicked_mon_event("pause");
   +        vm_stop(RUN_STATE_GUEST_PANICKED);
   +        break;
   +
   +    case PANICKED_QUIT:
   +        panicked_mon_event("quit");
   +        exit(0);
   +        break;
   +    }
   
   Having the data argument is not needed/wanted. The mngt app can guess it 
   if it
   needs to know it, but I think it doesn't want to.
  
  Libvirt will do something when the kernel is panicked, so it should know 
  the action
  in qemu side.
 
 But the action will be set by libvirt itself, no?

Sure, but the whole world isn't libvirt. If the process listening to the
monitor is not the same as the process which launched the VM, then I
think including the action is worthwhile. Besides, the way Wen has done
this is identical to what we already do with QEVENT_WATCHDOG and I think
it is desirable to keep consistency here.

Daniel


Re: [Qemu-devel] KVM call agenda for June, Tuesday 15th

2012-05-15 Thread Daniel P. Berrange
On Tue, May 15, 2012 at 08:44:14AM -0500, Anthony Liguori wrote:
 On 05/15/2012 03:51 AM, Kevin Wolf wrote:
 Currently we have a very simple unidirectional structure:
 qemu is a standalone program that keeps running on its own. libvirt is
 the user of qemu. Often enough it's already hard to get things working
 correctly in error cases with this simple structure - do you really want
 to have qemu depend on an RPC to libvirt?
 
 Yes.  We're relying on libvirt for a *syscall* that the kernel isn't
 processing correctly.  I'm not advocating a general mechanism where
 we defer large parts of QEMU to libvirt.  This is specifically the
 open() syscall.
 
 You're right that the proper fix would be in the kernel, but in qemu a
 much better solution that RPCs to libvirt is allowing all QMP commands
 that open new files to pass a block device description that can contain
 a fd.
 
 I don't agree that this is an obviously better solution.  For
 example, it mandates that libvirt parse image formats to determine
 the backing file chains.

I think that the question of parsing image formats is tangential
to this QEMU impl choice.

 OTOH, the open() RPC allows libvirt to avoid parsing image formats.
 It could do something as simple as have the user specify a white
 list of image files the guest is allowed to access in the domain XML
 and validate against that.

 It removes considerable complexity from libvirt as it doesn't have
 to construct a potentially complex set of blockdev arguments.

I don't really think this QEMU approach to a callback for arbitrary
files simplifies libvirt's life in any way. In fact I think it will
actually complicate our life, because instead of being able to
provide all the information/resources required at one time, we need
have to wait to get async callbacks some time later. We then have to
try and figure out whether the file being request is actually allowed
by the config.


 This would much better than first getting an open command via QMP
 and then using an RPC to ask back what we're really meant to open.
 
 To the full extent we're going to get this with blockdev-add (which is
 what we should really start working on now rather than on hacks like
 -open-fd-hook), but if you like hacks, much (if not all) of it is
 already possible today with the 'existing' mode of live snapshots.
 
 I really don't think that blockdev is an elegant solution to this
 problem.  It pushes an awful lot of complexity to libvirt (or any
 management tool).
 
 I actually think Avi's original idea of a filename dictionary is a
 better approach than blockdev for solving this problem.

While I raise blockdev as an alternative approach, I am open to
other alternative ways to provide this config via the CLI or
monitor. Basically anything that isn't this generic file open
callback.

Daniel


Re: smp option of qemu-kvm

2012-04-05 Thread Daniel P. Berrange
On Thu, Apr 05, 2012 at 02:28:51PM -0400, Steven wrote:
 Hi,
 I started a kvm VM by adding -smp 2 option. From inside the guest, I
 can see that /proc/cpuinfo outputs 2 cores.
 However, in the host, I only observe one qemu-kvm process for that VM.
 Does that mean this VM is actually running on one core?
 If so, how to make a VM to run on 2 or more cores? Thanks.

Each VCPU in KVM corresponds to a separate thread in the process. The
'ps' command only ever shows the thread leader by default - so you
don't see those VCPU threads in the process list. eg ps -eLf to
see all threads
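The per-thread view is also visible directly in /proc; a quick sketch, using the current shell's PID as a stand-in for a qemu-kvm process:

```shell
# /proc/<pid>/task has one entry per thread of the process, which is
# what 'ps -eLf' enumerates. Using the shell's own PID for illustration:
pid=$$
echo "threads of $pid:"
ls "/proc/$pid/task"
# For a running VM you would instead do something like:
#   ps -eLf | grep qemu
```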

Daniel


Re: smp option of qemu-kvm

2012-04-05 Thread Daniel P. Berrange
On Thu, Apr 05, 2012 at 02:52:40PM -0400, Steven wrote:
 Hi, Daniel,
 Thanks for your quick response. However, ps -eLf shows 4 threads
 for the VM, and I checked that the 4 threads have the same tgid.
 But I created the VM with the -smp 2 option. Could you explain this? Thanks.

As well as the vCPU threads, QEMU creates other threads as needed, typically
for I/O - indeed the count of threads may vary over time.

Daniel


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-21 Thread Daniel P. Berrange
On Wed, Mar 21, 2012 at 06:25:16PM +0200, Avi Kivity wrote:
 On 03/21/2012 06:18 PM, Corey Minyard wrote:
 
  Look at drivers/char/ipmi/ipmi_msghandler.c. It has code to send panic
  event over IMPI. The code is pretty complex. Of course if we a going to
  implement something more complex than simple hypercall for panic
  notification we better do something more interesting with it than just
  saying panic happened, like sending stack traces on all cpus for
  instance.
 
  I doubt that's the best example, unfortunately.  The IPMI event log
  has limited space and it has to be sent a little piece at a time since
  each log entry is 14 bytes.  It just prints the panic string, nothing
  else.  Not that it isn't useful, it has saved my butt before.
 
  You have lots of interesting options with paravirtualization.  You
  could, for instance, create a console driver that delivered all
  console output efficiently through a hypercall.  That would be really
  easy.  Or, as you mention, a custom way to deliver panic information. 
  Collecting information like stack traces would be harder to
  accomplish, as I don't think there is currently a way to get it except
  by sending it to printk.
 
 That already exists; virtio-console (or serial console emulation) can do
 the job.
 
 In fact the feature can be implemented 100% host side by searching for a
 panic string signature in the console logs.

You can even go one better and search for the panic string in the
guest memory directly, which is what virt-dmesg does :-)

  http://people.redhat.com/~rjones/virt-dmesg/
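The host-side scan Avi describes could be sketched roughly like this in shell; the log file here is a made-up stand-in for wherever the VM's serial console output is captured:

```shell
# Minimal sketch of host-side panic detection: scan a guest's serial
# console log for the kernel's panic signature. A real setup would point
# at the file the VM's serial port is configured to log to.
console_log=$(mktemp)
printf '[   12.3] systemd[1]: started\n[   99.9] Kernel panic - not syncing: Attempted to kill init!\n' > "$console_log"

if grep -q 'Kernel panic - not syncing' "$console_log"; then
    echo "guest panic detected"   # prints: guest panic detected
fi
rm -f "$console_log"
```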

Daniel


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Daniel P. Berrange
On Wed, Mar 14, 2012 at 03:21:14PM +0530, Amit Shah wrote:
 On (Wed) 14 Mar 2012 [16:29:50], Wen Congyang wrote:
  At 03/13/2012 06:47 PM, Avi Kivity Wrote:
   On 03/13/2012 11:18 AM, Daniel P. Berrange wrote:
   On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote:
   On 03/12/2012 11:04 AM, Wen Congyang wrote:
   Do you have any other comments about this patch?
  
  
   Not really, but I'm not 100% convinced the patch is worthwhile.  It's
   likely to only be used by Linux, which has kexec facilities, and you can
   talk to management via virtio-serial and describe the crash in more
   details than a simple hypercall.
  
   As mentioned before, I don't think virtio-serial is a good fit for this.
   We want something that is simple  guaranteed always available. Using
   virtio-serial requires significant setup work on both the host and guest.
   
   So what?  It needs to be done anyway for the guest agent.
   
   Many management applications won't know to make a vioserial device
   available to all guests they create.
   
   Then they won't know to deal with the panic event either.
   
   Most administrators won't even configure kexec,
   let alone virtio serial on top of it. 
   
   It should be done by the OS vendor, not the individual admin.
   
   The hypercall requires zero host
   side config, and zero guest side config, which IMHO is what we need for
   this feature.
   
   If it was this one feature, yes.  But we keep getting more and more
   features like that and we bloat the hypervisor.  There's a reason we
   have a host-to-guest channel, we should use it.
   
  
  I donot know how to use virtio-serial.
  
  I start vm like this:
  qemu ...\
 -device virtio-serial \
-chardev socket,path=/tmp/foo,server,nowait,id=foo \
-device virtserialport,chardev=foo,name=port1 ...
 
 This is sufficient.  On the host, you can open /tmp/foo using a custom
 program or nc (nc -U /tmp/foo).  On the guest, you can just open
 /dev/virtio-ports/port1 and read/write into it.
 
 See the following links for more details.
 
 https://fedoraproject.org/wiki/Features/VirtioSerial#How_To_Test
 http://www.linux-kvm.org/page/Virtio-serial_API
 
  You said that there are too many channels. Does it mean /tmp/foo is a 
  channel?
 
 You can have several such -device virtserialport.  The -device part
 describes what the guest will see.  The -chardev part ties that to the
 host-side part of the channel.
 
 /tmp/foo is the host end-point for the channel, in the example above,
 and /dev/virtio-ports/port1 is the guest-side end-point.
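For concreteness, the host end-point described above is just a Unix stream
socket; the following throwaway sketch reads one message from it (the socket
path and payload here are hypothetical, standing in for a real chardev):

```python
import socket

def read_one_message(path):
    """Accept a single connection on the chardev's Unix socket end-point
    and return whatever the guest side wrote (truncated at 4 KiB)."""
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen(1)
    conn, _ = srv.accept()
    try:
        return conn.recv(4096)
    finally:
        conn.close()
        srv.close()
```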

If we do choose to use virtio-serial for panics (which I don't think
we should), then we should not expose it in the host filesystem. The
host side should be a virtual chardev backend internal to QEMU, in
the same way that 'spicevmc' is handled.

Regards,
Daniel


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Daniel P. Berrange
On Wed, Mar 14, 2012 at 06:58:47PM +0800, Wen Congyang wrote:
 At 03/14/2012 06:52 PM, Avi Kivity Wrote:
  On 03/14/2012 12:52 PM, Wen Congyang wrote:
 
  If so, is this channel visible to guest userspace? If the channle is 
  visible to guest
  userspace, the program running in userspace may write the same message 
  to the channel.
 
  Access control is via permissions.  You can have udev scripts assign
  whatever uid and gid to the port of your interest.  By default, all
  ports are only accessible to the root user.
 
  We should also prevent root user writing message to this channel if it is
  used for panicked notification.
 
  
  Why?  root can easily cause a panic.
  
 
 root user can write the same message to virtio-serial while the guest is 
 running...

Unless you are running a MAC policy which strictly confines the root
account, root can cause a kernel panic regardless of virtio-serial
permissions in the guest:

  echo c > /proc/sysrq-trigger

Regards,
Daniel


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Daniel P. Berrange
On Wed, Mar 14, 2012 at 07:06:50PM +0800, Wen Congyang wrote:
 At 03/14/2012 06:59 PM, Daniel P. Berrange Wrote:
  On Wed, Mar 14, 2012 at 06:58:47PM +0800, Wen Congyang wrote:
  At 03/14/2012 06:52 PM, Avi Kivity Wrote:
  On 03/14/2012 12:52 PM, Wen Congyang wrote:
 
  If so, is this channel visible to guest userspace? If the channle is 
  visible to guest
  userspace, the program running in userspace may write the same message 
  to the channel.
 
  Access control is via permissions.  You can have udev scripts assign
  whatever uid and gid to the port of your interest.  By default, all
  ports are only accessible to the root user.
 
  We should also prevent root user writing message to this channel if it is
  used for panicked notification.
 
 
  Why?  root can easily cause a panic.
 
 
  root user can write the same message to virtio-serial while the guest is 
  running...
  
  Unless you are running a MAC policy which strictly confines the root
  account, root can cause a kernel panic regardless of virtio-serial
  permissions in the guest:
  
    echo c > /proc/sysrq-trigger
 
 Yes, root user can cause a kernel panic. But if he writes the same message to 
 virtio-serial,
 the host will see the guest is panicked while the guest is not panicked. The 
 host is cheated.

The host mgmt layer must *ALWAYS* assume that any information originating
from the guest may be bogus. It must never trust the guest info. So regardless
of the implementation, you have to expect that the guest might have lied
to you about being crashed. The same is true even of Xen's panic notifier.

So if an application is automatically triggering core dumps based on this
panic notification, it needs to be aware that the guest can lie, and take
steps to avoid the guest mounting a DoS attack on the host. Most likely
by rate-limiting the frequency of core dumps per guest, and/or setting a
max core-dump storage quota per guest.
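A sketch of the kind of rate limiting meant here, using a sliding window
per guest (the class name and thresholds are made up for illustration,
not part of any existing management layer):

```python
import time

class DumpRateLimiter:
    """Allow at most max_dumps core dumps per guest within a sliding window."""

    def __init__(self, max_dumps, window_secs):
        self.max_dumps = max_dumps
        self.window_secs = window_secs
        self.history = {}  # guest name -> timestamps of recent dumps

    def allow(self, guest, now=None):
        """Record and permit a dump for this guest, unless it has already
        hit the per-window quota."""
        now = time.time() if now is None else now
        recent = [t for t in self.history.get(guest, [])
                  if now - t < self.window_secs]
        if len(recent) >= self.max_dumps:
            self.history[guest] = recent
            return False
        recent.append(now)
        self.history[guest] = recent
        return True
```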

Regards,
Daniel


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-13 Thread Daniel P. Berrange
On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote:
 On 03/12/2012 11:04 AM, Wen Congyang wrote:
  Do you have any other comments about this patch?
 
 
 Not really, but I'm not 100% convinced the patch is worthwhile.  It's
 likely to only be used by Linux, which has kexec facilities, and you can
 put talk to management via virtio-serial and describe the crash in more
 details than a simple hypercall.

As mentioned before, I don't think virtio-serial is a good fit for this.
We want something that is simple and guaranteed always available. Using
virtio-serial requires significant setup work on both the host and guest.
Many management applications won't know to make a vioserial device available
to all guests they create. Most administrators won't even configure kexec,
let alone virtio-serial on top of it. The hypercall requires zero host-side
config, and zero guest-side config, which IMHO is what we need for
this feature.


Daniel


Re: [RESEND][PATCH 2/2 v3] deal with guest panicked event

2012-03-08 Thread Daniel P. Berrange
On Thu, Mar 08, 2012 at 01:28:56PM +0200, Avi Kivity wrote:
 On 03/08/2012 12:15 PM, Wen Congyang wrote:
  When the host knows the guest is panicked, it will set
  exit_reason to KVM_EXIT_GUEST_PANICKED. So if qemu receive
  this exit_reason, we can send a event to tell management
  application that the guest is panicked and set the guest
  status to RUN_STATE_PANICKED.
 
  Signed-off-by: Wen Congyang we...@cn.fujitsu.com
  ---
   kvm-all.c|5 +
   monitor.c|3 +++
   monitor.h|1 +
   qapi-schema.json |2 +-
   qmp.c|3 ++-
   vl.c |1 +
   6 files changed, 13 insertions(+), 2 deletions(-)
 
  diff --git a/kvm-all.c b/kvm-all.c
  index 77eadf6..b3c9a83 100644
  --- a/kvm-all.c
  +++ b/kvm-all.c
  @@ -1290,6 +1290,11 @@ int kvm_cpu_exec(CPUState *env)
   (uint64_t)run-hw.hardware_exit_reason);
   ret = -1;
   break;
  +case KVM_EXIT_GUEST_PANICKED:
  +monitor_protocol_event(QEVENT_GUEST_PANICKED, NULL);
  +vm_stop(RUN_STATE_PANICKED);
  +ret = -1;
  +break;
 
 
 If the management application is not aware of this event, then it will
 never resume the guest, so it will appear hung.

Even if the mgmt app doesn't know about QEVENT_GUEST_PANICKED, it should
still see a QEVENT_STOP event emitted by vm_stop(), surely? So it will
know the guest CPUs have been stopped, even if it isn't aware of the
reason why, which seems fine to me.

Daniel


Re: [RESEND][PATCH 2/2 v3] deal with guest panicked event

2012-03-08 Thread Daniel P. Berrange
On Thu, Mar 08, 2012 at 01:52:45PM +0200, Avi Kivity wrote:
 On 03/08/2012 01:36 PM, Daniel P. Berrange wrote:
  On Thu, Mar 08, 2012 at 01:28:56PM +0200, Avi Kivity wrote:
   On 03/08/2012 12:15 PM, Wen Congyang wrote:
When the host knows the guest is panicked, it will set
exit_reason to KVM_EXIT_GUEST_PANICKED. So if qemu receive
this exit_reason, we can send a event to tell management
application that the guest is panicked and set the guest
status to RUN_STATE_PANICKED.
   
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 kvm-all.c|5 +
 monitor.c|3 +++
 monitor.h|1 +
 qapi-schema.json |2 +-
 qmp.c|3 ++-
 vl.c |1 +
 6 files changed, 13 insertions(+), 2 deletions(-)
   
diff --git a/kvm-all.c b/kvm-all.c
index 77eadf6..b3c9a83 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1290,6 +1290,11 @@ int kvm_cpu_exec(CPUState *env)
 (uint64_t)run-hw.hardware_exit_reason);
 ret = -1;
 break;
+case KVM_EXIT_GUEST_PANICKED:
+monitor_protocol_event(QEVENT_GUEST_PANICKED, NULL);
+vm_stop(RUN_STATE_PANICKED);
+ret = -1;
+break;
   
   
   If the management application is not aware of this event, then it will
   never resume the guest, so it will appear hung.
 
  Even if the mgmt app doesn't know about the QEVENT_GUEST_PANICKED, it should
  still see a QEVENT_STOP event emitted by vm_stop() surely ? So it will
  know the guest CPUs have been stopped, even if it isn't aware of the
  reason why, which seems fine to me.
 
 No.  The guest is stopped, and there's no reason to suppose that the
 management app will restart it.  Behaviour has changed.
 
 Suppose the guest has reboot_on_panic set; now the behaviour change is
 even more visible - service will stop completely instead of being
 interrupted for a bit while the guest reboots.

Hmm, so this calls for a new command line argument to control behaviour,
similar to what we do for disk werror, eg something like

  --onpanic report|pause|stop|...

where

 report - emit QEVENT_GUEST_PANICKED only
 pause  - emit QEVENT_GUEST_PANICKED and pause VM
 stop   - emit QEVENT_GUEST_PANICKED and quit VM

This would map fairly well into libvirt, where we already have config
parameters for controlling what to do with a guest when it panics.
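For reference, the existing libvirt per-guest crash policy lives in the
domain XML; a hypothetical guest definition using it might look like the
following (element name and values per libvirt's domain format, to the
best of my knowledge):

```xml
<!-- Hypothetical domain fragment: dump core on guest crash, then restart -->
<domain type='kvm'>
  <name>demo-guest</name>
  <on_crash>coredump-restart</on_crash>
</domain>
```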

Regards,
Daniel


Re: [Qemu-devel] Use getaddrinfo for migration

2012-03-02 Thread Daniel P. Berrange
On Fri, Mar 02, 2012 at 02:25:36PM +0400, Michael Tokarev wrote:
 Not a reply to the patch but a general observation.
 
 I noticed that the tcp migration uses gethostname
 (or getaddrinfo after this patch) from the main
 thread - is it really the way to go?  Note that
 DNS query which is done may block for a large amount
 of time.  Is it really safe in this context?  Should
 it resolve the name in a separate thread, allowing
 guest to run while it is doing that?
 
 This question is important for me because right now
 I'm evaluating a network-connected block device driver
 which should do failover, so it will have to resolve
 alternative name(s) at runtime (especially since list
 of available targets is dynamic).
 
 From one point, _usually_, the delay there is very
 small since it is unlikely you'll do migration or
 failover overseas, so most likely you'll have the
 answer from DNS handy.  But from another point, if
 the DNS is malfunctioning right at that time (eg,
 one of the two DNS resolvers is being rebooted),
 the delay even from local DNS may be noticeable.

Yes, I think you are correct - QEMU should take care to ensure that
DNS resolution cannot block the QEMU event loop thread.

There is the glibc extension (getaddrinfo_a) which does async DNS
resolution, but for the sake of portability it is probably better
to use a thread to do it.
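Using only portable building blocks, the threaded approach can be sketched
as follows (a minimal illustration of the idea, not QEMU's actual code):

```python
import socket
import concurrent.futures

# One shared worker pool, so the event-loop thread never blocks on DNS.
_resolver = concurrent.futures.ThreadPoolExecutor(max_workers=2)

def resolve_async(host, port, callback):
    """Run getaddrinfo in a worker thread; invoke callback with the
    getaddrinfo result list, or None on resolution failure."""
    def work():
        try:
            return socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        except socket.gaierror:
            return None
    future = _resolver.submit(work)
    future.add_done_callback(lambda f: callback(f.result()))
```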

Regards,
Daniel


Re: [PATCH] kvm: notify host when guest paniced

2012-02-29 Thread Daniel P. Berrange
On Wed, Feb 29, 2012 at 11:49:58AM +0200, Avi Kivity wrote:
 On 02/29/2012 03:29 AM, Wen Congyang wrote:
  At 02/28/2012 07:23 PM, Avi Kivity Wrote:
   On 02/27/2012 05:01 AM, Wen Congyang wrote:
   We can know the guest is paniced when the guest runs on xen.
   But we do not have such feature on kvm. This patch implemnts
   this feature, and the implementation is the same as xen:
   register panic notifier, and call hypercall when the guest
   is paniced.
   
   What's the motivation for this?  Xen does this is insufficient.
 
  Another purpose is: management app(for example: libvirt) can do auto
  dump when the guest is crashed. If management app does not do auto
  dump, the guest's user can do dump by hand if he sees the guest is
  paniced.
 
  I am thinking about another status: dumping. This status tells
  the guest's user that the guest is paniced, and the OS's dump function
  is working.
 
  These two status can tell the guest's user whether the guest is pancied,
  and what should he do if the guest is paniced.
 
 
 How about using a virtio-serial channel for this?  You can transfer any
 amount of information (including the dump itself).

When the guest OS has crashed, any dumps will be done from the host
OS using libvirt's core dump mechanism. The guest OS isn't involved
and is likely too dead to be of any use anyway. Likewise it is
quite probably too dead to operate a virtio-serial channel or any
similarly complex device. We're really just after the simplest
possible notification that the guest kernel has panicked.

Regards,
Daniel


Re: [PATCH] kvm: notify host when guest paniced

2012-02-29 Thread Daniel P. Berrange
On Wed, Feb 29, 2012 at 12:05:32PM +0200, Avi Kivity wrote:
 On 02/29/2012 11:58 AM, Daniel P. Berrange wrote:
   
   How about using a virtio-serial channel for this?  You can transfer any
   amount of information (including the dump itself).
 
  When the guest OS has crashed, any dumps will be done from the host
  OS using libvirt's core dump mechanism. The guest OS isn't involved
  and is likely too dead to be of any use anyway. Likewise it is
  quite probably too dead to work a virtio-serial channel or any
  similarly complex device. We're really just after the simplest
  possible notification that the guest kernel has paniced.
 
 If it's alive enough to panic, it's alive enough to kexec its kdump
 kernel.  After that it can do anything.
 
 Guest-internal dumps are more useful IMO that host-initiated dumps.  In
 a cloud, the host-initiated dump is left on the host, outside the reach
 of the guest admin, outside the guest image where all the symbols are,
 and sometimes not even on the same host if a live migration occurred. 
 It's more useful in small setups, or if the problem is in the
 hypervisor, not the guest.

I don't think guest vs host dumps should be considered mutually exclusive;
they both have pluses and minuses.

Configuring kexec+kdump requires non-negligible guest admin configuration
work before it's usable, and this work is guest-OS specific, if it is possible
at all. A permanent panic notifier that's built into the kernel by default
requires zero guest admin config, and can allow the host admin to automate
collection of dumps across all their hosts/guests. The KVM hypercall
notification is fairly trivially ported to any OS kernel, by comparison
with a full virtio + virtio-serial impl.

Regards,
Daniel


Re: [libvirt] QEMU applying for Google Summer of Code 2012

2012-02-10 Thread Daniel P. Berrange
On Fri, Feb 10, 2012 at 10:30:24AM +, Stefan Hajnoczi wrote:
 This year's Google Summer of Code has been announced:
 
 http://www.google-melange.com/gsoc/events/google/gsoc2012
 
 For those who haven't heard of GSoC before, it funds university
 students to work on open source projects during the summer.
 Organizations, such as QEMU, can participate to attract students who
 will tackle projects for 12 weeks this summer.  The GSoC program has
 been very successful because it gives students real open source
 experience and organizations can grow their development community.
 
 QEMU has participated for several years and I would like to organize
 our participation this year.  Luiz was QEMU organization administrator
 last year and contacted me because he will not have time this year.  I
 will prepare the application form for QEMU so that we will be
 considered for 2012.
 
 Umbrella organization
 -
 Like last year, we can provide a home for KVM kernel module and
 libvirt projects too if those organizations prefer not to apply to
 GSoC themselves.  Please let us know so we can work together!

To maximise the spirit of collaboration between the libvirt and QEMU/KVM
communities I think it would make sense for us to work together under
the same GSoC Umbrella organization.

Regards,
Daniel


Re: [Qemu-devel] qemu-kvm upstreaming: Do we need -no-kvm-pit and -no-kvm-pit-reinjection semantics?

2012-01-20 Thread Daniel P. Berrange
On Fri, Jan 20, 2012 at 11:22:27AM +0100, Jan Kiszka wrote:
 On 2012-01-20 11:14, Marcelo Tosatti wrote:
  On Thu, Jan 19, 2012 at 07:01:44PM +0100, Jan Kiszka wrote:
  On 2012-01-19 18:53, Marcelo Tosatti wrote:
  What problems does it cause, and in which scenarios? Can't they be
  fixed?
 
  If the guest compensates for lost ticks, and KVM reinjects them, guest
  time advances faster then it should, to the extent where NTP fails to
  correct it. This is the case with RHEL4.
 
  But for example v2.4 kernel (or Windows with non-acpi HAL) do not
  compensate. In that case you want KVM to reinject.
 
  I don't know of any other way to fix this.
 
  OK, i see. The old unsolved problem of guessing what is being executed.
 
  Then the next question is how and where to control this. Conceptually,
  there should rather be a global switch say compensate for lost ticks of
  periodic timers: yes/no - instead of a per-timer knob. Didn't we
  discussed something like this before?
  
  I don't see the advantage of a global control versus per device
  control (in fact it lowers flexibility).
 
 Usability. Users should not have to care about individual tick-based
 clocks. They care about my OS requires lost ticks compensation, yes or no.

FYI, at the libvirt level we model policy against individual timers, for
example:

  clock offset=localtime
timer name=rtc tickpolicy=catchup track=guest/
timer name=pit tickpolicy=delay/
  /clock


Daniel


Re: [Qemu-devel] qemu-kvm upstreaming: Do we need -no-kvm-pit and -no-kvm-pit-reinjection semantics?

2012-01-20 Thread Daniel P. Berrange
On Fri, Jan 20, 2012 at 12:13:48PM +0100, Jan Kiszka wrote:
 On 2012-01-20 11:25, Daniel P. Berrange wrote:
  On Fri, Jan 20, 2012 at 11:22:27AM +0100, Jan Kiszka wrote:
  On 2012-01-20 11:14, Marcelo Tosatti wrote:
  On Thu, Jan 19, 2012 at 07:01:44PM +0100, Jan Kiszka wrote:
  On 2012-01-19 18:53, Marcelo Tosatti wrote:
  What problems does it cause, and in which scenarios? Can't they be
  fixed?
 
  If the guest compensates for lost ticks, and KVM reinjects them, guest
  time advances faster then it should, to the extent where NTP fails to
  correct it. This is the case with RHEL4.
 
  But for example v2.4 kernel (or Windows with non-acpi HAL) do not
  compensate. In that case you want KVM to reinject.
 
  I don't know of any other way to fix this.
 
  OK, i see. The old unsolved problem of guessing what is being executed.
 
  Then the next question is how and where to control this. Conceptually,
  there should rather be a global switch say compensate for lost ticks of
  periodic timers: yes/no - instead of a per-timer knob. Didn't we
  discussed something like this before?
 
  I don't see the advantage of a global control versus per device
  control (in fact it lowers flexibility).
 
  Usability. Users should not have to care about individual tick-based
  clocks. They care about my OS requires lost ticks compensation, yes or 
  no.
  
  FYI, at the libvirt level we model policy against individual timers, for
  example:
  
clock offset=localtime
  timer name=rtc tickpolicy=catchup track=guest/
  timer name=pit tickpolicy=delay/
/clock
 
 Are the various modes of tickpolicy fully specified somewhere?

There are some (not all that great) docs here:

  http://libvirt.org/formatdomain.html#elementsTime

The meaning of the 4 policies are:

  delay: continue to deliver at normal rate
catchup: deliver at higher rate to catchup
  merge: ticks merged into 1 single tick
discard: all missed ticks are discarded


The original design rationale was here, though beware that some things
changed between this design  the actual implementation libvirt has:

  https://www.redhat.com/archives/libvir-list/2010-March/msg00304.html

Regards,
Daniel


Re: [Qemu-devel] qemu-kvm upstreaming: Do we need -no-kvm-pit and -no-kvm-pit-reinjection semantics?

2012-01-20 Thread Daniel P. Berrange
On Fri, Jan 20, 2012 at 01:00:06PM +0100, Jan Kiszka wrote:
 On 2012-01-20 12:45, Daniel P. Berrange wrote:
  On Fri, Jan 20, 2012 at 12:13:48PM +0100, Jan Kiszka wrote:
  On 2012-01-20 11:25, Daniel P. Berrange wrote:
  On Fri, Jan 20, 2012 at 11:22:27AM +0100, Jan Kiszka wrote:
  On 2012-01-20 11:14, Marcelo Tosatti wrote:
  On Thu, Jan 19, 2012 at 07:01:44PM +0100, Jan Kiszka wrote:
  On 2012-01-19 18:53, Marcelo Tosatti wrote:
  What problems does it cause, and in which scenarios? Can't they be
  fixed?
 
  If the guest compensates for lost ticks, and KVM reinjects them, guest
  time advances faster then it should, to the extent where NTP fails to
  correct it. This is the case with RHEL4.
 
  But for example v2.4 kernel (or Windows with non-acpi HAL) do not
  compensate. In that case you want KVM to reinject.
 
  I don't know of any other way to fix this.
 
  OK, i see. The old unsolved problem of guessing what is being executed.
 
  Then the next question is how and where to control this. Conceptually,
  there should rather be a global switch say compensate for lost ticks 
  of
  periodic timers: yes/no - instead of a per-timer knob. Didn't we
  discussed something like this before?
 
  I don't see the advantage of a global control versus per device
  control (in fact it lowers flexibility).
 
  Usability. Users should not have to care about individual tick-based
  clocks. They care about my OS requires lost ticks compensation, yes or 
  no.
 
  FYI, at the libvirt level we model policy against individual timers, for
  example:
 
clock offset=localtime
  timer name=rtc tickpolicy=catchup track=guest/
  timer name=pit tickpolicy=delay/
/clock
 
  Are the various modes of tickpolicy fully specified somewhere?
  
  There are some (not all that great) docs here:
  
http://libvirt.org/formatdomain.html#elementsTime
  
  The meaning of the 4 policies are:
  
delay: continue to deliver at normal rate
 
 What does this mean? The timer stops ticking until the guest accepts its
 ticks again?

It means that the hypervisor will not attempt to do any compensation,
so the guest will see delays in its ticks being delivered and will gradually
drift over time.

  catchup: deliver at higher rate to catchup
merge: ticks merged into 1 single tick
  discard: all missed ticks are discarded
 
 But those interpretations aren't stated in the docs. That makes it hard
 to map them on individual hypervisors - or model proper new hypervisor
 interfaces accordingly.

That's not a real problem; now that I notice they are missing from the docs,
I can just add them in.


Daniel


Re: [Qemu-devel] qemu-kvm upstreaming: Do we need -no-kvm-pit and -no-kvm-pit-reinjection semantics?

2012-01-20 Thread Daniel P. Berrange
On Fri, Jan 20, 2012 at 01:51:20PM +0100, Jan Kiszka wrote:
 On 2012-01-20 13:42, Daniel P. Berrange wrote:
  On Fri, Jan 20, 2012 at 01:00:06PM +0100, Jan Kiszka wrote:
  On 2012-01-20 12:45, Daniel P. Berrange wrote:
  On Fri, Jan 20, 2012 at 12:13:48PM +0100, Jan Kiszka wrote:
  On 2012-01-20 11:25, Daniel P. Berrange wrote:
  On Fri, Jan 20, 2012 at 11:22:27AM +0100, Jan Kiszka wrote:
  On 2012-01-20 11:14, Marcelo Tosatti wrote:
  On Thu, Jan 19, 2012 at 07:01:44PM +0100, Jan Kiszka wrote:
  On 2012-01-19 18:53, Marcelo Tosatti wrote:
  What problems does it cause, and in which scenarios? Can't they be
  fixed?
 
  If the guest compensates for lost ticks, and KVM reinjects them, 
  guest
  time advances faster then it should, to the extent where NTP fails 
  to
  correct it. This is the case with RHEL4.
 
  But for example v2.4 kernel (or Windows with non-acpi HAL) do not
  compensate. In that case you want KVM to reinject.
 
  I don't know of any other way to fix this.
 
  OK, i see. The old unsolved problem of guessing what is being 
  executed.
 
  Then the next question is how and where to control this. 
  Conceptually,
  there should rather be a global switch say compensate for lost 
  ticks of
  periodic timers: yes/no - instead of a per-timer knob. Didn't we
  discussed something like this before?
 
  I don't see the advantage of a global control versus per device
  control (in fact it lowers flexibility).
 
  Usability. Users should not have to care about individual tick-based
  clocks. They care about my OS requires lost ticks compensation, yes 
  or no.
 
  FYI, at the libvirt level we model policy against individual timers, for
  example:
 
clock offset=localtime
  timer name=rtc tickpolicy=catchup track=guest/
  timer name=pit tickpolicy=delay/
/clock
 
  Are the various modes of tickpolicy fully specified somewhere?
 
  There are some (not all that great) docs here:
 
http://libvirt.org/formatdomain.html#elementsTime
 
  The meaning of the 4 policies are:
 
delay: continue to deliver at normal rate
 
  What does this mean? The timer stops ticking until the guest accepts its
  ticks again?
  
  It means that the hypervisor will not attempt to do any compensation,
  so the guest will see delays in its ticks being delivered  gradually
  drift over time.
 
 Still, is the logic as I described? Or what is the difference to discard.

With 'discard', the delayed tick will be thrown away. With 'delay', the
delayed tick will still be injected into the guest, possibly well after
the intended injection time, and there will be no attempt to
compensate by speeding up delivery of later ticks.
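To make the discard/delay/merge/catchup distinction concrete, here is a
toy model of a periodic timer whose guest cannot accept ticks for a while.
This is purely illustrative arithmetic following the policy descriptions
above, not QEMU's or KVM's actual implementation:

```python
def deliver_ticks(period, blocked_until, end, policy):
    """Toy model: ticks are due at period, 2*period, ... up to end; the
    guest cannot accept ticks before blocked_until.  Return delivery times."""
    due = list(range(period, end + 1, period))
    missed = [t for t in due if t < blocked_until]
    on_time = [t for t in due if t >= blocked_until]
    if policy == "discard":
        # missed ticks are simply thrown away
        return on_time
    if policy == "merge":
        # all missed ticks collapse into one tick at unblock time
        return ([blocked_until] if missed else []) + on_time
    if policy == "delay":
        # every missed tick is still injected, late, at the normal rate
        return [blocked_until] * len(missed) + on_time
    if policy == "catchup":
        # missed ticks are re-injected at a higher (here: double) rate
        step = period // 2
        return [blocked_until + i * step for i in range(len(missed))] + on_time
    raise ValueError(policy)
```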


Regards,
Daniel


Re: [Qemu-devel] qemu-kvm upstreaming: Do we need -no-kvm-pit and -no-kvm-pit-reinjection semantics?

2012-01-20 Thread Daniel P. Berrange
On Fri, Jan 20, 2012 at 02:02:03PM +0100, Jan Kiszka wrote:
 On 2012-01-20 13:54, Daniel P. Berrange wrote:
  On Fri, Jan 20, 2012 at 01:51:20PM +0100, Jan Kiszka wrote:
  On 2012-01-20 13:42, Daniel P. Berrange wrote:
  On Fri, Jan 20, 2012 at 01:00:06PM +0100, Jan Kiszka wrote:
  On 2012-01-20 12:45, Daniel P. Berrange wrote:
  On Fri, Jan 20, 2012 at 12:13:48PM +0100, Jan Kiszka wrote:
  On 2012-01-20 11:25, Daniel P. Berrange wrote:
  On Fri, Jan 20, 2012 at 11:22:27AM +0100, Jan Kiszka wrote:
  On 2012-01-20 11:14, Marcelo Tosatti wrote:
  On Thu, Jan 19, 2012 at 07:01:44PM +0100, Jan Kiszka wrote:
  On 2012-01-19 18:53, Marcelo Tosatti wrote:
  What problems does it cause, and in which scenarios? Can't they 
  be
  fixed?
 
  If the guest compensates for lost ticks, and KVM reinjects them, 
  guest
  time advances faster then it should, to the extent where NTP 
  fails to
  correct it. This is the case with RHEL4.
 
  But for example v2.4 kernel (or Windows with non-acpi HAL) do not
  compensate. In that case you want KVM to reinject.
 
  I don't know of any other way to fix this.
 
  OK, I see. The old unsolved problem of guessing what is being 
  executed.
 
  Then the next question is how and where to control this. 
  Conceptually,
  there should rather be a global switch, say "compensate for lost ticks of
  periodic timers: yes/no" - instead of a per-timer knob. Didn't we
  discuss something like this before?
 
  I don't see the advantage of a global control versus per device
  control (in fact it lowers flexibility).
 
  Usability. Users should not have to care about individual tick-based
  clocks. They care about "my OS requires lost ticks compensation, yes
  or no".
 
  FYI, at the libvirt level we model policy against individual timers, 
  for
  example:
 
<clock offset='localtime'>
  <timer name='rtc' tickpolicy='catchup' track='guest'/>
  <timer name='pit' tickpolicy='delay'/>
</clock>
 
  Are the various modes of tickpolicy fully specified somewhere?
 
  There are some (not all that great) docs here:
 
http://libvirt.org/formatdomain.html#elementsTime
 
  The meaning of the 4 policies are:
 
delay: continue to deliver at normal rate
 
  What does this mean? The timer stops ticking until the guest accepts its
  ticks again?
 
  It means that the hypervisor will not attempt to do any compensation,
  so the guest will see delays in its ticks being delivered & gradually
  drift over time.
 
  Still, is the logic as I described? Or what is the difference to discard.
  
  With 'discard', the delayed tick will be thrown away. In 'delay', the
  delayed tick will still be injected to the guest, possibly well after
  the intended injection time though, and there will be no attempt to
  compensate by speeding up delivery of later ticks.
 
 OK, let's see if I got it:
 
 delay   - all lost ticks are replayed in a row once the guest accepts
   them again
 catchup - lost ticks are gradually replayed at a higher frequency than
   the original tick
 merge   - at most one tick is replayed once the guest accepts it again
 discard - no lost ticks compensation

Yes, I think that is a good understanding.

Daniel


Re: SPEC-file for making RPMs (with rpmbuild)

2012-01-06 Thread Daniel P. Berrange
On Fri, Jan 06, 2012 at 11:11:21AM +0100, Guido Winkelmann wrote:
 Hi,
 
 Is there a spec-file somewhere for creating RPMs from the newest qemu-kvm 
 release?

The current Fedora RPM specfiles are always a good bet to start off with:

  http://pkgs.fedoraproject.org/gitweb/?p=qemu.git;a=blob;f=qemu.spec;hb=HEAD

By default these will build all QEMU targets, and a dedicated qemu-kvm
binary too. There is a flag to restrict it to x86 only for cases where
you don't want all archs.

Regards,
Daniel


Re: 5x slower guest disk performance with virtio disk

2011-12-15 Thread Daniel P. Berrange
On Thu, Dec 15, 2011 at 07:16:22PM +0200, Sasha Levin wrote:
 On Thu, 2011-12-15 at 11:55 -0500, Brian J. Murrell wrote:
  So, about 2/3 of host speed now -- which is much better.  Is 2/3 about
  normal or should I be looking for more? 
 
 aio=native
 
 That's the qemu setting, I'm not sure where libvirt hides that.

  <disk ...>
    <driver io='threads|native'/>
    ...
  </disk>
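
For reference, the same knob on a raw QEMU command line is the aio=
suboption of -drive (image path here is hypothetical); aio=native is
normally paired with cache=none so the image is opened with O_DIRECT:

```shell
qemu-kvm \
  -drive file=/var/lib/libvirt/images/guest.img,if=virtio,cache=none,aio=native
```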

Regards,
Daniel


Re: [libvirt] (no subject)

2011-12-07 Thread Daniel P. Berrange
On Wed, Dec 07, 2011 at 08:21:06AM +0200, Sasha Levin wrote:
 On Tue, 2011-12-06 at 14:38 +, Daniel P. Berrange wrote:
  On Fri, Nov 11, 2011 at 07:56:58PM +0800, Osier Yang wrote:
 * KVM tool manages the network completely itself (with DHCP support?),
   no way to configure, except specify the modes (user|tap|none). I
   have not test it yet, but it should need explicit script to setup
   the network rules(e.g. NAT) for the guest access outside world.
   Anyway, there is no way for libvirt to control the guest network.
  
  If KVM tool supports TAP devices, can't we do whatever we like with
  that just by passing in a configured TAP device from libvirt?
 
 KVM tool currently creates and configures the TAP devices it uses, it
 shouldn't be an issue to have it use a TAP fd passed to it either.
 
 How does libvirt do it? Create a /dev/tapX on its own and pass the fd
 to the hypervisor?

Yes, libvirt opens a /dev/tap device (or a macvtap device for VEPA
mode), adds it to the necessary bridge, and/or configures VEPA, etc
and then passes the FD to the hypervisor, with a ARGV parameter to
tell the HV which FD is being passed.

 * console connection is implemented by setup ptys in libvirt, 
   stdout/stderr
   of kvm tool process is redirected to the master pty, and libvirt 
   connects
   to the slave pty. This works fine now, but it might be better if kvm
   tool could provide more advanced console mechanisms. Just like QEMU
   does?
  
  This sounds good enough for now.
 
 KVM tools does a redirection to a PTY, which at that point could be
 redirected to anywhere the user wants.
 
 What features might be interesting to do on top of that?

I presume that Osier is just comparing with the features QEMU has available
for chardevs config, which include

 - PTYs
 - UNIX sockets
 - TCP sockets
 - UDP sockets
 - FIFO pipe
 - Plain file (output only obviously, but useful for logging)

libvirt doesn't specifically need any of them, but it can support those
options if they exist.
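
For example, the corresponding QEMU -chardev backends look like this
(ids and paths are hypothetical):

```shell
qemu-kvm \
  -chardev pty,id=serial0 \
  -chardev socket,id=mon0,path=/tmp/guest-mon.sock,server,nowait \
  -chardev file,id=log0,path=/var/log/guest-serial.log
```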

 * Not much ways existed yet for external apps or user to query the guest
   informations. But this might be changed soon per KVM tool grows up
   quickly.
  
  What sort of guest info are you thinking about ? The most immediate
  pieces of info I can imagine we need are
  
   - Mapping between PIDs and  vCPU threads
   - Current balloon driver value
 
 Those are pretty easily added using the IPC interface I've mentioned
 above. For example, 'kvm balloon' and 'kvm stat' will return a lot of
 info out of the balloon driver (including the memory stats VQ - which
 afaik we're probably the only ones who actually do that (but I might be
 wrong) :)

Ok, that sounds sufficient for the balloon info.

Regards,
Daniel


Re: [PATCH] kvm tools: Allow the user to pass a FD to use as a TAP device

2011-12-07 Thread Daniel P. Berrange
On Wed, Dec 07, 2011 at 06:28:12PM +0200, Pekka Enberg wrote:
 On Wed, Dec 7, 2011 at 11:37 AM, Sasha Levin levinsasha...@gmail.com wrote:
  This allows users to pass a pre-configured fd to use for the network
  interface.
 
  For example:
         kvm run -n mode=tap,fd=3 3</dev/net/tap3
 
  Cc: Daniel P. Berrange berra...@redhat.com
  Cc: Osier Yang jy...@redhat.com
  Signed-off-by: Sasha Levin levinsasha...@gmail.com
 
 Daniel, Osier, I assume this is useful for libvirt?

Yes, this works.

I don't know if kvmtool supports the VNET_HDR extension yet, but if it
does, then we can make libvirt pass in a pre-opened FD for that too.


Daniel


Re: [libvirt] (no subject)

2011-12-06 Thread Daniel P. Berrange
On Fri, Nov 11, 2011 at 07:56:58PM +0800, Osier Yang wrote:
 Hi, all
 
 This is a basic implementation of libvirt Native Linux KVM
 Tool driver. Note that this is just made with my own interest
 and spare time, it's not an endorsement/effort by Red Hat,
 and it isn't supported by Red Hat officially.
 
 Basically, the driver is designed as *stateful*, as KVM tool
 doesn't maintain any info about the guest except a socket which
 for its own IPC. And it's implemented by using KVM tool binary,
 which is named kvm currently, along with cgroup controllers
 cpuacct, and memory support. And as one of KVM tool's
 principles is to allow both the non-root and root user to play with it.
 The driver is designed to support root and non-root too, just
 like QEMU does. Example of the connection URI:
 
 virsh -c kvmtool:///system
 virsh -c kvmtool:///session
 virsh -c kvmtool+unix:///system
 virsh -c kvmtool+unix:///session
 
 The implementation can support more or less than 15 virsh commands
 currently, including basic domain cycle operations (define/undefine,
 start/destroy, suspend/resume, console, setmem, schedinfo, dumpxml,
 autostart, dominfo, etc.)
 
 About the domain configuration:
   * kernel: must be specified as KVM tool only support boots
  from the kernel currently (no integration with BIOS app yet).
 
   * disk: only virtio bus is supported, and device type must be 'disk'.
 
   * serial/console: only one console is supported, of type serial or
  virtio (can extend to support multiple console as long as kvm tool
 supports, libvirt already supports multiple consoles, see upstream
  commit 0873b688c).
 
   * p9fs: only support specifying the source dir, and mount tag, only
  type of 'mount' is supported.
 
   * memballoon: only virtio is supported, and there is no way
  to config the addr.
 
   * Multiple disk and p9fs is supported.
 
   * Graphics and network are not supported, will explain below.
 
 Please see [PATCH 7/8] for an example of the domain config. (which
 contains all the XMLs supported by current implementation).
 
 The problems of Native Linux KVM Tool from libvirt p.o.v:
 
   * Some distros package qemu-kvm as kvm, also kvm is a long
 established name for KVM itself, so naming the project as
 kvm might be not a good idea. I assume it will be named
 as kvmtool in this implementation, never mind this if you
 don't like that, it can be updated easily. :-)

Yeah, naming the binary 'kvm' is just madness. I'd strongly recommend
using 'kvmtool' as the binary name to avoid confusion with existing
'kvm' binaries based on QEMU.

   * It still doesn't have an official package yet, even no make install.
 means we have no way to check the dependency and do the checking
 when 'configure'. I assume it will be installed as /usr/bin/kvmtool
 in this implementation. This is the main reason which can prevents
 upstream libvirt accepting the patches I guess.

Ok, not really a problem - we do similar for the regular QEMU driver.

   * Lacks of options for user's configuration, such as -vnc, there
 is no option for user to configure the properties for the vnc,
 such as the port. It hides things, doesn't provide ways to query
 the properties too, this causes problems for libvirt to add the
 vnc support, as vnc clients such as virt-manager, virt-viewer,
 have no way to connect the guest. Even vncviewer can't.

Being able to specify a VNC port of libvirt's choosing is pretty
much mandatory to be able to support that. In addition, being able
to specify the bind address is important to be able to control
security. eg to only bind to 127.0.0.1, or only to certain NICs
in a multi-NIC host.

   * KVM tool manages the network completely itself (with DHCP support?),
 no way to configure, except specify the modes (user|tap|none). I
 have not test it yet, but it should need explicit script to setup
 the network rules(e.g. NAT) for the guest access outside world.
 Anyway, there is no way for libvirt to control the guest network.

If KVM tool supports TAP devices, can't we do whatever we like with
that just by passing in a configured TAP device from libvirt?

   * There is a gap about the domain status between KVM tool and libvirt,
 it's caused by KVM tool unlink()ing the guest socket when the user exits
 from the console (both text and graphic), but libvirt still thinks the
 guest is running.

Being able to reliably detect shutdown/exit of the KVM tool is
a very important task, and we can't rely on waitpid/SIG_CHLD
because we want to daemonize all instances wrt libvirtd.

In the QEMU driver we keep open a socket to the monitor, and
when we see an I/O error / POLLHUP on the socket we know that
QEMU has quit.

What is this guest socket used for ? Could libvirt keep open a
connection to it ?

One other option would be to use inotify to watch for deletion
of the guest socket in the filesystem. This is sort of what we
do with the UML 

Re: [libvirt] [PATCH] kvm tools: Introduce an ENV variable for the state dir

2011-12-06 Thread Daniel P. Berrange
On Fri, Nov 11, 2011 at 07:57:00PM +0800, Osier Yang wrote:
 Which is named as KVMTOOL_STATE_DIR, so that the user can
 configure the path of state directly as he wants.
 ---
  tools/kvm/main.c |7 ++-
  1 files changed, 6 insertions(+), 1 deletions(-)
 
 diff --git a/tools/kvm/main.c b/tools/kvm/main.c
 index 05bc82c..37b2b1d 100644
 --- a/tools/kvm/main.c
 +++ b/tools/kvm/main.c
 @@ -13,7 +13,12 @@ static int handle_kvm_command(int argc, char **argv)
  
  int main(int argc, char *argv[])
  {
 - kvm__set_dir("%s/%s", HOME_DIR, KVM_PID_FILE_PATH);
 + char *state_dir = getenv("KVMTOOL_STATE_DIR");
 +
 + if (state_dir)
 + kvm__set_dir("%s", state_dir);
 + else
 + kvm__set_dir("%s/%s", HOME_DIR, KVM_PID_FILE_PATH);
  
   return handle_kvm_command(argc - 1, argv[1]);
  }

As per my comments in the first patch, I don't think this is critical
for libvirt's needs. We should just honour the default location that
the KVM tool uses, rather than forcing a libvirt specific location.


Daniel


Re: [libvirt] [PATCH 5/7] kvmtool: Add new domain type

2011-12-06 Thread Daniel P. Berrange
On Fri, Nov 11, 2011 at 07:57:04PM +0800, Osier Yang wrote:
 It's named as kvmtool.
 ---
  src/conf/domain_conf.c |4 +++-
  src/conf/domain_conf.h |1 +
  2 files changed, 4 insertions(+), 1 deletions(-)
 
 diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
 index 58f4d0f..55121d8 100644
 --- a/src/conf/domain_conf.c
 +++ b/src/conf/domain_conf.c
 @@ -91,7 +91,8 @@ VIR_ENUM_IMPL(virDomainVirt, VIR_DOMAIN_VIRT_LAST,
"hyperv",
"vbox",
"one",
 -  "phyp")
 +  "phyp",
 +  "kvmtool")
  
  VIR_ENUM_IMPL(virDomainBoot, VIR_DOMAIN_BOOT_LAST,
fd,
 @@ -4018,6 +4019,7 @@ virDomainChrDefParseXML(virCapsPtr caps,
  if (type == NULL) {
  def-source.type = VIR_DOMAIN_CHR_TYPE_PTY;
  } else if ((def-source.type = virDomainChrTypeFromString(type))  0) {
 +VIR_WARN("type = %s", type);
  virDomainReportError(VIR_ERR_XML_ERROR,
   _("unknown type presented to host for character "
 "device: %s"),
   type);
 diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
 index a3cb834..001bc46 100644
 --- a/src/conf/domain_conf.h
 +++ b/src/conf/domain_conf.h
 @@ -59,6 +59,7 @@ enum virDomainVirtType {
  VIR_DOMAIN_VIRT_VBOX,
  VIR_DOMAIN_VIRT_ONE,
  VIR_DOMAIN_VIRT_PHYP,
 +VIR_DOMAIN_VIRT_KVMTOOL,
  
  VIR_DOMAIN_VIRT_LAST,
  };

IMHO this patch is not required. The domain type is referring to the
hypervisor used for the domain, which is still 'kvm'. What is different
here is just the userspace device model.  If you look at the 3 different
Xen user spaces we support, all of them use domain type='xen' still.
So just use  domain type='kvm' here for kvmtool.


Regards,
Daniel


Re: [libvirt] [PATCH 2/7] kvmtool: Add documents

2011-12-06 Thread Daniel P. Berrange
On Fri, Nov 11, 2011 at 07:57:01PM +0800, Osier Yang wrote:
 The document is rather rough now, but at least contains an domain
 config example of all the current supported XMLs, and tells how to
 play with the driver.
 ---
  docs/drivers.html.in|1 +
  docs/drvkvmtool.html.in |   87 
 +++
  docs/index.html.in  |3 ++
  docs/sitemap.html.in|4 ++
  src/README  |3 +-
  5 files changed, 97 insertions(+), 1 deletions(-)
  create mode 100644 docs/drvkvmtool.html.in
 
 diff --git a/docs/drivers.html.in b/docs/drivers.html.in
 index 75038fc..249c137 100644
 --- a/docs/drivers.html.in
 +++ b/docs/drivers.html.in
 @@ -29,6 +29,7 @@
   <li><strong><a href="drvvmware.html">VMware Workstation/Player</a></strong></li>
   <li><strong><a href="drvxen.html">Xen</a></strong></li>
   <li><strong><a href="drvhyperv.html">Microsoft Hyper-V</a></strong></li>
 +  <li><strong><a href="drvkvmtool.html">Native Linux KVM Tool</a></strong></li>
   </ul>
  
   <h2><a name="stroage">Storage drivers</a></h2>
 diff --git a/docs/drvkvmtool.html.in b/docs/drvkvmtool.html.in
 new file mode 100644
 index 000..1b6acdf
 --- /dev/null
 +++ b/docs/drvkvmtool.html.in
 @@ -0,0 +1,87 @@
 +<html>
 +  <body>
 +    <h1>KVM tool driver</h1>
 +
 +    <ul id="toc"></ul>
 +
 +    <p>
 +      The libvirt KVMTOOL driver manages hypervisor Native Linux KVM Tool,
 +      it's implemented by using command line of kvm tool binary.
 +    </p>
 +
 +    <h2><a name="project">Project Links</a></h2>
 +
 +    <ul>
 +      <li>
 +        The <a href="git://github.com/penberg/linux-kvm.git">Native Linux KVM Tool</a> Native
 +        Linux KVM Tool
 +      </li>
 +    </ul>
 +
 +    <h2><a name="uris">Connections to the KVMTOOL driver</a></h2>
 +    <p>
 +      The libvirt KVMTOOL driver is a multi-instance driver, providing a single
 +      system wide privileged driver (the system instance), and per-user
 +      unprivileged drivers (the session instance). The URI driver protocol
 +      is kvmtool. Some example connection URIs for the libvirt driver are:
 +    </p>
 +
 +    <pre>
 +      kvmtool:///session  (local access to per-user instance)
 +      kvmtool+unix:///session (local access to per-user instance)
 +
 +      kvmtool:///system   (local access to system instance)
 +      kvmtool+unix:///system  (local access to system instance)
 +    </pre>
 +    <p>
 +      cgroups controllers cpuacct, and memory are supported currently.
 +    </p>
 +
 +  <h3>Example config</h3>
 +
 +  <pre>
 +&lt;domain type='kvmtool' id='1'&gt;

As mentioned in a later patch, we should just use type='kvm' here still


Daniel


Re: [libvirt] [PATCH 3/7] kvmtool: Add new enums and error codes for the driver

2011-12-06 Thread Daniel P. Berrange
On Fri, Nov 11, 2011 at 07:57:02PM +0800, Osier Yang wrote:
 ---
  include/libvirt/virterror.h |1 +
  src/driver.h|1 +
  src/util/virterror.c|3 +++
  3 files changed, 5 insertions(+), 0 deletions(-)
 
 diff --git a/include/libvirt/virterror.h b/include/libvirt/virterror.h
 index a8549b7..deda42d 100644
 --- a/include/libvirt/virterror.h
 +++ b/include/libvirt/virterror.h
 @@ -84,6 +84,7 @@ typedef enum {
  VIR_FROM_LIBXL = 41, /* Error from libxenlight driver */
  VIR_FROM_LOCKING = 42,   /* Error from lock manager */
  VIR_FROM_HYPERV = 43,/* Error from Hyper-V driver */
 +VIR_FROM_KVMTOOL = 44,   /* Error from kvm tool driver */
  } virErrorDomain;
  
  
 diff --git a/src/driver.h b/src/driver.h
 index 4c14aaa..158a13c 100644
 --- a/src/driver.h
 +++ b/src/driver.h
 @@ -30,6 +30,7 @@ typedef enum {
  VIR_DRV_VMWARE = 13,
  VIR_DRV_LIBXL = 14,
  VIR_DRV_HYPERV = 15,
 +VIR_DRV_KVMTOOL = 16,
  } virDrvNo;
  
  
 diff --git a/src/util/virterror.c b/src/util/virterror.c
 index 5006fa2..abb5b5a 100644
 --- a/src/util/virterror.c
 +++ b/src/util/virterror.c
 @@ -175,6 +175,9 @@ static const char *virErrorDomainName(virErrorDomain 
 domain) {
  case VIR_FROM_HYPERV:
  dom = "Hyper-V ";
  break;
 +case VIR_FROM_KVMTOOL:
 +dom = "KVMTOOL ";
 +break;
  }
  return(dom);
  }

Trivial, ACK

Daniel


Re: [libvirt] [PATCH 4/7] kvmtool: Add hook support for kvmtool domain

2011-12-06 Thread Daniel P. Berrange
On Fri, Nov 11, 2011 at 07:57:03PM +0800, Osier Yang wrote:
 Just like QEMU and LXC, kvm driver intends to support running hook
 script before domain starting and after domain shutdown too.
 ---
  src/util/hooks.c |   11 ++-
  src/util/hooks.h |8 
  2 files changed, 18 insertions(+), 1 deletions(-)
 
 diff --git a/src/util/hooks.c b/src/util/hooks.c
 index 110a94b..765cb68 100644
 --- a/src/util/hooks.c
 +++ b/src/util/hooks.c
 @@ -52,12 +52,14 @@ VIR_ENUM_DECL(virHookDaemonOp)
  VIR_ENUM_DECL(virHookSubop)
  VIR_ENUM_DECL(virHookQemuOp)
  VIR_ENUM_DECL(virHookLxcOp)
 +VIR_ENUM_DECL(virHookKvmToolOp)
  
  VIR_ENUM_IMPL(virHookDriver,
VIR_HOOK_DRIVER_LAST,
"daemon",
"qemu",
 -  "lxc")
 +  "lxc",
 +  "kvmtool")
  
  VIR_ENUM_IMPL(virHookDaemonOp, VIR_HOOK_DAEMON_OP_LAST,
"start",
 @@ -79,6 +81,10 @@ VIR_ENUM_IMPL(virHookLxcOp, VIR_HOOK_LXC_OP_LAST,
"start",
"stopped")
  
 +VIR_ENUM_IMPL(virHookKvmToolOp, VIR_HOOK_KVMTOOL_OP_LAST,
 +  "start",
 +  "stopped")
 +
  static int virHooksFound = -1;
  
  /**
 @@ -230,6 +236,9 @@ virHookCall(int driver, const char *id, int op, int 
 sub_op, const char *extra,
  case VIR_HOOK_DRIVER_LXC:
  opstr = virHookLxcOpTypeToString(op);
  break;
 +case VIR_HOOK_DRIVER_KVMTOOL:
 +opstr = virHookKvmToolOpTypeToString(op);
 +break;
  }
  if (opstr == NULL) {
  virHookReportError(VIR_ERR_INTERNAL_ERROR,
 diff --git a/src/util/hooks.h b/src/util/hooks.h
 index fd7411c..69081c4 100644
 --- a/src/util/hooks.h
 +++ b/src/util/hooks.h
 @@ -31,6 +31,7 @@ enum virHookDriverType {
  VIR_HOOK_DRIVER_DAEMON = 0,/* Daemon related events */
  VIR_HOOK_DRIVER_QEMU,  /* QEmu domains related events */
  VIR_HOOK_DRIVER_LXC,   /* LXC domains related events */
 +VIR_HOOK_DRIVER_KVMTOOL,   /* KVMTOOL domains related events */
  
  VIR_HOOK_DRIVER_LAST,
  };
 @@ -67,6 +68,13 @@ enum virHookLxcOpType {
  VIR_HOOK_LXC_OP_LAST,
  };
  
 +enum virHookKvmToolOpType {
 +VIR_HOOK_KVMTOOL_OP_START,/* domain is about to start */
 +VIR_HOOK_KVMTOOL_OP_STOPPED,  /* domain has stopped */
 +
 +VIR_HOOK_KVMTOOL_OP_LAST,
 +};
 +
  int virHookInitialize(void);
  
  int virHookPresent(int driver);

Trivial, ACK


Daniel


Re: [libvirt] [PATCH 7/7] kvmtool: Implementation for kvm tool driver

2011-12-06 Thread Daniel P. Berrange
On Fri, Nov 11, 2011 at 07:57:06PM +0800, Osier Yang wrote:
 Basically, the drivers is implemented by using kvm tool binary
 currently, (see ./kvm help for more info).
 
 Current implementation supports define/undefine, start/destroy/,
 suspend/resume, connect to guest console via virsh console,
 and balloon memory with with virsh setmem (using ./kvm balloon
 command). Also as it supports cgroup controllers cpuacct, and
 memory, so some other commands like schedinfo, memtune can
 also work. Some other commands such as domid, domname, dumpxml
 ,autostart, etc. are supported, as the driver is designed
 as a stateful driver, those APIs just need to talk with libvirtd
 simply.
 
 As Native Linux KVM Tool is designed for both non-root and root users,
 the driver is designed just like QEMU, supports two modes of the
 connection:
 
 kvmtool:///system
 kvmtool+unix:///system
 
 kvmtool:///session
 kvmtool+unix:///session
 
 An example of the domain XML (all the XMLs supported currently are
 listed):
 
 % virsh -c kvm:///system dumpxml kvm_test
 <domain type='kvmtool'>
   <name>kvm_test</name>
   <uuid>88bf38f1-b6ab-cfa6-ab53-4b4c0993d894</uuid>
   <memory>524288</memory>
   <currentMemory>524288</currentMemory>
   <vcpu>1</vcpu>
   <os>
     <type arch='x86_64'>hvm</type>
     <kernel>/boot/bzImage</kernel>
     <boot dev='hd'/>
   </os>
   <clock offset='utc'/>
   <on_poweroff>destroy</on_poweroff>
   <on_reboot>restart</on_reboot>
   <on_crash>restart</on_crash>
   <devices>
     <emulator>/usr/bin/kvmtool</emulator>
     <disk type='file' device='disk'>
       <source file='/var/lib/libvirt/images/linux-0.2.img'/>
       <target dev='vda' bus='virtio'/>
     </disk>
     <filesystem type='mount' accessmode='passthrough'>
       <source dir='/tmp'/>
       <target dir='/mnt'/>
     </filesystem>
     <console type='pty'>
       <target type='serial' port='0'/>
     </console>
     <memballoon model='virtio'/>
   </devices>
 </domain>
 ---
  cfg.mk   |1 +
  daemon/Makefile.am   |4 +
  daemon/libvirtd.c|7 +
  po/POTFILES.in   |2 +
  src/Makefile.am  |   36 +-
  src/kvmtool/kvmtool_conf.c   |  130 ++
  src/kvmtool/kvmtool_conf.h   |   66 +
  src/kvmtool/kvmtool_driver.c | 3079 
 ++
  src/kvmtool/kvmtool_driver.h |   29 +

My main suggestion here would be to split up the kvmtool_driver.c
file into 3 parts as we did with the QEMU driver.

  kvmtool_driver.c   - Basic libvirt API glue
  kvmtool_command.c  - ARGV generation
  kvmtool_process.c  - KVMtool process start/stop/autostart/autodestroy

Regards,
Daniel


Re: [Qemu-devel] KVM call minutes for November 29

2011-11-30 Thread Daniel P. Berrange
On Wed, Nov 30, 2011 at 11:22:37AM +0200, Alon Levy wrote:
 On Tue, Nov 29, 2011 at 04:59:51PM -0600, Anthony Liguori wrote:
  On 11/29/2011 10:59 AM, Avi Kivity wrote:
  On 11/29/2011 05:51 PM, Juan Quintela wrote:
  How to do high level stuff?
  - python?
  
  
  One of the disadvantages of the various scripting languages is the lack
  of static type checking, which makes it harder to do full sweeps of the
  source for API changes, relying on the compiler to catch type (or other)
  errors.
  
  This is less interesting to me (figuring out the perfectest language to 
  use).
  
  I think what's more interesting is the practical execution of
  something like this.  Just assuming we used python (since that's
  what I know best), I think we could do something like this:
  
  1) We could write a binding layer to expose the QMP interface as a
  python module.  This would be very little binding code but would
  bring a bunch of functionality to python bits.
 
 If going this route, I would propose to use gobject-introspection [1]
 instead of directly binding to python. You should be able to get
 multiple languages support this way, including python. I think it
 requires using glib 3.0, but I haven't tested it myself (yet). Maybe
 someone more knowledgable can shoot it down.
 
 [1] http://live.gnome.org/GObjectIntrospection/
 
 Actually this might make sense for the whole of QEMU. I think for a
 defined interface like QMP implementing the interface directly in python
 makes more sense. But having qemu itself GObject'ified and scriptable
 is cool. It would also lend itself to 4) without going through 2), but
 also make 2) possible (with any language, not just python).

I think taking advantage of GObject introspection is fine idea - I
certainly don't want to manually create python (or any other language)
bindings for any C code ever again. GObject + introspection takes away
all the burden of supporting access to C code from non-C languages.
Given that QEMU has already adopted GLib as mandatory infrastructure,
going down the GObject route seems like a very natural fit/direction
to take.

If people like the idea of a higher level language for QEMU, but are
concerned about performance / overhead of embedding a scripting
language in QEMU, then GObject introspection opens the possibilty of
writing in Vala, which is a higher level language which compiles
straight down to machine code like C does.

Regards,
Daniel


Re: [Qemu-devel] qemu and qemu.git - Migration + disk stress introduces qcow2 corruptions

2011-11-14 Thread Daniel P. Berrange
On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
 On 11/11/2011 12:15 PM, Kevin Wolf wrote:
  Am 10.11.2011 22:30, schrieb Anthony Liguori:
   Live migration with qcow2 or any other image format is just not going to 
   work 
   right now even with proper clustered storage.  I think doing a block 
   level flush 
   cache interface and letting block devices decide how to do it is the best 
   approach.
 
  I would really prefer reusing the existing open/close code. It means
  less (duplicated) code, is existing code that is well tested and doesn't
  make migration much of a special case.
 
  If you want to avoid reopening the file on the OS level, we can reopen
  only the topmost layer (i.e. the format, but not the protocol) for now
  and in 1.1 we can use bdrv_reopen().
 
 
 Intuitively I dislike _reopen style interfaces.  If the second open
 yields different results from the first, does it invalidate any
 computations in between?
 
 What's wrong with just delaying the open?

If you delay the 'open' until the mgmt app issues 'cont', then you lose
the ability to rollback to the source host upon open failure for most
deployed versions of libvirt. We only fairly recently switched to a
five-stage migration handshake to cope with rollback when 'cont' fails.

Daniel


Re: [Qemu-devel] qemu and qemu.git - Migration + disk stress introduces qcow2 corruptions

2011-11-14 Thread Daniel P. Berrange
On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
 On Mon, Nov 14, 2011 at 10:16:10AM +, Daniel P. Berrange wrote:
  On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
   On 11/11/2011 12:15 PM, Kevin Wolf wrote:
Am 10.11.2011 22:30, schrieb Anthony Liguori:
 Live migration with qcow2 or any other image format is just not going 
 to work 
 right now even with proper clustered storage.  I think doing a block 
 level flush 
 cache interface and letting block devices decide how to do it is the 
 best approach.
   
I would really prefer reusing the existing open/close code. It means
less (duplicated) code, is existing code that is well tested and doesn't
make migration much of a special case.
   
If you want to avoid reopening the file on the OS level, we can reopen
only the topmost layer (i.e. the format, but not the protocol) for now
and in 1.1 we can use bdrv_reopen().
   
   
   Intuitively I dislike _reopen style interfaces.  If the second open
   yields different results from the first, does it invalidate any
   computations in between?
   
   What's wrong with just delaying the open?
  
  If you delay the 'open' until the mgmt app issues 'cont', then you loose
  the ability to rollback to the source host upon open failure for most
  deployed versions of libvirt. We only fairly recently switched to a five
  stage migration handshake to cope with rollback when 'cont' fails.
  
  Daniel
 
 I guess reopen can fail as well, so this seems to me to be an important
 fix but not a blocker.

If the initial open succeeds, then it is far more likely that a later
re-open will succeed too, because you have already eliminated the possibility
of configuration mistakes, and will have caught most storage runtime errors
too. So there is a very significant difference in reliability between doing
an 'open at startup + reopen at cont' vs just 'open at cont'.

Based on the bug reports I see, we want to be very good at detecting and
gracefully handling open errors because they are pretty frequent.

Regards,
Daniel


Re: [Qemu-devel] qemu and qemu.git - Migration + disk stress introduces qcow2 corruptions

2011-11-14 Thread Daniel P. Berrange
On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote:
 Am 14.11.2011 12:08, schrieb Daniel P. Berrange:
  On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
  On Mon, Nov 14, 2011 at 10:16:10AM +, Daniel P. Berrange wrote:
  On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
  On 11/11/2011 12:15 PM, Kevin Wolf wrote:
  Am 10.11.2011 22:30, schrieb Anthony Liguori:
  Live migration with qcow2 or any other image format is just not going 
  to work 
  right now even with proper clustered storage.  I think doing a block 
  level flush 
  cache interface and letting block devices decide how to do it is the 
  best approach.
 
  I would really prefer reusing the existing open/close code. It means
  less (duplicated) code, is existing code that is well tested and doesn't
  make migration much of a special case.
 
  If you want to avoid reopening the file on the OS level, we can reopen
  only the topmost layer (i.e. the format, but not the protocol) for now
  and in 1.1 we can use bdrv_reopen().
 
 
  Intuitively I dislike _reopen style interfaces.  If the second open
  yields different results from the first, does it invalidate any
  computations in between?
 
  What's wrong with just delaying the open?
 
  If you delay the 'open' until the mgmt app issues 'cont', then you loose
  the ability to rollback to the source host upon open failure for most
  deployed versions of libvirt. We only fairly recently switched to a five
  stage migration handshake to cope with rollback when 'cont' fails.
 
  Daniel
 
  I guess reopen can fail as well, so this seems to me to be an important
  fix but not a blocker.
  
  If if the initial open succeeds, then it is far more likely that a later
  re-open will succeed too, because you have already elminated the possibility
  of configuration mistakes, and will have caught most storage runtime errors
  too. So there is a very significant difference in reliability between doing
  an 'open at startup + reopen at cont' vs just 'open at cont'
  
  Based on the bug reports I see, we want to be very good at detecting and
  gracefully handling open errors because they are pretty frequent.
 
 Do you have some more details on the kind of errors? Missing files,
 permissions, something like this? Or rather something related to the
 actual content of an image file?

Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI
setup. Access permissions due to incorrect user / group setup, or read
only mounts, or SELinux denials. Actual I/O errors are less common and
are not so likely to cause QEMU to fail to start at all, since QEMU is
likely to just report them to the guest OS instead.


Daniel


Re: [Qemu-devel] qemu and qemu.git - Migration + disk stress introduces qcow2 corruptions

2011-11-14 Thread Daniel P. Berrange
On Mon, Nov 14, 2011 at 01:34:15PM +0200, Michael S. Tsirkin wrote:
 On Mon, Nov 14, 2011 at 11:29:18AM +, Daniel P. Berrange wrote:
  On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote:
   Am 14.11.2011 12:08, schrieb Daniel P. Berrange:
On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
On Mon, Nov 14, 2011 at 10:16:10AM +, Daniel P. Berrange wrote:
On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
On 11/11/2011 12:15 PM, Kevin Wolf wrote:
Am 10.11.2011 22:30, schrieb Anthony Liguori:
Live migration with qcow2 or any other image format is just not 
going to work 
right now even with proper clustered storage.  I think doing a 
block level flush 
cache interface and letting block devices decide how to do it is 
the best approach.
   
I would really prefer reusing the existing open/close code. It means
less (duplicated) code, is existing code that is well tested and 
doesn't
make migration much of a special case.
   
If you want to avoid reopening the file on the OS level, we can 
reopen
only the topmost layer (i.e. the format, but not the protocol) for 
now
and in 1.1 we can use bdrv_reopen().
   
   
Intuitively I dislike _reopen style interfaces.  If the second open
yields different results from the first, does it invalidate any
computations in between?
   
What's wrong with just delaying the open?
   
If you delay the 'open' until the mgmt app issues 'cont', then you 
loose
the ability to rollback to the source host upon open failure for most
deployed versions of libvirt. We only fairly recently switched to a 
five
stage migration handshake to cope with rollback when 'cont' fails.
   
Daniel
   
I guess reopen can fail as well, so this seems to me to be an important
fix but not a blocker.

If if the initial open succeeds, then it is far more likely that a later
re-open will succeed too, because you have already elminated the 
possibility
of configuration mistakes, and will have caught most storage runtime 
errors
too. So there is a very significant difference in reliability between 
doing
an 'open at startup + reopen at cont' vs just 'open at cont'

Based on the bug reports I see, we want to be very good at detecting and
gracefully handling open errors because they are pretty frequent.
   
   Do you have some more details on the kind of errors? Missing files,
   permissions, something like this? Or rather something related to the
   actual content of an image file?
  
  Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI
  setup. Access permissions due to incorrect user / group setup, or read
  only mounts, or SELinux denials. Actual I/O errors are less common and
  are not so likely to cause QEMU to fail to start any, since QEMU is
  likely to just report them to the guest OS instead.
 
 Do you run qemu with -S, then give a 'cont' command to start it?

Yes

Daniel


Re: [Qemu-devel] qemu and qemu.git - Migration + disk stress introduces qcow2 corruptions

2011-11-14 Thread Daniel P. Berrange
On Mon, Nov 14, 2011 at 01:51:40PM +0200, Michael S. Tsirkin wrote:
 On Mon, Nov 14, 2011 at 11:37:27AM +, Daniel P. Berrange wrote:
  On Mon, Nov 14, 2011 at 01:34:15PM +0200, Michael S. Tsirkin wrote:
   On Mon, Nov 14, 2011 at 11:29:18AM +, Daniel P. Berrange wrote:
On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote:
 Am 14.11.2011 12:08, schrieb Daniel P. Berrange:
  On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
  On Mon, Nov 14, 2011 at 10:16:10AM +, Daniel P. Berrange wrote:
  On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
  On 11/11/2011 12:15 PM, Kevin Wolf wrote:
  Am 10.11.2011 22:30, schrieb Anthony Liguori:
  Live migration with qcow2 or any other image format is just 
  not going to work 
  right now even with proper clustered storage.  I think doing a 
  block level flush 
  cache interface and letting block devices decide how to do it 
  is the best approach.
 
  I would really prefer reusing the existing open/close code. It 
  means
  less (duplicated) code, is existing code that is well tested 
  and doesn't
  make migration much of a special case.
 
  If you want to avoid reopening the file on the OS level, we can 
  reopen
  only the topmost layer (i.e. the format, but not the protocol) 
  for now
  and in 1.1 we can use bdrv_reopen().
 
 
  Intuitively I dislike _reopen style interfaces.  If the second 
  open
  yields different results from the first, does it invalidate any
  computations in between?
 
  What's wrong with just delaying the open?
 
  If you delay the 'open' until the mgmt app issues 'cont', then 
  you loose
  the ability to rollback to the source host upon open failure for 
  most
  deployed versions of libvirt. We only fairly recently switched to 
  a five
  stage migration handshake to cope with rollback when 'cont' fails.
 
  Daniel
 
  I guess reopen can fail as well, so this seems to me to be an 
  important
  fix but not a blocker.
  
  If if the initial open succeeds, then it is far more likely that a 
  later
  re-open will succeed too, because you have already elminated the 
  possibility
  of configuration mistakes, and will have caught most storage 
  runtime errors
  too. So there is a very significant difference in reliability 
  between doing
  an 'open at startup + reopen at cont' vs just 'open at cont'
  
  Based on the bug reports I see, we want to be very good at 
  detecting and
  gracefully handling open errors because they are pretty frequent.
 
 Do you have some more details on the kind of errors? Missing files,
 permissions, something like this? Or rather something related to the
 actual content of an image file?

Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI
setup. Access permissions due to incorrect user / group setup, or read
only mounts, or SELinux denials. Actual I/O errors are less common and
are not so likely to cause QEMU to fail to start any, since QEMU is
likely to just report them to the guest OS instead.
   
   Do you run qemu with -S, then give a 'cont' command to start it?
 
 Probably in an attempt to improve reliability :)

Not really. We can't simply let QEMU start its own CPUs, because there are
various tasks that need performing after the migration transfer finishes,
but before the CPUs are allowed to be started, e.g.:

 - Finish 802.1Qb{g,h} (VEPA) network port profile association on target
 - Release leases for any resources associated with the source QEMU
   via a configured lock manager (eg sanlock)
 - Acquire leases for any resources associated with the target QEMU
   via a configured lock manager (eg sanlock)

Daniel


Re: [Qemu-devel] qemu and qemu.git - Migration + disk stress introduces qcow2 corruptions

2011-11-14 Thread Daniel P. Berrange
On Mon, Nov 14, 2011 at 01:56:36PM +0200, Michael S. Tsirkin wrote:
 On Mon, Nov 14, 2011 at 11:37:27AM +, Daniel P. Berrange wrote:
  On Mon, Nov 14, 2011 at 01:34:15PM +0200, Michael S. Tsirkin wrote:
   On Mon, Nov 14, 2011 at 11:29:18AM +, Daniel P. Berrange wrote:
On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote:
 Am 14.11.2011 12:08, schrieb Daniel P. Berrange:
  On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
  On Mon, Nov 14, 2011 at 10:16:10AM +, Daniel P. Berrange wrote:
  On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
  On 11/11/2011 12:15 PM, Kevin Wolf wrote:
  Am 10.11.2011 22:30, schrieb Anthony Liguori:
  Live migration with qcow2 or any other image format is just 
  not going to work 
  right now even with proper clustered storage.  I think doing a 
  block level flush 
  cache interface and letting block devices decide how to do it 
  is the best approach.
 
  I would really prefer reusing the existing open/close code. It 
  means
  less (duplicated) code, is existing code that is well tested 
  and doesn't
  make migration much of a special case.
 
  If you want to avoid reopening the file on the OS level, we can 
  reopen
  only the topmost layer (i.e. the format, but not the protocol) 
  for now
  and in 1.1 we can use bdrv_reopen().
 
 
  Intuitively I dislike _reopen style interfaces.  If the second 
  open
  yields different results from the first, does it invalidate any
  computations in between?
 
  What's wrong with just delaying the open?
 
  If you delay the 'open' until the mgmt app issues 'cont', then 
  you loose
  the ability to rollback to the source host upon open failure for 
  most
  deployed versions of libvirt. We only fairly recently switched to 
  a five
  stage migration handshake to cope with rollback when 'cont' fails.
 
  Daniel
 
  I guess reopen can fail as well, so this seems to me to be an 
  important
  fix but not a blocker.
  
  If if the initial open succeeds, then it is far more likely that a 
  later
  re-open will succeed too, because you have already elminated the 
  possibility
  of configuration mistakes, and will have caught most storage 
  runtime errors
  too. So there is a very significant difference in reliability 
  between doing
  an 'open at startup + reopen at cont' vs just 'open at cont'
  
  Based on the bug reports I see, we want to be very good at 
  detecting and
  gracefully handling open errors because they are pretty frequent.
 
 Do you have some more details on the kind of errors? Missing files,
 permissions, something like this? Or rather something related to the
 actual content of an image file?

Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI
setup. Access permissions due to incorrect user / group setup, or read
only mounts, or SELinux denials. Actual I/O errors are less common and
are not so likely to cause QEMU to fail to start any, since QEMU is
likely to just report them to the guest OS instead.
   
   Do you run qemu with -S, then give a 'cont' command to start it?
  
  Yes
 
 OK, so let's go back one step now - how is this related to
 'rollback to source host'?

In the old libvirt migration protocol, by the time we run 'cont' on the
destination, the source QEMU has already been killed off, so there's
nothing to resume on failure.

Daniel


Re: [Qemu-devel] qemu and qemu.git - Migration + disk stress introduces qcow2 corruptions

2011-11-10 Thread Daniel P. Berrange
On Thu, Nov 10, 2011 at 12:27:30PM -0600, Anthony Liguori wrote:
 What does libvirt actually do in the monitor prior to migration
 completing on the destination?  The least invasive way of doing
 delayed open of block devices is probably to make -incoming create a
 monitor and run a main loop before the block devices (and full
 device model) is initialized.  Since this isolates the changes
 strictly to migration, I'd feel okay doing this for 1.0 (although it
 might need to be in the stable branch).

The way migration works with libvirt wrt QEMU interactions is now
as follows

 1. Destination.
   Run   qemu -incoming ...args...
   Query chardevs via monitor
   Query vCPU threads via monitor
   Set disk / vnc passwords
   Set netdev link states
   Set balloon target

 2. Source
   Set  migration speed
   Set  migration max downtime
   Run  migrate command (detached)
   while 1
  Query migration status
  if status is failed or success
break;

 3. Destination
  If final status was success
 Run  'cont' in monitor
  else
 kill QEMU process

 4. Source
  If final status was success and 'cont' on dest succeeded
 kill QEMU process
  else
 Run 'cont' in monitor
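
The rollback logic in steps 3 and 4 can be sketched as plain control
flow. The `Monitor` class and `migrate()` helper below are hypothetical
stand-ins for the real monitor interactions, not libvirt code:

```python
# Sketch of the libvirt/QEMU migration handshake outlined above.
# Monitor and migrate() are hypothetical stand-ins, not real libvirt code.

class Monitor:
    """Toy model of a QEMU process plus its monitor."""
    def __init__(self):
        self.running = False   # are the vCPUs running?
        self.alive = True      # does the process still exist?

    def cont(self):
        self.running = True

    def kill(self):
        self.alive = False

def migrate(src, dst, transfer_ok=True, cont_ok=True):
    # Step 2: source runs 'migrate' and polls until failed/success.
    status = 'success' if transfer_ok else 'failed'

    # Step 3: destination resumes CPUs on success, else is killed.
    cont_done = False
    if status == 'success' and cont_ok:
        dst.cont()
        cont_done = True
    else:
        dst.kill()

    # Step 4: only once 'cont' succeeded is the source killed;
    # otherwise the source is resumed, i.e. we roll back.
    if cont_done:
        src.kill()
    else:
        src.cont()

src, dst = Monitor(), Monitor()
migrate(src, dst)
print(dst.running, src.alive)   # True False: guest now runs on the destination
```

Note that on any failure path exactly one of the two QEMU processes keeps
running, which is the property the older protocol lacked.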


In older libvirt, the bits from step 4 would actually take place
at the end of step 2. This meant we could end up with no QEMU
on either the source or dest, if starting CPUs on the dest QEMU
failed for some reason.


We would still really like to have a 'query-migrate' command for
the destination, so that we can confirm that the destination has
consumed all incoming migrate data successfully, rather than just
blindly starting CPUs and hoping for the best.

Regards,
Daniel


Re: [Qemu-devel] qemu and qemu.git - Migration + disk stress introduces qcow2 corruptions

2011-11-10 Thread Daniel P. Berrange
On Thu, Nov 10, 2011 at 01:11:42PM -0600, Anthony Liguori wrote:
 On 11/10/2011 12:42 PM, Daniel P. Berrange wrote:
 On Thu, Nov 10, 2011 at 12:27:30PM -0600, Anthony Liguori wrote:
 What does libvirt actually do in the monitor prior to migration
 completing on the destination?  The least invasive way of doing
 delayed open of block devices is probably to make -incoming create a
 monitor and run a main loop before the block devices (and full
 device model) is initialized.  Since this isolates the changes
 strictly to migration, I'd feel okay doing this for 1.0 (although it
 might need to be in the stable branch).
 
 The way migration works with libvirt wrt QEMU interactions is now
 as follows
 
   1. Destination.
 Run   qemu -incoming ...args...
 Query chardevs via monitor
 Query vCPU threads via monitor
 Set disk / vnc passwords
 
 Since RHEL carries Juan's patch, and Juan's patch doesn't handle
 disk passwords gracefully, how does libvirt cope with that?

No idea, that's the first I've heard of any patch that causes
problems with passwords in QEMU.

Daniel


Re: [Qemu-devel] KVM call agenda for October 25

2011-10-26 Thread Daniel P. Berrange
On Wed, Oct 26, 2011 at 10:48:12AM +0200, Markus Armbruster wrote:
 Kevin Wolf kw...@redhat.com writes:
 
  Am 25.10.2011 16:06, schrieb Anthony Liguori:
  On 10/25/2011 08:56 AM, Kevin Wolf wrote:
  Am 25.10.2011 15:05, schrieb Anthony Liguori:
  I'd be much more open to changing the default mode to cache=none FWIW 
  since the
  risk of data loss there is much, much lower.
 
  I think people said that they'd rather not have cache=none as default
  because O_DIRECT doesn't work everywhere.
  
  Where doesn't it work these days?  I know it doesn't work on tmpfs.  I 
  know it 
  works on ext[234], btrfs, nfs.
 
  Besides file systems (and probably OSes) that don't support O_DIRECT,
  there's another case: Our defaults don't work on 4k sector disks today.
  You need to explicitly specify the logical_block_size qdev property for
  cache=none to work on them.
 
  And changing this default isn't trivial as the right value doesn't only
  depend on the host disk, but it's also guest visible. The only way out
  would be bounce buffers, but I'm not sure that doing that silently is a
  good idea...
 
 Sector size is a device property.
 
 If the user asks for a 4K sector disk, and the backend can't support
 that, we need to reject the configuration.  Just like we reject
 read-only backends for read/write disks.

I don't see why we need to reject a guest disk with 4k sectors,
just because the host disk only has 512 byte sectors. A guest
sector size that's a larger multiple of host sector size should
work just fine. It just means any guest sector write will update
8 host sectors at a time. We only have problems if guest sector
size is not a multiple of host sector size, in which case bounce
buffers are the only option (other than rejecting the config
which is not too nice).

IIUC, current QEMU behaviour is

   Guest 512Guest 4k
 Host 512   * OK  OK
 Host 4k* I/O Err OK

'*' marks defaults

IMHO, QEMU needs to work without I/O errors in all of these
combinations, even if this means having to use bounce buffers
in some of them. That said, IMHO the default should be for
QEMU to avoid bounce buffers, which implies it should either
choose a guest sector size to match the host sector size, or it
should unconditionally use a 4k guest. IMHO we need the former:
   Guest 512  Guest 4k
 Host 512   *OK OK
 Host 4k OK*OK


Yes, I know there are other weird sector sizes besides 512
and 4k, but the same general principles apply: either one is
a multiple of the other, or bounce buffers are needed.
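
The multiple-of rule can be expressed as a one-line check; this is only
an illustrative sketch of the compatibility matrix above, not QEMU's
actual block-layer logic:

```python
# Illustrative sketch of the multiple-of rule, not QEMU's actual logic.

def needs_bounce_buffers(guest_sector, host_sector):
    """O_DIRECT I/O must be aligned to host sectors, so a guest sector
    that is a whole multiple of the host sector needs no bouncing; a
    guest sector smaller than the host sector does."""
    return guest_sector % host_sector != 0

for guest, host in [(512, 512), (4096, 512), (512, 4096), (4096, 4096)]:
    print('guest=%d host=%d bounce=%s'
          % (guest, host, needs_bounce_buffers(guest, host)))
```

Only the guest-512-on-host-4k cell of the table trips the check, which is
exactly the combination that currently yields I/O errors.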

 If the backend can only support it by using bounce buffers, I'd say
 reject it unless the user explicitly permits bounce buffers.  But that's
 debatable.

I don't think it really adds value for QEMU to force the user to specify
some extra magic flag in order to make the user's requested config
actually be honoured. If a config needs bounce buffers, QEMU should just
do it, without needing 'use-bounce-buffers=1'. A higher level mgmt app
is in a better position to inform users about the consequences.


Daniel


Re: [Qemu-devel] KVM call agenda for October 25

2011-10-26 Thread Daniel P. Berrange
On Wed, Oct 26, 2011 at 01:23:05PM +0200, Kevin Wolf wrote:
 Am 26.10.2011 11:57, schrieb Daniel P. Berrange:
  On Wed, Oct 26, 2011 at 10:48:12AM +0200, Markus Armbruster wrote:
  Kevin Wolf kw...@redhat.com writes:
 
  Am 25.10.2011 16:06, schrieb Anthony Liguori:
  On 10/25/2011 08:56 AM, Kevin Wolf wrote:
  Am 25.10.2011 15:05, schrieb Anthony Liguori:
  I'd be much more open to changing the default mode to cache=none FWIW 
  since the
  risk of data loss there is much, much lower.
 
  I think people said that they'd rather not have cache=none as default
  because O_DIRECT doesn't work everywhere.
 
  Where doesn't it work these days?  I know it doesn't work on tmpfs.  I 
  know it 
  works on ext[234], btrfs, nfs.
 
  Besides file systems (and probably OSes) that don't support O_DIRECT,
  there's another case: Our defaults don't work on 4k sector disks today.
  You need to explicitly specify the logical_block_size qdev property for
  cache=none to work on them.
 
  And changing this default isn't trivial as the right value doesn't only
  depend on the host disk, but it's also guest visible. The only way out
  would be bounce buffers, but I'm not sure that doing that silently is a
  good idea...
 
  Sector size is a device property.
 
  If the user asks for a 4K sector disk, and the backend can't support
  that, we need to reject the configuration.  Just like we reject
  read-only backends for read/write disks.
  
  I don't see why we need to reject a guest disk with 4k sectors,
  just because the host disk only has 512 byte sectors. A guest
  sector size that's a larger multiple of host sector size should
  work just fine. It just means any guest sector write will update
  8 host sectors at a time. We only have problems if guest sector
  size is not a multiple of host sector size, in which case bounce
  buffers are the only option (other than rejecting the config
  which is not too nice).
  
  IIUC, current QEMU behaviour is
  
 Guest 512Guest 4k
   Host 512   * OK  OK
   Host 4k* I/O Err OK
  
  '*' marks defaults
  
  IMHO, QEMU needs to work withot I/O errors in all of these
  combinations, even if this means having to use bounce buffers
  in some of them. That said, IMHO the default should be for
  QEMU to avoid bounce buffers, which implies it should either
  chose guest sector size to match host sector size, or it
  should unconditionally use 4k guest. IMHO we need the former
  
 Guest 512  Guest 4k
   Host 512   *OK OK
   Host 4k OK*OK
 
 I'm not sure if a 4k host should imply a 4k guest by default. This means
 that some guests wouldn't be able to run on a 4k host. On the other
 hand, for those guests that can do 4k, it would be the much better option.
 
 So I think this decision is the hard thing about it.

I guess it somewhat depends whether we want to strive for

 1. Give the user the fastest working config by default
 2. Give the user a working config by default
 3. Give the user the fastest (possibly broken) config by default

IMHO 3 is not a serious option, but I could see 2 as a reasonable
tradeoff to avoid complexity in choosing QEMU defaults. The user
would have a working config with 512 byte sectors, but sub-optimal perf
on 4k hosts due to bounce buffering. Ideally libvirt or another
higher-level app would be setting the best block size that a guest
can support by default, so bounce buffers would rarely be needed.
So only people using QEMU directly without setting a block size
would ordinarily suffer the bounce buffer perf hit on a 4k host.

Daniel


Re: [PATCH 05/11] virt: Introducing libvirt VM class

2011-10-12 Thread Daniel P. Berrange
On Tue, Oct 11, 2011 at 06:07:11PM -0300, Lucas Meneghel Rodrigues wrote:
 This is a first attempt at providing a libvirt VM class,
 in order to implement the needed methods for virt testing.
 With this class, we will be able to implement a libvirt
 test, that behaves similarly to the KVM test.
 
 As of implementation details, libvirt_vm uses virsh
 (a userspace program written on top of libvirt) to
 do domain start, stop, verification of status and
 other common operations. The reason why virsh was
 used is to get more coverage of the userspace stack
 that libvirt offers, and also to catch issues that
 virsh users would catch.

Personally I would have recommended that you use the libvirt Python API.
virsh is a very thin layer over the libvirt API, which mostly avoids
adding any logic of its own, so once it has been tested once, there's
not much value in doing more. By using the Python API directly, you will
be able to do more intelligent handling of errors, since you'll get the
full libvirt python exception object instead of a blob of stuff on stderr.
Not to mention that it is so much more efficient, and robust against
any future changes in virsh.
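
For comparison, a minimal sketch of such a check via the libvirt Python
binding. The helper name is hypothetical; it uses the built-in test
driver (test:///default) so it needs no real hypervisor, and it degrades
to a no-op if libvirt-python is not installed:

```python
# Sketch only: hypothetical helper using the libvirt Python binding.
# Returns None when the binding is unavailable on this host.
try:
    import libvirt
except ImportError:
    libvirt = None

def domain_is_running(uri, name):
    if libvirt is None:
        return None                      # binding unavailable
    try:
        conn = libvirt.open(uri)
        try:
            return conn.lookupByName(name).isActive() == 1
        finally:
            conn.close()
    except libvirt.libvirtError as err:
        # A structured error object instead of scraping virsh's stderr:
        print('libvirt error code:', err.get_error_code())
        return False

print(domain_is_running('test:///default', 'test'))
```

With virsh, the equivalent would mean parsing free-form text from a child
process; here a failure arrives as a typed `libvirtError` with an error
code the test can branch on.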

Regards,
Daniel


Re: How many threads should a kvm vm be starting?

2011-09-28 Thread Daniel P. Berrange
On Tue, Sep 27, 2011 at 04:04:41PM -0600, Thomas Fjellstrom wrote:
 On September 27, 2011, Avi Kivity wrote:
  On 09/27/2011 03:29 AM, Thomas Fjellstrom wrote:
   I just noticed something interesting, a virtual machine on one of my
   servers seems to have 69 threads (including the main thread). Other
   guests on the machine only have a couple threads.
   
   Is this normal? or has something gone horribly wrong?
  
  It's normal if the guest does a lot of I/O.  The thread count should go
  down when the guest idles.
 
 Ah, that would make sense. Though it kind of defeats assigning a vm a single 
 cpu/core. A single VM can now DOS an entire multi-core-cpu server. It pretty 
 much pegged my dual core (with HT) server for a couple hours.

You can mitigate these problems by putting each KVM process in its own
cgroup, and using the 'cpu_shares' tunable to ensure that each KVM
process gets the same relative ratio of CPU time, regardless of how
many threads it is running. With newer kernels there are other CPU
tunables for placing hard caps on CPU utilization of the process as
a whole too.
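A minimal sketch of that arrangement, assuming a cgroup v1 hierarchy mounted at /sys/fs/cgroup and illustrative cgroup names and PID variables (root on the host is required):

```shell
# One cpu cgroup per KVM process, with equal cpu.shares, so each VM
# gets the same relative share of CPU time regardless of how many
# threads it spawns. Paths, names and PID variables are illustrative.
for vm in vm1 vm2; do
    mkdir -p /sys/fs/cgroup/cpu/$vm
    echo 1024 > /sys/fs/cgroup/cpu/$vm/cpu.shares
done
echo "$VM1_PID" > /sys/fs/cgroup/cpu/vm1/tasks
echo "$VM2_PID" > /sys/fs/cgroup/cpu/vm2/tasks
```

The hard caps mentioned for newer kernels are the cpu.cfs_quota_us / cpu.cfs_period_us tunables in the same hierarchy.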

Regards,
Daniel


Re: [libvirt] Qemu/KVM is 3x slower under libvirt

2011-09-28 Thread Daniel P. Berrange
On Tue, Sep 27, 2011 at 08:10:21PM +0200, Reeted wrote:
 I repost this, this time by also including the libvirt mailing list.
 
 Info on my libvirt: it's the version in Ubuntu 11.04 Natty which is
 0.8.8-1ubuntu6.5 . I didn't recompile this one, while Kernel and
 qemu-kvm are vanilla and compiled by hand as described below.
 
 My original message follows:
 
 This is really strange.
 
 I just installed a new host with kernel 3.0.3 and Qemu-KVM 0.14.1
 compiled by me.
 
 I have created the first VM.
 This is on LVM, virtio etc... if I run it directly from bash
 console, it boots in 8 seconds (it's a bare ubuntu with no
 graphics), while if I boot it under virsh (libvirt) it boots in
 20-22 seconds. This is the time from after Grub to the login prompt,
 or from after Grub to the ssh-server up.

 I was almost able to replicate the whole libvirt command line on the
 bash console, and it still goes almost 3x faster when launched from
 bash than with virsh start vmname. The part I wasn't able to
 replicate is the -netdev part because I still haven't understood the
 semantics of it.

-netdev is just an alternative way of setting up networking that
avoids QEMU's nasty VLAN concept. Using -netdev allows QEMU to
use more efficient codepaths for networking, which should improve
the network performance.

 This is my bash commandline:
 
 /opt/qemu-kvm-0.14.1/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm
 -m 2002 -smp 2,sockets=2,cores=1,threads=1 -name vmname1-1 -uuid
 ee75e28a-3bf3-78d9-3cba-65aa63973380 -nodefconfig -nodefaults
 -chardev 
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/vmname1-1.monitor,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc
 -boot order=dc,menu=on -drive 
 file=/dev/mapper/vgPtpVM-lvVM_Vmname1_d1,if=none,id=drive-virtio-disk0,boot=on,format=raw,cache=none,aio=native
 -device 
 virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
 -drive 
 if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,cache=none,aio=native
 -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0
 -net nic,model=virtio -net tap,ifname=tap0,script=no,downscript=no
 -usb -vnc 127.0.0.1:0 -vga cirrus -device
 virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5


This shows KVM is being requested, but we should validate that KVM is
definitely being activated when under libvirt. You can test this by
doing:

virsh qemu-monitor-command vmname1 'info kvm'

 Which was taken from libvirt's command line. The only modifications
 I did to the original libvirt commandline (seen with ps aux) were:
 
 - Removed -S

Fine, has no effect on performance.

 - Network was: -netdev tap,fd=17,id=hostnet0,vhost=on,vhostfd=18
 -device 
 virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3
 Has been simplified to: -net nic,model=virtio -net
 tap,ifname=tap0,script=no,downscript=no
 and manual bridging of the tap0 interface.

You could have equivalently used

 -netdev tap,ifname=tap0,script=no,downscript=no,id=hostnet0,vhost=on
 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3

That said, I don't expect this has anything to do with the performance
since booting a guest rarely involves much network I/O unless you're
doing something odd like NFS-root / iSCSI-root.

 Firstly I had thought that this could be fault of the VNC: I have
 compiled qemu-kvm with no separate vnc thread. I thought that
 libvirt might have connected to the vnc server at all times and this
 could have slowed down the whole VM.
 But then I also tried connecting vith vncviewer to the KVM machine
 launched directly from bash, and the speed of it didn't change. So
 no, it doesn't seem to be that.

Yeah, I have never seen VNC be responsible for the kind of slowdown
you describe.

 BTW: is the slowdown of the VM on no separate vnc thread only in
 effect when somebody is actually connected to VNC, or always?

Probably, but again I dont think it is likely to be relevant here.

 Also, note that the time difference is not visible in dmesg once the
 machine has booted. So it's not a slowdown in detecting devices.
 Devices are always detected within the first 3 seconds, according to
 dmesg, at 3.6 seconds the first ext4 mount begins. It seems to be
 really the OS boot that is slow... it seems an hard disk performance
 problem.


There are a couple of things that would be different between running the
VM directly, vs via libvirt.

 - Security drivers - SELinux/AppArmour
 - CGroups

If it was AppArmour causing this slowdown I don't think you would have
been the first person to complain, so lets ignore that. Which leaves
cgroups as a likely culprit. Do a

  grep cgroup /proc/mounts

if any of them are mounted, then for each cgroups mount in turn,

 - Umount the cgroup
 - Restart libvirtd
 - Test your guest boot performance
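The bisection procedure above could be scripted roughly as follows (a sketch; the libvirtd restart command is distro-specific, and unmounting can fail if a hierarchy is busy):

```shell
# For each mounted cgroup hierarchy, try unmounting it, restart
# libvirtd, and re-time the guest boot to find the culprit.
grep cgroup /proc/mounts | awk '{print $2}' | while read -r mnt; do
    echo "testing without $mnt"
    umount "$mnt" || continue            # skip hierarchies that are busy
    service libvirtd restart             # distro-specific
    # ...start the guest here and time its boot...
done
```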


Regards,
Daniel

Re: [libvirt] Qemu/KVM is 3x slower under libvirt

2011-09-28 Thread Daniel P. Berrange
On Wed, Sep 28, 2011 at 11:19:43AM +0200, Reeted wrote:
 On 09/28/11 09:51, Daniel P. Berrange wrote:
 This is my bash commandline:
 
 /opt/qemu-kvm-0.14.1/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm
 -m 2002 -smp 2,sockets=2,cores=1,threads=1 -name vmname1-1 -uuid
 ee75e28a-3bf3-78d9-3cba-65aa63973380 -nodefconfig -nodefaults
 -chardev 
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/vmname1-1.monitor,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc
 -boot order=dc,menu=on -drive 
 file=/dev/mapper/vgPtpVM-lvVM_Vmname1_d1,if=none,id=drive-virtio-disk0,boot=on,format=raw,cache=none,aio=native
 -device 
 virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
 -drive 
 if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,cache=none,aio=native
 -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0
 -net nic,model=virtio -net tap,ifname=tap0,script=no,downscript=no
 -usb -vnc 127.0.0.1:0 -vga cirrus -device
 virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
 
 This shows KVM is being requested, but we should validate that KVM is
 definitely being activated when under libvirt. You can test this by
 doing:
 
  virsh qemu-monitor-command vmname1 'info kvm'
 
 kvm support: enabled
 
 I think I would see a higher impact if it was KVM not enabled.
 
 Which was taken from libvirt's command line. The only modifications
 I did to the original libvirt commandline (seen with ps aux) were:


 - Network was: -netdev tap,fd=17,id=hostnet0,vhost=on,vhostfd=18
 -device 
 virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3
 Has been simplified to: -net nic,model=virtio -net
 tap,ifname=tap0,script=no,downscript=no
 and manual bridging of the tap0 interface.
 You could have equivalently used
 
   -netdev tap,ifname=tap0,script=no,downscript=no,id=hostnet0,vhost=on
   -device 
  virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3
 
 It's this! It's this!! (thanks for the line)
 
 It raises boot time by 10-13 seconds

Ok, that is truly bizarre and I don't really have any explanation
for why that is. I guess you could try 'vhost=off' too and see if that
makes the difference.

 
 But now I don't know where to look During boot there is a pause
 usually between /scripts/init-bottom  (Ubuntu 11.04 guest) and the
 appearance of login prompt, however that is not really meaningful
 because there is probably much background activity going on there,
 with init etc. which don't display messages
 
 
 init-bottom does just this
 
 -
 #!/bin/sh -e
 # initramfs init-bottom script for udev
 
 PREREQ=
 
 # Output pre-requisites
 prereqs()
 {
 echo $PREREQ
 }
 
 case $1 in
 prereqs)
 prereqs
 exit 0
 ;;
 esac
 
 
 # Stop udevd, we'll miss a few events while we run init, but we catch up
 pkill udevd
 
 # Move /dev to the real filesystem
 mount -n -o move /dev ${rootmnt}/dev
 -
 
 It doesn't look like it should take time to execute.
 So there is probably some other background activity going on... and
 that is slower, but I don't know what that is.
 
 
 Another thing that can be noticed is that the dmesg message:
 
 [   13.290173] eth0: no IPv6 routers present
 
 (which is also the last message)
 
 happens on average 1 (one) second earlier in the fast case (-net)
 than in the slow case (-netdev)

Hmm, none of that looks particularly suspect. So I don't really have
much idea what else to try apart from the 'vhost=off' possibilty.


Daniel


Re: [libvirt] Qemu/KVM is 3x slower under libvirt (due to vhost=on)

2011-09-28 Thread Daniel P. Berrange
On Wed, Sep 28, 2011 at 11:49:01AM +0200, Reeted wrote:
 On 09/28/11 11:28, Daniel P. Berrange wrote:
 On Wed, Sep 28, 2011 at 11:19:43AM +0200, Reeted wrote:
 On 09/28/11 09:51, Daniel P. Berrange wrote:
 This is my bash commandline:
 
 /opt/qemu-kvm-0.14.1/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm
 -m 2002 -smp 2,sockets=2,cores=1,threads=1 -name vmname1-1 -uuid
 ee75e28a-3bf3-78d9-3cba-65aa63973380 -nodefconfig -nodefaults
 -chardev 
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/vmname1-1.monitor,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc
 -boot order=dc,menu=on -drive 
 file=/dev/mapper/vgPtpVM-lvVM_Vmname1_d1,if=none,id=drive-virtio-disk0,boot=on,format=raw,cache=none,aio=native
 -device 
 virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
 -drive 
 if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,cache=none,aio=native
 -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0
 -net nic,model=virtio -net tap,ifname=tap0,script=no,downscript=no
 -usb -vnc 127.0.0.1:0 -vga cirrus -device
 virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
 This shows KVM is being requested, but we should validate that KVM is
 definitely being activated when under libvirt. You can test this by
 doing:
 
  virsh qemu-monitor-command vmname1 'info kvm'
 kvm support: enabled
 
 I think I would see a higher impact if it was KVM not enabled.
 
 Which was taken from libvirt's command line. The only modifications
 I did to the original libvirt commandline (seen with ps aux) were:
 
 - Network was: -netdev tap,fd=17,id=hostnet0,vhost=on,vhostfd=18
 -device 
 virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3
 Has been simplified to: -net nic,model=virtio -net
 tap,ifname=tap0,script=no,downscript=no
 and manual bridging of the tap0 interface.
 You could have equivalently used
 
   -netdev tap,ifname=tap0,script=no,downscript=no,id=hostnet0,vhost=on
   -device 
  virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3
 It's this! It's this!! (thanks for the line)
 
 It raises boot time by 10-13 seconds
 Ok, that is truly bizarre and I don't really have any explanation
 for why that is. I guess you could try 'vhost=off' too and see if that
 makes the difference.
 
 YES!
 It's the vhost. With vhost=on it takes about 12 seconds more time to boot.
 
 ...meaning? :-)

I've no idea. I was always under the impression that 'vhost=on' was
the 'make it go much faster' switch. So something is going wrong
here that I can't explain.

Perhaps one of the network people on this list can explain...


To turn vhost off in the libvirt XML, you should be able to use
<driver name='qemu'/> for the interface in question, eg


<interface type='user'>
  <mac address='52:54:00:e5:48:58'/>
  <model type='virtio'/>
  <driver name='qemu'/>
</interface>

Regards,
Daniel


Re: [libvirt] Qemu/KVM is 3x slower under libvirt (due to vhost=on)

2011-09-28 Thread Daniel P. Berrange
On Wed, Sep 28, 2011 at 12:19:09PM +0200, Reeted wrote:
 On 09/28/11 11:53, Daniel P. Berrange wrote:
 On Wed, Sep 28, 2011 at 11:49:01AM +0200, Reeted wrote:
 YES!
 It's the vhost. With vhost=on it takes about 12 seconds more time to boot.
 
 ...meaning? :-)
 I've no idea. I was always under the impression that 'vhost=on' was
 the 'make it go much faster' switch. So something is going wrong
 here that I can't explain.
 
 Perhaps one of the network people on this list can explain...
 
 
 To turn vhost off in the libvirt XML, you should be able to use
 <driver name='qemu'/> for the interface in question, eg
 
 
  <interface type='user'>
    <mac address='52:54:00:e5:48:58'/>
    <model type='virtio'/>
    <driver name='qemu'/>
  </interface>
 
 
 Ok that seems to work: it removes the vhost part in the virsh launch
 hence cutting down 12secs of boot time.
 
 If nobody comes out with an explanation of why, I will open another
 thread on the kvm list for this. I would probably need to test disk
 performance on vhost=on to see if it degrades or it's for another
 reason that boot time is increased.

Be sure to CC the qemu-devel mailing list too next time, since that has
a wider audience who might be able to help


Daniel


Re: [PATCH 1/3] Avoid the use of deprecated gnutls gnutls_*_set_priority functions.

2011-08-25 Thread Daniel P. Berrange
On Thu, Aug 25, 2011 at 11:54:41AM +0100, Stefan Hajnoczi wrote:
 On Mon, Jul 4, 2011 at 11:00 PM, Raghavendra D Prabhu
 raghu.prabh...@gmail.com wrote:
  The gnutls_*_set_priority family of functions has been marked deprecated
  in 2.12.x. These functions have been superceded by
  gnutls_priority_set_direct().
 
  Signed-off-by: Raghavendra D Prabhu rpra...@wnohang.net
  ---
   ui/vnc-tls.c |   20 +---
   1 files changed, 1 insertions(+), 19 deletions(-)
 
  diff --git a/ui/vnc-tls.c b/ui/vnc-tls.c
  index dec626c..33a5d8c 100644
  --- a/ui/vnc-tls.c
  +++ b/ui/vnc-tls.c
  @@ -286,10 +286,6 @@ int vnc_tls_validate_certificate(struct VncState *vs)
 
   int vnc_tls_client_setup(struct VncState *vs,
                           int needX509Creds) {
  -    static const int cert_type_priority[] = { GNUTLS_CRT_X509, 0 };
  -    static const int protocol_priority[]= { GNUTLS_TLS1_1, GNUTLS_TLS1_0, 
  GNUTLS_SSL3, 0 };
  -    static const int kx_anon[] = {GNUTLS_KX_ANON_DH, 0};
  -    static const int kx_x509[] = {GNUTLS_KX_DHE_DSS, GNUTLS_KX_RSA, 
  GNUTLS_KX_DHE_RSA, GNUTLS_KX_SRP, 0};
 
      VNC_DEBUG("Do TLS setup\n");
      if (vnc_tls_initialize() < 0) {
  @@ -310,21 +306,7 @@ int vnc_tls_client_setup(struct VncState *vs,
              return -1;
          }
 
  -        if (gnutls_kx_set_priority(vs->tls.session, needX509Creds ? 
  kx_x509 : kx_anon) < 0) {
  -            gnutls_deinit(vs->tls.session);
  -            vs->tls.session = NULL;
  -            vnc_client_error(vs);
  -            return -1;
  -        }
  -
  -        if (gnutls_certificate_type_set_priority(vs->tls.session, 
  cert_type_priority) < 0) {
  -            gnutls_deinit(vs->tls.session);
  -            vs->tls.session = NULL;
  -            vnc_client_error(vs);
  -            return -1;
  -        }
  -
  -        if (gnutls_protocol_set_priority(vs->tls.session, 
  protocol_priority) < 0) {
  +        if (gnutls_priority_set_direct(vs->tls.session, needX509Creds ? 
  "NORMAL" : "NORMAL:+ANON-DH", NULL) < 0) {
              gnutls_deinit(vs->tls.session);
              vs->tls.session = NULL;
              vnc_client_error(vs);
  --
  1.7.6
 
 Daniel,
 This patch looks good to me but I don't know much about gnutls or
 crypto in general.  Would you be willing to review this?

ACK, this approach is different from what I did in libvirt, but it matches
the recommendations in the GNUTLS manual for setting priority, so I believe
it is good.

Signed-off-by: Daniel P. Berrange berra...@redhat.com

Regards,
Daniel


Re: DMI BIOS String

2011-08-22 Thread Daniel P. Berrange
On Mon, Aug 22, 2011 at 03:52:19PM +1200, Derek wrote:
 Hi Folks,
 
 I could not track down any solid info on modifying the DMI BIOS string.
 
 For example, in VirtualBox you can use 'vboxmanage setsextradata' to
 set the BIOS product and vendor string per VM.
 
 Any ideas if this is possible with KVM?

If using QEMU directly you can use '-smbios' args. eg

-smbios type=0,vendor=LENOVO,version="6FET82WW (3.12 )"
-smbios type=1,manufacturer=Fedora,product=Virt-Manager,version=0.8.2-3.fc14,serial=32dfcb37-5af1-552b-357c-be8c3aa38310,uuid=c7a5fdbd-edaf-9455-926a-d65c16db1809,sku=1234567890,family="Red Hat"

If using QEMU via libvirt you can use the following:

  http://libvirt.org/formatdomain.html#elementsSysinfo
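For reference, the libvirt side of this looks roughly like the following domain XML fragment (a sketch following the sysinfo schema linked above; the values simply mirror the QEMU example):

```shell
# Write an illustrative <sysinfo> fragment as used in libvirt domain XML;
# <os><smbios mode='sysinfo'/></os> tells libvirt to pass these values
# through to QEMU's -smbios options.
cat > sysinfo-fragment.xml <<'EOF'
<os>
  <smbios mode='sysinfo'/>
</os>
<sysinfo type='smbios'>
  <bios>
    <entry name='vendor'>LENOVO</entry>
    <entry name='version'>6FET82WW (3.12 )</entry>
  </bios>
  <system>
    <entry name='manufacturer'>Fedora</entry>
    <entry name='product'>Virt-Manager</entry>
  </system>
</sysinfo>
EOF
```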


Daniel


[PATCH master+STABLE-0.15] Fix default accelerator when configured with --disable-kvm

2011-08-05 Thread Daniel P. Berrange
From: Daniel P. Berrange berra...@redhat.com

The default accelerator is hardcoded to 'kvm'. This is a fine
default for qemu-kvm normally, but if the user built with
./configure --disable-kvm, then the resulting binaries will
not work by default

* vl.c: Default to 'tcg' unless CONFIG_KVM is defined

Signed-off-by: Daniel P. Berrange berra...@redhat.com
---
 vl.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/vl.c b/vl.c
index 7ae549e..28fd2f3 100644
--- a/vl.c
+++ b/vl.c
@@ -1953,8 +1953,13 @@ static int configure_accelerator(void)
 }
 
 if (p == NULL) {
+#ifdef CONFIG_KVM
 /* Use the default accelerator, kvm */
 p = kvm;
+#else
+/* Use the default accelerator, tcg */
+p = tcg;
+#endif
 }
 
 while (!accel_initalised && *p != '\0') {
-- 
1.7.6



[PATCH master+STABLE-0.15] Fix default accelerator when configured with --disable-kvm

2011-08-04 Thread Daniel P. Berrange
From: Daniel P. Berrange berra...@redhat.com

The default accelerator is hardcoded to 'kvm'. This is a fine
default for qemu-kvm normally, but if the user built with
./configure --disable-kvm, then the resulting binaries will
not work by default

* vl.c: Default to 'tcg' unless CONFIG_KVM is defined
---
 vl.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/vl.c b/vl.c
index 7ae549e..28fd2f3 100644
--- a/vl.c
+++ b/vl.c
@@ -1953,8 +1953,13 @@ static int configure_accelerator(void)
 }
 
 if (p == NULL) {
+#ifdef CONFIG_KVM
 /* Use the default accelerator, kvm */
 p = kvm;
+#else
+/* Use the default accelerator, tcg */
+p = tcg;
+#endif
 }
 
 while (!accel_initalised && *p != '\0') {
-- 
1.7.6



Re: [PATCH 0/2] Introduce panic hypercall

2011-06-20 Thread Daniel P. Berrange
On Mon, Jun 20, 2011 at 06:31:23PM +0300, Avi Kivity wrote:
 On 06/20/2011 04:38 PM, Daniel Gollub wrote:
 Introduce panic hypercall to enable the crashing guest to notify the
 host. This enables the host to run some actions as soon a guest
 crashed (kernel panic).
 
 This patch series introduces the panic hypercall at the host end.
 As well as the hypercall for KVM paravirtuliazed Linux guests, by
 registering the hypercall to the panic_notifier_list.
 
 The basic idea is to create KVM crashdump automatically as soon the
 guest panicked and power-cycle the VM (e.g. libvirt <on_crash/>).
 
 This would be more easily done via a panic device (I/O port or
 memory-mapped address) that the guest hits.  It would be intercepted
 by qemu without any new code in kvm.
 
 However, I'm not sure I see the gain.  Most enterprisey guests
 already contain in-guest crash dumpers which provide more
 information than a qemu memory dump could, since they know exact
 load addresses etc. and are integrated with crash analysis tools.
 What do you have in mind?

Well libvirt can capture a core file by doing 'virsh dump $GUESTNAME'.
This actually uses the QEMU monitor migration command to capture the
entire of QEMU memory. The 'crash' command line tool actually knows
how to analyse this data format as it would a normal kernel crashdump.

I think having a way for a guest OS to notify the host that is has
crashed would be useful. libvirt could automatically do a crash
dump of the QEMU memory, or at least pause the guest CPUs and notify
the management app of the crash, which can then decide what to do.
You can also use tools like 'virt-dmesg' which uses libvirt to peek
into guest memory to extract the most recent kernel dmesg logs (even
if the guest OS itself is crashed & didn't manage to send them out
via netconsole or something else).

This series does need to introduce a QMP event notification upon
crash, so that the crash notification can be propagated to mgmt
layers above QEMU.

Regards,
Daniel


Re: drop -enable-nesting

2011-05-31 Thread Daniel P. Berrange
On Mon, May 30, 2011 at 06:19:14PM +0300, Avi Kivity wrote:
 On 05/30/2011 06:15 PM, Jan Kiszka wrote:
 On 2011-05-30 17:10, Roedel, Joerg wrote:
   On Mon, May 30, 2011 at 11:04:02AM -0400, Jan Kiszka wrote:
   On 2011-05-30 16:38, Nadav Har'El wrote:
   On Mon, May 30, 2011, Jan Kiszka wrote about drop -enable-nesting 
  (was: [PATCH 3/7] cpu model bug fixes and definition corrections...):
   On 2011-05-30 10:18, Roedel, Joerg wrote:
   On Sat, May 28, 2011 at 04:39:13AM -0400, Jan Kiszka wrote:
 
   Jörg, how to deal with -enable-nesting in qemu-kvm to align behavior
   with upstream?
 
   My personal preference is to just remove it. In upstream-qemu it is
   enabled/disabled by +/-svm. -enable-nesting is just a historic thing
   which can be wiped out.
 
   -enable-nesting could remain as a synonym for enabling either VMX or 
  SVM
   in the guest, depending on what was available in the host (because KVM 
  now
   supports both nested SVM and nested VMX, but not SVM-on-VMX or vice 
  versa).
 
   Why? Once nesting is stable (I think SVM already is), there is no reason
   for an explicit enable. And you can always mask it out via -cpu.
 
   BTW, what are the defaults for SVM right now in qemu-kvm and upstream?
   Enable if the modeled CPU supports it?
 
   qemu-kvm still needs -enable-nesting, otherwise it is disabled. Upstream
   qemu should enable it unconditionally (can be disabled with -cpu ,-svm).
 
 Then let's start with aligning qemu-kvm defaults to upstream? I guess
 that's what the diff I was citing yesterday is responsible for.
 
 In the same run, -enable-nesting could dump a warning on the console
 that this switch is obsolete and will be removed from future versions.
 
 I think it's safe to drop -enable-nesting immediately.  Dan, does
 libvirt make use of it?

Yes, but it should be safe to drop it. Currently, if the user specifies
a CPU with the 'svm' flag present in libvirt guest XML, then we will
pass args '-cpu +svm -enable-nesting'. So if we drop --enable-nesting,
then libvirt will simply omit it and everything should still work because
we have still got +svm set.
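For illustration, the guest XML that triggers this looks roughly like the following (a sketch of the libvirt CPU schema; the model name is arbitrary):

```shell
# An illustrative <cpu> fragment for libvirt guest XML; requiring the
# 'svm' feature is what makes libvirt emit "+svm" on the QEMU command line.
cat > cpu-fragment.xml <<'EOF'
<cpu match='exact'>
  <model>phenom</model>
  <feature policy='require' name='svm'/>
</cpu>
EOF
```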

Daniel


Re: [Qemu-devel][RFC]QEMU disk I/O limits

2011-05-31 Thread Daniel P. Berrange
On Tue, May 31, 2011 at 09:45:37AM -0400, Vivek Goyal wrote:
 On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote:
  Hello, all,
  
  I have prepared to work on a feature called Disk I/O limits for 
  qemu-kvm projeect.
  This feature will enable the user to cap disk I/O amount performed by a 
  VM. It is important for some storage resources to be shared among multi-VMs. 
  As you've known, if some of VMs are doing excessive disk I/O, they will 
  hurt the performance of other VMs.
  
 
 Hi Zhiyong,
 
 Why not use kernel blkio controller for this and why reinvent the wheel
 and implement the feature again in qemu?

The finest level of granularity offered by cgroups apply limits per QEMU
process. So the blkio controller can't be used to apply controls directly
to individual disks used by QEMU, only the VM as a whole.

With networking we can use the 'net_cls' cgroups controller for the process
as a whole, or attach 'tc' to individual TAP devices for per-NIC
throttling, both of which ultimately use the same kernel functionality.
I don't see an equivalent option for throttling individual disks that
would reuse functionality from the blkio controller.

Regards,
Daniel


Re: [Qemu-devel][RFC]QEMU disk I/O limits

2011-05-31 Thread Daniel P. Berrange
On Tue, May 31, 2011 at 10:10:37AM -0400, Vivek Goyal wrote:
 On Tue, May 31, 2011 at 02:56:46PM +0100, Daniel P. Berrange wrote:
  On Tue, May 31, 2011 at 09:45:37AM -0400, Vivek Goyal wrote:
   On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote:
Hello, all,

I have prepared to work on a feature called Disk I/O limits for 
qemu-kvm projeect.
This feature will enable the user to cap disk I/O amount performed 
 by a VM. It is important for some storage resources to be shared among 
multi-VMs. As you've known, if some of VMs are doing excessive disk 
I/O, they will hurt the performance of other VMs.

   
   Hi Zhiyong,
   
   Why not use kernel blkio controller for this and why reinvent the wheel
   and implement the feature again in qemu?
  
  The finest level of granularity offered by cgroups apply limits per QEMU
  process. So the blkio controller can't be used to apply controls directly
  to individual disks used by QEMU, only the VM as a whole.
 
 So are multiple VMs using same disk. Then put multiple VMs in same
 cgroup and apply the limit on that disk.
 
 Or if you want to put a system wide limit on a disk, then put all
 VMs in root cgroup and put limit on root cgroups.
 
 I fail to understand what's the exact requirement here. I thought
 the biggest use case was isolation one VM from other which might
 be sharing same device. Hence we were interested in putting 
 per VM limit on disk and not a system wide limit on disk (independent
 of VM).

No, it isn't about putting limits on a disk independant of a VM. It is
about one VM having multiple disks, and wanting to set different policies
for each of its virtual disks. eg

  qemu-kvm -drive file=/dev/sda1 -drive file=/dev/sdb3

and wanting to say that sda1 is limited to 10 MB/s, while sdb3 is
limited to 50 MB/s.  You can't do that kind of thing with cgroups,
because it can only control the entire process, not individual
resources within the process.
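For contrast, the cgroup-level control that is available looks roughly like this (a sketch, assuming the v1 blkio controller is mounted; device major:minor numbers, paths and PID variables are illustrative, and root is required):

```shell
# Cap all reads the QEMU process makes to one *host* block device at
# 10 MB/s via the blkio controller. The limit applies to the process
# as a whole per host device; it cannot give each -drive its own policy.
mkdir -p /sys/fs/cgroup/blkio/vm1
echo "8:0 10485760" > /sys/fs/cgroup/blkio/vm1/blkio.throttle.read_bps_device
echo "$QEMU_PID" > /sys/fs/cgroup/blkio/vm1/tasks
```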

Daniel


Re: [Qemu-devel] [PATCH 2/2 V7] qemu,qmp: add inject-nmi qmp command

2011-04-14 Thread Daniel P. Berrange
On Wed, Apr 13, 2011 at 10:56:21PM +0300, Blue Swirl wrote:
 On Wed, Apr 13, 2011 at 4:08 PM, Luiz Capitulino lcapitul...@redhat.com 
 wrote:
  On Tue, 12 Apr 2011 21:31:18 +0300
  Blue Swirl blauwir...@gmail.com wrote:
 
  On Tue, Apr 12, 2011 at 10:52 AM, Avi Kivity a...@redhat.com wrote:
   On 04/11/2011 08:15 PM, Blue Swirl wrote:
  
   On Mon, Apr 11, 2011 at 10:01 AM, Markus Armbrusterarm...@redhat.com
    wrote:
 Avi Kivitya...@redhat.com  writes:
   
 On 04/08/2011 12:41 AM, Anthony Liguori wrote:
   
 And it's a good thing to have, but exposing this as the only API to
 do something as simple as generating a guest crash dump is not the
 friendliest thing in the world to do to users.
   
 nmi is a fine name for something that corresponds to a real-life nmi
 button (often labeled NMI).
   
 Agree.
  
   We could also introduce an alias mechanism for user friendly names, so
   nmi could be used in addition of full path. Aliases could be useful
   for device paths as well.
  
   Yes.  Perhaps limited to the human monitor.
 
  I'd limit all debugging commands (including NMI) to the human monitor.
 
  Why?
 
 Do they have any real use in production environment? Also, we should
 have the freedom to change the debugging facilities (for example, to
 improve some internal implementation) as we want without regard to
 compatibility to previous versions.

We have users of libvirt requesting that we add an API for triggering
a NMI. They want this for support in a production environment, to be
able to initiate Windows crash dumps.  We really don't want to have to
use HMP passthrough for this, instead of a proper QMP command.

More generally I don't want to see stuff in HMP, that isn't in the QMP.
We already have far too much that we have to do via HMP passthrough in
libvirt due to lack of QMP commands, to the extent that we might as
well have just ignored QMP and continued with HMP for everything.

If we want the flexibility to change the debugging commands between
releases then we should come up with a plan to do this within the
scope of QMP, not restrict them to HMP only.

Regards,
Daniel


Re: [Qemu-devel] [PATCH 2/2 V7] qemu,qmp: add inject-nmi qmp command

2011-04-04 Thread Daniel P. Berrange
On Mon, Mar 07, 2011 at 05:46:28PM +0800, Lai Jiangshan wrote:
 From: Lai Jiangshan la...@cn.fujitsu.com
 Date: Mon, 7 Mar 2011 17:05:15 +0800
 Subject: [PATCH 2/2] qemu,qmp: add inject-nmi qmp command
 
 The inject-nmi command injects an NMI on all CPUs of the guest.
 It is currently only supported for x86 guests; it will
 return an Unsupported error for non-x86 guests.
 
 ---
  hmp-commands.hx |2 +-
  monitor.c   |   18 +-
  qmp-commands.hx |   29 +
  3 files changed, 47 insertions(+), 2 deletions(-)

Does anyone have any feedback on this addition, or are all new
QMP patch proposals blocked pending Anthony's QAPI work ?

We'd like to support it in libvirt and thus want it to be
available in QMP, as well as HMP.

 @@ -2566,6 +2566,22 @@ static void do_inject_nmi(Monitor *mon, const QDict 
 *qdict)
  break;
  }
  }
 +
 +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject 
 **ret_data)
 +{
 +CPUState *env;
 +
 +for (env = first_cpu; env != NULL; env = env->next_cpu)
 +cpu_interrupt(env, CPU_INTERRUPT_NMI);
 +
 +return 0;
 +}
 +#else
 +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject 
 **ret_data)
 +{
 +qerror_report(QERR_UNSUPPORTED);
 +return -1;
 +}
  #endif
  

Interesting that with HMP you need to specify a single CPU index, but
with QMP it is injecting to all CPUs at once. Is there any compelling
reason why we'd ever need the ability to only inject to a single CPU
from an app developer POV ?

Daniel


Re: [libvirt] [Qemu-devel] KVM call minutes for Mar 15

2011-03-17 Thread Daniel P. Berrange
On Tue, Mar 15, 2011 at 12:06:06PM -0700, Chris Wright wrote:
 * Anthony Liguori (anth...@codemonkey.ws) wrote:
  On 03/15/2011 09:53 AM, Chris Wright wrote:
   QAPI
 snip
  - c library implementation is critical to have unit tests and test
 driven development
 - thread safe?
   - no shared state, no statics.
   - threading model requires lock for the qmp session
  - licensing?
   - LGPL
 - forwards/backwards compat?
   - designed with that in mind see wiki:
  
 http://wiki.qemu.org/Features/QAPI
  
  One neat feature of libqmp is that once libvirt has a better QMP
  passthrough interface, we can create a QmpSession that uses libvirt.
  
  It would look something like:
  
  QmpSession *libqmp_session_new_libvirt(virDomainPtr dom);
 
 Looks like you mean this?
 
- request QmpSession - 
 client  libvirt
- return QmpSession  -
 
 client - QmpSession - QMP - QEMU
 
 So bypassing libvirt completely to actually use the session?
 
 Currently, it's more like:
 
 client - QemuMonitorCommand - libvirt - QMP - QEMU
 
  The QmpSession returned by this call can then be used with all of
  the libqmp interfaces.  This means we can still exercise our test
  suite with a guest launched through libvirt.  It also should make
  the libvirt pass through interface a bit easier to consume by third
  parties.
 
 This sounds like it's something libvirt folks should be involved with.
 At the very least, this mode is there now and considered basically
 unstable/experimental/developer use:
 
  Qemu monitor command '%s' executed; libvirt results may be unpredictable!
 
 So likely some concern about making it easier to use, esp. assuming
 that third parties above are mgmt apps, not just developers.

Although we provide monitor and command line passthrough in libvirt,
our recommendation is that mgmt apps do not develop against these
APIs. Our goal / policy is that apps should be able to do anything
they need using the formally modelled libvirt public APIs.

The primary intended usage for the monitor/command line passthrough
is debugging & experimentation, and as a very short-term hack/workaround
for mgmt apps while formal APIs are added to libvirt. In other words,
we provide the feature because we don't want libvirt to be a roadblock,
but we still strongly discourage their usage until all other options
have been exhausted.

In the same way as loading binary-only modules into the kernel sets a
'tainted' flag, we plan that direct usage of monitor/command line
passthrough will set a tainted flag against a VM. This allows distro
maintainers to identify usage & decide how they wish to support these
features in products (if at all).

Regards,
Daniel


Re: Configuring the bridge interface: why assign an IP?

2011-03-14 Thread Daniel P. Berrange
On Mon, Mar 14, 2011 at 11:24:40AM -0600, Ben Beuchler wrote:
 Most of the examples for setting up the bridge interface on a VM host
 suggest assigning the IP address to the bridge.  Assigning the IP to
 the bridge leaves you open to the MAC address of the bridge changing
 as you add/remove guests from the host, resulting in a brief (~20
 second) loss of connectivity to the host. (I am aware that I can
 manually set the MAC of the bridge to avoid unexpected changes. That's
 my current workaround.)

You don't need to manually set a MAC on the bridge - indeed you can't
set an arbitrary MAC on it - it must have a MAC that matches one of
the interfaces enslaved. The key is that the MAC of the enslaved ethernet
device should be numerically smaller than that of any guest TAP devices.
The kernel gives TAP devices a completely random MAC by default, so you
need to make a little change to that. Two options

 - Take the random host TAP device MAC and simply set the first byte to 0xFE
 - Take the guest NIC MAC, set first byte to 0xFE and give that to
   the host TAP device.

Recent releases of libvirt, follow the second approach and it has worked
out well, eliminating any connectivity loss with guest startup/shutdown

Daniel


Re: Problem with bridged tap interface

2011-02-23 Thread Daniel P. Berrange
On Wed, Feb 23, 2011 at 12:34:45PM +0100, 
andreas.a...@de.transport.bombardier.com wrote:
 Hi all,
 
 sorry for the previous partial e-mail, I hit the send button accidentally 
 ;-).
 
 I have a setup with a kvm-based virtual machine running a stock RedHat 6.1 
 (yes, that old) on a rather current debian host.
 
 1. uname in host: 2.6.26-2-amd64 #1 SMP Wed May 12 18:03:14 UTC 2010 
 x86_64 GNU/Linux
 
 2. uname in guest: 2.2.12-20 #1 Mon Sep 27 10:40:35 EDT 1999 i686 unknown
 
 eth0 of the guest is connected via tap0 to a kernel bridge, that is in 
 turn connected via the host's eth1 to a Gigabit link.  On the kvm 
 command-line I configure the guest-nic as model=ne2k_pci.
 
 The problem is that I frequently lose network access from/to the guest.

There have been QEMU NIC model implementation bugs that exhibit
that characteristic. If you have the drivers available in the
guest, then I'd recommend trying out different NIC models than
ne2k, since that's probably the least actively maintained NIC
model. At least try rtl8139, but ideally the e1000 too.
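As a sketch of trying that suggestion, the guest NIC part of the reporter's kvm invocation would change roughly like this (the surrounding -net tap options are assumptions, not taken from the report):

```shell
# Before: guest NIC emulated by the barely-maintained ne2k model
#   kvm ... -net nic,model=ne2k_pci -net tap,ifname=tap0 ...
# After: try a better-maintained model such as rtl8139 or e1000
kvm -net nic,model=e1000 -net tap,ifname=tap0 ...
```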

Regards,
Daniel


Re: [Qemu-devel] [PATCH 02/18] Introduce read() to FdMigrationState.

2011-02-10 Thread Daniel P. Berrange
On Thu, Feb 10, 2011 at 10:54:01AM +0100, Anthony Liguori wrote:
 On 02/10/2011 10:30 AM, Yoshiaki Tamura wrote:
 Currently FdMigrationState doesn't support read(), and this patch
 introduces it to get response from the other side.
 
 Signed-off-by: Yoshiaki Tamuratamura.yoshi...@lab.ntt.co.jp
 
 Migration is unidirectional.  Changing this is fundamental and not
 something to be done lightly.

Making it bi-directional might break libvirt's save/restore
to file support which uses migration, passing a unidirectional
FD for the file. It could also break libvirt's secure tunnelled
migration support which is currently only expecting to have
data sent in one direction on the socket.

Daniel


Re: [Qemu-devel] [PATCH 02/18] Introduce read() to FdMigrationState.

2011-02-10 Thread Daniel P. Berrange
On Thu, Feb 10, 2011 at 07:23:33PM +0900, Yoshiaki Tamura wrote:
 2011/2/10 Daniel P. Berrange berra...@redhat.com:
  On Thu, Feb 10, 2011 at 10:54:01AM +0100, Anthony Liguori wrote:
  On 02/10/2011 10:30 AM, Yoshiaki Tamura wrote:
  Currently FdMigrationState doesn't support read(), and this patch
  introduces it to get response from the other side.
  
  Signed-off-by: Yoshiaki Tamuratamura.yoshi...@lab.ntt.co.jp
 
  Migration is unidirectional.  Changing this is fundamental and not
  something to be done lightly.
 
  Making it bi-directional might break libvirt's save/restore
  to file support which uses migration, passing a unidirectional
  FD for the file. It could also break libvirt's secure tunnelled
  migration support which is currently only expecting to have
  data sent in one direction on the socket.
 
 Hi Daniel,
 
 IIUC, this patch isn't something to make existing live migration
 bi-directional.  Just opens up a way for Kemari to use it.  Do
 you think it's dangerous for libvirt still?

The key is for it to be a no-op for any usage of the existing
'migrate' command. I had thought this was wiring up read into
the event loop too, so it would be poll()ing for reads, but
after re-reading I see this isn't the case here.

Regards,
Daniel


Re: PCI Passthrough, error: The driver 'pci-stub' is occupying your device 0000:08:06.2

2011-02-07 Thread Daniel P. Berrange
On Sat, Feb 05, 2011 at 04:34:01PM +, James Neave wrote:
 Hi,
 
 I'm trying to pass a NOVA-T-500 TV tuner card through to a guest VM.
 I'm getting the error The driver 'pci-stub' is occupying your device
 0000:08:06.2

This is a rather misleading error message. It is *expected* that
pci-stub will occupy the device. Unfortunately the rest of the
error messages QEMU is printing aren't much help either, but
ultimately something is returning -EBUSY in the PCI device assignment
step.

Regards,
Daniel


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-20 Thread Daniel P. Berrange
On Thu, Jan 20, 2011 at 09:44:05AM +0100, Gerd Hoffmann wrote:
   Hi,
 
 For (2), you cannot use bus=X,addr=Y because it makes assumptions about
 the PCI topology which may change in newer -M pc's.
 
 Why should the PCI topology for 'pc' ever change?
 
 We'll probably get q35 support some day, but when this lands I
 expect we'll see a new machine type 'q35', so '-M q35' will pick the
 ich9 chipset (which will have a different pci topology of course)
 and '-M pc' will pick the existing piix chipset (which will continue
 to look like it looks today).

If the topology does ever change (eg in the kind of way Anthony
suggests, where the first bus only has the graphics card), I think libvirt
is going to need a little work to adapt to the new topology,
regardless of whether we currently specify a bus= arg to -device
or not. I'm not sure there's anything we could do to future proof
us to that kind of change.

Regards,
Daniel


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote:
 On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:
 On 01/18/11 18:09, Anthony Liguori wrote:
 On 01/18/2011 10:56 AM, Jan Kiszka wrote:
 
 The device model topology is 100% a hidden architectural detail.
 This is true for the sysbus, it is obviously not the case for PCI and
 similarly discoverable buses. There we have a guest-explorable topology
  that is currently equivalent to the qdev layout.
 
 But we also don't do PCI passthrough so we really haven't even explored
 how that maps in qdev. I don't know if qemu-kvm has attempted to
 qdev-ify it.
 
 It is qdev-ified.  It is a normal pci device from qdev's point of view.
 
 BTW: is there any reason why (vfio-based) pci passthrough couldn't
 work with tcg?
 
 The -device interface is a stable interface. Right now, you don't
 specify any type of identifier of the pci bus when you create a PCI
 device. It's implied in the interface.
 
 Wrong.  You can specify the bus you want attach the device to via
 bus=name.  This is true for *every* device, including all pci
 devices.  If unspecified qdev uses the first bus it finds.
 
 As long as there is a single pci bus only there is simply no need
 to specify it, thats why nobody does that today.
 
 Right.  In terms of specifying bus=, what are we promising re:
 compatibility?  Will there always be a pci.0?  If we add some
 PCI-to-PCI bridges in order to support more devices, is libvirt
 supposed to parse the hierarchy and figure out which bus to put the
 device on?

The reason we specify 'bus' is that we wanted to be flexible wrt
upgrades of libvirt, without needing restarts of QEMU instances
it manages. That way we can introduce new functionality into
libvirt that relies on it having previously set 'bus' on all
active QEMUs.

If QEMU gains PCI-to-PCI bridges, then I wouldn't expect QEMU to
be adding the extra bridges itself. I'd expect that QEMU would provide
just the first bridge, and then libvirt would specify how many more
bridges to create at boot, or hotplug them later. So it wouldn't
ever need to parse the topology.

Regards,
Daniel


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 10:54:10AM -0600, Anthony Liguori wrote:
 On 01/19/2011 07:11 AM, Markus Armbruster wrote:
 Gerd Hoffmannkra...@redhat.com  writes:
 
 On 01/18/11 18:09, Anthony Liguori wrote:
 On 01/18/2011 10:56 AM, Jan Kiszka wrote:
 The device model topology is 100% a hidden architectural detail.
 This is true for the sysbus, it is obviously not the case for PCI and
 similarly discoverable buses. There we have a guest-explorable topology
  that is currently equivalent to the qdev layout.
 But we also don't do PCI passthrough so we really haven't even explored
 how that maps in qdev. I don't know if qemu-kvm has attempted to
 qdev-ify it.
 It is qdev-ified.  It is a normal pci device from qdev's point of view.
 
 BTW: is there any reason why (vfio-based) pci passthrough couldn't
 work with tcg?
 
 The -device interface is a stable interface. Right now, you don't
 specify any type of identifier of the pci bus when you create a PCI
 device. It's implied in the interface.
 Wrong.  You can specify the bus you want attach the device to via
 bus=name.  This is true for *every* device, including all pci
 devices. If unspecified qdev uses the first bus it finds.
 
 As long as there is a single pci bus only there is simply no need to
 specify it, thats why nobody does that today.  Once q35 finally
 arrives this will change of course.
 As far as I know, libvirt does it already.
 
 I think that's a bad idea from a forward compatibility perspective.

In our past experience though, *not* specifying attributes like
these has also been pretty bad from a forward compatibility
perspective too. We're kind of damned either way, so on balance
we decided we'd specify every attribute in qdev that's related
to unique identification of devices & their inter-relationships.
By strictly locking down the topology we were defining, we ought
to have a more stable ABI in the face of future changes. I accept
this might not always work out, so we may have to adjust things
over time still. Predicting the future is hard :-)

Daniel


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote:
 On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:
 On 01/18/11 18:09, Anthony Liguori wrote:
 On 01/18/2011 10:56 AM, Jan Kiszka wrote:
 
 The device model topology is 100% a hidden architectural detail.
 This is true for the sysbus, it is obviously not the case for PCI and
 similarly discoverable buses. There we have a guest-explorable topology
  that is currently equivalent to the qdev layout.
 
 But we also don't do PCI passthrough so we really haven't even explored
 how that maps in qdev. I don't know if qemu-kvm has attempted to
 qdev-ify it.
 
 It is qdev-ified.  It is a normal pci device from qdev's point of view.
 
 BTW: is there any reason why (vfio-based) pci passthrough couldn't
 work with tcg?
 
 The -device interface is a stable interface. Right now, you don't
 specify any type of identifier of the pci bus when you create a PCI
 device. It's implied in the interface.
 
 Wrong.  You can specify the bus you want attach the device to via
 bus=name.  This is true for *every* device, including all pci
 devices.  If unspecified qdev uses the first bus it finds.
 
 As long as there is a single pci bus only there is simply no need
 to specify it, thats why nobody does that today.
 
 Right.  In terms of specifying bus=, what are we promising re:
 compatibility?  Will there always be a pci.0?  If we add some
 PCI-to-PCI bridges in order to support more devices, is libvirt
  supposed to parse the hierarchy and figure out which bus to put the
 device on?

The answer to your questions probably differ depending on
whether '-nodefconfig' and '-nodefaults' are set on the
command line.  If they are set, then I'd expect to only
ever see one PCI bus with name pci.0 forever more, unless
I explicitly ask for more. If they are not set, then you
might expect to see multiple PCI buses appear by magic.

Daniel


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 11:51:58AM -0600, Anthony Liguori wrote:
 On 01/19/2011 11:01 AM, Daniel P. Berrange wrote:
 
 The reason we specify 'bus' is that we wanted to be flexible wrt
 upgrades of libvirt, without needing restarts of QEMU instances
 it manages. That way we can introduce new functionality into
 libvirt that relies on it having previously set 'bus' on all
 active QEMUs.
 
  If QEMU gains PCI-to-PCI bridges, then I wouldn't expect QEMU to
  be adding the extra bridges itself. I'd expect that QEMU would provide
  just the first bridge, and then libvirt would specify how many more
  bridges to create at boot, or hotplug them later. So it wouldn't
  ever need to parse the topology.
 
 Yeah, but replacing the main chipset will certainly change the PCI
 topology such that if you're specifying bus=X and addr=X and then
 also using -M pc, unless you're parsing the default topology to come
 up with the addressing, it will break in the future.

We never use a bare '-M pc' though, we always canonicalize to
one of the versioned forms.  So if we run '-M pc-0.12', then
neither the main PCI chipset nor topology would have changed
in newer QEMU.  Of course if we deployed a new VM with
'-M pc-0.20' that might have new PCI chipset, so bus=pci.0
might have different meaning that it did when used with
'-M pc-0.12', but I don't think that's an immediate problem

Regards,
Daniel


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 11:42:18AM -0600, Anthony Liguori wrote:
 On 01/19/2011 11:35 AM, Daniel P. Berrange wrote:
 On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote:
 On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:
 On 01/18/11 18:09, Anthony Liguori wrote:
 On 01/18/2011 10:56 AM, Jan Kiszka wrote:
 The device model topology is 100% a hidden architectural detail.
 This is true for the sysbus, it is obviously not the case for PCI and
 similarly discoverable buses. There we have a guest-explorable topology
  that is currently equivalent to the qdev layout.
 But we also don't do PCI passthrough so we really haven't even explored
 how that maps in qdev. I don't know if qemu-kvm has attempted to
 qdev-ify it.
 It is qdev-ified.  It is a normal pci device from qdev's point of view.
 
 BTW: is there any reason why (vfio-based) pci passthrough couldn't
 work with tcg?
 
 The -device interface is a stable interface. Right now, you don't
 specify any type of identifier of the pci bus when you create a PCI
 device. It's implied in the interface.
 Wrong.  You can specify the bus you want attach the device to via
 bus=name.  This is true for *every* device, including all pci
 devices.  If unspecified qdev uses the first bus it finds.
 
 As long as there is a single pci bus only there is simply no need
 to specify it, thats why nobody does that today.
 Right.  In terms of specifying bus=, what are we promising re:
 compatibility?  Will there always be a pci.0?  If we add some
 PCI-to-PCI bridges in order to support more devices, is libvirt
  supposed to parse the hierarchy and figure out which bus to put the
 device on?
 The answer to your questions probably differ depending on
 whether '-nodefconfig' and '-nodefaults' are set on the
 command line.  If they are set, then I'd expect to only
  ever see one PCI bus with name pci.0 forever more, unless
  I explicitly ask for more. If they are not set, then you
  might expect to see multiple PCI buses appear by magic.
 
 Yeah, we can't promise that.  If you use -M pc, you aren't
 guaranteed a stable PCI bus topology even with
 -nodefconfig/-nodefaults.

That's why we never use '-M pc' when actually invoking QEMU.
If the user specifies 'pc' in the XML, we canonicalize that
to the versioned alternative like 'pc-0.12' before invoking
QEMU. We also expose the list of versioned machines to apps
so they can do canonicalization themselves if desired.

Regards,
Daniel


Re: [BUG] VM stuck in interrupt-loop after suspend to/resumed from file, or no interrupts at all

2011-01-12 Thread Daniel P. Berrange
On Wed, Jan 12, 2011 at 03:51:13PM +0100, Philipp Hahn wrote:
 Hello,
 
 libvirt implements a managed save, which suspends a VM to a file from which
 it can be resumed later. This uses QEMU/KVM's migrate exec:file feature.
 This doesn't work reliably for me: in many cases the resumed VM seems to be
 stuck: its VNC console is restored, but no key presses or network packets
 are accepted. This happens with Windows XP, 7, 2008 and Linux 2.6.32
 systems.
 
 Using the debugging cycle described below in more detail, I was able to track
 the problem down to interrupt handling: either the Linux guest kernel
 constantly receives an interrupt for the 8139cp network adapter, or no
 interrupts at all (neither network nor keyboard nor timer); only sending an
 NMI works and shows that at least the Linux kernel is still alive.
 
 If I add the -no-kvm-irqchip Option, it seems to work; I was not able to 
 reproduce a hang.

I remember reporting a bug with that scenario 4/5 months back
and it being fixed in the host kernel IIRC.

Daniel


Re: qemu-kvm-0.13.0 - windows 2008 - chkdsk too slow

2011-01-06 Thread Daniel P. Berrange
On Thu, Jan 06, 2011 at 12:19:21PM +0200, Avi Kivity wrote:
 On 01/06/2011 11:42 AM, Nikola Ciprich wrote:
   - run trace-cmd record -e kvm -b 10 -P pid1 -P pid2, ctrl-C after a
 seems like it's not possible to specify multiple pids, so
 
 Did you get 'overrun: something' reports from trace-cmd, where
 something != 0?
 
 If you're not sure, please run the trace again.  Also try adding '-r
 10' to the command line.
 
 I've run 4 commands in parallel. Also I can't get monitor information
 since vm is started using libvirt, so I've just used all machine's qemu-kvm
 pids..
 
 Dan, is there a way to hijack the monitor so we can run some
 commands on it?  Things like 'info registers' and disassembly.

Depends on the libvirt version. For most, you'll need to
look for the monitor path in the QEMU argv:

  -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/vmwts02.monitor,server,nowait \
  -mon chardev=monitor,mode=readline

then, 'service libvirtd stop' and now you can connect to
the monitor at that path & run commands you want, and then
disconnect and start libvirtd again. If you run any commands
that change the VM state, things may well get confused when
you start libvirtd again, but if its just 'info registers'
etc it should be pretty safe.

If you have a new enough libvirt, then you can also send
commands directly using 'virsh qemu-monitor-command' (checking
whether you need JSON or HMP syntax first - in this case you
can see it needs HMP).
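With a new enough libvirt, that route looks like this (the domain name is taken from the monitor path above; the --hmp flag tells virsh to send raw HMP rather than JSON — treat its availability as version-dependent):

```shell
# Send a raw HMP command through libvirt, no libvirtd stop/start needed:
virsh qemu-monitor-command --hmp vmwts02 'info registers'
```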

Regards,
Daniel


Re: [libvirt] cgroup limits only affect kvm guest under certain conditions

2011-01-06 Thread Daniel P. Berrange
On Thu, Jan 06, 2011 at 02:15:37PM +0100, Dominik Klein wrote:
 Hi
 
 I am playing with cgroups and try to limit block io for guests.
 
 The proof of concept is:
 
 # mkdir /dev/cgroup/blkio
 # mount -t cgroup -o blkio blkio /dev/cgroup/blkio/
 # cd blkio/
 # mkdir test
 # cd test/
 # ls -l /dev/vdisks/kirk
 lrwxrwxrwx 1 root root 7 2011-01-06 13:46 /dev/vdisks/kirk -> ../dm-5
 # ls -l /dev/dm-5
 brw-rw 1 root disk 253, 5 2011-01-06 13:36 /dev/dm-5
 # echo 253:5 1048576 > blkio.throttle.write_bps_device
 # echo $$ > tasks
 # dd if=/dev/zero of=/dev/dm-5 bs=1M count=20
 20+0 records in
 20+0 records out
 20971520 bytes (21 MB) copied, 20.0223 s, 1.0 MB/s
 
 So limit applies to the dd child of my shell.
 
 Now I assign /dev/dm-5 (/dev/vdisks/kirk) to a vm and echo the qemu-kvm
 pid into tasks. Limits are not applied, the guest can happily use max io
 bandwidth.

Did you just echo the main qemu-kvm PID, or did you also
add the PIDs of every thread too ? From this description
of the problem, I'd guess you've only confined the main
process thread and thus the I/O & VCPU threads are not
confined.
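
The fix can be sketched in shell (not from the thread): cgroup tasks
files take one thread ID per write, and every TID of a process lives
under /proc/<pid>/task. Here $$ and a temp file stand in for the
qemu-kvm PID and for /dev/cgroup/blkio/test/tasks, so the loop itself is
runnable anywhere:

```shell
PID=$$                # stand-in for the qemu-kvm PID
TASKS=$(mktemp)       # stand-in for /dev/cgroup/blkio/test/tasks
# write every thread ID of the process, not just the main PID
for tid in /proc/"$PID"/task/*; do
    basename "$tid" >> "$TASKS"
done
wc -l < "$TASKS"      # how many threads would be confined
```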

Daniel


Re: [PATCH v2] device-assignment: chmod the rom file before opening read/write

2011-01-05 Thread Daniel P. Berrange
On Wed, Jan 05, 2011 at 05:14:55PM +0200, Avi Kivity wrote:
 On 01/05/2011 04:57 PM, Alex Williamson wrote:
 A valid argument.  I think it could also be argued that the user is
 providing ownership of the file and writing to the file is part of the
 low level details of the sysfs rom file API and should be handled by the
 user of that API.  We basically have 3 places we could put this:
 
   A. kernel - Why is this file mode 0400 by default anyway if using
  it requires write access?  Set it to mode 0600 here by default.
   B. libvirt - Already does chown, why not do chmod too?  chmod and
  restore here.
   C. qemu - Owns file, chmod is trivial and part of the sysfs rom
  file API?  chmod around usage.
 
 
 qemu might not actually own the file, just have rw permissions.  Or
 it might own the file and selinux may prevent it from changing the
 permissions.  Or it may die before the reverse chmod and leave
 things not as they were.

Agreed, I don't think we can rely on QEMU being able to chmod() the
file in general.

 
 I chose qemu because it seemed to have the least chance of side-effects
 and has the smallest usage window.  Do you prefer libvirt or kernel?
 
 No idea really.  What's the kernel's motivation for keeping it ro?  Sanity?
 
 I'd guess libvirt is the one to do it, but someone more familiar
 with device assignment / pci (you?) should weigh in on this.

I've no real objection to libvirt setting the 0600 permissions
on it, if that's required for correct operation.
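
What option B amounts to can be sketched as follows; the device address
is a placeholder, and restoring permissions if the process dies mid-way
(Avi's concern above) is deliberately left out:

```shell
ROM=/sys/bus/pci/devices/0000:00:19.0/rom   # placeholder address
OLD=$(stat -c %a "$ROM")   # remember the current mode, e.g. 400
chmod 0600 "$ROM"          # grant write access for the sysfs rom enable protocol
# ... start the guest, let it read the ROM ...
chmod "$OLD" "$ROM"        # restore
```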

BTW, what is the failure scenario seen when the file is 0400?
I want to know how to diagnose/triage this if it gets reported
by users in BZ...

Regards,
Daniel


Re: USB Passthrough 1.1 performance problem...

2010-12-14 Thread Daniel P. Berrange
On Tue, Dec 14, 2010 at 12:55:04PM +0100, Kenni Lund wrote:
 2010/12/14 Erik Brakkee e...@brakkee.org:
  From: Kenni Lund ke...@kelu.dk
  2010/12/14 Erik Brakkee e...@brakkee.org:
 
  From: Kenni Lund ke...@kelu.dk
 
  Does this mean I have a chance now that PCI passthrough of my WinTV
  PVR-500
  might work now?
 
  Passthrough of a PVR-500 has been working for a long time. I've been
  running with passthrough of a PVR-500 in my HTPC, since
  November/December 2009...so it should work with any recent kernel and
  any recent version of qemu-kvm you can find today - No patching
  needed. The only issue I had with the PVR-500 card, was when *I*
  didn't free up the shared interrupts...once I fixed that, it just
  worked.
 
  How did you free up those shared interrupts then? I tried different slots
  but always get conflicts with the USB irqs.
 
  I did an unbind of the conflicting device (eg. disabled it). I moved
  the PVR-500 card around in the different slots and once I got a
  conflict with the integrated sound card, I left the PVR-500 card in
  that slot (it's a headless machine, so no need for sound) and
  configured unbind of the sound card at boot time. On my old system I
  think it was conflicting with one of the USB controllers as well, but
  it didn't really matter, as I only lost a few of the ports on the back
  of the computer for that particular USB controller - I still had
  plenty of USB ports left and if I really needed more ports, I could
  just plug in an extra USB PCI card.
 
  My /etc/rc.local boot script looks like the following today:
  --
  #Remove HDA conflicting with ivtv1
  echo 0000:00:1b.0 > /sys/bus/pci/drivers/HDA\ Intel/unbind
 
  # ivtv0
  echo "4444 0016" > /sys/bus/pci/drivers/pci-stub/new_id
  echo 0000:04:08.0 > /sys/bus/pci/drivers/ivtv/unbind
  echo 0000:04:08.0 > /sys/bus/pci/drivers/pci-stub/bind
  echo "4444 0016" > /sys/bus/pci/drivers/pci-stub/remove_id
 
  # ivtv1
  echo "4444 0016" > /sys/bus/pci/drivers/pci-stub/new_id
  echo 0000:04:09.0 > /sys/bus/pci/drivers/ivtv/unbind
  echo 0000:04:09.0 > /sys/bus/pci/drivers/pci-stub/bind
  echo "4444 0016" > /sys/bus/pci/drivers/pci-stub/remove_id
 
  I did not try unbinding the usb device so I can also try that.
 
  I don't understand what is happening with the "4444 0016". I configured the
  pci card in kvm and I believe kvm does the binding to pci-stub in recent
  versions. Where is the "4444 0016" coming from?
 
 Okay, qemu-kvm might do it today, I don't know - I haven't changed
 that script for the past year. But are you sure that it's not
 libvirt/virsh/virt-manager which does that for you?

If you use the managed=yes attribute on the hostdev in libvirt
XML, then libvirt will automatically do the pcistub bind/unbind,
followed by a device reset at guest startup & the reverse at shutdown.
If you have conflicting devices on the bus though, libvirt won't
attempt to unbind them, unless you had also explicitly assigned all
those conflicting devices to the same guest.
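
The managed=yes attribute mentioned above sits on the hostdev element of
the libvirt domain XML; a minimal sketch, with a placeholder PCI
address:

```xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x04' slot='0x08' function='0x0'/>
  </source>
</hostdev>
```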

Daniel


Re: -pcidevice broken - fix or remove it?

2010-11-02 Thread Daniel P. Berrange
On Tue, Nov 02, 2010 at 03:31:31PM +0100, Jan Kiszka wrote:
 Am 02.11.2010 14:19, Markus Armbruster wrote:
  Jan Kiszka jan.kis...@web.de writes:
  
  Hi,
 
  looks like the documented way to configure device assignment at qemu-kvm
  level is broken in 0.13 and git head:
 
# qemu-system-x86_64 -pcidevice host=0:1a.0
qemu-system-x86_64: Parameter 'id' expects an identifier
Identifiers consist of letters, digits, '-', '.', '_', starting with a 
  letter.
pcidevice argument parse error; please check the help text for usage
 
  -device pci-assign works, but only if I specify iommu=0 (otherwise: No
  IOMMU found.  Unable to assign device (null)).
 
  Fix that legacy qemu-kvm switch or remove it at this chance? No one
  seems to have complained yet.
 
  Jan
  
  Broken since June.  Xudong Hao (cc'ed) reported it then[1], and
  Hidetoshi Seto (also cc'ed) attempted to get it patched [2,3].
  
  Removing -pcidevice would be fine with me.  For what it's worth, it's
  not in upstream qemu.
  
 
 Patch queued.
 
 While at it, I wondered if we should kill pci_add ... host as well.
 But it looks like libvirt uses it - and should stumble over this
 breakage (it does not specify a device name). I can fix it by removing
 the silly auto-naming, or can libvirt live without it?

As of libvirt >= 0.8.1 & QEMU >= 0.12.x we switched to using -device
for everything. Older libvirt versions had rather broken checking for
PCI device topology, so I think it is fine to require libvirt >= 0.8.1
for latest QEMU releases if users want PCI dev assignment. Thus -pcidevice
and pci_add can both be killed from our POV.
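
For reference, the syntax that survives both removals would look roughly
like this (the host address and id are placeholders):

```shell
qemu-system-x86_64 ... -device pci-assign,host=00:1a.0,id=hostdev0
```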

Regards,
Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

