Re: [Qemu-devel] [PATCH v3 0/9] HyperV equivalent of pvpanic driver

2015-06-30 Thread Daniel P. Berrange
On Tue, Jun 30, 2015 at 02:33:18PM +0300, Denis V. Lunev wrote:
 Windows 2012 guests can notify the hypervisor about a guest crash
 (a Windows bugcheck/BSOD) by writing specific Hyper-V MSRs. This patch
 series handles these MSRs in KVM and sends a notification to user
 space, allowing QEMU/libvirt to gather a Windows guest crash dump.
 
 The idea is to provide functionality equal to pvpanic device without
 QEMU guest agent for Windows.

That's nice - do you know if the Linux kernel (or any other non-Win2k12
kernels) have support for notifying hypervisors via this Hyper-V MSR
when running as a guest?

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Announcing qboot, a minimal x86 firmware for QEMU

2015-05-22 Thread Daniel P. Berrange
On Thu, May 21, 2015 at 03:51:43PM +0200, Paolo Bonzini wrote:
 Some of you may have heard about the Clear Containers initiative from
 Intel, which couples KVM with various kernel tricks to create extremely
 lightweight virtual machines.  The experimental Clear Containers setup
 requires only 18-20 MB to launch a virtual machine, and needs about 60
 ms to boot.
 
 Now, as all of you probably know, QEMU is great for running Windows or
 legacy Linux guests, but that flexibility comes at a hefty price. Not
 only does all of the emulation consume memory, it also requires some
 form of low-level firmware in the guest as well. All of this adds quite
 a bit to virtual-machine startup times (500 to 700 milliseconds is not
 unusual).
 
 Right?  In fact, it's for this reason that Clear Containers uses kvmtool
 instead of QEMU.
 
 No, wrong!  In fact, reporting bad performance is pretty much the same
 as throwing down the gauntlet.

On the QEMU side of things I wonder if there is scope for taking AArch64's
'virt' machine type concept and duplicating it on all architectures. It
would be nice to have a common minimal machine type on all architectures
that discards all legacy platform stuff and focuses on the minimum needed
to run modern, virtualization-optimized guest OSes. People would always know
that a machine type called 'virt' was the minimal virtualization platform,
while the others all target emulation of real-world (legacy) baremetal
platforms.

Regards,
Daniel


Re: [Qemu-devel] Announcing qboot, a minimal x86 firmware for QEMU

2015-05-22 Thread Daniel P. Berrange
On Fri, May 22, 2015 at 12:04:54PM +0100, Peter Maydell wrote:
 On 22 May 2015 at 12:01, Daniel P. Berrange berra...@redhat.com wrote:
  On the QEMU side of things I wonder if there is scope for taking AArch64's
  'virt' machine type concept and duplicating it on all architectures.
 
 Experience suggests that holding the line on minimal is really
 quite tricky, though -- there's always one more thing that
 somebody really wants to add...

Yep, it is hard saying no - but I'd think as long as it was possible to add
the extra features using -device, it ought to be practical to keep a virt
machine type's -nodefaults -nodefconfig base setup pretty minimal. In
particular I don't see why we need to have a SATA controller and ISA/LPC
bridge in every virt machine - root PCI bus only should be possible, as you
can provide disks via virtio-blk or virtio-scsi, and serial, parallel, mouse,
floppy via PCI devices and/or by adding a USB bus in the cases where you
really need one.
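To make the idea concrete, here is a hypothetical sketch of what launching
such a minimal machine might look like on x86. Note that no x86 'virt'
machine type actually exists at this point, so the -M value and device set
below are assumptions, modeled on the AArch64 'virt' board:

```
# Hypothetical sketch: '-M virt' on x86 is assumed, not a real machine type.
# Everything hangs off the root PCI bus via virtio devices.
qemu-system-x86_64 -M virt -nodefaults -nodefconfig \
    -enable-kvm -cpu host -m 1024 \
    -drive if=none,id=disk0,file=guest.img,format=raw \
    -device virtio-blk-pci,drive=disk0 \
    -netdev user,id=net0 \
    -device virtio-net-pci,netdev=net0 \
    -chardev stdio,id=con0 \
    -device virtio-serial-pci \
    -device virtconsole,chardev=con0
```

A USB bus and a usb-kbd could then be added with further -device options
only in the cases where they are actually needed.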

Regards,
Daniel


Re: [Qemu-devel] Announcing qboot, a minimal x86 firmware for QEMU

2015-05-22 Thread Daniel P. Berrange
On Fri, May 22, 2015 at 12:21:27PM +0100, Peter Maydell wrote:
 On 22 May 2015 at 12:12, Daniel P. Berrange berra...@redhat.com wrote:
  Yep, it is hard saying no - but I'd think as long as it was possible to add
  the extra features using -device, it ought to be practical to keep a virt
  machine type's -nodefaults -nodefconfig base setup pretty minimal.
 
 Mmm, but -device only works for pluggable devices really. We don't
 have a coherent mechanism for saying put the PS/2 keyboard controller
 into the system at its usual IO ports on the command line.

Oh, I didn't necessarily mean that we'd need the ability to add a
PS/2 keyboard via -device. I meant that there just needs to be a way
to add /some/ kind of keyboard, e.g. we have a usb-kbd device that
could potentially fill that role. Likewise for the mouse pointer,
serial ports, etc.

Regards,
Daniel


Re: [Qemu-devel] [PATCH v2 1/2] contrib: add ivshmem client and server

2014-07-21 Thread Daniel P. Berrange
On Mon, Jul 21, 2014 at 08:21:21AM -0600, Eric Blake wrote:
 On 07/20/2014 03:38 AM, David Marchand wrote:
  When using ivshmem devices, notifications between guests can be sent as
  interrupts using an ivshmem-server (typical use is described in the
  documentation). The client is provided as a debug tool.
  
  Signed-off-by: Olivier Matz olivier.m...@6wind.com
  Signed-off-by: David Marchand david.march...@6wind.com
  ---
   contrib/ivshmem-client/Makefile |   26 ++
 
  +++ b/contrib/ivshmem-client/Makefile
  @@ -0,0 +1,26 @@
  +# Copyright 2014 6WIND S.A.
  +# All rights reserved
 
 This file has no other license, and is therefore incompatible with
 GPLv2.  You'll need to resubmit under an appropriately open license.
 
  +++ b/contrib/ivshmem-client/ivshmem-client.h
  @@ -0,0 +1,238 @@
  +/*
  + * Copyright(c) 2014 6WIND S.A.
  + * All rights reserved.
  + *
  + * This work is licensed under the terms of the GNU GPL, version 2.  See
  + * the COPYING file in the top-level directory.
 
 I'm not a lawyer, but to me, this license is self-contradictory.  You
 can't have "All rights reserved" and still be GPL, because the point of
 the GPL is that you are NOT reserving all rights, but explicitly
 granting your users various rights (on condition that they likewise grant
 those rights to others).  But you're not the only file in the qemu code
 base with this questionable mix.

In any case, adding the term 'All rights reserved' is said to be redundant
and obsolete these days:

  https://en.wikipedia.org/wiki/All_rights_reserved#Obsolescence

Regards,
Daniel


Xen hypervisor inside KVM guest with x2apic CPU feature fails to boot

2014-06-02 Thread Daniel P. Berrange
I'm running

 kernel-3.14.4-200.fc20.x86_64
 qemu-1.6.2-5.fc20.x86_64
 xen-4.4.0-4.fc21

In the process of trying to get a Xen hypervisor running inside a KVM guest,
I found that there's a problem with x2apic. NB I do *not* use nested VMX
here; I'm just trying to get plain Xen paravirt working before trying to do
nested HVM.

Any time I enable the 'x2apic' CPU flag for the KVM guest, the Xen hypervisor
running inside the guest will fail to boot:

The QEMU/KVM -cpu arg is:

  -cpu core2duo,+erms,+smep,+fsgsbase,+lahf_lm,+rdtscp,+rdrand,+f16c,+avx,+osxsave,+xsave,+aes,+tsc-deadline,+popcnt,+x2apic,+pcid,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds

The Xen logs indicate it doesn't like the x2apic feature and disables
it, but then it evidently fails to set up the non-x2apic codepath it
falls back to - even though the non-x2apic codepath works fine if you
don't have +x2apic set for the KVM guest.

(XEN) Not enabling x2APIC: depends on iommu_supports_eim.
(XEN) XSM Framework v1.0.0 initialized
(XEN) Flask:  Initializing.
(XEN) AVC INITIALIZED
(XEN) Flask:  Starting in permissive mode.
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2693.939 MHz processor.
(XEN) Initing memory sharing.
(XEN) traps.c:3071: GPF (): 82d0801b83c7 - 82d08023386b
(XEN) mce_intel.c:717: MCA Capability: BCAST 1 SER 1 CMCI 0 firstbank 1 
extended MCE MSR 0
(XEN) Intel machine check reporting enabled
(XEN) I/O virtualisation disabled
(XEN) Getting VERSION: 1050014
(XEN) Getting VERSION: 1050014
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) Getting ID: 0
(XEN) Getting LVT0: 8700
(XEN) Getting LVT1: 8400
(XEN) Suppress EOI broadcast on CPU#0
(XEN) enabled ExtINT on CPU#0
(XEN) ENABLING IO-APIC IRQs
(XEN)  - Using old ACK method
(XEN) init IO_APIC IRQs
(XEN)  IO-APIC (apicid-pin) 0-0, 0-16, 0-17, 0-18, 0-19, 0-20, 0-21, 0-22, 0-23 
not connected.
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) ..MP-BIOS bug: 8254 timer not connected to IO-APIC
(XEN) ...trying to set up timer (IRQ0) through the 8259A ...  failed.
(XEN) ...trying to set up timer as Virtual Wire IRQ... failed.
(XEN) ...trying to set up timer as ExtINT IRQ... failed :(.
(XEN) 
(XEN) 
(XEN) Panic on CPU 0:
(XEN) IO-APIC + timer doesn't work!  Boot with apic_verbosity=debug and send a 
report.  Then try booting with the 'noapic' option
(XEN) 

Will attach the full non-trimmed Xen log to this mail, along with a log
showing a successful boot when 'x2apic' isn't given to KVM.

I'm unclear whether this is a Xen bug, a KVM bug, a QEMU bug, or a
combination of them.

Regards,
Daniel
 Xen 4.4.0-4.fc21
(XEN) Xen version 4.4.0 (mockbuild@[unknown]) (gcc (GCC) 4.9.0 20140506 (Red 
Hat 4.9.0-3)) debug=n Mon May 12 18:38:23 UTC 2014
(XEN) Latest ChangeSet: 
(XEN) Bootloader: GRUB 2.00
(XEN) Command line: placeholder loglvl=all guest_loglvl=all com1=115200,8n1 
console=com1,vga apic_verbosity=debug
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN) Disc information:
(XEN)  Found 1 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)   - 0009fc00 (usable)
(XEN)  0009fc00 - 000a (reserved)
(XEN)  000f - 0010 (reserved)
(XEN)  0010 - 5dbfe000 (usable)
(XEN)  5dbfe000 - 5dc0 (reserved)
(XEN)  feffc000 - ff00 (reserved)
(XEN)  fffc - 0001 (reserved)
(XEN) System RAM: 1499MB (1535604kB)
(XEN) ACPI: RSDP 000F1690, 0014 (r0 BOCHS )
(XEN) ACPI: RSDT 5DBFE4A0, 0030 (r1 BOCHS  BXPCRSDT1 BXPC1)
(XEN) ACPI: FACP 5DBFFF80, 0074 (r1 BOCHS  BXPCFACP1 BXPC1)
(XEN) ACPI: DSDT 5DBFE4D0, 1137 (r1   BXPC   BXDSDT1 INTL 20140114)
(XEN) ACPI: FACS 5DBFFF40, 0040
(XEN) ACPI: SSDT 5DBFF700, 0838 (r1 BOCHS  BXPCSSDT1 BXPC1)
(XEN) ACPI: APIC 5DBFF610, 0078 (r1 BOCHS  BXPCAPIC1 BXPC1)
(XEN) No NUMA configuration found
(XEN) Faking a node at -5dbfe000
(XEN) Domain heap initialised
(XEN) found SMP MP-table at 000f17f0
(XEN) DMI 2.4 present.
(XEN) APIC boot state is 'xapic'
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0xb008
(XEN) ACPI: SLEEP INFO: pm1x_cnt[b004,0], pm1x_evt[b000,0]
(XEN) ACPI: wakeup_vec[5dbfff4c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee0
(XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
(XEN) Processor #0 6:15 APIC version 20
(XEN) ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
(XEN) 

Re: [Qemu-devel] KVM call agenda for 2014-04-28

2014-04-29 Thread Daniel P. Berrange
On Tue, Apr 29, 2014 at 02:33:58PM +0200, Markus Armbruster wrote:
 Peter Maydell peter.mayd...@linaro.org writes:
 
  On 29 April 2014 11:09, Michael S. Tsirkin m...@redhat.com wrote:
  Let's just make clear how to contact us securely, when to contact that
  list, and what we'll do with the info.  I cobbled together the
  following:
  http://wiki.qemu.org/SecurityProcess
 
  Looks generally OK I guess. I'd drop the 'how to use pgp' section --
  anybody who cares will already know how to send us PGP email.
 
 The first paragraph under How to Contact Us Securely is fine, the rest
 seems redundant for readers familiar with PGP, yet hardly sufficient for
 the rest.
 
 One thing I like about Libvirt's Security Process page[*] is they give
 an idea on embargo duration.

FWIW I picked the 2 week length myself as a completely arbitrary timeframe.
We haven't stuck to that strictly - we consider the needs of each vulnerability
as it is triaged to determine the minimum practical embargo time. So think
of 2 weeks as more of a guiding principle to show the world that we don't
believe in keeping issues under embargo for very long periods of time.

Regards,
Daniel


Re: Help regarding virsh domifstat

2013-11-01 Thread Daniel P. Berrange
On Thu, Oct 31, 2013 at 08:30:30PM -0500, Rohit Bhat wrote:
 Hi,
 
 I need some help. I am working on a project where I have to monitor
 the network activity of a VM running on KVM.
 
 I am interested in how much data is going into the VM and how much
 data is coming out of the VM. I checked on the net and found out virsh
 domifstat is the way to go about it.
 
 1. But looks like these stats also include bytes related to control
 traffic for the VM. Is there a way to exclude that? I just want the
 size of actual data transfers.
 
 2. Is there a way by which I can report the data transfer of the VM with
 the outside world (outside hypervisor) only while excluding data
 transfer with any other VM on the same host?
 
 Please let me know if this is a not the right group for such queries.

The libvirt-users mailing list is a better place for virsh-related
questions:

  http://libvirt.org/contact.html#email

Regards,
Daniel


Re: qemu, numa: non-contiguous cpusets

2013-09-30 Thread Daniel P. Berrange
On Sun, Sep 29, 2013 at 05:10:44PM +0200, Borislav Petkov wrote:
 Btw,
 
 while I got your attention, on a not-really related topic: how do we
 feel about adding support for specifying a non-contiguous set of cpus
 for a numa node in qemu with the -numa option? I.e., like this, for
 example:
 
 x86_64-softmmu/qemu-system-x86_64 -smp 8 -numa node,nodeid=0,cpus=0\;2\;4-5 
 -numa node,nodeid=1,cpus=1\;3\;6-7
 
 The ';' needs to be escaped from the shell but I'm open for better
 suggestions.

Use a ':' instead.
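As an illustration of the syntax under discussion, here is a small parser
sketch for a ':'-separated non-contiguous CPU set (using ':' as Daniel
suggests, since it needs no shell escaping). The function name and grammar
are hypothetical, not QEMU's actual option parser:

```python
def parse_cpuset(spec, sep=":"):
    """Parse a non-contiguous CPU set like '0:2:4-5' into a set of ints.

    Each separator-delimited field is either a single CPU index or an
    inclusive 'lo-hi' range. Hypothetical grammar for illustration only.
    """
    cpus = set()
    for field in spec.split(sep):
        if "-" in field:
            lo, hi = field.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(field))
    return cpus
```

With this grammar, the example from the mail would become
`-numa node,nodeid=0,cpus=0:2:4-5 -numa node,nodeid=1,cpus=1:3:6-7`,
with no shell escaping required.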

Daniel


Re: [libvirt-users] Questions on how to reset ID numbers for virt Guests.

2013-09-11 Thread Daniel P. Berrange
On Wed, Sep 11, 2013 at 09:47:07AM +0200, Paolo Bonzini wrote:
 On 11/09/2013 00:27, James Sparenberg wrote:
  I'm doing some experimenting in our Development lab and as a result
  I'm kickstarting over and over Virtual guests.  This is of course
  causing the guest Id to increment by one with each test.  I've
  googled around and tried searching the list but have not found out
  how (if at all) it would be possible to reset the ID number back to 1
  more than is in use.  Also is there  a limit where I run out of ID's?
  (for example does it only go up to 99?)
 
 No, there is no limit.

Well, 'int' will wrap eventually, but you'd need to have created
a hell of a lot of guests for that to be a problem :-)

 I don't know the answer to your other question, so I'm adding the
 libvirt-users mailing list.

If you restart libvirtd, it resets itself to start allocating IDs
from the maximum ID currently used by any running guest.
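A minimal sketch of the behaviour described above (IDs increment
monotonically while the daemon runs; a restart resumes just above the
highest ID still held by a running guest). The class and method names are
illustrative only, not libvirt's actual code:

```python
class DomainIdAllocator:
    """Illustrates the described ID behaviour; names are hypothetical."""

    def __init__(self, running_ids=()):
        # On (re)start, resume just above the max ID of any running guest;
        # with no guests running, allocation starts again from 1.
        self.next_id = max(running_ids, default=0) + 1

    def allocate(self):
        allocated = self.next_id
        self.next_id += 1
        return allocated
```

So the only way to get IDs back down to 1 is a daemon restart with no
guests running, matching what James observed.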


Daniel


Re: Disabling mergeable rx buffers for the guest

2013-07-16 Thread Daniel P. Berrange
On Tue, Jul 16, 2013 at 10:40:28AM +, Naor Shlomo wrote:
 Hi Paolo,
 
 For some unknown reason it suddenly started to accept the changes to the XML 
 and the strings you gave me are now in place.
 Upon machine start I now receive the following error messages:
 
 virsh # start NaorDev
 error: Failed to start domain NaorDev
 error: internal error Process exited while reading console log output: kvm: 
 -global: requires an argument
 
 Here's the XML:
 

   <qemu:commandline>
     <qemu:arg value='-global'/>
     <qemu:env name='mrg_rxbuf' value='off'/>
   </qemu:commandline>

Presumably what you wanted to do was:

   <qemu:commandline>
     <qemu:arg value='-global'/>
     <qemu:arg value='mrg_rxbuf=off'/>
   </qemu:commandline>

rather than setting an environment variable.
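As a side note (my addition, not from the thread): QEMU's -global option
expects a driver.property=value triple, so the argument would likely also
need the device type spelled out, along these lines:

```
   <qemu:commandline>
     <qemu:arg value='-global'/>
     <qemu:arg value='virtio-net-pci.mrg_rxbuf=off'/>
   </qemu:commandline>
```

Note also that libvirt only accepts qemu:commandline elements when the
domain element declares the qemu XML namespace
(xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0').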

Regards,
Daniel


Re: kernel 3.9.x kvm hangs after seabios

2013-05-08 Thread Daniel P. Berrange
On Wed, May 08, 2013 at 02:08:55PM +0200, Tomas Papan wrote:
 Hi,
 
 I found this in the libvirt log (but those messages are the same in 3.8.x)
 anakin libvirt # cat libvirtd.log
 2013-05-08 11:59:29.645+0000: 3750: info : libvirt version: 1.0.5
 2013-05-08 11:59:29.645+0000: 3750: error : udevGetDMIData:1548 :
 Failed to get udev device for syspath '/sys/devices/virtual/dmi/id' or
 '/sys/class/dmi/id'
 2013-05-08 11:59:29.680+0000: 3750: warning :
 ebiptablesDriverInitCLITools:4225 : Could not find 'ebtables'
 executable

You need to look at /var/log/libvirt/qemu/$GUESTNAME.log for
QEMU-related messages. The libvirtd.log file only contains the
libvirt-related messages.

Daniel


Re: [okeanos-dev] Re: KVM versions, machine types and failed migrations

2013-01-10 Thread Daniel P. Berrange
On Wed, Jan 09, 2013 at 03:27:53PM +0200, Vangelis Koukis wrote:
 On Wed, Jan 09, 2013 at 01:10:45PM +0000, Daniel P. Berrange wrote:
  When doing migration, the fundamental requirement is that the guest
  OS visible machine ABI must not change. Thus there are three key
  things to take care of when launching QEMU on the migration target
  host.
  
   - The device PCI/USB addresses must be identical to the source
   - The machine type must be identical to the source
   - The CPU model must be identical to the source
  
 
 Thanks for the detailed list of requirements, we'll take it into account
 for the relevant Ganeti patch.
 
  If you don't follow those requirements, either QEMU or the guest OS
  or both will crash & burn during migration & you get to keep both
  pieces :-)
  
 
 My point is, are these requirements left up to the caller of kvm
 -incoming to satisfy? Since the migration will most probably break,
 wouldn't it be best for QEMU to detect this and complain loudly, instead
 of continuing with the migration, failing silently and destroying the
 VM?
 
 Sure there could be some "yes, do it, I know it is going to break"
 option, which will make QEMU proceed with the migration. However, in 99%
 of the cases this is just user error, e.g. the user has upgraded the
 version on the other end and has not specified -M explicitly. It would
 be best if QEMU was able to detect and warn the user about what is going
 to happen, because it does lead to the VM dying.

What you describe is certainly desirable, but it is quite hard to achieve
with current QEMU. Much of the work on moving to the new QEMU object
model & configuration descriptions has been motivated by a desire to
enable improvements in migration handling. As you suggest, the goal is that
the source QEMU be able to send a complete & reliable hardware description
to the destination QEMU during migration. It is getting closer, but we're
not there yet.

Regards,
Daniel


Re: KVM versions, machine types and failed migrations

2013-01-09 Thread Daniel P. Berrange
On Wed, Jan 09, 2013 at 02:23:50PM +0200, Vangelis Koukis wrote:
 Hello,
 
 I'd like to ask a few questions about the way migrations work in KVM
 among different emulated machine types and different versions of the
 qemu-kvm package. I am sending to both the kvm@ and qemu-devel@ lists,
 please redirect me if I was wrong in doing so.
 
 In a nutshell: while trying to live-migrate a VM on ~okeanos [1], we
 see VM migrations fail silently if going from kvm 1.0 to kvm 1.1.
 The source VM is frozen, "info migrate" on the source monitor reports
 success, but the VM is dead upon arrival at the destination process.
 Please see [3] for the exact package versions for qemu-kvm we have
 tested with.
 
 Migration works if the destination kvm has been started with the same
 machine type as the source VM, e.g., using -M pc-1.0 specifically on
 the destination, when migrating a pc-1.0 machine from kvm 1.0 to
 kvm 1.1.
 
 How does the machine type specified with -M work in the case of
 migrations? Are migrations expected to fail if the machine type is
 different between source and destination process? If yes, shouldn't KVM be
 able to detect this and abort the migration instead of failing silently?

When doing migration, the fundamental requirement is that the guest
OS visible machine ABI must not change. Thus there are three key
things to take care of when launching QEMU on the migration target
host.

 - The device PCI/USB addresses must be identical to the source
 - The machine type must be identical to the source
 - The CPU model must be identical to the source

If you don't follow those requirements, either QEMU or the guest OS
or both will crash & burn during migration & you get to keep both
pieces :-)

 Regarding different package versions of qemu-kvm, it seems migrations do
 not work from source 0.12.5 to any other version *even* if -M pc-0.12 is
 specified at the incoming KVM process. For versions >= 1.0 everything
 works provided the machine type on the destination is the same as on the
 source.

Some older versions of QEMU were buggy, causing the machine type to
not correctly preserve the ABI.

 Our goal is to patch Ganeti [2] so that it sets the destination machine
 type to that of the source specifically, ensuring migrations work
 seamlessly after a KVM upgrade. Is there a way to retrieve the machine
 type of a running KVM process through a monitor command?

IIRC there is not a monitor command for this. The general approach
to dealing with migration stability should be to launch QEMU with a
canonical hardware configuration. This means explicitly setting a machine
type, CPU model and PCI/USB device addresses upfront. NB you should not
use 'pc' as a machine type - if you query the list of machine types from
QEMU, it will tell you what 'pc' corresponds to (pc-1.2); then use that
versioned type so you have a known machine type.
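As a sketch of what Daniel describes, the alias-to-canonical-name mapping
can be derived from the reply to QMP's query-machines command. The sample
data below is invented for illustration; only the 'name'/'alias' field
names follow the QMP schema:

```python
def resolve_machine_alias(machines, alias):
    """Map a machine-type alias (e.g. 'pc') to its versioned canonical
    name, given the list returned by QMP 'query-machines'. Returns the
    alias unchanged if no entry carries it."""
    for m in machines:
        if m.get("alias") == alias:
            return m["name"]
    return alias

# Invented sample in the shape of a query-machines reply:
sample = [
    {"name": "pc-1.2", "alias": "pc", "is-default": True},
    {"name": "pc-1.1"},
    {"name": "pc-1.0"},
]
```

Ganeti could run such a lookup once against the source host's QEMU and
then pass the resolved versioned type via -M on the destination.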

Regards,
Daniel


Re: qemu-kvm: remove boot=on|off drive parameter compatibility

2012-10-02 Thread Daniel P. Berrange
On Mon, Oct 01, 2012 at 08:19:29AM -0500, Anthony Liguori wrote:
 Jan Kiszka jan.kis...@siemens.com writes:
 I think at this point, none of this matters but I added the various
 distro maintainers to the thread.
 
 I think it's time for the distros to drop qemu-kvm and just ship
 qemu.git.  Is there anything else that needs to happen to make that
 switch?

If that is upstream's recommendation, then I see no issue with switching
Fedora 19 / RHEL-7 to use qemu.git instead of qemu-kvm.git.

Regards,
Daniel


Re: [Qemu-devel] [PATCH] kvm: Set default accelerator to kvm if the host supports it

2012-10-01 Thread Daniel P. Berrange
On Mon, Oct 01, 2012 at 06:43:00PM +0200, Andreas Färber wrote:
 Hello Jan,
 
 On 01.10.2012 16:34, Jan Kiszka wrote:
  If we built a target for a host that supports KVM in principle, set the
  default accelerator to KVM as well. This also means QEMU will fail
  to start if KVM support turns out to be unavailable at runtime.
 
 From a distro point of view this of course means that we will build
 against KVM and that the new KVM default will start to fail for users on
 very old hardware. Can't we do a runtime check to select the default?

NB, this is *not* only about old hardware. There are plenty of users who
use QEMU inside VMs. One very common usage I know of is image building
tools which are run inside Amazon VMs, using libguestfs & QEMU.

IMHO, "default to KVM, fallback to TCG" is the most friendly default
behaviour.

Daniel


Re: [libvirt] TSC scaling interface to management

2012-09-25 Thread Daniel P. Berrange
On Wed, Sep 12, 2012 at 12:39:39PM -0300, Marcelo Tosatti wrote:
 
 
 HW TSC scaling is a feature of AMD processors that allows a
 multiplier to be specified to the TSC frequency exposed to the guest.
 
 KVM also contains provision to trap TSC ("KVM: Infrastructure for
 software and hardware based TSC rate scaling", commit cc578287e3224d0da)
 or advance TSC frequency.
 
 This is useful when migrating to a host with different frequency and
 the guest is possibly using direct RDTSC instructions for purposes
 other than measuring cycles (that is, it previously calculated
 cycles-per-second, and uses that information which is stale after
 migration).
 
 "qemu-x86: Set tsc_khz in kvm when supported" (commit e7429073ed1a76518)
 added support for the tsc_khz= option in QEMU.
 
 I am proposing the following changes so that management applications
 can work with this:
 
 1) New option for tsc_khz, which is tsc_khz=host (QEMU command line
 option). "Host" means that QEMU is responsible for retrieving the
 TSC frequency of the host processor and using that, so the management
 application does not have to deal with the burden.

FYI, libvirt already has support for expressing a number of different
TSC-related config options, to support Xen's and VMWare's capabilities
in this area. What we currently allow for is:

   <timer name='tsc' frequency='NNN' mode='auto|native|emulate|smpsafe'/>

In this context the frequency attribute provides the Hz value to
provide to the guest.

  - auto == Emulate if TSC is unstable, else allow native TSC access
  - native == Always allow native TSC access
  - emulate == Always emulate TSC
  - smpsafe == Always emulate TSC, and interlock SMP
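For context, this timer element sits inside the domain's <clock> element;
the frequency value below is purely illustrative:

```
   <clock offset='utc'>
     <timer name='tsc' frequency='3504000000' mode='native'/>
   </clock>
```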

 Therefore it appears that this tsc_khz=auto option can be specified
 only if the user specifies so (it can be a per-guest flag hidden
 in the management configuration/manual).
 
 Sending this email to gather suggestions (or objections)
 to this interface.


Daniel


Re: [PATCH v8] kvm: notify host when the guest is panicked

2012-08-14 Thread Daniel P. Berrange
On Mon, Aug 13, 2012 at 03:21:32PM -0300, Marcelo Tosatti wrote:
 On Wed, Aug 08, 2012 at 10:43:01AM +0800, Wen Congyang wrote:
  We can know the guest is panicked when the guest runs on xen.
  But we do not have such a feature on kvm.
  
  Another purpose of this feature is: management app (for example:
  libvirt) can do auto dump when the guest is panicked. If management
  app does not do auto dump, the guest's user can do dump by hand if
  he sees the guest is panicked.
  
  We have three solutions to implement this feature:
  1. use vmcall
  2. use I/O port
  3. use virtio-serial.
  
  We have decided to avoid touching the hypervisor. The reason why I chose
  the I/O port is:
  1. it is easier to implement
  2. it does not depend on any virtual device
  3. it can work when starting the kernel
 
 How about searching for the "Kernel panic - not syncing" string
 in the guest's serial output? Say libvirtd could take an action upon
 that?

No, this is not satisfactory. It depends on the guest OS being
configured to use the serial port for console output which we
cannot mandate, since it may well be required for other purposes.


Daniel


Re: First shot at adding IPMI to qemu

2012-07-09 Thread Daniel P. Berrange
On Mon, Jul 09, 2012 at 08:23:11AM -0500, Corey Minyard wrote:
 I haven't heard anything about these patches.  Any comments, good or
 bad?  Has anyone tried these?

You really ought to post this to the qemu-devel mailing list,
since that's where the majority of QEMU developers hang out.
This KVM list is primarily for KVM specific development tasks
in QEMU.

Daniel


[PATCH] Fix default accelerator when building with --disable-kvm

2012-07-06 Thread Daniel P. Berrange
From: Daniel P. Berrange berra...@redhat.com

The following commit

  commit 3ad763fcba5bd0ec5a79d4a9b6baeef119dd4a3d
  Author: Jan Kiszka jan.kis...@siemens.com
  Date:   Fri Mar 2 10:30:43 2012 +0100

qemu-kvm: Use machine options to configure qemu-kvm defaults

Upstream is moving towards this mechanism, so start using it in qemu-kvm
already to configure the specific defaults: kvm enabled on, just like
in-kernel irqchips.

prevents qemu from starting when it has been build with the
--disable-kvm argument, because the accelerator is hardcoded
to 'kvm'.  This is a regression previously fixed by

  commit ce967f6610dcd7b7762dbad5a639fecf42d5c76d
  Author: Daniel P. Berrange berra...@redhat.com
  Date:   Fri Aug 5 09:50:29 2011 +0100

Fix default accelerator when configured with --disable-kvm

The default accelerator is hardcoded to 'kvm'. This is a fine
default for qemu-kvm normally, but if the user built with
./configure --disable-kvm, then the resulting binaries will
not work by default

The fix is again to make this conditional on CONFIG_KVM_OPTIONS

Signed-off-by: Daniel P. Berrange berra...@redhat.com
---
 hw/pc_piix.c |   14 ++
 1 file changed, 14 insertions(+)

diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 98a06fa..35202dd 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -360,7 +360,9 @@ static QEMUMachine pc_machine_v1_2 = {
     .init = pc_init_pci,
     .max_cpus = 255,
     .is_default = 1,
+#ifdef CONFIG_KVM_OPTIONS
     .default_machine_opts = "accel=kvm,kernel_irqchip=on",
+#endif
 };
 
 #define PC_COMPAT_1_1 \
@@ -469,7 +471,9 @@ static QEMUMachine pc_machine_v0_14 = {
     .desc = "Standard PC",
     .init = pc_init_pci,
     .max_cpus = 255,
+#ifdef CONFIG_KVM_OPTIONS
     .default_machine_opts = "accel=kvm,kernel_irqchip=on",
+#endif
     .compat_props = (GlobalProperty[]) {
         PC_COMPAT_0_14,
         {
@@ -503,7 +507,9 @@ static QEMUMachine pc_machine_v0_13 = {
     .desc = "Standard PC",
     .init = pc_init_pci_no_kvmclock,
     .max_cpus = 255,
+#ifdef CONFIG_KVM_OPTIONS
     .default_machine_opts = "accel=kvm,kernel_irqchip=on",
+#endif
     .compat_props = (GlobalProperty[]) {
         PC_COMPAT_0_13,
         {
@@ -541,7 +547,9 @@ static QEMUMachine pc_machine_v0_12 = {
     .desc = "Standard PC",
     .init = pc_init_pci_no_kvmclock,
     .max_cpus = 255,
+#ifdef CONFIG_KVM_OPTIONS
     .default_machine_opts = "accel=kvm,kernel_irqchip=on",
+#endif
     .compat_props = (GlobalProperty[]) {
         PC_COMPAT_0_12,
         {
@@ -575,7 +583,9 @@ static QEMUMachine pc_machine_v0_11 = {
     .desc = "Standard PC, qemu 0.11",
     .init = pc_init_pci_no_kvmclock,
     .max_cpus = 255,
+#ifdef CONFIG_KVM_OPTIONS
     .default_machine_opts = "accel=kvm,kernel_irqchip=on",
+#endif
     .compat_props = (GlobalProperty[]) {
         PC_COMPAT_0_11,
         {
@@ -597,7 +607,9 @@ static QEMUMachine pc_machine_v0_10 = {
     .desc = "Standard PC, qemu 0.10",
     .init = pc_init_pci_no_kvmclock,
     .max_cpus = 255,
+#ifdef CONFIG_KVM_OPTIONS
     .default_machine_opts = "accel=kvm,kernel_irqchip=on",
+#endif
     .compat_props = (GlobalProperty[]) {
         PC_COMPAT_0_11,
         {
@@ -631,7 +643,9 @@ static QEMUMachine isapc_machine = {
     .desc = "ISA-only PC",
     .init = pc_init_isa,
     .max_cpus = 1,
+#ifdef CONFIG_KVM_OPTIONS
     .default_machine_opts = "accel=kvm,kernel_irqchip=on",
+#endif
     .compat_props = (GlobalProperty[]) {
         {
             .driver   = "pc-sysfw",
-- 
1.7.10.2



Re: [PATCH] qemu-kvm: Fix default machine options

2012-07-06 Thread Daniel P. Berrange
On Fri, Jul 06, 2012 at 06:21:06PM +0200, Jan Kiszka wrote:
 qemu-kvm-specific machine defaults were missing for pc-0.15 to pc-1.1.
 Then Daniel noted that --disable-kvm caused problems as the generated
 binaries would be unable to run. As we are at it, we can drop the
 kernel_irqchip=on that is now enabled by default in upstream.
 
 CC: Daniel P. Berrange berra...@redhat.com
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

ACK, looks good to me.

 Noticed that there was more to do. Can you take care of stable-1.1,
 Daniel? TIA.

Yep, will post a patch for stable-1.1 when this is accepted
into master.

  hw/pc_piix.c |   23 +++++++++++++++-------
  1 files changed, 16 insertions(+), 7 deletions(-)
 
 diff --git a/hw/pc_piix.c b/hw/pc_piix.c
 index 98a06fa..5860d52 100644
 --- a/hw/pc_piix.c
 +++ b/hw/pc_piix.c
 @@ -353,6 +353,12 @@ static void pc_xen_hvm_init(ram_addr_t ram_size,
  }
  #endif
  
 +#ifdef CONFIG_KVM_OPTIONS
 +#define KVM_MACHINE_OPTIONS "accel=kvm"
 +#else
 +#define KVM_MACHINE_OPTIONS ""
 +#endif
 +
  static QEMUMachine pc_machine_v1_2 = {
      .name = "pc-1.2",
      .alias = "pc",
 @@ -360,7 +366,7 @@ static QEMUMachine pc_machine_v1_2 = {
      .init = pc_init_pci,
      .max_cpus = 255,
      .is_default = 1,
 -    .default_machine_opts = "accel=kvm,kernel_irqchip=on",
 +    .default_machine_opts = KVM_MACHINE_OPTIONS,
  };
  
  #define PC_COMPAT_1_1 \
 @@ -387,6 +393,7 @@ static QEMUMachine pc_machine_v1_1 = {
      .desc = "Standard PC",
      .init = pc_init_pci,
      .max_cpus = 255,
 +    .default_machine_opts = KVM_MACHINE_OPTIONS,
      .compat_props = (GlobalProperty[]) {
          PC_COMPAT_1_1,
          { /* end of list */ }
 @@ -422,6 +429,7 @@ static QEMUMachine pc_machine_v1_0 = {
      .desc = "Standard PC",
      .init = pc_init_pci,
      .max_cpus = 255,
 +    .default_machine_opts = KVM_MACHINE_OPTIONS,
      .compat_props = (GlobalProperty[]) {
          PC_COMPAT_1_0,
          { /* end of list */ }
 @@ -437,6 +445,7 @@ static QEMUMachine pc_machine_v0_15 = {
      .desc = "Standard PC",
      .init = pc_init_pci,
      .max_cpus = 255,
 +    .default_machine_opts = KVM_MACHINE_OPTIONS,
      .compat_props = (GlobalProperty[]) {
          PC_COMPAT_0_15,
          { /* end of list */ }
 @@ -469,7 +478,7 @@ static QEMUMachine pc_machine_v0_14 = {
      .desc = "Standard PC",
      .init = pc_init_pci,
      .max_cpus = 255,
 -    .default_machine_opts = "accel=kvm,kernel_irqchip=on",
 +    .default_machine_opts = KVM_MACHINE_OPTIONS,
      .compat_props = (GlobalProperty[]) {
          PC_COMPAT_0_14,
          {
 @@ -503,7 +512,7 @@ static QEMUMachine pc_machine_v0_13 = {
      .desc = "Standard PC",
      .init = pc_init_pci_no_kvmclock,
      .max_cpus = 255,
 -    .default_machine_opts = "accel=kvm,kernel_irqchip=on",
 +    .default_machine_opts = KVM_MACHINE_OPTIONS,
      .compat_props = (GlobalProperty[]) {
          PC_COMPAT_0_13,
          {
 @@ -541,7 +550,7 @@ static QEMUMachine pc_machine_v0_12 = {
      .desc = "Standard PC",
      .init = pc_init_pci_no_kvmclock,
      .max_cpus = 255,
 -    .default_machine_opts = "accel=kvm,kernel_irqchip=on",
 +    .default_machine_opts = KVM_MACHINE_OPTIONS,
      .compat_props = (GlobalProperty[]) {
          PC_COMPAT_0_12,
          {
 @@ -575,7 +584,7 @@ static QEMUMachine pc_machine_v0_11 = {
      .desc = "Standard PC, qemu 0.11",
      .init = pc_init_pci_no_kvmclock,
      .max_cpus = 255,
 -    .default_machine_opts = "accel=kvm,kernel_irqchip=on",
 +    .default_machine_opts = KVM_MACHINE_OPTIONS,
      .compat_props = (GlobalProperty[]) {
          PC_COMPAT_0_11,
          {
 @@ -597,7 +606,7 @@ static QEMUMachine pc_machine_v0_10 = {
      .desc = "Standard PC, qemu 0.10",
      .init = pc_init_pci_no_kvmclock,
      .max_cpus = 255,
 -    .default_machine_opts = "accel=kvm,kernel_irqchip=on",
 +    .default_machine_opts = KVM_MACHINE_OPTIONS,
      .compat_props = (GlobalProperty[]) {
          PC_COMPAT_0_11,
          {
 @@ -631,7 +640,7 @@ static QEMUMachine isapc_machine = {
      .desc = "ISA-only PC",
      .init = pc_init_isa,
      .max_cpus = 1,
 -    .default_machine_opts = "accel=kvm,kernel_irqchip=on",
 +    .default_machine_opts = KVM_MACHINE_OPTIONS,
      .compat_props = (GlobalProperty[]) {
          {
              .driver   = "pc-sysfw",
 -- 
 1.7.3.4

Daniel


Re: [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)

2012-07-03 Thread Daniel P. Berrange
On Mon, Jul 02, 2012 at 04:54:03PM -0300, Eduardo Habkost wrote:
 On Mon, Jul 02, 2012 at 07:56:58PM +0100, Daniel P. Berrange wrote:
  On Mon, Jul 02, 2012 at 03:06:32PM -0300, Eduardo Habkost wrote:
   Resending series, after fixing some coding style issues. Does anybody have
   any
   feedback about this proposal?
   
   Changes v1 - v2:
- Coding style fixes
   
   Original cover letter:
   
   I was investigating if there are any mechanisms that allow manually 
   pinning of
   guest RAM to specific host NUMA nodes, in the case of multi-node KVM 
   guests, and
   noticed that -mem-path could be used for that, except that it currently 
   removes
   any files it creates (using mkstemp()) immediately, not allowing numactl 
   to be
   used on the backing files, as a result. This patch adds a
   -keep-mem-path-files
   option to make QEMU create the files inside -mem-path with more 
   predictable
   names, and not remove them after creation.
   
   Some previous discussions about the subject, for reference:
- Message-ID: 1281534738-8310-1-git-send-email-andre.przyw...@amd.com
  http://article.gmane.org/gmane.comp.emulators.kvm.devel/57684
- Message-ID: 4c7d7c2a.7000...@codemonkey.ws
  http://article.gmane.org/gmane.comp.emulators.kvm.devel/58835
   
   A more recent thread can be found at:
- Message-ID: 20111029184502.gh11...@in.ibm.com
  http://article.gmane.org/gmane.comp.emulators.qemu/123001
   
   Note that this is just a mechanism to facilitate manual static binding 
   using
   numactl on hugetlbfs later, for optimization. This may be especially 
   useful for
   single large multi-node guests use-cases (and, of course, has to be used 
   with
   care).
   
   I don't know if it is a good idea to use the memory range names as a 
   publicly-
   visible interface. Another option may be to use a single file instead, 
   and mmap
   different regions inside the same file for each memory region. I am open
   to
   comments and suggestions.
   
   Example (untested) usage to bind manually each half of the RAM of a guest 
   to a
   different NUMA node:
   
$ qemu-system-x86_64 [...] -m 2048 -smp 4 \
  -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
  -mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO
$ numactl --offset=1G --length=1G --membind=1 --file 
   /mnt/hugetlbfs/FOO/pc.ram
$ numactl --offset=0  --length=1G --membind=2 --file 
   /mnt/hugetlbfs/FOO/pc.ram
  
  I'd suggest that instead of making the memory file name into a
  public ABI QEMU needs to maintain, QEMU could expose the info
  via a monitor command. eg
  
 $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
   -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
   -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
   -monitor stdio
 (qemu) info mem-nodes
  node0: file=/proc/self/fd/3, offset=0G, length=1G
  node1: file=/proc/self/fd/3, offset=1G, length=1G
  
  This example takes advantage of the fact that with Linux, you can
  still access a deleted file via /proc/self/fd/NNN, which AFAICT,
  would avoid the need for a --keep-mem-path-files.
 
 I like the suggestion.
 
 But other processes still need to be able to open those files if we want
 to do anything useful with them. In this case, I guess it's better to
 let QEMU itself build a /proc/<getpid()>/fd/<fd> string instead of
 using /proc/self and forcing the client to find out what's the right
 PID?
 
 Anyway, even if we want to avoid file-descriptor and /proc tricks, we
 can still use the interface you suggest. Then we wouldn't need to have
 any filename assumptions: the filenames could be completly random, as
 they would be reported using the new monitor command.

Opps, yes of course. I did intend that client apps could use the
files, so I should have used  /proc/$PID and not /proc/self

 
  
  By returning info via a monitor command you also avoid hardcoding
  the use of 1 single file for all of memory. You also avoid hardcoding
  the fact that QEMU stores the nodes in contiguous order inside the
  node. eg QEMU could easily return data like this
  
  
 $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
   -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
   -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
   -monitor stdio
 (qemu) info mem-nodes
  node0: file=/proc/self/fd/3, offset=0G, length=1G
  node1: file=/proc/self/fd/4, offset=0G, length=1G
  
  or more ingenious options
 
 Sounds good.
 
 -- 
 Eduardo


Re: [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)

2012-07-02 Thread Daniel P. Berrange
On Mon, Jul 02, 2012 at 03:06:32PM -0300, Eduardo Habkost wrote:
 Resending series, after fixing some coding style issues. Does anybody have any
 feedback about this proposal?
 
 Changes v1 - v2:
  - Coding style fixes
 
 Original cover letter:
 
 I was investigating if there are any mechanisms that allow manually pinning of
 guest RAM to specific host NUMA nodes, in the case of multi-node KVM guests, 
 and
 noticed that -mem-path could be used for that, except that it currently 
 removes
 any files it creates (using mkstemp()) immediately, not allowing numactl to be
 used on the backing files, as a result. This patch adds a
 -keep-mem-path-files
 option to make QEMU create the files inside -mem-path with more predictable
 names, and not remove them after creation.
 
 Some previous discussions about the subject, for reference:
  - Message-ID: 1281534738-8310-1-git-send-email-andre.przyw...@amd.com
http://article.gmane.org/gmane.comp.emulators.kvm.devel/57684
  - Message-ID: 4c7d7c2a.7000...@codemonkey.ws
http://article.gmane.org/gmane.comp.emulators.kvm.devel/58835
 
 A more recent thread can be found at:
  - Message-ID: 20111029184502.gh11...@in.ibm.com
http://article.gmane.org/gmane.comp.emulators.qemu/123001
 
 Note that this is just a mechanism to facilitate manual static binding using
 numactl on hugetlbfs later, for optimization. This may be especially useful 
 for
 single large multi-node guests use-cases (and, of course, has to be used with
 care).
 
 I don't know if it is a good idea to use the memory range names as a publicly-
 visible interface. Another option may be to use a single file instead, and 
 mmap
 different regions inside the same file for each memory region. I am open to
 comments and suggestions.
 
 Example (untested) usage to bind manually each half of the RAM of a guest to a
 different NUMA node:
 
  $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
-numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
-mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO
  $ numactl --offset=1G --length=1G --membind=1 --file 
 /mnt/hugetlbfs/FOO/pc.ram
  $ numactl --offset=0  --length=1G --membind=2 --file 
 /mnt/hugetlbfs/FOO/pc.ram

I'd suggest that instead of making the memory file name into a
public ABI QEMU needs to maintain, QEMU could expose the info
via a monitor command. eg

   $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
 -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
 -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
 -monitor stdio
   (qemu) info mem-nodes
node0: file=/proc/self/fd/3, offset=0G, length=1G
node1: file=/proc/self/fd/3, offset=1G, length=1G

This example takes advantage of the fact that with Linux, you can
still access a deleted file via /proc/self/fd/NNN, which AFAICT,
would avoid the need for a --keep-mem-path-files.
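The unlink-then-reopen trick can be seen with a few shell commands; this is a rough sketch on Linux, using an ordinary temp file in place of a guest RAM backing file:

```shell
# Create a file, keep a descriptor open, delete the file, then read it
# back through /proc/self/fd - on Linux the data stays reachable until
# the last open descriptor is closed.
tmp=$(mktemp)
echo "guest-ram-placeholder" > "$tmp"
exec 3< "$tmp"        # hold a read-only descriptor on the file
rm "$tmp"             # unlink; the inode survives while fd 3 is open
cat /proc/self/fd/3   # prints: guest-ram-placeholder
exec 3<&-             # close the descriptor, releasing the inode
```

This is exactly why a client handed `/proc/$PID/fd/NNN` paths can still open and bind the memory file even though QEMU deleted it.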

By returning info via a monitor command you also avoid hardcoding
the use of 1 single file for all of memory. You also avoid hardcoding
the fact that QEMU stores the nodes in contiguous order inside the
node. eg QEMU could easily return data like this


   $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
 -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
 -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
 -monitor stdio
   (qemu) info mem-nodes
node0: file=/proc/self/fd/3, offset=0G, length=1G
node1: file=/proc/self/fd/4, offset=0G, length=1G

or more ingenious options

Regards,
Daniel


Re: [Qemu-devel] [PATCH 5/6 v5] deal with guest panicked event accoring to -onpanic parameter

2012-06-27 Thread Daniel P. Berrange
On Wed, Jun 27, 2012 at 04:52:32PM +0200, Cornelia Huck wrote:
 On Wed, 27 Jun 2012 15:02:23 +0800
 Wen Congyang we...@cn.fujitsu.com wrote:
 
  When the guest is panicked, it will write 0x1 to the port KVM_PV_PORT.
  So if qemu reads 0x1 from this port, we can do one of the following
  things according to the parameter -onpanic:
  1. emit QEVENT_GUEST_PANICKED only
  2. emit QEVENT_GUEST_PANICKED and pause the guest
  3. emit QEVENT_GUEST_PANICKED and poweroff the guest
  4. emit QEVENT_GUEST_PANICKED and reset the guest
 
 Would it be useful to add some "dump the guest" actions here?

Better off leaving that to the mgmt layer using QEMU. If you
tried to directly handle "dump the guest" in the context of
the panic notifier then you add all sorts of extra complexity
to this otherwise simple feature. For a start you need to
tell it what filename to use, which is not something you can
necessarily decide at the time QEMU starts - you might want
a separate filename each time a panic occurs. The mgmt app
might not even want QEMU to dump to a file - it might want
to use a socket, or pass in a file descriptor at time of
dump. All in all, it is better to keep the panic notifier
simple, and let the mgmt app then decide whether to take
a dump separately, using existing QEMU monitor commands
and features.

Daniel


Re: [Qemu-devel] [PATCH 3/3] deal with guest panicked event

2012-06-12 Thread Daniel P. Berrange
On Tue, Jun 12, 2012 at 09:35:04AM -0300, Luiz Capitulino wrote:
 On Tue, 12 Jun 2012 14:55:37 +0800
 Wen Congyang we...@cn.fujitsu.com wrote:
 
   +static void panicked_perform_action(void)
   +{
   +    switch (panicked_action) {
   +    case PANICKED_REPORT:
   +        panicked_mon_event("report");
   +        break;
   +
   +    case PANICKED_PAUSE:
   +        panicked_mon_event("pause");
   +        vm_stop(RUN_STATE_GUEST_PANICKED);
   +        break;
   +
   +    case PANICKED_QUIT:
   +        panicked_mon_event("quit");
   +        exit(0);
   +        break;
   +    }
   
   Having the data argument is not needed/wanted. The mngt app can guess it 
   if it
   needs to know it, but I think it doesn't want to.
  
  Libvirt will do something when the kernel is panicked, so it should know 
  the action
  in qemu side.
 
 But the action will be set by libvirt itself, no?

Sure, but the whole world isn't libvirt. If the process listening to the
monitor is not the same as the process which launched the VM, then I
think including the action is worthwhile. Besides, the way Wen has done
this is identical to what we already do with QEVENT_WATCHDOG and I think
it is desirable to keep consistency here.

Daniel


Re: [Qemu-devel] KVM call agenda for June, Tuesday 15th

2012-05-15 Thread Daniel P. Berrange
On Tue, May 15, 2012 at 08:44:14AM -0500, Anthony Liguori wrote:
 On 05/15/2012 03:51 AM, Kevin Wolf wrote:
 Currently we have a very simple unidirectional structure:
 qemu is a standalone program that keeps running on its own. libvirt is
 the user of qemu. Often enough it's already hard to get things working
 correctly in error cases with this simple structure - do you really want
 to have qemu depend on an RPC to libvirt?
 
 Yes.  We're relying on libvirt for a *syscall* that the kernel isn't
 processing correctly.  I'm not advocating a general mechanism where
 we defer large parts of QEMU to libvirt.  This is specifically the
 open() syscall.
 
 You're right that the proper fix would be in the kernel, but in qemu a
 much better solution that RPCs to libvirt is allowing all QMP commands
 that open new files to pass a block device description that can contain
 a fd.
 
 I don't agree that this is an obviously better solution.  For
 example, it mandates that libvirt parse image formats to determine
 the backing file chains.

I think that the question of parsing image formats is tangential
to this QEMU impl choice.

 OTOH, the open() RPC allows libvirt to avoid parsing image formats.
 It could do something as simple as have the user specify a white
 list of image files the guest is allowed to access in the domain XML
 and validate against that.

 It removes considerable complexity from libvirt as it doesn't have
 to construct a potentially complex set of blockdev arguments.

I don't really think this QEMU approach to a callback for arbitrary
files simplifies libvirt's life in any way. In fact I think it will
actually complicate our life, because instead of being able to
provide all the information/resources required at one time, we need
have to wait to get async callbacks some time later. We then have to
try and figure out whether the file being request is actually allowed
by the config.


 This would much better than first getting an open command via QMP
 and then using an RPC to ask back what we're really meant to open.
 
 To the full extent we're going to get this with blockdev-add (which is
 what we should really start working on now rather than on hacks like
 -open-fd-hook), but if you like hacks, much (if not all) of it is
 already possible today with the 'existing' mode of live snapshots.
 
 I really don't think that blockdev is an elegant solution to this
 problem.  It pushes an awful lot of complexity to libvirt (or any
 management tool).
 
 I actually think Avi's original idea of a filename dictionary is a
 better approach than blockdev for solving this problem.

While I raise blockdev as an alternative approach, I am open to
other alternative ways to provide this config via the CLI or
monitor. Basically anything that isn't this generic file open
callback.

Daniel


Re: smp option of qemu-kvm

2012-04-05 Thread Daniel P. Berrange
On Thu, Apr 05, 2012 at 02:28:51PM -0400, Steven wrote:
 Hi,
 I started a kvm VM by adding -smp 2 option. From inside the guest, I
 can see that /proc/cpuinfo outputs 2 cores.
 However, in the host, I only observe one qemu-kvm process for that VM.
 Does that mean this VM is actually running on one core?
 If so, how to make a VM to run on 2 or more cores? Thanks.

Each VCPU in KVM corresponds to a separate thread in the process. The
'ps' command only ever shows the thread leader by default - so you
don't see those VCPU threads in the process list. eg ps -eLf to
see all threads
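The per-thread view is also visible directly in /proc; a quick sketch, using the current shell's PID as a stand-in for a qemu-kvm process:

```shell
# /proc/<pid>/task has one entry per thread of the process, which is
# what 'ps -eLf' enumerates. Using the shell's own PID for illustration:
pid=$$
echo "threads of $pid:"
ls "/proc/$pid/task"
# For a running VM you would instead do something like:
#   ps -eLf | grep qemu
```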

Daniel


Re: smp option of qemu-kvm

2012-04-05 Thread Daniel P. Berrange
On Thu, Apr 05, 2012 at 02:52:40PM -0400, Steven wrote:
 Hi, Daniel,
 Thanks for your quick response. However, ps -eLf shows 4 threads
 for the VM, and I checked that the 4 threads have the same tgid.
 But I created the VM with the -smp 2 option. Could you explain this? Thanks.

As well as the vCPU threads, QEMU creates other threads as needed, typically
for I/O - indeed the count of threads may vary over time.

Daniel


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-21 Thread Daniel P. Berrange
On Wed, Mar 21, 2012 at 06:25:16PM +0200, Avi Kivity wrote:
 On 03/21/2012 06:18 PM, Corey Minyard wrote:
 
  Look at drivers/char/ipmi/ipmi_msghandler.c. It has code to send panic
  event over IMPI. The code is pretty complex. Of course if we a going to
  implement something more complex than simple hypercall for panic
  notification we better do something more interesting with it than just
  saying panic happened, like sending stack traces on all cpus for
  instance.
 
  I doubt that's the best example, unfortunately.  The IPMI event log
  has limited space and it has to be sent a little piece at a time since
  each log entry is 14 bytes.  It just prints the panic string, nothing
  else.  Not that it isn't useful, it has saved my butt before.
 
  You have lots of interesting options with paravirtualization.  You
  could, for instance, create a console driver that delivered all
  console output efficiently through a hypercall.  That would be really
  easy.  Or, as you mention, a custom way to deliver panic information. 
  Collecting information like stack traces would be harder to
  accomplish, as I don't think there is currently a way to get it except
  by sending it to printk.
 
 That already exists; virtio-console (or serial console emulation) can do
 the job.
 
 In fact the feature can be implemented 100% host side by searching for a
 panic string signature in the console logs.

You can even go one better and search for the panic string in the
guest memory directly, which is what virt-dmesg does :-)

  http://people.redhat.com/~rjones/virt-dmesg/
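The host-side scan Avi describes could be sketched roughly like this in shell; the log file here is a made-up stand-in for wherever the VM's serial console output is captured:

```shell
# Minimal sketch of host-side panic detection: scan a guest's serial
# console log for the kernel's panic signature. A real setup would point
# at the file the VM's serial port is configured to log to.
console_log=$(mktemp)
printf '[   12.3] systemd[1]: started\n[   99.9] Kernel panic - not syncing: Attempted to kill init!\n' > "$console_log"

if grep -q 'Kernel panic - not syncing' "$console_log"; then
    echo "guest panic detected"   # prints: guest panic detected
fi
rm -f "$console_log"
```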

Daniel


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Daniel P. Berrange
On Wed, Mar 14, 2012 at 03:21:14PM +0530, Amit Shah wrote:
 On (Wed) 14 Mar 2012 [16:29:50], Wen Congyang wrote:
  At 03/13/2012 06:47 PM, Avi Kivity Wrote:
   On 03/13/2012 11:18 AM, Daniel P. Berrange wrote:
   On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote:
   On 03/12/2012 11:04 AM, Wen Congyang wrote:
   Do you have any other comments about this patch?
  
  
   Not really, but I'm not 100% convinced the patch is worthwhile.  It's
   likely to only be used by Linux, which has kexec facilities, and you can
   talk to management via virtio-serial and describe the crash in more
   details than a simple hypercall.
  
   As mentioned before, I don't think virtio-serial is a good fit for this.
   We want something that is simple  guaranteed always available. Using
   virtio-serial requires significant setup work on both the host and guest.
   
   So what?  It needs to be done anyway for the guest agent.
   
   Many management applications won't know to make a vioserial device
   available to all guests they create.
   
   Then they won't know to deal with the panic event either.
   
   Most administrators won't even configure kexec,
   let alone virtio serial on top of it. 
   
   It should be done by the OS vendor, not the individual admin.
   
   The hypercall requires zero host
   side config, and zero guest side config, which IMHO is what we need for
   this feature.
   
   If it was this one feature, yes.  But we keep getting more and more
   features like that and we bloat the hypervisor.  There's a reason we
   have a host-to-guest channel, we should use it.
   
  
  I donot know how to use virtio-serial.
  
  I start vm like this:
  qemu ...\
 -device virtio-serial \
-chardev socket,path=/tmp/foo,server,nowait,id=foo \
-device virtserialport,chardev=foo,name=port1 ...
 
 This is sufficient.  On the host, you can open /tmp/foo using a custom
 program or nc (nc -U /tmp/foo).  On the guest, you can just open
 /dev/virtio-ports/port1 and read/write into it.
 
 See the following links for more details.
 
 https://fedoraproject.org/wiki/Features/VirtioSerial#How_To_Test
 http://www.linux-kvm.org/page/Virtio-serial_API
 
  You said that there are too many channels. Does it mean /tmp/foo is a 
  channel?
 
 You can have several such -device virtserialport.  The -device part
 describes what the guest will see.  The -chardev part ties that to the
 host-side part of the channel.
 
 /tmp/foo is the host end-point for the channel, in the example above,
 and /dev/virtio-ports/port1 is the guest-side end-point.
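For concreteness, the host end-point described above is just a Unix stream
socket; the following throwaway sketch reads one message from it (the socket
path and payload here are hypothetical, standing in for a real chardev):

```python
import socket

def read_one_message(path):
    """Accept a single connection on the chardev's Unix socket end-point
    and return whatever the guest side wrote (truncated at 4 KiB)."""
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen(1)
    conn, _ = srv.accept()
    try:
        return conn.recv(4096)
    finally:
        conn.close()
        srv.close()
```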

If we do choose to use virtio-serial for panics (which I don't think
we should), then we should not expose it in the host filesystem. The
host side should be a virtual chardev backend internal to QEMU, in
the same way that 'spicevmc' is handled.

Regards,
Daniel


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Daniel P. Berrange
On Wed, Mar 14, 2012 at 06:58:47PM +0800, Wen Congyang wrote:
 At 03/14/2012 06:52 PM, Avi Kivity Wrote:
  On 03/14/2012 12:52 PM, Wen Congyang wrote:
 
  If so, is this channel visible to guest userspace? If the channle is 
  visible to guest
  userspace, the program running in userspace may write the same message 
  to the channel.
 
  Access control is via permissions.  You can have udev scripts assign
  whatever uid and gid to the port of your interest.  By default, all
  ports are only accessible to the root user.
 
  We should also prevent root user writing message to this channel if it is
  used for panicked notification.
 
  
  Why?  root can easily cause a panic.
  
 
 root user can write the same message to virtio-serial while the guest is 
 running...

Unless you are running a MAC policy which strictly confines the root
account, root can cause a kernel panic regardless of virtio-serial
permissions in the guest:

  echo c > /proc/sysrq-trigger

Regards,
Daniel


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-14 Thread Daniel P. Berrange
On Wed, Mar 14, 2012 at 07:06:50PM +0800, Wen Congyang wrote:
 At 03/14/2012 06:59 PM, Daniel P. Berrange Wrote:
  On Wed, Mar 14, 2012 at 06:58:47PM +0800, Wen Congyang wrote:
  At 03/14/2012 06:52 PM, Avi Kivity Wrote:
  On 03/14/2012 12:52 PM, Wen Congyang wrote:
 
  If so, is this channel visible to guest userspace? If the channle is 
  visible to guest
  userspace, the program running in userspace may write the same message 
  to the channel.
 
  Access control is via permissions.  You can have udev scripts assign
  whatever uid and gid to the port of your interest.  By default, all
  ports are only accessible to the root user.
 
  We should also prevent root user writing message to this channel if it is
  used for panicked notification.
 
 
  Why?  root can easily cause a panic.
 
 
  root user can write the same message to virtio-serial while the guest is 
  running...
  
  Unless you are running a MAC policy which strictly confines the root
  account, root can cause a kernel panic regardless of virtio-serial
  permissions in the guest:
  
    echo c > /proc/sysrq-trigger
 
 Yes, root user can cause a kernel panic. But if he writes the same message to 
 virtio-serial,
 the host will see the guest is panicked while the guest is not panicked. The 
 host is cheated.

The host mgmt layer must *ALWAYS* assume that any information originating
from the guest may be bogus. It must never trust the guest info. So regardless
of the implementation, you have to expect that the guest might have lied
to you about being crashed. The same is true even of Xen's panic notifier.

So if an application is automatically triggering core dumps based on this
panic notification, it needs to be aware that the guest can lie, and take
steps to avoid the guest mounting a DoS attack on the host. Most likely
by rate-limiting the frequency of core dumps per guest, and/or setting a
max core-dump storage quota per guest.
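A sketch of the kind of rate limiting meant here, using a sliding window
per guest (the class name and thresholds are made up for illustration,
not part of any existing management layer):

```python
import time

class DumpRateLimiter:
    """Allow at most max_dumps core dumps per guest within a sliding window."""

    def __init__(self, max_dumps, window_secs):
        self.max_dumps = max_dumps
        self.window_secs = window_secs
        self.history = {}  # guest name -> timestamps of recent dumps

    def allow(self, guest, now=None):
        """Record and permit a dump for this guest, unless it has already
        hit the per-window quota."""
        now = time.time() if now is None else now
        recent = [t for t in self.history.get(guest, [])
                  if now - t < self.window_secs]
        if len(recent) >= self.max_dumps:
            self.history[guest] = recent
            return False
        recent.append(now)
        self.history[guest] = recent
        return True
```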

Regards,
Daniel


Re: [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-13 Thread Daniel P. Berrange
On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote:
 On 03/12/2012 11:04 AM, Wen Congyang wrote:
  Do you have any other comments about this patch?
 
 
 Not really, but I'm not 100% convinced the patch is worthwhile.  It's
 likely to only be used by Linux, which has kexec facilities, and you can
 put talk to management via virtio-serial and describe the crash in more
 details than a simple hypercall.

As mentioned before, I don't think virtio-serial is a good fit for this.
We want something that is simple and guaranteed always available. Using
virtio-serial requires significant setup work on both the host and guest.
Many management applications won't know to make a vioserial device available
to all guests they create. Most administrators won't even configure kexec,
let alone virtio-serial on top of it. The hypercall requires zero host-side
config, and zero guest-side config, which IMHO is what we need for
this feature.


Daniel


Re: [RESEND][PATCH 2/2 v3] deal with guest panicked event

2012-03-08 Thread Daniel P. Berrange
On Thu, Mar 08, 2012 at 01:28:56PM +0200, Avi Kivity wrote:
 On 03/08/2012 12:15 PM, Wen Congyang wrote:
  When the host knows the guest is panicked, it will set
  exit_reason to KVM_EXIT_GUEST_PANICKED. So if qemu receive
  this exit_reason, we can send a event to tell management
  application that the guest is panicked and set the guest
  status to RUN_STATE_PANICKED.
 
  Signed-off-by: Wen Congyang we...@cn.fujitsu.com
  ---
   kvm-all.c|5 +
   monitor.c|3 +++
   monitor.h|1 +
   qapi-schema.json |2 +-
   qmp.c|3 ++-
   vl.c |1 +
   6 files changed, 13 insertions(+), 2 deletions(-)
 
  diff --git a/kvm-all.c b/kvm-all.c
  index 77eadf6..b3c9a83 100644
  --- a/kvm-all.c
  +++ b/kvm-all.c
  @@ -1290,6 +1290,11 @@ int kvm_cpu_exec(CPUState *env)
   (uint64_t)run-hw.hardware_exit_reason);
   ret = -1;
   break;
  +case KVM_EXIT_GUEST_PANICKED:
  +monitor_protocol_event(QEVENT_GUEST_PANICKED, NULL);
  +vm_stop(RUN_STATE_PANICKED);
  +ret = -1;
  +break;
 
 
 If the management application is not aware of this event, then it will
 never resume the guest, so it will appear hung.

Even if the mgmt app doesn't know about QEVENT_GUEST_PANICKED, it should
still see a QEVENT_STOP event emitted by vm_stop(), surely? So it will
know the guest CPUs have been stopped, even if it isn't aware of the
reason why, which seems fine to me.

Daniel


Re: [RESEND][PATCH 2/2 v3] deal with guest panicked event

2012-03-08 Thread Daniel P. Berrange
On Thu, Mar 08, 2012 at 01:52:45PM +0200, Avi Kivity wrote:
 On 03/08/2012 01:36 PM, Daniel P. Berrange wrote:
  On Thu, Mar 08, 2012 at 01:28:56PM +0200, Avi Kivity wrote:
   On 03/08/2012 12:15 PM, Wen Congyang wrote:
When the host knows the guest is panicked, it will set
exit_reason to KVM_EXIT_GUEST_PANICKED. So if qemu receive
this exit_reason, we can send a event to tell management
application that the guest is panicked and set the guest
status to RUN_STATE_PANICKED.
   
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 kvm-all.c|5 +
 monitor.c|3 +++
 monitor.h|1 +
 qapi-schema.json |2 +-
 qmp.c|3 ++-
 vl.c |1 +
 6 files changed, 13 insertions(+), 2 deletions(-)
   
diff --git a/kvm-all.c b/kvm-all.c
index 77eadf6..b3c9a83 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1290,6 +1290,11 @@ int kvm_cpu_exec(CPUState *env)
 (uint64_t)run-hw.hardware_exit_reason);
 ret = -1;
 break;
+case KVM_EXIT_GUEST_PANICKED:
+monitor_protocol_event(QEVENT_GUEST_PANICKED, NULL);
+vm_stop(RUN_STATE_PANICKED);
+ret = -1;
+break;
   
   
   If the management application is not aware of this event, then it will
   never resume the guest, so it will appear hung.
 
  Even if the mgmt app doesn't know about the QEVENT_GUEST_PANICKED, it should
  still see a QEVENT_STOP event emitted by vm_stop() surely ? So it will
  know the guest CPUs have been stopped, even if it isn't aware of the
  reason why, which seems fine to me.
 
 No.  The guest is stopped, and there's no reason to suppose that the
 management app will restart it.  Behaviour has changed.
 
 Suppose the guest has reboot_on_panic set; now the behaviour change is
 even more visible - service will stop completely instead of being
 interrupted for a bit while the guest reboots.

Hmm, so this calls for a new command line argument to control behaviour,
similar to what we do for disk werror, eg something like

  --onpanic report|pause|stop|...

where

 report - emit QEVENT_GUEST_PANICKED only
 pause  - emit QEVENT_GUEST_PANICKED and pause VM
 stop   - emit QEVENT_GUEST_PANICKED and quit VM

This would map fairly well into libvirt, where we already have config
parameters for controlling what to do with a guest when it panics.
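For reference, the existing libvirt per-guest crash policy lives in the
domain XML; a hypothetical guest definition using it might look like the
following (element name and values per libvirt's domain format, to the
best of my knowledge):

```xml
<!-- Hypothetical domain fragment: dump core on guest crash, then restart -->
<domain type='kvm'>
  <name>demo-guest</name>
  <on_crash>coredump-restart</on_crash>
</domain>
```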

Regards,
Daniel


Re: [Qemu-devel] Use getaddrinfo for migration

2012-03-02 Thread Daniel P. Berrange
On Fri, Mar 02, 2012 at 02:25:36PM +0400, Michael Tokarev wrote:
 Not a reply to the patch but a general observation.
 
 I noticed that the tcp migration uses gethostname
 (or getaddrinfo after this patch) from the main
 thread - is it really the way to go?  Note that
 DNS query which is done may block for a large amount
 of time.  Is it really safe in this context?  Should
 it resolve the name in a separate thread, allowing
 guest to run while it is doing that?
 
 This question is important for me because right now
 I'm evaluating a network-connected block device driver
 which should do failover, so it will have to resolve
 alternative name(s) at runtime (especially since list
 of available targets is dynamic).
 
 From one point, _usually_, the delay there is very
 small since it is unlikely you'll do migration or
 failover overseas, so most likely you'll have the
 answer from DNS handy.  But from another point, if
 the DNS is malfunctioning right at that time (eg,
 one of the two DNS resolvers is being rebooted),
 the delay even from local DNS may be noticeable.

Yes, I think you are correct - QEMU should take care to ensure that
DNS resolution cannot block the QEMU event loop thread.

There is the glibc extension (getaddrinfo_a) which does async DNS
resolution, but for the sake of portability it is probably better
to use a thread to do it.
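Using only portable building blocks, the threaded approach can be sketched
as follows (a minimal illustration of the idea, not QEMU's actual code):

```python
import socket
import concurrent.futures

# One shared worker pool, so the event-loop thread never blocks on DNS.
_resolver = concurrent.futures.ThreadPoolExecutor(max_workers=2)

def resolve_async(host, port, callback):
    """Run getaddrinfo in a worker thread; invoke callback with the
    getaddrinfo result list, or None on resolution failure."""
    def work():
        try:
            return socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        except socket.gaierror:
            return None
    future = _resolver.submit(work)
    future.add_done_callback(lambda f: callback(f.result()))
```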

Regards,
Daniel


Re: [PATCH] kvm: notify host when guest paniced

2012-02-29 Thread Daniel P. Berrange
On Wed, Feb 29, 2012 at 11:49:58AM +0200, Avi Kivity wrote:
 On 02/29/2012 03:29 AM, Wen Congyang wrote:
  At 02/28/2012 07:23 PM, Avi Kivity Wrote:
   On 02/27/2012 05:01 AM, Wen Congyang wrote:
   We can know the guest is paniced when the guest runs on xen.
   But we do not have such feature on kvm. This patch implemnts
   this feature, and the implementation is the same as xen:
   register panic notifier, and call hypercall when the guest
   is paniced.
   
   What's the motivation for this?  Xen does this is insufficient.
 
  Another purpose is: management app(for example: libvirt) can do auto
  dump when the guest is crashed. If management app does not do auto
  dump, the guest's user can do dump by hand if he sees the guest is
  paniced.
 
  I am thinking about another status: dumping. This status tells
  the guest's user that the guest is paniced, and the OS's dump function
  is working.
 
  These two status can tell the guest's user whether the guest is pancied,
  and what should he do if the guest is paniced.
 
 
 How about using a virtio-serial channel for this?  You can transfer any
 amount of information (including the dump itself).

When the guest OS has crashed, any dumps will be done from the host
OS using libvirt's core dump mechanism. The guest OS isn't involved
and is likely too dead to be of any use anyway. Likewise it is
quite probably too dead to operate a virtio-serial channel or any
similarly complex device. We're really just after the simplest
possible notification that the guest kernel has panicked.

Regards,
Daniel


Re: [PATCH] kvm: notify host when guest paniced

2012-02-29 Thread Daniel P. Berrange
On Wed, Feb 29, 2012 at 12:05:32PM +0200, Avi Kivity wrote:
 On 02/29/2012 11:58 AM, Daniel P. Berrange wrote:
   
   How about using a virtio-serial channel for this?  You can transfer any
   amount of information (including the dump itself).
 
  When the guest OS has crashed, any dumps will be done from the host
  OS using libvirt's core dump mechanism. The guest OS isn't involved
  and is likely too dead to be of any use anyway. Likewise it is
  quite probably too dead to work a virtio-serial channel or any
  similarly complex device. We're really just after the simplest
  possible notification that the guest kernel has paniced.
 
 If it's alive enough to panic, it's alive enough to kexec its kdump
 kernel.  After that it can do anything.
 
 Guest-internal dumps are more useful IMO that host-initiated dumps.  In
 a cloud, the host-initiated dump is left on the host, outside the reach
 of the guest admin, outside the guest image where all the symbols are,
 and sometimes not even on the same host if a live migration occurred. 
 It's more useful in small setups, or if the problem is in the
 hypervisor, not the guest.

I don't think guest vs host dumps should be considered mutually exclusive;
they both have pluses and minuses.

Configuring kexec+kdump requires non-negligible guest admin configuration
work before it's usable, and this work is guest-OS specific, if it is possible
at all. A permanent panic notifier that's built into the kernel by default
requires zero guest admin config, and can allow the host admin to automate
collection of dumps across all their hosts/guests. The KVM hypercall
notification is fairly trivially ported to any OS kernel, by comparison
with a full virtio + virtio-serial impl.

Regards,
Daniel


Re: [libvirt] QEMU applying for Google Summer of Code 2012

2012-02-10 Thread Daniel P. Berrange
On Fri, Feb 10, 2012 at 10:30:24AM +, Stefan Hajnoczi wrote:
 This year's Google Summer of Code has been announced:
 
 http://www.google-melange.com/gsoc/events/google/gsoc2012
 
 For those who haven't heard of GSoC before, it funds university
 students to work on open source projects during the summer.
 Organizations, such as QEMU, can participate to attract students who
 will tackle projects for 12 weeks this summer.  The GSoC program has
 been very successful because it gives students real open source
 experience and organizations can grow their development community.
 
 QEMU has participated for several years and I would like to organize
 our participation this year.  Luiz was QEMU organization administrator
 last year and contacted me because he will not have time this year.  I
 will prepare the application form for QEMU so that we will be
 considered for 2012.
 
 Umbrella organization
 -
 Like last year, we can provide a home for KVM kernel module and
 libvirt projects too if those organizations prefer not to apply to
 GSoC themselves.  Please let us know so we can work together!

To maximise the spirit of collaboration between the libvirt and QEMU/KVM
communities I think it would make sense for us to work together under
the same GSoC Umbrella organization.

Regards,
Daniel


Re: [Qemu-devel] qemu-kvm upstreaming: Do we need -no-kvm-pit and -no-kvm-pit-reinjection semantics?

2012-01-20 Thread Daniel P. Berrange
On Fri, Jan 20, 2012 at 11:22:27AM +0100, Jan Kiszka wrote:
 On 2012-01-20 11:14, Marcelo Tosatti wrote:
  On Thu, Jan 19, 2012 at 07:01:44PM +0100, Jan Kiszka wrote:
  On 2012-01-19 18:53, Marcelo Tosatti wrote:
  What problems does it cause, and in which scenarios? Can't they be
  fixed?
 
  If the guest compensates for lost ticks, and KVM reinjects them, guest
  time advances faster then it should, to the extent where NTP fails to
  correct it. This is the case with RHEL4.
 
  But for example v2.4 kernel (or Windows with non-acpi HAL) do not
  compensate. In that case you want KVM to reinject.
 
  I don't know of any other way to fix this.
 
  OK, i see. The old unsolved problem of guessing what is being executed.
 
  Then the next question is how and where to control this. Conceptually,
  there should rather be a global switch say compensate for lost ticks of
  periodic timers: yes/no - instead of a per-timer knob. Didn't we
  discussed something like this before?
  
  I don't see the advantage of a global control versus per device
  control (in fact it lowers flexibility).
 
 Usability. Users should not have to care about individual tick-based
 clocks. They care about my OS requires lost ticks compensation, yes or no.

FYI, at the libvirt level we model policy against individual timers, for
example:

  clock offset=localtime
timer name=rtc tickpolicy=catchup track=guest/
timer name=pit tickpolicy=delay/
  /clock


Daniel


Re: [Qemu-devel] qemu-kvm upstreaming: Do we need -no-kvm-pit and -no-kvm-pit-reinjection semantics?

2012-01-20 Thread Daniel P. Berrange
On Fri, Jan 20, 2012 at 12:13:48PM +0100, Jan Kiszka wrote:
 On 2012-01-20 11:25, Daniel P. Berrange wrote:
  On Fri, Jan 20, 2012 at 11:22:27AM +0100, Jan Kiszka wrote:
  On 2012-01-20 11:14, Marcelo Tosatti wrote:
  On Thu, Jan 19, 2012 at 07:01:44PM +0100, Jan Kiszka wrote:
  On 2012-01-19 18:53, Marcelo Tosatti wrote:
  What problems does it cause, and in which scenarios? Can't they be
  fixed?
 
  If the guest compensates for lost ticks, and KVM reinjects them, guest
  time advances faster then it should, to the extent where NTP fails to
  correct it. This is the case with RHEL4.
 
  But for example v2.4 kernel (or Windows with non-acpi HAL) do not
  compensate. In that case you want KVM to reinject.
 
  I don't know of any other way to fix this.
 
  OK, i see. The old unsolved problem of guessing what is being executed.
 
  Then the next question is how and where to control this. Conceptually,
  there should rather be a global switch say compensate for lost ticks of
  periodic timers: yes/no - instead of a per-timer knob. Didn't we
  discussed something like this before?
 
  I don't see the advantage of a global control versus per device
  control (in fact it lowers flexibility).
 
  Usability. Users should not have to care about individual tick-based
  clocks. They care about my OS requires lost ticks compensation, yes or 
  no.
  
  FYI, at the libvirt level we model policy against individual timers, for
  example:
  
clock offset=localtime
  timer name=rtc tickpolicy=catchup track=guest/
  timer name=pit tickpolicy=delay/
/clock
 
 Are the various modes of tickpolicy fully specified somewhere?

There are some (not all that great) docs here:

  http://libvirt.org/formatdomain.html#elementsTime

The meaning of the 4 policies are:

  delay: continue to deliver at normal rate
catchup: deliver at higher rate to catchup
  merge: ticks merged into 1 single tick
discard: all missed ticks are discarded


The original design rationale was here, though beware that some things
changed between this design  the actual implementation libvirt has:

  https://www.redhat.com/archives/libvir-list/2010-March/msg00304.html

Regards,
Daniel


Re: [Qemu-devel] qemu-kvm upstreaming: Do we need -no-kvm-pit and -no-kvm-pit-reinjection semantics?

2012-01-20 Thread Daniel P. Berrange
On Fri, Jan 20, 2012 at 01:00:06PM +0100, Jan Kiszka wrote:
 On 2012-01-20 12:45, Daniel P. Berrange wrote:
  On Fri, Jan 20, 2012 at 12:13:48PM +0100, Jan Kiszka wrote:
  On 2012-01-20 11:25, Daniel P. Berrange wrote:
  On Fri, Jan 20, 2012 at 11:22:27AM +0100, Jan Kiszka wrote:
  On 2012-01-20 11:14, Marcelo Tosatti wrote:
  On Thu, Jan 19, 2012 at 07:01:44PM +0100, Jan Kiszka wrote:
  On 2012-01-19 18:53, Marcelo Tosatti wrote:
  What problems does it cause, and in which scenarios? Can't they be
  fixed?
 
  If the guest compensates for lost ticks, and KVM reinjects them, guest
  time advances faster then it should, to the extent where NTP fails to
  correct it. This is the case with RHEL4.
 
  But for example v2.4 kernel (or Windows with non-acpi HAL) do not
  compensate. In that case you want KVM to reinject.
 
  I don't know of any other way to fix this.
 
  OK, i see. The old unsolved problem of guessing what is being executed.
 
  Then the next question is how and where to control this. Conceptually,
  there should rather be a global switch say compensate for lost ticks 
  of
  periodic timers: yes/no - instead of a per-timer knob. Didn't we
  discussed something like this before?
 
  I don't see the advantage of a global control versus per device
  control (in fact it lowers flexibility).
 
  Usability. Users should not have to care about individual tick-based
  clocks. They care about my OS requires lost ticks compensation, yes or 
  no.
 
  FYI, at the libvirt level we model policy against individual timers, for
  example:
 
clock offset=localtime
  timer name=rtc tickpolicy=catchup track=guest/
  timer name=pit tickpolicy=delay/
/clock
 
  Are the various modes of tickpolicy fully specified somewhere?
  
  There are some (not all that great) docs here:
  
http://libvirt.org/formatdomain.html#elementsTime
  
  The meaning of the 4 policies are:
  
delay: continue to deliver at normal rate
 
 What does this mean? The timer stops ticking until the guest accepts its
 ticks again?

It means that the hypervisor will not attempt to do any compensation,
so the guest will see delays in its ticks being delivered and will gradually
drift over time.

  catchup: deliver at higher rate to catchup
merge: ticks merged into 1 single tick
  discard: all missed ticks are discarded
 
 But those interpretations aren't stated in the docs. That makes it hard
 to map them on individual hypervisors - or model proper new hypervisor
 interfaces accordingly.

That's not a real problem; now that I notice they are missing from the docs,
I can just add them in.


Daniel


Re: [Qemu-devel] qemu-kvm upstreaming: Do we need -no-kvm-pit and -no-kvm-pit-reinjection semantics?

2012-01-20 Thread Daniel P. Berrange
On Fri, Jan 20, 2012 at 01:51:20PM +0100, Jan Kiszka wrote:
 On 2012-01-20 13:42, Daniel P. Berrange wrote:
  On Fri, Jan 20, 2012 at 01:00:06PM +0100, Jan Kiszka wrote:
  On 2012-01-20 12:45, Daniel P. Berrange wrote:
  On Fri, Jan 20, 2012 at 12:13:48PM +0100, Jan Kiszka wrote:
  On 2012-01-20 11:25, Daniel P. Berrange wrote:
  On Fri, Jan 20, 2012 at 11:22:27AM +0100, Jan Kiszka wrote:
  On 2012-01-20 11:14, Marcelo Tosatti wrote:
  On Thu, Jan 19, 2012 at 07:01:44PM +0100, Jan Kiszka wrote:
  On 2012-01-19 18:53, Marcelo Tosatti wrote:
  What problems does it cause, and in which scenarios? Can't they be
  fixed?
 
  If the guest compensates for lost ticks, and KVM reinjects them, 
  guest
  time advances faster then it should, to the extent where NTP fails 
  to
  correct it. This is the case with RHEL4.
 
  But for example v2.4 kernel (or Windows with non-acpi HAL) do not
  compensate. In that case you want KVM to reinject.
 
  I don't know of any other way to fix this.
 
  OK, i see. The old unsolved problem of guessing what is being 
  executed.
 
  Then the next question is how and where to control this. 
  Conceptually,
  there should rather be a global switch say compensate for lost 
  ticks of
  periodic timers: yes/no - instead of a per-timer knob. Didn't we
  discussed something like this before?
 
  I don't see the advantage of a global control versus per device
  control (in fact it lowers flexibility).
 
  Usability. Users should not have to care about individual tick-based
  clocks. They care about my OS requires lost ticks compensation, yes 
  or no.
 
  FYI, at the libvirt level we model policy against individual timers, for
  example:
 
clock offset=localtime
  timer name=rtc tickpolicy=catchup track=guest/
  timer name=pit tickpolicy=delay/
/clock
 
  Are the various modes of tickpolicy fully specified somewhere?
 
  There are some (not all that great) docs here:
 
http://libvirt.org/formatdomain.html#elementsTime
 
  The meaning of the 4 policies are:
 
delay: continue to deliver at normal rate
 
  What does this mean? The timer stops ticking until the guest accepts its
  ticks again?
  
  It means that the hypervisor will not attempt to do any compensation,
  so the guest will see delays in its ticks being delivered  gradually
  drift over time.
 
 Still, is the logic as I described? Or what is the difference to discard.

With 'discard', the delayed tick will be thrown away. With 'delay', the
delayed tick will still be injected into the guest, possibly well after
the intended injection time, and there will be no attempt to
compensate by speeding up delivery of later ticks.
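To make the discard/delay/merge/catchup distinction concrete, here is a
toy model of a periodic timer whose guest cannot accept ticks for a while.
This is purely illustrative arithmetic following the policy descriptions
above, not QEMU's or KVM's actual implementation:

```python
def deliver_ticks(period, blocked_until, end, policy):
    """Toy model: ticks are due at period, 2*period, ... up to end; the
    guest cannot accept ticks before blocked_until.  Return delivery times."""
    due = list(range(period, end + 1, period))
    missed = [t for t in due if t < blocked_until]
    on_time = [t for t in due if t >= blocked_until]
    if policy == "discard":
        # missed ticks are simply thrown away
        return on_time
    if policy == "merge":
        # all missed ticks collapse into one tick at unblock time
        return ([blocked_until] if missed else []) + on_time
    if policy == "delay":
        # every missed tick is still injected, late, at the normal rate
        return [blocked_until] * len(missed) + on_time
    if policy == "catchup":
        # missed ticks are re-injected at a higher (here: double) rate
        step = period // 2
        return [blocked_until + i * step for i in range(len(missed))] + on_time
    raise ValueError(policy)
```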


Regards,
Daniel


Re: [Qemu-devel] qemu-kvm upstreaming: Do we need -no-kvm-pit and -no-kvm-pit-reinjection semantics?

2012-01-20 Thread Daniel P. Berrange
On Fri, Jan 20, 2012 at 02:02:03PM +0100, Jan Kiszka wrote:
 On 2012-01-20 13:54, Daniel P. Berrange wrote:
  On Fri, Jan 20, 2012 at 01:51:20PM +0100, Jan Kiszka wrote:
  On 2012-01-20 13:42, Daniel P. Berrange wrote:
  On Fri, Jan 20, 2012 at 01:00:06PM +0100, Jan Kiszka wrote:
  On 2012-01-20 12:45, Daniel P. Berrange wrote:
  On Fri, Jan 20, 2012 at 12:13:48PM +0100, Jan Kiszka wrote:
  On 2012-01-20 11:25, Daniel P. Berrange wrote:
  On Fri, Jan 20, 2012 at 11:22:27AM +0100, Jan Kiszka wrote:
  On 2012-01-20 11:14, Marcelo Tosatti wrote:
  On Thu, Jan 19, 2012 at 07:01:44PM +0100, Jan Kiszka wrote:
  On 2012-01-19 18:53, Marcelo Tosatti wrote:
  What problems does it cause, and in which scenarios? Can't they 
  be
  fixed?
 
  If the guest compensates for lost ticks, and KVM reinjects them, 
  guest
  time advances faster then it should, to the extent where NTP 
  fails to
  correct it. This is the case with RHEL4.
 
  But for example v2.4 kernel (or Windows with non-acpi HAL) do not
  compensate. In that case you want KVM to reinject.
 
  I don't know of any other way to fix this.
 
  OK, I see. The old unsolved problem of guessing what is being 
  executed.
 
  Then the next question is how and where to control this. 
  Conceptually,
  there should rather be a global switch, say "compensate for lost ticks of
  periodic timers: yes/no" - instead of a per-timer knob. Didn't we
  discuss something like this before?
 
  I don't see the advantage of a global control versus per device
  control (in fact it lowers flexibility).
 
  Usability. Users should not have to care about individual tick-based
  clocks. They care about "my OS requires lost ticks compensation, yes
  or no".
 
  FYI, at the libvirt level we model policy against individual timers, 
  for
  example:
 
<clock offset='localtime'>
  <timer name='rtc' tickpolicy='catchup' track='guest'/>
  <timer name='pit' tickpolicy='delay'/>
</clock>
 
  Are the various modes of tickpolicy fully specified somewhere?
 
  There are some (not all that great) docs here:
 
http://libvirt.org/formatdomain.html#elementsTime
 
  The meaning of the 4 policies are:
 
delay: continue to deliver at normal rate
 
  What does this mean? The timer stops ticking until the guest accepts its
  ticks again?
 
  It means that the hypervisor will not attempt to do any compensation,
  so the guest will see delays in its ticks being delivered & gradually
  drift over time.
 
  Still, is the logic as I described? Or what is the difference to discard.
  
  With 'discard', the delayed tick will be thrown away. In 'delay', the
  delayed tick will still be injected to the guest, possibly well after
  the intended injection time though, and there will be no attempt to
  compensate by speeding up delivery of later ticks.
 
 OK, let's see if I got it:
 
 delay   - all lost ticks are replayed in a row once the guest accepts
   them again
 catchup - lost ticks are gradually replayed at a higher frequency than
   the original tick
 merge   - at most one tick is replayed once the guest accepts it again
 discard - no lost ticks compensation

Yes, I think that is a good understanding.

Daniel


Re: SPEC-file for making RPMs (with rpmbuild)

2012-01-06 Thread Daniel P. Berrange
On Fri, Jan 06, 2012 at 11:11:21AM +0100, Guido Winkelmann wrote:
 Hi,
 
 Is there a spec-file somewhere for creating RPMs from the newest qemu-kvm 
 release?

The current Fedora RPM specfiles are always a good bet to start off with:

  http://pkgs.fedoraproject.org/gitweb/?p=qemu.git;a=blob;f=qemu.spec;hb=HEAD

By default these will build all QEMU targets, and a dedicated qemu-kvm
binary too. There is a flag to restrict it to x86 only for cases where
you don't want all archs.

Regards,
Daniel


Re: 5x slower guest disk performance with virtio disk

2011-12-15 Thread Daniel P. Berrange
On Thu, Dec 15, 2011 at 07:16:22PM +0200, Sasha Levin wrote:
 On Thu, 2011-12-15 at 11:55 -0500, Brian J. Murrell wrote:
  So, about 2/3 of host speed now -- which is much better.  Is 2/3 about
  normal or should I be looking for more? 
 
 aio=native
 
 That's the qemu setting, I'm not sure where libvirt hides that.

  <disk ...>
    <driver io='threads|native'/>
    ...
  </disk>
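
For reference, the same knob on a raw QEMU command line is the aio=
suboption of -drive (image path here is hypothetical); aio=native is
normally paired with cache=none so the image is opened with O_DIRECT:

```shell
qemu-kvm \
  -drive file=/var/lib/libvirt/images/guest.img,if=virtio,cache=none,aio=native
```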

Regards,
Daniel


Re: [libvirt] (no subject)

2011-12-07 Thread Daniel P. Berrange
On Wed, Dec 07, 2011 at 08:21:06AM +0200, Sasha Levin wrote:
 On Tue, 2011-12-06 at 14:38 +, Daniel P. Berrange wrote:
  On Fri, Nov 11, 2011 at 07:56:58PM +0800, Osier Yang wrote:
 * KVM tool manages the network completely itself (with DHCP support?),
   no way to configure, except specify the modes (user|tap|none). I
   have not test it yet, but it should need explicit script to setup
   the network rules(e.g. NAT) for the guest access outside world.
   Anyway, there is no way for libvirt to control the guest network.
  
  If KVM tool supports TAP devices, can't we do whatever we like with
  that just by passing in a configured TAP device from libvirt?
 
 KVM tool currently creates and configures the TAP devices it uses, it
 shouldn't be an issue to have it use a TAP fd passed to it either.
 
 How does libvirt do it? Create a /dev/tapX on its own and pass the fd
 to the hypervisor?

Yes, libvirt opens a /dev/tap device (or a macvtap device for VEPA
mode), adds it to the necessary bridge, and/or configures VEPA, etc
and then passes the FD to the hypervisor, with a ARGV parameter to
tell the HV which FD is being passed.

 * console connection is implemented by setup ptys in libvirt, 
   stdout/stderr
   of kvm tool process is redirected to the master pty, and libvirt 
   connects
   to the slave pty. This works fine now, but it might be better if kvm
   tool could provide more advanced console mechanisms. Just like QEMU
   does?
  
  This sounds good enough for now.
 
 KVM tools does a redirection to a PTY, which at that point could be
 redirected to anywhere the user wants.
 
 What features might be interesting to do on top of that?

I presume that Osier is just comparing with the features QEMU has available
for chardevs config, which include

 - PTYs
 - UNIX sockets
 - TCP sockets
 - UDP sockets
 - FIFO pipe
 - Plain file (output only obviously, but useful for logging)

libvirt doesn't specifically need any of them, but it can support those
options if they exist.
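
For example, the corresponding QEMU -chardev backends look like this
(ids and paths are hypothetical):

```shell
qemu-kvm \
  -chardev pty,id=serial0 \
  -chardev socket,id=mon0,path=/tmp/guest-mon.sock,server,nowait \
  -chardev file,id=log0,path=/var/log/guest-serial.log
```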

 * Not much ways existed yet for external apps or user to query the guest
   informations. But this might be changed soon per KVM tool grows up
   quickly.
  
  What sort of guest info are you thinking about ? The most immediate
  pieces of info I can imagine we need are
  
   - Mapping between PIDs and  vCPU threads
   - Current balloon driver value
 
 Those are pretty easily added using the IPC interface I've mentioned
 above. For example, 'kvm balloon' and 'kvm stat' will return a lot of
 info out of the balloon driver (including the memory stats VQ - which
 afaik we're probably the only ones who actually do that (but I might be
 wrong) :)

Ok, that sounds sufficient for the balloon info.

Regards,
Daniel


Re: [PATCH] kvm tools: Allow the user to pass a FD to use as a TAP device

2011-12-07 Thread Daniel P. Berrange
On Wed, Dec 07, 2011 at 06:28:12PM +0200, Pekka Enberg wrote:
 On Wed, Dec 7, 2011 at 11:37 AM, Sasha Levin levinsasha...@gmail.com wrote:
  This allows users to pass a pre-configured fd to use for the network
  interface.
 
  For example:
         kvm run -n mode=tap,fd=3 3</dev/net/tap3
 
  Cc: Daniel P. Berrange berra...@redhat.com
  Cc: Osier Yang jy...@redhat.com
  Signed-off-by: Sasha Levin levinsasha...@gmail.com
 
 Daniel, Osier, I assume this is useful for libvirt?

Yes, this works.

I don't know if kvmtool supports the VNET_HDR extension yet, but if it
does, then we can make libvirt pass in a pre-opened FD for that too.


Daniel


Re: [libvirt] (no subject)

2011-12-06 Thread Daniel P. Berrange
On Fri, Nov 11, 2011 at 07:56:58PM +0800, Osier Yang wrote:
 Hi, all
 
 This is a basic implementation of libvirt Native Linux KVM
 Tool driver. Note that this is just made with my own interest
 and spare time, it's not an endorsement/effort by Red Hat,
 and it isn't supported by Red Hat officially.
 
 Basically, the driver is designed as *stateful*, as KVM tool
 doesn't maintain any info about the guest except a socket which
 for its own IPC. And it's implemented by using KVM tool binary,
 which is named kvm currently, along with cgroup controllers
 cpuacct, and memory support. And as one of KVM tool's
 principles is to allow both the non-root and root user to play with it.
 The driver is designed to support root and non-root too, just
 like QEMU does. Example of the connection URI:
 
 virsh -c kvmtool:///system
 virsh -c kvmtool:///session
 virsh -c kvmtool+unix:///system
 virsh -c kvmtool+unix:///session
 
 The implementation can support more or less than 15 virsh commands
 currently, including basic domain cycle operations (define/undefine,
 start/destroy, suspend/resume, console, setmem, schedinfo, dumpxml,
 autostart, dominfo, etc.)
 
 About the domain configuration:
   * kernel: must be specified as KVM tool only support boots
  from the kernel currently (no integration with BIOS app yet).
 
   * disk: only virtio bus is supported, and device type must be 'disk'.
 
   * serial/console: only one console is supported, of type serial or
  virtio (can extend to support multiple console as long as kvm tool
 supports, libvirt already supports multiple consoles, see upstream
  commit 0873b688c).
 
   * p9fs: only support specifying the source dir, and mount tag, only
  type of 'mount' is supported.
 
   * memballoon: only virtio is supported, and there is no way
  to config the addr.
 
   * Multiple disk and p9fs is supported.
 
   * Graphics and network are not supported, will explain below.
 
 Please see [PATCH 7/8] for an example of the domain config. (which
 contains all the XMLs supported by current implementation).
 
 The problems of Native Linux KVM Tool from libvirt p.o.v:
 
   * Some distros package qemu-kvm as kvm, also kvm is a long
 established name for KVM itself, so naming the project as
 kvm might be not a good idea. I assume it will be named
 as kvmtool in this implementation, never mind this if you
 don't like that, it can be updated easily. :-)

Yeah, naming the binary 'kvm' is just madness. I'd strongly recommend
using 'kvmtool' as the binary name to avoid confusion with existing
'kvm' binaries based on QEMU.

   * It still doesn't have an official package yet, even no make install.
 means we have no way to check the dependency and do the checking
 when 'configure'. I assume it will be installed as /usr/bin/kvmtool
 in this implementation. This is the main reason which can prevents
 upstream libvirt accepting the patches I guess.

Ok, not really a problem - we do similar for the regular QEMU driver.

   * Lacks of options for user's configuration, such as -vnc, there
 is no option for user to configure the properties for the vnc,
 such as the port. It hides things, doesn't provide ways to query
 the properties too, this causes problems for libvirt to add the
 vnc support, as vnc clients such as virt-manager, virt-viewer,
 have no way to connect the guest. Even vncviewer can't.

Being able to specify a VNC port of libvirt's choosing is pretty
much mandatory to be able to support that. In addition, being able
to specify the bind address is important to be able to control
security. eg to only bind to 127.0.0.1, or only to certain NICs
in a multi-NIC host.

   * KVM tool manages the network completely itself (with DHCP support?),
 no way to configure, except specify the modes (user|tap|none). I
 have not test it yet, but it should need explicit script to setup
 the network rules(e.g. NAT) for the guest access outside world.
 Anyway, there is no way for libvirt to control the guest network.

If KVM tool supports TAP devices, can't we do whatever we like with
that just by passing in a configured TAP device from libvirt?

   * There is a gap about the domain status between KVM tool and libvirt,
 it's caused by KVM tool unlink()ing the guest socket when the user exits
 from the console (both text and graphic), but libvirt still thinks the
 guest is running.

Being able to reliably detect shutdown/exit of the KVM tool is
a very important task, and we can't rely on waitpid/SIG_CHLD
because we want to daemonize all instances wrt libvirtd.

In the QEMU driver we keep open a socket to the monitor, and
when we see an I/O error / POLLHUP on the socket we know that
QEMU has quit.

What is this guest socket used for ? Could libvirt keep open a
connection to it ?

One other option would be to use inotify to watch for deletion
of the guest socket in the filesystem. This is sort of what we
do with the UML 

Re: [libvirt] [PATCH] kvm tools: Introduce an ENV variable for the state dir

2011-12-06 Thread Daniel P. Berrange
On Fri, Nov 11, 2011 at 07:57:00PM +0800, Osier Yang wrote:
 Which is named as KVMTOOL_STATE_DIR, so that the user can
 configure the path of state directly as he wants.
 ---
  tools/kvm/main.c |7 ++-
  1 files changed, 6 insertions(+), 1 deletions(-)
 
 diff --git a/tools/kvm/main.c b/tools/kvm/main.c
 index 05bc82c..37b2b1d 100644
 --- a/tools/kvm/main.c
 +++ b/tools/kvm/main.c
 @@ -13,7 +13,12 @@ static int handle_kvm_command(int argc, char **argv)
  
  int main(int argc, char *argv[])
  {
 - kvm__set_dir("%s/%s", HOME_DIR, KVM_PID_FILE_PATH);
 + char *state_dir = getenv("KVMTOOL_STATE_DIR");
 +
 + if (state_dir)
 + kvm__set_dir("%s", state_dir);
 + else
 + kvm__set_dir("%s/%s", HOME_DIR, KVM_PID_FILE_PATH);
  
   return handle_kvm_command(argc - 1, argv[1]);
  }

As per my comments in the first patch, I don't think this is critical
for libvirt's needs. We should just honour the default location that
the KVM tool uses, rather than forcing a libvirt specific location.


Daniel


Re: [libvirt] [PATCH 5/7] kvmtool: Add new domain type

2011-12-06 Thread Daniel P. Berrange
On Fri, Nov 11, 2011 at 07:57:04PM +0800, Osier Yang wrote:
 It's named as kvmtool.
 ---
  src/conf/domain_conf.c |4 +++-
  src/conf/domain_conf.h |1 +
  2 files changed, 4 insertions(+), 1 deletions(-)
 
 diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
 index 58f4d0f..55121d8 100644
 --- a/src/conf/domain_conf.c
 +++ b/src/conf/domain_conf.c
 @@ -91,7 +91,8 @@ VIR_ENUM_IMPL(virDomainVirt, VIR_DOMAIN_VIRT_LAST,
"hyperv",
"vbox",
"one",
 -  "phyp")
 +  "phyp",
 +  "kvmtool")
  
  VIR_ENUM_IMPL(virDomainBoot, VIR_DOMAIN_BOOT_LAST,
fd,
 @@ -4018,6 +4019,7 @@ virDomainChrDefParseXML(virCapsPtr caps,
  if (type == NULL) {
  def-source.type = VIR_DOMAIN_CHR_TYPE_PTY;
  } else if ((def-source.type = virDomainChrTypeFromString(type))  0) {
 +VIR_WARN("type = %s", type);
  virDomainReportError(VIR_ERR_XML_ERROR,
   _("unknown type presented to host for character "
 "device: %s"),
   type);
 diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
 index a3cb834..001bc46 100644
 --- a/src/conf/domain_conf.h
 +++ b/src/conf/domain_conf.h
 @@ -59,6 +59,7 @@ enum virDomainVirtType {
  VIR_DOMAIN_VIRT_VBOX,
  VIR_DOMAIN_VIRT_ONE,
  VIR_DOMAIN_VIRT_PHYP,
 +VIR_DOMAIN_VIRT_KVMTOOL,
  
  VIR_DOMAIN_VIRT_LAST,
  };

IMHO this patch is not required. The domain type is referring to the
hypervisor used for the domain, which is still 'kvm'. What is different
here is just the userspace device model.  If you look at the 3 different
Xen user spaces we support, all of them use domain type='xen' still.
So just use  domain type='kvm' here for kvmtool.


Regards,
Daniel


Re: [libvirt] [PATCH 2/7] kvmtool: Add documents

2011-12-06 Thread Daniel P. Berrange
On Fri, Nov 11, 2011 at 07:57:01PM +0800, Osier Yang wrote:
 The document is rather rough now, but at least contains an domain
 config example of all the current supported XMLs, and tells how to
 play with the driver.
 ---
  docs/drivers.html.in|1 +
  docs/drvkvmtool.html.in |   87 
 +++
  docs/index.html.in  |3 ++
  docs/sitemap.html.in|4 ++
  src/README  |3 +-
  5 files changed, 97 insertions(+), 1 deletions(-)
  create mode 100644 docs/drvkvmtool.html.in
 
 diff --git a/docs/drivers.html.in b/docs/drivers.html.in
 index 75038fc..249c137 100644
 --- a/docs/drivers.html.in
 +++ b/docs/drivers.html.in
 @@ -29,6 +29,7 @@
   <li><strong><a href="drvvmware.html">VMware Workstation/Player</a></strong></li>
   <li><strong><a href="drvxen.html">Xen</a></strong></li>
   <li><strong><a href="drvhyperv.html">Microsoft Hyper-V</a></strong></li>
 +  <li><strong><a href="drvkvmtool.html">Native Linux KVM Tool</a></strong></li>
   </ul>
  
   <h2><a name="stroage">Storage drivers</a></h2>
 diff --git a/docs/drvkvmtool.html.in b/docs/drvkvmtool.html.in
 new file mode 100644
 index 000..1b6acdf
 --- /dev/null
 +++ b/docs/drvkvmtool.html.in
 @@ -0,0 +1,87 @@
 +<html>
 +  <body>
 +    <h1>KVM tool driver</h1>
 +
 +    <ul id="toc"></ul>
 +
 +    <p>
 +      The libvirt KVMTOOL driver manages hypervisor Native Linux KVM Tool,
 +      it's implemented by using command line of kvm tool binary.
 +    </p>
 +
 +    <h2><a name="project">Project Links</a></h2>
 +
 +    <ul>
 +      <li>
 +        The <a href="git://github.com/penberg/linux-kvm.git">Native Linux KVM Tool</a> Native
 +        Linux KVM Tool
 +      </li>
 +    </ul>
 +
 +    <h2><a name="uris">Connections to the KVMTOOL driver</a></h2>
 +    <p>
 +      The libvirt KVMTOOL driver is a multi-instance driver, providing a single
 +      system wide privileged driver (the system instance), and per-user
 +      unprivileged drivers (the session instance). The URI driver protocol
 +      is kvmtool. Some example connection URIs for the libvirt driver are:
 +    </p>
 +
 +    <pre>
 +      kvmtool:///session  (local access to per-user instance)
 +      kvmtool+unix:///session (local access to per-user instance)
 +
 +      kvmtool:///system   (local access to system instance)
 +      kvmtool+unix:///system  (local access to system instance)
 +    </pre>
 +    <p>
 +      cgroups controllers cpuacct, and memory are supported currently.
 +    </p>
 +
 +  <h3>Example config</h3>
 +
 +  <pre>
 +&lt;domain type='kvmtool' id='1'&gt;

As mentioned in a later patch, we should just use type='kvm' here still


Daniel


Re: [libvirt] [PATCH 3/7] kvmtool: Add new enums and error codes for the driver

2011-12-06 Thread Daniel P. Berrange
On Fri, Nov 11, 2011 at 07:57:02PM +0800, Osier Yang wrote:
 ---
  include/libvirt/virterror.h |1 +
  src/driver.h|1 +
  src/util/virterror.c|3 +++
  3 files changed, 5 insertions(+), 0 deletions(-)
 
 diff --git a/include/libvirt/virterror.h b/include/libvirt/virterror.h
 index a8549b7..deda42d 100644
 --- a/include/libvirt/virterror.h
 +++ b/include/libvirt/virterror.h
 @@ -84,6 +84,7 @@ typedef enum {
  VIR_FROM_LIBXL = 41, /* Error from libxenlight driver */
  VIR_FROM_LOCKING = 42,   /* Error from lock manager */
  VIR_FROM_HYPERV = 43,/* Error from Hyper-V driver */
 +VIR_FROM_KVMTOOL = 44,   /* Error from kvm tool driver */
  } virErrorDomain;
  
  
 diff --git a/src/driver.h b/src/driver.h
 index 4c14aaa..158a13c 100644
 --- a/src/driver.h
 +++ b/src/driver.h
 @@ -30,6 +30,7 @@ typedef enum {
  VIR_DRV_VMWARE = 13,
  VIR_DRV_LIBXL = 14,
  VIR_DRV_HYPERV = 15,
 +VIR_DRV_KVMTOOL = 16,
  } virDrvNo;
  
  
 diff --git a/src/util/virterror.c b/src/util/virterror.c
 index 5006fa2..abb5b5a 100644
 --- a/src/util/virterror.c
 +++ b/src/util/virterror.c
 @@ -175,6 +175,9 @@ static const char *virErrorDomainName(virErrorDomain 
 domain) {
  case VIR_FROM_HYPERV:
  dom = "Hyper-V ";
  break;
 +case VIR_FROM_KVMTOOL:
 +dom = "KVMTOOL ";
 +break;
  }
  return(dom);
  }

Trivial, ACK

Daniel


Re: [libvirt] [PATCH 4/7] kvmtool: Add hook support for kvmtool domain

2011-12-06 Thread Daniel P. Berrange
On Fri, Nov 11, 2011 at 07:57:03PM +0800, Osier Yang wrote:
 Just like QEMU and LXC, kvm driver intends to support running hook
 script before domain starting and after domain shutdown too.
 ---
  src/util/hooks.c |   11 ++-
  src/util/hooks.h |8 
  2 files changed, 18 insertions(+), 1 deletions(-)
 
 diff --git a/src/util/hooks.c b/src/util/hooks.c
 index 110a94b..765cb68 100644
 --- a/src/util/hooks.c
 +++ b/src/util/hooks.c
 @@ -52,12 +52,14 @@ VIR_ENUM_DECL(virHookDaemonOp)
  VIR_ENUM_DECL(virHookSubop)
  VIR_ENUM_DECL(virHookQemuOp)
  VIR_ENUM_DECL(virHookLxcOp)
 +VIR_ENUM_DECL(virHookKvmToolOp)
  
  VIR_ENUM_IMPL(virHookDriver,
VIR_HOOK_DRIVER_LAST,
"daemon",
"qemu",
 -  "lxc")
 +  "lxc",
 +  "kvmtool")
  
  VIR_ENUM_IMPL(virHookDaemonOp, VIR_HOOK_DAEMON_OP_LAST,
"start",
 @@ -79,6 +81,10 @@ VIR_ENUM_IMPL(virHookLxcOp, VIR_HOOK_LXC_OP_LAST,
"start",
"stopped")
  
 +VIR_ENUM_IMPL(virHookKvmToolOp, VIR_HOOK_KVMTOOL_OP_LAST,
 +  "start",
 +  "stopped")
 +
  static int virHooksFound = -1;
  
  /**
 @@ -230,6 +236,9 @@ virHookCall(int driver, const char *id, int op, int 
 sub_op, const char *extra,
  case VIR_HOOK_DRIVER_LXC:
  opstr = virHookLxcOpTypeToString(op);
  break;
 +case VIR_HOOK_DRIVER_KVMTOOL:
 +opstr = virHookKvmToolOpTypeToString(op);
 +break;
  }
  if (opstr == NULL) {
  virHookReportError(VIR_ERR_INTERNAL_ERROR,
 diff --git a/src/util/hooks.h b/src/util/hooks.h
 index fd7411c..69081c4 100644
 --- a/src/util/hooks.h
 +++ b/src/util/hooks.h
 @@ -31,6 +31,7 @@ enum virHookDriverType {
  VIR_HOOK_DRIVER_DAEMON = 0,/* Daemon related events */
  VIR_HOOK_DRIVER_QEMU,  /* QEmu domains related events */
  VIR_HOOK_DRIVER_LXC,   /* LXC domains related events */
 +VIR_HOOK_DRIVER_KVMTOOL,   /* KVMTOOL domains related events */
  
  VIR_HOOK_DRIVER_LAST,
  };
 @@ -67,6 +68,13 @@ enum virHookLxcOpType {
  VIR_HOOK_LXC_OP_LAST,
  };
  
 +enum virHookKvmToolOpType {
 +VIR_HOOK_KVMTOOL_OP_START,/* domain is about to start */
 +VIR_HOOK_KVMTOOL_OP_STOPPED,  /* domain has stopped */
 +
 +VIR_HOOK_KVMTOOL_OP_LAST,
 +};
 +
  int virHookInitialize(void);
  
  int virHookPresent(int driver);

Trivial, ACK


Daniel


Re: [libvirt] [PATCH 7/7] kvmtool: Implementation for kvm tool driver

2011-12-06 Thread Daniel P. Berrange
On Fri, Nov 11, 2011 at 07:57:06PM +0800, Osier Yang wrote:
 Basically, the drivers is implemented by using kvm tool binary
 currently, (see ./kvm help for more info).
 
 Current implementation supports define/undefine, start/destroy/,
 suspend/resume, connect to guest console via virsh console,
 and balloon memory with with virsh setmem (using ./kvm balloon
 command). Also as it supports cgroup controllers cpuacct, and
 memory, so some other commands like schedinfo, memtune can
 also work. Some other commands such as domid, domname, dumpxml
 ,autostart, etc. are supported, as the driver is designed
 as a stateful driver, those APIs just need to talk with libvirtd
 simply.
 
 As Native Linux KVM Tool is designed for both non-root and root users,
 the driver is designed just like QEMU, supports two modes of the
 connection:
 
 kvmtool:///system
 kvmtool+unix:///system
 
 kvmtool:///session
 kvmtool+unix:///session
 
 An example of the domain XML (all the XMLs supported currently are
 listed):
 
 % virsh -c kvm:///system dumpxml kvm_test
 <domain type='kvmtool'>
   <name>kvm_test</name>
   <uuid>88bf38f1-b6ab-cfa6-ab53-4b4c0993d894</uuid>
   <memory>524288</memory>
   <currentMemory>524288</currentMemory>
   <vcpu>1</vcpu>
   <os>
     <type arch='x86_64'>hvm</type>
     <kernel>/boot/bzImage</kernel>
     <boot dev='hd'/>
   </os>
   <clock offset='utc'/>
   <on_poweroff>destroy</on_poweroff>
   <on_reboot>restart</on_reboot>
   <on_crash>restart</on_crash>
   <devices>
     <emulator>/usr/bin/kvmtool</emulator>
     <disk type='file' device='disk'>
       <source file='/var/lib/libvirt/images/linux-0.2.img'/>
       <target dev='vda' bus='virtio'/>
     </disk>
     <filesystem type='mount' accessmode='passthrough'>
       <source dir='/tmp'/>
       <target dir='/mnt'/>
     </filesystem>
     <console type='pty'>
       <target type='serial' port='0'/>
     </console>
     <memballoon model='virtio'/>
   </devices>
 </domain>
 ---
  cfg.mk   |1 +
  daemon/Makefile.am   |4 +
  daemon/libvirtd.c|7 +
  po/POTFILES.in   |2 +
  src/Makefile.am  |   36 +-
  src/kvmtool/kvmtool_conf.c   |  130 ++
  src/kvmtool/kvmtool_conf.h   |   66 +
  src/kvmtool/kvmtool_driver.c | 3079 
 ++
  src/kvmtool/kvmtool_driver.h |   29 +

My main suggestion here would be to split up the kvmtool_driver.c
file into 3 parts as we did with the QEMU driver.

  kvmtool_driver.c   - Basic libvirt API glue
  kvmtool_command.c  - ARGV generation
  kvmtool_process.c  - KVMtool process start/stop/autostart/autodestroy

Regards,
Daniel


Re: [Qemu-devel] KVM call minutes for November 29

2011-11-30 Thread Daniel P. Berrange
On Wed, Nov 30, 2011 at 11:22:37AM +0200, Alon Levy wrote:
 On Tue, Nov 29, 2011 at 04:59:51PM -0600, Anthony Liguori wrote:
  On 11/29/2011 10:59 AM, Avi Kivity wrote:
  On 11/29/2011 05:51 PM, Juan Quintela wrote:
  How to do high level stuff?
  - python?
  
  
  One of the disadvantages of the various scripting languages is the lack
  of static type checking, which makes it harder to do full sweeps of the
  source for API changes, relying on the compiler to catch type (or other)
  errors.
  
  This is less interesting to me (figuring out the perfectest language to 
  use).
  
  I think what's more interesting is the practical execution of
  something like this.  Just assuming we used python (since that's
  what I know best), I think we could do something like this:
  
  1) We could write a binding layer to expose the QMP interface as a
  python module.  This would be very little binding code but would
  bring a bunch of functionality to python bits.
 
 If going this route, I would propose to use gobject-introspection [1]
 instead of directly binding to python. You should be able to get
 multiple languages support this way, including python. I think it
 requires using glib 3.0, but I haven't tested it myself (yet). Maybe
 someone more knowledgable can shoot it down.
 
 [1] http://live.gnome.org/GObjectIntrospection/
 
 Actually this might make sense for the whole of QEMU. I think for a
 defined interface like QMP implementing the interface directly in python
 makes more sense. But having qemu itself GObject'ified and scriptable
 is cool. It would also lend itself to 4) without going through 2), but
 also make 2) possible (with any language, not just python).

I think taking advantage of GObject introspection is fine idea - I
certainly don't want to manually create python (or any other language)
bindings for any C code ever again. GObject + introspection takes away
all the burden of supporting access to C code from non-C languages.
Given that QEMU has already adopted GLib as mandatory infrastructure,
going down the GObject route seems like a very natural fit/direction
to take.

If people like the idea of a higher level language for QEMU, but are
concerned about performance / overhead of embedding a scripting
language in QEMU, then GObject introspection opens the possibilty of
writing in Vala, which is a higher level language which compiles
straight down to machine code like C does.

Regards,
Daniel


Re: [Qemu-devel] qemu and qemu.git - Migration + disk stress introduces qcow2 corruptions

2011-11-14 Thread Daniel P. Berrange
On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
 On 11/11/2011 12:15 PM, Kevin Wolf wrote:
  Am 10.11.2011 22:30, schrieb Anthony Liguori:
   Live migration with qcow2 or any other image format is just not going to 
   work 
   right now even with proper clustered storage.  I think doing a block 
   level flush 
   cache interface and letting block devices decide how to do it is the best 
   approach.
 
  I would really prefer reusing the existing open/close code. It means
  less (duplicated) code, is existing code that is well tested and doesn't
  make migration much of a special case.
 
  If you want to avoid reopening the file on the OS level, we can reopen
  only the topmost layer (i.e. the format, but not the protocol) for now
  and in 1.1 we can use bdrv_reopen().
 
 
 Intuitively I dislike _reopen style interfaces.  If the second open
 yields different results from the first, does it invalidate any
 computations in between?
 
 What's wrong with just delaying the open?

If you delay the 'open' until the mgmt app issues 'cont', then you lose
the ability to rollback to the source host upon open failure for most
deployed versions of libvirt. We only fairly recently switched to a
five-stage migration handshake to cope with rollback when 'cont' fails.

Daniel


Re: [Qemu-devel] qemu and qemu.git - Migration + disk stress introduces qcow2 corruptions

2011-11-14 Thread Daniel P. Berrange
On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
 On Mon, Nov 14, 2011 at 10:16:10AM +, Daniel P. Berrange wrote:
  On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
   On 11/11/2011 12:15 PM, Kevin Wolf wrote:
Am 10.11.2011 22:30, schrieb Anthony Liguori:
 Live migration with qcow2 or any other image format is just not going 
 to work 
 right now even with proper clustered storage.  I think doing a block 
 level flush 
 cache interface and letting block devices decide how to do it is the 
 best approach.
   
I would really prefer reusing the existing open/close code. It means
less (duplicated) code, is existing code that is well tested and doesn't
make migration much of a special case.
   
If you want to avoid reopening the file on the OS level, we can reopen
only the topmost layer (i.e. the format, but not the protocol) for now
and in 1.1 we can use bdrv_reopen().
   
   
   Intuitively I dislike _reopen style interfaces.  If the second open
   yields different results from the first, does it invalidate any
   computations in between?
   
   What's wrong with just delaying the open?
  
  If you delay the 'open' until the mgmt app issues 'cont', then you loose
  the ability to rollback to the source host upon open failure for most
  deployed versions of libvirt. We only fairly recently switched to a five
  stage migration handshake to cope with rollback when 'cont' fails.
  
  Daniel
 
 I guess reopen can fail as well, so this seems to me to be an important
 fix but not a blocker.

If the initial open succeeds, then it is far more likely that a later
re-open will succeed too, because you have already eliminated the possibility
of configuration mistakes, and will have caught most storage runtime errors
too. So there is a very significant difference in reliability between doing
an 'open at startup + reopen at cont' vs just 'open at cont'.

Based on the bug reports I see, we want to be very good at detecting and
gracefully handling open errors because they are pretty frequent.

Regards,
Daniel


Re: [Qemu-devel] qemu and qemu.git - Migration + disk stress introduces qcow2 corruptions

2011-11-14 Thread Daniel P. Berrange
On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote:
 Am 14.11.2011 12:08, schrieb Daniel P. Berrange:
  On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
  On Mon, Nov 14, 2011 at 10:16:10AM +, Daniel P. Berrange wrote:
  On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
  On 11/11/2011 12:15 PM, Kevin Wolf wrote:
  Am 10.11.2011 22:30, schrieb Anthony Liguori:
  Live migration with qcow2 or any other image format is just not going 
  to work 
  right now even with proper clustered storage.  I think doing a block 
  level flush 
  cache interface and letting block devices decide how to do it is the 
  best approach.
 
  I would really prefer reusing the existing open/close code. It means
  less (duplicated) code, is existing code that is well tested and doesn't
  make migration much of a special case.
 
  If you want to avoid reopening the file on the OS level, we can reopen
  only the topmost layer (i.e. the format, but not the protocol) for now
  and in 1.1 we can use bdrv_reopen().
 
 
  Intuitively I dislike _reopen style interfaces.  If the second open
  yields different results from the first, does it invalidate any
  computations in between?
 
  What's wrong with just delaying the open?
 
  If you delay the 'open' until the mgmt app issues 'cont', then you loose
  the ability to rollback to the source host upon open failure for most
  deployed versions of libvirt. We only fairly recently switched to a five
  stage migration handshake to cope with rollback when 'cont' fails.
 
  Daniel
 
  I guess reopen can fail as well, so this seems to me to be an important
  fix but not a blocker.
  
  If if the initial open succeeds, then it is far more likely that a later
  re-open will succeed too, because you have already elminated the possibility
  of configuration mistakes, and will have caught most storage runtime errors
  too. So there is a very significant difference in reliability between doing
  an 'open at startup + reopen at cont' vs just 'open at cont'
  
  Based on the bug reports I see, we want to be very good at detecting and
  gracefully handling open errors because they are pretty frequent.
 
 Do you have some more details on the kind of errors? Missing files,
 permissions, something like this? Or rather something related to the
 actual content of an image file?

Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI
setup. Access permissions due to incorrect user / group setup, or read
only mounts, or SELinux denials. Actual I/O errors are less common and
are not so likely to cause QEMU to fail to start at all, since QEMU is
likely to just report them to the guest OS instead.


Daniel


Re: [Qemu-devel] qemu and qemu.git - Migration + disk stress introduces qcow2 corruptions

2011-11-14 Thread Daniel P. Berrange
On Mon, Nov 14, 2011 at 01:34:15PM +0200, Michael S. Tsirkin wrote:
 On Mon, Nov 14, 2011 at 11:29:18AM +, Daniel P. Berrange wrote:
  On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote:
   Am 14.11.2011 12:08, schrieb Daniel P. Berrange:
On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
On Mon, Nov 14, 2011 at 10:16:10AM +, Daniel P. Berrange wrote:
On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
On 11/11/2011 12:15 PM, Kevin Wolf wrote:
Am 10.11.2011 22:30, schrieb Anthony Liguori:
Live migration with qcow2 or any other image format is just not 
going to work 
right now even with proper clustered storage.  I think doing a 
block level flush 
cache interface and letting block devices decide how to do it is 
the best approach.
   
I would really prefer reusing the existing open/close code. It means
less (duplicated) code, is existing code that is well tested and 
doesn't
make migration much of a special case.
   
If you want to avoid reopening the file on the OS level, we can 
reopen
only the topmost layer (i.e. the format, but not the protocol) for 
now
and in 1.1 we can use bdrv_reopen().
   
   
Intuitively I dislike _reopen style interfaces.  If the second open
yields different results from the first, does it invalidate any
computations in between?
   
What's wrong with just delaying the open?
   
If you delay the 'open' until the mgmt app issues 'cont', then you 
loose
the ability to rollback to the source host upon open failure for most
deployed versions of libvirt. We only fairly recently switched to a 
five
stage migration handshake to cope with rollback when 'cont' fails.
   
Daniel
   
I guess reopen can fail as well, so this seems to me to be an important
fix but not a blocker.

If if the initial open succeeds, then it is far more likely that a later
re-open will succeed too, because you have already elminated the 
possibility
of configuration mistakes, and will have caught most storage runtime 
errors
too. So there is a very significant difference in reliability between 
doing
an 'open at startup + reopen at cont' vs just 'open at cont'

Based on the bug reports I see, we want to be very good at detecting and
gracefully handling open errors because they are pretty frequent.
   
   Do you have some more details on the kind of errors? Missing files,
   permissions, something like this? Or rather something related to the
   actual content of an image file?
  
  Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI
  setup. Access permissions due to incorrect user / group setup, or read
  only mounts, or SELinux denials. Actual I/O errors are less common and
  are not so likely to cause QEMU to fail to start any, since QEMU is
  likely to just report them to the guest OS instead.
 
 Do you run qemu with -S, then give a 'cont' command to start it?

Yes

Daniel


Re: [Qemu-devel] qemu and qemu.git - Migration + disk stress introduces qcow2 corruptions

2011-11-14 Thread Daniel P. Berrange
On Mon, Nov 14, 2011 at 01:51:40PM +0200, Michael S. Tsirkin wrote:
 On Mon, Nov 14, 2011 at 11:37:27AM +, Daniel P. Berrange wrote:
  On Mon, Nov 14, 2011 at 01:34:15PM +0200, Michael S. Tsirkin wrote:
   On Mon, Nov 14, 2011 at 11:29:18AM +, Daniel P. Berrange wrote:
On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote:
 Am 14.11.2011 12:08, schrieb Daniel P. Berrange:
  On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
  On Mon, Nov 14, 2011 at 10:16:10AM +, Daniel P. Berrange wrote:
  On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
  On 11/11/2011 12:15 PM, Kevin Wolf wrote:
  Am 10.11.2011 22:30, schrieb Anthony Liguori:
  Live migration with qcow2 or any other image format is just 
  not going to work 
  right now even with proper clustered storage.  I think doing a 
  block level flush 
  cache interface and letting block devices decide how to do it 
  is the best approach.
 
  I would really prefer reusing the existing open/close code. It 
  means
  less (duplicated) code, is existing code that is well tested 
  and doesn't
  make migration much of a special case.
 
  If you want to avoid reopening the file on the OS level, we can 
  reopen
  only the topmost layer (i.e. the format, but not the protocol) 
  for now
  and in 1.1 we can use bdrv_reopen().
 
 
  Intuitively I dislike _reopen style interfaces.  If the second 
  open
  yields different results from the first, does it invalidate any
  computations in between?
 
  What's wrong with just delaying the open?
 
  If you delay the 'open' until the mgmt app issues 'cont', then 
  you loose
  the ability to rollback to the source host upon open failure for 
  most
  deployed versions of libvirt. We only fairly recently switched to 
  a five
  stage migration handshake to cope with rollback when 'cont' fails.
 
  Daniel
 
  I guess reopen can fail as well, so this seems to me to be an 
  important
  fix but not a blocker.
  
  If if the initial open succeeds, then it is far more likely that a 
  later
  re-open will succeed too, because you have already elminated the 
  possibility
  of configuration mistakes, and will have caught most storage 
  runtime errors
  too. So there is a very significant difference in reliability 
  between doing
  an 'open at startup + reopen at cont' vs just 'open at cont'
  
  Based on the bug reports I see, we want to be very good at 
  detecting and
  gracefully handling open errors because they are pretty frequent.
 
 Do you have some more details on the kind of errors? Missing files,
 permissions, something like this? Or rather something related to the
 actual content of an image file?

Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI
setup. Access permissions due to incorrect user / group setup, or read
only mounts, or SELinux denials. Actual I/O errors are less common and
are not so likely to cause QEMU to fail to start any, since QEMU is
likely to just report them to the guest OS instead.
   
   Do you run qemu with -S, then give a 'cont' command to start it?
 
 Probably in an attempt to improve reliability :)

Not really. We can't simply let QEMU start its own CPUs, because there are
various tasks that need performing after the migration transfer finishes,
but before the CPUs are allowed to be started, e.g.:

 - Finish 802.1Qb{g,h} (VEPA) network port profile association on target
 - Release leases for any resources associated with the source QEMU
   via a configured lock manager (eg sanlock)
 - Acquire leases for any resources associated with the target QEMU
   via a configured lock manager (eg sanlock)

Daniel


Re: [Qemu-devel] qemu and qemu.git - Migration + disk stress introduces qcow2 corruptions

2011-11-14 Thread Daniel P. Berrange
On Mon, Nov 14, 2011 at 01:56:36PM +0200, Michael S. Tsirkin wrote:
 On Mon, Nov 14, 2011 at 11:37:27AM +, Daniel P. Berrange wrote:
  On Mon, Nov 14, 2011 at 01:34:15PM +0200, Michael S. Tsirkin wrote:
   On Mon, Nov 14, 2011 at 11:29:18AM +, Daniel P. Berrange wrote:
On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote:
 Am 14.11.2011 12:08, schrieb Daniel P. Berrange:
  On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
  On Mon, Nov 14, 2011 at 10:16:10AM +, Daniel P. Berrange wrote:
  On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
  On 11/11/2011 12:15 PM, Kevin Wolf wrote:
  Am 10.11.2011 22:30, schrieb Anthony Liguori:
  Live migration with qcow2 or any other image format is just 
  not going to work 
  right now even with proper clustered storage.  I think doing a 
  block level flush 
  cache interface and letting block devices decide how to do it 
  is the best approach.
 
  I would really prefer reusing the existing open/close code. It 
  means
  less (duplicated) code, is existing code that is well tested 
  and doesn't
  make migration much of a special case.
 
  If you want to avoid reopening the file on the OS level, we can 
  reopen
  only the topmost layer (i.e. the format, but not the protocol) 
  for now
  and in 1.1 we can use bdrv_reopen().
 
 
  Intuitively I dislike _reopen style interfaces.  If the second 
  open
  yields different results from the first, does it invalidate any
  computations in between?
 
  What's wrong with just delaying the open?
 
  If you delay the 'open' until the mgmt app issues 'cont', then 
  you loose
  the ability to rollback to the source host upon open failure for 
  most
  deployed versions of libvirt. We only fairly recently switched to 
  a five
  stage migration handshake to cope with rollback when 'cont' fails.
 
  Daniel
 
  I guess reopen can fail as well, so this seems to me to be an 
  important
  fix but not a blocker.
  
  If if the initial open succeeds, then it is far more likely that a 
  later
  re-open will succeed too, because you have already elminated the 
  possibility
  of configuration mistakes, and will have caught most storage 
  runtime errors
  too. So there is a very significant difference in reliability 
  between doing
  an 'open at startup + reopen at cont' vs just 'open at cont'
  
  Based on the bug reports I see, we want to be very good at 
  detecting and
  gracefully handling open errors because they are pretty frequent.
 
 Do you have some more details on the kind of errors? Missing files,
 permissions, something like this? Or rather something related to the
 actual content of an image file?

Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI
setup. Access permissions due to incorrect user / group setup, or read
only mounts, or SELinux denials. Actual I/O errors are less common and
are not so likely to cause QEMU to fail to start any, since QEMU is
likely to just report them to the guest OS instead.
   
   Do you run qemu with -S, then give a 'cont' command to start it?
  
  Yes
 
 OK, so let's go back one step now - how is this related to
 'rollback to source host'?

In the old libvirt migration protocol, by the time we run 'cont' on the
destination, the source QEMU has already been killed off, so there's
nothing to resume on failure.

Daniel


Re: [Qemu-devel] qemu and qemu.git - Migration + disk stress introduces qcow2 corruptions

2011-11-10 Thread Daniel P. Berrange
On Thu, Nov 10, 2011 at 12:27:30PM -0600, Anthony Liguori wrote:
 What does libvirt actually do in the monitor prior to migration
 completing on the destination?  The least invasive way of doing
 delayed open of block devices is probably to make -incoming create a
 monitor and run a main loop before the block devices (and full
 device model) is initialized.  Since this isolates the changes
 strictly to migration, I'd feel okay doing this for 1.0 (although it
 might need to be in the stable branch).

The way migration works with libvirt wrt QEMU interactions is now
as follows

 1. Destination.
   Run   qemu -incoming ...args...
   Query chardevs via monitor
   Query vCPU threads via monitor
   Set disk / vnc passwords
   Set netdev link states
   Set balloon target

 2. Source
   Set  migration speed
   Set  migration max downtime
   Run  migrate command (detached)
   while 1
  Query migration status
  if status is failed or success
break;

 3. Destination
  If final status was success
 Run  'cont' in monitor
  else
 kill QEMU process

 4. Source
  If final status was success and 'cont' on dest succeeded
 kill QEMU process
  else
 Run 'cont' in monitor
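
The rollback logic in steps 3 and 4 can be sketched as plain control
flow. The `Monitor` class and `migrate()` helper below are hypothetical
stand-ins for the real monitor interactions, not libvirt code:

```python
# Sketch of the libvirt/QEMU migration handshake outlined above.
# Monitor and migrate() are hypothetical stand-ins, not real libvirt code.

class Monitor:
    """Toy model of a QEMU process plus its monitor."""
    def __init__(self):
        self.running = False   # are the vCPUs running?
        self.alive = True      # does the process still exist?

    def cont(self):
        self.running = True

    def kill(self):
        self.alive = False

def migrate(src, dst, transfer_ok=True, cont_ok=True):
    # Step 2: source runs 'migrate' and polls until failed/success.
    status = 'success' if transfer_ok else 'failed'

    # Step 3: destination resumes CPUs on success, else is killed.
    cont_done = False
    if status == 'success' and cont_ok:
        dst.cont()
        cont_done = True
    else:
        dst.kill()

    # Step 4: only once 'cont' succeeded is the source killed;
    # otherwise the source is resumed, i.e. we roll back.
    if cont_done:
        src.kill()
    else:
        src.cont()

src, dst = Monitor(), Monitor()
migrate(src, dst)
print(dst.running, src.alive)   # True False: guest now runs on the destination
```

Note that on any failure path exactly one of the two QEMU processes keeps
running, which is the property the older protocol lacked.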


In older libvirt, the bits from step 4 would actually take place
at the end of step 2. This meant we could end up with no QEMU
on either the source or dest, if starting CPUs on the dest QEMU
failed for some reason.


We would still really like to have a 'query-migrate' command for
the destination, so that we can confirm that the destination has
consumed all incoming migrate data successfully, rather than just
blindly starting CPUs and hoping for the best.

Regards,
Daniel


Re: [Qemu-devel] qemu and qemu.git - Migration + disk stress introduces qcow2 corruptions

2011-11-10 Thread Daniel P. Berrange
On Thu, Nov 10, 2011 at 01:11:42PM -0600, Anthony Liguori wrote:
 On 11/10/2011 12:42 PM, Daniel P. Berrange wrote:
 On Thu, Nov 10, 2011 at 12:27:30PM -0600, Anthony Liguori wrote:
 What does libvirt actually do in the monitor prior to migration
 completing on the destination?  The least invasive way of doing
 delayed open of block devices is probably to make -incoming create a
 monitor and run a main loop before the block devices (and full
 device model) is initialized.  Since this isolates the changes
 strictly to migration, I'd feel okay doing this for 1.0 (although it
 might need to be in the stable branch).
 
 The way migration works with libvirt wrt QEMU interactions is now
 as follows
 
   1. Destination.
 Run   qemu -incoming ...args...
 Query chardevs via monitor
 Query vCPU threads via monitor
 Set disk / vnc passwords
 
 Since RHEL carries Juan's patch, and Juan's patch doesn't handle
 disk passwords gracefully, how does libvirt cope with that?

No idea, that's the first I've heard of any patch that causes
problems with passwords in QEMU.

Daniel


Re: [Qemu-devel] KVM call agenda for October 25

2011-10-26 Thread Daniel P. Berrange
On Wed, Oct 26, 2011 at 10:48:12AM +0200, Markus Armbruster wrote:
 Kevin Wolf kw...@redhat.com writes:
 
  Am 25.10.2011 16:06, schrieb Anthony Liguori:
  On 10/25/2011 08:56 AM, Kevin Wolf wrote:
  Am 25.10.2011 15:05, schrieb Anthony Liguori:
  I'd be much more open to changing the default mode to cache=none FWIW 
  since the
  risk of data loss there is much, much lower.
 
  I think people said that they'd rather not have cache=none as default
  because O_DIRECT doesn't work everywhere.
  
  Where doesn't it work these days?  I know it doesn't work on tmpfs.  I 
  know it 
  works on ext[234], btrfs, nfs.
 
  Besides file systems (and probably OSes) that don't support O_DIRECT,
  there's another case: Our defaults don't work on 4k sector disks today.
  You need to explicitly specify the logical_block_size qdev property for
  cache=none to work on them.
 
  And changing this default isn't trivial as the right value doesn't only
  depend on the host disk, but it's also guest visible. The only way out
  would be bounce buffers, but I'm not sure that doing that silently is a
  good idea...
 
 Sector size is a device property.
 
 If the user asks for a 4K sector disk, and the backend can't support
 that, we need to reject the configuration.  Just like we reject
 read-only backends for read/write disks.

I don't see why we need to reject a guest disk with 4k sectors,
just because the host disk only has 512 byte sectors. A guest
sector size that's a larger multiple of host sector size should
work just fine. It just means any guest sector write will update
8 host sectors at a time. We only have problems if guest sector
size is not a multiple of host sector size, in which case bounce
buffers are the only option (other than rejecting the config
which is not too nice).

IIUC, current QEMU behaviour is

   Guest 512Guest 4k
 Host 512   * OK  OK
 Host 4k* I/O Err OK

'*' marks defaults

IMHO, QEMU needs to work without I/O errors in all of these
combinations, even if this means having to use bounce buffers
in some of them. That said, IMHO the default should be for
QEMU to avoid bounce buffers, which implies it should either
choose a guest sector size to match the host sector size, or it
should unconditionally use a 4k guest. IMHO we need the former:
   Guest 512  Guest 4k
 Host 512   *OK OK
 Host 4k OK*OK


Yes, I know there are other weird sector sizes besides 512
and 4k, but the same general principles apply: either one is
a multiple of the other, or bounce buffers are needed.
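
The multiple-of rule can be expressed as a one-line check; this is only
an illustrative sketch of the compatibility matrix above, not QEMU's
actual block-layer logic:

```python
# Illustrative sketch of the multiple-of rule, not QEMU's actual logic.

def needs_bounce_buffers(guest_sector, host_sector):
    """O_DIRECT I/O must be aligned to host sectors, so a guest sector
    that is a whole multiple of the host sector needs no bouncing; a
    guest sector smaller than the host sector does."""
    return guest_sector % host_sector != 0

for guest, host in [(512, 512), (4096, 512), (512, 4096), (4096, 4096)]:
    print('guest=%d host=%d bounce=%s'
          % (guest, host, needs_bounce_buffers(guest, host)))
```

Only the guest-512-on-host-4k cell of the table trips the check, which is
exactly the combination that currently yields I/O errors.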

 If the backend can only support it by using bounce buffers, I'd say
 reject it unless the user explicitly permits bounce buffers.  But that's
 debatable.

I don't think it really adds value for QEMU to force the user to specify
some extra magic flag in order to make the user's requested config
actually be honoured. If a config needs bounce buffers, QEMU should just
do it, without needing 'use-bounce-buffers=1'. A higher level mgmt app
is in a better position to inform users about the consequences.


Daniel


Re: [Qemu-devel] KVM call agenda for October 25

2011-10-26 Thread Daniel P. Berrange
On Wed, Oct 26, 2011 at 01:23:05PM +0200, Kevin Wolf wrote:
 Am 26.10.2011 11:57, schrieb Daniel P. Berrange:
  On Wed, Oct 26, 2011 at 10:48:12AM +0200, Markus Armbruster wrote:
  Kevin Wolf kw...@redhat.com writes:
 
  Am 25.10.2011 16:06, schrieb Anthony Liguori:
  On 10/25/2011 08:56 AM, Kevin Wolf wrote:
  Am 25.10.2011 15:05, schrieb Anthony Liguori:
  I'd be much more open to changing the default mode to cache=none FWIW 
  since the
  risk of data loss there is much, much lower.
 
  I think people said that they'd rather not have cache=none as default
  because O_DIRECT doesn't work everywhere.
 
  Where doesn't it work these days?  I know it doesn't work on tmpfs.  I 
  know it 
  works on ext[234], btrfs, nfs.
 
  Besides file systems (and probably OSes) that don't support O_DIRECT,
  there's another case: Our defaults don't work on 4k sector disks today.
  You need to explicitly specify the logical_block_size qdev property for
  cache=none to work on them.
 
  And changing this default isn't trivial as the right value doesn't only
  depend on the host disk, but it's also guest visible. The only way out
  would be bounce buffers, but I'm not sure that doing that silently is a
  good idea...
 
  Sector size is a device property.
 
  If the user asks for a 4K sector disk, and the backend can't support
  that, we need to reject the configuration.  Just like we reject
  read-only backends for read/write disks.
  
  I don't see why we need to reject a guest disk with 4k sectors,
  just because the host disk only has 512 byte sectors. A guest
  sector size that's a larger multiple of host sector size should
  work just fine. It just means any guest sector write will update
  8 host sectors at a time. We only have problems if guest sector
  size is not a multiple of host sector size, in which case bounce
  buffers are the only option (other than rejecting the config
  which is not too nice).
  
  IIUC, current QEMU behaviour is
  
 Guest 512Guest 4k
   Host 512   * OK  OK
   Host 4k* I/O Err OK
  
  '*' marks defaults
  
  IMHO, QEMU needs to work withot I/O errors in all of these
  combinations, even if this means having to use bounce buffers
  in some of them. That said, IMHO the default should be for
  QEMU to avoid bounce buffers, which implies it should either
  chose guest sector size to match host sector size, or it
  should unconditionally use 4k guest. IMHO we need the former
  
 Guest 512  Guest 4k
   Host 512   *OK OK
   Host 4k OK*OK
 
 I'm not sure if a 4k host should imply a 4k guest by default. This means
 that some guests wouldn't be able to run on a 4k host. On the other
 hand, for those guests that can do 4k, it would be the much better option.
 
 So I think this decision is the hard thing about it.

I guess it somewhat depends whether we want to strive for

 1. Give the user the fastest working config by default
 2. Give the user a working config by default
 3. Give the user the fastest (possibly broken) config by default

IMHO 3 is not a serious option, but I could see 2 as a reasonable
tradeoff to avoid complexity in choosing QEMU defaults. The user
would have a working config with 512 byte sectors, but sub-optimal perf
on 4k hosts due to bounce buffering. Ideally libvirt or another
higher-level app would be setting the best block size that a guest
can support by default, so bounce buffers would rarely be needed.
So only people using QEMU directly without setting a block size
would ordinarily suffer the bounce buffer perf hit on a 4k host.

Daniel


Re: [PATCH 05/11] virt: Introducing libvirt VM class

2011-10-12 Thread Daniel P. Berrange
On Tue, Oct 11, 2011 at 06:07:11PM -0300, Lucas Meneghel Rodrigues wrote:
 This is a first attempt at providing a libvirt VM class,
 in order to implement the needed methods for virt testing.
 With this class, we will be able to implement a libvirt
 test, that behaves similarly to the KVM test.
 
 As of implementation details, libvirt_vm uses virsh
 (a userspace program written on top of libvirt) to
 do domain start, stop, verification of status and
 other common operations. The reason why virsh was
 used is to get more coverage of the userspace stack
 that libvirt offers, and also to catch issues that
 virsh users would catch.

Personally I would have recommended that you use the libvirt Python API.
virsh is a very thin layer over the libvirt API, which mostly avoids
adding any logic of its own, so once it has been tested once, there's
not much value in doing more. By using the Python API directly, you will
be able to do more intelligent handling of errors, since you'll get the
full libvirt python exception object instead of a blob of stuff on stderr.
Not to mention that it is so much more efficient, and robust against
any future changes in virsh.
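
For comparison, a minimal sketch of such a check via the libvirt Python
binding. The helper name is hypothetical; it uses the built-in test
driver (test:///default) so it needs no real hypervisor, and it degrades
to a no-op if libvirt-python is not installed:

```python
# Sketch only: hypothetical helper using the libvirt Python binding.
# Returns None when the binding is unavailable on this host.
try:
    import libvirt
except ImportError:
    libvirt = None

def domain_is_running(uri, name):
    if libvirt is None:
        return None                      # binding unavailable
    try:
        conn = libvirt.open(uri)
        try:
            return conn.lookupByName(name).isActive() == 1
        finally:
            conn.close()
    except libvirt.libvirtError as err:
        # A structured error object instead of scraping virsh's stderr:
        print('libvirt error code:', err.get_error_code())
        return False

print(domain_is_running('test:///default', 'test'))
```

With virsh, the equivalent would mean parsing free-form text from a child
process; here a failure arrives as a typed `libvirtError` with an error
code the test can branch on.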

Regards,
Daniel


Re: How many threads should a kvm vm be starting?

2011-09-28 Thread Daniel P. Berrange
On Tue, Sep 27, 2011 at 04:04:41PM -0600, Thomas Fjellstrom wrote:
 On September 27, 2011, Avi Kivity wrote:
  On 09/27/2011 03:29 AM, Thomas Fjellstrom wrote:
   I just noticed something interesting, a virtual machine on one of my
   servers seems to have 69 threads (including the main thread). Other
   guests on the machine only have a couple threads.
   
   Is this normal? or has something gone horribly wrong?
  
  It's normal if the guest does a lot of I/O.  The thread count should go
  down when the guest idles.
 
 Ah, that would make sense. Though it kind of defeats assigning a vm a single 
 cpu/core. A single VM can now DOS an entire multi-core-cpu server. It pretty 
 much pegged my dual core (with HT) server for a couple hours.

You can mitigate these problems by putting each KVM process in its own
cgroup, and using the 'cpu_shares' tunable to ensure that each KVM
process gets the same relative ratio of CPU time, regardless of how
many threads it is running. With newer kernels there are other CPU
tunables for placing hard caps on CPU utilization of the process as
a whole too.
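A minimal sketch of that arrangement, assuming a cgroup v1 hierarchy mounted at /sys/fs/cgroup and illustrative cgroup names and PID variables (root on the host is required):

```shell
# One cpu cgroup per KVM process, with equal cpu.shares, so each VM
# gets the same relative share of CPU time regardless of how many
# threads it spawns. Paths, names and PID variables are illustrative.
for vm in vm1 vm2; do
    mkdir -p /sys/fs/cgroup/cpu/$vm
    echo 1024 > /sys/fs/cgroup/cpu/$vm/cpu.shares
done
echo "$VM1_PID" > /sys/fs/cgroup/cpu/vm1/tasks
echo "$VM2_PID" > /sys/fs/cgroup/cpu/vm2/tasks
```

The hard caps mentioned for newer kernels are the cpu.cfs_quota_us / cpu.cfs_period_us tunables in the same hierarchy.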

Regards,
Daniel


Re: [libvirt] Qemu/KVM is 3x slower under libvirt

2011-09-28 Thread Daniel P. Berrange
On Tue, Sep 27, 2011 at 08:10:21PM +0200, Reeted wrote:
 I repost this, this time by also including the libvirt mailing list.
 
 Info on my libvirt: it's the version in Ubuntu 11.04 Natty which is
 0.8.8-1ubuntu6.5 . I didn't recompile this one, while Kernel and
 qemu-kvm are vanilla and compiled by hand as described below.
 
 My original message follows:
 
 This is really strange.
 
 I just installed a new host with kernel 3.0.3 and Qemu-KVM 0.14.1
 compiled by me.
 
 I have created the first VM.
 This is on LVM, virtio etc... if I run it directly from bash
 console, it boots in 8 seconds (it's a bare ubuntu with no
 graphics), while if I boot it under virsh (libvirt) it boots in
 20-22 seconds. This is the time from after Grub to the login prompt,
 or from after Grub to the ssh-server up.

 I was almost able to replicate the whole libvirt command line on the
 bash console, and it still goes almost 3x faster when launched from
 bash than with virsh start vmname. The part I wasn't able to
 replicate is the -netdev part because I still haven't understood the
 semantics of it.

-netdev is just an alternative way of setting up networking that
avoids QEMU's nasty VLAN concept. Using -netdev allows QEMU to
use more efficient codepaths for networking, which should improve
the network performance.

 This is my bash commandline:
 
 /opt/qemu-kvm-0.14.1/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm
 -m 2002 -smp 2,sockets=2,cores=1,threads=1 -name vmname1-1 -uuid
 ee75e28a-3bf3-78d9-3cba-65aa63973380 -nodefconfig -nodefaults
 -chardev 
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/vmname1-1.monitor,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc
 -boot order=dc,menu=on -drive 
 file=/dev/mapper/vgPtpVM-lvVM_Vmname1_d1,if=none,id=drive-virtio-disk0,boot=on,format=raw,cache=none,aio=native
 -device 
 virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
 -drive 
 if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,cache=none,aio=native
 -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0
 -net nic,model=virtio -net tap,ifname=tap0,script=no,downscript=no
 -usb -vnc 127.0.0.1:0 -vga cirrus -device
 virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5


This shows KVM is being requested, but we should validate that KVM is
definitely being activated when under libvirt. You can test this by
doing:

virsh qemu-monitor-command vmname1 'info kvm'

 Which was taken from libvirt's command line. The only modifications
 I did to the original libvirt commandline (seen with ps aux) were:
 
 - Removed -S

Fine, has no effect on performance.

 - Network was: -netdev tap,fd=17,id=hostnet0,vhost=on,vhostfd=18
 -device 
 virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3
 Has been simplified to: -net nic,model=virtio -net
 tap,ifname=tap0,script=no,downscript=no
 and manual bridging of the tap0 interface.

You could have equivalently used

 -netdev tap,ifname=tap0,script=no,downscript=no,id=hostnet0,vhost=on
 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3

That said, I don't expect this has anything to do with the performance
since booting a guest rarely involves much network I/O unless you're
doing something odd like NFS-root / iSCSI-root.

 Firstly I had thought that this could be fault of the VNC: I have
 compiled qemu-kvm with no separate vnc thread. I thought that
 libvirt might have connected to the vnc server at all times and this
 could have slowed down the whole VM.
 But then I also tried connecting vith vncviewer to the KVM machine
 launched directly from bash, and the speed of it didn't change. So
 no, it doesn't seem to be that.

Yeah, I have never seen VNC be responsible for the kind of slowdown
you describe.

 BTW: is the slowdown of the VM on no separate vnc thread only in
 effect when somebody is actually connected to VNC, or always?

Probably, but again I dont think it is likely to be relevant here.

 Also, note that the time difference is not visible in dmesg once the
 machine has booted. So it's not a slowdown in detecting devices.
 Devices are always detected within the first 3 seconds, according to
 dmesg, at 3.6 seconds the first ext4 mount begins. It seems to be
 really the OS boot that is slow... it seems an hard disk performance
 problem.


There are a couple of things that would be different between running the
VM directly, vs via libvirt.

 - Security drivers - SELinux/AppArmour
 - CGroups

If it was AppArmour causing this slowdown I don't think you would have
been the first person to complain, so lets ignore that. Which leaves
cgroups as a likely culprit. Do a

  grep cgroup /proc/mounts

if any of them are mounted, then for each cgroups mount in turn,

 - Umount the cgroup
 - Restart libvirtd
 - Test your guest boot performance
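The bisection procedure above could be scripted roughly as follows (a sketch; the libvirtd restart command is distro-specific, and unmounting can fail if a hierarchy is busy):

```shell
# For each mounted cgroup hierarchy, try unmounting it, restart
# libvirtd, and re-time the guest boot to find the culprit.
grep cgroup /proc/mounts | awk '{print $2}' | while read -r mnt; do
    echo "testing without $mnt"
    umount "$mnt" || continue            # skip hierarchies that are busy
    service libvirtd restart             # distro-specific
    # ...start the guest here and time its boot...
done
```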


Regards,
Daniel

Re: [libvirt] Qemu/KVM is 3x slower under libvirt

2011-09-28 Thread Daniel P. Berrange
On Wed, Sep 28, 2011 at 11:19:43AM +0200, Reeted wrote:
 On 09/28/11 09:51, Daniel P. Berrange wrote:
 This is my bash commandline:
 
 /opt/qemu-kvm-0.14.1/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm
 -m 2002 -smp 2,sockets=2,cores=1,threads=1 -name vmname1-1 -uuid
 ee75e28a-3bf3-78d9-3cba-65aa63973380 -nodefconfig -nodefaults
 -chardev 
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/vmname1-1.monitor,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc
 -boot order=dc,menu=on -drive 
 file=/dev/mapper/vgPtpVM-lvVM_Vmname1_d1,if=none,id=drive-virtio-disk0,boot=on,format=raw,cache=none,aio=native
 -device 
 virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
 -drive 
 if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,cache=none,aio=native
 -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0
 -net nic,model=virtio -net tap,ifname=tap0,script=no,downscript=no
 -usb -vnc 127.0.0.1:0 -vga cirrus -device
 virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
 
 This shows KVM is being requested, but we should validate that KVM is
 definitely being activated when under libvirt. You can test this by
 doing:
 
  virsh qemu-monitor-command vmname1 'info kvm'
 
 kvm support: enabled
 
 I think I would see a higher impact if it was KVM not enabled.
 
 Which was taken from libvirt's command line. The only modifications
 I did to the original libvirt commandline (seen with ps aux) were:


 - Network was: -netdev tap,fd=17,id=hostnet0,vhost=on,vhostfd=18
 -device 
 virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3
 Has been simplified to: -net nic,model=virtio -net
 tap,ifname=tap0,script=no,downscript=no
 and manual bridging of the tap0 interface.
 You could have equivalently used
 
   -netdev tap,ifname=tap0,script=no,downscript=no,id=hostnet0,vhost=on
   -device 
  virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3
 
 It's this! It's this!! (thanks for the line)
 
 It raises boot time by 10-13 seconds

Ok, that is truly bizarre and I don't really have any explanation
for why that is. I guess you could try 'vhost=off' too and see if that
makes the difference.

 
 But now I don't know where to look During boot there is a pause
 usually between /scripts/init-bottom  (Ubuntu 11.04 guest) and the
 appearance of login prompt, however that is not really meaningful
 because there is probably much background activity going on there,
 with init etc. which don't display messages
 
 
 init-bottom does just this
 
 -
 #!/bin/sh -e
 # initramfs init-bottom script for udev
 
 PREREQ=
 
 # Output pre-requisites
 prereqs()
 {
 echo $PREREQ
 }
 
 case $1 in
 prereqs)
 prereqs
 exit 0
 ;;
 esac
 
 
 # Stop udevd, we'll miss a few events while we run init, but we catch up
 pkill udevd
 
 # Move /dev to the real filesystem
 mount -n -o move /dev ${rootmnt}/dev
 -
 
 It doesn't look like it should take time to execute.
 So there is probably some other background activity going on... and
 that is slower, but I don't know what that is.
 
 
 Another thing that can be noticed is that the dmesg message:
 
 [   13.290173] eth0: no IPv6 routers present
 
 (which is also the last message)
 
 happens on average 1 (one) second earlier in the fast case (-net)
 than in the slow case (-netdev)

Hmm, none of that looks particularly suspect. So I don't really have
much idea what else to try apart from the 'vhost=off' possibilty.


Daniel


Re: [libvirt] Qemu/KVM is 3x slower under libvirt (due to vhost=on)

2011-09-28 Thread Daniel P. Berrange
On Wed, Sep 28, 2011 at 11:49:01AM +0200, Reeted wrote:
 On 09/28/11 11:28, Daniel P. Berrange wrote:
 On Wed, Sep 28, 2011 at 11:19:43AM +0200, Reeted wrote:
 On 09/28/11 09:51, Daniel P. Berrange wrote:
 This is my bash commandline:
 
 /opt/qemu-kvm-0.14.1/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm
 -m 2002 -smp 2,sockets=2,cores=1,threads=1 -name vmname1-1 -uuid
 ee75e28a-3bf3-78d9-3cba-65aa63973380 -nodefconfig -nodefaults
 -chardev 
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/vmname1-1.monitor,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc
 -boot order=dc,menu=on -drive 
 file=/dev/mapper/vgPtpVM-lvVM_Vmname1_d1,if=none,id=drive-virtio-disk0,boot=on,format=raw,cache=none,aio=native
 -device 
 virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
 -drive 
 if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,cache=none,aio=native
 -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0
 -net nic,model=virtio -net tap,ifname=tap0,script=no,downscript=no
 -usb -vnc 127.0.0.1:0 -vga cirrus -device
 virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
 This shows KVM is being requested, but we should validate that KVM is
 definitely being activated when under libvirt. You can test this by
 doing:
 
  virsh qemu-monitor-command vmname1 'info kvm'
 kvm support: enabled
 
 I think I would see a higher impact if it was KVM not enabled.
 
 Which was taken from libvirt's command line. The only modifications
 I did to the original libvirt commandline (seen with ps aux) were:
 
 - Network was: -netdev tap,fd=17,id=hostnet0,vhost=on,vhostfd=18
 -device 
 virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3
 Has been simplified to: -net nic,model=virtio -net
 tap,ifname=tap0,script=no,downscript=no
 and manual bridging of the tap0 interface.
 You could have equivalently used
 
   -netdev tap,ifname=tap0,script=no,downscript=no,id=hostnet0,vhost=on
   -device 
  virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3
 It's this! It's this!! (thanks for the line)
 
 It raises boot time by 10-13 seconds
 Ok, that is truly bizarre and I don't really have any explanation
 for why that is. I guess you could try 'vhost=off' too and see if that
 makes the difference.
 
 YES!
 It's the vhost. With vhost=on it takes about 12 seconds more time to boot.
 
 ...meaning? :-)

I've no idea. I was always under the impression that 'vhost=on' was
the 'make it go much faster' switch. So something is going wrong
here that I can't explain.

Perhaps one of the network people on this list can explain...


To turn vhost off in the libvirt XML, you should be able to use
<driver name='qemu'/> for the interface in question, eg


<interface type='user'>
  <mac address='52:54:00:e5:48:58'/>
  <model type='virtio'/>
  <driver name='qemu'/>
</interface>

Regards,
Daniel


Re: [libvirt] Qemu/KVM is 3x slower under libvirt (due to vhost=on)

2011-09-28 Thread Daniel P. Berrange
On Wed, Sep 28, 2011 at 12:19:09PM +0200, Reeted wrote:
 On 09/28/11 11:53, Daniel P. Berrange wrote:
 On Wed, Sep 28, 2011 at 11:49:01AM +0200, Reeted wrote:
 YES!
 It's the vhost. With vhost=on it takes about 12 seconds more time to boot.
 
 ...meaning? :-)
 I've no idea. I was always under the impression that 'vhost=on' was
 the 'make it go much faster' switch. So something is going wrong
 here that I can't explain.
 
 Perhaps one of the network people on this list can explain...
 
 
 To turn vhost off in the libvirt XML, you should be able to use
 <driver name='qemu'/> for the interface in question, eg
 
 
  <interface type='user'>
    <mac address='52:54:00:e5:48:58'/>
    <model type='virtio'/>
    <driver name='qemu'/>
  </interface>
 
 
 Ok that seems to work: it removes the vhost part in the virsh launch
 hence cutting down 12secs of boot time.
 
 If nobody comes out with an explanation of why, I will open another
 thread on the kvm list for this. I would probably need to test disk
 performance on vhost=on to see if it degrades or it's for another
 reason that boot time is increased.

Be sure to CC the qemu-devel mailing list too next time, since that has
a wider audience who might be able to help


Daniel


Re: [PATCH 1/3] Avoid the use of deprecated gnutls gnutls_*_set_priority functions.

2011-08-25 Thread Daniel P. Berrange
On Thu, Aug 25, 2011 at 11:54:41AM +0100, Stefan Hajnoczi wrote:
 On Mon, Jul 4, 2011 at 11:00 PM, Raghavendra D Prabhu
 raghu.prabh...@gmail.com wrote:
  The gnutls_*_set_priority family of functions has been marked deprecated
  in 2.12.x. These functions have been superceded by
  gnutls_priority_set_direct().
 
  Signed-off-by: Raghavendra D Prabhu rpra...@wnohang.net
  ---
   ui/vnc-tls.c |   20 +---
   1 files changed, 1 insertions(+), 19 deletions(-)
 
  diff --git a/ui/vnc-tls.c b/ui/vnc-tls.c
  index dec626c..33a5d8c 100644
  --- a/ui/vnc-tls.c
  +++ b/ui/vnc-tls.c
  @@ -286,10 +286,6 @@ int vnc_tls_validate_certificate(struct VncState *vs)
 
   int vnc_tls_client_setup(struct VncState *vs,
                           int needX509Creds) {
  -    static const int cert_type_priority[] = { GNUTLS_CRT_X509, 0 };
  -    static const int protocol_priority[]= { GNUTLS_TLS1_1, GNUTLS_TLS1_0, 
  GNUTLS_SSL3, 0 };
  -    static const int kx_anon[] = {GNUTLS_KX_ANON_DH, 0};
  -    static const int kx_x509[] = {GNUTLS_KX_DHE_DSS, GNUTLS_KX_RSA, 
  GNUTLS_KX_DHE_RSA, GNUTLS_KX_SRP, 0};
 
      VNC_DEBUG("Do TLS setup\n");
      if (vnc_tls_initialize() < 0) {
  @@ -310,21 +306,7 @@ int vnc_tls_client_setup(struct VncState *vs,
              return -1;
          }
 
  -        if (gnutls_kx_set_priority(vs->tls.session, needX509Creds ? 
  kx_x509 : kx_anon) < 0) {
  -            gnutls_deinit(vs->tls.session);
  -            vs->tls.session = NULL;
  -            vnc_client_error(vs);
  -            return -1;
  -        }
  -
  -        if (gnutls_certificate_type_set_priority(vs->tls.session, 
  cert_type_priority) < 0) {
  -            gnutls_deinit(vs->tls.session);
  -            vs->tls.session = NULL;
  -            vnc_client_error(vs);
  -            return -1;
  -        }
  -
  -        if (gnutls_protocol_set_priority(vs->tls.session, 
  protocol_priority) < 0) {
  +        if (gnutls_priority_set_direct(vs->tls.session, needX509Creds ? 
  "NORMAL" : "NORMAL:+ANON-DH", NULL) < 0) {
              gnutls_deinit(vs->tls.session);
              vs->tls.session = NULL;
              vnc_client_error(vs);
  --
  1.7.6
 
 Daniel,
 This patch looks good to me but I don't know much about gnutls or
 crypto in general.  Would you be willing to review this?

ACK, this approach is different from what I did in libvirt, but it matches
the recommendations in the GNUTLS manual for setting priority, so I believe
it is good.

Signed-off-by: Daniel P. Berrange berra...@redhat.com

Regards,
Daniel


Re: DMI BIOS String

2011-08-22 Thread Daniel P. Berrange
On Mon, Aug 22, 2011 at 03:52:19PM +1200, Derek wrote:
 Hi Folks,
 
 I could not track down any solid info on modifying the DMI BIOS string.
 
 For example, in VirtualBox you can use 'vboxmanage setsextradata' to
 set the BIOS product and vendor string per VM.
 
 Any ideas if this is possible with KVM?

If using QEMU directly you can use '-smbios' args. eg

-smbios type=0,vendor=LENOVO,version="6FET82WW (3.12 )"
-smbios type=1,manufacturer=Fedora,product=Virt-Manager,version=0.8.2-3.fc14,serial=32dfcb37-5af1-552b-357c-be8c3aa38310,uuid=c7a5fdbd-edaf-9455-926a-d65c16db1809,sku=1234567890,family="Red Hat"

If using QEMU via libvirt you can use the following:

  http://libvirt.org/formatdomain.html#elementsSysinfo
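For reference, the libvirt side of this looks roughly like the following domain XML fragment (a sketch following the sysinfo schema linked above; the values simply mirror the QEMU example):

```shell
# Write an illustrative <sysinfo> fragment as used in libvirt domain XML;
# <os><smbios mode='sysinfo'/></os> tells libvirt to pass these values
# through to QEMU's -smbios options.
cat > sysinfo-fragment.xml <<'EOF'
<os>
  <smbios mode='sysinfo'/>
</os>
<sysinfo type='smbios'>
  <bios>
    <entry name='vendor'>LENOVO</entry>
    <entry name='version'>6FET82WW (3.12 )</entry>
  </bios>
  <system>
    <entry name='manufacturer'>Fedora</entry>
    <entry name='product'>Virt-Manager</entry>
  </system>
</sysinfo>
EOF
```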


Daniel


[PATCH master+STABLE-0.15] Fix default accelerator when configured with --disable-kvm

2011-08-05 Thread Daniel P. Berrange
From: Daniel P. Berrange berra...@redhat.com

The default accelerator is hardcoded to 'kvm'. This is a fine
default for qemu-kvm normally, but if the user built with
./configure --disable-kvm, then the resulting binaries will
not work by default

* vl.c: Default to 'tcg' unless CONFIG_KVM is defined

Signed-off-by: Daniel P. Berrange berra...@redhat.com
---
 vl.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/vl.c b/vl.c
index 7ae549e..28fd2f3 100644
--- a/vl.c
+++ b/vl.c
@@ -1953,8 +1953,13 @@ static int configure_accelerator(void)
 }
 
 if (p == NULL) {
+#ifdef CONFIG_KVM
 /* Use the default accelerator, kvm */
 p = kvm;
+#else
+/* Use the default accelerator, tcg */
+p = tcg;
+#endif
 }
 
 while (!accel_initalised && *p != '\0') {
-- 
1.7.6



[PATCH master+STABLE-0.15] Fix default accelerator when configured with --disable-kvm

2011-08-04 Thread Daniel P. Berrange
From: Daniel P. Berrange berra...@redhat.com

The default accelerator is hardcoded to 'kvm'. This is a fine
default for qemu-kvm normally, but if the user built with
./configure --disable-kvm, then the resulting binaries will
not work by default

* vl.c: Default to 'tcg' unless CONFIG_KVM is defined
---
 vl.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/vl.c b/vl.c
index 7ae549e..28fd2f3 100644
--- a/vl.c
+++ b/vl.c
@@ -1953,8 +1953,13 @@ static int configure_accelerator(void)
 }
 
 if (p == NULL) {
+#ifdef CONFIG_KVM
 /* Use the default accelerator, kvm */
 p = kvm;
+#else
+/* Use the default accelerator, tcg */
+p = tcg;
+#endif
 }
 
 while (!accel_initalised && *p != '\0') {
-- 
1.7.6



Re: [PATCH 0/2] Introduce panic hypercall

2011-06-20 Thread Daniel P. Berrange
On Mon, Jun 20, 2011 at 06:31:23PM +0300, Avi Kivity wrote:
 On 06/20/2011 04:38 PM, Daniel Gollub wrote:
 Introduce panic hypercall to enable the crashing guest to notify the
 host. This enables the host to run some actions as soon a guest
 crashed (kernel panic).
 
 This patch series introduces the panic hypercall at the host end.
 As well as the hypercall for KVM paravirtuliazed Linux guests, by
 registering the hypercall to the panic_notifier_list.
 
 The basic idea is to create KVM crashdump automatically as soon the
 guest panicked and power-cycle the VM (e.g. libvirt <on_crash/>).
 
 This would be more easily done via a panic device (I/O port or
 memory-mapped address) that the guest hits.  It would be intercepted
 by qemu without any new code in kvm.
 
 However, I'm not sure I see the gain.  Most enterprisey guests
 already contain in-guest crash dumpers which provide more
 information than a qemu memory dump could, since they know exact
 load addresses etc. and are integrated with crash analysis tools.
 What do you have in mind?

Well libvirt can capture a core file by doing 'virsh dump $GUESTNAME'.
This actually uses the QEMU monitor migration command to capture the
entire of QEMU memory. The 'crash' command line tool actually knows
how to analyse this data format as it would a normal kernel crashdump.

I think having a way for a guest OS to notify the host that is has
crashed would be useful. libvirt could automatically do a crash
dump of the QEMU memory, or at least pause the guest CPUs and notify
the management app of the crash, which can then decide what to do.
You can also use tools like 'virt-dmesg' which uses libvirt to peek
into guest memory to extract the most recent kernel dmesg logs (even
if the guest OS itself is crashed & didn't manage to send them out
via netconsole or something else).

This series does need to introduce a QMP event notification upon
crash, so that the crash notification can be propagated to mgmt
layers above QEMU.

Regards,
Daniel


Re: drop -enable-nesting

2011-05-31 Thread Daniel P. Berrange
On Mon, May 30, 2011 at 06:19:14PM +0300, Avi Kivity wrote:
 On 05/30/2011 06:15 PM, Jan Kiszka wrote:
 On 2011-05-30 17:10, Roedel, Joerg wrote:
   On Mon, May 30, 2011 at 11:04:02AM -0400, Jan Kiszka wrote:
   On 2011-05-30 16:38, Nadav Har'El wrote:
   On Mon, May 30, 2011, Jan Kiszka wrote about drop -enable-nesting 
  (was: [PATCH 3/7] cpu model bug fixes and definition corrections...):
   On 2011-05-30 10:18, Roedel, Joerg wrote:
   On Sat, May 28, 2011 at 04:39:13AM -0400, Jan Kiszka wrote:
 
   Jörg, how to deal with -enable-nesting in qemu-kvm to align behavior
   with upstream?
 
   My personal preference is to just remove it. In upstream-qemu it is
   enabled/disabled by +/-svm. -enable-nesting is just a historic thing
   which can be wiped out.
 
   -enable-nesting could remain as a synonym for enabling either VMX or 
  SVM
   in the guest, depending on what was available in the host (because KVM 
  now
   supports both nested SVM and nested VMX, but not SVM-on-VMX or vice 
  versa).
 
   Why? Once nesting is stable (I think SVM already is), there is no reason
   for an explicit enable. And you can always mask it out via -cpu.
 
   BTW, what are the defaults for SVM right now in qemu-kvm and upstream?
   Enable if the modeled CPU supports it?
 
   qemu-kvm still needs -enable-nesting, otherwise it is disabled. Upstream
   qemu should enable it unconditionally (can be disabled with -cpu ,-svm).
 
 Then let's start with aligning qemu-kvm defaults to upstream? I guess
 that's what the diff I was citing yesterday is responsible for.
 
 In the same run, -enable-nesting could dump a warning on the console
 that this switch is obsolete and will be removed from future versions.
 
 I think it's safe to drop -enable-nesting immediately.  Dan, does
 libvirt make use of it?

Yes, but it should be safe to drop it. Currently, if the user specifies
a CPU with the 'svm' flag present in libvirt guest XML, then we will
pass args '-cpu +svm -enable-nesting'. So if we drop --enable-nesting,
then libvirt will simply omit it and everything should still work because
we have still got +svm set.
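For illustration, the guest XML that triggers this looks roughly like the following (a sketch of the libvirt CPU schema; the model name is arbitrary):

```shell
# An illustrative <cpu> fragment for libvirt guest XML; requiring the
# 'svm' feature is what makes libvirt emit "+svm" on the QEMU command line.
cat > cpu-fragment.xml <<'EOF'
<cpu match='exact'>
  <model>phenom</model>
  <feature policy='require' name='svm'/>
</cpu>
EOF
```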

Daniel


Re: [Qemu-devel][RFC]QEMU disk I/O limits

2011-05-31 Thread Daniel P. Berrange
On Tue, May 31, 2011 at 09:45:37AM -0400, Vivek Goyal wrote:
 On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote:
  Hello, all,
  
  I have prepared to work on a feature called Disk I/O limits for 
  qemu-kvm projeect.
  This feature will enable the user to cap disk I/O amount performed by a 
  VM. It is important for some storage resources to be shared among multi-VMs. 
  As you've known, if some of VMs are doing excessive disk I/O, they will 
  hurt the performance of other VMs.
  
 
 Hi Zhiyong,
 
 Why not use kernel blkio controller for this and why reinvent the wheel
 and implement the feature again in qemu?

The finest level of granularity offered by cgroups apply limits per QEMU
process. So the blkio controller can't be used to apply controls directly
to individual disks used by QEMU, only the VM as a whole.

With networking we can use the 'net_cls' cgroups controller for the process
as a whole, or attach 'tc' to individual TAP devices for per-NIC
throttling, both of which ultimately use the same kernel functionality.
I don't see an equivalent option for throttling individual disks that
would reuse functionality from the blkio controller.

Regards,
Daniel


Re: [Qemu-devel][RFC]QEMU disk I/O limits

2011-05-31 Thread Daniel P. Berrange
On Tue, May 31, 2011 at 10:10:37AM -0400, Vivek Goyal wrote:
 On Tue, May 31, 2011 at 02:56:46PM +0100, Daniel P. Berrange wrote:
  On Tue, May 31, 2011 at 09:45:37AM -0400, Vivek Goyal wrote:
   On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote:
Hello, all,

I have prepared to work on a feature called Disk I/O limits for 
qemu-kvm projeect.
This feature will enable the user to cap disk I/O amount performed 
 by a VM. It is important for some storage resources to be shared among 
multi-VMs. As you've known, if some of VMs are doing excessive disk 
I/O, they will hurt the performance of other VMs.

   
   Hi Zhiyong,
   
   Why not use kernel blkio controller for this and why reinvent the wheel
   and implement the feature again in qemu?
  
  The finest level of granularity offered by cgroups apply limits per QEMU
  process. So the blkio controller can't be used to apply controls directly
  to individual disks used by QEMU, only the VM as a whole.
 
 So are multiple VMs using same disk. Then put multiple VMs in same
 cgroup and apply the limit on that disk.
 
 Or if you want to put a system wide limit on a disk, then put all
 VMs in root cgroup and put limit on root cgroups.
 
 I fail to understand what's the exact requirement here. I thought
 the biggest use case was isolation one VM from other which might
 be sharing same device. Hence we were interested in putting 
 per VM limit on disk and not a system wide limit on disk (independent
 of VM).

No, it isn't about putting limits on a disk independant of a VM. It is
about one VM having multiple disks, and wanting to set different policies
for each of its virtual disks. eg

  qemu-kvm -drive file=/dev/sda1 -drive file=/dev/sdb3

and wanting to say that sda1 is limited to 10 MB/s, while sdb3 is
limited to 50 MB/s.  You can't do that kind of thing with cgroups,
because it can only control the entire process, not individual
resources within the process.
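For contrast, the cgroup-level control that is available looks roughly like this (a sketch, assuming the v1 blkio controller is mounted; device major:minor numbers, paths and PID variables are illustrative, and root is required):

```shell
# Cap all reads the QEMU process makes to one *host* block device at
# 10 MB/s via the blkio controller. The limit applies to the process
# as a whole per host device; it cannot give each -drive its own policy.
mkdir -p /sys/fs/cgroup/blkio/vm1
echo "8:0 10485760" > /sys/fs/cgroup/blkio/vm1/blkio.throttle.read_bps_device
echo "$QEMU_PID" > /sys/fs/cgroup/blkio/vm1/tasks
```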

Daniel


Re: [Qemu-devel] [PATCH 2/2 V7] qemu,qmp: add inject-nmi qmp command

2011-04-14 Thread Daniel P. Berrange
On Wed, Apr 13, 2011 at 10:56:21PM +0300, Blue Swirl wrote:
 On Wed, Apr 13, 2011 at 4:08 PM, Luiz Capitulino lcapitul...@redhat.com 
 wrote:
  On Tue, 12 Apr 2011 21:31:18 +0300
  Blue Swirl blauwir...@gmail.com wrote:
 
  On Tue, Apr 12, 2011 at 10:52 AM, Avi Kivity a...@redhat.com wrote:
   On 04/11/2011 08:15 PM, Blue Swirl wrote:
  
   On Mon, Apr 11, 2011 at 10:01 AM, Markus Armbrusterarm...@redhat.com
    wrote:
 Avi Kivitya...@redhat.com  writes:
   
 On 04/08/2011 12:41 AM, Anthony Liguori wrote:
   
 And it's a good thing to have, but exposing this as the only API to
 do something as simple as generating a guest crash dump is not the
 friendliest thing in the world to do to users.
   
 nmi is a fine name for something that corresponds to a real-life nmi
 button (often labeled NMI).
   
 Agree.
  
   We could also introduce an alias mechanism for user friendly names, so
   nmi could be used in addition of full path. Aliases could be useful
   for device paths as well.
  
   Yes.  Perhaps limited to the human monitor.
 
  I'd limit all debugging commands (including NMI) to the human monitor.
 
  Why?
 
 Do they have any real use in production environment? Also, we should
 have the freedom to change the debugging facilities (for example, to
 improve some internal implementation) as we want without regard to
 compatibility to previous versions.

We have users of libvirt requesting that we add an API for triggering
a NMI. They want this for support in a production environment, to be
able to initiate Windows crash dumps.  We really don't want to have to
use HMP passthrough for this, instead of a proper QMP command.

More generally I don't want to see stuff in HMP, that isn't in the QMP.
We already have far too much that we have to do via HMP passthrough in
libvirt due to lack of QMP commands, to the extent that we might as
well have just ignored QMP and continued with HMP for everything.

If we want the flexibility to change the debugging commands between
releases then we should come up with a plan to do this within the
scope of QMP, not restrict them to HMP only.

Regards,
Daniel


Re: [Qemu-devel] [PATCH 2/2 V7] qemu,qmp: add inject-nmi qmp command

2011-04-04 Thread Daniel P. Berrange
On Mon, Mar 07, 2011 at 05:46:28PM +0800, Lai Jiangshan wrote:
 From: Lai Jiangshan la...@cn.fujitsu.com
 Date: Mon, 7 Mar 2011 17:05:15 +0800
 Subject: [PATCH 2/2] qemu,qmp: add inject-nmi qmp command
 
 The inject-nmi command injects an NMI on all CPUs of the guest.
 It is currently only supported for x86 guests; it will
 return an Unsupported error for non-x86 guests.
 
 ---
  hmp-commands.hx |2 +-
  monitor.c   |   18 +-
  qmp-commands.hx |   29 +
  3 files changed, 47 insertions(+), 2 deletions(-)

Does anyone have any feedback on this addition, or are all new
QMP patch proposals blocked pending Anthony's QAPI work ?

We'd like to support it in libvirt and thus want it to be
available in QMP, as well as HMP.

 @@ -2566,6 +2566,22 @@ static void do_inject_nmi(Monitor *mon, const QDict 
 *qdict)
  break;
  }
  }
 +
 +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject 
 **ret_data)
 +{
 +CPUState *env;
 +
 +for (env = first_cpu; env != NULL; env = env->next_cpu)
 +cpu_interrupt(env, CPU_INTERRUPT_NMI);
 +
 +return 0;
 +}
 +#else
 +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject 
 **ret_data)
 +{
 +qerror_report(QERR_UNSUPPORTED);
 +return -1;
 +}
  #endif
  

Interesting that with HMP you need to specify a single CPU index, but
with QMP it is injecting to all CPUs at once. Is there any compelling
reason why we'd ever need the ability to only inject to a single CPU
from an app developer POV ?

Daniel


Re: [libvirt] [Qemu-devel] KVM call minutes for Mar 15

2011-03-17 Thread Daniel P. Berrange
On Tue, Mar 15, 2011 at 12:06:06PM -0700, Chris Wright wrote:
 * Anthony Liguori (anth...@codemonkey.ws) wrote:
  On 03/15/2011 09:53 AM, Chris Wright wrote:
   QAPI
 snip
  - c library implementation is critical to have unit tests and test
 driven development
 - thread safe?
   - no shared state, no statics.
   - threading model requires lock for the qmp session
  - licensing?
   - LGPL
 - forwards/backwards compat?
   - designed with that in mind see wiki:
  
 http://wiki.qemu.org/Features/QAPI
  
  One neat feature of libqmp is that once libvirt has a better QMP
  passthrough interface, we can create a QmpSession that uses libvirt.
  
  It would look something like:
  
  QmpSession *libqmp_session_new_libvirt(virDomainPtr dom);
 
 Looks like you mean this?
 
- request QmpSession - 
 client  libvirt
- return QmpSession  -
 
 client - QmpSession - QMP - QEMU
 
 So bypassing libvirt completely to actually use the session?
 
 Currently, it's more like:
 
 client - QemuMonitorCommand - libvirt - QMP - QEMU
 
  The QmpSession returned by this call can then be used with all of
  the libqmp interfaces.  This means we can still exercise our test
  suite with a guest launched through libvirt.  It also should make
  the libvirt pass through interface a bit easier to consume by third
  parties.
 
 This sounds like it's something libvirt folks should be involved with.
 At the very least, this mode is there now and considered basically
 unstable/experimental/developer use:
 
  Qemu monitor command '%s' executed; libvirt results may be unpredictable!
 
 So likely some concern about making it easier to use, esp. assuming
 that third parties above are mgmt apps, not just developers.

Although we provide monitor and command line passthrough in libvirt,
our recommendation is that mgmt apps do not develop against these
APIs. Our goal / policy is that apps should be able to do anything
they need using the formally modelled libvirt public APIs.

The primary intended usage for the monitor/command line passthrough
is debugging & experimentation, and as a very short-term hack/workaround
for mgmt apps while formal APIs are added to libvirt. In other words,
we provide the feature because we don't want libvirt to be a roadblock,
but we still strongly discourage their usage until all other options
have been exhausted.

In the same way as loading binary-only modules into the kernel sets a
'tainted' flag, we plan that direct usage of monitor/command line
passthrough will set a tainted flag against a VM. This allows distro
maintainers to identify usage & decide how they wish to support these
features in products (if at all).

Regards,
Daniel


Re: Configuring the bridge interface: why assign an IP?

2011-03-14 Thread Daniel P. Berrange
On Mon, Mar 14, 2011 at 11:24:40AM -0600, Ben Beuchler wrote:
 Most of the examples for setting up the bridge interface on a VM host
 suggest assigning the IP address to the bridge.  Assigning the IP to
 the bridge leaves you open to the MAC address of the bridge changing
 as you add/remove guests from the host, resulting in a brief (~20
 second) loss of connectivity to the host. (I am aware that I can
 manually set the MAC of the bridge to avoid unexpected changes. That's
 my current workaround.)

You don't need to manually set a MAC on the bridge - indeed you can't
set an arbitrary MAC on it - it must have a MAC that matches one of
the interfaces enslaved. The key is that the MAC of the enslaved ethernet
device should be numerically smaller than that of any guest TAP devices.
The kernel gives TAP devices a completely random MAC by default, so you
need to make a little change to that. Two options

 - Take the random host TAP device MAC and simply set the first byte to 0xFE
 - Take the guest NIC MAC, set first byte to 0xFE and give that to
   the host TAP device.

Recent releases of libvirt, follow the second approach and it has worked
out well, eliminating any connectivity loss with guest startup/shutdown

Daniel


Re: Problem with bridged tap interface

2011-02-23 Thread Daniel P. Berrange
On Wed, Feb 23, 2011 at 12:34:45PM +0100, 
andreas.a...@de.transport.bombardier.com wrote:
 Hi all,
 
 sorry for the previous partial e-mail, I hit the send button accidentally 
 ;-).
 
 I have a setup with a kvm-based virtual machine running a stock RedHat 6.1 
 (yes, that old) on a rather current debian host.
 
 1. uname in host: 2.6.26-2-amd64 #1 SMP Wed May 12 18:03:14 UTC 2010 
 x86_64 GNU/Linux
 
 2. uname in guest: 2.2.12-20 #1 Mon Sep 27 10:40:35 EDT 1999 i686 unknown
 
 eth0 of the guest is connected via tap0 to a kernel bridge, that is in 
 turn connected via the host's eth1 to a Gigabit link.  On the kvm 
 command-line I configure the guest-nic as model=ne2k_pci.
 
 The problem is that I frequently lose network access from/to the guest.

There have been QEMU NIC model implementation bugs that exhibit
that characteristic. If you have the drivers available in the
guest, then I'd recommend trying out different NIC models than
ne2k, since that's probably the least actively maintained NIC
model. At least try rtl8139, but ideally the e1000 too.
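As a sketch of trying that suggestion, the guest NIC part of the reporter's kvm invocation would change roughly like this (the surrounding -net tap options are assumptions, not taken from the report):

```shell
# Before: guest NIC emulated by the barely-maintained ne2k model
#   kvm ... -net nic,model=ne2k_pci -net tap,ifname=tap0 ...
# After: try a better-maintained model such as rtl8139 or e1000
kvm -net nic,model=e1000 -net tap,ifname=tap0 ...
```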

Regards,
Daniel


Re: [Qemu-devel] [PATCH 02/18] Introduce read() to FdMigrationState.

2011-02-10 Thread Daniel P. Berrange
On Thu, Feb 10, 2011 at 10:54:01AM +0100, Anthony Liguori wrote:
 On 02/10/2011 10:30 AM, Yoshiaki Tamura wrote:
 Currently FdMigrationState doesn't support read(), and this patch
 introduces it to get response from the other side.
 
 Signed-off-by: Yoshiaki Tamuratamura.yoshi...@lab.ntt.co.jp
 
 Migration is unidirectional.  Changing this is fundamental and not
 something to be done lightly.

Making it bi-directional might break libvirt's save/restore
to file support which uses migration, passing a unidirectional
FD for the file. It could also break libvirt's secure tunnelled
migration support which is currently only expecting to have
data sent in one direction on the socket.

Daniel


Re: [Qemu-devel] [PATCH 02/18] Introduce read() to FdMigrationState.

2011-02-10 Thread Daniel P. Berrange
On Thu, Feb 10, 2011 at 07:23:33PM +0900, Yoshiaki Tamura wrote:
 2011/2/10 Daniel P. Berrange berra...@redhat.com:
  On Thu, Feb 10, 2011 at 10:54:01AM +0100, Anthony Liguori wrote:
  On 02/10/2011 10:30 AM, Yoshiaki Tamura wrote:
  Currently FdMigrationState doesn't support read(), and this patch
  introduces it to get response from the other side.
  
  Signed-off-by: Yoshiaki Tamuratamura.yoshi...@lab.ntt.co.jp
 
  Migration is unidirectional.  Changing this is fundamental and not
  something to be done lightly.
 
  Making it bi-directional might break libvirt's save/restore
  to file support which uses migration, passing a unidirectional
  FD for the file. It could also break libvirt's secure tunnelled
  migration support which is currently only expecting to have
  data sent in one direction on the socket.
 
 Hi Daniel,
 
 IIUC, this patch isn't something to make existing live migration
 bi-directional.  Just opens up a way for Kemari to use it.  Do
 you think it's dangerous for libvirt still?

The key is for it to be a no-op for any usage of the existing
'migrate' command. I had thought this was wiring up read into
the event loop too, so it would be poll()ing for reads, but
after re-reading I see this isn't the case here.

Regards,
Daniel


Re: PCI Passthrough, error: The driver 'pci-stub' is occupying your device 0000:08:06.2

2011-02-07 Thread Daniel P. Berrange
On Sat, Feb 05, 2011 at 04:34:01PM +, James Neave wrote:
 Hi,
 
 I'm trying to pass a NOVA-T-500 TV tuner card through to a guest VM.
 I'm getting the error The driver 'pci-stub' is occupying your device
 0000:08:06.2

This is a rather misleading error message. It is *expected* that
pci-stub will occupy the device. Unfortunately the rest of the
error messages QEMU is printing aren't much help either, but
ultimately something is returning -EBUSY in the PCI device assignment
step.

Regards,
Daniel


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-20 Thread Daniel P. Berrange
On Thu, Jan 20, 2011 at 09:44:05AM +0100, Gerd Hoffmann wrote:
   Hi,
 
 For (2), you cannot use bus=X,addr=Y because it makes assumptions about
 the PCI topology which may change in newer -M pc's.
 
 Why should the PCI topology for 'pc' ever change?
 
 We'll probably get q35 support some day, but when this lands I
 expect we'll see a new machine type 'q35', so '-M q35' will pick the
 ich9 chipset (which will have a different pci topology of course)
 and '-M pc' will pick the existing piix chipset (which will continue
 to look like it looks today).

If the topology does ever change (eg in the kind of way Anthony
suggests, where the first bus only has the graphics card), I think libvirt
is going to need a little work to adapt to the new topology,
regardless of whether we currently specify a bus= arg to -device
or not. I'm not sure there's anything we could do to future proof
us to that kind of change.

Regards,
Daniel


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote:
 On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:
 On 01/18/11 18:09, Anthony Liguori wrote:
 On 01/18/2011 10:56 AM, Jan Kiszka wrote:
 
 The device model topology is 100% a hidden architectural detail.
 This is true for the sysbus, it is obviously not the case for PCI and
 similarly discoverable buses. There we have a guest-explorable topology
  that is currently equivalent to the qdev layout.
 
 But we also don't do PCI passthrough so we really haven't even explored
 how that maps in qdev. I don't know if qemu-kvm has attempted to
 qdev-ify it.
 
 It is qdev-ified.  It is a normal pci device from qdev's point of view.
 
 BTW: is there any reason why (vfio-based) pci passthrough couldn't
 work with tcg?
 
 The -device interface is a stable interface. Right now, you don't
 specify any type of identifier of the pci bus when you create a PCI
 device. It's implied in the interface.
 
 Wrong.  You can specify the bus you want attach the device to via
 bus=name.  This is true for *every* device, including all pci
 devices.  If unspecified qdev uses the first bus it finds.
 
 As long as there is a single pci bus only there is simply no need
 to specify it, thats why nobody does that today.
 
 Right.  In terms of specifying bus=, what are we promising re:
 compatibility?  Will there always be a pci.0?  If we add some
 PCI-to-PCI bridges in order to support more devices, is libvirt
 supposed to parse the hierarchy and figure out which bus to put the
 device on?

The reason we specify 'bus' is that we wanted to be flexible wrt
upgrades of libvirt, without needing restarts of QEMU instances
it manages. That way we can introduce new functionality into
libvirt that relies on it having previously set 'bus' on all
active QEMUs.

If QEMU gains PCI-to-PCI bridges, then I wouldn't expect QEMU to
be adding the extra bridges itself. I'd expect that QEMU would provide
just the first bridge, and then libvirt would specify how many more
bridges to create at boot, or hotplug them later. So it wouldn't
ever need to parse the topology.

Regards,
Daniel


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 10:54:10AM -0600, Anthony Liguori wrote:
 On 01/19/2011 07:11 AM, Markus Armbruster wrote:
 Gerd Hoffmannkra...@redhat.com  writes:
 
 On 01/18/11 18:09, Anthony Liguori wrote:
 On 01/18/2011 10:56 AM, Jan Kiszka wrote:
 The device model topology is 100% a hidden architectural detail.
 This is true for the sysbus, it is obviously not the case for PCI and
 similarly discoverable buses. There we have a guest-explorable topology
  that is currently equivalent to the qdev layout.
 But we also don't do PCI passthrough so we really haven't even explored
 how that maps in qdev. I don't know if qemu-kvm has attempted to
 qdev-ify it.
 It is qdev-ified.  It is a normal pci device from qdev's point of view.
 
 BTW: is there any reason why (vfio-based) pci passthrough couldn't
 work with tcg?
 
 The -device interface is a stable interface. Right now, you don't
 specify any type of identifier of the pci bus when you create a PCI
 device. It's implied in the interface.
 Wrong.  You can specify the bus you want attach the device to via
 bus=name.  This is true for *every* device, including all pci
 devices. If unspecified qdev uses the first bus it finds.
 
 As long as there is a single pci bus only there is simply no need to
 specify it, thats why nobody does that today.  Once q35 finally
 arrives this will change of course.
 As far as I know, libvirt does it already.
 
 I think that's a bad idea from a forward compatibility perspective.

In our past experience though, *not* specifying attributes like
these has also been pretty bad from a forward compatibility
perspective too. We're kind of damned either way, so on balance
we decided we'd specify every attribute in qdev that's related
to unique identification of devices & their inter-relationships.
By strictly locking down the topology we were defining, we ought
to have a more stable ABI in the face of future changes. I accept
this might not always work out, so we may have to adjust things
over time still. Predicting the future is hard :-)

Daniel


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote:
 On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:
 On 01/18/11 18:09, Anthony Liguori wrote:
 On 01/18/2011 10:56 AM, Jan Kiszka wrote:
 
 The device model topology is 100% a hidden architectural detail.
 This is true for the sysbus, it is obviously not the case for PCI and
 similarly discoverable buses. There we have a guest-explorable topology
  that is currently equivalent to the qdev layout.
 
 But we also don't do PCI passthrough so we really haven't even explored
 how that maps in qdev. I don't know if qemu-kvm has attempted to
 qdev-ify it.
 
 It is qdev-ified.  It is a normal pci device from qdev's point of view.
 
 BTW: is there any reason why (vfio-based) pci passthrough couldn't
 work with tcg?
 
 The -device interface is a stable interface. Right now, you don't
 specify any type of identifier of the pci bus when you create a PCI
 device. It's implied in the interface.
 
 Wrong.  You can specify the bus you want attach the device to via
 bus=name.  This is true for *every* device, including all pci
 devices.  If unspecified qdev uses the first bus it finds.
 
 As long as there is a single pci bus only there is simply no need
 to specify it, thats why nobody does that today.
 
 Right.  In terms of specifying bus=, what are we promising re:
 compatibility?  Will there always be a pci.0?  If we add some
 PCI-to-PCI bridges in order to support more devices, is libvirt
  supposed to parse the hierarchy and figure out which bus to put the
 device on?

The answer to your questions probably differ depending on
whether '-nodefconfig' and '-nodefaults' are set on the
command line.  If they are set, then I'd expect to only
ever see one PCI bus with name pci.0 forever more, unless
I explicitly ask for more. If they are not set, then you
might expect to see multiple PCI buses appear by magic.

Daniel


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 11:51:58AM -0600, Anthony Liguori wrote:
 On 01/19/2011 11:01 AM, Daniel P. Berrange wrote:
 
 The reason we specify 'bus' is that we wanted to be flexible wrt
 upgrades of libvirt, without needing restarts of QEMU instances
 it manages. That way we can introduce new functionality into
 libvirt that relies on it having previously set 'bus' on all
 active QEMUs.
 
  If QEMU gains PCI-to-PCI bridges, then I wouldn't expect QEMU to
  be adding the extra bridges itself. I'd expect that QEMU would provide
  just the first bridge, and then libvirt would specify how many more
  bridges to create at boot, or hotplug them later. So it wouldn't
  ever need to parse the topology.
 
 Yeah, but replacing the main chipset will certainly change the PCI
 topology such that if you're specifying bus=X and addr=X and then
 also using -M pc, unless you're parsing the default topology to come
 up with the addressing, it will break in the future.

We never use a bare '-M pc' though, we always canonicalize to
one of the versioned forms.  So if we run '-M pc-0.12', then
neither the main PCI chipset nor topology would have changed
in newer QEMU.  Of course if we deployed a new VM with
'-M pc-0.20' that might have new PCI chipset, so bus=pci.0
might have different meaning that it did when used with
'-M pc-0.12', but I don't think that's an immediate problem

Regards,
Daniel


Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 11:42:18AM -0600, Anthony Liguori wrote:
 On 01/19/2011 11:35 AM, Daniel P. Berrange wrote:
 On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote:
 On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:
 On 01/18/11 18:09, Anthony Liguori wrote:
 On 01/18/2011 10:56 AM, Jan Kiszka wrote:
 The device model topology is 100% a hidden architectural detail.
 This is true for the sysbus, it is obviously not the case for PCI and
 similarly discoverable buses. There we have a guest-explorable topology
  that is currently equivalent to the qdev layout.
 But we also don't do PCI passthrough so we really haven't even explored
 how that maps in qdev. I don't know if qemu-kvm has attempted to
 qdev-ify it.
 It is qdev-ified.  It is a normal pci device from qdev's point of view.
 
 BTW: is there any reason why (vfio-based) pci passthrough couldn't
 work with tcg?
 
 The -device interface is a stable interface. Right now, you don't
 specify any type of identifier of the pci bus when you create a PCI
 device. It's implied in the interface.
 Wrong.  You can specify the bus you want attach the device to via
 bus=name.  This is true for *every* device, including all pci
 devices.  If unspecified qdev uses the first bus it finds.
 
 As long as there is a single pci bus only there is simply no need
 to specify it, thats why nobody does that today.
 Right.  In terms of specifying bus=, what are we promising re:
 compatibility?  Will there always be a pci.0?  If we add some
 PCI-to-PCI bridges in order to support more devices, is libvirt
  supposed to parse the hierarchy and figure out which bus to put the
 device on?
 The answer to your questions probably differ depending on
 whether '-nodefconfig' and '-nodefaults' are set on the
 command line.  If they are set, then I'd expect to only
  ever see one PCI bus with name pci.0 forever more, unless
  I explicitly ask for more. If they are not set, then you
  might expect to see multiple PCI buses appear by magic.
 
 Yeah, we can't promise that.  If you use -M pc, you aren't
 guaranteed a stable PCI bus topology even with
 -nodefconfig/-nodefaults.

That's why we never use '-M pc' when actually invoking QEMU.
If the user specifies 'pc' in the XML, we canonicalize that
to the versioned alternative like 'pc-0.12' before invoking
QEMU. We also expose the list of versioned machines to apps
so they can do canonicalization themselves if desired.

Regards,
Daniel


Re: [BUG] VM stuck in interrupt-loop after suspend to/resumed from file, or no interrupts at all

2011-01-12 Thread Daniel P. Berrange
On Wed, Jan 12, 2011 at 03:51:13PM +0100, Philipp Hahn wrote:
 Hello,
 
 libvirt implements a managed save, which suspends a VM to a file from which
 it can be resumed later. This uses QEMU/KVM's migrate exec:file feature.
 This doesn't work reliably for me: in many cases the resumed VM seems to be
 stuck: its VNC console is restored, but no key presses or network packets
 are accepted. This happens with Windows XP, 7, 2008 and Linux 2.6.32
 systems.
 
 Using the debugging cycle described below in more detail, I was able to track
 the problem down to interrupt handling: either the Linux guest kernel
 constantly receives an interrupt for the 8139cp network adapter, or no
 interrupts at all (neither network nor keyboard nor timer); only sending an
 NMI works and shows that at least the Linux kernel is still alive.
 
 If I add the -no-kvm-irqchip Option, it seems to work; I was not able to 
 reproduce a hang.

I remember reporting a bug with that scenario 4/5 months back
and it being fixed in the host kernel IIRC.

Daniel


Re: qemu-kvm-0.13.0 - windows 2008 - chkdsk too slow

2011-01-06 Thread Daniel P. Berrange
On Thu, Jan 06, 2011 at 12:19:21PM +0200, Avi Kivity wrote:
 On 01/06/2011 11:42 AM, Nikola Ciprich wrote:
   - run trace-cmd record -e kvm -b 10 -P pid1 -P pid2, ctrl-C after a
 seems like it's not possible to specify multiple pids, so
 
 Did you get 'overrun: something' reports from trace-cmd, where
 something != 0?
 
 If you're not sure, please run the trace again.  Also try adding '-r
 10' to the command line.
 
 I've run 4 commands in parallel. Also I can't get monitor information
 since vm is started using libvirt, so I've just used all machine's qemu-kvm
 pids..
 
 Dan, is there a way to hijack the monitor so we can run some
 commands on it?  Things like 'info registers' and disassembly.

Depends on the libvirt version. For most, you'll need to
look for the monitor path in the QEMU argv:

  -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/vmwts02.monitor,server,nowait \
  -mon chardev=monitor,mode=readline

then, 'service libvirtd stop' and now you can connect to
the monitor at that path & run commands you want, and then
disconnect and start libvirtd again. If you run any commands
that change the VM state, things may well get confused when
you start libvirtd again, but if its just 'info registers'
etc it should be pretty safe.

If you have a new enough libvirt, then you can also send
commands directly using 'virsh qemu-monitor-command' (checking
whether you need JSON or HMP syntax first - in this case you
can see it needs HMP).
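With a new enough libvirt, that route looks like this (the domain name is taken from the monitor path above; the --hmp flag tells virsh to send raw HMP rather than JSON — treat its availability as version-dependent):

```shell
# Send a raw HMP command through libvirt, no libvirtd stop/start needed:
virsh qemu-monitor-command --hmp vmwts02 'info registers'
```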

Regards,
Daniel


Re: [libvirt] cgroup limits only affect kvm guest under certain conditions

2011-01-06 Thread Daniel P. Berrange
On Thu, Jan 06, 2011 at 02:15:37PM +0100, Dominik Klein wrote:
 Hi
 
 I am playing with cgroups and try to limit block io for guests.
 
 The proof of concept is:
 
 # mkdir /dev/cgroup/blkio
 # mount -t cgroup -o blkio blkio /dev/cgroup/blkio/
 # cd blkio/
 # mkdir test
 # cd test/
 # ls -l /dev/vdisks/kirk
 lrwxrwxrwx 1 root root 7 2011-01-06 13:46 /dev/vdisks/kirk -> ../dm-5
 # ls -l /dev/dm-5
 brw-rw 1 root disk 253, 5 2011-01-06 13:36 /dev/dm-5
 # echo 253:5 1048576 > blkio.throttle.write_bps_device
 # echo $$ > tasks
 # dd if=/dev/zero of=/dev/dm-5 bs=1M count=20
 20+0 records in
 20+0 records out
 20971520 bytes (21 MB) copied, 20.0223 s, 1.0 MB/s
 
 So limit applies to the dd child of my shell.
 
 Now I assign /dev/dm-5 (/dev/vdisks/kirk) to a vm and echo the qemu-kvm
 pid into tasks. Limits are not applied, the guest can happily use max io
 bandwidth.

Did you just echo the main qemu-kvm PID, or did you also
add the PIDs of every thread too ? From this description
of the problem, I'd guess you've only confined the main
process thread and thus the I/O & VCPU threads are not
confined.
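
The fix can be sketched in shell (not from the thread): cgroup tasks
files take one thread ID per write, and every TID of a process lives
under /proc/<pid>/task. Here $$ and a temp file stand in for the
qemu-kvm PID and for /dev/cgroup/blkio/test/tasks, so the loop itself is
runnable anywhere:

```shell
PID=$$                # stand-in for the qemu-kvm PID
TASKS=$(mktemp)       # stand-in for /dev/cgroup/blkio/test/tasks
# write every thread ID of the process, not just the main PID
for tid in /proc/"$PID"/task/*; do
    basename "$tid" >> "$TASKS"
done
wc -l < "$TASKS"      # how many threads would be confined
```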

Daniel


Re: [PATCH v2] device-assignment: chmod the rom file before opening read/write

2011-01-05 Thread Daniel P. Berrange
On Wed, Jan 05, 2011 at 05:14:55PM +0200, Avi Kivity wrote:
 On 01/05/2011 04:57 PM, Alex Williamson wrote:
 A valid argument.  I think it could also be argued that the user is
 providing ownership of the file and writing to the file is part of the
 low level details of the sysfs rom file API and should be handled by the
 user of that API.  We basically have 3 places we could put this:
 
   A. kernel - Why is this file mode 0400 by default anyway if using
  it requires write access?  Set it to mode 0600 here by default.
   B. libvirt - Already does chown, why not do chmod too?  chmod and
  restore here.
   C. qemu - Owns file, chmod is trivial and part of the sysfs rom
  file API?  chmod around usage.
 
 
 qemu might not actually own the file, just have rw permissions.  Or
 it might own the file and selinux may prevent it from changing the
 permissions.  Or it may die before the reverse chmod and leave
 things not as they were.

Agreed, I don't think we can rely on QEMU being able to chmod() the
file in general.

 
 I chose qemu because it seemed to have the least chance of side-effects
 and has the smallest usage window.  Do you prefer libvirt or kernel?
 
 No idea really.  What's the kernel's motivation for keeping it ro?  Sanity?
 
 I'd guess libvirt is the one to do it, but someone more familiar
 with device assignment / pci (you?) should weigh in on this.

I've no real objection to libvirt setting the 0600 permissions
on it, if that's required for correct operation.
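
What option B amounts to can be sketched as follows; the device address
is a placeholder, and restoring permissions if the process dies mid-way
(Avi's concern above) is deliberately left out:

```shell
ROM=/sys/bus/pci/devices/0000:00:19.0/rom   # placeholder address
OLD=$(stat -c %a "$ROM")   # remember the current mode, e.g. 400
chmod 0600 "$ROM"          # grant write access for the sysfs rom enable protocol
# ... start the guest, let it read the ROM ...
chmod "$OLD" "$ROM"        # restore
```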

BTW, what is the failure scenario seen when the file is 0400?
I want to know how to diagnose/triage this if it gets reported
by users in BZ...

Regards,
Daniel


Re: USB Passthrough 1.1 performance problem...

2010-12-14 Thread Daniel P. Berrange
On Tue, Dec 14, 2010 at 12:55:04PM +0100, Kenni Lund wrote:
 2010/12/14 Erik Brakkee e...@brakkee.org:
  From: Kenni Lund ke...@kelu.dk
  2010/12/14 Erik Brakkee e...@brakkee.org:
 
  From: Kenni Lund ke...@kelu.dk
 
  Does this mean I have a chance now that PCI passthrough of my WinTV
  PVR-500
  might work now?
 
  Passthrough of a PVR-500 has been working for a long time. I've been
  running with passthrough of a PVR-500 in my HTPC, since
  November/December 2009...so it should work with any recent kernel and
  any recent version of qemu-kvm you can find today - No patching
  needed. The only issue I had with the PVR-500 card, was when *I*
  didn't free up the shared interrupts...once I fixed that, it just
  worked.
 
  How did you free up those shared interrupts then? I tried different slots
  but always get conflicts with the USB irqs.
 
  I did an unbind of the conflicting device (eg. disabled it). I moved
  the PVR-500 card around in the different slots and once I got a
  conflict with the integrated sound card, I left the PVR-500 card in
  that slot (it's a headless machine, so no need for sound) and
  configured unbind of the sound card at boot time. On my old system I
  think it was conflicting with one of the USB controllers as well, but
  it didn't really matter, as I only lost a few of the ports on the back
  of the computer for that particular USB controller - I still had
  plenty of USB ports left and if I really needed more ports, I could
  just plug in an extra USB PCI card.
 
  My /etc/rc.local boot script looks like the following today:
  --
  #Remove HDA conflicting with ivtv1
  echo 0000:00:1b.0 > /sys/bus/pci/drivers/HDA\ Intel/unbind
 
  # ivtv0
  echo "4444 0016" > /sys/bus/pci/drivers/pci-stub/new_id
  echo 0000:04:08.0 > /sys/bus/pci/drivers/ivtv/unbind
  echo 0000:04:08.0 > /sys/bus/pci/drivers/pci-stub/bind
  echo "4444 0016" > /sys/bus/pci/drivers/pci-stub/remove_id
 
  # ivtv1
  echo "4444 0016" > /sys/bus/pci/drivers/pci-stub/new_id
  echo 0000:04:09.0 > /sys/bus/pci/drivers/ivtv/unbind
  echo 0000:04:09.0 > /sys/bus/pci/drivers/pci-stub/bind
  echo "4444 0016" > /sys/bus/pci/drivers/pci-stub/remove_id
 
  I did not try unbinding the usb device so I can also try that.
 
  I don't understand what is happening with the "4444 0016". I configured the
  pci card in kvm and I believe kvm does the binding to pci-stub in recent
  versions. Where is the "4444 0016" coming from?
 
 Okay, qemu-kvm might do it today, I don't know - I haven't changed
 that script for the past year. But are you sure that it's not
 libvirt/virsh/virt-manager which does that for you?

If you use the managed=yes attribute on the hostdev in libvirt
XML, then libvirt will automatically do the pcistub bind/unbind,
followed by a device reset at guest startup & the reverse at shutdown.
If you have conflicting devices on the bus though, libvirt won't
attempt to unbind them, unless you had also explicitly assigned all
those conflicting devices to the same guest.
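
The managed=yes attribute mentioned above sits on the hostdev element of
the libvirt domain XML; a minimal sketch, with a placeholder PCI
address:

```xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x04' slot='0x08' function='0x0'/>
  </source>
</hostdev>
```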

Daniel


Re: -pcidevice broken - fix or remove it?

2010-11-02 Thread Daniel P. Berrange
On Tue, Nov 02, 2010 at 03:31:31PM +0100, Jan Kiszka wrote:
 Am 02.11.2010 14:19, Markus Armbruster wrote:
  Jan Kiszka jan.kis...@web.de writes:
  
  Hi,
 
  looks like the documented way to configure device assignment at qemu-kvm
  level is broken in 0.13 and git head:
 
# qemu-system-x86_64 -pcidevice host=0:1a.0
qemu-system-x86_64: Parameter 'id' expects an identifier
Identifiers consist of letters, digits, '-', '.', '_', starting with a 
  letter.
pcidevice argument parse error; please check the help text for usage
 
  -device pci-assign works, but only if I specify iommu=0 (otherwise: No
  IOMMU found.  Unable to assign device (null)).
 
  Fix that legacy qemu-kvm switch or remove it at this chance? No one
  seems to have complained yet.
 
  Jan
  
  Broken since June.  Xudong Hao (cc'ed) reported it then[1], and
  Hidetoshi Seto (also cc'ed) attempted to get it patched [2,3].
  
  Removing -pcidevice would be fine with me.  For what it's worth, it's
  not in upstream qemu.
  
 
 Patch queued.
 
 While at it, I wondered if we should kill pci_add ... host as well.
 But it looks like libvirt uses it - and should stumble over this
 breakage (it does not specify a device name). I can fix it by removing
 the silly auto-naming, or can libvirt live without it?

As of libvirt >= 0.8.1 & QEMU >= 0.12.x we switched to using -device
for everything. Older libvirt versions had rather broken checking for
PCI device topology, so I think it is fine to require libvirt >= 0.8.1
for latest QEMU releases if users want PCI dev assignment. Thus -pcidevice
and pci_add can both be killed from our POV.
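
For reference, the syntax that survives both removals would look roughly
like this (the host address and id are placeholders):

```shell
qemu-system-x86_64 ... -device pci-assign,host=00:1a.0,id=hostdev0
```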

Regards,
Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

