Re: [PATCH] x86: Pick up local arch trace header

2009-09-24 Thread Jorge Lucángeli Obes
2009/9/24 Jan Kiszka :
> Jorge Lucángeli Obes wrote:
>> ...
>> Aidan, were you able to solve this? I was having the same (original)
>> problem in Xubuntu 64-bits with a custom 2.6.31 kernel and kvm-88. I
>> still haven't tried Jan's patch (paper deadline at work) but I wanted
>> to know if you had made any progress.
>
> The kvm-kmod tree at git://git.kiszka.org/kvm-kmod.git (branch 'queue')
> now contains patches that solve all of Aidan's build problems.
>
> But note: Even your customized 2.6.31 contains the very same KVM kernel
> sources my tree is currently pulling in. So you could make your life
> easier by simply compiling them along with your kernel.
>
> The kvm-kmod patches will become important again when we pull more
> recent KVM sources that are not yet part of the latest kernel, at least
> not part of the particular kernel one is forced to use (for whatever
> reason).

Thanks Jan!


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-24 Thread Avi Kivity

On 09/24/2009 12:15 AM, Gregory Haskins wrote:



There are various aspects about designing high-performance virtual
devices such as providing the shortest paths possible between the
physical resources and the consumers.  Conversely, we also need to
ensure that we meet proper isolation/protection guarantees at the same
time.  What this means is that there are various aspects of any
high-performance PV design that need to be placed in-kernel to
maximize performance while still properly isolating the guest.

For instance, you are required to have your signal-path (interrupts and
hypercalls), your memory-path (gpa translation), and
addressing/isolation model in-kernel to maximize performance.

   

Exactly.  That's what vhost puts into the kernel and nothing more.
 

Actually, no.  Generally, _KVM_ puts those things into the kernel, and
vhost consumes them.  Without KVM (or something equivalent), vhost is
incomplete.  One of my goals with vbus is to generalize the "something
equivalent" part here.
   


I don't really see how vhost and vbus are different here.  vhost expects 
signalling to happen through a couple of eventfds and requires someone 
to supply them and implement kernel support (if needed).  vbus requires 
someone to write a connector to provide the signalling implementation.  
Neither will work out-of-the-box when implementing virtio-net over 
falling dominos, for example.



Vbus accomplishes its in-kernel isolation model by providing a
"container" concept, where objects are placed into this container by
userspace.  The host kernel enforces isolation/protection by using a
namespace to identify objects that is only relevant within a specific
container's context (namely, a "u32 dev-id").  The guest addresses
objects by their dev-id, and the kernel ensures that the guest can't
access objects outside of its dev-id namespace.

   

vhost manages to accomplish this without any kernel support.
 

No, vhost manages to accomplish this because of KVM's kernel support
(ioeventfd, etc.).  Without KVM-like in-kernel support, vhost is
merely a kind of "tuntap"-like clone signalled by eventfds.
   


Without a vbus-connector-falling-dominos, vbus-venet can't do anything 
either.  Both vhost and vbus need an interface; vhost's is just narrower 
since it doesn't do configuration or enumeration.



This goes directly to my rebuttal of your claim that vbus places too
much in the kernel.  I state that, one way or the other, address decode
and isolation _must_ be in the kernel for performance.  Vbus does this
with a devid/container scheme.  vhost+virtio-pci+kvm does it with
pci+pio+ioeventfd.
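
For concreteness, a rough userspace sketch of the pio+ioeventfd doorbell
wiring referred to above.  The ioctl names are taken from the proposed
KVM/vhost interfaces and are my assumption of how the pieces line up, not a
quote of the actual patches:

#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>
#include <linux/vhost.h>

/* Wire a guest PIO doorbell to an eventfd that vhost waits on. */
static int wire_doorbell(int vmfd, int vhostfd, uint64_t pio_addr, int vq)
{
        int kickfd = eventfd(0, 0);
        struct kvm_ioeventfd ioev = {
                .addr  = pio_addr,              /* guest writes here to kick */
                .len   = 2,
                .fd    = kickfd,
                .flags = KVM_IOEVENTFD_FLAG_PIO,
        };
        struct vhost_vring_file kick = { .index = vq, .fd = kickfd };

        if (kickfd < 0)
                return -1;
        /* kvm decodes the guest PIO write and signals the eventfd ... */
        if (ioctl(vmfd, KVM_IOEVENTFD, &ioev) < 0)
                return -1;
        /* ... and vhost consumes the signal without a userspace exit. */
        return ioctl(vhostfd, VHOST_SET_VRING_KICK, &kick);
}

The address decode (which port belongs to which queue) thus lives in kvm,
while vhost itself only ever sees an anonymous eventfd.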
   


vbus doesn't do kvm guest address decoding for the fast path.  It's 
still done by ioeventfd.



  The guest
simply has no access to any vhost resources other than the guest->host
doorbell, which is handed to the guest outside vhost (so it's somebody
else's problem, in userspace).
 

You mean _controlled_ by userspace, right?  Obviously, the other side of
the kernel still needs to be programmed (ioeventfd, etc).  Otherwise,
vhost would be pointless: e.g. just use vanilla tuntap if you don't need
fast in-kernel decoding.
   


Yes (though for something like level-triggered interrupts we're probably 
keeping it in userspace, enjoying the benefits of vhost data path while 
paying more for signalling).



All that is required is a way to transport a message with a "devid"
attribute as an address (such as DEVCALL(devid)) and the framework
provides the rest of the decode+execute function.
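
To make that concrete, here is a toy model of the devid/container lookup
(all names invented for illustration; the real vbus code is structured
differently):

#include <stdint.h>
#include <stddef.h>
#include <errno.h>

#define MAX_DEVS 256

struct toy_device {
        uint32_t devid;
        int (*call)(struct toy_device *dev, void *args, size_t len);
};

struct toy_container {
        struct toy_device *devs[MAX_DEVS];  /* one table per guest: index == devid */
};

/* A DEVCALL(devid) from the guest can only resolve inside its own
 * container, so it can never name an object that belongs to another
 * guest -- which is the isolation property described above. */
static int toy_devcall(struct toy_container *c, uint32_t devid,
                       void *args, size_t len)
{
        struct toy_device *dev;

        if (devid >= MAX_DEVS)
                return -ENODEV;
        dev = c->devs[devid];
        if (!dev)
                return -ENODEV;
        return dev->call(dev, args, len);
}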

   

vhost avoids that.
 

No, it doesn't avoid it.  It just doesn't specify how it's done, and
relies on something else to do it on its behalf.
   


That someone else can be in userspace, apart from the actual fast path.


Conversely, vbus specifies how it's done, but not how to transport the
verb "across the wire".  That is the role of the vbus-connector abstraction.
   


So again, vbus does everything in the kernel (since it's so easy and 
cheap) but expects a vbus-connector.  vhost does configuration in 
userspace (since it's so clunky and fragile) but expects a couple of 
eventfds.



Contrast this to vhost+virtio-pci (called simply "vhost" from here).

   

It's the wrong name.  vhost implements only the data path.
 

Understood, but vhost+virtio-pci is what I am contrasting, and I use
"vhost" for short from that point on because I am too lazy to type the
whole name over and over ;)
   


If you #define A A+B+C don't expect intelligent conversation afterwards.


It is not immune to requiring in-kernel addressing support either; it
just does it differently (and it's not, as you might expect, via qemu).

Vhost relies on QEMU to render PCI objects to the guest, to which the
guest assigns resources (such as BARs, interrupts, etc.).
   

vhost does not rely on qemu.  It relies on its user to handle
configuration.  In one important case it's qemu+pci.  It could just as
well be the lguest launcher.
  

Re: [PATCH] x86: Pick up local arch trace header

2009-09-24 Thread Avi Kivity

On 09/24/2009 09:42 AM, Jan Kiszka wrote:

Jorge Lucángeli Obes wrote:
   

...
Aidan, were you able to solve this? I was having the same (original)
problem in Xubuntu 64-bits with a custom 2.6.31 kernel and kvm-88. I
still haven't tried Jan's patch (paper deadline at work) but I wanted
to know if you had made any progress.
 

The kvm-kmod tree at git://git.kiszka.org/kvm-kmod.git (branch 'queue')
now contains patches that solve all of Aidan's build problems.

   


Can you post them as patches please?

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH] Don't call cpu_synchronize_state() in apic_init_reset()

2009-09-24 Thread Avi Kivity

On 09/23/2009 06:45 PM, Jan Kiszka wrote:

Functions calling each other in the same subsystem can rely on callers
calling cpu_synchronize_state().  Across subsystems, that's another
matter, exported functions should try not to rely on implementation
details of their callers.

(You might argue that the apic is not separate subsystem wrt an x86 cpu,
and I'm not sure I have a counterargument)

 

I do accept this argument. It's just that my feeling is that we are
lacking a proper review of the required call sites of cpu_synchronize_state
and instead put it wherever some regression popped up (and that only in
qemu-kvm...).
   


That's life...


The new rule is: Synchronize the states before accessing registers (or
in-kernel devices) the first time after a vmexit to user space.


No, the rule is: synchronize state before accessing registers.  Extra 
synchronization is cheap, while missing synchronization is very expensive.
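
To illustrate the rule in code: a hypothetical device-emulation snippet,
assuming qemu-kvm's cpu_synchronize_state(env) helper (the register update
itself is invented):

/* Hypothetical userspace path that runs after a vmexit.  The only point
 * is the ordering: synchronize first, then touch registers. */
static void my_device_touch_guest_regs(CPUState *env)
{
    cpu_synchronize_state(env);   /* pull current register state from the kernel */

    /* Safe to read or modify registers now.  Without the call above we
     * could be working on a stale userspace copy that a later sync
     * silently overwrites with the in-kernel values. */
    env->eip = 0xfff0;            /* e.g. a reset-style update */
}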



But,
e.g., I do not see where we do this on CPU reset.
   


That's a bug.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH] Don't call cpu_synchronize_state() in apic_init_reset()

2009-09-24 Thread Gleb Natapov
On Thu, Sep 24, 2009 at 10:53:59AM +0300, Avi Kivity wrote:
> On 09/23/2009 06:45 PM, Jan Kiszka wrote:
> >>Functions calling each other in the same subsystem can rely on callers
> >>calling cpu_synchronize_state().  Across subsystems, that's another
> >>matter, exported functions should try not to rely on implementation
> >>details of their callers.
> >>
> >>(You might argue that the apic is not separate subsystem wrt an x86 cpu,
> >>and I'm not sure I have a counterargument)
> >>
> >I do accept this argument. It's just that my feeling is that we are
> >lacking proper review of the required call sites of cpu_sychronize_state
> >and rather put it where some regression popped up (and that only in
> >qemu-kvm...).
> 
> That's life...
> 
> >The new rule is: Synchronize the states before accessing registers (or
> >in-kernel devices) the first time after a vmexit to user space.
> 
> No, the rule is: synchronize state before accessing registers.
> Extra synchronization is cheap, while missing synchronization is
> very expensive.
> 
So should we stick cpu_synchronize_state() before each register
access? I think it is reasonable to omit it if all callers do it
already.

> >But,
> >e.g., I do not see where we do this on CPU reset.
> 
> That's a bug.
> 
Only if kvm supports cpus without an apic. Otherwise the CPU is reset by 
apic_reset(), and cpu_synchronize_state() is called there.

--
Gleb.


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-24 Thread Avi Kivity

On 09/23/2009 10:37 PM, Avi Kivity wrote:


Example: feature negotiation.  If it happens in userspace, it's easy 
to limit what features we expose to the guest.  If it happens in the 
kernel, we need to add an interface to let the kernel know which 
features it should expose to the guest.  We also need to add an 
interface to let userspace know which features were negotiated, if we 
want to implement live migration.  Something fairly trivial bloats 
rapidly.


btw, we have this issue with kvm reporting cpuid bits to the guest.  
Instead of letting kvm talk directly to the hardware and the guest, kvm 
gets the cpuid bits from the hardware, strips away features it doesn't 
support, exposes that to userspace, and expects userspace to program the 
cpuid bits it wants to expose to the guest (which may be different than 
what kvm exposed to userspace, and different from guest to guest).
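
Roughly, that flow looks like this from userspace -- a sketch using the
KVM_GET_SUPPORTED_CPUID and KVM_SET_CPUID2 ioctls, with error handling
omitted and the filtering policy invented for the example:

#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define NENT 100

/* kvm reports the hardware cpuid minus what it can't virtualize; userspace
 * then narrows that further, per guest, before programming the vcpu. */
static void program_guest_cpuid(int kvm_fd, int vcpu_fd)
{
        struct kvm_cpuid2 *cpuid;
        unsigned int i;

        cpuid = calloc(1, sizeof(*cpuid) + NENT * sizeof(cpuid->entries[0]));
        cpuid->nent = NENT;
        ioctl(kvm_fd, KVM_GET_SUPPORTED_CPUID, cpuid);

        for (i = 0; i < cpuid->nent; i++) {
                struct kvm_cpuid_entry2 *e = &cpuid->entries[i];
                if (e->function == 1)
                        e->ecx &= ~(1u << 3);   /* e.g. hide MONITOR/MWAIT from this guest */
        }

        ioctl(vcpu_fd, KVM_SET_CPUID2, cpuid);  /* the per-guest view, possibly narrower */
        free(cpuid);
}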


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH] Don't call cpu_synchronize_state() in apic_init_reset()

2009-09-24 Thread Jan Kiszka
Gleb Natapov wrote:
> On Thu, Sep 24, 2009 at 10:53:59AM +0300, Avi Kivity wrote:
>> On 09/23/2009 06:45 PM, Jan Kiszka wrote:
 Functions calling each other in the same subsystem can rely on callers
 calling cpu_synchronize_state().  Across subsystems, that's another
 matter, exported functions should try not to rely on implementation
 details of their callers.

 (You might argue that the apic is not separate subsystem wrt an x86 cpu,
 and I'm not sure I have a counterargument)

>>> I do accept this argument. It's just that my feeling is that we are
>>> lacking proper review of the required call sites of cpu_sychronize_state
>>> and rather put it where some regression popped up (and that only in
>>> qemu-kvm...).
>> That's life...
>>
>>> The new rule is: Synchronize the states before accessing registers (or
>>> in-kernel devices) the first time after a vmexit to user space.
>> No, the rule is: synchronize state before accessing registers.
>> Extra synchronization is cheap, while missing synchronization is
>> very expensive.
>>
> So should we stick cpu_synchronize_state() before each register
> accesses? I think it is reasonable to omit it if all callers do it
> already.
> 
>>> But,
>>> e.g., I do not see where we do this on CPU reset.
>> That's a bug.
>>
> Only if kvm support cpus without apic. Otherwise CPU is reset by 
> apic_reset() and cpu_synchronize_state() is called there.

No, that's not enough if cpu_reset() first fiddles with some registers
that may later be overwritten by cpu_synchronize_state() with the old
in-kernel state. At least in theory; I haven't checked yet what happens in
reality. That's why not synchronizing properly is "expensive" (or broken,
IOW).

Jan





Re: [PATCH] Don't call cpu_synchronize_state() in apic_init_reset()

2009-09-24 Thread Avi Kivity

On 09/24/2009 11:03 AM, Gleb Natapov wrote:



The new rule is: Synchronize the states before accessing registers (or
in-kernel devices) the first time after a vmexit to user space.
   

No, the rule is: synchronize state before accessing registers.
Extra synchronization is cheap, while missing synchronization is
very expensive.

 

So should we stick cpu_synchronize_state() before each register
accesses? I think it is reasonable to omit it if all callers do it
already.
   


If the callee is static we can and should avoid it.  If the function is 
exported then we shouldn't rely on callers.


IOW, it's fine to depend on local details (which a reader can easily 
gain), but better to avoid depending on global details.
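
In code, the convention reads roughly like this (function and field names
invented for illustration):

/* Static helper: every caller lives in this file and has already
 * synchronized, which a reader can verify locally. */
static void apic_update_internal(CPUState *env)
{
    env->some_apic_field = 0;     /* invented field, illustration only */
}

/* Exported entry point: don't rely on unknown, global callers. */
void apic_public_reset(CPUState *env)
{
    cpu_synchronize_state(env);
    apic_update_internal(env);
}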


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH] Don't call cpu_synchronize_state() in apic_init_reset()

2009-09-24 Thread Gleb Natapov
On Thu, Sep 24, 2009 at 10:15:15AM +0200, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Thu, Sep 24, 2009 at 10:53:59AM +0300, Avi Kivity wrote:
> >> On 09/23/2009 06:45 PM, Jan Kiszka wrote:
>  Functions calling each other in the same subsystem can rely on callers
>  calling cpu_synchronize_state().  Across subsystems, that's another
>  matter, exported functions should try not to rely on implementation
>  details of their callers.
> 
>  (You might argue that the apic is not separate subsystem wrt an x86 cpu,
>  and I'm not sure I have a counterargument)
> 
> >>> I do accept this argument. It's just that my feeling is that we are
> >>> lacking proper review of the required call sites of cpu_sychronize_state
> >>> and rather put it where some regression popped up (and that only in
> >>> qemu-kvm...).
> >> That's life...
> >>
> >>> The new rule is: Synchronize the states before accessing registers (or
> >>> in-kernel devices) the first time after a vmexit to user space.
> >> No, the rule is: synchronize state before accessing registers.
> >> Extra synchronization is cheap, while missing synchronization is
> >> very expensive.
> >>
> > So should we stick cpu_synchronize_state() before each register
> > accesses? I think it is reasonable to omit it if all callers do it
> > already.
> > 
> >>> But,
> >>> e.g., I do not see where we do this on CPU reset.
> >> That's a bug.
> >>
> > Only if kvm support cpus without apic. Otherwise CPU is reset by 
> > apic_reset() and cpu_synchronize_state() is called there.
> 
> No, that's not enough if cpu_reset() first fiddles with some registers
> that may later on be overwritten on cpu_synchronize_state() with the old
> in-kernel state. At least in theory, haven't checked yet what happens in
Can't happen. Call chain is apic_reset() -> cpu_reset() and apic_reset()
calls  cpu_synchronize_state() before calling cpu_reset().

> reality. That's why not synchronizing properly is "expensive" (or broken
> IOW).
> 
> Jan
> 



--
Gleb.


Re: [PATCH] Don't call cpu_synchronize_state() in apic_init_reset()

2009-09-24 Thread Jan Kiszka
Gleb Natapov wrote:
> On Thu, Sep 24, 2009 at 10:15:15AM +0200, Jan Kiszka wrote:
>> Gleb Natapov wrote:
>>> On Thu, Sep 24, 2009 at 10:53:59AM +0300, Avi Kivity wrote:
 On 09/23/2009 06:45 PM, Jan Kiszka wrote:
>> Functions calling each other in the same subsystem can rely on callers
>> calling cpu_synchronize_state().  Across subsystems, that's another
>> matter, exported functions should try not to rely on implementation
>> details of their callers.
>>
>> (You might argue that the apic is not separate subsystem wrt an x86 cpu,
>> and I'm not sure I have a counterargument)
>>
> I do accept this argument. It's just that my feeling is that we are
> lacking proper review of the required call sites of cpu_sychronize_state
> and rather put it where some regression popped up (and that only in
> qemu-kvm...).
 That's life...

> The new rule is: Synchronize the states before accessing registers (or
> in-kernel devices) the first time after a vmexit to user space.
 No, the rule is: synchronize state before accessing registers.
 Extra synchronization is cheap, while missing synchronization is
 very expensive.

>>> So should we stick cpu_synchronize_state() before each register
>>> accesses? I think it is reasonable to omit it if all callers do it
>>> already.
>>>
> But,
> e.g., I do not see where we do this on CPU reset.
 That's a bug.

>>> Only if kvm support cpus without apic. Otherwise CPU is reset by 
>>> apic_reset() and cpu_synchronize_state() is called there.
>> No, that's not enough if cpu_reset() first fiddles with some registers
>> that may later on be overwritten on cpu_synchronize_state() with the old
>> in-kernel state. At least in theory, haven't checked yet what happens in
> Can't happen. Call chain is apic_reset() -> cpu_reset() and apic_reset()
> calls  cpu_synchronize_state() before calling cpu_reset().

And system_reset?

Jan





Re: [PATCH] Don't call cpu_synchronize_state() in apic_init_reset()

2009-09-24 Thread Gleb Natapov
On Thu, Sep 24, 2009 at 10:59:46AM +0200, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Thu, Sep 24, 2009 at 10:15:15AM +0200, Jan Kiszka wrote:
> >> Gleb Natapov wrote:
> >>> On Thu, Sep 24, 2009 at 10:53:59AM +0300, Avi Kivity wrote:
>  On 09/23/2009 06:45 PM, Jan Kiszka wrote:
> >> Functions calling each other in the same subsystem can rely on callers
> >> calling cpu_synchronize_state().  Across subsystems, that's another
> >> matter, exported functions should try not to rely on implementation
> >> details of their callers.
> >>
> >> (You might argue that the apic is not separate subsystem wrt an x86 
> >> cpu,
> >> and I'm not sure I have a counterargument)
> >>
> > I do accept this argument. It's just that my feeling is that we are
> > lacking proper review of the required call sites of cpu_sychronize_state
> > and rather put it where some regression popped up (and that only in
> > qemu-kvm...).
>  That's life...
> 
> > The new rule is: Synchronize the states before accessing registers (or
> > in-kernel devices) the first time after a vmexit to user space.
>  No, the rule is: synchronize state before accessing registers.
>  Extra synchronization is cheap, while missing synchronization is
>  very expensive.
> 
> >>> So should we stick cpu_synchronize_state() before each register
> >>> accesses? I think it is reasonable to omit it if all callers do it
> >>> already.
> >>>
> > But,
> > e.g., I do not see where we do this on CPU reset.
>  That's a bug.
> 
> >>> Only if kvm support cpus without apic. Otherwise CPU is reset by 
> >>> apic_reset() and cpu_synchronize_state() is called there.
> >> No, that's not enough if cpu_reset() first fiddles with some registers
> >> that may later on be overwritten on cpu_synchronize_state() with the old
> >> in-kernel state. At least in theory, haven't checked yet what happens in
> > Can't happen. Call chain is apic_reset() -> cpu_reset() and apic_reset()
> > calls  cpu_synchronize_state() before calling cpu_reset().
> 
> And system_reset?
> 
And system_reset calls apic_reset() if the cpu has an apic, cpu_reset()
otherwise. That is why I said that the bug only exists for cpus without
an apic.

--
Gleb.


[ kvm-Bugs-2826486 ] Clock speed in FreeBSD

2009-09-24 Thread SourceForge.net
Bugs item #2826486, was opened at 2009-07-24 11:16
Message generated for change (Comment added) made by aurel32
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2826486&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: POLYMORF34 (polymorf34)
Assigned to: Nobody/Anonymous (nobody)
Summary: Clock speed in FreeBSD

Initial Comment:
I use KVM 88 and KVM 85 on Gentoo GNU/Linux 2.6.29, running on an Intel Core2
CPU 6320 and an Intel Xeon CPU E5405, both in 64-bit mode.
All guests run FreeBSD 7.1-p5 in 64-bit mode with -smp 1. The first machine
hosts only one guest.

The "sleep" command on FreeBSD does not work as expected. All sleep times are
multiplied by 3:

Example:

freebsdmachine ~ # time sleep 1
real    0m3.148s
user    0m0.000s
sys     0m0.002s

freebsdmachine ~ # time sleep 10
real    0m31.429s
user    0m0.009s
sys     0m0.002s

With the "-no-kvm" flag, the "sleep" command works has expected.

--

Comment By: Aurelien Jarno (aurel32)
Date: 2009-09-24 11:30

Message:
This is a regression introduced by this commit:

commit a7dfd4349f00e256a884b572f98c2c3be57ad212
Author: Marcelo Tosatti 
Date:   Wed Jan 21 13:07:00 2009 -0200

KVM: x86: fix LAPIC pending count calculation

Simplify LAPIC TMCCT calculation by using hrtimer provided
function to query remaining time until expiration.

Fixes host hang with nested ESX.

Signed-off-by: Marcelo Tosatti 
Signed-off-by: Alexander Graf 
Signed-off-by: Avi Kivity 


--

Comment By: rmdir (rmdir)
Date: 2009-09-11 11:03

Message:
>Seems like there's a bug in one of the emulated timers. I worked around it
>with the Fedora 11 version of kvm by using the -no-kvm-irqchip flag.

-no-kvm-irqchip is not a real solution. On a FreeBSD guest it's a real
mess with smp > 1 (I don't know about other guests).
You can reproduce this by running a du or fsck:

date ; du -csh /usr/ports/ ; date   # use date instead of time because of this bug
with:
-smp 2 => 32s
-smp 2 -no-kvm-irqchip => 4m28
-smp 1 -no-kvm-irqchip => 35s
-smp 1 => 35s
no options => 17s


 



--

Comment By: Ed Swierk (eswierk)
Date: 2009-07-24 16:01

Message:
Seems like there's a bug in one of the emulated timers. I worked around it
with the Fedora 11 version of kvm by using the -no-kvm-irqchip flag. 


--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2826486&group_id=180599


Re: sync guest calls made async on host - SQLite performance

2009-09-24 Thread Avi Kivity

On 09/23/2009 06:58 PM, Matthew Tippett wrote:

Hi,

I would like to call attention to the SQLite performance under KVM in 
the current Ubuntu Alpha.


http://www.phoronix.com/scan.php?page=article&item=linux_2631_kvm&num=3

SQLite's benchmark as part of the Phoronix Test Suite is typically IO 
limited and is affected by both disk and filesystem performance.


When comparing SQLite under the host against the guest OS,  there is 
an order of magnitude _IMPROVEMENT_ in the measured performance  of 
the guest.


I am expecting that the host is doing synchronous IO operations but 
somewhere in the stack the calls are ultimately being made 
asynchronous or at the very least batched for writing.


On the surface, this represents a data integrity issue, and I am 
interested in the KVM community's thoughts on this behaviour.  Is it 
expected? Is it acceptable?  Is it safe?


qemu defaults to write-through caching, so there is no data integrity 
concern.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: PCI passthrough

2009-09-24 Thread Avi Kivity

On 09/24/2009 03:01 AM, Matt Piermarini wrote:


If anybody has any ideas I can try, I'd surely appreciate it.  My host 
does NOT have VT-d capable hardware, and I'm not even sure that is a 
requirement - is it?  The host is an Intel ICH10/P45/Q6600.


Flags: bus master, medium devsel, latency 64, IRQ 20


"bus master" means the card can dma, which requires an iommu.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: sync guest calls made async on host - SQLite performance

2009-09-24 Thread Matthew Tippett
Thanks Avi,

I am still trying to reconcile your statement with the potential
data risks and the numbers observed.

My read of your response is that the guest sees a consistent view -
the data is committed to the virtual disk device.  Does a synchronous
write within the guest trigger a synchronous write of the virtual
device within the host?

I don't think offering SQLite users a 10 fold increase in performance
with no data integrity risks just by using KVM is a sane proposition.

Regards... Matthew


On 9/24/09, Avi Kivity  wrote:
> On 09/23/2009 06:58 PM, Matthew Tippett wrote:
>> Hi,
>>
>> I would like to call attention to the SQLite performance under KVM in
>> the current Ubuntu Alpha.
>>
>> http://www.phoronix.com/scan.php?page=article&item=linux_2631_kvm&num=3
>>
>> SQLite's benchmark as part of the Phoronix Test Suite is typically IO
>> limited and is affected by both disk and filesystem performance.
>>
>> When comparing SQLite under the host against the guest OS,  there is
>> an order of magnitude _IMPROVEMENT_ in the measured performance  of
>> the guest.
>>
>> I am expecting that the host is doing synchronous IO operations but
>> somewhere in the stack the calls are ultimately being made
>> asynchronous or at the very least batched for writing.
>>
>> On the surface, this represents a data integrity issue and  I am
>> interested in the KVM communities thoughts on this behaviour.  Is it
>> expected? Is it acceptable?  Is it safe?
>
> qemu defaults to write-through caching, so there is no data integrity
> concern.
>
> --
> Do not meddle in the internals of kernels, for they are subtle and quick to
> panic.
>
>

-- 
Sent from my mobile device


Re: sync guest calls made async on host - SQLite performance

2009-09-24 Thread Avi Kivity

On 09/24/2009 03:31 PM, Matthew Tippett wrote:

Thanks Avi,

I am still trying to reconcile the your statement with the potential
data risks and the numbers observed.

My read of your response is that the guest sees a consistent view -
the data is commited to the virtual disk device.  Does a synchronous
write within the guest trigger a synchronous write of the virtual
device within the host?
   


Yes.


I don't think offering SQLite users a 10 fold increase in performance
with no data integrity risks just by using KVM is a sane proposition.
   


It isn't; my guess is that the test setup is broken somehow.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Binary Windows guest drivers are released

2009-09-24 Thread Yan Vugenfirer
Hello All,

I am happy to announce that the Windows guest drivers binaries are
released.

http://www.linux-kvm.org/page/WindowsGuestDrivers/Download_Drivers


Best regards,
Yan Vugenfirer.


Re: PCI passthrough

2009-09-24 Thread Matt Piermarini

On 09/24/2009 07:46 AM, Avi Kivity wrote:

On 09/24/2009 03:01 AM, Matt Piermarini wrote:


If anybody has any ideas I can try, I'd surely appreciate it.  My 
host does NOT have vt-d capable hardware, and I'm not even sure that 
is requirement - is it?  Host is an Intel ICH10/P45/Q6600.


Flags: bus master, medium devsel, latency 64, IRQ 20


"bus master" means the card can dma, which requires an iommu.


Thanks for the info -- At least I know I can stop pulling my hair out now.


Re: [patch 07/10] KVM: introduce kvm->srcu and convert kvm_set_memory_region to SRCU update

2009-09-24 Thread Marcelo Tosatti
On Mon, Sep 21, 2009 at 08:37:18PM -0300, Marcelo Tosatti wrote:
> Use two steps for memslot deletion: mark the slot invalid (which stops 
> instantiation of new shadow pages for that slot, but allows destruction),
> then instantiate the new empty slot.
> 
> Also simplifies kvm_handle_hva locking.
> 
> Signed-off-by: Marcelo Tosatti 
> 



> - if (!npages)
> + if (!npages) {
> + slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
> + if (!slots)
> + goto out_free;
> + memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
> + if (mem->slot >= slots->nmemslots)
> + slots->nmemslots = mem->slot + 1;
> + slots->memslots[mem->slot].flags |= KVM_MEMSLOT_INVALID;
> +
> + old_memslots = kvm->memslots;
> + rcu_assign_pointer(kvm->memslots, slots);
> + synchronize_srcu(&kvm->srcu);
> + /* From this point no new shadow pages pointing to a deleted
> +  * memslot will be created.
> +  *
> +  * validation of sp->gfn happens in:
> +  *  - gfn_to_hva (kvm_read_guest, gfn_to_pfn)
> +  *  - kvm_is_visible_gfn (mmu_check_roots)
> +  */
>   kvm_arch_flush_shadow(kvm);
> + kfree(old_memslots);
> + }
>  
>   r = kvm_arch_prepare_memory_region(kvm, &new, old, user_alloc);
>   if (r)
>   goto out_free;
>  
> - spin_lock(&kvm->mmu_lock);
> - if (mem->slot >= kvm->memslots->nmemslots)
> - kvm->memslots->nmemslots = mem->slot + 1;
> +#ifdef CONFIG_DMAR
> + /* map the pages in iommu page table */
> + if (npages)
> + r = kvm_iommu_map_pages(kvm, &new);
> + if (r)
> + goto out_free;
> +#endif
>  
> - *memslot = new;
> - spin_unlock(&kvm->mmu_lock);
> + slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
> + if (!slots)
> + goto out_free;
> + memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
> + if (mem->slot >= slots->nmemslots)
> + slots->nmemslots = mem->slot + 1;
> +
> + /* actual memory is freed via old in kvm_free_physmem_slot below */
> + if (!npages) {
> + new.rmap = NULL;
> + new.dirty_bitmap = NULL;
> + for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i)
> + new.lpage_info[i] = NULL;
> + }
> +
> + slots->memslots[mem->slot] = new;
> + old_memslots = kvm->memslots;
> + rcu_assign_pointer(kvm->memslots, slots);
> + synchronize_srcu(&kvm->srcu);
>  
>   kvm_arch_commit_memory_region(kvm, mem, old, user_alloc);

Paul,

There is a scenario where this path, which updates KVM memory slots, is
called relatively often.

Each synchronize_srcu() call takes about 10ms (avg 3ms per
synchronize_sched call), so this is hurting us.

Is this expected? Is there any possibility for synchronize_srcu()
optimization?

There are other angles we can work on, such as reducing the number of memory
slot updates, but I'm wondering what can be done regarding SRCU itself.

TIA
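
For reference, the reader side that pairs with the rcu_assign_pointer() /
synchronize_srcu() sequence above would look roughly like this (a sketch of
the pattern, not a quote of the patch):

#include <linux/kvm_host.h>
#include <linux/srcu.h>
#include <linux/rcupdate.h>

/* SRCU reader: cheap and non-blocking for other readers; the updater's
 * synchronize_srcu() waits until all readers like this one have finished. */
static bool gfn_has_valid_slot(struct kvm *kvm, gfn_t gfn)
{
        struct kvm_memslots *slots;
        bool found = false;
        int idx, i;

        idx = srcu_read_lock(&kvm->srcu);
        slots = rcu_dereference(kvm->memslots);
        for (i = 0; i < slots->nmemslots; i++) {
                struct kvm_memory_slot *s = &slots->memslots[i];

                if (!(s->flags & KVM_MEMSLOT_INVALID) &&
                    gfn >= s->base_gfn && gfn < s->base_gfn + s->npages)
                        found = true;
        }
        srcu_read_unlock(&kvm->srcu, idx);
        return found;
}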



Re: [PATCH 1/3] kvm: dont hold pagecount reference for mapped sptes pages

2009-09-24 Thread Marcelo Tosatti

This needs compat code for !MMU_NOTIFIERS case in kvm-kmod (Jan CC'ed).

Otherwise looks good.

On Wed, Sep 23, 2009 at 09:47:16PM +0300, Izik Eidus wrote:
> When using mmu notifiers, we are allowed to remove the page count
> reference taken by get_user_pages on a specific page that is mapped
> inside the shadow page tables.
> 
> This is needed so we can balance the pagecount against mapcount
> checking.
> 
> (Right now kvm increases the pagecount but does not increase the
> mapcount when mapping a page into a shadow page table entry,
> so when comparing pagecount against mapcount, you have no
> reliable result.)
> 
> Signed-off-by: Izik Eidus 
> ---
>  arch/x86/kvm/mmu.c |7 ++-
>  1 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index eca41ae..6c67b23 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -634,9 +634,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
>   if (*spte & shadow_accessed_mask)
>   kvm_set_pfn_accessed(pfn);
>   if (is_writeble_pte(*spte))
> - kvm_release_pfn_dirty(pfn);
> - else
> - kvm_release_pfn_clean(pfn);
> + kvm_set_pfn_dirty(pfn);
>   rmapp = gfn_to_rmap(kvm, sp->gfns[spte - sp->spt], sp->role.level);
>   if (!*rmapp) {
>   printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte);
> @@ -1877,8 +1875,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 
> *sptep,
>   page_header_update_slot(vcpu->kvm, sptep, gfn);
>   if (!was_rmapped) {
>   rmap_count = rmap_add(vcpu, sptep, gfn);
> - if (!is_rmap_spte(*sptep))
> - kvm_release_pfn_clean(pfn);
> + kvm_release_pfn_clean(pfn);
>   if (rmap_count > RMAP_RECYCLE_THRESHOLD)
>   rmap_recycle(vcpu, sptep, gfn);
>   } else {
> -- 
> 1.5.6.5


Re: [PATCH 3/3] add support for change_pte mmu notifiers

2009-09-24 Thread Marcelo Tosatti
On Wed, Sep 23, 2009 at 09:47:18PM +0300, Izik Eidus wrote:
> this is needed for kvm if it wants ksm to directly map pages into its
> shadow page tables.
> 
> Signed-off-by: Izik Eidus 
> ---
>  arch/x86/include/asm/kvm_host.h |1 +
>  arch/x86/kvm/mmu.c  |   62 +-
>  virt/kvm/kvm_main.c |   14 +
>  3 files changed, 68 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 3be0004..d838922 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -796,6 +796,7 @@ asmlinkage void kvm_handle_fault_on_reboot(void);
>  #define KVM_ARCH_WANT_MMU_NOTIFIER
>  int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
>  int kvm_age_hva(struct kvm *kvm, unsigned long hva);
> +void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
>  int cpuid_maxphyaddr(struct kvm_vcpu *vcpu);
>  int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
>  int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 5cd8b4e..ceec065 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -748,7 +748,7 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
>   return write_protected;
>  }
>  
> -static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
> +static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp, u64 data)
>  {
>   u64 *spte;
>   int need_tlb_flush = 0;
> @@ -763,8 +763,45 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned 
> long *rmapp)
>   return need_tlb_flush;
>  }
>  
> -static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
> -   int (*handler)(struct kvm *kvm, unsigned long *rmapp))
> +static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp, u64 data)
> +{
> + int need_flush = 0;
> + u64 *spte, new_spte;
> + pte_t *ptep = (pte_t *)data;
> + pfn_t new_pfn;
> +
> + WARN_ON(pte_huge(*ptep));
> + new_pfn = pte_pfn(*ptep);
> + spte = rmap_next(kvm, rmapp, NULL);
> + while (spte) {
> + BUG_ON(!is_shadow_present_pte(*spte));
> + rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", spte, *spte);
> + need_flush = 1;
> + if (pte_write(*ptep)) {
> + rmap_remove(kvm, spte);
> + __set_spte(spte, shadow_trap_nonpresent_pte);
> + spte = rmap_next(kvm, rmapp, NULL);
> + } else {
> + new_spte = *spte &~ (PT64_BASE_ADDR_MASK);
> + new_spte |= new_pfn << PAGE_SHIFT;

new_spte |= (u64)new_pfn << PAGE_SHIFT;

Otherwise looks good to me.

> + new_spte &= ~PT_WRITABLE_MASK;
> + new_spte &= ~SPTE_HOST_WRITEABLE;
> + if (is_writeble_pte(*spte))
> + kvm_set_pfn_dirty(spte_to_pfn(*spte));
> + __set_spte(spte, new_spte);
> + spte = rmap_next(kvm, rmapp, spte);
> + }
> + }
> + if (need_flush)
> + kvm_flush_remote_tlbs(kvm);
> +
> + return 0;
> +}


Re: [PATCH: kvm 2/6] Kill the confusing tsc_ref_khz and ref_freq variables.

2009-09-24 Thread Marcelo Tosatti
On Wed, Sep 23, 2009 at 05:29:01PM -1000, Zachary Amsden wrote:
> They are globals, not clearly protected by any ordering or locking, and
> vulnerable to various startup races.
> 
> Instead, for variable TSC machines, register the cpufreq notifier and get
> the TSC frequency directly from the cpufreq machinery.  Not only is it
> always right, it is also perfectly accurate, as no error prone measurement
> is required.  On such machines, also detect the frequency when bringing
> a new CPU online; it isn't clear what frequency it will start with, and
> it may not correspond to the reference.
> 
> Signed-off-by: Zachary Amsden 
> ---
>  arch/x86/kvm/x86.c |   38 --
>  1 files changed, 28 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 15d2ace..35082dd 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -650,6 +650,19 @@ static void kvm_set_time_scale(uint32_t tsc_khz, struct 
> pvclock_vcpu_time_info *
>  
>  static DEFINE_PER_CPU(unsigned long, cpu_tsc_khz);
>  
> +static inline void kvm_get_cpu_khz(int cpu)
> +{
> + unsigned int khz = cpufreq_get(cpu);

cpufreq_get does down_read, while kvm_arch_hardware_enable is called
either with a spinlock held or from interrupt context?
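
A sketch of the hazard being pointed out, with invented names and the
surrounding details elided -- the point is only that a function which may
sleep cannot be called from a context that cannot:

#include <linux/cpufreq.h>
#include <linux/percpu.h>
#include <linux/smp.h>

static DEFINE_PER_CPU(unsigned long, cpu_tsc_khz);

/* Invented illustration of the problematic pattern. */
static void sample_khz_on_this_cpu(void *unused)
{
        /* on_each_cpu() runs this in IPI context with interrupts off ... */
        unsigned int khz = cpufreq_get(smp_processor_id());

        /* ... but cpufreq_get() takes a sleeping lock (down_read), so this
         * can splat or deadlock.  The frequency has to be sampled earlier,
         * from sleepable context, and only consumed here. */
        per_cpu(cpu_tsc_khz, smp_processor_id()) = khz;
}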



[ kvm-Bugs-2865820 ] kvm-88-r1 & Intel E5450 Harpertown

2009-09-24 Thread SourceForge.net
Bugs item #2865820, was opened at 2009-09-24 10:44
Message generated for change (Tracker Item Submitted) made by jimerickson
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2865820&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: intel
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: James Erickson (jimerickson)
Assigned to: Nobody/Anonymous (nobody)
Summary: kvm-88-r1 & Intel E5450 Harpertown

Initial Comment:
I use kvm-88-r1 on 64-bit Gentoo. I recently installed two quad-core Intel E5450
Harpertown 64-bit processors. They do not have the vmx flag, so my /dev/kvm is
not being created. Is there a solution for this? My guest is usually
32-bit FreeBSD. I have included /proc/cpuinfo as an attachment.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2865820&group_id=180599


Re: [PATCH: kvm 3/6] Fix hotadd of CPUs for KVM.

2009-09-24 Thread Marcelo Tosatti
On Wed, Sep 23, 2009 at 05:29:02PM -1000, Zachary Amsden wrote:
> Both VMX and SVM require per-cpu memory allocation, which is done at module
> init time, for only online cpus.  When bringing a new CPU online, we must
> also allocate this structure.  The method chosen to implement this is to
> make the CPU online notifier available via a call to the arch code.  This
> allows memory allocation to be done smoothly, without any need to allocate
> extra structures.
> 
> Note: CPU up notifiers may call the KVM callback before calling cpufreq callbacks.
> This would cause the CPU frequency not to be detected (and it is not always
> clear on non-constant TSC platforms what the bringup TSC rate will be, so the
> guess of using tsc_khz could be wrong).  So, we clear the rate to zero in such
> a case and add logic to query it upon entry.
> 
> Signed-off-by: Zachary Amsden 
> ---
>  arch/x86/include/asm/kvm_host.h |2 ++
>  arch/x86/kvm/svm.c  |   15 +--
>  arch/x86/kvm/vmx.c  |   17 +
>  arch/x86/kvm/x86.c  |   14 +-
>  include/linux/kvm_host.h|6 ++
>  virt/kvm/kvm_main.c |3 ++-
>  6 files changed, 53 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 299cc1b..b7dd14b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -459,6 +459,7 @@ struct descriptor_table {
>  struct kvm_x86_ops {
>   int (*cpu_has_kvm_support)(void);  /* __init */
>   int (*disabled_by_bios)(void); /* __init */
> + int (*cpu_hotadd)(int cpu);
>   int (*hardware_enable)(void *dummy);
>   void (*hardware_disable)(void *dummy);
>   void (*check_processor_compatibility)(void *rtn);
> @@ -791,6 +792,7 @@ asmlinkage void kvm_handle_fault_on_reboot(void);
>   _ASM_PTR " 666b, 667b \n\t" \
>   ".popsection"
>  
> +#define KVM_ARCH_WANT_HOTPLUG_NOTIFIER
>  #define KVM_ARCH_WANT_MMU_NOTIFIER
>  int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
>  int kvm_age_hva(struct kvm *kvm, unsigned long hva);
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 9a4daca..8f99d0c 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -330,13 +330,13 @@ static int svm_hardware_enable(void *garbage)
>   return -EBUSY;
>  
>   if (!has_svm()) {
> - printk(KERN_ERR "svm_cpu_init: err EOPNOTSUPP on %d\n", me);
> + printk(KERN_ERR "svm_hardware_enable: err EOPNOTSUPP on %d\n", 
> me);
>   return -EINVAL;
>   }
>   svm_data = per_cpu(svm_data, me);
>  
>   if (!svm_data) {
> - printk(KERN_ERR "svm_cpu_init: svm_data is NULL on %d\n",
> + printk(KERN_ERR "svm_hardware_enable: svm_data is NULL on %d\n",
>  me);
>   return -EINVAL;
>   }
> @@ -394,6 +394,16 @@ err_1:
>  
>  }
>  
> +static __cpuinit int svm_cpu_hotadd(int cpu)
> +{
> + struct svm_cpu_data *svm_data = per_cpu(svm_data, cpu);
> +
> + if (svm_data)
> + return 0;
> +
> + return svm_cpu_init(cpu);
> +}
> +
>  static void set_msr_interception(u32 *msrpm, unsigned msr,
>int read, int write)
>  {
> @@ -2858,6 +2868,7 @@ static struct kvm_x86_ops svm_x86_ops = {
>   .hardware_setup = svm_hardware_setup,
>   .hardware_unsetup = svm_hardware_unsetup,
>   .check_processor_compatibility = svm_check_processor_compat,
> + .cpu_hotadd = svm_cpu_hotadd,
>   .hardware_enable = svm_hardware_enable,
>   .hardware_disable = svm_hardware_disable,
>   .cpu_has_accelerated_tpr = svm_cpu_has_accelerated_tpr,
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 3fe0d42..b8a8428 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -1408,6 +1408,22 @@ static __exit void hardware_unsetup(void)
>   free_kvm_area();
>  }
>  
> +static __cpuinit int vmx_cpu_hotadd(int cpu)
> +{
> + struct vmcs *vmcs;
> +
> + if (per_cpu(vmxarea, cpu))
> + return 0;
> +
> + vmcs = alloc_vmcs_cpu(cpu);
> + if (!vmcs) 
> + return -ENOMEM;
> +
> + per_cpu(vmxarea, cpu) = vmcs;
> +
> + return 0;
> +}

Have to free in __cpuexit?

Is it too wasteful to allocate statically with DEFINE_PER_CPU_PAGE_ALIGNED?


a streaming server on a kvm virtual machine?

2009-09-24 Thread Mauro
My boss asked me to install and configure a streaming server for live videos.
My choice for the server is Red5, an open source streaming server.
Do you think I can use a KVM virtual machine for this server, or is it
better not to use virtualization?
My hardware is an HP ProLiant DL580 G5 with 4 Intel Xeon quad-core
processors and 16 GB of RAM.
My operating system is Debian lenny amd64.


Re: [PATCH 2/3] add SPTE_HOST_WRITEABLE flag to the shadow ptes

2009-09-24 Thread Andrea Arcangeli
On Wed, Sep 23, 2009 at 09:47:17PM +0300, Izik Eidus wrote:
> this flag notifies that the host physical page we are pointing to from
> the spte is write protected, and therefore we can't change its access
> to writable unless we run get_user_pages(write = 1).
> 
> (this is needed for change_pte support in kvm)
> 
> Signed-off-by: Izik Eidus 

Acked-by: Andrea Arcangeli 


Re: [PATCH 1/3] kvm: dont hold pagecount reference for mapped sptes pages

2009-09-24 Thread Andrea Arcangeli
On Wed, Sep 23, 2009 at 09:47:16PM +0300, Izik Eidus wrote:
> When using mmu notifiers, we are allowed to remove the page count
> reference taken by get_user_pages on a specific page that is mapped
> inside the shadow page tables.
> 
> This is needed so we can balance the pagecount against mapcount
> checking.
> 
> (Right now kvm increases the pagecount but does not increase the
> mapcount when mapping a page into a shadow page table entry,
> so when comparing pagecount against mapcount, you have no
> reliable result.)
> 
> Signed-off-by: Izik Eidus 

Acked-by: Andrea Arcangeli 


Re: [PATCH 3/3] add support for change_pte mmu notifiers

2009-09-24 Thread Andrea Arcangeli
On Wed, Sep 23, 2009 at 09:47:18PM +0300, Izik Eidus wrote:
> + if (need_flush)
> + kvm_flush_remote_tlbs(kvm);

need_flush can be returned to kvm_mmu_notifier_change_pte to defer the
tlb flush until after dropping the spin lock, I think. In normal kvm
context we are forced to flush the tlb inside the spin lock because that
is what stops the VM from freeing the page (it hangs on the mmu_lock
taken by kvm invalidate_page/change_pte), so we can unmap tons of sptes
and do a single kvm tlb flush that covers them all (by keeping both
actions under the mmu_lock). In mmu notifier context, however, the pages
can't be freed from under the guest, so we can flush the tlb before
making the page freeable: both the old and the new page in do_wp_page
are still pinned and can't be freed and reused from under us even if we
release the mmu_lock before the tlb flush.


Re: [patch 07/10] KVM: introduce kvm->srcu and convert kvm_set_memory_region to SRCU update

2009-09-24 Thread Paul E. McKenney
On Thu, Sep 24, 2009 at 11:06:51AM -0300, Marcelo Tosatti wrote:
> On Mon, Sep 21, 2009 at 08:37:18PM -0300, Marcelo Tosatti wrote:
> > Use two steps for memslot deletion: mark the slot invalid (which stops 
> > instantiation of new shadow pages for that slot, but allows destruction),
> > then instantiate the new empty slot.
> > 
> > Also simplifies kvm_handle_hva locking.
> > 
> > Signed-off-by: Marcelo Tosatti 
> > 
> 
> 
> 
> > -   if (!npages)
> > +   if (!npages) {
> > +   slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
> > +   if (!slots)
> > +   goto out_free;
> > +   memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
> > +   if (mem->slot >= slots->nmemslots)
> > +   slots->nmemslots = mem->slot + 1;
> > +   slots->memslots[mem->slot].flags |= KVM_MEMSLOT_INVALID;
> > +
> > +   old_memslots = kvm->memslots;
> > +   rcu_assign_pointer(kvm->memslots, slots);
> > +   synchronize_srcu(&kvm->srcu);
> > +   /* From this point no new shadow pages pointing to a deleted
> > +* memslot will be created.
> > +*
> > +* validation of sp->gfn happens in:
> > +*  - gfn_to_hva (kvm_read_guest, gfn_to_pfn)
> > +*  - kvm_is_visible_gfn (mmu_check_roots)
> > +*/
> > kvm_arch_flush_shadow(kvm);
> > +   kfree(old_memslots);
> > +   }
> >  
> > r = kvm_arch_prepare_memory_region(kvm, &new, old, user_alloc);
> > if (r)
> > goto out_free;
> >  
> > -   spin_lock(&kvm->mmu_lock);
> > -   if (mem->slot >= kvm->memslots->nmemslots)
> > -   kvm->memslots->nmemslots = mem->slot + 1;
> > +#ifdef CONFIG_DMAR
> > +   /* map the pages in iommu page table */
> > +   if (npages)
> > +   r = kvm_iommu_map_pages(kvm, &new);
> > +   if (r)
> > +   goto out_free;
> > +#endif
> >  
> > -   *memslot = new;
> > -   spin_unlock(&kvm->mmu_lock);
> > +   slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
> > +   if (!slots)
> > +   goto out_free;
> > +   memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
> > +   if (mem->slot >= slots->nmemslots)
> > +   slots->nmemslots = mem->slot + 1;
> > +
> > +   /* actual memory is freed via old in kvm_free_physmem_slot below */
> > +   if (!npages) {
> > +   new.rmap = NULL;
> > +   new.dirty_bitmap = NULL;
> > +   for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i)
> > +   new.lpage_info[i] = NULL;
> > +   }
> > +
> > +   slots->memslots[mem->slot] = new;
> > +   old_memslots = kvm->memslots;
> > +   rcu_assign_pointer(kvm->memslots, slots);
> > +   synchronize_srcu(&kvm->srcu);
> >  
> > kvm_arch_commit_memory_region(kvm, mem, old, user_alloc);
> 
> Paul,
> 
> There is a scenario where this path, which updates KVM memory slots, is
> called relatively often.
> 
> Each synchronize_srcu() call takes about 10ms (avg 3ms per
> synchronize_sched call), so this is hurting us.
> 
> Is this expected? Is there any possibility for synchronize_srcu()
> optimization?
> 
> There are other sides we can work on, such as reducing the memory slot 
> updates, but i'm wondering what can be done regarding SRCU itself.

This is expected behavior, but there is a possible fix currently
in mainline (Linus's git tree).  The idea would be to create a
synchronize_srcu_expedited(), which starts with synchronize_srcu(), and
replaces the synchronize_sched() calls with synchronize_sched_expedited().

This could potentially reduce the overall synchronize_srcu() latency
to well under a microsecond.  The price to be paid is that each instance
of synchronize_sched_expedited() IPIs all the online CPUs, and awakens
the migration thread on each.

Would this approach likely work for you?

Thanx, Paul


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-24 Thread Gregory Haskins
Avi Kivity wrote:
> On 09/24/2009 12:15 AM, Gregory Haskins wrote:
>>
 There are various aspects about designing high-performance virtual
 devices such as providing the shortest paths possible between the
 physical resources and the consumers.  Conversely, we also need to
 ensure that we meet proper isolation/protection guarantees at the same
 time.  What this means is there are various aspects to any
 high-performance PV design that require to be placed in-kernel to
 maximize the performance yet properly isolate the guest.

 For instance, you are required to have your signal-path (interrupts and
 hypercalls), your memory-path (gpa translation), and
 addressing/isolation model in-kernel to maximize performance.


>>> Exactly.  That's what vhost puts into the kernel and nothing more.
>>>  
>> Actually, no.  Generally, _KVM_ puts those things into the kernel, and
>> vhost consumes them.  Without KVM (or something equivalent), vhost is
>> incomplete.  One of my goals with vbus is to generalize the "something
>> equivalent" part here.
>>
> 
> I don't really see how vhost and vbus are different here.  vhost expects
> signalling to happen through a couple of eventfds and requires someone
> to supply them and implement kernel support (if needed).  vbus requires
> someone to write a connector to provide the signalling implementation. 
> Neither will work out-of-the-box when implementing virtio-net over
> falling dominos, for example.

I realize in retrospect that my choice of words above implies vbus _is_
complete, but this is not what I was saying.  What I was trying to
convey is that vbus is _more_ complete.  Yes, in either case some kind
of glue needs to be written.  The difference is that vbus implements
more of the glue generally, and leaves less required to be customized
for each iteration.

Going back to our stack diagrams, you could think of a vhost solution
like this:

--
| virtio-net
--
| virtio-ring
--
| virtio-bus
--
| ? undefined-1 ?
--
| vhost
--

and you could think of a vbus solution like this

--
| virtio-net
--
| virtio-ring
--
| virtio-bus
--
| bus-interface
--
| ? undefined-2 ?
--
| bus-model
--
| virtio-net-device (vhost ported to vbus model? :)
--


So the difference between vhost and vbus in this particular context is
that you need to have "undefined-1" do device discovery/hotswap,
config-space, address-decode/isolation, signal-path routing, memory-path
routing, etc.  Today this function is filled by things like virtio-pci,
pci-bus, KVM/ioeventfd, and QEMU for x86.  I am not as familiar with
lguest, but presumably it is filled there by components like
virtio-lguest, lguest-bus, lguest.ko, and lguest-launcher.  And to use
more contemporary examples, we might have virtio-domino, domino-bus,
domino.ko, and domino-launcher as well as virtio-ira, ira-bus, ira.ko,
and ira-launcher.

Contrast this to the vbus stack:  The bus-X components (when optionally
employed by the connector designer) do device-discovery, hotswap,
config-space, address-decode/isolation, signal-path and memory-path
routing, etc in a general (and pv-centric) way. The "undefined-2"
portion is the "connector", and just needs to convey messages like
"DEVCALL" and "SHMSIGNAL".  The rest is handled in other parts of the stack.

So to answer your question, the difference is that the part that has to
be customized in vbus should be a fraction of what needs to be
customized with vhost because it defines more of the stack.  And, as
alluded to in my diagram, both virtio-net and vhost (with some
modifications to fit into the vbus framework) are potentially
complementary, not competitors.

> 
 Vbus accomplishes its in-kernel isolation model by providing a
 "container" concept, where objects are placed into this container by
 userspace.  The host kernel enforces isolation/protection by using a
 namespace to identify objects that is only relevant within a specific
 container's context (namely, a "u32 dev-id").  The guest addresses the
 objects by its dev-id, and the kernel ensures that the guest can't
 access objects outside of its dev-id namespace.


>>> vhost manages to accomplish this without any kernel support.
>>>  
>> No, vhost manages to accomplish this because of KVMs kernel support
>> (ioeventfd, etc).   Without a KVM-like in-kernel support, vhost is a
>> merely a kind of "tuntap"-like clone signalled by eventfds.
>>
> 
> Without a vbus-connector-falling-dominos, vbus-venet can't do anything
> either.

Mostly covered above...

However, I was addressing your assertion that vhost somehow manages to
accomplish this without any kernel support.

Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-24 Thread Gregory Haskins
Avi Kivity wrote:
> On 09/23/2009 10:37 PM, Avi Kivity wrote:
>>
>> Example: feature negotiation.  If it happens in userspace, it's easy
>> to limit what features we expose to the guest.  If it happens in the
>> kernel, we need to add an interface to let the kernel know which
>> features it should expose to the guest.  We also need to add an
>> interface to let userspace know which features were negotiated, if we
>> want to implement live migration.  Something fairly trivial bloats
>> rapidly.
> 
> btw, we have this issue with kvm reporting cpuid bits to the guest. 
> Instead of letting kvm talk directly to the hardware and the guest, kvm
> gets the cpuid bits from the hardware, strips away features it doesn't
> support, exposes that to userspace, and expects userspace to program the
> cpuid bits it wants to expose to the guest (which may be different than
> what kvm exposed to userspace, and different from guest to guest).
> 
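
(For reference, the round trip described above boils down to roughly the
following in userspace; my own sketch using the standard
KVM_GET_SUPPORTED_CPUID and KVM_SET_CPUID2 ioctls, error handling omitted.)

#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static void program_guest_cpuid(int kvm_fd, int vcpu_fd)
{
	int n = 100;
	struct kvm_cpuid2 *cpuid;

	cpuid = calloc(1, sizeof(*cpuid) + n * sizeof(cpuid->entries[0]));
	cpuid->nent = n;

	/* 1. what kvm is willing to expose (hardware minus unsupported bits) */
	ioctl(kvm_fd, KVM_GET_SUPPORTED_CPUID, cpuid);

	/* 2. userspace policy: strip or adjust per-guest bits here ... */

	/* 3. program what this particular guest will actually see */
	ioctl(vcpu_fd, KVM_SET_CPUID2, cpuid);

	free(cpuid);
}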

This issue doesn't exist in the model I am referring to, as these are
all virtual-devices anyway.  See my last reply.

-Greg





Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-24 Thread Ira W. Snyder
On Thu, Sep 24, 2009 at 10:18:28AM +0300, Avi Kivity wrote:
> On 09/24/2009 12:15 AM, Gregory Haskins wrote:
> >
> >>> There are various aspects about designing high-performance virtual
> >>> devices such as providing the shortest paths possible between the
> >>> physical resources and the consumers.  Conversely, we also need to
> >>> ensure that we meet proper isolation/protection guarantees at the same
> >>> time.  What this means is there are various aspects to any
> >>> high-performance PV design that require to be placed in-kernel to
> >>> maximize the performance yet properly isolate the guest.
> >>>
> >>> For instance, you are required to have your signal-path (interrupts and
> >>> hypercalls), your memory-path (gpa translation), and
> >>> addressing/isolation model in-kernel to maximize performance.
> >>>
> >>>
> >> Exactly.  That's what vhost puts into the kernel and nothing more.
> >>  
> > Actually, no.  Generally, _KVM_ puts those things into the kernel, and
> > vhost consumes them.  Without KVM (or something equivalent), vhost is
> > incomplete.  One of my goals with vbus is to generalize the "something
> > equivalent" part here.
> >
> 
> I don't really see how vhost and vbus are different here.  vhost expects 
> signalling to happen through a couple of eventfds and requires someone 
> to supply them and implement kernel support (if needed).  vbus requires 
> someone to write a connector to provide the signalling implementation.  
> Neither will work out-of-the-box when implementing virtio-net over 
> falling dominos, for example.
> 
> >>> Vbus accomplishes its in-kernel isolation model by providing a
> >>> "container" concept, where objects are placed into this container by
> >>> userspace.  The host kernel enforces isolation/protection by using a
> >>> namespace to identify objects that is only relevant within a specific
> >>> container's context (namely, a "u32 dev-id").  The guest addresses the
> >>> objects by its dev-id, and the kernel ensures that the guest can't
> >>> access objects outside of its dev-id namespace.
> >>>
> >>>
> >> vhost manages to accomplish this without any kernel support.
> >>  
> > No, vhost manages to accomplish this because of KVMs kernel support
> > (ioeventfd, etc).   Without a KVM-like in-kernel support, vhost is a
> > merely a kind of "tuntap"-like clone signalled by eventfds.
> >
> 
> Without a vbus-connector-falling-dominos, vbus-venet can't do anything 
> either.  Both vhost and vbus need an interface, vhost's is just narrower 
> since it doesn't do configuration or enumeration.
> 
> > This goes directly to my rebuttal of your claim that vbus places too
> > much in the kernel.  I state that, one way or the other, address decode
> > and isolation _must_ be in the kernel for performance.  Vbus does this
> > with a devid/container scheme.  vhost+virtio-pci+kvm does it with
> > pci+pio+ioeventfd.
> >
> 
> vbus doesn't do kvm guest address decoding for the fast path.  It's 
> still done by ioeventfd.
> 
> >>   The guest
> >> simply has not access to any vhost resources other than the guest->host
> >> doorbell, which is handed to the guest outside vhost (so it's somebody
> >> else's problem, in userspace).
> >>  
> > You mean _controlled_ by userspace, right?  Obviously, the other side of
> > the kernel still needs to be programmed (ioeventfd, etc).  Otherwise,
> > vhost would be pointless: e.g. just use vanilla tuntap if you don't need
> > fast in-kernel decoding.
> >
> 
> Yes (though for something like level-triggered interrupts we're probably 
> keeping it in userspace, enjoying the benefits of vhost data path while 
> paying more for signalling).
> 
> >>> All that is required is a way to transport a message with a "devid"
> >>> attribute as an address (such as DEVCALL(devid)) and the framework
> >>> provides the rest of the decode+execute function.
> >>>
> >>>
> >> vhost avoids that.
> >>  
> > No, it doesn't avoid it.  It just doesn't specify how its done, and
> > relies on something else to do it on its behalf.
> >
> 
> That someone else can be in userspace, apart from the actual fast path.
> 
> > Conversely, vbus specifies how its done, but not how to transport the
> > verb "across the wire".  That is the role of the vbus-connector abstraction.
> >
> 
> So again, vbus does everything in the kernel (since it's so easy and 
> cheap) but expects a vbus-connector.  vhost does configuration in 
> userspace (since it's so clunky and fragile) but expects a couple of 
> eventfds.
> 
> >>> Contrast this to vhost+virtio-pci (called simply "vhost" from here).
> >>>
> >>>
> >> It's the wrong name.  vhost implements only the data path.
> >>  
> > Understood, but vhost+virtio-pci is what I am contrasting, and I use
> > "vhost" for short from that point on because I am too lazy to type the
> > whole name over and over ;)
> >
> 
> If you #define A A+B+C don't expect intelligent conversation a

Re: Binary Windows guest drivers are released

2009-09-24 Thread Glennie Vignarajah
Le Thursday 24 September 2009 vers 15:13, Yan Vugenfirer("Yan 
Vugenfirer" ) a écrit:
> Hello All,

Hi,

> 
> I am happy to announce that the Windows guest drivers binaries are
> released.
> http://www.linux-kvm.org/page/WindowsGuestDrivers/Download_Drivers

Wonderful...
I've been using them on XP and W2K3. They are working like a charm.
Thank you, and many thanks to Red Hat for releasing these drivers.
Regards,
-- 
http://www.glennie.fr
If the only tool you have is a hammer, you tend to see every problem as 
a nail.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: sync guest calls made async on host - SQLite performance

2009-09-24 Thread Matthew Tippett
The test itself is a simple usage of SQLite.  It is stock KVM as
available in 2.6.31 on Ubuntu Karmic.  So it would be the environment,
not the test.

So assuming that KVM upstream works as expected that would leave
either 2.6.31 having an issue, or Ubuntu having an issue.

Care to make an assertion on the KVM in 2.6.31?  Leaving only Ubuntu's
installation.

Can some KVM developers attempt to confirm that a 'correctly'
configured KVM will not demonstrate this behaviour? The test suite is
available at http://www.phoronix-test-suite.com/ (and is already available
in newer distributions of Fedora, openSUSE and Ubuntu).

Regards... Matthew


On 9/24/09, Avi Kivity  wrote:
> On 09/24/2009 03:31 PM, Matthew Tippett wrote:
>> Thanks Avi,
>>
>> I am still trying to reconcile the your statement with the potential
>> data risks and the numbers observed.
>>
>> My read of your response is that the guest sees a consistent view -
>> the data is commited to the virtual disk device.  Does a synchronous
>> write within the guest trigger a synchronous write of the virtual
>> device within the host?
>>
>
> Yes.
>
>> I don't think offering SQLite users a 10 fold increase in performance
>> with no data integrity risks just by using KVM is a sane proposition.
>>
>
> It isn't, my guess is that the test setup is broken somehow.
>
> --
> Do not meddle in the internals of kernels, for they are subtle and quick to
> panic.
>
>

-- 
Sent from my mobile device
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH: kvm 3/6] Fix hotadd of CPUs for KVM.

2009-09-24 Thread Zachary Amsden

On 09/24/2009 05:52 AM, Marcelo Tosatti wrote:



+static __cpuinit int vmx_cpu_hotadd(int cpu)
+{
+   struct vmcs *vmcs;
+
+   if (per_cpu(vmxarea, cpu))
+   return 0;
+
+   vmcs = alloc_vmcs_cpu(cpu);
+   if (!vmcs)
+   return -ENOMEM;
+
+   per_cpu(vmxarea, cpu) = vmcs;
+
+   return 0;
+}
 

Have to free in __cpuexit?

Is it too wasteful to allocate statically with DEFINE_PER_CPU_PAGE_ALIGNED?
   


Unfortunately, I think it is.  The VMX / SVM structures are quite large, 
and we can have a lot of potential CPUs.


Zach
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Binary Windows guest drivers are released

2009-09-24 Thread Kenni Lund
2009/9/24 Yan Vugenfirer :
> Hello All,
>
> I am happy to announce that the Windows guest drivers binaries are
> released.

Thank you, I've been waiting for this for quite a while :)

I've done some benchmarking with the drivers on Windows XP SP3 32bit,
but it seems like using the VirtIO drivers is slower than the IDE drivers in
(almost) all cases. Perhaps I've missed something, or does the driver still
need optimization?

I created two raw images of 5GB and attached them to a WinXP SP3
virtual machine with:
"-drive file=virtio.img,if=virtio -drive file=ide.img,if=ide"

I installed the VirtIO drivers, rebooted, formatted the new virtual HDDs with
NTFS and downloaded IOMeter. Three different tests were run: database
workload ("Default" in IOmeter), maximum read throughput and maximum
write throughput (settings taken from IOmeter documentation). All results
are the average of two individual runs of the test. Each test ran for 3 minutes.

--
Typical database workload
("default" in Iometer: 2kb, 67% read, 33% write, 100% random, 0% sequential)
--
Total I/Os per sec:
IDE: 86,67
VirtIO: 66,84

Total MBs per second:
IDE: 0,17MB/sec
VirtIO: 0,13MB/sec

Average I/O response time:
IDE: 11,59ms
VirtIO: 14,96ms

Maximum I/O response time:
IDE: 177,06ms
VirtIO: 244,52ms

% CPU Utilization:
IDE: 3,15%
VirtIO: 2,55%

--
Maximum reading throughput
(64kb, 100% read, 0% write, 0% random, 100% sequential)
--
Total I/Os per sec:
IDE: 3266,17
VirtIO: 2694,34

Total MBs per second:
IDE: 204,14MB/sec
VirtIO: 168,40MB/sec

Average I/O response time:
IDE: 0,3053ms
VirtIO: 0,3710ms

Maximum I/O response time:
IDE: 210,60ms
VirtIO: 180,65ms

% CPU Utilization:
IDE: 70,4%
VirtIO: 55,66%

--
Maximum writing throughput
(64kb, 0% read, 100% write, 0% random, 100% sequential)
--
Total I/Os per sec:
IDE: 258,92
VirtIO: 123,69

Total MBs per second:
IDE: 16,18MB/sec
VirtIO: 7,74MB/sec

Average I/O response time:
IDE: 3,89ms
VirtIO: 8,17ms

Maximum I/O response time:
IDE: 241,99ms
VirtIO: 838,19ms

% CPU Utilization:
IDE: 8,21%
VirtIO: 4,88%

This was tested on a Arch Linux host with kernel 2.6.30.6 64bit and kvm-88.
One CPU and 2GB of RAM was assigned to the virtual machine.

Is this expected behaviour?

Thanks again for your effort on the VirtIO drivers :)

Best Regards
Kenni Lund
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Binary Windows guest drivers are released

2009-09-24 Thread Javier Guerra
On Thu, Sep 24, 2009 at 3:38 PM, Kenni Lund  wrote:
> I've done some benchmarking with the drivers on Windows XP SP3 32bit,
> but it seems like using the VirtIO drivers are slower than the IDE drivers in
> (almost) all cases. Perhaps I've missed something or does the driver still
> need optimization?

very interesting!

It seems that IDE wins on all the performance numbers, but VirtIO
always has lower CPU utilization.  I guess this is guest CPU %, right?
It would also be interesting to compare the CPU usage from the host
point of view, since a lower 'off-guest' CPU usage is very important
for scaling to many guests doing I/O.

-- 
Javier
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Binary Windows guest drivers are released

2009-09-24 Thread Dor Laor

On 09/24/2009 11:59 PM, Javier Guerra wrote:

On Thu, Sep 24, 2009 at 3:38 PM, Kenni Lund  wrote:

I've done some benchmarking with the drivers on Windows XP SP3 32bit,
but it seems like using the VirtIO drivers are slower than the IDE drivers in
(almost) all cases. Perhaps I've missed something or does the driver still
need optimization?


very interesting!

it seems that IDE wins on all the performance numbers, but VirtIO
always has lower CPU utilization.  i guess this is guest CPU %, right?
it would also be interesting to compare the CPU usage from the host
point of view, since a lower 'off-guest' CPU usage is very important
for scaling to many guests doing I/O.



Can you re-try it with the host I/O scheduler set to deadline?
The virtio backend (thread pool) is sensitive to it.

These drivers are mainly tweaked for win2k3 and win2k8. We once had 
queue-depth settings in the driver; I'm not sure we still have them. Vadim, 
can you add more info?


Also, virtio should provide I/O parallelism, as opposed to IDE. I don't 
think your test exercises that. Virtio can also provide more virtual drives 
than the maximum of 4 that IDE offers.


Dor
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: a streaming server on a kvm virtual machine?

2009-09-24 Thread Charles Duffy
For a heavily I/O-bound load such as media streaming, it's better not to 
use virtualization. There are some newer technologies such as SR-IOV 
which may mitigate these problems, but I don't particularly suggest 
straying that close to the bleeding edge on a presumably 
mission-critical system.


If you really want to be able to compartmentalize tasks running on this 
hardware, look at BSD jails, OpenVZ or Virtuozzo for an alternate 
non-virtualization approach which doesn't have as much overhead on 
I/O-heavy loads.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Hotplug patches for KVM

2009-09-24 Thread Zachary Amsden
Simplified the patch series a bit and fixed some bugs noticed by Marcelo.
Axed the hot-remove notifier (was not needed), fixed a locking bug by
using cpufreq_quick_get, fixed another bug in kvm_cpu_hotplug that was
filtering out online notifications when KVM was loaded but not in use.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH: kvm 1/5] Code motion. Separate timer initialization into an independent function.

2009-09-24 Thread Zachary Amsden
Signed-off-by: Zachary Amsden 
---
 arch/x86/kvm/x86.c |   23 +++
 1 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fedac9d..15d2ace 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3116,9 +3116,22 @@ static struct notifier_block 
kvmclock_cpufreq_notifier_block = {
 .notifier_call  = kvmclock_cpufreq_notifier
 };
 
+static void kvm_timer_init(void)
+{
+   int cpu;
+
+   for_each_possible_cpu(cpu)
+   per_cpu(cpu_tsc_khz, cpu) = tsc_khz;
+   if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
+   tsc_khz_ref = tsc_khz;
+   cpufreq_register_notifier(&kvmclock_cpufreq_notifier_block,
+ CPUFREQ_TRANSITION_NOTIFIER);
+   }
+}
+
 int kvm_arch_init(void *opaque)
 {
-   int r, cpu;
+   int r;
struct kvm_x86_ops *ops = (struct kvm_x86_ops *)opaque;
 
if (kvm_x86_ops) {
@@ -3150,13 +3163,7 @@ int kvm_arch_init(void *opaque)
kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
PT_DIRTY_MASK, PT64_NX_MASK, 0);
 
-   for_each_possible_cpu(cpu)
-   per_cpu(cpu_tsc_khz, cpu) = tsc_khz;
-   if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
-   tsc_khz_ref = tsc_khz;
-   cpufreq_register_notifier(&kvmclock_cpufreq_notifier_block,
- CPUFREQ_TRANSITION_NOTIFIER);
-   }
+   kvm_timer_init();
 
return 0;
 
-- 
1.6.4.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH: kvm 2/5] Kill the confusing tsc_ref_khz and ref_freq variables.

2009-09-24 Thread Zachary Amsden
They are globals, not clearly protected by any ordering or locking, and
vulnerable to various startup races.

Instead, for variable TSC machines, register the cpufreq notifier and get
the TSC frequency directly from the cpufreq machinery.  Not only is it
always right, it is also perfectly accurate, as no error-prone measurement
is required.  On such machines, also detect the frequency when bringing
a new CPU online; it isn't clear what frequency it will start with, and
it may not correspond to the reference.

Signed-off-by: Zachary Amsden 
---
 arch/x86/kvm/x86.c |   27 +--
 1 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 15d2ace..c18e2fc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3061,9 +3061,6 @@ static void bounce_off(void *info)
/* nothing */
 }
 
-static unsigned int  ref_freq;
-static unsigned long tsc_khz_ref;
-
 static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long 
val,
 void *data)
 {
@@ -3071,15 +3068,15 @@ static int kvmclock_cpufreq_notifier(struct 
notifier_block *nb, unsigned long va
struct kvm *kvm;
struct kvm_vcpu *vcpu;
int i, send_ipi = 0;
-
-   if (!ref_freq)
-   ref_freq = freq->old;
+   unsigned long old_khz;
 
if (val == CPUFREQ_PRECHANGE && freq->old > freq->new)
return 0;
if (val == CPUFREQ_POSTCHANGE && freq->old < freq->new)
return 0;
-   per_cpu(cpu_tsc_khz, freq->cpu) = cpufreq_scale(tsc_khz_ref, ref_freq, 
freq->new);
+   old_khz = per_cpu(cpu_tsc_khz, freq->cpu);
+   per_cpu(cpu_tsc_khz, freq->cpu) = cpufreq_scale(old_khz, freq->old,
+   freq->new);
 
spin_lock(&kvm_lock);
list_for_each_entry(kvm, &vm_list, vm_list) {
@@ -3120,12 +3117,18 @@ static void kvm_timer_init(void)
 {
int cpu;
 
-   for_each_possible_cpu(cpu)
-   per_cpu(cpu_tsc_khz, cpu) = tsc_khz;
if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
-   tsc_khz_ref = tsc_khz;
cpufreq_register_notifier(&kvmclock_cpufreq_notifier_block,
  CPUFREQ_TRANSITION_NOTIFIER);
+   for_each_online_cpu(cpu)
+   per_cpu(cpu_tsc_khz, cpu) = cpufreq_get(cpu);
+   } else {
+   for_each_possible_cpu(cpu)
+   per_cpu(cpu_tsc_khz, cpu) = tsc_khz;
+   }
+   for_each_possible_cpu(cpu) {
+   printk(KERN_DEBUG "kvm: cpu %d = %ld khz\n",
+   cpu, per_cpu(cpu_tsc_khz, cpu));
}
 }
 
@@ -4698,6 +4701,10 @@ int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
 
 int kvm_arch_hardware_enable(void *garbage)
 {
+   if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
+   int cpu = raw_smp_processor_id();
+   per_cpu(cpu_tsc_khz, cpu) = cpufreq_quick_get(cpu);
+   }
return kvm_x86_ops->hardware_enable(garbage);
 }
 
-- 
1.6.4.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH: kvm 4/5] Fix hotremove of CPUs for KVM.

2009-09-24 Thread Zachary Amsden
In the process of bringing down CPUs, the SVM / VMX structures associated
with those CPUs are not freed.  This may cause leaks when unloading and
reloading the KVM module, as only the structures associated with online
CPUs are cleaned up.  So, clean up all possible CPUs, not just online ones.

Signed-off-by: Zachary Amsden 
---
 arch/x86/kvm/svm.c |2 +-
 arch/x86/kvm/vmx.c |7 +--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 8f99d0c..13ca268 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -525,7 +525,7 @@ static __exit void svm_hardware_unsetup(void)
 {
int cpu;
 
-   for_each_online_cpu(cpu)
+   for_each_possible_cpu(cpu)
svm_cpu_uninit(cpu);
 
__free_pages(pfn_to_page(iopm_base >> PAGE_SHIFT), IOPM_ALLOC_ORDER);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b8a8428..603bde3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1350,8 +1350,11 @@ static void free_kvm_area(void)
 {
int cpu;
 
-   for_each_online_cpu(cpu)
-   free_vmcs(per_cpu(vmxarea, cpu));
+   for_each_possible_cpu(cpu)
+   if (per_cpu(vmxarea, cpu)) {
+   free_vmcs(per_cpu(vmxarea, cpu));
+   per_cpu(vmxarea, cpu) = NULL;
+   }
 }
 
 static __init int alloc_kvm_area(void)
-- 
1.6.4.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH: kvm 5/5] Math is hard; let's do some cooking.

2009-09-24 Thread Zachary Amsden
CPU frequency change callback provides new TSC frequency for us, and in the
same units (kHz), so there is no reason to do any math.

Signed-off-by: Zachary Amsden 
---
 arch/x86/kvm/x86.c |5 +
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 66c6bb9..60ae2c7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3070,15 +3070,12 @@ static int kvmclock_cpufreq_notifier(struct 
notifier_block *nb, unsigned long va
struct kvm *kvm;
struct kvm_vcpu *vcpu;
int i, send_ipi = 0;
-   unsigned long old_khz;
 
if (val == CPUFREQ_PRECHANGE && freq->old > freq->new)
return 0;
if (val == CPUFREQ_POSTCHANGE && freq->old < freq->new)
return 0;
-   old_khz = per_cpu(cpu_tsc_khz, freq->cpu);
-   per_cpu(cpu_tsc_khz, freq->cpu) = cpufreq_scale(old_khz, freq->old,
-   freq->new);
+   per_cpu(cpu_tsc_khz, freq->cpu) = freq->new;
 
spin_lock(&kvm_lock);
list_for_each_entry(kvm, &vm_list, vm_list) {
-- 
1.6.4.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH: kvm 3/5] Fix hotadd of CPUs for KVM.

2009-09-24 Thread Zachary Amsden
Both VMX and SVM require per-cpu memory allocation, which is done at module
init time, for only online cpus.  When bringing a new CPU online, we must
also allocate this structure.  The method chosen to implement this is to
make the CPU online notifier available via a call to the arch code.  This
allows memory allocation to be done smoothly, without any need to allocate
extra structures.

Note: CPU up notifiers may call KVM callback before calling cpufreq callbacks.
This would cause the CPU frequency not to be detected (and it is not always
clear on non-constant TSC platforms what the bringup TSC rate will be, so the
guess of using tsc_khz could be wrong).  So, we clear the rate to zero in such
a case and add logic to query it upon entry.

Signed-off-by: Zachary Amsden 
---
 arch/x86/include/asm/kvm_host.h |2 ++
 arch/x86/kvm/svm.c  |   15 +--
 arch/x86/kvm/vmx.c  |   17 +
 arch/x86/kvm/x86.c  |   13 +
 include/linux/kvm_host.h|6 ++
 virt/kvm/kvm_main.c |6 ++
 6 files changed, 49 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 299cc1b..b7dd14b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -459,6 +459,7 @@ struct descriptor_table {
 struct kvm_x86_ops {
int (*cpu_has_kvm_support)(void);  /* __init */
int (*disabled_by_bios)(void); /* __init */
+   int (*cpu_hotadd)(int cpu);
int (*hardware_enable)(void *dummy);
void (*hardware_disable)(void *dummy);
void (*check_processor_compatibility)(void *rtn);
@@ -791,6 +792,7 @@ asmlinkage void kvm_handle_fault_on_reboot(void);
_ASM_PTR " 666b, 667b \n\t" \
".popsection"
 
+#define KVM_ARCH_WANT_HOTPLUG_NOTIFIER
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
 int kvm_age_hva(struct kvm *kvm, unsigned long hva);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 9a4daca..8f99d0c 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -330,13 +330,13 @@ static int svm_hardware_enable(void *garbage)
return -EBUSY;
 
if (!has_svm()) {
-   printk(KERN_ERR "svm_cpu_init: err EOPNOTSUPP on %d\n", me);
+   printk(KERN_ERR "svm_hardware_enable: err EOPNOTSUPP on %d\n", 
me);
return -EINVAL;
}
svm_data = per_cpu(svm_data, me);
 
if (!svm_data) {
-   printk(KERN_ERR "svm_cpu_init: svm_data is NULL on %d\n",
+   printk(KERN_ERR "svm_hardware_enable: svm_data is NULL on %d\n",
   me);
return -EINVAL;
}
@@ -394,6 +394,16 @@ err_1:
 
 }
 
+static __cpuinit int svm_cpu_hotadd(int cpu)
+{
+   struct svm_cpu_data *svm_data = per_cpu(svm_data, cpu);
+
+   if (svm_data)
+   return 0;
+
+   return svm_cpu_init(cpu);
+}
+
 static void set_msr_interception(u32 *msrpm, unsigned msr,
 int read, int write)
 {
@@ -2858,6 +2868,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.hardware_setup = svm_hardware_setup,
.hardware_unsetup = svm_hardware_unsetup,
.check_processor_compatibility = svm_check_processor_compat,
+   .cpu_hotadd = svm_cpu_hotadd,
.hardware_enable = svm_hardware_enable,
.hardware_disable = svm_hardware_disable,
.cpu_has_accelerated_tpr = svm_cpu_has_accelerated_tpr,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3fe0d42..b8a8428 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1408,6 +1408,22 @@ static __exit void hardware_unsetup(void)
free_kvm_area();
 }
 
+static __cpuinit int vmx_cpu_hotadd(int cpu)
+{
+   struct vmcs *vmcs;
+
+   if (per_cpu(vmxarea, cpu))
+   return 0;
+
+   vmcs = alloc_vmcs_cpu(cpu);
+   if (!vmcs) 
+   return -ENOMEM;
+
+   per_cpu(vmxarea, cpu) = vmcs;
+
+   return 0;
+}
+
 static void fix_pmode_dataseg(int seg, struct kvm_save_segment *save)
 {
struct kvm_vmx_segment_field *sf = &kvm_vmx_segment_fields[seg];
@@ -3925,6 +3941,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.hardware_setup = hardware_setup,
.hardware_unsetup = hardware_unsetup,
.check_processor_compatibility = vmx_check_processor_compat,
+   .cpu_hotadd = vmx_cpu_hotadd,
.hardware_enable = hardware_enable,
.hardware_disable = hardware_disable,
.cpu_has_accelerated_tpr = report_flexpriority,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c18e2fc..66c6bb9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1326,6 +1326,8 @@ out:
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
kvm_x86_ops->vcpu_load(vcpu, cpu);
+   if (unlikely(per_cpu(cpu_tsc_khz, cpu) == 0))
+   per_cpu(cpu_tsc_khz, 

Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting

2009-09-24 Thread Zhai, Edwin

Avi,

hrtimer is used for the sleep in the attached patch, which has a similar perf 
gain to the previous one. Maybe we can check in this patch first, and turn 
to direct yield in the future, as you suggested.
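
For reference, the core of the idea is just this (a minimal sketch of
sleeping ~100us with an hrtimer on a PLE exit, not the attached patch
itself):

#include <linux/hrtimer.h>
#include <linux/ktime.h>
#include <linux/sched.h>

/* Sketch only: back off for ~100us instead of spinning, hoping the
 * lock-holding vcpu gets scheduled in the meantime. */
static void ple_sleep(void)
{
	ktime_t expires = ktime_set(0, 100 * NSEC_PER_USEC);

	set_current_state(TASK_INTERRUPTIBLE);
	schedule_hrtimeout(&expires, HRTIMER_MODE_REL);
}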


Thanks,
edwin

Avi Kivity wrote:

On 09/23/2009 05:04 PM, Zhai, Edwin wrote:
  

Avi,

This is the patch to enable PLE, which depends on the a small change 
of Linux scheduler

(see http://lkml.org/lkml/2009/5/20/447).

According to our discussion last time, one missing part is that if PLE
exit, pick up an unscheduled vcpu at random and schedule it. But
further investigation found that:
1. KVM is hard to know the schedule state for each vcpu.
2. Linux scheduler has no existed API can be used to pull a specific
task to this cpu, so we need more changes to the common scheduler.
So I prefer current simple way: just give up current cpu time.

If no objection, I'll try to push common scheduler change first to
linux.



We haven't sorted out what is the correct thing to do here.  I think we 
should go for a directed yield, but until we have it, you can use 
hrtimers to sleep for 100 microseconds and hope the holding vcpu will 
get scheduled.  Even if it doesn't, we're only wasting a few percent cpu 
time instead of spinning.


  


--
best rgds,
edwin

KVM:VMX: Add support for Pause-Loop Exiting

New NHM processors will support Pause-Loop Exiting by adding 2 VM-execution
control fields:
PLE_Gap - upper bound on the amount of time between two successive
 executions of PAUSE in a loop.
PLE_Window - upper bound on the amount of time a guest is allowed to execute
 in a PAUSE loop.

If the time between this execution of PAUSE and the previous one exceeds
PLE_Gap, the processor considers this PAUSE to belong to a new loop.
Otherwise, the processor determines the total execution time of this loop
(since the first PAUSE in this loop) and triggers a VM exit if the total
time exceeds PLE_Window.
* Refer SDM volume 3b section 21.6.13 & 22.1.3.

Pause-Loop Exiting can be used to detect Lock-Holder Preemption, where one VP
is scheduled out while holding a spinlock and other VPs spinning on the same
lock are scheduled in, wasting CPU time.

Our tests indicate that most spinlocks are held for less than 2^12 cycles.
Performance tests show that with 2X LP over-commitment we can get a +2% perf
improvement for kernel build (even more perf gain with more LPs).

Signed-off-by: Zhai Edwin 

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 272514c..2b49454 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -56,6 +56,7 @@
 #define SECONDARY_EXEC_ENABLE_VPID  0x0020
 #define SECONDARY_EXEC_WBINVD_EXITING  0x0040
 #define SECONDARY_EXEC_UNRESTRICTED_GUEST  0x0080
+#define SECONDARY_EXEC_PAUSE_LOOP_EXITING  0x0400
 
 
 #define PIN_BASED_EXT_INTR_MASK 0x0001
@@ -144,6 +145,8 @@ enum vmcs_field {
VM_ENTRY_INSTRUCTION_LEN= 0x401a,
TPR_THRESHOLD   = 0x401c,
SECONDARY_VM_EXEC_CONTROL   = 0x401e,
+   PLE_GAP = 0x4020,
+   PLE_WINDOW  = 0x4022,
VM_INSTRUCTION_ERROR= 0x4400,
VM_EXIT_REASON  = 0x4402,
VM_EXIT_INTR_INFO   = 0x4404,
@@ -248,6 +251,7 @@ enum vmcs_field {
 #define EXIT_REASON_MSR_READ31
 #define EXIT_REASON_MSR_WRITE   32
 #define EXIT_REASON_MWAIT_INSTRUCTION   36
+#define EXIT_REASON_PAUSE_INSTRUCTION   40
 #define EXIT_REASON_MCE_DURING_VMENTRY  41
 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43
 #define EXIT_REASON_APIC_ACCESS 44
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3fe0d42..21dbfe9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -61,6 +61,25 @@ module_param_named(unrestricted_guest,
 static int __read_mostly emulate_invalid_guest_state = 0;
 module_param(emulate_invalid_guest_state, bool, S_IRUGO);
 
+/*
+ * These 2 parameters are used to config the controls for Pause-Loop Exiting:
+ * ple_gap:upper bound on the amount of time between two successive
+ * executions of PAUSE in a loop. Also indicate if ple enabled.
+ * According to test, this time is usually small than 41 cycles.
+ * ple_window: upper bound on the amount of time a guest is allowed to execute
+ * in a PAUSE loop. Tests indicate that most spinlocks are held for
+ * less than 2^12 cycles
+ * Time is measured based on a counter that runs at the same rate as the TSC,
+ * refer SDM volume 3b section 21.6.13 & 22.1.3.
+ */
+#define KVM_VMX_DEFAULT_PLE_GAP41
+#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
+static int __read_mostly ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
+module_param(ple_gap, int, S_IRUGO);
+
+static int __read_mostly ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
+module_param(ple_window, int, S_IRUGO);
+
 struct vmcs {
u32 revision_id;
	u32 abort;

Re: sync guest calls made async on host - SQLite performance

2009-09-24 Thread Ian Woodstock
The Phoronix Test Suite is designed to test a (client) operating
system out of the box and it does a good job at that.
It's certainly valid to run PTS inside a virtual machine, but you're
going to need to tune the host, in this case Karmic.

The way you'd configure a client operating system is obviously different
from a server, for example selecting the right I/O elevator; in the case of
KVM you'll certainly see benefits there.
You'd also want to make sure that the guest OS has been optimally
installed - for example in a VMware environment you'd install VMware
tools - in KVM you'd ensure that you're using VirtIO in the guest for
the same reason.
Then you'd also look at optimizations like CPU pinning, use of huge pages, etc.

Just taking a generic installation of Karmic out of the box and
running VMs isn't going to give you real insight into the performance
of KVM. When deploying Linux as a virtualization host you should be
tuning it.
It would certainly be appropriate to have a spin of Karmic that was
designed to run as a virtualization host.

Maybe it would be more appropriate to actually run the test in a tuned
environment and present some results rather than ask a developer to
prove KVM is working.



> The test itself is a simple usage of SQLite.  It is stock KVM as
> available in 2.6.31 on Ubuntu Karmic.  So it would be the environment,
> not the test.
>
> So assuming that KVM upstream works as expected that would leave
> either 2.6.31 having an issue, or Ubuntu having an issue.
>
> Care to make an assertion on the KVM in 2.6.31?  Leaving only Ubuntu's
> installation.
>
> Can some KVM developers attempt to confirm that a 'correctly'
> configured KVM will not demonstrate this behaviour?
> http://www.phoronix-test-suite.com/ (or is already available in newer
> distributions of Fedora, openSUSE and Ubuntu.
>
> Regards... Matthew


On 9/24/09, Avi Kivity  wrote:
> On 09/24/2009 03:31 PM, Matthew Tippett wrote:
>> Thanks Avi,
>>
>> I am still trying to reconcile the your statement with the potential
>> data risks and the numbers observed.
>>
>> My read of your response is that the guest sees a consistent view -
>> the data is commited to the virtual disk device.  Does a synchronous
>> write within the guest trigger a synchronous write of the virtual
>> device within the host?
>>
>
> Yes.
>
>> I don't think offering SQLite users a 10 fold increase in performance
>> with no data integrity risks just by using KVM is a sane proposition.
>>
>
> It isn't, my guess is that the test setup is broken somehow.
>
> --
> Do not meddle in the internals of kernels, for they are subtle and quick to
> panic.
>
>

--
Sent from my mobile device
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: sync guest calls made async on host - SQLite performance

2009-09-24 Thread Matthew Tippett

Thanks for your response.

Remember that I am not raising questions about the relative performance 
of KVM guests.  The prevailing opinion would be that the performance of a 
guest ranges anywhere from considerably slower than native to around the 
same, depending on workload, guest tuning and configuration.


I am looking further into a particular anomalous result.  The result is 
that SQLite shows an _order of magnitude_ (10x) performance advantage 
when running under KVM.


My only rationalization of this is, as the subject suggests, that 
somewhere between the host's HDD and the guest's file layer something is 
turning synchronous calls into asynchronous ones and batching writes 
together.
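
(As an aside, one way to check this is a trivial standalone probe, run
inside the guest, that times small write()+fsync() pairs; on rotating
media an honest fsync costs milliseconds, so sub-millisecond averages
would point at the flush being absorbed below the guest.  My own sketch,
not part of the Phoronix suite:)

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	char buf[512];
	struct timespec t0, t1;
	int i, fd = open("fsync-probe.dat", O_CREAT | O_WRONLY | O_TRUNC, 0644);

	if (fd < 0)
		return 1;
	memset(buf, 0xab, sizeof(buf));
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < 100; i++) {
		/* each iteration should hit stable storage if fsync is honest */
		if (write(fd, buf, sizeof(buf)) != sizeof(buf) || fsync(fd))
			return 1;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	printf("avg per write+fsync: %.3f ms\n",
	       ((t1.tv_sec - t0.tv_sec) * 1e3 +
	        (t1.tv_nsec - t0.tv_nsec) / 1e6) / 100);
	close(fd);
	return 0;
}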


Intuitively, this suggests that running SQLite under at least a KVM 
virtualized environment puts the data at considerably higher risk than a 
non-virtualized environment in the case of system failure.  The performance 
gain is inconsequential in that case.


Focusing in particular on one response

> Maybe it would be more appropriate to actually run the test in a tuned
> environment and present some results rather than ask a developer to
> prove KVM is working.

I am not asking for comparative performance results; I am looking for 
more data that indicates whether the anomalous performance increase is a 
Ubuntu+KVM+2.6.31 thing, a KVM+2.6.31 thing, or a KVM thing.


I am looking to the KVM developers either to confirm that the behaviour 
is safe and expected, or to provide other data points indicating that 
it is a Ubuntu+2.6.31 or a 2.6.31 thing, by showing that in a properly 
configured KVM environment the performance sits in the expected 
"considerably slower to around the same speed" range.


Regards,

Matthew

 Original Message  
Subject: Re: sync guest calls made async on host - SQLite performance
From: Ian Woodstock 
To: kvm@vger.kernel.org
Date: 09/24/2009 10:11 PM


The Phoronix Test Suite is designed to test a (client) operating
system out of the box and it does a good job at that.
It's certainly valid to run PTS inside a virtual machine but you
you're going to need to tune the host, in this case Karmic.

The way you'd configure a client operating system to a server is
obviously different, for example selecting the right I/O elevator, in
the case of KVM you'll certainly see benefits there.
You'd also want to make sure that the guest OS has been optimally
installed - for exmaple in a VMware environment you'd install VMware
tools - in KVM you'd ensure that you're using VirtIO in the guest for
the same reason.
They you'd also look at optimizations like cpu pinning, use of huge pages, etc.

Just taking an generic installation of Karmic out of the box and
running VMs isn't going to give you real insight into the performance
of KVM. When deploying Linux as a virtualization host you should be
tuning it.
It would certainly be appropriate to have a spin of Karmic that was
designed to run as a virtualization host.

Maybe it would be more appropriate to actually run the test in a tuned
environment and present some results rather than ask a developer to
prove KVM is working.




The test itself is a simple usage of SQLite.  It is stock KVM as
available in 2.6.31 on Ubuntu Karmic.  So it would be the environment,
not the test.

So assuming that KVM upstream works as expected that would leave
either 2.6.31 having an issue, or Ubuntu having an issue.

Care to make an assertion on the KVM in 2.6.31?  Leaving only Ubuntu's
installation.

Can some KVM developers attempt to confirm that a 'correctly'
configured KVM will not demonstrate this behaviour?
http://www.phoronix-test-suite.com/ (or is already available in newer
distributions of Fedora, openSUSE and Ubuntu.

Regards... Matthew



On 9/24/09, Avi Kivity  wrote:

On 09/24/2009 03:31 PM, Matthew Tippett wrote:

Thanks Avi,

I am still trying to reconcile the your statement with the potential
data risks and the numbers observed.

My read of your response is that the guest sees a consistent view -
the data is commited to the virtual disk device.  Does a synchronous
write within the guest trigger a synchronous write of the virtual
device within the host?


Yes.


I don't think offering SQLite users a 10 fold increase in performance
with no data integrity risks just by using KVM is a sane proposition.


It isn't, my guess is that the test setup is broken somehow.

--
Do not meddle in the internals of kernels, for they are subtle and quick to
panic.




--
Sent from my mobile device
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM test: Make it possible to build older KVM trees

2009-09-24 Thread Lucas Meneghel Rodrigues
 * Made the build of older KVM trees possible
 * Now the test handles loading extra modules, improved module loading code
 * Other small cleanups

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/kvm/tests/build.py |  125 ++
 1 files changed, 72 insertions(+), 53 deletions(-)

diff --git a/client/tests/kvm/tests/build.py b/client/tests/kvm/tests/build.py
index 2b3c6b6..0e6ec40 100644
--- a/client/tests/kvm/tests/build.py
+++ b/client/tests/kvm/tests/build.py
@@ -66,41 +66,40 @@ def load_kvm_modules(module_dir=None, load_stock=False, 
extra_modules=None):
 kvm_vendor_module_path = None
 abort = False
 
+list_modules = ['kvm.ko', 'kvm-%s.ko' % vendor]
+if extra_modules:
+for extra_module in extra_modules:
+list_modules.append('%s.ko' % extra_module)
+
+list_module_paths = []
 for folder, subdirs, files in os.walk(module_dir):
-if "kvm.ko" in files:
-kvm_module_path = os.path.join(folder, "kvm.ko")
-kvm_vendor_module_path = os.path.join(folder, "kvm-%s.ko" %
-  vendor)
-if extra_modules:
-extra_module_list = []
-for module in extra_modules:
-extra_module_list.append(os.path.join(folder,
-  "%s.ko" % 
module))
-
-if not kvm_module_path:
-logging.error("Could not find kvm.ko inside the source dir")
-abort = True
-if not kvm_vendor_module_path:
-logging.error("Could not find kvm-%s.ko inside the source dir")
-abort = True
-
-if abort:
+for module in list_modules:
+if module in files:
+module_path = os.path.join(folder, module)
+list_module_paths.append(module_path)
+
+# We might need to arrange the modules in the correct order
+# to avoid module load problems
+list_modules_load = []
+for module in list_modules:
+for module_path in list_module_paths:
+if os.path.basename(module_path) == module:
+list_modules_load.append(module_path)
+
+if len(list_module_paths) != len(list_modules):
 logging.error("KVM modules not found. If you don't want to use the 
"
   "modules built by this test, make sure the option "
   "load_modules: 'no' is marked on the test control "
   "file.")
-raise error.TestFail("Could not find one KVM test modules on %s "
- "source dir" % module_dir)
+raise error.TestError("The modules %s were requested to be loaded, 
"
+  "but the only modules found were %s" %
+  (list_modules, list_module_paths))
 
-try:
-utils.system('insmod %s' % kvm_module_path)
-utils.system('insmod %s' % kvm_vendor_module_path)
-if extra_modules:
-for module in extra_module_list:
-utils.system('insmod %s' % module)
-
-except Exception, e:
-raise error.TestFail("Failed to load KVM modules: %s" % e)
+for module_path in list_modules_load:
+try:
+utils.system("insmod %s" % module_path)
+except Exception, e:
+raise error.TestFail("Failed to load KVM modules: %s" % e)
 
 if load_stock:
 logging.info("Loading current system KVM modules...")
@@ -166,18 +165,10 @@ class KojiInstaller:
 
 self.koji_cmd = params.get("koji_cmd", default_koji_cmd)
 
-if not os_dep.command("rpm"):
-raise error.TestError("RPM package manager not available. Are "
-  "you sure you are using an RPM based 
system?")
-if not os_dep.command("yum"):
-raise error.TestError("Yum package manager not available. Yum is "
-  "necessary to handle package install and "
-  "update.")
-if not os_dep.command(self.koji_cmd):
-raise error.TestError("Build server command %s not available. "
-  "You need to install the appropriate package 
"
-  "(usually koji and koji-tools)" %
-  self.koji_cmd)
+# Checking if all required dependencies are available
+os_dep.command("rpm")
+os_dep.command("yum")
+os_dep.command(self.koji_cmd)
 
 self.src_pkg = params.get("src_pkg", default_src_pkg)
 self.pkg_list = params.get("pkg_list", default_pkg_list)
@@ -377,18 +368,20 @@ class SourceDirInstaller:
 os.chdir(srcdir)
 self.src

Re: Binary Windows guest drivers are released

2009-09-24 Thread Vadim Rozenfeld

On 09/25/2009 12:07 AM, Dor Laor wrote:

On 09/24/2009 11:59 PM, Javier Guerra wrote:

On Thu, Sep 24, 2009 at 3:38 PM, Kenni Lund  wrote:

I've done some benchmarking with the drivers on Windows XP SP3 32bit,
but it seems like using the VirtIO drivers are slower than the IDE 
drivers in
(almost) all cases. Perhaps I've missed something or does the driver 
still

need optimization?


very interesting!

it seems that IDE wins on all the performance numbers, but VirtIO
always has lower CPU utilization.  i guess this is guest CPU %, right?
it would also be interesting to compare the CPU usage from the host
point of view, since a lower 'off-guest' CPU usage is very important
for scaling to many guests doing I/O.



Can you re-try it with setting the host ioscheduler to deadline?
Virtio backend (thread pool) is sensitive for it.

These drivers are mainly tweaked for win2k3 and win2k8. We once had 
queue depth settings in the driver, not sure we still have it, Vadim, 
can you add more info?


Also virtio should provide IO parallelism as opposed to IDE. I don't 
think your test test it. Virtio can provide more virtual drives than 
the max 4 that ide offers.


Dor
Windows XP 32-bit virtio block driver was created from our mainline code 
almost for fun.
Unlike our mainline code, which is STORPORT-oriented, it is a SCSIPORT 
mini-port driver.

SCSIPORT has never been known as an I/O-optimized storage stack, and the
SCSIPORT architecture is officially almost dead.
Windows XP 32-bit has no support for STORPORT or a virtual storage stack.
Developing a monolithic disk driver, which will sit right on top of the
virtio-blk PCI device, looks like the only way to have some kind of
high-throughput storage for Windows XP 32-bit.

Regards,
Vadim.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html