Re: Make QEmu HPET disabled by default for KVM?

2010-03-10 Thread Avi Kivity

On 03/11/2010 09:52 AM, Sheng Yang wrote:

I think we have already suffered enough timer issues due to this (e.g. I can't
boot up well on a 2.6.18 kernel)...


2.6.18 as guest or as host?


I have kept --no-hpet in my setup for
months...
   


Any details about the problems?  HPET is important to some guests.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ideas wiki for GSoC 2010

2010-03-10 Thread Avi Kivity

On 03/10/2010 11:30 PM, Luiz Capitulino wrote:


2. Do we have kvm-specific projects? Can they be part of the QEMU project
or do we need a different mentoring organization for it?
   


Something really interesting is kvm-assisted tcg.  I'm afraid it's a bit 
too complicated for GSoC.





Make QEmu HPET disabled by default for KVM?

2010-03-10 Thread Sheng Yang
I think we have already suffered enough timer issues due to this (e.g. I can't 
boot up well on a 2.6.18 kernel)... I have kept --no-hpet in my setup for 
months...

-- 
regards
Yang, Sheng


Re: [PATCH] x86/kvm: Show guest system/user cputime in cpustat

2010-03-10 Thread Avi Kivity

On 03/11/2010 09:46 AM, Sheng Yang wrote:

On Thursday 11 March 2010 15:36:01 Avi Kivity wrote:
   

On 03/11/2010 09:20 AM, Sheng Yang wrote:
 

Currently we can only get the cpu_stat of the whole guest as one value. This patch
enhances cpu_stat with more detail, adding guest_system and guest_user cpu
time statistics with a little overhead.

Signed-off-by: Sheng Yang
---

This draft patch is based on KVM upstream to show the idea. I will split it
into a more kernel-friendly version later.

The overhead is the cost of get_cpl() after each exit from the guest.
   

This can be very expensive in the nested virtualization case, so I
wouldn't like this to be in normal paths.  I think detailed profiling
like that can be left to 'perf kvm', which only has overhead if enabled
at runtime.
 

Yes, that's my concern too (though nested vmcs/vmcb reads are already too expensive;
they should be optimized...).


Any ideas on how to do that?  Perhaps use paravirt_ops to convert the 
vmread into a memory read?  We store the vmwrites in the vmcs anyway.



The other concern is that a perf-like mechanism would
bring a lot more overhead compared to this.
   


Ordinarily users won't care if time is spent in guest kernel mode or 
guest user mode.  They want to see which guest is imposing a load on a 
system.  I consider a user profiling a guest from the host an advanced 
and rarer use case, so it's okay to require tools and additional 
overhead for this.



For example you can put the code to note the cpl in a tracepoint which
is enabled dynamically.
 

Yanmin has already implemented "perf kvm" to support this. We are just arguing
about whether a normal top-like mechanism is necessary.

I am also considering making it a feature that can be disabled, but that seems to
make things complicated and results in uncertain cpustat output.
   


I'm not even sure that guest time was a good idea.



Re: [PATCH] x86/kvm: Show guest system/user cputime in cpustat

2010-03-10 Thread Sheng Yang
On Thursday 11 March 2010 15:36:01 Avi Kivity wrote:
> On 03/11/2010 09:20 AM, Sheng Yang wrote:
> > Currently we can only get the cpu_stat of the whole guest as one value. This patch
> > enhances cpu_stat with more detail, adding guest_system and guest_user cpu
> > time statistics with a little overhead.
> >
> > Signed-off-by: Sheng Yang
> > ---
> >
> > This draft patch is based on KVM upstream to show the idea. I will split it
> > into a more kernel-friendly version later.
> >
> > The overhead is the cost of get_cpl() after each exit from the guest.
> 
> This can be very expensive in the nested virtualization case, so I
> wouldn't like this to be in normal paths.  I think detailed profiling
> like that can be left to 'perf kvm', which only has overhead if enabled
> at runtime.

Yes, that's my concern too (though nested vmcs/vmcb reads are already too expensive; 
they should be optimized...). The other concern is that a perf-like mechanism would 
bring a lot more overhead compared to this. 

> For example you can put the code to note the cpl in a tracepoint which
> is enabled dynamically.

Yanmin has already implemented "perf kvm" to support this. We are just arguing 
about whether a normal top-like mechanism is necessary.

I am also considering making it a feature that can be disabled, but that seems to 
make things complicated and results in uncertain cpustat output.

-- 
regards
Yang, Sheng


Re: [PATCH] x86/kvm: Show guest system/user cputime in cpustat

2010-03-10 Thread Avi Kivity

On 03/11/2010 09:20 AM, Sheng Yang wrote:

Currently we can only get the cpu_stat of the whole guest as one value. This patch
enhances cpu_stat with more detail, adding guest_system and guest_user cpu time
statistics with a little overhead.

Signed-off-by: Sheng Yang
---

This draft patch is based on KVM upstream to show the idea. I will split it into
a more kernel-friendly version later.

The overhead is the cost of get_cpl() after each exit from the guest.
   


This can be very expensive in the nested virtualization case, so I 
wouldn't like this to be in normal paths.  I think detailed profiling 
like that can be left to 'perf kvm', which only has overhead if enabled 
at runtime.


For example you can put the code to note the cpl in a tracepoint which 
is enabled dynamically.
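Avi's suggestion — only pay for noting the CPL when the tracing path is switched on — can be sketched in userspace C. This is an illustrative model, not the kernel code; the names cpl_trace_enabled, trace_vcpu_exit and demo_read_cpl are invented for the example:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Per-mode exit counters, standing in for the proposed cpustat fields. */
static unsigned long guest_system_exits;
static unsigned long guest_user_exits;

/* Runtime switch modeling a dynamically enabled tracepoint. */
static atomic_bool cpl_trace_enabled;

/* Stand-in for kvm_x86_ops->get_cpl(); the real read can be an
 * expensive vmread, especially in the nested case. */
static int demo_cpl;
static int demo_read_cpl(void) { return demo_cpl; }

/* Called on each simulated VM exit: the CPL is only read when the
 * switch is on, so the disabled path costs one relaxed load. */
static void trace_vcpu_exit(int (*read_cpl)(void))
{
    if (!atomic_load_explicit(&cpl_trace_enabled, memory_order_relaxed))
        return;
    if (read_cpl() == 0)
        guest_system_exits++;
    else
        guest_user_exits++;
}
```

When the switch is off, exits skip the CPL read entirely, which is the property that makes the perf-style approach cheaper than the unconditional get_cpl() in the draft patch.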




[PATCH] x86/kvm: Show guest system/user cputime in cpustat

2010-03-10 Thread Sheng Yang
Currently we can only get the cpu_stat of the whole guest as one value. This patch
enhances cpu_stat with more detail, adding guest_system and guest_user cpu time
statistics with a little overhead.

Signed-off-by: Sheng Yang 
---

This draft patch is based on KVM upstream to show the idea. I will split it into
a more kernel-friendly version later.

The overhead is the cost of get_cpl() after each exit from the guest.

Comments are welcome!

 arch/x86/kvm/x86.c  |   10 ++
 fs/proc/stat.c  |   22 --
 include/linux/kernel_stat.h |2 ++
 include/linux/kvm_host.h|1 +
 include/linux/sched.h   |1 +
 kernel/sched.c  |6 ++
 6 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 703f637..c8ea6e1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4290,6 +4290,14 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
}
 }
 
+static void kvm_update_guest_mode(struct kvm_vcpu *vcpu)
+{
+   int cpl = kvm_x86_ops->get_cpl(vcpu);
+
+   if (cpl != 0)
+   current->flags |= PF_VCPU_USER;
+}
+
 static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 {
int r;
@@ -4377,6 +4385,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
trace_kvm_entry(vcpu->vcpu_id);
kvm_x86_ops->run(vcpu);
 
+   kvm_update_guest_mode(vcpu);
+
/*
 * If the guest has used debug registers, at least dr7
 * will be disabled while returning to the host.
diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index b9b7aad..d07640a 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -27,7 +27,7 @@ static int show_stat(struct seq_file *p, void *v)
int i, j;
unsigned long jif;
cputime64_t user, nice, system, idle, iowait, irq, softirq, steal;
-   cputime64_t guest, guest_nice;
+   cputime64_t guest, guest_nice, guest_user, guest_system;
u64 sum = 0;
u64 sum_softirq = 0;
unsigned int per_softirq_sums[NR_SOFTIRQS] = {0};
@@ -36,7 +36,7 @@ static int show_stat(struct seq_file *p, void *v)
 
user = nice = system = idle = iowait =
irq = softirq = steal = cputime64_zero;
-   guest = guest_nice = cputime64_zero;
+   guest = guest_nice = guest_user = guest_system = cputime64_zero;
getboottime(&boottime);
jif = boottime.tv_sec;
 
@@ -53,6 +53,10 @@ static int show_stat(struct seq_file *p, void *v)
guest = cputime64_add(guest, kstat_cpu(i).cpustat.guest);
guest_nice = cputime64_add(guest_nice,
kstat_cpu(i).cpustat.guest_nice);
+   guest_user = cputime64_add(guest_user,
+   kstat_cpu(i).cpustat.guest_user);
+   guest_system = cputime64_add(guest_system,
+   kstat_cpu(i).cpustat.guest_system);
for_each_irq_nr(j) {
sum += kstat_irqs_cpu(j, i);
}
@@ -68,7 +72,7 @@ static int show_stat(struct seq_file *p, void *v)
sum += arch_irq_stat();
 
seq_printf(p, "cpu  %llu %llu %llu %llu %llu %llu %llu %llu %llu "
-   "%llu\n",
+   "%llu %llu %llu\n",
(unsigned long long)cputime64_to_clock_t(user),
(unsigned long long)cputime64_to_clock_t(nice),
(unsigned long long)cputime64_to_clock_t(system),
@@ -78,7 +82,9 @@ static int show_stat(struct seq_file *p, void *v)
(unsigned long long)cputime64_to_clock_t(softirq),
(unsigned long long)cputime64_to_clock_t(steal),
(unsigned long long)cputime64_to_clock_t(guest),
-   (unsigned long long)cputime64_to_clock_t(guest_nice));
+   (unsigned long long)cputime64_to_clock_t(guest_nice),
+   (unsigned long long)cputime64_to_clock_t(guest_user),
+   (unsigned long long)cputime64_to_clock_t(guest_system));
for_each_online_cpu(i) {
 
/* Copy values here to work around gcc-2.95.3, gcc-2.96 */
@@ -93,9 +99,11 @@ static int show_stat(struct seq_file *p, void *v)
steal = kstat_cpu(i).cpustat.steal;
guest = kstat_cpu(i).cpustat.guest;
guest_nice = kstat_cpu(i).cpustat.guest_nice;
+   guest_user = kstat_cpu(i).cpustat.guest_user;
+   guest_system = kstat_cpu(i).cpustat.guest_system;
seq_printf(p,
"cpu%d %llu %llu %llu %llu %llu %llu %llu %llu %llu "
-   "%llu\n",
+   "%llu %llu %llu\n",
i,
(unsigned long long)cputime64_to_clock_t(user),
(unsigned long long)cputime64_to_clock_t(nice),
@@ -106,7 +114,9 @@ static int show_stat(struct seq_file *p, void *v)
(unsigned long long)cputime64_to_clock_t(softirq),
(unsigned 
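With the patch applied, the aggregate "cpu" line in /proc/stat grows from ten to twelve columns, with guest_user and guest_system appended last (per the seq_printf format above). A minimal userspace parser for that layout might look like this — a sketch only, and the sample values in the test are invented:

```c
#include <stdio.h>

/* All twelve columns of the patched "cpu" line, in emitted order. */
struct cpu_stat {
    unsigned long long user, nice, system, idle, iowait, irq, softirq,
                       steal, guest, guest_nice, guest_user, guest_system;
};

/* Parse one aggregate "cpu" line in the extended twelve-column format.
 * Returns 0 on success, -1 if the line has fewer fields. */
static int parse_cpu_line(const char *line, struct cpu_stat *st)
{
    int n = sscanf(line,
                   "cpu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
                   &st->user, &st->nice, &st->system, &st->idle, &st->iowait,
                   &st->irq, &st->softirq, &st->steal, &st->guest,
                   &st->guest_nice, &st->guest_user, &st->guest_system);
    return n == 12 ? 0 : -1;
}
```

Note that appending columns like this is what existing /proc/stat consumers have to cope with; tools that hard-code ten fields would ignore the new ones, which is part of the "uncertain cpustat output" concern raised in the thread.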

Re: [PATCH] Inter-VM shared memory PCI device

2010-03-10 Thread Avi Kivity

On 03/10/2010 04:04 PM, Arnd Bergmann wrote:

On Tuesday 09 March 2010, Cam Macdonell wrote:
   

We could make the masking in RAM, not in registers, like virtio, which would
require no exits.  It would then be part of the application-specific
protocol and out of the scope of this spec.

   

This kind of implementation would be possible now since with UIO it's
up to the application whether to mask interrupts or not and what
interrupts mean.  We could leave the interrupt mask register for those
who want that behaviour.  Arnd's idea would remove the need for the
Doorbell and Mask, but we will always need at least one MMIO register
to send whatever interrupts we do send.
 

You'd also have to be very careful if the notification is in RAM to
avoid races between one guest triggering an interrupt and another
guest clearing its interrupt mask.

A totally different option that avoids this whole problem would
be to separate the signalling from the shared memory, making the
PCI shared memory device a trivial device with a single memory BAR,
and using something a higher-level concept like a virtio based
serial line for the actual signalling.
   


That would be much slower.  The current scheme allows for an 
ioeventfd/irqfd short circuit which allows one guest to interrupt 
another without involving their qemus at all.




Re: [PATCH] Inter-VM shared memory PCI device

2010-03-10 Thread Avi Kivity

On 03/10/2010 06:36 PM, Cam Macdonell wrote:

On Wed, Mar 10, 2010 at 2:21 AM, Avi Kivity  wrote:
   

On 03/09/2010 08:34 PM, Cam Macdonell wrote:
 

On Tue, Mar 9, 2010 at 10:28 AM, Avi Kivitywrote:

   

On 03/09/2010 05:27 PM, Cam Macdonell wrote:
  Registers are used
for synchronization between guests sharing the same memory object when
interrupts are supported (this requires using the shared memory
server).



   

How does the driver detect whether interrupts are supported or not?


 

At the moment, the VM ID is set to -1 if interrupts aren't supported,
but that may not be the clearest way to do things.  With UIO is there
a way to detect if the interrupt pin is on?


   

I suggest not designing the device to uio.  Make it a good
guest-independent
device, and if uio doesn't fit it, change it.

Why not support interrupts unconditionally?  Is the device useful without
interrupts?

 

Currently my patch works with or without the shared memory server.  If
you give the parameter

-ivshmem 256,foo

then this will create (if necessary) and map /dev/shm/foo as the
shared region without interrupt support.  Some users of shared memory
are using it this way.

Going forward we can require the shared memory server and always have
interrupts enabled.

   

Can you explain how they synchronize?  Polling?  Using the network?  Using
it as a shared cache?

If it's a reasonable use case it makes sense to keep it.

 

Do you mean how they synchronize without interrupts?  One project I've
been contacted about uses the shared region directly for
synchronization for simulations running in different VMs that share
data in the memory region.  In my tests spinlocks in the shared region
work between guests.
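Synchronizing through the shared region as Cam describes can be sketched with C11 atomics. The struct layout and names below are invented for illustration — in practice both guests simply mmap the BAR and agree on a layout; here a plain function stands in for each guest:

```c
#include <stdatomic.h>

/* Layout both guests would map at the start of the shared BAR. */
struct shared_region {
    atomic_flag lock;   /* test-and-set spinlock, usable across guests */
    long counter;       /* example data protected by the lock */
};

static void shm_lock(struct shared_region *r)
{
    while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
        ;               /* spin until the holder releases */
}

static void shm_unlock(struct shared_region *r)
{
    atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

/* One simulated guest bumping the shared counter under the lock. */
static void guest_work(struct shared_region *r, int iterations)
{
    for (int i = 0; i < iterations; i++) {
        shm_lock(r);
        r->counter++;
        shm_unlock(r);
    }
}
```

This works between real guests for the same reason it works between processes: the cache-coherent shared mapping makes the atomic test-and-set globally ordered, with no device registers or exits involved.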
   


I see.


If we want to keep the serverless implementation, do we need to
support shm_open with -chardev somehow? Something like -chardev
shm,name=foo.  Right now my qdev implementation just passes the name
to the -device option and opens it.
   


I think using the file name is fine.


Another thing comes to mind - a shared memory ID, in case a guest has
multiple cards.
 

Sure, a number that can be passed on the command-line and stored in a register?
   


Yes.  NICs use the MAC address and storage uses the disk serial number, 
this is the same thing for shared memory.




Re: [PATCH 02/18] KVM: MMU: Make tdp_enabled a mmu-context parameter

2010-03-10 Thread Avi Kivity

On 03/10/2010 05:26 PM, Joerg Roedel wrote:

On Wed, Mar 10, 2010 at 04:53:29PM +0200, Avi Kivity wrote:
   

On 03/10/2010 04:44 PM, Joerg Roedel wrote:
 

On Mon, Mar 08, 2010 at 11:17:41AM +0200, Avi Kivity wrote:
   

On 03/03/2010 09:12 PM, Joerg Roedel wrote:
 

This patch changes the tdp_enabled flag from its global
meaning to the mmu-context. This is necessary for Nested SVM
with emulation of Nested Paging where we need an extra MMU
context to shadow the Nested Nested Page Table.


diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ec891a2..e7bef19 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -254,6 +254,7 @@ struct kvm_mmu {
int root_level;
int shadow_root_level;
union kvm_mmu_page_role base_role;
+   bool tdp_enabled;

   

This needs a different name, since the old one is still around.
Perhaps we could call it parent_mmu and make it a kvm_mmu pointer.
 

Hmm, how about renaming the global tdp_enabled variable to tdp_usable?
The global variable indicates if tdp is _usable_ and we can _enable_ it
for an mmu context.
   

I think of the global flags as host tdp, and the mmu as guest tdp
(but maybe this is wrong?).  If that makes sense, the naming should
reflect that.
 

The basic flow of the mmu state with npt-npt is:

1. As long as the L1 is running the arch.mmu context is in tdp
   mode and builds a direct-mapped page table.

2. When vmrun is emulated and the nested vmcb enables nested
   paging, arch.mmu is switched to a shadow-mmu mode which now
   shadows the l1 nested page table.
   So when the l2-guest runs with nested paging the
   arch.mmu.tdp_enabled variable on the host is false.

3. On a vmexit emulation the mmu is switched back to tdp
   handling state.

So the mmu.tdp_enabled parameter is about tdp being enabled for the
mmu context (so mmu.tdp_enabled means that we build a l1-direct-mapped
page table when true or shadow an l1 page table when false). That's why I
think the 'tdp_enabled' name makes sense in the mmu-context.
The global flag only shows if an mmu-context could be in tdp-state. So
tdp_usable may be a good name for it.

   


tdp is still used in both cases, so that name is confusing.  We could 
call it mmu.direct_map (and set it for real mode?) or mmu.virtual_map 
(with the opposite sense).  Or something.




Re: guest patched with pax causes "set_cr0: 0xffff88000[...] #GP, reserved bits 0x8004003?" flood on host

2010-03-10 Thread Avi Kivity

On 03/10/2010 06:17 PM, Antoine Martin wrote:

Hi,

I've updated my host kernel headers to 2.6.33, rebuilt glibc (and the 
base system), rebuilt kvm.
... and now I get hundreds of those in dmesg on the host when I start 
a guest kernel that worked fine before. (2.6.33 + pax patch v5)

 set_cr0: 0x88000ec29d58 #GP, reserved bits 0x80040033
 set_cr0: 0x88000f3cdb38 #GP, reserved bits 0x8004003b
 set_cr0: 0x88000f3dbc88 #GP, reserved bits 0x80040033
 set_cr0: 0x88000f83b958 #GP, reserved bits 0x8004003b


The guest is clearly confused.  Can you bisect kvm to find out what 
introduced this problem?



(hundreds of all 4)
And the VM just reboots shortly after starting init.
Funnily enough, I've got some VMs still running that kernel just fine! 
(as I started them before the headers+glibc+qemu-kvm rebuild)


Now, you might just say that I shouldn't use out of tree patches like 
pax, 


You can run anything you like in a guest.

but I just want to know one thing: should the guest kernel still be 
able to flood dmesg on the host like this?


No, these are debug messages.



Thanks
Antoine

PS: Avi, are you still interested in seeing if this rebuild fixes the 
pread/glibc bug?


I think we figured it out, but a confirmation would be nice.



Re: Shadow page table questions

2010-03-10 Thread Avi Kivity

On 03/11/2010 02:06 AM, Marek Olszewski wrote:
Thanks for the response.  I've looked through the code some more and 
think I have figured it out now.   I finally see that the root_hpa 
variable gets switched before entering the guest in mmu_alloc_roots, 
to correspond with the new cr3.  Thanks again.


Perhaps you can help me with one more question.  I was hoping to try 
out a certain change for a research project.   I would like to 
"privatize" kvm_mmu_page's and their spe's for each guest thread 
running in certain designated guest processes.  The goal is to give 
each thread its own shadow page table graphs that map the same guest 
logical addresses to guest physical addresses (with some changes to be 
introduced later).   Are there any assumptions that KVM makes that 
will break if I do something like this?  I understand that I will have 
to add some code throughout the mmu to make sure that these structures 
are synchronized when a guest thread makes a change, but I'm wondering 
if there is anything else.  Does the reverse mapping data structure 
you have assume that there is only one shadow page per guest page?


It doesn't, and there are often multiple shadow pages per guest page, 
distinguished by their sp->role field.




Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-10 Thread Avi Kivity

On 03/10/2010 07:41 PM, Paul Brook wrote:

You're much better off using a bulk-data transfer API that relaxes
coherency requirements.  IOW, shared memory doesn't make sense for TCG
   

Rather, tcg doesn't make sense for shared memory smp.  But we knew that
already.
 

I think TCG SMP is a hard but soluble problem, especially when you're
running guests used to coping with NUMA.
   


Do you mean by using a per-cpu tlb?  These kind of solutions are 
generally slow, but tcg's slowness may mask this out.



TCG interacting with third parties via shared memory is probably never going
to make sense.
   


The third party in this case is qemu.



[PATCH] KVM-test: SR-IOV: Fix a bug that wrongly check VFs count

2010-03-10 Thread Yolkfull Chow
The parameter 'devices_requested' is unrelated to the driver option 'max_vfs'
of 'igb'.

The 82576 NIC has two network interfaces, and each can be
virtualized into up to 7 virtual functions; therefore we multiply
the value of the driver option 'max_vfs' by two to get
the total number of VFs.

Signed-off-by: Yolkfull Chow 
---
 client/tests/kvm/kvm_utils.py |   19 +--
 1 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
index 4565dc1..1813ed1 100644
--- a/client/tests/kvm/kvm_utils.py
+++ b/client/tests/kvm/kvm_utils.py
@@ -1012,17 +1012,22 @@ class PciAssignable(object):
 """
 Get VFs count number according to lspci.
 """
+# FIXME: Need to think out a method of identifying which
+# 'virtual function' belongs to which physical card if
+# the host has more than one 82576 card. PCI_ID?
 cmd = "lspci | grep 'Virtual Function' | wc -l"
-# For each VF we'll see 2 prints of 'Virtual Function', so let's
-# divide the result per 2
-return int(commands.getoutput(cmd)) / 2
+return int(commands.getoutput(cmd))
 
 
 def check_vfs_count(self):
 """
 Check VFs count number according to the parameter driver_options.
 """
-return (self.get_vfs_count == self.devices_requested)
+# Network card 82576 has two network interfaces and each can be
+# virtualized into up to 7 virtual functions, therefore we multiply
+# the value of driver_option 'max_vfs' by two.
+expected_count = int((re.findall("(\d)", self.driver_option)[0])) * 2
+return (self.get_vfs_count() == expected_count)
 
 
 def is_binded_to_stub(self, full_id):
@@ -1054,15 +1059,17 @@ class PciAssignable(object):
 elif not self.check_vfs_count():
 os.system("modprobe -r %s" % self.driver)
 re_probe = True
+else:
+return True
 
 # Re-probe driver with proper number of VFs
 if re_probe:
 cmd = "modprobe %s %s" % (self.driver, self.driver_option)
+logging.info("Loading the driver '%s' with option '%s'" %
+   (self.driver, self.driver_option))
 s, o = commands.getstatusoutput(cmd)
 if s:
 return False
-if not self.check_vfs_count():
-return False
 return True
 
 
-- 
1.7.0.1



Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-10 Thread Nick Piggin
On Thu, Mar 11, 2010 at 03:10:47AM +, Jamie Lokier wrote:
> Paul Brook wrote:
> > > > In a cross environment that becomes extremely hairy.  For example the 
> > > > x86
> > > > architecture effectively has an implicit write barrier before every
> > > > store, and an implicit read barrier before every load.
> > > 
> > > Btw, x86 doesn't have any implicit barriers due to ordinary loads.
> > > Only stores and atomics have implicit barriers, afaik.
> > 
> > As of March 2009[1] Intel guarantees that memory reads occur in
> > order (they may only be reordered relative to writes). It appears
> > AMD do not provide this guarantee, which could be an interesting
> > problem for heterogeneous migration..
> 
> (Summary: At least on AMD64, it does too, for normal accesses to
> naturally aligned addresses in write-back cacheable memory.)
> 
> Oh, that's interesting.  Way back when I guess we knew writes were in
> order and it wasn't explicit that reads were, hence smp_rmb() using a
> locked atomic.
> 
> Here is a post by Nick Piggin from 2007 with links to Intel _and_ AMD
> documents asserting that reads to cacheable memory are in program order:
> 
> http://lkml.org/lkml/2007/9/28/212
> Subject: [patch] x86: improved memory barrier implementation
> 
> Links to documents:
> 
> http://developer.intel.com/products/processor/manuals/318147.pdf
> 
> http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf
> 
> The Intel link doesn't work any more, but the AMD one does.

It might have been merged into their development manual now.

 
> Nick asserts "both manufacturers are committed to in-order loads from
> cacheable memory for the x86 architecture".

At the time we did ask Intel and AMD engineers. We talked with Andy
Glew from Intel I believe, but I can't recall the AMD contact.
Linus was involved in the discussions as well. We tried to do the
right thing with this.

> I have just read the AMD document, and it is in there (but not
> completely obviously), in section 7.2.  The implicit load-load and
> store-store barriers are only guaranteed for "normal cacheable
> accesses on naturally aligned boundaries to WB [write-back cacheable]
> memory".  There are also implicit load-store barriers but not
> store-load.
> 
> Note that the document covers AMD64; it does not say anything about
> their (now old) 32-bit processors.

Hmm. Well it couldn't hurt to ask again. We've never seen any
problems yet, so I'm rather sure we're in the clear.

> 
> > [*] The most recent docs I have handy. Up to and including Core-2 Duo.
> 
> Are you sure the read ordering applies to 32-bit Intel and AMD CPUs too?
> 
> Many years ago, before 64-bit x86 existed, I recall discussions on
> LKML where it was made clear that stores were performed in program
> order.  If it were known at the time that loads were performed in
> program order on 32-bit x86s, I would have expected that to have been
> mentioned by someone.

The way it was explained to us by the Intel engineer is that they
had implemented only visibly in-order loads, but they wanted to keep
their options open in future so they did not want to commit to in
order loads as an ISA feature.

So when the whitepaper was released we got their blessing to
retroactively apply the rules to previous CPUs.



Re: Question on stopping KVM start at boot

2010-03-10 Thread Rodrigo Campos
On Thu, Mar 11, 2010 at 11:59:45AM +0800, sati...@pacific.net.hk wrote:
> Hi Dustin,
> 
> >Or you can edit the /etc/init.d/kvm or
> >/etc/init.d/qemu-kvm init script and add the "-b" option to the
> >modprobe calls in there.
> 
> $ cat /etc/init.d/qemu-kvm | grep modprobe
>   if modprobe "$module"
> 
> Where shall I add "-b" option?  Thanks

If I'm not wrong, there: change it to 'if modprobe -b "$module"'. That "if"
loads the module and checks the return code to give an error if it fails to load.





Thanks,
Rodrigo


Re: Question on stopping KVM start at boot

2010-03-10 Thread satimis

Quoting Bitman Zhou :

- snip -


Please what further command I have to run in order to activate the new
blacklist.conf ?


For Ubuntu, you can just use update-rc.d

sudo update-rc.d kvm disable

to disable kvm and

sudo update-rc.d kvm enable

to enable it again.



Tks for your advice.


B.R.
Stephen L




Re: Question on stopping KVM start at boot

2010-03-10 Thread satimis

Hi Dustin,

Thanks for your advice.

- snip -


Or you can edit the /etc/init.d/kvm or
/etc/init.d/qemu-kvm init script and add the "-b" option to the
modprobe calls in there.


$ cat /etc/init.d/kvm | grep modprobe
No printout


$ cat /etc/init.d/qemu-kvm | grep modprobe
if modprobe "$module"

.
if modprobe "$module"
then
log_end_msg 0
else
log_end_msg 1
exit 1
fi
;;



Where shall I add "-b" option?  Thanks


B.R.
Stephen



Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-10 Thread Jamie Lokier
Paul Brook wrote:
> > > In a cross environment that becomes extremely hairy.  For example the x86
> > > architecture effectively has an implicit write barrier before every
> > > store, and an implicit read barrier before every load.
> > 
> > Btw, x86 doesn't have any implicit barriers due to ordinary loads.
> > Only stores and atomics have implicit barriers, afaik.
> 
> As of March 2009[1] Intel guarantees that memory reads occur in
> order (they may only be reordered relative to writes). It appears
> AMD do not provide this guarantee, which could be an interesting
> problem for heterogeneous migration..

(Summary: At least on AMD64, it does too, for normal accesses to
naturally aligned addresses in write-back cacheable memory.)

Oh, that's interesting.  Way back when I guess we knew writes were in
order and it wasn't explicit that reads were, hence smp_rmb() using a
locked atomic.

Here is a post by Nick Piggin from 2007 with links to Intel _and_ AMD
documents asserting that reads to cacheable memory are in program order:

http://lkml.org/lkml/2007/9/28/212
Subject: [patch] x86: improved memory barrier implementation

Links to documents:

http://developer.intel.com/products/processor/manuals/318147.pdf

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf

The Intel link doesn't work any more, but the AMD one does.

Nick asserts "both manufacturers are committed to in-order loads from
cacheable memory for the x86 architecture".

I have just read the AMD document, and it is in there (but not
completely obviously), in section 7.2.  The implicit load-load and
store-store barriers are only guaranteed for "normal cacheable
accesses on naturally aligned boundaries to WB [write-back cacheable]
memory".  There are also implicit load-store barriers but not
store-load.
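The implicit load-load and store-store ordering is what lets x86's smp_rmb() be essentially free. In portable C the same publish/consume pattern is written with acquire/release operations, which on x86 compile down to the plain stores and loads discussed above but emit real barriers on weaker architectures. A minimal sketch (the names publish/consume are invented):

```c
#include <stdatomic.h>

static int payload;        /* ordinary data, no atomics needed */
static atomic_int ready;   /* publication flag */

/* Writer: store the data, then release-store the flag.  On x86 the
 * release store is a plain store; the architectural store-store
 * ordering is what makes that sufficient. */
static void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Reader: acquire-load the flag, then read the data.  On x86 the
 * acquire load is a plain load (in-order loads); elsewhere it may
 * emit the barrier that smp_rmb() abstracts. */
static int consume(void)
{
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                  /* wait for the flag */
    return payload;
}
```

Store-load is the one ordering x86 does not provide implicitly, which is why full-fence primitives (mfence, locked operations) are still required for patterns like Dekker's algorithm.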

Note that the document covers AMD64; it does not say anything about
their (now old) 32-bit processors.

> [*] The most recent docs I have handy. Up to and including Core-2 Duo.

Are you sure the read ordering applies to 32-bit Intel and AMD CPUs too?

Many years ago, before 64-bit x86 existed, I recall discussions on
LKML where it was made clear that stores were performed in program
order.  If it were known at the time that loads were performed in
program order on 32-bit x86s, I would have expected that to have been
mentioned by someone.

-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Shadow page table questions

2010-03-10 Thread Marek Olszewski
Thanks for the response.  I've looked through the code some more and 
think I have figured it out now.  I finally see that the root_hpa 
variable gets switched in mmu_alloc_roots before entering the guest, 
so that it corresponds to the new cr3.  Thanks again.


Perhaps you can help me with one more question.  I was hoping to try out 
a certain change for a research project.   I would like to "privatize" 
kvm_mmu_pages and their sptes for each guest thread running in certain 
designated guest processes.  The goal is to give each thread its own 
shadow page table graphs that map the same guest logical addresses to 
guest physical addresses (with some changes to be introduced later).   
Are there any assumptions that KVM makes that will break if I do 
something like this?  I understand that I will have to add some code 
throughout the mmu to make sure that these structures are synchronized 
when a guest thread makes a change, but I'm wondering if there is 
anything else.  Does the reverse mapping data structure you have assume 
that there is only one shadow page per guest page?


Thanks!

Marek


Avi Kivity wrote:

On 03/10/2010 06:57 AM, Marek Olszewski wrote:

Hello,

I was wondering if someone could point me to some documentation that 
explains the basic non-nested-paging shadow page table 
algorithm/strategy used by KVM.  I understand that KVM caches shadow 
page tables across context switches and that there is a reverse 
mapping and page protection to help zap shadow page tables when the 
guest page tables change.  However, I'm not entirely sure how the 
actual caching is done.  At first I assumed that KVM would change the 
host CR3 on every guest context switch such that it would point to a 
cached shadow page table for the currently running guest user thread, 
however, as far as I can tell, the host CR3 does not change so I'm a 
little lost.  If indeed it doesn't change the CR3, how does KVM solve 
the problem that arises when two processes in the guest OS share the 
same guest logical addresses?


The host cr3 does change, though not by using the 'mov cr3' 
instruction (that would cause the host to immediately switch to the 
guest address space, which would be bad).


See the calls to kvm_x86_ops->set_cr3().



I'm also interested in figuring out what KVM does when running with 
multiple virtual CPUs.  Looking at the code, I can see that each VCPU 
has its own root pointer to a shadow page table graph, but I have yet 
to figure out if this graph has node's shared between VCPUs, or 
whether they are all private.


Everything is shared.  If the guest is running with identical cr3s, 
kvm will load identical cr3s in guest mode.


An exception is when we use 32-bit pae mode.  In that case, the guest 
cr3s will be different (but guest PDPTRs will be identical).  Instead 
of dealing with the pae cr3, we deal with the four PDPTRs.






Re: [RFC][ PATCH 1/3] vhost-net: support multiple buffer heads in receiver

2010-03-10 Thread David Stevens
"Michael S. Tsirkin"  wrote on 03/07/2010 11:45:33 PM:

> > > > +static int skb_head_len(struct sk_buff_head *skq)
> > > > +{
> > > > +   struct sk_buff *head;
> > > > +
> > > > +   head = skb_peek(skq);
> > > > +   if (head)
> > > > +   return head->len;
> > > > +   return 0;
> > > > +}
> > > > +
> > > 
> > > This is done without locking, which I think can crash
> > > if skb is consumed after we peek it but before we read the
> > > length.
> > 
> > This thread is the only legitimate consumer, right? But
> > qemu has the file descriptor and I guess we shouldn't trust
> > that it won't give it to someone else; it'd break vhost, but
> > a crash would be inappropriate, of course. I'd like to avoid
> > the lock, but I have another idea here, so will investigate.

I looked at this some more and actually, I'm not sure I
see a crash here.
First, without qemu, or something it calls, being broken
as root, nothing else should ever read from the socket, in which
case the length will be exactly right for the next packet we
read. No problem.
But if by some error this skb is freed, we'll be reading a valid
memory address that no longer holds the length field of the next
packet we'll read.
If the length is negative or more than available in the
vring, we'll fail the buffer allocation, exit the loop, and
get the new head length of the receive queue the next time
around -- no problem.
If the length field is 0, we'll exit the loop even
though we have data to read, but will get that packet the
next time we get in here, again, with the right length.
No problem.
If the length field is big enough to allocate buffer
space for it, but smaller than the new head we have to read,
the recvmsg will fail with EMSGSIZE, drop the packet, exit
the loop and be back in business with the next packet. No
problem.
Otherwise, the packet will fit and be delivered.

I don't much like the notion of using skb->head when
it's garbage, but that can only happen if qemu is broken,
and I don't see a crash unless the skb is not only freed
but no longer a valid memory address for reading at all,
and all within the race window.
Since the code handles other failure cases (not
enough ring buffers or packet not fitting in the allocated
buffers), the actual length value only matters in the
sense that it prevents us from using buffers unnecessarily--
something that isn't all that relevant if it's hosed enough
to have unauthorized readers on the socket.

Is this case worth the performance penalty we'll no
doubt pay for either locking the socket or always allocating
for a max-sized packet? I'll experiment with a couple
solutions here, but unless I've missed something, we might
be better off just leaving it as-is.

+-DLS




Ideas wiki for GSoC 2010

2010-03-10 Thread Luiz Capitulino

 Hi there,

 Our wiki page for the Summer of Code 2010 is doing quite well:

http://wiki.qemu.org/Google_Summer_of_Code_2010

 Now the most important is:

1. Get mentors assigned to projects. Just put your name and email in the
   right field. It's ok and even desirable to have two mentors per project,
   but please remember that mentoring is serious work, more info here:

   http://code.google.com/p/google-summer-of-code/wiki/AdviceforMentors

   http://gsoc-wiki.osuosl.org/index.php/Main_Page

2. Do we have kvm-specific projects? Can they be part of the QEMU project
   or do we need a different mentoring organization for it?

3. Fill in the missing information for the suggested project (description,
   skill level, languages, etc)

 I will complete our application tomorrow or on Friday.

PS: I'm CC'ing everyone who suggested projects there, except one or two
I couldn't find the email address.



Re: Question on stopping KVM start at boot

2010-03-10 Thread Dustin Kirkland
On Wed, Mar 10, 2010 at 3:08 AM, Bitman Zhou  wrote:
>> I need to stop KVM starting at boot.
>>
>> I added following 2 lines at the bottom of /etc/modprobe.d/blacklist.conf
>> blacklist kvm
>> blacklist kvm-amd
>>
>>
>> Reboot PC
>>
>> It doesn't work.
>>
>> $ lsmod | grep kvm
>> kvm_amd                41556  0
>> kvm                   190648  1 kvm_amd
>>
>>
>> Please what further command I have to run in order to activate the new
>> blacklist.conf ?
>
> For Ubuntu, you can just use update-rc.d
>
> sudo update-rc.d kvm disable
>
> to disable kvm and
>
> sudo update-rc.d kvm enable
>
> to enable it again.

Hi there,

Unfortunately, the /etc/init.d/kvm and /etc/init.d/qemu-kvm init
scripts in previous Ubuntu releases (9.10 and earlier) didn't respect
the module blacklists.  I have corrected this in Ubuntu 10.04 by using
modprobe -b.  Thus for Ubuntu 10.04 forward, you should be able to use
the blacklist appropriately.

For other releases, you can disable the init script entirely as the
other responder wrote.  Or you can edit the /etc/init.d/kvm or
/etc/init.d/qemu-kvm init script and add the "-b" option to the
modprobe calls in there.


-- 
:-Dustin


Re: 32-bit qemu + 64-bit kvm be a problem?

2010-03-10 Thread Michael Tokarev
Neo Jia wrote:
> hi,
> 
> I have to keep a 32-bit qemu user space to work with a legacy
> library, but I still want to use a 64-bit host Linux to get the
> 64-bit advantages.
> 
> So I am wondering if I can use a 32-bit qemu + 64-bit kvm-kmod
> configuration. Are there any limitations or drawbacks to this
> configuration? I have already hit one: we can't assign the guest
> more than 2047 MB of physical memory.

I have used 32-bit kvm on a 64-bit kernel since day one.  Nothing of
interest since then, everything just works.

Recently (this week) I came across a situation where something does not
work in 64/32 mode.  Namely, it is linux aio (see the other thread on
kvm@ a few days back) - but this is not due to kvm, it is due to another
kernel subsystem (in this case aio) which lacks proper compat handlers.

Over time I have reported quite a few issues with this configuration -
here and there something did not work.  The number of problem spots is
decreasing (hopefully, anyway); at least I haven't seen issues recently,
except for this aio stuff.

But strictly speaking, I don't see any good reason to run 32-bit kvm on
a 64-bit kernel either.  Most distributions nowadays provide a set of
64-bit libraries for their 32-bit versions, so that limited support for
64-bit binaries is available.  This is mostly enough for kvm - without
X and SDL support it works just fine (using a vnc display).  Historically
I have a 32-bit userspace, but most guests now run with 64-bit kvm -
either because the guests switched to 64-bit kernels, because of the aio
thing, or just because it looks more efficient (less syscall/ioctl
32=>64 translation and the like).  kvm itself uses very little memory,
so there it makes almost no difference between 32 and 64 bits (64-bit
pointers are larger, so usually more memory is used).  Yes, it is
difficult to provide everything needed for SDL, but for our tasks SDL
windows aren't really necessary, and for testing, 32-bit mode works just
fine too...

/mjt


Re: [PATCH] [Autotest] [KVM-AUTOTEST] fix tap interface for parallel execution

2010-03-10 Thread Michael Goldish

- "Yogananth Subramanian"  wrote:

> Adds support to create guests with different MAC address during
> parallel
> execution of autotest, this is done by creating worker dicts with
> different "address_index"
> 
> Signed-off-by: Yogananth Subramanian 
> ---
>  client/tests/kvm/kvm_scheduler.py |3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/client/tests/kvm/kvm_scheduler.py
> b/client/tests/kvm/kvm_scheduler.py
> index 93b7df6..9000391 100644
> --- a/client/tests/kvm/kvm_scheduler.py
> +++ b/client/tests/kvm/kvm_scheduler.py
> @@ -33,7 +33,8 @@ class scheduler:
>  # "Personal" worker dicts contain modifications that are applied
>  # specifically to each worker.  For example, each worker must use a
>  # different environment file and a different MAC address pool.
> -self.worker_dicts = [{"env": "env%d" % i} for i in 
> range(num_workers)]
> +self.worker_dicts = [{"env": "env%d" % i, "address_index": i-1} 
> + for i in range(num_workers)]

This approach won't work in the general case -- some tests use more than 1 VM
and each VM requires a different address_index.

address_pools.cfg defines, for each host, a MAC address pool.
Every pool consists of several contiguous ranges, and looks something like this:

address_ranges = r1 r2 r3

address_range_base_mac_r1 = 52:54:00:12:34:56
address_range_size_r1 = 20

address_range_base_mac_r2 = 52:54:00:12:80:00
address_range_size_r2 = 20

... (more ranges here)

The pool itself needs to be split between the parallel workers, so that each
worker has its own completely separate pool.  In other words, the parameters
address_ranges, address_range_base_mac_* and address_range_size_* need to be
modified in 'self.worker_dicts', not address_index.

For example, if a pool has 2 ranges:

    r1        r2
  --------  --------

and there are 3 workers, the pool needs to be distributed evenly like this:

    r1    r2  r3    r4
  ------  --  ----  ----

so that worker A gets r1, worker B gets [r2, r3] and worker C gets r4.

This shouldn't be very hard.  I'll see if I can work on a patch that will do 
this.
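
One possible shape for that redistribution, sketched in Python (helper
names are mine, not the eventual patch): convert each range's base MAC
to an integer, compute per-worker quotas, and carve contiguous
sub-ranges:

```python
def mac_to_int(mac):
    return int(mac.replace(":", ""), 16)


def int_to_mac(n):
    return ":".join("%02x" % ((n >> (8 * i)) & 0xff) for i in reversed(range(6)))


def split_pool(ranges, num_workers):
    """Split a pool of (base_mac, size) ranges into num_workers sub-pools
    whose total sizes differ by at most one address."""
    segs = [[mac_to_int(base), size] for base, size in ranges]
    total = sum(size for _, size in segs)
    # Spread the remainder over the first (total % num_workers) workers.
    quotas = [total // num_workers + (1 if i < total % num_workers else 0)
              for i in range(num_workers)]
    pools = []
    base, size = segs.pop(0)
    for quota in quotas:
        pool = []
        while quota:
            take = min(quota, size)          # carve a contiguous sub-range
            pool.append((int_to_mac(base), take))
            base, size, quota = base + take, size - take, quota - take
            if size == 0 and segs:
                base, size = segs.pop(0)     # move to the next source range
        pools.append(pool)
    return pools
```

With the two 20-address ranges from the example above and 3 workers,
worker A gets a 14-address sub-range, worker B gets 6 + 7 addresses
straddling the range boundary, and worker C gets the remaining 13.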

>  
>  
>  def worker(self, index, run_test_func):
> -- 
> 1.6.0.4
> 


Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-10 Thread Paul Brook
> > You're much better off using a bulk-data transfer API that relaxes
> > coherency requirements.  IOW, shared memory doesn't make sense for TCG
> 
> Rather, tcg doesn't make sense for shared memory smp.  But we knew that
> already.

I think TCG SMP is a hard but soluble problem, especially when you're 
running guests used to coping with NUMA.

TCG interacting with third parties via shared memory is probably never going 
to make sense.

Paul


Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-10 Thread Avi Kivity

On 03/10/2010 07:13 PM, Anthony Liguori wrote:

On 03/10/2010 03:25 AM, Avi Kivity wrote:

On 03/09/2010 11:44 PM, Anthony Liguori wrote:
Ah yes.  For cross tcg environments you can map the memory using 
mmio callbacks instead of directly, and issue the appropriate 
barriers there.



Not good enough unless you want to severely restrict the use of 
shared memory within the guest.


For instance, it's going to be useful to assume that you atomic 
instructions remain atomic.  Crossing architecture boundaries here 
makes these assumptions invalid.  A barrier is not enough.


You could make the mmio callbacks flow to the shared memory server 
over the unix-domain socket, which would then serialize them.  Still 
need to keep RMWs as single operations.  When the host supports it, 
implement the operation locally (you can't render cmpxchg16b on i386, 
for example).


But now you have a requirement that the shmem server runs in lock-step 
with the guest VCPU which has to happen for every single word of data 
transferred.




Alternative implementation: expose a futex in a shared memory object and 
use that to serialize access.  Now all accesses happen from vcpu 
context, and as long as there is no contention, should be fast, at least 
relative to tcg.


You're much better off using a bulk-data transfer API that relaxes 
coherency requirements.  IOW, shared memory doesn't make sense for TCG 
:-)


Rather, tcg doesn't make sense for shared memory smp.  But we knew that 
already.


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-10 Thread Anthony Liguori

On 03/10/2010 03:25 AM, Avi Kivity wrote:

On 03/09/2010 11:44 PM, Anthony Liguori wrote:
Ah yes.  For cross tcg environments you can map the memory using 
mmio callbacks instead of directly, and issue the appropriate 
barriers there.



Not good enough unless you want to severely restrict the use of 
shared memory within the guest.


For instance, it's going to be useful to assume that you atomic 
instructions remain atomic.  Crossing architecture boundaries here 
makes these assumptions invalid.  A barrier is not enough.


You could make the mmio callbacks flow to the shared memory server 
over the unix-domain socket, which would then serialize them.  Still 
need to keep RMWs as single operations.  When the host supports it, 
implement the operation locally (you can't render cmpxchg16b on i386, 
for example).


But now you have a requirement that the shmem server runs in lock-step 
with the guest VCPU which has to happen for every single word of data 
transferred.


You're much better off using a bulk-data transfer API that relaxes 
coherency requirements.  IOW, shared memory doesn't make sense for TCG :-)


Regards,

Anthony Liguori



[Autotest] [KVM-AUTOTEST] Patch to fix tap interface support for parallel execution

2010-03-10 Thread Yogananth Subramanian

Hello Lucas,
I would like to submit a patch to fix support for tap interfaces in
kvm-autotest when executing it in parallel.  This is done by creating
workers with different "address_index" values, so that the MAC addresses
of the created guests will be unique.  The current implementation only
creates different env files, not different MAC addresses or
address_index values, so all the guests end up using the same
address_index and MAC address.

Thanks and Regards
Yogi




[PATCH] [Autotest] [KVM-AUTOTEST] fix tap interface for parallel execution

2010-03-10 Thread Yogananth Subramanian
Adds support to create guests with different MAC address during parallel
execution of autotest, this is done by creating worker dicts with
different "address_index"

Signed-off-by: Yogananth Subramanian 
---
 client/tests/kvm/kvm_scheduler.py |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/kvm_scheduler.py 
b/client/tests/kvm/kvm_scheduler.py
index 93b7df6..9000391 100644
--- a/client/tests/kvm/kvm_scheduler.py
+++ b/client/tests/kvm/kvm_scheduler.py
@@ -33,7 +33,8 @@ class scheduler:
 # "Personal" worker dicts contain modifications that are applied
 # specifically to each worker.  For example, each worker must use a
 # different environment file and a different MAC address pool.
-self.worker_dicts = [{"env": "env%d" % i} for i in range(num_workers)]
+self.worker_dicts = [{"env": "env%d" % i, "address_index": i-1} 
+ for i in range(num_workers)]
 
 
 def worker(self, index, run_test_func):
-- 
1.6.0.4



Re: [PATCH] Inter-VM shared memory PCI device

2010-03-10 Thread Cam Macdonell
On Wed, Mar 10, 2010 at 2:21 AM, Avi Kivity  wrote:
> On 03/09/2010 08:34 PM, Cam Macdonell wrote:
>>
>> On Tue, Mar 9, 2010 at 10:28 AM, Avi Kivity  wrote:
>>
>>>
>>> On 03/09/2010 05:27 PM, Cam Macdonell wrote:
>>>


>
>
>>
>>  Registers are used
>> for synchronization between guests sharing the same memory object when
>> interrupts are supported (this requires using the shared memory
>> server).
>>
>>
>>
>
> How does the driver detect whether interrupts are supported or not?
>
>

 At the moment, the VM ID is set to -1 if interrupts aren't supported,
 but that may not be the clearest way to do things.  With UIO is there
 a way to detect if the interrupt pin is on?


>>>
>>> I suggest not designing the device to uio.  Make it a good
>>> guest-independent
>>> device, and if uio doesn't fit it, change it.
>>>
>>> Why not support interrupts unconditionally?  Is the device useful without
>>> interrupts?
>>>
>>
>> Currently my patch works with or without the shared memory server.  If
>> you give the parameter
>>
>> -ivshmem 256,foo
>>
>> then this will create (if necessary) and map /dev/shm/foo as the
>> shared region without interrupt support.  Some users of shared memory
>> are using it this way.
>>
>> Going forward we can require the shared memory server and always have
>> interrupts enabled.
>>
>
> Can you explain how they synchronize?  Polling?  Using the network?  Using
> it as a shared cache?
>
> If it's a reasonable use case it makes sense to keep it.
>

Do you mean how they synchronize without interrupts?  One project I've
been contacted about uses the shared region directly for
synchronization for simulations running in different VMs that share
data in the memory region.  In my tests spinlocks in the shared region
work between guests.

If we want to keep the serverless implementation, do we need to
support shm_open with -chardev somehow? Something like -chardev
shm,name=foo.  Right now my qdev implementation just passes the name
to the -device option and opens it.

> Another thing comes to mind - a shared memory ID, in case a guest has
> multiple cards.

Sure, a number that can be passed on the command-line and stored in a register?

Cam


[ kvm-Bugs-2941282 ] Ubuntu 10.04 installer fails due to I/O errors with virtio

2010-03-10 Thread SourceForge.net
Bugs item #2941282, was opened at 2010-01-27 23:19
Message generated for change (Comment added) made by bschmidt
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2941282&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: Marti Raudsepp (intgr)
Assigned to: Nobody/Anonymous (nobody)
Summary: Ubuntu 10.04 installer fails due to I/O errors with virtio

Initial Comment:
I tried installing Ubuntu 10.04 and Fedora 12 in a KVM virtual machine using 
virtio, on a 8G raw file-backed disk. Both installers failed half-way due to 
I/O errors. So I tried reproducing it and managed to repeat it 6 times. The bug 
doesn't occur with IDE emulation. The bug happens fairly quickly with -smp 4 -- 
usually within 5 minutes -- but is much rarer with -smp 1.

Ubuntu installer has kernel 2.6.32-11-generic
Fedora 12 has kernel 2.6.31.5-127.fc12.x86_64
Host has kernel 2.6.32.6 (Arch Linux) and QEMU 0.12.2

When testing with -smp 1, it also produced a kernel oops from 
"block/blk-core.c:245". This line warns when the function is called with 
interrupts enabled:
void blk_start_queue(struct request_queue *q)
{
WARN_ON(!irqs_disabled());

queue_flag_clear(QUEUE_FLAG_STOPPED, q);
__blk_run_queue(q);
}


--- host machine ---

[ma...@newn]% qemu-kvm --version
QEMU PC emulator version 0.12.2 (qemu-kvm-0.12.2), Copyright (c) 2003-2008 
Fabrice Bellard
[ma...@newn]% ps aux |grep crash
root 16283 31.4  7.1 427020 289960 ?   Sl   22:44   8:37 
/usr/bin/qemu-kvm -S -M pc-0.11 -enable-kvm -m 256 -smp 1 -name 
ubuntu-crashtest -uuid 0d7d4f2d-5589-160b-1f1b-75d46e293a2c -chardev 
socket,id=monitor,path=/var/lib/libvirt/qemu/ubuntu-crashtest.monitor,server,nowait
 -monitor chardev:monitor -boot d -drive 
file=/store/iso/lucid-desktop-amd64.iso,if=ide,media=cdrom,index=2,format=raw 
-drive file=/store/virt/ubuntu-crashtest.img,if=virtio,index=0,format=raw -net 
nic,macaddr=52:54:00:45:e7:19,vlan=0,name=nic.0 -net 
tap,fd=43,vlan=0,name=tap.0 -serial none -parallel none -usb -usbdevice tablet 
-vnc 127.0.0.1:1 -k en-us -vga cirrus -soundhw es1370 -balloon virtio
marti17700  0.0  0.0   8360   968 pts/4S+   23:11   0:00 grep crash
[ma...@newn]% stat /store/virt/ubuntu-crashtest.img
  File: `/store/virt/ubuntu-crashtest.img'
  Size: 8589934592  Blocks: 5615368IO Block: 4096   regular file
Device: fe01h/65025dInode: 4718596 Links: 1
Access: (0600/-rw---)  Uid: (0/root)   Gid: (0/root)
Access: 2010-01-27 22:43:45.128113080 +0200
Modify: 2010-01-27 23:09:11.523577452 +0200
Change: 2010-01-27 23:09:11.523577452 +0200
[ma...@newn]% uname -a
Linux newn 2.6.32-ARCH #1 SMP PREEMPT Mon Jan 25 20:33:50 CET 2010 x86_64 AMD 
Phenom(tm) II X4 940 Processor AuthenticAMD GNU/Linux
[ma...@newn]% cat /proc/cpuinfo
processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 16
model   : 4
model name  : AMD Phenom(tm) II X4 940 Processor
stepping: 2
cpu MHz : 800.000
cache size  : 512 KB
physical id : 0
siblings: 4
core id : 0
cpu cores   : 4
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 5
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni 
monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a 
misalignsse 3dnowprefetch osvw ibs skinit wdt
bogomips: 6028.69
TLB size: 1024 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

*snip* three more CPU cores

--- Ubuntu guest VM ---

ubu...@ubuntu:/tmp$ uname -a
Linux ubuntu 2.6.32-11-generic #15-Ubuntu SMP Tue Jan 19 20:38:41 UTC 2010 
x86_64 GNU/Linux
ubu...@ubuntu:/tmp$ cat /sys/block/vda/stat
   7388948289  1661218   39497026765   947851  6284676  9459960
0   987890  9893220
ubu...@ubuntu:/tmp$ dmesg
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 2.6.32-11-generic (bui...@crested) (gcc version 
4.4.3 20100116 (prerelease) (Ubuntu 4.4.2-9ubuntu4) ) #15-Ubuntu SMP Tue Jan 19 
20:38:41 UTC 2010 (Ubuntu 2.6.32-11.15-generic)
[0.00] Command line: BOOT_IMAGE=/casper/vmlinuz 
file=/cdrom/preseed/ubuntu.seed boot=casper only-ubiquity 
initrd=/casper/initrd.lz quiet splash --
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00]   AMD AuthenticAMD
[0.00]   Centaur CentaurHauls
[0.00] BIOS-provide

guest patched with pax causes "set_cr0: 0xffff88000[...] #GP, reserved bits 0x8004003?" flood on host

2010-03-10 Thread Antoine Martin

Hi,

I've updated my host kernel headers to 2.6.33, rebuilt glibc (and the 
base system), rebuilt kvm.
... and now I get hundreds of those in dmesg on the host when I start a 
guest kernel that worked fine before. (2.6.33 + pax patch v5)

 set_cr0: 0x88000ec29d58 #GP, reserved bits 0x80040033
 set_cr0: 0x88000f3cdb38 #GP, reserved bits 0x8004003b
 set_cr0: 0x88000f3dbc88 #GP, reserved bits 0x80040033
 set_cr0: 0x88000f83b958 #GP, reserved bits 0x8004003b
(hundreds of all 4)
And the VM just reboots shortly after starting init.
Funnily enough, I've got some VMs still running that kernel just fine! 
(as I started them before the headers+glibc+qemu-kvm rebuild)


Now, you might just say that I shouldn't use out of tree patches like 
pax, but I just want to know one thing: should the guest kernel still be 
able to flood dmesg on the host like this?


Thanks
Antoine

PS: Avi, are you still interested in seeing if this rebuild fixes the 
pread/glibc bug?



Re: how to tweak kernel to get the best out of kvm?

2010-03-10 Thread Avi Kivity

On 03/10/2010 05:57 PM, Javier Guerra Giraldez wrote:

On Wed, Mar 10, 2010 at 8:15 AM, Avi Kivity  wrote:
   

15 guests should fit comfortably, more with ksm running if the workloads are
similar, or if you use ballooning.
 

is there any simple way to get some stats to see how is ksm doing?
   


See /sys/kernel/mm/ksm
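
A rough way to turn those counters into a savings estimate (sample
numbers are made up; interpretation follows the kernel's KSM docs,
where pages_shared is the number of deduplicated pages KSM keeps and
pages_sharing is how many additional page references those cover, i.e.
roughly how many pages were saved):

```python
def ksm_summary(stats, page_size=4096):
    # saved bytes ~= one page for every extra reference to a shared page
    saved_bytes = stats["pages_sharing"] * page_size
    return {
        "saved_mib": saved_bytes / 2**20,
        # average number of duplicates merged into each shared page
        "avg_sharers": stats["pages_sharing"] / max(stats["pages_shared"], 1),
    }


print(ksm_summary({"pages_shared": 1000, "pages_sharing": 8000}))
# -> {'saved_mib': 31.25, 'avg_sharers': 8.0}
```

On a live host the inputs come from reading the files under
/sys/kernel/mm/ksm.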

--
error compiling committee.c: too many arguments to function



Re: how to tweak kernel to get the best out of kvm?

2010-03-10 Thread Javier Guerra Giraldez
On Wed, Mar 10, 2010 at 8:15 AM, Avi Kivity  wrote:
> 15 guests should fit comfortably, more with ksm running if the workloads are
> similar, or if you use ballooning.

is there any simple way to get some stats to see how is ksm doing?

-- 
Javier


Re: [PATCH 02/18] KVM: MMU: Make tdp_enabled a mmu-context parameter

2010-03-10 Thread Joerg Roedel
On Wed, Mar 10, 2010 at 04:53:29PM +0200, Avi Kivity wrote:
> On 03/10/2010 04:44 PM, Joerg Roedel wrote:
> >On Mon, Mar 08, 2010 at 11:17:41AM +0200, Avi Kivity wrote:
> >>On 03/03/2010 09:12 PM, Joerg Roedel wrote:
> >>>This patch changes the tdp_enabled flag from its global
> >>>meaning to the mmu-context. This is necessary for Nested SVM
> >>>with emulation of Nested Paging where we need an extra MMU
> >>>context to shadow the Nested Nested Page Table.
> >>>
> >>>
> >>>diff --git a/arch/x86/include/asm/kvm_host.h 
> >>>b/arch/x86/include/asm/kvm_host.h
> >>>index ec891a2..e7bef19 100644
> >>>--- a/arch/x86/include/asm/kvm_host.h
> >>>+++ b/arch/x86/include/asm/kvm_host.h
> >>>@@ -254,6 +254,7 @@ struct kvm_mmu {
> >>>   int root_level;
> >>>   int shadow_root_level;
> >>>   union kvm_mmu_page_role base_role;
> >>>+  bool tdp_enabled;
> >>>
> >>This needs a different name, since the old one is still around.
> >>Perhaps we could call it parent_mmu and make it a kvm_mmu pointer.
> >Hmm, how about renaming the global tdp_enabled variable to tdp_usable?
> >The global variable indicates if tdp is _usable_ and we can _enable_ it
> >for a mmu context.
> 
> I think of the global flags as host tdp, and the mmu as guest tdp
> (but maybe this is wrong?).  If that makes sense, the naming should
> reflect that.

The basic flow of the mmu state with npt-npt is:

1. As long as the L1 is running the arch.mmu context is in tdp
   mode and builds a direct-mapped page table.

2. When vmrun is emulated and the nested vmcb enables nested
   paging, arch.mmu is switched to a shadow-mmu mode which now
   shadows the l1 nested page table.
   So when the l2-guest runs with nested paging the
   arch.mmu.tdp_enabled variable on the host is false.

3. On a vmexit emulation the mmu is switched back to tdp
   handling state.
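
The three steps can be restated as a toy state model (purely
illustrative Python, not KVM code):

```python
class ToyMmu:
    """Toy model of the arch.mmu tdp_enabled transitions described above."""

    def __init__(self):
        self.tdp_enabled = True        # step 1: L1 running, direct-mapped tdp

    def emulate_vmrun(self, nested_vmcb_np):
        if nested_vmcb_np:
            self.tdp_enabled = False   # step 2: shadow L1's nested page table

    def emulate_vmexit(self):
        self.tdp_enabled = True        # step 3: back to tdp handling


mmu = ToyMmu()
mmu.emulate_vmrun(nested_vmcb_np=True)
assert not mmu.tdp_enabled             # while L2 runs, the host shadows
mmu.emulate_vmexit()
assert mmu.tdp_enabled
```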

So the mmu.tdp_enabled parameter is about tdp being enabled for the
mmu context (so mmu.tdp_enabled means that we build a l1-direct-mapped
page table when true or shadow a l1-page-table when false). That's why I
think the 'tdp_enabled' name makes sense in the mmu-context.
The global flag only shows if an mmu-context could be in tdp-state. So
tdp_usable may be a good name for it.

Joerg


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] QMP: Sync with upstream event changes

2010-03-10 Thread Luiz Capitulino

This commit contains the following QMP event changes to sync
kvm_main_loop() with upstream:

- Add the SHUTDOWN event (it's currently missing here)
- Drop the RESET event (it's now emitted in qemu_system_reset())
- Drop the DEBUG event (it has been dropped upstream)

Signed-off-by: Luiz Capitulino 
---
 qemu-kvm.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index e417f21..2233a37 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2038,6 +2038,7 @@ int kvm_main_loop(void)
 while (1) {
 main_loop_wait(1000);
 if (qemu_shutdown_requested()) {
+monitor_protocol_event(QEVENT_SHUTDOWN, NULL);
 if (qemu_no_shutdown()) {
 vm_stop(0);
 } else
@@ -2046,10 +2047,8 @@ int kvm_main_loop(void)
 monitor_protocol_event(QEVENT_POWERDOWN, NULL);
 qemu_irq_raise(qemu_system_powerdown);
 } else if (qemu_reset_requested()) {
-monitor_protocol_event(QEVENT_RESET, NULL);
 qemu_kvm_system_reset();
 } else if (kvm_debug_cpu_requested) {
-monitor_protocol_event(QEVENT_DEBUG, NULL);
 gdb_set_stop_cpu(kvm_debug_cpu_requested);
 vm_stop(EXCP_DEBUG);
 kvm_debug_cpu_requested = NULL;
-- 
1.7.0.2.182.ge007

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/18] KVM: MMU: Make tdp_enabled a mmu-context parameter

2010-03-10 Thread Avi Kivity

On 03/10/2010 04:44 PM, Joerg Roedel wrote:

On Mon, Mar 08, 2010 at 11:17:41AM +0200, Avi Kivity wrote:
   

On 03/03/2010 09:12 PM, Joerg Roedel wrote:
 

This patch changes the tdp_enabled flag from its global
meaning to the mmu-context. This is necessary for Nested SVM
with emulation of Nested Paging where we need an extra MMU
context to shadow the Nested Nested Page Table.


diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ec891a2..e7bef19 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -254,6 +254,7 @@ struct kvm_mmu {
int root_level;
int shadow_root_level;
union kvm_mmu_page_role base_role;
+   bool tdp_enabled;

   

This needs a different name, since the old one is still around.
Perhaps we could call it parent_mmu and make it a kvm_mmu pointer.
 

Hmm, how about renaming the global tdp_enabled variable to tdp_usable?
The global variable indicates if tdp is _usable_ and we can _enable_ it
for a mmu context.
   


I think of the global flags as host tdp, and the mmu as guest tdp (but 
maybe this is wrong?).  If that makes sense, the naming should reflect that.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/5] KVM: MMU: Reinstate pte prefetch on invlpg

2010-03-10 Thread Avi Kivity
Commit fb341f57 removed the pte prefetch on guest invlpg, citing guest races.
However, the SDM is adamant that prefetch is allowed:

  "The processor may create entries in paging-structure caches for
   translations required for prefetches and for accesses that are a
   result of speculative execution that would never actually occur
   in the executed code path."

And, in fact, there was a race in the prefetch code: we picked up the pte
without the mmu lock held, so an older invlpg could install the pte over
a newer invlpg.

Reinstate the prefetch logic, but this time note whether another invlpg has
executed using a counter.  If a race occurred, do not install the pte.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/mmu.c  |   37 +++--
 arch/x86/kvm/paging_tmpl.h  |   15 +++
 3 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ec891a2..fb2afda 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -389,6 +389,7 @@ struct kvm_arch {
unsigned int n_free_mmu_pages;
unsigned int n_requested_mmu_pages;
unsigned int n_alloc_mmu_pages;
+   atomic_t invlpg_counter;
struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
/*
 * Hash table of struct kvm_mmu_page.
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 086025e..e821609 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2611,20 +2611,11 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
int flooded = 0;
int npte;
int r;
+   int invlpg_counter;
 
pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
 
-   switch (bytes) {
-   case 4:
-   gentry = *(const u32 *)new;
-   break;
-   case 8:
-   gentry = *(const u64 *)new;
-   break;
-   default:
-   gentry = 0;
-   break;
-   }
+   invlpg_counter = atomic_read(&vcpu->kvm->arch.invlpg_counter);
 
/*
 * Assume that the pte write on a page table of the same type
@@ -2632,16 +2623,34 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 * (might be false while changing modes).  Note it is verified later
 * by update_pte().
 */
-   if (is_pae(vcpu) && bytes == 4) {
+   if ((is_pae(vcpu) && bytes == 4) || !new) {
/* Handle a 32-bit guest writing two halves of a 64-bit gpte */
-   gpa &= ~(gpa_t)7;
-   r = kvm_read_guest(vcpu->kvm, gpa, &gentry, 8);
+   if (is_pae(vcpu)) {
+   gpa &= ~(gpa_t)7;
+   bytes = 8;
+   }
+   r = kvm_read_guest(vcpu->kvm, gpa, &gentry, min(bytes, 8));
if (r)
gentry = 0;
+   new = (const u8 *)&gentry;
+   }
+
+   switch (bytes) {
+   case 4:
+   gentry = *(const u32 *)new;
+   break;
+   case 8:
+   gentry = *(const u64 *)new;
+   break;
+   default:
+   gentry = 0;
+   break;
}
 
mmu_guess_page_from_pte_write(vcpu, gpa, gentry);
spin_lock(&vcpu->kvm->mmu_lock);
+   if (atomic_read(&vcpu->kvm->arch.invlpg_counter) != invlpg_counter)
+   gentry = 0;
kvm_mmu_access_page(vcpu, gfn);
kvm_mmu_free_some_pages(vcpu);
++vcpu->kvm->stat.mmu_pte_write;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 4b37e1a..067797a 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -463,6 +463,7 @@ out_unlock:
 static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 {
struct kvm_shadow_walk_iterator iterator;
+   gpa_t pte_gpa = -1;
int level;
u64 *sptep;
int need_flush = 0;
@@ -476,6 +477,10 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
if (level == PT_PAGE_TABLE_LEVEL  ||
((level == PT_DIRECTORY_LEVEL && is_large_pte(*sptep))) ||
	    ((level == PT_PDPE_LEVEL && is_large_pte(*sptep)))) {
+   struct kvm_mmu_page *sp = page_header(__pa(sptep));
+
+   pte_gpa = (sp->gfn << PAGE_SHIFT);
+   pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t);
 
if (is_shadow_present_pte(*sptep)) {
rmap_remove(vcpu->kvm, sptep);
@@ -493,7 +498,17 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 
if (need_flush)
kvm_flush_remote_tlbs(vcpu->kvm);
+
+   atomic_inc(&vcpu->kvm->arch.invlpg_counter);
+
spin_unlock(&vcpu->kvm->mmu_lock);
+
+   if (pte_gpa == -1)
+   return;
+
+   if (mmu_topup_memory_caches(vcpu)

[PATCH 4/5] KVM: MMU: Do not instantiate nontrapping spte on unsync page

2010-03-10 Thread Avi Kivity
The update_pte() path currently uses a nontrapping spte when a nonpresent
(or nonaccessed) gpte is written.  This is fine since at present it is only
used on sync pages.  However, on an unsync page this will cause an endless
fault loop as the guest is under no obligation to invlpg a gpte that
transitions from nonpresent to present.

Needed for the next patch which reinstates update_pte() on invlpg.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/paging_tmpl.h |   10 --
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 81eab9a..4b37e1a 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -258,11 +258,17 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
pt_element_t gpte;
unsigned pte_access;
pfn_t pfn;
+   u64 new_spte;
 
gpte = *(const pt_element_t *)pte;
if (~gpte & (PT_PRESENT_MASK | PT_ACCESSED_MASK)) {
-   if (!is_present_gpte(gpte))
-   __set_spte(spte, shadow_notrap_nonpresent_pte);
+   if (!is_present_gpte(gpte)) {
+   if (page->unsync)
+   new_spte = shadow_trap_nonpresent_pte;
+   else
+   new_spte = shadow_notrap_nonpresent_pte;
+   __set_spte(spte, new_spte);
+   }
return;
}
pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
-- 
1.7.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/5] KVM: Make locked operations truly atomic

2010-03-10 Thread Avi Kivity
Once upon a time, locked operations were emulated while holding the mmu mutex.
Since mmu pages were write protected, it was safe to emulate the writes in
a non-atomic manner, since there could be no other writer, either in the
guest or in the kernel.

These days emulation takes place without holding the mmu spinlock, so the
write could be preempted by an unshadowing event, which exposes the page
to writes by the guest.  This may cause corruption of guest page tables.

Fix by using an atomic cmpxchg for these operations.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/x86.c |   69 
 1 files changed, 48 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3753c11..8558a1c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3310,41 +3310,68 @@ int emulator_write_emulated(unsigned long addr,
 }
 EXPORT_SYMBOL_GPL(emulator_write_emulated);
 
+#define CMPXCHG_TYPE(t, ptr, old, new) \
+   (cmpxchg((t *)(ptr), *(t *)(old), *(t *)(new)) == *(t *)(old))
+
+#ifdef CONFIG_X86_64
+#  define CMPXCHG64(ptr, old, new) CMPXCHG_TYPE(u64, ptr, old, new)
+#else
+#  define CMPXCHG64(ptr, old, new) \
+   (cmpxchg64((u64 *)(ptr), *(u64 *)(old), *(u64 *)(new)) == *(u64 *)(old))
+#endif
+
 static int emulator_cmpxchg_emulated(unsigned long addr,
 const void *old,
 const void *new,
 unsigned int bytes,
 struct kvm_vcpu *vcpu)
 {
-   printk_once(KERN_WARNING "kvm: emulating exchange as write\n");
-#ifndef CONFIG_X86_64
-   /* guests cmpxchg8b have to be emulated atomically */
-   if (bytes == 8) {
-   gpa_t gpa;
-   struct page *page;
-   char *kaddr;
-   u64 val;
+   gpa_t gpa;
+   struct page *page;
+   char *kaddr;
+   bool exchanged;
 
-   gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, NULL);
+   /* guests cmpxchg8b have to be emulated atomically */
+   if (bytes > 8 || (bytes & (bytes - 1)))
+   goto emul_write;
 
-   if (gpa == UNMAPPED_GVA ||
-  (gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
-   goto emul_write;
+   gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, NULL);
 
-   if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
-   goto emul_write;
+   if (gpa == UNMAPPED_GVA ||
+   (gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
+   goto emul_write;
 
-   val = *(u64 *)new;
+   if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
+   goto emul_write;
 
-   page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT);
+   page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT);
 
-   kaddr = kmap_atomic(page, KM_USER0);
-   set_64bit((u64 *)(kaddr + offset_in_page(gpa)), val);
-   kunmap_atomic(kaddr, KM_USER0);
-   kvm_release_page_dirty(page);
+   kaddr = kmap_atomic(page, KM_USER0);
+   kaddr += offset_in_page(gpa);
+   switch (bytes) {
+   case 1:
+   exchanged = CMPXCHG_TYPE(u8, kaddr, old, new);
+   break;
+   case 2:
+   exchanged = CMPXCHG_TYPE(u16, kaddr, old, new);
+   break;
+   case 4:
+   exchanged = CMPXCHG_TYPE(u32, kaddr, old, new);
+   break;
+   case 8:
+   exchanged = CMPXCHG64(kaddr, old, new);
+   break;
+   default:
+   BUG();
}
+   kunmap_atomic(kaddr, KM_USER0);
+   kvm_release_page_dirty(page);
+
+   if (!exchanged)
+   return X86EMUL_CMPXCHG_FAILED;
+
 emul_write:
-#endif
+   printk_once(KERN_WARNING "kvm: emulating exchange as write\n");
 
return emulator_write_emulated(addr, new, bytes, vcpu);
 }
-- 
1.7.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/5] Fix some mmu/emulator atomicity issues (v2)

2010-03-10 Thread Avi Kivity
Currently when we emulate a locked operation into a shadowed guest page
table, we perform a write rather than a true atomic.  This is indicated
by the "emulating exchange as write" message that shows up in dmesg.

In addition, the pte prefetch operation during invlpg suffered from a
race.  This was fixed by removing the operation.

This patchset fixes both issues and reinstates pte prefetch on invlpg.

v2:
   - fix truncated description for patch 1
   - add new patch 4, which fixes a bug in patch 5

Avi Kivity (5):
  KVM: MMU: Consolidate two guest pte reads in kvm_mmu_pte_write()
  KVM: Make locked operations truly atomic
  KVM: Don't follow an atomic operation by a non-atomic one
  KVM: MMU: Do not instantiate nontrapping spte on unsync page
  KVM: MMU: Reinstate pte prefetch on invlpg

 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/mmu.c  |   78 +++---
 arch/x86/kvm/paging_tmpl.h  |   25 +-
 arch/x86/kvm/x86.c  |  101 ---
 4 files changed, 137 insertions(+), 68 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/5] KVM: MMU: Consolidate two guest pte reads in kvm_mmu_pte_write()

2010-03-10 Thread Avi Kivity
kvm_mmu_pte_write() reads guest ptes in two different occasions, both to
allow a 32-bit pae guest to update a pte with 4-byte writes.  Consolidate
these into a single read, which also allows us to consolidate another read
from an invlpg speculating a gpte into the shadow page table.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/mmu.c |   69 +++
 1 files changed, 31 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 741373e..086025e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2558,36 +2558,11 @@ static bool last_updated_pte_accessed(struct kvm_vcpu *vcpu)
 }
 
 static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
- const u8 *new, int bytes)
+ u64 gpte)
 {
gfn_t gfn;
-   int r;
-   u64 gpte = 0;
pfn_t pfn;
 
-   if (bytes != 4 && bytes != 8)
-   return;
-
-   /*
-* Assume that the pte write on a page table of the same type
-* as the current vcpu paging mode.  This is nearly always true
-* (might be false while changing modes).  Note it is verified later
-* by update_pte().
-*/
-   if (is_pae(vcpu)) {
-   /* Handle a 32-bit guest writing two halves of a 64-bit gpte */
-   if ((bytes == 4) && (gpa % 4 == 0)) {
-   r = kvm_read_guest(vcpu->kvm, gpa & ~(u64)7, &gpte, 8);
-   if (r)
-   return;
-   memcpy((void *)&gpte + (gpa % 8), new, 4);
-   } else if ((bytes == 8) && (gpa % 8 == 0)) {
-   memcpy((void *)&gpte, new, 8);
-   }
-   } else {
-   if ((bytes == 4) && (gpa % 4 == 0))
-   memcpy((void *)&gpte, new, 4);
-   }
if (!is_present_gpte(gpte))
return;
gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
@@ -2638,7 +2613,34 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
int r;
 
pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
-   mmu_guess_page_from_pte_write(vcpu, gpa, new, bytes);
+
+   switch (bytes) {
+   case 4:
+   gentry = *(const u32 *)new;
+   break;
+   case 8:
+   gentry = *(const u64 *)new;
+   break;
+   default:
+   gentry = 0;
+   break;
+   }
+
+   /*
+* Assume that the pte write on a page table of the same type
+* as the current vcpu paging mode.  This is nearly always true
+* (might be false while changing modes).  Note it is verified later
+* by update_pte().
+*/
+   if (is_pae(vcpu) && bytes == 4) {
+   /* Handle a 32-bit guest writing two halves of a 64-bit gpte */
+   gpa &= ~(gpa_t)7;
+   r = kvm_read_guest(vcpu->kvm, gpa, &gentry, 8);
+   if (r)
+   gentry = 0;
+   }
+
+   mmu_guess_page_from_pte_write(vcpu, gpa, gentry);
spin_lock(&vcpu->kvm->mmu_lock);
kvm_mmu_access_page(vcpu, gfn);
kvm_mmu_free_some_pages(vcpu);
@@ -2703,20 +2705,11 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
continue;
}
spte = &sp->spt[page_offset / sizeof(*spte)];
-   if ((gpa & (pte_size - 1)) || (bytes < pte_size)) {
-   gentry = 0;
-   r = kvm_read_guest_atomic(vcpu->kvm,
- gpa & ~(u64)(pte_size - 1),
- &gentry, pte_size);
-   new = (const void *)&gentry;
-   if (r < 0)
-   new = NULL;
-   }
while (npte--) {
entry = *spte;
mmu_pte_write_zap_pte(vcpu, sp, spte);
-   if (new)
-   mmu_pte_write_new_pte(vcpu, sp, spte, new);
+   if (gentry)
+   mmu_pte_write_new_pte(vcpu, sp, spte, &gentry);
mmu_pte_write_flush_tlb(vcpu, entry, *spte);
++spte;
}
-- 
1.7.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/5] KVM: Don't follow an atomic operation by a non-atomic one

2010-03-10 Thread Avi Kivity
Currently emulated atomic operations are immediately followed by a non-atomic
operation, so that kvm_mmu_pte_write() can be invoked.  This updates the mmu
but undoes the whole point of doing things atomically.

Fix by only performing the atomic operation and the mmu update, and avoiding
the non-atomic write.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/x86.c |   32 +---
 1 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8558a1c..4cd56c6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3253,7 +3253,8 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
 static int emulator_write_emulated_onepage(unsigned long addr,
   const void *val,
   unsigned int bytes,
-  struct kvm_vcpu *vcpu)
+  struct kvm_vcpu *vcpu,
+  bool mmu_only)
 {
gpa_t gpa;
u32 error_code;
@@ -3269,6 +3270,10 @@ static int emulator_write_emulated_onepage(unsigned long addr,
if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
goto mmio;
 
+   if (mmu_only) {
+   kvm_mmu_pte_write(vcpu, gpa, val, bytes, 1);
+   return X86EMUL_CONTINUE;
+   }
if (emulator_write_phys(vcpu, gpa, val, bytes))
return X86EMUL_CONTINUE;
 
@@ -3289,24 +3294,35 @@ mmio:
return X86EMUL_CONTINUE;
 }
 
-int emulator_write_emulated(unsigned long addr,
-  const void *val,
-  unsigned int bytes,
-  struct kvm_vcpu *vcpu)
+static int __emulator_write_emulated(unsigned long addr,
+const void *val,
+unsigned int bytes,
+struct kvm_vcpu *vcpu,
+bool mmu_only)
 {
/* Crossing a page boundary? */
if (((addr + bytes - 1) ^ addr) & PAGE_MASK) {
int rc, now;
 
now = -addr & ~PAGE_MASK;
-   rc = emulator_write_emulated_onepage(addr, val, now, vcpu);
+   rc = emulator_write_emulated_onepage(addr, val, now, vcpu,
+mmu_only);
if (rc != X86EMUL_CONTINUE)
return rc;
addr += now;
val += now;
bytes -= now;
}
-   return emulator_write_emulated_onepage(addr, val, bytes, vcpu);
+   return emulator_write_emulated_onepage(addr, val, bytes, vcpu,
+  mmu_only);
+}
+
+int emulator_write_emulated(unsigned long addr,
+   const void *val,
+   unsigned int bytes,
+   struct kvm_vcpu *vcpu)
+{
+   return __emulator_write_emulated(addr, val, bytes, vcpu, false);
 }
 EXPORT_SYMBOL_GPL(emulator_write_emulated);
 
@@ -3370,6 +3386,8 @@ static int emulator_cmpxchg_emulated(unsigned long addr,
if (!exchanged)
return X86EMUL_CMPXCHG_FAILED;
 
+   return __emulator_write_emulated(addr, new, bytes, vcpu, true);
+
 emul_write:
printk_once(KERN_WARNING "kvm: emulating exchange as write\n");
 
-- 
1.7.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 18/18] KVM: X86: Add KVM_CAP_SVM_CPUID_FIXED

2010-03-10 Thread Joerg Roedel
On Mon, Mar 08, 2010 at 11:39:31AM +0200, Avi Kivity wrote:
> On 03/03/2010 09:12 PM, Joerg Roedel wrote:
> >This capability shows userspace that it can trust the values
> >of cpuid[0x8000000A] that it gets from the kernel. Old
> >behavior was to just return the host cpuid values which is
> >broken because all additional svm-features need support in
> >the svm emulation code.
> >
> 
> I think we can simply fix the bug and push the fix to the various
> stable queues.

Ok, sounds good too. I have some more fixes queued up and will send this
one together with them.

Joerg


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 11/18] KVM: MMU: Add infrastructure for two-level page walker

2010-03-10 Thread Joerg Roedel
On Mon, Mar 08, 2010 at 11:37:22AM +0200, Avi Kivity wrote:
> On 03/03/2010 09:12 PM, Joerg Roedel wrote:
> >This patch introduces a mmu-callback to translate gpa
> >addresses in the walk_addr code. This is later used to
> >translate l2_gpa addresses into l1_gpa addresses.
> >
> >Signed-off-by: Joerg Roedel
> >---
> >  arch/x86/include/asm/kvm_host.h |1 +
> >  arch/x86/kvm/mmu.c  |7 +++
> >  arch/x86/kvm/paging_tmpl.h  |   19 +++
> >  include/linux/kvm_host.h|5 +
> >  4 files changed, 32 insertions(+), 0 deletions(-)
> >
> >diff --git a/arch/x86/include/asm/kvm_host.h 
> >b/arch/x86/include/asm/kvm_host.h
> >index c0b5576..76c8b5f 100644
> >--- a/arch/x86/include/asm/kvm_host.h
> >+++ b/arch/x86/include/asm/kvm_host.h
> >@@ -250,6 +250,7 @@ struct kvm_mmu {
> > void (*free)(struct kvm_vcpu *vcpu);
> > gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
> > u32 *error);
> >+gpa_t (*translate_gpa)(struct kvm_vcpu *vcpu, gpa_t gpa, u32 *error);
> > void (*prefetch_page)(struct kvm_vcpu *vcpu,
> >   struct kvm_mmu_page *page);
> > int (*sync_page)(struct kvm_vcpu *vcpu,
> 
> I think placing this here means we will miss a few translations,
> namely when we do a physical access (say, reading PDPTEs or
> similar).
> 
> We need to do this on the level of kvm_read_guest() so we capture
> physical accesses:
> 
> kvm_read_guest_virt
>   -> walk_addr
>  -> kvm_read_guest_tdp
>  -> kvm_read_guest_virt
> -> walk_addr
> -> kvm_read_guest_tdp
>  -> kvm_read_guest
> 
> Of course, not all accesses will use kvm_read_guest_tdp; for example
> kvmclock accesses should still go untranslated.

Ok, doing the translation in kvm_read_guest is certainly the more
generic approach. I already fixed a bug related to loading l2 pdptr
pointers. Doing the translation in kvm_read_guest makes the code a lot
nicer.

Joerg


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/18] KVM: MMU: Make tdp_enabled a mmu-context parameter

2010-03-10 Thread Joerg Roedel
On Mon, Mar 08, 2010 at 11:17:41AM +0200, Avi Kivity wrote:
> On 03/03/2010 09:12 PM, Joerg Roedel wrote:
> >This patch changes the tdp_enabled flag from its global
> >meaning to the mmu-context. This is necessary for Nested SVM
> >with emulation of Nested Paging where we need an extra MMU
> >context to shadow the Nested Nested Page Table.
> >
> >
> >diff --git a/arch/x86/include/asm/kvm_host.h 
> >b/arch/x86/include/asm/kvm_host.h
> >index ec891a2..e7bef19 100644
> >--- a/arch/x86/include/asm/kvm_host.h
> >+++ b/arch/x86/include/asm/kvm_host.h
> >@@ -254,6 +254,7 @@ struct kvm_mmu {
> > int root_level;
> > int shadow_root_level;
> > union kvm_mmu_page_role base_role;
> >+bool tdp_enabled;
> >
> 
> This needs a different name, since the old one is still around.
> Perhaps we could call it parent_mmu and make it a kvm_mmu pointer.

Hmm, how about renaming the global tdp_enabled variable to tdp_usable?
The global variable indicates if tdp is _usable_ and we can _enable_ it
for a mmu context.

Joerg


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 19/24] KVM: x86 emulator: fix in/out emulation.

2010-03-10 Thread Gleb Natapov
On Wed, Mar 10, 2010 at 11:12:34AM +0200, Avi Kivity wrote:
> On 03/09/2010 08:09 PM, Gleb Natapov wrote:
> >
> >>We don't want to enter the emulator for non-string in/out.  Leftover
> >>test code?
> >>
> >No, unfortunately this is not leftover. I just don't see a way how we
> >can bypass emulator and still have emulator be able to emulate in/out
> >(for big real mode for instance). The problem is basically described in
> >the commit message. If we have function outside of emulator that does
> >in/out emulation on vcpu directly, then emulator can't  use it since
> >committing shadowed registers will overwrite the result of emulation.
> >Having two different emulations (one outside of emulator and another in
> >emulator) is also problematic since when userspace returns after IO exit
> >we don't know which emulation to continue. If we want to avoid
> >instruction decoding we can fill in emulation context from exit info as
> >if instruction was already decoded and call emulator.
> >
> 
> Alternatively, another entry point would be fine.  in/out is a fast
> path (used for virtio for example).
> 
You mean another entry point into the emulator, not a separate
implementation for emulated in/out and the intercepted one? If yes, this
is what I meant by "faking" the decoding stage.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[ kvm-Bugs-2962575 ] MINIX 3.1.6 works in QEMU-0.12.3 only with KVM disabled

2010-03-10 Thread SourceForge.net
Bugs item #2962575, was opened at 2010-03-03 13:20
Message generated for change (Comment added) made by erikvdk
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2962575&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: intel
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Erik van der Kouwe (erikvdk)
Assigned to: Nobody/Anonymous (nobody)
Summary: MINIX 3.1.6 works in QEMU-0.12.3 only with KVM disabled

Initial Comment:
Dear all,

If one runs the following commands after installing qemu-0.12.3 or 
qemu-kvm-0.12.3:

wget http://www.minix3.org/download/minix_R3.1.6-r6084.iso.bz2
bunzip2 minix_R3.1.6-r6084.iso.bz2
qemu-system-x86_64 -cdrom minix_R3.1.6-r6084.iso -enable-kvm

and presses 1 (Regular MINIX 3), the following error message results when 
loading MINIX:
kvm: unhandled exit 8021
kvm_run returned -22

The guest stops after that.

This error message does not occur without the -enable-kvm switch. It does not 
occur with qemu-kvm-0.11.0 as bundled with Ubuntu. The problem occurs with the 
"qemu" binary from qemu-0.12.3 as well as "qemu-system-x86_64" from 
qemu-kvm-0.12.3, but in the former case no error message is printed.

The code that is running when it fails is in 
https://gforge.cs.vu.nl/gf/project/minix/scmsvn/?action=browse&path=%2Ftrunk%2Fsrc%2Fboot%2Fboothead.s&revision=5918&view=markup.
 It happens in ext_copy:

ext_copy:
mov x_dst_desc+2, ax
movb x_dst_desc+4, dl ! Set base of destination segment
mov ax, 8(bp)
mov dx, 10(bp)
mov x_src_desc+2, ax
movb x_src_desc+4, dl ! Set base of source segment
mov si, #x_gdt  ! es:si = global descriptor table
shr cx, #1  ! Words to move
movb ah, #0x87   ! Code for extended memory move
int 0x15

The line that fails is "int 0x15", which performs a BIOS call to copy data from 
low memory to above the 1MB barrier. The machine is running in 16-bit real mode 
when this code is executed.

Output for "uname -a" on the host:

Linux hp364 2.6.31-20-generic #57-Ubuntu SMP Mon Feb 8 09:05:19 UTC 2010 i686 
GNU/Linux

Output for "cat /proc/cpuinfo" on the host:

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 23
model name  : Intel(R) Core(TM)2 Duo CPU E8600  @ 3.33GHz
stepping: 10
cpu MHz : 1998.000
cache size  : 6144 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat 
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc 
arch_perfmon pebs bts pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr 
pdcm sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority
bogomips: 6650.50
clflush size: 64
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 23
model name  : Intel(R) Core(TM)2 Duo CPU E8600  @ 3.33GHz
stepping: 10
cpu MHz : 1998.000
cache size  : 6144 KB
physical id : 0
siblings: 2
core id : 1
cpu cores   : 2
apicid  : 1
initial apicid  : 1
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat 
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc 
arch_perfmon pebs bts pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr 
pdcm sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority
bogomips: 6649.80
clflush size: 64
power management:

With kind regards,
Erik


--

Comment By: Erik van der Kouwe (erikvdk)
Date: 2010-03-10 15:16

Message:
 
Thanks to Avi Kivity I now have a workaround for this issue, namely
16-byte
align the addresses in the GDT passed to the BIOS extended copy function.
The BIOS left the unaligned descriptor causing MINIX to operate in unreal
mode, which is not well supported by KVM on Intel. 


--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2962575&group_id=180599
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[ kvm-Bugs-2967396 ] Workaround

2010-03-10 Thread SourceForge.net
Bugs item #2967396, was opened at 2010-03-10 15:13
Message generated for change (Comment added) made by erikvdk
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2967396&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Deleted
Resolution: None
Priority: 5
Private: No
Submitted By: Erik van der Kouwe (erikvdk)
Assigned to: Nobody/Anonymous (nobody)
Summary: Workaround

Initial Comment:
Thanks to Avi Kivity I now have a workaround for this issue, namely 16-byte 
align the addresses in the GDT passed to the BIOS extended copy function. The 
BIOS left the unaligned descriptor causing MINIX to operate in unreal mode, 
which is not well supported by KVM on Intel.

--

>Comment By: Erik van der Kouwe (erikvdk)
Date: 2010-03-10 15:14

Message:
Oops, this was supposed to be a comment to another report

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2967396&group_id=180599


[ kvm-Bugs-2967396 ] Workaround

2010-03-10 Thread SourceForge.net
Bugs item #2967396, was opened at 2010-03-10 15:13
Message generated for change (Tracker Item Submitted) made by erikvdk
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2967396&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Erik van der Kouwe (erikvdk)
Assigned to: Nobody/Anonymous (nobody)
Summary: Workaround

Initial Comment:
Thanks to Avi Kivity I now have a workaround for this issue, namely 16-byte 
align the addresses in the GDT passed to the BIOS extended copy function. The 
BIOS left the unaligned descriptor causing MINIX to operate in unreal mode, 
which is not well supported by KVM on Intel.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2967396&group_id=180599


Re: Question on stopping KVM start at boot

2010-03-10 Thread Bitman Zhou
> Hi folks.
> 
> Host - ubuntu 9.10 64bit
> Virtualizer - KVM
> 
> I need to stop KVM starting at boot.
> 
> I added following 2 lines at the bottom of /etc/modprobe.d/blacklist.conf
> blacklist kvm
> blacklist kvm-amd
> 
> 
> Reboot PC
> 
> It doesn't work.
> 
> $ lsmod | grep kvm
> kvm_amd               41556  0
> kvm                   190648  1 kvm_amd
> 
> 
> Please what further command I have to run in order to activate the new  
> blacklist.conf ?

For Ubuntu, you can just use update-rc.d

sudo update-rc.d kvm disable

to disable kvm and 

sudo update-rc.d kvm enable

to enable it again.



Re: [PATCH] Inter-VM shared memory PCI device

2010-03-10 Thread Arnd Bergmann
On Tuesday 09 March 2010, Cam Macdonell wrote:
> >
> > We could make the masking in RAM, not in registers, like virtio, which would
> > require no exits.  It would then be part of the application specific
> > protocol and out of scope of of this spec.
> >
> 
> This kind of implementation would be possible now since with UIO it's
> up to the application whether to mask interrupts or not and what
> interrupts mean.  We could leave the interrupt mask register for those
> who want that behaviour.  Arnd's idea would remove the need for the
> Doorbell and Mask, but we will always need at least one MMIO register
> to send whatever interrupts we do send.

You'd also have to be very careful if the notification is in RAM to
avoid races between one guest triggering an interrupt and another
guest clearing its interrupt mask.
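The race can be made concrete with a small sketch; the field names and layout are invented for illustration and are not the proposed device interface:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-peer state kept in the shared memory region.  The
 * sender sets a pending bit and then tests the peer's mask before ringing
 * the doorbell; the peer may clear (unmask) its mask between those two
 * steps, so the event is recorded but never signalled. */
struct peer_state {
    volatile uint32_t mask;     /* nonzero: peer asked not to be interrupted */
    volatile uint32_t pending;  /* event bits the peer has not consumed      */
};

/* Returns 1 if the doorbell MMIO register would be written, 0 if the
 * notification is suppressed because the (possibly stale) mask was set. */
static int notify(struct peer_state *p, uint32_t event)
{
    p->pending |= event;
    if (p->mask)        /* (1) mask may be cleared right after this read */
        return 0;       /* (2) ...and then nobody ever rings the bell    */
    return 1;
}
```

Closing that window needs either an atomic test that the host serializes, or a re-check on the unmask path; with the mask in a device register the host can do the test atomically.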

A totally different option that avoids this whole problem would
be to separate the signalling from the shared memory, making the
PCI shared memory device a trivial device with a single memory BAR,
and using something a higher-level concept like a virtio based
serial line for the actual signalling.

Arnd


PCI capabilities support for assigned devices

2010-03-10 Thread Sebastian Hetze
Hi *,

in qemu-kvm/hw/device-assignment.c assigned_device_pci_cap_init()
appearently only PCI_CAP_ID_MSI and PCI_CAP_ID_MSIX are exposed
to the guest.

Linux Broadcom bnx2 and tg3 drivers expect PCI_CAP_ID_PM to be present.

Are there any plans to implement this and possibly other PCI capability
features for assigned devices?

If not, is there a list of network cards known to work with PCI
assignment in KVM?

Best Regards,

  Sebastian


Re: [PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.

2010-03-10 Thread Gleb Natapov
On Wed, Mar 10, 2010 at 07:08:31PM +0900, Takuya Yoshikawa wrote:
> Gleb Natapov wrote:
> 
> >>>Entering guest from time to time will not change semantics of the
> >>>processor (if code is not modified under processor's feet at least).
> >>>Currently we reenter guest mode after each iteration of string
> >>>instruction for all instruction but ins/outs.
> >>>
> >>E.g., is there no chance that during the repetitions, in the middle of the
> >>repetitions, page faults occur? If it can, without entering the guest, can
> >>we handle it?
> >> -- I lack some basic assumptions?
> >>
> >If page fault occurs we inject it to the guest.
> >
> 
> Oh, I might have failed to explain what I was worried about.
> On the contrary, I mean I was worried about the case of NOT reentering the guest.
> 
Are you thinking about something specific here? If we inject exceptions
when they occur and inject interrupts when they arrive, what problem do
you see? I guess this is how a real CPU actually works. I doubt it
re-reads the string instruction on each iteration.

> I know that current implementation with reentrance is OK.
Current implementation does not reenter guest on each iteration for pio
string, so currently we have both variants.

> 
> To inject a page fault without reentering the guest, we need to add
> some more hacks to the emulator IIUC.
> 
No, we just need to enter guest if exception happens. I see that this in
handled incorrectly in my current patch series.

--
Gleb.


Re: [PATCH 2/2] KVM test: Support to SLES install

2010-03-10 Thread Lucas Meneghel Rodrigues
On Wed, Mar 10, 2010 at 8:45 AM, Lucas Meneghel Rodrigues
 wrote:
> From: yogi 
>
> Adds a new entry "SUSE" in the test_base file for SLES and
> contains an autoinst file for doing an unattended SLES 11 64-bit
> install.

Oh Yogi, by the way, could you please reorganize the openSUSE section
and add at least an autoyast file for openSUSE 11.2 so I can actually
test whether we can get a successful installation? I tried to play with
the XML file of SLES to see if I could get openSUSE 11.2 installed, but
it turns out that those config files are an endless XML nightmare, and
everything I tried makes yast die.

The mechanics of the whole thing are correct: I can get yast to start
with no problems, but parsing the autoyast file makes the VM hang.
So I am fine with adding the patch, but it'd be nice to have an OS
with unrestricted access that everybody could play with (openSUSE). I
don't have enough time to make it work on my own, so if you have some
spare time, please work on this.

> Signed-off-by: Yogananth Subramanian 
> ---
>  client/tests/kvm/tests_base.cfg.sample |   22 ++
>  1 files changed, 22 insertions(+), 0 deletions(-)
>
> diff --git a/client/tests/kvm/tests_base.cfg.sample 
> b/client/tests/kvm/tests_base.cfg.sample
> index c76470d..acb2076 100644
> --- a/client/tests/kvm/tests_base.cfg.sample
> +++ b/client/tests/kvm/tests_base.cfg.sample
> @@ -503,6 +503,28 @@ variants:
>                     md5sum = 2afee1b8a87175e6dee2b8dbbd1ad8e8
>                     md5sum_1m = 768ca32503ef92c28f2d144f2a87e4d0
>
> +            - SLES:
> +                no setup
> +                shell_prompt = "^r...@.*[\#\$]\s*$|#"
> +                unattended_install:
> +                    pxe_image = "linux"
> +                    pxe_initrd = "initrd"
> +                    extra_params += " -bootp /pxelinux.0 -boot n"
> +                    kernel_args = "autoyast=floppy"
> +
> +                variants:
> +                    - 11.64:
> +                        no setup
> +                        image_name = sles11-64
> +                        cdrom=linux/SLES-11-DVD-x86_64-GM-DVD1.iso
> +                        md5sum = 50a2bd45cd12c3808c3ee48208e2586b
> +                        md5sum_1m = 0951cab7c32e332362fc424c1054
> +                        unattended_install:
> +                            unattended_file = 
> unattended/Sles11-64-autoinst.xml
> +                            tftp = "images/sles11-64/tftpboot"
> +                            floppy = "images/sles11-64floppy.img"
> +                            pxe_dir = "boot/x86_64/loader"
> +
>             - @Ubuntu:
>                 shell_prompt = "^r...@.*[\#\$]\s*$"
>
> --
> 1.6.6.1
>



-- 
Lucas


Re: how to tweak kernel to get the best out of kvm?

2010-03-10 Thread Avi Kivity

On 03/10/2010 02:58 PM, Harald Dunkel wrote:

Hi Avi,

On 03/08/10 12:02, Avi Kivity wrote:
   

On 03/05/2010 05:20 PM, Harald Dunkel wrote:
 

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi folks,

Problem: My kvm server (8 cores, 64 GByte RAM, amd64) can eat up
all block device or file system performance, so that the kvm clients
become almost unresponsive. This is _very_ bad. I would like to make
sure that the kvm clients do not affect each other, and that all
(including the server itself) get a fair part of computing power and
memory space.

   

Please describe the issue in detail, provide output from 'vmstat' and
'top'.

 

Sorry for the delay. I cannot put these services at risk, so I
have setup a test environment on another host (2 quadcore Xeons,
ht enabled, 32 GByte RAM, no swap, bridged networking) to
reproduce the problem.

There are 8 virtual hosts, each with a single CPU, 1 GByte RAM
and 4 GByte swap on a virtual disk. The virtual disks are image
files in the local file system. These images are not shared.

For testing each virtual host builds the Linux kernel. In
parallel I am running rsync to clone a remote virtual machine
(22 GByte) to the local physical disk.

Attached you can find the requested logs. The kern.log shows
the problem: The virtual CPUs get stuck (as it seems). Several
virtual hosts showed this effect. One vhost was unresponsive
for more than 30 minutes.

Surely this is a stress test, but I had a similar effect with
our virtual mail server on the production system, while I
was running a similar rsync session. mailhost was unresponsive
for more than 2 minutes, then it was back. The other 8 virtual
hosts on this system were started, but idle (AFAICT).

   


You have tons of iowait time, indicating an I/O bottleneck.

What filesystem are you using for the host?  Are you using qcow2 or raw 
access?  What's the qemu command line?


Perhaps your filesystem doesn't perform well on synchronous writes.  For 
testing only, you might try cache=writeback.



BTW, please note that free memory goes down over time. This
happens only if the rsync is running. Without rsync the free
memory is stable.
   


That's expected.  rsync fills up the guest and host pagecache, both 
drain free memory (the guest only until it has touched all of its memory).



What config options would you suggest to build and run a Linux
kernel optimized for running kvm clients?

Sorry for asking, but AFAICS some general guidelines for kvm are
missing here. Of course I saw a lot of options in Documentation/\
kernel-parameters.txt, but unfortunately I am not a kernel hacker.

Any helpful comment would be highly appreciated.

   

One way to ensure guests don't affect each other is not to overcommit,
that is make sure each guest gets its own cores, there is enough memory
for all guests, and guests have separate disks.  Of course that defeats
the some of the reasons for virtualizing in the first place; but if you
share resources, some compromises must be made.

 

How many virtual machines would you assume I could run on a
host with 64 GByte RAM, 2 quad cores, a bonding NIC with
4*1Gbit/sec and a hardware RAID? Each vhost is supposed to
get 4 GByte RAM and 1 CPU.
   


15 guests should fit comfortably, more with ksm running if the workloads 
are similar, or if you use ballooning.


   

If you do share resources, then Linux manages how they are shared.  The
scheduler will share the processors, the memory management subsystem
will share memory, and the I/O scheduler will share disk bandwidth.  If
you see a problem in one of these areas you will need to tune the
subsystem that is misbehaving.

 

Do you think that the bridge connecting the tunnel devices and
the real NIC makes the problems? Is there also a subsystem managing
network access?
   


Here the problem is likely the host filesystem and/or I/O scheduler.

The optimal layout is placing guest disks in LVM volumes, and accessing 
them with -drive file=...,cache=none.  However, file-based access should 
also work.


--
error compiling committee.c: too many arguments to function



Re: Question on stopping KVM start at boot

2010-03-10 Thread Rodrigo Campos
On Wed, Mar 10, 2010 at 04:07:09PM +0800, sati...@pacific.net.hk wrote:
> Hi folks.
> 
> Host - ubuntu 9.10 64bit
> Virtualizer - KVM
> 
> I need to stop KVM starting at boot.
> 
> I added following 2 lines at the bottom of /etc/modprobe.d/blacklist.conf
> blacklist kvm
> blacklist kvm-amd
> 
> 
> Reboot PC
> 
> It doesn't work.
> 
> $ lsmod | grep kvm
> kvm_amd               41556  0
> kvm                   190648  1 kvm_amd
> 
> 
> Please what further command I have to run in order to activate the
> new blacklist.conf ?

/etc/init.d/qemu-kvm loads the module, so you probably need to keep it from
running at startup, or teach it to honor the blacklist.

Perhaps this is more appropriate to ask on an ubuntu-specific mailing list,
since this is an ubuntu problem.





Thanks,
Rodrigo


Re: [PATCH 2/2] arch/x86/kvm/* Checkpatch cleanup

2010-03-10 Thread Joerg Roedel
Where have you based these changes on? I already did most of the
cleanups to svm.c you have made here and the patch should be in
avi/master.

Joerg

On Wed, Mar 10, 2010 at 12:37:47PM +0100, Andrea Gelmini wrote:
> Fixes for all files
> 
> Signed-off-by: Andrea Gelmini 
> ---
>  arch/x86/kvm/emulate.c   |  139 
> +++---
>  arch/x86/kvm/i8254.c |8 +--
>  arch/x86/kvm/i8254.h |   12 ++--
>  arch/x86/kvm/i8259.c |3 +-
>  arch/x86/kvm/kvm_timer.h |6 +-
>  arch/x86/kvm/lapic.c |6 +-
>  arch/x86/kvm/mmu.c   |   17 +++---
>  arch/x86/kvm/mmutrace.h  |6 +-
>  arch/x86/kvm/svm.c   |   77 +-
>  arch/x86/kvm/trace.h |   12 ++--
>  arch/x86/kvm/vmx.c   |   44 +++---
>  arch/x86/kvm/x86.c   |   18 +++---
>  arch/x86/kvm/x86.h   |2 +-
>  13 files changed, 170 insertions(+), 180 deletions(-)


[PATCH 2/2] KVM test: Support to SLES install

2010-03-10 Thread Lucas Meneghel Rodrigues
From: yogi 

Adds a new entry "SUSE" in the test_base file for SLES and
contains an autoinst file for doing an unattended SLES 11 64-bit
install.

Signed-off-by: Yogananth Subramanian 
---
 client/tests/kvm/tests_base.cfg.sample |   22 ++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index c76470d..acb2076 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -503,6 +503,28 @@ variants:
 md5sum = 2afee1b8a87175e6dee2b8dbbd1ad8e8
 md5sum_1m = 768ca32503ef92c28f2d144f2a87e4d0
 
+- SLES:
+no setup
+shell_prompt = "^r...@.*[\#\$]\s*$|#"
+unattended_install:
+pxe_image = "linux"
+pxe_initrd = "initrd"
+extra_params += " -bootp /pxelinux.0 -boot n"
+kernel_args = "autoyast=floppy"
+
+variants:
+- 11.64:
+no setup
+image_name = sles11-64
+cdrom=linux/SLES-11-DVD-x86_64-GM-DVD1.iso
+md5sum = 50a2bd45cd12c3808c3ee48208e2586b
+md5sum_1m = 0951cab7c32e332362fc424c1054
+unattended_install:
+unattended_file = unattended/Sles11-64-autoinst.xml
+tftp = "images/sles11-64/tftpboot"
+floppy = "images/sles11-64floppy.img"
+pxe_dir = "boot/x86_64/loader"
+
 - @Ubuntu:
 shell_prompt = "^r...@.*[\#\$]\s*$"
 
-- 
1.6.6.1



KVM checkpatch.pl cleanup

2010-03-10 Thread Andrea Gelmini
Hi all,
as Marcelo told me I send you these group of patches.
They're just checkpatch.pl cleanup.
They cleanly apply and compile on latest Linus' git tree.

Thanks a lot for your work,
Andrea




[PATCH 2/2] arch/x86/kvm/* Checkpatch cleanup

2010-03-10 Thread Andrea Gelmini
Fixes for all files

Signed-off-by: Andrea Gelmini 
---
 arch/x86/kvm/emulate.c   |  139 +++---
 arch/x86/kvm/i8254.c |8 +--
 arch/x86/kvm/i8254.h |   12 ++--
 arch/x86/kvm/i8259.c |3 +-
 arch/x86/kvm/kvm_timer.h |6 +-
 arch/x86/kvm/lapic.c |6 +-
 arch/x86/kvm/mmu.c   |   17 +++---
 arch/x86/kvm/mmutrace.h  |6 +-
 arch/x86/kvm/svm.c   |   77 +-
 arch/x86/kvm/trace.h |   12 ++--
 arch/x86/kvm/vmx.c   |   44 +++---
 arch/x86/kvm/x86.c   |   18 +++---
 arch/x86/kvm/x86.h   |2 +-
 13 files changed, 170 insertions(+), 180 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 4dade6a..3ebec1e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -449,35 +449,34 @@ static u32 group2_table[] = {
 
 
 /* Raw emulation: instruction has two explicit operands. */
-#define __emulate_2op_nobyte(_op,_src,_dst,_eflags,_wx,_wy,_lx,_ly,_qx,_qy) \
-   do {\
-   unsigned long _tmp; \
-   \
-   switch ((_dst).bytes) { \
-   case 2: \
-   emulate_2op(_op,_src,_dst,_eflags,_wx,_wy,"w"); \
-   break;  \
-   case 4: \
-   emulate_2op(_op,_src,_dst,_eflags,_lx,_ly,"l"); \
-   break;  \
-   case 8: \
-   
ON64(emulate_2op(_op,_src,_dst,_eflags,_qx,_qy,"q")); \
-   break;  \
-   }   \
+#define __emulate_2op_nobyte(_op, _src, _dst, _eflags, _wx, _wy, _lx, _ly, 
_qx, _qy)   \
+   do {
\
+   unsigned long _tmp; 
\
+   switch ((_dst).bytes) { 
\
+   case 2: 
\
+   emulate_2op(_op, _src, _dst, _eflags, _wx, _wy, 
"w");   \
+   break;  
\
+   case 4: 
\
+   emulate_2op(_op, _src, _dst, _eflags, _lx, _ly, 
"l");   \
+   break;  
\
+   case 8: 
\
+   ON64(emulate_2op(_op, _src, _dst, _eflags, _qx, 
_qy, "q")); \
+   break;  
\
+   }   
\
} while (0)
 
-#define __emulate_2op(_op,_src,_dst,_eflags,_bx,_by,_wx,_wy,_lx,_ly,_qx,_qy) \
-   do { \
-   unsigned long _tmp;  \
-   switch ((_dst).bytes) {  \
-   case 1:  \
-   emulate_2op(_op,_src,_dst,_eflags,_bx,_by,"b");  \
-   break;   \
-   default: \
-   __emulate_2op_nobyte(_op, _src, _dst, _eflags,   \
-_wx, _wy, _lx, _ly, _qx, _qy);  \
-   break;   \
-   }\
+#define __emulate_2op(_op, _src, _dst, _eflags, _bx, _by, _wx, _wy, _lx, _ly, 
_qx, _qy)\
+   do {
\
+   unsigned long _tmp; 
\
+   switch ((_dst).bytes) { 
\
+   case 1: 
\
+   emulate_2op(_op, _src, _dst, _eflags, _bx, _by, 
"b");   \
+   break;  
\
+   default:

[PATCH 1/2] virt/kvm/* Checkpatch cleanup

2010-03-10 Thread Andrea Gelmini
Fixes for all files

Signed-off-by: Andrea Gelmini 
---
 virt/kvm/assigned-dev.c   |2 +-
 virt/kvm/coalesced_mmio.h |4 ++--
 virt/kvm/ioapic.c |4 ++--
 virt/kvm/ioapic.h |2 +-
 virt/kvm/irq_comm.c   |2 +-
 virt/kvm/kvm_main.c   |   12 ++--
 6 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 057e2cc..6595bf2 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -786,7 +786,7 @@ long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned 
ioctl,
goto out_free_irq_routing;
r = kvm_set_irq_routing(kvm, entries, routing.nr,
routing.flags);
-   out_free_irq_routing:
+out_free_irq_routing:
vfree(entries);
break;
}
diff --git a/virt/kvm/coalesced_mmio.h b/virt/kvm/coalesced_mmio.h
index 8a5959e..8c3b79f 100644
--- a/virt/kvm/coalesced_mmio.h
+++ b/virt/kvm/coalesced_mmio.h
@@ -25,9 +25,9 @@ struct kvm_coalesced_mmio_dev {
 int kvm_coalesced_mmio_init(struct kvm *kvm);
 void kvm_coalesced_mmio_free(struct kvm *kvm);
 int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm,
-   struct kvm_coalesced_mmio_zone *zone);
+struct kvm_coalesced_mmio_zone *zone);
 int kvm_vm_ioctl_unregister_coalesced_mmio(struct kvm *kvm,
- struct kvm_coalesced_mmio_zone *zone);
+struct kvm_coalesced_mmio_zone *zone);
 
 #else
 
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 3db15a8..b718699 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -43,7 +43,7 @@
 #include "irq.h"
 
 #if 0
-#define ioapic_debug(fmt,arg...) printk(KERN_WARNING fmt,##arg)
+#define ioapic_debug(fmt, arg...) printk(KERN_WARNING fmt, ##arg)
 #else
 #define ioapic_debug(fmt, arg...)
 #endif
@@ -326,7 +326,7 @@ static int ioapic_mmio_write(struct kvm_io_device *this, 
gpa_t addr, int len,
return -EOPNOTSUPP;
 
ioapic_debug("ioapic_mmio_write addr=%p len=%d val=%p\n",
-(void*)addr, len, val);
+(void *)addr, len, val);
ASSERT(!(addr & 0xf));  /* check alignment */
 
if (len == 4 || len == 8)
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index 8a751b7..0053dd7 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -50,7 +50,7 @@ struct kvm_ioapic {
 };
 
 #ifdef DEBUG
-#define ASSERT(x)  \
+#define ASSERT(x)  \
 do {   \
if (!(x)) { \
printk(KERN_EMERG "assertion failed %s: %d: %s\n",  \
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 9fd5b3e..9a05c77 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -162,7 +162,7 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 
irq, int level)
irq_set[i++] = *e;
rcu_read_unlock();
 
-   while(i--) {
+   while (i--) {
int r;
r = irq_set[i].set(&irq_set[i], kvm, irq_source_id, level);
if (r < 0)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 548f925..596900e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -64,7 +64,7 @@ MODULE_LICENSE("GPL");
 /*
  * Ordering of locks:
  *
- * kvm->lock --> kvm->slots_lock --> kvm->irq_lock
+ * kvm->lock --> kvm->slots_lock --> kvm->irq_lock
  */
 
 DEFINE_SPINLOCK(kvm_lock);
@@ -681,8 +681,8 @@ skip_lpage:
 * memslot will be created.
 *
 * validation of sp->gfn happens in:
-*  - gfn_to_hva (kvm_read_guest, gfn_to_pfn)
-*  - kvm_is_visible_gfn (mmu_check_roots)
+*  - gfn_to_hva (kvm_read_guest, gfn_to_pfn)
+*  - kvm_is_visible_gfn (mmu_check_roots)
 */
kvm_arch_flush_shadow(kvm);
kfree(old_memslots);
@@ -918,7 +918,7 @@ unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
slot = gfn_to_memslot_unaliased(kvm, gfn);
if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
return bad_hva();
-   return (slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE);
+   return slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE;
 }
 EXPORT_SYMBOL_GPL(gfn_to_hva);
 
@@ -970,7 +970,7 @@ EXPORT_SYMBOL_GPL(gfn_to_pfn);
 
 static unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t 
gfn)
 {
-   return (slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE);
+   return slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE;
 }
 
 pfn_t gfn_to_pfn_memsl

Re: [PATCH] KVM test: Make sure check_image script runs on VMs turned off

2010-03-10 Thread Lucas Meneghel Rodrigues
On Wed, Mar 10, 2010 at 5:21 AM, Michael Goldish  wrote:
>
> - "Lucas Meneghel Rodrigues"  wrote:
>
>> As it is hard to guarantee that a qcow2 image will be in a
>> consistent state with a VM turned on, take an extra safety
>> step and make sure the preprocessor shuts down the VMs
>> before the post process command check_image.py runs.
>>
>> Signed-off-by: Lucas Meneghel Rodrigues 
>> ---
>>  client/tests/kvm/tests_base.cfg.sample |    2 ++
>>  1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/client/tests/kvm/tests_base.cfg.sample
>> b/client/tests/kvm/tests_base.cfg.sample
>> index 340b0c0..beae786 100644
>> --- a/client/tests/kvm/tests_base.cfg.sample
>> +++ b/client/tests/kvm/tests_base.cfg.sample
>> @@ -1049,6 +1049,8 @@ variants:
>>          post_command = " python scripts/check_image.py;"
>>          remove_image = no
>>          post_command_timeout = 600
>> +        kill_vm = yes
>> +        kill_vm_gracefully = yes
>
> That's not necessarily bad, but this may significantly slow down
> testing because it means the VM will shut down and boot up again
> after every qcow2 test.  It'll also separate the tests in an
> unnatural way, eliminating the possibility of catching problems
> that only appear after several consecutive tests (such problems
> may or may not be possible, I'm not sure).
> Maybe we should consider specifying the post_command for only some
> of the tests, or add a dedicated test for this purpose, or even
> a no-op test that only shuts down the VM and runs the post command.

Or we could make this post command non-critical and avoid the
shutdowns. We would then notice failures only when investigating the
logs, and if the consistency check failed in a situation where it
shouldn't, we would take action.

>>      - vmdk:
>>          only Fedora Ubuntu Windows
>>          only smp2
>> --
>> 1.6.6.1
>>



-- 
Lucas


Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-10 Thread Paul Brook
> >> As of March 2009[1] Intel guarantees that memory reads occur in order
> >> (they may only be reordered relative to writes). It appears AMD do not
> >> provide this guarantee, which could be an interesting problem for
> >> heterogeneous migration..
> >
> > Interesting, but what ordering would cause problems that AMD would do
> > but Intel wouldn't?  Wouldn't that ordering cause the same problems
> > for POSIX shared memory in general (regardless of Qemu) on AMD?
> 
> If some code was written for the Intel guarantees it would break if
> migrated to AMD.  Of course, it would also break if run on AMD in the
> first place.

Right. This is independent of shared memory, and is a case where reporting an 
Intel CPUID on an AMD host might get you into trouble.
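The kind of code that would break is anything relying on the vendor's implicit load-load ordering; a portable guest avoids the dependence with explicit acquire/release. A hedged C11 sketch, not taken from any of the projects discussed:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static uint32_t data;           /* payload published by a writer */
static _Atomic uint32_t ready;  /* flag the reader polls         */

static void publish(uint32_t v)
{
    data = v;
    /* release: the store to `data` cannot be reordered after this */
    atomic_store_explicit(&ready, 1, memory_order_release);
}

static int try_consume(uint32_t *out)
{
    /* acquire: the read of `data` below cannot be hoisted above this.
     * With plain loads, this pattern silently depends on the CPU's
     * load-load ordering guarantee, which is exactly the vendor
     * difference discussed above. */
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return 0;
    *out = data;
    return 1;
}
```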

Paul


Re: MINIX 3.1.6 works in QEMU-0.12.3 only with KVM disabled

2010-03-10 Thread Avi Kivity

On 03/10/2010 12:26 PM, Erik van der Kouwe wrote:

Dear all,

I've submitted this bug report a week ago:
http://sourceforge.net/tracker/?func=detail&aid=2962575&group_id=180599&atid=893831 



I was wondering if work has already been done on this (maybe the 
problem was already known) and whether patches to fix this and/or 
workarounds are known.




MINIX is using big real mode which is currently not well supported by 
kvm on Intel hardware:



(qemu) info registers
EAX=0010 EBX=0009 ECX=4920 EDX=a796
ESI=0200 EDI=49200200 EBP=0009 ESP=a762
EIP=f4a7 EFL=00023002 [---] CPL=3 II=0 A20=1 SMM=0 HLT=0
ES =   f300
CS =f000 000f  f300
SS =9492 00094920  f300
DS =97ce 00097cec  f300


A ds.base of 0x97cec cannot be translated to a real mode segment.
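The constraint can be checked mechanically: in real mode a segment's cached base must equal its selector shifted left by four. A small sketch (a simplification of the full VMX guest-state validity checks), using the values from the dump above:

```c
#include <assert.h>
#include <stdint.h>

/* A real-mode segment is representable only if base == selector << 4.
 * VMX refuses guest state whose segments violate this (among other
 * checks), which is why the unreal-mode DS above trips kvm on Intel. */
static int realmode_seg_ok(uint16_t selector, uint32_t base)
{
    return base == ((uint32_t)selector << 4);
}
```

CS (f000/000f0000) and SS (9492/00094920) from the dump pass; DS (97ce/00097cec) fails, since 0x97ce << 4 is 0x97ce0, not 0x97cec.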

There is some work to get this to work, but it is proceeding really 
slowly.  It should work on AMD hardware though.


--
error compiling committee.c: too many arguments to function



MINIX 3.1.6 works in QEMU-0.12.3 only with KVM disabled

2010-03-10 Thread Erik van der Kouwe

Dear all,

I've submitted this bug report a week ago:
http://sourceforge.net/tracker/?func=detail&aid=2962575&group_id=180599&atid=893831

I was wondering if work has already been done on this (maybe the problem 
was already known) and whether patches to fix this and/or workarounds 
are known.


Thanks for any answers,
Erik van der Kouwe


Re: [Qemu-devel] Re: extended vga modes?

2010-03-10 Thread Roy Tam
2010/3/8 Avi Kivity :
> On 03/08/2010 01:07 PM, Michael Tokarev wrote:
>>
>> Avi Kivity wrote:
>> []
>>

 In short, when vgabios were dropped from qemu-kvm
 (for whatever yet unknown reason),

>>>
>>> What do you mean?  qemu-kvm still carries a local vgabios (see
>>> kvm/vgabios in qemu-kvm.git).
>>>
>>
>> Oh my.  So we all overlooked it.  I asked you several times
>> about the bios sources, in 0.12 seabios were supposed to be
>> in roms/seabios (which is still empty in the release), and
>> I thought vgabios should be in roms/vgabios (which is empty
>> too), and concluded it were dropped from qemu-kvm tarball.
>> But you're right, and I by mistake take vgabios sources from
>> upstream qemu when building Debian package, instead of using
>> the old'good sources from kvm/vgabios.  What a mess!... :(
>>
>> And it looks like that it's time to remove at least parts of
>> this mess, don't you think?  How about pushing the vgabios
>> changes to qemu and moving it to the same place where it is
>> in qemu?  Does it make sense?
>>
>
> We can't push the changes to qemu since qemu.git doesn't have a vgabios
> fork.  We might push the changes upstream.  Best of all if the seabios thing
> repeats itself with vgabios so we have maintainable and maintained vga
> firmware.
>

Actually they do; see the vgasrc directory in seabios.git.
But it is very incomplete: no Cirrus support, no VBE.

> --
> error compiling committee.c: too many arguments to function
>
>
>
>


Re: 32-bit qemu + 64-bit kvm be a problem?

2010-03-10 Thread Avi Kivity

On 03/10/2010 11:59 AM, Neo Jia wrote:

hi,

I have to keep a 32-bit qemu user space to work with some legacy
library I have but still want to use 64-bit host Linux to explore
64-bit advantage.

So I am wondering if I can use a 32-bit qemu + 64-bit kvm-kmod
configuration.


It's fully supported.  It's less well tested than 64/64 or 32/32, so 
please report any bugs.



Will there be any limitation or drawback for this
configuration? I already get one that we can't assign guest physical
memory more than 2047 MB.
   


That is the only limitation AFAIK.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.

2010-03-10 Thread Takuya Yoshikawa

Gleb Natapov wrote:



Entering guest from time to time will not change semantics of the
processor (if code is not modified under processor's feet at least).
Currently we reenter guest mode after each iteration of string
instruction for all instruction but ins/outs.


E.g., is there no chance that during the repetitions, in the middle of the
repetitions, page faults occur? If it can, without entering the guest, can
we handle it?
 -- I lack some basic assumptions?


If page fault occurs we inject it to the guest.



Oh, I might have failed to explain what I was worried about.
I meant the opposite: I was worried about the case of NOT reentering the guest.

I know that current implementation with reentrance is OK.

To inject a page fault without reentering the guest, we need to add
some more hacks to the emulator IIUC.


Thanks,
  Takuya


32-bit qemu + 64-bit kvm be a problem?

2010-03-10 Thread Neo Jia
hi,

I have to keep a 32-bit qemu user space to work with some legacy
library I have but still want to use 64-bit host Linux to explore
the 64-bit advantage.

So I am wondering if I can use a 32-bit qemu + 64-bit kvm-kmod
configuration. Will there be any limitation or drawback for this
configuration? I already get one that we can't assign guest physical
memory more than 2047 MB.

Thanks,
Neo
-- 
I would remember that if researchers were not ambitious
probably today we haven't the technology we are using!


Re: Shadow page table questions

2010-03-10 Thread Avi Kivity

On 03/10/2010 06:57 AM, Marek Olszewski wrote:

Hello,

I was wondering if someone could point me to some documentation that 
explains the basic non-nested-paging shadow page table 
algorithm/strategy used by KVM.  I understand that KVM caches shadow 
page tables across context switches and that there is a reverse 
mapping and page protection to help zap shadow page tables when the 
guest page tables change.  However, I'm not entirely sure how the 
actual caching is done.  At first I assumed that KVM would change the 
host CR3 on every guest context switch such that it would point to a 
cached shadow page table for the currently running guest user thread, 
however, as far as I can tell, the host CR3 does not change so I'm a 
little lost.  If indeed it doesn't change the CR3, how does KVM solve 
the problem that arises when two processes in the guest OS share the 
same guest logical addresses?


The host cr3 does change, though not by using the 'mov cr3' instruction 
(that would cause the host to immediately switch to the guest address 
space, which would be bad).


See the calls to kvm_x86_ops->set_cr3().



I'm also interested in figuring out what KVM does when running with 
multiple virtual CPUs.  Looking at the code, I can see that each VCPU 
has its own root pointer to a shadow page table graph, but I have yet 
to figure out if this graph has node's shared between VCPUs, or 
whether they are all private.


Everything is shared.  If the guest is running with identical cr3s, kvm 
will load identical cr3s in guest mode.


An exception is when we use 32-bit pae mode.  In that case, the guest 
cr3s will be different (but guest PDPTRs will be identical).  Instead of 
dealing with the pae cr3, we deal with the four PDPTRs.
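The caching question can be illustrated with a toy user-space model: shadow roots are looked up by guest cr3, and a cached root is reused when the guest switches back to a cr3 that was seen before. This is purely illustrative; real KVM hashes shadow pages by (gfn, role) rather than keeping a flat array, and all names and constants below are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of shadow-root caching: find a cached shadow root for a
 * guest cr3, allocating a new slot on a miss.  The value returned is
 * what would be loaded into hardware (via the VMCS) on guest entry. */
enum { NROOTS = 8 };

struct shadow_root {
	uint64_t guest_cr3;  /* key: the cr3 the guest loaded */
	uint64_t shadow_cr3; /* value: root of the shadow page table */
	int used;
};

static struct shadow_root cache[NROOTS];
static uint64_t next_shadow = 0x1000; /* fake allocator cursor */

static uint64_t shadow_root_for(uint64_t guest_cr3)
{
	size_t i;

	for (i = 0; i < NROOTS; i++)
		if (cache[i].used && cache[i].guest_cr3 == guest_cr3)
			return cache[i].shadow_cr3; /* hit: reuse cached root */

	for (i = 0; i < NROOTS; i++)
		if (!cache[i].used) {
			cache[i] = (struct shadow_root){
				guest_cr3, next_shadow += 0x1000, 1 };
			return cache[i].shadow_cr3; /* miss: build new root */
		}

	return 0; /* cache full: real code would evict a root here */
}
```

A context switch in the guest (mov to cr3) then costs only a lookup when the root is cached, instead of rebuilding the shadow tables.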


--
error compiling committee.c: too many arguments to function



Re: KVM PMU virtualization

2010-03-10 Thread Zhang, Yanmin
On Thu, 2010-03-04 at 09:00 +0800, Zhang, Yanmin wrote:
> On Wed, 2010-03-03 at 11:15 +0100, Peter Zijlstra wrote:
> > On Wed, 2010-03-03 at 17:27 +0800, Zhang, Yanmin wrote:
> > > -#ifndef perf_misc_flags
> > > -#define perf_misc_flags(regs)  (user_mode(regs) ? PERF_RECORD_MISC_USER 
> > > : \
> > > -PERF_RECORD_MISC_KERNEL)
> > > -#define perf_instruction_pointer(regs) instruction_pointer(regs)
> > > -#endif 
> > 
> > Ah, that #ifndef is for powerpc, which I think you just broke.
> Thanks for the reminder. I deleted powerpc codes when building cscope
> lib.
> 
> It seems perf_save_virt_ip/perf_reset_virt_ip interfaces are ugly. I plan to
> change them to a callback function struct and kvm registers its version to 
> perf.
> 
> Such like:
> struct perf_guest_info_callbacks {
>   int (*is_in_guest)();
>   u64 (*get_guest_ip)();
>   int (*copy_guest_stack)();
>   int (*reset_in_guest)();
>   ...
> };
> int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *);
> int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *);
> 
> It's more scalable and neater.
In case you guys might lose patience, I worked out a new patch against 
2.6.34-rc1.

It could work with:
#perf kvm --guest --guestkallsyms /guest/os/kernel/proc/kallsyms --guestmodules /guest/os/proc/modules top
It also supports collecting both host side and guest side at the same time:
#perf kvm --host --guest --guestkallsyms /guest/os/kernel/proc/kallsyms --guestmodules /guest/os/proc/modules top

The first output line of top has guest kernel/user space percentage.

Or just host side:
#perf kvm --host

As the perf tool source code has lots of changes, I am still working on
perf kvm record and report.

---

diff -Nraup linux-2.6.34-rc1/arch/x86/include/asm/ptrace.h 
linux-2.6.34-rc1_work/arch/x86/include/asm/ptrace.h
--- linux-2.6.34-rc1/arch/x86/include/asm/ptrace.h  2010-03-09 
13:04:20.730596079 +0800
+++ linux-2.6.34-rc1_work/arch/x86/include/asm/ptrace.h 2010-03-10 
17:06:34.228953260 +0800
@@ -167,6 +167,15 @@ static inline int user_mode(struct pt_re
 #endif
 }
 
+static inline int user_mode_cs(u16 cs)
+{
+#ifdef CONFIG_X86_32
+   return (cs & SEGMENT_RPL_MASK) == USER_RPL;
+#else
+   return !!(cs & 3);
+#endif
+}
+
 static inline int user_mode_vm(struct pt_regs *regs)
 {
 #ifdef CONFIG_X86_32
diff -Nraup linux-2.6.34-rc1/arch/x86/kvm/vmx.c 
linux-2.6.34-rc1_work/arch/x86/kvm/vmx.c
--- linux-2.6.34-rc1/arch/x86/kvm/vmx.c 2010-03-09 13:04:20.758593132 +0800
+++ linux-2.6.34-rc1_work/arch/x86/kvm/vmx.c2010-03-10 17:11:49.709019136 
+0800
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "kvm_cache_regs.h"
 #include "x86.h"
 
@@ -3632,6 +3633,43 @@ static void update_cr8_intercept(struct 
vmcs_write32(TPR_THRESHOLD, irr);
 }
 
+DEFINE_PER_CPU(int, kvm_in_guest) = {0};
+
+static void kvm_set_in_guest(void)
+{
+   percpu_write(kvm_in_guest, 1);
+}
+
+static int kvm_is_in_guest(void)
+{
+   return percpu_read(kvm_in_guest);
+}
+
+static int kvm_is_user_mode(void)
+{
+   int user_mode;
+   user_mode = user_mode_cs(vmcs_read16(GUEST_CS_SELECTOR));
+   return user_mode;
+}
+
+static u64 kvm_get_guest_ip(void)
+{
+   return vmcs_readl(GUEST_RIP);
+}
+
+static void kvm_reset_in_guest(void)
+{
+   if (percpu_read(kvm_in_guest))
+   percpu_write(kvm_in_guest, 0);
+}
+
+static struct perf_guest_info_callbacks kvm_guest_cbs = {
+   .is_in_guest= kvm_is_in_guest,
+   .is_user_mode   = kvm_is_user_mode,
+   .get_guest_ip   = kvm_get_guest_ip,
+   .reset_in_guest = kvm_reset_in_guest
+};
+
 static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 {
u32 exit_intr_info;
@@ -3653,8 +3691,11 @@ static void vmx_complete_interrupts(stru
 
/* We need to handle NMIs before interrupts are enabled */
if ((exit_intr_info & INTR_INFO_INTR_TYPE_MASK) == INTR_TYPE_NMI_INTR &&
-   (exit_intr_info & INTR_INFO_VALID_MASK))
+   (exit_intr_info & INTR_INFO_VALID_MASK)) {
+   kvm_set_in_guest();
asm("int $2");
+   kvm_reset_in_guest();
+   }
 
idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
 
@@ -4251,6 +4292,8 @@ static int __init vmx_init(void)
if (bypass_guest_pf)
kvm_mmu_set_nonpresent_ptes(~0xffeull, 0ull);
 
+   perf_register_guest_info_callbacks(&kvm_guest_cbs);
+
return 0;
 
 out3:
@@ -4266,6 +4309,8 @@ out:
 
 static void __exit vmx_exit(void)
 {
+   perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
+
free_page((unsigned long)vmx_msr_bitmap_legacy);
free_page((unsigned long)vmx_msr_bitmap_longmode);
free_page((unsigned long)vmx_io_bitmap_b);
diff -Nraup linux-2.6.34-rc1/include/linux/perf_event.h 
linux-2.6.34-rc1_work/include/linux/perf_event.h
--- linux-2.6.34-rc1/include/li

Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-10 Thread Avi Kivity

On 03/10/2010 06:38 AM, Cam Macdonell wrote:

On Tue, Mar 9, 2010 at 5:03 PM, Paul Brook  wrote:
   

In a cross environment that becomes extremely hairy.  For example the x86
architecture effectively has an implicit write barrier before every
store, and an implicit read barrier before every load.
 

Btw, x86 doesn't have any implicit barriers due to ordinary loads.
Only stores and atomics have implicit barriers, afaik.
   

As of March 2009 [1] Intel guarantees that memory reads occur in order (they
may only be reordered relative to writes). It appears AMD do not provide this
guarantee, which could be an interesting problem for heterogeneous migration.

Paul

[1] The most recent docs I have handy. Up to and including Core 2 Duo.

 

Interesting, but what ordering would cause problems that AMD would do
but Intel wouldn't?  Wouldn't that ordering cause the same problems
for POSIX shared memory in general (regardless of Qemu) on AMD?
   


If some code was written for the Intel guarantees it would break if 
migrated to AMD.  Of course, it would also break if run on AMD in the 
first place.
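One portable way out is for guest code to use explicit acquire/release operations instead of relying on any single vendor's implicit ordering. A minimal user-space sketch with C11 atomics follows; the two-word flag protocol is illustrative and not part of the proposed device:

```c
#include <assert.h>
#include <stdatomic.h>

/* Publish data through shared memory with explicit ordering: the
 * release store on 'ready' orders the 'data' store before it, and the
 * acquire load pairs with it on the consumer side.  This is correct on
 * both Intel and AMD, regardless of their implicit guarantees. */
typedef struct {
	atomic_int data;
	atomic_int ready;
} shared_region;

static void producer(shared_region *r, int v)
{
	atomic_store_explicit(&r->data, v, memory_order_relaxed);
	/* release: everything above is visible before ready becomes 1 */
	atomic_store_explicit(&r->ready, 1, memory_order_release);
}

static int consumer(shared_region *r)
{
	/* acquire: once ready reads 1, the data store is visible too */
	while (!atomic_load_explicit(&r->ready, memory_order_acquire))
		;
	return atomic_load_explicit(&r->data, memory_order_relaxed);
}
```

Code written this way also survives heterogeneous migration, since it never depends on one architecture's stronger model.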



I think shared memory breaks migration anyway.
   


Until someone implements distributed shared memory.

--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-10 Thread Avi Kivity

On 03/09/2010 11:44 PM, Anthony Liguori wrote:
Ah yes.  For cross tcg environments you can map the memory using mmio 
callbacks instead of directly, and issue the appropriate barriers there.



Not good enough unless you want to severely restrict the use of shared 
memory within the guest.


For instance, it's going to be useful to assume that your atomic 
instructions remain atomic.  Crossing architecture boundaries here 
makes these assumptions invalid.  A barrier is not enough.


You could make the mmio callbacks flow to the shared memory server over 
the unix-domain socket, which would then serialize them.  Still need to 
keep RMWs as single operations.  When the host supports it, implement 
the operation locally (you can't render cmpxchg16b on i386, for example).


Shared memory only makes sense when using KVM.  In fact, we should 
actively disable the shared memory device when not using KVM.


Looks like that's the only practical choice.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] Inter-VM shared memory PCI device

2010-03-10 Thread Avi Kivity

On 03/09/2010 08:34 PM, Cam Macdonell wrote:

On Tue, Mar 9, 2010 at 10:28 AM, Avi Kivity  wrote:
   

On 03/09/2010 05:27 PM, Cam Macdonell wrote:
 
   
 

  Registers are used
for synchronization between guests sharing the same memory object when
interrupts are supported (this requires using the shared memory server).


   

How does the driver detect whether interrupts are supported or not?

 

At the moment, the VM ID is set to -1 if interrupts aren't supported,
but that may not be the clearest way to do things.  With UIO is there
a way to detect if the interrupt pin is on?

   

I suggest not designing the device to uio.  Make it a good guest-independent
device, and if uio doesn't fit it, change it.

Why not support interrupts unconditionally?  Is the device useful without
interrupts?
 

Currently my patch works with or without the shared memory server.  If
you give the parameter

-ivshmem 256,foo

then this will create (if necessary) and map /dev/shm/foo as the
shared region without interrupt support.  Some users of shared memory
are using it this way.

Going forward we can require the shared memory server and always have
interrupts enabled.
   


Can you explain how they synchronize?  Polling?  Using the network?  
Using it as a shared cache?


If it's a reasonable use case it makes sense to keep it.

Another thing comes to mind - a shared memory ID, in case a guest has 
multiple cards.
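Without the server, peers can still synchronize by polling words in the shared region. A hedged host-side sketch of mapping the same backing object twice follows; the path and word layout are illustrative (the patch, as described above, uses an object under /dev/shm):

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a shared backing file the way -ivshmem exposes one: create it if
 * needed, size it, and mmap it MAP_SHARED so all mappers see the same
 * bytes.  Peers then poll words in the region instead of taking
 * interrupts. */
static volatile uint32_t *map_ivshmem(const char *path, size_t len)
{
	int fd = open(path, O_CREAT | O_RDWR, 0600);
	void *p;

	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return NULL;
	}
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd); /* the mapping keeps the object alive */
	return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}
```

A writer stores into the region and a reader spins on (or periodically checks) a flag word, which is the polling mode of use mentioned in the thread.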


--
error compiling committee.c: too many arguments to function



Re: [PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.

2010-03-10 Thread Gleb Natapov
On Wed, Mar 10, 2010 at 06:12:34PM +0900, Takuya Yoshikawa wrote:
> Gleb Natapov wrote:
> >On Wed, Mar 10, 2010 at 11:30:20AM +0900, Takuya Yoshikawa wrote:
> >>Gleb Natapov wrote:
> >>>On Tue, Mar 09, 2010 at 04:50:29PM +0200, Avi Kivity wrote:
> On 03/09/2010 04:09 PM, Gleb Natapov wrote:
> >Currently when string instruction is only partially complete we go back
> >to a guest mode, guest tries to reexecute instruction and exits again
> >and at this point emulation continues. Avoid all of this by restarting
> >instruction without going back to a guest mode.
> What happens if rcx is really big?  Going back into the guest gave
> us a preemption point.
> 
> >>>Two solutions. We can check if reschedule is required and yield cpu if
> >>>needed. Or we can enter guest from time to time.
> >>One generic question: from the viewpoint of KVM's policy, is it OK to make
> >>the semantics different from real CPUs?
> >>
> >>Semantics, may be better to use other words, but I'm little bit worried that
> >>the second solution may change something, not mentioning about bugs but some
> >>behavior patterns depending on the "time to time".
> >>
> >Entering guest from time to time will not change semantics of the
> >processor (if code is not modified under processor's feet at least).
> >Currently we reenter guest mode after each iteration of string
> >instruction for all instruction but ins/outs.
> >
> 
> E.g., is there no chance that during the repetitions, in the middle of the
> repetitions, page faults occur? If it can, without entering the guest, can
> we handle it?
>  -- I lack some basic assumptions?
> 
If page fault occurs we inject it to the guest.

--
Gleb.


Re: [PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.

2010-03-10 Thread Avi Kivity

On 03/10/2010 11:12 AM, Takuya Yoshikawa wrote:

Entering guest from time to time will not change semantics of the
processor (if code is not modified under processor's feet at least).
Currently we reenter guest mode after each iteration of string
instruction for all instruction but ins/outs.




E.g., is there no chance that during the repetitions, in the middle of 
the
repetitions, page faults occur? If it can, without entering the guest, 
can

we handle it?


Page faults can occur, and we need to handle them.

Another reason for reentering the guest is so that we can inject guest 
interrupts.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.

2010-03-10 Thread Avi Kivity

On 03/09/2010 08:11 PM, Gleb Natapov wrote:

On Tue, Mar 09, 2010 at 04:50:29PM +0200, Avi Kivity wrote:
   

On 03/09/2010 04:09 PM, Gleb Natapov wrote:
 

Currently when string instruction is only partially complete we go back
to a guest mode, guest tries to reexecute instruction and exits again
and at this point emulation continues. Avoid all of this by restarting
instruction without going back to a guest mode.
   

What happens if rcx is really big?  Going back into the guest gave
us a preemption point.

 

Two solutions. We can check if reschedule is required and yield cpu if
needed. Or we can enter guest from time to time.
   


I'd stick with the current solution, reentering the guest every page or so.
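The "every page or so" idea can be sketched in user space: emulate the string operation in bounded chunks and return to the caller between chunks, which is where KVM would re-enter the guest (or check need_resched() and deliver interrupts). The chunk size and function names below are illustrative, not KVM's.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { CHUNK = 4096 }; /* roughly "one page" of iterations per pass */

/* Do at most CHUNK iterations of a rep movsb-style copy, decrementing
 * the remaining count the way the instruction decrements rcx. */
static size_t emulate_rep_movsb_chunk(char *dst, const char *src,
				      size_t *rcx)
{
	size_t n = *rcx < CHUNK ? *rcx : CHUNK;

	memcpy(dst, src, n);
	*rcx -= n;
	return n; /* bytes completed this pass */
}

/* Driver loop: even a huge rcx never runs unbounded, because control
 * returns between chunks. */
static void emulate_rep_movsb(char *dst, const char *src, size_t rcx)
{
	size_t off = 0;

	while (rcx) {
		off += emulate_rep_movsb_chunk(dst + off, src + off, &rcx);
		/* preemption point: in KVM, re-enter the guest or
		 * check for reschedule/pending interrupts here */
	}
}
```

The architectural state (remaining count, offsets) is consistent at every chunk boundary, so interrupting the instruction there matches how real hardware makes rep-prefixed instructions interruptible.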

--
error compiling committee.c: too many arguments to function



Re: [PATCH 19/24] KVM: x86 emulator: fix in/out emulation.

2010-03-10 Thread Avi Kivity

On 03/09/2010 08:09 PM, Gleb Natapov wrote:



We don't want to enter the emulator for non-string in/out.  Leftover
test code?

 

No, unfortunately this is not leftover. I just don't see a way we
can bypass the emulator and still have the emulator able to emulate in/out
(for big real mode, for instance). The problem is basically described in
the commit message. If we have a function outside of the emulator that does
in/out emulation on the vcpu directly, then the emulator can't use it, since
committing shadowed registers will overwrite the result of emulation.
Having two different emulations (one outside of the emulator and another in
the emulator) is also problematic, since when userspace returns after an IO
exit we don't know which emulation to continue. If we want to avoid
instruction decoding we can fill in the emulation context from exit info as
if the instruction was already decoded and call the emulator.

   


Alternatively, another entry point would be fine.  in/out is a fast path 
(used for virtio for example).


--
error compiling committee.c: too many arguments to function



Re: [PATCH 15/24] KVM: x86 emulator: Provide more callbacks for x86 emulator.

2010-03-10 Thread Avi Kivity

On 03/09/2010 07:57 PM, Gleb Natapov wrote:


Descriptor writes need an atomic kvm_set_guest_bit(), no?

 

It is? atomic against what? Current code just write whole descriptor
using write_std().
   

These are accessed bit changes, and are done atomically in the same
way as a page table walk sets the accessed and dirty bit.
Presumably the atomic operation is to allow the kernel to scan
segments and swap them out if they are not used.

 

We can use cmpxchg callback for that, no?

   


Yes.
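A hedged sketch of such an atomic accessed-bit update follows, using GCC's __sync builtin on plain memory for illustration; the emulator would instead go through its cmpxchg callback on guest memory, and the helper name and return convention here are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 40 of an x86 segment descriptor is the 'accessed' bit.  Set it
 * with compare-and-swap so a concurrent scanner (e.g. a kernel looking
 * for unused segments) never sees a torn or lost update, analogous to
 * how a hardware page walk sets accessed/dirty bits. */
#define SEG_DESC_ACCESSED (1ULL << 40)

static int set_desc_accessed(volatile uint64_t *desc)
{
	uint64_t old, newv;

	do {
		old = *desc;
		if (old & SEG_DESC_ACCESSED)
			return 0; /* already set, nothing to do */
		newv = old | SEG_DESC_ACCESSED;
	} while (!__sync_bool_compare_and_swap(desc, old, newv));

	return 1; /* we transitioned the bit */
}
```

The retry loop preserves whatever other fields change concurrently, which a plain read-modify-write through write_std() would not.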

--
error compiling committee.c: too many arguments to function



Re: [PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.

2010-03-10 Thread Takuya Yoshikawa

Gleb Natapov wrote:

On Wed, Mar 10, 2010 at 11:30:20AM +0900, Takuya Yoshikawa wrote:

Gleb Natapov wrote:

On Tue, Mar 09, 2010 at 04:50:29PM +0200, Avi Kivity wrote:

On 03/09/2010 04:09 PM, Gleb Natapov wrote:

Currently when string instruction is only partially complete we go back
to a guest mode, guest tries to reexecute instruction and exits again
and at this point emulation continues. Avoid all of this by restarting
instruction without going back to a guest mode.

What happens if rcx is really big?  Going back into the guest gave
us a preemption point.


Two solutions. We can check if reschedule is required and yield cpu if
needed. Or we can enter guest from time to time.

One generic question: from the viewpoint of KVM's policy, is it OK to make
the semantics different from real CPUs?

Semantics, may be better to use other words, but I'm little bit worried that
the second solution may change something, not mentioning about bugs but some
behavior patterns depending on the "time to time".


Entering guest from time to time will not change semantics of the
processor (if code is not modified under processor's feet at least).
Currently we reenter guest mode after each iteration of string
instruction for all instruction but ins/outs.



E.g., is there no chance that during the repetitions, in the middle of the
repetitions, page faults occur? If it can, without entering the guest, can
we handle it?
 -- I lack some basic assumptions?


--
Gleb.


Re: [PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.

2010-03-10 Thread Gleb Natapov
On Wed, Mar 10, 2010 at 11:30:20AM +0900, Takuya Yoshikawa wrote:
> Gleb Natapov wrote:
> >On Tue, Mar 09, 2010 at 04:50:29PM +0200, Avi Kivity wrote:
> >>On 03/09/2010 04:09 PM, Gleb Natapov wrote:
> >>>Currently when string instruction is only partially complete we go back
> >>>to a guest mode, guest tries to reexecute instruction and exits again
> >>>and at this point emulation continues. Avoid all of this by restarting
> >>>instruction without going back to a guest mode.
> >>What happens if rcx is really big?  Going back into the guest gave
> >>us a preemption point.
> >>
> >Two solutions. We can check if reschedule is required and yield cpu if
> >needed. Or we can enter guest from time to time.
> 
> One generic question: from the viewpoint of KVM's policy, is it OK to make
> the semantics different from real CPUs?
> 
> Semantics, may be better to use other words, but I'm little bit worried that
> the second solution may change something, not mentioning about bugs but some
> behavior patterns depending on the "time to time".
> 
Entering guest from time to time will not change semantics of the
processor (if code is not modified under processor's feet at least).
Currently we reenter guest mode after each iteration of string
instruction for all instruction but ins/outs.

--
Gleb.


Re: [PATCH] KVM test: Make sure check_image script runs on VMs turned off

2010-03-10 Thread Michael Goldish

- "Lucas Meneghel Rodrigues"  wrote:

> As it is hard to guarantee that a qcow2 image will be in a
> consistent state with a VM turned on, take an extra safety
> step and make sure the preprocessor shuts down the VMs
> before the post process command check_image.py runs.
> 
> Signed-off-by: Lucas Meneghel Rodrigues 
> ---
>  client/tests/kvm/tests_base.cfg.sample |2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/client/tests/kvm/tests_base.cfg.sample
> b/client/tests/kvm/tests_base.cfg.sample
> index 340b0c0..beae786 100644
> --- a/client/tests/kvm/tests_base.cfg.sample
> +++ b/client/tests/kvm/tests_base.cfg.sample
> @@ -1049,6 +1049,8 @@ variants:
>  post_command = " python scripts/check_image.py;"
>  remove_image = no
>  post_command_timeout = 600
> +kill_vm = yes
> +kill_vm_gracefully = yes

That's not necessarily bad, but this may significantly slow down
testing, because it means the VM will shut down and boot up again
after every qcow2 test.  It'll also separate the tests in an
unnatural way, eliminating the possibility of catching problems
that only appear after several consecutive tests (such problems
may or may not be possible, I'm not sure).
Maybe we should consider specifying the post_command for only some
of the tests, or add a dedicated test for this purpose, or even
a no-op test that only shuts down the VM and runs the post command.

>  - vmdk:
>  only Fedora Ubuntu Windows
>  only smp2
> -- 
> 1.6.6.1
> 


Question on stopping KVM start at boot

2010-03-10 Thread satimis

Hi folks.

Host - ubuntu 9.10 64bit
Virtualizer - KVM

I need to stop KVM from starting at boot.

I added the following 2 lines at the bottom of /etc/modprobe.d/blacklist.conf:
blacklist kvm
blacklist kvm-amd

Then I rebooted the PC.

It doesn't work:

$ lsmod | grep kvm
kvm_amd                41556  0
kvm                   190648  1 kvm_amd

What further command do I have to run in order to activate the new
blacklist.conf?
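For reference, blacklist entries only stop modprobe's alias-based autoloading; modules that are already loaded, or that an init script (such as the qemu-kvm package's) loads by explicit name, are unaffected. A hedged sketch, assuming stock Ubuntu paths and tools:

```shell
# Unload the modules right now (kvm_amd uses kvm, so remove it first).
sudo rmmod kvm_amd kvm

# Blacklisting in /etc/modprobe.d only blocks autoloading by alias;
# to also block explicit "modprobe kvm-amd" from scripts, an install
# override is a common approach:
echo "install kvm-amd /bin/false" | sudo tee -a /etc/modprobe.d/blacklist.conf

# Pick up the modprobe configuration for early boot as well.
sudo update-initramfs -u

# After the next reboot, verify:
lsmod | grep kvm || echo "kvm not loaded"
```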


TIA


B.R.
Stephen L
