Re: debugging windows guests

2009-12-16 Thread Jan Kiszka
Raindog wrote:
 On 12/15/2009 3:39 PM, Jan Kiszka wrote:
 Raindog wrote:
   Hello,
 
   I am researching KVM as a malware analysis platform and had some
   questions about debugging the guest OS. In my case I intend to use
   Windows guests. So my questions are as follows:
 
   Questions:
 
   1. What instrumentation facilities are available?
 
   2. Is it possible to extend the debugging interface so that debugging is
   more transparent to the guest OS? IE: there is still a limit of 4 HW
   breakpoints (which makes me wonder why a LIST is used for them...)

 In accelerated KVM mode, the x86 architecture restricts us to 4 break-
 or watchpoints that can be active at the same time. If you switch to
 emulation mode, there are no such limits. Actually, I just made use of
 this for debugging a subtle stack corruption in a guest, and I had more
 than 70 watchpoints active at the same time. It's just slightly slower
 than KVM...

 
 Ok, is there anything special that needs to be done to enable the
 additional watchpoints, as they are being called? How are these set,
 btw? Is it accomplished transparently through gdb? IE: if you set a
 watchpoint at a specific address under emulation mode, is it simulated
 like a HW breakpoint, in that no code is modified via the injection of
 an int 3?

Yes, break- and watchpoints are transparent to the guest in emulation
mode. In KVM mode, hardware breakpoints do not require int 3, but they
are limited and the guest may notice that its own breakpoints have no
effect as long as the host has injected some.
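For the archives, a minimal session illustrating this (a sketch only: the
qemu `-s`/`-S` gdbstub options and gdb's remote protocol are standard, but
the disk image name and the addresses are placeholders):

```shell
# start the guest paused, in pure emulation, with a gdbstub on tcp:1234
# (for qemu-kvm use -no-kvm; upstream QEMU is TCG by default)
qemu-system-x86_64 -hda guest.img -no-kvm -s -S &

# then, from gdb on the host:
#   (gdb) target remote localhost:1234
#   (gdb) watch *(int *)0x80100000   # emulated by TCG, no debug register used
#   (gdb) hbreak *0xc0101234         # would map to dr0-dr3 in KVM mode
#   (gdb) continue
```

Under TCG you can keep adding watchpoints past four; in KVM mode the same
commands compete for the guest's four hardware debug registers.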

 
 
   3. I'm not finding any published API for interfacing with KVM/KQEMU/QEMU
   at a low level, for example, for writing custom tracers, etc. Is there
   one? Or is there something similar?

 KVM provides tracepoints for the Linux ftrace framework, see related
 documentation of the kernel.
 
 I found this http://lxr.linux.no/#linux+v2.6.27/Documentation/ftrace.txt
 but that can hardly be called documentation. I don't think something
 like this is unreasonable:
 http://www.pintool.org/tutorials/asplos08/slides/PinTutorial.pdf

2.6.27 is too old anyway. There should be at least one LWN.net article
on this, and also quite a few presentations and papers; just ask your
favorite search engine.
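As a starting point, a sketch of how one would poke at those tracepoints on
a current host (the paths are the standard debugfs tracing layout; this
needs root, a kernel with the kvm trace events, and a running guest):

```shell
# kvm tracepoints live under the ftrace events directory
mount -t debugfs none /sys/kernel/debug 2>/dev/null

# enable all kvm events and watch them stream by
echo 1 > /sys/kernel/debug/tracing/events/kvm/enable
cat /sys/kernel/debug/tracing/trace_pipe    # kvm_entry, kvm_exit, ...
```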

 
 If you extend your guest
 Windows is by design not extensible.

Depends on where and how you want to hook into it. Of course, its kernel
is out of reach. But if you are interested e.g. in marking specific I/O
requests, you could write your own driver and hook into the stack.

 
   to issue certain
 events that the hypervisor sees and traces (e.g. writes to pseudo I/O
 ports), you can also trace things inside the guest that are otherwise
 invisible to the host. I once hacked up an ad-hoc tracing by means of
 hypercalls (required some kvm patching). That also worked from guest
 userspace - and revealed that even more hypercalls could be called that
 way (that's fixed in KVM now).
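To make the pseudo-I/O-port idea concrete, a guest-side sketch (the port
number 0x510 is an arbitrary placeholder and the host needs a matching
ioport handler or tracepoint to pick the writes up; ioperm() requires root
in the guest, and this is x86 Linux only):

```c
#include <stdio.h>
#include <sys/io.h>         /* ioperm(), outl() */

#define MARKER_PORT 0x510   /* hypothetical marker port, must match host side */

int main(void)
{
	if (ioperm(MARKER_PORT, 4, 1)) {
		perror("ioperm");
		return 1;
	}
	/* each write causes a VM exit, which the hypervisor can log/trace */
	outl(0xdeadbeef, MARKER_PORT);
	return 0;
}
```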

 
 
   Bugs:
 
   1. I hit a bug w/ instruction logging using a RAM based temp folder. If
   I ran w/ the following command line:
   (Version info: QEMU PC emulator version 0.10.50 (qemu-kvm-devel-88))
 
   qemu-system-x86_64 -hda debian.img -enable-nesting -d in_asm

 -d only works in emulation mode as it relies on dynamic code translation
 (TCG). For qemu-kvm, you need to switch to emulation via -no-kvm (for
 upstream QEMU, it's the other way around).

 
 That explains why running w/out admin rights enables the logging. I was
 confused because the logfile is still created w/ the kvm module disabled.

It's created but remains empty for obvious reasons in KVM mode.

Jan





Re: debugging windows guests

2009-12-16 Thread Vadim Rozenfeld
On Wed, 2009-12-16 at 00:39 +0100, Jan Kiszka wrote:
 Raindog wrote:
  Hello,
  
  I am researching KVM as a malware analysis platform and had some
  questions about debugging the guest OS. In my case I intend to use
  Windows guests. So my questions are as follows:
  
  Questions:
  
  1. What instrumentation facilities are available?
  
  2. Is it possible to extend the debugging interface so that debugging is
  more transparent to the guest OS? IE: there is still a limit of 4 HW
  breakpoints (which makes me wonder why a LIST is used for them...)
 
 In accelerated KVM mode, the x86 architecture restricts us to 4 break-
 or watchpoints that can be active at the same time. If you switch to
 emulation mode, there are no such limits. Actually, I just made use of
 this for debugging a subtle stack corruption in a guest, and I had more
 than 70 watchpoints active at the same time. It's just slightly slower
 than KVM...
 
  
  3. I'm not finding any published API for interfacing with KVM/KQEMU/QEMU
  at a low level, for example, for writing custom tracers, etc. Is there
  one? Or is there something similar?
 
 KVM provides tracepoints for the Linux ftrace framework, see related
 documentation of the kernel. If you extend your guest to issue certain
 events that the hypervisor sees and traces (e.g. writes to pseudo I/O
 ports), you can also trace things inside the guest that are otherwise
 invisible to the host.
You can WRITE_PORT_BUFFER_UCHAR to com1/com2 port when you are in kernel
mode. 
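Roughly like this from a Windows driver (a sketch only: COM1's legacy port
base 0x3F8 is assumed here, and a real driver would take the port from its
translated hardware resources rather than hardcoding it; the host can
capture the output with e.g. -serial file:guest.log):

```c
#include <ntddk.h>

/* hypothetical helper: push a trace buffer out through COM1's data port */
static VOID TraceToCom1(PUCHAR Buffer, ULONG Length)
{
	PUCHAR Com1Data = (PUCHAR)(ULONG_PTR)0x3F8; /* legacy COM1 base */

	WRITE_PORT_BUFFER_UCHAR(Com1Data, Buffer, Length);
}
```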
  I once hacked up an ad-hoc tracing by means of
 hypercalls (required some kvm patching). That also worked from guest
 userspace - and revealed that even more hypercalls could be called that
 way (that's fixed in KVM now).
 
  
  
  Bugs:
  
  1. I hit a bug w/ instruction logging using a RAM based temp folder. If
  I ran w/ the following command line:
  (Version info: QEMU PC emulator version 0.10.50 (qemu-kvm-devel-88))
  
  qemu-system-x86_64 -hda debian.img -enable-nesting -d in_asm
 
 -d only works in emulation mode as it relies on dynamic code translation
 (TCG). For qemu-kvm, you need to switch to emulation via -no-kvm (for
 upstream QEMU, it's the other way around).
 
  
  It would successfully log to the tmp log file, but obviously, KVM would
  be disabled.
  
  If I use sudo, it won't log to the file, is this a known issue?
  
  2. -enable-nesting on AMD hardware using a xen guest OS causes xen to
  GPF somewhere in svm_cpu_up. Is nesting supposed to work w/ Xen based
  guests?
 
 If your host kernel or kvm-kmod is not 2.6.32 based, update first. A lot
 of nested SVM fixes went in recently. If it still fails, put Alex (Graf)
 and Joerg (Roedel) on CC.
 
 Jan
 


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] KVM: Extended shared_msr_global to per CPU

2009-12-16 Thread Avi Kivity

On 12/16/2009 07:48 AM, Sheng Yang wrote:

shared_msr_global saved the host values of the relevant MSRs, but it
assumed that all the MSRs it tracked share the same value across the
different CPUs. That's not true for some MSRs, e.g. MSR_TSC_AUX.

Extend it to per-CPU data to support MSR_TSC_AUX and similar MSRs.

Note that shared_msr_global still has one assumption: it can only deal
with MSRs whose host values won't change after the KVM module is loaded.



-void kvm_define_shared_msr(unsigned slot, u32 msr)
+static void shared_msr_update(void *data)
  {
-   int cpu;
+   struct kvm_shared_msrs *smsr;
+   u32 msr = *(u32 *)data;
+   int slot;
	u64 value;

+   smsr = &get_cpu_var(shared_msrs);


Can use __get_cpu_var() since interrupts are disabled.


+   /* only read, and nobody should modify it at this time,
+* so don't need lock */
+   for (slot = 0; slot < shared_msrs_global.nr; slot++)
+   if (shared_msrs_global.msrs[slot] == msr)
+   break;
+   if (slot >= shared_msrs_global.nr) {
+   printk(KERN_ERR "kvm: can't find the defined MSR!");
+   return;
+   }
+   rdmsrl_safe(msr, &value);
+   smsr->values[slot].host = value;
+   smsr->values[slot].curr = value;
+   put_cpu_var(shared_msrs);
+}
+
+void kvm_define_shared_msr(unsigned slot, u32 msr)
+{
	if (slot >= shared_msrs_global.nr)
		shared_msrs_global.nr = slot + 1;
-   shared_msrs_global.msrs[slot].msr = msr;
-   rdmsrl_safe(msr, &value);
-   shared_msrs_global.msrs[slot].value = value;
-   for_each_online_cpu(cpu)
-   per_cpu(shared_msrs, cpu).current_value[slot] = value;
+   shared_msrs_global.msrs[slot] = msr;
+   /* make sure shared_msrs_global has been updated before the IPIs */
+   smp_wmb();
+   smp_call_function(shared_msr_update, &msr, 1);


on_each_cpu() is preferred.


+   shared_msr_update(&msr);
  }
  EXPORT_SYMBOL_GPL(kvm_define_shared_msr);

  static void kvm_shared_msr_cpu_online(void)
  {
unsigned i;
-   struct kvm_shared_msrs *locals = &__get_cpu_var(shared_msrs);
+   struct kvm_shared_msrs *smsr = &__get_cpu_var(shared_msrs);

	for (i = 0; i < shared_msrs_global.nr; ++i)
-   locals->current_value[i] = shared_msrs_global.msrs[i].value;
+   smsr->values[i].curr = smsr->values[i].host;


If the cpu is being onlined for the first time, then .host will not have 
been initialized.  Need to call shared_msr_update().


Also need to verify this is called after all the msrs have been 
initialized by the kernel, or we'll read default values.
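For reference, the shape being suggested (a sketch against the patch's own
helpers, not a tested change): on_each_cpu() runs the callback on every
online CPU including the current one, with interrupts disabled in the IPI
handler, which also makes __get_cpu_var() safe inside the callback.

```c
/* sketch: replace smp_call_function() + an explicit local call
 * with a single on_each_cpu() */
void kvm_define_shared_msr(unsigned slot, u32 msr)
{
	if (slot >= shared_msrs_global.nr)
		shared_msrs_global.nr = slot + 1;
	shared_msrs_global.msrs[slot] = msr;
	/* publish the slot before any CPU reads it in the callback */
	smp_wmb();
	on_each_cpu(shared_msr_update, &msr, 1);  /* includes this CPU */
}
```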



--
error compiling committee.c: too many arguments to function



Re: debugging windows guests

2009-12-16 Thread Alexander Graf


Am 16.12.2009 um 09:14 schrieb Vadim Rozenfeld vroze...@redhat.com:


On Wed, 2009-12-16 at 00:39 +0100, Jan Kiszka wrote:

Raindog wrote:

Hello,

I am researching KVM as a malware analysis platform and had some
questions about debugging the guest OS. In my case I intend to use
Windows guests. So my questions are as follows:

Questions:

1. What instrumentation facilities are available?

2. Is it possible to extend the debugging interface so that debugging is
more transparent to the guest OS? IE: there is still a limit of 4 HW
breakpoints (which makes me wonder why a LIST is used for them...)


In accelerated KVM mode, the x86 architecture restricts us to 4 break-
or watchpoints that can be active at the same time. If you switch to
emulation mode, there are no such limits. Actually, I just made use of
this for debugging a subtle stack corruption in a guest, and I had more
than 70 watchpoints active at the same time. It's just slightly slower
than KVM...



3. I'm not finding any published API for interfacing with KVM/KQEMU/QEMU
at a low level, for example, for writing custom tracers, etc. Is there
one? Or is there something similar?


KVM provides tracepoints for the Linux ftrace framework, see related
documentation of the kernel. If you extend your guest to issue certain
events that the hypervisor sees and traces (e.g. writes to pseudo I/O
ports), you can also trace things inside the guest that are otherwise
invisible to the host.

You can WRITE_PORT_BUFFER_UCHAR to com1/com2 port when you are in kernel
mode.

I once hacked up an ad-hoc tracing by means of
hypercalls (required some kvm patching). That also worked from guest
userspace - and revealed that even more hypercalls could be called that
way (that's fixed in KVM now).




Bugs:

1. I hit a bug w/ instruction logging using a RAM based temp folder. If
I ran w/ the following command line:
(Version info: QEMU PC emulator version 0.10.50 (qemu-kvm-devel-88))

qemu-system-x86_64 -hda debian.img -enable-nesting -d in_asm


-d only works in emulation mode as it relies on dynamic code translation
(TCG). For qemu-kvm, you need to switch to emulation via -no-kvm (for
upstream QEMU, it's the other way around).


It would successfully log to the tmp log file, but obviously, KVM would
be disabled.

If I use sudo, it won't log to the file, is this a known issue?

2. -enable-nesting on AMD hardware using a xen guest OS causes xen to
GPF somewhere in svm_cpu_up. Is nesting supposed to work w/ Xen based
guests?


If your host kernel or kvm-kmod is not 2.6.32 based, update first. A lot
of nested SVM fixes went in recently. If it still fails, put Alex (Graf)
and Joerg (Roedel) on CC.


Also make sure you pass nested=1 to kvm-amd.ko.

Xen definitely worked for me, so you're probably just missing one of  
the many magic bits :-).


Alex


Re: [PATCH 4/4] KVM: VMX: Add instruction rdtscp support for guest

2009-12-16 Thread Avi Kivity

On 12/16/2009 07:48 AM, Sheng Yang wrote:

Before enabling, execution of rdtscp in guest would result in #UD.


diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4f865e8..3a84acf 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -532,6 +532,7 @@ struct kvm_x86_ops {
int (*get_tdp_level)(void);
u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
bool (*gb_page_enable)(void);
+   bool (*rdtscp_enable)(void);
   


Naming - a better name is rdtscp_supported().  rdtscp_enable sounds like
you change it from disabled to enabled.



@@ -913,6 +924,9 @@ static void setup_msrs(struct vcpu_vmx *vmx)
	index = __find_msr_index(vmx, MSR_CSTAR);
	if (index >= 0)
		move_msr_up(vmx, index, save_nmsrs++);
+   index = __find_msr_index(vmx, MSR_TSC_AUX);
+   if (index >= 0)
+   move_msr_up(vmx, index, save_nmsrs++);
   


Only if rdtscp is enabled in the guest's cpuid, so we don't play with 
this unnecessarily.  If it isn't, we should trap rdtscp and inject #UD.  
If it is, support the msr and don't trap.



/*
 * MSR_K6_STAR is only needed on long mode guests, and only
 * if efer.sce is enabled.
@@ -1002,6 +1016,10 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
case MSR_IA32_SYSENTER_ESP:
data = vmcs_readl(GUEST_SYSENTER_ESP);
break;
+   case MSR_TSC_AUX:
+   if (!vmx_rdtscp_enable())
+   return 1;
   


Again, check the guest rdtscp bit, not (just) the host's.


--
error compiling committee.c: too many arguments to function



Re: Memory under KVM?

2009-12-16 Thread Avi Kivity

On 12/16/2009 01:21 AM, Thomas Fjellstrom wrote:



The problem is it should be automatic. The balloon driver itself or
some other mechanism should be capable of noticing when it can free up
a bunch of guest memory. I can't be bothered to manually sit around and
monitor memory usage on my host so I can then go into virt-manager to
reduce memory to each guest.
   

That should be pretty easy though it will have an effect on guest
performance.

 

As long as it's only done after an appropriately long idle period (ie:
there's been X MBs free for a long time, give it back), I can't see it
harming performance too much. At least not more than setting ram too low
when manually (de)ballooning memory.
   


It depends on what your expectations are.  If you have a lot of memory 
you might be surprised when you access an idle guest and have to wait 
for it to page itself back from disk.


--
error compiling committee.c: too many arguments to function



Re: Memory under KVM?

2009-12-16 Thread Thomas Fjellstrom
On Wed December 16 2009, Avi Kivity wrote:
 On 12/16/2009 01:21 AM, Thomas Fjellstrom wrote:
  The problem is it should be automatic. The balloon driver itself or
  some other mechanism should be capable of noticing when it can free
  up a bunch of guest memory. I can't be bothered to manually sit
  around and monitor memory usage on my host so I can then go into
  virt-manager to reduce memory to each guest.
 
  That should be pretty easy though it will have an effect on guest
  performance.
 
  As long as it's only done after an appropriately long idle period (ie:
  there's been X MBs free for a long time, give it back), I can't see it
  harming performance too much. At least not more than setting ram too
  low when manually (de)ballooning memory.
 
 It depends on what your expectations are.  If you have a lot of memory
 you might be surprised when you access an idle guest and have to wait
 for it to page itself back from disk.
 

Why would it be swapping in that case? Only unused/free/cache memory should 
be returned to the host.

-- 
Thomas Fjellstrom
tfjellst...@shaw.ca


Re: [PATCH] KVM: x86: Adjust KVM_VCPUEVENT flag names

2009-12-16 Thread Avi Kivity

On 12/16/2009 12:57 AM, Jan Kiszka wrote:

They are also used on KVM_GET_VCPU_EVENTS, so 'VALID' is a better name
element than 'SET'.

   


Thanks - applied and queued.

--
error compiling committee.c: too many arguments to function



Re: Memory under KVM?

2009-12-16 Thread Avi Kivity

On 12/16/2009 11:58 AM, Thomas Fjellstrom wrote:



It depends on what your expectations are.  If you have a lot of memory
you might be surprised when you access an idle guest and have to wait
for it to page itself back from disk.

 

Why would it be swapping in that case? Only unused/free/cache memory should
be returned to the host.


Right, it would return cache memory, and when you use the guest next 
time, it will have to refill its cache.


--
error compiling committee.c: too many arguments to function



Re: Memory under KVM?

2009-12-16 Thread Thomas Fjellstrom
On Wed December 16 2009, Avi Kivity wrote:
 On 12/16/2009 11:58 AM, Thomas Fjellstrom wrote:
  It depends on what your expectations are.  If you have a lot of memory
  you might be surprised when you access an idle guest and have to wait
  for it to page itself back from disk.
 
  Why would it be swaping in that case? Only unused/free/cache memory
  should be returned to the host

Unless of course you were referring to the case of manually de-ballooning 
memory in the guests. Yes, swapping in the guests is slow, and you should 
try not to set the memory limit (-m) too small for a given workload.

Having a dynamic ballooning feature that did not actually change the guest's 
view of ram wouldn't have that problem, especially since you're not 
returning any memory that's in use in the guest. And since KVM already 
supports running with large ranges of its assigned memory not actually 
assigned to it, dynamic ballooning probably isn't hard to support.

The memory over-commit rate on my old setup was rather astonishing. A 
couple of my guests would eventually get as low as showing 10MB ram in use. 
Even the larger memory users would get down as low as 1/5th the allocated 
ram after sitting mostly idle for a while. But since the full assigned ram 
is sometimes needed, just reducing the total assignment isn't a good option.

 Right, it would return cache memory, and when you use the guest next 
 time, it will have to refill its cache.

Sure, but there are hours where the guests can run with minimal memory use. 
It would allow one to run many more guests at the same time, if you know 
some/many of them won't always be using all of their assigned ram.

-- 
Thomas Fjellstrom
tfjellst...@shaw.ca


seabios question

2009-12-16 Thread Liu, Jinsong
Hi, Avi

Recently I wanted to update the HVM vCPU add/remove feature of KVM.
I went through qemu-kvm and found that the original Bochs BIOS code (i.e.,
qemu-kvm/kvm/bios/acpi-dsdt.asl, ...) has been removed.

Now qemu-kvm uses the seabios git repo
http://git.kernel.org/?p=virt/kvm/seabios.git;a=summary as its BIOS, right?
If I want to update the seabios DSDT logic, may I build and then copy
seabios/out/bios.bin to qemu-kvm/pc-bios/bios.bin?
And if I rebuild qemu-kvm, my update of seabios will take effect on the
qemu-kvm side, right?

Thanks,
Jinsong


Re: seabios question

2009-12-16 Thread Avi Kivity

On 12/16/2009 12:55 PM, Liu, Jinsong wrote:

Hi, Avi

Recently I wanted to update the HVM vCPU add/remove feature of KVM.
I went through qemu-kvm and found that the original Bochs BIOS code (i.e.,
qemu-kvm/kvm/bios/acpi-dsdt.asl, ...) has been removed.

Now qemu-kvm uses the seabios git repo
http://git.kernel.org/?p=virt/kvm/seabios.git;a=summary as its BIOS, right?
If I want to update the seabios DSDT logic, may I build and then copy
seabios/out/bios.bin to qemu-kvm/pc-bios/bios.bin?
And if I rebuild qemu-kvm, my update of seabios will take effect on the
qemu-kvm side, right?
   


Correct.  You also need a 'make install', since qemu will pick up the 
bios from /usr/local/share/qemu/bios.bin.
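In other words, a cycle along these lines (a sketch: the checkout paths are
placeholders and the default /usr/local install prefix is assumed):

```shell
# build seabios; the result lands in out/bios.bin
cd ~/src/seabios && make

# make qemu-kvm ship the new image
cp out/bios.bin ~/src/qemu-kvm/pc-bios/bios.bin
cd ~/src/qemu-kvm && make && sudo make install

# qemu then loads /usr/local/share/qemu/bios.bin at startup
```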


--
error compiling committee.c: too many arguments to function



Re: seabios question

2009-12-16 Thread Gleb Natapov
On Wed, Dec 16, 2009 at 06:55:47PM +0800, Liu, Jinsong wrote:
 Hi, Avi
 
  Recently I wanted to update the HVM vCPU add/remove feature of KVM.
  I went through qemu-kvm and found that the original Bochs BIOS code (i.e.,
  qemu-kvm/kvm/bios/acpi-dsdt.asl, ...) has been removed.
  
  Now qemu-kvm uses the seabios git repo
  http://git.kernel.org/?p=virt/kvm/seabios.git;a=summary as its BIOS, right?
  If I want to update the seabios DSDT logic, may I build and then copy
  seabios/out/bios.bin to qemu-kvm/pc-bios/bios.bin?
  And if I rebuild qemu-kvm, my update of seabios will take effect on the
  qemu-kvm side, right?
 
Seabios does not yet support vCPU hot plug/unplug.

--
Gleb.


RE: seabios question

2009-12-16 Thread Liu, Jinsong
Avi Kivity wrote:
 On 12/16/2009 12:55 PM, Liu, Jinsong wrote:
 Hi, Avi
 
 Recently I wanted to update the HVM vCPU add/remove feature of KVM.
 I went through qemu-kvm and found that the original Bochs BIOS code (i.e.,
 qemu-kvm/kvm/bios/acpi-dsdt.asl, ...) has been removed.
 
 Now qemu-kvm uses the seabios git repo
 http://git.kernel.org/?p=virt/kvm/seabios.git;a=summary as its BIOS, right?
 If I want to update the seabios DSDT logic, may I build and then copy
 seabios/out/bios.bin to qemu-kvm/pc-bios/bios.bin?
 And if I rebuild qemu-kvm, my update of seabios will take effect on the
 qemu-kvm side, right?
 
 
 Correct.  You also need a 'make install', since qemu will pick up the
 bios from /usr/local/share/qemu/bios.bin.

Thanks Avi :)

Jinsong


RE: seabios question

2009-12-16 Thread Liu, Jinsong
Gleb Natapov wrote:
 On Wed, Dec 16, 2009 at 06:55:47PM +0800, Liu, Jinsong wrote:
 Hi, Avi
 
 Recently I wanted to update the HVM vCPU add/remove feature of KVM.
 I went through qemu-kvm and found that the original Bochs BIOS code (i.e.,
 qemu-kvm/kvm/bios/acpi-dsdt.asl, ...) has been removed.
 
 Now qemu-kvm uses the seabios git repo
 http://git.kernel.org/?p=virt/kvm/seabios.git;a=summary as its BIOS, right?
 If I want to update the seabios DSDT logic, may I build and then copy
 seabios/out/bios.bin to qemu-kvm/pc-bios/bios.bin?
 And if I rebuild qemu-kvm, my update of seabios will take effect on the
 qemu-kvm side, right?
 
 Seabios does not yet support vCPU hot plug/unplug.

Yes, I noticed that, thanks for the reminder!


Re: Issues with qemu-kvm.git from today

2009-12-16 Thread Avi Kivity

On 12/15/2009 07:43 PM, Brian Jackson wrote:

With qemu-kvm.git from this morning (about an hour ago), I see the following
message. Qemu continues to run after this, but the guest is unresponsive and
the qemu process is chewing up 100% cpu.


rom: out of memory (rom pxe-virtio.bin, addr 0x000de800, size 0xdc00,
max 0x000e)
   


This was already noticed and is being worked on (I hope).

--
error compiling committee.c: too many arguments to function



Re: network shutdown under heavy load

2009-12-16 Thread Avi Kivity

On 12/14/2009 05:49 PM, rek2 wrote:
Hello, we notice that when we stress any of our guests (in this case
they are all Fedora), the KVM network will shut down. Anyone experience
this?




Herbert?

--
error compiling committee.c: too many arguments to function



Re: network shutdown under heavy load

2009-12-16 Thread Herbert Xu
On Wed, Dec 16, 2009 at 02:17:04PM +0200, Avi Kivity wrote:
 On 12/14/2009 05:49 PM, rek2 wrote:
  Hello, we notice that when we stress any of our guests (in this case
  they are all Fedora), the KVM network will shut down. Anyone experience
  this?

 Herbert?

What's the exact guest kernel version? When the network is down,
please get onto the guest console to determine which direction
(if not both) of the network is not functioning.

You can run tcpdump in the guest/host and execute pings on both
sides to see which direction is blocked.
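For example (a sketch: tap0 and eth0 are placeholders for the actual
host-side tap device and the guest's interface, and the addresses are yours
to fill in):

```shell
# on the host: watch ICMP on the guest's tap interface,
# then ping the host from inside the guest
tcpdump -ni tap0 icmp

# then the reverse direction: inside the guest run
#   tcpdump -ni eth0 icmp
# and ping the guest's address from the host
```

Whichever direction shows pings arriving on one side but never leaving the
other is the blocked one.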

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 1/7] Nested VMX patch 1 implements vmon and vmoff

2009-12-16 Thread Avi Kivity

On 12/10/2009 08:38 PM, or...@il.ibm.com wrote:

From: Orit Wassermanor...@il.ibm.com

   


Missing changelog entry.  Please use the format common to all kvm patches.



diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 3de0b37..3f63cdd 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -121,9 +121,6 @@ static int npt = 1;

  module_param(npt, int, S_IRUGO);

-static int nested = 1;
-module_param(nested, int, S_IRUGO);
-
   


Separate moving 'nested' into a different patch.


+struct __attribute__ ((__packed__)) level_state {
+};
+
+struct nested_vmx {
+   /* Has the level1 guest done vmxon? */
+   bool vmxon;
+   /* Level 1 state for switching to level 2 and back */
+   struct level_state *l1_state;
   


If this doesn't grow too large, can keep it as a member instead of a 
pointer.



+};
+

  static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
@@ -201,6 +214,7 @@ static struct kvm_vmx_segment_field {
  static u64 host_efer;

  static void ept_save_pdptrs(struct kvm_vcpu *vcpu);
+static int create_l1_state(struct kvm_vcpu *vcpu);

  /*
   * Keep MSR_K6_STAR at the end, as setup_msrs() will try to optimize it
@@ -961,6 +975,95 @@ static void guest_write_tsc(u64 guest_tsc, u64 host_tsc)
  }

  /*
+ * Handles msr read for nested virtualization
+ */
+static int nested_vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index,
+ u64 *pdata)
+{
+   u64 vmx_msr = 0;
+
+   switch (msr_index) {
+   case MSR_IA32_FEATURE_CONTROL:
+   *pdata = 0;
+   break;
+   case MSR_IA32_VMX_BASIC:
+   *pdata = 0;
   


Not needed.


+   rdmsrl(MSR_IA32_VMX_BASIC, vmx_msr);
+   *pdata = (vmx_msr & 0x00cf);
   


Please use symbolic constants.


+   break;
+   case MSR_IA32_VMX_PINBASED_CTLS:
+   rdmsrl(MSR_IA32_VMX_PINBASED_CTLS, vmx_msr);
+   *pdata = (PIN_BASED_EXT_INTR_MASK & vmcs_config.pin_based_exec_ctrl) |
+   (PIN_BASED_NMI_EXITING & vmcs_config.pin_based_exec_ctrl) |
+   (PIN_BASED_VIRTUAL_NMIS & vmcs_config.pin_based_exec_ctrl);
   


Don't understand.  You read vmx_msr and then use vmcs_config?


+   case MSR_IA32_VMX_PROCBASED_CTLS:
+   {
+   u32 vmx_msr_high, vmx_msr_low;
+   u32 control = CPU_BASED_HLT_EXITING |
+#ifdef CONFIG_X86_64
+   CPU_BASED_CR8_LOAD_EXITING |
+   CPU_BASED_CR8_STORE_EXITING |
+#endif
+   CPU_BASED_CR3_LOAD_EXITING |
+   CPU_BASED_CR3_STORE_EXITING |
+   CPU_BASED_USE_IO_BITMAPS |
+   CPU_BASED_MOV_DR_EXITING |
+   CPU_BASED_USE_TSC_OFFSETING |
+   CPU_BASED_INVLPG_EXITING |
+   CPU_BASED_TPR_SHADOW |
+   CPU_BASED_USE_MSR_BITMAPS |
+   CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
+
+   rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high);
+
+   control &= vmx_msr_high; /* bit == 0 in high word ==> must be zero */
+   control |= vmx_msr_low;  /* bit == 1 in low word  ==> must be one  */
+
+   *pdata = (CPU_BASED_HLT_EXITING & control) |
+#ifdef CONFIG_X86_64
+   (CPU_BASED_CR8_LOAD_EXITING & control) |
+   (CPU_BASED_CR8_STORE_EXITING & control) |
+#endif
+   (CPU_BASED_CR3_LOAD_EXITING & control) |
+   (CPU_BASED_CR3_STORE_EXITING & control) |
+   (CPU_BASED_USE_IO_BITMAPS & control) |
+   (CPU_BASED_MOV_DR_EXITING & control) |
+   (CPU_BASED_USE_TSC_OFFSETING & control) |
+   (CPU_BASED_INVLPG_EXITING & control);
   


What about the high word of the msr?  Will it always allow 0?



  /*
+ * Writes msr value for nested virtualization
+ * Returns 0 on success, non-0 otherwise.
+ */
+static int nested_vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
+{
+   switch (msr_index) {
+   case MSR_IA32_FEATURE_CONTROL:
+   if ((data & (FEATURE_CONTROL_LOCKED |
+FEATURE_CONTROL_VMXON_ENABLED))
+   != (FEATURE_CONTROL_LOCKED |
+   FEATURE_CONTROL_VMXON_ENABLED))
+   return 1;
+   break;
   


Need to trap if unsupported bits are set.

Need a way for userspace to write these msrs, so that live migration to 
an older kvm can work.  We do the same thing with cpuid - userspace sets 
cpuid to values that are common across the migration cluster.



+static void free_l1_state(struct kvm_vcpu *vcpu)
+{
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+   if (!vmx->nested.l1_state)
+   return;
   


Check isn't needed, kfree() likes NULLs.


+
+   kfree(vmx->nested.l1_state);
+   

Re: [PATCH 2/7] Nested VMX patch 2 implements vmclear

2009-12-16 Thread Avi Kivity

On 12/10/2009 08:38 PM, or...@il.ibm.com wrote:

From: Orit Wassermanor...@il.ibm.com

---
  arch/x86/kvm/vmx.c |  235 +++-
  arch/x86/kvm/x86.c |5 +-
  arch/x86/kvm/x86.h |3 +
  3 files changed, 240 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2726a6c..a7ffd5e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -93,13 +93,39 @@ struct shared_msr_entry {
  };

  struct __attribute__ ((__packed__)) level_state {
+   /* Has the level1 guest done vmclear? */
+   bool vmclear;
+};
   


Suggest calling it launch_state and using an enum.  We can have three 
states: uninitialized, clear, and launched.  Not sure if this is really 
required by the spec.


Do we need vmclear in l1_state?
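A minimal sketch of the suggested three-state tracking; the enum and the renamed field are illustrative assumptions, not taken from the patch:

```c
/* Sketch only: possible shape of the launch_state suggestion.  The
 * enumerator names are invented for illustration. */
enum vmcs_launch_state {
	VMCS_STATE_UNINITIALIZED = 0,	/* never loaded with vmptrld */
	VMCS_STATE_CLEAR,		/* L1 executed vmclear on it */
	VMCS_STATE_LAUNCHED,		/* vmlaunch succeeded */
};

struct level_state {
	enum vmcs_launch_state launch_state;
};
```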


+struct __attribute__ ((__packed__)) nested_vmcs_page {
+   u32 revision_id;
+   u32 abort;
+   struct level_state l2_state;
+};
+
+struct nested_vmcs_list {
+   struct list_head list;
   


'link'


+   gpa_t vmcs_addr;
+   struct vmcs *l2_vmcs;
  };

  struct nested_vmx {
/* Has the level1 guest done vmxon? */
bool vmxon;
+   /* What is the location of the current vmcs l1 keeps for l2 */
+   gpa_t current_vmptr;
/* Level 1 state for switching to level 2 and back */
struct level_state *l1_state;
+   /* list of vmcs for each l2 guest created by l1 */
+   struct list_head l2_vmcs_list;
+   /* l2 page corresponding to the current vmcs set by l1 */
+   struct nested_vmcs_page *current_l2_page;
  };

  struct vcpu_vmx {
@@ -156,6 +182,76 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu 
*vcpu)
return container_of(vcpu, struct vcpu_vmx, vcpu);
  }

+static struct page *nested_get_page(struct kvm_vcpu *vcpu,
+   u64 vmcs_addr)
+{
+   struct page *vmcs_page = NULL;
+
+   down_read(&current->mm->mmap_sem);
+   vmcs_page = gfn_to_page(vcpu->kvm, vmcs_addr >> PAGE_SHIFT);
+   up_read(&current->mm->mmap_sem);
   


gfn_to_page() doesn't need mmap_sem (and may deadlock if you take it).


+
+   if (is_error_page(vmcs_page)) {
+   printk(KERN_ERR "%s error allocating page 0x%llx\n",
+  __func__, vmcs_addr);
+   kvm_release_page_clean(vmcs_page);
+   return NULL;
+   }
+
+   return vmcs_page;
+
+}
+
+static int nested_map_current(struct kvm_vcpu *vcpu)
+{
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+   struct page *vmcs_page =
+   nested_get_page(vcpu, vmx->nested.current_vmptr);
+   struct nested_vmcs_page *mapped_page;
+
+   if (vmcs_page == NULL) {
+   printk(KERN_INFO "%s: failure in nested_get_page\n", __func__);
+   return 0;
+   }
+
+   if (vmx->nested.current_l2_page) {
+   printk(KERN_INFO "%s: shadow vmcs already mapped\n", __func__);
+   WARN_ON(1);
+   return 0;
+   }
+
+   mapped_page = kmap_atomic(vmcs_page, KM_USER0);
+
+   if (!mapped_page) {
+   printk(KERN_INFO "%s: error in kmap_atomic\n", __func__);
+   return 0;
+   }
   


kmap_atomic() can't fail.


+
+   vmx->nested.current_l2_page = mapped_page;
+
+   return 1;
+}
+
+static void nested_unmap_current(struct kvm_vcpu *vcpu)
+{
+   struct page *page;
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+   if (!vmx->nested.current_l2_page) {
+   printk(KERN_INFO "Shadow vmcs already unmapped\n");
+   WARN_ON(1);
   


Use BUG_ON(), since this can't happen unless there's a bug.


+   return;
+   }
+
+   page = kmap_atomic_to_page(vmx->nested.current_l2_page);
+
+   kunmap_atomic(vmx->nested.current_l2_page, KM_USER0);
+
+   kvm_release_page_dirty(page);
+
+   vmx->nested.current_l2_page = NULL;
+}
+
  static int init_rmode(struct kvm *kvm);
  static u64 construct_eptp(unsigned long root_hpa);

@@ -1144,6 +1240,35 @@ static int nested_vmx_set_msr(struct kvm_vcpu *vcpu, u32 
msr_index, u64 data)
return 0;
  }

+static int read_guest_vmcs_gpa(struct kvm_vcpu *vcpu, gva_t gva, u64 *gentry)
+{
+   int r = 0;
+   uint size;
+
+   *gentry = 0;
+
+   if (is_long_mode(vcpu))
+   size = sizeof(u64);
+   else
+   size = sizeof(u32);
   


I think the gpa is always 64 bit, regardless of the current mode.


+
+   r = kvm_read_guest_virt(gva, gentry,
+   size, vcpu);
+   if (r) {
+   printk(KERN_ERR "%s cannot read guest vmcs addr %lx : %d\n",
+  __func__, vcpu->arch.regs[VCPU_REGS_RAX], r);
   


RAX may not be relevant.  Just return, and the user can disassemble the 
instructions and see for themselves.



+   return r;
+   }
+
+   if (!IS_ALIGNED(*gentry, PAGE_SIZE)) {
+   printk(KERN_DEBUG "%s addr %llx not aligned\n",
+  

Re: [PATCH] qemu-kvm initialize vcpu state after machine initialization

2009-12-16 Thread Marcelo Tosatti
On Tue, Dec 15, 2009 at 02:33:27PM +0200, Gleb Natapov wrote:
 On Tue, Dec 15, 2009 at 10:24:15AM -0200, Marcelo Tosatti wrote:
  On Tue, Dec 15, 2009 at 01:20:38PM +0200, Gleb Natapov wrote:
   On Mon, Dec 14, 2009 at 06:36:37PM -0200, Marcelo Tosatti wrote:

So that the vcpu state is initialized, from vcpu thread context, after 
machine initialization is settled.

This allows to revert apic_init's apic_reset call. apic_reset now
happens through system_reset, similarly to qemu upstream.

   This patch essentially reverts commit 898c51c3. This commit fixes two
   races. First race is like this:
   
vcpu0                                vcpu1

  starts running
  loads lapic state into kernel
  sends event to vcpu1
                                       starts running
                                       loads lapic state into kernel
                                       overwrites event from vcpu0
  
   At the time 898c51c3 was committed the race was easily reproducible
   by starting VM with 16 cpus + seabios. Sometimes some vcpus lost INIT/SIPI
   events. Now I am not able to reproduce it even with this patch applied,
   so something else changed, but it doesn't make the race non existent or
   acceptable.
  
  Note qemu_kvm_load_lapic depends on env->created set (kvm_vcpu_inited),
 Uhh. What about doing this:
 
 diff --git a/qemu-kvm.c b/qemu-kvm.c
 index 44e8b75..fa6db8e 100644
 --- a/qemu-kvm.c
 +++ b/qemu-kvm.c
 @@ -1920,12 +1920,12 @@ static void *ap_main_loop(void *_env)
 pthread_mutex_lock(&qemu_mutex);
  cpu_single_env = env;
  
 +current_env->created = 1;
  kvm_arch_init_vcpu(env);
  
  kvm_arch_load_regs(env);
  
  /* signal VCPU creation */
 -current_env->created = 1;
 pthread_cond_signal(&qemu_vcpu_cond);
  
  /* and wait for machine initialization */

load_lapic ioctl is called from pc_new_cpu -> apic_reset, so it's
alright. No need for the patch above, then.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/7] Nested VMX patch 3 implements vmptrld and vmptrst

2009-12-16 Thread Avi Kivity

On 12/10/2009 08:38 PM, or...@il.ibm.com wrote:


+
+
  struct __attribute__ ((__packed__)) level_state {
/* Has the level1 guest done vmclear? */
bool vmclear;
+
+   u64 io_bitmap_a;
+   u64 io_bitmap_b;
+   u64 msr_bitmap;
+
+   bool first_launch;
  };
   


Please keep things naturally aligned.


  /*
@@ -122,6 +255,8 @@ struct nested_vmx {
gpa_t current_vmptr;
/* Level 1 state for switching to level 2 and back */
struct level_state *l1_state;
+   /* Level 1 shadow vmcs for switching to level 2 and back */
+   struct shadow_vmcs *l1_shadow_vmcs;
/* list of vmcs for each l2 guest created by l1 */
struct list_head l2_vmcs_list;
/* l2 page corresponding to the current vmcs set by l1 */
@@ -187,10 +322,7 @@ static struct page *nested_get_page(struct kvm_vcpu *vcpu,
  {
struct page *vmcs_page = NULL;

-   down_read(&current->mm->mmap_sem);
vmcs_page = gfn_to_page(vcpu->kvm, vmcs_addr >> PAGE_SHIFT);
-   up_read(&current->mm->mmap_sem);
   


Fold this into the patch that introduced the problem.


-
if (is_error_page(vmcs_page)) {
printk(KERN_ERR "%s error allocating page 0x%llx\n",
   __func__, vmcs_addr);
@@ -832,13 +964,14 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)

if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
u8 error;
-
per_cpu(current_vmcs, cpu) = vmx->vmcs;
+
   


Please avoid pointless whitespace changes.


asm volatile (__ex(ASM_VMX_VMPTRLD_RAX) "; setna %0"
  : "=g"(error) : "a"(&phys_addr), "m"(phys_addr)
  : "cc");
+
if (error)
-   printk(KERN_ERR "kvm: vmptrld %p/%llx fail\n",
+   printk(KERN_ERR "kvm: vmptrld %p/%llx failed\n",
   vmx->vmcs, phys_addr);
   


Fold.


+
  static int create_l1_state(struct kvm_vcpu *vcpu)
  {
struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -1441,10 +1587,75 @@ static int create_l1_state(struct kvm_vcpu *vcpu)
} else
return 0;

+   vmx->nested.l1_shadow_vmcs = kzalloc(PAGE_SIZE, GFP_KERNEL);
+   if (!vmx->nested.l1_shadow_vmcs) {
+   printk(KERN_INFO "%s could not allocate memory for l1_shadow vmcs\n",
+  __func__);
+   kfree(vmx->nested.l1_state);
+   return -ENOMEM;
+   }
+
INIT_LIST_HEAD(&(vmx->nested.l2_vmcs_list));
return 0;
  }

+static struct vmcs *alloc_vmcs(void);
+int create_l2_state(struct kvm_vcpu *vcpu)
+{
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+   struct vmcs *l2_vmcs;
+
+   if (!nested_map_current(vcpu)) {
+   printk(KERN_ERR "%s error mapping level 2 page", __func__);
+   return -ENOMEM;
+   }
+
+   l2_vmcs = nested_get_current_vmcs(vcpu);
+   if (!l2_vmcs) {
+   struct nested_vmcs_list *new_l2_guest =
+   (struct nested_vmcs_list *)
+   kmalloc(sizeof(struct nested_vmcs_list), GFP_KERNEL);
+
+   if (!new_l2_guest) {
+   printk(KERN_ERR "%s error could not allocate memory for a new l2 guest list item",
+  __func__);
+   nested_unmap_current(vcpu);
+   return -ENOMEM;
+   }
   


Can the list grow without bounds?


+
+   l2_vmcs = alloc_vmcs();
+
+   if (!l2_vmcs) {
+   printk(KERN_ERR "%s error could not allocate memory for l2_vmcs",
+  __func__);
+   kfree(new_l2_guest);
+   nested_unmap_current(vcpu);
+   return -ENOMEM;
+   }
+
+   new_l2_guest->vmcs_addr = vmx->nested.current_vmptr;
+   new_l2_guest->l2_vmcs   = l2_vmcs;
+   list_add(&(new_l2_guest->list), &(vmx->nested.l2_vmcs_list));
+   }
+
+   if (cpu_has_vmx_msr_bitmap())
+   vmx->nested.current_l2_page->l2_state.msr_bitmap =
+   vmcs_read64(MSR_BITMAP);
+   else
+   vmx->nested.current_l2_page->l2_state.msr_bitmap = 0;
+
+   vmx->nested.current_l2_page->l2_state.io_bitmap_a =
+   vmcs_read64(IO_BITMAP_A);
+   vmx->nested.current_l2_page->l2_state.io_bitmap_b =
+   vmcs_read64(IO_BITMAP_B);
   


Don't understand why these reads are needed.


+
+   vmx->nested.current_l2_page->l2_state.first_launch = true;
+
+   nested_unmap_current(vcpu);
+
+   return 0;
+}
+
  



@@ -3633,8 +3849,9 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
return 1;
}

-   if (create_l1_state(vcpu)) {
-   printk(KERN_ERR "%s create_l1_state failed\n", __func__);
+   r = create_l1_state(vcpu);
+   if (r) {
+   printk(KERN_ERR "%s create_l1_state failed: 

Re: [PATCH RFC: kvm tsc virtualization 15/20] Fix longstanding races

2009-12-16 Thread Marcelo Tosatti
On Tue, Dec 15, 2009 at 11:26:59AM -1000, Zachary Amsden wrote:
 On 12/15/2009 08:21 AM, Marcelo Tosatti wrote:
 On Mon, Dec 14, 2009 at 06:08:42PM -1000, Zachary Amsden wrote:

 snip

 +   atomic_set(&per_cpu(cpu_tsc_synchronized, freq->cpu), 0);
 +   spin_lock(&kvm_lock);
 +   list_for_each_entry(kvm, &vm_list, vm_list) {
 +   kvm_for_each_vcpu(i, vcpu, kvm) {
 +   if (vcpu->cpu != freq->cpu)
 +   continue;
 +   if (vcpu->cpu != smp_processor_id())
 +   send_ipi++;
 +   kvm_request_guest_time_update(vcpu);

 There is some overlap here between KVM_REQ_KVMCLOCK_UPDATE and
 cpu_tsc_synchronized. Its the same information (frequency for a CPU has
 changed) stored in two places.

 Later you do:

 spin_lock(&kvm_lock);
 list_for_each_entry(kvm, &vm_list, vm_list) {
 kvm_for_each_vcpu(i, vcpu, kvm) {
 if (vcpu->cpu != freq->cpu)
 continue;
 if (vcpu->cpu != smp_processor_id())
send_ipi++;
 kvm_request_guest_time_update(vcpu);
 }
 }
 spin_unlock(&kvm_lock);
  --- a remote CPU could have updated kvmclock information
   with stale cpu_tsc_khz, clearing the
   KVM_REQ_KVMCLOCK_UPDATE bit.
  smp_call_function(evict) (which sets cpu_tsc_synchronized
to zero)

 Maybe worthwhile to unify it. Perhaps use the per cpu tsc generation in
 addition to vcpu_load to update kvmclock info (on arch vcpu_load update
 kvmclock store generation, update again on generation change).


 Yes, that is an excellent point.  The generation counter, the  
 tsc_synchronized variable and the per-vcpu clock counter all have some  
 amount of redundancy of information.

 Perhaps instead of overlapping, they should be layered?

 A rule for kvmclock: can't update kvmclock info until cpu is synchronized?

How about update kvmclock on:

- cpu switch (kvm_arch_vcpu_load). Then store cpu tsc generation in
  vcpu->arch.
- on vcpu_enter_guest, if tsc generation changes.

If kvm_arch_vcpu_load stored stale cpu_khz into kvmclock, the tsc
generation will have changed by the time guest entry succeeds.

Then you can kill KVM_REQ_KVMCLOCK_UPDATE and the kvm_for_each_vcpu()
loop in the cpufreq callback. 



Re: [PATCH 4/7] Nested VMX patch 4 implements vmread and vmwrite

2009-12-16 Thread Avi Kivity

On 12/10/2009 08:38 PM, or...@il.ibm.com wrote:

From: Orit Wasserman <or...@il.ibm.com>

---
  arch/x86/kvm/vmx.c |  670 +++-
  1 files changed, 660 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 46a4f3a..8745d44 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -239,6 +239,7 @@ struct __attribute__ ((__packed__)) level_state {
  struct __attribute__ ((__packed__)) nested_vmcs_page {
u32 revision_id;
u32 abort;
+   struct shadow_vmcs shadow_vmcs;
struct level_state l2_state;
  };

@@ -263,6 +264,55 @@ struct nested_vmx {
struct nested_vmcs_page *current_l2_page;
  };

+enum vmcs_field_type {
+   VMCS_FIELD_TYPE_U16 = 0,
+   VMCS_FIELD_TYPE_U64 = 1,
+   VMCS_FIELD_TYPE_U32 = 2,
+   VMCS_FIELD_TYPE_ULONG = 3
+};
+
+#define VMCS_FIELD_LENGTH_OFFSET 13
+#define VMCS_FIELD_LENGTH_MASK 0x6000
+
+/*
+  Returns VMCS Field type
+*/
+static inline int vmcs_field_type(unsigned long field)
+{
+   /* For 32 bit L1 when it is using the HIGH field */
+   if (0x1 & field)
+   return VMCS_FIELD_TYPE_U32;
+
+   return (VMCS_FIELD_LENGTH_MASK & field) >> 13;
+}
+
+/*
  Returns VMCS field size in bytes
+*/
+static inline int vmcs_field_size(int field_type, struct kvm_vcpu *vcpu)
+{
+   switch (field_type) {
+   case VMCS_FIELD_TYPE_U16:
+   return 2;
+   case VMCS_FIELD_TYPE_U32:
+   return 4;
+   case VMCS_FIELD_TYPE_U64:
+   return 8;
+   case VMCS_FIELD_TYPE_ULONG:
+#ifdef CONFIG_X86_64
+   if (is_long_mode(vcpu))
+   return 8;
+   else
   


Can replace with #endif


+   return 4;
+#else
+   return 4;
+#endif
   


... and drop the previous three lines.


+   }
+
+   printk(KERN_INFO "WARNING: invalid field type %d\n", field_type);
+   return 0;
   


Can this happen?  The field is only two bits wide.
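A sketch of how the two review comments combine; is_long_mode_stub() and the CONFIG_X86_64 define are stand-ins added here only so the fragment is self-contained, not part of the patch:

```c
/* Sketch of the suggested simplification: only the is_long_mode()
 * check needs the CONFIG_X86_64 guard, so the two "return 4" paths
 * collapse into a single fall-through, and since the encoding is only
 * two bits wide the trailing return can never be reached. */
#define CONFIG_X86_64 1		/* assumption: 64-bit build, for the sketch */

enum vmcs_field_type {
	VMCS_FIELD_TYPE_U16 = 0,
	VMCS_FIELD_TYPE_U64 = 1,
	VMCS_FIELD_TYPE_U32 = 2,
	VMCS_FIELD_TYPE_ULONG = 3
};

static int is_long_mode_stub(void)
{
	return 1;	/* stand-in for kvm's is_long_mode(vcpu) */
}

static int vmcs_field_size(int field_type)
{
	switch (field_type) {
	case VMCS_FIELD_TYPE_U16:
		return 2;
	case VMCS_FIELD_TYPE_U32:
		return 4;
	case VMCS_FIELD_TYPE_U64:
		return 8;
	case VMCS_FIELD_TYPE_ULONG:
#ifdef CONFIG_X86_64
		if (is_long_mode_stub())
			return 8;
#endif
		return 4;
	}
	return 0;	/* unreachable: field_type is two bits wide */
}
```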



+static inline struct shadow_vmcs *get_shadow_vmcs(struct kvm_vcpu *vcpu)
+{
+   WARN_ON(!to_vmx(vcpu)->nested.current_l2_page);
+   return &(to_vmx(vcpu)->nested.current_l2_page->shadow_vmcs);
+}
+
+#define SHADOW_VMCS_OFFSET(x) offsetof(struct shadow_vmcs, x)
+
+static unsigned short vmcs_field_to_offset_table[HOST_RIP+1] = {
+
+   [VIRTUAL_PROCESSOR_ID] =
+   SHADOW_VMCS_OFFSET(virtual_processor_id),
   


Keep on one line, you can use a shorter macro name if it helps.  This 
table is just noise.



+
+static inline unsigned short vmcs_field_to_offset(unsigned long field)
+{
+
+   if (field > HOST_RIP || vmcs_field_to_offset_table[field] == 0) {
+   printk(KERN_ERR "invalid vmcs encoding 0x%lx\n", field);
+   return -1;
   


This will be converted to 0xffff.


+   }
+
+   return vmcs_field_to_offset_table[field];
+}
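The truncation happens because the function's return type is unsigned short, so -1 comes back as 0xffff. A sketch of one possible fix, with a stand-in table (the single populated entry is illustrative, not kvm's real layout):

```c
/* Sketch: returning a signed int preserves -1 as an error value the
 * caller can actually test, instead of it being truncated to 0xffff
 * by an unsigned short return type.  HOST_RIP is the last encoding
 * the table covers; the table contents here are stand-ins. */
#define HOST_RIP 0x6c16

static unsigned short vmcs_field_to_offset_table[HOST_RIP + 1] = {
	[0x0000] = 16,	/* illustrative entry only */
};

static int vmcs_field_to_offset(unsigned long field)
{
	if (field > HOST_RIP || vmcs_field_to_offset_table[field] == 0)
		return -1;	/* survives intact in a signed return */
	return vmcs_field_to_offset_table[field];
}
```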
+
+static inline unsigned long nested_vmcs_readl(struct kvm_vcpu *vcpu,
+ unsigned long field)
+{
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+   unsigned long *entry;
+
+   if (!vmx->nested.current_l2_page) {
+   printk(KERN_ERR "%s invalid nested vmcs\n", __func__);
+   return -1;
+   }
+
+   entry = (unsigned long *)((char *)(get_shadow_vmcs(vcpu)) +
+   vmcs_field_to_offset(field));
   


Error check?


+static inline u64 nested_vmcs_read64(struct kvm_vcpu *vcpu, unsigned long 
field)
+{
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+   u64 *entry;
+   if (!vmx->nested.current_l2_page) {
+   printk(KERN_ERR "%s invalid nested vmcs\n", __func__);
+   return -1;
+   }
+
+   entry = (u64 *)((char *)(get_shadow_vmcs(vcpu)) +
+   vmcs_field_to_offset(field));
   


Need to support the 'high' part of 64-bit fields.


+   return *entry;
+}

+
+static inline void nested_vmcs_write64(struct kvm_vcpu *vcpu,
+  unsigned long field, u64 value)
+{
+#ifdef CONFIG_X86_64
+   nested_vmcs_writel(vcpu, field, value);
+#else /* nested: 32 bit not actually tested */
+   nested_vmcs_writel(vcpu, field, value);
+   nested_vmcs_writel(vcpu, field+1, value >> 32);
+#endif
   


High field support needed.
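For reference, the "high" access type names the upper 32 bits of a 64-bit field (encodings with bit 0 set, matching the vmcs_field_type() check earlier in the patch). A minimal illustration of serving such accesses from the full 64-bit value; the function names are invented:

```c
#include <stdint.h>

/* Sketch only: a HIGH read returns the upper half of the 64-bit
 * value, and a HIGH write replaces only that upper half. */
static uint32_t read_field_high(uint64_t value64)
{
	return (uint32_t)(value64 >> 32);
}

static uint64_t write_field_high(uint64_t value64, uint32_t high)
{
	return (value64 & 0xffffffffULL) | ((uint64_t)high << 32);
}
```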


  static struct page *nested_get_page(struct kvm_vcpu *vcpu,
u64 vmcs_addr)
  {
@@ -354,11 +809,6 @@ static int nested_map_current(struct kvm_vcpu *vcpu)

mapped_page = kmap_atomic(vmcs_page, KM_USER0);

-   if (!mapped_page) {
-   printk(KERN_INFO "%s: error in kmap_atomic\n", __func__);
-   return 0;
-   }
-
   


Fold.


vmx-nested.current_l2_page = mapped_page;

return 1;
@@ -1390,7 +1840,7 @@ static int read_guest_vmcs_gpa(struct kvm_vcpu *vcpu, 
gva_t gva, u64 *gentry)
size, vcpu);
if (r) {

Re: [PATCH] Enable non page boundary BAR device assignment

2009-12-16 Thread Muli Ben-Yehuda
On Tue, Dec 15, 2009 at 07:24:47PM +0100, Alexander Graf wrote:
 Michael S. Tsirkin wrote:

  I guess this means you'll have to find a device with a sub-page
  BAR to test this on, instead of hacking driver for a device with
  full page BAR. Did anyone ever try doing passthrough on an
  emulated device in nested virt?
 
 We don't emulate an IOMMU, so no.

We have in-house patches to emulate VT-d, which work sufficiently well
to run unmodified Linux drivers for QEMU's emulation of e1000 and LSI
adapter, using unmodified Linux VT-d DMA-mapping code. The patches
need some lovin', but they'll be coming eventually.

Cheers,
Muli
-- 
Muli Ben-Yehuda | m...@il.ibm.com | +972-4-8281080
Manager, Virtualization and Systems Architecture
Master Inventor, IBM Research -- Haifa
Second Workshop on I/O Virtualization (WIOV '10):
http://sysrun.haifa.il.ibm.com/hrl/wiov2010/


Re: [Autotest] [KVM-autotest][RFC] 32/32 PAE bit guest system definition

2009-12-16 Thread Dor Laor

On 12/15/2009 09:04 PM, Lucas Meneghel Rodrigues wrote:

On Fri, Dec 11, 2009 at 2:34 PM, Jiri Zupka <jzu...@redhat.com> wrote:

Hello,
  we are writing the KSM_overcommit test. When we calculate memory for a guest
we need to know the guest's architecture: 32b, 32b with PAE, or 64b. With a
32b guest we can allocate only about 3100M.

Currently we use the name of the disk's image file, which ends with 64 or 32.
Is there a way to detect whether the guest machine runs with PAE etc.?
Do you think kvm_autotest could define a parameter in kvm_tests.cfg that
determines whether the guest is 32b, 32b with PAE, or 64b?


Hi Jiri, sorry for taking long to answer you, I am reviewing the
overcommit test.

About your question, I'd combine your approach of picking whether the host is
32/64 bit from the image name with looking in /proc/cpuinfo for PAE
support.


We might keep it KISS for the time being, since 99% of host installations
are 64 bit only, and many times only the guest can turn on PAE to
practically use it.


So I'll go with naming only.



Let's go with this approach for the final version of the test, OK?

Thanks and congrats for the test, it's a great piece of work! More
comments soon,





Re: debugging windows guests

2009-12-16 Thread Raindog

On 12/15/2009 3:39 PM, Jan Kiszka wrote:

Raindog wrote:
  Hello,

  I am researching KVM as a malware analysis platform and had some
  questions about debugging the guest OS. In my case I intend to use
  windows guests. So my questions are as follows:

  Questions:

  1. What instrumentation facilities are available?

  2. Is it possible to extend the debugging interface so that debugging is
  more transparent to the guest OS? IE: there is still a limit of 4 HW
  breakpoints (which makes me wonder why a LIST is used for them...)

In accelerated KVM mode, the x86 architecture restricts us to 4 break-
or watchpoints that can be active at the same time. If you switch to
emulation mode, there are no such limits. Actually, I just made use of
this for debugging a subtle stack corruption in a guest, and I had more
than 70 watchpoints active at the same time. It's just slightly slower
than KVM...
   


Are there any advantages over stock qemu if using kvm w/out the kernel 
module?


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html




SIGTERM to qemu-kvm process destroys qcow2 image?

2009-12-16 Thread Kenni Lund
Hi

Sorry if this is a stupid question, but is it expected behaviour that
a qcow2 image will/can get damaged by killing the qemu-kvm process
with a SIGTERM signal?

I would expect data on filesystems within the virtual machine to
potentially get damaged if it's in use, but I thought that the qemu-kvm
process would take care of finishing its writes correctly to the qcow2
image before shutting down, ensuring the integrity of the qcow2 image.

Yesterday I entered an invalid boot device as an argument to my
qemu-kvm command for my Windows XP machine, causing an error about a
missing boot device in the qemu BIOS/POST. As I didn't have any
filesystems mounted inside the virtual machine (since it was stuck at
the BIOS asking for a device to boot), I did a kill $pid, fixed the
boot device in the qemu-kvm command and tried booting again...but with
no luck, whatever I try now with qemu-kvm gives me the error:
qemu: could not open disk image /data/virtualization/WindowsXP.img

And qemu-img (check, convert, etc) gives me:
qemu-img: Could not open 'WindowsXP.img'

Is this expected behaviour? Luckily I do have backups of the most
important data on this machine, I'm just happy this didn't happen to
any of my critical machines :-/

I'm on qemu-kvm 0.11.0 with kernel modules from 2.6.31.6.

Best Regards
Kenni Lund


[ kvm-Bugs-2907597 ] qemu vnc server clips at 2560x1600

2009-12-16 Thread SourceForge.net
Bugs item #2907597, was opened at 2009-12-02 16:57
Message generated for change (Comment added) made by mcsoccer
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2907597&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: qemu
Group: None
Status: Pending
Resolution: None
Priority: 5
Private: No
Submitted By: Matthew Colyer (mcsoccer)
Assigned to: Nobody/Anonymous (nobody)
Summary: qemu vnc server clips at 2560x1600

Initial Comment:
So I am using the VESA driver to run an Ubuntu 9.10 guest at 2560x1600 
(I had to modify the xserver-video-vesa package to remove an internal screen 
limit of 2048x2048 in the xorg vesa driver) and everything works great except 
that the qemu vnc server appears to clip at this resolution. The problem goes 
away if I run 1900x1200 and it doesn't change if I run 16bit depth or 24bit 
depth.

I have attached two screenshots, the first is vncing directly into qemu (which 
exhibits the problem) and the second is vncing to a vnc server I have running 
in the guest which doesn't have the problem.

I poked around in vnc.c and couldn't see any limits but I feel like it's a 
buffer limit of some kind.

Also if you look very closely at the first image you can see that the first row 
is drawn correctly all the way across but subsequent rows are not.

If you need more information, don't hesitate to ask.

--

Comment By: Matthew Colyer (mcsoccer)
Date: 2009-12-17 03:18

Message:
The blocks do not show up when run in SDL mode. So I believe the problem is
still somehow related to the VNC server.

--

Comment By: Avi Kivity (avik)
Date: 2009-12-13 10:29

Message:
Does it work well with SDL?  Maybe the problem is not vnc related.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2907597&group_id=180599


Re: SIGTERM to qemu-kvm process destroys qcow2 image?

2009-12-16 Thread Avi Kivity

On 12/17/2009 02:52 AM, Kenni Lund wrote:

Hi

Sorry if this is a stupid question, but is it expected behaviour that
a qcow2 image will/can get damaged by killing the qemu-kvm process
with a SIGTERM signal?

   


If it does, that's a serious bug.  qcow2 should survive SIGTERM, 
SIGKILL, and host power loss.



I would expect data on filesystems within the virtual machine to
potentially get damaged if it's in use, but I thought that the qemu-kvm
process would take care of finishing its writes correctly to the qcow2
image before shutting down, ensuring the integrity of the qcow2 image.
   


No, it uses O_SYNC writes to ensure all writes are completed, and orders 
writes carefully so the image is consistent at all times.
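The guarantee Avi describes can be illustrated in plain POSIX rather than qemu's actual code; the helper below is an invented example, not anything from qemu:

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Illustration only: with O_SYNC, each write() returns only after the
 * data has reached stable storage.  An image format can rely on this
 * to order its metadata updates so that a SIGKILL or power loss never
 * leaves the file in an inconsistent state. */
static int write_sync(const char *path, const void *buf, size_t len)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0644);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, buf, len);
	close(fd);
	return n == (ssize_t)len ? 0 : -1;
}
```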



Yesterday I entered an invalid boot device as an argument to my
qemu-kvm command for my Windows XP machine, causing an error about a
missing boot device in the qemu BIOS/POST. As I didn't have any
filesystems mounted inside the virtual machine (since it was stuck at
the BIOS asking for a device to boot), I did a kill $pid, fixed the
boot device in the qemu-kvm command and tried booting again...but with
no luck, whatever I try now with qemu-kvm gives me the error:
qemu: could not open disk image /data/virtualization/WindowsXP.img

And qemu-img (check, convert, etc) gives me:
qemu-img: Could not open 'WindowsXP.img'
   


Can you post the first 4K of the image?  It shouldn't contain private 
data, but go over it (or don't post) if you have sensitive information there.



Is this expected behaviour? Luckily I do have backups of the most
important data on this machine, I'm just happy this didn't happen to
any of my critical machines :-/

   



It is not.


I'm on qemu-kvm 0.11.0 with kernel modules from 2.6.31.6.
   


Should be recent enough.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: debugging windows guests

2009-12-16 Thread Avi Kivity

On 12/17/2009 12:06 AM, Raindog wrote:


Are there any advantages over stock qemu if using kvm w/out the kernel 
module?


No.  qemu-kvm is not tested without kvm, so there may be disadvantages.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: debugging windows guests

2009-12-16 Thread Raindog

On 12/16/2009 9:36 PM, Avi Kivity wrote:

On 12/17/2009 12:06 AM, Raindog wrote:


Are there any advantages over stock qemu if using kvm w/out the 
kernel module?


No.  qemu-kvm is not tested without kvm, so there may be disadvantages.



Does that then imply that svm emulation (--enable-nesting) is not well 
tested either?
