date:20100801

Re: enabling X86_FEATURE_ARCH_PERFMON in guest

2010-08-01 Thread Avi Kivity


 On 07/30/2010 08:02 PM, David S. Ahern wrote:

How do I get X86_FEATURE_ARCH_PERFMON enabled for a guest?

I've tried -cpu host,+perfmon and -cpu host,+arch_perfmon, but both
get rejected with an error: CPU feature perfmon not found



qemu doesn't know about the perfmon feature bits.  Even if it did, kvm 
doesn't, so it wouldn't work.


Jes is working on this.
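For reference, X86_FEATURE_ARCH_PERFMON is derived from CPUID leaf 0xA on Intel CPUs. The sketch below decodes EAX of that leaf as a guest would see it. The "version non-zero and more than one counter" rule is modelled on what Linux's Intel setup code checks as far as I recall, so treat it as an assumption. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of CPUID leaf 0xA, EAX (architectural performance monitoring). */
struct perfmon_eax {
	unsigned version;       /* bits 7:0   - version ID, 0 means unsupported */
	unsigned num_counters;  /* bits 15:8  - general-purpose counters per CPU */
	unsigned counter_width; /* bits 23:16 - bit width of those counters */
	unsigned ebx_length;    /* bits 31:24 - length of the EBX event vector */
};

static struct perfmon_eax decode_perfmon_eax(uint32_t eax)
{
	struct perfmon_eax p = {
		.version = eax & 0xff,
		.num_counters = (eax >> 8) & 0xff,
		.counter_width = (eax >> 16) & 0xff,
		.ebx_length = (eax >> 24) & 0xff,
	};
	return p;
}

/* Assumed rule: leaf 0xA must exist, the version must be non-zero and
 * there must be more than one general-purpose counter. */
static bool has_arch_perfmon(uint32_t max_basic_leaf, uint32_t leaf_a_eax)
{
	struct perfmon_eax p;

	if (max_basic_leaf < 0xa)
		return false;
	p = decode_perfmon_eax(leaf_a_eax);
	return p.version != 0 && p.num_counters > 1;
}

/* Largest value a general-purpose counter can hold before wrapping. */
static uint64_t counter_max(struct perfmon_eax p)
{
	assert(p.counter_width > 0 && p.counter_width <= 64);
	return p.counter_width == 64 ? UINT64_MAX
				     : (UINT64_C(1) << p.counter_width) - 1;
}
```

Inside a guest, the two inputs would come from EAX of CPUID leaf 0 and leaf 0xA, for example via `__get_cpuid()` from GCC/Clang's `<cpuid.h>` on x86. If the hypervisor does not expose leaf 0xA, the check fails.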

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH v8 00/16] Provide a zero-copy method on KVM virtio-net.

2010-08-01 Thread Avi Kivity


 On 07/30/2010 06:46 PM, Shirley Ma wrote:

Hello Avi,

On Fri, 2010-07-30 at 08:02 +0300, Avi Kivity wrote:

get_user_pages() is indeed slow.  But what about
get_user_pages_fast()?

Note that when the page is first touched, get_user_pages_fast() falls
back to get_user_pages(), so the latency needs to be measured after
quite a bit of warm-up.

Yes, I used get_user_pages_fast(); however, it fell back to
get_user_pages() when the app didn't allocate the buffer on the same page.
If I run a single ping, the RTT is extremely high; when running
multiple pings, the RTT drops significantly, but it is still not
as fast as copying was in my initial test. I am thinking that we might need
to pre-pin a memory pool.



I don't understand.  Under what conditions do you use get_user_pages() 
instead of get_user_pages_fast()?  Why?
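Each page covered by a buffer has to be pinned, so one way to reason about the "same page" observation is to count how many pages a user buffer touches and whether two buffers share a page. This is a plain user-space sketch with hypothetical helper names. It does not call get_user_pages_fast(), and the page size is a parameter rather than the kernel's PAGE_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of pages touched by [addr, addr + len).
 * page_size must be a power of two. */
static uint64_t pages_spanned(uint64_t addr, uint64_t len, uint64_t page_size)
{
	assert(page_size && (page_size & (page_size - 1)) == 0);
	if (len == 0)
		return 0;
	uint64_t first = addr & ~(page_size - 1);
	uint64_t last = (addr + len - 1) & ~(page_size - 1);
	return (last - first) / page_size + 1;
}

/* True when both addresses fall within the same page. */
static bool same_page(uint64_t a, uint64_t b, uint64_t page_size)
{
	return (a & ~(page_size - 1)) == (b & ~(page_size - 1));
}
```

A 32-byte buffer starting 16 bytes before a page boundary spans two pages, so it needs two pins even though it is tiny.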



Re: [PATCH 1/2] KVM: x86 emulator: don't update vcpu state if instruction is restarted.

2010-08-01 Thread Gleb Natapov

On Sat, Jul 31, 2010 at 08:25:13PM +0300, Avi Kivity wrote:
  On 07/29/2010 03:11 PM, Gleb Natapov wrote:
 No need to update vcpu state since instruction is in the middle of the
 emulation.
 
 Signed-off-by: Gleb Natapov <g...@redhat.com>
 ---
   arch/x86/kvm/x86.c |   31 +--
   1 files changed, 13 insertions(+), 18 deletions(-)
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 76fbc32..7e5f075 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -4057,32 +4057,27 @@ restart:
  return handle_emulation_failure(vcpu);
  }
 
 -toggle_interruptibility(vcpu, vcpu->arch.emulate_ctxt.interruptibility);
 -kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
 -memcpy(vcpu->arch.regs, c->regs, sizeof c->regs);
 -kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
 +r = EMULATE_DONE;
 
 -if (vcpu->arch.emulate_ctxt.exception >= 0) {
 +if (vcpu->arch.emulate_ctxt.exception >= 0)
  inject_emulated_exception(vcpu);
 -return EMULATE_DONE;
 -}
 -
 -if (vcpu->arch.pio.count) {
 +else if (vcpu->arch.pio.count) {
  if (!vcpu->arch.pio.in)
  vcpu->arch.pio.count = 0;
 -return EMULATE_DO_MMIO;
 -}
 -
 -if (vcpu->mmio_needed) {
 +r = EMULATE_DO_MMIO;
 +} else if (vcpu->mmio_needed) {
  if (vcpu->mmio_is_write)
  vcpu->mmio_needed = 0;
 -return EMULATE_DO_MMIO;
 -}
 -
 -if (vcpu->arch.emulate_ctxt.restart)
 +r = EMULATE_DO_MMIO;
 +} else if (vcpu->arch.emulate_ctxt.restart)
  goto restart;
 
 -return EMULATE_DONE;
 +toggle_interruptibility(vcpu, vcpu->arch.emulate_ctxt.interruptibility);
 +kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
 +memcpy(vcpu->arch.regs, c->regs, sizeof c->regs);
 +kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
 +
 +return r;
   }
   EXPORT_SYMBOL_GPL(emulate_instruction);
 
 
 What about kvm-tpr-opt.c?  It uses rip after pio.
 
It uses rip _during_ pio. And pio emulation changes rip
only at the end of emulation.

 It's true that it usually doesn't go through the emulator.
 
 -- 
 I have a truly marvellous patch that fixes the bug which this
 signature is too narrow to contain.

--
Gleb.

Re: [RFC PATCH v8 00/16] Provide a zero-copy method on KVM virtio-net.

2010-08-01 Thread Michael S. Tsirkin

On Thu, Jul 29, 2010 at 03:31:22PM -0700, Shirley Ma wrote:
 I did some vhost performance measurement over 10Gb ixgbe, and found that
 in order to get consistent BW results, netperf/netserver, qemu, vhost
 threads smp affinities are required.

Could you provide an example of a good setup?
Specifically, is it a good idea for the vhost thread
to inherit CPU affinities from qemu?

 Looking forward to these results for small message size comparison.

I think we should explore the idea of the driver falling back to data copy
for small message sizes.
The benefit of zero copy would then be lower CPU utilization on large messages.
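The suggestion amounts to a size cutoff in the transmit path. Below is a minimal sketch, assuming a hypothetical tunable `copy_threshold`. The thread does not propose a value, and the names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

enum tx_path { TX_COPY, TX_ZEROCOPY };

/* Below the cutoff, copying is assumed to be cheaper than pinning the
 * user pages; at or above it, zero copy is used to save CPU. */
static enum tx_path choose_tx_path(size_t len, size_t copy_threshold)
{
	assert(copy_threshold > 0);
	return len < copy_threshold ? TX_COPY : TX_ZEROCOPY;
}
```

Picking the cutoff would need the latency numbers requested in this thread, since the crossover depends on how expensive pinning turns out to be.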

 For large message sizes, 10Gb ixgbe BW is already reached by setting vhost smp
 affinity with offloading support; we will see how much CPU utilization
 can be reduced.
 
 Please provide latency results as well. I did some experiments on
 macvtap zero-copy sendmsg and found that get_user_pages latency is
 pretty high.
 
 Thanks
 Shirley
 
 
 

Re: [PATCH] vhost: locking/rcu cleanup

2010-08-01 Thread Michael S. Tsirkin

On Fri, Jul 30, 2010 at 04:49:54PM +0200, Tejun Heo wrote:
 Hello,
 
 On 07/29/2010 02:23 PM, Michael S. Tsirkin wrote:
  I saw WARN_ON(!list_empty(&dev->work_list)) trigger
  so our custom flush is not as airtight as need be.
 
 Could be but it's also possible that something has queued something
 after the last flush?
 Is the problem reproducible?

Well, we do requeue from the job itself, so we need to be careful with what
we do with indexes here. The bug seemed to happen all the time when qemu was
killed under stress, but now I can't reproduce it anymore :(
Will try again later.

  This patch switches to a simple atomic counter + srcu instead of
  the custom locked queue + flush implementation.
  
  This will slow down the setup ioctls, which should not matter -
  it's slow path anyway. We use the expedited flush to at least
  make sure it has a sane time bound.
  
  Works fine for me. I got reports that with many guests,
  work lock is highly contended, and this patch should in theory
  fix this as well - but I haven't tested this yet.
 
 Hmmm... vhost_poll_flush() becomes synchronize_srcu_expedited().  Can
 you please explain how it works?  synchronize_srcu_expedited() is an
 extremely heavy operation involving scheduling the cpu_stop task on
 all cpus.  I'm not quite sure whether doing it from every flush is a
 good idea.  Is flush supposed to be a very rare operation?

It is rare - happens on guest reboot typically. I guess I will
switch to regular synchronize_srcu.

 Having custom implementation is fine too but let's try to implement
 something generic if at all possible.
 
 Thanks.

Sure. It does seem that avoiding list lock would be pretty hard
in generic code though.
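Tejun's question about work queued after the last flush is commonly handled by having a flush wait only for work that was queued before the flush started. The sketch below shows that idea with a pair of sequence counters under a pthread mutex and condition variable. This is a swapped-in user-space technique for illustration, not the SRCU approach in the patch, and all names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

struct work_seq {
	pthread_mutex_t lock;
	pthread_cond_t done_cond;
	unsigned long queued; /* bumped each time work is queued */
	unsigned long done;   /* bumped each time a work item finishes */
};

#define WORK_SEQ_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 }

/* Queue one item; returns its sequence number. */
static unsigned long work_queue(struct work_seq *w)
{
	unsigned long seq;

	pthread_mutex_lock(&w->lock);
	seq = ++w->queued;
	pthread_mutex_unlock(&w->lock);
	return seq;
}

/* Called by the worker after finishing one item. */
static void work_complete(struct work_seq *w)
{
	pthread_mutex_lock(&w->lock);
	w->done++;
	assert(w->done <= w->queued);
	pthread_cond_broadcast(&w->done_cond);
	pthread_mutex_unlock(&w->lock);
}

/* Wait only for items queued before this call. Items requeued later get
 * larger sequence numbers and do not extend the wait. */
static void work_flush(struct work_seq *w)
{
	unsigned long target;

	pthread_mutex_lock(&w->lock);
	target = w->queued;
	while (w->done < target)
		pthread_cond_wait(&w->done_cond, &w->lock);
	pthread_mutex_unlock(&w->lock);
}
```

A job that requeues itself gets a new sequence number, so a concurrent flush still terminates once the items pending at its start have completed.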

 -- 
 tejun

Re: [PATCH 1/2] KVM: x86 emulator: don't update vcpu state if instruction is restarted.

2010-08-01 Thread Avi Kivity


 On 08/01/2010 11:28 AM, Gleb Natapov wrote:

On Sat, Jul 31, 2010 at 08:25:13PM +0300, Avi Kivity wrote:

  On 07/29/2010 03:11 PM, Gleb Natapov wrote:

No need to update vcpu state since instruction is in the middle of the
emulation.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
  arch/x86/kvm/x86.c |   31 +--
  1 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 76fbc32..7e5f075 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4057,32 +4057,27 @@ restart:
return handle_emulation_failure(vcpu);
}

-   toggle_interruptibility(vcpu, vcpu->arch.emulate_ctxt.interruptibility);
-   kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
-   memcpy(vcpu->arch.regs, c->regs, sizeof c->regs);
-   kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
+   r = EMULATE_DONE;

-   if (vcpu->arch.emulate_ctxt.exception >= 0) {
+   if (vcpu->arch.emulate_ctxt.exception >= 0)
inject_emulated_exception(vcpu);
-   return EMULATE_DONE;
-   }
-
-   if (vcpu->arch.pio.count) {
+   else if (vcpu->arch.pio.count) {
if (!vcpu->arch.pio.in)
vcpu->arch.pio.count = 0;
-   return EMULATE_DO_MMIO;
-   }
-
-   if (vcpu->mmio_needed) {
+   r = EMULATE_DO_MMIO;
+   } else if (vcpu->mmio_needed) {
if (vcpu->mmio_is_write)
vcpu->mmio_needed = 0;
-   return EMULATE_DO_MMIO;
-   }
-
-   if (vcpu->arch.emulate_ctxt.restart)
+   r = EMULATE_DO_MMIO;
+   } else if (vcpu->arch.emulate_ctxt.restart)
goto restart;

-   return EMULATE_DONE;
+   toggle_interruptibility(vcpu, vcpu->arch.emulate_ctxt.interruptibility);
+   kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
+   memcpy(vcpu->arch.regs, c->regs, sizeof c->regs);
+   kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
+
+   return r;
  }
  EXPORT_SYMBOL_GPL(emulate_instruction);


What about kvm-tpr-opt.c?  It uses rip after pio.


It uses rip _during_ pio. And pio emulation changes rip
only at the end of emulation.


But non-emulated pio does a skip_emulated_instruction() immediately (or 
so the code in kvm-tpr-opt.c assumes:



static void vtpr_ioport_write(void *opaque, uint32_t addr, uint32_t val)
{
CPUState *env = cpu_single_env;
uint32_t rip;

cpu_synchronize_state(env);

rip = env->eip - 2;
write_byte_virt(env, rip, 0x66);
write_byte_virt(env, rip + 1, 0x90);


)


Re: [PATCH repost] sched: export sched_set/getaffinity to modules

2010-08-01 Thread Michael S. Tsirkin

On Fri, Jul 30, 2010 at 04:19:01PM +0200, Oleg Nesterov wrote:
 Sorry for the delay, I can't be responsive these days...
 
 On 07/27, Michael S. Tsirkin wrote:
 
  On Mon, Jul 26, 2010 at 08:08:34PM +0200, Oleg Nesterov wrote:
   On 07/26, Sridhar Samudrala wrote:
   
I have been testing out a similar patch that uses kernel_thread() 
without CLONE_FILES
flag rather than create_kthread() and then closing the files.
  
   !CLONE_FILES can't help. copy_files() does dup_fd() in this case.
   The child still inherits the files.
  
Either version should be fine.
  
   I think neither version is fine ;)
  
   exit_files() is not enough either. How about the signals, reparenting?
  
  
   I have already forgotten the details; probably I missed something. But it
   seems to me that it is better to just export get/set affinity and
   forget about all complications.
  
   Oleg.
 
  Oleg, so can I attach your Ack to the patch in question, and merge
  it all through net-next?
 
 Well, I do not think you need my ack ;)
 
 
 But I must admit, I personally dislike this idea. A kernel thread which
 is the child of the user-space process, and in fact it is not the real
 kernel thread. I think this is against the common case. If you do not
 care about the signals/reparenting, why can't you fork the user-space process
 which does all the work via ioctl's ? OK, I do not understand the problem
 domain, probably this can't work.
 
 Anyway, the patch looks buggy to me. Starting from
 
   create_kthread(&create);
   wait_for_completion(&create.done);
 
 At least you should check that create_kthread() succeeds, otherwise
 wait_for_completion() will hang forever. OTOH, if it succeeds then
 wait_for_completion() is not needed. But this is minor.
 
 create_kthread()->kernel_thread() uses CLONE_VM, which means that the
 child will share ->mm. And this means that if the parent receives
 a coredumping signal, it will hang forever in kernel space waiting
 until this child exits.
 
 This is just the immediate surprise I can see with this approach,
 I am afraid there is something else.
 
 And once again: we are doing these hacks only because we lack a
 couple of exports (iiuk). This is, well, a bit strange ;)
 
 Oleg.


Oleg, I meant an Ack for exporting get/set affinity.

Thanks!

-- 
MST
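The goal of the export is a worker that runs with its owner's CPU affinity. That can be illustrated in user space with the GNU extensions `pthread_getaffinity_np()` and `pthread_setaffinity_np()`. In the kernel, the idea is that the vhost worker would call the exported sched_getaffinity()/sched_setaffinity() instead. A thread created directly by its owner already inherits the mask; the explicit copy matters when the worker is created by someone else, as kernel threads are. The struct and function names below are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

struct worker_args {
	cpu_set_t owner_mask; /* affinity snapshot taken in the owner thread */
	bool applied;         /* worker managed to apply owner_mask to itself */
	cpu_set_t observed;   /* worker's affinity read back after applying */
};

static void *worker_fn(void *opaque)
{
	struct worker_args *a = opaque;

	/* Adopt the owner's mask, then read back what is in effect. */
	a->applied = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t),
					    &a->owner_mask) == 0;
	CPU_ZERO(&a->observed);
	pthread_getaffinity_np(pthread_self(), sizeof(cpu_set_t), &a->observed);
	return NULL;
}

/* Snapshot the caller's affinity and run a worker that adopts it. */
static int run_worker_with_owner_affinity(struct worker_args *a)
{
	pthread_t t;
	int err;

	err = pthread_getaffinity_np(pthread_self(), sizeof(cpu_set_t),
				     &a->owner_mask);
	if (err)
		return err;
	assert(CPU_COUNT(&a->owner_mask) > 0);
	err = pthread_create(&t, NULL, worker_fn, a);
	if (err)
		return err;
	return pthread_join(t, NULL);
}
```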

Re: [PATCH 1/2] KVM: x86 emulator: don't update vcpu state if instruction is restarted.

2010-08-01 Thread Gleb Natapov

On Sun, Aug 01, 2010 at 11:54:38AM +0300, Avi Kivity wrote:
  On 08/01/2010 11:28 AM, Gleb Natapov wrote:
 On Sat, Jul 31, 2010 at 08:25:13PM +0300, Avi Kivity wrote:
   On 07/29/2010 03:11 PM, Gleb Natapov wrote:
 No need to update vcpu state since instruction is in the middle of the
 emulation.
 
 Signed-off-by: Gleb Natapov <g...@redhat.com>
 ---
   arch/x86/kvm/x86.c |   31 +--
   1 files changed, 13 insertions(+), 18 deletions(-)
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 76fbc32..7e5f075 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -4057,32 +4057,27 @@ restart:
return handle_emulation_failure(vcpu);
}
 
 -  toggle_interruptibility(vcpu, vcpu->arch.emulate_ctxt.interruptibility);
 -  kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
 -  memcpy(vcpu->arch.regs, c->regs, sizeof c->regs);
 -  kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
 +  r = EMULATE_DONE;
 
 -  if (vcpu->arch.emulate_ctxt.exception >= 0) {
 +  if (vcpu->arch.emulate_ctxt.exception >= 0)
inject_emulated_exception(vcpu);
 -  return EMULATE_DONE;
 -  }
 -
 -  if (vcpu->arch.pio.count) {
 +  else if (vcpu->arch.pio.count) {
if (!vcpu->arch.pio.in)
vcpu->arch.pio.count = 0;
 -  return EMULATE_DO_MMIO;
 -  }
 -
 -  if (vcpu->mmio_needed) {
 +  r = EMULATE_DO_MMIO;
 +  } else if (vcpu->mmio_needed) {
if (vcpu->mmio_is_write)
vcpu->mmio_needed = 0;
 -  return EMULATE_DO_MMIO;
 -  }
 -
 -  if (vcpu->arch.emulate_ctxt.restart)
 +  r = EMULATE_DO_MMIO;
 +  } else if (vcpu->arch.emulate_ctxt.restart)
goto restart;
 
 -  return EMULATE_DONE;
 +  toggle_interruptibility(vcpu, vcpu->arch.emulate_ctxt.interruptibility);
 +  kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
 +  memcpy(vcpu->arch.regs, c->regs, sizeof c->regs);
 +  kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
 +
 +  return r;
   }
   EXPORT_SYMBOL_GPL(emulate_instruction);
 
 What about kvm-tpr-opt.c?  It uses rip after pio.
 
 It uses rip _during_ pio. And pio emulation changes rip
 only at the end of emulation.
 
 But non-emulated pio does a skip_emulated_instruction() immediately
 (or so the code in kvm-tpr-opt.c assumes:
 
Indeed, this is a bug in non-emulated pio. But the patch does not change
rip behaviour for emulated pio. vcpu->arch.emulate_ctxt.eip is updated
only at the end of emulation.

 static void vtpr_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
 CPUState *env = cpu_single_env;
 uint32_t rip;
 
 cpu_synchronize_state(env);
 
 rip = env->eip - 2;
 write_byte_virt(env, rip, 0x66);
 write_byte_virt(env, rip + 1, 0x90);
 
 )
 

--
Gleb.

Re: [PATCH 1/2] KVM: x86 emulator: don't update vcpu state if instruction is restarted.

2010-08-01 Thread Avi Kivity


 On 08/01/2010 12:01 PM, Gleb Natapov wrote:


It uses rip _during_ pio. And pio emulation changes rip
only at the end of emulation.

But non-emulated pio does a skip_emulated_instruction() immediately
(or so the code in kvm-tpr-opt.c assumes:


Indeed, this is a bug in non-emulated pio.


But userspace depends on this bug.


But the patch does not change
rip behaviour for emulated pio. vcpu->arch.emulate_ctxt.eip is updated
only at the end of emulation.


That will lead to failures if the emulator is used for the kvm-tpr-opt 
pio (which may happen with big real mode).




Re: [PATCH 1/2] KVM: x86 emulator: don't update vcpu state if instruction is restarted.

2010-08-01 Thread Gleb Natapov

On Sun, Aug 01, 2010 at 12:14:40PM +0300, Avi Kivity wrote:
  On 08/01/2010 12:01 PM, Gleb Natapov wrote:
 
 It uses rip _during_ pio. And pio emulation changes rip
 only at the end of emulation.
 But non-emulated pio does a skip_emulated_instruction() immediately
 (or so the code in kvm-tpr-opt.c assumes:
 
 Indeed, this is a bug in non-emulated pio.
 
 But userspace depends on this bug.
We can fix that, or make it smarter. Look for io instruction at
rip/rip-2 and use rip accordingly for instance.

 
 But the patch does not change
 rip behaviour for emulated pio. vcpu->arch.emulate_ctxt.eip is updated
 only at the end of emulation.
 
 That will lead to failures if the emulator is used for the
 kvm-tpr-opt pio (which may happen with big real mode).
 
IIRC it was always this way in emulator. I'd rather fix userspace than
break emulator.
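The "make it smarter" option for userspace could look roughly like this: check whether the bytes just before rip, or at rip, form a 2-byte `in`/`out` instruction with an immediate port, and patch whichever one matches. This is a user-space sketch with hypothetical names. A real fix would read guest memory through qemu's own accessors, and the check remains ambiguous if both locations happen to look like an I/O instruction to the same port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcodes of the 2-byte port I/O forms that take an imm8 port number:
 * E4 in al,imm8; E5 in eax,imm8; E6 out imm8,al; E7 out imm8,eax. */
static int is_pio_imm8(uint8_t op)
{
	return op >= 0xe4 && op <= 0xe7;
}

/* mem is a hypothetical snapshot of guest memory indexed by guest address.
 * Returns the address of the I/O instruction that accessed `port`, or -1. */
static long find_pio_insn(const uint8_t *mem, size_t len, size_t rip,
			  uint8_t port)
{
	assert(mem != NULL);
	/* rip already advanced past the instruction, as kvm-tpr-opt.c assumes */
	if (rip >= 2 && rip <= len && is_pio_imm8(mem[rip - 2]) &&
	    mem[rip - 1] == port)
		return (long)(rip - 2);
	/* rip still points at the instruction, as in the emulator */
	if (rip + 1 < len && is_pio_imm8(mem[rip]) && mem[rip + 1] == port)
		return (long)rip;
	return -1;
}
```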

--
Gleb.

Re: [PATCH 1/2] KVM: x86 emulator: don't update vcpu state if instruction is restarted.

2010-08-01 Thread Avi Kivity


 On 08/01/2010 12:24 PM, Gleb Natapov wrote:

On Sun, Aug 01, 2010 at 12:14:40PM +0300, Avi Kivity wrote:

  On 08/01/2010 12:01 PM, Gleb Natapov wrote:

It uses rip _during_ pio. And pio emulation changes rip
only at the end of emulation.

But non-emulated pio does a skip_emulated_instruction() immediately
(or so the code in kvm-tpr-opt.c assumes:


Indeed, this is a bug in non-emulated pio.

But userspace depends on this bug.

We can fix that, or make it smarter. Look for io instruction at
rip/rip-2 and use rip accordingly for instance.


That requires everyone to update, or suffer major breakage.


But the patch does not change
rip behaviour for emulated pio. vcpu->arch.emulate_ctxt.eip is updated
only at the end of emulation.

That will lead to failures if the emulator is used for the
kvm-tpr-opt pio (which may happen with big real mode).


IIRC it was always this way in emulator. I'd rather fix userspace than
break emulator.


It wasn't a problem because the emulator wasn't (and still isn't) used 
for this.  But it has the potential to break badly once we make 
emulate_invalid_guest_state=1 the default.



Re: [PATCH 1/2] KVM: x86 emulator: don't update vcpu state if instruction is restarted.

2010-08-01 Thread Gleb Natapov

On Sun, Aug 01, 2010 at 01:00:11PM +0300, Avi Kivity wrote:
  On 08/01/2010 12:24 PM, Gleb Natapov wrote:
 On Sun, Aug 01, 2010 at 12:14:40PM +0300, Avi Kivity wrote:
   On 08/01/2010 12:01 PM, Gleb Natapov wrote:
 It uses rip _during_ pio. And pio emulation changes rip
 only at the end of emulation.
 But non-emulated pio does a skip_emulated_instruction() immediately
 (or so the code in kvm-tpr-opt.c assumes:
 
 Indeed, this is a bug in non-emulated pio.
 But userspace depends on this bug.
 We can fix that, or make it smarter. Look for io instruction at
 rip/rip-2 and use rip accordingly for instance.
 
 That requires everyone to update, or suffer major breakage.
 
They will suffer major breakage when they update to a kvm that calls to
kvm-tpr-opt.c from emulator anyway.

 But the patch does not change
 rip behaviour for emulated pio. vcpu->arch.emulate_ctxt.eip is updated
 only at the end of emulation.
 That will lead to failures if the emulator is used for the
 kvm-tpr-opt pio (which may happen with big real mode).
 
 IIRC it was always this way in emulator. I'd rather fix userspace than
 break emulator.
 
 It wasn't a problem because the emulator wasn't (and still isn't)
 used for this.  But it has the potential to break badly once we make
 emulate_invalid_guest_state=1 the default.
 
So what can we do about it?

--
Gleb.

Re: [PATCH 1/2] KVM: x86 emulator: don't update vcpu state if instruction is restarted.

2010-08-01 Thread Avi Kivity


 On 08/01/2010 01:53 PM, Gleb Natapov wrote:



That requires everyone to update, or suffer major breakage.


They will suffer major breakage when they update to a kvm that calls to
kvm-tpr-opt.c from emulator anyway.


Why?


IIRC it was always this way in emulator. I'd rather fix userspace than
break emulator.

It wasn't a problem because the emulator wasn't (and still isn't)
used for this.  But it has the potential to break badly once we make
emulate_invalid_guest_state=1 the default.


So what can we do about it?



Keep the existing behaviour.


Re: [PATCH 1/2] KVM: x86 emulator: don't update vcpu state if instruction is restarted.

2010-08-01 Thread Gleb Natapov

On Sun, Aug 01, 2010 at 03:17:10PM +0300, Avi Kivity wrote:
  On 08/01/2010 01:53 PM, Gleb Natapov wrote:
 
 That requires everyone to update, or suffer major breakage.
 
 They will suffer major breakage when they update to a kvm that calls to
 kvm-tpr-opt.c from emulator anyway.
 
 Why?
 
Because tpr code will be called with wrong rip. Emulator always updated rip at the end
of an instruction emulation in writeback stage.

 IIRC it was always this way in emulator. I'd rather fix userspace than
 break emulator.
 It wasn't a problem because the emulator wasn't (and still isn't)
 used for this.  But it has the potential to break badly once we make
 emulate_invalid_guest_state=1 the default.
 
 So what can we do about it?
 
 
 Keep the existing behaviour.
 
Existing behaviour will cause breakage.

--
Gleb.

Re: [PATCH 1/2] KVM: x86 emulator: don't update vcpu state if instruction is restarted.

2010-08-01 Thread Avi Kivity


 On 08/01/2010 03:23 PM, Gleb Natapov wrote:

On Sun, Aug 01, 2010 at 03:17:10PM +0300, Avi Kivity wrote:

  On 08/01/2010 01:53 PM, Gleb Natapov wrote:

That requires everyone to update, or suffer major breakage.


They will suffer major breakage when they update to a kvm that calls to
kvm-tpr-opt.c from emulator anyway.

Why?


Because tpr code will be called with wrong rip. Emulator always updated rip at the end
of an instruction emulation in writeback stage.



We can change it before enabling e_i_g_s by default.



So what can we do about it?


Keep the existing behaviour.


Existing behaviour will cause breakage.



The existing user-visible behaviour.  The user doesn't know whether the 
emulator is involved or not.



Re: [PATCH 1/2] KVM: x86 emulator: don't update vcpu state if instruction is restarted.

2010-08-01 Thread Gleb Natapov

On Sun, Aug 01, 2010 at 03:35:41PM +0300, Avi Kivity wrote:
  On 08/01/2010 03:23 PM, Gleb Natapov wrote:
 On Sun, Aug 01, 2010 at 03:17:10PM +0300, Avi Kivity wrote:
   On 08/01/2010 01:53 PM, Gleb Natapov wrote:
 That requires everyone to update, or suffer major breakage.
 
 They will suffer major breakage when they update to a kvm that calls to
 kvm-tpr-opt.c from emulator anyway.
 Why?
 
 Because tpr code will be called with wrong rip. Emulator always updated rip at the end
 of an instruction emulation in writeback stage.
 
 
 We can change it before enabling e_i_g_s by default.
 
 
Break the emulator? We can't increment rip for all instructions before
emulation, since an exception would then be injected at the wrong rip.
Adding code that rolls back rip in case of an exception would complicate
things, and exceptions are not the only reason to keep rip pointing at the
instruction. We may want to re-enter the guest to re-execute it, for instance.

 So what can we do about it?
 
 Keep the existing behaviour.
 
 Existing behaviour will cause breakage.
 
 
 The existing user-visible behaviour.  The user doesn't know whether
 the emulator is involved or not.
 
When are we going to enable e_i_g_s by default? Maybe we have enough
time to fix userspace? Very old userspace already does not run on recent
kvm. Or maybe we can make userspace enable e_i_g_s per guest. This way
userspace that knows it is OK can tell the kernel so.
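The point about exceptions can be seen in a toy model. If rip is advanced only in the writeback step, a faulting instruction leaves rip at the instruction itself, which is where the exception must be reported. Everything here (types, fields, the notion of a "faulting" instruction) is hypothetical and far simpler than emulate_instruction().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum emu_result { EMU_DONE, EMU_EXCEPTION };

struct toy_vcpu {
	uint64_t rip;
	uint64_t rax;
};

/* Hypothetical pre-decoded instruction: its length, whether executing it
 * raises an exception, and the value it would load into rax. */
struct toy_insn {
	unsigned len;
	bool faults;
	uint64_t new_rax;
};

static enum emu_result toy_emulate(struct toy_vcpu *vcpu,
				   const struct toy_insn *insn,
				   uint64_t *fault_rip)
{
	/* Work on a private copy, the way the emulator works on emulate_ctxt. */
	struct toy_vcpu ctxt = *vcpu;

	assert(insn->len > 0);
	if (insn->faults) {
		/* Nothing is written back: rip still names the faulting insn. */
		*fault_rip = vcpu->rip;
		return EMU_EXCEPTION;
	}
	ctxt.rax = insn->new_rax;
	ctxt.rip += insn->len;
	*vcpu = ctxt; /* writeback: the only place vcpu state changes */
	return EMU_DONE;
}
```

Advancing rip up front would instead require undoing the advance on every exception path, which is the complication described above.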

--
Gleb.

Re: [PATCH 00/27] KVM PPC PV framework v3

2010-08-01 Thread Avi Kivity


 On 07/29/2010 03:47 PM, Alexander Graf wrote:

On PPC we run PR=0 (kernel mode) code in PR=1 (user mode) and don't use the
hypervisor extensions.

While that is all great to show that virtualization is possible, there are
quite a few cases where the emulation overhead of privileged instructions is
killing performance.

This patchset tackles exactly that issue. It introduces a paravirtual framework
through which KVM and Linux share a page to exchange register state. That
way we don't have to switch to the hypervisor just to change a value of a
privileged register.

To prove my point, I ran the same test I did for the MMU optimizations against
the PV framework. Here are the results:

[without]

debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done

real    0m14.659s
user    0m8.967s
sys     0m5.688s

[with]

debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done

real    0m7.557s
user    0m4.121s
sys     0m3.426s


So this is a significant performance improvement! I'm quite happy how fast this
whole thing becomes :)
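Putting the two timings side by side (1000 iterations each) gives the following ratios:

```latex
\frac{14.659\,\mathrm{s}}{7.557\,\mathrm{s}} \approx 1.94\ (\text{real}), \qquad
\frac{8.967\,\mathrm{s}}{4.121\,\mathrm{s}} \approx 2.18\ (\text{user}), \qquad
\frac{5.688\,\mathrm{s}}{3.426\,\mathrm{s}} \approx 1.66\ (\text{sys})

\frac{14.659\,\mathrm{s}}{1000} \approx 14.7\,\mathrm{ms} \;\longrightarrow\; \frac{7.557\,\mathrm{s}}{1000} \approx 7.6\,\mathrm{ms}\ \text{per iteration}
```

Wall-clock time per iteration roughly halves, and user time shows the largest relative gain.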

I tried to take all comments I've heard from people so far about such a PV
framework into account. In case you told me something before that is a no-go
and I still did it, please just tell me again.

To make use of this whole thing you also need patches to qemu and openbios. I
have them in my queue, but want to see this set upstream first before I start
sending patches to the other projects.

Now go and have fun with fast VMs on PPC! Get yourself a G5 on ebay and start
experiencing the power yourself. - heh

v1 -> v2:

   - change hypervisor calls to use r0 and r3
   - make crit detection only trigger in supervisor mode
   - RMO -> PAM
   - introduce kvm_patch_ins
   - only flush icache when patching
   - introduce kvm_patch_ins_b
   - update documentation

v2 -> v3:

   - use pPAPR conventions for hypercall interface
   - only use r0 as magic sc number
   - remove PVR detection
   - remove BookE shared page mapping support
   - combine book3s-64 and -32 magic page ra override
   - add self-test check if the mapping works to guest code
   - add safety check for relocatable kernels



Looks reasonable.  Since it's fair to say I understand nothing about 
powerpc, I'd like someone who does to review it and ack, please, with an 
emphasis on the interfaces.
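The shared-page idea from the cover letter can be modelled in a few lines. Privileged register reads that would otherwise trap become plain loads from a page that both sides can see. The layout, field names and exit counter below are hypothetical illustrations, not the interface in the patchset, which is what defines the registers that can be handled this way and how writes are synchronised.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared-page layout; the real one is defined by the patchset. */
struct shared_page {
	uint64_t msr;
	uint64_t sprg[4];
};

struct toy_hv {
	struct shared_page page; /* visible to both guest and host */
	unsigned long exits;     /* times the guest trapped to the host */
};

/* Without PV: reading a privileged register traps, and the host answers. */
static uint64_t mfmsr_trap(struct toy_hv *hv)
{
	hv->exits++;
	return hv->page.msr;
}

/* With PV: the guest loads the value straight from the shared page. */
static uint64_t mfmsr_pv(const struct shared_page *page)
{
	return page->msr;
}

/* With PV: a write to a scratch register becomes a plain store to the
 * shared page, which the host can read from there. */
static void mtsprg_pv(struct shared_page *page, unsigned n, uint64_t val)
{
	assert(n < 4);
	page->sprg[n] = val;
}
```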



Re: [PATCH 4/7] KVM: PPC: Add book3s_32 tlbie flush acceleration

2010-08-01 Thread Avi Kivity


 On 07/29/2010 04:04 PM, Alexander Graf wrote:

On Book3s_32 the tlbie instruction flushes effective addresses by the mask
0x0000. This is pretty hard to reflect with a hash that hashes ~0xfff, so
to speed up that target we should also keep a special hash around for it.


  static inline u64 kvmppc_mmu_hash_vpte(u64 vpage)
  {
return hash_64(vpage & 0xfULL, HPTEG_HASH_BITS_VPTE);
@@ -66,6 +72,11 @@ void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
index = kvmppc_mmu_hash_pte(pte->pte.eaddr);
hlist_add_head_rcu(&pte->list_pte, &vcpu->arch.hpte_hash_pte[index]);

+   /* Add to ePTE_long list */
+   index = kvmppc_mmu_hash_pte_long(pte->pte.eaddr);
+   hlist_add_head_rcu(&pte->list_pte_long,
+   &vcpu->arch.hpte_hash_pte_long[index]);
+


Isn't it better to make operations on this list conditional on 
Book3s_32?  Hashes are expensive since they usually cost cache misses.


Can of course be done later as an optimization.
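The suggestion to skip the extra hash and list insertion unless the vcpu is Book3s_32 could look like the toy below. The multiplicative hash is written in the style of Linux's hash_64() but uses its own constant, the masks are parameters, and bucket counters stand in for real hlists. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_HASH_BITS 4
#define TOY_BUCKETS (1u << TOY_HASH_BITS)

/* Multiplicative hash: keep the top `bits` bits of v * constant. */
static unsigned toy_hash(uint64_t v, unsigned bits)
{
	assert(bits > 0 && bits < 64);
	return (unsigned)((v * UINT64_C(0x9E3779B97F4A7C15)) >> (64 - bits));
}

struct toy_hpte_cache {
	bool book3s_32;                 /* only this target needs the extra list */
	unsigned pte[TOY_BUCKETS];      /* entries per bucket, page-granular hash */
	unsigned pte_long[TOY_BUCKETS]; /* entries per bucket, tlbie-flush hash */
	unsigned long hashes;           /* hash computations performed */
};

static void toy_cache_map(struct toy_hpte_cache *c, uint64_t eaddr,
			  uint64_t long_mask)
{
	c->pte[toy_hash(eaddr & ~UINT64_C(0xfff), TOY_HASH_BITS)]++;
	c->hashes++;
	/* Extra hash and insertion only where tlbie needs it. */
	if (c->book3s_32) {
		c->pte_long[toy_hash(eaddr & long_mask, TOY_HASH_BITS)]++;
		c->hashes++;
	}
}
```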


Re: [PATCH 0/7] Rest of my KVM-PPC patch queue

2010-08-01 Thread Avi Kivity


 On 07/29/2010 04:04 PM, Alexander Graf wrote:

During the past few weeks a couple of fixes have gathered in my queue. This
is a dump of everything that is not related to the PV framework.

Please apply on top of the PV stuff.



Looks reasonable as well.  I'll apply as soon as I get a review on the 
previous patchset.



[PATCH 00/15] More emulator cleanups

2010-08-01 Thread Avi Kivity

This patchset further cleans up the emulator.  The goal is to push
segment decoding into 'struct operand', but a few things stood in
the way.
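As a rough illustration of that goal, an operand that carries its own segment alongside the effective address might look like this. The field names and the flat segment-base table are hypothetical simplifications, not the layout the series ends up with, and byte-register forms such as AH are ignored.

```c
#include <assert.h>
#include <stdint.h>

enum op_type { OP_REG, OP_MEM, OP_IMM, OP_NONE };

/* Hypothetical: a memory operand keeps its segment with its address. */
struct seg_addr {
	unsigned seg; /* index into a segment-base table */
	uint64_t ea;  /* effective address within that segment */
};

struct operand {
	enum op_type type;
	unsigned bytes; /* operand size: 1, 2, 4 or 8 */
	union {
		uint64_t *reg;       /* OP_REG: points into the register file */
		struct seg_addr mem; /* OP_MEM: segment + effective address */
	} addr;
	uint64_t val;
};

/* Value of a register operand, truncated to the operand size. */
static uint64_t operand_reg_value(const struct operand *op)
{
	assert(op->type == OP_REG);
	switch (op->bytes) {
	case 1: return *op->addr.reg & 0xff;
	case 2: return *op->addr.reg & 0xffff;
	case 4: return *op->addr.reg & 0xffffffff;
	default: return *op->addr.reg;
	}
}

/* Linear address of a memory operand; no limit or permission checks. */
static uint64_t operand_linear(const struct operand *op,
			       const uint64_t *seg_base)
{
	assert(op->type == OP_MEM);
	return seg_base[op->addr.mem.seg] + op->addr.mem.ea;
}
```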

Avi Kivity (15):
  KVM: x86 emulator: push segment override out of decode_modrm()
  KVM: x86 emulator: use correct type for memory address in operands
  KVM: x86 emulator: simplify xchg decode tables
  KVM: x86 emulator: use SrcAcc to simplify xchg decoding
  KVM: x86 emulator: put register operand fetch into a function
  KVM: x86 emulator: drop use_modrm_ea
  KVM: x86 emulator: simplify REX.W check
  KVM: x86 emulator: introduce Force64 for forcing operand size to 64
bits
  KVM: x86 emulator: mark mov cr and mov dr as 64-bit instructions in
long mode
  KVM: x86 emulator: use struct operand for mov reg,cr and mov cr,reg
for reg op
  KVM: x86 emulator: use struct operand for mov reg,dr and mov dr,reg
for reg op
  KVM: x86 emulator: add NoAccess flag for memory instructions that
skip access
  KVM: x86 emulator: switch LEA to use SrcMem decoding
  KVM: x86 emulator: change invlpg emulation to use src.mem.addr
  KVM: x86 emulator: Decode memory operands directly into a 'struct
operand'

 arch/x86/include/asm/kvm_emulate.h |   11 +-
 arch/x86/kvm/emulate.c |  321 
 2 files changed, 147 insertions(+), 185 deletions(-)


[PATCH 03/15] KVM: x86 emulator: simplify xchg decode tables

2010-08-01 Thread Avi Kivity

Use X8() to avoid repetition.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/emulate.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 61d728d..745353e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2142,7 +2142,7 @@ static struct opcode opcode_table[256] = {
 	D(DstMem | SrcNone | ModRM | Mov), D(ModRM | DstReg),
 	D(ImplicitOps | SrcMem16 | ModRM), G(0, group1A),
 	/* 0x90 - 0x97 */
-	D(DstReg), D(DstReg), D(DstReg), D(DstReg), D(DstReg), D(DstReg), D(DstReg), D(DstReg),
+	X8(D(DstReg)),
 	/* 0x98 - 0x9F */
 	N, N, D(SrcImmFAddr | No64), N,
 	D(ImplicitOps | Stack), D(ImplicitOps | Stack), N, N,
-- 
1.7.1
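For readers new to the helper, X8() expands its argument eight times inside an initializer list. A standalone sketch, assuming X8() is built from doubling macros; `struct opcode`, `D()` and the `DstReg` bit value below are simplified stand-ins, not the emulate.c definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the emulator's decode flag and table entry. */
#define DstReg (1u << 1)

struct opcode {
	unsigned flags;
};

#define D(f) { .flags = (f) }

/* Doubling helpers: each level repeats the previous one twice. */
#define X2(x) x, x
#define X4(x) X2(x), X2(x)
#define X8(x) X4(x), X4(x)

/* Opcodes 0x90 - 0x97: eight identical entries from a single line. */
static const struct opcode xchg_block[] = {
	X8(D(DstReg)),
};

static size_t xchg_block_len(void)
{
	return sizeof(xchg_block) / sizeof(xchg_block[0]);
}
```

Because the argument D(DstReg) contains no top-level comma, it passes through each doubling level as a single macro argument.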

[PATCH 04/15] KVM: x86 emulator: use SrcAcc to simplify xchg decoding

2010-08-01 Thread Avi Kivity

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/emulate.c |   15 ++++-----------
 1 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 745353e..4d510c3 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2142,7 +2142,7 @@ static struct opcode opcode_table[256] = {
 	D(DstMem | SrcNone | ModRM | Mov), D(ModRM | DstReg),
 	D(ImplicitOps | SrcMem16 | ModRM), G(0, group1A),
 	/* 0x90 - 0x97 */
-	X8(D(DstReg)),
+	X8(D(SrcAcc | DstReg)),
 	/* 0x98 - 0x9F */
 	N, N, D(SrcImmFAddr | No64), N,
 	D(ImplicitOps | Stack), D(ImplicitOps | Stack), N, N,
@@ -2927,16 +2927,9 @@ special_insn:
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
 		break;
-	case 0x90: /* nop / xchg r8,rax */
-		if (c->dst.addr.reg == &c->regs[VCPU_REGS_RAX]) {
-			c->dst.type = OP_NONE;  /* nop */
-			break;
-		}
-	case 0x91 ... 0x97: /* xchg reg,rax */
-		c->src.type = OP_REG;
-		c->src.bytes = c->op_bytes;
-		c->src.addr.reg = &c->regs[VCPU_REGS_RAX];
-		c->src.val = *(c->src.addr.reg);
+	case 0x90 ... 0x97: /* nop / xchg reg, rax */
+		if (c->dst.addr.reg == &c->regs[VCPU_REGS_RAX])
+			goto done;
 		goto xchg;
 	case 0x9c: /* pushf */
 		c->src.val =  (unsigned long) ctxt->eflags;
-- 
1.7.1
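The nop test compares decoded registers rather than opcode bytes because, in 64-bit mode, 0x90 with REX.B set encodes xchg r8, rax, which is not a nop. A standalone sketch of that register-index computation, using the same formula as decode_register_operand(); the helper names are hypothetical, not emulate.c code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Register encoded by the one-byte xchg opcodes 0x90 - 0x97: the low
 * three opcode bits, extended by REX.B (bit 0 of the REX prefix).
 */
static unsigned xchg_reg_index(uint8_t opcode, uint8_t rex_prefix)
{
	return (opcode & 7) | ((rex_prefix & 1) << 3);
}

/* Exchanging rax (index 0) with itself is the architectural nop. */
static bool xchg_is_nop(uint8_t opcode, uint8_t rex_prefix)
{
	return xchg_reg_index(opcode, rex_prefix) == 0;
}
```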

[PATCH 12/15] KVM: x86 emulator: add NoAccess flag for memory instructions that skip access

2010-08-01 Thread Avi Kivity

Use for INVLPG, which accesses the tlb, not memory.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/emulate.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5bc62f2..29312a0 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -83,6 +83,7 @@
 #define Group       (1<<14) /* Bits 3:5 of modrm byte extend opcode */
 #define GroupDual   (1<<15) /* Alternate decoding of mod == 3 */
 /* Misc flags */
+#define NoAccess    (1<<23) /* Don't access memory (lea/invlpg/verr etc) */
 #define Force64     (1<<24) /* Force operand size to 64 bits in 64-bit mode */
 #define Undefined   (1<<25) /* No Such Instruction */
 #define Lock        (1<<26) /* lock prefix is allowed for the instruction */
@@ -2062,7 +2063,8 @@ static struct opcode group5[] = {
 static struct group_dual group7 = { {
 	N, N, D(ModRM | SrcMem | Priv), D(ModRM | SrcMem | Priv),
 	D(SrcNone | ModRM | DstMem | Mov), N,
-	D(SrcMem16 | ModRM | Mov | Priv), D(SrcMem | ModRM | ByteOp | Priv),
+	D(SrcMem16 | ModRM | Mov | Priv),
+	D(SrcMem | ModRM | ByteOp | Priv | NoAccess),
 }, {
 	D(SrcNone | ModRM | Priv), N, N, D(SrcNone | ModRM | Priv),
 	D(SrcNone | ModRM | DstMem | Mov), N,
@@ -2444,7 +2446,7 @@ done_prefixes:
 		c->src.bytes = (c->d & ByteOp) ? 1 :
 							   c->op_bytes;
 		/* Don't fetch the address for invlpg: it could be unmapped. */
-		if (c->twobyte && c->b == 0x01 && c->modrm_reg == 7)
+		if (c->d & NoAccess)
 			break;
 	srcmem_common:
 		/*
-- 
1.7.1
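The flag turns "compute the effective address but do not touch memory" into a property of the decode-table entry instead of an opcode check inside the decoder. A standalone sketch; apart from NoAccess's bit 23, the flag values, the descriptor struct and the read counter are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values; only NoAccess mirrors the patch's (1<<23). */
#define SrcMem   (1u << 4)
#define NoAccess (1u << 23)

struct insn_desc {
	const char *name;
	uint32_t flags;
};

/* Stand-in for a guest memory read; counts how often it is called. */
static int mem_reads;

static unsigned long read_guest(unsigned long addr)
{
	mem_reads++;
	return addr ^ 0xdeadbeefUL;	/* dummy value */
}

/*
 * Decode a memory source operand: the effective address is always
 * returned, but the value is fetched only when the descriptor lacks
 * NoAccess (as for LEA or INVLPG).
 */
static unsigned long decode_src_mem(const struct insn_desc *d,
				    unsigned long ea, unsigned long *val)
{
	if ((d->flags & SrcMem) && !(d->flags & NoAccess))
		*val = read_guest(ea);
	return ea;
}
```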

[PATCH 10/15] KVM: x86 emulator: use struct operand for mov reg,cr and mov cr,reg for reg op

2010-08-01 Thread Avi Kivity

This is an ordinary modrm source or destination; use the standard structure
representing it.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/emulate.c |    9 ++++-----
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index d1a6cd7..53e5c60 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2205,8 +2205,8 @@ static struct opcode twobyte_table[256] = {
 	/* 0x10 - 0x1F */
 	N, N, N, N, N, N, N, N, D(ImplicitOps | ModRM), N, N, N, N, N, N, N,
 	/* 0x20 - 0x2F */
-	D(ModRM | ImplicitOps | Priv | Force64), D(ModRM | Priv | Force64),
-	D(ModRM | ImplicitOps | Priv | Force64), D(ModRM | Priv | Force64),
+	D(ModRM | DstMem | Priv | Force64), D(ModRM | Priv | Force64),
+	D(ModRM | SrcMem | Priv | Force64), D(ModRM | Priv | Force64),
 	N, N, N, N,
 	N, N, N, N, N, N, N, N,
 	/* 0x30 - 0x3F */
@@ -3228,8 +3228,7 @@ twobyte_insn:
 			emulate_ud(ctxt);
 			goto done;
 		}
-		c->regs[c->modrm_rm] = ops->get_cr(c->modrm_reg, ctxt->vcpu);
-		c->dst.type = OP_NONE;	/* no writeback */
+		c->dst.val = ops->get_cr(c->modrm_reg, ctxt->vcpu);
 		break;
 	case 0x21: /* mov from dr to reg */
 		if ((ops->get_cr(4, ctxt->vcpu) & X86_CR4_DE) &&
@@ -3241,7 +3240,7 @@ twobyte_insn:
 		c->dst.type = OP_NONE;	/* no writeback */
 		break;
 	case 0x22: /* mov reg, cr */
-		if (ops->set_cr(c->modrm_reg, c->modrm_val, ctxt->vcpu)) {
+		if (ops->set_cr(c->modrm_reg, c->src.val, ctxt->vcpu)) {
 			emulate_gp(ctxt, 0);
 			goto done;
 		}
-- 
1.7.1

[PATCH 14/15] KVM: x86 emulator: change invlpg emulation to use src.mem.addr

2010-08-01 Thread Avi Kivity

Instead of using modrm_ea, which will soon be gone.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/emulate.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 46a5d75..de1ed94 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3194,7 +3194,7 @@ twobyte_insn:
 			emulate_ud(ctxt);
 			goto done;
 		case 7: /* invlpg*/
-			emulate_invlpg(ctxt->vcpu, c->modrm_ea);
+			emulate_invlpg(ctxt->vcpu, c->src.addr.mem);
 			/* Disable writeback. */
 			c->dst.type = OP_NONE;
 			break;
-- 
1.7.1

[PATCH 15/15] KVM: x86 emulator: Decode memory operands directly into a 'struct operand'

2010-08-01 Thread Avi Kivity

Since modrm operand can be either register or memory, decoding it into
a 'struct operand', which can represent both, is simpler.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |3 -
 arch/x86/kvm/emulate.c |  125 ---
 2 files changed, 57 insertions(+), 71 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index db4a248..99c1c57 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -199,9 +199,6 @@ struct decode_cache {
 	u8 modrm_rm;
 	u8 modrm_seg;
 	bool rip_relative;
-	unsigned long modrm_ea;
-	void *modrm_ptr;
-	unsigned long modrm_val;
 	struct fetch_cache fetch;
 	struct read_cache io_read;
 	struct read_cache mem_read;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index de1ed94..49a174f 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -581,12 +581,14 @@ static void decode_register_operand(struct operand *op,
 }
 
 static int decode_modrm(struct x86_emulate_ctxt *ctxt,
-			struct x86_emulate_ops *ops)
+			struct x86_emulate_ops *ops,
+			struct operand *op)
 {
 	struct decode_cache *c = &ctxt->decode;
 	u8 sib;
 	int index_reg = 0, base_reg = 0, scale;
 	int rc = X86EMUL_CONTINUE;
+	ulong modrm_ea = 0;
 
 	if (c->rex_prefix) {
 		c->modrm_reg = (c->rex_prefix & 4) << 1;	/* REX.R */
@@ -598,16 +600,19 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
 	c->modrm_mod |= (c->modrm & 0xc0) >> 6;
 	c->modrm_reg |= (c->modrm & 0x38) >> 3;
 	c->modrm_rm |= (c->modrm & 0x07);
-	c->modrm_ea = 0;
 	c->modrm_seg = VCPU_SREG_DS;
 
 	if (c->modrm_mod == 3) {
-		c->modrm_ptr = decode_register(c->modrm_rm,
+		op->type = OP_REG;
+		op->bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
+		op->addr.reg = decode_register(c->modrm_rm,
 					       c->regs, c->d & ByteOp);
-		c->modrm_val = *(unsigned long *)c->modrm_ptr;
+		fetch_register_operand(op);
 		return rc;
 	}
 
+	op->type = OP_MEM;
+
 	if (c->ad_bytes == 2) {
 		unsigned bx = c->regs[VCPU_REGS_RBX];
 		unsigned bp = c->regs[VCPU_REGS_RBP];
@@ -618,46 +623,46 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
 		switch (c->modrm_mod) {
 		case 0:
 			if (c->modrm_rm == 6)
-				c->modrm_ea += insn_fetch(u16, 2, c->eip);
+				modrm_ea += insn_fetch(u16, 2, c->eip);
 			break;
 		case 1:
-			c->modrm_ea += insn_fetch(s8, 1, c->eip);
+			modrm_ea += insn_fetch(s8, 1, c->eip);
 			break;
 		case 2:
-			c->modrm_ea += insn_fetch(u16, 2, c->eip);
+			modrm_ea += insn_fetch(u16, 2, c->eip);
 			break;
 		}
 		switch (c->modrm_rm) {
 		case 0:
-			c->modrm_ea += bx + si;
+			modrm_ea += bx + si;
 			break;
 		case 1:
-			c->modrm_ea += bx + di;
+			modrm_ea += bx + di;
 			break;
 		case 2:
-			c->modrm_ea += bp + si;
+			modrm_ea += bp + si;
 			break;
 		case 3:
-			c->modrm_ea += bp + di;
+			modrm_ea += bp + di;
 			break;
 		case 4:
-			c->modrm_ea += si;
+			modrm_ea += si;
 			break;
 		case 5:
-			c->modrm_ea += di;
+			modrm_ea += di;
 			break;
 		case 6:
 			if (c->modrm_mod != 0)
-				c->modrm_ea += bp;
+				modrm_ea += bp;
 			break;
 		case 7:
-			c->modrm_ea += bx;
+			modrm_ea += bx;
 			break;
 		}
 		if (c->modrm_rm == 2 || c->modrm_rm == 3 ||
 		    (c->modrm_rm == 6 && c->modrm_mod != 0))
 			c->modrm_seg = VCPU_SREG_SS;
-		c->modrm_ea = (u16)c->modrm_ea;
+		modrm_ea = (u16)modrm_ea;
 	} else {
 		/* 32/64-bit ModR/M decode. */
 		if ((c->modrm_rm & 7) == 4) {
@@ -667,48 +672,51 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
 			scale = sib >> 6;
 
 			if ((base_reg & 7) == 5 && c->modrm_mod ==

[PATCH 13/15] KVM: x86 emulator: switch LEA to use SrcMem decoding

2010-08-01 Thread Avi Kivity

The NoAccess flag will prevent memory from being accessed.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/emulate.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 29312a0..46a5d75 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2148,7 +2148,7 @@ static struct opcode opcode_table[256] = {
 	/* 0x88 - 0x8F */
 	D(ByteOp | DstMem | SrcReg | ModRM | Mov), D(DstMem | SrcReg | ModRM | Mov),
 	D(ByteOp | DstReg | SrcMem | ModRM | Mov), D(DstReg | SrcMem | ModRM | Mov),
-	D(DstMem | SrcNone | ModRM | Mov), D(ModRM | DstReg),
+	D(DstMem | SrcNone | ModRM | Mov), D(ModRM | SrcMem | NoAccess | DstReg),
 	D(ImplicitOps | SrcMem16 | ModRM), G(0, group1A),
 	/* 0x90 - 0x97 */
 	X8(D(SrcAcc | DstReg)),
@@ -2883,7 +2883,7 @@ special_insn:
 		c->dst.val = ops->get_segment_selector(c->modrm_reg, ctxt->vcpu);
 		break;
 	case 0x8d: /* lea r16/r32, m */
-		c->dst.val = c->modrm_ea;
+		c->dst.val = c->src.addr.mem;
 		break;
 	case 0x8e: { /* mov seg, r/m16 */
 		uint16_t sel;
case 0x8e: { /* mov seg, r/m16 */
uint16_t sel;
-- 
1.7.1

[PATCH 11/15] KVM: x86 emulator: use struct operand for mov reg,dr and mov dr,reg for reg op

2010-08-01 Thread Avi Kivity

This is an ordinary modrm source or destination; use the standard structure
representing it.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/emulate.c |    9 ++++-----
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 53e5c60..5bc62f2 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2205,8 +2205,8 @@ static struct opcode twobyte_table[256] = {
 	/* 0x10 - 0x1F */
 	N, N, N, N, N, N, N, N, D(ImplicitOps | ModRM), N, N, N, N, N, N, N,
 	/* 0x20 - 0x2F */
-	D(ModRM | DstMem | Priv | Force64), D(ModRM | Priv | Force64),
-	D(ModRM | SrcMem | Priv | Force64), D(ModRM | Priv | Force64),
+	D(ModRM | DstMem | Priv | Force64), D(ModRM | DstMem | Priv | Force64),
+	D(ModRM | SrcMem | Priv | Force64), D(ModRM | SrcMem | Priv | Force64),
 	N, N, N, N,
 	N, N, N, N, N, N, N, N,
 	/* 0x30 - 0x3F */
@@ -3236,8 +3236,7 @@ twobyte_insn:
 			emulate_ud(ctxt);
 			goto done;
 		}
-		ops->get_dr(c->modrm_reg, &c->regs[c->modrm_rm], ctxt->vcpu);
-		c->dst.type = OP_NONE;	/* no writeback */
+		ops->get_dr(c->modrm_reg, &c->dst.val, ctxt->vcpu);
 		break;
 	case 0x22: /* mov reg, cr */
 		if (ops->set_cr(c->modrm_reg, c->src.val, ctxt->vcpu)) {
@@ -3253,7 +3252,7 @@ twobyte_insn:
 			goto done;
 		}
 
-		if (ops->set_dr(c->modrm_reg, c->regs[c->modrm_rm] &
+		if (ops->set_dr(c->modrm_reg, c->src.val &
 				((ctxt->mode == X86EMUL_MODE_PROT64) ?
 				 ~0ULL : ~0U), ctxt->vcpu) < 0) {
 			/* #UD condition is already handled by the code above */
-- 
1.7.1

[PATCH 09/15] KVM: x86 emulator: mark mov cr and mov dr as 64-bit instructions in long mode

2010-08-01 Thread Avi Kivity

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/emulate.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index d7d95de..d1a6cd7 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2205,8 +2205,8 @@ static struct opcode twobyte_table[256] = {
 	/* 0x10 - 0x1F */
 	N, N, N, N, N, N, N, N, D(ImplicitOps | ModRM), N, N, N, N, N, N, N,
 	/* 0x20 - 0x2F */
-	D(ModRM | ImplicitOps | Priv), D(ModRM | Priv),
-	D(ModRM | ImplicitOps | Priv), D(ModRM | Priv),
+	D(ModRM | ImplicitOps | Priv | Force64), D(ModRM | Priv | Force64),
+	D(ModRM | ImplicitOps | Priv | Force64), D(ModRM | Priv | Force64),
 	N, N, N, N,
 	N, N, N, N, N, N, N, N,
 	/* 0x30 - 0x3F */
-- 
1.7.1

[PATCH 08/15] KVM: x86 emulator: introduce Force64 for forcing operand size to 64 bits

2010-08-01 Thread Avi Kivity

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/emulate.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a114fa9..d7d95de 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -83,6 +83,7 @@
 #define Group       (1<<14) /* Bits 3:5 of modrm byte extend opcode */
 #define GroupDual   (1<<15) /* Alternate decoding of mod == 3 */
 /* Misc flags */
+#define Force64     (1<<24) /* Force operand size to 64 bits in 64-bit mode */
 #define Undefined   (1<<25) /* No Such Instruction */
 #define Lock        (1<<26) /* lock prefix is allowed for the instruction */
 #define Priv        (1<<27) /* instruction generates #GP if current CPL != 0 */
@@ -2398,7 +2399,7 @@ done_prefixes:
 		return -1;
 	}
 
-	if (mode == X86EMUL_MODE_PROT64 && (c->d & Stack))
+	if (mode == X86EMUL_MODE_PROT64 && (c->d & (Stack | Force64)))
 		c->op_bytes = 8;
 
 	/* ModRM and SIB bytes. */
-- 
1.7.1
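A standalone sketch of the operand-size rule this patch extends. It assumes a 4-byte default operand size in 64-bit mode and ignores the 0x66 prefix; the Stack bit value is a placeholder, and only Force64's value mirrors the patch. Forcing 64 bits matters for mov to/from control and debug registers, which in 64-bit mode always use 64-bit operands regardless of REX.W (patch 09 of this series marks them Force64):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder for Stack; Force64 mirrors the patch's (1<<24). */
#define Stack   (1u << 13)
#define Force64 (1u << 24)

/*
 * Operand size in bytes: start from the assumed default, apply REX.W,
 * then force 64 bits for Stack/Force64 instructions in 64-bit mode.
 */
static unsigned op_bytes_for(bool long_mode, uint8_t rex_prefix,
			     uint32_t flags)
{
	unsigned op_bytes = 4;

	if (rex_prefix & 8)		/* REX.W */
		op_bytes = 8;
	if (long_mode && (flags & (Stack | Force64)))
		op_bytes = 8;
	return op_bytes;
}
```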

[PATCH 06/15] KVM: x86 emulator: drop use_modrm_ea

2010-08-01 Thread Avi Kivity

Unused (and has never been).

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |    1 -
 arch/x86/kvm/emulate.c             |    1 -
 2 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index bbf0e81..db4a248 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -198,7 +198,6 @@ struct decode_cache {
 	u8 modrm_reg;
 	u8 modrm_rm;
 	u8 modrm_seg;
-	u8 use_modrm_ea;
 	bool rip_relative;
 	unsigned long modrm_ea;
 	void *modrm_ptr;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 063c96a..2ae2e54 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -597,7 +597,6 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
 	c->modrm_reg |= (c->modrm & 0x38) >> 3;
 	c->modrm_rm |= (c->modrm & 0x07);
 	c->modrm_ea = 0;
-	c->use_modrm_ea = 1;
 	c->modrm_seg = VCPU_SREG_DS;
 
 	if (c->modrm_mod == 3) {
-- 
1.7.1

[PATCH 01/15] KVM: x86 emulator: push segment override out of decode_modrm()

2010-08-01 Thread Avi Kivity

Let it compute modrm_seg instead, and have the caller apply it.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |    1 +
 arch/x86/kvm/emulate.c             |   10 ++++++----
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index f397b79..ecb2653 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -193,6 +193,7 @@ struct decode_cache {
 	u8 modrm_mod;
 	u8 modrm_reg;
 	u8 modrm_rm;
+	u8 modrm_seg;
 	u8 use_modrm_ea;
 	bool rip_relative;
 	unsigned long modrm_ea;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index dc1ecff..2ed6c67 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -593,6 +593,7 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
 	c->modrm_rm |= (c->modrm & 0x07);
 	c->modrm_ea = 0;
 	c->use_modrm_ea = 1;
+	c->modrm_seg = VCPU_SREG_DS;
 
 	if (c->modrm_mod == 3) {
 		c->modrm_ptr = decode_register(c->modrm_rm,
@@ -649,8 +650,7 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
 		}
 		if (c->modrm_rm == 2 || c->modrm_rm == 3 ||
 		    (c->modrm_rm == 6 && c->modrm_mod != 0))
-			if (!c->has_seg_override)
-				set_seg_override(c, VCPU_SREG_SS);
+			c->modrm_seg = VCPU_SREG_SS;
 		c->modrm_ea = (u16)c->modrm_ea;
 	} else {
 		/* 32/64-bit ModR/M decode. */
@@ -2400,9 +2400,11 @@ done_prefixes:
 		c->op_bytes = 8;
 
 	/* ModRM and SIB bytes. */
-	if (c->d & ModRM)
+	if (c->d & ModRM) {
 		rc = decode_modrm(ctxt, ops);
-	else if (c->d & MemAbs)
+		if (!c->has_seg_override)
+			set_seg_override(c, c->modrm_seg);
+	} else if (c->d & MemAbs)
 		rc = decode_abs(ctxt, ops);
 	if (rc != X86EMUL_CONTINUE)
 		goto done;
-- 
1.7.1
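In 16-bit addressing, decode_modrm() picks base and index registers from rm and a displacement from mod, and (after this patch) records SS rather than DS as the default segment for the bp-based forms, leaving the caller to apply it only when no segment prefix was seen. A standalone sketch of that logic; the types and function names are hypothetical, and the displacement is assumed to be already fetched and extended:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum seg { SEG_DS, SEG_SS };

struct regs16 {
	uint16_t bx, bp, si, di;
};

/*
 * Effective address for a 16-bit ModR/M memory operand (mod != 3).
 * disp is the displacement selected by mod, or 0 if there is none.
 */
static uint16_t modrm_ea16(uint8_t mod, uint8_t rm, const struct regs16 *r,
			   int32_t disp, enum seg *default_seg)
{
	uint32_t ea = (uint32_t)disp;

	rm &= 7;
	switch (rm) {
	case 0: ea += r->bx + r->si; break;
	case 1: ea += r->bx + r->di; break;
	case 2: ea += r->bp + r->si; break;
	case 3: ea += r->bp + r->di; break;
	case 4: ea += r->si; break;
	case 5: ea += r->di; break;
	case 6: if (mod != 0) ea += r->bp; break; /* mod 0: disp16 only */
	case 7: ea += r->bx; break;
	}
	/* bp-based forms default to the stack segment. */
	*default_seg = (rm == 2 || rm == 3 || (rm == 6 && mod != 0)) ?
		       SEG_SS : SEG_DS;
	return (uint16_t)ea;	/* addresses wrap at 64K */
}

/* An explicit segment prefix wins; otherwise use the ModR/M default. */
static enum seg effective_seg(bool has_override, enum seg override,
			      enum seg modrm_seg)
{
	return has_override ? override : modrm_seg;
}
```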

[PATCH 07/15] KVM: x86 emulator: simplify REX.W check

2010-08-01 Thread Avi Kivity

(x && (x & y)) == (x & y)

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/emulate.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2ae2e54..a114fa9 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2353,9 +2353,8 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt)
 done_prefixes:
 
 	/* REX prefix. */
-	if (c->rex_prefix)
-		if (c->rex_prefix & 8)
-			c->op_bytes = 8;	/* REX.W */
+	if (c->rex_prefix & 8)
+		c->op_bytes = 8;	/* REX.W */
 
 	/* Opcode byte(s). */
 	opcode = opcode_table[c->b];
-- 
1.7.1
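The identity in the commit message holds as a truth value: if x is zero, x & y is zero as well, so the outer test adds nothing. A small standalone sketch that checks it exhaustively for every byte value of x with y = 8 (REX.W); the function name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Compare the old nested test with the simplified one for all x. */
static bool rex_w_check_equivalent(void)
{
	for (unsigned x = 0; x < 256; x++) {
		bool old_form = x && (x & 8);
		bool new_form = (x & 8) != 0;

		if (old_form != new_form)
			return false;
	}
	return true;
}
```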

[PATCH 02/15] KVM: x86 emulator: use correct type for memory address in operands

2010-08-01 Thread Avi Kivity

Currently we use a void pointer for memory addresses.  That's wrong since
these are guest virtual addresses which are not directly dereferenceable by
the host.

Use the correct type, unsigned long.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |6 ++-
 arch/x86/kvm/emulate.c |  117 +--
 2 files changed, 62 insertions(+), 61 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index ecb2653..bbf0e81 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -152,7 +152,11 @@ struct x86_emulate_ops {
 struct operand {
 	enum { OP_REG, OP_MEM, OP_IMM, OP_NONE } type;
 	unsigned int bytes;
-	unsigned long orig_val, *ptr;
+	unsigned long orig_val;
+	union {
+		unsigned long *reg;
+		unsigned long mem;
+	} addr;
 	union {
 		unsigned long val;
 		char valptr[sizeof(unsigned long) + 2];
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2ed6c67..61d728d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -489,7 +489,7 @@ static void *decode_register(u8 modrm_reg, unsigned long *regs,
 
 static int read_descriptor(struct x86_emulate_ctxt *ctxt,
 			   struct x86_emulate_ops *ops,
-			   void *ptr,
+			   ulong addr,
 			   u16 *size, unsigned long *address, int op_bytes)
 {
 	int rc;
@@ -497,12 +497,10 @@ static int read_descriptor(struct x86_emulate_ctxt *ctxt,
 	if (op_bytes == 2)
 		op_bytes = 3;
 	*address = 0;
-	rc = ops->read_std((unsigned long)ptr, (unsigned long *)size, 2,
-			   ctxt->vcpu, NULL);
+	rc = ops->read_std(addr, (unsigned long *)size, 2, ctxt->vcpu, NULL);
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
-	rc = ops->read_std((unsigned long)ptr + 2, address, op_bytes,
-			   ctxt->vcpu, NULL);
+	rc = ops->read_std(addr + 2, address, op_bytes, ctxt->vcpu, NULL);
 	return rc;
 }
 
@@ -552,21 +550,21 @@ static void decode_register_operand(struct operand *op,
 	reg = (c->b & 7) | ((c->rex_prefix & 1) << 3);
 	op->type = OP_REG;
 	if ((c->d & ByteOp) && !inhibit_bytereg) {
-		op->ptr = decode_register(reg, c->regs, highbyte_regs);
-		op->val = *(u8 *)op->ptr;
+		op->addr.reg = decode_register(reg, c->regs, highbyte_regs);
+		op->val = *(u8 *)op->addr.reg;
 		op->bytes = 1;
 	} else {
-		op->ptr = decode_register(reg, c->regs, 0);
+		op->addr.reg = decode_register(reg, c->regs, 0);
 		op->bytes = c->op_bytes;
 		switch (op->bytes) {
 		case 2:
-			op->val = *(u16 *)op->ptr;
+			op->val = *(u16 *)op->addr.reg;
 			break;
 		case 4:
-			op->val = *(u32 *)op->ptr;
+			op->val = *(u32 *)op->addr.reg;
 			break;
 		case 8:
-			op->val = *(u64 *) op->ptr;
+			op->val = *(u64 *) op->addr.reg;
 			break;
 		}
 	}
@@ -976,23 +974,23 @@ static inline int writeback(struct x86_emulate_ctxt *ctxt,
 		 */
 		switch (c->dst.bytes) {
 		case 1:
-			*(u8 *)c->dst.ptr = (u8)c->dst.val;
+			*(u8 *)c->dst.addr.reg = (u8)c->dst.val;
 			break;
 		case 2:
-			*(u16 *)c->dst.ptr = (u16)c->dst.val;
+			*(u16 *)c->dst.addr.reg = (u16)c->dst.val;
 			break;
 		case 4:
-			*c->dst.ptr = (u32)c->dst.val;
+			*c->dst.addr.reg = (u32)c->dst.val;
 			break;	/* 64b: zero-ext */
 		case 8:
-			*c->dst.ptr = c->dst.val;
+			*c->dst.addr.reg = c->dst.val;
 			break;
 		}
 		break;
 	case OP_MEM:
 		if (c->lock_prefix)
 			rc = ops->cmpxchg_emulated(
-					(unsigned long)c->dst.ptr,
+					c->dst.addr.mem,
 					&c->dst.orig_val,
 					&c->dst.val,
 					c->dst.bytes,
@@ -1000,14 +998,13 @@ static inline int writeback(struct x86_emulate_ctxt *ctxt,
 					ctxt->vcpu);
 		else
 			rc = ops->write_emulated(
-					(unsigned long)c->dst.ptr,
+					c->dst.addr.mem,

Re: [PATCH 7/24] Understanding guest pointers to vmcs12 structures

2010-08-01 Thread Nadav Har'El

On Tue, Jun 15, 2010, Gleb Natapov wrote about "Re: [PATCH 7/24] Understanding guest pointers to vmcs12 structures":
> > +/*
> > + * Decode the memory-address operand of a vmx instruction, according to the
> > + * Intel spec.
> > + */
...
> > +static gva_t get_vmx_mem_address(struct kvm_vcpu *vcpu,
> > +	unsigned long exit_qualification,
> > +	u32 vmx_instruction_info)
> > +{
...
> > +	if (is_reg) {
> > +		kvm_queue_exception(vcpu, UD_VECTOR);
> > +		return 0;
> Isn't zero a legitimate address for vmx operation?

Thanks. Please excuse my naivety, but is address 0 actually considered a
usable guest virtual address? If it is, do we have any possible value which is
considered invalid? Perhaps -1ull? I see that -1ull is used in a few places
in vmx.c, for example.

If all gva_t turn out to actually be valid addresses, I'll need to move to a
more complex (and uglier) success flag approach :(

-- 
Nadav Har'El|  Sunday, Aug  1 2010, 22 Av 5770
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |The only intuitive interface is the
http://nadav.harel.org.il   |nipple. After that, it's all learned.
Re: [PATCH 7/24] Understanding guest pointers to vmcs12 structures

2010-08-01 Thread Gleb Natapov

On Sun, Aug 01, 2010 at 06:16:59PM +0300, Nadav Har'El wrote:
> On Tue, Jun 15, 2010, Gleb Natapov wrote about "Re: [PATCH 7/24] Understanding guest pointers to vmcs12 structures":
> > > +/*
> > > + * Decode the memory-address operand of a vmx instruction, according to the
> > > + * Intel spec.
> > > + */
> ...
> > > +static gva_t get_vmx_mem_address(struct kvm_vcpu *vcpu,
> > > +	unsigned long exit_qualification,
> > > +	u32 vmx_instruction_info)
> > > +{
> ...
> > > +	if (is_reg) {
> > > +		kvm_queue_exception(vcpu, UD_VECTOR);
> > > +		return 0;
> > Isn't zero a legitimate address for vmx operation?
>
> Thanks. Please excuse my naivety, but is address 0 actually considered a
> usable guest virtual address? If it is, do we have any possible value which is
> considered invalid? Perhaps -1ull? I see that -1ull is used in a few places
> in vmx.c, for example.
>
A guest can use any valid virtual address. There is UNMAPPED_GVA (~(gpa_t)0),
which at least cannot be valid if the address that your function returns has
to be page aligned. And not all virtual addresses are valid, BTW. For a 32-bit
guest, a virtual address cannot be bigger than 32 bits, and for a 64-bit guest,
a virtual address should be in canonical form.
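These constraints can be made concrete. A minimal sketch, assuming a 48-bit virtual address width for 64-bit guests (4-level paging); the helper names are hypothetical, and SKETCH_UNMAPPED_GVA only mimics KVM's UNMAPPED_GVA. Note that the all-ones value is itself canonical, which is why it works as a sentinel only under the page-alignment condition; the last helper shows the status-plus-out-parameter alternative to a sentinel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gva_t;

/* Sentinel in the spirit of UNMAPPED_GVA: all bits set. */
#define SKETCH_UNMAPPED_GVA (~(gva_t)0)

/* 48-bit canonical form: bits 63:47 are all zero or all one. */
static bool is_canonical_48(gva_t addr)
{
	gva_t top = addr >> 47;

	return top == 0 || top == 0x1ffff;
}

/* Can addr be a guest virtual address in the given mode at all? */
static bool gva_possible(gva_t addr, bool long_mode)
{
	if (!long_mode)
		return addr <= 0xffffffffULL;
	return is_canonical_48(addr);
}

/* No sentinel needed: return a status and pass the address out. */
static int decode_operand_address(gva_t raw, bool long_mode, gva_t *out)
{
	if (!gva_possible(raw, long_mode))
		return -1;	/* caller would queue an exception */
	*out = raw;
	return 0;
}
```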

> If all gva_t turn out to actually be valid addresses, I'll need to move to a
> more complex (and uglier) success flag approach :(
>
> -- 
> Nadav Har'El|  Sunday, Aug  1 2010, 22 Av 5770
> n...@math.technion.ac.il |-
> Phone +972-523-790466, ICQ 13349191 |The only intuitive interface is the
> http://nadav.harel.org.il   |nipple. After that, it's all learned.

--
Gleb.
[PATCH] KVM: x86 emulator: fix LMSW able to clear cr0.pe

2010-08-01 Thread Avi Kivity

LMSW is documented not to be able to clear cr0.pe; make it so.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/emulate.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index dc1ecff..05f80f7 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3212,7 +3212,7 @@ twobyte_insn:
 			c->dst.val = ops->get_cr(0, ctxt->vcpu);
 			break;
 		case 6: /* lmsw */
-			ops->set_cr(0, (ops->get_cr(0, ctxt->vcpu) & ~0x0ful) |
+			ops->set_cr(0, (ops->get_cr(0, ctxt->vcpu) & ~0x0eul) |
 				    (c->src.val & 0x0f), ctxt->vcpu);
 			c->dst.type = OP_NONE;
 			break;
-- 
1.7.1
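The effect of the one-character mask change can be checked in isolation. A standalone sketch with hypothetical helper names; only the two expressions come from the patch (~0x0ful before, ~0x0eul after). Keeping bit 0 of the old CR0 and ORing in the new low nibble means LMSW can set PE but never clear it, while MP, EM and TS (bits 1-3) are still loaded from the source:

```c
#include <assert.h>

#define X86_CR0_PE 0x1UL	/* bit 0: protection enable */

/* Post-patch computation: PE survives, bits 1-3 come from src. */
static unsigned long lmsw_fixed(unsigned long cr0, unsigned long src)
{
	return (cr0 & ~0x0eUL) | (src & 0x0f);
}

/* Pre-patch computation, which lets LMSW clear PE. */
static unsigned long lmsw_old(unsigned long cr0, unsigned long src)
{
	return (cr0 & ~0x0fUL) | (src & 0x0f);
}
```

For example, with CR0 = 0x11 (PE and ET set) and a source of 0, the old expression yields 0x10 (PE cleared) while the fixed one keeps 0x11.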

Re: virtio module question...

2010-08-01 Thread Avi Kivity


 On 07/30/2010 12:13 PM, Vasiliy G Tolstov wrote:

> Good night or morning
>
> If I need some kvm module that runs in a virtualized environment and
> reports some statistics to qemu (disk free space, memory usage, cpu
> utilization), what do I need to write?
> I need a kernel module, because userspace utilities under heavy load can't
> send such information via the network (for example, snmp under load does
> not respond).
>
> Does kvm have a skeleton module that I can use and add the code I need to?


I recommend using virtio-serial in userspace.  If your load is very 
high, you can use mlockall() and sched_setscheduler() to keep the 
process in memory and on a cpu.


If you absolutely need a kernel module, you can base it off virtio-serial.
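A rough sketch of the userspace approach, under several assumptions: the agent writes plain-text lines to a virtio-serial port it has already opened (guest ports typically appear as /dev/vportNpM, or under /dev/virtio-ports/ with a named port and suitable udev rules), and only two statistics are collected. The helper names are hypothetical. mlockall() and sched_setscheduler() normally need privileges (CAP_IPC_LOCK, CAP_SYS_NICE or matching rlimits), so failures are reported rather than treated as fatal:

```c
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/statvfs.h>
#include <sys/sysinfo.h>
#include <unistd.h>

/*
 * Keep the agent resident and schedulable under heavy load.
 * Returns a bitmask of failed calls (1: mlockall, 2: scheduler).
 */
static int harden_process(void)
{
	struct sched_param sp = { .sched_priority = 1 };
	int failures = 0;

	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
		failures |= 1;
	if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
		failures |= 2;
	return failures;
}

/* Format free disk space (of mount) and free RAM, both in bytes. */
static int format_stats(const char *mount, char *buf, size_t len)
{
	struct statvfs vfs;
	struct sysinfo si;

	if (statvfs(mount, &vfs) != 0 || sysinfo(&si) != 0)
		return -1;
	return snprintf(buf, len, "disk_free=%llu mem_free=%llu\n",
			(unsigned long long)vfs.f_bavail * vfs.f_frsize,
			(unsigned long long)si.freeram * si.mem_unit);
}

/* Write one stats line to an already-open port file descriptor. */
static int send_stats(int port_fd, const char *mount)
{
	char line[128];
	int n = format_stats(mount, line, sizeof(line));
	ssize_t w;

	if (n < 0 || (size_t)n >= sizeof(line))
		return -1;
	w = write(port_fd, line, (size_t)n);
	return w == (ssize_t)n ? 0 : -1;
}
```

A caller would open the port once with open(path, O_WRONLY), call harden_process() at startup, and then call send_stats() periodically; the host side reads the lines from the chardev backing the port.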

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

Re: [PATCH 4/7] KVM: PPC: Add book3s_32 tlbie flush acceleration

2010-08-01 Thread Alexander Graf


On 01.08.2010, at 16:08, Avi Kivity wrote:

> On 07/29/2010 04:04 PM, Alexander Graf wrote:
>> On Book3s_32 the tlbie instruction flushes effective addresses by the mask
>> 0x0ffff000. This is pretty hard to reflect with a hash that hashes ~0xfff, so
>> to speed up that target we should also keep a special hash around for it.
>>
>>
>>  static inline u64 kvmppc_mmu_hash_vpte(u64 vpage)
>>  {
>>  	return hash_64(vpage & 0xfffffffffULL, HPTEG_HASH_BITS_VPTE);
>> @@ -66,6 +72,11 @@ void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
>>  	index = kvmppc_mmu_hash_pte(pte->pte.eaddr);
>>  	hlist_add_head_rcu(&pte->list_pte, &vcpu->arch.hpte_hash_pte[index]);
>>
>> +	/* Add to ePTE_long list */
>> +	index = kvmppc_mmu_hash_pte_long(pte->pte.eaddr);
>> +	hlist_add_head_rcu(&pte->list_pte_long,
>> +			   &vcpu->arch.hpte_hash_pte_long[index]);
>> +
>
> Isn't it better to make operations on this list conditional on Book3s_32?
> Hashes are expensive since they usually cost cache misses.

Yes, the same for vpte_long and vpte - book3s_32 guests don't need them except 
for the all flush. The tough part is that this is not host but guest dependent, 
so I need to have different structs for book3s_32 and book3s_64 guests. This 
isn't a big issue, but complicates the code.

> Can of course be done later as an optimization.

Yes, that was the plan. Great to see you got the same feeling there though :). 
To be honest, I even started a book3s_32 host optimization patch and threw it 
away because it made the code less readable. So yes, this is on my radar.


Alex

Re: [PATCH 00/27] KVM PPC PV framework v3

2010-08-01 Thread Alexander Graf


On 01.08.2010, at 16:02, Avi Kivity wrote:

 On 07/29/2010 03:47 PM, Alexander Graf wrote:
 On PPC we run PR=0 (kernel mode) code in PR=1 (user mode) and don't use the
 hypervisor extensions.
 
 While that is all great to show that virtualization is possible, there are
 quite some cases where the emulation overhead of privileged instructions is
 killing performance.
 
 This patchset tackles exactly that issue. It introduces a paravirtual 
 framework
 using which KVM and Linux share a page to exchange register state with. That
 way we don't have to switch to the hypervisor just to change a value of a
 privileged register.
 
 To prove my point, I ran the same test I did for the MMU optimizations 
 against
 the PV framework. Here are the results:
 
 [without]
 
 debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello  /dev/null; 
 done
 
 real0m14.659s
 user0m8.967s
 sys 0m5.688s
 
 [with]
 
 debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello  /dev/null; 
 done
 
 real0m7.557s
 user0m4.121s
 sys 0m3.426s
 
 
 So this is a significant performance improvement! I'm quite happy how fast 
 this
 whole thing becomes :)
 
 I tried to take all comments I've heard from people so far about such a PV
 framework into account. In case you told me something before that is a no-go
 and I still did it, please just tell me again.
 
 To make use of this whole thing you also need patches to qemu and openbios. I
 have them in my queue, but want to see this set upstream first before I start
 sending patches to the other projects.
 
 Now go and have fun with fast VMs on PPC! Get yourself a G5 on ebay and start
 experiencing the power yourself. - heh
 
 v1 -> v2:
 
   - change hypervisor calls to use r0 and r3
   - make crit detection only trigger in supervisor mode
   - RMO -> PAM
   - introduce kvm_patch_ins
   - only flush icache when patching
   - introduce kvm_patch_ins_b
   - update documentation
 
 v2 -> v3:
 
   - use ePAPR conventions for hypercall interface
   - only use r0 as magic sc number
   - remove PVR detection
   - remove BookE shared page mapping support
   - combine book3s-64 and -32 magic page ra override
   - add a self-test to the guest code that checks whether the mapping works
   - add safety check for relocatable kernels
 
 
 Looks reasonable.  Since it's fair to say I understand nothing about powerpc, 
 I'd like someone who does to review it and ack, please, with an emphasis on 
 the interfaces.

Sounds good. Preferably someone with access to the ePAPR spec :).


Alex


RE: VT-d regression issue

2010-08-01 Thread Hao, Xudong

Alex Williamson wrote:
 On Sat, 2010-07-31 at 16:33 +0800, Hao, Xudong wrote:
 Alex Williamson wrote:
 On Thu, 2010-07-22 at 16:03 +0300, Gleb Natapov wrote:
 On Thu, Jul 22, 2010 at 08:32:31PM +0800, Hao, Xudong wrote:
 Well, this patch works fine for me.
 
 Looks like a userspace problem then. Userspace relied on something
 that was not guaranteed by the kernel (access to a read-only page
 being forwarded to userspace as MMIO).
 
 I just submitted a set of patches that should fix this using the
 slow mapping path for option ROMs.  Thanks,
 
 
 Alex, I saw your PCI option ROM fix patches in kvm. Does
 qemu-kvm have a corresponding fix for this issue as well?
 
 kvm userspace and qemu-kvm are the same thing.  Device assignment
 currently only lives in the kvm userspace; it is not merged into
 upstream qemu.  Thanks,
 
 Alex

Ok, I got it, your patch is for userspace. Thanks,

Xudong

Re: Multiplexing RFLAGS.TF

2010-08-01 Thread Jan Kiszka

On 29.07.2010 10:37, Avi Kivity wrote:
  static int db_interception(struct vcpu_svm *svm)
 {
 	struct kvm_run *kvm_run = svm->vcpu.run;
 
 	if (!(svm->vcpu.guest_debug &
 	      (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP)) &&
 		!svm->nmi_singlestep) {
 		kvm_queue_exception(&svm->vcpu, DB_VECTOR);
 		return 1;
 	}
 
 	if (svm->nmi_singlestep) {
 		svm->nmi_singlestep = false;
 		if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
 			svm->vmcb->save.rflags &=
 				~(X86_EFLAGS_TF | X86_EFLAGS_RF);
 		update_db_intercept(&svm->vcpu);
 	}
 
 This code assumes that either the guest is debugging itself, or
 (nmi_singlestep | guest debugging).  However if the guest is debugging
 itself and takes an NMI, or if both host and guest are debugging the
 guest, things will go wrong.

I know.

 
 So we need an rflags_guest_owned_bits, usually set to -1ULL, but
 sometimes (NMI, host debugging) clearing EFLAGS_TF.  When we do that, we
 need to intercept instructions that influence RFLAGS.TF (POPF, IRET,
 INTn) and emulate them.  Otherwise, the guest can disable tracing which
 was enabled on behalf of the host.

I was still waiting on some smart idea from AMD how to properly
implement NMIs without having to fully emulate IRET. Probably there is
no alternative...

 
 We also need to drop the 'return 1' on the top of the function to allow
 both guest and host tracing.

Support for host and guest-initiated tracing at the same time would be
nice, but I would not spend too much effort on this corner case of the
corner cases. If it happens to fall off from the NMI fix, OK. But
otherwise let the host rule TF if it wants to.

 
 On Intel, the situation is harder.  We can't trap POPF or IRET.  What we
 can do, is use the Monitor Trap Flag on hosts that have it.

Setting TF before POPF and IRET should give us at least the chance to
provide host-overrules-guest tracing support. Adding monitor trap
support would be nice. It would allow more things actually, but it may
then require some additional knob in the user/kernel interface to
control the mode (MTF steps into exceptions/interrupts, TF not).

 
 Comments?  Perhaps I missed something.  Maybe I'll try writing a test
 case to prove the brokenness, it's fashionable these days.
 
 Jan, as this is your code, are you interested in doing this?

I'm not very keen on writing complex and error-prone opcode emulations,
but in principle resolving the AMD issue is on my long to-do list - with
moderate prio though.

Cheers from Lancaster County PA,
Jan




buildbot failure in qemu-kvm on disable_kvm_x86_64_debian_5_0

2010-08-01 Thread qemu-kvm

The Buildbot has detected a new failure of disable_kvm_x86_64_debian_5_0 on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_x86_64_debian_5_0/builds/489

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_1

Build Reason: The Nightly scheduler named 'nightly_disable_kvm' triggered this 
build
Build Source Stamp: [branch master] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot


Disable EPT

2010-08-01 Thread Kuniyasu Suzaki


Hello,

Please tell me how to disable EPT in KVM.
Can I disable EPT in the BIOS menu? I did not find the option in the BIOS menu
of the Intel DX58 (+i7) motherboard.

Thank you.

--
suzaki

buildbot failure in qemu-kvm on disable_kvm_x86_64_out_of_tree

2010-08-01 Thread qemu-kvm

The Buildbot has detected a new failure of disable_kvm_x86_64_out_of_tree on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_x86_64_out_of_tree/builds/438

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_1

Build Reason: The Nightly scheduler named 'nightly_disable_kvm' triggered this 
build
Build Source Stamp: [branch master] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot


buildbot failure in qemu-kvm on disable_kvm_i386_out_of_tree

2010-08-01 Thread qemu-kvm

The Buildbot has detected a new failure of disable_kvm_i386_out_of_tree on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_i386_out_of_tree/builds/438

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_2

Build Reason: The Nightly scheduler named 'nightly_disable_kvm' triggered this 
build
Build Source Stamp: [branch master] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot


RE: Alt SeaBIOS SSDT cpu hotplug

2010-08-01 Thread Liu, Jinsong

Kevin,

This patch still has an issue.
It can boot Windows 2008 DataCenter; however,
when I run the cpu_set cpu online command, the Windows 2008 DataCenter system
shuts down at once.

Thanks,
Jinsong

 
 Sorry about that.  It looks like I messed up the SSDT ScopeOp length.
 New patch attached below.  I've tested it by adding/removing cpus on
 Linux, and I've now also boot tested winxp and winvista.  (I don't
 have Windows 2008 DataCenter.)
 
 -Kevin
 
 
 diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
 index cc31112..640716c 100644
 --- a/src/acpi-dsdt.dsl
 +++ b/src/acpi-dsdt.dsl
 @@ -648,6 +648,78 @@ DefinitionBlock (
  Zero   /* reserved */
  })
 
 +/* CPU hotplug */
 +Scope(\_SB) {
 +/* Objects filled in by run-time generated SSDT */
 +External(NTFY, MethodObj)
 +External(CPON, PkgObj)
 +
 +/* Methods called by run-time generated SSDT Processor objects */
 +Method (CPMA, 1, NotSerialized) {
 +// _MAT method - create an madt apic buffer
 +// Local0 = CPON flag for this cpu
 +Store(DerefOf(Index(CPON, Arg0)), Local0)
 +// Local1 = Buffer (in madt apic form) to return
 +Store(Buffer(8) {0x00, 0x08, 0x00, 0x00, 0x00, 0, 0, 0}, Local1)
 +// Update the processor id, lapic id, and enable/disable status
 +Store(Arg0, Index(Local1, 2))
 +Store(Arg0, Index(Local1, 3))
 +Store(Local0, Index(Local1, 4))
 +Return (Local1)
 +}
 +Method (CPST, 1, NotSerialized) {
 +// _STA method - return ON status of cpu
 +// Local0 = CPON flag for this cpu
 +Store(DerefOf(Index(CPON, Arg0)), Local0)
 +If (Local0) { Return(0xF) } Else { Return(0x0) }
 +}
 +Method (CPEJ, 2, NotSerialized) {
 +// _EJ0 method - eject callback
 +Sleep(200)
 +}
 +
 +/* CPU hotplug notify method */
 +OperationRegion(PRST, SystemIO, 0xaf00, 32)
 +Field (PRST, ByteAcc, NoLock, Preserve)
 +{
 +PRS, 256
 +}
 +Method(PRSC, 0) {
 +// Local5 = active cpu bitmap
 +Store (PRS, Local5)
 +// Local2 = last read byte from bitmap
 +Store (Zero, Local2)
 +// Local0 = cpuid iterator
 +Store (Zero, Local0)
 +While (LLess(Local0, SizeOf(CPON))) {
 +// Local1 = CPON flag for this cpu
 +Store(DerefOf(Index(CPON, Local0)), Local1)
 +If (And(Local0, 0x07)) {
 +// Shift down previously read bitmap byte
 +ShiftRight(Local2, 1, Local2)
 +} Else {
 +// Read next byte from cpu bitmap
 +Store(DerefOf(Index(Local5, ShiftRight(Local0, 3))), Local2)
 +}
 +// Local3 = active state for this cpu
 +Store(And(Local2, 1), Local3)
 +
 +If (LNotEqual(Local1, Local3)) {
 +// State change - update CPON with new state
 +Store(Local3, Index(CPON, Local0))
 +// Do CPU notify
 +If (LEqual(Local3, 1)) {
 +NTFY(Local0, 1)
 +} Else {
 +NTFY(Local0, 3)
 +}
 +}
 +Increment(Local0)
 +}
 +Return(One)
 +}
 +}
 +
  Scope (\_GPE)
  {
 Name(_HID, "ACPI0006")
 @@ -701,7 +773,8 @@ DefinitionBlock (
 
  }
  Method(_L02) {
 -Return(0x01)
 +// CPU hotplug event
 +Return(\_SB.PRSC())
  }
  Method(_L03) {
  Return(0x01)
 diff --git a/src/acpi.c b/src/acpi.c
 index e91f8e0..3b49c4e 100644
 --- a/src/acpi.c
 +++ b/src/acpi.c
 @@ -1,6 +1,6 @@
  // Support for generating ACPI tables (on emulators)
  //
 -// Copyright (C) 2008,2009  Kevin O'Connor ke...@koconnor.net
 +// Copyright (C) 2008-2010  Kevin O'Connor ke...@koconnor.net
  // Copyright (C) 2006 Fabrice Bellard
  //
  // This file may be distributed under the terms of the GNU LGPLv3 license.
 @@ -329,64 +329,121 @@ build_madt(void)
  return madt;
  }
 
 +// Encode a hex value
 +static inline char getHex(u32 val) {
 +val &= 0x0f;
 +return (val <= 9) ? ('0' + val) : ('A' + val - 10);
 +}
 +
 +// Encode a length in an SSDT.
 +static u8 *
 +encodeLen(u8 *ssdt_ptr, int length, int bytes)
 +{
 +switch (bytes) {
 +default:
 +case 4: ssdt_ptr[3] = ((length >> 20) & 0xff);
 +case 3: ssdt_ptr[2] = ((length >> 12) & 0xff);
 +case 2: ssdt_ptr[1] = ((length >> 4) & 0xff);
 +ssdt_ptr[0] = (((bytes-1) & 0x3) << 6) | (length & 0x0f);
 +break;
 +case 1: ssdt_ptr[0] = length & 0x3f;
 +}
 +return ssdt_ptr + bytes;
 +}
 +
 +// AML Processor() object.  See src/ssdt-proc.dsl for info.

Re: Multiplexing RFLAGS.TF

2010-08-01 Thread Avi Kivity


 On 08/02/2010 04:17 AM, Jan Kiszka wrote:




So we need an rflags_guest_owned_bits, usually set to -1ULL, but
sometimes (NMI, host debugging) clearing EFLAGS_TF.  When we do that, we
need to intercept instructions that influence RFLAGS.TF (POPF, IRET,
INTn) and emulate them.  Otherwise, the guest can disable tracing which
was enabled on behalf of the host.

I was still waiting on some smart idea from AMD how to properly
implement NMIs without having to fully emulate IRET. Probably there is
no alternative...


Well, there's the existing singlestep implementation; it just needs to 
be fixed not to assume the host has exclusive ownership of TF.  It's 
probably faster than emulation, and certainly more accurate.



We also need to drop the 'return 1' on the top of the function to allow
both guest and host tracing.

Support for host and guest-initiated tracing at the same time would be
nice, but I would not spend too much effort on this corner case of the
corner cases. If it happens to fall off from the NMI fix, OK. But
otherwise let the host rule TF if it wants to.


Taking an NMI while the guest is tracing itself is not a corner case.  I 
agree about simultaneous debugging.



On Intel, the situation is harder.  We can't trap POPF or IRET.  What we
can do, is use the Monitor Trap Flag on hosts that have it.


Actually, I think a POPF or IRET that disables TF still takes a last 
trap?  If so it's workable.



Setting TF before POPF and IRET should give us at least the chance to
provide host-overrules-guest tracing support. Adding monitor trap
support would be nice. It would allow more things actually, but it may
then require some additional knob in the user/kernel interface to
control the mode (MTF steps into exceptions/interrupts, TF not).


There's also branch trace in the DEBUGCTL MSR, which allows you to quickly
step out of a function.



Comments?  Perhaps I missed something.  Maybe I'll try writing a test
case to prove the brokenness, it's fashionable these days.

Jan, as this is your code, are you interested in doing this?

I'm not very keen on writing complex and error-prone opcode emulations,
but in principle resolving the AMD issue is on my long to-do list - with
moderate prio though.



Definitely all this code has to be accompanied by test cases.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


[PATCH] KVM: x86 emulator: fix LMSW able to clear cr0.pe

2010-08-01 Thread Avi Kivity

LMSW is documented not to be able to clear cr0.pe; make it so.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/emulate.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index dc1ecff..05f80f7 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3212,7 +3212,7 @@ twobyte_insn:
 		c->dst.val = ops->get_cr(0, ctxt->vcpu);
 		break;
 	case 6: /* lmsw */
-		ops->set_cr(0, (ops->get_cr(0, ctxt->vcpu) & ~0x0ful) |
+		ops->set_cr(0, (ops->get_cr(0, ctxt->vcpu) & ~0x0eul) |
 				(c->src.val & 0x0f), ctxt->vcpu);
 		c->dst.type = OP_NONE;
 		break;
-- 
1.7.1


Re: Disable EPT

2010-08-01 Thread pradeepkumar

On Mon, 02 Aug 2010 10:46:13 +0900 (JST)
Kuniyasu Suzaki k.suz...@aist.go.jp wrote:

 
 Hello,
 
 Please tell me how to disable EPT in KVM.
 Can I disable EPT in the BIOS menu? I did not find the option in the BIOS menu
 of the Intel DX58 (+i7) motherboard.
 


Unload the kvm_intel.ko module and load it back with ept=0.


 Thank you.
 
 --
 suzaki


[PATCH] Fix memory leak in register save load due to xsave support

2010-08-01 Thread Avi Kivity

Signed-off-by: Avi Kivity a...@redhat.com
---
 qemu-kvm-x86.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index fd426b7..4c32771 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -848,6 +848,7 @@ void kvm_arch_load_regs(CPUState *env, int level)
 xcrs.xcrs[0].value = env->xcr0;
 kvm_set_xcrs(env, &xcrs);
 }
+qemu_free(xsave);
 } else {
 #endif
 memset(&fpu, 0, sizeof fpu);
@@ -1042,6 +1043,7 @@ void kvm_arch_save_regs(CPUState *env)
 if (xcrs.xcrs[0].xcr == 0)
 env->xcr0 = xcrs.xcrs[0].value;
 }
+qemu_free(xsave);
 } else {
 #endif
 kvm_get_fpu(env, &fpu);
-- 
1.7.1


Re: [PATCH 1/2] KVM: x86 emulator: don't update vcpu state if instruction is restarted.

2010-08-01 Thread Avi Kivity


 On 08/01/2010 04:27 PM, Gleb Natapov wrote:


When are we going to enable e_i_g_s by default?


Optimistically, 2.6.37, so six months.


Maybe we have enough
time to fix userspace?


Sure we do, but will users update?

0.12 is mature enough that some users will forget about it and not 
update it.



Sufficiently old userspace already does not run on recent
kvm. Or maybe we can make userspace enable e_i_g_s per guest. This way
userspace that knows it is OK can tell the kernel so.


Let's make it the other way round, enable the optimization for userspace 
that declares that it does not make use of rip during emulation 
(kvm-tpr-opt can be changed by queueing a signal and re-entering the 
guest to complete the operation).


Later we can make the optimization unconditional.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


Re: Alt SeaBIOS SSDT cpu hotplug

2010-08-01 Thread Kevin O'Connor

On Mon, Aug 02, 2010 at 10:41:39AM +0800, Liu, Jinsong wrote:
 Kevin,
 
 This patch still has an issue. It can boot Windows 2008 DataCenter;
 however, when I run the cpu_set cpu online command, the Windows 2008
 DataCenter system shuts down at once.

Thanks for testing.

I've inspected the generated DSDT and SSDT files, and I don't see
anything wrong with them.  (To inspect a generated SSDT, uncomment the
call to hexdump(), cut and paste the hexdump output into SeaBIOS'
tools/transdump.py, and then call iasl -d on the output.)

It seems the Windows acpi interpreter is significantly different from
the Linux one.  The only guess I have is that Windows doesn't like one
of the ASL constructs even though they all look valid.  I'd try to
debug this by commenting out parts of the ASL until I narrowed down
the parts causing the problem.  Unfortunately, I don't have Windows
2008 to do this directly.

Any other ideas?

-Kevin

Re: [Qemu-devel] [PATCH] loader: pad kernel size when loaded from a uImage

2010-08-01 Thread Edgar E. Iglesias

On Sat, Jul 31, 2010 at 12:56:42AM +0200, Edgar E. Iglesias wrote:
 On Thu, Jul 29, 2010 at 06:48:24PM -0700, Hollis Blanchard wrote:
  The kernel's BSS size is lost by mkimage, which only considers file
  size. As a result, loading other blobs (e.g. device tree, initrd)
  immediately after the kernel location can result in them being zeroed by
  the kernel's BSS initialization code.
  
  Signed-off-by: Hollis Blanchard hol...@penguinppc.org
  ---
   hw/loader.c |7 +++
   1 files changed, 7 insertions(+), 0 deletions(-)
  
  diff --git a/hw/loader.c b/hw/loader.c
  index 79a6f95..35bc25a 100644
  --- a/hw/loader.c
  +++ b/hw/loader.c
  @@ -507,6 +507,13 @@ int load_uimage(const char *filename, target_phys_addr_t *ep,
   
   ret = hdr->ih_size;
   
  +   /* The kernel's BSS size is lost by mkimage, which only considers file
  +* size. We don't know how big it is, but we do know we can't place
  +* anything immediately after the kernel. The padding seems like it should
  +* be proportional to overall file size, but we also make sure it's at
  +* least 4-byte aligned. */
  +   ret += (hdr->ih_size / 16) & ~0x3;
 
 Maybe it's only me, but it feels a bit awkward to push this kind of
 knowledge down the abstraction layers. Does it work for you to have your
 caller of load_uimage apply whatever resizing magic is needed for your kernel
 and arch?


Hi Hollis,

Sorry, I was a bit in a hurry and short last time. And sorry for the bad
wording; I thought "awkward" simply meant "wrong" (English is not my native
language).

Anyway, IMO the conventions for where to pass blobs from the bootloader to the
loaded image are an agreement between the bootloader and the loaded code. The
formats or mechanisms used to load the image should not need to be involved that much.

For example, in this particular case, other archs (e.g., MicroBlaze) might not
need any magic. The MicroBlaze Linux kernel moves the cmdline and device tree
blobs into safe areas prior to .bss initialization.

That's why I think these kinds of decisions should be made higher up.

Thanks and sorry for my clumsy wording last time :)
Edgar
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/7] KVM: PPC: Add book3s_32 tlbie flush acceleration

2010-08-01 Thread Avi Kivity


 On 07/29/2010 04:04 PM, Alexander Graf wrote:

On Book3s_32 the tlbie instruction flushes effective addresses by the mask
0x0000. This is pretty hard to reflect with a hash that hashes ~0xfff, so
to speed up that target we should also keep a special hash around for it.


  static inline u64 kvmppc_mmu_hash_vpte(u64 vpage)
  {
return hash_64(vpage  0xfULL, HPTEG_HASH_BITS_VPTE);
@@ -66,6 +72,11 @@ void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
index = kvmppc_mmu_hash_pte(pte->pte.eaddr);
hlist_add_head_rcu(&pte->list_pte, &vcpu->arch.hpte_hash_pte[index]);

+   /* Add to ePTE_long list */
+   index = kvmppc_mmu_hash_pte_long(pte->pte.eaddr);
+   hlist_add_head_rcu(&pte->list_pte_long,
+   &vcpu->arch.hpte_hash_pte_long[index]);
+


Isn't it better to make operations on this list conditional on 
Book3s_32?  Hashes are expensive since they usually cost cache misses.


Can of course be done later as an optimization.

--
error compiling committee.c: too many arguments to function


Re: [PATCH 0/7] Rest of my KVM-PPC patch queue

2010-08-01 Thread Avi Kivity


 On 07/29/2010 04:04 PM, Alexander Graf wrote:

During the past few weeks a couple of fixes have gathered in my queue. This
is a dump of everything that is not related to the PV framework.

Please apply on top of the PV stuff.



Looks reasonable as well.  I'll apply as soon as I get a review on the 
previous patchset.


--
error compiling committee.c: too many arguments to function


Re: [PATCH 4/7] KVM: PPC: Add book3s_32 tlbie flush acceleration

2010-08-01 Thread Alexander Graf


On 01.08.2010, at 16:08, Avi Kivity wrote:

 On 07/29/2010 04:04 PM, Alexander Graf wrote:
 On Book3s_32 the tlbie instruction flushes effective addresses by the mask
 0x0000. This is pretty hard to reflect with a hash that hashes ~0xfff, so
 to speed up that target we should also keep a special hash around for it.
 
 
  static inline u64 kvmppc_mmu_hash_vpte(u64 vpage)
  {
  return hash_64(vpage  0xfULL, HPTEG_HASH_BITS_VPTE);
 @@ -66,6 +72,11 @@ void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
  index = kvmppc_mmu_hash_pte(pte->pte.eaddr);
  hlist_add_head_rcu(&pte->list_pte, &vcpu->arch.hpte_hash_pte[index]);
 
 +/* Add to ePTE_long list */
 +index = kvmppc_mmu_hash_pte_long(pte->pte.eaddr);
 +hlist_add_head_rcu(&pte->list_pte_long,
 +&vcpu->arch.hpte_hash_pte_long[index]);
 +
 
 Isn't it better to make operations on this list conditional on Book3s_32?  
 Hashes are expensive since they usually cost cache misses.

Yes, the same goes for vpte_long and vpte: book3s_32 guests don't need them
except for the flush-all case. The tough part is that this depends on the guest,
not the host, so I need to have different structs for book3s_32 and book3s_64
guests. This isn't a big issue, but it complicates the code.

 Can of course be done later as an optimization.

Yes, that was the plan. Great to see you got the same feeling there though :). 
To be honest, I even started a book3s_32 host optimization patch and threw it 
away because it made the code less readable. So yes, this is on my radar.


Alex

