Re: [PATCH] x86/kvm: Show guest system/user cputime in cpustat

2010-03-11 Thread Zhang, Yanmin
On Thu, 2010-03-11 at 09:50 +0200, Avi Kivity wrote:
 On 03/11/2010 09:46 AM, Sheng Yang wrote:
  On Thursday 11 March 2010 15:36:01 Avi Kivity wrote:
 
  On 03/11/2010 09:20 AM, Sheng Yang wrote:
   

 
  Currently we can only get the cpu_stat of the whole guest as one. This patch
  enhances cpu_stat with more detail: it adds guest_system and guest_user cpu
  time statistics, at the cost of a little overhead.
 
  Signed-off-by: Sheng Yang <sh...@linux.intel.com>
It seems per-process guest cpu utilization is more useful than per-cpu's.


  ---
 
  This draft patch is based on KVM upstream and is meant to show the idea.
  I will split it into a more kernel-friendly version later.
 
  The overhead is the cost of get_cpl() after each exit from the guest.
 
  This can be very expensive in the nested virtualization case, so I
  wouldn't like this to be in normal paths.  I think detailed profiling
  like that can be left to 'perf kvm', which only has overhead if enabled
  at runtime.
   
  Yes, that's my concern too (though nested vmcs/vmcb reads are already too
  expensive; they should be optimized...).
 
 Any ideas on how to do that?  Perhaps use paravirt_ops to convert the
 vmread into a memory read?  We store the vmwrites in the vmcs anyway.
Another method is to add a sysctl entry, such as
/proc/sys/kernel/collect_guest_utilization,
and set it off by default. Or add a
/sys/kernel/debug/kvm/collect_guest_utilization.
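A minimal userspace sketch of the gating idea, assuming a global flag that stands in for the proposed collect_guest_utilization knob (off by default) and a mock vcpu whose CPL read is counted so the skipped work is visible. The struct and function names are hypothetical, not KVM's, and attributing a whole run slice to the CPL sampled at exit is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for /proc/sys/kernel/collect_guest_utilization. */
static bool collect_guest_utilization = false;

/* Hypothetical per-task guest time counters, in nanoseconds. */
struct guest_times {
    uint64_t guest;        /* total guest time: always accounted */
    uint64_t guest_user;   /* slices that ended at CPL 3 */
    uint64_t guest_system; /* slices that ended at CPL 0-2 */
};

/* Mock vcpu; the real get_cpl() may need a VMCS/VMCB read. */
struct mock_vcpu {
    int cpl;
};

static unsigned int get_cpl_calls; /* how many times the costly read ran */

static int mock_get_cpl(const struct mock_vcpu *vcpu)
{
    get_cpl_calls++;
    return vcpu->cpl;
}

/* Account delta_ns of guest run time after one exit from the guest. */
static void account_guest_exit(struct guest_times *t,
                               const struct mock_vcpu *vcpu,
                               uint64_t delta_ns)
{
    t->guest += delta_ns;
    if (!collect_guest_utilization)
        return; /* knob off: the CPL is never read */

    if (mock_get_cpl(vcpu) == 3)
        t->guest_user += delta_ns;
    else
        t->guest_system += delta_ns;
}
```

With the knob off only the total grows and the per-exit CPL read disappears, which is the overhead trade-off being discussed.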

 
  The other concern is that a perf-like mechanism would
  bring a lot more overhead compared to this.
 
 
 Ordinarily users won't care if time is spent in guest kernel mode or 
 guest user mode.  They want to see which guest is imposing a load on a 
 system.  I consider a user profiling a guest from the host an advanced 
 and rarer use case, so it's okay to require tools and additional 
 overhead for this.
Here is the story of why Sheng worked out the patch. Some people work on
KVM performance. They want us to extend top to show guest utilization
info, such as guest kernel and guest userspace cpu utilization. With
the new tool, they could find which VM (mapped to its qemu process id)
consumes too much cpu time in host space (including kernel and userspace),
and compare that with guest kernel/userspace time. That information could
provide a first-hand, high-level overview of all VMs running in the system
and help the admin quickly find the worst VM instance.

So we need per-process (guest) cpu utilization rather than per-cpu guest utilization.
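To make the per-process view concrete, here is a rough sketch of what a top-like tool might compute from two samples of one VM's counters, and how it could pick the busiest VM. The four counters, their units and all names are assumptions for illustration; the thread does not define such an interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cumulative counters for one VM (one qemu process),
 * in clock ticks, as a top-like tool might sample them. */
struct vm_sample {
    uint64_t host_user;    /* qemu userspace on the host */
    uint64_t host_system;  /* host kernel on behalf of the VM */
    uint64_t guest_user;   /* guest userspace */
    uint64_t guest_system; /* guest kernel */
};

/* Percent of one CPU spent in each bucket between two samples. */
struct vm_util {
    double host_user, host_system, guest_user, guest_system;
};

static double pct(uint64_t before, uint64_t after, uint64_t elapsed)
{
    return (double)(after - before) * 100.0 / (double)elapsed;
}

/* Returns all zeros when no time elapsed, to avoid dividing by zero. */
static struct vm_util vm_utilization(const struct vm_sample *prev,
                                     const struct vm_sample *cur,
                                     uint64_t elapsed_ticks)
{
    struct vm_util u = { 0 };
    if (elapsed_ticks == 0)
        return u;
    u.host_user    = pct(prev->host_user, cur->host_user, elapsed_ticks);
    u.host_system  = pct(prev->host_system, cur->host_system, elapsed_ticks);
    u.guest_user   = pct(prev->guest_user, cur->guest_user, elapsed_ticks);
    u.guest_system = pct(prev->guest_system, cur->guest_system, elapsed_ticks);
    return u;
}

static double vm_total(const struct vm_util *u)
{
    return u->host_user + u->host_system + u->guest_user + u->guest_system;
}

/* Index of the VM with the largest total (host + guest) utilization. */
static int busiest_vm(const struct vm_util *utils, int n)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (vm_total(&utils[i]) > vm_total(&utils[best]))
            best = i;
    return best;
}
```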

 
  For example you can put the code to note the cpl in a tracepoint which
  is enabled dynamically.
   
  Yanmin has already implemented perf kvm to support this. We are just
  debating whether a normal top-like mechanism is necessary.
perf kvm is mostly used to find hot functions which might cause more overhead.
Sheng's patch has less overhead.


 
  I am also considering making it a feature that can be disabled. But it
  seems that would make things complicated and result in uncertain cpustat
  output.
 
 
 I'm not even sure that guest time was a good idea.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Make QEmu HPET disabled by default for KVM?

2010-03-11 Thread Sheng Yang
On Thursday 11 March 2010 15:58:12 Avi Kivity wrote:
 On 03/11/2010 09:52 AM, Sheng Yang wrote:
  I think we have already suffered enough timer issues due to this (e.g. I
  can't boot a 2.6.18 kernel properly)...
 
 2.6.18 as guest or as host?

Guest
 
  I have kept --no-hpet in my setup for
  months...
 
 Any details about the problems?  HPET is important to some guests.
 

Seems like HPET reaction is too slow to satisfy some guests(for it would 
replace PIT).

Here is the thread last time.

http://thread.gmane.org/gmane.comp.emulators.kvm.devel/44899

-- 
regards
Yang, Sheng





Re: Make QEmu HPET disabled by default for KVM?

2010-03-11 Thread Avi Kivity

On 03/11/2010 10:23 AM, Sheng Yang wrote:
   I have kept --no-hpet in my setup for
   months...
  Any details about the problems?  HPET is important to some guests.
 Seems like HPET reaction is too slow to satisfy some guests(for it would
 replace PIT).

 Here is the thread last time.

 http://thread.gmane.org/gmane.comp.emulators.kvm.devel/44899

Thanks.  We can address this in three ways: first, adjust the guest not 
to do timing related tests when virtualized (since no matter what we do, 
the tests may fail).  Second, I think we should implement userspace ack 
notifiers (similar to tpr access notifiers already present).  Third, we 
can implement a kernel hpet, which, after we solve the zillion bugs it
introduces, will also give a nice performance improvement for hpet 
intensive workloads.
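For readers unfamiliar with the pattern behind the second option, here is a rough sketch of an ack-notifier table: a device model registers a callback for an interrupt line, and the interrupt controller model calls it when the guest acknowledges that line. All names, sizes and the timer consumer are hypothetical; this is not the kernel's or qemu's notifier API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GSI 24      /* hypothetical number of interrupt lines */
#define MAX_NOTIFIERS 4 /* hypothetical notifiers per line */

typedef void (*ack_fn)(void *opaque, int gsi);

struct ack_notifier {
    ack_fn fn;
    void *opaque;
};

static struct ack_notifier notifiers[MAX_GSI][MAX_NOTIFIERS];

/* Returns 0 on success, -1 if the line is invalid or its table is full. */
static int register_ack_notifier(int gsi, ack_fn fn, void *opaque)
{
    if (gsi < 0 || gsi >= MAX_GSI || fn == NULL)
        return -1;
    for (int i = 0; i < MAX_NOTIFIERS; i++) {
        if (notifiers[gsi][i].fn == NULL) {
            notifiers[gsi][i].fn = fn;
            notifiers[gsi][i].opaque = opaque;
            return 0;
        }
    }
    return -1;
}

/* Called by the (emulated) interrupt controller when the guest acks gsi. */
static void notify_ack(int gsi)
{
    if (gsi < 0 || gsi >= MAX_GSI)
        return;
    for (int i = 0; i < MAX_NOTIFIERS; i++)
        if (notifiers[gsi][i].fn != NULL)
            notifiers[gsi][i].fn(notifiers[gsi][i].opaque, gsi);
}

/* Example consumer: a timer model that raises its next backlogged tick
 * only once the previous one has been acknowledged. */
struct timer_model {
    int backlog; /* ticks due but not yet raised */
    int raised;  /* ticks raised to the guest so far */
};

static void timer_on_ack(void *opaque, int gsi)
{
    struct timer_model *t = opaque;
    (void)gsi;
    if (t->backlog > 0) {
        t->backlog--;
        t->raised++;
    }
}
```

The notifier only paces the device model (here, when it may raise the next backlogged tick); by itself it does nothing to make an interrupt reach the guest sooner.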


--
error compiling committee.c: too many arguments to function



Re: Make QEmu HPET disabled by default for KVM?

2010-03-11 Thread Gleb Natapov
On Thu, Mar 11, 2010 at 10:28:12AM +0200, Avi Kivity wrote:
 On 03/11/2010 10:23 AM, Sheng Yang wrote:
 I have kept --no-hpet in my setup for
 months...
 Any details about the problems?  HPET is important to some guests.
 
 Seems like HPET reaction is too slow to satisfy some guests(for it would
 replace PIT).
 
 Here is the thread last time.
 
 http://thread.gmane.org/gmane.comp.emulators.kvm.devel/44899
 
 
 Thanks.  We can address this in three ways: first, adjust the guest
 not to do timing related tests when virtualized (since no matter
 what we do, the tests may fail).  Second, I think we should
 implement userspace ack notifiers (similar to tpr access notifiers
 already present).  Third, we can implement a kernel hpet, which,
 after we solve the zillion bugs it introduces, will also give a nice
 performance improvement for hpet intensive workloads.
 
Second will not solve the problem. Presence of ack notifiers will not
make HPET interrupt arrive faster.

--
Gleb.


Re: [PATCH 2/4] KVM: Rework VCPU state writeback API

2010-03-11 Thread Avi Kivity

On 03/02/2010 02:14 AM, Marcelo Tosatti wrote:

 On Mon, Mar 01, 2010 at 07:10:30PM +0100, Jan Kiszka wrote:
  This grand cleanup drops all reset and vmsave/load related
  synchronization points in favor of four(!) generic hooks:

  - cpu_synchronize_all_states in qemu_savevm_state_complete
    (initial sync from kernel before vmsave)
  - cpu_synchronize_all_post_init in qemu_loadvm_state
    (writeback after vmload)
  - cpu_synchronize_all_post_init in main after machine init
  - cpu_synchronize_all_post_reset in qemu_system_reset
    (writeback after system reset)

  These writeback points + the existing one of VCPU exec after
  cpu_synchronize_state map onto three levels of writeback:

  - KVM_PUT_RUNTIME_STATE (during runtime, other VCPUs continue to run)
  - KVM_PUT_RESET_STATE   (on synchronous system reset, all VCPUs stopped)
  - KVM_PUT_FULL_STATE    (on init or vmload, all VCPUs stopped as well)

  This level is passed to the arch-specific VCPU state writing function
  that will decide which concrete substates need to be written. That way,
  no writer of load, save or reset functions that interact with in-kernel
  KVM states will ever have to worry about synchronization again. That
  also means that a lot of reasons for races, segfaults and deadlocks are
  eliminated.

  cpu_synchronize_state remains untouched, just as Anthony suggested. We
  continue to need it before reading or writing of VCPU states that are
  also tracked by in-kernel KVM subsystems.

  Consequently, this patch removes many cpu_synchronize_state calls that
  are now redundant, just like remaining explicit register syncs.

  Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>

 Jan,

 This patch breaks system reset of WinXP.32 install (more easily
 reproducible without iothread enabled).

What's the conclusion here?  The patch is innocent of the regression?
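As an illustration of how a level-based writer can choose substates, here is a minimal sketch. The three level names come from the patch description; their numeric values and the mapping from level to substates are assumptions made for the example, not a statement of what the patch's x86 writer actually does.

```c
#include <assert.h>

/* Level names from the patch description; ordered so that a higher
 * level writes everything a lower one does (an assumption here). */
enum put_level {
    KVM_PUT_RUNTIME_STATE = 1, /* other VCPUs keep running */
    KVM_PUT_RESET_STATE   = 2, /* synchronous reset, all VCPUs stopped */
    KVM_PUT_FULL_STATE    = 3, /* init or vmload, all VCPUs stopped */
};

/* Hypothetical substate groups an arch writer might push to the kernel. */
enum substate {
    SUB_REGS     = 1u << 0, /* general registers, RIP, flags */
    SUB_SREGS    = 1u << 1, /* segment and control registers */
    SUB_MP_STATE = 1u << 2, /* run/halt state of the VCPU */
    SUB_MSRS     = 1u << 3, /* MSRs not safe to rewrite at runtime */
    SUB_TSC      = 1u << 4, /* TSC: only on init or vmload */
};

/* Decide which substates a writeback at `level` should touch. */
static unsigned int substates_for_level(enum put_level level)
{
    unsigned int mask = SUB_REGS | SUB_SREGS;

    if (level >= KVM_PUT_RESET_STATE)
        mask |= SUB_MP_STATE | SUB_MSRS;
    if (level >= KVM_PUT_FULL_STATE)
        mask |= SUB_TSC;
    return mask;
}
```

Keeping the decision in one function is what lets callers of load, save and reset paths stay ignorant of synchronization details: they only pass a level.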



Re: [patch 1/3] target-i386: print EFER in cpu_dump_state

2010-03-11 Thread Avi Kivity

On 03/09/2010 03:53 AM, Marcelo Tosatti wrote:

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

Index: qemu-kvm-uq/target-i386/helper.c
===
--- qemu-kvm-uq.orig/target-i386/helper.c
+++ qemu-kvm-uq/target-i386/helper.c
@@ -1176,6 +1176,7 @@ void cpu_dump_state(CPUState *env, FILE
     cpu_x86_dump_seg_cache(env, f, cpu_fprintf, "TR", &env->tr);
 
 #ifdef TARGET_X86_64
+    cpu_fprintf(f, "EFER=%016" PRIx64 "\n", env->efer);
     if (env->hflags & HF_LMA_MASK) {
         cpu_fprintf(f, "GDT= %016" PRIx64 " %08x\n",
                     env->gdt.base, env->gdt.limit);

   


Better to do this for i386 too, no?
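Avi's suggestion is easy to picture with a standalone sketch: EFER is not a long-mode-only register (it also holds, for example, the NX enable bit), so the line can be printed outside the #ifdef TARGET_X86_64 block. The mock struct stands in for qemu's CPUState and is not its real definition; the format string is the one from the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the part of CPUState this example needs. */
struct mock_cpu_state {
    uint64_t efer;
};

/* Format EFER as the patch does, but with no TARGET_X86_64 guard,
 * so 32-bit targets print it as well. Returns snprintf's result. */
static int format_efer(char *buf, size_t len, const struct mock_cpu_state *env)
{
    return snprintf(buf, len, "EFER=%016" PRIx64 "\n", env->efer);
}
```

For example, a value of 0x500 (LME and LMA set) prints as EFER=0000000000000500.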



Re: Make QEmu HPET disabled by default for KVM?

2010-03-11 Thread Sheng Yang
On Thursday 11 March 2010 16:31:57 Gleb Natapov wrote:
 On Thu, Mar 11, 2010 at 10:28:12AM +0200, Avi Kivity wrote:
  On 03/11/2010 10:23 AM, Sheng Yang wrote:
  I have kept --no-hpet in my setup for
  months...
  
  Any details about the problems?  HPET is important to some guests.
  
  Seems like HPET reaction is too slow to satisfy some guests(for it would
  replace PIT).
  
  Here is the thread last time.
  
  http://thread.gmane.org/gmane.comp.emulators.kvm.devel/44899
 
  Thanks.  We can address this in three ways: first, adjust the guest
  not to do timing related tests when virtualized (since no matter
  what we do, the tests may fail).  Second, I think we should
  implement userspace ack notifiers (similar to tpr access notifiers
  already present).  Third, we can implement a kernel hpet, which,
  after we solve the zillion bugs it introduces, will also give a nice
  performance improvement for hpet intensive workloads.
 
 Second will not solve the problem. Presence of ack notifiers will not
 make HPET interrupt arrive faster.

The slowness may also be due to lost ticks. And with lost ticks, hpet is still
unusable...

-- 
regards
Yang, Sheng


Re: [PATCH 2/2] KVM test: Support to SLES install

2010-03-11 Thread yogi
On Wed, 2010-03-10 at 10:42 -0300, Lucas Meneghel Rodrigues wrote:
 On Wed, Mar 10, 2010 at 8:45 AM, Lucas Meneghel Rodrigues
 <l...@redhat.com> wrote:
  From: yogi <anant...@linux.vnet.ibm.com>
 
  Adds a new entry, SUSE, in the tests_base file for SLES, and
  contains an autoinst file for doing an unattended SLES 11 64-bit
  install.
 
 Oh Yogi, by the way, could you please reorganize the opensuse section
 and add at least an autoyast file for opensuse 11.2 so I can actually
 test if we can get a successful installation? I tried to play with the
 XML file of SLES to see if I could get opensuse 11.2 installed, but it
 turns out that those config files are an endless XML nightmare and all
 I tried makes yast die.
 
 The mechanics of the whole thing are correct, I can get yast to start
 with no problems, but parsing the autoyast file makes the VM hang.
 So I am fine with adding the patch, but it'd be nice to have an OS
 with unrestricted access that everybody could play with (opensuse). I
 don't have enough time to make it work on my own, so if you have some
 spare time, please work on this.
 
Sure Lucas, I will be happy to create an autoyast file for opensuse too.
I will work on that patch and send it as soon as possible.
  Signed-off-by: Yogananth Subramanian <anant...@linux.vnet.ibm.com>
  ---
   client/tests/kvm/tests_base.cfg.sample |   22 ++
   1 files changed, 22 insertions(+), 0 deletions(-)
 
  diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
  index c76470d..acb2076 100644
  --- a/client/tests/kvm/tests_base.cfg.sample
  +++ b/client/tests/kvm/tests_base.cfg.sample
  @@ -503,6 +503,28 @@ variants:
  md5sum = 2afee1b8a87175e6dee2b8dbbd1ad8e8
  md5sum_1m = 768ca32503ef92c28f2d144f2a87e4d0
 
  +- SLES:
  +no setup
  +shell_prompt = ^r...@.*[\#\$]\s*$|#
  +unattended_install:
  +pxe_image = linux
  +pxe_initrd = initrd
  +extra_params += " -bootp /pxelinux.0 -boot n"
  +kernel_args = autoyast=floppy
  +
  +variants:
  +- 11.64:
  +no setup
  +image_name = sles11-64
  +cdrom=linux/SLES-11-DVD-x86_64-GM-DVD1.iso
  +md5sum = 50a2bd45cd12c3808c3ee48208e2586b
  +md5sum_1m = 0951cab7c32e332362fc424c1054
  +unattended_install:
  +unattended_file = unattended/Sles11-64-autoinst.xml
  +tftp = images/sles11-64/tftpboot
  +floppy = images/sles11-64floppy.img
  +pxe_dir = boot/x86_64/loader
  +
  - @Ubuntu:
  shell_prompt = ^r...@.*[\#\$]\s*$
 
  --
  1.6.6.1
 


Re: [patch 2/3] kvm: handle internal error

2010-03-11 Thread Avi Kivity

On 03/09/2010 03:53 AM, Marcelo Tosatti wrote:

Port qemu-kvm's KVM_EXIT_INTERNAL_ERROR handling to upstream.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

Index: qemu-kvm/kvm-all.c
===
--- qemu-kvm.orig/kvm-all.c
+++ qemu-kvm/kvm-all.c
@@ -721,6 +721,28 @@ static int kvm_handle_io(uint16_t port,
     return 1;
 }
 
+#ifdef KVM_CAP_INTERNAL_ERROR_DATA
+static void kvm_handle_internal_error(CPUState *env, struct kvm_run *run)
+{
+
+    if (kvm_check_extension(kvm_state, KVM_CAP_INTERNAL_ERROR_DATA)) {
+        int i;
+
+        fprintf(stderr, "KVM internal error. Suberror: %d\n",
+                run->internal.suberror);
+
+        for (i = 0; i < run->internal.ndata; ++i) {
+            fprintf(stderr, "extra data[%d]: %"PRIx64"\n",
+                    i, (uint64_t)run->internal.data[i]);
+        }
+    }
+    cpu_dump_state(env, stderr, fprintf, 0);
+    if (run->internal.suberror == KVM_INTERNAL_ERROR_EMULATION)
+        fprintf(stderr, "emulation failure\n");

{ braces }

+    vm_stop(0);
+}
+#endif
   


Should trigger a qmp message to let management know something went wrong 
(can come later).




Re: [patch 3/3] kvm: allow qemu to set EPT identity mapping address

2010-03-11 Thread Avi Kivity

On 03/09/2010 03:53 AM, Marcelo Tosatti wrote:

From: Sheng Yang <sh...@linux.intel.com>

If we use a BIOS image larger than the current 256KB, we would need to move
the reserved TSS and EPT identity mapping pages. Currently the TSS address
can be changed, but the EPT identity map address cannot.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

Index: qemu-kvm/target-i386/kvm.c
===
--- qemu-kvm.orig/target-i386/kvm.c
+++ qemu-kvm/target-i386/kvm.c
@@ -341,6 +341,24 @@ static int kvm_has_msr_star(CPUState *en
     return 0;
 }
 
+static int kvm_init_identity_map_page(KVMState *s)
+{
+#ifdef KVM_CAP_SET_IDENTITY_MAP_ADDR
+    int ret;
+    uint64_t addr = 0xfffbc000;
+
+    if (!kvm_check_extension(s, KVM_CAP_SET_IDENTITY_MAP_ADDR))
+        return 0;

{ braces }

+
+    ret = kvm_vm_ioctl(s, KVM_SET_IDENTITY_MAP_ADDR, &addr);
+    if (ret < 0) {
+        fprintf(stderr, "kvm_set_identity_map_addr: %s\n", strerror(ret));
+        return ret;
+    }
+#endif
+    return 0;
+}
+
 int kvm_arch_init(KVMState *s, int smp_cpus)
 {
     int ret;
@@ -368,7 +386,11 @@ int kvm_arch_init(KVMState *s, int smp_c
         perror("e820_add_entry() table is full");
         exit(1);
     }
-    return kvm_vm_ioctl(s, KVM_SET_TSS_ADDR, 0xfffbd000);
+    ret = kvm_vm_ioctl(s, KVM_SET_TSS_ADDR, 0xfffbd000);
+    if (ret < 0)
+        return ret;

{ }

+
+    return kvm_init_identity_map_page(s);
 }
 
  static void set_v8086_seg(struct kvm_segment *lhs, const SegmentCache *rhs)
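The two hard-coded addresses in this patch follow from simple arithmetic. A sketch, assuming the BIOS image is mapped so that it ends at the 4GB boundary, that the TSS region is three pages (the size KVM_SET_TSS_ADDR documents) and that the identity map is the single page just below it; the helper and struct names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE       0x1000ULL
#define FOUR_GB         0x100000000ULL
#define TSS_PAGES       3ULL /* KVM_SET_TSS_ADDR region size */
#define IDENT_MAP_PAGES 1ULL /* KVM_SET_IDENTITY_MAP_ADDR region size */

struct reserved_layout {
    uint64_t bios_base;  /* first byte of the BIOS image */
    uint64_t tss_base;   /* three-page TSS region just below the BIOS */
    uint64_t ident_base; /* identity-map page just below the TSS */
};

/* Place the reserved pages directly below a BIOS of bios_size bytes. */
static struct reserved_layout layout_below_bios(uint64_t bios_size)
{
    struct reserved_layout l;
    l.bios_base  = FOUR_GB - bios_size;
    l.tss_base   = l.bios_base - TSS_PAGES * PAGE_SIZE;
    l.ident_base = l.tss_base - IDENT_MAP_PAGES * PAGE_SIZE;
    return l;
}
```

With the current 256KB image this reproduces 0xfffbd000 and 0xfffbc000; a 512KB image would move them to 0xfff7d000 and 0xfff7c000, which is why the identity map address needs to be settable just like the TSS address.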




Re: Make QEmu HPET disabled by default for KVM?

2010-03-11 Thread Gleb Natapov
On Thu, Mar 11, 2010 at 04:38:48PM +0800, Sheng Yang wrote:
 On Thursday 11 March 2010 16:31:57 Gleb Natapov wrote:
  On Thu, Mar 11, 2010 at 10:28:12AM +0200, Avi Kivity wrote:
   On 03/11/2010 10:23 AM, Sheng Yang wrote:
   I have kept --no-hpet in my setup for
   months...
   
   Any details about the problems?  HPET is important to some guests.
   
   Seems like HPET reaction is too slow to satisfy some guests(for it would
   replace PIT).
   
   Here is the thread last time.
   
   http://thread.gmane.org/gmane.comp.emulators.kvm.devel/44899
  
   Thanks.  We can address this in three ways: first, adjust the guest
   not to do timing related tests when virtualized (since no matter
   what we do, the tests may fail).  Second, I think we should
   implement userspace ack notifiers (similar to tpr access notifiers
   already present).  Third, we can implement a kernel hpet, which,
   after we solve the zillion bugs it introduces, will also give a nice
   performance improvement for hpet intensive workloads.
  
  Second will not solve the problem. Presence of ack notifiers will not
  make HPET interrupt arrive faster.
 
 The slowness may also be due to lost ticks. And with lost ticks, hpet is still
 unusable...
 
If the problem is due to lost ticks, reinjection may solve it, but only
partially. What if the IO thread hasn't run even once during the time the
vcpu did its clock source check? IIRC sometimes we trigger this even with
the in-kernel PIT.

--
Gleb.


Re: Make QEmu HPET disabled by default for KVM?

2010-03-11 Thread Avi Kivity

On 03/11/2010 10:42 AM, Gleb Natapov wrote:

 On Thu, Mar 11, 2010 at 04:38:48PM +0800, Sheng Yang wrote:
  On Thursday 11 March 2010 16:31:57 Gleb Natapov wrote:
   On Thu, Mar 11, 2010 at 10:28:12AM +0200, Avi Kivity wrote:
    On 03/11/2010 10:23 AM, Sheng Yang wrote:
       I have kept --no-hpet in my setup for
       months...
      Any details about the problems?  HPET is important to some guests.
     Seems like HPET reaction is too slow to satisfy some guests(for it would
     replace PIT).

     Here is the thread last time.

     http://thread.gmane.org/gmane.comp.emulators.kvm.devel/44899
    Thanks.  We can address this in three ways: first, adjust the guest
    not to do timing related tests when virtualized (since no matter
    what we do, the tests may fail).  Second, I think we should
    implement userspace ack notifiers (similar to tpr access notifiers
    already present).  Third, we can implement a kernel hpet, which,
    after we solve the zillion bugs it introduces, will also give a nice
    performance improvement for hpet intensive workloads.
   Second will not solve the problem. Presence of ack notifiers will not
   make HPET interrupt arrive faster.
  The slowness may also be due to lost ticks. And with lost ticks, hpet is
  still unusable...
 If the problem is due to lost ticks, reinjection may solve it, but only
 partially. What if the IO thread hasn't run even once during the time the
 vcpu did its clock source check? IIRC sometimes we trigger this even with
 the in-kernel PIT.


That is true.  Reinjection can correct problems in the long term, but 
may fail in the short term.  10 ticks is easily short term in a heavily 
loaded system.


How does it happen with kernel PIT?  I could understand it if we had a 
work item doing the injection, but everything happens either from 
hrtimer context or vcpu context.
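The short-term versus long-term distinction can be shown with a toy model of tick reinjection. It assumes a timer that keeps counting due ticks even when they cannot be delivered, and a delivery path that can only drain a limited burst each time the vcpu gets to take an interrupt; both are simplifications, not how the in-kernel PIT is implemented, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct tick_state {
    uint64_t due;       /* ticks the host timer has generated */
    uint64_t delivered; /* ticks the guest has actually received */
};

/* Host timer expiry: the tick is due whether or not it can be delivered. */
static void timer_expired(struct tick_state *s)
{
    s->due++;
}

/* Run when the vcpu can take an interrupt; reinjects at most max_burst
 * backlogged ticks. */
static void reinject(struct tick_state *s, uint64_t max_burst)
{
    uint64_t backlog = s->due - s->delivered;
    s->delivered += backlog < max_burst ? backlog : max_burst;
}

/* How far behind the guest's tick count is right now. */
static uint64_t tick_lag(const struct tick_state *s)
{
    return s->due - s->delivered;
}
```

A guest that checks its clock source while the lag is 10 sees the short-term error; once the backlog drains, the long-term count is right again.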




[PATCH] KVM: Move kvm_exit tracepoint rip reading inside tracepoint

2010-03-11 Thread Avi Kivity
Reading rip is expensive on vmx, so move it inside the tracepoint so we only
incur the cost if tracing is enabled.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/kvm/svm.c   |2 +-
 arch/x86/kvm/trace.h |6 +++---
 arch/x86/kvm/vmx.c   |2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 3a2f2b9..b646e96 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2685,7 +2685,7 @@ static int handle_exit(struct kvm_vcpu *vcpu)
struct kvm_run *kvm_run = vcpu->run;
u32 exit_code = svm->vmcb->control.exit_code;
 
-   trace_kvm_exit(exit_code, svm->vmcb->save.rip);
+   trace_kvm_exit(exit_code, vcpu);
 
if (unlikely(svm->nested.exit_required)) {
nested_svm_vmexit(svm);
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index b75efef..3cf9547 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -182,8 +182,8 @@ TRACE_EVENT(kvm_apic,
  * Tracepoint for kvm guest exit:
  */
 TRACE_EVENT(kvm_exit,
-   TP_PROTO(unsigned int exit_reason, unsigned long guest_rip),
-   TP_ARGS(exit_reason, guest_rip),
+   TP_PROTO(unsigned int exit_reason, struct kvm_vcpu *vcpu),
+   TP_ARGS(exit_reason, vcpu),
 
TP_STRUCT__entry(
__field(unsigned int,   exit_reason )
@@ -192,7 +192,7 @@ TRACE_EVENT(kvm_exit,
 
TP_fast_assign(
__entry->exit_reason = exit_reason;
-   __entry->guest_rip  = guest_rip;
+   __entry->guest_rip  = kvm_rip_read(vcpu);
),
 
TP_printk("reason %s rip 0x%lx",
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ae3217d..06108f3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3605,7 +3605,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
u32 exit_reason = vmx->exit_reason;
u32 vectoring_info = vmx->idt_vectoring_info;
 
-   trace_kvm_exit(exit_reason, kvm_rip_read(vcpu));
+   trace_kvm_exit(exit_reason, vcpu);
 
/* If guest state is invalid, start emulating */
if (vmx->emulation_required && emulate_invalid_guest_state)
-- 
1.7.0.2
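The effect of this change can be sketched in userspace. Since a call's arguments are evaluated before the call, the old code paid for the rip read on every exit; passing the vcpu pointer and reading rip inside the enabled path avoids that. The enable flag, mock vcpu and read counter below are assumptions standing in for the real tracepoint machinery and kvm_rip_read().

```c
#include <assert.h>
#include <stdbool.h>

/* Mock vcpu: rip_reads counts how often the "expensive" read happened. */
struct mock_vcpu {
    unsigned long rip;
    unsigned int rip_reads;
};

/* Stand-in for the tracepoint's enabled state (off unless tracing). */
static bool trace_kvm_exit_enabled = false;

/* Stand-in for kvm_rip_read(); on VMX the real one may cost a VMREAD. */
static unsigned long mock_rip_read(struct mock_vcpu *vcpu)
{
    vcpu->rip_reads++;
    return vcpu->rip;
}

struct exit_record {
    unsigned int exit_reason;
    unsigned long guest_rip;
};

static struct exit_record last_record; /* what a trace buffer would hold */

/* After the patch: the hook gets the vcpu and reads rip only when enabled. */
static void mock_trace_kvm_exit(unsigned int exit_reason, struct mock_vcpu *vcpu)
{
    if (!trace_kvm_exit_enabled)
        return;
    last_record.exit_reason = exit_reason;
    last_record.guest_rip = mock_rip_read(vcpu);
}
```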



Re: [PATCH] x86/kvm: Show guest system/user cputime in cpustat

2010-03-11 Thread Sheng Yang
On Thursday 11 March 2010 15:50:54 Avi Kivity wrote:
 On 03/11/2010 09:46 AM, Sheng Yang wrote:
  On Thursday 11 March 2010 15:36:01 Avi Kivity wrote:
  On 03/11/2010 09:20 AM, Sheng Yang wrote:
  Currently we can only get the cpu_stat of the whole guest as one. This
  patch enhances cpu_stat with more detail: it adds guest_system and
  guest_user cpu time statistics, at the cost of a little overhead.
 
  Signed-off-by: Sheng Yang <sh...@linux.intel.com>
  ---
 
  This draft patch is based on KVM upstream and is meant to show the idea.
  I will split it into a more kernel-friendly version later.
 
  The overhead is the cost of get_cpl() after each exit from the guest.
 
  This can be very expensive in the nested virtualization case, so I
  wouldn't like this to be in normal paths.  I think detailed profiling
  like that can be left to 'perf kvm', which only has overhead if enabled
  at runtime.
 
  Yes, that's my concern too (though nested vmcs/vmcb reads are already too
  expensive; they should be optimized...).
 
 Any ideas on how to do that?  Perhaps use paravirt_ops to convert the
 vmread into a memory read?  We store the vmwrites in the vmcs anyway.

When Qing (CCed) was working on nested VMX in the past, he found that PV
vmread/vmwrite indeed works well (it writes to the virtual vmcs, so vmwrite
can also benefit). However, compared to older machines (where one of our
internal patches showed an improvement of more than 5%), NHM (Nehalem) gets
less benefit due to its reduced vmexit cost.
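A toy model of the paravirtual VMREAD/VMWRITE idea: the L1 hypervisor's accesses become loads and stores on a page shared with L0 instead of instructions that trap. The field numbering, struct and exit counter are invented for this sketch and follow neither the Intel VMCS encoding nor any existing patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field indices; not real VMCS field encodings. */
enum vfield { VF_GUEST_RIP, VF_GUEST_RSP, VF_EXIT_REASON, VF_COUNT };

/* Virtual VMCS kept in memory shared between L1 and L0. */
struct virtual_vmcs {
    uint64_t field[VF_COUNT];
};

static unsigned int exits_to_l0; /* traps taken on the trapping path */

/* Trapping path: each access exits to L0, which emulates it. */
static uint64_t trapping_vmread(const struct virtual_vmcs *v, enum vfield f)
{
    exits_to_l0++;
    return v->field[f];
}

static void trapping_vmwrite(struct virtual_vmcs *v, enum vfield f, uint64_t val)
{
    exits_to_l0++;
    v->field[f] = val;
}

/* Paravirt path: plain memory accesses, no exit. L0 would consume the
 * written values when it next handles a VM entry from L1. */
static uint64_t pv_vmread(const struct virtual_vmcs *v, enum vfield f)
{
    return v->field[f];
}

static void pv_vmwrite(struct virtual_vmcs *v, enum vfield f, uint64_t val)
{
    v->field[f] = val;
}
```

The saving is one exit per access, which is consistent with the benefit shrinking on CPUs where an exit is cheaper.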

-- 
regards
Yang, Sheng

 
  The other concern is that a perf-like mechanism would
  bring a lot more overhead compared to this.
 
 Ordinarily users won't care if time is spent in guest kernel mode or
 guest user mode.  They want to see which guest is imposing a load on a
 system.  I consider a user profiling a guest from the host an advanced
 and rarer use case, so it's okay to require tools and additional
 overhead for this.
 
  For example you can put the code to note the cpl in a tracepoint which
  is enabled dynamically.
 
  Yanmin has already implemented perf kvm to support this. We are just
  debating whether a normal top-like mechanism is necessary.
 
  I am also considering making it a feature that can be disabled. But it
  seems that would make things complicated and result in uncertain cpustat output.
 
 I'm not even sure that guest time was a good idea.
 


Re: Ideas wiki for GSoC 2010

2010-03-11 Thread Paolo Bonzini

On 03/11/2010 08:55 AM, Avi Kivity wrote:
 On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
  2. Do we have kvm-specific projects? Can they be part of the QEMU project
  or do we need a different mentoring organization for it?

 Something really interesting is kvm-assisted tcg. I'm afraid it's a bit
 too complicated for GSoC.

I suppose the GSoC ideas wiki page will migrate to a QEMU ideas page at
some point, so it's good to have ideas written down.

Also, the selection of projects will be done by members of the
community, by grading the students' submissions.  The bar would be
placed higher for someone who picks a complicated project.


Paolo


guest kernel debugging through serial port

2010-03-11 Thread Neo Jia
hi,

I have followed the windows guest debugging procedure from
http://www.linux-kvm.org/page/WindowsGuestDrivers/GuestDebugging. It
works when I start two guests and bind a tcp port to the guest serial
port, but it is really slow.

But if I use -serial /dev/ttyS1 for the guest debugging target, I
can't talk to it from my dev machine, which is connected to ttyS1 of the
target machine (the host).

Is this a known problem?

Thanks,
Neo

-- 
I would remember that if researchers were not ambitious
probably today we haven't the technology we are using!


Status of KVM vulnerabilities

2010-03-11 Thread Daniel Bareiro
Hi, all.

Debian recently published DSA-2010-1 [1], which fixes the following
vulnerabilities:

* CVE-2010-0298 & CVE-2010-0306 (Gleb Natapov)
* CVE-2010-0309 (Marcelo Tosatti)
* CVE-2010-0419 (Paolo Bonzini)

I'm using Linux 2.6.32.3 with qemu-kvm-0.12.1.2 and would like to know
whether I need to update kvm-kmod or qemu-kvm, i.e. whether these
versions are affected by any of these vulnerabilities and, if so,
whether a fixed version already exists.


Thanks in advance for your replies.

Regards,
Daniel

[1] http://seclists.org/bugtraq/2010/Mar/98
-- 
Fingerprint: BFB3 08D6 B4D1 31B2 72B9  29CE 6696 BF1B 14E6 1D37
Powered by Debian GNU/Linux Lenny - Linux user #188.598




Re: [PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.

2010-03-11 Thread Takuya Yoshikawa

Gleb Natapov wrote:

On Wed, Mar 10, 2010 at 07:08:31PM +0900, Takuya Yoshikawa wrote:

Gleb Natapov wrote:

Entering guest from time to time will not change semantics of the
processor (if code is not modified under processor's feet at least).
Currently we reenter guest mode after each iteration of string
instruction for all instruction but ins/outs.


E.g., is there no chance that during the repetitions, in the middle of the
repetitions, page faults occur? If it can, without entering the guest, can
we handle it?
-- I lack some basic assumptions?


If page fault occurs we inject it to the guest.


Oh, I might have failed to explain what I was worried about.
I mean the opposite: I was worried about the case of NOT reentering the guest.


Are you thinking about something specific here? If we inject exceptions

Yes.


when they occur and we inject interrupt when they arrive what problem do
you see? I guess this is how real CPU actually works. I doubt it
re-reads string instruction on each iteration.


No problem if we detect and inject page faults like that.

I just wasn't so certain that, when we encounter a page fault in the middle
of the repetitions (the rep-specific case), we can inject it, suspend the
repetition and enter the guest immediately, as SDM Vol.2B says:

 A repeating string operation can be suspended by an exception or interrupt.
  When this happens, the state of the registers is preserved to allow the string
  operation to be resumed upon a return from the exception or interrupt handler.
  ...
  This mechanism allows long string operations to proceed without affecting the
  interrupt response time of the system.

Ah, I may have misunderstood: I thought that if we reenter the guest on every
rep iteration, page fault detection (not injection) could easily be done by
other means, such as the exit reason. Both ways may need the same thing, sorry.


Another concern I wrote was just about the dependencies between your
time to time criteria and SDM's without affecting the interrupt response 
time.
This is just the problem of how we can determine the criteria appropriately.
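One way to picture a "from time to time" criterion is an emulator that runs a bounded batch of REP MOVSB iterations and then returns, so the guest can be re-entered and pending interrupts delivered. The batch size, the flat memory array (assumed large enough for all indices) and DF=0 are simplifications for the sketch, not what the patch series does.

```c
#include <assert.h>
#include <stdint.h>

/* Registers REP MOVSB uses (DF=0 assumed: addresses increment). */
struct rep_regs {
    uint64_t rsi, rdi, rcx;
};

/* Run at most `batch` iterations over a flat guest memory array that the
 * caller guarantees is large enough. Returns 1 if the instruction finished
 * (RIP may advance), or 0 if it must be resumed later (RIP stays on the
 * instruction and the registers describe the next element). */
static int emulate_rep_movsb_batch(uint8_t *mem, struct rep_regs *r,
                                   uint64_t batch)
{
    while (r->rcx != 0 && batch != 0) {
        mem[r->rdi] = mem[r->rsi];
        r->rsi++;
        r->rdi++;
        r->rcx--;
        batch--;
    }
    return r->rcx == 0;
}
```

Because the registers always describe the next element, resuming after the guest has taken its interrupt is just another call; the batch size is the knob that trades emulation speed against interrupt response time.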


I know that current implementation with reentrance is OK.

Current implementation does not reenter guest on each iteration for pio
string, so currently we have both variants.


I'm sorry, I was confused and thought the current implementation already
included some of your patches.




To inject a page fault without reentering the guest, we need to add
some more hacks to the emulator IIUC.


No, we just need to enter the guest if an exception happens. I see that this is
handled incorrectly in my current patch series.


I was just not certain whether the following condition (from SDM Vol.2B) is satisfied

  The source and destination registers point to the next string elements
   to be operated on, the EIP register points to the string instruction,
   and the ECX register has the value it held following the last successful
   iteration of the instruction.

in the emulator's fault handling. I should have read your patch more closely.
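The SDM condition quoted above can be checked against a small emulation sketch: on a fault, the emulator stops before touching the registers for the faulting iteration, so RSI/RDI point at the next elements, RCX holds the count after the last successful iteration and RIP stays on the instruction; the caller would then inject the fault and re-enter the guest. The per-byte presence map (real faults are per page) and all names are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MEM_SIZE 64

/* Guest memory with a hypothetical per-byte "present" flag. */
struct guest_mem {
    uint8_t data[MEM_SIZE];
    bool present[MEM_SIZE];
};

struct rep_regs {
    uint64_t rsi, rdi, rcx;
};

enum emu_result { EMU_DONE, EMU_FAULT };

static bool accessible(const struct guest_mem *m, uint64_t addr)
{
    return addr < MEM_SIZE && m->present[addr];
}

/* Emulate REP MOVSB (DF=0). On EMU_FAULT, *fault_addr is the address the
 * caller should report in the injected #PF before re-entering the guest,
 * and the registers reflect the last successful iteration. */
static enum emu_result emulate_rep_movsb(struct guest_mem *m,
                                         struct rep_regs *r,
                                         uint64_t *fault_addr)
{
    while (r->rcx != 0) {
        if (!accessible(m, r->rsi)) {
            *fault_addr = r->rsi;
            return EMU_FAULT;
        }
        if (!accessible(m, r->rdi)) {
            *fault_addr = r->rdi;
            return EMU_FAULT;
        }
        m->data[r->rdi] = m->data[r->rsi];
        r->rsi++;
        r->rdi++;
        r->rcx--;
    }
    return EMU_DONE;
}
```

Once the guest's fault handler has made the page present, re-executing the instruction simply continues from the preserved registers.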

Thanks,
  Takuya


Re: [PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.

2010-03-11 Thread Gleb Natapov
On Thu, Mar 11, 2010 at 06:58:14PM +0900, Takuya Yoshikawa wrote:
 Gleb Natapov wrote:
 On Wed, Mar 10, 2010 at 07:08:31PM +0900, Takuya Yoshikawa wrote:
 Gleb Natapov wrote:
 Entering guest from time to time will not change semantics of the
 processor (if code is not modified under processor's feet at least).
 Currently we reenter guest mode after each iteration of string
 instruction for all instructions but ins/outs.
 
 E.g., is there no chance that page faults occur in the middle of the
 repetitions? If they can, can we handle them without entering the guest?
 Or am I missing some basic assumption?
 
 If page fault occurs we inject it to the guest.
 
 Oh, I may have failed to explain what I was worried about.
 I mean the opposite: I was worried about the case of NOT reentering the guest.
 
 Are you thinking about something specific here? If we inject exceptions
 Yes.
 
 when they occur and inject interrupts when they arrive, what problem do
 you see? I guess this is how a real CPU actually works. I doubt it
 re-reads the string instruction on each iteration.
 
 No problem if we detect and inject page faults like that.
 
Yes, that part is missing from my patch.

 I just wasn't certain that, when we encounter a page fault in the middle
 of the repetitions (in the rep-specific case), we can inject it, suspend the
 repetition and enter the guest immediately, as SDM Vol. 2B says:
 
  A repeating string operation can be suspended by an exception or interrupt.
   When this happens, the state of the registers is preserved to allow the string
   operation to be resumed upon a return from the exception or interrupt handler.
   ...
   This mechanism allows long string operations to proceed without affecting the
   interrupt response time of the system.
 
 Ah, I may have misunderstood: I thought that if we reenter the guest on every rep
 iteration, page fault detection (not injection) could easily be done in other
 ways, from the exit reason or something. Both ways may need the same thing, sorry.
When an instruction is emulated, page fault detection is done by the
emulator itself, and the exception is injected on guest entry. So all we
need to do in the emulator is enter the guest immediately when an exception
condition is detected.
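A minimal sketch of that idea, assuming a heavily simplified, hypothetical model: flat guest memory where any access at or past `limit` counts as a page fault, and REP MOVSB with DF=0 only. `struct rep_regs`, `struct guest_mem` and `emulate_rep_movsb()` are illustrative names, not KVM code. Because each iteration commits RSI, RDI and RCX before the next one starts, the emulator can stop at the faulting element and hand back exactly the register state the SDM describes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified register state for REP MOVSB with DF=0. */
struct rep_regs {
	uint64_t rsi, rdi, rcx;
	uint64_t rip;      /* address of the REP MOVSB itself */
	uint64_t next_rip; /* address of the following instruction */
};

/* Hypothetical guest memory: any access at or beyond 'limit' is treated
 * as a page fault. */
struct guest_mem {
	uint8_t *bytes;
	uint64_t limit;
};

enum emul_result { EMUL_DONE, EMUL_FAULT };

/* Emulate REP MOVSB, committing RSI/RDI/RCX after every iteration. On a
 * fault, RSI/RDI point at the element that could not be copied, RCX holds
 * the count left after the last successful iteration and RIP still points
 * at the string instruction, so the caller can inject the fault and enter
 * the guest right away. */
static enum emul_result emulate_rep_movsb(struct rep_regs *r,
					  struct guest_mem *m,
					  uint64_t *fault_addr)
{
	while (r->rcx != 0) {
		if (r->rsi >= m->limit || r->rdi >= m->limit) {
			*fault_addr = r->rsi >= m->limit ? r->rsi : r->rdi;
			return EMUL_FAULT;
		}
		m->bytes[r->rdi] = m->bytes[r->rsi];
		r->rsi++;
		r->rdi++;
		r->rcx--;
	}
	r->rip = r->next_rip; /* advance RIP only once the whole REP is done */
	return EMUL_DONE;
}
```

On `EMUL_FAULT` the caller would queue the fault and enter the guest; once the guest's handler returns, re-executing the REP instruction picks up from the saved RSI/RDI/RCX.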

 
 My other concern was just about how your "from time to time" criterion relates
 to the SDM's requirement of not affecting the interrupt response time.
 This is simply a question of how to choose the criterion appropriately.
 
We can reenter the guest immediately if there is a pending interrupt (we can't
do that with ins read-ahead, but that optimization is non-architectural anyway).
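To illustrate why leaving on a pending interrupt does not change the instruction's semantics, here is a sketch under similar simplifying assumptions: REP STOSB with DF=0, and a hypothetical `irq_sim` counter standing in for KVM's real pending-interrupt check (`struct stos_state` and `emulate_rep_stosb()` are likewise illustrative). The check below runs the operation once uninterrupted and once split by a simulated interrupt, and expects both to end in the same state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified state for REP STOSB with DF=0. */
struct stos_state {
	uint64_t rdi, rcx;
	uint8_t al;
	uint64_t rip, next_rip;
};

/* Hypothetical interrupt source: becomes pending after 'pending_after'
 * more checks; a negative value means no interrupt arrives. */
struct irq_sim {
	int pending_after;
};

static bool irq_pending(struct irq_sim *s)
{
	if (s->pending_after < 0)
		return false;
	if (s->pending_after == 0)
		return true;
	s->pending_after--;
	return false;
}

enum stos_exit { STOS_DONE, STOS_EXIT_FOR_IRQ };

/* Check for a pending interrupt before each iteration and leave as soon as
 * one shows up. RIP still points at REP STOSB, so when the guest returns
 * from its handler the instruction resumes from the saved RDI/RCX. */
static enum stos_exit emulate_rep_stosb(struct stos_state *st, uint8_t *mem,
					struct irq_sim *irq)
{
	while (st->rcx != 0) {
		if (irq_pending(irq))
			return STOS_EXIT_FOR_IRQ;
		mem[st->rdi] = st->al;
		st->rdi++;
		st->rcx--;
	}
	st->rip = st->next_rip;
	return STOS_DONE;
}
```

The per-iteration check is the simplest possible criterion; a real implementation might also cap the number of iterations per entry, which this sketch does not model.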

 I know that the current implementation with reentrance is OK.
 The current implementation does not reenter the guest on each iteration for
 string PIO, so currently we have both variants.
 
 I'm sorry, I was confused and thought the current implementation already
 included some of your patches.
 
It's independent of my patches; this is how string PIO has always worked.
Otherwise certain workloads are too slow.

 
 To inject a page fault without reentering the guest, we need to add
 some more hacks to the emulator IIUC.
 
 No, we just need to enter the guest if an exception happens. I see that this is
 handled incorrectly in my current patch series.
 
 I was just not certain whether the following condition (from SDM Vol. 2B) is
 satisfied
 
   The source and destination registers point to the next string elements
to be operated on, the EIP register points to the string instruction,
and the ECX register has the value it held following the last successful
iteration of the instruction.
It is satisfied. Writeback is done on each iteration.

 
 in the emulator's fault handling. I should have read your patch more closely.
 
 Thanks,
   Takuya

--
Gleb.


[PATCH] KVM: Don't spam kernel log when injecting exceptions due to bad cr writes

2010-03-11 Thread Avi Kivity
These are guest-triggerable.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/x86.c |   27 ---
 1 files changed, 0 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 169b1b3..66609f6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -404,8 +404,6 @@ void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 
 #ifdef CONFIG_X86_64
if (cr0  0xUL) {
-   printk(KERN_DEBUG "set_cr0: 0x%lx #GP, reserved bits 0x%lx\n",
-  cr0, kvm_read_cr0(vcpu));
kvm_inject_gp(vcpu, 0);
return;
}
@@ -414,14 +412,11 @@ void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
cr0 &= ~CR0_RESERVED_BITS;
 
if ((cr0 & X86_CR0_NW) && !(cr0 & X86_CR0_CD)) {
-   printk(KERN_DEBUG "set_cr0: #GP, CD == 0 && NW == 1\n");
kvm_inject_gp(vcpu, 0);
return;
}
 
if ((cr0 & X86_CR0_PG) && !(cr0 & X86_CR0_PE)) {
-   printk(KERN_DEBUG "set_cr0: #GP, set PG flag "
-  "and a clear PE flag\n");
kvm_inject_gp(vcpu, 0);
return;
}
@@ -432,15 +427,11 @@ void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
int cs_db, cs_l;
 
if (!is_pae(vcpu)) {
-   printk(KERN_DEBUG "set_cr0: #GP, start paging "
-  "in long mode while PAE is disabled\n");
kvm_inject_gp(vcpu, 0);
return;
}
kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l);
if (cs_l) {
-   printk(KERN_DEBUG "set_cr0: #GP, start paging "
-  "in long mode while CS.L == 1\n");
kvm_inject_gp(vcpu, 0);
return;
 
@@ -448,8 +439,6 @@ void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
} else
 #endif
if (is_pae(vcpu) && !load_pdptrs(vcpu, vcpu->arch.cr3)) {
-   printk(KERN_DEBUG "set_cr0: #GP, pdptrs "
-  "reserved bits\n");
kvm_inject_gp(vcpu, 0);
return;
}
@@ -475,28 +464,23 @@ void kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE;
 
if (cr4 & CR4_RESERVED_BITS) {
-   printk(KERN_DEBUG "set_cr4: #GP, reserved bits\n");
kvm_inject_gp(vcpu, 0);
return;
}
 
if (is_long_mode(vcpu)) {
if (!(cr4 & X86_CR4_PAE)) {
-   printk(KERN_DEBUG "set_cr4: #GP, clearing PAE while "
-  "in long mode\n");
kvm_inject_gp(vcpu, 0);
return;
}
} else if (is_paging(vcpu) && (cr4 & X86_CR4_PAE)
&& ((cr4 ^ old_cr4) & pdptr_bits)
&& !load_pdptrs(vcpu, vcpu->arch.cr3)) {
-   printk(KERN_DEBUG "set_cr4: #GP, pdptrs reserved bits\n");
kvm_inject_gp(vcpu, 0);
return;
}
 
if (cr4 & X86_CR4_VMXE) {
-   printk(KERN_DEBUG "set_cr4: #GP, setting VMXE\n");
kvm_inject_gp(vcpu, 0);
return;
}
@@ -517,21 +501,16 @@ void kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 
if (is_long_mode(vcpu)) {
if (cr3 & CR3_L_MODE_RESERVED_BITS) {
-   printk(KERN_DEBUG "set_cr3: #GP, reserved bits\n");
kvm_inject_gp(vcpu, 0);
return;
}
} else {
if (is_pae(vcpu)) {
if (cr3 & CR3_PAE_RESERVED_BITS) {
-   printk(KERN_DEBUG
-  "set_cr3: #GP, reserved bits\n");
kvm_inject_gp(vcpu, 0);
return;
}
if (is_paging(vcpu) && !load_pdptrs(vcpu, cr3)) {
-   printk(KERN_DEBUG "set_cr3: #GP, pdptrs "
-  "reserved bits\n");
kvm_inject_gp(vcpu, 0);
return;
}
@@ -563,7 +542,6 @@ EXPORT_SYMBOL_GPL(kvm_set_cr3);
 void kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8)
 {
if (cr8 & CR8_RESERVED_BITS) {
-   printk(KERN_DEBUG "set_cr8: #GP, reserved bits 0x%lx\n", cr8);
kvm_inject_gp(vcpu, 0);
return;
}
@@ -619,15 +597,12 @@ static u32 emulated_msrs[] = {
 static void set_efer(struct kvm_vcpu *vcpu, u64 efer)
 {
if (efer  efer_reserved_bits) 
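These messages are guest-triggerable because each check corresponds to an architectural #GP condition on a MOV to CR0, so a guest can produce them at will. A minimal sketch of the first three kvm_set_cr0() checks from the patch as a standalone predicate (`cr0_write_gp()` and the `CR0_*` constants are local, hypothetical names; in the patch the reserved-bit check is under CONFIG_X86_64, and the later PAE/long-mode checks are omitted here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR0 bits, defined locally for this sketch. */
#define CR0_PE (1ULL << 0)
#define CR0_NW (1ULL << 29)
#define CR0_CD (1ULL << 30)
#define CR0_PG (1ULL << 31)

/* Returns true when a guest write of 'cr0' should get #GP(0). */
static bool cr0_write_gp(uint64_t cr0)
{
	if (cr0 >> 32)                          /* bits 63:32 are reserved */
		return true;
	if ((cr0 & CR0_NW) && !(cr0 & CR0_CD))  /* NW=1 requires CD=1 */
		return true;
	if ((cr0 & CR0_PG) && !(cr0 & CR0_PE))  /* paging requires PE */
		return true;
	return false;
}
```

For example, the attempted value 0x88000ec29d58 from the pax report in this thread has bits above 31 set, so it takes the first branch, while 0x80040033 passes all three checks.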

Re: Make QEmu HPET disabled by default for KVM?

2010-03-11 Thread Gleb Natapov
On Thu, Mar 11, 2010 at 10:46:06AM +0200, Avi Kivity wrote:
 On 03/11/2010 10:42 AM, Gleb Natapov wrote:
 On Thu, Mar 11, 2010 at 04:38:48PM +0800, Sheng Yang wrote:
 On Thursday 11 March 2010 16:31:57 Gleb Natapov wrote:
 On Thu, Mar 11, 2010 at 10:28:12AM +0200, Avi Kivity wrote:
 On 03/11/2010 10:23 AM, Sheng Yang wrote:
 I have kept --no-hpet in my setup for
 months...
 Any details about the problems?  HPET is important to some guests.
 It seems the HPET reacts too slowly to satisfy some guests (since it would
 replace the PIT).
 
 Here is the thread last time.
 
 http://thread.gmane.org/gmane.comp.emulators.kvm.devel/44899
 Thanks.  We can address this in three ways: first, adjust the guest
 not to do timing related tests when virtualized (since no matter
 what we do, the tests may fail).  Second, I think we should
 implement userspace ack notifiers (similar to tpr access notifiers
 already present).  Third, we can implement a kernel hpet, which,
 after we solve the zillion bugs it introduces, will also give a nice
 performance improvement for hpet intensive workloads.
 Second will not solve the problem. Presence of ack notifiers will not
 make HPET interrupt arrive faster.
 The slowness may also be due to lost ticks. And with lost ticks, HPET is still
 unusable...
 
 If the problem is due to lost ticks, reinjection may solve it, but only partially.
 What if the IO thread hasn't run even once during the time the vcpu did its clock
 source check? IIRC we sometimes trigger this even with the in-kernel PIT.
 
 That is true.  Reinjection can correct problems in the long term,
 but may fail in the short term.  10 ticks is easily short term in a
 heavily loaded system.
 
 How does it happen with kernel PIT?  I could understand it if we had
 a work item doing the injection, but everything happens either from
 hrtimer context or vcpu context.
 
Do we kick the vcpu out of guest mode when the hrtimer triggers? I don't see us
doing it in __kvm_timer_fn().
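The point quoted above, that reinjection can correct problems in the long term but may fail in the short term, can be illustrated with a hypothetical pending-tick counter (`struct tick_model`, `host_timer_fired()` and `guest_entry()` are illustrative, not KVM's actual PIT code). Ticks that fire while the vcpu is not running pile up and are only handed to the guest later, one per entry, so a clock source check that runs inside that window sees time stall.

```c
#include <assert.h>

/* Hypothetical model of lost-tick reinjection. */
struct tick_model {
	unsigned pending;   /* ticks fired by the host, not yet injected */
	unsigned delivered; /* ticks the guest has actually seen */
};

/* Host timer callback: record the tick even if the vcpu is not running. */
static void host_timer_fired(struct tick_model *t)
{
	t->pending++;
}

/* On each guest entry, inject at most one pending tick. */
static void guest_entry(struct tick_model *t)
{
	if (t->pending) {
		t->pending--;
		t->delivered++;
	}
}
```

In the check below, ten ticks fire while the vcpu is descheduled: over that window the guest sees none of them, and only after ten further entries has it caught up.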

--
Gleb.


Re: [PATCH 02/18] KVM: MMU: Make tdp_enabled a mmu-context parameter

2010-03-11 Thread Joerg Roedel
On Thu, Mar 11, 2010 at 08:47:21AM +0200, Avi Kivity wrote:
 
 tdp is still used in both cases, so that name is confusing.  We
 could call it mmu.direct_map (and set it for real mode?) or
 mmu.virtual_map (with the opposite sense).  Or something.

I like the mmu.direct_map name. It's a good term too; I will change it in
the patch.

Joerg




Re: [PATCH] KVM: Move kvm_exit tracepoint rip reading inside tracepoint

2010-03-11 Thread Takuya Yoshikawa
Avi Kivity wrote:
 diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
 index b75efef..3cf9547 100644
 --- a/arch/x86/kvm/trace.h
 +++ b/arch/x86/kvm/trace.h
 @@ -182,8 +182,8 @@ TRACE_EVENT(kvm_apic,
   * Tracepoint for kvm guest exit:
   */
  TRACE_EVENT(kvm_exit,
 - TP_PROTO(unsigned int exit_reason, unsigned long guest_rip),
 - TP_ARGS(exit_reason, guest_rip),
 + TP_PROTO(unsigned int exit_reason, struct kvm_vcpu *vcpu),

Was the whitespace inserted by accident?

 + TP_ARGS(exit_reason, vcpu),
  
   TP_STRUCT__entry(
   __field(unsigned int,   exit_reason )
 @@ -192,7 +192,7 @@ TRACE_EVENT(kvm_exit,
  
   TP_fast_assign(
   __entry->exit_reason = exit_reason;
 - __entry->guest_rip  = guest_rip;
 + __entry->guest_rip  = kvm_rip_read(vcpu);
   ),
  
   TP_printk("reason %s rip 0x%lx",
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index ae3217d..06108f3 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -3605,7 +3605,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
   u32 exit_reason = vmx->exit_reason;
   u32 vectoring_info = vmx->idt_vectoring_info;
  
 - trace_kvm_exit(exit_reason, kvm_rip_read(vcpu));
 + trace_kvm_exit(exit_reason, vcpu);
  
   /* If guest state is invalid, start emulating */
   if (vmx->emulation_required && emulate_invalid_guest_state)
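One plausible motivation for this change (not spelled out in the thread, so treat it as an assumption) is that arguments to a tracepoint call are evaluated at the call site even when the tracepoint is disabled, whereas code in TP_fast_assign only runs when the event is recorded. A hypothetical model, with `tracing_enabled` standing in for the tracepoint's enabled state and `read_rip()` for kvm_rip_read(), which may need a VMCS read:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; 'rip_reads' counts how often RIP is read. */
static bool tracing_enabled;
static unsigned rip_reads;
static unsigned long last_rip;

struct vcpu_sim {
	unsigned long rip;
};

static unsigned long read_rip(struct vcpu_sim *v)
{
	rip_reads++;
	return v->rip;
}

/* Old shape: the caller evaluates read_rip() to build the argument,
 * before the enabled check ever runs. */
static void trace_exit_eager(unsigned reason, unsigned long rip)
{
	(void)reason;
	if (tracing_enabled)
		last_rip = rip;
}

/* New shape: pass the vcpu and read RIP only when the event is recorded,
 * analogous to moving kvm_rip_read() into TP_fast_assign. */
static void trace_exit_lazy(unsigned reason, struct vcpu_sim *v)
{
	(void)reason;
	if (tracing_enabled)
		last_rip = read_rip(v);
}
```

With tracing off, the eager form still pays for one RIP read per exit, while the lazy form pays nothing.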



[PATCH] KVM: Trace exception injection

2010-03-11 Thread Avi Kivity
Often an exception can help point out where things start to go wrong.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/trace.h |   32 
 arch/x86/kvm/x86.c   |3 +++
 2 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index d10b359..32c912c 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -219,6 +219,38 @@ TRACE_EVENT(kvm_inj_virq,
TP_printk("irq %u", __entry->irq)
 );
 
+#define EXS(x) { x##_VECTOR, "#" #x }
+
+#define kvm_trace_sym_exc  \
+   EXS(DE), EXS(DB), EXS(BP), EXS(OF), EXS(BR), EXS(UD), EXS(NM),  \
+   EXS(DF), EXS(TS), EXS(NP), EXS(SS), EXS(GP), EXS(PF),   \
+   EXS(MF), EXS(MC)
+
+/*
+ * Tracepoint for kvm interrupt injection:
+ */
+TRACE_EVENT(kvm_inj_exception,
+   TP_PROTO(unsigned exception, bool has_error, unsigned error_code),
+   TP_ARGS(exception, has_error, error_code),
+
+   TP_STRUCT__entry(
+   __field(u8, exception   )
+   __field(u8, has_error   )
+   __field(u32,error_code  )
+   ),
+
+   TP_fast_assign(
+   __entry->exception  = exception;
+   __entry->has_error  = has_error;
+   __entry->error_code = error_code;
+   ),
+
+   TP_printk("%s (0x%x)",
+ __print_symbolic(__entry->exception, kvm_trace_sym_exc),
+ /* FIXME: don't print error_code if not present */
+ __entry->has_error ? __entry->error_code : 0)
+);
+
 /*
  * Tracepoint for page fault.
  */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 66609f6..bcf52d1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4231,6 +4231,9 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
 {
/* try to reinject previous events if any */
if (vcpu->arch.exception.pending) {
+   trace_kvm_inj_exception(vcpu->arch.exception.nr,
+   vcpu->arch.exception.has_error_code,
+   vcpu->arch.exception.error_code);
kvm_x86_ops->queue_exception(vcpu, vcpu->arch.exception.nr,
  vcpu->arch.exception.has_error_code,
  vcpu->arch.exception.error_code);
-- 
1.7.0.2
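The symbol table in this patch relies on two preprocessor operators: `##` pastes the argument onto `_VECTOR`, and `#x` stringizes it, after which adjacent string literals are concatenated. In the standalone sketch below, EXS(GP) expands to `{ GP_VECTOR, "#GP" }`. The vector values are the architectural ones, redefined locally, and `struct exc_sym` and `exc_name()` are hypothetical; `exc_name()` is only a rough analogue of `__print_symbolic()`, which falls back to printing the raw value rather than returning NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Architectural exception vectors, defined locally for this sketch. */
#define DE_VECTOR 0
#define DB_VECTOR 1
#define BP_VECTOR 3
#define OF_VECTOR 4
#define BR_VECTOR 5
#define UD_VECTOR 6
#define NM_VECTOR 7
#define DF_VECTOR 8
#define TS_VECTOR 10
#define NP_VECTOR 11
#define SS_VECTOR 12
#define GP_VECTOR 13
#define PF_VECTOR 14
#define MF_VECTOR 16
#define MC_VECTOR 18

struct exc_sym {
	unsigned vector;
	const char *name;
};

/* x##_VECTOR names the vector macro; "#" #x becomes e.g. "#" "GP". */
#define EXS(x) { x##_VECTOR, "#" #x }

static const struct exc_sym exc_table[] = {
	EXS(DE), EXS(DB), EXS(BP), EXS(OF), EXS(BR), EXS(UD), EXS(NM),
	EXS(DF), EXS(TS), EXS(NP), EXS(SS), EXS(GP), EXS(PF),
	EXS(MF), EXS(MC),
};

/* Map a vector to its mnemonic, or NULL if it is not in the table. */
static const char *exc_name(unsigned vector)
{
	for (size_t i = 0; i < sizeof(exc_table) / sizeof(exc_table[0]); i++)
		if (exc_table[i].vector == vector)
			return exc_table[i].name;
	return NULL;
}
```

NMI (vector 2) is deliberately absent from the patch's list, so it has no symbolic name here either.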



Re: [PATCH] KVM: Move kvm_exit tracepoint rip reading inside tracepoint

2010-03-11 Thread Avi Kivity
On 03/11/2010 01:03 PM, Takuya Yoshikawa wrote:
 Avi Kivity wrote:
   
 diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
 index b75efef..3cf9547 100644
 --- a/arch/x86/kvm/trace.h
 +++ b/arch/x86/kvm/trace.h
 @@ -182,8 +182,8 @@ TRACE_EVENT(kvm_apic,
   * Tracepoint for kvm guest exit:
   */
  TRACE_EVENT(kvm_exit,
 -TP_PROTO(unsigned int exit_reason, unsigned long guest_rip),
 -TP_ARGS(exit_reason, guest_rip),
 +TP_PROTO(unsigned int exit_reason, struct kvm_vcpu *vcpu),
 
 Was the whitespace inserted by accident?
   

Yeah, already fixed locally.

-- 
error compiling committee.c: too many arguments to function



Re: guest patched with pax causes set_cr0: 0xffff88000[...] #GP, reserved bits 0x8004003? flood on host

2010-03-11 Thread Antoine Martin

On 03/11/2010 04:31 PM, pagee...@freemail.hu wrote:

On 11 Mar 2010 at 8:44, Avi Kivity wrote:

   

On 03/10/2010 06:17 PM, Antoine Martin wrote:
 

Hi,

I've updated my host kernel headers to 2.6.33, rebuilt glibc (and the
base system), rebuilt kvm.
... and now I get hundreds of those in dmesg on the host when I start
a guest kernel that worked fine before. (2.6.33 + pax patch v5)
  set_cr0: 0x88000ec29d58 #GP, reserved bits 0x80040033
  set_cr0: 0x88000f3cdb38 #GP, reserved bits 0x8004003b
  set_cr0: 0x88000f3dbc88 #GP, reserved bits 0x80040033
  set_cr0: 0x88000f83b958 #GP, reserved bits 0x8004003b
   

The guest is clearly confused.  Can you bisect kvm to find out what
introduced this problem?
 

OK, will try to find the time.

the guest is calling pax_{open,close}_kernel that flip cr0.wp off/on,
respectively. Antoine, can you decode some of those rip values please
(or better, send me the corresponding vmlinux and all logs)

I've dumped everything here (.config, vmlinuz and log):
http://users.nagafix.co.uk/~antoine/KVM/

Antoine


Re: [PATCH] KVM: Trace exception injection

2010-03-11 Thread Gleb Natapov
On Thu, Mar 11, 2010 at 01:03:12PM +0200, Avi Kivity wrote:
 Often an exception can help point out where things start to go wrong.
 
Adding guest rip where exception happened will be useful too.


--
Gleb.


Re: guest patched with pax causes set_cr0: 0xffff88000[...] #GP, reserved bits 0x8004003? flood on host

2010-03-11 Thread pageexec
On 11 Mar 2010 at 8:44, Avi Kivity wrote:

 On 03/10/2010 06:17 PM, Antoine Martin wrote:
  Hi,
 
  I've updated my host kernel headers to 2.6.33, rebuilt glibc (and the 
  base system), rebuilt kvm.
  ... and now I get hundreds of those in dmesg on the host when I start 
  a guest kernel that worked fine before. (2.6.33 + pax patch v5)
   set_cr0: 0x88000ec29d58 #GP, reserved bits 0x80040033
   set_cr0: 0x88000f3cdb38 #GP, reserved bits 0x8004003b
   set_cr0: 0x88000f3dbc88 #GP, reserved bits 0x80040033
   set_cr0: 0x88000f83b958 #GP, reserved bits 0x8004003b
 
 The guest is clearly confused.  Can you bisect kvm to find out what 
 introduced this problem?

the guest is calling pax_{open,close}_kernel that flip cr0.wp off/on,
respectively. Antoine, can you decode some of those rip values please
(or better, send me the corresponding vmlinux and all logs)?



Re: Ideas wiki for GSoC 2010

2010-03-11 Thread Alexander Graf

On 11.03.2010, at 10:43, Paolo Bonzini wrote:

 On 03/11/2010 08:55 AM, Avi Kivity wrote:
 On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
 
 2. Do we have kvm-specific projects? Can they be part of the QEMU project
 or do we need a different mentoring organization for it?
 
 Something really interesting is kvm-assisted tcg. I'm afraid it's a bit
 too complicated to GSoC.
 
 I suppose the GSoC ideas wiki page will migrate to a QEMU ideas page in some 
 time, so it's good to have ideas written down.
 
 Also, the selection of projects will be done by members of the community, by 
 grading the student's submissions.  The bar would be placed higher for 
 someone who picks a complicated project.

The list is also still missing a lot of potential mentors for the listed ideas. 
Let me propose some here :)

== Shared memory transport between guest(s) and host ==

Sounds like Avi would be a good fit. I'm pretty unknowledgeable when it comes 
to shm.

== Pass through file systems (9p, CIFS) ==

I dislike CIFS now that we use it regularly. It just doesn't work for Linux to 
Linux communication. But as far as 9P is concerned, do you need help there 
Anthony? If so, would you take over the mentoring?

== Add more sophisticated encodings to VNC server ==

I could probably help out being a secondary mentor here, but Anthony would be a 
good fit as primary, no? I guess Kraxel could help out too.

== Write a C QMP library based on QEMU JSON and QMP code ==

Suggested by Anthony, mentored by Anthony? :) Possible other candidates are 
Luiz and Kraxel I guess? I haven't really tracked QMP that much.

== Add support for guest copy/paste ==

This should probably be folded into the above VNC server improvements. By 
itself it's just too little of a task.

== Device state visualization ==

Jan, Kraxel? Maybe too small for a task?

== Upstreaming some of the Android emulator bits ==

Jan, Anthony?


If you read the suggestion and just think to yourself "well yes, I think I
could do it", then put your name in the wiki :).

Alex


Re: [PATCH] KVM: Trace exception injection

2010-03-11 Thread Avi Kivity

On 03/11/2010 01:09 PM, Gleb Natapov wrote:

On Thu, Mar 11, 2010 at 01:03:12PM +0200, Avi Kivity wrote:
   

Often an exception can help point out where things start to go wrong.

 

Adding guest rip where exception happened will be useful too.
   


You get that from the previous kvm_exit trace.

--
error compiling committee.c: too many arguments to function



Re: Ideas wiki for GSoC 2010

2010-03-11 Thread Avi Kivity

On 03/11/2010 01:25 PM, Alexander Graf wrote:

The list is also still missing a lot of potential mentors for the listed ideas. 
Let me propose some here :)

== Shared memory transport between guest(s) and host ==

Sounds like Avi would be a good fit. I'm pretty unknowledgeable when it comes 
to shm.
   


Not sure what this is.


--
error compiling committee.c: too many arguments to function



Re: Make QEmu HPET disabled by default for KVM?

2010-03-11 Thread Avi Kivity

On 03/11/2010 12:23 PM, Gleb Natapov wrote:


If the problem is due to lost ticks, reinjection may solve it, but only partially.
What if the IO thread hasn't run even once during the time the vcpu did its clock
source check? IIRC we sometimes trigger this even with the in-kernel PIT.
   

That is true.  Reinjection can correct problems in the long term,
but may fail in the short term.  10 ticks is easily short term in a
heavily loaded system.

How does it happen with kernel PIT?  I could understand it if we had
a work item doing the injection, but everything happens either from
hrtimer context or vcpu context.

 

Do we kick the vcpu out of guest mode when the hrtimer triggers? I don't see us
doing it in __kvm_timer_fn().

   


We're always running on the same cpu as vcpu 0, so no need.

--
error compiling committee.c: too many arguments to function



Re: Ideas wiki for GSoC 2010

2010-03-11 Thread Alexander Graf

On 11.03.2010, at 12:54, Avi Kivity wrote:

 On 03/11/2010 01:25 PM, Alexander Graf wrote:
 The list is also still missing a lot of potential mentors for the listed 
 ideas. Let me propose some here :)
 
 == Shared memory transport between guest(s) and host ==
 
 Sounds like Avi would be a good fit. I'm pretty unknowledgeable when it 
 comes to shm.
   
 
 Not sure what this is.

Cam's shared memory device.

Alex


Re: Make QEmu HPET disabled by default for KVM?

2010-03-11 Thread Avi Kivity

On 03/11/2010 01:56 PM, Avi Kivity wrote:

On 03/11/2010 12:23 PM, Gleb Natapov wrote:


If the problem is due to lost ticks, reinjection may solve it, but
only partially.

What if the IO thread hasn't run even once during the time the vcpu did its clock
source check? IIRC we sometimes trigger this even with the in-kernel PIT.

That is true.  Reinjection can correct problems in the long term,
but may fail in the short term.  10 ticks is easily short term in a
heavily loaded system.

How does it happen with kernel PIT?  I could understand it if we had
a work item doing the injection, but everything happens either from
hrtimer context or vcpu context.

Do we kick the vcpu out of guest mode when the hrtimer triggers? I don't see
us doing it in __kvm_timer_fn().



We're always running on the same cpu as vcpu 0, so no need.



Would be better to do it, though, in case we have migration races.
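A hypothetical sketch of the kick suggested here, guarding against the vcpu having migrated to another CPU while the hrtimer stays behind (`struct vcpu_sim`, `timer_fn()` and the `kicks` counter are illustrative; real code would send an IPI rather than bump a counter). The assumption behind "no need" in the same-CPU case is that the host timer interrupt itself has already forced the vcpu out of guest mode, so it will see the pending timer on its next entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical vcpu model. */
struct vcpu_sim {
	int cpu;            /* CPU the vcpu is running on */
	bool in_guest;      /* currently executing guest code */
	bool timer_pending; /* timer interrupt waiting to be injected */
	unsigned kicks;     /* IPIs we would have sent */
};

/* Hypothetical timer callback running on 'this_cpu'. */
static void timer_fn(struct vcpu_sim *v, int this_cpu)
{
	v->timer_pending = true;
	/* Same CPU: the host interrupt already caused an exit, so the
	 * pending flag is seen on re-entry. Other CPU: kick, or the
	 * injection waits for the vcpu's next unrelated exit. */
	if (v->in_guest && v->cpu != this_cpu)
		v->kicks++;
}
```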

--
error compiling committee.c: too many arguments to function



Re: Ideas wiki for GSoC 2010

2010-03-11 Thread Avi Kivity

On 03/11/2010 01:56 PM, Alexander Graf wrote:

On 11.03.2010, at 12:54, Avi Kivity wrote:

   

On 03/11/2010 01:25 PM, Alexander Graf wrote:
 

The list is also still missing a lot of potential mentors for the listed ideas. 
Let me propose some here :)

== Shared memory transport between guest(s) and host ==

Sounds like Avi would be a good fit. I'm pretty unknowledgeable when it comes 
to shm.

   

Not sure what this is.
 

Cam's shared memory device.
   


That's plain shared memory among guests (though the host could also 
participate).  "Transport" evokes something like virtio rings.


I could mentor it, though I prefer something in kvm, and it looks close 
to completion.


--
error compiling committee.c: too many arguments to function



Re: Ideas wiki for GSoC 2010

2010-03-11 Thread Alexander Graf

On 11.03.2010, at 12:58, Avi Kivity wrote:

 On 03/11/2010 01:56 PM, Alexander Graf wrote:
 On 11.03.2010, at 12:54, Avi Kivity wrote:
 
   
 On 03/11/2010 01:25 PM, Alexander Graf wrote:
 
 The list is also still missing a lot of potential mentors for the listed 
 ideas. Let me propose some here :)
 
 == Shared memory transport between guest(s) and host ==
 
 Sounds like Avi would be a good fit. I'm pretty unknowledgeable when it 
 comes to shm.
 
   
 Not sure what this is.
 
 Cam's shared memory device.
   
 
 That's plain shared memory among guests (though the host could also 
 participate).  "Transport" evokes something like virtio rings.
 
 I could mentor it, though I prefer something in kvm, and it looks close to 
 completion.

I agree. Take it off the list then :-).

Another idea I'd have would be upstream integration (and cleanup) of the ARM 
KVM port: https://wiki.ncl.cs.columbia.edu/wiki/index.php/AndroidVirt:MainPage

Alex


Re: Ideas wiki for GSoC 2010

2010-03-11 Thread Paolo Bonzini

On 03/11/2010 12:25 PM, Alexander Graf wrote:

== Write a C QMP library based on QEMU JSON and QMP code ==

Suggested by Anthony, mentored by Anthony?:)  Possible other
candidates are Luiz and Kraxel I guess? I haven't really tracked QMP
that much.


If you guys are okay with this, I think I could mentor since I followed 
the design of QMP quite closely (and this is the only one that I think I 
could do a decent job with).


BTW, it worked out much better for me in the past when the student and 
mentor were in a similar time zone.


Paolo


Re: guest patched with pax causes set_cr0: 0xffff88000[...] #GP, reserved bits 0x8004003? flood on host

2010-03-11 Thread pageexec
On 11 Mar 2010 at 8:44, Avi Kivity wrote:

 On 03/10/2010 06:17 PM, Antoine Martin wrote:
  Hi,
 
  I've updated my host kernel headers to 2.6.33, rebuilt glibc (and the 
  base system), rebuilt kvm.
  ... and now I get hundreds of those in dmesg on the host when I start 
  a guest kernel that worked fine before. (2.6.33 + pax patch v5)
   set_cr0: 0x88000ec29d58 #GP, reserved bits 0x80040033
   set_cr0: 0x88000f3cdb38 #GP, reserved bits 0x8004003b
   set_cr0: 0x88000f3dbc88 #GP, reserved bits 0x80040033
   set_cr0: 0x88000f83b958 #GP, reserved bits 0x8004003b
 
 The guest is clearly confused.  Can you bisect kvm to find out what 
 introduced this problem?

i screwed up the paravirt register clobbers, don't worry about it.
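For reference when reading the log lines above: going by the printk that the "Don't spam kernel log" patch removes, the first value is the attempted CR0 and the second ("reserved bits") is the current CR0 from kvm_read_cr0(). A small decoder for architectural CR0 flags (the table and `decode_cr0()` are a hypothetical helper) shows that both current values have WP (bit 16) clear, which appears consistent with the guest being between pax_open_kernel() and pax_close_kernel(). The attempted values have bits above 31 set and look more like kernel addresses than CR0 values, in line with the clobbered register explanation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Architectural CR0 flag bits. */
static const struct { uint64_t bit; const char *name; } cr0_flags[] = {
	{ 1ULL << 0,  "PE" }, { 1ULL << 1,  "MP" }, { 1ULL << 2,  "EM" },
	{ 1ULL << 3,  "TS" }, { 1ULL << 4,  "ET" }, { 1ULL << 5,  "NE" },
	{ 1ULL << 16, "WP" }, { 1ULL << 18, "AM" }, { 1ULL << 29, "NW" },
	{ 1ULL << 30, "CD" }, { 1ULL << 31, "PG" },
};

/* Write the set flags of 'cr0' into 'out' as a space-separated list. */
static void decode_cr0(uint64_t cr0, char *out, size_t len)
{
	out[0] = '\0';
	for (size_t i = 0; i < sizeof(cr0_flags) / sizeof(cr0_flags[0]); i++) {
		if (cr0 & cr0_flags[i].bit) {
			size_t used = strlen(out);
			snprintf(out + used, len - used, "%s%s",
				 used ? " " : "", cr0_flags[i].name);
		}
	}
}
```

Decoding 0x80040033 gives "PE MP ET NE AM PG", and 0x8004003b additionally has TS set; neither has WP.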



Re: Ideas wiki for GSoC 2010

2010-03-11 Thread Avi Kivity

On 03/11/2010 02:03 PM, Alexander Graf wrote:

Another idea I'd have would be upstream integration (and cleanup) of the ARM 
KVM port: https://wiki.ncl.cs.columbia.edu/wiki/index.php/AndroidVirt:MainPage
   


Huh, didn't even know this thing existed.  Definitely something to merge.

--
error compiling committee.c: too many arguments to function



Windows Driver for -vga std

2010-03-11 Thread erik . rull
Hi all,

Using the default VGA settings, Windows XP detects an unknown VGA device,
but everything works and the display settings are OK. How can I set up XP
to detect this virtual graphics board correctly? I just want to keep
using this setting, but without complaints in the System/Hardware settings.

Best regards,

Erik




Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-11 Thread Paul Brook
 On 03/10/2010 07:41 PM, Paul Brook wrote:
  You're much better off using a bulk-data transfer API that relaxes
  coherency requirements.  IOW, shared memory doesn't make sense for TCG
 
  Rather, tcg doesn't make sense for shared memory smp.  But we knew that
  already.
 
  I think TCG SMP is a hard but soluble problem, especially when you're
  running guests used to coping with NUMA.
 
 Do you mean by using a per-cpu tlb?  These kind of solutions are
 generally slow, but tcg's slowness may mask this out.

Yes.

  TCG interacting with third parties via shared memory is probably never
  going to make sense.
 
 The third party in this case is qemu.

Maybe. But it's a different instance of qemu, and once this feature exists I 
bet people will use it for other things.

Paul


Re: Ideas wiki for GSoC 2010

2010-03-11 Thread Avi Kivity

On 03/10/2010 11:30 PM, Luiz Capitulino wrote:

2. Do we have kvm-specific projects? Can they be part of the QEMU project
or do we need a different mentoring organization for it?
   


Complete big real mode emulation.  I'll add this.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: Trace exception injection

2010-03-11 Thread Gleb Natapov
On Thu, Mar 11, 2010 at 01:51:30PM +0200, Avi Kivity wrote:
 On 03/11/2010 01:09 PM, Gleb Natapov wrote:
 On Thu, Mar 11, 2010 at 01:03:12PM +0200, Avi Kivity wrote:
 Often an exception can help point out where things start to go wrong.
 
 Adding guest rip where exception happened will be useful too.
 
 You get that from the previous kvm_exit trace.
 
Not in the case of emulation ;)

--
Gleb.


Re: [PATCH] KVM: Trace exception injection

2010-03-11 Thread Avi Kivity

On 03/11/2010 02:31 PM, Gleb Natapov wrote:

On Thu, Mar 11, 2010 at 01:51:30PM +0200, Avi Kivity wrote:
   

On 03/11/2010 01:09 PM, Gleb Natapov wrote:
 

On Thu, Mar 11, 2010 at 01:03:12PM +0200, Avi Kivity wrote:
   

Often an exception can help point out where things start to go wrong.

 

Adding guest rip where exception happened will be useful too.
   

You get that from the previous kvm_exit trace.

 

Not in the case of emulation ;)
   


Then we need an emulator trace.  I have it in a branch somewhere, will 
reactivate it after your stuff goes in.


--
error compiling committee.c: too many arguments to function



Re: [PATCH] Inter-VM shared memory PCI device

2010-03-11 Thread Arnd Bergmann
On Thursday 11 March 2010, Avi Kivity wrote:
  A totally different option that avoids this whole problem would
  be to separate the signalling from the shared memory, making the
  PCI shared memory device a trivial device with a single memory BAR,
  and using a higher-level concept like a virtio-based
  serial line for the actual signalling.
 
 
 That would be much slower.  The current scheme allows for an 
 ioeventfd/irqfd short circuit which allows one guest to interrupt 
 another without involving their qemus at all.

Yes, the serial line approach would be much slower, but my point
was that we can do signaling over something else, which could
well be something building on irqfd.
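To make the split concrete, here is a minimal user-space sketch, assuming Linux. A plain MAP_SHARED page stands in for the trivial single-BAR shared memory device, and an eventfd (the primitive that ioeventfd and irqfd are built on) serves as a separate doorbell. Two processes play the two guests. `struct shared_page` and `doorbell_roundtrip()` are hypothetical names used only for illustration; they are not part of any proposed device.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical layout of the shared region: only a payload buffer,
 * no doorbell registers, mirroring the "single memory BAR" idea. */
struct shared_page {
    char data[64];
};

/* Child = sender, parent = receiver.  The sender fills the shared page,
 * then rings the eventfd; the receiver blocks in read() on the eventfd
 * and only then copies the payload out.  Returns 0 on success. */
static int doorbell_roundtrip(const char *msg, char *out, size_t outlen)
{
    assert(msg != NULL && out != NULL && outlen > 0);

    struct shared_page *page = mmap(NULL, sizeof *page,
                                    PROT_READ | PROT_WRITE,
                                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    int efd = eventfd(0, 0);            /* counter 0: nothing signalled */
    if (efd < 0) {
        munmap(page, sizeof *page);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(efd);
        munmap(page, sizeof *page);
        return -1;
    }
    if (pid == 0) {
        /* Sender: data first, doorbell second. */
        strncpy(page->data, msg, sizeof page->data - 1);
        page->data[sizeof page->data - 1] = '\0';
        uint64_t one = 1;
        ssize_t n = write(efd, &one, sizeof one);
        _exit(n == (ssize_t)sizeof one ? 0 : 1);
    }

    /* Receiver: read() blocks until the counter is nonzero. */
    uint64_t count = 0;
    ssize_t n = read(efd, &count, sizeof count);
    int status = 0;
    waitpid(pid, &status, 0);

    int ok = n == (ssize_t)sizeof count && count >= 1 &&
             WIFEXITED(status) && WEXITSTATUS(status) == 0;
    if (ok) {
        strncpy(out, page->data, outlen - 1);
        out[outlen - 1] = '\0';
    }
    close(efd);
    munmap(page, sizeof *page);
    return ok ? 0 : -1;
}
```

In the device under discussion, the sender's doorbell write would be caught by an ioeventfd and the same eventfd would feed an irqfd on the receiving guest. Roughly speaking, that is what lets one guest interrupt another without either qemu waking up.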

Arnd


Re: Ideas wiki for GSoC 2010

2010-03-11 Thread Luiz Capitulino
On Thu, 11 Mar 2010 10:43:09 +0100
Paolo Bonzini pbonz...@redhat.com wrote:

 On 03/11/2010 08:55 AM, Avi Kivity wrote:
  On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
 
  2. Do we have kvm-specific projects? Can they be part of the QEMU project
  or do we need a different mentoring organization for it?
 
  Something really interesting is kvm-assisted tcg. I'm afraid it's a bit
  too complicated for GSoC.
 
 I suppose the GSoC ideas wiki page will migrate to a QEMU ideas page at
 some point, so it's good to have ideas written down.
 
 Also, the selection of projects will be done by members of the
 community, by grading the students' submissions.  The bar would be
 placed higher for someone who picks a complicated project.

 Exactly, we also have a 'skill level' tag; setting it to high should
help. And note that we can have anything from grad students to PhD ones.


Re: Ideas wiki for GSoC 2010

2010-03-11 Thread Alexander Graf

On 11.03.2010, at 13:59, Luiz Capitulino wrote:

 On Thu, 11 Mar 2010 10:43:09 +0100
 Paolo Bonzini pbonz...@redhat.com wrote:
 
 On 03/11/2010 08:55 AM, Avi Kivity wrote:
 On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
 
 2. Do we have kvm-specific projects? Can they be part of the QEMU project
 or do we need a different mentoring organization for it?
 
 Something really interesting is kvm-assisted tcg. I'm afraid it's a bit
 too complicated for GSoC.
 
 I suppose the GSoC ideas wiki page will migrate to a QEMU ideas page at
 some point, so it's good to have ideas written down.
 
 Also, the selection of projects will be done by members of the
 community, by grading the students' submissions.  The bar would be
 placed higher for someone who picks a complicated project.
 
 Exactly, we also have a 'skill level' tag; setting it to high should
 help. And note that we can have anything from grad students to PhD ones.

I don't think we should draw a correlation between skill level and degree. I
myself only have a Bachelor's degree :-).


Alex


Re: [PATCH] Inter-VM shared memory PCI device

2010-03-11 Thread Avi Kivity

On 03/11/2010 02:57 PM, Arnd Bergmann wrote:

On Thursday 11 March 2010, Avi Kivity wrote:
   

A totally different option that avoids this whole problem would
be to separate the signalling from the shared memory, making the
PCI shared memory device a trivial device with a single memory BAR,
and using a higher-level concept like a virtio-based
serial line for the actual signalling.

   

That would be much slower.  The current scheme allows for an
ioeventfd/irqfd short circuit which allows one guest to interrupt
another without involving their qemus at all.
 

Yes, the serial line approach would be much slower, but my point
was that we can do signaling over something else, which could
well be something building on irqfd.
   


Well, we could, but it seems to make things more complicated?  A card 
with shared memory, and another card with an interrupt interconnect?


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] Ideas wiki for GSoC 2010

2010-03-11 Thread Lucas Meneghel Rodrigues
On Wed, 2010-03-10 at 18:30 -0300, Luiz Capitulino wrote:
 Hi there,
 
  Our wiki page for the Summer of Code 2010 is doing quite well:
 
 http://wiki.qemu.org/Google_Summer_of_Code_2010

Just to let you guys know that I'm going to give a talk at the local
university (Unicamp) about kvm autotest, where I will spread the word about
the qemu and kvm Summer of Code applications and encourage the
students to apply for qemu and kvm.

The university placed 2nd overall in the number of student proposals
accepted in GSoC over the last couple of years, with an excellent
completion rate, so I believe some good work could come out of
it.

  Now the most important things are:
 
 1. Get mentors assigned to projects. Just put your name and email in the
right field. It's ok and even desirable to have two mentors per project,
but please remember that mentoring is serious work, more info here:
 
http://code.google.com/p/google-summer-of-code/wiki/AdviceforMentors
 
http://gsoc-wiki.osuosl.org/index.php/Main_Page
 
 2. Do we have kvm-specific projects? Can they be part of the QEMU project
or do we need a different mentoring organization for it?
 
 3. Fill in the missing information for the suggested project (description,
skill level, languages, etc)
 
  I will complete our application tomorrow or on Friday.
 
 PS: I'm CC'ing everyone who suggested projects there, except one or two
 whose email addresses I couldn't find.
 
 
 




Re: Ideas wiki for GSoC 2010

2010-03-11 Thread Luiz Capitulino
On Thu, 11 Mar 2010 12:25:24 +0100
Alexander Graf ag...@suse.de wrote:

 == Write a C QMP library based on QEMU JSON and QMP code ==
 
 Suggested by Anthony, mentored by Anthony? :) Possible other candidates are 
 Luiz and Kraxel I guess? I haven't really tracked QMP that much.

 I didn't put myself forward as a mentor because Anthony has a better idea
of the public API.

 But I can certainly help with the implementation. I have two or three more
QMP projects to suggest, btw.

 == Add support for guest copy/paste ==
 
 This should probably be folded into the above VNC server improvements. By 
 itself it's just too little of a task.
 
 == Device state visualization ==
 
 Jan, Kraxel? Maybe too small for a task?

 I think whether a task is small or not also depends on the student,
although of course we should not come up with a project that can easily be
done in two weeks.

 On the other hand, 'not that difficult' tasks can be excellent projects
for those really new to open source and serious development. You know, when
you're a beginner you spend quite a lot of time reading code and trying
things out (and there's nothing wrong with that).

 So, for this kind of project the mentor should just take extra care to
choose a student who is really going to learn a lot from it.


Re: Ideas wiki for GSoC 2010

2010-03-11 Thread Luiz Capitulino
On Thu, 11 Mar 2010 13:09:37 +0100
Paolo Bonzini pbonz...@redhat.com wrote:

 On 03/11/2010 12:25 PM, Alexander Graf wrote:
  == Write a C QMP library based on QEMU JSON and QMP code ==
 
  Suggested by Anthony, mentored by Anthony?:)  Possible other
  candidates are Luiz and Kraxel I guess? I haven't really tracked QMP
  that much.
 
 If you guys are okay with this, I think I could mentor since I followed 
 the design of QMP quite closely (and this is the only one that I think I 
 could do a decent job with).

 Sure.

 
 BTW, it worked out much better for me in the past when the student and 
 mentor were in a similar time zone.
 
 Paolo



Re: Ideas wiki for GSoC 2010

2010-03-11 Thread Luiz Capitulino
On Thu, 11 Mar 2010 14:00:46 +0100
Alexander Graf ag...@suse.de wrote:

 
 On 11.03.2010, at 13:59, Luiz Capitulino wrote:
 
  On Thu, 11 Mar 2010 10:43:09 +0100
  Paolo Bonzini pbonz...@redhat.com wrote:
  
  On 03/11/2010 08:55 AM, Avi Kivity wrote:
  On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
  
  2. Do we have kvm-specific projects? Can they be part of the QEMU project
  or do we need a different mentoring organization for it?
  
  Something really interesting is kvm-assisted tcg. I'm afraid it's a bit
  too complicated for GSoC.
  
  I suppose the GSoC ideas wiki page will migrate to a QEMU ideas page at
  some point, so it's good to have ideas written down.
  
  Also, the selection of projects will be done by members of the
  community, by grading the students' submissions.  The bar would be
  placed higher for someone who picks a complicated project.
  
  Exactly, we also have a 'skill level' tag; setting it to high should
  help. And note that we can have anything from grad students to PhD ones.
 
 I don't think we should draw a correlation between skill level and degree.
 I myself only have a Bachelor's degree :-).

 Absolutely, my bad :)


Re: how to tweak kernel to get the best out of kvm?

2010-03-11 Thread Harald Dunkel
Hi Avi,

I forgot to include some important syslog lines from the
host system. See attachment.

On 03/10/10 14:15, Avi Kivity wrote:
 
 You have tons of iowait time, indicating an I/O bottleneck.
 

Is this disk I/O or network I/O? The rsync session puts a
high load on both, but I do not see how a high disk or block I/O
load could make the virtual hosts unresponsive, as shown by the
host's syslog.


 What filesystem are you using for the host?  Are you using qcow2 or raw
 access?  What's the qemu command line?
 

It is ext3 and qcow2. Currently I am testing with reiserfs
on the host system. System performance seems to be worse
than with ext3.

Here is the kvm command line (as generated by libvirt):

/usr/bin/kvm -S -M pc-0.11 -enable-kvm -m 1024 -smp 1 -name test0.0 \
-uuid 74e71149-4baf-3af0-9c99-f4e50273296f \
-monitor unix:/var/lib/libvirt/qemu/test0.0.monitor,server,nowait \
-boot c -drive if=ide,media=cdrom,bus=1,unit=0 \
-drive file=/export/storage/test0.0.img,if=virtio,boot=on \
-net nic,macaddr=00:16:36:94:7e:f3,vlan=0,model=virtio,name=net0 \
-net tap,fd=60,vlan=0,name=hostnet0 -serial pty -parallel none \
-usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -balloon virtio

  
 How many virtual machines would you assume I could run on a
 host with 64 GByte RAM, 2 quad cores, a bonding NIC with
 4*1Gbit/sec and a hardware RAID? Each vhost is supposed to
 get 4 GByte RAM and 1 CPU.

 
 15 guests should fit comfortably, more with ksm running if the workloads
 are similar, or if you use ballooning.
 

15 vhosts would be nice. ksm is in the kernel, but not in my qemu-kvm
(yet).
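Avi's estimate can be sanity-checked with simple arithmetic. The sketch below assumes, purely for illustration, that about one guest's worth of RAM (4 GByte) stays reserved for the host. `guests_that_fit()` is a hypothetical helper, and KSM or ballooning would raise the result.

```c
#include <assert.h>

/* Number of guests with guest_gb of RAM each that fit into host_gb once
 * reserve_gb is kept for the host.  Memory is not overcommitted here,
 * i.e. no KSM or ballooning is assumed. */
static unsigned guests_that_fit(unsigned host_gb, unsigned reserve_gb,
                                unsigned guest_gb)
{
    assert(guest_gb > 0);
    if (reserve_gb >= host_gb)
        return 0;
    return (host_gb - reserve_gb) / guest_gb;
}
```

With 64 GByte, a 4 GByte reserve and 4 GByte guests this gives (64 - 4) / 4 = 15. Note also that 15 single-vCPU guests on 2 quad cores (8 physical cores) overcommit the CPUs somewhat.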

 
 Here the problem is likely the host filesystem and/or I/O scheduler.
 
 The optimal layout is placing guest disks in LVM volumes, and accessing
 them with -drive file=...,cache=none.  However, file-based access should
 also work.
 

I will try LVM tomorrow, when the test with reiserfs is completed.


Many thanx

Harri


syslog.gz
Description: application/gzip


Re: [PATCH] Inter-VM shared memory PCI device

2010-03-11 Thread Arnd Bergmann
On Thursday 11 March 2010, Avi Kivity wrote:
  That would be much slower.  The current scheme allows for an
  ioeventfd/irqfd short circuit which allows one guest to interrupt
  another without involving their qemus at all.
   
  Yes, the serial line approach would be much slower, but my point
  was that we can do signaling over something else, which could
  well be something building on irqfd.
 
 Well, we could, but it seems to make things more complicated?  A card 
 with shared memory, and another card with an interrupt interconnect?

Yes, I agree that it's more complicated if you have a specific application
in mind that needs one of each, and most use cases that want shared memory
also need an interrupt mechanism, but it's not always the case:

- You could use ext2 with -o xip on a private mapping of a shared host file
in order to share the page cache. This does not need any interrupts.

- If you have more than two parties sharing the segment, there are different
ways to communicate, e.g. always send an interrupt to all others, or have
dedicated point-to-point connections. There is also some complexity in
trying to cover all possible cases in one driver.

I have to say that I also really like the idea of futex over shared memory,
which could potentially make this all a lot simpler. I don't know how this
would best be implemented on the host though.
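The futex idea can at least be illustrated between two processes on one Linux host. A futex word that lives in a MAP_SHARED mapping can be waited on and woken from both sides, provided the non-private FUTEX_WAIT/FUTEX_WAKE operations are used. This is only a process-level sketch under that assumption. How a guest-to-guest futex would be routed through the host remains the open question, and `futex_handoff()` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* glibc exports no futex() wrapper, so call the system call directly. */
static long sys_futex(atomic_uint *word, int op, unsigned int val)
{
    return syscall(SYS_futex, (uint32_t *)word, op, val, NULL, NULL, 0);
}

struct shm_region {
    atomic_uint ready;   /* futex word: 0 = empty, 1 = payload valid */
    int payload;
};

/* The child publishes value through shared memory; the parent sleeps on
 * the futex until it is there.  Returns what the parent observed, or -1
 * if setup failed. */
static int futex_handoff(int value)
{
    struct shm_region *shm = mmap(NULL, sizeof *shm, PROT_READ | PROT_WRITE,
                                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shm == MAP_FAILED)
        return -1;
    atomic_init(&shm->ready, 0);
    shm->payload = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shm, sizeof *shm);
        return -1;
    }
    if (pid == 0) {
        shm->payload = value;
        atomic_store_explicit(&shm->ready, 1, memory_order_release);
        /* Non-private op: the waiter lives in another process. */
        sys_futex(&shm->ready, FUTEX_WAKE, 1);
        _exit(0);
    }

    /* FUTEX_WAIT only sleeps while the word is still 0, so a wake that
     * races ahead of the wait is not lost; spurious returns just loop. */
    while (atomic_load_explicit(&shm->ready, memory_order_acquire) == 0)
        sys_futex(&shm->ready, FUTEX_WAIT, 0);

    int result = shm->payload;
    waitpid(pid, NULL, 0);
    munmap(shm, sizeof *shm);
    return result;
}
```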

Arnd


Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-11 Thread malc
On Thu, 11 Mar 2010, Nick Piggin wrote:

 On Thu, Mar 11, 2010 at 03:10:47AM +, Jamie Lokier wrote:
  Paul Brook wrote:
 In a cross environment that becomes extremely hairy.  For example the
 x86 architecture effectively has an implicit write barrier before every
 store, and an implicit read barrier before every load.

Btw, x86 doesn't have any implicit barriers due to ordinary loads.
Only stores and atomics have implicit barriers, afaik.
   
   As of March 2009[1] Intel guarantees that memory reads occur in
   order (they may only be reordered relative to writes). It appears
   AMD do not provide this guarantee, which could be an interesting
   problem for heterogeneous migration.
  
  (Summary: At least on AMD64, it does too, for normal accesses to
  naturally aligned addresses in write-back cacheable memory.)
  
  Oh, that's interesting.  Way back when I guess we knew writes were in
  order and it wasn't explicit that reads were, hence smp_rmb() using a
  locked atomic.
  
  Here is a post by Nick Piggin from 2007 with links to Intel _and_ AMD
  documents asserting that reads to cacheable memory are in program order:
  
  http://lkml.org/lkml/2007/9/28/212
  Subject: [patch] x86: improved memory barrier implementation
  
  Links to documents:
  
  http://developer.intel.com/products/processor/manuals/318147.pdf
  
  http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf
  
  The Intel link doesn't work any more, but the AMD one does.
 
 It might have been merged into their development manual now.

It was (http://www.intel.com/products/processor/manuals/):

Intel® 64 Architecture Memory Ordering White Paper

This document has been merged into Volume 3A of Intel 64 and IA-32 
Architectures Software Developer's Manual.

[..snip..]
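The pattern that smp_rmb() protects is message passing: a reader that sees the flag must also see the data written before it. As a sketch in portable C11 (not the kernel's own barrier macros), a release store paired with an acquire load gives that ordering. On x86, given the in-order load guarantee discussed above, compilers typically emit plain MOVs for both, while weaker architectures get real barrier instructions. `message_pass()` is a hypothetical name, and the sketch assumes POSIX threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

struct mailbox {
    int input;          /* value the writer will publish */
    int data;           /* ordinary, non-atomic payload */
    atomic_int ready;   /* publication flag */
};

static void *writer(void *arg)
{
    struct mailbox *mb = arg;
    mb->data = mb->input;                                        /* 1: payload */
    atomic_store_explicit(&mb->ready, 1, memory_order_release);  /* 2: flag */
    return NULL;
}

/* The reader spins on the flag.  Its acquire load pairs with the writer's
 * release store, so the later read of mb.data cannot observe a value older
 * than the one written before the flag was set. */
static int message_pass(int value)
{
    struct mailbox mb = { .input = value, .data = 0 };
    atomic_init(&mb.ready, 0);

    pthread_t t;
    if (pthread_create(&t, NULL, writer, &mb) != 0)
        return -1;
    while (atomic_load_explicit(&mb.ready, memory_order_acquire) == 0)
        ;  /* spin until published */
    int seen = mb.data;
    pthread_join(t, NULL);
    return seen;
}
```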

-- 
mailto:av1...@comtv.ru

Re: Shadow page table questions

2010-03-11 Thread Marek Olszewski
It doesn't, and there are often multiple shadow pages per guest page,
distinguished by their sp->role field.
Oh, great!  Does this mean that there is already a mechanism for
synchronizing all shadow pages shadowing the same guest page when such a
guest page changes?
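Having several shadow pages per guest page can be pictured as a table keyed by guest frame number, where several entries share a gfn but differ in role. The sketch below is a toy model, not KVM's data structure. `struct toy_sp`, its role encoding and `mark_unsync_for_gfn()` are hypothetical. It only shows that a change to one guest page has to reach every shadow page for that gfn, whatever its role. A real implementation would hash on the gfn rather than scan linearly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Toy shadow page: which guest frame it shadows and in which role
 * (for example paging level or access mode; the encoding is made up). */
struct toy_sp {
    gfn_t gfn;
    unsigned role;
    bool needs_sync;
};

/* Flag every shadow page that shadows gfn, regardless of role.
 * Returns how many shadow pages were flagged. */
static size_t mark_unsync_for_gfn(struct toy_sp *sps, size_t n, gfn_t gfn)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (sps[i].gfn == gfn) {
            sps[i].needs_sync = true;
            hits++;
        }
    }
    return hits;
}
```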


Marek





Re: [Qemu-devel] Re: Ideas wiki for GSoC 2010

2010-03-11 Thread Cam Macdonell
On Thu, Mar 11, 2010 at 5:03 AM, Alexander Graf ag...@suse.de wrote:

 On 11.03.2010, at 12:58, Avi Kivity wrote:

 On 03/11/2010 01:56 PM, Alexander Graf wrote:
 On 11.03.2010, at 12:54, Avi Kivity wrote:


 On 03/11/2010 01:25 PM, Alexander Graf wrote:

 The list is also still missing a lot of potential mentors for the listed 
 ideas. Let me propose some here :)

 == Shared memory transport between guest(s) and host ==

 Sounds like Avi would be a good fit. I'm pretty unknowledgeable when it 
 comes to shm.


 Not sure what this is.

 Cam's shared memory device.


 That's plain shared memory among guests (though the host could also 
 participate).  transport evokes something like virtio rings.

 I could mentor it, though I prefer something in kvm, and it looks close to 
 completion.

 I agree. Take it off the list then :-).


Fair enough.  I'd be willing to take up one of the other suggestions.

Cam


Re: Make QEmu HPET disabled by default for KVM?

2010-03-11 Thread Marcelo Tosatti
On Thu, Mar 11, 2010 at 09:58:12AM +0200, Avi Kivity wrote:
 On 03/11/2010 09:52 AM, Sheng Yang wrote:
 I think we have already suffered enough timer issues due to this (e.g. I can't
 boot up a 2.6.18 kernel properly)...
 
 2.6.18 as guest or as host?
 
 I have kept --no-hpet in my setup for
 months...
 
 Any details about the problems?  HPET is important to some guests.

As Gleb mentioned in the other thread, reinjection will introduce
another set of problems.

Ideally all these timer-related problems should be fixed by correlating
timer interrupts and time source reads.

Since one already has to use special timer parameters (-rtc-td-hack,
-no-kvm-pit-reinjection), using -no-hpet for problematic Linux 
guests seems fine?



Re: [patch 1/3] target-i386: print EFER in cpu_dump_state

2010-03-11 Thread Marcelo Tosatti
On Thu, Mar 11, 2010 at 10:35:21AM +0200, Avi Kivity wrote:
 On 03/09/2010 03:53 AM, Marcelo Tosatti wrote:
 Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
 
 Index: qemu-kvm-uq/target-i386/helper.c
 ===
 --- qemu-kvm-uq.orig/target-i386/helper.c
 +++ qemu-kvm-uq/target-i386/helper.c
 @@ -1176,6 +1176,7 @@ void cpu_dump_state(CPUState *env, FILE
   cpu_x86_dump_seg_cache(env, f, cpu_fprintf, "TR", &env->tr);
 
   #ifdef TARGET_X86_64
 +cpu_fprintf(f, "EFER=%016" PRIx64 "\n", env->efer);
   if (env->hflags & HF_LMA_MASK) {
   cpu_fprintf(f, "GDT= %016" PRIx64 " %08x\n",
   env->gdt.base, env->gdt.limit);
 
 
 Better to do this for i386 too, no?

On systems that support IA-32e mode, the extended feature enable
register (IA32_EFER) is available. This model-specific register controls
activation of IA-32e mode and other IA-32e mode operations.

Can it be useful for i386 too?
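A small decoder helps when reading the new EFER= line. This sketch handles only a few architectural bits: SCE (bit 0), LME (bit 8), LMA (bit 10), NXE (bit 11) and AMD's SVME (bit 12). `efer_decode()` is a hypothetical helper, not part of QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* A few architectural IA32_EFER bits. */
static const struct { unsigned bit; const char *name; } efer_bits[] = {
    { 0,  "SCE"  },   /* SYSCALL/SYSRET enable */
    { 8,  "LME"  },   /* long mode enable */
    { 10, "LMA"  },   /* long mode active (set by the CPU) */
    { 11, "NXE"  },   /* no-execute enable */
    { 12, "SVME" },   /* AMD SVM enable */
};

/* Write the names of the set bits, space-separated, into buf. */
static void efer_decode(uint64_t efer, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    size_t used = 0;
    for (size_t i = 0; i < sizeof efer_bits / sizeof efer_bits[0]; i++) {
        if (!(efer & (UINT64_C(1) << efer_bits[i].bit)))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? " " : "", efer_bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;              /* out of space: keep what fits */
        used += (size_t)n;
    }
}
```

For example, 0xd01, a value commonly seen for 64-bit Linux guests, decodes to "SCE LME LMA NXE".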



Re: [PATCH 2/4] KVM: Rework VCPU state writeback API

2010-03-11 Thread Marcelo Tosatti
On Thu, Mar 11, 2010 at 10:32:50AM +0200, Avi Kivity wrote:
 On 03/02/2010 02:14 AM, Marcelo Tosatti wrote:
 On Mon, Mar 01, 2010 at 07:10:30PM +0100, Jan Kiszka wrote:
 This grand cleanup drops all reset and vmsave/load related
 synchronization points in favor of four(!) generic hooks:
 
 - cpu_synchronize_all_states in qemu_savevm_state_complete
(initial sync from kernel before vmsave)
 - cpu_synchronize_all_post_init in qemu_loadvm_state
(writeback after vmload)
 - cpu_synchronize_all_post_init in main after machine init
 - cpu_synchronize_all_post_reset in qemu_system_reset
(writeback after system reset)
 
 These writeback points + the existing one of VCPU exec after
 cpu_synchronize_state map on three levels of writeback:
 
 - KVM_PUT_RUNTIME_STATE (during runtime, other VCPUs continue to run)
 - KVM_PUT_RESET_STATE   (on synchronous system reset, all VCPUs stopped)
 - KVM_PUT_FULL_STATE    (on init or vmload, all VCPUs stopped as well)
 
 This level is passed to the arch-specific VCPU state writing function
 that will decide which concrete substates need to be written. That way,
 no writer of load, save or reset functions that interact with in-kernel
 KVM states will ever have to worry about synchronization again. That
 also means that a lot of reasons for races, segfaults and deadlocks are
 eliminated.
 
 cpu_synchronize_state remains untouched, just as Anthony suggested. We
 continue to need it before reading or writing of VCPU states that are
 also tracked by in-kernel KVM subsystems.
 
 Consequently, this patch removes many cpu_synchronize_state calls that
 are now redundant, just like remaining explicit register syncs.
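The three writeback levels lend themselves to ordered constants that the arch-specific code compares against. The sketch below is hypothetical. The numeric values, the substate names and `substates_for_level()` are illustrative and not taken from the patch. It only shows how a single level argument can decide which substates get written.

```c
#include <assert.h>

/* Ordered so that a higher level implies everything a lower one writes. */
enum put_level {
    PUT_RUNTIME_STATE = 1,   /* during runtime, other VCPUs keep running */
    PUT_RESET_STATE   = 2,   /* synchronous system reset, all VCPUs stopped */
    PUT_FULL_STATE    = 3,   /* init or vmload, all VCPUs stopped */
};

/* Hypothetical substates an arch put function might write. */
enum {
    SUB_REGS      = 1u << 0,  /* GPRs, rip, rflags: always written */
    SUB_MSRS      = 1u << 1,  /* MSRs that a system reset changes */
    SUB_TSC_CLOCK = 1u << 2,  /* time-related state: full restore only */
};

static unsigned substates_for_level(enum put_level level)
{
    unsigned mask = SUB_REGS;
    if (level >= PUT_RESET_STATE)
        mask |= SUB_MSRS;
    if (level >= PUT_FULL_STATE)
        mask |= SUB_TSC_CLOCK;
    return mask;
}
```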
 
 Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
 Jan,
 
 This patch breaks system reset of WinXP.32 install (more easily
 reproducible without iothread enabled).
 
 
 What's the conclusion here?  The patch is innocent of the regression?

Yes, it is. The problem was caused by a recent seabios change, now
fixed.


[ kvm-Bugs-2968899 ] guest lockup setting clock when smp > 1

2010-03-11 Thread SourceForge.net
Bugs item #2968899, was opened at 2010-03-11 14:31
Message generated for change (Tracker Item Submitted) made by high33
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2968899&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: hugohiggins (high33)
Assigned to: Nobody/Anonymous (nobody)
Summary: guest lockup setting clock when smp > 1

Initial Comment:
When booting the ISO image ubuntu-9.10-server-amd64.iso with qemu-kvm-0.12.3
and -smp 2, the guest always locks up when the installer tries to set the
clock via NTP.

The bug is reproducible on every install.  The workaround seems to be booting
without the -smp parameter.

command line:
/usr/local/qemu-kvm-0.12.3/bin/qemu-system-x86_64 -name test  -M pc -m 2048 
-boot d -vga std -sdl -net nic,macaddr=BA:DD:C0:FF:EE:F6 -net vde -drive 
file=/dev/sdp,if=scsi,boot=on -cdrom iso/ubuntu-9.10-server-amd64.iso -k en-us 
-usbdevice tablet -serial file:serial.log -smp 2


--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2968899&group_id=180599


[ kvm-Bugs-2968899 ] guest lockup setting clock when smp > 1

2010-03-11 Thread SourceForge.net
Bugs item #2968899, was opened at 2010-03-11 14:31
Message generated for change (Comment added) made by high33
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2968899&group_id=180599


Comment By: hugohiggins (high33)
Date: 2010-03-11 14:33

Message:
This is on a KVM hypervisor host running Xubuntu 9.04 on a dual-processor
6-core Opteron with 32 GB of RAM and kernel 2.6.28-16-generic #55-Ubuntu SMP
Tue Oct 20 19:48:32 UTC 2009 x86_64 GNU/Linux

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2968899&group_id=180599


Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)

2010-03-11 Thread Marcelo Tosatti
On Thu, Mar 04, 2010 at 04:58:20PM +0100, Joerg Roedel wrote:
 On Thu, Mar 04, 2010 at 11:42:55AM -0300, Marcelo Tosatti wrote:
  On Wed, Mar 03, 2010 at 08:12:03PM +0100, Joerg Roedel wrote:
   Hi,
   
   here are the patches that implement nested paging support for nested
   svm. They are somewhat intrusive to the soft-mmu so I post them as RFC
   in the first round to get feedback about the general direction of the
   changes.  Nevertheless I am proud to report that with these patches the
   famous kernel-compile benchmark runs only 4% slower in the l2 guest than
   in the l1 guest when l2 is single-processor. With SMP guests the
   situation is very different: the more vcpus the guest has, the larger
   the performance drop from l1 to l2.
   Anyway, this post is to get feedback about the overall concept of these
   patches.  Please review and give feedback :-)
  
  Joerg,
  
  What perf gain does this bring? (I'm not aware of the current
  overhead.)
 
 The benchmark was an allnoconfig kernel compile in tmpfs, which, with
 the same guest image, took:
 
 as l1-guest with npt:
   
   2m23s
 
 as l2-guest with l1(nested)->l2(shadow):
 
   around 8-9 minutes
 
 as l2-guest with l1(nested)->l2(shadow) without the recent msrpm
 optimization:
 
   around 19 minutes
 
 as l2-guest with l1(nested)->l2(nested) [this patchset]:
 
   2m25s-2m30s
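The relative overheads follow from these timings as (t_l2 - t_l1) / t_l1. With 2m23s (143 s) for the l1 guest and 2m25s-2m30s (145-150 s) for the l2 guest, nested paging costs roughly 1.4% to 4.9%, in line with the "4% slower" quoted above. The 8-9 minute shadow case is about 3.4 to 3.8 times slower than l1. `slowdown_pct()` is just an illustrative helper.

```c
#include <assert.h>

/* Percentage by which t_slow exceeds t_base (both in seconds). */
static double slowdown_pct(double t_base, double t_slow)
{
    assert(t_base > 0.0);
    return (t_slow - t_base) / t_base * 100.0;
}
```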
 
  Overall comments:
  
  Can't you translate l2_gpa -> l1_gpa by walking the current l1 nested
  pagetable, and pass that to the kvm tdp fault path (with the correct
  context setup)?
 
 If I understand your suggestion correctly, I think that's exactly what's
 done in the patches. Some words about the design:
 
 For nested-nested we need to shadow the l1-nested-ptable on the host.
 This is done using the vcpu->arch.mmu context, which holds the l1 paging
 modes while the l2 is running. On an npt-fault from the l2 we just
 instrument the shadow-ptable code. This is the common case, because it
 happens all the time while the l2 is running.

OK, makes sense now, I was missing the fact that the l1-nested-ptable   
needs to be shadowed and l1 translations to it must be write protected. 

You should disable out-of-sync shadow pages so that l1 guest writes to
l1-nested-ptables always trap. And in the trap case, you'd have to
invalidate l2 shadow pagetable entries that used the (now obsolete)
l1-nested-ptable entry. Does that happen automatically?

 The other thing is that vcpu->arch.mmu.gva_to_gpa is expected to still
 work and translate virtual addresses of the l2 into physical addresses
 of the l1 (so they can be accessed with kvm functions).
 
 To do this we need to be aware of the L2 paging mode. It is stored in
 the vcpu->arch.nested_mmu context. This context is only used for gva_to_gpa
 translations. It is not used to build shadow page tables or anything
 else. That's why only the parts necessary for gva_to_gpa
 translations of the nested_mmu context are initialized.
 
 Since we cannot use mmu.gva_to_gpa to translate only between l2_gpa and
 l1_gpa (other parts of kvm require this function to translate l2_gva to
 l1_gpa), the function which does this translation is moved
 to nested_mmu.gva_to_gpa. So basically the gva_to_gpa function pointers
 are swapped between mmu and nested_mmu.
 
 The nested_mmu.gva_to_gpa function is used in translate_gpa_nested which
 is assigned to the newly introduced translate_gpa callback of nested_mmu
 context.
 
 This callback is used in the walk_addr function to translate every
 l2_gpa address we read from cr3 or the guest ptes into l1_gpa to read
 the next step from the guest memory.
 
 In the old unnested case the translate_gpa callback would point to a
 function which just returns the gpa passed to it unmodified. The
 walk_addr function is generalized and now there are basically two
 versions of it:
 
   * walk_addr which translates using vcpu->arch.mmu context
   * walk_addr_nested which translates using vcpu->arch.nested_mmu
 context
 
 That's pretty much how these patches work.
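A minimal sketch of the callback arrangement described above. This is not code from the patch series. `struct sketch_mmu`, the `sketch_*` functions and the fixed-offset "nested" translation are hypothetical stand-ins. They only show how one walk step can route every table address through a per-context `translate_gpa` hook, with an identity hook for the unnested case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sketch_gpa_t;

/* Hypothetical, heavily simplified MMU context: only the translate_gpa
 * hook discussed above, plus a toy parameter for the nested translation. */
struct sketch_mmu {
	sketch_gpa_t (*translate_gpa)(const struct sketch_mmu *mmu,
				      sketch_gpa_t gpa);
	sketch_gpa_t l1_offset;	/* toy stand-in for the l1 nested page table */
};

/* Unnested case: a gpa read from cr3 or a guest pte is already usable,
 * so it is returned unmodified. */
static sketch_gpa_t sketch_translate_identity(const struct sketch_mmu *mmu,
					      sketch_gpa_t gpa)
{
	(void)mmu;
	return gpa;
}

/* Nested case: turn an l2_gpa into an l1_gpa.  A real implementation would
 * walk the l1 nested page table; a fixed offset keeps the sketch small. */
static sketch_gpa_t sketch_translate_nested(const struct sketch_mmu *mmu,
					    sketch_gpa_t l2_gpa)
{
	return l2_gpa + mmu->l1_offset;
}

/* One step of a generic walk: every table address goes through the hook
 * before the next level would be read from guest memory. */
static sketch_gpa_t sketch_walk_step(const struct sketch_mmu *mmu,
				     sketch_gpa_t table_gpa)
{
	assert(mmu != NULL && mmu->translate_gpa != NULL);
	return mmu->translate_gpa(mmu, table_gpa);
}
```

With `{ sketch_translate_identity, 0 }` a walk step returns its input unchanged. With `{ sketch_translate_nested, 0x100000 }` the same step maps `0x2000` to `0x102000`. The walker itself does not need to know which case it is in.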
 
  You probably need to include a flag in base_role to differentiate
  between l1 / l2 shadow tables (say if they use the same cr3 value).
 
 Not sure if this is necessary. It may be necessary when large pages come
 into play. Otherwise the host npt pages are distinguished from the shadow
 npt pages by the direct flag.
 
   Joerg
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


KVM: x86: ignore access permissions for hypercall patching

2010-03-11 Thread Marcelo Tosatti

Ignore access permissions while patching hypercall instructions. 
Otherwise KVM injects a page fault when trying to patch vmcall 
on read-only text regions:

Freeing initrd memory: 8843k freed
Freeing unused kernel memory: 660k freed
Write protecting the kernel text: 4780k
Write protecting the kernel read-only data: 1912k
BUG: unable to handle kernel paging request at c01292e3
IP: [c01292e3] kvm_leave_lazy_mmu+0x43/0x70
*pde = 00910067 *pte = 00129161
Oops: 0003 [#1] SMP

CC: sta...@kernel.org
Reported-by: Stefan Bader stefan.ba...@canonical.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 703f637..bf5c83f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3253,12 +3253,17 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
 static int emulator_write_emulated_onepage(unsigned long addr,
   const void *val,
   unsigned int bytes,
-  struct kvm_vcpu *vcpu)
+  struct kvm_vcpu *vcpu,
+  bool guest_initiated)
 {
gpa_t gpa;
u32 error_code;
 
-   gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, &error_code);
+
+   if (guest_initiated)
+           gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, &error_code);
+   else
+           gpa = kvm_mmu_gva_to_gpa_system(vcpu, addr, &error_code);
 
if (gpa == UNMAPPED_GVA) {
kvm_inject_page_fault(vcpu, addr, error_code);
@@ -3289,24 +3294,35 @@ mmio:
return X86EMUL_CONTINUE;
 }
 
-int emulator_write_emulated(unsigned long addr,
+int __emulator_write_emulated(unsigned long addr,
   const void *val,
   unsigned int bytes,
-  struct kvm_vcpu *vcpu)
+  struct kvm_vcpu *vcpu,
+  bool guest_initiated)
 {
/* Crossing a page boundary? */
if (((addr + bytes - 1) ^ addr) & PAGE_MASK) {
int rc, now;
 
now = -addr & ~PAGE_MASK;
-   rc = emulator_write_emulated_onepage(addr, val, now, vcpu);
+   rc = emulator_write_emulated_onepage(addr, val, now, vcpu,
+guest_initiated);
if (rc != X86EMUL_CONTINUE)
return rc;
addr += now;
val += now;
bytes -= now;
}
-   return emulator_write_emulated_onepage(addr, val, bytes, vcpu);
+   return emulator_write_emulated_onepage(addr, val, bytes, vcpu,
+  guest_initiated);
+}
+
+int emulator_write_emulated(unsigned long addr,
+  const void *val,
+  unsigned int bytes,
+  struct kvm_vcpu *vcpu)
+{
+   return __emulator_write_emulated(addr, val, bytes, vcpu, true);
 }
 EXPORT_SYMBOL_GPL(emulator_write_emulated);
 
@@ -3997,7 +4013,7 @@ int kvm_fix_hypercall(struct kvm_vcpu *vcpu)
 
kvm_x86_ops->patch_hypercall(vcpu, instruction);
 
-   return emulator_write_emulated(rip, instruction, 3, vcpu);
+   return __emulator_write_emulated(rip, instruction, 3, vcpu, false);
 }
 
 static u64 mk_cr_64(u64 curr_cr, u32 new_val)
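The patch relies on two pieces of logic that can be modelled in isolation: the page-boundary split in `__emulator_write_emulated()`, and the choice between a permission-checked and a permission-ignoring translation. The sketch below is illustrative only. `SKETCH_PAGE_SIZE` assumes 4 KiB pages. `sketch_write_allowed()` is a hypothetical reduction of the `kvm_mmu_gva_to_gpa_write()` / `kvm_mmu_gva_to_gpa_system()` distinction to a single PTE write bit.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL			/* assumption: 4 KiB pages */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Bytes of a write at addr that fit before the next page boundary,
 * using the same test and arithmetic as __emulator_write_emulated(). */
static unsigned long sketch_first_chunk(unsigned long addr, unsigned long bytes)
{
	assert(bytes > 0);
	if (((addr + bytes - 1) ^ addr) & SKETCH_PAGE_MASK)	/* crosses a page? */
		return -addr & ~SKETCH_PAGE_MASK;	/* distance to the boundary */
	return bytes;
}

/* Hypothetical permission policy: guest-initiated writes honour the pte
 * write bit, while KVM-internal writes (hypercall patching) ignore it. */
static bool sketch_write_allowed(bool pte_writable, bool guest_initiated)
{
	return guest_initiated ? pte_writable : true;
}
```

For example, a 3-byte write at 0x1ffe is split into 2 bytes in the first page and 1 byte in the next. A write to a read-only page is refused only when `guest_initiated` is true, which is the case the read-only kernel text in the Oops above runs into.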


Re: KVM: x86: ignore access permissions for hypercall patching

2010-03-11 Thread Stefan Bader
With this patch applied on top, I was able to boot my guest on an AMD host system.

Marcelo Tosatti wrote:
 Ignore access permissions while patching hypercall instructions. 
 Otherwise KVM injects a page fault when trying to patch vmcall 
 on read-only text regions:
 
 Freeing initrd memory: 8843k freed
 Freeing unused kernel memory: 660k freed
 Write protecting the kernel text: 4780k
 Write protecting the kernel read-only data: 1912k
 BUG: unable to handle kernel paging request at c01292e3
 IP: [c01292e3] kvm_leave_lazy_mmu+0x43/0x70
 *pde = 00910067 *pte = 00129161
 Oops: 0003 [#1] SMP
 
 CC: sta...@kernel.org
 Reported-by: Stefan Bader stefan.ba...@canonical.com
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Tested-by: Stefan Bader stefan.ba...@canonical.com
 



Re: [Qemu-devel] Re: Ideas wiki for GSoC 2010

2010-03-11 Thread Jamie Lokier
Avi Kivity wrote:
 On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
 
 2. Do we have kvm-specific projects? Can they be part of the QEMU project
 or do we need a different mentoring organization for it?

 
 Something really interesting is kvm-assisted tcg.  I'm afraid it's a bit 
 too complicated to GSoC.

Is this simpler: kvm-assisted user-mode emulation (no TCG involved)?

-- Jamie


[PATCH] KVM: ia64: fix the error code of ioctl KVM_IA64_VCPU_GET_STACK failure

2010-03-11 Thread Wei Yongjun
The ioctl KVM_IA64_VCPU_GET_STACK does not set the error code if
copy_to_user() fails, so 0 is returned. It should return -EFAULT
instead of 0 in this case; this patch fixes that.

Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com
---
 arch/ia64/kvm/kvm-ia64.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 26e0e08..bc07c81 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1535,8 +1535,10 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
goto out;
 
if (copy_to_user(user_stack, stack,
-sizeof(struct kvm_ia64_vcpu_stack)))
+sizeof(struct kvm_ia64_vcpu_stack))) {
+   r = -EFAULT;
goto out;
+   }
 
break;
}
-- 
1.6.3.3
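A small userspace sketch of the bug pattern fixed here: jumping to a shared `out:` label without first setting the return code. `sketch_copy_to_user()` is a hypothetical stand-in that, like the kernel's `copy_to_user()`, returns the number of bytes it could not copy. A NULL destination simulates a faulting user pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for copy_to_user(): returns bytes NOT copied. */
static size_t sketch_copy_to_user(void *dst, const void *src, size_t n)
{
	if (dst == NULL)
		return n;	/* simulate a fault on the user pointer */
	memcpy(dst, src, n);
	return 0;
}

static long sketch_get_stack(void *user_stack, const void *stack, size_t len)
{
	long r = 0;

	assert(stack != NULL);
	if (sketch_copy_to_user(user_stack, stack, len)) {
		r = -EFAULT;	/* without this the caller would see success */
		goto out;
	}
out:
	return r;
}
```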




[PATCH] KVM: x86: fix the error of ioctl KVM_IRQ_LINE if no irq chip

2010-03-11 Thread Wei Yongjun
If there is no in-kernel irq chip, ioctl KVM_IRQ_LINE returns -EFAULT.
Other places, such as KVM_[GET|SET]IRQCHIP, return -ENXIO in this case,
so this patch uses -ENXIO instead of -EFAULT.

Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com
---
 arch/x86/kvm/x86.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3753c11..c6b7e9f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2857,11 +2857,13 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = -EFAULT;
if (copy_from_user(&irq_event, argp, sizeof irq_event))
goto out;
+   r = -ENXIO;
if (irqchip_in_kernel(kvm)) {
__s32 status;
status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
irq_event.irq, irq_event.level);
if (ioctl == KVM_IRQ_LINE_STATUS) {
+   r = -EFAULT;
irq_event.status = status;
if (copy_to_user(argp, &irq_event,
sizeof irq_event))
-- 
1.6.3.3
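A small, hypothetical model of the error-code flow after the patch. Each input flag stands in for one condition checked by the ioctl handler: `irqchip_in_kernel()`, `ioctl == KVM_IRQ_LINE_STATUS`, and a failing `copy_to_user()`. It sketches the control flow only; it is not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the KVM_IRQ_LINE / KVM_IRQ_LINE_STATUS error flow
 * after the patch; each flag stands in for one condition in the handler. */
static long sketch_irq_line(bool irqchip_in_kernel, bool want_status,
			    bool copy_out_fails)
{
	long r;

	r = -ENXIO;			/* no in-kernel irqchip */
	if (irqchip_in_kernel) {
		if (want_status) {
			r = -EFAULT;	/* returned if copying status back fails */
			if (copy_out_fails)
				goto out;
		}
		r = 0;
	}
out:
	return r;
}
```

Presetting `r` before each branch keeps every `goto out` path returning a meaningful errno. The same pattern applies unchanged to the ia64 variant of this fix.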




[PATCH] KVM: ia64: fix the error of ioctl KVM_IRQ_LINE if no irq chip

2010-03-11 Thread Wei Yongjun
If there is no in-kernel irq chip, ioctl KVM_IRQ_LINE returns -EFAULT.
Other places, such as KVM_[GET|SET]IRQCHIP, return -ENXIO in this case,
so this patch uses -ENXIO instead of -EFAULT.

Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com
---
 arch/ia64/kvm/kvm-ia64.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 26e0e08..0d2e41a 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -979,11 +979,13 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = -EFAULT;
if (copy_from_user(&irq_event, argp, sizeof irq_event))
goto out;
+   r = -ENXIO;
if (irqchip_in_kernel(kvm)) {
__s32 status;
status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
irq_event.irq, irq_event.level);
if (ioctl == KVM_IRQ_LINE_STATUS) {
+   r = -EFAULT;
irq_event.status = status;
if (copy_to_user(argp, &irq_event,
sizeof irq_event))
-- 
1.6.3.3




[patch 2/2] virtio-serial-bus: wake up iothread upon guest read notification

2010-03-11 Thread Marcelo Tosatti
Wake up iothread when buffers are consumed.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu-ioworker/hw/virtio-serial-bus.c
===
--- qemu-ioworker.orig/hw/virtio-serial-bus.c
+++ qemu-ioworker/hw/virtio-serial-bus.c
@@ -331,6 +331,7 @@ static void handle_output(VirtIODevice *
 
 static void handle_input(VirtIODevice *vdev, VirtQueue *vq)
 {
+qemu_notify_event(main_io_worker);
 }
 
 static uint32_t get_features(VirtIODevice *vdev, uint32_t features)




[patch 0/2] introduce QEMUIOWorker and wake up iothread on virtio-serial-bus notification

2010-03-11 Thread Marcelo Tosatti



[patch 1/2] Pass QEMUIOWorker to qemu_notify_event

2010-03-11 Thread Marcelo Tosatti
This can be used later to introduce generic iothread workers.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu-ioworker/async.c
===
--- qemu-ioworker.orig/async.c
+++ qemu-ioworker/async.c
@@ -180,7 +180,7 @@ void qemu_bh_schedule(QEMUBH *bh)
 bh->scheduled = 1;
 bh->idle = 0;
 /* stop the currently executing CPU to execute the BH ASAP */
-qemu_notify_event();
+qemu_notify_event(main_io_worker);
 }
 
 void qemu_bh_cancel(QEMUBH *bh)
Index: qemu-ioworker/hw/mac_dbdma.c
===
--- qemu-ioworker.orig/hw/mac_dbdma.c
+++ qemu-ioworker/hw/mac_dbdma.c
@@ -655,7 +655,7 @@ void DBDMA_register_channel(void *dbdma,
 
 void DBDMA_schedule(void)
 {
-qemu_notify_event();
+qemu_notify_event(main_io_worker);
 }
 
 static void
Index: qemu-ioworker/hw/virtio-net.c
===
--- qemu-ioworker.orig/hw/virtio-net.c
+++ qemu-ioworker/hw/virtio-net.c
@@ -359,7 +359,7 @@ static void virtio_net_handle_rx(VirtIOD
 
 /* We now have RX buffers, signal to the IO thread to break out of the
  * select to re-poll the tap file descriptor */
-qemu_notify_event();
+qemu_notify_event(main_io_worker);
 }
 
 static int virtio_net_can_receive(VLANClientState *nc)
Index: qemu-ioworker/qemu-common.h
===
--- qemu-ioworker.orig/qemu-common.h
+++ qemu-ioworker/qemu-common.h
@@ -234,11 +234,17 @@ typedef uint64_t pcibus_t;
 void cpu_save(QEMUFile *f, void *opaque);
 int cpu_load(QEMUFile *f, void *opaque, int version_id);
 
+typedef struct QEMUIOWorker {
+void *opaque;
+} QEMUIOWorker;
+
 /* Force QEMU to stop what it's doing and service IO */
 void qemu_service_io(void);
 
 /* Force QEMU to process pending events */
-void qemu_notify_event(void);
+void qemu_notify_event(QEMUIOWorker *worker);
+
+extern QEMUIOWorker *main_io_worker;
 
 /* Unblock cpu */
 void qemu_cpu_kick(void *env);
Index: qemu-ioworker/vl.c
===
--- qemu-ioworker.orig/vl.c
+++ qemu-ioworker/vl.c
@@ -274,6 +274,9 @@ uint8_t qemu_uuid[16];
 static QEMUBootSetHandler *boot_set_handler;
 static void *boot_set_opaque;
 
+QEMUIOWorker iothread_worker;
+QEMUIOWorker *main_io_worker = &iothread_worker;
+
 #ifdef SIGRTMIN
 #define SIG_IPI (SIGRTMIN+4)
 #else
@@ -885,7 +888,7 @@ void qemu_mod_timer(QEMUTimer *ts, int64
 }
 /* Interrupt execution to force deadline recalculation.  */
 if (use_icount)
-qemu_notify_event();
+qemu_notify_event(main_io_worker);
 }
 }
 
@@ -1062,7 +1065,7 @@ static void host_alarm_handler(int host_
 }
 #endif
 timer_alarm_pending = 1;
-qemu_notify_event();
+qemu_notify_event(main_io_worker);
 }
 }
 
@@ -2928,7 +2931,7 @@ static int ram_load(QEMUFile *f, void *o
 
 void qemu_service_io(void)
 {
-qemu_notify_event();
+qemu_notify_event(main_io_worker);
 }
 
 /***/
@@ -3180,26 +3183,26 @@ void qemu_system_reset_request(void)
 } else {
 reset_requested = 1;
 }
-qemu_notify_event();
+qemu_notify_event(main_io_worker);
 }
 
 void qemu_system_shutdown_request(void)
 {
 shutdown_requested = 1;
-qemu_notify_event();
+qemu_notify_event(main_io_worker);
 }
 
 void qemu_system_powerdown_request(void)
 {
 powerdown_requested = 1;
-qemu_notify_event();
+qemu_notify_event(main_io_worker);
 }
 
 #ifdef CONFIG_IOTHREAD
 static void qemu_system_vmstop_request(int reason)
 {
 vmstop_requested = reason;
-qemu_notify_event();
+qemu_notify_event(main_io_worker);
 }
 #endif
 
@@ -3341,7 +3344,7 @@ void qemu_cpu_kick(void *env)
 return;
 }
 
-void qemu_notify_event(void)
+void qemu_notify_event(QEMUIOWorker *worker)
 {
 CPUState *env = cpu_single_env;
 
@@ -3727,7 +3730,7 @@ void qemu_init_vcpu(void *_env)
 tcg_init_vcpu(env);
 }
 
-void qemu_notify_event(void)
+void qemu_notify_event(QEMUIOWorker *worker)
 {
 qemu_event_increment();
 }
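Passing a worker handle lets a notifier wake one specific event loop. A common way to implement such a wake-up is the self-pipe trick, sketched below with POSIX `pipe()`, `fcntl()`, `read()` and `write()`. `SketchIOWorker` and the `sketch_*` functions are hypothetical. This is not QEMU's `qemu_event_increment()`; it is only an assumption about how a per-worker notifier could look.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical worker: a self-pipe that the worker's select()/poll() loop
 * watches for readability. */
typedef struct SketchIOWorker {
	int rfd;	/* read end, polled by the worker thread */
	int wfd;	/* write end, used by notifiers */
} SketchIOWorker;

static int sketch_worker_init(SketchIOWorker *w)
{
	int fds[2];

	if (pipe(fds) < 0)
		return -errno;
	/* Non-blocking: a full pipe never stalls a notifier, and draining
	 * stops as soon as the pipe is empty. */
	fcntl(fds[0], F_SETFL, O_NONBLOCK);
	fcntl(fds[1], F_SETFL, O_NONBLOCK);
	w->rfd = fds[0];
	w->wfd = fds[1];
	return 0;
}

/* Analogue of qemu_notify_event(worker): wake exactly the given worker. */
static void sketch_notify_event(SketchIOWorker *w)
{
	static const char byte = 0;
	ssize_t n;

	assert(w != NULL);
	do {
		n = write(w->wfd, &byte, 1);
	} while (n < 0 && errno == EINTR);
	/* EAGAIN means the pipe is full, i.e. a wake-up is already pending. */
}

/* Called by the worker after waking: drain pending notifications and
 * return how many bytes were consumed. */
static int sketch_worker_drain(SketchIOWorker *w)
{
	char buf[64];
	int total = 0;
	ssize_t n;

	while ((n = read(w->rfd, buf, sizeof(buf))) > 0)
		total += (int)n;
	return total;
}
```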




[PATCH] KVM: coalesced_mmio: NULLify the pointers before freeing ring page and dev

2010-03-11 Thread Takuya Yoshikawa
kvm_coalesced_mmio_init() keeps holding the addresses of the coalesced mmio
ring page and dev even after it has freed them.

This may trigger problems, e.g., if we call kvm_coalesced_mmio_free() in
kvm_destroy_vm() or kvm_vm_ioctl_register_coalesced_mmio() afterward.

This patch avoids such problems by NULLifying the pointers.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 virt/kvm/coalesced_mmio.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 5169736..11776b7 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -119,8 +119,10 @@ int kvm_coalesced_mmio_init(struct kvm *kvm)
return ret;
 
 out_free_dev:
+   kvm->coalesced_mmio_dev = NULL;
kfree(dev);
 out_free_page:
+   kvm->coalesced_mmio_ring = NULL;
__free_page(page);
 out_err:
return ret;
-- 
1.6.3.3
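A userspace model of the error path shows why the stale pointers matter. After a failed init, a later teardown that frees whatever the struct still points to would double-free unless the pointers were cleared. `struct sketch_vm` and the `sketch_*` functions are hypothetical. `malloc()`/`free()` stand in for the kernel page and slab allocators, and the late failure is simulated by a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two struct kvm fields touched by the patch. */
struct sketch_vm {
	void *mmio_ring;	/* like kvm->coalesced_mmio_ring */
	void *mmio_dev;		/* like kvm->coalesced_mmio_dev */
};

/* fail_late simulates an error after both objects were published. */
static int sketch_mmio_init(struct sketch_vm *vm, int fail_late)
{
	void *page, *dev;

	assert(vm != NULL);
	page = malloc(4096);
	if (!page)
		return -ENOMEM;
	vm->mmio_ring = page;

	dev = malloc(64);
	if (!dev)
		goto out_free_page;
	vm->mmio_dev = dev;

	if (fail_late)
		goto out_free_dev;
	return 0;

out_free_dev:
	vm->mmio_dev = NULL;	/* clear the published pointer before freeing */
	free(dev);
out_free_page:
	vm->mmio_ring = NULL;
	free(page);
	return -ENOMEM;		/* simplification: one errno for all failures */
}

/* Teardown that is safe after success or after a failed init. */
static void sketch_mmio_free(struct sketch_vm *vm)
{
	free(vm->mmio_dev);	/* free(NULL) is a no-op */
	free(vm->mmio_ring);
	vm->mmio_dev = NULL;
	vm->mmio_ring = NULL;
}
```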



Re: [PATCH] KVM: coalesced_mmio: NULLify the pointers before freeing ring page and dev

2010-03-11 Thread Wei Yongjun
Takuya Yoshikawa wrote:
 kvm_coalesced_mmio_init() keeps to hold the addresses of a coalesced mmio
 ring page and dev even after it has freed them.

 This may trigger problems, e.g., if we call kvm_coalesced_mmio_free() in
 kvm_destroy_vm() or kvm_vm_ioctl_register_coalesced_mmio() afterward.

 This patch avoids such problems by NULLifying the pointers.
   

After this patch, I think we also need to add a check for
coalesced_mmio_ring in kvm_vcpu_fault(), since coalesced_mmio may not
have been initialized correctly. This is a separate issue, so I will
send a new patch for it.

   


[PATCH] KVM: fix to not use NULL kvm->coalesced_mmio_ring in kvm_vcpu_fault()

2010-03-11 Thread Wei Yongjun
If coalesced_mmio init fails, kvm->coalesced_mmio_ring will be set
to NULL. In that case we should return VM_FAULT_SIGBUS in kvm_vcpu_fault()
even if vmf->pgoff == KVM_COALESCED_MMIO_PAGE_OFFSET.

Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com
---
 virt/kvm/kvm_main.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e758ef7..0e06a6d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1253,7 +1253,8 @@ static int kvm_vcpu_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
page = virt_to_page(vcpu->arch.pio_data);
 #endif
 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
-   else if (vmf->pgoff == KVM_COALESCED_MMIO_PAGE_OFFSET)
+   else if (vmf->pgoff == KVM_COALESCED_MMIO_PAGE_OFFSET &&
+            vcpu->kvm->coalesced_mmio_ring)
page = virt_to_page(vcpu->kvm->coalesced_mmio_ring);
 #endif
else
-- 
1.6.3.3
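The added condition can be modelled as a small lookup from mmap page offset to backing page. The offsets, the `sketch_fault` enum and the function are hypothetical stand-ins for `vmf->pgoff`, `KVM_COALESCED_MMIO_PAGE_OFFSET` and `VM_FAULT_SIGBUS`. They show that a NULL ring now falls through to the SIGBUS branch instead of being passed to `virt_to_page()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical offsets and results standing in for vmf->pgoff values,
 * KVM_COALESCED_MMIO_PAGE_OFFSET and VM_FAULT_SIGBUS. */
#define SKETCH_RUN_PAGE_OFFSET	0UL
#define SKETCH_MMIO_PAGE_OFFSET	2UL

enum sketch_fault { SKETCH_FAULT_OK, SKETCH_FAULT_SIGBUS };

static enum sketch_fault sketch_vcpu_fault(unsigned long pgoff, void *run_page,
					   void *mmio_ring, void **page_out)
{
	void *page;

	assert(page_out != NULL);
	if (pgoff == SKETCH_RUN_PAGE_OFFSET)
		page = run_page;
	else if (pgoff == SKETCH_MMIO_PAGE_OFFSET && mmio_ring)
		page = mmio_ring;	/* only map the ring if init succeeded */
	else
		return SKETCH_FAULT_SIGBUS;
	*page_out = page;
	return SKETCH_FAULT_OK;
}
```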




Re: [PATCH] KVM: coalesced_mmio: NULLify the pointers before freeing ring page and dev

2010-03-11 Thread Takuya Yoshikawa

Wei Yongjun wrote:

Takuya Yoshikawa wrote:

kvm_coalesced_mmio_init() keeps to hold the addresses of a coalesced mmio
ring page and dev even after it has freed them.

This may trigger problems, e.g., if we call kvm_coalesced_mmio_free() in
kvm_destroy_vm() or kvm_vm_ioctl_register_coalesced_mmio() afterward.

This patch avoids such problems by NULLifying the pointers.
  


After this patch, I think we also need to do some check in
kvm_vcpu_fault() for coalesced_mmio_ring, since the coalesced_mmio
may not be init correctly. This is other issue, so I will send a
new patch for this.


Eh, thanks.




  



Re: [PATCH] KVM: fix to not use NULL kvm->coalesced_mmio_ring in kvm_vcpu_fault()

2010-03-11 Thread Takuya Yoshikawa

Wei Yongjun wrote:

If coalesced_mmio init fail, the kvm->coalesced_mmio_ring will be set
to NULL. If so, we should return VM_FAULT_SIGBUS in kvm_vcpu_fault()
even if vmf->pgoff == KVM_COALESCED_MMIO_PAGE_OFFSET.



Btw, I am not certain whether we can continue along the normal path even if
kvm_coalesced_mmio_init() fails.


[PATCH] KVM: fix the errno of ioctl KVM_[UN]REGISTER_COALESCED_MMIO failure

2010-03-11 Thread Wei Yongjun
This patch changes the errno of ioctl KVM_[UN]REGISTER_COALESCED_MMIO
from -EINVAL to -ENXIO if no coalesced mmio dev exists.

Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com
---
 virt/kvm/coalesced_mmio.c |4 ++--
 virt/kvm/kvm_main.c   |2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 5169736..22500d4 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -138,7 +138,7 @@ int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm,
struct kvm_coalesced_mmio_dev *dev = kvm->coalesced_mmio_dev;
 
if (dev == NULL)
-   return -EINVAL;
+   return -ENXIO;
 
mutex_lock(&kvm->slots_lock);
if (dev->nb_zones >= KVM_COALESCED_MMIO_ZONE_MAX) {
@@ -161,7 +161,7 @@ int kvm_vm_ioctl_unregister_coalesced_mmio(struct kvm *kvm,
struct kvm_coalesced_mmio_zone *z;
 
if (dev == NULL)
-   return -EINVAL;
+   return -ENXIO;
 
mutex_lock(&kvm->slots_lock);
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0e06a6d..861435e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1603,7 +1603,6 @@ static long kvm_vm_ioctl(struct file *filp,
r = -EFAULT;
if (copy_from_user(&zone, argp, sizeof zone))
goto out;
-   r = -ENXIO;
r = kvm_vm_ioctl_register_coalesced_mmio(kvm, zone);
if (r)
goto out;
@@ -1615,7 +1614,6 @@ static long kvm_vm_ioctl(struct file *filp,
r = -EFAULT;
if (copy_from_user(&zone, argp, sizeof zone))
goto out;
-   r = -ENXIO;
r = kvm_vm_ioctl_unregister_coalesced_mmio(kvm, zone);
if (r)
goto out;
-- 
1.6.3.3
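A userspace model of the register path after this patch shows the resulting errno mapping. `struct sketch_mmio_dev`, `SKETCH_ZONE_MAX` and `sketch_register_zone()` are hypothetical. Returning `-ENOBUFS` when the zone table is full is an assumption about the unchanged part of the function, which the diff does not show.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_ZONE_MAX 4	/* hypothetical; not KVM_COALESCED_MMIO_ZONE_MAX */

struct sketch_zone {
	unsigned long long addr;
	unsigned int size;
};

struct sketch_mmio_dev {
	int nb_zones;
	struct sketch_zone zone[SKETCH_ZONE_MAX];
};

static int sketch_register_zone(struct sketch_mmio_dev *dev,
				const struct sketch_zone *z)
{
	assert(z != NULL);
	if (dev == NULL)
		return -ENXIO;		/* no coalesced MMIO device */
	if (dev->nb_zones >= SKETCH_ZONE_MAX)
		return -ENOBUFS;	/* assumed: zone table full */
	dev->zone[dev->nb_zones++] = *z;
	return 0;
}
```

The helper now returns -ENXIO itself, so the `r = -ENXIO;` assignments in `kvm_vm_ioctl()` were dead stores: `r` is overwritten on the very next line. That is why the patch removes them.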




ioeventfd usage in KVM

2010-03-11 Thread Cam Macdonell
Hi,

I'm trying to use ioeventfd/irqfds for my shared memory patch.  I
followed the usage in the vhost-net patches to see how it's setup for
virtio-pci and tried to follow it as closely as I could. Despite the
call to kvm_vm_ioctl() returning 0, any writes to the assigned 4-byte
memory area do not seem to trigger a write to the corresponding fd.
At this point, I'm just trying to get the ioeventfd happening.

I notice that virtio-pci allocates its BAR as
PCI_BASE_ADDRESS_SPACE_IO and then uses register_ioport_{read,write}
whereas I use cpu_register_io_memory and the
PCI_BASE_ADDRESS_SPACE_MEMORY type as shown below.

+static void ivshmem_mmio_map(PCIDevice *pci_dev, int region_num,
+   pcibus_t addr, pcibus_t size, int type)
+{
+PCI_IVShmemState *d = (PCI_IVShmemState *)pci_dev;
+IVShmemState *s = &d->ivshmem_state;
+
+s->otheraddr = addr; /* this address will be used for the ioeventfd */
+cpu_register_physical_memory(addr + 0, 0x100, s->ivshmem_mmio_io_addr);
+}


+s->ivshmem_mmio_io_addr = cpu_register_io_memory(ivshmem_mmio_read,
+ivshmem_mmio_write, s);
+/* region for registers*/
+pci_register_bar(&d->dev, 0, 0x100,
+   PCI_BASE_ADDRESS_SPACE_MEMORY, ivshmem_mmio_map);

My basic attempt looks like this:

struct kvm_ioeventfd ked;

ked.addr = s->otheraddr + Doorbell;
ked.len = 4;
ked.flags = KVM_IOEVENTFD_FLAG_PIO;
ked.fd = an_eventfd;
ret = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &ked);

but when the guest writes to the offset of Doorbell, I cannot see any
action (via a select on the fd).  Is there something obviously wrong
that I'm doing?
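One thing to check in the attempt above is the `flags` field. `KVM_IOEVENTFD_FLAG_PIO` registers the eventfd on the port-I/O bus, but the doorbell here lives in a BAR registered as `PCI_BASE_ADDRESS_SPACE_MEMORY`, which guest accesses reach as MMIO. The sketch below fills a `struct kvm_ioeventfd` from `<linux/kvm.h>` so that the flag follows the BAR type. `sketch_fill_doorbell()` and its parameters are hypothetical. It also assumes the address is the guest-physical BAR address that is valid when the ioctl is issued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <linux/kvm.h>

/* Fill a kvm_ioeventfd for a 4-byte doorbell register.  is_pio must match
 * how the BAR was registered: an I/O-space BAR is reached via port I/O,
 * a memory-space BAR via MMIO (no PIO flag). */
static void sketch_fill_doorbell(struct kvm_ioeventfd *ked, uint64_t bar_addr,
				 uint64_t doorbell_off, int efd, bool is_pio)
{
	assert(ked != NULL);
	memset(ked, 0, sizeof(*ked));	/* clear datamatch, flags and padding */
	ked->addr = bar_addr + doorbell_off;
	ked->len = 4;
	ked->fd = efd;
	ked->flags = is_pio ? KVM_IOEVENTFD_FLAG_PIO : 0;
}
```

The filled structure would then be passed to the VM with `kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &ked)`, as in the original attempt. With `flags` left at 0, `KVM_IOEVENTFD_FLAG_DATAMATCH` is not set, so any 4-byte guest write to `addr` should signal the eventfd regardless of the value written.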

When I get this working, I'd be happy to write up a page for the KVM site.

Thanks,
Cam


Re: [patch 2/2] virtio-serial-bus: wake up iothread upon guest read notification

2010-03-11 Thread Amit Shah
On (Thu) Mar 11 2010 [23:45:51], Marcelo Tosatti wrote:
 Wake up iothread when buffers are consumed.
 
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 
 Index: qemu-ioworker/hw/virtio-serial-bus.c
 ===
 --- qemu-ioworker.orig/hw/virtio-serial-bus.c
 +++ qemu-ioworker/hw/virtio-serial-bus.c
 @@ -331,6 +331,7 @@ static void handle_output(VirtIODevice *
  
  static void handle_input(VirtIODevice *vdev, VirtQueue *vq)
  {
 +qemu_notify_event(main_io_worker);
  }

ACK, the host lets us know buffers are consumed and new buffers have
been added to the pool so that we can start sending more data.

Before this patch my tests took 16m18s to run.
After this patch my tests take 1m17s to run.

Both tests done with just one buffer made available in the virtio-queues.

Amit


Re: KVM: x86: ignore access permissions for hypercall patching

2010-03-11 Thread Gleb Natapov
On Thu, Mar 11, 2010 at 06:16:05PM -0300, Marcelo Tosatti wrote:
 
 Ignore access permissions while patching hypercall instructions. 
 Otherwise KVM injects a page fault when trying to patch vmcall 
 on read-only text regions:
 
 Freeing initrd memory: 8843k freed
 Freeing unused kernel memory: 660k freed
 Write protecting the kernel text: 4780k
 Write protecting the kernel read-only data: 1912k
 BUG: unable to handle kernel paging request at c01292e3
 IP: [c01292e3] kvm_leave_lazy_mmu+0x43/0x70
 *pde = 00910067 *pte = 00129161
 Oops: 0003 [#1] SMP
 
 CC: sta...@kernel.org
 Reported-by: Stefan Bader stefan.ba...@canonical.com
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 
My emulator patch series introduces kvm_write_guest_virt_system(). Maybe
it can be used here (only compile tested).


diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3753c11..9833c25 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3157,14 +3157,18 @@ static int kvm_read_guest_virt_system(gva_t addr, void *val, unsigned int bytes,
return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, 0, error);
 }
 
-static int kvm_write_guest_virt(gva_t addr, void *val, unsigned int bytes,
-   struct kvm_vcpu *vcpu, u32 *error)
+static int kvm_write_guest_virt_helper(gva_t addr, void *val,
+  unsigned int bytes,
+  struct kvm_vcpu *vcpu, u32 access,
+  u32 *error)
 {
void *data = val;
int r = X86EMUL_CONTINUE;
 
+   access |= PFERR_WRITE_MASK;
+
while (bytes) {
-   gpa_t gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, error);
+   gpa_t gpa =  vcpu->arch.mmu.gva_to_gpa(vcpu, addr, access, error);
unsigned offset = addr & (PAGE_SIZE-1);
unsigned towrite = min(bytes, (unsigned)PAGE_SIZE - offset);
int ret;
@@ -3187,6 +3191,19 @@ out:
return r;
 }
 
+static int kvm_write_guest_virt(gva_t addr, void *val, unsigned int bytes,
+   struct kvm_vcpu *vcpu, u32 *error)
+{
+   u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
+   return kvm_write_guest_virt_helper(addr, val, bytes, vcpu, access, error);
+}
+
+static int kvm_write_guest_virt_system(gva_t addr, void *val,
+  unsigned int bytes,
+  struct kvm_vcpu *vcpu, u32 *error)
+{
+   return kvm_write_guest_virt_helper(addr, val, bytes, vcpu, 0, error);
+}
 
 static int emulator_read_emulated(unsigned long addr,
  void *val,
@@ -3997,7 +4014,7 @@ int kvm_fix_hypercall(struct kvm_vcpu *vcpu)
 
kvm_x86_ops->patch_hypercall(vcpu, instruction);
 
-   return emulator_write_emulated(rip, instruction, 3, vcpu);
+   return kvm_write_guest_virt_system(rip, instruction, 3, vcpu, NULL);
 }
 
 static u64 mk_cr_64(u64 curr_cr, u32 new_val)
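The `access` value built in this (later withdrawn) patch is an x86 page-fault error-code mask. The sketch below spells out the architectural bit positions (write = bit 1, user = bit 2) under hypothetical `SKETCH_PFERR_*` names. It reproduces how the two wrappers and the helper compose the mask. Both variants still request write access; only the user-mode check differs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error-code bits (architectural positions). */
#define SKETCH_PFERR_WRITE	(1u << 1)
#define SKETCH_PFERR_USER	(1u << 2)

/* Mirrors the two wrappers: the normal path adds the user bit at CPL 3,
 * the "system" path passes 0; the helper then always ORs in the write bit. */
static uint32_t sketch_write_access(int cpl, bool system)
{
	uint32_t access = 0;

	assert(cpl >= 0 && cpl <= 3);
	if (!system && cpl == 3)
		access = SKETCH_PFERR_USER;
	return access | SKETCH_PFERR_WRITE;
}
```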
--
Gleb.


Re: KVM: x86: ignore access permissions for hypercall patching

2010-03-11 Thread Gleb Natapov
On Fri, Mar 12, 2010 at 07:56:00AM +0200, Gleb Natapov wrote:
 My emulator patch series introduce kvm_write_guest_virt_system(). May be
 used it here (only compile tested).
 
Ignore that, it will not work.

--
Gleb.


Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)

2010-03-11 Thread Avi Kivity

On 03/11/2010 10:58 PM, Marcelo Tosatti wrote:



Can't you translate l2_gpa -> l1_gpa walking the current l1 nested
pagetable, and pass that to the kvm tdp fault path (with the correct
context setup)?
   

If I understand your suggestion correctly, I think thats exactly whats
done in the patches. Some words about the design:

For nested-nested we need to shadow the l1-nested-ptable on the host.
This is done using the vcpu->arch.mmu context which holds the l1 paging
modes while the l2 is running. On a npt-fault from the l2 we just
instrument the shadow-ptable code. This is the common case. because it
happens all the time while the l2 is running.
 

OK, makes sense now, I was missing the fact that the l1-nested-ptable
needs to be shadowed and l1 translations to it must be write protected.
   


Shadow converts (gva -> gpa -> hpa) to (gva -> hpa) or (ngpa -> gpa -> 
hpa) to (ngpa -> hpa) equally well.  In the second case npt still does 
(ngva -> ngpa).
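Avi's point can be made concrete: a shadow table is the composition of two mappings, and the code building it does not care whether the inner mapping is gva -> gpa or ngpa -> gpa. A minimal sketch, assuming toy single-level tables indexed by frame number; `compose_shadow()`, `NFRAMES` and `INVALID` are hypothetical, and real shadow paging fills entries lazily on faults rather than all at once.

```c
#include <stddef.h>
#include <stdint.h>

#define NFRAMES 8          /* toy address-space size, in frames */
#define INVALID UINT32_MAX /* "not mapped" marker */

/*
 * Build a shadow table mapping the outer address (gva, or ngpa in the
 * nested-npt case) straight to hpa by composing:
 *   guest_pt: outer frame -> gpa frame   (owned by the guest)
 *   host_pt:  gpa frame   -> hpa frame   (owned by the host)
 * The hardware then walks only shadow_pt.
 */
static void compose_shadow(const uint32_t guest_pt[NFRAMES],
			   const uint32_t host_pt[NFRAMES],
			   uint32_t shadow_pt[NFRAMES])
{
	for (size_t i = 0; i < NFRAMES; i++) {
		uint32_t gpa = guest_pt[i];

		if (gpa == INVALID || gpa >= NFRAMES)
			shadow_pt[i] = INVALID;      /* guest has no mapping */
		else
			shadow_pt[i] = host_pt[gpa]; /* may itself be INVALID */
	}
}
```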



You should disable out of sync shadow so that l1 guest writes to
l1-nested-ptables always trap.


Why?  The guest is under obligation to flush the tlb if it writes to a 
page table, and we will resync on that tlb flush.


Unsync makes just as much sense for nnpt.  Think of khugepaged in the 
guest eating a page table and spitting out a PDE.
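A toy model of the two policies under discussion, assuming the host mapping is the identity so a shadow entry is simply a copy of the guest entry (0 meaning not present). With write protection, every guest page-table write traps and the shadow entry is invalidated; with out-of-sync shadows the write does not trap, the page is only marked unsync, and it is resynced on the guest's next TLB flush. `struct toy_sp`, `guest_pte_write()` and `tlb_flush()` are hypothetical names for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NPTES 4

/* A guest page table and the host's shadow of it (toy model). */
struct toy_sp {
	uint32_t guest[NPTES];  /* guest-owned entries */
	uint32_t shadow[NPTES]; /* what the hardware actually walks */
	bool unsync;            /* shadow may lag behind guest */
};

/* Guest writes one PTE of this page table. */
static void guest_pte_write(struct toy_sp *sp, size_t idx, uint32_t val,
			    bool allow_unsync)
{
	sp->guest[idx] = val;
	if (allow_unsync)
		sp->unsync = true;    /* no trap: shadow is now stale */
	else
		sp->shadow[idx] = 0;  /* trapped write: zap, refault later */
}

/* Guest flushes the TLB, as it must after editing a page table. */
static void tlb_flush(struct toy_sp *sp)
{
	if (!sp->unsync)
		return;
	for (size_t i = 0; i < NPTES; i++)
		sp->shadow[i] = sp->guest[i];
	sp->unsync = false;
}
```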



And in the trap case, you'd have to
invalidate l2 shadow pagetable entries that used the (now obsolete)
l1-nested-ptable entry. Does that happen automatically?
   


What do you mean by 'l2 shadow ptable entries'?  There are the guest's 
page tables (ordinary direct mapped, unless the guest's guest is also 
running an npt-enabled hypervisor), and the host page tables.  When the 
guest writes to each page table, we invalidate the shadows.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)

2010-03-11 Thread Avi Kivity

On 03/04/2010 05:58 PM, Joerg Roedel wrote:

You probably need to include a flag in base_role to differentiate
between l1 / l2 shadow tables (say if they use the same cr3 value).
 

Not sure if this is necessary. It may be necessary when large pages come
into play. Otherwise the host npt pages are distinguished from the shadow
npt pages by the direct flag.
   


Hm, I think that direct maps for the same gfn can be shared.
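To illustrate the sharing argument: a direct shadow page's contents depend only on the gfn it covers and its level, not on which guest structure led to it, so a lookup keyed on (gfn, role) can hand the same page to l1 and l2 users, while the direct flag alone keeps it apart from a non-direct page for the same gfn. This is a toy cache loosely in the spirit of KVM's role-keyed shadow page lookup; `struct toy_role`, `struct toy_page`, `lookup_sp()` and the fixed-size array are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy role: only the fields relevant to the discussion. */
struct toy_role {
	unsigned level;
	bool direct;   /* maps gfns directly, shadows no guest table */
};

struct toy_page {
	uint64_t gfn;
	struct toy_role role;
	bool used;
};

#define MAX_SP 16
static struct toy_page page_cache[MAX_SP];

static bool role_equal(struct toy_role a, struct toy_role b)
{
	return a.level == b.level && a.direct == b.direct;
}

/* Return the existing page for (gfn, role), or allocate a new one. */
static struct toy_page *lookup_sp(uint64_t gfn, struct toy_role role)
{
	struct toy_page *free_slot = NULL;

	for (size_t i = 0; i < MAX_SP; i++) {
		struct toy_page *sp = &page_cache[i];

		if (sp->used && sp->gfn == gfn && role_equal(sp->role, role))
			return sp;             /* shared by every user */
		if (!sp->used && !free_slot)
			free_slot = sp;
	}
	if (free_slot) {
		free_slot->gfn = gfn;
		free_slot->role = role;
		free_slot->used = true;
	}
	return free_slot;                      /* NULL if the cache is full */
}
```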



Re: ioeventfd usage in KVM

2010-03-11 Thread Avi Kivity

On 03/12/2010 07:08 AM, Cam Macdonell wrote:


+s->ivshmem_mmio_io_addr = cpu_register_io_memory(ivshmem_mmio_read,
+ivshmem_mmio_write, s);
+/* region for registers */
+pci_register_bar(&d->dev, 0, 0x100,
+   PCI_BASE_ADDRESS_SPACE_MEMORY, ivshmem_mmio_map);
   


You've selected the memory address space here.


my basic attempt looks like this:

 struct kvm_ioeventfd ked;

 ked.addr = s->otheraddr + Doorbell;
 ked.len = 4;
 ked.flags = KVM_IOEVENTFD_FLAG_PIO;
 ked.fd = an_eventfd;
 ret = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &ked);
   


But the PIO address space here.


but when the guest writes to the offset of Doorbell, I cannot see any
action (via a select on the fd).  Is there something obviously wrong
that I'm doing?
   


Yes - they must match.  Note: PIO is faster on x86 but nonexistent elsewhere.
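A minimal sketch of a registration whose address space matches the BAR. Because BAR 0 was registered with `PCI_BASE_ADDRESS_SPACE_MEMORY`, the ioeventfd has to be an MMIO one, i.e. `KVM_IOEVENTFD_FLAG_PIO` must not be set. This uses the raw `KVM_IOEVENTFD` ioctl on a VM file descriptor from `<linux/kvm.h>` rather than QEMU's `kvm_vm_ioctl()` wrapper. `fill_ioeventfd()`, `register_mmio_doorbell()`, `bar_addr` and `doorbell_off` are hypothetical, and `bar_addr` is assumed to be the guest-physical address the BAR is currently mapped at.

```c
#include <linux/kvm.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Describe an ioeventfd that fires when the guest writes len bytes at
 * addr.  is_pio must match the BAR's address space: false for a
 * PCI_BASE_ADDRESS_SPACE_MEMORY BAR, true for an I/O BAR. */
static void fill_ioeventfd(struct kvm_ioeventfd *ked, uint64_t addr,
			   uint32_t len, int efd, bool is_pio)
{
	memset(ked, 0, sizeof(*ked));  /* also clears datamatch and pad */
	ked->addr = addr;
	ked->len = len;
	ked->fd = efd;
	ked->flags = is_pio ? KVM_IOEVENTFD_FLAG_PIO : 0;
}

/* Register a 4-byte MMIO doorbell; returns the eventfd, or -1 on error. */
static int register_mmio_doorbell(int vm_fd, uint64_t bar_addr,
				  uint64_t doorbell_off)
{
	struct kvm_ioeventfd ked;
	int efd = eventfd(0, 0);

	if (efd < 0)
		return -1;
	fill_ioeventfd(&ked, bar_addr + doorbell_off, 4, efd, false);
	if (ioctl(vm_fd, KVM_IOEVENTFD, &ked) < 0) {
		close(efd);
		return -1;
	}
	return efd;  /* select()/read() on this fd sees doorbell writes */
}
```

Note that if the guest later moves the BAR, the registration would have to be removed (with `KVM_IOEVENTFD_FLAG_DEASSIGN`) and redone at the new address.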



[PATCH -v2] KVM: fix kvm_coalesced_mmio_init()'s error handling

2010-03-11 Thread Takuya Yoshikawa
This version may be better.

Thanks,
  Takuya

===
kvm_coalesced_mmio_init() keeps holding the addresses of the coalesced mmio
ring page and dev even after it has freed them.

Also, although this function should rarely fail, a failure seems to indicate
that the system is in a serious state.

This patch changes the error handling for this function to fix these issues.

Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
---
 virt/kvm/coalesced_mmio.c |2 ++
 virt/kvm/kvm_main.c   |4 +++-
 2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 5169736..11776b7 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -119,8 +119,10 @@ int kvm_coalesced_mmio_init(struct kvm *kvm)
return ret;
 
 out_free_dev:
+   kvm->coalesced_mmio_dev = NULL;
kfree(dev);
 out_free_page:
+   kvm->coalesced_mmio_ring = NULL;
__free_page(page);
 out_err:
return ret;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e758ef7..9e72067 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -419,7 +419,9 @@ static struct kvm *kvm_create_vm(void)
list_add(&kvm->vm_list, &vm_list);
spin_unlock(&kvm_lock);
 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
-   kvm_coalesced_mmio_init(kvm);
+   r = kvm_coalesced_mmio_init(kvm);
+   if (r < 0)
+   goto out_err;
 #endif
 out:
return kvm;
-- 
1.6.3.3
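The unwind pattern the patch relies on, reduced to a self-contained userspace sketch: each error label releases what was acquired before the failing step, in reverse order, and also clears the pointer that had already been published into the long-lived structure, so later code never sees a dangling address. `struct toy_kvm`, `toy_mmio_init()` and the `fail_register` switch (standing in for a failing late step such as device registration) are hypothetical; `calloc`/`free` stand in for the kernel's page and slab allocators.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the two fields the patch resets. */
struct toy_kvm {
	void *coalesced_mmio_ring;
	void *coalesced_mmio_dev;
};

static int toy_mmio_init(struct toy_kvm *kvm, bool fail_register)
{
	void *page, *dev;
	int ret = -ENOMEM;

	page = calloc(1, 4096);
	if (!page)
		goto out_err;
	kvm->coalesced_mmio_ring = page;   /* published early */

	dev = calloc(1, 64);
	if (!dev)
		goto out_free_page;
	kvm->coalesced_mmio_dev = dev;     /* published early */

	ret = fail_register ? -EINVAL : 0; /* simulated last step */
	if (ret < 0)
		goto out_free_dev;
	return 0;

out_free_dev:
	kvm->coalesced_mmio_dev = NULL;    /* no dangling pointer */
	free(dev);
out_free_page:
	kvm->coalesced_mmio_ring = NULL;
	free(page);
out_err:
	return ret;
}
```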



Re: [PATCH -v2] KVM: fix kvm_coalesced_mmio_init()'s error handling

2010-03-11 Thread Wei Yongjun
Takuya Yoshikawa wrote:
 This version may be better.

 Thanks,
   Takuya

 ===
 kvm_coalesced_mmio_init() keeps holding the addresses of the coalesced mmio
 ring page and dev even after it has freed them.

 Also, although this function should rarely fail, a failure seems to indicate
 that the system is in a serious state.

 This patch changes the error handling for this function to fix these issues.
   

We must also unregister mmu_notifier in the error path.

 Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
 ---
  virt/kvm/coalesced_mmio.c |2 ++
  virt/kvm/kvm_main.c   |4 +++-
  2 files changed, 5 insertions(+), 1 deletions(-)

 diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
 index 5169736..11776b7 100644
 --- a/virt/kvm/coalesced_mmio.c
 +++ b/virt/kvm/coalesced_mmio.c
 @@ -119,8 +119,10 @@ int kvm_coalesced_mmio_init(struct kvm *kvm)
   return ret;
  
  out_free_dev:
 + kvm->coalesced_mmio_dev = NULL;
   kfree(dev);
  out_free_page:
 + kvm->coalesced_mmio_ring = NULL;
   __free_page(page);
  out_err:
   return ret;
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index e758ef7..9e72067 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -419,7 +419,9 @@ static struct kvm *kvm_create_vm(void)
   list_add(&kvm->vm_list, &vm_list);
   spin_unlock(&kvm_lock);
  #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
 - kvm_coalesced_mmio_init(kvm);
 + r = kvm_coalesced_mmio_init(kvm);
 + if (r < 0)
 + goto out_err;
  #endif
  out:
   return kvm;
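Wei's objection concerns the caller: by the time kvm_coalesced_mmio_init() runs in kvm_create_vm(), earlier steps such as the MMU notifier registration have already succeeded, and the diff context shows the VM has also been added to vm_list, so jumping to an error label that skips them leaves those steps in place. A minimal sketch of a caller that undoes completed steps in reverse order; `struct toy_vm` and every function in it are hypothetical, and the real function also has locking to handle.

```c
#include <stdbool.h>

/* Hypothetical VM state: which setup steps have completed. */
struct toy_vm {
	bool notifier_registered;
	bool on_vm_list;
	bool mmio_ready;
};

static int register_notifier(struct toy_vm *vm)
{
	vm->notifier_registered = true;
	return 0;
}

static void unregister_notifier(struct toy_vm *vm)
{
	vm->notifier_registered = false;
}

static int mmio_init(struct toy_vm *vm, bool fail)
{
	if (fail)
		return -1;  /* simulated kvm_coalesced_mmio_init() failure */
	vm->mmio_ready = true;
	return 0;
}

/* Each label undoes, in reverse order, only the steps that completed. */
static int toy_create_vm(struct toy_vm *vm, bool fail_mmio)
{
	int r;

	r = register_notifier(vm);
	if (r < 0)
		goto out_err;
	vm->on_vm_list = true;        /* stands in for list_add() */

	r = mmio_init(vm, fail_mmio);
	if (r < 0)
		goto out_undo;
	return 0;

out_undo:
	vm->on_vm_list = false;       /* stands in for list_del() */
	unregister_notifier(vm);      /* the step Wei says is missing */
out_err:
	return r;
}
```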
   


Re: [PATCH -v2] KVM: fix kvm_coalesced_mmio_init()'s error handling

2010-03-11 Thread Takuya Yoshikawa

Wei Yongjun wrote:

Takuya Yoshikawa wrote:

This version may be better.

Thanks,
  Takuya

===
kvm_coalesced_mmio_init() keeps holding the addresses of the coalesced mmio
ring page and dev even after it has freed them.

Also, although this function should rarely fail, a failure seems to indicate
that the system is in a serious state.

This patch changes the error handling for this function to fix these issues.
  


We must also unregister mmu_notifier in the error path.


Oh, sorry.




Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
---
 virt/kvm/coalesced_mmio.c |2 ++
 virt/kvm/kvm_main.c   |4 +++-
 2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 5169736..11776b7 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -119,8 +119,10 @@ int kvm_coalesced_mmio_init(struct kvm *kvm)
return ret;
 
 out_free_dev:
+   kvm->coalesced_mmio_dev = NULL;
kfree(dev);
 out_free_page:
+   kvm->coalesced_mmio_ring = NULL;
__free_page(page);
 out_err:
return ret;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e758ef7..9e72067 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -419,7 +419,9 @@ static struct kvm *kvm_create_vm(void)
list_add(&kvm->vm_list, &vm_list);
spin_unlock(&kvm_lock);
 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
-   kvm_coalesced_mmio_init(kvm);
+   r = kvm_coalesced_mmio_init(kvm);
+   if (r < 0)
+   goto out_err;
 #endif
 out:
return kvm;
  

