Re: [PATCH 26/27] KVM: PPC: Add KVM intercept handlers

2010-04-15 Thread Benjamin Herrenschmidt
On Fri, 2010-04-16 at 00:11 +0200, Alexander Graf wrote:
> When an interrupt occurs we don't know yet if we're in guest context or
> in host context. When in guest context, KVM needs to handle it.
> 
> So let's pull the same trick we did on Book3S_64: Just add a macro to
> determine if we're in guest context or not and if so jump on to KVM code.
> 
Acked-by: Benjamin Herrenschmidt 

> Signed-off-by: Alexander Graf 
> ---
>  arch/powerpc/kernel/head_32.S |   14 ++
>  1 files changed, 14 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
> index e025e89..98c4b29 100644
> --- a/arch/powerpc/kernel/head_32.S
> +++ b/arch/powerpc/kernel/head_32.S
> @@ -33,6 +33,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /* 601 only have IBAT; cr0.eq is set on 601 when using this macro */
>  #define LOAD_BAT(n, reg, RA, RB) \
> @@ -303,6 +304,7 @@ __secondary_hold_acknowledge:
>   */
>  #define EXCEPTION(n, label, hdlr, xfer)  \
>   . = n;  \
> + DO_KVM n;   \
>  label:   \
>   EXCEPTION_PROLOG;   \
>   addir3,r1,STACK_FRAME_OVERHEAD; \
> @@ -358,6 +360,7 @@ i##n: 
> \
>   *   -- paulus.
>   */
>   . = 0x200
> + DO_KVM  0x200
>   mtspr   SPRN_SPRG_SCRATCH0,r10
>   mtspr   SPRN_SPRG_SCRATCH1,r11
>   mfcrr10
> @@ -381,6 +384,7 @@ i##n: 
> \
>  
>  /* Data access exception. */
>   . = 0x300
> + DO_KVM  0x300
>  DataAccess:
>   EXCEPTION_PROLOG
>   mfspr   r10,SPRN_DSISR
> @@ -397,6 +401,7 @@ DataAccess:
>  
>  /* Instruction access exception. */
>   . = 0x400
> + DO_KVM  0x400
>  InstructionAccess:
>   EXCEPTION_PROLOG
>   andis.  r0,r9,0x4000/* no pte found? */
> @@ -413,6 +418,7 @@ InstructionAccess:
>  
>  /* Alignment exception */
>   . = 0x600
> + DO_KVM  0x600
>  Alignment:
>   EXCEPTION_PROLOG
>   mfspr   r4,SPRN_DAR
> @@ -427,6 +433,7 @@ Alignment:
>  
>  /* Floating-point unavailable */
>   . = 0x800
> + DO_KVM  0x800
>  FPUnavailable:
>  BEGIN_FTR_SECTION
>  /*
> @@ -450,6 +457,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE)
>  
>  /* System call */
>   . = 0xc00
> + DO_KVM  0xc00
>  SystemCall:
>   EXCEPTION_PROLOG
>   EXC_XFER_EE_LITE(0xc00, DoSyscall)
> @@ -467,9 +475,11 @@ SystemCall:
>   * by executing an altivec instruction.
>   */
>   . = 0xf00
> + DO_KVM  0xf00
>   b   PerformanceMonitor
>  
>   . = 0xf20
> + DO_KVM  0xf20
>   b   AltiVecUnavailable
>  
>  /*
> @@ -882,6 +892,10 @@ __secondary_start:
>   RFI
>  #endif /* CONFIG_SMP */
>  
> +#ifdef CONFIG_KVM_BOOK3S_HANDLER
> +#include "../kvm/book3s_rmhandlers.S"
> +#endif
> +
>  /*
>   * Those generic dummy functions are kept for CPUs not
>   * included in CONFIG_6xx


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 24/27] PPC: Export SWITCH_FRAME_SIZE

2010-04-15 Thread Benjamin Herrenschmidt
On Fri, 2010-04-16 at 00:11 +0200, Alexander Graf wrote:
> We need the SWITCH_FRAME_SIZE define on Book3S_32 now too.
> So let's export it unconditionally.
> 

Acked-by: Benjamin Herrenschmidt 

> Signed-off-by: Alexander Graf 
> ---
>  arch/powerpc/kernel/asm-offsets.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/asm-offsets.c 
> b/arch/powerpc/kernel/asm-offsets.c
> index 1804c2c..2716c51 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -210,8 +210,8 @@ int main(void)
>   /* Interrupt register frame */
>   DEFINE(STACK_FRAME_OVERHEAD, STACK_FRAME_OVERHEAD);
>   DEFINE(INT_FRAME_SIZE, STACK_INT_FRAME_SIZE);
> -#ifdef CONFIG_PPC64
>   DEFINE(SWITCH_FRAME_SIZE, STACK_FRAME_OVERHEAD + sizeof(struct 
> pt_regs));
> +#ifdef CONFIG_PPC64
>   /* Create extra stack space for SRR0 and SRR1 when calling prom/rtas. */
>   DEFINE(PROM_FRAME_SIZE, STACK_FRAME_OVERHEAD + sizeof(struct pt_regs) + 
> 16);
>   DEFINE(RTAS_FRAME_SIZE, STACK_FRAME_OVERHEAD + sizeof(struct pt_regs) + 
> 16);




Re: [PATCH 23/27] KVM: PPC: Export MMU variables

2010-04-15 Thread Benjamin Herrenschmidt
On Fri, 2010-04-16 at 00:11 +0200, Alexander Graf wrote:
> Our shadow MMU code needs to know where the HTAB is located and how
> big it is. So we need some variables from the kernel exported to
> module space if KVM is built as a module.

Gross :-) Can't you just read the real SDR1? :-)

Cheers,
Ben.

> CC: Benjamin Herrenschmidt 
> Signed-off-by: Alexander Graf 
> ---
>  arch/powerpc/kernel/ppc_ksyms.c |5 +
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
> index bc9f39d..2b7c43f 100644
> --- a/arch/powerpc/kernel/ppc_ksyms.c
> +++ b/arch/powerpc/kernel/ppc_ksyms.c
> @@ -178,6 +178,11 @@ EXPORT_SYMBOL(switch_mmu_context);
>  extern long mol_trampoline;
>  EXPORT_SYMBOL(mol_trampoline); /* For MOL */
>  EXPORT_SYMBOL(flush_hash_pages); /* For MOL */
> +
> +extern struct hash_pte *Hash;
> +extern unsigned long _SDR1;
> +EXPORT_SYMBOL_GPL(Hash); /* For KVM */
> +EXPORT_SYMBOL_GPL(_SDR1); /* For KVM */
>  #ifdef CONFIG_SMP
>  extern int mmu_hash_lock;
>  EXPORT_SYMBOL(mmu_hash_lock); /* For MOL */




Re: [PATCH 12/27] PPC: Add STLU

2010-04-15 Thread Benjamin Herrenschmidt
On Fri, 2010-04-16 at 00:11 +0200, Alexander Graf wrote:
> For assembly code there are several "long" load and store defines already.
> The one that's missing is the typical stack store, stdu/stwu.
> 
> So let's add that define as well, making my KVM code happy.
> 

Acked-by: Benjamin Herrenschmidt 

> Signed-off-by: Alexander Graf 
> ---
>  arch/powerpc/include/asm/asm-compat.h |2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/asm-compat.h 
> b/arch/powerpc/include/asm/asm-compat.h
> index a9b91ed..2048a6a 100644
> --- a/arch/powerpc/include/asm/asm-compat.h
> +++ b/arch/powerpc/include/asm/asm-compat.h
> @@ -21,6 +21,7 @@
>  /* operations for longs and pointers */
>  #define PPC_LL   stringify_in_c(ld)
>  #define PPC_STL  stringify_in_c(std)
> +#define PPC_STLU stringify_in_c(stdu)
>  #define PPC_LCMPIstringify_in_c(cmpdi)
>  #define PPC_LONG stringify_in_c(.llong)
>  #define PPC_LONG_ALIGN   stringify_in_c(.balign 8)
> @@ -44,6 +45,7 @@
>  /* operations for longs and pointers */
>  #define PPC_LL   stringify_in_c(lwz)
>  #define PPC_STL  stringify_in_c(stw)
> +#define PPC_STLU stringify_in_c(stwu)
>  #define PPC_LCMPIstringify_in_c(cmpwi)
>  #define PPC_LONG stringify_in_c(.long)
>  #define PPC_LONG_ALIGN   stringify_in_c(.balign 4)




Re: [PATCH 05/27] PPC: Split context init/destroy functions

2010-04-15 Thread Benjamin Herrenschmidt
On Fri, 2010-04-16 at 00:11 +0200, Alexander Graf wrote:
> We need to reserve a context from KVM to make sure we have our own
> segment space. While we did that split for Book3S_64 already, 32 bit
> is still outstanding.
> 
> So let's split it now.
> 
> Signed-off-by: Alexander Graf 

Acked-by: Benjamin Herrenschmidt 


> ---
>  arch/powerpc/include/asm/mmu_context.h |2 ++
>  arch/powerpc/mm/mmu_context_hash32.c   |   29 ++---
>  2 files changed, 24 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h 
> b/arch/powerpc/include/asm/mmu_context.h
> index 26383e0..81fb412 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -27,6 +27,8 @@ extern int __init_new_context(void);
>  extern void __destroy_context(int context_id);
>  static inline void mmu_context_init(void) { }
>  #else
> +extern unsigned long __init_new_context(void);
> +extern void __destroy_context(unsigned long context_id);
>  extern void mmu_context_init(void);
>  #endif
>  
> diff --git a/arch/powerpc/mm/mmu_context_hash32.c 
> b/arch/powerpc/mm/mmu_context_hash32.c
> index 0dfba2b..d0ee554 100644
> --- a/arch/powerpc/mm/mmu_context_hash32.c
> +++ b/arch/powerpc/mm/mmu_context_hash32.c
> @@ -60,11 +60,7 @@
>  static unsigned long next_mmu_context;
>  static unsigned long context_map[LAST_CONTEXT / BITS_PER_LONG + 1];
>  
> -
> -/*
> - * Set up the context for a new address space.
> - */
> -int init_new_context(struct task_struct *t, struct mm_struct *mm)
> +unsigned long __init_new_context(void)
>  {
>   unsigned long ctx = next_mmu_context;
>  
> @@ -74,19 +70,38 @@ int init_new_context(struct task_struct *t, struct 
> mm_struct *mm)
>   ctx = 0;
>   }
>   next_mmu_context = (ctx + 1) & LAST_CONTEXT;
> - mm->context.id = ctx;
> +
> + return ctx;
> +}
> +EXPORT_SYMBOL_GPL(__init_new_context);
> +
> +/*
> + * Set up the context for a new address space.
> + */
> +int init_new_context(struct task_struct *t, struct mm_struct *mm)
> +{
> + mm->context.id = __init_new_context();
>  
>   return 0;
>  }
>  
>  /*
> + * Free a context ID. Make sure to call this with preempt disabled!
> + */
> +void __destroy_context(unsigned long ctx)
> +{
> + clear_bit(ctx, context_map);
> +}
> +EXPORT_SYMBOL_GPL(__destroy_context);
> +
> +/*
>   * We're finished using the context for an address space.
>   */
>  void destroy_context(struct mm_struct *mm)
>  {
>   preempt_disable();
>   if (mm->context.id != NO_CONTEXT) {
> - clear_bit(mm->context.id, context_map);
> + __destroy_context(mm->context.id);
>   mm->context.id = NO_CONTEXT;
>   }
>   preempt_enable();




Re: [Autotest] [PATCH] KVM test: Memory ballooning test for KVM guest

2010-04-15 Thread pradeep

Lucas Meneghel Rodrigues wrote:
> Hi Pradeep, I was reading the test once again while trying it myself,
> some other ideas came to me. I spent some time hacking the test and sent
> an updated patch with changes. Please let me know what you think, if you
> are OK with them I'll commit it.

Hi Lucas

Patch looks fine to me. Thanks for your code changes.

--SP


RE: VM performance issue in KVM guests.

2010-04-15 Thread Zhang, Xiantao
Avi Kivity wrote:
> On 04/14/2010 06:24 AM, Zhang, Xiantao wrote:
>> 
> Spin loops need to be addressed first, they are known to kill
> performance in overcommit situations.
> 
> 
 Even in the overcommit case, if vcpu threads of one qemu are not
 scheduled or pulled to the same logical processor, the performance
 drop is tolerable, like Xen's case today. But for KVM, it has to
 suffer additional performance loss, since the host's scheduler
 actively pulls these vcpu threads together.
 
 
 
>>> Can you quantify this loss?  Give examples of what happens?
>>> 
>> For example, one machine is configured with 2 pCPUs and there are
>> two Windows guests running on the machine, and each guest is
>> configured with 2 vcpus and one webbench server runs in it.
>> If use host's default scheduler, webbench's performance is very bad,
>> but if we pin each guest's vCPU0 to pCPU0 and vCPU1 to pCPU1, we can
>> see 5-10X performance improvement with same CPU utilization.  
>> In addition, we also see kvm's perf scalability impacted in
>> large systems: in some performance experiments, kvm's perf begins
>> to drop when vCPUs are overcommitted and pCPUs are saturated, but once
>> the wake_up_affine feature is switched off in the scheduler, kvm's perf
>> can keep rising in this case.
>> 
> 
> Ok.  This is probably due to spinlock contention.

Yes, exactly. 

> When vcpus are pinned to pcpus, there is a 50% chance that a guest's
> vcpus will be co-scheduled and spinlocks will perform well.
> 
> When vcpus are not pinned, but affine wakeups are disabled, there is a
> 33% chance that vcpus will be co-scheduled.
> 
> When vcpus are not pinned and affine wakeups are enabled there is a 0%
> chance that vcpus will be co-scheduled.
> 
> Keeping both vcpus on the same core actually makes sense since they
> can communicate through the local cache faster than across cores. 
> What we need is to make sure that they don't spin.
> 
> Windows 2008 can report spinlock spinning through a hypercall.  Can
> you hook to that interface and see if it happens regularly? 
> Alternatively use a PLE capable host and trace the kvm_vcpu_on_spin()
> function. 
We only tried Windows 2003 for the experiments, and have no data related to
Windows 2008, but maybe we can give it a try later. Anyway, the key point is
that we have to enhance the scheduler to let it know which threads are vcpu
threads, to avoid perf loss in this case.
Xiantao


Re: Question on copy & paste

2010-04-15 Thread Stephen Liu
> You can use higher level layers to handle that in the meantime.  For
> example, I always use rdesktop to connect to my Windows guests and it
> supports copy and paste just fine.

Hi Jim,

Thanks for your advice.


Host - Debian 5.0
Guest - Debian 5.0


I have rdesktop running here, but I can't connect to the guest from the host:

$ rdesktop 192.168.0.30
Autoselected keyboard map en-us
ERROR: 192.168.0.30: unable to connect


On VBox I run;

$ rdesktop 192.168.0.30:3389

It connects. Here I don't know which port should be used.

Could you please shed some light on this. TIA


B.R.
Stephen L




- Original Message 
From: Jim Paris 
To: Stephen Liu 
Cc: kvm@vger.kernel.org
Sent: Thu, April 15, 2010 2:23:08 PM
Subject: Re: Question on copy & paste

Stephen Liu wrote:
> 
> 
> - Original Message 
> From: Amit Shah 
> To: Stephen Liu 
> Cc: kvm@vger.kernel.org
> Sent: Thu, April 15, 2010 9:02:53 AM
> Subject: Re: Question on copy & paste
> 
> On (Thu) Apr 15 2010 [08:45:23], Stephen Liu wrote:
> > Hi folks,
> > 
> > host - Debian 5.04
> > 
> > What will the easy way to enable copy_and_paste function between guest and 
> > hosts?  Also among guests.  TIA
> 
> This doesn't exist yet, but something should be available in a few
> months.
> 
> 
> Noted and thanks

You can use higher level layers to handle that in the meantime.  For
example, I always use rdesktop to connect to my Windows guests and it
supports copy and paste just fine.

-jim




[PATCH] KVM: PPC: Convert u64 -> ulong

2010-04-15 Thread Alexander Graf
There are quite a few pieces of code that I overlooked that still use
u64s instead of longs. This has two side effects:

  1) Slowness
  2) Breakage

This patch fixes both, enabling me to successfully run a Debian guest
on a G4 iBook in KVM.

Signed-off-by: Alexander Graf 

---

Please apply after the just posted series
---
 arch/powerpc/include/asm/kvm_book3s.h |   17 +++--
 arch/powerpc/include/asm/kvm_host.h   |6 +++---
 arch/powerpc/kvm/book3s.c |6 +++---
 arch/powerpc/kvm/book3s_32_mmu.c  |   20 ++--
 arch/powerpc/kvm/book3s_32_mmu_host.c |   25 +
 arch/powerpc/kvm/book3s_64_mmu.c  |4 ++--
 arch/powerpc/kvm/book3s_64_mmu_host.c |   14 +++---
 7 files changed, 49 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 9517b8d..19d278f 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -100,6 +100,8 @@ struct kvmppc_vcpu_book3s {
 #define CONTEXT_GUEST  1
 #define CONTEXT_GUEST_END  2
 
+#ifdef CONFIG_PPC_BOOK3S_64
+
 #define VSID_REAL_DR   0x7ff0ULL
 #define VSID_REAL_IR   0x7fe0ULL
 #define VSID_SPLIT_MASK0x7fe0ULL
@@ -107,9 +109,20 @@ struct kvmppc_vcpu_book3s {
 #define VSID_BAT   0x7fb0ULL
 #define VSID_PR0x8000ULL
 
-extern void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, u64 ea, u64 ea_mask);
+#else /* Only have 32 bits VSIDs */
+
+#define VSID_REAL_DR   0x7ff0
+#define VSID_REAL_IR   0x7fe0
+#define VSID_SPLIT_MASK0x7fe0
+#define VSID_REAL  0x7fc0
+#define VSID_BAT   0x7fb0
+#define VSID_PR0x8000
+
+#endif
+
+extern void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong ea, ulong 
ea_mask);
 extern void kvmppc_mmu_pte_vflush(struct kvm_vcpu *vcpu, u64 vp, u64 vp_mask);
-extern void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, u64 pa_start, u64 
pa_end);
+extern void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, ulong pa_start, ulong 
pa_end);
 extern void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 new_msr);
 extern void kvmppc_mmu_book3s_64_init(struct kvm_vcpu *vcpu);
 extern void kvmppc_mmu_book3s_32_init(struct kvm_vcpu *vcpu);
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 5a83995..69a9ba2 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -124,9 +124,9 @@ struct kvm_arch {
 };
 
 struct kvmppc_pte {
-   u64 eaddr;
+   ulong eaddr;
u64 vpage;
-   u64 raddr;
+   ulong raddr;
bool may_read   : 1;
bool may_write  : 1;
bool may_execute: 1;
@@ -145,7 +145,7 @@ struct kvmppc_mmu {
int  (*xlate)(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte 
*pte, bool data);
void (*reset_msr)(struct kvm_vcpu *vcpu);
void (*tlbie)(struct kvm_vcpu *vcpu, ulong addr, bool large);
-   int  (*esid_to_vsid)(struct kvm_vcpu *vcpu, u64 esid, u64 *vsid);
+   int  (*esid_to_vsid)(struct kvm_vcpu *vcpu, ulong esid, ulong *vsid);
u64  (*ea_to_vp)(struct kvm_vcpu *vcpu, gva_t eaddr, bool data);
bool (*is_dcbz32)(struct kvm_vcpu *vcpu);
 };
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 5805f99..a7de709 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -812,12 +812,12 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
 * so we can't use the NX bit inside the guest. 
Let's cross our fingers,
 * that no guest that needs the dcbz hack does NX.
 */
-   kvmppc_mmu_pte_flush(vcpu, kvmppc_get_pc(vcpu), 
~0xFFFULL);
+   kvmppc_mmu_pte_flush(vcpu, kvmppc_get_pc(vcpu), 
~0xFFFUL);
r = RESUME_GUEST;
} else {
vcpu->arch.msr |= to_svcpu(vcpu)->shadow_srr1 & 
0x5800;
kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
-   kvmppc_mmu_pte_flush(vcpu, kvmppc_get_pc(vcpu), 
~0xFFFULL);
+   kvmppc_mmu_pte_flush(vcpu, kvmppc_get_pc(vcpu), 
~0xFFFUL);
r = RESUME_GUEST;
}
break;
@@ -843,7 +843,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu 
*vcpu,
vcpu->arch.dear = dar;
to_book3s(vcpu)->dsisr = to_svcpu(vcpu)->fault_dsisr;
kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
-   kvmppc_mmu_pte_flush(vcpu, vcpu->arch.dear, ~0xFFFULL);
+   kvmppc_mmu_pte_flush(vcpu, vcpu->arch.dear, ~0xFFFUL);
r = RESUME_GUEST;
}
break;
diff --git a/arch/

[PATCH 02/27] KVM: PPC: Add host MMU Support

2010-04-15 Thread Alexander Graf
In order to support 32 bit Book3S, we need to add code to enable our
shadow MMU to actually add shadow PTEs. This is the module enabling
that support.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_32_mmu_host.c |  480 +
 1 files changed, 480 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_32_mmu_host.c

diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c 
b/arch/powerpc/kvm/book3s_32_mmu_host.c
new file mode 100644
index 000..ce1bfb1
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_32_mmu_host.c
@@ -0,0 +1,480 @@
+/*
+ * Copyright (C) 2010 SUSE Linux Products GmbH. All rights reserved.
+ *
+ * Authors:
+ * Alexander Graf 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* #define DEBUG_MMU */
+/* #define DEBUG_SR */
+
+#ifdef DEBUG_MMU
+#define dprintk_mmu(a, ...) printk(KERN_INFO a, __VA_ARGS__)
+#else
+#define dprintk_mmu(a, ...) do { } while(0)
+#endif
+
+#ifdef DEBUG_SR
+#define dprintk_sr(a, ...) printk(KERN_INFO a, __VA_ARGS__)
+#else
+#define dprintk_sr(a, ...) do { } while(0)
+#endif
+
+#if PAGE_SHIFT != 12
+#error Unknown page size
+#endif
+
+#ifdef CONFIG_SMP
+#error XXX need to grab mmu_hash_lock
+#endif
+
+#ifdef CONFIG_PTE_64BIT
+#error Only 32 bit pages are supported for now
+#endif
+
+static void invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte)
+{
+   volatile u32 *pteg;
+
+   dprintk_mmu("KVM: Flushing SPTE: 0x%llx (0x%llx) -> 0x%llx\n",
+   pte->pte.eaddr, pte->pte.vpage, pte->host_va);
+
+   pteg = (u32*)pte->slot;
+
+   pteg[0] = 0;
+   asm volatile ("sync");
+   asm volatile ("tlbie %0" : : "r" (pte->pte.eaddr) : "memory");
+   asm volatile ("sync");
+   asm volatile ("tlbsync");
+
+   pte->host_va = 0;
+
+   if (pte->pte.may_write)
+   kvm_release_pfn_dirty(pte->pfn);
+   else
+   kvm_release_pfn_clean(pte->pfn);
+}
+
+void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, u64 _guest_ea, u64 _ea_mask)
+{
+   int i;
+   u32 guest_ea = _guest_ea;
+   u32 ea_mask = _ea_mask;
+
+   dprintk_mmu("KVM: Flushing %d Shadow PTEs: 0x%x & 0x%x\n",
+   vcpu->arch.hpte_cache_offset, guest_ea, ea_mask);
+   BUG_ON(vcpu->arch.hpte_cache_offset > HPTEG_CACHE_NUM);
+
+   guest_ea &= ea_mask;
+   for (i = 0; i < vcpu->arch.hpte_cache_offset; i++) {
+   struct hpte_cache *pte;
+
+   pte = &vcpu->arch.hpte_cache[i];
+   if (!pte->host_va)
+   continue;
+
+   if ((pte->pte.eaddr & ea_mask) == guest_ea) {
+   invalidate_pte(vcpu, pte);
+   }
+   }
+
+   /* Doing a complete flush -> start from scratch */
+   if (!ea_mask)
+   vcpu->arch.hpte_cache_offset = 0;
+}
+
+void kvmppc_mmu_pte_vflush(struct kvm_vcpu *vcpu, u64 guest_vp, u64 vp_mask)
+{
+   int i;
+
+   dprintk_mmu("KVM: Flushing %d Shadow vPTEs: 0x%llx & 0x%llx\n",
+   vcpu->arch.hpte_cache_offset, guest_vp, vp_mask);
+   BUG_ON(vcpu->arch.hpte_cache_offset > HPTEG_CACHE_NUM);
+
+   guest_vp &= vp_mask;
+   for (i = 0; i < vcpu->arch.hpte_cache_offset; i++) {
+   struct hpte_cache *pte;
+
+   pte = &vcpu->arch.hpte_cache[i];
+   if (!pte->host_va)
+   continue;
+
+   if ((pte->pte.vpage & vp_mask) == guest_vp) {
+   invalidate_pte(vcpu, pte);
+   }
+   }
+}
+
+void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, u64 pa_start, u64 pa_end)
+{
+   int i;
+
+   dprintk_mmu("KVM: Flushing %d Shadow pPTEs: 0x%llx & 0x%llx\n",
+   vcpu->arch.hpte_cache_offset, pa_start, pa_end);
+   BUG_ON(vcpu->arch.hpte_cache_offset > HPTEG_CACHE_NUM);
+
+   for (i = 0; i < vcpu->arch.hpte_cache_offset; i++) {
+   struct hpte_cache *pte;
+
+   pte = &vcpu->arch.hpte_cache[i];
+   if (!pte->host_va)
+   continue;
+
+   if ((pte->pte.raddr >= pa_start) &&
+   (pte->pte.raddr < pa_end)) {
+   invalidate_pte(vcpu, pte);
+   }
+   }
+}
+
+struct kvmppc_pte *kvmppc_mmu_find_

[PATCH 14/27] KVM: PPC: Extract MMU init

2010-04-15 Thread Alexander Graf
The host shadow mmu code needs to get initialized. It needs to fetch a
segment it can use to put shadow PTEs into.

That initialization code was in generic code, which is icky. Let's move
it over to the respective MMU file.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_ppc.h|1 +
 arch/powerpc/kvm/book3s.c |8 +---
 arch/powerpc/kvm/book3s_64_mmu_host.c |   18 ++
 3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index edade84..18d139e 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -69,6 +69,7 @@ extern void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, 
gpa_t gpaddr,
 extern void kvmppc_mmu_priv_switch(struct kvm_vcpu *vcpu, int usermode);
 extern void kvmppc_mmu_switch_pid(struct kvm_vcpu *vcpu, u32 pid);
 extern void kvmppc_mmu_destroy(struct kvm_vcpu *vcpu);
+extern int kvmppc_mmu_init(struct kvm_vcpu *vcpu);
 extern int kvmppc_mmu_dtlb_index(struct kvm_vcpu *vcpu, gva_t eaddr);
 extern int kvmppc_mmu_itlb_index(struct kvm_vcpu *vcpu, gva_t eaddr);
 extern gpa_t kvmppc_mmu_xlate(struct kvm_vcpu *vcpu, unsigned int gtlb_index,
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 6eb2da2..b917b97 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1201,14 +1201,9 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm 
*kvm, unsigned int id)
 
vcpu->arch.shadow_msr = MSR_USER64;
 
-   err = __init_new_context();
+   err = kvmppc_mmu_init(vcpu);
if (err < 0)
goto free_shadow_vcpu;
-   vcpu_book3s->context_id = err;
-
-   vcpu_book3s->vsid_max = ((vcpu_book3s->context_id + 1) << 
USER_ESID_BITS) - 1;
-   vcpu_book3s->vsid_first = vcpu_book3s->context_id << USER_ESID_BITS;
-   vcpu_book3s->vsid_next = vcpu_book3s->vsid_first;
 
return vcpu;
 
@@ -1224,7 +1219,6 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
 {
struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
 
-   __destroy_context(vcpu_book3s->context_id);
kvm_vcpu_uninit(vcpu);
kfree(vcpu_book3s->shadow_vcpu);
vfree(vcpu_book3s);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c 
b/arch/powerpc/kvm/book3s_64_mmu_host.c
index b0f5b4e..0eea589 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -405,4 +405,22 @@ void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu)
 void kvmppc_mmu_destroy(struct kvm_vcpu *vcpu)
 {
kvmppc_mmu_pte_flush(vcpu, 0, 0);
+   __destroy_context(to_book3s(vcpu)->context_id);
+}
+
+int kvmppc_mmu_init(struct kvm_vcpu *vcpu)
+{
+   struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu);
+   int err;
+
+   err = __init_new_context();
+   if (err < 0)
+   return -1;
+   vcpu3s->context_id = err;
+
+   vcpu3s->vsid_max = ((vcpu3s->context_id + 1) << USER_ESID_BITS) - 1;
+   vcpu3s->vsid_first = vcpu3s->context_id << USER_ESID_BITS;
+   vcpu3s->vsid_next = vcpu3s->vsid_first;
+
+   return 0;
 }
-- 
1.6.0.2



[PATCH 13/27] KVM: PPC: Use now shadowed vcpu fields

2010-04-15 Thread Alexander Graf
The shadow vcpu now contains some fields we don't use from the vcpu anymore.
Access to them happens using inline functions that happily use the shadow
vcpu fields.

So let's now #ifdef them out for BookE only and add the asm-offsets.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_host.h |8 +--
 arch/powerpc/include/asm/paca.h |6 --
 arch/powerpc/kernel/asm-offsets.c   |   91 +--
 3 files changed, 57 insertions(+), 48 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 22801f8..5a83995 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -191,11 +191,11 @@ struct kvm_vcpu_arch {
u32 qpr[32];
 #endif
 
+#ifdef CONFIG_BOOKE
ulong pc;
ulong ctr;
ulong lr;
 
-#ifdef CONFIG_BOOKE
ulong xer;
u32 cr;
 #endif
@@ -203,7 +203,6 @@ struct kvm_vcpu_arch {
ulong msr;
 #ifdef CONFIG_PPC_BOOK3S
ulong shadow_msr;
-   ulong shadow_srr1;
ulong hflags;
ulong guest_owned_ext;
 #endif
@@ -258,14 +257,13 @@ struct kvm_vcpu_arch {
struct dentry *debugfs_exit_timing;
 #endif
 
+#ifdef CONFIG_BOOKE
u32 last_inst;
-#ifdef CONFIG_PPC64
-   u32 fault_dsisr;
-#endif
ulong fault_dear;
ulong fault_esr;
ulong queued_dear;
ulong queued_esr;
+#endif
gpa_t paddr_accessed;
 
u8 io_gpr; /* GPR used as IO source/target */
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 33347ea..224eb37 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -137,14 +137,8 @@ struct paca_struct {
u64 startspurr; /* SPURR value snapshot */
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
-   struct  {
-   u64 esid;
-   u64 vsid;
-   } kvm_slb[64];  /* guest SLB */
/* We use this to store guest state in */
struct kvmppc_book3s_shadow_vcpu shadow_vcpu;
-   u8 kvm_slb_max; /* highest used guest slb entry */
-   u8 kvm_in_guest;/* are we inside the guest? */
 #endif
 };
 
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 57a8c49..e8003ff 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -50,6 +50,9 @@
 #endif
 #ifdef CONFIG_KVM
 #include 
+#ifndef CONFIG_BOOKE
+#include 
+#endif
 #endif
 
 #ifdef CONFIG_PPC32
@@ -191,33 +194,9 @@ int main(void)
DEFINE(PACA_DATA_OFFSET, offsetof(struct paca_struct, data_offset));
DEFINE(PACA_TRAP_SAVE, offsetof(struct paca_struct, trap_save));
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
-   DEFINE(PACA_KVM_IN_GUEST, offsetof(struct paca_struct, kvm_in_guest));
-   DEFINE(PACA_KVM_SLB, offsetof(struct paca_struct, kvm_slb));
-   DEFINE(PACA_KVM_SLB_MAX, offsetof(struct paca_struct, kvm_slb_max));
-   DEFINE(PACA_KVM_CR, offsetof(struct paca_struct, shadow_vcpu.cr));
-   DEFINE(PACA_KVM_XER, offsetof(struct paca_struct, shadow_vcpu.xer));
-   DEFINE(PACA_KVM_R0, offsetof(struct paca_struct, shadow_vcpu.gpr[0]));
-   DEFINE(PACA_KVM_R1, offsetof(struct paca_struct, shadow_vcpu.gpr[1]));
-   DEFINE(PACA_KVM_R2, offsetof(struct paca_struct, shadow_vcpu.gpr[2]));
-   DEFINE(PACA_KVM_R3, offsetof(struct paca_struct, shadow_vcpu.gpr[3]));
-   DEFINE(PACA_KVM_R4, offsetof(struct paca_struct, shadow_vcpu.gpr[4]));
-   DEFINE(PACA_KVM_R5, offsetof(struct paca_struct, shadow_vcpu.gpr[5]));
-   DEFINE(PACA_KVM_R6, offsetof(struct paca_struct, shadow_vcpu.gpr[6]));
-   DEFINE(PACA_KVM_R7, offsetof(struct paca_struct, shadow_vcpu.gpr[7]));
-   DEFINE(PACA_KVM_R8, offsetof(struct paca_struct, shadow_vcpu.gpr[8]));
-   DEFINE(PACA_KVM_R9, offsetof(struct paca_struct, shadow_vcpu.gpr[9]));
-   DEFINE(PACA_KVM_R10, offsetof(struct paca_struct, shadow_vcpu.gpr[10]));
-   DEFINE(PACA_KVM_R11, offsetof(struct paca_struct, shadow_vcpu.gpr[11]));
-   DEFINE(PACA_KVM_R12, offsetof(struct paca_struct, shadow_vcpu.gpr[12]));
-   DEFINE(PACA_KVM_R13, offsetof(struct paca_struct, shadow_vcpu.gpr[13]));
-   DEFINE(PACA_KVM_HOST_R1, offsetof(struct paca_struct, 
shadow_vcpu.host_r1));
-   DEFINE(PACA_KVM_HOST_R2, offsetof(struct paca_struct, 
shadow_vcpu.host_r2));
-   DEFINE(PACA_KVM_VMHANDLER, offsetof(struct paca_struct,
-   shadow_vcpu.vmhandler));
-   DEFINE(PACA_KVM_SCRATCH0, offsetof(struct paca_struct,
-  shadow_vcpu.scratch0));
-   DEFINE(PACA_KVM_SCRATCH1, offsetof(struct paca_struct,
-  shadow_vcpu.scratch1));
+   DEFINE(PACA_KVM_SVCPU, offsetof(struct paca_struct, shadow_vcpu));
+   DEFINE(SVCPU_SLB, offsetof(struct kvmppc_book3s_shadow_vcpu, slb));
+   DE

[PATCH 17/27] KVM: PPC: Make SLB switching code the new segment framework

2010-04-15 Thread Alexander Graf
We just introduced generic segment switching code that only needs to call
small macros to do the actual switching, but keeps most of the entry / exit
code generic.

So let's move the SLB switching code over to use this new mechanism.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_64_slb.S |  183 +-
 arch/powerpc/kvm/book3s_rmhandlers.S |2 +-
 2 files changed, 25 insertions(+), 160 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_slb.S b/arch/powerpc/kvm/book3s_64_slb.S
index 0919679..04e7d3b 100644
--- a/arch/powerpc/kvm/book3s_64_slb.S
+++ b/arch/powerpc/kvm/book3s_64_slb.S
@@ -44,8 +44,7 @@ slb_exit_skip_ ## num:
  **
  */
 
-.global kvmppc_handler_trampoline_enter
-kvmppc_handler_trampoline_enter:
+.macro LOAD_GUEST_SEGMENTS
 
/* Required state:
 *
@@ -53,20 +52,14 @@ kvmppc_handler_trampoline_enter:
 * R13 = PACA
 * R1 = host R1
 * R2 = host R2
-* R9 = guest IP
-* R10 = guest MSR
-* all other GPRS = free
-* PACA[KVM_CR] = guest CR
-* PACA[KVM_XER] = guest XER
+* R3 = shadow vcpu
+* all other volatile GPRS = free
+* SVCPU[CR]  = guest CR
+* SVCPU[XER] = guest XER
+* SVCPU[CTR] = guest CTR
+* SVCPU[LR]  = guest LR
 */
 
-   mtsrr0  r9
-   mtsrr1  r10
-
-   /* Activate guest mode, so faults get handled by KVM */
-   li  r11, KVM_GUEST_MODE_GUEST
-   stb r11, PACA_KVM_IN_GUEST(r13)
-
/* Remove LPAR shadow entries */
 
 #if SLB_NUM_BOLTED == 3
@@ -101,14 +94,14 @@ kvmppc_handler_trampoline_enter:
 
/* Fill SLB with our shadow */
 
-   lbz r12, PACA_KVM_SLB_MAX(r13)
+   lbz r12, SVCPU_SLB_MAX(r3)
mulli   r12, r12, 16
-   addir12, r12, PACA_KVM_SLB
-   add r12, r12, r13
+   addir12, r12, SVCPU_SLB
+   add r12, r12, r3
 
/* for (r11 = kvm_slb; r11 < kvm_slb + kvm_slb_size; r11+=slb_entry) */
-   li  r11, PACA_KVM_SLB
-   add r11, r11, r13
+   li  r11, SVCPU_SLB
+   add r11, r11, r3
 
 slb_loop_enter:
 
@@ -127,34 +120,7 @@ slb_loop_enter_skip:
 
 slb_do_enter:
 
-   /* Enter guest */
-
-   ld  r0, (PACA_KVM_R0)(r13)
-   ld  r1, (PACA_KVM_R1)(r13)
-   ld  r2, (PACA_KVM_R2)(r13)
-   ld  r3, (PACA_KVM_R3)(r13)
-   ld  r4, (PACA_KVM_R4)(r13)
-   ld  r5, (PACA_KVM_R5)(r13)
-   ld  r6, (PACA_KVM_R6)(r13)
-   ld  r7, (PACA_KVM_R7)(r13)
-   ld  r8, (PACA_KVM_R8)(r13)
-   ld  r9, (PACA_KVM_R9)(r13)
-   ld  r10, (PACA_KVM_R10)(r13)
-   ld  r12, (PACA_KVM_R12)(r13)
-
-   lwz r11, (PACA_KVM_CR)(r13)
-   mtcrr11
-
-   lwz r11, (PACA_KVM_XER)(r13)
-   mtxer   r11
-
-   ld  r11, (PACA_KVM_R11)(r13)
-   ld  r13, (PACA_KVM_R13)(r13)
-
-   RFI
-kvmppc_handler_trampoline_enter_end:
-
-
+.endm
 
 /**
  **
@@ -162,99 +128,22 @@ kvmppc_handler_trampoline_enter_end:
  **
  */
 
-.global kvmppc_handler_trampoline_exit
-kvmppc_handler_trampoline_exit:
+.macro LOAD_HOST_SEGMENTS
 
/* Register usage at this point:
 *
-* SPRG_SCRATCH0 = guest R13
-* R12   = exit handler id
-* R13   = PACA
-* PACA.KVM.SCRATCH0 = guest R12
-* PACA.KVM.SCRATCH1 = guest CR
+* R1 = host R1
+* R2 = host R2
+* R12= exit handler id
+* R13= shadow vcpu - SHADOW_VCPU_OFF [=PACA on PPC64]
+* SVCPU.*= guest *
+* SVCPU[CR]  = guest CR
+* SVCPU[XER] = guest XER
+* SVCPU[CTR] = guest CTR
+* SVCPU[LR]  = guest LR
 *
 */
 
-   /* Save registers */
-
-   std r0, PACA_KVM_R0(r13)
-   std r1, PACA_KVM_R1(r13)
-   std r2, PACA_KVM_R2(r13)
-   std r3, PACA_KVM_R3(r13)
-   std r4, PACA_KVM_R4(r13)
-   std r5, PACA_KVM_R5(r13)
-   std r6, PACA_KVM_R6(r13)
-   std r7, PACA_KVM_R7(r13)
-   std r8, PACA_KVM_R8(r13)
-   std r9, PACA_KVM_R9(r13)
-   std r10, PACA_KVM_R10(r13)
-   std r11, PACA_KVM_R11(r13)
-
-   /* Restore R1/R2 so we can handle faults */
-   ld  r1, PACA_KVM_HOST_R1(r13)
-   ld  r2, PACA_KVM_HOST_R2(r13)
-
-   /* Save guest PC and MSR in GPRs */
-   mfsrr0  r3
-   mfsrr1  r4
-
-   /* Get scratch'ed off registers */

[PATCH 15/27] KVM: PPC: Make real mode handler generic

2010-04-15 Thread Alexander Graf
The real mode handler code was originally written for 64 bit Book3S only.
But since we now add 32 bit functionality too, we need to make some tweaks
to it.

This patch basically switches the code over to the "long" access defines and
to the fields of the shadow VCPU we just moved there.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_rmhandlers.S |  119 +-
 1 files changed, 88 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_rmhandlers.S 
b/arch/powerpc/kvm/book3s_rmhandlers.S
index bd08535..0c8d331 100644
--- a/arch/powerpc/kvm/book3s_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_rmhandlers.S
@@ -22,7 +22,10 @@
 #include 
 #include 
 #include 
+
+#ifdef CONFIG_PPC_BOOK3S_64
 #include 
+#endif
 
 /*
  *   *
@@ -30,6 +33,39 @@
  *   *
  /
 
+#if defined(CONFIG_PPC_BOOK3S_64)
+
+#define LOAD_SHADOW_VCPU(reg)  \
+   mfspr   reg, SPRN_SPRG_PACA
+
+#define SHADOW_VCPU_OFFPACA_KVM_SVCPU
+#define MSR_NOIRQ  MSR_KERNEL & ~(MSR_IR | MSR_DR)
+#define FUNC(name) GLUE(.,name)
+
+#elif defined(CONFIG_PPC_BOOK3S_32)
+
+#define LOAD_SHADOW_VCPU(reg)  \
+   mfspr   reg, SPRN_SPRG_THREAD;  \
+   lwz reg, THREAD_KVM_SVCPU(reg); \
+   /* PPC32 can have a NULL pointer - let's check for that */  \
+   mtspr   SPRN_SPRG_SCRATCH1, r12;/* Save r12 */  \
+   mfcrr12;\
+   cmpwi   reg, 0; \
+   bne 1f; \
+   mfspr   reg, SPRN_SPRG_SCRATCH0;\
+   mtcrr12;\
+   mfspr   r12, SPRN_SPRG_SCRATCH1;\
+   b   kvmppc_resume_\intno;   \
+1:;\
+   mtcrr12;\
+   mfspr   r12, SPRN_SPRG_SCRATCH1;\
+   tophys(reg, reg)
+
+#define SHADOW_VCPU_OFF0
+#define MSR_NOIRQ  MSR_KERNEL
+#define FUNC(name) name
+
+#endif
 
 .macro INTERRUPT_TRAMPOLINE intno
 
@@ -42,19 +78,19 @@ kvmppc_trampoline_\intno:
 * First thing to do is to find out if we're coming
 * from a KVM guest or a Linux process.
 *
-* To distinguish, we check a magic byte in the PACA
+* To distinguish, we check a magic byte in the PACA/current
 */
-   mfspr   r13, SPRN_SPRG_PACA /* r13 = PACA */
-   std r12, PACA_KVM_SCRATCH0(r13)
+   LOAD_SHADOW_VCPU(r13)
+   PPC_STL r12, (SHADOW_VCPU_OFF + SVCPU_SCRATCH0)(r13)
mfcrr12
-   stw r12, PACA_KVM_SCRATCH1(r13)
-   lbz r12, PACA_KVM_IN_GUEST(r13)
+   stw r12, (SHADOW_VCPU_OFF + SVCPU_SCRATCH1)(r13)
+   lbz r12, (SHADOW_VCPU_OFF + SVCPU_IN_GUEST)(r13)
cmpwi   r12, KVM_GUEST_MODE_NONE
bne ..kvmppc_handler_hasmagic_\intno
/* No KVM guest? Then jump back to the Linux handler! */
-   lwz r12, PACA_KVM_SCRATCH1(r13)
+   lwz r12, (SHADOW_VCPU_OFF + SVCPU_SCRATCH1)(r13)
mtcrr12
-   ld  r12, PACA_KVM_SCRATCH0(r13)
+   PPC_LL  r12, (SHADOW_VCPU_OFF + SVCPU_SCRATCH0)(r13)
mfspr   r13, SPRN_SPRG_SCRATCH0 /* r13 = original r13 */
b   kvmppc_resume_\intno/* Get back original handler */
 
@@ -76,9 +112,7 @@ kvmppc_trampoline_\intno:
 INTERRUPT_TRAMPOLINE   BOOK3S_INTERRUPT_SYSTEM_RESET
 INTERRUPT_TRAMPOLINE   BOOK3S_INTERRUPT_MACHINE_CHECK
 INTERRUPT_TRAMPOLINE   BOOK3S_INTERRUPT_DATA_STORAGE
-INTERRUPT_TRAMPOLINE   BOOK3S_INTERRUPT_DATA_SEGMENT
 INTERRUPT_TRAMPOLINE   BOOK3S_INTERRUPT_INST_STORAGE
-INTERRUPT_TRAMPOLINE   BOOK3S_INTERRUPT_INST_SEGMENT
 INTERRUPT_TRAMPOLINE   BOOK3S_INTERRUPT_EXTERNAL
 INTERRUPT_TRAMPOLINE   BOOK3S_INTERRUPT_ALIGNMENT
 INTERRUPT_TRAMPOLINE   BOOK3S_INTERRUPT_PROGRAM
@@ -88,7 +122,14 @@ INTERRUPT_TRAMPOLINEBOOK3S_INTERRUPT_SYSCALL
 INTERRUPT_TRAMPOLINE   BOOK3S_INTERRUPT_TRACE
 INTERRUPT_TRAMPOLINE   BOOK3S_INTERRUPT_PERFMON
 INTERRUPT_TRAMPOLINE   BOOK3S_INTERRUPT_ALTIVEC
+
+/* Those are only available on 64 bit machines */
+
+#ifdef CONFIG_PPC_BOOK3S_64
+INTERRUPT_TRAMPOLINE   BOOK3S_INTERRUPT_DATA_SEGMENT
+INTERRUPT_TRAMPOLINE   BOOK3S_INTERRUPT_INST_SEGMENT
 INTERRUPT_TRAMPOLINE   BOOK3S_INTERRUPT_VSX
+#endif

[PATCH 08/27] KVM: PPC: Add fields to shadow vcpu

2010-04-15 Thread Alexander Graf
After a lot of thought on how to make the entry / exit code easier,
I figured it'd be clever to put even more register state into the
shadow vcpu. That way we have more registers available to use, making
the code easier to read.

So this patch adds a few new fields to that shadow vcpu. Later on we
will remove the originals from the vcpu and paca.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s_asm.h |   21 +
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h 
b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 183461b..e915e7d 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -63,12 +63,33 @@ struct kvmppc_book3s_shadow_vcpu {
ulong gpr[14];
u32 cr;
u32 xer;
+
+   u32 fault_dsisr;
+   u32 last_inst;
+   ulong ctr;
+   ulong lr;
+   ulong pc;
+   ulong shadow_srr1;
+   ulong fault_dar;
+
ulong host_r1;
ulong host_r2;
ulong handler;
ulong scratch0;
ulong scratch1;
ulong vmhandler;
+   u8 in_guest;
+
+#ifdef CONFIG_PPC_BOOK3S_32
+   u32 sr[16]; /* Guest SRs */
+#endif
+#ifdef CONFIG_PPC_BOOK3S_64
+   u8 slb_max; /* highest used guest slb entry */
+   struct  {
+   u64 esid;
+   u64 vsid;
+   } slb[64];  /* guest SLB */
+#endif
 };
 
 #endif /*__ASSEMBLY__ */
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 06/27] KVM: PPC: Add kvm_book3s_64.h

2010-04-15 Thread Alexander Graf
In the process of generalizing as much code as possible, I also moved
the shadow vcpu code together to a generic book3s file. Unfortunately
the location of the shadow vcpu is different on 32 and 64 bit, so we
need a wrapper function to tell us where it is.

That sounded like a perfect fit for a subarch specific header file.
Here we can put anything that needs to be different between those two.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   28 
 1 files changed, 28 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_64.h

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
new file mode 100644
index 000..4cadd61
--- /dev/null
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -0,0 +1,28 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2010
+ *
+ * Authors: Alexander Graf 
+ */
+
+#ifndef __ASM_KVM_BOOK3S_64_H__
+#define __ASM_KVM_BOOK3S_64_H__
+
+static inline struct kvmppc_book3s_shadow_vcpu *to_svcpu(struct kvm_vcpu *vcpu)
+{
+   return &get_paca()->shadow_vcpu;
+}
+
+#endif /* __ASM_KVM_BOOK3S_64_H__ */
-- 
1.6.0.2



[PATCH 11/27] KVM: PPC: Use CONFIG_PPC_BOOK3S define

2010-04-15 Thread Alexander Graf
Upstream recently added a new name for PPC64: Book3S_64.

So instead of using CONFIG_PPC64 we should use CONFIG_PPC_BOOK3S consistently.
That makes understanding the code easier (I hope).

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_host.h |8 
 arch/powerpc/kernel/asm-offsets.c   |6 +++---
 arch/powerpc/kvm/Kconfig|2 +-
 arch/powerpc/kvm/emulate.c  |6 +++---
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 5869a48..22801f8 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -66,7 +66,7 @@ struct kvm_vcpu_stat {
u32 dec_exits;
u32 ext_intr_exits;
u32 halt_wakeup;
-#ifdef CONFIG_PPC64
+#ifdef CONFIG_PPC_BOOK3S
u32 pf_storage;
u32 pf_instruc;
u32 sp_storage;
@@ -160,7 +160,7 @@ struct hpte_cache {
 struct kvm_vcpu_arch {
ulong host_stack;
u32 host_pid;
-#ifdef CONFIG_PPC64
+#ifdef CONFIG_PPC_BOOK3S
ulong host_msr;
ulong host_r2;
void *host_retip;
@@ -201,7 +201,7 @@ struct kvm_vcpu_arch {
 #endif
 
ulong msr;
-#ifdef CONFIG_PPC64
+#ifdef CONFIG_PPC_BOOK3S
ulong shadow_msr;
ulong shadow_srr1;
ulong hflags;
@@ -283,7 +283,7 @@ struct kvm_vcpu_arch {
u64 dec_jiffies;
unsigned long pending_exceptions;
 
-#ifdef CONFIG_PPC64
+#ifdef CONFIG_PPC_BOOK3S
struct hpte_cache hpte_cache[HPTEG_CACHE_NUM];
int hpte_cache_offset;
 #endif
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 957ceb7..57a8c49 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -426,8 +426,8 @@ int main(void)
DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, arch.fault_dear));
DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr));
 
-   /* book3s_64 */
-#ifdef CONFIG_PPC64
+   /* book3s */
+#ifdef CONFIG_PPC_BOOK3S
DEFINE(VCPU_FAULT_DSISR, offsetof(struct kvm_vcpu, arch.fault_dsisr));
DEFINE(VCPU_HOST_RETIP, offsetof(struct kvm_vcpu, arch.host_retip));
DEFINE(VCPU_HOST_R2, offsetof(struct kvm_vcpu, arch.host_r2));
@@ -442,7 +442,7 @@ int main(void)
 #else
DEFINE(VCPU_CR, offsetof(struct kvm_vcpu, arch.cr));
DEFINE(VCPU_XER, offsetof(struct kvm_vcpu, arch.xer));
-#endif /* CONFIG_PPC64 */
+#endif /* CONFIG_PPC_BOOK3S */
 #endif
 #ifdef CONFIG_44x
DEFINE(PGD_T_LOG2, PGD_T_LOG2);
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 8ef3766..d864698 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -31,7 +31,7 @@ config KVM_BOOK3S_64_HANDLER
 
 config KVM_BOOK3S_64
tristate "KVM support for PowerPC book3s_64 processors"
-   depends on EXPERIMENTAL && PPC64
+   depends on EXPERIMENTAL && PPC_BOOK3S_64
select KVM
select KVM_BOOK3S_64_HANDLER
---help---
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index c6db28c..b608c0b 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -69,7 +69,7 @@
 #define OP_STH  44
 #define OP_STHU 45
 
-#ifdef CONFIG_PPC64
+#ifdef CONFIG_PPC_BOOK3S
 static int kvmppc_dec_enabled(struct kvm_vcpu *vcpu)
 {
return 1;
@@ -86,7 +86,7 @@ void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
unsigned long dec_nsec;
 
pr_debug("mtDEC: %x\n", vcpu->arch.dec);
-#ifdef CONFIG_PPC64
+#ifdef CONFIG_PPC_BOOK3S
/* mtdec lowers the interrupt line when positive. */
kvmppc_core_dequeue_dec(vcpu);
 
@@ -153,7 +153,7 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct 
kvm_vcpu *vcpu)
 
switch (get_op(inst)) {
case OP_TRAP:
-#ifdef CONFIG_PPC64
+#ifdef CONFIG_PPC_BOOK3S
case OP_TRAP_64:
kvmppc_core_queue_program(vcpu, SRR1_PROGTRAP);
 #else
-- 
1.6.0.2



[PATCH 07/27] KVM: PPC: Add kvm_book3s_32.h

2010-04-15 Thread Alexander Graf
In analogy to the 64 bit specific header file, this is the 32 bit
counterpart. With this in place we can just always call to_svcpu and
be assured we get the right pointer anywhere.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s_32.h |   42 ++
 1 files changed, 42 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_32.h

diff --git a/arch/powerpc/include/asm/kvm_book3s_32.h 
b/arch/powerpc/include/asm/kvm_book3s_32.h
new file mode 100644
index 000..de604db
--- /dev/null
+++ b/arch/powerpc/include/asm/kvm_book3s_32.h
@@ -0,0 +1,42 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2010
+ *
+ * Authors: Alexander Graf 
+ */
+
+#ifndef __ASM_KVM_BOOK3S_32_H__
+#define __ASM_KVM_BOOK3S_32_H__
+
+static inline struct kvmppc_book3s_shadow_vcpu *to_svcpu(struct kvm_vcpu *vcpu)
+{
+   return to_book3s(vcpu)->shadow_vcpu;
+}
+
+#define PTE_SIZE   12
+#define VSID_ALL   0
+#define SR_INVALID 0x0001  /* VSID 1 should always be unused */
+#define SR_KP  0x2000
+#define PTE_V  0x8000
+#define PTE_SEC0x0040
+#define PTE_M  0x0010
+#define PTE_R  0x0100
+#define PTE_C  0x0080
+
+#define SID_SHIFT  28
+#define ESID_MASK  0xf000
+#define VSID_MASK  0x00fff000ULL
+
+#endif /* __ASM_KVM_BOOK3S_32_H__ */
-- 
1.6.0.2



[PATCH 19/27] KVM: PPC: Remove fetch fail code

2010-04-15 Thread Alexander Graf
When an instruction fetch fails, the inline function hook automatically
detects that and starts the internal guest memory load function. So
whenever we access kvmppc_get_last_inst(), we're sure the result is sane.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/emulate.c |4 
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index b608c0b..4568ec3 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -147,10 +147,6 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct 
kvm_vcpu *vcpu)
 
pr_debug(KERN_INFO "Emulating opcode %d / %d\n", get_op(inst), 
get_xop(inst));
 
-   /* Try again next time */
-   if (inst == KVM_INST_FETCH_FAILED)
-   return EMULATE_DONE;
-
switch (get_op(inst)) {
case OP_TRAP:
 #ifdef CONFIG_PPC_BOOK3S
-- 
1.6.0.2



[PATCH 16/27] KVM: PPC: Make highmem code generic

2010-04-15 Thread Alexander Graf
Since we now have several fields in the shadow VCPU, we also change
the internal calling convention between the different entry/exit code
layers.

Let's reflect that in the IR=1 code and make sure we use "long" defines
for long field access.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_interrupts.S |  201 +-
 1 files changed, 101 insertions(+), 100 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_interrupts.S 
b/arch/powerpc/kvm/book3s_interrupts.S
index faca876..f5b3358 100644
--- a/arch/powerpc/kvm/book3s_interrupts.S
+++ b/arch/powerpc/kvm/book3s_interrupts.S
@@ -24,36 +24,56 @@
 #include 
 #include 
 
-#define KVMPPC_HANDLE_EXIT .kvmppc_handle_exit
-#define ULONG_SIZE 8
-#define VCPU_GPR(n) (VCPU_GPRS + (n * ULONG_SIZE))
+#if defined(CONFIG_PPC_BOOK3S_64)
 
-.macro DISABLE_INTERRUPTS
-   mfmsr   r0
-   rldicl  r0,r0,48,1
-   rotldi  r0,r0,16
-   mtmsrd  r0,1
-.endm
+#define ULONG_SIZE 8
+#define FUNC(name) GLUE(.,name)
 
+#define GET_SHADOW_VCPU(reg)\
+addireg, r13, PACA_KVM_SVCPU
+
+#define DISABLE_INTERRUPTS \
+   mfmsr   r0; \
+   rldicl  r0,r0,48,1; \
+   rotldi  r0,r0,16;   \
+   mtmsrd  r0,1;   \
+
+#elif defined(CONFIG_PPC_BOOK3S_32)
+
+#define ULONG_SIZE  4
+#define FUNC(name) name
+
+#define GET_SHADOW_VCPU(reg)\
+lwz reg, (THREAD + THREAD_KVM_SVCPU)(r2)
+
+#define DISABLE_INTERRUPTS \
+   mfmsr   r0; \
+   rlwinm  r0,r0,0,17,15;  \
+   mtmsr   r0; \
+
+#endif /* CONFIG_PPC_BOOK3S_XX */
+
+
+#define VCPU_GPR(n)(VCPU_GPRS + (n * ULONG_SIZE))
 #define VCPU_LOAD_NVGPRS(vcpu) \
-   ld  r14, VCPU_GPR(r14)(vcpu); \
-   ld  r15, VCPU_GPR(r15)(vcpu); \
-   ld  r16, VCPU_GPR(r16)(vcpu); \
-   ld  r17, VCPU_GPR(r17)(vcpu); \
-   ld  r18, VCPU_GPR(r18)(vcpu); \
-   ld  r19, VCPU_GPR(r19)(vcpu); \
-   ld  r20, VCPU_GPR(r20)(vcpu); \
-   ld  r21, VCPU_GPR(r21)(vcpu); \
-   ld  r22, VCPU_GPR(r22)(vcpu); \
-   ld  r23, VCPU_GPR(r23)(vcpu); \
-   ld  r24, VCPU_GPR(r24)(vcpu); \
-   ld  r25, VCPU_GPR(r25)(vcpu); \
-   ld  r26, VCPU_GPR(r26)(vcpu); \
-   ld  r27, VCPU_GPR(r27)(vcpu); \
-   ld  r28, VCPU_GPR(r28)(vcpu); \
-   ld  r29, VCPU_GPR(r29)(vcpu); \
-   ld  r30, VCPU_GPR(r30)(vcpu); \
-   ld  r31, VCPU_GPR(r31)(vcpu); \
+   PPC_LL  r14, VCPU_GPR(r14)(vcpu); \
+   PPC_LL  r15, VCPU_GPR(r15)(vcpu); \
+   PPC_LL  r16, VCPU_GPR(r16)(vcpu); \
+   PPC_LL  r17, VCPU_GPR(r17)(vcpu); \
+   PPC_LL  r18, VCPU_GPR(r18)(vcpu); \
+   PPC_LL  r19, VCPU_GPR(r19)(vcpu); \
+   PPC_LL  r20, VCPU_GPR(r20)(vcpu); \
+   PPC_LL  r21, VCPU_GPR(r21)(vcpu); \
+   PPC_LL  r22, VCPU_GPR(r22)(vcpu); \
+   PPC_LL  r23, VCPU_GPR(r23)(vcpu); \
+   PPC_LL  r24, VCPU_GPR(r24)(vcpu); \
+   PPC_LL  r25, VCPU_GPR(r25)(vcpu); \
+   PPC_LL  r26, VCPU_GPR(r26)(vcpu); \
+   PPC_LL  r27, VCPU_GPR(r27)(vcpu); \
+   PPC_LL  r28, VCPU_GPR(r28)(vcpu); \
+   PPC_LL  r29, VCPU_GPR(r29)(vcpu); \
+   PPC_LL  r30, VCPU_GPR(r30)(vcpu); \
+   PPC_LL  r31, VCPU_GPR(r31)(vcpu); \
 
 /*
  *   *
@@ -69,11 +89,11 @@ _GLOBAL(__kvmppc_vcpu_entry)
 
 kvm_start_entry:
/* Write correct stack frame */
-   mflrr0
-   std r0,16(r1)
+   mflrr0
+   PPC_STL r0,PPC_LR_STKOFF(r1)
 
/* Save host state to the stack */
-   stdur1, -SWITCH_FRAME_SIZE(r1)
+   PPC_STLU r1, -SWITCH_FRAME_SIZE(r1)
 
/* Save r3 (kvm_run) and r4 (vcpu) */
SAVE_2GPRS(3, r1)
@@ -82,33 +102,28 @@ kvm_start_entry:
SAVE_NVGPRS(r1)
 
/* Save LR */
-   std r0, _LINK(r1)
+   PPC_STL r0, _LINK(r1)
 
/* Load non-volatile guest state from the vcpu */
VCPU_LOAD_NVGPRS(r4)
 
+   GET_SHADOW_VCPU(r5)
+
/* Save R1/R2 in the PACA */
-   std r1, PACA_KVM_HOST_R1(r13)
-   std r2, PACA_KVM_HOST_R2(r13)
+   PPC_STL r1, SVCPU_HOST_R1(r5)
+   PPC_STL r2, SVCPU_HOST_R2(r5)
 
/* XXX swap in/out on load? */
-   ld  r3, VCPU_HIGHMEM_HANDLER(r4)
-   std r3, PACA_KVM_VMHANDLER(r13)
+   PPC_LL  r3, VCPU_HIGHMEM_HANDLER(r4)
+   PPC_STL r3, SVCPU_VMHANDLER(r5)
 
 kvm_start_lightweight:
 
-   ld  r9, VCPU_PC(r4) /* r9 = vcpu->arch.pc */
-   ld  r10, VCPU_SHADOW_MSR(r4)/* r10 = vcpu->arch.shadow_msr 
*/
-
-   /* Load some guest state in the respective registers */
-   ld  r5, VCPU_CTR(r4)/* r5 = vcpu->arch.ctr */
-   /* will be swapped in by rmcall */
-
-  

[PATCH 22/27] KVM: PPC: Add Book3S compatibility code

2010-04-15 Thread Alexander Graf
Some of the code we had so far relied on defines or was completely
Book3S_64 specific. Since we now opened book3s.c to Book3S_32 too, we need
to take care of these pieces.

So let's add some minor code where it makes sense not to take the Book3S_64
code paths, and add compat defines elsewhere.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s.c|   26 +-
 arch/powerpc/kvm/book3s_32_mmu.c |3 +++
 arch/powerpc/kvm/book3s_emulate.c|4 
 arch/powerpc/kvm/book3s_rmhandlers.S |4 ++--
 4 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 178ddd4..f5229f9 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -39,6 +39,13 @@
 static int kvmppc_handle_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr,
 ulong msr);
 
+/* Some compatibility defines */
+#ifdef CONFIG_PPC_BOOK3S_32
+#define MSR_USER32 MSR_USER
+#define MSR_USER64 MSR_USER
+#define HW_PAGE_SIZE PAGE_SIZE
+#endif
+
 struct kvm_stats_debugfs_item debugfs_entries[] = {
{ "exits",   VCPU_STAT(sum_exits) },
{ "mmio",VCPU_STAT(mmio_exits) },
@@ -347,11 +354,14 @@ void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr)
 {
vcpu->arch.hflags &= ~BOOK3S_HFLAG_SLB;
vcpu->arch.pvr = pvr;
+#ifdef CONFIG_PPC_BOOK3S_64
if ((pvr >= 0x33) && (pvr < 0x7033)) {
kvmppc_mmu_book3s_64_init(vcpu);
to_book3s(vcpu)->hior = 0xfff0;
to_book3s(vcpu)->msr_mask = 0xULL;
-   } else {
+   } else
+#endif
+   {
kvmppc_mmu_book3s_32_init(vcpu);
to_book3s(vcpu)->hior = 0;
to_book3s(vcpu)->msr_mask = 0xULL;
@@ -368,6 +378,11 @@ void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr)
   really needs them in a VM on Cell and force disable them. */
if (!strcmp(cur_cpu_spec->platform, "ppc-cell-be"))
to_book3s(vcpu)->msr_mask &= ~(MSR_FE0 | MSR_FE1);
+
+#ifdef CONFIG_PPC_BOOK3S_32
+   /* 32 bit Book3S always has 32 byte dcbz */
+   vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
+#endif
 }
 
 /* Book3s_32 CPUs always have 32 bytes cache line size, which Linux assumes. To
@@ -1211,8 +1226,13 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm 
*kvm, unsigned int id)
 
vcpu->arch.host_retip = kvm_return_point;
vcpu->arch.host_msr = mfmsr();
+#ifdef CONFIG_PPC_BOOK3S_64
/* default to book3s_64 (970fx) */
vcpu->arch.pvr = 0x3C0301;
+#else
+   /* default to book3s_32 (750) */
+   vcpu->arch.pvr = 0x84202;
+#endif
kvmppc_set_pvr(vcpu, vcpu->arch.pvr);
vcpu_book3s->slb_nr = 64;
 
@@ -1220,7 +1240,11 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm 
*kvm, unsigned int id)
vcpu->arch.trampoline_lowmem = kvmppc_trampoline_lowmem;
vcpu->arch.trampoline_enter = kvmppc_trampoline_enter;
vcpu->arch.highmem_handler = (ulong)kvmppc_handler_highmem;
+#ifdef CONFIG_PPC_BOOK3S_64
vcpu->arch.rmcall = *(ulong*)kvmppc_rmcall;
+#else
+   vcpu->arch.rmcall = (ulong)kvmppc_rmcall;
+#endif
 
vcpu->arch.shadow_msr = MSR_USER64;
 
diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index 7071e22..48efb37 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -45,6 +45,9 @@
 
 #define PTEG_FLAG_ACCESSED 0x0100
 #define PTEG_FLAG_DIRTY0x0080
+#ifndef SID_SHIFT
+#define SID_SHIFT  28
+#endif
 
 static inline bool check_debug_ip(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/powerpc/kvm/book3s_emulate.c 
b/arch/powerpc/kvm/book3s_emulate.c
index daa829b..3f7afb5 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -59,6 +59,10 @@
 #define SPRN_GQR6  918
 #define SPRN_GQR7  919
 
+/* Book3S_32 defines mfsrin(v) - but that messes up our abstract
+ * function pointers, so let's just disable the define. */
+#undef mfsrin
+
 int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
unsigned int inst, int *advance)
 {
diff --git a/arch/powerpc/kvm/book3s_rmhandlers.S 
b/arch/powerpc/kvm/book3s_rmhandlers.S
index da571f8..86f9bde 100644
--- a/arch/powerpc/kvm/book3s_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_rmhandlers.S
@@ -242,11 +242,11 @@ define_load_up(vsx)
 
 .global kvmppc_trampoline_lowmem
 kvmppc_trampoline_lowmem:
-   .long kvmppc_handler_lowmem_trampoline - _stext
+   .long kvmppc_handler_lowmem_trampoline - CONFIG_KERNEL_START
 
 .global kvmppc_trampoline_enter
 kvmppc_trampoline_enter:
-   .long kvmppc_handler_trampoline_enter - _stext
+   .long kvmppc_handler_trampoline_enter - CONFIG_KERNEL_START
 
 #include "book3s_segment.S"
 
-- 
1.6.0.2


[PATCH 09/27] KVM: PPC: Improve indirect svcpu accessors

2010-04-15 Thread Alexander Graf
We already have some inline functions we use to access vcpu or svcpu structs,
depending on whether we're on booke or book3s. Since we just put a few more
registers into the svcpu, we also need to make sure the respective callbacks
are available and get used.

So this patch moves direct use of the fields that are now in the svcpu struct
over to inline function calls. While at it, it also moves the definition of those
inline function calls to respective header files for booke and book3s,
greatly improving readability.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s.h|   98 +++-
 arch/powerpc/include/asm/kvm_booke.h |   96 +++
 arch/powerpc/include/asm/kvm_ppc.h   |   79 +--
 arch/powerpc/kvm/book3s.c|  125 +-
 arch/powerpc/kvm/book3s_64_mmu.c |2 +-
 arch/powerpc/kvm/book3s_64_mmu_host.c|   26 +++---
 arch/powerpc/kvm/book3s_emulate.c|6 +-
 arch/powerpc/kvm/book3s_paired_singles.c |2 +-
 arch/powerpc/kvm/emulate.c   |7 +-
 arch/powerpc/kvm/powerpc.c   |2 +-
 10 files changed, 290 insertions(+), 153 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_booke.h

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 7670e2a..9517b8d 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -71,7 +71,7 @@ struct kvmppc_sid_map {
 
 struct kvmppc_vcpu_book3s {
struct kvm_vcpu vcpu;
-   struct kvmppc_book3s_shadow_vcpu shadow_vcpu;
+   struct kvmppc_book3s_shadow_vcpu *shadow_vcpu;
struct kvmppc_sid_map sid_map[SID_MAP_NUM];
struct kvmppc_slb slb[64];
struct {
@@ -147,6 +147,94 @@ static inline ulong dsisr(void)
 }
 
 extern void kvm_return_point(void);
+static inline struct kvmppc_book3s_shadow_vcpu *to_svcpu(struct kvm_vcpu 
*vcpu);
+
+static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val)
+{
+   if ( num < 14 ) {
+   to_svcpu(vcpu)->gpr[num] = val;
+   to_book3s(vcpu)->shadow_vcpu->gpr[num] = val;
+   } else
+   vcpu->arch.gpr[num] = val;
+}
+
+static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num)
+{
+   if ( num < 14 )
+   return to_svcpu(vcpu)->gpr[num];
+   else
+   return vcpu->arch.gpr[num];
+}
+
+static inline void kvmppc_set_cr(struct kvm_vcpu *vcpu, u32 val)
+{
+   to_svcpu(vcpu)->cr = val;
+   to_book3s(vcpu)->shadow_vcpu->cr = val;
+}
+
+static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
+{
+   return to_svcpu(vcpu)->cr;
+}
+
+static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
+{
+   to_svcpu(vcpu)->xer = val;
+   to_book3s(vcpu)->shadow_vcpu->xer = val;
+}
+
+static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
+{
+   return to_svcpu(vcpu)->xer;
+}
+
+static inline void kvmppc_set_ctr(struct kvm_vcpu *vcpu, ulong val)
+{
+   to_svcpu(vcpu)->ctr = val;
+}
+
+static inline ulong kvmppc_get_ctr(struct kvm_vcpu *vcpu)
+{
+   return to_svcpu(vcpu)->ctr;
+}
+
+static inline void kvmppc_set_lr(struct kvm_vcpu *vcpu, ulong val)
+{
+   to_svcpu(vcpu)->lr = val;
+}
+
+static inline ulong kvmppc_get_lr(struct kvm_vcpu *vcpu)
+{
+   return to_svcpu(vcpu)->lr;
+}
+
+static inline void kvmppc_set_pc(struct kvm_vcpu *vcpu, ulong val)
+{
+   to_svcpu(vcpu)->pc = val;
+}
+
+static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
+{
+   return to_svcpu(vcpu)->pc;
+}
+
+static inline u32 kvmppc_get_last_inst(struct kvm_vcpu *vcpu)
+{
+   ulong pc = kvmppc_get_pc(vcpu);
+   struct kvmppc_book3s_shadow_vcpu *svcpu = to_svcpu(vcpu);
+
+   /* Load the instruction manually if it failed to do so in the
+* exit path */
+   if (svcpu->last_inst == KVM_INST_FETCH_FAILED)
+   kvmppc_ld(vcpu, &pc, sizeof(u32), &svcpu->last_inst, false);
+
+   return svcpu->last_inst;
+}
+
+static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
+{
+   return to_svcpu(vcpu)->fault_dar;
+}
 
 /* Magic register values loaded into r3 and r4 before the 'sc' assembly
  * instruction for the OSI hypercalls */
@@ -155,4 +243,12 @@ extern void kvm_return_point(void);
 
 #define INS_DCBZ   0x7c0007ec
 
+/* Also add subarch specific defines */
+
+#ifdef CONFIG_PPC_BOOK3S_32
+#include 
+#else
+#include 
+#endif
+
 #endif /* __ASM_KVM_BOOK3S_H__ */
diff --git a/arch/powerpc/include/asm/kvm_booke.h 
b/arch/powerpc/include/asm/kvm_booke.h
new file mode 100644
index 000..9c9ba3d
--- /dev/null
+++ b/arch/powerpc/include/asm/kvm_booke.h
@@ -0,0 +1,96 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it

[PATCH 20/27] KVM: PPC: Add SVCPU to Book3S_32

2010-04-15 Thread Alexander Graf
We need to keep the pointer to the shadow vcpu somewhere accessible from
within really early interrupt code. The best fit I found was the thread
struct, as that resides in an SPRG.

So let's put a pointer to the shadow vcpu in the thread struct and add
an asm-offset so we can find it.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/processor.h |3 +++
 arch/powerpc/kernel/asm-offsets.c|3 +++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 221ba62..7492fe8 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -229,6 +229,9 @@ struct thread_struct {
unsigned long   spefscr;/* SPE & eFP status */
int used_spe;   /* set if process has used spe */
 #endif /* CONFIG_SPE */
+#ifdef CONFIG_KVM_BOOK3S_32_HANDLER
+   void*   kvm_shadow_vcpu; /* KVM internal data */
+#endif /* CONFIG_KVM_BOOK3S_32_HANDLER */
 };
 
 #define ARCH_MIN_TASKALIGN 16
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index e8003ff..1804c2c 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -108,6 +108,9 @@ int main(void)
DEFINE(THREAD_USED_SPE, offsetof(struct thread_struct, used_spe));
 #endif /* CONFIG_SPE */
 #endif /* CONFIG_PPC64 */
+#ifdef CONFIG_KVM_BOOK3S_32_HANDLER
+   DEFINE(THREAD_KVM_SVCPU, offsetof(struct thread_struct, 
kvm_shadow_vcpu));
+#endif
 
DEFINE(TI_FLAGS, offsetof(struct thread_info, flags));
DEFINE(TI_LOCAL_FLAGS, offsetof(struct thread_info, local_flags));
-- 
1.6.0.2
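The asm-offsets step above is worth spelling out: assembly cannot evaluate offsetof(), so a C file computes each offset and the build turns it into a plain constant for the .S files. A minimal host-runnable sketch of the idea, using a made-up stand-in struct (not the real thread_struct layout):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct thread_struct; only the layout idea matters here. */
struct thread_model {
    unsigned long regs[32];
    void *kvm_shadow_vcpu;      /* the field this patch adds */
};

/* The real kernel emits: DEFINE(THREAD_KVM_SVCPU, offsetof(...)) into a
 * generated header that assembly can include. */
#define MODEL_THREAD_KVM_SVCPU offsetof(struct thread_model, kvm_shadow_vcpu)
```

With the emitted constant, real-mode assembly can reach the field as a plain displacement, e.g. `lwz r3, THREAD_KVM_SVCPU(r4)`, without knowing the C layout.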

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 27/27] KVM: PPC: Enable Book3S_32 KVM building

2010-04-15 Thread Alexander Graf
Now that we have all the bits and pieces in place, let's enable building
of the Book3S_32 target.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/Kconfig  |   18 ++
 arch/powerpc/kvm/Makefile |   12 
 2 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index d864698..b7baff7 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -25,10 +25,28 @@ config KVM
 config KVM_BOOK3S_HANDLER
bool
 
+config KVM_BOOK3S_32_HANDLER
+   bool
+   select KVM_BOOK3S_HANDLER
+
 config KVM_BOOK3S_64_HANDLER
bool
select KVM_BOOK3S_HANDLER
 
+config KVM_BOOK3S_32
+   tristate "KVM support for PowerPC book3s_32 processors"
+   depends on EXPERIMENTAL && PPC_BOOK3S_32 && !SMP && !PTE_64BIT
+   select KVM
+   select KVM_BOOK3S_32_HANDLER
+   ---help---
+ Support running unmodified book3s_32 guest kernels
+ in virtual machines on book3s_32 host processors.
+
+ This module provides access to the hardware capabilities through
+ a character device node named /dev/kvm.
+
+ If unsure, say N.
+
 config KVM_BOOK3S_64
tristate "KVM support for PowerPC book3s_64 processors"
depends on EXPERIMENTAL && PPC_BOOK3S_64
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index f621ce6..ff43606 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -50,9 +50,21 @@ kvm-book3s_64-objs := \
book3s_32_mmu.o
 kvm-objs-$(CONFIG_KVM_BOOK3S_64) := $(kvm-book3s_64-objs)
 
+kvm-book3s_32-objs := \
+   $(common-objs-y) \
+   fpu.o \
+   book3s_paired_singles.o \
+   book3s.o \
+   book3s_emulate.o \
+   book3s_interrupts.o \
+   book3s_32_mmu_host.o \
+   book3s_32_mmu.o
+kvm-objs-$(CONFIG_KVM_BOOK3S_32) := $(kvm-book3s_32-objs)
+
 kvm-objs := $(kvm-objs-m) $(kvm-objs-y)
 
 obj-$(CONFIG_KVM_440) += kvm.o
 obj-$(CONFIG_KVM_E500) += kvm.o
 obj-$(CONFIG_KVM_BOOK3S_64) += kvm.o
+obj-$(CONFIG_KVM_BOOK3S_32) += kvm.o
 
-- 
1.6.0.2



[PATCH 18/27] KVM: PPC: Release clean pages as clean

2010-04-15 Thread Alexander Graf
When we've mapped a page as read-only, we can release it as clean to
KVM's page claim mechanisms, because we can be pretty sure it hasn't
been written to.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_64_mmu_host.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c 
b/arch/powerpc/kvm/book3s_64_mmu_host.c
index 0eea589..b230154 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -55,7 +55,11 @@ static void invalidate_pte(struct hpte_cache *pte)
   MMU_PAGE_4K, MMU_SEGSIZE_256M,
   false);
pte->host_va = 0;
-   kvm_release_pfn_dirty(pte->pfn);
+
+   if (pte->pte.may_write)
+   kvm_release_pfn_dirty(pte->pfn);
+   else
+   kvm_release_pfn_clean(pte->pfn);
 }
 
 void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, u64 guest_ea, u64 ea_mask)
-- 
1.6.0.2



[PATCH 12/27] PPC: Add STLU

2010-04-15 Thread Alexander Graf
For assembly code there are several "long" load and store defines already.
The one that's missing is the typical stack store, stdu/stwu.

So let's add that define as well, making my KVM code happy.

CC: Benjamin Herrenschmidt 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/asm-compat.h |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-compat.h 
b/arch/powerpc/include/asm/asm-compat.h
index a9b91ed..2048a6a 100644
--- a/arch/powerpc/include/asm/asm-compat.h
+++ b/arch/powerpc/include/asm/asm-compat.h
@@ -21,6 +21,7 @@
 /* operations for longs and pointers */
 #define PPC_LL stringify_in_c(ld)
 #define PPC_STLstringify_in_c(std)
+#define PPC_STLU   stringify_in_c(stdu)
 #define PPC_LCMPI  stringify_in_c(cmpdi)
 #define PPC_LONG   stringify_in_c(.llong)
 #define PPC_LONG_ALIGN stringify_in_c(.balign 8)
@@ -44,6 +45,7 @@
 /* operations for longs and pointers */
 #define PPC_LL stringify_in_c(lwz)
 #define PPC_STLstringify_in_c(stw)
+#define PPC_STLU   stringify_in_c(stwu)
 #define PPC_LCMPI  stringify_in_c(cmpwi)
 #define PPC_LONG   stringify_in_c(.long)
 #define PPC_LONG_ALIGN stringify_in_c(.balign 4)
-- 
1.6.0.2
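The mechanism behind PPC_STLU can be sketched in plain C: a two-level stringification macro turns the mnemonic token into a string usable in inline asm, and an #ifdef picks the word-size variant. All MODEL_-prefixed names below are stand-ins, not the kernel's:

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification so macro arguments get expanded first,
 * mirroring the kernel's __stringify()/stringify_in_c() pair. */
#define MODEL_STRINGIFY_1(...)  #__VA_ARGS__
#define MODEL_STRINGIFY(...)    MODEL_STRINGIFY_1(__VA_ARGS__) " "

#ifdef MODEL_PPC64
#define MODEL_PPC_STLU  MODEL_STRINGIFY(stdu)   /* 64-bit: store doubleword with update */
#else
#define MODEL_PPC_STLU  MODEL_STRINGIFY(stwu)   /* 32-bit: store word with update */
#endif
```

An inline-asm user would then write something like `asm(MODEL_PPC_STLU "1, -16(1)")` and get stdu or stwu depending on the build — the same trick the existing PPC_LL/PPC_STL defines use.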



[PATCH 04/27] KVM: PPC: Add generic segment switching code

2010-04-15 Thread Alexander Graf
This is the code that will later be used instead of book3s_64_slb.S. It
does the last step of guest entry and the first generic steps of guest
exiting, once we have determined the interrupt is a KVM interrupt.

It also reads the last used instruction from the guest virtual address
space if necessary, to speed up that path.

The new thing about this file is that it makes use of generic long load
and store functions and calls a macro to fill in the actual segment
switching code. That still needs to be done differently for book3s_32 and
book3s_64.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_segment.S |  258 +
 1 files changed, 258 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_segment.S

diff --git a/arch/powerpc/kvm/book3s_segment.S 
b/arch/powerpc/kvm/book3s_segment.S
new file mode 100644
index 000..778e3fc
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_segment.S
@@ -0,0 +1,258 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2010
+ *
+ * Authors: Alexander Graf 
+ */
+
+/* Real mode helpers */
+
+#if defined(CONFIG_PPC_BOOK3S_64)
+
+#define GET_SHADOW_VCPU(reg)\
+   addi    reg, r13, PACA_KVM_SVCPU
+
+#elif defined(CONFIG_PPC_BOOK3S_32)
+
+#define GET_SHADOW_VCPU(reg)   \
+   tophys(reg, r2);\
+   lwz reg, (THREAD + THREAD_KVM_SVCPU)(reg);  \
+   tophys(reg, reg)
+
+#endif
+
+/* Disable for nested KVM */
+#define USE_QUICK_LAST_INST
+
+
+/* Get helper functions for subarch specific functionality */
+
+#if defined(CONFIG_PPC_BOOK3S_64)
+#include "book3s_64_slb.S"
+#elif defined(CONFIG_PPC_BOOK3S_32)
+#include "book3s_32_sr.S"
+#endif
+
+/****************************************
+ *                                      *
+ *              Entry code              *
+ *                                      *
+ ***************************************/
+
+.global kvmppc_handler_trampoline_enter
+kvmppc_handler_trampoline_enter:
+
+   /* Required state:
+*
+* MSR = ~IR|DR
+* R13 = PACA
+* R1 = host R1
+* R2 = host R2
+* R10 = guest MSR
+* all other volatile GPRS = free
+* SVCPU[CR] = guest CR
+* SVCPU[XER] = guest XER
+* SVCPU[CTR] = guest CTR
+* SVCPU[LR] = guest LR
+*/
+
+   /* r3 = shadow vcpu */
+   GET_SHADOW_VCPU(r3)
+
+   /* Move SRR0 and SRR1 into the respective regs */
+   PPC_LL  r9, SVCPU_PC(r3)
+   mtsrr0  r9
+   mtsrr1  r10
+
+   /* Activate guest mode, so faults get handled by KVM */
+   li  r11, KVM_GUEST_MODE_GUEST
+   stb r11, SVCPU_IN_GUEST(r3)
+
+   /* Switch to guest segment. This is subarch specific. */
+   LOAD_GUEST_SEGMENTS
+
+   /* Enter guest */
+
+   PPC_LL  r4, (SVCPU_CTR)(r3)
+   PPC_LL  r5, (SVCPU_LR)(r3)
+   lwz r6, (SVCPU_CR)(r3)
+   lwz r7, (SVCPU_XER)(r3)
+
+   mtctr   r4
+   mtlr    r5
+   mtcr    r6
+   mtxer   r7
+
+   PPC_LL  r0, (SVCPU_R0)(r3)
+   PPC_LL  r1, (SVCPU_R1)(r3)
+   PPC_LL  r2, (SVCPU_R2)(r3)
+   PPC_LL  r4, (SVCPU_R4)(r3)
+   PPC_LL  r5, (SVCPU_R5)(r3)
+   PPC_LL  r6, (SVCPU_R6)(r3)
+   PPC_LL  r7, (SVCPU_R7)(r3)
+   PPC_LL  r8, (SVCPU_R8)(r3)
+   PPC_LL  r9, (SVCPU_R9)(r3)
+   PPC_LL  r10, (SVCPU_R10)(r3)
+   PPC_LL  r11, (SVCPU_R11)(r3)
+   PPC_LL  r12, (SVCPU_R12)(r3)
+   PPC_LL  r13, (SVCPU_R13)(r3)
+
+   PPC_LL  r3, (SVCPU_R3)(r3)
+
+   RFI
+kvmppc_handler_trampoline_enter_end:
+
+
+
+/****************************************
+ *                                      *
+ *              Exit code               *
+ *                                      *
+ ***************************************/
+
+.global kvmppc_handler_trampoline_exit
+kvmppc_handler_trampoline_exit:
+
+   /* Register usage at this point:
+*
+* SP

[PATCH 03/27] KVM: PPC: Add SR swapping code

2010-04-15 Thread Alexander Graf
Later in this series we will move the current segment switching code to
generic code and have it call hooks for the specific sub-archs (32
vs. 64 bit). This is the hook for 32 bits.

It enables the entry and exit code to swap segment registers with
values from the shadow vcpu structure.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_32_sr.S |  143 +++
 1 files changed, 143 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_32_sr.S

diff --git a/arch/powerpc/kvm/book3s_32_sr.S b/arch/powerpc/kvm/book3s_32_sr.S
new file mode 100644
index 000..3608471
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_32_sr.S
@@ -0,0 +1,143 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf 
+ */
+
+/****************************************
+ *                                      *
+ *              Entry code              *
+ *                                      *
+ ***************************************/
+
+.macro LOAD_GUEST_SEGMENTS
+
+   /* Required state:
+*
+* MSR = ~IR|DR
+* R1 = host R1
+* R2 = host R2
+* R3 = shadow vcpu
+* all other volatile GPRS = free
+* SVCPU[CR]  = guest CR
+* SVCPU[XER] = guest XER
+* SVCPU[CTR] = guest CTR
+* SVCPU[LR]  = guest LR
+*/
+
+#define XCHG_SR(n) lwz r9, (SVCPU_SR+(n*4))(r3);  \
+   mtsr    n, r9
+
+   XCHG_SR(0)
+   XCHG_SR(1)
+   XCHG_SR(2)
+   XCHG_SR(3)
+   XCHG_SR(4)
+   XCHG_SR(5)
+   XCHG_SR(6)
+   XCHG_SR(7)
+   XCHG_SR(8)
+   XCHG_SR(9)
+   XCHG_SR(10)
+   XCHG_SR(11)
+   XCHG_SR(12)
+   XCHG_SR(13)
+   XCHG_SR(14)
+   XCHG_SR(15)
+
+   /* Clear BATs. */
+
+#define KVM_KILL_BAT(n, reg)   \
+    mtspr   SPRN_IBAT##n##U,reg;   \
+    mtspr   SPRN_IBAT##n##L,reg;   \
+    mtspr   SPRN_DBAT##n##U,reg;   \
+    mtspr   SPRN_DBAT##n##L,reg;   \
+
+   li  r9, 0
+   KVM_KILL_BAT(0, r9)
+   KVM_KILL_BAT(1, r9)
+   KVM_KILL_BAT(2, r9)
+   KVM_KILL_BAT(3, r9)
+
+.endm
+
+/****************************************
+ *                                      *
+ *              Exit code               *
+ *                                      *
+ ***************************************/
+
+.macro LOAD_HOST_SEGMENTS
+
+   /* Register usage at this point:
+*
+* R1 = host R1
+* R2 = host R2
+* R12 = exit handler id
+* R13 = shadow vcpu - SHADOW_VCPU_OFF
+* SVCPU.* = guest *
+* SVCPU[CR]  = guest CR
+* SVCPU[XER] = guest XER
+* SVCPU[CTR] = guest CTR
+* SVCPU[LR]  = guest LR
+*
+*/
+
+   /* Restore BATs */
+
+   /* We only overwrite the upper part, so we only restore
+  the upper part. */
+#define KVM_LOAD_BAT(n, reg, RA, RB)   \
+   lwz RA,(n*16)+0(reg);   \
+   lwz RB,(n*16)+4(reg);   \
+   mtspr   SPRN_IBAT##n##U,RA; \
+   mtspr   SPRN_IBAT##n##L,RB; \
+   lwz RA,(n*16)+8(reg);   \
+   lwz RB,(n*16)+12(reg);  \
+   mtspr   SPRN_DBAT##n##U,RA; \
+   mtspr   SPRN_DBAT##n##L,RB; \
+
+   lis r9, BATS@ha
+   addi    r9, r9, BATS@l
+   tophys(r9, r9)
+   KVM_LOAD_BAT(0, r9, r10, r11)
+   KVM_LOAD_BAT(1, r9, r10, r11)
+   KVM_LOAD_BAT(2, r9, r10, r11)
+   KVM_LOAD_BAT(3, r9, r10, r11)
+
+   /* Restore Segment Registers */
+
+   /* 0xc - 0xf */
+
+   li  r0, 4
+   mtctr   r0
+   LOAD_REG_IMMEDIATE(r3, 0x20000000 | (0x111 * 0xc))
+   lis r4, 0xc000
+3: mtsrin  r3, r4
+   addi    r3, r3, 0x111     /* increment VSID */
+   addis   r4, r4, 0x1000    /* address of next segment */
+   bdnz    3b
+
+   /* 0x0 - 0xb */
+
+   /* 'cu

[PATCH 05/27] PPC: Split context init/destroy functions

2010-04-15 Thread Alexander Graf
We need to reserve a context from KVM to make sure we have our own
segment space. While we did that split for Book3S_64 already, 32 bit
is still outstanding.

So let's split it now.

Signed-off-by: Alexander Graf 
CC: Benjamin Herrenschmidt 
---
 arch/powerpc/include/asm/mmu_context.h |2 ++
 arch/powerpc/mm/mmu_context_hash32.c   |   29 ++---
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index 26383e0..81fb412 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -27,6 +27,8 @@ extern int __init_new_context(void);
 extern void __destroy_context(int context_id);
 static inline void mmu_context_init(void) { }
 #else
+extern unsigned long __init_new_context(void);
+extern void __destroy_context(unsigned long context_id);
 extern void mmu_context_init(void);
 #endif
 
diff --git a/arch/powerpc/mm/mmu_context_hash32.c 
b/arch/powerpc/mm/mmu_context_hash32.c
index 0dfba2b..d0ee554 100644
--- a/arch/powerpc/mm/mmu_context_hash32.c
+++ b/arch/powerpc/mm/mmu_context_hash32.c
@@ -60,11 +60,7 @@
 static unsigned long next_mmu_context;
 static unsigned long context_map[LAST_CONTEXT / BITS_PER_LONG + 1];
 
-
-/*
- * Set up the context for a new address space.
- */
-int init_new_context(struct task_struct *t, struct mm_struct *mm)
+unsigned long __init_new_context(void)
 {
unsigned long ctx = next_mmu_context;
 
@@ -74,19 +70,38 @@ int init_new_context(struct task_struct *t, struct 
mm_struct *mm)
ctx = 0;
}
next_mmu_context = (ctx + 1) & LAST_CONTEXT;
-   mm->context.id = ctx;
+
+   return ctx;
+}
+EXPORT_SYMBOL_GPL(__init_new_context);
+
+/*
+ * Set up the context for a new address space.
+ */
+int init_new_context(struct task_struct *t, struct mm_struct *mm)
+{
+   mm->context.id = __init_new_context();
 
return 0;
 }
 
 /*
+ * Free a context ID. Make sure to call this with preempt disabled!
+ */
+void __destroy_context(unsigned long ctx)
+{
+   clear_bit(ctx, context_map);
+}
+EXPORT_SYMBOL_GPL(__destroy_context);
+
+/*
  * We're finished using the context for an address space.
  */
 void destroy_context(struct mm_struct *mm)
 {
preempt_disable();
if (mm->context.id != NO_CONTEXT) {
-   clear_bit(mm->context.id, context_map);
+   __destroy_context(mm->context.id);
mm->context.id = NO_CONTEXT;
}
preempt_enable();
-- 
1.6.0.2
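The allocator being split out here is a round-robin bitmap search. A host-runnable model of the idea, with a made-up small LAST_CONTEXT and without the kernel's locking or steal-on-full handling:

```c
#include <assert.h>

#define MODEL_LAST_CTX 31UL     /* made-up small limit; power-of-2-minus-1 so it doubles as a mask */

static unsigned long model_ctx_map;     /* bit n set => context n in use */
static unsigned long model_next_ctx;    /* round-robin cursor */

/* Model of __init_new_context(): hand out the next free ID, wrapping */
static unsigned long model_init_new_context(void)
{
    unsigned long ctx = model_next_ctx;

    while (model_ctx_map & (1UL << ctx)) {
        ctx++;
        if (ctx > MODEL_LAST_CTX)
            ctx = 0;
    }
    model_ctx_map |= 1UL << ctx;
    model_next_ctx = (ctx + 1) & MODEL_LAST_CTX;
    return ctx;
}

/* Model of __destroy_context(): just clear the bit */
static void model_destroy_context(unsigned long ctx)
{
    model_ctx_map &= ~(1UL << ctx);
}
```

Splitting the ID handling out of init_new_context()/destroy_context() is what lets KVM grab a private context for its shadow segment space without faking up an mm_struct.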



[PATCH 21/27] KVM: PPC: Emulate segment fault

2010-04-15 Thread Alexander Graf
Book3S_32 doesn't know about segment faults. It only knows about page faults.
So in order to know that we didn't map a segment, we need to fake segment
faults.

We do this by setting invalid segment registers to an invalid VSID and then
checking for that VSID on normal page faults.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s.c |   23 +++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index b917b97..178ddd4 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -774,6 +774,18 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
switch (exit_nr) {
case BOOK3S_INTERRUPT_INST_STORAGE:
vcpu->stat.pf_instruc++;
+
+#ifdef CONFIG_PPC_BOOK3S_32
+   /* We mark segments as unused when invalidating them, so
+    * treat the respective fault as a segment fault. */
+   if (to_svcpu(vcpu)->sr[kvmppc_get_pc(vcpu) >> SID_SHIFT]
+   == SR_INVALID) {
+   kvmppc_mmu_map_segment(vcpu, kvmppc_get_pc(vcpu));
+   r = RESUME_GUEST;
+   break;
+   }
+#endif
+
/* only care about PTEG not found errors, but leave NX alone */
if (to_svcpu(vcpu)->shadow_srr1 & 0x4000) {
r = kvmppc_handle_pagefault(run, vcpu, 
kvmppc_get_pc(vcpu), exit_nr);
@@ -798,6 +810,17 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
{
ulong dar = kvmppc_get_fault_dar(vcpu);
vcpu->stat.pf_storage++;
+
+#ifdef CONFIG_PPC_BOOK3S_32
+   /* We mark segments as unused when invalidating them, so
+    * treat the respective fault as a segment fault. */
+   if ((to_svcpu(vcpu)->sr[dar >> SID_SHIFT]) == SR_INVALID) {
+   kvmppc_mmu_map_segment(vcpu, dar);
+   r = RESUME_GUEST;
+   break;
+   }
+#endif
+
/* The only case we need to handle is missing shadow PTEs */
if (to_svcpu(vcpu)->fault_dsisr & DSISR_NOHPTE) {
r = kvmppc_handle_pagefault(run, vcpu, dar, exit_nr);
-- 
1.6.0.2
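The check added in both hunks can be modeled on the host: the top four bits of the effective address select one of 16 segment registers, and a register parked at the invalid VSID means the segment was never mapped. MODEL_SR_INVALID below is a stand-in value, not the kernel's constant:

```c
#include <assert.h>
#include <stdbool.h>

/* On Book3S_32, bits 0-3 of the effective address index the 16 segment
 * registers (SID_SHIFT = 28). */
#define MODEL_SID_SHIFT     28
#define MODEL_SR_INVALID    0xdeadbeefUL    /* stand-in "unused segment" marker */

/* true => fake a segment fault and (re)map the segment instead of
 * treating this as an ordinary page fault */
static bool model_needs_segment_map(const unsigned long sr[16],
                                    unsigned long ea)
{
    return sr[(ea >> MODEL_SID_SHIFT) & 0xf] == MODEL_SR_INVALID;
}
```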



[PATCH 24/27] PPC: Export SWITCH_FRAME_SIZE

2010-04-15 Thread Alexander Graf
We need the SWITCH_FRAME_SIZE define on Book3S_32 now too.
So let's export it unconditionally.

CC: Benjamin Herrenschmidt 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kernel/asm-offsets.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 1804c2c..2716c51 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -210,8 +210,8 @@ int main(void)
/* Interrupt register frame */
DEFINE(STACK_FRAME_OVERHEAD, STACK_FRAME_OVERHEAD);
DEFINE(INT_FRAME_SIZE, STACK_INT_FRAME_SIZE);
-#ifdef CONFIG_PPC64
DEFINE(SWITCH_FRAME_SIZE, STACK_FRAME_OVERHEAD + sizeof(struct 
pt_regs));
+#ifdef CONFIG_PPC64
/* Create extra stack space for SRR0 and SRR1 when calling prom/rtas. */
DEFINE(PROM_FRAME_SIZE, STACK_FRAME_OVERHEAD + sizeof(struct pt_regs) + 
16);
DEFINE(RTAS_FRAME_SIZE, STACK_FRAME_OVERHEAD + sizeof(struct pt_regs) + 
16);
-- 
1.6.0.2



[PATCH 25/27] KVM: PPC: Check max IRQ prio

2010-04-15 Thread Alexander Graf
We have a define for the highest IRQ priority. So we can just as well use
it in the bit-checking code and avoid triggering invalid IRQ values.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index f5229f9..5805f99 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -336,7 +336,7 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu)
printk(KERN_EMERG "KVM: Check pending: %lx\n", 
vcpu->arch.pending_exceptions);
 #endif
priority = __ffs(*pending);
-   while (priority <= (sizeof(unsigned int) * 8)) {
+   while (priority < BOOK3S_IRQPRIO_MAX) {
if (kvmppc_book3s_irqprio_deliver(vcpu, priority) &&
(priority != BOOK3S_IRQPRIO_DECREMENTER)) {
/* DEC interrupts get cleared by mtdec */
-- 
1.6.0.2
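The bounded loop can be modeled on the host: walk the pending bits from the lowest upwards, but never act on a priority at or above the maximum. MODEL_IRQPRIO_MAX is a stand-in value, and the model simplifies away the kernel's per-priority delivery details:

```c
#include <assert.h>

#define MODEL_IRQPRIO_MAX 16    /* stand-in for BOOK3S_IRQPRIO_MAX */

/* Deliver every pending priority below the bound; bits at or above the
 * bound are ignored instead of indexing past the priority table. */
static int model_deliver_pending(unsigned long pending)
{
    int delivered = 0;

    while (pending) {
        int priority = __builtin_ctzl(pending); /* __ffs() equivalent */

        if (priority >= MODEL_IRQPRIO_MAX)      /* the new bound */
            break;
        delivered++;
        pending &= ~(1UL << priority);          /* clear the delivered bit */
    }
    return delivered;
}
```

The old `sizeof(unsigned int) * 8` bound allowed priorities with no handler behind them; clamping at the real maximum makes stray high bits harmless.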



[PATCH 26/27] KVM: PPC: Add KVM intercept handlers

2010-04-15 Thread Alexander Graf
When an interrupt occurs we don't know yet if we're in guest context or
in host context. When in guest context, KVM needs to handle it.

So let's pull the same trick we did on Book3S_64: Just add a macro to
determine if we're in guest context or not and if so jump on to KVM code.

CC: Benjamin Herrenschmidt 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kernel/head_32.S |   14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index e025e89..98c4b29 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include <asm/kvm_book3s_asm.h>
 
 /* 601 only have IBAT; cr0.eq is set on 601 when using this macro */
 #define LOAD_BAT(n, reg, RA, RB)   \
@@ -303,6 +304,7 @@ __secondary_hold_acknowledge:
  */
 #define EXCEPTION(n, label, hdlr, xfer)\
. = n;  \
+   DO_KVM n;   \
 label: \
EXCEPTION_PROLOG;   \
addir3,r1,STACK_FRAME_OVERHEAD; \
@@ -358,6 +360,7 @@ i##n:   
\
  * -- paulus.
  */
. = 0x200
+   DO_KVM  0x200
mtspr   SPRN_SPRG_SCRATCH0,r10
mtspr   SPRN_SPRG_SCRATCH1,r11
 	mfcr	r10
@@ -381,6 +384,7 @@ i##n:   
\
 
 /* Data access exception. */
. = 0x300
+   DO_KVM  0x300
 DataAccess:
EXCEPTION_PROLOG
mfspr   r10,SPRN_DSISR
@@ -397,6 +401,7 @@ DataAccess:
 
 /* Instruction access exception. */
. = 0x400
+   DO_KVM  0x400
 InstructionAccess:
EXCEPTION_PROLOG
andis.  r0,r9,0x4000/* no pte found? */
@@ -413,6 +418,7 @@ InstructionAccess:
 
 /* Alignment exception */
. = 0x600
+   DO_KVM  0x600
 Alignment:
EXCEPTION_PROLOG
mfspr   r4,SPRN_DAR
@@ -427,6 +433,7 @@ Alignment:
 
 /* Floating-point unavailable */
. = 0x800
+   DO_KVM  0x800
 FPUnavailable:
 BEGIN_FTR_SECTION
 /*
@@ -450,6 +457,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_FPU_UNAVAILABLE)
 
 /* System call */
. = 0xc00
+   DO_KVM  0xc00
 SystemCall:
EXCEPTION_PROLOG
EXC_XFER_EE_LITE(0xc00, DoSyscall)
@@ -467,9 +475,11 @@ SystemCall:
  * by executing an altivec instruction.
  */
. = 0xf00
+   DO_KVM  0xf00
b   PerformanceMonitor
 
. = 0xf20
+   DO_KVM  0xf20
b   AltiVecUnavailable
 
 /*
@@ -882,6 +892,10 @@ __secondary_start:
RFI
 #endif /* CONFIG_SMP */
 
+#ifdef CONFIG_KVM_BOOK3S_HANDLER
+#include "../kvm/book3s_rmhandlers.S"
+#endif
+
 /*
  * Those generic dummy functions are kept for CPUs not
  * included in CONFIG_6xx
-- 
1.6.0.2
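Stripped of assembly detail, DO_KVM is a per-vector context check. A small host-runnable model of the control flow (names illustrative, not the kernel's):

```c
#include <assert.h>
#include <stdbool.h>

/* Each exception vector first checks whether the CPU was executing the
 * guest; if so, control goes to KVM's handler instead of the host's. */
static int model_kvm_exits;
static int model_host_exceptions;

static void model_vector_entry(bool in_guest)
{
    if (in_guest)
        model_kvm_exits++;          /* DO_KVM branch taken */
    else
        model_host_exceptions++;    /* fall through to the normal handler */
}
```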



[PATCH 23/27] KVM: PPC: Export MMU variables

2010-04-15 Thread Alexander Graf
Our shadow MMU code needs to know where the HTAB is located and how
big it is. So we need some variables from the kernel exported to
module space if KVM is built as a module.

CC: Benjamin Herrenschmidt 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kernel/ppc_ksyms.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
index bc9f39d..2b7c43f 100644
--- a/arch/powerpc/kernel/ppc_ksyms.c
+++ b/arch/powerpc/kernel/ppc_ksyms.c
@@ -178,6 +178,11 @@ EXPORT_SYMBOL(switch_mmu_context);
 extern long mol_trampoline;
 EXPORT_SYMBOL(mol_trampoline); /* For MOL */
 EXPORT_SYMBOL(flush_hash_pages); /* For MOL */
+
+extern struct hash_pte *Hash;
+extern unsigned long _SDR1;
+EXPORT_SYMBOL_GPL(Hash); /* For KVM */
+EXPORT_SYMBOL_GPL(_SDR1); /* For KVM */
 #ifdef CONFIG_SMP
 extern int mmu_hash_lock;
 EXPORT_SYMBOL(mmu_hash_lock); /* For MOL */
-- 
1.6.0.2



[PATCH 10/27] KVM: PPC: Use KVM_BOOK3S_HANDLER

2010-04-15 Thread Alexander Graf
So far we had a lot of conditional code on CONFIG_KVM_BOOK3S_64_HANDLER.
As we're moving towards common code between the 32 and 64 bit targets, most of
these ifdefs can be moved over to a more generic define,
CONFIG_KVM_BOOK3S_HANDLER.

This patch adds the new generic config option and moves ifdefs over.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s_asm.h |4 ++--
 arch/powerpc/include/asm/paca.h   |2 +-
 arch/powerpc/kvm/Kconfig  |4 
 arch/powerpc/kvm/Makefile |2 +-
 4 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h 
b/arch/powerpc/include/asm/kvm_book3s_asm.h
index e915e7d..36fdb3a 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -22,7 +22,7 @@
 
 #ifdef __ASSEMBLY__
 
-#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+#ifdef CONFIG_KVM_BOOK3S_HANDLER
 
 #include 
 
@@ -55,7 +55,7 @@ kvmppc_resume_\intno:
 .macro DO_KVM intno
 .endm
 
-#endif /* CONFIG_KVM_BOOK3S_64_HANDLER */
+#endif /* CONFIG_KVM_BOOK3S_HANDLER */
 
 #else  /*__ASSEMBLY__ */
 
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index dc3ccdf..33347ea 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -136,7 +136,7 @@ struct paca_struct {
u64 startpurr;  /* PURR/TB value snapshot */
u64 startspurr; /* SPURR value snapshot */
 
-#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+#ifdef CONFIG_KVM_BOOK3S_HANDLER
struct  {
u64 esid;
u64 vsid;
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 60624cc..8ef3766 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -22,8 +22,12 @@ config KVM
select ANON_INODES
select KVM_MMIO
 
+config KVM_BOOK3S_HANDLER
+   bool
+
 config KVM_BOOK3S_64_HANDLER
bool
+   select KVM_BOOK3S_HANDLER
 
 config KVM_BOOK3S_64
tristate "KVM support for PowerPC book3s_64 processors"
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 0a67310..f621ce6 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -14,7 +14,7 @@ CFLAGS_emulate.o  := -I.
 
 common-objs-y += powerpc.o emulate.o
 obj-$(CONFIG_KVM_EXIT_TIMING) += timing.o
-obj-$(CONFIG_KVM_BOOK3S_64_HANDLER) += book3s_exports.o
+obj-$(CONFIG_KVM_BOOK3S_HANDLER) += book3s_exports.o
 
 AFLAGS_booke_interrupts.o := -I$(obj)
 
-- 
1.6.0.2



[PATCH 00/27] Book3S_32 (PPC32) KVM support

2010-04-15 Thread Alexander Graf
Since we do have support for Book3S_64 KVM now, the next obvious step is to
support the generation before that: Book3S_32.

This patch set adds support for Book3S_32 hosts, making your old G4 this much
more useful. It should also work on fancy exotic systems like the Wii and the
Game Cube, but I haven't tried yet.

As far as the path I took goes, I tried to merge as much functionality and code
as possible with the 64 bit host support. So whenever code was reusable, it got
reused.

Alexander Graf (27):
  KVM: PPC: Name generic 64-bit code generic
  KVM: PPC: Add host MMU Support
  KVM: PPC: Add SR swapping code
  KVM: PPC: Add generic segment switching code
  PPC: Split context init/destroy functions
  KVM: PPC: Add kvm_book3s_64.h
  KVM: PPC: Add kvm_book3s_32.h
  KVM: PPC: Add fields to shadow vcpu
  KVM: PPC: Improve indirect svcpu accessors
  KVM: PPC: Use KVM_BOOK3S_HANDLER
  KVM: PPC: Use CONFIG_PPC_BOOK3S define
  PPC: Add STLU
  KVM: PPC: Use now shadowed vcpu fields
  KVM: PPC: Extract MMU init
  KVM: PPC: Make real mode handler generic
  KVM: PPC: Make highmem code generic
  KVM: PPC: Make SLB switching code the new segment framework
  KVM: PPC: Release clean pages as clean
  KVM: PPC: Remove fetch fail code
  KVM: PPC: Add SVCPU to Book3S_32
  KVM: PPC: Emulate segment fault
  KVM: PPC: Add Book3S compatibility code
  KVM: PPC: Export MMU variables
  PPC: Export SWITCH_FRAME_SIZE
  KVM: PPC: Check max IRQ prio
  KVM: PPC: Add KVM intercept handlers
  KVM: PPC: Enable Book3S_32 KVM building

 arch/powerpc/include/asm/asm-compat.h|2 +
 arch/powerpc/include/asm/kvm_book3s.h|  100 +-
 arch/powerpc/include/asm/kvm_book3s_32.h |   42 ++
 arch/powerpc/include/asm/kvm_book3s_64.h |   28 ++
 arch/powerpc/include/asm/kvm_book3s_64_asm.h |   76 
 arch/powerpc/include/asm/kvm_book3s_asm.h|   97 +
 arch/powerpc/include/asm/kvm_booke.h |   96 +
 arch/powerpc/include/asm/kvm_host.h  |   16 +-
 arch/powerpc/include/asm/kvm_ppc.h   |   80 +
 arch/powerpc/include/asm/mmu_context.h   |2 +
 arch/powerpc/include/asm/paca.h  |   10 +-
 arch/powerpc/include/asm/processor.h |3 +
 arch/powerpc/kernel/asm-offsets.c|  102 +++--
 arch/powerpc/kernel/head_32.S|   14 +
 arch/powerpc/kernel/head_64.S|4 +-
 arch/powerpc/kernel/ppc_ksyms.c  |5 +
 arch/powerpc/kvm/Kconfig |   24 +-
 arch/powerpc/kvm/Makefile|   18 +-
 arch/powerpc/kvm/book3s.c|  184 ++---
 arch/powerpc/kvm/book3s_32_mmu.c |3 +
 arch/powerpc/kvm/book3s_32_mmu_host.c|  480 ++
 arch/powerpc/kvm/book3s_32_sr.S  |  143 +++
 arch/powerpc/kvm/book3s_64_emulate.c |  566 -
 arch/powerpc/kvm/book3s_64_exports.c |   32 --
 arch/powerpc/kvm/book3s_64_interrupts.S  |  318 --
 arch/powerpc/kvm/book3s_64_mmu.c |2 +-
 arch/powerpc/kvm/book3s_64_mmu_host.c|   50 ++-
 arch/powerpc/kvm/book3s_64_rmhandlers.S  |  195 -
 arch/powerpc/kvm/book3s_64_slb.S |  183 ++---
 arch/powerpc/kvm/book3s_emulate.c|  570 ++
 arch/powerpc/kvm/book3s_exports.c|   32 ++
 arch/powerpc/kvm/book3s_interrupts.S |  319 ++
 arch/powerpc/kvm/book3s_paired_singles.c |2 +-
 arch/powerpc/kvm/book3s_rmhandlers.S |  252 
 arch/powerpc/kvm/book3s_segment.S|  258 
 arch/powerpc/kvm/emulate.c   |   17 +-
 arch/powerpc/kvm/powerpc.c   |2 +-
 arch/powerpc/mm/mmu_context_hash32.c |   29 +-
 38 files changed, 2771 insertions(+), 1585 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_32.h
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_64.h
 delete mode 100644 arch/powerpc/include/asm/kvm_book3s_64_asm.h
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_asm.h
 create mode 100644 arch/powerpc/include/asm/kvm_booke.h
 create mode 100644 arch/powerpc/kvm/book3s_32_mmu_host.c
 create mode 100644 arch/powerpc/kvm/book3s_32_sr.S
 delete mode 100644 arch/powerpc/kvm/book3s_64_emulate.c
 delete mode 100644 arch/powerpc/kvm/book3s_64_exports.c
 delete mode 100644 arch/powerpc/kvm/book3s_64_interrupts.S
 delete mode 100644 arch/powerpc/kvm/book3s_64_rmhandlers.S
 create mode 100644 arch/powerpc/kvm/book3s_emulate.c
 create mode 100644 arch/powerpc/kvm/book3s_exports.c
 create mode 100644 arch/powerpc/kvm/book3s_interrupts.S
 create mode 100644 arch/powerpc/kvm/book3s_rmhandlers.S
 create mode 100644 arch/powerpc/kvm/book3s_segment.S



[PATCH 2/2] Add a 'screen' backend to qemu-char.c

2010-04-15 Thread Thomas Bächler
This backend builds on the 'pty' backend, but attaches the pseudo tty to a
screen session. A user can specify 'screen:name' as a char device in the qemu
options and then run 'screen -R name' to attach to it.

This is very useful for running headless qemu/qemu-kvm machines, where you
can create screen sessions for the qemu monitor and serial console.
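
For illustration, the screen invocation the patch assembles with snprintf can be sketched in Python (the session name and pty path below are made-up examples, not values from the patch):

```python
def build_screen_cmd(session, pty_path):
    """Mirror of the snprintf in the patch: start a detached screen
    session (-dm) named via -S, wrapping the guest's pseudo tty."""
    return "screen -S '%s' -dm '%s'" % (session, pty_path)

# e.g. for -monitor screen:monitor, with qemu's pty at /dev/pts/7:
print(build_screen_cmd("monitor", "/dev/pts/7"))
# the user later attaches with: screen -R monitor
```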
---
 qemu-char.c   |   33 +++--
 qemu-config.c |    3 +++
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index d845572..d82e4c6 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -973,7 +973,9 @@ static CharDriverState *qemu_chr_open_pty(QemuOpts *opts)
 CharDriverState *chr;
 PtyCharDriver *s;
 struct termios tty;
-int slave_fd, len;
+int slave_fd, len, r;
+const char *screensession;
+char *cmd;
 #if defined(__OpenBSD__) || defined(__DragonFly__)
 char pty_name[PATH_MAX];
 #define q_ptsname(x) pty_name
@@ -1001,7 +1003,29 @@ static CharDriverState *qemu_chr_open_pty(QemuOpts *opts)
 chr->filename = qemu_malloc(len);
 snprintf(chr->filename, len, "pty:%s", q_ptsname(s->fd));
 qemu_opt_set(opts, "path", q_ptsname(s->fd));
-fprintf(stderr, "char device redirected to %s\n", q_ptsname(s->fd));
+
+if((screensession = qemu_opt_get(opts, "screen")) != NULL) {
+if(strlen(screensession) == 0) {
+qemu_free(chr);
+qemu_free(s);
+return NULL;
+}
+len = strlen(screensession) + strlen(q_ptsname(s->fd)) + 20;
+cmd = qemu_malloc(len);
+snprintf(cmd, len, "screen -S '%s' -dm '%s'", screensession, q_ptsname(s->fd));
+r = system(cmd);
+qemu_free(cmd);
+if(r == -1 || WEXITSTATUS(r) != 0) {
+fprintf(stderr, "failed to launch screen\n");
+qemu_free(chr);
+qemu_free(s);
+return NULL;
+} else {
+fprintf(stderr, "char device attached to screen session %s\n", screensession);
+}
+} else {
+fprintf(stderr, "char device redirected to %s\n", q_ptsname(s->fd));
+}
 
 chr->opaque = s;
 chr->chr_write = pty_chr_write;
@@ -2343,6 +2367,11 @@ QemuOpts *qemu_chr_parse_compat(const char *label, const char *filename)
 qemu_opt_set(opts, "path", p);
 return opts;
 }
+if (strstart(filename, "screen:", &p)) {
+qemu_opt_set(opts, "backend", "pty");
+qemu_opt_set(opts, "screen", p);
+return opts;
+}
 if (strstart(filename, "tcp:", &p) ||
 strstart(filename, "telnet:", &p)) {
 if (sscanf(p, "%64[^:]:%32[^,]%n", host, port, &pos) < 2) {
diff --git a/qemu-config.c b/qemu-config.c
index 150157c..8ebf219 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -146,6 +146,9 @@ QemuOptsList qemu_chardev_opts = {
 },{
 .name = "signal",
 .type = QEMU_OPT_BOOL,
+},{
+.name = "screen",
+.type = QEMU_OPT_STRING,
 },
 { /* end if list */ }
 },
-- 
1.7.0.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] qemu-char.c: Fix memory leaks in qemu_chr_open_pty when openpty fails

2010-04-15 Thread Thomas Bächler
---
 qemu-char.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 05df971..d845572 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -986,6 +986,8 @@ static CharDriverState *qemu_chr_open_pty(QemuOpts *opts)
 s = qemu_mallocz(sizeof(PtyCharDriver));
 
 if (openpty(&s->fd, &slave_fd, pty_name, NULL, NULL) < 0) {
+qemu_free(chr);
+qemu_free(s);
 return NULL;
 }
 
-- 
1.7.0.5



Re: Question on copy & paste

2010-04-15 Thread Jim Paris
Stephen Liu wrote:
> 
> 
> - Original Message 
> From: Amit Shah 
> To: Stephen Liu 
> Cc: kvm@vger.kernel.org
> Sent: Thu, April 15, 2010 9:02:53 AM
> Subject: Re: Question on copy & paste
> 
> On (Thu) Apr 15 2010 [08:45:23], Stephen Liu wrote:
> > Hi folks,
> > 
> > host - Debian 5.04
> > 
> > What will the easy way to enable copy_and_paste function between guest and 
> > hosts?  Also among guests.  TIA
> 
> This doesn't exist yet, but something should be available in a few
> months.
> 
> 
> Noted and thanks

You can use higher level layers to handle that in the meantime.  For
example, I always use rdesktop to connect to my Windows guests and it
supports copy and paste just fine.

-jim


[PATCH V2] drivers/uio/uio.c: DMA mapping, interrupt extensions, etc.

2010-04-15 Thread Tom Lyon
This is the second of two related but independent patches; it covers uio.c,
while the previous one covers uio_pci_generic.c.

The 2 patches were previously one large patch. Changes for this version:
- uio_pci_generic.c just gets extensions so that a single fd can be used
  by non-privileged processes for interrupt control and mmaps
- All of the DMA and IOMMU related stuff move to uio.c; no longer a need
  to pass ioctls to individual uio drivers. It turns out that the code
  is not PCI specific anyway.
- A new ioctl to pin DMA buffers to certain IO virtual addresses for KVM.
- New eventfd based interrupt notifications, including support for PCI
  specific MSI and MSI-X interrupts.
- PCI specific code to reset PCI functions before and after use

diff -ruNP linux-2.6.33/drivers/uio/uio.c uio-2.6.33/drivers/uio/uio.c
--- linux-2.6.33/drivers/uio/uio.c  2010-02-24 10:52:17.0 -0800
+++ uio-2.6.33/drivers/uio/uio.c    2010-04-15 12:39:02.0 -0700
@@ -23,6 +23,11 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
+#include 
 
 #define UIO_MAX_DEVICES 255
 
@@ -37,8 +42,37 @@
struct uio_info *info;
struct kobject  *map_dir;
struct kobject  *portio_dir;
+   struct pci_dev  *pdev;
+   int pmaster;
+   struct semaphoregate;
+   int listeners;
+   atomic_tmapcount;
+   struct msix_entry   *msix;
+   int nvec;
+   struct iommu_domain *domain;
+   int cachec;
+   struct eventfd_ctx  *ev_irq;
+   struct eventfd_ctx  *ev_msi;
+   struct eventfd_ctx  **ev_msix;
 };
 
+/*
+ * Structure for keeping track of memory nailed down by the
+ * user for DMA
+ */
+struct dma_map_page {
+struct list_head list;
+struct page **pages;
+   struct scatterlist *sg;
+dma_addr_t  daddr;
+   unsigned long   vaddr;
+   int npage;
+   int rdwr;
+};
+
+static void uio_disable_msi(struct uio_device *);
+static void uio_disable_msix(struct uio_device *);
+
 static int uio_major;
 static DEFINE_IDR(uio_idr);
 static const struct file_operations uio_fops;
@@ -440,17 +474,38 @@
struct uio_device *idev = (struct uio_device *)dev_id;
irqreturn_t ret = idev->info->handler(irq, idev->info);
 
-   if (ret == IRQ_HANDLED)
+   if (ret != IRQ_HANDLED)
+   return ret;
+   if (idev->ev_irq)
+   eventfd_signal(idev->ev_irq, 1);
+   else
uio_event_notify(idev->info);
 
return ret;
 }
 
+/*
+ * MSI and MSI-X Interrupt handler.
+ * Just record an event
+ */
+static irqreturn_t msihandler(int irq, void *arg)
+{
+   struct eventfd_ctx *ctx = arg;
+
+   eventfd_signal(ctx, 1);
+   return IRQ_HANDLED;
+}
+
 struct uio_listener {
struct uio_device *dev;
s32 event_count;
+   struct mm_struct*mm;
+   struct mmu_notifier mmu_notifier;
+   struct list_headdm_list;
 };
 
+static void uio_dma_unmapall(struct uio_listener *);
+
 static int uio_open(struct inode *inode, struct file *filep)
 {
struct uio_device *idev;
@@ -470,7 +525,7 @@
goto out;
}
 
-   listener = kmalloc(sizeof(*listener), GFP_KERNEL);
+   listener = kzalloc(sizeof(*listener), GFP_KERNEL);
if (!listener) {
ret = -ENOMEM;
goto err_alloc_listener;
@@ -478,8 +533,22 @@
 
listener->dev = idev;
listener->event_count = atomic_read(&idev->event);
+   INIT_LIST_HEAD(&listener->dm_list);
filep->private_data = listener;
 
+   down(&idev->gate);
+   if (idev->listeners == 0) { /* first open */
+   if (idev->pmaster && !iommu_found() && !capable(CAP_SYS_RAWIO)) {
+   up(&idev->gate);
+   return -EPERM;
+   }
+   /* reset to known state if we can */
+   if (idev->pdev)
+   (void) pci_reset_function(idev->pdev);
+   }
+   idev->listeners++;
+   up(&idev->gate);
+
if (idev->info->open) {
ret = idev->info->open(idev->info, inode);
if (ret)
@@ -514,6 +583,34 @@
if (idev->info->release)
ret = idev->info->release(idev->info, inode);
 
+   uio_dma_unmapall(listener);
+   if (listener->mm) {
+   mmu_notifier_unregister(&listener->mmu_notifier, listener->mm);
+   listener->mm = NULL;
+   }
+
+   down(&idev->gate);
+   if (--idev->listeners <= 0) {
+   if (idev->msix) {
+   uio_disable_msix(idev);
+   }
+   if (idev->ev_msi) {
+   uio_disable_msi(idev);
+   }
+   if (idev->ev_irq) {
+  

[PATCH V2] drivers/uio/uio_pci_generic.c: allow access for non-privileged processes

2010-04-15 Thread Tom Lyon
This is the first of two related but independent patches; it covers
uio_pci_generic.c, while the next one covers uio.c.

The 2 patches were previously one large patch. Changes for this version:
- uio_pci_generic.c just gets extensions so that a single fd can be used
  by non-privileged processes for interrupt control and mmaps
- All of the DMA and IOMMU related stuff move to uio.c; no longer a need
  to pass ioctls to individual uio drivers. It turns out that the code
  is not PCI specific anyway.
- A new ioctl to pin DMA buffers to certain IO virtual addresses for KVM.
- New eventfd based interrupt notifications, including support for PCI
  specific MSI and MSI-X interrupts.
- PCI specific code to reset PCI functions before and after use
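
The interrupt control added for non-privileged processes (irqcontrol()/irqtoggle() in the patch below) boils down to a read-modify-write of the Interrupt Disable bit in the PCI command register. A minimal sketch of that bit manipulation, on a plain integer standing in for the register:

```python
PCI_COMMAND_INTX_DISABLE = 0x400  # bit 10 of the PCI command register

def toggle_intx(command, irq_on):
    """Compute the new command-register value, as irqtoggle() does:
    clear the INTx-disable bit to enable interrupts, set it to
    disable them."""
    if irq_on:
        return command & ~PCI_COMMAND_INTX_DISABLE
    return command | PCI_COMMAND_INTX_DISABLE
```

The kernel code additionally takes a spinlock and blocks user config-space access around the read-modify-write, since the register is shared state.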

--- linux-2.6.33/drivers/uio/uio_pci_generic.c  2010-02-24 10:52:17.0 -0800
+++ uio-2.6.33/drivers/uio/uio_pci_generic.c    2010-04-15 13:44:25.0 -0700
@@ -14,9 +14,9 @@
  * # ls -l /sys/bus/pci/devices/:00:19.0/driver
  * .../:00:19.0/driver -> ../../../bus/pci/drivers/uio_pci_generic
  *
- * Driver won't bind to devices which do not support the Interrupt Disable Bit
+ * Driver won't bind to devices which do not support MSI, MSI-x, or the Interrupt Disable Bit
  * in the command register. All devices compliant to PCI 2.3 (circa 2002) and
- * all compliant PCI Express devices should support this bit.
+ * all compliant PCI Express devices should support one of these.
  */
 
 #include 
@@ -41,6 +41,39 @@
return container_of(info, struct uio_pci_generic_dev, info);
 }
 
+/* Read/modify/write command register to disable interrupts.
+ * Note: we could cache the value and optimize the read if there was a way to
+ * get notified of user changes to command register through sysfs.
+ * */
+static void irqtoggle(struct uio_pci_generic_dev *gdev, int irq_on)
+{
+   struct pci_dev *pdev = gdev->pdev;
+   unsigned long flags;
+   u16 orig, new;
+
+   spin_lock_irqsave(&gdev->lock, flags);
+   pci_block_user_cfg_access(pdev);
+   pci_read_config_word(pdev, PCI_COMMAND, &orig);
+   new = irq_on ? (orig & ~PCI_COMMAND_INTX_DISABLE)
+   : (orig | PCI_COMMAND_INTX_DISABLE);
+   if (new != orig)
+   pci_write_config_word(gdev->pdev, PCI_COMMAND, new);
+   pci_unblock_user_cfg_access(pdev);
+   spin_unlock_irqrestore(&gdev->lock, flags);
+}
+
+/* irqcontrol is used by userspace to enable/disable interrupts. */
+/* A privileged app can write the PCI_COMMAND register directly,
+ * but we need this for normal apps
+ */
+static int irqcontrol(struct uio_info *info, s32 irq_on)
+{
+   struct uio_pci_generic_dev *gdev = to_uio_pci_generic_dev(info);
+
+   irqtoggle(gdev, irq_on);
+   return 0;
+}
+
 /* Interrupt handler. Read/modify/write the command register to disable
  * the interrupt. */
 static irqreturn_t irqhandler(int irq, struct uio_info *info)
@@ -89,7 +122,7 @@
 /* Verify that the device supports Interrupt Disable bit in command register,
  * per PCI 2.3, by flipping this bit and reading it back: this bit was readonly
  * in PCI 2.2. */
-static int __devinit verify_pci_2_3(struct pci_dev *pdev)
+static int verify_pci_2_3(struct pci_dev *pdev)
 {
u16 orig, new;
int err = 0;
@@ -121,17 +154,51 @@
return err;
 }
 
-static int __devinit probe(struct pci_dev *pdev,
+/* we could've used the generic pci sysfs stuff for mmap,
+ * but this way we can allow non-privileged users as long
+ * as /dev/uio* has the right permissions
+ */
+static void uio_do_maps(struct uio_pci_generic_dev *gdev)
+{
+   struct pci_dev *pdev = gdev->pdev;
+   struct uio_info *info = &gdev->info;
+   int i, j;
+   char *name;
+
+   for (i=0, j=0; imem[j].name = name; 
+   info->mem[j].addr = pci_resource_start(pdev, i); 
+   info->mem[j].size = pci_resource_len(pdev, i);
+   info->mem[j].memtype = UIO_MEM_PHYS; 
+   j++;
+   }
+   }
+   for (i=0, j=0; iport[j].name = name;
+   info->port[j].start = pci_resource_start(pdev, i);
+   info->port[j].size = pci_resource_len(pdev, i);
+   info->port[j].porttype = UIO_PORT_X86;
+   j++;
+   }
+   }
+}
+
+static int probe(struct pci_dev *pdev,
   const struct pci_device_id *id)
 {
struct uio_pci_generic_dev *gdev;
int err;
-
-   if (!pdev->irq) {
-   dev_warn(&pdev->dev, "No IRQ assigned to device: "
-"no support for interrupts?\n");
-   return -ENODEV;
-   }
+   int msi=0;
 
err = pci_enable_device(pdev);
if (err) {
@@ -140,9 +207,26 @@
return err;
}
 
-   err = verify_pci_2_3(pdev);
-   if (err)
-   goto err_verify;
+   if (pci_find_capability(pdev, PCI_CAP_ID_MSI)) {
+  

Re: [Autotest] [PATCH] KVM test: Memory ballooning test for KVM guest

2010-04-15 Thread Lucas Meneghel Rodrigues
On Thu, 2010-04-15 at 20:40 +0530, pradeep wrote:
> Hi Lucas
> 
> Please ignore my earlier patch
> Find the correct patch with the suggested changes.

Hi Pradeep, I was reading the test once again while trying it myself,
some other ideas came to me. I spent some time hacking the test and sent
an updated patch with changes. Please let me know what you think, if you
are OK with them I'll commit it.

> 
> --SP
> 
> 
> plain text document attachment (patch)
> diff -purN autotest/client/tests/kvm/tests/balloon_check.py 
> autotest-new/client/tests/kvm/tests/balloon_check.py
> --- autotest/client/tests/kvm/tests/balloon_check.py  1969-12-31 
> 19:00:00.0 -0500
> +++ autotest-new/client/tests/kvm/tests/balloon_check.py  2010-04-15 
> 18:50:09.0 -0400
> @@ -0,0 +1,51 @@
> +import re, string, logging, random, time
> +from autotest_lib.client.common_lib import error
> +import kvm_test_utils, kvm_utils
> +
> +def run_balloon_check(test, params, env):
> +"""
> +Check Memory ballooning:
> +1) Boot a guest
> +2) Change the memory between 60% to 95% of memory of guest using 
> ballooning 
> +3) check memory info
> +
> +@param test: kvm test object
> +@param params: Dictionary with the test parameters
> +@param env: Dictionary with test environment.
> +"""
> +
> +vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
> +session = kvm_test_utils.wait_for_login(vm)
> +fail = 0
> +
> +# Check memory size
> +logging.info("Memory size check")
> +expected_mem = int(params.get("mem"))
> +actual_mem = vm.get_memory_size()
> +if actual_mem != expected_mem:
> +logging.error("Memory size mismatch:")
> +logging.error("Assigned to VM: %s" % expected_mem)
> +logging.error("Reported by OS: %s" % actual_mem)
> +
> +
> +#Check if info balloon works or not.
> +status, output = vm.send_monitor_cmd("info balloon")
> +if status != 0:
> +logging.error("qemu monitor command failed: info balloon")
> +fail += 1
> + 
> +#Reduce memory to random size between 60% to 95% of actual memory
> +percent = random.uniform(0.6, 0.95)
> +new_mem = int(percent*actual_mem)
> +vm.send_monitor_cmd("balloon %s" % new_mem)
> +time.sleep(20)
> +status, output = vm.send_monitor_cmd("info balloon")
> +ballooned_mem = int(re.findall("\d+",output)[0])
> +if ballooned_mem != new_mem:
> +logging.error("memory ballooning failed while changing memory from 
> %s to %s" %actual_mem %new_mem)  
> +fail += 1
> +
> +#Checking for test result
> +if fail != 0:
> +raise error.TestFail("Memory ballooning test failed ")
> +session.close()
> diff -purN autotest/client/tests/kvm/tests_base.cfg.sample 
> autotest-new/client/tests/kvm/tests_base.cfg.sample
> --- autotest/client/tests/kvm/tests_base.cfg.sample   2010-04-15 
> 09:14:10.0 -0400
> +++ autotest-new/client/tests/kvm/tests_base.cfg.sample   2010-04-15 
> 18:50:35.0 -0400
> @@ -171,6 +171,10 @@ variants:
>  drift_threshold = 10
>  drift_threshold_single = 3
>  
> +- balloon_check:  install setup unattended_install
> +type = balloon_check
> +extra_params += "-balloon virtio"
> +
>  - stress_boot:  install setup unattended_install
>  type = stress_boot
>  max_vms = 5




Re: [PATCH 5/5] add documentation about kvmclock

2010-04-15 Thread Glauber Costa
On Thu, Apr 15, 2010 at 12:28:36PM -0700, Randy Dunlap wrote:
> On Thu, 15 Apr 2010 14:37:28 -0400 Glauber Costa wrote:
> 
Thanks Randy,

All comments are relevant. I'll update the document to cover them.

As for your question: Both bit 3 and 0 are used, yes. But they
tell the guest to use a different MSR pair. However, this document
is intended for people not familiar with kvmclock. If you read it,
and could not extract that from the text, it definitely needs to
be augmented.

Thanks!


[PATCH] KVM test: Memory ballooning test for KVM guest v4

2010-04-15 Thread Lucas Meneghel Rodrigues
From: pradeep 

This test verifies memory ballooning functionality for KVM guests.
It will boot a guest with -balloon virtio, increase and decrease
memory on qemu monitor, verifying the changes, for a given number
of iterations.

Changes from v3:
 * Also check current memory reported by the guest to see if
   ballooning was successful
 * Added a new method to vm, get_current_memory_size(), to
   report total memory for the guest OS at a given time
 * Refactored some test functions
 * Resets ballooned memory at the end of the test (which
   exercises a little more the functionality and avoids
   problems when running through multiple iterations).
 * Make the test run for at least 5 iterations, configurable
   in the 'iterations' parameter. This way the functionality
   is more thoroughly tested, giving testers with particular
   needs some room to customize how many iterations they want.
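
As a rough sketch of the verification step the test performs after each balloon command (the exact 'info balloon' output format shown is an assumption, based on the regex the test uses):

```python
import re

def parse_balloon_info(output):
    """Pull the first integer (the ballooned memory size) out of the
    'info balloon' monitor output, as the test's regex does."""
    return int(re.findall(r"\d+", output)[0])

def balloon_ok(reported, requested):
    """The monitor must report exactly the requested size; the guest
    OS is allowed to report slightly less, never more."""
    return reported == requested
```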

Signed-off-by: Pradeep Kumar Surisetty 
Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/kvm/kvm_vm.py  |   18 +-
 client/tests/kvm/tests/balloon_check.py |  102 +++
 client/tests/kvm/tests_base.cfg.sample  |    9 +++-
 3 files changed, 125 insertions(+), 4 deletions(-)
 create mode 100644 client/tests/kvm/tests/balloon_check.py

diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
index 047505a..3c643fd 100755
--- a/client/tests/kvm/kvm_vm.py
+++ b/client/tests/kvm/kvm_vm.py
@@ -916,15 +916,19 @@ class VM:
 session.close()
 
 
-def get_memory_size(self):
+def get_memory_size(self, cmd=None):
 """
-Get memory size of the VM.
+Get bootup memory size of the VM.
+
+@param cmd: Command used to check memory. If not provided,
+self.params.get("mem_chk_cmd") will be used.
 """
 session = self.remote_login()
 if not session:
 return None
 try:
-cmd = self.params.get("mem_chk_cmd")
+if not cmd:
+cmd = self.params.get("mem_chk_cmd")
 s, mem_str = session.get_command_status_output(cmd)
 if s != 0:
 return None
@@ -941,3 +945,11 @@ class VM:
 return int(mem_size)
 finally:
 session.close()
+
+
+def get_current_memory_size(self):
+"""
+Get current memory size of the VM, rather than bootup memory.
+"""
+cmd = self.params.get("mem_chk_cur_cmd")
+return self.get_memory_size(cmd)
diff --git a/client/tests/kvm/tests/balloon_check.py 
b/client/tests/kvm/tests/balloon_check.py
new file mode 100644
index 000..2496785
--- /dev/null
+++ b/client/tests/kvm/tests/balloon_check.py
@@ -0,0 +1,102 @@
+import re, string, logging, random, time
+from autotest_lib.client.common_lib import error
+import kvm_test_utils, kvm_utils
+
+def run_balloon_check(test, params, env):
+"""
+Check Memory ballooning:
+1) Boot a guest
+2) Change the memory between 60% to 95% of memory of guest using ballooning
+3) check memory info
+
+@param test: kvm test object
+@param params: Dictionary with the test parameters
+@param env: Dictionary with test environment.
+"""
+def check_ballooned_memory():
+"""
+Verify the actual memory reported by monitor command info balloon. If
+the operation failed, increase the failure counter.
+
+@return: Tuple (ballooned memory, number of failures occurred).
+"""
+fail = 0
+status, output = vm.send_monitor_cmd("info balloon")
+if status != 0:
+logging.error("qemu monitor command failed: info balloon")
+fail += 1
+return 0, fail
+return int(re.findall("\d+", output)[0]), fail
+
+
+def balloon_memory(new_mem):
+"""
+Baloon memory to new_mem and verifies on both qemu monitor and
+guest OS if change worked.
+
+@param new_mem: New desired memory.
+@return: Number of failures occurred during operation.
+"""
+fail = 0
+logging.info("Changing VM memory to %s", new_mem)
+vm.send_monitor_cmd("balloon %s" % new_mem)
+time.sleep(20)
+
+ballooned_mem, cfail = check_ballooned_memory()
+fail += cfail
+# Verify whether the VM machine reports the correct new memory
+if ballooned_mem != new_mem:
+logging.error("Memory ballooning failed while changing memory "
+  "from %s to %s", actual_mem, new_mem)
+fail += 1
+
+# Verify whether the guest OS reports the correct new memory
+current_mem_guest = vm.get_current_memory_size()
+
+# Current memory figures will always be a little smaller than new
+# memory. If they are higher, ballooning failed from the guest's perspective
+if current_mem_guest > new_mem:
+logging.error("Guest OS reports %s of RAM, but new ballooned RAM "
+ 

Re: [PATCH 5/5] add documentation about kvmclock

2010-04-15 Thread Randy Dunlap
On Thu, 15 Apr 2010 14:37:28 -0400 Glauber Costa wrote:

> This patch adds a new file, kvm/kvmclock.txt, describing
> the mechanism we use in kvmclock.
> 
> Signed-off-by: Glauber Costa 
> ---
>  Documentation/kvm/kvmclock.txt |  138 
> 
>  1 files changed, 138 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/kvm/kvmclock.txt
> 
> diff --git a/Documentation/kvm/kvmclock.txt b/Documentation/kvm/kvmclock.txt
> new file mode 100644
> index 000..21008bb
> --- /dev/null
> +++ b/Documentation/kvm/kvmclock.txt
> @@ -0,0 +1,138 @@
> +KVM Paravirtual Clocksource driver
> +Glauber Costa, Red Hat Inc.
> +==
> +
> +1. General Description
> +===
> +
...
> +
> +2. kvmclock basics 
> +===
> +
> +When supported by the hypervisor, guests can register a memory page
> +to contain kvmclock data. This page has to be present in guest's address 
> space
> +throughout its whole life. The hypervisor continues to write to it until it 
> is
> +explicitly disabled or the guest is turned off.
> +
> +2.1 kvmclock availability
> +-
> +
> +Guests that want to take advantage of kvmclock should first check its
> +availability through cpuid.
> +
> +kvm features are presented to the guest in leaf 0x4001. Bit 3 indicates
> +the present of kvmclock. Bit 0 indicates that kvmclock is present, but the

   presence
but it's confusing.  Is it bit 3 or bit 0?  They seem to indicate the same 
thing.

> +old MSR set must be used. See section 2.3 for details.

"old MSR set":  what does this mean?

> +
> +2.2 kvmclock functionality
> +--
> +
> +Two MSRs are provided by the hypervisor, controlling kvmclock operation:
> +
> + * MSR_KVM_WALL_CLOCK, value 0x4b564d00 and
> + * MSR_KVM_SYSTEM_TIME, value 0x4b564d01.
> +
> +The first one is only used in rare situations, like boot-time and a
> +suspend-resume cycle. Data is disposable, and after used, the guest
> +may use it for something else. This is hardly a hot path for anything.
> +The Hypervisor fills in the address provided through this MSR with the
> +following structure:
> +
> +struct pvclock_wall_clock {
> +u32   version;
> +u32   sec;
> +u32   nsec;
> +} __attribute__((__packed__));
> +
> +Guest should only trust data to be valid when version haven't changed before

 has not

> +and after reads of sec and nsec. Besides not changing, it has to be an even
> +number. Hypervisor may write an odd number to version field to indicate that
> +an update is in progress.
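
The retry loop a guest needs for this even/odd version protocol can be sketched as follows (a hypothetical helper, not the kernel's implementation; read_fields stands in for reading the shared structure):

```python
def read_wall_clock(read_fields):
    """Version-field protocol: a (sec, nsec) read is only valid if the
    version is even and unchanged across the read; otherwise the
    hypervisor was mid-update and we must retry."""
    while True:
        v_before, sec, nsec = read_fields()
        v_after, _, _ = read_fields()
        if v_before == v_after and v_before % 2 == 0:
            return sec, nsec
```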
> +
> +MSR_KVM_SYSTEM_TIME, on the other hand, has persistent data, and is
> +constantly updated by the hypervisor with time information. The data
> +written in this MSR contains two pieces of information: the address in which
> +the guests expects time data to be present 4-byte aligned or'ed with an
> +enabled bit. If one wants to shutdown kvmclock, it just needs to write
> +anything that has 0 as its last bit.
> +
> +Time information presented by the hypervisor follows the structure:
> +
> +struct pvclock_vcpu_time_info {
> +u32   version;
> +u32   pad0;
> +u64   tsc_timestamp;
> +u64   system_time;
> +u32   tsc_to_system_mul;
> +s8tsc_shift;
> +u8pad[3];
> +} __attribute__((__packed__)); 
> +
> +The version field plays the same role as with the one in struct
> +pvclock_wall_clock. The other fields, are:
> +
> + a. tsc_timestamp: the guest-visible tsc (result of rdtsc + tsc_offset) of
> +this cpu at the moment we recorded system_time. Note that some time is

CPU (please)

> +inevitably spent between system_time and tsc_timestamp measurements.
> +Guests can subtract this quantity from the current value of tsc to obtain
> +a delta to be added to system_time

   to system_time.

> +
> + b. system_time: this is the most recent host-time we could be provided with.
> +host gets it through ktime_get_ts, using whichever clocksource is
> +registered at the moment

 moment.

> +
> + c. tsc_to_system_mul: this is the number that tsc delta has to be multiplied
> +by in order to obtain time in nanoseconds. Hypervisor is free to change
> +this value in face of events like cpu frequency change, pcpu migration,

 CPU

> +etc.
> + 
> + d. tsc_shift: guests must shift 

missing text??

> +
> +With this information available, guest calculates current time as:
> +
> +  T = kt + to_nsec(tsc - tsc_0)
> +
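
A sketch of the to_nsec() scaling described above, treating tsc_to_system_mul as a 32.32 fixed-point factor and tsc_shift as a pre-shift (the function name and shape are illustrative, not the kernel's exact code):

```python
def to_nsec(tsc_delta, tsc_to_system_mul, tsc_shift):
    """Scale a tsc delta to nanoseconds: pre-shift the delta by
    tsc_shift, multiply by the 32.32 fixed-point factor, then drop
    the 32 fractional bits."""
    if tsc_shift >= 0:
        tsc_delta <<= tsc_shift
    else:
        tsc_delta >>= -tsc_shift
    return (tsc_delta * tsc_to_system_mul) >> 32

# with mul == 2**32 (factor 1.0) and shift == 0, the delta passes through
```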
> +2.3 Compatibility MSRs
> +--
> +
> +Guests running on top of older hypervisors may have to use a different set of
> +MSRs. This is because originally, kvmclock MSRs were exported within a
> +reserved range by accident. Guests should check cpuid leaf 0x4001 for the
> +presen

KVM autotest: add boot_savevm test

2010-04-15 Thread Marcelo Tosatti

This test boots a guest while periodically running savevm/loadvm.

Adjust savevm_delay/guest memory size to reduce run time, if 
excessive.

Signed-off-by: Marcelo Tosatti 

Index: autotest/client/tests/kvm/tests/boot_savevm.py
===
--- /dev/null
+++ autotest/client/tests/kvm/tests/boot_savevm.py
@@ -0,0 +1,54 @@
+import logging, time
+from autotest_lib.client.common_lib import error
+import kvm_subprocess, kvm_test_utils, kvm_utils
+
+def run_boot_savevm(test, params, env):
+"""
+KVM boot savevm test:
+1) Start guest
+2) Periodically savevm/loadvm
+3) Log into the guest to verify it's up, fail after timeout seconds
+
+@param test: kvm test object
+@param params: Dictionary with the test parameters
+@param env: Dictionary with test environment.
+"""
+vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+savevm_delay = float(params.get("savevm_delay"))
+savevm_login_delay = float(params.get("savevm_login_delay"))
+logging.info("savevm_delay = %f" % savevm_delay)
+login_expire = time.time() + savevm_login_delay
+end_time = time.time() + float(params.get("savevm_timeout"))
+
+while time.time() < end_time:
+time.sleep(savevm_delay)
+
+s, o = vm.send_monitor_cmd("stop")
+if s:
+logging.error("stop failed: %r" % o)
+s, o = vm.send_monitor_cmd("savevm 1")
+if s:
+logging.error("savevm failed: %r" % o)
+s, o = vm.send_monitor_cmd("system_reset")
+if s:
+logging.error("system_reset: %r" % o)
+s, o = vm.send_monitor_cmd("loadvm 1")
+if s:
+logging.error("loadvm failed: %r" % o)
+s, o = vm.send_monitor_cmd("cont")
+if s:
+logging.error("cont failed: %r" % o)
+
+ # Log in
+if (time.time() > login_expire):
+login_expire = time.time() + savevm_login_delay
+logging.info("Logging in after loadvm...")
+session = kvm_utils.wait_for(vm.remote_login, 1, 0, 1)
+if not session:
+logging.info("Failed to login")
+else:
+logging.info("Logged in to guest!")
+break
+
+if (time.time() > end_time):
+raise error.TestFail("fail: timeout")
Index: autotest/client/tests/kvm/tests_base.cfg.sample
===
--- autotest.orig/client/tests/kvm/tests_base.cfg.sample
+++ autotest/client/tests/kvm/tests_base.cfg.sample
@@ -105,6 +105,15 @@ variants:
 iterations = 2
 used_mem = 1024
 
+- boot_savevm: install setup unattended_install
+type = boot_savevm
+savevm_delay = 0.3
+savevm_login_delay = 120
+savevm_timeout = 2000
+kill_vm_on_error = yes
+kill_vm_gracefully = yes
+kill_vm = yes
+
 - autotest: install setup unattended_install
 type = autotest
 test_timeout = 1800


[PATCH 3/5] Try using new kvm clock msrs

2010-04-15 Thread Glauber Costa
We have now added a new set of clock-related MSRs to replace the old
ones. In theory, we could just try to use them and get a return value
indicating they do not exist, due to our use of kvm_write_msr_save.

However, kvm clock registration happens very early, and if we ever
try to write to a non-existent MSR, we raise a lethal #GP, since our
idt handlers are not in place yet.

So this patch tests for a cpuid feature exported by the host to
decide which set of MSRs is supported.
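
The selection logic in kvmclock_init() can be sketched like this, using the feature-bit positions from the patch's kvm_para.h:

```python
# Feature bits from the patch's asm/kvm_para.h
KVM_FEATURE_CLOCKSOURCE  = 0   # old MSR pair only
KVM_FEATURE_CLOCKSOURCE2 = 3   # new MSR pair available

def choose_msr_set(features):
    """Mirror kvmclock_init()'s decision: prefer the new MSRs, fall
    back to the old pair, otherwise leave kvmclock disabled."""
    if features & (1 << KVM_FEATURE_CLOCKSOURCE2):
        return "new"
    if features & (1 << KVM_FEATURE_CLOCKSOURCE):
        return "old"
    return None
```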

Signed-off-by: Glauber Costa 
---
 arch/x86/include/asm/kvm_para.h |4 ++
 arch/x86/kernel/kvmclock.c  |   68 +++
 2 files changed, 51 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 0cffb96..a32710a 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -16,6 +16,10 @@
 #define KVM_FEATURE_CLOCKSOURCE0
 #define KVM_FEATURE_NOP_IO_DELAY   1
 #define KVM_FEATURE_MMU_OP 2
+/* We could just try to use new msr values, but they are queried very early,
+ * kernel does not have idt handlers yet, and failures are fatal */
+#define KVM_FEATURE_CLOCKSOURCE2   3
+
 
 #define MSR_KVM_WALL_CLOCK_OLD  0x11
 #define MSR_KVM_SYSTEM_TIME_OLD 0x12
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index feaeb0d..6d814ce 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -29,6 +29,7 @@
 #define KVM_SCALE 22
 
 static int kvmclock = 1;
+static int kvm_use_new_msrs = 0;
 
 static int parse_no_kvmclock(char *arg)
 {
@@ -41,6 +42,18 @@ early_param("no-kvmclock", parse_no_kvmclock);
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock);
 static struct pvclock_wall_clock wall_clock;
 
+static int kvm_system_time_write_value(int low, int high)
+{
+   if (kvm_use_new_msrs)
+   return native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
+   else
+   return native_write_msr_safe(MSR_KVM_SYSTEM_TIME_OLD, low, high);
+}
+
+static void kvm_turnoff_clock(void)
+{
+   kvm_system_time_write_value(0, 0);
+}
 /*
  * The wallclock is the time of day when we booted. Since then, some time may
  * have elapsed since the hypervisor wrote the data. So we try to account for
@@ -54,7 +67,11 @@ static unsigned long kvm_get_wallclock(void)
 
low = (int)__pa_symbol(&wall_clock);
high = ((u64)__pa_symbol(&wall_clock) >> 32);
-   native_write_msr(MSR_KVM_WALL_CLOCK, low, high);
+
+   if (kvm_use_new_msrs)
+   native_write_msr_safe(MSR_KVM_WALL_CLOCK, low, high);
+   else
+   native_write_msr(MSR_KVM_WALL_CLOCK_OLD, low, high);
 
vcpu_time = &get_cpu_var(hv_clock);
pvclock_read_wallclock(&wall_clock, vcpu_time, &ts);
@@ -130,7 +147,8 @@ static int kvm_register_clock(char *txt)
high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
   cpu, high, low, txt);
-   return native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
+
+   return kvm_system_time_write_value(low, high);
 }
 
 #ifdef CONFIG_X86_LOCAL_APIC
@@ -165,14 +183,14 @@ static void __init kvm_smp_prepare_boot_cpu(void)
 #ifdef CONFIG_KEXEC
 static void kvm_crash_shutdown(struct pt_regs *regs)
 {
-   native_write_msr_safe(MSR_KVM_SYSTEM_TIME, 0, 0);
+   kvm_turnoff_clock();
native_machine_crash_shutdown(regs);
 }
 #endif
 
 static void kvm_shutdown(void)
 {
-   native_write_msr_safe(MSR_KVM_SYSTEM_TIME, 0, 0);
+   kvm_turnoff_clock();
native_machine_shutdown();
 }
 
@@ -181,27 +199,35 @@ void __init kvmclock_init(void)
if (!kvm_para_available())
return;
 
-   if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) {
-   if (kvm_register_clock("boot clock"))
-   return;
-   pv_time_ops.sched_clock = kvm_clock_read;
-   x86_platform.calibrate_tsc = kvm_get_tsc_khz;
-   x86_platform.get_wallclock = kvm_get_wallclock;
-   x86_platform.set_wallclock = kvm_set_wallclock;
+   if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2))
+   kvm_use_new_msrs = 1;
+   else if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE))
+   kvm_use_new_msrs = 0;
+   else
+   return;
+
+   printk(KERN_INFO "kvm-clock: %ssing clocksource new msrs",
+   kvm_use_new_msrs ? "U": "Not u");
+
+   if (kvm_register_clock("boot clock"))
+   return;
+   pv_time_ops.sched_clock = kvm_clock_read;
+   x86_platform.calibrate_tsc = kvm_get_tsc_khz;
+   x86_platform.get_wallclock = kvm_get_wallclock;
+   x86_platform.set_wallclock = kvm_set_wallclock;
 #ifdef CONFIG_X86_LOCAL_APIC
-   x86_cpuinit.setup_percpu_clockev =
-   kvm_setup_secondary_cl

[PATCH 5/5] add documentation about kvmclock

2010-04-15 Thread Glauber Costa
This patch adds a new file, kvm/kvmclock.txt, describing
the mechanism we use in kvmclock.

Signed-off-by: Glauber Costa 
---
 Documentation/kvm/kvmclock.txt |  138 
 1 files changed, 138 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/kvm/kvmclock.txt

diff --git a/Documentation/kvm/kvmclock.txt b/Documentation/kvm/kvmclock.txt
new file mode 100644
index 000..21008bb
--- /dev/null
+++ b/Documentation/kvm/kvmclock.txt
@@ -0,0 +1,138 @@
+KVM Paravirtual Clocksource driver
+Glauber Costa, Red Hat Inc.
+==================================
+
+1. General Description
+======================
+
+Keeping time in a virtual machine is acknowledged to be a hard problem. The
+most basic mode of operation, usually used by older guests, assumes a fixed
+interval between timer interrupts. It then counts the number of interrupts
+and calculates elapsed time. This method fails easily in virtual machines,
+since we can't guarantee that the virtual interrupt will be delivered in time.
+
+Another possibility is to emulate modern devices like the HPET, or any other
+device we see fit. A modern guest that implements something like the
+clocksource infrastructure can then ask this virtual device for the current
+time when it needs to. The problem with this approach is that it bumps the
+guest out of guest-mode operation, and in some cases even to userspace,
+very frequently.
+
+In this context, the best approach is to provide the guest with a
+virtualization-aware (paravirtual) clock device. It then asks the hypervisor
+about the current time, guaranteeing both stable and accurate timekeeping.
+
+2. kvmclock basics 
+==================
+
+When supported by the hypervisor, guests can register a memory page
+to contain kvmclock data. This page has to be present in the guest's address
+space throughout the guest's whole life. The hypervisor continues to write to
+it until kvmclock is explicitly disabled or the guest is turned off.
+
+2.1 kvmclock availability
+-------------------------
+
+Guests that want to take advantage of kvmclock should first check its
+availability through cpuid.
+
+KVM features are presented to the guest in leaf 0x40000001. Bit 3 indicates
+the presence of kvmclock. Bit 0 indicates that kvmclock is present, but that
+the old MSR set must be used. See section 2.3 for details.
+
+2.2 kvmclock functionality
+--------------------------
+
+Two MSRs are provided by the hypervisor, controlling kvmclock operation:
+
+ * MSR_KVM_WALL_CLOCK, value 0x4b564d00 and
+ * MSR_KVM_SYSTEM_TIME, value 0x4b564d01.
+
+The first one is only used in rare situations, like boot time and
+suspend-resume cycles. Its data is disposable: after use, the guest may reuse
+the memory for something else. This is hardly a hot path for anything.
+The hypervisor fills in the address provided through this MSR with the
+following structure:
+
+struct pvclock_wall_clock {
+u32   version;
+u32   sec;
+u32   nsec;
+} __attribute__((__packed__));
+
+The guest should only trust the data to be valid if version has not changed
+between the reads of sec and nsec. Besides not changing, it has to be an even
+number. The hypervisor may write an odd number to the version field to
+indicate that an update is in progress.
+
+MSR_KVM_SYSTEM_TIME, on the other hand, holds persistent data, and is
+constantly updated by the hypervisor with time information. The value
+written to this MSR contains two pieces of information: the address at which
+the guest expects time data to be present, 4-byte aligned, or'ed with an
+enable bit. To shut down kvmclock, it suffices to write any value
+that has 0 as its last bit.
+
+Time information presented by the hypervisor follows the structure:
+
+struct pvclock_vcpu_time_info {
+u32   version;
+u32   pad0;
+u64   tsc_timestamp;
+u64   system_time;
+u32   tsc_to_system_mul;
+s8tsc_shift;
+u8pad[3];
+} __attribute__((__packed__)); 
+
+The version field plays the same role as the one in struct
+pvclock_wall_clock. The other fields are:
+
+ a. tsc_timestamp: the guest-visible tsc (result of rdtsc + tsc_offset) of
+this cpu at the moment we recorded system_time. Note that some time is
+inevitably spent between system_time and tsc_timestamp measurements.
+Guests can subtract this quantity from the current value of tsc to obtain
+a delta to be added to system_time.
+
+ b. system_time: the most recent host time we could be provided with. The
+host gets it through ktime_get_ts, using whichever clocksource is
+registered at the moment.
+
+ c. tsc_to_system_mul: the number the tsc delta has to be multiplied by in
+order to obtain time in nanoseconds. The hypervisor is free to change this
+value in the face of events like cpu frequency changes, pcpu migration,
+etc.
+ 
+ d. tsc_shift: the amount by which guests must shift the tsc delta (left if
+positive, right if negative) before applying tsc_to_system_mul.
+
+With this information available, guest calculates current time as:
+
+  T = kt + to_nsec(ts
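For illustration, the scaling described in fields (a)-(d) can be sketched as below. This is a hedged reconstruction, not the kernel's code: `scale_delta` is a hypothetical stand-in for the guest-side helper, and it assumes the shift-then-multiply order and a compiler with a 128-bit integer extension.

```c
#include <assert.h>
#include <stdint.h>

/* Hedged sketch: shift the tsc delta by tsc_shift (left if positive,
 * right if negative), then apply tsc_to_system_mul as a 32.32 fixed-point
 * multiplier, keeping the upper 64 bits of the 96-bit product. */
static uint64_t scale_delta(uint64_t delta, uint32_t tsc_to_system_mul,
                            int8_t tsc_shift)
{
    if (tsc_shift >= 0)
        delta <<= tsc_shift;
    else
        delta >>= -tsc_shift;
    return (uint64_t)(((unsigned __int128)delta * tsc_to_system_mul) >> 32);
}
```

With a multiplier of 0x80000000 (0.5 in 32.32 fixed point) and no shift, a delta of 1000 tsc ticks yields 500 ns, which the guest adds to system_time.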

[PATCH 1/5] Add a global synchronization point for pvclock

2010-04-15 Thread Glauber Costa
In recent stress tests, it was found that pvclock-based systems
could seriously warp in smp systems. Using ingo's time-warp-test.c,
I could trigger a scenario as bad as 1.5 million warps a minute in some
systems. (To be fair, it wasn't that bad in most of them.) Investigating
further, I found out that such warps were caused by the very offset-based
calculation pvclock is based on.

This happens even on some machines that report constant_tsc in their tsc
flags, especially on multi-socket ones.

Two reads of the same kernel timestamp at approximately the same time will
likely have their tscs timestamped on different occasions too. This means the
delta we calculate is unpredictable at best, and can probably be smaller on a
cpu that is legitimately reading the clock on a later occasion.

Some adjustments on the host could make this window less likely to happen,
but still, it pretty much poses as an intrinsic problem of the mechanism.

A while ago, I thought about using a shared variable anyway, to hold the
clock's last state, but gave up due to the high contention locking was likely
to introduce, possibly rendering the thing useless on big machines. I argue,
however, that locking is not necessary.

We do a read-and-return sequence in pvclock, and between read and return,
the global value can have changed. However, it can only have changed
by means of an addition of a positive value. So if we detected that our
clock timestamp is less than the current global, we know that we need to
return a higher one, even though it is not exactly the one we compared to.

OTOH, if we detect we're greater than the current time source, we atomically
replace the value with our new reading. This does cause contention on big
boxes (but big here means *BIG*), but it seems like a good trade-off, since
it provides us with a time source guaranteed to be stable wrt time warps.
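The read-and-update rule described above can be sketched in plain C11 as an illustrative userspace model only: `clamp_monotonic` is a made-up name, and atomic_compare_exchange_weak stands in for the kernel's cmpxchg64.

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t last_value;

/* Return a value that never goes backwards across calls: either our own
 * reading (if it is the newest seen so far) or the newer global value. */
static uint64_t clamp_monotonic(uint64_t ret)
{
    uint64_t last = atomic_load(&last_value);
    do {
        if (ret < last)
            return last;  /* someone else already published a later time */
        /* on CAS failure, 'last' is reloaded with the current global value */
    } while (!atomic_compare_exchange_weak(&last_value, &last, ret));
    return ret;
}
```

The point of the lock-free loop is exactly the argument in the text: a failed compare-exchange only ever means someone published a value, and the retry re-checks whether that value already supersedes ours.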

After this patch is applied, I don't see a single warp in time during 5 days
of execution, in any of the machines I saw them before.

Signed-off-by: Glauber Costa 
CC: Jeremy Fitzhardinge 
CC: Avi Kivity 
CC: Marcelo Tosatti 
CC: Zachary Amsden 
---
 arch/x86/kernel/pvclock.c |   23 +++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 03801f2..b7de0e6 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -109,11 +109,14 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src)
return pv_tsc_khz;
 }
 
+static u64 last_value = 0;
+
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 {
struct pvclock_shadow_time shadow;
unsigned version;
cycle_t ret, offset;
+   u64 last;
 
do {
version = pvclock_get_time_values(&shadow, src);
@@ -123,6 +126,26 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
barrier();
} while (version != src->version);
 
+   /*
+* Assumption here is that last_value, a global accumulator, always goes
+* forward. If we are less than that, we should not be much smaller.
+* We assume there is an error margin we're inside, and then the
+* correction does not sacrifice accuracy.
+*
+* For reads: the global may have changed between test and return,
+* but this means someone else poked the clock at a later time.
+* We just need to make sure we are not seeing a backwards event.
+*
+* For updates: last_value = ret is not enough, since two vcpus could be
+* updating at the same time, and one of them could be slightly behind,
+* making the assumption that last_value always goes forward fail to hold.
+*/
+   do {
+   last = last_value;
+   if (ret < last)
+   return last;
+   } while (unlikely(cmpxchg64(&last_value, last, ret) != ret));
+
return ret;
 }
 
-- 
1.6.2.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/5] change msr numbers for kvmclock

2010-04-15 Thread Glauber Costa
Avi pointed out a while ago that those MSRs fall into the Pentium
PMU range. So the idea here is to add new ones and, after a while,
deprecate the old ones.
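
As an aside, the new range appears to be mnemonic: the top three bytes of 0x4b564d00 spell "KVM" in ASCII. A small illustrative check — `msr_byte` is a made-up helper, not kernel code:

```c
#include <stdint.h>

/* Extract byte n (0 = least significant) from an MSR index. */
static char msr_byte(uint32_t msr, int n)
{
    return (char)((msr >> (8 * n)) & 0xff);
}
```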

Signed-off-by: Glauber Costa 
---
 arch/x86/include/asm/kvm_para.h |8 ++--
 arch/x86/kvm/x86.c  |7 ++-
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index ffae142..0cffb96 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -17,8 +17,12 @@
 #define KVM_FEATURE_NOP_IO_DELAY   1
 #define KVM_FEATURE_MMU_OP 2
 
-#define MSR_KVM_WALL_CLOCK  0x11
-#define MSR_KVM_SYSTEM_TIME 0x12
+#define MSR_KVM_WALL_CLOCK_OLD  0x11
+#define MSR_KVM_SYSTEM_TIME_OLD 0x12
+
+/* Custom MSRs fall in the range 0x4b564d00-0x4b564dff */
+#define MSR_KVM_WALL_CLOCK  0x4b564d00
+#define MSR_KVM_SYSTEM_TIME 0x4b564d01
 
 #define KVM_MAX_MMU_OP_BATCH   32
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8824b73..714aae2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -575,9 +575,10 @@ static inline u32 bit(int bitno)
  * kvm-specific. Those are put in the beginning of the list.
  */
 
-#define KVM_SAVE_MSRS_BEGIN5
+#define KVM_SAVE_MSRS_BEGIN7
 static u32 msrs_to_save[] = {
MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
+   MSR_KVM_SYSTEM_TIME_OLD, MSR_KVM_WALL_CLOCK_OLD,
HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
HV_X64_MSR_APIC_ASSIST_PAGE,
MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
@@ -1099,10 +1100,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
case MSR_IA32_MISC_ENABLE:
vcpu->arch.ia32_misc_enable_msr = data;
break;
+   case MSR_KVM_WALL_CLOCK_OLD:
case MSR_KVM_WALL_CLOCK:
vcpu->kvm->arch.wall_clock = data;
kvm_write_wall_clock(vcpu->kvm, data);
break;
+   case MSR_KVM_SYSTEM_TIME_OLD:
case MSR_KVM_SYSTEM_TIME: {
if (vcpu->arch.time_page) {
kvm_release_page_dirty(vcpu->arch.time_page);
@@ -1374,9 +1377,11 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
data = vcpu->arch.efer;
break;
case MSR_KVM_WALL_CLOCK:
+   case MSR_KVM_WALL_CLOCK_OLD:
data = vcpu->kvm->arch.wall_clock;
break;
case MSR_KVM_SYSTEM_TIME:
+   case MSR_KVM_SYSTEM_TIME_OLD:
data = vcpu->arch.time;
break;
case MSR_IA32_P5_MC_ADDR:
-- 
1.6.2.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/5] pv clock misc fixes

2010-04-15 Thread Glauber Costa
Hello folks,

In this series, I present a couple of fixes for kvmclock.
In patch 1, a guest-side fix is proposed for a problem that has been biting
us for quite a while now: the tsc inside VMs does not seem to be
that good (up to now, only single-socket Nehalems were stable enough),
and we're seeing small (but nevertheless wrong) time warps inside SMP guests.
I am proposing that the fix reside in common code in pvclock.c, but it would
be good to hear from Jeremy on this.

On the other 3 patches, I change kvmclock MSR numbers in a compatible
fashion. Both MSR sets will be supported for a while.

Patch 5 adds documentation about kvmclock, which to date, we lacked.

Glauber Costa (5):
  Add a global synchronization point for pvclock
  change msr numbers for kvmclock
  Try using new kvm clock msrs
  export new cpuid KVM_CAP
  add documentation about kvmclock

 Documentation/kvm/kvmclock.txt  |  138 +++
 arch/x86/include/asm/kvm_para.h |   12 +++-
 arch/x86/kernel/kvmclock.c  |   68 +--
 arch/x86/kernel/pvclock.c   |   23 +++
 arch/x86/kvm/x86.c  |   13 -
 include/linux/kvm.h |1 +
 6 files changed, 231 insertions(+), 24 deletions(-)
 create mode 100644 Documentation/kvm/kvmclock.txt

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] export new cpuid KVM_CAP

2010-04-15 Thread Glauber Costa
Since we're changing the msrs kvmclock uses, we have to communicate
that to the guest through cpuid. We can add a new KVM_CAP to the
hypervisor, and then patch userspace to recognize it.

But if we ever add a new cpuid bit in the future, we have to do that again,
which creates some complexity and delays feature adoption.

Instead, what I'm proposing in this patch is a new capability, called
KVM_CAP_X86_CPUID_FEATURE_LIST, that returns the feature list
currently supported by the hypervisor. If we ever want to add or remove
a feature, we only need to tweak the HV, leaving userspace untouched.
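
For illustration, decoding the returned bitmask on the userspace side could look like the sketch below; `has_kvm_feature` is a hypothetical helper, the feature bit numbers are the ones from kvm_para.h, and the actual query would go through the KVM_CHECK_EXTENSION ioctl on /dev/kvm:

```c
#include <stdint.h>

#define KVM_FEATURE_CLOCKSOURCE   0
#define KVM_FEATURE_NOP_IO_DELAY  1
#define KVM_FEATURE_MMU_OP        2
#define KVM_FEATURE_CLOCKSOURCE2  3

/* Test one feature bit in the list returned by the capability check. */
static int has_kvm_feature(uint32_t feature_list, int bit)
{
    return (feature_list >> bit) & 1;
}
```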

Signed-off-by: Glauber Costa 
---
 arch/x86/kvm/x86.c  |6 ++
 include/linux/kvm.h |1 +
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 714aae2..74f0dc3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1545,6 +1545,12 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_MCE:
r = KVM_MAX_MCE_BANKS;
break;
+   case KVM_CAP_X86_CPUID_FEATURE_LIST:
+   r = (1 << KVM_FEATURE_CLOCKSOURCE) |
+   (1 << KVM_FEATURE_NOP_IO_DELAY) |
+   (1 << KVM_FEATURE_MMU_OP) |
+   (1 << KVM_FEATURE_CLOCKSOURCE2);
+   break;
default:
r = 0;
break;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ce28767..1ce124f 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -507,6 +507,7 @@ struct kvm_ioeventfd {
 #define KVM_CAP_DEBUGREGS 50
 #endif
 #define KVM_CAP_X86_ROBUST_SINGLESTEP 51
+#define KVM_CAP_X86_CPUID_FEATURE_LIST 52
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
1.6.2.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 REBASE] KVM: prevent spurious exit to userspace during task switch emulation.

2010-04-15 Thread Gleb Natapov
If kvm_task_switch() fails code exits to userspace without specifying
exit reason, so the previous exit reason is reused by userspace. Fix
this by specifying exit reason correctly.

---
Changelog:
 v1->v2:
  - report emulation error to userspace instead of ignoring it silently.

Signed-off-by: Gleb Natapov
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 4cb3aa6..6238eea 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2265,8 +2265,14 @@ static int task_switch_interception(struct vcpu_svm *svm)
 (int_vec == OF_VECTOR || int_vec == BP_VECTOR)))
skip_emulated_instruction(&svm->vcpu);
 
-   return kvm_task_switch(&svm->vcpu, tss_selector, reason,
-  has_error_code, error_code);
+   if (kvm_task_switch(&svm->vcpu, tss_selector, reason,
+   has_error_code, error_code) == EMULATE_FAIL) {
+   svm->vcpu.run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+   svm->vcpu.run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
+   svm->vcpu.run->internal.ndata = 0;
+   return 0;
+   }
+   return 1;
 }
 
 static int cpuid_interception(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 29a63d4..7e2f8d5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3315,9 +3315,13 @@ static int handle_task_switch(struct kvm_vcpu *vcpu)
   type != INTR_TYPE_NMI_INTR))
skip_emulated_instruction(vcpu);
 
-   if (!kvm_task_switch(vcpu, tss_selector, reason, has_error_code,
-error_code))
+   if (kvm_task_switch(vcpu, tss_selector, reason,
+   has_error_code, error_code) == EMULATE_FAIL) {
+   vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+   vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
+   vcpu->run->internal.ndata = 0;
return 0;
+   }
 
/* clear all local breakpoint enable flags */
vmcs_writel(GUEST_DR7, vmcs_readl(GUEST_DR7) & ~55);
--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/5] KVM MMU: fix kvm_mmu_zap_page() and its calling path

2010-04-15 Thread Marcelo Tosatti
On Fri, Apr 16, 2010 at 09:25:03PM +0800, Xiao Guangrong wrote:
> - calculate zapped page number properly in mmu_zap_unsync_children()
> - calculate freed page number properly in kvm_mmu_change_mmu_pages()
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  arch/x86/kvm/mmu.c |   12 
>  1 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index a23ca75..41cccd4 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1481,13 +1481,16 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
>   struct kvm_mmu_page *sp;
>  
>   for_each_sp(pages, sp, parents, i) {
> + if (list_empty(&kvm->arch.active_mmu_pages))
> + goto exit;

I meant to check for list_empty in kvm_mmu_change_mmu_pages, instead of
relying on the count returned by kvm_mmu_zap_page. Similarly to what 
__kvm_mmu_free_some_pages does.

Checking here is not needed because the pages returned in the array 
will not be zapped (mmu_lock is held).

Applied 1, 4 and 5 (so please regenerate against kvm.git -next branch),
thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] KVM: x86: Push potential exception error code on task switches

2010-04-15 Thread Marcelo Tosatti
On Wed, Apr 14, 2010 at 03:51:09PM +0200, Jan Kiszka wrote:
> When a fault triggers a task switch, the error code, if existent, has to
> be pushed on the new task's stack. Implement the missing bits.
> 
> Signed-off-by: Jan Kiszka 
> ---
> 
> Changes in v2:
>  - push writeback into emulator_task_switch
>  - refactored over "Terminate early if task_switch_16/32 failed"
> 
>  arch/x86/include/asm/kvm_emulate.h |3 ++-
>  arch/x86/include/asm/kvm_host.h|3 ++-
>  arch/x86/include/asm/svm.h |1 +
>  arch/x86/kvm/emulate.c |   22 ++
>  arch/x86/kvm/svm.c |   11 ++-
>  arch/x86/kvm/vmx.c |   12 +++-
>  arch/x86/kvm/x86.c |6 --
>  7 files changed, 48 insertions(+), 10 deletions(-)

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: MMU: Replace role.glevels with role.cr4_pae

2010-04-15 Thread Marcelo Tosatti
On Wed, Apr 14, 2010 at 07:20:03PM +0300, Avi Kivity wrote:
> There is no real distinction between glevels=3 and glevels=4; both have
> exactly the same format and the code is treated exactly the same way.  Drop
> role.glevels and replace is with role.cr4_pae (which is meaningful).  This
> simplifies the code a bit.
> 
> As a side effect, it allows sharing shadow page tables between pae and
> longmode guest page tables at the same guest page.
> 
> Signed-off-by: Avi Kivity 

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm test: Add 32-bit task switch micro-test

2010-04-15 Thread Marcelo Tosatti
On Wed, Apr 14, 2010 at 04:12:46PM +0200, Jan Kiszka wrote:
> This implements a basic task switch test for 32-bit targets. It
> specifically stresses the case that a fault with attached error code
> triggers the switch via a task gate.
> 
> Signed-off-by: Jan Kiszka 

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] KVM: prevent spurious exit to userspace during task switch emulation.

2010-04-15 Thread Marcelo Tosatti
On Thu, Apr 15, 2010 at 01:09:05PM +0300, Gleb Natapov wrote:
> 
> If kvm_task_switch() fails code exits to userspace without specifying
> exit reason, so the previous exit reason is reused by userspace. Fix
> this by specifying exit reason correctly.
> 
> ---
> Changelog:
>  v1->v2:
>   - report emulation error to userspace instead of ignoring it silently.
> 
> Should be applied after "KVM: fix emulator_task_switch() return value."
> since it relies on new return value from kvm_task_switch().
> 
> Signed-off-by: Gleb Natapov

Does not apply cleanly anymore, please regenerate against -next branch,
sorry.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: VM performance issue in KVM guests.

2010-04-15 Thread Srivatsa Vaddagiri
On Thu, Apr 15, 2010 at 03:33:18PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-04-15 at 11:18 +0300, Avi Kivity wrote:
> > 
> > Certainly that has even greater potential for Linux guests.  Note that 
> > we spin on mutexes now, so we need to prevent preemption while the lock 
> > owner is running. 
> 
> either that, or disable spinning on (para) virt kernels. Para virt
> kernels could possibly extend the thing by also checking to see if the
> owner's vcpu is running.

I suspect we will need a combination of both approaches, given that we will
not always be able to avoid preempting guests in their critical sections
(critical sections can be too long, or real-time tasks may want to preempt).
Another idea is to gang-schedule VCPUs of the same guest as much as possible?

- vatsa
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Question on copy & paste

2010-04-15 Thread Stephen Liu


- Original Message 
From: Amit Shah 
To: Stephen Liu 
Cc: kvm@vger.kernel.org
Sent: Thu, April 15, 2010 9:02:53 AM
Subject: Re: Question on copy & paste

On (Thu) Apr 15 2010 [08:45:23], Stephen Liu wrote:
> Hi folks,
> 
> host - Debian 5.04
> 
> What would be the easy way to enable a copy-and-paste function between guest
> and host?  Also among guests.  TIA

This doesn't exist yet, but something should be available in a few
months.


Noted and thanks


B.R.
Stephen L


Send instant messages to your online friends http://uk.messenger.yahoo.com 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Question on copy & paste

2010-04-15 Thread Amit Shah
On (Thu) Apr 15 2010 [08:45:23], Stephen Liu wrote:
> Hi folks,
> 
> host - Debian 5.04
> 
> What would be the easy way to enable a copy-and-paste function between guest
> and host?  Also among guests.  TIA

This doesn't exist yet, but something should be available in a few
months.

Amit
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Question on copy & paste

2010-04-15 Thread Stephen Liu
Hi folks,

host - Debian 5.04

What would be the easy way to enable a copy-and-paste function between guest
and host?  Also among guests.  TIA


B.R.
Stephen L

Send instant messages to your online friends http://uk.messenger.yahoo.com 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Autotest] [PATCH] KVM test: Memory ballooning test for KVM guest

2010-04-15 Thread pradeep

Hi Lucas

Please ignore my earlier patch.
Please find below the corrected patch with the suggested changes.


--SP


diff -purN autotest/client/tests/kvm/tests/balloon_check.py 
autotest-new/client/tests/kvm/tests/balloon_check.py
--- autotest/client/tests/kvm/tests/balloon_check.py1969-12-31 
19:00:00.0 -0500
+++ autotest-new/client/tests/kvm/tests/balloon_check.py2010-04-15 
18:50:09.0 -0400
@@ -0,0 +1,51 @@
+import re, string, logging, random, time
+from autotest_lib.client.common_lib import error
+import kvm_test_utils, kvm_utils
+
+def run_balloon_check(test, params, env):
+    """
+    Check memory ballooning:
+    1) Boot a guest
+    2) Change the memory to between 60% and 95% of the guest's memory using ballooning
+    3) Check the memory info
+
+    @param test: kvm test object
+    @param params: Dictionary with the test parameters
+    @param env: Dictionary with test environment.
+    """
+
+    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+    session = kvm_test_utils.wait_for_login(vm)
+    fail = 0
+
+    # Check memory size
+    logging.info("Memory size check")
+    expected_mem = int(params.get("mem"))
+    actual_mem = vm.get_memory_size()
+    if actual_mem != expected_mem:
+        logging.error("Memory size mismatch:")
+        logging.error("Assigned to VM: %s" % expected_mem)
+        logging.error("Reported by OS: %s" % actual_mem)
+
+    # Check whether the "info balloon" monitor command works
+    status, output = vm.send_monitor_cmd("info balloon")
+    if status != 0:
+        logging.error("qemu monitor command failed: info balloon")
+        fail += 1
+
+    # Reduce memory to a random size between 60% and 95% of actual memory
+    percent = random.uniform(0.6, 0.95)
+    new_mem = int(percent * actual_mem)
+    vm.send_monitor_cmd("balloon %s" % new_mem)
+    time.sleep(20)
+    status, output = vm.send_monitor_cmd("info balloon")
+    ballooned_mem = int(re.findall(r"\d+", output)[0])
+    if ballooned_mem != new_mem:
+        logging.error("Memory ballooning failed while changing memory "
+                      "from %s to %s" % (actual_mem, new_mem))
+        fail += 1
+
+    # Check the test result
+    session.close()
+    if fail != 0:
+        raise error.TestFail("Memory ballooning test failed")
diff -purN autotest/client/tests/kvm/tests_base.cfg.sample 
autotest-new/client/tests/kvm/tests_base.cfg.sample
--- autotest/client/tests/kvm/tests_base.cfg.sample 2010-04-15 
09:14:10.0 -0400
+++ autotest-new/client/tests/kvm/tests_base.cfg.sample 2010-04-15 
18:50:35.0 -0400
@@ -171,6 +171,10 @@ variants:
 drift_threshold = 10
 drift_threshold_single = 3
 
+- balloon_check:  install setup unattended_install
+type = balloon_check
+extra_params += "-balloon virtio"
+
 - stress_boot:  install setup unattended_install
 type = stress_boot
 max_vms = 5
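
The test's handling of the "info balloon" reply boils down to extracting the first integer from the monitor output. The same idea in C, as a hedged sketch with a made-up reply string (real QEMU monitor output may differ):

```c
#include <ctype.h>
#include <stdlib.h>

/* Return the first unsigned integer found in a monitor reply string,
 * or -1 if none is present. */
static long first_int(const char *s)
{
    while (*s && !isdigit((unsigned char)*s))
        s++;
    return *s ? strtol(s, NULL, 10) : -1;
}
```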


Re: [RFC][PATCH v3 1/3] A device for zero-copy based on KVM virtio-net.

2010-04-15 Thread Arnd Bergmann
On Thursday 15 April 2010, Xin, Xiaohui wrote:
> 
> >It seems that you are duplicating a lot of functionality that
> >is already in macvtap. I've asked about this before but then
> >didn't look at your newer versions. Can you explain the value
> >of introducing another interface to user land?
> 
> >I'm still planning to add zero-copy support to macvtap,
> >hopefully reusing parts of your code, but do you think there
> >is value in having both?
> 
> I have not looked into your macvtap code in detail before.
> Are the two interfaces exactly the same? We just want to create a simple
> way to do zero-copy. Now it can only support vhost, but in the future
> we also want it to support direct read/write operations from user space too.

Right now, the features are mostly distinct. Macvtap first of all provides
a "tap" style interface for users, and can also be used by vhost-net.
It also provides a way to share a NIC among a number of guests by software,
though I intend to add support for VMDq and SR-IOV as well. Zero-copy
is also not yet done in macvtap but should be added.

mpassthru right now does not allow sharing a NIC between guests, and
does not have a tap interface for non-vhost operation, but does the
zero-copy that is missing in macvtap.

> Basically, compared to the interface, I'm more worried about the modifications
> to the net core we have made to implement zero-copy. If this hardest part
> can be done, then any user space interface modifications or integrations can
> be done more easily after that.

I agree that the network stack modifications are the hard part for zero-copy,
and your work on that looks very promising and is complementary to what I've
done with macvtap. Your current user interface looks good for testing this out,
but I think we should not merge it (the interface) upstream if we can get the
same or better result by integrating your buffer management code into macvtap.

I can try to merge your code into macvtap myself if you agree, so you
can focus on getting the internals right.

> >Not sure what I'm missing, but who calls the vq->receiver? This seems
> >to be neither in the upstream version of vhost nor introduced by your
> >patch.
> 
> See Patch v3 2/3 I have sent out, it is called by handle_rx() in vhost.

Ok, I see. As a general rule, it's preferred to split a patch series
in a way that makes it possible to apply each patch separately and still
get a working kernel, ideally with more features than the version before
the patch. I believe you could get there by reordering your patches to
make the actual driver the last one in the series.

Not a big problem though, I was mostly looking in the wrong place.

> >> +  ifr.ifr_name[IFNAMSIZ-1] = '\0';
> >> +
> >> +  ret = -EBUSY;
> >> +
> >> +  if (ifr.ifr_flags & IFF_MPASSTHRU_EXCL)
> >> +  break;
> 
> >Your current use of the IFF_MPASSTHRU* flags does not seem to make
> >any sense whatsoever. You check that this flag is never set, but set
> >it later yourself and then ignore all flags.
> 
> That flag is intended to prevent another user from binding the same device
> again. But I will check whether it really ignores all other flags.

The ifr variable is on the stack of the mp_chr_ioctl function, and you never
look at the value after setting it. In order to prevent multiple opens
of that device, you probably need to lock out any other users as well,
and make it a property of the underlying device. E.g. you also want to
prevent users on the host from setting an IP address on the NIC and
using it to send and receive data there.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-15 Thread Sheng Yang
On Thursday 15 April 2010 18:44:15 Avi Kivity wrote:
> On 04/15/2010 01:40 PM, Joerg Roedel wrote:
> >> That means an NMI that happens outside guest code (for example, in the
> >> mmu, or during the exit itself) would be counted as if in guest code.
> >
> > Hmm, true. The same is true for an NMI that happens between VMSAVE and
> > STGI but that window is smaller. Anyway, I think we don't need the
> > busy-wait loop. The NMI should be executed at a well defined point and
> > we set the cpu_var back to NULL after that point.
> 
> The point is not well defined.  Considering there are already at least
> two svm implementations, I don't want to rely on implementation details.

After more investigation, I realized that I had misread the SDM. Sorry.

There is *no* risk with the original method of calling "int $2". 

According to the SDM 24.1:

> The following bullets detail when architectural state is and is not updated
> in response to VM exits:
[...]
> - An NMI causes subsequent NMIs to be blocked, but only after the VM exit
> completes.

So the truth is that after an NMI directly causes a VM exit, subsequent
NMIs are blocked until the next "iret" is encountered. Executing "int $2"
in vmx_complete_interrupts() is therefore safe and carries no risk of
causing a nested NMI. It also unblocks subsequent NMIs, due to the "iret"
it executes.

So it is unnecessary to make changes to avoid a "potential nested NMI".

Sorry for the mistake and the confusion it caused.

-- 
regards
Yang, Sheng

> 
> We could tune the position of the loop so that zero iterations are
> executed on the implementations we know about.
> 


Re: VM performance issue in KVM guests.

2010-04-15 Thread Peter Zijlstra
On Thu, 2010-04-15 at 11:18 +0300, Avi Kivity wrote:
> 
> Certainly that has even greater potential for Linux guests.  Note that 
> we spin on mutexes now, so we need to prevent preemption while the lock 
> owner is running. 

either that, or disable spinning on (para) virt kernels. Para virt
kernels could possibly extend the thing by also checking to see if the
owner's vcpu is running.



[PATCH v2 5/5] KVM MMU: remove unused parameter in mmu_parent_walk()

2010-04-15 Thread Xiao Guangrong
'vcpu' is unused, remove it

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/mmu.c |   24 +++-
 1 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a32c60c..2f8ae9e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -172,7 +172,7 @@ struct kvm_shadow_walk_iterator {
 shadow_walk_okay(&(_walker));  \
 shadow_walk_next(&(_walker)))
 
-typedef int (*mmu_parent_walk_fn) (struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp);
+typedef int (*mmu_parent_walk_fn) (struct kvm_mmu_page *sp);
 
 static struct kmem_cache *pte_chain_cache;
 static struct kmem_cache *rmap_desc_cache;
@@ -1000,8 +1000,7 @@ static void mmu_page_remove_parent_pte(struct kvm_mmu_page *sp,
 }
 
 
-static void mmu_parent_walk(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
-   mmu_parent_walk_fn fn)
+static void mmu_parent_walk(struct kvm_mmu_page *sp, mmu_parent_walk_fn fn)
 {
struct kvm_pte_chain *pte_chain;
struct hlist_node *node;
@@ -1010,8 +1009,8 @@ static void mmu_parent_walk(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 
if (!sp->multimapped && sp->parent_pte) {
parent_sp = page_header(__pa(sp->parent_pte));
-   fn(vcpu, parent_sp);
-   mmu_parent_walk(vcpu, parent_sp, fn);
+   fn(parent_sp);
+   mmu_parent_walk(parent_sp, fn);
return;
}
hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link)
@@ -1019,8 +1018,8 @@ static void mmu_parent_walk(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
if (!pte_chain->parent_ptes[i])
break;
parent_sp = page_header(__pa(pte_chain->parent_ptes[i]));
-   fn(vcpu, parent_sp);
-   mmu_parent_walk(vcpu, parent_sp, fn);
+   fn(parent_sp);
+   mmu_parent_walk(parent_sp, fn);
}
 }
 
@@ -1057,16 +1056,15 @@ static void kvm_mmu_update_parents_unsync(struct kvm_mmu_page *sp)
}
 }
 
-static int unsync_walk_fn(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+static int unsync_walk_fn(struct kvm_mmu_page *sp)
 {
kvm_mmu_update_parents_unsync(sp);
return 1;
 }
 
-static void kvm_mmu_mark_parents_unsync(struct kvm_vcpu *vcpu,
-   struct kvm_mmu_page *sp)
+static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
 {
-   mmu_parent_walk(vcpu, sp, unsync_walk_fn);
+   mmu_parent_walk(sp, unsync_walk_fn);
kvm_mmu_update_parents_unsync(sp);
 }
 
@@ -1344,7 +1342,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
mmu_page_add_parent_pte(vcpu, sp, parent_pte);
if (sp->unsync_children) {
set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests);
-   kvm_mmu_mark_parents_unsync(vcpu, sp);
+   kvm_mmu_mark_parents_unsync(sp);
}
trace_kvm_mmu_get_page(sp, false);
return sp;
@@ -1761,7 +1759,7 @@ static int kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
++vcpu->kvm->stat.mmu_unsync;
sp->unsync = 1;
 
-   kvm_mmu_mark_parents_unsync(vcpu, sp);
+   kvm_mmu_mark_parents_unsync(sp);
 
mmu_convert_notrap(sp);
return 0;
-- 
1.6.1.2



[PATCH v2 4/5] KVM MMU: reduce 'struct kvm_mmu_page' size

2010-04-15 Thread Xiao Guangrong
define 'multimapped' as 'bool'

Signed-off-by: Xiao Guangrong 
---
 arch/x86/include/asm/kvm_host.h |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0c49c88..cace232 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -202,9 +202,9 @@ struct kvm_mmu_page {
 * in this shadow page.
 */
DECLARE_BITMAP(slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
-   int multimapped; /* More than one parent_pte? */
-   int root_count;  /* Currently serving as active root */
+   bool multimapped; /* More than one parent_pte? */
bool unsync;
+   int root_count;  /* Currently serving as active root */
unsigned int unsync_children;
union {
u64 *parent_pte;   /* !multimapped */
-- 
1.6.1.2



[PATCH v2 3/5] KVM MMU: cleanup for restart hlist walking

2010-04-15 Thread Xiao Guangrong
Quote from Avi:

|Just change the assignment to a 'goto restart;' please,
|I don't like playing with list_for_each internals. 

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/mmu.c |   15 ++-
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 41cccd4..a32c60c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1567,13 +1567,14 @@ static int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
r = 0;
index = kvm_page_table_hashfn(gfn);
bucket = &kvm->arch.mmu_page_hash[index];
+restart:
hlist_for_each_entry_safe(sp, node, n, bucket, hash_link)
if (sp->gfn == gfn && !sp->role.direct) {
pgprintk("%s: gfn %lx role %x\n", __func__, gfn,
 sp->role.word);
r = 1;
if (kvm_mmu_zap_page(kvm, sp))
-   n = bucket->first;
+   goto restart;
}
return r;
 }
@@ -1587,13 +1588,14 @@ static void mmu_unshadow(struct kvm *kvm, gfn_t gfn)
 
index = kvm_page_table_hashfn(gfn);
bucket = &kvm->arch.mmu_page_hash[index];
+restart:
hlist_for_each_entry_safe(sp, node, nn, bucket, hash_link) {
if (sp->gfn == gfn && !sp->role.direct
&& !sp->role.invalid) {
pgprintk("%s: zap %lx %x\n",
 __func__, gfn, sp->role.word);
if (kvm_mmu_zap_page(kvm, sp))
-   nn = bucket->first;
+   goto restart;
}
}
 }
@@ -2673,6 +2675,8 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
}
index = kvm_page_table_hashfn(gfn);
bucket = &vcpu->kvm->arch.mmu_page_hash[index];
+
+restart:
hlist_for_each_entry_safe(sp, node, n, bucket, hash_link) {
if (sp->gfn != gfn || sp->role.direct || sp->role.invalid)
continue;
@@ -2693,7 +2697,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
pgprintk("misaligned: gpa %llx bytes %d role %x\n",
 gpa, bytes, sp->role.word);
if (kvm_mmu_zap_page(vcpu->kvm, sp))
-   n = bucket->first;
+   goto restart;
++vcpu->kvm->stat.mmu_flooded;
continue;
}
@@ -2902,10 +2906,11 @@ void kvm_mmu_zap_all(struct kvm *kvm)
struct kvm_mmu_page *sp, *node;
 
spin_lock(&kvm->mmu_lock);
+restart:
list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link)
if (kvm_mmu_zap_page(kvm, sp))
-   node = container_of(kvm->arch.active_mmu_pages.next,
-   struct kvm_mmu_page, link);
+   goto restart;
+
spin_unlock(&kvm->mmu_lock);
 
kvm_flush_remote_tlbs(kvm);
-- 
1.6.1.2



[PATCH v2 2/5] KVM MMU: fix kvm_mmu_zap_page() and its calling path

2010-04-15 Thread Xiao Guangrong
- calculate zapped page number properly in mmu_zap_unsync_children()
- calculate freed page number properly in kvm_mmu_change_mmu_pages()

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/mmu.c |   12 
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a23ca75..41cccd4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1481,13 +1481,16 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
struct kvm_mmu_page *sp;
 
for_each_sp(pages, sp, parents, i) {
+   if (list_empty(&kvm->arch.active_mmu_pages))
+   goto exit;
+
kvm_mmu_zap_page(kvm, sp);
mmu_pages_clear_parents(&parents);
+   zapped++;
}
-   zapped += pages.nr;
kvm_mmu_pages_init(parent, &parents, &pages);
}
-
+exit:
return zapped;
 }
 
@@ -1540,7 +1543,7 @@ void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages)
 
page = container_of(kvm->arch.active_mmu_pages.prev,
struct kvm_mmu_page, link);
-   kvm_mmu_zap_page(kvm, page);
+   used_pages -= kvm_mmu_zap_page(kvm, page);
used_pages--;
}
kvm->arch.n_free_mmu_pages = 0;
@@ -1589,7 +1592,8 @@ static void mmu_unshadow(struct kvm *kvm, gfn_t gfn)
&& !sp->role.invalid) {
pgprintk("%s: zap %lx %x\n",
 __func__, gfn, sp->role.word);
-   kvm_mmu_zap_page(kvm, sp);
+   if (kvm_mmu_zap_page(kvm, sp))
+   nn = bucket->first;
}
}
 }
-- 
1.6.1.2



[PATCH v2 1/5] KVM MMU: remove unused struct

2010-04-15 Thread Xiao Guangrong
Remove 'struct kvm_unsync_walk' since it's not used

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/mmu.c |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b44380b..a23ca75 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -172,11 +172,6 @@ struct kvm_shadow_walk_iterator {
 shadow_walk_okay(&(_walker));  \
 shadow_walk_next(&(_walker)))
 
-
-struct kvm_unsync_walk {
-   int (*entry) (struct kvm_mmu_page *sp, struct kvm_unsync_walk *walk);
-};
-
 typedef int (*mmu_parent_walk_fn) (struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp);
 
 static struct kmem_cache *pte_chain_cache;
-- 
1.6.1.2


TCM_Loop virtual SAS Ports + megasas HBA emulation into Linux Guests

2010-04-15 Thread Nicholas A. Bellinger
Greetings Hannes, Christoph and Co,

So after some additional discussions on and off the linux-scsi list wrt
to KVM + megasas emulation, I got motivated and decided to try it out
for myself using the QEMU code from Gerd's tree at:

http://repo.or.cz/w/qemu/kraxel.git branch: scsi.v9.

As expected using megasas with hw/scsi-disk.c for QEMU userspace CDB
emulation is working, Thank you Dr. Hannes! :-)

Sooo, after poking around a bit further I managed to (partially) get the
TCM_Loop fabric module running with Persistent Reservation and ALUA
using virtual target SAS ports with megasas via hw/scsi-generic.c.  I am
using a single TCM/IBLOCK backstore (single SATA JBOD disk) with two
virtual SAS target ports appearing as local drives on the Linux host.

From there I am using the following CLI ops to start two qemu instances:

./x86_64-softmmu/qemu-system-x86_64 -m 2048 -smp 4 /root/lenny64guest0-orig.img \
  -drive if=none,id=mydisk,file=/dev/sg3 -device megasas,id=raid  \
  -device scsi-generic,bus=raid.0,scsi-id=1,drive=mydisk

Here are some observations of what I have seen with various tests so
far..

*) By representing one TCM/IBLOCK backstore as two virtual SAS target
ports, I can pass each scsi-generic device (representing the same
underlying struct block_device) from TCM_Loop's LLD into different Linux
guests.

*) Most TCM control path CDB emulation is working as expected.  This
includes Persistent Reservation IN CDBs and implicit ALUA access state
transitions using sg_rtpg to virtual SAS ports in Linux guests..!

*) DATA_SG_IO READs (and I assume WRITEs as well) via hw/scsi-generic.c
are failing in Linux guests with the following message:

scsi_command_complete: ret -90 (Message too long)
[raid.0 id=1 lun=0] READ_12 - from-dev len=4096
scsi_command_complete: ret -90 (Message too long)
[raid.0 id=1 lun=0] READ_12 - from-dev len=512

I assume this is because megasas is currently using hw/scsi-disk.c, yes?

*) Persistent Reservation OUT CDBs seem to be getting dropped with GOOD
status, but the same PR OUT op is still working as expected when using
the virtual SCSI device on the Linux Host.  This does appear to be
specific to PROUT with megasas AFAICT..

Here is the very first screen shot of TCM_Loop + megasas in action on
host running a fresh cut of v2.6.34-rc4:

https://www.linux-iscsi.org/images/TCM_Loop-megasas-04152010.png

I will keep poking around and add the WIP items + info to the KVM and
TCM_loop pages on the LIO wiki.  Until then, any comments to steer me in
the right direction to address the above items would be appreciated.

Best,

--nab


Re: [PATCH V3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-15 Thread Avi Kivity

On 04/15/2010 01:40 PM, Joerg Roedel wrote:
>> That means an NMI that happens outside guest code (for example, in the
>> mmu, or during the exit itself) would be counted as if in guest code.
>
> Hmm, true. The same is true for an NMI that happens between VMSAVE and
> STGI but that window is smaller. Anyway, I think we don't need the
> busy-wait loop. The NMI should be executed at a well defined point and
> we set the cpu_var back to NULL after that point.

The point is not well defined.  Considering there are already at least
two svm implementations, I don't want to rely on implementation details.

We could tune the position of the loop so that zero iterations are
executed on the implementations we know about.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH V3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-15 Thread Joerg Roedel
On Thu, Apr 15, 2010 at 12:48:09PM +0300, Avi Kivity wrote:
> On 04/15/2010 12:44 PM, Joerg Roedel wrote:
>>
>>> So, we'd need something like the following:
>>>
>>> if (exit == NMI)
>>> __get_cpu_var(nmi_vcpu) = vcpu;
>>>
>>> stgi();
>>>
>>> if (exit == NMI) {
>>> while (!nmi_handled())
>>> cpu_relax();
>>> __get_cpu_var(nmi_vcpu) = NULL;
>>> }
>>>  
>> Hmm, looks a bit complicated to me. The NMI should happen shortly after
>> the stgi instruction. Interrupts are still disabled so we stay on this
>> cpu. Can't we just set and erase the cpu_var at vcpu_load/vcpu_put time?
>>
>>
>
> That means an NMI that happens outside guest code (for example, in the  
> mmu, or during the exit itself) would be counted as if in guest code.

Hmm, true. The same is true for an NMI that happens between VMSAVE and
STGI but that window is smaller. Anyway, I think we don't need the
busy-wait loop. The NMI should be executed at a well defined point and
we set the cpu_var back to NULL after that point.

Joerg



Re: [PATCHv2] KVM: prevent spurious exit to userspace during task switch emulation.

2010-04-15 Thread Avi Kivity

On 04/15/2010 01:09 PM, Gleb Natapov wrote:
> If kvm_task_switch() fails code exits to userspace without specifying
> exit reason, so the previous exit reason is reused by userspace. Fix
> this by specifying exit reason correctly.
>
> ---
> Changelog:
>  v1->v2:
>   - report emulation error to userspace instead of ignoring it silently.

Yeah, better than logging to dmesg.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC][PATCH v2 0/3] Provide a zero-copy method on KVM virtio-net.

2010-04-15 Thread Michael S. Tsirkin
On Thu, Apr 15, 2010 at 05:36:07PM +0800, Xin, Xiaohui wrote:
> 
> Michael,
> >> The idea is simple, just to pin the guest VM user space and then
> >> let host NIC driver has the chance to directly DMA to it. 
> >> The patches are based on vhost-net backend driver. We add a device
> >> which provides proto_ops as sendmsg/recvmsg to vhost-net to
> >> send/recv directly to/from the NIC driver. KVM guest who use the
> >> vhost-net backend may bind any ethX interface in the host side to
> >> get copyless data transfer thru guest virtio-net frontend.
> >> 
> >> The scenario is like this:
> >> 
> >> The guest virtio-net driver submits multiple requests thru vhost-net
> >> backend driver to the kernel. And the requests are queued and then
> >> completed after corresponding actions in h/w are done.
> >> 
> >> For read, user space buffers are dispensed to NIC driver for rx when
> >> a page constructor API is invoked. Means NICs can allocate user buffers
> >> from a page constructor. We add a hook in netif_receive_skb() function
> >> to intercept the incoming packets, and notify the zero-copy device.
> >> 
> >> For write, the zero-copy deivce may allocates a new host skb and puts
> >> payload on the skb_shinfo(skb)->frags, and copied the header to skb->data.
> >> The request remains pending until the skb is transmitted by h/w.
> >> 
> >> Here, we have ever considered 2 ways to utilize the page constructor
> >> API to dispense the user buffers.
> >> 
> >> One:   Modify __alloc_skb() function a bit, it can only allocate a 
> >>structure of sk_buff, and the data pointer is pointing to a 
> >>user buffer which is coming from a page constructor API.
> >>Then the shinfo of the skb is also from guest.
> >>When packet is received from hardware, the skb->data is filled
> >>directly by h/w. What we have done is in this way.
> >> 
> >>Pros:   We can avoid any copy here.
> >>Cons:   Guest virtio-net driver needs to allocate skb as almost
> >>the same method with the host NIC drivers, say the size
> >>of netdev_alloc_skb() and the same reserved space in the
> >>head of skb. Many NIC drivers are the same with guest and
> >>ok for this. But some lastest NIC drivers reserves special
> >>room in skb head. To deal with it, we suggest to provide
> >>a method in guest virtio-net driver to ask for parameter
> >>we interest from the NIC driver when we know which device 
> >>we have bind to do zero-copy. Then we ask guest to do so.
> >>Is that reasonable?
> 
> >Unfortunately, this would break compatibility with existing virtio.
> >This also complicates migration. 
> 
> You mean any modification to the guest virtio-net driver will break the
> compatibility? We tried to enlarge the virtio_net_config to contains the
> 2 parameter, and add one VIRTIO_NET_F_PASSTHRU flag, virtionet_probe()
> will check the feature flag, and get the parameters, then virtio-net driver 
> use
> it to allocate buffers. How about this?

This means that we can't, for example, live-migrate between different systems
without flushing outstanding buffers.

> >What is the room in skb head used for?
> I'm not sure, but the latest ixgbe driver does this, it reserves 32 bytes 
> compared to
> NET_IP_ALIGN.

Looking at code, this seems to do with alignment - could just be
a performance optimization.

> >> Two:   Modify driver to get user buffer allocated from a page 
> >> constructor
> >>API(to substitute alloc_page()), the user buffer are used as payload
> >>buffers and filled by h/w directly when packet is received. Driver
> >>should associate the pages with skb (skb_shinfo(skb)->frags). For 
> >>the head buffer side, let host allocates skb, and h/w fills it. 
> >>After that, the data filled in host skb header will be copied into
> >>guest header buffer which is submitted together with the payload buffer.
> >> 
> >>Pros:   We could less care the way how guest or host allocates their
> >>buffers.
> >>Cons:   We still need a bit copy here for the skb header.
> >> 
> >> We are not sure which way is the better here.
> 
> >The obvious question would be whether you see any speed difference
> >with the two approaches. If no, then the second approach would be
> >better.
> 
> I remember the second approach is a bit slower in 1500MTU. 
> But we did not tested too much.

Well, that's an important datapoint. By the way, you'll need
header copy to activate LRO in host, so that's a good
reason to go with option 2 as well.

> >> This is the first thing we want
> >> to get comments from the community. We wish the modification to the network
> >> part will be generic which not used by vhost-net backend only, but a user
> >> application may use it as well when the zero-copy device may provides async
> >> read/write operations later.
> >> 
> >> Please give comments especially for the network part modificati

[PATCHv2] KVM: prevent spurious exit to userspace during task switch emulation.

2010-04-15 Thread Gleb Natapov

If kvm_task_switch() fails code exits to userspace without specifying
exit reason, so the previous exit reason is reused by userspace. Fix
this by specifying exit reason correctly.

---
Changelog:
 v1->v2:
  - report emulation error to userspace instead of ignoring it silently.

Should be applied after "KVM: fix emulator_task_switch() return value."
since it relies on new return value from kvm_task_switch().

Signed-off-by: Gleb Natapov
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c773a46..4a7a9ff 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2254,7 +2254,13 @@ static int task_switch_interception(struct vcpu_svm *svm)
 (int_vec == OF_VECTOR || int_vec == BP_VECTOR)))
skip_emulated_instruction(&svm->vcpu);
 
-   return kvm_task_switch(&svm->vcpu, tss_selector, reason);
+   if (kvm_task_switch(&svm->vcpu, tss_selector, reason) == EMULATE_FAIL) {
+   svm->vcpu.run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+   svm->vcpu.run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
+   svm->vcpu.run->internal.ndata = 0;
+   return 0;
+   }
+   return 1;
 }
 
 static int cpuid_interception(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 453f080..9517cbc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3306,8 +3306,12 @@ static int handle_task_switch(struct kvm_vcpu *vcpu)
   type != INTR_TYPE_NMI_INTR))
skip_emulated_instruction(vcpu);
 
-   if (!kvm_task_switch(vcpu, tss_selector, reason))
+   if (kvm_task_switch(vcpu, tss_selector, reason) == EMULATE_FAIL) {
+   vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+   vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
+   vcpu->run->internal.ndata = 0;
return 0;
+   }
 
/* clear all local breakpoint enable flags */
vmcs_writel(GUEST_DR7, vmcs_readl(GUEST_DR7) & ~55);
--
Gleb.


Re: kvm autotest, how to disable address cache

2010-04-15 Thread Michael Goldish
On 04/08/2010 11:53 PM, Ryan Harper wrote:
> Is there any way to disable this?  I'm running a guest on -net user
> networking, no interaction with the host network, yet, during the test,
> I get tons of:
> 
> 15:50:48 DEBUG| (address cache) Adding cache entry: 00:1a:64:39:04:91 ---> 
> 10.0.253.16
> 15:50:49 DEBUG| (address cache) Adding cache entry: e4:1f:13:2c:e5:04 ---> 
> 10.0.253.132
> 
> many times for the same mapping.  If I'm not using tap networking on a
> public bridge, what's this address cache doing for me? And, how the heck
> do turn this off?
> 
> 

Currently that's not configurable.  It's there under the assumption that
it doesn't annoy anyone too much.  Now that I know you're annoyed by it
I'll send a patch to make it optional.  If you can't wait, feel free to
comment out the first paragraph in kvm_preprocessing.preprocess().


Re: [PATCH V3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-15 Thread Avi Kivity

On 04/15/2010 12:44 PM, Joerg Roedel wrote:
>> So, we'd need something like the following:
>>
>>     if (exit == NMI)
>>         __get_cpu_var(nmi_vcpu) = vcpu;
>>
>>     stgi();
>>
>>     if (exit == NMI) {
>>         while (!nmi_handled())
>>             cpu_relax();
>>         __get_cpu_var(nmi_vcpu) = NULL;
>>     }
>
> Hmm, looks a bit complicated to me. The NMI should happen shortly after
> the stgi instruction. Interrupts are still disabled so we stay on this
> cpu. Can't we just set and erase the cpu_var at vcpu_load/vcpu_put time?

That means an NMI that happens outside guest code (for example, in the
mmu, or during the exit itself) would be counted as if in guest code.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH V3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-15 Thread Joerg Roedel
On Thu, Apr 15, 2010 at 12:09:28PM +0300, Avi Kivity wrote:
> On 04/15/2010 12:04 PM, Joerg Roedel wrote:
>> On Thu, Apr 15, 2010 at 04:57:38PM +0800, Zhang, Yanmin wrote:
>>
>>
>>> I checked svm.c and it seems svm.c doesn't trigger a NMI to host if the NMI
>>> happens in guest os. In addition, svm_complete_interrupts is called after
>>> interrupt is enabled.
>>>  
>> Yes. The NMI is held pending by the hardware until the STGI instruction
>> is executed.
>> And for nested svm the svm_complete_interrupts function needs to be
>> executed after the nested exit handling. Therefore it is done late on
>> svm.
>>
>
> So, we'd need something like the following:
>
>if (exit == NMI)
>__get_cpu_var(nmi_vcpu) = vcpu;
>
>stgi();
>
>if (exit == NMI) {
>while (!nmi_handled())
>cpu_relax();
>__get_cpu_var(nmi_vcpu) = NULL;
>}

Hmm, looks a bit complicated to me. The NMI should happen shortly after
the stgi instruction. Interrupts are still disabled so we stay on this
cpu. Can't we just set and erase the cpu_var at vcpu_load/vcpu_put time?

Joerg



Re: [PATCH] KVM: prevent spurious exit to userspace during task switch emulation.

2010-04-15 Thread Avi Kivity

On 04/15/2010 12:28 PM, Gleb Natapov wrote:


> kvm_task_switch() never requires userspace exit, so no matter what the
> function returns we should not exit to userspace.
>
> Signed-off-by: Gleb Natapov
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index c773a46..1bd434b 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -2254,7 +2254,8 @@ static int task_switch_interception(struct vcpu_svm *svm)
>  (int_vec == OF_VECTOR || int_vec == BP_VECTOR)))
> skip_emulated_instruction(&svm->vcpu);
>
> -   return kvm_task_switch(&svm->vcpu, tss_selector, reason);
> +   kvm_task_switch(&svm->vcpu, tss_selector, reason);
> +   return 1;
>  }
>
>  static int cpuid_interception(struct vcpu_svm *svm)
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 453f080..3e1607d 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -3306,8 +3306,7 @@ static int handle_task_switch(struct kvm_vcpu *vcpu)
>    type != INTR_TYPE_NMI_INTR))
> skip_emulated_instruction(vcpu);
>
> -   if (!kvm_task_switch(vcpu, tss_selector, reason))
> -   return 0;
> +   kvm_task_switch(vcpu, tss_selector, reason);


Ignoring the return seems wrong.  At the very least log errors in dmesg 
(rate-limited).


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



RE: [RFC][PATCH v2 0/3] Provide a zero-copy method on KVM virtio-net.

2010-04-15 Thread Xin, Xiaohui

Michael,
>> The idea is simple, just to pin the guest VM user space and then
>> let host NIC driver has the chance to directly DMA to it. 
>> The patches are based on vhost-net backend driver. We add a device
>> which provides proto_ops as sendmsg/recvmsg to vhost-net to
>> send/recv directly to/from the NIC driver. KVM guest who use the
>> vhost-net backend may bind any ethX interface in the host side to
>> get copyless data transfer thru guest virtio-net frontend.
>> 
>> The scenario is like this:
>> 
>> The guest virtio-net driver submits multiple requests thru vhost-net
>> backend driver to the kernel. And the requests are queued and then
>> completed after corresponding actions in h/w are done.
>> 
>> For read, user space buffers are dispensed to NIC driver for rx when
>> a page constructor API is invoked. Means NICs can allocate user buffers
>> from a page constructor. We add a hook in netif_receive_skb() function
>> to intercept the incoming packets, and notify the zero-copy device.
>> 
>> For write, the zero-copy deivce may allocates a new host skb and puts
>> payload on the skb_shinfo(skb)->frags, and copied the header to skb->data.
>> The request remains pending until the skb is transmitted by h/w.
>> 
>> Here, we have considered two ways to utilize the page constructor
>> API to dispense the user buffers.
>> 
>> One: Modify the __alloc_skb() function a bit so that it allocates only
>>  the sk_buff structure, with the data pointer pointing to a
>>  user buffer which comes from a page constructor API.
>>  Then the shinfo of the skb is also from the guest.
>>  When a packet is received from hardware, skb->data is filled
>>  directly by h/w. This is what we have done.
>> 
>>  Pros:   We can avoid any copy here.
>>  Cons:   Guest virtio-net driver needs to allocate the skb in almost
>>  the same way as the host NIC drivers, say the size used by
>>  netdev_alloc_skb() and the same reserved space in the
>>  head of the skb. Many NIC drivers match the guest and are
>>  OK with this. But some of the latest NIC drivers reserve
>>  special room in the skb head. To deal with it, we suggest
>>  providing a method in the guest virtio-net driver to ask the
>>  NIC driver for the parameters we are interested in, once we
>>  know which device we have bound for zero-copy. Then we ask
>>  the guest to do so. Is that reasonable?

>Unfortunately, this would break compatibility with existing virtio.
>This also complicates migration. 

You mean any modification to the guest virtio-net driver will break the
compatibility? We tried to enlarge virtio_net_config to contain the
2 parameters, and add a VIRTIO_NET_F_PASSTHRU flag; virtionet_probe()
will check the feature flag and get the parameters, and the virtio-net
driver then uses them to allocate buffers. How about this?

>What is the room in skb head used for?
I'm not sure, but the latest ixgbe driver does this; it reserves 32 bytes,
compared to NET_IP_ALIGN.

>> Two: Modify the driver to get user buffers allocated from a page constructor
>>  API (substituting alloc_page()); the user buffers are used as payload
>>  buffers and filled by h/w directly when a packet is received. The driver
>>  should associate the pages with the skb (skb_shinfo(skb)->frags). For
>>  the head buffer side, let the host allocate the skb, and h/w fills it.
>>  After that, the data filled into the host skb header will be copied into
>>  the guest header buffer, which is submitted together with the payload buffer.
>> 
>>  Pros:   We care less about the way the guest or host allocates its
>>  buffers.
>>  Cons:   We still need a small copy here for the skb header.
>> 
>> We are not sure which way is better here.

>The obvious question would be whether you see any speed difference
>with the two approaches. If no, then the second approach would be
>better.

I remember the second approach is a bit slower with 1500 MTU,
but we have not tested it much.

>> This is the first thing we want
>> to get comments from the community on. We wish the modifications to the network
>> part to be generic, not used by the vhost-net backend only; a user
>> application may use them as well once the zero-copy device provides async
>> read/write operations later.
>> 
>> Please give comments especially for the network part modifications.
>> 
>> 
>> We provide multiple submits and asynchronous notification to
>> vhost-net too.
>> 
>> Our goal is to improve the bandwidth and reduce the CPU usage.
>> Exact performance data will be provided later. But in a simple
>> test with netperf, we found bandwidth goes up and CPU % goes up too,
>> but the bandwidth increase is much larger than the CPU % increase.
>> 
>> What we have not done yet:
>>  packet split support

>What does this mean, exactly?
We can support 1500 MTU, but for jumbo frames, since the vhost driver
did not previously support mergeable buffers, we could not try it.

[PATCH] KVM: fix emulator_task_switch() return value.

2010-04-15 Thread Gleb Natapov

emulator_task_switch() should return -1 for failure and 0 for success to
the caller, just like x86_emulate_insn() does.

Signed-off-by: Gleb Natapov 
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 083b269..b836900 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2437,7 +2437,7 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
kvm_rip_write(ctxt->vcpu, c->eip);
}
 
-   return rc;
+   return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 }
 
 static void string_addr_inc(struct x86_emulate_ctxt *ctxt, unsigned long base,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 11aef42..aa884f9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4811,10 +4811,11 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 
tss_selector, int reason)
ret = emulator_task_switch(&vcpu->arch.emulate_ctxt, &emulate_ops,
   tss_selector, reason);
 
-   if (ret == X86EMUL_CONTINUE)
-   kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
+   if (ret)
+   return EMULATE_FAIL;
 
-   return (ret != X86EMUL_CONTINUE);
+   kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
+   return EMULATE_DONE;
 }
 EXPORT_SYMBOL_GPL(kvm_task_switch);
 
--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: prevent spurious exit to userspace during task switch emulation.

2010-04-15 Thread Gleb Natapov

kvm_task_switch() never requires userspace exit, so no matter what the
function returns we should not exit to userspace.

Signed-off-by: Gleb Natapov 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c773a46..1bd434b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2254,7 +2254,8 @@ static int task_switch_interception(struct vcpu_svm *svm)
 (int_vec == OF_VECTOR || int_vec == BP_VECTOR)))
skip_emulated_instruction(&svm->vcpu);
 
-   return kvm_task_switch(&svm->vcpu, tss_selector, reason);
+   kvm_task_switch(&svm->vcpu, tss_selector, reason);
+   return 1;
 }
 
 static int cpuid_interception(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 453f080..3e1607d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3306,8 +3306,7 @@ static int handle_task_switch(struct kvm_vcpu *vcpu)
   type != INTR_TYPE_NMI_INTR))
skip_emulated_instruction(vcpu);
 
-   if (!kvm_task_switch(vcpu, tss_selector, reason))
-   return 0;
+   kvm_task_switch(vcpu, tss_selector, reason);
 
/* clear all local breakpoint enable flags */
vmcs_writel(GUEST_DR7, vmcs_readl(GUEST_DR7) & ~55);
--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-15 Thread Avi Kivity

On 04/15/2010 12:04 PM, Joerg Roedel wrote:

On Thu, Apr 15, 2010 at 04:57:38PM +0800, Zhang, Yanmin wrote:

   

I checked svm.c and it seems svm.c doesn't trigger an NMI to the host if the NMI
happens in the guest OS. In addition, svm_complete_interrupts is called after
interrupts are enabled.
 

Yes. The NMI is held pending by the hardware until the STGI instruction
is executed.
And for nested svm the svm_complete_interrupts function needs to be
executed after the nested exit handling. Therefore it is done late on
svm.
   


So, we'd need something like the following:

   if (exit == NMI)
   __get_cpu_var(nmi_vcpu) = vcpu;

   stgi();

   if (exit == NMI) {
   while (!nmi_handled())
   cpu_relax();
   __get_cpu_var(nmi_vcpu) = NULL;
   }

and no code sharing between vmx and svm.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH v3 1/3] A device for zero-copy based on KVM virtio-net.

2010-04-15 Thread Michael S. Tsirkin
On Thu, Apr 15, 2010 at 05:01:10PM +0800, Xin, Xiaohui wrote:
> >It smells like a layering violation to look at the iocb->private field
> >from a lower-level driver. I would have hoped that it's possible to implement
> >this without having this driver know about the higher-level vhost driver
> >internals. Can you explain why this is needed?
> 
> I don't like this either, but since the kiocb is maintained by vhost with a
> list_head, and the mp device is responsible for collecting the kiocbs into
> that list_head, we need something known to both vhost and mp.

Can't vhost supply a kiocb completion callback that will handle the list?

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-15 Thread Joerg Roedel
On Thu, Apr 15, 2010 at 04:57:38PM +0800, Zhang, Yanmin wrote:

> I checked svm.c and it seems svm.c doesn't trigger an NMI to the host if the NMI
> happens in the guest OS. In addition, svm_complete_interrupts is called after
> interrupts are enabled.

Yes. The NMI is held pending by the hardware until the STGI instruction
is executed.
And for nested svm the svm_complete_interrupts function needs to be
executed after the nested exit handling. Therefore it is done late on
svm.

Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: MMU: Replace role.glevels with role.cr4_pae

2010-04-15 Thread Avi Kivity

On 04/14/2010 09:29 PM, Marcelo Tosatti wrote:

On Wed, Apr 14, 2010 at 07:32:12PM +0300, Avi Kivity wrote:
   

On 04/14/2010 07:20 PM, Avi Kivity wrote:
 

There is no real distinction between glevels=3 and glevels=4; both have
exactly the same format and the code is treated exactly the same way.  Drop
role.glevels and replace it with role.cr4_pae (which is meaningful).  This
simplifies the code a bit.

As a side effect, it allows sharing shadow page tables between pae and
longmode guest page tables at the same guest page.
   
 

  static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
  {
-   if (sp->role.glevels != vcpu->arch.mmu.root_level) {
+   if (sp->role.cr4_pae != !!is_pae(vcpu)) {
kvm_mmu_zap_page(vcpu->kvm, sp);
return 1;
}
   

This bit confuses me a little.  Why is it needed?  It will never hit
from mmu_sync_children(), and as for kvm_mmu_get_page(), it will
simply zap unrelated pages?
 

kvm_mmu_get_page is write protecting a gfn.


Took me a while to figure out why.


If there's a shadow page for a different role, and it's unsync, it needs
to be synchronized.

   


We could leave it unsync and write protected, though that destroys an 
invariant (sync==protected, unsync==unprotected), and all the calls to 
rmap_write_protect() become confused.



Perhaps it could call the appropriate _sync_page version instead
of zapping, similar to mmu_pte_write_new_pte.
   


Probably better for nonpae.


Is it related to the restriction that we can only unsync if we have
just one shadow page for a gfn?  That's somewhat artificial (and
hurts nonpae guests, and guests with linear page tables).
 

If gfn is shadowed at PMD or higher level, you can't unsync the PTE
shadow.
   


Yes.  Even if we could, invlpg is defined to drop all PDE caches (except 
large page PDEs), so we would have to resync all those pages on invlpg.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC][PATCH v3 1/3] A device for zero-copy based on KVM virtio-net.

2010-04-15 Thread Xin, Xiaohui
Arnd,
>> From: Xin Xiaohui 
>> 
>> Add a device to utilize the vhost-net backend driver for
>> copy-less data transfer between guest FE and host NIC.
>> It pins the guest user space to the host memory and
>> provides proto_ops as sendmsg/recvmsg to vhost-net.

>Sorry for taking so long before finding the time to look
>at your code in more detail.

>It seems that you are duplicating a lot of functionality that
>is already in macvtap. I've asked about this before but then
>didn't look at your newer versions. Can you explain the value
>of introducing another interface to user land?

>I'm still planning to add zero-copy support to macvtap,
>hopefully reusing parts of your code, but do you think there
>is value in having both?

I have not looked into your macvtap code in detail before.
Are the two interfaces exactly the same? We just want to create a simple
way to do zero-copy. Now it can only support vhost, but in the future
we want it to support direct read/write operations from user space too.

Basically, compared to the interface, I'm more worried about the modifications
to the net core we have made to implement zero-copy. If this hardest part
can be done, then any user-space interface modifications or integrations are
easier to do after that.

>> diff --git a/drivers/vhost/mpassthru.c b/drivers/vhost/mpassthru.c
>> new file mode 100644
>> index 000..86d2525
>> --- /dev/null
>> +++ b/drivers/vhost/mpassthru.c
> >@@ -0,0 +1,1264 @@
> >+
> >+#ifdef MPASSTHRU_DEBUG
>> +static int debug;
>> +
>> +#define DBG  if (mp->debug) printk
> >+#define DBG1 if (debug == 2) printk
> >+#else
> >+#define DBG(a...)
> >+#define DBG1(a...)
> >+#endif

>This should probably just use the existing dev_dbg/pr_debug infrastructure.

Thanks. Will try that.
> [... skipping buffer management code for now]

> >+static int mp_sendmsg(struct kiocb *iocb, struct socket *sock,
> >+struct msghdr *m, size_t total_len)
> >+{
> >[...]

>This function looks like we should be able to easily include it into
>macvtap and get zero-copy transmits without introducing the new
>user-level interface.

>> +static int mp_recvmsg(struct kiocb *iocb, struct socket *sock,
>> +struct msghdr *m, size_t total_len,
>> +int flags)
>> +{
>> +struct mp_struct *mp = container_of(sock->sk, struct mp_sock, sk)->mp;
>> +struct page_ctor *ctor;
>> +struct vhost_virtqueue *vq = (struct vhost_virtqueue *)(iocb->private);

>It smells like a layering violation to look at the iocb->private field
>from a lower-level driver. I would have hoped that it's possible to implement
>this without having this driver know about the higher-level vhost driver
>internals. Can you explain why this is needed?

I don't like this either, but since the kiocb is maintained by vhost with a
list_head, and the mp device is responsible for collecting the kiocbs into
that list_head, we need something known to both vhost and mp.
 
>> +spin_lock_irqsave(&ctor->read_lock, flag);
>> +list_add_tail(&info->list, &ctor->readq);
> >+spin_unlock_irqrestore(&ctor->read_lock, flag);
> >+
> >+if (!vq->receiver) {
> >+vq->receiver = mp_recvmsg_notify;
> >+set_memlock_rlimit(ctor, RLIMIT_MEMLOCK,
> >+   vq->num * 4096,
> >+   vq->num * 4096);
> >+}
> >+
> >+return 0;
> >+}

>Not sure what I'm missing, but who calls the vq->receiver? This seems
>to be neither in the upstream version of vhost nor introduced by your
>patch.

See Patch v3 2/3 I have sent out, it is called by handle_rx() in vhost.

>> +static void __mp_detach(struct mp_struct *mp)
>> +{
> >+mp->mfile = NULL;
> >+
> >+mp_dev_change_flags(mp->dev, mp->dev->flags & ~IFF_UP);
> >+page_ctor_detach(mp);
> >+mp_dev_change_flags(mp->dev, mp->dev->flags | IFF_UP);
> >+
> >+/* Drop the extra count on the net device */
> >+dev_put(mp->dev);
> >+}
> >+
> >+static DEFINE_MUTEX(mp_mutex);
> >+
> >+static void mp_detach(struct mp_struct *mp)
> >+{
> >+mutex_lock(&mp_mutex);
> >+__mp_detach(mp);
> >+mutex_unlock(&mp_mutex);
> >+}
> >+
> >+static void mp_put(struct mp_file *mfile)
> >+{
> >+if (atomic_dec_and_test(&mfile->count))
> >+mp_detach(mfile->mp);
> >+}
> >+
> >+static int mp_release(struct socket *sock)
> >+{
> >+struct mp_struct *mp = container_of(sock->sk, struct mp_sock, sk)->mp;
> >+struct mp_file *mfile = mp->mfile;
> >+
> >+mp_put(mfile);
> >+sock_put(mp->socket.sk);
> >+put_net(mfile->net);
> >+
> >+return 0;
> >+}

>Doesn't this prevent the underlying interface from going away while the chardev
>is open? You also have logic to handle that case, so why do you keep the extra
>reference on the netdev?

Let me think.

>> +/* Ops structure to mimic raw sockets with mp device */
>> +static const struct proto_ops mp_socket_ops = {
>> +.sendmsg = mp_sendmsg,
>> +.recvmsg = mp_recvmsg,
>> +.re

Re: [PATCH V3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-15 Thread Zhang, Yanmin
On Thu, 2010-04-15 at 11:05 +0300, Avi Kivity wrote:
> On 04/15/2010 04:04 AM, Zhang, Yanmin wrote:
> >
> >> An even more accurate way to determine this is to check whether the
> >> interrupt frame points back at the 'int $2' instruction.  However we
> >> plan to switch to a self-IPI method to inject the NMI, and I'm not sure
> >> wether APIC NMIs are accepted on an instruction boundary or whether
> >> there's some latency involved.
> >>  
> > Yes. But the frame pointer checking seems a little complicated.
> >
> 
> An even bigger disadvantage is that it won't work with Sheng's patch, 
> self-NMIs are not synchronous.
> 
> >>>   trace_kvm_entry(vcpu->vcpu_id);
> >>> +
> >>> + percpu_write(current_vcpu, vcpu);
> >>>   kvm_x86_ops->run(vcpu);
> >>> + percpu_write(current_vcpu, NULL);
> >>>
> >>>
> >> If you move this around the 'int $2' instructions you will close the
> >> race, as a stray NMI won't catch us updating the rip cache.  But that
> >> depends on whether self-IPI is accepted on the next instruction or not.
> >>  
> > Right. The kernel part has dependency on the self-IPI implementation.
> > I will move above percpu_write(current_vcpu, vcpu) (or a new wrapper 
> > function)
> > just around 'int $2'.
> >
> >
> 
> Or create a new function to inject the interrupt in x86.c.  That will 
> reduce duplication between svm.c and vmx.c.
I checked svm.c and it seems svm.c doesn't trigger an NMI to the host if the NMI
happens in the guest OS. In addition, svm_complete_interrupts is called after
interrupts are enabled.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM: x86: Push potential exception error code on task switches

2010-04-15 Thread Jan Kiszka
Avi Kivity wrote:
> On 04/14/2010 04:19 PM, Jan Kiszka wrote:
>> Avi Kivity wrote:
>>
>>> On 04/14/2010 03:58 PM, Jan Kiszka wrote:
>>>  
> The TSS descriptor (gate doesn't have a size).  But isn't it possible to
> have a 32-bit TSS with a 16-bit CS/SS?
>
>  
 Might be possible, but will cause troubles as the spec says:

 "The error code is pushed on the stack as a doubleword or word
 (depending on the default interrupt, trap, or task gate size)."


>>> My guess is that this is an error and that the 32-bitness of a TSS only
>>> refers to the format of the TSS, and has nothing to do with the code
>>> that actually runs.  I'll ask Intel about it.  Meanwhile this can be
>>> applied, if there's a problem with 16-bit exception handlers running
>>> through a 32-bit task referenced by a task gate in the IDT, it can be
>>> fixed later.
>>>  
>> Go ahead. But architecturally this looks fairly consistent to me as the
>> processor simply derives the error code width from the corresponding
>> entry in the IDT.
>>
> 
> You are correct (though the entry isn't in the IDT!)

Yeah, right (this stuff keeps on hurting my brain).

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM: x86: Push potential exception error code on task switches

2010-04-15 Thread Avi Kivity

On 04/14/2010 04:19 PM, Jan Kiszka wrote:

Avi Kivity wrote:
   

On 04/14/2010 03:58 PM, Jan Kiszka wrote:
 

The TSS descriptor (gate doesn't have a size).  But isn't it possible to
have a 32-bit TSS with a 16-bit CS/SS?

 

Might be possible, but will cause troubles as the spec says:

"The error code is pushed on the stack as a doubleword or word
(depending on the default interrupt, trap, or task gate size)."

   

My guess is that this is an error and that the 32-bitness of a TSS only
refers to the format of the TSS, and has nothing to do with the code
that actually runs.  I'll ask Intel about it.  Meanwhile this can be
applied, if there's a problem with 16-bit exception handlers running
through a 32-bit task referenced by a task gate in the IDT, it can be
fixed later.
 

Go ahead. But architecturally this looks fairly consistent to me as the
processor simply derives the error code width from the corresponding
entry in the IDT.
   


You are correct (though the entry isn't in the IDT!)

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 3/3] Inter-VM shared memory PCI device

2010-04-15 Thread Avi Kivity

On 04/15/2010 02:30 AM, Cam Macdonell wrote:



Sample programs, init scripts and the shared memory server are available
in a
git repo here:

 www.gitorious.org/nahanni

   

Please consider qemu.git/contrib.
 

Should the compilation be tied into Qemu's regular build with a switch
(e.g. --enable-ivshmem-server)? Or should it be its own separate
build?
   


It can have its own makefile.

---
  Makefile.target |    3 +
  hw/ivshmem.c    |  700 +++
  qemu-char.c     |    6 +
  qemu-char.h     |    3 +

   

qemu-doc.texi | 45 +
 

Seems to be light on qdev devices.  I notice there is a section named
"Data Type Index" that "could be used for qdev device names and
options", but is currently empty.  Should I place documentation there
of device there or just add it to "3.3 Invocation"?
   


I think those are in qemu-options.hx.  Just put it somewhere where it 
seems appropriate.


   
 

  4 files changed, 712 insertions(+), 0 deletions(-)
  create mode 100644 hw/ivshmem.c

diff --git a/Makefile.target b/Makefile.target
index 1ffd802..bc9a681 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -199,6 +199,9 @@ obj-$(CONFIG_USB_OHCI) += usb-ohci.o
  obj-y += rtl8139.o
  obj-y += e1000.o

+# Inter-VM PCI shared memory
+obj-y += ivshmem.o
+

   

depends on CONFIG_PCI
 

as in

obj-($CONFIG_PCI) += ivshmem.o


the variable CONFIG_PCI doesn't seem to be set during configuration.
I don't see any other PCI devices that depend on it.


My mistake, keep as is.


Do we also want
to depend on CONFIG_KVM?
   


No real need.


+static void create_shared_memory_BAR(IVShmemState *s, int fd) {
+
+s->shm_fd = fd;
+
+s->ivshmem_offset = qemu_ram_mmap(s->shm_fd, s->ivshmem_size,
+ MAP_SHARED, 0);

   

Where did the offset go?
 

0 is the offset.  I include the offset parameter in qemu_ram_mmap() to
make it flexible for other uses.


Makes sense.


Are you suggesting to take an
optional offset as an argument to -device?
   


No need.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: VM performance issue in KVM guests.

2010-04-15 Thread Avi Kivity

On 04/15/2010 07:58 AM, Srivatsa Vaddagiri wrote:
On Sun, Apr 11, 2010 at 11:40 PM, Avi Kivity > wrote:


The current handing of PLE is very suboptimal.  With proper
directed yield we should be much better there.



Hi Avi,
  By directed yield, do you mean transfer the timeslice of 
one thread (which is contending for a lock) to another thread (which 
is holding a lock)?


It's a priority transfer (in CFS terms, vruntime) (we don't know who 
holds the lock, so we pick a co-vcpu at random).


If at that point in time the lock-holder thread/VCPU is actually not
running, i.e. it is at the back of the runqueue, would it help
much? In such a case, it will take time for the lock holder to run again,
and the default timeslice it would have gotten could have been sufficient
to release the lock?


The idea is to increase the chances of the target vcpu to run, and to
decrease the chances of the spinner to run (hopefully they change places).




I am also working on a prototype for another technique here: to
avoid preempting guest threads/VCPUs in the middle of their
(spin-lock) critical sections. This requires the guest to hint the host
when it is in such a section. [1] has shown a 33% improvement on an
apache benchmark based on this idea.




Certainly that has even greater potential for Linux guests.  Note that 
we spin on mutexes now, so we need to prevent preemption while the lock 
owner is running.



--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-15 Thread Avi Kivity

On 04/15/2010 04:04 AM, Zhang, Yanmin wrote:



An even more accurate way to determine this is to check whether the
interrupt frame points back at the 'int $2' instruction.  However we
plan to switch to a self-IPI method to inject the NMI, and I'm not sure
whether APIC NMIs are accepted on an instruction boundary or whether
there's some latency involved.
 

Yes. But the frame pointer checking seems a little complicated.
   


An even bigger disadvantage is that it won't work with Sheng's patch, 
self-NMIs are not synchronous.



trace_kvm_entry(vcpu->vcpu_id);
+
+   percpu_write(current_vcpu, vcpu);
kvm_x86_ops->run(vcpu);
+   percpu_write(current_vcpu, NULL);

   

If you move this around the 'int $2' instructions you will close the
race, as a stray NMI won't catch us updating the rip cache.  But that
depends on whether self-IPI is accepted on the next instruction or not.
 

Right. The kernel part has dependency on the self-IPI implementation.
I will move above percpu_write(current_vcpu, vcpu) (or a new wrapper function)
just around 'int $2'.

   


Or create a new function to inject the interrupt in x86.c.  That will 
reduce duplication between svm.c and vmx.c.



Sheng would find a solution on the self-IPI delivery. Let's separate my patch
and self-IPI as 2 issues as we don't know when the self-IPI delivery would be
resolved.
   


Sure.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Timedrift in KVM guests after livemigration.

2010-04-15 Thread Espen Berg
We have three KVM hosts that support live migration between them, but 
one of our problems is time drift.  The three frontends have different 
CPU frequencies, and the KVM guests adopt the frequency of the host 
machine where they were first started.


Host1: cat /proc/cpuinfo
model name  : Intel(R) Core(TM)2 CPU  6600  @ 2.40GHz
cpu MHz : 2394.048

Host2: cat /proc/cpuinfo
model name  : Intel(R) Core(TM)2 CPU  6700  @ 2.66GHz
cpu MHz : 2659.685

Host3: cat /proc/cpuinfo
model name  : Intel(R) Xeon(R) CPU   E5410  @ 2.33GHz
cpu MHz : 2327.507


virsh version
Compiled against library: libvir 0.7.6
Using library: libvir 0.7.6
Using API: QEMU 0.7.6
Running hypervisor: QEMU 0.11.0

Is there any solution to our problems, or is a reboot the only safe 
solution?


Regards
Espen



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Autotest] [PATCH] [Autotest PATCH v2] KVM-test: Add a subtest 'qemu_img'

2010-04-15 Thread Lucas Meneghel Rodrigues
On Wed, Mar 31, 2010 at 4:23 AM, Yolkfull Chow  wrote:
> This is designed to test all subcommands of 'qemu-img'; however,
> so far 'commit' is not implemented.
>
> * For 'check' subcommand test, it will 'dd' to create a file with specified
> size and see whether it's supported to be checked. Then convert it to be
> supported formats ('qcow2' and 'raw' so far) to see whether there's an error
> after conversion.
>
> * For 'convert' subcommand test, it will convert both to 'qcow2' and 'raw' from
> the format specified in the config file, and only check 'qcow2' after conversion.
>
> * For 'snapshot' subcommand test, it will create two snapshots and list them.
> Finally delete them if no errors found.
>
> * For 'info' subcommand test, it will check image format & size according to
> output of 'info' subcommand  at specified image file.
>
> * For 'rebase' subcommand test, it will create a first snapshot 'sn1' based on the
> original base_img, and create a second snapshot based on sn1, and then rebase
> sn2 to base_img. After the rebase, check the backing_file of sn2.
>
> This supports two rebase mode: unsafe mode and safe mode:
> Unsafe mode:
> With -u an unsafe mode is enabled that doesn't require the backing files to 
> exist.
> It merely changes the backing file reference in the COW image. This is useful 
> for
> renaming or moving the backing file. The user is responsible to make sure 
> that the
> new backing file has no changes compared to the old one, or corruption may 
> occur.
>
> Safe Mode:
> Both the current and the new backing file need to exist, and after the 
> rebase, the
> COW image is guaranteed to have the same guest visible content as before.
> To achieve this, old and new backing file are compared and, if necessary, 
> data is
> copied from the old backing file into the COW image.
>
> Improvement from v1:
> * Add an underscore _ at the beginning of all the auxiliary functions.

Yolkfull, after testing I found some small issues, fixed them, changed
the commands APIs for autotest ones, and re-sent the result. I am
going to apply this shortly, thanks for your work on this test!

Cheers,

Lucas

> Results in:
>
> # ./scan_results.py
> Test                                                    Status  Seconds Info
>                                                     --  --- 
>        (Result file: ../../results/default/status)
>        smp2.RHEL.5.4.i386.qemu_img.check                       GOOD    132 
> completed successfully
>        smp2.RHEL.5.4.i386.qemu_img.create                      GOOD    144 
> completed successfully
>        smp2.RHEL.5.4.i386.qemu_img.convert.to_qcow2            GOOD    251 
> completed successfully
>        smp2.RHEL.5.4.i386.qemu_img.convert.to_raw              GOOD    245 
> completed successfully
>        smp2.RHEL.5.4.i386.qemu_img.snapshot                    GOOD    140 
> completed successfully
>        smp2.RHEL.5.4.i386.qemu_img.commit                      GOOD    146 
> completed successfully
>        smp2.RHEL.5.4.i386.qemu_img.info                        GOOD    133 
> completed successfully
>        smp2.RHEL.5.4.i386.qemu_img.rebase                      TEST_NA 137 
> Current kvm user space version does not support 'rebase' subcommand
>                                                            GOOD    1392
>        [r...@afu kvm]#
>
> shows only 'rebase' subtest is not supported currently.
> Others runs good from my side.
>
> Signed-off-by: Yolkfull Chow 
> ---
>  client/tests/kvm/tests/qemu_img.py     |  280 
> 
>  client/tests/kvm/tests_base.cfg.sample |   40 +
>  2 files changed, 320 insertions(+), 0 deletions(-)
>  create mode 100644 client/tests/kvm/tests/qemu_img.py
>
> diff --git a/client/tests/kvm/tests/qemu_img.py 
> b/client/tests/kvm/tests/qemu_img.py
> new file mode 100644
> index 000..7f786c5
> --- /dev/null
> +++ b/client/tests/kvm/tests/qemu_img.py
> @@ -0,0 +1,280 @@
> +import re, os, logging, commands
> +from autotest_lib.client.common_lib import utils, error
> +import kvm_vm, kvm_utils
> +
> +
> +def run_qemu_img(test, params, env):
> +    """
> +    `qemu-img' functions test:
> +    1) Judge what subcommand is going to be tested
> +    2) Run subcommand test
> +
> +    @param test: kvm test object
> +    @param params: Dictionary with the test parameters
> +    @param env: Dictionary with test environment.
> +    """
> +    cmd = kvm_utils.get_path(test.bindir, params.get("qemu_img_binary"))
> +    if not os.path.exists(cmd):
> +        raise error.TestError("Binary of 'qemu-img' not found")
> +    image_format = params.get("image_format")
> +    image_size = params.get("image_size", "10G")
> +    image_name = kvm_vm.get_image_filename(params, test.bindir)
> +
> +    def _check(cmd, img):
> +        """
> +        Simple 'qemu-img check' function implementation.
> +
> +        @param cmd: binary of 'qemu_img'
> +        @param img: image to be checked
> +        """
> +      

[PATCH] KVM-test: Add a subtest 'qemu_img' v3

2010-04-15 Thread Lucas Meneghel Rodrigues
This is designed to test all subcommands of 'qemu-img'; however, so far
'commit' is not implemented.

* For the 'check' subcommand test, it will use 'dd' to create a file of the
specified size and see whether it can be checked. Then it converts the file
to the supported formats ('qcow2' and 'raw' so far) to see whether there are
errors after conversion.

* For the 'convert' subcommand test, it will convert from the format
specified in the config file to both 'qcow2' and 'raw', and only check the
'qcow2' image after conversion.
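For illustration only, the convert invocation could be assembled along these lines (a minimal sketch; the helper name and file names are hypothetical, not taken from the patch):

```python
# Hypothetical sketch of building a 'qemu-img convert' command line;
# the helper name and the file names are illustrative only.
def convert_cmd(binary, src_fmt, dst_fmt, src_img, dst_img):
    # -f gives the source format, -O the desired output format
    return "%s convert -f %s -O %s %s %s" % (binary, src_fmt, dst_fmt,
                                             src_img, dst_img)
```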

* For the 'snapshot' subcommand test, it will create two snapshots and list
them. Finally, it deletes them if no errors are found.
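The create/list/delete invocations that step exercises could be built roughly like this (a sketch under assumed qemu-img snapshot syntax; helper and names are hypothetical):

```python
# Sketch of assembling 'qemu-img snapshot' command lines.
# 'snapshot_cmd' and the image/snapshot names are illustrative only.
def snapshot_cmd(binary, action, img, name=None):
    # action is one of '-c' (create), '-l' (list), '-d' (delete)
    if name:
        return "%s snapshot %s %s %s" % (binary, action, name, img)
    return "%s snapshot %s %s" % (binary, action, img)
```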

* For the 'info' subcommand test, it will check the image format and size
according to the output of the 'info' subcommand on the specified image file.
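As a rough sketch of that format/size check, parsing the 'info' output might look like this (the sample output and the regexes are assumptions based on common qemu-img versions, not taken from the patch):

```python
import re

# Assumed sample of 'qemu-img info' output; field names follow
# common qemu-img versions and are not taken from the patch.
SAMPLE = """image: /tmp/test.qcow2
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 136K
"""

def parse_info(output):
    # Pull out the fields the test would compare against the config
    info = {}
    fmt = re.search(r"file format: (\w+)", output)
    size = re.search(r"virtual size: (\S+)", output)
    if fmt:
        info["format"] = fmt.group(1)
    if size:
        info["vsize"] = size.group(1)
    return info
```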

* For the 'rebase' subcommand test, it will create a first snapshot 'sn1'
based on the original base_img, and a second snapshot 'sn2' based on sn1. It
then rebases sn2 onto base_img and, after the rebase, checks the
backing_file of sn2.

This supports two rebase modes, unsafe mode and safe mode:

Unsafe mode:
With -u an unsafe mode is enabled that doesn't require the backing files to
exist. It merely changes the backing file reference in the COW image. This
is useful for renaming or moving the backing file. The user is responsible
for making sure that the new backing file has no changes compared to the old
one, or corruption may occur.

Safe mode:
Both the current and the new backing file need to exist, and after the
rebase the COW image is guaranteed to have the same guest-visible content as
before. To achieve this, the old and new backing files are compared and, if
necessary, data is copied from the old backing file into the COW image.
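The two modes differ only in the -u switch; a minimal sketch of how the test might assemble both invocations (the helper name and file names are illustrative, not part of the patch):

```python
# Sketch of building safe vs. unsafe 'qemu-img rebase' command lines;
# 'rebase_cmd' and the image names are hypothetical examples.
def rebase_cmd(binary, img, new_backing, unsafe=False):
    # Unsafe mode only rewrites the backing-file reference (-u);
    # safe mode (the default) requires both backing files to exist.
    mode_flag = "-u " if unsafe else ""
    return "%s rebase %s-b %s %s" % (binary, mode_flag, new_backing, img)
```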

Improvement from v2:
 * Used utils functions instead of commands
 * Fixed some bugs on the check test
 * Put docstrings at some functions
 * Disable profiling during the test
 * Disable regular screenshots during the test

---
 client/tests/kvm/tests/qemu_img.py |  325 
 client/tests/kvm/tests_base.cfg.sample |   42 
 2 files changed, 367 insertions(+), 0 deletions(-)
 create mode 100644 client/tests/kvm/tests/qemu_img.py

diff --git a/client/tests/kvm/tests/qemu_img.py 
b/client/tests/kvm/tests/qemu_img.py
new file mode 100644
index 000..d3f7ff1
--- /dev/null
+++ b/client/tests/kvm/tests/qemu_img.py
@@ -0,0 +1,325 @@
+import re, os, logging, commands
+from autotest_lib.client.common_lib import utils, error
+import kvm_vm, kvm_utils
+
+
+def run_qemu_img(test, params, env):
+    """
+    'qemu-img' functions test:
+    1) Judge what subcommand is going to be tested
+    2) Run subcommand test
+
+    @param test: kvm test object
+    @param params: Dictionary with the test parameters
+    @param env: Dictionary with test environment.
+    """
+    cmd = kvm_utils.get_path(test.bindir, params.get("qemu_img_binary"))
+    if not os.path.exists(cmd):
+        raise error.TestError("Binary of 'qemu-img' not found")
+    image_format = params.get("image_format")
+    image_size = params.get("image_size", "10G")
+    image_name = kvm_vm.get_image_filename(params, test.bindir)
+
+
+    def _check(cmd, img):
+        """
+        Simple 'qemu-img check' function implementation.
+
+        @param cmd: qemu-img base command.
+        @param img: image to be checked
+        """
+        cmd += " check %s" % img
+        logging.info("Checking image '%s'...", img)
+        try:
+            output = utils.system_output(cmd)
+        except error.CmdError, e:
+            if "does not support checks" in str(e):
+                return (True, "")
+            else:
+                return (False, str(e))
+        return (True, output)
+
+
+    def check_test(cmd):
+        """
+        Subcommand 'qemu-img check' test.
+
+        This test will use 'dd' to create a file of the specified size and
+        check it. Then it converts the file to each supported image format
+        and checks it again.
+
+        @param cmd: qemu-img base command.
+        """
+        test_image = kvm_utils.get_path(test.bindir,
+                                        params.get("image_name_dd"))
+        logging.debug("test_image = %s", test_image)
+        create_image_cmd = params.get("create_image_cmd")
+        create_image_cmd = create_image_cmd % test_image
+        logging.debug("create_image_cmd = %s", create_image_cmd)
+        utils.system(create_image_cmd)
+        s, o = _check(cmd, test_image)
+        if not s:
+            raise error.TestFail("Check image '%s' failed with error: %s" %
+                                 (test_image, o))
+        for fmt in params.get("supported_image_formats").split():
+            output_image = test_image + ".%s" % fmt
+            _convert(cmd, fmt, test_image, output_image)
+            s, o = _check(cmd, output_image)
+            if not s:
+                raise error.TestFail("Check image '%s' got error: %s" %
+                                     (output_image, o))
+            os.remove(o