Re: [PATCH 1/4] KVM: PPC: Book3s: PR: Disable preemption in vcpu_run

2011-12-09 Thread Alexander Graf

On 09.12.2011, at 19:19, Scott Wood wrote:

> On 12/09/2011 09:26 AM, Alexander Graf wrote:
>> When entering the guest, we want to make sure we're not getting preempted
>> away, so let's disable preemption on entry, but enable it again while handling
>> guest exits.
>> 
>> Reported-by: Jörg Sommer 
>> Signed-off-by: Alexander Graf 
>> ---
>> arch/powerpc/kvm/book3s_pr.c |7 +++
>> 1 files changed, 7 insertions(+), 0 deletions(-)
>> 
>> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
>> index 726512b..8e4f800 100644
>> --- a/arch/powerpc/kvm/book3s_pr.c
>> +++ b/arch/powerpc/kvm/book3s_pr.c
>> @@ -519,6 +519,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
>>  run->ready_for_interrupt_injection = 1;
>> 
>>  trace_kvm_book3s_exit(exit_nr, vcpu);
>> +preempt_enable();
>>  kvm_resched(vcpu);
>>  switch (exit_nr) {
>>  case BOOK3S_INTERRUPT_INST_STORAGE:
>> @@ -763,6 +764,8 @@ program_interrupt:
>>  run->exit_reason = KVM_EXIT_INTR;
>>  r = -EINTR;
>>  } else {
>> +preempt_disable();
>> +
>>  /* In case an interrupt came in that was triggered
>>   * from userspace (like DEC), we need to check what
>>   * to inject now! */
> 
> Shouldn't you really have interrupts disabled here, as booke does?

Ah, thanks for the reminder. Yeah, we probably want to disable interrupts in 
parallel to checking for signals (basically from one signal check point to 
world switch). I'm just not 100% sure how to easily sync the C and asm code on 
the first entry though. Doing local_irq_disable in C and undoing it in asm 
could become ugly with lazy interrupt disabling.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] KVM: PPC: align vcpu_kick with x86

2011-12-09 Thread Alexander Graf

On 09.12.2011, at 20:15, Scott Wood wrote:

> On 12/09/2011 01:10 PM, Alexander Graf wrote:
>> 
>> On 09.12.2011, at 19:19, Scott Wood  wrote:
>> 
>>> On 12/09/2011 09:26 AM, Alexander Graf wrote:
>>>> Our vcpu kick implementation differs a bit from x86 which resulted in us not
>>>> disabling preemption during the kick. Get it a bit closer to what x86 does.
>>> 
>>> Disabling preemption only matters due to the other bit of functionality
>>> you brought over -- avoiding kicking the current CPU.
>> 
>> Nope, I had BUG_ON warnings in the kick code because preemption was on.
> 
> Coming from where?

From here:

BUG: using smp_processor_id() in preemptible [] code: qemu-system-ppc/17448
caller is .smp_mpic_message_pass+0x88/0x10c
Call Trace:
[c00078d83600] [c0013e70] .show_stack+0x6c/0x16c (unreliable)
[c00078d836b0] [c037b614] .debug_smp_processor_id+0xe4/0x11c
[c00078d83740] [c0048988] .smp_mpic_message_pass+0x88/0x10c
[c00078d837d0] [c002f2b4] .smp_send_reschedule+0x4c/0x80
[c00078d83850] [d5e68984] .kvm_vcpu_kick+0x5c/0x74 [kvm]
[c00078d838d0] [d5e689d8] .kvm_vcpu_ioctl_interrupt+0x3c/0x54 [kvm]
[c00078d83950] [d5e68a8c] .kvm_arch_vcpu_ioctl+0x9c/0x21c [kvm]
[c00078d83b60] [d5e64520] .kvm_vcpu_ioctl+0x7c/0x6b8 [kvm]
[c00078d83c20] [d5e64c2c] .kvm_vcpu_compat_ioctl+0xd0/0xfc [kvm]
[c00078d83cd0] [c01be750] .compat_sys_ioctl+0x160/0x1864
[c00078d83e30] [c000988c] syscall_exit+0x0/0x40


Alex



Re: [PATCH] kvm tools: Fix serial port probing

2011-12-09 Thread Ingo Molnar

* Pekka Enberg  wrote:

> >                        dev->ier        = ioport__read8(data) & 0x3f;
> > +                       kvm__irq_line(kvm, dev->irq, dev->ier?1:0);
> >                        break;
> >                case UART_LCR:
> >                        dev->lcr        = ioport__read8(data);
> 
> Applied, thanks!
> 
> Ingo, does this fix the occasional slowdowns you are seeing?

Indeed it does!

I still see things like 'top' refreshes progressing - i.e. 
taking something like 100-200 msecs to finish, but it's a *lot* 
faster now.

Thanks,

Ingo


Re: [PATCH 4/4] KVM: PPC: align vcpu_kick with x86

2011-12-09 Thread Scott Wood
On 12/09/2011 01:10 PM, Alexander Graf wrote:
> 
> On 09.12.2011, at 19:19, Scott Wood  wrote:
> 
>> On 12/09/2011 09:26 AM, Alexander Graf wrote:
>>> Our vcpu kick implementation differs a bit from x86 which resulted in us not
>>> disabling preemption during the kick. Get it a bit closer to what x86 does.
>>
>> Disabling preemption only matters due to the other bit of functionality
>> you brought over -- avoiding kicking the current CPU.
> 
> Nope, I had BUG_ON warnings in the kick code because preemption was on.

Coming from where?

-Scott



Re: [PATCH 4/4] KVM: PPC: align vcpu_kick with x86

2011-12-09 Thread Alexander Graf

On 09.12.2011, at 19:19, Scott Wood  wrote:

> On 12/09/2011 09:26 AM, Alexander Graf wrote:
>> Our vcpu kick implementation differs a bit from x86 which resulted in us not
>> disabling preemption during the kick. Get it a bit closer to what x86 does.
> 
> Disabling preemption only matters due to the other bit of functionality
> you brought over -- avoiding kicking the current CPU.

Nope, I had BUG_ON warnings in the kick code because preemption was on. 
get_cpu() disables preemption indirectly though, getting rid of the oops :)

Alex
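
The effect of get_cpu()/put_cpu() on the kick path can be sketched in
userspace. This is only a model under stated assumptions: preempt_count,
current_cpu, ipis_sent and every model_* function are hypothetical stand-ins,
not kernel APIs, and the assert merely plays the role of the
debug_smp_processor_id() check seen in the trace. It covers only the branch
where the vcpu is not sleeping on its wait queue.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins for kernel state (not kernel APIs). */
static int preempt_count;   /* > 0 means "preemption disabled" */
static int current_cpu = 2; /* CPU the caller is assumed to run on */
static int ipis_sent;       /* reschedule IPIs issued by the model */

/* Models get_cpu(): disable preemption, then report the current CPU. */
static int model_get_cpu(void)
{
	preempt_count++;
	return current_cpu;
}

/* Models put_cpu(): re-enable preemption. */
static void model_put_cpu(void)
{
	preempt_count--;
}

/* Models smp_send_reschedule(); the assert mimics the debug check that
 * complains when the CPU id is read in preemptible code. */
static void model_send_reschedule(int cpu)
{
	(void)cpu;
	assert(preempt_count > 0);
	ipis_sent++;
}

/* Models the patched kick for a vcpu that is not on its wait queue. */
static void model_vcpu_kick(int vcpu_cpu)
{
	int me = model_get_cpu();

	/* Skip the IPI for our own CPU and for a vcpu that is not loaded. */
	if (vcpu_cpu != me && vcpu_cpu != -1)
		model_send_reschedule(vcpu_cpu);
	model_put_cpu();
}
```

In this model, kicking a vcpu on the caller's own CPU or on CPU -1 sends
nothing, and any IPI that is sent happens with the preempt count raised.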



Re: [PATCH 1/4] KVM: PPC: Book3s: PR: Disable preemption in vcpu_run

2011-12-09 Thread Scott Wood
On 12/09/2011 09:26 AM, Alexander Graf wrote:
> When entering the guest, we want to make sure we're not getting preempted
> away, so let's disable preemption on entry, but enable it again while handling
> guest exits.
> 
> Reported-by: Jörg Sommer 
> Signed-off-by: Alexander Graf 
> ---
>  arch/powerpc/kvm/book3s_pr.c |7 +++
>  1 files changed, 7 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
> index 726512b..8e4f800 100644
> --- a/arch/powerpc/kvm/book3s_pr.c
> +++ b/arch/powerpc/kvm/book3s_pr.c
> @@ -519,6 +519,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
>   run->ready_for_interrupt_injection = 1;
>  
>   trace_kvm_book3s_exit(exit_nr, vcpu);
> + preempt_enable();
>   kvm_resched(vcpu);
>   switch (exit_nr) {
>   case BOOK3S_INTERRUPT_INST_STORAGE:
> @@ -763,6 +764,8 @@ program_interrupt:
>   run->exit_reason = KVM_EXIT_INTR;
>   r = -EINTR;
>   } else {
> + preempt_disable();
> +
>   /* In case an interrupt came in that was triggered
>* from userspace (like DEC), we need to check what
>* to inject now! */

Shouldn't you really have interrupts disabled here, as booke does?

Otherwise an interrupt (including an IPI kick) could send you a signal
or guest exception after you check.

Likewise for other guest entry points.

-Scott



Re: [PATCH 4/4] KVM: PPC: align vcpu_kick with x86

2011-12-09 Thread Scott Wood
On 12/09/2011 09:26 AM, Alexander Graf wrote:
> Our vcpu kick implementation differs a bit from x86 which resulted in us not
> disabling preemption during the kick. Get it a bit closer to what x86 does.

Disabling preemption only matters due to the other bit of functionality
you brought over -- avoiding kicking the current CPU.

Probably doesn't even matter all that much with it, since avoiding that
is just an optimization, and any race that causes us to fail to
reschedule a vcpu of a different thread means that thread just
rescheduled anyway.  Not something I feel any great need to rely on,
though. :-)

> Signed-off-by: Alexander Graf 
> ---
>  arch/powerpc/kvm/powerpc.c |7 ++-
>  1 files changed, 6 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index c952f13..ef8c990 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -557,12 +557,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
>  {
> +int me;
> +int cpu = vcpu->cpu;
> +
> +me = get_cpu();
>   if (waitqueue_active(&vcpu->wq)) {
>   wake_up_interruptible(vcpu->arch.wqp);
>   vcpu->stat.halt_wakeup++;
> - } else if (vcpu->cpu != -1) {
> + } else if (cpu != me && cpu != -1) {
>   smp_send_reschedule(vcpu->cpu);
>   }
> +put_cpu();
>  }

Whitespace.

-Scott



Re: [patch 04/12] [PATCH] kvm-s390-ucontrol: export SIE control block to user

2011-12-09 Thread Alexander Graf

On 09.12.2011, at 13:49, Carsten Otte wrote:

> This patch exports the s390 SIE hardware control block to userspace
> via the mapping of the vcpu file descriptor. In order to do so,
> a new arch callback named kvm_arch_vcpu_fault  is introduced for all
> architectures. It allows to map architecture specific pages.
> 
> Signed-off-by: Carsten Otte 
> ---
> ---
> Documentation/virtual/kvm/api.txt |5 +
> arch/ia64/kvm/kvm-ia64.c  |5 +
> arch/powerpc/kvm/powerpc.c|5 +
> arch/s390/kvm/kvm-s390.c  |   13 +
> arch/x86/kvm/x86.c|5 +
> include/linux/kvm.h   |1 +
> include/linux/kvm_host.h  |1 +
> virt/kvm/kvm_main.c   |2 +-
> 8 files changed, 36 insertions(+), 1 deletion(-)
> 
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -218,6 +218,11 @@ allocation of vcpu ids.  For example, if
> single-threaded guest vcpus, it should make all vcpu ids be a multiple
> of the number of vcpus per vcore.
> 
> +For virtual cpus that have been created with S390 user controlled virtual
> +machines, the resulting vcpu fd can be memory mapped at page offset
> +KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of the virtual
> +cpu's hardware control block.
> +
> 4.8 KVM_GET_DIRTY_LOG (vm ioctl)
> 
> Capability: basic
> --- a/arch/ia64/kvm/kvm-ia64.c
> +++ b/arch/ia64/kvm/kvm-ia64.c
> @@ -1566,6 +1566,11 @@ out:
>   return r;
> }
> 
> +int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
> +{
> + return VM_FAULT_SIGBUS;
> +}
> +
> int kvm_arch_prepare_memory_region(struct kvm *kvm,
>   struct kvm_memory_slot *memslot,
>   struct kvm_memory_slot old,
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -659,6 +659,11 @@ out:
>   return r;
> }
> 
> +int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
> +{
> + return VM_FAULT_SIGBUS;
> +}
> +
> static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo *pvinfo)
> {
>   u32 inst_lis = 0x3c00;
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -769,6 +769,19 @@ long kvm_arch_vcpu_ioctl(struct file *fi
>   return r;
> }
> 
> +int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
> +{
> +#ifdef CONFIG_KVM_UCONTROL

Do we ever want to allow non-s390 to circumvent the GPL? This should also have 
s390 in its name.

> + if ((vmf->pgoff == KVM_S390_SIE_PAGE_OFFSET)
> +  && (kvm_is_ucontrol(vcpu->kvm))) {
> + vmf->page = virt_to_page(vcpu->arch.sie_block);
> + get_page(vmf->page);
> + return 0;
> + }
> +#endif
> + return VM_FAULT_SIGBUS;
> +}
> +
> /* Section: memory related */
> int kvm_arch_prepare_memory_region(struct kvm *kvm,
>  struct kvm_memory_slot *memslot,
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2790,6 +2790,11 @@ out:
>   return r;
> }
> 
> +int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
> +{
> + return VM_FAULT_SIGBUS;
> +}
> +
> static int kvm_vm_ioctl_set_tss_addr(struct kvm *kvm, unsigned long addr)
> {
>   int ret;
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -439,6 +439,7 @@ struct kvm_ppc_pvinfo {
> 
> #define KVM_VM_REGULAR  0
> #define KVM_VM_UCONTROL   1

Same as this. It's an s390 specific hack, so it should be identified as such.


Alex



[PATCH 4/4] KVM: PPC: align vcpu_kick with x86

2011-12-09 Thread Alexander Graf
Our vcpu kick implementation differs a bit from x86 which resulted in us not
disabling preemption during the kick. Get it a bit closer to what x86 does.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/powerpc.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index c952f13..ef8c990 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -557,12 +557,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 {
+int me;
+int cpu = vcpu->cpu;
+
+me = get_cpu();
if (waitqueue_active(&vcpu->wq)) {
wake_up_interruptible(vcpu->arch.wqp);
vcpu->stat.halt_wakeup++;
-   } else if (vcpu->cpu != -1) {
+   } else if (cpu != me && cpu != -1) {
smp_send_reschedule(vcpu->cpu);
}
+put_cpu();
 }
 
 int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq)
-- 
1.6.0.2



[PATCH 1/4] KVM: PPC: Book3s: PR: Disable preemption in vcpu_run

2011-12-09 Thread Alexander Graf
When entering the guest, we want to make sure we're not getting preempted
away, so let's disable preemption on entry, but enable it again while handling
guest exits.

Reported-by: Jörg Sommer 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_pr.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 726512b..8e4f800 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -519,6 +519,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
run->ready_for_interrupt_injection = 1;
 
trace_kvm_book3s_exit(exit_nr, vcpu);
+   preempt_enable();
kvm_resched(vcpu);
switch (exit_nr) {
case BOOK3S_INTERRUPT_INST_STORAGE:
@@ -763,6 +764,8 @@ program_interrupt:
run->exit_reason = KVM_EXIT_INTR;
r = -EINTR;
} else {
+   preempt_disable();
+
/* In case an interrupt came in that was triggered
 * from userspace (like DEC), we need to check what
 * to inject now! */
@@ -925,6 +928,8 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 #endif
ulong ext_msr;
 
+   preempt_disable();
+
/* Check if we can run the vcpu at all */
if (!vcpu->arch.sane) {
kvm_run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
@@ -1006,6 +1011,8 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
current->thread.used_vsr = used_vsr;
 #endif
 
+   preempt_enable();
+
return ret;
 }
 
-- 
1.6.0.2



[PATCH 0/4] Fix book3s-pr KVM with preemption

2011-12-09 Thread Alexander Graf
So far we got away with not implementing preemption properly. However,
recently users emerged who wanted to run PREEMPT_xxx kernels, running
into issues with KVM on there.

This patch set fixes all preempt issues I've found so far with Book3S
PR KVM.

Alexander Graf (4):
  KVM: PPC: Book3s: PR: Disable preemption in vcpu_run
  KVM: PPC: Book3s: PR: No irq_disable in vcpu_run
  KVM: PPC: Use get/set for to_svcpu to help preemption
  KVM: PPC: align vcpu_kick with x86

 arch/powerpc/include/asm/kvm_book3s.h|   76 +++---
 arch/powerpc/include/asm/kvm_book3s_32.h |6 ++-
 arch/powerpc/include/asm/kvm_book3s_64.h |8 +++-
 arch/powerpc/kvm/book3s_32_mmu_host.c|   21 ++--
 arch/powerpc/kvm/book3s_64_mmu_host.c|   66 +-
 arch/powerpc/kvm/book3s_emulate.c|8 ++-
 arch/powerpc/kvm/book3s_pr.c |   71 +++-
 arch/powerpc/kvm/powerpc.c   |7 ++-
 arch/powerpc/kvm/trace.h |5 ++-
 9 files changed, 194 insertions(+), 74 deletions(-)



[PATCH 2/4] KVM: PPC: Book3s: PR: No irq_disable in vcpu_run

2011-12-09 Thread Alexander Graf
Somewhere during merges we went from

  local_irq_enable()
  foo();
  local_irq_disable()

to always keeping irqs enabled during that part. However, we now
have the following code:

  foo();
  local_irq_disable()

which disables interrupts without the surrounding code enabling them
again! So let's remove that disable and be happy.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_pr.c |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 8e4f800..40ce940 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -983,8 +983,6 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 
kvm_guest_exit();
 
-   local_irq_disable();
-
current->thread.regs->msr = ext_msr;
 
/* Make sure we save the guest FPU/Altivec/VSX state */
-- 
1.6.0.2
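
The imbalance described in the commit message can be modelled in userspace.
This is a sketch under stated assumptions, not the kernel code:
model_irqs_enabled and all model_* functions are hypothetical stand-ins for
the CPU interrupt state, local_irq_enable()/local_irq_disable() and the
foo() placeholder from the description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's interrupt-enable state. */
static bool model_irqs_enabled = true;

static void model_local_irq_enable(void)
{
	model_irqs_enabled = true;
}

static void model_local_irq_disable(void)
{
	model_irqs_enabled = false;
}

/* Placeholder for the work that runs with interrupts enabled. */
static void model_foo(void)
{
}

/* Shape before the fix: the disable has no matching enable, so the
 * caller gets control back with interrupts off. */
static void model_run_before_fix(void)
{
	model_foo();
	model_local_irq_disable();
}

/* Shape after the fix: the interrupt state is left as it was on entry. */
static void model_run_after_fix(void)
{
	model_foo();
}
```

Entering either function with interrupts enabled, only the fixed variant
returns with them still enabled.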



[PATCH 3/4] KVM: PPC: Use get/set for to_svcpu to help preemption

2011-12-09 Thread Alexander Graf
When running the 64-bit Book3s PR code without CONFIG_PREEMPT_NONE, we were
doing a few things wrong, most notably access to PACA fields without making
sure that the pointers stay stable across the access (preempt_disable()).

This patch moves to_svcpu towards a get/put model which allows us to disable
preemption while accessing the shadow vcpu fields in the PACA. That way we
can run preemptible and everyone's happy!

Reported-by: Jörg Sommer 
Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s.h|   76 +++---
 arch/powerpc/include/asm/kvm_book3s_32.h |6 ++-
 arch/powerpc/include/asm/kvm_book3s_64.h |8 +++-
 arch/powerpc/kvm/book3s_32_mmu_host.c|   21 ++--
 arch/powerpc/kvm/book3s_64_mmu_host.c|   66 +-
 arch/powerpc/kvm/book3s_emulate.c|8 ++-
 arch/powerpc/kvm/book3s_pr.c |   62 
 arch/powerpc/kvm/trace.h |5 ++-
 8 files changed, 181 insertions(+), 71 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index deb8a4e..e8c78ac 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -185,7 +185,9 @@ static inline void kvmppc_update_int_pending(struct kvm_vcpu *vcpu,
 static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val)
 {
if ( num < 14 ) {
-   to_svcpu(vcpu)->gpr[num] = val;
+   struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
+   svcpu->gpr[num] = val;
+   svcpu_put(svcpu);
to_book3s(vcpu)->shadow_vcpu->gpr[num] = val;
} else
vcpu->arch.gpr[num] = val;
@@ -193,80 +195,120 @@ static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val)
 
 static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num)
 {
-   if ( num < 14 )
-   return to_svcpu(vcpu)->gpr[num];
-   else
+   if ( num < 14 ) {
+   struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
+   ulong r = svcpu->gpr[num];
+   svcpu_put(svcpu);
+   return r;
+   } else
return vcpu->arch.gpr[num];
 }
 
 static inline void kvmppc_set_cr(struct kvm_vcpu *vcpu, u32 val)
 {
-   to_svcpu(vcpu)->cr = val;
+   struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
+   svcpu->cr = val;
+   svcpu_put(svcpu);
to_book3s(vcpu)->shadow_vcpu->cr = val;
 }
 
 static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
 {
-   return to_svcpu(vcpu)->cr;
+   struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
+   u32 r;
+   r = svcpu->cr;
+   svcpu_put(svcpu);
+   return r;
 }
 
 static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
 {
-   to_svcpu(vcpu)->xer = val;
+   struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
+   svcpu->xer = val;
to_book3s(vcpu)->shadow_vcpu->xer = val;
+   svcpu_put(svcpu);
 }
 
 static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
 {
-   return to_svcpu(vcpu)->xer;
+   struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
+   u32 r;
+   r = svcpu->xer;
+   svcpu_put(svcpu);
+   return r;
 }
 
 static inline void kvmppc_set_ctr(struct kvm_vcpu *vcpu, ulong val)
 {
-   to_svcpu(vcpu)->ctr = val;
+   struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
+   svcpu->ctr = val;
+   svcpu_put(svcpu);
 }
 
 static inline ulong kvmppc_get_ctr(struct kvm_vcpu *vcpu)
 {
-   return to_svcpu(vcpu)->ctr;
+   struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
+   ulong r;
+   r = svcpu->ctr;
+   svcpu_put(svcpu);
+   return r;
 }
 
 static inline void kvmppc_set_lr(struct kvm_vcpu *vcpu, ulong val)
 {
-   to_svcpu(vcpu)->lr = val;
+   struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
+   svcpu->lr = val;
+   svcpu_put(svcpu);
 }
 
 static inline ulong kvmppc_get_lr(struct kvm_vcpu *vcpu)
 {
-   return to_svcpu(vcpu)->lr;
+   struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
+   ulong r;
+   r = svcpu->lr;
+   svcpu_put(svcpu);
+   return r;
 }
 
 static inline void kvmppc_set_pc(struct kvm_vcpu *vcpu, ulong val)
 {
-   to_svcpu(vcpu)->pc = val;
+   struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
+   svcpu->pc = val;
+   svcpu_put(svcpu);
 }
 
 static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
 {
-   return to_svcpu(vcpu)->pc;
+   struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
+   ulong r;
+   r = svcpu->pc;
+   svcpu_put(svcpu);
+   return r;
 }
 
 static inline u32 kvmppc_get_last_inst(struct kvm_vcpu *vcpu)
 {
ulong pc = kvmppc_get_pc(vcpu);
-   struct kvmppc_book3s_shadow_vcpu *svcpu = to_svcpu(vcpu);
+   struct kvmppc_book3s_shadow_vcpu 
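
The get/put pattern used throughout the accessors above can be sketched in
userspace. This is a model under stated assumptions: the definitions of
svcpu_get()/svcpu_put() are not visible in the part of the patch shown here,
so the sketch presumes, following the commit message, that get disables
preemption before returning the per-CPU shadow vcpu and that put re-enables
it. Every model_* name and MODEL_NR_CPUS is hypothetical.

```c
#include <assert.h>

#define MODEL_NR_CPUS 2

/* Hypothetical stand-in for the per-CPU shadow vcpu kept in the PACA. */
struct model_shadow_vcpu {
	unsigned long gpr[14];
};

static struct model_shadow_vcpu model_paca_svcpu[MODEL_NR_CPUS];
static int model_current_cpu;   /* CPU the caller is assumed to run on */
static int model_preempt_count; /* > 0 means "preemption disabled" */

/* get: pin the task to its CPU first, then hand out that CPU's slot. */
static struct model_shadow_vcpu *model_svcpu_get(void)
{
	model_preempt_count++;
	return &model_paca_svcpu[model_current_cpu];
}

/* put: the pointer must not be used after this call. */
static void model_svcpu_put(struct model_shadow_vcpu *svcpu)
{
	(void)svcpu;
	model_preempt_count--;
}

static void model_set_gpr(int num, unsigned long val)
{
	struct model_shadow_vcpu *svcpu = model_svcpu_get();

	svcpu->gpr[num] = val;
	model_svcpu_put(svcpu);
}

static unsigned long model_get_gpr(int num)
{
	struct model_shadow_vcpu *svcpu = model_svcpu_get();
	unsigned long r = svcpu->gpr[num]; /* copy out before put */

	model_svcpu_put(svcpu);
	return r;
}
```

The getters copy the value into a local before calling put, which is the
same shape as the kvmppc_get_*() helpers in the patch: once put has run, the
task may move to another CPU and the pointer would refer to a different slot.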

Re: [PATCH V2] kvm: make vcpu life cycle separated from kvm instance

2011-12-09 Thread Gleb Natapov
On Fri, Dec 09, 2011 at 01:23:18PM +0800, Liu Ping Fan wrote:
> From: Liu Ping Fan 
> 
> Currently, a vcpu can be destroyed only when the kvm instance is destroyed.
> Change this so that a vcpu is destroyed when its refcnt drops to zero,
> and then the vcpu MUST and CAN be destroyed before the kvm instance is.
> 
Now refcount is completely unused. It's just set to 1 during vcpu
creation and reset to 0 during destruction. Just drop it.

> Signed-off-by: Liu Ping Fan 
> ---
>  arch/x86/kvm/i8254.c |   10 --
>  arch/x86/kvm/i8259.c |   12 --
>  arch/x86/kvm/mmu.c   |7 ++--
>  arch/x86/kvm/x86.c   |   54 --
>  include/linux/kvm_host.h |   77 +++---
>  virt/kvm/irq_comm.c  |7 +++-
>  virt/kvm/kvm_main.c  |   82 
> --
>  7 files changed, 196 insertions(+), 53 deletions(-)
> 
> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> index 76e3f1c..ac79598 100644
> --- a/arch/x86/kvm/i8254.c
> +++ b/arch/x86/kvm/i8254.c
> @@ -289,7 +289,7 @@ static void pit_do_work(struct work_struct *work)
>   struct kvm_pit *pit = container_of(work, struct kvm_pit, expired);
>   struct kvm *kvm = pit->kvm;
>   struct kvm_vcpu *vcpu;
> - int i;
> + struct kvm_iter it;
>   struct kvm_kpit_state *ps = &pit->pit_state;
>   int inject = 0;
>  
> @@ -315,9 +315,13 @@ static void pit_do_work(struct work_struct *work)
>* LVT0 to NMI delivery. Other PIC interrupts are just sent to
>* VCPU0, and only if its LVT0 is in EXTINT mode.
>*/
> - if (kvm->arch.vapics_in_nmi_mode > 0)
> - kvm_for_each_vcpu(i, vcpu, kvm)
> + if (kvm->arch.vapics_in_nmi_mode > 0) {
> + rcu_read_lock();
> + kvm_for_each_vcpu(it, vcpu, kvm) {
>   kvm_apic_nmi_wd_deliver(vcpu);
> + }
> + rcu_read_unlock();
> + }
>   }
>  }
>  
> diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
> index cac4746..2186b30 100644
> --- a/arch/x86/kvm/i8259.c
> +++ b/arch/x86/kvm/i8259.c
> @@ -50,25 +50,29 @@ static void pic_unlock(struct kvm_pic *s)
>  {
>   bool wakeup = s->wakeup_needed;
>   struct kvm_vcpu *vcpu, *found = NULL;
> - int i;
> + struct kvm *kvm = s->kvm;
> + struct kvm_iter it;
>  
>   s->wakeup_needed = false;
>  
>   spin_unlock(&s->lock);
>  
>   if (wakeup) {
> - kvm_for_each_vcpu(i, vcpu, s->kvm) {
> + rcu_read_lock();
> + kvm_for_each_vcpu(it, vcpu, kvm)
>   if (kvm_apic_accept_pic_intr(vcpu)) {
>   found = vcpu;
>   break;
>   }
> - }
>  
> - if (!found)
> + if (!found) {
> + rcu_read_unlock();
>   return;
> + }
>  
>   kvm_make_request(KVM_REQ_EVENT, found);
>   kvm_vcpu_kick(found);
> + rcu_read_unlock();
>   }
>  }
>  
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index f1b36cf..c16887e 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1833,11 +1833,12 @@ static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte)
>  
>  static void kvm_mmu_reset_last_pte_updated(struct kvm *kvm)
>  {
> - int i;
> + struct kvm_iter it;
>   struct kvm_vcpu *vcpu;
> -
> - kvm_for_each_vcpu(i, vcpu, kvm)
> + rcu_read_lock();
> + kvm_for_each_vcpu(it, vcpu, kvm)
>   vcpu->arch.last_pte_updated = NULL;
> + rcu_read_unlock();
>  }
>  
>  static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c38efd7..a302470 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1831,10 +1831,15 @@ static int get_msr_hyperv(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
>   switch (msr) {
>   case HV_X64_MSR_VP_INDEX: {
>   int r;
> + struct kvm_iter it;
>   struct kvm_vcpu *v;
> - kvm_for_each_vcpu(r, v, vcpu->kvm)
> + struct kvm *kvm =  vcpu->kvm;
> + rcu_read_lock();
> + kvm_for_each_vcpu(it, v, kvm) {
>   if (v == vcpu)
>   data = r;
> + }
> + rcu_read_unlock();
>   break;
>   }
>   case HV_X64_MSR_EOI:
> @@ -4966,7 +4971,8 @@ static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long va
>   struct cpufreq_freqs *freq = data;
>   struct kvm *kvm;
>   struct kvm_vcpu *vcpu;
> - int i, send_ipi = 0;
> + int send_ipi = 0;
> + struct kvm_iter it;
>  
>   /*
>* We allow guests to temporarily run on slowing clocks,
> @@ -5016,13 +5022

Re: [patch 10/12] [PATCH] kvm-s390: storage key interface

2011-12-09 Thread Heiko Carstens
On Fri, Dec 09, 2011 at 01:49:35PM +0100, Carsten Otte wrote:
> This patch introduces an interface to access the guest visible
> storage keys. It supports three operations that model the behavior
> that SSKE/ISKE/RRBE instructions would have if they were issued by
> the guest. These instructions are all documented in the z architecture
> principles of operation book.
> 
> Signed-off-by: Carsten Otte 
> ---

[...]

> + spin_lock(&current->mm->page_table_lock);
> + ptep = ptep_for_addr(addr);
> + if (!ptep)
> + goto out_unlock;

FWIW, this is also a bit odd: if the guest performed a storage key
operation on such an address, it would succeed. If the host does it,
it fails (which doesn't match your description above).
No?



Re: [patch 10/12] [PATCH] kvm-s390: storage key interface

2011-12-09 Thread Carsten Otte

On 09.12.2011 13:52, Joachim von Buttlar wrote:

> Shouldn't it be:   page_set_storage_key(pte_val(*ptep), skey |
> _PAGE_CHANGED, 1);
>
> +/* avoid race clobbering changed bit */
> +pte_val(*ptep) |= _PAGE_SWC;

No, the guest GR/GC bits get set to the value userspace wants down
below (this is set storage key after all), and for the host we turn on
Martin's _PAGE_SWC software bit in the pte to make sure we don't
under-indicate changed. As far as I can tell, this should be just fine.



> Typo:/* put acc+f plus guest referenced and changed into the

will fix.



[patch 03/12] [PATCH] kvm-s390-ucontrol: export page faults to user

2011-12-09 Thread Carsten Otte
This patch introduces a new exit reason in the kvm_run structure
named KVM_EXIT_UCONTROL. This exit indicates that a virtual cpu
has recognized a fault on the host page table. The idea is that
userspace can handle this fault by mapping memory at the fault
location into the cpu's address space and then continue to run the
virtual cpu.

Signed-off-by: Carsten Otte 
---
---
 Documentation/virtual/kvm/api.txt |   14 ++
 arch/s390/kvm/kvm-s390.c  |   32 +++-
 arch/s390/kvm/kvm-s390.h  |1 +
 include/linux/kvm.h   |6 ++
 4 files changed, 48 insertions(+), 5 deletions(-)

--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1653,6 +1653,20 @@ s390 specific.
 
 s390 specific.
 
+   /* KVM_EXIT_UCONTROL */
+   struct {
+   __u64 trans_exc_code;
+   __u32 pgm_code;
+   } s390_ucontrol;
+
+s390 specific. A page fault has occurred for a user controlled virtual
+machine (KVM_VM_S390_UNCONTROL) on its host page table that cannot be
+resolved by the kernel.
+The program code and the translation exception code that were placed
+in the cpu's lowcore are presented here as defined by the z Architecture
+Principles of Operation Book in the Chapter for Dynamic Address Translation
+(DAT)
+
/* KVM_EXIT_DCR */
struct {
__u32 dcrn;
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -499,8 +499,10 @@ int kvm_arch_vcpu_ioctl_set_mpstate(stru
return -EINVAL; /* not implemented yet */
 }
 
-static void __vcpu_run(struct kvm_vcpu *vcpu)
+static int __vcpu_run(struct kvm_vcpu *vcpu)
 {
+   int rc;
+
memcpy(&vcpu->arch.sie_block->gg14, &vcpu->arch.guest_gprs[14], 16);
 
if (need_resched())
@@ -517,9 +519,15 @@ static void __vcpu_run(struct kvm_vcpu *
local_irq_enable();
VCPU_EVENT(vcpu, 6, "entering sie flags %x",
   atomic_read(&vcpu->arch.sie_block->cpuflags));
-   if (sie64a(vcpu->arch.sie_block, vcpu->arch.guest_gprs)) {
-   VCPU_EVENT(vcpu, 3, "%s", "fault in sie instruction");
-   kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
+   rc = sie64a(vcpu->arch.sie_block, vcpu->arch.guest_gprs);
+   if (rc) {
+   if (kvm_is_ucontrol(vcpu->kvm)) {
+   rc = SIE_INTERCEPT_UCONTROL;
+   } else {
+   VCPU_EVENT(vcpu, 3, "%s", "fault in sie instruction");
+   kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
+   rc = 0;
+   }
}
VCPU_EVENT(vcpu, 6, "exit sie icptcode %d",
   vcpu->arch.sie_block->icptcode);
@@ -528,6 +536,7 @@ static void __vcpu_run(struct kvm_vcpu *
local_irq_enable();
 
memcpy(&vcpu->arch.guest_gprs[14], &vcpu->arch.sie_block->gg14, 16);
+   return rc;
 }
 
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
@@ -548,6 +557,7 @@ rerun_vcpu:
case KVM_EXIT_UNKNOWN:
case KVM_EXIT_INTR:
case KVM_EXIT_S390_RESET:
+   case KVM_EXIT_UCONTROL:
break;
default:
BUG();
@@ -559,7 +569,9 @@ rerun_vcpu:
might_fault();
 
do {
-   __vcpu_run(vcpu);
+   rc = __vcpu_run(vcpu);
+   if (rc)
+   break;
rc = kvm_handle_sie_intercept(vcpu);
} while (!signal_pending(current) && !rc);
 
@@ -571,6 +583,16 @@ rerun_vcpu:
rc = -EINTR;
}
 
+#ifdef CONFIG_KVM_UCONTROL
+   if (rc == SIE_INTERCEPT_UCONTROL) {
+   kvm_run->exit_reason = KVM_EXIT_UCONTROL;
+   kvm_run->s390_ucontrol.trans_exc_code =
+   current->thread.gmap_addr;
+   kvm_run->s390_ucontrol.pgm_code = 0x10;
+   rc = 0;
+   }
+#endif
+
if (rc == -EOPNOTSUPP) {
/* intercept cannot be handled in-kernel, prepare kvm-run */
kvm_run->exit_reason = KVM_EXIT_S390_SIEIC;
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -26,6 +26,7 @@ typedef int (*intercept_handler_t)(struc
 
 /* negativ values are error codes, positive values for internal conditions */
 #define SIE_INTERCEPT_RERUNVCPU (1<<0)
+#define SIE_INTERCEPT_UCONTROL (1<<1)
 int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu);
 
 #define VM_EVENT(d_kvm, d_loglevel, d_string, d_args...)\
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -162,6 +162,7 @@ struct kvm_pit_config {
 #define KVM_EXIT_INTERNAL_ERROR   17
 #define KVM_EXIT_OSI  18
 #define KVM_EXIT_PAPR_HCALL  19
+#define KVM_EXIT_UCONTROL    20
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 #define KVM_INTERNAL_ERROR_EMULATION 1
@@ -249,6 +250,11 @@ struct kvm_run {
 #define KVM_S390_RESE

[patch 11/12] [PATCH] kvm-s390-ucontrol: announce capability for user controlled vms

2011-12-09 Thread Carsten Otte
This patch announces a new capability KVM_CAP_UCONTROL that
indicates that kvm can now support virtual machines that are
controlled by userspace.

Signed-off-by: Carsten Otte 
---
---
 arch/s390/kvm/kvm-s390.c |3 +++
 include/linux/kvm.h  |1 +
 2 files changed, 4 insertions(+)

--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -233,6 +233,9 @@ int kvm_dev_ioctl_check_extension(long e
case KVM_CAP_S390_PSW:
case KVM_CAP_S390_GMAP:
case KVM_CAP_SYNC_MMU:
+#ifdef CONFIG_KVM_UCONTROL
+   case KVM_CAP_UCONTROL:
+#endif
r = 1;
break;
default:
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -574,6 +574,7 @@ struct kvm_s390_keyop {
 #define KVM_CAP_MAX_VCPUS 66   /* returns max vcpus per vm */
 #define KVM_CAP_PPC_PAPR 68
 #define KVM_CAP_S390_GMAP 71
+#define KVM_CAP_UCONTROL 72
 
 #ifdef KVM_CAP_IRQ_ROUTING
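
A minimal userspace sketch of how the capability announced here might be probed before creating a VM. The constants are copied from this series (KVM_CAP_UCONTROL is 72 here, KVM_VM_UCONTROL is 1); a merged kernel may name or number them differently, so treat them as assumptions. The helper names vm_type_for() and create_vm() are hypothetical.

```c
#include <sys/ioctl.h>

/* Values as posted in this series; assumptions for any other kernel. */
#define KVMIO               0xAE
#define KVM_CREATE_VM       _IO(KVMIO, 0x01)
#define KVM_CHECK_EXTENSION _IO(KVMIO, 0x03)
#define KVM_CAP_UCONTROL    72
#define KVM_VM_REGULAR      0UL
#define KVM_VM_UCONTROL     1UL

/* Hypothetical helper: choose the machine type from a
 * KVM_CHECK_EXTENSION result (> 0 means the capability is present). */
unsigned long vm_type_for(int cap_result, int want_ucontrol)
{
    if (want_ucontrol && cap_result > 0)
        return KVM_VM_UCONTROL;
    return KVM_VM_REGULAR;
}

/* Hypothetical helper: returns a VM fd, or -1 with errno set.
 * kvm_fd is an open file descriptor for /dev/kvm. */
int create_vm(int kvm_fd, int want_ucontrol)
{
    int cap = ioctl(kvm_fd, KVM_CHECK_EXTENSION,
                    (unsigned long)KVM_CAP_UCONTROL);

    return ioctl(kvm_fd, KVM_CREATE_VM, vm_type_for(cap, want_ucontrol));
}
```

Falling back to KVM_VM_REGULAR when the capability is missing is a design choice of this sketch; a caller that strictly needs a user controlled VM would rather report an error.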
 



[patch 00/12] Ucontrol patchset V4

2011-12-09 Thread Carsten Otte
Hi Avi, Hi Marcelo,

this version includes review feedback from Heiko.

so long,
Carsten



[patch 12/12] [PATCH] kvm-s390: Fix return code for unknown ioctl numbers

2011-12-09 Thread Carsten Otte
This patch fixes the return code of kvm_arch_vcpu_ioctl in case
of an unknown ioctl number.

Signed-off-by: Carsten Otte 
---
---
 arch/s390/kvm/kvm-s390.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
===
--- linux-2.5-cecsim.orig/arch/s390/kvm/kvm-s390.c
+++ linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
@@ -891,7 +891,7 @@ long kvm_arch_vcpu_ioctl(struct file *fi
break;
}
default:
-   r = -EINVAL;
+   r = -ENOTTY;
}
return r;
 }
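
For callers, the difference is that ENOTTY conventionally means the ioctl number is not understood by the file descriptor, while EINVAL means the request was recognized but an argument was rejected. A minimal sketch of how userspace might tell the two apart once this change is applied; the enum and the function name are hypothetical.

```c
#include <errno.h>

enum ioctl_failure {
    IOCTL_UNSUPPORTED,   /* the kernel does not know this ioctl number */
    IOCTL_BAD_ARGUMENT,  /* the ioctl is known, the argument was rejected */
    IOCTL_OTHER_ERROR    /* anything else, e.g. EFAULT or EINTR */
};

/* Classify the errno value left behind by a failed ioctl() call. */
enum ioctl_failure classify_ioctl_errno(int err)
{
    switch (err) {
    case ENOTTY:
        return IOCTL_UNSUPPORTED;
    case EINVAL:
        return IOCTL_BAD_ARGUMENT;
    default:
        return IOCTL_OTHER_ERROR;
    }
}
```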



[patch 10/12] [PATCH] kvm-s390: storage key interface

2011-12-09 Thread Carsten Otte
This patch introduces an interface to access the guest visible
storage keys. It supports three operations that model the behavior
that SSKE/ISKE/RRBE instructions would have if they were issued by
the guest. These instructions are all documented in the z/Architecture
Principles of Operation book.

Signed-off-by: Carsten Otte 
---
---
 Documentation/virtual/kvm/api.txt |   38 +
 arch/s390/include/asm/kvm_host.h  |4 +
 arch/s390/include/asm/pgtable.h   |1 
 arch/s390/kvm/kvm-s390.c  |  110 --
 arch/s390/mm/pgtable.c|   70 +---
 include/linux/kvm.h   |7 ++
 6 files changed, 209 insertions(+), 21 deletions(-)

--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1514,6 +1514,44 @@ table upfront. This is useful to handle
 controlled virtual machines to fault in the virtual cpu's lowcore pages
 prior to calling the KVM_RUN ioctl.
 
+4.67 KVM_S390_KEYOP
+
+Capability: KVM_CAP_UCONTROL
+Architectures: s390
+Type: vm ioctl
+Parameters: struct kvm_s390_keyop (in+out)
+Returns: 0 in case of success
+
+The parameter looks like this:
+   struct kvm_s390_keyop {
+   __u64 user_addr;
+   __u8  key;
+   __u8  operation;
+   };
+
+user_addr  contains the userspace address of a memory page
+key        contains the guest visible storage key as defined by the
+   z Architecture Principles of Operation book, including key
+   value for key controlled storage protection, the fetch
+   protection bit, and the reference and change indicator bits
+operation  indicates the key operation that should be performed
+
+The following operations are supported:
+KVM_S390_KEYOP_SSKE:
+   This operation behaves just like the set storage key extended (SSKE)
+   instruction would, if it were issued by the guest. The storage key
+   provided in "key" is placed in the guest visible storage key.
+KVM_S390_KEYOP_ISKE:
+   This operation behaves just like the insert storage key extended (ISKE)
+   instruction would, if it were issued by the guest. After this call,
+   the guest visible storage key is presented in the "key" field.
+KVM_S390_KEYOP_RRBE:
+   This operation behaves just like the reset referenced bit extended
+   (RRBE) instruction would, if it were issued by the guest. The guest
+   visible reference bit is cleared, and the value presented in the "key"
+   field after this call has the reference bit set to 1 in case the
+   guest view of the reference bit was 1 prior to this call.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -24,6 +24,10 @@
 /* memory slots that does not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 4
 
+#define KVM_S390_KEYOP_SSKE 0x01
+#define KVM_S390_KEYOP_ISKE 0x02
+#define KVM_S390_KEYOP_RRBE 0x03
+
 struct sca_entry {
atomic_t scn;
__u32   reserved;
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1254,6 +1254,7 @@ static inline pte_t mk_swap_pte(unsigned
 extern int vmem_add_mapping(unsigned long start, unsigned long size);
 extern int vmem_remove_mapping(unsigned long start, unsigned long size);
 extern int s390_enable_sie(void);
+extern pte_t *ptep_for_addr(unsigned long addr);
 
 /*
  * No page table caches to initialise
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -112,13 +112,117 @@ void kvm_arch_exit(void)
 {
 }
 
+static long kvm_s390_keyop(struct kvm_s390_keyop *kop)
+{
+   unsigned long addr = kop->user_addr;
+   pte_t *ptep;
+   pgste_t pgste;
+   int r;
+   unsigned long skey;
+   unsigned long bits;
+
+   /* make sure this process is a hypervisor */
+   r = -EINVAL;
+   if (!mm_has_pgste(current->mm))
+   goto out;
+
+   r = -EFAULT;
+   if (addr >= PGDIR_SIZE)
+   goto out;
+
+   spin_lock(&current->mm->page_table_lock);
+   ptep = ptep_for_addr(addr);
+   if (!ptep)
+   goto out_unlock;
+   if (IS_ERR(ptep)) {
+   r = PTR_ERR(ptep);
+   goto out_unlock;
+   }
+
+   pgste = pgste_get_lock(ptep);
+
+   switch (kop->operation) {
+   case KVM_S390_KEYOP_SSKE:
+   pgste = pgste_update_all(ptep, pgste);
+   /* set the real key back w/o rc bits */
+   skey = kop->key & (_PAGE_ACC_BITS | _PAGE_FP_BIT);
+   if (pte_present(*ptep)) {
+   page_set_storage_key(pte_val(*ptep), skey, 1);
+   /* avoid race clobbering changed bit */
+   pte_val(*ptep) |= _PAGE_SWC;
+   }
+   /* put acc+f plus guest referenced and changed into the pgste */
+   pgste_val(pgste) &= ~(RCP_A
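
The three operations documented above can be illustrated with a toy userspace model. This is not the kernel code from the patch: a single byte stands in for the guest visible storage key of one page, and the bit layout (ACC, F, R, C) follows the z/Architecture storage key format. For RRBE the text only specifies the reference bit of the returned "key", so this model reports that bit alone, which is an assumption. The function name keyop_model() is hypothetical.

```c
#include <stdint.h>

/* Operation codes as defined in this patch. */
#define KVM_S390_KEYOP_SSKE 0x01
#define KVM_S390_KEYOP_ISKE 0x02
#define KVM_S390_KEYOP_RRBE 0x03

/* Storage key bits: access control, fetch protection, reference, change. */
#define SKEY_ACC 0xF0
#define SKEY_F   0x08
#define SKEY_R   0x04
#define SKEY_C   0x02

/* Toy model: *guest_key is the guest view of one page's storage key,
 * *key plays the role of kvm_s390_keyop.key (in+out).
 * Returns 0 on success, -1 for an unknown operation. */
int keyop_model(uint8_t *guest_key, uint8_t operation, uint8_t *key)
{
    switch (operation) {
    case KVM_S390_KEYOP_SSKE:
        /* place the provided key in the guest visible key */
        *guest_key = *key & (SKEY_ACC | SKEY_F | SKEY_R | SKEY_C);
        return 0;
    case KVM_S390_KEYOP_ISKE:
        /* present the guest visible key to the caller */
        *key = *guest_key;
        return 0;
    case KVM_S390_KEYOP_RRBE:
        /* report the old reference bit, then clear it */
        *key = *guest_key & SKEY_R;
        *guest_key &= (uint8_t)~SKEY_R;
        return 0;
    default:
        return -1;
    }
}
```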

[patch 01/12] [PATCH] kvm-s390: add parameter for KVM_CREATE_VM

2011-12-09 Thread Carsten Otte
This patch introduces a new config option for user controlled kernel
virtual machines. It introduces an optional parameter to
KVM_CREATE_VM in order to create a user controlled virtual machine.
The parameter is passed to kvm_arch_init_vm for all architectures.
Valid values for the new parameter are KVM_VM_REGULAR (defined to 0
for backward compatibility to old KVM_CREATE_VM) and
KVM_VM_UCONTROL for s390 only.
Note that the user controlled virtual machines require CAP_SYS_ADMIN
privileges.

Signed-off-by: Carsten Otte 
---
---
 Documentation/virtual/kvm/api.txt |7 ++-
 arch/ia64/kvm/kvm-ia64.c  |5 -
 arch/powerpc/kvm/powerpc.c|5 -
 arch/s390/kvm/Kconfig |9 +
 arch/s390/kvm/kvm-s390.c  |   30 +-
 arch/s390/kvm/kvm-s390.h  |   10 ++
 arch/x86/kvm/x86.c|5 -
 include/linux/kvm.h   |3 +++
 include/linux/kvm_host.h  |2 +-
 virt/kvm/kvm_main.c   |   19 +--
 10 files changed, 79 insertions(+), 16 deletions(-)

--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -95,7 +95,7 @@ described as 'basic' will be available.
 Capability: basic
 Architectures: all
 Type: system ioctl
-Parameters: none
+Parameters: machine type identifier (KVM_VM_*)
 Returns: a VM fd that can be used to control the new virtual machine.
 
 The new VM has no virtual cpus and no memory.  An mmap() of a VM fd
@@ -103,6 +103,11 @@ will access the virtual machine's physic
 corresponds to guest physical address zero.  Use of mmap() on a VM fd
 is discouraged if userspace memory allocation (KVM_CAP_USER_MEMORY) is
 available.
+You most certainly want to use KVM_VM_REGULAR as machine type.
+
+In order to create user controlled virtual machines on S390, check
+KVM_CAP_UCONTROL and use KVM_VM_UCONTROL as machine type as
+privileged user (CAP_SYS_ADMIN).
 
 4.3 KVM_GET_MSR_INDEX_LIST
 
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -809,10 +809,13 @@ static void kvm_build_io_pmt(struct kvm
 #define GUEST_PHYSICAL_RR4 0x2739
 #define VMM_INIT_RR 0x1660
 
-int kvm_arch_init_vm(struct kvm *kvm)
+int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
BUG_ON(!kvm);
 
+   if (type != KVM_VM_REGULAR)
+   return -EINVAL;
+
kvm->arch.is_sn2 = ia64_platform_is("sn2");
 
kvm->arch.metaphysical_rr0 = GUEST_PHYSICAL_RR0;
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -171,8 +171,11 @@ void kvm_arch_check_processor_compat(voi
*(int *)rtn = kvmppc_core_check_processor_compat();
 }
 
-int kvm_arch_init_vm(struct kvm *kvm)
+int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
+   if (type != KVM_VM_REGULAR)
+   return -EINVAL;
+
return kvmppc_core_init_vm(kvm);
 }
 
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -34,6 +34,15 @@ config KVM
 
  If unsure, say N.
 
+config KVM_UCONTROL
+   bool "Userspace controlled virtual machines"
+   depends on KVM
+   ---help---
+ Allow CAP_SYS_ADMIN users to create KVM virtual machines that are
+ controlled by userspace.
+
+ If unsure, say N.
+
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
 source drivers/vhost/Kconfig
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -171,11 +171,28 @@ long kvm_arch_vm_ioctl(struct file *filp
return r;
 }
 
-int kvm_arch_init_vm(struct kvm *kvm)
+int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
int rc;
char debug_name[16];
 
+   rc = -EINVAL;
+#ifdef CONFIG_KVM_UCONTROL
+   switch (type) {
+   case KVM_VM_REGULAR:
+   break;
+   case KVM_VM_UCONTROL:
+   if (!capable(CAP_SYS_ADMIN))
+   goto out_err;
+   break;
+   default:
+   goto out_err;
+   }
+#else
+   if (type != KVM_VM_REGULAR)
+   goto out_err;
+#endif
+
rc = s390_enable_sie();
if (rc)
goto out_err;
@@ -198,10 +215,13 @@ int kvm_arch_init_vm(struct kvm *kvm)
debug_register_view(kvm->arch.dbf, &debug_sprintf_view);
VM_EVENT(kvm, 3, "%s", "vm created");
 
-   kvm->arch.gmap = gmap_alloc(current->mm);
-   if (!kvm->arch.gmap)
-   goto out_nogmap;
-
+   if (type == KVM_VM_REGULAR) {
+   kvm->arch.gmap = gmap_alloc(current->mm);
+   if (!kvm->arch.gmap)
+   goto out_nogmap;
+   } else {
+   kvm->arch.gmap = NULL;
+   }
return 0;
 out_nogmap:
debug_unregister(kvm->arch.dbf);
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -47,6 +47,16 @@ static inline int __cpu_is_stopped(struc
return atomic_read(&vcpu->arch.sie_block->cpuflags) & CPUSTAT_STOP
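
The machine type check that this patch adds to the s390 kvm_arch_init_vm() can be summarized as a small decision function. The sketch below is a userspace model, not the kernel code: ucontrol_built stands in for CONFIG_KVM_UCONTROL and is_admin for capable(CAP_SYS_ADMIN), and the function name check_vm_type() is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

#define KVM_VM_REGULAR  0UL
#define KVM_VM_UCONTROL 1UL

/* Returns 0 if the requested machine type would be accepted,
 * -EINVAL otherwise (the patch uses -EINVAL for every rejection,
 * including a missing CAP_SYS_ADMIN). */
int check_vm_type(unsigned long type, bool ucontrol_built, bool is_admin)
{
    if (type == KVM_VM_REGULAR)
        return 0;
    if (type == KVM_VM_UCONTROL && ucontrol_built && is_admin)
        return 0;
    return -EINVAL;
}
```

The other architectures touched by the patch behave like the ucontrol_built == false case: anything but KVM_VM_REGULAR is rejected.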

[patch 08/12] [PATCH] kvm-s390-ucontrol: disable sca

2011-12-09 Thread Carsten Otte
This patch makes sure user controlled virtual machines do not use a
system control area (sca). This is needed in order to create
virtual machines with more cpus than the size of the sca [64].

Signed-off-by: Carsten Otte 
---
Index: linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
===
--- linux-2.5-cecsim.orig/arch/s390/kvm/kvm-s390.c
+++ linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
@@ -234,10 +234,13 @@ out_err:
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
VCPU_EVENT(vcpu, 3, "%s", "free cpu");
-   clear_bit(63 - vcpu->vcpu_id, (unsigned long *) &vcpu->kvm->arch.sca->mcn);
-   if (vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda ==
-   (__u64) vcpu->arch.sie_block)
-   vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda = 0;
+   if (!kvm_is_ucontrol(vcpu->kvm)) {
+   clear_bit(63 - vcpu->vcpu_id,
+ (unsigned long *) &vcpu->kvm->arch.sca->mcn);
+   if (vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda ==
+   (__u64) vcpu->arch.sie_block)
+   vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda = 0;
+   }
smp_mb();
 
if (kvm_is_ucontrol(vcpu->kvm))
@@ -374,12 +377,19 @@ struct kvm_vcpu *kvm_arch_vcpu_create(st
goto out_free_cpu;
 
vcpu->arch.sie_block->icpua = id;
-   BUG_ON(!kvm->arch.sca);
-   if (!kvm->arch.sca->cpu[id].sda)
-   kvm->arch.sca->cpu[id].sda = (__u64) vcpu->arch.sie_block;
-   vcpu->arch.sie_block->scaoh = (__u32)(((__u64)kvm->arch.sca) >> 32);
-   vcpu->arch.sie_block->scaol = (__u32)(__u64)kvm->arch.sca;
-   set_bit(63 - id, (unsigned long *) &kvm->arch.sca->mcn);
+   if (!kvm_is_ucontrol(kvm)) {
+   if (!kvm->arch.sca) {
+   WARN_ON_ONCE(1);
+   goto out_free_cpu;
+   }
+   if (!kvm->arch.sca->cpu[id].sda)
+   kvm->arch.sca->cpu[id].sda =
+   (__u64) vcpu->arch.sie_block;
+   vcpu->arch.sie_block->scaoh =
+   (__u32)(((__u64)kvm->arch.sca) >> 32);
+   vcpu->arch.sie_block->scaol = (__u32)(__u64)kvm->arch.sca;
+   set_bit(63 - id, (unsigned long *) &kvm->arch.sca->mcn);
+   }
 
spin_lock_init(&vcpu->arch.local_int.lock);
INIT_LIST_HEAD(&vcpu->arch.local_int.list);
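
The limit of 64 mentioned in the description follows from the bookkeeping visible in the diff: each vcpu owns one bit of the 64-bit mcn word, and the SCA address is split into two 32-bit halves (scaoh, scaol) in the SIE block. A minimal sketch of that arithmetic, with hypothetical helper names:

```c
#include <stdint.h>

/* Mask of the bit that set_bit(63 - id, &sca->mcn) sets in a 64-bit
 * word. Only valid for id 0..63, which is why a VM that uses the SCA
 * cannot have more than 64 vcpus. */
uint64_t sca_mcn_mask(unsigned int id)
{
    return UINT64_C(1) << (63 - id);
}

/* Upper 32 bits of the SCA address, as stored in scaoh. */
uint32_t sca_origin_high(uint64_t sca_addr)
{
    return (uint32_t)(sca_addr >> 32);
}

/* Lower 32 bits of the SCA address, as stored in scaol. */
uint32_t sca_origin_low(uint64_t sca_addr)
{
    return (uint32_t)sca_addr;
}
```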



[patch 07/12] [PATCH] kvm-s390-ucontrol: interface to inject faults on a vcpu page table

2011-12-09 Thread Carsten Otte
This patch allows the user to fault in pages on a virtual cpu's
address space for user controlled virtual machines. Typically this
is superfluous because userspace can just create a mapping and
let the kernel's page fault logic take care of it. There is one
exception: SIE won't start if the lowcore is not present. Normally
the kernel takes care of this [handle_validity() in
arch/s390/kvm/intercept.c] but since the kernel does not handle
intercepts for user controlled virtual machines, userspace needs to
be able to handle this condition.

Signed-off-by: Carsten Otte 
---
---
 Documentation/virtual/kvm/api.txt |   16 
 arch/s390/kvm/kvm-s390.c  |6 ++
 include/linux/kvm.h   |1 +
 3 files changed, 23 insertions(+)

--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1498,6 +1498,22 @@ This ioctl unmaps the memory in the vcpu
 "vcpu_addr" with the length "length". The field "user_addr" is ignored.
 All parameters need to be alligned by 1 megabyte.
 
+4.66 KVM_S390_VCPU_FAULT
+
+Capability: KVM_CAP_UCONTROL
+Architectures: s390
+Type: vcpu ioctl
+Parameters: vcpu absolute address (in)
+Returns: 0 in case of success
+
+This call creates a page table entry on the virtual cpu's address space
+(for user controlled virtual machines) or the virtual machine's address
+space (for regular virtual machines). This only works for minor faults,
+thus it's recommended to access subject memory page via the user page
+table upfront. This is useful to handle validity intercepts for user
+controlled virtual machines to fault in the virtual cpu's lowcore pages
+prior to calling the KVM_RUN ioctl.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -767,6 +767,12 @@ long kvm_arch_vcpu_ioctl(struct file *fi
break;
}
 #endif
+   case KVM_S390_VCPU_FAULT: {
+   r = gmap_fault(arg, vcpu->arch.gmap);
+   if (!IS_ERR_VALUE(r))
+   r = 0;
+   break;
+   }
default:
r = -EINVAL;
}
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -673,6 +673,7 @@ struct kvm_s390_ucas_mapping {
 };
 #define KVM_S390_UCAS_MAP    _IOW(KVMIO, 0x50, struct kvm_s390_ucas_mapping)
 #define KVM_S390_UCAS_UNMAP  _IOW(KVMIO, 0x51, struct kvm_s390_ucas_mapping)
+#define KVM_S390_VCPU_FAULT _IOW(KVMIO, 0x52, unsigned long)
 
 /* Device model IOC */
 #define KVM_CREATE_IRQCHIP   _IO(KVMIO,   0x60)
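
A minimal userspace sketch of the call sequence the documentation recommends: access the page through the user mapping first, so that only a minor fault remains, then ask the kernel to create the vcpu page table entry. The ioctl number is taken from this patch and may differ in a merged kernel. The handler shown above passes the ioctl argument straight to gmap_fault(), so the sketch passes the vcpu address by value; the function name fault_in_vcpu_page() is hypothetical.

```c
#include <sys/ioctl.h>

/* Values as posted in this series; assumptions for any other kernel. */
#define KVMIO               0xAE
#define KVM_S390_VCPU_FAULT _IOW(KVMIO, 0x52, unsigned long)

/* user_page: userspace address backing the page (already mapped into
 *            the vcpu address space, e.g. via KVM_S390_UCAS_MAP)
 * vcpu_addr: the vcpu absolute address to fault in
 * Returns 0 on success, -1 with errno set on failure. */
int fault_in_vcpu_page(int vcpu_fd, const volatile unsigned char *user_page,
                       unsigned long vcpu_addr)
{
    /* Read one byte so the host page table entry exists beforehand. */
    (void)*user_page;

    return ioctl(vcpu_fd, KVM_S390_VCPU_FAULT, vcpu_addr);
}
```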



[PATCH][Autotest] Autotest: Add subtest inteface to client utils.

2011-12-09 Thread Jiří Župka
This class and some decorators make it easy to run a function as a
subtest.
Subtest results are collected and can be reviewed at the end of the test.
The Subtest class and decorators should be placed in autotest_lib.client.utils.

The results format can be changed.

Example:
    @staticmethod
    def result_to_string(result):
        """
        @param result: Result of test.
        """
        print result
        return ("[%(result)s]%(name)s: %(output)s") % (result)

  1)
    Subtest.result_to_string = result_to_string
    Subtest.get_text_result()

  2)
    Subtest.get_text_result(result_to_string)

Pull-request: https://github.com/autotest/autotest/pull/111

Signed-off-by: Jiří Župka 
---
 client/common_lib/base_utils.py  |  214 ++
 client/common_lib/base_utils_unittest.py |  117 
 2 files changed, 331 insertions(+), 0 deletions(-)

diff --git a/client/common_lib/base_utils.py b/client/common_lib/base_utils.py
index 005e3b0..fc6578d 100644
--- a/client/common_lib/base_utils.py
+++ b/client/common_lib/base_utils.py
@@ -119,6 +119,220 @@ class BgJob(object):
 signal.signal(signal.SIGPIPE, signal.SIG_DFL)
 
 
+def subtest_fatal(function):
+    """
+    Decorator which marks a test as critical.
+    If the subtest fails, the whole test ends.
+    """
+    def wrapped(self, *args, **kwds):
+        self._fatal = True
+        self.decored()
+        result = function(self, *args, **kwds)
+        return result
+    wrapped.func_name = function.func_name
+    return wrapped
+
+
+def subtest_nocleanup(function):
+    """
+    Decorator which disables the cleanup function.
+    """
+    def wrapped(self, *args, **kwds):
+        self._cleanup = False
+        self.decored()
+        result = function(self, *args, **kwds)
+        return result
+    wrapped.func_name = function.func_name
+    return wrapped
+
+
+class Subtest(object):
+    """
+    Collect result of subtest of main test.
+    """
+    result = []
+    passed = 0
+    failed = 0
+    def __new__(cls, *args, **kargs):
+        self = super(Subtest, cls).__new__(cls)
+
+        self._fatal = False
+        self._cleanup = True
+        self._num_decored = 0
+
+        ret = None
+        if args is None:
+            args = []
+
+        res = {
+               'result' : None,
+               'name'   : self.__class__.__name__,
+               'args'   : args,
+               'kargs'  : kargs,
+               'output' : None,
+              }
+        try:
+            logging.info("Starting test %s" % self.__class__.__name__)
+            ret = self.test(*args, **kargs)
+            res['result'] = 'PASS'
+            res['output'] = ret
+            try:
+                logging.info(Subtest.result_to_string(res))
+            except:
+                self._num_decored = 0
+                raise
+            Subtest.result.append(res)
+            Subtest.passed += 1
+        except NotImplementedError:
+            raise
+        except Exception:
+            exc_type, exc_value, exc_traceback = sys.exc_info()
+            for _ in range(self._num_decored):
+                exc_traceback = exc_traceback.tb_next
+            logging.error("In function (" + self.__class__.__name__ + "):")
+            logging.error("Call from:\n" +
+                          traceback.format_stack()[-2][:-1])
+            logging.error("Exception from:\n" +
+                          "".join(traceback.format_exception(
+                                  exc_type, exc_value,
+                                  exc_traceback.tb_next)))
+            # Clean up environment after subTest crash
+            res['result'] = 'FAIL'
+            logging.info(self.result_to_string(res))
+            Subtest.result.append(res)
+            Subtest.failed += 1
+            if self._fatal:
+                raise
+        finally:
+            if self._cleanup:
+                self.clean()
+
+        return ret
+
+
+    def test(self):
+        """
+        Check if test is defined.
+
+        To make the test fatal, add the decorator @subtest_fatal before
+        the implementation of the test method.
+        """
+        raise NotImplementedError("Method test is not implemented.")
+
+
+    def clean(self):
+        """
+        Check if cleanup is defined.
+
+        To skip the cleanup, add the decorator @subtest_nocleanup before
+        the implementation of the test method.
+        """
+        raise NotImplementedError("Method cleanup is not implemented.")
+
+
+    def decored(self):
+        self._num_decored += 1
+
+
+    @classmethod
+    def has_failed(cls):
+        """
+        @return: True if any subtest did not pass.
+        """
+        if cls.failed > 0:
+            return True
+        else:
+            return False
+
+
+    @classmethod
+    def get_result(cls):
+        """
+        @return: Result of subtests.
+           Format:
[patch 06/12] [PATCH] kvm-s390-ucontrol: disable in-kernel irq stack

2011-12-09 Thread Carsten Otte
This patch disables the in-kernel interrupt stack for KVM virtual
machines that are controlled by userspace. Userspace has to take care
of handling interrupts on its own.

Signed-off-by: Carsten Otte 
---
Index: linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
===
--- linux-2.5-cecsim.orig/arch/s390/kvm/kvm-s390.c
+++ linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
@@ -511,7 +511,8 @@ static int __vcpu_run(struct kvm_vcpu *v
if (test_thread_flag(TIF_MCCK_PENDING))
s390_handle_mcck();
 
-   kvm_s390_deliver_pending_interrupts(vcpu);
+   if (!kvm_is_ucontrol(vcpu->kvm))
+   kvm_s390_deliver_pending_interrupts(vcpu);
 
vcpu->arch.sie_block->icptcode = 0;
local_irq_disable();



[patch 09/12] [PATCH] kvm-s390: fix assumption for KVM_MAX_VCPUS

2011-12-09 Thread Carsten Otte
This patch fixes the definition of the idle_mask and the local_int array
in kvm_s390_float_interrupt. The previous definition had a maximum of
64 cpus hardcoded instead of using KVM_MAX_VCPUS.

Signed-off-by: Carsten Otte 
---
Index: linux-2.5-cecsim/arch/s390/include/asm/kvm_host.h
===
--- linux-2.5-cecsim.orig/arch/s390/include/asm/kvm_host.h
+++ linux-2.5-cecsim/arch/s390/include/asm/kvm_host.h
@@ -220,8 +220,9 @@ struct kvm_s390_float_interrupt {
struct list_head list;
atomic_t active;
int next_rr_cpu;
-   unsigned long idle_mask [(64 + sizeof(long) - 1) / sizeof(long)];
-   struct kvm_s390_local_interrupt *local_int[64];
+   unsigned long idle_mask[(KVM_MAX_VCPUS + sizeof(long) - 1)
+   / sizeof(long)];
+   struct kvm_s390_local_interrupt *local_int[KVM_MAX_VCPUS];
 };
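
As a worked example of the array sizing: the expression in the patch rounds up by bytes per long, whereas one bit per vcpu only needs rounding up by bits per long (which is what the kernel's BITS_TO_LONGS() macro computes). The posted expression therefore yields an array that is larger than strictly needed, never too small. The sketch below compares the two; the function names are hypothetical and long_bytes stands in for sizeof(long).

```c
#include <stddef.h>

/* Element count as written in the patch:
 * (KVM_MAX_VCPUS + sizeof(long) - 1) / sizeof(long) */
size_t idle_mask_longs_as_posted(size_t max_vcpus, size_t long_bytes)
{
    return (max_vcpus + long_bytes - 1) / long_bytes;
}

/* Smallest element count that still provides one bit per vcpu. */
size_t idle_mask_longs_minimum(size_t max_vcpus, size_t long_bytes)
{
    size_t long_bits = long_bytes * 8;

    return (max_vcpus + long_bits - 1) / long_bits;
}
```

With 64 vcpus and 8-byte longs the posted expression gives 8 elements (512 bits), while a single element would be enough.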
 
 



[patch 04/12] [PATCH] kvm-s390-ucontrol: export SIE control block to user

2011-12-09 Thread Carsten Otte
This patch exports the s390 SIE hardware control block to userspace
via the mapping of the vcpu file descriptor. In order to do so,
a new arch callback named kvm_arch_vcpu_fault is introduced for all
architectures. It allows architecture-specific pages to be mapped.

Signed-off-by: Carsten Otte 
---
---
 Documentation/virtual/kvm/api.txt |5 +
 arch/ia64/kvm/kvm-ia64.c  |5 +
 arch/powerpc/kvm/powerpc.c|5 +
 arch/s390/kvm/kvm-s390.c  |   13 +
 arch/x86/kvm/x86.c|5 +
 include/linux/kvm.h   |1 +
 include/linux/kvm_host.h  |1 +
 virt/kvm/kvm_main.c   |2 +-
 8 files changed, 36 insertions(+), 1 deletion(-)

--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -218,6 +218,11 @@ allocation of vcpu ids.  For example, if
 single-threaded guest vcpus, it should make all vcpu ids be a multiple
 of the number of vcpus per vcore.
 
+For virtual cpus that have been created with S390 user controlled virtual
+machines, the resulting vcpu fd can be memory mapped at page offset
+KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of the virtual
+cpu's hardware control block.
+
 4.8 KVM_GET_DIRTY_LOG (vm ioctl)
 
 Capability: basic
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1566,6 +1566,11 @@ out:
return r;
 }
 
+int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
+{
+   return VM_FAULT_SIGBUS;
+}
+
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
struct kvm_memory_slot *memslot,
struct kvm_memory_slot old,
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -659,6 +659,11 @@ out:
return r;
 }
 
+int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
+{
+   return VM_FAULT_SIGBUS;
+}
+
 static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo *pvinfo)
 {
u32 inst_lis = 0x3c00;
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -769,6 +769,19 @@ long kvm_arch_vcpu_ioctl(struct file *fi
return r;
 }
 
+int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
+{
+#ifdef CONFIG_KVM_UCONTROL
+   if ((vmf->pgoff == KVM_S390_SIE_PAGE_OFFSET)
+&& (kvm_is_ucontrol(vcpu->kvm))) {
+   vmf->page = virt_to_page(vcpu->arch.sie_block);
+   get_page(vmf->page);
+   return 0;
+   }
+#endif
+   return VM_FAULT_SIGBUS;
+}
+
 /* Section: memory related */
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
   struct kvm_memory_slot *memslot,
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2790,6 +2790,11 @@ out:
return r;
 }
 
+int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
+{
+   return VM_FAULT_SIGBUS;
+}
+
 static int kvm_vm_ioctl_set_tss_addr(struct kvm *kvm, unsigned long addr)
 {
int ret;
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -439,6 +439,7 @@ struct kvm_ppc_pvinfo {
 
 #define KVM_VM_REGULAR  0
 #define KVM_VM_UCONTROL 1
+#define KVM_S390_SIE_PAGE_OFFSET 1
 
 /*
  * ioctls for /dev/kvm fds:
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -449,6 +449,7 @@ long kvm_arch_dev_ioctl(struct file *fil
unsigned int ioctl, unsigned long arg);
 long kvm_arch_vcpu_ioctl(struct file *filp,
 unsigned int ioctl, unsigned long arg);
+int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf);
 
 int kvm_dev_ioctl_check_extension(long ext);
 
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1657,7 +1657,7 @@ static int kvm_vcpu_fault(struct vm_area
page = virt_to_page(vcpu->kvm->coalesced_mmio_ring);
 #endif
else
-   return VM_FAULT_SIGBUS;
+   return kvm_arch_vcpu_fault(vcpu, vmf);
get_page(page);
vmf->page = page;
return 0;
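
From userspace, the new mapping would be reached with mmap() on the vcpu fd. The fault handler compares vmf->pgoff, which is counted in pages, so the byte offset passed to mmap() is KVM_S390_SIE_PAGE_OFFSET multiplied by the page size. A minimal sketch under that reading; the constant is copied from this patch and the helper names are hypothetical.

```c
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Value as posted in this series; an assumption for any other kernel. */
#define KVM_S390_SIE_PAGE_OFFSET 1

/* Byte offset into the vcpu fd at which the SIE control block appears. */
off_t sie_block_file_offset(long page_size)
{
    return (off_t)KVM_S390_SIE_PAGE_OFFSET * page_size;
}

/* Map one page of the SIE control block.
 * Returns MAP_FAILED on error, like mmap() itself. */
void *map_sie_block(int vcpu_fd)
{
    long page_size = sysconf(_SC_PAGESIZE);

    return mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                MAP_SHARED, vcpu_fd, sie_block_file_offset(page_size));
}
```

On a regular (non-ucontrol) VM the handler returns VM_FAULT_SIGBUS for this offset, so the failure would only show up when the mapped page is first touched, not at mmap() time.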



[patch 05/12] [PATCH] kvm-s390-ucontrol: disable in-kernel handling of SIE intercepts

2011-12-09 Thread Carsten Otte
This patch disables in-kernel handling of SIE intercepts for user
controlled virtual machines. All intercepts are passed to userspace
via the KVM_EXIT_S390_SIEIC exit reason, just like SIE intercepts that
cannot be handled in-kernel for regular KVM guests.

Signed-off-by: Carsten Otte 
---
---
 arch/s390/kvm/kvm-s390.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -572,7 +572,10 @@ rerun_vcpu:
rc = __vcpu_run(vcpu);
if (rc)
break;
-   rc = kvm_handle_sie_intercept(vcpu);
+   if (kvm_is_ucontrol(vcpu->kvm))
+   rc = -EOPNOTSUPP;
+   else
+   rc = kvm_handle_sie_intercept(vcpu);
} while (!signal_pending(current) && !rc);
 
if (rc == SIE_INTERCEPT_RERUNVCPU)



[patch 02/12] [PATCH] kvm-s390-ucontrol: per vcpu address spaces

2011-12-09 Thread Carsten Otte
This patch introduces two ioctls for virtual cpus that are only
valid for kernel virtual machines that are controlled by userspace.
Each virtual cpu has its individual address space in this mode of
operation, and each address space is backed by the gmap
implementation just like the address space for regular KVM guests.
KVM_S390_UCAS_MAP maps a part of the user's virtual address
space to the vcpu. Starting offset and length in both the user and
the vcpu address space need to be aligned to 1M.
KVM_S390_UCAS_UNMAP can be used to unmap a range of memory from a
virtual cpu in a similar way.

Signed-off-by: Carsten Otte 
---
---
 Documentation/virtual/kvm/api.txt |   38 
 arch/s390/kvm/kvm-s390.c  |   50 +-
 include/linux/kvm.h   |   10 +++
 3 files changed, 97 insertions(+), 1 deletion(-)

--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1455,6 +1455,44 @@ is supported; 2 if the processor require
 an RMA, or 1 if the processor can use an RMA but doesn't require it,
 because it supports the Virtual RMA (VRMA) facility.
 
+4.64 KVM_S390_UCAS_MAP
+
+Capability: KVM_CAP_UCONTROL
+Architectures: s390
+Type: vcpu ioctl
+Parameters: struct kvm_s390_ucas_mapping (in)
+Returns: 0 in case of success
+
+The parameter is defined like this:
+   struct kvm_s390_ucas_mapping {
+   __u64 user_addr;
+   __u64 vcpu_addr;
+   __u64 length;
+   };
+
+This ioctl maps the memory at "user_addr" with the length "length" to
+the vcpu's address space starting at "vcpu_addr". All parameters need to
+be alligned by 1 megabyte.
+
+4.65 KVM_S390_UCAS_UNMAP
+
+Capability: KVM_CAP_UCONTROL
+Architectures: s390
+Type: vcpu ioctl
+Parameters: struct kvm_s390_ucas_mapping (in)
+Returns: 0 in case of success
+
+The parameter is defined like this:
+   struct kvm_s390_ucas_mapping {
+   __u64 user_addr;
+   __u64 vcpu_addr;
+   __u64 length;
+   };
+
+This ioctl unmaps the memory in the vcpu's address space starting at
+"vcpu_addr" with the length "length". The field "user_addr" is ignored.
+All parameters need to be alligned by 1 megabyte.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -239,6 +239,10 @@ void kvm_arch_vcpu_destroy(struct kvm_vc
(__u64) vcpu->arch.sie_block)
vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda = 0;
smp_mb();
+
+   if (kvm_is_ucontrol(vcpu->kvm))
+   gmap_free(vcpu->arch.gmap);
+
free_page((unsigned long)(vcpu->arch.sie_block));
kvm_vcpu_uninit(vcpu);
kfree(vcpu);
@@ -269,12 +273,20 @@ void kvm_arch_destroy_vm(struct kvm *kvm
kvm_free_vcpus(kvm);
free_page((unsigned long)(kvm->arch.sca));
debug_unregister(kvm->arch.dbf);
-   gmap_free(kvm->arch.gmap);
+   if (!kvm_is_ucontrol(kvm))
+   gmap_free(kvm->arch.gmap);
 }
 
 /* Section: vcpu related */
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
+   if (kvm_is_ucontrol(vcpu->kvm)) {
+   vcpu->arch.gmap = gmap_alloc(current->mm);
+   if (!vcpu->arch.gmap)
+   return -ENOMEM;
+   return 0;
+   }
+
vcpu->arch.gmap = vcpu->kvm->arch.gmap;
return 0;
 }
@@ -693,6 +705,42 @@ long kvm_arch_vcpu_ioctl(struct file *fi
case KVM_S390_INITIAL_RESET:
r = kvm_arch_vcpu_ioctl_initial_reset(vcpu);
break;
+#ifdef CONFIG_KVM_UCONTROL
+   case KVM_S390_UCAS_MAP: {
+   struct kvm_s390_ucas_mapping ucasmap;
+
+   if (copy_from_user(&ucasmap, argp, sizeof(ucasmap))) {
+   r = -EFAULT;
+   break;
+   }
+
+   if (!kvm_is_ucontrol(vcpu->kvm)) {
+   r = -EINVAL;
+   break;
+   }
+
+   r = gmap_map_segment(vcpu->arch.gmap, ucasmap.user_addr,
+ucasmap.vcpu_addr, ucasmap.length);
+   break;
+   }
+   case KVM_S390_UCAS_UNMAP: {
+   struct kvm_s390_ucas_mapping ucasmap;
+
+   if (copy_from_user(&ucasmap, argp, sizeof(ucasmap))) {
+   r = -EFAULT;
+   break;
+   }
+
+   if (!kvm_is_ucontrol(vcpu->kvm)) {
+   r = -EINVAL;
+   break;
+   }
+
+   r = gmap_unmap_segment(vcpu->arch.gmap, ucasmap.vcpu_addr,
+   ucasmap.length);
+   break;
+   }
+#endif
default:
r = -EINVAL;
}
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -657,6 +657,16 @@ struct kvm_clock_data {
struct kvm_use

Re: [libvirt] (no subject)

2011-12-09 Thread Osier Yang

On 2011-12-07 17:16, Daniel P. Berrange wrote:

On Wed, Dec 07, 2011 at 08:21:06AM +0200, Sasha Levin wrote:

On Tue, 2011-12-06 at 14:38 +, Daniel P. Berrange wrote:

On Fri, Nov 11, 2011 at 07:56:58PM +0800, Osier Yang wrote:

   * KVM tool manages the network completely by itself (with DHCP support?),
 with no way to configure it except specifying the mode (user|tap|none).
 I have not tested it yet, but it probably needs an explicit script to
 set up the network rules (e.g. NAT) so the guest can reach the outside
 world. In any case, there is no way for libvirt to control the guest network.


If KVM tool supports TAP devices, can't we do whatever we like with
that just by passing in a configured TAP device from libvirt?


KVM tool currently creates and configures the TAP devices it uses; it
shouldn't be an issue to have it use a TAP fd passed to it either.

How does libvirt do it? Create a /dev/tapX on its own and pass the fd
to the hypervisor?


Yes, libvirt opens a /dev/tap device (or a macvtap device for VEPA
mode), adds it to the necessary bridge, and/or configures VEPA, etc.,
and then passes the FD to the hypervisor, with an ARGV parameter to
tell the HV which FD is being passed.
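
A minimal sketch of the hand-off described above, assuming the parent has already opened and configured the descriptor. The helper names and the "tap,fd=N" argument spelling are hypothetical; they are not an existing KVM tool option or libvirt function.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Clear FD_CLOEXEC so that an already-configured descriptor (for example
 * a TAP device) stays open across exec() of the hypervisor binary.
 * Returns 0 on success, -1 on error.
 */
static int keep_fd_across_exec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0 ? -1 : 0;
}

/*
 * Build the command-line argument that tells the child which descriptor
 * number it inherited. The "tap,fd=N" spelling is an assumption made
 * for illustration only.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int format_tap_fd_arg(char *buf, size_t len, int fd)
{
	int n = snprintf(buf, len, "tap,fd=%d", fd);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The parent would call both helpers before fork()/exec() and append the formatted string to the child's argv.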


   * The console connection is implemented by setting up ptys in libvirt: stdout/stderr
 of the kvm tool process is redirected to the master pty, and libvirt connects
 to the slave pty. This works fine now, but it might be better if kvm
 tool could provide more advanced console mechanisms, just like QEMU
 does?


This sounds good enough for now.


KVM tools does a redirection to a PTY, which at that point could be
redirected to anywhere the user wants.

What features might be interesting to do on top of that?


I presume that Osier is just comparing with the features QEMU has available
for chardevs config, which include

  - PTYs
  - UNIX sockets
  - TCP sockets
  - UDP sockets
  - FIFO pipe
  - Plain file (output only obviously, but useful for logging)


Yes, that's what I meant. :-)



libvirt doesn't specifically need any of them, but it can support those
options if they exist.


Yes, their absence won't block us; I just meant it would be great if they
were supported.




   * Not many ways exist yet for external apps or users to query guest
 information. But this might change soon, as KVM tool is growing
 quickly.


What sort of guest info are you thinking about ? The most immediate
pieces of info I can imagine we need are

  - Mapping between PIDs and  vCPU threads
  - Current balloon driver value


Those are pretty easily added using the IPC interface I've mentioned
above. For example, 'kvm balloon' and 'kvm stat' will return a lot of
info out of the balloon driver (including the memory stats VQ, which
afaik we're probably the only ones to actually implement, but I might be
wrong) :)


Ok, that sounds sufficient for the balloon info.

Regards,
Daniel




Re: [libvirt] (no subject)

2011-12-09 Thread Osier Yang

On 2011-12-07 14:21, Sasha Levin wrote:

On Tue, 2011-12-06 at 14:38 +, Daniel P. Berrange wrote:

On Fri, Nov 11, 2011 at 07:56:58PM +0800, Osier Yang wrote:

   * Lack of options for user configuration. For example, with "-vnc" there
 is no option for the user to configure the properties of the VNC
 server, such as the port. It hides these things and provides no way
 to query them either, which makes it hard for libvirt to add VNC
 support: VNC clients such as virt-manager and virt-viewer have no
 way to connect to the guest. Even vncviewer can't.


Being able to specify a VNC port of libvirt's choosing is pretty
much mandatory to be able to support that. In addition, being able
to specify the bind address is important to be able to control
security, e.g. to only bind to 127.0.0.1, or only to certain NICs
in a multi-NIC host.


I'll add that feature in the next couple of days.




   * KVM tool manages the network completely by itself (with DHCP support?),
 with no way to configure it except specifying the mode (user|tap|none).
 I have not tested it yet, but it probably needs an explicit script to
 set up the network rules (e.g. NAT) so the guest can reach the outside
 world. In any case, there is no way for libvirt to control the guest network.


If KVM tool supports TAP devices, can't we do whatever we like with
that just by passing in a configured TAP device from libvirt?


KVM tool currently creates and configures the TAP devices it uses; it
shouldn't be an issue to have it use a TAP fd passed to it either.

How does libvirt do it? Create a /dev/tapX on its own and pass the fd
to the hypervisor?



   * There is a gap in the domain status between KVM tool and libvirt,
 caused by KVM tool unlink()ing the guest socket when the user exits
 from the console (both text and graphic), while libvirt still thinks the
 guest is running.


Being able to reliably detect shutdown/exit of the KVM tool is
a very important task, and we can't rely on waitpid/SIGCHLD
because we want to daemonize all instances wrt libvirtd.

In the QEMU driver we keep open a socket to the monitor, and
when we see an I/O error / POLLHUP on the socket we know that
QEMU has quit.

What is this guest socket used for ? Could libvirt keep open a
connection to it ?


It's being used for communication with the IPC sub-commands (like 'kvm
list', 'kvm debug', etc). It's basically the server in a server-client
setup used to signal the hypervisor to do things.

There's also no problem with keeping an open connection to it.


I'll update the code to use it.




One other option would be to use inotify to watch for deletion
of the guest socket in the filesystem. This is sort of what we
do with the UML driver.


   * KVM tool uses $HOME/.kvm_tool as the state dir, with no way to configure it.
 I made a small patch against KVM tool to let it accept an environment
 variable, "KVM_STATE_DIR", which is used across the driver and lets
 the whole patch series work. See "[PATCH] kvm tools.". Generally we
 want the state dir of a driver to be "/var/run/libvirt/kvmtool/..."
 for the root user or "$HOME/.libvirt/kvmtool/run" for a non-root user.


What does it do with the state dir ?  Is that just for storing the
guest socket ?


afaik that patch should be already in as well.

It does two things in the state dir:

  - Store sockets.
  - KVM tools has a feature which lets a user boot a guest based on
virtio-9p which lets him see a system which is an exact copy of the
host. This makes testing of programs and sandboxing very easy. The state
files required for that are stored in that dir as well.







With QEMU we chose $HOME/.libvirt/qemu or /var/run/libvirt because
there was no policy set by QEMU itself.  If KVM tool has a policy
for where it stores its state, we should just use that, and not
try to force it into a libvirt specific location.

In a privileged libvirtd instance, we should aim to still have
kvmtool itself run as an unprivileged user / group, e.g. 'kvmtool:kvmtool'.
And we could set the home dir of that user to /var/lib/kvmtool


   * kvmtoolGetVersion is just broken now, as "./kvm version" returns
 something like "3.0.rc5.873.gb73216", whereas libvirt wants something
 like "2.6.40.6-0". This might not be a problem as long as KVM tool
 has an official package.


The version numbers libvirt reports for hypervisors are pretty
meaningless really. In that example you give I'd just report '3.0'
as the version from libvirt. Anything that relies on these versions
from libvirt is doomed to be broken anyway.


The version is just a 'git describe' of the kernel tree in which it was
built, so if you build KVM tools from an "official" kernel tree* you'll
also get pretty versions :)

* After KVM tools is merged...




   * The console connection is implemented by setting up ptys in libvirt: stdout/stderr
 of kvm tool process is redirected to the master pty, and libvirt connects
 to the slave pty. This 

Re: [libvirt] (no subject)

2011-12-09 Thread Osier Yang

On 2011-12-06 22:38, Daniel P. Berrange wrote:

On Fri, Nov 11, 2011 at 07:56:58PM +0800, Osier Yang wrote:

Hi, all

This is a basic implementation of libvirt Native Linux KVM
Tool driver. Note that this was made purely out of my own interest
and in my spare time; it's not an endorsement/effort by Red Hat,
and it isn't supported by Red Hat officially.

Basically, the driver is designed as *stateful*, as KVM tool
doesn't maintain any info about the guest except a socket used
for its own IPC. It's implemented using the KVM tool binary,
which is currently named "kvm", along with support for the cgroup
controllers "cpuacct" and "memory". As one of KVM tool's
principles is to let both root and non-root users play with it,
the driver is designed to support root and non-root too, just
like QEMU does. Examples of the connection URI:

 virsh -c kvmtool:///system
 virsh -c kvmtool:///session
 virsh -c kvmtool+unix:///system
 virsh -c kvmtool+unix:///session

The implementation currently supports roughly 15 virsh commands,
including basic domain cycle operations (define/undefine,
start/destroy, suspend/resume, console, setmem, schedinfo, dumpxml,
autostart, dominfo, etc.)

About the domain configuration:
   * "kernel": must be specified, as KVM tool currently only supports
  booting from a kernel (no integration with a BIOS app yet).

   * "disk": only virtio bus is supported, and device type must be 'disk'.

   * "serial/console": only one console is supported, of type serial or
  virtio (can be extended to support multiple consoles once kvm tool
  supports them; libvirt already supports multiple consoles, see upstream
  commit 0873b688c).

   * "p9fs": only support specifying the source dir, and mount tag, only
  type of 'mount' is supported.

   * "memballoon": only virtio is supported, and there is no way
  to config the addr.

   * Multiple "disk" and "p9fs" devices are supported.

   * Graphics and network are not supported, will explain below.

Please see "[PATCH 7/8]" for an example of the domain config. (which
contains all the XMLs supported by current implementation).

The problems of Native Linux KVM Tool from libvirt p.o.v:

   * Some distros package "qemu-kvm" as "kvm", and "kvm" is also a long
 established name for "KVM" itself, so naming the project
 "kvm" might not be a good idea. I assume it will be named
 "kvmtool" in this implementation; never mind this if you
 don't like it, it can be updated easily. :-)


Yeah, naming the binary 'kvm' is just madness. I'd strongly recommend
using 'kvmtool' as the binary name to avoid confusion with existing
'kvm' binaries based on QEMU.


   * It still doesn't have an official package yet, not even a "make install",
 which means we have no way to check the dependency at 'configure'
 time. I assume it will be installed as "/usr/bin/kvmtool"
 in this implementation. I guess this is the main reason that could
 prevent upstream libvirt from accepting the patches.


Ok, not really a problem - we do similar for the regular QEMU driver.


   * Lack of options for user configuration. For example, with "-vnc" there
 is no option for the user to configure the properties of the VNC
 server, such as the port. It hides these things and provides no way
 to query them either, which makes it hard for libvirt to add VNC
 support: VNC clients such as virt-manager and virt-viewer have no
 way to connect to the guest. Even vncviewer can't.


Being able to specify a VNC port of libvirt's choosing is pretty
much mandatory to be able to support that. In addition, being able
to specify the bind address is important to be able to control
security, e.g. to only bind to 127.0.0.1, or only to certain NICs
in a multi-NIC host.


   * KVM tool manages the network completely by itself (with DHCP support?),
 with no way to configure it except specifying the mode (user|tap|none).
 I have not tested it yet, but it probably needs an explicit script to
 set up the network rules (e.g. NAT) so the guest can reach the outside
 world. In any case, there is no way for libvirt to control the guest network.


If KVM tool supports TAP devices, can't we do whatever we like with
that just by passing in a configured TAP device from libvirt?


   * There is a gap in the domain status between KVM tool and libvirt,
 caused by KVM tool unlink()ing the guest socket when the user exits
 from the console (both text and graphic), while libvirt still thinks the
 guest is running.


Being able to reliably detect shutdown/exit of the KVM tool is
a very important task, and we can't rely on waitpid/SIGCHLD
because we want to daemonize all instances wrt libvirtd.

In the QEMU driver we keep open a socket to the monitor, and
when we see an I/O error / POLLHUP on the socket we know that
QEMU has quit.

What is this guest socket used for ? Could libvirt keep open a
connection to it ?

One other option would be to use inotify to watch for deletion
of the guest 

Re: [patch 10/12] [PATCH] kvm-s390: storage key interface

2011-12-09 Thread Heiko Carstens
On Fri, Dec 09, 2011 at 12:23:36PM +0100, Carsten Otte wrote:
> This patch introduces an interface to access the guest visible
> storage keys. It supports three operations that model the behavior
> that SSKE/ISKE/RRBE instructions would have if they were issued by
> the guest. These instructions are all documented in the z architecture
> principles of operation book.
> 
> Signed-off-by: Carsten Otte 

[...]

> +static long kvm_s390_keyop(struct kvm_s390_keyop *kop)
> +{
> + unsigned long addr = kop->user_addr;
> + pte_t *ptep;
> + pgste_t pgste;
> + int r;
> + unsigned long skey;
> + unsigned long bits;
> +
> + /* make sure this process is a hypervisor */
> + r = -EINVAL;
> + if (!mm_has_pgste(current->mm))
> + goto out;
> +
> + r = -ENXIO;
> + if (addr >= PGDIR_SIZE)
> + goto out;

imho this should be -EFAULT.

> + spin_lock(&current->mm->page_table_lock);
> + ptep = ptep_for_addr(addr);
> + if (!ptep)
> + goto out_unlock;

ptep is a pointer and may contain an error code, like you implemented it
below. Therefore you need to check for IS_ERR() here.
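
To illustrate why a plain NULL test is not enough here, the following is a userspace model of the kernel's error-pointer convention (error codes stored in the top 4095 values of the address range). The model_* names are stand-ins for ERR_PTR(), PTR_ERR() and IS_ERR(); this is a sketch, not the kernel's own header.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer (model of ERR_PTR()). */
static inline void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer (model of PTR_ERR()). */
static inline long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if ptr lies in the reserved error range (model of IS_ERR()). */
static inline int model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}
```

An error pointer is never NULL, so a check of the form "if (!ptep)" lets it through and the caller would go on to dereference it.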

> +static pmd_t *__pmdp_for_addr(struct mm_struct *mm, unsigned long addr)
> +{
> + struct vm_area_struct *vma;
> + pgd_t *pgd;
> + pud_t *pud;
> + pmd_t *pmd;
> +
> + vma = find_vma(mm, addr);
> + if (!vma)
> + return ERR_PTR(-EINVAL);

-EFAULT imho.

Also, why is this check good enough? As far as I remember find_vma() only
guarantees that addr < vma->vm_end (if vma != NULL), but it does not guarantee
that addr >= vma->vm_start.

> - vma = find_vma(mm, vmaddr);
> - if (!vma || vma->vm_start > vmaddr)
> - return -EFAULT;

... you used to check for that and also used the proper return code, btw.
Or is there a different reason why the above code is correct?
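
The find_vma() point can be shown with a small userspace model. The struct and function names below are hypothetical and only mimic the lookup rule discussed above (return the first area whose end lies above the address); they are not the kernel's data structures.

```c
#include <stddef.h>

/* One mapped area covering [start, end). */
struct model_vma {
	unsigned long start;
	unsigned long end;
};

/*
 * Model of the find_vma() lookup rule: return the first area (sorted by
 * address) with addr < end, or NULL if there is none. The returned area
 * may begin above addr, i.e. addr can fall into a hole before it.
 */
static const struct model_vma *
model_find_vma(const struct model_vma *areas, size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr < areas[i].end)
			return &areas[i];
	return NULL;
}

/* The complete check: the area must also start at or below addr. */
static int model_addr_is_mapped(const struct model_vma *areas, size_t n,
				unsigned long addr)
{
	const struct model_vma *vma = model_find_vma(areas, n, addr);

	return vma != NULL && vma->start <= addr;
}
```

With areas [0x1000, 0x2000) and [0x5000, 0x6000), a lookup of 0x3000 returns the second area even though the address is unmapped, which is the case the removed "vma->vm_start > vmaddr" test used to catch.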

> +pte_t *ptep_for_addr(unsigned long addr)
> +{
> + pmd_t *pmd;
> + pte_t *rc;

Would you mind renaming rc into pte?

> +
> + down_read(&current->mm->mmap_sem);
> +
> + pmd = __pmdp_for_addr(current->mm, addr);
> + if (IS_ERR(pmd)) {
> + rc = (pte_t *)pmd;
> + goto up_out;
> + }
> +
> + rc = pte_offset(pmd, addr);
> +up_out:
> + up_read(&current->mm->mmap_sem);
> + return rc;



Re: [PATCH V2 00/23] kvm tools: Prepare kvmtool for another architecture

2011-12-09 Thread Pekka Enberg
On Fri, 2011-12-09 at 17:52 +1100, Matt Evans wrote:
> This patch series rearranges and tidies various parts of kvmtool to pave the way
> for the addition of support for another architecture -- SPAPR PPC64.  A second
> patch series will follow to present the PPC64 support.

I applied most of the patches. I left out the ones that Sasha and myself
still had comments on.

Thanks Matt for cleaning up our nasty x86ism! :-)

Pekka



Re: [patch 04/12] [PATCH] kvm-s390-ucontrol: export SIE control block to user

2011-12-09 Thread Carsten Otte

On 09.12.2011 12:37, Alexander Graf wrote:

>> +#define KVM_S390_SIE_PAGE_OFFSET 1
>
> Can we please make these a global number space? I don't want to have any user
> space code "accidentally" call this mmap while it's trying to find out if it can
> map the PIO page. We already have a global number space for CAPs.

I consider a vma with holes in it pretty ugly, and we've got the
mechanism for userspace to correctly find out where to find what.

I do prefer a contiguous vma and the existing capabilities/queries.
Avi?



Re: [patch 01/12] [PATCH] kvm-s390: add parameter for KVM_CREATE_VM

2011-12-09 Thread Carsten Otte

On 09.12.2011 12:32, Alexander Graf wrote:

>> +KVM_CAP_UCONTROL
>
> KVM_S390_CAP_UCONTROL

I'm happy either way. It seemed to me that the discussion between Avi
and Sasha for V2 of the patch series on this naming has concluded to
KVM_CAP_UCONTROL/KVM_VM_UCONTROL without _S390 in it.

> KVM_ENABLE_CAP(KVM_S390_CAP_UCONTROL)? It doesn't look like you can't switch
> from kernel-controlled to user controlled mode during runtime. All you need to
> do is remove the gmap again and you should be fine, no?

This was the case via an ioctl KVM_S390_ENABLE_UCONTROL in version 1.
Avi pointed out some possible race conditions with that, and recommended
to switch it via KVM_CREATE_VM. I'm happy either way, just let me know
what's preferred.

> We do something similar on PPC where we just call ENABLE_CAP to switch to PAPR
> mode. If otherwise too difficult you can for example also define that the
> ENABLE_CAP has to happen before your first VCPU_RUN.

Code looks to me like you do ENABLE_CAP per vcpu on ppc (chapter 4.37
of api.txt agrees with that). We need something per VM, we cannot switch
individual CPUs between ucontrol/regular because with ucontrol the VM
does not have a common address space.




Re: [PATCH] kvm tools: Fix serial port probing

2011-12-09 Thread Pekka Enberg
On Fri, Dec 9, 2011 at 1:16 PM, Sasha Levin  wrote:
> The process of probing the 8250 serial port is as follows:
>
> 1. Start detecting IRQs
> 2. Enable the IER register [At this point, the port is supposed to light
> the INTR].
> 3. Stop detecting IRQs [At this point, the driver detects which IRQ belongs
> to that port].
> 4. Disable IER register.
>
> Since we weren't enabling and disabling the IRQ based on IER writes,
> we would often fail the probing since the driver couldn't detect
> which IRQ is used by the port, and would just default that to 0.
>
> This would cause slowness and may have caused hangs. For me there is a
> significant increase in speed of the terminal after this patch.
>
> Signed-off-by: Sasha Levin 
> ---
>  tools/kvm/hw/serial.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/tools/kvm/hw/serial.c b/tools/kvm/hw/serial.c
> index 67a2007..8cd47cf 100644
> --- a/tools/kvm/hw/serial.c
> +++ b/tools/kvm/hw/serial.c
> @@ -233,6 +233,7 @@ static bool serial8250_out(struct ioport *ioport, struct kvm *kvm, u16 port, voi
>                        break;
>                case UART_IER:
>                        dev->ier        = ioport__read8(data) & 0x3f;
> +                       kvm__irq_line(kvm, dev->irq, dev->ier?1:0);
>                        break;
>                case UART_LCR:
>                        dev->lcr        = ioport__read8(data);

Applied, thanks!

Ingo, does this fix the occasional slowdowns you are seeing?
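
For reference, the behaviour the one-line fix adds can be modelled in isolation. The sketch below is a simplified userspace stand-in: struct model_uart and model_write_ier() are hypothetical names, and irq_level merely represents the line that the patch drives through kvm__irq_line().

```c
/* Simplified device state: the IER register and the current IRQ line level. */
struct model_uart {
	unsigned char ier;
	int irq_level;
};

/*
 * Model of a guest write to UART_IER after the fix: the register keeps
 * only its low six bits (mask 0x3f as in the patch), and the interrupt
 * line follows whether any enable bit is set.
 */
static void model_write_ier(struct model_uart *uart, unsigned char value)
{
	uart->ier = value & 0x3f;
	uart->irq_level = uart->ier ? 1 : 0;
}
```

During probing the driver enables IER (line goes high, so the IRQ is detected) and then disables it again (line goes low), which matches steps 2 and 4 of the sequence quoted above.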


Re: [patch 04/12] [PATCH] kvm-s390-ucontrol: export SIE control block to user

2011-12-09 Thread Alexander Graf

On 09.12.2011, at 12:23, Carsten Otte wrote:

> This patch exports the s390 SIE hardware control block to userspace
> via the mapping of the vcpu file descriptor. In order to do so,
> a new arch callback named kvm_arch_vcpu_fault  is introduced for all
> architectures. It allows to map architecture specific pages.
> 
> Signed-off-by: Carsten Otte 
> ---
> ---
> Documentation/virtual/kvm/api.txt |5 +
> arch/ia64/kvm/kvm-ia64.c  |5 +
> arch/powerpc/kvm/powerpc.c|5 +
> arch/s390/kvm/kvm-s390.c  |   13 +
> arch/x86/kvm/x86.c|5 +
> include/linux/kvm.h   |1 +
> include/linux/kvm_host.h  |1 +
> virt/kvm/kvm_main.c   |2 +-
> 8 files changed, 36 insertions(+), 1 deletion(-)
> 
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -218,6 +218,11 @@ allocation of vcpu ids.  For example, if
> single-threaded guest vcpus, it should make all vcpu ids be a multiple
> of the number of vcpus per vcore.
> 
> +For virtual cpus that have been created with S390 user controlled virtual
> +machines, the resulting vcpu fd can be memory mapped at page offset
> +KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of the virtual
> +cpu's hardware control block.
> +
> 4.8 KVM_GET_DIRTY_LOG (vm ioctl)
> 
> Capability: basic
> --- a/arch/ia64/kvm/kvm-ia64.c
> +++ b/arch/ia64/kvm/kvm-ia64.c
> @@ -1566,6 +1566,11 @@ out:
>   return r;
> }
> 
> +int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
> +{
> + return VM_FAULT_SIGBUS;
> +}
> +
> int kvm_arch_prepare_memory_region(struct kvm *kvm,
>   struct kvm_memory_slot *memslot,
>   struct kvm_memory_slot old,
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -659,6 +659,11 @@ out:
>   return r;
> }
> 
> +int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
> +{
> + return VM_FAULT_SIGBUS;
> +}
> +
> static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo *pvinfo)
> {
>   u32 inst_lis = 0x3c00;
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -769,6 +769,19 @@ long kvm_arch_vcpu_ioctl(struct file *fi
>   return r;
> }
> 
> +int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
> +{
> +#ifdef CONFIG_KVM_UCONTROL
> + if ((vmf->pgoff == KVM_S390_SIE_PAGE_OFFSET)
> +  && (kvm_is_ucontrol(vcpu->kvm))) {
> + vmf->page = virt_to_page(vcpu->arch.sie_block);
> + get_page(vmf->page);
> + return 0;
> + }
> +#endif
> + return VM_FAULT_SIGBUS;
> +}
> +
> /* Section: memory related */
> int kvm_arch_prepare_memory_region(struct kvm *kvm,
>  struct kvm_memory_slot *memslot,
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2790,6 +2790,11 @@ out:
>   return r;
> }
> 
> +int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
> +{
> + return VM_FAULT_SIGBUS;
> +}
> +
> static int kvm_vm_ioctl_set_tss_addr(struct kvm *kvm, unsigned long addr)
> {
>   int ret;
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -439,6 +439,7 @@ struct kvm_ppc_pvinfo {
> 
> #define KVM_VM_REGULAR  0
> #define KVM_VM_UCONTROL   1
> +#define KVM_S390_SIE_PAGE_OFFSET 1

Can we please make these a global number space? I don't want to have any user 
space code "accidentally" call this mmap while it's trying to find out if it can 
map the PIO page. We already have a global number space for CAPs.


Alex



Re: [patch 01/12] [PATCH] kvm-s390: add parameter for KVM_CREATE_VM

2011-12-09 Thread Alexander Graf

On 09.12.2011, at 12:23, Carsten Otte wrote:

> This patch introduces a new config option for user controlled kernel
> virtual machines. It introduces an optional parameter to
> KVM_CREATE_VM in order to create a user controlled virtual machine.
> The parameter is passed to kvm_arch_init_vm for all architectures.
> Valid values for the new parameter are KVM_VM_REGULAR (defined to 0
> for backward compatibility to old KVM_CREATE_VM) and
> KVM_VM_UCONTROL for s390 only.
> Note that the user controlled virtual machines require CAP_SYS_ADMIN
> privileges.
> 
> Signed-off-by: Carsten Otte 
> ---
> ---
> Documentation/virtual/kvm/api.txt |7 ++-
> arch/ia64/kvm/kvm-ia64.c  |5 -
> arch/powerpc/kvm/powerpc.c|5 -
> arch/s390/kvm/Kconfig |9 +
> arch/s390/kvm/kvm-s390.c  |   30 +-
> arch/s390/kvm/kvm-s390.h  |   10 ++
> arch/x86/kvm/x86.c|5 -
> include/linux/kvm.h   |3 +++
> include/linux/kvm_host.h  |2 +-
> virt/kvm/kvm_main.c   |   19 +--
> 10 files changed, 79 insertions(+), 16 deletions(-)
> 
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -95,7 +95,7 @@ described as 'basic' will be available.
> Capability: basic
> Architectures: all
> Type: system ioctl
> -Parameters: none
> +Parameters: machine type identifier (KVM_VM_*)
> Returns: a VM fd that can be used to control the new virtual machine.
> 
> The new VM has no virtual cpus and no memory.  An mmap() of a VM fd
> @@ -103,6 +103,11 @@ will access the virtual machine's physic
> corresponds to guest physical address zero.  Use of mmap() on a VM fd
> is discouraged if userspace memory allocation (KVM_CAP_USER_MEMORY) is
> available.
> +You most certainly want to use KVM_VM_REGULAR as machine type.
> +
> +In order to create user controlled virtual machines on S390, check
> +KVM_CAP_UCONTROL

KVM_S390_CAP_UCONTROL

> and use KVM_VM_UCONTROL as machine type as
> +privileged user (CAP_SYS_ADMIN).

Why not

KVM_ENABLE_CAP(KVM_S390_CAP_UCONTROL)? It doesn't look like you can't switch 
from kernel-controlled to user controlled mode during runtime. All you need to 
do is remove the gmap again and you should be fine, no?

We do something similar on PPC where we just call ENABLE_CAP to switch to PAPR 
mode. If otherwise too difficult you can for example also define that the 
ENABLE_CAP has to happen before your first VCPU_RUN.

Alex



[patch 10/12] [PATCH] kvm-s390: storage key interface

2011-12-09 Thread Carsten Otte
This patch introduces an interface to access the guest visible
storage keys. It supports three operations that model the behavior
that SSKE/ISKE/RRBE instructions would have if they were issued by
the guest. These instructions are all documented in the z architecture
principles of operation book.
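
As a rough illustration of the three operations, here is a userspace model that acts on a single key byte. It is a sketch under assumptions: model_keyop() is hypothetical, it ignores the page table and pgste handling that the real patch performs, and for RRBE it returns the whole previous key although only the reference bit of the returned value is specified in the documentation. The operation codes are the ones defined in this patch; the reference bit position (0x04) follows the z/Architecture storage key layout.

```c
#include <stdint.h>

/* Operation codes as defined in this patch. */
#define KVM_S390_KEYOP_SSKE 0x01
#define KVM_S390_KEYOP_ISKE 0x02
#define KVM_S390_KEYOP_RRBE 0x03

/* Reference bit within a storage key byte (ACC 0xf0, F 0x08, R 0x04, C 0x02). */
#define MODEL_KEY_REFERENCE 0x04

/*
 * Apply one key operation to the modelled guest-visible key.
 * guest_key: the key as the guest would see it
 * key:       the "key" field of the ioctl parameter (in for SSKE, out otherwise)
 * Returns 0 on success, -1 for an unknown operation.
 */
static int model_keyop(uint8_t *guest_key, uint8_t *key, uint8_t operation)
{
	switch (operation) {
	case KVM_S390_KEYOP_SSKE:
		*guest_key = *key;		/* set the guest visible key */
		return 0;
	case KVM_S390_KEYOP_ISKE:
		*key = *guest_key;		/* read the guest visible key */
		return 0;
	case KVM_S390_KEYOP_RRBE:
		*key = *guest_key;		/* report the state before the reset */
		*guest_key &= (uint8_t)~MODEL_KEY_REFERENCE;
		return 0;
	default:
		return -1;
	}
}
```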

Signed-off-by: Carsten Otte 
---
---
 Documentation/virtual/kvm/api.txt |   38 +
 arch/s390/include/asm/kvm_host.h  |4 +
 arch/s390/include/asm/pgtable.h   |1 
 arch/s390/kvm/kvm-s390.c  |  106 --
 arch/s390/mm/pgtable.c|   70 ++---
 include/linux/kvm.h   |7 ++
 6 files changed, 205 insertions(+), 21 deletions(-)

--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1514,6 +1514,44 @@ table upfront. This is useful to handle
 controlled virtual machines to fault in the virtual cpu's lowcore pages
 prior to calling the KVM_RUN ioctl.
 
+4.67 KVM_S390_KEYOP
+
+Capability: KVM_CAP_UCONTROL
+Architectures: s390
+Type: vm ioctl
+Parameters: struct kvm_s390_keyop (in+out)
+Returns: 0 in case of success
+
+The parameter looks like this:
+   struct kvm_s390_keyop {
+   __u64 user_addr;
+   __u8  key;
+   __u8  operation;
+   };
+
+user_addr  contains the userspace address of a memory page
+key        contains the guest visible storage key as defined by the
+           z Architecture Principles of Operation book, including key
+           value for key controlled storage protection, the fetch
+           protection bit, and the reference and change indicator bits
+operation  indicates the key operation that should be performed
+
+The following operations are supported:
+KVM_S390_KEYOP_SSKE:
+   This operation behaves just like the set storage key extended (SSKE)
+   instruction would, if it were issued by the guest. The storage key
+   provided in "key" is placed in the guest visible storage key.
+KVM_S390_KEYOP_ISKE:
+   This operation behaves just like the insert storage key extended (ISKE)
+   instruction would, if it were issued by the guest. After this call,
+   the guest visible storage key is presented in the "key" field.
+KVM_S390_KEYOP_RRBE:
+   This operation behaves just like the reset referenced bit extended
+   (RRBE) instruction would, if it were issued by the guest. The guest
+   visible reference bit is cleared, and the value presented in the "key"
+   field after this call has the reference bit set to 1 in case the
+   guest view of the reference bit was 1 prior to this call.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -24,6 +24,10 @@
 /* memory slots that does not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 4
 
+#define KVM_S390_KEYOP_SSKE 0x01
+#define KVM_S390_KEYOP_ISKE 0x02
+#define KVM_S390_KEYOP_RRBE 0x03
+
 struct sca_entry {
atomic_t scn;
__u32   reserved;
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1254,6 +1254,7 @@ static inline pte_t mk_swap_pte(unsigned
 extern int vmem_add_mapping(unsigned long start, unsigned long size);
 extern int vmem_remove_mapping(unsigned long start, unsigned long size);
 extern int s390_enable_sie(void);
+extern pte_t *ptep_for_addr(unsigned long addr);
 
 /*
  * No page table caches to initialise
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -112,13 +112,113 @@ void kvm_arch_exit(void)
 {
 }
 
+static long kvm_s390_keyop(struct kvm_s390_keyop *kop)
+{
+   unsigned long addr = kop->user_addr;
+   pte_t *ptep;
+   pgste_t pgste;
+   int r;
+   unsigned long skey;
+   unsigned long bits;
+
+   /* make sure this process is a hypervisor */
+   r = -EINVAL;
+   if (!mm_has_pgste(current->mm))
+   goto out;
+
+   r = -ENXIO;
+   if (addr >= PGDIR_SIZE)
+   goto out;
+
+   spin_lock(&current->mm->page_table_lock);
+   ptep = ptep_for_addr(addr);
+   if (!ptep)
+   goto out_unlock;
+
+   pgste = pgste_get_lock(ptep);
+
+   switch (kop->operation) {
+   case KVM_S390_KEYOP_SSKE:
+   pgste = pgste_update_all(ptep, pgste);
+   /* set the real key back w/o rc bits */
+   skey = kop->key & (_PAGE_ACC_BITS | _PAGE_FP_BIT);
+   if (pte_present(*ptep)) {
+   page_set_storage_key(pte_val(*ptep), skey, 1);
+   /* avoid race clobbering changed bit */
+   pte_val(*ptep) |= _PAGE_SWC;
+   }
+   /* put acc+f plus guest referenced and changed into the pgste */
+   pgste_val(pgste) &= ~(RCP_ACC_BITS | RCP_FP_BIT | RCP_GR_BIT
+| RCP_GC_BIT);
+   bits

[patch 06/12] [PATCH] kvm-s390-ucontrol: disable in-kernel irq stack

2011-12-09 Thread Carsten Otte
This patch disables the in-kernel interrupt stack for KVM virtual
machines that are controlled by user. Userspace has to take care
of handling interrupts on its own.

Signed-off-by: Carsten Otte 
---
Index: linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
===
--- linux-2.5-cecsim.orig/arch/s390/kvm/kvm-s390.c
+++ linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
@@ -511,7 +511,8 @@ static int __vcpu_run(struct kvm_vcpu *v
if (test_thread_flag(TIF_MCCK_PENDING))
s390_handle_mcck();
 
-   kvm_s390_deliver_pending_interrupts(vcpu);
+   if (!kvm_is_ucontrol(vcpu->kvm))
+   kvm_s390_deliver_pending_interrupts(vcpu);
 
vcpu->arch.sie_block->icptcode = 0;
local_irq_disable();



[patch 09/12] [PATCH] kvm-s390: fix assumption for KVM_MAX_VCPUS

2011-12-09 Thread Carsten Otte
This patch fixes definition of the idle_mask and the local_int array
in kvm_s390_float_interrupt. Previous definition had 64 cpus max
hardcoded instead of using KVM_MAX_VCPUS.

Signed-off-by: Carsten Otte 
---
Index: linux-2.5-cecsim/arch/s390/include/asm/kvm_host.h
===
--- linux-2.5-cecsim.orig/arch/s390/include/asm/kvm_host.h
+++ linux-2.5-cecsim/arch/s390/include/asm/kvm_host.h
@@ -220,8 +220,9 @@ struct kvm_s390_float_interrupt {
struct list_head list;
atomic_t active;
int next_rr_cpu;
-   unsigned long idle_mask [(64 + sizeof(long) - 1) / sizeof(long)];
-   struct kvm_s390_local_interrupt *local_int[64];
+   unsigned long idle_mask[(KVM_MAX_VCPUS + sizeof(long) - 1)
+   / sizeof(long)];
+   struct kvm_s390_local_interrupt *local_int[KVM_MAX_VCPUS];
 };
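
The sizing arithmetic in the hunk above can be checked with a small userspace sketch. The helper names are illustrative and not kernel API. The expression in the patch rounds up in units of sizeof(long), which counts bytes, while a bitmap only needs one long per CHAR_BIT * sizeof(long) bits. The sketch suggests the array in the patch is never too small, only larger than strictly needed.

```c
#include <limits.h>
#include <stddef.h>

/* Expression as written in the patch: rounds up in units of sizeof(long). */
static size_t longs_as_in_patch(size_t nbits)
{
    return (nbits + sizeof(long) - 1) / sizeof(long);
}

/* Bit-based variant for comparison: one long holds CHAR_BIT * sizeof(long) bits. */
static size_t longs_for_bits(size_t nbits)
{
    size_t bits_per_long = CHAR_BIT * sizeof(long);

    return (nbits + bits_per_long - 1) / bits_per_long;
}
```

For any vcpu count, longs_as_in_patch() returns at least as many longs as longs_for_bits().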
 
 



[patch 07/12] [PATCH] kvm-s390-ucontrol: interface to inject faults on a vcpu page table

2011-12-09 Thread Carsten Otte
This patch allows the user to fault in pages on a virtual cpu's
address space for user controlled virtual machines. Typically this
is superfluous because userspace can just create a mapping and
let the kernel's page fault logic take care of it. There is one
exception: SIE won't start if the lowcore is not present. Normally
the kernel takes care of this [handle_validity() in
arch/s390/kvm/intercept.c] but since the kernel does not handle
intercepts for user controlled virtual machines, userspace needs to
be able to handle this condition.

Signed-off-by: Carsten Otte 
---
---
 Documentation/virtual/kvm/api.txt |   16 
 arch/s390/kvm/kvm-s390.c  |6 ++
 include/linux/kvm.h   |1 +
 3 files changed, 23 insertions(+)

--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1498,6 +1498,22 @@ This ioctl unmaps the memory in the vcpu
 "vcpu_addr" with the length "length". The field "user_addr" is ignored.
 All parameters need to be aligned to 1 megabyte.
 
+4.66 KVM_S390_VCPU_FAULT
+
+Capability: KVM_CAP_UCONTROL
+Architectures: s390
+Type: vcpu ioctl
+Parameters: vcpu absolute address (in)
+Returns: 0 in case of success
+
+This call creates a page table entry on the virtual cpu's address space
+(for user controlled virtual machines) or the virtual machine's address
+space (for regular virtual machines). This only works for minor faults,
+thus it's recommended to access subject memory page via the user page
+table upfront. This is useful to handle validity intercepts for user
+controlled virtual machines to fault in the virtual cpu's lowcore pages
+prior to calling the KVM_RUN ioctl.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -767,6 +767,12 @@ long kvm_arch_vcpu_ioctl(struct file *fi
break;
}
 #endif
+   case KVM_S390_VCPU_FAULT: {
+   r = gmap_fault(arg, vcpu->arch.gmap);
+   if (!IS_ERR_VALUE(r))
+   r = 0;
+   break;
+   }
default:
r = -EINVAL;
}
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -673,6 +673,7 @@ struct kvm_s390_ucas_mapping {
 };
 #define KVM_S390_UCAS_MAP    _IOW(KVMIO, 0x50, struct kvm_s390_ucas_mapping)
 #define KVM_S390_UCAS_UNMAP  _IOW(KVMIO, 0x51, struct kvm_s390_ucas_mapping)
+#define KVM_S390_VCPU_FAULT  _IOW(KVMIO, 0x52, unsigned long)
 
 /* Device model IOC */
 #define KVM_CREATE_IRQCHIP   _IO(KVMIO,   0x60)
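
From userspace, the KVM_S390_VCPU_FAULT call documented in this patch might be used roughly as follows. This is a minimal sketch: the SKETCH_ macros mirror the definition in the hunk above rather than including the real header, sketch_vcpu_fault() is a hypothetical helper, and the call is only meaningful on an s390 host with a vcpu fd of a user controlled VM.

```c
#include <sys/ioctl.h>

/* ioctl type byte used by include/linux/kvm.h (KVMIO). */
#define SKETCH_KVMIO 0xAE
/* Mirrors the KVM_S390_VCPU_FAULT definition added by this patch. */
#define SKETCH_KVM_S390_VCPU_FAULT _IOW(SKETCH_KVMIO, 0x52, unsigned long)

/*
 * Ask the kernel to fault in one page of the vcpu address space.
 * The address is passed by value, matching gmap_fault(arg, ...) in the patch.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int sketch_vcpu_fault(int vcpu_fd, unsigned long vcpu_addr)
{
    return ioctl(vcpu_fd, SKETCH_KVM_S390_VCPU_FAULT, vcpu_addr);
}
```

A caller would typically touch the backing page through its own mapping first, since the patch text says only minor faults are handled.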



[patch 01/12] [PATCH] kvm-s390: add parameter for KVM_CREATE_VM

2011-12-09 Thread Carsten Otte
This patch introduces a new config option for user controlled kernel
virtual machines. It introduces an optional parameter to
KVM_CREATE_VM in order to create a user controlled virtual machine.
The parameter is passed to kvm_arch_init_vm for all architectures.
Valid values for the new parameter are KVM_VM_REGULAR (defined to 0
for backward compatibility to old KVM_CREATE_VM) and
KVM_VM_UCONTROL for s390 only.
Note that the user controlled virtual machines require CAP_SYS_ADMIN
privileges.

Signed-off-by: Carsten Otte 
---
---
 Documentation/virtual/kvm/api.txt |7 ++-
 arch/ia64/kvm/kvm-ia64.c  |5 -
 arch/powerpc/kvm/powerpc.c|5 -
 arch/s390/kvm/Kconfig |9 +
 arch/s390/kvm/kvm-s390.c  |   30 +-
 arch/s390/kvm/kvm-s390.h  |   10 ++
 arch/x86/kvm/x86.c|5 -
 include/linux/kvm.h   |3 +++
 include/linux/kvm_host.h  |2 +-
 virt/kvm/kvm_main.c   |   19 +--
 10 files changed, 79 insertions(+), 16 deletions(-)

--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -95,7 +95,7 @@ described as 'basic' will be available.
 Capability: basic
 Architectures: all
 Type: system ioctl
-Parameters: none
+Parameters: machine type identifier (KVM_VM_*)
 Returns: a VM fd that can be used to control the new virtual machine.
 
 The new VM has no virtual cpus and no memory.  An mmap() of a VM fd
@@ -103,6 +103,11 @@ will access the virtual machine's physic
 corresponds to guest physical address zero.  Use of mmap() on a VM fd
 is discouraged if userspace memory allocation (KVM_CAP_USER_MEMORY) is
 available.
+You most certainly want to use KVM_VM_REGULAR as machine type.
+
+In order to create user controlled virtual machines on S390, check
+KVM_CAP_UCONTROL and use KVM_VM_UCONTROL as machine type as
+privileged user (CAP_SYS_ADMIN).
 
 4.3 KVM_GET_MSR_INDEX_LIST
 
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -809,10 +809,13 @@ static void kvm_build_io_pmt(struct kvm
 #define GUEST_PHYSICAL_RR4 0x2739
 #define VMM_INIT_RR 0x1660
 
-int kvm_arch_init_vm(struct kvm *kvm)
+int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
BUG_ON(!kvm);
 
+   if (type != KVM_VM_REGULAR)
+   return -EINVAL;
+
kvm->arch.is_sn2 = ia64_platform_is("sn2");
 
kvm->arch.metaphysical_rr0 = GUEST_PHYSICAL_RR0;
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -171,8 +171,11 @@ void kvm_arch_check_processor_compat(voi
*(int *)rtn = kvmppc_core_check_processor_compat();
 }
 
-int kvm_arch_init_vm(struct kvm *kvm)
+int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
+   if (type != KVM_VM_REGULAR)
+   return -EINVAL;
+
return kvmppc_core_init_vm(kvm);
 }
 
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -34,6 +34,15 @@ config KVM
 
  If unsure, say N.
 
+config KVM_UCONTROL
+   bool "Userspace controlled virtual machines"
+   depends on KVM
+   ---help---
+ Allow CAP_SYS_ADMIN users to create KVM virtual machines that are
+ controlled by userspace.
+
+ If unsure, say N.
+
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
 source drivers/vhost/Kconfig
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -171,11 +171,28 @@ long kvm_arch_vm_ioctl(struct file *filp
return r;
 }
 
-int kvm_arch_init_vm(struct kvm *kvm)
+int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
int rc;
char debug_name[16];
 
+   rc = -EINVAL;
+#ifdef CONFIG_KVM_UCONTROL
+   switch (type) {
+   case KVM_VM_REGULAR:
+   break;
+   case KVM_VM_UCONTROL:
+   if (!capable(CAP_SYS_ADMIN))
+   goto out_err;
+   break;
+   default:
+   goto out_err;
+   }
+#else
+   if (type != KVM_VM_REGULAR)
+   goto out_err;
+#endif
+
rc = s390_enable_sie();
if (rc)
goto out_err;
@@ -198,10 +215,13 @@ int kvm_arch_init_vm(struct kvm *kvm)
debug_register_view(kvm->arch.dbf, &debug_sprintf_view);
VM_EVENT(kvm, 3, "%s", "vm created");
 
-   kvm->arch.gmap = gmap_alloc(current->mm);
-   if (!kvm->arch.gmap)
-   goto out_nogmap;
-
+   if (type == KVM_VM_REGULAR) {
+   kvm->arch.gmap = gmap_alloc(current->mm);
+   if (!kvm->arch.gmap)
+   goto out_nogmap;
+   } else {
+   kvm->arch.gmap = NULL;
+   }
return 0;
 out_nogmap:
debug_unregister(kvm->arch.dbf);
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -47,6 +47,16 @@ static inline int __cpu_is_stopped(struc
return atomic_read(&vcpu->arch.sie_block->cpuflags) & CPUSTAT_STOP
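
The machine type check that this patch adds to the s390 kvm_arch_init_vm() can be modelled in plain C. This is a sketch under assumptions: sketch_check_vm_type() is a hypothetical helper, and the two boolean parameters stand in for CONFIG_KVM_UCONTROL and capable(CAP_SYS_ADMIN).

```c
#include <errno.h>
#include <stdbool.h>

/* Values taken from the include/linux/kvm.h hunks in this series. */
#define SKETCH_VM_REGULAR  0UL
#define SKETCH_VM_UCONTROL 1UL

/*
 * Returns 0 if the requested machine type would be accepted, -EINVAL otherwise.
 * As in the patch, a missing capability yields -EINVAL, not -EPERM.
 */
static int sketch_check_vm_type(unsigned long type, bool ucontrol_built_in,
                                bool has_sys_admin)
{
    if (type == SKETCH_VM_REGULAR)
        return 0;
    if (ucontrol_built_in && type == SKETCH_VM_UCONTROL && has_sys_admin)
        return 0;
    return -EINVAL;
}
```

Passing 0 keeps the behaviour of callers that predate the new parameter.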

[patch 03/12] [PATCH] kvm-s390-ucontrol: export page faults to user

2011-12-09 Thread Carsten Otte
This patch introduces a new exit reason in the kvm_run structure
named KVM_EXIT_UCONTROL. This exit indicates that a virtual cpu
has recognized a fault on the host page table. The idea is that
userspace can handle this fault by mapping memory at the fault
location into the cpu's address space and then continue to run the
virtual cpu.

Signed-off-by: Carsten Otte 
---
---
 Documentation/virtual/kvm/api.txt |   14 ++
 arch/s390/kvm/kvm-s390.c  |   32 +++-
 arch/s390/kvm/kvm-s390.h  |1 +
 include/linux/kvm.h   |6 ++
 4 files changed, 48 insertions(+), 5 deletions(-)

--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1653,6 +1653,20 @@ s390 specific.
 
 s390 specific.
 
+   /* KVM_EXIT_UCONTROL */
+   struct {
+   __u64 trans_exc_code;
+   __u32 pgm_code;
+   } s390_ucontrol;
+
+s390 specific. A page fault has occurred for a user controlled virtual
+machine (KVM_VM_UCONTROL) on its host page table that cannot be
+resolved by the kernel.
+The program code and the translation exception code that were placed
+in the cpu's lowcore are presented here as defined by the z/Architecture
+Principles of Operation, in the chapter on Dynamic Address Translation
+(DAT).
+
/* KVM_EXIT_DCR */
struct {
__u32 dcrn;
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -499,8 +499,10 @@ int kvm_arch_vcpu_ioctl_set_mpstate(stru
return -EINVAL; /* not implemented yet */
 }
 
-static void __vcpu_run(struct kvm_vcpu *vcpu)
+static int __vcpu_run(struct kvm_vcpu *vcpu)
 {
+   int rc;
+
memcpy(&vcpu->arch.sie_block->gg14, &vcpu->arch.guest_gprs[14], 16);
 
if (need_resched())
@@ -517,9 +519,15 @@ static void __vcpu_run(struct kvm_vcpu *
local_irq_enable();
VCPU_EVENT(vcpu, 6, "entering sie flags %x",
   atomic_read(&vcpu->arch.sie_block->cpuflags));
-   if (sie64a(vcpu->arch.sie_block, vcpu->arch.guest_gprs)) {
-   VCPU_EVENT(vcpu, 3, "%s", "fault in sie instruction");
-   kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
+   rc = sie64a(vcpu->arch.sie_block, vcpu->arch.guest_gprs);
+   if (rc) {
+   if (kvm_is_ucontrol(vcpu->kvm)) {
+   rc = SIE_INTERCEPT_UCONTROL;
+   } else {
+   VCPU_EVENT(vcpu, 3, "%s", "fault in sie instruction");
+   kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
+   rc = 0;
+   }
}
VCPU_EVENT(vcpu, 6, "exit sie icptcode %d",
   vcpu->arch.sie_block->icptcode);
@@ -528,6 +536,7 @@ static void __vcpu_run(struct kvm_vcpu *
local_irq_enable();
 
memcpy(&vcpu->arch.guest_gprs[14], &vcpu->arch.sie_block->gg14, 16);
+   return rc;
 }
 
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
@@ -548,6 +557,7 @@ rerun_vcpu:
case KVM_EXIT_UNKNOWN:
case KVM_EXIT_INTR:
case KVM_EXIT_S390_RESET:
+   case KVM_EXIT_UCONTROL:
break;
default:
BUG();
@@ -559,7 +569,9 @@ rerun_vcpu:
might_fault();
 
do {
-   __vcpu_run(vcpu);
+   rc = __vcpu_run(vcpu);
+   if (rc)
+   break;
rc = kvm_handle_sie_intercept(vcpu);
} while (!signal_pending(current) && !rc);
 
@@ -571,6 +583,16 @@ rerun_vcpu:
rc = -EINTR;
}
 
+#ifdef CONFIG_KVM_UCONTROL
+   if (rc == SIE_INTERCEPT_UCONTROL) {
+   kvm_run->exit_reason = KVM_EXIT_UCONTROL;
+   kvm_run->s390_ucontrol.trans_exc_code =
+   current->thread.gmap_addr;
+   kvm_run->s390_ucontrol.pgm_code = 0x10;
+   rc = 0;
+   }
+#endif
+
if (rc == -EOPNOTSUPP) {
/* intercept cannot be handled in-kernel, prepare kvm-run */
kvm_run->exit_reason = KVM_EXIT_S390_SIEIC;
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -26,6 +26,7 @@ typedef int (*intercept_handler_t)(struc
 
 /* negativ values are error codes, positive values for internal conditions */
 #define SIE_INTERCEPT_RERUNVCPU (1<<0)
+#define SIE_INTERCEPT_UCONTROL (1<<1)
 int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu);
 
 #define VM_EVENT(d_kvm, d_loglevel, d_string, d_args...)\
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -162,6 +162,7 @@ struct kvm_pit_config {
 #define KVM_EXIT_INTERNAL_ERROR   17
 #define KVM_EXIT_OSI  18
 #define KVM_EXIT_PAPR_HCALL  19
+#define KVM_EXIT_UCONTROL         20
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 #define KVM_INTERNAL_ERROR_EMULATION 1
@@ -249,6 +250,11 @@ struct kvm_run {
 #define KVM_S390_RESE
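
On a KVM_EXIT_UCONTROL exit, userspace is expected to map memory at the fault location and rerun the vcpu. One way to turn the reported address into a mapping request is sketched below. The struct is a local mirror of kvm_s390_ucas_mapping from patch 02, sketch_mapping_for_fault() is a hypothetical helper, and the linear layout (one userspace region backing the vcpu address space from address 0) is an assumption of the sketch.

```c
#include <stdint.h>

/* 1 MB, the alignment required by the ucas map/unmap ioctls. */
#define SKETCH_SEGMENT_SIZE (UINT64_C(1) << 20)

/* Local mirror of struct kvm_s390_ucas_mapping. */
struct sketch_ucas_mapping {
    uint64_t user_addr;
    uint64_t vcpu_addr;
    uint64_t length;
};

/*
 * Build a one-segment mapping request that covers fault_addr.
 * user_base is the 1 MB aligned userspace address backing vcpu address 0.
 */
static struct sketch_ucas_mapping sketch_mapping_for_fault(uint64_t fault_addr,
                                                           uint64_t user_base)
{
    struct sketch_ucas_mapping m;
    uint64_t segment = fault_addr & ~(SKETCH_SEGMENT_SIZE - 1);

    m.vcpu_addr = segment;
    m.user_addr = user_base + segment;
    m.length = SKETCH_SEGMENT_SIZE;
    return m;
}
```

The resulting struct would then be passed to KVM_S390_UCAS_MAP on the vcpu fd before calling KVM_RUN again.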

[patch 11/12] [PATCH] kvm-s390-ucontrol: announce capability for user controlled vms

2011-12-09 Thread Carsten Otte
This patch announces a new capability KVM_CAP_UCONTROL that
indicates that kvm can now support virtual machines that are
controlled by userspace.

Signed-off-by: Carsten Otte 
---
---
 arch/s390/kvm/kvm-s390.c |3 +++
 include/linux/kvm.h  |1 +
 2 files changed, 4 insertions(+)

--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -229,6 +229,9 @@ int kvm_dev_ioctl_check_extension(long e
case KVM_CAP_S390_PSW:
case KVM_CAP_S390_GMAP:
case KVM_CAP_SYNC_MMU:
+#ifdef CONFIG_KVM_UCONTROL
+   case KVM_CAP_UCONTROL:
+#endif
r = 1;
break;
default:
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -574,6 +574,7 @@ struct kvm_s390_keyop {
 #define KVM_CAP_MAX_VCPUS 66   /* returns max vcpus per vm */
 #define KVM_CAP_PPC_PAPR 68
 #define KVM_CAP_S390_GMAP 71
+#define KVM_CAP_UCONTROL 72
 
 #ifdef KVM_CAP_IRQ_ROUTING
 



[patch 04/12] [PATCH] kvm-s390-ucontrol: export SIE control block to user

2011-12-09 Thread Carsten Otte
This patch exports the s390 SIE hardware control block to userspace
via the mapping of the vcpu file descriptor. In order to do so,
a new arch callback named kvm_arch_vcpu_fault is introduced for all
architectures. It allows architecture-specific pages to be mapped.

Signed-off-by: Carsten Otte 
---
---
 Documentation/virtual/kvm/api.txt |5 +
 arch/ia64/kvm/kvm-ia64.c  |5 +
 arch/powerpc/kvm/powerpc.c|5 +
 arch/s390/kvm/kvm-s390.c  |   13 +
 arch/x86/kvm/x86.c|5 +
 include/linux/kvm.h   |1 +
 include/linux/kvm_host.h  |1 +
 virt/kvm/kvm_main.c   |2 +-
 8 files changed, 36 insertions(+), 1 deletion(-)

--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -218,6 +218,11 @@ allocation of vcpu ids.  For example, if
 single-threaded guest vcpus, it should make all vcpu ids be a multiple
 of the number of vcpus per vcore.
 
+For virtual cpus that have been created with S390 user controlled virtual
+machines, the resulting vcpu fd can be memory mapped at page offset
+KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of the virtual
+cpu's hardware control block.
+
 4.8 KVM_GET_DIRTY_LOG (vm ioctl)
 
 Capability: basic
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1566,6 +1566,11 @@ out:
return r;
 }
 
+int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
+{
+   return VM_FAULT_SIGBUS;
+}
+
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
struct kvm_memory_slot *memslot,
struct kvm_memory_slot old,
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -659,6 +659,11 @@ out:
return r;
 }
 
+int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
+{
+   return VM_FAULT_SIGBUS;
+}
+
 static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo *pvinfo)
 {
u32 inst_lis = 0x3c00;
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -769,6 +769,19 @@ long kvm_arch_vcpu_ioctl(struct file *fi
return r;
 }
 
+int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
+{
+#ifdef CONFIG_KVM_UCONTROL
+   if ((vmf->pgoff == KVM_S390_SIE_PAGE_OFFSET)
+&& (kvm_is_ucontrol(vcpu->kvm))) {
+   vmf->page = virt_to_page(vcpu->arch.sie_block);
+   get_page(vmf->page);
+   return 0;
+   }
+#endif
+   return VM_FAULT_SIGBUS;
+}
+
 /* Section: memory related */
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
   struct kvm_memory_slot *memslot,
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2790,6 +2790,11 @@ out:
return r;
 }
 
+int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
+{
+   return VM_FAULT_SIGBUS;
+}
+
 static int kvm_vm_ioctl_set_tss_addr(struct kvm *kvm, unsigned long addr)
 {
int ret;
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -439,6 +439,7 @@ struct kvm_ppc_pvinfo {
 
 #define KVM_VM_REGULAR  0
 #define KVM_VM_UCONTROL 1
+#define KVM_S390_SIE_PAGE_OFFSET 1
 
 /*
  * ioctls for /dev/kvm fds:
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -449,6 +449,7 @@ long kvm_arch_dev_ioctl(struct file *fil
unsigned int ioctl, unsigned long arg);
 long kvm_arch_vcpu_ioctl(struct file *filp,
 unsigned int ioctl, unsigned long arg);
+int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf);
 
 int kvm_dev_ioctl_check_extension(long ext);
 
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1657,7 +1657,7 @@ static int kvm_vcpu_fault(struct vm_area
page = virt_to_page(vcpu->kvm->coalesced_mmio_ring);
 #endif
else
-   return VM_FAULT_SIGBUS;
+   return kvm_arch_vcpu_fault(vcpu, vmf);
get_page(page);
vmf->page = page;
return 0;
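
Mapping the SIE control block from userspace could look like the sketch below. It assumes vcpu_fd belongs to a vcpu of a user controlled VM. sketch_map_sie_block() is a hypothetical helper, and the offset macro mirrors KVM_S390_SIE_PAGE_OFFSET from this patch. Because the kernel compares vmf->pgoff, which counts pages, the byte offset handed to mmap() is the page offset multiplied by the page size.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Mirrors KVM_S390_SIE_PAGE_OFFSET from this patch (a page offset). */
#define SKETCH_SIE_PAGE_OFFSET 1

/* Returns a pointer to the mapped control block, or NULL on failure. */
static void *sketch_map_sie_block(int vcpu_fd)
{
    long page_size = sysconf(_SC_PAGESIZE);
    void *p;

    if (page_size <= 0)
        return NULL;

    p = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
             vcpu_fd, (off_t)SKETCH_SIE_PAGE_OFFSET * page_size);
    return p == MAP_FAILED ? NULL : p;
}
```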



[patch 05/12] [PATCH] kvm-s390-ucontrol: disable in-kernel handling of SIE intercepts

2011-12-09 Thread Carsten Otte
This patch disables in-kernel handling of SIE intercepts for user
controlled virtual machines. All intercepts are passed to userspace
via the KVM_EXIT_S390_SIEIC exit reason, just like SIE intercepts that
cannot be handled in-kernel for regular KVM guests.

Signed-off-by: Carsten Otte 
---
Index: linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
===
--- linux-2.5-cecsim.orig/arch/s390/kvm/kvm-s390.c
+++ linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
@@ -572,7 +572,10 @@ rerun_vcpu:
rc = __vcpu_run(vcpu);
if (rc)
break;
-   rc = kvm_handle_sie_intercept(vcpu);
+   if (kvm_is_ucontrol(vcpu->kvm))
+   rc = -EOPNOTSUPP;
+   else
+   rc = kvm_handle_sie_intercept(vcpu);
} while (!signal_pending(current) && !rc);
 
if (rc == SIE_INTERCEPT_RERUNVCPU)



[patch 12/12] [PATCH] kvm-s390: Fix return code for unknown ioctl numbers

2011-12-09 Thread Carsten Otte
This patch fixes the return code of kvm_arch_vcpu_ioctl in case
of an unknown ioctl number.

Signed-off-by: Carsten Otte 
---
---
 arch/s390/kvm/kvm-s390.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
===
--- linux-2.5-cecsim.orig/arch/s390/kvm/kvm-s390.c
+++ linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
@@ -887,7 +887,7 @@ long kvm_arch_vcpu_ioctl(struct file *fi
break;
}
default:
-   r = -EINVAL;
+   r = -ENOTTY;
}
return r;
 }



[patch 08/12] [PATCH] kvm-s390-ucontrol: disable sca

2011-12-09 Thread Carsten Otte
This patch makes sure user controlled virtual machines do not use a
system control area (sca). This is needed in order to create
virtual machines with more cpus than the size of the sca [64].

Signed-off-by: Carsten Otte 
---
Index: linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
===
--- linux-2.5-cecsim.orig/arch/s390/kvm/kvm-s390.c
+++ linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
@@ -234,10 +234,13 @@ out_err:
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
VCPU_EVENT(vcpu, 3, "%s", "free cpu");
-   clear_bit(63 - vcpu->vcpu_id, (unsigned long *) &vcpu->kvm->arch.sca->mcn);
-   if (vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda ==
-   (__u64) vcpu->arch.sie_block)
-   vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda = 0;
+   if (!kvm_is_ucontrol(vcpu->kvm)) {
+   clear_bit(63 - vcpu->vcpu_id,
+ (unsigned long *) &vcpu->kvm->arch.sca->mcn);
+   if (vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda ==
+   (__u64) vcpu->arch.sie_block)
+   vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda = 0;
+   }
smp_mb();
 
if (kvm_is_ucontrol(vcpu->kvm))
@@ -374,12 +377,19 @@ struct kvm_vcpu *kvm_arch_vcpu_create(st
goto out_free_cpu;
 
vcpu->arch.sie_block->icpua = id;
-   BUG_ON(!kvm->arch.sca);
-   if (!kvm->arch.sca->cpu[id].sda)
-   kvm->arch.sca->cpu[id].sda = (__u64) vcpu->arch.sie_block;
-   vcpu->arch.sie_block->scaoh = (__u32)(((__u64)kvm->arch.sca) >> 32);
-   vcpu->arch.sie_block->scaol = (__u32)(__u64)kvm->arch.sca;
-   set_bit(63 - id, (unsigned long *) &kvm->arch.sca->mcn);
+   if (!kvm_is_ucontrol(kvm)) {
+   if (!kvm->arch.sca) {
+   WARN_ON_ONCE(1);
+   goto out_free_cpu;
+   }
+   if (!kvm->arch.sca->cpu[id].sda)
+   kvm->arch.sca->cpu[id].sda =
+   (__u64) vcpu->arch.sie_block;
+   vcpu->arch.sie_block->scaoh =
+   (__u32)(((__u64)kvm->arch.sca) >> 32);
+   vcpu->arch.sie_block->scaol = (__u32)(__u64)kvm->arch.sca;
+   set_bit(63 - id, (unsigned long *) &kvm->arch.sca->mcn);
+   }
 
spin_lock_init(&vcpu->arch.local_int.lock);
INIT_LIST_HEAD(&vcpu->arch.local_int.list);



[patch 02/12] [PATCH] kvm-s390-ucontrol: per vcpu address spaces

2011-12-09 Thread Carsten Otte
This patch introduces two ioctls for virtual cpus that are only
valid for kernel virtual machines that are controlled by userspace.
Each virtual cpu has its individual address space in this mode of
operation, and each address space is backed by the gmap
implementation just like the address space for regular KVM guests.
KVM_S390_UCAS_MAP maps a part of the user's virtual address
space to the vcpu. Starting offset and length in both the user and
the vcpu address space need to be aligned to 1M.
KVM_S390_UCAS_UNMAP can be used to unmap a range of memory from a
virtual cpu in a similar way.

Signed-off-by: Carsten Otte 
---
---
 Documentation/virtual/kvm/api.txt |   38 
 arch/s390/kvm/kvm-s390.c  |   50 +-
 include/linux/kvm.h   |   10 +++
 3 files changed, 97 insertions(+), 1 deletion(-)

--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1455,6 +1455,44 @@ is supported; 2 if the processor require
 an RMA, or 1 if the processor can use an RMA but doesn't require it,
 because it supports the Virtual RMA (VRMA) facility.
 
+4.64 KVM_S390_UCAS_MAP
+
+Capability: KVM_CAP_UCONTROL
+Architectures: s390
+Type: vcpu ioctl
+Parameters: struct kvm_s390_ucas_mapping (in)
+Returns: 0 in case of success
+
+The parameter is defined like this:
+   struct kvm_s390_ucas_mapping {
+   __u64 user_addr;
+   __u64 vcpu_addr;
+   __u64 length;
+   };
+
+This ioctl maps the memory at "user_addr" with the length "length" to
+the vcpu's address space starting at "vcpu_addr". All parameters need to
+be aligned to 1 megabyte.
+
+4.65 KVM_S390_UCAS_UNMAP
+
+Capability: KVM_CAP_UCONTROL
+Architectures: s390
+Type: vcpu ioctl
+Parameters: struct kvm_s390_ucas_mapping (in)
+Returns: 0 in case of success
+
+The parameter is defined like this:
+   struct kvm_s390_ucas_mapping {
+   __u64 user_addr;
+   __u64 vcpu_addr;
+   __u64 length;
+   };
+
+This ioctl unmaps the memory in the vcpu's address space starting at
+"vcpu_addr" with the length "length". The field "user_addr" is ignored.
+All parameters need to be aligned to 1 megabyte.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -239,6 +239,10 @@ void kvm_arch_vcpu_destroy(struct kvm_vc
(__u64) vcpu->arch.sie_block)
vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda = 0;
smp_mb();
+
+   if (kvm_is_ucontrol(vcpu->kvm))
+   gmap_free(vcpu->arch.gmap);
+
free_page((unsigned long)(vcpu->arch.sie_block));
kvm_vcpu_uninit(vcpu);
kfree(vcpu);
@@ -269,12 +273,20 @@ void kvm_arch_destroy_vm(struct kvm *kvm
kvm_free_vcpus(kvm);
free_page((unsigned long)(kvm->arch.sca));
debug_unregister(kvm->arch.dbf);
-   gmap_free(kvm->arch.gmap);
+   if (!kvm_is_ucontrol(kvm))
+   gmap_free(kvm->arch.gmap);
 }
 
 /* Section: vcpu related */
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
+   if (kvm_is_ucontrol(vcpu->kvm)) {
+   vcpu->arch.gmap = gmap_alloc(current->mm);
+   if (!vcpu->arch.gmap)
+   return -ENOMEM;
+   return 0;
+   }
+
vcpu->arch.gmap = vcpu->kvm->arch.gmap;
return 0;
 }
@@ -693,6 +705,42 @@ long kvm_arch_vcpu_ioctl(struct file *fi
case KVM_S390_INITIAL_RESET:
r = kvm_arch_vcpu_ioctl_initial_reset(vcpu);
break;
+#ifdef CONFIG_KVM_UCONTROL
+   case KVM_S390_UCAS_MAP: {
+   struct kvm_s390_ucas_mapping ucasmap;
+
+   if (copy_from_user(&ucasmap, argp, sizeof(ucasmap))) {
+   r = -EFAULT;
+   break;
+   }
+
+   if (!kvm_is_ucontrol(vcpu->kvm)) {
+   r = -EINVAL;
+   break;
+   }
+
+   r = gmap_map_segment(vcpu->arch.gmap, ucasmap.user_addr,
+ucasmap.vcpu_addr, ucasmap.length);
+   break;
+   }
+   case KVM_S390_UCAS_UNMAP: {
+   struct kvm_s390_ucas_mapping ucasmap;
+
+   if (copy_from_user(&ucasmap, argp, sizeof(ucasmap))) {
+   r = -EFAULT;
+   break;
+   }
+
+   if (!kvm_is_ucontrol(vcpu->kvm)) {
+   r = -EINVAL;
+   break;
+   }
+
+   r = gmap_unmap_segment(vcpu->arch.gmap, ucasmap.vcpu_addr,
+   ucasmap.length);
+   break;
+   }
+#endif
default:
r = -EINVAL;
}
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -657,6 +657,16 @@ struct kvm_clock_data {
struct kvm_use
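
Since both ucas ioctls require 1 MB alignment, userspace may want to check its arguments before issuing the call. A minimal sketch, with a local mirror of the struct documented above and a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_UCAS_ALIGN (UINT64_C(1) << 20) /* 1 MB */

/* Local mirror of struct kvm_s390_ucas_mapping. */
struct sketch_ucas_args {
    uint64_t user_addr;
    uint64_t vcpu_addr;
    uint64_t length;
};

/* True if all three fields are multiples of 1 MB. */
static bool sketch_ucas_aligned(const struct sketch_ucas_args *m)
{
    uint64_t all = m->user_addr | m->vcpu_addr | m->length;

    return (all & (SKETCH_UCAS_ALIGN - 1)) == 0;
}
```

For KVM_S390_UCAS_UNMAP the user_addr field is documented as ignored, so a caller could simply leave it at 0.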

[patch 00/12] Ucontrol patches V3

2011-12-09 Thread Carsten Otte
Hi Avi, Hi Marcelo,

this round includes feedback from Sasha Levin:
KVM_VM_S390_UCONTROL renamed to KVM_VM_UCONTROL
KVM_CAP_S390_UCONTROL renamed to KVM_CAP_UCONTROL

and a bugfix for a possible host change bit underindication (race)
in SSKE that was reported by Joachim off-list.

@Heiko: since Martin is out, could you please sit in for him and
ack the pgtable.[ch] parts of patch #10 so that Avi can pick up
the whole series?

so long,
Carsten



[PATCH] kvm tools: Fix serial port probing

2011-12-09 Thread Sasha Levin
The process of probing the 8250 serial port is as follows:

1. Start detecting IRQs
2. Enable the IER register [At this point, the port is supposed to light
the INTR].
3. Stop detecting IRQs [At this point, the driver detects which IRQ belongs
to that port].
4. Disable IER register.

Since we weren't enabling and disabling the IRQ based on IER writes,
we would often fail the probing since the driver couldn't detect
which IRQ is used by the port, and would just default that to 0.

This would cause slowness and may have caused hangs. For me there is a
significant increase in speed of the terminal after this patch.

Signed-off-by: Sasha Levin 
---
 tools/kvm/hw/serial.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/tools/kvm/hw/serial.c b/tools/kvm/hw/serial.c
index 67a2007..8cd47cf 100644
--- a/tools/kvm/hw/serial.c
+++ b/tools/kvm/hw/serial.c
@@ -233,6 +233,7 @@ static bool serial8250_out(struct ioport *ioport, struct kvm *kvm, u16 port, voi
break;
case UART_IER:
dev->ier= ioport__read8(data) & 0x3f;
+   kvm__irq_line(kvm, dev->irq, dev->ier?1:0);
break;
case UART_LCR:
dev->lcr= ioport__read8(data);
-- 
1.7.8
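
The effect of the one-line change can be stated as a small function: after an IER write, the interrupt line is raised if any of the six masked enable bits is set and lowered otherwise. The helper name is hypothetical. The 0x3f mask and the level computation are taken from the hunk above.

```c
#include <stdint.h>

/* Level driven onto the IRQ line after the guest writes value to UART_IER. */
static int sketch_irq_level_for_ier(uint8_t value)
{
    uint8_t ier = value & 0x3f; /* same mask as dev->ier in the patch */

    return ier ? 1 : 0;
}
```

Enabling IER in probing step 2 therefore raises the line, and disabling it in step 4 lowers it again.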



Re: [PATCH RFC V3 2/4] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

2011-12-09 Thread Raghavendra K T

On 12/08/2011 03:10 PM, Avi Kivity wrote:
> On 12/07/2011 06:46 PM, Raghavendra K T wrote:
> > On 12/07/2011 08:22 PM, Avi Kivity wrote:
> > > On 12/07/2011 03:39 PM, Marcelo Tosatti wrote:
> > > > > Also I think we can keep the kicked flag in vcpu->requests, no need
> > > > > for new storage.
> > > >
> > > > Was going to suggest it but it violates the currently organized
> > > > processing of entries at the beginning of vcpu_enter_guest.
> > > >
> > > > That is, this "kicked" flag is different enough from vcpu->requests
> > > > processing that a separate variable seems worthwhile (even more
> > > > different with conversion to MP_STATE at KVM_GET_MP_STATE).
> > >
> > > IMO, it's similar to KVM_REQ_EVENT (which can also cause mpstate to
> > > change due to apic re-evaluation).
> >
> > Ok, So what I understand is we have to either :
> > 1. retain current kick flag AS-IS but would have to make it migration
> > friendly. [I still have to get more familiar with migration side]
> > or
> > 2. introduce notion similar to KVM_REQ_PVLOCK_KICK(??) to be part of
> > vcpu->requests.
> >
> > So what would be better? Please let me know.
>
> IMO, KVM_REQ.



Ok, I'll continue in this direction. Hmm, I think the race condition
pointed out by Marcelo should be kept in mind now.
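
For illustration, option 2 (a request bit in vcpu->requests) can be modelled in userspace with C11 atomics. This is only a sketch of the set and test-and-clear mechanics: SKETCH_REQ_PVLOCK_KICK and both helpers are hypothetical, and the bit number is an arbitrary assumption. It does not address the race mentioned above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request bit; the real number would be chosen in the kernel. */
#define SKETCH_REQ_PVLOCK_KICK 0

/* Post a request by setting its bit. req must be below the width of long. */
static void sketch_make_request(atomic_ulong *requests, unsigned int req)
{
    atomic_fetch_or(requests, 1UL << req);
}

/* Consume a request: returns true if the bit was set, and clears it. */
static bool sketch_check_request(atomic_ulong *requests, unsigned int req)
{
    unsigned long mask = 1UL << req;

    return (atomic_fetch_and(requests, ~mask) & mask) != 0;
}
```

A kick would call sketch_make_request(), and the vcpu entry path would consume it once with sketch_check_request().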




Re: [PATCH 0/11] RFC: PCI using capabilitities

2011-12-09 Thread Rusty Russell
On Thu, 08 Dec 2011 17:37:37 +0200, Sasha Levin  wrote:
> Which leads me to the question: Are MMIO vs MMIO reads/writes not
> ordered?

That seems really odd, especially being repeatable.

BTW, that's an address, not a pfn now.

Cheers,
Rusty.


Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors

2011-12-09 Thread Rusty Russell
On Thu, 08 Dec 2011 12:37:48 +0200, Sasha Levin  wrote:
> On Thu, 2011-12-08 at 20:14 +1030, Rusty Russell wrote:
> > On Wed, 7 Dec 2011 17:48:17 +0200, "Michael S. Tsirkin"  
> > wrote:
> > > On Wed, Dec 07, 2011 at 04:02:45PM +0200, Sasha Levin wrote:
> > > > On Sun, 2011-12-04 at 20:23 +0200, Sasha Levin wrote:
> > > > 
> > > > [snip]
> > > > 
> > > > Rusty, Michael, does the below looks a reasonable optimization for you?
> > > 
> > > OK overall but a bit hard to say for sure as it looks pretty incomplete 
> > > ...
> > 
> > A static threshold is very hackish; we need to either initialize it to
> > a proven-good value (since noone will ever change it) or be cleverer.
> 
> I'll better wait to see how the threshold issue is resolved, and
> possibly do it as a dynamic value which depends on the threshold.
> 
> I doubt theres one magic value which would work for all.

Sure, but if it's generally better than the current value, I'll apply it.

Cheers,
Rusty.


Re: [net-next RFC PATCH 0/5] Series short description

2011-12-09 Thread Rusty Russell
On Wed, 7 Dec 2011 17:02:04 +, Ben Hutchings  
wrote:
> Solarflare controllers (sfc driver) have 8192 perfect filters for
> TCP/IPv4 and UDP/IPv4 which can be used for flow steering.  (The filters
> are organised as a hash table, but matched based on 5-tuples.)  I
> implemented the 'accelerated RFS' interface in this driver.
> 
> I believe the Intel 82599 controllers (ixgbe driver) have both
> hash-based and perfect filter modes and the driver can be configured to
> use one or the other.  The driver has its own independent mechanism for
> steering RX and TX flows which predates RFS; I don't know whether it
> uses hash-based or perfect filters.

Thanks for this summary (and Jason, too).  I've fallen a long way behind
NIC state-of-the-art.
 
> Most multi-queue controllers could support a kind of hash-based
> filtering for TCP/IP by adjusting the RSS indirection table.  However,
> this table is usually quite small (64-256 entries).  This means that
> hash collisions will be quite common and this can result in reordering.
> The same applies to the small table Jason has proposed for virtio-net.

But this happens on real hardware today.  Better than real hardware is
nice, but is it overkill?

And can't you reorder even with perfect matching, since prior packets
will be on the old queue and more recent ones on the new queue?  Does it
discard or requeue old ones?  Or am I missing a trick?

Thanks,
Rusty.
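
The collision concern quoted above can be made concrete with a small
sketch. It assumes a 128-entry indirection table (the quoted range is
64-256) and uses FNV-1a as a stand-in for whatever hash the NIC really
computes. The names struct flow_key, rss_slot() and demo_collisions() are
hypothetical and not taken from any driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RSS_TABLE_SIZE 128	/* assumed; the mail quotes 64-256 entries */

/* Hypothetical 5-tuple key identifying a TCP/IPv4 or UDP/IPv4 flow. */
struct flow_key {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
	uint8_t  proto;
};

/* FNV-1a over the key fields: a stand-in for the NIC's real hash. */
static uint32_t flow_hash(const struct flow_key *k)
{
	uint8_t buf[13];
	uint32_t h = 2166136261u;
	size_t i;

	memcpy(buf, &k->src_ip, 4);
	memcpy(buf + 4, &k->dst_ip, 4);
	memcpy(buf + 8, &k->src_port, 2);
	memcpy(buf + 10, &k->dst_port, 2);
	buf[12] = k->proto;

	for (i = 0; i < sizeof(buf); i++) {
		h ^= buf[i];
		h *= 16777619u;
	}
	return h;
}

/* Slot of the indirection table that a flow lands in. */
static unsigned int rss_slot(const struct flow_key *k)
{
	return flow_hash(k) % RSS_TABLE_SIZE;
}

/*
 * Generate n_flows flows that differ only in source port and count how
 * many of them land in a slot already taken by an earlier flow.
 */
size_t demo_collisions(size_t n_flows)
{
	unsigned char used[RSS_TABLE_SIZE] = { 0 };
	struct flow_key k = {
		.src_ip = 0x0a000001, .dst_ip = 0x0a000002,
		.dst_port = 80, .proto = 6,
	};
	size_t i, collisions = 0;

	for (i = 0; i < n_flows; i++) {
		unsigned int slot;

		k.src_port = (uint16_t)(1024 + i);
		slot = rss_slot(&k);
		if (used[slot])
			collisions++;
		else
			used[slot] = 1;
	}
	return collisions;
}
```

Once there are more flows than slots, some flows must share a slot, so
re-pointing one slot to a different queue moves every flow that hashes to
it. That is presumably where the reordering risk comes from.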


[PATCH][Autotest] Add explanation to unpickle error in parallel module.

2011-12-09 Thread Jiří Župka
Signed-off-by: Jiří Župka 
---
 client/bin/parallel.py |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/client/bin/parallel.py b/client/bin/parallel.py
index cb02082..78a02c9 100644
--- a/client/bin/parallel.py
+++ b/client/bin/parallel.py
@@ -56,7 +56,12 @@ def fork_start(tmp, l):
 def _check_for_subprocess_exception(temp_dir, pid):
     ename = temp_dir + "/debug/error-%d" % pid
     if os.path.exists(ename):
-        e = pickle.load(file(ename, 'r'))
+        try:
+            e = pickle.load(file(ename, 'r'))
+        except ImportError:
+            logging.error("Unknown exception to unpickle. Exception must be"
+                          " defined in error module.")
+            raise
         # rename the error-pid file so that they do not affect later child
         # processes that use recycled pids.
         i = 0
-- 
1.7.7.3



Re: [PATCH V2 17/23] kvm tools: Add ability to map guest RAM from hugetlbfs

2011-12-09 Thread Pekka Enberg
On Fri, Dec 9, 2011 at 8:55 AM, Matt Evans  wrote:
> Add a --hugetlbfs commandline option to give a path to hugetlbfs-map guest
> memory (down in kvm__arch_init()).  For x86, guest memory is a normal
> ANON mmap() if this option is not provided, otherwise a hugetlbfs mmap.
>
> Signed-off-by: Matt Evans 

Btw, why don't you want to use MADV_HUGEPAGE for this? You could just
do it unconditionally, no?
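
A minimal sketch of what the MADV_HUGEPAGE suggestion might look like,
assuming an anonymous mapping and treating the madvise() call as a
best-effort hint. mmap_anon_thp() is a hypothetical name, not a function
from the patch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for MAP_ANONYMOUS and MADV_HUGEPAGE */
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map anonymous memory and ask for transparent huge pages.
 * Returns NULL if the mapping itself fails.
 */
void *mmap_anon_thp(size_t size)
{
	void *addr;

	addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr == MAP_FAILED)
		return NULL;

#ifdef MADV_HUGEPAGE
	/*
	 * Only a hint: it can fail (for example with EINVAL) on kernels
	 * built without transparent hugepage support, so the result is
	 * deliberately ignored and the plain mapping is used as-is.
	 */
	(void)madvise(addr, size, MADV_HUGEPAGE);
#endif

	return addr;
}
```

Unlike a hugetlbfs mapping, this does not guarantee huge pages; the kernel
decides whether and when to back the range with them.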


Re: [PATCH V2 17/23] kvm tools: Add ability to map guest RAM from hugetlbfs

2011-12-09 Thread Pekka Enberg
On Fri, Dec 9, 2011 at 8:55 AM, Matt Evans  wrote:
> Add a --hugetlbfs commandline option to give a path to hugetlbfs-map guest
> memory (down in kvm__arch_init()).  For x86, guest memory is a normal
> ANON mmap() if this option is not provided, otherwise a hugetlbfs mmap.
>
> Signed-off-by: Matt Evans 

> +void *mmap_hugetlbfs(const char *htlbfs_path, u64 size)
> +{
> +       char mpath[PATH_MAX];
> +       int fd;
> +       int r;
> +       struct statfs sfs;
> +       void *addr;
> +
> +       do {
> +               /*
> +                * QEMU seems to work around this returning EINTR...  Let's do
> +                * that too.
> +                */
> +               r = statfs(htlbfs_path, &sfs);
> +       } while (r && errno == EINTR);

Can this really happen? What about EAGAIN? The retry logic really
wants to live in tools/kvm/read-write.c as a xstatfs() wrapper if we
do need this.

Pekka
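
A sketch of the wrapper suggested above, assuming it only needs to hide
the EINTR retry from callers. xstatfs() is the name proposed in this mail;
the body is an illustration, not code from the tree, and it leaves the
EAGAIN question open.

```c
#include <errno.h>
#include <sys/vfs.h>

/*
 * statfs() that restarts when interrupted by a signal.
 * Returns 0 on success, -1 with errno set on any other failure.
 */
int xstatfs(const char *path, struct statfs *buf)
{
	int r;

	do {
		r = statfs(path, buf);
	} while (r < 0 && errno == EINTR);

	return r;
}
```

With this in place, mmap_hugetlbfs() could call xstatfs(htlbfs_path, &sfs)
once and drop its open-coded do/while loop.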


Re: [PATCH V2 04/23] kvm tools: Get correct 64-bit types on PPC64 and link appropriately

2011-12-09 Thread Pekka Enberg
On Fri, Dec 9, 2011 at 10:24 AM, Sasha Levin  wrote:
> If you also got kernel patches that add __SANE_USERSPACE_TYPES__ to the
> headers, and KVM_CAP_NR_VCPUS to KVM PPC, we can carry them in the KVM
> tools tree as well.

Yup, all we need is ACKs from PPC maintainers.

Pekka


Re: [PATCH V2 04/23] kvm tools: Get correct 64-bit types on PPC64 and link appropriately

2011-12-09 Thread Sasha Levin
If you also got kernel patches that add __SANE_USERSPACE_TYPES__ to the
headers, and KVM_CAP_NR_VCPUS to KVM PPC, we can carry them in the KVM
tools tree as well.

On Fri, 2011-12-09 at 17:53 +1100, Matt Evans wrote:
> kvmtool's types.h includes , which by default on PPC64 brings in
> int-l64.h; define __SANE_USERSPACE_TYPES__ to get LL64 types.
> 
> This patch also adds CFLAGS to the final link, so that any -m64 is obeyed
> when linking, too.
> 
> Signed-off-by: Matt Evans 
> ---
>  tools/kvm/Makefile  |2 +-
>  tools/kvm/include/linux/types.h |1 +
>  2 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
> index 009a6ba..57dc521 100644
> --- a/tools/kvm/Makefile
> +++ b/tools/kvm/Makefile
> @@ -218,7 +218,7 @@ KVMTOOLS-VERSION-FILE:
>  
>  $(PROGRAM): $(DEPS) $(OBJS)
>   $(E) "  LINK" $@
> - $(Q) $(CC) $(OBJS) $(LIBS) -o $@
> + $(Q) $(CC) $(CFLAGS) $(OBJS) $(LIBS) -o $@
>  
>  $(GUEST_INIT): guest/init.c
>   $(E) "  LINK" $@
> diff --git a/tools/kvm/include/linux/types.h b/tools/kvm/include/linux/types.h
> index 357799c..5e20f10 100644
> --- a/tools/kvm/include/linux/types.h
> +++ b/tools/kvm/include/linux/types.h
> @@ -2,6 +2,7 @@
>  #define LINUX_TYPES_H
>  
>  #include 
> +#define __SANE_USERSPACE_TYPES__ /* For PPC64, to get LL64 types */
>  #include 
>  
>  typedef __u64 u64;

-- 

Sasha.
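
To illustrate what the define in the quoted patch buys, here is a small
userspace sketch. It assumes the kernel uapi headers are installed;
u64_is_unsigned_long_long() and format_u64() are hypothetical helpers used
only for this example. On PPC64 the default headers bring in int-l64.h, and
defining __SANE_USERSPACE_TYPES__ first selects the LL64 types instead.

```c
/* Must be defined before the first kernel header is pulled in. */
#define __SANE_USERSPACE_TYPES__
#include <linux/types.h>

#include <stdio.h>

typedef __u64 u64;

/* Returns 1 if u64 is 'unsigned long long', 0 otherwise (C11 _Generic). */
int u64_is_unsigned_long_long(void)
{
	return _Generic((u64)0, unsigned long long: 1, default: 0);
}

/* With LL64 types, "%llu" matches u64 without a cast or format warning. */
int format_u64(char *buf, size_t len, u64 value)
{
	return snprintf(buf, len, "%llu", value);
}
```

The practical effect is that format strings and prototypes written for
'unsigned long long' build the same way on every architecture.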
