Re: [PATCH 6/6] vhost_net: remove the max pending check

2013-08-26 Thread Jason Wang
On 08/25/2013 07:53 PM, Michael S. Tsirkin wrote:
 On Fri, Aug 23, 2013 at 04:55:49PM +0800, Jason Wang wrote:
 On 08/20/2013 10:48 AM, Jason Wang wrote:
 On 08/16/2013 06:02 PM, Michael S. Tsirkin wrote:
 On Fri, Aug 16, 2013 at 01:16:30PM +0800, Jason Wang wrote:
 We used to limit the max pending DMAs to prevent the guest from pinning too
 many pages. But this could be removed since:

 - We have the sk_wmem_alloc check in both tun/macvtap to do the same work.
 - This max pending check was almost useless since it was only done when
   there were no new buffers coming from the guest. The guest can easily
   exceed the limitation.
 - We already check upend_idx != done_idx and switch to non-zerocopy then. So
   even if all vq->heads were used, we can still do the packet transmission.
 We can but performance will suffer.
 The check was in fact only done when no new buffers were submitted from
 the guest. So if the guest keeps sending, the check won't be done.

 If we really want to do this, we should do it unconditionally. Anyway, I
 will run a test to see the result.
 There's a bug in PATCH 5/6. The check:

 nvq->upend_idx != nvq->done_idx

 always disables zerocopy, since we initialize both
 upend_idx and done_idx to zero. So I changed it to:

 (nvq->upend_idx + 1) % UIO_MAXIOV != nvq->done_idx.
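
For readers following the index arithmetic, here is a minimal userspace sketch of the two checks. It is not the vhost_net code: `struct zc_ring`, `RING_SIZE` and both function names are made up for illustration, and `RING_SIZE` merely stands in for UIO_MAXIOV.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's UIO_MAXIOV. */
#define RING_SIZE 1024

/* Hypothetical model of the two indices discussed in the thread. */
struct zc_ring {
    size_t upend_idx; /* next slot to hand out for a zerocopy buffer */
    size_t done_idx;  /* oldest slot whose DMA has not been reaped yet */
};

/* Check as written in PATCH 5/6: false on a fresh ring, where both are 0. */
static bool can_zerocopy_original(const struct zc_ring *r)
{
    return r->upend_idx != r->done_idx;
}

/* Adjusted check: refuse only when advancing upend_idx would reach done_idx. */
static bool can_zerocopy_adjusted(const struct zc_ring *r)
{
    return (r->upend_idx + 1) % RING_SIZE != r->done_idx;
}
```

With both indices at zero the original check reports "no zerocopy" while the adjusted one allows it; the adjusted check only fails once the slot after upend_idx is the one done_idx points at.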
 But what I would really like to try is limit ubuf_info to VHOST_MAX_PEND.
 I think this has a chance to improve performance since
 we'll be using less cache.

Maybe, but it in fact decreases the vq size to VHOST_MAX_PEND.
 Of course this means we must fix the code to really never submit
 more than VHOST_MAX_PEND requests.

 Want to try?

Ok, sure.
 With this change on top, I didn't see performance difference w/ and w/o
 this patch.
 Did you try small message sizes btw (like 1K)? Or just netperf
 default of 64K?


I just tested multiple sessions of TCP_RR. Will test TCP_STREAM also.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH ] Documentation/kvm: Update cpuid documentation for steal time and pv eoi

2013-08-26 Thread Michael S. Tsirkin
On Fri, Aug 23, 2013 at 05:34:47PM +0530, Raghavendra K T wrote:
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
  While adding documentation for pvspinlock, I found that these two should
  be updated. I have based this on top of pvspinlock kvm host patchset (V12)

I would change the description to merely say what the CPUID bits
mean, and what they mean is exactly that an MSR is valid.
Use KVM_FEATURE_ASYNC_PF as a template.
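
As an illustration of the "a feature bit means an MSR is valid" reading, here is a small sketch. The bit numbers are the ones defined in kvm_para.h; the helper names are hypothetical, and the MSR indices for steal time and PV EOI are quoted from memory of that header, so treat them as an assumption to check against the tree. The value passed in is assumed to be EAX of the KVM features CPUID leaf.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined in arch/x86/include/uapi/asm/kvm_para.h. */
#define KVM_FEATURE_ASYNC_PF   4
#define KVM_FEATURE_STEAL_TIME 5
#define KVM_FEATURE_PV_EOI     6

/* Hypothetical helper: test one bit of the features word (CPUID EAX). */
static bool kvm_feature_present(uint32_t features_eax, unsigned int bit)
{
    return (features_eax >> bit) & 1u;
}

/*
 * Hypothetical helper: the MSR a feature bit declares valid, or 0 if this
 * sketch does not know the feature. Indices other than 0x4b564d02 are an
 * assumption, not taken from the thread.
 */
static uint32_t kvm_feature_msr(unsigned int bit)
{
    switch (bit) {
    case KVM_FEATURE_ASYNC_PF:
        return 0x4b564d02;
    case KVM_FEATURE_STEAL_TIME:
        return 0x4b564d03;
    case KVM_FEATURE_PV_EOI:
        return 0x4b564d04;
    default:
        return 0;
    }
}
```

A guest would test the bit first and only then write the corresponding MSR to enable the feature, which matches the "enabled by guest" remark further down.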


  Documentation/virtual/kvm/cpuid.txt | 9 +
  1 file changed, 9 insertions(+)
 
 diff --git a/Documentation/virtual/kvm/cpuid.txt 
 b/Documentation/virtual/kvm/cpuid.txt
 index 22ff659..15a5ac20 100644
 --- a/Documentation/virtual/kvm/cpuid.txt
 +++ b/Documentation/virtual/kvm/cpuid.txt
 @@ -43,6 +43,15 @@ KVM_FEATURE_CLOCKSOURCE2   || 3 || kvmclock 
 available at msrs
  KVM_FEATURE_ASYNC_PF   || 4 || async pf can be enabled by
 ||   || writing to msr 0x4b564d02
  
 --
 +KVM_FEATURE_STEAL_TIME || 5 || guest accounts fine granularity
 +   ||   || task steal time.

I'm not sure what this phrase means.
Steal time is a host feature, not a guest feature:
IIUC if this bit is set, the hypervisor can pass the guest information
about how much time was spent running other processes outside the VM.

 enabled when
 +   ||   || schedstat or task delay accounting
 +   ||   || is supported by the host.

I think it's enabled by guest, not by host.


 +--
 +KVM_FEATURE_PV_EOI || 6 || overrides the generic EOI
 +   ||   || implementation with an optimized
 +   ||   || version.

More exactly with a paravirtualized version.

 +--
  KVM_FEATURE_PV_UNHALT  || 7 || guest checks this feature bit
 ||   || before enabling paravirtualized
 ||   || spinlock support.
 -- 
 1.7.11.7


Re: [PATCH v2] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-26 Thread Jan Kiszka
On 2013-08-25 17:26, Arthur Chunqi Li wrote:
 This patch contains the following two changes:
 1. Fix the bug in nested preemption timer support. If vmexit L2-L0
 with some reasons not emulated by L1, preemption timer value should
 be save in such exits.
 2. Add support of Save VMX-preemption timer value VM-Exit controls
 to nVMX.
 
 With this patch, nested VMX preemption timer features are fully
 supported.
 
 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---
  arch/x86/kvm/vmx.c |   49 -
  1 file changed, 44 insertions(+), 5 deletions(-)
 
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 57b4e12..6aa320e 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -2204,7 +2204,14 @@ static __init void nested_vmx_setup_ctls_msrs(void)
  #ifdef CONFIG_X86_64
   VM_EXIT_HOST_ADDR_SPACE_SIZE |
  #endif
 - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
 + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
 + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
 + if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER))
 + nested_vmx_exit_ctls_high &=
 + (~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
 + if (!(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
 + nested_vmx_pinbased_ctls_high &=
 + (~PIN_BASED_VMX_PREEMPTION_TIMER);
   nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 VM_EXIT_LOAD_IA32_EFER);
  
 @@ -6706,6 +6713,22 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, 
 u64 *info1, u64 *info2)
   *info2 = vmcs_read32(VM_EXIT_INTR_INFO);
  }
  
 +static void nested_fix_preempt(struct kvm_vcpu *vcpu)

nested_adjust_preemption_timer - just preempt can be misleading.

 +{
 + u64 delta_guest_tsc;
 + u32 preempt_val, preempt_bit, delta_preempt_val;
 +
 + preempt_bit = native_read_msr(MSR_IA32_VMX_MISC) & 0x1F;

This is rather preemption_timer_scale. And if there is no symbolic value
for the bitmask, please introduce one.

 + delta_guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu,
 + native_read_tsc()) - vcpu->arch.last_guest_tsc;
 + delta_preempt_val = delta_guest_tsc >> preempt_bit;
 + preempt_val = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
 + if (preempt_val - delta_preempt_val < 0)
 + preempt_val = 0;
 + else
 + preempt_val -= delta_preempt_val;
 + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val);

The rest is unfortunately wrong. It has to be split into two parts. Part
one, calculating L1's TSC value and storing it in nested_vmx,
has to be done on vmexit. Part two, reading the current TSC, calculating
the time spent in L0 and converting it into L1 TSC time, has to be
done right before vmentry of L2.
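
The arithmetic of the two-part scheme can be sketched in plain C, outside of KVM. Everything here is hypothetical naming: `PREEMPTION_TIMER_SCALE_MASK` is only a suggestion for the symbolic bitmask requested above, and the two TSC arguments stand for "L1 TSC stored at vmexit" (part one) and "L1 TSC read right before vmentry" (part two). The timer is assumed to tick once per 2^scale TSC cycles, with the scale taken from bits 4:0 of IA32_VMX_MISC.

```c
#include <stdint.h>

/* Hypothetical name for the mask over bits 4:0 of IA32_VMX_MISC. */
#define PREEMPTION_TIMER_SCALE_MASK 0x1f

/*
 * Sketch only: returns the timer value to load before re-entering L2,
 * given the value saved at exit and the L1 TSC at exit and at entry.
 */
static uint32_t adjust_preemption_timer(uint32_t timer_val,
                                        uint64_t l1_tsc_at_exit,
                                        uint64_t l1_tsc_at_entry,
                                        uint64_t vmx_misc)
{
    unsigned int scale = vmx_misc & PREEMPTION_TIMER_SCALE_MASK;
    uint64_t elapsed_ticks = (l1_tsc_at_entry - l1_tsc_at_exit) >> scale;

    /*
     * Compare before subtracting: with unsigned operands a test of the
     * form "a - b < 0" can never be true, so the clamp has to be
     * expressed as a comparison of the two values.
     */
    if (elapsed_ticks >= timer_val)
        return 0;
    return timer_val - (uint32_t)elapsed_ticks;
}
```

For example, with a scale of 5, a saved value of 100 and 320 TSC cycles spent in L0, the sketch yields 90; if more time has passed than the timer had left, it yields 0 so the timer fires immediately on entry.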

Arthur, please make sure that your test case properly detects the current
breakage of preemption timer emulation, both wrt. the missing
save/restore and the missing L0 time compensation, and then
check that your KVM patch fixes it based on the unit test results.

Jan

 +}
  /*
   * The guest has exited.  See if we can fix it or if we need userspace
   * assistance.
 @@ -6734,9 +6757,12 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
   else
   vmx->nested.nested_run_pending = 0;
  
 - if (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) {
 - nested_vmx_vmexit(vcpu);
 - return 1;
 + if (is_guest_mode(vcpu)) {
 + if (nested_vmx_exit_handled(vcpu)) {
 + nested_vmx_vmexit(vcpu);
 + return 1;
 + } else
 + nested_fix_preempt(vcpu);
   }
  
   if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
 @@ -7517,6 +7543,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
 struct vmcs12 *vmcs12)
  {
   struct vcpu_vmx *vmx = to_vmx(vcpu);
   u32 exec_control;
 + u32 exit_control;
  
   vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
   vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
 @@ -7690,7 +7717,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
 struct vmcs12 *vmcs12)
* we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
* bits are further modified by vmx_set_efer() below.
*/
 - vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
 + exit_control = vmcs_config.vmexit_ctrl;
 + if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER)
 + exit_control |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
 + vmcs_write32(VM_EXIT_CONTROLS, exit_control);
  
   /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are
* emulated by vmx_set_efer(), below.
 @@ -8089,6 +8119,15 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, 
 struct vmcs12 *vmcs12)
   vmcs12->guest_pending_dbg_exceptions =
   

Re: [PATCH 2/2] ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flag

2013-08-26 Thread Paolo Bonzini
Il 25/08/2013 17:04, Alexander Graf ha scritto:
 
 On 24.08.2013, at 21:14, Yann Droneaud wrote:
 
 KVM uses anon_inode_getfd() to allocate file descriptors as part
 of some of its ioctls. But those ioctls are lacking a flag argument
 allowing userspace to choose options for the newly opened file descriptor.

 In such case it's advised to use O_CLOEXEC by default so that
 userspace is allowed to choose, without race, if the file descriptor
 is going to be inherited across exec().

 This patch sets the O_CLOEXEC flag on all file descriptors created
 with anon_inode_getfd() to not leak file descriptors across exec().

 Signed-off-by: Yann Droneaud ydrone...@opteya.com
 Link: http://lkml.kernel.org/r/cover.1377372576.git.ydrone...@opteya.com
 
 Reviewed-by: Alexander Graf ag...@suse.de
 
 Would it make sense to simply inherit the O_CLOEXEC flag from the
 parent kvm fd instead? That would give user space the power to keep
 fds across exec() if it wants to.

Does it make sense to use non-O_CLOEXEC file descriptors with KVM at
all?  Besides fork() not being supported by KVM, as described in
Documentation/virtual/kvm/api.txt, the VMAs of the parent process go
away as soon as you exec().  I'm not sure how you can use the inherited
file descriptor in a sensible way after exec().
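
For reference, this is roughly how userspace can inspect or undo close-on-exec on any descriptor it holds, which is why defaulting to O_CLOEXEC does not take the choice away. The helper names are hypothetical; only standard fcntl() calls are used, and nothing here is specific to KVM file descriptors.

```c
/* Request POSIX.1-2008 declarations such as O_CLOEXEC. */
#define _POSIX_C_SOURCE 200809L

#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: true if fd will be closed automatically on exec(). */
static bool fd_is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    return flags != -1 && (flags & FD_CLOEXEC) != 0;
}

/* Hypothetical helper: opt back in to inheritance across exec(). */
static int fd_clear_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
}
```

Clearing the flag after the fact is race-free in the direction that matters: a descriptor that starts out close-on-exec cannot leak into a concurrently exec()ed child, whereas setting the flag after creation leaves a window.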

Paolo


Re: [Qemu-devel] [PATCH] kvm: warn if num cpus is greater than num recommended

2013-08-26 Thread Paolo Bonzini
Il 23/08/2013 13:33, Andrew Jones ha scritto:
 Does smp_cpus map to the current
 number of cpus, or to the number of possible cpus? If it maps to the number
 of possible cpus, then this is the right place. If the former, then I guess
 it'll take more thought. I've added Igor (still on vacation) to this reply,
 but regardless I vote we worry about hot-plug limit checking in different
 patch.

smp_cpus is the initial number, max_cpus is the number of possible cpus.

Paolo


Re: [Qemu-devel] [PATCH] kvm: warn if num cpus is greater than num recommended

2013-08-26 Thread Andrew Jones


- Original Message -
 Il 23/08/2013 13:33, Andrew Jones ha scritto:
  Does smp_cpus map to the current
  number of cpus, or to the number of possible cpus? If it maps to the number
  of possible cpus, then this is the right place. If the former, then I guess
  it'll take more thought. I've added Igor (still on vacation) to this
  reply,
  but regardless I vote we worry about hot-plug limit checking in different
  patch.
 
 smp_cpus is the initial number, max_cpus is the number of possible cpus.
 

Yeah, I noticed that this issue is at least partially addressed already with
my v2, which incorporates Marcelo's check against the number of hotpluggable
cpus (max_cpus).

drew


Re: Partial huge page backing with KVM/qemu

2013-08-26 Thread Gleb Natapov
On Mon, Aug 26, 2013 at 02:09:57AM +, Chris Leduc wrote:
 
 
  -Original Message-
  From: Gleb Natapov [mailto:g...@redhat.com]
  Sent: Sunday, August 25, 2013 1:52 AM
  To: Chris Leduc
  Cc: kvm@vger.kernel.org
  Subject: Re: Partial huge page backing with KVM/qemu
  
  On Sat, Aug 24, 2013 at 12:32:07AM +, Chris Leduc wrote:
   Hi - In a KVM/qemu environment is it possible for the host to back only a
  portion of the guests memory with huge pages?  In some situations it may
  not be desirable to back the entirety of a guest's memory with huge pages
  (as can be done via libvirt memoryBacking option).
  What are those situations?
 For example to limit a guest with 64GB of total memory to use 4GB of huge 
 pages for fast lookup memory.  This takes advantage of the 4 TLB entries for 
 1G pages on a Sandy/Ivy Bridge processor to ensure a page walk is never 
 necessary for this fast memory.  An example is a high performance data plane 
 application.  The remainder of the less frequently accessed memory can be in 
 normal pages.
 
When two level paging (EPT) is in use, combined mappings are stored in
the TLB, not linear mappings (see 28.3.1). I am not sure those will ever
use 1G TLB entries. Not with KVM anyway, since KVM does not use 1G pages for
EPT tables: the chance of getting that much contiguous memory on a running
system is close to zero.

   What would be very useful is to request huge pages in the guest, either at
  boot time or dynamically, and have the host back them with physical huge
  pages, but not back the rest of the normal page guest memory with huge
  pages from the host.
  
   The equivalent in Xen is setting allowsuperpage=1 on the hypervisor boot
  line.
  
  As far as I can tell this disables/enables use of huge pages by XEN vm, not
  something you say you want.
 
 The Xen documentation is not clear on this, but in practice this flag allows 
 the host to back up guest huge page requests with physical huge pages.  So 
 the guest could for example add hugepages=N to its boot line and these pages 
 would be backed in the host with corresponding physical huge pages.
Allow me to be sceptical on this :) With shadow paging sure, same is true
for KVM: if the guest maps memory with a huge page and the memory is contiguous
on the host too, KVM will create a huge shadow page. But with two level paging
the hypervisor has no idea how the guest's page tables look. The best it can do
is to map the entire guest physical memory using huge pages.

 
 From experimentation with KVM, requesting hugepages at guest boot time 
 (without memory backing enabled) will result in guest hugepages backed by 
 host normal pages.
What do you mean by requesting hugepages at guest boot time, and how
have you checked that guest hugepages are backed by host normal pages? Do
you have THP enabled? Without THP you need to back the guest's memory with
huge pages using -mem-path /hugepagesfs. But again only 2MB pages are
supported.

--
Gleb.


Re: [PATCH 2/2] ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flag

2013-08-26 Thread Yann Droneaud

Le 26.08.2013 09:39, Paolo Bonzini a écrit :

Il 25/08/2013 17:04, Alexander Graf ha scritto:

On 24.08.2013, at 21:14, Yann Droneaud wrote:



This patch sets the O_CLOEXEC flag on all file descriptors created
with anon_inode_getfd() to not leak file descriptors across exec().

Signed-off-by: Yann Droneaud ydrone...@opteya.com
Link: 
http://lkml.kernel.org/r/cover.1377372576.git.ydrone...@opteya.com


Reviewed-by: Alexander Graf ag...@suse.de

Would it make sense to simply inherit the O_CLOEXEC flag from the
parent kvm fd instead? That would give user space the power to keep
fds across exec() if it wants to.


Does it make sense to use non-O_CLOEXEC file descriptors with KVM at
all?  Besides fork() not being supported by KVM, as described in
Documentation/virtual/kvm/api.txt, the VMAs of the parent process go
away as soon as you exec().  I'm not sure how you can use the inherited
file descriptor in a sensible way after exec().



Sounds a lot like InfiniBand subsystem behavior: IB file descriptors
are of no use across exec() since memory mappings tied to those fds
won't be available in the new process:

https://lkml.org/lkml/2013/7/8/380
http://mid.gmane.org/f58540dc64fec1ac0e496dfcd3cc1...@meuh.org

Regards.

--
Yann Droneaud
OPTEYA



Re: [PATCH 2/2] ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flag

2013-08-26 Thread Paolo Bonzini
Il 26/08/2013 10:23, Yann Droneaud ha scritto:
 
 Sounds a lot like InfiniBand subsystem behavior: IB file descriptors
 are of no use across exec() since memory mappings tied to those fds
 won't be available in the new process:
 
 https://lkml.org/lkml/2013/7/8/380
 http://mid.gmane.org/f58540dc64fec1ac0e496dfcd3cc1...@meuh.org

Yes, it is very similar.

Paolo


[PATCH V13 1/4] kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi

2013-08-26 Thread Raghavendra K T
this is needed by both guest and host.

Originally-from: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Gleb Natapov g...@redhat.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/include/uapi/asm/kvm_para.h | 1 +
 include/uapi/linux/kvm_para.h| 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
b/arch/x86/include/uapi/asm/kvm_para.h
index 06fdbd9..94dc8ca 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -23,6 +23,7 @@
 #define KVM_FEATURE_ASYNC_PF   4
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
+#define KVM_FEATURE_PV_UNHALT  7
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index cea2c5c..2841f86 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -19,6 +19,7 @@
 #define KVM_HC_MMU_OP  2
 #define KVM_HC_FEATURES3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE  4
+#define KVM_HC_KICK_CPU5
 
 /*
  * hypercalls use architecture specific
-- 
1.7.11.7



[PATCH V13 3/4] kvm hypervisor: Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apic

2013-08-26 Thread Raghavendra K T
Note that we are using APIC_DM_REMRD which has reserved usage.
In future if APIC_DM_REMRD usage is standardized, then we should
find some other way or go back to old method.

Suggested-by: Gleb Natapov g...@redhat.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Gleb Natapov g...@redhat.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/kvm/lapic.c |  5 -
 arch/x86/kvm/x86.c   | 25 ++---
 2 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index afc1124..48c13c9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -706,7 +706,10 @@ out:
break;
 
case APIC_DM_REMRD:
-   apic_debug("Ignoring delivery mode 3\n");
+   result = 1;
+   vcpu->arch.pv.pv_unhalted = 1;
+   kvm_make_request(KVM_REQ_EVENT, vcpu);
+   kvm_vcpu_kick(vcpu);
break;
 
case APIC_DM_SMI:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1e73dab..640d112 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5502,27 +5502,14 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
  */
 static void kvm_pv_kick_cpu_op(struct kvm *kvm, unsigned long flags, int 
apicid)
 {
-   struct kvm_vcpu *vcpu = NULL;
-   int i;
+   struct kvm_lapic_irq lapic_irq;
 
-   kvm_for_each_vcpu(i, vcpu, kvm) {
-   if (!kvm_apic_present(vcpu))
-   continue;
+   lapic_irq.shorthand = 0;
+   lapic_irq.dest_mode = 0;
+   lapic_irq.dest_id = apicid;
 
-   if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
-   break;
-   }
-   if (vcpu) {
-   /*
-* Setting unhalt flag here can result in spurious runnable
-* state when unhalt reset does not happen in vcpu_block.
-* But that is harmless since that should soon result in halt.
-*/
-   vcpu->arch.pv.pv_unhalted = true;
-   /* We need everybody see unhalt before vcpu unblocks */
-   smp_wmb();
-   kvm_vcpu_kick(vcpu);
-   }
+   lapic_irq.delivery_mode = APIC_DM_REMRD;
+   kvm_irq_delivery_to_apic(kvm, 0, &lapic_irq, NULL);
 }
 
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
-- 
1.7.11.7



[PATCH V13 0/4] Paravirtualized ticket spinlocks for KVM host

2013-08-26 Thread Raghavendra K T

 This series forms the kvm host part of paravirtual spinlock
 based against kvm tree.

 Please refer to https://lkml.org/lkml/2013/8/9/265 for
 kvm guest and Xen, x86 part merged to -tip spinlocks.

 Please note that:
 kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi is a common patch
 for both guest and host.

 Changes since V12:
  fold the patch 3 into patch 2 for bisection. (Eric Northup)

Raghavendra K T (3):
  kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi
  kvm hypervisor: Simplify kvm_for_each_vcpu with
kvm_irq_delivery_to_apic
  Documentation/kvm : Add documentation on Hypercalls and features used
for PV spinlock

Srivatsa Vaddagiri (1):
  kvm hypervisor : Add a hypercall to KVM hypervisor to support
pv-ticketlocks

 Documentation/virtual/kvm/cpuid.txt  |  4 
 Documentation/virtual/kvm/hypercalls.txt | 14 ++
 arch/x86/include/asm/kvm_host.h  |  5 +
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kvm/cpuid.c |  3 ++-
 arch/x86/kvm/lapic.c |  5 -
 arch/x86/kvm/x86.c   | 31 ++-
 include/uapi/linux/kvm_para.h|  1 +
 8 files changed, 61 insertions(+), 3 deletions(-)

-- 
1.7.11.7



[PATCH V13 4/4] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock

2013-08-26 Thread Raghavendra K T
KVM_HC_KICK_CPU  hypercall added to wakeup halted vcpu in paravirtual spinlock
enabled guest.

KVM_FEATURE_PV_UNHALT enables guest to check whether pv spinlock can be enabled
in guest.

Thanks Vatsa for rewriting KVM_HC_KICK_CPU
Cc: Rob Landley r...@landley.net
Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Gleb Natapov g...@redhat.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 Documentation/virtual/kvm/cpuid.txt  |  4 
 Documentation/virtual/kvm/hypercalls.txt | 14 ++
 2 files changed, 18 insertions(+)

diff --git a/Documentation/virtual/kvm/cpuid.txt 
b/Documentation/virtual/kvm/cpuid.txt
index 83afe65..22ff659 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -43,6 +43,10 @@ KVM_FEATURE_CLOCKSOURCE2   || 3 || kvmclock 
available at msrs
 KVM_FEATURE_ASYNC_PF   || 4 || async pf can be enabled by
||   || writing to msr 0x4b564d02
 --
+KVM_FEATURE_PV_UNHALT  || 7 || guest checks this feature bit
+   ||   || before enabling paravirtualized
+   ||   || spinlock support.
+--
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no guest-side
||   || per-cpu warps are expected in
||   || kvmclock.
diff --git a/Documentation/virtual/kvm/hypercalls.txt 
b/Documentation/virtual/kvm/hypercalls.txt
index ea113b5..022198e 100644
--- a/Documentation/virtual/kvm/hypercalls.txt
+++ b/Documentation/virtual/kvm/hypercalls.txt
@@ -64,3 +64,17 @@ Purpose: To enable communication between the hypervisor and 
guest there is a
 shared page that contains parts of supervisor visible register state.
 The guest can map this shared page to access its supervisor register through
 memory using this hypercall.
+
+5. KVM_HC_KICK_CPU
+
+Architecture: x86
+Status: active
+Purpose: Hypercall used to wakeup a vcpu from HLT state
+Usage example : A vcpu of a paravirtualized guest that is busywaiting in guest
+kernel mode for an event to occur (ex: a spinlock to become available) can
+execute HLT instruction once it has busy-waited for more than a threshold
+time-interval. Execution of HLT instruction would cause the hypervisor to put
+the vcpu to sleep until occurrence of an appropriate event. Another vcpu of the
+same guest can wakeup the sleeping vcpu by issuing KVM_HC_KICK_CPU hypercall,
+specifying APIC ID (a1) of the vcpu to be woken up. An additional argument (a0)
+is used in the hypercall for future use.
-- 
1.7.11.7
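
To make the argument order in the usage text concrete, here is a sketch of the guest-side call with the hypercall itself stubbed out so it can be compiled in userspace. `stub_hypercall2`, `last_call` and `kick_cpu` are hypothetical names for illustration; only the hypercall number and the a0/a1 roles come from the documentation text above.

```c
/* Hypercall number added to include/uapi/linux/kvm_para.h by this series. */
#define KVM_HC_KICK_CPU 5

/* Hypothetical record of the last call, so the stub can be inspected. */
struct hypercall_record {
    unsigned long nr;
    unsigned long a0;
    unsigned long a1;
};

static struct hypercall_record last_call;

/* Hypothetical stand-in for the guest's two-argument hypercall primitive. */
static long stub_hypercall2(unsigned long nr, unsigned long a0,
                            unsigned long a1)
{
    last_call.nr = nr;
    last_call.a0 = a0;
    last_call.a1 = a1;
    return 0;
}

/*
 * Wake the vcpu identified by apicid. a0 is the argument kept for future
 * use, so this sketch passes 0; a1 carries the APIC ID.
 */
static void kick_cpu(unsigned int apicid)
{
    stub_hypercall2(KVM_HC_KICK_CPU, 0, apicid);
}
```

In a real guest the lock holder would issue this on unlock for a waiter that has already executed HLT after spinning past its threshold.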



Re: [PATCH v2] KVM: nVMX: Fully support of nested VMX preemption timer

2013-08-26 Thread Arthur Chunqi Li
On Mon, Aug 26, 2013 at 3:23 PM, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-08-25 17:26, Arthur Chunqi Li wrote:
 This patch contains the following two changes:
 1. Fix the bug in nested preemption timer support. If vmexit L2-L0
 with some reasons not emulated by L1, preemption timer value should
 be save in such exits.
 2. Add support of Save VMX-preemption timer value VM-Exit controls
 to nVMX.

 With this patch, nested VMX preemption timer features are fully
 supported.

 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---
  arch/x86/kvm/vmx.c |   49 -
  1 file changed, 44 insertions(+), 5 deletions(-)

 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 57b4e12..6aa320e 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -2204,7 +2204,14 @@ static __init void nested_vmx_setup_ctls_msrs(void)
  #ifdef CONFIG_X86_64
   VM_EXIT_HOST_ADDR_SPACE_SIZE |
  #endif
 - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
 + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
 + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
 + if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER))
 + nested_vmx_exit_ctls_high &=
 + (~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
 + if (!(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
 + nested_vmx_pinbased_ctls_high &=
 + (~PIN_BASED_VMX_PREEMPTION_TIMER);
   nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 VM_EXIT_LOAD_IA32_EFER);

 @@ -6706,6 +6713,22 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, 
 u64 *info1, u64 *info2)
   *info2 = vmcs_read32(VM_EXIT_INTR_INFO);
  }

 +static void nested_fix_preempt(struct kvm_vcpu *vcpu)

 nested_adjust_preemption_timer - just preempt can be misleading.

 +{
 + u64 delta_guest_tsc;
 + u32 preempt_val, preempt_bit, delta_preempt_val;
 +
 + preempt_bit = native_read_msr(MSR_IA32_VMX_MISC) & 0x1F;

 This is rather preemption_timer_scale. And if there is no symbolic value
 for the bitmask, please introduce one.

 + delta_guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu,
 + native_read_tsc()) - vcpu->arch.last_guest_tsc;
 + delta_preempt_val = delta_guest_tsc >> preempt_bit;
 + preempt_val = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
 + if (preempt_val - delta_preempt_val < 0)
 + preempt_val = 0;
 + else
 + preempt_val -= delta_preempt_val;
 + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val);

 The rest unfortunately wrong. It has to be split into two parts: Part
 one, the calculation of L1's TSC value and its storing in nested_vmx,
 has to be done on vmexit. Part two, reading the current TSC, calculating
 the time spent in L0 and converting it into L1 TSC time, this has to be
 done right before vmentry of L2.
As we discussed yesterday, the calculated L1 TSC value is not saved in
nested_vmx, to avoid adding code to the hot path of vmexit. Instead, we use
vcpu->arch.last_guest_tsc as the value stored on vmexit (which is done
already). The value of part two is calculated in nested_fix_preempt() above
(see the variable delta_guest_tsc, which stores the TSC time consumed in L0).
Since vmx_handle_exit is the last function called in the vmexit path, I think
it's OK to put part two here.

 Arthur, please make sure that your test case detects the current
 breakage of preemption timer emulation properly, both /wrt to missing
 save/restore and also regarding missing L0 time compensation, and then
 check that your KVM patch fixes it based on the unit test results.
OK, I will commit a patch of kvm-unit-tests to test these changes.

Arthur

 Jan

 +}
  /*
   * The guest has exited.  See if we can fix it or if we need userspace
   * assistance.
 @@ -6734,9 +6757,12 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
   else
   vmx->nested.nested_run_pending = 0;

 - if (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) {
 - nested_vmx_vmexit(vcpu);
 - return 1;
 + if (is_guest_mode(vcpu)) {
 + if (nested_vmx_exit_handled(vcpu)) {
 + nested_vmx_vmexit(vcpu);
 + return 1;
 + } else
 + nested_fix_preempt(vcpu);
   }

   if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
 @@ -7517,6 +7543,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
 struct vmcs12 *vmcs12)
  {
   struct vcpu_vmx *vmx = to_vmx(vcpu);
   u32 exec_control;
 + u32 exit_control;

   vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
   vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
 @@ -7690,7 +7717,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
 struct vmcs12 *vmcs12)
* we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER

Re: [PATCH V13 0/4] Paravirtualized ticket spinlocks for KVM host

2013-08-26 Thread Gleb Natapov
On Mon, Aug 26, 2013 at 02:18:32PM +0530, Raghavendra K T wrote:
 
  This series forms the kvm host part of paravirtual spinlock
  based against kvm tree.
 
  Please refer to https://lkml.org/lkml/2013/8/9/265 for
  kvm guest and Xen, x86 part merged to -tip spinlocks.
 
  Please note that:
  kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi is a common patch
  for both guest and host.
 
Thanks, applied. The patchset is not against kvm.git queue though, so I
had to fix one minor conflict manually.

  Changes since V12:
   fold the patch 3 into patch 2 for bisection. (Eric Northup)
 
 Raghavendra K T (3):
   kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi
   kvm hypervisor: Simplify kvm_for_each_vcpu with
 kvm_irq_delivery_to_apic
   Documentation/kvm : Add documentation on Hypercalls and features used
 for PV spinlock
 
 Srivatsa Vaddagiri (1):
   kvm hypervisor : Add a hypercall to KVM hypervisor to support
 pv-ticketlocks
 
  Documentation/virtual/kvm/cpuid.txt  |  4 
  Documentation/virtual/kvm/hypercalls.txt | 14 ++
  arch/x86/include/asm/kvm_host.h  |  5 +
  arch/x86/include/uapi/asm/kvm_para.h |  1 +
  arch/x86/kvm/cpuid.c |  3 ++-
  arch/x86/kvm/lapic.c |  5 -
  arch/x86/kvm/x86.c   | 31 ++-
  include/uapi/linux/kvm_para.h|  1 +
  8 files changed, 61 insertions(+), 3 deletions(-)
 
 -- 
 1.7.11.7

--
Gleb.


Re: [PATCH 0/2] kvm: use anon_inode_getfd() with O_CLOEXEC flag

2013-08-26 Thread Gleb Natapov
On Sat, Aug 24, 2013 at 10:14:06PM +0200, Yann Droneaud wrote:
 Hi,
 
 Following a patchset asking to change calls to get_unused_flag() [1]
 to use O_CLOEXEC, Alex Williamson [2][3] decided to change VFIO
 to use the flag.
 
 Since it's a related subsystem to KVM, using O_CLOEXEC for
 file descriptors created by KVM might be applicable too.
 
 I'm suggesting to change calls to anon_inode_getfd() to use O_CLOEXEC
 as default flag.
 
 This patchset should be reviewed to make sure it does not break existing userspace programs.
 
 BTW, if it's not applicable, I would suggest that new ioctls be added to
 KVM subsystem, those ioctls would have a flag field added to their 
 arguments.
 Such flag would let userspace choose the open flag to use.
 See for example other APIs using anon_inode_getfd() such as fanotify,
 inotify, signalfd and timerfd.
 
 You might be interested to read:
 
 - Secure File Descriptor Handling (Ulrich Drepper, 2008)
   http://udrepper.livejournal.com/20407.html
 
 - Excuse me son, but your code is leaking !!! (Dan Walsh, March 2012) 
   http://danwalsh.livejournal.com/53603.html
 
Applied, thanks.

 Regards.
 
 [1] http://lkml.kernel.org/r/cover.1376327678.git.ydrone...@opteya.com
 [2] http://lkml.kernel.org/r/1377186804.25163.17.ca...@ul30vt.home
 [3] http://lkml.kernel.org/r/20130822171744.1297.13711.st...@bling.home
 
 Yann Droneaud (2):
   kvm: use anon_inode_getfd() with O_CLOEXEC flag
   ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flag
 
  arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +-
  arch/powerpc/kvm/book3s_64_vio.c    | 2 +-
  arch/powerpc/kvm/book3s_hv.c        | 2 +-
  virt/kvm/kvm_main.c                 | 6 +++---
  4 files changed, 6 insertions(+), 6 deletions(-)
 
 -- 
 1.8.3.1

--
Gleb.


Re: [PATCH V13 0/4] Paravirtualized ticket spinlocks for KVM host

2013-08-26 Thread Raghavendra K T

On 08/26/2013 03:34 PM, Gleb Natapov wrote:

On Mon, Aug 26, 2013 at 02:18:32PM +0530, Raghavendra K T wrote:


  This series forms the kvm host part of paravirtual spinlock
  based against kvm tree.

  Please refer to https://lkml.org/lkml/2013/8/9/265 for
  kvm guest and Xen, x86 part merged to -tip spinlocks.

  Please note that:
  kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi is a common patch
  for both guest and host.


Thanks, applied. The patchset is not against kvm.git queue though, so I
had to fix one minor conflict manually.


Thank you Gleb.



Re: [PATCH 2/3] KVM: ARM: Get rid of KVM_HPAGE_ defines

2013-08-26 Thread Gleb Natapov
On Sun, Aug 25, 2013 at 04:27:14PM +0100, Alexander Graf wrote:
 
 On 25.08.2013, at 16:18, Peter Maydell wrote:
 
  On 25 August 2013 15:48, Gleb Natapov g...@redhat.com wrote:
  On Sun, Aug 25, 2013 at 03:29:17PM +0100, Peter Maydell wrote:
  Smiley noted, but this is pretty unlikely since it's not possible
  to lie to the guest about which mode it's in, so you can't make
  a guest think it's in Hyp mode.
  
  I suspected this, but forgot most of what I read about Hyp mode by now.
  Need to refresh my memory ASAP. Is it impossible even with a lot of
  emulation? Can guest detect that it is not in a Hyp mode without
  trapping into hypervisor?
  
  Yes. The current mode is in the low bits of the CPSR, which
  is readable without causing a trap. This is just the most obvious
  roadblock; I bet there are more. If you really had to run Hyp mode
  code in a VM you probably have to do it by having it all emulated
  via TCG.
 
 Or in an in-kernel instruction emulator that we have lying around anyways. 
 For kvm-in-kvm that should be good enough, as we only need to execute a few 
 instructions in HYP mode.
 
Will require emulation on each trap to Hyp mode though. But since you
already have ideas about nested Hyp I consider it done :)

--
Gleb.


[PATCH 3/3] KVM: PPC: Book3S HV: Implement timebase offset for guests

2013-08-26 Thread Paul Mackerras
This allows guests to have a different timebase origin from the host.
This is needed for migration, where a guest can migrate from one host
to another and the two hosts might have a different timebase origin.
However, the timebase seen by the guest must not go backwards, and
should go forwards only by a small amount corresponding to the time
taken for the migration.

Therefore this provides a new per-vcpu value accessed via the one_reg
interface using the new KVM_REG_PPC_TB_OFFSET identifier.  This value
defaults to 0 and is not modified by KVM.  On entering the guest, this
value is added onto the timebase, and on exiting the guest, it is
subtracted from the timebase.

This is only supported for recent POWER hardware which has the TBU40
(timebase upper 40 bits) register.  Writing to the TBU40 register only
alters the upper 40 bits of the timebase, leaving the lower 24 bits
unchanged.  This provides a way to modify the timebase for guest
migration without disturbing the synchronization of the timebase
registers across CPU cores.  This means that userspace must supply
a value for the offset that has zeroes in the lower 24 bits.  If the
lower 24 bits are non-zero, they are ignored and taken as zeroes.
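
The arithmetic described above can be sketched in plain C.  This is only
an illustration under stated assumptions: the patch itself applies the
offset in assembly by writing TBU40, and the names TB_OFFSET_MASK,
tb_offset_sanitize(), guest_tb_on_entry() and host_tb_on_exit() are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Only the upper 40 bits of the offset are honoured (hypothetical name). */
#define TB_OFFSET_MASK (~UINT64_C(0xffffff))

/* Lower 24 bits supplied by userspace are ignored and taken as zeroes. */
static uint64_t tb_offset_sanitize(uint64_t offset)
{
    return offset & TB_OFFSET_MASK;
}

/* On guest entry the offset is added onto the host timebase. */
static uint64_t guest_tb_on_entry(uint64_t host_tb, uint64_t offset)
{
    uint64_t off = tb_offset_sanitize(offset);

    assert((off & UINT64_C(0xffffff)) == 0);
    return host_tb + off;
}

/* On guest exit the same offset is subtracted again. */
static uint64_t host_tb_on_exit(uint64_t guest_tb, uint64_t offset)
{
    return guest_tb - tb_offset_sanitize(offset);
}
```

Because the offset has zeroes in its lower 24 bits, adding or subtracting
it never changes the lower 24 bits of the timebase, which is what keeps
the cores synchronized.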

Timebase values stored in KVM structures (struct kvm_vcpu, struct
kvmppc_vcore, etc.) are stored as host timebase values.  The timebase
values in the dispatch trace log need to be guest timebase values,
however, since that is read directly by the guest.  This moves the
setting of vcpu->arch.dec_expires on guest exit to a point after we
have restored the host timebase so that vcpu->arch.dec_expires is a
host timebase value.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 Documentation/virtual/kvm/api.txt       |  1 +
 arch/powerpc/include/asm/kvm_host.h     |  2 ++
 arch/powerpc/include/asm/reg.h          |  1 +
 arch/powerpc/include/uapi/asm/kvm.h     |  3 ++
 arch/powerpc/kernel/asm-offsets.c       |  1 +
 arch/powerpc/kvm/book3s_hv.c            |  8 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 50 +++--
 7 files changed, 56 insertions(+), 10 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 8b4d984..88f4653 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1815,6 +1815,7 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_TLB3PS   | 32
   PPC   | KVM_REG_PPC_EPTCFG   | 32
   PPC   | KVM_REG_PPC_ICP_STATE | 64
+  PPC   | KVM_REG_PPC_TB_OFFSET| 64
 
 ARM registers are mapped using the lower 32 bits.  The upper 16 of that
 is the register group type, or coprocessor number:
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 91b833d..702d88b 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -607,6 +607,8 @@ struct kvm_vcpu_arch {
spinlock_t tbacct_lock;
u64 busy_stolen;
u64 busy_preempt;
+
+   u64 tb_offset;  /* guest timebase - host timebase */
 #endif
 };
 
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 4a9e408..72f8798 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -243,6 +243,7 @@
 #define SPRN_TBRU  0x10D   /* Time Base Read Upper Register (user, R/O) */
 #define SPRN_TBWL  0x11C   /* Time Base Lower Register (super, R/W) */
 #define SPRN_TBWU  0x11D   /* Time Base Upper Register (super, R/W) */
+#define SPRN_TBU40 0x11E   /* Timebase upper 40 bits (hyper, R/W) */
 #define SPRN_SPURR 0x134   /* Scaled PURR */
 #define SPRN_HSPRG0 0x130   /* Hypervisor Scratch 0 */
 #define SPRN_HSPRG1 0x131   /* Hypervisor Scratch 1 */
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index fb0a8a9..9935321 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -504,6 +504,9 @@ struct kvm_get_htab_header {
 #define KVM_REG_PPC_TLB3PS (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x9a)
 #define KVM_REG_PPC_EPTCFG (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x9b)
 
+/* Timebase offset */
+#define KVM_REG_PPC_TB_OFFSET  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x9c)
+
 /* PPC64 eXternal Interrupt Controller Specification */
 #define KVM_DEV_XICS_GRP_SOURCES   1   /* 64-bit source attributes */
 
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 822b6ba..62acafd 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -488,6 +488,7 @@ int main(void)
DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
DEFINE(VCPU_VPA_DIRTY, offsetof(struct kvm_vcpu, arch.vpa.dirty));
+   DEFINE(VCPU_TB_OFFSET, offsetof(struct kvm_vcpu, arch.tb_offset));
 #endif
 #ifdef CONFIG_PPC_BOOK3S
DEFINE(VCPU_VCPUID, offsetof(struct kvm_vcpu, vcpu_id));

[PATCH 0/3] Some fixes for PPC HV-style KVM

2013-08-26 Thread Paul Mackerras
Here are 3 patches that add two PMU (performance monitor unit)
registers to the set being context-switched on guest entry and exit,
and implement a per-guest timebase offset that is needed when we
migrate a guest from one host to another that has a different timebase
origin.  The first patch just adds some one_reg register definitions
for extra PMU registers, including some that exist on POWER8.  These
new registers aren't yet handled by the kernel code, but their
definitions are included here so as to reserve the numbers.

These patches are against Alex Graf's kvm-ppc-queue branch.

Paul.


[PATCH 2/3] KVM: PPC: Book3S HV: Save/restore SIAR and SDAR along with other PMU registers

2013-08-26 Thread Paul Mackerras
Currently we are not saving and restoring the SIAR and SDAR registers in
the PMU (performance monitor unit) on guest entry and exit.  The result
is that performance monitoring tools in the guest could get false
information about where a program was executing and what data it was
accessing at the time of a performance monitor interrupt.  This fixes
it by saving and restoring these registers along with the other PMU
registers on guest entry/exit.

This also provides a way for userspace to access these values for a
vcpu via the one_reg interface.
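
As a rough illustration of how a one_reg identifier for these registers
is put together, here is a minimal standalone sketch.  The constants are
reproduced from the uapi headers as I understand them (treat the numeric
values as assumptions and check linux/kvm.h), and one_reg_size_bytes()
is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Values as found in the uapi headers, copied here so the sketch builds alone. */
#define KVM_REG_PPC        UINT64_C(0x1000000000000000)
#define KVM_REG_SIZE_SHIFT 52
#define KVM_REG_SIZE_MASK  UINT64_C(0x00f0000000000000)
#define KVM_REG_SIZE_U32   UINT64_C(0x0020000000000000)
#define KVM_REG_SIZE_U64   UINT64_C(0x0030000000000000)

/* Register numbers from patch 1/3 of this series. */
#define KVM_REG_PPC_SIAR   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x15)
#define KVM_REG_PPC_SDAR   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x16)

/* Decodes the size field of a PPC one_reg id into a byte count. */
static unsigned int one_reg_size_bytes(uint64_t id)
{
    /* This sketch only handles PPC register ids. */
    assert((id & KVM_REG_PPC) == KVM_REG_PPC);
    return 1u << ((id & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT);
}
```

Userspace would pass such an id, together with a pointer to an 8-byte
buffer, to the KVM_GET_ONE_REG or KVM_SET_ONE_REG vcpu ioctl.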

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_host.h     |  2 ++
 arch/powerpc/kernel/asm-offsets.c       |  2 ++
 arch/powerpc/kvm/book3s_hv.c            | 12 ++++++++++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 12 ++++++++++++
 4 files changed, 28 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 3328353..91b833d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -498,6 +498,8 @@ struct kvm_vcpu_arch {
 
u64 mmcr[3];
u32 pmc[8];
+   u64 siar;
+   u64 sdar;
 
 #ifdef CONFIG_KVM_EXIT_TIMING
struct mutex exit_timing_lock;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index a67c76e..822b6ba 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -506,6 +506,8 @@ int main(void)
DEFINE(VCPU_PRODDED, offsetof(struct kvm_vcpu, arch.prodded));
DEFINE(VCPU_MMCR, offsetof(struct kvm_vcpu, arch.mmcr));
DEFINE(VCPU_PMC, offsetof(struct kvm_vcpu, arch.pmc));
+   DEFINE(VCPU_SIAR, offsetof(struct kvm_vcpu, arch.siar));
+   DEFINE(VCPU_SDAR, offsetof(struct kvm_vcpu, arch.sdar));
DEFINE(VCPU_SLB, offsetof(struct kvm_vcpu, arch.slb));
DEFINE(VCPU_SLB_MAX, offsetof(struct kvm_vcpu, arch.slb_max));
DEFINE(VCPU_SLB_NR, offsetof(struct kvm_vcpu, arch.slb_nr));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2b95c45..9df824f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -771,6 +771,12 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id, union kvmppc_one_reg *val)
}
break;
 #endif /* CONFIG_VSX */
+   case KVM_REG_PPC_SIAR:
+   *val = get_reg_val(id, vcpu->arch.siar);
+   break;
+   case KVM_REG_PPC_SDAR:
+   *val = get_reg_val(id, vcpu->arch.sdar);
+   break;
case KVM_REG_PPC_VPA_ADDR:
spin_lock(&vcpu->arch.vpa_update_lock);
*val = get_reg_val(id, vcpu->arch.vpa.next_gpa);
@@ -855,6 +861,12 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id, union kvmppc_one_reg *val)
}
break;
 #endif /* CONFIG_VSX */
+   case KVM_REG_PPC_SIAR:
+   vcpu->arch.siar = set_reg_val(id, *val);
+   break;
+   case KVM_REG_PPC_SDAR:
+   vcpu->arch.sdar = set_reg_val(id, *val);
+   break;
case KVM_REG_PPC_VPA_ADDR:
addr = set_reg_val(id, *val);
r = -EINVAL;
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 60dce5b..2e1dd6c 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -198,6 +198,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201)
ld  r6, VCPU_MMCR + 16(r4)
mtspr   SPRN_MMCR1, r5
mtspr   SPRN_MMCRA, r6
+BEGIN_FTR_SECTION
+   ld  r7, VCPU_SIAR(r4)
+   ld  r8, VCPU_SDAR(r4)
+   mtspr   SPRN_SIAR, r7
+   mtspr   SPRN_SDAR, r8
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
mtspr   SPRN_MMCR0, r3
isync
 
@@ -1125,6 +1131,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
std r4, VCPU_MMCR(r9)
std r5, VCPU_MMCR + 8(r9)
std r6, VCPU_MMCR + 16(r9)
+BEGIN_FTR_SECTION
+   mfspr   r7, SPRN_SIAR
+   mfspr   r8, SPRN_SDAR
+   std r7, VCPU_SIAR(r9)
+   std r8, VCPU_SDAR(r9)
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
mfspr   r3, SPRN_PMC1
mfspr   r4, SPRN_PMC2
mfspr   r5, SPRN_PMC3
-- 
1.8.4.rc3



[PATCH 1/3] KVM: PPC: Book3S HV: Add one_reg definitions for more PMU registers

2013-08-26 Thread Paul Mackerras
This adds one_reg register numbers for two performance monitor registers
that exist on POWER7 and later processors (SIAR and SDAR) and three that
will be introduced on POWER8 (MMCR2, MMCRS and SIER).

Signed-off-by: Paul Mackerras pau...@samba.org
---
 Documentation/virtual/kvm/api.txt   | 5 +
 arch/powerpc/include/uapi/asm/kvm.h | 5 +
 2 files changed, 10 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 66dd2aa..8b4d984 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1765,6 +1765,11 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_MMCR0 | 64
   PPC   | KVM_REG_PPC_MMCR1 | 64
   PPC   | KVM_REG_PPC_MMCRA | 64
+  PPC   | KVM_REG_PPC_MMCR2 | 64
+  PPC   | KVM_REG_PPC_MMCRS | 64
+  PPC   | KVM_REG_PPC_SIAR  | 64
+  PPC   | KVM_REG_PPC_SDAR  | 64
+  PPC   | KVM_REG_PPC_SIER  | 64
   PPC   | KVM_REG_PPC_PMC1  | 32
   PPC   | KVM_REG_PPC_PMC2  | 32
   PPC   | KVM_REG_PPC_PMC3  | 32
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 0fb1a6e..fb0a8a9 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -429,6 +429,11 @@ struct kvm_get_htab_header {
 #define KVM_REG_PPC_MMCR0  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x10)
 #define KVM_REG_PPC_MMCR1  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x11)
 #define KVM_REG_PPC_MMCRA  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x12)
+#define KVM_REG_PPC_MMCR2  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x13)
+#define KVM_REG_PPC_MMCRS  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x14)
+#define KVM_REG_PPC_SIAR   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x15)
+#define KVM_REG_PPC_SDAR   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x16)
+#define KVM_REG_PPC_SIER   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x17)
 
 #define KVM_REG_PPC_PMC1   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x18)
 #define KVM_REG_PPC_PMC2   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x19)
-- 
1.8.4.rc3



Re: [PATCH ] Documentation/kvm: Update cpuid documentation for steal time and pv eoi

2013-08-26 Thread Raghavendra K T

On 08/26/2013 12:37 PM, Michael S. Tsirkin wrote:

I would change the description to merely say what the CPUID bits
mean, and what they mean is exactly that an MSR is valid.
Use KVM_FEATURE_ASYNC_PF as a template.


Thank you for the review.
Changing the doc accordingly by adding msr info. Please refer below.


+KVM_FEATURE_STEAL_TIME || 5 || guest accounts fine granularity
+   ||   || task steal time.


I'm not sure what this phrase means.
Steal time is a host feature, not a guest feature:
IIUC if this bit is set, the hypervisor can pass the guest information
about how much time was spent running other processes outside the VM.


Okay. I guess I need some help here.

I took this from the PARAVIRT_TIME_ACCOUNTING config help. Also, I saw
that the guest actually returns the steal time in kvm_steal_clock().




enabled when
+   ||   || schedstat or task delay accounting
+   ||   || is supported by the host.


I think it's enabled by guest, not by host.


True. My understanding was that the guest enables it when the host has
schedstat or task delay accounting on.

I referred to this hunk in kvm/cpuid.c:

if (sched_info_on())
   entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);

and sched_info_on() is true when schedstat or task delay accounting is
on.
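
To make the bit handling concrete, here is a small standalone sketch of
setting and testing these feature bits in an eax value.  The bit numbers
5 and 6 come from the documentation hunks quoted in this mail; the
helpers kvm_feature_advertise() and kvm_feature_set() are hypothetical
names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KVM_FEATURE_STEAL_TIME 5
#define KVM_FEATURE_PV_EOI     6

/* Host side: returns eax with the given feature bit set. */
static uint32_t kvm_feature_advertise(uint32_t eax, unsigned int bit)
{
    assert(bit < 32);
    return eax | (UINT32_C(1) << bit);
}

/* Guest side: tests whether the given feature bit is set in eax. */
static bool kvm_feature_set(uint32_t eax, unsigned int bit)
{
    assert(bit < 32);
    return ((eax >> bit) & 1u) != 0;
}
```

A guest that finds the bit set knows the corresponding MSR is valid and
may then enable the feature by writing to it.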

Does this look good?

Enabled by writing to MSR 0x4b564d03. The guest enables the feature
when the host has schedstat or task delay accounting
support.


+KVM_FEATURE_PV_EOI || 6 || overrides the generic EOI
+   ||   || implementation with an optimized
+   ||   || version.


More exactly with a paravirtualized version.


Okay. So how does this sound?

overrides the generic EOI implementation with a paravirtualized
version. This feature is enabled by writing to MSR 0x4b564d04.



mmapping physical memory

2013-08-26 Thread Anatoly Burakov
Hi all

I am using IVSHMEM to mmap /dev/mem into a guest. The mmap works fine on
QEMU without KVM support enabled, but with KVM I get kernel errors:

* (with EPT enabled)

[  746.940720] [ cut here ]
[  746.948612] kernel BUG at arch/x86/kvm/../../../virt/kvm/kvm_main.c:1257!
[  746.949067] invalid opcode:  [#1] SMP
[  746.949393] Modules linked in: rte_kni(OF) igb_uio(OF)
ebtable_nat(F) xt_CHECKSUM(F) bridge(F) stp(F) llc(F)
nf_conntrack_netbios_ns(F) nf_conntrack_broadcast(F) ipt_MASQUERADE(F)
ip6table_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F)
nf_defrag_ipv6(F) bnep(F) bluetooth(F) rfkill(F) iptable_nat(F)
nf_nat_ipv4(F) nf_nat(F) iptable_mangle(F) nf_conntrack_ipv4(F)
nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F) ebtable_filter(F)
ebtables(F) ip6table_filter(F) ip6_tables(F) be2iscsi(F)
iscsi_boot_sysfs(F) bnx2i(F) cnic(F) uio(F) cxgb4i(F) cxgb4(F)
cxgb3i(F) cxgb3(F) libcxgbi(F) ib_iser(F) rdma_cm(F) ib_addr(F)
iw_cm(F) ib_cm(F) ib_sa(F) ib_mad(F) ib_core(F) iscsi_tcp(F)
libiscsi_tcp(F) libiscsi(F) scsi_transport_iscsi(F) iTCO_wdt(F)
iTCO_vendor_support(F) acpi_cpufreq(F) mperf(F) coretemp(F) shpchp(F)
[  747.014963]  lpc_ich(F) mfd_core(F) i2c_i801(F) ioatdma(F)
microcode(F) joydev(F) i7core_edac(F) edac_core(F) vhost_net(F) tun(F)
macvtap(F) macvlan(F) kvm_intel(F) kvm(F) uinput(F) crc32_pclmul(F)
crc32c_intel(F) ghash_clmulni_intel(F) ast(F) ixgbe(F) igb(F)
drm_kms_helper(F) e1000e(F) dca(F) ttm(F) ptp(F) drm(F)
i2c_algo_bit(F) pps_core(F) mdio(F) i2c_core(F) sunrpc(F) [last
unloaded: rte_kni]
[  747.136764] CPU 8
[  747.136909] Pid: 2501, comm: qemu-system-x86 Tainted: GF  O
3.9.11-200.no_strict_dev_mem.fc18.x86_64 #1 Intel Corporation
S5520HC/S5520HC
[  747.228668] RIP: 0010:[a018c43a]  [a018c43a]
__gfn_to_pfn_memslot+0x36a/0x3e0 [kvm]
[  747.259705] RSP: 0018:880130d39ae8  EFLAGS: 00010246
[  747.291580] RAX:  RBX:  RCX: 8801effeb000
[  747.322598] RDX: 001c3c00 RSI: 7fd11f00 RDI: ea00070f
[  747.354242] RBP: 880130d39b58 R08: 0126 R09: 880130d39c2f
[  747.385123] R10:  R11: 7fd14000 R12: 7fd11f01
[  747.415981] R13: 880130d39ba7 R14: 8801c3bcb4f0 R15: 8802b4538001
[  747.447877] FS:  7fd35c1e9700() GS:8801e9c8()
knlGS:
[  747.479010] CS:  0010 DS:  ES:  CR0: 8005003b
[  747.510220] CR2: 7fe2ffc0 CR3: 0001e66c4000 CR4: 27e0
[  747.542410] DR0:  DR1:  DR2: 
[  747.573780] DR3:  DR6: 0ff0 DR7: 0400
[  747.604759] Process qemu-system-x86 (pid: 2501, threadinfo
880130d38000, task 8801c3bcb4f0)
[  747.637044] Stack:
[  747.668362]  880130d39af8 81083798 880130d39b48
7fd11f00
[  747.700654]  001c3c00 00ff8802b3272a90 0380
8802b3272a80
[  747.731895]  0380 000fc000 880130d39c38
880365fe8000
[  747.763068] Call Trace:
[  747.793746]  [81083798] ? hrtimer_start+0x18/0x20
[  747.824435]  [a018c530] __gfn_to_pfn+0x60/0x70 [kvm]
[  747.855267]  [a018c61a] gfn_to_pfn_async+0x1a/0x20 [kvm]
[  747.884586]  [a01a703a] try_async_pf+0x4a/0x1d0 [kvm]
[  747.914146]  [a01aea2a] tdp_page_fault+0xfa/0x210 [kvm]
[  747.943000]  [a01a89a1] kvm_mmu_page_fault+0x31/0x100 [kvm]
[  747.972271]  [a02135ce] handle_ept_violation+0x5e/0x100 [kvm_intel]
[  748.000620]  [a02189f6] vmx_handle_exit+0xf6/0x7c0 [kvm_intel]
[  748.029860]  [a01bbe38] ? kvm_apic_has_interrupt+0x28/0xe0 [kvm]
[  748.058214]  [a0210370] ? vmx_invpcid_supported+0x20/0x20
[kvm_intel]
[  748.086496]  [a01a281b] kvm_arch_vcpu_ioctl_run+0x2fb/0x11a0 [kvm]
[  748.114711]  [a019de67] ? kvm_arch_vcpu_load+0x57/0x1e0 [kvm]
[  748.142788]  [a018a0ee] kvm_vcpu_ioctl+0x26e/0x5f0 [kvm]
[  748.170647]  [810b7340] ? do_futex+0x100/0xad0
[  748.198558]  [811232b4] ? perf_event_context_sched_in+0x94/0xc0
[  748.226194]  [811abe07] do_vfs_ioctl+0x97/0x580
[  748.253809]  [8129d027] ? file_has_perm+0x97/0xb0
[  748.281110]  [811ac381] sys_ioctl+0x91/0xb0
[  748.307911]  [816604d9] system_call_fastpath+0x16/0x1b
[  748.88] Code: ff ff 49 29 d2 4c 89 d2 48 c1 ea 0c 48 03 90 98
00 00 00 48 89 d7 48 89 55 b0 e8 92 d6 ff ff 84 c0 48 8b 55 b0 0f 85
bf fe ff ff 0f 0b 0f 1f 40 00 48 ba 00 00 00 00 00 00 f0 7f e9 aa fe
ff ff
[  748.392724] RIP  [a018c43a] __gfn_to_pfn_memslot+0x36a/0x3e0 [kvm]
[  748.419435]  RSP 880130d39ae8
[  748.524222] ---[ end trace 854a37c471141217 ]---




*** (with EPT disabled)

[  559.581338] [ cut here ]
[  559.581701] kernel BUG at arch/x86/kvm/../../../virt/kvm/kvm_main.c:1257!

Re: [PATCH v2] tile: support KVM for tilegx

2013-08-26 Thread Gleb Natapov
On Sun, Aug 25, 2013 at 09:26:47PM -0400, Chris Metcalf wrote:
 On 8/25/2013 7:39 AM, Gleb Natapov wrote:
  On Mon, Aug 12, 2013 at 04:24:11PM -0400, Chris Metcalf wrote:
  This change provides the initial framework support for KVM on tilegx.
  Basic virtual disk and networking is supported.
 
  This needs to be broken down to more reviewable patches.
 
 I already broke out one pre-requisite patch that wasn't strictly KVM-related:
 
 https://lkml.org/lkml/2013/8/12/339
 
 In addition, we've separately arranged to support booting our kernels in a 
 way that is compatible with the Tilera booter running at the highest 
 privilege level, which enables multiple kernel privilege levels:
 
 https://lkml.org/lkml/2013/5/2/468
 
 How would you recommend further breaking down this patch?  It's pretty much 
 just the basic support for minimal KVM.  I suppose I could break out all the 
 I/O related stuff into a separate patch, though it wouldn't amount to much; 
 perhaps the console could also be broken out separately.  Any other 
 suggestions?

First of all please break out host and guest bits. Also I/O related stuff,
like you suggest (so that guest PV bits will be in a separate patch) and
changes to common code (not much as far as I see) with an explanation of why
they are needed. (Why is kvm_vcpu_kick() not needed, for instance?)
 
  Also can you
  describe the implementation a little bit? Does the tile arch have a virtualization
  extension this implementation uses, or is it a trap and emulate approach?
  If the latter, does it run unmodified guest kernels? What userspace are you
  using with this implementation?
 
 We could do full virtualization via trap and emulate, but we've elected to do 
 a para-virtualized approach.  Userspace runs at PL (privilege level) 0, the 
 guest kernel runs at PL1, and the host runs at PL2.  We have available per-PL 
 resources for various things, and take advantage of having two on-chip timers 
 (for example) to handle timing for the host and guest kernels.  We run the 
 same userspace with either the host or the guest.
 
OK, thanks for the explanation. Why have you decided to do PV over trap and
emulate?

--
Gleb.


Re: [PATCH 02/10] KVM: PPC: reserve a capability number for multitce support

2013-08-26 Thread Gleb Natapov
On Wed, Aug 14, 2013 at 10:51:14AM +1000, Benjamin Herrenschmidt wrote:
 On Thu, 2013-08-01 at 14:44 +1000, Alexey Kardashevskiy wrote:
  This is to reserve a capability number for upcoming support
  of H_PUT_TCE_INDIRECT and H_STUFF_TCE pseries hypercalls
  which support multiple DMA map/unmap operations per call.
 
 Gleb, any chance you can put this (and the next one) into a tree to
 lock in the numbers ?
 
Applied it. Sorry for the slow response, I was on vacation and am still going
through the email backlog.

 I've been wanting to apply the whole series to powerpc-next, that
 stuff has been simmering for way too long and is in a good enough shape
 imho, but I need the capabilities and ioctl numbers locked in your tree
 first.
 
 Cheers,
 Ben.
 
  Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
  ---
  Changes:
  2013/07/16:
  * changed the number
  
  Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
  ---
   include/uapi/linux/kvm.h | 1 +
   1 file changed, 1 insertion(+)
  
  diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
  index acccd08..99c2533 100644
  --- a/include/uapi/linux/kvm.h
  +++ b/include/uapi/linux/kvm.h
  @@ -667,6 +667,7 @@ struct kvm_ppc_smmu_info {
   #define KVM_CAP_PPC_RTAS 91
   #define KVM_CAP_IRQ_XICS 92
   #define KVM_CAP_ARM_EL1_32BIT 93
  +#define KVM_CAP_SPAPR_MULTITCE 94
   
   #ifdef KVM_CAP_IRQ_ROUTING
   
 
 

--
Gleb.


Re: mmapping physical memory

2013-08-26 Thread Andrea Arcangeli
Hi Anatoly,

On Mon, Aug 26, 2013 at 12:58:25PM +0100, Anatoly Burakov wrote:
 Hi all
 
 I am using IVSHMEM to mmap /dev/mem into a guest. The mmap works fine on
 QEMU without KVM support enabled, but with KVM I get kernel errors:
 
 * (with EPT enabled)
 
 [  746.940720] [ cut here ]
 [  746.948612] kernel BUG at arch/x86/kvm/../../../virt/kvm/kvm_main.c:1257!

So the problem is KVM cannot do put_page on a pfn coming from a
/dev/mem mapping, but it cannot handle VM_PFNMAP mappings without
PageReserved set. During kvm_release_page_* KVM only has the pfn
number of the page, and it has to decide if this page is refcounted or
not, solely based on the pfn number. So if the page is not set as
referenced it cannot allow a mapping to be established, or later
during spte teardown put_page would run on the /dev/mem memory leading
to memory corruption. The above BUG_ON isn't just a false positive,
but it shows a limitation in the KVM page fault ability to map
any kind of memory coming from the host (including /dev/mem mappings).

So I'm suggesting to drop FOLL_GET in the page fault and
kvm_release_page_* after the spte establishment, and to rely entirely
on the mmu notifier and the kvm_mmu lock by adding a
vcpu->in_progress_fault_addr to set before calling gup hva_to_pfn, to
clear in the mmu notifier code within kvm->mmu_lock, and to check
within the kvm->mmu_lock during spte establishment to know whether the page
pointer became stale and we shall bail out and repeat the fault or not.
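
As a toy model only (single-threaded userspace C, with comments marking
where kvm->mmu_lock would be held), the handshake proposed above could
look roughly like this.  Everything here is an assumption for
illustration: struct toy_vcpu, NO_FAULT, fault_begin(),
notifier_invalidate() and fault_commit() are hypothetical names and
not existing KVM code.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_FAULT (~0UL) /* hypothetical "no fault in progress" marker */

struct toy_vcpu {
    unsigned long in_progress_fault_addr;
};

/* Step 1: record the address before the lockless gup/hva_to_pfn lookup. */
static void fault_begin(struct toy_vcpu *v, unsigned long hva)
{
    assert(hva != NO_FAULT);
    v->in_progress_fault_addr = hva;
}

/* mmu notifier path: would run with kvm->mmu_lock held. */
static void notifier_invalidate(struct toy_vcpu *v,
                                unsigned long start, unsigned long end)
{
    /* Forget any in-flight fault whose address lies in [start, end). */
    if (v->in_progress_fault_addr >= start &&
        v->in_progress_fault_addr < end)
        v->in_progress_fault_addr = NO_FAULT;
}

/*
 * Step 2: spte establishment, also under kvm->mmu_lock. Returns true if
 * the looked-up page is still valid, false if the fault must be repeated.
 */
static bool fault_commit(struct toy_vcpu *v, unsigned long hva)
{
    bool still_valid = (v->in_progress_fault_addr == hva);

    v->in_progress_fault_addr = NO_FAULT;
    return still_valid;
}
```

If an invalidation covering the address runs between fault_begin() and
fault_commit(), the commit fails and the fault is retried, so no page
reference needs to be held across the window.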

We'll still need to use FOLL_GET and set_page_dirty in some cases,
like after modifying the page in places like
emulator_cmpxchg_emulated. Those places cannot depend on the mmu
notifier and the dirty bit set in the pte isn't enough because the
page can be swapped out to disk and marked clean before kmap_atomic
runs, but 99% of the hva_to_pfn calls come from the KVM secondary
MMU page faults; they're protected by the mmu notifier and they can
skip the refcounting completely including FOLL_GET. And then because
we won't have to run put_page at all anymore, the above BUG will
disappear too.

In terms of performance, I estimate the only con will be an
ATOMIC_ONCE(vcpu->in_progress_fault_addr) = addr per-thread
cacheline local and lockless initialization before calling gup in
hva_to_pfn and the pros will be the removal of all refcounting
atomic_inc/dec and set_page_dirty from all the KVM page faults.


Windows Server 2008R2 KVM guest performance issues

2013-08-26 Thread Brian Rak
I've been trying to track down the cause of some serious performance 
issues with a Windows 2008R2 KVM guest.  So far, I've been unable to 
determine what exactly is causing the issue.


When the guest is under load, I see very high kernel CPU usage, as well 
as terrible guest performance.  The workload on the guest is 
approximately 1/4 of what we'd run unvirtualized on the same hardware.  
Even at that level, we max out every vCPU in the guest. While the guest 
runs, I see very high kernel CPU usage (based on `htop` output).



Host setup:
Linux nj1058 3.10.8-1.el6.elrepo.x86_64 #1 SMP Tue Aug 20 18:48:29 EDT 
2013 x86_64 x86_64 x86_64 GNU/Linux

CentOS 6
qemu 1.6.0
2x Intel E5-2630 (virtualization extensions turned on, total of 24 cores 
including hyperthread cores)

24GB memory
swap file is enabled, but unused

Guest setup:
Windows Server 2008R2 (64 bit)
24 vCPUs
16 GB memory
VirtIO disk and network drivers installed
/qemu16/bin/qemu-system-x86_64 -name VMID100 -S -machine 
pc-i440fx-1.6,accel=kvm,usb=off -cpu 
host,hv_relaxed,hv_vapic,hv_spinlocks=0x1000 -m 15259 -smp 
24,sockets=1,cores=12,threads=2 -uuid 
90301200-8d47-6bb3-0623-bed7c8b1dd7c -no-user-config -nodefaults 
-chardev 
socket,id=charmonitor,path=/libvirt111/var/lib/libvirt/qemu/VMID100.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=readline -rtc 
base=utc,driftfix=slew -no-hpet -boot c -usb -drive 
file=/dev/vmimages/VMID100,if=none,id=drive-virtio-disk0,format=raw,cache=writeback,aio=native 
-device 
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 
-drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw 
-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 
-netdev tap,fd=18,id=hostnet0,vhost=on,vhostfd=19 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:00:2c:6d,bus=pci.0,addr=0x3 
-vnc 127.0.0.1:100 -k en-us -vga cirrus -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5


The beginning of `perf top` output:

Samples: 62M of event 'cycles', Event count (approx.): 642019289177
 64.69%  [kernel]            [k] _raw_spin_lock
  2.59%  qemu-system-x86_64  [.] 0x001e688d
  1.90%  [kernel]            [k] native_write_msr_safe
  0.84%  [kvm]               [k] vcpu_enter_guest
  0.80%  [kernel]            [k] __schedule
  0.77%  [kvm_intel]         [k] vmx_vcpu_run
  0.68%  [kernel]            [k] effective_load
  0.65%  [kernel]            [k] update_cfs_shares
  0.62%  [kernel]            [k] _raw_spin_lock_irq
  0.61%  [kernel]            [k] native_read_msr_safe
  0.56%  [kernel]            [k] enqueue_entity

I've captured 20,000 lines of kvm trace output.  This can be found at
https://gist.github.com/devicenull/fa8f49d4366060029ee4/raw/fb89720d34b43920be22e3e9a1d88962bf305da8/trace


So far, I've tried the following with very little effect:
* Disable HPET on the guest
* Enable hv_relaxed, hv_vapic, hv_spinlocks
* Enable SR-IOV
* Pin vCPUs to physical CPUs
* Forcing x2apic enabled in the guest (bcdedit /set x2apicpolicy yes)
* bcdedit /set useplatformclock yes and no


Any suggestions as to what I can do to get better performance out of this 
guest?  Or reasons why I'm seeing such high kernel CPU usage with it?



Re: FAQ on linux-kvm.org has broken link

2013-08-26 Thread folkert
Hi,

 1. Try the latest vanilla kernel on the host (Linux 3.10.5).  This way
you can rule out fixed bugs in vhost_net or tap.

For the last two weeks it went fine for a couple of days each time. This
evening was really bad again: uptimes of 5-10 minutes. This is with
3.11-rc4.

 2. Get the system into the bad state and then do some deeper.  Start
with outgoing ping, instrument guest driver and host vhost_net
functions to see what the drivers are doing, inspect the transmit
vring, etc.
 
 #1 is probably the best next step.  If it fails and you still have time

Yup, very much.

 to work on a solution we can start digging deeper with #2.

I had a small script running which showed the amount of traffic coming
in and going out each second. I did not see any increase or decrease in
the amount, only that when the problem happens RX continues but TX goes to 0.


Folkert van Heusden

-- 
MultiTail na wan makriki wrokosani fu tan luku den logfile nanga san
den commando spiti puru. Piki puru spesrutu sani, wroko nanga difrenti
kroru, tya kon makandra, nanga wan lo moro.
http://www.vanheusden.com/multitail/
--
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com


Re: [PATCH-v3 1/4] idr: Percpu ida

2013-08-26 Thread Kent Overstreet
On Tue, Aug 20, 2013 at 02:31:57PM -0700, Andrew Morton wrote:
 On Fri, 16 Aug 2013 23:09:06 + Nicholas A. Bellinger 
 n...@linux-iscsi.org wrote:
 
  From: Kent Overstreet k...@daterainc.com
  
  Percpu frontend for allocating ids. With percpu allocation (that works),
  it's impossible to guarantee it will always be possible to allocate all
  nr_tags - typically, some will be stuck on a remote percpu freelist
  where the current job can't get to them.
  
  We do guarantee that it will always be possible to allocate at least
  (nr_tags / 2) tags - this is done by keeping track of which and how many
  cpus have tags on their percpu freelists. On allocation failure if
  enough cpus have tags that there could potentially be (nr_tags / 2) tags
  stuck on remote percpu freelists, we then pick a remote cpu at random to
  steal from.
  
  Note that there's no cpu hotplug notifier - we don't care, because
  steal_tags() will eventually get the down cpu's tags. We _could_ satisfy
  more allocations if we had a notifier - but we'll still meet our
  guarantees and it's absolutely not a correctness issue, so I don't think
  it's worth the extra code.
 
  ...
 
   include/linux/idr.h |   53 +
   lib/idr.c   |  316 
  +--
 
 I don't think this should be in idr.[ch] at all.  It has no
 relationship with the existing code.  Apart from duplicating its
 functionality :(

Well, in the full patch series it does make use of the non-percpu ida.
I'm still hoping to get the ida/idr rewrites in.
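
For readers following the thread, here is a minimal single-threaded sketch of
the scheme the patch description outlines: per-cpu freelists refilled from a
global list, with stealing from a remote freelist when both are empty. It is
not the code from the patch. NR_CPUS, NR_TAGS, BATCH, struct tag_pool,
tag_alloc() and tag_free() are names made up for illustration, the "percpu"
data is a plain array, and the steal scans CPUs in order instead of picking
one at random. Only move_tags() mirrors the helper quoted later in this
thread. In the kernel, another CPU may be the one doing the steal, which is
the reason given for the per-cpu spinlock.

```c
#include <string.h>

#define NR_CPUS 4   /* hypothetical sizes, chosen only for this sketch */
#define NR_TAGS 16
#define BATCH   4   /* tags moved per refill or steal */

struct cpu_tags {
	unsigned nr_free;
	unsigned freelist[NR_TAGS];
};

struct tag_pool {
	unsigned nr_free;               /* global freelist */
	unsigned freelist[NR_TAGS];
	struct cpu_tags cpu[NR_CPUS];   /* stand-in for real percpu data */
};

/* Move nr tags from the end of src to the end of dst. */
static void move_tags(unsigned *dst, unsigned *dst_nr,
		      unsigned *src, unsigned *src_nr, unsigned nr)
{
	*src_nr -= nr;
	memcpy(dst + *dst_nr, src + *src_nr, sizeof(unsigned) * nr);
	*dst_nr += nr;
}

static unsigned min_u(unsigned a, unsigned b)
{
	return a < b ? a : b;
}

/* Start with every tag on the global freelist. */
static void pool_init(struct tag_pool *pool)
{
	memset(pool, 0, sizeof(*pool));
	for (unsigned i = 0; i < NR_TAGS; i++)
		pool->freelist[pool->nr_free++] = i;
}

/* Returns a tag, or -1 if no tag can be found anywhere. */
static int tag_alloc(struct tag_pool *pool, unsigned cpu)
{
	struct cpu_tags *tags = &pool->cpu[cpu];

	/* Local list empty: refill a batch from the global freelist. */
	if (!tags->nr_free && pool->nr_free)
		move_tags(tags->freelist, &tags->nr_free,
			  pool->freelist, &pool->nr_free,
			  min_u(BATCH, pool->nr_free));

	/* Still empty: steal from the first remote cpu that has tags. */
	for (unsigned i = 1; i < NR_CPUS && !tags->nr_free; i++) {
		struct cpu_tags *remote = &pool->cpu[(cpu + i) % NR_CPUS];

		if (remote->nr_free)
			move_tags(tags->freelist, &tags->nr_free,
				  remote->freelist, &remote->nr_free,
				  min_u(BATCH, remote->nr_free));
	}

	return tags->nr_free ? (int)tags->freelist[--tags->nr_free] : -1;
}

/* Freed tags go back to the freeing cpu's local list. */
static void tag_free(struct tag_pool *pool, unsigned cpu, unsigned tag)
{
	struct cpu_tags *tags = &pool->cpu[cpu];

	tags->freelist[tags->nr_free++] = tag;
}
```

A tag freed on one CPU stays on that CPU's list, so a later allocation on a
different CPU can only succeed by stealing it. That is the "stuck on a remote
percpu freelist" case the description talks about.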


Re: [PATCH-v3 1/4] idr: Percpu ida

2013-08-26 Thread Kent Overstreet
On Wed, Aug 21, 2013 at 06:25:58PM +, Christoph Lameter wrote:
 On Fri, 16 Aug 2013, Nicholas A. Bellinger wrote:
 
  +   spinlock_t  lock;
 
 Remove the spinlock.

As Andrew noted, the spinlock is needed because of tag stealing. (You
don't think I'd stick a spinlock on a percpu data structure without a
real reason, would you?)

  +   unsigned        nr_free;
  +   unsigned        freelist[];
  +};
  +
  +static inline void move_tags(unsigned *dst, unsigned *dst_nr,
  +unsigned *src, unsigned *src_nr,
  +unsigned nr)
  +{
  +   *src_nr -= nr;
  +   memcpy(dst + *dst_nr, src + *src_nr, sizeof(unsigned) * nr);
  +   *dst_nr += nr;
  +}
  +
 
  +static inline unsigned alloc_local_tag(struct percpu_ida *pool,
  +  struct percpu_ida_cpu *tags)
 
 Pass the __percpu offset and not the tags pointer.

Why? It just changes where the this_cpu_ptr

 
  +{
  +   int tag = -ENOSPC;
  +
  +   spin_lock(&tags->lock);
 
 Interrupts are already disabled. Drop the spinlock.
 
  +   if (tags->nr_free)
  +           tag = tags->freelist[--tags->nr_free];
 
 You can keep this or avoid address calculation through segment prefixes.
 F.e.
 
 if (__this_cpu_read(tags->nrfree) {
   int n = __this_cpu_dec_return(tags->nr_free);
   tag =  __this_cpu_read(tags->freelist[n]);
 }

Can you explain what the point of that change would be? It sounds like
it's preferable to do it that way and avoid this_cpu_ptr() for some
reason, but you're not explaining why.


Re: kernel 3.10.1 - NMI received for unknown reason

2013-08-26 Thread Stefan Pietsch
On 25.08.2013 13:45, Gleb Natapov wrote:
 On Fri, Aug 09, 2013 at 09:14:13PM +0200, Stefan Pietsch wrote:
 On 04.08.2013 14:44, Gleb Natapov wrote:
 On Fri, Aug 02, 2013 at 08:24:38AM +0200, Stefan Pietsch wrote:
 On 31.07.2013 11:20, Gleb Natapov wrote:
 On Wed, Jul 31, 2013 at 11:10:01AM +0200, Stefan Pietsch wrote:
 On 30.07.2013 07:31, Gleb Natapov wrote:

 What happens if you run perf on your host (perf record -a)?
 Do you see same NMI messages?

 It seems that perf record -a triggers some delayed NMI messages.
 They appear about 20 or 30 minutes after the command. This seems strange.
 Definitely strange. KVM guest is not running in parallel, correct? 20, 30
 minutes after perf stopped running or it is running all of the time?

 No, the KVM guest is not running in parallel. But I'm not able to
 clearly reproduce the NMI messages with perf record.
 I start perf record -a and after some minutes I stop the recording.

 After that it seems NMI messages appear within a random period of time.
 So, I cannot tell what triggers the messages.
 When you run KVM with coreduo cpu model it emulates PMU which basically
 makes it a perf front end. If you can reproduce the messages with perf too
 it probably means that the problem is not in the KVM itself. If you
 disabled NMI watchdog in the guest the messages may go away.
 Can you send your guest's dmesg when you boot it with coreduo mode?


 The NMI messages appear in the host only. The guest runs as usual.


 I understand that. But enabling guest nmi watchdog is what makes KVM
 use perf subsystem and likely causes these host messages. Try to disable
 nmi watchdog in a guest and see what happens.

I disabled the watchdog in the guest by booting the kernel with
nmi_watchdog=0. This does not produce any NMI errors.



Re: Windows Server 2008R2 KVM guest performance issues

2013-08-26 Thread Brian Rak

On 8/26/2013 3:15 PM, Brian Rak wrote:
I've been trying to track down the cause of some serious performance 
issues with a Windows 2008R2 KVM guest.  So far, I've been unable to 
determine what exactly is causing the issue.


When the guest is under load, I see very high kernel CPU usage, as 
well as terrible guest performance.  The workload on the guest is 
approximately 1/4 of what we'd run unvirtualized on the same 
hardware.  Even at that level, we max out every vCPU in the guest. 
While the guest runs, I see very high kernel CPU usage (based on 
`htop` output).



Host setup:
Linux nj1058 3.10.8-1.el6.elrepo.x86_64 #1 SMP Tue Aug 20 18:48:29 EDT 
2013 x86_64 x86_64 x86_64 GNU/Linux

CentOS 6
qemu 1.6.0
2x Intel E5-2630 (virtualization extensions turned on, total of 24 
cores including hyperthread cores)

24GB memory
swap file is enabled, but unused

Guest setup:
Windows Server 2008R2 (64 bit)
24 vCPUs
16 GB memory
VirtIO disk and network drivers installed
/qemu16/bin/qemu-system-x86_64 -name VMID100 -S -machine 
pc-i440fx-1.6,accel=kvm,usb=off -cpu 
host,hv_relaxed,hv_vapic,hv_spinlocks=0x1000 -m 15259 -smp 
24,sockets=1,cores=12,threads=2 -uuid 
90301200-8d47-6bb3-0623-bed7c8b1dd7c -no-user-config -nodefaults 
-chardev 
socket,id=charmonitor,path=/libvirt111/var/lib/libvirt/qemu/VMID100.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=readline -rtc 
base=utc,driftfix=slew -no-hpet -boot c -usb -drive 
file=/dev/vmimages/VMID100,if=none,id=drive-virtio-disk0,format=raw,cache=writeback,aio=native 
-device 
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 
-drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw 
-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 
-netdev tap,fd=18,id=hostnet0,vhost=on,vhostfd=19 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:00:2c:6d,bus=pci.0,addr=0x3 
-vnc 127.0.0.1:100 -k en-us -vga cirrus -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5


The beginning of `perf top` output:

Samples: 62M of event 'cycles', Event count (approx.): 642019289177
 64.69%  [kernel]            [k] _raw_spin_lock
  2.59%  qemu-system-x86_64  [.] 0x001e688d
  1.90%  [kernel]            [k] native_write_msr_safe
  0.84%  [kvm]               [k] vcpu_enter_guest
  0.80%  [kernel]            [k] __schedule
  0.77%  [kvm_intel]         [k] vmx_vcpu_run
  0.68%  [kernel]            [k] effective_load
  0.65%  [kernel]            [k] update_cfs_shares
  0.62%  [kernel]            [k] _raw_spin_lock_irq
  0.61%  [kernel]            [k] native_read_msr_safe
  0.56%  [kernel]            [k] enqueue_entity

I've captured 20,000 lines of kvm trace output.  This can be found at
https://gist.github.com/devicenull/fa8f49d4366060029ee4/raw/fb89720d34b43920be22e3e9a1d88962bf305da8/trace 



So far, I've tried the following with very little effect:
* Disable HPET on the guest
* Enable hv_relaxed, hv_vapic, hv_spinlocks
* Enable SR-IOV
* Pin vCPUs to physical CPUs
* Forcing x2apic enabled in the guest (bcdedit /set x2apicpolicy yes)
* bcdedit /set useplatformclock yes and no


Any suggestions as to what I can do to get better performance out of 
this guest?  Or reasons why I'm seeing such high kernel CPU usage with it?



I've done some additional research on this, and I believe that 'kvm_pio: 
pio_read at 0xb008 size 4 count 1' is related to Windows trying to read 
the PM timer.  This timer appears to use the TSC in some cases (I 
think).  I found this patchset: 
http://www.spinics.net/lists/kvm/msg91214.html which doesn't appear to 
be applied yet.  Does it seem reasonable that this patchset would 
eliminate the need for Windows to read from the PM timer continuously?



Is fallback vhost_net to qemu for live migrate available?

2013-08-26 Thread Qin Chuanyu

Hi all

I am participating in a project that is trying to port vhost_net to Xen.

By changing the memory copy and notification mechanisms, virtio-net 
with vhost_net can currently run on Xen with good performance. TCP receive 
throughput of a single vNIC went from 2.77 Gbps up to 6 Gbps. On the VM 
receive side, I replaced grant_copy with grant_map + memcopy, which 
effectively reduces the cost of the grant_table spin_lock in dom0, so the 
whole-server TCP performance went from 5.33 Gbps up to 9.5 Gbps.


Now I am considering live migration of vhost_net on Xen. vhost_net uses 
vhost_log for live migration on KVM, but qemu on Xen does not manage the 
whole memory of the VM. So I am trying to fall back the datapath from 
vhost_net to qemu while doing live migration, and switch the datapath from 
qemu back to vhost_net after the VM has migrated to the new server.

My question is:
	Why doesn't vhost_net do the same fallback for live migration on 
KVM, instead of using vhost_log to mark dirty pages?
	Is there any fundamental flaw in the idea of falling back the 
datapath from vhost_net to qemu for live migration?


Any questions about the details of vhost_net on Xen are welcome.

Thanks




Investment.

2013-08-26 Thread Mr Peter Komo



Dear Sir/Madam,

Please note that my client would like to invest in your country and if you
can assist us to invest in a profitable areas that would yield profits
kindly get back to me for a detailed information on how to proceed with
this project.

If you are really interested to assist do reply  through this email
peter.ko...@gmail.com as soon as you receive this email for more details.

Thank you and waiting for your response.

Regards,

Mr Peter Komo.



Re: [PATCH 02/10] KVM: PPC: reserve a capability number for multitce support

2013-08-26 Thread Benjamin Herrenschmidt
On Mon, 2013-08-26 at 15:37 +0300, Gleb Natapov wrote:
  Gleb, any chance you can put this (and the next one) into a tree to
  lock in the numbers ?
  
 Applied it. Sorry for slow response, was on vacation and am still going
 through the email backlog.

Thanks. Since it's not in a topic branch that I can pull, I'm going to
just cherry-pick them. However, they are in your queue branch, not
next branch. Should I still assume this is a stable branch and that
the numbers aren't going to change ?

Cheers,
Ben.




Re: [PATCH 02/10] KVM: PPC: reserve a capability number for multitce support

2013-08-26 Thread Benjamin Herrenschmidt
On Tue, 2013-08-27 at 14:19 +1000, Benjamin Herrenschmidt wrote:
 On Mon, 2013-08-26 at 15:37 +0300, Gleb Natapov wrote:
   Gleb, any chance you can put this (and the next one) into a tree to
   lock in the numbers ?
   
  Applied it. Sorry for slow response, was on vacation and am still going
  through the email backlog.
 
 Thanks. Since it's not in a topic branch that I can pull, I'm going to
 just cherry-pick them. However, they are in your queue branch, not
 next branch. Should I still assume this is a stable branch and that
 the numbers aren't going to change ?

Oh and Alexey mentions that there are two capabilities and you only
applied one :-)

Cheers,
Ben.




Re: Is fallback vhost_net to qemu for live migrate available?

2013-08-26 Thread Michael S. Tsirkin
On Tue, Aug 27, 2013 at 11:32:31AM +0800, Qin Chuanyu wrote:
 Hi all
 
 I am participating in a project that is trying to port vhost_net to Xen.
 
 By changing the memory copy and notification mechanisms, virtio-net
 with vhost_net can currently run on Xen with good performance. TCP
 receive throughput of a single vNIC went from 2.77 Gbps up to 6 Gbps.
 On the VM receive side, I replaced grant_copy with grant_map +
 memcopy, which effectively reduces the cost of the grant_table spin_lock
 in dom0, so the whole-server TCP performance went from 5.33 Gbps up to
 9.5 Gbps.
 
 Now I am considering live migration of vhost_net on Xen. vhost_net
 uses vhost_log for live migration on KVM, but qemu on Xen does not
 manage the whole memory of the VM. So I am trying to fall back the
 datapath from vhost_net to qemu while doing live migration, and switch
 the datapath from qemu back to vhost_net after the VM has migrated to
 the new server.
 
 My question is:
   Why doesn't vhost_net do the same fallback for live migration
 on KVM, instead of using vhost_log to mark dirty pages?
   Is there any fundamental flaw in the idea of falling back the
 datapath from vhost_net to qemu for live migration?
 
 Any questions about the details of vhost_net on Xen are welcome.
 
 Thanks
 

It should work, in practice.

However, one issue with this approach that I see is that you
are running two instances of virtio-net on the host:
qemu and vhost-net, doubling your security surface
for guest to host attack.

I don't exactly see why it matters that qemu doesn't manage
the whole memory of a VM - vhost only needs to log
memory writes that it performs.
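
To illustrate what "logging the writes it performs" can look like, here is a
minimal sketch of a dirty bitmap with one bit per page. It is not the vhost
implementation: LOG_PAGE_SIZE, log_write() and log_test_page() are
hypothetical names, and the 4 KiB granularity is an assumption. The point is
only that the writer needs the address and length of each write, not
ownership of the whole guest memory.

```c
#include <stdint.h>

#define LOG_PAGE_SIZE 0x1000ULL   /* assumed 4 KiB logging granularity */

/*
 * Mark every page touched by a write of len bytes at guest address addr.
 * bitmap must hold at least bitmap_pages bits; pages beyond that are ignored.
 */
static void log_write(uint8_t *bitmap, uint64_t bitmap_pages,
		      uint64_t addr, uint64_t len)
{
	uint64_t first, last, page;

	if (len == 0)
		return;

	first = addr / LOG_PAGE_SIZE;
	last = (addr + len - 1) / LOG_PAGE_SIZE;

	for (page = first; page <= last && page < bitmap_pages; page++)
		bitmap[page / 8] |= (uint8_t)(1u << (page % 8));
}

/* Returns 1 if the page has been marked dirty, 0 otherwise. */
static int log_test_page(const uint8_t *bitmap, uint64_t page)
{
	return (bitmap[page / 8] >> (page % 8)) & 1;
}
```

A write of 0x20 bytes at address 0x1ff0 crosses a page boundary, so it marks
pages 1 and 2 and leaves page 0 and page 3 clean.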



Re: [PATCH 2/2] ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flag

2013-08-26 Thread Paolo Bonzini
Il 25/08/2013 17:04, Alexander Graf ha scritto:
 
 On 24.08.2013, at 21:14, Yann Droneaud wrote:
 
 KVM uses anon_inode_get() to allocate file descriptors as part
 of some of its ioctls. But those ioctls are lacking a flag argument
 allowing userspace to choose options for the newly opened file descriptor.

 In such case it's advised to use O_CLOEXEC by default so that
 userspace is allowed to choose, without race, if the file descriptor
 is going to be inherited across exec().

 This patch set O_CLOEXEC flag on all file descriptors created
 with anon_inode_getfd() to not leak file descriptors across exec().

 Signed-off-by: Yann Droneaud ydrone...@opteya.com
 Link: http://lkml.kernel.org/r/cover.1377372576.git.ydrone...@opteya.com
 
 Reviewed-by: Alexander Graf ag...@suse.de
 
 Would it make sense to simply inherit the O_CLOEXEC flag from the
 parent kvm fd instead? That would give user space the power to keep
 fds across exec() if it wants to.

Does it make sense to use non-O_CLOEXEC file descriptors with KVM at
all?  Besides fork() not being supported by KVM, as described in
Documentation/virtual/kvm/api.txt, the VMAs of the parent process go
away as soon as you exec().  I'm not sure how you can use the inherited
file descriptor in a sensible way after exec().

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flag

2013-08-26 Thread Yann Droneaud

Le 26.08.2013 09:39, Paolo Bonzini a écrit :

Il 25/08/2013 17:04, Alexander Graf ha scritto:

On 24.08.2013, at 21:14, Yann Droneaud wrote:



This patch set O_CLOEXEC flag on all file descriptors created
with anon_inode_getfd() to not leak file descriptors across exec().

Signed-off-by: Yann Droneaud ydrone...@opteya.com
Link: 
http://lkml.kernel.org/r/cover.1377372576.git.ydrone...@opteya.com


Reviewed-by: Alexander Graf ag...@suse.de

Would it make sense to simply inherit the O_CLOEXEC flag from the
parent kvm fd instead? That would give user space the power to keep
fds across exec() if it wants to.


Does it make sense to use non-O_CLOEXEC file descriptors with KVM at
all?  Besides fork() not being supported by KVM, as described in
Documentation/virtual/kvm/api.txt, the VMAs of the parent process go
away as soon as you exec().  I'm not sure how you can use the inherited
file descriptor in a sensible way after exec().



Sounds a lot like InfiniBand subsystem behavior: IB file descriptors
are of no use across exec() since memory mappings tied to those fds
won't be available in the new process:

https://lkml.org/lkml/2013/7/8/380
http://mid.gmane.org/f58540dc64fec1ac0e496dfcd3cc1...@meuh.org

Regards.

--
Yann Droneaud
OPTEYA



Re: [PATCH 2/2] ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flag

2013-08-26 Thread Paolo Bonzini
Il 26/08/2013 10:23, Yann Droneaud ha scritto:
 
 Sounds a lot like InfiniBand subsystem behavior: IB file descriptors
 are of no use across exec() since memory mappings tied to those fds
 won't be available in the new process:
 
 https://lkml.org/lkml/2013/7/8/380
 http://mid.gmane.org/f58540dc64fec1ac0e496dfcd3cc1...@meuh.org

Yes, it is very similar.

Paolo


Re: [PATCH 0/2] kvm: use anon_inode_getfd() with O_CLOEXEC flag

2013-08-26 Thread Gleb Natapov
On Sat, Aug 24, 2013 at 10:14:06PM +0200, Yann Droneaud wrote:
 Hi,
 
 Following a patchset asking to change calls to get_unused_flag() [1]
 to use O_CLOEXEC, Alex Williamson [2][3] decided to change VFIO
 to use the flag.
 
 Since it's a related subsystem to KVM, using O_CLOEXEC for
 file descriptors created by KVM might be applicable too.
 
 I'm suggesting to change calls to anon_inode_getfd() to use O_CLOEXEC
 as default flag.
 
 This patchset should be reviewed to not break existing userspace program.
 
 BTW, if it's not applicable, I would suggest that new ioctls be added to
 KVM subsystem, those ioctls would have a flag field added to their 
 arguments.
 Such flag would let userspace choose the open flag to use.
 See for example other APIs using anon_inode_getfd() such as fanotify,
 inotify, signalfd and timerfd.
 
 You might be interested to read:
 
 - Secure File Descriptor Handling (Ulrich Drepper, 2008)
   http://udrepper.livejournal.com/20407.html
 
 - Excuse me son, but your code is leaking !!! (Dan Walsh, March 2012) 
   http://danwalsh.livejournal.com/53603.html
 
Applied, thanks.

 Regards.
 
 [1] http://lkml.kernel.org/r/cover.1376327678.git.ydrone...@opteya.com
 [2] http://lkml.kernel.org/r/1377186804.25163.17.ca...@ul30vt.home
 [3] http://lkml.kernel.org/r/20130822171744.1297.13711.st...@bling.home
 
 Yann Droneaud (2):
   kvm: use anon_inode_getfd() with O_CLOEXEC flag
   ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flag
 
  arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +-
  arch/powerpc/kvm/book3s_64_vio.c| 2 +-
  arch/powerpc/kvm/book3s_hv.c| 2 +-
  virt/kvm/kvm_main.c | 6 +++---
  4 files changed, 6 insertions(+), 6 deletions(-)
 
 -- 
 1.8.3.1

--
Gleb.


Re: [PATCH] powerpc/kvm: Handle the boundary condition correctly

2013-08-26 Thread Alexander Graf

On 26.08.2013, at 05:28, Aneesh Kumar K.V wrote:

 Alexander Graf ag...@suse.de writes:
 
 On 23.08.2013, at 04:31, Aneesh Kumar K.V wrote:
 
 Alexander Graf ag...@suse.de writes:
 
 On 22.08.2013, at 12:37, Aneesh Kumar K.V wrote:
 
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 Isn't this you?
 
 Yes. The patches are generated using git format-patch and sent by
 git send-email. That's how it always created patches for me. I am not sure 
 if
 there is a config I can change to avoid having From:
 
 
 
 We should be able to copy upto count bytes
 
 Why?
 
 
 Without this we end up doing
 
 +struct kvm_get_htab_buf {
 +struct kvm_get_htab_header header;
 +/*
 + * Older kernel required one extra byte.
 + */
 +unsigned long hpte[3];
 +} hpte_buf;
 
 
 even though we are only looking for one hpte entry.
 
 Ok, please give me an example with real numbers and why it breaks.
 
 
 http://mid.gmane.org/1376995766-16526-4-git-send-email-aneesh.ku...@linux.vnet.ibm.com
 
 
 Didn't quite get what you are looking for. As explained before, we now
 need to pass an array with array size 3 even though we know we need to
 read only 2 entries because kernel doesn't loop correctly.

But we need to do that regardless, because newer QEMU needs to be able to run 
on older kernels, no?


Alex



[PATCH 2/3] KVM: PPC: Book3S HV: Save/restore SIAR and SDAR along with other PMU registers

2013-08-26 Thread Paul Mackerras
Currently we are not saving and restoring the SIAR and SDAR registers in
the PMU (performance monitor unit) on guest entry and exit.  The result
is that performance monitoring tools in the guest could get false
information about where a program was executing and what data it was
accessing at the time of a performance monitor interrupt.  This fixes
it by saving and restoring these registers along with the other PMU
registers on guest entry/exit.

This also provides a way for userspace to access these values for a
vcpu via the one_reg interface.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_host.h |  2 ++
 arch/powerpc/kernel/asm-offsets.c   |  2 ++
 arch/powerpc/kvm/book3s_hv.c| 12 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 12 
 4 files changed, 28 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 3328353..91b833d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -498,6 +498,8 @@ struct kvm_vcpu_arch {
 
u64 mmcr[3];
u32 pmc[8];
+   u64 siar;
+   u64 sdar;
 
 #ifdef CONFIG_KVM_EXIT_TIMING
struct mutex exit_timing_lock;
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index a67c76e..822b6ba 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -506,6 +506,8 @@ int main(void)
DEFINE(VCPU_PRODDED, offsetof(struct kvm_vcpu, arch.prodded));
DEFINE(VCPU_MMCR, offsetof(struct kvm_vcpu, arch.mmcr));
DEFINE(VCPU_PMC, offsetof(struct kvm_vcpu, arch.pmc));
+   DEFINE(VCPU_SIAR, offsetof(struct kvm_vcpu, arch.siar));
+   DEFINE(VCPU_SDAR, offsetof(struct kvm_vcpu, arch.sdar));
DEFINE(VCPU_SLB, offsetof(struct kvm_vcpu, arch.slb));
DEFINE(VCPU_SLB_MAX, offsetof(struct kvm_vcpu, arch.slb_max));
DEFINE(VCPU_SLB_NR, offsetof(struct kvm_vcpu, arch.slb_nr));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2b95c45..9df824f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -771,6 +771,12 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id, 
union kvmppc_one_reg *val)
}
break;
 #endif /* CONFIG_VSX */
+   case KVM_REG_PPC_SIAR:
+   *val = get_reg_val(id, vcpu->arch.siar);
+   break;
+   case KVM_REG_PPC_SDAR:
+   *val = get_reg_val(id, vcpu->arch.sdar);
+   break;
case KVM_REG_PPC_VPA_ADDR:
spin_lock(&vcpu->arch.vpa_update_lock);
*val = get_reg_val(id, vcpu->arch.vpa.next_gpa);
@@ -855,6 +861,12 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id, 
union kvmppc_one_reg *val)
}
break;
 #endif /* CONFIG_VSX */
+   case KVM_REG_PPC_SIAR:
+   vcpu->arch.siar = set_reg_val(id, *val);
+   break;
+   case KVM_REG_PPC_SDAR:
+   vcpu->arch.sdar = set_reg_val(id, *val);
+   break;
case KVM_REG_PPC_VPA_ADDR:
addr = set_reg_val(id, *val);
r = -EINVAL;
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 60dce5b..2e1dd6c 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -198,6 +198,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201)
ld  r6, VCPU_MMCR + 16(r4)
mtspr   SPRN_MMCR1, r5
mtspr   SPRN_MMCRA, r6
+BEGIN_FTR_SECTION
+   ld  r7, VCPU_SIAR(r4)
+   ld  r8, VCPU_SDAR(r4)
+   mtspr   SPRN_SIAR, r7
+   mtspr   SPRN_SDAR, r8
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
mtspr   SPRN_MMCR0, r3
isync
 
@@ -1125,6 +1131,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
std r4, VCPU_MMCR(r9)
std r5, VCPU_MMCR + 8(r9)
std r6, VCPU_MMCR + 16(r9)
+BEGIN_FTR_SECTION
+   mfspr   r7, SPRN_SIAR
+   mfspr   r8, SPRN_SDAR
+   std r7, VCPU_SIAR(r9)
+   std r8, VCPU_SDAR(r9)
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
mfspr   r3, SPRN_PMC1
mfspr   r4, SPRN_PMC2
mfspr   r5, SPRN_PMC3
-- 
1.8.4.rc3



[PATCH 1/3] KVM: PPC: Book3S HV: Add one_reg definitions for more PMU registers

2013-08-26 Thread Paul Mackerras
This adds one_reg register numbers for two performance monitor registers
that exist on POWER7 and later processors (SIAR and SDAR) and three that
will be introduced on POWER8 (MMCR2, MMCRS and SIER).

Signed-off-by: Paul Mackerras pau...@samba.org
---
 Documentation/virtual/kvm/api.txt   | 5 +
 arch/powerpc/include/uapi/asm/kvm.h | 5 +
 2 files changed, 10 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 66dd2aa..8b4d984 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1765,6 +1765,11 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_MMCR0 | 64
   PPC   | KVM_REG_PPC_MMCR1 | 64
   PPC   | KVM_REG_PPC_MMCRA | 64
+  PPC   | KVM_REG_PPC_MMCR2 | 64
+  PPC   | KVM_REG_PPC_MMCRS | 64
+  PPC   | KVM_REG_PPC_SIAR  | 64
+  PPC   | KVM_REG_PPC_SDAR  | 64
+  PPC   | KVM_REG_PPC_SIER  | 64
   PPC   | KVM_REG_PPC_PMC1  | 32
   PPC   | KVM_REG_PPC_PMC2  | 32
   PPC   | KVM_REG_PPC_PMC3  | 32
diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
b/arch/powerpc/include/uapi/asm/kvm.h
index 0fb1a6e..fb0a8a9 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -429,6 +429,11 @@ struct kvm_get_htab_header {
 #define KVM_REG_PPC_MMCR0  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x10)
 #define KVM_REG_PPC_MMCR1  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x11)
 #define KVM_REG_PPC_MMCRA  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x12)
+#define KVM_REG_PPC_MMCR2  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x13)
+#define KVM_REG_PPC_MMCRS  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x14)
+#define KVM_REG_PPC_SIAR   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x15)
+#define KVM_REG_PPC_SDAR   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x16)
+#define KVM_REG_PPC_SIER   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x17)
 
 #define KVM_REG_PPC_PMC1   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x18)
 #define KVM_REG_PPC_PMC2   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x19)
-- 
1.8.4.rc3
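
Editorial illustration (not part of the patch): the definitions above pack an architecture tag, a size code, and a register index into one 64-bit ID. The sketch below shows how such an ID can be decoded. The constant values are restated from include/uapi/linux/kvm.h as best recalled so that the example compiles without kernel headers; the helper names one_reg_size, one_reg_index and one_reg_selfcheck are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Restated from include/uapi/linux/kvm.h (assumed values, check your headers). */
#define KVM_REG_ARCH_MASK   0xff00000000000000ULL
#define KVM_REG_PPC         0x1000000000000000ULL
#define KVM_REG_SIZE_SHIFT  52
#define KVM_REG_SIZE_MASK   0x00f0000000000000ULL
#define KVM_REG_SIZE_U32    0x0020000000000000ULL
#define KVM_REG_SIZE_U64    0x0030000000000000ULL

/* Two IDs as defined in arch/powerpc/include/uapi/asm/kvm.h by this patch. */
#define KVM_REG_PPC_SIAR    (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x15)
#define KVM_REG_PPC_PMC1    (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x18)

/* Hypothetical helper: size in bytes of the register named by id. */
static inline unsigned int one_reg_size(uint64_t id)
{
	return 1u << ((id & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT);
}

/* Hypothetical helper: the per-architecture register index. */
static inline uint64_t one_reg_index(uint64_t id)
{
	return id & ~(KVM_REG_ARCH_MASK | KVM_REG_SIZE_MASK);
}

/* Hypothetical self-check a caller might run once at start-up. */
static inline void one_reg_selfcheck(void)
{
	assert(one_reg_size(KVM_REG_PPC_SIAR) == 8);  /* 64-bit register */
	assert(one_reg_size(KVM_REG_PPC_PMC1) == 4);  /* 32-bit register */
}
```

Userspace would pass such an ID in struct kvm_one_reg to the KVM_GET_ONE_REG or KVM_SET_ONE_REG ioctl, together with a buffer of the decoded size.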

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] KVM: PPC: Book3S HV: Implement timebase offset for guests

2013-08-26 Thread Paul Mackerras
This allows guests to have a different timebase origin from the host.
This is needed for migration, where a guest can migrate from one host
to another and the two hosts might have a different timebase origin.
However, the timebase seen by the guest must not go backwards, and
should go forwards only by a small amount corresponding to the time
taken for the migration.

Therefore this provides a new per-vcpu value accessed via the one_reg
interface using the new KVM_REG_PPC_TB_OFFSET identifier.  This value
defaults to 0 and is not modified by KVM.  On entering the guest, this
value is added onto the timebase, and on exiting the guest, it is
subtracted from the timebase.

This is only supported for recent POWER hardware which has the TBU40
(timebase upper 40 bits) register.  Writing to the TBU40 register only
alters the upper 40 bits of the timebase, leaving the lower 24 bits
unchanged.  This provides a way to modify the timebase for guest
migration without disturbing the synchronization of the timebase
registers across CPU cores.  This means that userspace must supply
a value for the offset that has zeroes in the lower 24 bits.  If the
lower 24 bits are non-zero, they are ignored and taken as zeroes.
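
Editorial illustration (not part of the patch): the arithmetic described above can be modelled in plain C. This is only a sketch of the intended relationship between host and guest timebase values; the real code writes the TBU40 register in assembly and has to cope with the lower 24 bits advancing while it does so, which this model ignores. All names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A TBU40 write only changes the upper 40 bits of the 64-bit timebase. */
#define TB_LOW_BITS  24
#define TB_LOW_MASK  ((UINT64_C(1) << TB_LOW_BITS) - 1)

/* The offset as it takes effect: lower 24 bits are treated as zero. */
static inline uint64_t tb_offset_effective(uint64_t tb_offset)
{
	return tb_offset & ~TB_LOW_MASK;
}

/* On guest entry the offset is added onto the timebase. */
static inline uint64_t host_to_guest_tb(uint64_t host_tb, uint64_t tb_offset)
{
	return host_tb + tb_offset_effective(tb_offset);
}

/* On guest exit the offset is subtracted again. */
static inline uint64_t guest_to_host_tb(uint64_t guest_tb, uint64_t tb_offset)
{
	return guest_tb - tb_offset_effective(tb_offset);
}

/* Hypothetical self-check: entry followed by exit restores the host value. */
static inline void tb_offset_selfcheck(void)
{
	uint64_t host = UINT64_C(0x1000123);
	uint64_t off = UINT64_C(0x5000abc);  /* low 24 bits will be ignored */

	assert(guest_to_host_tb(host_to_guest_tb(host, off), off) == host);
}
```

Because the effective offset has zeroes in its lower 24 bits, the lower 24 bits of the guest timebase always equal those of the host, which is what keeps the cores synchronized.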

Timebase values stored in KVM structures (struct kvm_vcpu, struct
kvmppc_vcore, etc.) are stored as host timebase values.  The timebase
values in the dispatch trace log need to be guest timebase values,
however, since that is read directly by the guest.  This moves the
setting of vcpu->arch.dec_expires on guest exit to a point after we
have restored the host timebase so that vcpu->arch.dec_expires is a
host timebase value.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 Documentation/virtual/kvm/api.txt       |  1 +
 arch/powerpc/include/asm/kvm_host.h     |  2 ++
 arch/powerpc/include/asm/reg.h          |  1 +
 arch/powerpc/include/uapi/asm/kvm.h     |  3 ++
 arch/powerpc/kernel/asm-offsets.c       |  1 +
 arch/powerpc/kvm/book3s_hv.c            |  8 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 50 +++--
 7 files changed, 56 insertions(+), 10 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 8b4d984..88f4653 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1815,6 +1815,7 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_TLB3PS   | 32
   PPC   | KVM_REG_PPC_EPTCFG   | 32
   PPC   | KVM_REG_PPC_ICP_STATE | 64
+  PPC   | KVM_REG_PPC_TB_OFFSET | 64
 
 ARM registers are mapped using the lower 32 bits.  The upper 16 of that
 is the register group type, or coprocessor number:
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 91b833d..702d88b 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -607,6 +607,8 @@ struct kvm_vcpu_arch {
    spinlock_t tbacct_lock;
    u64 busy_stolen;
    u64 busy_preempt;
+
+   u64 tb_offset;  /* guest timebase - host timebase */
 #endif
 };
 
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 4a9e408..72f8798 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -243,6 +243,7 @@
 #define SPRN_TBRU  0x10D   /* Time Base Read Upper Register (user, R/O) */
 #define SPRN_TBWL  0x11C   /* Time Base Lower Register (super, R/W) */
 #define SPRN_TBWU  0x11D   /* Time Base Upper Register (super, R/W) */
+#define SPRN_TBU40 0x11E   /* Timebase upper 40 bits (hyper, R/W) */
 #define SPRN_SPURR 0x134   /* Scaled PURR */
 #define SPRN_HSPRG0 0x130   /* Hypervisor Scratch 0 */
 #define SPRN_HSPRG1 0x131   /* Hypervisor Scratch 1 */
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index fb0a8a9..9935321 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -504,6 +504,9 @@ struct kvm_get_htab_header {
 #define KVM_REG_PPC_TLB3PS (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x9a)
 #define KVM_REG_PPC_EPTCFG (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x9b)
 
+/* Timebase offset */
+#define KVM_REG_PPC_TB_OFFSET  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x9c)
+
 /* PPC64 eXternal Interrupt Controller Specification */
 #define KVM_DEV_XICS_GRP_SOURCES   1   /* 64-bit source attributes */
 
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 822b6ba..62acafd 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -488,6 +488,7 @@ int main(void)
    DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
    DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
    DEFINE(VCPU_VPA_DIRTY, offsetof(struct kvm_vcpu, arch.vpa.dirty));
+   DEFINE(VCPU_TB_OFFSET, offsetof(struct kvm_vcpu, arch.tb_offset));
 #endif
 #ifdef CONFIG_PPC_BOOK3S
DEFINE(VCPU_VCPUID, offsetof(struct kvm_vcpu, vcpu_id));

[PATCH 0/3] Some fixes for PPC HV-style KVM

2013-08-26 Thread Paul Mackerras
Here are 3 patches that add two PMU (performance monitor unit)
registers to the set being context-switched on guest entry and exit,
and implement a per-guest timebase offset that is needed when we
migrate a guest from one host to another that has a different timebase
origin.  The first patch just adds some one_reg register definitions
for extra PMU registers, including some that exist on POWER8.  These
new registers aren't yet handled by the kernel code, but their
definitions are included here so as to reserve the numbers.

These patches are against Alex Graf's kvm-ppc-queue branch.

Paul.


Re: [PATCH] powerpc/kvm: Handle the boundary condition correctly

2013-08-26 Thread Aneesh Kumar K.V
Alexander Graf ag...@suse.de writes:

 On 26.08.2013, at 05:28, Aneesh Kumar K.V wrote:

 Alexander Graf ag...@suse.de writes:
 
 On 23.08.2013, at 04:31, Aneesh Kumar K.V wrote:
 
 Alexander Graf ag...@suse.de writes:
 
 On 22.08.2013, at 12:37, Aneesh Kumar K.V wrote:
 
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 Isn't this you?
 
 Yes. The patches are generated using git format-patch and sent by
 git send-email. That's how it has always created patches for me. I am not
 sure if there is a config I can change to avoid having From:
 
 
 
 We should be able to copy up to count bytes
 
 Why?
 
 
 Without this we end up doing
 
 +struct kvm_get_htab_buf {
 +struct kvm_get_htab_header header;
 +/*
 + * Older kernel required one extra byte.
 + */
 +unsigned long hpte[3];
 +} hpte_buf;
 
 
 even though we are only looking for one hpte entry.
 
 Ok, please give me an example with real numbers and why it breaks.
 
 
 http://mid.gmane.org/1376995766-16526-4-git-send-email-aneesh.ku...@linux.vnet.ibm.com
 
 
 Didn't quite get what you are looking for. As explained before, we now
 need to pass an array of size 3 even though we know we need to
 read only 2 entries, because the kernel doesn't loop correctly.

 But we need to do that regardless, because newer QEMU needs to be able to run 
 on older kernels, no?


Yes. So user space will have to pass an array of size 3. But that should
not prevent us from fixing this, right?

-aneesh
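
Editorial illustration (not part of the thread): the request for "an example with real numbers" can be answered with a small model. It assumes, as the thread implies but does not show, that the older kernel's copy loop compared the bytes needed against count with a strict "<", so a buffer that fits exactly one HPTE is rejected. The 8-byte header layout mirrors struct kvm_get_htab_header and an HPTE is taken to be two 64-bit words; the struct and function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field layout as struct kvm_get_htab_header: 8 bytes in total. */
struct htab_header {
	uint32_t index;
	uint16_t n_valid;
	uint16_t n_invalid;
};

#define HPTE_SIZE (2 * sizeof(uint64_t))  /* one HPTE is two 64-bit words */

/*
 * Hypothetical model of the read loop: how many HPTEs are copied into a
 * user buffer of `count` bytes that starts with one header.
 * inclusive == 0 models the assumed old check (needed <  count),
 * inclusive != 0 models the assumed fixed check (needed <= count).
 */
static size_t hptes_copied(size_t count, int inclusive)
{
	size_t nb = 0;  /* HPTE bytes copied so far */
	size_t n = 0;   /* HPTEs copied so far */

	for (;;) {
		size_t needed = nb + sizeof(struct htab_header) + HPTE_SIZE;

		if (inclusive ? (needed > count) : (needed >= count))
			break;
		nb += HPTE_SIZE;
		n++;
	}
	return n;
}

/* Hypothetical self-check with the numbers discussed in the thread. */
static inline void htab_selfcheck(void)
{
	assert(hptes_copied(24, 0) == 0);  /* header + hpte[2], strict check */
	assert(hptes_copied(32, 0) == 1);  /* header + hpte[3], strict check */
	assert(hptes_copied(24, 1) == 1);  /* header + hpte[2], inclusive check */
}
```

Under this model a buffer of header plus hpte[2] is 8 + 16 = 24 bytes: the strict check copies nothing, while the inclusive check copies the one HPTE that fits. Padding the buffer to hpte[3] (32 bytes) is the workaround that keeps newer userspace working on older kernels.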



Re: [PATCH 02/10] KVM: PPC: reserve a capability number for multitce support

2013-08-26 Thread Gleb Natapov
On Wed, Aug 14, 2013 at 10:51:14AM +1000, Benjamin Herrenschmidt wrote:
 On Thu, 2013-08-01 at 14:44 +1000, Alexey Kardashevskiy wrote:
  This is to reserve a capability number for upcoming support
  of H_PUT_TCE_INDIRECT and H_STUFF_TCE pseries hypercalls
  which support multiple DMA map/unmap operations per call.
 
 Gleb, any chance you can put this (and the next one) into a tree to
 lock in the numbers ?
 
Applied it. Sorry for the slow response; I was on vacation and am still
going through the email backlog.

 I've been wanting to apply the whole series to powerpc-next; that
 stuff has been simmering for way too long and is in good enough shape
 imho, but I need the capabilities and ioctl numbers locked in your tree
 first.
 
 Cheers,
 Ben.
 
  Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
  ---
  Changes:
  2013/07/16:
  * changed the number
  
  Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
  ---
   include/uapi/linux/kvm.h | 1 +
   1 file changed, 1 insertion(+)
  
  diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
  index acccd08..99c2533 100644
  --- a/include/uapi/linux/kvm.h
  +++ b/include/uapi/linux/kvm.h
  @@ -667,6 +667,7 @@ struct kvm_ppc_smmu_info {
   #define KVM_CAP_PPC_RTAS 91
   #define KVM_CAP_IRQ_XICS 92
   #define KVM_CAP_ARM_EL1_32BIT 93
  +#define KVM_CAP_SPAPR_MULTITCE 94
   
   #ifdef KVM_CAP_IRQ_ROUTING
   
 
 

--
Gleb.


Re: [PATCH 02/10] KVM: PPC: reserve a capability number for multitce support

2013-08-26 Thread Benjamin Herrenschmidt
On Mon, 2013-08-26 at 15:37 +0300, Gleb Natapov wrote:
  Gleb, any chance you can put this (and the next one) into a tree to
  lock in the numbers ?
  
 Applied it. Sorry for the slow response; I was on vacation and am still
 going through the email backlog.

Thanks. Since it's not in a topic branch that I can pull, I'm going to
just cherry-pick them. However, they are in your queue branch, not
next branch. Should I still assume this is a stable branch and that
the numbers aren't going to change ?

Cheers,
Ben.

