Re: [PATCH 0/4 v3] kvm: powerpc: use cache attributes from linux pte

2013-12-18 Thread Alexander Graf

On 18.11.2013, at 19:36, Scott Wood scottw...@freescale.com wrote:

 On Fri, 2013-11-15 at 11:01 +0530, Bharat Bhushan wrote:
 From: Bharat Bhushan bharat.bhus...@freescale.com
 
 v2->v3
  - Returning pte pointer rather than pte as suggested by Scott
 - Also removed PAGE_SPLITTING as this needs to be handled by the caller
 
 v1->v2
 - Removed _PAGE_BUSY loop as suggested by PaulS.
 - Added check for PAGE_SPLITTING
 
 kvm: powerpc: use cache attributes from linux pte
 - 1st patch fixes a bug in booke (details in patch)
 - 2nd patch renames lookup_linux_pte() just for clarity.
   There is no functional change.
 - 3rd patch adds a Linux pte lookup function.
 - 4th patch uses the above defined function and sets up TLB.wimg accordingly
 
 Bharat Bhushan (4):
  kvm: booke: clear host tlb reference flag on guest tlb invalidation
  kvm: book3s: rename lookup_linux_pte() to
lookup_linux_pte_and_update()
  kvm: powerpc: define a linux pte lookup function
  kvm: powerpc: use caching attributes as per linux pte
 
 arch/powerpc/include/asm/kvm_host.h |2 +-
 arch/powerpc/include/asm/pgtable.h  |   21 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |8 +++--
 arch/powerpc/kvm/booke.c|1 +
 arch/powerpc/kvm/e500.h |8 +++--
 arch/powerpc/kvm/e500_mmu_host.c|   55 +++---
 6 files changed, 64 insertions(+), 31 deletions(-)
 
 Reviewed-by: Scott Wood scottw...@freescale.com with patch 4/4
 replaced by v4.

Thanks, applied to kvm-ppc-queue.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nVMX: Leave VMX mode on clearing of feature control MSR

2013-12-18 Thread Paolo Bonzini
Il 17/12/2013 19:57, Jan Kiszka ha scritto:
 When userspace sets MSR_IA32_FEATURE_CONTROL to 0, make sure we leave
 root and non-root mode, fully disabling VMX. The register state of the
 VCPU is undefined after this step, so userspace has to set it to a
 proper state afterward.
 
 This makes it possible to reboot a VM while it is running some hypervisor code.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
 
 Even without a QEMU patch, this already enables system reset - the guest
 is left in such a broken state that it simply triple-faults and resets
 twice. Nevertheless, a QEMU patch will follow.
 
  arch/x86/kvm/vmx.c | 14 ++
  1 file changed, 14 insertions(+)
 
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index f90320b..6a0c2fa 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -2455,6 +2455,8 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 
 msr_index, u64 *pdata)
   return 1;
  }
  
 +static void vmx_leave_nested(struct kvm_vcpu *vcpu);
 +
  static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
  {
  u32 msr_index = msr_info->index;
 @@ -2470,6 +2472,8 @@ static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, 
 struct msr_data *msr_info)
FEATURE_CONTROL_LOCKED)
   return 0;
  to_vmx(vcpu)->nested.msr_ia32_feature_control = data;
 + if (host_initialized && data == 0)
 + vmx_leave_nested(vcpu);
   return 1;
   }
  
 @@ -8488,6 +8492,16 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu)
  }
  
  /*
 + * Forcibly leave nested mode in order to be able to reset the VCPU later on.
 + */
 +static void vmx_leave_nested(struct kvm_vcpu *vcpu)
 +{
 + if (is_guest_mode(vcpu))
 + nested_vmx_vmexit(vcpu);
 + free_nested(to_vmx(vcpu));
 +}
 +
 +/*
   * L1's failure to enter L2 is a subset of a normal exit, as explained in
   * 23.7 VM-entry failures during or after loading guest state (this also
   * lists the acceptable exit-reason and exit-qualification parameters).
 

Applied, thanks!

Paolo


Re: [PATCH uq/master] kvm: x86: Separately write feature control MSR on reset

2013-12-18 Thread Paolo Bonzini
Il 17/12/2013 20:05, Jan Kiszka ha scritto:
 If the guest is running in nested mode on system reset, clearing the
 feature MSR signals the kernel to leave this mode. Recent kernels
 process this properly, but leave the VCPU state undefined afterward. It
 is the job of userspace to bring it to a proper shape. Therefore, write
 this specific MSR first so that no state transfer gets lost.
 
 This allows cleanly resetting a guest with VMX in use.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  target-i386/kvm.c | 32 
  1 file changed, 28 insertions(+), 4 deletions(-)
 
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 1188482..ec51447 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -1104,6 +1104,25 @@ static int kvm_put_tscdeadline_msr(X86CPU *cpu)
   return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, &msr_data);
  }
  
 +/*
 + * Provide a separate write service for the feature control MSR in order to
 + * kick the VCPU out of VMXON or even guest mode on reset. This has to be done
 + * before writing any other state because forcibly leaving nested mode
 + * invalidates the VCPU state.
 + */
 +static int kvm_put_msr_feature_control(X86CPU *cpu)
 +{
 +struct {
 +struct kvm_msrs info;
 +struct kvm_msr_entry entry;
 +} msr_data;
 +
 +kvm_msr_entry_set(&msr_data.entry, MSR_IA32_FEATURE_CONTROL,
 +  cpu->env.msr_ia32_feature_control);
 +msr_data.info.nmsrs = 1;
 +return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, &msr_data);
 +}
 +
  static int kvm_put_msrs(X86CPU *cpu, int level)
  {
   CPUX86State *env = cpu->env;
 @@ -1204,10 +1223,8 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
   if (cpu->hyperv_vapic) {
  kvm_msr_entry_set(msrs[n++], HV_X64_MSR_APIC_ASSIST_PAGE, 0);
  }
 -if (has_msr_feature_control) {
 -kvm_msr_entry_set(msrs[n++], MSR_IA32_FEATURE_CONTROL,
 -  env->msr_ia32_feature_control);
 -}
 +/* Note: MSR_IA32_FEATURE_CONTROL is written separately, see
 + *   kvm_put_msr_feature_control. */
  }
   if (env->mcg_cap) {
  int i;
 @@ -1801,6 +1818,13 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
  
  assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
  
 +if (level >= KVM_PUT_RESET_STATE && has_msr_feature_control) {
 +ret = kvm_put_msr_feature_control(x86_cpu);
 +if (ret < 0) {
 +return ret;
 +}
 +}
 +
  ret = kvm_getput_regs(x86_cpu, 1);
   if (ret < 0) {
  return ret;
 

Applied, thanks!



Re: [PATCH] KVM: nVMX: Support direct APIC access from L2

2013-12-18 Thread Paolo Bonzini
Il 16/12/2013 12:55, Jan Kiszka ha scritto:
 It's a pathological case, but still a valid one: If L1 disables APIC
 virtualization and also allows L2 to directly write to the APIC page, we
 have to forcibly enable APIC virtualization while in L2 if the in-kernel
 APIC is in use.
 
 This allows running the direct interrupt test case in the vmx unit test
 without x2APIC.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  arch/x86/kvm/vmx.c | 5 +
  1 file changed, 5 insertions(+)
 
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index da04247..7c3ea63 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -7771,6 +7771,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
 struct vmcs12 *vmcs12)
   else
   vmcs_write64(APIC_ACCESS_ADDR,
 page_to_phys(vmx->nested.apic_access_page));
 + } else if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
 + exec_control |=
 + SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
 + vmcs_write64(APIC_ACCESS_ADDR,
 + page_to_phys(vcpu->kvm->arch.apic_access_page));
   }
  
   vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
 

Applied, thanks.

Paolo


Re: [PATCH 00/15] kvm-unit-tests: Fix and enhance nVMX tests

2013-12-18 Thread Paolo Bonzini
Il 16/12/2013 10:57, Jan Kiszka ha scritto:
 This series contains various smaller fixes and cleanups for the VMX unit
 tests and also a few unrelated corners of the test suite. It ends with a
 rework of the VMX capability test and the addition of a bunch of new
 test cases for interrupt injection, halting and the preemption timer.
 Most of them fail or lock up at the moment. More work is required on KVM
 to fix this; I'm on it.
 
 These apply on top of the vmx branch. They are also available at
 
 git://git.kiszka.org/kvm-unit-tests queue

I finally merged the vmx branch into master, and applied 1-13 on top.

I'm in the process of installing newer hardware, and for now I'm stuck
with an old Core 2.  So I'm holding up patches 14-15 until after I get
the new machine; I pushed them to the vmx branch.

Paolo


Re: [PATCH] powerpc: book3s: kvm: Don't abuse host r2 in exit path

2013-12-18 Thread Alexander Graf

On 18.12.2013, at 08:35, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com 
wrote:

 
 Hi Alex,
 
 Any update on this? We need this to go into 3.13.

Thanks, applied to for-3.13.


Alex



Re: [PATCH] KVM: PPC: Book3S HV: Don't drop low-order page address bits

2013-12-18 Thread Alexander Graf

On 16.12.2013, at 03:31, Paul Mackerras pau...@samba.org wrote:

 Commit caaa4c804fae (KVM: PPC: Book3S HV: Fix physical address
 calculations) unfortunately resulted in some low-order address bits
 getting dropped in the case where the guest is creating a 4k HPTE
 and the host page size is 64k.  By getting the low-order bits from
 hva rather than gpa we miss out on bits 12 - 15 in this case, since
 hva is at page granularity.  This puts the missing bits back in.
 
 Reported-by: Alexey Kardashevskiy a...@ozlabs.ru
 Signed-off-by: Paul Mackerras pau...@samba.org

Thanks, applied to for-3.13.


Alex



Re: Updated Elvis Upstreaming Roadmap

2013-12-18 Thread Michael S. Tsirkin
On Tue, Dec 17, 2013 at 12:04:42PM +0200, Razya Ladelsky wrote:
 Hi,
 
 Thank you all for your comments.
 I'm sorry for taking this long to reply, I was away on vacation.
 
 It was a good, long discussion, many issues were raised, which we'd like 
 to address with the following proposed roadmap for Elvis patches.
 In general, we believe it would be best to start with patches that are 
 as simple as possible, providing the basic Elvis functionality, 
 and attend to the more complicated issues in subsequent patches.
 
 Here's the road map for Elvis patches: 

Thanks for the follow-up. Some suggestions below.
Please note the suggestions below merely represent
thoughts on merging upstream.
If as the first step you are content with keeping this
work as out-of-tree patches, in order to have
the freedom to experiment with interfaces and
performance, please feel free to ignore them.

 1. Shared vhost thread for multiple devices.
 
 The way to go here, we believe, is to start with a patch having a shared 
 vhost thread for multiple devices of the SAME vm.
 The next step/patch may be handling vms belonging to the same cgroup.
 
 Finally, we need to extend the functionality so that the shared vhost 
 thread 
 serves multiple vms (not necessarily belonging to the same cgroup).
 
 There was a lot of discussion about the way to address the enforcement 
 of cgroup policies, and we will consider the various solutions with a 
 future
 patch.

With respect to the upstream kernel,
I'm not sure a bunch of changes just for the sake of guests with
multiple virtual NIC cards makes sense.
And I wonder how this step, in isolation, will affect e.g.
multiqueue workloads.
But I guess if the numbers are convincing, this can be mergeable.

 
 2. Creation of vhost threads
 
 We suggested two ways of controlling the creation and removal of vhost
 threads: 
 - statically determining the maximum number of virtio devices per worker 
 via a kernel module parameter 
 - dynamically: Sysfs mechanism to add and remove vhost threads 
 
 It seems that it would be simplest to take the static approach as
 a first stage. At a second stage (next patch), we'll advance to 
 dynamically 
 changing the number of vhost threads, using the static module parameter 
 only as a default value. 

I'm not sure how independent this is from 1.
With respect to the upstream kernel,
Introducing interfaces (which we'll have to maintain
forever) just for the sake of guests with
multiple virtual NIC cards does not look like a good tradeoff.

So I'm unlikely to merge this upstream without making it useful cross-VM,
and yes this means isolation and accounting with cgroups need to
work properly.

 Regarding cmwq, it is an interesting mechanism, which we need to explore
 further.
 At the moment we prefer not to change the vhost model to use cmwq, as some
 of the issues that were discussed, such as cgroups, are not supported by
 cmwq, and this is adding more complexity.
 However, we'll look further into it, and consider it at a later stage.

Hmm that's still assuming some smart management tool configuring
this correctly.  Can't this be determined automatically depending
on the workload?
This is what the cmwq suggestion was really about: detect
that we need more threads and spawn them.
It's less about sharing the implementation with workqueues -
would be very nice but not a must.



 3. Adding polling mode to vhost 
 
 It is a good idea making polling adaptive based on various factors such as
 the I/O rate, the guest kick overhead (which is the tradeoff of polling),
 or the amount of wasted cycles (cycles we kept polling but no new work was
 added).
 However, as a beginning polling patch, we would prefer having a naive
 polling approach, which could be tuned with later patches.
 

While any polling approach would still need a lot of testing to prove we
don't for example steal CPU from a guest which could be doing other useful
work, given that an exit costs at least 1.5K cycles, in theory it
seems like something that can improve performance.  I'm not sure how
naive we can be without introducing regressions for some workloads.
For example, if we are on the same host CPU, there's no
chance busy waiting will help us make progress.
How about detecting that the VCPU thread that kicked us
is currently running on another CPU, and only polling in
this case?

 4. vhost statistics 
 
 The issue that was raised for the vhost statistics was using ftrace 
 instead of the debugfs mechanism.
 However, looking further into the kvm stat mechanism, we learned that 
 ftrace didn't replace the plain debugfs mechanism, but was used in 
 addition to it.
  
 We propose to continue using debugfs for statistics, in a manner similar 
 to kvm,
 and at some point in the future ftrace can be added to vhost as well.

IMHO, while kvm_stat is a useful script, the best tool
for perf stats is still perf. So I would try to integrate with that.
How it works internally is IMHO less important.

 Does 

Re: [PATCH 2/2] kvmtool/arm: Add option to override Generic Timer frequency

2013-12-18 Thread Robin Murphy

On 17/12/13 20:39, Alexander Graf wrote:


On 17.12.2013, at 19:31, Robin Murphy robin.mur...@arm.com wrote:


Some platforms have secure firmware which does not correctly set the
CNTFRQ register on boot, preventing the use of the Generic Timer.
This patch allows mirroring the necessary host workaround by specifying
the clock-frequency property in the guest DT.

This should only be considered a means of KVM bring-up on such systems,
such that vendors may be convinced to properly implement their firmware
to support the virtualisation capabilities of their hardware.

Signed-off-by: Robin Murphy robin.mur...@arm.com
Acked-by: Will Deacon will.dea...@arm.com


How does it encourage a vendor to properly implement their firmware if there's 
a workaround?


Alex




Hi Alex,

In short, by enabling the users to create the demand. Yes, like any 
workaround there's potential for abuse, but having *something* that 
makes it work is the difference between "I want virtualisation"[1] and 
"Dear vendor, I've tried virtualisation on your chip/board and it's 
great, but it tells me I need new firmware, where do I get that?"


Having the specs tell them what to do clearly isn't sufficient, so let's 
give the integrators and consumers incentive to shout at them too. The 
sooner proper support is commonplace and we can deprecate 
clock-frequency hacks altogether, the better.


Robin.

[1] 
http://www.theregister.co.uk/2013/12/12/virtualisation_on_mobile_phones_is_coming/




Re: [PATCH 2/2] kvmtool/arm: Add option to override Generic Timer frequency

2013-12-18 Thread Alexander Graf

On 18.12.2013, at 14:44, Robin Murphy robin.mur...@arm.com wrote:

 On 17/12/13 20:39, Alexander Graf wrote:
 
 On 17.12.2013, at 19:31, Robin Murphy robin.mur...@arm.com wrote:
 
 Some platforms have secure firmware which does not correctly set the
 CNTFRQ register on boot, preventing the use of the Generic Timer.
 This patch allows mirroring the necessary host workaround by specifying
 the clock-frequency property in the guest DT.
 
 This should only be considered a means of KVM bring-up on such systems,
 such that vendors may be convinced to properly implement their firmware
 to support the virtualisation capabilities of their hardware.
 
 Signed-off-by: Robin Murphy robin.mur...@arm.com
 Acked-by: Will Deacon will.dea...@arm.com
 
 How does it encourage a vendor to properly implement their firmware if 
 there's a workaround?
 
 
 Alex
 
 
 
 Hi Alex,
 
 In short, by enabling the users to create the demand. Yes, like any 
 workaround there's potential for abuse, but having *something* that makes it 
 work is the difference between "I want virtualisation"[1] and "Dear vendor, 
 I've tried virtualisation on your chip/board and it's great, but it tells me 
 I need new firmware, where do I get that?"
 
 Having the specs tell them what to do clearly isn't sufficient, so let's give 
 the integrators and consumers incentive to shout at them too. The sooner 
 proper support is commonplace and we can deprecate clock-frequency hacks 
 altogether, the better.

Oh, I'm all for hacks. But please don't fall under the illusion that this will 
push vendors to fix their firmware. It will have the opposite effect - vendors 
will just point to the workaround and say "but it works" :).

Either way, this hack is basically required because you can't program CNTFRQ 
because it's controlled by secure firmware, right? So the host OS already needs 
to know about this and probably does have a clock-frequency value in its 
device tree entry already to know how fast its clock ticks.

Couldn't we search for that host entry and automatically pass it into the guest 
if it's there? That way the whole thing becomes seamless and even less of an 
issue.


Alex



[Bug 67061] [BISECTED][Nested]L2 guest boot up fail(kvm on kvm).

2013-12-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=67061

Alan a...@lxorguk.ukuu.org.uk changed:

   What|Removed |Added

 CC||a...@lxorguk.ukuu.org.uk
Summary|[Nested]L2 guest boot up|[BISECTED][Nested]L2 guest
   |fail(kvm on kvm).   |boot up fail(kvm on kvm).

--- Comment #2 from Alan a...@lxorguk.ukuu.org.uk ---
That's a very odd bisection point, it must be said.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


[PULL 08/12] KVM: PPC: Book3S: PR: Make svcpu -> vcpu store preempt savvy

2013-12-18 Thread Alexander Graf
As soon as we get back to our highmem handler in virtual address
space we may get preempted. Today the reason we can get preempted is
that we replay interrupts and all the lazy logic thinks we have
interrupts enabled.

However, it's not hard to make the code interruptible and that way
we can enable and handle interrupts even earlier.

This fixes random guest crashes that happened with CONFIG_PREEMPT=y
for me.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s_asm.h |  1 +
 arch/powerpc/kvm/book3s_pr.c  | 22 ++
 2 files changed, 23 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h 
b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 0bd9348..412b2f3 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -106,6 +106,7 @@ struct kvmppc_host_state {
 };
 
 struct kvmppc_book3s_shadow_vcpu {
+   bool in_use;
ulong gpr[14];
u32 cr;
u32 xer;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index fe14ca3..5b9e906 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -66,6 +66,7 @@ static void kvmppc_core_vcpu_load_pr(struct kvm_vcpu *vcpu, 
int cpu)
struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
memcpy(svcpu->slb, to_book3s(vcpu)->slb_shadow, sizeof(svcpu->slb));
svcpu->slb_max = to_book3s(vcpu)->slb_shadow_max;
+   svcpu->in_use = 0;
svcpu_put(svcpu);
 #endif
vcpu->cpu = smp_processor_id();
@@ -78,6 +79,9 @@ static void kvmppc_core_vcpu_put_pr(struct kvm_vcpu *vcpu)
 {
 #ifdef CONFIG_PPC_BOOK3S_64
struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
+   if (svcpu->in_use) {
+   kvmppc_copy_from_svcpu(vcpu, svcpu);
+   }
memcpy(to_book3s(vcpu)->slb_shadow, svcpu->slb, sizeof(svcpu->slb));
to_book3s(vcpu)->slb_shadow_max = svcpu->slb_max;
svcpu_put(svcpu);
@@ -110,12 +114,26 @@ void kvmppc_copy_to_svcpu(struct 
kvmppc_book3s_shadow_vcpu *svcpu,
svcpu->ctr = vcpu->arch.ctr;
svcpu->lr  = vcpu->arch.lr;
svcpu->pc  = vcpu->arch.pc;
+   svcpu->in_use = true;
 }
 
 /* Copy data touched by real-mode code from shadow vcpu back to vcpu */
 void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu,
struct kvmppc_book3s_shadow_vcpu *svcpu)
 {
+   /*
+* vcpu_put would just call us again because in_use hasn't
+* been updated yet.
+*/
+   preempt_disable();
+
+   /*
+* Maybe we were already preempted and synced the svcpu from
+* our preempt notifiers. Don't bother touching this svcpu then.
+*/
+   if (!svcpu->in_use)
+   goto out;

vcpu->arch.gpr[0] = svcpu->gpr[0];
vcpu->arch.gpr[1] = svcpu->gpr[1];
vcpu->arch.gpr[2] = svcpu->gpr[2];
@@ -139,6 +157,10 @@ void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu,
vcpu->arch.fault_dar   = svcpu->fault_dar;
vcpu->arch.fault_dsisr = svcpu->fault_dsisr;
vcpu->arch.last_inst   = svcpu->last_inst;
+   svcpu->in_use = false;
+
+out:
+   preempt_enable();
 }
 
 static int kvmppc_core_check_requests_pr(struct kvm_vcpu *vcpu)
-- 
1.8.1.4



[PULL 06/12] KVM: PPC: Book3S: PR: Don't clobber our exit handler id

2013-12-18 Thread Alexander Graf
We call a C helper to save all svcpu fields into our vcpu. The C
ABI states that r12 is considered volatile. However, we keep our
exit handler id in r12 currently.

So we need to save it away into a non-volatile register instead
that definitely does get preserved across the C call.

This bug usually didn't hit anyone yet since gcc is smart enough
to generate code that doesn't even need r12 which means it stayed
identical throughout the call by sheer luck. But we can't rely on
that.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_interrupts.S | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_interrupts.S 
b/arch/powerpc/kvm/book3s_interrupts.S
index f4dd041..5e7cb32 100644
--- a/arch/powerpc/kvm/book3s_interrupts.S
+++ b/arch/powerpc/kvm/book3s_interrupts.S
@@ -132,9 +132,17 @@ kvm_start_lightweight:
 *
 */
 
+   PPC_LL  r3, GPR4(r1)/* vcpu pointer */
+
+   /*
+* kvmppc_copy_from_svcpu can clobber volatile registers, save
+* the exit handler id to the vcpu and restore it from there later.
+*/
+   stw r12, VCPU_TRAP(r3)
+
/* Transfer reg values from shadow vcpu back to vcpu struct */
/* On 64-bit, interrupts are still off at this point */
-   PPC_LL  r3, GPR4(r1)/* vcpu pointer */
+
GET_SHADOW_VCPU(r4)
bl  FUNC(kvmppc_copy_from_svcpu)
nop
@@ -151,7 +159,6 @@ kvm_start_lightweight:
 */
ld  r3, PACA_SPRG3(r13)
mtspr   SPRN_SPRG3, r3
-
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
/* R7 = vcpu */
@@ -177,7 +184,7 @@ kvm_start_lightweight:
PPC_STL r31, VCPU_GPR(R31)(r7)
 
/* Pass the exit number as 3rd argument to kvmppc_handle_exit */
-   mr  r5, r12
+   lwz r5, VCPU_TRAP(r7)
 
/* Restore r3 (kvm_run) and r4 (vcpu) */
REST_2GPRS(3, r1)
-- 
1.8.1.4



[PULL 01/12] KVM: PPC: Book3S HV: Fix physical address calculations

2013-12-18 Thread Alexander Graf
From: Paul Mackerras pau...@samba.org

This fixes a bug in kvmppc_do_h_enter() where the physical address
for a page can be calculated incorrectly if transparent huge pages
(THP) are active.  Until THP came along, it was true that if we
encountered a large (16M) page in kvmppc_do_h_enter(), then the
associated memslot must be 16M aligned for both its guest physical
address and the userspace address, and the physical address
calculations in kvmppc_do_h_enter() assumed that.  With THP, that
is no longer true.

In the case where we are using MMU notifiers and the page size that
we get from the Linux page tables is larger than the page being mapped
by the guest, we need to fill in some low-order bits of the physical
address.  Without THP, these bits would be the same in the guest
physical address (gpa) and the host virtual address (hva).  With THP,
they can be different, and we need to use the bits from hva rather
than gpa.

In the case where we are not using MMU notifiers, the host physical
address we get from the memslot-arch.slot_phys[] array already
includes the low-order bits down to the PAGE_SIZE level, even if
we are using large pages.  Thus we can simplify the calculation in
this case to just add in the remaining bits in the case where
PAGE_SIZE is 64k and the guest is mapping a 4k page.

The same bug exists in kvmppc_book3s_hv_page_fault().  The basic fix
is to use psize (the page size from the HPTE) rather than pte_size
(the page size from the Linux PTE) when updating the HPTE low word
in r.  That means that pfn needs to be computed to PAGE_SIZE
granularity even if the Linux PTE is a huge page PTE.  That can be
arranged simply by doing the page_to_pfn() before setting page to
the head of the compound page.  If psize is less than PAGE_SIZE,
then we need to make sure we only update the bits from PAGE_SIZE
upwards, in order not to lose any sub-page offset bits in r.
On the other hand, if psize is greater than PAGE_SIZE, we need to
make sure we don't bring in non-zero low order bits in pfn, hence
we mask (pfn << PAGE_SHIFT) with ~(psize - 1).

Signed-off-by: Paul Mackerras pau...@samba.org
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 12 +---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |  4 ++--
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index f3ff587..47bbeaf 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -665,6 +665,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
return -EFAULT;
} else {
page = pages[0];
+   pfn = page_to_pfn(page);
if (PageHuge(page)) {
page = compound_head(page);
pte_size <<= compound_order(page);
@@ -689,7 +690,6 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
}
rcu_read_unlock_sched();
}
-   pfn = page_to_pfn(page);
}
 
ret = -EFAULT;
@@ -707,8 +707,14 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
r = (r & ~(HPTE_R_W|HPTE_R_I|HPTE_R_G)) | HPTE_R_M;
}
 
-   /* Set the HPTE to point to pfn */
-   r = (r & ~(HPTE_R_PP0 - pte_size)) | (pfn << PAGE_SHIFT);
+   /*
+* Set the HPTE to point to pfn.
+* Since the pfn is at PAGE_SIZE granularity, make sure we
+* don't mask out lower-order bits if psize < PAGE_SIZE.
+*/
+   if (psize < PAGE_SIZE)
+   psize = PAGE_SIZE;
+   r = (r & ~(HPTE_R_PP0 - psize)) | ((pfn << PAGE_SHIFT) & ~(psize - 1));
if (hpte_is_writable(r) && !write_ok)
r = hpte_make_readonly(r);
ret = RESUME_GUEST;
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 9c51544..fddbf98 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -225,6 +225,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
is_io = pa & (HPTE_R_I | HPTE_R_W);
pte_size = PAGE_SIZE << (pa & KVMPPC_PAGE_ORDER_MASK);
pa &= PAGE_MASK;
+   pa |= gpa & ~PAGE_MASK;
} else {
/* Translate to host virtual address */
hva = __gfn_to_hva_memslot(memslot, gfn);
@@ -238,13 +239,12 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long 
flags,
ptel = hpte_make_readonly(ptel);
is_io = hpte_cache_bits(pte_val(pte));
pa = pte_pfn(pte) << PAGE_SHIFT;
+   pa |= hva & (pte_size - 1);
}
}
 
if (pte_size < psize)
return H_PARAMETER;
-   if (pa && pte_size > psize)
-   pa |= gpa & (pte_size - 1);

[PULL 10/12] powerpc/kvm/booke: Fix build break due to stack frame size warning

2013-12-18 Thread Alexander Graf
From: Scott Wood scottw...@freescale.com

Commit ce11e48b7fdd256ec68b932a89b397a790566031 (KVM: PPC: E500: Add
userspace debug stub support) added struct thread_struct to the
stack of kvmppc_vcpu_run().  thread_struct is 1152 bytes on my build,
compared to 48 bytes for the recently-introduced struct debug_reg.
Use the latter instead.

This fixes the following error:

cc1: warnings being treated as errors
arch/powerpc/kvm/booke.c: In function 'kvmppc_vcpu_run':
arch/powerpc/kvm/booke.c:760:1: error: the frame size of 1424 bytes is larger 
than 1024 bytes
make[2]: *** [arch/powerpc/kvm/booke.o] Error 1
make[1]: *** [arch/powerpc/kvm] Error 2
make[1]: *** Waiting for unfinished jobs

Signed-off-by: Scott Wood scottw...@freescale.com
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/switch_to.h |  2 +-
 arch/powerpc/kernel/process.c| 32 
 arch/powerpc/kvm/booke.c | 12 ++--
 3 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index 9ee1261..aace905 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -35,7 +35,7 @@ extern void giveup_vsx(struct task_struct *);
 extern void enable_kernel_spe(void);
 extern void giveup_spe(struct task_struct *);
 extern void load_up_spe(struct task_struct *);
-extern void switch_booke_debug_regs(struct thread_struct *new_thread);
+extern void switch_booke_debug_regs(struct debug_reg *new_debug);
 
 #ifndef CONFIG_SMP
 extern void discard_lazy_cpu_state(void);
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 75c2d10..83530af 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -339,7 +339,7 @@ static void set_debug_reg_defaults(struct thread_struct 
*thread)
 #endif
 }
 
-static void prime_debug_regs(struct thread_struct *thread)
+static void prime_debug_regs(struct debug_reg *debug)
 {
/*
 * We could have inherited MSR_DE from userspace, since
@@ -348,22 +348,22 @@ static void prime_debug_regs(struct thread_struct *thread)
 */
mtmsr(mfmsr()  ~MSR_DE);
 
-   mtspr(SPRN_IAC1, thread->debug.iac1);
-   mtspr(SPRN_IAC2, thread->debug.iac2);
+   mtspr(SPRN_IAC1, debug->iac1);
+   mtspr(SPRN_IAC2, debug->iac2);
 #if CONFIG_PPC_ADV_DEBUG_IACS > 2
-   mtspr(SPRN_IAC3, thread->debug.iac3);
-   mtspr(SPRN_IAC4, thread->debug.iac4);
+   mtspr(SPRN_IAC3, debug->iac3);
+   mtspr(SPRN_IAC4, debug->iac4);
 #endif
-   mtspr(SPRN_DAC1, thread->debug.dac1);
-   mtspr(SPRN_DAC2, thread->debug.dac2);
+   mtspr(SPRN_DAC1, debug->dac1);
+   mtspr(SPRN_DAC2, debug->dac2);
 #if CONFIG_PPC_ADV_DEBUG_DVCS > 0
-   mtspr(SPRN_DVC1, thread->debug.dvc1);
-   mtspr(SPRN_DVC2, thread->debug.dvc2);
+   mtspr(SPRN_DVC1, debug->dvc1);
+   mtspr(SPRN_DVC2, debug->dvc2);
 #endif
-   mtspr(SPRN_DBCR0, thread->debug.dbcr0);
-   mtspr(SPRN_DBCR1, thread->debug.dbcr1);
+   mtspr(SPRN_DBCR0, debug->dbcr0);
+   mtspr(SPRN_DBCR1, debug->dbcr1);
 #ifdef CONFIG_BOOKE
-   mtspr(SPRN_DBCR2, thread->debug.dbcr2);
+   mtspr(SPRN_DBCR2, debug->dbcr2);
 #endif
 }
 /*
@@ -371,11 +371,11 @@ static void prime_debug_regs(struct thread_struct *thread)
  * debug registers, set the debug registers from the values
  * stored in the new thread.
  */
-void switch_booke_debug_regs(struct thread_struct *new_thread)
+void switch_booke_debug_regs(struct debug_reg *new_debug)
 {
if ((current->thread.debug.dbcr0 & DBCR0_IDM)
-   || (new_thread->debug.dbcr0 & DBCR0_IDM))
-   prime_debug_regs(new_thread);
+   || (new_debug->dbcr0 & DBCR0_IDM))
+   prime_debug_regs(new_debug);
 }
 EXPORT_SYMBOL_GPL(switch_booke_debug_regs);
 #else  /* !CONFIG_PPC_ADV_DEBUG_REGS */
@@ -683,7 +683,7 @@ struct task_struct *__switch_to(struct task_struct *prev,
 #endif /* CONFIG_SMP */
 
 #ifdef CONFIG_PPC_ADV_DEBUG_REGS
-   switch_booke_debug_regs(&new->thread);
+   switch_booke_debug_regs(&new->thread.debug);
 #else
 /*
  * For PPC_BOOK3S_64, we use the hw-breakpoint interfaces that would
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 53e65a2..0591e05 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -681,7 +681,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
 int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 {
int ret, s;
-   struct thread_struct thread;
+   struct debug_reg debug;
 #ifdef CONFIG_PPC_FPU
struct thread_fp_state fp;
int fpexc_mode;
@@ -723,9 +723,9 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
 #endif
 
/* Switch to guest debug context */
-   thread.debug = vcpu->arch.shadow_dbg_reg;
-   switch_booke_debug_regs(&thread);
-   thread.debug = current->thread.debug;
+  

[PULL 07/12] KVM: PPC: Book3S: PR: Export kvmppc_copy_to|from_svcpu

2013-12-18 Thread Alexander Graf
The kvmppc_copy_{to,from}_svcpu functions are publically visible,
so we should also export them in a header for others C files to
consume.

So far we didn't need this because we only called it from asm code.
The next patch will introduce a C caller.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 4a594b7..bc23b1b 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -192,6 +192,10 @@ extern void kvmppc_load_up_vsx(void);
 extern u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst);
 extern ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst);
 extern int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd);
+extern void kvmppc_copy_to_svcpu(struct kvmppc_book3s_shadow_vcpu *svcpu,
+struct kvm_vcpu *vcpu);
+extern void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu,
+  struct kvmppc_book3s_shadow_vcpu *svcpu);
 
 static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
 {
-- 
1.8.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PULL 09/12] KVM: PPC: Book3S: PR: Enable interrupts earlier

2013-12-18 Thread Alexander Graf
Now that the svcpu sync is interrupt aware we can enable interrupts
earlier in the exit code path again, moving 32bit and 64bit closer
together.

While at it, document the fact that we're always executing the exit
path with interrupts enabled so that the next person doesn't trap
over this.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_interrupts.S | 6 +-
 arch/powerpc/kvm/book3s_rmhandlers.S | 6 +-
 2 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_interrupts.S 
b/arch/powerpc/kvm/book3s_interrupts.S
index 5e7cb32..f779450 100644
--- a/arch/powerpc/kvm/book3s_interrupts.S
+++ b/arch/powerpc/kvm/book3s_interrupts.S
@@ -129,6 +129,7 @@ kvm_start_lightweight:
 * R12  = exit handler id
 * R13  = PACA
 * SVCPU.*  = guest *
+* MSR.EE   = 1
 *
 */
 
@@ -148,11 +149,6 @@ kvm_start_lightweight:
nop
 
 #ifdef CONFIG_PPC_BOOK3S_64
-   /* Re-enable interrupts */
-   ld  r3, HSTATE_HOST_MSR(r13)
-   ori r3, r3, MSR_EE
-   MTMSR_EERI(r3)
-
/*
 * Reload kernel SPRG3 value.
 * No need to save guest value as usermode can't modify SPRG3.
diff --git a/arch/powerpc/kvm/book3s_rmhandlers.S 
b/arch/powerpc/kvm/book3s_rmhandlers.S
index a38c4c9..c3c5231 100644
--- a/arch/powerpc/kvm/book3s_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_rmhandlers.S
@@ -153,15 +153,11 @@ _GLOBAL(kvmppc_entry_trampoline)
 
li  r6, MSR_IR | MSR_DR
andc    r6, r5, r6  /* Clear DR and IR in MSR value */
-#ifdef CONFIG_PPC_BOOK3S_32
/*
 * Set EE in HOST_MSR so that it's enabled when we get into our
-* C exit handler function.  On 64-bit we delay enabling
-* interrupts until we have finished transferring stuff
-* to or from the PACA.
+* C exit handler function.
 */
ori r5, r5, MSR_EE
-#endif
mtsrr0  r7
mtsrr1  r6
RFI
-- 
1.8.1.4



[PULL 00/12] 3.13 patch queue 2013-12-18 for 3.13

2013-12-18 Thread Alexander Graf
Hi Paolo / Gleb,

This is my current patch queue for 3.13.

It fixes some grave issues we've only found after 3.13-rc1:

  - Make the modularized HV/PR book3s kvm work well as modules
  - Fix some race conditions
  - Fix compilation with certain compilers (booke)
  - Fix THP for book3s_hv
  - Fix preemption for book3s_pr

Please pull.

Alex


The following changes since commit f080480488028bcc25357f85e8ae54ccc3bb7173:

  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm 
(2013-11-15 13:51:36 +0900)

are available in the git repository at:


  git://github.com/agraf/linux-2.6.git tags/signed-for-3.13

for you to fetch changes up to df9059bb64023da9f27e56a94a3e2b8f4b6336a9:

  KVM: PPC: Book3S HV: Don't drop low-order page address bits (2013-12-18 
11:30:35 +0100)


Patch queue for 3.13 - 2013-12-18

This fixes some grave issues we've only found after 3.13-rc1:

  - Make the modularized HV/PR book3s kvm work well as modules
  - Fix some race conditions
  - Fix compilation with certain compilers (booke)
  - Fix THP for book3s_hv
  - Fix preemption for book3s_pr

Alexander Graf (4):
  KVM: PPC: Book3S: PR: Don't clobber our exit handler id
  KVM: PPC: Book3S: PR: Export kvmppc_copy_to|from_svcpu
  KVM: PPC: Book3S: PR: Make svcpu -> vcpu store preempt savvy
  KVM: PPC: Book3S: PR: Enable interrupts earlier

Aneesh Kumar K.V (1):
  powerpc: book3s: kvm: Don't abuse host r2 in exit path

Paul Mackerras (5):
  KVM: PPC: Book3S HV: Fix physical address calculations
  KVM: PPC: Book3S HV: Refine barriers in guest entry/exit
  KVM: PPC: Book3S HV: Make tbacct_lock irq-safe
  KVM: PPC: Book3S HV: Take SRCU read lock around kvm_read_guest() call
  KVM: PPC: Book3S HV: Don't drop low-order page address bits

Scott Wood (1):
  powerpc/kvm/booke: Fix build break due to stack frame size warning

pingfan liu (1):
  powerpc: kvm: fix rare but potential deadlock scene


Alexander Graf (4):
  KVM: PPC: Book3S: PR: Don't clobber our exit handler id
  KVM: PPC: Book3S: PR: Export kvmppc_copy_to|from_svcpu
  KVM: PPC: Book3S: PR: Make svcpu -> vcpu store preempt savvy
  KVM: PPC: Book3S: PR: Enable interrupts earlier

Aneesh Kumar K.V (1):
  powerpc: book3s: kvm: Don't abuse host r2 in exit path

Paul Mackerras (5):
  KVM: PPC: Book3S HV: Fix physical address calculations
  KVM: PPC: Book3S HV: Refine barriers in guest entry/exit
  KVM: PPC: Book3S HV: Make tbacct_lock irq-safe
  KVM: PPC: Book3S HV: Take SRCU read lock around kvm_read_guest() call
  KVM: PPC: Book3S HV: Don't drop low-order page address bits

Scott Wood (1):
  powerpc/kvm/booke: Fix build break due to stack frame size warning

pingfan liu (1):
  powerpc: kvm: fix rare but potential deadlock scene

 arch/powerpc/include/asm/kvm_book3s.h |  4 
 arch/powerpc/include/asm/kvm_book3s_asm.h |  2 ++
 arch/powerpc/include/asm/switch_to.h  |  2 +-
 arch/powerpc/kernel/asm-offsets.c |  1 +
 arch/powerpc/kernel/process.c | 32 +++
 arch/powerpc/kvm/book3s_64_mmu_hv.c   | 18 +
 arch/powerpc/kvm/book3s_hv.c  | 24 +--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   |  9 +++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 23 --
 arch/powerpc/kvm/book3s_interrupts.S  | 19 ++
 arch/powerpc/kvm/book3s_pr.c  | 22 +
 arch/powerpc/kvm/book3s_rmhandlers.S  |  6 +-
 arch/powerpc/kvm/booke.c  | 12 ++--
 13 files changed, 112 insertions(+), 62 deletions(-)


[PULL 11/12] powerpc: book3s: kvm: Don't abuse host r2 in exit path

2013-12-18 Thread Alexander Graf
From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

We don't use PACATOC for PR. Avoid updating HOST_R2 with PR
KVM mode when both HV and PR are enabled in the kernel. Without this we
get the below crash

(qemu)
Unable to handle kernel paging request for data at address 0x8310
Faulting instruction address: 0xc001d5a4
cpu 0x2: Vector: 300 (Data Access) at [c001dc53aef0]
pc: c001d5a4: .vtime_delta.isra.1+0x34/0x1d0
lr: c001d760: .vtime_account_system+0x20/0x60
sp: c001dc53b170
   msr: 80009032
   dar: 8310
 dsisr: 4000
  current = 0xc001d76c62d0
  paca= 0xcfef1100   softe: 0irq_happened: 0x01
pid   = 4472, comm = qemu-system-ppc
enter ? for help
[c001dc53b200] c001d760 .vtime_account_system+0x20/0x60
[c001dc53b290] c008d050 .kvmppc_handle_exit_pr+0x60/0xa50
[c001dc53b340] c008f51c kvm_start_lightweight+0xb4/0xc4
[c001dc53b510] c008cdf0 .kvmppc_vcpu_run_pr+0x150/0x2e0
[c001dc53b9e0] c008341c .kvmppc_vcpu_run+0x2c/0x40
[c001dc53ba50] c0080af4 .kvm_arch_vcpu_ioctl_run+0x54/0x1b0
[c001dc53bae0] c007b4c8 .kvm_vcpu_ioctl+0x478/0x730
[c001dc53bca0] c02140cc .do_vfs_ioctl+0x4ac/0x770
[c001dc53bd80] c02143e8 .SyS_ioctl+0x58/0xb0
[c001dc53be30] c0009e58 syscall_exit+0x0/0x98

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s_asm.h | 1 +
 arch/powerpc/kernel/asm-offsets.c | 1 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 7 +++
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h 
b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 412b2f3..192917d 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -79,6 +79,7 @@ struct kvmppc_host_state {
ulong vmhandler;
ulong scratch0;
ulong scratch1;
+   ulong scratch2;
u8 in_guest;
u8 restore_hid5;
u8 napping;
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 2ea5cc0..d3de010 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -576,6 +576,7 @@ int main(void)
HSTATE_FIELD(HSTATE_VMHANDLER, vmhandler);
HSTATE_FIELD(HSTATE_SCRATCH0, scratch0);
HSTATE_FIELD(HSTATE_SCRATCH1, scratch1);
+   HSTATE_FIELD(HSTATE_SCRATCH2, scratch2);
HSTATE_FIELD(HSTATE_IN_GUEST, in_guest);
HSTATE_FIELD(HSTATE_RESTORE_HID5, restore_hid5);
HSTATE_FIELD(HSTATE_NAPPING, napping);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index bde28da..be4fa04a 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -754,15 +754,14 @@ kvmppc_interrupt_hv:
 * guest CR, R12 saved in shadow VCPU SCRATCH1/0
 * guest R13 saved in SPRN_SCRATCH0
 */
-   /* abuse host_r2 as third scratch area; we get r2 from PACATOC(r13) */
-   std r9, HSTATE_HOST_R2(r13)
+   std r9, HSTATE_SCRATCH2(r13)
 
lbz r9, HSTATE_IN_GUEST(r13)
cmpwi   r9, KVM_GUEST_MODE_HOST_HV
beq kvmppc_bad_host_intr
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
cmpwi   r9, KVM_GUEST_MODE_GUEST
-   ld  r9, HSTATE_HOST_R2(r13)
+   ld  r9, HSTATE_SCRATCH2(r13)
beq kvmppc_interrupt_pr
 #endif
/* We're now back in the host but in guest MMU context */
@@ -782,7 +781,7 @@ kvmppc_interrupt_hv:
std r6, VCPU_GPR(R6)(r9)
std r7, VCPU_GPR(R7)(r9)
std r8, VCPU_GPR(R8)(r9)
-   ld  r0, HSTATE_HOST_R2(r13)
+   ld  r0, HSTATE_SCRATCH2(r13)
std r0, VCPU_GPR(R9)(r9)
std r10, VCPU_GPR(R10)(r9)
std r11, VCPU_GPR(R11)(r9)
-- 
1.8.1.4



[PULL 12/12] KVM: PPC: Book3S HV: Don't drop low-order page address bits

2013-12-18 Thread Alexander Graf
From: Paul Mackerras pau...@samba.org

Commit caaa4c804fae ("KVM: PPC: Book3S HV: Fix physical address
calculations") unfortunately resulted in some low-order address bits
getting dropped in the case where the guest is creating a 4k HPTE
and the host page size is 64k.  By getting the low-order bits from
hva rather than gpa we miss out on bits 12 - 15 in this case, since
hva is at page granularity.  This puts the missing bits back in.

Reported-by: Alexey Kardashevskiy a...@ozlabs.ru
Signed-off-by: Paul Mackerras pau...@samba.org
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 1931aa3..8689e2e 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -240,6 +240,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
is_io = hpte_cache_bits(pte_val(pte));
pa = pte_pfn(pte) << PAGE_SHIFT;
pa |= hva & (pte_size - 1);
+   pa |= gpa & ~PAGE_MASK;
}
}
 
-- 
1.8.1.4



[PULL 05/12] powerpc: kvm: fix rare but potential deadlock scene

2013-12-18 Thread Alexander Graf
From: pingfan liu qemul...@gmail.com

kvmppc_hv_find_lock_hpte() is called from both virtmode and
realmode, so it can trigger the deadlock.

Suppose the following scene:

Two physical cpuM, cpuN, two VM instances A, B, each VM has a group of
vcpus.

If on cpuM, vcpu_A_1 holds bitlock X (HPTE_V_HVLOCK), then is switched
out, and on cpuN, vcpu_A_2 tries to lock X in realmode, then cpuN will be
caught in realmode for a long time.

What makes things even worse is if the following happens:
  On cpuM, bitlock X is held; on cpuN, Y is held.
  vcpu_B_2 tries to lock Y on cpuM in realmode
  vcpu_A_2 tries to lock X on cpuN in realmode

Oops! deadlock happens

Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
Reviewed-by: Paul Mackerras pau...@samba.org
CC: sta...@vger.kernel.org
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 4 
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 47bbeaf..c5d1484 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -469,11 +469,14 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu 
*vcpu, gva_t eaddr,
slb_v = vcpu->kvm->arch.vrma_slb_v;
}
 
+   preempt_disable();
/* Find the HPTE in the hash table */
index = kvmppc_hv_find_lock_hpte(kvm, eaddr, slb_v,
 HPTE_V_VALID | HPTE_V_ABSENT);
-   if (index < 0)
+   if (index < 0) {
+   preempt_enable();
return -ENOENT;
+   }
hptep = (unsigned long *)(kvm->arch.hpt_virt + (index << 4));
v = hptep[0] & ~HPTE_V_HVLOCK;
gr = kvm->arch.revmap[index].guest_rpte;
@@ -481,6 +484,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu 
*vcpu, gva_t eaddr,
/* Unlock the HPTE */
asm volatile("lwsync" : : : "memory");
hptep[0] = v;
+   preempt_enable();
 
gpte->eaddr = eaddr;
gpte->vpage = ((v & HPTE_V_AVPN) << 4) | ((eaddr >> 12) & 0xfff);
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index fddbf98..1931aa3 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -749,6 +749,10 @@ static int slb_base_page_shift[4] = {
20, /* 1M, unsupported */
 };
 
+/* When called from virtmode, this func should be protected by
+ * preempt_disable(), otherwise, the holding of HPTE_V_HVLOCK
+ * can trigger deadlock issue.
+ */
 long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long 
slb_v,
  unsigned long valid)
 {
-- 
1.8.1.4



[PULL 02/12] KVM: PPC: Book3S HV: Refine barriers in guest entry/exit

2013-12-18 Thread Alexander Graf
From: Paul Mackerras pau...@samba.org

Some users have reported instances of the host hanging with secondary
threads of a core waiting for the primary thread to exit the guest,
and the primary thread stuck in nap mode.  This prompted a review of
the memory barriers in the guest entry/exit code, and this is the
result.  Most of these changes are the suggestions of Dean Burdick
deanburd...@us.ibm.com.

The barriers between updating napping_threads and reading the
entry_exit_count on the one hand, and updating entry_exit_count and
reading napping_threads on the other, need to be isync not lwsync,
since we need to ensure that either the napping_threads update or the
entry_exit_count update get seen.  It is not sufficient to order the
load vs. lwarx, as lwsync does; we need to order the load vs. the
stwcx., so we need isync.

In addition, we need a full sync before sending IPIs to wake other
threads from nap, to ensure that the write to the entry_exit_count is
visible before the IPI occurs.

Signed-off-by: Paul Mackerras pau...@samba.org
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index bc8de75..bde28da 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -153,7 +153,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
 
 13:b   machine_check_fwnmi
 
-
 /*
  * We come in here when wakened from nap mode on a secondary hw thread.
  * Relocation is off and most register values are lost.
@@ -224,6 +223,11 @@ kvm_start_guest:
/* Clear our vcpu pointer so we don't come back in early */
li  r0, 0
std r0, HSTATE_KVM_VCPU(r13)
+   /*
+* Make sure we clear HSTATE_KVM_VCPU(r13) before incrementing
+* the nap_count, because once the increment to nap_count is
+* visible we could be given another vcpu.
+*/
lwsync
/* Clear any pending IPI - we're an offline thread */
ld  r5, HSTATE_XICS_PHYS(r13)
@@ -241,7 +245,6 @@ kvm_start_guest:
/* increment the nap count and then go to nap mode */
ld  r4, HSTATE_KVM_VCORE(r13)
addi    r4, r4, VCORE_NAP_COUNT
-   lwsync  /* make previous updates visible */
 51:    lwarx   r3, 0, r4
addi    r3, r3, 1
stwcx.  r3, 0, r4
@@ -990,14 +993,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201)
 */
/* Increment the threads-exiting-guest count in the 0xff00
   bits of vcore->entry_exit_count */
-   lwsync
ld  r5,HSTATE_KVM_VCORE(r13)
addi    r6,r5,VCORE_ENTRY_EXIT
 41:    lwarx   r3,0,r6
addi    r0,r3,0x100
stwcx.  r0,0,r6
bne 41b
-   lwsync
+   isync   /* order stwcx. vs. reading napping_threads */
 
/*
 * At this point we have an interrupt that we have to pass
@@ -1030,6 +1032,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201)
sld r0,r0,r4
andc.   r3,r3,r0/* no sense IPI'ing ourselves */
beq 43f
+   /* Order entry/exit update vs. IPIs */
+   sync
mulli   r4,r4,PACA_SIZE /* get paca for thread 0 */
subf    r6,r4,r13
 42:andi.   r0,r3,1
@@ -1638,10 +1642,10 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_206)
bge kvm_cede_exit
stwcx.  r4,0,r6
bne 31b
+   /* order napping_threads update vs testing entry_exit_count */
+   isync
li  r0,1
stb r0,HSTATE_NAPPING(r13)
-   /* order napping_threads update vs testing entry_exit_count */
-   lwsync
mr  r4,r3
lwz r7,VCORE_ENTRY_EXIT(r5)
cmpwi   r7,0x100
-- 
1.8.1.4



[PULL 03/12] KVM: PPC: Book3S HV: Make tbacct_lock irq-safe

2013-12-18 Thread Alexander Graf
From: Paul Mackerras pau...@samba.org

Lockdep reported that there is a potential for deadlock because
vcpu->arch.tbacct_lock is not irq-safe, and is sometimes taken inside
the rq_lock (run-queue lock) in the scheduler, which is taken within
interrupts.  The lockdep splat looks like:

==
[ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
3.12.0-rc5-kvm+ #8 Not tainted
--
qemu-system-ppc/4803 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
(&(&vcpu->arch.tbacct_lock)->rlock){+.+...}, at: [c00947ac] 
.kvmppc_core_vcpu_put_hv+0x2c/0xa0

and this task is already holding:
(&rq->lock){-.-.-.}, at: [c0ac16c0] .__schedule+0x180/0xaa0
which would create a new lock dependency:
(&rq->lock){-.-.-.} -> (&(&vcpu->arch.tbacct_lock)->rlock){+.+...}

but this new dependency connects a HARDIRQ-irq-safe lock:
(&rq->lock){-.-.-.}
... which became HARDIRQ-irq-safe at:
 [c013797c] .lock_acquire+0xbc/0x190
 [c0ac3c74] ._raw_spin_lock+0x34/0x60
 [c00f8564] .scheduler_tick+0x54/0x180
 [c00c2610] .update_process_times+0x70/0xa0
 [c012cdfc] .tick_periodic+0x3c/0xe0
 [c012cec8] .tick_handle_periodic+0x28/0xb0
 [c001ef40] .timer_interrupt+0x120/0x2e0
 [c0002868] decrementer_common+0x168/0x180
 [c01c7ca4] .get_page_from_freelist+0x924/0xc10
 [c01c8e00] .__alloc_pages_nodemask+0x200/0xba0
 [c01c9eb8] .alloc_pages_exact_nid+0x68/0x110
 [c0f4c3ec] .page_cgroup_init+0x1e0/0x270
 [c0f24480] .start_kernel+0x3e0/0x4e4
 [c0009d30] .start_here_common+0x20/0x70

to a HARDIRQ-irq-unsafe lock:
(&(&vcpu->arch.tbacct_lock)->rlock){+.+...}
... which became HARDIRQ-irq-unsafe at:
...  [c013797c] .lock_acquire+0xbc/0x190
 [c0ac3c74] ._raw_spin_lock+0x34/0x60
 [c00946ac] .kvmppc_core_vcpu_load_hv+0x2c/0x100
 [c008394c] .kvmppc_core_vcpu_load+0x2c/0x40
 [c0081000] .kvm_arch_vcpu_load+0x10/0x30
 [c007afd4] .vcpu_load+0x64/0xd0
 [c007b0f8] .kvm_vcpu_ioctl+0x68/0x730
 [c025530c] .do_vfs_ioctl+0x4dc/0x7a0
 [c0255694] .SyS_ioctl+0xc4/0xe0
 [c0009ee4] syscall_exit+0x0/0x98

Some users have reported this deadlock occurring in practice, though
the reports have been primarily on 3.10.x-based kernels.

This fixes the problem by making tbacct_lock be irq-safe.

Signed-off-by: Paul Mackerras pau...@samba.org
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_hv.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 072287f..31d9cfb 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -131,8 +131,9 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 static void kvmppc_core_vcpu_load_hv(struct kvm_vcpu *vcpu, int cpu)
 {
struct kvmppc_vcore *vc = vcpu->arch.vcore;
+   unsigned long flags;
 
-   spin_lock(&vcpu->arch.tbacct_lock);
+   spin_lock_irqsave(&vcpu->arch.tbacct_lock, flags);
if (vc->runner == vcpu && vc->vcore_state != VCORE_INACTIVE &&
vc->preempt_tb != TB_NIL) {
vc->stolen_tb += mftb() - vc->preempt_tb;
@@ -143,19 +144,20 @@ static void kvmppc_core_vcpu_load_hv(struct kvm_vcpu 
*vcpu, int cpu)
vcpu->arch.busy_stolen += mftb() - vcpu->arch.busy_preempt;
vcpu->arch.busy_preempt = TB_NIL;
}
-   spin_unlock(&vcpu->arch.tbacct_lock);
+   spin_unlock_irqrestore(&vcpu->arch.tbacct_lock, flags);
 }
 
 static void kvmppc_core_vcpu_put_hv(struct kvm_vcpu *vcpu)
 {
struct kvmppc_vcore *vc = vcpu->arch.vcore;
+   unsigned long flags;
 
-   spin_lock(&vcpu->arch.tbacct_lock);
+   spin_lock_irqsave(&vcpu->arch.tbacct_lock, flags);
if (vc->runner == vcpu && vc->vcore_state != VCORE_INACTIVE)
vc->preempt_tb = mftb();
if (vcpu->arch.state == KVMPPC_VCPU_BUSY_IN_HOST)
vcpu->arch.busy_preempt = mftb();
-   spin_unlock(&vcpu->arch.tbacct_lock);
+   spin_unlock_irqrestore(&vcpu->arch.tbacct_lock, flags);
 }
 
 static void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr)
@@ -486,11 +488,11 @@ static u64 vcore_stolen_time(struct kvmppc_vcore *vc, u64 
now)
 */
if (vc->vcore_state != VCORE_INACTIVE &&
    vc->runner->arch.run_task != current) {
-   spin_lock(&vc->runner->arch.tbacct_lock);
+   spin_lock_irq(&vc->runner->arch.tbacct_lock);
p = vc->stolen_tb;
if (vc->preempt_tb != TB_NIL)
p += now - vc->preempt_tb;
-   spin_unlock(&vc->runner->arch.tbacct_lock);
+   spin_unlock_irq(&vc->runner->arch.tbacct_lock);
} else {
p = vc->stolen_tb;
}
@@ -512,10 +514,10 @@ static void kvmppc_create_dtl_entry(struct kvm_vcpu *vcpu,
core_stolen = 

[PULL 04/12] KVM: PPC: Book3S HV: Take SRCU read lock around kvm_read_guest() call

2013-12-18 Thread Alexander Graf
From: Paul Mackerras pau...@samba.org

Running a kernel with CONFIG_PROVE_RCU=y yields the following diagnostic:

===
[ INFO: suspicious RCU usage. ]
3.12.0-rc5-kvm+ #9 Not tainted
---

include/linux/kvm_host.h:473 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by qemu-system-ppc/4831:

stack backtrace:
CPU: 28 PID: 4831 Comm: qemu-system-ppc Not tainted 3.12.0-rc5-kvm+ #9
Call Trace:
[c00be462b2a0] [c001644c] .show_stack+0x7c/0x1f0 (unreliable)
[c00be462b370] [c0ad57c0] .dump_stack+0x88/0xb4
[c00be462b3f0] [c01315e8] .lockdep_rcu_suspicious+0x138/0x180
[c00be462b480] [c007862c] .gfn_to_memslot+0x13c/0x170
[c00be462b510] [c007d384] .gfn_to_hva_prot+0x24/0x90
[c00be462b5a0] [c007d420] .kvm_read_guest_page+0x30/0xd0
[c00be462b630] [c007d528] .kvm_read_guest+0x68/0x110
[c00be462b6e0] [c0084594] .kvmppc_rtas_hcall+0x34/0x180
[c00be462b7d0] [c0097934] .kvmppc_pseries_do_hcall+0x74/0x830
[c00be462b880] [c00990e8] .kvmppc_vcpu_run_hv+0xff8/0x15a0
[c00be462b9e0] [c00839cc] .kvmppc_vcpu_run+0x2c/0x40
[c00be462ba50] [c00810b4] .kvm_arch_vcpu_ioctl_run+0x54/0x1b0
[c00be462bae0] [c007b508] .kvm_vcpu_ioctl+0x478/0x730
[c00be462bca0] [c025532c] .do_vfs_ioctl+0x4dc/0x7a0
[c00be462bd80] [c02556b4] .SyS_ioctl+0xc4/0xe0
[c00be462be30] [c0009ee4] syscall_exit+0x0/0x98

To fix this, we take the SRCU read lock around the kvmppc_rtas_hcall()
call.

Signed-off-by: Paul Mackerras pau...@samba.org
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_hv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 31d9cfb..b51d5db 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -591,7 +591,9 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
if (list_empty(&vcpu->kvm->arch.rtas_tokens))
return RESUME_HOST;
 
+   idx = srcu_read_lock(&vcpu->kvm->srcu);
rc = kvmppc_rtas_hcall(vcpu);
+   srcu_read_unlock(&vcpu->kvm->srcu, idx);
 
if (rc == -ENOENT)
return RESUME_HOST;
-- 
1.8.1.4



Re: [PATCH 2/2] kvmtool/arm: Add option to override Generic Timer frequency

2013-12-18 Thread Robin Murphy

On 18/12/13 14:07, Alexander Graf wrote:

[...]

How does it encourage a vendor to properly implement their firmware if there's 
a workaround?


Alex




Hi Alex,

In short, by enabling the users to create the demand. Yes, like any workaround there's potential 
for abuse, but having *something* that makes it work is the difference between "I want 
virtualisation"[1] and "Dear vendor, I've tried virtualisation on your chip/board and 
it's great, but it tells me I need new firmware, where do I get that?"

Having the specs tell them what to do clearly isn't sufficient, so let's give 
the integrators and consumers incentive to shout at them too. The sooner proper 
support is commonplace and we can deprecate clock-frequency hacks altogether, 
the better.


Oh, I'm all for hacks. But please don't fall under the illusion that this will push 
vendors to fix their firmware. It will have the opposite effect - vendors will just point 
to the workaround and say "but it works" :).



If vendors already aren't bothering to support functionality available 
in their flagship hardware, workarounds hardly make that worse, and are 
a win for the user. If it can drive adoption enough to get vendors to 
see the value in at least fixing future products, that's only good.



Either way, this hack is basically required because you can't program CNTFRQ because it's 
controlled by secure firmware, right? So the host os already needs to know about this and 
probably does have a clock-frequency value in its device tree entry already 
to know how fast its clock ticks.



In some cases, yes. In others they don't explicitly use the arch timer 
at all thus have no frequency set anywhere. In the case of the board I 
have on my desk, it took hacking the non-secure part of the bootloader, 
writing a shim to throw away the securely-booted non-hyp cpu0 and fire 
up a secondary, and hacking a timer node into the host DT to even get as 
far as having an issue with kvmtool.



Couldn't we search for that host entry and automatically pass it into the guest 
if it's there? That way the whole thing becomes seamless and even less of an 
issue.



In theory that would be an ideal solution, yes. In practice it means 
either making KVM dependent on PROC_DEVICETREE (yuck) or cooking up some 
kernel interface to expose the system timer frequency to userspace 
(double yuck). Not just global solution to local problem, but global 
solution to local 
problem-that-shouldn't-even-exist-and-you-want-to-go-away-as-soon-as-possible-without-leaving-a-legacy. 
Besides, that would probably just reinforce the equally wrong behaviour 
of putting the frequency in the host DT instead of fixing the firmware ;)


Robin.



Alex







Re: [PATCH backport hints for 3.10] KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368)

2013-12-18 Thread Greg KH
On Mon, Dec 16, 2013 at 12:38:17PM +0100, Paolo Bonzini wrote:
 The KVM patch "fix vapic memory corruption" applies to most kernels that
 have KVM, but the fix does not apply on many older branches.  The APIs it
 uses are available in 3.1, but until 3.9 kvm_gfn_to_hva_cache_init had
 one fewer parameter.
 
 The comments in this patch should help fixing kvm_lapic_set_vapic_addr
 in older kernels.  I will review the backports as they are posted to
 LKML (if I am CCed...).

This fails to build on 3.4-stable, could you provide a working backport
for that kernel?

thanks,

greg k-h


[PATCH] KVM: VMX: Do not skip the instruction if handle_dr injects a fault

2013-12-18 Thread Jan Kiszka
If kvm_get_dr or kvm_set_dr reports that it raised a fault, we must not
advance the instruction pointer. Otherwise the exception will hit the
wrong instruction.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 arch/x86/kvm/vmx.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 56c654c..ee3bf54 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5141,10 +5141,14 @@ static int handle_dr(struct kvm_vcpu *vcpu)
 	reg = DEBUG_REG_ACCESS_REG(exit_qualification);
 	if (exit_qualification & TYPE_MOV_FROM_DR) {
 		unsigned long val;
-		if (!kvm_get_dr(vcpu, dr, &val))
-			kvm_register_write(vcpu, reg, val);
+
+		if (kvm_get_dr(vcpu, dr, &val))
+			return 1;
+		kvm_register_write(vcpu, reg, val);
 	} else
-		kvm_set_dr(vcpu, dr, vcpu->arch.regs[reg]);
+		if (kvm_set_dr(vcpu, dr, vcpu->arch.regs[reg]))
+			return 1;
+
 	skip_emulated_instruction(vcpu);
return 1;
 }
-- 
1.8.1.1.298.ge7eed54


mempath prefault: fix off-by-one error

2013-12-18 Thread Marcelo Tosatti

Fix off-by-one error (noticed by Andrea Arcangeli).

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/exec.c b/exec.c
index f4b9ef2..1be5252 100644
--- a/exec.c
+++ b/exec.c
@@ -1000,7 +1000,7 @@ static void *file_ram_alloc(RAMBlock *block,
 }
 
 /* MAP_POPULATE silently ignores failures */
-    for (i = 0; i < (memory/hpagesize)-1; i++) {
+    for (i = 0; i < (memory/hpagesize); i++) {
 memset(area + (hpagesize*i), 0, 1);
 }
 


Re: [PATCH] powerpc: book3s: kvm: Use the saved dsisr and dar values

2013-12-18 Thread Alexander Graf

On 11.11.2013, at 15:02, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com 
wrote:

 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 Don't try to compute these values.
 
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 ---
 
 NOTE: I am not sure why we were originally computing dsisr and dar. So maybe
 we need a variant of this patch. But with this and the additional patch
 powerpc: book3s: PR: Enable Little Endian PR guest I am able to get a 
 Little Endian
 PR guest to boot.

It's quite easy to find out - git blame tells you all the history and points 
you to commit ca7f4203b.

commit ca7f4203b9b66e12d0d9968ff7dfe781f3a9695a
Author: Alexander Graf ag...@suse.de
Date:   Wed Mar 24 21:48:28 2010 +0100

KVM: PPC: Implement alignment interrupt

Mac OS X has some applications - namely the Finder - that require alignment
interrupts to work properly. So we need to implement them.

But the spec for 970 and 750 also looks different. While 750 requires the
DSISR and DAR fields to reflect some instruction bits (DSISR) and the fault
address (DAR), the 970 declares this as an optional feature. So we need
to reconstruct DSISR and DAR manually.

Signed-off-by: Alexander Graf ag...@suse.de
Signed-off-by: Avi Kivity a...@redhat.com

Read this as on 970, alignment interrupts don't give us DSISR and DAR of the 
faulting instruction as otherwise I wouldn't have implemented it.

So this is clearly a nack on this patch :).


Alex



Re: [v5][PATCH] KVM: PPC: Book3E HV: call RECONCILE_IRQ_STATE to sync the software state

2013-12-18 Thread Alexander Graf

On 29.11.2013, at 03:01, “tiejun.chen” tiejun.c...@windriver.com wrote:

 Alex,
 
 I suppose Scott already elaborate anything you want to know.

Thanks, applied with this minor patch on top:

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S 
b/arch/powerpc/kvm/bookehv_interrupts.S
index 191c32b..be3de1d 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -470,7 +470,7 @@ _GLOBAL(kvmppc_resume_host)
 #ifdef CONFIG_64BIT
/*
 * We enter with interrupts disabled in hardware, but
-* we need to call RECONCILE_IRQ_STATE anyway to ensure
+* we need to call RECONCILE_IRQ_STATE to ensure
 * that the software state is kept in sync.
 */
RECONCILE_IRQ_STATE(r3,r5)

I've also taken the liberty of making the patch description more verbose:

KVM: PPC: Book3E HV: call RECONCILE_IRQ_STATE to sync the software state

Rather than calling hard_irq_disable() when we're back in C code
we can just call RECONCILE_IRQ_STATE to soft disable IRQs while
we're already in hard disabled state.

This should be functionally equivalent to the code before, but
cleaner and faster.

Signed-off-by: Tiejun Chen tiejun.c...@windriver.com
[agraf: fix comment, commit message]
Signed-off-by: Alexander Graf ag...@suse.de


Thanks, applied to kvm-ppc-queue.

Alex



Re: [PATCH] powerpc: book3s: kvm: Use the saved dsisr and dar values

2013-12-18 Thread Paul Mackerras
On Wed, Dec 18, 2013 at 10:44:08PM +0100, Alexander Graf wrote:
 
 On 11.11.2013, at 15:02, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com 
 wrote:
 
  From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
  
  Don't try to compute these values.
  
  Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
  ---
  
  NOTE: I am not sure why we were originally computing dsisr and dar. So maybe
  we need a variant of this patch. But with this and the additional patch
  powerpc: book3s: PR: Enable Little Endian PR guest I am able to get a 
  Little Endian
  PR guest to boot.
 
 It's quite easy to find out - git blame tells you all the history and points 
 you to commit ca7f4203b.
 
 commit ca7f4203b9b66e12d0d9968ff7dfe781f3a9695a
 Author: Alexander Graf ag...@suse.de
 Date:   Wed Mar 24 21:48:28 2010 +0100
 
 KVM: PPC: Implement alignment interrupt
 
 Mac OS X has some applications - namely the Finder - that require 
 alignment
 interrupts to work properly. So we need to implement them.
 
 But the spec for 970 and 750 also looks different. While 750 requires the
 DSISR and DAR fields to reflect some instruction bits (DSISR) and the 
 fault
 address (DAR), the 970 declares this as an optional feature. So we need
 to reconstruct DSISR and DAR manually.
 
 Signed-off-by: Alexander Graf ag...@suse.de
 Signed-off-by: Avi Kivity a...@redhat.com
 
 Read this as on 970, alignment interrupts don't give us DSISR and DAR of the 
 faulting instruction as otherwise I wouldn't have implemented it.

Although it's optional, all IBM POWER cpus, and as far as I know all
PowerPC cpus, set DAR on an alignment interrupt to the effective
address being accessed.  You have a valid point regarding DSISR, but
it would be nice to skip the computations where either the host CPU
provides the bits, or the virtual CPU doesn't.

Paul.


Re: [REPOST][PATCH 2/2] driver core: platform: allow platform drivers to bind to any device

2013-12-18 Thread Greg Kroah-Hartman
On Tue, Dec 03, 2013 at 12:34:54PM +, Kim Phillips wrote:
 Platform drivers such as the vfio-platform meta-driver [1]
 should be allowed to specify that they can bind to any device,
 much like PCI drivers can with PCI_ANY_ID.
 
 Currently, binding platform drivers to devices depends on:
 
 - a string match in the device node's compatible entry (OF)
 - a string match in the ACPI id list (ACPI)
 - a string match in the id_table (platform data)
 - a string match on the driver name (fall-back)
 
 none of which allow for the notion of match any.
 
 This patch adds the notion by adding a match any device boolean to
 struct platform_driver, for drivers to be able to set and thus not cause
 platform_match() to fail when a bind is requested.
 
 [1] http://www.spinics.net/lists/kvm/msg96701.html
 
 Signed-off-by: Kim Phillips kim.phill...@linaro.org
 ---
 rebased onto 3.13-rc2, and reposted from first submission which
 received no comments:
 
 https://lkml.org/lkml/2013/10/11/48
 
  drivers/base/platform.c | 4 
  include/linux/platform_device.h | 1 +
  2 files changed, 5 insertions(+)
 
 diff --git a/drivers/base/platform.c b/drivers/base/platform.c
 index 3a94b79..78a5b62 100644
 --- a/drivers/base/platform.c
 +++ b/drivers/base/platform.c
 @@ -736,6 +736,10 @@ static int platform_match(struct device *dev, struct 
 device_driver *drv)
   struct platform_device *pdev = to_platform_device(dev);
   struct platform_driver *pdrv = to_platform_driver(drv);
  
 + /* the driver matches any device */
 + if (pdrv-match_any_dev)
 + return 1;

This breaks userspace in that it will never know to load the module that
can bind to anything.

You need a way to encode this in the platform device id that can be a
wildcard type of thing, so that userspace can know about this.

thanks,

greg k-h


Re: [REPOST][PATCH 1/2] driver core: Add new device_driver flag to allow binding via sysfs only

2013-12-18 Thread Greg Kroah-Hartman
On Tue, Dec 03, 2013 at 12:34:46PM +, Kim Phillips wrote:
 VFIO supports pass-through of devices to user space - for sake
 of illustration, say a PCI e1000 device:
 
 - the e1000 is first unbound from the PCI e1000 driver via sysfs
 - the vfio-pci driver is told via new_id that it now handles e1000 devices
 - the e1000 is explicitly bound to vfio-pci through sysfs
 
 However, now we have two drivers in the system that both handle e1000
 devices.  A hotplug event could then occur and it is ambiguous as to which
 driver will claim the device.  The desired semantics is that vfio-pci is
 only bound to devices by explicit request in sysfs.  This patch makes this
 possible by introducing a sysfs_bind_only flag in struct device_driver.

Why deal with this at all and not just deal with the bind sysfs file
instead?  That way no driver core logic needs to be changed at all, and
your userspace tools know _exactly_ which device is being bound to the
new device.

Don't mess with the new_id file for stuff like this, as you point out,
it's tricky...

thanks,

greg k-h


Re: [REPOST][PATCH 1/2] driver core: Add new device_driver flag to allow binding via sysfs only

2013-12-18 Thread Greg Kroah-Hartman
On Tue, Dec 03, 2013 at 04:34:33PM +0100, Jan Kiszka wrote:
 On 2013-12-03 13:34, Kim Phillips wrote:
  VFIO supports pass-through of devices to user space - for sake
  of illustration, say a PCI e1000 device:
  
  - the e1000 is first unbound from the PCI e1000 driver via sysfs
  - the vfio-pci driver is told via new_id that it now handles e1000 devices
  - the e1000 is explicitly bound to vfio-pci through sysfs
  
  However, now we have two drivers in the system that both handle e1000
  devices.  A hotplug event could then occur and it is ambiguous as to which
  driver will claim the device.  The desired semantics is that vfio-pci is
  only bound to devices by explicit request in sysfs.  This patch makes this
  possible by introducing a sysfs_bind_only flag in struct device_driver.
  
  Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
  Signed-off-by: Kim Phillips kim.phill...@linaro.org
  ---
  rebased onto 3.13-rc2, and reposted from first submission which
  received no comments:
  
  https://lkml.org/lkml/2013/10/11/53
  
   drivers/base/dd.c  | 5 -
   include/linux/device.h | 2 ++
   2 files changed, 6 insertions(+), 1 deletion(-)
  
  diff --git a/drivers/base/dd.c b/drivers/base/dd.c
  index 0605176..b83b16d 100644
  --- a/drivers/base/dd.c
  +++ b/drivers/base/dd.c
  @@ -389,7 +389,7 @@ static int __device_attach(struct device_driver *drv, 
  void *data)
   {
  struct device *dev = data;
   
  -   if (!driver_match_device(drv, dev))
  +   if (drv->sysfs_bind_only || !driver_match_device(drv, dev))
  return 0;
   
  return driver_probe_device(drv, dev);
  @@ -476,6 +476,9 @@ static int __driver_attach(struct device *dev, void 
  *data)
*/
   int driver_attach(struct device_driver *drv)
   {
  +   if (drv->sysfs_bind_only)
  +   return 0;
  +
  return bus_for_each_dev(drv->bus, NULL, drv, __driver_attach);
   }
   EXPORT_SYMBOL_GPL(driver_attach);
  diff --git a/include/linux/device.h b/include/linux/device.h
  index 952b010..ed441d1 100644
  --- a/include/linux/device.h
  +++ b/include/linux/device.h
  @@ -200,6 +200,7 @@ extern struct klist *bus_get_device_klist(struct 
  bus_type *bus);
* @owner: The module owner.
* @mod_name:  Used for built-in modules.
* @suppress_bind_attrs: Disables bind/unbind via sysfs.
  + * @sysfs_bind_only: Only allow bind/unbind via sysfs.
* @of_match_table: The open firmware table.
* @acpi_match_table: The ACPI match table.
* @probe: Called to query the existence of a specific device,
  @@ -233,6 +234,7 @@ struct device_driver {
  const char  *mod_name;  /* used for built-in modules */
   
  bool suppress_bind_attrs;   /* disables bind/unbind via sysfs */
  +   bool sysfs_bind_only;   /* only allow bind/unbind via sysfs */
   
  const struct of_device_id   *of_match_table;
  const struct acpi_device_id *acpi_match_table;
  
 
 I think I only discussed this with Stuart in person at the KVM Forum:
  Why not derive the sysfs bind only property from the fact that a
 device does wild-card binding? Are there use cases that benefit from
 decoupling both features?

The driver core does not know if a bus, or a device, handles wild card
binding, so you can't do it at this level, sorry.

greg k-h


Re: Updated Elvis Upstreaming Roadmap

2013-12-18 Thread Abel Gordon
On Wed, Dec 18, 2013 at 12:43 PM, Michael S. Tsirkin m...@redhat.com wrote:
 On Tue, Dec 17, 2013 at 12:04:42PM +0200, Razya Ladelsky wrote:
 Hi,

 Thank you all for your comments.
 I'm sorry for taking this long to reply, I was away on vacation..

 It was a good, long discussion, many issues were raised, which we'd like
 to address with the following proposed roadmap for Elvis patches.
 In general, we believe it would be best to start with patches that are
 as simple as possible, providing the basic Elvis functionality,
 and attend to the more complicated issues in subsequent patches.

 Here's the road map for Elvis patches:

 Thanks for the follow up. Some suggestions below.
  Please note the suggestions below merely represent
 thoughts on merging upstream.
 If as the first step you are content with keeping this
 work as out of tree patches, in order to have
 the freedom to experiment with interfaces and
 performance, please feel free to ignore them.

 1. Shared vhost thread for multiple devices.

 The way to go here, we believe, is to start with a patch having a shared
 vhost thread for multiple devices of the SAME vm.
 The next step/patch may be handling vms belonging to the same cgroup.

 Finally, we need to extend the functionality so that the shared vhost
 thread
 serves multiple vms (not necessarily belonging to the same cgroup).

 There was a lot of discussion about the way to address the enforcement
 of cgroup policies, and we will consider the various solutions with a
 future
 patch.

 With respect to the upstream kernel,
 I'm not sure a bunch of changes just for the sake of guests with
 multiple virtual NIC cards makes sense.
 And I wonder how this step, in isolation, will affect e.g.
 multiqueue workloads.
 But I guess if the numbers are convincing, this can be mergeable.

Even if you have a single multiqueue device this change allows
creating one vhost thread for all the queues, one vhost thread per
queue, or any other combination. I guess that depending on the workload
and on the system utilization (free cycles/cores, density) you would
prefer to use one or more vhost threads.



 2. Creation of vhost threads

 We suggested two ways of controlling the creation and removal of vhost
 threads:
 - statically determining the maximum number of virtio devices per worker
 via a kernel module parameter
 - dynamically: Sysfs mechanism to add and remove vhost threads

 It seems that it would be simplest to take the static approach as
 a first stage. At a second stage (next patch), we'll advance to
 dynamically
 changing the number of vhost threads, using the static module parameter
 only as a default value.

 I'm not sure how independent this is from 1.
 With respect to the upstream kernel,
 Introducing interfaces (which we'll have to maintain
 forever) just for the sake of guests with
 multiple virtual NIC cards does not look like a good tradeoff.

 So I'm unlikely to merge this upstream without making it useful cross-VM,
 and yes this means isolation and accounting with cgroups need to
 work properly.

Agree, but even if you use a single multiqueue device having the
ability to use 1 thread to serve all the queues or multiple threads to
serve all the queues looks like a useful feature.


 Regarding cwmq, it is an interesting mechanism, which we need to explore
 further.
 At the moment we prefer not to change the vhost model to use cwmq, as some
 of the issues that were discussed, such as cgroups, are not supported by
 cwmq, and this is adding more complexity.
 However, we'll look further into it, and consider it at a later stage.

 Hmm that's still assuming some smart management tool configuring
 this correctly.  Can't this be determined automatically depending
 on the workload?
 This is what the cwmq suggestion was really about: detect
 that we need more threads and spawn them.
 It's less about sharing the implementation with workqueues -
 would be very nice but not a must.

But how can cwmq consider cgroup accounting?
In any case, IMHO, the kernel should first provide the mechanism so
later on a user-space management application (the policy) can
orchestrate it.



 3. Adding polling mode to vhost

 It is a good idea making polling adaptive based on various factors such as
 the I/O rate, the guest kick overhead (which is the tradeoff of polling),
 or the amount of wasted cycles (cycles we kept polling but no new work was
 added).
 However, as a beginning polling patch, we would prefer having a naive
 polling approach, which could be tuned with later patches.


 While any polling approach would still need a lot of testing to prove we
 don't, for example, steal CPU from a guest which could be doing other useful
 work, given that an exit costs at least 1.5K cycles, in theory it
 seems like something that can improve performance.  I'm not sure how
 naive we can be without introducing regressions for some workloads.
 For example, if we are on the same host CPU, there's no
 chance busy waiting 

Re: [PATCH] powerpc: book3s: kvm: Use the saved dsisr and dar values

2013-12-18 Thread Aneesh Kumar K.V
Alexander Graf ag...@suse.de writes:

 On 11.11.2013, at 15:02, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com 
 wrote:

 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 Don't try to compute these values.
 
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 ---
 
  NOTE: I am not sure why we were originally computing dsisr and dar. So maybe
 we need a variant of this patch. But with this and the additional patch
 powerpc: book3s: PR: Enable Little Endian PR guest I am able to get a 
 Little Endian
 PR guest to boot.

 It's quite easy to find out - git blame tells you all the history and points 
 you to commit ca7f4203b.

 commit ca7f4203b9b66e12d0d9968ff7dfe781f3a9695a
 Author: Alexander Graf ag...@suse.de
 Date:   Wed Mar 24 21:48:28 2010 +0100

 KVM: PPC: Implement alignment interrupt

 Mac OS X has some applications - namely the Finder - that require 
 alignment
 interrupts to work properly. So we need to implement them.

 But the spec for 970 and 750 also looks different. While 750 requires the
 DSISR and DAR fields to reflect some instruction bits (DSISR) and the 
 fault
 address (DAR), the 970 declares this as an optional feature. So we need
 to reconstruct DSISR and DAR manually.

 Signed-off-by: Alexander Graf ag...@suse.de
 Signed-off-by: Avi Kivity a...@redhat.com

 Read this as on 970, alignment interrupts don't give us DSISR and DAR of the 
 faulting instruction as otherwise I wouldn't have implemented it.

 So this is clearly a nack on this patch :).

I can possibly do an if (cpu_has_feature(CPU_FTR_ARCH_201)) check. But do we need
to do that? According to Paul we should always find DAR.

-aneesh



Re: [PATCH 0/4 v3] kvm: powerpc: use cache attributes from linux pte

2013-12-18 Thread Alexander Graf

On 18.11.2013, at 19:36, Scott Wood scottw...@freescale.com wrote:

 On Fri, 2013-11-15 at 11:01 +0530, Bharat Bhushan wrote:
 From: Bharat Bhushan bharat.bhus...@freescale.com
 
 v2->v3
  - Returning pte pointer rather than pte as suggested by Scott
  - Also removed PAGE_SPLITTING as this needs to be handled by caller
 
 v1->v2
 - Removed _PAGE_BUSY loop as suggested by PaulS.
 - Added check for PAGE_SPLITTING
 
 kvm: powerpc: use cache attributes from linux pte
 - 1st Patch fixes a bug in booke (detail in patch)
 - 2nd patch renames linux_pte_lookup_function() just for clarity.
   There is no functional change.
 - 3rd Patch adds a Linux pte lookup function.
 - 4th Patch uses the above defined function and sets up TLB.wimg accordingly
 
 Bharat Bhushan (4):
  kvm: booke: clear host tlb reference flag on guest tlb invalidation
  kvm: book3s: rename lookup_linux_pte() to
lookup_linux_pte_and_update()
  kvm: powerpc: define a linux pte lookup function
  kvm: powerpc: use caching attributes as per linux pte
 
 arch/powerpc/include/asm/kvm_host.h |2 +-
 arch/powerpc/include/asm/pgtable.h  |   21 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |8 +++--
 arch/powerpc/kvm/booke.c|1 +
 arch/powerpc/kvm/e500.h |8 +++--
 arch/powerpc/kvm/e500_mmu_host.c|   55 +++---
 6 files changed, 64 insertions(+), 31 deletions(-)
 
 Reviewed-by: Scott Wood scottw...@freescale.com with patch 4/4
 replaced by v4.

Thanks, applied to kvm-ppc-queue.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] powerpc: book3s: kvm: Don't abuse host r2 in exit path

2013-12-18 Thread Alexander Graf

On 18.12.2013, at 08:35, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com 
wrote:

 
 Hi Alex,
 
 Any update on this ? We need this to got into 3.13.

Thanks, applied to for-3.13.


Alex



Re: [PATCH] KVM: PPC: Book3S HV: Don't drop low-order page address bits

2013-12-18 Thread Alexander Graf

On 16.12.2013, at 03:31, Paul Mackerras pau...@samba.org wrote:

 Commit caaa4c804fae (KVM: PPC: Book3S HV: Fix physical address
 calculations) unfortunately resulted in some low-order address bits
 getting dropped in the case where the guest is creating a 4k HPTE
 and the host page size is 64k.  By getting the low-order bits from
 hva rather than gpa we miss out on bits 12 - 15 in this case, since
 hva is at page granularity.  This puts the missing bits back in.
 
 Reported-by: Alexey Kardashevskiy a...@ozlabs.ru
 Signed-off-by: Paul Mackerras pau...@samba.org

Thanks, applied to for-3.13.


Alex



[PULL 04/12] KVM: PPC: Book3S HV: Take SRCU read lock around kvm_read_guest() call

2013-12-18 Thread Alexander Graf
From: Paul Mackerras pau...@samba.org

Running a kernel with CONFIG_PROVE_RCU=y yields the following diagnostic:

===
[ INFO: suspicious RCU usage. ]
3.12.0-rc5-kvm+ #9 Not tainted
---

include/linux/kvm_host.h:473 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by qemu-system-ppc/4831:

stack backtrace:
CPU: 28 PID: 4831 Comm: qemu-system-ppc Not tainted 3.12.0-rc5-kvm+ #9
Call Trace:
[c00be462b2a0] [c001644c] .show_stack+0x7c/0x1f0 (unreliable)
[c00be462b370] [c0ad57c0] .dump_stack+0x88/0xb4
[c00be462b3f0] [c01315e8] .lockdep_rcu_suspicious+0x138/0x180
[c00be462b480] [c007862c] .gfn_to_memslot+0x13c/0x170
[c00be462b510] [c007d384] .gfn_to_hva_prot+0x24/0x90
[c00be462b5a0] [c007d420] .kvm_read_guest_page+0x30/0xd0
[c00be462b630] [c007d528] .kvm_read_guest+0x68/0x110
[c00be462b6e0] [c0084594] .kvmppc_rtas_hcall+0x34/0x180
[c00be462b7d0] [c0097934] .kvmppc_pseries_do_hcall+0x74/0x830
[c00be462b880] [c00990e8] .kvmppc_vcpu_run_hv+0xff8/0x15a0
[c00be462b9e0] [c00839cc] .kvmppc_vcpu_run+0x2c/0x40
[c00be462ba50] [c00810b4] .kvm_arch_vcpu_ioctl_run+0x54/0x1b0
[c00be462bae0] [c007b508] .kvm_vcpu_ioctl+0x478/0x730
[c00be462bca0] [c025532c] .do_vfs_ioctl+0x4dc/0x7a0
[c00be462bd80] [c02556b4] .SyS_ioctl+0xc4/0xe0
[c00be462be30] [c0009ee4] syscall_exit+0x0/0x98

To fix this, we take the SRCU read lock around the kvmppc_rtas_hcall()
call.

Signed-off-by: Paul Mackerras pau...@samba.org
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_hv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 31d9cfb..b51d5db 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -591,7 +591,9 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 		if (list_empty(&vcpu->kvm->arch.rtas_tokens))
 			return RESUME_HOST;
 
+		idx = srcu_read_lock(&vcpu->kvm->srcu);
 		rc = kvmppc_rtas_hcall(vcpu);
+		srcu_read_unlock(&vcpu->kvm->srcu, idx);
 
if (rc == -ENOENT)
return RESUME_HOST;
-- 
1.8.1.4



[PULL 06/12] KVM: PPC: Book3S: PR: Don't clobber our exit handler id

2013-12-18 Thread Alexander Graf
We call a C helper to save all svcpu fields into our vcpu. The C
ABI states that r12 is considered volatile. However, we keep our
exit handler id in r12 currently.

So we need to save it away into a non-volatile register instead
that definitely does get preserved across the C call.

This bug usually didn't hit anyone yet since gcc is smart enough
to generate code that doesn't even need r12 which means it stayed
identical throughout the call by sheer luck. But we can't rely on
that.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_interrupts.S | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_interrupts.S 
b/arch/powerpc/kvm/book3s_interrupts.S
index f4dd041..5e7cb32 100644
--- a/arch/powerpc/kvm/book3s_interrupts.S
+++ b/arch/powerpc/kvm/book3s_interrupts.S
@@ -132,9 +132,17 @@ kvm_start_lightweight:
 *
 */
 
+   PPC_LL  r3, GPR4(r1)/* vcpu pointer */
+
+   /*
+* kvmppc_copy_from_svcpu can clobber volatile registers, save
+* the exit handler id to the vcpu and restore it from there later.
+*/
+   stw r12, VCPU_TRAP(r3)
+
/* Transfer reg values from shadow vcpu back to vcpu struct */
/* On 64-bit, interrupts are still off at this point */
-   PPC_LL  r3, GPR4(r1)/* vcpu pointer */
+
GET_SHADOW_VCPU(r4)
bl  FUNC(kvmppc_copy_from_svcpu)
nop
@@ -151,7 +159,6 @@ kvm_start_lightweight:
 */
ld  r3, PACA_SPRG3(r13)
mtspr   SPRN_SPRG3, r3
-
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
/* R7 = vcpu */
@@ -177,7 +184,7 @@ kvm_start_lightweight:
PPC_STL r31, VCPU_GPR(R31)(r7)
 
/* Pass the exit number as 3rd argument to kvmppc_handle_exit */
-   mr  r5, r12
+   lwz r5, VCPU_TRAP(r7)
 
/* Restore r3 (kvm_run) and r4 (vcpu) */
REST_2GPRS(3, r1)
-- 
1.8.1.4



[PULL 07/12] KVM: PPC: Book3S: PR: Export kvmppc_copy_to|from_svcpu

2013-12-18 Thread Alexander Graf
The kvmppc_copy_{to,from}_svcpu functions are publicly visible,
so we should also export them in a header for other C files to
consume.

So far we didn't need this because we only called it from asm code.
The next patch will introduce a C caller.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 4a594b7..bc23b1b 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -192,6 +192,10 @@ extern void kvmppc_load_up_vsx(void);
 extern u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst);
 extern ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst);
 extern int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd);
+extern void kvmppc_copy_to_svcpu(struct kvmppc_book3s_shadow_vcpu *svcpu,
+struct kvm_vcpu *vcpu);
+extern void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu,
+  struct kvmppc_book3s_shadow_vcpu *svcpu);
 
 static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
 {
-- 
1.8.1.4



[PULL 11/12] powerpc: book3s: kvm: Don't abuse host r2 in exit path

2013-12-18 Thread Alexander Graf
From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

We don't use PACATOC for PR. Avoid updating HOST_R2 with PR
KVM mode when both HV and PR are enabled in the kernel. Without this we
get the below crash

(qemu)
Unable to handle kernel paging request for data at address 0x8310
Faulting instruction address: 0xc001d5a4
cpu 0x2: Vector: 300 (Data Access) at [c001dc53aef0]
pc: c001d5a4: .vtime_delta.isra.1+0x34/0x1d0
lr: c001d760: .vtime_account_system+0x20/0x60
sp: c001dc53b170
   msr: 80009032
   dar: 8310
 dsisr: 4000
  current = 0xc001d76c62d0
  paca= 0xcfef1100   softe: 0   irq_happened: 0x01
pid   = 4472, comm = qemu-system-ppc
enter ? for help
[c001dc53b200] c001d760 .vtime_account_system+0x20/0x60
[c001dc53b290] c008d050 .kvmppc_handle_exit_pr+0x60/0xa50
[c001dc53b340] c008f51c kvm_start_lightweight+0xb4/0xc4
[c001dc53b510] c008cdf0 .kvmppc_vcpu_run_pr+0x150/0x2e0
[c001dc53b9e0] c008341c .kvmppc_vcpu_run+0x2c/0x40
[c001dc53ba50] c0080af4 .kvm_arch_vcpu_ioctl_run+0x54/0x1b0
[c001dc53bae0] c007b4c8 .kvm_vcpu_ioctl+0x478/0x730
[c001dc53bca0] c02140cc .do_vfs_ioctl+0x4ac/0x770
[c001dc53bd80] c02143e8 .SyS_ioctl+0x58/0xb0
[c001dc53be30] c0009e58 syscall_exit+0x0/0x98

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s_asm.h | 1 +
 arch/powerpc/kernel/asm-offsets.c | 1 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 7 +++
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 412b2f3..192917d 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -79,6 +79,7 @@ struct kvmppc_host_state {
ulong vmhandler;
ulong scratch0;
ulong scratch1;
+   ulong scratch2;
u8 in_guest;
u8 restore_hid5;
u8 napping;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 2ea5cc0..d3de010 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -576,6 +576,7 @@ int main(void)
HSTATE_FIELD(HSTATE_VMHANDLER, vmhandler);
HSTATE_FIELD(HSTATE_SCRATCH0, scratch0);
HSTATE_FIELD(HSTATE_SCRATCH1, scratch1);
+   HSTATE_FIELD(HSTATE_SCRATCH2, scratch2);
HSTATE_FIELD(HSTATE_IN_GUEST, in_guest);
HSTATE_FIELD(HSTATE_RESTORE_HID5, restore_hid5);
HSTATE_FIELD(HSTATE_NAPPING, napping);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index bde28da..be4fa04a 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -754,15 +754,14 @@ kvmppc_interrupt_hv:
 * guest CR, R12 saved in shadow VCPU SCRATCH1/0
 * guest R13 saved in SPRN_SCRATCH0
 */
-   /* abuse host_r2 as third scratch area; we get r2 from PACATOC(r13) */
-   std r9, HSTATE_HOST_R2(r13)
+   std r9, HSTATE_SCRATCH2(r13)
 
lbz r9, HSTATE_IN_GUEST(r13)
cmpwi   r9, KVM_GUEST_MODE_HOST_HV
beq kvmppc_bad_host_intr
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
cmpwi   r9, KVM_GUEST_MODE_GUEST
-   ld  r9, HSTATE_HOST_R2(r13)
+   ld  r9, HSTATE_SCRATCH2(r13)
beq kvmppc_interrupt_pr
 #endif
/* We're now back in the host but in guest MMU context */
@@ -782,7 +781,7 @@ kvmppc_interrupt_hv:
std r6, VCPU_GPR(R6)(r9)
std r7, VCPU_GPR(R7)(r9)
std r8, VCPU_GPR(R8)(r9)
-   ld  r0, HSTATE_HOST_R2(r13)
+   ld  r0, HSTATE_SCRATCH2(r13)
std r0, VCPU_GPR(R9)(r9)
std r10, VCPU_GPR(R10)(r9)
std r11, VCPU_GPR(R11)(r9)
-- 
1.8.1.4



[PULL 10/12] powerpc/kvm/booke: Fix build break due to stack frame size warning

2013-12-18 Thread Alexander Graf
From: Scott Wood scottw...@freescale.com

Commit ce11e48b7fdd256ec68b932a89b397a790566031 (KVM: PPC: E500: Add
userspace debug stub support) added struct thread_struct to the
stack of kvmppc_vcpu_run().  thread_struct is 1152 bytes on my build,
compared to 48 bytes for the recently-introduced struct debug_reg.
Use the latter instead.

This fixes the following error:

cc1: warnings being treated as errors
arch/powerpc/kvm/booke.c: In function 'kvmppc_vcpu_run':
arch/powerpc/kvm/booke.c:760:1: error: the frame size of 1424 bytes is larger than 1024 bytes
make[2]: *** [arch/powerpc/kvm/booke.o] Error 1
make[1]: *** [arch/powerpc/kvm] Error 2
make[1]: *** Waiting for unfinished jobs

Signed-off-by: Scott Wood scottw...@freescale.com
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/switch_to.h |  2 +-
 arch/powerpc/kernel/process.c| 32 
 arch/powerpc/kvm/booke.c | 12 ++--
 3 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index 9ee1261..aace905 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -35,7 +35,7 @@ extern void giveup_vsx(struct task_struct *);
 extern void enable_kernel_spe(void);
 extern void giveup_spe(struct task_struct *);
 extern void load_up_spe(struct task_struct *);
-extern void switch_booke_debug_regs(struct thread_struct *new_thread);
+extern void switch_booke_debug_regs(struct debug_reg *new_debug);
 
 #ifndef CONFIG_SMP
 extern void discard_lazy_cpu_state(void);
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 75c2d10..83530af 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -339,7 +339,7 @@ static void set_debug_reg_defaults(struct thread_struct *thread)
 #endif
 }
 
-static void prime_debug_regs(struct thread_struct *thread)
+static void prime_debug_regs(struct debug_reg *debug)
 {
/*
 * We could have inherited MSR_DE from userspace, since
@@ -348,22 +348,22 @@ static void prime_debug_regs(struct thread_struct *thread)
 */
mtmsr(mfmsr()  ~MSR_DE);
 
-   mtspr(SPRN_IAC1, thread->debug.iac1);
-   mtspr(SPRN_IAC2, thread->debug.iac2);
+   mtspr(SPRN_IAC1, debug->iac1);
+   mtspr(SPRN_IAC2, debug->iac2);
 #if CONFIG_PPC_ADV_DEBUG_IACS > 2
-   mtspr(SPRN_IAC3, thread->debug.iac3);
-   mtspr(SPRN_IAC4, thread->debug.iac4);
+   mtspr(SPRN_IAC3, debug->iac3);
+   mtspr(SPRN_IAC4, debug->iac4);
 #endif
-   mtspr(SPRN_DAC1, thread->debug.dac1);
-   mtspr(SPRN_DAC2, thread->debug.dac2);
+   mtspr(SPRN_DAC1, debug->dac1);
+   mtspr(SPRN_DAC2, debug->dac2);
 #if CONFIG_PPC_ADV_DEBUG_DVCS > 0
-   mtspr(SPRN_DVC1, thread->debug.dvc1);
-   mtspr(SPRN_DVC2, thread->debug.dvc2);
+   mtspr(SPRN_DVC1, debug->dvc1);
+   mtspr(SPRN_DVC2, debug->dvc2);
 #endif
-   mtspr(SPRN_DBCR0, thread->debug.dbcr0);
-   mtspr(SPRN_DBCR1, thread->debug.dbcr1);
+   mtspr(SPRN_DBCR0, debug->dbcr0);
+   mtspr(SPRN_DBCR1, debug->dbcr1);
 #ifdef CONFIG_BOOKE
-   mtspr(SPRN_DBCR2, thread->debug.dbcr2);
+   mtspr(SPRN_DBCR2, debug->dbcr2);
 #endif
 #endif
 }
 /*
@@ -371,11 +371,11 @@ static void prime_debug_regs(struct thread_struct *thread)
  * debug registers, set the debug registers from the values
  * stored in the new thread.
  */
-void switch_booke_debug_regs(struct thread_struct *new_thread)
+void switch_booke_debug_regs(struct debug_reg *new_debug)
 {
	if ((current->thread.debug.dbcr0 & DBCR0_IDM)
-	    || (new_thread->debug.dbcr0 & DBCR0_IDM))
-		prime_debug_regs(new_thread);
+	    || (new_debug->dbcr0 & DBCR0_IDM))
+		prime_debug_regs(new_debug);
+   prime_debug_regs(new_debug);
 }
 EXPORT_SYMBOL_GPL(switch_booke_debug_regs);
 #else  /* !CONFIG_PPC_ADV_DEBUG_REGS */
@@ -683,7 +683,7 @@ struct task_struct *__switch_to(struct task_struct *prev,
 #endif /* CONFIG_SMP */
 
 #ifdef CONFIG_PPC_ADV_DEBUG_REGS
-   switch_booke_debug_regs(&new->thread);
+   switch_booke_debug_regs(&new->thread.debug);
 #else
 /*
  * For PPC_BOOK3S_64, we use the hw-breakpoint interfaces that would
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 53e65a2..0591e05 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -681,7 +681,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
 int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 {
int ret, s;
-   struct thread_struct thread;
+   struct debug_reg debug;
 #ifdef CONFIG_PPC_FPU
struct thread_fp_state fp;
int fpexc_mode;
@@ -723,9 +723,9 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 #endif
 
/* Switch to guest debug context */
-   thread.debug = vcpu->arch.shadow_dbg_reg;
-   switch_booke_debug_regs(&thread);
-   thread.debug = current->thread.debug;
+  

[PULL 03/12] KVM: PPC: Book3S HV: Make tbacct_lock irq-safe

2013-12-18 Thread Alexander Graf
From: Paul Mackerras pau...@samba.org

Lockdep reported that there is a potential for deadlock because
vcpu->arch.tbacct_lock is not irq-safe, and is sometimes taken inside
the rq_lock (run-queue lock) in the scheduler, which is taken within
interrupts.  The lockdep splat looks like:

==
[ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
3.12.0-rc5-kvm+ #8 Not tainted
--
qemu-system-ppc/4803 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
(&(&vcpu->arch.tbacct_lock)->rlock){+.+...}, at: [c00947ac] 
.kvmppc_core_vcpu_put_hv+0x2c/0xa0

and this task is already holding:
(&rq->lock){-.-.-.}, at: [c0ac16c0] .__schedule+0x180/0xaa0
which would create a new lock dependency:
(&rq->lock){-.-.-.} -> (&(&vcpu->arch.tbacct_lock)->rlock){+.+...}

but this new dependency connects a HARDIRQ-irq-safe lock:
(&rq->lock){-.-.-.}
... which became HARDIRQ-irq-safe at:
 [c013797c] .lock_acquire+0xbc/0x190
 [c0ac3c74] ._raw_spin_lock+0x34/0x60
 [c00f8564] .scheduler_tick+0x54/0x180
 [c00c2610] .update_process_times+0x70/0xa0
 [c012cdfc] .tick_periodic+0x3c/0xe0
 [c012cec8] .tick_handle_periodic+0x28/0xb0
 [c001ef40] .timer_interrupt+0x120/0x2e0
 [c0002868] decrementer_common+0x168/0x180
 [c01c7ca4] .get_page_from_freelist+0x924/0xc10
 [c01c8e00] .__alloc_pages_nodemask+0x200/0xba0
 [c01c9eb8] .alloc_pages_exact_nid+0x68/0x110
 [c0f4c3ec] .page_cgroup_init+0x1e0/0x270
 [c0f24480] .start_kernel+0x3e0/0x4e4
 [c0009d30] .start_here_common+0x20/0x70

to a HARDIRQ-irq-unsafe lock:
(&(&vcpu->arch.tbacct_lock)->rlock){+.+...}
... which became HARDIRQ-irq-unsafe at:
...  [c013797c] .lock_acquire+0xbc/0x190
 [c0ac3c74] ._raw_spin_lock+0x34/0x60
 [c00946ac] .kvmppc_core_vcpu_load_hv+0x2c/0x100
 [c008394c] .kvmppc_core_vcpu_load+0x2c/0x40
 [c0081000] .kvm_arch_vcpu_load+0x10/0x30
 [c007afd4] .vcpu_load+0x64/0xd0
 [c007b0f8] .kvm_vcpu_ioctl+0x68/0x730
 [c025530c] .do_vfs_ioctl+0x4dc/0x7a0
 [c0255694] .SyS_ioctl+0xc4/0xe0
 [c0009ee4] syscall_exit+0x0/0x98

Some users have reported this deadlock occurring in practice, though
the reports have been primarily on 3.10.x-based kernels.

This fixes the problem by making tbacct_lock be irq-safe.

Signed-off-by: Paul Mackerras pau...@samba.org
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_hv.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 072287f..31d9cfb 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -131,8 +131,9 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 static void kvmppc_core_vcpu_load_hv(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct kvmppc_vcore *vc = vcpu->arch.vcore;
+	unsigned long flags;
 
-	spin_lock(&vcpu->arch.tbacct_lock);
+	spin_lock_irqsave(&vcpu->arch.tbacct_lock, flags);
 	if (vc->runner == vcpu && vc->vcore_state != VCORE_INACTIVE &&
 	    vc->preempt_tb != TB_NIL) {
 		vc->stolen_tb += mftb() - vc->preempt_tb;
@@ -143,19 +144,20 @@ static void kvmppc_core_vcpu_load_hv(struct kvm_vcpu *vcpu, int cpu)
 		vcpu->arch.busy_stolen += mftb() - vcpu->arch.busy_preempt;
 		vcpu->arch.busy_preempt = TB_NIL;
 	}
-	spin_unlock(&vcpu->arch.tbacct_lock);
+	spin_unlock_irqrestore(&vcpu->arch.tbacct_lock, flags);
 }
 
 static void kvmppc_core_vcpu_put_hv(struct kvm_vcpu *vcpu)
 {
 	struct kvmppc_vcore *vc = vcpu->arch.vcore;
+	unsigned long flags;
 
-	spin_lock(&vcpu->arch.tbacct_lock);
+	spin_lock_irqsave(&vcpu->arch.tbacct_lock, flags);
 	if (vc->runner == vcpu && vc->vcore_state != VCORE_INACTIVE)
 		vc->preempt_tb = mftb();
 	if (vcpu->arch.state == KVMPPC_VCPU_BUSY_IN_HOST)
 		vcpu->arch.busy_preempt = mftb();
-	spin_unlock(&vcpu->arch.tbacct_lock);
+	spin_unlock_irqrestore(&vcpu->arch.tbacct_lock, flags);
 }
 
 static void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr)
@@ -486,11 +488,11 @@ static u64 vcore_stolen_time(struct kvmppc_vcore *vc, u64 now)
 	 */
 	if (vc->vcore_state != VCORE_INACTIVE &&
 	    vc->runner->arch.run_task != current) {
-		spin_lock(&vc->runner->arch.tbacct_lock);
+		spin_lock_irq(&vc->runner->arch.tbacct_lock);
 		p = vc->stolen_tb;
 		if (vc->preempt_tb != TB_NIL)
 			p += now - vc->preempt_tb;
-		spin_unlock(&vc->runner->arch.tbacct_lock);
+		spin_unlock_irq(&vc->runner->arch.tbacct_lock);
 	} else {
 		p = vc->stolen_tb;
 	}
@@ -512,10 +514,10 @@ static void kvmppc_create_dtl_entry(struct kvm_vcpu *vcpu,
core_stolen = 

[PULL 00/12] 3.13 patch queue 2013-12-18 for 3.13

2013-12-18 Thread Alexander Graf
Hi Paolo / Gleb,

This is my current patch queue for 3.13.

It fixes some grave issues we've only found after 3.13-rc1:

  - Make the modularized HV/PR book3s kvm work well as modules
  - Fix some race conditions
  - Fix compilation with certain compilers (booke)
  - Fix THP for book3s_hv
  - Fix preemption for book3s_pr

Please pull.

Alex


The following changes since commit f080480488028bcc25357f85e8ae54ccc3bb7173:

  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm 
(2013-11-15 13:51:36 +0900)

are available in the git repository at:


  git://github.com/agraf/linux-2.6.git tags/signed-for-3.13

for you to fetch changes up to df9059bb64023da9f27e56a94a3e2b8f4b6336a9:

  KVM: PPC: Book3S HV: Don't drop low-order page address bits (2013-12-18 
11:30:35 +0100)


Patch queue for 3.13 - 2013-12-18

This fixes some grave issues we've only found after 3.13-rc1:

  - Make the modularized HV/PR book3s kvm work well as modules
  - Fix some race conditions
  - Fix compilation with certain compilers (booke)
  - Fix THP for book3s_hv
  - Fix preemption for book3s_pr

Alexander Graf (4):
  KVM: PPC: Book3S: PR: Don't clobber our exit handler id
  KVM: PPC: Book3S: PR: Export kvmppc_copy_to|from_svcpu
  KVM: PPC: Book3S: PR: Make svcpu -> vcpu store preempt savvy
  KVM: PPC: Book3S: PR: Enable interrupts earlier

Aneesh Kumar K.V (1):
  powerpc: book3s: kvm: Don't abuse host r2 in exit path

Paul Mackerras (5):
  KVM: PPC: Book3S HV: Fix physical address calculations
  KVM: PPC: Book3S HV: Refine barriers in guest entry/exit
  KVM: PPC: Book3S HV: Make tbacct_lock irq-safe
  KVM: PPC: Book3S HV: Take SRCU read lock around kvm_read_guest() call
  KVM: PPC: Book3S HV: Don't drop low-order page address bits

Scott Wood (1):
  powerpc/kvm/booke: Fix build break due to stack frame size warning

pingfan liu (1):
  powerpc: kvm: fix rare but potential deadlock scene


 arch/powerpc/include/asm/kvm_book3s.h |  4 
 arch/powerpc/include/asm/kvm_book3s_asm.h |  2 ++
 arch/powerpc/include/asm/switch_to.h  |  2 +-
 arch/powerpc/kernel/asm-offsets.c |  1 +
 arch/powerpc/kernel/process.c | 32 +++
 arch/powerpc/kvm/book3s_64_mmu_hv.c   | 18 +
 arch/powerpc/kvm/book3s_hv.c  | 24 +--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   |  9 +++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 23 --
 arch/powerpc/kvm/book3s_interrupts.S  | 19 ++
 arch/powerpc/kvm/book3s_pr.c  | 22 +
 arch/powerpc/kvm/book3s_rmhandlers.S  |  6 +-
 arch/powerpc/kvm/booke.c  | 12 ++--
 13 files changed, 112 insertions(+), 62 deletions(-)


[PULL 12/12] KVM: PPC: Book3S HV: Don't drop low-order page address bits

2013-12-18 Thread Alexander Graf
From: Paul Mackerras pau...@samba.org

Commit caaa4c804fae (KVM: PPC: Book3S HV: Fix physical address
calculations) unfortunately resulted in some low-order address bits
getting dropped in the case where the guest is creating a 4k HPTE
and the host page size is 64k.  By getting the low-order bits from
hva rather than gpa we miss out on bits 12 - 15 in this case, since
hva is at page granularity.  This puts the missing bits back in.

Reported-by: Alexey Kardashevskiy a...@ozlabs.ru
Signed-off-by: Paul Mackerras pau...@samba.org
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 1931aa3..8689e2e 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -240,6 +240,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
	is_io = hpte_cache_bits(pte_val(pte));
	pa = pte_pfn(pte) << PAGE_SHIFT;
	pa |= hva & (pte_size - 1);
+	pa |= gpa & ~PAGE_MASK;
}
}
 
-- 
1.8.1.4



[PULL 02/12] KVM: PPC: Book3S HV: Refine barriers in guest entry/exit

2013-12-18 Thread Alexander Graf
From: Paul Mackerras pau...@samba.org

Some users have reported instances of the host hanging with secondary
threads of a core waiting for the primary thread to exit the guest,
and the primary thread stuck in nap mode.  This prompted a review of
the memory barriers in the guest entry/exit code, and this is the
result.  Most of these changes are the suggestions of Dean Burdick
deanburd...@us.ibm.com.

The barriers between updating napping_threads and reading the
entry_exit_count on the one hand, and updating entry_exit_count and
reading napping_threads on the other, need to be isync not lwsync,
since we need to ensure that either the napping_threads update or the
entry_exit_count update get seen.  It is not sufficient to order the
load vs. lwarx, as lwsync does; we need to order the load vs. the
stwcx., so we need isync.

In addition, we need a full sync before sending IPIs to wake other
threads from nap, to ensure that the write to the entry_exit_count is
visible before the IPI occurs.

Signed-off-by: Paul Mackerras pau...@samba.org
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index bc8de75..bde28da 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -153,7 +153,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
 
 13:b   machine_check_fwnmi
 
-
 /*
  * We come in here when wakened from nap mode on a secondary hw thread.
  * Relocation is off and most register values are lost.
@@ -224,6 +223,11 @@ kvm_start_guest:
/* Clear our vcpu pointer so we don't come back in early */
li  r0, 0
std r0, HSTATE_KVM_VCPU(r13)
+   /*
+* Make sure we clear HSTATE_KVM_VCPU(r13) before incrementing
+* the nap_count, because once the increment to nap_count is
+* visible we could be given another vcpu.
+*/
lwsync
/* Clear any pending IPI - we're an offline thread */
ld  r5, HSTATE_XICS_PHYS(r13)
@@ -241,7 +245,6 @@ kvm_start_guest:
/* increment the nap count and then go to nap mode */
ld  r4, HSTATE_KVM_VCORE(r13)
	addi	r4, r4, VCORE_NAP_COUNT
-   lwsync  /* make previous updates visible */
 51:lwarx   r3, 0, r4
	addi	r3, r3, 1
stwcx.  r3, 0, r4
@@ -990,14 +993,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201)
 */
	/* Increment the threads-exiting-guest count in the 0xff00
	   bits of vcore->entry_exit_count */
-   lwsync
ld  r5,HSTATE_KVM_VCORE(r13)
	addi	r6,r5,VCORE_ENTRY_EXIT
 41:lwarx   r3,0,r6
	addi	r0,r3,0x100
stwcx.  r0,0,r6
bne 41b
-   lwsync
+   isync   /* order stwcx. vs. reading napping_threads */
 
/*
 * At this point we have an interrupt that we have to pass
@@ -1030,6 +1032,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201)
sld r0,r0,r4
andc.   r3,r3,r0/* no sense IPI'ing ourselves */
beq 43f
+   /* Order entry/exit update vs. IPIs */
+   sync
mulli   r4,r4,PACA_SIZE /* get paca for thread 0 */
	subf	r6,r4,r13
 42:andi.   r0,r3,1
@@ -1638,10 +1642,10 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_206)
bge kvm_cede_exit
stwcx.  r4,0,r6
bne 31b
+   /* order napping_threads update vs testing entry_exit_count */
+   isync
li  r0,1
stb r0,HSTATE_NAPPING(r13)
-   /* order napping_threads update vs testing entry_exit_count */
-   lwsync
mr  r4,r3
lwz r7,VCORE_ENTRY_EXIT(r5)
cmpwi   r7,0x100
-- 
1.8.1.4



[PULL 08/12] KVM: PPC: Book3S: PR: Make svcpu -> vcpu store preempt savvy

2013-12-18 Thread Alexander Graf
As soon as we get back to our highmem handler in virtual address
space we may get preempted. Today the reason we can get preempted is
that we replay interrupts and all the lazy logic thinks we have
interrupts enabled.

However, it's not hard to make the code interruptible and that way
we can enable and handle interrupts even earlier.

This fixes random guest crashes that happened with CONFIG_PREEMPT=y
for me.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s_asm.h |  1 +
 arch/powerpc/kvm/book3s_pr.c  | 22 ++
 2 files changed, 23 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 0bd9348..412b2f3 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -106,6 +106,7 @@ struct kvmppc_host_state {
 };
 
 struct kvmppc_book3s_shadow_vcpu {
+   bool in_use;
ulong gpr[14];
u32 cr;
u32 xer;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index fe14ca3..5b9e906 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -66,6 +66,7 @@ static void kvmppc_core_vcpu_load_pr(struct kvm_vcpu *vcpu, int cpu)
 	struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
 	memcpy(svcpu->slb, to_book3s(vcpu)->slb_shadow, sizeof(svcpu->slb));
 	svcpu->slb_max = to_book3s(vcpu)->slb_shadow_max;
+	svcpu->in_use = 0;
 	svcpu_put(svcpu);
 #endif
 	vcpu->cpu = smp_processor_id();
@@ -78,6 +79,9 @@ static void kvmppc_core_vcpu_put_pr(struct kvm_vcpu *vcpu)
 {
 #ifdef CONFIG_PPC_BOOK3S_64
 	struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu);
+	if (svcpu->in_use) {
+		kvmppc_copy_from_svcpu(vcpu, svcpu);
+	}
 	memcpy(to_book3s(vcpu)->slb_shadow, svcpu->slb, sizeof(svcpu->slb));
 	to_book3s(vcpu)->slb_shadow_max = svcpu->slb_max;
 	svcpu_put(svcpu);
@@ -110,12 +114,26 @@ void kvmppc_copy_to_svcpu(struct kvmppc_book3s_shadow_vcpu *svcpu,
 	svcpu->ctr = vcpu->arch.ctr;
 	svcpu->lr  = vcpu->arch.lr;
 	svcpu->pc  = vcpu->arch.pc;
+	svcpu->in_use = true;
 }
 
 /* Copy data touched by real-mode code from shadow vcpu back to vcpu */
 void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu,
 			    struct kvmppc_book3s_shadow_vcpu *svcpu)
 {
+	/*
+	 * vcpu_put would just call us again because in_use hasn't
+	 * been updated yet.
+	 */
+	preempt_disable();
+
+	/*
+	 * Maybe we were already preempted and synced the svcpu from
+	 * our preempt notifiers. Don't bother touching this svcpu then.
+	 */
+	if (!svcpu->in_use)
+		goto out;
+
 	vcpu->arch.gpr[0] = svcpu->gpr[0];
 	vcpu->arch.gpr[1] = svcpu->gpr[1];
 	vcpu->arch.gpr[2] = svcpu->gpr[2];
@@ -139,6 +157,10 @@ void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu,
 	vcpu->arch.fault_dar   = svcpu->fault_dar;
 	vcpu->arch.fault_dsisr = svcpu->fault_dsisr;
 	vcpu->arch.last_inst   = svcpu->last_inst;
+	svcpu->in_use = false;
+
+out:
+	preempt_enable();
 }
 
 static int kvmppc_core_check_requests_pr(struct kvm_vcpu *vcpu)
-- 
1.8.1.4



[PULL 09/12] KVM: PPC: Book3S: PR: Enable interrupts earlier

2013-12-18 Thread Alexander Graf
Now that the svcpu sync is interrupt aware we can enable interrupts
earlier in the exit code path again, moving 32bit and 64bit closer
together.

While at it, document the fact that we're always executing the exit
path with interrupts enabled so that the next person doesn't trap
over this.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_interrupts.S | 6 +-
 arch/powerpc/kvm/book3s_rmhandlers.S | 6 +-
 2 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_interrupts.S b/arch/powerpc/kvm/book3s_interrupts.S
index 5e7cb32..f779450 100644
--- a/arch/powerpc/kvm/book3s_interrupts.S
+++ b/arch/powerpc/kvm/book3s_interrupts.S
@@ -129,6 +129,7 @@ kvm_start_lightweight:
 * R12  = exit handler id
 * R13  = PACA
 * SVCPU.*  = guest *
+* MSR.EE   = 1
 *
 */
 
@@ -148,11 +149,6 @@ kvm_start_lightweight:
nop
 
 #ifdef CONFIG_PPC_BOOK3S_64
-   /* Re-enable interrupts */
-   ld  r3, HSTATE_HOST_MSR(r13)
-   ori r3, r3, MSR_EE
-   MTMSR_EERI(r3)
-
/*
 * Reload kernel SPRG3 value.
 * No need to save guest value as usermode can't modify SPRG3.
diff --git a/arch/powerpc/kvm/book3s_rmhandlers.S b/arch/powerpc/kvm/book3s_rmhandlers.S
index a38c4c9..c3c5231 100644
--- a/arch/powerpc/kvm/book3s_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_rmhandlers.S
@@ -153,15 +153,11 @@ _GLOBAL(kvmppc_entry_trampoline)
 
li  r6, MSR_IR | MSR_DR
	andc	r6, r5, r6	/* Clear DR and IR in MSR value */
-#ifdef CONFIG_PPC_BOOK3S_32
/*
 * Set EE in HOST_MSR so that it's enabled when we get into our
-* C exit handler function.  On 64-bit we delay enabling
-* interrupts until we have finished transferring stuff
-* to or from the PACA.
+* C exit handler function.
 */
ori r5, r5, MSR_EE
-#endif
mtsrr0  r7
mtsrr1  r6
RFI
-- 
1.8.1.4



Re: [PATCH] powerpc: book3s: kvm: Use the saved dsisr and dar values

2013-12-18 Thread Paul Mackerras
On Wed, Dec 18, 2013 at 10:44:08PM +0100, Alexander Graf wrote:
 
 On 11.11.2013, at 15:02, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com 
 wrote:
 
  From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
  
  Don't try to compute these values.
  
  Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
  ---
  
  NOTE: I am not sure why we were originally computing dsisr and dar. So maybe
  we need a variant of this patch. But with this and the additional patch
  powerpc: book3s: PR: Enable Little Endian PR guest I am able to get a
  Little Endian PR guest to boot.
 
 It's quite easy to find out - git blame tells you all the history and points 
 you to commit ca7f4203b.
 
 commit ca7f4203b9b66e12d0d9968ff7dfe781f3a9695a
 Author: Alexander Graf ag...@suse.de
 Date:   Wed Mar 24 21:48:28 2010 +0100
 
 KVM: PPC: Implement alignment interrupt
 
 Mac OS X has some applications - namely the Finder - that require 
 alignment
 interrupts to work properly. So we need to implement them.
 
 But the spec for 970 and 750 also looks different. While 750 requires the
 DSISR and DAR fields to reflect some instruction bits (DSISR) and the 
 fault
 address (DAR), the 970 declares this as an optional feature. So we need
 to reconstruct DSISR and DAR manually.
 
 Signed-off-by: Alexander Graf ag...@suse.de
 Signed-off-by: Avi Kivity a...@redhat.com
 
 Read this as "on 970, alignment interrupts don't give us DSISR and DAR of
 the faulting instruction", as otherwise I wouldn't have implemented it.

Although it's optional, all IBM POWER cpus, and as far as I know all
PowerPC cpus, set DAR on an alignment interrupt to the effective
address being accessed.  You have a valid point regarding DSISR, but
it would be nice to skip the computations where either the host CPU
provides the bits, or the virtual CPU doesn't.

Paul.