date:20150701

Re: [PATCH v3 01/11] KVM: arm: plug guest debug exploit

2015-07-01 Thread Christoffer Dall

On Wed, Jul 01, 2015 at 03:04:00PM +0800, zichao wrote:
 
 
 On June 29, 2015 11:49:53 PM GMT+08:00, Christoffer Dall 
 christoffer.d...@linaro.org wrote:
 On Mon, Jun 22, 2015 at 06:41:24PM +0800, Zhichao Huang wrote:
  Hardware debugging in guests is not currently intercepted, which means
  that a malicious guest can bring down the entire machine by writing
  to the debug registers.
  
  This patch enables trapping of all debug registers, preventing the
  guests from accessing the debug registers.
  
  This patch also disables the debug mode (DBGDSCR) in the guest world
  all the time, preventing the guests from messing with the host state.
  
  However, it is a precursor for later patches, which will need to do
  more to world-switch debug state when necessary.
  
  Cc: sta...@vger.kernel.org
  Signed-off-by: Zhichao Huang zhichao.hu...@linaro.org
  ---
   arch/arm/include/asm/kvm_coproc.h |  3 +-
   arch/arm/kvm/coproc.c | 60
 +++
   arch/arm/kvm/handle_exit.c|  4 +--
   arch/arm/kvm/interrupts_head.S| 13 -
   4 files changed, 70 insertions(+), 10 deletions(-)
  
  diff --git a/arch/arm/include/asm/kvm_coproc.h
 b/arch/arm/include/asm/kvm_coproc.h
  index 4917c2f..e74ab0f 100644
  --- a/arch/arm/include/asm/kvm_coproc.h
  +++ b/arch/arm/include/asm/kvm_coproc.h
  @@ -31,7 +31,8 @@ void kvm_register_target_coproc_table(struct
 kvm_coproc_target_table *table);
   int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run);
   int kvm_handle_cp_0_13_access(struct kvm_vcpu *vcpu, struct kvm_run
 *run);
   int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run
 *run);
  -int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run
 *run);
  +int kvm_handle_cp14_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
  +int kvm_handle_cp14_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
   int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
   int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
   
  diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
  index f3d88dc..2e12760 100644
  --- a/arch/arm/kvm/coproc.c
  +++ b/arch/arm/kvm/coproc.c
  @@ -91,12 +91,6 @@ int kvm_handle_cp14_load_store(struct kvm_vcpu
 *vcpu, struct kvm_run *run)
 return 1;
   }
   
  -int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run
 *run)
  -{
  -  kvm_inject_undefined(vcpu);
  -  return 1;
  -}
  -
   static void reset_mpidr(struct kvm_vcpu *vcpu, const struct
 coproc_reg *r)
   {
 /*
  @@ -519,6 +513,60 @@ int kvm_handle_cp15_32(struct kvm_vcpu *vcpu,
 struct kvm_run *run)
 return emulate_cp15(vcpu, params);
   }
   
  +/**
  + * kvm_handle_cp14_64 -- handles a mrrc/mcrr trap on a guest CP14
 access
  + * @vcpu: The VCPU pointer
  + * @run:  The kvm_run struct
  + */
  +int kvm_handle_cp14_64(struct kvm_vcpu *vcpu, struct kvm_run *run)
  +{
  +  struct coproc_params params;
  +
  +  params.CRn = (kvm_vcpu_get_hsr(vcpu) >> 1) & 0xf;
  +  params.Rt1 = (kvm_vcpu_get_hsr(vcpu) >> 5) & 0xf;
  +  params.is_write = ((kvm_vcpu_get_hsr(vcpu) & 1) == 0);
  +  params.is_64bit = true;
  +
  +  params.Op1 = (kvm_vcpu_get_hsr(vcpu) >> 16) & 0xf;
  +  params.Op2 = 0;
  +  params.Rt2 = (kvm_vcpu_get_hsr(vcpu) >> 10) & 0xf;
  +  params.CRm = 0;
 
 this is a complete duplicate of kvm_handle_cp15_64, can you share this
 code somehow?
 
 
 This patch just aims to plug the exploit in the simplest way; the
 cp14/cp15 handlers are shared in a later patch [PATCH v3 04/11].
 
 Should I move patch [04/11] ahead of the current patch [01/11]?
 

It would be good if the patch that fixes the issue, and that we can cc
to stable, is self-contained.  If that's impossible to do while sharing
the handlers (I don't see why, but I didn't write the code), then OK;
otherwise I would just add that bit of code to this patch.

  +
  +  /* raz_wi */
  +  (void)pm_fake(vcpu, &params, NULL);
  +
  +  /* handled */
  +  kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
  +  return 1;
  +}
  +
  +/**
  + * kvm_handle_cp14_32 -- handles a mrc/mcr trap on a guest CP14
 access
  + * @vcpu: The VCPU pointer
  + * @run:  The kvm_run struct
  + */
  +int kvm_handle_cp14_32(struct kvm_vcpu *vcpu, struct kvm_run *run)
  +{
  +  struct coproc_params params;
  +
  +  params.CRm = (kvm_vcpu_get_hsr(vcpu) >> 1) & 0xf;
  +  params.Rt1 = (kvm_vcpu_get_hsr(vcpu) >> 5) & 0xf;
  +  params.is_write = ((kvm_vcpu_get_hsr(vcpu) & 1) == 0);
  +  params.is_64bit = false;
  +
  +  params.CRn = (kvm_vcpu_get_hsr(vcpu) >> 10) & 0xf;
  +  params.Op1 = (kvm_vcpu_get_hsr(vcpu) >> 14) & 0x7;
  +  params.Op2 = (kvm_vcpu_get_hsr(vcpu) >> 17) & 0x7;
  +  params.Rt2 = 0;
 
 this is a complete duplicate of kvm_handle_cp15_32, can you share this
 code somehow?
 
  +
  +  /* raz_wi */
  +  (void)pm_fake(vcpu, &params, NULL);
  +
  +  /* handled */
  +  kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
  +  return 1;
  +}
  +
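As a standalone illustration of the review comment about duplicated
decoding, the HSR field extraction could be factored into two helpers
that both the CP14 and CP15 handlers call. This is only a minimal
userspace sketch, not the code from the series: struct cp_access,
decode_cp_32() and decode_cp_64() are hypothetical names, and the bit
positions are the ones used in the hunks above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the fields in struct coproc_params. */
struct cp_access {
	unsigned int CRn, CRm, Op1, Op2, Rt1, Rt2;
	bool is_write, is_64bit;
};

/* Decode the syndrome of a trapped mrc/mcr (32-bit) access. */
static struct cp_access decode_cp_32(uint32_t hsr)
{
	struct cp_access p = {0};

	p.CRm = (hsr >> 1) & 0xf;
	p.Rt1 = (hsr >> 5) & 0xf;
	p.is_write = (hsr & 1) == 0;	/* bit 0 clear means a write */
	p.is_64bit = false;
	p.CRn = (hsr >> 10) & 0xf;
	p.Op1 = (hsr >> 14) & 0x7;
	p.Op2 = (hsr >> 17) & 0x7;
	p.Rt2 = 0;
	return p;
}

/* Decode the syndrome of a trapped mrrc/mcrr (64-bit) access. */
static struct cp_access decode_cp_64(uint32_t hsr)
{
	struct cp_access p = {0};

	p.CRn = (hsr >> 1) & 0xf;
	p.Rt1 = (hsr >> 5) & 0xf;
	p.is_write = (hsr & 1) == 0;
	p.is_64bit = true;
	p.Op1 = (hsr >> 16) & 0xf;
	p.Op2 = 0;
	p.Rt2 = (hsr >> 10) & 0xf;
	p.CRm = 0;
	return p;
}
```

With helpers of this shape, each handler would only differ in which
register table it consults after decoding.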

Re: [PATCH v3 04/11] KVM: arm: common infrastructure for handling AArch32 CP14/CP15

2015-07-01 Thread Christoffer Dall

On Wed, Jul 01, 2015 at 03:09:35PM +0800, zichao wrote:
 
 
 On June 30, 2015 3:43:34 AM GMT+08:00, Christoffer Dall 
 christoffer.d...@linaro.org wrote:
 On Mon, Jun 22, 2015 at 06:41:27PM +0800, Zhichao Huang wrote:
  As we're about to trap a bunch of CP14 registers, let's rework
  the CP15 handling so it can be generalized and work with multiple
  tables.
  
  Signed-off-by: Zhichao Huang zhichao.hu...@linaro.org
  ---
   arch/arm/kvm/coproc.c  | 176
 ++---
   arch/arm/kvm/interrupts_head.S |   2 +-
   2 files changed, 112 insertions(+), 66 deletions(-)
  
  diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
  index 9d283d9..d23395b 100644
  --- a/arch/arm/kvm/coproc.c
  +++ b/arch/arm/kvm/coproc.c
  @@ -375,6 +375,9 @@ static const struct coproc_reg cp15_regs[] = {
 { CRn(15), CRm( 0), Op1( 4), Op2( 0), is32, access_cbar},
   };
   
  +static const struct coproc_reg cp14_regs[] = {
  +};
  +
   /* Target specific emulation tables */
   static struct kvm_coproc_target_table
 *target_tables[KVM_ARM_NUM_TARGETS];
   
  @@ -424,47 +427,75 @@ static const struct coproc_reg *find_reg(const
 struct coproc_params *params,
 return NULL;
   }
   
  -static int emulate_cp15(struct kvm_vcpu *vcpu,
  -  const struct coproc_params *params)
  +/*
  + * emulate_cp --  tries to match a cp14/cp15 access in a handling
 table,
  + *and call the corresponding trap handler.
  + *
  + * @params: pointer to the descriptor of the access
  + * @table: array of trap descriptors
  + * @num: size of the trap descriptor array
  + *
  + * Return 0 if the access has been handled, and -1 if not.
  + */
  +static int emulate_cp(struct kvm_vcpu *vcpu,
  +  const struct coproc_params *params,
  +  const struct coproc_reg *table,
  +  size_t num)
   {
  -  size_t num;
  -  const struct coproc_reg *table, *r;
  -
  -  trace_kvm_emulate_cp15_imp(params->Op1, params->Rt1, params->CRn,
  - params->CRm, params->Op2, params->is_write);
  +  const struct coproc_reg *r;
   
  -  table = get_target_table(vcpu->arch.target, &num);
  +  if (!table)
  +  return -1;  /* Not handled */
   
  -  /* Search target-specific then generic table. */
 r = find_reg(params, table, num);
  -  if (!r)
  -  r = find_reg(params, cp15_regs, ARRAY_SIZE(cp15_regs));
   
  -  if (likely(r)) {
  +  if (r) {
 /* If we don't have an accessor, we should never get here! */
 BUG_ON(!r->access);
   
 if (likely(r->access(vcpu, params, r))) {
 /* Skip instruction, since it was emulated */
 kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
  -  return 1;
 }
  -  /* If access function fails, it should complain. */
  -  } else {
  -  kvm_err("Unsupported guest CP15 access at: %08lx\n",
  -  *vcpu_pc(vcpu));
  -  print_cp_instr(params);
  +
  +  /* Handled */
  +  return 0;
 }
  +
  +  /* Not handled */
  +  return -1;
  +}
  +
  +static void unhandled_cp_access(struct kvm_vcpu *vcpu,
  +  const struct coproc_params *params)
  +{
  +  u8 hsr_ec = kvm_vcpu_trap_get_class(vcpu);
  +  int cp;
  +
  +  switch (hsr_ec) {
  +  case HSR_EC_CP15_32:
  +  case HSR_EC_CP15_64:
  +  cp = 15;
  +  break;
  +  case HSR_EC_CP14_MR:
  +  case HSR_EC_CP14_64:
  +  cp = 14;
  +  break;
  +  default:
  +  WARN_ON((cp = -1));
  +  }
  +
  +  kvm_err("Unsupported guest CP%d access at: %08lx\n",
  +  cp, *vcpu_pc(vcpu));
  +  print_cp_instr(params);
 kvm_inject_undefined(vcpu);
  -  return 1;
   }
   
  -/**
  - * kvm_handle_cp15_64 -- handles a mrrc/mcrr trap on a guest CP15
 access
  - * @vcpu: The VCPU pointer
  - * @run:  The kvm_run struct
  - */
  -int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run)
  +int kvm_handle_cp_64(struct kvm_vcpu *vcpu,
  +  const struct coproc_reg *global,
  +  size_t nr_global,
  +  const struct coproc_reg *target_specific,
  +  size_t nr_specific)
   {
 struct coproc_params params;
   
  @@ -478,7 +509,13 @@ int kvm_handle_cp15_64(struct kvm_vcpu *vcpu,
 struct kvm_run *run)
 params.Rt2 = (kvm_vcpu_get_hsr(vcpu) >> 10) & 0xf;
 params.CRm = 0;
   
  -  return emulate_cp15(vcpu, &params);
  +  if (!emulate_cp(vcpu, &params, target_specific, nr_specific))
  +  return 1;
  +  if (!emulate_cp(vcpu, &params, global, nr_global))
  +  return 1;
  +
  +  unhandled_cp_access(vcpu, &params);
  +  return 1;
   }
   
   static void reset_coproc_regs(struct kvm_vcpu *vcpu,
  @@ -491,12 +528,11 @@ static void reset_coproc_regs(struct kvm_vcpu
 *vcpu,
 table[i].reset(vcpu, &table[i]);
   }
   
  -/**
  - * kvm_handle_cp15_32 -- handles a mrc/mcr trap

[PATCH v3 1/2] vhost: extend memory regions allocation to vmalloc

2015-07-01 Thread Igor Mammedov

With a large number of memory regions we could end up with
high-order allocations, and kmalloc could fail if the
host is under memory pressure.
Considering that the memory regions array is used on the hot path,
try harder to allocate using kmalloc and, if that fails, fall back
to vmalloc.
That is still better than failing vhost_set_memory() and
crashing the guest when new memory is hotplugged
into it.

I'll still look at a QEMU-side solution to reduce the number of
memory regions it feeds to vhost to make things even better,
but it doesn't hurt for the kernel to behave smarter and not
crash older QEMUs, which could use a large number of memory
regions.

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
 drivers/vhost/vhost.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index f1e07b8..99931a0 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -471,7 +471,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev, bool locked)
fput(dev->log_file);
dev->log_file = NULL;
/* No one will access memory at this point */
-   kfree(dev->memory);
+   kvfree(dev->memory);
dev->memory = NULL;
WARN_ON(!list_empty(&dev->work_list));
if (dev->worker) {
@@ -601,6 +601,18 @@ static int vhost_memory_reg_sort_cmp(const void *p1, const 
void *p2)
return 0;
 }
 
+static void *vhost_kvzalloc(unsigned long size)
+{
+   void *n = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
+
+   if (!n) {
+   n = vzalloc(size);
+   if (!n)
+   return ERR_PTR(-ENOMEM);
+   }
+   return n;
+}
+
 static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user 
*m)
 {
struct vhost_memory mem, *newmem, *oldmem;
@@ -613,21 +625,21 @@ static long vhost_set_memory(struct vhost_dev *d, struct 
vhost_memory __user *m)
return -EOPNOTSUPP;
if (mem.nregions > VHOST_MEMORY_MAX_NREGIONS)
return -E2BIG;
-   newmem = kmalloc(size + mem.nregions * sizeof *m->regions, GFP_KERNEL);
+   newmem = vhost_kvzalloc(size + mem.nregions * sizeof(*m->regions));
if (!newmem)
return -ENOMEM;
 
memcpy(newmem, &mem, size);
if (copy_from_user(newmem->regions, m->regions,
   mem.nregions * sizeof *m->regions)) {
-   kfree(newmem);
+   kvfree(newmem);
return -EFAULT;
}
sort(newmem->regions, newmem->nregions, sizeof(*newmem->regions),
vhost_memory_reg_sort_cmp, NULL);
 
if (!memory_access_ok(d, newmem, 0)) {
-   kfree(newmem);
+   kvfree(newmem);
return -EFAULT;
}
oldmem = d->memory;
@@ -639,7 +651,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct 
vhost_memory __user *m)
d->vqs[i]->memory = newmem;
mutex_unlock(&d->vqs[i]->mutex);
}
-   kfree(oldmem);
+   kvfree(oldmem);
return 0;
 }
 
-- 
1.8.3.1
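
The try-kmalloc-then-vmalloc pattern in vhost_kvzalloc() can be modelled
in userspace to show how its return convention interacts with the
caller. This is a sketch under stated assumptions, not kernel code:
err_ptr() and is_err() are userspace stand-ins for the kernel's
ERR_PTR() and IS_ERR(), and the two allocator callbacks are hypothetical
substitutes for kzalloc() and vzalloc().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's ERR_PTR() and IS_ERR(). */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocators standing in for kzalloc() and vzalloc(). */
typedef void *(*alloc_fn)(size_t size);

static void *always_fail(size_t size)
{
	(void)size;
	return NULL;
}

static void *zeroed(size_t size)
{
	return calloc(1, size);
}

/* Try the fast allocator first, fall back to the slow one. */
static void *kvzalloc_sketch(size_t size, alloc_fn fast, alloc_fn slow)
{
	void *n = fast(size);

	if (!n) {
		n = slow(size);
		if (!n)
			return err_ptr(-ENOMEM);
	}
	return n;
}
```

Note that an error pointer built this way is not NULL, so a caller that
tests the result with `if (!newmem)` would not observe the failure; such
a helper needs either an IS_ERR()-style check at the call site or a
plain NULL return on failure.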

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 05/10] KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs

2015-07-01 Thread Marc Zyngier

On 30/06/15 21:19, Christoffer Dall wrote:
 On Mon, Jun 08, 2015 at 06:04:00PM +0100, Marc Zyngier wrote:
 We only set the irq_queued flag for level interrupts, meaning
 that !vgic_irq_is_queued(vcpu, irq) is a good enough predicate
 for all interrupts.

 This will allow us to inject edge HW interrupts, for which the
 state ACTIVE+PENDING is not allowed.
 
 I don't understand this; ACTIVE+PENDING is allowed for edge interrupts.
 Do you mean that if we set the HW bit in the LR, then we are linking to
 an HW interrupt where we don't allow that to be ACTIVE+PENDING on the HW
 GIC side?
 
 Why is this relevant here?  I feel like I'm missing context.

I've probably taken a shortcut here - bear with me while I'm trying to
explain the issue.

For HW interrupts, we shouldn't even try to use the state bits in the
LR, because that state is contained in the physical distributor. Setting
the HW bit really means there is something going on at the distributor
level, just go there.

If we were to inject an ACTIVE+PENDING interrupt at the LR level, we'd
basically lose the second interrupt because that state is simply not
considered.

So the trick we're using is to only inject the active interrupt, and
prevent anything else from being injected until we can confirm that the
active state has been cleared at the physical level.

Does it make any sense?

M.
-- 
Jazz is not dead. It just smells funny...

Re: [PATCH 10/10] KVM: arm/arm64: vgic: Allow non-shared device HW interrupts

2015-07-01 Thread Marc Zyngier

On 30/06/15 21:19, Christoffer Dall wrote:
 On Mon, Jun 08, 2015 at 06:04:05PM +0100, Marc Zyngier wrote:
 So far, the only use of the HW interrupt facility is the timer,
 implying that the active state is context-switched for each vcpu,
 as the device is shared across all vcpus.

 This does not work for a device that has been assigned to a VM,
 as the guest is entirely in control of that device (the HW is
 not shared). In that case, it makes sense to bypass the whole
 active state switching, and only track the deactivation of the
 interrupt.

 The distinction here between shared and non-shared feels a bit arbitrary
 (it may not be, but it just feels that way) and I can't easily convince
 myself that this is the logical/correct/all-encompassing word to
 describe the nature of the two devices.

Does the idea of global vs private resource feel more correct?

M.
-- 
Jazz is not dead. It just smells funny...

Re: [PATCH 10/10] KVM: arm/arm64: vgic: Allow non-shared device HW interrupts

2015-07-01 Thread Christoffer Dall

On Wed, Jul 01, 2015 at 09:26:59AM +0100, Marc Zyngier wrote:
 On 30/06/15 21:19, Christoffer Dall wrote:
  On Mon, Jun 08, 2015 at 06:04:05PM +0100, Marc Zyngier wrote:
  So far, the only use of the HW interrupt facility is the timer,
  implying that the active state is context-switched for each vcpu,
  as the device is shared across all vcpus.
 
  This does not work for a device that has been assigned to a VM,
  as the guest is entirely in control of that device (the HW is
  not shared). In that case, it makes sense to bypass the whole
  active state switching, and only track the deactivation of the
  interrupt.
 
  The distinction here between shared and non-shared feels a bit arbitrary
  (it may not be, but it just feels that way) and I can't easily convince
  myself that this is the logical/correct/all-encompassing word to
  describe the nature of the two devices.
 
 Does the idea of global vs private resource feel more correct?
 
I think shared covers that equally well.  This feels like one of those
things that just doesn't make intuitive sense on its own but when you
think about the cases we are familiar with, then it fits for now.  So
what you have here is probably as good as it gets and hopefully it does
cover all the cases we care about, i.e. shared and non-shared :)

-Christoffer

[PATCH v3 2/2] vhost: add max_mem_regions module parameter

2015-07-01 Thread Igor Mammedov

It has become possible to use a larger number of memory
slots, which memory hotplug uses for
registering hotplugged memory.
However, QEMU crashes if it's used with more than ~60
pc-dimm devices and vhost-net enabled, since the host kernel's
vhost-net module refuses to accept more than 64
memory regions.

Allow tweaking the limit via the max_mem_regions module parameter,
with the default value set to 64 slots.

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
 drivers/vhost/vhost.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 99931a0..5905cd7 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -29,8 +29,12 @@
 
 #include "vhost.h"
 
+static ushort max_mem_regions = 64;
+module_param(max_mem_regions, ushort, 0444);
+MODULE_PARM_DESC(max_mem_regions,
+   "Maximum number of memory regions in memory map. (default: 64)");
+
 enum {
-   VHOST_MEMORY_MAX_NREGIONS = 64,
VHOST_MEMORY_F_LOG = 0x1,
 };
 
@@ -623,7 +627,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct 
vhost_memory __user *m)
return -EFAULT;
if (mem.padding)
return -EOPNOTSUPP;
-   if (mem.nregions > VHOST_MEMORY_MAX_NREGIONS)
+   if (mem.nregions > max_mem_regions)
return -E2BIG;
newmem = vhost_kvzalloc(size + mem.nregions * sizeof(*m->regions));
if (!newmem)
-- 
1.8.3.1


[PATCH v3 0/2] vhost: support more than 64 memory regions

2015-07-01 Thread Igor Mammedov

changes since v2:
  * drop cache patches for now as suggested
  * add max_mem_regions module parameter instead of unconditionally
increasing limit
  * drop bsearch patch since it's already queued

References to previous versions:
v2: https://lkml.org/lkml/2015/6/17/276
v1: http://www.spinics.net/lists/kvm/msg117654.html

The series allows tweaking vhost's memory region count limit.

It fixes VMs crashing on memory hotplug due to vhost refusing
to accept more than 64 memory regions, with max_mem_regions
set to more than 262 slots in the default QEMU configuration.

Igor Mammedov (2):
  vhost: extend memory regions allocation to vmalloc
  vhost: add max_mem_regions module parameter

 drivers/vhost/vhost.c | 30 +++---
 1 file changed, 23 insertions(+), 7 deletions(-)

-- 
1.8.3.1


Re: [PATCH 3/9] kvm: add hyper-v crash msrs values

2015-07-01 Thread Paolo Bonzini



On 01/07/2015 18:06, Peter Hornyack wrote:
 If userspace is controlling the crash capabilities then
 HV_X64_MSR_CRASH_CTL_CONTENTS is not needed.

Actually you still need to: userspace cannot write anything but 0 or
(1ULL << 63).  However, the name makes less sense, so I'm in favor of
removing the value.

Paolo
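
The constraint described above, that the only values userspace may
write are 0 and a value with just bit 63 set, amounts to a check along
these lines. This is a minimal sketch: CRASH_CTL_ALLOWED_BIT and
crash_ctl_value_ok() are hypothetical names chosen for illustration,
not identifiers from the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the single bit that may be set. */
#define CRASH_CTL_ALLOWED_BIT (1ULL << 63)

/* Accept only 0 or exactly bit 63; reject every other pattern. */
static bool crash_ctl_value_ok(uint64_t data)
{
	return data == 0 || data == CRASH_CTL_ALLOWED_BIT;
}
```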

[PATCH v7 05/11] KVM: arm64: guest debug, add SW break point support

2015-07-01 Thread Alex Bennée

This adds support for SW breakpoints inserted by userspace.

We do this by trapping all guest software debug exceptions to the
hypervisor (MDCR_EL2.TDE). The exit handler sets an exit reason of
KVM_EXIT_DEBUG with the kvm_debug_exit_arch structure holding the
exception syndrome information.

It will be up to userspace to extract the PC (via GET_ONE_REG) and
determine if the debug event was for a breakpoint it inserted. If not,
userspace will need to re-inject the correct exception and restart the
hypervisor to deliver the debug exception to the guest.

Any other guest software debug exception (e.g. single step or HW
assisted breakpoints) will cause an error and the VM to be killed. This
is addressed by later patches which add support for the other debug
types.

Signed-off-by: Alex Bennée alex.ben...@linaro.org
Reviewed-by: Christoffer Dall christoffer.d...@linaro.org

---
v2
  - update to use new exit struct
  - tweak for C setup
  - do our setup in debug_setup/clear code
  - fixed up comments
v3:
  - fix spacing in KVM_GUESTDBG_VALID_MASK
  - fix and clarify wording on kvm_handle_guest_debug
  - handle error case in kvm_handle_guest_debug
  - re-word the commit message
v4
  - rm else leg
  - add r-b-tag
v7
  - moved ioctl to guest
---
 Documentation/virtual/kvm/api.txt |  2 +-
 arch/arm64/kvm/debug.c|  3 +++
 arch/arm64/kvm/guest.c|  2 +-
 arch/arm64/kvm/handle_exit.c  | 36 
 4 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index ba635c7..33c8143 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2667,7 +2667,7 @@ when running. Common control bits are:
 The top 16 bits of the control field are architecture specific control
 flags which can include the following:
 
-  - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86]
+  - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86, arm64]
   - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390]
   - KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86]
   - KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86]
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index faf0e1f..8d1bfa4 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -73,6 +73,9 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
if (trap_debug)
vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
 
+   /* Trap breakpoints? */
+   if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
+   vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
 }
 
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 0ba8677..22d22c5 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -332,7 +332,7 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
return -EINVAL;
 }
 
-#define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE)
+#define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP)
 
 /**
  * kvm_arch_vcpu_ioctl_set_guest_debug - set up guest debugging
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 524fa25..27f38a9 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -82,6 +82,40 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
return 1;
 }
 
+/**
+ * kvm_handle_guest_debug - handle a debug exception instruction
+ *
+ * @vcpu:  the vcpu pointer
+ * @run:   access to the kvm_run structure for results
+ *
+ * We route all debug exceptions through the same handler. If both the
+ * guest and host are using the same debug facilities it will be up to
+ * userspace to re-inject the correct exception for guest delivery.
+ *
+ * @return: 0 (while setting run->exit_reason), -1 for error
+ */
+static int kvm_handle_guest_debug(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+   u32 hsr = kvm_vcpu_get_hsr(vcpu);
+   int ret = 0;
+
+   run->exit_reason = KVM_EXIT_DEBUG;
+   run->debug.arch.hsr = hsr;
+
+   switch (hsr >> ESR_ELx_EC_SHIFT) {
+   case ESR_ELx_EC_BKPT32:
+   case ESR_ELx_EC_BRK64:
+   break;
+   default:
+   kvm_err("%s: un-handled case hsr: %#08x\n",
+   __func__, (unsigned int) hsr);
+   ret = -1;
+   break;
+   }
+
+   return ret;
+}
+
 static exit_handle_fn arm_exit_handlers[] = {
[ESR_ELx_EC_WFx]= kvm_handle_wfx,
[ESR_ELx_EC_CP15_32]= kvm_handle_cp15_32,
@@ -96,6 +130,8 @@ static exit_handle_fn arm_exit_handlers[] = {
[ESR_ELx_EC_SYS64]  = kvm_handle_sys_reg,
[ESR_ELx_EC_IABT_LOW]   = kvm_handle_guest_abort,
[ESR_ELx_EC_DABT_LOW]   = kvm_handle_guest_abort,
+   [ESR_ELx_EC_BKPT32] = kvm_handle_guest_debug,
+   [ESR_ELx_EC_BRK64]  = kvm_handle_guest_debug,
 };
 
 static
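
On the userspace side, the decision described in the commit message
(was this debug exit caused by a breakpoint we inserted?) could be
sketched as below. This is an illustration under stated assumptions:
the helper names are hypothetical, the breakpoint list is a plain array
of guest addresses, and the exception-class constants (shift of 26,
0x38 for BKPT32, 0x3C for BRK64) are restated from the ARMv8 syndrome
encoding rather than taken from this patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed ARMv8 syndrome encoding; local names, not kernel macros. */
#define EC_SHIFT   26
#define EC_BKPT32  0x38u
#define EC_BRK64   0x3Cu

/* True if the reported syndrome describes a software breakpoint. */
static bool is_sw_breakpoint(uint32_t hsr)
{
	uint32_t ec = hsr >> EC_SHIFT;

	return ec == EC_BKPT32 || ec == EC_BRK64;
}

/* True if pc matches one of the breakpoints the debugger inserted. */
static bool bp_inserted_by_us(uint64_t pc, const uint64_t *bps, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (bps[i] == pc)
			return true;
	}
	return false;
}
```

A debugger would read the PC with GET_ONE_REG, call both helpers, and
re-inject the exception into the guest when the second one returns
false.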

[PATCH v7 04/11] KVM: arm: introduce kvm_arm_init/setup/clear_debug

2015-07-01 Thread Alex Bennée

This is a precursor for later patches which will need to do more to
set up debug state before entering the hyp.S switch code. The existing
functionality for setting mdcr_el2 has been moved out of hyp.S and now
uses the value kept in vcpu->arch.mdcr_el2.

As the assembler code previously masked and preserved MDCR_EL2.HPMN,
I've had to add a mechanism to save the value of mdcr_el2 as a per-cpu
variable during the initialisation code. The kernel never sets this
number, so we are assuming the boot code has set up the correct value
here.

This also moves the conditional setting of the TDA bit from the hyp code
into the C code which is currently used for the lazy debug register
context switch code.

Signed-off-by: Alex Bennée alex.ben...@linaro.org
Reviewed-by: Christoffer Dall christoffer.d...@linaro.org

---
v3
  - rename fns from arch-arm
  - preserve MDCR_EL2.HPMN setting
  - re-word some of the comments
  - fix some minor grammar nits
  - merge setting of mdcr_el2
  - introduce trap_debug flag
  - move setup/clear within the irq lock section
v4
  - fix TDOSA desc
  - rm un-needed else leg
  - s/arch/arm/
v6
  - add s-o-b tag
---
 arch/arm/include/asm/kvm_host.h   |  4 ++
 arch/arm/kvm/arm.c|  9 -
 arch/arm64/include/asm/kvm_asm.h  |  2 +
 arch/arm64/include/asm/kvm_host.h |  5 +++
 arch/arm64/kernel/asm-offsets.c   |  1 +
 arch/arm64/kvm/Makefile   |  2 +-
 arch/arm64/kvm/debug.c| 81 +++
 arch/arm64/kvm/hyp.S  | 19 -
 8 files changed, 110 insertions(+), 13 deletions(-)
 create mode 100644 arch/arm64/kvm/debug.c

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index d71607c..746c0c69 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -236,4 +236,8 @@ static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 
+static inline void kvm_arm_init_debug(void) {}
+static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
+
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 92b80bc..af60e6f 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -542,6 +542,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
continue;
}
 
+   kvm_arm_setup_debug(vcpu);
+
/**
 * Enter the guest
 */
@@ -554,7 +556,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
vcpu->mode = OUTSIDE_GUEST_MODE;
kvm_guest_exit();
trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
-   /*
+
+   kvm_arm_clear_debug(vcpu);
+
+/*
 * We may have taken a host interrupt in HYP mode (ie
 * while executing the guest). This interrupt is still
 * pending, as we haven't serviced it yet!
@@ -902,6 +907,8 @@ static void cpu_init_hyp_mode(void *dummy)
vector_ptr = (unsigned long)__kvm_hyp_vector;
 
__cpu_init_hyp_mode(boot_pgd_ptr, pgd_ptr, hyp_stack_ptr, vector_ptr);
+
+   kvm_arm_init_debug();
 }
 
 static int hyp_init_cpu_notify(struct notifier_block *self,
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 4f7310f..d6b507e 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -137,6 +137,8 @@ extern char __restore_vgic_v2_state[];
 extern char __save_vgic_v3_state[];
 extern char __restore_vgic_v3_state[];
 
+extern u32 __kvm_get_mdcr_el2(void);
+
 #endif
 
 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index f0f58c9..7cb99b5 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -103,6 +103,7 @@ struct kvm_vcpu_arch {
 
/* HYP configuration */
u64 hcr_el2;
+   u32 mdcr_el2;
 
/* Exception Information */
struct kvm_vcpu_fault_info fault;
@@ -250,4 +251,8 @@ static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 
+void kvm_arm_init_debug(void);
+void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
+void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
+
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index da675cc..dfb25a2 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -117,6 +117,7 @@ int main(void)
   DEFINE(VCPU_HPFAR_EL2,
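
The commit message describes three steps: keep the HPMN field from the
value saved at initialisation, set the debug-access trap (TDA) only when
requested, and leave room for further trap bits in later patches. A
userspace sketch of that composition follows. The function name and
parameters are hypothetical, and the bit positions (HPMN in bits 4:0,
TDE at bit 8, TDA at bit 9) are assumptions restated from the ARMv8
definition of MDCR_EL2, since debug.c itself is not shown above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed MDCR_EL2 layout; local names for illustration only. */
#define SK_MDCR_HPMN_MASK 0x1fu
#define SK_MDCR_TDE       (1u << 8)
#define SK_MDCR_TDA       (1u << 9)

/*
 * Build the per-vcpu trap configuration from the value saved at
 * init time: preserve only HPMN, then add the requested traps.
 */
static uint32_t compose_mdcr(uint32_t saved_mdcr, bool trap_debug,
			     bool trap_sw_bp)
{
	uint32_t mdcr = saved_mdcr & SK_MDCR_HPMN_MASK;

	if (trap_debug)
		mdcr |= SK_MDCR_TDA;
	if (trap_sw_bp)
		mdcr |= SK_MDCR_TDE;
	return mdcr;
}
```

Doing this in C rather than in the world-switch assembly is what makes
it straightforward to add further conditions later in the series.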

Re: [PATCH 05/10] KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs

2015-07-01 Thread Marc Zyngier

On 01/07/15 12:58, Christoffer Dall wrote:
 On Wed, Jul 01, 2015 at 10:17:52AM +0100, Marc Zyngier wrote:
 On 30/06/15 21:19, Christoffer Dall wrote:
 On Mon, Jun 08, 2015 at 06:04:00PM +0100, Marc Zyngier wrote:
 We only set the irq_queued flag for level interrupts, meaning
 that !vgic_irq_is_queued(vcpu, irq) is a good enough predicate
 for all interrupts.

 This will allow us to inject edge HW interrupts, for which the
 state ACTIVE+PENDING is not allowed.

 I don't understand this; ACTIVE+PENDING is allowed for edge interrupts.
 Do you mean that if we set the HW bit in the LR, then we are linking to
 an HW interrupt where we don't allow that to be ACTIVE+PENDING on the HW
 GIC side?

 Why is this relevant here?  I feel like I'm missing context.

 I've probably taken a shortcut here - bear with me while I'm trying to
 explain the issue.

 For HW interrupts, we shouldn't even try to use the state bits in the
 LR, because that state is contained in the physical distributor. Setting
 the HW bit really means there is something going on at the distributor
 level, just go there.
 
 ok, so by HW interrupts you mean virtual interrupts with the HW bit in
 the LR set, correct?

Yes, sorry.


 If we were to inject an ACTIVE+PENDING interrupt at the LR level, we'd
 basically lose the second interrupt because that state is simply not
 considered.
 
 Huh?  Which second interrupt.  I looked at the spec and it says don't
 use the state bits for HW interrupts, so isn't it simply not supported
 to set these bits at all and that's it?

I managed to confuse myself reading the same bit. It says (GICv3 spec):

A hypervisor must only use the pending and active state for software
originated interrupts, which are typically associated with virtual
devices, or SGIs.

That's the PENDING+ACTIVE state, and not the pending and active bits
like I read it initially.

Now consider the following scenario:

- We inject a virtual edge interrupt
- We mark the corresponding physical interrupt as active.
- Queue interrupt in an LR
- Resume vcpu

Now, we inject another edge interrupt, the vcpu exits for whatever
reason, and the previously injected interrupt is still active.

The normal vGIC flow would be to mark the interrupt as ACTIVE+PENDING in
the LR, and resume the vcpu. But the above states that this is invalid
for HW generated interrupts.


 So the trick we're using is to only inject the active interrupt, and
 prevent anything else from being injected until we can confirm that the
 active state has been cleared at the physical level.

 Does it make any sense?

 Sort of, but what I don't understand now is how the guest ever sees the
 interrupt then.  If we always inject the virtual interrupt by setting
 the active state on the physical distributor, and we can't inject this
 as active+pending, and the guest doesn't see the state in the LR, then
 how does this ever raise a virtual interrupt and how does the guest see
 an interrupt which is only PENDING so that it can ack it etc. etc.?
 
 Maybe I don't fully understand how the HW bit works after all...

The way the spec is written is slightly misleading. But the gist of it
is that we still signal the guest using the PENDING bit in the LR, and
switch the LR as usual. It is just that we can't use the PENDING+ACTIVE
state (apparently, this can lead to a double deactivation).
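
To make the rule above concrete, here is a minimal sketch of the queuing
decision. It is not the vgic code: the structure, the function and the
state-bit values are all made up for illustration, and it only assumes
what is stated in this thread (PENDING is still used to signal the guest,
PENDING+ACTIVE is never used when the HW bit is set, and a second
instance is held back until the first one has been deactivated).

```c
#include <stdbool.h>

/* Hypothetical LR state bits for this sketch; not the real GIC encoding. */
#define LR_STATE_PENDING  (1u << 0)
#define LR_STATE_ACTIVE   (1u << 1)

/* Hypothetical per-interrupt bookkeeping, not the vgic data structures. */
struct sketch_irq {
	bool hw;        /* the LR would carry the HW bit (linked to a physical irq) */
	bool in_flight; /* a previous instance was injected and is not yet deactivated */
};

/*
 * Return the state bits to program into the LR for a new instance of
 * the interrupt, or 0 if the new instance must not be queued yet.
 */
static unsigned int sketch_lr_state(const struct sketch_irq *irq)
{
	if (!irq->in_flight)
		return LR_STATE_PENDING;	/* signal the guest as usual */

	if (irq->hw)
		return 0;			/* hold off: no PENDING+ACTIVE with the HW bit */

	/* Software-originated interrupt: PENDING+ACTIVE is allowed. */
	return LR_STATE_PENDING | LR_STATE_ACTIVE;
}
```

The only case that differs from the normal flow is the middle one: a
HW-linked interrupt that is still in flight is simply not queued again.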

Not sure the above makes sense. Beer time, I suppose.

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v7 01/11] KVM: add comments for kvm_debug_exit_arch struct

2015-07-01 Thread Alex Bennée

Bring into line with the comments for the other structures and their
KVM_EXIT_* cases. Also update api.txt to reflect use in kvm_run
documentation.

Signed-off-by: Alex Bennée alex.ben...@linaro.org
Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com
Reviewed-by: Andrew Jones drjo...@redhat.com
Acked-by: Christoffer Dall christoffer.d...@linaro.org

---

v2
  - add comments for other exit types
v3
  - s/commentary/comments/
  - add rb tags
  - update api.txt kvm_run to include KVM_EXIT_DEBUG desc
v4
  - sp fixes
  - add a-b
---
 Documentation/virtual/kvm/api.txt | 4 +++-
 include/uapi/linux/kvm.h  | 3 +++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 9fa2bf8..c34c32d 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3070,11 +3070,13 @@ data_offset describes where the data is located (KVM_EXIT_IO_OUT) or
 where kvm expects application code to place the data for the next
 KVM_RUN invocation (KVM_EXIT_IO_IN).  Data format is a packed array.
 
+   /* KVM_EXIT_DEBUG */
struct {
struct kvm_debug_exit_arch arch;
} debug;
 
-Unused.
+If the exit_reason is KVM_EXIT_DEBUG, then a vcpu is processing a debug event
+for which architecture specific information is returned.
 
/* KVM_EXIT_MMIO */
struct {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4b60056..70ac641 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -237,6 +237,7 @@ struct kvm_run {
__u32 count;
__u64 data_offset; /* relative to kvm_run start */
} io;
+   /* KVM_EXIT_DEBUG */
struct {
struct kvm_debug_exit_arch arch;
} debug;
@@ -285,6 +286,7 @@ struct kvm_run {
__u32 data;
__u8  is_write;
} dcr;
+   /* KVM_EXIT_INTERNAL_ERROR */
struct {
__u32 suberror;
/* Available with KVM_CAP_INTERNAL_ERROR_DATA: */
@@ -295,6 +297,7 @@ struct kvm_run {
struct {
__u64 gprs[32];
} osi;
+   /* KVM_EXIT_PAPR_HCALL */
struct {
__u64 nr;
__u64 ret;
-- 
2.4.5


[PATCH v7 06/11] KVM: arm64: guest debug, add support for single-step

2015-07-01 Thread Alex Bennée

This adds support for single-stepping the guest. To do this we need to
manipulate the guest's PSTATE.SS and MDSCR_EL1.SS bits to trigger
stepping. We take care to preserve MDSCR_EL1 and trap access to it to
ensure we don't affect the apparent state of the guest.

As we have to enable trapping of all software debug exceptions we
suppress the ability of the guest to single-step itself. If we didn't, we
would have to deal with the exception arriving while the guest was in
kernelspace when the guest is expecting to single-step userspace. This
is something we don't want to unwind in the kernel. Once the host is no
longer debugging the guest, its ability to single-step userspace is
restored.
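
As a rough illustration of the save/tweak/restore flow described above,
here is a stand-alone sketch. The struct and function names are
hypothetical and the bit positions are assumptions for the sketch
(MDSCR_EL1.SS as bit 0, the SS bit of the saved PSTATE as bit 21); the
real code works on the vcpu context via vcpu_sys_reg() and vcpu_cpsr().

```c
#include <stdint.h>

/* Bit positions assumed for this sketch. */
#define SKETCH_MDSCR_SS  (UINT32_C(1) << 0)   /* MDSCR_EL1.SS */
#define SKETCH_SPSR_SS   (UINT32_C(1) << 21)  /* PSTATE.SS as held in SPSR_EL2 */

/* Hypothetical, cut-down stand-in for the vcpu state touched here. */
struct sketch_vcpu {
	uint32_t mdscr_el1;            /* guest view of MDSCR_EL1 */
	uint32_t pstate;               /* guest PSTATE as saved in SPSR_EL2 */
	uint32_t preserved_mdscr_el1;  /* copy kept while the host debugs */
};

/* Called before entering the guest while guest debug is in effect. */
static void sketch_setup_debug(struct sketch_vcpu *v, int single_step)
{
	v->preserved_mdscr_el1 = v->mdscr_el1;	/* save before tweaking */

	if (single_step) {
		v->pstate    |= SKETCH_SPSR_SS;
		v->mdscr_el1 |= SKETCH_MDSCR_SS;
	} else {
		/* Suppress any single-step the guest set up for itself. */
		v->mdscr_el1 &= ~SKETCH_MDSCR_SS;
	}
}

/* Called after the guest exits: put back the value the guest expects. */
static void sketch_clear_debug(struct sketch_vcpu *v)
{
	v->mdscr_el1 = v->preserved_mdscr_el1;
}
```

Saving the whole register and restoring it unconditionally keeps the
restore path trivial; only the setup path needs to know which bits it
touches.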

Signed-off-by: Alex Bennée alex.ben...@linaro.org
Reviewed-by: Christoffer Dall christoffer.d...@linaro.org

---
v2
  - Move pstate/mdscr manipulation into C
  - don't export guest_debug to assembly
  - add accessor for saved_debug regs
  - tweak save/restore of mdscr_el1
v3
  - don't save PC in debug information struct
  - rename debug_saved_regs->guest_debug_state
  - save whole value, only use bits in restore
  - add save/restore_guest_debug_regs helper functions
  - simplify commit message for clarity
  - rm vcpu_debug_saved_reg access fn
v4
  - added more comments based on suggestions
  - guest_debug_state->guest_debug_preserved
  - no point masking restore, we will trap out
v5
  - more comments
  - don't bother preserving pstate.ss (guest never sees change)
v6
  - reword comments on guest SS suppression
  - simplify comment for save regs, SS explained in detail later on
  - add r-b-t (code)
  - expanded commit description
v7
  - merge fix for ioctl move to guest.c
---
 arch/arm64/include/asm/kvm_host.h | 11 +++
 arch/arm64/kvm/debug.c| 68 ---
 arch/arm64/kvm/guest.c|  4 ++-
 arch/arm64/kvm/handle_exit.c  |  2 ++
 4 files changed, 80 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7cb99b5..e2db6a6 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -123,6 +123,17 @@ struct kvm_vcpu_arch {
 * here.
 */
 
+   /*
+* Guest registers we preserve during guest debugging.
+*
+* These shadow registers are updated by the kvm_handle_sys_reg
+* trap handler if the guest accesses or updates them while we
+* are using guest debug.
+*/
+   struct {
+   u32 mdscr_el1;
+   } guest_debug_preserved;
+
/* Don't run the guest */
bool pause;
 
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 8d1bfa4..d439eb8 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -19,11 +19,39 @@
 
 #include <linux/kvm_host.h>
 
+#include <asm/debug-monitors.h>
+#include <asm/kvm_asm.h>
 #include <asm/kvm_arm.h>
+#include <asm/kvm_emulate.h>
+
+/* These are the bits of MDSCR_EL1 we may manipulate */
+#define MDSCR_EL1_DEBUG_MASK   (DBG_MDSCR_SS | \
+   DBG_MDSCR_KDE | \
+   DBG_MDSCR_MDE)
 
 static DEFINE_PER_CPU(u32, mdcr_el2);
 
 /**
+ * save/restore_guest_debug_regs
+ *
+ * For some debug operations we need to tweak some guest registers. As
+ * a result we need to save the state of those registers before we
+ * make those modifications.
+ *
+ * Guest access to MDSCR_EL1 is trapped by the hypervisor and handled
+ * after we have restored the preserved value to the main context.
+ */
+static void save_guest_debug_regs(struct kvm_vcpu *vcpu)
+{
+   vcpu->arch.guest_debug_preserved.mdscr_el1 = vcpu_sys_reg(vcpu, MDSCR_EL1);
+}
+
+static void restore_guest_debug_regs(struct kvm_vcpu *vcpu)
+{
+   vcpu_sys_reg(vcpu, MDSCR_EL1) = vcpu->arch.guest_debug_preserved.mdscr_el1;
+}
+
+/**
  * kvm_arm_init_debug - grab what we need for debug
  *
  * Currently the sole task of this function is to retrieve the initial
@@ -38,7 +66,6 @@ void kvm_arm_init_debug(void)
__this_cpu_write(mdcr_el2, kvm_call_hyp(__kvm_get_mdcr_el2));
 }
 
-
 /**
  * kvm_arm_setup_debug - set up debug related stuff
  *
@@ -73,12 +100,45 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
if (trap_debug)
vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
 
-   /* Trap breakpoints? */
-   if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
+   /* Is Guest debugging in effect? */
+   if (vcpu->guest_debug) {
+   /* Route all software debug exceptions to EL2 */
vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
+
+   /* Save guest debug state */
+   save_guest_debug_regs(vcpu);
+
+   /*
+* Single Step (ARM ARM D2.12.3 The software step state
+* machine)
+*
+* If we are doing Single Step we need to manipulate
+* the guest's MDSCR_EL1.SS and PSTATE.SS. Once the
+* step has

[PATCH v7 11/11] KVM: arm64: add trace points for guest_debug debug

2015-07-01 Thread Alex Bennée

This includes trace points for:
  kvm_arch_setup_guest_debug
  kvm_arch_clear_guest_debug

I've also added some generic register setting trace events and also a
trace point to dump the array of hardware registers.

Signed-off-by: Alex Bennée alex.ben...@linaro.org

---
v3
  - add trace event for debug access.
  - remove short trace #define, rename trace events
  - use __print_array with fixed array instead of own func
  - rationalise trace points (only one per register changed)
  - add vcpu ptr to the debug_setup trace
  - remove :: in prints
v4
  - u32/u64 split on debug registers
  - fix for renames
  - add tracing of traps/set_guest_debug
  - remove handle_guest_debug trace
v5
  - minor print fmt fix
  - rm pstate traces
v6
  - fix merge conflicts
  - update control reg tracking to u64 (abi change)
v7
  - fix merge conflicts from ioctl move
  - fix other minor merge conflicts
  - fixes for the re-factored sys_regs code
---
 arch/arm64/kvm/debug.c|  35 -
 arch/arm64/kvm/guest.c|   4 ++
 arch/arm64/kvm/sys_regs.c |  21 
 arch/arm64/kvm/trace.h| 123 ++
 4 files changed, 182 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 46b73d7..119107f 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -24,6 +24,8 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_emulate.h>
 
+#include "trace.h"
+
 /* These are the bits of MDSCR_EL1 we may manipulate */
 #define MDSCR_EL1_DEBUG_MASK   (DBG_MDSCR_SS | \
DBG_MDSCR_KDE | \
@@ -44,11 +46,17 @@ static DEFINE_PER_CPU(u32, mdcr_el2);
 static void save_guest_debug_regs(struct kvm_vcpu *vcpu)
 {
vcpu->arch.guest_debug_preserved.mdscr_el1 = vcpu_sys_reg(vcpu, MDSCR_EL1);
+
+   trace_kvm_arm_set_dreg32("Saved MDSCR_EL1",
+   vcpu->arch.guest_debug_preserved.mdscr_el1);
 }
 
 static void restore_guest_debug_regs(struct kvm_vcpu *vcpu)
 {
vcpu_sys_reg(vcpu, MDSCR_EL1) = vcpu->arch.guest_debug_preserved.mdscr_el1;
+
+   trace_kvm_arm_set_dreg32("Restored MDSCR_EL1",
+   vcpu_sys_reg(vcpu, MDSCR_EL1));
 }
 
 /**
@@ -99,6 +107,8 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 {
bool trap_debug = !(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY);
 
+   trace_kvm_arm_setup_debug(vcpu, vcpu->guest_debug);
+
vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) & MDCR_EL2_HPMN_MASK;
vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
MDCR_EL2_TPMCR |
@@ -140,6 +150,8 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
vcpu_sys_reg(vcpu, MDSCR_EL1) &= ~DBG_MDSCR_SS;
}
 
+   trace_kvm_arm_set_dreg32("SPSR_EL2", *vcpu_cpsr(vcpu));
+
/*
 * HW Breakpoints and watchpoints
 *
@@ -156,6 +168,14 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
vcpu->arch.debug_ptr = &vcpu->arch.external_debug_state;
vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
trap_debug = true;
+
+   trace_kvm_arm_set_regset("BKPTS", get_num_brps(),
+   &vcpu->arch.debug_ptr->dbg_bcr[0],
+   &vcpu->arch.debug_ptr->dbg_bvr[0]);
+
+   trace_kvm_arm_set_regset("WAPTS", get_num_wrps(),
+   &vcpu->arch.debug_ptr->dbg_wcr[0],
+   &vcpu->arch.debug_ptr->dbg_wvr[0]);
}
}
 
@@ -165,10 +185,15 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
/* Trap debug register access */
if (trap_debug)
vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
+
+   trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
+   trace_kvm_arm_set_dreg32("MDSCR_EL1", vcpu_sys_reg(vcpu, MDSCR_EL1));
 }
 
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu)
 {
+   trace_kvm_arm_clear_debug(vcpu->guest_debug);
+
if (vcpu->guest_debug) {
restore_guest_debug_regs(vcpu);
 
@@ -176,8 +201,16 @@ void kvm_arm_clear_debug(struct kvm_vcpu *vcpu)
 * If we were using HW debug we need to restore the
 * debug_ptr to the guest debug state.
 */
-   if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW)
+   if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW) {
kvm_arm_reset_debug_ptr(vcpu);
 
+   trace_kvm_arm_set_regset("BKPTS", get_num_brps(),
+   &vcpu->arch.debug_ptr->dbg_bcr[0],
+   &vcpu->arch.debug_ptr->dbg_bvr[0]);
+
+   trace_kvm_arm_set_regset("WAPTS", get_num_wrps(),
+   &vcpu->arch.debug_ptr->dbg_wcr[0],
+

[PATCH v7 08/11] KVM: arm64: introduce vcpu->arch.debug_ptr

2015-07-01 Thread Alex Bennée

This introduces a level of indirection for the debug registers. Instead
of using the sys_regs[] directly we store registers in a structure in
the vcpu. The new kvm_arm_reset_debug_ptr() sets the debug ptr to the
guest context.
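
The indirection can be pictured with the small sketch below. It is an
illustration only: the struct layouts are cut down and the sketch_ names
are hypothetical. The field names debug_ptr, vcpu_debug_state and
external_debug_state follow the ones used in this series, where the
external set is only introduced by the later HW debug patch.

```c
#include <stdint.h>

#define SKETCH_MAX_BRPS 16

/* Cut-down register set: breakpoint control/value pairs only. */
struct sketch_debug_regs {
	uint64_t dbg_bcr[SKETCH_MAX_BRPS];
	uint64_t dbg_bvr[SKETCH_MAX_BRPS];
};

struct sketch_vcpu_arch {
	struct sketch_debug_regs vcpu_debug_state;     /* what the guest programmed */
	struct sketch_debug_regs external_debug_state; /* what userspace supplied */
	struct sketch_debug_regs *debug_ptr;           /* set loaded on world switch */
};

/* Default: the world switch loads the guest's own debug registers. */
static void sketch_reset_debug_ptr(struct sketch_vcpu_arch *arch)
{
	arch->debug_ptr = &arch->vcpu_debug_state;
}

/* While the host debugs the guest, point at the userspace-supplied set. */
static void sketch_use_external_debug(struct sketch_vcpu_arch *arch)
{
	arch->debug_ptr = &arch->external_debug_state;
}
```

With this shape the world-switch code only ever dereferences debug_ptr,
so switching register sets is a pointer assignment rather than a copy.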

This also entails updating the sys_regs code to access this new
structure. New access functions have been added for each set of debug
registers. The generic functions are still used for the few registers
stored in the main context.

New access function pointers have been added to the sys_reg_desc
structure to support the GET/SET_ONE_REG ioctl operations.

Signed-off-by: Alex Bennée alex.ben...@linaro.org

---
v6:
  - fix up some ws issues
  - correct clobber info
  - re-word commentary in kvm_host.h
  - fix endian access issues for aarch32 fields
  - revert all KVM_GET/SET_ONE_REG to 64bit (also see ABI update)
v7
  - new fn kvm_arm_reset_debug_ptr(), stubbed for arm
  - split trap fns into bcr,bvr,bcr,wvr and wxvr
  - add set/get fns to sys_regs_desc
  - reg_to_dbg/dbg_to_reg helpers for 32bit support
---
 arch/arm/include/asm/kvm_host.h   |   2 +-
 arch/arm/kvm/arm.c|   2 +
 arch/arm64/include/asm/kvm_asm.h  |  24 ++--
 arch/arm64/include/asm/kvm_host.h |  17 ++-
 arch/arm64/kernel/asm-offsets.c   |   6 +
 arch/arm64/kvm/debug.c|   9 ++
 arch/arm64/kvm/hyp.S  |  24 ++--
 arch/arm64/kvm/sys_regs.c | 281 ++
 arch/arm64/kvm/sys_regs.h |   6 +
 9 files changed, 321 insertions(+), 50 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 746c0c69..f42759b 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -239,5 +239,5 @@ static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arm_init_debug(void) {}
 static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
-
+static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {}
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index af60e6f..525473f 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -279,6 +279,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
/* Set up the timer */
kvm_timer_vcpu_init(vcpu);
 
+   kvm_arm_reset_debug_ptr(vcpu);
+
return 0;
 }
 
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index d6b507e..e997404 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -46,24 +46,16 @@
 #define CNTKCTL_EL1     20  /* Timer Control Register (EL1) */
 #define PAR_EL1         21  /* Physical Address Register */
 #define MDSCR_EL1       22  /* Monitor Debug System Control Register */
-#define DBGBCR0_EL1     23  /* Debug Breakpoint Control Registers (0-15) */
-#define DBGBCR15_EL1    38
-#define DBGBVR0_EL1     39  /* Debug Breakpoint Value Registers (0-15) */
-#define DBGBVR15_EL1    54
-#define DBGWCR0_EL1     55  /* Debug Watchpoint Control Registers (0-15) */
-#define DBGWCR15_EL1    70
-#define DBGWVR0_EL1     71  /* Debug Watchpoint Value Registers (0-15) */
-#define DBGWVR15_EL1    86
-#define MDCCINT_EL1     87  /* Monitor Debug Comms Channel Interrupt Enable Reg */
+#define MDCCINT_EL1     23  /* Monitor Debug Comms Channel Interrupt Enable Reg */
 
 /* 32bit specific registers. Keep them at the end of the range */
-#define DACR32_EL2      88  /* Domain Access Control Register */
-#define IFSR32_EL2      89  /* Instruction Fault Status Register */
-#define FPEXC32_EL2     90  /* Floating-Point Exception Control Register */
-#define DBGVCR32_EL2    91  /* Debug Vector Catch Register */
-#define TEECR32_EL1     92  /* ThumbEE Configuration Register */
-#define TEEHBR32_EL1    93  /* ThumbEE Handler Base Register */
-#define NR_SYS_REGS     94
+#define DACR32_EL2      24  /* Domain Access Control Register */
+#define IFSR32_EL2      25  /* Instruction Fault Status Register */
+#define FPEXC32_EL2     26  /* Floating-Point Exception Control Register */
+#define DBGVCR32_EL2    27  /* Debug Vector Catch Register */
+#define TEECR32_EL1     28  /* ThumbEE Configuration Register */
+#define TEEHBR32_EL1    29  /* ThumbEE Handler Base Register */
+#define NR_SYS_REGS     30
 
 /* 32bit mapping */
 #define c0_MPIDR   (MPIDR_EL1 * 2) /* MultiProcessor ID Register */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e2db6a6..461d288 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -108,11 +108,25 @@ struct kvm_vcpu_arch {
/* Exception Information */
struct kvm_vcpu_fault_info fault;
 
-   /* Debug state */
+   /*

[PATCH v7 10/11] KVM: arm64: enable KVM_CAP_SET_GUEST_DEBUG

2015-07-01 Thread Alex Bennée

Finally advertise the KVM capability for SET_GUEST_DEBUG. Once arm
support is added this check can be moved to the common
kvm_vm_ioctl_check_extension() code.

Signed-off-by: Alex Bennée alex.ben...@linaro.org
Acked-by: Christoffer Dall christoffer.d...@linaro.org

---

v3:
 - separated capability check from previous patches
 - moved into arm64 specific ioctl handler.
v4:
 - add a-b-tag
---
 arch/arm64/kvm/reset.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 21d5a62..88e5331 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -76,6 +76,9 @@ int kvm_arch_dev_ioctl_check_extension(long ext)
case KVM_CAP_GUEST_DEBUG_HW_WPS:
r  = get_num_wrps();
break;
+   case KVM_CAP_SET_GUEST_DEBUG:
+   r = 1;
+   break;
default:
r = 0;
}
-- 
2.4.5


[PATCH v7 09/11] KVM: arm64: guest debug, HW assisted debug support

2015-07-01 Thread Alex Bennée

This adds support for userspace to control the HW debug registers for
guest debug. In the debug ioctl we copy an IMPDEF number of registers into a new
register set called host_debug_state.

We use the recently introduced vcpu parameter debug_ptr to select which
register set is copied into the real registers when world switch occurs.

I've made some helper functions from hw_breakpoint.c more widely
available for re-use.

As with single step we need to tweak the guest registers to enable the
exceptions so we need to save and restore those bits.

Two new capabilities have been added to the KVM_EXTENSION ioctl to allow
userspace to query the number of hardware break and watch points
available on the host hardware.
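
For reference, the counts reported by those capabilities come down to two
4-bit fields of ID_AA64DFR0_EL1, each holding the number of implemented
registers minus one. The sketch below decodes them from a raw register
value; the field positions match the ubfx extractions in the existing
hyp.S code (bit 12 for BRPs, bit 20 for WRPs). The sketch_ names are
hypothetical and this is not the body of get_num_brps()/get_num_wrps().

```c
#include <stdint.h>

/* Extract a 4-bit ID register field starting at bit 'shift'. */
static unsigned int sketch_id_field(uint64_t reg, unsigned int shift)
{
	return (unsigned int)((reg >> shift) & 0xf);
}

/* ID_AA64DFR0_EL1.BRPs, bits [15:12]: number of breakpoints minus 1. */
static unsigned int sketch_num_brps(uint64_t id_aa64dfr0)
{
	return sketch_id_field(id_aa64dfr0, 12) + 1;
}

/* ID_AA64DFR0_EL1.WRPs, bits [23:20]: number of watchpoints minus 1. */
static unsigned int sketch_num_wrps(uint64_t id_aa64dfr0)
{
	return sketch_id_field(id_aa64dfr0, 20) + 1;
}
```

Userspace would not read the ID register itself; it would query the two
capabilities and size its breakpoint and watchpoint arrays accordingly.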

Signed-off-by: Alex Bennée alex.ben...@linaro.org
Reviewed-by: Christoffer Dall christoffer.d...@linaro.org

---
v2
   - switched to C setup
   - replace host debug registers directly into context
   - minor tweak to api docs
   - setup right register for debug
   - add FAR_EL2 to debug exit structure
   - add support for trapping debug register access
v3
   - remove stray trace statement
   - fix spacing around operators (various)
   - clean-up usage of trap_debug
   - introduce debug_ptr, replace excessive memcpy stuff
   - don't use memcpy in ioctl, just assign
   - update cap ioctl documentation
   - reword a number of comments
   - rename host_debug_state->external_debug_state
v4
   - use the new u32/u64 split debug_ptr approach
   - fix some wording/comments
v5
   - don't set MDSCR_EL1.KDE (not needed)
v6
   - update wording given change in commentary
   - KVM_GUESTDBG_USE_HW_BP->KVM_GUESTDBG_USE_HW
v7
   - fix merge conflicts from ioctl move to guest.c
   - use kvm_arm_reset_debug_ptr to reset ptr
   - a BUG_ON() test has been added to trap failure to reset debug_ptr
   - debugging->debug in kvm_host.h comment
   - s/defined// s/to// in commit msg
   - rm ref to introducing debug_ptr in commit msg
   - add r-b tag
---
 Documentation/virtual/kvm/api.txt  |  7 +-
 arch/arm64/include/asm/hw_breakpoint.h |  4 
 arch/arm64/include/asm/kvm_host.h  |  6 -
 arch/arm64/kernel/hw_breakpoint.c  |  4 ++--
 arch/arm64/kvm/debug.c | 40 +-
 arch/arm64/kvm/guest.c |  7 ++
 arch/arm64/kvm/handle_exit.c   |  6 +
 arch/arm64/kvm/reset.c | 12 ++
 arch/arm64/kvm/sys_regs.c  |  3 ---
 include/uapi/linux/kvm.h   |  2 ++
 10 files changed, 79 insertions(+), 12 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 33c8143..ada57df 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2668,7 +2668,7 @@ The top 16 bits of the control field are architecture specific control
 flags which can include the following:
 
   - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86, arm64]
-  - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390]
+  - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390, arm64]
   - KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86]
   - KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86]
   - KVM_GUESTDBG_EXIT_PENDING:  trigger an immediate guest exit [s390]
@@ -2683,6 +2683,11 @@ updated to the correct (supplied) values.
 The second part of the structure is architecture specific and
 typically contains a set of debug registers.
 
+For arm64 the number of debug registers is implementation defined and
+can be determined by querying the KVM_CAP_GUEST_DEBUG_HW_BPS and
+KVM_CAP_GUEST_DEBUG_HW_WPS capabilities which return a positive number
+indicating the number of supported registers.
+
 When debug events exit the main run loop with the reason
 KVM_EXIT_DEBUG with the kvm_debug_exit_arch part of the kvm_run
 structure containing architecture specific debug information.
diff --git a/arch/arm64/include/asm/hw_breakpoint.h b/arch/arm64/include/asm/hw_breakpoint.h
index 52b484b..9da2824 100644
--- a/arch/arm64/include/asm/hw_breakpoint.h
+++ b/arch/arm64/include/asm/hw_breakpoint.h
@@ -130,6 +130,10 @@ static inline void ptrace_hw_copy_thread(struct task_struct *task)
 }
 #endif
 
+/* Determine number of BRP/WRP registers available. */
+extern int get_num_brps(void);
+extern int get_num_wrps(void);
+
 extern struct pmu perf_ops_bp;
 
 #endif /* __KERNEL__ */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 461d288..6c745e0 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -116,13 +116,17 @@ struct kvm_vcpu_arch {
 * debugging the guest from the host and to maintain separate host and
 * guest state during world switches. vcpu_debug_state are the debug
 * registers of the vcpu as the guest sees them.  host_debug_state are
-* the host registers which are saved and restored during world switches.
+* the host

[PATCH v7 07/11] KVM: arm64: re-factor hyp.S debug register code

2015-07-01 Thread Alex Bennée

This is a precursor to sharing the code with the guest debug support.
This replaces the big macro that fishes data out of a fixed location
with a more general helper macro to restore a set of debug registers. It
uses macro substitution so it can be re-used for debug control and value
registers. It does however rely on the debug registers being 64 bit
aligned (as they happen to be in the hyp ABI).
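
The "skip N registers" trick used by the macro (adr/add/br into a run of
one-instruction entries, hence the lsl #2) has a close C analogue in a
switch with deliberate fall-through. The sketch below is only meant to
show the control flow: sketch_read_reg() is a hypothetical stand-in for
the mrs instructions, which cannot take a run-time register index, and
skip is the number of unimplemented registers, as in the macro's x5.

```c
#include <stdint.h>

#define SKETCH_MAX_REGS 16

/* Hypothetical stand-in for "mrs xN, dbg<type>N_el1". */
static uint64_t sketch_read_reg(const uint64_t *hw, unsigned int n)
{
	return hw[n];
}

#define SKETCH_SAVE(n) (dst[(n)] = sketch_read_reg(hw, (n)))

/*
 * Save registers (15 - skip) down to 0 into dst[], each in its own
 * 64-bit slot. Entering the switch at 'skip' mirrors branching past
 * the first 'skip' instructions of the unrolled sequence.
 */
static void sketch_save_debug(uint64_t *dst, const uint64_t *hw,
			      unsigned int skip)
{
	switch (skip) {
	case 0:  SKETCH_SAVE(15); /* fall through */
	case 1:  SKETCH_SAVE(14); /* fall through */
	case 2:  SKETCH_SAVE(13); /* fall through */
	case 3:  SKETCH_SAVE(12); /* fall through */
	case 4:  SKETCH_SAVE(11); /* fall through */
	case 5:  SKETCH_SAVE(10); /* fall through */
	case 6:  SKETCH_SAVE(9);  /* fall through */
	case 7:  SKETCH_SAVE(8);  /* fall through */
	case 8:  SKETCH_SAVE(7);  /* fall through */
	case 9:  SKETCH_SAVE(6);  /* fall through */
	case 10: SKETCH_SAVE(5);  /* fall through */
	case 11: SKETCH_SAVE(4);  /* fall through */
	case 12: SKETCH_SAVE(3);  /* fall through */
	case 13: SKETCH_SAVE(2);  /* fall through */
	case 14: SKETCH_SAVE(1);  /* fall through */
	case 15: SKETCH_SAVE(0);
		break;
	default:
		break;	/* out-of-range skip: save nothing */
	}
}
```

For a CPU with six breakpoints the caller would pass skip = 15 - 5 = 10,
so only slots 0 to 5 are written.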

Signed-off-by: Alex Bennée alex.ben...@linaro.org
Reviewed-by: Christoffer Dall christoffer.d...@linaro.org

---
v3:
  - return to the patch series
  - add save and restore targets
  - change register use and document
v4:
  - keep original setup/restore names
  - don't use split u32/u64 structure yet
v6:
  - fix ws and clobber info in hyp.S
v7:
  - fix whitespace
  - add r-b-tag
---
 arch/arm64/kvm/hyp.S | 517 ++-
 1 file changed, 138 insertions(+), 379 deletions(-)

diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 2c67a14..77c08df 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -228,199 +228,52 @@
stp x24, x25, [x3, #160]
 .endm
 
-.macro save_debug
-   // x2: base address for cpu context
-   // x3: tmp register
-
-   mrs x26, id_aa64dfr0_el1
-   ubfx    x24, x26, #12, #4   // Extract BRPs
-   ubfx    x25, x26, #20, #4   // Extract WRPs
-   mov w26, #15
-   sub w24, w26, w24   // How many BPs to skip
-   sub w25, w26, w25   // How many WPs to skip
-
-   add x3, x2, #CPU_SYSREG_OFFSET(DBGBCR0_EL1)
-
-   adr x26, 1f
-   add x26, x26, x24, lsl #2
-   br  x26
-1:
-   mrs x20, dbgbcr15_el1
-   mrs x19, dbgbcr14_el1
-   mrs x18, dbgbcr13_el1
-   mrs x17, dbgbcr12_el1
-   mrs x16, dbgbcr11_el1
-   mrs x15, dbgbcr10_el1
-   mrs x14, dbgbcr9_el1
-   mrs x13, dbgbcr8_el1
-   mrs x12, dbgbcr7_el1
-   mrs x11, dbgbcr6_el1
-   mrs x10, dbgbcr5_el1
-   mrs x9, dbgbcr4_el1
-   mrs x8, dbgbcr3_el1
-   mrs x7, dbgbcr2_el1
-   mrs x6, dbgbcr1_el1
-   mrs x5, dbgbcr0_el1
-
-   adr x26, 1f
-   add x26, x26, x24, lsl #2
-   br  x26
-
-1:
-   str x20, [x3, #(15 * 8)]
-   str x19, [x3, #(14 * 8)]
-   str x18, [x3, #(13 * 8)]
-   str x17, [x3, #(12 * 8)]
-   str x16, [x3, #(11 * 8)]
-   str x15, [x3, #(10 * 8)]
-   str x14, [x3, #(9 * 8)]
-   str x13, [x3, #(8 * 8)]
-   str x12, [x3, #(7 * 8)]
-   str x11, [x3, #(6 * 8)]
-   str x10, [x3, #(5 * 8)]
-   str x9, [x3, #(4 * 8)]
-   str x8, [x3, #(3 * 8)]
-   str x7, [x3, #(2 * 8)]
-   str x6, [x3, #(1 * 8)]
-   str x5, [x3, #(0 * 8)]
-
-   add x3, x2, #CPU_SYSREG_OFFSET(DBGBVR0_EL1)
-
-   adr x26, 1f
-   add x26, x26, x24, lsl #2
-   br  x26
-1:
-   mrs x20, dbgbvr15_el1
-   mrs x19, dbgbvr14_el1
-   mrs x18, dbgbvr13_el1
-   mrs x17, dbgbvr12_el1
-   mrs x16, dbgbvr11_el1
-   mrs x15, dbgbvr10_el1
-   mrs x14, dbgbvr9_el1
-   mrs x13, dbgbvr8_el1
-   mrs x12, dbgbvr7_el1
-   mrs x11, dbgbvr6_el1
-   mrs x10, dbgbvr5_el1
-   mrs x9, dbgbvr4_el1
-   mrs x8, dbgbvr3_el1
-   mrs x7, dbgbvr2_el1
-   mrs x6, dbgbvr1_el1
-   mrs x5, dbgbvr0_el1
-
-   adr x26, 1f
-   add x26, x26, x24, lsl #2
-   br  x26
-
-1:
-   str x20, [x3, #(15 * 8)]
-   str x19, [x3, #(14 * 8)]
-   str x18, [x3, #(13 * 8)]
-   str x17, [x3, #(12 * 8)]
-   str x16, [x3, #(11 * 8)]
-   str x15, [x3, #(10 * 8)]
-   str x14, [x3, #(9 * 8)]
-   str x13, [x3, #(8 * 8)]
-   str x12, [x3, #(7 * 8)]
-   str x11, [x3, #(6 * 8)]
-   str x10, [x3, #(5 * 8)]
-   str x9, [x3, #(4 * 8)]
-   str x8, [x3, #(3 * 8)]
-   str x7, [x3, #(2 * 8)]
-   str x6, [x3, #(1 * 8)]
-   str x5, [x3, #(0 * 8)]
-
-   add x3, x2, #CPU_SYSREG_OFFSET(DBGWCR0_EL1)
-
-   adr x26, 1f
-   add x26, x26, x25, lsl #2
-   br  x26
+.macro save_debug type
+   // x4: pointer to register set
+   // x5: number of registers to skip
+   // x6..x22 trashed
+
+   adr x22, 1f
+   add x22, x22, x5, lsl #2
+   br  x22
 1:
-   mrs x20, dbgwcr15_el1
-   mrs x19, dbgwcr14_el1
-   mrs x18, dbgwcr13_el1
-   mrs x17, dbgwcr12_el1
-   mrs x16, dbgwcr11_el1
-   mrs x15, dbgwcr10_el1
-   mrs x14, dbgwcr9_el1
-   mrs x13, dbgwcr8_el1
-   mrs x12, dbgwcr7_el1
-   mrs x11, dbgwcr6_el1
-   mrs x10, dbgwcr5_el1
-   mrs x9, dbgwcr4_el1
-   mrs x8,

[PATCH v7 00/11] KVM Guest Debug support for arm64

2015-07-01 Thread Alex Bennée

Here is V7 of the KVM Guest Debug support for arm64.

The fixes are fairly minor aside from the re-factoring of sys_regs.c
to have individual trap functions for each debug register. There is a
lot of boilerplate but it does make the ugliness of the previous
offset hacks go away.

On top of that I've fixed some build failures on v7 which were not
apparent on my defconfig build. I've also been helped by
kernelci.org doing the heavy lifting for me:

http://kernelci.org/boot/all/job/alex/

For full details see the changelog on each of the patches.

GIT Repos:

The patches for this series are based off v4.1 and can be found
at:

Kernel:
https://git.linaro.org/people/alex.bennee/linux.git
branch: guest-debug/4.1-v7
describe: v4.1-11-g2a10438

QEMU:
https://github.com/stsquad/qemu
branch: kvm/guest-debug-v6


Alex Bennée (11):
  KVM: add comments for kvm_debug_exit_arch struct
  KVM: arm64: guest debug, define API headers
  KVM: arm: guest debug, add stub KVM_SET_GUEST_DEBUG ioctl
  KVM: arm: introduce kvm_arm_init/setup/clear_debug
  KVM: arm64: guest debug, add SW break point support
  KVM: arm64: guest debug, add support for single-step
  KVM: arm64: re-factor hyp.S debug register code
  KVM: arm64: introduce vcpu->arch.debug_ptr
  KVM: arm64: guest debug, HW assisted debug support
  KVM: arm64: enable KVM_CAP_SET_GUEST_DEBUG
  KVM: arm64: add trace points for guest_debug debug

 Documentation/virtual/kvm/api.txt  |  15 +-
 arch/arm/include/asm/kvm_host.h|   4 +
 arch/arm/kvm/arm.c |  18 +-
 arch/arm/kvm/guest.c   |   6 +
 arch/arm64/include/asm/hw_breakpoint.h |   4 +
 arch/arm64/include/asm/kvm_asm.h   |  26 +-
 arch/arm64/include/asm/kvm_host.h  |  37 ++-
 arch/arm64/include/uapi/asm/kvm.h  |  27 ++
 arch/arm64/kernel/asm-offsets.c|   7 +
 arch/arm64/kernel/hw_breakpoint.c  |   4 +-
 arch/arm64/kvm/Makefile|   2 +-
 arch/arm64/kvm/debug.c | 216 +
 arch/arm64/kvm/guest.c |  40 +++
 arch/arm64/kvm/handle_exit.c   |  44 +++
 arch/arm64/kvm/hyp.S   | 544 ++---
 arch/arm64/kvm/reset.c |  15 +
 arch/arm64/kvm/sys_regs.c  | 299 --
 arch/arm64/kvm/sys_regs.h  |   6 +
 arch/arm64/kvm/trace.h | 123 
 include/uapi/linux/kvm.h   |   5 +
 20 files changed, 996 insertions(+), 446 deletions(-)
 create mode 100644 arch/arm64/kvm/debug.c

-- 
2.4.5


[PATCH v7 03/11] KVM: arm: guest debug, add stub KVM_SET_GUEST_DEBUG ioctl

2015-07-01 Thread Alex Bennée

This commit adds a stub function to support the KVM_SET_GUEST_DEBUG
ioctl. Any unsupported flag will return -EINVAL. For now, only
KVM_GUESTDBG_ENABLE is supported, although it won't have any effects.
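
The flag check described above boils down to the following stand-alone
sketch. The SKETCH_ names and the function are hypothetical (the real
handler takes a vcpu and a struct kvm_guest_debug), and the flag values
are assumed for the example.

```c
#include <errno.h>
#include <stdint.h>

/* Flag values assumed for this sketch. */
#define SKETCH_GUESTDBG_ENABLE      UINT32_C(0x00000001)
#define SKETCH_GUESTDBG_SINGLESTEP  UINT32_C(0x00000002)

/* Only the enable flag is accepted at this point in the series. */
#define SKETCH_GUESTDBG_VALID_MASK  (SKETCH_GUESTDBG_ENABLE)

/*
 * Validate 'control' and record it in *guest_debug.
 * Returns 0 on success, -EINVAL if any unsupported flag is set.
 */
static int sketch_set_guest_debug(uint32_t control, uint32_t *guest_debug)
{
	if (control & ~SKETCH_GUESTDBG_VALID_MASK)
		return -EINVAL;

	/* If debugging is not enabled, clear all flags. */
	*guest_debug = (control & SKETCH_GUESTDBG_ENABLE) ? control : 0;
	return 0;
}
```

Checking the mask first means later patches only have to widen the mask
as each debug type gains support.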

Signed-off-by: Alex Bennée alex.ben...@linaro.org.
Reviewed-by: Christoffer Dall christoffer.d...@linaro.org

---
v2
  - simplified form of the ioctl (stuff will go into setup_debug)
v3
 - KVM_GUESTDBG_VALID->KVM_GUESTDBG_VALID_MASK
 - move mask check to the top of function
 - add ioctl doc header
 - split capability into separate patch
 - tweaked commit wording w.r.t return of -EINVAL
v4
 - add r-b-tag
v7
 - moved ioctl to arm64/kvm/guest.c, stubbed arm/kvm/guest.c
---
 Documentation/virtual/kvm/api.txt |  2 +-
 arch/arm/kvm/arm.c|  7 ---
 arch/arm/kvm/guest.c  |  6 ++
 arch/arm64/kvm/guest.c| 27 +++
 4 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index c34c32d..ba635c7 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2645,7 +2645,7 @@ handled.
 4.87 KVM_SET_GUEST_DEBUG
 
 Capability: KVM_CAP_SET_GUEST_DEBUG
-Architectures: x86, s390, ppc
+Architectures: x86, s390, ppc, arm64
 Type: vcpu ioctl
 Parameters: struct kvm_guest_debug (in)
 Returns: 0 on success; -1 on error
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index d9631ec..92b80bc 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -302,13 +302,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
kvm_arm_set_running_vcpu(NULL);
 }
 
-int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
-   struct kvm_guest_debug *dbg)
-{
-   return -EINVAL;
-}
-
-
 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
struct kvm_mp_state *mp_state)
 {
diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
index d503fbb..96e935b 100644
--- a/arch/arm/kvm/guest.c
+++ b/arch/arm/kvm/guest.c
@@ -290,3 +290,9 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
 {
return -EINVAL;
 }
+
+int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
+   struct kvm_guest_debug *dbg)
+{
+   return -EINVAL;
+}
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 9535bd5..0ba8677 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -331,3 +331,30 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
 {
return -EINVAL;
 }
+
+#define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE)
+
+/**
+ * kvm_arch_vcpu_ioctl_set_guest_debug - set up guest debugging
+ * @kvm:   pointer to the KVM struct
+ * @kvm_guest_debug: the ioctl data buffer
+ *
+ * This sets up and enables the VM for guest debugging. Userspace
+ * passes in a control flag to enable different debug types and
+ * potentially other architecture specific information in the rest of
+ * the structure.
+ */
+int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
+   struct kvm_guest_debug *dbg)
+{
+   if (dbg->control & ~KVM_GUESTDBG_VALID_MASK)
+           return -EINVAL;
+
+   if (dbg->control & KVM_GUESTDBG_ENABLE) {
+           vcpu->guest_debug = dbg->control;
+   } else {
+           /* If not enabled clear all flags */
+           vcpu->guest_debug = 0;
+   }
+   return 0;
+}
-- 
2.4.5
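
The flag handling in the arm64 hunk above can be tried outside the kernel. What follows is only a user-space sketch under stated assumptions: struct model_vcpu and the MODEL_* names are hypothetical stand-ins, and only the two flag values are taken from the KVM UAPI header. It shows that any bit outside the valid mask is rejected with -EINVAL before the vcpu state is touched.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Same values as KVM_GUESTDBG_ENABLE / KVM_GUESTDBG_SINGLESTEP in <linux/kvm.h>. */
#define MODEL_GUESTDBG_ENABLE     0x00000001u
#define MODEL_GUESTDBG_SINGLESTEP 0x00000002u
/* Only ENABLE is accepted at this point in the series. */
#define MODEL_GUESTDBG_VALID_MASK (MODEL_GUESTDBG_ENABLE)

/* Hypothetical stand-in for the guest_debug field of struct kvm_vcpu. */
struct model_vcpu {
	uint32_t guest_debug;
};

/* Mirrors the control flow of the patch: validate first, then store or clear. */
static int model_set_guest_debug(struct model_vcpu *vcpu, uint32_t control)
{
	assert(vcpu != NULL);

	/* Reject any flag outside the supported set. */
	if (control & ~MODEL_GUESTDBG_VALID_MASK)
		return -EINVAL;

	if (control & MODEL_GUESTDBG_ENABLE)
		vcpu->guest_debug = control;
	else
		vcpu->guest_debug = 0;	/* not enabled: clear all flags */

	return 0;
}
```

Because the mask check comes first, a rejected request leaves the previous guest_debug value in place.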

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/1] KVM: s390: virtio-ccw: Fix config space values

2015-07-01 Thread Michael S. Tsirkin

On Wed, Jul 01, 2015 at 04:05:27PM +0200, Paolo Bonzini wrote:
 
 
 On 01/07/2015 15:45, Michael S. Tsirkin wrote:
   Paolo,
   
   here is a fix targeted for kvm/master (4.2) that fixes an issue with
   virtio config space on s390. It mostly manifests in vhost-scsi
   not working properly on s390. The problem itself might affect other
   things as well so cc stable/target 4.2.
   
   @Michael FYI, sending this via Paolo as most virtio-ccw kernel
   things went this way.
 
  OK but virtio patches should be Cc'd to the virtualization mailing
  list. So I think we need a separate MAINTAINERS entry for
  s390/virtio.
 
 See my other email---I think no special case is necessary.
 
 Paolo

Hmm but MAINTAINERS doesn't tell people they should Cc virtio ML -
isn't that a problem?

Re: [PATCH 1/1] KVM: s390: virtio-ccw: don't overwrite config space values

2015-07-01 Thread Paolo Bonzini



On 29/06/2015 16:44, Christian Borntraeger wrote:
 From: Cornelia Huck cornelia.h...@de.ibm.com
 
 Eric noticed problems with vhost-scsi and virtio-ccw: vhost-scsi
 complained about overwriting values in the config space, which
 was triggered by a broken implementation of virtio-ccw's config
 get/set routines. It was probably sheer luck that we did not hit
 this before.
 
 When writing a value to the config space, the WRITE_CONF ccw will
 always write from the beginning of the config space up to and
 including the value to be set. If the config space up to the value
 has not yet been retrieved from the device, however, we'll end up
 overwriting values. Keep track of the known config space and update
 if needed to avoid this.
 
 Moreover, READ_CONF will only read the number of bytes it has been
 instructed to retrieve, so we must not copy more than that to the
 buffer, or we might overwrite trailing values.
 
 Reported-by: Eric Farman far...@linux.vnet.ibm.com
 Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
 Reviewed-by: Eric Farman far...@linux.vnet.ibm.com
 Tested-by: Eric Farman far...@linux.vnet.ibm.com
 Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
 Cc: sta...@vger.kernel.org
 ---
  drivers/s390/kvm/virtio_ccw.c | 11 +--
  1 file changed, 9 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
 index 6f1fa17..f8d8fdb 100644
 --- a/drivers/s390/kvm/virtio_ccw.c
 +++ b/drivers/s390/kvm/virtio_ccw.c
 @@ -65,6 +65,7 @@ struct virtio_ccw_device {
   bool is_thinint;
   bool going_away;
   bool device_lost;
 + unsigned int config_ready;
   void *airq_info;
  };
  
 @@ -833,8 +834,11 @@ static void virtio_ccw_get_config(struct virtio_device 
 *vdev,
   if (ret)
   goto out_free;
  
 - memcpy(vcdev->config, config_area, sizeof(vcdev->config));
 - memcpy(buf, &vcdev->config[offset], len);
 + memcpy(vcdev->config, config_area, offset + len);
 + if (buf)
 +         memcpy(buf, &vcdev->config[offset], len);
 + if (vcdev->config_ready < offset + len)
 +         vcdev->config_ready = offset + len;
  
  out_free:
   kfree(config_area);
 @@ -857,6 +861,9 @@ static void virtio_ccw_set_config(struct virtio_device 
 *vdev,
   if (!config_area)
   goto out_free;
  
 + /* Make sure we don't overwrite fields. */
 + if (vcdev->config_ready < offset)
 +         virtio_ccw_get_config(vdev, 0, NULL, offset);
   memcpy(&vcdev->config[offset], buf, len);
   /* Write the config area to the host. */
   memcpy(config_area, vcdev->config, sizeof(vcdev->config));
 

Applied (but I think in general virtio-ccw patches should go through
mst---the exception is when matching changes to KVM are needed, and of
course the exception was almost always the rule during bringup).

Thanks,

Paolo
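
The overwrite described in the commit message can be reproduced with a small user-space model. This is a sketch under assumptions, not the driver code: struct model_dev, MODEL_CFG_SIZE and the model_* functions are hypothetical, and the two ccw commands are reduced to memcpy calls. It keeps the two properties the message relies on: a write always transfers bytes 0 through offset + len - 1, and a read returns only offset + len bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_CFG_SIZE 16u	/* hypothetical config space size */

struct model_dev {
	unsigned char device_cfg[MODEL_CFG_SIZE];	/* config space held by the host */
	unsigned char cache[MODEL_CFG_SIZE];		/* driver copy, like vcdev->config */
	unsigned int config_ready;			/* leading cache bytes known to be valid */
};

/* Models READ_CONF: only offset + len bytes come back from the device. */
static void model_get_config(struct model_dev *d, unsigned int offset,
			     unsigned char *buf, unsigned int len)
{
	assert(d != NULL && offset + len <= MODEL_CFG_SIZE);

	memcpy(d->cache, d->device_cfg, offset + len);
	if (buf)
		memcpy(buf, &d->cache[offset], len);
	if (d->config_ready < offset + len)
		d->config_ready = offset + len;
}

/* Models WRITE_CONF: everything from byte 0 up to the new value is written. */
static void model_set_config(struct model_dev *d, unsigned int offset,
			     const unsigned char *buf, unsigned int len)
{
	assert(d != NULL && buf != NULL && offset + len <= MODEL_CFG_SIZE);

	/* Fetch the leading bytes first so stale cache contents are not written back. */
	if (d->config_ready < offset)
		model_get_config(d, 0, NULL, offset);
	memcpy(&d->cache[offset], buf, len);
	memcpy(d->device_cfg, d->cache, offset + len);
}
```

Dropping the config_ready check in model_set_config makes a write at offset 4 on a fresh device zero bytes 0 to 3 of device_cfg, which is the kind of clobbering vhost-scsi complained about.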

Re: [PATCH 0/5] PPC: Current patch queue for HV KVM

2015-07-01 Thread Alexander Graf



On 24.06.15 13:18, Paul Mackerras wrote:
 This is my current queue of patches for HV KVM.  This series is based
 on the kvm next branch.  They have all been posted 6 weeks ago or
 more, though I have just added a 3-line fix to patch 2/5 to fix a bug
 that we found in testing migration, and I expanded a comment (no code
 change) in patch 3/5 following a suggestion by Aneesh.
 
 I'd like to see these go into 4.2 if possible.

Thanks, applied all to kvm-ppc-queue.


Alex

Re: [PATCH] KVM: x86: remove data variable from kvm_get_msr_common

2015-07-01 Thread Paolo Bonzini



On 29/06/2015 12:39, Nicolas Iooss wrote:
 Commit 609e36d372ad ("KVM: x86: pass host_initiated to functions that
 read MSRs") modified kvm_get_msr_common function to use msr_info->data
 instead of data but missed one occurrence.  Replace it and remove the
 unused local variable.
 
 Fixes: 609e36d372ad ("KVM: x86: pass host_initiated to functions that
 read MSRs")
 Signed-off-by: Nicolas Iooss nicolas.iooss_li...@m4x.org
 ---
  arch/x86/kvm/x86.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index ac165c2fb8e5..bbaf44e8f0d3 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -2379,8 +2379,6 @@ static int get_msr_hyperv(struct kvm_vcpu *vcpu, u32 
 msr, u64 *pdata)
  
  int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
  {
 - u64 data;
 -
   switch (msr_info->index) {
   case MSR_IA32_PLATFORM_ID:
   case MSR_IA32_EBL_CR_POWERON:
 @@ -2453,7 +2451,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct 
 msr_data *msr_info)
   /* TSC increment by tick */
   msr_info->data = 1000ULL;
   /* CPU multiplier */
 - data |= (((uint64_t)4ULL) << 40);
 + msr_info->data |= (((uint64_t)4ULL) << 40);
   break;
   case MSR_EFER:
   msr_info->data = vcpu->arch.efer;
 

Applied, thanks.

Paolo
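
For reference, the value that the fixed hunk builds can be checked in isolation. This is a sketch only: struct model_msr_data and model_fill_msr are hypothetical names that mimic the two assignments in the patch. The point is that both writes must go to the same field, since the old code OR-ed the multiplier into an uninitialized local that was never read back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the index/data pair of struct msr_data. */
struct model_msr_data {
	uint32_t index;
	uint64_t data;
};

/* Same two steps as the fixed case in kvm_get_msr_common(). */
static void model_fill_msr(struct model_msr_data *msr_info)
{
	assert(msr_info != NULL);

	/* TSC increment by tick */
	msr_info->data = 1000ULL;
	/* CPU multiplier, stored in the bits starting at bit 40 */
	msr_info->data |= ((uint64_t)4ULL << 40);
}
```

After the call, the low bits hold 1000 and bits 40 and up hold 4.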

Re: [PATCH v3 1/2] arm64: KVM: Optimize arm64 skip 30-50% vfp/simd save/restore on exits

2015-07-01 Thread Christoffer Dall

On Wed, Jun 24, 2015 at 05:04:11PM -0700, Mario Smarduch wrote:
 This patch only saves and restores FP/SIMD registers on Guest access. To do
 this cptr_el2 FP/SIMD trap is set on Guest entry and later checked on exit.
 lmbench, hackbench show significant improvements, for 30-50% exits FP/SIMD
 context is not saved/restored
 
 Signed-off-by: Mario Smarduch m.smard...@samsung.com
 ---
  arch/arm64/include/asm/kvm_arm.h |5 -
  arch/arm64/kvm/hyp.S |   46 
 +++---
  2 files changed, 47 insertions(+), 4 deletions(-)
 
 diff --git a/arch/arm64/include/asm/kvm_arm.h 
 b/arch/arm64/include/asm/kvm_arm.h
 index ac6fafb..7605e09 100644
 --- a/arch/arm64/include/asm/kvm_arm.h
 +++ b/arch/arm64/include/asm/kvm_arm.h
 @@ -171,10 +171,13 @@
  #define HSTR_EL2_TTEE    (1 << 16)
  #define HSTR_EL2_T(x)    (1 << x)
  
 +/* Hyp Coproccessor Trap Register Shifts */
 +#define CPTR_EL2_TFP_SHIFT 10
 +
  /* Hyp Coprocessor Trap Register */
  #define CPTR_EL2_TCPAC   (1 << 31)
  #define CPTR_EL2_TTA     (1 << 20)
 -#define CPTR_EL2_TFP     (1 << 10)
 +#define CPTR_EL2_TFP     (1 << CPTR_EL2_TFP_SHIFT)
  
  /* Hyp Debug Configuration Register bits */
  #define MDCR_EL2_TDRA    (1 << 11)
 diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
 index 5befd01..de0788f 100644
 --- a/arch/arm64/kvm/hyp.S
 +++ b/arch/arm64/kvm/hyp.S
 @@ -673,6 +673,15 @@
   tbz \tmp, #KVM_ARM64_DEBUG_DIRTY_SHIFT, \target
  .endm
  
 +/*
 + * Check cptr VFP/SIMD accessed bit, if set VFP/SIMD not accessed by guest.

This comment doesn't really help me understand the function, may I
suggest:

Branch to target if CPTR_EL2.TFP bit is set (VFP/SIMD trapping enabled)

 + */
 +.macro skip_fpsimd_state tmp, target
 + mrs \tmp, cptr_el2
 + tbnz \tmp, #CPTR_EL2_TFP_SHIFT, \target
 +.endm
 +
 +
  .macro compute_debug_state target
   // Compute debug state: If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY
   // is set, we do a full save/restore cycle and disable trapping.
 @@ -763,6 +772,7 @@
   ldr x2, [x0, #VCPU_HCR_EL2]
   msr hcr_el2, x2
   mov x2, #CPTR_EL2_TTA
 + orr x2, x2, #CPTR_EL2_TFP
   msr cptr_el2, x2
  
   mov x2, #(1 << 15)  // Trap CP15 Cr=15
 @@ -785,7 +795,6 @@
  .macro deactivate_traps
   mov x2, #HCR_RW
   msr hcr_el2, x2
 - msr cptr_el2, xzr
   msr hstr_el2, xzr
  
   mrs x2, mdcr_el2
 @@ -912,6 +921,28 @@ __restore_fpsimd:
   restore_fpsimd
   ret
  
 +switch_to_guest_fpsimd:
 + push x4, lr
 +
 + mrs x2, cptr_el2
 + bic x2, x2, #CPTR_EL2_TFP
 + msr cptr_el2, x2
 +
 + mrs x0, tpidr_el2
 +
 + ldr x2, [x0, #VCPU_HOST_CONTEXT]
 + kern_hyp_va x2
 + bl __save_fpsimd
 +
 + add x2, x0, #VCPU_CONTEXT
 + bl __restore_fpsimd
 +
 + pop x4, lr
 + pop x2, x3
 + pop x0, x1
 +
 + eret
 +
  /*
   * u64 __kvm_vcpu_run(struct kvm_vcpu *vcpu);
   *
 @@ -932,7 +963,6 @@ ENTRY(__kvm_vcpu_run)
   kern_hyp_va x2
  
   save_host_regs
 - bl __save_fpsimd
   bl __save_sysregs
  
   compute_debug_state 1f
 @@ -948,7 +978,6 @@ ENTRY(__kvm_vcpu_run)
   add x2, x0, #VCPU_CONTEXT
  
   bl __restore_sysregs
 - bl __restore_fpsimd
  
   skip_debug_state x3, 1f
   bl  __restore_debug
 @@ -967,7 +996,9 @@ __kvm_vcpu_return:
   add x2, x0, #VCPU_CONTEXT
  
   save_guest_regs
 + skip_fpsimd_state x3, 1f
   bl __save_fpsimd
 +1:
   bl __save_sysregs
  
   skip_debug_state x3, 1f
 @@ -986,7 +1017,11 @@ __kvm_vcpu_return:
   kern_hyp_va x2
  
   bl __restore_sysregs
 + skip_fpsimd_state x3, 1f
   bl __restore_fpsimd
 +1:
 + /* Clear FPSIMD and Trace trapping */
 + msr cptr_el2, xzr

why not simply move the deactivate_traps down here instead?

  
   skip_debug_state x3, 1f
   // Clear the dirty flag for the next run, as all the state has
 @@ -1201,6 +1236,11 @@ el1_trap:
* x1: ESR
* x2: ESR_EC
*/
 +
 + /* Guest accessed VFP/SIMD registers, save host, restore Guest */
 + cmp x2, #ESR_ELx_EC_FP_ASIMD
 + b.eq switch_to_guest_fpsimd
 +
   cmp x2, #ESR_ELx_EC_DABT_LOW
   mov x0, #ESR_ELx_EC_IABT_LOW
   ccmp x2, x0, #4, ne
 -- 
 1.7.9.5
 

Otherwise looks good,
-Christoffer
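
The lazy switch that the patch implements in hyp.S can be modelled in plain C to make the control flow easier to follow. This is a sketch under assumptions: struct model_cpu, its fields and the model_* functions are hypothetical, and the trap_fp flag stands in for the CPTR_EL2.TFP bit. The real code works on the vcpu and host context structures, not on this toy state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fp_regs {
	unsigned long v[4];	/* toy register file */
};

struct model_cpu {
	struct fp_regs hw;	/* what is currently in the physical FP/SIMD registers */
	struct fp_regs host;	/* saved host context */
	struct fp_regs guest;	/* saved guest context */
	bool trap_fp;		/* models CPTR_EL2.TFP: guest FP/SIMD access traps */
	unsigned int fp_switches;
};

/* Guest entry: arm the trap instead of switching FP/SIMD state eagerly. */
static void model_enter_guest(struct model_cpu *c)
{
	assert(c != NULL);
	c->trap_fp = true;
}

/* First guest FP/SIMD access: save host, restore guest, disarm the trap. */
static void model_guest_fp_access(struct model_cpu *c)
{
	assert(c != NULL);
	if (!c->trap_fp)
		return;		/* already switched during this run */
	c->host = c->hw;
	c->hw = c->guest;
	c->trap_fp = false;
	c->fp_switches++;
}

/* Guest exit: switch back only if the guest actually used FP/SIMD. */
static void model_exit_guest(struct model_cpu *c)
{
	assert(c != NULL);
	if (!c->trap_fp) {
		c->guest = c->hw;
		c->hw = c->host;
	}
	c->trap_fp = false;	/* host runs with trapping disabled */
}
```

An entry and exit pair with no FP/SIMD access in between leaves hw untouched and fp_switches at zero, which corresponds to the exits the cover letter says are skipped.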

Re: [PATCH v3 0/9] HyperV equivalent of pvpanic driver

2015-07-01 Thread Paolo Bonzini



On 30/06/2015 13:33, Denis V. Lunev wrote:
 Windows 2012 guests can notify the hypervisor of a guest crash
 (a Windows bugcheck, i.e. BSOD) by writing specific Hyper-V MSRs. This
 series makes KVM handle these MSRs and notify user space, which allows
 QEMU/libvirt to gather a Windows guest crash dump.
 
 The idea is to provide functionality equal to pvpanic device without
 QEMU guest agent for Windows.
 
 The idea is borrowed from Linux HyperV bus driver and validated against
 Windows 2k12.
 
 Changes from v2:
 * forbid modification crash ctl msr by guest
 * qemu_system_guest_panicked usage in pvpanic and s390x
 * hyper-v crash handler move from generic kvm to i386
 * hyper-v crash handler: skip fetching crash msrs just mark crash occurred
 * sync with linux-next 20150629
 * patch 11 squashed to patch 10
 * patch 9 squashed to patch 7
 
 Changes from v1:
 * hyperv code move to hyperv.c
 * added read handlers of crash data msrs
 * added per vm and per cpu hyperv context structures
 * added saving crash msrs inside qemu cpu state
 * added qemu fetch and update of crash msrs
 * added qemu crash msrs store in cpu state and its migration
 
 Signed-off-by: Andrey Smetanin asmeta...@virtuozzo.com
 Signed-off-by: Denis V. Lunev d...@openvz.org
 CC: Gleb Natapov g...@kernel.org
 CC: Paolo Bonzini pbonz...@redhat.com

The patches look good, thanks.  I'll queue them as soon as I start
merging 4.3 features.

Paolo

Re: [PATCH 0/1] KVM: s390: virtio-ccw: Fix config space values

2015-07-01 Thread Paolo Bonzini



On 01/07/2015 15:45, Michael S. Tsirkin wrote:
  Paolo,
  
  here is a fix targeted for kvm/master (4.2) that fixes an issue with
  virtio config space on s390. It mostly manifests in vhost-scsi
  not working properly on s390. The problem itself might affect other
  things as well so cc stable/target 4.2.
  
  @Michael FYI, sending this via Paolo as most virtio-ccw kernel
  things went this way.

 OK but virtio patches should be Cc'd to the virtualization mailing
 list. So I think we need a separate MAINTAINERS entry for
 s390/virtio.

See my other email---I think no special case is necessary.

Paolo

Re: [PATCH 0/3] KVM: x86: legacy NMI watchdog fixes

2015-07-01 Thread Paolo Bonzini



On 30/06/2015 22:19, Radim Krčmář wrote:
 Until v2.6.37, Linux used NMI watchdog that utilized IO-APIC and LVT0.
 This series fixes some problems with APICv, restore, and concurrency
 while keeping the monster asleep.

Queued for 4.2.

Paolo

Re: [PATCH 3/3] KVM: x86: make vapics_in_nmi_mode atomic

2015-07-01 Thread Paolo Bonzini



On 30/06/2015 22:19, Radim Krčmář wrote:
 Writes were a bit racy, but hard to turn into a bug at the same time.
 (Particularly because modern Linux doesn't use this feature anymore.)

I suspect patch 2 makes this race much easier to trigger, so it deserves
Cc: stable@ as well.

Paolo

 Signed-off-by: Radim Krčmář rkrc...@redhat.com
 ---
  arch/x86/include/asm/kvm_host.h | 2 +-
  arch/x86/kvm/i8254.c| 2 +-
  arch/x86/kvm/lapic.c| 4 ++--
  3 files changed, 4 insertions(+), 4 deletions(-)
 
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index c7fa57b529d2..2a7f5d782c33 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -607,7 +607,7 @@ struct kvm_arch {
   struct kvm_pic *vpic;
   struct kvm_ioapic *vioapic;
   struct kvm_pit *vpit;
 - int vapics_in_nmi_mode;
 + atomic_t vapics_in_nmi_mode;
   struct mutex apic_map_lock;
   struct kvm_apic_map *apic_map;
  
 diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
 index 4dce6f8b6129..f90952f64e79 100644
 --- a/arch/x86/kvm/i8254.c
 +++ b/arch/x86/kvm/i8254.c
 @@ -305,7 +305,7 @@ static void pit_do_work(struct kthread_work *work)
* LVT0 to NMI delivery. Other PIC interrupts are just sent to
* VCPU0, and only if its LVT0 is in EXTINT mode.
*/
 - if (kvm->arch.vapics_in_nmi_mode > 0)
 + if (atomic_read(&kvm->arch.vapics_in_nmi_mode) > 0)
   kvm_for_each_vcpu(i, vcpu, kvm)
   kvm_apic_nmi_wd_deliver(vcpu);
   }
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 8dc32b5a4e0d..954e98a8c2e3 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -1264,9 +1264,9 @@ static void apic_manage_nmi_watchdog(struct kvm_lapic 
 *apic, u32 lvt0_val)
   if (lvt0_in_nmi_mode) {
   apic_debug("Receive NMI setting on APIC_LVT0 "
              "for cpu %d\n", apic->vcpu->vcpu_id);
 - apic->vcpu->kvm->arch.vapics_in_nmi_mode++;
 + atomic_inc(&apic->vcpu->kvm->arch.vapics_in_nmi_mode);
   } else
 - apic->vcpu->kvm->arch.vapics_in_nmi_mode--;
 + atomic_dec(&apic->vcpu->kvm->arch.vapics_in_nmi_mode);
   }
  }
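
The counter change can be sketched in user space with C11 atomics in place of the kernel's atomic_t. Everything named model_* here is hypothetical. The per-APIC lvt0_in_nmi_mode flag and the update-only-on-change logic are assumptions about the surrounding function, which the hunk above shows only in part.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the counter kept in struct kvm_arch. */
struct model_kvm {
	atomic_int vapics_in_nmi_mode;
};

/* Hypothetical stand-in for the per-vcpu local APIC state. */
struct model_apic {
	struct model_kvm *kvm;
	bool lvt0_in_nmi_mode;
};

/* Adjust the VM-wide count when this APIC's LVT0 enters or leaves NMI mode. */
static void model_manage_nmi_watchdog(struct model_apic *apic, bool nmi_mode)
{
	assert(apic != NULL && apic->kvm != NULL);

	if (apic->lvt0_in_nmi_mode == nmi_mode)
		return;		/* no transition, nothing to count */
	apic->lvt0_in_nmi_mode = nmi_mode;

	/* Read-modify-write as one atomic step, so concurrent vcpus cannot lose updates. */
	if (nmi_mode)
		atomic_fetch_add(&apic->kvm->vapics_in_nmi_mode, 1);
	else
		atomic_fetch_sub(&apic->kvm->vapics_in_nmi_mode, 1);
}

/* Reader side, as in pit_do_work(): deliver NMIs if any APIC is in NMI mode. */
static bool model_pit_should_deliver_nmi(struct model_kvm *kvm)
{
	assert(kvm != NULL);
	return atomic_load(&kvm->vapics_in_nmi_mode) > 0;
}
```

With a plain int, two vcpus incrementing at the same time could both read the old value and store the same result. The atomic read-modify-write rules that out.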
  
 

Re: [PATCH 0/1] KVM: s390: virtio-ccw: Fix config space values

2015-07-01 Thread Paolo Bonzini



On 01/07/2015 16:18, Michael S. Tsirkin wrote:
 On Wed, Jul 01, 2015 at 04:05:27PM +0200, Paolo Bonzini wrote:


 On 01/07/2015 15:45, Michael S. Tsirkin wrote:
 Paolo,

 here is a fix targeted for kvm/master (4.2) that fixes an issue with
 virtio config space on s390. It mostly manifests in vhost-scsi
 not working properly on s390. The problem itself might affect other
 things as well so cc stable/target 4.2.

 @Michael FYI, sending this via Paolo as most virtio-ccw kernel
 things went this way.

 OK but virtio patches should be Cc'd to the virtualization mailing
 list. So I think we need a separate MAINTAINERS entry for
 s390/virtio.

 See my other email---I think no special case is necessary.

 Paolo
 
 Hmm but MAINTAINERS doesn't tell people they should Cc virtio ML -
 isn't that a problem?

Ah that's because ccw isn't under drivers/virtio.  Yes, that should be
fixed and the old pre-ccw drivers should also get a stanza in MAINTAINERS.

Paolo

Re: [PATCH 0/1] KVM: s390: virtio-ccw: Fix config space values

2015-07-01 Thread Michael S. Tsirkin

On Mon, Jun 29, 2015 at 04:44:00PM +0200, Christian Borntraeger wrote:
 Paolo,
 
 here is a fix targeted for kvm/master (4.2) that fixes an issue with
 virtio config space on s390. It mostly manifests in vhost-scsi
 not working properly on s390. The problem itself might affect other
 things as well so cc stable/target 4.2.
 
 @Michael FYI, sending this via Paolo as most virtio-ccw kernel
 things went this way.

OK but virtio patches should be Cc'd to the virtualization mailing
list. So I think we need a separate MAINTAINERS entry for
s390/virtio.

 Cornelia Huck (1):
   KVM: s390: virtio-ccw: don't overwrite config space values
 
  drivers/s390/kvm/virtio_ccw.c | 11 +--
  1 file changed, 9 insertions(+), 2 deletions(-)
 
 -- 
 2.3.0

Re: [PATCH v3 2/2] arm: KVM: keep arm vfp/simd exit handling consistent with arm64

2015-07-01 Thread Christoffer Dall

On Wed, Jun 24, 2015 at 05:04:12PM -0700, Mario Smarduch wrote:
 After enhancing arm64 FP/SIMD exit handling, ARMv7 VFP exit branch is moved
 to guest trap handling. This allows us to keep exit handling flow between both
 architectures consistent.
 
 Signed-off-by: Mario Smarduch m.smard...@samsung.com
 ---
  arch/arm/kvm/interrupts.S |   14 --
  1 file changed, 8 insertions(+), 6 deletions(-)
 
 diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
 index 79caf79..b245b4e 100644
 --- a/arch/arm/kvm/interrupts.S
 +++ b/arch/arm/kvm/interrupts.S
 @@ -363,10 +363,6 @@ hyp_hvc:
   @ Check syndrome register
   mrc p15, 4, r1, c5, c2, 0   @ HSR
   lsr r0, r1, #HSR_EC_SHIFT
 -#ifdef CONFIG_VFPv3
 - cmp r0, #HSR_EC_CP_0_13
 - beq switch_to_guest_vfp
 -#endif
   cmp r0, #HSR_EC_HVC
   bne guest_trap  @ Not HVC instr.
  
 @@ -380,7 +376,10 @@ hyp_hvc:
   cmp r2, #0
   bne guest_trap  @ Guest called HVC
  
 -host_switch_to_hyp:
 + /*
 +  * Getting here means host called HVC, we shift parameters and branch
 +  * to Hyp function.
 +  */
   pop {r0, r1, r2}
  
   /* Check for __hyp_get_vectors */
 @@ -411,6 +410,10 @@ guest_trap:
  
   @ Check if we need the fault information
   lsr r1, r1, #HSR_EC_SHIFT
 +#ifdef CONFIG_VFPv3
 + cmp r1, #HSR_EC_CP_0_13
 + beq switch_to_guest_vfp
 +#endif
   cmp r1, #HSR_EC_IABT
   mrceq   p15, 4, r2, c6, c0, 2   @ HIFAR
   beq 2f
 @@ -479,7 +482,6 @@ guest_trap:
   */
  #ifdef CONFIG_VFPv3
  switch_to_guest_vfp:
 - load_vcpu   @ Load VCPU pointer to r0
   push {r3-r7}
  
   @ NEON/VFP used.  Turn on VFP access.
 -- 
 1.7.9.5
 

Reviewed-by: Christoffer Dall christoffer.d...@linaro.org

Re: [PATCH 2/3] KVM: x86: properly restore LVT0

2015-07-01 Thread Paolo Bonzini



On 30/06/2015 22:19, Radim Krčmář wrote:
 Legacy NMI watchdog didn't work after migration/resume, because
 vapics_in_nmi_mode was left at 0.
 
 Signed-off-by: Radim Krčmář rkrc...@redhat.com
 ---
  arch/x86/kvm/lapic.c | 1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index f49c7cca1de6..8dc32b5a4e0d 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -1824,6 +1824,7 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
   apic_update_ppr(apic);
   hrtimer_cancel(&apic->lapic_timer.timer);
   apic_update_lvtt(apic);
 + apic_manage_nmi_watchdog(apic, kvm_apic_get_reg(apic, APIC_LVT0));
   update_divide_count(apic);
   start_apic_timer(apic);
   apic-irr_pending = true;
 

Applied already, with Cc: stable, as it is not related to APICv.

Paolo

[PULL] virtio/vhost: cross endian support

2015-07-01 Thread Michael S. Tsirkin

The following changes since commit 8a7b19d8b542b87bccc3eaaf81dcc90a5ca48aea:

  include/uapi/linux/virtio_balloon.h: include linux/virtio_types.h (2015-06-01 
15:46:54 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 59a5b0f7bf74f88da6670bcbf924d8cc1e75b1ee:

  virtio-pci: alloc only resources actually used. (2015-06-24 08:15:09 +0200)


virtio/vhost: cross endian support

I have just queued some more bugfix patches today but none fix regressions and
none are related to these ones, so it looks like a good time for a merge for
-rc1.

Signed-off-by: Michael S. Tsirkin m...@redhat.com


Gerd Hoffmann (1):
  virtio-pci: alloc only resources actually used.

Greg Kurz (8):
  virtio: introduce virtio_is_little_endian() helper
  tun: add tun_is_little_endian() helper
  macvtap: introduce macvtap_is_little_endian() helper
  vringh: introduce vringh_is_little_endian() helper
  vhost: introduce vhost_is_little_endian() helper
  virtio: add explicit big-endian support to memory accessors
  vhost: cross-endian support for legacy devices
  macvtap/tun: cross-endian support for little-endian hosts

 drivers/vhost/vhost.h  | 25 ---
 drivers/virtio/virtio_pci_common.h |  2 +
 include/linux/virtio_byteorder.h   | 24 ++-
 include/linux/virtio_config.h  | 18 +---
 include/linux/vringh.h | 18 +---
 include/uapi/linux/if_tun.h|  6 +++
 include/uapi/linux/vhost.h | 14 +++
 drivers/net/macvtap.c  | 65 -
 drivers/net/tun.c  | 67 +-
 drivers/vhost/vhost.c  | 85 +-
 drivers/virtio/virtio_pci_common.c |  7 
 drivers/virtio/virtio_pci_legacy.c | 13 +-
 drivers/virtio/virtio_pci_modern.c | 24 ---
 drivers/net/Kconfig| 14 +++
 drivers/vhost/Kconfig  | 15 +++
 15 files changed, 350 insertions(+), 47 deletions(-)

Re: [PATCH v3 0/2] arm/arm64: KVM: Optimize arm64 fp/simd, saves 30-50% on exits

2015-07-01 Thread Christoffer Dall

On Wed, Jun 24, 2015 at 05:04:10PM -0700, Mario Smarduch wrote:
 Currently we save/restore fp/simd on each exit. The first patch optimizes arm64
 save/restore, we only do so on Guest access. hackbench and
 several lmbench tests show anywhere from 30% to above 50% optimization
 achieved.
 
 In second patch 32-bit handler is updated to keep exit handling consistent
 with 64-bit code.

30-50% of what?  The overhead or overall performance?

 
 Changes since v1:
 - Addressed Marc's comments
 - Verified optimization improvements with lmbench and hackbench, updated 
   commit message
 
 Changes since v2:
 - only for patch 2/2
   - Reworked trapping to vfp access handler
 
 Changes since v3:
 - Only for patch 2/2
   - Removed load_vcpu in switch_to_guest_vfp per Marc's comment
   - Got another chance to replace an unreferenced label with a comment
 
 
 Mario Smarduch (2):
   Optimize arm64 skip 30-50% vfp/simd save/restore on exits
   keep arm vfp/simd exit handling consistent with arm64
 
  arch/arm/kvm/interrupts.S|   14 +++-
  arch/arm64/include/asm/kvm_arm.h |5 -
  arch/arm64/kvm/hyp.S |   46 
 +++---
  3 files changed, 55 insertions(+), 10 deletions(-)
 
 -- 
 1.7.9.5
 

Re: [PATCH 05/10] KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs

2015-07-01 Thread Christoffer Dall

On Wed, Jul 01, 2015 at 10:17:52AM +0100, Marc Zyngier wrote:
 On 30/06/15 21:19, Christoffer Dall wrote:
  On Mon, Jun 08, 2015 at 06:04:00PM +0100, Marc Zyngier wrote:
  We only set the irq_queued flag for level interrupts, meaning
  that !vgic_irq_is_queued(vcpu, irq) is a good enough predicate
  for all interrupts.
 
  This will allow us to inject edge HW interrupts, for which the
  state ACTIVE+PENDING is not allowed.
  
  I don't understand this; ACTIVE+PENDING is allowed for edge interrupts.
  Do you mean that if we set the HW bit in the LR, then we are linking to
  an HW interrupt where we don't allow that to be ACTIVE+PENDING on the HW
  GIC side?
  
  Why is this relevant here?  I feel like I'm missing context.
 
 I've probably taken a shortcut here - bear with me while I'm trying to
 explain the issue.
 
 For HW interrupts, we shouldn't even try to use the state bits in the
 LR, because that state is contained in the physical distributor. Setting
 the HW bit really means there is something going on at the distributor
 level, just go there.

ok, so by HW interrupts you mean virtual interrupts with the HW bit in
the LR set, correct?

 
 If we were to inject an ACTIVE+PENDING interrupt at the LR level, we'd
 basically lose the second interrupt because that state is simply not
 considered.

Huh?  Which second interrupt?  I looked at the spec and it says don't
use the state bits for HW interrupts, so isn't it simply not supported
to set these bits at all and that's it?

 
 So the trick we're using is to only inject the active interrupt, and
 prevent anything else from being injected until we can confirm that the
 active state has been cleared at the physical level.
 
 Does it make any sense?
 
Sort of, but what I don't understand now is how the guest ever sees the
interrupt then.  If we always inject the virtual interrupt by setting
the active state on the physical distributor, and we can't inject this
as active+pending, and the guest doesn't see the state in the LR, then
how does this ever raise a virtual interrupt and how does the guest see
an interrupt which is only PENDING so that it can ack it etc. etc.?

Maybe I don't fully understand how the HW bit works after all...

Thanks,
-Christoffer

Re: [PATCH 06/10] KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interrupts

2015-07-01 Thread Christoffer Dall

On Wed, Jul 01, 2015 at 11:20:45AM +0100, Marc Zyngier wrote:
 On 30/06/15 21:19, Christoffer Dall wrote:
  On Mon, Jun 08, 2015 at 06:04:01PM +0100, Marc Zyngier wrote:
  In order to be able to feed physical interrupts to a guest, we need
  to be able to establish the virtual-physical mapping between the two
  worlds.
 
  The mapping is kept in a rbtree, indexed by virtual interrupts.
  
  how many of these do you expect there will be?  Is the extra code and
  complexity of an rbtree really warranted?
  
  I would assume that you'll have one PPI for each CPU in the default case
  plus potentially a few more for an assigned network adapter, let's say a
  couple of handfuls.  Am I missing something obvious or is this
  optimization of traversing a list of 10-12 mappings in the typical case
  not likely to be measurable?
  
  I would actually be more concerned about the additional locking and
  would look at RCU for protecting a list instead.  Can you protect an
  rbtree with RCU easily?
 
 Not very easily. There was some work done a while ago for the dentry
 cache IIRC, but I doubt that's reusable directly, and probably overkill.
 
 RCU protected lists are, on the other hand, readily available. Bah. I'll
 switch to this. By the time it becomes the bottleneck, the world will
 have moved on. Or so I hope.
 
We can also move to RB trees later if we have data showing it's worth
the hassle.  These structs are fairly small, and overhead like this
mostly shows up on a hot path, so I assume a better optimization would
be to allocate a bunch of these structures contiguously for cache
locality.  But again, I feel like this is all premature and we should
measure the beast first.

Thanks,
-Christoffer
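
To give a feel for the list alternative discussed here, this is a minimal lookup over a singly linked list of mappings keyed by virtual interrupt number. It is a sketch under assumptions: struct model_irq_map and both functions are hypothetical, and there is no RCU or locking at all, only the linear walk whose cost for ten or so entries is being debated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list-based counterpart of struct irq_phys_map. */
struct model_irq_map {
	struct model_irq_map *next;
	uint32_t virt_irq;
	uint32_t phys_irq;
};

/* Push a caller-allocated entry on the front of the list and return the new head. */
static struct model_irq_map *model_add_map(struct model_irq_map *head,
					   struct model_irq_map *entry)
{
	assert(entry != NULL);
	entry->next = head;
	return entry;
}

/* Linear search by virtual IRQ; returns NULL when no mapping exists. */
static const struct model_irq_map *
model_find_map(const struct model_irq_map *head, uint32_t virt_irq)
{
	for (const struct model_irq_map *m = head; m != NULL; m = m->next)
		if (m->virt_irq == virt_irq)
			return m;
	return NULL;
}
```

In the kernel, the equivalent would presumably be built on the RCU-protected list helpers Marc mentions. That part is left out here because it cannot run in user space as-is.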

Re: [PATCH 06/10] KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interrupts

2015-07-01 Thread Marc Zyngier

On 30/06/15 21:19, Christoffer Dall wrote:
 On Mon, Jun 08, 2015 at 06:04:01PM +0100, Marc Zyngier wrote:
 In order to be able to feed physical interrupts to a guest, we need
 to be able to establish the virtual-physical mapping between the two
 worlds.

 The mapping is kept in a rbtree, indexed by virtual interrupts.
 
 how many of these do you expect there will be?  Is the extra code and
 complexity of an rbtree really warranted?
 
 I would assume that you'll have one PPI for each CPU in the default case
 plus potentially a few more for an assigned network adapter, let's say a
 couple of handfuls.  Am I missing something obvious or is this
 optimization of traversing a list of 10-12 mappings in the typical case
 not likely to be measurable?
 
 I would actually be more concerned about the additional locking and
 would look at RCU for protecting a list instead.  Can you protect an
 rbtree with RCU easily?

Not very easily. There was some work done a while ago for the dentry
cache IIRC, but I doubt that's reusable directly, and probably overkill.

RCU protected lists are, on the other hand, readily available. Bah. I'll
switch to this. By the time it becomes the bottleneck, the world will
have moved on. Or so I hope.

M.

 
 Thanks,
 -Christoffer
 

 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 ---
  include/kvm/arm_vgic.h |  18 
  virt/kvm/arm/vgic.c| 110 
 +
  2 files changed, 128 insertions(+)

 diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
 index 4f9fa1d..33d121a 100644
 --- a/include/kvm/arm_vgic.h
 +++ b/include/kvm/arm_vgic.h
 @@ -159,6 +159,14 @@ struct vgic_io_device {
  struct kvm_io_device dev;
  };
  
 +struct irq_phys_map {
 +	struct rb_node		node;
 +	u32			virt_irq;
 +	u32			phys_irq;
 +	u32			irq;
 +	bool			active;
 +};
 +
  struct vgic_dist {
  	spinlock_t		lock;
  	bool			in_kernel;
 @@ -256,6 +264,10 @@ struct vgic_dist {
  struct vgic_vm_ops  vm_ops;
  struct vgic_io_device   dist_iodev;
  struct vgic_io_device   *redist_iodevs;
 +
 +/* Virtual irq to hwirq mapping */
 +spinlock_t  irq_phys_map_lock;
 
 why do we need a separate lock here?
 
 +struct rb_root  irq_phys_map;
  };
  
  struct vgic_v2_cpu_if {
 @@ -307,6 +319,9 @@ struct vgic_cpu {
  struct vgic_v2_cpu_if   vgic_v2;
  struct vgic_v3_cpu_if   vgic_v3;
  };
 +
 +/* Protected by the distributor's irq_phys_map_lock */
 +struct rb_root  irq_phys_map;
  };
  
  #define LR_EMPTY0xff
 @@ -331,6 +346,9 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, 
 unsigned int irq_num,
  void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
  int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
 +struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
 +   int virt_irq, int irq);
 +int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
  
  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
  #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
 diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
 index 59ed7a3..c6604f2 100644
 --- a/virt/kvm/arm/vgic.c
 +++ b/virt/kvm/arm/vgic.c
 @@ -24,6 +24,7 @@
  #include <linux/of.h>
  #include <linux/of_address.h>
  #include <linux/of_irq.h>
 +#include <linux/rbtree.h>
  #include <linux/uaccess.h>
  
  #include <linux/irqchip/arm-gic.h>
 @@ -84,6 +85,8 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu 
 *vcpu);
  static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
  static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
  static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr 
 lr_desc);
 +static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
 +int virt_irq);
  
  static const struct vgic_ops *vgic_ops;
  static const struct vgic_params *vgic;
 @@ -1585,6 +1588,112 @@ static irqreturn_t vgic_maintenance_handler(int irq, 
 void *data)
  return IRQ_HANDLED;
  }
  
 +static struct rb_root *vgic_get_irq_phys_map(struct kvm_vcpu *vcpu,
 +					     int virt_irq)
 +{
 +	if (virt_irq < VGIC_NR_PRIVATE_IRQS)
 +		return &vcpu->arch.vgic_cpu.irq_phys_map;
 +	else
 +		return &vcpu->kvm->arch.vgic.irq_phys_map;
 +}
 +
 +struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
 +				       int virt_irq, int irq)
 +{
 +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 +	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
 +	struct rb_node **new = &root->rb_node, *parent = NULL;
 +	struct irq_phys_map *new_map;
 +	struct irq_desc *desc;
 +	struct irq_data *data;
 +

[PATCH 13/16] nvdimm: support NFIT_CMD_IMPLEMENTED function

2015-07-01 Thread Xiao Guangrong

_DSM is defined in ACPI 6.0: 9.14.1 _DSM (Device Specific Method)

Function 0 is a query function. We do not support any function on the root
device, and only 3 functions are supported for the NVDIMM device:
NFIT_CMD_GET_CONFIG_SIZE, NFIT_CMD_GET_CONFIG_DATA and
NFIT_CMD_SET_CONFIG_DATA. That means we currently only allow access to the
device's Label Namespace.
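
As a rough sketch of what the query function reports, the supported-function bitmask can be worked out as below. The enum values are an assumption taken from the function indices quoted in the cover letter (4, 5 and 6) plus index 0 for the query itself, and dimm_cmd_supported() is a hypothetical helper, not something this patch adds:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed function indices: 0 is the query, 4/5/6 are the label commands. */
enum {
    NFIT_CMD_IMPLEMENTED     = 0,
    NFIT_CMD_GET_CONFIG_SIZE = 4,
    NFIT_CMD_GET_CONFIG_DATA = 5,
    NFIT_CMD_SET_CONFIG_DATA = 6,
};

/* One bit per supported function index. */
#define DIMM_SUPPORT_CMD ((1ULL << NFIT_CMD_IMPLEMENTED)     \
                        | (1ULL << NFIT_CMD_GET_CONFIG_SIZE) \
                        | (1ULL << NFIT_CMD_GET_CONFIG_DATA) \
                        | (1ULL << NFIT_CMD_SET_CONFIG_DATA))

/* Bits 0, 4, 5 and 6: 1 + 16 + 32 + 64. */
static_assert(DIMM_SUPPORT_CMD == 0x71, "unexpected command mask");

/* Hypothetical helper: is `function` advertised in `cmd_list`? */
static int dimm_cmd_supported(uint64_t cmd_list, unsigned int function)
{
    return function < 64 && ((cmd_list >> function) & 1);
}
```

A guest driver would be expected to issue function 0 first and only call the functions whose bits are set.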

Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
---
 hw/mem/pc-nvdimm.c | 126 +
 1 file changed, 126 insertions(+)

diff --git a/hw/mem/pc-nvdimm.c b/hw/mem/pc-nvdimm.c
index c0965ae..b586bf7 100644
--- a/hw/mem/pc-nvdimm.c
+++ b/hw/mem/pc-nvdimm.c
@@ -29,6 +29,15 @@
 #include "exec/address-spaces.h"
 #include "hw/acpi/aml-build.h"
 #include "hw/mem/pc-nvdimm.h"
+#include "sysemu/sysemu.h"
+
+//#define NVDIMM_DEBUG
+
+#ifdef NVDIMM_DEBUG
+#define nvdebug(fmt, ...) fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__)
+#else
+#define nvdebug(...)
+#endif
 
 #define PAGE_SIZE   (1UL << 12)
 
@@ -135,6 +144,22 @@ static void nfit_spa_uuid_pm(void *uuid)
     memcpy(uuid, &uuid_pm, sizeof(uuid_pm));
 }
 
+static bool dsm_is_root_uuid(uint8_t *uuid)
+{
+    uuid_le uuid_root = UUID_LE(0x2f10e7a4, 0x9e91, 0x11e4, 0x89,
+                                0xd3, 0x12, 0x3b, 0x93, 0xf7, 0x5c, 0xba);
+
+    return !memcmp(uuid, &uuid_root, sizeof(uuid_root));
+}
+
+static bool dsm_is_dimm_uuid(uint8_t *uuid)
+{
+    uuid_le uuid_dimm = UUID_LE(0x4309ac30, 0x0d11, 0x11e4, 0x91,
+                                0x91, 0x08, 0x00, 0x20, 0x0c, 0x9a, 0x66);
+
+    return !memcmp(uuid, &uuid_dimm, sizeof(uuid_dimm));
+}
+
 enum {
 NFIT_TABLE_SPA = 0,
 NFIT_TABLE_MEM = 1,
@@ -349,6 +374,23 @@ enum {
 NFIT_CMD_VENDOR = 9,
 };
 
+enum {
+NFIT_STATUS_SUCCESS = 0,
+NFIT_STATUS_NOT_SUPPORTED = 1,
+NFIT_STATUS_NON_EXISTING_MEM_DEV = 2,
+NFIT_STATUS_INVALID_PARAS = 3,
+NFIT_STATUS_VENDOR_SPECIFIC_ERROR = 4,
+};
+
+#define DSM_REVISION        (1)
+
+/* do not support any command except NFIT_CMD_ARS_CAP on root. */
+#define ROOT_SUPPORT_CMD    (1 << NFIT_CMD_ARS_CAP)
+#define DIMM_SUPPORT_CMD    ((1 << NFIT_CMD_IMPLEMENTED)        \
+                           | (1 << NFIT_CMD_GET_CONFIG_SIZE)    \
+                           | (1 << NFIT_CMD_GET_CONFIG_DATA)    \
+                           | (1 << NFIT_CMD_SET_CONFIG_DATA))
+
 struct dsm_buffer {
 /* RAM page. */
 uint32_t handle;
@@ -366,6 +408,18 @@ struct dsm_buffer {
 };
 };
 
+struct cmd_out_implemented {
+uint64_t cmd_list;
+};
+
+struct dsm_out {
+union {
+uint32_t status;
+struct cmd_out_implemented cmd_implemented;
+uint8_t data[PAGE_SIZE];
+};
+};
+
 static uint64_t dsm_read(void *opaque, hwaddr addr,
  unsigned size)
 {
@@ -374,10 +428,82 @@ static uint64_t dsm_read(void *opaque, hwaddr addr,
     return 0;
 }
 
+static void dsm_write_root(struct dsm_buffer *in, struct dsm_out *out)
+{
+    uint32_t function = in->arg2;
+
+    if (function == NFIT_CMD_IMPLEMENTED) {
+        out->cmd_implemented.cmd_list = ROOT_SUPPORT_CMD;
+        return;
+    }
+
+    out->status = NFIT_STATUS_NOT_SUPPORTED;
+    nvdebug("Return status %#x.\n", out->status);
+}
+
+static void dsm_write_nvdimm(struct dsm_buffer *in, struct dsm_out *out)
+{
+    uint32_t function = in->arg2;
+    uint32_t status;
+
+    switch (function) {
+    case NFIT_CMD_IMPLEMENTED:
+        out->cmd_implemented.cmd_list = DIMM_SUPPORT_CMD;
+        return;
+    default:
+        status = NFIT_STATUS_NOT_SUPPORTED;
+    };
+
+    nvdebug("Return status %#x.\n", status);
+    out->status = status;
+}
+
 static void dsm_write(void *opaque, hwaddr addr,
                       uint64_t val, unsigned size)
 {
+    struct MemoryRegion *dsm_ram_mr = opaque;
+    struct dsm_buffer *dsm;
+    struct dsm_out *out;
+    void *buf;
+
     assert(val == NOTIFY_VALUE);
+
+    buf = memory_region_get_ram_ptr(dsm_ram_mr);
+    dsm = buf;
+    out = buf;
+
+    nvdebug("Arg0 " UUID_FMT ".\n", dsm->arg0[0], dsm->arg0[1], dsm->arg0[2],
+            dsm->arg0[3], dsm->arg0[4], dsm->arg0[5], dsm->arg0[6],
+            dsm->arg0[7], dsm->arg0[8], dsm->arg0[9], dsm->arg0[10],
+            dsm->arg0[11], dsm->arg0[12], dsm->arg0[13], dsm->arg0[14],
+            dsm->arg0[15]);
+    nvdebug("Handler %#x, Arg1 %#x, Arg2 %#x.\n", dsm->handle, dsm->arg1,
+            dsm->arg2);
+
+    if (dsm->arg1 != DSM_REVISION) {
+        nvdebug("Revision %#x is not supported, expect %#x.\n",
+                dsm->arg1, DSM_REVISION);
+        goto exit;
+    }
+
+    if (!dsm->handle) {
+        if (!dsm_is_root_uuid(dsm->arg0)) {
+            nvdebug("Root UUID does not match.\n");
+            goto exit;
+        }
+
+        return dsm_write_root(dsm, out);
+    }
+
+    if (!dsm_is_dimm_uuid(dsm->arg0)) {
+        nvdebug("DIMM UUID does not match.\n");
+        goto exit;
+    }
+
+    return dsm_write_nvdimm(dsm, out);
+
+exit:
+    out->status = NFIT_STATUS_NOT_SUPPORTED;
 }
 
 static const

Re: [PATCH 8/9] kvm/x86: add sending hyper-v crash notification to user space

2015-07-01 Thread Paolo Bonzini



On 30/06/2015 13:33, Denis V. Lunev wrote:
 From: Andrey Smetanin asmeta...@virtuozzo.com
 
 Sending of notification is done by exiting vcpu to user space
 if KVM_REQ_HV_CRASH is enabled for vcpu. kvm_run structure
 will contain system_event with type KVM_SYSTEM_EVENT_CRASH
 and flag KVM_SYSTEM_EVENT_FL_HV_CRASH to clarify that
 crash occurs inside Hyper-V based guest.

This needs to be documented in Documentation/virtual/kvm/api.txt.  Also,
please rename KVM_SYSTEM_EVENT_FL_HV_CRASH to
KVM_SYSTEM_EVENT_FLAG_HV_CRASH and move it to
arch/x86/include/uapi/asm/kvm.h.

You do not need to send the whole series again; just resend this one patch.

Paolo

 Signed-off-by: Andrey Smetanin asmeta...@virtuozzo.com
 Signed-off-by: Denis V. Lunev d...@openvz.org
 CC: Paolo Bonzini pbonz...@redhat.com
 CC: Gleb Natapov g...@kernel.org
 ---
  arch/x86/kvm/x86.c   | 8 
  include/uapi/linux/kvm.h | 2 ++
  2 files changed, 10 insertions(+)
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 47b7507..55a4b92 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -6264,6 +6264,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
   vcpu_scan_ioapic(vcpu);
   if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
   kvm_vcpu_reload_apic_access_page(vcpu);
 + if (kvm_check_request(KVM_REQ_HV_CRASH, vcpu)) {
 + vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
 + vcpu->run->system_event.type = KVM_SYSTEM_EVENT_CRASH;
 + vcpu->run->system_event.flags =
 + KVM_SYSTEM_EVENT_FL_HV_CRASH;
 + r = 0;
 + goto out;
 + }
   }
  
   if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
 diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
 index 716ad4a..eefb8b9 100644
 --- a/include/uapi/linux/kvm.h
 +++ b/include/uapi/linux/kvm.h
 @@ -317,6 +317,8 @@ struct kvm_run {
   struct {
  #define KVM_SYSTEM_EVENT_SHUTDOWN   1
  #define KVM_SYSTEM_EVENT_RESET  2
 +#define KVM_SYSTEM_EVENT_CRASH  3
 +#define KVM_SYSTEM_EVENT_FL_HV_CRASH    (1ULL << 0)
   __u32 type;
   __u64 flags;
   } system_event;
 

Re: [PATCH 3/9] kvm: add hyper-v crash msrs values

2015-07-01 Thread Denis V. Lunev


On 01/07/15 18:00, Paolo Bonzini wrote:


On 30/06/2015 13:33, Denis V. Lunev wrote:

+#define HV_X64_MSR_CRASH_CTL_NOTIFY    (1ULL << 63)
+#define HV_X64_MSR_CRASH_CTL_CONTENTS  \
+   (HV_X64_MSR_CRASH_CTL_NOTIFY)

Why is HV_X64_MSR_CRASH_CTL_CONTENTS needed?  Can I just remove it?

Paolo

this was a direct request from Peter Hornyack peterhorny...@google.com

I suggest here:

#define HV_X64_MSR_CRASH_CTL_CONTENTS  \
(HV_CRASH_CTL_CRASH_NOTIFY)

To allow for more crash actions to be added in the future.
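
A small sketch of how such a mask might be used, assuming the NOTIFY bit is bit 63 as in the patch. The crash_ctl_sanitize() helper is hypothetical and only illustrates why a separate CONTENTS mask is handy once more bits exist:

```c
#include <assert.h>
#include <stdint.h>

#define HV_X64_MSR_CRASH_CTL_NOTIFY   (1ULL << 63)
/* All control bits currently understood; today that is only NOTIFY. */
#define HV_X64_MSR_CRASH_CTL_CONTENTS (HV_X64_MSR_CRASH_CTL_NOTIFY)

static_assert(HV_X64_MSR_CRASH_CTL_CONTENTS == 0x8000000000000000ULL,
              "only bit 63 is defined so far");

/* Hypothetical helper: keep only the bits covered by the mask. */
static uint64_t crash_ctl_sanitize(uint64_t val)
{
    return val & HV_X64_MSR_CRASH_CTL_CONTENTS;
}
```

Adding a new crash action later would then only mean OR-ing one more bit into HV_X64_MSR_CRASH_CTL_CONTENTS.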

Den

[kvm-ppc:kvm-ppc-queue 6/9] kernel/fork.c:99:0: warning: MAX_THREADS redefined

2015-07-01 Thread kbuild test robot

tree:   git://github.com/agraf/linux-2.6.git kvm-ppc-queue
head:   cc75c6b1368c88977d6015fd67b02c85ee04e57c
commit: c98d80c7b761a4b3bcbcc9314c4492f76585caa0 [6/9] KVM: PPC: Book3S HV: 
Implement dynamic micro-threading on POWER8
config: powerpc-defconfig (attached as .config)
reproduce:
  wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
  chmod +x ~/bin/make.cross
  git checkout c98d80c7b761a4b3bcbcc9314c4492f76585caa0
  # save the attached .config to linux build tree
  make.cross ARCH=powerpc 

All warnings (new ones prefixed by >>):

>> kernel/fork.c:99:0: warning: MAX_THREADS redefined
#define MAX_THREADS FUTEX_TID_MASK
^
   In file included from arch/powerpc/include/asm/paca.h:25:0,
from arch/powerpc/include/asm/hw_irq.h:42,
from arch/powerpc/include/asm/irqflags.h:11,
from include/linux/irqflags.h:15,
from include/linux/spinlock.h:53,
from include/linux/mmzone.h:7,
from include/linux/gfp.h:5,
from include/linux/slab.h:14,
from kernel/fork.c:14:
   arch/powerpc/include/asm/kvm_book3s_asm.h:29:0: note: this is the location 
of the previous definition
#define MAX_THREADS  8
^
>> kernel/fork.c:99:0: warning: MAX_THREADS redefined
#define MAX_THREADS FUTEX_TID_MASK
^
   In file included from arch/powerpc/include/asm/paca.h:25:0,
from arch/powerpc/include/asm/hw_irq.h:42,
from arch/powerpc/include/asm/irqflags.h:11,
from include/linux/irqflags.h:15,
from include/linux/spinlock.h:53,
from include/linux/mmzone.h:7,
from include/linux/gfp.h:5,
from include/linux/slab.h:14,
from kernel/fork.c:14:
   arch/powerpc/include/asm/kvm_book3s_asm.h:29:0: note: this is the location 
of the previous definition
#define MAX_THREADS  8
^

vim +/MAX_THREADS +99 kernel/fork.c

^1da177e Linus Torvalds  2005-04-16   83  #include <asm/cacheflush.h>
^1da177e Linus Torvalds  2005-04-16   84  #include <asm/tlbflush.h>
^1da177e Linus Torvalds  2005-04-16   85  
ad8d75ff Steven Rostedt  2009-04-14   86  #include <trace/events/sched.h>
ad8d75ff Steven Rostedt  2009-04-14   87  
43d2b113 KAMEZAWA Hiroyuki   2012-01-10   88  #define CREATE_TRACE_POINTS
43d2b113 KAMEZAWA Hiroyuki   2012-01-10   89  #include <trace/events/task.h>
43d2b113 KAMEZAWA Hiroyuki   2012-01-10   90  
^1da177e Linus Torvalds  2005-04-16   91  /*
ac1b398d Heinrich Schuchardt 2015-04-16   92   * Minimum number of threads to 
boot the kernel
ac1b398d Heinrich Schuchardt 2015-04-16   93   */
ac1b398d Heinrich Schuchardt 2015-04-16   94  #define MIN_THREADS 20
ac1b398d Heinrich Schuchardt 2015-04-16   95  
ac1b398d Heinrich Schuchardt 2015-04-16   96  /*
ac1b398d Heinrich Schuchardt 2015-04-16   97   * Maximum number of threads
ac1b398d Heinrich Schuchardt 2015-04-16   98   */
ac1b398d Heinrich Schuchardt 2015-04-16  @99  #define MAX_THREADS FUTEX_TID_MASK
ac1b398d Heinrich Schuchardt 2015-04-16  100  
ac1b398d Heinrich Schuchardt 2015-04-16  101  /*
^1da177e Linus Torvalds  2005-04-16  102   * Protected counters by 
write_lock_irq(tasklist_lock)
^1da177e Linus Torvalds  2005-04-16  103   */
^1da177e Linus Torvalds  2005-04-16  104  unsigned long total_forks;
/* Handle normal Linux uptimes. */
^1da177e Linus Torvalds  2005-04-16  105  int nr_threads;   
/* The idle threads do not count.. */
^1da177e Linus Torvalds  2005-04-16  106  
^1da177e Linus Torvalds  2005-04-16  107  int max_threads;  /* 
tunable limit on nr_threads */

:: The code at line 99 was first introduced by commit
:: ac1b398de1ef94aeee8ba87b0120763526572a6e kernel/fork.c: avoid division 
by zero

:: TO: Heinrich Schuchardt xypron.g...@gmx.de
:: CC: Linus Torvalds torva...@linux-foundation.org

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation
#
# Automatically generated file; DO NOT EDIT.
# Linux/powerpc 4.1.0 Kernel Configuration
#
CONFIG_PPC64=y

#
# Processor support
#
CONFIG_PPC_BOOK3S_64=y
# CONFIG_PPC_BOOK3E_64 is not set
CONFIG_GENERIC_CPU=y
# CONFIG_CELL_CPU is not set
# CONFIG_POWER4_CPU is not set
# CONFIG_POWER5_CPU is not set
# CONFIG_POWER6_CPU is not set
# CONFIG_POWER7_CPU is not set
# CONFIG_POWER8_CPU is not set
CONFIG_PPC_BOOK3S=y
# CONFIG_TUNE_CELL is not set
CONFIG_PPC_FPU=y
CONFIG_ALTIVEC=y
CONFIG_VSX=y
# CONFIG_PPC_ICSWX is not set
CONFIG_PPC_STD_MMU=y
CONFIG_PPC_STD_MMU_64=y
CONFIG_PPC_MM_SLICES=y
CONFIG_PPC_HAVE_PMU_SUPPORT=y
CONFIG_PPC_PERF_CTRS=y
CONFIG_SMP=y
CONFIG_NR_CPUS=32
CONFIG_PPC_DOORBELL=y
CONFIG_VDSO32=y
CONFIG_CPU_BIG_ENDIAN=y
# CONFIG_CPU_LITTLE_ENDIAN is not set

[PATCH 03/16] acpi: add aml_derefof

2015-07-01 Thread Xiao Guangrong

Implement the DerefOf term, which is used by the NVDIMM _DSM method in a later patch

Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
---
 hw/acpi/aml-build.c | 8 
 include/hw/acpi/aml-build.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 02f9e3d..9e89efc 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1135,6 +1135,14 @@ Aml *aml_unicode(const char *str)
 return var;
 }
 
+/* ACPI 6.0: 20.2.5.4 Type 2 Opcodes Encoding: DefDerefOf */
+Aml *aml_derefof(Aml *arg)
+{
+Aml *var = aml_opcode(0x83 /* DerefOfOp */);
+aml_append(var, arg);
+return var;
+}
+
 void
 build_header(GArray *linker, GArray *table_data,
  AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 996ac5b..21dc5e9 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -275,6 +275,7 @@ Aml *aml_create_dword_field(Aml *srcbuf, Aml *index, const 
char *name);
 Aml *aml_varpackage(uint32_t num_elements);
 Aml *aml_touuid(const char *uuid);
 Aml *aml_unicode(const char *str);
+Aml *aml_derefof(Aml *arg);
 
 void
 build_header(GArray *linker, GArray *table_data,
-- 
2.1.0


[PATCH 09/16] nvdimm: build ACPI NFIT table

2015-07-01 Thread Xiao Guangrong

NFIT is defined in ACPI 6.0: 5.2.25 NVDIMM Firmware Interface Table (NFIT)

Currently, we only support PMEM mode. Each device has 3 tables:
- SPA table, which defines the PMEM region info

- MEM DEV table, which has the @handle used to associate it with the
  specified ACPI NVDIMM device that we will introduce in a later patch.
  We can also safely ignore the memory device's interleave, since the real
  nvdimm hardware access is hidden behind the host

- DCR table, which defines the Vendor ID used to associate it with the
  specified vendor nvdimm driver. Since we only implement PMEM mode this
  time, the Command window and Data window are not needed

Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
---
 hw/i386/acpi-build.c   |   3 +
 hw/mem/pc-nvdimm.c | 286 +
 include/hw/mem/pc-nvdimm.h |   8 ++
 3 files changed, 297 insertions(+)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 6a1ab09..80c21be 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -39,6 +39,7 @@
 #include "hw/loader.h"
 #include "hw/isa/isa.h"
 #include "hw/acpi/memory_hotplug.h"
+#include "hw/mem/pc-nvdimm.h"
 #include "sysemu/tpm.h"
 #include "hw/acpi/tpm.h"
 #include "sysemu/tpm_backend.h"
@@ -1741,6 +1742,8 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables 
*tables)
         build_dmar_q35(tables_blob, tables->linker);
     }
 
+    pc_nvdimm_build_nfit_table(table_offsets, tables_blob, tables->linker);
+
     /* Add tables supplied by user (if any) */
     for (u = acpi_table_first(); u; u = acpi_table_next(u)) {
         unsigned len = acpi_table_len(u);
diff --git a/hw/mem/pc-nvdimm.c b/hw/mem/pc-nvdimm.c
index 9531935..e7cff29 100644
--- a/hw/mem/pc-nvdimm.c
+++ b/hw/mem/pc-nvdimm.c
@@ -27,10 +27,12 @@
 #include <linux/fs.h>
 
 #include "exec/address-spaces.h"
+#include "hw/acpi/aml-build.h"
 #include "hw/mem/pc-nvdimm.h"
 
 #define PAGE_SIZE   (1UL << 12)
 
+#define MAX_NVDIMM_NUMBER   (10)
 #define MIN_CONFIG_DATA_SIZE    (128 << 10)
 
 static struct nvdimms_info {
@@ -65,6 +67,290 @@ static uint32_t new_device_index(void)
 return nvdimms_info.device_index++;
 }
 
+static int pc_nvdimm_built_list(Object *obj, void *opaque)
+{
+    GSList **list = opaque;
+
+    if (object_dynamic_cast(obj, TYPE_PC_NVDIMM)) {
+        PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
+
+        /* only realized NVDIMMs matter */
+        if (memory_region_size(&nvdimm->mr)) {
+            *list = g_slist_append(*list, nvdimm);
+        }
+    }
+
+    object_child_foreach(obj, pc_nvdimm_built_list, opaque);
+    return 0;
+}
+
+static GSList *get_nvdimm_built_list(void)
+{
+    GSList *list = NULL;
+
+    object_child_foreach(qdev_get_machine(), pc_nvdimm_built_list, &list);
+    return list;
+}
+
+static int get_nvdimm_device_number(GSList *list)
+{
+    int nr = 0;
+
+    for (; list; list = list->next) {
+        nr++;
+    }
+
+    return nr;
+}
+
+static uint32_t nvdimm_index_to_sn(int index)
+{
+    return 0x123456 + index;
+}
+
+static uint32_t nvdimm_index_to_handle(int index)
+{
+    return index + 1;
+}
+
+typedef struct {
+    uint8_t b[16];
+} uuid_le;
+
+#define UUID_LE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)                   \
+((uuid_le)                                                                 \
+{ { (a) & 0xff, ((a) >> 8) & 0xff, ((a) >> 16) & 0xff, ((a) >> 24) & 0xff, \
+    (b) & 0xff, ((b) >> 8) & 0xff, (c) & 0xff, ((c) >> 8) & 0xff,          \
+    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } })
+
+static void nfit_spa_uuid_pm(void *uuid)
+{
+    uuid_le uuid_pm = UUID_LE(0x66f0d379, 0xb4f3, 0x4074, 0xac, 0x43, 0x0d,
+                              0x33, 0x18, 0xb7, 0x8c, 0xdb);
+    memcpy(uuid, &uuid_pm, sizeof(uuid_pm));
+}
+
+enum {
+NFIT_TABLE_SPA = 0,
+NFIT_TABLE_MEM = 1,
+NFIT_TABLE_IDT = 2,
+NFIT_TABLE_SMBIOS = 3,
+NFIT_TABLE_DCR = 4,
+NFIT_TABLE_BDW = 5,
+NFIT_TABLE_FLUSH = 6,
+};
+
+enum {
+EFI_MEMORY_UC = 0x1ULL,
+EFI_MEMORY_WC = 0x2ULL,
+EFI_MEMORY_WT = 0x4ULL,
+EFI_MEMORY_WB = 0x8ULL,
+EFI_MEMORY_UCE = 0x10ULL,
+EFI_MEMORY_WP = 0x1000ULL,
+EFI_MEMORY_RP = 0x2000ULL,
+EFI_MEMORY_XP = 0x4000ULL,
+EFI_MEMORY_NV = 0x8000ULL,
+EFI_MEMORY_MORE_RELIABLE = 0x10000ULL,
+};
+
+/*
+ * struct nfit - Nvdimm Firmware Interface Table
+ * @signature: NFIT
+ */
+struct nfit {
+ACPI_TABLE_HEADER_DEF
+uint32_t reserved;
+} QEMU_PACKED;
+
+/*
+ * struct nfit_spa - System Physical Address Range Structure
+ */
+struct nfit_spa {
+uint16_t type;
+uint16_t length;
+uint16_t spa_index;
+uint16_t flags;
+uint32_t reserved;
+uint32_t proximity_domain;
+uint8_t type_uuid[16];
+uint64_t spa_base;
+uint64_t spa_length;
+uint64_t mem_attr;
+} QEMU_PACKED;
+
+/*
+ * struct nfit_memdev - Memory Device to SPA Map Structure
+ */
+struct nfit_memdev {
+uint16_t type;
+uint16_t length;
+uint32_t nfit_handle;
+uint16_t phys_id;
+uint16_t region_id;
+uint16_t spa_index;
+uint16_t

[PATCH 00/16] implement vNVDIMM

2015-07-01 Thread Xiao Guangrong

== Background ==
NVDIMM (Non-Volatile Dual In-line Memory Module) is going to be supported
on Intel's platform. NVDIMMs are discovered via ACPI and configured by the
_DSM method of the NVDIMM device in ACPI. Supporting documents can be
found at:
ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
Driver Writer's Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf

The NVDIMM driver has been merged into the upstream Linux kernel, and
this patchset tries to enable it for virtualization.

== Design ==
NVDIMM supports two access modes. One is PMEM, which maps the NVDIMM into
the CPU's address space so that the CPU can access it directly as normal
memory. The other is BLK, which is used as a block device to reduce the
amount of CPU address space occupied.

BLK mode accesses NVDIMM via Command Register window and Data Register window.
BLK virtualization has high overhead since each sector access causes at
least two VM-EXITs. So we currently only implement vPMEM in this patchset.

--- vPMEM design ---
We introduce a new device named pc-nvdimm. It has one parameter, file, which
is the file-backed memory passed to the guest. The file can be a regular file
or a block device. Any file can be used for testing or emulation; however,
in the real world, the files passed to the guest are:
- a regular file in a DAX-enabled filesystem created on an NVDIMM device
  on the host
- a raw PMEM device on the host, e.g. /dev/pmem0
Memory accesses to the addresses created by mmap on these kinds of files
reach the NVDIMM device on the host directly.
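
As a minimal sketch of the mapping step (not the code from this series), a shared, writable mmap of the backing file could look like the following. The map_backing_file() name is hypothetical; the point is MAP_SHARED, so that stores reach the file instead of a private copy:

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map the first `len` bytes of `path` read-write and shared.
 * Returns NULL on failure. */
static void *map_backing_file(const char *path, size_t len)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR);
    if (fd < 0) {
        return NULL;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}
```

The same call works for a regular file or a block device such as /dev/pmem0, provided the caller already knows the size to map.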

--- vConfigure data area design ---
Each NVDIMM device has a configure data area which is used to store label
namespace data. In order to emulate this area, we divide the file into two
parts:
- the first part is [0, size - 128K), which is used as PMEM
- the last 128K of the file, which is used as the Config Data Area
This way the label namespace data persists across power loss or system
failure.

--- _DSM method design ---
_DSM in ACPI is used to configure NVDIMM. Currently we only allow access to
label namespace data, i.e. Get Namespace Label Size (Function Index 4),
Get Namespace Label Data (Function Index 5) and Set Namespace Label Data
(Function Index 6)

_DSM uses two pages to transfer data between ACPI and Qemu. The first page
is RAM-based: it holds the input of the _DSM method, and Qemu reuses it to
store the output. The other page is MMIO-based: ACPI writes to this page
to transfer control to Qemu.

We use the address region above 4G to map these pages because there is a
large amount of free space above 4G, and this avoids overlapping with PCI
and other components with reserved addresses (e.g. HPET). This is also the
reason we choose MMIO notification instead of PIO.

== Test ==
In host
1) create memory backed file, e.g. # dd if=/dev/zero of=/tmp/nvdimm bs=1G count=10
2) append '-device pc-nvdimm,file=/tmp/nvdimm' in Qemu command line

In guest, download the latest upstream kernel (4.2 merge window) and enable
ACPI_NFIT, LIBNVDIMM and BLK_DEV_PMEM.
1) insmod drivers/nvdimm/libnvdimm.ko
2) insmod drivers/acpi/nfit.ko
3) insmod drivers/nvdimm/nd_btt.ko
4) insmod drivers/nvdimm/nd_pmem.ko
You can see the whole nvdimm device used as a single namespace and /dev/pmem0
appears. You can do anything on /dev/pmem0, including DAX access.

Currently the Linux NVDIMM driver does not support namespace operations on this
kind of PMEM. Apply the changes below to support dynamic namespaces:

@@ -798,7 +823,8 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *a
continue;
}
 
-   if (nfit_mem->bdw && nfit_mem->memdev_pmem)
+   //if (nfit_mem->bdw && nfit_mem->memdev_pmem)
+   if (nfit_mem->memdev_pmem)
flags |= NDD_ALIASING;

You can append another NVDIMM device in guest and do:
# cd /sys/bus/nd/devices/
# cd namespace1.0/
# echo `uuidgen` > uuid
# echo `expr 1024 \* 1024 \* 128` > size
then reload nd_pmem.ko

You can see /dev/pmem1 appears

== TODO ==
1) NVDIMM NUMA support
2) NVDIMM hotplug support

Xiao Guangrong (16):
  acpi: allow aml_operation_region() working on 64 bit offset
  i386/acpi-build: allow SSDT to operate on 64 bit
  acpi: add aml_derefof
  acpi: add aml_sizeof
  acpi: add aml_create_field
  pc: implement NVDIMM device abstract
  nvdimm: reserve address range for NVDIMM
  nvdimm: init backend memory mapping and config data area
  nvdimm: build ACPI NFIT table
  nvdimm: init the address region used by _DSM method
  nvdimm: build ACPI nvdimm devices
  nvdimm: save arg3 for NVDIMM device _DSM method
  nvdimm: support NFIT_CMD_IMPLEMENTED function
  nvdimm: support NFIT_CMD_GET_CONFIG_SIZE function
  nvdimm: support NFIT_CMD_GET_CONFIG_DATA
  nvdimm: support NFIT_CMD_SET_CONFIG_DATA

[PATCH 08/16] nvdimm: init backend memory mapping and config data area

2015-07-01 Thread Xiao Guangrong

The parameter @file is used as backing memory for the NVDIMM. It is
divided into two parts:
- the first part is [0, size - 128K), which is used as PMEM (Persistent
  Memory)
- the last 128K of the file, which is used as the Config Data Area; it
  stores Label namespace data

The @file supports both regular files and block devices. Either kind of
file can be used for testing and emulation; however, in the real world,
for performance reasons, these files are usually used as the NVDIMM
backing file:
- a regular file in a DAX-enabled filesystem created on an NVDIMM
  device on the host
- a raw PMEM device on the host, e.g. /dev/pmem0

Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
---
 hw/mem/pc-nvdimm.c | 102 -
 include/hw/mem/pc-nvdimm.h |   5 +++
 2 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/hw/mem/pc-nvdimm.c b/hw/mem/pc-nvdimm.c
index b40d4e7..9531935 100644
--- a/hw/mem/pc-nvdimm.c
+++ b/hw/mem/pc-nvdimm.c
@@ -22,12 +22,20 @@
  * License along with this library; if not, see http://www.gnu.org/licenses/
  */
 
+#include <sys/mman.h>
+#include <sys/ioctl.h>
+#include <linux/fs.h>
+
+#include "exec/address-spaces.h"
 #include "hw/mem/pc-nvdimm.h"
 
-#define PAGE_SIZE  (1UL << 12)
+#define PAGE_SIZE   (1UL << 12)
+
+#define MIN_CONFIG_DATA_SIZE    (128 << 10)
 
 static struct nvdimms_info {
     ram_addr_t current_addr;
+    int device_index;
 } nvdimms_info;
 
 /* the address range [offset, ~0ULL) is reserved for NVDIMM. */
@@ -37,6 +45,26 @@ void pc_nvdimm_reserve_range(ram_addr_t offset)
     nvdimms_info.current_addr = offset;
 }
 
+static ram_addr_t reserved_range_push(uint64_t size)
+{
+    uint64_t current;
+
+    current = ROUND_UP(nvdimms_info.current_addr, PAGE_SIZE);
+
+    /* do not have enough space? */
+    if (current + size < current) {
+        return 0;
+    }
+
+    nvdimms_info.current_addr = current + size;
+    return current;
+}
+
+static uint32_t new_device_index(void)
+{
+    return nvdimms_info.device_index++;
+}
+
 static char *get_file(Object *obj, Error **errp)
 {
 PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
@@ -48,6 +76,11 @@ static void set_file(Object *obj, const char *str, Error 
**errp)
 {
     PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
 
+    if (memory_region_size(&nvdimm->mr)) {
+        error_setg(errp, "cannot change property value");
+        return;
+    }
+
     if (nvdimm->file) {
         g_free(nvdimm->file);
     }
@@ -60,13 +93,80 @@ static void pc_nvdimm_init(Object *obj)
     object_property_add_str(obj, "file", get_file, set_file, NULL);
 }
 
+static uint64_t get_file_size(int fd)
+{
+    struct stat stat_buf;
+    uint64_t size;
+
+    if (fstat(fd, &stat_buf) < 0) {
+        return 0;
+    }
+
+    if (S_ISREG(stat_buf.st_mode)) {
+        return stat_buf.st_size;
+    }
+
+    if (S_ISBLK(stat_buf.st_mode) && !ioctl(fd, BLKGETSIZE64, &size)) {
+        return size;
+    }
+
+    return 0;
+}
+
 static void pc_nvdimm_realize(DeviceState *dev, Error **errp)
 {
     PCNVDIMMDevice *nvdimm = PC_NVDIMM(dev);
+    char name[512];
+    void *buf;
+    ram_addr_t addr;
+    uint64_t size;
+    int fd;
 
     if (!nvdimm->file) {
         error_setg(errp, "file property is not set");
     }
+
+    fd = open(nvdimm->file, O_RDWR);
+    if (fd < 0) {
+        error_setg(errp, "can not open %s", nvdimm->file);
+        return;
+    }
+
+    /* reserve MIN_CONFIG_DATA_SIZE for configure data */
+    size = get_file_size(fd) - MIN_CONFIG_DATA_SIZE;
+    if ((int64_t)size <= 0) {
+        error_setg(errp, "file size is too small to store NVDIMM"
+                         " configure data");
+        goto do_close;
+    }
+
+    buf = mmap(NULL, size + MIN_CONFIG_DATA_SIZE, PROT_READ | PROT_WRITE,
+               MAP_SHARED, fd, 0);
+    if (buf == MAP_FAILED) {
+        error_setg(errp, "can not do mmap on %s", nvdimm->file);
+        goto do_close;
+    }
+
+    addr = reserved_range_push(size);
+    if (!addr) {
+        error_setg(errp, "do not have enough space for size %#lx.\n", size);
+        goto do_unmap;
+    }
+
+    nvdimm->device_index = new_device_index();
+    sprintf(name, "NVDIMM-%d", nvdimm->device_index);
+    memory_region_init_ram_ptr(&nvdimm->mr, OBJECT(dev), name, size, buf);
+    vmstate_register_ram(&nvdimm->mr, DEVICE(dev));
+    memory_region_add_subregion(get_system_memory(), addr, &nvdimm->mr);
+
+    nvdimm->config_data_addr = buf + size;
+    nvdimm->config_data_size = MIN_CONFIG_DATA_SIZE;
+
+    return;
+do_unmap:
+    munmap(buf, size);
+do_close:
+    close(fd);
 }
 
 static void pc_nvdimm_class_init(ObjectClass *oc, void *data)
diff --git a/include/hw/mem/pc-nvdimm.h b/include/hw/mem/pc-nvdimm.h
index 2081e7c..e743ed1 100644
--- a/include/hw/mem/pc-nvdimm.h
+++ b/include/hw/mem/pc-nvdimm.h
@@ -21,6 +21,11 @@ typedef struct PCNVDIMMDevice {
 DeviceState parent_obj;
 
 char *file;
+void *config_data_addr;
+uint64_t config_data_size;
+
+int device_index;
+

[PATCH 06/16] pc: implement NVDIMM device abstract

2015-07-01 Thread Xiao Guangrong

Introduce the pc-nvdimm device. It has only one parameter, @file, which
is the backing memory file for the NVDIMM device.

We can use -device pc-nvdimm,file=/dev/pmem in the Qemu command line to
create an NVDIMM device for the guest.

Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
---
 hw/mem/Makefile.objs   |  1 +
 hw/mem/pc-nvdimm.c | 83 ++
 include/hw/mem/pc-nvdimm.h | 32 ++
 3 files changed, 116 insertions(+)
 create mode 100644 hw/mem/pc-nvdimm.c
 create mode 100644 include/hw/mem/pc-nvdimm.h

diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
index b000fb4..9a7f5a9 100644
--- a/hw/mem/Makefile.objs
+++ b/hw/mem/Makefile.objs
@@ -1 +1,2 @@
 common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
+common-obj-$(CONFIG_LINUX) += pc-nvdimm.o
diff --git a/hw/mem/pc-nvdimm.c b/hw/mem/pc-nvdimm.c
new file mode 100644
index 000..0209ea9
--- /dev/null
+++ b/hw/mem/pc-nvdimm.c
@@ -0,0 +1,83 @@
+/*
+ * NVDIMM (A Non-Volatile Dual In-line Memory Module) Virtualization Implement
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ *  Xiao Guangrong guangrong.x...@linux.intel.com
+ *
+ * Currently, it only supports PMEM Virtualization.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see http://www.gnu.org/licenses/
+ */
+
+#include "hw/mem/pc-nvdimm.h"
+
+static char *get_file(Object *obj, Error **errp)
+{
+PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
+
+return g_strdup(nvdimm->file);
+}
+
+static void set_file(Object *obj, const char *str, Error **errp)
+{
+PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
+
+if (nvdimm->file) {
+g_free(nvdimm->file);
+}
+
+nvdimm->file = g_strdup(str);
+}
+
+static void pc_nvdimm_init(Object *obj)
+{
+object_property_add_str(obj, "file", get_file, set_file, NULL);
+}
+
+static void pc_nvdimm_realize(DeviceState *dev, Error **errp)
+{
+PCNVDIMMDevice *nvdimm = PC_NVDIMM(dev);
+
+if (!nvdimm->file) {
+error_setg(errp, "file property is not set");
+}
+}
+
+static void pc_nvdimm_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+
+/* nvdimm hotplug is not supported yet. */
+dc->hotpluggable = false;
+
+dc->realize = pc_nvdimm_realize;
+dc->desc = "NVDIMM memory module";
+}
+
+static TypeInfo pc_nvdimm_info = {
+.name  = TYPE_PC_NVDIMM,
+.parent= TYPE_DEVICE,
+.instance_size = sizeof(PCNVDIMMDevice),
+.instance_init = pc_nvdimm_init,
+.class_init= pc_nvdimm_class_init,
+};
+
+static void pc_nvdimm_register_types(void)
+{
+type_register_static(&pc_nvdimm_info);
+}
+
+type_init(pc_nvdimm_register_types)
diff --git a/include/hw/mem/pc-nvdimm.h b/include/hw/mem/pc-nvdimm.h
new file mode 100644
index 000..7f37b46
--- /dev/null
+++ b/include/hw/mem/pc-nvdimm.h
@@ -0,0 +1,32 @@
+/*
+ * NVDIMM (A Non-Volatile Dual In-line Memory Module) Virtualization Implement
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ *  Xiao Guangrong guangrong.x...@linux.intel.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef __PC_NVDIMM_H
+#define __PC_NVDIMM_H
+
+#include "hw/qdev.h"
+
+#ifdef CONFIG_LINUX
+typedef struct PCNVDIMMDevice {
+/* private */
+DeviceState parent_obj;
+
+char *file;
+} PCNVDIMMDevice;
+
+#define TYPE_PC_NVDIMM "pc-nvdimm"
+
+#define PC_NVDIMM(obj) \
+OBJECT_CHECK(PCNVDIMMDevice, (obj), TYPE_PC_NVDIMM)
+#else  /* !CONFIG_LINUX */
+#endif
+#endif
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 02/16] i386/acpi-build: allow SSDT to operate on 64 bit

2015-07-01 Thread Xiao Guangrong

Only 512M is left for MMIO below 4G, and that space is used by PCI, BIOS, etc.
Other components also reserve regions for their internal usage, e.g.
[0xFED00000, 0xFED00000 + 0x400) is reserved for HPET.

Switch SSDT to 64 bit to use the large free space above 4G. In later
patches, we will dynamically allocate free space within this region for
use by the NVDIMM _DSM method.
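
The effect of the revision bump can be pictured with a small sketch. As far as we understand the ACPI specification, a definition block with a compliance revision below 2 is interpreted with 32-bit integers, while revision 2 or above gets 64-bit integers. The helper names below are hypothetical and are not part of QEMU.

```c
#include <stdint.h>

/* Integer width implied by a definition block's compliance revision
 * (assumption: revision < 2 means 32-bit integers, otherwise 64-bit). */
static unsigned aml_integer_bits(uint8_t revision)
{
    return revision < 2 ? 32 : 64;
}

/* Truncate a value the way an interpreter using that width would. */
static uint64_t aml_integer_truncate(uint64_t value, uint8_t revision)
{
    if (aml_integer_bits(revision) == 32) {
        return value & 0xFFFFFFFFULL;
    }
    return value;
}
```

With revision 1, an address just above 4G would lose its upper bits, which is why addresses in the region above 4G need the revision 2 tables.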

Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
---
 hw/i386/acpi-build.c  | 4 ++--
 hw/i386/acpi-dsdt.dsl | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 00818b9..6a1ab09 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1348,7 +1348,7 @@ build_ssdt(GArray *table_data, GArray *linker,
 g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
 build_header(linker, table_data,
 (void *)(table_data->data + table_data->len - ssdt->buf->len),
-"SSDT", ssdt->buf->len, 1);
+"SSDT", ssdt->buf->len, 2);
 free_aml_allocator();
 }
 
@@ -1586,7 +1586,7 @@ build_dsdt(GArray *table_data, GArray *linker, 
AcpiMiscInfo *misc)
 
 memset(dsdt, 0, sizeof *dsdt);
 build_header(linker, table_data, dsdt, "DSDT",
- misc->dsdt_size, 1);
+ misc->dsdt_size, 2);
 }
 
 static GArray *
diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl
index a2d84ec..5cd3f0e 100644
--- a/hw/i386/acpi-dsdt.dsl
+++ b/hw/i386/acpi-dsdt.dsl
@@ -22,7 +22,7 @@ ACPI_EXTRACT_ALL_CODE AcpiDsdtAmlCode
 DefinitionBlock (
 "acpi-dsdt.aml",// Output Filename
 "DSDT", // Signature
-0x01,   // DSDT Compliance Revision
+0x02,   // DSDT Compliance Revision
 "BXPC", // OEMID
 "BXDSDT",   // TABLE ID
 0x1 // OEM Revision
-- 
2.1.0

[PATCH 05/16] acpi: add aml_create_field

2015-07-01 Thread Xiao Guangrong

Implement the CreateField term, which is used by the NVDIMM _DSM method in a later patch.
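
To make the byte layout concrete, here is a minimal standalone sketch of the encoding that aml_create_field() produces: the two opcode bytes, followed by the source buffer, bit index, bit count and the field name. The struct and function names are hypothetical, the operands are assumed to be already AML-encoded, and the name is assumed to be a single 4-character NameSeg.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Opcode bytes as used by aml_create_field() in this patch. */
#define EXT_OP_PREFIX   0x5B
#define CREATE_FIELD_OP 0x13

/* Hypothetical container for an operand that is already AML-encoded. */
struct encoded_term {
    const uint8_t *bytes;
    size_t len;
};

/*
 * Hypothetical helper: writes ExtOpPrefix, CreateFieldOp, the three
 * operands and a 4-character name into out. Returns the number of
 * bytes written, or 0 if out is too small.
 */
static size_t encode_create_field(uint8_t *out, size_t cap,
                                  struct encoded_term src,
                                  struct encoded_term bit_index,
                                  struct encoded_term num_bits,
                                  const char name[4])
{
    size_t need = 2 + src.len + bit_index.len + num_bits.len + 4;
    size_t pos = 0;

    if (need > cap) {
        return 0;
    }
    out[pos++] = EXT_OP_PREFIX;
    out[pos++] = CREATE_FIELD_OP;
    memcpy(out + pos, src.bytes, src.len);
    pos += src.len;
    memcpy(out + pos, bit_index.bytes, bit_index.len);
    pos += bit_index.len;
    memcpy(out + pos, num_bits.bytes, num_bits.len);
    pos += num_bits.len;
    memcpy(out + pos, name, 4);
    pos += 4;
    return pos;
}
```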

Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
---
 hw/acpi/aml-build.c | 14 ++
 include/hw/acpi/aml-build.h |  1 +
 2 files changed, 15 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index a526eed..debdad2 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1151,6 +1151,20 @@ Aml *aml_sizeof(Aml *arg)
 return var;
 }
 
+/* ACPI 6.0: 20.2.5.2 Named Objects Encoding: DefCreateField */
+Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name)
+{
+Aml *var = aml_alloc();
+
+build_append_byte(var->buf, 0x5B); /* ExtOpPrefix */
+build_append_byte(var->buf, 0x13); /* CreateFieldOp */
+aml_append(var, srcbuf);
+aml_append(var, index);
+aml_append(var, len);
+build_append_namestring(var->buf, "%s", name);
+return var;
+}
+
 void
 build_header(GArray *linker, GArray *table_data,
  AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 6b591ab..d4dbd44 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -277,6 +277,7 @@ Aml *aml_touuid(const char *uuid);
 Aml *aml_unicode(const char *str);
 Aml *aml_derefof(Aml *arg);
 Aml *aml_sizeof(Aml *arg);
+Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name);
 
 void
 build_header(GArray *linker, GArray *table_data,
-- 
2.1.0

[PATCH 04/16] acpi: add aml_sizeof

2015-07-01 Thread Xiao Guangrong

Implement the SizeOf term, which is used by the NVDIMM _DSM method in a later patch.

Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
---
 hw/acpi/aml-build.c | 8 
 include/hw/acpi/aml-build.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 9e89efc..a526eed 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1143,6 +1143,14 @@ Aml *aml_derefof(Aml *arg)
 return var;
 }
 
+/* ACPI 6.0: 20.2.5.4 Type 2 Opcodes Encoding: DefSizeOf */
+Aml *aml_sizeof(Aml *arg)
+{
+Aml *var = aml_opcode(0x87 /* SizeOfOp */);
+aml_append(var, arg);
+return var;
+}
+
 void
 build_header(GArray *linker, GArray *table_data,
  AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 21dc5e9..6b591ab 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -276,6 +276,7 @@ Aml *aml_varpackage(uint32_t num_elements);
 Aml *aml_touuid(const char *uuid);
 Aml *aml_unicode(const char *str);
 Aml *aml_derefof(Aml *arg);
+Aml *aml_sizeof(Aml *arg);
 
 void
 build_header(GArray *linker, GArray *table_data,
-- 
2.1.0

[PATCH 01/16] acpi: allow aml_operation_region() working on 64 bit offset

2015-07-01 Thread Xiao Guangrong

Currently, the offset in OperationRegion is limited to 32 bits. Extend it
to 64 bits so that we can switch SSDT to 64 bit in a later patch.
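
A short sketch of why the wider type matters: an address above 4G silently loses its upper bits when passed through a uint32_t parameter. The helper names here are hypothetical and only illustrate the truncation.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when an address cannot be represented in a 32-bit offset. */
static bool offset_needs_64bit(uint64_t offset)
{
    return offset > UINT32_MAX;
}

/* What a uint32_t offset parameter would receive for this address. */
static uint32_t offset_as_u32(uint64_t offset)
{
    return (uint32_t)offset;
}
```

For example, an offset of 4G + 0x1000 would arrive as 0x1000 through the old prototype.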

Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
---
 hw/acpi/aml-build.c | 2 +-
 include/hw/acpi/aml-build.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 0d4b324..02f9e3d 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -752,7 +752,7 @@ Aml *aml_package(uint8_t num_elements)
 
 /* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefOpRegion */
 Aml *aml_operation_region(const char *name, AmlRegionSpace rs,
-  uint32_t offset, uint32_t len)
+  uint64_t offset, uint32_t len)
 {
 Aml *var = aml_alloc();
 build_append_byte(var->buf, 0x5B); /* ExtOpPrefix */
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index e3afa13..996ac5b 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -222,7 +222,7 @@ Aml *aml_interrupt(AmlConsumerAndProducer con_and_pro,
 Aml *aml_io(AmlIODecode dec, uint16_t min_base, uint16_t max_base,
 uint8_t aln, uint8_t len);
 Aml *aml_operation_region(const char *name, AmlRegionSpace rs,
-  uint32_t offset, uint32_t len);
+  uint64_t offset, uint32_t len);
 Aml *aml_irq_no_flags(uint8_t irq);
 Aml *aml_named_field(const char *name, unsigned length);
 Aml *aml_reserved_field(unsigned length);
-- 
2.1.0

[PATCH 11/16] nvdimm: build ACPI nvdimm devices

2015-07-01 Thread Xiao Guangrong

NVDIMM devices are defined in ACPI 6.0, section 9.20 NVDIMM Devices.

There is a root device under \_SB, and the individual NVDIMM devices are
placed under that root device. Each NVDIMM device has an _ADR that returns
its handle, which is used to associate it with the MEMDEV table in NFIT.

We reserve handle 0 for the root device. In this patch, we save the handle,
arg0, arg1 and arg2. Arg3 is conditionally saved in a later patch.
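
The handle numbering described above can be sketched as follows. The forward mapping (index + 1) matches nvdimm_index_to_handle() in this series; the reverse helper and the ROOT_HANDLE name are hypothetical additions for illustration.

```c
#include <stdint.h>

/* Handle 0 is reserved for the root device. */
#define ROOT_HANDLE 0u

/* Device index N gets handle N + 1, so it never collides with the root. */
static uint32_t index_to_handle(int index)
{
    return (uint32_t)index + 1;
}

/* Hypothetical reverse mapping: returns -1 for the root handle. */
static int handle_to_index(uint32_t handle)
{
    return handle == ROOT_HANDLE ? -1 : (int)(handle - 1);
}
```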

Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
---
 hw/i386/acpi-build.c   |   2 +
 hw/mem/pc-nvdimm.c | 126 +
 include/hw/mem/pc-nvdimm.h |   6 +++
 3 files changed, 134 insertions(+)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 80c21be..85c7226 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1342,6 +1342,8 @@ build_ssdt(GArray *table_data, GArray *linker,
 aml_append(sb_scope, scope);
 }
 }
+
+pc_nvdimm_build_acpi_devices(sb_scope);
 aml_append(ssdt, sb_scope);
 }
 
diff --git a/hw/mem/pc-nvdimm.c b/hw/mem/pc-nvdimm.c
index 4c290cb..0e2a9d5 100644
--- a/hw/mem/pc-nvdimm.c
+++ b/hw/mem/pc-nvdimm.c
@@ -32,6 +32,7 @@
 
 #define PAGE_SIZE   (1UL << 12)
 
+#define NOTIFY_VALUE (0x99)
 #define MAX_NVDIMM_NUMBER   (10)
 #define MIN_CONFIG_DATA_SIZE (128 << 10)
 
@@ -348,12 +349,15 @@ struct dsm_buffer {
 static uint64_t dsm_read(void *opaque, hwaddr addr,
  unsigned size)
 {
+fprintf(stderr, "BUG: we never read DSM notification MMIO.\n");
+assert(0);
 return 0;
 }
 
 static void dsm_write(void *opaque, hwaddr addr,
   uint64_t val, unsigned size)
 {
+assert(val == NOTIFY_VALUE);
 }
 
 static const MemoryRegionOps dsm_ops = {
@@ -429,6 +433,128 @@ exit:
 g_slist_free(list);
 }
 
+#define BUILD_STA_METHOD(_dev_, _method_)  \
+do {   \
+_method_ = aml_method("_STA", 0);  \
+aml_append(_method_, aml_return(aml_int(0x0f)));   \
+aml_append(_dev_, _method_);   \
+} while (0)
+
+#define SAVE_ARG012_HANDLE(_method_, _handle_) \
+do {   \
+aml_append(_method_, aml_store(_handle_, aml_name("HDLE")));   \
+aml_append(_method_, aml_store(aml_arg(0), aml_name("ARG0"))); \
+aml_append(_method_, aml_store(aml_arg(1), aml_name("ARG1"))); \
+aml_append(_method_, aml_store(aml_arg(2), aml_name("ARG2"))); \
+} while (0)
+
+#define NOTIFY_AND_RETURN(_method_)\
+do {   \
+aml_append(_method_, aml_store(aml_int(NOTIFY_VALUE),  \
+   aml_name("NOTI"))); \
+aml_append(_method_, aml_return(aml_name("ODAT")));\
+} while (0)
+
+static void build_nvdimm_devices(Aml *root_dev, GSList *list)
+{
+for (; list; list = list->next) {
+PCNVDIMMDevice *nvdimm = list->data;
+uint32_t handle = nvdimm_index_to_handle(nvdimm->device_index);
+Aml *dev, *method;
+
+dev = aml_device("NVD%d", nvdimm->device_index);
+aml_append(dev, aml_name_decl("_ADR", aml_int(handle)));
+
+BUILD_STA_METHOD(dev, method);
+
+method = aml_method("_DSM", 4);
+{
+SAVE_ARG012_HANDLE(method, aml_int(handle));
+NOTIFY_AND_RETURN(method);
+}
+aml_append(dev, method);
+
+aml_append(root_dev, dev);
+}
+}
+
+void pc_nvdimm_build_acpi_devices(Aml *sb_scope)
+{
+Aml *dev, *method, *field;
+struct dsm_buffer *dsm_buf;
+GSList *list = get_nvdimm_built_list();
+int nr = get_nvdimm_device_number(list);
+
+if (nr <= 0 || nr > MAX_NVDIMM_NUMBER) {
+g_slist_free(list);
+return;
+}
+
+dev = aml_device("NVDR");
+aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0012")));
+
+/* map DSM buffer into ACPI namespace. */
+aml_append(dev, aml_operation_region("DSMR", AML_SYSTEM_MEMORY,
+   nvdimms_info.dsm_addr, nvdimms_info.dsm_size));
+
+/*
+ * DSM input:
+ * @HDLE: store device's handle, it's zero if the _DSM call happens
+ *on ROOT.
+ * @ARG0 ~ @ARG3: store the parameters of _DSM call.
+ *
+ * They are RAM-mapped on the host, so these accesses never cause a VM-EXIT.
+ */
+field = aml_field("DSMR", AML_DWORD_ACC, AML_PRESERVE);
+aml_append(field, aml_named_field("HDLE",
+   sizeof(dsm_buf->handle) * BITS_PER_BYTE));
+aml_append(field, aml_named_field("ARG0",
+   sizeof(dsm_buf->arg0) * BITS_PER_BYTE));
+aml_append(field, aml_named_field("ARG1",
+   sizeof(dsm_buf->arg1) *

[PATCH 07/16] nvdimm: reserve address range for NVDIMM

2015-07-01 Thread Xiao Guangrong

NVDIMM reserves all the free range above 4G in order to:
- map Persistent Memory (PMEM)
- implement the NVDIMM ACPI device _DSM method
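
One way to picture how a reserved range like this can be handed out is a page-aligned bump allocator. The sketch below is an assumption about the general approach, not the series' actual reserved_range_push() implementation; all names in it are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE (1ULL << 12)
#define SKETCH_ROUND_UP(n, d) ((((n) + (d) - 1) / (d)) * (d))

/* Hypothetical state: next free guest-physical address in the range. */
struct reserved_range {
    uint64_t current_addr;
};

/* Start handing out addresses at the first page boundary >= offset. */
static void range_init(struct reserved_range *r, uint64_t offset)
{
    r->current_addr = SKETCH_ROUND_UP(offset, SKETCH_PAGE_SIZE);
}

/*
 * Reserve size bytes (rounded up to whole pages). Returns the start
 * address, or 0 if the request is empty or would wrap around.
 */
static uint64_t range_push(struct reserved_range *r, uint64_t size)
{
    uint64_t start = r->current_addr;
    uint64_t len;

    if (size == 0 || size > UINT64_MAX - (SKETCH_PAGE_SIZE - 1)) {
        return 0;
    }
    len = SKETCH_ROUND_UP(size, SKETCH_PAGE_SIZE);
    if (len > UINT64_MAX - start) {
        return 0;
    }
    r->current_addr = start + len;
    return start;
}
```

Returning 0 on failure mirrors how callers in this series test the result with `if (!addr)`.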

Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
---
 hw/i386/pc.c   | 11 +--
 hw/mem/pc-nvdimm.c | 13 +
 include/hw/mem/pc-nvdimm.h |  5 +
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 7072930..82e80a9 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -64,6 +64,7 @@
 #include "hw/pci/pci_host.h"
 #include "acpi-build.h"
 #include "hw/mem/pc-dimm.h"
+#include "hw/mem/pc-nvdimm.h"
 #include "trace.h"
 #include "qapi/visitor.h"
 #include "qapi-visit.h"
@@ -1241,6 +1242,7 @@ FWCfgState *pc_memory_init(MachineState *machine,
 MemoryRegion *ram_below_4g, *ram_above_4g;
 FWCfgState *fw_cfg;
 PCMachineState *pcms = PC_MACHINE(machine);
+ram_addr_t offset;
 
 assert(machine-ram_size == below_4g_mem_size + above_4g_mem_size);
 
@@ -1278,6 +1280,8 @@ FWCfgState *pc_memory_init(MachineState *machine,
 exit(EXIT_FAILURE);
 }
 
+offset = 0x100000000ULL + above_4g_mem_size;
+
 /* initialize hotplug memory address space */
 if (guest_info->has_reserved_memory &&
 (machine->ram_size < machine->maxram_size)) {
@@ -1297,8 +1301,7 @@ FWCfgState *pc_memory_init(MachineState *machine,
 exit(EXIT_FAILURE);
 }
 
-pcms->hotplug_memory_base =
-ROUND_UP(0x100000000ULL + above_4g_mem_size, 1ULL << 30);
+pcms->hotplug_memory_base = ROUND_UP(offset, 1ULL << 30);
 
 if (pcms-enforce_aligned_dimm) {
 /* size hotplug region assuming 1G page max alignment per slot */
@@ -1316,8 +1319,12 @@ FWCfgState *pc_memory_init(MachineState *machine,
 "hotplug-memory", hotplug_mem_size);
 memory_region_add_subregion(system_memory, pcms->hotplug_memory_base,
 &pcms->hotplug_memory);
+offset = pcms->hotplug_memory_base + hotplug_mem_size;
 }
 
+/* all the space left above 4G is reserved for NVDIMM. */
+pc_nvdimm_reserve_range(offset);
+
 /* Initialize PC system firmware */
 pc_system_firmware_init(rom_memory, guest_info-isapc_ram_fw);
 
diff --git a/hw/mem/pc-nvdimm.c b/hw/mem/pc-nvdimm.c
index 0209ea9..b40d4e7 100644
--- a/hw/mem/pc-nvdimm.c
+++ b/hw/mem/pc-nvdimm.c
@@ -24,6 +24,19 @@
 
 #include "hw/mem/pc-nvdimm.h"
 
+#define PAGE_SIZE  (1UL << 12)
+
+static struct nvdimms_info {
+ram_addr_t current_addr;
+} nvdimms_info;
+
+/* the address range [offset, ~0ULL) is reserved for NVDIMM. */
+void pc_nvdimm_reserve_range(ram_addr_t offset)
+{
+offset = ROUND_UP(offset, PAGE_SIZE);
+nvdimms_info.current_addr = offset;
+}
+
 static char *get_file(Object *obj, Error **errp)
 {
 PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
diff --git a/include/hw/mem/pc-nvdimm.h b/include/hw/mem/pc-nvdimm.h
index 7f37b46..2081e7c 100644
--- a/include/hw/mem/pc-nvdimm.h
+++ b/include/hw/mem/pc-nvdimm.h
@@ -27,6 +27,11 @@ typedef struct PCNVDIMMDevice {
 
 #define PC_NVDIMM(obj) \
 OBJECT_CHECK(PCNVDIMMDevice, (obj), TYPE_PC_NVDIMM)
+
+void pc_nvdimm_reserve_range(ram_addr_t offset);
 #else  /* !CONFIG_LINUX */
+static inline void pc_nvdimm_reserve_range(ram_addr_t offset)
+{
+}
 #endif
 #endif
-- 
2.1.0

[PATCH 10/16] nvdimm: init the address region used by _DSM method

2015-07-01 Thread Xiao Guangrong

This memory range is used to transfer data between ACPI in the guest and
QEMU. It occupies two pages:
- one is RAM-based; it saves the input info of the _DSM method, and QEMU
  reuses it to store the output info

- the other is MMIO-based; ACPI writes data to this page to transfer
  control to QEMU
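
The two-page layout can be checked at compile time. The sketch below mirrors the field list of struct dsm_buffer from this patch under a different name and assumes a 4096-byte page; the _Static_assert lines are an illustration of the same invariant the patch guards with QEMU_BUILD_BUG_ON.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096

/* Mirror of the patch's struct dsm_buffer, renamed for this sketch. */
struct dsm_buffer_sketch {
    /* RAM page: _DSM input, reused for the output. */
    uint32_t handle;
    uint8_t  arg0[16];
    uint32_t arg1;
    uint32_t arg2;
    char     arg3[SKETCH_PAGE_SIZE - 3 * sizeof(uint32_t) - 16];

    /* MMIO page: a write here hands control to the emulator. */
    union {
        uint32_t notify;
        char     padding[SKETCH_PAGE_SIZE];
    };
};

/* The RAM part must fill exactly one page and the whole buffer two. */
_Static_assert(offsetof(struct dsm_buffer_sketch, notify) == SKETCH_PAGE_SIZE,
               "notify must start the second page");
_Static_assert(sizeof(struct dsm_buffer_sketch) == 2 * SKETCH_PAGE_SIZE,
               "buffer must span exactly two pages");
```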

Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
---
 hw/mem/pc-nvdimm.c | 80 +-
 1 file changed, 79 insertions(+), 1 deletion(-)

diff --git a/hw/mem/pc-nvdimm.c b/hw/mem/pc-nvdimm.c
index e7cff29..4c290cb 100644
--- a/hw/mem/pc-nvdimm.c
+++ b/hw/mem/pc-nvdimm.c
@@ -37,6 +37,10 @@
 
 static struct nvdimms_info {
 ram_addr_t current_addr;
+
+ram_addr_t dsm_addr;
+int dsm_size;
+
 int device_index;
 } nvdimms_info;
 
@@ -324,14 +328,88 @@ static void build_nfit_table(GSList *device_list, char 
*buf)
 }
 }
 
+struct dsm_buffer {
+/* RAM page. */
+uint32_t handle;
+uint8_t arg0[16];
+uint32_t arg1;
+uint32_t arg2;
+union {
+char arg3[PAGE_SIZE - 3 * sizeof(uint32_t) - 16 * sizeof(uint8_t)];
+};
+
+/* MMIO page. */
+union {
+uint32_t notify;
+char pedding[PAGE_SIZE];
+};
+};
+
+static uint64_t dsm_read(void *opaque, hwaddr addr,
+ unsigned size)
+{
+return 0;
+}
+
+static void dsm_write(void *opaque, hwaddr addr,
+  uint64_t val, unsigned size)
+{
+}
+
+static const MemoryRegionOps dsm_ops = {
+.read = dsm_read,
+.write = dsm_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+static int build_dsm_buffer(void)
+{
+MemoryRegion *dsm_ram_mr, *dsm_mmio_mr;
+ram_addr_t addr;;
+
+QEMU_BUILD_BUG_ON(PAGE_SIZE * 2 != sizeof(struct dsm_buffer));
+
+/* DSM buffer has already been built. */
+if (nvdimms_info.dsm_addr) {
+return 0;
+}
+
+addr = reserved_range_push(2 * PAGE_SIZE);
+if (!addr) {
+return -1;
+}
+
+nvdimms_info.dsm_addr = addr;
+nvdimms_info.dsm_size = PAGE_SIZE * 2;
+
+dsm_ram_mr = g_new(MemoryRegion, 1);
+memory_region_init_ram(dsm_ram_mr, NULL, "dsm_ram", PAGE_SIZE,
+   &error_abort);
+vmstate_register_ram_global(dsm_ram_mr);
+memory_region_add_subregion(get_system_memory(), addr, dsm_ram_mr);
+
+dsm_mmio_mr = g_new(MemoryRegion, 1);
+memory_region_init_io(dsm_mmio_mr, NULL, &dsm_ops, dsm_ram_mr,
+  "dsm_mmio", PAGE_SIZE);
+memory_region_add_subregion(get_system_memory(), addr + PAGE_SIZE,
+dsm_mmio_mr);
+return 0;
+}
+
 void pc_nvdimm_build_nfit_table(GArray *table_offsets, GArray *table_data,
 GArray *linker)
 {
-GSList *list = get_nvdimm_built_list();
+GSList *list;
 size_t total;
 char *buf;
 int nfit_start, nr;
 
+if (build_dsm_buffer()) {
+fprintf(stderr, "do not have enough space for DSM buffer.\n");
+return;
+}
+
+list = get_nvdimm_built_list();
 nr = get_nvdimm_device_number(list);
 total = get_nfit_total_size(nr);
 
-- 
2.1.0

[PATCH 16/16] nvdimm: support NFIT_CMD_SET_CONFIG_DATA

2015-07-01 Thread Xiao Guangrong

Function 6 is used to set Namespace Label Data
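
Since the guest supplies both offset and length, the range check before the copy deserves care. The sketch below shows one overflow-safe way to write it, widening to 64 bits before adding; it is an illustration with a hypothetical function name, not the check used in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True when [offset, offset + length) lies inside a config area of
 * config_size bytes. Widening before the addition keeps the sum from
 * wrapping around when both inputs are 32-bit values.
 */
static bool config_access_ok(uint64_t config_size,
                             uint32_t offset, uint32_t length)
{
    uint64_t end = (uint64_t)offset + length;

    return end <= config_size;
}
```

If the sum were evaluated in 32 bits, as the uint32_t field types suggest, a large offset plus a small length could wrap to a small value and pass the check.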

Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
---
 hw/mem/pc-nvdimm.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/hw/mem/pc-nvdimm.c b/hw/mem/pc-nvdimm.c
index 0498de3..0d2d9fb 100644
--- a/hw/mem/pc-nvdimm.c
+++ b/hw/mem/pc-nvdimm.c
@@ -450,12 +450,17 @@ struct cmd_out_get_config_data {
 uint8_t out_buf[0];
 } QEMU_PACKED;
 
+struct cmd_out_set_config_data {
+uint32_t status;
+} QEMU_PACKED;
+
 struct dsm_out {
 union {
 uint32_t status;
 struct cmd_out_implemented cmd_implemented;
 struct cmd_out_get_config_size cmd_config_size;
 struct cmd_out_get_config_data cmd_config_get;
+struct cmd_out_set_config_data cmd_config_set;
 uint8_t data[PAGE_SIZE];
 };
 };
@@ -555,6 +560,35 @@ exit:
 return status;
 }
 
+static uint32_t dsm_cmd_config_set(struct dsm_buffer *in, struct dsm_out *out)
+{
+GSList *list = get_nvdimm_built_list();
+PCNVDIMMDevice *nvdimm = get_nvdimm_device_by_handle(list, in->handle);
+struct cmd_in_set_config_data *cmd_in = &in->cmd_config_set;
+uint32_t status = NFIT_STATUS_NON_EXISTING_MEM_DEV;
+
+if (!nvdimm) {
+goto exit;
+}
+
+nvdebug("Write Config: offset %#x length %#x.\n", cmd_in->offset,
+cmd_in->length);
+if (nvdimm->config_data_size < cmd_in->length + cmd_in->offset) {
+nvdebug("position %#x is beyond config data (len = %#lx).\n",
+cmd_in->length + cmd_in->offset, nvdimm->config_data_size);
+status = NFIT_STATUS_INVALID_PARAS;
+goto exit;
+}
+
+status = NFIT_STATUS_SUCCESS;
+memcpy(nvdimm->config_data_addr + cmd_in->offset, cmd_in->in_buf,
+   cmd_in->length);
+
+exit:
+g_slist_free(list);
+return status;
+}
+
 static void dsm_write_nvdimm(struct dsm_buffer *in, struct dsm_out *out)
 {
 uint32_t function = in-arg2;
@@ -570,6 +604,9 @@ static void dsm_write_nvdimm(struct dsm_buffer *in, struct 
dsm_out *out)
 case NFIT_CMD_GET_CONFIG_DATA:
 status = dsm_cmd_config_get(in, out);
 break;
+case NFIT_CMD_SET_CONFIG_DATA:
+status = dsm_cmd_config_set(in, out);
+break;
 default:
 status = NFIT_STATUS_NOT_SUPPORTED;
 };
-- 
2.1.0

[PATCH 15/16] nvdimm: support NFIT_CMD_GET_CONFIG_DATA

2015-07-01 Thread Xiao Guangrong

Function 5 is used to get Namespace Label Data

Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
---
 hw/mem/pc-nvdimm.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/hw/mem/pc-nvdimm.c b/hw/mem/pc-nvdimm.c
index 7e5446c..0498de3 100644
--- a/hw/mem/pc-nvdimm.c
+++ b/hw/mem/pc-nvdimm.c
@@ -423,6 +423,7 @@ struct dsm_buffer {
 uint32_t arg1;
 uint32_t arg2;
 union {
+struct cmd_in_get_config_data cmd_config_get;
 struct cmd_in_set_config_data cmd_config_set;
 char arg3[PAGE_SIZE - 3 * sizeof(uint32_t) - 16 * sizeof(uint8_t)];
 };
@@ -525,6 +526,35 @@ exit:
 return status;
 }
 
+static uint32_t dsm_cmd_config_get(struct dsm_buffer *in, struct dsm_out *out)
+{
+GSList *list = get_nvdimm_built_list();
+PCNVDIMMDevice *nvdimm = get_nvdimm_device_by_handle(list, in->handle);
+struct cmd_in_get_config_data *cmd_in = &in->cmd_config_get;
+uint32_t status = NFIT_STATUS_NON_EXISTING_MEM_DEV;
+
+if (!nvdimm) {
+goto exit;
+}
+
+nvdebug("Read Config: offset %#x length %#x.\n", cmd_in->offset,
+cmd_in->length);
+if (nvdimm->config_data_size < cmd_in->length + cmd_in->offset) {
+nvdebug("position %#x is beyond config data (len = %#lx).\n",
+cmd_in->length + cmd_in->offset, nvdimm->config_data_size);
+status = NFIT_STATUS_INVALID_PARAS;
+goto exit;
+}
+
+status = NFIT_STATUS_SUCCESS;
+memcpy(out->cmd_config_get.out_buf, nvdimm->config_data_addr +
+   cmd_in->offset, cmd_in->length);
+
+exit:
+g_slist_free(list);
+return status;
+}
+
 static void dsm_write_nvdimm(struct dsm_buffer *in, struct dsm_out *out)
 {
 uint32_t function = in-arg2;
@@ -537,6 +567,9 @@ static void dsm_write_nvdimm(struct dsm_buffer *in, struct 
dsm_out *out)
 case NFIT_CMD_GET_CONFIG_SIZE:
 status = dsm_cmd_config_size(in, out);
 break;
+case NFIT_CMD_GET_CONFIG_DATA:
+status = dsm_cmd_config_get(in, out);
+break;
 default:
 status = NFIT_STATUS_NOT_SUPPORTED;
 };
-- 
2.1.0

[PATCH 12/16] nvdimm: save arg3 for NVDIMM device _DSM method

2015-07-01 Thread Xiao Guangrong

Check whether the function (Arg2) has additional input info (Arg3) and save
the info if needed.

We only do the save on NVDIMM devices since we are not going to support any
function on the root device.
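
The per-function decision amounts to a table lookup indexed by the function number. The table contents below follow device_cmd_has_arg3[] in this patch; the bounds-checked accessor is a hypothetical C-side illustration (the patch itself performs the lookup in generated AML).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Indexed by per-DIMM function number, as in the patch's table. */
static const bool device_cmd_has_arg3[] = {
    false,  /* 0: NFIT_CMD_IMPLEMENTED */
    false,  /* 1: NFIT_CMD_SMART */
    false,  /* 2: NFIT_CMD_SMART_THRESHOLD */
    false,  /* 3: NFIT_CMD_DIMM_FLAGS */
    false,  /* 4: NFIT_CMD_GET_CONFIG_SIZE */
    true,   /* 5: NFIT_CMD_GET_CONFIG_DATA */
    true,   /* 6: NFIT_CMD_SET_CONFIG_DATA */
    false,  /* 7: NFIT_CMD_VENDOR_EFFECT_LOG_SIZE */
    false,  /* 8: NFIT_CMD_VENDOR_EFFECT_LOG */
    false,  /* 9: NFIT_CMD_VENDOR */
};

/* Hypothetical accessor: unknown function numbers carry no Arg3. */
static bool cmd_has_arg3(uint32_t function)
{
    size_t n = sizeof(device_cmd_has_arg3) / sizeof(device_cmd_has_arg3[0]);

    return function < n && device_cmd_has_arg3[function];
}
```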

Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
---
 hw/mem/pc-nvdimm.c | 73 +-
 1 file changed, 72 insertions(+), 1 deletion(-)

diff --git a/hw/mem/pc-nvdimm.c b/hw/mem/pc-nvdimm.c
index 0e2a9d5..c0965ae 100644
--- a/hw/mem/pc-nvdimm.c
+++ b/hw/mem/pc-nvdimm.c
@@ -329,6 +329,26 @@ static void build_nfit_table(GSList *device_list, char 
*buf)
 }
 }
 
+enum {
+NFIT_CMD_IMPLEMENTED = 0,
+
+/* bus commands */
+NFIT_CMD_ARS_CAP = 1,
+NFIT_CMD_ARS_START = 2,
+NFIT_CMD_ARS_QUERY = 3,
+
+/* per-dimm commands */
+NFIT_CMD_SMART = 1,
+NFIT_CMD_SMART_THRESHOLD = 2,
+NFIT_CMD_DIMM_FLAGS = 3,
+NFIT_CMD_GET_CONFIG_SIZE = 4,
+NFIT_CMD_GET_CONFIG_DATA = 5,
+NFIT_CMD_SET_CONFIG_DATA = 6,
+NFIT_CMD_VENDOR_EFFECT_LOG_SIZE = 7,
+NFIT_CMD_VENDOR_EFFECT_LOG = 8,
+NFIT_CMD_VENDOR = 9,
+};
+
 struct dsm_buffer {
 /* RAM page. */
 uint32_t handle;
@@ -433,6 +453,19 @@ exit:
 g_slist_free(list);
 }
 
+static bool device_cmd_has_arg3[] = {
+false,  /* NFIT_CMD_IMPLEMENTED */
+false,  /* NFIT_CMD_SMART */
+false,  /* NFIT_CMD_SMART_THRESHOLD */
+false,  /* NFIT_CMD_DIMM_FLAGS */
+false,  /* NFIT_CMD_GET_CONFIG_SIZE */
+true,   /* NFIT_CMD_GET_CONFIG_DATA */
+true,   /* NFIT_CMD_SET_CONFIG_DATA */
+false,  /* NFIT_CMD_VENDOR_EFFECT_LOG_SIZE */
+false,  /* NFIT_CMD_VENDOR_EFFECT_LOG */
+false,  /* NFIT_CMD_VENDOR */
+};
+
 #define BUILD_STA_METHOD(_dev_, _method_)  \
 do {   \
 _method_ = aml_method(_STA, 0);  \
@@ -457,10 +490,20 @@ exit:
 
 static void build_nvdimm_devices(Aml *root_dev, GSList *list)
 {
+Aml *has_arg3;
+int i, cmd_nr;
+
+cmd_nr = ARRAY_SIZE(device_cmd_has_arg3);
+has_arg3 = aml_package(cmd_nr);
+for (i = 0; i < cmd_nr; i++) {
+aml_append(has_arg3, aml_int(device_cmd_has_arg3[i]));
+}
+aml_append(root_dev, aml_name_decl("CAG3", has_arg3));
+
 for (; list; list = list-next) {
 PCNVDIMMDevice *nvdimm = list-data;
 uint32_t handle = nvdimm_index_to_handle(nvdimm-device_index);
-Aml *dev, *method;
+Aml *dev, *method, *ifctx;
 
 dev = aml_device(NVD%d, nvdimm-device_index);
 aml_append(dev, aml_name_decl(_ADR, aml_int(handle)));
@@ -470,6 +513,34 @@ static void build_nvdimm_devices(Aml *root_dev, GSList 
*list)
 method = aml_method(_DSM, 4);
 {
 SAVE_ARG012_HANDLE(method, aml_int(handle));
+
+/* Local5 = DeRefOf(Index(CAG3, Arg2)) */
+aml_append(method,
+   aml_store(aml_derefof(aml_index(aml_name("CAG3"),
+   aml_arg(2))), aml_local(5)));
+/* if 0 < local5 */
+ifctx = aml_if(aml_lless(aml_int(0), aml_local(5)));
+{
+/* Local0 = Index(Arg3, 0) */
+aml_append(ifctx, aml_store(aml_index(aml_arg(3), aml_int(0)),
+   aml_local(0)));
+/* Local1 = sizeof(Local0) */
+aml_append(ifctx, aml_store(aml_sizeof(aml_local(0)),
+   aml_local(1)));
+/* Local2 = Local1 << 3 */
+aml_append(ifctx, aml_store(aml_shiftleft(aml_local(1),
+   aml_int(3)), aml_local(2)));
+/* Local3 = DeRefOf(Local0) */
+aml_append(ifctx, aml_store(aml_derefof(aml_local(0)),
+   aml_local(3)));
+/* CreateField(Local3, 0, local2, IBUF) */
+aml_append(ifctx, aml_create_field(aml_local(3),
+   aml_int(0), aml_local(2), "IBUF"));
+/* ARG3 = IBUF */
+aml_append(ifctx, aml_store(aml_name("IBUF"),
+   aml_name("ARG3")));
+}
+aml_append(method, ifctx);
 NOTIFY_AND_RETURN(method);
 }
 aml_append(dev, method);
-- 
2.1.0

[PATCH 14/16] nvdimm: support NFIT_CMD_GET_CONFIG_SIZE function

2015-07-01 Thread Xiao Guangrong

Function 4 is used to get the Namespace label size
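
The max_xfer value reported by this function follows from the buffer layout used in the series. The arithmetic below is a sketch assuming a 4096-byte page and the packed struct sizes shown in the patch (a 4-byte status before the read payload, 8 bytes of offset and length before the write payload); the macro and function names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* arg3 area: one page minus handle, arg1, arg2 (4 bytes each) and arg0 (16). */
#define SKETCH_ARG3_SIZE (SKETCH_PAGE_SIZE - 3u * 4u - 16u)

/* Header bytes that precede the payload in each direction. */
#define SKETCH_GET_OUT_HEADER 4u  /* status */
#define SKETCH_SET_IN_HEADER  8u  /* offset + length */

/* Largest payload that fits both a read response and a write request. */
static uint32_t max_xfer_sketch(void)
{
    uint32_t max_get = SKETCH_PAGE_SIZE - SKETCH_GET_OUT_HEADER;
    uint32_t max_set = SKETCH_ARG3_SIZE - SKETCH_SET_IN_HEADER;

    return max_get < max_set ? max_get : max_set;
}
```

Under these assumptions the write direction is the tighter bound, because the input page also has to carry the handle and Arg0 to Arg2.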

Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
---
 hw/mem/pc-nvdimm.c | 87 ++
 1 file changed, 87 insertions(+)

diff --git a/hw/mem/pc-nvdimm.c b/hw/mem/pc-nvdimm.c
index b586bf7..7e5446c 100644
--- a/hw/mem/pc-nvdimm.c
+++ b/hw/mem/pc-nvdimm.c
@@ -127,6 +127,20 @@ static uint32_t nvdimm_index_to_handle(int index)
 return index + 1;
 }
 
+static PCNVDIMMDevice
+*get_nvdimm_device_by_handle(GSList *list, uint32_t handle)
+{
+for (; list; list = list->next) {
+PCNVDIMMDevice *nvdimm = list->data;
+
+if (nvdimm_index_to_handle(nvdimm->device_index) == handle) {
+return nvdimm;
+}
+}
+
+return NULL;
+}
+
 typedef struct {
 uint8_t b[16];
 } uuid_le;
@@ -391,6 +405,17 @@ enum {
 | (1 << NFIT_CMD_GET_CONFIG_DATA)\
 | (1 << NFIT_CMD_SET_CONFIG_DATA))
 
+struct cmd_in_get_config_data {
+uint32_t offset;
+uint32_t length;
+} QEMU_PACKED;
+
+struct cmd_in_set_config_data {
+uint32_t offset;
+uint32_t length;
+uint8_t in_buf[0];
+} QEMU_PACKED;
+
 struct dsm_buffer {
 /* RAM page. */
 uint32_t handle;
@@ -398,6 +423,7 @@ struct dsm_buffer {
 uint32_t arg1;
 uint32_t arg2;
 union {
+struct cmd_in_set_config_data cmd_config_set;
 char arg3[PAGE_SIZE - 3 * sizeof(uint32_t) - 16 * sizeof(uint8_t)];
 };
 
@@ -412,10 +438,23 @@ struct cmd_out_implemented {
 uint64_t cmd_list;
 };
 
+struct cmd_out_get_config_size {
+uint32_t status;
+uint32_t config_size;
+uint32_t max_xfer;
+} QEMU_PACKED;
+
+struct cmd_out_get_config_data {
+uint32_t status;
+uint8_t out_buf[0];
+} QEMU_PACKED;
+
 struct dsm_out {
 union {
 uint32_t status;
 struct cmd_out_implemented cmd_implemented;
+struct cmd_out_get_config_size cmd_config_size;
+struct cmd_out_get_config_data cmd_config_get;
 uint8_t data[PAGE_SIZE];
 };
 };
@@ -441,6 +480,51 @@ static void dsm_write_root(struct dsm_buffer *in, struct 
dsm_out *out)
 nvdebug(Return status %#x.\n, out-status);
 }
 
+/*
+ * the max transfer size is the largest payload that both a
+ * NFIT_CMD_GET_CONFIG_DATA and a NFIT_CMD_SET_CONFIG_DATA
+ * command can carry.
+ */
+static uint32_t max_xfer_config_size(void)
+{
+struct dsm_buffer *in;
+struct dsm_out *out;
+uint32_t max_get_size, max_set_size;
+
+/*
+ * the max data ACPI can read at one time, which is transferred by
+ * the response of NFIT_CMD_GET_CONFIG_DATA.
+ */
+max_get_size = sizeof(out->data) - sizeof(out->cmd_config_get);
+
+/*
+ * the max data ACPI can write at one time, which is transferred by
+ * NFIT_CMD_SET_CONFIG_DATA
+ */
+max_set_size = sizeof(in->arg3) - sizeof(in->cmd_config_set);
+return MIN(max_get_size, max_set_size);
+}
+
+static uint32_t dsm_cmd_config_size(struct dsm_buffer *in, struct dsm_out *out)
+{
+GSList *list = get_nvdimm_built_list();
+PCNVDIMMDevice *nvdimm = get_nvdimm_device_by_handle(list, in-handle);
+uint32_t status = NFIT_STATUS_NON_EXISTING_MEM_DEV;
+
+if (!nvdimm) {
+goto exit;
+}
+
+status = NFIT_STATUS_SUCCESS;
+out->cmd_config_size.config_size = nvdimm->config_data_size;
+out->cmd_config_size.max_xfer = max_xfer_config_size();
+nvdebug("%s config_size %#x, max_xfer %#x.\n", __func__,
+out->cmd_config_size.config_size, out->cmd_config_size.max_xfer);
+exit:
+g_slist_free(list);
+return status;
+}
+
 static void dsm_write_nvdimm(struct dsm_buffer *in, struct dsm_out *out)
 {
 uint32_t function = in->arg2;
@@ -450,6 +534,9 @@ static void dsm_write_nvdimm(struct dsm_buffer *in, struct dsm_out *out)
 case NFIT_CMD_IMPLEMENTED:
 out->cmd_implemented.cmd_list = DIMM_SUPPORT_CMD;
 return;
+case NFIT_CMD_GET_CONFIG_SIZE:
+status = dsm_cmd_config_size(in, out);
+break;
 default:
 status = NFIT_STATUS_NOT_SUPPORTED;
 };
-- 
2.1.0
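
The transfer-size calculation in max_xfer_config_size() can be checked by hand. The following is a minimal sketch, not part of the patch: it assumes a 4 KiB page and uses hypothetical sketch_* names for the fixed headers that precede each payload (offset/length for a set, status for a get).

```c
#include <stdint.h>

/* Assumption: 4 KiB pages; the patch itself only uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-ins for the fixed headers in front of the payload. */
struct sketch_set_hdr { uint32_t offset; uint32_t length; };
struct sketch_get_hdr { uint32_t status; };

/* Size of the arg3 area, following the expression used in struct dsm_buffer. */
#define SKETCH_ARG3_SIZE \
    (SKETCH_PAGE_SIZE - 3 * sizeof(uint32_t) - 16 * sizeof(uint8_t))

/* Largest payload that fits in both the read and the write direction. */
static uint32_t sketch_max_xfer(void)
{
    uint32_t max_get = (uint32_t)(SKETCH_PAGE_SIZE - sizeof(struct sketch_get_hdr));
    uint32_t max_set = (uint32_t)(SKETCH_ARG3_SIZE - sizeof(struct sketch_set_hdr));

    return max_get < max_set ? max_get : max_set;
}
```

Under these assumptions the write direction is the limiting one: 4096 - 12 - 16 - 8 = 4060 bytes, against 4092 bytes for a read.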

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 9/9] qemu/kvm: kvm hyper-v based guest crash event handling

2015-07-01 Thread Paolo Bonzini



On 30/06/2015 13:33, Denis V. Lunev wrote:
 
 +static int kvm_arch_handle_hv_crash(CPUState *cs)
 +{
 +X86CPU *cpu = X86_CPU(cs);
 +CPUX86State *env = &cpu->env;
 +
 +/* Mark that Hyper-v guest crash occurred */
 +env->hv_crash_occurred = 1;

This need not be a hv crash.  You can add crash_occurred to CPUState
directly, and set it in qemu_system_guest_panicked:

    if (current_cpu) {
        current_cpu->crash_occurred = true;
    }

Then you would add two subsections: one for crash_occurred in exec.c
(attached to vmstate_cpu_common), one for hyperv crash params in
target-i386/machine.c.

This also gives an idea about splitting the patch: first the
introduction of qemu_system_guest_panicked and crash_occurred, second
the Hyper-V specific bits.

 +if (cpu->hyperv_crash) {
 +c->edx |= HV_X64_GUEST_CRASH_MSR_AVAILABLE;
 +has_msr_hv_crash = true;

You can only set this to true if the kernel also supports the MSRs.

 +}
 +
  c = &cpuid_data.entries[cpuid_i++];
  c->function = HYPERV_CPUID_ENLIGHTMENT_INFO;
  if (cpu->hyperv_relaxed_timing) {
 @@ -761,6 +767,10 @@ void kvm_arch_reset_vcpu(X86CPU *cpu)
  } else {
  env->mp_state = KVM_MP_STATE_RUNNABLE;
  }
 +if (has_msr_hv_crash) {
 +env->msr_hv_crash_ctl = HV_X64_MSR_CRASH_CTL_NOTIFY;

The value is always host-defined, so I think it doesn't need a field in
CPUX86State.  On the other hand, this:


+static bool hyperv_crash_enable_needed(void *opaque)
+{
+X86CPU *cpu = opaque;
+CPUX86State *env = &cpu->env;
+
+return (env->msr_hv_crash_ctl & HV_X64_MSR_CRASH_CTL_CONTENTS) ?
+true : false;
+}
+

can just check if any of the params fields is nonzero.

Thanks,

Paolo

 +env->hv_crash_occurred = 0;
 +}
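
The suggestion to have the "needed" check look at the params fields could be sketched as follows. This is not code from the series: the struct, the field name and the count of five parameter MSRs are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: five crash parameter MSRs; the count is illustrative. */
#define SKETCH_CRASH_PARAMS 5

/* Hypothetical stand-in for the relevant slice of the CPU state. */
struct sketch_cpu_state {
    uint64_t msr_hv_crash_params[SKETCH_CRASH_PARAMS];
};

/* The subsection only needs to be migrated if the guest wrote something. */
static bool sketch_crash_params_needed(const struct sketch_cpu_state *env)
{
    for (size_t i = 0; i < SKETCH_CRASH_PARAMS; i++) {
        if (env->msr_hv_crash_params[i] != 0) {
            return true;
        }
    }
    return false;
}
```

With a check of this shape the control MSR value does not have to be stored in the CPU state at all, since it is host-defined.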

Re: [PATCH v3 0/9] HyperV equivalent of pvpanic driver

2015-07-01 Thread Denis V. Lunev


On 01/07/15 17:09, Paolo Bonzini wrote:


On 30/06/2015 13:33, Denis V. Lunev wrote:

Windows 2012 guests can notify the hypervisor of a guest crash
(a Windows bugcheck, i.e. a BSOD) by writing to specific Hyper-V MSRs.
This series makes KVM handle these MSRs and send a notification to user
space, which allows QEMU/libvirt to gather a Windows guest crash dump.

The idea is to provide functionality equal to pvpanic device without
QEMU guest agent for Windows.

The idea is borrowed from Linux HyperV bus driver and validated against
Windows 2k12.

Changes from v2:
* forbid modification crash ctl msr by guest
* qemu_system_guest_panicked usage in pvpanic and s390x
* hyper-v crash handler move from generic kvm to i386
* hyper-v crash handler: skip fetching crash msrs, just mark crash occurred
* sync with linux-next 20150629
* patch 11 squashed to patch 10
* patch 9 squashed to patch 7

Changes from v1:
* hyperv code move to hyperv.c
* added read handlers of crash data msrs
* added per vm and per cpu hyperv context structures
* added saving crash msrs inside qemu cpu state
* added qemu fetch and update of crash msrs
* added qemu crash msrs store in cpu state and its migration

Signed-off-by: Andrey Smetanin asmeta...@virtuozzo.com
Signed-off-by: Denis V. Lunev d...@openvz.org
CC: Gleb Natapov g...@kernel.org
CC: Paolo Bonzini pbonz...@redhat.com

The patches look good, thanks.  I'll queue them as soon as I start
merging 4.3 features.

Paolo

That sounds good to me. We'll re-send patch 8 and fork a
second thread for the QEMU part then.

Den

Re: [PATCH 3/9] kvm: add hyper-v crash msrs values

2015-07-01 Thread Paolo Bonzini



On 30/06/2015 13:33, Denis V. Lunev wrote:
 +#define HV_X64_MSR_CRASH_CTL_NOTIFY  (1ULL << 63)
 +#define HV_X64_MSR_CRASH_CTL_CONTENTS\
 + (HV_X64_MSR_CRASH_CTL_NOTIFY)

Why is HV_X64_MSR_CRASH_CTL_CONTENTS needed?  Can I just remove it?

Paolo

Re: [PATCH] MAINTAINERS: separate section for s390 virtio drivers

2015-07-01 Thread Paolo Bonzini



On 01/07/2015 17:15, Cornelia Huck wrote:
 The s390-specific virtio drivers have probably more to do with virtio
 than with kvm today; let's move them out into a separate section to
 reflect this and to be able to add relevant mailing lists.
 
 CC: Christian Borntraeger borntrae...@de.ibm.com
 Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
 ---
  MAINTAINERS | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)
 
 diff --git a/MAINTAINERS b/MAINTAINERS
 index 246d9d8..fca5c00 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
 @@ -5766,7 +5766,6 @@ S:  Supported
  F:   Documentation/s390/kvm.txt
  F:   arch/s390/include/asm/kvm*
  F:   arch/s390/kvm/
 -F:   drivers/s390/kvm/
  
  KERNEL VIRTUAL MACHINE (KVM) FOR ARM
  M:   Christoffer Dall christoffer.d...@linaro.org
 @@ -10671,6 +10670,15 @@ F:   drivers/block/virtio_blk.c
  F:   include/linux/virtio_*.h
  F:   include/uapi/linux/virtio_*.h
  
 +VIRTIO DRIVERS FOR S390
 +M:   Christian Borntraeger borntrae...@de.ibm.com
 +M:   Cornelia Huck cornelia.h...@de.ibm.com
 +L:   linux-s...@vger.kernel.org
 +L:   virtualizat...@lists.linux-foundation.org
 +L:   kvm@vger.kernel.org

Keeping the KVM mailing list is probably a good idea.

 +S:   Supported
 +F:   drivers/s390/kvm/

Since we are at it, do we want to rename the directory to
drivers/s390/virtio?  Anyway:

Acked-by: Paolo Bonzini pbonz...@redhat.com

Paolo

  VIRTIO HOST (VHOST)
  M:   Michael S. Tsirkin m...@redhat.com
  L:   kvm@vger.kernel.org
 

[PATCH] MAINTAINERS: separate section for s390 virtio drivers

2015-07-01 Thread Cornelia Huck

The s390-specific virtio drivers have probably more to do with virtio
than with kvm today; let's move them out into a separate section to
reflect this and to be able to add relevant mailing lists.

CC: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 MAINTAINERS | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 246d9d8..fca5c00 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5766,7 +5766,6 @@ S:Supported
 F: Documentation/s390/kvm.txt
 F: arch/s390/include/asm/kvm*
 F: arch/s390/kvm/
-F: drivers/s390/kvm/
 
 KERNEL VIRTUAL MACHINE (KVM) FOR ARM
 M: Christoffer Dall christoffer.d...@linaro.org
@@ -10671,6 +10670,15 @@ F: drivers/block/virtio_blk.c
 F: include/linux/virtio_*.h
 F: include/uapi/linux/virtio_*.h
 
+VIRTIO DRIVERS FOR S390
+M: Christian Borntraeger borntrae...@de.ibm.com
+M: Cornelia Huck cornelia.h...@de.ibm.com
+L: linux-s...@vger.kernel.org
+L: virtualizat...@lists.linux-foundation.org
+L: kvm@vger.kernel.org
+S: Supported
+F: drivers/s390/kvm/
+
 VIRTIO HOST (VHOST)
 M: Michael S. Tsirkin m...@redhat.com
 L: kvm@vger.kernel.org
-- 
2.3.8


Re: [PATCH 3/9] kvm: add hyper-v crash msrs values

2015-07-01 Thread Peter Hornyack

If userspace is controlling the crash capabilities then
HV_X64_MSR_CRASH_CTL_CONTENTS is not needed.

On Wed, Jul 1, 2015 at 8:53 AM, Denis V. Lunev d...@openvz.org wrote:
 On 01/07/15 18:00, Paolo Bonzini wrote:


 On 30/06/2015 13:33, Denis V. Lunev wrote:

 +#define HV_X64_MSR_CRASH_CTL_NOTIFY (1ULL << 63)
 +#define HV_X64_MSR_CRASH_CTL_CONTENTS  \
 +   (HV_X64_MSR_CRASH_CTL_NOTIFY)

 Why is HV_X64_MSR_CRASH_CTL_CONTENTS needed?  Can I just remove it?

 Paolo

 this was a direct request from Peter Hornyack peterhorny...@google.com

 I suggest here:

 #define HV_X64_MSR_CRASH_CTL_CONTENTS  \
 (HV_CRASH_CTL_CRASH_NOTIFY)

 To allow for more crash actions to be added in the future.

 Den

Re: [PULL] virtio/vhost: cross endian support

2015-07-01 Thread Linus Torvalds

On Wed, Jul 1, 2015 at 2:31 AM, Michael S. Tsirkin m...@redhat.com wrote:
 virtio/vhost: cross endian support

Ugh. Does this really have to be dynamic?

Can't virtio do the sane thing, and just use a _fixed_ endianness?

Doing an unconditional byte swap is faster and simpler than the crazy
conditionals. That's true regardless of endianness, but gets to be
even more so if the fixed endianness is little-endian, since BE is
not-so-slowly fading from the world.

   Linus

Re: [PULL] virtio/vhost: cross endian support

2015-07-01 Thread Linus Torvalds

On Wed, Jul 1, 2015 at 12:02 PM, Linus Torvalds
torva...@linux-foundation.org wrote:

 Doing an unconditional byte swap is faster and simpler than the crazy
 conditionals.

Unconditional endianness not only makes for simpler and faster code,
it also ends up being easier to debug and add things like type
annotations for sparse.

Linus
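
The contrast drawn above can be illustrated with a small sketch. It is not taken from the thread or from the virtio code; the function names are hypothetical, and the "dynamic" variant only stands in for any scheme where the byte order is a per-device run-time property.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fixed little-endian load: identical on every host, no flag to consult. */
static uint16_t sketch_le16_load(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Dynamic variant: every access has to test a run-time flag first. */
static uint16_t sketch_dyn16_load(const uint8_t *p, bool big_endian)
{
    if (big_endian) {
        return (uint16_t)((p[0] << 8) | p[1]);
    }
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

With a fixed byte order the accessor has a single type and a single code path, which is also what makes annotations for a checker such as sparse straightforward; the dynamic variant needs the extra flag threaded through every caller.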

[PATCH v7 02/11] KVM: arm64: guest debug, define API headers

2015-07-01 Thread Alex Bennée

This commit defines the API headers for guest debugging. There are two
architecture specific debug structures:

  - kvm_guest_debug_arch, allows us to pass in HW debug registers
  - kvm_debug_exit_arch, signals exception and possible faulting address

The type of debugging being used is controlled by the architecture
specific control bits of the kvm_guest_debug->control flags in the ioctl
structure.

Signed-off-by: Alex Bennée alex.ben...@linaro.org
Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com
Reviewed-by: Andrew Jones drjo...@redhat.com
Acked-by: Christoffer Dall christoffer.d...@linaro.org

---
v2
   - expose hsr and pc directly to user-space
v3
   - s/control/controlled/ in commit message
   - add v8 to ARM ARM comment (ARM Architecture Reference Manual)
   - add rb tag
   - rm pc, add far
   - re-word comments on alignment
   - rename KVM_ARM_NDBG_REGS -> KVM_ARM_MAX_DBG_REGS
v4
   - now uses common HW/SW BP define
   - add a-b-tag
   - use u32 for control regs
v5
   - revert to have arch specific KVM_GUESTDBG_USE_SW/HW_BP
   - rm stale comments dbgctrl was stored as u64
v6
   - mv far comment from later patch
   - KVM_GUESTDBG_USE_HW_BP -> KVM_GUESTDBG_USE_HW
   - revert control regs to u64 (parity with GET/SET_ONE_REG)
---
 arch/arm64/include/uapi/asm/kvm.h | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index d268320..d82f3f3 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -100,12 +100,39 @@ struct kvm_sregs {
 struct kvm_fpu {
 };
 
+/*
+ * See v8 ARM ARM D7.3: Debug Registers
+ *
+ * The architectural limit is 16 debug registers of each type although
+ * in practice there are usually less (see ID_AA64DFR0_EL1).
+ *
+ * Although the control registers are architecturally defined as 32
+ * bits wide we use a 64 bit structure here to keep parity with
+ * KVM_GET/SET_ONE_REG behaviour which treats all system registers as
+ * 64 bit values. It also allows for the possibility of the
+ * architecture expanding the control registers without having to
+ * change the userspace ABI.
+ */
+#define KVM_ARM_MAX_DBG_REGS 16
 struct kvm_guest_debug_arch {
+   __u64 dbg_bcr[KVM_ARM_MAX_DBG_REGS];
+   __u64 dbg_bvr[KVM_ARM_MAX_DBG_REGS];
+   __u64 dbg_wcr[KVM_ARM_MAX_DBG_REGS];
+   __u64 dbg_wvr[KVM_ARM_MAX_DBG_REGS];
 };
 
 struct kvm_debug_exit_arch {
+   __u32 hsr;
+   __u64 far;  /* used for watchpoints */
 };
 
+/*
+ * Architecture specific defines for kvm_guest_debug->control
+ */
+
+#define KVM_GUESTDBG_USE_SW_BP (1 << 16)
+#define KVM_GUESTDBG_USE_HW (1 << 17)
+
 struct kvm_sync_regs {
 };
 
-- 
2.4.5
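
How user space might fill in these structures can be sketched as below. This is an illustration, not part of the patch: the sketch_* types only mirror the layout shown in the diff, the generic enable bit is assumed to be bit 0, and the breakpoint control value is supplied by the caller because its encoding is not covered here.

```c
#include <stdint.h>

#define SKETCH_MAX_DBG_REGS      16
#define SKETCH_GUESTDBG_ENABLE   (1u << 0)   /* assumption: generic enable bit */
#define SKETCH_GUESTDBG_USE_HW   (1u << 17)  /* arch-specific bit from the patch */

/* Mirrors kvm_guest_debug_arch as defined in the diff above. */
struct sketch_guest_debug_arch {
    uint64_t dbg_bcr[SKETCH_MAX_DBG_REGS];
    uint64_t dbg_bvr[SKETCH_MAX_DBG_REGS];
    uint64_t dbg_wcr[SKETCH_MAX_DBG_REGS];
    uint64_t dbg_wvr[SKETCH_MAX_DBG_REGS];
};

/* Hypothetical container with the control word next to the arch part. */
struct sketch_guest_debug {
    uint32_t control;
    uint32_t pad;
    struct sketch_guest_debug_arch arch;
};

/* Program one hardware breakpoint slot; returns -1 for an invalid slot. */
static int sketch_request_hw_breakpoint(struct sketch_guest_debug *dbg,
                                        unsigned int slot,
                                        uint64_t addr, uint64_t bcr)
{
    if (slot >= SKETCH_MAX_DBG_REGS) {
        return -1;
    }
    dbg->control |= SKETCH_GUESTDBG_ENABLE | SKETCH_GUESTDBG_USE_HW;
    dbg->arch.dbg_bvr[slot] = addr;
    dbg->arch.dbg_bcr[slot] = bcr;
    return 0;
}
```

In a real program the filled structure would then be handed to the guest-debug ioctl; that call is omitted so the sketch stays self-contained.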


Re: [PATCH 7/9] kvm/x86: added hyper-v crash data and ctl msr's get/set'ers

2015-07-01 Thread Paolo Bonzini



On 30/06/2015 13:33, Denis V. Lunev wrote:
 +static int kvm_hv_msr_set_crash_ctl(struct kvm_vcpu *vcpu, u64 data, bool host)
 +{
 + struct kvm_hv *hv = &vcpu->kvm->arch.hyperv;
 +
 + if (host)
 + hv->hv_crash_ctl = data;
 +
 +

You need to check against HV_X64_MSR_CRASH_CTL_CONTENTS (or
HV_X64_MSR_CRASH_CTL_NOTIFY) here.

Paolo
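
A minimal sketch of the requested check, not taken from the series: the struct and function names are hypothetical, and returning 1 for a rejected write is an assumption about the calling convention. Writes that do not come from the host are accepted but not stored here, which mirrors the quoted hunk.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CRASH_CTL_NOTIFY   (1ULL << 63)
/* Mask of all defined control bits; more actions could be OR-ed in later. */
#define SKETCH_CRASH_CTL_CONTENTS (SKETCH_CRASH_CTL_NOTIFY)

/* Hypothetical per-VM Hyper-V context. */
struct sketch_hv {
    uint64_t crash_ctl;
};

/* Returns 0 if the write is acceptable, 1 if it sets undefined bits. */
static int sketch_set_crash_ctl(struct sketch_hv *hv, uint64_t data, bool host)
{
    if (data & ~SKETCH_CRASH_CTL_CONTENTS) {
        return 1;
    }
    if (host) {
        hv->crash_ctl = data;
    }
    return 0;
}
```

Checking against the mask rather than against the single notify bit keeps the test valid if further crash actions are added to the mask.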

Re: [PATCH v3 06/11] KVM: arm: add trap handlers for 32-bit debug registers

2015-07-01 Thread zichao



On June 30, 2015 5:16:41 AM GMT+08:00, Christoffer Dall 
christoffer.d...@linaro.org wrote:
On Mon, Jun 22, 2015 at 06:41:29PM +0800, Zhichao Huang wrote:
 Add handlers for all the 32-bit debug registers.
 
 Signed-off-by: Zhichao Huang zhichao.hu...@linaro.org
 ---
  arch/arm/include/asm/kvm_asm.h  |  12 
  arch/arm/include/asm/kvm_host.h |   3 +
  arch/arm/kernel/asm-offsets.c   |   1 +
  arch/arm/kvm/coproc.c   | 122

  4 files changed, 138 insertions(+)
 
 diff --git a/arch/arm/include/asm/kvm_asm.h
b/arch/arm/include/asm/kvm_asm.h
 index 25410b2..ba65e05 100644
 --- a/arch/arm/include/asm/kvm_asm.h
 +++ b/arch/arm/include/asm/kvm_asm.h
 @@ -52,6 +52,18 @@
  #define c10_AMAIR1  30  /* Auxilary Memory Attribute Indirection Reg1 */
  #define NR_CP15_REGS 31  /* Number of regs (incl. invalid) */
  
 +/* 0 is reserved as an invalid value. */
 +#define cp14_DBGBVR0 1   /* Debug Breakpoint Control Registers (0-15) */
 +#define cp14_DBGBVR15   16
 +#define cp14_DBGBCR0 17  /* Debug Breakpoint Value Registers (0-15) */
 +#define cp14_DBGBCR15   32
 +#define cp14_DBGWVR0 33  /* Debug Watchpoint Control Registers (0-15) */
 +#define cp14_DBGWVR15   48
 +#define cp14_DBGWCR0 49  /* Debug Watchpoint Value Registers (0-15) */
 +#define cp14_DBGWCR15   64
 +#define cp14_DBGDSCRext 65  /* Debug Status and Control external */
 +#define NR_CP14_REGS 66  /* Number of regs (incl. invalid) */
 +
  #define ARM_EXCEPTION_RESET   0
  #define ARM_EXCEPTION_UNDEFINED   1
  #define ARM_EXCEPTION_SOFTWARE 2
 diff --git a/arch/arm/include/asm/kvm_host.h
b/arch/arm/include/asm/kvm_host.h
 index d71607c..3d16820 100644
 --- a/arch/arm/include/asm/kvm_host.h
 +++ b/arch/arm/include/asm/kvm_host.h
 @@ -124,6 +124,9 @@ struct kvm_vcpu_arch {
  struct vgic_cpu vgic_cpu;
  struct arch_timer_cpu timer_cpu;
  
 +/* System control coprocessor (cp14) */
 +u32 cp14[NR_CP14_REGS];
 +
  /*
   * Anything that is not used directly from assembly code goes
   * here.
 diff --git a/arch/arm/kernel/asm-offsets.c
b/arch/arm/kernel/asm-offsets.c
 index 871b826..9158de0 100644
 --- a/arch/arm/kernel/asm-offsets.c
 +++ b/arch/arm/kernel/asm-offsets.c
 @@ -172,6 +172,7 @@ int main(void)
  #ifdef CONFIG_KVM_ARM_HOST
DEFINE(VCPU_KVM,  offsetof(struct kvm_vcpu, kvm));
DEFINE(VCPU_MIDR, offsetof(struct kvm_vcpu, arch.midr));
 +  DEFINE(VCPU_CP14, offsetof(struct kvm_vcpu, arch.cp14));
DEFINE(VCPU_CP15, offsetof(struct kvm_vcpu, arch.cp15));
DEFINE(VCPU_VFP_GUEST,offsetof(struct kvm_vcpu, arch.vfp_guest));
DEFINE(VCPU_VFP_HOST, offsetof(struct kvm_vcpu,
arch.host_cpu_context));
 diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
 index 16d5f69..59b65b7 100644
 --- a/arch/arm/kvm/coproc.c
 +++ b/arch/arm/kvm/coproc.c
 @@ -220,6 +220,47 @@ bool access_vm_reg(struct kvm_vcpu *vcpu,
  return true;
  }
  
 +static bool trap_debug32(struct kvm_vcpu *vcpu,
 +const struct coproc_params *p,
 +const struct coproc_reg *r)
 +{
 +if (p->is_write)
 +vcpu->arch.cp14[r->reg] = *vcpu_reg(vcpu, p->Rt1);
 +else
 +*vcpu_reg(vcpu, p->Rt1) = vcpu->arch.cp14[r->reg];
 +
 +return true;
 +}
 +
 +/* DBGIDR (RO) Debug ID */
 +static bool trap_dbgidr(struct kvm_vcpu *vcpu,
 +const struct coproc_params *p,
 +const struct coproc_reg *r)
 +{
 +u32 val;
 +
 +if (p->is_write)
 +return ignore_write(vcpu, p);
 +
 +ARM_DBG_READ(c0, c0, 0, val);
 +*vcpu_reg(vcpu, p->Rt1) = val;
 +
 +return true;
 +}
 +
 +/* DBGDSCRint (RO) Debug Status and Control Register */
 +static bool trap_dbgdscr(struct kvm_vcpu *vcpu,
 +const struct coproc_params *p,
 +const struct coproc_reg *r)
 +{
 +if (p->is_write)
 +return ignore_write(vcpu, p);
 +
 +*vcpu_reg(vcpu, p->Rt1) = vcpu->arch.cp14[r->reg];
 +
 +return true;
 +}
 +
  /*
   * We could trap ID_DFR0 and tell the guest we don't support
performance
   * monitoring.  Unfortunately the patch to make the kernel check
ID_DFR0 was
 @@ -375,7 +416,88 @@ static const struct coproc_reg cp15_regs[] = {
  { CRn(15), CRm( 0), Op1( 4), Op2( 0), is32, access_cbar},
  };
  
 +#define DBG_BCR_BVR_WCR_WVR(n)  \
 +/* DBGBVRn */   \
 +{ CRn( 0), CRm((n)), Op1( 0), Op2( 4), is32,\
 +  trap_debug32, reset_val, (cp14_DBGBVR0 + (n)), 0 },   \
 +/* DBGBCRn */   \
 +{ CRn( 0), CRm((n)), Op1( 0), Op2( 5), is32,\
 +  trap_debug32, reset_val, (cp14_DBGBCR0 + (n)), 0 },   \
 +/* DBGWVRn */   \
 +{ CRn( 0), CRm((n)),

Re: [PATCH v3 04/11] KVM: arm: common infrastructure for handling AArch32 CP14/CP15

2015-07-01 Thread zichao



On June 30, 2015 3:43:34 AM GMT+08:00, Christoffer Dall 
christoffer.d...@linaro.org wrote:
On Mon, Jun 22, 2015 at 06:41:27PM +0800, Zhichao Huang wrote:
 As we're about to trap a bunch of CP14 registers, let's rework
 the CP15 handling so it can be generalized and work with multiple
 tables.
 
 Signed-off-by: Zhichao Huang zhichao.hu...@linaro.org
 ---
  arch/arm/kvm/coproc.c  | 176
++---
  arch/arm/kvm/interrupts_head.S |   2 +-
  2 files changed, 112 insertions(+), 66 deletions(-)
 
 diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
 index 9d283d9..d23395b 100644
 --- a/arch/arm/kvm/coproc.c
 +++ b/arch/arm/kvm/coproc.c
 @@ -375,6 +375,9 @@ static const struct coproc_reg cp15_regs[] = {
  { CRn(15), CRm( 0), Op1( 4), Op2( 0), is32, access_cbar},
  };
  
 +static const struct coproc_reg cp14_regs[] = {
 +};
 +
  /* Target specific emulation tables */
  static struct kvm_coproc_target_table
*target_tables[KVM_ARM_NUM_TARGETS];
  
 @@ -424,47 +427,75 @@ static const struct coproc_reg *find_reg(const
struct coproc_params *params,
  return NULL;
  }
  
 -static int emulate_cp15(struct kvm_vcpu *vcpu,
 -const struct coproc_params *params)
 +/*
 + * emulate_cp --  tries to match a cp14/cp15 access in a handling
table,
 + *and call the corresponding trap handler.
 + *
 + * @params: pointer to the descriptor of the access
 + * @table: array of trap descriptors
 + * @num: size of the trap descriptor array
 + *
 + * Return 0 if the access has been handled, and -1 if not.
 + */
 +static int emulate_cp(struct kvm_vcpu *vcpu,
 +const struct coproc_params *params,
 +const struct coproc_reg *table,
 +size_t num)
  {
 -size_t num;
 -const struct coproc_reg *table, *r;
 -
 -trace_kvm_emulate_cp15_imp(params->Op1, params->Rt1, params->CRn,
 -   params->CRm, params->Op2, params->is_write);
 +const struct coproc_reg *r;
  
 -table = get_target_table(vcpu->arch.target, &num);
 +if (!table)
 +return -1;  /* Not handled */
  
 -/* Search target-specific then generic table. */
  r = find_reg(params, table, num);
 -if (!r)
 -r = find_reg(params, cp15_regs, ARRAY_SIZE(cp15_regs));
  
 -if (likely(r)) {
 +if (r) {
  /* If we don't have an accessor, we should never get here! */
  BUG_ON(!r->access);
  
  if (likely(r->access(vcpu, params, r))) {
  /* Skip instruction, since it was emulated */
  kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
 -return 1;
  }
 -/* If access function fails, it should complain. */
 -} else {
 -kvm_err("Unsupported guest CP15 access at: %08lx\n",
 -*vcpu_pc(vcpu));
 -print_cp_instr(params);
 +
 +/* Handled */
 +return 0;
  }
 +
 +/* Not handled */
 +return -1;
 +}
 +
 +static void unhandled_cp_access(struct kvm_vcpu *vcpu,
 +const struct coproc_params *params)
 +{
 +u8 hsr_ec = kvm_vcpu_trap_get_class(vcpu);
 +int cp;
 +
 +switch (hsr_ec) {
 +case HSR_EC_CP15_32:
 +case HSR_EC_CP15_64:
 +cp = 15;
 +break;
 +case HSR_EC_CP14_MR:
 +case HSR_EC_CP14_64:
 +cp = 14;
 +break;
 +default:
 +WARN_ON((cp = -1));
 +}
 +
 +kvm_err("Unsupported guest CP%d access at: %08lx\n",
 +cp, *vcpu_pc(vcpu));
 +print_cp_instr(params);
  kvm_inject_undefined(vcpu);
 -return 1;
  }
  
 -/**
 - * kvm_handle_cp15_64 -- handles a mrrc/mcrr trap on a guest CP15
access
 - * @vcpu: The VCPU pointer
 - * @run:  The kvm_run struct
 - */
 -int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run)
 +int kvm_handle_cp_64(struct kvm_vcpu *vcpu,
 +const struct coproc_reg *global,
 +size_t nr_global,
 +const struct coproc_reg *target_specific,
 +size_t nr_specific)
  {
  struct coproc_params params;
  
 @@ -478,7 +509,13 @@ int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run)
  params.Rt2 = (kvm_vcpu_get_hsr(vcpu) >> 10) & 0xf;
  params.CRm = 0;
  
 -return emulate_cp15(vcpu, &params);
 +if (!emulate_cp(vcpu, &params, target_specific, nr_specific))
 +return 1;
 +if (!emulate_cp(vcpu, &params, global, nr_global))
 +return 1;
 +
 +unhandled_cp_access(vcpu, &params);
 +return 1;
  }
  
  static void reset_coproc_regs(struct kvm_vcpu *vcpu,
 @@ -491,12 +528,11 @@ static void reset_coproc_regs(struct kvm_vcpu *vcpu,
  table[i].reset(vcpu, &table[i]);
  }
  
 -/**
 - * kvm_handle_cp15_32 -- handles a mrc/mcr trap on a guest CP15
access
 - * @vcpu: The VCPU pointer
 - * @run:  The

Re: [PATCH v3 07/11] KVM: arm: add trap handlers for 64-bit debug registers

2015-07-01 Thread Zhichao Huang



On June 30, 2015 9:20:29 PM GMT+08:00, Christoffer Dall 
christoffer.d...@linaro.org wrote:
On Mon, Jun 22, 2015 at 06:41:30PM +0800, Zhichao Huang wrote:
 Add handlers for all the 64-bit debug registers.
 
 There is an overlap between 32 and 64bit registers. Make sure that
 64-bit registers preceding 32-bit ones.
 
 Signed-off-by: Zhichao Huang zhichao.hu...@linaro.org
 ---
  arch/arm/kvm/coproc.c | 12 
  1 file changed, 12 insertions(+)
 
 diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
 index 59b65b7..648 100644
 --- a/arch/arm/kvm/coproc.c
 +++ b/arch/arm/kvm/coproc.c
 @@ -435,9 +435,17 @@ static const struct coproc_reg cp15_regs[] = {
  { CRn( 1), CRm((n)), Op1( 0), Op2( 1), is32, trap_raz_wi }
  
  /*
 + * Architected CP14 registers.
 + *

belongs in other patch?

OK, I will move it to the patch [06/11].

   * Trapped cp14 registers. We generally ignore most of the external
   * debug, on the principle that they don't really make sense to a
   * guest. Revisit this one day, whould this principle change.
 + *
 + * CRn denotes the primary register number, but is copied to the CRm in the
 + * user space API for 64-bit register access in line with the terminology 
 used
 + * in the ARM ARM.
 + * Important: Must be sorted ascending by CRn, CRM, Op1, Op2 and
with 64-bit
 + *registers preceding 32-bit ones.
   */
  static const struct coproc_reg cp14_regs[] = {
  /* DBGIDR */
 @@ -445,10 +453,14 @@ static const struct coproc_reg cp14_regs[] = {
  /* DBGDTRRXext */
  { CRn( 0), CRm( 0), Op1( 0), Op2( 2), is32, trap_raz_wi },
  DBG_BCR_BVR_WCR_WVR(0),
 +/* DBGDRAR (64bit) */
 +{ CRn( 0), CRm( 1), Op1( 0), Op2( 0), is64, trap_raz_wi},
  /* DBGDSCRint */
  { CRn( 0), CRm( 1), Op1( 0), Op2( 0), is32, trap_dbgdscr,
  NULL, cp14_DBGDSCRext },
  DBG_BCR_BVR_WCR_WVR(1),
 +/* DBGDSAR (64bit) */
 +{ CRn( 0), CRm( 2), Op1( 0), Op2( 0), is64, trap_raz_wi},
  /* DBGDSCRext */
  { CRn( 0), CRm( 2), Op1( 0), Op2( 2), is32, trap_debug32,
  reset_val, cp14_DBGDSCRext, 0 },
 -- 
 1.7.12.4
 
Otherwise:
Reviewed-by: Christoffer Dall christoffer.d...@linaro.org

-- 
Zhichao Huang
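
The ordering rule stated in the comment (ascending by CRn, CRm, Op1, Op2, with a 64-bit entry ahead of a 32-bit entry that shares the same encoding) can be expressed as a comparison function. This is a sketch with hypothetical names, not code from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a trap table entry. */
struct sketch_reg {
    unsigned int CRn, CRm, Op1, Op2;
    bool is_64;
};

/* Negative if a sorts before b, positive if after, 0 if equal. */
static int sketch_cmp_reg(const struct sketch_reg *a, const struct sketch_reg *b)
{
    if (a->CRn != b->CRn) return a->CRn < b->CRn ? -1 : 1;
    if (a->CRm != b->CRm) return a->CRm < b->CRm ? -1 : 1;
    if (a->Op1 != b->Op1) return a->Op1 < b->Op1 ? -1 : 1;
    if (a->Op2 != b->Op2) return a->Op2 < b->Op2 ? -1 : 1;
    if (a->is_64 != b->is_64) return a->is_64 ? -1 : 1;  /* 64-bit first */
    return 0;
}

/* True if every entry sorts at or after its predecessor. */
static bool sketch_table_sorted(const struct sketch_reg *t, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (sketch_cmp_reg(&t[i - 1], &t[i]) > 0) {
            return false;
        }
    }
    return true;
}
```

A check of this kind could be run once at initialisation to catch a misplaced table entry, for example a 32-bit register listed ahead of the 64-bit one with the same encoding.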

Re: [PATCH v3 01/11] KVM: arm: plug guest debug exploit

2015-07-01 Thread zichao



On June 29, 2015 11:49:53 PM GMT+08:00, Christoffer Dall 
christoffer.d...@linaro.org wrote:
On Mon, Jun 22, 2015 at 06:41:24PM +0800, Zhichao Huang wrote:
 Hardware debugging in guests is not intercepted currently, which means
 that a malicious guest can bring down the entire machine by writing
 to the debug registers.
 
 This patch enables trapping of all debug registers, preventing the
 guests from accessing the debug registers.
 
 This patch also disables the debug mode (DBGDSCR) in the guest world
 all the time, preventing the guests from messing with the host state.
 
 However, it is a precursor for later patches, which will need to do
 more to world-switch the debug state when necessary.
 
 Cc: sta...@vger.kernel.org
 Signed-off-by: Zhichao Huang zhichao.hu...@linaro.org
 ---
  arch/arm/include/asm/kvm_coproc.h |  3 +-
  arch/arm/kvm/coproc.c | 60
+++
  arch/arm/kvm/handle_exit.c|  4 +--
  arch/arm/kvm/interrupts_head.S| 13 -
  4 files changed, 70 insertions(+), 10 deletions(-)
 
 diff --git a/arch/arm/include/asm/kvm_coproc.h
b/arch/arm/include/asm/kvm_coproc.h
 index 4917c2f..e74ab0f 100644
 --- a/arch/arm/include/asm/kvm_coproc.h
 +++ b/arch/arm/include/asm/kvm_coproc.h
 @@ -31,7 +31,8 @@ void kvm_register_target_coproc_table(struct
kvm_coproc_target_table *table);
  int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run);
  int kvm_handle_cp_0_13_access(struct kvm_vcpu *vcpu, struct kvm_run
*run);
  int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run
*run);
 -int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run
*run);
 +int kvm_handle_cp14_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
 +int kvm_handle_cp14_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
  int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
  int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
  
 diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
 index f3d88dc..2e12760 100644
 --- a/arch/arm/kvm/coproc.c
 +++ b/arch/arm/kvm/coproc.c
 @@ -91,12 +91,6 @@ int kvm_handle_cp14_load_store(struct kvm_vcpu
*vcpu, struct kvm_run *run)
  return 1;
  }
  
 -int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run
*run)
 -{
 -kvm_inject_undefined(vcpu);
 -return 1;
 -}
 -
  static void reset_mpidr(struct kvm_vcpu *vcpu, const struct
coproc_reg *r)
  {
  /*
 @@ -519,6 +513,60 @@ int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run)
  return emulate_cp15(vcpu, &params);
  }
  
 +/**
 + * kvm_handle_cp14_64 -- handles a mrrc/mcrr trap on a guest CP14
access
 + * @vcpu: The VCPU pointer
 + * @run:  The kvm_run struct
 + */
 +int kvm_handle_cp14_64(struct kvm_vcpu *vcpu, struct kvm_run *run)
 +{
 +struct coproc_params params;
 +
 +params.CRn = (kvm_vcpu_get_hsr(vcpu) >> 1) & 0xf;
 +params.Rt1 = (kvm_vcpu_get_hsr(vcpu) >> 5) & 0xf;
 +params.is_write = ((kvm_vcpu_get_hsr(vcpu) & 1) == 0);
 +params.is_64bit = true;
 +
 +params.Op1 = (kvm_vcpu_get_hsr(vcpu) >> 16) & 0xf;
 +params.Op2 = 0;
 +params.Rt2 = (kvm_vcpu_get_hsr(vcpu) >> 10) & 0xf;
 +params.CRm = 0;

this is a complete duplicate of kvm_handle_cp15_64, can you share this
code somehow?


This patch just wants to plug the exploit in the simplest way; the
cp14/cp15 handlers are shared in a later patch [PATCH v3 04/11].

Should I move patch [04/11] ahead of the current patch [01/11]?
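
One way the duplicated decoding could be shared is a helper that both the CP14 and CP15 64-bit handlers call. The following is a sketch only: the names are hypothetical, the parameter struct is a reduced stand-in, and the field positions are the ones used in the quoted hunk, read on the assumption that the operators lost in the archive were ">>" and "&".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct coproc_params. */
struct sketch_coproc_params {
    unsigned long CRn, CRm, Op1, Op2, Rt1, Rt2;
    bool is_64bit;
    bool is_write;
};

/* Decode a trapped 64-bit (mcrr/mrrc) access from the syndrome value. */
static void sketch_decode_cp_64(uint32_t hsr, struct sketch_coproc_params *p)
{
    p->CRn = (hsr >> 1) & 0xf;
    p->Rt1 = (hsr >> 5) & 0xf;
    p->is_write = ((hsr & 1) == 0);  /* bit 0 clear means a write */
    p->is_64bit = true;

    p->Op1 = (hsr >> 16) & 0xf;
    p->Op2 = 0;
    p->Rt2 = (hsr >> 10) & 0xf;
    p->CRm = 0;
}
```

Each per-coprocessor handler would then only decode once and pass the result on to its own table lookup or raz/wi fallback.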

 +
 +/* raz_wi */
 +(void)pm_fake(vcpu, &params, NULL);
 +
 +/* handled */
 +kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
 +return 1;
 +}
 +
 +/**
 + * kvm_handle_cp14_32 -- handles a mrc/mcr trap on a guest CP14
access
 + * @vcpu: The VCPU pointer
 + * @run:  The kvm_run struct
 + */
 +int kvm_handle_cp14_32(struct kvm_vcpu *vcpu, struct kvm_run *run)
 +{
 +struct coproc_params params;
 +
 +params.CRm = (kvm_vcpu_get_hsr(vcpu) >> 1) & 0xf;
 +params.Rt1 = (kvm_vcpu_get_hsr(vcpu) >> 5) & 0xf;
 +params.is_write = ((kvm_vcpu_get_hsr(vcpu) & 1) == 0);
 +params.is_64bit = false;
 +
 +params.CRn = (kvm_vcpu_get_hsr(vcpu) >> 10) & 0xf;
 +params.Op1 = (kvm_vcpu_get_hsr(vcpu) >> 14) & 0x7;
 +params.Op2 = (kvm_vcpu_get_hsr(vcpu) >> 17) & 0x7;
 +params.Rt2 = 0;

this is a complete duplicate of kvm_handle_cp15_32, can you share this
code somehow?

 +
 +/* raz_wi */
 +(void)pm_fake(vcpu, &params, NULL);
 +
 +/* handled */
 +kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
 +return 1;
 +}
 +
 
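  /******************************************************************************
   * Userspace API
   *****************************************************************************/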
 diff --git a/arch/arm/kvm/handle_exit.c b/arch/arm/kvm/handle_exit.c
 index 95f12b2..357ad1b 100644
 --- a/arch/arm/kvm/handle_exit.c
 +++ b/arch/arm/kvm/handle_exit.c
 @@ -104,9 +104,9 @@ static exit_handle_fn arm_exit_handlers[] = {
  [HSR_EC_WFI]= kvm_handle_wfx,
