Re: [PATCH v4 3/7] KVM: arm/arm64: Fix the documentation

2015-10-07 Thread Christoffer Dall
On Mon, Sep 28, 2015 at 06:27:30PM +0300, Pavel Fedin wrote:
> During refactoring we noticed some mistakes in the documentation.
> Correct them.
> 
> Signed-off-by: Pavel Fedin 
> ---
>  Documentation/virtual/kvm/devices/arm-vgic.txt | 18 ++
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/devices/arm-vgic.txt 
> b/Documentation/virtual/kvm/devices/arm-vgic.txt
> index 3fb9054..4727829 100644
> --- a/Documentation/virtual/kvm/devices/arm-vgic.txt
> +++ b/Documentation/virtual/kvm/devices/arm-vgic.txt
> @@ -44,28 +44,29 @@ Groups:
>Attributes:
>  The attr field of kvm_device_attr encodes two values:
>  bits: | 63     40 | 39 ..  32  |  31   0 |
> -values:   |reserved   |   cpu id   |  offset |
> +values:   |reserved   |  cpu idx   |  offset |

why should this be changed to cpu idx?

>  
>  All distributor regs are (rw, 32-bit)
>  
>  The offset is relative to the "Distributor base address" as defined in 
> the
>  GICv2 specs.  Getting or setting such a register has the same effect as
> -reading or writing the register on the actual hardware from the cpu
> -specified with cpu id field.  Note that most distributor fields are not
> -banked, but return the same value regardless of the cpu id used to access
> -the register.
> +reading or writing the register on the actual hardware from the cpu whose
> +index is specified with cpu idx field.  Note that most distributor fields
> +are not banked, but return the same value regardless of the cpu idx used 
> to
> +access the register.
>Limitations:
>  - Priorities are not implemented, and registers are RAZ/WI
>  - Currently only implemented for KVM_DEV_TYPE_ARM_VGIC_V2.
>Errors:
> --ENODEV: Getting or setting this register is not yet supported
> +-ENXIO: Getting or setting this register is not yet supported
>  -EBUSY: One or more VCPUs are running
> +-EINVAL: Invalid CPU index supplied
>  
>KVM_DEV_ARM_VGIC_GRP_CPU_REGS
>Attributes:
>  The attr field of kvm_device_attr encodes two values:
>  bits: | 63     40 | 39 ..  32  |  31   0 |
> -values:   |reserved   |   cpu id   |  offset |
> +values:   |reserved   |  cpu idx   |  offset |
>  
>  All CPU interface regs are (rw, 32-bit)
>  
> @@ -91,8 +92,9 @@ Groups:
>  - Priorities are not implemented, and registers are RAZ/WI
>  - Currently only implemented for KVM_DEV_TYPE_ARM_VGIC_V2.
>Errors:
> --ENODEV: Getting or setting this register is not yet supported
> +-ENXIO: Getting or setting this register is not yet supported
>  -EBUSY: One or more VCPUs are running
> +-EINVAL: Invalid CPU index supplied
>  
>KVM_DEV_ARM_VGIC_GRP_NR_IRQS
>Attributes:
> -- 
> 2.4.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 4/7] KVM: arm64: Implement vGICv3 distributor and redistributor access from userspace

2015-10-07 Thread Christoffer Dall
Hi Pavel,


On Mon, Sep 28, 2015 at 06:27:31PM +0300, Pavel Fedin wrote:
> The access is done similar to vGICv2, using KVM_DEV_ARM_VGIC_GRP_DIST_REGS
> and KVM_DEV_ARM_VGIC_GRP_REDIST_REGS with KVM_SET_DEVICE_ATTR and
> KVM_GET_DEVICE_ATTR ioctls. Since GICv3 can handle large number of CPUs,
> KVM_DEV_ARM_VGIC_CPUID_MASK has been extended to 20 bits. This is enough
> for 1048576 CPUs.
> 
> Some registers are 64-bit wide according to the specification.
> KVM_DEV_ARM_VGIC_64BIT flag is introduced, allowing to perform full 64-bit
> accesses. vgic_attr_regs_access() has also been fixed up in order to be
> able to perform 64-bit accesses correctly.
> 
Apologies for the long delay in responding to this.

We went over this and discussed this during Linaro Connect, and we came
up with an API that we believe considers the future, works well with how
things are going to be designed in QEMU, and follows the architecture.

I'm sorry to make you rework your patches at this stage, but I'd like
you to take a look at the patch for the API that I just sent out
(subject: KVM: arm/arm64: Add VGICv3 save/restore API documentation) and
adapt this patch set to use that API instead and then I'll review that.

Thanks,
-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 17/20] KVM: ARM64: Add PMU overflow interrupt routing

2015-10-07 Thread Marc Zyngier
On 24/09/15 23:31, Shannon Zhao wrote:
> When calling perf_event_create_kernel_counter to create perf_event,
> assign a overflow handler. Then when perf event overflows, set
> irq_pending and call kvm_vcpu_kick() to sync the interrupt.
> 
> Signed-off-by: Shannon Zhao 
> ---
>  arch/arm/kvm/arm.c|  4 
>  include/kvm/arm_pmu.h |  2 ++
>  virt/kvm/arm/pmu.c| 54 
> ++-
>  3 files changed, 59 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index ce404a5..3fca263 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -28,6 +28,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define CREATE_TRACE_POINTS
>  #include "trace.h"
> @@ -554,6 +555,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
> kvm_run *run)
>   kvm_vgic_sync_hwstate(vcpu);
>   preempt_enable();
>   kvm_timer_sync_hwstate(vcpu);
> + kvm_pmu_sync_hwstate(vcpu);
>   continue;
>   }
>  
> @@ -604,6 +606,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
> kvm_run *run)
>  
>   kvm_timer_sync_hwstate(vcpu);
>  
> + kvm_pmu_sync_hwstate(vcpu);
> +
>   ret = handle_exit(vcpu, run, ret);
>   }

The code around here is about to change with Christopher's patches. Most
importantly, virtual devices must signal their changes before we touch
the vgic. I suspect this will have some impacts, see below.

>  
> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
> index 9293133..953c400 100644
> --- a/include/kvm/arm_pmu.h
> +++ b/include/kvm/arm_pmu.h
> @@ -38,6 +38,7 @@ struct kvm_pmu {
>  };
>  
>  #ifdef CONFIG_KVM_ARM_PMU
> +void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu);
>  unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32 
> select_idx);
>  void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u32 val);
>  void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u32 val);
> @@ -45,6 +46,7 @@ void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u32 
> val);
>  void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u32 data,
>   u32 select_idx);
>  #else
> +void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu) {}
>  unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32 
> select_idx)
>  {
>   return 0;
> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> index 18637c9..ca7e849 100644
> --- a/virt/kvm/arm/pmu.c
> +++ b/virt/kvm/arm/pmu.c
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  static void kvm_pmu_set_evttyper(struct kvm_vcpu *vcpu, u32 idx, u32 val)
>  {
> @@ -62,6 +63,56 @@ static void kvm_pmu_stop_counter(struct kvm_vcpu *vcpu, 
> u32 select_idx)
>  }
>  
>  /**
> + * kvm_pmu_sync_hwstate - sync pmu state for cpu
> + * @vcpu: The vcpu pointer
> + *
> + * Inject virtual PMU IRQ if IRQ is pending for this cpu.
> + */
> +void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_pmu *pmu = >arch.pmu;
> +
> + if (pmu->irq_pending && (pmu->irq_num != -1)) {

How likely is that pmu->irq_num could be -1? I don't think the interrupt
can be made optional.

> + kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id, pmu->irq_num, 1);
> + pmu->irq_pending = false;
> + }

So you're signalling the interrupt as an edge, not a level. Is that an
accurate modelling of the PMU interrupt?

My hunch is that the interrupt should still be pending until the guest
has cleared the overflow condition (by writing to PMOVSCLR_EL0). You can
probably lift most of that logic from  Christoffer's rework of the timer
state.

> +}
> +
> +/**
> + * When perf event overflows, set irq_pending and call kvm_vcpu_kick() to 
> inject
> + * the interrupt.
> + */
> +static void kvm_pmu_perf_overflow(struct perf_event *perf_event,
> +   struct perf_sample_data *data,
> +   struct pt_regs *regs)
> +{
> + struct kvm_pmc *pmc = perf_event->overflow_handler_context;
> + struct kvm_vcpu *vcpu = pmc->vcpu;
> + struct kvm_pmu *pmu = >arch.pmu;
> + int idx = pmc->idx;
> +
> + if (!vcpu_mode_is_32bit(vcpu)) {
> + if ((vcpu_sys_reg(vcpu, PMINTENSET_EL1) >> idx) & 0x1) {
> + __set_bit(idx,
> + (unsigned long *)_sys_reg(vcpu, PMOVSSET_EL0));
> + __set_bit(idx,
> + (unsigned long *)_sys_reg(vcpu, PMOVSCLR_EL0));
> + pmu->irq_pending = true;
> + kvm_vcpu_kick(vcpu);
> + }
> + } else {
> + if ((vcpu_cp15(vcpu, c9_PMINTENSET) >> idx) & 0x1) {
> + __set_bit(idx,
> + (unsigned long *)_cp15(vcpu, c9_PMOVSSET));
> + __set_bit(idx,
> + 

Re: [PATCH 03/15] arm64: Introduce helpers for page table levels

2015-10-07 Thread Christoffer Dall
Hi Suzuki,

On Tue, Sep 15, 2015 at 04:41:12PM +0100, Suzuki K. Poulose wrote:
> From: "Suzuki K. Poulose" 
> 
> Introduce helpers for finding the number of page table
> levels required for a given VA width, shift for a particular
> page table level.
> 
> Convert the existing users to the new helpers. More users
> to follow.
> 
> Cc: Ard Biesheuvel 
> Cc: Mark Rutland 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Signed-off-by: Suzuki K. Poulose 
> Reviewed-by: Ard Biesheuvel 
> Tested-by: Ard Biesheuvel 
> ---
>  arch/arm64/include/asm/pgtable-hwdef.h |   15 ---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h 
> b/arch/arm64/include/asm/pgtable-hwdef.h
> index 24154b0..ce18389 100644
> --- a/arch/arm64/include/asm/pgtable-hwdef.h
> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
> @@ -16,13 +16,21 @@
>  #ifndef __ASM_PGTABLE_HWDEF_H
>  #define __ASM_PGTABLE_HWDEF_H
>  
> +/*
> + * Number of page-table levels required to address 'va_bits' wide
> + * address, without section mapping
> + */
> +#define ARM64_HW_PGTABLE_LEVELS(va_bits) (((va_bits) - 4) / (PAGE_SHIFT - 3))

I don't understand the '(va_bits) - 4' here, can you explain it (and add a
comment to that effect) ?

> +#define ARM64_HW_PGTABLE_LEVEL_SHIFT(level) \
> + ((PAGE_SHIFT - 3) * (level) + 3)
> +

While this change is clearly correct, if you can explain the math here
in a comment as well, that would be helpful.

Thanks,
-Christoffer

>  #define PTRS_PER_PTE (1 << (PAGE_SHIFT - 3))
>  
>  /*
>   * PMD_SHIFT determines the size a level 2 page table entry can map.
>   */
>  #if CONFIG_PGTABLE_LEVELS > 2
> -#define PMD_SHIFT((PAGE_SHIFT - 3) * 2 + 3)
> +#define PMD_SHIFTARM64_HW_PGTABLE_LEVEL_SHIFT(2)
>  #define PMD_SIZE (_AC(1, UL) << PMD_SHIFT)
>  #define PMD_MASK (~(PMD_SIZE-1))
>  #define PTRS_PER_PMD PTRS_PER_PTE
> @@ -32,7 +40,7 @@
>   * PUD_SHIFT determines the size a level 1 page table entry can map.
>   */
>  #if CONFIG_PGTABLE_LEVELS > 3
> -#define PUD_SHIFT((PAGE_SHIFT - 3) * 3 + 3)
> +#define PUD_SHIFTARM64_HW_PGTABLE_LEVEL_SHIFT(3)
>  #define PUD_SIZE (_AC(1, UL) << PUD_SHIFT)
>  #define PUD_MASK (~(PUD_SIZE-1))
>  #define PTRS_PER_PUD PTRS_PER_PTE
> @@ -42,7 +50,8 @@
>   * PGDIR_SHIFT determines the size a top-level page table entry can map
>   * (depending on the configuration, this level can be 0, 1 or 2).
>   */
> -#define PGDIR_SHIFT  ((PAGE_SHIFT - 3) * CONFIG_PGTABLE_LEVELS + 3)
> +#define PGDIR_SHIFT  \
> + ARM64_HW_PGTABLE_LEVEL_SHIFT(CONFIG_PGTABLE_LEVELS)
>  #define PGDIR_SIZE   (_AC(1, UL) << PGDIR_SHIFT)
>  #define PGDIR_MASK   (~(PGDIR_SIZE-1))
>  #define PTRS_PER_PGD (1 << (VA_BITS - PGDIR_SHIFT))
> -- 
> 1.7.9.5
> 
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2 11/15] KVM: arm64: handle pending bit for LPIs in ITS emulation

2015-10-07 Thread Pavel Fedin
 Hello!

> +/* Called with the distributor lock held by the caller. */
> +void vits_unqueue_lpi(struct kvm_vcpu *vcpu, int lpi)
> +{
> + struct vgic_its *its = >kvm->arch.vgic.its;
> + struct its_itte *itte;
> +
> + spin_lock(>lock);
> +
> + /* Find the right ITTE and put the pending state back in there */
> + itte = find_itte_by_lpi(vcpu->kvm, lpi);
> + if (itte)
> + __set_bit(vcpu->vcpu_id, itte->pending);
> +
> + spin_unlock(>lock);
> +}

 I am working on implementing live migration for the ITS. And here i have one 
fundamental problem.
vits_unqueue_lpi() processes only PENDING state. And looks like this 
corresponds to the HW
implementation, which has only bitwise pending table. But, in terms of 
migration, we can actually
have LPI in active state, while it's being processed.
 The question is - how can we handle it? Should we have one more bitwise table 
for active LPIs, or
is it enough to remember only a single, currently active LPI? Can LPIs be 
preempted on a real
hardware, or not?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 03/20] KVM: ARM64: Add offset defines for PMU registers

2015-10-07 Thread Marc Zyngier
On 24/09/15 23:31, Shannon Zhao wrote:
> We are about to trap and emulate acccesses to each PMU register
> individually. This adds the context offsets for the AArch64 PMU
> registers and their AArch32 counterparts.
> 
> Signed-off-by: Shannon Zhao 
> ---
>  arch/arm64/include/asm/kvm_asm.h | 59 
> +++-
>  1 file changed, 52 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_asm.h 
> b/arch/arm64/include/asm/kvm_asm.h
> index 67fa0de..0a4dfcc 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -48,14 +48,36 @@
>  #define MDSCR_EL122  /* Monitor Debug System Control Register */
>  #define MDCCINT_EL1  23  /* Monitor Debug Comms Channel Interrupt Enable 
> Reg */
>  
> +/* Performance Monitors Registers */
> +#define PMCR_EL0 24  /* Control Register */
> +#define PMOVSSET_EL0 25  /* Overflow Flag Status Set Register */
> +#define PMOVSCLR_EL0 26  /* Overflow Flag Status Clear Register */
> +#define PMSELR_EL0   27  /* Event Counter Selection Register */
> +#define PMCEID0_EL0  28  /* Common Event Identification Register 0 */
> +#define PMCEID1_EL0  29  /* Common Event Identification Register 1 */
> +#define PMEVCNTR0_EL030  /* Event Counter Register (0-30) */
> +#define PMEVCNTR30_EL0   60
> +#define PMCCNTR_EL0  61  /* Cycle Counter Register */
> +#define PMEVTYPER0_EL0   62  /* Event Type Register (0-30) */
> +#define PMEVTYPER30_EL0  92
> +#define PMCCFILTR_EL093  /* Cycle Count Filter Register */
> +#define PMXEVCNTR_EL094  /* Selected Event Count Register */
> +#define PMXEVTYPER_EL0   95  /* Selected Event Type Register */
> +#define PMCNTENSET_EL0   96  /* Count Enable Set Register */
> +#define PMCNTENCLR_EL0   97  /* Count Enable Clear Register */
> +#define PMINTENSET_EL1   98  /* Interrupt Enable Set Register */
> +#define PMINTENCLR_EL1   99  /* Interrupt Enable Clear Register */
> +#define PMUSERENR_EL0100 /* User Enable Register */
> +#define PMSWINC_EL0  101 /* Software Increment Register */
> +
>  /* 32bit specific registers. Keep them at the end of the range */
> -#define  DACR32_EL2  24  /* Domain Access Control Register */
> -#define  IFSR32_EL2  25  /* Instruction Fault Status Register */
> -#define  FPEXC32_EL2 26  /* Floating-Point Exception Control 
> Register */
> -#define  DBGVCR32_EL227  /* Debug Vector Catch Register */
> -#define  TEECR32_EL1 28  /* ThumbEE Configuration Register */
> -#define  TEEHBR32_EL129  /* ThumbEE Handler Base Register */
> -#define  NR_SYS_REGS 30
> +#define  DACR32_EL2  102 /* Domain Access Control Register */
> +#define  IFSR32_EL2  103 /* Instruction Fault Status Register */
> +#define  FPEXC32_EL2 104 /* Floating-Point Exception Control 
> Register */
> +#define  DBGVCR32_EL2105 /* Debug Vector Catch Register */
> +#define  TEECR32_EL1 106 /* ThumbEE Configuration Register */
> +#define  TEEHBR32_EL1107 /* ThumbEE Handler Base Register */
> +#define  NR_SYS_REGS 108

This will need some rebasing - some of the registers have already
changed or disappeared. I really need to find a way to make this mess
more manageable...

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] genirq: Move irq_set_vcpu_affinity out of "#ifdef CONFIG_SMP"

2015-10-07 Thread Wu, Feng
Hi Thomas & Paolo,

> -Original Message-
> From: Jiang Liu [mailto:jiang@linux.intel.com]
> Sent: Saturday, October 03, 2015 5:11 PM
> To: Wu, Feng; t...@linutronix.de; pbonz...@redhat.com
> Cc: kvm@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH] genirq: Move irq_set_vcpu_affinity out of "#ifdef
> CONFIG_SMP"
> 
> On 2015/10/3 16:20, Feng Wu wrote:
> > irq_set_vcpu_affinity() is needed when CONFIG_SMP=n, so move the
> > definition out of "#ifdef CONFIG_SMP"

What is your option about this patch, Thanks a lot!

Thanks,
Feng

> >
> > Suggested-by: Paolo Bonzini 
> > Signed-off-by: Feng Wu 
> 
> Reviewed-by: Jiang Liu 
> 
> > ---
> >  kernel/irq/manage.c | 62
> ++---
> >  1 file changed, 31 insertions(+), 31 deletions(-)
> >
> > diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> > index 1c58655..90b378d 100644
> > --- a/kernel/irq/manage.c
> > +++ b/kernel/irq/manage.c
> > @@ -258,37 +258,6 @@ int irq_set_affinity_hint(unsigned int irq, const
> struct cpumask *m)
> >  }
> >  EXPORT_SYMBOL_GPL(irq_set_affinity_hint);
> >
> > -/**
> > - * irq_set_vcpu_affinity - Set vcpu affinity for the interrupt
> > - * @irq: interrupt number to set affinity
> > - * @vcpu_info: vCPU specific data
> > - *
> > - * This function uses the vCPU specific data to set the vCPU
> > - * affinity for an irq. The vCPU specific data is passed from
> > - * outside, such as KVM. One example code path is as below:
> > - * KVM -> IOMMU -> irq_set_vcpu_affinity().
> > - */
> > -int irq_set_vcpu_affinity(unsigned int irq, void *vcpu_info)
> > -{
> > -   unsigned long flags;
> > -   struct irq_desc *desc = irq_get_desc_lock(irq, , 0);
> > -   struct irq_data *data;
> > -   struct irq_chip *chip;
> > -   int ret = -ENOSYS;
> > -
> > -   if (!desc)
> > -   return -EINVAL;
> > -
> > -   data = irq_desc_get_irq_data(desc);
> > -   chip = irq_data_get_irq_chip(data);
> > -   if (chip && chip->irq_set_vcpu_affinity)
> > -   ret = chip->irq_set_vcpu_affinity(data, vcpu_info);
> > -   irq_put_desc_unlock(desc, flags);
> > -
> > -   return ret;
> > -}
> > -EXPORT_SYMBOL_GPL(irq_set_vcpu_affinity);
> > -
> >  static void irq_affinity_notify(struct work_struct *work)
> >  {
> > struct irq_affinity_notify *notify =
> > @@ -424,6 +393,37 @@ setup_affinity(struct irq_desc *desc, struct
> cpumask *mask)
> >  }
> >  #endif
> >
> > +/**
> > + * irq_set_vcpu_affinity - Set vcpu affinity for the interrupt
> > + * @irq: interrupt number to set affinity
> > + * @vcpu_info: vCPU specific data
> > + *
> > + * This function uses the vCPU specific data to set the vCPU
> > + * affinity for an irq. The vCPU specific data is passed from
> > + * outside, such as KVM. One example code path is as below:
> > + * KVM -> IOMMU -> irq_set_vcpu_affinity().
> > + */
> > +int irq_set_vcpu_affinity(unsigned int irq, void *vcpu_info)
> > +{
> > +   unsigned long flags;
> > +   struct irq_desc *desc = irq_get_desc_lock(irq, , 0);
> > +   struct irq_data *data;
> > +   struct irq_chip *chip;
> > +   int ret = -ENOSYS;
> > +
> > +   if (!desc)
> > +   return -EINVAL;
> > +
> > +   data = irq_desc_get_irq_data(desc);
> > +   chip = irq_data_get_irq_chip(data);
> > +   if (chip && chip->irq_set_vcpu_affinity)
> > +   ret = chip->irq_set_vcpu_affinity(data, vcpu_info);
> > +   irq_put_desc_unlock(desc, flags);
> > +
> > +   return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(irq_set_vcpu_affinity);
> > +
> >  void __disable_irq(struct irq_desc *desc)
> >  {
> > if (!desc->depth++)
> >
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 00/16] KVM: arm64: GICv3 ITS emulation

2015-10-07 Thread Marc Zyngier
On Wed, 7 Oct 2015 21:09:07 +0300
Pavel Fedin  wrote:

>  Hello!
> 
> > LPIs do not have an active state, at the redistributor or otherwise.
> 
>  Then what do they become after they were ACK'ed and before EOI'ed?

Nothing. They are gone. What is left at the CPU interface is the active
priority.

>  I tried to google up this thing, and came up with this email:
> http://www.spinics.net/lists/kvm-arm/msg16032.html. It says that "SW must 
> issue a write to EOI to
> clear the active priorities register, hence the CPU interface still requires 
> an active state for
> LPIs". They give a link to some document which seems to be top-secret and 
> never published, because
> my arch reference manual does not have section 4.8.3 named "Properties of 
> LPI".

Your architecture document has a section 1.2.1 which contains the
sentence: "LPIs do not have an active state, and therefore do not
require explicit deactivation.". It also has 1.2.2 ("Interrupt states")
that repeatedly states the same thing. Finally, the email you quote is
about priority drop vs deactivation, not about the active state of an
LPI.

>  And another thread,
> http://lists.xen.org/archives/html/xen-devel/2014-09/msg01141.html,
> says that virtual LPIs actually do have active state in LR.

Or not. Read again. The only case where something vaguely relevant
happens is when you inject a virtual SPI backed by a HW LPI. In that
case, the LR does have an active state (of course, this is an SPI). Or
when you inject a virtual LPI backed by a HW SPI (in which case the
relevant active state is in the physical distributor, not in the LR).

I'd appreciate if you could try to read and understand the architecture
spec instead of randomly googling and quoting various bits of
irrelevant information.

If something is unclear in the architecture specification (yes, this
is complicated and sometimes confusing), please ask relevant questions.
At the moment, you're just asserting fallacies, and I'd rather spend
time doing something useful instead of setting the record straight
again and again.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: arm/arm64: Add VGICv3 save/restore API documentation

2015-10-07 Thread Christoffer Dall
Factor out the GICv3-specific documentation into a separate
documentation file.  Add description for how to access distributor,
redistributor, and CPU interface registers for GICv3 in this new file.

Acked-by: Peter Maydell 
Acked-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
 Documentation/virtual/kvm/devices/arm-vgic-v3.txt | 114 ++
 Documentation/virtual/kvm/devices/arm-vgic.txt|  20 +---
 2 files changed, 118 insertions(+), 16 deletions(-)
 create mode 100644 Documentation/virtual/kvm/devices/arm-vgic-v3.txt

diff --git a/Documentation/virtual/kvm/devices/arm-vgic-v3.txt 
b/Documentation/virtual/kvm/devices/arm-vgic-v3.txt
new file mode 100644
index 000..383f9be
--- /dev/null
+++ b/Documentation/virtual/kvm/devices/arm-vgic-v3.txt
@@ -0,0 +1,114 @@
+ARM Virtual Generic Interrupt Controller v3 and later (VGICv3)
+==
+
+
+Device types supported:
+  KVM_DEV_TYPE_ARM_VGIC_V3 ARM Generic Interrupt Controller v3.0
+
+Only one VGIC instance may be instantiated through this API.  The created VGIC
+will act as the VM interrupt controller, requiring emulated user-space devices
+to inject interrupts to the VGIC instead of directly to CPUs.  It is not
+possible to create both a GICv3 and GICv2 on the same VM.
+
+Creating a guest GICv3 device requires a host GICv3 as well.
+
+Groups:
+  KVM_DEV_ARM_VGIC_GRP_ADDR
+  Attributes:
+KVM_VGIC_V3_ADDR_TYPE_DIST (rw, 64-bit)
+  Base address in the guest physical address space of the GICv3 distributor
+  register mappings. Only valid for KVM_DEV_TYPE_ARM_VGIC_V3.
+  This address needs to be 64K aligned and the region covers 64 KByte.
+
+KVM_VGIC_V3_ADDR_TYPE_REDIST (rw, 64-bit)
+  Base address in the guest physical address space of the GICv3
+  redistributor register mappings. There are two 64K pages for each
+  VCPU and all of the redistributor pages are contiguous.
+  Only valid for KVM_DEV_TYPE_ARM_VGIC_V3.
+  This address needs to be 64K aligned.
+
+
+  KVM_DEV_ARM_VGIC_GRP_DIST_REGS
+  KVM_DEV_ARM_VGIC_GRP_REDIST_REGS
+  Attributes:
+The attr field of kvm_device_attr encodes two values:
+bits: | 63     32  |  31   0 |
+values:   |  mpidr |  offset |
+
+All distributor regs are (rw, 64-bit).
+
+KVM_DEV_ARM_VGIC_GRP_DIST_REGS accesses the main distributor registers.
+KVM_DEV_ARM_VGIC_GRP_REDIST_REGS accesses the redistributor of the CPU
+specified by the mpidr.
+
+The offset is relative to the "[Re]Distributor base address" as defined
+in the GICv3/4 specs.  Getting or setting such a register has the same
+effect as reading or writing the register on real hardware, and the mpidr
+field is used to specify which redistributor is accessed.  The mpidr is
+ignored for the distributor.
+
+The mpidr encoding is based on the affinity information in the
+architecture defined MPIDR, and the field is encoded as follows:
+  | 63  56 | 55  48 | 47  40 | 39  32 |
+  |Aff3|Aff2|Aff1|Aff0|
+
+Note that distributor fields are not banked, but return the same value
+regardless of the mpidr used to access the register.
+  Limitations:
+- Priorities are not implemented, and registers are RAZ/WI
+  Errors:
+-ENXIO: Getting or setting this register is not yet supported
+-EBUSY: One or more VCPUs are running
+
+
+  KVM_DEV_ARM_VGIC_CPU_SYSREGS
+  Attributes:
+The attr field of kvm_device_attr encodes two values:
+bits: | 63     32 | 31    16 | 15    0 |
+values:   | mpidr |  RES |instr|
+
+The mpidr field encodes the CPU ID based on the affinity information in the
+architecture defined MPIDR, and the field is encoded as follows:
+  | 63  56 | 55  48 | 47  40 | 39  32 |
+  |Aff3|Aff2|Aff1|Aff0   |
+
+The instr field encodes the system register to access based on the fields
+defined in the A64 instruction set encoding for system register access
+(RES means the bits are reserved for future use and should be zero):
+
+  | 15 ... 14 | 13 ... 11 | 10 ... 7 | 6 ... 3 | 2 ... 0 |
+  |   Op 0|Op1|CRn   |   CRm   |   Op2   |
+
+All system regs accessed through this API are (rw, 64-bit).
+
+KVM_DEV_ARM_VGIC_CPU_SYSREGS accesses the CPU interface registers for the
+CPU specified by the mpidr field.
+
+
+  Limitations:
+- Priorities are not implemented, and registers are RAZ/WI
+  Errors:
+-ENXIO: Getting or setting this register is not yet supported
+-EBUSY: One or more VCPUs are running
+
+
+  KVM_DEV_ARM_VGIC_GRP_NR_IRQS
+  Attributes:
+A value describing the number of interrupts (SGI, PPI and SPI) for
+this GIC instance, ranging from 

Re: [PATCH v2 11/15] KVM: arm64: handle pending bit for LPIs in ITS emulation

2015-10-07 Thread Christoffer Dall
On Wed, Oct 07, 2015 at 11:39:30AM +0300, Pavel Fedin wrote:
>  Hello!
> 
> > +/* Called with the distributor lock held by the caller. */
> > +void vits_unqueue_lpi(struct kvm_vcpu *vcpu, int lpi)
> > +{
> > +   struct vgic_its *its = >kvm->arch.vgic.its;
> > +   struct its_itte *itte;
> > +
> > +   spin_lock(>lock);
> > +
> > +   /* Find the right ITTE and put the pending state back in there */
> > +   itte = find_itte_by_lpi(vcpu->kvm, lpi);
> > +   if (itte)
> > +   __set_bit(vcpu->vcpu_id, itte->pending);
> > +
> > +   spin_unlock(>lock);
> > +}
> 
>  I am working on implementing live migration for the ITS. And here i have one 
> fundamental problem.
> vits_unqueue_lpi() processes only PENDING state. And looks like this 
> corresponds to the HW
> implementation, which has only bitwise pending table. But, in terms of 
> migration, we can actually
> have LPI in active state, while it's being processed.

I thought LPIs had strict fire-and-forget semantics, not allowing any
active state, and that they are either pending or inactive?


>  The question is - how can we handle it? Should we have one more bitwise 
> table for active LPIs, or
> is it enough to remember only a single, currently active LPI? Can LPIs be 
> preempted on a real
> hardware, or not?
> 
Perhaps you're asking if LPIs have active state semantics on real
hardware and thus supports threaded interrupt handling for LPIs?

That is not supported on real hardware, which I think addresses your
concerns.

Thanks,
-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Gelten Sie für dringende Darlehen bieten1.4%

2015-10-07 Thread BancoPosta Online Loans



--
BancoPosta Loans
Viale Europa,
175-00144 Roma,
Italy.

Email: bancoposta-loansfact...@outlook.com

Zu wen es angeht.

BancopPosta Management bietet einfache und günstige Darlehen online für
Einzelperson/Firmen, braucht das Geld zu investieren oder ein Geschäft 
zu
starten. Darlehensbetrag von bis 10,000.00 bis Euro 100,000,000.00 Euro 
in

Höhe von 1.4% pro Jahr. Außerdem bieten wir Folgendes:

* Zweite Hypothek Darlehen
* Internationale Darlehen
* Schuldenkonsolidierung
* Business-Darlehen
* Persönliche Darlehen
* Darlehen Investoren

Für mehr Infos senden in ausführlichen Kredit-Vorschlag:

Vollständiger Name:
Land:
Kreditbetrag:
Darlehen-Dauer:

Wir erwarten Ihre Antwort um uns zu aktivieren gehen Sie mit der
Kredit-Dokumentation

Danke

Massimo Sarmi.
Kredit-Auszahlung-Manager
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/15] arm64: Introduce helpers for page table levels

2015-10-07 Thread Suzuki K. Poulose

On 07/10/15 10:26, Marc Zyngier wrote:

On 07/10/15 09:26, Christoffer Dall wrote:

Hi Suzuki,

On Tue, Sep 15, 2015 at 04:41:12PM +0100, Suzuki K. Poulose wrote:

From: "Suzuki K. Poulose" 

Introduce helpers for finding the number of page table
levels required for a given VA width, shift for a particular
page table level.

Convert the existing users to the new helpers. More users
to follow.

Cc: Ard Biesheuvel 
Cc: Mark Rutland 
Cc: Catalin Marinas 
Cc: Will Deacon 
Signed-off-by: Suzuki K. Poulose 
Reviewed-by: Ard Biesheuvel 
Tested-by: Ard Biesheuvel 
---
  arch/arm64/include/asm/pgtable-hwdef.h |   15 ---
  1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-hwdef.h 
b/arch/arm64/include/asm/pgtable-hwdef.h
index 24154b0..ce18389 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -16,13 +16,21 @@
  #ifndef __ASM_PGTABLE_HWDEF_H
  #define __ASM_PGTABLE_HWDEF_H

+/*
+ * Number of page-table levels required to address 'va_bits' wide
+ * address, without section mapping
+ */
+#define ARM64_HW_PGTABLE_LEVELS(va_bits) (((va_bits) - 4) / (PAGE_SHIFT - 3))


I don't understand the '(va_bits) - 4' here, can you explain it (and add a
comment to that effect) ?


I just had a chat with Catalin, who did shed some light on this.
It all has to do with rounding up. What you would like to have here is:

#define ARM64_HW_PGTABLE_LEVELS(va_bits) DIV_ROUND_UP(va_bits - PAGE_SHIFT, 
PAGE_SHIFT - 3)

where (va_bits - PAGE_SHIFT) is the total number of bits we deal
with during a page table walk, and (PAGE_SHIFT - 3) is the number
of bits we deal with per level.

The clue is in how DIV_ROUND_UP is written:

#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))

which gives you Suzuki's magic formula.


Thanks Marc for pitching in. That explains it better.



I'd vote for the DIV_ROUND_UP(), which will make things a lot more readable.


Sure, I can change that.

Suzuki

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM: x86: fix edge EOI and IOAPIC reconfig race

2015-10-07 Thread Paolo Bonzini


On 06/10/2015 22:33, Radim Krčmář wrote:
> 2015-08-15 02:00+0200, Paolo Bonzini:
>> On 14/08/2015 10:38, Radim Krčmář wrote:
 How do you reproduce the bug?
>>> I run rhel4 (2.6.9) kernel on 2 VCPUs and frequently alternate
>>> smp_affinity of "timer".  The bug is hit within seconds.
>>
>> Nice, I'll try to make a unit test for it on the plane. :)
> 
> (Time on planes is best spent by sleeping ;])

That's what I ended up doing. :)

> What do you think about the series?

I had just a few small comments.  I had all but forgotten it, sorry.
The unit test makes the bug clear, thanks.  What's the issue with
handle_irq?

Paolo

> I made a prototype kvm-unit-test ...
> (Reproduces with APICv or split irqchip builds.)
> ---8<---
> x86: test IO-APIC dest_id modification before
> 
> Regression test for kvm commit $TODO.
> Run with '-smp 2'.
> 
> I had problems with handle_irq so this version uses asm workaround;
> will fix that in final version.  Initialization also presumes that
> everything will work as it did on my machines :)
> ---
>  x86/ioapic.c | 47 +--
>  1 file changed, 45 insertions(+), 2 deletions(-)
> 
> diff --git a/x86/ioapic.c b/x86/ioapic.c
> index 1fe1ccc9eadd..55f8eea03496 100644
> --- a/x86/ioapic.c
> +++ b/x86/ioapic.c
> @@ -4,21 +4,27 @@
>  #include "smp.h"
>  #include "desc.h"
>  #include "isr.h"
> +#include "io.h"
>  
>  #define EDGE_TRIGGERED 0
>  #define LEVEL_TRIGGERED 1
>  
> -static void set_ioapic_redir(unsigned line, unsigned vec, unsigned trig_mode)
> +static void __set_ioapic_redir(unsigned line, u8 vec, bool trig_mode, u8 
> dest_id)
>  {
>   ioapic_redir_entry_t e = {
>   .vector = vec,
> - .delivery_mode = 0,
>   .trig_mode = trig_mode,
> + .dest_id = dest_id,
>   };
>  
>   ioapic_write_redir(line, e);
>  }
>  
> +static void set_ioapic_redir(unsigned line, unsigned vec, unsigned trig_mode)
> +{
> + __set_ioapic_redir(line, vec, trig_mode, 0);
> +}
> +
>  static void set_irq_line(unsigned line, int val)
>  {
>   asm volatile("out %0, %1" : : "a"((u8)val), "d"((u16)(0x2000 + line)));
> @@ -298,6 +304,41 @@ static void test_ioapic_level_retrigger_mask(void)
>   set_mask(0x0e, false);
>  }
>  
> +static volatile int pit_working = -1;
> +
> +static void __attribute__((used)) pit_isr_fn(void)
> +{
> + if (!pit_working++);
> + __set_ioapic_redir(0x2, 0x84, EDGE_TRIGGERED, 1);
> + eoi();
> +}
> +
> +void pit_isr(void);
> +asm (
> + "pit_isr: \n"
> + "   call pit_isr_fn \n"
> +#ifndef __x86_64__
> + "   iret"
> +#else
> + "   iretq"
> +#endif
> + );
> +
> +static void test_ioapic_eoi_bug(void)
> +{
> + if (cpu_count() < 2)
> + return;
> +
> + set_idt_entry(0x84, pit_isr, 0);
> + __set_ioapic_redir(0x2, 0x84, EDGE_TRIGGERED, 0);
> +
> + outb(0x34, 0x43);
> +
> + for (unsigned loops = 1; loops && pit_working < 1; loops--)
> + asm volatile ("nop");
> +
> + report("dest_id reconfiguration before EOI", pit_working >= 1);
> +}
>  
>  int main(void)
>  {
> @@ -325,5 +366,7 @@ int main(void)
>   test_ioapic_level_mask();
>   test_ioapic_level_retrigger_mask();
>  
> + test_ioapic_eoi_bug();
> +
>   return report_summary();
>  }
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 11/15] arm64: Cleanup VTCR_EL2 computation

2015-10-07 Thread Suzuki K. Poulose

On 07/10/15 11:11, Marc Zyngier wrote:

On 15/09/15 16:41, Suzuki K. Poulose wrote:

From: "Suzuki K. Poulose" 




diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index bdf139e..699554d 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -138,6 +138,9 @@
   * The magic numbers used for VTTBR_X in this patch can be found in Tables
   * D4-23 and D4-25 in ARM DDI 0487A.b.
   */
+#define VTCR_EL2_COMMON_BITS   (VTCR_EL2_SH0_INNER | VTCR_EL2_ORGN0_WBWA | \
+VTCR_EL2_IRGN0_WBWA | VTCR_EL2_T0SZ_40B)
+
  #ifdef CONFIG_ARM64_64K_PAGES
  /*
   * Stage2 translation configuration:
@@ -145,9 +148,8 @@
   * 64kB pages (TG0 = 1)
   * 2 level page tables (SL = 1)
   */
-#define VTCR_EL2_FLAGS (VTCR_EL2_TG0_64K | VTCR_EL2_SH0_INNER | \
-VTCR_EL2_ORGN0_WBWA | VTCR_EL2_IRGN0_WBWA | \
-VTCR_EL2_SL0_LVL1 | VTCR_EL2_T0SZ_40B)
+#define VTCR_EL2_FLAGS (VTCR_EL2_TG0_64K | VTCR_EL2_SL0_LVL1 | \
+VTCR_EL2_COMMON_BITS)
  #define VTTBR_X   (38 - VTCR_EL2_T0SZ_40B)
  #else
  /*
@@ -156,9 +158,8 @@
   * 4kB pages (TG0 = 0)
   * 3 level page tables (SL = 1)
   */
-#define VTCR_EL2_FLAGS (VTCR_EL2_TG0_4K | VTCR_EL2_SH0_INNER | \
-VTCR_EL2_ORGN0_WBWA | VTCR_EL2_IRGN0_WBWA | \
-VTCR_EL2_SL0_LVL1 | VTCR_EL2_T0SZ_40B)
+#define VTCR_EL2_FLAGS (VTCR_EL2_TG0_4K | VTCR_EL2_SL0_LVL1 | \
+VTCR_EL2_COMMON_BITS)
  #define VTTBR_X   (37 - VTCR_EL2_T0SZ_40B)
  #endif




This looks OK, but is going to clash badly with 857d1a9 ("arm64: KVM:
set {v,}TCR_EL2 RES1 bits"). Nothing we can't fix though.



As discussed, I will rebase my series on top of 4.3-rc4 to avoid this.

Thanks
Suzuki

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS

2015-10-07 Thread Joerg Roedel
On Tue, Oct 06, 2015 at 01:59:27PM -0400, Bandan Das wrote:
> Joerg Roedel  writes:
> >
> > So svm->vmcb->control.next_rip is only written by hardware or in
> > svm_check_intercept(). Both cases write only to this field, if the
> > hardware supports X86_FEATURE_NRIPS. The write in nested_svm_vmexit only
> 
> Not until commit f104765b4f81fd74d69e0eb161e89096deade2db. So, an older L1
> kernel will trigger it.

But we don't care if L1 writes something into its own next_rip, as we
never read this value from its VMCB. We only copy the next_rip value we
get from our shadow-vmcb to it on an emulated vmexit. So I still don't
understand what triggers the reported problem or why the WARN_ON is
necessary.


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/15] arm64: Introduce helpers for page table levels

2015-10-07 Thread Suzuki K. Poulose

On 07/10/15 09:26, Christoffer Dall wrote:

Hi Suzuki,

On Tue, Sep 15, 2015 at 04:41:12PM +0100, Suzuki K. Poulose wrote:

From: "Suzuki K. Poulose" 

Introduce helpers for finding the number of page table
levels required for a given VA width, shift for a particular
page table level.

Convert the existing users to the new helpers. More users
to follow.

Cc: Ard Biesheuvel 
Cc: Mark Rutland 
Cc: Catalin Marinas 
Cc: Will Deacon 
Signed-off-by: Suzuki K. Poulose 
Reviewed-by: Ard Biesheuvel 
Tested-by: Ard Biesheuvel 
---
  arch/arm64/include/asm/pgtable-hwdef.h |   15 ---
  1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-hwdef.h 
b/arch/arm64/include/asm/pgtable-hwdef.h
index 24154b0..ce18389 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -16,13 +16,21 @@
  #ifndef __ASM_PGTABLE_HWDEF_H
  #define __ASM_PGTABLE_HWDEF_H

+/*
+ * Number of page-table levels required to address 'va_bits' wide
+ * address, without section mapping
+ */
+#define ARM64_HW_PGTABLE_LEVELS(va_bits) (((va_bits) - 4) / (PAGE_SHIFT - 3))


I don't understand the '(va_bits) - 4' here, can you explain it (and add a
comment to that effect) ?


As mentioned, I will change it to DIV_ROUND_UP() as suggested by Marc.




+#define ARM64_HW_PGTABLE_LEVEL_SHIFT(level) \
+   ((PAGE_SHIFT - 3) * (level) + 3)
+


While this change is clearly correct, if you can explain the math here
in a comment as well, that would be helpful.


Sure, will add a comment to that effect.

Suzuki

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/15] arm64: Introduce helpers for page table levels

2015-10-07 Thread Marc Zyngier
On 07/10/15 09:26, Christoffer Dall wrote:
> Hi Suzuki,
> 
> On Tue, Sep 15, 2015 at 04:41:12PM +0100, Suzuki K. Poulose wrote:
>> From: "Suzuki K. Poulose" 
>>
>> Introduce helpers for finding the number of page table
>> levels required for a given VA width, shift for a particular
>> page table level.
>>
>> Convert the existing users to the new helpers. More users
>> to follow.
>>
>> Cc: Ard Biesheuvel 
>> Cc: Mark Rutland 
>> Cc: Catalin Marinas 
>> Cc: Will Deacon 
>> Signed-off-by: Suzuki K. Poulose 
>> Reviewed-by: Ard Biesheuvel 
>> Tested-by: Ard Biesheuvel 
>> ---
>>  arch/arm64/include/asm/pgtable-hwdef.h |   15 ---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h 
>> b/arch/arm64/include/asm/pgtable-hwdef.h
>> index 24154b0..ce18389 100644
>> --- a/arch/arm64/include/asm/pgtable-hwdef.h
>> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
>> @@ -16,13 +16,21 @@
>>  #ifndef __ASM_PGTABLE_HWDEF_H
>>  #define __ASM_PGTABLE_HWDEF_H
>>  
>> +/*
>> + * Number of page-table levels required to address 'va_bits' wide
>> + * address, without section mapping
>> + */
>> +#define ARM64_HW_PGTABLE_LEVELS(va_bits) (((va_bits) - 4) / (PAGE_SHIFT - 
>> 3))
> 
> I don't understand the '(va_bits) - 4' here, can you explain it (and add a
> comment to that effect) ?

I just had a chat with Catalin, who did shed some light on this.
It all has to do with rounding up. What you would like to have here is:

#define ARM64_HW_PGTABLE_LEVELS(va_bits) DIV_ROUND_UP(va_bits - PAGE_SHIFT, 
PAGE_SHIFT - 3)

where (va_bits - PAGE_SHIFT) is the total number of bits we deal
with during a page table walk, and (PAGE_SHIFT - 3) is the number
of bits we deal with per level.

The clue is in how DIV_ROUND_UP is written:

#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))

which gives you Suzuki's magic formula.

I'd vote for the DIV_ROUND_UP(), which will make things a lot more readable.

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 12/15] arm: kvm: Move fake PGD handling to arch specific files

2015-10-07 Thread Marc Zyngier
On 15/09/15 16:41, Suzuki K. Poulose wrote:
> From: "Suzuki K. Poulose" 
> 
> Rearrange the code for fake pgd handling, which is applicable
> to only ARM64. The intention is to keep the common code cleaner,
> unaware of the underlying hacks.
> 
> Cc: kvm...@lists.cs.columbia.edu
> Cc: christoffer.d...@linaro.org
> Cc: marc.zyng...@arm.com
> Signed-off-by: Suzuki K. Poulose 
> ---
>  arch/arm/include/asm/kvm_mmu.h   |7 ++
>  arch/arm/kvm/mmu.c   |   44 
> +-
>  arch/arm64/include/asm/kvm_mmu.h |   43 +
>  3 files changed, 55 insertions(+), 39 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 405aa18..1c9aa8a 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -173,6 +173,13 @@ static inline unsigned int kvm_get_hwpgd_size(void)
>   return PTRS_PER_S2_PGD * sizeof(pgd_t);
>  }
>  
> +static inline pgd_t *kvm_setup_fake_pgd(pgd_t *pgd)
> +{
> + return pgd;
> +}
> +
> +static inline void kvm_free_fake_pgd(pgd_t *pgd) {}
> +
>  struct kvm;
>  
>  #define kvm_flush_dcache_to_poc(a,l) __cpuc_flush_dcache_area((a), (l))
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 7b42012..b210622 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -677,43 +677,11 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
>* guest, we allocate a fake PGD and pre-populate it to point
>* to the next-level page table, which will be the real
>* initial page table pointed to by the VTTBR.
> -  *
> -  * When KVM_PREALLOC_LEVEL==2, we allocate a single page for
> -  * the PMD and the kernel will use folded pud.
> -  * When KVM_PREALLOC_LEVEL==1, we allocate 2 consecutive PUD
> -  * pages.
>*/
> - if (KVM_PREALLOC_LEVEL > 0) {
> - int i;
> -
> - /*
> -  * Allocate fake pgd for the page table manipulation macros to
> -  * work.  This is not used by the hardware and we have no
> -  * alignment requirement for this allocation.
> -  */
> - pgd = kmalloc(PTRS_PER_S2_PGD * sizeof(pgd_t),
> - GFP_KERNEL | __GFP_ZERO);
> -
> - if (!pgd) {
> - kvm_free_hwpgd(hwpgd);
> - return -ENOMEM;
> - }
> -
> - /* Plug the HW PGD into the fake one. */
> - for (i = 0; i < PTRS_PER_S2_PGD; i++) {
> - if (KVM_PREALLOC_LEVEL == 1)
> - pgd_populate(NULL, pgd + i,
> -  (pud_t *)hwpgd + i * PTRS_PER_PUD);
> - else if (KVM_PREALLOC_LEVEL == 2)
> - pud_populate(NULL, pud_offset(pgd, 0) + i,
> -  (pmd_t *)hwpgd + i * PTRS_PER_PMD);
> - }
> - } else {
> - /*
> -  * Allocate actual first-level Stage-2 page table used by the
> -  * hardware for Stage-2 page table walks.
> -  */
> - pgd = (pgd_t *)hwpgd;
> + pgd = kvm_setup_fake_pgd(hwpgd);
> + if (IS_ERR(pgd)) {
> + kvm_free_hwpgd(hwpgd);
> + return PTR_ERR(pgd);
>   }
>  
>   kvm_clean_pgd(pgd);
> @@ -820,9 +788,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
>  
>   unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
>   kvm_free_hwpgd(kvm_get_hwpgd(kvm));
> - if (KVM_PREALLOC_LEVEL > 0)
> - kfree(kvm->arch.pgd);
> -
> + kvm_free_fake_pgd(kvm->arch.pgd);
>   kvm->arch.pgd = NULL;
>  }
>  
> diff --git a/arch/arm64/include/asm/kvm_mmu.h 
> b/arch/arm64/include/asm/kvm_mmu.h
> index 6150567..2567fe8 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -198,6 +198,49 @@ static inline unsigned int kvm_get_hwpgd_size(void)
>   return PTRS_PER_S2_PGD * sizeof(pgd_t);
>  }
>  
> +/*
> + * Allocate fake pgd for the page table manipulation macros to
> + * work.  This is not used by the hardware and we have no
> + * alignment requirement for this allocation.
> + */
> +static inline pgd_t* kvm_setup_fake_pgd(pgd_t *hwpgd)
> +{
> + int i;
> + pgd_t *pgd;
> +
> + if (!KVM_PREALLOC_LEVEL)
> + return hwpgd;
> + /*
> +  * When KVM_PREALLOC_LEVEL==2, we allocate a single page for
> +  * the PMD and the kernel will use folded pud.
> +  * When KVM_PREALLOC_LEVEL==1, we allocate 2 consecutive PUD
> +  * pages.
> +  */
> + pgd = kmalloc(PTRS_PER_S2_PGD * sizeof(pgd_t),
> + GFP_KERNEL | __GFP_ZERO);
> +
> + if (!pgd)
> + return ERR_PTR(-ENOMEM);
> +
> + /* Plug the HW PGD into the fake one. */
> + for (i = 0; i < PTRS_PER_S2_PGD; i++) {
> + if (KVM_PREALLOC_LEVEL == 1)
> +

Re: [PATCH 13/15] arm64: kvm: Rewrite fake pgd handling

2015-10-07 Thread Marc Zyngier
On 15/09/15 16:41, Suzuki K. Poulose wrote:
> From: "Suzuki K. Poulose" 
> 
> The existing fake pgd handling code assumes that the stage-2 entry
> level can only be one level down that of the host, which may not be
> true always(e.g, with the introduction of 16k pagesize).
> 
> e.g.
> With 16k page size and 48bit VA and 40bit IPA we have the following
> split for page table levels:
> 
> level:  0   1 2 3
> bits : [47] [46 - 36] [35 - 25] [24 - 14] [13 - 0]
>  ^   ^ ^
>  |   | |
>host entry| x stage-2 entry
>  |
> IPA -x
> 
> The stage-2 entry level is 2, due to the concatenation of 16tables
> at level 2(mandated by the hardware). So, we need to fake two levels
> to actually reach the hyp page table. This case cannot be handled

Nit: this is the stage-2 PT, not HYP.

> with the existing code, as, all we know about is KVM_PREALLOC_LEVEL
> which kind of stands for two different pieces of information.
> 
> 1) Whether we have fake page table entry levels.
> 2) The entry level of stage-2 translation.
> 
> We loose the information about the number of fake levels that
> we may have to use. Also, KVM_PREALLOC_LEVEL computation itself
> is wrong, as we assume the hw entry level is always 1 level down
> from the host.
> 
> This patch introduces two seperate indicators :

Nit: "separate".

> 1) Accurate entry level for stage-2 translation - HYP_PGTABLE_ENTRY_LEVEL -
>using the new helpers.

Same confusion here. HYP has its own set of page tables, and this
definitely is S2, not HYP. Please update this symbol (and all the
similar ones) so that it is not confusing.

> 2) Number of levels of fake pagetable entries. (KVM_FAKE_PGTABLE_LEVELS)
> 
> The following conditions hold true for all cases(with 40bit IPA)
> 1) The stage-2 entry level <= 2
> 2) Number of fake page-table entries is in the inclusive range [0, 2].
> 
> Cc: kvm...@lists.cs.columbia.edu
> Cc: christoffer.d...@linaro.org
> Cc: marc.zyng...@arm.com
> Signed-off-by: Suzuki K. Poulose 
> ---
>  arch/arm64/include/asm/kvm_mmu.h |  114 
> --
>  1 file changed, 61 insertions(+), 53 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_mmu.h 
> b/arch/arm64/include/asm/kvm_mmu.h
> index 2567fe8..72cfd9e 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -41,18 +41,6 @@
>   */
>  #define TRAMPOLINE_VA(HYP_PAGE_OFFSET_MASK & PAGE_MASK)
>  
> -/*
> - * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation
> - * levels in addition to the PGD and potentially the PUD which are
> - * pre-allocated (we pre-allocate the fake PGD and the PUD when the Stage-2
> - * tables use one level of tables less than the kernel.
> - */
> -#ifdef CONFIG_ARM64_64K_PAGES
> -#define KVM_MMU_CACHE_MIN_PAGES  1
> -#else
> -#define KVM_MMU_CACHE_MIN_PAGES  2
> -#endif
> -
>  #ifdef __ASSEMBLY__
>  
>  /*
> @@ -80,6 +68,26 @@
>  #define KVM_PHYS_SIZE(1UL << KVM_PHYS_SHIFT)
>  #define KVM_PHYS_MASK(KVM_PHYS_SIZE - 1UL)
>  
> +/*
> + * At stage-2 entry level, upto 16 tables can be concatenated and
> + * the hardware expects us to use concatenation, whenever possible.
> + * So, number of page table levels for KVM_PHYS_SHIFT is always
> + * the number of normal page table levels for (KVM_PHYS_SHIFT - 4).
> + */
> +#define HYP_PGTABLE_LEVELS   ARM64_HW_PGTABLE_LEVELS(KVM_PHYS_SHIFT - 4)
> +/* Number of bits normally addressed by HYP_PGTABLE_LEVELS */
> +#define HYP_PGTABLE_SHIFTARM64_HW_PGTABLE_LEVEL_SHIFT(HYP_PGTABLE_LEVELS 
> + 1)

Why +1? I don't understand where that is coming from... which makes the
rest of the patch fairly opaque to me...

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM: x86: fix edge EOI and IOAPIC reconfig race

2015-10-07 Thread Paolo Bonzini


On 13/08/2015 15:46, Radim Krčmář wrote:
> + if (kvm_x86_ops->sync_pir_to_irr(vcpu))
> + kvm_make_request(KVM_REQ_EVENT, vcpu);
> +

The call to sync_pir_to_irr belongs more in vcpu_scan_ioapic, I think.

More importantly, I think that KVM_REQ_EVENT is a latent bug for
kvm_vcpu_ioctl_get_lapic as well, so the call to kvm_make_request should
go in vmx_sync_pir_to_irr or in a new kvm_sync_pir_to_irr wrapper.

> + (e->fields.trig_mode == IOAPIC_EDGE_TRIG &&
> +  kvm_apic_pending_eoi(vcpu, e->fields.vector)))

Should we test again here that kvm_irq_has_notifier(ioapic->kvm,
KVM_IRQCHIP_IOAPIC, index), to avoid unnecessarily marking other
edge-triggered interrupts?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 11/15] arm64: Cleanup VTCR_EL2 computation

2015-10-07 Thread Marc Zyngier
On 15/09/15 16:41, Suzuki K. Poulose wrote:
> From: "Suzuki K. Poulose" 
> 
> No functional changes. Group the common bits for VCTR_EL2
> initialisation for better readability. The granule size
> and the entry level are controlled by the page size.
> 
> Cc: Christoffer Dall 
> Cc: Marc Zyngier 
> Cc: kvm...@lists.cs.columbia.edu
> Signed-off-by: Suzuki K. Poulose 
> ---
>  arch/arm64/include/asm/kvm_arm.h |   13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h 
> b/arch/arm64/include/asm/kvm_arm.h
> index bdf139e..699554d 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -138,6 +138,9 @@
>   * The magic numbers used for VTTBR_X in this patch can be found in Tables
>   * D4-23 and D4-25 in ARM DDI 0487A.b.
>   */
> +#define VTCR_EL2_COMMON_BITS (VTCR_EL2_SH0_INNER | VTCR_EL2_ORGN0_WBWA | \
> +  VTCR_EL2_IRGN0_WBWA | VTCR_EL2_T0SZ_40B)
> +
>  #ifdef CONFIG_ARM64_64K_PAGES
>  /*
>   * Stage2 translation configuration:
> @@ -145,9 +148,8 @@
>   * 64kB pages (TG0 = 1)
>   * 2 level page tables (SL = 1)
>   */
> -#define VTCR_EL2_FLAGS   (VTCR_EL2_TG0_64K | VTCR_EL2_SH0_INNER 
> | \
> -  VTCR_EL2_ORGN0_WBWA | VTCR_EL2_IRGN0_WBWA | \
> -  VTCR_EL2_SL0_LVL1 | VTCR_EL2_T0SZ_40B)
> +#define VTCR_EL2_FLAGS   (VTCR_EL2_TG0_64K | VTCR_EL2_SL0_LVL1 | 
> \
> +  VTCR_EL2_COMMON_BITS)
>  #define VTTBR_X  (38 - VTCR_EL2_T0SZ_40B)
>  #else
>  /*
> @@ -156,9 +158,8 @@
>   * 4kB pages (TG0 = 0)
>   * 3 level page tables (SL = 1)
>   */
> -#define VTCR_EL2_FLAGS   (VTCR_EL2_TG0_4K | VTCR_EL2_SH0_INNER | 
> \
> -  VTCR_EL2_ORGN0_WBWA | VTCR_EL2_IRGN0_WBWA | \
> -  VTCR_EL2_SL0_LVL1 | VTCR_EL2_T0SZ_40B)
> +#define VTCR_EL2_FLAGS   (VTCR_EL2_TG0_4K | VTCR_EL2_SL0_LVL1 | \
> +  VTCR_EL2_COMMON_BITS)
>  #define VTTBR_X  (37 - VTCR_EL2_T0SZ_40B)
>  #endif
>  
> 

This looks OK, but is going to clash badly with 857d1a9 ("arm64: KVM:
set {v,}TCR_EL2 RES1 bits"). Nothing we can't fix though.

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM

2015-10-07 Thread Stefan Hajnoczi
On Wed, Aug 26, 2015 at 06:49:35PM +0800, Xiao Guangrong wrote:
> On 08/26/2015 12:26 AM, Stefan Hajnoczi wrote:
> >On Fri, Aug 14, 2015 at 10:51:53PM +0800, Xiao Guangrong wrote:
> >Have you thought about live migration?
> >
> >Are the contents of the NVDIMM migrated since they are registered as a
> >RAM region?
> 
> Will fully test live migration and VM save before sending the V3 out. :)

Hi,
What is the status of this patch series?

Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: svm: Only propagate next_rip when guest supports it

2015-10-07 Thread kbuild test robot
Hi Joerg,

[auto build test ERROR on v4.3-rc4 -- if it's inappropriate base, please ignore]

config: i386-randconfig-x009-201540 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   arch/x86/kvm/svm.c: In function 'nested_svm_vmexit':
>> arch/x86/kvm/svm.c:2369:28: error: 'vcpu' undeclared (first use in this 
>> function)
 if (guest_cpuid_has_nrips(vcpu))
   ^
   arch/x86/kvm/svm.c:2369:28: note: each undeclared identifier is reported 
only once for each function it appears in

vim +/vcpu +2369 arch/x86/kvm/svm.c

  2363  nested_vmcb->control.exit_code_hi  = 
vmcb->control.exit_code_hi;
  2364  nested_vmcb->control.exit_info_1   = 
vmcb->control.exit_info_1;
  2365  nested_vmcb->control.exit_info_2   = 
vmcb->control.exit_info_2;
  2366  nested_vmcb->control.exit_int_info = 
vmcb->control.exit_int_info;
  2367  nested_vmcb->control.exit_int_info_err = 
vmcb->control.exit_int_info_err;
  2368  
> 2369  if (guest_cpuid_has_nrips(vcpu))
  2370  nested_vmcb->control.next_rip  = vmcb->control.next_rip;
  2371  
  2372  /*

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


[PATCH] kvm: svm: Only propagate next_rip when guest supports it

2015-10-07 Thread Joerg Roedel
On Wed, Oct 07, 2015 at 01:03:35PM +0200, Joerg Roedel wrote:
> But we don't care if L1 writes something into its own next_rip, as we
> never read this value from its VMCB. We only copy the next_rip value we
> get from our shadow-vmcb to it on an emulated vmexit. So I still don't
> understand what triggers the reported problem or why the WARN_ON is
> necessary.

Okay, I think I have an idea now. I talked a bit with Dirk and the
WARN_ON triggers in the guest, and not on the host. This makes a lot
more sense.

In nested-svm we always copy the next_rip from the shadow-vmcb to the
guests vmcb, even when the nrips bit in cpuid is not set for the guest.
This obviously triggers the WARN_ON() in the L1 KVM (I still don't
understand why the WARN_ON was introduced in the first place).

So the right fix is to only copy next_rip to the guests vmcb when its
cpuid indicates that next_rip is supported there, like in this patch:

>From 019afc60507618b8e44e0c67d5ea2d850d88c9dd Mon Sep 17 00:00:00 2001
From: Joerg Roedel 
Date: Wed, 7 Oct 2015 13:38:19 +0200
Subject: [PATCH] kvm: svm: Only propagate next_rip when guest supports it

Currently we always write the next_rip of the shadow vmcb to
the guests vmcb when we emulate a vmexit. This could confuse
the guest when its cpuid indicated no support for the
next_rip feature.

Fix this by only propagating next_rip if the guest actually
supports it.

Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/cpuid.h | 21 +
 arch/x86/kvm/svm.c   |  7 ++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index dd05b9c..effca1f 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -133,4 +133,25 @@ static inline bool guest_cpuid_has_mpx(struct kvm_vcpu 
*vcpu)
best = kvm_find_cpuid_entry(vcpu, 7, 0);
return best && (best->ebx & bit(X86_FEATURE_MPX));
 }
+
+/*
+ * NRIPS is provided through cpuidfn 0x800a.edx bit 3
+ */
+#define BIT_NRIPS  3
+
+static inline bool guest_cpuid_has_nrips(struct kvm_vcpu *vcpu)
+{
+   struct kvm_cpuid_entry2 *best;
+
+   best = kvm_find_cpuid_entry(vcpu, 0x800a, 0);
+
+   /*
+* NRIPS is a scattered cpuid feature, so we can't use
+* X86_FEATURE_NRIPS here (X86_FEATURE_NRIPS would be bit
+* position 8, not 3).
+*/
+   return best && (best->edx & bit(BIT_NRIPS));
+}
+#undef BIT_NRIPS
+
 #endif
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 94b7d15..e1a8824 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2459,7 +2459,9 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
nested_vmcb->control.exit_info_2   = vmcb->control.exit_info_2;
nested_vmcb->control.exit_int_info = vmcb->control.exit_int_info;
nested_vmcb->control.exit_int_info_err = 
vmcb->control.exit_int_info_err;
-   nested_vmcb->control.next_rip  = vmcb->control.next_rip;
+
+   if (guest_cpuid_has_nrips(vcpu))
+   nested_vmcb->control.next_rip  = vmcb->control.next_rip;
 
/*
 * If we emulate a VMRUN/#VMEXIT in the same host #vmexit cycle we have
@@ -2714,6 +2716,9 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
svm->vmcb->control.event_inj = nested_vmcb->control.event_inj;
svm->vmcb->control.event_inj_err = nested_vmcb->control.event_inj_err;
 
+   /* Clear next_rip, as real hardware would do */
+   nested_vmcb->control.next_rip = 0;
+
nested_svm_unmap(page);
 
/* Enter Guest-Mode */
-- 
1.8.4.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Virtio drivers for WindowsServer 2012R2 network problems no outgoing traffic in Openstack

2015-10-07 Thread Daniel Seybold


Hi KVM developers,

currently I'm facing a tricky issue when I'm using the Virtio drivers
(Fedora [1]) for a WindowsServer2012R2 virtual machine on Openstack

I've created a WindowsServer Image following this tutorial [2]. I've
installed the virtio drivers  but without the Cloud-Init step.

I've created the Image in Openstack and started a VM.
I'm able to connect to the VM via powershell or remote desktop but I'm
not able to download any source from the internet.
I always receive a timeout.

Openstack Version: Kilo
QEMU emulator version 1.5.3 (qemu-kvm-1.5.3-86.el7_1.5), Copyright (c)
2003-2008 Fabrice Bellard
virtio drivers: virtio-win-0.1.102

I've done the same steps on an older Openstack setup with
virtio drivers: virtio-win-0.1.96
and everything works without a problem.

Openstack Version: Icehouse
QEMU PC emulator version 0.12.1 (qemu-kvm-0.12.1.2-2.448.el6_6),
Copyright (c) 2003-2008 Fabrice Bellard
virtio drivers: virtio-win-0.1.96

It is definately not related to Openstack Security groups or netowork
issus on the host.
There is also a unix VM on the host without any network issues.

Does anyone have a clue how to solve this issue?

Thanks in advance for your help!

Cheers,
Daniel

[1] https://fedoraproject.org/wiki/Windows_Virtio_Drivers
[2] http://docs.openstack.org/image-guide/content/windows-image.html



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 13/15] arm64: kvm: Rewrite fake pgd handling

2015-10-07 Thread Suzuki K. Poulose

On 07/10/15 12:13, Marc Zyngier wrote:

On 15/09/15 16:41, Suzuki K. Poulose wrote:

From: "Suzuki K. Poulose" 

The existing fake pgd handling code assumes that the stage-2 entry
level can only be one level down that of the host, which may not be
true always(e.g, with the introduction of 16k pagesize).

e.g.
With 16k page size and 48bit VA and 40bit IPA we have the following
split for page table levels:

level:  0   1 2 3
bits : [47] [46 - 36] [35 - 25] [24 - 14] [13 - 0]
  ^   ^ ^
  |   | |
host entry| x stage-2 entry
  |
 IPA -x

The stage-2 entry level is 2, due to the concatenation of 16tables
at level 2(mandated by the hardware). So, we need to fake two levels
to actually reach the hyp page table. This case cannot be handled


Nit: this is the stage-2 PT, not HYP.


with the existing code, as, all we know about is KVM_PREALLOC_LEVEL
which kind of stands for two different pieces of information.

1) Whether we have fake page table entry levels.
2) The entry level of stage-2 translation.

We loose the information about the number of fake levels that
we may have to use. Also, KVM_PREALLOC_LEVEL computation itself
is wrong, as we assume the hw entry level is always 1 level down
from the host.

This patch introduces two seperate indicators :


Nit: "separate".


1) Accurate entry level for stage-2 translation - HYP_PGTABLE_ENTRY_LEVEL -
using the new helpers.


Same confusion here. HYP has its own set of page tables, and this
definitely is S2, not HYP. Please update this symbol (and all the
similar ones) so that it is not confusing.



Sure, I will use S2 everywhere.


2) Number of levels of fake pagetable entries. (KVM_FAKE_PGTABLE_LEVELS)

The following conditions hold true for all cases(with 40bit IPA)
1) The stage-2 entry level <= 2
2) Number of fake page-table entries is in the inclusive range [0, 2].

Cc: kvm...@lists.cs.columbia.edu
Cc: christoffer.d...@linaro.org
Cc: marc.zyng...@arm.com
Signed-off-by: Suzuki K. Poulose 
---
  arch/arm64/include/asm/kvm_mmu.h |  114 --
  1 file changed, 61 insertions(+), 53 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 2567fe8..72cfd9e 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -41,18 +41,6 @@
   */
  #define TRAMPOLINE_VA (HYP_PAGE_OFFSET_MASK & PAGE_MASK)

-/*
- * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation
- * levels in addition to the PGD and potentially the PUD which are
- * pre-allocated (we pre-allocate the fake PGD and the PUD when the Stage-2
- * tables use one level of tables less than the kernel.
- */
-#ifdef CONFIG_ARM64_64K_PAGES
-#define KVM_MMU_CACHE_MIN_PAGES1
-#else
-#define KVM_MMU_CACHE_MIN_PAGES2
-#endif
-
  #ifdef __ASSEMBLY__

  /*
@@ -80,6 +68,26 @@
  #define KVM_PHYS_SIZE (1UL << KVM_PHYS_SHIFT)
  #define KVM_PHYS_MASK (KVM_PHYS_SIZE - 1UL)

+/*
+ * At stage-2 entry level, upto 16 tables can be concatenated and
+ * the hardware expects us to use concatenation, whenever possible.
+ * So, number of page table levels for KVM_PHYS_SHIFT is always
+ * the number of normal page table levels for (KVM_PHYS_SHIFT - 4).
+ */
+#define HYP_PGTABLE_LEVELS ARM64_HW_PGTABLE_LEVELS(KVM_PHYS_SHIFT - 4)
+/* Number of bits normally addressed by HYP_PGTABLE_LEVELS */
+#define HYP_PGTABLE_SHIFT  ARM64_HW_PGTABLE_LEVEL_SHIFT(HYP_PGTABLE_LEVELS 
+ 1)


Why +1? I don't understand where that is coming from... which makes the
rest of the patch fairly opaque to me...


Sorry for the confusion in the numbering of levels and the lack of comments.

Taking the above example in the description, with 16K.


ARM ARM entry

no. of
levels 4 3 2 1 0

vabits : [47] [46 - 36] [35 - 25] [24 - 14] [13 - 0]
^   ^^
|   ||
  host entry|x stage-2 entry
||
   IPA -xx- HYP_PGTABLE_SHIFT


1) ARM64_HW_PGTABLE_LEVEL_SHIFT(x) gives the size a level 'x' entry can map.

e.g, PTE_SHIFT => ARM64_HW_PGTABLE_LEVEL_SHIFT(1) => PAGE_SHIFT = 14
 PMD_SHIFT => ARM64_HW_PGTABLE_LEVEL_SHIFT(2) => (PAGE_SHIFT - 3) + 
PAGE_SHIFT = 25
 PUD_SHIFT => ARM64_HW_PGTABLE_LEVEL_SHIFT(3) => 36

and so on.

Now we get HYP_PAGETABLE_LEVELS = 2

To calculate the number of concatenated entries, we need to know the total 
size(HYP_PGTABLE_SHIFT)
that can be mapped by the hyp(stage2) page table with HYP_PGTABLE_LEVELS(2). It 
is
nothing but the size mapped by a (HYP_PGTABLE_LEVELS + 1) entry.
i.e, ARM64_HW_PGTABLE_LEVEL_SHIFT(3) = 36 ( = 39 for 4K)

We can use that to calculate the number of concatenated entries, by :

KVM_PHYS_SHIFT - HYP_PGTABLE_SHIFT

Numbering of the levels is a bit confusing. The ARM ARM numbers levels from the 

Re: [PATCH] Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS

2015-10-07 Thread Joerg Roedel
On Wed, Oct 07, 2015 at 10:58:07AM -0400, Bandan Das wrote:
> Ok, looks like I am making some incorrect "vmx" assumptions here. What happens
> when we exit from L2 to L0, arent' we looking at the VMCB L1 is using to run
> L2 ? Wouldn't that trigger the warning if the host processor does not support
> nrips and the field is set ?

No, because the L1 hypervisor can't write to the shadow-vmcb, and this
is the only one where we _read_ next_rip from.



Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 12/16] KVM: arm64: handle pending bit for LPIs in ITS emulation

2015-10-07 Thread Marc Zyngier
On 07/10/15 16:10, Pavel Fedin wrote:
>  Hello!
> 
>> As the actual LPI number in a guest can be quite high, but is mostly
>> assigned using a very sparse allocation scheme, bitmaps and arrays
>> for storing the virtual interrupt status are a waste of memory.
>> We use our equivalent of the "Interrupt Translation Table Entry"
>> (ITTE) to hold this extra status information for a virtual LPI.
> 
> You know, not that i'm strongly against current approach and want you
> to redo everything once again, but... Is it architecturally correct
> to intertwine LPIs and ITS so much? As far as i

Yes it is.

> understand arch manual, it is possible to have LPIs without ITS
> (triggered by something else?). Shouldn't we do the same, and just
> add LPI support to our redistributors, and then proceed with the 
> ITS?

No. We're implementing a monolithic GICv3 that doesn't offer writing to
the redistributors directly from a device. And frankly, that's good enough.

> As to memory consumption, do we really need to store own copy of
> tables? After all, it's just a memory. What if we map a pointer
> directly into guest's memory (which it writes to 
> PROPBASER/PENDBASER), and just keep it? There will be no issues with
> caching and synchronization at all.

Sure. And you then have to parse and validate all the tables each and
every time you're going to inject an interrupt (because the guest can
change the table content behind your back). You are quickly going to
notice that your performance is abysmal.

At that point, you're going to start being clever, and add a cache. And
guess what, that's what the HW does too. And then you'll make your cache
a convenient structure to be able to quickly inject interrupts. And
that's what the HW does too. And finally, you're going to realize that
populating a cache sucks, and you're going to keep all the state where
it is convenient, when it is convenient (and that's basically always).

The HW can't do that, but we can.

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v2 00/18] implement vNVDIMM

2015-10-07 Thread Xiao Guangrong



On 10/07/2015 10:02 PM, Stefan Hajnoczi wrote:

On Wed, Aug 26, 2015 at 06:49:35PM +0800, Xiao Guangrong wrote:

On 08/26/2015 12:26 AM, Stefan Hajnoczi wrote:

On Fri, Aug 14, 2015 at 10:51:53PM +0800, Xiao Guangrong wrote:
Have you thought about live migration?

Are the contents of the NVDIMM migrated since they are registered as a
RAM region?


Will fully test live migration and VM save before sending the V3 out. :)


Hi,
What is the status of this patch series?


This is huge change in v3, the patchset is ready now and it's being tested.
Will post it out (hopefully this week) after the long holiday in China. :)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM: x86: fix edge EOI and IOAPIC reconfig race

2015-10-07 Thread Radim Krčmář
2015-10-07 11:29+0200, Paolo Bonzini:
> On 13/08/2015 15:46, Radim Krčmář wrote:
>> +if (kvm_x86_ops->sync_pir_to_irr(vcpu))
>> +kvm_make_request(KVM_REQ_EVENT, vcpu);
>> +
> 
> The call to sync_pir_to_irr belongs more in vcpu_scan_ioapic, I think.

Will do so in v2.

> More importantly, I think that KVM_REQ_EVENT is a latent bug for
> kvm_vcpu_ioctl_get_lapic as well, so the call to kvm_make_request should
> go in vmx_sync_pir_to_irr or in a new kvm_sync_pir_to_irr wrapper.

True, thanks.  I'll make the request in kvm_apic_update_irr (unless
you'd prefer to have it in new kvm_sync_pir_to_irr).

>> +(e->fields.trig_mode == IOAPIC_EDGE_TRIG &&
>> + kvm_apic_pending_eoi(vcpu, e->fields.vector)))
> 
> Should we test again here that kvm_irq_has_notifier(ioapic->kvm,
> KVM_IRQCHIP_IOAPIC, index), to avoid unnecessarily marking other
> edge-triggered interrupts?

Other edge-triggered interrupts are skipped by a previous condition:

  if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
  kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index) ||
  index == RTC_GSI)
 [we're here]

I think it is ok to ignore level-triggered RTC, but we do want to
include edge-triggered.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 13/16] KVM: arm64: sync LPI configuration and pending tables

2015-10-07 Thread Andre Przywara
The LPI configuration and pending tables of the GICv3 LPIs are held
in tables in (guest) memory. To achieve reasonable performance, we
cache this data in our own data structures, so we need to sync those
two views from time to time. This behaviour is well described in the
GICv3 spec and is also exercised by hardware, so the sync points are
well known.

Provide functions that read the guest memory and store the
information from the configuration and pending tables in the kernel.

Signed-off-by: Andre Przywara 
---
Changelog v2..v3:
- rework functions to avoid propbaser/pendbaser accesses inside lock

 include/kvm/arm_vgic.h  |   2 +
 virt/kvm/arm/its-emul.c | 133 
 virt/kvm/arm/its-emul.h |   3 ++
 3 files changed, 138 insertions(+)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 035911f..4ea023c 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -179,6 +179,8 @@ struct vgic_its {
int cwriter;
struct list_headdevice_list;
struct list_headcollection_list;
+   /* memory used for buffering guest's memory */
+   void*buffer_page;
 };
 
 struct vgic_dist {
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
index 8349970..7a8c5db 100644
--- a/virt/kvm/arm/its-emul.c
+++ b/virt/kvm/arm/its-emul.c
@@ -59,6 +59,7 @@ struct its_itte {
struct its_collection *collection;
u32 lpi;
u32 event_id;
+   u8 priority;
bool enabled;
unsigned long *pending;
 };
@@ -80,8 +81,124 @@ static struct its_itte *find_itte_by_lpi(struct kvm *kvm, 
int lpi)
return NULL;
 }
 
+#define LPI_PROP_ENABLE_BIT(p) ((p) & LPI_PROP_ENABLED)
+#define LPI_PROP_PRIORITY(p)   ((p) & 0xfc)
+
+/* stores the priority and enable bit for a given LPI */
+static void update_lpi_config(struct kvm *kvm, struct its_itte *itte, u8 prop)
+{
+   itte->priority = LPI_PROP_PRIORITY(prop);
+   itte->enabled  = LPI_PROP_ENABLE_BIT(prop);
+}
+
+#define GIC_LPI_OFFSET 8192
+
+/* We scan the table in chunks the size of the smallest page size */
+#define CHUNK_SIZE 4096U
+
 #define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)
 
+static int nr_idbits_propbase(u64 propbaser)
+{
+   int nr_idbits = (1U << (propbaser & 0x1f)) + 1;
+
+   return max(nr_idbits, INTERRUPT_ID_BITS_ITS);
+}
+
+/*
+ * Scan the whole LPI configuration table and put the LPI configuration
+ * data in our own data structures. This relies on the LPI being
+ * mapped before.
+ */
+static bool its_update_lpis_configuration(struct kvm *kvm, u64 prop_base_reg)
+{
+   struct vgic_dist *dist = >arch.vgic;
+   u8 *prop = dist->its.buffer_page;
+   u32 tsize;
+   gpa_t propbase;
+   int lpi = GIC_LPI_OFFSET;
+   struct its_itte *itte;
+   struct its_device *device;
+   int ret;
+
+   propbase = BASER_BASE_ADDRESS(prop_base_reg);
+   tsize = nr_idbits_propbase(prop_base_reg);
+
+   while (tsize > 0) {
+   int chunksize = min(tsize, CHUNK_SIZE);
+
+   ret = kvm_read_guest(kvm, propbase, prop, chunksize);
+   if (ret)
+   return false;
+
+   spin_lock(>its.lock);
+   /*
+* Updating the status for all allocated LPIs. We catch
+* those LPIs that get disabled. We really don't care
+* about unmapped LPIs, as they need to be updated
+* later manually anyway once they get mapped.
+*/
+   for_each_lpi(device, itte, kvm) {
+   if (itte->lpi < lpi || itte->lpi >= lpi + chunksize)
+   continue;
+
+   update_lpi_config(kvm, itte, prop[itte->lpi - lpi]);
+   }
+   spin_unlock(>its.lock);
+   tsize -= chunksize;
+   lpi += chunksize;
+   propbase += chunksize;
+   }
+
+   return true;
+}
+
+/*
+ * Scan the whole LPI pending table and sync the pending bit in there
+ * with our own data structures. This relies on the LPI being
+ * mapped before.
+ */
+static bool its_sync_lpi_pending_table(struct kvm_vcpu *vcpu, u64 
base_addr_reg)
+{
+   struct vgic_dist *dist = >kvm->arch.vgic;
+   unsigned long *pendmask = dist->its.buffer_page;
+   u32 nr_lpis = VITS_NR_LPIS;
+   gpa_t pendbase;
+   int lpi = 0;
+   struct its_itte *itte;
+   struct its_device *device;
+   int ret;
+   int lpi_bit, nr_bits;
+
+   pendbase = BASER_BASE_ADDRESS(base_addr_reg);
+
+   while (nr_lpis > 0) {
+   nr_bits = min(nr_lpis, CHUNK_SIZE * 8);
+
+   ret = kvm_read_guest(vcpu->kvm, pendbase, pendmask,
+nr_bits / 8);
+   if (ret)
+   return false;
+
+   spin_lock(>its.lock);
+   

[PATCH v3 05/16] KVM: arm/arm64: extend arch CAP checks to allow per-VM capabilities

2015-10-07 Thread Andre Przywara
KVM capabilities can be a per-VM property, though ARM/ARM64 currently
does not pass on the VM pointer to the architecture specific
capability handlers.
Add a "struct kvm*" parameter to those function to later allow proper
per-VM capability reporting.

Signed-off-by: Andre Przywara 
---
Changelog v2..v3:
- none

 arch/arm/include/asm/kvm_host.h   | 2 +-
 arch/arm/kvm/arm.c| 2 +-
 arch/arm64/include/asm/kvm_host.h | 2 +-
 arch/arm64/kvm/reset.c| 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 3df1e97..88e84db 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -210,7 +210,7 @@ static inline void __cpu_init_hyp_mode(phys_addr_t 
boot_pgd_ptr,
kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
 }
 
-static inline int kvm_arch_dev_ioctl_check_extension(long ext)
+static inline int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
 {
return 0;
 }
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 6c7f4520..bdbefcd 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -197,7 +197,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = KVM_MAX_VCPUS;
break;
default:
-   r = kvm_arch_dev_ioctl_check_extension(ext);
+   r = kvm_arch_dev_ioctl_check_extension(kvm, ext);
break;
}
return r;
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 4562459..c41e613 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -43,7 +43,7 @@
 
 int __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
-int kvm_arch_dev_ioctl_check_extension(long ext);
+int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext);
 
 struct kvm_arch {
/* The VMID generation used for the virt. memory system */
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 91cf535..4d7f78b4 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -63,7 +63,7 @@ static bool cpu_has_32bit_el1(void)
  * We currently assume that the number of HW registers is uniform
  * across all CPUs (see cpuinfo_sanity_check).
  */
-int kvm_arch_dev_ioctl_check_extension(long ext)
+int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
 {
int r;
 
-- 
2.5.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 14/16] KVM: arm64: implement ITS command queue command handlers

2015-10-07 Thread Andre Przywara
The connection between a device, an event ID, the LPI number and the
allocated CPU is stored in in-memory tables in a GICv3, but their
format is not specified by the spec. Instead software uses a command
queue in a ring buffer to let the ITS implementation use their own
format.
Implement handlers for the various ITS commands and let them store
the requested relation into our own data structures.
To avoid kmallocs inside the ITS spinlock, we preallocate possibly
needed memory outside of the lock and free that if it turns out to
be not needed (mostly error handling).
Error handling is very basic at this point, as we don't have a good
way of communicating errors to the guest (usually a SError).
The INT command handler is missing at this point, as we gain the
capability of actually injecting MSIs into the guest only later on.

Signed-off-by: Andre Przywara 
---
Changelog v2..v3:
- adjust handlers to new pendbaser/propbaser locking scheme
- properly free ITTEs (including pending bitmap)
- fix handling of unmapped collections

 include/linux/irqchip/arm-gic-v3.h |   5 +-
 virt/kvm/arm/its-emul.c| 502 -
 virt/kvm/arm/its-emul.h|  11 +
 3 files changed, 516 insertions(+), 2 deletions(-)

diff --git a/include/linux/irqchip/arm-gic-v3.h 
b/include/linux/irqchip/arm-gic-v3.h
index ef274a9..27c0e75 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -255,7 +255,10 @@
  */
 #define GITS_CMD_MAPD  0x08
 #define GITS_CMD_MAPC  0x09
-#define GITS_CMD_MAPVI 0x0a
+#define GITS_CMD_MAPTI 0x0a
+/* older GIC documentation used MAPVI for this command */
+#define GITS_CMD_MAPVI GITS_CMD_MAPTI
+#define GITS_CMD_MAPI  0x0b
 #define GITS_CMD_MOVI  0x01
 #define GITS_CMD_DISCARD   0x0f
 #define GITS_CMD_INV   0x0c
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
index 7a8c5db..642effb 100644
--- a/virt/kvm/arm/its-emul.c
+++ b/virt/kvm/arm/its-emul.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -64,6 +65,34 @@ struct its_itte {
unsigned long *pending;
 };
 
+static struct its_device *find_its_device(struct kvm *kvm, u32 device_id)
+{
+   struct vgic_its *its = >arch.vgic.its;
+   struct its_device *device;
+
+   list_for_each_entry(device, >device_list, dev_list)
+   if (device_id == device->device_id)
+   return device;
+
+   return NULL;
+}
+
+static struct its_itte *find_itte(struct kvm *kvm, u32 device_id, u32 event_id)
+{
+   struct its_device *device;
+   struct its_itte *itte;
+
+   device = find_its_device(kvm, device_id);
+   if (device == NULL)
+   return NULL;
+
+   list_for_each_entry(itte, >itt, itte_list)
+   if (itte->event_id == event_id)
+   return itte;
+
+   return NULL;
+}
+
 /* To be used as an iterator this macro misses the enclosing parentheses */
 #define for_each_lpi(dev, itte, kvm) \
list_for_each_entry(dev, &(kvm)->arch.vgic.its.device_list, dev_list) \
@@ -81,6 +110,19 @@ static struct its_itte *find_itte_by_lpi(struct kvm *kvm, 
int lpi)
return NULL;
 }
 
+static struct its_collection *find_collection(struct kvm *kvm, int coll_id)
+{
+   struct its_collection *collection;
+
+   list_for_each_entry(collection, >arch.vgic.its.collection_list,
+   coll_list) {
+   if (coll_id == collection->collection_id)
+   return collection;
+   }
+
+   return NULL;
+}
+
 #define LPI_PROP_ENABLE_BIT(p) ((p) & LPI_PROP_ENABLED)
 #define LPI_PROP_PRIORITY(p)   ((p) & 0xfc)
 
@@ -352,13 +394,471 @@ static void its_free_itte(struct its_itte *itte)
kfree(itte);
 }
 
+static u64 its_cmd_mask_field(u64 *its_cmd, int word, int shift, int size)
+{
+   return (le64_to_cpu(its_cmd[word]) >> shift) & (BIT_ULL(size) - 1);
+}
+
+#define its_cmd_get_command(cmd)   its_cmd_mask_field(cmd, 0,  0,  8)
+#define its_cmd_get_deviceid(cmd)  its_cmd_mask_field(cmd, 0, 32, 32)
+#define its_cmd_get_id(cmd)its_cmd_mask_field(cmd, 1,  0, 32)
+#define its_cmd_get_physical_id(cmd)   its_cmd_mask_field(cmd, 1, 32, 32)
+#define its_cmd_get_collection(cmd)its_cmd_mask_field(cmd, 2,  0, 16)
+#define its_cmd_get_target_addr(cmd)   its_cmd_mask_field(cmd, 2, 16, 32)
+#define its_cmd_get_validbit(cmd)  its_cmd_mask_field(cmd, 2, 63,  1)
+
+/* The DISCARD command frees an Interrupt Translation Table Entry (ITTE). */
+static int vits_cmd_handle_discard(struct kvm *kvm, u64 *its_cmd)
+{
+   struct vgic_its *its = >arch.vgic.its;
+   u32 device_id;
+   u32 event_id;
+   struct its_itte *itte;
+   int ret = E_ITS_DISCARD_UNMAPPED_INTERRUPT;
+
+   device_id = 

[PATCH v3 15/16] KVM: arm64: implement MSI injection in ITS emulation

2015-10-07 Thread Andre Przywara
When userland wants to inject a MSI into the guest, we have to use
our data structures to find the LPI number and the VCPU to receive
the interrupt.
Use the wrapper functions to iterate the linked lists and find the
proper Interrupt Translation Table Entry. Then set the pending bit
in this ITTE to be later picked up by the LR handling code. Kick
the VCPU which is meant to handle this interrupt.
We provide a VGIC emulation model specific routine for the actual
MSI injection. The wrapper functions return an error for models not
(yet) implementing MSIs (like the GICv2 emulation).
We also provide the handler for the ITS "INT" command, which allows a
guest to trigger an MSI via the ITS command queue.

Signed-off-by: Andre Przywara 
---
Changelog v2..v3:
- proper checking for unmapped collections

 include/kvm/arm_vgic.h  |  1 +
 virt/kvm/arm/its-emul.c | 65 +
 virt/kvm/arm/its-emul.h |  2 ++
 virt/kvm/arm/vgic-v3-emul.c |  1 +
 4 files changed, 69 insertions(+)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 4ea023c..7911059 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -149,6 +149,7 @@ struct vgic_vm_ops {
int (*map_resources)(struct kvm *, const struct vgic_params *);
bool(*queue_lpis)(struct kvm_vcpu *);
void(*unqueue_lpi)(struct kvm_vcpu *, int irq);
+   int (*inject_msi)(struct kvm *, struct kvm_msi *);
 };
 
 struct vgic_io_device {
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
index 642effb..cd8526a 100644
--- a/virt/kvm/arm/its-emul.c
+++ b/virt/kvm/arm/its-emul.c
@@ -333,6 +333,55 @@ static bool handle_mmio_gits_idregs(struct kvm_vcpu *vcpu,
 }
 
 /*
+ * Translates an incoming MSI request into the redistributor (=VCPU) and
+ * the associated LPI number. Sets the LPI pending bit and also marks the
+ * VCPU as having a pending interrupt.
+ */
+int vits_inject_msi(struct kvm *kvm, struct kvm_msi *msi)
+{
+   struct vgic_dist *dist = >arch.vgic;
+   struct vgic_its *its = >its;
+   struct its_itte *itte;
+   int cpuid;
+   bool inject = false;
+   int ret = 0;
+
+   if (!vgic_has_its(kvm))
+   return -ENODEV;
+
+   if (!(msi->flags & KVM_MSI_VALID_DEVID))
+   return -EINVAL;
+
+   spin_lock(>lock);
+
+   if (!its->enabled || !dist->lpis_enabled) {
+   ret = -EAGAIN;
+   goto out_unlock;
+   }
+
+   itte = find_itte(kvm, msi->devid, msi->data);
+   /* Triggering an unmapped IRQ gets silently dropped. */
+   if (!itte || !its_is_collection_mapped(itte->collection))
+   goto out_unlock;
+
+   cpuid = itte->collection->target_addr;
+   __set_bit(cpuid, itte->pending);
+   inject = itte->enabled;
+
+out_unlock:
+   spin_unlock(>lock);
+
+   if (inject) {
+   spin_lock(>lock);
+   __set_bit(cpuid, dist->irq_pending_on_cpu);
+   spin_unlock(>lock);
+   kvm_vcpu_kick(kvm_get_vcpu(kvm, cpuid));
+   }
+
+   return ret;
+}
+
+/*
  * Find all enabled and pending LPIs and queue them into the list
  * registers.
  * The dist lock is held by the caller.
@@ -812,6 +861,19 @@ static int vits_cmd_handle_movall(struct kvm *kvm, u64 
*its_cmd)
return 0;
 }
 
+/* The INT command injects the LPI associated with that DevID/EvID pair. */
+static int vits_cmd_handle_int(struct kvm *kvm, u64 *its_cmd)
+{
+   struct kvm_msi msi = {
+   .data = its_cmd_get_id(its_cmd),
+   .devid = its_cmd_get_deviceid(its_cmd),
+   .flags = KVM_MSI_VALID_DEVID,
+   };
+
+   vits_inject_msi(kvm, );
+   return 0;
+}
+
 /*
  * This function is called with both the ITS and the distributor lock dropped,
  * so the actual command handlers must take the respective locks when needed.
@@ -846,6 +908,9 @@ static int vits_handle_command(struct kvm_vcpu *vcpu, u64 
*its_cmd)
case GITS_CMD_MOVALL:
ret = vits_cmd_handle_movall(vcpu->kvm, its_cmd);
break;
+   case GITS_CMD_INT:
+   ret = vits_cmd_handle_int(vcpu->kvm, its_cmd);
+   break;
case GITS_CMD_INV:
ret = vits_cmd_handle_inv(vcpu->kvm, its_cmd);
break;
diff --git a/virt/kvm/arm/its-emul.h b/virt/kvm/arm/its-emul.h
index 830524a..95e56a7 100644
--- a/virt/kvm/arm/its-emul.h
+++ b/virt/kvm/arm/its-emul.h
@@ -36,6 +36,8 @@ void vgic_enable_lpis(struct kvm_vcpu *vcpu);
 int vits_init(struct kvm *kvm);
 void vits_destroy(struct kvm *kvm);
 
+int vits_inject_msi(struct kvm *kvm, struct kvm_msi *msi);
+
 bool vits_queue_lpis(struct kvm_vcpu *vcpu);
 void vits_unqueue_lpi(struct kvm_vcpu *vcpu, int irq);
 
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
index f482e34..90f3628 100644
--- a/virt/kvm/arm/vgic-v3-emul.c
+++ b/virt/kvm/arm/vgic-v3-emul.c
@@ 

[PATCH v3 12/16] KVM: arm64: handle pending bit for LPIs in ITS emulation

2015-10-07 Thread Andre Przywara
As the actual LPI number in a guest can be quite high, but is mostly
assigned using a very sparse allocation scheme, bitmaps and arrays
for storing the virtual interrupt status are a waste of memory.
We use our equivalent of the "Interrupt Translation Table Entry"
(ITTE) to hold this extra status information for a virtual LPI.
As the normal VGIC code cannot use its fancy bitmaps to manage
pending interrupts, we provide a hook in the VGIC code to let the
ITS emulation handle the list register queueing itself.
LPIs are located in a separate number range (>=8192), so
distinguishing them is easy. With LPIs being only edge-triggered, we
get away with a less complex IRQ handling.
We extend the number of bits for storing the IRQ number in our
LR struct to 16 to cover the LPI numbers we support as well.

Signed-off-by: Andre Przywara 
---
Changelog v2..v3:
- extend LR data structure to hold 16-bit wide IRQ IDs
- only clear pending bit if IRQ could be queued
- adapt __kvm_vgic_sync_hwstate() to upstream changes

 include/kvm/arm_vgic.h  |  4 +-
 virt/kvm/arm/its-emul.c | 75 
 virt/kvm/arm/its-emul.h |  3 ++
 virt/kvm/arm/vgic-v3-emul.c |  2 +
 virt/kvm/arm/vgic.c | 93 +++--
 5 files changed, 148 insertions(+), 29 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index c3eb414..035911f 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -95,7 +95,7 @@ enum vgic_type {
 #define LR_HW  (1 << 3)
 
 struct vgic_lr {
-   unsigned irq:10;
+   unsigned irq:16;
union {
unsigned hwirq:10;
unsigned source:3;
@@ -147,6 +147,8 @@ struct vgic_vm_ops {
int (*init_model)(struct kvm *);
void(*destroy_model)(struct kvm *);
int (*map_resources)(struct kvm *, const struct vgic_params *);
+   bool(*queue_lpis)(struct kvm_vcpu *);
+   void(*unqueue_lpi)(struct kvm_vcpu *, int irq);
 };
 
 struct vgic_io_device {
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
index bab8033..8349970 100644
--- a/virt/kvm/arm/its-emul.c
+++ b/virt/kvm/arm/its-emul.c
@@ -59,8 +59,27 @@ struct its_itte {
struct its_collection *collection;
u32 lpi;
u32 event_id;
+   bool enabled;
+   unsigned long *pending;
 };
 
+/* To be used as an iterator this macro misses the enclosing parentheses */
+#define for_each_lpi(dev, itte, kvm) \
+   list_for_each_entry(dev, &(kvm)->arch.vgic.its.device_list, dev_list) \
+   list_for_each_entry(itte, &(dev)->itt, itte_list)
+
+static struct its_itte *find_itte_by_lpi(struct kvm *kvm, int lpi)
+{
+   struct its_device *device;
+   struct its_itte *itte;
+
+   for_each_lpi(device, itte, kvm) {
+   if (itte->lpi == lpi)
+   return itte;
+   }
+   return NULL;
+}
+
 #define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)
 
 /* The distributor lock is held by the VGIC MMIO handler. */
@@ -154,9 +173,65 @@ static bool handle_mmio_gits_idregs(struct kvm_vcpu *vcpu,
return false;
 }
 
+/*
+ * Find all enabled and pending LPIs and queue them into the list
+ * registers.
+ * The dist lock is held by the caller.
+ */
+bool vits_queue_lpis(struct kvm_vcpu *vcpu)
+{
+   struct vgic_its *its = >kvm->arch.vgic.its;
+   struct its_device *device;
+   struct its_itte *itte;
+   bool ret = true;
+
+   if (!vgic_has_its(vcpu->kvm))
+   return true;
+   if (!its->enabled || !vcpu->kvm->arch.vgic.lpis_enabled)
+   return true;
+
+   spin_lock(>lock);
+   for_each_lpi(device, itte, vcpu->kvm) {
+   if (!itte->enabled || !test_bit(vcpu->vcpu_id, itte->pending))
+   continue;
+
+   if (!itte->collection)
+   continue;
+
+   if (itte->collection->target_addr != vcpu->vcpu_id)
+   continue;
+
+
+   if (vgic_queue_irq(vcpu, 0, itte->lpi))
+   __clear_bit(vcpu->vcpu_id, itte->pending);
+   else
+   ret = false;
+   }
+
+   spin_unlock(>lock);
+   return ret;
+}
+
+/* Called with the distributor lock held by the caller. */
+void vits_unqueue_lpi(struct kvm_vcpu *vcpu, int lpi)
+{
+   struct vgic_its *its = >kvm->arch.vgic.its;
+   struct its_itte *itte;
+
+   spin_lock(>lock);
+
+   /* Find the right ITTE and put the pending state back in there */
+   itte = find_itte_by_lpi(vcpu->kvm, lpi);
+   if (itte)
+   __set_bit(vcpu->vcpu_id, itte->pending);
+
+   spin_unlock(>lock);
+}
+
 static void its_free_itte(struct its_itte *itte)
 {
list_del(>itte_list);
+   kfree(itte->pending);
kfree(itte);
 }
 
diff --git a/virt/kvm/arm/its-emul.h b/virt/kvm/arm/its-emul.h
index 

[PATCH v3 16/16] KVM: arm64: enable ITS emulation as a virtual MSI controller

2015-10-07 Thread Andre Przywara
If userspace has provided a base address for the ITS register frame,
we enable the bits that advertise LPIs in the GICv3.
When the guest has enabled LPIs and the ITS, we enable the emulation
part by initializing the ITS data structures and trapping on ITS
register frame accesses by the guest.
Also we enable the KVM_SIGNAL_MSI feature to allow userland to inject
MSIs into the guest. Not having enabled the ITS emulation will lead
to a -ENODEV when trying to inject a MSI.

Signed-off-by: Andre Przywara 
---
Changelog v2..v3:
- replace kmalloc with kcalloc
- adjust number of supported LPIs in comment

 Documentation/virtual/kvm/api.txt |  2 +-
 arch/arm64/kvm/Kconfig|  1 +
 arch/arm64/kvm/reset.c|  6 ++
 include/kvm/arm_vgic.h|  6 ++
 virt/kvm/arm/its-emul.c   | 10 +-
 virt/kvm/arm/vgic-v3-emul.c   | 20 ++--
 virt/kvm/arm/vgic.c   |  8 
 7 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index a302e0a..047e4e7 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2134,7 +2134,7 @@ after pausing the vcpu, but before it is resumed.
 4.71 KVM_SIGNAL_MSI
 
 Capability: KVM_CAP_SIGNAL_MSI
-Architectures: x86
+Architectures: x86 arm64
 Type: vm ioctl
 Parameters: struct kvm_msi (in)
 Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 5c7e920..e8d77f4 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -31,6 +31,7 @@ config KVM
select KVM_VFIO
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
+   select HAVE_KVM_MSI
---help---
  Support hosting virtualized guest machines.
 
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 4d7f78b4..a490f67 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -80,6 +80,12 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long 
ext)
case KVM_CAP_SET_GUEST_DEBUG:
r = 1;
break;
+   case KVM_CAP_MSI_DEVID:
+   if (!kvm)
+   r = -EINVAL;
+   else
+   r = kvm->arch.vgic.msis_require_devid;
+   break;
default:
r = 0;
}
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 7911059..35657f9 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -174,6 +174,7 @@ struct irq_phys_map_entry {
 
 struct vgic_its {
boolenabled;
+   struct vgic_io_device   iodev;
spinlock_t  lock;
u64 cbaser;
int creadr;
@@ -192,6 +193,9 @@ struct vgic_dist {
/* vGIC model the kernel emulates for the guest (GICv2 or GICv3) */
u32 vgic_model;
 
+   /* Do injected MSIs require an additional device ID? */
+   boolmsis_require_devid;
+
int nr_cpus;
int nr_irqs;
 
@@ -397,4 +401,6 @@ static inline int vgic_v3_probe(struct device_node 
*vgic_node,
 }
 #endif
 
+int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi);
+
 #endif
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
index cd8526a..b40a7fc 100644
--- a/virt/kvm/arm/its-emul.c
+++ b/virt/kvm/arm/its-emul.c
@@ -1117,6 +1117,7 @@ int vits_init(struct kvm *kvm)
 {
struct vgic_dist *dist = >arch.vgic;
struct vgic_its *its = >its;
+   int ret;
 
dist->pendbaser = kcalloc(dist->nr_cpus, sizeof(u64), GFP_KERNEL);
if (!dist->pendbaser)
@@ -1131,9 +1132,16 @@ int vits_init(struct kvm *kvm)
INIT_LIST_HEAD(>device_list);
INIT_LIST_HEAD(>collection_list);
 
+   ret = vgic_register_kvm_io_dev(kvm, dist->vgic_its_base,
+  KVM_VGIC_V3_ITS_SIZE, vgicv3_its_ranges,
+  -1, >iodev);
+   if (ret)
+   return ret;
+
its->enabled = false;
+   dist->msis_require_devid = true;
 
-   return -ENXIO;
+   return 0;
 }
 
 void vits_destroy(struct kvm *kvm)
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
index 90f3628..311b3ea 100644
--- a/virt/kvm/arm/vgic-v3-emul.c
+++ b/virt/kvm/arm/vgic-v3-emul.c
@@ -8,7 +8,6 @@
  *
  * Limitations of the emulation:
  * (RAZ/WI: read as zero, write ignore, RAO/WI: read as one, write ignore)
- * - We do not support LPIs (yet). TYPER.LPIS is reported as 0 and is RAZ/WI.
  * - We do not support the message based interrupts (MBIs) triggered by
  *   writes to the GICD_{SET,CLR}SPI_* registers. TYPER.MBIS is reported as 0.
  * - We do not support the (optional) backwards compatibility feature.
@@ -87,10 +86,10 @@ static bool 

Re: [PATCH 2/2] KVM: x86: fix edge EOI and IOAPIC reconfig race

2015-10-07 Thread Radim Krčmář
2015-10-07 11:31+0200, Paolo Bonzini:
> What's the issue with
> handle_irq?

I get #PF instead of callback after redirecting to VCPU 1.
No idea what causes it, yet -- seeing handle_irq's iplementation made me
postpone debugging :)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v3 12/16] KVM: arm64: handle pending bit for LPIs in ITS emulation

2015-10-07 Thread Pavel Fedin
 Hello!

> As the actual LPI number in a guest can be quite high, but is mostly
> assigned using a very sparse allocation scheme, bitmaps and arrays
> for storing the virtual interrupt status are a waste of memory.
> We use our equivalent of the "Interrupt Translation Table Entry"
> (ITTE) to hold this extra status information for a virtual LPI.

 You know, not that i'm strongly against current approach and want you to redo 
everything once
again, but... Is it architecturally correct to intertwine LPIs and ITS so much? 
As far as i
understand arch manual, it is possible to have LPIs without ITS (triggered by 
something else?).
Shouldn't we do the same, and just add LPI support to our redistributors, and 
then proceed with the
ITS?
 As to memory consumption, do we really need to store own copy of tables? After 
all, it's just a
memory. What if we map a pointer directly into guest's memory (which it writes 
to
PROPBASER/PENDBASER), and just keep it? There will be no issues with caching 
and synchronization at
all.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 03/16] KVM: extend struct kvm_msi to hold a 32-bit device ID

2015-10-07 Thread Andre Przywara
The ARM GICv3 ITS MSI controller requires a device ID to be able to
assign the proper interrupt vector. On real hardware, this ID is
sampled from the bus. To be able to emulate an ITS controller, extend
the KVM MSI interface to let userspace provide such a device ID. For
PCI devices, the device ID is simply the 16-bit bus-device-function
triplet, which should be easily available to the userland tool.

Also there is a new KVM capability which advertises whether the
current VM requires a device ID to be set along with the MSI data.
This flag is still reported as not available everywhere, later we will
enable it when ITS emulation is used.

Signed-off-by: Andre Przywara 
Reviewed-by: Eric Auger 
---
Changelog v2..v3:
- adjust KVM_CAP number to not clash with upstream

 Documentation/virtual/kvm/api.txt | 12 ++--
 include/uapi/linux/kvm.h  |  5 -
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index d9eccee..a302e0a 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2147,10 +2147,18 @@ struct kvm_msi {
__u32 address_hi;
__u32 data;
__u32 flags;
-   __u8  pad[16];
+   __u32 devid;
+   __u8  pad[12];
 };
 
-No flags are defined so far. The corresponding field must be 0.
+flags: KVM_MSI_VALID_DEVID: devid contains a valid value
+devid: If KVM_MSI_VALID_DEVID is set, contains a unique device identifier
+   for the device that wrote the MSI message.
+   For PCI, this is usually a BFD identifier in the lower 16 bits.
+
+The per-VM KVM_CAP_MSI_DEVID capability advertises the need to provide
+the device ID. If this capability is not set, userland cannot rely on
+the kernel to allow the KVM_MSI_VALID_DEVID flag being set.
 
 
 4.71 KVM_CREATE_PIT2
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a9256f0..eae9ba1 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -824,6 +824,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_MULTI_ADDRESS_SPACE 118
 #define KVM_CAP_GUEST_DEBUG_HW_BPS 119
 #define KVM_CAP_GUEST_DEBUG_HW_WPS 120
+#define KVM_CAP_MSI_DEVID 121
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -975,12 +976,14 @@ struct kvm_one_reg {
__u64 addr;
 };
 
+#define KVM_MSI_VALID_DEVID(1U << 0)
 struct kvm_msi {
__u32 address_lo;
__u32 address_hi;
__u32 data;
__u32 flags;
-   __u8  pad[16];
+   __u32 devid;
+   __u8  pad[12];
 };
 
 struct kvm_arm_device_addr {
-- 
2.5.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 11/16] KVM: arm64: add data structures to model ITS interrupt translation

2015-10-07 Thread Andre Przywara
The GICv3 Interrupt Translation Service (ITS) uses tables in memory
to allow a sophisticated interrupt routing. It features device tables,
an interrupt table per device and a table connecting "collections" to
actual CPUs (aka. redistributors in the GICv3 lingo).
Since the interrupt numbers for the LPIs are allocated quite sparsely
and the range can be quite huge (8192 LPIs being the minimum), using
bitmaps or arrays for storing information is a waste of memory.
We use linked lists instead, which we iterate linearily. This works
very well with the actual number of LPIs/MSIs in the guest being
quite low. Should the number of LPIs exceed the number where iterating
through lists seems acceptable, we can later revisit this and use more
efficient data structures.

Signed-off-by: Andre Przywara 
---
Changelog v2..v3:
- add a comment

 include/kvm/arm_vgic.h  |  3 +++
 virt/kvm/arm/its-emul.c | 66 +
 2 files changed, 69 insertions(+)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 9ac850d..c3eb414 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define VGIC_NR_IRQS_LEGACY256
 #define VGIC_NR_SGIS   16
@@ -174,6 +175,8 @@ struct vgic_its {
u64 cbaser;
int creadr;
int cwriter;
+   struct list_headdevice_list;
+   struct list_headcollection_list;
 };
 
 struct vgic_dist {
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
index 9bbed86..bab8033 100644
--- a/virt/kvm/arm/its-emul.c
+++ b/virt/kvm/arm/its-emul.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -32,6 +33,34 @@
 #include "vgic.h"
 #include "its-emul.h"
 
+struct its_device {
+   struct list_head dev_list;
+
+   /* the head for the list of ITTEs */
+   struct list_head itt;
+   u32 device_id;
+};
+
+#define COLLECTION_NOT_MAPPED ((u32)-1)
+
+struct its_collection {
+   struct list_head coll_list;
+
+   u32 collection_id;
+   u32 target_addr;
+};
+
+#define its_is_collection_mapped(coll) ((coll) && \
+   ((coll)->target_addr != COLLECTION_NOT_MAPPED))
+
+struct its_itte {
+   struct list_head itte_list;
+
+   struct its_collection *collection;
+   u32 lpi;
+   u32 event_id;
+};
+
 #define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)
 
 /* The distributor lock is held by the VGIC MMIO handler. */
@@ -125,6 +154,12 @@ static bool handle_mmio_gits_idregs(struct kvm_vcpu *vcpu,
return false;
 }
 
+static void its_free_itte(struct its_itte *itte)
+{
+   list_del(>itte_list);
+   kfree(itte);
+}
+
 /*
  * This function is called with both the ITS and the distributor lock dropped,
  * so the actual command handlers must take the respective locks when needed.
@@ -321,6 +356,9 @@ int vits_init(struct kvm *kvm)
 
spin_lock_init(>lock);
 
+   INIT_LIST_HEAD(>device_list);
+   INIT_LIST_HEAD(>collection_list);
+
its->enabled = false;
 
return -ENXIO;
@@ -330,11 +368,39 @@ void vits_destroy(struct kvm *kvm)
 {
struct vgic_dist *dist = >arch.vgic;
struct vgic_its *its = >its;
+   struct its_device *dev;
+   struct its_itte *itte;
+   struct list_head *dev_cur, *dev_temp;
+   struct list_head *cur, *temp;
 
if (!vgic_has_its(kvm))
return;
 
+   /*
+* We may end up here without the lists ever having been initialized.
+* Check this and bail out early to avoid dereferencing a NULL pointer.
+*/
+   if (!its->device_list.next)
+   return;
+
+   spin_lock(>lock);
+   list_for_each_safe(dev_cur, dev_temp, >device_list) {
+   dev = container_of(dev_cur, struct its_device, dev_list);
+   list_for_each_safe(cur, temp, >itt) {
+   itte = (container_of(cur, struct its_itte, itte_list));
+   its_free_itte(itte);
+   }
+   list_del(dev_cur);
+   kfree(dev);
+   }
+
+   list_for_each_safe(cur, temp, >collection_list) {
+   list_del(cur);
+   kfree(container_of(cur, struct its_collection, coll_list));
+   }
+
kfree(dist->pendbaser);
 
its->enabled = false;
+   spin_unlock(>lock);
 }
-- 
2.5.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 07/16] KVM: arm64: Introduce new MMIO region for the ITS base address

2015-10-07 Thread Andre Przywara
The ARM GICv3 ITS controller requires a separate register frame to
cover ITS specific registers. Add a new VGIC address type and store
the address in a field in the vgic_dist structure.
Provide a function to check whether userland has provided the address,
so ITS functionality can be guarded by that check.

Signed-off-by: Andre Przywara 
Reviewed-by: Eric Auger 
---
Changelog v2..v3:
- none

 Documentation/virtual/kvm/devices/arm-vgic.txt |  9 +
 arch/arm64/include/uapi/asm/kvm.h  |  2 ++
 include/kvm/arm_vgic.h |  3 +++
 virt/kvm/arm/vgic-v3-emul.c|  2 ++
 virt/kvm/arm/vgic.c| 16 
 virt/kvm/arm/vgic.h|  1 +
 6 files changed, 33 insertions(+)

diff --git a/Documentation/virtual/kvm/devices/arm-vgic.txt 
b/Documentation/virtual/kvm/devices/arm-vgic.txt
index 3fb9054..ec715f9e 100644
--- a/Documentation/virtual/kvm/devices/arm-vgic.txt
+++ b/Documentation/virtual/kvm/devices/arm-vgic.txt
@@ -39,6 +39,15 @@ Groups:
   Only valid for KVM_DEV_TYPE_ARM_VGIC_V3.
   This address needs to be 64K aligned.
 
+KVM_VGIC_V3_ADDR_TYPE_ITS (rw, 64-bit)
+  Base address in the guest physical address space of the GICv3 ITS
+  control register frame. The ITS allows MSI(-X) interrupts to be
+  injected into guests. This extension is optional, if the kernel
+  does not support the ITS, the call returns -ENODEV.
+  This memory is solely for the guest to access the ITS control
+  registers and does not cover the ITS translation register.
+  Only valid for KVM_DEV_TYPE_ARM_VGIC_V3.
+  This address needs to be 64K aligned and the region covers 64 KByte.
 
   KVM_DEV_ARM_VGIC_GRP_DIST_REGS
   Attributes:
diff --git a/arch/arm64/include/uapi/asm/kvm.h 
b/arch/arm64/include/uapi/asm/kvm.h
index 0cd7b59..99e4006 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -87,9 +87,11 @@ struct kvm_regs {
 /* Supported VGICv3 address types  */
 #define KVM_VGIC_V3_ADDR_TYPE_DIST 2
 #define KVM_VGIC_V3_ADDR_TYPE_REDIST   3
+#define KVM_VGIC_V3_ADDR_TYPE_ITS  4
 
 #define KVM_VGIC_V3_DIST_SIZE  SZ_64K
 #define KVM_VGIC_V3_REDIST_SIZE(2 * SZ_64K)
+#define KVM_VGIC_V3_ITS_SIZE   SZ_64K
 
 #define KVM_ARM_VCPU_POWER_OFF 0 /* CPU is started in OFF state */
 #define KVM_ARM_VCPU_EL1_32BIT 1 /* CPU running a 32bit VM */
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 2c10082..067ad09 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -190,6 +190,9 @@ struct vgic_dist {
phys_addr_t vgic_redist_base;
};
 
+   /* The base address of the ITS control register frame */
+   phys_addr_t vgic_its_base;
+
/* Distributor enabled */
u32 enabled;
 
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
index 1f42348..a8cf669 100644
--- a/virt/kvm/arm/vgic-v3-emul.c
+++ b/virt/kvm/arm/vgic-v3-emul.c
@@ -887,6 +887,7 @@ void vgic_v3_init_emulation(struct kvm *kvm)
 
dist->vgic_dist_base = VGIC_ADDR_UNDEF;
dist->vgic_redist_base = VGIC_ADDR_UNDEF;
+   dist->vgic_its_base = VGIC_ADDR_UNDEF;
 
kvm->arch.max_vcpus = KVM_MAX_VCPUS;
 }
@@ -1059,6 +1060,7 @@ static int vgic_v3_has_attr(struct kvm_device *dev,
return -ENXIO;
case KVM_VGIC_V3_ADDR_TYPE_DIST:
case KVM_VGIC_V3_ADDR_TYPE_REDIST:
+   case KVM_VGIC_V3_ADDR_TYPE_ITS:
return 0;
}
break;
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 1dd79e1..4219f22 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -953,6 +953,16 @@ int vgic_register_kvm_io_dev(struct kvm *kvm, gpa_t base, 
int len,
return ret;
 }
 
+bool vgic_has_its(struct kvm *kvm)
+{
+   struct vgic_dist *dist = >arch.vgic;
+
+   if (dist->vgic_model != KVM_DEV_TYPE_ARM_VGIC_V3)
+   return false;
+
+   return !IS_VGIC_ADDR_UNDEF(dist->vgic_its_base);
+}
+
 static int vgic_nr_shared_irqs(struct vgic_dist *dist)
 {
return dist->nr_irqs - VGIC_NR_PRIVATE_IRQS;
@@ -2257,6 +2267,12 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, 
u64 *addr, bool write)
block_size = KVM_VGIC_V3_REDIST_SIZE;
alignment = SZ_64K;
break;
+   case KVM_VGIC_V3_ADDR_TYPE_ITS:
+   type_needed = KVM_DEV_TYPE_ARM_VGIC_V3;
+   addr_ptr = >vgic_its_base;
+   block_size = KVM_VGIC_V3_ITS_SIZE;
+   alignment = SZ_64K;
+   break;
 #endif
default:
r = -ENODEV;
diff --git a/virt/kvm/arm/vgic.h b/virt/kvm/arm/vgic.h
index 0df74cb..a093f5c 100644
--- a/virt/kvm/arm/vgic.h
+++ 

[PATCH v3 10/16] KVM: arm64: implement basic ITS register handlers

2015-10-07 Thread Andre Przywara
Add emulation for some basic MMIO registers used in the ITS emulation.
This includes:
- GITS_{CTLR,TYPER,IIDR}
- ID registers
- GITS_{CBASER,CREADR,CWRITER}
  those implement the ITS command buffer handling

Most of the handlers are pretty straight forward, but CWRITER goes
some extra miles to allow fine grained locking. The idea here
is to let only the first instance iterate through the command ring
buffer, CWRITER accesses on other VCPUs meanwhile will be picked up
by that first instance and handled as well. The ITS lock is thus only
hold for very small periods of time and is dropped before the actual
command handler is called.

Signed-off-by: Andre Przywara 
---
Changelog v2..v3:
- use new renamed vgic_reg64_access() function
- rework locking in CWRITER handling
- use kcalloc instead of kmalloc

 include/kvm/arm_vgic.h |   3 +
 include/linux/irqchip/arm-gic-v3.h |   8 ++
 virt/kvm/arm/its-emul.c| 215 +
 virt/kvm/arm/its-emul.h|   1 +
 virt/kvm/arm/vgic-v3-emul.c|   2 +
 5 files changed, 229 insertions(+)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index c8c48e3..9ac850d 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -171,6 +171,9 @@ struct irq_phys_map_entry {
 struct vgic_its {
boolenabled;
spinlock_t  lock;
+   u64 cbaser;
+   int creadr;
+   int cwriter;
 };
 
 struct vgic_dist {
diff --git a/include/linux/irqchip/arm-gic-v3.h 
b/include/linux/irqchip/arm-gic-v3.h
index 70e9539..ef274a9 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -181,15 +181,23 @@
 #define GITS_BASER 0x0100
 #define GITS_IDREGS_BASE   0xffd0
 #define GITS_PIDR2 GICR_PIDR2
+#define GITS_PIDR4 0xffd0
+#define GITS_CIDR0 0xfff0
+#define GITS_CIDR1 0xfff4
+#define GITS_CIDR2 0xfff8
+#define GITS_CIDR3 0xfffc
 
 #define GITS_TRANSLATER0x10040
 
 #define GITS_CTLR_ENABLE   (1U << 0)
 #define GITS_CTLR_QUIESCENT(1U << 31)
 
+#define GITS_TYPER_PLPIS   (1UL << 0)
+#define GITS_TYPER_IDBITS_SHIFT8
 #define GITS_TYPER_DEVBITS_SHIFT   13
 #define GITS_TYPER_DEVBITS(r)  r) >> GITS_TYPER_DEVBITS_SHIFT) & 
0x1f) + 1)
 #define GITS_TYPER_PTA (1UL << 19)
+#define GITS_TYPER_HWCOLLCNT_SHIFT 24
 
 #define GITS_CBASER_VALID  (1UL << 63)
 #define GITS_CBASER_nCnB   (0UL << 59)
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
index 659dd39..9bbed86 100644
--- a/virt/kvm/arm/its-emul.c
+++ b/virt/kvm/arm/its-emul.c
@@ -32,10 +32,62 @@
 #include "vgic.h"
 #include "its-emul.h"
 
+#define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)
+
+/* The distributor lock is held by the VGIC MMIO handler. */
 static bool handle_mmio_misc_gits(struct kvm_vcpu *vcpu,
  struct kvm_exit_mmio *mmio,
  phys_addr_t offset)
 {
+   struct vgic_its *its = >kvm->arch.vgic.its;
+   u32 reg;
+   bool was_enabled;
+
+   switch (offset & ~3) {
+   case 0x00:  /* GITS_CTLR */
+   /* We never defer any command execution. */
+   reg = GITS_CTLR_QUIESCENT;
+   if (its->enabled)
+   reg |= GITS_CTLR_ENABLE;
+   was_enabled = its->enabled;
+   vgic_reg_access(mmio, , offset & 3,
+   ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
+   its->enabled = !!(reg & GITS_CTLR_ENABLE);
+   return !was_enabled && its->enabled;
+   case 0x04:  /* GITS_IIDR */
+   reg = (PRODUCT_ID_KVM << 24) | (IMPLEMENTER_ARM << 0);
+   vgic_reg_access(mmio, , offset & 3,
+   ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
+   break;
+   case 0x08:  /* GITS_TYPER */
+   /*
+* We use linear CPU numbers for redistributor addressing,
+* so GITS_TYPER.PTA is 0.
+* To avoid memory waste on the guest side, we keep the
+* number of IDBits and DevBits low for the time being.
+* This could later be made configurable by userland.
+* Since we have all collections in linked list, we claim
+* that we can hold all of the collection tables in our
+* own memory and that the ITT entry size is 1 byte (the
+* smallest possible one).
+*/
+   reg = GITS_TYPER_PLPIS;
+   reg |= 0xff << GITS_TYPER_HWCOLLCNT_SHIFT;
+   reg |= 

[PATCH v3 06/16] KVM: arm/arm64: make GIC frame address initialization model specific

2015-10-07 Thread Andre Przywara
Currently we initialize all the possible GIC frame addresses in one
function, without looking at the specific GIC model we instantiate
for the guest.
As this gets confusing when adding another VGIC model later, lets
move these initializations into the respective model's emulation
init functions.

Signed-off-by: Andre Przywara 
---
Changelog v2..v3:
- none

 virt/kvm/arm/vgic-v2-emul.c | 3 +++
 virt/kvm/arm/vgic-v3-emul.c | 3 +++
 virt/kvm/arm/vgic.c | 3 ---
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/arm/vgic-v2-emul.c b/virt/kvm/arm/vgic-v2-emul.c
index 1390797..8faa28c 100644
--- a/virt/kvm/arm/vgic-v2-emul.c
+++ b/virt/kvm/arm/vgic-v2-emul.c
@@ -567,6 +567,9 @@ void vgic_v2_init_emulation(struct kvm *kvm)
dist->vm_ops.init_model = vgic_v2_init_model;
dist->vm_ops.map_resources = vgic_v2_map_resources;
 
+   dist->vgic_cpu_base = VGIC_ADDR_UNDEF;
+   dist->vgic_dist_base = VGIC_ADDR_UNDEF;
+
kvm->arch.max_vcpus = VGIC_V2_MAX_CPUS;
 }
 
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
index d2eeb20..1f42348 100644
--- a/virt/kvm/arm/vgic-v3-emul.c
+++ b/virt/kvm/arm/vgic-v3-emul.c
@@ -885,6 +885,9 @@ void vgic_v3_init_emulation(struct kvm *kvm)
dist->vm_ops.destroy_model = vgic_v3_destroy_model;
dist->vm_ops.map_resources = vgic_v3_map_resources;
 
+   dist->vgic_dist_base = VGIC_ADDR_UNDEF;
+   dist->vgic_redist_base = VGIC_ADDR_UNDEF;
+
kvm->arch.max_vcpus = KVM_MAX_VCPUS;
 }
 
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index b71f627..1dd79e1 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -2160,9 +2160,6 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
kvm->arch.vgic.in_kernel = true;
kvm->arch.vgic.vgic_model = type;
kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
-   kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
-   kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
-   kvm->arch.vgic.vgic_redist_base = VGIC_ADDR_UNDEF;
 
 out_unlock:
for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
-- 
2.5.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 01/16] KVM: arm/arm64: VGIC: don't track used LRs in the distributor

2015-10-07 Thread Andre Przywara
Currently we track which IRQ has been mapped to which VGIC list
register and also have to synchronize both. We used to do this
to hold some extra state (for instance the active bit).
It turns out that this extra state in the LRs is no longer needed and
this extra tracking causes some pain later.
Remove the tracking feature (lr_map and lr_used) and get rid of
quite some code on the way.
In places where we scan LRs we now use our shadow copy of the ELRSR
register directly.
This code change means we lose the "piggy-back" optimization, which
would re-use an active-only LR to inject the pending state on top of
it. Tracing with various workloads shows that this actually occurred
very rarely, the ballpark figure is about once every 10,000 exits
in a disk I/O heavy workload. Also the list registers don't seem to
as scarce as assumed, with all 4 LRs on the popular implementations
used less than once every 100,000 exits.

This has been briefly tested on Midway, Juno and the model (the latter
both with GICv2 and GICv3 guests).

Signed-off-by: Andre Przywara 
---
Changelog v2..v3:
- adapt to 4.3-rc
- keep, but change retire_lr to drop now unused parameter

 include/kvm/arm_vgic.h |   6 ---
 virt/kvm/arm/vgic-v2.c |   1 +
 virt/kvm/arm/vgic-v3.c |   1 +
 virt/kvm/arm/vgic.c| 137 +
 4 files changed, 61 insertions(+), 84 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 7bc5d02..926d67c 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -295,9 +295,6 @@ struct vgic_v3_cpu_if {
 };
 
 struct vgic_cpu {
-   /* per IRQ to LR mapping */
-   u8  *vgic_irq_lr_map;
-
/* Pending/active/both interrupts on this VCPU */
DECLARE_BITMAP( pending_percpu, VGIC_NR_PRIVATE_IRQS);
DECLARE_BITMAP( active_percpu, VGIC_NR_PRIVATE_IRQS);
@@ -308,9 +305,6 @@ struct vgic_cpu {
unsigned long   *active_shared;
unsigned long   *pend_act_shared;
 
-   /* Bitmap of used/free list registers */
-   DECLARE_BITMAP( lr_used, VGIC_V2_MAX_LRS);
-
/* Number of list registers on this CPU */
int nr_lr;
 
diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
index 8d7b04d..c0f5d7f 100644
--- a/virt/kvm/arm/vgic-v2.c
+++ b/virt/kvm/arm/vgic-v2.c
@@ -158,6 +158,7 @@ static void vgic_v2_enable(struct kvm_vcpu *vcpu)
 * anyway.
 */
vcpu->arch.vgic_cpu.vgic_v2.vgic_vmcr = 0;
+   vcpu->arch.vgic_cpu.vgic_v2.vgic_elrsr = ~0;
 
/* Get the show on the road... */
vcpu->arch.vgic_cpu.vgic_v2.vgic_hcr = GICH_HCR_EN;
diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
index 7dd5d62..92003cb 100644
--- a/virt/kvm/arm/vgic-v3.c
+++ b/virt/kvm/arm/vgic-v3.c
@@ -193,6 +193,7 @@ static void vgic_v3_enable(struct kvm_vcpu *vcpu)
 * anyway.
 */
vgic_v3->vgic_vmcr = 0;
+   vgic_v3->vgic_elrsr = ~0;
 
/*
 * If we are emulating a GICv3, we do it in an non-GICv2-compatible
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index f3e76e5..da0a866 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -102,7 +102,7 @@
 #include "vgic.h"
 
 static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
-static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
+static void vgic_retire_lr(int lr_nr, struct kvm_vcpu *vcpu);
 static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
 static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
 static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
@@ -672,6 +672,17 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio 
*mmio,
return false;
 }
 
+static void vgic_sync_lr_elrsr(struct kvm_vcpu *vcpu, int lr,
+  struct vgic_lr vlr)
+{
+   vgic_ops->sync_lr_elrsr(vcpu, lr, vlr);
+}
+
+static inline u64 vgic_get_elrsr(struct kvm_vcpu *vcpu)
+{
+   return vgic_ops->get_elrsr(vcpu);
+}
+
 /**
  * vgic_unqueue_irqs - move pending/active IRQs from LRs to the distributor
  * @vgic_cpu: Pointer to the vgic_cpu struct holding the LRs
@@ -683,9 +694,11 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio 
*mmio,
 void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
 {
struct vgic_cpu *vgic_cpu = >arch.vgic_cpu;
+   u64 elrsr = vgic_get_elrsr(vcpu);
+   unsigned long *elrsr_ptr = u64_to_bitmask();
int i;
 
-   for_each_set_bit(i, vgic_cpu->lr_used, vgic_cpu->nr_lr) {
+   for_each_clear_bit(i, elrsr_ptr, vgic_cpu->nr_lr) {
struct vgic_lr lr = vgic_get_lr(vcpu, i);
 
/*
@@ -728,7 +741,7 @@ void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
 * Mark the LR as free for other use.
 */
BUG_ON(lr.state & LR_STATE_MASK);
-   vgic_retire_lr(i, lr.irq, vcpu);
+   vgic_retire_lr(i, vcpu);
  

[PATCH v3 08/16] KVM: arm64: handle ITS related GICv3 redistributor registers

2015-10-07 Thread Andre Przywara
In the GICv3 redistributor there are the PENDBASER and PROPBASER
registers which we did not emulate so far, as they only make sense
when having an ITS. In preparation for that emulate those MMIO
accesses by storing the 64-bit data written into it into a variable
which we later read in the ITS emulation.

Signed-off-by: Andre Przywara 
---
Changelog v2..v3:
- rename vgic_handle_base_register to vgic_reg64_access()

 include/kvm/arm_vgic.h  |  8 
 virt/kvm/arm/vgic-v3-emul.c | 44 
 virt/kvm/arm/vgic.c | 31 +++
 virt/kvm/arm/vgic.h |  2 ++
 4 files changed, 85 insertions(+)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 067ad09..06c33bc 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -272,6 +272,14 @@ struct vgic_dist {
/* Virtual irq to hwirq mapping */
spinlock_t  irq_phys_map_lock;
struct list_headirq_phys_map_list;
+
+   /* Address of LPI configuration table shared by all redistributors */
+   u64 propbaser;
+
+   /* Addresses of LPI pending tables per redistributor */
+   u64 *pendbaser;
+
+   boollpis_enabled;
 };
 
 struct vgic_v2_cpu_if {
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
index a8cf669..6939f7c 100644
--- a/virt/kvm/arm/vgic-v3-emul.c
+++ b/virt/kvm/arm/vgic-v3-emul.c
@@ -651,6 +651,38 @@ static bool handle_mmio_cfg_reg_redist(struct kvm_vcpu 
*vcpu,
return vgic_handle_cfg_reg(reg, mmio, offset);
 }
 
+/* We don't trigger any actions here, just store the register value */
+static bool handle_mmio_propbaser_redist(struct kvm_vcpu *vcpu,
+struct kvm_exit_mmio *mmio,
+phys_addr_t offset)
+{
+   struct vgic_dist *dist = >kvm->arch.vgic;
+   int mode = ACCESS_READ_VALUE;
+
+   /* Storing a value with LPIs already enabled is undefined */
+   mode |= dist->lpis_enabled ? ACCESS_WRITE_IGNORED : ACCESS_WRITE_VALUE;
+   vgic_reg64_access(mmio, offset, >propbaser, mode);
+
+   return false;
+}
+
+/* We don't trigger any actions here, just store the register value */
+static bool handle_mmio_pendbaser_redist(struct kvm_vcpu *vcpu,
+struct kvm_exit_mmio *mmio,
+phys_addr_t offset)
+{
+   struct kvm_vcpu *rdvcpu = mmio->private;
+   struct vgic_dist *dist = >kvm->arch.vgic;
+   int mode = ACCESS_READ_VALUE;
+
+   /* Storing a value with LPIs already enabled is undefined */
+   mode |= dist->lpis_enabled ? ACCESS_WRITE_IGNORED : ACCESS_WRITE_VALUE;
+   vgic_reg64_access(mmio, offset,
+ >pendbaser[rdvcpu->vcpu_id], mode);
+
+   return false;
+}
+
 #define SGI_base(x) ((x) + SZ_64K)
 
 static const struct vgic_io_range vgic_redist_ranges[] = {
@@ -679,6 +711,18 @@ static const struct vgic_io_range vgic_redist_ranges[] = {
.handle_mmio= handle_mmio_raz_wi,
},
{
+   .base   = GICR_PENDBASER,
+   .len= 0x08,
+   .bits_per_irq   = 0,
+   .handle_mmio= handle_mmio_pendbaser_redist,
+   },
+   {
+   .base   = GICR_PROPBASER,
+   .len= 0x08,
+   .bits_per_irq   = 0,
+   .handle_mmio= handle_mmio_propbaser_redist,
+   },
+   {
.base   = GICR_IDREGS,
.len= 0x30,
.bits_per_irq   = 0,
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 4219f22..11bf692 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -471,6 +471,37 @@ void vgic_reg_access(struct kvm_exit_mmio *mmio, u32 *reg,
}
 }
 
+/* handle a 64-bit register access */
+void vgic_reg64_access(struct kvm_exit_mmio *mmio, phys_addr_t offset,
+  u64 *basereg, int mode)
+{
+   u32 reg;
+   u64 breg;
+
+   switch (offset & ~3) {
+   case 0x00:
+   breg = *basereg;
+   reg = lower_32_bits(breg);
+   vgic_reg_access(mmio, , offset & 3, mode);
+   if (mmio->is_write && (mode & ACCESS_WRITE_VALUE)) {
+   breg &= GENMASK_ULL(63, 32);
+   breg |= reg;
+   *basereg = breg;
+   }
+   break;
+   case 0x04:
+   breg = *basereg;
+   reg = upper_32_bits(breg);
+   vgic_reg_access(mmio, , offset & 3, mode);
+   if (mmio->is_write && (mode & ACCESS_WRITE_VALUE)) {
+   breg  = lower_32_bits(breg);
+   breg |= (u64)reg << 32;
+   *basereg = breg;
+   }
+  

[PATCH v3 04/16] KVM: arm/arm64: add emulation model specific destroy function

2015-10-07 Thread Andre Przywara
Currently we destroy the VGIC emulation in one function that cares for
all emulated models. To be on par with init_model (which is model
specific), lets introduce a per-emulation-model destroy method, too.
Use it for a tiny GICv3 specific code already, later it will be handy
for the ITS emulation.

Signed-off-by: Andre Przywara 
Reviewed-by: Eric Auger 
---
Changelog v2..v3:
- none

 include/kvm/arm_vgic.h  |  1 +
 virt/kvm/arm/vgic-v3-emul.c |  9 +
 virt/kvm/arm/vgic.c | 11 ++-
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 926d67c..2c10082 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -144,6 +144,7 @@ struct vgic_vm_ops {
bool(*queue_sgi)(struct kvm_vcpu *, int irq);
void(*add_sgi_source)(struct kvm_vcpu *, int irq, int source);
int (*init_model)(struct kvm *);
+   void(*destroy_model)(struct kvm *);
int (*map_resources)(struct kvm *, const struct vgic_params *);
 };
 
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
index e661e7f..d2eeb20 100644
--- a/virt/kvm/arm/vgic-v3-emul.c
+++ b/virt/kvm/arm/vgic-v3-emul.c
@@ -862,6 +862,14 @@ static int vgic_v3_init_model(struct kvm *kvm)
return 0;
 }
 
+static void vgic_v3_destroy_model(struct kvm *kvm)
+{
+   struct vgic_dist *dist = >arch.vgic;
+
+   kfree(dist->irq_spi_mpidr);
+   dist->irq_spi_mpidr = NULL;
+}
+
 /* GICv3 does not keep track of SGI sources anymore. */
 static void vgic_v3_add_sgi_source(struct kvm_vcpu *vcpu, int irq, int source)
 {
@@ -874,6 +882,7 @@ void vgic_v3_init_emulation(struct kvm *kvm)
dist->vm_ops.queue_sgi = vgic_v3_queue_sgi;
dist->vm_ops.add_sgi_source = vgic_v3_add_sgi_source;
dist->vm_ops.init_model = vgic_v3_init_model;
+   dist->vm_ops.destroy_model = vgic_v3_destroy_model;
dist->vm_ops.map_resources = vgic_v3_map_resources;
 
kvm->arch.max_vcpus = KVM_MAX_VCPUS;
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index a5360b7..b71f627 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -125,6 +125,14 @@ int kvm_vgic_map_resources(struct kvm *kvm)
return kvm->arch.vgic.vm_ops.map_resources(kvm, vgic);
 }
 
+static void vgic_destroy_model(struct kvm *kvm)
+{
+   struct vgic_vm_ops *vm_ops = >arch.vgic.vm_ops;
+
+   if (vm_ops->destroy_model)
+   vm_ops->destroy_model(kvm);
+}
+
 /*
  * struct vgic_bitmap contains a bitmap made of unsigned longs, but
  * extracts u32s out of them.
@@ -1941,6 +1949,8 @@ void kvm_vgic_destroy(struct kvm *kvm)
struct kvm_vcpu *vcpu;
int i;
 
+   vgic_destroy_model(kvm);
+
kvm_for_each_vcpu(i, vcpu, kvm)
kvm_vgic_vcpu_destroy(vcpu);
 
@@ -1957,7 +1967,6 @@ void kvm_vgic_destroy(struct kvm *kvm)
}
kfree(dist->irq_sgi_sources);
kfree(dist->irq_spi_cpu);
-   kfree(dist->irq_spi_mpidr);
kfree(dist->irq_spi_target);
kfree(dist->irq_pending_on_cpu);
kfree(dist->irq_active_on_cpu);
-- 
2.5.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 00/16] KVM: arm64: GICv3 ITS emulation

2015-10-07 Thread Andre Przywara
Hi,

another respin and rebase of the ITS emulation series.
Major changes compared to v2 (beside some minor things like added
comments and function renames) are the rebasing and adaption to 4.3-rc
and Christoffer's timer rework series. Also the locking has been
reworked to cope with the dependencies of the its and the dist lock
in connection with the PROPBASER/PENDBASER and the command handling.
For a more detailed changelog see below or look at the respective
commit messages.

This should address most of the comments I got on the list.
Many thanks to the diligent reviewers!
I didn't bother to fine-tune patch 01/16 too much, as I guess there
will be more discussion around this based on Pavel's latest post.

These patches go on top of Christoffer's timer rework series [1],
which itself is on top of 4.3-rc2.
You can find all of this code in the its-emul/v3 branch of my
repository [2].

Cheers,
Andre.

Changelog v2..v3:
- adapt to 4.3-rc and Christoffer's timer rework
- adapt spin locks on handling PROPBASER/PENDBASER registers
- rework locking in ITS command handling (dropping dist where needed)
- only clear LPI pending bit if LPI could actually be queued
- simplify GICR_CTLR handling
- properly free ITTEs (including our pending bitmap)
- fix corner cases with unmapped collections
- keep retire_lr() around
- rename vgic_handle_base_register to vgic_reg64_access()
- use kcalloc instead of kmalloc
- minor fixes, renames and added comments

Changelog v1..v2
- fix issues when using non-ITS GICv3 emulation
- streamline frame address initialization (new patch 05/15)
- preallocate buffer memory for reading from guest's memory
- move locking into the actual command handlers
-   preallocate memory for new structures if needed
- use non-atomic __set_bit() and __clear_bit() when under the lock
- add INT command handler to allow LPI injection from the guest
- rewrite CWRITER handler to align with new locking scheme
- remove unneeded CONFIG_HAVE_KVM_MSI #ifdefs
- check memory table size against our LPI limit (65536 interrupts)
- observe initial gap of 1024 interrupts in pending table
- use term "configuration table" to be in line with the spec
- clarify and extend documentation on API extensions
- introduce new KVM_CAP_MSI_DEVID capability to advertise device ID requirement
- update, fix and add many comments
- minor style changes as requested by reviewers

---

The GICv3 ITS (Interrupt Translation Service) is a part of the
ARM GICv3 interrupt controller [4] used for implementing MSIs.
It specifies a new kind of interrupts (LPIs), which are mapped to
establish a connection between a device, its MSI payload value and
the target processor the IRQ is eventually delivered to.
In order to allow using MSIs in an ARM64 KVM guest, we emulate this
ITS widget in the kernel.
The ITS works by reading commands written by software (from the guest
in our case) into a (guest allocated) memory region and establishing
the mapping between a device, the MSI payload and the target CPU.
We parse these commands and update our internal data structures to
reflect those changes. On an MSI injection we iterate those
structures to learn the LPI number we have to inject.
For the time being we use simple lists to hold the data, this is
good enough for the small number of entries each of the components
currently have. Should this become a performance bottleneck in the
future, those can be extended to arrays or trees if needed.

Most of the code lives in a separate source file (its-emul.c), though
there are some changes necessary both in vgic.c and vgic-v3-emul.c.

Patch 01/16 gets rid of the internal tracking of the used LR for
an injected IRQ, see the commit message for more details.
Patch 03/16 extends the KVM MSI ioctl to hold a device ID.
Patch 04-06 make small changes to the existing VGIC code which make
adaptions to the ITS later easier.
The rest of the patches implement the ITS functionality step by step.
For more details see the respective commit messages.

For the time being this series gives us the ability to use emulated
PCI devices that can use MSIs in the guest. Those have to be
triggered by letting the userland device emulation simulate the MSI
write with the KVM_SIGNAL_MSI ioctl. This will be translated into
the proper LPI by the ITS emulation and injected into the guest in
the usual way (just with a higher IRQ number).

This series is based on 4.3-rc2 and can be found at the its-emul/v3
branch of this repository [2].
For this to be used you need a GICv3 host machine (a fast model would
do), though it does not rely on any host ITS bits (neither in hardware
or software).

To test this you can use the kvmtool patches available in the "its"
branch here [3].
Start a guest with: "$ lkvm run --irqchip=gicv3-its --force-pci"
and see the ITS being used for instance by the virtio devices.

[1]: 
https://git.linaro.org/people/christoffer.dall/linux-kvm-arm.git/shortlog/refs/heads/timer-rework-v3
[2]: git://linux-arm.org/linux-ap.git
 

[PATCH v3 02/16] KVM: arm/arm64: remove now unused code after stay-in-LR rework

2015-10-07 Thread Andre Przywara
Now that we synchronize the LR state into our emulation upon guest
exit, there is no need for taking extra care of disabled IRQs.
Remove that code.

Signed-off-by: Andre Przywara 
---
Changelog v2..v3:
- new patch

 virt/kvm/arm/vgic.c | 29 -
 1 file changed, 29 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index da0a866..a5360b7 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -101,7 +101,6 @@
 
 #include "vgic.h"
 
-static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
 static void vgic_retire_lr(int lr_nr, struct kvm_vcpu *vcpu);
 static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
 static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
@@ -477,7 +476,6 @@ bool vgic_handle_enable_reg(struct kvm *kvm, struct 
kvm_exit_mmio *mmio,
 {
u32 *reg;
int mode = ACCESS_READ_VALUE | access;
-   struct kvm_vcpu *target_vcpu = kvm_get_vcpu(kvm, vcpu_id);
 
reg = vgic_bitmap_get_reg(>arch.vgic.irq_enabled, vcpu_id, offset);
vgic_reg_access(mmio, reg, offset, mode);
@@ -485,7 +483,6 @@ bool vgic_handle_enable_reg(struct kvm *kvm, struct 
kvm_exit_mmio *mmio,
if (access & ACCESS_WRITE_CLEARBIT) {
if (offset < 4) /* Force SGI enabled */
*reg |= 0x;
-   vgic_retire_disabled_irqs(target_vcpu);
}
vgic_update_state(kvm);
return true;
@@ -1099,32 +1096,6 @@ static void vgic_retire_lr(int lr_nr, struct kvm_vcpu 
*vcpu)
vgic_sync_lr_elrsr(vcpu, lr_nr, vlr);
 }
 
-/*
- * An interrupt may have been disabled after being made pending on the
- * CPU interface (the classic case is a timer running while we're
- * rebooting the guest - the interrupt would kick as soon as the CPU
- * interface gets enabled, with deadly consequences).
- *
- * The solution is to examine already active LRs, and check the
- * interrupt is still enabled. If not, just retire it.
- */
-static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu)
-{
-   u64 elrsr = vgic_get_elrsr(vcpu);
-   unsigned long *elrsr_ptr = u64_to_bitmask();
-   int lr;
-
-   for_each_clear_bit(lr, elrsr_ptr, vgic->nr_lr) {
-   struct vgic_lr vlr = vgic_get_lr(vcpu, lr);
-
-   if (!vgic_irq_is_enabled(vcpu, vlr.irq)) {
-   vgic_retire_lr(lr, vcpu);
-   if (vgic_irq_is_queued(vcpu, vlr.irq))
-   vgic_irq_clear_queued(vcpu, vlr.irq);
-   }
-   }
-}
-
 static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
 int lr_nr, struct vgic_lr vlr)
 {
-- 
2.5.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 09/16] KVM: arm64: introduce ITS emulation file with stub functions

2015-10-07 Thread Andre Przywara
The ARM GICv3 ITS emulation code goes into a separate file, but
needs to be connected to the GICv3 emulation, of which it is an
option.
Introduce the skeleton with function stubs to be filled later.
Introduce the basic ITS data structure and initialize it, but don't
return any success yet, as we are not yet ready for the show.

Signed-off-by: Andre Przywara 
Reviewed-by: Eric Auger 
---
Changelog v2..v3:
- drop ITS check before doing GICR_CTLR access

 arch/arm64/kvm/Makefile|   1 +
 include/kvm/arm_vgic.h |   6 ++
 include/linux/irqchip/arm-gic-v3.h |   1 +
 virt/kvm/arm/its-emul.c| 125 +
 virt/kvm/arm/its-emul.h|  35 +++
 virt/kvm/arm/vgic-v3-emul.c|  20 +-
 6 files changed, 185 insertions(+), 3 deletions(-)
 create mode 100644 virt/kvm/arm/its-emul.c
 create mode 100644 virt/kvm/arm/its-emul.h

diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 1949fe5..75069a9 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -25,5 +25,6 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v2-emul.o
 kvm-$(CONFIG_KVM_ARM_HOST) += vgic-v2-switch.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v3.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v3-emul.o
+kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/its-emul.o
 kvm-$(CONFIG_KVM_ARM_HOST) += vgic-v3-switch.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 06c33bc..c8c48e3 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -168,6 +168,11 @@ struct irq_phys_map_entry {
struct irq_phys_map map;
 };
 
+struct vgic_its {
+   boolenabled;
+   spinlock_t  lock;
+};
+
 struct vgic_dist {
spinlock_t  lock;
boolin_kernel;
@@ -280,6 +285,7 @@ struct vgic_dist {
u64 *pendbaser;
 
boollpis_enabled;
+   struct vgic_its its;
 };
 
 struct vgic_v2_cpu_if {
diff --git a/include/linux/irqchip/arm-gic-v3.h 
b/include/linux/irqchip/arm-gic-v3.h
index 9eeeb95..70e9539 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -179,6 +179,7 @@
 #define GITS_CWRITER   0x0088
 #define GITS_CREADR0x0090
 #define GITS_BASER 0x0100
+#define GITS_IDREGS_BASE   0xffd0
 #define GITS_PIDR2 GICR_PIDR2
 
 #define GITS_TRANSLATER0x10040
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
new file mode 100644
index 000..659dd39
--- /dev/null
+++ b/virt/kvm/arm/its-emul.c
@@ -0,0 +1,125 @@
+/*
+ * GICv3 ITS emulation
+ *
+ * Copyright (C) 2015 ARM Ltd.
+ * Author: Andre Przywara 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "vgic.h"
+#include "its-emul.h"
+
+static bool handle_mmio_misc_gits(struct kvm_vcpu *vcpu,
+ struct kvm_exit_mmio *mmio,
+ phys_addr_t offset)
+{
+   return false;
+}
+
+static bool handle_mmio_gits_idregs(struct kvm_vcpu *vcpu,
+   struct kvm_exit_mmio *mmio,
+   phys_addr_t offset)
+{
+   return false;
+}
+
+static bool handle_mmio_gits_cbaser(struct kvm_vcpu *vcpu,
+   struct kvm_exit_mmio *mmio,
+   phys_addr_t offset)
+{
+   return false;
+}
+
+static bool handle_mmio_gits_cwriter(struct kvm_vcpu *vcpu,
+struct kvm_exit_mmio *mmio,
+phys_addr_t offset)
+{
+   return false;
+}
+
+static bool handle_mmio_gits_creadr(struct kvm_vcpu *vcpu,
+   struct kvm_exit_mmio *mmio,
+   phys_addr_t offset)
+{
+   return false;
+}
+
+static const struct vgic_io_range vgicv3_its_ranges[] = {
+   {
+   .base   = GITS_CTLR,
+   .len= 0x10,
+   .bits_per_irq   = 0,
+   .handle_mmio= handle_mmio_misc_gits,
+  

Re: [PATCH v2 13/15] KVM: arm64: implement ITS command queue command handlers

2015-10-07 Thread Andre Przywara
Hi Eric,

thanks a lot for your comments. I just found this email, which seemed to
have been crushed between my holidays and KVM forum.
I tried to address your concerns in my new revision, some have become
obsolete due to the reworked locking.
So I refrain from answering in detail here, since some code has changed
in v3 and I don't really want to talk about the old version anymore ;-)

If you care I can try to answer on your concerns in the new v3 context,
but you may want to take a look at the new patch anyway.

Cheers,
Andre.

On 17/08/15 14:33, Eric Auger wrote:
> On 07/10/2015 04:21 PM, Andre Przywara wrote:
>> The connection between a device, an event ID, the LPI number and the
>> allocated CPU is stored in in-memory tables in a GICv3, but their
>> format is not specified by the spec. Instead software uses a command
>> queue in a ring buffer to let the ITS implementation use their own
>> format.
>> Implement handlers for the various ITS commands and let them store
>> the requested relation into our own data structures.
>> To avoid kmallocs inside the ITS spinlock, we preallocate possibly
>> needed memory outside of the lock and free that if it turns out to
>> be not needed (mostly error handling).
> still dist lock ...?
>> Error handling is very basic at this point, as we don't have a good
>> way of communicating errors to the guest (usually a SError).
>> The INT command handler is missing at this point, as we gain the
>> capability of actually injecting MSIs into the guest only later on.
>>
>> Signed-off-by: Andre Przywara 
>> ---
>>  include/linux/irqchip/arm-gic-v3.h |   5 +-
>>  virt/kvm/arm/its-emul.c| 497 
>> -
>>  virt/kvm/arm/its-emul.h|  11 +
>>  3 files changed, 511 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/irqchip/arm-gic-v3.h 
>> b/include/linux/irqchip/arm-gic-v3.h
>> index 0b450c7..80db4f6 100644
>> --- a/include/linux/irqchip/arm-gic-v3.h
>> +++ b/include/linux/irqchip/arm-gic-v3.h
>> @@ -253,7 +253,10 @@
>>   */
>>  #define GITS_CMD_MAPD   0x08
>>  #define GITS_CMD_MAPC   0x09
>> -#define GITS_CMD_MAPVI  0x0a
>> +#define GITS_CMD_MAPTI  0x0a
>> +/* older GIC documentation used MAPVI for this command */
>> +#define GITS_CMD_MAPVI  GITS_CMD_MAPTI
>> +#define GITS_CMD_MAPI   0x0b
>>  #define GITS_CMD_MOVI   0x01
>>  #define GITS_CMD_DISCARD0x0f
>>  #define GITS_CMD_INV0x0c
>> diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
>> index 05245cb..89534c6 100644
>> --- a/virt/kvm/arm/its-emul.c
>> +++ b/virt/kvm/arm/its-emul.c
>> @@ -22,6 +22,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #include 
>>  #include 
>> @@ -55,6 +56,34 @@ struct its_itte {
>>  unsigned long *pending;
>>  };
>>  
>> +static struct its_device *find_its_device(struct kvm *kvm, u32 device_id)
>> +{
>> +struct vgic_its *its = >arch.vgic.its;
>> +struct its_device *device;
>> +
>> +list_for_each_entry(device, >device_list, dev_list)
>> +if (device_id == device->device_id)
>> +return device;
>> +
>> +return NULL;
>> +}
>> +
>> +static struct its_itte *find_itte(struct kvm *kvm, u32 device_id, u32 
>> event_id)
>> +{
>> +struct its_device *device;
>> +struct its_itte *itte;
>> +
>> +device = find_its_device(kvm, device_id);
>> +if (device == NULL)
>> +return NULL;
>> +
>> +list_for_each_entry(itte, >itt, itte_list)
>> +if (itte->event_id == event_id)
>> +return itte;
>> +
>> +return NULL;
>> +}
>> +
>>  #define for_each_lpi(dev, itte, kvm) \
>>  list_for_each_entry(dev, &(kvm)->arch.vgic.its.device_list, dev_list) \
>>  list_for_each_entry(itte, &(dev)->itt, itte_list)
>> @@ -71,6 +100,19 @@ static struct its_itte *find_itte_by_lpi(struct kvm 
>> *kvm, int lpi)
>>  return NULL;
>>  }
>>  
>> +static struct its_collection *find_collection(struct kvm *kvm, int coll_id)
>> +{
>> +struct its_collection *collection;
>> +
>> +list_for_each_entry(collection, >arch.vgic.its.collection_list,
>> +coll_list) {
>> +if (coll_id == collection->collection_id)
>> +return collection;
>> +}
>> +
>> +return NULL;
>> +}
>> +
>>  #define LPI_PROP_ENABLE_BIT(p)  ((p) & LPI_PROP_ENABLED)
>>  #define LPI_PROP_PRIORITY(p)((p) & 0xfc)
>>  
>> @@ -333,9 +375,461 @@ void vits_unqueue_lpi(struct kvm_vcpu *vcpu, int lpi)
>>  spin_unlock(>lock);
>>  }
>>  
>> +static u64 its_cmd_mask_field(u64 *its_cmd, int word, int shift, int size)
>> +{
>> +return (le64_to_cpu(its_cmd[word]) >> shift) & (BIT_ULL(size) - 1);
>> +}
>> +
>> +#define its_cmd_get_command(cmd)its_cmd_mask_field(cmd, 0,  0,  8)
>> +#define 

Re: [PATCH] Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS

2015-10-07 Thread Bandan Das
Joerg Roedel  writes:

> On Tue, Oct 06, 2015 at 01:59:27PM -0400, Bandan Das wrote:
>> Joerg Roedel  writes:
>> >
>> > So svm->vmcb->control.next_rip is only written by hardware or in
>> > svm_check_intercept(). Both cases write only to this field, if the
>> > hardware supports X86_FEATURE_NRIPS. The write in nested_svm_vmexit only
>> 
>> Not until commit f104765b4f81fd74d69e0eb161e89096deade2db. So, an older L1
>> kernel will trigger it.
>
> But we don't care if L1 writes something into its own next_rip, as we
> never read this value from its VMCB. We only copy the next_rip value we
> get from our shadow-vmcb to it on an emulated vmexit. So I still don't
> understand what triggers the reported problem or why the WARN_ON is
> necessary.

Ok, looks like I am making some incorrect "vmx" assumptions here. What happens
when we exit from L2 to L0, arent' we looking at the VMCB L1 is using to run
L2 ? Wouldn't that trigger the warning if the host processor does not support
nrips and the field is set ?


>
>   Joerg
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: svm: Only propagate next_rip when guest supports it

2015-10-07 Thread Joerg Roedel
On Wed, Oct 07, 2015 at 11:48:36AM -0400, Bandan Das wrote:
> Ok, understood now. The warn_on would trigger in L1 only if it has
> decided to disable nrips for some reason as was the case here. So,
> my reasoning behind putting the warning was incorrect.

Okay, so I think the warning can be removed.

> > +
> > +   if (guest_cpuid_has_nrips(vcpu))
> > +   nested_vmcb->control.next_rip  = vmcb->control.next_rip;

Note that there is a bug here, instead of vcpu it must be >vcpu.
Somehow I missed to at least compile-test this.

Dirk is currently testing whether this (fixed) patch solves the problem
in his setup.

> >  
> > /*
> >  * If we emulate a VMRUN/#VMEXIT in the same host #vmexit cycle we have
> > @@ -2714,6 +2716,9 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
> > svm->vmcb->control.event_inj = nested_vmcb->control.event_inj;
> > svm->vmcb->control.event_inj_err = nested_vmcb->control.event_inj_err;
> >  
> > +   /* Clear next_rip, as real hardware would do */
> > +   nested_vmcb->control.next_rip = 0;
> > +
> 
> Why do we need this ? And are you sure this is what real hardware does ?
> I couldn't find anything in the spec.

Yeah, probably right. Since we only write guests next_rip when the guest
supports it via cpuid, there is probably no point in resetting it at
vmrun emulation.


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 00/16] KVM: arm64: GICv3 ITS emulation

2015-10-07 Thread Marc Zyngier
On 07/10/15 17:05, Pavel Fedin wrote:
>  Hello!
> 
>  One more concern about the whole thing. I already replied to the previous 
> series, but looks like my
> reply was missed.
>  Your implementation does not care about live migration at all. And there's 
> one fundamental issue
> with it. In the redistributor LPIs can be only pending, but in the CPU 
> interface they still can be
> active. And they have priorities, therefore they can be preempted, so we can 
> have even more than one
> active LPI at once. How to migrate this state?
>  Here i am trying to prototype this by leaving active interrupts in LRs and 
> allowing the userland to
> read/write them. This looks a bit stupid, additionally this will create 
> problems if we are e. g.
> migrating from host with 8 LRs to host with 4 LRs, while having 6 active 
> LPIs. Can anybody suggest
> better solution?
>  Technically LPI pending table has unused bits from 0 to 8191, and we have 
> 8192 LPIs, so we could
> push active state there, just for migration. Would this be a big violation of 
> specification? It says
> nothing about these bits at all.

LPIs do not have an active state, at the redistributor or otherwise.

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 12/16] KVM: arm64: handle pending bit for LPIs in ITS emulation

2015-10-07 Thread Marc Zyngier
On 07/10/15 16:46, Pavel Fedin wrote:
>  Hello!
> 
>> Sure. And you then have to parse and validate all the tables each and
>> every time you're going to inject an interrupt (because the guest can
>> change the table content behind your back). You are quickly going to
>> notice that your performance is abysmal.
> 
>  I don't see any real problems, at least with LPI tables. If the guest 
> changes something, it will be
> immediately available to us. I don't see any need to seriously validate 
> something, at least here.
> Pending bit is just pending bit, and configuration is just priority value 
> plus enable bit.
>  But, well, if we think a bit better, in case of pending bit modification, 
> the operations on both
> guest and host side have to be atomic, otherwise we can clobber our table if, 
> for example, both host
> and guest modify adjacent bits. And there's no way to interlock with the 
> guest. So, OK, i accept
> your point.

The pending table is the least of our concerns. Device table, ITTs,
collections. That's the real problem.

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v3 00/16] KVM: arm64: GICv3 ITS emulation

2015-10-07 Thread Pavel Fedin
 Hello!

 One more concern about the whole thing. I already replied to the previous 
series, but looks like my
reply was missed.
 Your implementation does not care about live migration at all. And there's one 
fundamental issue
with it. In the redistributor LPIs can be only pending, but in the CPU 
interface they still can be
active. And they have priorities, therefore they can be preempted, so we can 
have even more than one
active LPI at once. How to migrate this state?
 Here i am trying to prototype this by leaving active interrupts in LRs and 
allowing the userland to
read/write them. This looks a bit stupid, additionally this will create 
problems if we are e. g.
migrating from host with 8 LRs to host with 4 LRs, while having 6 active LPIs. 
Can anybody suggest
better solution?
 Technically LPI pending table has unused bits from 0 to 8191, and we have 8192 
LPIs, so we could
push active state there, just for migration. Would this be a big violation of 
specification? It says
nothing about these bits at all.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v3 12/16] KVM: arm64: handle pending bit for LPIs in ITS emulation

2015-10-07 Thread Pavel Fedin
 Hello!

> Sure. And you then have to parse and validate all the tables each and
> every time you're going to inject an interrupt (because the guest can
> change the table content behind your back). You are quickly going to
> notice that your performance is abysmal.

 I don't see any real problems, at least with LPI tables. If the guest changes 
something, it will be
immediately available to us. I don't see any need to seriously validate 
something, at least here.
Pending bit is just pending bit, and configuration is just priority value plus 
enable bit.
 But, well, if we think a bit better, in case of pending bit modification, the 
operations on both
guest and host side have to be atomic, otherwise we can clobber our table if, 
for example, both host
and guest modify adjacent bits. And there's no way to interlock with the guest. 
So, OK, i accept
your point.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: svm: Only propagate next_rip when guest supports it

2015-10-07 Thread Bandan Das
Joerg Roedel  writes:

> On Wed, Oct 07, 2015 at 01:03:35PM +0200, Joerg Roedel wrote:
>> But we don't care if L1 writes something into its own next_rip, as we
>> never read this value from its VMCB. We only copy the next_rip value we
>> get from our shadow-vmcb to it on an emulated vmexit. So I still don't
>> understand what triggers the reported problem or why the WARN_ON is
>> necessary.
>
> Okay, I think I have an idea now. I talked a bit with Dirk and the
> WARN_ON triggers in the guest, and not on the host. This makes a lot
> more sense.
>
> In nested-svm we always copy the next_rip from the shadow-vmcb to the
> guests vmcb, even when the nrips bit in cpuid is not set for the guest.
> This obviously triggers the WARN_ON() in the L1 KVM (I still don't
> understand why the WARN_ON was introduced in the first place).

Ok, understood now. The warn_on would trigger in L1 only if it has
decided to disable nrips for some reason as was the case here. So,
my reasoning behind putting the warning was incorrect. 

> So the right fix is to only copy next_rip to the guests vmcb when its
> cpuid indicates that next_rip is supported there, like in this patch:

Yep, agreed.

> From 019afc60507618b8e44e0c67d5ea2d850d88c9dd Mon Sep 17 00:00:00 2001
> From: Joerg Roedel 
> Date: Wed, 7 Oct 2015 13:38:19 +0200
> Subject: [PATCH] kvm: svm: Only propagate next_rip when guest supports it
>
> Currently we always write the next_rip of the shadow vmcb to
> the guests vmcb when we emulate a vmexit. This could confuse
> the guest when its cpuid indicated no support for the
> next_rip feature.
>
> Fix this by only propagating next_rip if the guest actually
> supports it.
>
> Signed-off-by: Joerg Roedel 
> ---
>  arch/x86/kvm/cpuid.h | 21 +
>  arch/x86/kvm/svm.c   |  7 ++-
>  2 files changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index dd05b9c..effca1f 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -133,4 +133,25 @@ static inline bool guest_cpuid_has_mpx(struct kvm_vcpu 
> *vcpu)
>   best = kvm_find_cpuid_entry(vcpu, 7, 0);
>   return best && (best->ebx & bit(X86_FEATURE_MPX));
>  }
> +
> +/*
> + * NRIPS is provided through cpuidfn 0x800a.edx bit 3
> + */
> +#define BIT_NRIPS3
> +
> +static inline bool guest_cpuid_has_nrips(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_cpuid_entry2 *best;
> +
> + best = kvm_find_cpuid_entry(vcpu, 0x800a, 0);
> +
> + /*
> +  * NRIPS is a scattered cpuid feature, so we can't use
> +  * X86_FEATURE_NRIPS here (X86_FEATURE_NRIPS would be bit
> +  * position 8, not 3).
> +  */
> + return best && (best->edx & bit(BIT_NRIPS));
> +}
> +#undef BIT_NRIPS
> +
>  #endif
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 94b7d15..e1a8824 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -2459,7 +2459,9 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
>   nested_vmcb->control.exit_info_2   = vmcb->control.exit_info_2;
>   nested_vmcb->control.exit_int_info = vmcb->control.exit_int_info;
>   nested_vmcb->control.exit_int_info_err = 
> vmcb->control.exit_int_info_err;
> - nested_vmcb->control.next_rip  = vmcb->control.next_rip;
> +
> + if (guest_cpuid_has_nrips(vcpu))
> + nested_vmcb->control.next_rip  = vmcb->control.next_rip;
>  
>   /*
>* If we emulate a VMRUN/#VMEXIT in the same host #vmexit cycle we have
> @@ -2714,6 +2716,9 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
>   svm->vmcb->control.event_inj = nested_vmcb->control.event_inj;
>   svm->vmcb->control.event_inj_err = nested_vmcb->control.event_inj_err;
>  
> + /* Clear next_rip, as real hardware would do */
> + nested_vmcb->control.next_rip = 0;
> +

Why do we need this ? And are you sure this is what real hardware does ?
I couldn't find anything in the spec.

>   nested_svm_unmap(page);
>  
>   /* Enter Guest-Mode */
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v3 00/16] KVM: arm64: GICv3 ITS emulation

2015-10-07 Thread Pavel Fedin
 Hello!

> LPIs do not have an active state, at the redistributor or otherwise.

 Then what do they become after they were ACK'ed and before EOI'ed?
 I tried to google up this thing, and came up with this email:
http://www.spinics.net/lists/kvm-arm/msg16032.html. It says that "SW must issue 
a write to EOI to
clear the active priorities register, hence the CPU interface still requires an 
active state for
LPIs". They give a link to some document which seems to be top-secret and never 
published, because
my arch reference manual does not have section 4.8.3 named "Properties of LPI".
 And another thread, 
http://lists.xen.org/archives/html/xen-devel/2014-09/msg01141.html, says that
virtual LPIs actually do have active state in LR.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: svm: Only propagate next_rip when guest supports it

2015-10-07 Thread Dirk Müller

On 07.10.2015 18:14, Joerg Roedel wrote:


Dirk is currently testing whether this (fixed) patch solves the problem
in his setup.


Tested-By: Dirk Mueller 

Works fine here. Thanks!


Greetings,
Dirk
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html