Re: [PATCH] KVM: arm/arm64: VGIC MMIO: add missing irq_lock

2018-03-12 Thread Christoffer Dall
On Tue, Mar 06, 2018 at 09:21:06AM +, Andre Przywara wrote:
> Our irq_is_pending() helper function accesses multiple members of the
> vgic_irq struct, so we need to hold the lock when calling it.

For the record, I don't think this is necessarily a valid conclusion.
The fact that you access multiple members of a struct is a good
indication that holding a lock might be warranted, but it's not as
simple as that.

I think the only thing that could happen here is that a caller
mistakenly evaluates line_level when it shouldn't. That would only
happen when changing the configuration of an irq from level to edge
while the line_level is high, expecting the line_level to go down and
the pending state to be subsequently reported as false. But we only
support changing the configuration of an interrupt when it's disabled,
and as a result this can only affect reads of the PENDR registers.
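
For reference, this is the helper in question, as it reads in vgic.h at
this point in the series; the level-triggered branch is what makes the
scenario above possible, since it looks at both pending_latch and
line_level:

	static inline bool irq_is_pending(struct vgic_irq *irq)
	{
		if (irq->config == VGIC_CONFIG_EDGE)
			return irq->pending_latch;
		else
			return irq->pending_latch || irq->line_level;
	}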

> Add that requirement as a comment to the definition and take the lock
> around the call in vgic_mmio_read_pending(), where we were missing it
> before.
> 
> Signed-off-by: Andre Przywara 

Note, I'm fine with this change, but I don't agree with the rationale.
The better rationale is to take the lock on every use for consistency
and to make the code easier to reason about; some future analysis may
relax this requirement if that proves essential for performance.

Thanks,
-Christoffer

> ---
>  virt/kvm/arm/vgic/vgic-mmio.c | 3 +++
>  virt/kvm/arm/vgic/vgic.h  | 1 +
>  2 files changed, 4 insertions(+)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
> index 83d82bd7dc4e..dbe99d635c80 100644
> --- a/virt/kvm/arm/vgic/vgic-mmio.c
> +++ b/virt/kvm/arm/vgic/vgic-mmio.c
> @@ -113,9 +113,12 @@ unsigned long vgic_mmio_read_pending(struct kvm_vcpu *vcpu,
>   /* Loop over all IRQs affected by this read */
>   for (i = 0; i < len * 8; i++) {
>   struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> + unsigned long flags;
>  
> + spin_lock_irqsave(&irq->irq_lock, flags);
>   if (irq_is_pending(irq))
>   value |= (1U << i);
> + spin_unlock_irqrestore(&irq->irq_lock, flags);
>  
>   vgic_put_irq(vcpu->kvm, irq);
>   }
> diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
> index 12c37b89f7a3..5b11859a1a1e 100644
> --- a/virt/kvm/arm/vgic/vgic.h
> +++ b/virt/kvm/arm/vgic/vgic.h
> @@ -96,6 +96,7 @@
>  /* we only support 64 kB translation table page size */
>  #define KVM_ITS_L1E_ADDR_MASK	GENMASK_ULL(51, 16)
>  
> +/* Requires the irq_lock to be held by the caller. */
>  static inline bool irq_is_pending(struct vgic_irq *irq)
>  {
>   if (irq->config == VGIC_CONFIG_EDGE)
> -- 
> 2.14.1
> 


Re: [PATCH v2 2/2] kvm: arm/arm64: vgic-v3: Tighten synchronization for guests using v2 on v3

2018-03-12 Thread Christoffer Dall
On Sun, Mar 11, 2018 at 12:49:56PM +, Marc Zyngier wrote:
> On guest exit, and when using GICv2 on GICv3, we use a dsb(st) to
> force synchronization between the memory-mapped guest view and
> the system-register view that the hypervisor uses.
> 
> This is incorrect, as the spec calls out the need for "a DSB whose
> required access type is both loads and stores with any Shareability
> attribute", while we're only synchronizing stores.
> 
> We also lack an isb after the dsb to ensure that the latter has
> actually been executed before we start reading stuff from the sysregs.
> 
> The fix is pretty easy: turn dsb(st) into dsb(sy), and slap an isb()
> just after.
> 
> Cc: sta...@vger.kernel.org
> Fixes: f68d2b1b73cc ("arm64: KVM: Implement vgic-v3 save/restore")
> Reviewed-by: Andre Przywara 
> Signed-off-by: Marc Zyngier 

Acked-by: Christoffer Dall 

> ---
>  virt/kvm/arm/hyp/vgic-v3-sr.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/arm/hyp/vgic-v3-sr.c b/virt/kvm/arm/hyp/vgic-v3-sr.c
> index f5c3d6d7019e..b89ce5432214 100644
> --- a/virt/kvm/arm/hyp/vgic-v3-sr.c
> +++ b/virt/kvm/arm/hyp/vgic-v3-sr.c
> @@ -215,7 +215,8 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
>* are now visible to the system register interface.
>*/
>   if (!cpu_if->vgic_sre) {
> - dsb(st);
> + dsb(sy);
> + isb();
>   cpu_if->vgic_vmcr = read_gicreg(ICH_VMCR_EL2);
>   }
>  
> -- 
> 2.14.2
> 


Re: [PATCH v2 1/2] KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintid

2018-03-12 Thread Christoffer Dall
On Sun, Mar 11, 2018 at 12:49:55PM +, Marc Zyngier wrote:
> The vgic code is trying to be clever when injecting GICv2 SGIs,
> and will happily populate LRs with the same interrupt number if
> they come from multiple vcpus (after all, they are distinct
> interrupt sources).
> 
> Unfortunately, this is against the letter of the architecture,
> and the GICv2 architecture spec says "Each valid interrupt stored
> in the List registers must have a unique VirtualID for that
> virtual CPU interface.". GICv3 has similar (although slightly
> ambiguous) restrictions.
> 
> This results in guests locking up when using GICv2-on-GICv3, for
> example. The obvious fix is to stop trying so hard, and inject
> a single vcpu per SGI per guest entry. After all, pending SGIs
> with multiple source vcpus are pretty rare, and are mostly seen
> in scenarios where the physical CPUs are severely overcommitted.
> 
> But as we now only inject a single instance of a multi-source SGI per
> vcpu entry, we may delay those interrupts for longer than strictly
> necessary, and run the risk of injecting lower priority interrupts
> in the meantime.
> 
> In order to address this, we adopt a three stage strategy:
> - If we encounter a multi-source SGI in the AP list while computing
>   its depth, we force the list to be sorted
> - When populating the LRs, we prevent the injection of any interrupt
>   of lower priority than that of the first multi-source SGI we've
>   injected.
> - Finally, the injection of a multi-source SGI triggers the request
>   of a maintenance interrupt when there will be no pending interrupt
>   in the LRs (HCR_NPIE).
> 
> At the point where the last pending interrupt in the LRs switches
> from Pending to Active, the maintenance interrupt will be delivered,
> allowing us to add the remaining SGIs using the same process.
> 
> Cc: sta...@vger.kernel.org
> Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
> Signed-off-by: Marc Zyngier 

The fact that we have to do this is really annoying, but I see no other
way around it.  It will get slightly better if we move to insertion sort
based on priorities when injecting interrupts as discussed with Andre,
though.

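Concretely, the flush-time rule described above boils down to something
like the sketch below. This is not the verbatim patch: irq_source_count()
is a stand-in for the real check of the SGI source bitmap, and locking
and bookkeeping are omitted.

	bool multi_sgi = false;
	u8 prio = 0xff;		/* lowest GIC priority */
	int count = 0;

	list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
		/* nothing of lower priority (numerically higher) may
		 * follow a multi-source SGI into the LRs */
		if (multi_sgi && irq->priority > prio)
			break;
		vgic_populate_lr(vcpu, irq, count++);
		if (irq_source_count(irq) > 1) {	/* sources left over */
			multi_sgi = true;
			prio = irq->priority;
		}
	}
	if (multi_sgi)
		vgic_set_npie(vcpu);	/* MI once no pending IRQ is left in the LRs */
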
Acked-by: Christoffer Dall 

> ---
>  include/linux/irqchip/arm-gic-v3.h |  1 +
>  include/linux/irqchip/arm-gic.h|  1 +
>  virt/kvm/arm/vgic/vgic-v2.c|  9 +-
>  virt/kvm/arm/vgic/vgic-v3.c|  9 +-
>  virt/kvm/arm/vgic/vgic.c   | 61 +-
>  virt/kvm/arm/vgic/vgic.h   |  2 ++
>  6 files changed, 67 insertions(+), 16 deletions(-)
> 
> diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
> index c00c4c33e432..b26eccc78fb1 100644
> --- a/include/linux/irqchip/arm-gic-v3.h
> +++ b/include/linux/irqchip/arm-gic-v3.h
> @@ -503,6 +503,7 @@
>  
>  #define ICH_HCR_EN   (1 << 0)
>  #define ICH_HCR_UIE  (1 << 1)
> +#define ICH_HCR_NPIE (1 << 3)
>  #define ICH_HCR_TC   (1 << 10)
>  #define ICH_HCR_TALL0   (1 << 11)
>  #define ICH_HCR_TALL1   (1 << 12)
> diff --git a/include/linux/irqchip/arm-gic.h b/include/linux/irqchip/arm-gic.h
> index d3453ee072fc..68d8b1f73682 100644
> --- a/include/linux/irqchip/arm-gic.h
> +++ b/include/linux/irqchip/arm-gic.h
> @@ -84,6 +84,7 @@
>  
>  #define GICH_HCR_EN  (1 << 0)
>  #define GICH_HCR_UIE (1 << 1)
> +#define GICH_HCR_NPIE   (1 << 3)
>  
>  #define GICH_LR_VIRTUALID   (0x3ff << 0)
>  #define GICH_LR_PHYSID_CPUID_SHIFT   (10)
> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
> index c32d7b93ffd1..44264d11be02 100644
> --- a/virt/kvm/arm/vgic/vgic-v2.c
> +++ b/virt/kvm/arm/vgic/vgic-v2.c
> @@ -37,6 +37,13 @@ void vgic_v2_init_lrs(void)
>   vgic_v2_write_lr(i, 0);
>  }
>  
> +void vgic_v2_set_npie(struct kvm_vcpu *vcpu)
> +{
> + struct vgic_v2_cpu_if *cpuif = &vcpu->arch.vgic_cpu.vgic_v2;
> +
> + cpuif->vgic_hcr |= GICH_HCR_NPIE;
> +}
> +
>  void vgic_v2_set_underflow(struct kvm_vcpu *vcpu)
>  {
>   struct vgic_v2_cpu_if *cpuif = &vcpu->arch.vgic_cpu.vgic_v2;
> @@ -64,7 +71,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
>   int lr;
>   unsigned long flags;
>  
> - cpuif->vgic_hcr &= ~GICH_HCR_UIE;
> + cpuif->vgic_hcr &= ~(GICH_HCR_UIE | GICH_HCR_NPIE);
>  
>   for (lr = 0; lr < vgic_cpu->used_lrs; lr++) {
>   u32 val = cpuif->vgic_lr[lr];
> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> index 6b329414e57a..0ff2006f3781 100644
> --- a/virt/kvm/arm/vgic/vgic-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-v3.c
> @@ -26,6 +26,13 @@ static bool group1_trap;
>  static bool common_trap;
>  static bool gicv4_enable;
>  
> +void vgic_v3_set_npie(struct kvm_vcpu *vcpu)
> +{
> + struct vgic_v3_cpu_if *cpuif = &vcpu->arch.vgic_cpu.vgic_v3;
> +
> + cpuif->vgic_hcr |= ICH_HCR_NPIE;
> +}

Re: [PATCH v5 19/23] arm64: KVM: Allow far branches from vector slots to the main vectors

2018-03-12 Thread Marc Zyngier
Hi James,

On 12/03/18 18:27, James Morse wrote:
> Hi Marc,
> 
> On 01/03/18 15:55, Marc Zyngier wrote:
>> So far, the branch from the vector slots to the main vectors can at
>> most be 4GB from the main vectors (the reach of ADRP), and this
>> distance is known at compile time. If we were to remap the slots
>> to an unrelated VA, things would break badly.
>>
>> A way to achieve VA independence would be to load the absolute
>> address of the vectors (__kvm_hyp_vector), either using a constant
>> pool or a series of movs, followed by an indirect branch.
>>
>> This patches implements the latter solution, using another instance
>> of a patching callback.
> 
>> diff --git a/arch/arm64/kernel/bpi.S b/arch/arm64/kernel/bpi.S
>> index e000cb390618..e8d997788ad0 100644
>> --- a/arch/arm64/kernel/bpi.S
>> +++ b/arch/arm64/kernel/bpi.S
>> @@ -19,6 +19,9 @@
>>  #include 
>>  #include 
>>  
>> +#include 
>> +#include 
>> +
>>  .macro hyp_ventry offset
>>  .align 7
>>  .rept 29
>> @@ -64,9 +67,15 @@ ENTRY(__bp_harden_hyp_vecs_start)
>>  .endr
>>  
>>  __kvm_enter_vectors:
>> +alternative_cb  kvm_patch_vector_branch
>> +	movz	x1, #0
>> +	movk	x1, #0, lsl #16
>> +	movk	x1, #0, lsl #32
>> +	movk	x1, #0, lsl #48
>> +alternative_cb_end
>>  
>> -adr_l   x1, __kvm_hyp_vector
>>  add x0, x1, x0
>> +kern_hyp_va x0
> 
> Can't you patch the kern_hyp_va address into the movk block directly?
> Obviously you can't call kern_hyp_va, but you could generate the layout and have
> some slow __kern_hyp_va() to generate the value. This would avoid generating a
> value, to then throw half of it away and patch something else in.

Hmmm. That's pretty twisted. Actually, this is utterly terrifying. And
thus absolutely mandatory. Irk. I really like it.

> Does this code run for VHE systems too? (if not, is the x<<48 movk needed?)

This is indeed !VHE only, and if/when I adopt your suggestion, we'll be
able to drop the last movk, effectively getting rid of 6 instructions on
the exception path. Awesome!

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v5 19/23] arm64: KVM: Allow far branches from vector slots to the main vectors

2018-03-12 Thread James Morse
Hi Marc,

On 01/03/18 15:55, Marc Zyngier wrote:
> So far, the branch from the vector slots to the main vectors can at
> most be 4GB from the main vectors (the reach of ADRP), and this
> distance is known at compile time. If we were to remap the slots
> to an unrelated VA, things would break badly.
> 
> A way to achieve VA independence would be to load the absolute
> address of the vectors (__kvm_hyp_vector), either using a constant
> pool or a series of movs, followed by an indirect branch.
> 
> This patches implements the latter solution, using another instance
> of a patching callback.

> diff --git a/arch/arm64/kernel/bpi.S b/arch/arm64/kernel/bpi.S
> index e000cb390618..e8d997788ad0 100644
> --- a/arch/arm64/kernel/bpi.S
> +++ b/arch/arm64/kernel/bpi.S
> @@ -19,6 +19,9 @@
>  #include 
>  #include 
>  
> +#include 
> +#include 
> +
>  .macro hyp_ventry offset
>   .align 7
>   .rept 29
> @@ -64,9 +67,15 @@ ENTRY(__bp_harden_hyp_vecs_start)
>   .endr
>  
>  __kvm_enter_vectors:
> +alternative_cb   kvm_patch_vector_branch
> +	movz	x1, #0
> +	movk	x1, #0, lsl #16
> +	movk	x1, #0, lsl #32
> +	movk	x1, #0, lsl #48
> +alternative_cb_end
>  
> - adr_l   x1, __kvm_hyp_vector
>   add x0, x1, x0
> + kern_hyp_va x0

Can't you patch the kern_hyp_va address into the movk block directly?
Obviously you can't call kern_hyp_va, but you could generate the layout and have
some slow __kern_hyp_va() to generate the value. This would avoid generating a
value, to then throw half of it away and patch something else in.
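
Something like this, to illustrate -- entirely hypothetical, reusing the
va_mask/tag_val/tag_lsb names this series introduces in va_layout.c:

	/* Hypothetical patch-time-only helper: compute the hyp alias of
	 * a kernel VA directly, instead of executing the patched
	 * kern_hyp_va instruction sequence. */
	static u64 __kern_hyp_va(u64 v)
	{
		v &= va_mask;			/* clear the kernel VA top bits */
		v |= (u64)tag_val << tag_lsb;	/* insert the hyp VA tag */
		return v;
	}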

Does this code run for VHE systems too? (if not, is the x<<48 movk needed?)


Thanks,

James


>   br  x0
>  ENTRY(__bp_harden_hyp_vecs_end)

>  
> diff --git a/arch/arm64/kvm/va_layout.c b/arch/arm64/kvm/va_layout.c
> index a73e47804972..7ef3d920c8d4 100644
> --- a/arch/arm64/kvm/va_layout.c
> +++ b/arch/arm64/kvm/va_layout.c
> @@ -152,3 +152,30 @@ void __init kvm_update_va_mask(struct alt_instr *alt,
>   updptr[i] = cpu_to_le32(insn);
>   }
>  }
> +
> +void kvm_patch_vector_branch(struct alt_instr *alt,
> +  __le32 *origptr, __le32 *updptr, int nr_inst)
> +{
> + enum aarch64_insn_movewide_type type;
> + u64 addr;
> + u32 oinsn, rd;
> + int s;
> +
> + BUG_ON(nr_inst != 4);
> +
> + addr = (uintptr_t)kvm_ksym_ref(__kvm_hyp_vector);
> + oinsn = le32_to_cpu(origptr[0]);
> + rd = aarch64_insn_decode_register(AARCH64_INSN_REGTYPE_RD, oinsn);
> +
> + type = AARCH64_INSN_MOVEWIDE_ZERO;
> + for (s = 0; nr_inst--; s += 16) {
> + u32 insn = aarch64_insn_gen_movewide(rd,
> +  (u16)(addr >> s),
> +  s,
> +  AARCH64_INSN_VARIANT_64BIT,
> +  type);
> + *updptr++ = cpu_to_le32(insn);
> + type = AARCH64_INSN_MOVEWIDE_KEEP;
> + }
> +
> +}
> 



Re: [PATCH v5 03/23] arm64: insn: Add encoder for bitwise operations using literals

2018-03-12 Thread Marc Zyngier
On 01/03/18 15:55, Marc Zyngier wrote:
> We lack a way to encode operations such as AND, ORR, EOR that take
> an immediate value. Doing so is quite involved, and is all about
> reverse engineering the decoding algorithm described in the
> pseudocode function DecodeBitMasks().
> 
> This has been tested by feeding it all the possible literal values
> and comparing the output with that of GAS.

This is getting quite funny. Although the test harness was 100% matching
the GAS encoding...

[...]

> + /*
> +  * Inverse of Replicate(). Try to spot a repeating pattern
> +  * with a pow2 stride.
> +  */
> + for (tmp = esz / 2; tmp >= 2; tmp /= 2) {
> + u64 emask = BIT(tmp) - 1;
> +
> + if ((imm & emask) != ((imm >> (tmp / 2)) & emask))

... I failed to move a fix from the test harness to the kernel code.
Total fail. Here, "(tmp / 2)" should really read "tmp".

Thanks to James for noticing the breakage.
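
In other words, the kernel side should match the harness and read:

	/* compare against the pattern shifted by the full stride */
	if ((imm & emask) != ((imm >> tmp) & emask))
		break;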

M. (annoyed)
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v5 22/23] arm64: KVM: Allow mapping of vectors outside of the RAM region

2018-03-12 Thread Marc Zyngier
On 09/03/18 18:59, James Morse wrote:
> Hi Marc,
> 
> On 01/03/18 15:55, Marc Zyngier wrote:
>> We're now ready to map our vectors in weird and wonderful locations.
>> On enabling ARM64_HARDEN_EL2_VECTORS, a vector slots gets allocated
>> if this hasn't been already done via ARM64_HARDEN_BRANCH_PREDICTOR
>> and gets mapped outside of the normal RAM region, next to the
>> idmap.
>>
>> That way, being able to obtain VBAR_EL2 doesn't reveal the mapping
>> of the rest of the hypervisor code.
> 
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index 3da9e5aea936..433d13d0c271 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
> 
> [..]
> 
>>  
>> +/*  This is only called on a !VHE system */
>>  static inline int kvm_map_vectors(void)
>>  {
>> -return create_hyp_mappings(kvm_ksym_ref(__bp_harden_hyp_vecs_start),
>> -   kvm_ksym_ref(__bp_harden_hyp_vecs_end),
>> -   PAGE_HYP_EXEC);
>> -}
>> +phys_addr_t vect_pa = virt_to_phys(__bp_harden_hyp_vecs_start);
> 
> __pa_symbol()?
> 
> A gift from CONFIG_DEBUG_VIRTUAL:
> 
> [3.479878] kvm [1]: 8-bit VMID
> [3.500761] [ cut here ]
> [3.505608] virt_to_phys used for non-linear address: 6fa7ae39 (__bp_harden_hyp_vecs_start+0x0/0x2000)
> [3.515907] WARNING: CPU: 3 PID: 1 at ../arch/arm64/mm/physaddr.c:15 __virt_to_phys+0x48/0x68
> [3.524614] Modules linked in:
> [3.527782] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc4-00024-gf6f4460e41ba-dirty #9396
> [3.536751] Hardware name: ARM Juno development board (r1) (DT)
> [3.542806] pstate: 8045 (Nzcv daif +PAN -UAO)
> [3.547716] pc : __virt_to_phys+0x48/0x68
> [3.551832] lr : __virt_to_phys+0x48/0x68
> 
> [3.641447] Call trace:
> [3.643975]  __virt_to_phys+0x48/0x68
> [3.647739]  kvm_arch_init+0x2fc/0x734
> [3.651589]  kvm_init+0x28/0x2b0
> [3.654910]  arm_init+0x1c/0x24
> [3.658143]  do_one_initcall+0x38/0x11c
> [3.662083]  kernel_init_freeable+0x1e0/0x27c
> [3.666552]  kernel_init+0x10/0xfc
> [3.670049]  ret_from_fork+0x10/0x18
> [3.673731] ---[ end trace d4ef061e4bf05fc6 ]---
> [3.678870] kvm [1]: vgic-v2@2c04f000
> [3.683424] kvm [1]: vgic interrupt IRQ1
> [3.687675] kvm [1]: virtual timer IRQ5
> [3.692375] kvm [1]: Hyp mode initialized successfully
> [3.718640] Initialise system trusted keyrings
Nice catch. Fixed locally.
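
For the record, that means swapping the virt_to_phys() call for the
__pa_symbol() variant James points at:

	phys_addr_t vect_pa = __pa_symbol(__bp_harden_hyp_vecs_start);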

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v5 10/23] KVM: arm/arm64: Move HYP IO VAs to the "idmap" range

2018-03-12 Thread Marc Zyngier
[+Kristina for the extended idmap stuff]

Hi James,

On 09/03/18 18:59, James Morse wrote:
> Hi Marc,
> 
> On 01/03/18 15:55, Marc Zyngier wrote:
>> We so far mapped our HYP IO (which is essencially the GICv2 control
> 
> (Nit: essentially)
> 
> 
>> registers) using the same method as for memory. It recently appeared
>> that this is a bit unsafe:
>>
>> We compute the HYP VA using the kern_hyp_va helper, but that helper
>> is only designed to deal with kernel VAs coming from the linear map,
>> and not from the vmalloc region... This could in turn cause some bad
>> aliasing between the two, amplified by the upcoming VA randomisation.
>>
>> A solution is to come up with our very own basic VA allocator for
>> MMIO. Since half of the HYP address space only contains a single
>> page (the idmap), we have plenty to borrow from. Let's use the idmap
>> as a base, and allocate downwards from it. GICv2 now lives on the
>> other side of the great VA barrier.
> 
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 0e5cfffb4c21..3074544940dc 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -502,27 +505,31 @@ static void unmap_hyp_range(pgd_t *pgdp, phys_addr_t start, u64 size)
>>   *
>>   * Assumes hyp_pgd is a page table used strictly in Hyp-mode and
>>   * therefore contains either mappings in the kernel memory area (above
>> - * PAGE_OFFSET), or device mappings in the vmalloc range (from
>> - * VMALLOC_START to VMALLOC_END).
>> + * PAGE_OFFSET), or device mappings in the idmap range.
>>   *
>> - * boot_hyp_pgd should only map two pages for the init code.
>> + * boot_hyp_pgd should only map the idmap range, and is only used in
>> + * the extended idmap case.
>>   */
>>  void free_hyp_pgds(void)
>>  {
>> +pgd_t *id_pgd;
>> +
>>  mutex_lock(&kvm_hyp_pgd_mutex);
>>  
>> +id_pgd = boot_hyp_pgd ? boot_hyp_pgd : hyp_pgd;
> 
>> +
>> +if (id_pgd)
>> +unmap_hyp_range(id_pgd, io_map_base,
>> +hyp_idmap_start + PAGE_SIZE - io_map_base);
> 
> Even if kvm_mmu_init() fails before it sets io_map_base, this will still unmap
> the idmap. It just starts from 0, so it may take out the flipped PAGE_OFFSET
> range too...

Yup, definitely worth fixing.
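
Probably something along these lines (sketch only, the final version
may differ):

	/* Don't unmap from VA 0 if kvm_mmu_init() failed before
	 * io_map_base was assigned; fall back to the idmap page. */
	if (id_pgd) {
		if (!io_map_base)
			io_map_base = hyp_idmap_start;
		unmap_hyp_range(id_pgd, io_map_base,
				hyp_idmap_start + PAGE_SIZE - io_map_base);
	}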

> 
> (using io_map_base without taking io_map_lock makes me nervous ... in
> practice, it's fine)

I'm not too worried about that, as we only have a single CPU performing
the teardown. But better safe than sorry, so I'll take it anyway.

> 
> 
>> +
>>  if (boot_hyp_pgd) {
>> -unmap_hyp_range(boot_hyp_pgd, hyp_idmap_start, PAGE_SIZE);
>>  free_pages((unsigned long)boot_hyp_pgd, hyp_pgd_order);
>>  boot_hyp_pgd = NULL;
>>  }
>>  
>>  if (hyp_pgd) {
>> -unmap_hyp_range(hyp_pgd, hyp_idmap_start, PAGE_SIZE);
>>  unmap_hyp_range(hyp_pgd, kern_hyp_va(PAGE_OFFSET),
>>  (uintptr_t)high_memory - PAGE_OFFSET);
>> -unmap_hyp_range(hyp_pgd, kern_hyp_va(VMALLOC_START),
>> -VMALLOC_END - VMALLOC_START);
>>  
>>  free_pages((unsigned long)hyp_pgd, hyp_pgd_order);
>>  hyp_pgd = NULL;
>> @@ -719,7 +726,8 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
>> void __iomem **kaddr,
>> void __iomem **haddr)
>>  {
>> -unsigned long start, end;
>> +pgd_t *pgd = hyp_pgd;
>> +unsigned long base;
>>  int ret;
>>  
>>  *kaddr = ioremap(phys_addr, size);
>> @@ -731,11 +739,42 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
>>  return 0;
>>  }
>>  
>> +mutex_lock(&io_map_lock);
>> +
>> +/*
>> + * This assumes that we have enough space below the idmap
>> + * page to allocate our VAs. If not, the check below will
>> + * kick in. A potential alternative would be to detect that
>> + * overflow and switch to an allocation above the idmap.
>> + */
>> +size = max(PAGE_SIZE, roundup_pow_of_two(size));
>> +base = io_map_base - size;
>> +base &= ~(size - 1);
>>  
>> -start = kern_hyp_va((unsigned long)*kaddr);
>> -end = kern_hyp_va((unsigned long)*kaddr + size);
>> -ret = __create_hyp_mappings(hyp_pgd, PTRS_PER_PGD, start, end,
>> - __phys_to_pfn(phys_addr), PAGE_HYP_DEVICE);
>> +/*
>> + * Verify that BIT(VA_BITS - 1) hasn't been flipped by
>> + * allocating the new area, as it would indicate we've
>> + * overflowed the idmap/IO address range.
>> + */
>> +if ((base ^ io_map_base) & BIT(VA_BITS - 1)) {
>> +ret = -ENOMEM;
>> +goto out;
>> +}
>> +
>> +if (__kvm_cpu_uses_extended_idmap())
>> +pgd = boot_hyp_pgd;
>> +
>> +ret = __create_hyp_mappings(pgd, __kvm_idmap_ptrs_per_pgd(),
>> +base, base + size,
>> +__phys_to_pfn(phys_addr), PAGE_HYP_DEVICE);

Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling

2018-03-12 Thread Marc Zyngier
On 12/03/18 02:33, Yang, Shunyong wrote:
> Hi, Marc,
> 
> On Sun, 2018-03-11 at 12:17 +, Marc Zyngier wrote:
>> On Sun, 11 Mar 2018 01:55:08 +
>> Christoffer Dall  wrote:
>>
>>>
>>> On Sat, Mar 10, 2018 at 12:20 PM, Marc Zyngier wrote:

 On Fri, 09 Mar 2018 21:36:12 +,
 Christoffer Dall wrote:  
>
>
> On Thu, Mar 08, 2018 at 05:28:44PM +, Marc Zyngier wrote:  
>>
>> I'd be more confident if we did forbid P+A for such interrupts
>> altogether, as they really feel like another kind of HW interrupt.
> How about a slightly bigger hammer:  Can we avoid doing P+A for level
> interrupts completely?  I don't think that really makes much sense, and
> I think we simplify everything if we just come back out and resample
> the line.  For an edge, something like a network card, there's a
> potential performance win to appending a new pending state, but I doubt
> that this is the case for level interrupts.
 I started implementing the same thing yesterday. Somehow, it feels
 slightly better to have the same flow for all level interrupts,
 including the timer, and we only use the MI on EOI as a way to trigger
 the next state of injection. Still testing, but looking good so far.

 I'm still puzzled that we have this level-but-not-quite behaviour for
 VFIO interrupts. At some point, it is going to bite us badly.
>>> Where is the departure from level-triggered behavior with VFIO?  As
>>> far as I can tell, the GIC flow of the interrupts will be just a level
>>> interrupt,
>> The GIC is fine, I believe. What is not exactly fine is the signalling
>> from the device, which will never be dropped until the EOI has been
>> detected.
>>
>>>
>>> but we just need to make sure the resamplefd mechanism is
>>> supported for both types of interrupts.  Whether or not that's a
>>> decent mechanism seems orthogonal to me, but that's a discussion for
>>> another day I think.
>> Given that VFIO is built around this mechanism, I don't think we have a
>> choice but to support it. Anyway, I came up with the following patch,
>> which I tested on Seattle with mtty. It also survived my usual
>> hammering of cyclictest, hackbench and bulk VM installs.
>>
>> Shunyong, could you please give it a go?
>>
>> Thanks,
>>
>>  M.
>>
> 
> I have tested the patch. It works on the QDF2400 platform, and
> kvm_notify_acked_irq() is called when the state is idle.

Thanks a lot for testing.

> 
> BTW, I had the following questions while debugging the issue.
> Could you please give me some help?
> 1) What does "mi" mean in the gic code? such as lr_signals_eoi_mi();

MI stands for Maintenance Interrupts. Life is too short to write that
all the time ;-)

> 2) In some __hyp_text code, printk() will cause a "HYP panic:", such
> as in __kvm_vcpu_run(). How can I output debug information?

You can't. None of the kernel is mapped at EL2 on pre-VHE hardware.
You'll need to find indirect ways of outputting information (store data
in memory, and output it once you're back to EL1).
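
If you're desperate, something like this works as a crude mailbox
(hyp_dbg is a made-up name here, and the page holding it must have been
mapped into Hyp with create_hyp_mappings() at init time):

	/* EL1 side: a plain kernel variable, mapped into Hyp at init */
	static u64 hyp_dbg;

	/* EL2 side (__hyp_text code): write through the hyp alias */
	*kern_hyp_va(&hyp_dbg) = read_sysreg(esr_el2);

	/* EL1 side, once __kvm_vcpu_run() has returned */
	pr_info("hyp dbg: %llx\n", hyp_dbg);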

Thanks again,

M.
-- 
Jazz is not dead. It just smells funny...