Re: [PATCH v2 6/6] KVM: arm/arm64: vgic: Allow configuration of interrupt groups

2018-07-09 Thread Marc Zyngier
On 04/07/18 10:38, Christoffer Dall wrote:
> Implement the required MMIO accessors for GICv2 and GICv3 for the
> IGROUPR distributor and redistributor registers.
> 
> This can allow guests to change behavior compared to running on previous
> versions of KVM, but only to align with the architecture and hardware
> implementations.
> 
> This also allows userspace to configure the groups for interrupts.  Note
> that this potentially results in GICv2 guests not receiving interrupts
> after migration if migrating from an older kernel that exposes GICv2
> interrupts as group 1.
> 
> Cc: Andrew Jones 
> Signed-off-by: Christoffer Dall 
> ---
> I implemented (but stashed) a version of this which predicated the
> behavior based on the value of GICD_IIDR revision field, falling back to
> ignoring writes and resetting GICv2 groups to 0 if the guest wrote a
> revision less than 2.  However, current QEMU implementations simply
> don't write the GICD_IIDR, so this doesn't help at all without changing
> QEMU anyhow.
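
As a rough illustration of that predicated behaviour (an editorial sketch, not the stashed code; the handler name, the VGIC_ADDR_TO_INTID() usage and the implemented_rev field are assumptions, and locking/requeueing of pending state is omitted), the IGROUPR write path could be gated on the emulated GICD_IIDR revision along these lines:

static void vgic_mmio_write_group(struct kvm_vcpu *vcpu, gpa_t addr,
				  unsigned int len, unsigned long val)
{
	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
	int i;

	/*
	 * Sketch: if the guest has not written a GICD_IIDR revision >= 2,
	 * keep the old behaviour and ignore group configuration entirely.
	 */
	if (dist->implemented_rev < 2)
		return;

	for (i = 0; i < len * 8; i++) {
		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);

		/* Locking and requeueing of pending interrupts omitted here. */
		irq->group = !!(val & BIT(i));
		vgic_put_irq(vcpu->kvm, irq);
	}
}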
> 
> The only actual fix I can see here to work around the problem in the
> kernel is to require an opt-in to allow restoring groups from userspace,
> but that's a lot of logic to support cross-kernel version migration.
> 
> Question: Do we expect that cross-kernel version migration is a critical
> feature that people really expect to work, and do we actually have
> examples of catering to this in the kernel elsewhere?  (Also, how would
> then that relate to the whole 'adding a new sysreg breaks migration'
> situation?)

I don't really get why QEMU doesn't try to restore GICD_IIDR, while it
is definitely trying to restore RO sysregs (and that's how we detect
incompatibilities).

I think we should at least give userspace a chance to do the right
thing. If it doesn't, well, too bad.

How bad is that "writable GICD_IIDR" patch? Because at this stage, and
in the absence of any comment, I'm close to just picking that series and
merging it for 4.19.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v2 6/6] KVM: arm/arm64: vgic: Allow configuration of interrupt groups

2018-07-09 Thread Peter Maydell
On 9 July 2018 at 09:42, Marc Zyngier  wrote:
> I don't really get why QEMU doesn't try to restore GICD_IIDR, while it
> is definitely trying to restore RO sysregs (and that's how we detect
> incompatibilities).

Accident of design, mostly. From QEMU's point of view, GICD_IIDR
is part of the GIC device, which we save and restore as a separate
thing from the CPU. The GIC device was written in what for QEMU
is a more 'traditional' style, where QEMU assumes it knows all the
registers that might have state and saves and restores them all
(and doesn't bother to do anything with constant registers).
The CPU sysregs are done in a completely different style[*], where
we let the kernel be the source of truth for what sysregs exist;
as a side effect of that we end up trying to save and restore
constant sysregs, since QEMU doesn't know they're constant.

[*] there's an argument that in retrospect this was a mistake;
still, it is what we have and trying to upend it now would be
a huge pain.

thanks
-- PMM


Re: [PATCH] KVM: arm64: vgic-its: Remove VLA usage

2018-07-09 Thread Marc Zyngier
Hi Kees,

On 02/07/18 18:15, Kees Cook wrote:
> On Mon, Jul 2, 2018 at 12:36 AM, Auger Eric  wrote:
>> Hi Kees,
>>
>> On 06/29/2018 08:46 PM, Kees Cook wrote:
>>> In the quest to remove all stack VLA usage from the kernel[1], this
>>> switches to using a maximum size and adds sanity checks. Additionally
>>> cleans up some of the int-vs-u32 usage and adds additional bounds checking.
>>> As it currently stands, this will always be 8 bytes until the ABI changes.
>>>
>>> [1] 
>>> https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com
>>>
>>> Cc: Christoffer Dall 
>>> Cc: Marc Zyngier 
>>> Cc: Eric Auger 
>>> Cc: Andre Przywara 
>>> Cc: linux-arm-ker...@lists.infradead.org
>>> Cc: kvmarm@lists.cs.columbia.edu
>>> Signed-off-by: Kees Cook 
>>> ---
>>>  virt/kvm/arm/vgic/vgic-its.c | 19 +++
>>>  1 file changed, 15 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
>>> index 4ed79c939fb4..3143fc047fcf 100644
>>> --- a/virt/kvm/arm/vgic/vgic-its.c
>>> +++ b/virt/kvm/arm/vgic/vgic-its.c
>>> @@ -168,8 +168,14 @@ struct vgic_its_abi {
>>>   int (*commit)(struct vgic_its *its);
>>>  };
>>>
> >>> +#define ABI_0_ESZ 8
>>> +#define ESZ_MAX  ABI_0_ESZ
>>> +
>>>  static const struct vgic_its_abi its_table_abi_versions[] = {
>>> - [0] = {.cte_esz = 8, .dte_esz = 8, .ite_esz = 8,
>>> + [0] = {
>>> +  .cte_esz = ABI_0_ESZ,
>>> +  .dte_esz = ABI_0_ESZ,
>>> +  .ite_esz = ABI_0_ESZ,
>>>.save_tables = vgic_its_save_tables_v0,
>>>.restore_tables = vgic_its_restore_tables_v0,
>>>.commit = vgic_its_commit_v0,
>>> @@ -180,10 +186,12 @@ static const struct vgic_its_abi 
>>> its_table_abi_versions[] = {
>>>
>>>  inline const struct vgic_its_abi *vgic_its_get_abi(struct vgic_its *its)
>>>  {
>>> + if (WARN_ON(its->abi_rev >= NR_ITS_ABIS))
>>> + return NULL;
>>>   return &its_table_abi_versions[its->abi_rev];
>>>  }
>>>
>>> -int vgic_its_set_abi(struct vgic_its *its, int rev)
>>> +static int vgic_its_set_abi(struct vgic_its *its, u32 rev)
>>>  {
> >> If vgic_its_get_abi() is likely to return NULL, don't we need to check abi
> >> != NULL at all call sites?
> 
> My thinking was that since it should never happen, a WARN_ON would be
> sufficient. But I can drop all these changes if you want. I just
> wanted to see the VLA removed. :)
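
For reference, the call-site guard being debated would be a one-liner at each user, roughly like this (a sketch under the assumption that vgic_its_get_abi() can return NULL as in the hunk above; the error value is illustrative):

	const struct vgic_its_abi *abi = vgic_its_get_abi(its);

	if (!abi)		/* only possible if abi_rev is out of range */
		return -EINVAL;

	return abi->commit(its);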

Are you going to respin this patch limiting it to just the VLA changes?
I'm actively queuing stuff for the next merge window, and it'd be good
to get that one in.

Alternatively, I can drop the WARN_ONs myself. Just let me know.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH] KVM: arm/arm64: Fix vgic init race

2018-07-09 Thread Marc Zyngier
On 03/07/18 22:26, Christoffer Dall wrote:
> The vgic_init function can race with kvm_arch_vcpu_create() which does
> not hold kvm_lock() and we therefore have no synchronization primitives
> to ensure we're doing the right thing.
> 
> As the user is trying to initialize or run the VM while at the same time
> creating more VCPUs, we just have to refuse to initialize the VGIC in
> this case rather than silently failing with a broken VCPU.
> 
> Signed-off-by: Christoffer Dall 
> ---
>  virt/kvm/arm/vgic/vgic-init.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
> index 9406eaf..c0c0b88 100644
> --- a/virt/kvm/arm/vgic/vgic-init.c
> +++ b/virt/kvm/arm/vgic/vgic-init.c
> @@ -286,6 +286,10 @@ int vgic_init(struct kvm *kvm)
>   if (vgic_initialized(kvm))
>   return 0;
>  
> + /* Are we also in the middle of creating a VCPU? */
> + if (kvm->created_vcpus != atomic_read(&kvm->online_vcpus))
> + return -EBUSY;
> +
>   /* freeze the number of spis */
>   if (!dist->nr_spis)
>   dist->nr_spis = VGIC_NR_IRQS_LEGACY - VGIC_NR_PRIVATE_IRQS;
> 

Applied to queue.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v4 0/6] KVM/arm64: Cache maintenance relaxations

2018-07-09 Thread Marc Zyngier
On 02/07/18 16:02, Marc Zyngier wrote:
> This small series makes use of features recently introduced in the
> ARMv8 architecture to relax the cache maintenance operations on CPUs
> that implement these features.
> 
> FWB is the most important one. It allows stage-2 to enforce the
> cacheability of memory, no matter what the guest says. It also
> mandates that the whole machine is cache coherent (no non-coherent
> I/O), meaning we can drop a whole class of cache maintenance
> operations.
> 
> FWB also has the same effect as CTR_EL0.IDC being set, and when
> combined with CTR_EL0.DIC allows the removal of all cache maintenance
> and tracking of pages being executed.
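
As a sketch of the kind of short-circuit this enables (not the series' actual hunks; the capability and helper names are best-effort assumptions), the guest-page CMO helpers can simply bail out when the hardware makes the maintenance unnecessary:

static void clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
{
	void *va = page_address(pfn_to_page(pfn));

	/* FWB forces stage-2 memory to be cacheable: no clean to PoC needed. */
	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
		return;

	kvm_flush_dcache_to_poc(va, size);
}

static void invalidate_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
{
	void *va = page_address(pfn_to_page(pfn));

	/* CTR_EL0.DIC: no I-cache invalidation required before execution. */
	if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
		return;

	invalidate_icache_range((unsigned long)va, (unsigned long)va + size);
}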
> 
> We also take the opportunity to drop a few useless CMOs that were
> applied to the HYP page tables, but that were never necessary. This
> ended up requiring quite a few changes in our page table accessors so
> that they get consistent barriers. These barriers are a bit on the
> heavy side, and could be further optimized, although that's probably
> for a different patch series.
> 
> Unless someone screams now, I plan to apply this to kvmarm/next
> shortly.

Now applied to queue.

M.
-- 
Jazz is not dead. It just smells funny...


[PATCH] KVM: arm/arm64: vgic-debug: Show LPI status

2018-07-09 Thread Marc Zyngier
The vgic debugfs file only knows about SGI/PPI/SPI interrupts, and
completely ignores LPIs. Let's fix that.

Signed-off-by: Marc Zyngier 
---
I've had this patch in my tree for almost 4 months now, and it has been
useful at least once. Thoughts?

 virt/kvm/arm/vgic/vgic-debug.c | 42 ++
 virt/kvm/arm/vgic/vgic-its.c   | 12 +-
 virt/kvm/arm/vgic/vgic.h   |  1 +
 3 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-debug.c b/virt/kvm/arm/vgic/vgic-debug.c
index d3a403f63010..07aa900bac56 100644
--- a/virt/kvm/arm/vgic/vgic-debug.c
+++ b/virt/kvm/arm/vgic/vgic-debug.c
@@ -36,9 +36,12 @@
 struct vgic_state_iter {
int nr_cpus;
int nr_spis;
+   int nr_lpis;
int dist_id;
int vcpu_id;
int intid;
+   int lpi_idx;
+   u32 *lpi_array;
 };
 
 static void iter_next(struct vgic_state_iter *iter)
@@ -52,6 +55,12 @@ static void iter_next(struct vgic_state_iter *iter)
if (iter->intid == VGIC_NR_PRIVATE_IRQS &&
++iter->vcpu_id < iter->nr_cpus)
iter->intid = 0;
+
+   if (iter->intid >= (iter->nr_spis + VGIC_NR_PRIVATE_IRQS)) {
+   if (iter->lpi_idx < iter->nr_lpis)
+   iter->intid = iter->lpi_array[iter->lpi_idx];
+   iter->lpi_idx++;
+   }
 }
 
 static void iter_init(struct kvm *kvm, struct vgic_state_iter *iter,
@@ -63,6 +72,11 @@ static void iter_init(struct kvm *kvm, struct 
vgic_state_iter *iter,
 
iter->nr_cpus = nr_cpus;
iter->nr_spis = kvm->arch.vgic.nr_spis;
+   if (kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
+   iter->nr_lpis = vgic_copy_lpi_list(kvm, NULL, &iter->lpi_array);
+   if (iter->nr_lpis < 0)
+   iter->nr_lpis = 0;
+   }
 
/* Fast forward to the right position if needed */
while (pos--)
@@ -73,7 +87,8 @@ static bool end_of_vgic(struct vgic_state_iter *iter)
 {
return iter->dist_id > 0 &&
iter->vcpu_id == iter->nr_cpus &&
-   (iter->intid - VGIC_NR_PRIVATE_IRQS) == iter->nr_spis;
+   iter->intid >= (iter->nr_spis + VGIC_NR_PRIVATE_IRQS) &&
+   iter->lpi_idx > iter->nr_lpis;
 }
 
 static void *vgic_debug_start(struct seq_file *s, loff_t *pos)
@@ -130,6 +145,7 @@ static void vgic_debug_stop(struct seq_file *s, void *v)
 
mutex_lock(&kvm->lock);
iter = kvm->arch.vgic.iter;
+   kfree(iter->lpi_array);
kfree(iter);
kvm->arch.vgic.iter = NULL;
mutex_unlock(&kvm->lock);
@@ -137,12 +153,14 @@ static void vgic_debug_stop(struct seq_file *s, void *v)
 
 static void print_dist_state(struct seq_file *s, struct vgic_dist *dist)
 {
+   bool v3 = dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3;
+
seq_printf(s, "Distributor\n");
seq_printf(s, "===\n");
-   seq_printf(s, "vgic_model:\t%s\n",
-  (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) ?
-  "GICv3" : "GICv2");
+   seq_printf(s, "vgic_model:\t%s\n", v3 ? "GICv3" : "GICv2");
seq_printf(s, "nr_spis:\t%d\n", dist->nr_spis);
+   if (v3)
+   seq_printf(s, "nr_lpis:\t%d\n", dist->lpi_list_count);
seq_printf(s, "enabled:\t%d\n", dist->enabled);
seq_printf(s, "\n");
 
@@ -175,8 +193,10 @@ static void print_irq_state(struct seq_file *s, struct 
vgic_irq *irq,
type = "SGI";
else if (irq->intid < VGIC_NR_PRIVATE_IRQS)
type = "PPI";
-   else
+   else if (irq->intid < VGIC_MAX_SPI)
type = "SPI";
+   else
+   type = "LPI";
 
if (irq->intid ==0 || irq->intid == VGIC_NR_PRIVATE_IRQS)
print_header(s, irq, vcpu);
@@ -204,7 +224,6 @@ static void print_irq_state(struct seq_file *s, struct 
vgic_irq *irq,
irq->source,
irq->priority,
(irq->vcpu) ? irq->vcpu->vcpu_id : -1);
-
 }
 
 static int vgic_debug_show(struct seq_file *s, void *v)
@@ -223,17 +242,20 @@ static int vgic_debug_show(struct seq_file *s, void *v)
if (!kvm->arch.vgic.initialized)
return 0;
 
-   if (iter->vcpu_id < iter->nr_cpus) {
+   if (iter->vcpu_id < iter->nr_cpus)
vcpu = kvm_get_vcpu(kvm, iter->vcpu_id);
-   irq = &vcpu->arch.vgic_cpu.private_irqs[iter->intid];
-   } else {
-   irq = &kvm->arch.vgic.spis[iter->intid - VGIC_NR_PRIVATE_IRQS];
+
+   irq = vgic_get_irq(kvm, vcpu, iter->intid);
+   if (!irq) {
+   seq_printf(s, "   LPI %4d freed\n", iter->intid);
+   return 0;
}
 
spin_lock_irqsave(&irq->irq_lock, flags);
print_irq_state(s, irq, vcpu);
spin_unlock_irqrestore(&irq->irq_lock, flags);
 
+   vgic_put_irq(kvm, irq);
return 0;
 }
 
diff --git a/virt/kvm/arm/vgic/vgic-its.c b

Re: [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical address size for VM

2018-07-09 Thread Dave Martin
On Fri, Jul 06, 2018 at 05:39:00PM +0100, Suzuki K Poulose wrote:
> On 07/06/2018 04:09 PM, Marc Zyngier wrote:
> >On 06/07/18 14:49, Suzuki K Poulose wrote:
> >>On 04/07/18 23:03, Suzuki K Poulose wrote:
> >>>On 07/04/2018 04:51 PM, Will Deacon wrote:
> Hi Suzuki,
> 
> On Fri, Jun 29, 2018 at 12:15:35PM +0100, Suzuki K Poulose wrote:
> >Allow specifying the physical address size for a new VM via
> >the kvm_type argument for KVM_CREATE_VM ioctl. This allows
> >us to finalise the stage2 page table format as early as possible
> >and hence perform the right checks on the memory slots without
> >complication. The size is encoded as Log2(PA_Size) in the bits[7:0]
> >of the type field and can encode more information in the future if
> >required. The IPA size is still capped at 40bits.
> >
> >Cc: Marc Zyngier 
> >Cc: Christoffer Dall 
> >Cc: Peter Maydel 
> >Cc: Paolo Bonzini 
> >Cc: Radim Krčmář 
> >Signed-off-by: Suzuki K Poulose 
> >---
> >   arch/arm/include/asm/kvm_mmu.h   |  2 ++
> >   arch/arm64/include/asm/kvm_arm.h | 10 +++---
> >   arch/arm64/include/asm/kvm_mmu.h |  2 ++
> >   include/uapi/linux/kvm.h | 10 ++
> >   virt/kvm/arm/arm.c   | 24 ++--
> >   5 files changed, 39 insertions(+), 9 deletions(-)
> 
> [...]
> 
> >diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >index 4df9bb6..fa4cab0 100644
> >--- a/include/uapi/linux/kvm.h
> >+++ b/include/uapi/linux/kvm.h
> >@@ -751,6 +751,16 @@ struct kvm_ppc_resize_hpt {
> >   #define KVM_S390_SIE_PAGE_OFFSET 1
> >   /*
> >+ * On arm/arm64, machine type can be used to request the physical
> >+ * address size for the VM. Bits [7-0] have been reserved for the
> >+ * PA size shift (i.e, log2(PA_Size)). For backward compatibility,
> >+ * value 0 implies the default IPA size, which is 40bits.
> >+ */
> >+#define KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK    0xff
> >+#define KVM_VM_TYPE_ARM_PHYS_SHIFT(x)    \
> >+    ((x) & KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK)
> 
> This seems like you're allocating quite a lot of bits in a non-extensible
> interface to a fairly esoteric parameter. Would it be better to add 
> another
> ioctl, or condense the number of sizes you support instead?
> >>>
> >>>As I explained in the other thread, we need the size as soon as the VM
> >>>is created. The major challenge is keeping the backward compatibility by
> >>>mapping 0 to 40bits. I will give it a thought.
> >>
> >>Here is one option. We could re-use the {V}TCR_ELx.{I}PS field format, which
> >>occupies 3 bits and has the following definitions. (ID_AA64MMFR0_EL1:PARange
> >>also has the field definitions, except that the field is 4bits wide, but
> >>only 3bits are used)
> >>
> >>000 32 bits, 4GB.
> >>001 36 bits, 64GB.
> >>010 40 bits, 1TB.
> >>011 42 bits, 4TB.
> >>100 44 bits, 16TB.
> >>101 48 bits, 256TB.
> >>110 52 bits, 4PB
> >>
> >>But we need to map 0 => 40bits IPA to make our ABI backward compatible. So
> >>we could use the additional one bit to indicate that IPA size is requested
> >>in the 3 bits.
> >>
> >>i.e,
> >>
> >>machine_type:
> >>
> >>Bit [2:0]   - Requested IPA size. Values follow VTCR_EL2.PS format.
> >>
> >>Bit [3] - 1 => IPA Size bits (Bits[2:0]) requested.
> >>0 => Not requested
> >>
> >>The only minor down side is restricting to the predefined values above,
> >>which is not a real issue for a VM.
> >>
> >>Thoughts ?
> >
> >I'd be very wary of using that 4th bit to do something that is not in
> >the architecture. We have only a single value left to be used (0b111),
> >and then your scheme clashes with the architecture definition.
> 
> I agree. However, if we ever go beyond the 3bits in PARange, we have an
> issue with {V}TCR counter part. But lets not take that chance.
> 
> >
> >I'd rather encode things in a way that is independent from the
> >architecture, and be done with it. You can map 0 to 40bits, and we have
> >the ability to express all values the architecture has (just in a
> >different order).
> 
> The other option I can think of is encoding a signed number which is the
> difference of the IPA from 40. But that would need 5 bits if we were to
> encode it as it is. And if we want to squeeze it in 4bit, we could store
> half the difference (limiting the IPA limit to even numbers).
> 
> i.e. IPA = 40 + 2 * sign_extend(bits[3:0]);
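
For illustration, decoding the two competing encodings could look roughly like this (a sketch only; the helper names are made up and are not the proposed ABI):

/* Option A: bits[2:0] in VTCR_EL2.PS format, qualified by a validity bit[3]. */
static inline unsigned int ipa_bits_from_ps(unsigned long type)
{
	static const unsigned int ps_to_bits[] = { 32, 36, 40, 42, 44, 48, 52 };
	unsigned int ps = type & GENMASK(2, 0);

	if (!(type & BIT(3)))
		return 40;			/* legacy: type 0 keeps meaning 40 bits */

	if (ps >= ARRAY_SIZE(ps_to_bits))
		return 0;			/* reserved encoding, reject */

	return ps_to_bits[ps];
}

/* Option B: bits[3:0] hold half the signed difference from 40 bits. */
static inline unsigned int ipa_bits_from_delta(unsigned long type)
{
	return 40 + 2 * sign_extend32(type & GENMASK(3, 0), 3);
}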

I came across similar issues when trying to work out how to enable
SVE for KVM.  In the end I reduced this to a per-vcpu feature, but
it means that there is no global opt-in for the SVE-specific KVM
API extensions:

That's a bit gross, because SVE may require a change to the way
vcpus are initialised.  The set of supported SVE vector lengths needs
to be set somehow before the vcpu is set running, but it's tricky to
do that without a new ioctl -- which would mean that 

Re: [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical address size for VM

2018-07-09 Thread Marc Zyngier
On 09/07/18 12:23, Dave Martin wrote:
> On Fri, Jul 06, 2018 at 05:39:00PM +0100, Suzuki K Poulose wrote:
>> On 07/06/2018 04:09 PM, Marc Zyngier wrote:
>>> On 06/07/18 14:49, Suzuki K Poulose wrote:
 On 04/07/18 23:03, Suzuki K Poulose wrote:
> On 07/04/2018 04:51 PM, Will Deacon wrote:
>> Hi Suzuki,
>>
>> On Fri, Jun 29, 2018 at 12:15:35PM +0100, Suzuki K Poulose wrote:
>>> Allow specifying the physical address size for a new VM via
>>> the kvm_type argument for KVM_CREATE_VM ioctl. This allows
>>> us to finalise the stage2 page table format as early as possible
>>> and hence perform the right checks on the memory slots without
>>> complication. The size is encoded as Log2(PA_Size) in the bits[7:0]
>>> of the type field and can encode more information in the future if
>>> required. The IPA size is still capped at 40bits.
>>>
>>> Cc: Marc Zyngier 
>>> Cc: Christoffer Dall 
>>> Cc: Peter Maydel 
>>> Cc: Paolo Bonzini 
>>> Cc: Radim Krčmář 
>>> Signed-off-by: Suzuki K Poulose 
>>> ---
>>>   arch/arm/include/asm/kvm_mmu.h   |  2 ++
>>>   arch/arm64/include/asm/kvm_arm.h | 10 +++---
>>>   arch/arm64/include/asm/kvm_mmu.h |  2 ++
>>>   include/uapi/linux/kvm.h | 10 ++
>>>   virt/kvm/arm/arm.c   | 24 ++--
>>>   5 files changed, 39 insertions(+), 9 deletions(-)
>>
>> [...]
>>
>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index 4df9bb6..fa4cab0 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -751,6 +751,16 @@ struct kvm_ppc_resize_hpt {
>>>   #define KVM_S390_SIE_PAGE_OFFSET 1
>>>   /*
>>> + * On arm/arm64, machine type can be used to request the physical
>>> + * address size for the VM. Bits [7-0] have been reserved for the
>>> + * PA size shift (i.e, log2(PA_Size)). For backward compatibility,
>>> + * value 0 implies the default IPA size, which is 40bits.
>>> + */
>>> +#define KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK    0xff
>>> +#define KVM_VM_TYPE_ARM_PHYS_SHIFT(x)    \
>>> +    ((x) & KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK)
>>
>> This seems like you're allocating quite a lot of bits in a non-extensible
>> interface to a fairly esoteric parameter. Would it be better to add 
>> another
>> ioctl, or condense the number of sizes you support instead?
>
> As I explained in the other thread, we need the size as soon as the VM
> is created. The major challenge is keeping the backward compatibility by
> mapping 0 to 40bits. I will give it a thought.

 Here is one option. We could re-use the {V}TCR_ELx.{I}PS field format, 
 which
 occupies 3 bits and has the following definitions. 
 (ID_AA64MMFR0_EL1:PARange
 also has the field definitions, except that the field is 4bits wide, but
 only 3bits are used)

 000 32 bits, 4GB.
 001 36 bits, 64GB.
 010 40 bits, 1TB.
 011 42 bits, 4TB.
 100 44 bits, 16TB.
 101 48 bits, 256TB.
 110 52 bits, 4PB

 But we need to map 0 => 40bits IPA to make our ABI backward compatible. So
 we could use the additional one bit to indicate that IPA size is requested
 in the 3 bits.

 i.e,

 machine_type:

 Bit [2:0]  - Requested IPA size. Values follow VTCR_EL2.PS format.

 Bit [3]- 1 => IPA Size bits (Bits[2:0]) requested.
0 => Not requested

 The only minor down side is restricting to the predefined values above,
 which is not a real issue for a VM.

 Thoughts ?
>>>
>>> I'd be very wary of using that 4th bit to do something that is not in
>>> the architecture. We have only a single value left to be used (0b111),
>>> and then your scheme clashes with the architecture definition.
>>
>> I agree. However, if we ever go beyond the 3bits in PARange, we have an
>> issue with {V}TCR counter part. But lets not take that chance.
>>
>>>
>>> I'd rather encode things in a way that is independent from the
>>> architecture, and be done with it. You can map 0 to 40bits, and we have
>>> the ability to express all values the architecture has (just in a
>>> different order).
>>
>> The other option I can think of is encoding a signed number which is the
>> difference of the IPA from 40. But that would need 5 bits if we were to
>> encode it as it is. And if we want to squeeze it in 4bit, we could store
>> half the difference (limiting the IPA limit to even numbers).
>>
>> i.e. IPA = 40 + 2 * sign_extend(bits[3:0]);
> 
> I came across similar issues when trying to work out how to enable
> SVE for KVM.  In the end I reduced this to a per-vcpu feature, but
> it means that there is no global opt-in for the SVE-specific KVM
> API extensions:
> 
> That's a bit gross, because SVE may require a change to the way
> vcpus are initialised.  The set of supported SVE v

Re: [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical address size for VM

2018-07-09 Thread Dave Martin
On Mon, Jul 09, 2018 at 01:29:42PM +0100, Marc Zyngier wrote:
> On 09/07/18 12:23, Dave Martin wrote:
> > On Fri, Jul 06, 2018 at 05:39:00PM +0100, Suzuki K Poulose wrote:
> >> On 07/06/2018 04:09 PM, Marc Zyngier wrote:
> >>> On 06/07/18 14:49, Suzuki K Poulose wrote:
>  On 04/07/18 23:03, Suzuki K Poulose wrote:
> > On 07/04/2018 04:51 PM, Will Deacon wrote:
> >> Hi Suzuki,
> >>
> >> On Fri, Jun 29, 2018 at 12:15:35PM +0100, Suzuki K Poulose wrote:
> >>> Allow specifying the physical address size for a new VM via
> >>> the kvm_type argument for KVM_CREATE_VM ioctl. This allows
> >>> us to finalise the stage2 page table format as early as possible
> >>> and hence perform the right checks on the memory slots without
> >>> complication. The size is encoded as Log2(PA_Size) in the bits[7:0]
> >>> of the type field and can encode more information in the future if
> >>> required. The IPA size is still capped at 40bits.
> >>>
> >>> Cc: Marc Zyngier 
> >>> Cc: Christoffer Dall 
> >>> Cc: Peter Maydel 
> >>> Cc: Paolo Bonzini 
> >>> Cc: Radim Krčmář 
> >>> Signed-off-by: Suzuki K Poulose 
> >>> ---
> >>>   arch/arm/include/asm/kvm_mmu.h   |  2 ++
> >>>   arch/arm64/include/asm/kvm_arm.h | 10 +++---
> >>>   arch/arm64/include/asm/kvm_mmu.h |  2 ++
> >>>   include/uapi/linux/kvm.h | 10 ++
> >>>   virt/kvm/arm/arm.c   | 24 ++--
> >>>   5 files changed, 39 insertions(+), 9 deletions(-)
> >>
> >> [...]
> >>
> >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >>> index 4df9bb6..fa4cab0 100644
> >>> --- a/include/uapi/linux/kvm.h
> >>> +++ b/include/uapi/linux/kvm.h
> >>> @@ -751,6 +751,16 @@ struct kvm_ppc_resize_hpt {
> >>>   #define KVM_S390_SIE_PAGE_OFFSET 1
> >>>   /*
> >>> + * On arm/arm64, machine type can be used to request the physical
> >>> + * address size for the VM. Bits [7-0] have been reserved for the
> >>> + * PA size shift (i.e, log2(PA_Size)). For backward compatibility,
> >>> + * value 0 implies the default IPA size, which is 40bits.
> >>> + */
> >>> +#define KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK    0xff
> >>> +#define KVM_VM_TYPE_ARM_PHYS_SHIFT(x)    \
> >>> +    ((x) & KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK)
> >>
> >> This seems like you're allocating quite a lot of bits in a 
> >> non-extensible
> >> interface to a fairly esoteric parameter. Would it be better to add 
> >> another
> >> ioctl, or condense the number of sizes you support instead?
> >
> > As I explained in the other thread, we need the size as soon as the VM
> > is created. The major challenge is keeping the backward compatibility by
> > mapping 0 to 40bits. I will give it a thought.
> 
>  Here is one option. We could re-use the {V}TCR_ELx.{I}PS field format, 
>  which
>  occupies 3 bits and has the following definitions. 
>  (ID_AA64MMFR0_EL1:PARange
>  also has the field definitions, except that the field is 4bits wide, but
>  only 3bits are used)
> 
>  000 32 bits, 4GB.
>  001 36 bits, 64GB.
>  010 40 bits, 1TB.
>  011 42 bits, 4TB.
>  100 44 bits, 16TB.
>  101 48 bits, 256TB.
>  110 52 bits, 4PB
> 
>  But we need to map 0 => 40bits IPA to make our ABI backward compatible. 
>  So
>  we could use the additional one bit to indicate that IPA size is 
>  requested
>  in the 3 bits.
> 
>  i.e,
> 
>  machine_type:
> 
>  Bit [2:0]- Requested IPA size. Values follow VTCR_EL2.PS format.
> 
>  Bit [3]  - 1 => IPA Size bits (Bits[2:0]) requested.
>   0 => Not requested
> 
>  The only minor down side is restricting to the predefined values above,
>  which is not a real issue for a VM.
> 
>  Thoughts ?
> >>>
> >>> I'd be very wary of using that 4th bit to do something that is not in
> >>> the architecture. We have only a single value left to be used (0b111),
> >>> and then your scheme clashes with the architecture definition.
> >>
> >> I agree. However, if we ever go beyond the 3bits in PARange, we have an
> >> issue with {V}TCR counter part. But lets not take that chance.
> >>
> >>>
> >>> I'd rather encode things in a way that is independent from the
> >>> architecture, and be done with it. You can map 0 to 40bits, and we have
> >>> the ability to express all values the architecture has (just in a
> >>> different order).
> >>
> >> The other option I can think of is encoding a signed number which is the
> >> difference of the IPA from 40. But that would need 5 bits if we were to
> >> encode it as it is. And if we want to squeeze it in 4bit, we could store
> >> half the difference (limiting the IPA limit to even numbers).
> >>
> >> i.e. IPA = 40 + 2 * sign_extend(bits[3:0]);
> > 
> > I came across similar issues

[PATCH v5 0/7] KVM: Support PUD hugepages at stage 2

2018-07-09 Thread Punit Agrawal
This series is an update to the PUD hugepage support previously posted
at [0]. This patchset adds support for PUD hugepages at stage
2. This feature is useful on cores that have support for large sized
TLB mappings (e.g., 1GB for 4K granule).

The biggest change in this version is to replace repeated instances of
walking the page tables to get to a leaf-entry with a function to
return the appropriate entry. This was suggested by Suzuki and should
help reduce the amount of churn resulting from future changes. It also
addresses other feedback on the previous version.

Support is added to code that is shared between arm and arm64. Dummy
helpers for arm are provided as the port does not support PUD hugepage
sizes.

The patches have been tested on an A57 based system. The patchset is
based on v4.18-rc4. There are a few conflicts with the support for 52
bit IPA[1] due to a change in the number of parameters for
stage2_pmd_offset().

Thanks,
Punit

v4 -> v5:
* Patch 1 - Drop helper stage2_should_exec() and refactor the
  condition to decide if a page table entry should be marked
  executable
* Patch 4-6 - Introduce stage2_get_leaf_entry() and use it in this and
  later patches
* Patch 7 - Use stage 2 accessors instead of using the page table
  helpers directly
* Patch 7 - Add a note to update the PUD hugepage support when number
  of levels of stage 2 tables differs from stage 1

v3 -> v4:
* Patch 1 and 7 - Don't put down hugepages pte if logging is enabled
* Patch 4-5 - Add PUD hugepage support for exec and access faults
* Patch 6 - PUD hugepage support for aging page table entries

v2 -> v3:
* Update vma_pagesize directly if THP [1/4]. Previously this was done
  indirectly via hugetlb
* Added review tag [4/4]

v1 -> v2:
* Create helper to check if the page should have exec permission [1/4]
* Fix broken condition to detect THP hugepage [1/4]
* Fix incorrect hunk resulting from a rebase [4/4]

[0] https://www.spinics.net/lists/arm-kernel/msg663562.html
[1] https://www.spinics.net/lists/kvm/msg171065.html

Punit Agrawal (7):
  KVM: arm/arm64: Share common code in user_mem_abort()
  KVM: arm/arm64: Introduce helpers to manipulate page table entries
  KVM: arm64: Support dirty page tracking for PUD hugepages
  KVM: arm64: Support PUD hugepage in stage2_is_exec()
  KVM: arm64: Support handling access faults for PUD hugepages
  KVM: arm64: Update age handlers to support PUD hugepages
  KVM: arm64: Add support for creating PUD hugepages at stage 2

 arch/arm/include/asm/kvm_mmu.h |  60 +
 arch/arm64/include/asm/kvm_mmu.h   |  47 
 arch/arm64/include/asm/pgtable-hwdef.h |   4 +
 arch/arm64/include/asm/pgtable.h   |   9 +
 virt/kvm/arm/mmu.c | 289 ++---
 5 files changed, 330 insertions(+), 79 deletions(-)

-- 
2.17.1


[PATCH v5 1/7] KVM: arm/arm64: Share common code in user_mem_abort()

2018-07-09 Thread Punit Agrawal
The code for operations such as marking the pfn as dirty, and
dcache/icache maintenance during stage 2 fault handling is duplicated
between normal pages and PMD hugepages.

Instead of creating another copy of the operations when we introduce
PUD hugepages, let's share them across the different pagesizes.

Signed-off-by: Punit Agrawal 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
---
 virt/kvm/arm/mmu.c | 67 ++
 1 file changed, 38 insertions(+), 29 deletions(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 1d90d79706bd..ea3d992e4fb7 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1422,7 +1422,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
  unsigned long fault_status)
 {
int ret;
-   bool write_fault, exec_fault, writable, hugetlb = false, force_pte = 
false;
+   bool write_fault, writable, hugetlb = false, force_pte = false;
+   bool exec_fault, needs_exec;
unsigned long mmu_seq;
gfn_t gfn = fault_ipa >> PAGE_SHIFT;
struct kvm *kvm = vcpu->kvm;
@@ -1431,7 +1432,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
kvm_pfn_t pfn;
pgprot_t mem_type = PAGE_S2;
bool logging_active = memslot_is_logging(memslot);
-   unsigned long flags = 0;
+   unsigned long vma_pagesize, flags = 0;
 
write_fault = kvm_is_write_fault(vcpu);
exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
@@ -1451,7 +1452,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
return -EFAULT;
}
 
-   if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) {
+   vma_pagesize = vma_kernel_pagesize(vma);
+   if (vma_pagesize == PMD_SIZE && !logging_active) {
hugetlb = true;
gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
} else {
@@ -1520,28 +1522,45 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
if (mmu_notifier_retry(kvm, mmu_seq))
goto out_unlock;
 
-   if (!hugetlb && !force_pte)
+   if (!hugetlb && !force_pte) {
+   /*
+* Only PMD_SIZE transparent hugepages(THP) are
+* currently supported. This code will need to be
+* updated to support other THP sizes.
+*/
hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
+   if (hugetlb)
+   vma_pagesize = PMD_SIZE;
+   }
+
+   if (writable)
+   kvm_set_pfn_dirty(pfn);
+
+   if (fault_status != FSC_PERM)
+   clean_dcache_guest_page(pfn, vma_pagesize);
 
-   if (hugetlb) {
+   if (exec_fault)
+   invalidate_icache_guest_page(pfn, vma_pagesize);
+
+   /*
+* If we took an execution fault we have made the
+* icache/dcache coherent above and should now let the s2
+* mapping be executable.
+*
+* Write faults (!exec_fault && FSC_PERM) are orthogonal to
+* execute permissions, and we preserve whatever we have.
+*/
+   needs_exec = exec_fault ||
+   (fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa));
+
+   if (hugetlb && vma_pagesize == PMD_SIZE) {
pmd_t new_pmd = pfn_pmd(pfn, mem_type);
new_pmd = pmd_mkhuge(new_pmd);
-   if (writable) {
+   if (writable)
new_pmd = kvm_s2pmd_mkwrite(new_pmd);
-   kvm_set_pfn_dirty(pfn);
-   }
 
-   if (fault_status != FSC_PERM)
-   clean_dcache_guest_page(pfn, PMD_SIZE);
-
-   if (exec_fault) {
+   if (needs_exec)
new_pmd = kvm_s2pmd_mkexec(new_pmd);
-   invalidate_icache_guest_page(pfn, PMD_SIZE);
-   } else if (fault_status == FSC_PERM) {
-   /* Preserve execute if XN was already cleared */
-   if (stage2_is_exec(kvm, fault_ipa))
-   new_pmd = kvm_s2pmd_mkexec(new_pmd);
-   }
 
ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
} else {
@@ -1549,21 +1568,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
 
if (writable) {
new_pte = kvm_s2pte_mkwrite(new_pte);
-   kvm_set_pfn_dirty(pfn);
mark_page_dirty(kvm, gfn);
}
 
-   if (fault_status != FSC_PERM)
-   clean_dcache_guest_page(pfn, PAGE_SIZE);
-
-   if (exec_fault) {
+   if (needs_exec)
new_pte = kvm_s2pte_mkexec(new_pte);
-   invalidate_icache_guest_page(pfn, PAGE_SIZE);
-   } else if (fault_status == FSC_

[PATCH v5 3/7] KVM: arm64: Support dirty page tracking for PUD hugepages

2018-07-09 Thread Punit Agrawal
In preparation for creating PUD hugepages at stage 2, add support for
write protecting PUD hugepages when they are encountered. Write
protecting guest tables is used to track dirty pages when migrating
VMs.

Also, provide trivial implementations of required kvm_s2pud_* helpers
to allow sharing of code with arm32.

Signed-off-by: Punit Agrawal 
Reviewed-by: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Russell King 
Cc: Catalin Marinas 
Cc: Will Deacon 
---
 arch/arm/include/asm/kvm_mmu.h   | 16 
 arch/arm64/include/asm/kvm_mmu.h | 10 ++
 virt/kvm/arm/mmu.c   | 11 +++
 3 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index d095c2d0b284..c23722f75d5c 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -80,6 +80,22 @@ void kvm_clear_hyp_idmap(void);
 
 #define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
 
+/*
+ * The following kvm_*pud*() functions are provided strictly to allow
+ * sharing code with arm64. They should never be called in practice.
+ */
+static inline void kvm_set_s2pud_readonly(pud_t *pud)
+{
+   BUG();
+}
+
+static inline bool kvm_s2pud_readonly(pud_t *pud)
+{
+   BUG();
+   return false;
+}
+
+
 static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd)
 {
*pmd = new_pmd;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 689def9bb9d5..84051930ddfe 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -239,6 +239,16 @@ static inline bool kvm_s2pmd_exec(pmd_t *pmdp)
return !(READ_ONCE(pmd_val(*pmdp)) & PMD_S2_XN);
 }
 
+static inline void kvm_set_s2pud_readonly(pud_t *pudp)
+{
+   kvm_set_s2pte_readonly((pte_t *)pudp);
+}
+
+static inline bool kvm_s2pud_readonly(pud_t *pudp)
+{
+   return kvm_s2pte_readonly((pte_t *)pudp);
+}
+
 static inline bool kvm_page_empty(void *ptr)
 {
struct page *ptr_page = virt_to_page(ptr);
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index e131b7f9b7d7..ed8f8271c389 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1288,9 +1288,12 @@ static void  stage2_wp_puds(pgd_t *pgd, phys_addr_t 
addr, phys_addr_t end)
do {
next = stage2_pud_addr_end(addr, end);
if (!stage2_pud_none(*pud)) {
-   /* TODO:PUD not supported, revisit later if supported */
-   BUG_ON(stage2_pud_huge(*pud));
-   stage2_wp_pmds(pud, addr, next);
+   if (stage2_pud_huge(*pud)) {
+   if (!kvm_s2pud_readonly(pud))
+   kvm_set_s2pud_readonly(pud);
+   } else {
+   stage2_wp_pmds(pud, addr, next);
+   }
}
} while (pud++, addr = next, addr != end);
 }
@@ -1333,7 +1336,7 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t 
addr, phys_addr_t end)
  *
  * Called to start logging dirty pages after memory region
  * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns
- * all present PMD and PTEs are write protected in the memory region.
+ * all present PUD, PMD and PTEs are write protected in the memory region.
  * Afterwards read of dirty page log can be called.
  *
  * Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
-- 
2.17.1



[PATCH v5 7/7] KVM: arm64: Add support for creating PUD hugepages at stage 2

2018-07-09 Thread Punit Agrawal
KVM only supports PMD hugepages at stage 2. Now that the various page
handling routines are updated, extend the stage 2 fault handling to
map in PUD hugepages.

Addition of PUD hugepage support enables additional page sizes (e.g.,
1G with 4K granule) which can be useful on cores that support mapping
larger block sizes in the TLB entries.

Signed-off-by: Punit Agrawal 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Russell King 
Cc: Catalin Marinas 
Cc: Will Deacon 
---
 arch/arm/include/asm/kvm_mmu.h | 19 +++
 arch/arm64/include/asm/kvm_mmu.h   | 15 +
 arch/arm64/include/asm/pgtable-hwdef.h |  2 +
 arch/arm64/include/asm/pgtable.h   |  2 +
 virt/kvm/arm/mmu.c | 78 --
 5 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 8e1e8aee229e..787baf9ec994 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -77,10 +77,13 @@ void kvm_clear_hyp_idmap(void);
 
 #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
 #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
+#define kvm_pfn_pud(pfn, prot) (__pud(0))
 
 #define kvm_pud_pfn(pud)   (((pud_val(pud) & PUD_MASK) & PHYS_MASK) >> 
PAGE_SHIFT)
 
 #define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
+/* No support for pud hugepages */
+#define kvm_pud_mkhuge(pud) (pud)
 
 /*
 * The following kvm_*pud*() functions are provided strictly to allow
@@ -97,6 +100,22 @@ static inline bool kvm_s2pud_readonly(pud_t *pud)
return false;
 }
 
+static inline void kvm_set_pud(pud_t *pud, pud_t new_pud)
+{
+   BUG();
+}
+
+static inline pud_t kvm_s2pud_mkwrite(pud_t pud)
+{
+   BUG();
+   return pud;
+}
+
+static inline pud_t kvm_s2pud_mkexec(pud_t pud)
+{
+   BUG();
+   return pud;
+}
 
 static inline bool kvm_s2pud_exec(pud_t *pud)
 {
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index c542052fb199..dd8a23159463 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -171,13 +171,16 @@ void kvm_clear_hyp_idmap(void);
 
 #define kvm_set_pte(ptep, pte)  set_pte(ptep, pte)
 #define kvm_set_pmd(pmdp, pmd)  set_pmd(pmdp, pmd)
+#define kvm_set_pud(pudp, pud) set_pud(pudp, pud)
 
 #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
 #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
+#define kvm_pfn_pud(pfn, prot) pfn_pud(pfn, prot)
 
 #define kvm_pud_pfn(pud)   pud_pfn(pud)
 
 #define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
+#define kvm_pud_mkhuge(pud) pud_mkhuge(pud)
 
 static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
 {
@@ -191,6 +194,12 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd)
return pmd;
 }
 
+static inline pud_t kvm_s2pud_mkwrite(pud_t pud)
+{
+   pud_val(pud) |= PUD_S2_RDWR;
+   return pud;
+}
+
 static inline pte_t kvm_s2pte_mkexec(pte_t pte)
 {
pte_val(pte) &= ~PTE_S2_XN;
@@ -203,6 +212,12 @@ static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd)
return pmd;
 }
 
+static inline pud_t kvm_s2pud_mkexec(pud_t pud)
+{
+   pud_val(pud) &= ~PUD_S2_XN;
+   return pud;
+}
+
 static inline void kvm_set_s2pte_readonly(pte_t *ptep)
 {
pteval_t old_pteval, pteval;
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h 
b/arch/arm64/include/asm/pgtable-hwdef.h
index 10ae592b78b8..e327665e94d1 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -193,6 +193,8 @@
 #define PMD_S2_RDWR(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
 #define PMD_S2_XN  (_AT(pmdval_t, 2) << 53)  /* XN[1:0] */
 
+#define PUD_S2_RDONLY  (_AT(pudval_t, 1) << 6)   /* HAP[2:1] */
+#define PUD_S2_RDWR(_AT(pudval_t, 3) << 6)   /* HAP[2:1] */
 #define PUD_S2_XN  (_AT(pudval_t, 2) << 53)  /* XN[1:0] */
 
 /*
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 4d9476e420d9..0afc34f94ff5 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -389,6 +389,8 @@ static inline int pmd_protnone(pmd_t pmd)
 #define pud_mkyoung(pud)   pte_pud(pte_mkyoung(pud_pte(pud)))
 #define pud_write(pud) pte_write(pud_pte(pud))
 
+#define pud_mkhuge(pud) (__pud(pud_val(pud) & ~PUD_TABLE_BIT))
+
 #define __pud_to_phys(pud) __pte_to_phys(pud_pte(pud))
 #define __phys_to_pud_val(phys)__phys_to_pte_val(phys)
 #define pud_pfn(pud)   ((__pud_to_phys(pud) & PUD_MASK) >> PAGE_SHIFT)
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index a6d3ac9d7c7a..d8e2497e5353 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -116,6 +116,25 @@ static void stage2_dissolve_pmd(struct kvm *kvm, 
phys_addr_t addr, pmd_t *pmd)
put_page(virt_to_page(pmd));
 }
 
+/**
+ * stage2_dissolve_pud() - clear and flush huge PUD entry
+ * @kvm:   poin

[PATCH v5 4/7] KVM: arm64: Support PUD hugepage in stage2_is_exec()

2018-07-09 Thread Punit Agrawal
In preparation for creating PUD hugepages at stage 2, add support for
detecting execute permissions on PUD page table entries. Faults due to
lack of execute permissions on page table entries is used to perform
i-cache invalidation on first execute.

Provide trivial implementations of arm32 helpers to allow sharing of
code.

Signed-off-by: Punit Agrawal 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Russell King 
Cc: Catalin Marinas 
Cc: Will Deacon 
---
 arch/arm/include/asm/kvm_mmu.h |  6 
 arch/arm64/include/asm/kvm_mmu.h   |  5 +++
 arch/arm64/include/asm/pgtable-hwdef.h |  2 ++
 virt/kvm/arm/mmu.c | 49 +++---
 4 files changed, 57 insertions(+), 5 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index c23722f75d5c..d05c8986e495 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -96,6 +96,12 @@ static inline bool kvm_s2pud_readonly(pud_t *pud)
 }
 
 
+static inline bool kvm_s2pud_exec(pud_t *pud)
+{
+   BUG();
+   return false;
+}
+
 static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd)
 {
*pmd = new_pmd;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 84051930ddfe..15bc1be8f82f 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -249,6 +249,11 @@ static inline bool kvm_s2pud_readonly(pud_t *pudp)
return kvm_s2pte_readonly((pte_t *)pudp);
 }
 
+static inline bool kvm_s2pud_exec(pud_t *pudp)
+{
+   return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN);
+}
+
 static inline bool kvm_page_empty(void *ptr)
 {
struct page *ptr_page = virt_to_page(ptr);
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h 
b/arch/arm64/include/asm/pgtable-hwdef.h
index fd208eac9f2a..10ae592b78b8 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -193,6 +193,8 @@
 #define PMD_S2_RDWR(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
 #define PMD_S2_XN  (_AT(pmdval_t, 2) << 53)  /* XN[1:0] */
 
+#define PUD_S2_XN  (_AT(pudval_t, 2) << 53)  /* XN[1:0] */
+
 /*
  * Memory Attribute override for Stage-2 (MemAttr[3:0])
  */
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index ed8f8271c389..e73909a31e02 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1038,23 +1038,62 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct 
kvm_mmu_memory_cache
return 0;
 }
 
-static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
+/*
+ * stage2_get_leaf_entry - walk the stage2 VM page tables and return
+ * true if a valid and present leaf-entry is found. A pointer to the
+ * leaf-entry is returned in the appropriate level variable - pudpp,
+ * pmdpp, ptepp.
+ */
+static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr,
+ pud_t **pudpp, pmd_t **pmdpp, pte_t **ptepp)
 {
+   pud_t *pudp;
pmd_t *pmdp;
pte_t *ptep;
 
-   pmdp = stage2_get_pmd(kvm, NULL, addr);
+   pudp = stage2_get_pud(kvm, NULL, addr);
+   if (!pudp || pud_none(*pudp) || !pud_present(*pudp))
+   return false;
+
+   if (pud_huge(*pudp)) {
+   *pudpp = pudp;
+   return true;
+   }
+
+   pmdp = stage2_pmd_offset(pudp, addr);
if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp))
return false;
 
-   if (pmd_thp_or_huge(*pmdp))
-   return kvm_s2pmd_exec(pmdp);
+   if (pmd_thp_or_huge(*pmdp)) {
+   *pmdpp = pmdp;
+   return true;
+   }
 
ptep = pte_offset_kernel(pmdp, addr);
if (!ptep || pte_none(*ptep) || !pte_present(*ptep))
return false;
 
-   return kvm_s2pte_exec(ptep);
+   *ptepp = ptep;
+   return true;
+}
+
+static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
+{
+   pud_t *pudp = NULL;
+   pmd_t *pmdp = NULL;
+   pte_t *ptep = NULL;
+   bool found;
+
+   found = stage2_get_leaf_entry(kvm, addr, &pudp, &pmdp, &ptep);
+   if (!found)
+   return false;
+
+   if (pudp)
+   return kvm_s2pud_exec(pudp);
+   else if (pmdp)
+   return kvm_s2pmd_exec(pmdp);
+   else
+   return kvm_s2pte_exec(ptep);
 }
 
 static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
-- 
2.17.1



[PATCH v5 2/7] KVM: arm/arm64: Introduce helpers to manipulate page table entries

2018-07-09 Thread Punit Agrawal
Introduce helpers to abstract architectural handling of the conversion
of pfn to page table entries and marking a PMD page table entry as a
block entry.

The helpers are introduced in preparation for supporting PUD hugepages
at stage 2 - which are supported on arm64 but do not exist on arm.

Signed-off-by: Punit Agrawal 
Acked-by: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Russell King 
Cc: Catalin Marinas 
Cc: Will Deacon 
---
 arch/arm/include/asm/kvm_mmu.h   | 5 +
 arch/arm64/include/asm/kvm_mmu.h | 5 +
 virt/kvm/arm/mmu.c   | 8 +---
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 8553d68b7c8a..d095c2d0b284 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -75,6 +75,11 @@ phys_addr_t kvm_get_idmap_vector(void);
 int kvm_mmu_init(void);
 void kvm_clear_hyp_idmap(void);
 
+#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
+#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
+
+#define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
+
 static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd)
 {
*pmd = new_pmd;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index fb9a7127bb75..689def9bb9d5 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -172,6 +172,11 @@ void kvm_clear_hyp_idmap(void);
 #define kvm_set_pte(ptep, pte)  set_pte(ptep, pte)
 #define kvm_set_pmd(pmdp, pmd)  set_pmd(pmdp, pmd)
 
+#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
+#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
+
+#define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd)
+
 static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
 {
pte_val(pte) |= PTE_S2_RDWR;
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index ea3d992e4fb7..e131b7f9b7d7 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1554,8 +1554,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
(fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa));
 
if (hugetlb && vma_pagesize == PMD_SIZE) {
-   pmd_t new_pmd = pfn_pmd(pfn, mem_type);
-   new_pmd = pmd_mkhuge(new_pmd);
+   pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type);
+
+   new_pmd = kvm_pmd_mkhuge(new_pmd);
+
if (writable)
new_pmd = kvm_s2pmd_mkwrite(new_pmd);
 
@@ -1564,7 +1566,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
 
ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
} else {
-   pte_t new_pte = pfn_pte(pfn, mem_type);
+   pte_t new_pte = kvm_pfn_pte(pfn, mem_type);
 
if (writable) {
new_pte = kvm_s2pte_mkwrite(new_pte);
-- 
2.17.1



[PATCH v5 6/7] KVM: arm64: Update age handlers to support PUD hugepages

2018-07-09 Thread Punit Agrawal
In preparation for creating larger hugepages at Stage 2, add support
to the age handling notifiers for PUD hugepages when encountered.

Provide trivial helpers for arm32 to allow sharing code.

Signed-off-by: Punit Agrawal 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Russell King 
Cc: Catalin Marinas 
Cc: Will Deacon 
---
 arch/arm/include/asm/kvm_mmu.h   |  6 
 arch/arm64/include/asm/kvm_mmu.h |  5 
 arch/arm64/include/asm/pgtable.h |  1 +
 virt/kvm/arm/mmu.c   | 51 ++--
 4 files changed, 40 insertions(+), 23 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index a4298d429efc..8e1e8aee229e 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -110,6 +110,12 @@ static inline pud_t kvm_s2pud_mkyoung(pud_t pud)
return pud;
 }
 
+static inline bool kvm_s2pud_young(pud_t pud)
+{
+   BUG();
+   return false;
+}
+
 static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd)
 {
*pmd = new_pmd;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 4d2780c588b0..c542052fb199 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -261,6 +261,11 @@ static inline pud_t kvm_s2pud_mkyoung(pud_t pud)
return pud_mkyoung(pud);
 }
 
+static inline bool kvm_s2pud_young(pud_t pud)
+{
+   return pud_young(pud);
+}
+
 static inline bool kvm_page_empty(void *ptr)
 {
struct page *ptr_page = virt_to_page(ptr);
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index a64a5c35beb1..4d9476e420d9 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -385,6 +385,7 @@ static inline int pmd_protnone(pmd_t pmd)
 #define pfn_pmd(pfn,prot)  __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << 
PAGE_SHIFT) | pgprot_val(prot))
 #define mk_pmd(page,prot)  pfn_pmd(page_to_pfn(page),prot)
 
+#define pud_young(pud) pte_young(pud_pte(pud))
 #define pud_mkyoung(pud)   pte_pud(pte_mkyoung(pud_pte(pud)))
 #define pud_write(pud) pte_write(pud_pte(pud))
 
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index d2c705e31584..a6d3ac9d7c7a 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1172,6 +1172,11 @@ static int stage2_pmdp_test_and_clear_young(pmd_t *pmd)
return stage2_ptep_test_and_clear_young((pte_t *)pmd);
 }
 
+static int stage2_pudp_test_and_clear_young(pud_t *pud)
+{
+   return stage2_ptep_test_and_clear_young((pte_t *)pud);
+}
+
 /**
  * kvm_phys_addr_ioremap - map a device range to guest IPA
  *
@@ -1879,42 +1884,42 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long 
hva, pte_t pte)
 
 static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void 
*data)
 {
-   pmd_t *pmd;
-   pte_t *pte;
+   pud_t *pud = NULL;
+   pmd_t *pmd = NULL;
+   pte_t *pte = NULL;
+   bool found;
 
-   WARN_ON(size != PAGE_SIZE && size != PMD_SIZE);
-   pmd = stage2_get_pmd(kvm, NULL, gpa);
-   if (!pmd || pmd_none(*pmd)) /* Nothing there */
+   WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
+   found = stage2_get_leaf_entry(kvm, gpa, &pud, &pmd, &pte);
+   if (!found)
return 0;
 
-   if (pmd_thp_or_huge(*pmd))  /* THP, HugeTLB */
+   if (pud)
+   return stage2_pudp_test_and_clear_young(pud);
+   else if (pmd)
return stage2_pmdp_test_and_clear_young(pmd);
-
-   pte = pte_offset_kernel(pmd, gpa);
-   if (pte_none(*pte))
-   return 0;
-
-   return stage2_ptep_test_and_clear_young(pte);
+   else
+   return stage2_ptep_test_and_clear_young(pte);
 }
 
 static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void 
*data)
 {
-   pmd_t *pmd;
-   pte_t *pte;
+   pud_t *pud = NULL;
+   pmd_t *pmd = NULL;
+   pte_t *pte = NULL;
+   bool found;
 
-   WARN_ON(size != PAGE_SIZE && size != PMD_SIZE);
-   pmd = stage2_get_pmd(kvm, NULL, gpa);
-   if (!pmd || pmd_none(*pmd)) /* Nothing there */
+   WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
+   found = stage2_get_leaf_entry(kvm, gpa, &pud, &pmd, &pte);
+   if (!found)
return 0;
 
-   if (pmd_thp_or_huge(*pmd))  /* THP, HugeTLB */
+   if (pud)
+   return kvm_s2pud_young(*pud);
+   else if (pmd)
return pmd_young(*pmd);
-
-   pte = pte_offset_kernel(pmd, gpa);
-   if (!pte_none(*pte))/* Just a page... */
+   else
return pte_young(*pte);
-
-   return 0;
 }
 
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
-- 
2.17.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v5 5/7] KVM: arm64: Support handling access faults for PUD hugepages

2018-07-09 Thread Punit Agrawal
In preparation for creating larger hugepages at Stage 2, extend the
access fault handling at Stage 2 to support PUD hugepages when
encountered.

Provide trivial helpers for arm32 to allow sharing of code.

Signed-off-by: Punit Agrawal 
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Russell King 
Cc: Catalin Marinas 
Cc: Will Deacon 
---
 arch/arm/include/asm/kvm_mmu.h   |  8 
 arch/arm64/include/asm/kvm_mmu.h |  7 +++
 arch/arm64/include/asm/pgtable.h |  6 ++
 virt/kvm/arm/mmu.c   | 29 -
 4 files changed, 37 insertions(+), 13 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index d05c8986e495..a4298d429efc 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -78,6 +78,8 @@ void kvm_clear_hyp_idmap(void);
 #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
 #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
 
+#define kvm_pud_pfn(pud)   (((pud_val(pud) & PUD_MASK) & PHYS_MASK) >> PAGE_SHIFT)
+
 #define kvm_pmd_mkhuge(pmd)	pmd_mkhuge(pmd)
 
 /*
@@ -102,6 +104,12 @@ static inline bool kvm_s2pud_exec(pud_t *pud)
return false;
 }
 
+static inline pud_t kvm_s2pud_mkyoung(pud_t pud)
+{
+   BUG();
+   return pud;
+}
+
 static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd)
 {
*pmd = new_pmd;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 15bc1be8f82f..4d2780c588b0 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -175,6 +175,8 @@ void kvm_clear_hyp_idmap(void);
 #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
 #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot)
 
+#define kvm_pud_pfn(pud)   pud_pfn(pud)
+
 #define kvm_pmd_mkhuge(pmd)	pmd_mkhuge(pmd)
 
 static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
@@ -254,6 +256,11 @@ static inline bool kvm_s2pud_exec(pud_t *pudp)
return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN);
 }
 
+static inline pud_t kvm_s2pud_mkyoung(pud_t pud)
+{
+   return pud_mkyoung(pud);
+}
+
 static inline bool kvm_page_empty(void *ptr)
 {
struct page *ptr_page = virt_to_page(ptr);
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 1bdeca8918a6..a64a5c35beb1 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -314,6 +314,11 @@ static inline pte_t pud_pte(pud_t pud)
return __pte(pud_val(pud));
 }
 
+static inline pud_t pte_pud(pte_t pte)
+{
+   return __pud(pte_val(pte));
+}
+
 static inline pmd_t pud_pmd(pud_t pud)
 {
return __pmd(pud_val(pud));
@@ -380,6 +385,7 @@ static inline int pmd_protnone(pmd_t pmd)
 #define pfn_pmd(pfn,prot)  __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
 #define mk_pmd(page,prot)  pfn_pmd(page_to_pfn(page),prot)
 
+#define pud_mkyoung(pud)   pte_pud(pte_mkyoung(pud_pte(pud)))
 #define pud_write(pud) pte_write(pud_pte(pud))
 
 #define __pud_to_phys(pud) __pte_to_phys(pud_pte(pud))
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index e73909a31e02..d2c705e31584 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1637,33 +1637,36 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
  */
 static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 {
-   pmd_t *pmd;
-   pte_t *pte;
+   pud_t *pud = NULL;
+   pmd_t *pmd = NULL;
+   pte_t *pte = NULL;
kvm_pfn_t pfn;
-   bool pfn_valid = false;
+   bool found, pfn_valid = false;
 
trace_kvm_access_fault(fault_ipa);
 
spin_lock(&vcpu->kvm->mmu_lock);
 
-   pmd = stage2_get_pmd(vcpu->kvm, NULL, fault_ipa);
-   if (!pmd || pmd_none(*pmd)) /* Nothing there */
+   found = stage2_get_leaf_entry(vcpu->kvm, fault_ipa, &pud, &pmd, &pte);
+   if (!found)
goto out;
 
-   if (pmd_thp_or_huge(*pmd)) {/* THP, HugeTLB */
+   if (pud) {  /* HugeTLB */
+   *pud = kvm_s2pud_mkyoung(*pud);
+   pfn = kvm_pud_pfn(*pud);
+   pfn_valid = true;
+   goto out;
+   } else  if (pmd) {  /* THP, HugeTLB */
*pmd = pmd_mkyoung(*pmd);
pfn = pmd_pfn(*pmd);
pfn_valid = true;
goto out;
+   } else {
+   *pte = pte_mkyoung(*pte);   /* Just a page... */
+   pfn = pte_pfn(*pte);
+   pfn_valid = true;
}
 
-   pte = pte_offset_kernel(pmd, fault_ipa);
-   if (pte_none(*pte)) /* Nothing there either */
-   goto out;
-
-   *pte = pte_mkyoung(*pte);   /* Just a page... */
-   pfn = pte_pfn(*pte);
-   pfn_valid = true;
 out:
spin_unlock(&vcpu->kvm->mmu_lock);
if (pfn_valid)
-- 
2.17.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [PATCH v5 0/7] KVM: Support PUD hugepages at stage 2

2018-07-09 Thread Punit Agrawal
Please ignore this cover letter.

Apologies for the duplicate cover letter and the somewhat funky threading
(I blame an unsaved emacs buffer).

The patches appear to be intact so don't let the threading get in the
way of review.

Punit Agrawal  writes:

> This series is an update to the PUD hugepage support previously posted
> at [0]. This patchset adds support for PUD hugepages at stage
> 2. This feature is useful on cores that have support for large sized
> TLB mappings (e.g., 1GB for 4K granule).
>
> The biggest change in this version is to replace repeated instances of
> walking the page tables to get to a leaf-entry with a function to
> return the appropriate entry. This was suggested by Suzuki and should
> help reduce the amount of churn resulting from future changes. It also
> addresses other feedback on the previous version.
>
> Support is added to code that is shared between arm and arm64. Dummy
> helpers for arm are provided as the port does not support PUD hugepage
> sizes.
>
> The patches have been tested on an A57 based system. The patchset is
> based on v4.18-rc4. There are a few conflicts with the support for 52
> bit IPA[1] due to change in number of parameters for
> stage2_pmd_offset().
>
> Thanks,
> Punit
>
> v4 -> v5:
>
> * Patch 1 - Drop helper stage2_should_exec() and refactor the
>   condition to decide if a page table entry should be marked
>   executable
> * Patch 4-6 - Introduce stage2_get_leaf_entry() and use it in this and
>   latter patches
> * Patch 7 - Use stage 2 accessors instead of using the page table
>   helpers directly
> * Patch 7 - Add a note to update the PUD hugepage support when number
>   of levels of stage 2 tables differs from stage 1
>
>
> v3 -> v4:
> * Patch 1 and 7 - Don't put down hugepages pte if logging is enabled
> * Patch 4-5 - Add PUD hugepage support for exec and access faults
> * Patch 6 - PUD hugepage support for aging page table entries
>
> v2 -> v3:
> * Update vma_pagesize directly if THP [1/4]. Previously this was done
>   indirectly via hugetlb
> * Added review tag [4/4]
>
> v1 -> v2:
> * Create helper to check if the page should have exec permission [1/4]
> * Fix broken condition to detect THP hugepage [1/4]
> * Fix in-correct hunk resulting from a rebase [4/4]
>
> [0] https://www.spinics.net/lists/arm-kernel/msg663562.html
> [1] https://www.spinics.net/lists/kvm/msg171065.html
>
>
> Punit Agrawal (7):
>   KVM: arm/arm64: Share common code in user_mem_abort()
>   KVM: arm/arm64: Introduce helpers to manupulate page table entries
>   KVM: arm64: Support dirty page tracking for PUD hugepages
>   KVM: arm64: Support PUD hugepage in stage2_is_exec()
>   KVM: arm64: Support handling access faults for PUD hugepages
>   KVM: arm64: Update age handlers to support PUD hugepages
>   KVM: arm64: Add support for creating PUD hugepages at stage 2
>
>  arch/arm/include/asm/kvm_mmu.h |  60 +
>  arch/arm64/include/asm/kvm_mmu.h   |  47 
>  arch/arm64/include/asm/pgtable-hwdef.h |   4 +
>  arch/arm64/include/asm/pgtable.h   |   9 +
>  virt/kvm/arm/mmu.c | 289 ++---
>  5 files changed, 330 insertions(+), 79 deletions(-)
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC PATCH 05/16] KVM: arm: Add arch init/uninit hooks

2018-07-09 Thread Dave Martin
On Fri, Jul 06, 2018 at 11:02:20AM +0100, Alex Bennée wrote:
> 
> Dave Martin  writes:
> 
> > In preparation for adding support for SVE in guests on arm64, hooks
> > for allocating and freeing additional per-vcpu memory are needed.
> >
> > kvm_arch_vcpu_setup() could be used for allocation, but this
> > function is not clearly balanced by un "unsetup" function, making
> 
> Isn't that a double negative there? Surely it would be balanced by a
> kvm_arch_vcpu_unsetup() or possibly better named function.

Yes, but there is no such function, and it wasn't clear what the
semantics of the existing hooks are supposed to be... so I didn't
feel comfortable adding an _unsetup().

I was trying to be minimally invasive while I got things working...

> > it unclear where memory allocated in this function should be freed.
> >
> > To keep things simple, this patch defines backend hooks
> > kvm_arm_arch_vcpu_{,un}unint(), and plumbs them in appropriately.
> 
> Is {,un} a notation for dropping un? This might be why I'm confused. I
> would have written it as kvm_arm_arch_vcpu_[un]init() or even
> kvm_arm_arch_vcpu_[init|uninit].

That should be kvm_arm_arch_vcpu_{,un}init().

Whether this is readily understood by people is another question.
This is the bash brace-expansion syntax which I'm in the habit of
using, partly because it looks nothing like C syntax, thus
"reducing" confusion.

Personally I make heavy use of this in the shell, like

mv .config{,.old}

etc., which the shell expands to "mv .config .config.old".  But that's
me.  Maybe other people don't.

Too obscure?  I have a number of patches that follow this convention
upstream.

> > The exusting kvm_arch_vcpu_init() function now calls
> 
> /existing/
> 
> > kvm_arm_arch_vcpu_init(), while an explicit kvm_arch_vcpu_uninit()
> > is added which current does nothing except to call
> > kvm_arm_arch_vcpu_uninit().
> 
> OK I'm a little confused by this. It seems to me that KVM already has
> the provision for an init/uninit. What does the extra level of
> indirection buy you that keeping the static inline
> kvm_arm_arch_vcpu_uninit in arm/kvm_host.h and a concrete implementation
> in arm64/kvm/guest.c doesn't?

There isn't an intentional extra level of indirection, but the existing
code feels strangely factored due to the somewhat random use of
prefixes in function names (i.e., kvm_arch_*() is sometimes a hook from
KVM core into virt/kvm/arm/, sometimes a hook from virt/kvm/arm/ into the
arm or arm64 backend, and sometimes a hook from KVM core directly into
the arm or arm64 backend).
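
For reference, the net effect of the patch is roughly the following
(paraphrased from the commit message, not the exact hunks):

	/* virt/kvm/arm/arm.c -- code shared between arm and arm64 */
	int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
	{
		/* ... existing init work ... */
		return kvm_arm_arch_vcpu_init(vcpu);	/* new backend hook */
	}

	void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
	{
		kvm_arm_arch_vcpu_uninit(vcpu);		/* new backend hook */
	}

with the arm and arm64 backends providing the actual (possibly empty)
implementations, so per-vcpu allocations have an obvious place to be
freed.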

So, I wasn't really sure where to put things initially.

This patch was always a bit of a bodge, and the series as posted
doesn't fully make use of it anyway... so I'll need to revisit.

Cheers
---Dave
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm64: vgic-its: Remove VLA usage

2018-07-09 Thread Kees Cook
On Mon, Jul 9, 2018 at 3:47 AM, Marc Zyngier  wrote:
> Hi Kees,
>
> On 02/07/18 18:15, Kees Cook wrote:
>> On Mon, Jul 2, 2018 at 12:36 AM, Auger Eric  wrote:
>>> Hi Kees,
>>>
>>> On 06/29/2018 08:46 PM, Kees Cook wrote:
 In the quest to remove all stack VLA usage from the kernel[1], this
 switches to using a maximum size and adds sanity checks. Additionally
 cleans up some of the int-vs-u32 usage and adds additional bounds checking.
 As it currently stands, this will always be 8 bytes until the ABI changes.

 [1] 
 https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com

 Cc: Christoffer Dall 
 Cc: Marc Zyngier 
 Cc: Eric Auger 
 Cc: Andre Przywara 
 Cc: linux-arm-ker...@lists.infradead.org
 Cc: kvmarm@lists.cs.columbia.edu
 Signed-off-by: Kees Cook 
 ---
  virt/kvm/arm/vgic/vgic-its.c | 19 +++
  1 file changed, 15 insertions(+), 4 deletions(-)

 diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
 index 4ed79c939fb4..3143fc047fcf 100644
 --- a/virt/kvm/arm/vgic/vgic-its.c
 +++ b/virt/kvm/arm/vgic/vgic-its.c
 @@ -168,8 +168,14 @@ struct vgic_its_abi {
   int (*commit)(struct vgic_its *its);
  };

 +#define ABI_0_ESZ 8
 +#define ESZ_MAX  ABI_0_ESZ
 +
  static const struct vgic_its_abi its_table_abi_versions[] = {
 - [0] = {.cte_esz = 8, .dte_esz = 8, .ite_esz = 8,
 + [0] = {
 +  .cte_esz = ABI_0_ESZ,
 +  .dte_esz = ABI_0_ESZ,
 +  .ite_esz = ABI_0_ESZ,
.save_tables = vgic_its_save_tables_v0,
.restore_tables = vgic_its_restore_tables_v0,
.commit = vgic_its_commit_v0,
 @@ -180,10 +186,12 @@ static const struct vgic_its_abi its_table_abi_versions[] = {

  inline const struct vgic_its_abi *vgic_its_get_abi(struct vgic_its *its)
  {
 + if (WARN_ON(its->abi_rev >= NR_ITS_ABIS))
 + return NULL;
   return &its_table_abi_versions[its->abi_rev];
  }

 -int vgic_its_set_abi(struct vgic_its *its, int rev)
 +static int vgic_its_set_abi(struct vgic_its *its, u32 rev)
  {
>>> if vgic_its_get_abi is likely to return NULL, don't we need to check abi
>>> != NULL at all call sites?
>>
>> My thinking was that since it should never happen, a WARN_ON would be
>> sufficient. But I can drop all these changes if you want. I just
>> wanted to see the VLA removed. :)
>
> Are you going to respin this patch limiting it to just the VLA changes?
> I'm actively queuing stuff for the next merge window, and it'd be good
> to get that one in.
>
> Alternatively, I can drop the WARN_ONs myself. Just let me know.

If you can just drop them that would be best, thanks!
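
(For anyone following along: the VLA part of the patch is the usual
removal pattern -- bound the on-stack buffer by the maximum entry size
and sanity-check the runtime size.  Variable names below are
illustrative, not the exact call site:

	/* before: stack VLA sized by the ABI's entry size */
	char entry[esz];

	/* after: fixed upper bound plus a sanity check */
	char entry[ESZ_MAX];

	if (WARN_ON(esz > ESZ_MAX))
		return -EINVAL;

The vgic_its_get_abi() WARN_ON/NULL change was just extra hardening on
top of that, which I'm happy to drop.)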

-Kees

-- 
Kees Cook
Pixel Security
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 6/6] KVM: arm/arm64: vgic: Allow configuration of interrupt groups

2018-07-09 Thread Christoffer Dall
On Mon, Jul 09, 2018 at 09:42:40AM +0100, Marc Zyngier wrote:
> On 04/07/18 10:38, Christoffer Dall wrote:
> > Implement the required MMIO accessors for GICv2 and GICv3 for the
> > IGROUPR distributor and redistributor registers.
> > 
> > This can allow guests to change behavior compared to running on previous
> > versions of KVM, but only to align with the architecture and hardware
> > implementations.
> > 
> > This also allows userspace to configure the groups for interrupts.  Note
> > that this potentially results in GICv2 guests not receiving interrupts
> > after migration if migrating from an older kernel that exposes GICv2
> > interrupts as group 1.
> > 
> > Cc: Andrew Jones 
> > Signed-off-by: Christoffer Dall 
> > ---
> > I implemented (but stashed) a version of this which predicated the
> > behavior based on the value of GICD_IIDR revision field, falling back to
> > ignoring writes and resetting GICv2 groups to 0 if the guest wrote a
> > revision less than 2.  However, current QEMU implementations simply
> > don't write the GICD_IIDR, so this doesn't help at all without changing
> > QEMU anyhow.
> > 
> > The only actual fix I can see here to work around the problem in the
> > kernel is to require an opt-in to allow restoring groups from userspace,
> > but that's a lot of logic to support cross-kernel version migration.
> > 
> > Question: Do we expect that cross-kernel version migration is a critical
> > feature that people really expect to work, and do we actually have
> > examples of catering to this in the kernel elsewhere?  (Also, how would
> > then that relate to the whole 'adding a new sysreg breaks migration'
> > situation?)
> 
> I don't really get why QEMU doesn't try to restore GICD_IIDR, while it
> is definitely trying to restore RO sysregs (and that's how we detect
> incompatibilities).
> 
> I think we should at least give userspace a chance to do the right
> thing. If it doesn't, well, too bad.

This series should give userspace an option to adjust its behavior.

My main concern is that this version of the series results in the worst
kind of migration failures, where the guest simply doesn't run on the
destination side with no warnings or error messages returned to the
user.

We could add logic to return an error code if trying to write a
different revision than what the kernel has (similar to the invariant
sysregs), so that a simple fix to QEMU to save/restore the GICD_IIDR
register at least results in an error being returned to userspace.
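
Roughly along these lines -- just a sketch of the check with a made-up
helper name, not code from the series:

	/*
	 * Fail a userspace write to GICD_IIDR if the Revision field
	 * (bits [15:12]) differs from what the kernel emulates; the
	 * error then shows up in userspace's restore path.
	 */
	static int vgic_check_iidr_revision(u32 written, u32 emulated)
	{
		if (((written >> 12) & 0xf) != ((emulated >> 12) & 0xf))
			return -EINVAL;

		return 0;
	}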

However, as QEMU doesn't do anything useful here today (not blaming
anyone, I wrote the apparently broken GIC save/restore code for QEMU
myself), we could also argue that QEMU might as well just fix things up
if it detects a different IIDR.

> 
> How bad is that "writable GICD_IIDR" patch? Because at this stage, and
> in the absence of any comment, I'm close to just pick that series and
> merge it for 4.19.
> 

My guess is that a patch to "save/restore GICD_IIDR" is simple enough,
but requires additional logic in the kernel that returns an error if the
GICD_IIDR doesn't match on write.

A patch which changes the groups and bumps the IIDR in userspace is
probably more complex.

Sounds like I should add the GICD_IIDR checking patch.  Thoughts?

What I would really like to know is whether this is really an issue or
not.  Do people who run products based on KVM, such as RHEV, have any
expectations about cross-kernel version migration?


Thanks,
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm