Re: [PATCH v5 02/13] KVM: x86: Refactor tsc synchronization code

2021-08-03 Thread Oliver Upton
On Fri, Jul 30, 2021 at 11:08 AM Sean Christopherson  wrote:
>
> On Thu, Jul 29, 2021, Oliver Upton wrote:
> > Refactor kvm_synchronize_tsc to make a new function that allows callers
> > to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly
> > for the sake of participating in TSC synchronization.
> >
> > This changes the locking semantics around TSC writes.
>
> "refactor" and "changes the locking semantics" are somewhat contradictory.  
> The
> correct way to do this is to first change the locking semantics, then extract 
> the
> helper you want.  That makes review and archaeology easier, and isolates the
> locking change in case it isn't so safe after all.

Indeed, that was just laziness on my part :)

> > Writes to the TSC will now take the pvclock gtod lock while holding the tsc
> > write lock, whereas before these locks were disjoint.
> >
> > Reviewed-by: David Matlack 
> > Signed-off-by: Oliver Upton 
> > ---
> > +/*
> > + * Infers attempts to synchronize the guest's tsc from host writes. Sets the
> > + * offset for the vcpu and tracks the TSC matching generation that the vcpu
> > + * participates in.
> > + *
> > + * Must hold kvm->arch.tsc_write_lock to call this function.
>
> Drop this blurb, lockdep assertions exist for a reason :-)
>

Ack.

> > + */
> > +static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
> > +   u64 ns, bool matched)
> > +{
> > + struct kvm *kvm = vcpu->kvm;
> > + bool already_matched;
>
> Eww, not your code, but "matched" and "already_matched" are not helpful names,
> e.g. they don't provide a clue as to _what_ matched, and thus don't explain why
> there are two separate variables.  And I would expect an "already" variant to
> come in from the caller, not the other way 'round.
>
>   matched => freq_matched
>   already_matched => gen_matched

Yeah, everything this series touches is a bit messy. I greedily
avoided the pile of cleanups that are needed, but alas...

> > + spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags);
>
> I believe this can be spin_lock(), since AFAICT the caller _must_ disable IRQs
> when taking tsc_write_lock, i.e. we know IRQs are disabled at this point.

Definitely.


>
> > + if (!matched) {
> > + /*
> > +  * We split periods of matched TSC writes into generations.
> > +  * For each generation, we track the original measured
> > +  * nanosecond time, offset, and write, so if TSCs are in
> > +  * sync, we can match exact offset, and if not, we can match
> > +  * exact software computation in compute_guest_tsc()
> > +  *
> > +  * These values are tracked in kvm->arch.cur_xxx variables.
> > +  */
> > + kvm->arch.nr_vcpus_matched_tsc = 0;
> > + kvm->arch.cur_tsc_generation++;
> > + kvm->arch.cur_tsc_nsec = ns;
> > + kvm->arch.cur_tsc_write = tsc;
> > + kvm->arch.cur_tsc_offset = offset;
>
> IMO, adjusting kvm->arch.cur_tsc_* belongs outside of pvclock_gtod_sync_lock.
> Based on the existing code, it is protected by tsc_write_lock.  I don't care
> about the extra work while holding pvclock_gtod_sync_lock, but it's very confusing
> to see code that reads variables outside of a lock, then take a lock and write
> those same variables without first rechecking.
>
> > + matched = false;
>
> What's the point of clearing "matched"?  It's already false...

None, besides just yanking the old chunk of code :)

>
> > + } else if (!already_matched) {
> > + kvm->arch.nr_vcpus_matched_tsc++;
> > + }
> > +
> > + kvm_track_tsc_matching(vcpu);
> > + spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags);
> > +}
> > +
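
Putting those two locking points together, I'd expect the helper to wind
up with roughly the following shape (untested sketch; the generation and
nr_vcpus_matched_tsc bookkeeping is elided and still subject to your
other comments):

static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
				  u64 ns, bool matched)
{
	struct kvm *kvm = vcpu->kvm;

	lockdep_assert_held(&kvm->arch.tsc_write_lock);

	/*
	 * The caller takes tsc_write_lock with IRQs disabled, so a plain
	 * spin_lock() is sufficient for the pvclock gtod lock here.
	 */
	spin_lock(&kvm->arch.pvclock_gtod_sync_lock);

	/* ... generation and vCPU matching bookkeeping as before ... */

	kvm_track_tsc_matching(vcpu);
	spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);
}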

--
Thanks,
Oliver


Re: [PATCH v3] memblock: make memblock_find_in_range method private

2021-08-03 Thread Mike Rapoport
On Tue, Aug 03, 2021 at 07:05:26PM +0100, Catalin Marinas wrote:
> On Tue, Aug 03, 2021 at 09:42:18AM +0300, Mike Rapoport wrote:
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 8490ed2917ff..0bffd2d1854f 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -74,6 +74,7 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
> >  static void __init reserve_crashkernel(void)
> >  {
> > unsigned long long crash_base, crash_size;
> > +   unsigned long long crash_max = arm64_dma_phys_limit;
> > int ret;
> >  
> > ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> > @@ -84,33 +85,18 @@ static void __init reserve_crashkernel(void)
> >  
> > crash_size = PAGE_ALIGN(crash_size);
> >  
> > -   if (crash_base == 0) {
> > -   /* Current arm64 boot protocol requires 2MB alignment */
> > -   crash_base = memblock_find_in_range(0, arm64_dma_phys_limit,
> > -   crash_size, SZ_2M);
> > -   if (crash_base == 0) {
> > -   pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
> > -   crash_size);
> > -   return;
> > -   }
> > -   } else {
> > -   /* User specifies base address explicitly. */
> > -   if (!memblock_is_region_memory(crash_base, crash_size)) {
> > -   pr_warn("cannot reserve crashkernel: region is not 
> > memory\n");
> > -   return;
> > -   }
> > +   /* User specifies base address explicitly. */
> > +   if (crash_base)
> > +   crash_max = crash_base + crash_size;
> >  
> > -   if (memblock_is_region_reserved(crash_base, crash_size)) {
> > -   pr_warn("cannot reserve crashkernel: region overlaps 
> > reserved memory\n");
> > -   return;
> > -   }
> > -
> > -   if (!IS_ALIGNED(crash_base, SZ_2M)) {
> > -   pr_warn("cannot reserve crashkernel: base address is 
> > not 2MB aligned\n");
> > -   return;
> > -   }
> > +   /* Current arm64 boot protocol requires 2MB alignment */
> > +   crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
> > +  crash_base, crash_max);
> > +   if (!crash_base) {
> > +   pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
> > +   crash_size);
> > +   return;
> > }
> > -   memblock_reserve(crash_base, crash_size);
> 
> We'll miss a bit on debug information provided to the user in case of a
> wrong crash_base/size option on the command line. Not sure we care much,
> though the alignment would probably be useful (maybe we document it
> somewhere).

It is already documented:

Documentation/admin-guide/kdump/kdump.rst:
   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
   the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
 
> What I haven't checked is whether memblock_phys_alloc_range() aims to
> get a 2MB aligned end (size) as well. If crash_size is not 2MB aligned,
> crash_max wouldn't be either and the above could fail. We only care
> about the crash_base to be aligned but the memblock_phys_alloc_range()
> doc says that both the start and size would be aligned to this.

The doc lies :)

memblock_phys_alloc_range() boils down to 

	for_each_free_mem_range_reverse(i, nid, flags, &this_start, &this_end,
					NULL) {

		/* clamp this_{start,end} to the user defined limits */

		cand = round_down(this_end - size, align);
		if (cand >= this_start)
			return cand;
	}
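
So the end of the range is only clamped to the caller's limits; it is the
candidate base that gets rounded down to 'align'. To make that concrete
(made-up numbers): say crash_base = 0x80000000 and crash_size = 0x500000,
i.e. 5MiB, so crash_max = 0x80500000 is not 2MiB aligned. The first
candidate is then

	cand = round_down(0x80500000 - 0x500000, SZ_2M) = 0x80000000

which is exactly the 2MiB-aligned base that was requested, regardless of
the alignment of the size.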

-- 
Sincerely yours,
Mike.


Re: [PATCH v3] memblock: make memblock_find_in_range method private

2021-08-03 Thread Catalin Marinas
On Tue, Aug 03, 2021 at 09:42:18AM +0300, Mike Rapoport wrote:
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 8490ed2917ff..0bffd2d1854f 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -74,6 +74,7 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
>  static void __init reserve_crashkernel(void)
>  {
>   unsigned long long crash_base, crash_size;
> + unsigned long long crash_max = arm64_dma_phys_limit;
>   int ret;
>  
>   ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> @@ -84,33 +85,18 @@ static void __init reserve_crashkernel(void)
>  
>   crash_size = PAGE_ALIGN(crash_size);
>  
> - if (crash_base == 0) {
> - /* Current arm64 boot protocol requires 2MB alignment */
> - crash_base = memblock_find_in_range(0, arm64_dma_phys_limit,
> - crash_size, SZ_2M);
> - if (crash_base == 0) {
> - pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
> - crash_size);
> - return;
> - }
> - } else {
> - /* User specifies base address explicitly. */
> - if (!memblock_is_region_memory(crash_base, crash_size)) {
> - pr_warn("cannot reserve crashkernel: region is not 
> memory\n");
> - return;
> - }
> + /* User specifies base address explicitly. */
> + if (crash_base)
> + crash_max = crash_base + crash_size;
>  
> - if (memblock_is_region_reserved(crash_base, crash_size)) {
> - pr_warn("cannot reserve crashkernel: region overlaps 
> reserved memory\n");
> - return;
> - }
> -
> - if (!IS_ALIGNED(crash_base, SZ_2M)) {
> - pr_warn("cannot reserve crashkernel: base address is 
> not 2MB aligned\n");
> - return;
> - }
> + /* Current arm64 boot protocol requires 2MB alignment */
> + crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
> +crash_base, crash_max);
> + if (!crash_base) {
> + pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
> + crash_size);
> + return;
>   }
> - memblock_reserve(crash_base, crash_size);

We'll miss a bit on debug information provided to the user in case of a
wrong crash_base/size option on the command line. Not sure we care much,
though the alignment would probably be useful (maybe we document it
somewhere).

What I haven't checked is whether memblock_phys_alloc_range() aims to
get a 2MB aligned end (size) as well. If crash_size is not 2MB aligned,
crash_max wouldn't be either and the above could fail. We only care
about the crash_base to be aligned but the memblock_phys_alloc_range()
doc says that both the start and size would be aligned to this.

-- 
Catalin


RE: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule out

2021-08-03 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Will Deacon [mailto:w...@kernel.org]
> Sent: 03 August 2021 16:31
> To: Shameerali Kolothum Thodi 
> Cc: linux-arm-ker...@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> linux-ker...@vger.kernel.org; m...@kernel.org; catalin.mari...@arm.com;
> james.mo...@arm.com; julien.thierry.k...@gmail.com;
> suzuki.poul...@arm.com; jean-phili...@linaro.org;
> alexandru.eli...@arm.com; qper...@google.com; Linuxarm
> 
> Subject: Re: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU
> schedule out
> 
> On Tue, Aug 03, 2021 at 12:55:25PM +, Shameerali Kolothum Thodi
> wrote:
> > > > diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
> > > > index 5584e84aed95..5fd51f5445c1 100644
> > > > --- a/arch/arm64/kvm/vmid.c
> > > > +++ b/arch/arm64/kvm/vmid.c
> > > > @@ -116,6 +116,12 @@ static u64 new_vmid(struct kvm_vmid
> > > *kvm_vmid)
> > > > return idx2vmid(vmid) | generation;
> > > >  }
> > > >
> > > > +/* Call with preemption disabled */
> > > > +void kvm_arm_vmid_clear_active(void)
> > > > +{
> > > > +   atomic64_set(this_cpu_ptr(&active_vmids), 0);
> > > > +}
> > >
> > > I think this is very broken, as it will force everybody to take the
> > > slow-path when they see an active_vmid of 0.
> >
> > Yes. I have seen that happening in my test setup.
> 
> Why didn't you say so?!

Sorry. I wanted to get some performance numbers with and without this
patch to measure the impact, but didn't quite get the time to finish
that yet.
 
> 
> > > It also doesn't solve the issue I mentioned before, as an active_vmid of 0
> > > means that the reserved vmid is preserved.
> > >
> > > Needs more thought...
> >
> > How about we clear all the active_vmids in kvm_arch_free_vm() if it
> > matches the kvm_vmid->id? But we may have to hold the lock there.
> 
> I think we have to be really careful not to run into the "suspended
> animation" problem described in ae120d9edfe9 ("ARM: 7767/1: let the ASID
> allocator handle suspended animation") if we go down this road.


Ok. I will go through that.
 
> Maybe something along the lines of:
> 
> ROLLOVER
> 
>   * Take lock
>   * Inc generation
> => This will force everybody down the slow path
>   * Record active VMIDs
>   * Broadcast TLBI
> => Only active VMIDs can be dirty
> => Reserve active VMIDs and mark as allocated
> 
> VCPU SCHED IN
> 
>   * Set active VMID
>   * Check generation
>   * If mismatch then:
> * Take lock
> * Try to match a reserved VMID
> * If no reserved VMID, allocate new
> 
> VCPU SCHED OUT
> 
>   * Clear active VMID
> 
> but I'm not daft enough to think I got it right first time. I think it
> needs both implementing *and* modelling in TLA+ before we merge it!
> 

Ok. I need some time to digest the above first :).

On another note, how serious do you think the problem of extra
reservation of the VMID space is? Just wondering whether we can skip
this patch for now or not.

Thanks,
Shameer


Re: [PATCH v3 10/15] KVM: arm64: Guest exit handlers for nVHE hyp

2021-08-03 Thread Will Deacon
On Mon, Jul 19, 2021 at 05:03:41PM +0100, Fuad Tabba wrote:
> Add an array of pointers to handlers for various trap reasons in
> nVHE code.
> 
> The current code selects how to fixup a guest on exit based on a
> series of if/else statements. Future patches will also require
> different handling for guest exits. Create an array of handlers
> to consolidate them.
> 
> No functional change intended as the array isn't populated yet.
> 
> Acked-by: Will Deacon 
> Signed-off-by: Fuad Tabba 
> ---
>  arch/arm64/kvm/hyp/include/hyp/switch.h | 43 +
>  arch/arm64/kvm/hyp/nvhe/switch.c| 35 
>  2 files changed, 78 insertions(+)

Definitely keep my Ack on this, but Clang just chucked out a warning due to:

> diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h 
> b/arch/arm64/kvm/hyp/include/hyp/switch.h
> index a0e78a6027be..5a2b89b96c67 100644
> --- a/arch/arm64/kvm/hyp/include/hyp/switch.h
> +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
> @@ -409,6 +409,46 @@ static inline bool __hyp_handle_ptrauth(struct kvm_vcpu 
> *vcpu)
>   return true;
>  }
>  
> +typedef int (*exit_handle_fn)(struct kvm_vcpu *);
> +
> +exit_handle_fn kvm_get_nvhe_exit_handler(struct kvm_vcpu *vcpu);

and:

> diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c 
> b/arch/arm64/kvm/hyp/nvhe/switch.c
> index 86f3d6482935..36da423006bd 100644
> --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> @@ -158,6 +158,41 @@ static void __pmu_switch_to_host(struct kvm_cpu_context 
> *host_ctxt)
>   write_sysreg(pmu->events_host, pmcntenset_el0);
>  }
>  
> +typedef int (*exit_handle_fn)(struct kvm_vcpu *);

Which leads to:

arch/arm64/kvm/hyp/nvhe/switch.c:189:15: warning: redefinition of typedef 'exit_handle_fn' is a C11 feature [-Wtypedef-redefinition]
typedef int (*exit_handle_fn)(struct kvm_vcpu *);
  ^
./arch/arm64/kvm/hyp/include/hyp/switch.h:416:15: note: previous definition is here
typedef int (*exit_handle_fn)(struct kvm_vcpu *);
  ^
1 warning generated.

So I guess just pick your favourite?

Will


Re: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule out

2021-08-03 Thread Will Deacon
On Tue, Aug 03, 2021 at 12:55:25PM +, Shameerali Kolothum Thodi wrote:
> > > diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
> > > index 5584e84aed95..5fd51f5445c1 100644
> > > --- a/arch/arm64/kvm/vmid.c
> > > +++ b/arch/arm64/kvm/vmid.c
> > > @@ -116,6 +116,12 @@ static u64 new_vmid(struct kvm_vmid
> > *kvm_vmid)
> > >   return idx2vmid(vmid) | generation;
> > >  }
> > >
> > > +/* Call with preemption disabled */
> > > +void kvm_arm_vmid_clear_active(void)
> > > +{
> > > + atomic64_set(this_cpu_ptr(&active_vmids), 0);
> > > +}
> > 
> > I think this is very broken, as it will force everybody to take the
> > slow-path when they see an active_vmid of 0.
> 
> Yes. I have seen that happening in my test setup.

Why didn't you say so?!

> > It also doesn't solve the issue I mentioned before, as an active_vmid of 0
> > means that the reserved vmid is preserved.
> > 
> > Needs more thought...
> 
> How about we clear all the active_vmids in kvm_arch_free_vm() if it
> matches the kvm_vmid->id? But we may have to hold the lock there.

I think we have to be really careful not to run into the "suspended
animation" problem described in ae120d9edfe9 ("ARM: 7767/1: let the ASID
allocator handle suspended animation") if we go down this road.

Maybe something along the lines of:

ROLLOVER

  * Take lock
  * Inc generation
=> This will force everybody down the slow path
  * Record active VMIDs
  * Broadcast TLBI
=> Only active VMIDs can be dirty
=> Reserve active VMIDs and mark as allocated

VCPU SCHED IN

  * Set active VMID
  * Check generation
  * If mismatch then:
* Take lock
* Try to match a reserved VMID
* If no reserved VMID, allocate new

VCPU SCHED OUT

  * Clear active VMID

but I'm not daft enough to think I got it right first time. I think it
needs both implementing *and* modelling in TLA+ before we merge it!
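
FWIW, to make the sched-in leg a bit more concrete, I'm imagining
something vaguely like the below (pseudo-C, take it with a huge pinch of
salt -- names are the ones from your current series):

void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
{
	unsigned long flags;
	u64 vmid = atomic64_read(&kvm_vmid->id);

	/* Publish the active VMID for this CPU. */
	atomic64_set(this_cpu_ptr(&active_vmids), vmid);

	/* Fast path: our VMID is from the current generation. */
	if (vmid_gen_match(vmid))
		return;

	/* Slow path: a rollover happened since we last ran. */
	raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
	vmid = atomic64_read(&kvm_vmid->id);
	if (!vmid_gen_match(vmid)) {
		/* Match a VMID reserved at rollover time, else allocate. */
		vmid = new_vmid(kvm_vmid);
		atomic64_set(&kvm_vmid->id, vmid);
	}
	atomic64_set(this_cpu_ptr(&active_vmids), vmid);
	raw_spin_unlock_irqrestore(&cpu_vmid_lock, flags);
}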

Will


Re: [PATCH v3 17/21] KVM: arm64: Mark host bss and rodata section as shared

2021-08-03 Thread Fuad Tabba
Hi Quentin,

> > > +   ret = pkvm_create_mappings(__hyp_bss_end, __bss_stop, prot);
> >
> > nit: for clarity, I wonder if it might be good to create an alias of
> > __hyp_bss_end as __bss_start or something. When it's been moved here,
> > it sticks out a bit more and makes the reader wonder about the
> > significance of __hyp_bss_end.
>
> I understand what you mean, but I'm not sure this aliasing is really
> going to clarify things much. We have a comment in arm.c (see
> init_hyp_mode()) to explain exactly why we're doing this, so maybe it
> would be worth adding it here too. WDYT?

Not sure to be honest. Comments are good, until they're stale, and
replicating the comment increases the odds of that happening. No
strong opinion either way.

> > > +   if (ret)
> > > +   return ret;
> > > +
> > > return 0;
> > >  }
> > >
> > > @@ -148,6 +159,57 @@ static void hpool_put_page(void *addr)
> > > hyp_put_page(&hpool, addr);
> > >  }
> > >
> > > +static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
> > > +kvm_pte_t *ptep,
> > > +enum kvm_pgtable_walk_flags flag,
> > > +void * const arg)
> > > +{
> > > +   enum kvm_pgtable_prot prot;
> > > +   enum pkvm_page_state state;
> > > +   kvm_pte_t pte = *ptep;
> > > +   phys_addr_t phys;
> > > +
> > > +   if (!kvm_pte_valid(pte))
> > > +   return 0;
> > > +
> > > +   if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
> > > +   return -EINVAL;
> >
> > I know that it's not in scope here, but I'm wondering whether we
> > should be checking for KVM_PTE_TYPE_PAGE instead of the level. Maybe
>
> Well these would check different things no?
>
> > it would be good to have a helper somewhere for all these checks both
> > for clarity and to ensure that nothing has gone wrong with the pte.
>
> The reason I need this check is just to make sure the call to
> host_stage2_idmap_locked() further down is correct with a hardcoded
> PAGE_SIZE size. The alternative would be to not be lazy and actually
> compute the current granule size based on the level and use that, as
> that would make this code robust to using block mappings at EL2 stage-1
> in the future.
>
> And I'll fix this up for v4.

I get it now. Thanks!
/fuad


> Cheers,
> Quentin
>
> > > +
> > > +   phys = kvm_pte_to_phys(pte);
> > > +   if (!addr_is_memory(phys))
> > > +   return 0;
> > > +
> > > +   /*
> > > +* Adjust the host stage-2 mappings to match the ownership attributes
> > > +* configured in the hypervisor stage-1.
> > > +*/
> > > +   state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(pte));
> > > +   switch (state) {
> > > +   case PKVM_PAGE_OWNED:
> > > +   return host_stage2_set_owner_locked(phys, phys + 
> > > PAGE_SIZE, pkvm_hyp_id);
> > > +   case PKVM_PAGE_SHARED_OWNED:
> > > +   prot = pkvm_mkstate(KVM_PGTABLE_PROT_RWX, 
> > > PKVM_PAGE_SHARED_BORROWED);
> > > +   break;
> > > +   case PKVM_PAGE_SHARED_BORROWED:
> > > +   prot = pkvm_mkstate(KVM_PGTABLE_PROT_RWX, 
> > > PKVM_PAGE_SHARED_OWNED);
> > > +   break;
> > > +   default:
> > > +   return -EINVAL;
> > > +   }
> > > +
> > > +   return host_stage2_idmap_locked(phys, phys + PAGE_SIZE, prot);
> > > +}
> > > +
> > > +static int finalize_host_mappings(void)
> > > +{
> > > +   struct kvm_pgtable_walker walker = {
> > > +   .cb = finalize_host_mappings_walker,
> > > +   .flags  = KVM_PGTABLE_WALK_LEAF,
> > > +   };
> > > +
> > > +   return kvm_pgtable_walk(&pkvm_pgtable, 0, BIT(pkvm_pgtable.ia_bits), &walker);
> > > +}
> > > +
> > >  void __noreturn __pkvm_init_finalise(void)
> > >  {
> > > struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data);
> > > @@ -167,6 +229,10 @@ void __noreturn __pkvm_init_finalise(void)
> > > if (ret)
> > > goto out;
> > >
> > > +   ret = finalize_host_mappings();
> > > +   if (ret)
> > > +   goto out;
> > > +
> > > pkvm_pgtable_mm_ops = (struct kvm_pgtable_mm_ops) {
> > > .zalloc_page = hyp_zalloc_hyp_page,
> > > .phys_to_virt = hyp_phys_to_virt,
> > > --
> > > 2.32.0.432.gabb21c7263-goog
> > >
> >
> > Thanks,
> > /fuad


Re: [PATCH v3 19/21] KVM: arm64: Refactor protected nVHE stage-1 locking

2021-08-03 Thread Fuad Tabba
Hi Quentin,

> > > +int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot 
> > > prot)
> > > +{
> > > +   int ret;
> > > +
> > > +   hyp_spin_lock(&pkvm_pgd_lock);
> > > +   ret = pkvm_create_mappings_locked(from, to, prot);
> > > +   hyp_spin_unlock(&pkvm_pgd_lock);
> > > +
> > > +   return ret;
> > > +}
> > > +
> >
> > I'm wondering whether this patch should also refactor
> > __pkvm_create_mappings. It doesn't quite do the exact same thing and
> > has different parameters.
>
> Sorry, not sure I'm understanding your suggestion here. What do you
> think should be done to __pkvm_create_mappings?

Sorry, my comment wasn't very clear, and "refactor" is the wrong word.
I think it should probably be renamed, because __pkvm_create_mappings
isn't called by pkvm_create_mappings nor by
pkvm_create_mappings_locked. It also has different parameters and
behaves slightly differently.

Thanks,
/fuad

> Cheers,
> Quentin


Re: [PATCH v3 10/21] KVM: arm64: Enable forcing page-level stage-2 mappings

2021-08-03 Thread Fuad Tabba
Hi Quentin,

> > > +static bool stage2_block_mapping_allowed(u64 addr, u64 end, u32 level,
> > > +struct stage2_map_data *data)
> > > +{
> > > +   if (data->force_pte && (level < (KVM_PGTABLE_MAX_LEVELS - 1)))
> > > +   return false;
> >
> > I'm not sure I understand why checking the level is necessary. Can
> > there be block mapping at the last possible level?
>
> That's probably just a matter of naming, but this function is in fact
> called at every level, just like kvm_block_mapping_supported() was
> before. And we rely on it returning true at the last level, so I need to
> do that check here.
>
> Maybe renaming this stage2_leaf_mapping_allowed() would clarify?

Yes it would.

Thanks,
/fuad


Re: [PATCH v3 20/21] KVM: arm64: Restrict EL2 stage-1 changes in protected mode

2021-08-03 Thread Quentin Perret
On Tuesday 03 Aug 2021 at 10:22:03 (+0200), Fuad Tabba wrote:
> Hi Quentin,
> 
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c 
> > b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > index 0ccea58df7e0..1b67f562b6fc 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > @@ -338,6 +338,95 @@ static int host_stage2_idmap(u64 addr)
> > return ret;
> >  }
> >
> > +static inline bool check_prot(enum kvm_pgtable_prot prot,
> > + enum kvm_pgtable_prot required,
> > + enum kvm_pgtable_prot denied)
> > +{
> > +   return (prot & (required | denied)) == required;
> > +}
> > +
> > +int __pkvm_host_share_hyp(u64 pfn)
> > +{
> > +   phys_addr_t addr = hyp_pfn_to_phys(pfn);
> > +   enum kvm_pgtable_prot prot, cur;
> > +   void *virt = __hyp_va(addr);
> > +   enum pkvm_page_state state;
> > +   kvm_pte_t pte;
> > +   u32 level;
> > +   int ret;
> > +
> > +   if (!range_is_memory(addr, addr + PAGE_SIZE))
> > +   return -EINVAL;
> > +
> > +   hyp_spin_lock(&host_kvm.lock);
> > +   hyp_spin_lock(&pkvm_pgd_lock);
> > +
> > +   ret = kvm_pgtable_get_leaf(&host_kvm.pgt, addr, &pte, &level);
> > +   if (ret)
> > +   goto unlock;
> > +   if (!pte)
> > +   goto map_shared;
> 
> Should this check whether kvm_pte_valid as well, is that guaranteed to
> always be the case, or implicitly handled later?

Yep, this is implicitly handled by kvm_pgtable_stage2_pte_prot() which
is guaranteed not to return KVM_PGTABLE_PROT_RWX for an invalid mapping.

> > +
> > +   /*
> > +* Check attributes in the host stage-2 PTE. We need the page to be:
> > +*  - mapped RWX as we're sharing memory;
> > +*  - not borrowed, as that implies absence of ownership.
> > +* Otherwise, we can't let it go through
> > +*/
> > +   cur = kvm_pgtable_stage2_pte_prot(pte);
> > +   prot = pkvm_mkstate(0, PKVM_PAGE_SHARED_BORROWED);
> > +   if (!check_prot(cur, KVM_PGTABLE_PROT_RWX, prot)) {
> > +   ret = -EPERM;
> > +   goto unlock;
> > +   }
> > +
> > +   state = pkvm_getstate(cur);
> > +   if (state == PKVM_PAGE_OWNED)
> > +   goto map_shared;
> > +
> > +   /*
> > +* Tolerate double-sharing the same page, but this requires
> > +* cross-checking the hypervisor stage-1.
> > +*/
> > +   if (state != PKVM_PAGE_SHARED_OWNED) {
> > +   ret = -EPERM;
> > +   goto unlock;
> > +   }
> > +
> > +   ret = kvm_pgtable_get_leaf(&pkvm_pgtable, (u64)virt, &pte, &level);
> > +   if (ret)
> > +   goto unlock;
> > +
> > +   /*
> > +* If the page has been shared with the hypervisor, it must be
> > +* SHARED_BORROWED already.
> > +*/
> 
> This comment confused me at first, but then I realized it's referring
> to the page from the hyp's point of view. Could you add something to
> the comment to that effect?

Sure thing.

> It might also make it easier to follow if the variables could be
> annotated to specify whether cur, state, and prot are the host's or
> hyps (and not reuse the same one for both).
> 
> > +   cur = kvm_pgtable_hyp_pte_prot(pte);
> > +   prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_SHARED_BORROWED);
> > +   if (!check_prot(cur, prot, ~prot))
> > +   ret = EPERM;
> > +   goto unlock;
> > +
> > +map_shared:
> > +   /*
> > +* If the page is not yet shared, adjust mappings in both page-tables
> > +* while both locks are held.
> > +*/
> > +   prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_SHARED_BORROWED);
> > +   ret = pkvm_create_mappings_locked(virt, virt + PAGE_SIZE, prot);
> > +   BUG_ON(ret);
> > +
> > +   prot = pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_SHARED_OWNED);
> > +   ret = host_stage2_idmap_locked(addr, addr + PAGE_SIZE, prot);
> > +   BUG_ON(ret);
> > +
> > +unlock:
> > +   hyp_spin_unlock(&pkvm_pgd_lock);
> > +   hyp_spin_unlock(&host_kvm.lock);
> > +
> > +   return ret;
> > +}
> > +
> >  void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
> >  {
> > struct kvm_vcpu_fault_info fault;
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 0625bf2353c2..cbab146cda6a 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -259,10 +259,8 @@ static int __create_hyp_mappings(unsigned long start, 
> > unsigned long size,
> >  {
> > int err;
> >
> > -   if (!kvm_host_owns_hyp_mappings()) {
> > -   return kvm_call_hyp_nvhe(__pkvm_create_mappings,
> > -start, size, phys, prot);
> > -   }
> > +   if (WARN_ON(!kvm_host_owns_hyp_mappings()))
> > +   return -EINVAL;
> >
> > mutex_lock(_hyp_pgd_mutex);
> > err = kvm_pgtable_hyp_map(hyp_pgtable, start, size, phys, prot);
> 

Re: [PATCH v3 19/21] KVM: arm64: Refactor protected nVHE stage-1 locking

2021-08-03 Thread Quentin Perret
Hey Fuad,

On Tuesday 03 Aug 2021 at 07:31:03 (+0200), Fuad Tabba wrote:
> Hi Quentin,
> 
> On Thu, Jul 29, 2021 at 3:29 PM Quentin Perret  wrote:
> >
> > Refactor the hypervisor stage-1 locking in nVHE protected mode to expose
> > a new pkvm_create_mappings_locked() function. This will be used in later
> > patches to allow walking and changing the hypervisor stage-1 without
> > releasing the lock.
> >
> > Signed-off-by: Quentin Perret 
> > ---
> >  arch/arm64/kvm/hyp/include/nvhe/mm.h |  1 +
> >  arch/arm64/kvm/hyp/nvhe/mm.c | 18 --
> >  2 files changed, 17 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h 
> > b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > index 8ec3a5a7744b..c76d7136ed9b 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > @@ -23,6 +23,7 @@ int hyp_map_vectors(void);
> >  int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t 
> > back);
> >  int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
> >  int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
> > +int pkvm_create_mappings_locked(void *from, void *to, enum 
> > kvm_pgtable_prot prot);
> >  int __pkvm_create_mappings(unsigned long start, unsigned long size,
> >unsigned long phys, enum kvm_pgtable_prot prot);
> >  unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
> > index a8efdf0f9003..6fbe8e8030f6 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mm.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mm.c
> > @@ -67,13 +67,15 @@ unsigned long __pkvm_create_private_mapping(phys_addr_t 
> > phys, size_t size,
> > return addr;
> >  }
> >
> > -int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
> > +int pkvm_create_mappings_locked(void *from, void *to, enum 
> > kvm_pgtable_prot prot)
> >  {
> > unsigned long start = (unsigned long)from;
> > unsigned long end = (unsigned long)to;
> > unsigned long virt_addr;
> > phys_addr_t phys;
> >
> > +   hyp_assert_lock_held(&pkvm_pgd_lock);
> > +
> > start = start & PAGE_MASK;
> > end = PAGE_ALIGN(end);
> >
> > @@ -81,7 +83,8 @@ int pkvm_create_mappings(void *from, void *to, enum 
> > kvm_pgtable_prot prot)
> > int err;
> >
> > phys = hyp_virt_to_phys((void *)virt_addr);
> > -   err = __pkvm_create_mappings(virt_addr, PAGE_SIZE, phys, 
> > prot);
> > +   err = kvm_pgtable_hyp_map(&pkvm_pgtable, virt_addr, PAGE_SIZE,
> > + phys, prot);
> > if (err)
> > return err;
> > }
> > @@ -89,6 +92,17 @@ int pkvm_create_mappings(void *from, void *to, enum 
> > kvm_pgtable_prot prot)
> > return 0;
> >  }
> >
> > +int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
> > +{
> > +   int ret;
> > +
> > +   hyp_spin_lock(&pkvm_pgd_lock);
> > +   ret = pkvm_create_mappings_locked(from, to, prot);
> > +   hyp_spin_unlock(&pkvm_pgd_lock);
> > +
> > +   return ret;
> > +}
> > +
> 
> I'm wondering whether this patch should also refactor
> __pkvm_create_mappings. It doesn't quite do the exact same thing and
> has different parameters.

Sorry, not sure I'm understanding your suggestion here. What do you
think should be done to __pkvm_create_mappings?

Cheers,
Quentin


Re: [PATCH v3 17/21] KVM: arm64: Mark host bss and rodata section as shared

2021-08-03 Thread Quentin Perret
On Tuesday 03 Aug 2021 at 07:02:42 (+0200), Fuad Tabba wrote:
> Hi Quentin,
> 
> On Thu, Jul 29, 2021 at 3:29 PM Quentin Perret  wrote:
> >
> > As the hypervisor maps the host's .bss and .rodata sections in its
> > stage-1, make sure to tag them as shared in hyp and host page-tables.
> >
> > But since the hypervisor relies on the presence of these mappings, we
> > cannot let the host in complete control of the memory regions -- it
> > must not unshare or donate them to another entity for example. To
> > prevent this, let's transfer the ownership of those ranges to the
> > hypervisor itself, and share the pages back with the host.
> >
> > Signed-off-by: Quentin Perret 
> > ---
> >  arch/arm64/kvm/hyp/nvhe/setup.c | 82 +
> >  1 file changed, 74 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c 
> > b/arch/arm64/kvm/hyp/nvhe/setup.c
> > index 0b574d106519..7f557b264f62 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> > @@ -58,6 +58,7 @@ static int recreate_hyp_mappings(phys_addr_t phys, 
> > unsigned long size,
> >  {
> > void *start, *end, *virt = hyp_phys_to_virt(phys);
> > unsigned long pgt_size = hyp_s1_pgtable_pages() << PAGE_SHIFT;
> > +   enum kvm_pgtable_prot prot;
> > int ret, i;
> >
> > /* Recreate the hyp page-table using the early page allocator */
> > @@ -83,10 +84,6 @@ static int recreate_hyp_mappings(phys_addr_t phys, 
> > unsigned long size,
> > if (ret)
> > return ret;
> >
> > -   ret = pkvm_create_mappings(__start_rodata, __end_rodata, 
> > PAGE_HYP_RO);
> > -   if (ret)
> > -   return ret;
> > -
> > ret = pkvm_create_mappings(__hyp_rodata_start, __hyp_rodata_end, 
> > PAGE_HYP_RO);
> > if (ret)
> > return ret;
> > @@ -95,10 +92,6 @@ static int recreate_hyp_mappings(phys_addr_t phys, 
> > unsigned long size,
> > if (ret)
> > return ret;
> >
> > -   ret = pkvm_create_mappings(__hyp_bss_end, __bss_stop, PAGE_HYP_RO);
> > -   if (ret)
> > -   return ret;
> > -
> > ret = pkvm_create_mappings(virt, virt + size, PAGE_HYP);
> > if (ret)
> > return ret;
> > @@ -117,6 +110,24 @@ static int recreate_hyp_mappings(phys_addr_t phys, 
> > unsigned long size,
> > return ret;
> > }
> >
> > +   /*
> > +* Map the host's .bss and .rodata sections RO in the hypervisor, 
> > but
> > +* transfer the ownerhsip from the host to the hypervisor itself to
> > +* make sure it can't be donated or shared with another entity.
> 
> nit: ownerhsip -> ownership
> 
> > +*
> > +* The ownership transtion requires matching changes in the host
> 
> nit: transtion -> transition
> 
> > +* stage-2. This will done later (see finalize_host_mappings()) 
> > once the
> 
> nit: will done -> will be done

Urgh, I clearly went too fast writing this, thanks!

> > +* hyp_vmemmap is addressable.
> > +*/
> > +   prot = pkvm_mkstate(PAGE_HYP_RO, PKVM_PAGE_SHARED_OWNED);
> > +   ret = pkvm_create_mappings(__start_rodata, __end_rodata, prot);
> > +   if (ret)
> > +   return ret;
> > +
> > +   ret = pkvm_create_mappings(__hyp_bss_end, __bss_stop, prot);
> 
> nit: for clarity, I wonder if it might be good to create an alias of
> __hyp_bss_end as __bss_start or something. When it's been moved here,
> it sticks out a bit more and makes the reader wonder about the
> significance of __hyp_bss_end.

I understand what you mean, but I'm not sure this aliasing is really
going to clarify things much. We have a comment in arm.c (see
init_hyp_mode()) to explain exactly why we're doing this, so maybe it
would be worth adding it here too. WDYT?

> > +   if (ret)
> > +   return ret;
> > +
> > return 0;
> >  }
> >
> > @@ -148,6 +159,57 @@ static void hpool_put_page(void *addr)
> > > hyp_put_page(&hpool, addr);
> >  }
> >
> > +static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
> > +kvm_pte_t *ptep,
> > +enum kvm_pgtable_walk_flags flag,
> > +void * const arg)
> > +{
> > +   enum kvm_pgtable_prot prot;
> > +   enum pkvm_page_state state;
> > +   kvm_pte_t pte = *ptep;
> > +   phys_addr_t phys;
> > +
> > +   if (!kvm_pte_valid(pte))
> > +   return 0;
> > +
> > +   if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
> > +   return -EINVAL;
> 
> I know that it's not in scope here, but I'm wondering whether we
> should be checking for KVM_PTE_TYPE_PAGE instead of the level. Maybe

Well these would check different things no?

> it would be good to have a helper somewhere for all these checks both
> for clarity and to ensure that nothing has gone wrong 

Re: [PATCH v3 16/21] KVM: arm64: Enable retrieving protections attributes of PTEs

2021-08-03 Thread Quentin Perret
On Monday 02 Aug 2021 at 16:52:49 (+0200), Fuad Tabba wrote:
> Hi Quentin,
> 
> On Thu, Jul 29, 2021 at 3:29 PM Quentin Perret  wrote:
> >
> > Introduce helper functions in the KVM stage-2 and stage-1 page-table
> > manipulation library allowing to retrieve the enum kvm_pgtable_prot of a
> > PTE. This will be useful to implement custom walkers outside of
> > pgtable.c.
> >
> > Signed-off-by: Quentin Perret 
> > ---
> >  arch/arm64/include/asm/kvm_pgtable.h | 20 +++
> >  arch/arm64/kvm/hyp/pgtable.c | 37 
> >  2 files changed, 57 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/kvm_pgtable.h 
> > b/arch/arm64/include/asm/kvm_pgtable.h
> > index d5ca9b6ce241..7ff9f52239ba 100644
> > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > @@ -505,4 +505,24 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 
> > addr, u64 size,
> >   */
> >  int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
> >  kvm_pte_t *ptep, u32 *level);
> > +
> > +/**
> > + * kvm_pgtable_stage2_pte_prot() - Retrieve the protection attributes of a
> > + *stage-2 Page-Table Entry.
> > + * @pte:   Page-table entry
> > + *
> > + * Return: protection attributes of the page-table entry in the enum
> > + *kvm_pgtable_prot format.
> > + */
> > +enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte);
> > +
> > +/**
> > + * kvm_pgtable_hyp_pte_prot() - Retrieve the protection attributes of a 
> > stage-1
> > + * Page-Table Entry.
> > + * @pte:   Page-table entry
> > + *
> > + * Return: protection attributes of the page-table entry in the enum
> > + *kvm_pgtable_prot format.
> > + */
> > +enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte);
> >  #endif /* __ARM64_KVM_PGTABLE_H__ */
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 1915489bb127..a6eda8f23cb6 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -363,6 +363,26 @@ static int hyp_set_prot_attr(enum kvm_pgtable_prot 
> > prot, kvm_pte_t *ptep)
> > return 0;
> >  }
> >
> > +enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
> > +{
> > +   enum kvm_pgtable_prot prot = pte & KVM_PTE_LEAF_ATTR_HI_SW;
> > +   u32 ap;
> > +
> > +   if (!kvm_pte_valid(pte))
> > +   return prot;
> > +
> > +   if (!(pte & KVM_PTE_LEAF_ATTR_HI_S1_XN))
> > +   prot |= KVM_PGTABLE_PROT_X;
> > +
> > +   ap = FIELD_GET(KVM_PTE_LEAF_ATTR_LO_S1_AP, pte);
> > +   if (ap == KVM_PTE_LEAF_ATTR_LO_S1_AP_RO)
> > +   prot |= KVM_PGTABLE_PROT_R;
> > +   else if (ap == KVM_PTE_LEAF_ATTR_LO_S1_AP_RW)
> > +   prot |= KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W;
> 
> nit: why not use the freshly minted KVM_PGTABLE_PROT_RW?

No good reason, I'll fix that up, thanks!
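
i.e. that branch just becomes:

	else if (ap == KVM_PTE_LEAF_ATTR_LO_S1_AP_RW)
		prot |= KVM_PGTABLE_PROT_RW;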


Re: [PATCH v3 15/21] KVM: arm64: Introduce addr_is_memory()

2021-08-03 Thread Quentin Perret
On Monday 02 Aug 2021 at 16:52:31 (+0200), Fuad Tabba wrote:
> Hi Quentin.
> 
> On Thu, Jul 29, 2021 at 3:28 PM Quentin Perret  wrote:
> >
> > Introduce a helper usable in nVHE protected mode to check whether a
> > physical address is in a RAM region or not.
> >
> > Signed-off-by: Quentin Perret 
> > ---
> >  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
> >  arch/arm64/kvm/hyp/nvhe/mem_protect.c | 7 +++
> >  2 files changed, 8 insertions(+)
> >
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h 
> > b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > index cc86598654b9..5968fbbb3514 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > @@ -51,6 +51,7 @@ extern const u8 pkvm_hyp_id;
> >  int __pkvm_prot_finalize(void);
> >  int __pkvm_mark_hyp(phys_addr_t start, phys_addr_t end);
> >
> > +bool addr_is_memory(phys_addr_t phys);
> 
> I'm just wondering about the naming of the function. I understand what
> you're trying to achieve with it, but an address without a unit that
> conveys size or type seems to be missing something. Would

Well it does have a type no? I was hoping this would make it clear what
it actually does.

> memregion_addr_is_memory or something like that be a better
> description, since it is what find_mem_range finds?

I think the callers shouldn't need to care about the implementation
details though. This just answers the question 'is this physical
address in a RAM region or not?'. And I could actually imagine that we
would change the implementation some day to avoid the binary search, but
the users probably don't need to care.

Thanks,
Quentin


Re: [PATCH v3 13/21] KVM: arm64: Expose host stage-2 manipulation helpers

2021-08-03 Thread Quentin Perret
On Monday 02 Aug 2021 at 13:13:20 (+0200), Fuad Tabba wrote:
> Hi Quentin,
> 
> On Thu, Jul 29, 2021 at 3:28 PM Quentin Perret  wrote:
> >
> > We will need to manipulate the host stage-2 page-table from outside
> > mem_protect.c soon. Introduce two functions allowing this, and make
> > them usable to users of mem_protect.h.
> >
> > Signed-off-by: Quentin Perret 
> > ---
> >  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 ++
> >  arch/arm64/kvm/hyp/nvhe/mem_protect.c | 17 -
> >  2 files changed, 18 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h 
> > b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > index ae355bfd8c01..47c2a0c51612 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > @@ -49,6 +49,8 @@ extern struct host_kvm host_kvm;
> >  int __pkvm_prot_finalize(void);
> >  int __pkvm_mark_hyp(phys_addr_t start, phys_addr_t end);
> >
> > +int host_stage2_idmap_locked(u64 start, u64 end, enum kvm_pgtable_prot 
> > prot);
> > +int host_stage2_set_owner_locked(u64 start, u64 end, u8 owner_id);
> >  int kvm_host_prepare_stage2(void *pgt_pool_base);
> >  void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
> >
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c 
> > b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > index 70c57d2c3024..a7f6134789e0 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > @@ -272,6 +272,21 @@ static int host_stage2_adjust_range(u64 addr, struct 
> > kvm_mem_range *range)
> > return 0;
> >  }
> >
> > +int host_stage2_idmap_locked(u64 start, u64 end, enum kvm_pgtable_prot 
> > prot)
> > +{
> > +   hyp_assert_lock_held(&host_kvm.lock);
> > +
> > +   return host_stage2_try(__host_stage2_idmap, start, end, prot);
> > +}
> > +
> > +int host_stage2_set_owner_locked(u64 start, u64 end, u8 owner_id)
> > +{
> > +   hyp_assert_lock_held(&host_kvm.lock);
> > +
> > +   return host_stage2_try(kvm_pgtable_stage2_set_owner, &host_kvm.pgt,
> > +  start, end - start, &host_s2_pool, owner_id);
> > +}
> 
> This is a potential issue elsewhere as well, but all functions in
> kvm_pgtable.h, including kvm_pgtable_stage2_set_owner, specify an
> address range via address and size. The two you have introduced here
> take a start and an end. I'm not sure if making these two consistent
> with the ones in kvm_pgtable.h would be good, or would just complicate
> things in other places.

Good point, and it looks like specifying these two with start-size
parameters would simplify the callers a tiny bit as well, so I'll fold
that in v4.
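
i.e. something along these lines (untested), keeping __host_stage2_idmap()
unchanged underneath:

int host_stage2_idmap_locked(phys_addr_t addr, u64 size,
			     enum kvm_pgtable_prot prot)
{
	hyp_assert_lock_held(&host_kvm.lock);

	return host_stage2_try(__host_stage2_idmap, addr, addr + size, prot);
}

int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
{
	hyp_assert_lock_held(&host_kvm.lock);

	return host_stage2_try(kvm_pgtable_stage2_set_owner, &host_kvm.pgt,
			       addr, size, &host_s2_pool, owner_id);
}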

Thanks,
Quentin


Re: [PATCH v3 10/21] KVM: arm64: Enable forcing page-level stage-2 mappings

2021-08-03 Thread Quentin Perret
Hi Fuad,

On Monday 02 Aug 2021 at 11:49:28 (+0200), Fuad Tabba wrote:
> On Thu, Jul 29, 2021 at 3:28 PM Quentin Perret  wrote:
> >
> > Much of the stage-2 manipulation logic relies on being able to destroy
> > block mappings if e.g. installing a smaller mapping in the range. The
> > rationale for this behaviour is that stage-2 mappings can always be
> > re-created lazily. However, this gets more complicated when the stage-2
> > page-table is used to store metadata about the underlying pages. In such
> > cases, destroying a block mapping may lead to losing part of the state,
> > and confuse the user of those metadata (such as the hypervisor in nVHE
> > protected mode).
> >
> > To avoid this, introduce a callback function in the pgtable struct which
> > is called during all map operations to determine whether the mappings
> > can use blocks, or should be forced to page granularity. This is used by
> > the hypervisor when creating the host stage-2 to force page-level
> > mappings when using non-default protection attributes.
> >
> > Signed-off-by: Quentin Perret 
> > ---
> >  arch/arm64/include/asm/kvm_pgtable.h  | 65 ---
> >  arch/arm64/kvm/hyp/nvhe/mem_protect.c | 30 +++--
> >  arch/arm64/kvm/hyp/pgtable.c  | 29 +---
> >  3 files changed, 91 insertions(+), 33 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_pgtable.h 
> > b/arch/arm64/include/asm/kvm_pgtable.h
> > index 83c5c97d9eac..ba7dcade2798 100644
> > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > @@ -115,25 +115,6 @@ enum kvm_pgtable_stage2_flags {
> > KVM_PGTABLE_S2_IDMAP= BIT(1),
> >  };
> >
> > -/**
> > - * struct kvm_pgtable - KVM page-table.
> > - * @ia_bits:   Maximum input address size, in bits.
> > - * @start_level:   Level at which the page-table walk starts.
> > - * @pgd:   Pointer to the first top-level entry of the 
> > page-table.
> > - * @mm_ops:Memory management callbacks.
> > - * @mmu:   Stage-2 KVM MMU struct. Unused for stage-1 
> > page-tables.
> > - */
> > -struct kvm_pgtable {
> > -   u32 ia_bits;
> > -   u32 start_level;
> > -   kvm_pte_t   *pgd;
> > -   struct kvm_pgtable_mm_ops   *mm_ops;
> > -
> > -   /* Stage-2 only */
> > -   struct kvm_s2_mmu   *mmu;
> > -   enum kvm_pgtable_stage2_flags   flags;
> > -};
> > -
> >  /**
> >   * enum kvm_pgtable_prot - Page-table permissions and attributes.
> >   * @KVM_PGTABLE_PROT_X:Execute permission.
> > @@ -149,11 +130,41 @@ enum kvm_pgtable_prot {
> > KVM_PGTABLE_PROT_DEVICE = BIT(3),
> >  };
> >
> > -#define PAGE_HYP   (KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W)
> > +#define KVM_PGTABLE_PROT_RW(KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W)
> > +#define KVM_PGTABLE_PROT_RWX   (KVM_PGTABLE_PROT_RW | KVM_PGTABLE_PROT_X)
> > +
> > +#define PAGE_HYP   KVM_PGTABLE_PROT_RW
> >  #define PAGE_HYP_EXEC  (KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_X)
> >  #define PAGE_HYP_RO(KVM_PGTABLE_PROT_R)
> >  #define PAGE_HYP_DEVICE(PAGE_HYP | KVM_PGTABLE_PROT_DEVICE)
> 
> I wonder if it would be useful to add a couple of other aliases for
> default memory and default mmio protections, e.g.,
> 
> #define  KVM_PGTABLE_PROT_MEM KVM_PGTABLE_PROT_RWX
> #define  KVM_PGTABLE_PROT_MMIO KVM_PGTABLE_PROT_RW
> 
> I think that using these below, e.g., host_stage2_force_pte_cb(),
> might make it clearer and answer comments you had in earlier patches
> about why "RWX" for memory.

Sure, I can add something. I'll probably call them something other than
KVM_PGTABLE_PROT_{MEM,MMIO} though, just to make it clear this is all
specific to the host stage-2 stuff and not a general requirement of the
pgtable code to map things like this.
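
But yes, with something like that in place the host's callback ends up
being little more than the following (modulo the final name of the
alias):

static bool host_stage2_force_pte_cb(u64 addr, u64 end,
				     enum kvm_pgtable_prot prot)
{
	/*
	 * Only the default RWX memory attributes may use block mappings;
	 * anything else is forced to page granularity so the extra state
	 * annotated in the PTEs can't be lost when a block is broken up.
	 */
	return prot != KVM_PGTABLE_PROT_RWX;
}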

> >
> > +typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
> > +  enum kvm_pgtable_prot prot);
> > +
> > +/**
> > + * struct kvm_pgtable - KVM page-table.
> > + * @ia_bits:   Maximum input address size, in bits.
> > + * @start_level:   Level at which the page-table walk starts.
> > + * @pgd:   Pointer to the first top-level entry of the 
> > page-table.
> > + * @mm_ops:Memory management callbacks.
> > + * @mmu:   Stage-2 KVM MMU struct. Unused for stage-1 
> > page-tables.
> > + * @flags: Stage-2 page-table flags.
> > + * @force_pte_cb:  Callback function used during map operations to 
> > decide
> > + * whether block mappings can be used to map the given 
> > IPA
> > + * range.
> > + */
> 
> nit: I think it might be clearer (and probably not longer) to rephrase
> to describe in terms of the return value 

RE: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule out

2021-08-03 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Will Deacon [mailto:w...@kernel.org]
> Sent: 03 August 2021 12:41
> To: Shameerali Kolothum Thodi 
> Cc: linux-arm-ker...@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> linux-ker...@vger.kernel.org; m...@kernel.org; catalin.mari...@arm.com;
> james.mo...@arm.com; julien.thierry.k...@gmail.com;
> suzuki.poul...@arm.com; jean-phili...@linaro.org;
> alexandru.eli...@arm.com; qper...@google.com; Linuxarm
> 
> Subject: Re: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU
> schedule out
> 
> On Thu, Jul 29, 2021 at 11:40:09AM +0100, Shameer Kolothum wrote:
> > Like ASID allocator, we copy the active_vmids into the
> > reserved_vmids on a rollover. But it's unlikely that
> > every CPU will have a vCPU as current task and we may
> > end up unnecessarily reserving the VMID space.
> >
> > Hence, clear active_vmids when scheduling out a vCPU.
> >
> > Suggested-by: Will Deacon 
> > Signed-off-by: Shameer Kolothum
> 
> > ---
> >  arch/arm64/include/asm/kvm_host.h | 1 +
> >  arch/arm64/kvm/arm.c  | 1 +
> >  arch/arm64/kvm/vmid.c | 6 ++
> >  3 files changed, 8 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h
> b/arch/arm64/include/asm/kvm_host.h
> > index bb993bce1363..d93141cb8d16 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -687,6 +687,7 @@ extern unsigned int kvm_arm_vmid_bits;
> >  int kvm_arm_vmid_alloc_init(void);
> >  void kvm_arm_vmid_alloc_free(void);
> >  void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid);
> > +void kvm_arm_vmid_clear_active(void);
> >
> >  static inline void kvm_arm_pvtime_vcpu_init(struct kvm_vcpu_arch
> *vcpu_arch)
> >  {
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 077e55a511a9..b134a1b89c84 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -435,6 +435,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> > kvm_timer_vcpu_put(vcpu);
> > kvm_vgic_put(vcpu);
> > kvm_vcpu_pmu_restore_host(vcpu);
> > +   kvm_arm_vmid_clear_active();
> >
> > vcpu->cpu = -1;
> >  }
> > diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
> > index 5584e84aed95..5fd51f5445c1 100644
> > --- a/arch/arm64/kvm/vmid.c
> > +++ b/arch/arm64/kvm/vmid.c
> > @@ -116,6 +116,12 @@ static u64 new_vmid(struct kvm_vmid
> *kvm_vmid)
> > return idx2vmid(vmid) | generation;
> >  }
> >
> > +/* Call with preemption disabled */
> > +void kvm_arm_vmid_clear_active(void)
> > +{
> > > +   atomic64_set(this_cpu_ptr(&active_vmids), 0);
> > +}
> 
> I think this is very broken, as it will force everybody to take the
> slow-path when they see an active_vmid of 0.

Yes. I have seen that happening in my test setup.

> It also doesn't solve the issue I mentioned before, as an active_vmid of 0
> means that the reserved vmid is preserved.
> 
> Needs more thought...

How about we clear all the active_vmids in kvm_arch_free_vm() if it
matches the kvm_vmid->id? But we may have to hold the lock there.

Thanks,
Shameer
 



Re: [PATCH v3 20/21] KVM: arm64: Restrict EL2 stage-1 changes in protected mode

2021-08-03 Thread Fuad Tabba
Hi Quentin,

> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c 
> b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 0ccea58df7e0..1b67f562b6fc 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -338,6 +338,95 @@ static int host_stage2_idmap(u64 addr)
> return ret;
>  }
>
> +static inline bool check_prot(enum kvm_pgtable_prot prot,
> + enum kvm_pgtable_prot required,
> + enum kvm_pgtable_prot denied)
> +{
> +   return (prot & (required | denied)) == required;
> +}
> +
> +int __pkvm_host_share_hyp(u64 pfn)
> +{
> +   phys_addr_t addr = hyp_pfn_to_phys(pfn);
> +   enum kvm_pgtable_prot prot, cur;
> +   void *virt = __hyp_va(addr);
> +   enum pkvm_page_state state;
> +   kvm_pte_t pte;
> +   u32 level;
> +   int ret;
> +
> +   if (!range_is_memory(addr, addr + PAGE_SIZE))
> +   return -EINVAL;
> +
> +   hyp_spin_lock(&host_kvm.lock);
> +   hyp_spin_lock(&pkvm_pgd_lock);
> +
> +   ret = kvm_pgtable_get_leaf(&host_kvm.pgt, addr, &pte, &level);
> +   if (ret)
> +   goto unlock;
> +   if (!pte)
> +   goto map_shared;

Should this check whether kvm_pte_valid as well, is that guaranteed to
always be the case, or implicitly handled later?

> +
> +   /*
> +* Check attributes in the host stage-2 PTE. We need the page to be:
> +*  - mapped RWX as we're sharing memory;
> +*  - not borrowed, as that implies absence of ownership.
> +* Otherwise, we can't let it go through
> +*/
> +   cur = kvm_pgtable_stage2_pte_prot(pte);
> +   prot = pkvm_mkstate(0, PKVM_PAGE_SHARED_BORROWED);
> +   if (!check_prot(cur, KVM_PGTABLE_PROT_RWX, prot)) {
> +   ret = -EPERM;
> +   goto unlock;
> +   }
> +
> +   state = pkvm_getstate(cur);
> +   if (state == PKVM_PAGE_OWNED)
> +   goto map_shared;
> +
> +   /*
> +* Tolerate double-sharing the same page, but this requires
> +* cross-checking the hypervisor stage-1.
> +*/
> +   if (state != PKVM_PAGE_SHARED_OWNED) {
> +   ret = -EPERM;
> +   goto unlock;
> +   }
> +
> +   ret = kvm_pgtable_get_leaf(&pkvm_pgtable, (u64)virt, &pte, &level);
> +   if (ret)
> +   goto unlock;
> +
> +   /*
> +* If the page has been shared with the hypervisor, it must be
> +* SHARED_BORROWED already.
> +*/

This comment confused me at first, but then I realized it's referring
to the page from the hyp's point of view. Could you add something to
the comment to that effect?

It might also make it easier to follow if the variables could be
annotated to specify whether cur, state, and prot are the host's or
hyps (and not reuse the same one for both).

> +   cur = kvm_pgtable_hyp_pte_prot(pte);
> +   prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_SHARED_BORROWED);
> +   if (!check_prot(cur, prot, ~prot))
> +   ret = EPERM;
> +   goto unlock;
> +
> +map_shared:
> +   /*
> +* If the page is not yet shared, adjust mappings in both page-tables
> +* while both locks are held.
> +*/
> +   prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_SHARED_BORROWED);
> +   ret = pkvm_create_mappings_locked(virt, virt + PAGE_SIZE, prot);
> +   BUG_ON(ret);
> +
> +   prot = pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_SHARED_OWNED);
> +   ret = host_stage2_idmap_locked(addr, addr + PAGE_SIZE, prot);
> +   BUG_ON(ret);
> +
> +unlock:
> +   hyp_spin_unlock(&pkvm_pgd_lock);
> +   hyp_spin_unlock(&host_kvm.lock);
> +
> +   return ret;
> +}
> +
>  void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
>  {
> struct kvm_vcpu_fault_info fault;
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 0625bf2353c2..cbab146cda6a 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -259,10 +259,8 @@ static int __create_hyp_mappings(unsigned long start, 
> unsigned long size,
>  {
> int err;
>
> -   if (!kvm_host_owns_hyp_mappings()) {
> -   return kvm_call_hyp_nvhe(__pkvm_create_mappings,
> -start, size, phys, prot);
> -   }
> +   if (WARN_ON(!kvm_host_owns_hyp_mappings()))
> +   return -EINVAL;
>
> mutex_lock(_hyp_pgd_mutex);
> err = kvm_pgtable_hyp_map(hyp_pgtable, start, size, phys, prot);
> @@ -282,6 +280,21 @@ static phys_addr_t kvm_kaddr_to_phys(void *kaddr)
> }
>  }
>
> +static int pkvm_share_hyp(phys_addr_t start, phys_addr_t end)
> +{
> +   phys_addr_t addr;
> +   int ret;
> +
> +   for (addr = ALIGN_DOWN(start, PAGE_SIZE); addr < end; addr += 
> PAGE_SIZE) {
> +   ret = kvm_call_hyp_nvhe(__pkvm_host_share_hyp,
> +   __phys_to_pfn(addr));

I guess we don't expect this to happen often, but I wonder 

RE: [PATCH v3 1/4] KVM: arm64: Introduce a new VMID allocator for KVM

2021-08-03 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Will Deacon [mailto:w...@kernel.org]
> Sent: 03 August 2021 12:39
> To: Shameerali Kolothum Thodi 
> Cc: linux-arm-ker...@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> linux-ker...@vger.kernel.org; m...@kernel.org; catalin.mari...@arm.com;
> james.mo...@arm.com; julien.thierry.k...@gmail.com;
> suzuki.poul...@arm.com; jean-phili...@linaro.org;
> alexandru.eli...@arm.com; qper...@google.com; Linuxarm
> 
> Subject: Re: [PATCH v3 1/4] KVM: arm64: Introduce a new VMID allocator for
> KVM
> 
> On Thu, Jul 29, 2021 at 11:40:06AM +0100, Shameer Kolothum wrote:
> > A new VMID allocator for arm64 KVM use. This is based on
> > arm64 ASID allocator algorithm.
> >
> > One major deviation from the ASID allocator is the way we
> > flush the context. Unlike ASID allocator, we expect less
> > frequent rollover in the case of VMIDs. Hence, instead of
> > marking the CPU as flush_pending and issuing a local context
> > invalidation on the next context switch, we broadcast TLB
> > flush + I-cache invalidation over the inner shareable domain
> > on rollover.
> >
> > Signed-off-by: Shameer Kolothum
> 
> > ---
> 
> [...]
> 
> > +void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
> > +{
> > +   unsigned long flags;
> > +   unsigned int cpu;
> > +   u64 vmid, old_active_vmid;
> > +
> > +   vmid = atomic64_read(&kvm_vmid->id);
> > +
> > +   /*
> > +* Please refer comments in check_and_switch_context() in
> > +* arch/arm64/mm/context.c.
> > +*/
> > +   old_active_vmid = atomic64_read(this_cpu_ptr(&active_vmids));
> > +   if (old_active_vmid && vmid_gen_match(vmid) &&
> > +   atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_vmids),
> > +old_active_vmid, vmid))
> > +   return;
> > +
> > +   raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
> > +
> > +   /* Check that our VMID belongs to the current generation. */
> > +   vmid = atomic64_read(&kvm_vmid->id);
> > +   if (!vmid_gen_match(vmid)) {
> > +   vmid = new_vmid(kvm_vmid);
> > +   atomic64_set(&kvm_vmid->id, vmid);
> 
> new_vmid() can just set kvm_vmid->id directly

Ok.
> 
> > +   }
> > +
> > +   cpu = smp_processor_id();
> 
> Why?

Left over from the previous version. Forgot to remove it,
as we don't have the tlb_flush_pending check anymore.

Thanks,
Shameer

> 
> Will


Re: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule out

2021-08-03 Thread Will Deacon
On Thu, Jul 29, 2021 at 11:40:09AM +0100, Shameer Kolothum wrote:
> Like ASID allocator, we copy the active_vmids into the
> reserved_vmids on a rollover. But it's unlikely that
> every CPU will have a vCPU as current task and we may
> end up unnecessarily reserving the VMID space.
> 
> Hence, clear active_vmids when scheduling out a vCPU.
> 
> Suggested-by: Will Deacon 
> Signed-off-by: Shameer Kolothum 
> ---
>  arch/arm64/include/asm/kvm_host.h | 1 +
>  arch/arm64/kvm/arm.c  | 1 +
>  arch/arm64/kvm/vmid.c | 6 ++
>  3 files changed, 8 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index bb993bce1363..d93141cb8d16 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -687,6 +687,7 @@ extern unsigned int kvm_arm_vmid_bits;
>  int kvm_arm_vmid_alloc_init(void);
>  void kvm_arm_vmid_alloc_free(void);
>  void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid);
> +void kvm_arm_vmid_clear_active(void);
>  
>  static inline void kvm_arm_pvtime_vcpu_init(struct kvm_vcpu_arch *vcpu_arch)
>  {
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 077e55a511a9..b134a1b89c84 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -435,6 +435,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>   kvm_timer_vcpu_put(vcpu);
>   kvm_vgic_put(vcpu);
>   kvm_vcpu_pmu_restore_host(vcpu);
> + kvm_arm_vmid_clear_active();
>  
>   vcpu->cpu = -1;
>  }
> diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
> index 5584e84aed95..5fd51f5445c1 100644
> --- a/arch/arm64/kvm/vmid.c
> +++ b/arch/arm64/kvm/vmid.c
> @@ -116,6 +116,12 @@ static u64 new_vmid(struct kvm_vmid *kvm_vmid)
>   return idx2vmid(vmid) | generation;
>  }
>  
> +/* Call with preemption disabled */
> +void kvm_arm_vmid_clear_active(void)
> +{
> + atomic64_set(this_cpu_ptr(&active_vmids), 0);
> +}

I think this is very broken, as it will force everybody to take the
slow-path when they see an active_vmid of 0.

It also doesn't solve the issue I mentioned before, as an active_vmid of 0
means that the reserved vmid is preserved.

Needs more thought...
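
To make the first point concrete, the fast path in kvm_arm_vmid_update()
(patch 1/4) only attempts the lockless update when the per-CPU value is
non-zero, so after being scheduled out a vCPU always falls through to
the lock on its next update (code as quoted in that patch, annotations
mine):

    old_active_vmid = atomic64_read(this_cpu_ptr(&active_vmids));
    if (old_active_vmid && vmid_gen_match(vmid) &&     /* 0 => always false  */
        atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_vmids),
                                 old_active_vmid, vmid))
            return;                                    /* lockless fast path */

    raw_spin_lock_irqsave(&cpu_vmid_lock, flags);      /* everyone else here */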

Will


Re: [PATCH v3 1/4] KVM: arm64: Introduce a new VMID allocator for KVM

2021-08-03 Thread Will Deacon
On Thu, Jul 29, 2021 at 11:40:06AM +0100, Shameer Kolothum wrote:
> A new VMID allocator for arm64 KVM use. This is based on
> arm64 ASID allocator algorithm.
> 
> One major deviation from the ASID allocator is the way we
> flush the context. Unlike ASID allocator, we expect less
> frequent rollover in the case of VMIDs. Hence, instead of
> marking the CPU as flush_pending and issuing a local context
> invalidation on the next context switch, we broadcast TLB
> flush + I-cache invalidation over the inner shareable domain
> on rollover.
> 
> Signed-off-by: Shameer Kolothum 
> ---

[...]

> +void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
> +{
> + unsigned long flags;
> + unsigned int cpu;
> + u64 vmid, old_active_vmid;
> +
> + vmid = atomic64_read(&kvm_vmid->id);
> +
> + /*
> +  * Please refer comments in check_and_switch_context() in
> +  * arch/arm64/mm/context.c.
> +  */
> + old_active_vmid = atomic64_read(this_cpu_ptr(&active_vmids));
> + if (old_active_vmid && vmid_gen_match(vmid) &&
> + atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_vmids),
> +  old_active_vmid, vmid))
> + return;
> +
> + raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
> +
> + /* Check that our VMID belongs to the current generation. */
> + vmid = atomic64_read(&kvm_vmid->id);
> + if (!vmid_gen_match(vmid)) {
> + vmid = new_vmid(kvm_vmid);
> + atomic64_set(&kvm_vmid->id, vmid);

new_vmid() can just set kvm_vmid->id directly

> + }
> +
> + cpu = smp_processor_id();

Why?

Will


Re: [PATCH v3 19/21] KVM: arm64: Refactor protected nVHE stage-1 locking

2021-08-03 Thread Fuad Tabba
Hi Quentin,

On Thu, Jul 29, 2021 at 3:29 PM Quentin Perret  wrote:
>
> Refactor the hypervisor stage-1 locking in nVHE protected mode to expose
> a new pkvm_create_mappings_locked() function. This will be used in later
> patches to allow walking and changing the hypervisor stage-1 without
> releasing the lock.
>
> Signed-off-by: Quentin Perret 
> ---
>  arch/arm64/kvm/hyp/include/nvhe/mm.h |  1 +
>  arch/arm64/kvm/hyp/nvhe/mm.c | 18 --
>  2 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h 
> b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> index 8ec3a5a7744b..c76d7136ed9b 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> @@ -23,6 +23,7 @@ int hyp_map_vectors(void);
>  int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back);
>  int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
>  int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
> +int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot 
> prot);
>  int __pkvm_create_mappings(unsigned long start, unsigned long size,
>unsigned long phys, enum kvm_pgtable_prot prot);
>  unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
> diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
> index a8efdf0f9003..6fbe8e8030f6 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mm.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mm.c
> @@ -67,13 +67,15 @@ unsigned long __pkvm_create_private_mapping(phys_addr_t 
> phys, size_t size,
> return addr;
>  }
>
> -int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
> +int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot 
> prot)
>  {
> unsigned long start = (unsigned long)from;
> unsigned long end = (unsigned long)to;
> unsigned long virt_addr;
> phys_addr_t phys;
>
> +   hyp_assert_lock_held(&pkvm_pgd_lock);
> +
> start = start & PAGE_MASK;
> end = PAGE_ALIGN(end);
>
> @@ -81,7 +83,8 @@ int pkvm_create_mappings(void *from, void *to, enum 
> kvm_pgtable_prot prot)
> int err;
>
> phys = hyp_virt_to_phys((void *)virt_addr);
> -   err = __pkvm_create_mappings(virt_addr, PAGE_SIZE, phys, 
> prot);
> +   err = kvm_pgtable_hyp_map(&pkvm_pgtable, virt_addr, PAGE_SIZE,
> + phys, prot);
> if (err)
> return err;
> }
> @@ -89,6 +92,17 @@ int pkvm_create_mappings(void *from, void *to, enum 
> kvm_pgtable_prot prot)
> return 0;
>  }
>
> +int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
> +{
> +   int ret;
> +
> +   hyp_spin_lock(&pkvm_pgd_lock);
> +   ret = pkvm_create_mappings_locked(from, to, prot);
> +   hyp_spin_unlock(&pkvm_pgd_lock);
> +
> +   return ret;
> +}
> +

I'm wondering whether this patch should also refactor
__pkvm_create_mappings. It doesn't quite do the exact same thing and
has different parameters.
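
Something like this, perhaps (untested, just to illustrate the shape I
have in mind):

    static int __pkvm_create_mappings_locked(unsigned long start,
                                             unsigned long size,
                                             unsigned long phys,
                                             enum kvm_pgtable_prot prot)
    {
            hyp_assert_lock_held(&pkvm_pgd_lock);
            return kvm_pgtable_hyp_map(&pkvm_pgtable, start, size, phys, prot);
    }

with __pkvm_create_mappings() then just taking pkvm_pgd_lock around it,
mirroring the pkvm_create_mappings() split above.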

Thanks,
/fuad

>  int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back)
>  {
> unsigned long start, end;
> --
> 2.32.0.432.gabb21c7263-goog
>


Re: [PATCH v3 17/21] KVM: arm64: Mark host bss and rodata section as shared

2021-08-03 Thread Fuad Tabba
Hi Quentin,

On Thu, Jul 29, 2021 at 3:29 PM Quentin Perret  wrote:
>
> As the hypervisor maps the host's .bss and .rodata sections in its
> stage-1, make sure to tag them as shared in hyp and host page-tables.
>
> But since the hypervisor relies on the presence of these mappings, we
> cannot let the host in complete control of the memory regions -- it
> must not unshare or donate them to another entity for example. To
> prevent this, let's transfer the ownership of those ranges to the
> hypervisor itself, and share the pages back with the host.
>
> Signed-off-by: Quentin Perret 
> ---
>  arch/arm64/kvm/hyp/nvhe/setup.c | 82 +
>  1 file changed, 74 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index 0b574d106519..7f557b264f62 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -58,6 +58,7 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned 
> long size,
>  {
> void *start, *end, *virt = hyp_phys_to_virt(phys);
> unsigned long pgt_size = hyp_s1_pgtable_pages() << PAGE_SHIFT;
> +   enum kvm_pgtable_prot prot;
> int ret, i;
>
> /* Recreate the hyp page-table using the early page allocator */
> @@ -83,10 +84,6 @@ static int recreate_hyp_mappings(phys_addr_t phys, 
> unsigned long size,
> if (ret)
> return ret;
>
> -   ret = pkvm_create_mappings(__start_rodata, __end_rodata, PAGE_HYP_RO);
> -   if (ret)
> -   return ret;
> -
> ret = pkvm_create_mappings(__hyp_rodata_start, __hyp_rodata_end, 
> PAGE_HYP_RO);
> if (ret)
> return ret;
> @@ -95,10 +92,6 @@ static int recreate_hyp_mappings(phys_addr_t phys, 
> unsigned long size,
> if (ret)
> return ret;
>
> -   ret = pkvm_create_mappings(__hyp_bss_end, __bss_stop, PAGE_HYP_RO);
> -   if (ret)
> -   return ret;
> -
> ret = pkvm_create_mappings(virt, virt + size, PAGE_HYP);
> if (ret)
> return ret;
> @@ -117,6 +110,24 @@ static int recreate_hyp_mappings(phys_addr_t phys, 
> unsigned long size,
> return ret;
> }
>
> +   /*
> +* Map the host's .bss and .rodata sections RO in the hypervisor, but
> +* transfer the ownerhsip from the host to the hypervisor itself to
> +* make sure it can't be donated or shared with another entity.

nit: ownerhsip -> ownership

> +*
> +* The ownership transtion requires matching changes in the host

nit: transtion -> transition

> +* stage-2. This will done later (see finalize_host_mappings()) once 
> the

nit: will done -> will be done

> +* hyp_vmemmap is addressable.
> +*/
> +   prot = pkvm_mkstate(PAGE_HYP_RO, PKVM_PAGE_SHARED_OWNED);
> +   ret = pkvm_create_mappings(__start_rodata, __end_rodata, prot);
> +   if (ret)
> +   return ret;
> +
> +   ret = pkvm_create_mappings(__hyp_bss_end, __bss_stop, prot);

nit: for clarity, I wonder if it might be good to create an alias of
__hyp_bss_end as __bss_start or something. Now that it's been moved here,
it sticks out a bit more and makes the reader wonder about the
significance of __hyp_bss_end.

> +   if (ret)
> +   return ret;
> +
> return 0;
>  }
>
> @@ -148,6 +159,57 @@ static void hpool_put_page(void *addr)
> hyp_put_page(&hpool, addr);
>  }
>
> +static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
> +kvm_pte_t *ptep,
> +enum kvm_pgtable_walk_flags flag,
> +void * const arg)
> +{
> +   enum kvm_pgtable_prot prot;
> +   enum pkvm_page_state state;
> +   kvm_pte_t pte = *ptep;
> +   phys_addr_t phys;
> +
> +   if (!kvm_pte_valid(pte))
> +   return 0;
> +
> +   if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
> +   return -EINVAL;

I know that it's not in scope here, but I'm wondering whether we
should be checking for KVM_PTE_TYPE_PAGE instead of the level. Maybe
it would be good to have a helper somewhere for all these checks both
for clarity and to ensure that nothing has gone wrong with the pte.
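
For example, a helper wrapping these checks might look something like
the sketch below (kvm_pte_table() currently lives in pgtable.c, so it
would need exposing somewhere shared):

    static bool kvm_pte_is_page(kvm_pte_t pte, u32 level)
    {
            /* A valid, non-table descriptor at the last level is a page. */
            return kvm_pte_valid(pte) && !kvm_pte_table(pte, level) &&
                   level == KVM_PGTABLE_MAX_LEVELS - 1;
    }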

> +
> +   phys = kvm_pte_to_phys(pte);
> +   if (!addr_is_memory(phys))
> +   return 0;
> +
> +   /*
> +* Adjust the host stage-2 mappings to match the ownership attributes
> +* configured in the hypervisor stage-1.
> +*/
> +   state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(pte));
> +   switch (state) {
> +   case PKVM_PAGE_OWNED:
> +   return host_stage2_set_owner_locked(phys, phys + PAGE_SIZE, 
> pkvm_hyp_id);
> +   case PKVM_PAGE_SHARED_OWNED:
> +   prot = pkvm_mkstate(KVM_PGTABLE_PROT_RWX, 
> PKVM_PAGE_SHARED_BORROWED);
> +   break;
> 

[PATCH v3] memblock: make memblock_find_in_range method private

2021-08-03 Thread Mike Rapoport
From: Mike Rapoport 

There are a lot of uses of memblock_find_in_range() along with
memblock_reserve() from the times memblock allocation APIs did not exist.

memblock_find_in_range() is the very core of memblock allocations, so any
future changes to its internal behaviour would mandate updates of all the
users outside memblock.

Replace the calls to memblock_find_in_range() with an equivalent calls to
memblock_phys_alloc() and memblock_phys_alloc_range() and make
memblock_find_in_range() private method of memblock.

This simplifies the callers, ensures that (unlikely) errors in
memblock_reserve() are handled and improves maintainability of
memblock_find_in_range().

Signed-off-by: Mike Rapoport 
---
v3:
* simplify check for exact crash kernel allocation on arm, per Rob
* make crash_max unsigned long long on arm64, per Rob
 
v2: https://lore.kernel.org/lkml/20210802063737.22733-1-r...@kernel.org 
* don't change error message in arm::reserve_crashkernel(), per Russell

v1: https://lore.kernel.org/lkml/20210730104039.7047-1-r...@kernel.org

 arch/arm/kernel/setup.c   | 20 +-
 arch/arm64/kvm/hyp/reserved_mem.c |  9 +++
 arch/arm64/mm/init.c  | 36 -
 arch/mips/kernel/setup.c  | 14 +-
 arch/riscv/mm/init.c  | 44 ++-
 arch/s390/kernel/setup.c  | 10 ---
 arch/x86/kernel/aperture_64.c |  5 ++--
 arch/x86/mm/init.c| 21 +--
 arch/x86/mm/numa.c|  5 ++--
 arch/x86/mm/numa_emulation.c  |  5 ++--
 arch/x86/realmode/init.c  |  2 +-
 drivers/acpi/tables.c |  5 ++--
 drivers/base/arch_numa.c  |  5 +---
 drivers/of/of_reserved_mem.c  | 12 ++---
 include/linux/memblock.h  |  2 --
 mm/memblock.c |  2 +-
 16 files changed, 79 insertions(+), 118 deletions(-)
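
The conversion pattern, in sketch form (illustrative only, not lifted
from any particular hunk below):

    /* before: find and reserve as two steps, the second easy to get wrong */
    base = memblock_find_in_range(start, end, size, align);
    if (!base)
            goto err;
    memblock_reserve(base, size);

    /* after: a single call that both finds and reserves the range */
    base = memblock_phys_alloc_range(size, align, start, end);
    if (!base)
            goto err;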

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index f97eb2371672..284a80c0b6e1 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -1012,31 +1012,25 @@ static void __init reserve_crashkernel(void)
unsigned long long lowmem_max = __pa(high_memory - 1) + 1;
if (crash_max > lowmem_max)
crash_max = lowmem_max;
-   crash_base = memblock_find_in_range(CRASH_ALIGN, crash_max,
-   crash_size, CRASH_ALIGN);
+
+   crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
+  CRASH_ALIGN, crash_max);
if (!crash_base) {
pr_err("crashkernel reservation failed - No suitable 
area found.\n");
return;
}
} else {
+   unsigned long long crash_max = crash_base + crash_size;
unsigned long long start;
 
-   start = memblock_find_in_range(crash_base,
-  crash_base + crash_size,
-  crash_size, SECTION_SIZE);
-   if (start != crash_base) {
+   start = memblock_phys_alloc_range(crash_size, SECTION_SIZE,
+ crash_base, crash_max);
+   if (!start) {
pr_err("crashkernel reservation failed - memory is in 
use.\n");
return;
}
}
 
-   ret = memblock_reserve(crash_base, crash_size);
-   if (ret < 0) {
-   pr_warn("crashkernel reservation failed - memory is in use 
(0x%lx)\n",
-   (unsigned long)crash_base);
-   return;
-   }
-
pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System 
RAM: %ldMB)\n",
(unsigned long)(crash_size >> 20),
(unsigned long)(crash_base >> 20),
diff --git a/arch/arm64/kvm/hyp/reserved_mem.c 
b/arch/arm64/kvm/hyp/reserved_mem.c
index d654921dd09b..578670e3f608 100644
--- a/arch/arm64/kvm/hyp/reserved_mem.c
+++ b/arch/arm64/kvm/hyp/reserved_mem.c
@@ -92,12 +92,10 @@ void __init kvm_hyp_reserve(void)
 * this is unmapped from the host stage-2, and fallback to PAGE_SIZE.
 */
hyp_mem_size = hyp_mem_pages << PAGE_SHIFT;
-   hyp_mem_base = memblock_find_in_range(0, memblock_end_of_DRAM(),
- ALIGN(hyp_mem_size, PMD_SIZE),
- PMD_SIZE);
+   hyp_mem_base = memblock_phys_alloc(ALIGN(hyp_mem_size, PMD_SIZE),
+  PMD_SIZE);
if (!hyp_mem_base)
-   hyp_mem_base = memblock_find_in_range(0, memblock_end_of_DRAM(),
- hyp_mem_size, PAGE_SIZE);
+   hyp_mem_base = memblock_phys_alloc(hyp_mem_size, PAGE_SIZE);
else