Re: [PATCH 1/2] powerpc/64s: remove PROT_SAO support

2020-06-28 Thread Nicholas Piggin
Excerpts from Michael Ellerman's message of June 12, 2020 4:14 pm:
> Nicholas Piggin  writes:
>> ISA v3.1 does not support the SAO storage control attribute required to
>> implement PROT_SAO. PROT_SAO was used by specialised system software
>> (Lx86) that has been discontinued for about 7 years, and is not thought
>> to be used elsewhere, so removal should not cause problems.
>>
>> We rather remove it than keep support for older processors, because
>> live migrating guest partitions to newer processors may not be possible
>> if SAO is in use.
> 

Thanks for the review, sorry, got distracted...

> The key details being:
>  - you don't remove PROT_SAO from the uapi header, so code using the
>definition will still build.
>  - you change arch_validate_prot() to reject PROT_SAO, which means code
>using it will see a failure from mmap() at runtime.

Yes.

> This obviously risks breaking userspace, even if we think it won't in
> practice. I guess we don't really have any option given the hardware
> support is being dropped.
> 
> Can you repost with a wider Cc list, including linux-mm and linux-arch?

Will do.

> I wonder if we should add a comment to the uapi header, eg?
> 
> diff --git a/arch/powerpc/include/uapi/asm/mman.h 
> b/arch/powerpc/include/uapi/asm/mman.h
> index c0c737215b00..d4fdbe768997 100644
> --- a/arch/powerpc/include/uapi/asm/mman.h
> +++ b/arch/powerpc/include/uapi/asm/mman.h
> @@ -11,7 +11,7 @@
>  #include 
>  
>  
> -#define PROT_SAO 0x10/* Strong Access Ordering */
> +#define PROT_SAO 0x10/* Unsupported since v5.9 */
>  
>  #define MAP_RENAME  MAP_ANONYMOUS   /* In SunOS terminology */
>  #define MAP_NORESERVE   0x40/* don't reserve swap pages */

Yeah that makes sense.

>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
>> b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> index f17442c3a092..d9e92586f8dc 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -20,9 +20,13 @@
>>  #define _PAGE_RW(_PAGE_READ | _PAGE_WRITE)
>>  #define _PAGE_RWX   (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
>>  #define _PAGE_PRIVILEGED0x8 /* kernel access only */
>> -#define _PAGE_SAO   0x00010 /* Strong access order */
>> +
>> +#define _PAGE_CACHE_CTL 0x00030 /* Bits for the following cache modes */
>> +/*  No bits set is normal cacheable memory */
>> +/*  0x00010 unused, is SAO bit on radix POWER9 */
>>  #define _PAGE_NON_IDEMPOTENT0x00020 /* non idempotent memory */
>>  #define _PAGE_TOLERANT  0x00030 /* tolerant memory, cache 
>> inhibited */
>> +
> 
> Why'd you do it that way vs just dropping _PAGE_SAO from the or below?

Just didn't like _PAGE_CACHE_CTL depending on values of the variants 
that we use.

>> diff --git a/arch/powerpc/include/asm/cputable.h 
>> b/arch/powerpc/include/asm/cputable.h
>> index bac2252c839e..c7e923ba 100644
>> --- a/arch/powerpc/include/asm/cputable.h
>> +++ b/arch/powerpc/include/asm/cputable.h
>> @@ -191,7 +191,6 @@ static inline void cpu_feature_keys_init(void) { }
>>  #define CPU_FTR_SPURR   
>> LONG_ASM_CONST(0x0100)
>>  #define CPU_FTR_DSCR
>> LONG_ASM_CONST(0x0200)
>>  #define CPU_FTR_VSX LONG_ASM_CONST(0x0400)
>> -#define CPU_FTR_SAO LONG_ASM_CONST(0x0800)
> 
> Can you do:
> 
> +// Free  LONG_ASM_CONST(0x0800)

Yes.

> 
>> diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
>> b/arch/powerpc/include/asm/kvm_book3s_64.h
>> index 9bb9bb370b53..579c9229124b 100644
>> --- a/arch/powerpc/include/asm/kvm_book3s_64.h
>> +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
>> @@ -400,7 +400,8 @@ static inline bool hpte_cache_flags_ok(unsigned long 
>> hptel, bool is_ci)
>>  
>>  /* Handle SAO */
>>  if (wimg == (HPTE_R_W | HPTE_R_I | HPTE_R_M) &&
>> -cpu_has_feature(CPU_FTR_ARCH_206))
>> +cpu_has_feature(CPU_FTR_ARCH_206) &&
>> +!cpu_has_feature(CPU_FTR_ARCH_31))
>>  wimg = HPTE_R_M;
> 
> Shouldn't it reject that combination if the host can't support it?
> 
> Or I guess it does, but yikes that code is not clear.

Yeah, took me a bit to work that out.

>> diff --git a/arch/powerpc/include/asm/mman.h 
>> b/arch/powerpc/include/asm/mman.h
>> index d610c2e07b28..43a62f3e21a0 100644
>> --- a/arch/powerpc/include/asm/mman.h
>> +++ b/arch/powerpc/include/asm/mman.h
>> @@ -13,38 +13,24 @@
>>  #include 
>>  #include 
>>  
>> -/*
>> - * This file is included by linux/mman.h, so we can't use 
>> cacl_vm_prot_bits()
>> - * here.  How important is the optimization?
>> - */
> 
> This comment seems confused, but also unrelated to this patch?

Yeah.
 
>> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
>> b/arch/powerpc/kernel/dt_cpu_ftrs.c
>> 

Re: [PATCH v2 1/3] powerpc/mm: Enable radix GTSE only if supported.

2020-06-28 Thread Bharata B Rao
On Fri, Jun 26, 2020 at 05:55:30PM -0300, Murilo Opsfelder Araújo wrote:
> > diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> > index bc73abf0bc25..152aa0200cef 100644
> > --- a/arch/powerpc/mm/init_64.c
> > +++ b/arch/powerpc/mm/init_64.c
> > @@ -407,12 +407,15 @@ static void __init early_check_vec5(void)
> > if (!(vec5[OV5_INDX(OV5_RADIX_GTSE)] &
> > OV5_FEAT(OV5_RADIX_GTSE))) {
> > pr_warn("WARNING: Hypervisor doesn't support RADIX with 
> > GTSE\n");
> > -   }
> > +   cur_cpu_spec->mmu_features &= ~MMU_FTR_GTSE;
> > +   } else
> > +   cur_cpu_spec->mmu_features |= MMU_FTR_GTSE;
> > /* Do radix anyway - the hypervisor said we had to */
> > cur_cpu_spec->mmu_features |= MMU_FTR_TYPE_RADIX;
> > } else if (mmu_supported == OV5_FEAT(OV5_MMU_HASH)) {
> > /* Hypervisor only supports hash - disable radix */
> > cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
> > +   cur_cpu_spec->mmu_features &= ~MMU_FTR_GTSE;
> > }
> >  }
> 
> Is this a part of the code where mmu_clear_feature() cannot be used?

Yes, it appears so. Jump label initialization isn't done yet.

Regards,
Bharata.


Re: [PATCH v3 0/4] Migrate non-migrated pages of a SVM.

2020-06-28 Thread Bharata B Rao
On Sun, Jun 28, 2020 at 09:41:53PM +0530, Bharata B Rao wrote:
> On Fri, Jun 19, 2020 at 03:43:38PM -0700, Ram Pai wrote:
> > The time taken to switch a VM to Secure-VM, increases by the size of the 
> > VM.  A
> > 100GB VM takes about 7 minutes. This is unacceptable.  This linear increase 
> > is
> > caused by a suboptimal behavior by the Ultravisor and the Hypervisor.  The
> > Ultravisor unnecessarily migrates all the GFN of the VM from normal-memory 
> > to
> > secure-memory. It has to just migrate the necessary and sufficient GFNs.
> > 
> > However when the optimization is incorporated in the Ultravisor, the 
> > Hypervisor
> > starts misbehaving. The Hypervisor has an inbuilt assumption that the 
> > Ultravisor
> > will explicitly request to migrate, each and every GFN of the VM. If only
> > necessary and sufficient GFNs are requested for migration, the Hypervisor
> > continues to manage the remaining GFNs as normal GFNs. This leads to memory
> > corruption, manifested consistently when the SVM reboots.
> > 
> > The same is true, when a memory slot is hotplugged into a SVM. The 
> > Hypervisor
> > expects the ultravisor to request migration of all GFNs to secure-GFN.  But 
> > at
> > the same time, the hypervisor is unable to handle any H_SVM_PAGE_IN requests
> > from the Ultravisor, done in the context of UV_REGISTER_MEM_SLOT ucall.  
> > This
> > problem manifests as random errors in the SVM, when a memory-slot is
> > hotplugged.
> > 
> > This patch series automatically migrates the non-migrated pages of a SVM,
> >  and thus solves the problem.
> 
> So this is what I understand as the objective of this patchset:
> 
> 1. Getting all the pages into the secure memory right when the guest
>transitions into secure mode is expensive. Ultravisor wants to just get
>the necessary and sufficient pages in and put the onus on the Hypervisor
>to mark the remaining pages (w/o actual page-in) as secure during
>H_SVM_INIT_DONE.
> 2. During H_SVM_INIT_DONE, you want a way to differentiate the pages that
>are already secure from the pages that are shared and that are paged-out.
>For this you are introducing all these new states in HV.
> 
> UV knows about the shared GFNs and maintains the state of the same. Hence
> let HV send all the pages (minus already secured pages) via H_SVM_PAGE_IN
> and if UV finds any shared pages in them, let it fail the uv-page-in call.
> Then HV can fail the migration for it  and the page continues to remain
> shared. With this, you don't need to maintain a state for secured GFN in HV.
> 
> In the unlikely case of sending a paged-out page to UV during
> H_SVM_INIT_DONE, let the page-in succeed and HV will fault on it again
> if required. With this, you don't need a state in HV to identify a
> paged-out-but-encrypted state.
> 
> Doesn't the above work?

I see that you in fact want to skip the uv-page-in calls from H_SVM_INIT_DONE.
So that would need the extra states in HV which you are proposing here.

Regards,
Bharata.


Re: [PATCH] kbuild: introduce ccflags-remove-y and asflags-remove-y

2020-06-28 Thread Michael Ellerman
Masahiro Yamada  writes:
> CFLAGS_REMOVE_.o works per object, that is, there is no
> convenient way to filter out flags for every object in a directory.
>
> Add ccflags-remove-y and asflags-remove-y to make this easy.
>
> Use ccflags-remove-y to clean up some Makefiles.
>
> Suggested-by: Sami Tolvanen 
> Signed-off-by: Masahiro Yamada 
> ---
>
>  arch/arm/boot/compressed/Makefile | 6 +-
>  arch/powerpc/xmon/Makefile| 3 +--
>  arch/sh/boot/compressed/Makefile  | 5 +
>  kernel/trace/Makefile | 4 ++--
>  lib/Makefile  | 5 +
>  scripts/Makefile.lib  | 4 ++--
>  6 files changed, 8 insertions(+), 19 deletions(-)
>
> diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile
> index 89c76ca35640..55cbcdd88ac0 100644
> --- a/arch/powerpc/xmon/Makefile
> +++ b/arch/powerpc/xmon/Makefile
> @@ -7,8 +7,7 @@ UBSAN_SANITIZE := n
>  KASAN_SANITIZE := n
>  
>  # Disable ftrace for the entire directory
> -ORIG_CFLAGS := $(KBUILD_CFLAGS)
> -KBUILD_CFLAGS = $(subst $(CC_FLAGS_FTRACE),,$(ORIG_CFLAGS))
> +ccflags-remove-y += $(CC_FLAGS_FTRACE)

This could be:

ccflags-remove-$(CONFIG_FUNCTION_TRACER) += $(CC_FLAGS_FTRACE)

Similar to kernel/trace/Makefile below.

I don't mind though.

Acked-by: Michael Ellerman  (powerpc)

cheers

> diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> index 6575bb0a0434..7492844a8b1b 100644
> --- a/kernel/trace/Makefile
> +++ b/kernel/trace/Makefile
> @@ -2,9 +2,9 @@
>  
>  # Do not instrument the tracer itself:
>  
> +ccflags-remove-$(CONFIG_FUNCTION_TRACER) += $(CC_FLAGS_FTRACE)
> +
>  ifdef CONFIG_FUNCTION_TRACER
> -ORIG_CFLAGS := $(KBUILD_CFLAGS)
> -KBUILD_CFLAGS = $(subst $(CC_FLAGS_FTRACE),,$(ORIG_CFLAGS))
>  
>  # Avoid recursion due to instrumentation.
>  KCSAN_SANITIZE := n


Re: [PATCH 04/11] ppc64/kexec_file: avoid stomping memory used by special regions

2020-06-28 Thread Hari Bathini



On 28/06/20 7:44 am, piliu wrote:
> Hi Hari,

Hi Pingfan,

> 
> After a quick pass through this series, I have a few questions/comments on
> this patch for the time being. Please see comments inline.
> 
> On 06/27/2020 03:05 AM, Hari Bathini wrote:
>> crashkernel region could have an overlap with special memory regions
>> like  opal, rtas, tce-table & such. These regions are referred to as
>> exclude memory ranges. Set up these ranges during image probe in order
>> to avoid them while finding the buffer for different kdump segments.

[...]

>> +/*
>> + * Use the locate_mem_hole logic in kexec_add_buffer() for regular
>> + * kexec_file_load syscall
>> + */
>> +if (kbuf->image->type != KEXEC_TYPE_CRASH)
>> +return 0;
> Can the ranges overlap [crashk_res.start, crashk_res.end]?  Otherwise
> there is no requirement for @exclude_ranges.

Ranges like rtas and opal are set up by firmware. They almost always overlap
with the crashkernel region, so @exclude_ranges is required to support kdump.

> I guess you have a design for the future. If not true, then it is better to
> fold the condition "if (kbuf->image->type != KEXEC_TYPE_CRASH)" into the
> caller and rename this function to better distinguish use cases between
> kexec and kdump

Yeah, this condition will be folded. I have a follow-up patch for that,
explaining why the kexec case should also be folded. Will try to add it to
this series for v2.

Thanks
Hari


Re: [PATCH 9/8] mm: Account PMD tables like PTE tables

2020-06-28 Thread Mike Rapoport
On Sat, Jun 27, 2020 at 07:46:42PM +0100, Matthew Wilcox wrote:
> We account the PTE level of the page tables to the process in order to
> make smarter OOM decisions and help diagnose why memory is fragmented.
> For these same reasons, we should account pages allocated for PMDs.
> With larger process address spaces and ASLR, the number of PMDs in use
> is higher than it used to be so the inaccuracy is starting to matter.
> 
> Signed-off-by: Matthew Wilcox (Oracle) 

Reviewed-by: Mike Rapoport 

> ---
>  include/linux/mm.h | 24 
>  1 file changed, 20 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index dc7b87310c10..b283e25fcffa 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2271,7 +2271,7 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct 
> *mm, pmd_t *pmd)
>   return ptlock_ptr(pmd_to_page(pmd));
>  }
>  
> -static inline bool pgtable_pmd_page_ctor(struct page *page)
> +static inline bool pmd_ptlock_init(struct page *page)
>  {
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>   page->pmd_huge_pte = NULL;
> @@ -2279,7 +2279,7 @@ static inline bool pgtable_pmd_page_ctor(struct page 
> *page)
>   return ptlock_init(page);
>  }
>  
> -static inline void pgtable_pmd_page_dtor(struct page *page)
> +static inline void pmd_ptlock_free(struct page *page)
>  {
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>   VM_BUG_ON_PAGE(page->pmd_huge_pte, page);
> @@ -2296,8 +2296,8 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct 
> *mm, pmd_t *pmd)
> >   return &mm->page_table_lock;
>  }
>  
> -static inline bool pgtable_pmd_page_ctor(struct page *page) { return true; }
> -static inline void pgtable_pmd_page_dtor(struct page *page) {}
> +static inline bool pmd_ptlock_init(struct page *page) { return true; }
> +static inline void pmd_ptlock_free(struct page *page) {}
>  
>  #define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
>  
> @@ -2310,6 +2310,22 @@ static inline spinlock_t *pmd_lock(struct mm_struct 
> *mm, pmd_t *pmd)
>   return ptl;
>  }
>  
> +static inline bool pgtable_pmd_page_ctor(struct page *page)
> +{
> + if (!pmd_ptlock_init(page))
> + return false;
> + __SetPageTable(page);
> + inc_zone_page_state(page, NR_PAGETABLE);
> + return true;
> +}
> +
> +static inline void pgtable_pmd_page_dtor(struct page *page)
> +{
> + pmd_ptlock_free(page);
> + __ClearPageTable(page);
> + dec_zone_page_state(page, NR_PAGETABLE);
> +}
> +
>  /*
>   * No scalability reason to split PUD locks yet, but follow the same pattern
>   * as the PMD locks to make it easier if we decide to.  The VM should not be
> -- 
> 2.27.0
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH 4/8] asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one()

2020-06-28 Thread Mike Rapoport
On Sat, Jun 27, 2020 at 08:03:04PM +0100, Matthew Wilcox wrote:
> On Sat, Jun 27, 2020 at 05:34:49PM +0300, Mike Rapoport wrote:
> > More elaborate versions on arm64 and x86 account memory for the user page
> > tables and call to pgtable_pmd_page_ctor() as the part of PMD page
> > initialization.
> > 
> > Move the arm64 version to include/asm-generic/pgalloc.h and use the generic
> > version on several architectures.
> > 
> > The pgtable_pmd_page_ctor() is a NOP when ARCH_ENABLE_SPLIT_PMD_PTLOCK is
> > not enabled, so there is no functional change for most architectures except
> > of the addition of __GFP_ACCOUNT for allocation of user page tables.
> 
> Thanks for including this line; it reminded me that we're not setting
> the PageTable flag on the page, nor accounting it to the zone page stats.
> Hope you don't mind me tagging a patch to do that on as 9/8.
> 
> We could also do with a pud_page_[cd]tor and maybe even p4d/pgd versions.
> But that brings me to the next question -- could/should some of this
> be moved over to asm-generic/pgalloc.h?  The ctor/dtor aren't called
> from anywhere else, and there's value to reducing the total amount of
> code in mm.h, but then there's also value to keeping all the ifdef
> ARCH_ENABLE_SPLIT_PMD_PTLOCK code together too.  So I'm a bit torn.
> What do you think?

There are architectures that don't use asm-generic/pgalloc.h but rather
have their own, sometimes completely different, versions of these
functions.

I've tried adding linux/pgalloc.h, but I ended up with the contradictory
need to include asm/pgalloc.h before the generic code for some
architectures and after the generic code for others :)

I think let's leave it in mm.h for now; maybe after several more cleanups
we can do better.

-- 
Sincerely yours,
Mike.


Re: [PATCH] kbuild: introduce ccflags-remove-y and asflags-remove-y

2020-06-28 Thread Steven Rostedt
On Sun, 28 Jun 2020 10:50:41 +0900
Masahiro Yamada  wrote:

> CFLAGS_REMOVE_.o works per object, that is, there is no
> convenient way to filter out flags for every object in a directory.
> 
> Add ccflags-remove-y and asflags-remove-y to make this easy.
> 
> Use ccflags-remove-y to clean up some Makefiles.
> 
> Suggested-by: Sami Tolvanen 
> Signed-off-by: Masahiro Yamada 
> ---

Nice feature!

Acked-by: Steven Rostedt (VMware) 

-- Steve


Re: [PATCH v3 3/4] KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in H_SVM_INIT_DONE

2020-06-28 Thread Bharata B Rao
On Fri, Jun 19, 2020 at 03:43:41PM -0700, Ram Pai wrote:
> H_SVM_INIT_DONE incorrectly assumes that the Ultravisor has explicitly

As noted in the last iteration, can you reword the above please?
I don't see it as an incorrect assumption, but see it as extension of
scope now :-)

> called H_SVM_PAGE_IN for all secure pages. These GFNs continue to be
> normal GFNs associated with normal PFNs when, in fact, these GFNs should
> have been secure GFNs, associated with device PFNs.
> 
> Move all the PFNs associated with the SVM's GFNs, to secure-PFNs, in
> H_SVM_INIT_DONE. Skip the GFNs that are already Paged-in or Shared
> through H_SVM_PAGE_IN, or Paged-in followed by a Paged-out through
> UV_PAGE_OUT.
> 
> Cc: Paul Mackerras 
> Cc: Benjamin Herrenschmidt 
> Cc: Michael Ellerman 
> Cc: Bharata B Rao 
> Cc: Aneesh Kumar K.V 
> Cc: Sukadev Bhattiprolu 
> Cc: Laurent Dufour 
> Cc: Thiago Jung Bauermann 
> Cc: David Gibson 
> Cc: Claudio Carvalho 
> Cc: kvm-...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Ram Pai 
> ---
>  Documentation/powerpc/ultravisor.rst|   2 +
>  arch/powerpc/include/asm/kvm_book3s_uvmem.h |   2 +
>  arch/powerpc/kvm/book3s_hv_uvmem.c  | 154 
> +++-
>  3 files changed, 132 insertions(+), 26 deletions(-)
> 
> diff --git a/Documentation/powerpc/ultravisor.rst 
> b/Documentation/powerpc/ultravisor.rst
> index 363736d..3bc8957 100644
> --- a/Documentation/powerpc/ultravisor.rst
> +++ b/Documentation/powerpc/ultravisor.rst
> @@ -933,6 +933,8 @@ Return values
>   * H_UNSUPPORTED if called from the wrong context (e.g.
>   from an SVM or before an H_SVM_INIT_START
>   hypercall).
> + * H_STATE   if the hypervisor could not successfully
> +transition the VM to Secure VM.
>  
>  Description
>  ~~~
> diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h 
> b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> index 5a9834e..b9cd7eb 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> @@ -22,6 +22,8 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
>  unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm);
>  void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
>struct kvm *kvm, bool skip_page_out);
> +int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
> + const struct kvm_memory_slot *memslot);
>  #else
>  static inline int kvmppc_uvmem_init(void)
>  {
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
> b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index c8c0290..449e8a7 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -93,6 +93,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  static struct dev_pagemap kvmppc_uvmem_pgmap;
>  static unsigned long *kvmppc_uvmem_bitmap;
> @@ -339,6 +340,21 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, 
> struct kvm *kvm,
>   return false;
>  }
>  
> +/* return true, if the GFN is a shared-GFN, or a secure-GFN */
> +bool kvmppc_gfn_has_transitioned(unsigned long gfn, struct kvm *kvm)
> +{
> + struct kvmppc_uvmem_slot *p;
> +
> + list_for_each_entry(p, &kvm->arch.uvmem_pfns, list) {
> + if (gfn >= p->base_pfn && gfn < p->base_pfn + p->nr_pfns) {
> + unsigned long index = gfn - p->base_pfn;
> +
> + return (p->pfns[index] & KVMPPC_GFN_FLAG_MASK);
> + }
> + }
> + return false;
> +}
> +
>  unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
>  {
>   struct kvm_memslots *slots;
> @@ -379,12 +395,31 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
>  
>  unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
>  {
> + struct kvm_memslots *slots;
> + struct kvm_memory_slot *memslot;
> + int srcu_idx;
> + long ret = H_SUCCESS;
> +
>   if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
>   return H_UNSUPPORTED;
>  
> + /* migrate any unmoved normal pfn to device pfns*/
> + srcu_idx = srcu_read_lock(&kvm->srcu);
> + slots = kvm_memslots(kvm);
> + kvm_for_each_memslot(memslot, slots) {
> + ret = kvmppc_uv_migrate_mem_slot(kvm, memslot);
> + if (ret) {
> + ret = H_STATE;
> + goto out;
> + }
> + }
> +
>   kvm->arch.secure_guest |= KVMPPC_SECURE_INIT_DONE;
>   pr_info("LPID %d went secure\n", kvm->arch.lpid);
> - return H_SUCCESS;
> +
> +out:
> + srcu_read_unlock(&kvm->srcu, srcu_idx);
> + return ret;
>  }
>  
>  /*
> @@ -505,12 +540,14 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
> gpa, struct kvm *kvm)
>  }
>  
>  /*
> - * Alloc a PFN from private device memory pool and copy page from normal
> - * memory to secure memory using UV_PAGE_IN uvcall.
> + * Alloc a PFN 

Re: [PATCH v3 0/4] Migrate non-migrated pages of a SVM.

2020-06-28 Thread Bharata B Rao
On Fri, Jun 19, 2020 at 03:43:38PM -0700, Ram Pai wrote:
> The time taken to switch a VM to Secure-VM, increases by the size of the VM.  
> A
> 100GB VM takes about 7 minutes. This is unacceptable.  This linear increase is
> caused by a suboptimal behavior by the Ultravisor and the Hypervisor.  The
> Ultravisor unnecessarily migrates all the GFN of the VM from normal-memory to
> secure-memory. It has to just migrate the necessary and sufficient GFNs.
> 
> However when the optimization is incorporated in the Ultravisor, the 
> Hypervisor
> starts misbehaving. The Hypervisor has an inbuilt assumption that the 
> Ultravisor
> will explicitly request to migrate, each and every GFN of the VM. If only
> necessary and sufficient GFNs are requested for migration, the Hypervisor
> continues to manage the remaining GFNs as normal GFNs. This leads to memory
> corruption, manifested consistently when the SVM reboots.
> 
> The same is true, when a memory slot is hotplugged into a SVM. The Hypervisor
> expects the ultravisor to request migration of all GFNs to secure-GFN.  But at
> the same time, the hypervisor is unable to handle any H_SVM_PAGE_IN requests
> from the Ultravisor, done in the context of UV_REGISTER_MEM_SLOT ucall.  This
> problem manifests as random errors in the SVM, when a memory-slot is
> hotplugged.
> 
> This patch series automatically migrates the non-migrated pages of a SVM,
>  and thus solves the problem.

So this is what I understand as the objective of this patchset:

1. Getting all the pages into the secure memory right when the guest
   transitions into secure mode is expensive. Ultravisor wants to just get
   the necessary and sufficient pages in and put the onus on the Hypervisor
   to mark the remaining pages (w/o actual page-in) as secure during
   H_SVM_INIT_DONE.
2. During H_SVM_INIT_DONE, you want a way to differentiate the pages that
   are already secure from the pages that are shared and that are paged-out.
   For this you are introducing all these new states in HV.

UV knows about the shared GFNs and maintains the state of the same. Hence
let HV send all the pages (minus already secured pages) via H_SVM_PAGE_IN
and if UV finds any shared pages in them, let it fail the uv-page-in call.
Then HV can fail the migration for it  and the page continues to remain
shared. With this, you don't need to maintain a state for secured GFN in HV.

In the unlikely case of sending a paged-out page to UV during
H_SVM_INIT_DONE, let the page-in succeed and HV will fault on it again
if required. With this, you don't need a state in HV to identify a
paged-out-but-encrypted state.

Doesn't the above work? If so, we can avoid all those extra states
in HV. That way HV can continue to differentiate only between two types
of pages: secure and not-secure. The rest of the states (shared,
paged-out-encrypted) actually belong to SVM/UV and let UV take care of
them.

Or did I miss something?

Regards,
Bharata.