Re: [RESEND PATCH v4 08/11] powerpc: Initialize and use a temporary mm for patching

2021-07-08 Thread Christopher M. Riedl
On Thu Jul 1, 2021 at 2:51 AM CDT, Nicholas Piggin wrote:
> Excerpts from Christopher M. Riedl's message of July 1, 2021 5:02 pm:
> > On Thu Jul 1, 2021 at 1:12 AM CDT, Nicholas Piggin wrote:
> >> Excerpts from Christopher M. Riedl's message of May 6, 2021 2:34 pm:
> >> > When code patching a STRICT_KERNEL_RWX kernel the page containing the
> >> > address to be patched is temporarily mapped as writeable. Currently, a
> >> > per-cpu vmalloc patch area is used for this purpose. While the patch
> >> > area is per-cpu, the temporary page mapping is inserted into the kernel
> >> > page tables for the duration of patching. The mapping is exposed to CPUs
> >> > other than the patching CPU - this is undesirable from a hardening
> >> > perspective. Use a temporary mm instead which keeps the mapping local to
> >> > the CPU doing the patching.
> >> > 
> >> > Use the `poking_init` init hook to prepare a temporary mm and patching
> >> > address. Initialize the temporary mm by copying the init mm. Choose a
> >> > randomized patching address inside the temporary mm userspace address
> >> > space. The patching address is randomized between PAGE_SIZE and
> >> > DEFAULT_MAP_WINDOW-PAGE_SIZE. The upper limit is necessary due to how
> >> > the Book3s64 Hash MMU operates - by default the space above
> >> > DEFAULT_MAP_WINDOW is not available. For now, the patching address for
> >> > all platforms/MMUs is randomized inside this range.  The number of
> >> > possible random addresses is dependent on PAGE_SIZE and limited by
> >> > DEFAULT_MAP_WINDOW.
> >> > 
> >> > Bits of entropy with 64K page size on BOOK3S_64:
> >> > 
> >> > bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)
> >> > 
> >> > PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
> >> > bits of entropy = log2(128TB / 64K)
> >> > bits of entropy = 31
> >> > 
> >> > Randomization occurs only once during initialization at boot.
> >> > 
> >> > Introduce two new functions, map_patch() and unmap_patch(), to
> >> > respectively create and remove the temporary mapping with write
> >> > permissions at patching_addr. The Hash MMU on Book3s64 requires mapping
> >> > the page for patching with PAGE_SHARED since the kernel cannot access
> >> > userspace pages with the PAGE_PRIVILEGED (PAGE_KERNEL) bit set.
> >> > 
> >> > Also introduce hash_prefault_mapping() to preload the SLB entry and HPTE
> >> > for the patching_addr when using the Hash MMU on Book3s64 to avoid
> >> > taking an SLB and Hash fault during patching.
> >>
> >> What prevents the SLBE or HPTE from being removed before the last
> >> access?
> > 
> > This code runs with local IRQs disabled - we also don't access anything
> > else in userspace so I'm not sure what else could cause the entries to
> > be removed TBH.
> > 
> >>
> >>
> >> > +#ifdef CONFIG_PPC_BOOK3S_64
> >> > +
> >> > +static inline int hash_prefault_mapping(pgprot_t pgprot)
> >> >  {
> >> > -struct vm_struct *area;
> >> > +int err;
> >> >  
> >> > -area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> >> > -if (!area) {
> >> > -WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> >> > -cpu);
> >> > -return -1;
> >> > -}
> >> > -this_cpu_write(text_poke_area, area);
> >> > +if (radix_enabled())
> >> > +return 0;
> >> >  
> >> > -return 0;
> >> > -}
> >> > +err = slb_allocate_user(patching_mm, patching_addr);
> >> > +if (err)
> >> > +pr_warn("map patch: failed to allocate slb entry\n");
> >> >  
> >> > -static int text_area_cpu_down(unsigned int cpu)
> >> > -{
> >> > -free_vm_area(this_cpu_read(text_poke_area));
> >> > -return 0;
> >> > +err = hash_page_mm(patching_mm, patching_addr, pgprot_val(pgprot), 0,
> >> > +   HPTE_USE_KERNEL_KEY);
> >> > +if (err)
> >> > +pr_warn("map patch: failed to insert hashed page\n");
> >> > +
> >> > +/* See comment in switch_slb() in mm/book3s64/slb.c */
> >> > +isync();
> >>
> >> I'm not sure if this is enough. Could we context switch here? You've
> >> got the PTL so no with a normal kernel, but maybe yes with an RT kernel.
> >> How about taking a machine check that clears the SLB? Could the HPTE
> >> get removed by something else here?
> > 
> > All of this happens after a local_irq_save() which should at least
> > prevent context switches IIUC.
>
> Ah yeah I didn't look that far back. A machine check can take out SLB
> entries.
>
> > I am not sure what else could cause the
> > HPTE to get removed here.
>
> Other CPUs?
>

Right, because the HPTEs are "global".

> >> You want to prevent faults because you might be patching a fault
> >> handler?
> > 
> > In a more general sense: I don't think we want to take page faults every
> > time we patch an instruction with a STRICT_RWX kernel. The Hash MMU page
> > fault handler codepath also checks 

Re: [RESEND PATCH v4 08/11] powerpc: Initialize and use a temporary mm for patching

2021-07-01 Thread Nicholas Piggin
Excerpts from Christopher M. Riedl's message of July 1, 2021 5:02 pm:
> On Thu Jul 1, 2021 at 1:12 AM CDT, Nicholas Piggin wrote:
>> Excerpts from Christopher M. Riedl's message of May 6, 2021 2:34 pm:
>> > When code patching a STRICT_KERNEL_RWX kernel the page containing the
>> > address to be patched is temporarily mapped as writeable. Currently, a
>> > per-cpu vmalloc patch area is used for this purpose. While the patch
>> > area is per-cpu, the temporary page mapping is inserted into the kernel
>> > page tables for the duration of patching. The mapping is exposed to CPUs
>> > other than the patching CPU - this is undesirable from a hardening
>> > perspective. Use a temporary mm instead which keeps the mapping local to
>> > the CPU doing the patching.
>> > 
>> > Use the `poking_init` init hook to prepare a temporary mm and patching
>> > address. Initialize the temporary mm by copying the init mm. Choose a
>> > randomized patching address inside the temporary mm userspace address
>> > space. The patching address is randomized between PAGE_SIZE and
>> > DEFAULT_MAP_WINDOW-PAGE_SIZE. The upper limit is necessary due to how
>> > the Book3s64 Hash MMU operates - by default the space above
>> > DEFAULT_MAP_WINDOW is not available. For now, the patching address for
>> > all platforms/MMUs is randomized inside this range.  The number of
>> > possible random addresses is dependent on PAGE_SIZE and limited by
>> > DEFAULT_MAP_WINDOW.
>> > 
>> > Bits of entropy with 64K page size on BOOK3S_64:
>> > 
>> > bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)
>> > 
>> > PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
>> > bits of entropy = log2(128TB / 64K)
>> > bits of entropy = 31
>> > 
>> > Randomization occurs only once during initialization at boot.
>> > 
>> > Introduce two new functions, map_patch() and unmap_patch(), to
>> > respectively create and remove the temporary mapping with write
>> > permissions at patching_addr. The Hash MMU on Book3s64 requires mapping
>> > the page for patching with PAGE_SHARED since the kernel cannot access
>> > userspace pages with the PAGE_PRIVILEGED (PAGE_KERNEL) bit set.
>> > 
>> > Also introduce hash_prefault_mapping() to preload the SLB entry and HPTE
>> > for the patching_addr when using the Hash MMU on Book3s64 to avoid
>> > taking an SLB and Hash fault during patching.
>>
>> What prevents the SLBE or HPTE from being removed before the last
>> access?
> 
> This code runs with local IRQs disabled - we also don't access anything
> else in userspace so I'm not sure what else could cause the entries to
> be removed TBH.
> 
>>
>>
>> > +#ifdef CONFIG_PPC_BOOK3S_64
>> > +
>> > +static inline int hash_prefault_mapping(pgprot_t pgprot)
>> >  {
>> > -  struct vm_struct *area;
>> > +  int err;
>> >  
>> > -  area = get_vm_area(PAGE_SIZE, VM_ALLOC);
>> > -  if (!area) {
>> > -  WARN_ONCE(1, "Failed to create text area for cpu %d\n",
>> > -  cpu);
>> > -  return -1;
>> > -  }
>> > -  this_cpu_write(text_poke_area, area);
>> > +  if (radix_enabled())
>> > +  return 0;
>> >  
>> > -  return 0;
>> > -}
>> > +  err = slb_allocate_user(patching_mm, patching_addr);
>> > +  if (err)
>> > +  pr_warn("map patch: failed to allocate slb entry\n");
>> >  
>> > -static int text_area_cpu_down(unsigned int cpu)
>> > -{
>> > -  free_vm_area(this_cpu_read(text_poke_area));
>> > -  return 0;
>> > +  err = hash_page_mm(patching_mm, patching_addr, pgprot_val(pgprot), 0,
>> > + HPTE_USE_KERNEL_KEY);
>> > +  if (err)
>> > +  pr_warn("map patch: failed to insert hashed page\n");
>> > +
>> > +  /* See comment in switch_slb() in mm/book3s64/slb.c */
>> > +  isync();
>>
>> I'm not sure if this is enough. Could we context switch here? You've
>> got the PTL so no with a normal kernel, but maybe yes with an RT kernel.
>> How about taking a machine check that clears the SLB? Could the HPTE
>> get removed by something else here?
> 
> All of this happens after a local_irq_save() which should at least
> prevent context switches IIUC.

Ah yeah I didn't look that far back. A machine check can take out SLB
entries.

> I am not sure what else could cause the
> HPTE to get removed here.

Other CPUs?

>> You want to prevent faults because you might be patching a fault
>> handler?
> 
> In a more general sense: I don't think we want to take page faults every
> time we patch an instruction with a STRICT_RWX kernel. The Hash MMU page
> fault handler codepath also checks `current->mm` in some places which
> won't match the temporary mm. Also `current->mm` can be NULL which
> caused problems in my earlier revisions of this series.

Hmm, that's a bit of a hack then. Maybe doing an actual mm switch and 
setting current->mm properly would explode too much. Maybe that's okayish.
But I can't see how the HPT code is up to the job of this in general 
(even if that current->mm issue was fixed).

To do it without holes you would either 

Re: [RESEND PATCH v4 08/11] powerpc: Initialize and use a temporary mm for patching

2021-07-01 Thread Christopher M. Riedl
On Thu Jul 1, 2021 at 1:12 AM CDT, Nicholas Piggin wrote:
> Excerpts from Christopher M. Riedl's message of May 6, 2021 2:34 pm:
> > When code patching a STRICT_KERNEL_RWX kernel the page containing the
> > address to be patched is temporarily mapped as writeable. Currently, a
> > per-cpu vmalloc patch area is used for this purpose. While the patch
> > area is per-cpu, the temporary page mapping is inserted into the kernel
> > page tables for the duration of patching. The mapping is exposed to CPUs
> > other than the patching CPU - this is undesirable from a hardening
> > perspective. Use a temporary mm instead which keeps the mapping local to
> > the CPU doing the patching.
> > 
> > Use the `poking_init` init hook to prepare a temporary mm and patching
> > address. Initialize the temporary mm by copying the init mm. Choose a
> > randomized patching address inside the temporary mm userspace address
> > space. The patching address is randomized between PAGE_SIZE and
> > DEFAULT_MAP_WINDOW-PAGE_SIZE. The upper limit is necessary due to how
> > the Book3s64 Hash MMU operates - by default the space above
> > DEFAULT_MAP_WINDOW is not available. For now, the patching address for
> > all platforms/MMUs is randomized inside this range.  The number of
> > possible random addresses is dependent on PAGE_SIZE and limited by
> > DEFAULT_MAP_WINDOW.
> > 
> > Bits of entropy with 64K page size on BOOK3S_64:
> > 
> > bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)
> > 
> > PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
> > bits of entropy = log2(128TB / 64K)
> > bits of entropy = 31
> > 
> > Randomization occurs only once during initialization at boot.
> > 
> > Introduce two new functions, map_patch() and unmap_patch(), to
> > respectively create and remove the temporary mapping with write
> > permissions at patching_addr. The Hash MMU on Book3s64 requires mapping
> > the page for patching with PAGE_SHARED since the kernel cannot access
> > userspace pages with the PAGE_PRIVILEGED (PAGE_KERNEL) bit set.
> > 
> > Also introduce hash_prefault_mapping() to preload the SLB entry and HPTE
> > for the patching_addr when using the Hash MMU on Book3s64 to avoid
> > taking an SLB and Hash fault during patching.
>
> What prevents the SLBE or HPTE from being removed before the last
> access?

This code runs with local IRQs disabled - we also don't access anything
else in userspace so I'm not sure what else could cause the entries to
be removed TBH.

>
>
> > +#ifdef CONFIG_PPC_BOOK3S_64
> > +
> > +static inline int hash_prefault_mapping(pgprot_t pgprot)
> >  {
> > -   struct vm_struct *area;
> > +   int err;
> >  
> > -   area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> > -   if (!area) {
> > -   WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> > -   cpu);
> > -   return -1;
> > -   }
> > -   this_cpu_write(text_poke_area, area);
> > +   if (radix_enabled())
> > +   return 0;
> >  
> > -   return 0;
> > -}
> > +   err = slb_allocate_user(patching_mm, patching_addr);
> > +   if (err)
> > +   pr_warn("map patch: failed to allocate slb entry\n");
> >  
> > -static int text_area_cpu_down(unsigned int cpu)
> > -{
> > -   free_vm_area(this_cpu_read(text_poke_area));
> > -   return 0;
> > +   err = hash_page_mm(patching_mm, patching_addr, pgprot_val(pgprot), 0,
> > +  HPTE_USE_KERNEL_KEY);
> > +   if (err)
> > +   pr_warn("map patch: failed to insert hashed page\n");
> > +
> > +   /* See comment in switch_slb() in mm/book3s64/slb.c */
> > +   isync();
>
> I'm not sure if this is enough. Could we context switch here? You've
> got the PTL so no with a normal kernel, but maybe yes with an RT kernel.
> How about taking a machine check that clears the SLB? Could the HPTE
> get removed by something else here?

All of this happens after a local_irq_save() which should at least
prevent context switches IIUC. I am not sure what else could cause the
HPTE to get removed here.

>
> You want to prevent faults because you might be patching a fault
> handler?

In a more general sense: I don't think we want to take page faults every
time we patch an instruction with a STRICT_RWX kernel. The Hash MMU page
fault handler codepath also checks `current->mm` in some places which
won't match the temporary mm. Also `current->mm` can be NULL which
caused problems in my earlier revisions of this series.

>
> Thanks,
> Nick



Re: [RESEND PATCH v4 08/11] powerpc: Initialize and use a temporary mm for patching

2021-07-01 Thread Nicholas Piggin
Excerpts from Christopher M. Riedl's message of May 6, 2021 2:34 pm:
> When code patching a STRICT_KERNEL_RWX kernel the page containing the
> address to be patched is temporarily mapped as writeable. Currently, a
> per-cpu vmalloc patch area is used for this purpose. While the patch
> area is per-cpu, the temporary page mapping is inserted into the kernel
> page tables for the duration of patching. The mapping is exposed to CPUs
> other than the patching CPU - this is undesirable from a hardening
> perspective. Use a temporary mm instead which keeps the mapping local to
> the CPU doing the patching.
> 
> Use the `poking_init` init hook to prepare a temporary mm and patching
> address. Initialize the temporary mm by copying the init mm. Choose a
> randomized patching address inside the temporary mm userspace address
> space. The patching address is randomized between PAGE_SIZE and
> DEFAULT_MAP_WINDOW-PAGE_SIZE. The upper limit is necessary due to how
> the Book3s64 Hash MMU operates - by default the space above
> DEFAULT_MAP_WINDOW is not available. For now, the patching address for
> all platforms/MMUs is randomized inside this range.  The number of
> possible random addresses is dependent on PAGE_SIZE and limited by
> DEFAULT_MAP_WINDOW.
> 
> Bits of entropy with 64K page size on BOOK3S_64:
> 
> bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)
> 
> PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
> bits of entropy = log2(128TB / 64K)
> bits of entropy = 31
> 
> Randomization occurs only once during initialization at boot.
> 
> Introduce two new functions, map_patch() and unmap_patch(), to
> respectively create and remove the temporary mapping with write
> permissions at patching_addr. The Hash MMU on Book3s64 requires mapping
> the page for patching with PAGE_SHARED since the kernel cannot access
> userspace pages with the PAGE_PRIVILEGED (PAGE_KERNEL) bit set.
> 
> Also introduce hash_prefault_mapping() to preload the SLB entry and HPTE
> for the patching_addr when using the Hash MMU on Book3s64 to avoid
> taking an SLB and Hash fault during patching.

What prevents the SLBE or HPTE from being removed before the last
access?


> +#ifdef CONFIG_PPC_BOOK3S_64
> +
> +static inline int hash_prefault_mapping(pgprot_t pgprot)
>  {
> - struct vm_struct *area;
> + int err;
>  
> - area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> - if (!area) {
> - WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> - cpu);
> - return -1;
> - }
> - this_cpu_write(text_poke_area, area);
> + if (radix_enabled())
> + return 0;
>  
> - return 0;
> -}
> + err = slb_allocate_user(patching_mm, patching_addr);
> + if (err)
> + pr_warn("map patch: failed to allocate slb entry\n");
>  
> -static int text_area_cpu_down(unsigned int cpu)
> -{
> - free_vm_area(this_cpu_read(text_poke_area));
> - return 0;
> + err = hash_page_mm(patching_mm, patching_addr, pgprot_val(pgprot), 0,
> +HPTE_USE_KERNEL_KEY);
> + if (err)
> + pr_warn("map patch: failed to insert hashed page\n");
> +
> + /* See comment in switch_slb() in mm/book3s64/slb.c */
> + isync();

I'm not sure if this is enough. Could we context switch here? You've
got the PTL so no with a normal kernel, but maybe yes with an RT kernel.
How about taking a machine check that clears the SLB? Could the HPTE
get removed by something else here?

You want to prevent faults because you might be patching a fault 
handler?

Thanks,
Nick


Re: [RESEND PATCH v4 08/11] powerpc: Initialize and use a temporary mm for patching

2021-06-30 Thread Christopher M. Riedl
On Sun Jun 20, 2021 at 10:19 PM CDT, Daniel Axtens wrote:
> Hi Chris,
>
> > +   /*
> > +* Choose a randomized, page-aligned address from the range:
> > +* [PAGE_SIZE, DEFAULT_MAP_WINDOW - PAGE_SIZE]
> > +* The lower address bound is PAGE_SIZE to avoid the zero-page.
> > +* The upper address bound is DEFAULT_MAP_WINDOW - PAGE_SIZE to stay
> > +* under DEFAULT_MAP_WINDOW with the Book3s64 Hash MMU.
> > +*/
> > +   patching_addr = PAGE_SIZE + ((get_random_long() & PAGE_MASK)
> > +   % (DEFAULT_MAP_WINDOW - 2 * PAGE_SIZE));
>
> I checked and poking_init() comes after the functions that init the RNG,
> so this should be fine. The maths - while a bit fiddly to reason about -
> does check out.

Thanks for double checking.

>
> > +
> > +   /*
> > +* PTE allocation uses GFP_KERNEL which means we need to pre-allocate
> > +* the PTE here. We cannot do the allocation during patching with IRQs
> > +* disabled (ie. "atomic" context).
> > +*/
> > +   ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
> > +   BUG_ON(!ptep);
> > +   pte_unmap_unlock(ptep, ptl);
> > +}
> >  
> >  #if IS_BUILTIN(CONFIG_LKDTM)
> >  unsigned long read_cpu_patching_addr(unsigned int cpu)
> >  {
> > -   return (unsigned long)(per_cpu(text_poke_area, cpu))->addr;
> > +   return patching_addr;
> >  }
> >  #endif
> >  
> > -static int text_area_cpu_up(unsigned int cpu)
> > +struct patch_mapping {
> > +   spinlock_t *ptl; /* for protecting pte table */
> > +   pte_t *ptep;
> > +   struct temp_mm temp_mm;
> > +};
> > +
> > +#ifdef CONFIG_PPC_BOOK3S_64
> > +
> > +static inline int hash_prefault_mapping(pgprot_t pgprot)
> >  {
> > -   struct vm_struct *area;
> > +   int err;
> >  
> > -   area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> > -   if (!area) {
> > -   WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> > -   cpu);
> > -   return -1;
> > -   }
> > -   this_cpu_write(text_poke_area, area);
> > +   if (radix_enabled())
> > +   return 0;
> >  
> > -   return 0;
> > -}
> > +   err = slb_allocate_user(patching_mm, patching_addr);
> > +   if (err)
> > +   pr_warn("map patch: failed to allocate slb entry\n");
> >  
>
> Here if slb_allocate_user() fails, you'll print a warning and then fall
> through to the rest of the function. You do return err, but there's a
> later call to hash_page_mm() that also sets err. Can slb_allocate_user()
> fail while hash_page_mm() succeeds, and would that be a problem?

Hmm, yes I think this is a problem. If slb_allocate_user() fails then we
could potentially mask that error until the actual patching
fails/miscompares later (and that *will* certainly fail in this case). I
will return the error and exit the function early in v5 of the series.
Thanks!
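
Roughly along these lines (an untested sketch of the early return, not the
final v5 code):

static inline int hash_prefault_mapping(pgprot_t pgprot)
{
        int err;

        if (radix_enabled())
                return 0;

        err = slb_allocate_user(patching_mm, patching_addr);
        if (err) {
                pr_warn("map patch: failed to allocate slb entry\n");
                /* Bail out early so a later success cannot mask this error */
                return err;
        }

        err = hash_page_mm(patching_mm, patching_addr, pgprot_val(pgprot), 0,
                           HPTE_USE_KERNEL_KEY);
        if (err)
                pr_warn("map patch: failed to insert hashed page\n");

        /* See comment in switch_slb() in mm/book3s64/slb.c */
        isync();

        return err;
}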

>
> > -static int text_area_cpu_down(unsigned int cpu)
> > -{
> > -   free_vm_area(this_cpu_read(text_poke_area));
> > -   return 0;
> > +   err = hash_page_mm(patching_mm, patching_addr, pgprot_val(pgprot), 0,
> > +  HPTE_USE_KERNEL_KEY);
> > +   if (err)
> > +   pr_warn("map patch: failed to insert hashed page\n");
> > +
> > +   /* See comment in switch_slb() in mm/book3s64/slb.c */
> > +   isync();
> > +
>
> The comment reads:
>
> /*
> * Synchronize slbmte preloads with possible subsequent user memory
> * address accesses by the kernel (user mode won't happen until
> * rfid, which is safe).
> */
> isync();
>
> I have to say having read the description of isync I'm not 100% sure why
> that's enough (don't we also need stores to complete?) but I'm happy to
> take commit 5434ae74629a ("powerpc/64s/hash: Add a SLB preload cache")
> on trust here!
>
> I think it does make sense for you to have that barrier here: you are
> potentially about to start poking at the memory mapped through that SLB
> entry so you should make sure you're fully synchronised.
>
> > +   return err;
> >  }
> >  
>
> > +   init_temp_mm(&patch_mapping->temp_mm, patching_mm);
> > +   use_temporary_mm(&patch_mapping->temp_mm);
> >  
> > -   pmdp = pmd_offset(pudp, addr);
> > -   if (unlikely(!pmdp))
> > -   return -EINVAL;
> > +   /*
> > +* On Book3s64 with the Hash MMU we have to manually insert the SLB
> > +* entry and HPTE to prevent taking faults on the patching_addr later.
> > +*/
> > +   return(hash_prefault_mapping(pgprot));
>
> hmm, `return hash_prefault_mapping(pgprot);` or
> `return (hash_prefault_mapping(pgprot));` maybe?

Yeah, I noticed I left the extra parentheses here after the RESEND. I
think this is left-over when I had another wrapper here... anyway, I'll
clean it up for v5.

>
> Kind regards,
> Daniel



Re: [RESEND PATCH v4 08/11] powerpc: Initialize and use a temporary mm for patching

2021-06-20 Thread Daniel Axtens
Hi Chris,

> + /*
> +  * Choose a randomized, page-aligned address from the range:
> +  * [PAGE_SIZE, DEFAULT_MAP_WINDOW - PAGE_SIZE]
> +  * The lower address bound is PAGE_SIZE to avoid the zero-page.
> +  * The upper address bound is DEFAULT_MAP_WINDOW - PAGE_SIZE to stay
> +  * under DEFAULT_MAP_WINDOW with the Book3s64 Hash MMU.
> +  */
> + patching_addr = PAGE_SIZE + ((get_random_long() & PAGE_MASK)
> + % (DEFAULT_MAP_WINDOW - 2 * PAGE_SIZE));

I checked and poking_init() comes after the functions that init the RNG,
so this should be fine. The maths - while a bit fiddly to reason about -
does check out.
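
For anyone following along, my back-of-the-envelope check (assuming
DEFAULT_MAP_WINDOW is a multiple of PAGE_SIZE, so the modulus preserves
page alignment):

  (get_random_long() & PAGE_MASK) % (DEFAULT_MAP_WINDOW - 2 * PAGE_SIZE)
      -> page aligned, in [0, DEFAULT_MAP_WINDOW - 3 * PAGE_SIZE]
  PAGE_SIZE + (the above)
      -> page aligned, in [PAGE_SIZE, DEFAULT_MAP_WINDOW - 2 * PAGE_SIZE]

so the zero page is skipped and patching_addr stays within the
[PAGE_SIZE, DEFAULT_MAP_WINDOW - PAGE_SIZE] range described in the comment.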

> +
> + /*
> +  * PTE allocation uses GFP_KERNEL which means we need to pre-allocate
> +  * the PTE here. We cannot do the allocation during patching with IRQs
> +  * disabled (ie. "atomic" context).
> +  */
> + ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
> + BUG_ON(!ptep);
> + pte_unmap_unlock(ptep, ptl);
> +}
>  
>  #if IS_BUILTIN(CONFIG_LKDTM)
>  unsigned long read_cpu_patching_addr(unsigned int cpu)
>  {
> - return (unsigned long)(per_cpu(text_poke_area, cpu))->addr;
> + return patching_addr;
>  }
>  #endif
>  
> -static int text_area_cpu_up(unsigned int cpu)
> +struct patch_mapping {
> + spinlock_t *ptl; /* for protecting pte table */
> + pte_t *ptep;
> + struct temp_mm temp_mm;
> +};
> +
> +#ifdef CONFIG_PPC_BOOK3S_64
> +
> +static inline int hash_prefault_mapping(pgprot_t pgprot)
>  {
> - struct vm_struct *area;
> + int err;
>  
> - area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> - if (!area) {
> - WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> - cpu);
> - return -1;
> - }
> - this_cpu_write(text_poke_area, area);
> + if (radix_enabled())
> + return 0;
>  
> - return 0;
> -}
> + err = slb_allocate_user(patching_mm, patching_addr);
> + if (err)
> + pr_warn("map patch: failed to allocate slb entry\n");
>  

Here if slb_allocate_user() fails, you'll print a warning and then fall
through to the rest of the function. You do return err, but there's a
later call to hash_page_mm() that also sets err. Can slb_allocate_user()
fail while hash_page_mm() succeeds, and would that be a problem?

> -static int text_area_cpu_down(unsigned int cpu)
> -{
> - free_vm_area(this_cpu_read(text_poke_area));
> - return 0;
> + err = hash_page_mm(patching_mm, patching_addr, pgprot_val(pgprot), 0,
> +HPTE_USE_KERNEL_KEY);
> + if (err)
> + pr_warn("map patch: failed to insert hashed page\n");
> +
> + /* See comment in switch_slb() in mm/book3s64/slb.c */
> + isync();
> +

The comment reads:

/*
 * Synchronize slbmte preloads with possible subsequent user memory
 * address accesses by the kernel (user mode won't happen until
 * rfid, which is safe).
 */
 isync();

I have to say having read the description of isync I'm not 100% sure why
that's enough (don't we also need stores to complete?) but I'm happy to
take commit 5434ae74629a ("powerpc/64s/hash: Add a SLB preload cache")
on trust here!

I think it does make sense for you to have that barrier here: you are
potentially about to start poking at the memory mapped through that SLB
entry so you should make sure you're fully synchronised.

> + return err;
>  }
>  

> + init_temp_mm(&patch_mapping->temp_mm, patching_mm);
> + use_temporary_mm(&patch_mapping->temp_mm);
>  
> - pmdp = pmd_offset(pudp, addr);
> - if (unlikely(!pmdp))
> - return -EINVAL;
> + /*
> +  * On Book3s64 with the Hash MMU we have to manually insert the SLB
> +  * entry and HPTE to prevent taking faults on the patching_addr later.
> +  */
> + return(hash_prefault_mapping(pgprot));

hmm, `return hash_prefault_mapping(pgprot);` or
`return (hash_prefault_mapping(pgprot));` maybe?

Kind regards,
Daniel


[RESEND PATCH v4 08/11] powerpc: Initialize and use a temporary mm for patching

2021-05-05 Thread Christopher M. Riedl
When code patching a STRICT_KERNEL_RWX kernel the page containing the
address to be patched is temporarily mapped as writeable. Currently, a
per-cpu vmalloc patch area is used for this purpose. While the patch
area is per-cpu, the temporary page mapping is inserted into the kernel
page tables for the duration of patching. The mapping is exposed to CPUs
other than the patching CPU - this is undesirable from a hardening
perspective. Use a temporary mm instead which keeps the mapping local to
the CPU doing the patching.

Use the `poking_init` init hook to prepare a temporary mm and patching
address. Initialize the temporary mm by copying the init mm. Choose a
randomized patching address inside the temporary mm userspace address
space. The patching address is randomized between PAGE_SIZE and
DEFAULT_MAP_WINDOW-PAGE_SIZE. The upper limit is necessary due to how
the Book3s64 Hash MMU operates - by default the space above
DEFAULT_MAP_WINDOW is not available. For now, the patching address for
all platforms/MMUs is randomized inside this range.  The number of
possible random addresses is dependent on PAGE_SIZE and limited by
DEFAULT_MAP_WINDOW.

Bits of entropy with 64K page size on BOOK3S_64:

bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)

PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
bits of entropy = log2(128TB / 64K)
bits of entropy = 31
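
Illustration only (not code in this patch), the same calculation with the
kernel's ilog2() helper:

        /* 128TB = 2^47, 64K = 2^16  =>  47 - 16 = 31 bits of entropy */
        ilog2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE) == 31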

Randomization occurs only once during initialization at boot.

Introduce two new functions, map_patch() and unmap_patch(), to
respectively create and remove the temporary mapping with write
permissions at patching_addr. The Hash MMU on Book3s64 requires mapping
the page for patching with PAGE_SHARED since the kernel cannot access
userspace pages with the PAGE_PRIVILEGED (PAGE_KERNEL) bit set.

Also introduce hash_prefault_mapping() to preload the SLB entry and HPTE
for the patching_addr when using the Hash MMU on Book3s64 to avoid
taking an SLB and Hash fault during patching.
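
For reference, map_patch() does roughly the following (simplified sketch;
the patch_mapping struct and temp-mm helpers follow the naming used in this
patch, other details may differ slightly from the diff below):

static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
{
        unsigned long pfn;
        pgprot_t pgprot;

        if (is_vmalloc_or_module_addr(addr))
                pfn = vmalloc_to_pfn(addr);
        else
                pfn = __pa_symbol(addr) >> PAGE_SHIFT;

        /* Hash cannot access PAGE_PRIVILEGED user pages; Radix can use PAGE_KERNEL */
        pgprot = radix_enabled() ? PAGE_KERNEL : PAGE_SHARED;

        patch_mapping->ptep = get_locked_pte(patching_mm, patching_addr,
                                             &patch_mapping->ptl);
        if (unlikely(!patch_mapping->ptep))
                return -ENOMEM;

        set_pte_at(patching_mm, patching_addr, patch_mapping->ptep,
                   pfn_pte(pfn, pgprot));

        init_temp_mm(&patch_mapping->temp_mm, patching_mm);
        use_temporary_mm(&patch_mapping->temp_mm);

        /* Hash MMU only: preload the SLB entry and HPTE so patching doesn't fault */
        return hash_prefault_mapping(pgprot);
}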

Since patching_addr is now a userspace address, lock/unlock KUAP on
non-Book3s64 platforms. On Book3s64 with a Radix MMU, mapping the page
with PAGE_KERNEL sets EAA[0] for the PTE which ignores the AMR (KUAP)
according to PowerISA v3.0b Figure 35. On Book3s64 with a Hash MMU, the
hash PTE for the mapping is inserted with HPTE_USE_KERNEL_KEY which
similarly avoids the need for switching KUAP.
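
Conceptually (sketch only, using the helpers from asm/kup.h; the exact
placement differs in the diff below), the write on non-Book3s64 ends up
bracketed like:

        allow_write_to_user((void __user *)patch_addr, sizeof(instr));
        err = __patch_instruction(exec_addr, instr, patch_addr);
        prevent_write_to_user((void __user *)patch_addr, sizeof(instr));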

Finally, add a new WARN_ON() to check that the instruction was patched
as intended after the temporary mapping is torn down.
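
That check amounts to something like (sketch):

        WARN_ON(!ppc_inst_equal(ppc_inst_read(exec_addr), instr));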

Based on x86 implementation:

commit 4fc19708b165
("x86/alternatives: Initialize temporary mm for patching")

and:

commit b3fd8e83ada0
("x86/alternatives: Use temporary mm for text poking")

Signed-off-by: Christopher M. Riedl 

---

v4:  * In the previous series this was two separate patches: one to init
   the temporary mm in poking_init() (unused in powerpc at the time)
   and the other to use it for patching (which removed all the
   per-cpu vmalloc code). Now that we use poking_init() in the
   existing per-cpu vmalloc approach, that separation doesn't work
   as nicely anymore so I just merged the two patches into one.
 * Preload the SLB entry and hash the page for the patching_addr
   when using Hash on book3s64 to avoid taking an SLB and Hash fault
   during patching. The previous implementation was a hack which
   changed current->mm to allow the SLB and Hash fault handlers to
   work with the temporary mm since both of those code-paths always
   assume mm == current->mm.
 * Also (hmm - seeing a trend here) with the book3s64 Hash MMU we
   have to manage the mm->context.active_cpus counter and mm cpumask
   since they determine (via mm_is_thread_local()) if the TLB flush
   in pte_clear() is local or not - it should always be local when
   we're using the temporary mm. On book3s64's Radix MMU we can
   just call local_flush_tlb_mm().
 * Use HPTE_USE_KERNEL_KEY on Hash to avoid costly lock/unlock of
   KUAP.
---
 arch/powerpc/lib/code-patching.c | 209 ++-
 1 file changed, 121 insertions(+), 88 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index cbdfba8a39360..7e15abc09ec04 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -11,6 +11,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -19,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int __patch_instruction(struct ppc_inst *exec_addr, struct ppc_inst instr,
   struct ppc_inst *patch_addr)
@@ -113,113 +116,142 @@ static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
}
 }
 
-static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
+static struct mm_struct *patching_mm __ro_after_init;
+static unsigned long patching_addr __ro_after_init;
+
+void __init poking_init(void)
+{
+   spinlock_t *ptl; /* for