Re: [RESEND PATCH v4 08/11] powerpc: Initialize and use a temporary mm for patching
On Thu Jul 1, 2021 at 2:51 AM CDT, Nicholas Piggin wrote:
> Excerpts from Christopher M. Riedl's message of July 1, 2021 5:02 pm:
> > On Thu Jul 1, 2021 at 1:12 AM CDT, Nicholas Piggin wrote:
> >> Excerpts from Christopher M. Riedl's message of May 6, 2021 2:34 pm:
> >> > When code patching a STRICT_KERNEL_RWX kernel the page containing the address to be patched is temporarily mapped as writeable. Currently, a per-cpu vmalloc patch area is used for this purpose. While the patch area is per-cpu, the temporary page mapping is inserted into the kernel page tables for the duration of patching. The mapping is exposed to CPUs other than the patching CPU - this is undesirable from a hardening perspective. Use a temporary mm instead which keeps the mapping local to the CPU doing the patching.
> >> >
> >> > Use the `poking_init` init hook to prepare a temporary mm and patching address. Initialize the temporary mm by copying the init mm. Choose a randomized patching address inside the temporary mm userspace address space. The patching address is randomized between PAGE_SIZE and DEFAULT_MAP_WINDOW-PAGE_SIZE. The upper limit is necessary due to how the Book3s64 Hash MMU operates - by default the space above DEFAULT_MAP_WINDOW is not available. For now, the patching address for all platforms/MMUs is randomized inside this range. The number of possible random addresses is dependent on PAGE_SIZE and limited by DEFAULT_MAP_WINDOW.
> >> >
> >> > Bits of entropy with 64K page size on BOOK3S_64:
> >> >
> >> > bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)
> >> >
> >> > PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
> >> > bits of entropy = log2(128TB / 64K)
> >> > bits of entropy = 31
> >> >
> >> > Randomization occurs only once during initialization at boot.
> >> >
> >> > Introduce two new functions, map_patch() and unmap_patch(), to respectively create and remove the temporary mapping with write permissions at patching_addr. The Hash MMU on Book3s64 requires mapping the page for patching with PAGE_SHARED since the kernel cannot access userspace pages with the PAGE_PRIVILEGED (PAGE_KERNEL) bit set.
> >> >
> >> > Also introduce hash_prefault_mapping() to preload the SLB entry and HPTE for the patching_addr when using the Hash MMU on Book3s64 to avoid taking an SLB and Hash fault during patching.
> >>
> >> What prevents the SLBE or HPTE from being removed before the last access?
> >
> > This code runs with local IRQs disabled - we also don't access anything else in userspace so I'm not sure what else could cause the entries to be removed TBH.
> >
> >> > +#ifdef CONFIG_PPC_BOOK3S_64
> >> > +
> >> > +static inline int hash_prefault_mapping(pgprot_t pgprot)
> >> >  {
> >> > -	struct vm_struct *area;
> >> > +	int err;
> >> >
> >> > -	area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> >> > -	if (!area) {
> >> > -		WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> >> > -			  cpu);
> >> > -		return -1;
> >> > -	}
> >> > -	this_cpu_write(text_poke_area, area);
> >> > +	if (radix_enabled())
> >> > +		return 0;
> >> >
> >> > -	return 0;
> >> > -}
> >> > +	err = slb_allocate_user(patching_mm, patching_addr);
> >> > +	if (err)
> >> > +		pr_warn("map patch: failed to allocate slb entry\n");
> >> >
> >> > -static int text_area_cpu_down(unsigned int cpu)
> >> > -{
> >> > -	free_vm_area(this_cpu_read(text_poke_area));
> >> > -	return 0;
> >> > +	err = hash_page_mm(patching_mm, patching_addr, pgprot_val(pgprot), 0,
> >> > +			   HPTE_USE_KERNEL_KEY);
> >> > +	if (err)
> >> > +		pr_warn("map patch: failed to insert hashed page\n");
> >> > +
> >> > +	/* See comment in switch_slb() in mm/book3s64/slb.c */
> >> > +	isync();
> >>
> >> I'm not sure if this is enough. Could we context switch here? You've got the PTL so no with a normal kernel, but maybe yes with an RT kernel. How about taking a machine check that clears the SLB? Could the HPTE get removed by something else here?
> >
> > All of this happens after a local_irq_save() which should at least prevent context switches IIUC.
>
> Ah yeah I didn't look that far back. A machine check can take out SLB entries.
>
> > I am not sure what else could cause the HPTE to get removed here.
>
> Other CPUs?

Right because the HPTEs are "global".

> >> You want to prevent faults because you might be patching a fault handler?
> >
> > In a more general sense: I don't think we want to take page faults every time we patch an instruction with a STRICT_RWX kernel. The Hash MMU page fault handler codepath also checks
Re: [RESEND PATCH v4 08/11] powerpc: Initialize and use a temporary mm for patching
Excerpts from Christopher M. Riedl's message of July 1, 2021 5:02 pm:
> On Thu Jul 1, 2021 at 1:12 AM CDT, Nicholas Piggin wrote:
>> Excerpts from Christopher M. Riedl's message of May 6, 2021 2:34 pm:
>> > When code patching a STRICT_KERNEL_RWX kernel the page containing the address to be patched is temporarily mapped as writeable. Currently, a per-cpu vmalloc patch area is used for this purpose. While the patch area is per-cpu, the temporary page mapping is inserted into the kernel page tables for the duration of patching. The mapping is exposed to CPUs other than the patching CPU - this is undesirable from a hardening perspective. Use a temporary mm instead which keeps the mapping local to the CPU doing the patching.
>> >
>> > Use the `poking_init` init hook to prepare a temporary mm and patching address. Initialize the temporary mm by copying the init mm. Choose a randomized patching address inside the temporary mm userspace address space. The patching address is randomized between PAGE_SIZE and DEFAULT_MAP_WINDOW-PAGE_SIZE. The upper limit is necessary due to how the Book3s64 Hash MMU operates - by default the space above DEFAULT_MAP_WINDOW is not available. For now, the patching address for all platforms/MMUs is randomized inside this range. The number of possible random addresses is dependent on PAGE_SIZE and limited by DEFAULT_MAP_WINDOW.
>> >
>> > Bits of entropy with 64K page size on BOOK3S_64:
>> >
>> > bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)
>> >
>> > PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
>> > bits of entropy = log2(128TB / 64K)
>> > bits of entropy = 31
>> >
>> > Randomization occurs only once during initialization at boot.
>> >
>> > Introduce two new functions, map_patch() and unmap_patch(), to respectively create and remove the temporary mapping with write permissions at patching_addr. The Hash MMU on Book3s64 requires mapping the page for patching with PAGE_SHARED since the kernel cannot access userspace pages with the PAGE_PRIVILEGED (PAGE_KERNEL) bit set.
>> >
>> > Also introduce hash_prefault_mapping() to preload the SLB entry and HPTE for the patching_addr when using the Hash MMU on Book3s64 to avoid taking an SLB and Hash fault during patching.
>>
>> What prevents the SLBE or HPTE from being removed before the last access?
>
> This code runs with local IRQs disabled - we also don't access anything else in userspace so I'm not sure what else could cause the entries to be removed TBH.
>
>> > +#ifdef CONFIG_PPC_BOOK3S_64
>> > +
>> > +static inline int hash_prefault_mapping(pgprot_t pgprot)
>> >  {
>> > -	struct vm_struct *area;
>> > +	int err;
>> >
>> > -	area = get_vm_area(PAGE_SIZE, VM_ALLOC);
>> > -	if (!area) {
>> > -		WARN_ONCE(1, "Failed to create text area for cpu %d\n",
>> > -			  cpu);
>> > -		return -1;
>> > -	}
>> > -	this_cpu_write(text_poke_area, area);
>> > +	if (radix_enabled())
>> > +		return 0;
>> >
>> > -	return 0;
>> > -}
>> > +	err = slb_allocate_user(patching_mm, patching_addr);
>> > +	if (err)
>> > +		pr_warn("map patch: failed to allocate slb entry\n");
>> >
>> > -static int text_area_cpu_down(unsigned int cpu)
>> > -{
>> > -	free_vm_area(this_cpu_read(text_poke_area));
>> > -	return 0;
>> > +	err = hash_page_mm(patching_mm, patching_addr, pgprot_val(pgprot), 0,
>> > +			   HPTE_USE_KERNEL_KEY);
>> > +	if (err)
>> > +		pr_warn("map patch: failed to insert hashed page\n");
>> > +
>> > +	/* See comment in switch_slb() in mm/book3s64/slb.c */
>> > +	isync();
>>
>> I'm not sure if this is enough. Could we context switch here? You've got the PTL so no with a normal kernel, but maybe yes with an RT kernel. How about taking a machine check that clears the SLB? Could the HPTE get removed by something else here?
>
> All of this happens after a local_irq_save() which should at least prevent context switches IIUC.

Ah yeah I didn't look that far back. A machine check can take out SLB entries.

> I am not sure what else could cause the HPTE to get removed here.

Other CPUs?

>> You want to prevent faults because you might be patching a fault handler?
>
> In a more general sense: I don't think we want to take page faults every time we patch an instruction with a STRICT_RWX kernel. The Hash MMU page fault handler codepath also checks `current->mm` in some places which won't match the temporary mm. Also `current->mm` can be NULL which caused problems in my earlier revisions of this series.

Hmm, that's a bit of a hack then. Maybe doing an actual mm switch and setting current->mm properly would explode too much. Maybe that's okayish. But I can't see how the HPT code is up to the job of this in general (even if that current->mm issue was fixed). To do it without holes you would either
Re: [RESEND PATCH v4 08/11] powerpc: Initialize and use a temporary mm for patching
On Thu Jul 1, 2021 at 1:12 AM CDT, Nicholas Piggin wrote:
> Excerpts from Christopher M. Riedl's message of May 6, 2021 2:34 pm:
> > When code patching a STRICT_KERNEL_RWX kernel the page containing the address to be patched is temporarily mapped as writeable. Currently, a per-cpu vmalloc patch area is used for this purpose. While the patch area is per-cpu, the temporary page mapping is inserted into the kernel page tables for the duration of patching. The mapping is exposed to CPUs other than the patching CPU - this is undesirable from a hardening perspective. Use a temporary mm instead which keeps the mapping local to the CPU doing the patching.
> >
> > Use the `poking_init` init hook to prepare a temporary mm and patching address. Initialize the temporary mm by copying the init mm. Choose a randomized patching address inside the temporary mm userspace address space. The patching address is randomized between PAGE_SIZE and DEFAULT_MAP_WINDOW-PAGE_SIZE. The upper limit is necessary due to how the Book3s64 Hash MMU operates - by default the space above DEFAULT_MAP_WINDOW is not available. For now, the patching address for all platforms/MMUs is randomized inside this range. The number of possible random addresses is dependent on PAGE_SIZE and limited by DEFAULT_MAP_WINDOW.
> >
> > Bits of entropy with 64K page size on BOOK3S_64:
> >
> > bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)
> >
> > PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
> > bits of entropy = log2(128TB / 64K)
> > bits of entropy = 31
> >
> > Randomization occurs only once during initialization at boot.
> >
> > Introduce two new functions, map_patch() and unmap_patch(), to respectively create and remove the temporary mapping with write permissions at patching_addr. The Hash MMU on Book3s64 requires mapping the page for patching with PAGE_SHARED since the kernel cannot access userspace pages with the PAGE_PRIVILEGED (PAGE_KERNEL) bit set.
> >
> > Also introduce hash_prefault_mapping() to preload the SLB entry and HPTE for the patching_addr when using the Hash MMU on Book3s64 to avoid taking an SLB and Hash fault during patching.
>
> What prevents the SLBE or HPTE from being removed before the last access?

This code runs with local IRQs disabled - we also don't access anything else in userspace so I'm not sure what else could cause the entries to be removed TBH.

> > +#ifdef CONFIG_PPC_BOOK3S_64
> > +
> > +static inline int hash_prefault_mapping(pgprot_t pgprot)
> >  {
> > -	struct vm_struct *area;
> > +	int err;
> >
> > -	area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> > -	if (!area) {
> > -		WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> > -			  cpu);
> > -		return -1;
> > -	}
> > -	this_cpu_write(text_poke_area, area);
> > +	if (radix_enabled())
> > +		return 0;
> >
> > -	return 0;
> > -}
> > +	err = slb_allocate_user(patching_mm, patching_addr);
> > +	if (err)
> > +		pr_warn("map patch: failed to allocate slb entry\n");
> >
> > -static int text_area_cpu_down(unsigned int cpu)
> > -{
> > -	free_vm_area(this_cpu_read(text_poke_area));
> > -	return 0;
> > +	err = hash_page_mm(patching_mm, patching_addr, pgprot_val(pgprot), 0,
> > +			   HPTE_USE_KERNEL_KEY);
> > +	if (err)
> > +		pr_warn("map patch: failed to insert hashed page\n");
> > +
> > +	/* See comment in switch_slb() in mm/book3s64/slb.c */
> > +	isync();
>
> I'm not sure if this is enough. Could we context switch here? You've got the PTL so no with a normal kernel, but maybe yes with an RT kernel. How about taking a machine check that clears the SLB? Could the HPTE get removed by something else here?

All of this happens after a local_irq_save() which should at least prevent context switches IIUC. I am not sure what else could cause the HPTE to get removed here.

> You want to prevent faults because you might be patching a fault handler?

In a more general sense: I don't think we want to take page faults every time we patch an instruction with a STRICT_RWX kernel. The Hash MMU page fault handler codepath also checks `current->mm` in some places which won't match the temporary mm. Also `current->mm` can be NULL which caused problems in my earlier revisions of this series.

> Thanks,
> Nick
Re: [RESEND PATCH v4 08/11] powerpc: Initialize and use a temporary mm for patching
Excerpts from Christopher M. Riedl's message of May 6, 2021 2:34 pm:
> When code patching a STRICT_KERNEL_RWX kernel the page containing the address to be patched is temporarily mapped as writeable. Currently, a per-cpu vmalloc patch area is used for this purpose. While the patch area is per-cpu, the temporary page mapping is inserted into the kernel page tables for the duration of patching. The mapping is exposed to CPUs other than the patching CPU - this is undesirable from a hardening perspective. Use a temporary mm instead which keeps the mapping local to the CPU doing the patching.
>
> Use the `poking_init` init hook to prepare a temporary mm and patching address. Initialize the temporary mm by copying the init mm. Choose a randomized patching address inside the temporary mm userspace address space. The patching address is randomized between PAGE_SIZE and DEFAULT_MAP_WINDOW-PAGE_SIZE. The upper limit is necessary due to how the Book3s64 Hash MMU operates - by default the space above DEFAULT_MAP_WINDOW is not available. For now, the patching address for all platforms/MMUs is randomized inside this range. The number of possible random addresses is dependent on PAGE_SIZE and limited by DEFAULT_MAP_WINDOW.
>
> Bits of entropy with 64K page size on BOOK3S_64:
>
> bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)
>
> PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
> bits of entropy = log2(128TB / 64K)
> bits of entropy = 31
>
> Randomization occurs only once during initialization at boot.
>
> Introduce two new functions, map_patch() and unmap_patch(), to respectively create and remove the temporary mapping with write permissions at patching_addr. The Hash MMU on Book3s64 requires mapping the page for patching with PAGE_SHARED since the kernel cannot access userspace pages with the PAGE_PRIVILEGED (PAGE_KERNEL) bit set.
>
> Also introduce hash_prefault_mapping() to preload the SLB entry and HPTE for the patching_addr when using the Hash MMU on Book3s64 to avoid taking an SLB and Hash fault during patching.

What prevents the SLBE or HPTE from being removed before the last access?

> +#ifdef CONFIG_PPC_BOOK3S_64
> +
> +static inline int hash_prefault_mapping(pgprot_t pgprot)
>  {
> -	struct vm_struct *area;
> +	int err;
>
> -	area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> -	if (!area) {
> -		WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> -			  cpu);
> -		return -1;
> -	}
> -	this_cpu_write(text_poke_area, area);
> +	if (radix_enabled())
> +		return 0;
>
> -	return 0;
> -}
> +	err = slb_allocate_user(patching_mm, patching_addr);
> +	if (err)
> +		pr_warn("map patch: failed to allocate slb entry\n");
>
> -static int text_area_cpu_down(unsigned int cpu)
> -{
> -	free_vm_area(this_cpu_read(text_poke_area));
> -	return 0;
> +	err = hash_page_mm(patching_mm, patching_addr, pgprot_val(pgprot), 0,
> +			   HPTE_USE_KERNEL_KEY);
> +	if (err)
> +		pr_warn("map patch: failed to insert hashed page\n");
> +
> +	/* See comment in switch_slb() in mm/book3s64/slb.c */
> +	isync();

I'm not sure if this is enough. Could we context switch here? You've got the PTL so no with a normal kernel, but maybe yes with an RT kernel. How about taking a machine check that clears the SLB? Could the HPTE get removed by something else here?

You want to prevent faults because you might be patching a fault handler?

Thanks,
Nick
Re: [RESEND PATCH v4 08/11] powerpc: Initialize and use a temporary mm for patching
On Sun Jun 20, 2021 at 10:19 PM CDT, Daniel Axtens wrote:
> Hi Chris,
>
> > +	/*
> > +	 * Choose a randomized, page-aligned address from the range:
> > +	 * [PAGE_SIZE, DEFAULT_MAP_WINDOW - PAGE_SIZE]
> > +	 * The lower address bound is PAGE_SIZE to avoid the zero-page.
> > +	 * The upper address bound is DEFAULT_MAP_WINDOW - PAGE_SIZE to stay
> > +	 * under DEFAULT_MAP_WINDOW with the Book3s64 Hash MMU.
> > +	 */
> > +	patching_addr = PAGE_SIZE + ((get_random_long() & PAGE_MASK)
> > +		% (DEFAULT_MAP_WINDOW - 2 * PAGE_SIZE));
>
> I checked and poking_init() comes after the functions that init the RNG, so this should be fine. The maths - while a bit fiddly to reason about - does check out.

Thanks for double checking.

> > +
> > +	/*
> > +	 * PTE allocation uses GFP_KERNEL which means we need to pre-allocate
> > +	 * the PTE here. We cannot do the allocation during patching with IRQs
> > +	 * disabled (ie. "atomic" context).
> > +	 */
> > +	ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
> > +	BUG_ON(!ptep);
> > +	pte_unmap_unlock(ptep, ptl);
> > +}
> >
> >  #if IS_BUILTIN(CONFIG_LKDTM)
> >  unsigned long read_cpu_patching_addr(unsigned int cpu)
> >  {
> > -	return (unsigned long)(per_cpu(text_poke_area, cpu))->addr;
> > +	return patching_addr;
> >  }
> >  #endif
> >
> > -static int text_area_cpu_up(unsigned int cpu)
> > +struct patch_mapping {
> > +	spinlock_t *ptl; /* for protecting pte table */
> > +	pte_t *ptep;
> > +	struct temp_mm temp_mm;
> > +};
> > +
> > +#ifdef CONFIG_PPC_BOOK3S_64
> > +
> > +static inline int hash_prefault_mapping(pgprot_t pgprot)
> >  {
> > -	struct vm_struct *area;
> > +	int err;
> >
> > -	area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> > -	if (!area) {
> > -		WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> > -			  cpu);
> > -		return -1;
> > -	}
> > -	this_cpu_write(text_poke_area, area);
> > +	if (radix_enabled())
> > +		return 0;
> >
> > -	return 0;
> > -}
> > +	err = slb_allocate_user(patching_mm, patching_addr);
> > +	if (err)
> > +		pr_warn("map patch: failed to allocate slb entry\n");
> >
>
> Here if slb_allocate_user() fails, you'll print a warning and then fall through to the rest of the function. You do return err, but there's a later call to hash_page_mm() that also sets err. Can slb_allocate_user() fail while hash_page_mm() succeeds, and would that be a problem?

Hmm, yes I think this is a problem. If slb_allocate_user() fails then we could potentially mask that error until the actual patching fails/miscompares later (and that *will* certainly fail in this case). I will return the error and exit the function early in v5 of the series. Thanks!

> > -static int text_area_cpu_down(unsigned int cpu)
> > -{
> > -	free_vm_area(this_cpu_read(text_poke_area));
> > -	return 0;
> > +	err = hash_page_mm(patching_mm, patching_addr, pgprot_val(pgprot), 0,
> > +			   HPTE_USE_KERNEL_KEY);
> > +	if (err)
> > +		pr_warn("map patch: failed to insert hashed page\n");
> > +
> > +	/* See comment in switch_slb() in mm/book3s64/slb.c */
> > +	isync();
>
> The comment reads:
>
> /*
>  * Synchronize slbmte preloads with possible subsequent user memory
>  * address accesses by the kernel (user mode won't happen until
>  * rfid, which is safe).
>  */
> isync();
>
> I have to say having read the description of isync I'm not 100% sure why that's enough (don't we also need stores to complete?) but I'm happy to take commit 5434ae74629a ("powerpc/64s/hash: Add a SLB preload cache") on trust here!
>
> I think it does make sense for you to have that barrier here: you are potentially about to start poking at the memory mapped through that SLB entry so you should make sure you're fully synchronised.
>
> > +	return err;
> >  }
> >
> > +	init_temp_mm(&patch_mapping->temp_mm, patching_mm);
> > +	use_temporary_mm(&patch_mapping->temp_mm);
> >
> > -	pmdp = pmd_offset(pudp, addr);
> > -	if (unlikely(!pmdp))
> > -		return -EINVAL;
> > +	/*
> > +	 * On Book3s64 with the Hash MMU we have to manually insert the SLB
> > +	 * entry and HPTE to prevent taking faults on the patching_addr later.
> > +	 */
> > +	return(hash_prefault_mapping(pgprot));
>
> hmm, `return hash_prefault_mapping(pgprot);` or `return (hash_prefault_mapping(pgprot));` maybe?

Yeah, I noticed I left the extra parentheses here after the RESEND. I think this is left-over when I had another wrapper here... anyway, I'll clean it up for v5.

> Kind regards,
> Daniel
Re: [RESEND PATCH v4 08/11] powerpc: Initialize and use a temporary mm for patching
Hi Chris,

> +	/*
> +	 * Choose a randomized, page-aligned address from the range:
> +	 * [PAGE_SIZE, DEFAULT_MAP_WINDOW - PAGE_SIZE]
> +	 * The lower address bound is PAGE_SIZE to avoid the zero-page.
> +	 * The upper address bound is DEFAULT_MAP_WINDOW - PAGE_SIZE to stay
> +	 * under DEFAULT_MAP_WINDOW with the Book3s64 Hash MMU.
> +	 */
> +	patching_addr = PAGE_SIZE + ((get_random_long() & PAGE_MASK)
> +		% (DEFAULT_MAP_WINDOW - 2 * PAGE_SIZE));

I checked and poking_init() comes after the functions that init the RNG, so this should be fine. The maths - while a bit fiddly to reason about - does check out.

> +
> +	/*
> +	 * PTE allocation uses GFP_KERNEL which means we need to pre-allocate
> +	 * the PTE here. We cannot do the allocation during patching with IRQs
> +	 * disabled (ie. "atomic" context).
> +	 */
> +	ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
> +	BUG_ON(!ptep);
> +	pte_unmap_unlock(ptep, ptl);
> +}
>
>  #if IS_BUILTIN(CONFIG_LKDTM)
>  unsigned long read_cpu_patching_addr(unsigned int cpu)
>  {
> -	return (unsigned long)(per_cpu(text_poke_area, cpu))->addr;
> +	return patching_addr;
>  }
>  #endif
>
> -static int text_area_cpu_up(unsigned int cpu)
> +struct patch_mapping {
> +	spinlock_t *ptl; /* for protecting pte table */
> +	pte_t *ptep;
> +	struct temp_mm temp_mm;
> +};
> +
> +#ifdef CONFIG_PPC_BOOK3S_64
> +
> +static inline int hash_prefault_mapping(pgprot_t pgprot)
>  {
> -	struct vm_struct *area;
> +	int err;
>
> -	area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> -	if (!area) {
> -		WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> -			  cpu);
> -		return -1;
> -	}
> -	this_cpu_write(text_poke_area, area);
> +	if (radix_enabled())
> +		return 0;
>
> -	return 0;
> -}
> +	err = slb_allocate_user(patching_mm, patching_addr);
> +	if (err)
> +		pr_warn("map patch: failed to allocate slb entry\n");
>

Here if slb_allocate_user() fails, you'll print a warning and then fall through to the rest of the function. You do return err, but there's a later call to hash_page_mm() that also sets err. Can slb_allocate_user() fail while hash_page_mm() succeeds, and would that be a problem?

> -static int text_area_cpu_down(unsigned int cpu)
> -{
> -	free_vm_area(this_cpu_read(text_poke_area));
> -	return 0;
> +	err = hash_page_mm(patching_mm, patching_addr, pgprot_val(pgprot), 0,
> +			   HPTE_USE_KERNEL_KEY);
> +	if (err)
> +		pr_warn("map patch: failed to insert hashed page\n");
> +
> +	/* See comment in switch_slb() in mm/book3s64/slb.c */
> +	isync();
> +

The comment reads:

/*
 * Synchronize slbmte preloads with possible subsequent user memory
 * address accesses by the kernel (user mode won't happen until
 * rfid, which is safe).
 */
isync();

I have to say having read the description of isync I'm not 100% sure why that's enough (don't we also need stores to complete?) but I'm happy to take commit 5434ae74629a ("powerpc/64s/hash: Add a SLB preload cache") on trust here!

I think it does make sense for you to have that barrier here: you are potentially about to start poking at the memory mapped through that SLB entry so you should make sure you're fully synchronised.

> +	return err;
>  }
>
> +	init_temp_mm(&patch_mapping->temp_mm, patching_mm);
> +	use_temporary_mm(&patch_mapping->temp_mm);
>
> -	pmdp = pmd_offset(pudp, addr);
> -	if (unlikely(!pmdp))
> -		return -EINVAL;
> +	/*
> +	 * On Book3s64 with the Hash MMU we have to manually insert the SLB
> +	 * entry and HPTE to prevent taking faults on the patching_addr later.
> +	 */
> +	return(hash_prefault_mapping(pgprot));

hmm, `return hash_prefault_mapping(pgprot);` or `return (hash_prefault_mapping(pgprot));` maybe?

Kind regards,
Daniel
[RESEND PATCH v4 08/11] powerpc: Initialize and use a temporary mm for patching
When code patching a STRICT_KERNEL_RWX kernel the page containing the address to be patched is temporarily mapped as writeable. Currently, a per-cpu vmalloc patch area is used for this purpose. While the patch area is per-cpu, the temporary page mapping is inserted into the kernel page tables for the duration of patching. The mapping is exposed to CPUs other than the patching CPU - this is undesirable from a hardening perspective. Use a temporary mm instead which keeps the mapping local to the CPU doing the patching.

Use the `poking_init` init hook to prepare a temporary mm and patching address. Initialize the temporary mm by copying the init mm. Choose a randomized patching address inside the temporary mm userspace address space. The patching address is randomized between PAGE_SIZE and DEFAULT_MAP_WINDOW-PAGE_SIZE. The upper limit is necessary due to how the Book3s64 Hash MMU operates - by default the space above DEFAULT_MAP_WINDOW is not available. For now, the patching address for all platforms/MMUs is randomized inside this range. The number of possible random addresses is dependent on PAGE_SIZE and limited by DEFAULT_MAP_WINDOW.

Bits of entropy with 64K page size on BOOK3S_64:

bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)

PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
bits of entropy = log2(128TB / 64K)
bits of entropy = 31

Randomization occurs only once during initialization at boot.

Introduce two new functions, map_patch() and unmap_patch(), to respectively create and remove the temporary mapping with write permissions at patching_addr. The Hash MMU on Book3s64 requires mapping the page for patching with PAGE_SHARED since the kernel cannot access userspace pages with the PAGE_PRIVILEGED (PAGE_KERNEL) bit set.

Also introduce hash_prefault_mapping() to preload the SLB entry and HPTE for the patching_addr when using the Hash MMU on Book3s64 to avoid taking an SLB and Hash fault during patching.
Since patching_addr is now a userspace address, lock/unlock KUAP on non-Book3s64 platforms. On Book3s64 with a Radix MMU, mapping the page with PAGE_KERNEL sets EAA[0] for the PTE which ignores the AMR (KUAP) according to PowerISA v3.0b Figure 35. On Book3s64 with a Hash MMU, the hash PTE for the mapping is inserted with HPTE_USE_KERNEL_KEY which similarly avoids the need for switching KUAP.

Finally, add a new WARN_ON() to check that the instruction was patched as intended after the temporary mapping is torn down.

Based on x86 implementation:

commit 4fc19708b165 ("x86/alternatives: Initialize temporary mm for patching")

and:

commit b3fd8e83ada0 ("x86/alternatives: Use temporary mm for text poking")

Signed-off-by: Christopher M. Riedl

---

v4:
* In the previous series this was two separate patches: one to init the temporary mm in poking_init() (unused in powerpc at the time) and the other to use it for patching (which removed all the per-cpu vmalloc code). Now that we use poking_init() in the existing per-cpu vmalloc approach, that separation doesn't work as nicely anymore so I just merged the two patches into one.
* Preload the SLB entry and hash the page for the patching_addr when using Hash on book3s64 to avoid taking an SLB and Hash fault during patching. The previous implementation was a hack which changed current->mm to allow the SLB and Hash fault handlers to work with the temporary mm since both of those code-paths always assume mm == current->mm.
* Also (hmm - seeing a trend here) with the book3s64 Hash MMU we have to manage the mm->context.active_cpus counter and mm cpumask since they determine (via mm_is_thread_local()) if the TLB flush in pte_clear() is local or not - it should always be local when we're using the temporary mm. On book3s64's Radix MMU we can just call local_flush_tlb_mm().
* Use HPTE_USE_KERNEL_KEY on Hash to avoid costly lock/unlock of KUAP.
---
 arch/powerpc/lib/code-patching.c | 209 ++-
 1 file changed, 121 insertions(+), 88 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index cbdfba8a39360..7e15abc09ec04 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -11,6 +11,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
@@ -19,6 +21,7 @@
 #include
 #include
 #include
+#include

 static int __patch_instruction(struct ppc_inst *exec_addr, struct ppc_inst instr,
			       struct ppc_inst *patch_addr)
@@ -113,113 +116,142 @@ static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
 }

-static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
+static struct mm_struct *patching_mm __ro_after_init;
+static unsigned long patching_addr __ro_after_init;
+
+void __init poking_init(void)
+{
+	spinlock_t *ptl; /* for