Re: [PATCH v2] powerpc/mm: Fix set_memory_*() against concurrent accesses

2021-08-27 Thread Michael Ellerman
On Wed, 18 Aug 2021 22:05:18 +1000, Michael Ellerman wrote:
> Laurent reported that STRICT_MODULE_RWX was causing intermittent crashes
> on one of his systems:
> 
>   kernel tried to execute exec-protected page (c00804073278) - exploit attempt? (uid: 0)
>   BUG: Unable to handle kernel instruction fetch
>   Faulting instruction address: 0xc00804073278
>   Oops: Kernel access of bad area, sig: 11 [#1]
>   LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
>   Modules linked in: drm virtio_console fuse drm_panel_orientation_quirks ...
>   CPU: 3 PID: 44 Comm: kworker/3:1 Not tainted 5.14.0-rc4+ #12
>   Workqueue: events control_work_handler [virtio_console]
>   NIP:  c00804073278 LR: c00804073278 CTR: c01e9de0
>   REGS: c0002e4ef7e0 TRAP: 0400   Not tainted  (5.14.0-rc4+)
>   MSR:  80004280b033   CR: 24002822 XER: 200400cf
>   ...
>   NIP fill_queue+0xf0/0x210 [virtio_console]
>   LR  fill_queue+0xf0/0x210 [virtio_console]
>   Call Trace:
> fill_queue+0xb4/0x210 [virtio_console] (unreliable)
> add_port+0x1a8/0x470 [virtio_console]
> control_work_handler+0xbc/0x1e8 [virtio_console]
> process_one_work+0x290/0x590
> worker_thread+0x88/0x620
> kthread+0x194/0x1a0
> ret_from_kernel_thread+0x5c/0x64
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/mm: Fix set_memory_*() against concurrent accesses
  https://git.kernel.org/powerpc/c/9f7853d7609d59172eecfc5e7ccf503bc1b690bd

cheers


Re: [PATCH v2] powerpc/mm: Fix set_memory_*() against concurrent accesses

2021-08-18 Thread Laurent Vivier
On 18/08/2021 14:05, Michael Ellerman wrote:
> Laurent reported that STRICT_MODULE_RWX was causing intermittent crashes
> on one of his systems:
> 
>   kernel tried to execute exec-protected page (c00804073278) - exploit attempt? (uid: 0)
>   BUG: Unable to handle kernel instruction fetch
>   Faulting instruction address: 0xc00804073278
>   Oops: Kernel access of bad area, sig: 11 [#1]
>   LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
>   Modules linked in: drm virtio_console fuse drm_panel_orientation_quirks ...
>   CPU: 3 PID: 44 Comm: kworker/3:1 Not tainted 5.14.0-rc4+ #12
>   Workqueue: events control_work_handler [virtio_console]
>   NIP:  c00804073278 LR: c00804073278 CTR: c01e9de0
>   REGS: c0002e4ef7e0 TRAP: 0400   Not tainted  (5.14.0-rc4+)
>   MSR:  80004280b033   CR: 24002822 XER: 200400cf
>   ...
>   NIP fill_queue+0xf0/0x210 [virtio_console]
>   LR  fill_queue+0xf0/0x210 [virtio_console]
>   Call Trace:
> fill_queue+0xb4/0x210 [virtio_console] (unreliable)
> add_port+0x1a8/0x470 [virtio_console]
> control_work_handler+0xbc/0x1e8 [virtio_console]
> process_one_work+0x290/0x590
> worker_thread+0x88/0x620
> kthread+0x194/0x1a0
> ret_from_kernel_thread+0x5c/0x64
> 
> Jordan, Fabiano & Murilo were able to reproduce and identify that the
> problem is caused by the call to module_enable_ro() in do_init_module(),
> which happens after the module's init function has already been called.
> 
> Our current implementation of change_page_attr() is not safe against
> concurrent accesses, because it invalidates the PTE before flushing the
> TLB and then installing the new PTE. That leaves a window in time where
> there is no valid PTE for the page; if another CPU tries to access the
> page at that time, we see something like the fault above.
> 
> We can't simply switch to set_pte_at()/flush TLB, because our hash MMU
> code doesn't handle a set_pte_at() of a valid PTE. See [1].
> 
> But we do have pte_update(), which replaces the old PTE with the new,
> meaning there's no window where the PTE is invalid. And the hash MMU
> version hash__pte_update() deals with synchronising the hash page table
> correctly.
> 
> [1]: https://lore.kernel.org/linuxppc-dev/87y318wp9r@linux.ibm.com/
> 
> Fixes: 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines")
> Reported-by: Laurent Vivier 
> Signed-off-by: Fabiano Rosas 
> Signed-off-by: Michael Ellerman 
> ---
>  arch/powerpc/mm/pageattr.c | 23 ++-
>  1 file changed, 10 insertions(+), 13 deletions(-)
> 
> v2: Use pte_update(..., ~0, pte_val(pte), ...) as suggested by Fabiano,
> and ptep_get() as suggested by Christophe.
> 
> diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
> index 0876216ceee6..edea388e9d3f 100644
> --- a/arch/powerpc/mm/pageattr.c
> +++ b/arch/powerpc/mm/pageattr.c
> @@ -18,16 +18,12 @@
>  /*
>   * Updates the attributes of a page in three steps:
>   *
> - * 1. invalidate the page table entry
> - * 2. flush the TLB
> - * 3. install the new entry with the updated attributes
> - *
> - * Invalidating the pte means there are situations where this will not work
> - * when in theory it should.
> - * For example:
> - * - removing write from page whilst it is being executed
> - * - setting a page read-only whilst it is being read by another CPU
> + * 1. take the page_table_lock
> + * 2. install the new entry with the updated attributes
> + * 3. flush the TLB
>   *
> + * This sequence is safe against concurrent updates, and also allows updating the
> + * attributes of a page currently being executed or accessed.
>   */
>  static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
>  {
> @@ -36,9 +32,7 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
>  
>  	spin_lock(&init_mm.page_table_lock);
>  
> -	/* invalidate the PTE so it's safe to modify */
> -	pte = ptep_get_and_clear(&init_mm, addr, ptep);
> -	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> +	pte = ptep_get(ptep);
>  
>  	/* modify the PTE bits as desired, then apply */
>  	switch (action) {
> @@ -59,11 +53,14 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
>  		break;
>  	}
>  
> -	set_pte_at(&init_mm, addr, ptep, pte);
> +	pte_update(&init_mm, addr, ptep, ~0UL, pte_val(pte), 0);
>  
>  	/* See ptesync comment in radix__set_pte_at() */
>  	if (radix_enabled())
>  		asm volatile("ptesync": : :"memory");
> +
> +	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> +
>  	spin_unlock(&init_mm.page_table_lock);
>  
>  	return 0;
> 
> base-commit: cbc06f051c524dcfe52ef0d1f30647828e226d30
> 

Tested-by: Laurent Vivier 



Re: [PATCH v2] powerpc/mm: Fix set_memory_*() against concurrent accesses

2021-08-18 Thread Murilo Opsfelder Araújo

On 8/18/21 9:05 AM, Michael Ellerman wrote:

Laurent reported that STRICT_MODULE_RWX was causing intermittent crashes
on one of his systems:

   kernel tried to execute exec-protected page (c00804073278) - exploit attempt? (uid: 0)
   BUG: Unable to handle kernel instruction fetch
   Faulting instruction address: 0xc00804073278
   Oops: Kernel access of bad area, sig: 11 [#1]
   LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
   Modules linked in: drm virtio_console fuse drm_panel_orientation_quirks ...
   CPU: 3 PID: 44 Comm: kworker/3:1 Not tainted 5.14.0-rc4+ #12
   Workqueue: events control_work_handler [virtio_console]
   NIP:  c00804073278 LR: c00804073278 CTR: c01e9de0
   REGS: c0002e4ef7e0 TRAP: 0400   Not tainted  (5.14.0-rc4+)
   MSR:  80004280b033   CR: 24002822 XER: 200400cf
   ...
   NIP fill_queue+0xf0/0x210 [virtio_console]
   LR  fill_queue+0xf0/0x210 [virtio_console]
   Call Trace:
 fill_queue+0xb4/0x210 [virtio_console] (unreliable)
 add_port+0x1a8/0x470 [virtio_console]
 control_work_handler+0xbc/0x1e8 [virtio_console]
 process_one_work+0x290/0x590
 worker_thread+0x88/0x620
 kthread+0x194/0x1a0
 ret_from_kernel_thread+0x5c/0x64

Jordan, Fabiano & Murilo were able to reproduce and identify that the
problem is caused by the call to module_enable_ro() in do_init_module(),
which happens after the module's init function has already been called.

Our current implementation of change_page_attr() is not safe against
concurrent accesses, because it invalidates the PTE before flushing the
TLB and then installing the new PTE. That leaves a window in time where
there is no valid PTE for the page; if another CPU tries to access the
page at that time, we see something like the fault above.

We can't simply switch to set_pte_at()/flush TLB, because our hash MMU
code doesn't handle a set_pte_at() of a valid PTE. See [1].

But we do have pte_update(), which replaces the old PTE with the new,
meaning there's no window where the PTE is invalid. And the hash MMU
version hash__pte_update() deals with synchronising the hash page table
correctly.

[1]: https://lore.kernel.org/linuxppc-dev/87y318wp9r@linux.ibm.com/

Fixes: 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines")
Reported-by: Laurent Vivier 
Signed-off-by: Fabiano Rosas 
Signed-off-by: Michael Ellerman 


Reviewed-by: Murilo Opsfelder Araújo 


---
  arch/powerpc/mm/pageattr.c | 23 ++-
  1 file changed, 10 insertions(+), 13 deletions(-)

v2: Use pte_update(..., ~0, pte_val(pte), ...) as suggested by Fabiano,
 and ptep_get() as suggested by Christophe.

diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
index 0876216ceee6..edea388e9d3f 100644
--- a/arch/powerpc/mm/pageattr.c
+++ b/arch/powerpc/mm/pageattr.c
@@ -18,16 +18,12 @@
 /*
  * Updates the attributes of a page in three steps:
  *
- * 1. invalidate the page table entry
- * 2. flush the TLB
- * 3. install the new entry with the updated attributes
- *
- * Invalidating the pte means there are situations where this will not work
- * when in theory it should.
- * For example:
- * - removing write from page whilst it is being executed
- * - setting a page read-only whilst it is being read by another CPU
+ * 1. take the page_table_lock
+ * 2. install the new entry with the updated attributes
+ * 3. flush the TLB
  *
+ * This sequence is safe against concurrent updates, and also allows updating the
+ * attributes of a page currently being executed or accessed.
  */
 static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
 {
@@ -36,9 +32,7 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
 
 	spin_lock(&init_mm.page_table_lock);
 
-	/* invalidate the PTE so it's safe to modify */
-	pte = ptep_get_and_clear(&init_mm, addr, ptep);
-	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+	pte = ptep_get(ptep);
 
 	/* modify the PTE bits as desired, then apply */
 	switch (action) {
@@ -59,11 +53,14 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
 		break;
 	}
 
-	set_pte_at(&init_mm, addr, ptep, pte);
+	pte_update(&init_mm, addr, ptep, ~0UL, pte_val(pte), 0);
 
 	/* See ptesync comment in radix__set_pte_at() */
 	if (radix_enabled())
 		asm volatile("ptesync": : :"memory");
+
+	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+
 	spin_unlock(&init_mm.page_table_lock);
 
 	return 0;


base-commit: cbc06f051c524dcfe52ef0d1f30647828e226d30




--
Murilo


Re: [PATCH v2] powerpc/mm: Fix set_memory_*() against concurrent accesses

2021-08-18 Thread Christophe Leroy




On 18/08/2021 at 14:05, Michael Ellerman wrote:

Laurent reported that STRICT_MODULE_RWX was causing intermittent crashes
on one of his systems:

   kernel tried to execute exec-protected page (c00804073278) - exploit attempt? (uid: 0)
   BUG: Unable to handle kernel instruction fetch
   Faulting instruction address: 0xc00804073278
   Oops: Kernel access of bad area, sig: 11 [#1]
   LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
   Modules linked in: drm virtio_console fuse drm_panel_orientation_quirks ...
   CPU: 3 PID: 44 Comm: kworker/3:1 Not tainted 5.14.0-rc4+ #12
   Workqueue: events control_work_handler [virtio_console]
   NIP:  c00804073278 LR: c00804073278 CTR: c01e9de0
   REGS: c0002e4ef7e0 TRAP: 0400   Not tainted  (5.14.0-rc4+)
   MSR:  80004280b033   CR: 24002822 XER: 200400cf
   ...
   NIP fill_queue+0xf0/0x210 [virtio_console]
   LR  fill_queue+0xf0/0x210 [virtio_console]
   Call Trace:
 fill_queue+0xb4/0x210 [virtio_console] (unreliable)
 add_port+0x1a8/0x470 [virtio_console]
 control_work_handler+0xbc/0x1e8 [virtio_console]
 process_one_work+0x290/0x590
 worker_thread+0x88/0x620
 kthread+0x194/0x1a0
 ret_from_kernel_thread+0x5c/0x64

Jordan, Fabiano & Murilo were able to reproduce and identify that the
problem is caused by the call to module_enable_ro() in do_init_module(),
which happens after the module's init function has already been called.

Our current implementation of change_page_attr() is not safe against
concurrent accesses, because it invalidates the PTE before flushing the
TLB and then installing the new PTE. That leaves a window in time where
there is no valid PTE for the page; if another CPU tries to access the
page at that time, we see something like the fault above.

We can't simply switch to set_pte_at()/flush TLB, because our hash MMU
code doesn't handle a set_pte_at() of a valid PTE. See [1].

But we do have pte_update(), which replaces the old PTE with the new,
meaning there's no window where the PTE is invalid. And the hash MMU
version hash__pte_update() deals with synchronising the hash page table
correctly.

[1]: https://lore.kernel.org/linuxppc-dev/87y318wp9r@linux.ibm.com/

Fixes: 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines")
Reported-by: Laurent Vivier 
Signed-off-by: Fabiano Rosas 
Signed-off-by: Michael Ellerman 


Reviewed-by: Christophe Leroy 


---
  arch/powerpc/mm/pageattr.c | 23 ++-
  1 file changed, 10 insertions(+), 13 deletions(-)

v2: Use pte_update(..., ~0, pte_val(pte), ...) as suggested by Fabiano,
 and ptep_get() as suggested by Christophe.

diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
index 0876216ceee6..edea388e9d3f 100644
--- a/arch/powerpc/mm/pageattr.c
+++ b/arch/powerpc/mm/pageattr.c
@@ -18,16 +18,12 @@
 /*
  * Updates the attributes of a page in three steps:
  *
- * 1. invalidate the page table entry
- * 2. flush the TLB
- * 3. install the new entry with the updated attributes
- *
- * Invalidating the pte means there are situations where this will not work
- * when in theory it should.
- * For example:
- * - removing write from page whilst it is being executed
- * - setting a page read-only whilst it is being read by another CPU
+ * 1. take the page_table_lock
+ * 2. install the new entry with the updated attributes
+ * 3. flush the TLB
  *
+ * This sequence is safe against concurrent updates, and also allows updating the
+ * attributes of a page currently being executed or accessed.
  */
 static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
 {
@@ -36,9 +32,7 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
 
 	spin_lock(&init_mm.page_table_lock);
 
-	/* invalidate the PTE so it's safe to modify */
-	pte = ptep_get_and_clear(&init_mm, addr, ptep);
-	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+	pte = ptep_get(ptep);
 
 	/* modify the PTE bits as desired, then apply */
 	switch (action) {
@@ -59,11 +53,14 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
 		break;
 	}
 
-	set_pte_at(&init_mm, addr, ptep, pte);
+	pte_update(&init_mm, addr, ptep, ~0UL, pte_val(pte), 0);
 
 	/* See ptesync comment in radix__set_pte_at() */
 	if (radix_enabled())
 		asm volatile("ptesync": : :"memory");
+
+	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+
 	spin_unlock(&init_mm.page_table_lock);
 
 	return 0;


base-commit: cbc06f051c524dcfe52ef0d1f30647828e226d30



[PATCH v2] powerpc/mm: Fix set_memory_*() against concurrent accesses

2021-08-18 Thread Michael Ellerman
Laurent reported that STRICT_MODULE_RWX was causing intermittent crashes
on one of his systems:

  kernel tried to execute exec-protected page (c00804073278) - exploit attempt? (uid: 0)
  BUG: Unable to handle kernel instruction fetch
  Faulting instruction address: 0xc00804073278
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in: drm virtio_console fuse drm_panel_orientation_quirks ...
  CPU: 3 PID: 44 Comm: kworker/3:1 Not tainted 5.14.0-rc4+ #12
  Workqueue: events control_work_handler [virtio_console]
  NIP:  c00804073278 LR: c00804073278 CTR: c01e9de0
  REGS: c0002e4ef7e0 TRAP: 0400   Not tainted  (5.14.0-rc4+)
  MSR:  80004280b033   CR: 24002822 XER: 200400cf
  ...
  NIP fill_queue+0xf0/0x210 [virtio_console]
  LR  fill_queue+0xf0/0x210 [virtio_console]
  Call Trace:
fill_queue+0xb4/0x210 [virtio_console] (unreliable)
add_port+0x1a8/0x470 [virtio_console]
control_work_handler+0xbc/0x1e8 [virtio_console]
process_one_work+0x290/0x590
worker_thread+0x88/0x620
kthread+0x194/0x1a0
ret_from_kernel_thread+0x5c/0x64

Jordan, Fabiano & Murilo were able to reproduce and identify that the
problem is caused by the call to module_enable_ro() in do_init_module(),
which happens after the module's init function has already been called.

Our current implementation of change_page_attr() is not safe against
concurrent accesses, because it invalidates the PTE before flushing the
TLB and then installing the new PTE. That leaves a window in time where
there is no valid PTE for the page; if another CPU tries to access the
page at that time, we see something like the fault above.
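
For illustration only (editor's sketch, not part of the commit message), the
interleaving the old sequence allows looks roughly like this when another CPU
is still running code from the page being changed:

  /*
   * Hypothetical interleaving, consistent with the oops above; the function
   * names are the ones removed/added by the diff below.
   *
   *   CPU 1: change_page_attr()            CPU 0: virtio_console worker
   *   -------------------------            ----------------------------
   *   pte = ptep_get_and_clear(...);       executing module code (TLB hit)
   *   flush_tlb_kernel_range(...);         stale TLB entry is gone
   *                                        next instruction fetch finds no
   *                                        valid PTE -> TRAP 0400, oops above
   *   set_pte_at(...);                     too late, the fault already hit
   */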

We can't simply switch to set_pte_at()/flush TLB, because our hash MMU
code doesn't handle a set_pte_at() of a valid PTE. See [1].

But we do have pte_update(), which replaces the old PTE with the new,
meaning there's no window where the PTE is invalid. And the hash MMU
version hash__pte_update() deals with synchronising the hash page table
correctly.

[1]: https://lore.kernel.org/linuxppc-dev/87y318wp9r@linux.ibm.com/
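
As background (editorial sketch, not the powerpc implementation): the generic
effect of the pte_update() call used below is "atomically replace the PTE with
(old & ~clr) | set and return the old value"; with clr = ~0UL and set =
pte_val(pte) the whole entry is swapped in one atomic step, so there is never
a window with an invalid PTE. The name sketch_pte_update() is made up for
illustration; the real hash__pte_update()/radix__pte_update() use a
larx/stcx.-style loop plus the extra hash page table synchronisation.

  static unsigned long sketch_pte_update(unsigned long *entry,
                                         unsigned long clr, unsigned long set)
  {
          unsigned long old, new;

          do {
                  /* build the new value from the current one */
                  old = READ_ONCE(*entry);
                  new = (old & ~clr) | set;
                  /* retry if another CPU changed the entry under us */
          } while (cmpxchg(entry, old, new) != old);

          return old;
  }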

Fixes: 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines")
Reported-by: Laurent Vivier 
Signed-off-by: Fabiano Rosas 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/mm/pageattr.c | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

v2: Use pte_update(..., ~0, pte_val(pte), ...) as suggested by Fabiano,
and ptep_get() as suggested by Christophe.

diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
index 0876216ceee6..edea388e9d3f 100644
--- a/arch/powerpc/mm/pageattr.c
+++ b/arch/powerpc/mm/pageattr.c
@@ -18,16 +18,12 @@
 /*
  * Updates the attributes of a page in three steps:
  *
- * 1. invalidate the page table entry
- * 2. flush the TLB
- * 3. install the new entry with the updated attributes
- *
- * Invalidating the pte means there are situations where this will not work
- * when in theory it should.
- * For example:
- * - removing write from page whilst it is being executed
- * - setting a page read-only whilst it is being read by another CPU
+ * 1. take the page_table_lock
+ * 2. install the new entry with the updated attributes
+ * 3. flush the TLB
  *
+ * This sequence is safe against concurrent updates, and also allows updating the
+ * attributes of a page currently being executed or accessed.
  */
 static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
 {
@@ -36,9 +32,7 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
 
 	spin_lock(&init_mm.page_table_lock);
 
-	/* invalidate the PTE so it's safe to modify */
-	pte = ptep_get_and_clear(&init_mm, addr, ptep);
-	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+	pte = ptep_get(ptep);
 
 	/* modify the PTE bits as desired, then apply */
 	switch (action) {
@@ -59,11 +53,14 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
 		break;
 	}
 
-	set_pte_at(&init_mm, addr, ptep, pte);
+	pte_update(&init_mm, addr, ptep, ~0UL, pte_val(pte), 0);
 
 	/* See ptesync comment in radix__set_pte_at() */
 	if (radix_enabled())
 		asm volatile("ptesync": : :"memory");
+
+	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+
 	spin_unlock(&init_mm.page_table_lock);
 
 	return 0;

base-commit: cbc06f051c524dcfe52ef0d1f30647828e226d30
-- 
2.25.1
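
A final piece of context, as a simplified editorial sketch (assumption: this
mirrors change_memory_attr() in arch/powerpc/mm/pageattr.c around the time of
the patch, minus its extra sanity checks on the address range): the
set_memory_*() helpers reach change_page_attr() by walking the kernel page
tables with apply_to_existing_page_range(), which invokes the callback once
per mapped PTE.

  static int change_memory_attr(unsigned long addr, int numpages, long action)
  {
          unsigned long start = ALIGN_DOWN(addr, PAGE_SIZE);
          unsigned long size  = numpages << PAGE_SHIFT;

          if (!numpages)
                  return 0;

          /* call change_page_attr() for every PTE currently mapped in the range */
          return apply_to_existing_page_range(&init_mm, start, size,
                                              change_page_attr, (void *)action);
  }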