Re: [PATCH v2] mm/slub: fix panic in slab_alloc_node()

2020-10-28 Thread Laurent Dufour

Le 28/10/2020 à 12:11, Christopher Lameter a écrit :

On Tue, 27 Oct 2020, Laurent Dufour wrote:


The issue is that object is not NULL while page is NULL which is odd but
may happen if the cache flush happened after loading object but before
loading page. Thus checking for the page pointer is required too.



Ok then lets revert commit  6159d0f5c03e? The situation may occur
elsewhere too.


The only other call to node_match() is in ___slab_alloc(), and the page pointer 
is already checked there.

So there is no real need to check it in node_match().


Re: [PATCH v2] mm/slub: fix panic in slab_alloc_node()

2020-10-28 Thread Christopher Lameter
On Tue, 27 Oct 2020, Laurent Dufour wrote:

> The issue is that object is not NULL while page is NULL which is odd but
> may happen if the cache flush happened after loading object but before
> loading page. Thus checking for the page pointer is required too.


Ok then lets revert commit  6159d0f5c03e? The situation may occur
elsewhere too.


[PATCH v2] mm/slub: fix panic in slab_alloc_node()

2020-10-27 Thread Laurent Dufour
While doing memory hot-unplug operation on a PowerPC VM running 1024 CPUs
with 11TB of ram, I hit the following panic:

BUG: Kernel NULL pointer dereference on read at 0x0007
Faulting instruction address: 0xc0456048
Oops: Kernel access of bad area, sig: 11 [#2]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: rpadlpar_io rpaphp
CPU: 160 PID: 1 Comm: systemd Tainted: G  D   5.9.0 #1
NIP:  c0456048 LR: c0455fd4 CTR: c047b350
REGS: c6028d1b77a0 TRAP: 0300   Tainted: G  D(5.9.0)
MSR:  80009033   CR: 24004228  XER: 
CFAR: c000f1b0 DAR: 0007 DSISR: 4000 IRQMASK: 0
GPR00: c0455fd4 c6028d1b7a30 c1bec800 
GPR04: 0dc0  000374ef c7c53df99320
GPR08: 07c53c98  07c53c98 
GPR12: 4400 c0001e8e4400  0f6a
GPR16:  c1c25930 c1d62528 00c1
GPR20: c1d62538 c6be469e9000 000fffe0 c03c0ff8
GPR24: 0018  0dc0 
GPR28: c7c513755700 c1c236a4 c7bc4001f800 0001
NIP [c0456048] __kmalloc_node+0x108/0x790
LR [c0455fd4] __kmalloc_node+0x94/0x790
Call Trace:
[c6028d1b7a30] [c7c51af92000] 0xc7c51af92000 (unreliable)
[c6028d1b7aa0] [c03c0ff8] kvmalloc_node+0x58/0x110
[c6028d1b7ae0] [c047b45c] mem_cgroup_css_online+0x10c/0x270
[c6028d1b7b30] [c0241fd8] online_css+0x48/0xd0
[c6028d1b7b60] [c024af14] cgroup_apply_control_enable+0x2c4/0x470
[c6028d1b7c40] [c024e838] cgroup_mkdir+0x408/0x5f0
[c6028d1b7cb0] [c05a4ef0] kernfs_iop_mkdir+0x90/0x100
[c6028d1b7cf0] [c04b8168] vfs_mkdir+0x138/0x250
[c6028d1b7d40] [c04baf04] do_mkdirat+0x154/0x1c0
[c6028d1b7dc0] [c0032b38] system_call_exception+0xf8/0x200
[c6028d1b7e20] [c000c740] system_call_common+0xf0/0x27c
Instruction dump:
e93e e90d0030 39290008 7cc9402a e94d0030 e93e 7ce95214 7f89502a
2fbc 419e0018 41920230 e9270010 <89290007> 7f994800 419e0220 7ee6bb78

This pointing to the following code:

mm/slub.c:2851
if (unlikely(!object || !node_match(page, node))) {
c0456038:   00 00 bc 2f cmpdi   cr7,r28,0
c045603c:   18 00 9e 41 beq cr7,c0456054 
<__kmalloc_node+0x114>
node_match():
mm/slub.c:2491
if (node != NUMA_NO_NODE && page_to_nid(page) != node)
c0456040:   30 02 92 41 beq cr4,c0456270 
<__kmalloc_node+0x330>
page_to_nid():
include/linux/mm.h:1294
c0456044:   10 00 27 e9 ld  r9,16(r7)
c0456048:   07 00 29 89 lbz r9,7(r9) r9 = NULL
node_match():
mm/slub.c:2491
c045604c:   00 48 99 7f cmpwcr7,r25,r9
c0456050:   20 02 9e 41 beq cr7,c0456270 
<__kmalloc_node+0x330>

The panic occurred in slab_alloc_node() when checking for the page's node:
object = c->freelist;
page = c->page;
if (unlikely(!object || !node_match(page, node))) {
object = __slab_alloc(s, gfpflags, node, addr, c);
stat(s, ALLOC_SLOWPATH);

The issue is that object is not NULL while page is NULL which is odd but
may happen if the cache flush happened after loading object but before
loading page. Thus checking for the page pointer is required too.

The cache flush is done through an inter processor interrupt when a piece
of memory is off-lined. That interrupt is triggered when a memory
hot-unplug operation is initiated and offline_pages() is calling the slub's
MEM_GOING_OFFLINE callback slab_mem_going_offline_callback() which is
calling flush_cpu_slab(). If that interrupt is caught between the
reading of c->freelist and the reading of c->page, this could lead to such
a situation. That situation is expected and the later call to
this_cpu_cmpxchg_double() will detect the change to c->freelist and redo
the whole operation.

In commit 6159d0f5c03e ("mm/slub.c: page is always non-NULL in
node_match()") check on the page pointer has been removed assuming that
page is always valid when it is called. It happens that this is not true in
that particular case, so check for page before calling node_match() here.

Fixes: 6159d0f5c03e ("mm/slub.c: page is always non-NULL in node_match()")
Signed-off-by: Laurent Dufour 
Acked-by: Vlastimil Babka 
Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Cc: Andrew Morton 
Cc: sta...@vger.kernel.org
---
 mm/slub.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 8f66de8a5ab3..7dc5c6aaf4b7 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2852,7 +2852,7 @@ static __always_inline void *slab_alloc_node(struct