[PATCH] mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()

2016-09-19 Thread Johannes Weiner
Antonio reports the following crash when using fuse under memory
pressure:

[25192.515454] kernel BUG at /build/linux-a2WvEb/linux-4.4.0/mm/workingset.c:346!
[25192.517521] invalid opcode:  [#1] SMP
[25192.519602] Modules linked in: all of them
[25192.540910] CPU: 2 PID: 63 Comm: kswapd0 Not tainted 4.4.0-36-generic #55-Ubuntu
[25192.543411] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[25192.545840] task: 88040cae6040 ti: 880407488000 task.ti: 880407488000
[25192.548277] RIP: 0010:[]  [] shadow_lru_isolate+0x181/0x190
[25192.550706] RSP: 0018:88040748bbe0  EFLAGS: 00010002
[25192.553127] RAX: 1c81 RBX: 8802f91ee928 RCX: 8802f91eeb38
[25192.44] RDX: 8802f91ee938 RSI: 8802f91ee928 RDI: 8804099ba2c0
[25192.557914] RBP: 88040748bc08 R08: 0001a7b6 R09: 003f
[25192.560237] R10: 0001a750 R11:  R12: 8804099ba2c0
[25192.562512] R13: 8803157e9680 R14: 8803157e9668 R15: 8804099ba2c8
[25192.564724] FS:  () GS:88041f28() knlGS:
[25192.566990] CS:  0010 DS:  ES:  CR0: 80050033
[25192.569201] CR2: 7ffabb69 CR3: 01e0a000 CR4: 000406e0
[25192.571419] Stack:
[25192.573550]  8804099ba2c0 88039e4f86f0 8802f91ee928 8804099ba2c8
[25192.575695]  88040748bd08 88040748bc58 811b99bf 0052
[25192.577814]   811ba380 008a 0080
[25192.579947] Call Trace:
[25192.582022]  [] __list_lru_walk_one.isra.3+0x8f/0x130
[25192.584137]  [] ? memcg_drain_all_list_lrus+0x190/0x190
[25192.586165]  [] list_lru_walk_one+0x23/0x30
[25192.588145]  [] scan_shadow_nodes+0x34/0x50
[25192.590074]  [] shrink_slab.part.40+0x1ed/0x3d0
[25192.591985]  [] shrink_zone+0x2ca/0x2e0
[25192.593863]  [] kswapd+0x51e/0x990
[25192.595737]  [] ? mem_cgroup_shrink_node_zone+0x1c0/0x1c0
[25192.597613]  [] kthread+0xd8/0xf0
[25192.599495]  [] ? kthread_create_on_node+0x1e0/0x1e0
[25192.601335]  [] ret_from_fork+0x3f/0x70
[25192.603193]  [] ? kthread_create_on_node+0x1e0/0x1e0
[25192.605083] Code: Red
[25192.609252] RIP  [] shadow_lru_isolate+0x181/0x190
[25192.611304]  RSP 

which corresponds to the following sanity check in the shadow node
tracking:

  BUG_ON(node->count & RADIX_TREE_COUNT_MASK);

The workingset code tracks radix tree nodes that contain nothing but
shadow entries of evicted pages, and this (somewhat obscure) line
checks whether there are real pages left in the node that would
interfere with its reclaim under memory pressure.
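
As a rough user-space illustration of what that check reads: the low
bits of node->count hold the number of pages and the bits above them
the number of shadow entries. The sketch below is not the kernel code;
struct sketch_node, node_only_shadows() and the shift value of 7 are
assumptions made for the example, and the two accessors only mirror
the helpers this patch refers to.

```c
#include <assert.h>

/* Assumed for illustration; the kernel derives these from its radix tree map shift. */
#define RADIX_TREE_COUNT_SHIFT 7
#define RADIX_TREE_COUNT_MASK ((1UL << RADIX_TREE_COUNT_SHIFT) - 1)

/* Hypothetical stand-in for the one field of struct radix_tree_node used here. */
struct sketch_node {
	unsigned int count;
};

/* Number of real pages: the low bits of count. */
static unsigned int workingset_node_pages(const struct sketch_node *node)
{
	return node->count & RADIX_TREE_COUNT_MASK;
}

/* Number of shadow entries: the bits above the page count. */
static unsigned int workingset_node_shadows(const struct sketch_node *node)
{
	return node->count >> RADIX_TREE_COUNT_SHIFT;
}

/* Hypothetical helper: true if the node holds shadows and no pages. */
static int node_only_shadows(const struct sketch_node *node)
{
	return workingset_node_shadows(node) && !workingset_node_pages(node);
}
```

In this model the BUG_ON() above fires whenever
workingset_node_pages() is nonzero for a node found on the shadow
node LRU.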

While discussing how fuse might sneak pages into the radix tree past
the workingset code, Miklos pointed to replace_page_cache_page(), and
indeed there is a problem there: it properly accounts for the old
page being removed - __delete_from_page_cache() does that - but then
does a raw radix_tree_insert(), not accounting for the replacement
page. Eventually the page count bits in node->count underflow while
leaving the node incorrectly linked to the shadow node LRU.
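
The underflow can be replayed in a small user-space model. This is
only a sketch under assumptions: struct sketch_node, the shift value
of 7 and replace_untracked() are made up for the example, and the
real code paths do considerably more than adjust a counter.

```c
#include <assert.h>

/* Assumed for illustration; not taken from the kernel headers. */
#define RADIX_TREE_COUNT_SHIFT 7
#define RADIX_TREE_COUNT_MASK ((1UL << RADIX_TREE_COUNT_SHIFT) - 1)

/* Hypothetical stand-in for struct radix_tree_node. */
struct sketch_node {
	unsigned int count;
};

static unsigned int sketch_pages(const struct sketch_node *node)
{
	return node->count & RADIX_TREE_COUNT_MASK;
}

static unsigned int sketch_shadows(const struct sketch_node *node)
{
	return node->count >> RADIX_TREE_COUNT_SHIFT;
}

/* Models the accounting done when a page leaves the node. */
static void sketch_pages_dec(struct sketch_node *node)
{
	node->count--;
}

/*
 * Models the buggy replacement: the old page is accounted as removed,
 * but the raw insert of the new page leaves count untouched.
 */
static void replace_untracked(struct sketch_node *node)
{
	sketch_pages_dec(node);	/* old page removed, accounted */
	/* new page inserted here, but not accounted */
}
```

Starting from a node with one page and two shadows, calling
replace_untracked() leaves zero accounted pages, so the node looks
like a pure shadow node even though it holds the replacement page.
When that page is removed later, sketch_pages_dec() borrows from the
shadow bits, and the page bits become nonzero on a node that is still
on the shadow node LRU.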

To address this, make sure replace_page_cache_page() uses the tracked
page insertion code, page_cache_tree_insert(). This fixes the page
accounting and makes sure page-containing nodes are properly unlinked
from the shadow node LRU again.

Also, make the sanity checks a bit less obscure by using the helpers
for checking the number of pages and shadows in a radix tree node.

Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check")
Signed-off-by: Johannes Weiner 
Reported-by: Antonio SJ Musumeci 
Debugged-by: Miklos Szeredi 
Cc: [3.15+]
---
 include/linux/swap.h |   2 +
 mm/filemap.c | 114 +--
 mm/workingset.c  |  10 ++---
 3 files changed, 63 insertions(+), 63 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index b17cc48..4a529c9 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -257,6 +257,7 @@ static inline void workingset_node_pages_inc(struct radix_tree_node *node)
 
 static inline void workingset_node_pages_dec(struct radix_tree_node *node)
 {
+	VM_BUG_ON(!workingset_node_pages(node));
 	node->count--;
 }
 
@@ -272,6 +273,7 @@ static inline void workingset_node_shadows_inc(struct radix_tree_node *node)
 
 static inline void workingset_node_shadows_dec(struct radix_tree_node *node)
 {
+	VM_BUG_ON(!workingset_node_shadows(node));
 	node->count -= 1U << RADIX_TREE_COUNT_SHIFT;
 }
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 8a287df..2d0986a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -110,6 +110,62 @@
  *   ->tasklist_lock(memory_failure, collect_procs_ao)
  */
 
+static int page_cache_tree_insert(struct address_space *mapping,
+ struct page *page, void **shadowp)
+{
+   struct radix_tree_node *node;
+