Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache

2016-11-25 Thread E V
So I rebooted into 4.9rc6 with a patch inspired by the thread
"[PATCH] btrfs: limit the number of asynchronous delalloc pages to
reasonable value", but with the limit at 512K pages, i.e.:
 diff -u2 fs/btrfs/inode.c ../linux-4.9-rc6/fs/btrfs/
--- fs/btrfs/inode.c    2016-11-13 13:32:32.0 -0500
+++ ../linux-4.9-rc6/fs/btrfs/inode.c   2016-11-23 08:31:02.145669550 -0500
@@ -1159,5 +1159,5 @@
        unsigned long nr_pages;
        u64 cur_end;
-       int limit = 10 * SZ_1M;
+       int limit = SZ_512K;

        clear_extent_bit(&BTRFS_I(inode)->io_tree, start, end, EXTENT_LOCKED,

System still OOM'd after a few hours of rsync copying & deleting
files, but it didn't panic this time, which was nice ;-) I then set:
echo 500 >> /proc/sys/vm/watermark_scale_factor
echo 3 >> /proc/sys/vm/dirty_background_ratio

and the system has been running rsync fine for most of a day. The system
memory load is noticeably different in sar -r after changing the VM
params. rsync during OOM:
12:00:01 AM  kbmemfree  kbmemused  %memused  kbbuffers   kbcached   kbcommit  %commit   kbactive    kbinact   kbdirty
03:25:05 AM     158616   32836640     99.52      72376   30853268    2942644     3.62   26048876    4977872   4936488
03:30:36 AM     157700   32837556     99.52      72468   30944468    2940028     3.62   26070084    4957328   4957432
03:35:02 AM    1802144   31193112     94.54      72560   29266432    2944352     3.62   26184324    3182048    187784
03:40:32 AM     157272   32837984     99.52      72648   30934432    3007244     3.70   26102636    4930744   4927832
03:45:05 AM     158288   32836968     99.52      72896   30980504     412108     0.51   26089920    4959668   4977556

running rsync after tuning VM params:

08:35:01 AM    1903352   31091904     94.23     232772   26603624    2680952     3.30   24133864    5019748   1229964
08:40:01 AM    2878552   30116704     91.28     232800   25641520    2697356     3.32   24158248    4039372   2864656
08:45:01 AM    3482616   29512640     89.45     232656   25043068    2696144     3.32   24087376    3526164   1897192
08:50:01 AM    3590672   29404584     89.12     232856   24962856    2704196     3.33   24078188    3451400    666760
08:55:01 AM    2064900   30930356     93.74     234800   26480996    2730384     3.36   24009244    5044012     50028
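
As an aside, a minimal sketch of making the two VM settings above persistent
across reboots, assuming a distro that reads /etc/sysctl.d (the file name is
arbitrary):

cat > /etc/sysctl.d/90-vm-oom-tuning.conf <<'EOF'
vm.watermark_scale_factor = 500
vm.dirty_background_ratio = 3
EOF
sysctl --system    # apply now; the file is also read at every boot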

On Tue, Nov 22, 2016 at 9:48 AM, Vlastimil Babka  wrote:
> On 11/22/2016 02:58 PM, E V wrote:
>> System OOM'd several times last night with 4.8.10, I attached the
>> page_owner output from a morning cat ~8 hours after OOM's to the
>> bugzilla case, split and compressed to fit under the 5M attachment
>> limit. Let me know if you need anything else.
>
> Looks like for some reason, the stack saving produces garbage stacks
> that only repeat save_stack_trace and save_stack functions :/
>
> But judging from gfp flags and page flags, most pages seem to be
> allocated with:
>
> mask 0x2400840(GFP_NOFS|__GFP_NOFAIL)
>
> and page flags:
>
> 0x26c(referenced|uptodate|lru|active)
> or
> 0x200016c(referenced|uptodate|lru|active|owner_priv_1)
> or
> 0x200086c(referenced|uptodate|lru|active|private)
>
> While GFP_HIGHUSER_MOVABLE (which I would expect on lru) are less frequent.
>
> Example:
>> grep GFP_NOFS page_owner_after_af | wc -l
> 973596
>> grep GFP_HIGHUSER_MOVABLE page_owner_after_af | wc -l
> 158879
>> grep GFP_NOFAIL page_owner_after_af | wc -l
> 971442
>
> grepping for btrfs shows that at least some stacks for NOFS/NOFAIL pages
> imply it:
> clear_state_bit+0x135/0x1c0 [btrfs]
> or
> add_delayed_tree_ref+0xbf/0x170 [btrfs]
> or
> __btrfs_map_block+0x6a8/0x1200 [btrfs]
> or
> btrfs_buffer_uptodate+0x48/0x70 [btrfs]
> or
> btrfs_set_path_blocking+0x34/0x60 [btrfs]
>
> and some more variants.
>
> So looks like the pages contain btrfs metadata, are on file lru and from
> previous checks of /proc/kpagecount we know that they most likely have
> page_count() == 0 but are not freed. Could btrfs guys provide some
> insight here?
>
>> On Fri, Nov 18, 2016 at 10:02 AM, E V  wrote:
>>> Yes, the short window between the stalls and the panic makes it
>>> difficult to manually check much. I could setup a cron every 5 minutes
>>> or so if you want. Also, I see the OOM's in 4.8, but it has yet to
>>> panic on me. Where as 4.9rc has panic'd both times I've booted it, so
>>> depending on what you want to look at it might be easier to
>>> investigate on 4.8. Let me know, I can turn on a couple of the DEBUG
>>> config's and build a new 4.8.8. Never looked into a netconsole or
>>> serial console. I think just getting the system to use a higher res
>>> console would be an improvement, but the OOM's seemed to be the root
>>> cause of the panic so I haven't spent any time looking into that as of
>>> yet,
>>>
>>> Thanks,
>>> -Eli
>>>
>>> On Fri, Nov 18, 2016 at 6:54 AM, Tetsuo Handa
>>>  wrote:
 On 2016/11/18 6:49, Vlastimil Babka wrote:
> On 11/16/2016 02:39 PM, E V wrote:
>> System panic'd overnight running 4.9rc5 & rsync. Attached a photo of
>> the 

Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache

2016-11-22 Thread Vlastimil Babka
On 11/22/2016 02:58 PM, E V wrote:
> System OOM'd several times last night with 4.8.10, I attached the
> page_owner output from a morning cat ~8 hours after OOM's to the
> bugzilla case, split and compressed to fit under the 5M attachment
> limit. Let me know if you need anything else.

Looks like for some reason, the stack saving produces garbage stacks
that only repeat save_stack_trace and save_stack functions :/

But judging from gfp flags and page flags, most pages seem to be
allocated with:

mask 0x2400840(GFP_NOFS|__GFP_NOFAIL)

and page flags:

0x26c(referenced|uptodate|lru|active)
or
0x200016c(referenced|uptodate|lru|active|owner_priv_1)
or
0x200086c(referenced|uptodate|lru|active|private)

Pages allocated with GFP_HIGHUSER_MOVABLE (which I would expect on the LRU) are less frequent.

Example:
> grep GFP_NOFS page_owner_after_af | wc -l
973596
> grep GFP_HIGHUSER_MOVABLE page_owner_after_af | wc -l
158879
> grep GFP_NOFAIL page_owner_after_af | wc -l
971442

Grepping for btrfs shows that at least some stacks for the NOFS/NOFAIL pages
implicate it:
clear_state_bit+0x135/0x1c0 [btrfs]
or
add_delayed_tree_ref+0xbf/0x170 [btrfs]
or
__btrfs_map_block+0x6a8/0x1200 [btrfs]
or
btrfs_buffer_uptodate+0x48/0x70 [btrfs]
or
btrfs_set_path_blocking+0x34/0x60 [btrfs]

and some more variants.

So it looks like the pages contain btrfs metadata and are on the file LRU, and
from previous checks of /proc/kpagecount we know that they most likely have
page_count() == 0 but are not freed. Could the btrfs folks provide some
insight here?
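
For what it's worth, one rough way to rank which btrfs call sites those
NOFS/NOFAIL pages come from (a sketch, assuming the usual blank-line-separated
page_owner record format):

awk 'BEGIN { RS = "" }                      # one record per page_owner entry
/GFP_NOFS/ {
        key = ""
        for (i = 1; i <= NF; i++)           # find the first frame tagged [btrfs]
                if ($i == "[btrfs]") { key = $(i - 1); break }
        if (key != "") hist[key]++
}
END { for (k in hist) print hist[k], k }' page_owner_after_af | sort -rn | head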

> On Fri, Nov 18, 2016 at 10:02 AM, E V  wrote:
>> Yes, the short window between the stalls and the panic makes it
>> difficult to manually check much. I could setup a cron every 5 minutes
>> or so if you want. Also, I see the OOM's in 4.8, but it has yet to
>> panic on me. Where as 4.9rc has panic'd both times I've booted it, so
>> depending on what you want to look at it might be easier to
>> investigate on 4.8. Let me know, I can turn on a couple of the DEBUG
>> config's and build a new 4.8.8. Never looked into a netconsole or
>> serial console. I think just getting the system to use a higher res
>> console would be an improvement, but the OOM's seemed to be the root
>> cause of the panic so I haven't spent any time looking into that as of
>> yet,
>>
>> Thanks,
>> -Eli
>>
>> On Fri, Nov 18, 2016 at 6:54 AM, Tetsuo Handa
>>  wrote:
>>> On 2016/11/18 6:49, Vlastimil Babka wrote:
 On 11/16/2016 02:39 PM, E V wrote:
> System panic'd overnight running 4.9rc5 & rsync. Attached a photo of
> the stack trace, and the 38 call traces in a 2 minute window shortly
> before, to the bugzilla case for those not on it's e-mail list:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=186671

 The panic screenshot has only the last part, but the end marker says
 it's OOM with no killable processes. The DEBUG_VM config thus didn't
 trigger anything, and still there's tons of pagecache, mostly clean,
 that's not being reclaimed.

 Could you now try this?
 - enable CONFIG_PAGE_OWNER
 - boot with kernel option: page_owner=on
 - after the first oom, "cat /sys/kernel/debug/page_owner > file"
 - provide the file (compressed, it will be quite large)
>>>
>>> Excuse me for a noise, but do we really need to do
>>> "cat /sys/kernel/debug/page_owner > file" after the first OOM killer
>>> invocation? I worry that it might be too difficult to do.
>>> Shouldn't we rather do "cat /sys/kernel/debug/page_owner > file"
>>> hourly and compare tendency between the latest one and previous one?
>>>
>>> This system has swap, and /var/log/messages before panic
>>> reports that swapin was stalling at memory allocation.
>>>
>>> 
>>> [130346.262510] dsm_sa_datamgrd: page allocation stalls for 52400ms, 
>>> order:0, mode:0x24200ca(GFP_HIGHUSER_MOVABLE)
>>> [130346.262572] CPU: 1 PID: 3622 Comm: dsm_sa_datamgrd Tainted: GW 
>>> I 4.9.0-rc5 #2
>>> [130346.262662]   8129ba69 8170e4c8 
>>> c90003ccb8d8
>>> [130346.262714]  8113449f 024200ca1ca11b40 8170e4c8 
>>> c90003ccb880
>>> [130346.262765]  0010 c90003ccb8e8 c90003ccb898 
>>> 88041f226e80
>>> [130346.262817] Call Trace:
>>> [130346.262843]  [] ? dump_stack+0x46/0x5d
>>> [130346.262872]  [] ? warn_alloc+0x11f/0x140
>>> [130346.262899]  [] ? __alloc_pages_slowpath+0x84b/0xa80
>>> [130346.262929]  [] ? __alloc_pages_nodemask+0x2b0/0x2f0
>>> [130346.262960]  [] ? alloc_pages_vma+0xbe/0x260
>>> [130346.262989]  [] ? pagecache_get_page+0x22/0x280
>>> [130346.263019]  [] ? __read_swap_cache_async+0x118/0x1a0
>>> [130346.263048]  [] ? read_swap_cache_async+0xf/0x30
>>> [130346.263077]  [] ? swapin_readahead+0x16e/0x1c0
>>> [130346.263106]  [] ? radix_tree_lookup_slot+0xe/0x20
>>> [130346.263135]  [] ? find_get_entry+0x14/0x130
>>> [130346.263162]  [] ? 

Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache

2016-11-22 Thread E V
System OOM'd several times last night with 4.8.10, I attached the
page_owner output from a morning cat ~8 hours after OOM's to the
bugzilla case, split and compressed to fit under the 5M attachment
limit. Let me know if you need anything else.

On Fri, Nov 18, 2016 at 10:02 AM, E V  wrote:
> Yes, the short window between the stalls and the panic makes it
> difficult to manually check much. I could setup a cron every 5 minutes
> or so if you want. Also, I see the OOM's in 4.8, but it has yet to
> panic on me. Where as 4.9rc has panic'd both times I've booted it, so
> depending on what you want to look at it might be easier to
> investigate on 4.8. Let me know, I can turn on a couple of the DEBUG
> config's and build a new 4.8.8. Never looked into a netconsole or
> serial console. I think just getting the system to use a higher res
> console would be an improvement, but the OOM's seemed to be the root
> cause of the panic so I haven't spent any time looking into that as of
> yet,
>
> Thanks,
> -Eli
>
> On Fri, Nov 18, 2016 at 6:54 AM, Tetsuo Handa
>  wrote:
>> On 2016/11/18 6:49, Vlastimil Babka wrote:
>>> On 11/16/2016 02:39 PM, E V wrote:
 System panic'd overnight running 4.9rc5 & rsync. Attached a photo of
 the stack trace, and the 38 call traces in a 2 minute window shortly
 before, to the bugzilla case for those not on it's e-mail list:

 https://bugzilla.kernel.org/show_bug.cgi?id=186671
>>>
>>> The panic screenshot has only the last part, but the end marker says
>>> it's OOM with no killable processes. The DEBUG_VM config thus didn't
>>> trigger anything, and still there's tons of pagecache, mostly clean,
>>> that's not being reclaimed.
>>>
>>> Could you now try this?
>>> - enable CONFIG_PAGE_OWNER
>>> - boot with kernel option: page_owner=on
>>> - after the first oom, "cat /sys/kernel/debug/page_owner > file"
>>> - provide the file (compressed, it will be quite large)
>>
>> Excuse me for a noise, but do we really need to do
>> "cat /sys/kernel/debug/page_owner > file" after the first OOM killer
>> invocation? I worry that it might be too difficult to do.
>> Shouldn't we rather do "cat /sys/kernel/debug/page_owner > file"
>> hourly and compare tendency between the latest one and previous one?
>>
>> This system has swap, and /var/log/messages before panic
>> reports that swapin was stalling at memory allocation.
>>
>> 
>> [130346.262510] dsm_sa_datamgrd: page allocation stalls for 52400ms, 
>> order:0, mode:0x24200ca(GFP_HIGHUSER_MOVABLE)
>> [130346.262572] CPU: 1 PID: 3622 Comm: dsm_sa_datamgrd Tainted: GW I 
>> 4.9.0-rc5 #2
>> [130346.262662]   8129ba69 8170e4c8 
>> c90003ccb8d8
>> [130346.262714]  8113449f 024200ca1ca11b40 8170e4c8 
>> c90003ccb880
>> [130346.262765]  0010 c90003ccb8e8 c90003ccb898 
>> 88041f226e80
>> [130346.262817] Call Trace:
>> [130346.262843]  [] ? dump_stack+0x46/0x5d
>> [130346.262872]  [] ? warn_alloc+0x11f/0x140
>> [130346.262899]  [] ? __alloc_pages_slowpath+0x84b/0xa80
>> [130346.262929]  [] ? __alloc_pages_nodemask+0x2b0/0x2f0
>> [130346.262960]  [] ? alloc_pages_vma+0xbe/0x260
>> [130346.262989]  [] ? pagecache_get_page+0x22/0x280
>> [130346.263019]  [] ? __read_swap_cache_async+0x118/0x1a0
>> [130346.263048]  [] ? read_swap_cache_async+0xf/0x30
>> [130346.263077]  [] ? swapin_readahead+0x16e/0x1c0
>> [130346.263106]  [] ? radix_tree_lookup_slot+0xe/0x20
>> [130346.263135]  [] ? find_get_entry+0x14/0x130
>> [130346.263162]  [] ? pagecache_get_page+0x22/0x280
>> [130346.263193]  [] ? do_swap_page+0x44f/0x5f0
>> [130346.263220]  [] ? __radix_tree_lookup+0x62/0xc0
>> [130346.263249]  [] ? handle_mm_fault+0x66a/0xf00
>> [130346.263277]  [] ? find_get_entry+0x14/0x130
>> [130346.263305]  [] ? __do_page_fault+0x1c5/0x490
>> [130346.263336]  [] ? page_fault+0x22/0x30
>> [130346.263364]  [] ? copy_user_generic_string+0x2c/0x40
>> [130346.263395]  [] ? set_fd_set+0x1d/0x30
>> [130346.263422]  [] ? core_sys_select+0x1a5/0x260
>> [130346.263450]  [] ? getname_flags+0x6a/0x1e0
>> [130346.263479]  [] ? cp_new_stat+0x115/0x130
>> [130346.263509]  [] ? ktime_get_ts64+0x3f/0xf0
>> [130346.263537]  [] ? SyS_select+0xa5/0xe0
>> [130346.263564]  [] ? entry_SYSCALL_64_fastpath+0x13/0x94
>> 
>>
>> Under such situation, trying to login and execute /bin/cat could take 
>> minutes.
>> Also, writing to btrfs and ext4 seems to be stalling. The btrfs one is a
>> situation where WQ_MEM_RECLAIM kernel workqueue is unable to make progress.
>>
>> 
>> [130420.008231] kworker/u34:21: page allocation stalls for 35028ms, order:0, 
>> mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
>> [130420.008287] CPU: 5 PID: 24286 Comm: kworker/u34:21 Tainted: GW I 
>> 4.9.0-rc5 #2
>> [130420.008401] Workqueue: btrfs-extent-refs 

Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache

2016-11-18 Thread E V
Yes, the short window between the stalls and the panic makes it
difficult to manually check much. I could set up a cron every 5 minutes
or so if you want. Also, I see the OOMs in 4.8, but it has yet to
panic on me, whereas 4.9rc has panic'd both times I've booted it, so
depending on what you want to look at it might be easier to
investigate on 4.8. Let me know; I can turn on a couple of the DEBUG
configs and build a new 4.8.8. I've never looked into a netconsole or
serial console. I think just getting the system to use a higher-res
console would be an improvement, but the OOMs seemed to be the root
cause of the panic, so I haven't spent any time looking into that as of
yet.

Thanks,
-Eli

On Fri, Nov 18, 2016 at 6:54 AM, Tetsuo Handa
 wrote:
> On 2016/11/18 6:49, Vlastimil Babka wrote:
>> On 11/16/2016 02:39 PM, E V wrote:
>>> System panic'd overnight running 4.9rc5 & rsync. Attached a photo of
>>> the stack trace, and the 38 call traces in a 2 minute window shortly
>>> before, to the bugzilla case for those not on it's e-mail list:
>>>
>>> https://bugzilla.kernel.org/show_bug.cgi?id=186671
>>
>> The panic screenshot has only the last part, but the end marker says
>> it's OOM with no killable processes. The DEBUG_VM config thus didn't
>> trigger anything, and still there's tons of pagecache, mostly clean,
>> that's not being reclaimed.
>>
>> Could you now try this?
>> - enable CONFIG_PAGE_OWNER
>> - boot with kernel option: page_owner=on
>> - after the first oom, "cat /sys/kernel/debug/page_owner > file"
>> - provide the file (compressed, it will be quite large)
>
> Excuse me for a noise, but do we really need to do
> "cat /sys/kernel/debug/page_owner > file" after the first OOM killer
> invocation? I worry that it might be too difficult to do.
> Shouldn't we rather do "cat /sys/kernel/debug/page_owner > file"
> hourly and compare tendency between the latest one and previous one?
>
> This system has swap, and /var/log/messages before panic
> reports that swapin was stalling at memory allocation.
>
> 
> [130346.262510] dsm_sa_datamgrd: page allocation stalls for 52400ms, order:0, 
> mode:0x24200ca(GFP_HIGHUSER_MOVABLE)
> [130346.262572] CPU: 1 PID: 3622 Comm: dsm_sa_datamgrd Tainted: GW I  
>4.9.0-rc5 #2
> [130346.262662]   8129ba69 8170e4c8 
> c90003ccb8d8
> [130346.262714]  8113449f 024200ca1ca11b40 8170e4c8 
> c90003ccb880
> [130346.262765]  0010 c90003ccb8e8 c90003ccb898 
> 88041f226e80
> [130346.262817] Call Trace:
> [130346.262843]  [] ? dump_stack+0x46/0x5d
> [130346.262872]  [] ? warn_alloc+0x11f/0x140
> [130346.262899]  [] ? __alloc_pages_slowpath+0x84b/0xa80
> [130346.262929]  [] ? __alloc_pages_nodemask+0x2b0/0x2f0
> [130346.262960]  [] ? alloc_pages_vma+0xbe/0x260
> [130346.262989]  [] ? pagecache_get_page+0x22/0x280
> [130346.263019]  [] ? __read_swap_cache_async+0x118/0x1a0
> [130346.263048]  [] ? read_swap_cache_async+0xf/0x30
> [130346.263077]  [] ? swapin_readahead+0x16e/0x1c0
> [130346.263106]  [] ? radix_tree_lookup_slot+0xe/0x20
> [130346.263135]  [] ? find_get_entry+0x14/0x130
> [130346.263162]  [] ? pagecache_get_page+0x22/0x280
> [130346.263193]  [] ? do_swap_page+0x44f/0x5f0
> [130346.263220]  [] ? __radix_tree_lookup+0x62/0xc0
> [130346.263249]  [] ? handle_mm_fault+0x66a/0xf00
> [130346.263277]  [] ? find_get_entry+0x14/0x130
> [130346.263305]  [] ? __do_page_fault+0x1c5/0x490
> [130346.263336]  [] ? page_fault+0x22/0x30
> [130346.263364]  [] ? copy_user_generic_string+0x2c/0x40
> [130346.263395]  [] ? set_fd_set+0x1d/0x30
> [130346.263422]  [] ? core_sys_select+0x1a5/0x260
> [130346.263450]  [] ? getname_flags+0x6a/0x1e0
> [130346.263479]  [] ? cp_new_stat+0x115/0x130
> [130346.263509]  [] ? ktime_get_ts64+0x3f/0xf0
> [130346.263537]  [] ? SyS_select+0xa5/0xe0
> [130346.263564]  [] ? entry_SYSCALL_64_fastpath+0x13/0x94
> 
>
> Under such situation, trying to login and execute /bin/cat could take minutes.
> Also, writing to btrfs and ext4 seems to be stalling. The btrfs one is a
> situation where WQ_MEM_RECLAIM kernel workqueue is unable to make progress.
>
> 
> [130420.008231] kworker/u34:21: page allocation stalls for 35028ms, order:0, 
> mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
> [130420.008287] CPU: 5 PID: 24286 Comm: kworker/u34:21 Tainted: GW I  
>4.9.0-rc5 #2
> [130420.008401] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
> [130420.008432]   8129ba69 8170e4c8 
> c900087836a0
> [130420.008483]  8113449f 024008401e3f1b40 8170e4c8 
> c90008783648
> [130420.008534]  0010 c900087836b0 c90008783660 
> 88041ecc4340
> [130420.008586] Call Trace:
> [130420.008611]  [] ? dump_stack+0x46/0x5d
> [130420.008640]  [] ? warn_alloc+0x11f/0x140
> 

Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache

2016-11-18 Thread Janos Toth F.
It could be totally unrelated but I have a similar problem: processes
get randomly OOM'd when I am doing anything "sort of heavy" on my
Btrfs filesystems.
I did some "evil tuning", so I assumed that must be the problem (even
if the values looked sane for my system). Thus, I kept cutting back on
the manually set values (mostly dirty/background ratio, IO scheduler
request queue size and similar tunables), but it seems to be a dead end.
I guess anything I change to try and cut back on the related memory
footprint just makes the OOMs less frequent; it's only a matter of time
and coincidence (lots of things randomly happen to do some notable
amount of IO) until OOMs happen anyway.
It seems to be enough to start a defrag or balance on more than a
single filesystem (in parallel), and pretty much any notable "useful"
user load will then have a high chance of triggering OOMs (and getting
killed) sooner or later. It's just my limited observation, but
database-like loads [like that of bitcoind] (sync writes and/or frequent
flushes?) or high-priority buffered writes (ffmpeg running with higher
than default priority and saving live video streams into files without
re-encoding) seem to have a higher chance of triggering this than
simply reading or writing files sequentially and asynchronously,
either locally or through Samba.
I am on gentoo-sources 4.8.8 right now, but it was there with 4.7.x as well.

On Thu, Nov 17, 2016 at 10:49 PM, Vlastimil Babka  wrote:
> On 11/16/2016 02:39 PM, E V wrote:
>> System panic'd overnight running 4.9rc5 & rsync. Attached a photo of
>> the stack trace, and the 38 call traces in a 2 minute window shortly
>> before, to the bugzilla case for those not on it's e-mail list:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=186671
>
> The panic screenshot has only the last part, but the end marker says
> it's OOM with no killable processes. The DEBUG_VM config thus didn't
> trigger anything, and still there's tons of pagecache, mostly clean,
> that's not being reclaimed.
>
> Could you now try this?
> - enable CONFIG_PAGE_OWNER
> - boot with kernel option: page_owner=on
> - after the first oom, "cat /sys/kernel/debug/page_owner > file"
> - provide the file (compressed, it will be quite large)
>
> Vlastimil
>


Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache

2016-11-18 Thread Tetsuo Handa
On 2016/11/18 6:49, Vlastimil Babka wrote:
> On 11/16/2016 02:39 PM, E V wrote:
>> System panic'd overnight running 4.9rc5 & rsync. Attached a photo of
>> the stack trace, and the 38 call traces in a 2 minute window shortly
>> before, to the bugzilla case for those not on it's e-mail list:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=186671
> 
> The panic screenshot has only the last part, but the end marker says
> it's OOM with no killable processes. The DEBUG_VM config thus didn't
> trigger anything, and still there's tons of pagecache, mostly clean,
> that's not being reclaimed.
> 
> Could you now try this?
> - enable CONFIG_PAGE_OWNER
> - boot with kernel option: page_owner=on
> - after the first oom, "cat /sys/kernel/debug/page_owner > file"
> - provide the file (compressed, it will be quite large)

Excuse me for the noise, but do we really need to do
"cat /sys/kernel/debug/page_owner > file" after the first OOM killer
invocation? I worry that it might be too difficult to do.
Shouldn't we rather do "cat /sys/kernel/debug/page_owner > file"
hourly and compare the tendency between the latest one and the previous one?
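
Something along these lines would do for the hourly collection (just a
sketch; the paths are arbitrary):

cat > /etc/cron.hourly/page_owner-snapshot <<'EOF'
#!/bin/sh
# keep timestamped, compressed page_owner snapshots so runs can be compared
gzip -c /sys/kernel/debug/page_owner > /var/log/page_owner-$(date +%Y%m%d-%H%M).gz
EOF
chmod +x /etc/cron.hourly/page_owner-snapshot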

This system has swap, and /var/log/messages before panic
reports that swapin was stalling at memory allocation.


[130346.262510] dsm_sa_datamgrd: page allocation stalls for 52400ms, order:0, 
mode:0x24200ca(GFP_HIGHUSER_MOVABLE)
[130346.262572] CPU: 1 PID: 3622 Comm: dsm_sa_datamgrd Tainted: GW I
 4.9.0-rc5 #2
[130346.262662]   8129ba69 8170e4c8 
c90003ccb8d8
[130346.262714]  8113449f 024200ca1ca11b40 8170e4c8 
c90003ccb880
[130346.262765]  0010 c90003ccb8e8 c90003ccb898 
88041f226e80
[130346.262817] Call Trace:
[130346.262843]  [] ? dump_stack+0x46/0x5d
[130346.262872]  [] ? warn_alloc+0x11f/0x140
[130346.262899]  [] ? __alloc_pages_slowpath+0x84b/0xa80
[130346.262929]  [] ? __alloc_pages_nodemask+0x2b0/0x2f0
[130346.262960]  [] ? alloc_pages_vma+0xbe/0x260
[130346.262989]  [] ? pagecache_get_page+0x22/0x280
[130346.263019]  [] ? __read_swap_cache_async+0x118/0x1a0
[130346.263048]  [] ? read_swap_cache_async+0xf/0x30
[130346.263077]  [] ? swapin_readahead+0x16e/0x1c0
[130346.263106]  [] ? radix_tree_lookup_slot+0xe/0x20
[130346.263135]  [] ? find_get_entry+0x14/0x130
[130346.263162]  [] ? pagecache_get_page+0x22/0x280
[130346.263193]  [] ? do_swap_page+0x44f/0x5f0
[130346.263220]  [] ? __radix_tree_lookup+0x62/0xc0
[130346.263249]  [] ? handle_mm_fault+0x66a/0xf00
[130346.263277]  [] ? find_get_entry+0x14/0x130
[130346.263305]  [] ? __do_page_fault+0x1c5/0x490
[130346.263336]  [] ? page_fault+0x22/0x30
[130346.263364]  [] ? copy_user_generic_string+0x2c/0x40
[130346.263395]  [] ? set_fd_set+0x1d/0x30
[130346.263422]  [] ? core_sys_select+0x1a5/0x260
[130346.263450]  [] ? getname_flags+0x6a/0x1e0
[130346.263479]  [] ? cp_new_stat+0x115/0x130
[130346.263509]  [] ? ktime_get_ts64+0x3f/0xf0
[130346.263537]  [] ? SyS_select+0xa5/0xe0
[130346.263564]  [] ? entry_SYSCALL_64_fastpath+0x13/0x94


Under such a situation, trying to log in and execute /bin/cat could take minutes.
Also, writing to btrfs and ext4 seems to be stalling. The btrfs one is a
situation where a WQ_MEM_RECLAIM kernel workqueue is unable to make progress.


[130420.008231] kworker/u34:21: page allocation stalls for 35028ms, order:0, 
mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
[130420.008287] CPU: 5 PID: 24286 Comm: kworker/u34:21 Tainted: GW I
 4.9.0-rc5 #2
[130420.008401] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[130420.008432]   8129ba69 8170e4c8 
c900087836a0
[130420.008483]  8113449f 024008401e3f1b40 8170e4c8 
c90008783648
[130420.008534]  0010 c900087836b0 c90008783660 
88041ecc4340
[130420.008586] Call Trace:
[130420.008611]  [] ? dump_stack+0x46/0x5d
[130420.008640]  [] ? warn_alloc+0x11f/0x140
[130420.008667]  [] ? __alloc_pages_slowpath+0x84b/0xa80
[130420.008707]  [] ? search_bitmap+0xc2/0x140 [btrfs]
[130420.008736]  [] ? __alloc_pages_nodemask+0x2b0/0x2f0
[130420.008766]  [] ? alloc_pages_current+0x8a/0x110
[130420.008796]  [] ? pagecache_get_page+0xec/0x280
[130420.008836]  [] ? alloc_extent_buffer+0x108/0x430 [btrfs]
[130420.008875]  [] ? btrfs_alloc_tree_block+0x118/0x4d0 
[btrfs]
[130420.008927]  [] ? __btrfs_cow_block+0x148/0x5d0 [btrfs]
[130420.008964]  [] ? btrfs_cow_block+0x114/0x1d0 [btrfs]
[130420.009001]  [] ? btrfs_search_slot+0x206/0xa40 [btrfs]
[130420.009039]  [] ? lookup_inline_extent_backref+0xd9/0x620 
[btrfs]
[130420.009095]  [] ? set_extent_bit+0x24/0x30 [btrfs]
[130420.009124]  [] ? kmem_cache_alloc+0x17f/0x1b0
[130420.009161]  [] ? __btrfs_free_extent.isra.69+0xef/0xd10 
[btrfs]
[130420.009215]  [] ? btrfs_merge_delayed_refs+0x56/0x6f0 
[btrfs]
[130420.009269]  [] ? 

Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache

2016-11-17 Thread Vlastimil Babka
On 11/16/2016 02:39 PM, E V wrote:
> System panic'd overnight running 4.9rc5 & rsync. Attached a photo of
> the stack trace, and the 38 call traces in a 2 minute window shortly
> before, to the bugzilla case for those not on it's e-mail list:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=186671

The panic screenshot has only the last part, but the end marker says
it's OOM with no killable processes. The DEBUG_VM config thus didn't
trigger anything, and still there's tons of pagecache, mostly clean,
that's not being reclaimed.

Could you now try this?
- enable CONFIG_PAGE_OWNER
- boot with kernel option: page_owner=on
- after the first oom, "cat /sys/kernel/debug/page_owner > file"
- provide the file (compressed, it will be quite large)
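
(Concretely, those steps amount to something like the following sketch; the
grub file and output name are only examples:)

# kernel config:        CONFIG_PAGE_OWNER=y
# kernel command line:  add page_owner=on (e.g. to GRUB_CMDLINE_LINUX in
#                       /etc/default/grub, then run update-grub and reboot)
# after the first OOM:
mount -t debugfs none /sys/kernel/debug 2>/dev/null   # usually mounted already
cat /sys/kernel/debug/page_owner > page_owner_after_oom
xz -9 page_owner_after_oom                             # the raw dump is quite large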

Vlastimil



Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache

2016-11-16 Thread E V
System panic'd overnight running 4.9rc5 & rsync. Attached a photo of
the stack trace, and the 38 call traces in a 2 minute window shortly
before, to the bugzilla case for those not on its e-mail list:

https://bugzilla.kernel.org/show_bug.cgi?id=186671

On Mon, Nov 14, 2016 at 3:56 PM, E V  wrote:
> Pretty sure it was the system after the OOM just did a history search
> to check, though it is 3 days afterwards and several OOMs killed
> several processes in somewhat rapid succession, I just listed the 1st.
> I'll turn on CONFIG_DEBUG_VM and reboot again.
>
> On Mon, Nov 14, 2016 at 12:04 PM, Vlastimil Babka  wrote:
>> On 11/14/2016 02:27 PM, E V wrote:
>>> System is an intel dual socket Xeon E5620, 7500/5520/5500/X58 ICH10
>>> family according to lspci. Anyways 4.8.4 OOM'd while I was gone. I'll
>>> download the current 4.9rc and reboot, but in the mean time here's
>>> xxd, vmstat & kern.log output:
>>> 8532039 
>>
>> Hmm this would suggest that the memory is mostly free. But not according
>> to vmstat. Is it possible you mistakenly provided the xxd from a fresh
>> boot, but vmstat from after the OOM?
>>
>> But sure, a page_count() of zero is a reason why __isolate_lru_page()
>> would fail due to its get_page_unless_zero(). The question is then how
>> could it drop to zero without being freed at the same time, as
>> put_page() does.
>>
>> I was going to suspect commit 83929372f6 and a page_ref_sub() it adds to
>> delete_from_page_cache(), but that's since 4.8 and you mention problems
>> since 4.7.
>>
>> Anyway it might be worth enabling CONFIG_DEBUG_VM as the relevant code
>> usually has VM_BUG_ONs.
>>
>> Vlastimil
>>
>>>9324 0100
>>>2226 0200
>>> 405 0300
>>>  80 0400
>>>  34 0500
>>>  48 0600
>>>  17 0700
>>>  17 0800
>>>  32 0900
>>>  19 0a00
>>>   1 0c00
>>>   1 0d00
>>>   1 0e00
>>>  12 1000
>>>   8 1100
>>>  32 1200
>>>  10 1300
>>>   2 1400
>>>  11 1500
>>>  12 1600
>>>   7 1700
>>>   3 1800
>>>   5 1900
>>>   6 1a00
>>>  11 1b00
>>>  22 1c00
>>>   3 1d00
>>>  19 1e00
>>>  21 1f00
>>>  18 2000
>>>  28 2100
>>>  40 2200
>>>  38 2300
>>>  85 2400
>>>  59 2500
>>>   40520 81ff
>>>
>>> /proc/vmstat:
>>> nr_free_pages 60965
>>> nr_zone_inactive_anon 4646
>>> nr_zone_active_anon 3265
>>> nr_zone_inactive_file 633882
>>> nr_zone_active_file 7017458
>>> nr_zone_unevictable 0
>>> nr_zone_write_pending 0
>>> nr_mlock 0
>>> nr_slab_reclaimable 299205
>>> nr_slab_unreclaimable 195497
>>> nr_page_table_pages 935
>>> nr_kernel_stack 4976
>>> nr_bounce 0
>>> numa_hit 3577063288
>>> numa_miss 541393191
>>> numa_foreign 541393191
>>> numa_interleave 19415
>>> numa_local 3577063288
>>> numa_other 0
>>> nr_free_cma 0
>>> nr_inactive_anon 4646
>>> nr_active_anon 3265
>>> nr_inactive_file 633882
>>> nr_active_file 7017458
>>> nr_unevictable 0
>>> nr_isolated_anon 0
>>> nr_isolated_file 0
>>> nr_pages_scanned 0
>>> workingset_refault 42685891
>>> workingset_activate 15247281
>>> workingset_nodereclaim 26375216
>>> nr_anon_pages 5067
>>> nr_mapped 5630
>>> nr_file_pages 7654746
>>> nr_dirty 0
>>> nr_writeback 0
>>> nr_writeback_temp 0
>>> nr_shmem 2504
>>> nr_shmem_hugepages 0
>>> nr_shmem_pmdmapped 0
>>> nr_anon_transparent_hugepages 0
>>> nr_unstable 0
>>> nr_vmscan_write 5243750485
>>> nr_vmscan_immediate_reclaim 4207633857
>>> nr_dirtied 1839143430
>>> nr_written 1832626107
>>> nr_dirty_threshold 1147728
>>> nr_dirty_background_threshold 151410
>>> pgpgin 166731189
>>> pgpgout 7328142335
>>> pswpin 98608
>>> pswpout 117794
>>> pgalloc_dma 29504
>>> pgalloc_dma32 1006726216
>>> pgalloc_normal 5275218188
>>> pgalloc_movable 0
>>> allocstall_dma 0
>>> allocstall_dma32 0
>>> allocstall_normal 36461
>>> allocstall_movable 5867
>>> pgskip_dma 0
>>> pgskip_dma32 0
>>> pgskip_normal 6417890
>>> pgskip_movable 0
>>> pgfree 6309223401
>>> pgactivate 35076483
>>> pgdeactivate 63556974
>>> pgfault 35753842
>>> pgmajfault 69126
>>> pglazyfreed 0
>>> pgrefill 70008598
>>> pgsteal_kswapd 3567289713
>>> pgsteal_direct 5878057
>>> pgscan_kswapd 9059309872
>>> pgscan_direct 4239367903
>>> pgscan_direct_throttle 0
>>> zone_reclaim_failed 0
>>> pginodesteal 102916
>>> slabs_scanned 460790262
>>> kswapd_inodesteal 9130243
>>> kswapd_low_wmark_hit_quickly 10634373
>>> kswapd_high_wmark_hit_quickly 7348173
>>> pageoutrun 18349115
>>> pgrotated 16291322
>>> drop_pagecache 0
>>> drop_slab 0
>>> pgmigrate_success 18912908
>>> 

Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache

2016-11-14 Thread E V
Pretty sure it was from the system after the OOM; I just did a history search
to check. Though it is 3 days afterwards, and several OOMs killed
several processes in somewhat rapid succession; I just listed the 1st.
I'll turn on CONFIG_DEBUG_VM and reboot again.

On Mon, Nov 14, 2016 at 12:04 PM, Vlastimil Babka  wrote:
> On 11/14/2016 02:27 PM, E V wrote:
>> System is an intel dual socket Xeon E5620, 7500/5520/5500/X58 ICH10
>> family according to lspci. Anyways 4.8.4 OOM'd while I was gone. I'll
>> download the current 4.9rc and reboot, but in the mean time here's
>> xxd, vmstat & kern.log output:
>> 8532039 
>
> Hmm this would suggest that the memory is mostly free. But not according
> to vmstat. Is it possible you mistakenly provided the xxd from a fresh
> boot, but vmstat from after the OOM?
>
> But sure, a page_count() of zero is a reason why __isolate_lru_page()
> would fail due to its get_page_unless_zero(). The question is then how
> could it drop to zero without being freed at the same time, as
> put_page() does.
>
> I was going to suspect commit 83929372f6 and a page_ref_sub() it adds to
> delete_from_page_cache(), but that's since 4.8 and you mention problems
> since 4.7.
>
> Anyway it might be worth enabling CONFIG_DEBUG_VM as the relevant code
> usually has VM_BUG_ONs.
>
> Vlastimil
>
>>9324 0100
>>2226 0200
>> 405 0300
>>  80 0400
>>  34 0500
>>  48 0600
>>  17 0700
>>  17 0800
>>  32 0900
>>  19 0a00
>>   1 0c00
>>   1 0d00
>>   1 0e00
>>  12 1000
>>   8 1100
>>  32 1200
>>  10 1300
>>   2 1400
>>  11 1500
>>  12 1600
>>   7 1700
>>   3 1800
>>   5 1900
>>   6 1a00
>>  11 1b00
>>  22 1c00
>>   3 1d00
>>  19 1e00
>>  21 1f00
>>  18 2000
>>  28 2100
>>  40 2200
>>  38 2300
>>  85 2400
>>  59 2500
>>   40520 81ff
>>
>> /proc/vmstat:
>> nr_free_pages 60965
>> nr_zone_inactive_anon 4646
>> nr_zone_active_anon 3265
>> nr_zone_inactive_file 633882
>> nr_zone_active_file 7017458
>> nr_zone_unevictable 0
>> nr_zone_write_pending 0
>> nr_mlock 0
>> nr_slab_reclaimable 299205
>> nr_slab_unreclaimable 195497
>> nr_page_table_pages 935
>> nr_kernel_stack 4976
>> nr_bounce 0
>> numa_hit 3577063288
>> numa_miss 541393191
>> numa_foreign 541393191
>> numa_interleave 19415
>> numa_local 3577063288
>> numa_other 0
>> nr_free_cma 0
>> nr_inactive_anon 4646
>> nr_active_anon 3265
>> nr_inactive_file 633882
>> nr_active_file 7017458
>> nr_unevictable 0
>> nr_isolated_anon 0
>> nr_isolated_file 0
>> nr_pages_scanned 0
>> workingset_refault 42685891
>> workingset_activate 15247281
>> workingset_nodereclaim 26375216
>> nr_anon_pages 5067
>> nr_mapped 5630
>> nr_file_pages 7654746
>> nr_dirty 0
>> nr_writeback 0
>> nr_writeback_temp 0
>> nr_shmem 2504
>> nr_shmem_hugepages 0
>> nr_shmem_pmdmapped 0
>> nr_anon_transparent_hugepages 0
>> nr_unstable 0
>> nr_vmscan_write 5243750485
>> nr_vmscan_immediate_reclaim 4207633857
>> nr_dirtied 1839143430
>> nr_written 1832626107
>> nr_dirty_threshold 1147728
>> nr_dirty_background_threshold 151410
>> pgpgin 166731189
>> pgpgout 7328142335
>> pswpin 98608
>> pswpout 117794
>> pgalloc_dma 29504
>> pgalloc_dma32 1006726216
>> pgalloc_normal 5275218188
>> pgalloc_movable 0
>> allocstall_dma 0
>> allocstall_dma32 0
>> allocstall_normal 36461
>> allocstall_movable 5867
>> pgskip_dma 0
>> pgskip_dma32 0
>> pgskip_normal 6417890
>> pgskip_movable 0
>> pgfree 6309223401
>> pgactivate 35076483
>> pgdeactivate 63556974
>> pgfault 35753842
>> pgmajfault 69126
>> pglazyfreed 0
>> pgrefill 70008598
>> pgsteal_kswapd 3567289713
>> pgsteal_direct 5878057
>> pgscan_kswapd 9059309872
>> pgscan_direct 4239367903
>> pgscan_direct_throttle 0
>> zone_reclaim_failed 0
>> pginodesteal 102916
>> slabs_scanned 460790262
>> kswapd_inodesteal 9130243
>> kswapd_low_wmark_hit_quickly 10634373
>> kswapd_high_wmark_hit_quickly 7348173
>> pageoutrun 18349115
>> pgrotated 16291322
>> drop_pagecache 0
>> drop_slab 0
>> pgmigrate_success 18912908
>> pgmigrate_fail 63382146
>> compact_migrate_scanned 2986269789
>> compact_free_scanned 190451505123
>> compact_isolated 109549437
>> compact_stall 3544
>> compact_fail 8
>> compact_success 3536
>> compact_daemon_wake 1403515
>> htlb_buddy_alloc_success 0
>> htlb_buddy_alloc_fail 0
>> unevictable_pgs_culled 12473
>> unevictable_pgs_scanned 0
>> unevictable_pgs_rescued 11979
>> unevictable_pgs_mlocked 14556
>> unevictable_pgs_munlocked 14556
>> unevictable_pgs_cleared 0
>> 

Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache

2016-11-14 Thread Vlastimil Babka
On 11/14/2016 02:27 PM, E V wrote:
> System is an intel dual socket Xeon E5620, 7500/5520/5500/X58 ICH10
> family according to lspci. Anyways 4.8.4 OOM'd while I was gone. I'll
> download the current 4.9rc and reboot, but in the mean time here's
> xxd, vmstat & kern.log output:
> 8532039 

Hmm this would suggest that the memory is mostly free. But not according
to vmstat. Is it possible you mistakenly provided the xxd from a fresh
boot, but vmstat from after the OOM?

But sure, a page_count() of zero is a reason why __isolate_lru_page()
would fail due to its get_page_unless_zero(). The question is then how
could it drop to zero without being freed at the same time, as
put_page() does.
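
(As a side note, a single frame's /proc/kpagecount entry can be spot-checked
directly; the PFN below is just a placeholder:)

PFN=1234567   # hypothetical page frame number of interest
dd if=/proc/kpagecount bs=8 skip=$PFN count=1 2>/dev/null | xxd -e -g8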

I was going to suspect commit 83929372f6 and a page_ref_sub() it adds to
delete_from_page_cache(), but that's since 4.8 and you mention problems
since 4.7.

Anyway it might be worth enabling CONFIG_DEBUG_VM as the relevant code
usually has VM_BUG_ONs.

Vlastimil

>9324 0100
>2226 0200
> 405 0300
>  80 0400
>  34 0500
>  48 0600
>  17 0700
>  17 0800
>  32 0900
>  19 0a00
>   1 0c00
>   1 0d00
>   1 0e00
>  12 1000
>   8 1100
>  32 1200
>  10 1300
>   2 1400
>  11 1500
>  12 1600
>   7 1700
>   3 1800
>   5 1900
>   6 1a00
>  11 1b00
>  22 1c00
>   3 1d00
>  19 1e00
>  21 1f00
>  18 2000
>  28 2100
>  40 2200
>  38 2300
>  85 2400
>  59 2500
>   40520 81ff
> 
> /proc/vmstat:
> nr_free_pages 60965
> nr_zone_inactive_anon 4646
> nr_zone_active_anon 3265
> nr_zone_inactive_file 633882
> nr_zone_active_file 7017458
> nr_zone_unevictable 0
> nr_zone_write_pending 0
> nr_mlock 0
> nr_slab_reclaimable 299205
> nr_slab_unreclaimable 195497
> nr_page_table_pages 935
> nr_kernel_stack 4976
> nr_bounce 0
> numa_hit 3577063288
> numa_miss 541393191
> numa_foreign 541393191
> numa_interleave 19415
> numa_local 3577063288
> numa_other 0
> nr_free_cma 0
> nr_inactive_anon 4646
> nr_active_anon 3265
> nr_inactive_file 633882
> nr_active_file 7017458
> nr_unevictable 0
> nr_isolated_anon 0
> nr_isolated_file 0
> nr_pages_scanned 0
> workingset_refault 42685891
> workingset_activate 15247281
> workingset_nodereclaim 26375216
> nr_anon_pages 5067
> nr_mapped 5630
> nr_file_pages 7654746
> nr_dirty 0
> nr_writeback 0
> nr_writeback_temp 0
> nr_shmem 2504
> nr_shmem_hugepages 0
> nr_shmem_pmdmapped 0
> nr_anon_transparent_hugepages 0
> nr_unstable 0
> nr_vmscan_write 5243750485
> nr_vmscan_immediate_reclaim 4207633857
> nr_dirtied 1839143430
> nr_written 1832626107
> nr_dirty_threshold 1147728
> nr_dirty_background_threshold 151410
> pgpgin 166731189
> pgpgout 7328142335
> pswpin 98608
> pswpout 117794
> pgalloc_dma 29504
> pgalloc_dma32 1006726216
> pgalloc_normal 5275218188
> pgalloc_movable 0
> allocstall_dma 0
> allocstall_dma32 0
> allocstall_normal 36461
> allocstall_movable 5867
> pgskip_dma 0
> pgskip_dma32 0
> pgskip_normal 6417890
> pgskip_movable 0
> pgfree 6309223401
> pgactivate 35076483
> pgdeactivate 63556974
> pgfault 35753842
> pgmajfault 69126
> pglazyfreed 0
> pgrefill 70008598
> pgsteal_kswapd 3567289713
> pgsteal_direct 5878057
> pgscan_kswapd 9059309872
> pgscan_direct 4239367903
> pgscan_direct_throttle 0
> zone_reclaim_failed 0
> pginodesteal 102916
> slabs_scanned 460790262
> kswapd_inodesteal 9130243
> kswapd_low_wmark_hit_quickly 10634373
> kswapd_high_wmark_hit_quickly 7348173
> pageoutrun 18349115
> pgrotated 16291322
> drop_pagecache 0
> drop_slab 0
> pgmigrate_success 18912908
> pgmigrate_fail 63382146
> compact_migrate_scanned 2986269789
> compact_free_scanned 190451505123
> compact_isolated 109549437
> compact_stall 3544
> compact_fail 8
> compact_success 3536
> compact_daemon_wake 1403515
> htlb_buddy_alloc_success 0
> htlb_buddy_alloc_fail 0
> unevictable_pgs_culled 12473
> unevictable_pgs_scanned 0
> unevictable_pgs_rescued 11979
> unevictable_pgs_mlocked 14556
> unevictable_pgs_munlocked 14556
> unevictable_pgs_cleared 0
> unevictable_pgs_stranded 0
> thp_fault_alloc 0
> thp_fault_fallback 0
> thp_collapse_alloc 0
> thp_collapse_alloc_failed 0
> thp_file_alloc 0
> thp_file_mapped 0
> thp_split_page 0
> thp_split_page_failed 0
> thp_deferred_split_page 0
> thp_split_pmd 0
> thp_zero_page_alloc 0
> thp_zero_page_alloc_failed 0
> 
> kern.log OOM message:
> [737778.724194] snmpd invoked oom-killer:
> gfp_mask=0x24200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
> [737778.724246] snmpd cpuset=/ mems_allowed=0-1
> [737778.724278] 

Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache

2016-11-14 Thread E V
System is an Intel dual-socket Xeon E5620, 7500/5520/5500/X58 ICH10
family according to lspci. Anyway, 4.8.4 OOM'd while I was gone. I'll
download the current 4.9rc and reboot, but in the meantime here's the
xxd, vmstat & kern.log output:
8532039 
   9324 0100
   2226 0200
405 0300
 80 0400
 34 0500
 48 0600
 17 0700
 17 0800
 32 0900
 19 0a00
  1 0c00
  1 0d00
  1 0e00
 12 1000
  8 1100
 32 1200
 10 1300
  2 1400
 11 1500
 12 1600
  7 1700
  3 1800
  5 1900
  6 1a00
 11 1b00
 22 1c00
  3 1d00
 19 1e00
 21 1f00
 18 2000
 28 2100
 40 2200
 38 2300
 85 2400
 59 2500
  40520 81ff

/proc/vmstat:
nr_free_pages 60965
nr_zone_inactive_anon 4646
nr_zone_active_anon 3265
nr_zone_inactive_file 633882
nr_zone_active_file 7017458
nr_zone_unevictable 0
nr_zone_write_pending 0
nr_mlock 0
nr_slab_reclaimable 299205
nr_slab_unreclaimable 195497
nr_page_table_pages 935
nr_kernel_stack 4976
nr_bounce 0
numa_hit 3577063288
numa_miss 541393191
numa_foreign 541393191
numa_interleave 19415
numa_local 3577063288
numa_other 0
nr_free_cma 0
nr_inactive_anon 4646
nr_active_anon 3265
nr_inactive_file 633882
nr_active_file 7017458
nr_unevictable 0
nr_isolated_anon 0
nr_isolated_file 0
nr_pages_scanned 0
workingset_refault 42685891
workingset_activate 15247281
workingset_nodereclaim 26375216
nr_anon_pages 5067
nr_mapped 5630
nr_file_pages 7654746
nr_dirty 0
nr_writeback 0
nr_writeback_temp 0
nr_shmem 2504
nr_shmem_hugepages 0
nr_shmem_pmdmapped 0
nr_anon_transparent_hugepages 0
nr_unstable 0
nr_vmscan_write 5243750485
nr_vmscan_immediate_reclaim 4207633857
nr_dirtied 1839143430
nr_written 1832626107
nr_dirty_threshold 1147728
nr_dirty_background_threshold 151410
pgpgin 166731189
pgpgout 7328142335
pswpin 98608
pswpout 117794
pgalloc_dma 29504
pgalloc_dma32 1006726216
pgalloc_normal 5275218188
pgalloc_movable 0
allocstall_dma 0
allocstall_dma32 0
allocstall_normal 36461
allocstall_movable 5867
pgskip_dma 0
pgskip_dma32 0
pgskip_normal 6417890
pgskip_movable 0
pgfree 6309223401
pgactivate 35076483
pgdeactivate 63556974
pgfault 35753842
pgmajfault 69126
pglazyfreed 0
pgrefill 70008598
pgsteal_kswapd 3567289713
pgsteal_direct 5878057
pgscan_kswapd 9059309872
pgscan_direct 4239367903
pgscan_direct_throttle 0
zone_reclaim_failed 0
pginodesteal 102916
slabs_scanned 460790262
kswapd_inodesteal 9130243
kswapd_low_wmark_hit_quickly 10634373
kswapd_high_wmark_hit_quickly 7348173
pageoutrun 18349115
pgrotated 16291322
drop_pagecache 0
drop_slab 0
pgmigrate_success 18912908
pgmigrate_fail 63382146
compact_migrate_scanned 2986269789
compact_free_scanned 190451505123
compact_isolated 109549437
compact_stall 3544
compact_fail 8
compact_success 3536
compact_daemon_wake 1403515
htlb_buddy_alloc_success 0
htlb_buddy_alloc_fail 0
unevictable_pgs_culled 12473
unevictable_pgs_scanned 0
unevictable_pgs_rescued 11979
unevictable_pgs_mlocked 14556
unevictable_pgs_munlocked 14556
unevictable_pgs_cleared 0
unevictable_pgs_stranded 0
thp_fault_alloc 0
thp_fault_fallback 0
thp_collapse_alloc 0
thp_collapse_alloc_failed 0
thp_file_alloc 0
thp_file_mapped 0
thp_split_page 0
thp_split_page_failed 0
thp_deferred_split_page 0
thp_split_pmd 0
thp_zero_page_alloc 0
thp_zero_page_alloc_failed 0

kern.log OOM message:
[737778.724194] snmpd invoked oom-killer:
gfp_mask=0x24200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[737778.724246] snmpd cpuset=/ mems_allowed=0-1
[737778.724278] CPU: 15 PID: 2976 Comm: snmpd Tainted: GW I 4.8.4 #1
[737778.724352]   81292069 88041e043c48
88041e043c48
[737778.724403]  8118d1f6 88041dd70fc0 88041e043c48
0136236f
[737778.724454]  8170e11e 0001 8112a700
030f
[737778.724505] Call Trace:
[737778.724533]  [] ? dump_stack+0x46/0x5d
[737778.727077]  [] ? dump_header.isra.16+0x56/0x185
[737778.727108]  [] ? oom_kill_process+0x210/0x3c0
[737778.727136]  [] ? out_of_memory+0x34b/0x420
[737778.727165]  [] ? __alloc_pages_nodemask+0xd9a/0xde0
[737778.727195]  [] ? alloc_pages_vma+0xc1/0x240
[737778.727223]  [] ? pagecache_get_page+0x22/0x230
[737778.727253]  [] ? __read_swap_cache_async+0x104/0x180
[737778.727282]  [] ? read_swap_cache_async+0xf/0x30
[737778.727311]  [] ? swapin_readahead+0xec/0x1a0
[737778.727340]  [] ? do_swap_page+0x420/0x5c0
[737778.727369]  [] ? SYSC_recvfrom+0xa8/0x110
[737778.727397]  [] ? handle_mm_fault+0x629/0xe30
[737778.727426]  [] ? 

Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache

2016-11-04 Thread Vlastimil Babka
On 11/04/2016 03:13 PM, E V wrote:
> After the system panic'd yesterday I booted back into 4.8.4 and
> restarted the rsync's. I'm away on vacation next week, so when I get
> back I'll get rc4 or rc5 and try again. In the mean time here's data
> from the system running 4.8.4 without problems for about a day. I'm
> not familiar with xxd and didn't see a -e option, so used -E:
> xxd -E -g8 -c8 /proc/kpagecount | cut -d" " -f2 | sort | uniq -c
> 8258633 
>  216440 0100

The lack of -e means it's big-endian, which is not a big issue. So here
most of the memory is free, some pages have just one pin, and only
relatively few have more. The vmstat also doesn't show anything bad, so
we'll have to wait and see whether something appears within the week, or after
you try 4.9 again. Thanks.


Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache

2016-11-04 Thread E V
After the system panic'd yesterday I booted back into 4.8.4 and
restarted the rsyncs. I'm away on vacation next week, so when I get
back I'll get rc4 or rc5 and try again. In the meantime here's data
from the system running 4.8.4 without problems for about a day. I'm
not familiar with xxd and didn't see a -e option, so used -E:
xxd -E -g8 -c8 /proc/kpagecount | cut -d" " -f2 | sort | uniq -c
8258633 
 216440 0100
   5576 0200
592 0300
195 0400
184 0500
171 0600
 70 0700
  3 0800
 17 0900
 48 0a00
 78 0b00
 33 0c00
 23 0d00
 18 0e00
  3 0f00
  5 1000
  2 1100
  7 1200
  5 1300
  2 1400
 36 1500
 10 1600
  6 1700
  3 1800
  8 1900
  4 1a00
  7 1b00
  4 1c00
  5 1d00
  3 1e00
 18 1f00
  9 2000
  9 2100
  9 2200
 19 2300
 13 2400
  6 2500
 13 2600
 13 2700
  3 2800
 16 2900
  7 2a00
 21 2b00
 33 2c00
 19 2d00
 54 2e00
 29 2f00
 72 3000
 27 3100
 102635 81ff

cat /proc/vmstat
nr_free_pages 106970
nr_zone_inactive_anon 110034
nr_zone_active_anon 108424
nr_zone_inactive_file 350017
nr_zone_active_file 2158161
nr_zone_unevictable 0
nr_zone_write_pending 114
nr_mlock 0
nr_slab_reclaimable 4962990
nr_slab_unreclaimable 415089
nr_page_table_pages 2149
nr_kernel_stack 6176
nr_bounce 0
numa_hit 403780590
numa_miss 176970926
numa_foreign 176970926
numa_interleave 19415
numa_local 403780590
numa_other 0
nr_free_cma 0
nr_inactive_anon 110034
nr_active_anon 108424
nr_inactive_file 350017
nr_active_file 2158161
nr_unevictable 0
nr_isolated_anon 0
nr_isolated_file 0
nr_pages_scanned 0
workingset_refault 1443060
workingset_activate 558143
workingset_nodereclaim 6879280
nr_anon_pages 216243
nr_mapped 6462
nr_file_pages 2510544
nr_dirty 114
nr_writeback 0
nr_writeback_temp 0
nr_shmem 2179
nr_shmem_hugepages 0
nr_shmem_pmdmapped 0
nr_anon_transparent_hugepages 0
nr_unstable 0
nr_vmscan_write 1127
nr_vmscan_immediate_reclaim 19056
nr_dirtied 254716641
nr_written 254532248
nr_dirty_threshold 383652
nr_dirty_background_threshold 50612
pgpgin 21962903
pgpgout 1024651087
pswpin 214
pswpout 1127
pgalloc_dma 0
pgalloc_dma32 87690791
pgalloc_normal 806119097
pgalloc_movable 0
allocstall_dma 0
allocstall_dma32 0
allocstall_normal 210
allocstall_movable 0
pgskip_dma 0
pgskip_dma32 0
pgskip_normal 0
pgskip_movable 0
pgfree 894694404
pgactivate 5513535
pgdeactivate 7989719
pgfault 4748538
pgmajfault 2528
pglazyfreed 0
pgrefill 7999038
pgsteal_kswapd 504125672
pgsteal_direct 36130
pgscan_kswapd 504479233
pgscan_direct 36142
pgscan_direct_throttle 0
zone_reclaim_failed 0
pginodesteal 1074
slabs_scanned 61625344
kswapd_inodesteal 1956613
kswapd_low_wmark_hit_quickly 49386
kswapd_high_wmark_hit_quickly 79880
pageoutrun 211656
pgrotated 203832
drop_pagecache 0
drop_slab 0
pgmigrate_success 684523
pgmigrate_fail 1189249
compact_migrate_scanned 94848219
compact_free_scanned 2329620072
compact_isolated 2648057
compact_stall 38
compact_fail 0
compact_success 38
compact_daemon_wake 9682
htlb_buddy_alloc_success 0
htlb_buddy_alloc_fail 0
unevictable_pgs_culled 12473
unevictable_pgs_scanned 0
unevictable_pgs_rescued 11979
unevictable_pgs_mlocked 14556
unevictable_pgs_munlocked 14556
unevictable_pgs_cleared 0
unevictable_pgs_stranded 0
thp_fault_alloc 0
thp_fault_fallback 0
thp_collapse_alloc 0
thp_collapse_alloc_failed 0
thp_file_alloc 0
thp_file_mapped 0
thp_split_page 0
thp_split_page_failed 0
thp_deferred_split_page 0
thp_split_pmd 0
thp_zero_page_alloc 0
thp_zero_page_alloc_failed 0

On Thu, Nov 3, 2016 at 7:58 PM, Vlastimil Babka  wrote:
> On 11/03/2016 07:53 PM, Andrew Morton wrote:
>>
>> (switched to email.  Please respond via emailed reply-to-all, not via the
>> bugzilla web interface).
>
> +CC also btrfs just in case it's a problem in page reclaim there
>
>> On Wed, 02 Nov 2016 13:02:39 + bugzilla-dae...@bugzilla.kernel.org wrote:
>>
>>> https://bugzilla.kernel.org/show_bug.cgi?id=186671
>>>
>>> Bug ID: 186671
>>>Summary: OOM on system with just rsync running 32GB of ram 30GB
>>> of pagecache
>>>Product: Memory Management
>>>Version: 2.5
>>> Kernel Version: 4.9-rc3
>>>   Hardware: x86-64
>>> OS: Linux
>>>   Tree: Mainline
>>>  

Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache

2016-11-03 Thread Vlastimil Babka
On 11/03/2016 07:53 PM, Andrew Morton wrote:
> 
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).

+CC also btrfs just in case it's a problem in page reclaim there

> On Wed, 02 Nov 2016 13:02:39 + bugzilla-dae...@bugzilla.kernel.org wrote:
> 
>> https://bugzilla.kernel.org/show_bug.cgi?id=186671
>>
>> Bug ID: 186671
>>Summary: OOM on system with just rsync running 32GB of ram 30GB
>> of pagecache
>>Product: Memory Management
>>Version: 2.5
>> Kernel Version: 4.9-rc3
>>   Hardware: x86-64
>> OS: Linux
>>   Tree: Mainline
>> Status: NEW
>>   Severity: high
>>   Priority: P1
>>  Component: Page Allocator
>>   Assignee: a...@linux-foundation.org
>>   Reporter: eliven...@gmail.com
>> Regression: No
>>
>> Running rsync on a debian jessie system with 32GB of RAM and a big
>> 250TB btrfs filesystem. 30 GB of ram show up as cached, not much else
>> running on the system. Lots of page alloction stalls in dmesg before
>> hand, and several OOM's after this one as well until it finally killed
>> the rsync. So more traces available if desired. Started with the 4.7
>> series kernels, thought it was going to be fixed in 4.9:
> 
> OK, this looks bad.  Please let's work it via email so do remember the
> reply-to-alls.

It's bad but note the "started with 4.7" so it's not a 4.9 regression.
Also not a high-order OOM (phew!).

>> [93428.029768] irqbalance invoked oom-killer:
>> gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=0-1, order=0,
>> oom_score_adj=0
>> [93428.029824] irqbalance cpuset=/ mems_allowed=0-1
>> [93428.029857] CPU: 11 PID: 2992 Comm: irqbalance Tainted: G  W   
>> 4.9.0-rc3
>> #1
>> [93428.029945]   812946c9 c90003d8bb10
>> c90003d8bb10
>> [93428.029997]  81190dd5  
>> 88081db051c0
>> [93428.030049]  c90003d8bb10 81711866 0002
>> 0213
>> [93428.030101] Call Trace:
>> [93428.030127]  [] ? dump_stack+0x46/0x5d
>> [93428.030157]  [] ? dump_header.isra.20+0x75/0x1a6
>> [93428.030189]  [] ? oom_kill_process+0x219/0x3d0
>> [93428.030218]  [] ? out_of_memory+0xd9/0x570
>> [93428.030246]  [] ? __alloc_pages_slowpath+0xa4b/0xa80
>> [93428.030276]  [] ? __alloc_pages_nodemask+0x288/0x2c0
>> [93428.030306]  [] ? alloc_pages_vma+0xc1/0x240
>> [93428.030337]  [] ? handle_mm_fault+0xccb/0xe60
>> [93428.030367]  [] ? __do_page_fault+0x1c5/0x490
>> [93428.030397]  [] ? page_fault+0x22/0x30
>> [93428.030425]  [] ? copy_user_generic_string+0x2c/0x40
>> [93428.030455]  [] ? seq_read+0x305/0x370
>> [93428.030483]  [] ? proc_reg_read+0x3e/0x60
>> [93428.030511]  [] ? __vfs_read+0x1e/0x110
>> [93428.030538]  [] ? vfs_read+0x89/0x130
>> [93428.030564]  [] ? SyS_read+0x3d/0x90
>> [93428.030591]  [] ? entry_SYSCALL_64_fastpath+0x13/0x94
>> [93428.030620] Mem-Info:
>> [93428.030647] active_anon:9283 inactive_anon:9905 isolated_anon:0
>> [93428.030647]  active_file:6752598 inactive_file:999166 isolated_file:288
>> [93428.030647]  unevictable:0 dirty:997857 writeback:1665 unstable:0
>> [93428.030647]  slab_reclaimable:203122 slab_unreclaimable:202102
>> [93428.030647]  mapped:7933 shmem:3170 pagetables:1752 bounce:0
>> [93428.030647]  free:39250 free_pcp:954 free_cma:0
>> [93428.030800] Node 0 active_anon:24984kB inactive_anon:26704kB
>> active_file:14365920kB inactive_file:1341120kB unevictable:0kB
>> isolated(anon):0kB isolated(file):0kB mapped:15852kB dirty:1338044kB
>> writeback:3072kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB
>> anon_thp: 9484kB writeback_tmp:0kB unstable:0kB pages_scanned:23811175
>> all_unreclaimable? yes
>> [93428.030933] Node 1 active_anon:12148kB inactive_anon:12916kB
>> active_file:12644472kB inactive_file:2655544kB unevictable:0kB
>> isolated(anon):0kB isolated(file):1152kB mapped:15880kB
>> dirty:2653384kB writeback:3588kB shmem:0kB shmem_thp: 0kB
>> shmem_pmdmapped: 0kB anon_thp: 3196kB writeback_tmp:0kB unstable:0kB
>> pages_scanned:23178917 all_unreclaimable? yes

Note the high pages_scanned and all_unreclaimable. I suspect something
is pinning the memory. Can you post /proc/vmstat from the system with an
uptime after it experiences the OOM?

There's a /proc/kpagecount file that could confirm that. Could you provide
it too? Try running something like this and provide the output, please.

xxd -e -g8 -c8 /proc/kpagecount | cut -d" " -f2 | sort | uniq -c
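
(In other words, assuming an xxd new enough to support -e:)

# /proc/kpagecount holds one 64-bit little-endian value per page frame; the
# pipeline prints each as an 8-byte hex word, keeps just that column, and
# histograms how many frames share each value.
xxd -e -g8 -c8 /proc/kpagecount | cut -d" " -f2 | sort | uniq -c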

>> [93428.031059] Node 0 Normal free:44968kB min:45192kB low:61736kB
>> high:78280kB active_anon:24984kB inactive_anon:26704kB
>> active_file:14365920kB inactive_file:1341120kB unevictable:0kB
>> writepending:1341116kB present:16777216kB managed:16546296kB
>> mlocked:0kB slab_reclaimable:413824kB slab_unreclaimable:253144kB
>> kernel_stack:3496kB pagetables:4104kB bounce:0kB free_pcp:1388kB
>>