Re: [Regression bisected] btrfs: always wait on ordered extents at fsync time

2018-11-09 Thread Mel Gorman
On Fri, Nov 09, 2018 at 09:51:48PM +0000, Mel Gorman wrote:
> Unfortunately, as
> I'm about to travel, I didn't attempt a revert and a test comparing 4.18,
> 4.19 and 4.20-rc1 is a few hours away so this could potentially be fixed
> already but I didn't spot any obvious Fixes commit.
> 

Still here a few hours later and the regression still appears to exist
in mainline; the comparison report is below. While there are slight
differences between the kernels, the regressions are well outside multiples
of the stddev and coefficient of variation, so I'm fairly sure it's real.
The one positive is that the standard deviation is lower, so the results
are more stable, but that is a thin silver lining.

reaim
                                 4.18.0                 4.19.0             4.20.0-rc1
                                vanilla                vanilla                vanilla
Min       new_dbase-1       703.07 (   0.00%)      752.74 (   7.06%)      739.23 (   5.14%)
Min       new_dbase-5      6293.28 (   0.00%)     4158.82 ( -33.92%)     4164.42 ( -33.83%)
Min       new_dbase-9     16803.63 (   0.00%)    10948.82 ( -34.84%)    10927.31 ( -34.97%)
Min       new_dbase-13    25343.85 (   0.00%)    18640.37 ( -26.45%)    18094.59 ( -28.60%)
Min       new_dbase-17    38203.64 (   0.00%)    27430.81 ( -28.20%)    27793.65 ( -27.25%)
Min       new_dbase-21    45697.18 (   0.00%)    34700.53 ( -24.06%)    35170.73 ( -23.04%)
Min       new_dbase-25    54593.64 (   0.00%)    44269.34 ( -18.91%)    44142.86 ( -19.14%)
Min       new_dbase-29    60752.54 (   0.00%)    51797.69 ( -14.74%)    50484.51 ( -16.90%)
Min       new_dbase-33    59631.58 (   0.00%)    57610.17 (  -3.39%)    57286.52 (  -3.93%)
Hmean     new_dbase-1       787.46 (   0.00%)      821.81 (   4.36%)      866.52 (  10.04%)
Hmean     new_dbase-5      6437.50 (   0.00%)     4269.14 * -33.68%*     4314.44 * -32.98%*
Hmean     new_dbase-9     17305.54 (   0.00%)    11101.80 * -35.85%*    11075.27 * -36.00%*
Hmean     new_dbase-13    27570.35 (   0.00%)    19065.02 * -30.85%*    18885.75 * -31.50%*
Hmean     new_dbase-17    39377.81 (   0.00%)    27926.64 * -29.08%*    28181.33 * -28.43%*
Hmean     new_dbase-21    47434.21 (   0.00%)    36785.71 * -22.45%*    36150.42 * -23.79%*
Hmean     new_dbase-25    56593.41 (   0.00%)    45387.78 * -19.80%*    44808.59 * -20.82%*
Hmean     new_dbase-29    61503.09 (   0.00%)    52250.73 * -15.04%*    51176.47 * -16.79%*
Hmean     new_dbase-33    66343.53 (   0.00%)    58235.29 * -12.22%*    58637.15 * -11.62%*
Stddev    new_dbase-1       125.91 (   0.00%)       57.02 (  54.71%)       88.32 (  29.86%)
Stddev    new_dbase-5       174.01 (   0.00%)      130.71 (  24.88%)      107.34 (  38.32%)
Stddev    new_dbase-9       537.99 (   0.00%)      114.10 (  78.79%)      154.87 (  71.21%)
Stddev    new_dbase-13     1502.62 (   0.00%)      352.32 (  76.55%)      503.48 (  66.49%)
Stddev    new_dbase-17      935.76 (   0.00%)      322.83 (  65.50%)      279.50 (  70.13%)
Stddev    new_dbase-21     1389.92 (   0.00%)     1425.14 (  -2.53%)      805.66 (  42.04%)
Stddev    new_dbase-25     1672.09 (   0.00%)      875.75 (  47.63%)      529.71 (  68.32%)
Stddev    new_dbase-29      678.85 (   0.00%)      340.66 (  49.82%)      852.63 ( -25.60%)
Stddev    new_dbase-33     3958.22 (   0.00%)      632.39 (  84.02%)      845.17 (  78.65%)
CoeffVar  new_dbase-1        15.72 (   0.00%)        6.91 (  56.03%)       10.10 (  35.73%)
CoeffVar  new_dbase-5         2.70 (   0.00%)        3.06 ( -13.25%)        2.49 (   7.96%)
CoeffVar  new_dbase-9         3.11 (   0.00%)        1.03 (  66.92%)        1.40 (  54.99%)
CoeffVar  new_dbase-13        5.44 (   0.00%)        1.85 (  66.02%)        2.66 (  50.99%)
CoeffVar  new_dbase-17        2.38 (   0.00%)        1.16 (  51.34%)        0.99 (  58.25%)
CoeffVar  new_dbase-21        2.93 (   0.00%)        3.87 ( -32.14%)        2.23 (  23.92%)
CoeffVar  new_dbase-25        2.95 (   0.00%)        1.93 (  34.67%)        1.18 (  59.97%)
CoeffVar  new_dbase-29        1.10 (   0.00%)        0.65 (  40.93%)        1.67 ( -50.92%)
CoeffVar  new_dbase-33        5.95 (   0.00%)        1.09 (  81.74%)        1.44 (  75.77%)
Max       new_dbase-1      1019.80 (   0.00%)      900.87 ( -11.66%)      973.23 (  -4.57%)
Max       new_dbase-5      6717.39 (   0.00%)     4446.04 ( -33.81%)     4414.29 ( -34.29%)
Max       new_dbase-9     18058.44 (   0.00%)    11259.11 ( -37.65%)    11304.88 ( -37.40%)
Max       new_dbase-13    28795.70 (   0.00%)    19547.45 ( -32.12%)    19452.78 ( -32.45%)
Max       new_dbase-17    40407.69 (   0.00%)    28241.94 ( -30.11%)    28548.91 ( -29.35%)
Max       new_dbase-21    48973.58 (   0.00%)    38283.19 ( -21.83%)    37080.00 ( -24.29%)
Max       new_dbase-25    59195.40 (   0.00%)    46676.74 ( -21.15%)    45307.92 ( -23.46%)
Max       new_dbase-29    62445.99 (   0.00%)    52711.7
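
As an aside, the Hmean/Stddev/CoeffVar columns above are plain arithmetic
over the per-iteration jobs-per-minute samples. A rough sketch of the
calculation follows; it uses population stddev for brevity, samples.txt is
just an illustrative input name, and mmtests' own aggregation scripts are
the authoritative implementation:

    # Input: one jobs-per-minute sample per line.
    awk '{ n++; sum += $1; sumsq += $1 * $1; rsum += 1 / $1 }
    END {
            mean   = sum / n
            hmean  = n / rsum                       # harmonic mean
            stddev = sqrt(sumsq / n - mean * mean)  # population stddev
            printf "Hmean %.2f Stddev %.2f CoeffVar %.2f%%\n",
                   hmean, stddev, 100 * stddev / mean
    }' samples.txt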

[Regression bisected] btrfs: always wait on ordered extents at fsync time

2018-11-09 Thread Mel Gorman
git bisect bad ac8a866af17edc692b50cbdd2aec612de4205c8f
# bad: [893bf4b115c713738df05bb557f8fba14f07c077] btrfs: print more details when checking tree block finds a problem
git bisect bad 893bf4b115c713738df05bb557f8fba14f07c077
# bad: [5a98ec0141805a0ff9adb18fd18834a906637f2f] btrfs: Remove fs_info from btrfs_remove_block_group
git bisect bad 5a98ec0141805a0ff9adb18fd18834a906637f2f
# bad: [edf57cbf2b030781885e339f32e35a470d2f8eba] btrfs: Fix a C compliance issue
git bisect bad edf57cbf2b030781885e339f32e35a470d2f8eba
# bad: [e7175a692765940f3ac3f0c005b9a766a59303d7] btrfs: remove the wait ordered logic in the log_one_extent path
git bisect bad e7175a692765940f3ac3f0c005b9a766a59303d7
# good: [bd3c685ed9fd3763615a51a70e19ff08a456e3e1] btrfs: Document __btrfs_inc_extent_ref
git bisect good bd3c685ed9fd3763615a51a70e19ff08a456e3e1
# bad: [b5e6c3e170b77025b5f6174258c7ad71eed2d4de] btrfs: always wait on ordered extents at fsync time
git bisect bad b5e6c3e170b77025b5f6174258c7ad71eed2d4de
# good: [16d1c062c7de2999ea7be61d31070fa4ce3d99c4] btrfs: Fix comment in lookup_inline_extent_backref
git bisect good 16d1c062c7de2999ea7be61d31070fa4ce3d99c4
# first bad commit: [b5e6c3e170b77025b5f6174258c7ad71eed2d4de] btrfs: always wait on ordered extents at fsync time

Impera log
==========
impera::Fri 9 Nov 14:49:20 GMT 2018::reaim delboy :good: v4.18 scaled 6505+-158 sigma good actual 6505.26+-158.76 Hmean-new_dbase-5
impera::Fri 9 Nov 15:12:22 GMT 2018::reaim delboy :bad: v4.19 scaled 4236+-146 sigma bad actual 4236.36+-146.32 Hmean-new_dbase-5
impera::Fri 9 Nov 15:34:56 GMT 2018::reaim delboy :good: v4.18 scaled 6234+-177 sigma 6347 actual 6234.86+-177.59 Hmean-new_dbase-5
impera::Fri 9 Nov 15:58:14 GMT 2018::reaim delboy :bad: db06f826ec12bf0701ea7fc0a3c0aa00b84417c8 scaled 4264+-102 sigma 6347 actual 4264.42+-102.96 Hmean-new_dbase-5
impera::Fri 9 Nov 16:21:26 GMT 2018::reaim delboy :bad: 0a957467c5fd46142bc9c52758ffc552d4c5e2f7 scaled 4265+-104 sigma 6347 actual 4265.60+-104.39 Hmean-new_dbase-5
impera::Fri 9 Nov 16:44:36 GMT 2018::reaim delboy :bad: 958f338e96f874a0d29442396d6adf9c1e17aa2d scaled 4260+-108 sigma 6347 actual 4260.89+-108.14 Hmean-new_dbase-5
impera::Fri 9 Nov 17:07:30 GMT 2018::reaim delboy :good: 85a0b791bc17f7a49280b33e2905d109c062a47b scaled 6554+-297 sigma 6347 actual 6554.94+-297.09 Hmean-new_dbase-5
impera::Fri 9 Nov 17:30:38 GMT 2018::reaim delboy :bad: a1a4f841ec4585185c0e75bfae43a18b282dd316 scaled 4315+-126 sigma 6347 actual 4315.64+-126.31 Hmean-new_dbase-5
impera::Fri 9 Nov 17:53:00 GMT 2018::reaim delboy :good: a66b4cd1e7163adb327838a3c81faaf6a9330d5a scaled 6376+-352 sigma 6347 actual 6376.39+-352.02 Hmean-new_dbase-5
impera::Fri 9 Nov 18:16:06 GMT 2018::reaim delboy :bad: ac8a866af17edc692b50cbdd2aec612de4205c8f scaled 4271+-112 sigma 6347 actual 4271.50+-112.50 Hmean-new_dbase-5
impera::Fri 9 Nov 18:39:14 GMT 2018::reaim delboy :bad: 893bf4b115c713738df05bb557f8fba14f07c077 scaled 4382+-78 sigma 6347 actual 4382.98+-78.95 Hmean-new_dbase-5
impera::Fri 9 Nov 19:02:38 GMT 2018::reaim delboy :bad: 5a98ec0141805a0ff9adb18fd18834a906637f2f scaled 4275+-99 sigma 6347 actual 4275.04+-99.84 Hmean-new_dbase-5
impera::Fri 9 Nov 19:25:53 GMT 2018::reaim delboy :bad: edf57cbf2b030781885e339f32e35a470d2f8eba scaled 4263+-26 sigma 6347 actual 4263.25+-26.42 Hmean-new_dbase-5
impera::Fri 9 Nov 19:49:12 GMT 2018::reaim delboy :bad: e7175a692765940f3ac3f0c005b9a766a59303d7 scaled 4189+-99 sigma 6347 actual 4189.26+-99.48 Hmean-new_dbase-5
impera::Fri 9 Nov 20:11:30 GMT 2018::reaim delboy :good: bd3c685ed9fd3763615a51a70e19ff08a456e3e1 scaled 6611+-130 sigma 6347 actual 6611.04+-130.44 Hmean-new_dbase-5
impera::Fri 9 Nov 20:34:41 GMT 2018::reaim delboy :bad: b5e6c3e170b77025b5f6174258c7ad71eed2d4de scaled 4272+-90 sigma 6347 actual 4272.68+-90.93 Hmean-new_dbase-5
impera::Fri 9 Nov 20:57:11 GMT 2018::reaim delboy :good: 16d1c062c7de2999ea7be61d31070fa4ce3d99c4 scaled 6541+-556 sigma 6347 actual 6541.07+-556.74 Hmean-new_dbase-5
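
The good/bad calls above are mechanical enough to hand to git bisect run. A
minimal sketch of a helper applying the same threshold logic follows;
run-benchmark.sh, reaim-result.log and the hardcoded stddev are illustrative
stand-ins, not impera's actual interface:

    #!/bin/sh
    # Exit 0 => good, 1 => bad, 125 => skip (git bisect run conventions).
    REFERENCE=6347    # scaled Hmean-new_dbase-5 threshold from the log above
    STDDEV=150        # roughly the per-run deviation seen in the log

    make -j"$(nproc)" || exit 125
    ./run-benchmark.sh || exit 125   # install, boot and run reaim (hypothetical)

    result=$(awk '/Hmean-new_dbase-5/ { printf "%d", $NF; exit }' reaim-result.log)
    [ -n "$result" ] || exit 125

    # Within a stddev of the reference counts as good, same as the calls above.
    if [ "$result" -ge $((REFERENCE - STDDEV)) ]; then
            exit 0
    else
            exit 1
    fi

With such a script, "git bisect start; git bisect bad v4.19; git bisect good
v4.18; git bisect run ./bisect-reaim.sh" replays the session above unattended.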

-- 
Mel Gorman
SUSE Labs


Re: [ANNOUNCE] fsperf: a simple fs/block performance testing framework

2017-10-10 Thread Mel Gorman
On Tue, Oct 10, 2017 at 08:09:20AM +1100, Dave Chinner wrote:
> > I'd _like_ to expand fio for cases we come up with that aren't possible, as
> > there's already a ton of measurements that are taken, especially around
> > latencies.
> 
> To be properly useful it needs to support more than just fio to run
> tests. Indeed, it's largely useless to me if that's all it can do or
> it's a major pain to add support for different tools like fsmark.
> 
> e.g. my typical perf regression test that you see as the concurrent
> fsmark create workload is actually a lot more. It does:
> 
>   fsmark to create 50m zero length files
>   umount,
>   run parallel xfs_repair (excellent mmap_sem/page fault punisher)
>   mount
>   run parallel find -ctime (readdir + lookup traversal)
>   unmount, mount
>   run parallel ls -R (readdir + dtype traversal)
>   unmount, mount
>   parallel rm -rf of 50m files
> 
> I have variants that use small 4k files or large files rather than
> empty files, that use different fsync patterns to stress the
> log, and that use grep -R to traverse the data as well as
> the directory/inode structure instead of find, etc.
> 

FWIW, this is partially implemented in mmtests as
configs/config-global-dhp__io-xfsrepair. It covers the fsmark and
xfs_repair part and an example report is

http://beta.suse.com/private/mgorman/results/home/marvin/openSUSE-LEAP-42.2/global-dhp__io-xfsrepair-xfs/delboy/#xfsrepair

(ignore 4.12.603, it's 4.12.3-stable with some additional patches that were
pending for -stable at the time the test was executed). That config was
added after a discussion with you a few years ago and I've kept it since as
it has been useful in a number of contexts. Adding additional tests to cover
parallel find, parallel ls and parallel rm would be relatively trivial but
it's not there. This is a test that doesn't have proper graphing support
but it could be added in 10-15 minutes as xfsrepair is the primary metric
and it's simply reported as elapsed time.
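
For anyone who wants to try it, a minimal sketch of driving that config
follows; run-mmtests.sh and compare-kernels.sh are the real mmtests entry
points, but treat the exact flags and report layout as approximate since
they vary between versions:

    git clone https://github.com/gormanm/mmtests.git
    cd mmtests
    # Run the fsmark+xfs_repair config; the last argument names the run.
    ./run-mmtests.sh --config configs/config-global-dhp__io-xfsrepair 4.19-vanilla
    # After repeating on another kernel under a different run name:
    cd work/log
    ../../compare-kernels.sh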

fsmark is also covered albeit not necessarily with parameters everyone wants
as configs/config-global-dhp__io-metadata in mmtests.  Example report is

http://beta.suse.com/private/mgorman/results/home/marvin/openSUSE-LEAP-42.2/global-dhp__io-metadata-xfs/delboy/#fsmark-threaded

mmtests has been modified multiple times as methodologies improved
and it's far from perfect, but it seems to me that fsperf
is going to end up reimplementing a lot of it.

It's not perfect: there are multiple quality-of-implementation issues
because it often takes the shortest path to being able to collect data,
but it improves over time. When a test is found to be flawed, it's fixed and
historical data is discarded. It doesn't store data in sqlite or anything
fancy, just the raw logs are preserved and reports generated as required. In
terms of tools required, the core is just bash scripts. Some of the tests
require a number of packages to be installed but not all of them. It uses a
tool to install packages if they are missing but the naming is all based on
opensuse. It *can* map opensuse package names to fedora and debian but the
mappings are not up-to-date as I do not personally run those distributions.

Even with the quality-of-implementation issues, it seems to me that it
covers a lot of the requirements that fsperf aims for.

-- 
Mel Gorman
SUSE Labs


Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)

2016-12-30 Thread Mel Gorman
On Fri, Dec 30, 2016 at 12:05:45PM +0100, Michal Hocko wrote:
> On Fri 30-12-16 10:19:26, Mel Gorman wrote:
> > On Mon, Dec 26, 2016 at 01:48:40PM +0100, Michal Hocko wrote:
> > > On Fri 23-12-16 23:26:00, Nils Holland wrote:
> > > > On Fri, Dec 23, 2016 at 03:47:39PM +0100, Michal Hocko wrote:
> > > > > 
> > > > > Nils, even though this is still highly experimental, could you give 
> > > > > it a
> > > > > try please?
> > > > 
> > > > Yes, no problem! So I kept the very first patch you sent but had to
> > > > revert the latest version of the debugging patch (the one in
> > > > which you added the "mm_vmscan_inactive_list_is_low" event) because
> > > > otherwise the patch you just sent wouldn't apply. Then I rebooted with
> > > > memory cgroups enabled again, and the first thing that strikes the eye
> > > > is that I get this during boot:
> > > > 
> > > > [1.568174] [ cut here ]
> > > > [1.568327] WARNING: CPU: 0 PID: 1 at mm/memcontrol.c:1032 mem_cgroup_update_lru_size+0x118/0x130
> > > > [1.568543] mem_cgroup_update_lru_size(f4406400, 2, 1): lru_size 0 but not empty
> > > 
> > > Ohh, I can see what is wrong! a) there is a bug in the accounting in
> > > my patch (I double account) and b) the detection for the empty list
> > > cannot work after my change because per node zone will not match per
> > > zone statistics. The updated patch is below. So I hope my brain already
> > > works after it's been mostly off last few days...
> > > ---
> > > From 397adf46917b2d9493180354a7b0182aee280a8b Mon Sep 17 00:00:00 2001
> > > From: Michal Hocko 
> > > Date: Fri, 23 Dec 2016 15:11:54 +0100
> > > Subject: [PATCH] mm, memcg: fix the active list aging for lowmem requests
> > >  when memcg is enabled
> > > 
> > > Nils Holland has reported unexpected OOM killer invocations with 32b
> > > kernel starting with 4.8 kernels
> > > 
> > 
> > I think it's unfortunate that per-zone stats are reintroduced to the
> > memcg structure.
> 
> the original patch I had didn't add per zone stats but rather added a
> nr_highmem counter to mem_cgroup_per_node (inside ifdef CONFIG_HIGHMEM).
> This would help for this particular case but it wouldn't work for other
> lowmem requests (e.g. GFP_DMA32) and with the kmem accounting this might
> be a problem in future.

That did occur to me.

> So I've decided to go with a more generic
> approach which requires per-zone tracking. I cannot say I would be
> overly happy about this at all.
> 
> > I can't help but think that it would have also worked
> > to always rotate a small number of pages if !inactive_list_is_low and
> > reclaiming for memcg even if it distorted page aging.
> 
> I am not really sure how that would work. Do you mean something like the
> following?
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index fa30010a5277..563ada3c02ac 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2044,6 +2044,9 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
>   inactive = lruvec_lru_size(lruvec, file * LRU_FILE);
>   active = lruvec_lru_size(lruvec, file * LRU_FILE + LRU_ACTIVE);
>  
> + if (!mem_cgroup_disabled())
> + goto out;
> +
>   /*
>* For zone-constrained allocations, it is necessary to check if
>* deactivations are required for lowmem to be reclaimed. This
> @@ -2063,6 +2066,7 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
>   active -= min(active, active_zone);
>   }
>  
> +out:
>   gb = (inactive + active) >> (30 - PAGE_SHIFT);
>   if (gb)
>   inactive_ratio = int_sqrt(10 * gb);
> 
> The problem I see with such an approach is that chances are that this
> would reintroduce what f8d1a31163fc ("mm: consider whether to decivate
> based on eligible zones inactive ratio") tried to fix. But maybe I have
> missed your point.
> 

No, you didn't miss the point. It was something like that I had in mind
but, as I thought about it, I could see cases where it might not work and
still cause a premature OOM. The per-zone accounting is unfortunate
but it's robust, hence the Ack.

-- 
Mel Gorman
SUSE Labs


Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)

2016-12-30 Thread Mel Gorman
On Mon, Dec 26, 2016 at 01:48:40PM +0100, Michal Hocko wrote:
> On Fri 23-12-16 23:26:00, Nils Holland wrote:
> > On Fri, Dec 23, 2016 at 03:47:39PM +0100, Michal Hocko wrote:
> > > 
> > > Nils, even though this is still highly experimental, could you give it a
> > > try please?
> > 
> > Yes, no problem! So I kept the very first patch you sent but had to
> > revert the latest version of the debugging patch (the one in
> > which you added the "mm_vmscan_inactive_list_is_low" event) because
> > otherwise the patch you just sent wouldn't apply. Then I rebooted with
> > memory cgroups enabled again, and the first thing that strikes the eye
> > is that I get this during boot:
> > 
> > [1.568174] [ cut here ]
> > [1.568327] WARNING: CPU: 0 PID: 1 at mm/memcontrol.c:1032 mem_cgroup_update_lru_size+0x118/0x130
> > [1.568543] mem_cgroup_update_lru_size(f4406400, 2, 1): lru_size 0 but not empty
> 
> Ohh, I can see what is wrong! a) there is a bug in the accounting in
> my patch (I double account) and b) the detection for the empty list
> cannot work after my change because per node zone will not match per
> zone statistics. The updated patch is below. So I hope my brain already
> works after it's been mostly off last few days...
> ---
> From 397adf46917b2d9493180354a7b0182aee280a8b Mon Sep 17 00:00:00 2001
> From: Michal Hocko 
> Date: Fri, 23 Dec 2016 15:11:54 +0100
> Subject: [PATCH] mm, memcg: fix the active list aging for lowmem requests when
>  memcg is enabled
> 
> Nils Holland has reported unexpected OOM killer invocations with 32b
> kernel starting with 4.8 kernels
> 

I think it's unfortunate that per-zone stats are reintroduced to the
memcg structure. I can't help but think that it would have also worked
to always rotate a small number of pages if !inactive_list_is_low and
reclaiming for memcg even if it distorted page aging. However, given
that such an approach would be less robust and this has been heavily
tested;

Acked-by: Mel Gorman 

-- 
Mel Gorman
SUSE Labs


btrfs deadlocks under stress up until 3.12

2013-12-04 Thread Mel Gorman
Hi,

I queued up a number of tests including IO stress tests a few weeks ago
and noticed that some of the btrfs tests failed to complete, but only
looked into it today. Specifically, stress tests with reaim's alltests
configuration on btrfs failed up until 3.12 with a console log that looked like

[ 2882.975251] INFO: task btrfs-transacti:2816 blocked for more than 480 seconds.
[ 2882.994789] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2883.015070] btrfs-transacti D 88023fc13600 0  2816  2 0x
[ 2883.034734]  880234539dc0 0046 880234539fd8 00013600
[ 2883.054847]  880234539fd8 00013600 880230a44540 8801c97868b8
[ 2883.075027]  8802346be9e8 8802346be9e8  8801ef4a6770
[ 2883.095256] Call Trace:
[ 2883.110170]  [] schedule+0x24/0x70
[ 2883.127723]  [] wait_current_trans.isra.18+0xaf/0x110
[ 2883.147034]  [] ? wake_up_atomic_t+0x30/0x30
[ 2883.165492]  [] start_transaction+0x270/0x510
[ 2883.184214]  [] btrfs_attach_transaction+0x12/0x20
[ 2883.203282]  [] transaction_kthread+0x74/0x220
[ 2883.221941]  [] ? verify_parent_transid+0x170/0x170
[ 2883.241048]  [] kthread+0xbb/0xc0
[ 2883.258423]  [] ? kthread_create_on_node+0x110/0x110
[ 2883.277654]  [] ret_from_fork+0x7c/0xb0
[ 2883.295561]  [] ? kthread_create_on_node+0x110/0x110
[ 2883.314535] INFO: task kworker/u16:3:21786 blocked for more than 480 seconds.
[ 2883.334131] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2883.354587] kworker/u16:3   D 88023fc13600 0 21786  2 0x
[ 2883.374274] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-1)
[ 2883.393428]  8801d450bbb0 0046 8801d450bfd8 00013600
[ 2883.413681]  8801d450bfd8 00013600 8801f540a440 88021c9473b0
[ 2883.433798]  88023463a000 8801d450bbd0 8801c97868b8 8801c9786928
[ 2883.453815] Call Trace:
[ 2883.468366]  [] schedule+0x24/0x70
[ 2883.485395]  [] btrfs_commit_transaction+0x265/0x960
[ 2883.503928]  [] ? wake_up_atomic_t+0x30/0x30
[ 2883.521745]  [] btrfs_write_inode+0x70/0xb0
[ 2883.539623]  [] __writeback_single_inode+0x167/0x220
[ 2883.558528]  [] writeback_sb_inodes+0x19f/0x400
[ 2883.577137]  [] wb_writeback+0xe3/0x2b0
[ 2883.595184]  [] ? set_worker_desc+0x71/0x80
[ 2883.613730]  [] bdi_writeback_workfn+0x100/0x3d0
[ 2883.632837]  [] process_one_work+0x178/0x410
[ 2883.651553]  [] worker_thread+0x119/0x3a0
[ 2883.669822]  [] ? rescuer_thread+0x360/0x360
[ 2883.688338]  [] kthread+0xbb/0xc0
[ 2883.705761]  [] ? kthread_create_on_node+0x110/0x110
[ 2883.724865]  [] ret_from_fork+0x7c/0xb0
[ 2883.742994]  [] ? kthread_create_on_node+0x110/0x110

Tests were executed by mmtests using the
configs/config-global-dhp__reaim-stress-alltests as a baseline but with
the following parameters added to use a test partition

export TESTDISK_PARTITION=/dev/sda6
export TESTDISK_FILESYSTEM=btrfs
export TESTDISK_MKFS_PARAM="-f"
export TESTDISK_MOUNT_ARGS=
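
A sketch of how those pieces fit together with an mmtests checkout;
appending the exports to a copy of the config is one way to do it, and the
run name is arbitrary (exact flags may differ between mmtests versions):

    cp configs/config-global-dhp__reaim-stress-alltests config-reaim-btrfs
    cat >> config-reaim-btrfs <<'EOF'
    export TESTDISK_PARTITION=/dev/sda6
    export TESTDISK_FILESYSTEM=btrfs
    export TESTDISK_MKFS_PARAM="-f"
    export TESTDISK_MOUNT_ARGS=
    EOF
    ./run-mmtests.sh --config config-reaim-btrfs reaim-btrfs-3.12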

While it is apparently fixed at the moment, users of any distribution
shipping btrfs on 3.10-longterm or 3.11 may file bugs and complain about
the general stability of btrfs even though the issues are already resolved
upstream. I note there are a number of deadlock-related fixes merged for
btrfs between 3.11 and 3.12. Are there plans to backport them?
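
One quick way to enumerate the candidates, from a mainline checkout (the
grep pattern is a rough heuristic rather than a vetted backport list):

    git log --oneline v3.11..v3.12 -- fs/btrfs/ | grep -iE 'deadlock|hang|lock'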

-- 
Mel Gorman
SUSE Labs


Re: [patch 5/5] Btrfs: pass __GFP_WRITE for buffered write page allocations

2011-10-03 Thread Mel Gorman
On Fri, Sep 30, 2011 at 09:17:24AM +0200, Johannes Weiner wrote:
> Tell the page allocator that pages allocated for a buffered write are
> expected to become dirty soon.
> 
> Signed-off-by: Johannes Weiner 
> Reviewed-by: Rik van Riel 

Acked-by: Mel Gorman 

-- 
Mel Gorman
SUSE Labs


Re: [patch 1/5] mm: exclude reserved pages from dirtyable memory

2011-10-03 Thread Mel Gorman
On Fri, Sep 30, 2011 at 09:17:20AM +0200, Johannes Weiner wrote:
> The amount of dirtyable pages should not include the full number of
> free pages: there is a number of reserved pages that the page
> allocator and kswapd always try to keep free.
> 
> The closer (reclaimable pages - dirty pages) is to the number of
> reserved pages, the more likely it becomes for reclaim to run into
> dirty pages:
> 
>+--+ ---
>|   anon   |  |
>+--+  |
>|  |  |
>|  |  -- dirty limit new-- flusher new
>|   file   |  | |
>|  |  | |
>|  |  -- dirty limit old-- flusher old
>|  ||
>+--+   --- reclaim
>| reserved |
>+--+
>|  kernel  |
>+--+
> 
> This patch introduces a per-zone dirty reserve that takes both the
> lowmem reserve as well as the high watermark of the zone into account,
> and a global sum of those per-zone values that is subtracted from the
> global amount of dirtyable pages.  The lowmem reserve is unavailable
> to page cache allocations and kswapd tries to keep the high watermark
> free.  We don't want to end up in a situation where reclaim has to
> clean pages in order to balance zones.
> 
> Not treating reserved pages as dirtyable on a global level is only a
> conceptual fix.  In reality, dirty pages are not distributed equally
> across zones and reclaim runs into dirty pages on a regular basis.
> 
> But it is important to get this right before tackling the problem on a
> per-zone level, where the distance between reclaim and the dirty pages
> is mostly much smaller in absolute numbers.
> 
> Signed-off-by: Johannes Weiner 
> Reviewed-by: Rik van Riel 

Acked-by: Mel Gorman 

-- 
Mel Gorman
SUSE Labs


Re: [patch 2/2/4] mm: try to distribute dirty pages fairly across zones

2011-09-28 Thread Mel Gorman
On Fri, Sep 23, 2011 at 04:42:48PM +0200, Johannes Weiner wrote:
> The maximum number of dirty pages that exist in the system at any time
> is determined by a number of pages considered dirtyable and a
> user-configured percentage of those, or an absolute number in bytes.
> 
> This number of dirtyable pages is the sum of memory provided by all
> the zones in the system minus their lowmem reserves and high
> watermarks, so that the system can retain a healthy number of free
> pages without having to reclaim dirty pages.
> 
> But there is a flaw in that we have a zoned page allocator which does
> not care about the global state but rather the state of individual
> memory zones.  And right now there is nothing that prevents one zone
> from filling up with dirty pages while other zones are spared, which
> frequently leads to situations where kswapd, in order to restore the
> watermark of free pages, does indeed have to write pages from that
> zone's LRU list.  This can interfere so badly with IO from the flusher
> threads that major filesystems (btrfs, xfs, ext4) mostly ignore write
> requests from reclaim already, taking away the VM's only possibility
> to keep such a zone balanced, aside from hoping the flushers will soon
> clean pages from that zone.
> 
> Enter per-zone dirty limits.  They are to a zone's dirtyable memory
> what the global limit is to the global amount of dirtyable memory, and
> try to make sure that no single zone receives more than its fair share
> of the globally allowed dirty pages in the first place.  As the number
> of pages considered dirtyable exclude the zones' lowmem reserves and
> high watermarks, the maximum number of dirty pages in a zone is such
> that the zone can always be balanced without requiring page cleaning.
> 
> As this is a placement decision in the page allocator and pages are
> dirtied only after the allocation, this patch allows allocators to
> pass __GFP_WRITE when they know in advance that the page will be
> written to and become dirty soon.  The page allocator will then
> attempt to allocate from the first zone of the zonelist - which on
> NUMA is determined by the task's NUMA memory policy - that has not
> exceeded its dirty limit.
> 
> At first glance, it would appear that the diversion to lower zones can
> increase pressure on them, but this is not the case.  With a full high
> zone, allocations will be diverted to lower zones eventually, so it is
> more of a shift in timing of the lower zone allocations.  Workloads
> that previously could fit their dirty pages completely in the higher
> zone may be forced to allocate from lower zones, but the amount of
> pages that 'spill over' are limited themselves by the lower zones'
> dirty constraints, and thus unlikely to become a problem.
> 
> For now, the problem of unfair dirty page distribution remains for
> NUMA configurations where the zones allowed for allocation are in sum
> not big enough to trigger the global dirty limits, wake up the flusher
> threads and remedy the situation.  Because of this, an allocation that
> could not succeed on any of the considered zones is allowed to ignore
> the dirty limits before going into direct reclaim or even failing the
> allocation, until a future patch changes the global dirty throttling
> and flusher thread activation so that they take individual zone states
> into account.
> 
> Signed-off-by: Johannes Weiner 

Acked-by: Mel Gorman 

-- 
Mel Gorman
SUSE Labs


Re: [patch 1/2/4] mm: writeback: cleanups in preparation for per-zone dirty limits

2011-09-28 Thread Mel Gorman
On Fri, Sep 23, 2011 at 04:41:07PM +0200, Johannes Weiner wrote:
> On Thu, Sep 22, 2011 at 10:52:42AM +0200, Johannes Weiner wrote:
> > On Wed, Sep 21, 2011 at 04:02:26PM -0700, Andrew Morton wrote:
> > > Should we rename determine_dirtyable_memory() to
> > > global_dirtyable_memory(), to get some sense of its relationship with
> > > zone_dirtyable_memory()?
> > 
> > Sounds good.
> 
> ---
> 
> The next patch will introduce per-zone dirty limiting functions in
> addition to the traditional global dirty limiting.
> 
> Rename determine_dirtyable_memory() to global_dirtyable_memory()
> before adding the zone-specific version, and fix up its documentation.
> 
> Also, move the functions to determine the dirtyable memory and the
> function to calculate the dirty limit based on that together so that
> their relationship is more apparent and that they can be commented on
> as a group.
> 
> Signed-off-by: Johannes Weiner 

Acked-by: Mel Gorman 

-- 
Mel Gorman
SUSE Labs


Re: [patch 1/4] mm: exclude reserved pages from dirtyable memory

2011-09-22 Thread Mel Gorman
On Thu, Sep 22, 2011 at 11:03:26AM +0200, Johannes Weiner wrote:
> On Wed, Sep 21, 2011 at 04:03:28PM +0100, Mel Gorman wrote:
> > On Wed, Sep 21, 2011 at 03:04:23PM +0100, Mel Gorman wrote:
> > > On Tue, Sep 20, 2011 at 03:45:12PM +0200, Johannes Weiner wrote:
> > > > The amount of dirtyable pages should not include the total number of
> > > > free pages: there is a number of reserved pages that the page
> > > > allocator and kswapd always try to keep free.
> > > > 
> > > > The closer (reclaimable pages - dirty pages) is to the number of
> > > > reserved pages, the more likely it becomes for reclaim to run into
> > > > dirty pages:
> > > > 
> > > >+--+ ---
> > > >|   anon   |  |
> > > >+--+  |
> > > >|  |  |
> > > >|  |  -- dirty limit new-- flusher new
> > > >|   file   |  | |
> > > >|  |  | |
> > > >|  |  -- dirty limit old-- flusher old
> > > >|  ||
> > > >+--+   --- reclaim
> > > >| reserved |
> > > >+--+
> > > >|  kernel  |
> > > >+--+
> > > > 
> > > > Not treating reserved pages as dirtyable on a global level is only a
> > > > conceptual fix.  In reality, dirty pages are not distributed equally
> > > > across zones and reclaim runs into dirty pages on a regular basis.
> > > > 
> > > > But it is important to get this right before tackling the problem on a
> > > > per-zone level, where the distance between reclaim and the dirty pages
> > > > is mostly much smaller in absolute numbers.
> > > > 
> > > > Signed-off-by: Johannes Weiner 
> > > > ---
> > > >  include/linux/mmzone.h |1 +
> > > >  mm/page-writeback.c|8 +---
> > > >  mm/page_alloc.c|1 +
> > > >  3 files changed, 7 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > > > index 1ed4116..e28f8e0 100644
> > > > --- a/include/linux/mmzone.h
> > > > +++ b/include/linux/mmzone.h
> > > > @@ -316,6 +316,7 @@ struct zone {
> > > >  * sysctl_lowmem_reserve_ratio sysctl changes.
> > > >  */
> > > > unsigned long   lowmem_reserve[MAX_NR_ZONES];
> > > > +   unsigned long   totalreserve_pages;
> > > >  
> > > 
> > > This is nit-picking but totalreserve_pages is a poor name because it's
> > > a per-zone value that is one of the lowmem_reserve[] fields instead
> > > of a total. After this patch, we have zone->totalreserve_pages and
> > > totalreserve_pages, but they are not related to the same thing.
> > 
> > As you correctly pointed out to me on IRC, zone->totalreserve_pages
> > is not the lowmem_reserve because it takes the high_wmark into
> > account. Sorry about that, I should have kept thinking.  The name is
> > still poor though because it does not explain what the value is or
> > what it means.
> > 
> > zone->FOO value needs to be related to lowmem_reserve because this
> > is related to balancing zone usage.
> > 
> > zone->FOO value should also be related to the high_wmark because
> > this is avoiding writeback from page reclaim
> > 
> > err... umm... this?
> > 
> > /*
> >  * When allocating a new page that is expected to be
> >  * dirtied soon, the number of free pages and the
> >  * dirty_balance reserve are taken into account. The
> >  * objective is that the globally allowed number of dirty
> >  * pages should be distributed throughout the zones such
> >  * that it is very unlikely that page reclaim will call
> >  * ->writepage.
> >  *
> >  * dirty_balance_reserve takes both lowmem_reserve and
> >  * the high watermark into account. The lowmem_reserve
> >  * is taken into account because we don't want the
> >  * distribution of dirty pages to unnecessarily increase
> >  * lowmem pressure. The watermark is taken into account
> >  * because it's correlated with when kswapd wakes up
> >  * and how long it stays awake.
> >  */
> > unsigned long   dirty_balance_reserve.
> 
> Yes, that's much better, thanks.
> 
> I assume this is meant the same for both the zone and the global level
> and we should not mess with totalreserve_pages in either case?

Yes. I'd even suggest changing the name of totalreserve_pages to make
it clear it is related to overcommit rather than pfmemalloc, dirty
or any other reserve. i.e. s/totalreserve_pages/overcommit_reserve/

-- 
Mel Gorman
SUSE Labs


Re: [patch 1/4] mm: exclude reserved pages from dirtyable memory

2011-09-21 Thread Mel Gorman
On Wed, Sep 21, 2011 at 03:04:23PM +0100, Mel Gorman wrote:
> On Tue, Sep 20, 2011 at 03:45:12PM +0200, Johannes Weiner wrote:
> > The amount of dirtyable pages should not include the total number of
> > free pages: there is a number of reserved pages that the page
> > allocator and kswapd always try to keep free.
> > 
> > The closer (reclaimable pages - dirty pages) is to the number of
> > reserved pages, the more likely it becomes for reclaim to run into
> > dirty pages:
> > 
> >+--+ ---
> >|   anon   |  |
> >+--+  |
> >|  |  |
> >|  |  -- dirty limit new-- flusher new
> >|   file   |  | |
> >|  |  | |
> >|  |  -- dirty limit old-- flusher old
> >|  ||
> >+--+   --- reclaim
> >| reserved |
> >+--+
> >|  kernel  |
> >+--+
> > 
> > Not treating reserved pages as dirtyable on a global level is only a
> > conceptual fix.  In reality, dirty pages are not distributed equally
> > across zones and reclaim runs into dirty pages on a regular basis.
> > 
> > But it is important to get this right before tackling the problem on a
> > per-zone level, where the distance between reclaim and the dirty pages
> > is mostly much smaller in absolute numbers.
> > 
> > Signed-off-by: Johannes Weiner 
> > ---
> >  include/linux/mmzone.h |1 +
> >  mm/page-writeback.c|8 +---
> >  mm/page_alloc.c|1 +
> >  3 files changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 1ed4116..e28f8e0 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -316,6 +316,7 @@ struct zone {
> >  * sysctl_lowmem_reserve_ratio sysctl changes.
> >  */
> > unsigned long   lowmem_reserve[MAX_NR_ZONES];
> > +   unsigned long   totalreserve_pages;
> >  
> 
> This is nit-picking but totalreserve_pages is a poor name because it's
> a per-zone value that is one of the lowmem_reserve[] fields instead
> of a total. After this patch, we have zone->totalreserve_pages and
> totalreserve_pages, but they are not related to the same thing.
> 

As you correctly pointed out to me on IRC, zone->totalreserve_pages
is not the lowmem_reserve because it takes the high_wmark into
account. Sorry about that, I should have kept thinking.  The name is
still poor though because it does not explain what the value is or
what it means.

zone->FOO value needs to be related to lowmem_reserve because this
is related to balancing zone usage.

zone->FOO value should also be related to the high_wmark because
this is avoiding writeback from page reclaim

err... umm... this?

/*
 * When allocating a new page that is expected to be
 * dirtied soon, the number of free pages and the
 * dirty_balance reserve are taken into account. The
 * objective is that the globally allowed number of dirty
 * pages should be distributed throughout the zones such
 * that it is very unlikely that page reclaim will call
 * ->writepage.
 *
 * dirty_balance_reserve takes both lowmem_reserve and
 * the high watermark into account. The lowmem_reserve
 * is taken into account because we don't want the
 * distribution of dirty pages to unnecessarily increase
 * lowmem pressure. The watermark is taken into account
 * because it's correlated with when kswapd wakes up
 * and how long it stays awake.
 */
unsigned long   dirty_balance_reserve.

-- 
Mel Gorman
SUSE Labs


Re: [patch 3/4] mm: filemap: pass __GFP_WRITE from grab_cache_page_write_begin()

2011-09-21 Thread Mel Gorman
On Tue, Sep 20, 2011 at 03:45:14PM +0200, Johannes Weiner wrote:
> Tell the page allocator that pages allocated through
> grab_cache_page_write_begin() are expected to become dirty soon.
> 
> Signed-off-by: Johannes Weiner 

Acked-by: Mel Gorman 

-- 
Mel Gorman
SUSE Labs


Re: [patch 2/4] mm: writeback: distribute write pages across allowable zones

2011-09-21 Thread Mel Gorman
> +  * per-zone dirty limit in the slowpath before going
> +  * into reclaim, which is important when NUMA nodes
> +  * are not big enough to reach the global limit.  The
> +  * proper fix on these setups will require awareness
> +  * of zones in the dirty-throttling and the flusher
> +  * threads.
> +  */

Here would be a good place to explain why we sometimes allow
__GFP_WRITE pages to fall back to lower zones. As it is, the reader
is required to remember that this affects LRU ordering and when/if
reclaim tries to write back the page.

> + if (!(alloc_flags & ALLOC_SLOWPATH) &&
> + (gfp_mask & __GFP_WRITE) && !zone_dirty_ok(zone))
> + goto this_zone_full;
>  
>   BUILD_BUG_ON(ALLOC_NO_WATERMARKS < NR_WMARK);
>   if (!(alloc_flags & ALLOC_NO_WATERMARKS)) {
> @@ -2111,7 +2131,7 @@ restart:
>* reclaim. Now things get more complex, so set up alloc_flags according
>* to how we want to proceed.
>*/
> - alloc_flags = gfp_to_alloc_flags(gfp_mask);
> + alloc_flags = gfp_to_alloc_flags(gfp_mask) | ALLOC_SLOWPATH;
>  

Instead of adding ALLOC_SLOWPATH, check for ALLOC_WMARK_LOW which is
only set in the fast path.

>   /*
>* Find the true preferred zone if the allocation is unconstrained by

Functionally, I did not find a problem with the patch.

-- 
Mel Gorman
SUSE Labs


Re: [patch 1/4] mm: exclude reserved pages from dirtyable memory

2011-09-21 Thread Mel Gorman
On Tue, Sep 20, 2011 at 03:45:12PM +0200, Johannes Weiner wrote:
> The amount of dirtyable pages should not include the total number of
> free pages: there is a number of reserved pages that the page
> allocator and kswapd always try to keep free.
> 
> The closer (reclaimable pages - dirty pages) is to the number of
> reserved pages, the more likely it becomes for reclaim to run into
> dirty pages:
> 
>+--+ ---
>|   anon   |  |
>+--+  |
>|  |  |
>|  |  -- dirty limit new-- flusher new
>|   file   |  | |
>|  |  | |
>|  |  -- dirty limit old-- flusher old
>|  ||
>+--+   --- reclaim
>| reserved |
>+--+
>|  kernel  |
>+--+
> 
> Not treating reserved pages as dirtyable on a global level is only a
> conceptual fix.  In reality, dirty pages are not distributed equally
> across zones and reclaim runs into dirty pages on a regular basis.
> 
> But it is important to get this right before tackling the problem on a
> per-zone level, where the distance between reclaim and the dirty pages
> is mostly much smaller in absolute numbers.
> 
> Signed-off-by: Johannes Weiner 
> ---
>  include/linux/mmzone.h |1 +
>  mm/page-writeback.c|8 +---
>  mm/page_alloc.c|1 +
>  3 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 1ed4116..e28f8e0 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -316,6 +316,7 @@ struct zone {
>* sysctl_lowmem_reserve_ratio sysctl changes.
>*/
>   unsigned long   lowmem_reserve[MAX_NR_ZONES];
> + unsigned long   totalreserve_pages;
>  

This is nit-picking but totalreserve_pages is a poor name because it's
a per-zone value that is one of the lowmem_reserve[] fields instead
of a total. After this patch, we have zone->totalreserve_pages and
totalreserve_pages, but they are not related to the same thing.

It gets confusing once you consider what the values are for.
lowmem_reserve is part of a placement policy that limits the number of
pages placed in lower zones by allocations that could have been satisfied
from higher zones. totalreserve_pages is related to the overcommit
heuristic where it is assuming that the most interesting type of
allocation is GFP_HIGHUSER.

This begs the question - what is this new field, where does it come
from, what does it want from us? Should we take it to our Patch Leader?

This field ultimately affects what zone is used to allocate a new
page so it's related to placement policy. That implies the naming then
should indicate it is related to lowmem_reserve - largest_lowmem_reserve?

Alternative, make it clear that it's one of the lowmem_reserve
values and store the index instead of the value - largest_reserve_idx?

>  #ifdef CONFIG_NUMA
>   int node;
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index da6d263..9f896db 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -169,8 +169,9 @@ static unsigned long highmem_dirtyable_memory(unsigned long total)
>   struct zone *z =
>   &NODE_DATA(node)->node_zones[ZONE_HIGHMEM];
>  
> - x += zone_page_state(z, NR_FREE_PAGES) +
> -  zone_reclaimable_pages(z);
> + x += zone_page_state(z, NR_FREE_PAGES) -
> + zone->totalreserve_pages;
> + x += zone_reclaimable_pages(z);
>   }

This is highmem so zone->totalreserve_pages should always be 0.

Otherwise, the patch seems fine.

>   /*
>* Make sure that the number of highmem pages is never larger
> @@ -194,7 +195,8 @@ static unsigned long determine_dirtyable_memory(void)
>  {
>   unsigned long x;
>  
> - x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
> + x = global_page_state(NR_FREE_PAGES) - totalreserve_pages;
> + x += global_reclaimable_pages();
>  
>   if (!vm_highmem_is_dirtyable)
>   x -= highmem_dirtyable_memory(x);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 1dba05e..7e8e2ee 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5075,6 +5075,7 @@ static void calculate_totalreserve_pages(void)
>  
>   if (max > zone->present_pages)
>   max = zone->present_pages;
> + zone->totalreserve_pages = max;
>   reserve_pages += max;
>   }
>   }

-- 
Mel Gorman
SUSE Labs