and scan rates is marginal but
avoiding unnecessary restarts is important. It helps later patches that
are more careful about how pageblocks are treated as earlier iterations
of those patches hit corner cases where the restarts were punishing and
very visible.
Signed-off-by: Mel Gorman
---
mm
94.26 ( 0.00%) 16249.30 * 20.32%*
Amean fault-both-32 17450.76 ( 0.00%) 14904.71 * 14.59%*
Signed-off-by: Mel Gorman
---
mm/compaction.c | 12 ++--
1 file changed, 2 insertions(+), 10 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 1a41a2dbff24..75eb0d4
patches but it just makes the review slightly
harder.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 61 ++---
1 file changed, 23 insertions(+), 38 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index be27e4fa1b40..1a41a2dbff24 100644
arted
recently so overall the reduction in scan rates is a mere 2.8%, which
is borderline noise.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 18 ++
1 file changed, 18 insertions(+)
diff --git a/mm/compaction.c b/mm/compaction.c
index 921720f7a416..be27e4fa1b40 100644
---
e not materially different.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 16
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 608d274f9880..921720f7a416 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1071,6 +1071,9 @@ s
en in this case. When it does happen,
the scan rates multiply by factors measured in the hundreds and would be
misleading to present.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 32 ++--
mm/internal.h | 1 +
2 files changed, 27 insertions(+), 6 deletions(-)
di
success
rate but also by the fact that the scanners do not meet for longer when
pageblocks are actually used. Overall this is justified and completing
a pageblock scan is very important for later patches.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 95
35%. The 2-socket reductions for the
free scanner are more dramatic which is a likely reflection that the
machine has more memory.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 203 ++--
1 file changed, 198 insertions(+), 5 deletions(-)
diff
showed similar benefits.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 179 +++-
mm/internal.h | 2 +
2 files changed, 179 insertions(+), 2 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 8f0ce44dba41..137e32e8a2f5 100644
( 0.00%) 95.17 ( 5.54%)
Percentage huge-32 89.72 ( 0.00%) 93.59 ( 4.32%)
Compaction migrate scanned 54168306 25516488
Compaction free scanned 800530954 87603321
Migration scan rates are reduced by 52%.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 126
ut it would also be considered a bug given that such a change
would ruin fragmentation.
On both 1-socket and 2-socket machines, scan rates are reduced slightly
on workloads that intensively allocate THP while the system is fragmented.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 16 ++
increased by less
than 1%, which is marginal. However, detailed tracing indicated that
failures of migration due to a premature ENOMEM triggered by watermark
checks were eliminated.
Signed-off-by: Mel Gorman
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm
it is offset by future reductions
in scanning. Hence, the results are not presented this time due to a
misleading mix of gains/losses without any clear pattern. However, full
scanning of the pageblock is important for later patches.
Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
mm/compact
should reduce lock contention slightly in some cases.
The main benefit is removing some partially duplicated code.
Signed-off-by: Mel Gorman
---
include/linux/gfp.h | 7 ++-
mm/compaction.c | 12 +++-
mm/page_alloc.c | 10 +-
3 files changed, 18 insertions(+), 11
.00%) 21707.05 ( 4.43%)
Amean fault-both-32 21692.92 ( 0.00%) 21968.16 ( -1.27%)
The 2-socket results are not materially different. Scan rates are similar
as expected.
Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
mm/migrate.c | 2 +-
1 file changed, 1 insertion(+), 1 delet
. The
change could be much deeper but this was enough to briefly clarify the
flow.
No functional change.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 54 ++
1 file changed, 26 insertions(+), 28 deletions(-)
diff --git a/mm/compaction.c
It's non-obvious that high-order free pages are split into order-0 pages
from the function name. Fix it.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 7acb43f07303..3afa4e9
compact_control spans two cache lines with write-intensive lines on
both. Rearrange so the most write-intensive fields are in the same
cache line. This has a negligible impact on the overall performance of
compaction and is more a tidying exercise than anything.
Signed-off-by: Mel Gorman
Acked
This series reduces scan rates and success rates of compaction, primarily
by using the free lists to shorten scans, better controlling of skip
information and whether multiple scanners can target the same block and
capturing pageblocks before being stolen by parallel requests. The series
is based o
The isolate and migrate scanners should never isolate more than a pageblock
of pages, so unsigned int is sufficient, saving 8 bytes on a 64-bit build.
Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
mm/internal.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm
Whether the last_migrated_pfn field really helps is dubious but, either
way, the information it provides can be inferred without increasing the
size of compact_control, so remove the field.
Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
mm/compaction.c | 25
On Fri, Jan 04, 2019 at 09:18:38AM +0100, Vlastimil Babka wrote:
> On 1/3/19 11:57 PM, Mel Gorman wrote:
> > While zone->flag could have continued to be unused, there is potential
> > for moving some existing fields into the flags field instead. Particularly
> > re
e zone->initialized and zone->contiguous.
Reported-by: syzbot+93d94a001cfbce9e6...@syzkaller.appspotmail.com
Tested-by: Qian Cai
Signed-off-by: Mel Gorman
---
include/linux/mmzone.h | 6 ++
mm/page_alloc.c| 8 +++-
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/i
On Thu, Jan 03, 2019 at 02:40:35PM -0500, Qian Cai wrote:
> > Signed-off-by: Mel Gorman
>
> Tested-by: Qian Cai
Thanks!
--
Mel Gorman
SUSE Labs
isingly,
unused zone flag field. The flag is read without the lock held to
do the wakeup. It's possible that the flag setting context is not
the same as the flag clearing context or for small races to occur.
However, each race possibility is harmless and there is no visible
degradation in f
"distance" is reasonably well understood,
it's not as clear to me whether distance is appropriate to describe
"local-but-different-speed" memory given that accessing a remote
NUMA node can saturate a single link whereas the same may not
be true of local-but-different-speed memory which probably has
dedicated channels. In an ideal world, application developers
interested in higher-speed-memory-reserved-for-important-use and
cheaper-lower-speed-memory could describe what sort of application
modifications they'd be willing to do but that might be unlikely.
--
Mel Gorman
SUSE Labs
It's not necessary to keep track of
the IRQ flags as callers into that path already do things like treat
IRQ disabling and the spin lock separately.
2. Use another alloc_flag in steal_suitable_fallback that is set when a
wakeup is required but do the actual wakeup in rmqueue() after the
zone locks are dropped and the allocation request is completed
3. Always wakeup kswapd if watermarks are boosted. I like this the least
because it means doing wakeups that are unrelated to fragmentation
that occurred in the current context.
Any particular preference?
While I recognise there is no test case available, how often does this
trigger in syzbot? It would be nice to have some confirmation that any
patch really fixes the problem.
--
Mel Gorman
SUSE Labs
not just ...
>
> Mel, Randy? You seem to have been the prime instigators on this.
>
Patch seems fine.
Acked-by: Mel Gorman
--
Mel Gorman
SUSE Labs
ng the skip
bit to avoid picking a migration source that was previously a
migration target
o The exit condition for compaction is not when scanners meet but when
fast_isolate_freepages cannot find any pageblock that is
MIGRATE_MOVABLE && !pageblock_skip
--
Mel Gorman
SUSE Labs
On Thu, Dec 20, 2018 at 11:44:57AM -0800, Yang Shi wrote:
> On Fri, Dec 14, 2018 at 3:03 PM Mel Gorman
> wrote:
> >
> > Pages with no migration handler use a fallback handler which sometimes
> > works and sometimes persistently fails such as blockdev pages. Migration
>
On Tue, Dec 18, 2018 at 10:55:31AM +0100, Vlastimil Babka wrote:
> On 12/15/18 12:03 AM, Mel Gorman wrote:
> > release_pages() is a simpler version of free_unref_page_list() but it
> > tracks the highest PFN for caching the restart point of the compaction
> > free scanner.
On Tue, Dec 18, 2018 at 02:58:33PM +0100, Vlastimil Babka wrote:
> On 12/18/18 2:51 PM, Mel Gorman wrote:
> > On Tue, Dec 18, 2018 at 01:36:42PM +0100, Vlastimil Babka wrote:
> >> On 12/15/18 12:03 AM, Mel Gorman wrote:
> >>> When pageblocks get fragmented, waterma
On Tue, Dec 18, 2018 at 01:36:42PM +0100, Vlastimil Babka wrote:
> On 12/15/18 12:03 AM, Mel Gorman wrote:
> > When pageblocks get fragmented, watermarks are artificially boosted so pages
> > are reclaimed to avoid further fragmentation events. However, compaction
> > is often
On Tue, Dec 18, 2018 at 10:06:31AM +0100, Vlastimil Babka wrote:
> On 12/15/18 12:03 AM, Mel Gorman wrote:
> > Pages with no migration handler use a fallback handler which sometimes
> > works and sometimes persistently fails such as blockdev pages. Migration
> > will retry
On Tue, Dec 18, 2018 at 09:08:02AM +0100, Vlastimil Babka wrote:
> On 12/15/18 12:03 AM, Mel Gorman wrote:
> > Reserved pages are set at boot time, tend to be clustered and almost
> > never become unreserved. When isolating pages for migrating, skip
> > the entire pagebloc
On Mon, Dec 17, 2018 at 03:06:59PM +0100, Vlastimil Babka wrote:
> On 12/15/18 12:03 AM, Mel Gorman wrote:
> > It's non-obvious that high-order free pages are split into order-0
> > pages from the function name. Fix it.
>
> That's fine, but looks like the patch h
the per-zone
> free_area's to determine migration targets and set a bit if it should be
> considered a migration source or a migration target. If all pages for a
> pageblock are not on free_areas, they are fully used.
>
Series has patches which implement something similar to this idea.
--
Mel Gorman
SUSE Labs
%) 99.22 ( 3.86%)
Percentage huge-32 94.94 ( 0.00%) 98.97 ( 4.25%)
And scan rates are reduced
Compaction migrate scanned 27634284 19002941
Compaction free scanned 55279519 46395714
Signed-off-by: Mel Gorman
---
include/linux/compaction.h | 3 ++-
include/linux
s THP, they are forbidden at the time of writing but if __GFP_THISNODE
is ever removed, then it would still be preferable to fall back to small
local base pages over remote THP in the general case. kcompactd is still
woken via kswapd so compaction happens eventually.
Signed-off-by: Mel Gorman
--
0-rc6
isolmig-v1r4 findfree-v1r8
Compaction migrate scanned 25587453 27634284
Compaction free scanned 87735894 55279519
The free scan rates are reduced by 37%.
Signed-off-by: Mel Gorman
---
mm/compaction.c
4.12%)
Compaction migrate scanned 51005450 25587453
Compaction free scanned 780359464 87735894
Migration scan rates are reduced by 49%. At the time of writing, the
2-socket results are not yet available.
Signed-off-by: Mel Gorman
---
mm/compaction.c
showing a 16% reduction in migration scanning with some mild
improvements on latency. A 2-socket machine showed similar reductions
of scan rates in percentage terms.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 179 +++-
mm/internal.h | 2
release_pages() is a simpler version of free_unref_page_list() but it
tracks the highest PFN for caching the restart point of the compaction
free scanner. This patch optionally tracks the highest PFN in the core
helper and converts compaction to use it.
Signed-off-by: Mel Gorman
---
include
Whether the last_migrated_pfn field really helps is dubious but, either
way, the information it provides can be inferred without increasing the
size of compact_control, so remove the field.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 25 +
mm/internal.h
f the pageblock and sometimes it is offset by future
reductions in scanning. Hence, the results are not presented this time as
it's a mix of gains/losses without any clear pattern. However, completing
scanning of the pageblock is important for later patches.
Signed-off-by: Mel Gorman
---
mm/co
It's non-obvious that high-order free pages are split into order-0
pages from the function name. Fix it.
Signed-off-by: Mel Gorman
---
mm/compaction.c | 60 -
1 file changed, 29 insertions(+), 31 deletions(-)
diff --git a/mm/compact
compact_control spans two cache lines with write-intensive lines on
both. Rearrange so the most write-intensive fields are in the same
cache line. This has a negligible impact on the overall performance of
compaction and is more a tidying exercise than anything.
Signed-off-by: Mel Gorman
---
mm
sensitive to timing and whether the boost was active or not. However,
detailed tracing indicated that failures of migration due to a premature
ENOMEM triggered by watermark checks were eliminated.
Signed-off-by: Mel Gorman
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion
76.43 ( 0.00%) 1052.64 * 10.52%*
Compaction migrate scanned 3860713 3294284
Compaction free scanned 613786341 433423502
Kcompactd migrate scanned 408711 291915
Kcompactd free scanned 242509759 217164988
Signed-off-by: Mel Gorman
---
mm/compaction.
The isolate and migrate scanners should never isolate more than a pageblock
of pages so unsigned int is sufficient saving 8 bytes on a 64-bit build.
Signed-off-by: Mel Gorman
---
mm/internal.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
This is a very preliminary RFC. I'm posting this early as the
__GFP_THISNODE discussion continues and has started looking at the
compaction implementation, and it'd be worth looking at this series
first. The cc list is based on that discussion just to make them aware
it exists. A v2 will have a sign
( 4.62%)
Amean fault-both-32 22461.41 ( 0.00%) 21415.35 ( 4.66%)
The 2-socket results are not materially different. Scan rates are
similar as expected.
Signed-off-by: Mel Gorman
---
mm/migrate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/migrate.c b
ts as a regular user, but
> they seem to want to modify:
>
> /sys/kernel/mm/transparent_hugepage/enabled
>
Red herring in this case. Even if transparent hugepages are left as the
default, it still tries to write it stupidly. An irritating, but
harmless bug.
--
Mel Gorman
SUSE Labs
On Wed, Dec 05, 2018 at 10:08:56AM +0100, Michal Hocko wrote:
> On Tue 04-12-18 16:47:23, David Rientjes wrote:
> > On Tue, 4 Dec 2018, Mel Gorman wrote:
> >
> > > What should also be kept in mind is that we should avoid conflating
> > > locality preferences with
ot. It affects the level of work the system does
as well as the overall success rate of operations (be it reclaim, THP
allocation, compaction, whatever). This is why a reproduction case that is
representative of the problem you're facing on the real workload would
have been helpful, because then any alternative proposal could have
taken your workload into account during testing.
--
Mel Gorman
SUSE Labs
On Tue, Dec 04, 2018 at 10:45:58AM +, Mel Gorman wrote:
> I have *one* result of the series on a 1-socket machine running
> "thpscale". It creates a file, punches holes in it to create a
> very light form of fragmentation and then tries THP allocations
> using madvise
but probably worthwhile
> > for long-term allocation success rates. It is possible to eliminate
> > fragmentation events entirely with tuning due to this patch although that
> > would require careful evaluation to determine if it's worthwhile.
> >
> > Signed-off-
check first if we can defragment the memory or
> whether it makes sense to free pages in case the defragmentation is
> expected to help afterwards. It seemed better to put this special case
> out of the main reclaim/compaction retry-with-increasing-priority loop
> for non-costly-order allocations that in general can't fail.
>
Again, this is accurate. Scanning/compaction costs a lot. This has improved
over time, but minimally it's unmapping pages, copying data and a bunch
of TLB flushes. During migration, any access to the data being migrated
stalls. The harm of reclaiming a little first so that the compaction is
more likely to succeed incurred fewer stalls of small magnitude in
general -- or at least it was the case when that behaviour was
developed.
--
Mel Gorman
SUSE Labs
but probably worthwhile
> > for long-term allocation success rates. It is possible to eliminate
> > fragmentation events entirely with tuning due to this patch although that
> > would require careful evaluation to determine if it's worthwhile.
> >
> > Signed-off-
ndicated it would) and that disabling PSI by default is reasonably
close in terms of performance for this particular workload on this
particular machine so;
Tested-by: Mel Gorman
Thanks!
--
Mel Gorman
SUSE Labs
On Mon, Nov 26, 2018 at 12:32:18PM -0500, Johannes Weiner wrote:
> On Mon, Nov 26, 2018 at 04:54:47PM +0000, Mel Gorman wrote:
> > On Mon, Nov 26, 2018 at 11:07:24AM -0500, Johannes Weiner wrote:
> > > @@ -509,6 +509,15 @@ config PSI
> > >
> > > Sa
On Mon, Nov 26, 2018 at 11:07:24AM -0500, Johannes Weiner wrote:
> Hi Mel,
>
> On Mon, Nov 26, 2018 at 01:34:20PM +0000, Mel Gorman wrote:
> > Hi Johannes,
> >
> > PSI is a great idea but it does have overhead and if enabled by Kconfig
> > then it incurs a hit
Vlastimil Babka correctly pointed out that the ALLOC_KSWAPD flag needs to be
applied in the !CONFIG_ZONE_DMA32 case. This is a fix for the mmotm path
mm-use-alloc_flags-to-record-if-kswapd-can-wake.patch
Signed-off-by: Mel Gorman
---
mm/page_alloc.c | 10 ++
1 file changed, 2 insertions
git bisect bad 505802a53510e54ad5fbbd655a68893df83bfb91
# bad: [2ce7135adc9ad081aa3c49744144376ac74fea60] psi: cgroup support
git bisect bad 2ce7135adc9ad081aa3c49744144376ac74fea60
# first bad commit: [2ce7135adc9ad081aa3c49744144376ac74fea60] psi: cgroup support
--
Mel Gorman
SUSE Labs
claimed that this has nothing to do with ALLOC_NO_FRAGMENT.
That's true in this patch but is not true later so it's done now for
easier review to show where the flag needs to be recorded.
No functional change.
Signed-off-by: Mel Gorman
---
mm/internal.h | 1 +
mm/page_al
ong-term allocation success rate would be higher.
Signed-off-by: Mel Gorman
---
Documentation/sysctl/vm.txt | 21 +++
include/linux/mm.h | 1 +
include/linux/mmzone.h | 11 ++--
kernel/sysctl.c | 8 +++
mm/page_alloc.c | 43 +-
mm/vms
This is a preparation patch only, no functional change.
Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
include/linux/mmzone.h | 9 +
mm/compaction.c| 2 +-
mm/page_alloc.c| 12 ++--
3 files changed, 12 insertions(+), 11 deletions(-)
diff --git a
tart on node 0 or not for this patch but the relevance is
reduced later in the series.
Overall, the patch reduces the number of external fragmentation causing
events so the success of THP over long periods of time would be improved
for this adverse workload.
There are some big changes due to both Vlastimil's review feedback on v4 and
some oddities spotted while answering his review. In some respects, the
series is slightly less effective but the approach is more consistent and
logical overall. The overhead is also lower from the first patch and stalls
alls can be enough for
kswapd to catch up. How much that helps is variable but probably worthwhile
for long-term allocation success rates. It is possible to eliminate
fragmentation events entirely with tuning due to this patch although that
would require careful evaluation to determine if it
On Thu, Nov 22, 2018 at 06:02:10PM +0100, Vlastimil Babka wrote:
> On 11/21/18 11:14 AM, Mel Gorman wrote:
> > An event that potentially causes external fragmentation problems has
> > already been described but there are degrees of severity. A "serious"
> > event
check gfp_flags I'm afraid, and that doesn't seem worth the trouble.
Indeed. While it works in some cases, it'll be full of holes and while
I could close them, it just turns into a subtle mess. I've prepared a
preparation path that encodes __GFP_KSWAPD_RECLAIM in alloc_flags and checks
based on that. It's a lot cleaner overall, it's less of a mess than passing
gfp_flags all the way through for one test and there are fewer side-effects.
Thanks!
--
Mel Gorman
SUSE Labs
ns without __GFP_KSWAPD_RECLAIM? But returning 0 here means
> actually allowing the allocation go through steal_suitable_fallback()?
> So should it return ALLOC_NOFRAGMENT below, or was the intent different?
>
I want to avoid waking kswapd in steal_suitable_fallback if waking
kswapd is not allowed. If the calling context does not allow it, it does
mean that fragmentation will be allowed to occur. I'm banking on it
being a relatively rare case but potentially it'll be problematic. The
main source of allocation requests that I expect to hit this are THP and
as they are already at pageblock_order, it has limited impact from a
fragmentation perspective -- particularly as pageblock_order stealing is
allowed even with ALLOC_NOFRAGMENT.
--
Mel Gorman
SUSE Labs
zoneref *z = ac->preferred_zoneref;
> > struct zone *zone;
> > struct pglist_data *last_pgdat_dirty_limit = NULL;
> > + bool no_fallback;
> >
> > +retry:
>
> Ugh, I think 'z = ac->preferred_zoneref' should be moved here under
> retry. AFAICS without that, the preference of local node to
> fragmentation avoidance doesn't work?
>
Yup, you're right!
In the event of fragmentation of both normal and dma32 zone, it doesn't
restart on the local node and instead falls over to the remote node
prematurely. This is obviously not desirable. I'll give it and thanks
for spotting it.
--
Mel Gorman
SUSE Labs
No major change from v3 really, mostly resending to see if there is any
review reaction. It's rebased but a partial test indicated that the
behaviour is similar to the previous baseline
Changelog since v3
o Rebase to 4.20-rc3
o Remove a stupid warning from the last patch
Changelog since v2
o Drop
This is a preparation patch only, no functional change.
Signed-off-by: Mel Gorman
---
include/linux/mmzone.h | 9 +
mm/compaction.c| 2 +-
mm/page_alloc.c| 12 ++--
3 files changed, 12 insertions(+), 11 deletions(-)
diff --git a/include/linux/mmzone.h b
tch significantly reduces the number of external
fragmentation causing events so the success of THP over long periods of
time would be improved for this adverse workload. While there are large
differences compared to how V1 behaved, this is almost entirely accounted
for by ac5b2c18911f ("mm:
s under quite some pressure.
Signed-off-by: Mel Gorman
---
Documentation/sysctl/vm.txt | 19 +++
include/linux/mm.h | 1 +
include/linux/mmzone.h | 11 ++--
kernel/sysctl.c | 8 +++
mm/page_alloc.c | 53 +--
mm/vms
limiting the fragmentation events. On the flip-side,
it has been checked that setting the fragment_stall_order to 9 eliminated
fragmentation events entirely.
Signed-off-by: Mel Gorman
---
Documentation/sysctl/vm.txt | 23 +++
include/linux/mm.h| 1 +
include/linux/
ad in advance, it might bring this in the right direction and not
accidentally throw Anthony down a hole working on a series that never
gets ack'd.
working on a series that never gets ack'd.
I'm not necessarily the best person to answer because my natural inclination
after the fragmentation series would be to keep using thpfioscale
(from the fragmentation avoidance series) and work on improving the THP
allocation success rates and reduce latencies. I've tunnel vision on that
for the moment.
Thanks.
--
Mel Gorman
SUSE Labs
On Fri, Nov 09, 2018 at 03:13:18PM +0300, Kirill A. Shutemov wrote:
> On Thu, Nov 08, 2018 at 10:48:58PM -0800, Anthony Yznaga wrote:
> > The basic idea as outlined by Mel Gorman in [2] is:
> >
> > 1) On first fault in a sufficiently sized range, allocate a huge page
>
On Thu, Nov 08, 2018 at 10:48:58PM -0800, Anthony Yznaga wrote:
> The basic idea as outlined by Mel Gorman in [2] is:
>
> 1) On first fault in a sufficiently sized range, allocate a huge page
>sized and aligned block of base pages. Map the base page
>corresponding to th
It's unfortunate and I know the original microoptimisation
was mine but if the fast-path check ends up being a problem then I/we go
back to finding ways of making the page allocator faster from a fundamental
algorithmic point of view and not a microoptimisation approach. There is
potential fruit there, just none that is low-hanging.
--
Mel Gorman
SUSE Labs
This is a preparation patch only, no functional change.
Signed-off-by: Mel Gorman
---
include/linux/mmzone.h | 9 +
mm/compaction.c| 2 +-
mm/page_alloc.c| 12 ++--
3 files changed, 12 insertions(+), 11 deletions(-)
diff --git a/include/linux/mmzone.h b
limiting the fragmentation events. On the flip-side,
it has been checked that setting the fragment_stall_order to 9 eliminated
fragmentation events entirely.
Signed-off-by: Mel Gorman
---
Documentation/sysctl/vm.txt | 23 +++
include/linux/mm.h| 1 +
include/linux/
tch significantly reduces the number of external
fragmentation causing events so the success of THP over long periods of
time would be improved for this adverse workload. While there are large
differences compared to how V1 behaved, this is almost entirely accounted
for by ac5b2c18911f ("mm:
s under quite some pressure.
Signed-off-by: Mel Gorman
---
Documentation/sysctl/vm.txt | 19 +++
include/linux/mm.h | 1 +
include/linux/mmzone.h | 11 ++--
kernel/sysctl.c | 8 +++
mm/page_alloc.c | 53 +--
mm/vms
Sorry to send out a v3 so quickly. I dropped patch 5 as I'm not very happy
with the approach or that it is without side-effects. I have some ideas
on how it could be better achieved which can be done without delaying the
other 4 patches. I've also updated patch 4 to reduce the stall timeout as
lon
The 1-socket machine is different to the one used in v1 so some of the
results are changed on that basis. The baseline has changed to 4.20-rc1 so
the __GFP_THISNODE removal for THP is in effect which alters the behaviour
on 2-socket in particular. The biggest changes are in the fourth patch,
both
This is a preparation patch only, no functional change.
Signed-off-by: Mel Gorman
---
include/linux/mmzone.h | 9 +
mm/compaction.c| 2 +-
mm/page_alloc.c| 12 ++--
3 files changed, 12 insertions(+), 11 deletions(-)
diff --git a/include/linux/mmzone.h b
the case of MADV_HUGEPAGE, the allocation
success rates were already high. However, it's encouraging that the THP
allocation latencies were improved.
Signed-off-by: Mel Gorman
---
include/linux/compaction.h| 4 ++
include/linux/migrate.h | 7 +-
include/linux/mmzone.h
s under quite some pressure.
Signed-off-by: Mel Gorman
---
Documentation/sysctl/vm.txt | 19 +++
include/linux/mm.h | 1 +
include/linux/mmzone.h | 11 ++--
kernel/sysctl.c | 8 +++
mm/page_alloc.c | 53 +--
mm/vms
tch significantly reduces the number of external
fragmentation causing events so the success of THP over long periods of
time would be improved for this adverse workload. While there are large
differences compared to how V1 behaved, this is almost entirely accounted
for by ac5b2c18911f ("mm:
made available for analysis to see if the stall behaviour can be
reduced while still limiting the fragmentation events. On the flip-side,
it has been checked that setting the fragment_stall_order to 9 eliminated
fragmentation events entirely.
Signed-off-by: Mel Gorman
---
Documentation/sysct
On Mon, Nov 05, 2018 at 03:40:40PM -0800, Subhra Mazumdar wrote:
>
> On 11/5/18 2:08 AM, Mel Gorman wrote:
> > Adding Al Viro as per get_maintainers.pl.
> >
> > On Tue, Sep 25, 2018 at 04:32:40PM -0700, subhra mazumdar wrote:
> > > Introduce pipe_ll_usec field fo
nu guesses how long it'll be in an idle state for).
It's not really my area but I feel that this patch is a benchmark-specific
hack and that tuning it on a system-wide basis will be a game of "win
some, lose some" that is never used in practice. Worse, it might end up
in a tuning guide as "always set this sysctl" without considering the
capabilities of the machine or the workload and falls victim to cargo
cult tuning.
--
Mel Gorman
SUSE Labs
On Wed, Oct 31, 2018 at 04:06:43PM +, Mel Gorman wrote:
> An external fragmentation event was previously described as
>
> When the page allocator fragments memory, it records the event using
> the mm_page_alloc_extfrag event. If the fallback_order is smaller
> t
er of external
fragmentation causing events so the success of THP over long
periods of time would be improved for this adverse workload.
Signed-off-by: Mel Gorman
---
mm/internal.h | 13 +---
mm/page_alloc.c | 101 ++--
2 files changed,
hemselves can increase fragmentation pressure.
This is less of an obvious universal win. It does control fragmentation
better to some extent in that pageblocks can be found faster in some
cases but the nature of the workload makes it less clear-cut.
Signed-off-by: Mel Gorman
---
include/linux/compaction.h
stall_order to 9 eliminated fragmentation events entirely
on the 1-socket machine and by 99.71% on the 2-socket machine.
Signed-off-by: Mel Gorman
---
Documentation/sysctl/vm.txt | 23 +++
include/linux/mm.h | 1 +
include/linux/mmzone.h | 2 ++
kernel/sysctl.c