Re: [RFC/RFT][PATCH v6] cpuidle: New timer events oriented governor for tickless systems

2018-12-07 Thread Mel Gorman
a regular user, but > they seem to want to modify: > > /sys/kernel/mm/transparent_hugepage/enabled > Red herring in this case. Even if transparent hugepages are left as the default, it still tries to write it stupidly. An irritating, but harmless bug. -- Mel Gorman SUSE Labs

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-05 Thread Mel Gorman
On Wed, Dec 05, 2018 at 10:08:56AM +0100, Michal Hocko wrote: > On Tue 04-12-18 16:47:23, David Rientjes wrote: > > On Tue, 4 Dec 2018, Mel Gorman wrote: > > > > > What should also be kept in mind is that we should avoid conflating > > > locality preferences with

Re: [patch 0/2 for-4.20] mm, thp: fix remote access and allocation regressions

2018-12-05 Thread Mel Gorman
t affects the level of work the system does as well as the overall success rate of operations (be it reclaim, THP allocation, compaction, whatever). This is why a reproduction case that is representative of the problem you're facing on the real workload would have been helpful because then any alternative proposal could have taken your workload into account during testing. -- Mel Gorman SUSE Labs

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-05 Thread Mel Gorman
On Tue, Dec 04, 2018 at 10:45:58AM +0000, Mel Gorman wrote: > I have *one* result of the series on a 1-socket machine running > "thpscale". It creates a file, punches holes in it to create a > very light form of fragmentation and then tries THP allocations > using mad

Re: [PATCH 5/5] mm: Stall movable allocations until kswapd progresses during serious external fragmentation event

2018-12-05 Thread Mel Gorman
robably worthwhile > > for long-term allocation success rates. It is possible to eliminate > > fragmentation events entirely with tuning due to this patch although that > > would require careful evaluation to determine if it's worthwhile. > > > > Signed-off-by: Mel Go

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-04 Thread Mel Gorman
r to put this special case > out of the main reclaim/compaction retry-with-increasing-priority loop > for non-costly-order allocations that in general can't fail. > Again, this is accurate. Scanning/compaction costs a lot. This has improved over time, but minimally it's unmapping pages, copying data and a bunch of TLB flushes. During migration, any access to the data being migrated stalls. The harm of reclaiming a little first so that the compaction is more likely to succeed incurred fewer stalls of small magnitude in general -- or at least it was the case when that behaviour was developed. -- Mel Gorman SUSE Labs

Re: [PATCH 5/5] mm: Stall movable allocations until kswapd progresses during serious external fragmentation event

2018-11-27 Thread Mel Gorman
robably worthwhile > > for long-term allocation success rates. It is possible to eliminate > > fragmentation events entirely with tuning due to this patch although that > > would require careful evaluation to determine if it's worthwhile. > > > > Signed-off-by: Mel Go

Re: Hackbench pipes regression bisected to PSI

2018-11-26 Thread Mel Gorman
icated it would) and that disabling PSI by default is reasonably close in terms of performance for this particular workload on this particular machine so; Tested-by: Mel Gorman Thanks! -- Mel Gorman SUSE Labs

Re: Hackbench pipes regression bisected to PSI

2018-11-26 Thread Mel Gorman
On Mon, Nov 26, 2018 at 12:32:18PM -0500, Johannes Weiner wrote: > On Mon, Nov 26, 2018 at 04:54:47PM +0000, Mel Gorman wrote: > > On Mon, Nov 26, 2018 at 11:07:24AM -0500, Johannes Weiner wrote: > > > @@ -509,6 +509,15 @@ config PSI > > > > > > Sa

Re: Hackbench pipes regression bisected to PSI

2018-11-26 Thread Mel Gorman
On Mon, Nov 26, 2018 at 11:07:24AM -0500, Johannes Weiner wrote: > Hi Mel, > > On Mon, Nov 26, 2018 at 01:34:20PM +0000, Mel Gorman wrote: > > Hi Johannes, > > > > PSI is a great idea but it does have overhead and if enabled by Kconfig > > then it incur

[PATCH] mm: Use alloc_flags to record if kswapd can wake -fix

2018-11-26 Thread Mel Gorman
Vlastimil Babka correctly pointed out that the ALLOC_KSWAPD flag needs to be applied in the !CONFIG_ZONE_DMA32 case. This is a fix for the mmotm patch mm-use-alloc_flags-to-record-if-kswapd-can-wake.patch Signed-off-by: Mel Gorman --- mm/page_alloc.c | 10 ++ 1 file changed, 2 insertions

Hackbench pipes regression bisected to PSI

2018-11-26 Thread Mel Gorman
60] psi: cgroup support git bisect bad 2ce7135adc9ad081aa3c49744144376ac74fea60 # first bad commit: [2ce7135adc9ad081aa3c49744144376ac74fea60] psi: cgroup support -- Mel Gorman SUSE Labs

[PATCH 3/5] mm: Use alloc_flags to record if kswapd can wake

2018-11-23 Thread Mel Gorman
be claimed that this has nothing to do with ALLOC_NO_FRAGMENT. That's true in this patch but is not true later so it's done now for easier review to show where the flag needs to be recorded. No functional change. Signed-off-by: Mel Gorman --- mm/internal.h | 1 + mm/page_alloc.c | 25

[PATCH 4/5] mm: Reclaim small amounts of memory when an external fragmentation event occurs

2018-11-23 Thread Mel Gorman
erm allocation success rate would be higher. Signed-off-by: Mel Gorman --- Documentation/sysctl/vm.txt | 21 +++ include/linux/mm.h | 1 + include/linux/mmzone.h | 11 ++-- kernel/sysctl.c | 8 +++ mm/page_alloc.c | 43 +- mm/vmscan.c

[PATCH 2/5] mm: Move zone watermark accesses behind an accessor

2018-11-23 Thread Mel Gorman
This is a preparation patch only, no functional change. Signed-off-by: Mel Gorman Acked-by: Vlastimil Babka --- include/linux/mmzone.h | 9 + mm/compaction.c | 2 +- mm/page_alloc.c | 12 ++-- 3 files changed, 12 insertions(+), 11 deletions(-) diff --git

[PATCH 1/5] mm, page_alloc: Spread allocations across zones before introducing fragmentation

2018-11-23 Thread Mel Gorman
the relevance is reduced later in the series. Overall, the patch reduces the number of external fragmentation causing events so the success of THP over long periods of time would be improved for this adverse workload. Signed-off-by: Mel Gorman --- mm/inte

[PATCH 0/5] Fragmentation avoidance improvements v5

2018-11-23 Thread Mel Gorman
There are some big changes due to both Vlastimil's review feedback on v4 and some oddities spotted while answering his review. In some respects, the series is slightly less effective but the approach is more consistent and logical overall. The overhead is also lower from the first patch and

[PATCH 5/5] mm: Stall movable allocations until kswapd progresses during serious external fragmentation event

2018-11-23 Thread Mel Gorman
n be enough for kswapd to catch up. How much that helps is variable but probably worthwhile for long-term allocation success rates. It is possible to eliminate fragmentation events entirely with tuning due to this patch although that would require careful evaluation to determine if it's worthwhil

Re: [PATCH 4/4] mm: Stall movable allocations until kswapd progresses during serious external fragmentation event

2018-11-22 Thread Mel Gorman
On Thu, Nov 22, 2018 at 06:02:10PM +0100, Vlastimil Babka wrote: > On 11/21/18 11:14 AM, Mel Gorman wrote: > > An event that potentially causes external fragmentation problems has > > already been described but there are degrees of severity. A "serious" > > even

Re: [PATCH 3/4] mm: Reclaim small amounts of memory when an external fragmentation event occurs

2018-11-22 Thread Mel Gorman
sn't seem worth the trouble. Indeed. While it works in some cases, it'll be full of holes and while I could close them, it just turns into a subtle mess. I've prepared a preparation patch that encodes __GFP_KSWAPD_RECLAIM in alloc_flags and checks based on that. It's a lot cleaner overall, it's less of a mess than passing gfp_flags all the way through for one test and there are fewer side-effects. Thanks! -- Mel Gorman SUSE Labs

Re: [PATCH 3/4] mm: Reclaim small amounts of memory when an external fragmentation event occurs

2018-11-22 Thread Mel Gorman
But returning 0 here means > actually allowing the allocation go through steal_suitable_fallback()? > So should it return ALLOC_NOFRAGMENT below, or was the intent different? > I want to avoid waking kswapd in steal_suitable_fallback if waking kswapd is not allowed. If the calling context does not allow it, it does mean that fragmentation will be allowed to occur. I'm banking on it being a relatively rare case but potentially it'll be problematic. The main source of allocation requests that I expect to hit this are THP and as they are already at pageblock_order, it has limited impact from a fragmentation perspective -- particularly as pageblock_order stealing is allowed even with ALLOC_NOFRAGMENT. -- Mel Gorman SUSE Labs

Re: [PATCH 1/4] mm, page_alloc: Spread allocations across zones before introducing fragmentation

2018-11-21 Thread Mel Gorman
zoneref *z = ac->preferred_zoneref; > > struct zone *zone; > > struct pglist_data *last_pgdat_dirty_limit = NULL; > > + bool no_fallback; > > > > +retry: > > Ugh, I think 'z = ac->preferred_zoneref' should be moved here under > retry. AFAICS without that, the preference of local node to > fragmentation avoidance doesn't work? > Yup, you're right! In the event of fragmentation of both normal and dma32 zone, it doesn't restart on the local node and instead falls over to the remote node prematurely. This is obviously not desirable. I'll fix it and thanks for spotting it. -- Mel Gorman SUSE Labs
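The exchange above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel's allocator loop: `struct zone_model`, `pick_zone()` and the `reset_on_retry` switch are hypothetical names invented here, and the per-zone `would_fragment` flag is an assumed simplification of the real fragmentation check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a zone in a zonelist. */
struct zone_model {
    int node;            /* NUMA node the zone belongs to */
    bool has_free;       /* zone can satisfy the request at all */
    bool would_fragment; /* satisfying it would mix migrate types */
};

/*
 * Walk the zonelist starting at the preferred zone. On the first pass
 * (no_fallback == true) zones that would fragment are skipped. Before
 * leaving the local node, the walk is retried with fragmentation allowed.
 * reset_on_retry models the fix: restart from the preferred zone instead
 * of continuing from wherever the cursor stopped.
 * Returns the index of the chosen zone, or -1 if nothing fits.
 */
int pick_zone(const struct zone_model *zones, size_t nr,
              size_t preferred, bool reset_on_retry)
{
    size_t z = preferred;
    bool no_fallback = true;
    int local = zones[preferred].node;

retry:
    if (reset_on_retry)
        z = preferred;

    for (; z < nr; z++) {
        if (!zones[z].has_free)
            continue;
        if (no_fallback && zones[z].node != local) {
            /* About to go remote: prefer local fragmentation instead. */
            no_fallback = false;
            goto retry;
        }
        if (no_fallback && zones[z].would_fragment)
            continue;
        return (int)z;
    }

    if (no_fallback) {
        no_fallback = false;
        goto retry;
    }
    return -1;
}
```

With two fragmented local zones followed by a clean remote zone, the model picks the local zone when the cursor is reset and the remote zone when it is not, which matches the premature fallover described in the reply.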

[PATCH 0/4] Fragmentation avoidance improvements v4

2018-11-21 Thread Mel Gorman
No major change from v3 really, mostly resending to see if there is any review reaction. It's rebased but a partial test indicated that the behaviour is similar to the previous baseline Changelog since v3 o Rebase to 4.20-rc3 o Remove a stupid warning from the last patch Changelog since v2 o

[PATCH 2/4] mm: Move zone watermark accesses behind an accessor

2018-11-21 Thread Mel Gorman
This is a preparation patch only, no functional change. Signed-off-by: Mel Gorman --- include/linux/mmzone.h | 9 + mm/compaction.c | 2 +- mm/page_alloc.c | 12 ++-- 3 files changed, 12 insertions(+), 11 deletions(-) diff --git a/include/linux/mmzone.h b

[PATCH 1/4] mm, page_alloc: Spread allocations across zones before introducing fragmentation

2018-11-21 Thread Mel Gorman
nal fragmentation causing events so the success of THP over long periods of time would be improved for this adverse workload. While there are large differences compared to how V1 behaved, this is almost entirely accounted for by ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE m

[PATCH 3/4] mm: Reclaim small amounts of memory when an external fragmentation event occurs

2018-11-21 Thread Mel Gorman
er quite some pressure. Signed-off-by: Mel Gorman --- Documentation/sysctl/vm.txt | 19 +++ include/linux/mm.h | 1 + include/linux/mmzone.h | 11 ++-- kernel/sysctl.c | 8 +++ mm/page_alloc.c | 53 +-- mm/vmscan.c

[PATCH 4/4] mm: Stall movable allocations until kswapd progresses during serious external fragmentation event

2018-11-21 Thread Mel Gorman
e fragmentation events. On the flip-side, it has been checked that setting the fragment_stall_order to 9 eliminated fragmentation events entirely. Signed-off-by: Mel Gorman --- Documentation/sysctl/vm.txt | 23 +++ include/linux/mm.h | 1 + include/linux/mmzone.h
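For a sense of scale, an allocation of order n covers 2^n contiguous base pages. The helper below is a minimal sketch with a hypothetical name (`order_to_bytes`); the 4 KiB base page size used in the note after it is an assumption, since page size depends on the architecture and configuration.

```c
#include <stddef.h>

/* Size in bytes of a block of the given order: page_size * 2^order. */
size_t order_to_bytes(unsigned int order, size_t page_size)
{
    return page_size << order;
}
```

Under the 4 KiB assumption, `order_to_bytes(9, 4096)` is 2097152 bytes (2 MiB), so an order of 9 corresponds to a huge-page-sized block on such a system.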

Re: [RFC PATCH] mm: thp: implement THP reservations for anonymous memory

2018-11-10 Thread Mel Gorman
own a hole working on a series that never gets ack'd. I'm not necessarily the best person to answer because my natural inclination after the fragmentation series would be to keep using thpfioscale (from the fragmentation avoidance series) and work on improving the THP allocation success rates and reduce latencies. I've tunnel vision on that for the moment. Thanks. -- Mel Gorman SUSE Labs

Re: [RFC PATCH] mm: thp: implement THP reservations for anonymous memory

2018-11-09 Thread Mel Gorman
On Fri, Nov 09, 2018 at 03:13:18PM +0300, Kirill A. Shutemov wrote: > On Thu, Nov 08, 2018 at 10:48:58PM -0800, Anthony Yznaga wrote: > > The basic idea as outlined by Mel Gorman in [2] is: > > > > 1) On first fault in a sufficiently sized range, allocate a huge page >

Re: [RFC PATCH] mm: thp: implement THP reservations for anonymous memory

2018-11-09 Thread Mel Gorman
On Thu, Nov 08, 2018 at 10:48:58PM -0800, Anthony Yznaga wrote: > The basic idea as outlined by Mel Gorman in [2] is: > > 1) On first fault in a sufficiently sized range, allocate a huge page >sized and aligned block of base pages. Map the base page >corresponding to th

Re: UBSAN: Undefined behaviour in mm/page_alloc.c

2018-11-09 Thread Mel Gorman
nfortunate and I know the original microoptimisation was mine but if the fast-path check ends up being a problem then I/we go back to finding ways of making the page allocator faster from a fundamental algorithmic point of view and not a microoptimisation approach. There is potential fruit there, just none that is low-hanging. -- Mel Gorman SUSE Labs

[PATCH 2/4] mm: Move zone watermark accesses behind an accessor

2018-11-08 Thread Mel Gorman
This is a preparation patch only, no functional change. Signed-off-by: Mel Gorman --- include/linux/mmzone.h | 9 + mm/compaction.c | 2 +- mm/page_alloc.c | 12 ++-- 3 files changed, 12 insertions(+), 11 deletions(-) diff --git a/include/linux/mmzone.h b

[PATCH 4/4] mm: Stall movable allocations until kswapd progresses during serious external fragmentation event

2018-11-08 Thread Mel Gorman
e fragmentation events. On the flip-side, it has been checked that setting the fragment_stall_order to 9 eliminated fragmentation events entirely. Signed-off-by: Mel Gorman --- Documentation/sysctl/vm.txt | 23 +++ include/linux/mm.h | 1 + include/linux/mmzone.h

[PATCH 3/4] mm: Reclaim small amounts of memory when an external fragmentation event occurs

2018-11-08 Thread Mel Gorman
er quite some pressure. Signed-off-by: Mel Gorman --- Documentation/sysctl/vm.txt | 19 +++ include/linux/mm.h | 1 + include/linux/mmzone.h | 11 ++-- kernel/sysctl.c | 8 +++ mm/page_alloc.c | 53 +-- mm/vmscan.c

[PATCH 1/4] mm, page_alloc: Spread allocations across zones before introducing fragmentation

2018-11-08 Thread Mel Gorman
nal fragmentation causing events so the success of THP over long periods of time would be improved for this adverse workload. While there are large differences compared to how V1 behaved, this is almost entirely accounted for by ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE m

[PATCH 0/4] Fragmentation avoidance improvements v3

2018-11-08 Thread Mel Gorman
Sorry to send out a v3 so quickly. I dropped patch 5 as I'm not very happy with the approach or that it is without side-effects. I have some ideas on how it could be better achieved which can be done without delaying the other 4 patches. I've also updated patch 4 to reduce the stall timeout as

[PATCH 0/5] Fragmentation avoidance improvements v2

2018-11-07 Thread Mel Gorman
The 1-socket machine is different to the one used in v1 so some of the results are changed on that basis. The baseline has changed to 4.20-rc1 so the __GFP_THISNODE removal for THP is in effect which alters the behaviour on 2-socket in particular. The biggest changes are in the fourth patch, both

[PATCH 2/5] mm: Move zone watermark accesses behind an accessor

2018-11-07 Thread Mel Gorman
This is a preparation patch only, no functional change. Signed-off-by: Mel Gorman --- include/linux/mmzone.h | 9 + mm/compaction.c | 2 +- mm/page_alloc.c | 12 ++-- 3 files changed, 12 insertions(+), 11 deletions(-) diff --git a/include/linux/mmzone.h b

[PATCH 5/5] mm: Target compaction on pageblocks that were recently fragmented

2018-11-07 Thread Mel Gorman
in the case of MADV_HUGEPAGE, the allocation success rates were already high. However, it's encouraging that the THP allocation latencies were improved. Signed-off-by: Mel Gorman --- include/linux/compaction.h | 4 ++ include/linux/migrate.h | 7 +- include/linux/mmzone.h

[PATCH 3/5] mm: Reclaim small amounts of memory when an external fragmentation event occurs

2018-11-07 Thread Mel Gorman
er quite some pressure. Signed-off-by: Mel Gorman --- Documentation/sysctl/vm.txt | 19 +++ include/linux/mm.h | 1 + include/linux/mmzone.h | 11 ++-- kernel/sysctl.c | 8 +++ mm/page_alloc.c | 53 +-- mm/vmscan.c

[PATCH 1/5] mm, page_alloc: Spread allocations across zones before introducing fragmentation

2018-11-07 Thread Mel Gorman
nal fragmentation causing events so the success of THP over long periods of time would be improved for this adverse workload. While there are large differences compared to how V1 behaved, this is almost entirely accounted for by ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE m

[PATCH 4/5] mm: Stall movable allocations until kswapd progresses during serious external fragmentation event

2018-11-07 Thread Mel Gorman
for analysis to see if the stall behaviour can be reduced while still limiting the fragmentation events. On the flip-side, it has been checked that setting the fragment_stall_order to 9 eliminated fragmentation events entirely. Signed-off-by: Mel Gorman --- Documentation/sysctl/vm.txt | 23

Re: [RFC PATCH v2 1/1] pipe: busy wait for pipe

2018-11-06 Thread Mel Gorman
On Mon, Nov 05, 2018 at 03:40:40PM -0800, Subhra Mazumdar wrote: > > On 11/5/18 2:08 AM, Mel Gorman wrote: > > Adding Al Viro as per get_maintainers.pl. > > > > On Tue, Sep 25, 2018 at 04:32:40PM -0700, subhra mazumdar wrote: > > > Introduce pipe_ll_usec field fo

Re: [RFC PATCH v2 1/1] pipe: busy wait for pipe

2018-11-05 Thread Mel Gorman
ot really my area but I feel that this patch is a benchmark-specific hack and that tuning it on a system-wide basis will be a game of "win some, lose some" that is never used in practice. Worse, it might end up in a tuning guide as "always set this sysctl" without considering the capabilities of the machine or the workload and falls victim to cargo cult tuning. -- Mel Gorman SUSE Labs

Re: [PATCH 3/5] mm: Reclaim small amounts of memory when an external fragmentation event occurs

2018-10-31 Thread Mel Gorman
On Wed, Oct 31, 2018 at 04:06:43PM +0000, Mel Gorman wrote: > An external fragmentation event was previously described as > > When the page allocator fragments memory, it records the event using > the mm_page_alloc_extfrag event. If the fallback_order is smaller > t

[PATCH 1/5] mm, page_alloc: Spread allocations across zones before introducing fragmentation

2018-10-31 Thread Mel Gorman
using events so the success of THP over long periods of time would be improved for this adverse workload. Signed-off-by: Mel Gorman --- mm/internal.h | 13 +--- mm/page_alloc.c | 101 ++-- 2 files changed, 99 insertions(+), 15 deletions(-)

[PATCH 5/5] mm: Target compaction on pageblocks that were recently fragmented

2018-10-31 Thread Mel Gorman
can increase fragmentation pressure. This is less an obvious universal win. It does control fragmentation better to some extent in that pageblocks can be found faster in some cases but the nature of the workload makes it less clear-cut. Signed-off-by: Mel Gorman --- include/linux/compaction.h | 4

[PATCH 4/5] mm: Stall movable allocations until kswapd progresses during serious external fragmentation event

2018-10-31 Thread Mel Gorman
9 eliminated fragmentation events entirely on the 1-socket machine and by 99.71% on the 2-socket machine. Signed-off-by: Mel Gorman --- Documentation/sysctl/vm.txt | 23 +++ include/linux/mm.h | 1 + include/linux/mmzone.h | 2 ++ kernel/sysctl.c | 10 +++ m

[PATCH 3/5] mm: Reclaim small amounts of memory when an external fragmentation event occurs

2018-10-31 Thread Mel Gorman
ate without a negative impact on fault latencies. Signed-off-by: Mel Gorman --- Documentation/sysctl/vm.txt | 19 +++ include/linux/mm.h | 1 + include/linux/mmzone.h | 11 ++-- kernel/sysctl.c | 8 +++ mm/page_alloc.c | 50 +

[PATCH 2/5] mm: Move zone watermark accesses behind an accessor

2018-10-31 Thread Mel Gorman
This is a preparation patch only, no functional change. Signed-off-by: Mel Gorman --- include/linux/mmzone.h | 9 + mm/compaction.c | 2 +- mm/page_alloc.c | 12 ++-- 3 files changed, 12 insertions(+), 11 deletions(-) diff --git a/include/linux/mmzone.h b

[PATCH 0/5] Fragmentation avoidance improvements

2018-10-31 Thread Mel Gorman
Warning: This is a long intro with long changelogs and this is not a trivial area to either analyse or fix. TLDR -- 95% reduction in fragmentation events, patches 1-3 should be relatively ok. Patch 4 and 5 need scrutiny but they are also independent or dropped. It has been

Re: [PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs

2018-10-24 Thread Mel Gorman
On Wed, Oct 24, 2018 at 04:11:24PM +0530, Srikar Dronamraju wrote: > * Peter Zijlstra [2018-10-24 12:15:08]: > > > On Wed, Oct 24, 2018 at 03:16:46PM +0530, Srikar Dronamraju wrote: > > > * Mel Gorman [2018-10-24 09:56:36]: > > > > > > > On Wed

Re: [PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs

2018-10-24 Thread Mel Gorman
On Wed, Oct 24, 2018 at 03:16:46PM +0530, Srikar Dronamraju wrote: > * Mel Gorman [2018-10-24 09:56:36]: > > > On Wed, Oct 24, 2018 at 08:32:49AM +0530, Srikar Dronamraju wrote: > > It would certainly be a bit odd because the > > application is asking for some prot

Re: [PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs

2018-10-24 Thread Mel Gorman
ate kernel threads interfering then it also cannot tolerate remote access latencies) or disabling NUMA balancing entirely to avoid incurring minor faults. Thanks. -- Mel Gorman SUSE Labs

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-23 Thread Mel Gorman
On Tue, Oct 23, 2018 at 08:57:45AM +0100, Mel Gorman wrote: > Note that I accept it's trivial to fragment memory in a harmful way. > I've prototyped a test case yesterday that uses fio in the following way > to fragment memory > > o fio of many small files (64K) > o create i

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-23 Thread Mel Gorman
On Mon, Oct 22, 2018 at 02:04:32PM -0700, David Rientjes wrote: > On Tue, 16 Oct 2018, Mel Gorman wrote: > > > I consider this to be an unfortunate outcome. On the one hand, we have a > > problem that three people can trivially reproduce with known test cases > > and

Re: [RFC v4 PATCH 2/5] mm/__free_one_page: skip merge for order-0 page unless compaction failed

2018-10-19 Thread Mel Gorman
pretty much nowhere :-) > > I'll see what I can get with 'address space range' lock first and will > come back to 'lazy buddy' if it doesn't work out. Thank you and > Vlastimil for all the suggestions. My pleasure. -- Mel Gorman SUSE Labs

Re: [RFC v4 PATCH 3/5] mm/rmqueue_bulk: alloc without touching individual page structure

2018-10-18 Thread Mel Gorman
> time to batch the struct page updates. > > I don't quite follow this part. It doesn't seem possible we can exceed > pcp->high in allocation path, or are you talking about free path? > I'm talking about the free path. > And thanks a lot for the review! My pleasure, hope it helps. -- Mel Gorman SUSE Labs

Re: [RFC v4 PATCH 2/5] mm/__free_one_page: skip merge for order-0 page unless compaction failed

2018-10-18 Thread Mel Gorman
? Is it that you are worried about high-order allocation > success rate using this design? I've pointed out what I see are the design flaws but yes, in general, I'm worried about the high order allocation success rate using this design, the reliance on compaction and the fact that the primary motivation is when THP is disabled. -- Mel Gorman SUSE Labs

Re: [RFC v4 PATCH 2/5] mm/__free_one_page: skip merge for order-0 page unless compaction failed

2018-10-17 Thread Mel Gorman
On Wed, Oct 17, 2018 at 09:10:59PM +0800, Aaron Lu wrote: > On Wed, Oct 17, 2018 at 11:44:27AM +0100, Mel Gorman wrote: > > On Wed, Oct 17, 2018 at 02:33:27PM +0800, Aaron Lu wrote: > > > Running will-it-scale/page_fault1 process mode workload on a 2 sockets > > >

Re: [RFC v4 PATCH 3/5] mm/rmqueue_bulk: alloc without touching individual page structure

2018-10-17 Thread Mel Gorman
o exceed pcp->high for short periods of time to batch the struct page updates. I didn't read the rest of the series as it builds upon this patch. -- Mel Gorman SUSE Labs

Re: [RFC v4 PATCH 2/5] mm/__free_one_page: skip merge for order-0 page unless compaction failed

2018-10-17 Thread Mel Gorman
f buddy pages that are still allocated. When lazy buddy merging was last examined years ago, a consequence was that high-order allocation success rates were reduced. I see you do the merging when compaction has been recently considered but I don't see how that is sufficient. If a high-order allocation fails, there is no guarantee that compaction will find those unmerged buddies. There is also no guarantee that a page free will find them. So, in the event of a high-order allocation failure, what finds all those unmerged buddies and puts them together to see if the allocation would succeed without reclaim/compaction/etc. -- Mel Gorman SUSE Labs
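The concern about unmerged buddies can be made concrete with a toy model. The XOR rule for locating a buddy is the standard buddy-allocator relation; everything else here is a hypothetical sketch, including the `free_order` array and the `merge_up()` helper, and it is not how the kernel's free lists are organised.

```c
#include <stddef.h>

/* PFN of the buddy of the block that starts at pfn and has this order. */
unsigned long buddy_pfn(unsigned long pfn, unsigned int order)
{
    return pfn ^ (1UL << order);
}

/*
 * Hypothetical model: free_order[pfn] holds the order of the free block
 * starting at pfn, or -1 if no free block starts there. Starting from the
 * free block (pfn, order), merge with free buddies of equal order until a
 * buddy is missing or max_order is reached. Returns the final order.
 */
unsigned int merge_up(int *free_order, size_t nr_pages, unsigned long pfn,
                      unsigned int order, unsigned int max_order)
{
    while (order < max_order) {
        unsigned long buddy = buddy_pfn(pfn, order);

        /* The buddy must be free and of exactly the same order. */
        if (buddy >= nr_pages || free_order[buddy] != (int)order)
            break;

        free_order[buddy] = -1;
        free_order[pfn] = -1;
        pfn &= ~(1UL << order); /* merged block starts at the lower PFN */
        order++;
        free_order[pfn] = (int)order;
    }
    return order;
}
```

With four free but unmerged order-0 pages, merging from page 0 stops at order 1 because pages 2 and 3 are still separate order-0 blocks. Only a second pass that starts from page 2 produces the order-2 block, which is the "what finds all those unmerged buddies" question raised above.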

Re: [RFC v4 PATCH 1/5] mm/page_alloc: use helper functions to add/remove a page to/from buddy

2018-10-17 Thread Mel Gorman
ty change. > > Acked-by: Vlastimil Babka > Signed-off-by: Aaron Lu Acked-by: Mel Gorman -- Mel Gorman SUSE Labs

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-17 Thread Mel Gorman
On Tue, Oct 16, 2018 at 03:37:15PM -0700, Andrew Morton wrote: > On Tue, 16 Oct 2018 08:46:06 +0100 Mel Gorman wrote: > > I consider this to be an unfortunate outcome. On the one hand, we have a > > problem that three people can trivially reproduce with known test cases > &

Re: [PATCH 3/4] mm: workingset: add vmstat counter for shadow nodes

2018-10-16 Thread Mel Gorman
e */ > + > if (node->count && node->count == node->exceptional) { > if (list_empty(&node->private_list)) { > list_lru_add(&shadow_nodes, &node->private_list); Note that for whatever reason, I've observed that irqs_disabled() is actually quite an expensive call. I'm not saying the warning is a bad idea but it should not be sprinkled around unnecessarily and may be more suitable as a debug option. -- Mel Gorman SUSE Labs
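The "debug option" suggestion can be sketched in user space as a check that compiles out unless a debug switch is set. All names here (`MODEL_DEBUG`, `MODEL_WARN_ON`, `model_expensive_check`, `model_add_shadow_node`) are hypothetical stand-ins; the kernel's own debug-only warning macros are not reproduced here.

```c
#include <stdio.h>

/* Counts how often the expensive predicate actually runs. */
int model_check_calls;

/* Stand-in for an expensive predicate such as irqs_disabled(). */
int model_expensive_check(void)
{
    model_check_calls++;
    return 1;
}

#ifdef MODEL_DEBUG
/* Debug builds evaluate the condition and report when it is true. */
#define MODEL_WARN_ON(cond) \
    do { if (cond) fprintf(stderr, "warning: %s\n", #cond); } while (0)
#else
/* Non-debug builds drop the check; the condition is never evaluated. */
#define MODEL_WARN_ON(cond) do { } while (0)
#endif

void model_add_shadow_node(void)
{
    MODEL_WARN_ON(!model_expensive_check());
    /* ... the list manipulation would go here ... */
}
```

Built without `-DMODEL_DEBUG`, calling `model_add_shadow_node()` leaves `model_check_calls` at zero, so the hot path pays nothing for the check.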

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-16 Thread Mel Gorman
would be a chance that others could analyse the problem and prototype some fixes. The test case was requested in the thread and never produced so even if someone were to prototype fixes, it would be dependent on a third party to test and produce data which is a time-consuming loop. Instead, we are

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-09 Thread Mel Gorman
On Tue, Oct 09, 2018 at 04:25:10PM +0200, Michal Hocko wrote: > On Tue 09-10-18 14:00:34, Mel Gorman wrote: > > On Tue, Oct 09, 2018 at 02:27:45PM +0200, Michal Hocko wrote: > > > [Sorry for being slow in responding but I was mostly offline last few > > > days] > &g

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-09 Thread Mel Gorman
On Tue, Oct 09, 2018 at 02:27:45PM +0200, Michal Hocko wrote: > [Sorry for being slow in responding but I was mostly offline last few > days] > > On Tue 09-10-18 10:48:25, Mel Gorman wrote: > [...] > > This goes back to my point that the MADV_HUGEPAGE hint should not ma

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-09 Thread Mel Gorman
GFP_DIRECT_RECLAIM | __GFP_ONLY_COMPACT, > though, as I assume this would evolve into. > > Are there any other proposed users for __GFP_ONLY_COMPACT beyond thp > allocations? If not, we should just save the gfp bit and encode the logic > directly into the page allocator. > > Would you support this? I don't think it's necessarily bad but it cannot distinguish between THP and hugetlbfs. Hugetlbfs users are typically more willing to accept high overheads as they may be required for the application to function. That's probably fixable but will still leave us in the state where MADV_HUGEPAGE is also a hint about locality. It'd still be interesting to hear if it fixes the VM initialisation issue but do note that if this patch is used as a replacement that hugetlbfs users may complain down the line. -- Mel Gorman SUSE Labs

Re: [PATCH] mm,numa: Remove remaining traces of rate-limiting.

2018-10-08 Thread Mel Gorman
ort and mainline versions of the original patch. Thanks -- Mel Gorman SUSE Labs

Re: [PATCH] mm,numa: Remove remaining traces of rate-limiting.

2018-10-06 Thread Mel Gorman
> Signed-off-by: Srikar Dronamraju Acked-by: Mel Gorman -- Mel Gorman SUSE Labs

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-05 Thread Mel Gorman
at both fit perfectly into NUMA nodes and are extremely sensitive to access latencies. It's a question of causing the least harm to the most users which is what this patch does. If you need behaviour for more aggressive reclaim or locality hints then kindly introduce them and do not depend on MADV_HUGEPAGE accidentally doubling up as hints about memory locality. -- Mel Gorman SUSE Labs

Re: [PATCH V2 2/2] cpuidle/drivers/menu: Remove get_loadavg in the performance multiplier

2018-10-04 Thread Mel Gorman
d9f124703207895777ac6e91dacde0f7cc17 > > Cc: Peter Zijlstra > Cc: Todd Kjos > Cc: Joel Fernandes > Cc: Colin Cross > Cc: Ramesh Thomas > Cc: Mel Gorman > Signed-off-by: Daniel Lezcano I agree that removing this is the most sensible option so; Acked-by: Mel Gorman -- Mel Gorman SUSE Labs

Re: [PATCH 2/2] mm, numa: Migrate pages to local nodes quicker early in the lifetime of a task

2018-10-03 Thread Mel Gorman
Average: 18562.70 > > Difference being -2.6% regression > That's unfortunate. How much does this workload normally vary between runs? If you monitor migrations over time, is there a spike in migrations early in the lifetime of the workload? -- Mel Gorman SUSE Labs

Re: [PATCH 2/2] mm, numa: Migrate pages to local nodes quicker early in the lifetime of a task

2018-10-02 Thread Mel Gorman
consequences as it changes the behaviour for the entire lifetime of the workload. It could cause excessive migrations in the case where a machine is almost fully utilised and getting load balanced or in cases where tasks are pulled frequently cross-node (e.g. worker thread model or a pipelined computation). I'm only looking to address the case where the load balancer spreads a workload early and the memory should move to the new node quickly. If it turns out there are cases where that decision is wrong, it gets remedied quickly but if your proposal is ever wrong, the system doesn't recover. -- Mel Gorman SUSE Labs

Re: [PATCH 2/2] mm, numa: Migrate pages to local nodes quicker early in the lifetime of a task

2018-10-02 Thread Mel Gorman
nt check we end up spreading memory faster > than we should hence hurting the chance of early consolidation. > > Can we restrict to something like this? > > if (p->numa_scan_seq >=MIN && p->numa_scan_seq <= MIN+4 && > (cpupid_match_pid(p, last_cpupid))) >

[tip:sched/urgent] sched/numa: Migrate pages to local nodes quicker early in the lifetime of a task

2018-10-02 Thread tip-bot for Mel Gorman
Commit-ID: 37355bdc5a129899f6b245900a8eb944a092f7fd Gitweb: https://git.kernel.org/tip/37355bdc5a129899f6b245900a8eb944a092f7fd Author: Mel Gorman AuthorDate: Mon, 1 Oct 2018 11:05:25 +0100 Committer: Ingo Molnar CommitDate: Tue, 2 Oct 2018 11:31:33 +0200 sched/numa: Migrate pages

[tip:sched/urgent] mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migration

2018-10-02 Thread tip-bot for Mel Gorman
Commit-ID: efaffc5e40aeced0bcb497ed7a0a5b8c14abfcdf Gitweb: https://git.kernel.org/tip/efaffc5e40aeced0bcb497ed7a0a5b8c14abfcdf Author: Mel Gorman AuthorDate: Mon, 1 Oct 2018 11:05:24 +0100 Committer: Ingo Molnar CommitDate: Tue, 2 Oct 2018 11:31:14 +0200 mm, sched/numa: Remove rate

[tip:sched/core] sched/numa: Limit the conditions where scan period is reset

2018-10-02 Thread tip-bot for Mel Gorman
Commit-ID: 05cbdf4f5c191ff378c47bbf66d7230beb725bdb
Gitweb: https://git.kernel.org/tip/05cbdf4f5c191ff378c47bbf66d7230beb725bdb
Author: Mel Gorman
AuthorDate: Fri, 21 Sep 2018 23:18:59 +0530
Committer: Ingo Molnar
CommitDate: Tue, 2 Oct 2018 09:42:24 +0200

sched/numa: Limit

[PATCH 0/2] Faster migration for automatic NUMA balancing

2018-10-01 Thread Mel Gorman
These two patches are based on top of Srikar Dronamraju's recent work on automatic NUMA balancing and are motivated by a bug report from Jirka Hladky that STREAM performance has regressed. The STREAM workload is mildly interesting in that it only works as a valid benchmark if tasks are pinned to

[PATCH 2/2] mm, numa: Migrate pages to local nodes quicker early in the lifetime of a task

2018-10-01 Thread Mel Gorman
                                    (  9.32%)    47219.24 (  9.06%)
MB/sec scale    30115.06 (  0.00%)    32568.12 (  8.15%)    32527.56 (  8.01%)
MB/sec add      32825.12 (  0.00%)    36078.94 (  9.91%)    35928.02 (  9.45%)
MB/sec triad    32549.52 (  0.00%)    35935.94 ( 10.40%)    35969.88 ( 10.51%)

Signed-off-by: Mel

[PATCH 1/2] mm, numa: Remove rate-limiting of automatic numa balancing migration

2018-10-01 Thread Mel Gorman
                                      noratelimit-v1r1
MB/sec copy     43298.52 (  0.00%)    44673.38 (  3.18%)
MB/sec scale    30115.06 (  0.00%)    31293.06 (  3.91%)
MB/sec add      32825.12 (  0.00%)    34883.62 (  6.27%)
MB/sec triad    32549.52 (  0.00%)    34906.60 (  7.24%)

Signed-off-by: Mel Gorman
---
include
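The bracketed percentages in the STREAM figures above are consistent with a plain relative change against the baseline column. A minimal sketch of that arithmetic follows; `gain_pct` is a name chosen here for illustration and is not taken from the benchmark tooling.

```c
#include <assert.h>

/* Percentage change of a throughput figure relative to its baseline.
 * Positive values mean the patched kernel moved more MB/sec. */
double gain_pct(double baseline, double patched)
{
    assert(baseline > 0.0);
    return (patched - baseline) / baseline * 100.0;
}
```

For the copy row, (44673.38 - 43298.52) / 43298.52 * 100 comes to roughly 3.18, matching the reported figure; the triad row works out to roughly 7.24 in the same way.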

Re: linux-mm@ archive on lore.kernel.org (Was: [PATCH 0/2] thp nodereclaim fixes)

2018-09-26 Thread Mel Gorman
e concepts were generally covered by academic textbooks AFAIK. -- Mel Gorman SUSE Labs

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-09-25 Thread Mel Gorman
hich allows for the node reclaim like behavior > for the specific memory ranges which would allow a > > [1] http://lkml.kernel.org/r/20180820032204.9591-1-aarca...@redhat.com > > [mho...@suse.com: rewrote the changelog based on the one from Andrea] > Fixes: 5265047ac301 ("mm,

Re: [SCHEDULER] Performance drop in 4.19 compared to 4.18 kernel

2018-09-17 Thread Mel Gorman
On Fri, Sep 14, 2018 at 04:50:20PM +0200, Jirka Hladky wrote: > Hi Peter and Srikar, > > > I have bounced the 5 patches to you, (one of the 6 has not been applied by > > Peter) so I have skipped that. > > They can also be fetched from > >

Re: [PATCH 4/4] sched/numa: Do not move imbalanced load purely on the basis of an idle CPU

2018-09-12 Thread Mel Gorman
On Wed, Sep 12, 2018 at 11:57:42AM +0200, Ingo Molnar wrote: > > * Mel Gorman [2018-09-10 10:41:47]: > > > > > On Fri, Sep 07, 2018 at 01:37:39PM +0100, Mel Gorman wrote: > > > > > Srikar's patch here: > > > > > > > > > > &g

Re: [PATCH 4/4] sched/numa: Do not move imbalanced load purely on the basis of an idle CPU

2018-09-12 Thread Mel Gorman
I had were marginal which is why I was not convinced the complexity was justified. Your ppc64 figures look a bit more convincing and while I'm disappointed that you did not make a like-like comparison, I'm happy enough to go with your version. I can re-evaluate "Stop comparing tasks for NUMA placement" on its own later as well as the fast-migrate patches. Thanks Srikar. -- Mel Gorman SUSE Labs

Re: [PATCH 4/4] sched/numa: Do not move imbalanced load purely on the basis of an idle CPU

2018-09-10 Thread Mel Gorman
On Fri, Sep 07, 2018 at 01:37:39PM +0100, Mel Gorman wrote: > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > > index d59d3e00a480..d4c289c11012 100644 > > > --- a/kernel/sched/fair.c > > > +++ b/kernel/sched/fair.c > > > @@ -1560,7 +1

Re: [PATCH 4/4] sched/numa: Do not move imbalanced load purely on the basis of an idle CPU

2018-09-07 Thread Mel Gorman
. > There is a chance but to find out, the task has to be dequeued and requeued on a maybe. An idle CPU is less disruptive and the only task affected is migrating to the preferred node where, based on previous fault behaviour, it should have better locality. It's also a simpler patch but I'm going to be biased towards my own patch, the tests will decide one way or the other. -- Mel Gorman SUSE Labs

Re: [PATCH 3/4] sched/numa: Stop comparing tasks for NUMA placement after selecting an idle core

2018-09-07 Thread Mel Gorman
On Fri, Sep 07, 2018 at 06:35:53PM +0530, Srikar Dronamraju wrote: > * Mel Gorman [2018-09-07 11:11:38]: > > > task_numa_migrate is responsible for finding a core on a preferred NUMA > > node for a task. As part of this, task_numa_find_cpu iterates through > > the CPUs
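The idea in the subject line, stopping the comparison once an idle candidate has been selected, can be illustrated with a toy search loop. This is not the kernel's `task_numa_find_cpu()`; `struct cpu_sketch`, its `score` field and `pick_cpu()` are all hypothetical and exist only to show the early exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU summary; not a kernel structure. */
struct cpu_sketch {
    bool idle;    /* nothing currently running on this CPU */
    long score;   /* illustrative placement score, higher is better */
};

/* Returns the index of the chosen CPU, or -1 if the list is empty.
 * *visited reports how many candidates were examined. */
int pick_cpu(const struct cpu_sketch *cpus, size_t n, size_t *visited)
{
    int best = -1;

    *visited = 0;
    for (size_t i = 0; i < n; i++) {
        (*visited)++;
        if (cpus[i].idle)
            return (int)i;   /* idle candidate found: stop comparing */
        if (best < 0 || cpus[i].score > cpus[best].score)
            best = (int)i;
    }
    return best;
}
```

When an idle candidate appears early in the list, the remaining CPUs are never examined, which is the saving the patch title describes; when every CPU is busy, the loop falls back to a full comparison.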

Re: [PATCH 4/4] sched/numa: Do not move imbalanced load purely on the basis of an idle CPU

2018-09-07 Thread Mel Gorman
> > Also frobs this condition, but in a less radical way. Does that yield > similar results? I can check. I do wonder of course if the less radical approach just means that automatic NUMA balancing and the load balancer simply disagree about placement at a different time. It'll take a few days to have an answer as the battery of workloads to check this takes ages. -- Mel Gorman SUSE Labs

Re: [PATCH 0/4] Follow-up fixes for v4.19-rc1 NUMA balancing

2018-09-07 Thread Mel Gorman
On Fri, Sep 07, 2018 at 01:24:55PM +0200, Peter Zijlstra wrote: > On Fri, Sep 07, 2018 at 11:11:35AM +0100, Mel Gorman wrote: > > Srikar had an automatic NUMA balancing series merged during the 4.19 window > > and there some issues I missed during review that this s

[PATCH 0/4] Follow-up fixes for v4.19-rc1 NUMA balancing

2018-09-07 Thread Mel Gorman
Srikar had an automatic NUMA balancing series merged during the 4.19 window and there are some issues I missed during review that this series addresses. Patches 1-2 are simply removing redundant code and calculations that are never used. Patch 3 makes the observation that we can call

[PATCH 1/4] sched/numa: Remove redundant numa_stats nr_running field

2018-09-07 Thread Mel Gorman
The nr_running field has not been used since commit 2d4056fafa19 ("sched/numa: Remove numa_has_capacity()") so remove it. Signed-off-by: Mel Gorman --- kernel/sched/fair.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index b3

[PATCH 2/4] sched/numa: Remove unused calculations in update_numa_stats

2018-09-07 Thread Mel Gorman
Commit 2d4056fafa19 ("sched/numa: Remove numa_has_capacity()") removed the has_free_capacity field but did not remove calculations related to it in update_numa_stats. This patch removes the unused code. Signed-off-by: Mel Gorman --- kernel/sched/fair.c | 22 +---

[PATCH 4/4] sched/numa: Do not move imbalanced load purely on the basis of an idle CPU

2018-09-07 Thread Mel Gorman
migrations
   552,529      page-faults
        26      sched:sched_move_numa
         0      sched:sched_stick_numa
        16      sched:sched_swap_numa

Note the large drop in CPU migrations, the calls to sched_move_numa and page faults.

Signed-off-by: Mel Gorman
---
ke

[PATCH 3/4] sched/numa: Stop comparing tasks for NUMA placement after selecting an idle core

2018-09-07 Thread Mel Gorman
-utilised. Signed-off-by: Mel Gorman --- kernel/sched/fair.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 5b2f1684e96e..d59d3e00a480 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1535,7 +1535,7

Re: [4.17 regression] Performance drop on kernel-4.17 visible on Stream, Linpack and NAS parallel benchmarks

2018-09-06 Thread Mel Gorman
would also end up with a much more sensible cc list. -- Mel Gorman SUSE Labs

Re: [4.17 regression] Performance drop on kernel-4.17 visible on Stream, Linpack and NAS parallel benchmarks

2018-09-04 Thread Mel Gorman
056fafa196e1ab4e7161bae4df76f9602d56d is the source of the issue as at least one auto-bisection found that it may be problematic. Whether it is an issue or not depends heavily on the number of threads relative to a socket size. -- Mel Gorman SUSE Labs

Re: [PATCH] mm, page_alloc: actually ignore mempolicies for high priority allocations

2018-08-16 Thread Mel Gorman
So > even GFP_ATOMIC would now ignore mempolicies after the initial attempts > fail - if the code worked as people thought it does. > > Link: http://lkml.kernel.org/r/20180612122624.8045-1-vba...@suse.cz > Signed-off-by: Vlastimil Babka > Cc: Mel Gorman > Cc: Michal Hocko > Cc: David Rientjes > Cc: Joonsoo Kim > Signed-off-by: Andrew Morton FWIW, I thought I acked this already. Acked-by: Mel Gorman -- Mel Gorman SUSE Labs
