f the fake word mallocated.
To be consistent, this preparation patch renames alloced to allocated
in rmqueue_bulk so the bulk allocator and per-cpu allocator use similar
names when the bulk allocator is introduced.
Signed-off-by: Mel Gorman
---
mm/page_alloc.c | 8
1 file changed
This series is based on top of Matthew Wilcox's series "Rationalise
__alloc_pages wrapper" and does not apply to 5.14-rc4. If Andrew's tree
is not the testing baseline then the following git tree will work.
git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git mm-bulk-rebase-v6r7
Changelog
Jesper Dangaard Brouer
Signed-off-by: Mel Gorman
---
include/net/page_pool.h | 2 +-
net/core/page_pool.c | 82 +
2 files changed, 57 insertions(+), 27 deletions(-)
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index b5b195305
On Wed, Mar 24, 2021 at 03:36:14PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 24, 2021 at 01:39:16PM +0000, Mel Gorman wrote:
>
> > > Yeah, let's say I was pleasantly surprised to find it there :-)
> > >
> >
> > Minimally, let's move that out before it gets kic
On Wed, Mar 24, 2021 at 01:12:16PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 24, 2021 at 11:42:24AM +0000, Mel Gorman wrote:
> > On Wed, Mar 24, 2021 at 11:54:24AM +0100, Peter Zijlstra wrote:
> > > On Wed, Mar 24, 2021 at 10:37:43AM +0100, Peter Zijlstra wrote:
> > > &
ed
tweak kernel.sched_min_granularity_ns
Whether there are legitimate reasons to modify those values or not,
removing them may generate fun bug reports.
--
Mel Gorman
SUSE Labs
> + if (static_branch_unlikely(&resched_latency_warn_enabled))
> + resched_latency = resched_latency_check(rq);
> calc_global_load_tick(rq);
> psi_task_tick(rq);
>
> rq_unlock(rq, &rf);
>
> + if (static_branch_unlikely(&resched_latency_warn_enabled) &&
> + resched_latency)
> + resched_latency_warn(cpu, resched_latency);
> +
> perf_event_task_tick();
>
I don't see the need to split latency detection from the display of the
warning. As resched_latency_check() is static with a single caller, it
should be inlined so you can move all the logic, including the static
branch check, into it. Maybe, to be on the safe side, explicitly mark it
inline. That allows you to delete resched_latency_warn and avoid
advertising it through sched.h
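For illustration, the shape being suggested might look like the following
sketch (the resched_latency() helper and the measurement body are
assumptions for the example, not code from the patch):

static inline void resched_latency_check(struct rq *rq, int cpu)
{
	u64 latency;

	/* The static branch check lives here; the caller needs no guard */
	if (!static_branch_unlikely(&resched_latency_warn_enabled))
		return;

	latency = resched_latency(rq);	/* hypothetical measurement helper */
	if (latency)
		resched_latency_warn(cpu, latency);	/* can now be static */
}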
--
Mel Gorman
SUSE Labs
ced range tracking of faulted pages to limit
how much scanning it has to do, it would inadvertently cause a change in
page activation rate.
NUMA balancing is about page locality; it should not get conflated with
page aging.
--
Mel Gorman
SUSE Labs
h nfsd threads hammer on the page allocator.
This improves throughput scalability by enabling the threads to run
more independently of each other.
[mgorman: Update interpretation of alloc_pages_bulk return value]
Signed-off-by: Chuck Lever
Signed-off-by: Mel Gorman
---
net/sunrpc/svc_xprt.
s tight, I've applied the patch but am
keeping it separate to preserve the data in case someone points out that
it is a big function to inline and "fixes" it.
--
Mel Gorman
SUSE Labs
On Tue, Mar 23, 2021 at 04:08:14PM +0100, Jesper Dangaard Brouer wrote:
> On Tue, 23 Mar 2021 10:44:21 +0000
> Mel Gorman wrote:
>
> > On Mon, Mar 22, 2021 at 09:18:42AM +0000, Mel Gorman wrote:
> > > This series is based on top of Matthew Wilcox's series "Ratio
0, gfp, 0);
-
- /*
- * If the array is sparse, check whether the array is
- * now fully populated. Continue allocations if
- * necessary.
- */
- while (nr_populated < nr_pages && page_array[nr_populated])
- nr_populated++;
- if (hole && nr_populated < nr_pages)
- goto retry_hole;
- }
-
return nr_populated;
failed_irq:
--
Mel Gorman
SUSE Labs
On Mon, Mar 22, 2021 at 09:18:42AM +0000, Mel Gorman wrote:
> This series is based on top of Matthew Wilcox's series "Rationalise
> __alloc_pages wrapper" and does not apply to 5.12-rc2. If you want to
> test and are not using Andrew's tree as a baseline, I suggest u
It's slower for the API to do this check because it has to check every element
while the sunrpc user could check one element. Let me know if a) this
hunk helps and b) is desired behaviour.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c83d38dfe936..4bf20650e5f5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5107,6 +5107,9 @@ int __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
} else {
while (prep_index < nr_populated)
prep_new_page(page_array[prep_index++], 0, gfp, 0);
+
+ while (nr_populated < nr_pages && page_array[nr_populated])
+ nr_populated++;
}
return nr_populated;
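For comparison, a sketch of the caller-side check described above (names
and the exact prototype are assumptions; the signature was still in flux
at this point of the thread):

	/* Bulk-fill a possibly-sparse array, then test a single slot */
	nr = __alloc_pages_bulk(gfp, nid, NULL, nr_pages, NULL, page_array);
	if (!page_array[nr_pages - 1]) {
		/* The one slot the caller cares about is still a hole,
		 * so retry the bulk call instead of scanning every slot */
		nr = __alloc_pages_bulk(gfp, nid, NULL, nr_pages,
					NULL, page_array);
	}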
--
Mel Gorman
SUSE Labs
On Mon, Mar 22, 2021 at 06:25:03PM +0000, Chuck Lever III wrote:
>
>
> > On Mar 22, 2021, at 5:18 AM, Mel Gorman wrote:
> >
> > This series is based on top of Matthew Wilcox's series "Rationalise
> > __alloc_pages wrapper" and does not apply to
On Mon, Mar 22, 2021 at 01:04:46PM +0100, Jesper Dangaard Brouer wrote:
> On Mon, 22 Mar 2021 09:18:42 +0000
> Mel Gorman wrote:
>
> > This series is based on top of Matthew Wilcox's series "Rationalise
> > __alloc_pages wrapper" and does not apply to 5.12-
res. Checking if prev's sibling is free when there are no
idle cores is fairly cheap in comparison to a cpumask initialisation and
partial clearing.
If you have the testing capacity and time, test both.
> > A third concern, although it is mild, is that the SMT scan ignores
> > the SIS_PROP limits on the depth search. This patch may increase the
> > scan depth as a result. It's only a mild concern as limiting the
> > depth of a search is a magic number anyway.
>
> Agreed, placing the search inside the SIS_PROP block is
> going to clip the search differently than placing it
> outside, too.
>
> Probably no big deal, but I'll push a kernel with
> that change into the tests, anyway :)
>
Best plan because select_idle_sibling is always surprising :)
--
Mel Gorman
SUSE Labs
On Sun, Mar 21, 2021 at 03:03:58PM -0400, Rik van Riel wrote:
> Mel Gorman did some nice work in 9fe1f127b913
> ("sched/fair: Merge select_idle_core/cpu()"), resulting in the kernel
> being more efficient at finding an idle CPU, and in tasks spending less
>
345 12.0929
> w/ : 1.8428 3.7436 5.4501 6.9522 8.2882 9.9535 11.3367
> +4.1% +8.3% +7.3% +6.3%
>
> Signed-off-by: Barry Song
Acked-by: Mel Gorman
That said, the numa_idle_core() function then becomes slightly
redundant. A possible follow-up is to move the "idle_core >
storage to store the pages.
Signed-off-by: Mel Gorman
---
include/linux/gfp.h | 13 ++--
mm/page_alloc.c | 75 ++---
2 files changed, 67 insertions(+), 21 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 4a304fd39916
is to make it available early
to determine what semantics are required by different callers. Once the
full semantics are nailed down, it can be refactored.
Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
include/linux/gfp.h | 11
mm/page_alloc.c
This series is based on top of Matthew Wilcox's series "Rationalise
__alloc_pages wrapper" and does not apply to 5.12-rc2. If you want to
test and are not using Andrew's tree as a baseline, I suggest using the
following git tree
git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git mm-bulk-r
On Fri, Mar 19, 2021 at 07:18:32PM +0100, Vlastimil Babka wrote:
> On 3/12/21 4:43 PM, Mel Gorman wrote:
> > This patch adds a new page allocator interface via alloc_pages_bulk,
> > and __alloc_pages_bulk_nodemask. A caller requests a number of pages
> > to be allocated and
On Fri, Mar 19, 2021 at 05:11:39PM +0100, Vlastimil Babka wrote:
> On 3/12/21 4:43 PM, Mel Gorman wrote:
> > __alloc_pages updates GFP flags to enforce what flags are allowed
> > during a global context such as booting or suspend. This patch moves the
> > enforcement
tatic branch would mean splitting the very large inline functions
added by the patch. The inline section should do a static check only and
do the main work in a function in kernel/sched/debug.c so it has minimal
overhead if unused.
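The pattern being described, sketched with hypothetical names (the real
declarations would differ; only the inline/out-of-line split matters):

/* In the header: a cheap static-key test inline, heavy work out of line */
DECLARE_STATIC_KEY_FALSE(sched_debug_key);	/* assumed name */
extern void __sched_debug_tick(struct rq *rq);	/* kernel/sched/debug.c */

static inline void sched_debug_tick(struct rq *rq)
{
	if (static_branch_unlikely(&sched_debug_key))
		__sched_debug_tick(rq);
}

With the key disabled, the call site costs a single patched no-op branch
and none of the large debug body gets inlined.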
--
Mel Gorman
SUSE Labs
is a slight risk
that a sparse array that only needed 1 page in reality would fail
the watermark check, but on low memory, allocations take more work anyway.
That definition of nr_pages would avoid the potential buffer overrun but
both you and Jesper would need to agree that it's an appropriate API.
--
Mel Gorman
SUSE Labs
s. This patch adds an array-based interface to the API to avoid
multiple list iterations. The page list interface is preserved to
avoid requiring all users of the bulk API to allocate and manage
enough storage to store the pages.
Signed-off-by: Mel Gorman
diff --git a/include/linux/gfp.h b/include/
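As an illustration, the two calling conventions might be used like this
(a hedged sketch; NR_BATCH and the exact prototype are assumptions since
the interface was still evolving in this thread):

	struct page *pages[NR_BATCH] = { NULL };	/* caller-owned storage */
	LIST_HEAD(list);				/* versus: just a list head */
	unsigned long nr;

	/* Array mode: fills NULL slots, returns the populated count */
	nr = __alloc_pages_bulk(GFP_KERNEL, numa_mem_id(), NULL,
				NR_BATCH, NULL, pages);

	/* List mode: pages come back linked through page->lru */
	nr = __alloc_pages_bulk(GFP_KERNEL, numa_mem_id(), NULL,
				NR_BATCH, &list, NULL);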
even if you placed a number of
> pages in the cache.
>
I think you're right but I'm punting this to Jesper to fix. He's more
familiar with this particular code and can verify the performance is
still ok for high speed networks.
--
Mel Gorman
SUSE Labs
easily spliced on a private cache or simply
handed back to the free API without having to track exactly how many
pages are on the array or where they are located. With arrays, the
elements have to be copied one at a time.
I think it's easier overall for the callers to deal with a list in
the initial implementation and only switch to arrays when there is an
extremely hot user that benefits heavily if pages are inserted directly
into an array.
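A sketch of that contrast (the pool fields and the list-era prototype are
assumptions based on this thread):

	LIST_HEAD(batch);
	unsigned long nr = alloc_pages_bulk(GFP_KERNEL, 16, &batch);

	/* List: splice the whole batch onto a private cache in O(1) */
	list_splice_tail_init(&batch, &pool->cache);

	/* Array: every element has to be copied individually */
	for (i = 0; i < nr_filled; i++)
		pool->slots[pool->count++] = page_array[i];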
--
Mel Gorman
SUSE Labs
I'm happy enough to include them in the
series even if it ultimately gets merged via the NFSD tree. It'll need
to be kept as a separate pull request to avoid delaying unrelated NFSD
patches until Andrew merges the mm parts.
--
Mel Gorman
SUSE Labs
an easy API to start with. Optimise
the implementation if it is a bottleneck. Only make the API harder to
use if the callers are really willing to always allocate and size the
array in advance and it's shown that it really makes a big difference
performance-wise.
--
Mel Gorman
SUSE Labs
using
the alloc_pages_bulk API (3,677,958 pps -> 4,368,926 pps).
[1]
https://github.com/xdp-project/xdp-project/blob/master/areas/mem/page_pool06_alloc_pages_bulk.org
Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: Mel Gorman
Reviewed-by: Ilias Apalodimas
---
net/core/page_pool.c |
From: Jesper Dangaard Brouer
In preparation for next patch, move the dma mapping into its own
function, as this will make it easier to follow the changes.
V2: make page_pool_dma_map return boolean (Ilias)
Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: Mel Gorman
Reviewed-by: Ilias
roved
but it would require refactoring. The intent is to make it available early
to determine what semantics are required by different callers. Once the
full semantics are nailed down, it can be refactored.
Signed-off-by: Mel Gorman
---
include/linux/gfp.h | 12 +
mm/page_alloc.c
From: Chuck Lever
Reduce the rate at which nfsd threads hammer on the page allocator.
This improves throughput scalability by enabling the threads to run
more independently of each other.
Signed-off-by: Chuck Lever
Signed-off-by: Mel Gorman
---
net/sunrpc/svc_xprt.c | 43
r() was renamed nfsd_splice_actor()
by commit cf8208d0eabd ("sendfile: convert nfsd to
splice_direct_to_actor()").
Signed-off-by: Chuck Lever
Signed-off-by: Mel Gorman
---
net/sunrpc/svc_xprt.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/sunrpc/svc_xprt.
, it is obvious that __alloc_pages() and prepare_alloc_pages()
use different names for the same variable. This is an unnecessary
complication so rename gfp_mask to gfp in prepare_alloc_pages() so the
name is consistent.
No functional change.
Signed-off-by: Mel Gorman
---
mm/page_alloc.c | 25
On Fri, Mar 12, 2021 at 12:43:31PM +, Matthew Wilcox wrote:
> On Wed, Mar 10, 2021 at 10:46:15AM +0000, Mel Gorman wrote:
> > +int __alloc_pages_bulk_nodemask(gfp_t gfp_mask, int preferred_nid,
> > + nodemask_t *nodemas
e could have the caller provide the array to store struct-page
> pointers, like we do with kmem_cache_alloc_bulk API.
>
That is a possibility but it ties the caller into declaring an array,
either via kmalloc, within an existing struct or on-stack. They would
then need to ensure that nr_pages does not exceed the array size or pass
in the array size. It's more error prone and a harder API to use.
> You likely have good reasons for returning the pages as a list (via
> lru), as I can see/imagine that there are some potential for grabbing
> the entire PCP-list.
>
I used a list so that the user was only required to define a list_head on
the stack to use the API.
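In other words, the caller-side setup is minimal; a sketch (prototype
paraphrased, since the exact signature varied across versions):

	LIST_HEAD(page_list);	/* the only storage the caller declares */
	unsigned long nr = alloc_pages_bulk(GFP_KERNEL, nr_pages, &page_list);

	while (!list_empty(&page_list)) {
		struct page *page = list_first_entry(&page_list,
						     struct page, lru);

		list_del(&page->lru);
		/* ... hand the page to the consumer ... */
	}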
--
Mel Gorman
SUSE Labs
e, 0, gfp_mask, 0);
> > +
> > + return alloced;
> > +
> > +failed_irq:
> > + local_irq_restore(flags);
> > +
> > +failed:
> > + page = __alloc_pages_nodemask(gfp_mask, 0, preferred_nid, nodemask);
> > + if (page) {
> > + alloced++;
>
> You could be explicit here and just set alloced to 1 and make this a
> write instead of bothering with the increment. Either that or just
> simplify this and return 1 after the list_add, and return 0 in the
> default case assuming you didn't allocate a page.
>
The intent was to deal with the case that someone in the future uses
the failed path when a page has already been allocated. I cannot imagine
why that would be done, so I can explicitly use allocated = 1. I'm still
letting it fall through to avoid two return paths in the failed path. I do
not think it really matters but the increment feels redundant.
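Something like this sketch, in other words (paraphrasing the quoted hunk
above rather than quoting the eventual patch):

failed:
	page = __alloc_pages_nodemask(gfp_mask, 0, preferred_nid, nodemask);
	if (page) {
		allocated = 1;	/* explicit: nothing was allocated earlier */
		list_add(&page->lru, page_list);
	}

	return allocated;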
Thanks Alexander!
--
Mel Gorman
SUSE Labs
Changelog since v3
o Prep new pages with IRQs enabled
o Minor documentation update
Changelog since v1
o Parenthesise binary and boolean comparisons
o Add reviewed-bys
o Rebase to 5.12-rc2
This series introduces a bulk order-0 page allocator with sunrpc and
the network page pool being the first us
On Wed, Mar 10, 2021 at 03:47:04PM -0800, Andrew Morton wrote:
> On Wed, 10 Mar 2021 10:46:13 +0000 Mel Gorman
> wrote:
>
> > This series introduces a bulk order-0 page allocator with sunrpc and
> > the network page pool being the first users.
>
>
>
> Right
On Wed, Mar 10, 2021 at 03:46:50PM -0800, Andrew Morton wrote:
> On Wed, 10 Mar 2021 10:46:15 +0000 Mel Gorman
> wrote:
>
> > This patch adds a new page allocator interface via alloc_pages_bulk,
> > and __alloc_pages_bulk_nodemask. A caller requests a number of pages
>
On Wed, Mar 10, 2021 at 01:04:17PM +0200, Shay Agroskin wrote:
>
> Mel Gorman writes:
>
> >
> > diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> > index 8572a1474e16..4903d1cc48dc 100644
> > --- a/include/linux/gfp.h
> > +++ b/include/linux/gfp.h
}
> >
> > - gfp_mask &= gfp_allowed_mask;
> > - alloc_mask = gfp_mask;
>
> Is this change intentional?
Yes, so that prepare_alloc_pages() works for both the single-page and bulk
allocators. Slightly less code duplication.
--
Mel Gorman
SUSE Labs
On Tue, Mar 02, 2021 at 08:49:06PM +0200, Ilias Apalodimas wrote:
> Hi Mel,
>
> Can you please CC me in future revisions. I almost missed that!
>
Will do.
> On Mon, Mar 01, 2021 at 04:11:59PM +0000, Mel Gorman wrote:
> > From: Jesper Dangaard Brouer
> >
> &
This series introduces a bulk order-0 page allocator with sunrpc and
the network page pool being the first users. The implementation is not
particularly efficient and the intention is to iron out what the semantics
of the API should be for users. Once the semantics are ironed out, it can
be made
e_list cache-aligned because of its location in the
> > struct zone so what purpose does __pad_to_align_free_list serve?
>
> The purpose of __pad_to_align_free_list is that struct
> list_head is 16 bytes, thus I wanted to align free_list to 16, given we
> have already wasted the space.
>
Ok, that's fair enough but it's also somewhat of a micro-optimisation as
whether it helps or not depends on the architecture.
I don't think I'll pick this up, certainly in the context of the bulk
allocator but it's worth keeping in mind. It's an interesting corner case
at least.
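As a self-contained illustration of the padding trick (the layout and
field names here are invented for the example; sizes assume a 64-bit
build):

#include <stdio.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };	/* 16 bytes */

struct free_area_sketch {
	unsigned int nr_free;
	char __pad_to_align_free_list[12];  /* pad 4-byte nr_free out to 16 */
	struct list_head free_list[3];	    /* starts on a 16-byte boundary */
};

int main(void)
{
	printf("free_list offset: %zu\n",
	       offsetof(struct free_area_sketch, free_list));	/* prints 16 */
	return 0;
}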
--
Mel Gorman
SUSE Labs
nt size, I'm less certain and wonder if something else is going on.
Finally, moving nr_free to the end and cache-aligning it will make the
start of each free_list cache-aligned because of its location in the
struct zone, so what purpose does __pad_to_align_free_list serve?
--
Mel Gorman
SUSE Labs
On Wed, Feb 24, 2021 at 12:27:23PM +0100, Jesper Dangaard Brouer wrote:
> On Wed, 24 Feb 2021 10:26:00 +0000
> Mel Gorman wrote:
>
> > This is a prototype series that introduces a bulk order-0 page allocator
> > with sunrpc being the first user. The implementatio
This is a prototype series that introduces a bulk order-0 page allocator
with sunrpc being the first user. The implementation is not particularly
efficient and the intention is to iron out what the semantics of the API
should be. That said, sunrpc was reported to have reduced allocation
latency whe
e wrong
> zone's lock.
>
> This patch should fix the above issues.
>
> Fixes: 5a811889de10 ("mm, compaction: use free lists to quickly locate a
> migration target")
Acked-by: Mel Gorman
--
Mel Gorman
SUSE Labs
The following commit has been merged into the sched/core branch of tip:
Commit-ID: 9fe1f127b913318c631d0041ecf71486e38c2c2d
Gitweb:
https://git.kernel.org/tip/9fe1f127b913318c631d0041ecf71486e38c2c2d
Author: Mel Gorman
AuthorDate: Wed, 27 Jan 2021 13:52:03
Committer
The following commit has been merged into the sched/core branch of tip:
Commit-ID: 6cd56ef1df399a004f90ecb682427f9964969fc9
Gitweb:
https://git.kernel.org/tip/6cd56ef1df399a004f90ecb682427f9964969fc9
Author: Mel Gorman
AuthorDate: Mon, 25 Jan 2021 08:59:08
Committer
keep the large orders based on
> nr_cpu_ids as was before 045ab8c9487b.
>
Tested-by: Mel Gorman
Only x86-64 tested, three machines, all showing similar results as would
be expected. One example:
hackbench-process-sockets
5.11.0-rc7 5.11.0-rc7
anning means there is a greater chance of finding an idle CPU over the
two passes. I think overall it's better to avoid double scanning even if
there are counter examples as it's possible we'll get that back through
better depth selection in the future.
Thanks.
--
Mel Gorman
SUSE Labs
On Thu, Jan 28, 2021 at 02:57:10PM +0100, Michal Hocko wrote:
> On Thu 28-01-21 13:45:12, Mel Gorman wrote:
> [...]
> > So mostly this is down to the number of times SLUB calls into the page
> > allocator which only caches order-0 pages on a per-cpu basis. I do have
> >
es_bulk+0x1ac/0x7d0
&zone->lock 620874
[<122cecf3>] get_page_from_freelist+0xaf0/0x1370
Each individual wait time is small but the maximum waittime-max is roughly
double (120us vanilla vs 66us reverting the patch). Total wait time is
roughly doubled also due to the patch. Acquisitions are almost doubled.
So mostly this is down to the number of times SLUB calls into the page
allocator which only caches order-0 pages on a per-cpu basis. I do have
a prototype for a high-order per-cpu allocator but it is very rough --
high watermarks stop making sense, code is rough, memory needed for the
pcpu structures quadruples etc.
--
Mel Gorman
SUSE Labs
ons(+), 82 deletions(-)
--
2.26.2
Mel Gorman (4):
sched/fair: Remove SIS_AVG_CPU
sched/fair: Move avg_scan_cost calculations under SIS_PROP
sched/fair: Remove select_idle_smt()
sched/fair: Merge select_idle_core/cpu()
kernel/sched/fair.c | 151 +++---
As noted by Vincent Guittot, avg_scan_costs are calculated for SIS_PROP
even if SIS_PROP is disabled. Move the time calculations under a SIS_PROP
check and while we are at it, exclude the cost of initialising the CPU
mask from the average scan cost.
Signed-off-by: Mel Gorman
Reviewed-by: Vincent
dle_cpu(), let's drop SIS_AVG_CPU and focus
on SIS_PROP as a throttling mechanism.
Signed-off-by: Mel Gorman
Reviewed-by: Vincent Guittot
---
kernel/sched/fair.c | 20 +---
kernel/sched/features.h | 1 -
2 files changed, 9 insertions(+), 12 deletions(-)
diff --git a/kernel/sched/f
From: Peter Zijlstra (Intel)
In order to make the next patch more readable, and to quantify the
actual effectiveness of this pass, start by removing it.
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Mel Gorman
Reviewed-by: Vincent Guittot
---
kernel/sched/fair.c | 30
an idle core. This way we'll only iterate every CPU once.
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Mel Gorman
Reviewed-by: Vincent Guittot
---
kernel/sched/fair.c | 99 +++--
1 file changed, 59 insertions(+), 40 deletions(-)
diff --
o engage brain. I'll apply your hunk and resend. I don't think this
merits retesting as the saving of avoiding the intermediate is marginal.
--
Mel Gorman
SUSE Labs
cpu = i;
+ idle_cpu = __select_idle_cpu(cpu);
+ if ((unsigned int)idle_cpu < nr_cpumask_bits)
break;
- }
}
}
--
Mel Gorman
SUSE Labs
ing on 4-socket systems). In such cases, double scanning can
still show improvements for workloads that idle rapidly like tbench and
hackbench even though it's expensive. The extra scanning gives more time
for a CPU to go idle enough to be selected, which can improve throughput
but at the cost of wake-up latency.
Hopefully v4 can be tested as well which is now just a single scan.
--
Mel Gorman
SUSE Labs
On Sun, Jan 24, 2021 at 09:38:14PM +0100, Julia Lawall wrote:
>
>
> On Tue, 27 Oct 2020, Mel Gorman wrote:
>
> > On Thu, Oct 22, 2020 at 03:15:50PM +0200, Julia Lawall wrote:
> > > Fixes: 11f10e5420f6 ("sched/fair: Use load instead of runnable load in
> >
0 affected because it included two patches related to halting
compaction that are relevant.
d20bdd571ee5c9966191568527ecdb1bd4b52368 mm/compaction: stop isolation if too
many pages are isolated and we have pages to migrate
38935861d85a4d9a353d1dd5a156c97700e2765d mm/compaction: count pages and stop
correctly during page isolation
--
Mel Gorman
SUSE Labs
may not be appropriate because it's a big change in an rc window.
So, should this patch be merged for 5.11 as a stopgap, fix up
schedutil/cpufreq and then test both AMD and Intel chips reporting the
correct max non-turbo and max-turbo frequencies? That would give time to
give some testing in linux-next before merging to reduce the chance
something else falls out.
--
Mel Gorman
SUSE Labs
On Mon, Jan 25, 2021 at 09:53:28PM +0800, Li, Aubrey wrote:
> On 2021/1/25 17:06, Mel Gorman wrote:
> > On Mon, Jan 25, 2021 at 02:02:58PM +0800, Aubrey Li wrote:
> >> A long-tail load balance cost is observed on the newly idle path,
> >> this is caused by a ra
Changelog since v3
o Drop scanning based on cores, SMT4 results showed problems
Changelog since v2
o Remove unnecessary parameters
o Update nr during scan only when scanning for cpus
Changelog since v1
o Move extern declaration to header for coding style
o Remove unnecessary parameter from __selec
a constant, why is it not a #define instead of increasing
the size of lb_env?
--
Mel Gorman
SUSE Labs
ned I tested 5 patches with patch3, not patch3 alone.
>
Ah, that makes more sense.
> >
> > Hopefully v4 can be tested as well which is now just a single scan.
> >
>
> Sure, may I know the baseline of v4?
>
5.11-rc4.
--
Mel Gorman
SUSE Labs
On Fri, Jan 22, 2021 at 10:30:52AM +0100, Vincent Guittot wrote:
> Hi Mel,
>
> On Tue, 19 Jan 2021 at 13:02, Mel Gorman wrote:
> >
> > On Tue, Jan 19, 2021 at 12:33:04PM +0100, Vincent Guittot wrote:
> > > On Tue, 19 Jan 2021 at 12:22, Mel Gorman
> > > wr