[PATCH 01/15] mm: numa: Document automatic NUMA balancing sysctls

2013-07-05 Thread Mel Gorman
Signed-off-by: Mel Gorman --- Documentation/sysctl/kernel.txt | 66 + 1 file changed, 66 insertions(+) diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index ccd4258..0fe678c 100644 --- a/Documentation/sysctl/kernel.txt +++ b

[PATCH 0/15] Basic scheduler support for automatic NUMA balancing V3

2013-07-05 Thread Mel Gorman
This continues to build on the previous feedback. The results are a mix of gains and losses but when looking at the losses I think it's also important to consider the reduced overhead when the patches are applied. I still have not had the chance to closely review Peter's or Srikar's approach to sch

[PATCH 1/2] mm: vmscan: Avoid direct reclaim scanning at maximum priority

2013-06-26 Thread Mel Gorman
consider firing the OOM killer. The user-visible impact is that direct reclaim will not easily reach priority 0 and start swapping prematurely. Signed-off-by: Mel Gorman --- mm/vmscan.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index fe73724

[PATCH 2/2] mm: vmscan: Do not scale writeback pages when deciding whether to set ZONE_WRITEBACK

2013-06-26 Thread Mel Gorman
ect should be that kswapd will writeback fewer pages from reclaim context. Signed-off-by: Mel Gorman --- mm/vmscan.c | 16 +--- 1 file changed, 1 insertion(+), 15 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 65f2fbea..f677780 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @

[PATCH 0/2] Reduce system disruption due to kswapd more followup

2013-06-26 Thread Mel Gorman
Further testing revealed that swapping was still higher than expected for the parallel IO tests. There was also a performance regression reported building kernels but there appears to be multiple sources of that problem. This follow-up series primarily addresses the first swapping issue. The tests

[PATCH 8/8] sched: Increase NUMA PTE scanning when a new preferred node is selected

2013-06-26 Thread Mel Gorman
task is currently running on the node to recheck if the placement decision is correct. In the optimistic expectation that the placement decisions will be correct, the maximum period between scans is also increased to reduce overhead due to automatic NUMA balancing. Signed-off-by: Mel Gorman

[PATCH 5/8] sched: Favour moving tasks towards the preferred node

2013-06-26 Thread Mel Gorman
scans. Signed-off-by: Mel Gorman --- Documentation/sysctl/kernel.txt | 8 +++- include/linux/sched.h | 1 + kernel/sched/core.c | 4 +++- kernel/sched/fair.c | 40 ++-- kernel/sysctl.c | 7 +++ 5

[PATCH 7/8] sched: Split accounting of NUMA hinting faults that pass two-stage filter

2013-06-26 Thread Mel Gorman
approximates private pages by assuming that faults that pass the two-stage filter are private pages and all others are shared. The preferred NUMA node is then selected based on where the maximum number of approximately private faults was measured. Signed-off-by: Mel Gorman --- include/linux/sched.h | 4

[PATCH 0/6] Basic scheduler support for automatic NUMA balancing

2013-06-26 Thread Mel Gorman
It's several months overdue and everything was quiet after 3.8 came out but I recently had a chance to revisit automatic NUMA balancing for a few days. I looked at basic scheduler integration resulting in the following small series. Much of the following is heavily based on the numacore series whic

[PATCH 4/8] sched: Update NUMA hinting faults once per scan

2013-06-26 Thread Mel Gorman
. Signed-off-by: Mel Gorman --- include/linux/sched.h | 13 + kernel/sched/core.c | 1 + kernel/sched/fair.c | 16 +--- 3 files changed, 27 insertions(+), 3 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index ba46a64..42f9818 100644 --- a

[PATCH 6/8] sched: Reschedule task on preferred NUMA node once selected

2013-06-26 Thread Mel Gorman
balancer to make a decision. Signed-off-by: Mel Gorman --- kernel/sched/core.c | 18 +++-- kernel/sched/fair.c | 55 ++-- kernel/sched/sched.h | 2 +- 3 files changed, 70 insertions(+), 5 deletions(-) diff --git a/kernel/sched/core.c b

[PATCH 1/8] mm: numa: Document automatic NUMA balancing sysctls

2013-06-26 Thread Mel Gorman
Signed-off-by: Mel Gorman --- Documentation/sysctl/kernel.txt | 66 + 1 file changed, 66 insertions(+) diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index ccd4258..0fe678c 100644 --- a/Documentation/sysctl/kernel.txt +++ b

[PATCH 2/8] sched: Track NUMA hinting faults on per-node basis

2013-06-26 Thread Mel Gorman
task did not migrate the data again unnecessarily. This information is later used to schedule a task on the node incurring the most NUMA hinting faults. Signed-off-by: Mel Gorman --- include/linux/sched.h | 2 ++ kernel/sched/core.c | 3 +++ kernel/sched/fair.c | 12 +++- kernel/sched

[PATCH 3/8] sched: Select a preferred node with the most numa hinting faults

2013-06-26 Thread Mel Gorman
This patch selects a preferred node for a task to run on based on the NUMA hinting faults. This information is later used to migrate tasks towards the node during balancing. Signed-off-by: Mel Gorman --- include/linux/sched.h | 1 + kernel/sched/core.c | 10 ++ kernel/sched/fair.c

Re: [PATCH 1/2] mm: vmscan: Avoid direct reclaim scanning at maximum priority

2013-06-28 Thread Mel Gorman
On Wed, Jun 26, 2013 at 12:39:25PM -0700, Andrew Morton wrote: > On Wed, 26 Jun 2013 13:39:23 +0100 Mel Gorman wrote: > > > Page reclaim at priority 0 will scan the entire LRU as priority 0 is > > considered to be a near OOM condition. Direct reclaim can reach this > &

Re: [PATCH 2/8] sched: Track NUMA hinting faults on per-node basis

2013-06-28 Thread Mel Gorman
On Thu, Jun 27, 2013 at 05:57:48PM +0200, Peter Zijlstra wrote: > On Wed, Jun 26, 2013 at 03:38:01PM +0100, Mel Gorman wrote: > > @@ -503,6 +503,18 @@ DECLARE_PER_CPU(struct rq, runqueues); > > #define cpu_curr(cpu) (cpu_rq(cpu)->curr) &

Re: [PATCH 2/8] sched: Track NUMA hinting faults on per-node basis

2013-06-28 Thread Mel Gorman
On Fri, Jun 28, 2013 at 11:38:29AM +0530, Srikar Dronamraju wrote: > * Mel Gorman [2013-06-26 15:38:01]: > > > This patch tracks what nodes numa hinting faults were incurred on. Greater > > weight is given if the pages were to be migrated on the understanding > &g

Re: [PATCH 3/8] sched: Select a preferred node with the most numa hinting faults

2013-06-28 Thread Mel Gorman
On Fri, Jun 28, 2013 at 11:44:28AM +0530, Srikar Dronamraju wrote: > * Mel Gorman [2013-06-26 15:38:02]: > > > This patch selects a preferred node for a task to run on based on the > > NUMA hinting faults. This information is later used to migrate tasks > > towards t

Re: [PATCH 5/8] sched: Favour moving tasks towards the preferred node

2013-06-28 Thread Mel Gorman
On Thu, Jun 27, 2013 at 04:53:45PM +0200, Peter Zijlstra wrote: > On Wed, Jun 26, 2013 at 03:38:04PM +0100, Mel Gorman wrote: > > This patch favours moving tasks towards the preferred NUMA node when > > it has just been selected. Ideally this is self-reinforcing as the > > lon

Re: [PATCH 5/8] sched: Favour moving tasks towards the preferred node

2013-06-28 Thread Mel Gorman
On Thu, Jun 27, 2013 at 06:01:27PM +0200, Peter Zijlstra wrote: > On Wed, Jun 26, 2013 at 03:38:04PM +0100, Mel Gorman wrote: > > @@ -3897,6 +3907,28 @@ task_hot(struct task_struct *p, u64 now, struct > > sched_domain *sd) > > return delta < (s64)sys

Re: [PATCH 5/8] sched: Favour moving tasks towards the preferred node

2013-06-28 Thread Mel Gorman
On Thu, Jun 27, 2013 at 06:11:27PM +0200, Peter Zijlstra wrote: > On Wed, Jun 26, 2013 at 03:38:04PM +0100, Mel Gorman wrote: > > +/* Returns true if the destination node has incurred more faults */ > > +static bool migrate_improves_locality(struct task_struct *p, struct l

Re: [PATCH 5/8] sched: Favour moving tasks towards the preferred node

2013-06-28 Thread Mel Gorman
c +++ b/kernel/sched/fair.c @@ -4088,8 +4088,13 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env) * 3) too many balance attempts have failed. */ - if (migrate_improves_locality(p, env)) + if (migrate_improves_locality(p, env)) { +#ifdef CONFIG_SCHEDSTATS +

Re: [PATCH 6/8] sched: Reschedule task on preferred NUMA node once selected

2013-06-28 Thread Mel Gorman
On Thu, Jun 27, 2013 at 04:54:58PM +0200, Peter Zijlstra wrote: > On Wed, Jun 26, 2013 at 03:38:05PM +0100, Mel Gorman wrote: > > +static int > > +find_idlest_cpu_node(int this_cpu, int nid) > > +{ > > + unsigned long load, min_load = ULONG_MAX; > >

Re: [PATCH 7/8] sched: Split accounting of NUMA hinting faults that pass two-stage filter

2013-06-28 Thread Mel Gorman
On Thu, Jun 27, 2013 at 04:56:58PM +0200, Peter Zijlstra wrote: > On Wed, Jun 26, 2013 at 03:38:06PM +0100, Mel Gorman wrote: > > +void task_numa_fault(int last_nid, int node, int pages, bool migrated) > > { > > struct task_struct *p = current; > > + int pri

Re: [PATCH 7/8] sched: Split accounting of NUMA hinting faults that pass two-stage filter

2013-06-28 Thread Mel Gorman
to fault the shared page making the information unreliable. It is important that *something* be done with shared faults but I haven't thought of what exactly yet. One possibility would be to give them a different weight, maybe based on the number of active NUMA nodes, but I had not tested anything

Re: [PATCH 5/8] sched: Favour moving tasks towards the preferred node

2013-06-28 Thread Mel Gorman
On Fri, Jun 28, 2013 at 10:44:27PM +0530, Srikar Dronamraju wrote: > > > Yes, I understand that numa should have more priority over cache. > > > But the schedstats will not be updated about whether the task was hot or > > > cold. > > > > > > So lets say the task was cache hot but numa wants it to

Re: [PATCH 0/6] Basic scheduler support for automatic NUMA balancing

2013-07-01 Thread Mel Gorman
On Mon, Jul 01, 2013 at 11:09:47AM +0530, Srikar Dronamraju wrote: > * Srikar Dronamraju [2013-06-28 19:24:22]: > > > * Mel Gorman [2013-06-26 15:37:59]: > > > > > It's several months overdue and everything was quiet after 3.8 came out > > > but I r

Re: [PATCH 01/10] mm: vmscan: Limit the number of pages kswapd reclaims at each priority

2013-03-21 Thread Mel Gorman
On Thu, Mar 21, 2013 at 11:57:05AM -0400, Johannes Weiner wrote: > On Sun, Mar 17, 2013 at 01:04:07PM +0000, Mel Gorman wrote: > > The number of pages kswapd can reclaim is bound by the number of pages it > > scans which is related to the size of the zone and the scanning priori

Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd

2013-03-21 Thread Mel Gorman
On Thu, Mar 21, 2013 at 12:25:18PM -0400, Johannes Weiner wrote: > On Sun, Mar 17, 2013 at 01:04:08PM +0000, Mel Gorman wrote: > > Simplistically, the anon and file LRU lists are scanned proportionally > > depending on the value of vm.swappiness although there are other factors

Re: [PATCH 09/10] mm: vmscan: Check if kswapd should writepage once per priority

2013-03-21 Thread Mel Gorman
On Thu, Mar 21, 2013 at 05:58:37PM +0100, Michal Hocko wrote: > On Sun 17-03-13 13:04:15, Mel Gorman wrote: > > Currently kswapd checks if it should start writepage as it shrinks > > each zone without taking into consideration if the zone is balanced or > > not. This is not

Re: [PATCH 10/10] mm: vmscan: Move logic from balance_pgdat() to kswapd_shrink_zone()

2013-03-21 Thread Mel Gorman
On Thu, Mar 21, 2013 at 06:18:04PM +0100, Michal Hocko wrote: > On Sun 17-03-13 13:04:16, Mel Gorman wrote: > > + > > + /* > > +* Kswapd reclaims only single pages with compaction enabled. Trying > > +* too hard to reclaim until contiguous free pages have beco

Re: [PATCH 06/10] mm: vmscan: Have kswapd writeback pages based on dirty pages encountered, not priority

2013-03-21 Thread Mel Gorman
On Thu, Mar 21, 2013 at 01:53:41PM -0400, Rik van Riel wrote: > On 03/17/2013 11:11 AM, Mel Gorman wrote: > >On Sun, Mar 17, 2013 at 07:42:39AM -0700, Andi Kleen wrote: > >>Mel Gorman writes: > >> > >>>@@ -495,6 +495,9 @@ typedef enum { > >>>

Re: [PATCH 07/10] mm: vmscan: Block kswapd if it is encountering pages under writeback

2013-03-22 Thread Mel Gorman
On Thu, Mar 21, 2013 at 02:42:26PM -0400, Rik van Riel wrote: > On 03/17/2013 09:04 AM, Mel Gorman wrote: > >Historically, kswapd used to congestion_wait() at higher priorities if it > >was not making forward progress. This made no sense as the failure to make > >progres

Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd

2013-03-22 Thread Mel Gorman
On Fri, Mar 22, 2013 at 08:54:27AM +0100, Michal Hocko wrote: > On Thu 21-03-13 15:34:42, Mel Gorman wrote: > > On Thu, Mar 21, 2013 at 04:07:55PM +0100, Michal Hocko wrote: > > > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > > > > > index 4835a7a.

Re: [RFC PATCH 0/8] Reduce system disruption due to kswapd

2013-03-22 Thread Mel Gorman
On Sun, Mar 17, 2013 at 01:04:06PM +0000, Mel Gorman wrote: > Kswapd and page reclaim behaviour has been screwy in one way or the other > for a long time. Very broadly speaking it worked in the far past because > machines were limited in memory so it did not have that many pages to scan

Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd

2013-03-22 Thread Mel Gorman
On Fri, Mar 22, 2013 at 12:53:49PM -0400, Johannes Weiner wrote: > On Thu, Mar 21, 2013 at 06:02:38PM +0000, Mel Gorman wrote: > > On Thu, Mar 21, 2013 at 12:25:18PM -0400, Johannes Weiner wrote: > > > On Sun, Mar 17, 2013 at 01:04:08PM +0000, Mel Gorman wrote: > > > >

Re: [PATCH 02/10] mm: vmscan: Obey proportional scanning requirements for kswapd

2013-03-22 Thread Mel Gorman
ACTIVE] = 0; > > > > /* Reduce scanning of the other LRU proportionally */ > > lru = (lru == LRU_FILE) ? LRU_BASE : LRU_FILE; > > nr[lru] = nr[lru] * percentage / 100;; > > nr[lru + LRU_ACTIVE] = nr[lru + LRU_ACTIVE] * pe

Re: [PATCH] mm: skip the page buddy block instead of one page

2013-08-14 Thread Mel Gorman
On Thu, Aug 15, 2013 at 12:52:29AM +0900, Minchan Kim wrote: > Hi Mel, > > On Wed, Aug 14, 2013 at 09:57:11AM +0100, Mel Gorman wrote: > > On Wed, Aug 14, 2013 at 12:45:41PM +0800, Xishi Qiu wrote: > > > A large free page buddy block will continue many times, so if the p

Re: [PATCH v6 0/5] zram/zsmalloc promotion

2013-08-14 Thread Mel Gorman
gable to optionally use zsmalloc when the user did not care that it had terrible writeback characteristics? zswap cannot replicate zram+tmpfs but I also think that such a configuration is a bad idea anyway. As zram is already being deployed then it might get promoted anyway but personally I think

Re: [PATCH] mm: skip the page buddy block instead of one page

2013-08-14 Thread Mel Gorman
On Thu, Aug 15, 2013 at 01:39:21AM +0900, Minchan Kim wrote: > On Wed, Aug 14, 2013 at 05:16:42PM +0100, Mel Gorman wrote: > > On Thu, Aug 15, 2013 at 12:52:29AM +0900, Minchan Kim wrote: > > > Hi Mel, > > > > > > On Wed, Aug 14, 2013 at 09:57:11AM +0100, Mel

Re: [PATCH] mm: skip the page buddy block instead of one page

2013-08-14 Thread Mel Gorman
On Wed, Aug 14, 2013 at 01:26:02PM -0700, Andrew Morton wrote: > On Thu, 15 Aug 2013 00:52:29 +0900 Minchan Kim wrote: > > > On Wed, Aug 14, 2013 at 09:57:11AM +0100, Mel Gorman wrote: > > > On Wed, Aug 14, 2013 at 12:45:41PM +0800, Xishi Qiu wrote: > > > > A

Re: kswapd skips compaction if reclaim order drops to zero?

2013-08-15 Thread Mel Gorman
= no calling compact_pgdat In the case where order is reset to 0 due to fragmentation then it does call compact_pgdat but it does no work due to the cc->order check in __compact_pgdat. -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

Re: [PATCH] mm: skip the page buddy block instead of one page

2013-08-15 Thread Mel Gorman
_pfn = min(low_pfn, end_pfn); > } > #endif > continue; > } > > so worst case is (pageblock_nr_pages - 1). No it isn't. The worst case is that the whole region being searched is skipped. For THP allocations, it would happen to work as being the pageblock bound

Re: [PATCH] mm: skip the page buddy block instead of one page

2013-08-15 Thread Mel Gorman
works with [low_pfn, end_pfn) > and we can't guarantee page_order in normal compaction path > so I'd like to limit the skipping by end_pfn conservatively. > Fine s/MAX_ORDER_NR_PAGES/pageblock_nr_pages/ and take the min of it and low_pfn = min(low_pfn, end_pfn - 1) -- Mel G

Re: kswapd skips compaction if reclaim order drops to zero?

2013-08-15 Thread Mel Gorman
On Thu, Aug 15, 2013 at 10:41:39PM +0900, Minchan Kim wrote: > Hey Mel, > > On Thu, Aug 15, 2013 at 11:47:27AM +0100, Mel Gorman wrote: > > On Thu, Aug 15, 2013 at 06:02:53PM +0800, Hillf Danton wrote: > > > If the allocation order is not high, direct compaction does no

[PATCH] mm: compaction: Do not compact pgdat for order-0

2013-08-15 Thread Mel Gorman
...@gmail.com: Pointed out that it was a potential problem] Signed-off-by: Mel Gorman --- mm/compaction.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/mm/compaction.c b/mm/compaction.c index 05ccb4c..c437893 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1131,6 +1131,9 @@ void

Re: [PATCH v6 0/5] zram/zsmalloc promotion

2013-08-15 Thread Mel Gorman
ll lead to zram and zswap diverging further from each other, both implementing similar functionality and ultimately cause greater maintenance headaches. There is a path that makes zswap a functional replacement for zram and I've seen no good reason why that path was not taken. Zram cannot be a

Re: [PATCH v6 0/5] zram/zsmalloc promotion

2013-08-16 Thread Mel Gorman
fresh zswap > couldn't replace old zram? > > Mel, please consider embedded world although they are very little voice > in this core subsystem. > I already said I recognise it has a large number of users in the field and users count a lot more than me complaining. If it ge

Re: [PATCH v6 0/5] zram/zsmalloc promotion

2013-08-16 Thread Mel Gorman
On Fri, Aug 16, 2013 at 09:33:47AM +0100, Mel Gorman wrote: > On Fri, Aug 16, 2013 at 01:26:41PM +0900, Minchan Kim wrote: > > It'll get even more entertaining if/when someone ever tries > to reimplement zcache although since Dan left I do not believe anyone is > planni

[PATCH 03/18] mm: numa: Account for THP numa hinting faults on the correct node

2013-07-15 Thread Mel Gorman
THP NUMA hinting fault on pages that are not migrated are being accounted for incorrectly. Currently the fault will be counted as if the task was running on a node local to the page which is not necessarily true. Signed-off-by: Mel Gorman --- mm/huge_memory.c | 10 +- 1 file changed, 5

[PATCH 14/18] sched: Remove check that skips small VMAs

2013-07-15 Thread Mel Gorman
motivation to do it properly in the future. Signed-off-by: Mel Gorman --- kernel/sched/fair.c | 4 1 file changed, 4 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 8a392c8..b43122c 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1080,10 +1080,6 @@ void

[PATCH 17/18] sched: Retry migration of tasks to CPU on a preferred node

2013-07-15 Thread Mel Gorman
time if another attempt should be made to migrate the task. It will only make an attempt once every five seconds. Signed-off-by: Mel Gorman --- include/linux/sched.h | 1 + kernel/sched/fair.c | 40 +++- 2 files changed, 24 insertions(+), 17 deletions

[PATCH 13/18] mm: numa: Scan pages with elevated page_mapcount

2013-07-15 Thread Mel Gorman
these are generally shared library pages. Migrating such pages is not beneficial as there is an expectation they are read-shared between caches and iTLB and iCache pressure is generally low. Signed-off-by: Mel Gorman --- include/linux/migrate.h | 7 --- mm/memory.c | 7 ++- mm

[PATCH 12/18] sched: Set the scan rate proportional to the size of the task being scanned

2013-07-15 Thread Mel Gorman
sed on the amount of virtual memory that should be scanned in a second. The default of 2.5G seems arbitrary but it is to have the maximum scan rate after the patch roughly match the maximum scan rate before the patch was applied. Signed-off-by: Mel Gorman --- Documentation/sysctl/kerne

[PATCH 11/18] sched: Check current->mm before allocating NUMA faults

2013-07-15 Thread Mel Gorman
task_numa_placement checks current->mm but after buffers for faults have already been uselessly allocated. Move the check earlier. [pet...@infradead.org: Identified the problem] Signed-off-by: Mel Gorman --- kernel/sched/fair.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) d

[PATCH 01/18] mm: numa: Document automatic NUMA balancing sysctls

2013-07-15 Thread Mel Gorman
Signed-off-by: Mel Gorman --- Documentation/sysctl/kernel.txt | 66 + 1 file changed, 66 insertions(+) diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index ccd4258..0fe678c 100644 --- a/Documentation/sysctl/kernel.txt +++ b

[PATCH 04/18] mm: numa: Do not migrate or account for hinting faults on the zero page

2013-07-15 Thread Mel Gorman
in both terms of counting faults and scheduling tasks on nodes. Signed-off-by: Mel Gorman --- mm/huge_memory.c | 9 + mm/memory.c | 7 ++- 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index e4a79fa..ec938ed 100644 --- a/mm

[PATCH 06/18] sched: Update NUMA hinting faults once per scan

2013-07-15 Thread Mel Gorman
. Signed-off-by: Mel Gorman --- include/linux/sched.h | 13 + kernel/sched/core.c | 1 + kernel/sched/fair.c | 16 +--- 3 files changed, 27 insertions(+), 3 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index ba46a64..42f9818 100644 --- a

[PATCH 0/18] Basic scheduler support for automatic NUMA balancing V5

2013-07-15 Thread Mel Gorman
This continues to build on the previous feedback and further testing and I'm hoping this can be finalised relatively soon. False sharing is still a major problem but I still think it deserves its own series. Minimally I think the fact that we are now scanning shared pages without much additional sy

[PATCH 07/18] sched: Favour moving tasks towards the preferred node

2013-07-15 Thread Mel Gorman
controlled by the numa_balancing_settle_count sysctl. Once the settle_count number of scans has complete the schedule is free to place the task on an alternative node if the load is imbalanced. [sri...@linux.vnet.ibm.com: Fixed statistics] Signed-off-by: Mel Gorman --- Documentation/sysctl/kernel.txt | 8

[PATCH 05/18] sched: Select a preferred node with the most numa hinting faults

2013-07-15 Thread Mel Gorman
This patch selects a preferred node for a task to run on based on the NUMA hinting faults. This information is later used to migrate tasks towards the node during balancing. Signed-off-by: Mel Gorman --- include/linux/sched.h | 1 + kernel/sched/core.c | 1 + kernel/sched/fair.c | 17

[PATCH 09/18] sched: Add infrastructure for split shared/private accounting of NUMA hinting faults

2013-07-15 Thread Mel Gorman
, all faults are treated as private and detection will be introduced later. Signed-off-by: Mel Gorman --- include/linux/sched.h | 5 +++-- kernel/sched/fair.c | 33 - mm/huge_memory.c | 7 --- mm/memory.c | 9 ++--- 4 files changed, 37

[PATCH 08/18] sched: Reschedule task on preferred NUMA node once selected

2013-07-15 Thread Mel Gorman
balancer to make a decision. Signed-off-by: Mel Gorman --- kernel/sched/core.c | 17 + kernel/sched/fair.c | 46 +- kernel/sched/sched.h | 1 + 3 files changed, 63 insertions(+), 1 deletion(-) diff --git a/kernel/sched/core.c b/kernel/sched

[PATCH 02/18] sched: Track NUMA hinting faults on per-node basis

2013-07-15 Thread Mel Gorman
task did not migrate the data again unnecessarily. This information is later used to schedule a task on the node incurring the most NUMA hinting faults. Signed-off-by: Mel Gorman --- include/linux/sched.h | 2 ++ kernel/sched/core.c | 3 +++ kernel/sched/fair.c | 12 +++- kernel/sched

[PATCH 10/18] sched: Increase NUMA PTE scanning when a new preferred node is selected

2013-07-15 Thread Mel Gorman
task is currently running on the node to recheck if the placement decision is correct. In the optimistic expectation that the placement decisions will be correct, the maximum period between scans is also increased to reduce overhead due to automatic NUMA balancing. Signed-off-by: Mel Gorman

[PATCH 15/18] sched: Set preferred NUMA node based on number of private faults

2013-07-15 Thread Mel Gorman
son is that multiple threads in a process will race each other to fault the shared page making the fault information unreliable. Signed-off-by: Mel Gorman --- include/linux/mm.h| 69 ++- include/linux/mm_types.h | 4 +-- include/li

[PATCH 18/18] sched: Swap tasks when rescheduling if a CPU on a target node is imbalanced

2013-07-15 Thread Mel Gorman
attempt will be made to swap with the task if it is not running on its preferred node and that moving it would not impair its locality. Signed-off-by: Mel Gorman --- kernel/sched/core.c | 39 +-- kernel/sched/fair.c | 46

[PATCH 16/18] sched: Avoid overloading CPUs on a preferred NUMA node

2013-07-15 Thread Mel Gorman
-by: Peter Zijlstra Signed-off-by: Mel Gorman --- kernel/sched/fair.c | 105 +--- 1 file changed, 83 insertions(+), 22 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 3f0519c..8ee1c8e 100644 --- a/kernel/sched/fair.c +++ b

Re: [PATCH 16/18] sched: Avoid overloading CPUs on a preferred NUMA node

2013-07-16 Thread Mel Gorman
On Mon, Jul 15, 2013 at 10:03:21PM +0200, Peter Zijlstra wrote: > On Mon, Jul 15, 2013 at 04:20:18PM +0100, Mel Gorman wrote: > > --- > > kernel/sched/fair.c | 105 > > +--- > > 1 file changed, 83 insertions(+), 22 delet

Re: [PATCH 18/18] sched: Swap tasks when rescheduling if a CPU on a target node is imbalanced

2013-07-16 Thread Mel Gorman
On Mon, Jul 15, 2013 at 10:11:10PM +0200, Peter Zijlstra wrote: > On Mon, Jul 15, 2013 at 04:20:20PM +0100, Mel Gorman wrote: > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > index 53d8465..d679b01 100644 > > --- a/kernel/sched/core.c > > +++ b/kernel/s

Re: [PATCH 16/18] sched: Avoid overloading CPUs on a preferred NUMA node

2013-07-16 Thread Mel Gorman
On Tue, Jul 16, 2013 at 11:55:24PM +0800, Hillf Danton wrote: > On Mon, Jul 15, 2013 at 11:20 PM, Mel Gorman wrote: > > + > > +static int task_numa_find_cpu(struct task_struct *p, int nid) > > +{ > > + int node_cpu = cpumask_first(cpumask_of_node(nid)); > [.

Re: [PATCH 02/18] sched: Track NUMA hinting faults on per-node basis

2013-07-31 Thread Mel Gorman
On Wed, Jul 17, 2013 at 12:50:30PM +0200, Peter Zijlstra wrote: > On Mon, Jul 15, 2013 at 04:20:04PM +0100, Mel Gorman wrote: > > index cc03cfd..c5f773d 100644 > > --- a/kernel/sched/sched.h > > +++ b/kernel/sched/sched.h > > @@ -503,6 +503,17 @@ DECLARE_PER_CPU(struct r

Re: [PATCH 02/18] sched: Track NUMA hinting faults on per-node basis

2013-07-31 Thread Mel Gorman
On Mon, Jul 29, 2013 at 12:10:59PM +0200, Peter Zijlstra wrote: > On Mon, Jul 15, 2013 at 04:20:04PM +0100, Mel Gorman wrote: > > +++ b/kernel/sched/fair.c > > @@ -815,7 +815,14 @@ void task_numa_fault(int node, int pages, bool > > migrated) > >

Re: [PATCH 04/18] mm: numa: Do not migrate or account for hinting faults on the zero page

2013-07-31 Thread Mel Gorman
On Wed, Jul 17, 2013 at 01:00:53PM +0200, Peter Zijlstra wrote: > On Mon, Jul 15, 2013 at 04:20:06PM +0100, Mel Gorman wrote: > > The zero page is not replicated between nodes and is often shared > > between processes. The data is read-only and likely to be cached in > >

Re: [PATCH] sched, numa: migrates_degrades_locality()

2013-07-31 Thread Mel Gorman
) + +/* + * NUMA_FAVOUR_HIGHER will favor moving tasks towards nodes where a + * higher number of hinting faults are recorded during active load + * balancing. + */ +SCHED_FEAT(NUMA_FAVOUR_HIGHER, true) #endif -- Mel Gorman SUSE Labs

Re: [PATCH 08/18] sched: Reschedule task on preferred NUMA node once selected

2013-07-31 Thread Mel Gorman
On Wed, Jul 17, 2013 at 09:31:05AM +0800, Hillf Danton wrote: > On Mon, Jul 15, 2013 at 11:20 PM, Mel Gorman wrote: > > +static int > > +find_idlest_cpu_node(int this_cpu, int nid) > > +{ > > + unsigned long load, min_load = ULONG_MAX; > > +

Re: [PATCH 09/18] sched: Add infrastructure for split shared/private accounting of NUMA hinting faults

2013-07-31 Thread Mel Gorman
On Wed, Jul 17, 2013 at 10:17:29AM +0800, Hillf Danton wrote: > On Mon, Jul 15, 2013 at 11:20 PM, Mel Gorman wrote: > > /* > > * Got a PROT_NONE fault for a page on @node. > > */ > > -void task_numa_fault(int node, int pages, bool migrated) > > +void task_

Re: [PATCH 13/18] mm: numa: Scan pages with elevated page_mapcount

2013-07-31 Thread Mel Gorman
On Wed, Jul 17, 2013 at 01:22:22PM +0800, Sam Ben wrote: > On 07/15/2013 11:20 PM, Mel Gorman wrote: > >Currently automatic NUMA balancing is unable to distinguish between false > >shared versus private pages except by ignoring pages with an elevated > > What's t

Re: [PATCH 15/18] fix compilation with !CONFIG_NUMA_BALANCING

2013-07-31 Thread Mel Gorman
On Wed, Jul 17, 2013 at 09:53:53PM -0400, Rik van Riel wrote: > On Mon, 15 Jul 2013 16:20:17 +0100 > Mel Gorman wrote: > > > Ideally it would be possible to distinguish between NUMA hinting faults that > > are private to a task and those that are shared. If treated ident

Re: [PATCH 15/18] sched: Set preferred NUMA node based on number of private faults

2013-07-31 Thread Mel Gorman
On Fri, Jul 26, 2013 at 01:20:50PM +0200, Peter Zijlstra wrote: > On Mon, Jul 15, 2013 at 04:20:17PM +0100, Mel Gorman wrote: > > diff --git a/mm/mprotect.c b/mm/mprotect.c > > index cacc64a..04c9469 100644 > > --- a/mm/mprotect.c > > +++ b/mm/mprotect.c > >

Re: [PATCH 16/18] sched: Avoid overloading CPUs on a preferred NUMA node

2013-07-31 Thread Mel Gorman
On Wed, Jul 17, 2013 at 12:54:23PM +0200, Peter Zijlstra wrote: > On Mon, Jul 15, 2013 at 04:20:18PM +0100, Mel Gorman wrote: > > +static long effective_load(struct task_group *tg, int cpu, long wl, long > > wg); > > And this > -- which suggests you always build with cg

Re: [PATCH 17/18] sched: Retry migration of tasks to CPU on a preferred node

2013-07-31 Thread Mel Gorman
oses the stop_machine() state machine but only stops > the two cpus which we can do with on-stack structures and avoid > machine wide synchronization issues. > > Signed-off-by: Peter Zijlstra Clever! I did not spot any problems so will be pulling this (and presumably the next patch)

Re: [PATCH 17/18] sched: Retry migration of tasks to CPU on a preferred node

2013-07-31 Thread Mel Gorman
On Wed, Jul 31, 2013 at 12:05:05PM +0200, Peter Zijlstra wrote: > On Wed, Jul 31, 2013 at 11:03:31AM +0100, Mel Gorman wrote: > > On Thu, Jul 25, 2013 at 12:33:52PM +0200, Peter Zijlstra wrote: > > > > > > Subject: stop_machine: Introduce stop_two_cpus() > > &

Re: [PATCH 15/18] sched: Set preferred NUMA node based on number of private faults

2013-07-31 Thread Mel Gorman
On Wed, Jul 31, 2013 at 11:34:37AM +0200, Peter Zijlstra wrote: > On Wed, Jul 31, 2013 at 10:29:38AM +0100, Mel Gorman wrote: > > > Hurmph I just stumbled upon this PMD 'trick' and I'm not at all sure I > > > like it. If an application would pre-fault/initi

Re: [PATCH 0/18] Basic scheduler support for automatic NUMA balancing V5

2013-07-31 Thread Mel Gorman
> - * not guaranteed to the vma_migratable. If they are not, we would find > the > - * !migratable VMA on the next scan but not reset the scanner to the > start > - * so check it now. > + * It is possible to reach the end of the VMA list but the last few > +

Re: [PATCH] mm, numa: Sanitize task_numa_fault() callsites

2013-07-31 Thread Mel Gorman
sites should have the same > sementaics, furthermore we should accounts against where the page > really is, we already know where the task is. > Agreed. To allow the scheduler parts to still be evaluated in proper isolation I moved this patch to much earlier in the series. -- Mel Gor

Re: [PATCH 0/18] Basic scheduler support for automatic NUMA balancing V5

2013-07-31 Thread Mel Gorman
On Wed, Jul 31, 2013 at 12:48:14PM +0200, Peter Zijlstra wrote: > On Wed, Jul 31, 2013 at 11:30:52AM +0100, Mel Gorman wrote: > > I'm not sure I understand your point. The scan rate is decreased again if > > the page is found to be properly placed in the future. It's

Re: [Ksummit-2013-discuss] [ATTEND] [ARM ATTEND] kernel data bloat and how to avoid it

2013-07-31 Thread Mel Gorman
d scripts/bloat-o-meter highlight where the growth problems are? -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.

Re: [PATCH 0/18] Basic scheduler support for automatic NUMA balancing V5

2013-07-31 Thread Mel Gorman
On Wed, Jul 31, 2013 at 05:30:18PM +0200, Peter Zijlstra wrote: > On Wed, Jul 31, 2013 at 12:57:19PM +0100, Mel Gorman wrote: > > > > Right, so what Ingo did is have the scan rate depend on the convergence. > > > What exactly did you dislike about that? > > >

Re: [PATCH 08/18] sched: Reschedule task on preferred NUMA node once selected

2013-08-01 Thread Mel Gorman
On Thu, Aug 01, 2013 at 10:17:57AM +0530, Srikar Dronamraju wrote: > * Mel Gorman [2013-07-15 16:20:10]: > > > A preferred node is selected based on the node the most NUMA hinting > > faults were incurred on. There is no guarantee that the task is running > > on that

Re: [PATCH 16/18] sched: Avoid overloading CPUs on a preferred NUMA node

2013-08-01 Thread Mel Gorman
g like a numa01 > but with far lesser number of threads probably nr_cpus/2 or nr_cpus/4, > then all threads will try to move to single node as we can keep seeing > idle threads. No? Wont it lead all load moving to one node and load > balancer spreading it out... > I cannot be 100

Re: [PATCH 17/18] sched: Retry migration of tasks to CPU on a preferred node

2013-08-01 Thread Mel Gorman
On Thu, Aug 01, 2013 at 10:43:27AM +0530, Srikar Dronamraju wrote: > * Mel Gorman [2013-07-15 16:20:19]: > > > When a preferred node is selected for a task there is an attempt to migrate > > the task to a CPU there. This may fail in which case the task will only > > mi

Re: [PATCH 18/18] sched: Swap tasks when rescheduling if a CPU on a target node is imbalanced

2013-08-01 Thread Mel Gorman
min_load = dst_load; > > dst_cpu = cpu; > > + *swap_p = swap_candidate; > > Are we sometimes passing a wrong candidate? > Lets say the first cpu balanced is false and we set the swap_candidate, > but find the second cpu(/or later cpus

Re: [PATCH 0/18] Basic scheduler support for automatic NUMA balancing V5

2013-08-01 Thread Mel Gorman
On Wed, Jul 31, 2013 at 06:39:03PM +0200, Peter Zijlstra wrote: > On Wed, Jul 31, 2013 at 05:11:41PM +0100, Mel Gorman wrote: > > RSS was another option it felt as arbitrary as a plain delay. > > Right, it would avoid 'small' programs getting scanning done with the &g

Re: [patch v2 1/3] mm: vmscan: fix numa reclaim balance problem in kswapd

2013-08-07 Thread Mel Gorman
the page counter fluctuation. > > By using zone_balanced(), it will now check, in addition to the > watermark, if compaction requires more order-0 pages to create a > higher order page. > > Signed-off-by: Johannes Weiner > Reviewed-by: Rik van Riel Acked-by: Mel Gorman

Re: [patch v2 2/3] mm: page_alloc: rearrange watermark checking in get_page_from_freelist

2013-08-07 Thread Mel Gorman
red in these extraordinary situations. > > Signed-off-by: Johannes Weiner > Reviewed-by: Rik van Riel Acked-by: Mel Gorman -- Mel Gorman SUSE Labs

Re: [patch v2 3/3] mm: page_alloc: fair zone allocator policy

2013-08-07 Thread Mel Gorman
eferred_zone); > > /* >* OK, we're below the kswapd watermark and have kicked background > @@ -4754,6 +4797,9 @@ static void __paginginit free_area_init_core(struct > pglist_data *pgdat, > zone_seqlock_init(zone); > zone->zon

Re: [patch v2 3/3] mm: page_alloc: fair zone allocator policy

2013-08-08 Thread Mel Gorman
On Thu, Aug 08, 2013 at 12:16:23AM -0400, Johannes Weiner wrote: > On Wed, Aug 07, 2013 at 11:37:43AM -0400, Johannes Weiner wrote: > > On Wed, Aug 07, 2013 at 03:58:28PM +0100, Mel Gorman wrote: > > > On Fri, Aug 02, 2013 at 11:37:26AM -0400, Johannes Weiner wrote: > &

[PATCH 12/27] sched: numa: Correct adjustment of numa_scan_period

2013-08-08 Thread Mel Gorman
numa_scan_period is in milliseconds, not jiffies. Properly placed pages slow the scanning rate, but adding 10 jiffies to numa_scan_period means that the rate at which scanning slows depends on HZ, which is confusing. Get rid of the jiffies_to_msec conversion and treat it as ms. Signed-off-by: Mel Gorman

[PATCH 26/27] sched: Avoid overloading CPUs on a preferred NUMA node

2013-08-08 Thread Mel Gorman
-by: Peter Zijlstra Signed-off-by: Mel Gorman --- kernel/sched/fair.c | 105 +--- 1 file changed, 83 insertions(+), 22 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 9d8b5cb..9ea4d5c 100644 --- a/kernel/sched/fair.c +++ b
