Signed-off-by: Mel Gorman
---
Documentation/sysctl/kernel.txt | 66 +
1 file changed, 66 insertions(+)
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index ccd4258..0fe678c 100644
--- a/Documentation/sysctl/kernel.txt
+++ b
This continues to build on the previous feedback. The results are a mix of
gains and losses but when looking at the losses I think it's also important
to consider the reduced overhead when the patches are applied. I still
have not had the chance to closely review Peter's or Srikar's approach to
sch
consider firing the OOM killer. The
user-visible impact is that direct reclaim will not easily reach
priority 0 and start swapping prematurely.
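The priority behaviour described above can be modelled with a simplified sketch (illustrative only, not the actual mm/vmscan.c code; the function name is an assumption, DEF_PRIORITY mirrors the kernel's convention):

```c
#include <assert.h>
#include <stdbool.h>

#define DEF_PRIORITY 12   /* scan 1/2^12 of the LRU at the lowest urgency */

/*
 * Illustrative model only: reclaim walks from DEF_PRIORITY towards 0,
 * scanning a larger fraction of the LRU at each step. Priority 0 scans
 * the entire LRU and is treated as a near-OOM condition, so it is only
 * reached when no higher priority made progress.
 */
static int reclaim_final_priority(const bool progress[DEF_PRIORITY + 1])
{
    int priority;

    for (priority = DEF_PRIORITY; priority > 0; priority--)
        if (progress[priority])
            return priority;   /* reclaimed enough; stop early */
    return 0;                  /* nothing worked: near OOM, full scan */
}
```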
Signed-off-by: Mel Gorman
---
mm/vmscan.c | 10 +-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fe73724
ect
should be that kswapd will writeback fewer pages from reclaim context.
Signed-off-by: Mel Gorman
---
mm/vmscan.c | 16 +---
1 file changed, 1 insertion(+), 15 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 65f2fbea..f677780 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@
Further testing revealed that swapping was still higher than expected for
the parallel IO tests. There was also a performance regression reported
building kernels, but there appear to be multiple sources of that problem.
This follow-up series primarily addresses the first swapping issue.
The tests
task is currently running on the node to recheck if the placement
decision is correct. In the optimistic expectation that the placement
decisions will be correct, the maximum period between scans is also
increased to reduce overhead due to automatic NUMA balancing.
Signed-off-by: Mel Gorman
scans.
Signed-off-by: Mel Gorman
---
Documentation/sysctl/kernel.txt | 8 +++-
include/linux/sched.h | 1 +
kernel/sched/core.c | 4 +++-
kernel/sched/fair.c | 40 ++--
kernel/sysctl.c | 7 +++
5
approximates private pages by assuming that faults that pass the two-stage
filter are private pages and all others are shared. The preferred NUMA
node is then selected based on where the maximum number of approximately
private faults were measured.
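The selection step can be sketched as follows (a hypothetical helper for illustration; the array layout and the name preferred_nid are assumptions, not the kernel's actual code):

```c
#include <assert.h>

/*
 * Hypothetical helper: given per-node counts of faults that passed the
 * two-stage filter (and are therefore assumed private), pick the node
 * with the maximum count as the preferred node. Ties keep the lowest
 * node id.
 */
static int preferred_nid(const unsigned long *private_faults, int nr_nodes)
{
    int nid, best = 0;

    for (nid = 1; nid < nr_nodes; nid++)
        if (private_faults[nid] > private_faults[best])
            best = nid;
    return best;
}
```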
Signed-off-by: Mel Gorman
---
include/linux/sched.h | 4
It's several months overdue and everything was quiet after 3.8 came out
but I recently had a chance to revisit automatic NUMA balancing for a few
days. I looked at basic scheduler integration resulting in the following
small series. Much of the following is heavily based on the numacore series
whic
.
Signed-off-by: Mel Gorman
---
include/linux/sched.h | 13 +
kernel/sched/core.c | 1 +
kernel/sched/fair.c | 16 +---
3 files changed, 27 insertions(+), 3 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ba46a64..42f9818 100644
--- a
balancer to make a decision.
Signed-off-by: Mel Gorman
---
kernel/sched/core.c | 18 +++--
kernel/sched/fair.c | 55 ++--
kernel/sched/sched.h | 2 +-
3 files changed, 70 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/core.c b
task did not migrate the data again unnecessarily. This information is later
used to schedule a task on the node incurring the most NUMA hinting faults.
Signed-off-by: Mel Gorman
---
include/linux/sched.h | 2 ++
kernel/sched/core.c | 3 +++
kernel/sched/fair.c | 12 +++-
kernel/sched
This patch selects a preferred node for a task to run on based on the
NUMA hinting faults. This information is later used to migrate tasks
towards the node during balancing.
Signed-off-by: Mel Gorman
---
include/linux/sched.h | 1 +
kernel/sched/core.c | 10 ++
kernel/sched/fair.c
On Wed, Jun 26, 2013 at 12:39:25PM -0700, Andrew Morton wrote:
> On Wed, 26 Jun 2013 13:39:23 +0100 Mel Gorman wrote:
>
> > Page reclaim at priority 0 will scan the entire LRU as priority 0 is
> > considered to be a near OOM condition. Direct reclaim can reach this
> &
On Thu, Jun 27, 2013 at 05:57:48PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 26, 2013 at 03:38:01PM +0100, Mel Gorman wrote:
> > @@ -503,6 +503,18 @@ DECLARE_PER_CPU(struct rq, runqueues);
> > #define cpu_curr(cpu) (cpu_rq(cpu)->curr)
&
On Fri, Jun 28, 2013 at 11:38:29AM +0530, Srikar Dronamraju wrote:
> * Mel Gorman [2013-06-26 15:38:01]:
>
> > This patch tracks what nodes numa hinting faults were incurred on. Greater
> > weight is given if the pages were to be migrated on the understanding
> &g
On Fri, Jun 28, 2013 at 11:44:28AM +0530, Srikar Dronamraju wrote:
> * Mel Gorman [2013-06-26 15:38:02]:
>
> > This patch selects a preferred node for a task to run on based on the
> > NUMA hinting faults. This information is later used to migrate tasks
> > towards t
On Thu, Jun 27, 2013 at 04:53:45PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 26, 2013 at 03:38:04PM +0100, Mel Gorman wrote:
> > This patch favours moving tasks towards the preferred NUMA node when
> > it has just been selected. Ideally this is self-reinforcing as the
> > lon
On Thu, Jun 27, 2013 at 06:01:27PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 26, 2013 at 03:38:04PM +0100, Mel Gorman wrote:
> > @@ -3897,6 +3907,28 @@ task_hot(struct task_struct *p, u64 now, struct
> > sched_domain *sd)
> > return delta < (s64)sys
On Thu, Jun 27, 2013 at 06:11:27PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 26, 2013 at 03:38:04PM +0100, Mel Gorman wrote:
> > +/* Returns true if the destination node has incurred more faults */
> > +static bool migrate_improves_locality(struct task_struct *p, struct l
c
+++ b/kernel/sched/fair.c
@@ -4088,8 +4088,13 @@ int can_migrate_task(struct task_struct *p, struct
lb_env *env)
* 3) too many balance attempts have failed.
*/
- if (migrate_improves_locality(p, env))
+ if (migrate_improves_locality(p, env)) {
+#ifdef CONFIG_SCHEDSTATS
+
On Thu, Jun 27, 2013 at 04:54:58PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 26, 2013 at 03:38:05PM +0100, Mel Gorman wrote:
> > +static int
> > +find_idlest_cpu_node(int this_cpu, int nid)
> > +{
> > + unsigned long load, min_load = ULONG_MAX;
> >
On Thu, Jun 27, 2013 at 04:56:58PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 26, 2013 at 03:38:06PM +0100, Mel Gorman wrote:
> > +void task_numa_fault(int last_nid, int node, int pages, bool migrated)
> > {
> > struct task_struct *p = current;
> > + int pri
to fault the shared page making the information unreliable.
It is important that *something* be done with shared faults but I haven't
thought of what exactly yet. One possibility would be to give them a
different weight, maybe based on the number of active NUMA nodes, but I had
not tested anything
On Fri, Jun 28, 2013 at 10:44:27PM +0530, Srikar Dronamraju wrote:
> > > Yes, I understand that numa should have more priority over cache.
> > > But the schedstats will not be updated about whether the task was hot or
> > > cold.
> > >
> > > So lets say the task was cache hot but numa wants it to
On Mon, Jul 01, 2013 at 11:09:47AM +0530, Srikar Dronamraju wrote:
> * Srikar Dronamraju [2013-06-28 19:24:22]:
>
> > * Mel Gorman [2013-06-26 15:37:59]:
> >
> > > It's several months overdue and everything was quiet after 3.8 came out
> > > but I r
On Thu, Mar 21, 2013 at 11:57:05AM -0400, Johannes Weiner wrote:
> On Sun, Mar 17, 2013 at 01:04:07PM +0000, Mel Gorman wrote:
> > The number of pages kswapd can reclaim is bound by the number of pages it
> > scans which is related to the size of the zone and the scanning priori
On Thu, Mar 21, 2013 at 12:25:18PM -0400, Johannes Weiner wrote:
> On Sun, Mar 17, 2013 at 01:04:08PM +0000, Mel Gorman wrote:
> > Simplistically, the anon and file LRU lists are scanned proportionally
> > depending on the value of vm.swappiness although there are other factors
On Thu, Mar 21, 2013 at 05:58:37PM +0100, Michal Hocko wrote:
> On Sun 17-03-13 13:04:15, Mel Gorman wrote:
> > Currently kswapd checks if it should start writepage as it shrinks
> > each zone without taking into consideration if the zone is balanced or
> > not. This is not
On Thu, Mar 21, 2013 at 06:18:04PM +0100, Michal Hocko wrote:
> On Sun 17-03-13 13:04:16, Mel Gorman wrote:
> > +
> > + /*
> > + * Kswapd reclaims only single pages with compaction enabled. Trying
> > + * too hard to reclaim until contiguous free pages have beco
On Thu, Mar 21, 2013 at 01:53:41PM -0400, Rik van Riel wrote:
> On 03/17/2013 11:11 AM, Mel Gorman wrote:
> >On Sun, Mar 17, 2013 at 07:42:39AM -0700, Andi Kleen wrote:
> >>Mel Gorman writes:
> >>
> >>>@@ -495,6 +495,9 @@ typedef enum {
> >>>
On Thu, Mar 21, 2013 at 02:42:26PM -0400, Rik van Riel wrote:
> On 03/17/2013 09:04 AM, Mel Gorman wrote:
> >Historically, kswapd used to congestion_wait() at higher priorities if it
> >was not making forward progress. This made no sense as the failure to make
> >progres
On Fri, Mar 22, 2013 at 08:54:27AM +0100, Michal Hocko wrote:
> On Thu 21-03-13 15:34:42, Mel Gorman wrote:
> > On Thu, Mar 21, 2013 at 04:07:55PM +0100, Michal Hocko wrote:
> > > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > > index 4835a7a.
On Sun, Mar 17, 2013 at 01:04:06PM +0000, Mel Gorman wrote:
> Kswapd and page reclaim behaviour has been screwy in one way or the other
> for a long time. Very broadly speaking it worked in the far past because
> machines were limited in memory so it did not have that many pages to scan
On Fri, Mar 22, 2013 at 12:53:49PM -0400, Johannes Weiner wrote:
> On Thu, Mar 21, 2013 at 06:02:38PM +0000, Mel Gorman wrote:
> > On Thu, Mar 21, 2013 at 12:25:18PM -0400, Johannes Weiner wrote:
> > > On Sun, Mar 17, 2013 at 01:04:08PM +0000, Mel Gorman wrote:
> > > >
ACTIVE] = 0;
> >
> > /* Reduce scanning of the other LRU proportionally */
> > lru = (lru == LRU_FILE) ? LRU_BASE : LRU_FILE;
> > nr[lru] = nr[lru] * percentage / 100;
> > nr[lru + LRU_ACTIVE] = nr[lru + LRU_ACTIVE] * pe
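The arithmetic in the quoted hunk can be modelled in isolation (a simplified sketch; the LRU index constants here are stand-ins for the kernel's enum, not its real values):

```c
#include <assert.h>

/* Stand-ins for the kernel's LRU list indices */
enum { LRU_BASE = 0, LRU_ACTIVE = 1, LRU_FILE = 2, NR_LRU = 4 };

/*
 * Sketch of the proportional trim: once one LRU type has been scanned
 * to its target, scale the remaining scan counts of the other type by
 * the same percentage so the anon/file scanning ratio is preserved.
 */
static void trim_other_lru(unsigned long nr[NR_LRU], int lru,
                           unsigned long percentage)
{
    lru = (lru == LRU_FILE) ? LRU_BASE : LRU_FILE;
    nr[lru] = nr[lru] * percentage / 100;
    nr[lru + LRU_ACTIVE] = nr[lru + LRU_ACTIVE] * percentage / 100;
}
```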
On Thu, Aug 15, 2013 at 12:52:29AM +0900, Minchan Kim wrote:
> Hi Mel,
>
> On Wed, Aug 14, 2013 at 09:57:11AM +0100, Mel Gorman wrote:
> > On Wed, Aug 14, 2013 at 12:45:41PM +0800, Xishi Qiu wrote:
> > > A large free page buddy block will continue many times, so if the p
gable to optionally use zsmalloc when the user did not care
that it had terrible writeback characteristics?
zswap cannot replicate zram+tmpfs but I also think that such a configuration
is a bad idea anyway. As zram is already being deployed then it might get
promoted anyway but personally I think
On Thu, Aug 15, 2013 at 01:39:21AM +0900, Minchan Kim wrote:
> On Wed, Aug 14, 2013 at 05:16:42PM +0100, Mel Gorman wrote:
> > On Thu, Aug 15, 2013 at 12:52:29AM +0900, Minchan Kim wrote:
> > > Hi Mel,
> > >
> > > On Wed, Aug 14, 2013 at 09:57:11AM +0100, Mel
On Wed, Aug 14, 2013 at 01:26:02PM -0700, Andrew Morton wrote:
> On Thu, 15 Aug 2013 00:52:29 +0900 Minchan Kim wrote:
>
> > On Wed, Aug 14, 2013 at 09:57:11AM +0100, Mel Gorman wrote:
> > > On Wed, Aug 14, 2013 at 12:45:41PM +0800, Xishi Qiu wrote:
> > > > A
= no calling compact_pgdat
In the case where order is reset to 0 due to fragmentation then it does
call compact_pgdat but it does no work due to the cc->order check in
__compact_pgdat.
--
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
_pfn = min(low_pfn, end_pfn);
> }
> #endif
> continue;
> }
>
> so worst case is (pageblock_nr_pages - 1).
No it isn't. The worst case is that the whole region being searched is
skipped. For THP allocations, it would happen to work as being the
pageblock bound
works with [low_pfn, end_pfn)
> and we can't guarantee page_order in normal compaction path
> so I'd like to limit the skipping by end_pfn conservatively.
>
Fine
s/MAX_ORDER_NR_PAGES/pageblock_nr_pages/
and take the min of it and
low_pfn = min(low_pfn, end_pfn - 1)
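The clamp being discussed could look like the following sketch (the pageblock_nr_pages value and the function name are illustrative assumptions, not the actual compaction code):

```c
#include <assert.h>

#define pageblock_nr_pages 512UL   /* illustrative; depends on config */

/*
 * Sketch of the conservative skip: advance the scanner to the next
 * pageblock boundary at most, and never past end_pfn - 1 so that the
 * [low_pfn, end_pfn) isolation range is respected.
 */
static unsigned long skip_free_block(unsigned long low_pfn,
                                     unsigned long end_pfn)
{
    unsigned long next = (low_pfn + pageblock_nr_pages) &
                         ~(pageblock_nr_pages - 1);
    return next < end_pfn - 1 ? next : end_pfn - 1;
}
```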
--
Mel G
On Thu, Aug 15, 2013 at 10:41:39PM +0900, Minchan Kim wrote:
> Hey Mel,
>
> On Thu, Aug 15, 2013 at 11:47:27AM +0100, Mel Gorman wrote:
> > On Thu, Aug 15, 2013 at 06:02:53PM +0800, Hillf Danton wrote:
> > > If the allocation order is not high, direct compaction does no
...@gmail.com: Pointed out that it was a potential problem]
Signed-off-by: Mel Gorman
---
mm/compaction.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mm/compaction.c b/mm/compaction.c
index 05ccb4c..c437893 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1131,6 +1131,9 @@ void
ll lead to zram and zswap diverging
further from each other, both implementing similar functionality and
ultimately cause greater maintenance headaches. There is a path that makes
zswap a functional replacement for zram and I've seen no good reason why
that path was not taken. Zram cannot be a
fresh zswap
> couldn't replace old zram?
>
> Mel, please consider the embedded world although they have very little voice
> in this core subsystem.
>
I already said I recognise it has a large number of users in the field
and users count a lot more than me complaining. If it ge
On Fri, Aug 16, 2013 at 09:33:47AM +0100, Mel Gorman wrote:
> On Fri, Aug 16, 2013 at 01:26:41PM +0900, Minchan Kim wrote:
>
> It'll get even more entertaining if/when someone ever tries
> to reimplement zcache although since Dan left I do not believe anyone is
> planni
THP NUMA hinting faults on pages that are not migrated are being
accounted for incorrectly. Currently the fault will be counted as if the
task was running on a node local to the page which is not necessarily
true.
Signed-off-by: Mel Gorman
---
mm/huge_memory.c | 10 +-
1 file changed, 5
motivation to do it properly in the future.
Signed-off-by: Mel Gorman
---
kernel/sched/fair.c | 4
1 file changed, 4 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8a392c8..b43122c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1080,10 +1080,6 @@ void
time if another attempt should be made to migrate the task. It will only
make an attempt once every five seconds.
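The rate limit can be sketched like this (the field name and HZ value are assumptions for illustration; the five-second window matches the changelog above):

```c
#include <assert.h>
#include <stdbool.h>

#define HZ 1000   /* illustrative tick rate */

static unsigned long numa_migrate_retry;   /* next allowed attempt time */

/* Sketch: permit one task-migration attempt per five-second window */
static bool numa_migrate_allowed(unsigned long now)
{
    if (now < numa_migrate_retry)
        return false;
    numa_migrate_retry = now + 5 * HZ;
    return true;
}
```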
Signed-off-by: Mel Gorman
---
include/linux/sched.h | 1 +
kernel/sched/fair.c | 40 +++-
2 files changed, 24 insertions(+), 17 deletions
these are generally shared library pages. Migrating such pages
is not beneficial as there is an expectation they are read-shared between
caches and iTLB and iCache pressure is generally low.
Signed-off-by: Mel Gorman
---
include/linux/migrate.h | 7 ---
mm/memory.c | 7 ++-
mm
sed on the
amount of virtual memory that should be scanned in a second. The default
of 2.5G seems arbitrary but it is to have the maximum scan rate after the
patch roughly match the maximum scan rate before the patch was applied.
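The implied arithmetic works out as follows (a sketch; the function name is an assumption, and the 2.5G figure comes from the changelog above):

```c
#include <assert.h>

/*
 * Sketch: if R bytes of virtual memory should be scanned per second
 * and each scan pass covers S bytes, the pass period is S/R seconds,
 * i.e. S * 1000 / R milliseconds. Assumes 64-bit unsigned long so the
 * multiplication does not overflow.
 */
static unsigned long scan_period_ms(unsigned long rate_bytes_per_sec,
                                    unsigned long scan_size_bytes)
{
    return scan_size_bytes * 1000UL / rate_bytes_per_sec;
}
```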
Signed-off-by: Mel Gorman
---
Documentation/sysctl/kerne
task_numa_placement checks current->mm but after buffers for faults
have already been uselessly allocated. Move the check earlier.
[pet...@infradead.org: Identified the problem]
Signed-off-by: Mel Gorman
---
kernel/sched/fair.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
d
in both terms of counting faults and scheduling
tasks on nodes.
Signed-off-by: Mel Gorman
---
mm/huge_memory.c | 9 +
mm/memory.c | 7 ++-
2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e4a79fa..ec938ed 100644
--- a/mm
This continues to build on the previous feedback and further testing and
I'm hoping this can be finalised relatively soon. False sharing is still
a major problem but I still think it deserves its own series. Minimally I
think the fact that we are now scanning shared pages without much additional
sy
controlled by the numa_balancing_settle_count sysctl. Once settle_count
scans have completed, the scheduler is free to place the task on an
alternative node if the load is imbalanced.
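A sketch of the gate, with the default value and the helper name being illustrative assumptions rather than the patch's actual code:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative default; the real sysctl default may differ */
static int numa_balancing_settle_count = 4;

/*
 * Sketch: the load balancer may only place the task off its preferred
 * node once settle_count scans have completed since placement.
 */
static bool task_may_move(int scans_since_placement)
{
    return scans_since_placement >= numa_balancing_settle_count;
}
```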
[sri...@linux.vnet.ibm.com: Fixed statistics]
Signed-off-by: Mel Gorman
---
Documentation/sysctl/kernel.txt | 8
This patch selects a preferred node for a task to run on based on the
NUMA hinting faults. This information is later used to migrate tasks
towards the node during balancing.
Signed-off-by: Mel Gorman
---
include/linux/sched.h | 1 +
kernel/sched/core.c | 1 +
kernel/sched/fair.c | 17
, all faults are treated as private and detection will be introduced
later.
Signed-off-by: Mel Gorman
---
include/linux/sched.h | 5 +++--
kernel/sched/fair.c | 33 -
mm/huge_memory.c | 7 ---
mm/memory.c | 9 ++---
4 files changed, 37
balancer to make a decision.
Signed-off-by: Mel Gorman
---
kernel/sched/core.c | 17 +
kernel/sched/fair.c | 46 +-
kernel/sched/sched.h | 1 +
3 files changed, 63 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched
son is that multiple threads in a process will race each
other to fault the shared page making the fault information unreliable.
Signed-off-by: Mel Gorman
---
include/linux/mm.h| 69 ++-
include/linux/mm_types.h | 4 +--
include/li
attempt will be made to swap with the task if it is not running
on its preferred node and that moving it would not impair its locality.
Signed-off-by: Mel Gorman
---
kernel/sched/core.c | 39 +--
kernel/sched/fair.c | 46
-by: Peter Zijlstra
Signed-off-by: Mel Gorman
---
kernel/sched/fair.c | 105 +---
1 file changed, 83 insertions(+), 22 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3f0519c..8ee1c8e 100644
--- a/kernel/sched/fair.c
+++ b
On Mon, Jul 15, 2013 at 10:03:21PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 15, 2013 at 04:20:18PM +0100, Mel Gorman wrote:
> > ---
> > kernel/sched/fair.c | 105
> > +---
> > 1 file changed, 83 insertions(+), 22 delet
On Mon, Jul 15, 2013 at 10:11:10PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 15, 2013 at 04:20:20PM +0100, Mel Gorman wrote:
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 53d8465..d679b01 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/s
On Tue, Jul 16, 2013 at 11:55:24PM +0800, Hillf Danton wrote:
> On Mon, Jul 15, 2013 at 11:20 PM, Mel Gorman wrote:
> > +
> > +static int task_numa_find_cpu(struct task_struct *p, int nid)
> > +{
> > + int node_cpu = cpumask_first(cpumask_of_node(nid));
> [.
On Wed, Jul 17, 2013 at 12:50:30PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 15, 2013 at 04:20:04PM +0100, Mel Gorman wrote:
> > index cc03cfd..c5f773d 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -503,6 +503,17 @@ DECLARE_PER_CPU(struct r
On Mon, Jul 29, 2013 at 12:10:59PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 15, 2013 at 04:20:04PM +0100, Mel Gorman wrote:
> > +++ b/kernel/sched/fair.c
> > @@ -815,7 +815,14 @@ void task_numa_fault(int node, int pages, bool
> > migrated)
> >
On Wed, Jul 17, 2013 at 01:00:53PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 15, 2013 at 04:20:06PM +0100, Mel Gorman wrote:
> > The zero page is not replicated between nodes and is often shared
> > between processes. The data is read-only and likely to be cached in
> >
)
+
+/*
+ * NUMA_FAVOUR_HIGHER will favor moving tasks towards nodes where a
+ * higher number of hinting faults are recorded during active load
+ * balancing.
+ */
+SCHED_FEAT(NUMA_FAVOUR_HIGHER, true)
#endif
--
Mel Gorman
SUSE Labs
On Wed, Jul 17, 2013 at 09:31:05AM +0800, Hillf Danton wrote:
> On Mon, Jul 15, 2013 at 11:20 PM, Mel Gorman wrote:
> > +static int
> > +find_idlest_cpu_node(int this_cpu, int nid)
> > +{
> > + unsigned long load, min_load = ULONG_MAX;
> > +
On Wed, Jul 17, 2013 at 10:17:29AM +0800, Hillf Danton wrote:
> On Mon, Jul 15, 2013 at 11:20 PM, Mel Gorman wrote:
> > /*
> > * Got a PROT_NONE fault for a page on @node.
> > */
> > -void task_numa_fault(int node, int pages, bool migrated)
> > +void task_
On Wed, Jul 17, 2013 at 01:22:22PM +0800, Sam Ben wrote:
> On 07/15/2013 11:20 PM, Mel Gorman wrote:
> >Currently automatic NUMA balancing is unable to distinguish between false
> >shared versus private pages except by ignoring pages with an elevated
>
> What's t
On Wed, Jul 17, 2013 at 09:53:53PM -0400, Rik van Riel wrote:
> On Mon, 15 Jul 2013 16:20:17 +0100
> Mel Gorman wrote:
>
> > Ideally it would be possible to distinguish between NUMA hinting faults that
> > are private to a task and those that are shared. If treated ident
On Fri, Jul 26, 2013 at 01:20:50PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 15, 2013 at 04:20:17PM +0100, Mel Gorman wrote:
> > diff --git a/mm/mprotect.c b/mm/mprotect.c
> > index cacc64a..04c9469 100644
> > --- a/mm/mprotect.c
> > +++ b/mm/mprotect.c
> >
On Wed, Jul 17, 2013 at 12:54:23PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 15, 2013 at 04:20:18PM +0100, Mel Gorman wrote:
> > +static long effective_load(struct task_group *tg, int cpu, long wl, long
> > wg);
>
> And this
> -- which suggests you always build with cg
oses the stop_machine() state machine but only stops
> the two cpus which we can do with on-stack structures and avoid
> machine wide synchronization issues.
>
> Signed-off-by: Peter Zijlstra
Clever! I did not spot any problems so will be pulling this (and
presumably the next patch)
On Wed, Jul 31, 2013 at 12:05:05PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 31, 2013 at 11:03:31AM +0100, Mel Gorman wrote:
> > On Thu, Jul 25, 2013 at 12:33:52PM +0200, Peter Zijlstra wrote:
> > >
> > > Subject: stop_machine: Introduce stop_two_cpus()
> > &
On Wed, Jul 31, 2013 at 11:34:37AM +0200, Peter Zijlstra wrote:
> On Wed, Jul 31, 2013 at 10:29:38AM +0100, Mel Gorman wrote:
> > > Hurmph I just stumbled upon this PMD 'trick' and I'm not at all sure I
> > > like it. If an application would pre-fault/initi
> - * not guaranteed to the vma_migratable. If they are not, we would find
> the
> - * !migratable VMA on the next scan but not reset the scanner to the
> start
> - * so check it now.
> + * It is possible to reach the end of the VMA list but the last few
> +
sites should have the same
> semantics; furthermore we should account against where the page
> really is, we already know where the task is.
>
Agreed. To allow the scheduler parts to still be evaluated in proper
isolation I moved this patch to much earlier in the series.
--
Mel Gor
On Wed, Jul 31, 2013 at 12:48:14PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 31, 2013 at 11:30:52AM +0100, Mel Gorman wrote:
> > I'm not sure I understand your point. The scan rate is decreased again if
> > the page is found to be properly placed in the future. It's
d scripts/bloat-o-meter highlight where the growth problems are?
--
Mel Gorman
SUSE Labs
On Wed, Jul 31, 2013 at 05:30:18PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 31, 2013 at 12:57:19PM +0100, Mel Gorman wrote:
>
> > > Right, so what Ingo did is have the scan rate depend on the convergence.
> > > What exactly did you dislike about that?
> > >
On Thu, Aug 01, 2013 at 10:17:57AM +0530, Srikar Dronamraju wrote:
> * Mel Gorman [2013-07-15 16:20:10]:
>
> > A preferred node is selected based on the node the most NUMA hinting
> > faults was incurred on. There is no guarantee that the task is running
> > on that
g like a numa01
> but with a far smaller number of threads, probably nr_cpus/2 or nr_cpus/4,
> then all threads will try to move to single node as we can keep seeing
> idle threads. No? Wont it lead all load moving to one node and load
> balancer spreading it out...
>
I cannot be 100
On Thu, Aug 01, 2013 at 10:43:27AM +0530, Srikar Dronamraju wrote:
> * Mel Gorman [2013-07-15 16:20:19]:
>
> > When a preferred node is selected for a task there is an attempt to migrate
> > the task to a CPU there. This may fail in which case the task will only
> > mi
min_load = dst_load;
> > dst_cpu = cpu;
> > + *swap_p = swap_candidate;
>
> Are we some times passing a wrong candidate?
> Lets say the first cpu balanced is false and we set the swap_candidate,
> but find the second cpu(/or later cpus
On Wed, Jul 31, 2013 at 06:39:03PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 31, 2013 at 05:11:41PM +0100, Mel Gorman wrote:
> > RSS was another option it felt as arbitrary as a plain delay.
>
> Right, it would avoid 'small' programs getting scanning done with the
&g
the page counter fluctuation.
>
> By using zone_balanced(), it will now check, in addition to the
> watermark, if compaction requires more order-0 pages to create a
> higher order page.
>
> Signed-off-by: Johannes Weiner
> Reviewed-by: Rik van Riel
Acked-by: Mel Go
red in these extraordinary situations.
>
> Signed-off-by: Johannes Weiner
> Reviewed-by: Rik van Riel
Acked-by: Mel Gorman
--
Mel Gorman
SUSE Labs
eferred_zone);
>
> /*
>* OK, we're below the kswapd watermark and have kicked background
> @@ -4754,6 +4797,9 @@ static void __paginginit free_area_init_core(struct
> pglist_data *pgdat,
> zone_seqlock_init(zone);
> zone->zon
On Thu, Aug 08, 2013 at 12:16:23AM -0400, Johannes Weiner wrote:
> On Wed, Aug 07, 2013 at 11:37:43AM -0400, Johannes Weiner wrote:
> > On Wed, Aug 07, 2013 at 03:58:28PM +0100, Mel Gorman wrote:
> > > On Fri, Aug 02, 2013 at 11:37:26AM -0400, Johannes Weiner wrote:
> &
numa_scan_period is in milliseconds, not jiffies. Properly placed pages
slow the scanning rate but adding 10 jiffies to numa_scan_period means
that the rate at which scanning slows depends on HZ, which is confusing. Get rid
of the jiffies_to_msec conversion and treat it as ms.
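The HZ dependence the changelog complains about is easy to demonstrate (illustrative arithmetic only):

```c
#include <assert.h>

/*
 * How many milliseconds a given number of jiffies represents at a
 * given HZ. Adding raw jiffies to a value kept in milliseconds
 * therefore changes meaning with the kernel's tick rate.
 */
static unsigned int jiffies_in_ms(unsigned int hz, unsigned int jiffies)
{
    return jiffies * 1000u / hz;
}
```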
Signed-off-by: Mel Gorman
-by: Peter Zijlstra
Signed-off-by: Mel Gorman
---
kernel/sched/fair.c | 105 +---
1 file changed, 83 insertions(+), 22 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9d8b5cb..9ea4d5c 100644
--- a/kernel/sched/fair.c
+++ b