[RFC][PATCH 0/2] reworking load_balance_monitor

2008-02-14 Thread Peter Zijlstra
Hi, Here are the current patches that rework load_balance_monitor. The main reason for doing this is to eliminate the wakeups the thing generates, esp. on an idle system. The bonus is that it removes a kernel thread. Paul, Gregory - the thing that bothers me most atm is the lack of rd->load_balance.

[RFC][PATCH 2/2] sched: fair-group: per root-domain load balancing

2008-02-14 Thread Peter Zijlstra
Currently the lb_monitor will walk all the domains/cpus from a single cpu's timer interrupt. This will cause massive cache-thrashing and cache-line bouncing on larger machines. Split the lb_monitor into root_domain (disjoint sched-domains). Signed-off-by: Peter Zijlstra <[EMAIL PROTEC

[PATCH 0/2] for sched-devel.git

2008-02-15 Thread Peter Zijlstra
Hi Ingo, Would you stick these into sched-devel. The first patch should address the latency isolation issue. While the second rectifies a massive brainfart :-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordom

[PATCH 1/2] sched: fair: virtual deadline scheduling

2008-02-15 Thread Peter Zijlstra
meet. This includes the latency into the scheduling decision. [*] - EDF is correct up until load 1, after that it is not a closed system so improvement is possible here. It is usable because the system strives to generate the load 1 situation. Signed-off-by: Peter Zijlstra <[EMAIL PROTEC

[PATCH 2/2] sched: fair: fix calc_delta_asym

2008-02-15 Thread Peter Zijlstra
The goal of calc_delta_asym() is to be asymmetric around NICE_0_LOAD, in that it favours >=0 over <0. The current implementation does not achieve that. -20 | | 0 +--- .' 19 .' Signed-off-by: Peter Zijlstra <[EMAIL PROT

Re: 2.6.24-mm1 bugs

2008-02-15 Thread Peter Zijlstra
On Fri, 2008-02-15 at 12:43 +0100, Miklos Szeredi wrote: > - strange key repeating (short press of a key results in lots of key > press events) when there's some sort of load (I/O?) I may have > seen this on non-mm kernels as well, but it's definitely more > noticeable in -mm Do you ha

Re: [PATCH 3/4] IPMI: convert locked counters to atomics

2008-02-15 Thread Peter Zijlstra
On Thu, 2008-02-14 at 12:30 -0600, Corey Minyard wrote: > +/* > + * Various statistics for IPMI, these index stats[] in the ipmi_smi > + * structure. > + */ > +/* Commands we got from the user that were invalid. */ > +#define IPMI_STAT_sent_invalid_commands 0 > + > +/* Comman

Re: [RFC][PATCH 2/2] sched: fair-group: per root-domain load balancing

2008-02-15 Thread Peter Zijlstra
On Fri, 2008-02-15 at 11:46 -0500, Gregory Haskins wrote: > Peter Zijlstra wrote: > > > @@ -6342,8 +6351,14 @@ static void rq_attach_root(struct rq *rq > > cpu_clear(rq->cpu, old_rd->span); > > cpu_clear(rq->cpu, old_rd->

Re: [PATCH 1/1] kthread: disable preemption during complete()

2012-07-26 Thread Peter Zijlstra
On Wed, 2012-07-25 at 15:40 -0700, Tejun Heo wrote: > (cc'ing Oleg and Peter) Right, if you're playing games with preemption, always add the rt and sched folks.. added mingo and tglx. > On Wed, Jul 25, 2012 at 03:35:32PM -0700, Peter Boonstoppel wrote: > > After a kthread is created it signals th

Re: [PATCH 08/11] perf tool: precise mode requires exclude_guest

2012-07-26 Thread Peter Zijlstra
On Thu, 2012-07-26 at 08:50 +0300, Gleb Natapov wrote: > On Wed, Jul 25, 2012 at 10:35:46PM +0200, Peter Zijlstra wrote: > > On Tue, 2012-07-24 at 18:15 +0200, Robert Richter wrote: > > > David, > > > > > > On 24.07.12 08:20:19, David Ahern wrote: > > >

Re: [PATCH 08/11] perf tool: precise mode requires exclude_guest

2012-07-26 Thread Peter Zijlstra
On Wed, 2012-07-25 at 23:16 -0600, David Ahern wrote: > Peter's patch (see https://lkml.org/lkml/2012/7/9/298) changes kernel > side to require the use of exclude_guest if the precise modifier is > used, returning -EOPNOTSUPP if exclude_guest is not set. This patch goes > after the user experie

Re: [RFC] page-table walkers vs memory order

2012-07-26 Thread Peter Zijlstra
On Wed, 2012-07-25 at 15:09 -0700, Hugh Dickins wrote: > We find out after it hits us, and someone studies the disassembly - > if we're lucky enough to crash near the origin of the problem. This is a rather painful way.. see https://lkml.org/lkml/2009/1/5/555 we were lucky there in that the l

Re: [PATCH 2/2] sched: fix a logical error in select_task_rq_fair

2012-07-26 Thread Peter Zijlstra
On Thu, 2012-07-26 at 13:27 +0800, Alex Shi wrote: > If find_idlest_cpu() returns '-1', and sd->child is NULL. The function > select_task_rq_fair will return -1. That is not the function's purpose. But find_idlest_cpu() will only return -1 if the group mask is fully excluded by the cpus_allowed mas

Re: [PATCH 1/1] kthread: disable preemption during complete()

2012-07-26 Thread Peter Zijlstra
On Thu, 2012-07-26 at 17:54 +0200, Oleg Nesterov wrote: > Yes, but this "avoid the preemption after wakeup" can actually help > kthread_bind()->wait_task_inactive() ? Yeah. > This reminds me, Peter had a patch which teaches wait_task_inactive() > to use sched_in/sched_out notifiers to avoid the p

thp and memory barrier assumptions

2012-07-26 Thread Peter Zijlstra
__do_huge_pmd_anonymous_page() contains: /* * The spinlocking to take the lru_lock inside * page_add_new_anon_rmap() acts as a full memory * barrier to be sure clear_huge_page writes become * visible after the set

Re: thp and memory barrier assumptions

2012-07-26 Thread Peter Zijlstra
On Thu, 2012-07-26 at 22:31 +0200, Peter Zijlstra wrote: > __do_huge_pmd_anonymous_page() contains: > > /* > * The spinlocking to take the lru_lock inside > * page_add_new_anon_rmap() acts as a full memory > *

Re: [RFC] page-table walkers vs memory order

2012-07-26 Thread Peter Zijlstra
On Tue, 2012-07-24 at 14:51 -0700, Hugh Dickins wrote: > I do love the status quo, but an audit would be welcome. When > it comes to patches, personally I tend to prefer ACCESS_ONCE() and > smp_read_barrier_depends() and accompanying comments to be hidden away > in the underlying macros or inlines

Re: [ 028/108] sched/nohz: Rewrite and fix load-avg computation -- again

2012-07-26 Thread Peter Zijlstra
On Tue, 2012-07-24 at 15:06 +0100, Ben Hutchings wrote: > On Mon, 2012-07-23 at 02:07 +0100, Ben Hutchings wrote: > > 3.2-stable review patch. If anyone has any objections, please let me know. > > > > -- > > > > From

Re: [ 028/108] sched/nohz: Rewrite and fix load-avg computation -- again

2012-07-26 Thread Peter Zijlstra
On Thu, 2012-07-26 at 23:01 +0100, Ben Hutchings wrote: > > That's what I thought, so I went ahead with just the one. > Should I queue up the other two for a future 3.2.y update? Yeah, why not..

Re: [PATCH 1/2] sched: recover SD_WAKE_AFFINE in select_task_rq_fair and code clean up

2012-07-27 Thread Peter Zijlstra
On Fri, 2012-07-27 at 09:47 +0800, Alex Shi wrote: > From 610515185d8a98c14c7c339c25381bc96cd99d93 Mon Sep 17 00:00:00 2001 > From: Alex Shi > Date: Thu, 26 Jul 2012 08:55:34 +0800 > Subject: [PATCH 1/3] sched: recover SD_WAKE_AFFINE in select_task_rq_fair and > code clean up > > Since power sa

Re: [PATCH 4/6] rbtree: faster augmented insert

2012-07-27 Thread Peter Zijlstra
On Fri, 2012-07-20 at 05:31 -0700, Michel Lespinasse wrote: > --- a/lib/rbtree.c > +++ b/lib/rbtree.c > @@ -88,7 +88,8 @@ __rb_rotate_set_parents(struct rb_node *old, struct rb_node > *new, > root->rb_node = new; > } > > -void rb_insert_color(struct rb_node *node, struct rb_root

Re: [PATCH 4/6] rbtree: faster augmented insert

2012-07-27 Thread Peter Zijlstra
On Fri, 2012-07-20 at 05:31 -0700, Michel Lespinasse wrote: > > rb_insert_color() is now a special case of rb_insert_augmented() with > a do-nothing callback. I used inlining to optimize out the callback, > with the intent that this would generate the same code as previously > for rb_insert_augmen

Re: [PATCH 5/6] rbtree: faster augmented erase

2012-07-27 Thread Peter Zijlstra
On Fri, 2012-07-20 at 05:31 -0700, Michel Lespinasse wrote: > --- a/lib/rbtree_test.c > +++ b/lib/rbtree_test.c > @@ -1,5 +1,6 @@ > #include > #include > +#include This confuses me.. either its internal to the rb-tree implementation and users don't need to see it, or its not in which case hu

Re: [PATCH 5/6] rbtree: faster augmented erase

2012-07-27 Thread Peter Zijlstra
On Fri, 2012-07-20 at 05:31 -0700, Michel Lespinasse wrote: > +static inline void > +rb_erase_augmented(struct rb_node *node, struct rb_root *root, > + rb_augment_propagate *augment_propagate, > + rb_augment_rotate *augment_rotate) So why put all this in a static

Re: [PATCH 4/6] rbtree: faster augmented insert

2012-07-27 Thread Peter Zijlstra
On Fri, 2012-07-20 at 05:31 -0700, Michel Lespinasse wrote: > +static void augment_rotate(struct rb_node *rb_old, struct rb_node *rb_new) > +{ > + struct test_node *old = rb_entry(rb_old, struct test_node, rb); > + struct test_node *new = rb_entry(rb_new, struct test_node, rb); > + > +

Re: [PATCH 1/5] user_hooks: New user hooks subsystem

2012-07-30 Thread Peter Zijlstra
On Fri, 2012-07-27 at 17:40 +0200, Frederic Weisbecker wrote: > +++ b/kernel/user_hooks.c > @@ -0,0 +1,56 @@ > +#include > +#include > +#include > +#include > + > +struct user_hooks { > + bool hooking; > + bool in_user; > +}; I really detest using bool in structures.. but that's ju

Re: [PATCH 1/5] user_hooks: New user hooks subsystem

2012-07-30 Thread Peter Zijlstra
On Mon, 2012-07-30 at 11:27 -0400, Steven Rostedt wrote: > I'm curious to what you have against bool in structures? _Bool as per the C std doesn't have a specified storage. Now IIRC hpa recently said that all GCC versions so far were consistent and used char (a byte) for it, but I might mis-rememb
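The storage concern raised in this reply can be illustrated with a small sketch. The struct names and the two size helpers below are hypothetical and not taken from the patch; the second layout is only one possible way to spell the storage out explicitly, assuming a platform that provides uint8_t.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Layout in the style of the posted patch: how many bytes each bool
 * occupies is left to the implementation. */
struct user_hooks_bool {
	bool hooking;
	bool in_user;
};

/* Hypothetical alternative: each flag is explicitly one byte wide,
 * so the storage is visible in the declaration itself. */
struct user_hooks_u8 {
	uint8_t hooking;
	uint8_t in_user;
};

/* Hypothetical helpers that report what the compiler actually chose. */
size_t bool_layout_size(void)
{
	return sizeof(struct user_hooks_bool);
}

size_t u8_layout_size(void)
{
	return sizeof(struct user_hooks_u8);
}
```

On common ABIs both helpers report the same size, which matches the recollection in the reply that GCC has used a single byte for _Bool so far; only the second layout states it in the source.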

Re: [PATCH 1/5] user_hooks: New user hooks subsystem

2012-07-30 Thread Peter Zijlstra
On Mon, 2012-07-30 at 12:07 -0400, Steven Rostedt wrote: > > Would 'is_hooked' be better? 'is_hooking' sounds more like what women in > high heels, really short skirts and lots of makeup are doing late night > on a corner of a Paris street ;-) This is exactly the first thing I thought of when I re

Re: [PATCH 1/5] user_hooks: New user hooks subsystem

2012-07-30 Thread Peter Zijlstra
On Mon, 2012-07-30 at 12:07 -0400, Steven Rostedt wrote: > > Not only does bool describe it better, it should also allow gcc to > optimize it better as well. Unless Peter has a legitimate rationale why > using bool in struct is bad, I would keep it as is. I don't mind too much, but like said, I h

Re: [PATCH 1/5] user_hooks: New user hooks subsystem

2012-07-31 Thread Peter Zijlstra
On Tue, 2012-07-31 at 16:57 +0200, Ingo Molnar wrote: > > 'callback', while a longer word, is almost always used as a noun > within the kernel - and it also has a pretty narrow meaning. An altogether different naming would be something like: struct user_kernel_tracking { int want_uk_tr

[PATCH 00/19] sched-numa rewrite

2012-07-31 Thread Peter Zijlstra
Hi all, After having had a talk with Rik about all this NUMA nonsense where he proposed the scheme implemented in the next to last patch, I came up with a related means of doing the home-node selection. I've also switched to (ab)using PROT_NONE for driving the migration faults. These patches go

[PATCH 04/19] mm, thp: Preserve pgprot across huge page split

2012-07-31 Thread Peter Zijlstra
Signed-off-by: Peter Zijlstra --- arch/x86/include/asm/pgtable.h |1 mm/huge_memory.c | 104 +++-- 2 files changed, 50 insertions(+), 55 deletions(-) --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -350,6 +350,7

[PATCH 02/19] mm/mpol: Remove NUMA_INTERLEAVE_HIT

2012-07-31 Thread Peter Zijlstra
A_HIT fully includes NUMA_INTERLEAVE_HIT so users might switch to using that. This cleans up some of the weird MPOL_INTERLEAVE allocation exceptions. Cc: Lee Schermerhorn Cc: Rik van Riel Cc: Andrew Morton Cc: Linus Torvalds Signed-off-by: Peter Zijlstra --- drivers/base/node.c|2 - in

[PATCH 09/19] mm, migrate: Introduce migrate_misplaced_page()

2012-07-31 Thread Peter Zijlstra
Add migrate_misplaced_page() which deals with migrating pages from faults. This includes adding a new MIGRATE_FAULT migration mode to deal with the extra page reference required due to having to look up the page. Based-on-work-by: Lee Schermerhorn Cc: Paul Turner Signed-off-by: Peter Zijlstra

[PATCH 15/19] sched: Implement home-node awareness

2012-07-31 Thread Peter Zijlstra
parts. Cc: Paul Turner Cc: Lee Schermerhorn Cc: Christoph Lameter Cc: Rik van Riel Cc: Andrew Morton Cc: Linus Torvalds Signed-off-by: Peter Zijlstra --- include/linux/sched.h |1 kernel/sched/core.c | 21 +++- kernel/sched/debug.c|3 kernel/sched/fair.c | 236

[PATCH 17/19] sched, numa: Detect big processes

2012-07-31 Thread Peter Zijlstra
urrent heuristic for determining if a task is 'big' is if its consuming more than 1/2 a node's worth of cputime. We might want to add a term here looking at the RSS of the process and compare this against the available memory per node. Cc: Rik van Riel Cc: Paul Turner Signed-off-by:

[PATCH 12/19] mm/mpol: Make mempolicy home-node aware

2012-07-31 Thread Peter Zijlstra
rred[NEW] - default_policy Note that the tsk_home_node() policy has Migrate-on-Fault enabled to facilitate efficient on-demand memory migration. Cc: Paul Turner Cc: Lee Schermerhorn Cc: Christoph Lameter Cc: Rik van Riel Cc: Andrew Morton Cc: Linus Torvalds Signed-off-by: Peter Zijlstra --- mm

[PATCH 14/19] sched: Make find_busiest_queue() a method

2012-07-31 Thread Peter Zijlstra
It's a bit awkward but it was the least painful means of modifying the queue selection. Used in a later patch to conditionally use a random queue. Cc: Paul Turner Cc: Lee Schermerhorn Cc: Christoph Lameter Cc: Rik van Riel Cc: Andrew Morton Cc: Linus Torvalds Signed-off-by: Peter Zijlstra

[PATCH 05/19] mm, mpol: Create special PROT_NONE infrastructure

2012-07-31 Thread Peter Zijlstra
Suggested-by: Rik van Riel Cc: Paul Turner Signed-off-by: Peter Zijlstra --- include/linux/huge_mm.h |3 + include/linux/mempolicy.h |4 +- include/linux/mm.h| 12 ++ mm/huge_memory.c | 21 +++ mm/memory.c | 86

[PATCH 07/19] mm/mpol: Add MPOL_MF_NOOP

2012-07-31 Thread Peter Zijlstra
on Cc: Linus Torvalds Signed-off-by: Peter Zijlstra --- include/linux/mempolicy.h |1 + mm/mempolicy.c|8 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index 87fabfa..668311a 100644 --- a/include/l

[PATCH 08/19] mm/mpol: Check for misplaced page

2012-07-31 Thread Peter Zijlstra
ff-by: Lee Schermerhorn Cc: Rik van Riel Cc: Andrew Morton Cc: Linus Torvalds [ Added MPOL_F_LAZY to trigger migrate-on-fault; simplified code now that we don't have to bother with special crap for interleaved ] Signed-off-by: Peter Zijlstra --- include/linux/mempolicy.h |9

[PATCH 06/19] mm/mpol: Add MPOL_MF_LAZY ...

2012-07-31 Thread Peter Zijlstra
nodes. After unmap, the pages in regions assigned to the worker threads will be automatically migrated local to the threads on 1st touch. Signed-off-by: Lee Schermerhorn Cc: Lee Schermerhorn Cc: Rik van Riel Cc: Andrew Morton Cc: Linus Torvalds [ nearly complete rewrite.. ] Signed-off-by: Peter

[PATCH 10/19] mm, mpol: Use special PROT_NONE to migrate pages

2012-07-31 Thread Peter Zijlstra
Combine our previous PROT_NONE, mpol_misplaced and migrate_misplaced_page() pieces into an effective migrate on fault scheme. Suggested-by: Rik van Riel Cc: Paul Turner Signed-off-by: Peter Zijlstra --- mm/huge_memory.c | 41 - mm/memory.c | 42

[PATCH 13/19] sched: Introduce sched_feat_numa()

2012-07-31 Thread Peter Zijlstra
Avoid a few #ifdef's later on. Cc: Paul Turner Cc: Lee Schermerhorn Cc: Christoph Lameter Cc: Rik van Riel Cc: Andrew Morton Cc: Linus Torvalds Signed-off-by: Peter Zijlstra --- kernel/sched/sched.h |6 ++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/kernel/

[PATCH 11/19] sched, mm: Introduce tsk_home_node()

2012-07-31 Thread Peter Zijlstra
better than no memory. This patch merely introduces the basic infrastructure, all policy comes later. Cc: Lee Schermerhorn Cc: Rik van Riel Cc: Andrew Morton Cc: Linus Torvalds Signed-off-by: Peter Zijlstra --- include/linux/init_task.h |8 include/linux/sched.h | 10

[PATCH 16/19] sched, numa: NUMA home-node selection code

2012-07-31 Thread Peter Zijlstra
constraints will try and move it away. The balance between these two 'forces' is what will result in the NUMA placement. Cc: Rik van Riel Cc: Paul Turner Signed-off-by: Peter Zijlstra --- include/linux/init_task.h |3 include/linux/mm_types.h |3 include/linux/sched.h

[PATCH 18/19] sched, numa: Per task memory placement for big processes

2012-07-31 Thread Peter Zijlstra
currently running on, since the home-node is the long term target for the task to run on, irrespective of whatever node it might temporarily run on. Suggested-by: Rik van Riel Cc: Paul Turner Signed-off-by: Peter Zijlstra --- include/linux/mempolicy.h |6 ++

[PATCH 03/19] mm/mpol: Make MPOL_LOCAL a real policy

2012-07-31 Thread Peter Zijlstra
Make MPOL_LOCAL a real and exposed policy such that applications that relied on the previous default behaviour can explicitly request it. Requested-by: Christoph Lameter Cc: Lee Schermerhorn Cc: Rik van Riel Cc: Andrew Morton Cc: Linus Torvalds Signed-off-by: Peter Zijlstra --- include

[PATCH 19/19] mm, numa: retry failed page migrations

2012-07-31 Thread Peter Zijlstra
-by: Rik van Riel Signed-off-by: Peter Zijlstra --- include/linux/mm_types.h |2 ++ kernel/sched/core.c |2 ++ kernel/sched/fair.c | 19 ++- mm/memory.c | 15 --- 4 files changed, 34 insertions(+), 4 deletions(-) --- a/include/linux

[PATCH 01/19] task_work: Remove dependency on sched.h

2012-07-31 Thread Peter Zijlstra
Remove the need for sched.h from task_work.h so that we can use struct task_work in struct task_struct in a later patch. Cc: Oleg Nesterov Signed-off-by: Peter Zijlstra --- include/linux/task_work.h |7 --- kernel/exit.c |5 - 2 files changed, 4 insertions(+), 8

Re: [PATCH 1/4] uprobes: Kill set_swbp()->is_swbp_at_addr()

2012-09-24 Thread Peter Zijlstra
On Sun, 2012-09-23 at 22:19 +0200, Oleg Nesterov wrote: > A separate patch for better documentation. > > set_swbp()->is_swbp_at_addr() is not needed for correctness, it is > harmless to do the unnecessary __replace_page(old_page, new_page) > when these 2 pages are identical. > > And it can not be

Re: [PATCH 3/4] uprobes: Kill set_orig_insn()->is_swbp_at_addr()

2012-09-24 Thread Peter Zijlstra
On Sun, 2012-09-23 at 22:19 +0200, Oleg Nesterov wrote: > @@ -226,6 +245,10 @@ retry: Could you use: $ cat ~/.gitconfig [diff "default"] xfuncname = "^[[:alpha:]$_].*[^:]$" This avoids git-diff using labels as function names. > if (ret <= 0) > return ret; > >

Re: [PATCH] Update sched_domains_numa_masks when new cpus are onlined.

2012-09-24 Thread Peter Zijlstra
Why are you cc'ing x86 and numa folks but not a single scheduler person when you're patching scheduler stuff? On Tue, 2012-09-18 at 18:12 +0800, Tang Chen wrote: > Once array sched_domains_numa_masks is defined, it is never updated. > When a new cpu on a new node is onlined, Hmm, so there's hardw

Re: [PATCH] Update sched_domains_numa_masks when new cpus are onlined.

2012-09-24 Thread Peter Zijlstra
On Mon, 2012-09-24 at 15:27 +0530, Srivatsa S. Bhat wrote: > On 09/24/2012 03:08 PM, Peter Zijlstra wrote: > >> + hotcpu_notifier(sched_domains_numa_masks_update, > >> CPU_PRI_SCHED_ACTIVE); > >> hotcpu_notifier(cpuset_cpu

Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

2012-09-24 Thread Peter Zijlstra
On Fri, 2012-09-21 at 17:30 +0530, Raghavendra K T wrote: > +unsigned long rq_nr_running(void) > +{ > + return this_rq()->nr_running; > +} > +EXPORT_SYMBOL(rq_nr_running); Uhm,.. no, that's a horrible thing to export.

Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

2012-09-24 Thread Peter Zijlstra
On Fri, 2012-09-21 at 17:29 +0530, Raghavendra K T wrote: > In some special scenarios like #vcpu <= #pcpu, PLE handler may > prove very costly, because there is no need to iterate over vcpus > and do unsuccessful yield_to burning CPU. What's the costly thing? The vm-exit, the yield (which should

Re: [PATCH 2/2] [RESEND] console: implement lockdep support for console_lock

2012-09-24 Thread Peter Zijlstra
On Tue, 2012-09-18 at 01:03 +0200, Daniel Vetter wrote: > - In the printk code there's a special trylock, only used to kick off > the logbuffer printk'ing in console_unlock. But all that happens > while lockdep is disable (since printk does a few other evil > tricks). So no issue there, eithe

Re: [PATCH 2/2] [RESEND] console: implement lockdep support for console_lock

2012-09-24 Thread Peter Zijlstra
On Mon, 2012-09-24 at 14:17 +0200, Peter Zijlstra wrote: > On Tue, 2012-09-18 at 01:03 +0200, Daniel Vetter wrote: > > - In the printk code there's a special trylock, only used to kick off > > the logbuffer printk'ing in console_unlock. But all that happens > >

Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

2012-09-24 Thread Peter Zijlstra
On Mon, 2012-09-24 at 17:22 +0530, Raghavendra K T wrote: > On 09/24/2012 05:04 PM, Peter Zijlstra wrote: > > On Fri, 2012-09-21 at 17:29 +0530, Raghavendra K T wrote: > >> In some special scenarios like #vcpu<= #pcpu, PLE handler may > >> prove very costly, becau

Re: [PATCH v10 1/5] mm: introduce a common interface for balloon pages mobility

2012-09-24 Thread Peter Zijlstra
On Mon, 2012-09-17 at 13:38 -0300, Rafael Aquini wrote: > +static inline void assign_balloon_mapping(struct page *page, > + struct address_space > *mapping) > +{ > + page->mapping = mapping; > + smp_wmb(); > +} > + > +static inline void clear_ball

Re: [PATCH 2/2] [RESEND] console: implement lockdep support for console_lock

2012-09-24 Thread Peter Zijlstra
On Mon, 2012-09-24 at 14:54 +0200, Daniel Vetter wrote: > I've read through the patches and I'm hoping you don't volunteer me to > pick these up ... ;-) Worth a try, right? :-) > But there doesn't seem to be anything that would > get worse through this lockdep annotation patch, right? No indee

Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

2012-09-24 Thread Peter Zijlstra
On Mon, 2012-09-24 at 18:59 +0530, Raghavendra K T wrote: > However Rik had a genuine concern in the cases where runqueue is not > equally distributed and lockholder might actually be on a different run > queue but not running. Load should eventually get distributed equally -- that's what the loa

Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets - bisected

2012-09-24 Thread Peter Zijlstra
On Mon, 2012-09-24 at 16:00 +0100, Mel Gorman wrote: > On Fri, Sep 14, 2012 at 02:42:44PM -0700, Linus Torvalds wrote: > > On Fri, Sep 14, 2012 at 2:27 PM, Borislav Petkov wrote: > > > > > > as Nikolay says below, we have a regression in 3.6 with pgbench's > > > benchmark in postgresql. > > > > >

Re: [PATCH RFC 2/2] kvm: Be courteous to other VMs in overcommitted scenario in PLE handler

2012-09-24 Thread Peter Zijlstra
On Mon, 2012-09-24 at 17:26 +0200, Avi Kivity wrote: > I think this is a no-op these (CFS) days. To get schedule() to do > anything, you need to wake up a task, or let time pass, or block. > Otherwise it will see that nothing has changed and as far as it's > concerned you're still the best task to

Re: [PATCH RFC 2/2] kvm: Be courteous to other VMs in overcommitted scenario in PLE handler

2012-09-24 Thread Peter Zijlstra
On Mon, 2012-09-24 at 17:43 +0200, Avi Kivity wrote: > Wouldn't this correspond to the scheduler interrupt firing and causing a > reschedule? I thought the timer was programmed for exactly the point in > time that CFS considers the right time for a switch. But I'm basing > this on my mental model

Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

2012-09-24 Thread Peter Zijlstra
On Mon, 2012-09-24 at 17:51 +0200, Avi Kivity wrote: > On 09/24/2012 03:54 PM, Peter Zijlstra wrote: > > On Mon, 2012-09-24 at 18:59 +0530, Raghavendra K T wrote: > >> However Rik had a genuine concern in the cases where runqueue is not > >> equally distributed and lock

Re: [PATCH RFC 2/2] kvm: Be courteous to other VMs in overcommitted scenario in PLE handler

2012-09-24 Thread Peter Zijlstra
On Mon, 2012-09-24 at 17:58 +0200, Avi Kivity wrote: > There is the TSC deadline timer mode of newer Intels. Programming the > timer is a simple wrmsr, and it will fire immediately if it already > expired. Unfortunately on AMDs it is not available, and on virtual > hardware it will be slow (~1-2

Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets - bisected

2012-09-24 Thread Peter Zijlstra
On Mon, 2012-09-24 at 08:52 -0700, Linus Torvalds wrote: > Your patch looks odd, though. Why do you use some complex initial > value for 'candidate' (nr_cpu_ids) instead of a simple and readable > one (-1)? nr_cpu_ids is the typical no-value value for cpumask operations -- yes this is annoying an
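The "no-value value" convention mentioned in this reply can be sketched in userspace C. The function names below are hypothetical stand-ins and not kernel APIs; the sketch assumes the mask fits in a single unsigned long, with nbits playing the role that nr_cpu_ids plays for cpumask scans.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a cpumask scan: return the index of the
 * first set bit in mask, or nbits when no bit is set.
 * Assumes nbits does not exceed the width of unsigned long.
 */
size_t first_set_bit(unsigned long mask, size_t nbits)
{
	for (size_t i = 0; i < nbits; i++) {
		if (mask & (1UL << i))
			return i;
	}
	return nbits; /* "not found" is reported as the size itself */
}

/* Callers test for success by comparing against the size. */
int mask_has_candidate(unsigned long mask, size_t nbits)
{
	return first_set_bit(mask, nbits) < nbits;
}
```

The design trade-off under discussion is visible here: the sentinel keeps the return type unsigned, at the cost of every caller needing to know the size in order to recognise a miss.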

Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets - bisected

2012-09-24 Thread Peter Zijlstra
On Mon, 2012-09-24 at 08:52 -0700, Linus Torvalds wrote: > And the whole "if we find any non-idle cpu, skip the whole domain" > logic really seems a bit odd (that's not new to your patch, though). > Can somebody explain what the whole point of that idiotically written > function is? So we're look

Re: [PATCH RFC 2/2] kvm: Be courteous to other VMs in overcommitted scenario in PLE handler

2012-09-24 Thread Peter Zijlstra
On Mon, 2012-09-24 at 18:10 +0200, Avi Kivity wrote: > > Its also still a LAPIC write -- disguised as an MSR though :/ > > It's probably a whole lot faster though. I've been told its not, I haven't tried it.

Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

2012-09-24 Thread Peter Zijlstra
On Mon, 2012-09-24 at 18:06 +0200, Avi Kivity wrote: > > We would probably need a ->sched_exit() preempt notifier to make this > work. Peter, I know how much you love those, would it be acceptable? Where exactly do you want this? TASK_DEAD? or another exit?

Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets - bisected

2012-09-24 Thread Peter Zijlstra
On Mon, 2012-09-24 at 09:30 -0700, Linus Torvalds wrote: > On Mon, Sep 24, 2012 at 9:12 AM, Peter Zijlstra > wrote: > > > > So we're looking for an idle cpu around @target. We prefer a cpu of an > > idle core, since SMT-siblings share L[12] cache. The way we do
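The preference for "a cpu of an idle core" quoted above can be sketched roughly as follows. This is not the kernel's select_idle_sibling(); the function name, the flat idle[] array, and the assumption that SMT siblings are numbered contiguously in groups of smt_width are all hypothetical simplifications.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch: CPUs are grouped into cores of smt_width
 * contiguous siblings. Return the first CPU of the first core whose
 * siblings are all idle, or -1 when no fully idle core exists.
 */
int pick_idle_core(const bool *idle, size_t nr_cpus, size_t smt_width)
{
	if (smt_width == 0)
		return -1;

	for (size_t base = 0; base + smt_width <= nr_cpus; base += smt_width) {
		bool all_idle = true;

		/* One busy sibling disqualifies the whole core. */
		for (size_t i = 0; i < smt_width; i++) {
			if (!idle[base + i]) {
				all_idle = false;
				break;
			}
		}
		if (all_idle)
			return (int)base;
	}
	return -1;
}
```

The sketch returns -1 for "nothing found", the plain sentinel suggested elsewhere in this thread, rather than a size-based one.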

Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets - bisected

2012-09-24 Thread Peter Zijlstra
On Mon, 2012-09-24 at 09:33 -0700, Linus Torvalds wrote: > Sure, the "scan bits" bitops will return ">= nr_cpu_ids" for the "I > couldn't find a bit" thing, but that doesn't mean that everything else > should. Fair enough.. --- kernel/sched/fair.c | 42 +-

Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets - bisected

2012-09-24 Thread Peter Zijlstra
On Mon, 2012-09-24 at 18:54 +0200, Peter Zijlstra wrote: > But let me try and come up with the list thing, I think we've > actually got that someplace as well. OK, I'm sure the below can be written better, but my brain is gone for the day... --- include/linux/sched.h | 1

Re: [PATCH] Update sched_domains_numa_masks when new cpus are onlined.

2012-09-25 Thread Peter Zijlstra
On Tue, 2012-09-25 at 10:39 +0800, Tang Chen wrote: > >> @@ -6765,11 +6773,64 @@ static void sched_init_numa(void) > >> } > >> > >> sched_domain_topology = tl; > >> + > >> +sched_domains_numa_levels = level; > > And I set it to level here again. > But its already set there.. its set

Re: [PATCH 1/3] sched: Create sched_select_cpu() to give preferred CPU for power saving

2012-09-25 Thread Peter Zijlstra
On Tue, 2012-09-25 at 16:06 +0530, Viresh Kumar wrote: > +/* sched-domain levels */ > +#define SD_SIBLING 0x01/* Only for CONFIG_SCHED_SMT */ > +#define SD_MC 0x02/* Only for CONFIG_SCHED_MC */ > +#define SD_BOOK0x04/* Only for CONFIG

Re: [PATCH] Update sched_domains_numa_masks when new cpus are onlined.

2012-09-25 Thread Peter Zijlstra
On Tue, 2012-09-25 at 10:39 +0800, Tang Chen wrote: > > We do this because nr_node_ids changed, right? This means the entire > > distance table grew/shrunk, which means we should do the level scan > > again. > > It seems that nr_node_ids will not change once the system is up. > I'm not quite sure.

Re: [BUG] perf/x86: Intel uncore_pmu_to_box() local variable typo

2012-09-25 Thread Peter Zijlstra
On Tue, 2012-09-25 at 12:44 +0200, Stephane Eranian wrote: > Hi, > > I don't understand why the local variable box needs to > be declared static here: > > static struct intel_uncore_box * > uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int cpu) > { > static struct intel_uncore_box *box;

Re: [PATCH 3/3] workqueue: Schedule work on non-idle cpu instead of current one

2012-09-25 Thread Peter Zijlstra
On Tue, 2012-09-25 at 16:06 +0530, Viresh Kumar wrote: > @@ -1066,8 +1076,9 @@ int queue_work(struct workqueue_struct *wq, > struct work_struct *work) > { > int ret; > > - ret = queue_work_on(get_cpu(), wq, work); > - put_cpu(); > + preempt_disable(); > + ret = qu

Re: [PATCH 1/1] perf, Add support for Xeon-Phi PMU

2012-09-25 Thread Peter Zijlstra
On Thu, 2012-09-20 at 13:03 -0400, Vince Weaver wrote: > One additional complication: some of the cache events map to > event "0". This causes problems because the generic events code > assumes "0" means not-available. I'm not sure the best way to address > that problem. For all except P4 we

Re: [PATCH 3/3] workqueue: Schedule work on non-idle cpu instead of current one

2012-09-25 Thread Peter Zijlstra
On Tue, 2012-09-25 at 17:00 +0530, Viresh Kumar wrote: > But this is what the initial idea during LPC we had. Yeah.. that's true. > Any improvements here you can suggest? We could uhm... /me tries thinking ... reuse some of the NOHZ magic? Would that be sufficient, not waking a NOHZ cpu, or do

Re: [PATCH 3/3] workqueue: Schedule work on non-idle cpu instead of current one

2012-09-25 Thread Peter Zijlstra
On Tue, 2012-09-25 at 13:40 +0200, Peter Zijlstra wrote: > On Tue, 2012-09-25 at 17:00 +0530, Viresh Kumar wrote: > > But this is what the initial idea during LPC we had. > > Yeah.. that's true. > > > Any improvements here you can suggest? > > We could uhm..

Re: [PATCH] Update sched_domains_numa_masks when new cpus are onlined.

2012-09-25 Thread Peter Zijlstra
On Tue, 2012-09-25 at 19:45 +0800, Tang Chen wrote: > Let's have an example here. > > sched_init_numa() > { > ... > // A loop set sched_domains_numa_levels to level.-1 > > // I set sched_domains_numa_levels to 0. > sched_domains_numa_levels = 0;--

Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets - bisected

2012-09-25 Thread Peter Zijlstra
On Mon, 2012-09-24 at 19:11 -0700, Linus Torvalds wrote: > In the not-so-distant past, we had the intel "Dunnington" Xeon, which > was iirc basically three Core 2 duo's bolted together (ie three > clusters of two cores sharing L2, and a fully shared L3). So that was > a true multi-core with fairly

Re: [PATCH 1/1] perf, Add support for Xeon-Phi PMU

2012-09-25 Thread Peter Zijlstra
On Tue, 2012-09-25 at 15:42 +0400, Cyrill Gorcunov wrote: > Guys, let me re-read this whole mail thread first since I have no clue > what this remapping about ;) x86_setup_perfctr() / set_ext_hw_attr() have special-purpose 0 and -1 config values to mean -ENOENT and -EINVAL resp. This means neith
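The sentinel convention described in this reply can be sketched as a small lookup. The function name, the table layout, and the out-of-range check are hypothetical; only the mapping of 0 to -ENOENT and -1 to -EINVAL comes from the message.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical event-table lookup: a table entry of 0 means "event not
 * available" and -1 means "invalid combination". Any other value is
 * handed back through *config.
 */
int lookup_event_config(const int64_t *table, size_t len, size_t idx,
			uint64_t *config)
{
	if (idx >= len)
		return -EINVAL; /* assumption: out of range counts as invalid */
	if (table[idx] == 0)
		return -ENOENT;
	if (table[idx] == -1)
		return -EINVAL;

	*config = (uint64_t)table[idx];
	return 0;
}
```

With this encoding a hardware event whose real code is 0 cannot be stored in the table without being read back as "not available", which appears to be the clash raised earlier in the thread.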

Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets - bisected

2012-09-25 Thread Peter Zijlstra
On Tue, 2012-09-25 at 14:23 +0100, Mel Gorman wrote: > It crashes on boot due to the fact that you created a function-scope variable > called sd_llc in select_idle_sibling() and shadowed the actual sd_llc you > were interested in. D'0h!
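The class of bug Mel Gorman describes, a function-scope variable hiding an outer one of the same name, can be reproduced in a few lines of plain C. The sketch below is a stand-alone illustration and not the scheduler code: `struct demo_domain`, `pick_buggy()` and `pick_fixed()` are hypothetical, and only the name `sd_llc` is borrowed from the message. The NULL checks are added here so the example does not crash the way the real patch did.

```c
#include <stddef.h>

struct demo_domain {
	int id;
};

static struct demo_domain demo_llc = { 42 };

/* File-scope pointer that the functions below are meant to read. */
static struct demo_domain *sd_llc = &demo_llc;

/*
 * Buggy variant: the local declaration shadows the file-scope sd_llc,
 * so the function only ever sees its own NULL pointer.
 */
static int pick_buggy(void)
{
	struct demo_domain *sd_llc = NULL;	/* hides the outer sd_llc */

	return sd_llc ? sd_llc->id : -1;
}

/* Fixed variant: a differently named local reads the outer pointer. */
static int pick_fixed(void)
{
	struct demo_domain *sd = sd_llc;

	return sd ? sd->id : -1;
}
```

Compiling with `-Wshadow` (GCC and Clang both accept it) makes the compiler report the local declaration in `pick_buggy()`.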

Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

2012-09-26 Thread Peter Zijlstra
On Wed, 2012-09-26 at 15:20 +0200, Andrew Jones wrote: > Wouldn't a clean solution be to promote a task's scheduler > class to the spinner class when we PLE (or come from some special > syscall > for userspace spinlocks?)? Userspace spinlocks are typically employed to avoid syscalls.. > That cla

Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

2012-09-26 Thread Peter Zijlstra
On Wed, 2012-09-26 at 15:39 +0200, Andrew Jones wrote: > On Wed, Sep 26, 2012 at 03:26:11PM +0200, Peter Zijlstra wrote: > > On Wed, 2012-09-26 at 15:20 +0200, Andrew Jones wrote: > > > Wouldn't a clean solution be to promote a task's scheduler > > > class to t

Re: [PATCH 1/2] numa, mm: drop redundant check in do_huge_pmd_numa_page()

2012-10-26 Thread Peter Zijlstra
On Fri, 2012-10-26 at 16:57 +0300, Kirill A. Shutemov wrote: > > > Yes, this code will catch it: > > > > > > /* if an huge pmd materialized from under us just retry later */ > > > if (unlikely(pmd_trans_huge(*pmd))) > > > return 0; > > > > > > If the pmd is under splitting it'

Re: [PATCH 26/31] sched, numa, mm: Add fault driven placement and migration policy

2012-10-26 Thread Peter Zijlstra
On Fri, 2012-10-26 at 15:50 +0200, Ingo Molnar wrote: > > Oh, just found the reason: > > the ptep_modify_prot_start()/modify()/commit() sequence is > SMP-unsafe - it has to be done under the mmap_sem write-locked. > > It is safe against *hardware* updates to the PTE, but not safe > against its

Re: [PATCH] hrtimer: Printing timer info when hitting BUG_ON()

2012-10-29 Thread Peter Zijlstra
On Mon, 2012-10-29 at 19:02 +0800, Chuansheng Liu wrote: > +/* > + * dump_hrtimer_callinfo - print hrtimer information including: > + * state, callback function, pid and start_site. > +*/ > +static void dump_hrtimer_callinfo(struct hrtimer *timer) > +{ > + > + char symname[KSYM_NAME_LEN]; > +

Re: [PATCH 31/33] perf, tools: Support generic events as pmu event names v2

2012-10-29 Thread Peter Zijlstra
On Sun, 2012-10-28 at 20:12 +0100, Andi Kleen wrote: > > Note I wrote and posted all this before you posted last week, but the wheels > of perf review grind so slowly that you overtook me. > > Peter Z., to be honest all these later patches are just caused by not having > generic TSX events/modifi

Re: [PATCH 01/33] perf, x86: Add PEBSv2 record support

2012-10-29 Thread Peter Zijlstra
On Mon, 2012-10-29 at 19:08 +0900, Namhyung Kim wrote: > That means it can support precise == 3? It should, the difference between 2 and 3 is allowing for !EXACT_IP samples. Not needing the LBR based fixup we should never have that, so HSW might indeed allow for 3.

Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency

2012-10-29 Thread Peter Zijlstra
On Mon, 2012-10-29 at 16:15 +0100, Stephane Eranian wrote: > +EVENT_ATTR_STR(mem-loads, mem_ld_nhm, "event=0x100b,umask=0x1,ldlat=3"); > +EVENT_ATTR_STR(mem-loads, mem_ld_snb, "event=0xcd,umask=0x1,ldlat=3"); I haven't fully grokked the macro magic yet, but event=0x100b seems wrong, event only ta

Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency

2012-10-29 Thread Peter Zijlstra
On Mon, 2012-10-29 at 16:15 +0100, Stephane Eranian wrote: > +static u64 load_latency_data(u64 status) > +{ > + union intel_x86_pebs_dse dse; > + u64 val; > + int model = boot_cpu_data.x86_model; > + int fam = boot_cpu_data.x86; > + > + dse.val = status; > + > +

Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency

2012-10-29 Thread Peter Zijlstra
On Mon, 2012-10-29 at 16:15 +0100, Stephane Eranian wrote: > + fll = event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT; > + > perf_sample_data_init(&data, 0, event->hw.last_period); > > + data.period = event->hw.last_period; > + sample_type = event->attr.sample_type; > + > +

Re: [Patch v1 06/10] perf/x86: add support for PEBS Precise Store

2012-10-29 Thread Peter Zijlstra
On Mon, 2012-10-29 at 16:15 +0100, Stephane Eranian wrote: > - if (fll) { > + if (fll || fst) { > if (sample_type & PERF_SAMPLE_ADDR) > data.addr = pebs->dla; > > @@ -688,6 +731,8 @@ static void __intel_pmu_pebs_event(struct perf_event > *event

Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency

2012-10-29 Thread Peter Zijlstra
On Mon, 2012-10-29 at 16:43 +0100, Stephane Eranian wrote: > You meant fll, instead I think. Oh, yes, too small font I guess. > Well, that would work too, but I am trying to factorize the code > with Precise Store which is a later patch. Yeah, just found that, its fine the way it is. Just looke

Re: [PATCH 2/2] dm: stay in blk_queue_bypass until queue becomes initialized

2012-10-29 Thread Peter Zijlstra
On Mon, 2012-10-29 at 12:38 -0400, Vivek Goyal wrote: > Ok, so the question is what's wrong with calling synchronize_rcu() inside > a mutex with CONFIG_PREEMPT=y. I don't know. Ccing paul mckenney and > peterz. int blkcg_activate_policy(struct request_queue *q, { ... preloaded

Re: [PATCH V2 RFC 3/3] kvm: Check system load and handle different commit cases accordingly

2012-10-29 Thread Peter Zijlstra
On Mon, 2012-10-29 at 19:37 +0530, Raghavendra K T wrote: > +/* > + * A load of 2048 corresponds to 1:1 overcommit > + * undercommit threshold is half the 1:1 overcommit > + * overcommit threshold is 1.75 times of 1:1 overcommit threshold > + */ > +#define COMMIT_THRESHOLD (FIXED_1) > +#define UNDE
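The arithmetic in the quoted comment can be worked through in fixed point. The sketch below is an assumption-laden illustration, not the patch itself: the quoted defines are cut off in this listing, so every `DEMO_*` name and the `demo_classify()` helper are hypothetical. It relies only on the comment's statements (2048 is 1:1 commit, the undercommit threshold is half of that, the overcommit threshold is 1.75 times that) and on the kernel's FIXED_1 being 1 << 11.

```c
/* Fixed-point unit: 1 << 11 == 2048, matching the kernel's FIXED_1. */
#define DEMO_FSHIFT		11
#define DEMO_FIXED_1		(1UL << DEMO_FSHIFT)

/* Thresholds derived from the quoted comment; names are hypothetical. */
#define DEMO_COMMIT		(DEMO_FIXED_1)			/* 2048: 1:1 commit   */
#define DEMO_UNDERCOMMIT	(DEMO_COMMIT >> 1)		/* 1024: 0.5 x commit */
#define DEMO_OVERCOMMIT		((DEMO_COMMIT * 7) >> 2)	/* 3584: 1.75 x commit */

enum demo_commit_state {
	DEMO_STATE_UNDER,
	DEMO_STATE_NORMAL,
	DEMO_STATE_OVER,
};

/*
 * Classify a fixed-point load value against the two thresholds.
 * How the real patch compares against them is not visible in the
 * truncated quote; strict comparisons are an assumption made here.
 */
static enum demo_commit_state demo_classify(unsigned long load)
{
	if (load < DEMO_UNDERCOMMIT)
		return DEMO_STATE_UNDER;
	if (load > DEMO_OVERCOMMIT)
		return DEMO_STATE_OVER;
	return DEMO_STATE_NORMAL;
}
```

Writing 1.75 as 7/4 keeps the computation in integers, which matters in kernel code where floating point is generally avoided.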
