Re: [PATCH 1/3] sched: Create sched_select_cpu() to give preferred CPU for power saving

2012-09-25 Thread Peter Zijlstra
On Tue, 2012-09-25 at 16:06 +0530, Viresh Kumar wrote: +/* sched-domain levels */ +#define SD_SIBLING 0x01 /* Only for CONFIG_SCHED_SMT */ +#define SD_MC 0x02 /* Only for CONFIG_SCHED_MC */ +#define SD_BOOK 0x04 /* Only for

Re: [PATCH] Update sched_domains_numa_masks when new cpus are onlined.

2012-09-25 Thread Peter Zijlstra
On Tue, 2012-09-25 at 10:39 +0800, Tang Chen wrote: We do this because nr_node_ids changed, right? This means the entire distance table grew/shrunk, which means we should do the level scan again. It seems that nr_node_ids will not change once the system is up. I'm not quite sure. If I am

Re: [BUG] perf/x86: Intel uncore_pmu_to_box() local variable typo

2012-09-25 Thread Peter Zijlstra
On Tue, 2012-09-25 at 12:44 +0200, Stephane Eranian wrote: Hi, I don't understand why the local variable box needs to be declared static here: static struct intel_uncore_box * uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int cpu) { static struct intel_uncore_box *box;
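
A minimal sketch of why a `static` local pointer looks suspicious in a lookup helper like the one quoted above. The names `find_box_static`, `find_box_auto` and the `boxes` table are hypothetical and not taken from the uncore code; the sketch only illustrates that a `static` local is a single object shared by every call, so a result from one call can leak into the next.

```c
#include <stddef.h>

/* Hypothetical table standing in for the per-PMU boxes. */
static int boxes[2] = { 10, 20 };

/* Suspicious variant: 'box' is static, so it keeps its value between calls. */
int *find_box_static(int idx)
{
	static int *box;	/* one shared object, not one per call */

	if (idx >= 0 && idx < 2)
		box = &boxes[idx];
	return box;		/* stale result when idx is out of range */
}

/* Plain variant: 'box' is an ordinary automatic variable. */
int *find_box_auto(int idx)
{
	int *box = NULL;

	if (idx >= 0 && idx < 2)
		box = &boxes[idx];
	return box;
}
```

Calling `find_box_static(1)` and then `find_box_static(5)` hands back the pointer from the first call a second time, whereas `find_box_auto(5)` returns NULL. With several CPUs calling such a helper concurrently, the shared object would also be a data race.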

Re: [PATCH 3/3] workqueue: Schedule work on non-idle cpu instead of current one

2012-09-25 Thread Peter Zijlstra
On Tue, 2012-09-25 at 16:06 +0530, Viresh Kumar wrote: @@ -1066,8 +1076,9 @@ int queue_work(struct workqueue_struct *wq, struct work_struct *work) { int ret; - ret = queue_work_on(get_cpu(), wq, work); - put_cpu(); + preempt_disable(); + ret =

Re: [PATCH 1/1] perf, Add support for Xeon-Phi PMU

2012-09-25 Thread Peter Zijlstra
On Thu, 2012-09-20 at 13:03 -0400, Vince Weaver wrote: One additional complication: some of the cache events map to event 0. This causes problems because the generic events code assumes 0 means not-available. I'm not sure the best way to address that problem. For all except P4 we could

Re: [PATCH 3/3] workqueue: Schedule work on non-idle cpu instead of current one

2012-09-25 Thread Peter Zijlstra
On Tue, 2012-09-25 at 17:00 +0530, Viresh Kumar wrote: But this is what the initial idea during LPC we had. Yeah.. that's true. Any improvements here you can suggest? We could uhm... /me tries thinking ... reuse some of the NOHZ magic? Would that be sufficient, not waking a NOHZ cpu, or do

Re: [PATCH 3/3] workqueue: Schedule work on non-idle cpu instead of current one

2012-09-25 Thread Peter Zijlstra
On Tue, 2012-09-25 at 13:40 +0200, Peter Zijlstra wrote: On Tue, 2012-09-25 at 17:00 +0530, Viresh Kumar wrote: But this is what the initial idea during LPC we had. Yeah.. that's true. Any improvements here you can suggest? We could uhm... /me tries thinking ... reuse some

Re: [PATCH] Update sched_domains_numa_masks when new cpus are onlined.

2012-09-25 Thread Peter Zijlstra
On Tue, 2012-09-25 at 19:45 +0800, Tang Chen wrote: Let's have an example here. sched_init_numa() { ... // A loop set sched_domains_numa_levels to level.-1 // I set sched_domains_numa_levels to 0. sched_domains_numa_levels =

Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets - bisected

2012-09-25 Thread Peter Zijlstra
On Mon, 2012-09-24 at 19:11 -0700, Linus Torvalds wrote: In the not-so-distant past, we had the intel Dunnington Xeon, which was iirc basically three Core 2 duo's bolted together (ie three clusters of two cores sharing L2, and a fully shared L3). So that was a true multi-core with fairly big

Re: [PATCH 1/1] perf, Add support for Xeon-Phi PMU

2012-09-25 Thread Peter Zijlstra
On Tue, 2012-09-25 at 15:42 +0400, Cyrill Gorcunov wrote: Guys, letme re-read this whole mail thread first since I have no clue what this remapping about ;) x86_setup_perfctr() / set_ext_hw_attr() have special purposed 0 and -1 config values to mean -ENOENT and -EINVAL resp. This means

Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets - bisected

2012-09-25 Thread Peter Zijlstra
On Tue, 2012-09-25 at 14:23 +0100, Mel Gorman wrote: It crashes on boot due to the fact that you created a function-scope variable called sd_llc in select_idle_sibling() and shadowed the actual sd_llc you were interested in. D'0h! -- To unsubscribe from this list: send the line unsubscribe
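
The shadowing mistake described above can be sketched in a few lines of plain C. This is an illustration under stated assumptions, not the scheduler code: `sd_llc_id`, `set_llc_shadowed` and `set_llc` are hypothetical names, and a plain `int` stands in for the real `sd_llc`.

```c
/* File-scope variable the function is meant to update
 * (a plain int here; the real sd_llc is not an int). */
int sd_llc_id = -1;

/* Buggy: the inner declaration shadows the file-scope variable. */
void set_llc_shadowed(int id)
{
	int sd_llc_id;		/* new, unrelated object */

	sd_llc_id = id;		/* writes the local only */
	(void)sd_llc_id;
}

/* Fixed: no local declaration, so the file-scope variable is used. */
void set_llc(int id)
{
	sd_llc_id = id;
}
```

After `set_llc_shadowed(3)` the file-scope `sd_llc_id` is still -1; only `set_llc(3)` changes it. GCC's `-Wshadow` option warns about declarations of this kind.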

Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

2012-09-26 Thread Peter Zijlstra
On Wed, 2012-09-26 at 15:20 +0200, Andrew Jones wrote: Wouldn't a clean solution be to promote a task's scheduler class to the spinner class when we PLE (or come from some special syscall for userspace spinlocks?)? Userspace spinlocks are typically employed to avoid syscalls.. That class

Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

2012-09-26 Thread Peter Zijlstra
On Wed, 2012-09-26 at 15:39 +0200, Andrew Jones wrote: On Wed, Sep 26, 2012 at 03:26:11PM +0200, Peter Zijlstra wrote: On Wed, 2012-09-26 at 15:20 +0200, Andrew Jones wrote: Wouldn't a clean solution be to promote a task's scheduler class to the spinner class when we PLE (or come from

Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets - bisected

2012-09-27 Thread Peter Zijlstra
On Wed, 2012-09-26 at 11:19 -0700, Linus Torvalds wrote: For example, it starts with the maximum target scheduling domain, and works its way in over the scheduling groups within that domain. What the f*ck is the logic of that kind of crazy thing? It never makes sense to look at a biggest

Re: [tip:core/rcu] sched: Fix load avg vs cpu-hotplug

2012-09-27 Thread Peter Zijlstra
On Wed, 2012-09-26 at 22:12 -0700, tip-bot for Peter Zijlstra wrote: Commit-ID: 5d18023294abc22984886bd7185344e0c2be0daf Gitweb: http://git.kernel.org/tip/5d18023294abc22984886bd7185344e0c2be0daf Author: Peter Zijlstra pet...@infradead.org AuthorDate: Mon, 20 Aug 2012 11:26:57 +0200

Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets - bisected

2012-09-27 Thread Peter Zijlstra
On Thu, 2012-09-27 at 09:48 -0700, da...@lang.hm wrote: I think you are being too smart for your own good. you don't know if it's best to move them further apart or not. Well yes and no.. You're right, however in general the load-balancer has always tried to not use (SMT) siblings whenever

Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets - bisected

2012-09-27 Thread Peter Zijlstra
On Thu, 2012-09-27 at 10:45 -0700, da...@lang.hm wrote: But I thought that this conversation (pgbench) was dealing with long running processes, Ah, I think we've got a confusion on long vs short.. yes pgbench is a long-running process, however the tasks might not be long in runnable state. Ie

Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets - bisected

2012-09-27 Thread Peter Zijlstra
On Thu, 2012-09-27 at 11:19 -0700, Linus Torvalds wrote: On Thu, Sep 27, 2012 at 11:05 AM, Borislav Petkov b...@alien8.de wrote: On Thu, Sep 27, 2012 at 10:44:26AM -0700, Linus Torvalds wrote: Or could we just improve the heuristics. What happens if the scheduling granularity is increased,

Re: [PATCHv4] perf x86_64: Fix rsp register for system call fast path

2012-10-03 Thread Peter Zijlstra
On Wed, 2012-10-03 at 15:13 +0200, Jiri Olsa wrote: @@ -1190,8 +1191,8 @@ static inline void perf_sample_data_init(struct perf_sample_data *data, data->raw = NULL; data->br_stack = NULL; data->period = period; - data->regs_user.abi = PERF_SAMPLE_REGS_ABI_NONE; -

Re: [PATCH 7/10] compiler{,-gcc4}.h: Introduce __flatten function attribute

2012-10-03 Thread Peter Zijlstra
On Wed, 2012-10-03 at 11:14 -0400, Steven Rostedt wrote: Yep. I personally never use the get_maintainers script. I first check the MAINTAINERS file. If the subsystem I'm working on exists there, I only email those that are listed there, including any mailing lists that are mentioned (as well

Re: [PATCH RFC 1/3] sched: introduce distinct per-cpu load average

2012-10-04 Thread Peter Zijlstra
On Thu, 2012-10-04 at 01:05 +0200, Andrea Righi wrote: +++ b/kernel/sched/core.c @@ -727,15 +727,17 @@ static void dequeue_task(struct rq *rq, struct task_struct *p, int flags) void activate_task(struct rq *rq, struct task_struct *p, int flags) { if (task_contributes_to_load(p))

uncore doing kfree() from CPU_{STARTING,DYING}

2012-10-04 Thread Peter Zijlstra
7fdba1ca10462f42ad2246b918fe6368f5ce488e Author: Peter Zijlstra a.p.zijls...@chello.nl Date: Fri Jul 22 13:41:54 2011 +0200 perf, x86: Avoid kfree() in CPU_STARTING On -rt kfree() can schedule, but CPU_STARTING is before the CPU is fully up and running. These are contradictory, so avoid it. Instead push

Re: [PATCH RFC 1/3] sched: introduce distinct per-cpu load average

2012-10-04 Thread Peter Zijlstra
On Thu, 2012-10-04 at 11:43 +0200, Andrea Righi wrote: Right, the update must be atomic to have a coherent nr_uninterruptible value. And AFAICS the only way to account a coherent nr_uninterruptible value per-cpu is to go with atomic ops... mmh... I'll think more on this. You could stick

Re: [PATCH v2 2/2] Update sched_domains_numa_masks when new cpus are onlined.

2012-10-04 Thread Peter Zijlstra
On Tue, 2012-09-25 at 21:12 +0800, Tang Chen wrote: +static int sched_domains_numa_masks_update(struct notifier_block *nfb, + unsigned long action, + void *hcpu) +{ + int cpu = (int)hcpu;

Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

2012-10-04 Thread Peter Zijlstra
On Thu, 2012-10-04 at 14:41 +0200, Avi Kivity wrote: Again the numbers are ridiculously high for arch_local_irq_restore. Maybe there's a bad perf/kvm interaction when we're injecting an interrupt, I can't believe we're spending 84% of the time running the popf instruction. Smells like a

Re: Seems like sched: Add missing call to calc_load_exit_idle() should be reverted in 3.5 branch

2012-10-04 Thread Peter Zijlstra
On Thu, 2012-10-04 at 10:46 -0700, Greg Kroah-Hartman wrote: On Thu, Oct 04, 2012 at 12:11:01PM +0800, Huacai Chen wrote: Hi, Greg I found that Linux-3.5.5 accept this commit sched: Add missing call to calc_load_exit_idle() but I think this isn't needed. Because 5167e8d5417b

Re: Seems like sched: Add missing call to calc_load_exit_idle() should be reverted in 3.5 branch

2012-10-05 Thread Peter Zijlstra
On Thu, 2012-10-04 at 15:27 -0700, Greg Kroah-Hartman wrote: I'm puzzled as well. Any ideas if I should do anything here or not? So I think the current v3.5.5 code is fine. I'm just not smart enough to figure out how 3.6 got fuzzed, this git thing is confusing as hell.

Re: [RFC] perf: perf_event_attr anon unions and static initializer issue

2012-10-05 Thread Peter Zijlstra
On Fri, 2012-10-05 at 12:36 +0200, Stephane Eranian wrote: struct perf_event_attr attr = { .config = 0x1234, .config1 = 0x456 }; Does anyone have a better solution to propose? struct perf_event_attr attr = { .config = 0x1234, { .config1 = 0x5678 }, }; sometimes works,
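
A hedged sketch of one workaround for the initializer problem discussed above, for cases where the structure is an ordinary local rather than a true static initializer. `struct demo_attr` is a hypothetical, much smaller stand-in for `struct perf_event_attr`, and `make_attr` is an illustrative helper, not a perf API. The idea is to initialise only the plainly named member and assign the anonymous-union member afterwards, which avoids relying on how a given compiler treats designators for anonymous-union members.

```c
#include <stdint.h>

/* Hypothetical, much smaller stand-in for struct perf_event_attr. */
struct demo_attr {
	uint64_t config;
	union {			/* anonymous union (C11) */
		uint64_t bp_addr;
		uint64_t config1;
	};
};

/* Zero-initialise via the named member, then assign the union member. */
struct demo_attr make_attr(uint64_t config, uint64_t config1)
{
	struct demo_attr attr = { .config = config };

	attr.config1 = config1;
	return attr;
}
```

For example, `make_attr(0x1234, 0x5678)` yields a structure whose `config` is 0x1234 and whose `config1` (and therefore `bp_addr`, which shares its storage) is 0x5678. This does not help where only an initializer list is possible, which is the case the thread is actually about.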

Re: Seems like sched: Add missing call to calc_load_exit_idle() should be reverted in 3.5 branch

2012-10-05 Thread Peter Zijlstra
On Fri, 2012-10-05 at 10:10 -0700, Jonathan Nieder wrote: Peter Zijlstra wrote: On Thu, 2012-10-04 at 15:27 -0700, Greg Kroah-Hartman wrote: I'm puzzled as well. Any ideas if I should do anything here or not? So I think the current v3.5.5 code is fine. Now I'm puzzled. You wrote

Re: sched: per-entity load-tracking

2012-10-08 Thread Peter Zijlstra
On Sat, 2012-10-06 at 09:39 +0200, Ingo Molnar wrote: Thanks Ingo! Paul, tip/kernel/sched/fair.c | 28 ++-- 1 file changed, 18 insertions(+), 10 deletions(-) Index: tip/kernel/sched/fair.c === ---

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-19 Thread Peter Zijlstra
On Wed, 2012-10-17 at 20:29 -0700, David Rientjes wrote: Ok, thanks for the update. I agree that we should be clearing the mapping at node hot-remove since any cpu that would subsequently get onlined and assume one of the previous cpu's ids is not guaranteed to have the same affinity.

Re: [PATCH 2/2] rename NUMA fault handling functions

2012-10-19 Thread Peter Zijlstra
On Thu, 2012-10-18 at 17:20 -0400, Rik van Riel wrote: Having the function name indicate what the function is used for makes the code a little easier to read. Furthermore, the fault handling code largely consists of do__page functions. I don't much care either way, but I was thinking

Re: [PATCH 1/2] add credits for NUMA placement

2012-10-19 Thread Peter Zijlstra
should probably be rewritten once we figure out the final details of what the NUMA code needs to do, and why. Signed-off-by: Rik van Riel r...@redhat.com Acked-by: Peter Zijlstra a.p.zijls...@chello.nl Thanks Rik!

Re: [PATCH 1/2] brw_mutex: big read-write mutex

2012-10-19 Thread Peter Zijlstra
On Thu, 2012-10-18 at 15:28 -0400, Mikulas Patocka wrote: On Thu, 18 Oct 2012, Oleg Nesterov wrote: Ooooh. And I just noticed include/linux/percpu-rwsem.h which does something similar. Certainly it was not in my tree when I started this patch... percpu_down_write() doesn't allow

Re: MAX_LOCKDEP_ENTRIES too low (called from ioc_release_fn)

2012-10-19 Thread Peter Zijlstra
On Fri, 2012-10-19 at 01:21 -0400, Dave Jones wrote: Not sure why you are CC'ing a call site, rather than the maintainers of the code. Just looks like lockdep is using too small a static value. Though it is pretty darn large... You're right, it's a huge chunk of memory. It looks like

Re: [PATCH] sched, autogroup: fix kernel crashes caused by runtime disable autogroup

2012-10-19 Thread Peter Zijlstra
in sched_autogroup_create_attach(). Reported-by: cwillu cwi...@cwillu.com Reported-by: Luis Henriques luis.henriq...@canonical.com Signed-off-by: Xiaotian Feng dannyf...@tencent.com Cc: Ingo Molnar mi...@redhat.com Cc: Peter Zijlstra pet...@infradead.org --- kernel/sched/auto_group.c | 10

Re: perf: p6 PMU working by accident, should we fix it and KNC?

2012-10-19 Thread Peter Zijlstra
On Wed, 2012-10-17 at 11:35 -0400, Vince Weaver wrote: This is by accident; it looks like the code does val |= ARCH_PERFMON_EVENTSEL_ENABLE; in p6_pmu_disable_event() so that events are never truly disabled (is this a bug? should it be &=~ instead?). I think that's on purpose.. from

Re: [PATCH RFC] sched: boost throttled entities on wakeups

2012-10-19 Thread Peter Zijlstra
On Thu, 2012-10-18 at 11:32 +0400, Vladimir Davydov wrote: 1) Do you agree that the problem exists and should be sorted out? This is two questions.. yes it exists, I'm absolutely sure I pointed it out as soon as people even started talking about this nonsense (bw cruft). Should it be sorted,

Re: [PATCH RT] slab: Fix up stable merge of slab init_lock_keys()

2012-10-19 Thread Peter Zijlstra
On Thu, 2012-10-18 at 09:40 -0400, Steven Rostedt wrote: Peter, There was a little conflict with my merge of 3.4.14 due to the backport of this patch: commit 947ca1856a7e60aa6d20536785e6a42dff25aa6e Author: Michael Wang wang...@linux.vnet.ibm.com Date: Wed Sep 5 10:33:18 2012 +0800

Re: [tip:numa/core] sched/numa/mm: Improve migration

2012-10-19 Thread Peter Zijlstra
On Fri, 2012-10-19 at 09:51 -0400, Johannes Weiner wrote: Of course I'm banging my head into a wall for not seeing earlier through the existing migration path how easy this could be. There's a reason I keep promoting the idea of 'someone' rewriting all that page-migration code :-) I forever

Re: [tip:numa/core] sched/numa/mm: Improve migration

2012-10-19 Thread Peter Zijlstra
On Fri, 2012-10-19 at 09:51 -0400, Johannes Weiner wrote: Right now, unlike the traditional migration path, this breaks COW for every migration, but maybe you don't care about shared pages in the first place. And fixing that should be nothing more than grabbing the anon_vma lock and using

Re: [tip:numa/core] sched/numa/mm: Improve migration

2012-10-19 Thread Peter Zijlstra
On Fri, 2012-10-19 at 09:51 -0400, Johannes Weiner wrote: It's slightly ugly that migrate_page_copy() actually modifies the existing page (deactivation, munlock) when you end up having to revert back to it. The worst is actually calling copy_huge_page() on a THP.. it seems to work though ;-)

Re: [PATCH 1/2] perf tools: add event modifier to request exclusive PMU access

2012-10-19 Thread Peter Zijlstra
On Fri, 2012-10-19 at 16:52 +0200, Stephane Eranian wrote: -modifier_event [ukhpGH]{1,8} +modifier_event [ukhpGHx]{1,8} wouldn't the max modifier string length grow by adding another possible modifier?

Re: [PATCH 2/2] perf: SNB exclusive PMU access for INST_RETIRED:PREC_DIST

2012-10-19 Thread Peter Zijlstra
On Fri, 2012-10-19 at 16:52 +0200, Stephane Eranian wrote: +static int intel_pebs_aliases_snb(struct perf_event *event) +{ + u64 cfg = event->hw.config; + /* +* for INST_RETIRED.PREC_DIST to work correctly with PEBS, it must +* be measured alone on SNB (exclusive

Re: question on NUMA page migration

2012-10-19 Thread Peter Zijlstra
On Fri, 2012-10-19 at 11:53 -0400, Rik van Riel wrote: If we do need the extra refcount, why is normal page migration safe? :) It's mostly a matter of how convoluted you make the code; regular page migration is about as bad as you can get. Normal does: follow_page(FOLL_GET) +1

Re: [PATCH 2/2] perf: SNB exclusive PMU access for INST_RETIRED:PREC_DIST

2012-10-19 Thread Peter Zijlstra
On Fri, 2012-10-19 at 18:31 +0200, Stephane Eranian wrote: On Fri, Oct 19, 2012 at 6:27 PM, Peter Zijlstra pet...@infradead.org wrote: On Fri, 2012-10-19 at 16:52 +0200, Stephane Eranian wrote: +static int intel_pebs_aliases_snb(struct perf_event *event) +{ + u64 cfg = event

Re: [PATCH 1/2] brw_mutex: big read-write mutex

2012-10-19 Thread Peter Zijlstra
On Fri, 2012-10-19 at 11:32 -0400, Mikulas Patocka wrote: So if you can do an alternative implementation without RCU, show it. Uhm,,. no that's not how it works. You just don't push through crap like this and then demand someone else does it better. But using preempt_{disable,enable} and using

Re: question on NUMA page migration

2012-10-19 Thread Peter Zijlstra
On Fri, 2012-10-19 at 13:13 -0400, Rik van Riel wrote: Would it make sense to have the normal page migration code always work with the extra refcount, so we do not have to introduce a new MIGRATE_FAULT migration mode? On the other hand, compaction does not take the extra reference...

Re: linux-next: build failure after merge of the final tree (tip/s390 trees related)

2012-10-19 Thread Peter Zijlstra
On Thu, 2012-10-18 at 17:02 +0200, Ralf Baechle wrote: CC mm/huge_memory.o mm/huge_memory.c: In function ‘do_huge_pmd_prot_none’: mm/huge_memory.c:789:3: error: incompatible type for argument 3 of ‘update_mmu_cache’ That appears to have become update_mmu_cache_pmd(), which makes sense

Re: [PATCH 1/6] perf, x86: Basic Haswell LBR call stack support

2012-10-22 Thread Peter Zijlstra
On Mon, 2012-10-22 at 14:11 +0800, Yan, Zheng wrote: + /* LBR callstack does not work well with FREEZE_LBRS_ON_PMI */ + if (!cpuc->lbr_sel || !(cpuc->lbr_sel->config & LBR_CALL_STACK)) + debugctl |= DEBUGCTLMSR_FREEZE_LBRS_ON_PMI; How useful it is without this? How many

Re: [PATCH 1/6] perf, x86: Basic Haswell LBR call stack support

2012-10-22 Thread Peter Zijlstra
On Mon, 2012-10-22 at 14:11 +0800, Yan, Zheng wrote: --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -160,8 +160,9 @@ enum perf_branch_sample_type { PERF_SAMPLE_BRANCH_ABORT = 1U << 7, /* transaction aborts */ PERF_SAMPLE_BRANCH_INTX

Re: [RFC PATCH 5/8] irq_work: Make self-IPIs optable

2012-10-22 Thread Peter Zijlstra
On Sat, 2012-10-20 at 12:22 -0400, Frederic Weisbecker wrote: + if (empty) { + /* +* If an IPI is requested, raise it right away. Otherwise wait +* for the next tick unless it's stopped. Now if the arch uses +* some other

Re: [tip:numa/core] x86, mm: Prevent gcc to re-read the pagetables

2012-10-22 Thread Peter Zijlstra
On Sun, 2012-10-21 at 05:56 -0700, tip-bot for Andrea Arcangeli wrote: In get_user_pages_fast() the TLB shootdown code can clear the pagetables before firing any TLB flush (the page can't be freed until the TLB flushing IPI has been delivered but the pagetables will be cleared well before

Re: [PATCH v2 1/3] sched: introduce distinct per-cpu load average

2012-10-22 Thread Peter Zijlstra
On Sat, 2012-10-20 at 21:06 +0200, Andrea Righi wrote: @@ -383,13 +383,7 @@ struct rq { struct list_head leaf_rt_rq_list; #endif + unsigned long __percpu *nr_uninterruptible; This is O(nr_cpus^2) memory.. +unsigned long nr_uninterruptible_cpu(int cpu) +{ +

re: sched, numa, mm: Implement constant, per task Working Set Sampling (WSS) rate

2012-10-22 Thread Peter Zijlstra
On Mon, 2012-10-22 at 14:55 +0300, Dan Carpenter wrote: Hello Peter Zijlstra, The patch 3d049f8a5398: sched, numa, mm: Implement constant, per task Working Set Sampling (WSS) rate from Oct 14, 2012, leads to the following warning: kernel/sched/fair.c:954 task_numa_work() error: we

Re: [PATCH v2 1/2] perf tools: add event modifier to request exclusive PMU access

2012-10-22 Thread Peter Zijlstra
On Mon, 2012-10-22 at 17:44 +0200, Stephane Eranian wrote: I know the answer, because I know what's going on under the hood. But what about the average user? I'm still wondering if the avg user really thinks 'instructions' is a useful metric for other than obtaining ipc measurements. The

Re: [PATCH v2 1/2] perf tools: add event modifier to request exclusive PMU access

2012-10-22 Thread Peter Zijlstra
On Mon, 2012-10-22 at 18:08 +0200, Stephane Eranian wrote: I'm still wondering if the avg user really thinks 'instructions' is a useful metric for other than obtaining ipc measurements. Yeah, for many users CPI (or IPC) is a useful metric. Right but you don't get that using instruction

Re: [PATCH 1/6] perf, x86: Basic Haswell LBR call stack support

2012-10-23 Thread Peter Zijlstra
On Tue, 2012-10-23 at 13:41 +0800, Yan, Zheng wrote: On 10/22/2012 06:35 PM, Peter Zijlstra wrote: On Mon, 2012-10-22 at 14:11 +0800, Yan, Zheng wrote: --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -160,8 +160,9 @@ enum perf_branch_sample_type

[PATCH] perf, stat: Add --pre and --post command

2012-10-23 Thread Peter Zijlstra
/ bzImage Signed-off-by: Peter Zijlstra a.p.zijls...@chello.nl --- tools/perf/builtin-stat.c | 42 -- 1 files changed, 36 insertions(+), 6 deletions(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 93b9011..6888960 100644

Re: [PATCH 03/34] perf, x86: Basic Haswell PEBS support v2

2012-10-23 Thread Peter Zijlstra
On Thu, 2012-10-18 at 16:19 -0700, Andi Kleen wrote: +struct event_constraint intel_hsw_pebs_event_constraints[] = { + INTEL_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PRECDIST */ + INTEL_UEVENT_CONSTRAINT(0x01c2, 0xf), /* UOPS_RETIRED.ALL */ +

Re: [PATCH 04/34] perf, x86: Support the TSX intx/intx_cp qualifiers

2012-10-23 Thread Peter Zijlstra
On Thu, 2012-10-18 at 16:19 -0700, Andi Kleen wrote: @@ -826,7 +827,8 @@ static inline bool intel_pmu_needs_lbr_smpl(struct perf_event *event) return true; /* implicit branch sampling to correct PEBS skid */ - if (x86_pmu.intel_cap.pebs_trap

Re: [PATCH 05/34] perf, x86: Report PEBS event in a raw format

2012-10-23 Thread Peter Zijlstra
On Thu, 2012-10-18 at 16:19 -0700, Andi Kleen wrote: + if (event->attr.sample_type & PERF_SAMPLE_RAW) { + raw.size = x86_pmu.pebs_record_size; + raw.data = __pebs; + data.raw = &raw; + } The Changelog babbles about registers, yet you export

Re: [PATCH 06/34] perf, kvm: Support the intx/intx_cp modifiers in KVM arch perfmon emulation v2

2012-10-23 Thread Peter Zijlstra
On Thu, 2012-10-18 at 16:19 -0700, Andi Kleen wrote: From: Andi Kleen a...@linux.intel.com This is not arch perfmon, but older CPUs will just ignore it. This makes it possible to do at least some TSX measurements from a KVM guest Please, always CC people who wrote the code as well, in this

Re: [PATCH 08/34] perf, x86: Support Haswell v4 LBR format

2012-10-23 Thread Peter Zijlstra
On Thu, 2012-10-18 at 16:19 -0700, Andi Kleen wrote: Haswell has two additional LBR from flags for TSX: intx and abort, implemented as a new v4 version of the PEBS record. s/PEBS record/LBR format/ I presume ;-)

Re: [PATCH 11/34] perf, tools: Add abort,notx,intx branch filter options to perf report -j

2012-10-23 Thread Peter Zijlstra
On Thu, 2012-10-18 at 16:19 -0700, Andi Kleen wrote: + BRANCH_OPT(abort, PERF_SAMPLE_BRANCH_ABORT), + BRANCH_OPT(intx, PERF_SAMPLE_BRANCH_INTX), + BRANCH_OPT(notx, PERF_SAMPLE_BRANCH_NOTX), I think we want tx in the abort name, its very much a transaction abort, not any

Re: [PATCH 14/34] perf, x86: Avoid checkpointed counters causing excessive TSX aborts

2012-10-23 Thread Peter Zijlstra
On Thu, 2012-10-18 at 16:19 -0700, Andi Kleen wrote: @@ -1079,6 +1079,17 @@ static void intel_pmu_enable_event(struct perf_event *event) int intel_pmu_save_and_restart(struct perf_event *event) { x86_perf_event_update(event); + /* +* For a checkpointed counter

Re: [PATCH 14/34] perf, x86: Avoid checkpointed counters causing excessive TSX aborts

2012-10-23 Thread Peter Zijlstra
On Thu, 2012-10-18 at 16:19 -0700, Andi Kleen wrote: + /* XXX move somewhere else. */ + if (cpuc->events[2] && (cpuc->events[2]->hw.config & HSW_INTX_CHECKPOINTED)) + status |= (1ULL << 2); A comment explaining about those 'spurious' PMIs would go along with this nicely,

Re: [PATCH 15/34] perf, core: Add a concept of a weightened sample

2012-10-23 Thread Peter Zijlstra
On Thu, 2012-10-18 at 16:19 -0700, Andi Kleen wrote: @@ -601,6 +602,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data, data->regs_user.abi = PERF_SAMPLE_REGS_ABI_NONE; data->regs_user.regs = NULL; data->stack_user_size = 0; + data->weight

Re: [PATCH 16/34] perf, x86: Support weight samples for PEBS

2012-10-23 Thread Peter Zijlstra
On Thu, 2012-10-18 at 16:19 -0700, Andi Kleen wrote: From: Andi Kleen a...@linux.intel.com When a weighted sample is requested, first try to report the TSX abort cost on Haswell. If that is not available report the memory latency. This allows profiling both by abort cost and by memory

Re: [PATCH 05/34] perf, x86: Report PEBS event in a raw format

2012-10-23 Thread Peter Zijlstra
On Tue, 2012-10-23 at 15:30 +0200, Andi Kleen wrote: Also, there's an alignment issue there, the raw.data is 32bit offset, the record is u64 aligned, leaving the output stream offset, wrecking things. Can you explain more? Not sure I understand. PERF_SAMPLE_RAW has a u32 size header and

Re: [PATCHv4 0/8] perf, tool: Allow to use hw events in PMU syntax

2012-10-23 Thread Peter Zijlstra
On Wed, 2012-10-10 at 14:53 +0200, Jiri Olsa wrote: arch/x86/kernel/cpu/perf_event.c | 121 + arch/x86/kernel/cpu/perf_event.h | 2 ++ arch/x86/kernel/cpu/perf_event_amd.c | 9 +++

Re: [PATCH 02/11] perf: Do not get values from disabled counters in group format read

2012-10-23 Thread Peter Zijlstra
On Sat, 2012-10-20 at 16:33 +0200, Jiri Olsa wrote: It's possible some of the counters in the group could be disabled when sampling member of the event group is reading the rest via PERF_SAMPLE_READ sample type processing. Disabled counters could then produce wrong numbers. Fixing that by

Re: [PATCH 1/2] percpu-rw-semaphores: use light/heavy barriers

2012-10-23 Thread Peter Zijlstra
On Mon, 2012-10-22 at 19:37 -0400, Mikulas Patocka wrote: - /* -* On X86, write operation in this_cpu_dec serves as a memory unlock -* barrier (i.e. memory accesses may be moved before the write, but -* no memory accesses are moved past the write). -* On

Re: [PATCH 1/2] percpu-rw-semaphores: use light/heavy barriers

2012-10-23 Thread Peter Zijlstra
On Tue, 2012-10-23 at 21:23 +0200, Oleg Nesterov wrote: I have to admit, I have no idea how much cli/sti is slower compared to preempt_disable/enable. A lot.. esp on stupid hardware (insert pentium-4 reference), but I think its more expensive for pretty much all hardware, preempt_disable() is

Re: [PATCH 1/2] percpu-rw-semaphores: use light/heavy barriers

2012-10-23 Thread Peter Zijlstra
On Tue, 2012-10-23 at 21:23 +0200, Oleg Nesterov wrote: static void mb_ipi(void *arg) { smp_mb(); /* unneeded ? */ } static void force_mb_on_each_cpu(void) { smp_mb(); smp_call_function(mb_ipi,

Re: [rfc 0/2] Introducing VmFlags field into smaps output

2012-10-23 Thread Peter Zijlstra
On Wed, 2012-10-24 at 01:59 +0400, Cyrill Gorcunov wrote: [ilog2(VM_WRITE)] = { {'w', 'r'} }, since we're being awfully positive about crazy late night ideas, how about something like: #define MNEM(_VM, _mn) [ilog2(_VM)] = {(const char [2]){_mn}} MNEM(VM_WRITE,

Re: + procfs-add-vmflags-field-in-smaps-output-v3-fix-2.patch added to -mm tree

2012-10-24 Thread Peter Zijlstra
On Wed, 2012-10-24 at 12:45 +0400, Cyrill Gorcunov wrote: for (i = 0; i < BITS_PER_LONG; i++) { - if (vma->vm_flags & (1 << i)) + if (vma->vm_flags & (1ul << i)) { for_each_set_bit(i, &vma->vm_flags, BITS_PER_LONG) { seq_printf(m, %c%c
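
The change from `1` to `1ul` in the quoted diff matters because the loop runs over every bit of an `unsigned long`, while the constant `1` has type `int`; shifting an `int` by its width or more is undefined in C. A minimal userspace sketch follows. `DEMO_BITS_PER_LONG`, `flag_is_set` and `count_flags` are hypothetical stand-ins, not kernel helpers.

```c
#include <limits.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's BITS_PER_LONG. */
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Test bit 'bit' of 'flags'; the constant is unsigned long, so the
 * shift is valid for every bit position in the word. */
bool flag_is_set(unsigned long flags, unsigned int bit)
{
	return (flags & (1UL << bit)) != 0;
}

/* Count set bits by walking every bit position of the word. */
unsigned int count_flags(unsigned long flags)
{
	unsigned int i, n = 0;

	for (i = 0; i < DEMO_BITS_PER_LONG; i++)
		if (flag_is_set(flags, i))
			n++;
	return n;
}
```

The kernel's `for_each_set_bit()` suggested in the reply does the same walk but visits only the set bits, so the explicit shift disappears from the caller entirely.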

Re: [RFC][PATCH] sched: Fix a deadlock of cpu-hotplug

2012-10-24 Thread Peter Zijlstra
On Wed, 2012-10-24 at 17:25 +0800, Huacai Chen wrote: We found poweroff sometimes fails on our computers, so we have the lock debug options configured. Then, when we do poweroff or take a cpu down via cpu-hotplug, kernel complain as below. To resolve this, we modify sched_ttwu_pending(),

Re: [PATCH 02/11] perf: Do not get values from disabled counters in group format read

2012-10-24 Thread Peter Zijlstra
On Tue, 2012-10-23 at 18:50 +0200, Jiri Olsa wrote: On Tue, Oct 23, 2012 at 06:13:09PM +0200, Peter Zijlstra wrote: On Sat, 2012-10-20 at 16:33 +0200, Jiri Olsa wrote: It's possible some of the counters in the group could be disabled when sampling member of the event group is reading

Re: [RFC][PATCH] perf: Add a few generic stalled-cycles events

2012-10-24 Thread Peter Zijlstra
On Tue, 2012-10-16 at 11:31 -0700, Sukadev Bhattiprolu wrote: On a side note, how does the kernel on x86 use the 'config' information in say /sys/bus/event_source/devices/cpu/format/cccr ? On Power7, the raw code encodes the information such as the PMC to use for the event. Is that how the

Re: [PATCH 02/11] perf: Do not get values from disabled counters in group format read

2012-10-24 Thread Peter Zijlstra
On Wed, 2012-10-24 at 14:14 +0200, Jiri Olsa wrote: well, x86_pmu_read calls x86_perf_event_update, which expects the event is active.. if it's not it'll update the count from whatever left in event.hw.idx counter.. could be uninitialized or used by others.. Oh right, we shouldn't call

Re: [RFC][PATCH] sched: Fix a deadlock of cpu-hotplug

2012-10-24 Thread Peter Zijlstra
On Wed, 2012-10-24 at 20:34 +0800, 陈华才 wrote: I see, this is an arch-specific bug, sorry for my carelessness and thank you for your tips. What arch are you using? And what exactly did the arch do wrong? Most of the code involved seems to be common code. Going by c0_compare_interrupt, this is

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Peter Zijlstra
On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote: If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will return -1. As a result, cpumask_of_node(nid) will return NULL. In this case, find_next_bit() in for_each_cpu will get a NULL pointer and cause panic. Hurm,. this is

Re: [PATCH] task_work: avoid unneeded cmpxchg() in task_work_run()

2012-10-09 Thread Peter Zijlstra
On Mon, 2012-10-08 at 14:38 +0200, Oleg Nesterov wrote: But the code looks more complex, and the only advantage is that non-exiting task does xchg() instead of cmpxchg(). Not sure this worth the trouble, in this case task_work_run() will likely run the callbacks (the caller checks ->task_works

Re: [REPOST] RFC: sched: Prevent wakeup to enter critical section needlessly

2012-10-09 Thread Peter Zijlstra
On Tue, 2012-10-09 at 06:37 -0700, Andi Kleen wrote: Ivo Sieben meltedpiano...@gmail.com writes: Check the waitqueue task list to be non empty before entering the critical section. This prevents locking the spin lock needlessly in case the queue was empty, and therefore also prevent

Re: [PATCH] x86/perf: Fix virtualization sanity check

2012-10-09 Thread Peter Zijlstra
On Tue, 2012-10-09 at 17:38 +0200, Andre Przywara wrote: First you need an AMD family 10h/12h CPU. These do not reset the PERF_CTR registers on a reboot. Now you boot bare metal Linux, which goes successfully through this check, but leaves the magic value of 0xabcd in the register. You don't

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Peter Zijlstra
On Tue, 2012-10-09 at 13:36 -0700, David Rientjes wrote: On Tue, 9 Oct 2012, Peter Zijlstra wrote: On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote: If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will return -1. As a result, cpumask_of_node(nid) will return NULL

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Peter Zijlstra
On Tue, 2012-10-09 at 16:27 -0700, David Rientjes wrote: On Tue, 9 Oct 2012, Peter Zijlstra wrote: Well the code they were patching is in the wakeup path. As I think Tang said, we leave !runnable tasks on whatever cpu they ran on last, even if that cpu is offlined, we try and fix up state

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Peter Zijlstra
On Wed, 2012-10-10 at 17:33 +0800, Wen Congyang wrote: Hmm, if per-cpu memory is preserved, and we can't offline and remove this memory. So we can't offline the node. But, if the node is hot added, and per-cpu memory doesn't use the memory on this node. We can hotremove cpu/memory on this

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-10 Thread Peter Zijlstra
On Wed, 2012-10-10 at 18:10 +0800, Wen Congyang wrote: I use ./scripts/get_maintainer.pl, and it doesn't tell me that I should cc you when I post that patch. That script doesn't look at all usage sites of the code you modify does it? You need to audit the entire tree for usage of the

Re: Netperf UDP_STREAM regression due to not sending IPIs in ttwu_queue()

2012-10-10 Thread Peter Zijlstra
On Wed, 2012-10-10 at 13:29 +0100, Mel Gorman wrote: Do we really switch more though? Look at the difference in interrupts vs context switch. IPIs are an interrupt so if TTWU_QUEUE wakes process B using an IPI, does that count as a context switch? Nope. Nor would it for NO_TTWU_QUEUE. A

Re: [PATCH 4/8] perf x86: Adding hardware events translations for amd cpus

2012-10-10 Thread Peter Zijlstra
On Wed, 2012-10-10 at 14:53 +0200, Jiri Olsa wrote: +static ssize_t amd_event_sysfs_show(char *page, u64 config) +{ + u64 event = (config & ARCH_PERFMON_EVENTSEL_EVENT) | + (config & AMD64_EVENTSEL_EVENT) >> 24; + + return x86_event_sysfs_show(page, config, event);
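
As a rough illustration of what the quoted expression appears to compute, the sketch below folds the AMD event-select bits into one event code in userspace C. The two mask values are assumptions recalled from the kernel's x86 perf headers (event bits 0-7, plus extended event bits 32-35 on AMD), and `amd_event_code()` is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/* Assumed mask values; check them against the kernel headers before relying on them. */
#define ARCH_PERFMON_EVENTSEL_EVENT 0x000000FFULL
#define AMD64_EVENTSEL_EVENT \
	(ARCH_PERFMON_EVENTSEL_EVENT | (0x0FULL << 32))

/*
 * Combine the low 8 event bits with the extended bits 32-35,
 * which land in bits 8-11 after the shift by 24.
 */
static uint64_t amd_event_code(uint64_t config)
{
	return (config & ARCH_PERFMON_EVENTSEL_EVENT) |
	       ((config & AMD64_EVENTSEL_EVENT) >> 24);
}
```

Under these assumed masks, a raw config of 0x1000004D6 (extended bits 0x1, unit mask 0x04, event 0xD6) yields the 12-bit event code 0x1D6; the unit mask is dropped because neither mask covers bits 8-15.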

Re: [PATCH 4/8] perf x86: Adding hardware events translations for amd cpus

2012-10-10 Thread Peter Zijlstra
On Wed, 2012-10-10 at 16:25 +0200, Jiri Olsa wrote: On Wed, Oct 10, 2012 at 04:11:42PM +0200, Peter Zijlstra wrote: On Wed, 2012-10-10 at 14:53 +0200, Jiri Olsa wrote: +static ssize_t amd_event_sysfs_show(char *page, u64 config) +{ + u64 event = (config

Re: Meaningless load?

2012-10-10 Thread Peter Zijlstra
On Wed, 2012-10-10 at 17:44 +0200, Simon Klinkert wrote: I'm just wondering if the 'load' is really meaningful in this scenario. The machine is the whole time fully responsive and looks fine to me but maybe I didn't understand correctly what the load should mean. Is there any sensible

Re: [PATCH V2] task_work: avoid unneeded cmpxchg() in task_work_run()

2012-10-10 Thread Peter Zijlstra
On Wed, 2012-10-10 at 19:50 +0200, Oleg Nesterov wrote: But you did not answer, and I am curious. What was your original motivation? Is xchg really faster than cmpxchg? And is this true over multiple architectures? Or are we optimizing for x86_64 (again) ?

Re: [patch for-3.7] mm, mempolicy: fix printing stack contents in numa_maps

2012-10-25 Thread Peter Zijlstra
On Wed, 2012-10-24 at 17:08 -0700, David Rientjes wrote: Ok, this looks the same but it's actually a different issue: mpol_misplaced(), which now only exists in linux-next and not in 3.7-rc2, calls get_vma_policy() which may take the shared policy mutex. This happens while holding

[PATCH 07/31] sched, numa, mm, s390/thp: Implement pmd_pgprot() for s390

2012-10-25 Thread Peter Zijlstra
...@de.ibm.com Cc: Heiko Carstens heiko.carst...@de.ibm.com Cc: Peter Zijlstra pet...@infradead.org Cc: Ralf Baechle r...@linux-mips.org Signed-off-by: Ingo Molnar mi...@kernel.org --- arch/s390/include/asm/pgtable.h | 13 + 1 file changed, 13 insertions(+) Index: tip/arch/s390/include/asm

[PATCH 02/31] sched, numa, mm: Describe the NUMA scheduling problem formally

2012-10-25 Thread Peter Zijlstra
This is probably a first: formal description of a complex high-level computing problem, within the kernel source. Signed-off-by: Peter Zijlstra a.p.zijls...@chello.nl Cc: Linus Torvalds torva...@linux-foundation.org Cc: Andrew Morton a...@linux-foundation.org Cc: Peter Zijlstra a.p.zijls

[PATCH 06/31] mm: Only flush the TLB when clearing an accessible pte

2012-10-25 Thread Peter Zijlstra
From: Rik van Riel r...@redhat.com If ptep_clear_flush() is called to clear a page table entry that is accessible anyway by the CPU, eg. a _PAGE_PROTNONE page table entry, there is no need to flush the TLB on remote CPUs. Signed-off-by: Rik van Riel r...@redhat.com Signed-off-by: Peter Zijlstra

[PATCH 04/31] x86/mm: Introduce pte_accessible()

2012-10-25 Thread Peter Zijlstra
flushes for pages that are not actually accessible. Signed-off-by: Rik van Riel r...@redhat.com Signed-off-by: Peter Zijlstra a.p.zijls...@chello.nl Cc: Linus Torvalds torva...@linux-foundation.org Cc: Andrew Morton a...@linux-foundation.org Cc: Peter Zijlstra a.p.zijls...@chello.nl Signed-off
