On Tue, 2012-09-25 at 16:06 +0530, Viresh Kumar wrote:
+/* sched-domain levels */
+#define SD_SIBLING 0x01 /* Only for CONFIG_SCHED_SMT */
+#define SD_MC 0x02 /* Only for CONFIG_SCHED_MC */
+#define SD_BOOK 0x04 /* Only for
On Tue, 2012-09-25 at 10:39 +0800, Tang Chen wrote:
We do this because nr_node_ids changed, right? This means the entire
distance table grew/shrunk, which means we should do the level scan
again.
It seems that nr_node_ids will not change once the system is up.
I'm not quite sure. If I am
On Tue, 2012-09-25 at 12:44 +0200, Stephane Eranian wrote:
Hi,
I don't understand why the local variable box needs to
be declared static here:
static struct intel_uncore_box *
uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int cpu)
{
static struct intel_uncore_box *box;
On Tue, 2012-09-25 at 16:06 +0530, Viresh Kumar wrote:
@@ -1066,8 +1076,9 @@ int queue_work(struct workqueue_struct *wq,
struct work_struct *work)
{
int ret;
- ret = queue_work_on(get_cpu(), wq, work);
- put_cpu();
+ preempt_disable();
+ ret =
On Thu, 2012-09-20 at 13:03 -0400, Vince Weaver wrote:
One additional complication: some of the cache events map to
event 0. This causes problems because the generic events code
assumes 0 means not-available. I'm not sure the best way to address
that problem.
For all except P4 we could
On Tue, 2012-09-25 at 17:00 +0530, Viresh Kumar wrote:
But this is what the initial idea during LPC we had.
Yeah.. that's true.
Any improvements here you can suggest?
We could uhm... /me tries thinking ... reuse some of the NOHZ magic?
Would that be sufficient, not waking a NOHZ cpu, or do
On Tue, 2012-09-25 at 13:40 +0200, Peter Zijlstra wrote:
On Tue, 2012-09-25 at 17:00 +0530, Viresh Kumar wrote:
But this is what the initial idea during LPC we had.
Yeah.. that's true.
Any improvements here you can suggest?
We could uhm... /me tries thinking ... reuse some
On Tue, 2012-09-25 at 19:45 +0800, Tang Chen wrote:
Let's have an example here.
sched_init_numa()
{
...
// A loop set sched_domains_numa_levels to level.-1
// I set sched_domains_numa_levels to 0.
sched_domains_numa_levels =
On Mon, 2012-09-24 at 19:11 -0700, Linus Torvalds wrote:
In the not-so-distant past, we had the intel Dunnington Xeon, which
was iirc basically three Core 2 duo's bolted together (ie three
clusters of two cores sharing L2, and a fully shared L3). So that was
a true multi-core with fairly big
On Tue, 2012-09-25 at 15:42 +0400, Cyrill Gorcunov wrote:
Guys, letme re-read this whole mail thread first since I have no clue
what this remapping about ;)
x86_setup_perfctr() / set_ext_hw_attr() have special purposed 0 and -1
config values to mean -ENOENT and -EINVAL resp.
This means
On Tue, 2012-09-25 at 14:23 +0100, Mel Gorman wrote:
It crashes on boot due to the fact that you created a function-scope variable
called sd_llc in select_idle_sibling() and shadowed the actual sd_llc you
were interested in.
D'0h!
--
To unsubscribe from this list: send the line unsubscribe
On Wed, 2012-09-26 at 15:20 +0200, Andrew Jones wrote:
Wouldn't a clean solution be to promote a task's scheduler
class to the spinner class when we PLE (or come from some special
syscall
for userspace spinlocks?)?
Userspace spinlocks are typically employed to avoid syscalls..
That class
On Wed, 2012-09-26 at 15:39 +0200, Andrew Jones wrote:
On Wed, Sep 26, 2012 at 03:26:11PM +0200, Peter Zijlstra wrote:
On Wed, 2012-09-26 at 15:20 +0200, Andrew Jones wrote:
Wouldn't a clean solution be to promote a task's scheduler
class to the spinner class when we PLE (or come from
On Wed, 2012-09-26 at 11:19 -0700, Linus Torvalds wrote:
For example, it starts with the maximum target scheduling domain, and
works its way in over the scheduling groups within that domain. What
the f*ck is the logic of that kind of crazy thing? It never makes
sense to look at a biggest
On Wed, 2012-09-26 at 22:12 -0700, tip-bot for Peter Zijlstra wrote:
Commit-ID: 5d18023294abc22984886bd7185344e0c2be0daf
Gitweb: http://git.kernel.org/tip/5d18023294abc22984886bd7185344e0c2be0daf
Author: Peter Zijlstra pet...@infradead.org
AuthorDate: Mon, 20 Aug 2012 11:26:57 +0200
On Thu, 2012-09-27 at 09:48 -0700, da...@lang.hm wrote:
I think you are being too smart for your own good. You don't know if it's
best to move them further apart or not.
Well yes and no.. You're right, however in general the load-balancer has
always tried to not use (SMT) siblings whenever
On Thu, 2012-09-27 at 10:45 -0700, da...@lang.hm wrote:
But I thought that this conversation (pgbench) was dealing with long
running processes,
Ah, I think we've got a confusion on long vs short.. yes pgbench is a
long-running process, however the tasks might not be long in runnable
state. Ie
On Thu, 2012-09-27 at 11:19 -0700, Linus Torvalds wrote:
On Thu, Sep 27, 2012 at 11:05 AM, Borislav Petkov b...@alien8.de wrote:
On Thu, Sep 27, 2012 at 10:44:26AM -0700, Linus Torvalds wrote:
Or could we just improve the heuristics. What happens if the
scheduling granularity is increased,
On Wed, 2012-10-03 at 15:13 +0200, Jiri Olsa wrote:
@@ -1190,8 +1191,8 @@ static inline void perf_sample_data_init(struct
perf_sample_data *data,
data->raw = NULL;
data->br_stack = NULL;
data->period = period;
- data->regs_user.abi = PERF_SAMPLE_REGS_ABI_NONE;
-
On Wed, 2012-10-03 at 11:14 -0400, Steven Rostedt wrote:
Yep. I personally never use the get_maintainers script. I first check
the MAINTAINERS file. If the subsystem I'm working on exists there, I
only email those that are listed there, including any mailing lists that
are mentioned (as well
On Thu, 2012-10-04 at 01:05 +0200, Andrea Righi wrote:
+++ b/kernel/sched/core.c
@@ -727,15 +727,17 @@ static void dequeue_task(struct rq *rq, struct
task_struct *p, int flags)
void activate_task(struct rq *rq, struct task_struct *p, int flags)
{
if (task_contributes_to_load(p))
7fdba1ca10462f42ad2246b918fe6368f5ce488e
Author: Peter Zijlstra a.p.zijls...@chello.nl
Date: Fri Jul 22 13:41:54 2011 +0200
perf, x86: Avoid kfree() in CPU_STARTING
On -rt kfree() can schedule, but CPU_STARTING is before the CPU is
fully up and running. These are contradictory, so avoid it. Instead
push
On Thu, 2012-10-04 at 11:43 +0200, Andrea Righi wrote:
Right, the update must be atomic to have a coherent nr_uninterruptible
value. And AFAICS the only way to account a coherent
nr_uninterruptible
value per-cpu is to go with atomic ops... mmh... I'll think more on
this.
You could stick
On Tue, 2012-09-25 at 21:12 +0800, Tang Chen wrote:
+static int sched_domains_numa_masks_update(struct notifier_block
*nfb,
+ unsigned long action,
+ void *hcpu)
+{
+ int cpu = (int)hcpu;
On Thu, 2012-10-04 at 14:41 +0200, Avi Kivity wrote:
Again the numbers are ridiculously high for arch_local_irq_restore.
Maybe there's a bad perf/kvm interaction when we're injecting an
interrupt, I can't believe we're spending 84% of the time running the
popf instruction.
Smells like a
On Thu, 2012-10-04 at 10:46 -0700, Greg Kroah-Hartman wrote:
On Thu, Oct 04, 2012 at 12:11:01PM +0800, Huacai Chen wrote:
Hi, Greg
I found that Linux-3.5.5 accept this commit sched: Add missing call
to calc_load_exit_idle() but I think this isn't needed. Because
5167e8d5417b
On Thu, 2012-10-04 at 15:27 -0700, Greg Kroah-Hartman wrote:
I'm puzzled as well. Any ideas if I should do anything here or not?
So I think the current v3.5.5 code is fine. I'm just not smart enough to
figure out how 3.6 got fuzzed, this git thing is confusing as hell.
On Fri, 2012-10-05 at 12:36 +0200, Stephane Eranian wrote:
struct perf_event_attr attr = { .config = 0x1234, .config1 = 0x456 };
Does anyone have a better solution to propose?
struct perf_event_attr attr = {
.config = 0x1234,
{ .config1 = 0x5678 },
};
sometimes works,
On Fri, 2012-10-05 at 10:10 -0700, Jonathan Nieder wrote:
Peter Zijlstra wrote:
On Thu, 2012-10-04 at 15:27 -0700, Greg Kroah-Hartman wrote:
I'm puzzled as well. Any ideas if I should do anything here or not?
So I think the current v3.5.5 code is fine.
Now I'm puzzled. You wrote
On Sat, 2012-10-06 at 09:39 +0200, Ingo Molnar wrote:
Thanks Ingo! Paul,
tip/kernel/sched/fair.c | 28 ++--
1 file changed, 18 insertions(+), 10 deletions(-)
Index: tip/kernel/sched/fair.c
===
---
On Wed, 2012-10-17 at 20:29 -0700, David Rientjes wrote:
Ok, thanks for the update. I agree that we should be clearing the mapping
at node hot-remove since any cpu that would subsequently get onlined and
assume one of the previous cpu's ids is not guaranteed to have the same
affinity.
On Thu, 2012-10-18 at 17:20 -0400, Rik van Riel wrote:
Having the function name indicate what the function is used
for makes the code a little easier to read. Furthermore,
the fault handling code largely consists of do_*_page
functions.
I don't much care either way, but I was thinking
should
probably be rewritten once we figure out the final details of
what the NUMA code needs to do, and why.
Signed-off-by: Rik van Riel r...@redhat.com
Acked-by: Peter Zijlstra a.p.zijls...@chello.nl
Thanks Rik!
On Thu, 2012-10-18 at 15:28 -0400, Mikulas Patocka wrote:
On Thu, 18 Oct 2012, Oleg Nesterov wrote:
Ooooh. And I just noticed include/linux/percpu-rwsem.h which does
something similar. Certainly it was not in my tree when I started
this patch... percpu_down_write() doesn't allow
On Fri, 2012-10-19 at 01:21 -0400, Dave Jones wrote:
Not sure why you are CC'ing a call site, rather than the maintainers of
the code. Just looks like lockdep is using too small a static value.
Though it is pretty darn large...
You're right, it's a huge chunk of memory.
It looks like
in sched_autogroup_create_attach().
Reported-by: cwillu cwi...@cwillu.com
Reported-by: Luis Henriques luis.henriq...@canonical.com
Signed-off-by: Xiaotian Feng dannyf...@tencent.com
Cc: Ingo Molnar mi...@redhat.com
Cc: Peter Zijlstra pet...@infradead.org
---
kernel/sched/auto_group.c | 10
On Wed, 2012-10-17 at 11:35 -0400, Vince Weaver wrote:
This is by accident; it looks like the code does
val |= ARCH_PERFMON_EVENTSEL_ENABLE;
in p6_pmu_disable_event() so that events are never truly disabled
(is this a bug? should it be &= ~ instead?).
I think that's on purpose.. from
On Thu, 2012-10-18 at 11:32 +0400, Vladimir Davydov wrote:
1) Do you agree that the problem exists and should be sorted out?
This is two questions.. yes it exists, I'm absolutely sure I pointed it
out as soon as people even started talking about this nonsense (bw
cruft).
Should it be sorted,
On Thu, 2012-10-18 at 09:40 -0400, Steven Rostedt wrote:
Peter,
There was a little conflict with my merge of 3.4.14 due to the backport
of this patch:
commit 947ca1856a7e60aa6d20536785e6a42dff25aa6e
Author: Michael Wang wang...@linux.vnet.ibm.com
Date: Wed Sep 5 10:33:18 2012 +0800
On Fri, 2012-10-19 at 09:51 -0400, Johannes Weiner wrote:
Of course I'm banging my head into a wall for not seeing earlier
through the existing migration path how easy this could be.
There's a reason I keep promoting the idea of 'someone' rewriting all
that page-migration code :-) I forever
On Fri, 2012-10-19 at 09:51 -0400, Johannes Weiner wrote:
Right now, unlike the traditional migration path, this breaks COW for
every migration, but maybe you don't care about shared pages in the
first place. And fixing that should be nothing more than grabbing the
anon_vma lock and using
On Fri, 2012-10-19 at 09:51 -0400, Johannes Weiner wrote:
It's slightly ugly that migrate_page_copy() actually modifies the
existing page (deactivation, munlock) when you end up having to revert
back to it.
The worst is actually calling copy_huge_page() on a THP.. it seems to
work though ;-)
On Fri, 2012-10-19 at 16:52 +0200, Stephane Eranian wrote:
-modifier_event [ukhpGH]{1,8}
+modifier_event [ukhpGHx]{1,8}
wouldn't the max modifier string length grow by adding another possible
modifier?
On Fri, 2012-10-19 at 16:52 +0200, Stephane Eranian wrote:
+static int intel_pebs_aliases_snb(struct perf_event *event)
+{
+ u64 cfg = event->hw.config;
+ /*
+ * for INST_RETIRED.PREC_DIST to work correctly with PEBS, it must
+ * be measured alone on SNB (exclusive
On Fri, 2012-10-19 at 11:53 -0400, Rik van Riel wrote:
If we do need the extra refcount, why is normal
page migration safe? :)
Its mostly a matter of how convoluted you make the code, regular page
migration is about as bad as you can get
Normal does:
follow_page(FOLL_GET) +1
On Fri, 2012-10-19 at 18:31 +0200, Stephane Eranian wrote:
On Fri, Oct 19, 2012 at 6:27 PM, Peter Zijlstra pet...@infradead.org wrote:
On Fri, 2012-10-19 at 16:52 +0200, Stephane Eranian wrote:
+static int intel_pebs_aliases_snb(struct perf_event *event)
+{
+ u64 cfg = event
On Fri, 2012-10-19 at 11:32 -0400, Mikulas Patocka wrote:
So if you can do an alternative implementation without RCU, show it.
Uhm,,. no that's not how it works. You just don't push through crap like
this and then demand someone else does it better.
But using preempt_{disable,enable} and using
On Fri, 2012-10-19 at 13:13 -0400, Rik van Riel wrote:
Would it make sense to have the normal page migration code always
work with the extra refcount, so we do not have to introduce a new
MIGRATE_FAULT migration mode?
On the other hand, compaction does not take the extra reference...
On Thu, 2012-10-18 at 17:02 +0200, Ralf Baechle wrote:
CC mm/huge_memory.o
mm/huge_memory.c: In function ‘do_huge_pmd_prot_none’:
mm/huge_memory.c:789:3: error: incompatible type for argument 3 of
‘update_mmu_cache’
That appears to have become update_mmu_cache_pmd(), which makes sense
On Mon, 2012-10-22 at 14:11 +0800, Yan, Zheng wrote:
+ /* LBR callstack does not work well with FREEZE_LBRS_ON_PMI */
+ if (!cpuc->lbr_sel || !(cpuc->lbr_sel->config & LBR_CALL_STACK))
+ debugctl |= DEBUGCTLMSR_FREEZE_LBRS_ON_PMI;
How useful it is without this? How many
On Mon, 2012-10-22 at 14:11 +0800, Yan, Zheng wrote:
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -160,8 +160,9 @@ enum perf_branch_sample_type {
PERF_SAMPLE_BRANCH_ABORT = 1U << 7, /* transaction aborts */
PERF_SAMPLE_BRANCH_INTX
On Sat, 2012-10-20 at 12:22 -0400, Frederic Weisbecker wrote:
+ if (empty) {
+ /*
+* If an IPI is requested, raise it right away. Otherwise wait
+* for the next tick unless it's stopped. Now if the arch uses
+* some other
On Sun, 2012-10-21 at 05:56 -0700, tip-bot for Andrea Arcangeli wrote:
In get_user_pages_fast() the TLB shootdown code can clear the pagetables
before firing any TLB flush (the page can't be freed until the TLB
flushing IPI has been delivered but the pagetables will be cleared well
before
On Sat, 2012-10-20 at 21:06 +0200, Andrea Righi wrote:
@@ -383,13 +383,7 @@ struct rq {
struct list_head leaf_rt_rq_list;
#endif
+ unsigned long __percpu *nr_uninterruptible;
This is O(nr_cpus^2) memory..
+unsigned long nr_uninterruptible_cpu(int cpu)
+{
+
On Mon, 2012-10-22 at 14:55 +0300, Dan Carpenter wrote:
Hello Peter Zijlstra,
The patch 3d049f8a5398: sched, numa, mm: Implement constant, per
task Working Set Sampling (WSS) rate from Oct 14, 2012, leads to the
following warning:
kernel/sched/fair.c:954 task_numa_work()
error: we
On Mon, 2012-10-22 at 17:44 +0200, Stephane Eranian wrote:
I know the answer, because I know what's going on under the
hood. But what about the average user?
I'm still wondering if the avg user really thinks 'instructions' is a
useful metric for other than obtaining ipc measurements.
The
On Mon, 2012-10-22 at 18:08 +0200, Stephane Eranian wrote:
I'm still wondering if the avg user really thinks 'instructions' is
a
useful metric for other than obtaining ipc measurements.
Yeah, for many users CPI (or IPC) is a useful metric.
Right but you don't get that using instruction
On Tue, 2012-10-23 at 13:41 +0800, Yan, Zheng wrote:
On 10/22/2012 06:35 PM, Peter Zijlstra wrote:
On Mon, 2012-10-22 at 14:11 +0800, Yan, Zheng wrote:
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -160,8 +160,9 @@ enum perf_branch_sample_type
/ bzImage
Signed-off-by: Peter Zijlstra a.p.zijls...@chello.nl
---
tools/perf/builtin-stat.c | 42 --
1 files changed, 36 insertions(+), 6 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 93b9011..6888960 100644
On Thu, 2012-10-18 at 16:19 -0700, Andi Kleen wrote:
+struct event_constraint intel_hsw_pebs_event_constraints[] = {
+ INTEL_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PRECDIST */
+ INTEL_UEVENT_CONSTRAINT(0x01c2, 0xf), /* UOPS_RETIRED.ALL */
+
On Thu, 2012-10-18 at 16:19 -0700, Andi Kleen wrote:
@@ -826,7 +827,8 @@ static inline bool intel_pmu_needs_lbr_smpl(struct
perf_event *event)
return true;
/* implicit branch sampling to correct PEBS skid */
- if (x86_pmu.intel_cap.pebs_trap
On Thu, 2012-10-18 at 16:19 -0700, Andi Kleen wrote:
+ if (event->attr.sample_type & PERF_SAMPLE_RAW) {
+ raw.size = x86_pmu.pebs_record_size;
+ raw.data = __pebs;
+ data.raw = &raw;
+ }
+ }
The Changelog babbles about registers, yet you export
On Thu, 2012-10-18 at 16:19 -0700, Andi Kleen wrote:
From: Andi Kleen a...@linux.intel.com
This is not arch perfmon, but older CPUs will just ignore it. This makes
it possible to do at least some TSX measurements from a KVM guest
Please, always CC people who wrote the code as well, in this
On Thu, 2012-10-18 at 16:19 -0700, Andi Kleen wrote:
Haswell has two additional LBR from flags for TSX: intx and abort, implemented
as a new v4 version of the PEBS record.
s/PEBS record/LBR format/ I presume ;-)
On Thu, 2012-10-18 at 16:19 -0700, Andi Kleen wrote:
+ BRANCH_OPT(abort, PERF_SAMPLE_BRANCH_ABORT),
+ BRANCH_OPT(intx, PERF_SAMPLE_BRANCH_INTX),
+ BRANCH_OPT(notx, PERF_SAMPLE_BRANCH_NOTX),
I think we want tx in the abort name, its very much a transaction abort,
not any
On Thu, 2012-10-18 at 16:19 -0700, Andi Kleen wrote:
@@ -1079,6 +1079,17 @@ static void intel_pmu_enable_event(struct perf_event
*event)
int intel_pmu_save_and_restart(struct perf_event *event)
{
x86_perf_event_update(event);
+ /*
+* For a checkpointed counter
On Thu, 2012-10-18 at 16:19 -0700, Andi Kleen wrote:
+ /* XXX move somewhere else. */
+ if (cpuc->events[2] && (cpuc->events[2]->hw.config & HSW_INTX_CHECKPOINTED))
+ status |= (1ULL << 2);
A comment explaining about those 'spurious' PMIs would go along with
this nicely,
On Thu, 2012-10-18 at 16:19 -0700, Andi Kleen wrote:
@@ -601,6 +602,7 @@ static inline void perf_sample_data_init(struct
perf_sample_data *data,
data->regs_user.abi = PERF_SAMPLE_REGS_ABI_NONE;
data->regs_user.regs = NULL;
data->stack_user_size = 0;
+ data->weight
On Thu, 2012-10-18 at 16:19 -0700, Andi Kleen wrote:
From: Andi Kleen a...@linux.intel.com
When a weighted sample is requested, first try to report the TSX abort cost
on Haswell. If that is not available report the memory latency. This
allows profiling both by abort cost and by memory
On Tue, 2012-10-23 at 15:30 +0200, Andi Kleen wrote:
Also, there's an alignment issue there, the raw.data is 32bit offset,
the record is u64 aligned, leaving the output stream offset, wrecking
things.
Can you explain more? Not sure I understand.
PERF_SAMPLE_RAW has a u32 size header and
On Wed, 2012-10-10 at 14:53 +0200, Jiri Olsa wrote:
arch/x86/kernel/cpu/perf_event.c | 121
+
arch/x86/kernel/cpu/perf_event.h | 2 ++
arch/x86/kernel/cpu/perf_event_amd.c | 9 +++
On Sat, 2012-10-20 at 16:33 +0200, Jiri Olsa wrote:
It's possible some of the counters in the group could be
disabled when sampling member of the event group is reading
the rest via PERF_SAMPLE_READ sample type processing. Disabled
counters could then produce wrong numbers.
Fixing that by
On Mon, 2012-10-22 at 19:37 -0400, Mikulas Patocka wrote:
- /*
-* On X86, write operation in this_cpu_dec serves as a memory unlock
-* barrier (i.e. memory accesses may be moved before the write, but
-* no memory accesses are moved past the write).
-* On
On Tue, 2012-10-23 at 21:23 +0200, Oleg Nesterov wrote:
I have to admit, I have
no idea how much cli/sti is slower compared to preempt_disable/enable.
A lot.. esp on stupid hardware (insert pentium-4 reference), but I think
its more expensive for pretty much all hardware, preempt_disable() is
On Tue, 2012-10-23 at 21:23 +0200, Oleg Nesterov wrote:
static void mb_ipi(void *arg)
{
smp_mb(); /* unneeded ? */
}
static void force_mb_on_each_cpu(void)
{
smp_mb();
smp_call_function(mb_ipi,
On Wed, 2012-10-24 at 01:59 +0400, Cyrill Gorcunov wrote:
[ilog2(VM_WRITE)] = { {'w', 'r'} },
since we're being awfully positive about crazy late night ideas, how
about something like:
#define MNEM(_VM, _mn) [ilog2(_VM)] = {(const char [2]){_mn}}
MNEM(VM_WRITE,
On Wed, 2012-10-24 at 12:45 +0400, Cyrill Gorcunov wrote:
for (i = 0; i < BITS_PER_LONG; i++) {
- if (vma->vm_flags & (1 << i))
+ if (vma->vm_flags & (1ul << i)) {
for_each_set_bit(i, &vma->vm_flags, BITS_PER_LONG) {
seq_printf(m, "%c%c
On Wed, 2012-10-24 at 17:25 +0800, Huacai Chen wrote:
We found poweroff sometimes fails on our computers, so we have the
lock debug options configured. Then, when we do poweroff or take a
cpu down via cpu-hotplug, kernel complain as below. To resove this,
we modify sched_ttwu_pending(),
On Tue, 2012-10-23 at 18:50 +0200, Jiri Olsa wrote:
On Tue, Oct 23, 2012 at 06:13:09PM +0200, Peter Zijlstra wrote:
On Sat, 2012-10-20 at 16:33 +0200, Jiri Olsa wrote:
It's possible some of the counters in the group could be
disabled when sampling member of the event group is reading
On Tue, 2012-10-16 at 11:31 -0700, Sukadev Bhattiprolu wrote:
On a side note, how does the kernel on x86 use the 'config' information in
say /sys/bus/event_source/devices/cpu/format/cccr ? On Power7, the raw
code encodes the information such as the PMC to use for the event. Is that
how the
On Wed, 2012-10-24 at 14:14 +0200, Jiri Olsa wrote:
well, x86_pmu_read calls x86_perf_event_update, which expects the
event
is active.. if it's not it'll update the count from whatever left in
event.hw.idx counter.. could be uninitialized or used by others..
Oh right, we shouldn't call
On Wed, 2012-10-24 at 20:34 +0800, 陈华才 wrote:
I see, this is an arch-specific bug, sorry for my carelessness and thank
you for your tips.
What arch are you using? And what exactly did the arch do wrong? Most of
the code involved seems to be common code.
Going by c0_compare_interrupt, this is
On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote:
If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will
return -1. As a result, cpumask_of_node(nid) will return NULL. In this case,
find_next_bit() in for_each_cpu will get a NULL pointer and cause panic.
Hurm,. this is
On Mon, 2012-10-08 at 14:38 +0200, Oleg Nesterov wrote:
But the code looks more complex, and the only advantage is that
non-exiting task does xchg() instead of cmpxchg(). Not sure this
worth the trouble, in this case task_work_run() will likey run
the callbacks (the caller checks ->task_works
On Tue, 2012-10-09 at 06:37 -0700, Andi Kleen wrote:
Ivo Sieben meltedpiano...@gmail.com writes:
Check the waitqueue task list to be non empty before entering the critical
section. This prevents locking the spin lock needlessly in case the queue
was empty, and therefore also prevents
On Tue, 2012-10-09 at 17:38 +0200, Andre Przywara wrote:
First you need an AMD family 10h/12h CPU. These do not reset the
PERF_CTR registers on a reboot.
Now you boot bare metal Linux, which goes successfully through this
check, but leaves the magic value of 0xabcd in the register. You
don't
On Tue, 2012-10-09 at 13:36 -0700, David Rientjes wrote:
On Tue, 9 Oct 2012, Peter Zijlstra wrote:
On Mon, 2012-10-08 at 10:59 +0800, Tang Chen wrote:
If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will
return -1. As a result, cpumask_of_node(nid) will return NULL
On Tue, 2012-10-09 at 16:27 -0700, David Rientjes wrote:
On Tue, 9 Oct 2012, Peter Zijlstra wrote:
Well the code they were patching is in the wakeup path. As I think Tang
said, we leave !runnable tasks on whatever cpu they ran on last, even if
that cpu is offlined, we try and fix up state
On Wed, 2012-10-10 at 17:33 +0800, Wen Congyang wrote:
Hmm, if per-cpu memory is preserved, and we can't offline and remove
this memory. So we can't offline the node.
But, if the node is hot added, and per-cpu memory doesn't use the
memory on this node. We can hotremove cpu/memory on this
On Wed, 2012-10-10 at 18:10 +0800, Wen Congyang wrote:
I use ./scripts/get_maintainer.pl, and it doesn't tell me that I should cc
you when I post that patch.
That script doesn't look at all usage sites of the code you modify does
it?
You need to audit the entire tree for usage of the
On Wed, 2012-10-10 at 13:29 +0100, Mel Gorman wrote:
Do we really switch more though?
Look at the difference in interrupts vs context switch. IPIs are an interrupt
so if TTWU_QUEUE wakes process B using an IPI, does that count as a context
switch?
Nope. Nor would it for NO_TTWU_QUEUE. A
On Wed, 2012-10-10 at 14:53 +0200, Jiri Olsa wrote:
+static ssize_t amd_event_sysfs_show(char *page, u64 config)
+{
+ u64 event = (config & ARCH_PERFMON_EVENTSEL_EVENT) |
+ (config & AMD64_EVENTSEL_EVENT) >> 24;
+
+ return x86_event_sysfs_show(page, config, event);
On Wed, 2012-10-10 at 16:25 +0200, Jiri Olsa wrote:
On Wed, Oct 10, 2012 at 04:11:42PM +0200, Peter Zijlstra wrote:
On Wed, 2012-10-10 at 14:53 +0200, Jiri Olsa wrote:
+static ssize_t amd_event_sysfs_show(char *page, u64 config)
+{
+ u64 event = (config
On Wed, 2012-10-10 at 17:44 +0200, Simon Klinkert wrote:
I'm just wondering if the 'load' is really meaningful in this
scenario. The machine is the whole time fully responsive and looks
fine to me but maybe I didn't understand correctly what the load
should mean. Is there any sensible
On Wed, 2012-10-10 at 19:50 +0200, Oleg Nesterov wrote:
But you did not answer, and I am curious. What was your original
motivation? Is xchg really faster than cmpxchg?
And is this true over multiple architectures? Or are we optimizing for
x86_64 (again) ?
On Wed, 2012-10-24 at 17:08 -0700, David Rientjes wrote:
Ok, this looks the same but it's actually a different issue:
mpol_misplaced(), which now only exists in linux-next and not in 3.7-rc2,
calls get_vma_policy() which may take the shared policy mutex. This
happens while holding
...@de.ibm.com
Cc: Heiko Carstens heiko.carst...@de.ibm.com
Cc: Peter Zijlstra pet...@infradead.org
Cc: Ralf Baechle r...@linux-mips.org
Signed-off-by: Ingo Molnar mi...@kernel.org
---
arch/s390/include/asm/pgtable.h | 13 +
1 file changed, 13 insertions(+)
Index: tip/arch/s390/include/asm
This is probably a first: formal description of a complex high-level
computing problem, within the kernel source.
Signed-off-by: Peter Zijlstra a.p.zijls...@chello.nl
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Andrew Morton a...@linux-foundation.org
Cc: Peter Zijlstra a.p.zijls
From: Rik van Riel r...@redhat.com
If ptep_clear_flush() is called to clear a page table entry that is
accessible anyway by the CPU, eg. a _PAGE_PROTNONE page table entry,
there is no need to flush the TLB on remote CPUs.
Signed-off-by: Rik van Riel r...@redhat.com
Signed-off-by: Peter Zijlstra
flushes for pages that are not actually accessible.
Signed-off-by: Rik van Riel r...@redhat.com
Signed-off-by: Peter Zijlstra a.p.zijls...@chello.nl
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Andrew Morton a...@linux-foundation.org
Cc: Peter Zijlstra a.p.zijls...@chello.nl
Signed-off