racy jump label users

2013-03-22 Thread Andi Kleen
Jason, I noticed that a lot of the jump label users are racy, because they implement something like this static void sched_feat_disable(int i) { if (static_key_enabled(&sched_feat_keys[i])) static_key_slow_dec(&sched_feat_keys[i]); } static void sched_feat_enable(int i) {
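
The check-then-decrement pattern quoted above can be illustrated with a small userspace model. This is only a sketch: `struct toy_key` and every `toy_*` function are hypothetical stand-ins, not the kernel's static key API, and the atomic-exchange variant is one possible way to close the window, not necessarily what the thread went on to propose.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a static key plus its feature flag. */
struct toy_key {
	atomic_int count;    /* models the key's enable count */
	atomic_bool wanted;  /* models whether the feature is switched on */
};

static bool toy_key_enabled(struct toy_key *k)
{
	return atomic_load(&k->count) > 0;
}

/*
 * Racy shape: the test and the decrement are two separate steps, so two
 * concurrent callers can both see "enabled" and both decrement.
 */
static void toy_feat_disable_racy(struct toy_key *k)
{
	if (toy_key_enabled(k))
		atomic_fetch_sub(&k->count, 1);
}

/*
 * One possible fix (an assumption of this sketch): the atomic exchange
 * picks exactly one caller that observes the on -> off transition.
 */
static void toy_feat_disable(struct toy_key *k)
{
	if (atomic_exchange(&k->wanted, false))
		atomic_fetch_sub(&k->count, 1);
}

static void toy_feat_enable(struct toy_key *k)
{
	if (!atomic_exchange(&k->wanted, true))
		atomic_fetch_add(&k->count, 1);
}
```

With the exchange-based pair, calling enable or disable twice in a row changes the count only once, regardless of how the calls interleave.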

[PATCH 10/29] locking, tsx: Add support for arch_spin_unlock_irq/flags

2013-03-22 Thread Andi Kleen
From: Andi Kleen The TSX RTM lock elision code needs to distinguish spin unlocks that reenable interrupts from others. Currently this is hidden in the higher level spinlock code. Add support for calling an architecture specific arch_spin_unlock_flags/irq if available. Signed-off-by: Andi Kleen

[PATCH 25/29] x86, tsx: Add adaption support for spinlocks

2013-03-22 Thread Andi Kleen
From: Andi Kleen Add adaptation support for ticket spinlocks. Each spinlock keeps a skip count on how often to skip elision. This is controlled by the abort rate. The actual adaptation algorithm is generic and shared with other lock types. This avoids us having to tune each spinlock individually
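
A per-lock skip count of the kind described above might look roughly like the following. This is a sketch under assumptions: `struct toy_adapt`, the `toy_*` helpers and the constant `TOY_SKIP_AFTER_ABORT` are invented for illustration, and the real patch's policy and tunables are not visible in this snippet.

```c
#include <stdbool.h>

/* Hypothetical tunable: how many acquisitions to skip after an abort. */
#define TOY_SKIP_AFTER_ABORT 3

/* Hypothetical per-lock adaptation state. */
struct toy_adapt {
	unsigned char skip;  /* remaining acquisitions that must not elide */
};

/* Returns true when this acquisition should attempt elision. */
static bool toy_should_elide(struct toy_adapt *a)
{
	if (a->skip > 0) {
		a->skip--;
		return false;
	}
	return true;
}

/* Called when an elided critical section aborted. */
static void toy_note_abort(struct toy_adapt *a)
{
	a->skip = TOY_SKIP_AFTER_ABORT;
}
```

After one abort the next three acquisitions take the lock normally, and the fourth tries elision again.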

[PATCH 09/29] x86, xen: Support arch_spin_unlock_irq/flags

2013-03-22 Thread Andi Kleen
From: Andi Kleen Add simple implementations of arch_spin_unlock_flags/irq to the Xen paravirt spinlock code. Cc: jer...@goop.org Signed-off-by: Andi Kleen --- arch/x86/xen/spinlock.c | 17 + 1 files changed, 17 insertions(+), 0 deletions(-) diff --git a/arch/x86/xen

[PATCH 29/29] tsx: Add documentation for lock-elision

2013-03-22 Thread Andi Kleen
From: Andi Kleen Document the tunables and the statistics in Documentation/lock-elision.txt Signed-off-by: Andi Kleen --- Documentation/lock-elision.txt | 94 1 files changed, 94 insertions(+), 0 deletions(-) create mode 100644 Documentation/lock

[PATCH 28/29] x86, tsx: Add adaptive elision for rwsems

2013-03-22 Thread Andi Kleen
From: Andi Kleen Convert the rwsem elision to be adaptive. This requires adding 4/8 bytes to the rwsem for the adaptation state. The algorithm is the same as for other adaptive lock types. The elision configuration is split for readers and writers. Signed-off-by: Andi Kleen --- arch/x86

[PATCH 17/29] x86, tsx: Enable lock elision for mutexes

2013-03-22 Thread Andi Kleen
From: Andi Kleen We use the generic elision macros and the mutex hook infrastructure added earlier. With that attempt to lock elide mutexes using the elide() macros and the usual elision wrapping pattern. Lock elision does not allow modifying the lock itself, so it's not possible to se

[PATCH 21/29] locking, tsx: Protect assert_spin_locked() with _xtest()

2013-03-22 Thread Andi Kleen
From: Andi Kleen lock_is_locked aborts with lock elision. Some code does a lot of lock asserts, which causes a lot of aborts. Add a _xtest() here so that the checking is only done when the lock is not elided. This always happens occasionally due to fallbacks, so there is still enough assert

[PATCH 19/29] x86, tsx: Add support for rwsem elision

2013-03-22 Thread Andi Kleen
From: Andi Kleen Add the standard elide wrapper macros to rwsems to enable lock elision for rwsems. Main target is mmap_sem. Signed-off-by: Andi Kleen --- arch/x86/include/asm/rwsem.h | 23 ++- arch/x86/kernel/rtm-locks.c |3 +++ include/linux/rwsem.h|3

[PATCH 18/29] locking, tsx: Abort in mutex_is_locked()

2013-03-22 Thread Andi Kleen
From: Andi Kleen Inside an elided mutex we cannot tell if the mutex is really locked or not. Aborting is the safe answer. Callers who frequently abort (e.g. BUG_ONs) need to be fixed separately. Noop without RTM. Signed-off-by: Andi Kleen --- include/linux/mutex.h |2 ++ 1 files changed

[PATCH 16/29] locking, tsx: Allow architecture to control mutex fast path owner field

2013-03-22 Thread Andi Kleen
From: Andi Kleen Elided locks do not allow writing to the lock cache line in the fast path. This would abort the transaction. They do not actually need an owner in the speculative fast path, because they do not actually take the lock. But in the slow path when the lock is taken they actually

[PATCH 13/29] params: Add a per cpu module param type

2013-03-22 Thread Andi Kleen
From: Andi Kleen This is mainly useful for simple statistic counters. Essentially read-only, writing only clears. Cc: ru...@rustcorp.co.au Signed-off-by: Andi Kleen --- include/linux/moduleparam.h |4 kernel/params.c | 28 2 files changed

[PATCH 08/29] locking, tsx: Add support for arch_read/write_unlock_irq/flags

2013-03-22 Thread Andi Kleen
From: Andi Kleen The TSX RTM lock elision code needs to distinguish unlocks that reenable interrupts from others. Add arch_read/write_unlock_irq/flags for rwlocks similar to the ones for spinlocks. This is opt in by the architecture. Signed-off-by: Andi Kleen --- include/linux/rwlock.h

[PATCH 11/29] x86, paravirt: Add support for arch_spin_unlock_flags/irq

2013-03-22 Thread Andi Kleen
From: Andi Kleen Add support for arch_spin_unlock_flags/irq Only done for the paravirt case, non paravirt just uses the wrappers in the upper level code. Cc: jer...@goop.org Signed-off-by: Andi Kleen --- arch/x86/include/asm/paravirt.h | 13 + arch/x86/include/asm

[PATCH 05/29] x86, tsx: Add a minimal RTM tester at bootup

2013-03-22 Thread Andi Kleen
From: Andi Kleen May be removed later, but useful for basic sanity checking. Signed-off-by: Andi Kleen --- arch/x86/kernel/Makefile |2 ++ arch/x86/kernel/rtm-test.c | 22 ++ 2 files changed, 24 insertions(+), 0 deletions(-) create mode 100644 arch/x86/kernel/rtm

[PATCH 03/29] tsx: Add generic disable_txn macros

2013-03-22 Thread Andi Kleen
From: Andi Kleen Add generic macros to disable transactions per process. Without TSX (or other HTM) support this is a noop. An RTM enabled x86 kernel uses its own version. Signed-off-by: Andi Kleen --- include/linux/thread_info.h |6 ++ 1 files changed, 6 insertions(+), 0 deletions

[PATCH 04/29] tsx: Add generic linux/elide.h macros

2013-03-22 Thread Andi Kleen
From: Andi Kleen For lock elision we (mostly) use generic elide() macros that can be added to the lock code with minimal intrusion. Add a generic version that does nothing and is used when RTM is not available. Signed-off-by: Andi Kleen --- include/linux/elide.h | 18 ++ 1

RFC: Kernel lock elision for TSX

2013-03-22 Thread Andi Kleen
/rtm_locks You may need the latest hsw/pmu* branch from git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git - Why does this use RTM and not HLE RTM is more flexible and we don't need HLE in this code. Andi Kleen a...@linux.intel.com Speaking for myself only -- To unsubscribe from

[PATCH 15/29] x86, tsx: Add TSX lock elision infrastructure

2013-03-22 Thread Andi Kleen
From: Andi Kleen Add basic TSX lock elision infrastructure. This is implemented using RTM to give more flexibility. A lock is elided by wrapping an elision check around it: when the lock is free try to speculatively execute the lock region and fall back if that fails. Provide some generic
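
The wrapping pattern described above (try the region speculatively when the lock is free, fall back to the real lock otherwise) can be modelled without TSX hardware. Everything here is a hypothetical sketch: `toy_txn_begin()` stands in for starting a transaction and always reports failure, like a CPU without RTM, and the lock is a plain flag in a single-threaded model.

```c
#include <stdbool.h>

/* Hypothetical lock: a plain flag, single-threaded model only. */
struct toy_lock {
	bool locked;
};

/*
 * Stand-in for starting a hardware transaction. It always fails here,
 * which is how a machine without RTM would behave.
 */
static bool toy_txn_begin(void)
{
	return false;
}

/*
 * Returns true if the critical section runs speculatively (the lock is
 * left untouched), false if the fallback path took the lock for real.
 */
static bool toy_elide_lock(struct toy_lock *l)
{
	if (!l->locked && toy_txn_begin())
		return true;
	l->locked = true;  /* fallback: really acquire */
	return false;
}

static void toy_elide_unlock(struct toy_lock *l, bool elided)
{
	if (!elided)
		l->locked = false;
	/* An elided section would commit its transaction here instead. */
}
```

The caller passes the result of `toy_elide_lock()` back into `toy_elide_unlock()`, so the unlock side knows which path was taken.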

[PATCH 22/29] locking, tsx: Add a trace point for elision skipping

2013-03-22 Thread Andi Kleen
From: Andi Kleen For tuning the adaptive locking algorithms it's useful to trace adaptive elision skipping. Add a trace point for this case. Used in followon patches Signed-off-by: Andi Kleen --- include/trace/events/elision.h | 31 +++ 1 files change

[PATCH 26/29] x86, tsx: Add adaptation support to rw spinlocks

2013-03-22 Thread Andi Kleen
From: Andi Kleen Add elision adaptation state to the rwlocks and use the generic adaptation wrapper. This unfortunately increases the size of the rwlock: by 6 bytes for NR_CPUS>=2048, otherwise by 2 bytes. Signed-off-by: Andi Kleen --- arch/x86/include/asm/rwlock.h |

[PATCH 01/29] tsx: Add generic noop macros for RTM intrinsics

2013-03-22 Thread Andi Kleen
From: Andi Kleen Add generic noop macros (act like transaction aborted) for RTM. The main use case is an occasional _xtest() added to generic code, without needing ifdefs. On x86+RTM this will use real TSX instructions. Signed-off-by: Andi Kleen --- include/linux/rtm.h | 15

[PATCH 27/29] locking, tsx: Add elision to bit spinlocks

2013-03-22 Thread Andi Kleen
From: Andi Kleen Very straight forward. Use the non-adaptive elision wrappers for bit spinlocks. This is useful because they perform very poorly under contention. The functions are a bit on the big side for inlining now, but I kept them inline for now. Signed-off-by: Andi Kleen --- arch/x86

[PATCH 14/29] params: Add static key module param

2013-03-22 Thread Andi Kleen
From: Andi Kleen Add a module param type for uint static keys. Useful for time critical flags. Cc: ru...@rustcorp.co.au Signed-off-by: Andi Kleen --- include/linux/moduleparam.h |6 ++ kernel/params.c | 35 +++ 2 files changed, 41

[PATCH 12/29] x86, tsx: Add a per thread transaction disable count

2013-03-22 Thread Andi Kleen
From: Andi Kleen For some locks that have a low chance of not aborting it's best to just disable transactions. Add a counter to thread_info to allow disabling transactions for the current task. I originally experimented with more complicated solutions, like a magic spinlock type to di
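
A nesting disable counter of this kind can be sketched as below. The names (`struct toy_thread_info`, `notxn`, the `toy_*` helpers) are hypothetical and only model the idea; they are not the fields or macros the patch adds.

```c
#include <stdbool.h>

/* Hypothetical per-task state: a small nesting counter. */
struct toy_thread_info {
	unsigned char notxn;  /* > 0 means transactions are disabled */
};

/* Disable transactions for the current task; calls may nest. */
static void toy_disable_txn(struct toy_thread_info *ti)
{
	ti->notxn++;
}

/* Undo one level of toy_disable_txn(). */
static void toy_reenable_txn(struct toy_thread_info *ti)
{
	ti->notxn--;
}

/* Elision code would check this before starting a transaction. */
static bool toy_txn_allowed(const struct toy_thread_info *ti)
{
	return ti->notxn == 0;
}
```

Because it is a count and not a flag, an inner disable/reenable pair does not accidentally re-enable transactions for an outer region that still wants them off.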

[PATCH 24/29] x86, tsx: Use adaptive elision for mutexes

2013-03-22 Thread Andi Kleen
From: Andi Kleen Add the elision adaption state to struct mutex and use elide_adapt() This allows mutexes that do not elide to automatically disable elision on themselves for some time. This means we have a fail-safe for mutexes that do not elide well and do not need to annotate every mutex

[PATCH 20/29] x86, tsx: Enable elision for read write spinlocks

2013-03-22 Thread Andi Kleen
From: Andi Kleen rwspinlocks don't support paravirt ops, so add hooks to call lock elision based on the CPUID bit. We use the standard patching, so the overhead in the fast path is low when RTM is not supported. Signed-off-by: Andi Kleen --- arch/x86/include/asm/spinlock.h |

[PATCH 07/29] x86, tsx: Don't abort immediately in __read/write_lock_failed

2013-03-22 Thread Andi Kleen
From: Andi Kleen __read/write_lock_failed did execute a PAUSE first thing before checking the lock. This aborts transactions. Check the lock state again before executing the pause. This avoids a small number of extra aborts, and is slightly cheaper too. Signed-off-by: Andi Kleen --- arch/x86
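
The reordering described above (look at the lock before the first PAUSE) can be shown with a toy wait loop. This is a sketch only: `toy_cpu_relax()` is a hypothetical stand-in that just counts how often a PAUSE would have executed, and the real functions are assembler routines whose bodies are not shown in this snippet.

```c
#include <stdatomic.h>

/* Counts how many times the stand-in PAUSE ran. */
static unsigned long toy_pauses;

/* Hypothetical stand-in for the PAUSE instruction. */
static void toy_cpu_relax(void)
{
	toy_pauses++;
}

/* Old shape: pause first, then look at the lock. */
static void toy_wait_pause_first(atomic_int *lock)
{
	do {
		toy_cpu_relax();
	} while (atomic_load(lock) != 0);
}

/* New shape: look at the lock first, pause only if it is still held. */
static void toy_wait_check_first(atomic_int *lock)
{
	while (atomic_load(lock) != 0)
		toy_cpu_relax();
}
```

When the lock has already become free, the check-first variant executes no pause at all, which is the case that matters inside a transaction.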

[PATCH 23/29] x86, tsx: Add generic per-lock adaptive lock elision support

2013-03-22 Thread Andi Kleen
From: Andi Kleen Extend the elide() macro to support adaptation state per lock. The adaptation keeps track of whether the elision is successful. When the lock aborts due to internal reasons (e.g. it always writes an MSR or always does MMIO) disable elision for some time. The state is kept as a

[PATCH 02/29] x86, tsx: Add RTM intrinsics

2013-03-22 Thread Andi Kleen
From: Andi Kleen This adds the basic RTM (Restricted Transactional Memory) intrinsics for TSX, implemented with alternative() so that they can be transparently used without checking CPUID first. When the CPU does not support TSX we just always jump to the abort handler. These intrinsics are

[PATCH 06/29] checkpatch: Don't warn about if ((status = _xbegin()) == _XBEGIN_STARTED)

2013-03-22 Thread Andi Kleen
From: Andi Kleen Writing _xbegin, which is like setjmp, in an if is very natural. Stop checkpatch's whining about this. Cc: a...@canonical.com Signed-off-by: Andi Kleen --- scripts/checkpatch.pl |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/scripts/checkpatch

Re: [PATCH 12/29] x86, tsx: Add a per thread transaction disable count

2013-03-23 Thread Andi Kleen
> > This surely can be a bitfield like the other two below. It is basically > begging to be one. Bit fields are slower and larger in code and unlike the others this is on hot paths. -Andi -- a...@linux.intel.com -- Speaking for myself only. -- To unsubscribe from this list: send the line "unsub

Re: RFC: Kernel lock elision for TSX

2013-03-23 Thread Andi Kleen
Hi Linux, Thanks. Other code/design review would be still appreciated, even under the current constraints. > The other comment I have is that since it does touch non-x86 header > files etc (although not a lot), you really need to talk to the POWER8 > people about naming of the thing. Calling it

Re: [PATCH 12/29] x86, tsx: Add a per thread transaction disable count

2013-03-23 Thread Andi Kleen
> That said, making it just a byte sounds more than enough. How deep > would anybody want to nest it? Do we care? Byte should be fine. I'll change that. -Andi -- a...@linux.intel.com -- Speaking for myself only. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the b

Re: RFC: Kernel lock elision for TSX

2013-03-23 Thread Andi Kleen
On Sat, Mar 23, 2013 at 07:00:10PM +0100, Andi Kleen wrote: > > Hi Linux, Also I debut on finally making that famous typo too. Sorry. -Andi

Re: [PATCH 02/29] x86, tsx: Add RTM intrinsics

2013-03-25 Thread Andi Kleen
> FYI the TM spec can be downloaded here: > https://www.power.org/documentation/power-isa-transactional-memory/ > > You're example code looks like this: I don't think portable code will use this directly. Note it's in arch/x86/ Generally portable code should use higher level interfaces, like e

Re: [PATCH 01/29] tsx: Add generic noop macros for RTM intrinsics

2013-03-25 Thread Andi Kleen
> RTM == restricted transactional memory. I don't understand why it's > "restricted" and why any other architecture else would call it that and It's restricted as in there is no guarantee that a transaction ever succeeds and always needs a fallback path. My understanding is that this is true for

Re: [PATCH 02/29] x86, tsx: Add RTM intrinsics

2013-03-25 Thread Andi Kleen
> > Well x and tm doesn't really matter, but I already have x* so i'm inclined > > to keep it, unless people bikeshed too strongly. It should work for PPC too. > > Well if you're moving it out of generic code then it doesn't really > matter anymore. The only operations I want generic is _xtest(

Re: kworkers for dm-crypt locked to CPU core 0?

2013-03-25 Thread Andi Kleen
Christian Schmidt writes: > > Is there a way I can make the scheduler put those on multiple cores? Submit the IO from multiple cores. Don't use dd. -Andi -- a...@linux.intel.com -- Speaking for myself only

Re: [PATCH v1 2/2] perf tools: add attr->mmap2 support

2013-08-14 Thread Andi Kleen
On Wed, Aug 14, 2013 at 12:30:08PM +0200, Stephane Eranian wrote: > On Wed, Aug 14, 2013 at 1:34 AM, Andi Kleen wrote: > > On Tue, Aug 13, 2013 at 01:55:57PM +0200, Stephane Eranian wrote: > >> This patch adds support for the new PERF_RECORD_MMAP2 > >> record type exp

Re: [RFC][PATCH 0/5] preempt_count rework

2013-08-14 Thread Andi Kleen
On Wed, Aug 14, 2013 at 03:15:39PM +0200, Peter Zijlstra wrote: > These patches optimize preempt_enable by firstly folding the preempt and > need_resched tests into one -- this should work for all architectures. And > secondly by providing per-arch preempt_count implementations; with x86 using > pe

Re: [RFC][PATCH 0/5] preempt_count rework

2013-08-14 Thread Andi Kleen
On Wed, Aug 14, 2013 at 06:55:05PM +0200, Peter Zijlstra wrote: > On Wed, Aug 14, 2013 at 09:48:27AM -0700, Andi Kleen wrote: > > > FWIW I removed the user_schedule in v2 because I don't need it anymore. > > Feel free to pick it up from v1 though. > > Ah, I had

perf, x86: Add parts of the remaining haswell PMU functionality v2

2013-08-14 Thread Andi Kleen
[v2: Added Peter's changes to the PEBS handler] Add some more TSX functionality to the basic Haswell PMU. A lot of the infrastructure needed for these patches has been merged earlier, so it is all quite straight forward now. - Add the checkpointed counter workaround. (Parts of this have been alr

[PATCH 3/4] perf, x86: Add Haswell TSX event aliases v6

2013-08-14 Thread Andi Kleen
From: Andi Kleen Add TSX event aliases, and export them from the kernel to perf. These are used by perf stat -T and to allow more user friendly access to events. The events are designed to be fairly generic and may also apply to other architectures implementing HTM. They all cover common

[PATCH 4/4] perf, tools: Add perf stat --transaction v3

2013-08-14 Thread Andi Kleen
From: Andi Kleen Add support to perf stat to print the basic transactional execution statistics: Total cycles, Cycles in Transaction, Cycles in aborted transactions using the in_tx and in_tx_checkpoint qualifiers. Transaction Starts and Elision Starts, to compute the average transaction length

[PATCH 2/4] perf, x86: Report TSX transaction abort cost as weight v2

2013-08-14 Thread Andi Kleen
From: Andi Kleen Use the existing weight reporting facility to report the transaction abort cost, that is the number of cycles wasted in aborts. Haswell reports this in the PEBS record. This was in fact the original user for weight. This is a very useful sort key to concentrate on the most

[PATCH 1/4] perf, x86: Avoid checkpointed counters causing excessive TSX aborts v4

2013-08-14 Thread Andi Kleen
From: Andi Kleen With checkpointed counters there can be a situation where the counter is overflowing, aborts the transaction, is set back to a non overflowing checkpoint, causes an interrupt. The interrupt doesn't see the overflow because it has been checkpointed. This is then a spuriou

Re: Re-tune x86 uaccess code for PREEMPT_VOLUNTARY

2013-08-14 Thread Andi Kleen
> I thought I fixed this for good in > commit 114276ac0a3beb9c391a410349bd770653e185ce > Author: Michael S. Tsirkin > Date: Sun May 26 17:32:13 2013 +0300 > mm, sched: Drop voluntary schedule from might_fault() You're right this was an older kernel. So you already fi

Re: page fault scalability (ext3, ext4, xfs)

2013-08-14 Thread Andi Kleen
> And FWIW, it's no secret that XFS has more per-operation overhead > than ext4 through the write path when it comes to allocation, so > it's no surprise that on a workload that is highly dependent on > allocation overhead that ext4 is a bit faster This cannot explain a worse scaling curve tho

Re: [PATCH 1/2] perf x86: Make intel_pmu_enable_all to enable only active events

2013-08-15 Thread Andi Kleen
> > I think its a NOP; this is the global ctrl register but > intel_pmu_disable_event() writes PERFEVTSELx.EN = 0, so even if you > enable it in the global mask, the event should still be disabled. Yes the hardware ANDs the various enable bits in the different registers. -andi -- To unsubscribe

Re: [PATCH 4/4] perf, tools: Add perf stat --transaction v3

2013-08-15 Thread Andi Kleen
> > +/* Default events used for perf stat -T */ > > +static const char * const transaction_attrs[] = { > > + "task-clock", > > + "{" > > + "instructions," > > + "cycles," > > + "cpu/cycles-t/," > > + "cpu/tx-start/," > > + "cpu/el-start/," > > + "cpu/cycles-ct/" > > + "}" > > +};

Re: [PATCH 4/4] perf, tools: Add perf stat --transaction v3

2013-08-15 Thread Andi Kleen
> > * Update various tracking values we maintain to print > > * more semantic information such as miss/hit ratios, > > @@ -283,8 +340,12 @@ static void update_shadow_stats(struct perf_evsel > > *counter, u64 *count) > > update_stats(&runtime_nsecs_stats[0], count[0]); > > else

Re: [PATCH 4/4] perf, tools: Add perf stat --transaction v3

2013-08-15 Thread Andi Kleen
Minor fixes. Signed-off-by: Andi Kleen --- tools/perf/Documentation/perf-stat.txt | 5 ++ tools/perf/builtin-stat.c | 144 - tools/perf/util/evsel.h| 6 ++ tools/perf/util/pmu.c | 16 tools/perf/util/

Re: [RFC PATCH] Fix aio performance regression for database caused by THP

2013-08-15 Thread Andi Kleen
Khalid Aziz writes: > I am working with a tool that simulates oracle database I/O workload. > This tool (orion to be specific - > ) > allocates hugetlbfs pages using shmget() with SHM_HUGETLB flag. Is this tool availab

[PATCH 4/6] Move might_sleep and friends from kernel.h to sched.h

2013-08-16 Thread Andi Kleen
From: Andi Kleen These are really related to scheduling, so they should be in sched.h Users usually will need to schedule anyways. The advantage of having them there is that we can access some of the scheduler inlines to make their fast path more efficient. This will come in a followon patch

[PATCH 3/6] tree-sweep: Include linux/sched.h for might_sleep users

2013-08-16 Thread Andi Kleen
From: Andi Kleen might_sleep is moving from linux/kernel.h to linux/sched.h, so any users need to include linux/sched.h This was done with a mechanistic script and some uses may be redundant (already included in some other include file). However it's good practice to always include any n

[PATCH 1/6] x86: Add 1/2/4/8 byte optimization to 64bit __copy_{from,to}_user_inatomic

2013-08-16 Thread Andi Kleen
From: Andi Kleen The 64bit __copy_{from,to}_user_inatomic always called copy_from_user_generic, but skipped the special optimizations for 1/2/4/8 byte accesses. This especially hurts the futex call, which accesses the 4 byte futex user value with a complicated fast string operation in a
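
The idea of special-casing 1/2/4/8 byte copies before falling back to a generic routine can be sketched in portable C. The names `toy_copy_generic()` and `toy_copy_small_first()` are hypothetical; the kernel code uses its own user-access primitives and fault handling, none of which is modelled here.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the generic (string-instruction based) copy routine. */
static void toy_copy_generic(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
}

/*
 * Copies n bytes. Returns 1 if a fixed-size fast path was used, 0 if the
 * generic routine was called. With a constant size, compilers typically
 * turn each small memcpy into a single load/store pair.
 */
static int toy_copy_small_first(void *dst, const void *src, size_t n)
{
	switch (n) {
	case 1: memcpy(dst, src, 1); return 1;
	case 2: memcpy(dst, src, 2); return 1;
	case 4: memcpy(dst, src, 4); return 1;
	case 8: memcpy(dst, src, 8); return 1;
	default:
		toy_copy_generic(dst, src, n);
		return 0;
	}
}
```

A 4-byte value such as a futex word takes the `case 4` path and never reaches the generic routine.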

[PATCH 5/6] sched: mark should_resched() __always_inline

2013-08-16 Thread Andi Kleen
From: Andi Kleen At least gcc 4.6 and some earlier ones do not inline this function. Since it's small and on relatively hot paths force inline it. Signed-off-by: Andi Kleen --- kernel/sched/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/cor

[PATCH 2/6] x86: Include linux/sched.h in asm/uaccess.h

2013-08-16 Thread Andi Kleen
From: Andi Kleen uaccess.h uses might_sleep, but there is currently no explicit include for this. Since an upcoming patch moves might_sleep into sched.h, include sched.h here. Signed-off-by: Andi Kleen --- arch/x86/include/asm/uaccess.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch

Improve preempt-scheduling and x86 user access v3

2013-08-16 Thread Andi Kleen
Various optimizations related to CONFIG_PREEMPT_VOLUNTARY and x86 uaccess - Optimize copy_*_inatomic on x86-64 to handle 1-8 bytes without string instructions - Inline might_sleep and other preempt code to optimize various preemption paths This costs about 10k text size, but generates far better

[PATCH 6/6] sched: Inline the need_resched test into the caller for _cond_resched

2013-08-16 Thread Andi Kleen
From: Andi Kleen _cond_resched does at least two explicit calls just to decide to do nothing: _cond_resched and should_resched(). Inline a need_resched() into the caller to avoid these calls in the common case of no reschedule being needed. Signed-off-by: Andi Kleen --- include/linux/sched.h
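
Hoisting the cheap test into the caller, so the out-of-line function is only entered when there is work to do, can be sketched as follows. All names are hypothetical stand-ins: the flag models need_resched(), and `toy_cond_resched_slow()` models the out-of-line reschedule path.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-task "need resched" flag. */
static bool toy_need_resched_flag;

/* Counts how often the out-of-line path was actually entered. */
static unsigned int toy_slow_calls;

/* Out-of-line slow path: would call the scheduler in the real thing. */
static void toy_cond_resched_slow(void)
{
	toy_slow_calls++;
	toy_need_resched_flag = false;
}

/* Inline wrapper: the common "nothing to do" case costs one test. */
static inline void toy_cond_resched(void)
{
	if (toy_need_resched_flag)
		toy_cond_resched_slow();
}
```

In the common case the flag is clear and no function call is made at all.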

[PATCH] x86, asmlinkage: Fix warning in xen asmlinkage change

2013-08-19 Thread Andi Kleen
From: Andi Kleen Use __visible for some functions without arguments. This avoids having to add regparm(0) to function pointers. Since they have no arguments it does not make any difference. Signed-off-by: Andi Kleen --- arch/x86/xen/xen-ops.h | 12 ++-- 1 file changed, 6 insertions

Re: linux-next: build warning after merge of the tip tree

2013-08-19 Thread Andi Kleen
> > Introduced by commit 9a55fdbe941e ("x86, asmlinkage, paravirt: Add > > __visible/asmlinkage to xen paravirt ops"). The 2 definitions used to be > > identical ... maybe there should be only one. > > Andi, please send a fix for this build warning, against > tip:x86/asmlinkage. I resent the pa

Re: [PATCH] x86/asm: avoid mnemonics without type suffix

2013-07-14 Thread Andi Kleen
I think best would be to just find some way to implement LOCK prefix patching using atomic compiler intrinsics and then switch to those Then all this inline assembler horror could be ifdef'ed away for old compilers only, and likely the generated code would be better as the compiler could optimiz

Re: [PATCH] perf, tools, bench: Fix memcpy benchmark for large sizes

2013-07-16 Thread Andi Kleen
> > @@ -117,6 +117,8 @@ static void alloc_mem(void **dst, void **src, size_t > > length) > > *src = zalloc(length); > > if (!src) > > die("memory allocation failed - maybe length is too large?\n"); > > + /* Make sure to always replace the zero pages even if MMAP_THRESH is >

[PATCH 4/5] perf, tools: flush output after each line in stat interval mode

2013-08-02 Thread Andi Kleen
From: Andi Kleen When interval mode is outputting to a pipe, each measurement should be flushed individually, so that the reader sees it timely. With a terminal each line is automatically flushed by stdio, but that is disabled with non terminal output. Simply fflush output after each time
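
Flushing after every interval line, so a reader on the other end of a pipe sees each measurement promptly, amounts to the pattern below. `toy_print_interval()` and its output format are hypothetical; only `fprintf()` and `fflush()` are real library calls.

```c
#include <stdio.h>

/*
 * Print one interval line and flush it right away. stdio only
 * line-buffers terminals, so without the fflush() a pipe reader would
 * see output in large delayed chunks.
 * Returns 0 on success, a negative value or EOF on failure.
 */
static int toy_print_interval(FILE *out, double timestamp,
			      unsigned long long count)
{
	if (fprintf(out, "%.3f %llu\n", timestamp, count) < 0)
		return -1;
	return fflush(out);
}
```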

[PATCH 3/5] perf, tools: Add support for --initial-delay option to perf stat

2013-08-02 Thread Andi Kleen
From: Andi Kleen When measuring workloads the startup phase -- doing page faults, dynamic linking, opening files -- is often very different from the rest of the workload. Especially with smaller kernels and using counter multiplexing this can give significant measurement errors. Multiplexing

Misc perf stat improvements

2013-08-02 Thread Andi Kleen
Here are a couple of perf stat improvements/cleanups: - output more information (ratios) in CSV mode - add --initial-delay to skip startup phase of program - handle pipes better in interval mode - some cleanup

[PATCH 5/5] perf, tools: Output running time and run/enabled ratio in CSV mode

2013-08-02 Thread Andi Kleen
From: Andi Kleen The information on how long a counter ran in perf stat can be quite interesting for other tools to judge how trustworthy a measurement is. Currently it is only output in non CSV mode. This patch makes perf stat always output the running time and the enabled/running ratio in CSV

[PATCH 2/5] tools, perf: Add support to evsel for enabling counters

2013-08-02 Thread Andi Kleen
From: Andi Kleen Add support for enabling already set up counters by using an ioctl. I share some code with the filter setup. Signed-off-by: Andi Kleen --- tools/perf/util/evsel.c | 21 ++--- tools/perf/util/evsel.h | 1 + 2 files changed, 19 insertions(+), 3 deletions

[PATCH 1/5] perf, tools: Remove obsolete dummy execve

2013-08-02 Thread Andi Kleen
From: Andi Kleen Minor cleanup. The dummy execve to pre-resolve the PLT is obsolete since "enable_on_execve" was added. The counters are only running after the execve anyways. So just remove it. Signed-off-by: Andi Kleen --- tools/perf/util/evlist.c | 7 --- 1 file changed, 7

[PATCH] RFC: perf, tools: Move gtk browser into separate perfgtk executable

2013-08-04 Thread Andi Kleen
From: Andi Kleen By default perf currently links with the GTK2 gui. This pulls in a lot of external libraries. It also causes dependency problems for distribution packages: simply installing perf requires pulling in GTK2 with all its dependencies. I think the UI is valuable, but it shouldn'

Re: [PATCH] Add per-process flag to control thp

2013-08-04 Thread Andi Kleen
Alex Thorlton writes: >> What kind of workloads are you talking about? > > Our benchmarking team has a list of several of the SPEC OMP benchmarks > that perform significantly better when THP is disabled. I tried to get > the list but one of our servers is acting up and I can't get to it > right n

[PATCH] perf, tools: Increase the file descriptor limits

2013-08-04 Thread Andi Kleen
From: Andi Kleen perf stat -a needs 10 open file descriptors per logical CPU perf stat -a - needs 20 open fds for each. This implies that stat -a doesn't work on any system with the default ulimit -n 1024 which has more than ~100 CPUs and stat -a - doesn't work on anything
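
The arithmetic behind the "~100 CPUs" figure, and one common way to lift the limit, can be sketched as below. The helper names are hypothetical, and whether the patch raises the limit in exactly this way is not visible in the truncated snippet; `getrlimit()`/`setrlimit()` with `RLIMIT_NOFILE` are the standard POSIX calls.

```c
#include <sys/resource.h>

/* File descriptors needed: CPUs times descriptors per CPU. */
static unsigned long toy_fds_needed(unsigned long ncpus,
				    unsigned long fds_per_cpu)
{
	return ncpus * fds_per_cpu;
}

/*
 * Raise the soft RLIMIT_NOFILE limit to the hard limit.
 * Returns 0 on success, -1 on failure.
 */
static int toy_raise_fd_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return -1;
	rl.rlim_cur = rl.rlim_max;
	return setrlimit(RLIMIT_NOFILE, &rl);
}
```

At 10 descriptors per CPU, 103 CPUs already need 1030 descriptors, which is above the default soft limit of 1024.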

Re: [PATCH 4/4] perf, tools: Add perf stat --transaction v3

2013-08-21 Thread Andi Kleen
On Wed, Aug 21, 2013 at 10:15:25AM -0300, Arnaldo Carvalho de Melo wrote: > Em Thu, Aug 15, 2013 at 06:42:03PM +0200, Andi Kleen escreveu: > > > > Here's an updated patch. > > > perf, tools: Add perf stat --transaction v4 > > @@ -1419,6 +1559,8 @@ int cmd_stat

[PATCH] x86, asmlinkage: Fix warning in xen asmlinkage change v2

2013-08-21 Thread Andi Kleen
From: Andi Kleen Current code uses asmlinkage for functions without arguments. This adds an implicit regparm(0) which creates a warning when assigning the function to pointers. Use __visible for the functions without arguments. This avoids having to add regparm(0) to function pointers. Since

Re: Regression: x86/mm: new _PTE_SWP_SOFT_DIRTY bit conflicts with existing use

2013-08-21 Thread Andi Kleen
Cyrill Gorcunov writes: > > Hi all, I worked on patch which would not touch PSE bit for dirty page > tracking and the result is not that good: > > - 2level pages now always page dirty if page is swapped in and out, because >there is no space left in PTE (other than PSE bit) Maybe just don't

perf, x86: Add parts of the remaining haswell PMU functionality v3

2013-08-21 Thread Andi Kleen
I hope this version is ok for everyone now. [v2: Added Peter's changes to the PEBS handler] [v3: Addressed Arnaldo's feedback for the perf stat -T change and avoid conflict] Add some more TSX functionality to the basic Haswell PMU. A lot of the infrastructure needed for these patches has be

[PATCH 4/4] perf, tools: Add perf stat --transaction v5

2013-08-21 Thread Andi Kleen
From: Andi Kleen Add support to perf stat to print the basic transactional execution statistics: Total cycles, Cycles in Transaction, Cycles in aborted transactions using the in_tx and in_tx_checkpoint qualifiers. Transaction Starts and Elision Starts, to compute the average transaction length

[PATCH 2/4] perf, x86: Report TSX transaction abort cost as weight v2

2013-08-21 Thread Andi Kleen
From: Andi Kleen Use the existing weight reporting facility to report the transaction abort cost, that is the number of cycles wasted in aborts. Haswell reports this in the PEBS record. This was in fact the original user for weight. This is a very useful sort key to concentrate on the most

[PATCH 3/4] perf, x86: Add Haswell TSX event aliases v6

2013-08-21 Thread Andi Kleen
From: Andi Kleen Add TSX event aliases, and export them from the kernel to perf. These are used by perf stat -T and to allow more user friendly access to events. The events are designed to be fairly generic and may also apply to other architectures implementing HTM. They all cover common

[PATCH 1/4] perf, x86: Avoid checkpointed counters causing excessive TSX aborts v4

2013-08-21 Thread Andi Kleen
From: Andi Kleen With checkpointed counters there can be a situation where the counter is overflowing, aborts the transaction, is set back to a non overflowing checkpoint, causes an interrupt. The interrupt doesn't see the overflow because it has been checkpointed. This is then a spuriou

Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Andi Kleen
Steven Rostedt writes: Can't you just use -freorder-blocks-and-partition? This should already partition unlikely blocks into a different section. Just a single one of course. FWIW the disadvantage is that multiple code sections tends to break various older dwarf unwinders, as it needs dwarf3 la

Re: [PATCH] RFC: perf, tools: Move gtk browser into separate perfgtk executable

2013-08-05 Thread Andi Kleen
Namhyung Kim writes: > > I wrote a patch series [1] separating gtk code to a dso and use it with > libdl last year. But I didn't get much feedback probably due to the > mistake of not installing the dso to a proper place. It'd be great if > you guys take a look at it and give some comments. It

[PATCH 04/16] x86, asmlinkage: Make _*_start_kernel visible

2013-08-05 Thread Andi Kleen
From: Andi Kleen Obviously these functions have to be visible, otherwise the whole kernel could be optimized away. Signed-off-by: Andi Kleen --- arch/x86/include/asm/setup.h | 8 +--- arch/x86/kernel/head32.c | 2 +- arch/x86/kernel/head64.c | 2 +- 3 files changed, 7 insertions

[PATCH 08/16] x86, asmlinkage, kexec: Drop bogus asmlinkage in machine_kexec_32

2013-08-05 Thread Andi Kleen
From: Andi Kleen A function pointer cannot be asmlinkage. Just drop it. Signed-off-by: Andi Kleen --- arch/x86/kernel/machine_kexec_32.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c index 5b19e4d

x86: Clean up asmlinkage use

2013-08-05 Thread Andi Kleen
This patchkit makes the use of asmlinkage consistent in arch/x86 Originally arch/x86 was (mostly) fully annotated with asmlinkage, but this has bitrotted over time. These changes were originally part of my LTO patchkit. In the interest of making it smaller, I'm posting them separately, as they ca

[PATCH 11/16] x86, asmlinkage, apm: Make APM data structure used from assembler visible

2013-08-05 Thread Andi Kleen
From: Andi Kleen Signed-off-by: Andi Kleen --- arch/x86/kernel/apm_32.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/apm_32.c b/arch/x86/kernel/apm_32.c index 53a4e27..3ab0343 100644 --- a/arch/x86/kernel/apm_32.c +++ b/arch/x86/kernel/apm_32.c @@ -392,7

[PATCH 07/16] x86, asmlinkage: Make kprobes code visible and fix assembler code

2013-08-05 Thread Andi Kleen
From: Andi Kleen - Make all the external assembler template symbols __visible - Move the templates inline assembler code into a top level assembler statement, not inside a function. This avoids it being optimized away or cloned. Cc: ana...@in.ibm.com Signed-off-by: Andi Kleen --- arch/x86

[PATCH 06/16] x86, asmlinkage: Make various syscalls asmlinkage

2013-08-05 Thread Andi Kleen
From: Andi Kleen FWIW I suspect sys_rt_sigreturn/sys_sigreturn should use standard SYSCALL wrappers. But I didn't do that change in this patch. Signed-off-by: Andi Kleen --- arch/x86/include/asm/syscalls.h | 6 +++--- arch/x86/kernel/signal.c| 4 ++-- 2 files changed, 5 inser

[PATCH 15/16] x86, asmlinkage, power: Make various symbols used by the suspend asm code visible

2013-08-05 Thread Andi Kleen
From: Andi Kleen Signed-off-by: Andi Kleen --- arch/x86/power/cpu.c | 8 arch/x86/power/hibernate_64.c | 12 ++-- kernel/power/hibernate.c | 2 +- 3 files changed, 11 insertions(+), 11 deletions(-) diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c index

[PATCH 09/16] x86, asmlinkage: Make several variables used from assembler/linker script visible

2013-08-05 Thread Andi Kleen
From: Andi Kleen plus one function load_gs_index Signed-off-by: Andi Kleen --- arch/x86/include/asm/pgtable.h | 3 ++- arch/x86/include/asm/processor.h | 2 +- arch/x86/include/asm/special_insns.h | 2 +- arch/x86/kernel/cpu/amd.c| 4 ++-- arch/x86/kernel/cpu/common.c

[PATCH 12/16] x86, asmlinkage, paravirt: Add __visible/asmlinkage to xen paravirt ops

2013-08-05 Thread Andi Kleen
From: Andi Kleen Cc: jer...@goop.org Signed-off-by: Andi Kleen --- arch/x86/include/asm/paravirt_types.h | 3 ++- arch/x86/kernel/paravirt.c| 4 ++-- arch/x86/xen/xen-ops.h| 16 3 files changed, 12 insertions(+), 11 deletions(-) diff --git a/arch

[PATCH 16/16] x86, asmlinkage, vdso: Mark vdso variables __visible

2013-08-05 Thread Andi Kleen
From: Andi Kleen Signed-off-by: Andi Kleen --- arch/x86/include/asm/vvar.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h index de656ac..d76ac40 100644 --- a/arch/x86/include/asm/vvar.h +++ b/arch/x86/include/asm

[PATCH 10/16] x86, asmlinkage: Make syscall tables visible

2013-08-05 Thread Andi Kleen
From: Andi Kleen They are referenced from entry*.S. Signed-off-by: Andi Kleen --- arch/x86/kernel/syscall_32.c | 2 +- arch/x86/kernel/syscall_64.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/syscall_32.c b/arch/x86/kernel/syscall_32.c index 147fcd4

[PATCH 13/16] x86, asmlinkage: Make 64bit checksum functions visible

2013-08-05 Thread Andi Kleen
From: Andi Kleen They are implemented in assembler. Signed-off-by: Andi Kleen --- arch/x86/include/asm/checksum_64.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/include/asm/checksum_64.h b/arch/x86/include/asm/checksum_64.h index 9bfdc41..e6fd8a0 100644 --- a

[PATCH 14/16] x86, asmlinkage: Make dump_stack visible

2013-08-05 Thread Andi Kleen
From: Andi Kleen dump_stack is used from assembler code, so make it visible. Signed-off-by: Andi Kleen --- include/linux/printk.h | 2 +- lib/dump_stack.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/printk.h b/include/linux/printk.h index

[PATCH] dwc3: Only build debugfs when DWC3_GADGET/DUAL_ROLE is enabled

2013-08-05 Thread Andi Kleen
From: Andi Kleen Fix (randconfig) build problem with DEBUG_FS on Cc: ba...@ti.com Cc: linux-...@vger.kernel.org Signed-off-by: Andi Kleen --- drivers/usb/dwc3/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/usb/dwc3/Makefile b/drivers/usb/dwc3/Makefile

[PATCH 03/16] x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible

2013-08-05 Thread Andi Kleen
From: Andi Kleen These handlers are all referenced from assembler stubs, so need to be visible. The handlers without arguments become asmlinkage, the others __visible to not force regparms(0) on x86-32. I put it all into a single patch, please let me know if you want it split up. Signed
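A rough sketch of the distinction described above, not taken from the patch. The `DEMO_*` macros and the handler names are stand-ins invented for this example; the real kernel definitions live in the kernel headers and differ in detail. The sketch assumes the visibility annotation maps to GCC's `externally_visible` attribute, and that on x86-32 the asmlinkage variant additionally forces stack-passed arguments with `regparm(0)`.

```c
/* Simplified stand-ins for the kernel macros, for illustration only. */
#if defined(__GNUC__) && !defined(__clang__)
#define DEMO_VISIBLE __attribute__((externally_visible))
#else
#define DEMO_VISIBLE
#endif

#if defined(__i386__)
/* On x86-32, pass all arguments on the stack. */
#define DEMO_ASMLINKAGE DEMO_VISIBLE __attribute__((regparm(0)))
#else
#define DEMO_ASMLINKAGE DEMO_VISIBLE
#endif

static int demo_irq_count;

/* No arguments: the calling convention makes no difference here, so the
 * asmlinkage-style annotation is harmless. */
DEMO_ASMLINKAGE void demo_handler_noargs(void)
{
	demo_irq_count++;
}

/* Takes an argument: only mark it visible, so the default calling
 * convention is kept and regparm(0) is not forced on x86-32. */
DEMO_VISIBLE int demo_handler_with_arg(int vector)
{
	demo_irq_count++;
	return vector;
}

/* Helper so callers can observe how often the handlers ran. */
int demo_count(void)
{
	return demo_irq_count;
}
```

The visibility annotation matters for whole-program builds such as LTO, where the compiler could otherwise drop a function that is only referenced from assembler.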

[PATCH 02/16] x86, asmlinkage: Change dotraplinkage into __visible on 32bit v2

2013-08-05 Thread Andi Kleen
From: Andi Kleen Mark 32bit dotraplinkage functions as __visible for LTO. 64bit already is using asmlinkage which includes it. v2: Clean up (M.Marek) Signed-off-by: Andi Kleen --- arch/x86/include/asm/traps.h | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/arch/x86
