Re: [GIT PULL] adaptive spinning mutexes
I'll kick off some runs of my three benchmarks on ext3 for comparison. If there are things less synthetic people would like to see, please let me know. What about a web-server test? Number of hits per second it can do? Folkert van Heusden -- MultiTail er et flexible tool for å kontrolere Logfiles og commandoer. Med filtrer, farger, sammenføringer, forskeliger ansikter etc. http://www.vanheusden.com/multitail/ -- Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
I'll kick off some runs of my three benchmarks on ext3 for comparison. If there are things less synthetic people would like to see, please let me know. What about a web-server test? Number of hits per second it can do? Quick hack: http://vanheusden.com/tortureweb/tortureweb-0.1.tgz To test multiple requesters, start this program multiple times in parallel. There are probably better testers but for a quick test and might be sufficient. Folkert van Heusden -- Multitail es una herramienta flexible que permite visualizar los log file y seguir la ejecución de comandos. Permite filtrar, añadir colores, combinar archivos, la visualización de diferencias (diff- view), etc. http://www.vanheusden.com/multitail/ -- Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
So I don't dispute at all that mutex with spinning performs better than a mutex, but I _do_ claim that it has some potentially huge downsides compared to a real spinlock. It may perform as well as a spinlock in the nice common case, but then when you hit the non-common case you see the difference between well-written code and badly written code. Make it mount-point dependant. Then your mail-spool can use the spinlock version and e.g. the /usr filesystem uses regular mutexes. Might be tricky to implement I guess. Folkert van Heusden -- MultiTail cok yonlu kullanimli bir program, loglari okumak, verilen kommandolari yerine getirebilen. Filter, renk verme, merge, 'diff- view', vs. http://www.vanheusden.com/multitail/ -- Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
Chris Mason wrote: On Thu, 2009-01-15 at 10:16 -0800, Linus Torvalds wrote: On Thu, 15 Jan 2009, Ingo Molnar wrote: btw., i think spin-mutexes have a design advantage here: in a lot of code areas it's quite difficult to use spinlocks - cannot allocate memory, cannot call any code that can sporadically block (but does not _normally_ block), etc. With mutexes those atomicity constraints go away - and the performance profile should now be quite close to that of spinlocks as well. Umm. Except if you wrote the code nicely and used spinlocks, you wouldn't hold the lock over all those unnecessary and complex operations. While this is true, there are examples of places we should expect speedups for this today. Concurrent file creation/deletion in a single dir will often find things hot in cache and not have to block anywhere (mail spools). And although not as common, NNTP servers using file per article storage. Concurrent O_DIRECT aio writes to the same file, where i_mutex is dropped early on. pipes should see a huge improvement. I'd like to see that. Didn't realize how slow pipes really are. I'll kick off some runs of my three benchmarks on ext3 for comparison. If there are things less synthetic people would like to see, please let me know. -chris -- To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Bill Davidsen david...@tmr.com We have more to fear from the bungling of the incompetent than from the machinations of the wicked. - from Slashdot -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
* Chris Mason chris.ma...@oracle.com wrote: On Wed, 2009-01-14 at 19:33 +0100, Ingo Molnar wrote: * Peter Zijlstra pet...@infradead.org wrote: Full series, including changelogs available at: http://programming.kicks-ass.net/kernel-patches/mutex-adaptive-spin/ and should shortly appear in a git tree near Ingo :-) Linus, Please pull the adaptive-mutexes-for-linus git tree from: git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git adaptive-mutexes-for-linus I was going to put this into the btrfs tree, but since you have a branch just for adaptive mutexes, is it easier to put there? From: Chris Mason chris.ma...@oracle.com Btrfs: stop spinning on mutex_trylock and let the adaptive code spin for us Mutexes now spin internally and the btrfs spin is no longer required for performance. Signed-off-by: Chris Mason chris.ma...@oracle.com applied it to tip/core/locking, thanks Chris! Ingo -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
You just disproved your own case :( how so? 80% is not enough? I also checked Fedora and it has SCHED_DEBUG=y in its kernel rpms. Ubuntu has CONFIG_SCHED_DEBUG=y as well in their kernels. Debian too: mauer:~/bin# grep CONFIG_SCHED_DEBUG /boot/config-2.6.2* /boot/config-2.6.24-1-amd64:# CONFIG_SCHED_DEBUG is not set /boot/config-2.6.25-2-amd64:CONFIG_SCHED_DEBUG=y /boot/config-2.6.26-1-amd64:CONFIG_SCHED_DEBUG=y $ zgrep DEBUG_MUTEX /proc/config.gz # CONFIG_DEBUG_MUTEXES is not set mauer:~/bin# grep DEBUG_MUTEX /boot/config-2.6.2* /boot/config-2.6.22-3-amd64:# CONFIG_DEBUG_MUTEXES is not set /boot/config-2.6.24-1-amd64:# CONFIG_DEBUG_MUTEXES is not set /boot/config-2.6.25-2-amd64:# CONFIG_DEBUG_MUTEXES is not set /boot/config-2.6.26-1-amd64:# CONFIG_DEBUG_MUTEXES is not set Folkert van Heusden -- Looking for a cheap but fast webhoster with an excellent helpdesk? http://keetweej.vanheusden.com/redir.php?id=1001 -- Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
On Wed, Jan 14, 2009 at 08:28:11PM +0100, Ingo Molnar wrote: [v2.6.14] [v2.6.29] Semaphores | Mutexes -- | no-spin spin | [tmpfs] ops/sec: 50713 | 291038 392865 (+34.9%) [ext3]ops/sec: 45214 | 283291 435674 (+53.7%) A 10x macro-performance improvement on ext3, compared to 2.6.14 :-) While lots of other details got changed meanwhile, i'm sure most of the performance win on this particular VFS workload comes from mutexes. I asked a couple of our benchmarking teams to try -v9. Neither the OLTP benchmark, nor the kernel-perf test suite found any significant performance change. I suspect mutex contention isn't a significant problem for most workloads. Has anyone found a non-synthetic benchmark where this makes a significant difference? Aside from btrfs, I mean. -- Matthew Wilcox Intel Open Source Technology Centre Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
* Matthew Wilcox matt...@wil.cx wrote: On Wed, Jan 14, 2009 at 08:28:11PM +0100, Ingo Molnar wrote: [v2.6.14] [v2.6.29] Semaphores | Mutexes -- | no-spin spin | [tmpfs] ops/sec: 50713 | 291038 392865 (+34.9%) [ext3]ops/sec: 45214 | 283291 435674 (+53.7%) A 10x macro-performance improvement on ext3, compared to 2.6.14 :-) While lots of other details got changed meanwhile, i'm sure most of the performance win on this particular VFS workload comes from mutexes. I asked a couple of our benchmarking teams to try -v9. Neither the OLTP benchmark, nor the kernel-perf test suite found any significant performance change. I suspect mutex contention isn't a significant problem for most workloads. basically only VFS is mutex-bound really, and few of the 'benchmarks' tend to be VFS intense. Maybe things like mail-server benchmarks would do that. Also, -v9 is like two days old code ;-) Old and crufty. The real performance uptick was not even in -v10 but in -v11 (the one we submitted in this thread). Ingo -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
* Linus Torvalds torva...@linux-foundation.org wrote: Has anyone found a non-synthetic benchmark where this makes a significant difference? Aside from btrfs, I mean. Yea, if you have some particular filesystem (or other subsystem) that uses a global mutex, you'll obviously see way more contention. Btrfs may not be _unique_ in this regard, but it's definitely doing something that isn't good. Btw, it's doing something that ext3 also used to do iirc, until we fixed it to use spinlocks instead (the block group lock in particular). Yeah - just double-checked. Commit c12b9866ea52 in the historical Linux archive, from 2003. Which made block allocation protected by a per-group spinlock, rather than lock_super(). btw., i think spin-mutexes have a design advantage here: in a lot of code areas it's quite difficult to use spinlocks - cannot allocate memory, cannot call any code that can sporadically block (but does not _normally_ block), etc. With mutexes those atomicity constraints go away - and the performance profile should now be quite close to that of spinlocks as well. Ingo -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
On Thu, 15 Jan 2009, Ingo Molnar wrote: btw., i think spin-mutexes have a design advantage here: in a lot of code areas it's quite difficult to use spinlocks - cannot allocate memory, cannot call any code that can sporadically block (but does not _normally_ block), etc. With mutexes those atomicity constraints go away - and the performance profile should now be quite close to that of spinlocks as well. Umm. Except if you wrote the code nicely and used spinlocks, you wouldn't hold the lock over all those unnecessary and complex operations. IOW, if you do pre-allocation instead of holding a lock over the allocation, you win. So yes, spin-mutexes makes it easier to write the code, but it also makes it easier to just plain be lazy. Linus -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
* Chris Mason chris.ma...@oracle.com wrote: [ re: pipes, ok I don't know of realistic pipe benchmarks but I'll run them if people can suggest one ] Threaded servers making heavy use of sys_splice() ought to hit the pipe mutex all the time. Ingo -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
On Thu, 15 Jan 2009, Paul E. McKenney wrote: On Thu, Jan 15, 2009 at 10:16:53AM -0800, Linus Torvalds wrote: IOW, if you do pre-allocation instead of holding a lock over the allocation, you win. So yes, spin-mutexes makes it easier to write the code, but it also makes it easier to just plain be lazy. In infrequently invoked code such as some error handling, lazy/simple can be a big win. Sure. I don't disagree at all. On such code we don't even care about locking. If it _really_ is fundamentally very rarely invoked. But if we're talking things like core filesystem locks, it's _really_ irritating when one of those (supposedly rare) allocation delays or the need to do IO then blocks all those (supposedly common) nice cached cases. So I don't dispute at all that mutex with spinning performs better than a mutex, but I _do_ claim that it has some potentially huge downsides compared to a real spinlock. It may perform as well as a spinlock in the nice common case, but then when you hit the non-common case you see the difference between well-written code and badly written code. And sadly, while allocations _usually_ are nice and immediate, and while our caches _usually_ mean that we don't need to do IO, bad behavior when we do need to do IO is what really kills interactive feel. Suddenly everything else is hurting too, because they wanted that lock - even if they didn't need to do IO or allocate anything. Linus -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
On Thu, Jan 15, 2009 at 05:01:32PM -0800, Linus Torvalds wrote: On Thu, 15 Jan 2009, Paul E. McKenney wrote: On Thu, Jan 15, 2009 at 10:16:53AM -0800, Linus Torvalds wrote: IOW, if you do pre-allocation instead of holding a lock over the allocation, you win. So yes, spin-mutexes makes it easier to write the code, but it also makes it easier to just plain be lazy. In infrequently invoked code such as some error handling, lazy/simple can be a big win. Sure. I don't disagree at all. On such code we don't even care about locking. If it _really_ is fundamentally very rarely invoked. But if we're talking things like core filesystem locks, it's _really_ irritating when one of those (supposedly rare) allocation delays or the need to do IO then blocks all those (supposedly common) nice cached cases. Certainly if there was one big mutex covering all the operations, it would indeed be bad. On the other hand, if the filesystem/cache was partitioned (perhaps hashed) so that there was a large number of such locks, then if should be OK. Yes, I am making the perhaps naive assumption that hot spots such as the root inode would be in the cache. And that they would rarely collide with allocation or I/O, which might also be naive. But on this point I must defer to the filesystem folks. So I don't dispute at all that mutex with spinning performs better than a mutex, but I _do_ claim that it has some potentially huge downsides compared to a real spinlock. It may perform as well as a spinlock in the nice common case, but then when you hit the non-common case you see the difference between well-written code and badly written code. And sadly, while allocations _usually_ are nice and immediate, and while our caches _usually_ mean that we don't need to do IO, bad behavior when we do need to do IO is what really kills interactive feel. Suddenly everything else is hurting too, because they wanted that lock - even if they didn't need to do IO or allocate anything. I certainly agree that there are jobs that a spin-mutex is ill-suited for. Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
On Thu, Jan 15, 2009 at 10:16:53AM -0800, Linus Torvalds wrote: On Thu, 15 Jan 2009, Ingo Molnar wrote: btw., i think spin-mutexes have a design advantage here: in a lot of code areas it's quite difficult to use spinlocks - cannot allocate memory, cannot call any code that can sporadically block (but does not _normally_ block), etc. With mutexes those atomicity constraints go away - and the performance profile should now be quite close to that of spinlocks as well. Umm. Except if you wrote the code nicely and used spinlocks, you wouldn't hold the lock over all those unnecessary and complex operations. IOW, if you do pre-allocation instead of holding a lock over the allocation, you win. So yes, spin-mutexes makes it easier to write the code, but it also makes it easier to just plain be lazy. Yeah, I agree often it is harder to get the locking right but you end up with a better result. With mutexes, on the off chance you do have t oblock while holding the lock, performance and latency of other threads will tank. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL] adaptive spinning mutexes
* Peter Zijlstra pet...@infradead.org wrote: Full series, including changelogs available at: http://programming.kicks-ass.net/kernel-patches/mutex-adaptive-spin/ and should shortly appear in a git tree near Ingo :-) Linus, Please pull the adaptive-mutexes-for-linus git tree from: git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git adaptive-mutexes-for-linus We dropped two fresh patches from v11 for the time being: the two debug patches, they had test failures [they triggered mutex debugging false positives]. So this tree is v10 (which got a lot of testing already) plus Chris's performance patch. It passes all x86 runtime tests here. The cross build matrix looks good too: testing 10 architectures. [ syncing linus ... ] testing alpha: -git: pass ( 21), -tip: pass ( 21) testingarm: -git: pass ( 5), -tip: pass ( 5) testing blackfin: -git: pass ( 20), -tip: pass ( 20) testing cris: -git: pass ( 32), -tip: pass ( 32) testing ia64: -git: pass ( 153), -tip: pass ( 153) testing m32r: -git: pass ( 21), -tip: pass ( 21) testing m68k: -git: pass ( 34), -tip: pass ( 34) testing mips: -git: pass ( 4), -tip: pass ( 4) testingpowerpc: -git: pass ( 11), -tip: pass ( 11) testing sparc: -git: pass ( 0), -tip: pass ( 0) Passes all and no new warnings. So in theory this is good enough as a pre-rc2 pull too, should you feel tempted ;-) Thanks, Ingo -- Chris Mason (1): mutex: adaptive spinnning, performance tweaks Peter Zijlstra (3): mutex: small cleanup mutex: preemption fixes mutex: implement adaptive spinning include/linux/mutex.h |5 +- include/linux/sched.h |2 + kernel/mutex-debug.c|9 +--- kernel/mutex-debug.h| 18 --- kernel/mutex.c | 121 --- kernel/mutex.h | 22 - kernel/sched.c | 71 ++- kernel/sched_features.h |1 + 8 files changed, 209 insertions(+), 40 deletions(-) diff --git a/include/linux/mutex.h b/include/linux/mutex.h index 7a0e5c4..3069ec7 100644 --- a/include/linux/mutex.h +++ b/include/linux/mutex.h @@ -50,8 +50,10 @@ struct mutex { atomic_tcount; spinlock_t wait_lock; struct list_headwait_list; -#ifdef CONFIG_DEBUG_MUTEXES +#if defined(CONFIG_DEBUG_MUTEXES) || defined(CONFIG_SMP) struct thread_info *owner; +#endif +#ifdef CONFIG_DEBUG_MUTEXES const char *name; void*magic; #endif @@ -68,7 +70,6 @@ struct mutex_waiter { struct list_headlist; struct task_struct *task; #ifdef CONFIG_DEBUG_MUTEXES - struct mutex*lock; void*magic; #endif }; diff --git a/include/linux/sched.h b/include/linux/sched.h index 4cae9b8..c34b137 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -328,7 +328,9 @@ extern signed long schedule_timeout(signed long timeout); extern signed long schedule_timeout_interruptible(signed long timeout); extern signed long schedule_timeout_killable(signed long timeout); extern signed long schedule_timeout_uninterruptible(signed long timeout); +asmlinkage void __schedule(void); asmlinkage void schedule(void); +extern int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner); struct nsproxy; struct user_namespace; diff --git a/kernel/mutex-debug.c b/kernel/mutex-debug.c index 1d94160..50d022e 100644 --- a/kernel/mutex-debug.c +++ b/kernel/mutex-debug.c @@ -26,11 +26,6 @@ /* * Must be called with lock-wait_lock held. */ -void debug_mutex_set_owner(struct mutex *lock, struct thread_info *new_owner) -{ - lock-owner = new_owner; -} - void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter) { memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter)); @@ -59,7 +54,6 @@ void debug_mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter, /* Mark the current thread as blocked on the lock: */ ti-task-blocked_on = waiter; - waiter-lock = lock; } void mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter, @@ -82,7 +76,7 @@ void debug_mutex_unlock(struct mutex *lock) DEBUG_LOCKS_WARN_ON(lock-magic != lock); DEBUG_LOCKS_WARN_ON(lock-owner != current_thread_info()); DEBUG_LOCKS_WARN_ON(!lock-wait_list.prev !lock-wait_list.next); - DEBUG_LOCKS_WARN_ON(lock-owner != current_thread_info()); + mutex_clear_owner(lock); } void debug_mutex_init(struct mutex *lock, const char *name, @@ -95,7 +89,6 @@ void debug_mutex_init(struct mutex *lock, const char *name, debug_check_no_locks_freed((void *)lock, sizeof(*lock)); lockdep_init_map(lock-dep_map, name, key, 0); #endif - lock-owner
Re: [GIT PULL] adaptive spinning mutexes
On Wed, 2009-01-14 at 19:33 +0100, Ingo Molnar wrote: * Peter Zijlstra pet...@infradead.org wrote: Full series, including changelogs available at: http://programming.kicks-ass.net/kernel-patches/mutex-adaptive-spin/ and should shortly appear in a git tree near Ingo :-) Linus, Please pull the adaptive-mutexes-for-linus git tree from: git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git adaptive-mutexes-for-linus I was going to put this into the btrfs tree, but since you have a branch just for adaptive mutexes, is it easier to put there? From: Chris Mason chris.ma...@oracle.com Btrfs: stop spinning on mutex_trylock and let the adaptive code spin for us Mutexes now spin internally and the btrfs spin is no longer required for performance. Signed-off-by: Chris Mason chris.ma...@oracle.com diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c index 39bae77..40ba8e8 100644 --- a/fs/btrfs/locking.c +++ b/fs/btrfs/locking.c @@ -37,16 +37,6 @@ int btrfs_tree_lock(struct extent_buffer *eb) { - int i; - - if (mutex_trylock(eb-mutex)) - return 0; - for (i = 0; i 512; i++) { - cpu_relax(); - if (mutex_trylock(eb-mutex)) - return 0; - } - cpu_relax(); mutex_lock_nested(eb-mutex, BTRFS_MAX_LEVEL - btrfs_header_level(eb)); return 0; } -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
* Ingo Molnar mi...@elte.hu wrote: Linus, Please pull the adaptive-mutexes-for-linus git tree from: git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git adaptive-mutexes-for-linus We dropped two fresh patches from v11 for the time being: the two debug patches, they had test failures [they triggered mutex debugging false positives]. So this tree is v10 (which got a lot of testing already) plus Chris's performance patch. It passes all x86 runtime tests here. Latest performance figures, on a 2-socket 16-way Nehalem test-system, running the code above, measured via test-mutex V 128 10 VFS creat+unlink scalability test on tmpfs and ext3: no-spin spin [tmpfs]avg ops/sec: 291038 392865 (+34.9%) [ext3] avg ops/sec: 283291 435674 (+53.7%) Those numbers impress the heck out of me, rarely do we have such kind of speedups these days, for any established functionality. We still have the /sys/debug/sched_features tunable under CONFIG_SCHED_DEBUG=y, so should this cause any performance regressions somewhere, it can be pinned down and blamed back on this change easily, without bisection and without rebooting the box. Ingo -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
* Peter Zijlstra a.p.zijls...@chello.nl wrote: On Wed, 2009-01-14 at 10:53 -0800, Andrew Morton wrote: On Wed, 14 Jan 2009 19:33:19 +0100 Ingo Molnar mi...@elte.hu wrote: Please pull the adaptive-mutexes-for-linus git tree fear - It seems a major shortcoming that the feature is disabled if CONFIG_DEBUG_MUTEXES=y. It means that lots of people won't test it. Yes, that's a bit unfortunate, a simple patch to enable that is: I must admit I'm a bit stumped on why that debug check triggers, I couldn't reproduce today, but Ingo ran into it quite quickly :/ Yes, the debug patch caused this false positive warning on one of my test-systems: Built 1 zonelists in Zone order, mobility grouping on. Total pages: 255762 [ cut here ] WARNING: at kernel/mutex-debug.c:77 debug_mutex_unlock+0x94/0xde() Hardware name: Modules linked in: Pid: 0, comm: swapper Not tainted 2.6.29-rc1-tip-00983-ge1af3bd-dirty #1 Call Trace: [8024f2f7] warn_slowpath+0xd8/0xf7 [8024f5ec] ? wake_up_klogd+0x9/0x2f [80270fd1] ? graph_lock+0x27/0x66 [80275329] ? validate_chain+0xd4d/0xd9f [802714c4] ? save_trace+0x3f/0xb2 [80275b4f] ? __lock_acquire+0x7d4/0x832 [80275c5f] ? lock_acquire+0xb2/0xc2 [8025091c] ? cpu_maps_update_begin+0x17/0x19 [80271d21] ? trace_hardirqs_off+0xd/0xf [80270a8e] debug_mutex_unlock+0x94/0xde [80906d71] __mutex_unlock_slowpath+0xdd/0x152 [80906df4] mutex_unlock+0xe/0x10 [80250955] cpu_maps_update_done+0x15/0x17 [808ce8b5] register_cpu_notifier+0x2c/0x32 [80d7683e] page_alloc_init+0x10/0x12 [80d5ac45] start_kernel+0x1ba/0x422 [80d5a140] ? early_idt_handler+0x0/0x73 [80d5a2c3] x86_64_start_reservations+0xae/0xb2 [80d5a421] x86_64_start_kernel+0x137/0x146 ---[ end trace a7919e7f17c0a725 ]--- Kernel command line: root=/dev/sda1 earlyprintk=serial,ttyS0,115200 console=ttyS0,115200 console=tty 5 profile=0 debug initcall_debug apic=debug apic=verbose ignore_loglevel sysrq_always_enabled pci=nomsi kernel profiling enabled (shift: 0) debug: sysrq always enabled. Initializing CPU#0 So we left that change out from this pull request. It's not a big deal i think - mutex debugging always had a different fast-path from no-debug mutexes anyway (or a lack of fast-path to begin with). So the performance characteristics were always subtly different. Now they might be more different - but we'll fix that too. Ingo -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
On Wed, 2009-01-14 at 11:36 -0800, Andrew Morton wrote: Do people enable CONFIG_SCHED_DEBUG? Well, I have it always enabled, but I've honestly no idea if that makes me weird. CONFIG_DEBUG_MUTEXES=n, CONFIG_SCHED_DEBUG=y is getting to be a pretty small subset? Could be, do you fancy me doing a sysctl? shouldn't be hard. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
* Andrew Morton a...@linux-foundation.org wrote: On Wed, 14 Jan 2009 20:00:08 +0100 Ingo Molnar mi...@elte.hu wrote: * Andrew Morton a...@linux-foundation.org wrote: On Wed, 14 Jan 2009 19:33:19 +0100 Ingo Molnar mi...@elte.hu wrote: Please pull the adaptive-mutexes-for-linus git tree fear - It seems a major shortcoming that the feature is disabled if CONFIG_DEBUG_MUTEXES=y. It means that lots of people won't test it. ^^^? - When people hit performance/latency oddities, it would be nice if they had a /proc knob with which they could disable this feature at runtime. This would also be useful for comparative performance testing. Yeah. From my other mail: We still have the /sys/debug/sched_features tunable under CONFIG_SCHED_DEBUG=y, so should this cause any performance regressions somewhere, it can be pinned down and blamed back on this change easily, without bisection and without rebooting the box. This kind of easy knob was included early on - this is how all those spin versus no-spin numbers were done. Do people enable CONFIG_SCHED_DEBUG? If they suspect performance problems and want to analyze them? Note that CONFIG_SCHED_DEBUG=y is also the default. CONFIG_DEBUG_MUTEXES=n, CONFIG_SCHED_DEBUG=y is getting to be a pretty small subset? Those two are the default config settings actually, so i'd expect it to be the most commonly occuring combinations. Ingo -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
* Peter Zijlstra a.p.zijls...@chello.nl wrote: On Wed, 2009-01-14 at 11:36 -0800, Andrew Morton wrote: Do people enable CONFIG_SCHED_DEBUG? Well, I have it always enabled, but I've honestly no idea if that makes me weird. CONFIG_DEBUG_MUTEXES=n, CONFIG_SCHED_DEBUG=y is getting to be a pretty small subset? Could be, do you fancy me doing a sysctl? shouldn't be hard. i dunno, why another fancy sysctl for something that fits quite nicely into the existing sched_features scheme that we've been using for such purposes for the past 3-4 kernel releases? we always provided various toggles for new scheduler features via /sys/debug/sched_features, so that people can do performance regression testing, and it works quite well. Ingo -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
On Wed, 14 Jan 2009 21:14:35 +0100 Ingo Molnar mi...@elte.hu wrote: * Andrew Morton a...@linux-foundation.org wrote: On Wed, 14 Jan 2009 20:00:08 +0100 Ingo Molnar mi...@elte.hu wrote: * Andrew Morton a...@linux-foundation.org wrote: On Wed, 14 Jan 2009 19:33:19 +0100 Ingo Molnar mi...@elte.hu wrote: Please pull the adaptive-mutexes-for-linus git tree fear - It seems a major shortcoming that the feature is disabled if CONFIG_DEBUG_MUTEXES=y. It means that lots of people won't test it. ^^^? - When people hit performance/latency oddities, it would be nice if they had a /proc knob with which they could disable this feature at runtime. This would also be useful for comparative performance testing. Yeah. From my other mail: We still have the /sys/debug/sched_features tunable under CONFIG_SCHED_DEBUG=y, so should this cause any performance regressions somewhere, it can be pinned down and blamed back on this change easily, without bisection and without rebooting the box. This kind of easy knob was included early on - this is how all those spin versus no-spin numbers were done. Do people enable CONFIG_SCHED_DEBUG? If they suspect performance problems and want to analyze them? The vast majority of users do not and usually cannot compile their own kernels. Note that CONFIG_SCHED_DEBUG=y is also the default. akpm:/usr/src/25 echo $ARCH x86_64 akpm:/usr/src/25 make defconfig *** Default configuration is based on 'x86_64_defconfig' # # configuration written to .config # akpm:/usr/src/25 grep SCHED_DEBUG .config # CONFIG_SCHED_DEBUG is not set CONFIG_DEBUG_MUTEXES=n, CONFIG_SCHED_DEBUG=y is getting to be a pretty small subset? Those two are the default config settings actually, so i'd expect it to be the most commonly occuring combinations. akpm:/usr/src/25 grep DEBUG_MUTEXES .config # CONFIG_DEBUG_MUTEXES is not set -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
On Wed, 14 Jan 2009 21:27:36 +0100 Ingo Molnar mi...@elte.hu wrote: * Peter Zijlstra a.p.zijls...@chello.nl wrote: On Wed, 2009-01-14 at 11:36 -0800, Andrew Morton wrote: Do people enable CONFIG_SCHED_DEBUG? Well, I have it always enabled, but I've honestly no idea if that makes me weird. CONFIG_DEBUG_MUTEXES=n, CONFIG_SCHED_DEBUG=y is getting to be a pretty small subset? Could be, do you fancy me doing a sysctl? shouldn't be hard. i dunno, why another fancy sysctl for something that fits quite nicely into the existing sched_features scheme that we've been using for such purposes for the past 3-4 kernel releases? we always provided various toggles for new scheduler features via /sys/debug/sched_features, so that people can do performance regression testing, and it works quite well. If we know that this control will be reliably available in packaged kernels then fine. But how to we arrange for that? -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
* Andrew Morton a...@linux-foundation.org wrote: Do people enable CONFIG_SCHED_DEBUG? If they suspect performance problems and want to analyze them? The vast majority of users do not and usually cannot compile their own kernels. ... which they derive from distro kernels or some old .config they always used, via 'make oldconfig'. You are arguing against well-established facts here. If you dont believe my word for it, here's an analysis of all kernel configs posted to lkml in the past 8 months: $ grep ^CONFIG_SCHED_DEBUG linux-kernel | wc -l 424 $ grep 'CONFIG_SCHED_DEBUG is not' linux-kernel | wc -l 109 i.e. CONFIG_SCHED_DEBUG=y is set in 80% of the configs. A large majority of testers has it enabled and /sys/debug/sched_features was always a good mechanism that we used for runtime toggles. Note that CONFIG_SCHED_DEBUG=y is also the default. akpm:/usr/src/25 echo $ARCH x86_64 akpm:/usr/src/25 make defconfig *** Default configuration is based on 'x86_64_defconfig' x86 defconfig is used too, but it's a pretty rare usage. Under default i mean the customary meaning of default config: it's the default if you come via 'make oldconfig' or if you derive your config from a distro config: | config SCHED_DEBUG |bool Collect scheduler debugging info |depends on DEBUG_KERNEL PROC_FS |default y Ingo -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
On Wed, 14 Jan 2009 21:51:22 +0100 Ingo Molnar mi...@elte.hu wrote: * Andrew Morton a...@linux-foundation.org wrote: Do people enable CONFIG_SCHED_DEBUG? If they suspect performance problems and want to analyze them? The vast majority of users do not and usually cannot compile their own kernels. ... which they derive from distro kernels or some old .config they always used, via 'make oldconfig'. You are arguing against well-established facts here. If you dont believe my word for it, here's an analysis of all kernel configs posted to lkml in the past 8 months: $ grep ^CONFIG_SCHED_DEBUG linux-kernel | wc -l 424 $ grep 'CONFIG_SCHED_DEBUG is not' linux-kernel | wc -l 109 i.e. CONFIG_SCHED_DEBUG=y is set in 80% of the configs. A large majority of testers has it enabled and /sys/debug/sched_features was always a good mechanism that we used for runtime toggles. You just disproved your own case :( Note that CONFIG_SCHED_DEBUG=y is also the default. akpm:/usr/src/25 echo $ARCH x86_64 akpm:/usr/src/25 make defconfig *** Default configuration is based on 'x86_64_defconfig' x86 defconfig is used too, but it's a pretty rare usage. Under default i mean the customary meaning of default config: it's the default if you come via 'make oldconfig' or if you derive your config from a distro config: | config SCHED_DEBUG |bool Collect scheduler debugging info |depends on DEBUG_KERNEL PROC_FS |default y This simply isn't reliable. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
On Wed, 14 Jan 2009 22:14:58 +0100 Ingo Molnar mi...@elte.hu wrote: * Andrew Morton a...@linux-foundation.org wrote: On Wed, 14 Jan 2009 21:51:22 +0100 Ingo Molnar mi...@elte.hu wrote: * Andrew Morton a...@linux-foundation.org wrote: Do people enable CONFIG_SCHED_DEBUG? If they suspect performance problems and want to analyze them? The vast majority of users do not and usually cannot compile their own kernels. ... which they derive from distro kernels or some old .config they always used, via 'make oldconfig'. You are arguing against well-established facts here. If you dont believe my word for it, here's an analysis of all kernel configs posted to lkml in the past 8 months: $ grep ^CONFIG_SCHED_DEBUG linux-kernel | wc -l 424 $ grep 'CONFIG_SCHED_DEBUG is not' linux-kernel | wc -l 109 i.e. CONFIG_SCHED_DEBUG=y is set in 80% of the configs. A large majority of testers has it enabled and /sys/debug/sched_features was always a good mechanism that we used for runtime toggles. You just disproved your own case :( how so? 80% is not enough? No. It really depends on what distros do. I also checked Fedora and it has SCHED_DEBUG=y in its kernel rpms. If all distros set SCHED_DEBUG=y then fine. But if they do this then we should do this at the kernel.org level, and make it a hard-to-turn-off thing via CONFIG_EMBEDDED=y. note that there's also a performance issue here: we generally _dont want_ a debug sysctl overhead in the mutex code or in any fastpath for that matter. So making it depend on SCHED_DEBUG is useful. sched_feat() features get optimized out at build time when SCHED_DEBUG is disabled. So it gives us the best of two worlds: the utility of sysctls in the SCHED_DEBUG=y, and they get compiled out in the !SCHED_DEBUG case. I'm not detecting here a sufficient appreciation of the number of sched-related regressions we've seen in recent years, nor of the difficulty encountered in diagnosing and fixing them. Let alone the difficulty getting those fixes propagated out a *long* time after the regression was added. You're taking a whizzy new feature which drastically changes a critical core kernel feature and jamming it into mainline with a vestigial amount of testing coverage without giving sufficient care and thought to the practical lessons which we have learned from doing this in the past. This is a highly risky change. It's not that the probability of failure is high - the problem is that the *cost* of the improbable failure is high. We should seek to minimize that cost. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
* Ingo Molnar mi...@elte.hu wrote: You just disproved your own case :( how so? 80% is not enough? I also checked Fedora and it has SCHED_DEBUG=y in its kernel rpms. Ubuntu has CONFIG_SCHED_DEBUG=y as well in their kernels. note that there's also a performance issue here: we generally _dont want_ a debug sysctl overhead in the mutex code or in any fastpath for that matter. So making it depend on SCHED_DEBUG is useful. sched_feat() features get optimized out at build time when SCHED_DEBUG is disabled. So it gives us the best of two worlds: the utility of sysctls in the SCHED_DEBUG=y, and they get compiled out in the !SCHED_DEBUG case. There's a third issue as well: the toggle _is_ intentionally debug-only, while sysctls are non-debug and we _really_ dont want feature assymetry like that. It will just splinter the application space: if there's significant performance variances then apps will just go the path of least resistence: instead of debugging the performance problems properly, the first group of applications will be tuned for the sysctl_mutex_spin=0 case, the second group of applications will be tuned for the sysctl_mutex_spin=1 case. And if an enterprise distro decided to flip the default around we'd have a real tuning mess. /sys/debug/sched_features strikes the right kind of balance IMO: it's not always available and it's explicitly in debugfs with no stable ABI so apps cannot standardize on being able to tweak it, but it's convenient enough in practice for developers to depend on it, performance analysis is easy, etc., etc. Ingo -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
On Wed, Jan 14, 2009 at 22:41, Ingo Molnar mi...@elte.hu wrote: * Ingo Molnar mi...@elte.hu wrote: You just disproved your own case :( how so? 80% is not enough? I also checked Fedora and it has SCHED_DEBUG=y in its kernel rpms. Ubuntu has CONFIG_SCHED_DEBUG=y as well in their kernels. $ cat /etc/SuSE-release openSUSE 11.1 (x86_64) VERSION = 11.1 $ uname -a Linux nga 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64 x86_64 x86_64 GNU/Linux $ zgrep SCHED_DEBUG /proc/config.gz CONFIG_SCHED_DEBUG=y $ zgrep DEBUG_MUTEX /proc/config.gz # CONFIG_DEBUG_MUTEXES is not set Kay -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
* Kay Sievers kay.siev...@vrfy.org wrote: On Wed, Jan 14, 2009 at 22:41, Ingo Molnar mi...@elte.hu wrote: * Ingo Molnar mi...@elte.hu wrote: You just disproved your own case :( how so? 80% is not enough? I also checked Fedora and it has SCHED_DEBUG=y in its kernel rpms. Ubuntu has CONFIG_SCHED_DEBUG=y as well in their kernels. $ cat /etc/SuSE-release openSUSE 11.1 (x86_64) VERSION = 11.1 $ uname -a Linux nga 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64 x86_64 x86_64 GNU/Linux $ zgrep SCHED_DEBUG /proc/config.gz CONFIG_SCHED_DEBUG=y $ zgrep DEBUG_MUTEX /proc/config.gz # CONFIG_DEBUG_MUTEXES is not set Fedora has mutex debugging disabled too: # CONFIG_DEBUG_MUTEXES is not set So the 3 main Linux distros, generating ~95% of the kerneloops.org feedback traffic, all have SCHED_DEBUG=y, and at least two have !DEBUG_MUTEXES. (possibly Ubuntu has that disabled it too) Ingo -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] adaptive spinning mutexes
* Andrew Morton a...@linux-foundation.org wrote: I also checked Fedora and it has SCHED_DEBUG=y in its kernel rpms. If all distros set SCHED_DEBUG=y then fine. 95% of the distros and significant majority of the lkml traffic. And no, we dont generally dont provide knobs for essential performance features of core Linux kernel primitives - so the existence of SPIN_OWNER in /sys/debug/sched_features is an exception already. We dont have any knob to switch ticket spinlocks to old-style spinlocks. We dont have any knob to switch the page allocator from LIFO to FIFO. We dont have any knob to turn off the coalescing of vmas in the MM. We dont have any knob to turn the mmap_sem from an rwsem to a mutex to a spinlock. Why? Beacause such design and implementation details are what make Linux Linux, and we stand by those decisions for better or worse. And we do try to eliminate as many 'worse' situations as possible, but we dont provide knobs galore. We offer flexibility in our willingness to fix any genuine performance issues in our source code. The thing is that apps tend to gravitate towards solutions with the least short-term cost. If a super important enterprise app can solve their performance problem by either redesigning their broken code, or by turning off a feature we have in the kernel in their install scripts (which we made so easy to tune via a stable sysctl), guess which variant they will chose? Even if they hurt all other apps in the process. note that there's also a performance issue here: we generally _dont want_ a debug sysctl overhead in the mutex code or in any fastpath for that matter. So making it depend on SCHED_DEBUG is useful. sched_feat() features get optimized out at build time when SCHED_DEBUG is disabled. So it gives us the best of two worlds: the utility of sysctls in the SCHED_DEBUG=y, and they get compiled out in the !SCHED_DEBUG case. I'm not detecting here a sufficient appreciation of the number of sched-related regressions we've seen in recent years, nor of the difficulty encountered in diagnosing and fixing them. Let alone the difficulty getting those fixes propagated out a *long* time after the regression was added. The bugzilla you just dug out in another thread does not seem to apply, so i'm not sure what you are referring to. Regarding historic tendencies, we have numbers like: [v2.6.14] [v2.6.29] Semaphores | Mutexes -- | no-spin spin | [tmpfs] ops/sec: 50713 | 291038 392865 (+34.9%) [ext3]ops/sec: 45214 | 283291 435674 (+53.7%) 10x performance improvement on ext3, compared to 2.6.14. I'm sure there will be other numbers that go down - but the thing is, we've _never_ been good at finding the worst-possible workload cases during development. You're taking a whizzy new feature which drastically changes a critical core kernel feature and jamming it into mainline with a vestigial amount of testing coverage without giving sufficient care and thought to the practical lessons which we have learned from doing this in the past. If you look at the whole existence of /sys/debug/sched_feature you'll see how careful we've been about performance regressions. We made it a sched_feat() exactly because if a change goes wrong and becomes a step backwards then it's a oneliner to turn it default-off. We made use of that facility in the past and we have a number of debug knobs there right now: # cat /debug/sched_features NEW_FAIR_SLEEPERS NORMALIZED_SLEEPER WAKEUP_PREEMPT START_DEBIT AFFINE_WAKEUPS CACHE_HOT_BUDDY SYNC_WAKEUPS NO_HRTICK NO_DOUBLE_TICK ASYM_GRAN LB_BIAS LB_WAKEUP_UPDATE ASYM_EFF_LOAD NO_WAKEUP_OVERLAP LAST_BUDDY OWNER_SPIN All of those ~16 scheduler knobs were done out of caution, to make sure that if we change some scheduling aspect there's a convenient way to debug performance or interactivity regressions, without forcing people into bisection and/or reboots, etc. This is a highly risky change. It's not that the probability of failure is high - the problem is that the *cost* of the improbable failure is high. We should seek to minimize that cost. It never mattered much to the efficiency of finding performance regressions whether a feature sat tight for 4 kernel releases in -mm or went upstream in a week. It _does_ matter to stability - but not significantly to performance. What matteres most to getting performance right is testing exposure and variance, not length of the integration period. Easy revertability helps too - and that is a given here - it's literally a oneliner to disable it. See that oneliner below. Ingo Index: linux/kernel/sched_features.h === ---
Re: [GIT PULL] adaptive spinning mutexes
On Wed, Jan 14, 2009 at 01:35:29PM -0800, Andrew Morton wrote: You're taking a whizzy new feature which drastically changes a critical core kernel feature and jamming it into mainline with a vestigial amount of testing coverage without giving sufficient care and thought to the practical lessons which we have learned from doing this in the past. This is a highly risky change. It's not that the probability of failure is high - the problem is that the *cost* of the improbable failure is high. We should seek to minimize that cost. There is very little downside to waiting for at least the next release cycle. What's the case for making an exception and merging it right now? It actually still seems to be generating a lot of changes and discussion right up until yesterday... -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html