Re: [PATCH -v5][RFC]: mutex: implement adaptive spinning
On Thu, 8 Jan 2009, Andi Kleen wrote: What about memory hotplug as Ingo mentioned? Should that be: #if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_MEMORY_HOTPLUG)? We expect memory hotunplug to only really work in movable zones (all others should at least have one kernel object somewhere that prevents unplug), and you can't have task structs in movable zones, obviously. So it's probably a non-issue in practice.

Sure, it probably is a non-issue, but I'm afraid that non-issues of today might become issues of tomorrow. Where does it say that we can never put a task struct in a movable zone? Perhaps we could someday have a CPU with memory local to it, and pin tasks to that CPU in that memory. Then there can be cases where we remove the CPU and memory together. Because of preemption in the mutex spin part, there's no guarantee that the task in that removed memory will not be referenced again. Of course this thought is purely theoretical, but I like to solve bugs that might happen tomorrow too. ;-)

-- Steve

-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -v6][RFC]: mutex: implement adaptive spinning
Ingo Molnar wrote: * Peter Zijlstra pet...@infradead.org wrote: * WOW *

WOW indeed - and i can see a similar _brutal_ speedup on two separate 16-way boxes as well: 16 CPUs, running 128 parallel test-tasks.

NO_OWNER_SPIN: avg ops/sec: 281595
OWNER_SPIN:    avg ops/sec: 524791

Da Killer!

This jives with our findings back when we first looked at this (200%-300% speedups in most benchmarks), so it is excellent that it is yielding boosts here as well.

Look at the performance counter stats:

  12098.324578  task clock ticks  (msecs)
          1081  CPU migrations    (events)
          7102  context switches  (events)
          2763  pagefaults        (events)

  22280.283224  task clock ticks  (msecs)
           117  CPU migrations    (events)
          5711  context switches  (events)
          2781  pagefaults        (events)

We were able to spend twice as much CPU time, and efficiently so - and we did about 10% of the cross-CPU migrations as before (!).

My (wild) guess is that the biggest speedup factor was perhaps this little trick:

+	if (need_resched())
+		break;

this allows the spin-mutex to only waste CPU time if there's no work around on that CPU (i.e. if there's no other task that wants to run). The moment there's some other task, we context-switch to it.

Well, IIUC that's only true if the other task happens to preempt current, which may not always be the case, right? For instance, if current still has timeslice left, etc. I think the primary difference is actually the reduction in the ctx switch rate, but it's hard to say without looking at detailed traces and more stats. Either way, woohoo!

-Greg
[PATCH -v7][RFC]: mutex: implement adaptive spinning
On Thu, 2009-01-08 at 15:18 +0100, Ingo Molnar wrote: [ A detail, -tip testing found that the patch breaks mutex debugging:

=
[ BUG: bad unlock balance detected! ]
-
swapper/0 is trying to release lock (cpu_add_remove_lock) at:
[8089f540] mutex_unlock+0xe/0x10
but there are no more locks to release!

but that's a detail for -v7 ;-) ]

Here it is..

I changed the optimistic spin cmpxchg trickery to not require the -1000 state; please double and triple check the code. This was done because the interaction between trylock_slowpath and that -1000 state hurt my head.

---
Subject:
From: Peter Zijlstra a.p.zijls...@chello.nl
Date: Thu Jan 08 09:41:22 CET 2009

Signed-off-by: Peter Zijlstra a.p.zijls...@chello.nl
---
 include/linux/mutex.h   |    4 +-
 include/linux/sched.h   |    1
 kernel/mutex-debug.c    |   24 ++-
 kernel/mutex-debug.h    |   18 ++-
 kernel/mutex.c          |   76 ++--
 kernel/mutex.h          |   22 -
 kernel/sched.c          |   66 +
 kernel/sched_features.h |    1
 8 files changed, 185 insertions(+), 27 deletions(-)

Index: linux-2.6/include/linux/mutex.h
===
--- linux-2.6.orig/include/linux/mutex.h
+++ linux-2.6/include/linux/mutex.h
@@ -50,8 +50,10 @@ struct mutex {
 	atomic_t		count;
 	spinlock_t		wait_lock;
 	struct list_head	wait_list;
-#ifdef CONFIG_DEBUG_MUTEXES
+#if defined(CONFIG_DEBUG_MUTEXES) || defined(CONFIG_SMP)
 	struct thread_info	*owner;
+#endif
+#ifdef CONFIG_DEBUG_MUTEXES
 	const char		*name;
 	void			*magic;
 #endif

Index: linux-2.6/kernel/mutex-debug.c
===
--- linux-2.6.orig/kernel/mutex-debug.c
+++ linux-2.6/kernel/mutex-debug.c
@@ -26,11 +26,6 @@
 /*
  * Must be called with lock->wait_lock held.
  */
-void debug_mutex_set_owner(struct mutex *lock, struct thread_info *new_owner)
-{
-	lock->owner = new_owner;
-}
-
 void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
 {
 	memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
@@ -80,9 +75,25 @@ void debug_mutex_unlock(struct mutex *lo
 		return;
 
 	DEBUG_LOCKS_WARN_ON(lock->magic != lock);
+#if 0
+	/*
+	 * XXX something is iffy with owner tracking, but lockdep - which has
+	 * similar checks - finds everything dandy, so disable for now.
+	 */
+	if (lock->owner != current_thread_info()) {
+		printk(KERN_ERR "%p %p\n", lock->owner, current_thread_info());
+		if (lock->owner) {
+			printk(KERN_ERR "%d %s\n",
+				lock->owner->task->pid,
+				lock->owner->task->comm);
+		}
+		printk(KERN_ERR "%d %s\n",
+			current_thread_info()->task->pid,
+			current_thread_info()->task->comm);
+	}
 	DEBUG_LOCKS_WARN_ON(lock->owner != current_thread_info());
+#endif
 	DEBUG_LOCKS_WARN_ON(!lock->wait_list.prev && !lock->wait_list.next);
-	DEBUG_LOCKS_WARN_ON(lock->owner != current_thread_info());
 }
 
 void debug_mutex_init(struct mutex *lock, const char *name,
@@ -95,7 +106,6 @@ void debug_mutex_init(struct mutex *lock
 	debug_check_no_locks_freed((void *)lock, sizeof(*lock));
 	lockdep_init_map(&lock->dep_map, name, key, 0);
 #endif
-	lock->owner = NULL;
 	lock->magic = lock;
 }

Index: linux-2.6/kernel/mutex-debug.h
===
--- linux-2.6.orig/kernel/mutex-debug.h
+++ linux-2.6/kernel/mutex-debug.h
@@ -13,14 +13,6 @@
 /*
  * This must be called with lock->wait_lock held.
  */
-extern void
-debug_mutex_set_owner(struct mutex *lock, struct thread_info *new_owner);
-
-static inline void debug_mutex_clear_owner(struct mutex *lock)
-{
-	lock->owner = NULL;
-}
-
 extern void debug_mutex_lock_common(struct mutex *lock,
 				    struct mutex_waiter *waiter);
 extern void debug_mutex_wake_waiter(struct mutex *lock,
@@ -35,6 +27,16 @@ extern void debug_mutex_unlock(struct mu
 extern void debug_mutex_init(struct mutex *lock, const char *name,
 			     struct lock_class_key *key);
 
+static inline void mutex_set_owner(struct mutex *lock)
+{
+	lock->owner = current_thread_info();
+}
+
+static inline void mutex_clear_owner(struct mutex *lock)
+{
+	lock->owner = NULL;
+}
+
 #define spin_lock_mutex(lock, flags)			\
 	do {						\
 		struct mutex *l = container_of(lock, struct mutex, wait_lock); \

Index:
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
On Thu, 2009-01-08 at 10:09 -0500, Steven Rostedt wrote: On Thu, 8 Jan 2009, Peter Zijlstra wrote:

Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -4672,6 +4672,72 @@ need_resched_nonpreemptible:
 }
 EXPORT_SYMBOL(schedule);
 
+#ifdef CONFIG_SMP
+/*
+ * Look out! "owner" is an entirely speculative pointer
+ * access and not reliable.
+ */
+int spin_on_owner(struct mutex *lock, struct thread_info *owner)
+{
+	unsigned int cpu;
+	struct rq *rq;
+	int ret = 1;
+
+	if (unlikely(!sched_feat(OWNER_SPIN)))

I would remove the unlikely. If someone turns OWNER_SPIN off, then you have the wrong decision being made. Choices by users should never be in a likely or unlikely annotation. It's discrimination ;-)

In the unlikely case we schedule(), and that seems expensive enough to want to make the spin case ever so slightly faster.

+		return 0;
+
+	preempt_disable();
+#ifdef CONFIG_DEBUG_PAGEALLOC
+	/*
+	 * Need to access the cpu field knowing that
+	 * DEBUG_PAGEALLOC could have unmapped it if
+	 * the mutex owner just released it and exited.
+	 */
+	if (probe_kernel_address(&owner->cpu, cpu))
+		goto out;
+#else
+	cpu = owner->cpu;
+#endif
+
+	/*
+	 * Even if the access succeeded (likely case),
+	 * the cpu field may no longer be valid.
+	 */
+	if (cpu >= nr_cpumask_bits)
+		goto out;
+
+	/*
+	 * We need to validate that we can do a
+	 * get_cpu() and that we have the percpu area.
+	 */
+	if (!cpu_online(cpu))
+		goto out;

Should we need to do a get_cpu or something? Couldn't the CPU disappear between these two calls? Or does it do a stop-machine, and the preempt disable will protect us?

Did you miss the preempt_disable() a bit up?

+
+	rq = cpu_rq(cpu);
+
+	for (;;) {
+		if (lock->owner != owner)
+			break;
+
+		/*
+		 * Is that owner really running on that cpu?
+		 */
+		if (task_thread_info(rq->curr) != owner)
+			break;
+
+		if (need_resched()) {
+			ret = 0;
+			break;
+		}
+
+		cpu_relax();
+	}
+out:
+	preempt_enable_no_resched();
+	return ret;
+}
+#endif
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
On Thu, 2009-01-08 at 10:28 -0500, Steven Rostedt wrote: On Thu, 8 Jan 2009, Peter Zijlstra wrote: in the unlikely case we schedule(), that seems expensive enough to want to make the spin case ever so slightly faster. OK, that makes sense, but I would comment that. Otherwise, it just looks like another misuse of the unlikely annotation. OK, sensible enough. Should we need to do a get_cpu or something? Couldn't the CPU disappear between these two calls. Or does it do a stop-machine and the preempt disable will protect us? Did you miss the preempt_disable() a bit up? No, let me rephrase it better. Does the preempt_disable protect against another CPU from going off line? Does taking a CPU off line do a stop_machine? Yes and yes.
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
On Thu, 8 Jan 2009, Steven Rostedt wrote:

+	/*
+	 * We need to validate that we can do a
+	 * get_cpu() and that we have the percpu area.
+	 */
+	if (!cpu_online(cpu))
+		goto out;

Should we need to do a get_cpu or something? Couldn't the CPU disappear between these two calls? Or does it do a stop-machine, and the preempt disable will protect us? Did you miss the preempt_disable() a bit up? No, let me rephrase it better. Does the preempt_disable protect against another CPU from going off line? Does taking a CPU off line do a stop_machine?

I just looked at the cpu hotplug code, and it does call stop_machine. All is in order ;-)

-- Steve
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
On Thu, 8 Jan 2009, Peter Zijlstra wrote: This was done because the interaction between trylock_slowpath and that -1000 state hurt my head. Yeah, it was a stupid hacky thing to avoid the list_empty(), but doing it explicitly is fine (we don't hold the lock, so the list isn't necessarily stable, but doing list_empty() is fine because we don't ever dereference the pointers, we just compare the pointers themselves). I shouldn't have done that hacky thing, it wasn't worth it.

Linus
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
On Thu, 2009-01-08 at 08:58 -0800, Linus Torvalds wrote: Ok, I've gone through -v7, and I'm sure you're all shocked to hear it, but I have no complaints.

*cheer*, except I guess we need to figure out what goes bad for Chris.

Except that you dropped all the good commit commentary you had earlier ;)

Yeah, I've yet to add that back, will do.

The patch looks pretty good (except for the big #if 0 block in mutex-debug.c that I hope gets fixed, but I can't even really claim that I can be bothered), the locking looks fine (ie no locking at all), and the numbers seem pretty convincing. Oh, and I think the open-coded atomic_cmpxchg(count, 1, 0) == 1 could possibly just be replaced with a simple __mutex_fastpath_trylock(). I dunno.

__mutex_fastpath_trylock() isn't always that neat -- see include/asm-generic/mutex-xchg.h -- and it's a NOP on DEBUG_MUTEXES. Note how I used old_val for the list_empty() thing as well; we could possibly drop that extra condition though.
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
On Thu, 8 Jan 2009, Steven Rostedt wrote: We keep spinning if the owner changes.

I think we want to - if you have multiple CPUs and a heavily contended lock that acts as a spinlock, we still _do_ want to keep spinning even if another CPU gets the lock. And I don't even believe that is the bug. I suspect the bug is simpler.

I think the need_resched() needs to go in the outer loop, or at least happen in the !owner case. Because at least with preemption, what can happen otherwise is:

 - process A gets the lock, but gets preempted before it sets lock->owner. End result: count = 0, owner = NULL.

 - processes B/C go into the spin loop, filling up all CPUs (assuming dual-core here), and will now both loop forever if they hold the kernel lock (or have some other preemption-disabling thing over their down()).

And all the while, process A would _happily_ set ->owner, and eventually release the mutex, but it never gets to run to do either of them. In fact, you might not even need a process C: all you need is for B to be on the same runqueue as A, and having enough load on the other CPUs that A never gets migrated away. So C might be in user space.

I dunno. There are probably variations on the above.

Linus
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
Unrelated: On Thu, 8 Jan 2009, Chris Mason wrote: RIP: 0010:[8024f4de] [8024f4de] __cmpxchg+0x36/0x3f Ouch. HOW THE HELL DID THAT NOT GET INLINED? cmpxchg() is a _single_ instruction if it's inlined, but it's a horrible mess of dynamic conditionals on the (constant - per call-site) size argument if it isn't. It looks like you probably enabled the "let gcc mess up inlining" config option. Ingo - I think we need to remove that crap again. Because gcc gets the inlining horribly horribly wrong. As usual.

Linus
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
On Thu, 8 Jan 2009, Linus Torvalds wrote: And I don't even believe that is the bug. I suspect the bug is simpler. I think the need_resched() needs to go in the outer loop, or at least happen in the !owner case. Because at least with preemption, what can happen otherwise is: process A gets the lock, but gets preempted before it sets lock->owner (end result: count = 0, owner = NULL); processes B/C go into the spin loop, filling up all CPUs (assuming dual-core here), and will now both loop forever if they hold the kernel lock (or have some other preemption-disabling thing over their down()). And all the while, process A would _happily_ set ->owner, and eventually release the mutex, but it never gets to run to do either of them. In fact, you might not even need a process C: all you need is for B to be on the same runqueue as A, and having enough load on the other CPUs that A never gets migrated away. So C might be in user space. I dunno. There are probably variations on the above.

Ouch! I think you are on to something:

	for (;;) {
		struct thread_info *owner;

		old_val = atomic_cmpxchg(&lock->count, 1, 0);
		if (old_val == 1) {
			lock_acquired(&lock->dep_map, ip);
			mutex_set_owner(lock);
			return 0;
		}

		if (old_val < 0 && !list_empty(&lock->wait_list))
			break;

		/* See who owns it, and spin on him if anybody */
		owner = ACCESS_ONCE(lock->owner);

The owner was preempted before assigning lock->owner (as you stated).

		if (owner && !spin_on_owner(lock, owner))
			break;

We just spin :-(

		cpu_relax();
	}

I think adding the:

+	if (need_resched())
+		break;

would solve the problem.

Thanks,

-- Steve
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
On Thu, 8 Jan 2009, Steven Rostedt wrote: In fact, you might not even need a process C: all you need is for B to be on the same runqueue as A, and having enough load on the other CPUs that A never gets migrated away. So C might be in user space.

You're right about not needing process C.

I dunno. There are probably variations on the above.

Ouch! I think you are on to something:

	for (;;) {
		struct thread_info *owner;

		old_val = atomic_cmpxchg(&lock->count, 1, 0);
		if (old_val == 1) {
			lock_acquired(&lock->dep_map, ip);
			mutex_set_owner(lock);
			return 0;
		}

		if (old_val < 0 && !list_empty(&lock->wait_list))
			break;

		/* See who owns it, and spin on him if anybody */
		owner = ACCESS_ONCE(lock->owner);

The owner was preempted before assigning lock->owner (as you stated). If it was the current process that preempted the owner, and these are RT tasks pinned to the same CPU, and the owner is of lower priority than the spinner, we have a deadlock! Hmm, I do not think the need_resched() here will even fix that :-/

-- Steve
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
On Thu, 8 Jan 2009, Steven Rostedt wrote: Ouch! I think you are on to something:

Yeah, there's something there, but looking at Chris' backtrace, there's nothing there to disable preemption. So if it was this simple case, it should still have preempted him to let the other process run and finish up. So I don't think Chris' softlockup is at least _exactly_ that case. There's something else going on too.

That said, I do think it's a mistake for us to care about the value of spin_on_owner(). I suspect v8 should:

 - always have "if (need_resched()) break" in the outer loop.

 - drop the return value from spin_on_owner(), and just break out if anything changes (including the need_resched() flag).

 - I'd also drop the old_val < 0 test, and just test the list_empty() unconditionally.

Aim for really simple. As to what to do about the !owner case - we do want to spin on it, but the interaction with preemption is kind of nasty. I'd hesitate to make the mutex_[un]lock() use preempt_disable() to avoid scheduling in between getting the lock and setting the owner, though - because that would slow down the normal fast-path case. Maybe we should just limit the spin on !owner to some maximal count.

Linus
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
On Thu, 2009-01-08 at 10:16 -0800, Linus Torvalds wrote: On Thu, 8 Jan 2009, Steven Rostedt wrote: Ouch! I think you are on to something: Yeah, there's something there, but looking at Chris' backtrace, there's nothing there to disable preemption. So if it was this simple case, it should still have preempted him to let the other process run and finish up.

My .config has no lockdep or schedule debugging, and voluntary preempt. I do have CONFIG_OPTIMIZE_INLINING on; it's a good name for trusting gcc, I guess.

So I don't think Chris' softlockup is at least _exactly_ that case. There's something else going on too. That said, I do think it's a mistake for us to care about the value of spin_on_owner(). I suspect v8 should - always have if (need_resched()) break in the outer loop - drop the return value from spin_on_owner(), and just break out if anything changes (including the need_resched() flag) - I'd also drop the old_val < 0 test, and just test the list_empty() unconditionally.

I'll give the above a shot, and we can address the preempt + !owner in another rev.

-chris
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
* Linus Torvalds torva...@linux-foundation.org wrote: Unrelated: On Thu, 8 Jan 2009, Chris Mason wrote: RIP: 0010:[8024f4de] [8024f4de] __cmpxchg+0x36/0x3f Ouch. HOW THE HELL DID THAT NOT GET INLINED? cmpxchg() is a _single_ instruction if it's inlined, but it's a horrible mess of dynamic conditionals on the (constant - per call-site) size argument if it isn't. It looks like you probably enabled the "let gcc mess up inlining" config option. Ingo - I think we need to remove that crap again. Because gcc gets the inlining horribly horribly wrong. As usual.

Apparently it messes up with asm()s: it doesn't know the contents of the asm() and hence it over-estimates the size [based on string heuristics]... Which is bad - asm()s tend to be the most important entities to inline - all over our fastpaths. Despite that messup it's still a 1% net size win:

      text    data    bss     dec    hex filename
   7109652 1464684 802888 9377224 8f15c8 vmlinux.always-inline
   7046115 1465324 802888 9314327 8e2017 vmlinux.optimized-inlining

That win is mixed in slowpath and fastpath as well. I see three options:

 - Disable CONFIG_OPTIMIZE_INLINING=y altogether (it's already default-off)

 - Change the asm() inline markers to something new like asm_inline, which defaults to __always_inline.

 - Just mark all asm() inline markers as __always_inline - realizing that these should never ever be out of line.

We might still try the second or third options, as i think we shouldn't go back into the business of managing the inline attributes of ~100,000 kernel functions. I'll try to annotate the inline asms (there's not _that_ many of them), and measure what the size impact is.

	Ingo
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
Ingo Molnar wrote: Apparently it messes up with asm()s: it doesn't know the contents of the asm() and hence it over-estimates the size [based on string heuristics]...

Right. gcc simply doesn't have any way to know how heavyweight an asm() statement is, and it WILL do the wrong thing in many cases -- especially the ones which involve an out-of-line recovery stub. This is due to a fundamental design decision in gcc to not integrate the compiler and assembler (which some compilers do.)

Which is bad - asm()s tend to be the most important entities to inline - all over our fastpaths. Despite that messup it's still a 1% net size win:

      text    data    bss     dec    hex filename
   7109652 1464684 802888 9377224 8f15c8 vmlinux.always-inline
   7046115 1465324 802888 9314327 8e2017 vmlinux.optimized-inlining

That win is mixed in slowpath and fastpath as well.

The good part here is that the assembly ones really don't have much subtlety -- a function call is at least five bytes, usually more once you count in the register spill penalties -- so __always_inline-ing them should still end up with numbers looking very much like the above.

I see three options: - Disable CONFIG_OPTIMIZE_INLINING=y altogether (it's already default-off) - Change the asm() inline markers to something new like asm_inline, which defaults to __always_inline. - Just mark all asm() inline markers as __always_inline - realizing that these should never ever be out of line. We might still try the second or third options, as i think we shouldn't go back into the business of managing the inline attributes of ~100,000 kernel functions. I'll try to annotate the inline asms (there's not _that_ many of them), and measure what the size impact is.

The main reason to do #2 over #3 would be for programmer documentation. There simply should be no reason to ever out-of-line these. However, documenting the reason to the programmer is a valuable thing in itself.

-hpa

-- H. Peter Anvin, Intel Open Source Technology Center I work for Intel.
I don't speak on their behalf.
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
On Thu, 2009-01-08 at 13:27 -0500, Chris Mason wrote: On Thu, 2009-01-08 at 10:16 -0800, Linus Torvalds wrote: On Thu, 8 Jan 2009, Steven Rostedt wrote: Ouch! I think you are on to something: Yeah, there's something there, but looking at Chris' backtrace, there's nothing there to disable preemption. So if it was this simple case, it should still have preempted him to let the other process run and finish up. My .config has no lockdep or schedule debugging and voluntary preempt. I do have CONFIG_OPTIMIZE_INLINING on, it's a good name for trusting gcc I guess.

The patch below isn't quite what Linus suggested, but it is working here at least. In every test I've tried so far, this is faster than the ugly btrfs spin.

dbench                    v7.1:        789MB/s
dbench                    simple spin: 566MB/s
50 proc parallel creates  v7.1:        162 files/s, avg sys: 1.6
50 proc parallel creates  simple spin: 152 files/s, avg sys: 2
50 proc parallel stat     v7.1:        2.3s total
50 proc parallel stat     simple spin: 3.8s total

It is less fair though; the 50 proc parallel creates had a much bigger span between the first and last proc's exit time. This isn't a huge shock, I think it shows the hot path is closer to a real spin lock.

Here's the incremental I was using. It looks to me like most of the things that could change inside spin_on_owner mean we still want to spin. The only exception is the need_resched() flag.
-chris

diff --git a/kernel/mutex.c b/kernel/mutex.c
index bd6342a..8936410 100644
--- a/kernel/mutex.c
+++ b/kernel/mutex.c
@@ -161,11 +161,13 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 			return 0;
 		}
 
-		if (old_val < 0 && !list_empty(&lock->wait_list))
+		if (!list_empty(&lock->wait_list))
 			break;
 
 		/* See who owns it, and spin on him if anybody */
 		owner = ACCESS_ONCE(lock->owner);
+		if (need_resched())
+			break;
 		if (owner && !spin_on_owner(lock, owner))
 			break;
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
On Thu, 2009-01-08 at 13:14 -0500, Steven Rostedt wrote: On Thu, 8 Jan 2009, Steven Rostedt wrote: In fact, you might not even need a process C: all you need is for B to be on the same runqueue as A, and having enough load on the other CPUs that A never gets migrated away. So C might be in user space. You're right about not needing process C. I dunno. There are probably variations on the above. Ouch! I think you are on to something:

	for (;;) {
		struct thread_info *owner;

		old_val = atomic_cmpxchg(&lock->count, 1, 0);
		if (old_val == 1) {
			lock_acquired(&lock->dep_map, ip);
			mutex_set_owner(lock);
			return 0;
		}

		if (old_val < 0 && !list_empty(&lock->wait_list))
			break;

		/* See who owns it, and spin on him if anybody */
		owner = ACCESS_ONCE(lock->owner);

The owner was preempted before assigning lock->owner (as you stated). If it was the current process that preempted the owner, and these are RT tasks pinned to the same CPU, and the owner is of lower priority than the spinner, we have a deadlock! Hmm, I do not think the need_resched() here will even fix that :-/

RT tasks could go directly to sleeping. The spinner would see them on the list and break out.

-chris
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
On Thu, 2009-01-08 at 11:13 -0800, Linus Torvalds wrote: Well, at least we do unless you enable that broken paravirt support. I'm not at all clear on why CONFIG_PARAVIRT wants to use inferior locks, but I don't much care. Because the virtual cpu that has the ticket might not get scheduled for a while, even though another vcpu with a spinner is scheduled. The whole (para)virt is a nightmare in that respect.
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
On Thu, 8 Jan 2009, Chris Mason wrote: On Thu, 2009-01-08 at 13:14 -0500, Steven Rostedt wrote: If it was the current process that preempted the owner and these are RT tasks pinned to the same CPU and the owner is of lower priority than the spinner, we have a deadlock! Hmm, I do not think the need_resched() here will even fix that :-/ RT tasks could go directly to sleeping. The spinner would see them on the list and break out.

True, we could do:

	if (owner) {
		if (!spin_on_owner(lock, owner))
			break;
	} else if (rt_task(current))
		break;

That would at least solve the issue in the short term.

-- Steve
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
On Thu, 2009-01-08 at 19:33 +0100, Ingo Molnar wrote: * Linus Torvalds torva...@linux-foundation.org wrote: snip Ingo - I think we need to remove that crap again. Because gcc gets the inlining horribly horribly wrong. As usual. Apparently it messes up with asm()s: it doesn't know the contents of the asm() and hence it over-estimates the size [based on string heuristics]... snip That win is mixed in slowpath and fastpath as well. I see three options:

 - Disable CONFIG_OPTIMIZE_INLINING=y altogether (it's already default-off)

I'd like to see this; leave all the heuristics out of it. If I say inline, I don't mean _maybe_, I mean you'd better damn well inline it. On the other hand, alpha seems to be hand-disabling the "inline means __always_inline" in their arch headers, so if this option is kept, it should be enabled on alpha, as that is the current state of play there, and get rid of that arch-private bit.

 - Change the asm() inline markers to something new like asm_inline, which defaults to __always_inline.

I'd suggest just making inline always mean __always_inline. And get to work removing inline from functions in C files. This is probably also the better choice to keep older gccs producing decent code.

 - Just mark all asm() inline markers as __always_inline - realizing that these should never ever be out of line. We might still try the second or third options, as i think we shouldn't go back into the business of managing the inline attributes of ~100,000 kernel functions.

Or just make it clear that inline shouldn't (unless for a very good reason) _ever_ be used in a .c file.

Cheers,

Harvey
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
The problem? Setting lock->count to 0. That will mean that the next mutex_unlock() will not necessarily enter the slowpath at all, and won't necessarily wake things up like it should. Normally we set lock->count to 0 after getting the lock, and only _inside_ the spinlock, and then we check the waiters after that. The comment says it all:

	/* set it to 0 if there are no waiters left: */
	if (likely(list_empty(&lock->wait_list)))
		atomic_set(&lock->count, 0);

and the spinning case violates that rule.

The difference is that here we have it set to -1 (in the non-patched code), and we have to decide if we should change that to 0. To change from -1 to 0 needs the protection of the spin locks. In the loop, we only change from 1 to 0, which is the same as the fast path, and should not cause any problems.

Now, the spinning case only sets it to 0 if we saw it set to 1, so I think the argument can go something like:

Yep.

 - if it was 1, and we _have_ seen contention, then we know that at least _one_ person that set it to 1 must have gone through the unlock slowpath (ie it wasn't just a normal locked increment).

Correct.

 - So even if _we_ (in the spinning part of stealing that lock) didn't wake the waiter up, the slowpath wakeup case (that did _not_ wake us up, since we were spinning and hadn't added ourselves to the wait list) must have done so.

Agreed.

So maybe it's all really really safe, and we're still guaranteed to have as many wakeups as we had go-to-sleeps. But I have to admit that my brain hurts a bit from worrying about this.

I do not think that the issue with the previous bug that Chris showed had anything to do with the actual sleepers. The slow path never changes the lock to '1', so it should not affect the spinners. We can think of the spinners as not having true contention with the lock, and as being just like a:

	while (cond) {
		if (mutex_trylock(lock))
			goto got_the_lock;
	}

Sleeping mutexes are not ever simple.
Now you see why in -rt we did all this in the slow path ;-) -- Steve
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
Harvey Harrison wrote: We might still try the second or third options, as I think we shouldn't go back into the business of managing the inline attributes of ~100,000 kernel functions. Or just make it clear that inline shouldn't (unless for a very good reason) _ever_ be used in a .c file. The question is if that would produce acceptable quality code. In theory it should, but I'm more than wondering if it really will. It would be ideal, of course, as it would mean less typing. I guess we could try it out by disabling any inline in the current code that isn't __always_inline... -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf.
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
On Thu, 2009-01-08 at 17:44 -0800, H. Peter Anvin wrote: Harvey Harrison wrote: We might still try the second or third options, as I think we shouldn't go back into the business of managing the inline attributes of ~100,000 kernel functions. Or just make it clear that inline shouldn't (unless for a very good reason) _ever_ be used in a .c file. The question is if that would produce acceptable quality code. In theory it should, but I'm more than wondering if it really will. It would be ideal, of course, as it would mean less typing. I guess we could try it out by disabling any inline in the current code that isn't __always_inline... A lot of code was written assuming inline means __always_inline. I'd suggest keeping that assumption and working on removing inlines that aren't strictly necessary, as there's no way to know which inlines meant 'try to inline' and which ones really should have been __always_inline. Not that I feel _that_ strongly about it. Cheers, Harvey
Re: What's wrong with snapshot?
thanks. I got it. Shen on 01/08/2009 07:49 PM, Yan Zheng wrote: 2009/1/8 Shen Feng s...@cn.fujitsu.com: Hello, I tried to create a snapshot of a subvol. But it seems to not work. Please see the test below. I think the file created by dd should also be in the snap directory. This is the expected behaviour. The following link describes what a snapshot is. http://en.wikipedia.org/wiki/Snapshot_(computer_storage) Yan Zheng
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
On Thu, Jan 08, 2009 at 05:44:25PM -0800, H. Peter Anvin wrote: Harvey Harrison wrote: We might still try the second or third options, as I think we shouldn't go back into the business of managing the inline attributes of ~100,000 kernel functions. Or just make it clear that inline shouldn't (unless for a very good reason) _ever_ be used in a .c file. The question is if that would produce acceptable quality code. In theory it should, but I'm more than wondering if it really will. I actually often use noinline when developing code, simply because it makes it easier to read oopses when gcc doesn't inline every static (which it normally does if it only has a single caller). You know roughly where it crashed without having to decode the line number. I believe others do that too; I notice it's all over btrfs, for example. -Andi -- a...@linux.intel.com
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
On Thu, 8 Jan 2009, H. Peter Anvin wrote: Right. gcc simply doesn't have any way to know how heavyweight an asm() statement is. I don't think that's relevant. First off, gcc _does_ have a perfectly fine notion of how heavy-weight an asm statement is: just count it as a single instruction (and count the argument setup cost that gcc _can_ estimate). That would be perfectly fine. If people use inline asms, they tend to use it for a reason. However, I doubt that it's the inline asm that was the biggest reason why gcc decided not to inline - it was probably the constant switch() statement. The inline function actually does look pretty large, were it not for the fact that we have a constant argument, which makes the switch statement go away. I suspect gcc has some pre-inlining heuristics that don't take constant folding and simplification into account - if you look at just the raw tree of the function without taking the optimization into account, it will look big. Linus
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
From: Linus Torvalds torva...@linux-foundation.org Date: Thu, 8 Jan 2009 19:46:30 -0800 (PST) First off, gcc _does_ have a perfectly fine notion of how heavy-weight an asm statement is: just count it as a single instruction (and count the argument setup cost that gcc _can_ estimate). Actually, at least at one point, it counted the number of newline characters in the assembly string to estimate how many instructions are contained inside. It actually needs to know exactly how many instructions are in there, to emit proper far branches and stuff like that, for some cpus. They never added an (optional) way to actually tell the compiler this critical piece of information, so I guess the newline hack is the best they could come up with.
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
Linus Torvalds wrote: First off, gcc _does_ have a perfectly fine notion of how heavy-weight an asm statement is: just count it as a single instruction (and count the argument setup cost that gcc _can_ estimate). True. It's not what it's doing, though. It looks for '\n' and ';' characters, and counts the maximum instruction size for each possible instruction. The reason is that gcc's size estimation is partially designed to select what kind of branches it needs to use on architectures which have more than one type of branch. As a result, it tends to drastically overestimate, on purpose. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf.
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
Harvey Harrison wrote: A lot of code was written assuming inline means __always_inline. I'd suggest keeping that assumption and working on removing inlines that aren't strictly necessary, as there's no way to know which inlines meant 'try to inline' and which ones really should have been __always_inline. Not that I feel _that_ strongly about it. Actually, we have that reasonably well down by now. There seem to be a couple of minor tweaks still necessary, but I think we're 90-95% there already. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf.
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
On Fri, 9 Jan 2009 04:35:31 +0100 Andi Kleen a...@firstfloor.org wrote: On Thu, Jan 08, 2009 at 05:44:25PM -0800, H. Peter Anvin wrote: Harvey Harrison wrote: We might still try the second or third options, as I think we shouldn't go back into the business of managing the inline attributes of ~100,000 kernel functions. Or just make it clear that inline shouldn't (unless for a very good reason) _ever_ be used in a .c file. The question is if that would produce acceptable quality code. In theory it should, but I'm more than wondering if it really will. I actually often use noinline when developing code, simply because it makes it easier to read oopses when gcc doesn't inline every static (which it normally does if it only has a single caller). You know roughly where it crashed without having to decode the line number. I believe others do that too; I notice it's all over btrfs, for example. Plus there is the problem where

	foo() { char a[1000]; }
	bar() { char a[1000]; }
	zot() { foo(); bar(); }

uses 2000 bytes of stack once gcc inlines both helpers into zot(). Fortunately scripts/checkstack.pl can find these. If someone runs it. With the right kconfig settings. And with the right compiler version.
Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
On Thu, Jan 08, 2009 at 07:42:48PM -0800, Linus Torvalds wrote: I actually often use noinline when developing code, simply because it makes it easier to read oopses when gcc doesn't inline every static (which it normally does if it only has a single caller). Yes. Gcc inlining is a total piece of sh*t. The static inlining by default (unfortunately) saves a lot of text size. For testing I built an x86-64 allyesconfig kernel with -fno-inline-functions-called-once (which disables the default static inlining), and it increased text size by ~4.1% (over 2MB for an allyesconfig kernel). So I think we have to keep that; dropping it would cost too much :/ -Andi -- a...@linux.intel.com