Re: [RFC 3/3] mutex: When there is no owner, stop spinning after too many tries
On Thu, 2014-01-16 at 13:05 +0100, Peter Zijlstra wrote:
> On Wed, Jan 15, 2014 at 10:46:17PM -0800, Jason Low wrote:
> > On Thu, 2014-01-16 at 10:14 +0700, Linus Torvalds wrote:
> > > On Thu, Jan 16, 2014 at 9:45 AM, Jason Low wrote:
> > > >
> > > > Any comments on the below change which unlocks the mutex before taking
> > > > the lock->wait_lock to wake up a waiter? Thanks.
> > >
> > > Hmm. Doesn't that mean that a new lock owner can come in *before*
> > > you've called debug_mutex_unlock and the lockdep stuff, and get the
> > > lock? And then debug_mutex_lock() will be called *before* the unlocker
> > > called debug_mutex_unlock(), which I'm sure confuses things.
> >
> > If obtaining the wait_lock for debug_mutex_unlock is the issue, then
> > perhaps we can address that by taking care of #ifdef CONFIG_DEBUG_MUTEXES.
> > In the CONFIG_DEBUG_MUTEXES case, we can take the wait_lock first, and in
> > the regular case, take the wait_lock after releasing the mutex.
>
> I think we're already good for DEBUG_MUTEXES, because DEBUG_MUTEXES has
> to work for archs that have !__mutex_slowpath_needs_to_unlock() and also
> the DEBUG_MUTEXES code is entirely serialized on ->wait_lock.

Yeah, in the !__mutex_slowpath_needs_to_unlock() case, we release the
mutex before calling __mutex_unlock_common_slowpath().

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 3/3] mutex: When there is no owner, stop spinning after too many tries
On Wed, Jan 15, 2014 at 10:46:17PM -0800, Jason Low wrote:
> On Thu, 2014-01-16 at 10:14 +0700, Linus Torvalds wrote:
> > On Thu, Jan 16, 2014 at 9:45 AM, Jason Low wrote:
> > >
> > > Any comments on the below change which unlocks the mutex before taking
> > > the lock->wait_lock to wake up a waiter? Thanks.
> >
> > Hmm. Doesn't that mean that a new lock owner can come in *before*
> > you've called debug_mutex_unlock and the lockdep stuff, and get the
> > lock? And then debug_mutex_lock() will be called *before* the unlocker
> > called debug_mutex_unlock(), which I'm sure confuses things.
>
> If obtaining the wait_lock for debug_mutex_unlock is the issue, then
> perhaps we can address that by taking care of #ifdef CONFIG_DEBUG_MUTEXES.
> In the CONFIG_DEBUG_MUTEXES case, we can take the wait_lock first, and in
> the regular case, take the wait_lock after releasing the mutex.

I think we're already good for DEBUG_MUTEXES, because DEBUG_MUTEXES has
to work for archs that have !__mutex_slowpath_needs_to_unlock() and also
the DEBUG_MUTEXES code is entirely serialized on ->wait_lock.

Note that we cannot do the optimistic spinning for DEBUG_MUTEXES exactly
because of this reason.
Re: [RFC 3/3] mutex: When there is no owner, stop spinning after too many tries
On Thu, 2014-01-16 at 10:14 +0700, Linus Torvalds wrote:
> On Thu, Jan 16, 2014 at 9:45 AM, Jason Low wrote:
> >
> > Any comments on the below change which unlocks the mutex before taking
> > the lock->wait_lock to wake up a waiter? Thanks.
>
> Hmm. Doesn't that mean that a new lock owner can come in *before*
> you've called debug_mutex_unlock and the lockdep stuff, and get the
> lock? And then debug_mutex_lock() will be called *before* the unlocker
> called debug_mutex_unlock(), which I'm sure confuses things.

If obtaining the wait_lock for debug_mutex_unlock is the issue, then
perhaps we can address that by taking care of #ifdef CONFIG_DEBUG_MUTEXES.
In the CONFIG_DEBUG_MUTEXES case, we can take the wait_lock first, and in
the regular case, take the wait_lock after releasing the mutex.

#ifdef CONFIG_DEBUG_MUTEXES
	spin_lock_mutex(&lock->wait_lock, flags);
#endif
	mutex_release(&lock->dep_map, nested, _RET_IP_);
	debug_mutex_unlock(lock);

	if (__mutex_slowpath_needs_to_unlock())
		atomic_set(&lock->count, 1);

#ifndef CONFIG_DEBUG_MUTEXES
	spin_lock_mutex(&lock->wait_lock, flags);
#endif
Re: [RFC 3/3] mutex: When there is no owner, stop spinning after too many tries
On Thu, Jan 16, 2014 at 9:45 AM, Jason Low wrote:
>
> Any comments on the below change which unlocks the mutex before taking
> the lock->wait_lock to wake up a waiter? Thanks.

Hmm. Doesn't that mean that a new lock owner can come in *before*
you've called debug_mutex_unlock and the lockdep stuff, and get the
lock? And then debug_mutex_lock() will be called *before* the unlocker
called debug_mutex_unlock(), which I'm sure confuses things.

Linus
Re: [RFC 3/3] mutex: When there is no owner, stop spinning after too many tries
On Tue, 2014-01-14 at 16:33 -0800, Jason Low wrote:
> When running workloads that have high contention in mutexes on an 8 socket
> machine, spinners would often spin for a long time with no lock owner.
>
> One of the potential reasons for this is because a thread can be preempted
> after clearing lock->owner but before releasing the lock, or preempted after
> acquiring the mutex but before setting lock->owner. In those cases, the
> spinner cannot check if owner is not on_cpu because lock->owner is NULL.

Looks like a bigger source of !owner latency is in
__mutex_unlock_common_slowpath(). If __mutex_slowpath_needs_to_unlock(),
then the owner needs to acquire the wait_lock before setting lock->count
to 1. If the wait_lock is being contended, which is occurring with some
workloads on my box, then this can delay the owner from releasing the
lock by quite a bit.

Any comments on the below change which unlocks the mutex before taking
the lock->wait_lock to wake up a waiter? Thanks.

---
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index b500cc7..38f0eb0 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -723,10 +723,6 @@ __mutex_unlock_common_slowpath(atomic_t *lock_count, int nested)
 	struct mutex *lock = container_of(lock_count, struct mutex, count);
 	unsigned long flags;
 
-	spin_lock_mutex(&lock->wait_lock, flags);
-	mutex_release(&lock->dep_map, nested, _RET_IP_);
-	debug_mutex_unlock(lock);
-
 	/*
 	 * some architectures leave the lock unlocked in the fastpath failure
 	 * case, others need to leave it locked. In the latter case we have to
@@ -735,6 +731,10 @@ __mutex_unlock_common_slowpath(atomic_t *lock_count, int nested)
 	if (__mutex_slowpath_needs_to_unlock())
 		atomic_set(&lock->count, 1);
 
+	spin_lock_mutex(&lock->wait_lock, flags);
+	mutex_release(&lock->dep_map, nested, _RET_IP_);
+	debug_mutex_unlock(lock);
+
 	if (!list_empty(&lock->wait_list)) {
 		/* get the first entry from the wait-list: */
 		struct mutex_waiter *waiter =
Re: [RFC 3/3] mutex: When there is no owner, stop spinning after too many tries
On 01/14/2014 07:33 PM, Jason Low wrote:
> When running workloads that have high contention in mutexes on an 8 socket
> machine, spinners would often spin for a long time with no lock owner.
>
> One of the potential reasons for this is because a thread can be preempted
> after clearing lock->owner but before releasing the lock, or preempted after
> acquiring the mutex but before setting lock->owner. In those cases, the
> spinner cannot check if owner is not on_cpu because lock->owner is NULL.
>
> A solution that would address the preemption part of this problem would
> be to disable preemption between acquiring/releasing the mutex and
> setting/clearing the lock->owner. However, that will require adding
> overhead to the mutex fastpath.
>
> The solution used in this patch is to limit the # of times a thread can
> spin on lock->count when !owner.
>
> The threshold used in this patch for each spinner was 128, which appeared
> to be a generous value, but any suggestions on another method to determine
> the threshold are welcomed.
>
> Signed-off-by: Jason Low
> ---
>  kernel/locking/mutex.c | 10 +++++++---
>  1 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index b500cc7..9465604 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -43,6 +43,7 @@
>   * mutex.
>   */
>  #define MUTEX_SHOW_NO_WAITER(mutex)	(atomic_read(&(mutex)->count) >= 0)
> +#define MUTEX_SPIN_THRESHOLD		(128)
>
>  void
>  __mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key)
> @@ -418,7 +419,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  	struct task_struct *task = current;
>  	struct mutex_waiter waiter;
>  	unsigned long flags;
> -	int ret;
> +	int ret, nr_spins = 0;
>  	struct mspin_node node;
>
>  	preempt_disable();
> @@ -453,6 +454,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  	mspin_lock(MLOCK(lock), &node);
>  	for (;;) {
>  		struct task_struct *owner;
> +		nr_spins++;
>
>  		if (use_ww_ctx && ww_ctx->acquired > 0) {
>  			struct ww_mutex *ww;
> @@ -502,9 +504,11 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  		 * When there's no owner, we might have preempted between the
>  		 * owner acquiring the lock and setting the owner field. If
>  		 * we're an RT task that will live-lock because we won't let
> -		 * the owner complete.
> +		 * the owner complete. Additionally, when there is no owner,
> +		 * stop spinning after too many tries.
>  		 */
> -		if (!owner && (need_resched() || rt_task(task))) {
> +		if (!owner && (need_resched() || rt_task(task) ||
> +		    nr_spins > MUTEX_SPIN_THRESHOLD)) {
>  			mspin_unlock(MLOCK(lock), &node);
>  			goto slowpath;
>  		}

The time that a thread spends on one iteration of the loop can be highly
variable. Instead of a predefined iteration count, you may consider
setting an elapsed time limit on how long a thread can spin in the loop,
using changes in jiffies as a proxy. Say we set a limit of 40ms: on a
1000Hz system, a change of 40 or more in jiffies will indicate it is time
to leave and go to the slowpath.

-Longman
Re: [RFC 3/3] mutex: When there is no owner, stop spinning after too many tries
On Tue, 2014-01-14 at 17:06 -0800, Davidlohr Bueso wrote:
> On Tue, 2014-01-14 at 16:33 -0800, Jason Low wrote:
> > When running workloads that have high contention in mutexes on an 8 socket
> > machine, spinners would often spin for a long time with no lock owner.
> >
> > One of the potential reasons for this is because a thread can be preempted
> > after clearing lock->owner but before releasing the lock
>
> What happens if you invert the order here? So mutex_clear_owner() is
> called after the actual unlocking (__mutex_fastpath_unlock).

Reversing the mutex_fastpath_unlock and mutex_clear_owner resulted in a
20+% performance improvement to Ingo's test-mutex application at 160
threads on an 8 socket box.

I have tried this method before, but what I was initially concerned about
with clearing the owner after unlocking was that the following scenario
may occur:

	thread 1 releases the lock
	thread 2 acquires the lock (in the fastpath)
	thread 2 sets the owner
	thread 1 clears owner

In this situation, the lock owner is NULL but thread 2 has the lock.

> > or preempted after
> > acquiring the mutex but before setting lock->owner.
>
> That would be the case _only_ for the fastpath. For the slowpath
> (including optimistic spinning) preemption is already disabled at that
> point.

Right, for just the fastpath_lock.

> > In those cases, the
> > spinner cannot check if owner is not on_cpu because lock->owner is NULL.
> >
> > A solution that would address the preemption part of this problem would
> > be to disable preemption between acquiring/releasing the mutex and
> > setting/clearing the lock->owner. However, that will require adding
> > overhead to the mutex fastpath.
>
> It's not uncommon to disable preemption in hotpaths, the overhead should
> be quite small, actually.
>
> > The threshold used in this patch for each spinner was 128, which appeared
> > to be a generous value, but any suggestions on another method to determine
> > the threshold are welcomed.
>
> Hmm, generous compared to what? Could you elaborate further on how you
> reached this value? These kinds of magic numbers have produced
> significant debate in the past.

I've observed that when running workloads which don't exhibit this
behavior (long spins with no owner), threads rarely take more than 100
extra spins. So I went with 128 based on those numbers.
Re: [RFC 3/3] mutex: When there is no owner, stop spinning after too many tries
On Tue, 2014-01-14 at 17:00 -0800, Andrew Morton wrote:
> On Tue, 14 Jan 2014 16:33:10 -0800 Jason Low wrote:
>
> > When running workloads that have high contention in mutexes on an 8 socket
> > machine, spinners would often spin for a long time with no lock owner.
> >
> > One of the potential reasons for this is because a thread can be preempted
> > after clearing lock->owner but before releasing the lock, or preempted after
> > acquiring the mutex but before setting lock->owner. In those cases, the
> > spinner cannot check if owner is not on_cpu because lock->owner is NULL.
>
> That sounds like a very small window. And your theory is that this
> window is being hit sufficiently often to impact aggregate runtime
> measurements, which sounds improbable to me?
>
> > A solution that would address the preemption part of this problem would
> > be to disable preemption between acquiring/releasing the mutex and
> > setting/clearing the lock->owner. However, that will require adding
> > overhead to the mutex fastpath.
>
> preempt_disable() is cheap, and sometimes free.
>
> Have you confirmed that the preempt_disable() approach actually fixes
> the performance issues? If it does then this would confirm your
> "potential reason" hypothesis. If it doesn't then we should be hunting
> further for the explanation.

Using Ingo's test-mutex application (http://lkml.org/lkml/2006/1/8/50),
which can also generate high mutex contention, the preempt_disable()
approach did provide approximately a 4% improvement at 160 threads, but
not nearly the 25+% I was seeing with this patchset. So it looks like
preemption is not the main cause of the problem.

Thanks,
Jason
Re: [RFC 3/3] mutex: When there is no owner, stop spinning after too many tries
On Tue, 2014-01-14 at 16:33 -0800, Jason Low wrote:
> When running workloads that have high contention in mutexes on an 8 socket
> machine, spinners would often spin for a long time with no lock owner.
>
> One of the potential reasons for this is because a thread can be preempted
> after clearing lock->owner but before releasing the lock

What happens if you invert the order here? So mutex_clear_owner() is
called after the actual unlocking (__mutex_fastpath_unlock).

> or preempted after
> acquiring the mutex but before setting lock->owner.

That would be the case _only_ for the fastpath. For the slowpath
(including optimistic spinning) preemption is already disabled at that
point.

> In those cases, the
> spinner cannot check if owner is not on_cpu because lock->owner is NULL.
>
> A solution that would address the preemption part of this problem would
> be to disable preemption between acquiring/releasing the mutex and
> setting/clearing the lock->owner. However, that will require adding
> overhead to the mutex fastpath.

It's not uncommon to disable preemption in hotpaths, the overhead should
be quite small, actually.

> The solution used in this patch is to limit the # of times a thread can
> spin on lock->count when !owner.
>
> The threshold used in this patch for each spinner was 128, which appeared
> to be a generous value, but any suggestions on another method to determine
> the threshold are welcomed.

Hmm, generous compared to what? Could you elaborate further on how you
reached this value? These kinds of magic numbers have produced
significant debate in the past.

Thanks,
Davidlohr
Re: [RFC 3/3] mutex: When there is no owner, stop spinning after too many tries
On Tue, 14 Jan 2014 16:33:10 -0800 Jason Low wrote:

> When running workloads that have high contention in mutexes on an 8 socket
> machine, spinners would often spin for a long time with no lock owner.
>
> One of the potential reasons for this is because a thread can be preempted
> after clearing lock->owner but before releasing the lock, or preempted after
> acquiring the mutex but before setting lock->owner. In those cases, the
> spinner cannot check if owner is not on_cpu because lock->owner is NULL.

That sounds like a very small window. And your theory is that this
window is being hit sufficiently often to impact aggregate runtime
measurements, which sounds improbable to me?

> A solution that would address the preemption part of this problem would
> be to disable preemption between acquiring/releasing the mutex and
> setting/clearing the lock->owner. However, that will require adding
> overhead to the mutex fastpath.

preempt_disable() is cheap, and sometimes free.

Have you confirmed that the preempt_disable() approach actually fixes
the performance issues? If it does then this would confirm your
"potential reason" hypothesis. If it doesn't then we should be hunting
further for the explanation.

> The solution used in this patch is to limit the # of times a thread can
> spin on lock->count when !owner.
>
> The threshold used in this patch for each spinner was 128, which appeared
> to be a generous value, but any suggestions on another method to determine
> the threshold are welcomed.

It's a bit hacky, isn't it? If your "owner got preempted in the window"
theory is correct then I guess this is reasonableish. But if !owner is
occurring for other reasons then perhaps there are better solutions.
Re: [RFC 3/3] mutex: When there is no owner, stop spinning after too many tries
On Tue, 14 Jan 2014 16:33:10 -0800 Jason Low jason.l...@hp.com wrote: When running workloads that have high contention in mutexes on an 8 socket machine, spinners would often spin for a long time with no lock owner. One of the potential reasons for this is because a thread can be preempted after clearing lock-owner but before releasing the lock, or preempted after acquiring the mutex but before setting lock-owner. In those cases, the spinner cannot check if owner is not on_cpu because lock-owner is NULL. That sounds like a very small window. And your theory is that this window is being hit sufficiently often to impact aggregate runtime measurements, which sounds improbable to me? A solution that would address the preemption part of this problem would be to disable preemption between acquiring/releasing the mutex and setting/clearing the lock-owner. However, that will require adding overhead to the mutex fastpath. preempt_disable() is cheap, and sometimes free. Have you confirmed that the preempt_disable() approach actually fixes the performance issues? If it does then this would confirm your potential reason hypothesis. If it doesn't then we should be hunting further for the explanation. The solution used in this patch is to limit the # of times thread can spin on lock-count when !owner. The threshold used in this patch for each spinner was 128, which appeared to be a generous value, but any suggestions on another method to determine the threshold are welcomed. It's a bit hacky, isn't it? If your owner got preempted in the window theory is correct then I guess this is reasonableish. But if !owner is occurring for other reasons then perhaps there are better solutions. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 3/3] mutex: When there is no owner, stop spinning after too many tries
On Tue, 2014-01-14 at 16:33 -0800, Jason Low wrote:

> When running workloads that have high contention in mutexes on an 8
> socket machine, spinners would often spin for a long time with no lock
> owner.
>
> One of the potential reasons for this is because a thread can be
> preempted after clearing lock->owner but before releasing the lock

What happens if you invert the order here? So mutex_clear_owner() is
called after the actual unlocking (__mutex_fastpath_unlock).

> or preempted after acquiring the mutex but before setting lock->owner.

That would be the case _only_ for the fastpath. For the slowpath
(including optimistic spinning) preemption is already disabled at that
point.

> In those cases, the spinner cannot check if owner is not on_cpu because
> lock->owner is NULL.
>
> A solution that would address the preemption part of this problem would
> be to disable preemption between acquiring/releasing the mutex and
> setting/clearing the lock->owner. However, that will require adding
> overhead to the mutex fastpath.

It's not uncommon to disable preemption in hotpaths, the overhead should
be quite small, actually.

> The solution used in this patch is to limit the # of times the thread
> can spin on lock->count when !owner.
>
> The threshold used in this patch for each spinner was 128, which
> appeared to be a generous value, but any suggestions on another method
> to determine the threshold are welcomed.

Hmm, generous compared to what? Could you elaborate further on how you
reached this value? These kinds of magic numbers have produced
significant debate in the past.

Thanks,
Davidlohr
Re: [RFC 3/3] mutex: When there is no owner, stop spinning after too many tries
On Tue, 2014-01-14 at 17:00 -0800, Andrew Morton wrote:

> On Tue, 14 Jan 2014 16:33:10 -0800 Jason Low jason.l...@hp.com wrote:
>
> > When running workloads that have high contention in mutexes on an 8
> > socket machine, spinners would often spin for a long time with no
> > lock owner.
> >
> > One of the potential reasons for this is because a thread can be
> > preempted after clearing lock->owner but before releasing the lock,
> > or preempted after acquiring the mutex but before setting
> > lock->owner. In those cases, the spinner cannot check if owner is not
> > on_cpu because lock->owner is NULL.
>
> That sounds like a very small window.  And your theory is that this
> window is being hit sufficiently often to impact aggregate runtime
> measurements, which sounds improbable to me?
>
> > A solution that would address the preemption part of this problem
> > would be to disable preemption between acquiring/releasing the mutex
> > and setting/clearing the lock->owner. However, that will require
> > adding overhead to the mutex fastpath.
>
> preempt_disable() is cheap, and sometimes free.
>
> Have you confirmed that the preempt_disable() approach actually fixes
> the performance issues?  If it does then this would confirm your
> "potential reason" hypothesis.  If it doesn't then we should be hunting
> further for the explanation.

Using Ingo's test-mutex application (http://lkml.org/lkml/2006/1/8/50)
which can also generate high mutex contention, the preempt_disable()
approach did provide approximately a 4% improvement at 160 threads, but
not nearly the 25+% I was seeing with this patchset. So it looks like
preemption is not the main cause of the problem then.

Thanks,
Jason
Re: [RFC 3/3] mutex: When there is no owner, stop spinning after too many tries
On Tue, 2014-01-14 at 17:06 -0800, Davidlohr Bueso wrote:

> On Tue, 2014-01-14 at 16:33 -0800, Jason Low wrote:
> > When running workloads that have high contention in mutexes on an 8
> > socket machine, spinners would often spin for a long time with no
> > lock owner.
> >
> > One of the potential reasons for this is because a thread can be
> > preempted after clearing lock->owner but before releasing the lock
>
> What happens if you invert the order here? So mutex_clear_owner() is
> called after the actual unlocking (__mutex_fastpath_unlock).

Reversing the mutex_fastpath_unlock and mutex_clear_owner resulted in a
20+% performance improvement to Ingo's test-mutex application at 160
threads on an 8 socket box.

I have tried this method before, but what I was initially concerned
about with clearing the owner after unlocking was that the following
scenario may occur:

    thread 1 releases the lock
    thread 2 acquires the lock (in the fastpath)
    thread 2 sets the owner
    thread 1 clears owner

In this situation, lock->owner is NULL but thread 2 has the lock.

> > or preempted after acquiring the mutex but before setting
> > lock->owner.
>
> That would be the case _only_ for the fastpath. For the slowpath
> (including optimistic spinning) preemption is already disabled at that
> point.

Right, for just the fastpath_lock.

> > In those cases, the spinner cannot check if owner is not on_cpu
> > because lock->owner is NULL.
> >
> > A solution that would address the preemption part of this problem
> > would be to disable preemption between acquiring/releasing the mutex
> > and setting/clearing the lock->owner. However, that will require
> > adding overhead to the mutex fastpath.
>
> It's not uncommon to disable preemption in hotpaths, the overhead
> should be quite small, actually.
>
> > The solution used in this patch is to limit the # of times the thread
> > can spin on lock->count when !owner.
> >
> > The threshold used in this patch for each spinner was 128, which
> > appeared to be a generous value, but any suggestions on another
> > method to determine the threshold are welcomed.
>
> Hmm, generous compared to what? Could you elaborate further on how you
> reached this value? These kinds of magic numbers have produced
> significant debate in the past.

I've observed that when running workloads which don't exhibit this
behavior (long spins with no owner), threads rarely take more than 100
extra spins. So I went with 128 based on those numbers.