Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
* Davidlohr Bueso wrote:
> On Tue, 2013-12-10 at 16:36 +0100, Ingo Molnar wrote:
> > * Davidlohr Bueso wrote:
> > > On Fri, 2013-12-06 at 19:26 +0100, Thomas Gleixner wrote:
> > > > On Tue, 3 Dec 2013, Davidlohr Bueso wrote:
> > > > > Signed-off-by: Waiman Long
> > > > > Signed-off-by: Jason Low
> > > > > Signed-off-by: Davidlohr Bueso
> > > >
> > > > Who is the author of this? According to the SOB chains it's Waiman, who
> > > > handed it down to Jason and it finally got submitted by you. Is that
> > > > correct?
> > >
> > > This was a group effort for the original patchset -- things did change
> > > quite a bit for this particular patch in v2. Waiman and Jason helped
> > > review and test the original atomic ops code.
> >
> > Then please put that into the credits (in the changelog and/or in the
> > source code), and/or into copyright notices, etc. - but keep the
> > SOB-chain the true route of the patch, where the primary author is the
> > first in the SOB and where the person sending the patch is the last
> > SOB. (the two are identical normally.)
>
> Ok, I wasn't aware of that primary author rule, sorry. The exact
> same thing would go for patch 2 then. Is this something you want me
> to correct for this particular series, or to keep in mind for the
> future?

Would be nice to fix it, the SOB chain is important for fine print reasons.

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
On Tue, Dec 10, 2013 at 09:36:36AM -0800, Davidlohr Bueso wrote:
> On Tue, 2013-12-10 at 18:15 +0100, Peter Zijlstra wrote:
> > - * The length of the list is tracked with atomic ops (hb->waiters),
> > - * providing the necessary memory barriers for the waiters. For the
> > - * waker side, however, we rely on get_futex_key_refs(), using either
> > - * ihold() or the atomic_inc(), for shared futexes. The former provides
> > - * a full mb on all architectures. For architectures that do not have an
> > - * implicit barrier in atomic_inc/dec, we explicitly add it - please
> > - * refer to futex_get_mm() and hb_waiters_inc/dec().
>
> IMHO this text gives a nice summary instead of documenting each function
> with things like '... implies MB (B)'. Anyway, I'll resend this
> patch with your corrections.

Right, I didn't much care for that. Once you know that you need them and
what for, the actual finding of the barriers is usually a 'trivial' matter.
Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
On Tue, 2013-12-10 at 18:15 +0100, Peter Zijlstra wrote:
> ---
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -82,12 +82,13 @@
>   * The waker side modifies the user space value of the futex and calls
>   * futex_wake(). It computes the hash bucket and acquires the hash
>   * bucket lock. Then it looks for waiters on that futex in the hash
> - * bucket and wakes them. In scenarios where wakeups are called and no
> - * tasks are blocked on a futex, taking the hb spinlock can be avoided
> - * and simply return. In order for this optimization to work, ordering
> - * guarantees must exist so that the waiter being added to the list is
> - * acknowledged when the list is concurrently being checked by the waker,
> - * avoiding scenarios like the following:
> + * bucket and wakes them.
> + *
> + * In scenarios where wakeups are called and no tasks are blocked on a futex,
> + * taking the hb spinlock can be avoided and simply return. In order for this
> + * optimization to work, ordering guarantees must exist so that the waiter
> + * being added to the list is acknowledged when the list is concurrently being
> + * checked by the waker, avoiding scenarios like the following:
>   *
>   * CPU 0                        CPU 1
>   * val = *futex;
> @@ -108,6 +109,7 @@
>   * This would cause the waiter on CPU 0 to wait forever because it
>   * missed the transition of the user space value from val to newval
>   * and the waker did not find the waiter in the hash bucket queue.
> + *
>   * The correct serialization ensures that a waiter either observes
>   * the changed user space value before blocking or is woken by a
>   * concurrent waker:
> @@ -117,7 +119,8 @@
>   * sys_futex(WAIT, futex, val);
>   *   futex_wait(futex, val);
>   *
> - *   mb(); <-- paired with --
> + *   waiters++;
> + *   mb(); (A) <-- paired with -.
>   *                              |
>   *   lock(hash_bucket(futex));  |
>   *                              |
> @@ -126,22 +129,29 @@
>   *                              | sys_futex(WAKE, futex);
>   *                              |   futex_wake(futex);
>   *                              |
> - *                              > mb();
> + *   `--------------------------> mb(); (B)
>   *   if (uval == val)
>   *     queue();
>   *     unlock(hash_bucket(futex));
> - *   schedule();                  if (!queue_empty())
> + *   schedule();                  if (waiters)
>   *                                  lock(hash_bucket(futex));
>   *                                  wake_waiters(futex);
>   *                                  unlock(hash_bucket(futex));
>   *
> - * The length of the list is tracked with atomic ops (hb->waiters),
> - * providing the necessary memory barriers for the waiters. For the
> - * waker side, however, we rely on get_futex_key_refs(), using either
> - * ihold() or the atomic_inc(), for shared futexes. The former provides
> - * a full mb on all architectures. For architectures that do not have an
> - * implicit barrier in atomic_inc/dec, we explicitly add it - please
> - * refer to futex_get_mm() and hb_waiters_inc/dec().

IMHO this text gives a nice summary instead of documenting each function
with things like '... implies MB (B)'. Anyway, I'll resend this patch with
your corrections.

Thanks,
Davidlohr
Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
On Tue, Dec 10, 2013 at 09:22:08AM -0800, Davidlohr Bueso wrote:
> On Tue, 2013-12-10 at 17:57 +0100, Peter Zijlstra wrote:
> > > @@ -106,24 +108,40 @@
> > >  * This would cause the waiter on CPU 0 to wait forever because it
> > >  * missed the transition of the user space value from val to newval
> > >  * and the waker did not find the waiter in the hash bucket queue.
> > > + * The correct serialization ensures that a waiter either observes
> > > + * the changed user space value before blocking or is woken by a
> > > + * concurrent waker:
> > >  *
> > >  * CPU 0                        CPU 1
> > >  * val = *futex;
> > >  * sys_futex(WAIT, futex, val);
> > >  *   futex_wait(futex, val);
> > > + *
> > > + *   mb(); <-- paired with --
> > > + *                              |
> > > + *   lock(hash_bucket(futex)); |
> > > + *                              |
> > > + *   uval = *futex;            |
> > > + *                              | *futex = newval;
> > > + *                              | sys_futex(WAKE, futex);
> > > + *                              |   futex_wake(futex);
> > > + *                              |
> > > + *                              > mb();
> > >  * if (uval == val)
> > > + *   queue();
> > >  * unlock(hash_bucket(futex));
> > > + *   schedule();                 if (!queue_empty())
> > > + *                                 lock(hash_bucket(futex));
> > > + *                                 wake_waiters(futex);
> > > + *                                 unlock(hash_bucket(futex));
> > > + *
> > > + * The length of the list is tracked with atomic ops (hb->waiters),
> > > + * providing the necessary memory barriers for the waiters. For the
> > > + * waker side, however, we rely on get_futex_key_refs(), using either
> > > + * ihold() or the atomic_inc(), for shared futexes. The former provides
> > > + * a full mb on all architectures. For architectures that do not have an
> > > + * implicit barrier in atomic_inc/dec, we explicitly add it - please
> > > + * refer to futex_get_mm() and hb_waiters_inc/dec().
> > > */
> >
> > It isn't at all explained what purpose the memory barriers serve.
>
> Why doesn't this explain it?

Because you failed to explain what is ordered against what and what the
resulting order guarantees.

> "The correct serialization ensures that a waiter either observes
> the changed user space value before blocking or is woken by a
> concurrent waker."

Or both. For the given case:

	X = Y = 0

	w[X]=1		w[Y]=1
	MB		MB
	r[Y]=y		r[X]=x

x==1 && y==1 is a valid result. The only invalid result is both 0. But then
we're still short of how we end up at that guarantee.

> Perhaps adding an example?
>
> 	plist_add()  | uaddr = newval
> 	smp_mb()     | smp_mb()
> 	verify uaddr | plist_head_empty()

Except of course you don't actually use plist_add() and plist_head_empty()
for anything much at all, only creating more confusion. Just add the
waiters variable and explicitly mention what you're doing.
Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
On Tue, 2013-12-10 at 17:57 +0100, Peter Zijlstra wrote:
> On Tue, Dec 03, 2013 at 01:45:27AM -0800, Davidlohr Bueso wrote:
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -82,10 +82,12 @@
> >  * The waker side modifies the user space value of the futex and calls
> >  * futex_wake(). It computes the hash bucket and acquires the hash
> >  * bucket lock. Then it looks for waiters on that futex in the hash
> > - * bucket and wakes them.
> > - *
> > - * Note that the spin_lock serializes waiters and wakers, so that the
> > - * following scenario is avoided:
> > + * bucket and wakes them.
>
> Why not let this be the start of a new paragraph?
>
> > In scenarios where wakeups are called and no
> > + * tasks are blocked on a futex, taking the hb spinlock can be avoided
> > + * and simply return. In order for this optimization to work, ordering
> > + * guarantees must exist so that the waiter being added to the list is
> > + * acknowledged when the list is concurrently being checked by the waker,
> > + * avoiding scenarios like the following:
> >  *
> >  * CPU 0                        CPU 1
> >  * val = *futex;
> > @@ -106,24 +108,40 @@
> >  * This would cause the waiter on CPU 0 to wait forever because it
> >  * missed the transition of the user space value from val to newval
> >  * and the waker did not find the waiter in the hash bucket queue.
> > + * The correct serialization ensures that a waiter either observes
> > + * the changed user space value before blocking or is woken by a
> > + * concurrent waker:
> >  *
> >  * CPU 0                        CPU 1
> >  * val = *futex;
> >  * sys_futex(WAIT, futex, val);
> >  *   futex_wait(futex, val);
> > + *
> > + *   mb(); <-- paired with --
> > + *                              |
> > + *   lock(hash_bucket(futex)); |
> > + *                              |
> > + *   uval = *futex;            |
> > + *                              | *futex = newval;
> > + *                              | sys_futex(WAKE, futex);
> > + *                              |   futex_wake(futex);
> > + *                              |
> > + *                              > mb();
> >  * if (uval == val)
> > + *   queue();
> >  * unlock(hash_bucket(futex));
> > + *   schedule();                 if (!queue_empty())
> > + *                                 lock(hash_bucket(futex));
> > + *                                 wake_waiters(futex);
> > + *                                 unlock(hash_bucket(futex));
> > + *
> > + * The length of the list is tracked with atomic ops (hb->waiters),
> > + * providing the necessary memory barriers for the waiters. For the
> > + * waker side, however, we rely on get_futex_key_refs(), using either
> > + * ihold() or the atomic_inc(), for shared futexes. The former provides
> > + * a full mb on all architectures. For architectures that do not have an
> > + * implicit barrier in atomic_inc/dec, we explicitly add it - please
> > + * refer to futex_get_mm() and hb_waiters_inc/dec().
> >  */
>
> This comment actually confuses me :/

Well that explains where the required barriers are coming from.

> It isn't at all explained what purpose the memory barriers serve.

Why doesn't this explain it?

"The correct serialization ensures that a waiter either observes
the changed user space value before blocking or is woken by a
concurrent waker."

Perhaps adding an example?

	plist_add()  | uaddr = newval
	smp_mb()     | smp_mb()
	verify uaddr | plist_head_empty()

Thanks,
Davidlohr
Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
---
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -82,12 +82,13 @@
  * The waker side modifies the user space value of the futex and calls
  * futex_wake(). It computes the hash bucket and acquires the hash
  * bucket lock. Then it looks for waiters on that futex in the hash
- * bucket and wakes them. In scenarios where wakeups are called and no
- * tasks are blocked on a futex, taking the hb spinlock can be avoided
- * and simply return. In order for this optimization to work, ordering
- * guarantees must exist so that the waiter being added to the list is
- * acknowledged when the list is concurrently being checked by the waker,
- * avoiding scenarios like the following:
+ * bucket and wakes them.
+ *
+ * In scenarios where wakeups are called and no tasks are blocked on a futex,
+ * taking the hb spinlock can be avoided and simply return. In order for this
+ * optimization to work, ordering guarantees must exist so that the waiter
+ * being added to the list is acknowledged when the list is concurrently being
+ * checked by the waker, avoiding scenarios like the following:
  *
  * CPU 0                        CPU 1
  * val = *futex;
@@ -108,6 +109,7 @@
  * This would cause the waiter on CPU 0 to wait forever because it
  * missed the transition of the user space value from val to newval
  * and the waker did not find the waiter in the hash bucket queue.
+ *
  * The correct serialization ensures that a waiter either observes
  * the changed user space value before blocking or is woken by a
  * concurrent waker:
@@ -117,7 +119,8 @@
  * sys_futex(WAIT, futex, val);
  *   futex_wait(futex, val);
  *
- *   mb(); <-- paired with --
+ *   waiters++;
+ *   mb(); (A) <-- paired with -.
  *                              |
  *   lock(hash_bucket(futex));  |
  *                              |
@@ -126,22 +129,29 @@
  *                              | sys_futex(WAKE, futex);
  *                              |   futex_wake(futex);
  *                              |
- *                              > mb();
+ *   `--------------------------> mb(); (B)
  *   if (uval == val)
  *     queue();
  *     unlock(hash_bucket(futex));
- *   schedule();                  if (!queue_empty())
+ *   schedule();                  if (waiters)
  *                                  lock(hash_bucket(futex));
  *                                  wake_waiters(futex);
  *                                  unlock(hash_bucket(futex));
  *
- * The length of the list is tracked with atomic ops (hb->waiters),
- * providing the necessary memory barriers for the waiters. For the
- * waker side, however, we rely on get_futex_key_refs(), using either
- * ihold() or the atomic_inc(), for shared futexes. The former provides
- * a full mb on all architectures. For architectures that do not have an
- * implicit barrier in atomic_inc/dec, we explicitly add it - please
- * refer to futex_get_mm() and hb_waiters_inc/dec().
+ * Where (A) orders the waiters increment and the futex value read; and
+ * where (B) orders the write to futex and the waiters read.
+ *
+ * This yields the following case (where X:=waiters, Y:=futex):
+ *
+ *	X = Y = 0
+ *
+ *	w[X]=1		w[Y]=1
+ *	MB		MB
+ *	r[Y]=y		r[X]=x
+ *
+ * Which guarantees that x==0 && y==0 is impossible; which translates back into
+ * the guarantee that we cannot both miss the futex variable change and the
+ * enqueue.
  */

 int __read_mostly futex_cmpxchg_enabled;

@@ -221,9 +231,7 @@ static const struct futex_q futex_q_init
  * waiting on a futex.
  */
 struct futex_hash_bucket {
-#ifdef CONFIG_SMP
 	atomic_t waiters;
-#endif
 	spinlock_t lock;
 	struct plist_head chain;
 } ____cacheline_aligned_in_smp;

@@ -237,8 +245,9 @@ static inline void futex_get_mm(union fu
 	atomic_inc(&key->private.mm->mm_count);
 #ifdef CONFIG_SMP
 	/*
-	 * Reduced to a simple barrier() where the atomic_inc
-	 * has an implicit mb().
+	 * Ensure futex_get_mm() implies a full barrier such that
+	 * get_futex_key() implies a full barrier. This is relied upon as full
+	 * barrier (B), see the ordering comment above.
 	 */
 	smp_mb__after_atomic_inc();
 #endif
@@ -252,8 +261,7 @@ static inline void hb_waiters_inc(struct
 #ifdef CONFIG_SMP
 	atomic_inc(&hb->waiters);
 	/*
-	 * Reduced to a simple barrier() where the atomic_inc
-	 * has an implicit mb().
+	 * Full barrier (A), see the ordering comment above.
 	 */
 	smp_mb__after_atomic_inc();
 #endif
@@ -267,18 +275,6 @@ static inline void hb_waiters_dec(struct
 {
 #ifdef CONFIG_SMP
 	atomic_dec(&hb->waiters);
-	/*
-	 * Reduced to a simple barrier() where the atomic_inc
-	 * has an implicit mb().
-	 *
-	 * For non-x86 archs it's debatable whether this has
-	 * a hard requirement to be guarded. The optimized
-	 *
Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
On Tue, Dec 03, 2013 at 01:45:27AM -0800, Davidlohr Bueso wrote:
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -82,10 +82,12 @@
>  * The waker side modifies the user space value of the futex and calls
>  * futex_wake(). It computes the hash bucket and acquires the hash
>  * bucket lock. Then it looks for waiters on that futex in the hash
> - * bucket and wakes them.
> - *
> - * Note that the spin_lock serializes waiters and wakers, so that the
> - * following scenario is avoided:
> + * bucket and wakes them.

Why not let this be the start of a new paragraph?

> In scenarios where wakeups are called and no
> + * tasks are blocked on a futex, taking the hb spinlock can be avoided
> + * and simply return. In order for this optimization to work, ordering
> + * guarantees must exist so that the waiter being added to the list is
> + * acknowledged when the list is concurrently being checked by the waker,
> + * avoiding scenarios like the following:
>  *
>  * CPU 0                        CPU 1
>  * val = *futex;
> @@ -106,24 +108,40 @@
>  * This would cause the waiter on CPU 0 to wait forever because it
>  * missed the transition of the user space value from val to newval
>  * and the waker did not find the waiter in the hash bucket queue.
> + * The correct serialization ensures that a waiter either observes
> + * the changed user space value before blocking or is woken by a
> + * concurrent waker:
>  *
>  * CPU 0                        CPU 1
>  * val = *futex;
>  * sys_futex(WAIT, futex, val);
>  *   futex_wait(futex, val);
> + *
> + *   mb(); <-- paired with --
> + *                              |
> + *   lock(hash_bucket(futex)); |
> + *                              |
> + *   uval = *futex;            |
> + *                              | *futex = newval;
> + *                              | sys_futex(WAKE, futex);
> + *                              |   futex_wake(futex);
> + *                              |
> + *                              > mb();
>  * if (uval == val)
> + *   queue();
>  * unlock(hash_bucket(futex));
> + *   schedule();                 if (!queue_empty())
> + *                                 lock(hash_bucket(futex));
> + *                                 wake_waiters(futex);
> + *                                 unlock(hash_bucket(futex));
> + *
> + * The length of the list is tracked with atomic ops (hb->waiters),
> + * providing the necessary memory barriers for the waiters. For the
> + * waker side, however, we rely on get_futex_key_refs(), using either
> + * ihold() or the atomic_inc(), for shared futexes. The former provides
> + * a full mb on all architectures. For architectures that do not have an
> + * implicit barrier in atomic_inc/dec, we explicitly add it - please
> + * refer to futex_get_mm() and hb_waiters_inc/dec().
>  */

This comment actually confuses me :/

It isn't at all explained what purpose the memory barriers serve.
Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
On Tue, Dec 03, 2013 at 01:45:27AM -0800, Davidlohr Bueso wrote:
> @@ -203,6 +221,9 @@ static const struct futex_q futex_q_init = {
>  * waiting on a futex.
>  */
> struct futex_hash_bucket {
> +#ifdef CONFIG_SMP
> +	atomic_t waiters;
> +#endif
> 	spinlock_t lock;
> 	struct plist_head chain;
> } ____cacheline_aligned_in_smp;
> @@ -2805,6 +2906,9 @@ static int __init futex_init(void)
> 	for (i = 0; i < futex_hashsize; i++) {
> 		plist_head_init(&futex_queues[i].chain);
> 		spin_lock_init(&futex_queues[i].lock);
> +#ifdef CONFIG_SMP
> +		atomic_set(&futex_queues[i].waiters, 0);
> +#endif
> 	}
>
> 	return 0;

Just remove those two #ifdefs, it's not like the data structure size
matters at this point.
Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
On Tue, 2013-12-10 at 16:36 +0100, Ingo Molnar wrote:
> * Davidlohr Bueso wrote:
> > On Fri, 2013-12-06 at 19:26 +0100, Thomas Gleixner wrote:
> > > On Tue, 3 Dec 2013, Davidlohr Bueso wrote:
> > > > Signed-off-by: Waiman Long
> > > > Signed-off-by: Jason Low
> > > > Signed-off-by: Davidlohr Bueso
> > >
> > > Who is the author of this? According to the SOB chains it's Waiman, who
> > > handed it down to Jason and it finally got submitted by you. Is that
> > > correct?
> >
> > This was a group effort for the original patchset -- things did change
> > quite a bit for this particular patch in v2. Waiman and Jason helped
> > review and test the original atomic ops code.
>
> Then please put that into the credits (in the changelog and/or in the
> source code), and/or into copyright notices, etc. - but keep the
> SOB-chain the true route of the patch, where the primary author is the
> first in the SOB and where the person sending the patch is the last
> SOB. (the two are identical normally.)

Ok, I wasn't aware of that primary author rule, sorry. The exact same
thing would go for patch 2 then. Is this something you want me to correct
for this particular series, or to keep in mind for the future?

Thanks,
Davidlohr
Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
On Tue, Dec 03, 2013 at 01:45:27AM -0800, Davidlohr Bueso wrote:
> With the new atomic counter we only enlarge the futex_hash_bucket struct
> by 4 bytes on 32bit systems, and it is left the same size for 64bit
> ones, at 24 bytes.

patch 2 grew the structure to be an entire cacheline, this now seems a
rather pointless comment ;-)
Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
* Davidlohr Bueso wrote:
> On Fri, 2013-12-06 at 19:26 +0100, Thomas Gleixner wrote:
> > On Tue, 3 Dec 2013, Davidlohr Bueso wrote:
> > > Signed-off-by: Waiman Long
> > > Signed-off-by: Jason Low
> > > Signed-off-by: Davidlohr Bueso
> >
> > Who is the author of this? According to the SOB chains it's Waiman, who
> > handed it down to Jason and it finally got submitted by you. Is that
> > correct?
>
> This was a group effort for the original patchset -- things did change
> quite a bit for this particular patch in v2. Waiman and Jason helped
> review and test the original atomic ops code.

Then please put that into the credits (in the changelog and/or in the
source code), and/or into copyright notices, etc. - but keep the
SOB-chain the true route of the patch, where the primary author is the
first in the SOB and where the person sending the patch is the last
SOB. (the two are identical normally.)

Thanks,

	Ingo
Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
* Davidlohr Bueso davidl...@hp.com wrote: On Fri, 2013-12-06 at 19:26 +0100, Thomas Gleixner wrote: On Tue, 3 Dec 2013, Davidlohr Bueso wrote: Signed-off-by: Waiman Long waiman.l...@hp.com Signed-off-by: Jason Low jason.l...@hp.com Signed-off-by: Davidlohr Bueso davidl...@hp.com Who is the author of this? According to the SOB chains its Waiman, who handed it down to Jason and it finally got sumbitted by you. Is that correct? This was a group effort for the original patchset -- things did change quite a bit for this particular patch in v2. Waiman and Jason helped review and test the original atomic ops code. Then please put that into the credits (in the changelog and/or in the source code), and/or into copyright notices, etc. - but keep the SOB-chain the true route of the patch, where the primary author is the first in the SOB and where the person sending the patch is the last SOB. (the two are identical normally.) Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
On Tue, Dec 03, 2013 at 01:45:27AM -0800, Davidlohr Bueso wrote: With the new atomic counter we only enlarge the futex_hash_bucket struct by 4 bytes on 32bit systems, and it is left the same size for 64bit ones, at 24 bytes. patch 2 grew the structure to be an entire cacheline, this now seems a rather pointless comment ;-) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
On Tue, 2013-12-10 at 16:36 +0100, Ingo Molnar wrote: * Davidlohr Bueso davidl...@hp.com wrote: On Fri, 2013-12-06 at 19:26 +0100, Thomas Gleixner wrote: On Tue, 3 Dec 2013, Davidlohr Bueso wrote: Signed-off-by: Waiman Long waiman.l...@hp.com Signed-off-by: Jason Low jason.l...@hp.com Signed-off-by: Davidlohr Bueso davidl...@hp.com Who is the author of this? According to the SOB chains its Waiman, who handed it down to Jason and it finally got sumbitted by you. Is that correct? This was a group effort for the original patchset -- things did change quite a bit for this particular patch in v2. Waiman and Jason helped review and test the original atomic ops code. Then please put that into the credits (in the changelog and/or in the source code), and/or into copyright notices, etc. - but keep the SOB-chain the true route of the patch, where the primary author is the first in the SOB and where the person sending the patch is the last SOB. (the two are identical normally.) Ok, I wasn't aware of that primary author rule, sorry. The exact same thing would go for patch 2 then. Is this something you want me to correct for this particular series, or to keep in mind for the future? Thanks, Davidlohr -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
On Tue, Dec 03, 2013 at 01:45:27AM -0800, Davidlohr Bueso wrote: @@ -203,6 +221,9 @@ static const struct futex_q futex_q_init = { * waiting on a futex. */ struct futex_hash_bucket { +#ifdef CONFIG_SMP + atomic_t waiters; +#endif spinlock_t lock; struct plist_head chain; } cacheline_aligned_in_smp; @@ -2805,6 +2906,9 @@ static int __init futex_init(void) for (i = 0; i futex_hashsize; i++) { plist_head_init(futex_queues[i].chain); spin_lock_init(futex_queues[i].lock); +#ifdef CONFIG_SMP + atomic_set(futex_queues[i].waiters, 0); +#endif } return 0; Just remove those two #ifdefs, its not like the data structure size matters at this point. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
On Tue, Dec 03, 2013 at 01:45:27AM -0800, Davidlohr Bueso wrote: --- a/kernel/futex.c +++ b/kernel/futex.c @@ -82,10 +82,12 @@ * The waker side modifies the user space value of the futex and calls * futex_wake(). It computes the hash bucket and acquires the hash * bucket lock. Then it looks for waiters on that futex in the hash - * bucket and wakes them. - * - * Note that the spin_lock serializes waiters and wakers, so that the - * following scenario is avoided: + * bucket and wakes them. Why not let this be the start of a new paragraph? In scenarios where wakeups are called and no + * tasks are blocked on a futex, taking the hb spinlock can be avoided + * and simply return. In order for this optimization to work, ordering + * guarantees must exist so that the waiter being added to the list is + * acknowledged when the list is concurrently being checked by the waker, + * avoiding scenarios like the following: * * CPU 0 CPU 1 * val = *futex; @@ -106,24 +108,40 @@ * This would cause the waiter on CPU 0 to wait forever because it * missed the transition of the user space value from val to newval * and the waker did not find the waiter in the hash bucket queue. + * The correct serialization ensures that a waiter either observes + * the changed user space value before blocking or is woken by a + * concurrent waker: * * CPU 0 CPU 1 * val = *futex; * sys_futex(WAIT, futex, val); * futex_wait(futex, val); + * + * mb(); -- paired with -- + * | + * lock(hash_bucket(futex)); | + * | + * uval = *futex; | + * |*futex = newval; + * |sys_futex(WAKE, futex); + * | futex_wake(futex); + * | + * mb(); * if (uval == val) + * queue(); * unlock(hash_bucket(futex)); + * schedule(); if (!queue_empty()) + * lock(hash_bucket(futex)); + * wake_waiters(futex); + * unlock(hash_bucket(futex)); + * + * The length of the list is tracked with atomic ops (hb-waiters), + * providing the necessary memory barriers for the waiters. 
For the + * waker side, however, we rely on get_futex_key_refs(), using either + * ihold() or the atomic_inc(), for shared futexes. The former provides + * a full mb on all architectures. For architectures that do not have an + * implicit barrier in atomic_inc/dec, we explicitly add it - please + * refer to futex_get_mm() and hb_waiters_inc/dec(). */ This comment actually confuses me :/ It isn't at all explained what purpose the memory barriers serve. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
---
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -82,12 +82,13 @@
  * The waker side modifies the user space value of the futex and calls
  * futex_wake(). It computes the hash bucket and acquires the hash
  * bucket lock. Then it looks for waiters on that futex in the hash
- * bucket and wakes them. In scenarios where wakeups are called and no
- * tasks are blocked on a futex, taking the hb spinlock can be avoided
- * and simply return. In order for this optimization to work, ordering
- * guarantees must exist so that the waiter being added to the list is
- * acknowledged when the list is concurrently being checked by the waker,
- * avoiding scenarios like the following:
+ * bucket and wakes them.
+ *
+ * In scenarios where wakeups are called and no tasks are blocked on a futex,
+ * taking the hb spinlock can be avoided and simply return. In order for this
+ * optimization to work, ordering guarantees must exist so that the waiter
+ * being added to the list is acknowledged when the list is concurrently being
+ * checked by the waker, avoiding scenarios like the following:
  *
  * CPU 0                               CPU 1
  * val = *futex;
@@ -108,6 +109,7 @@
  * This would cause the waiter on CPU 0 to wait forever because it
  * missed the transition of the user space value from val to newval
  * and the waker did not find the waiter in the hash bucket queue.
+ *
  * The correct serialization ensures that a waiter either observes
  * the changed user space value before blocking or is woken by a
  * concurrent waker:
@@ -117,7 +119,8 @@
  *   sys_futex(WAIT, futex, val);
  *     futex_wait(futex, val);
  *
- *   mb(); -- paired with --
+ *   waiters++;
+ *   mb(); (A) -- paired with -.
  *                              |
  *   lock(hash_bucket(futex));  |
  *                              |
@@ -126,22 +129,29 @@
  *                              | *futex = newval;
  *                              | sys_futex(WAKE, futex);
  *                              |   futex_wake(futex);
  *                              |
- *                              mb();
+ *            `--------------->   mb(); (B)
  *   if (uval == val)
  *     queue();
  *     unlock(hash_bucket(futex));
- *     schedule();              if (!queue_empty())
+ *     schedule();              if (waiters)
  *                                lock(hash_bucket(futex));
  *                                wake_waiters(futex);
  *                                unlock(hash_bucket(futex));
  *
- * The length of the list is tracked with atomic ops (hb->waiters),
- * providing the necessary memory barriers for the waiters. For the
- * waker side, however, we rely on get_futex_key_refs(), using either
- * ihold() or the atomic_inc(), for shared futexes. The former provides
- * a full mb on all architectures. For architectures that do not have an
- * implicit barrier in atomic_inc/dec, we explicitly add it - please
- * refer to futex_get_mm() and hb_waiters_inc/dec().
+ * Where (A) orders the waiters increment and the futex value read; and
+ * where (B) orders the write to futex and the waiters read.
+ *
+ * This yields the following case (where X:=waiters, Y:=futex):
+ *
+ *	X = Y = 0
+ *
+ *	w[X]=1		w[Y]=1
+ *	MB		MB
+ *	r[Y]=y		r[X]=x
+ *
+ * Which guarantees that x==0 y==0 is impossible; which translates back into
+ * the guarantee that we cannot both miss the futex variable change and the
+ * enqueue.
  */

 int __read_mostly futex_cmpxchg_enabled;

@@ -221,9 +231,7 @@ static const struct futex_q futex_q_init
  * waiting on a futex.
  */
 struct futex_hash_bucket {
-#ifdef CONFIG_SMP
 	atomic_t waiters;
-#endif
 	spinlock_t lock;
 	struct plist_head chain;
 } ____cacheline_aligned_in_smp;

@@ -237,8 +245,9 @@ static inline void futex_get_mm(union fu
 	atomic_inc(&key->private.mm->mm_count);
 #ifdef CONFIG_SMP
 	/*
-	 * Reduced to a simple barrier() where the atomic_inc
-	 * has an implicit mb().
+	 * Ensure futex_get_mm() implies a full barrier such that
+	 * get_futex_key() implies a full barrier. This is relied upon as full
+	 * barrier (B), see the ordering comment above.
 	 */
 	smp_mb__after_atomic_inc();
 #endif
@@ -252,8 +261,7 @@ static inline void hb_waiters_inc(struct
 #ifdef CONFIG_SMP
 	atomic_inc(&hb->waiters);
 	/*
-	 * Reduced to a simple barrier() where the atomic_inc
-	 * has an implicit mb().
+	 * Full barrier (A), see the ordering comment above.
 	 */
 	smp_mb__after_atomic_inc();
 #endif
@@ -267,18 +275,6 @@ static inline void hb_waiters_dec(struct
 {
 #ifdef CONFIG_SMP
 	atomic_dec(&hb->waiters);
-	/*
-	 * Reduced to a simple barrier() where the atomic_inc
-	 * has an implicit mb().
-	 *
-	 * For non-x86 archs it's debatable whether this has
-	 * a hard requirement to be guarded. The optimized
-	 *
Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
On Tue, 2013-12-10 at 17:57 +0100, Peter Zijlstra wrote:
> On Tue, Dec 03, 2013 at 01:45:27AM -0800, Davidlohr Bueso wrote:
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -82,10 +82,12 @@
> >   * The waker side modifies the user space value of the futex and calls
> >   * futex_wake(). It computes the hash bucket and acquires the hash
> >   * bucket lock. Then it looks for waiters on that futex in the hash
> > - * bucket and wakes them.
> > - *
> > - * Note that the spin_lock serializes waiters and wakers, so that the
> > - * following scenario is avoided:
> > + * bucket and wakes them.
> 
> Why not let this be the start of a new paragraph?
> 
> >                          In scenarios where wakeups are called and no
> > + * tasks are blocked on a futex, taking the hb spinlock can be avoided
> > + * and simply return. In order for this optimization to work, ordering
> > + * guarantees must exist so that the waiter being added to the list is
> > + * acknowledged when the list is concurrently being checked by the waker,
> > + * avoiding scenarios like the following:
> >   *
> >   * CPU 0                               CPU 1
> >   * val = *futex;
> > @@ -106,24 +108,40 @@
> >   * This would cause the waiter on CPU 0 to wait forever because it
> >   * missed the transition of the user space value from val to newval
> >   * and the waker did not find the waiter in the hash bucket queue.
> > + * The correct serialization ensures that a waiter either observes
> > + * the changed user space value before blocking or is woken by a
> > + * concurrent waker:
> >   *
> >   * CPU 0                               CPU 1
> >   * val = *futex;
> >   * sys_futex(WAIT, futex, val);
> >   *   futex_wait(futex, val);
> > + *
> > + *   mb(); -- paired with --
> > + *                                     |
> >   *   lock(hash_bucket(futex));        |
> > + *                                     |
> > + *   uval = *futex;                    |
> > + *                                     | *futex = newval;
> > + *                                     | sys_futex(WAKE, futex);
> > + *                                     |   futex_wake(futex);
> > + *                                     |
> > + *                                     mb();
> >   *   if (uval == val)
> > + *     queue();
> >   *     unlock(hash_bucket(futex));
> > + *     schedule();                     if (!queue_empty())
> > + *                                       lock(hash_bucket(futex));
> > + *                                       wake_waiters(futex);
> > + *                                       unlock(hash_bucket(futex));
> > + *
> > + * The length of the list is tracked with atomic ops (hb->waiters),
> > + * providing the necessary memory barriers for the waiters. For the
> > + * waker side, however, we rely on get_futex_key_refs(), using either
> > + * ihold() or the atomic_inc(), for shared futexes. The former provides
> > + * a full mb on all architectures. For architectures that do not have an
> > + * implicit barrier in atomic_inc/dec, we explicitly add it - please
> > + * refer to futex_get_mm() and hb_waiters_inc/dec().
> >   */
> 
> This comment actually confuses me :/

Well that explains where the required barriers are coming from.

> It isn't at all explained what purpose the memory barriers serve.

Why doesn't this explain it?

  The correct serialization ensures that a waiter either observes
  the changed user space value before blocking or is woken by a
  concurrent waker.

Perhaps adding an example?

  plist_add()   | uaddr = newval
  smp_mb()      | smp_mb()
  verify uaddr  | plist_head_empty()

Thanks,
Davidlohr
Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
On Tue, Dec 10, 2013 at 09:22:08AM -0800, Davidlohr Bueso wrote:
> On Tue, 2013-12-10 at 17:57 +0100, Peter Zijlstra wrote:
> > > @@ -106,24 +108,40 @@
> > >   * This would cause the waiter on CPU 0 to wait forever because it
> > >   * missed the transition of the user space value from val to newval
> > >   * and the waker did not find the waiter in the hash bucket queue.
> > > + * The correct serialization ensures that a waiter either observes
> > > + * the changed user space value before blocking or is woken by a
> > > + * concurrent waker:
> > >   *
> > >   * CPU 0                               CPU 1
> > >   * val = *futex;
> > >   * sys_futex(WAIT, futex, val);
> > >   *   futex_wait(futex, val);
> > > + *
> > > + *   mb(); -- paired with --
> > > + *                                     |
> > >   *   lock(hash_bucket(futex));        |
> > > + *                                     |
> > > + *   uval = *futex;                    |
> > > + *                                     | *futex = newval;
> > > + *                                     | sys_futex(WAKE, futex);
> > > + *                                     |   futex_wake(futex);
> > > + *                                     |
> > > + *                                     mb();
> > >   *   if (uval == val)
> > > + *     queue();
> > >   *     unlock(hash_bucket(futex));
> > > + *     schedule();                     if (!queue_empty())
> > > + *                                       lock(hash_bucket(futex));
> > > + *                                       wake_waiters(futex);
> > > + *                                       unlock(hash_bucket(futex));
> > > + *
> > > + * The length of the list is tracked with atomic ops (hb->waiters),
> > > + * providing the necessary memory barriers for the waiters. For the
> > > + * waker side, however, we rely on get_futex_key_refs(), using either
> > > + * ihold() or the atomic_inc(), for shared futexes. The former provides
> > > + * a full mb on all architectures. For architectures that do not have an
> > > + * implicit barrier in atomic_inc/dec, we explicitly add it - please
> > > + * refer to futex_get_mm() and hb_waiters_inc/dec().
> > >   */
> > 
> > It isn't at all explained what purpose the memory barriers serve.
> 
> Why doesn't this explain it?

Because you failed to explain what is ordered against what and what the
resulting order guarantees.

>   The correct serialization ensures that a waiter either observes
>   the changed user space value before blocking or is woken by a
>   concurrent waker.

Or both.

For the given case:

  X = Y = 0

  w[X]=1	w[Y]=1
  MB		MB
  r[Y]=y	r[X]=x

x==1 y==1 is a valid result. The only invalid result is both 0.

But then we're still short of how we end up at that guarantee.

> Perhaps adding an example?
> 
>   plist_add()   | uaddr = newval
>   smp_mb()      | smp_mb()
>   verify uaddr  | plist_head_empty()

Except of course you don't actually use plist_add() and plist_head_empty()
for anything much at all, only creating more confusion.

Just add the waiters variable and explicitly mention what you're doing.
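The store-buffering pattern discussed above can be sketched in user space. The following is an illustrative Python model, not kernel code: all names are invented for the example, and CPython's effectively sequentially consistent execution of these simple operations stands in for the paired full barriers, so the forbidden outcome (both reads seeing 0, i.e. a lost wakeup) never appears.

```python
import threading

def run_once():
    # X plays the role of hb->waiters, Y the futex value.
    X = Y = 0
    result = {}

    def waiter():
        nonlocal X
        X = 1              # w[X]=1  (waiters++)
        result["y"] = Y    # r[Y]=y  (read the futex value)

    def waker():
        nonlocal Y
        Y = 1              # w[Y]=1  (*futex = newval)
        result["x"] = X    # r[X]=x  (read the waiter count)

    t1 = threading.Thread(target=waiter)
    t2 = threading.Thread(target=waker)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return result["x"], result["y"]

# Collect the outcomes over many runs.  (0, 0) would mean the waker
# missed the waiter AND the waiter missed the new futex value -- the
# one result the paired full barriers forbid.
outcomes = {run_once() for _ in range(200)}
assert (0, 0) not in outcomes
```

On real hardware without the barriers, both stores can sit in each CPU's store buffer while the loads execute, which is exactly how the (0, 0) result and the lost wakeup arise.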
Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
On Tue, 2013-12-10 at 18:15 +0100, Peter Zijlstra wrote:
> ---
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -82,12 +82,13 @@
>   * The waker side modifies the user space value of the futex and calls
>   * futex_wake(). It computes the hash bucket and acquires the hash
>   * bucket lock. Then it looks for waiters on that futex in the hash
> - * bucket and wakes them. In scenarios where wakeups are called and no
> - * tasks are blocked on a futex, taking the hb spinlock can be avoided
> - * and simply return. In order for this optimization to work, ordering
> - * guarantees must exist so that the waiter being added to the list is
> - * acknowledged when the list is concurrently being checked by the waker,
> - * avoiding scenarios like the following:
> + * bucket and wakes them.
> + *
> + * In scenarios where wakeups are called and no tasks are blocked on a futex,
> + * taking the hb spinlock can be avoided and simply return. In order for this
> + * optimization to work, ordering guarantees must exist so that the waiter
> + * being added to the list is acknowledged when the list is concurrently being
> + * checked by the waker, avoiding scenarios like the following:
>   *
>   * CPU 0                               CPU 1
>   * val = *futex;
> @@ -108,6 +109,7 @@
>   * This would cause the waiter on CPU 0 to wait forever because it
>   * missed the transition of the user space value from val to newval
>   * and the waker did not find the waiter in the hash bucket queue.
> + *
>   * The correct serialization ensures that a waiter either observes
>   * the changed user space value before blocking or is woken by a
>   * concurrent waker:
> @@ -117,7 +119,8 @@
>   *   sys_futex(WAIT, futex, val);
>   *     futex_wait(futex, val);
>   *
> - *   mb(); -- paired with --
> + *   waiters++;
> + *   mb(); (A) -- paired with -.
>   *                              |
>   *   lock(hash_bucket(futex));  |
>   *                              |
> @@ -126,22 +129,29 @@
>   *                              | *futex = newval;
>   *                              | sys_futex(WAKE, futex);
>   *                              |   futex_wake(futex);
>   *                              |
> - *                              mb();
> + *            `--------------->   mb(); (B)
>   *   if (uval == val)
>   *     queue();
>   *     unlock(hash_bucket(futex));
> - *     schedule();              if (!queue_empty())
> + *     schedule();              if (waiters)
>   *                                lock(hash_bucket(futex));
>   *                                wake_waiters(futex);
>   *                                unlock(hash_bucket(futex));
>   *
> - * The length of the list is tracked with atomic ops (hb->waiters),
> - * providing the necessary memory barriers for the waiters. For the
> - * waker side, however, we rely on get_futex_key_refs(), using either
> - * ihold() or the atomic_inc(), for shared futexes. The former provides
> - * a full mb on all architectures. For architectures that do not have an
> - * implicit barrier in atomic_inc/dec, we explicitly add it - please
> - * refer to futex_get_mm() and hb_waiters_inc/dec().

IMHO this text gives a nice summary instead of documenting each function
with things like '... implies MB (B)'. Anyway, I'll resend this patch
with your corrections.

Thanks,
Davidlohr
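The optimization being documented in the patch above — the waker skipping the hash-bucket lock entirely when the waiter count is zero — can be sketched as a toy user-space analogue. This is a hypothetical illustration (the class and method names are invented), not the kernel implementation: Python's `Condition` and its lock stand in for the hb spinlock, and the interpreter's sequential execution stands in for barriers (A) and (B).

```python
import threading
import time

class ToyFutex:
    """Toy model of the futex wait/wake fast path (illustrative only)."""

    def __init__(self, value=0):
        self.value = value
        self.waiters = 0                      # plays the role of hb->waiters
        self.lock = threading.Lock()          # plays the role of the hb lock
        self.cond = threading.Condition(self.lock)

    def wait(self, expected):
        self.waiters += 1                     # hb_waiters_inc(); barrier (A)
        with self.lock:
            if self.value != expected:        # value already changed: don't block
                self.waiters -= 1
                return
            self.cond.wait(timeout=1.0)       # schedule(); timeout guards the toy
            self.waiters -= 1

    def wake(self, newval):
        self.value = newval                   # *futex = newval; barrier (B)
        if self.waiters == 0:                 # nothing to wake: skip the hb lock
            return
        with self.lock:
            self.cond.notify_all()            # wake_waiters()

# Demo: one waiter blocks on value 0; the waker flips it to 1.
f = ToyFutex(0)
t = threading.Thread(target=f.wait, args=(0,))
t.start()
time.sleep(0.1)                               # let the waiter block
f.wake(1)
t.join()
```

The key point mirrored from the patch: the waiter count is bumped *before* the value check, and the waker writes the value *before* reading the count, so the waker can only see `waiters == 0` if the waiter is guaranteed to observe the new value and not block.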
Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
On Tue, Dec 10, 2013 at 09:36:36AM -0800, Davidlohr Bueso wrote:
> On Tue, 2013-12-10 at 18:15 +0100, Peter Zijlstra wrote:
> > - * The length of the list is tracked with atomic ops (hb->waiters),
> > - * providing the necessary memory barriers for the waiters. For the
> > - * waker side, however, we rely on get_futex_key_refs(), using either
> > - * ihold() or the atomic_inc(), for shared futexes. The former provides
> > - * a full mb on all architectures. For architectures that do not have an
> > - * implicit barrier in atomic_inc/dec, we explicitly add it - please
> > - * refer to futex_get_mm() and hb_waiters_inc/dec().
> 
> IMHO this text gives a nice summary instead of documenting each function
> with things like '... implies MB (B)'. Anyway, I'll resend this patch
> with your corrections.

Right, I didn't much care for that. Once you know that you need them and
what for, the actual finding of the barriers is usually a 'trivial'
matter.
Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
* Davidlohr Bueso <davidl...@hp.com> wrote:

> On Tue, 2013-12-10 at 16:36 +0100, Ingo Molnar wrote:
> > * Davidlohr Bueso <davidl...@hp.com> wrote:
> > 
> > > On Fri, 2013-12-06 at 19:26 +0100, Thomas Gleixner wrote:
> > > > On Tue, 3 Dec 2013, Davidlohr Bueso wrote:
> > > > > Signed-off-by: Waiman Long <waiman.l...@hp.com>
> > > > > Signed-off-by: Jason Low <jason.l...@hp.com>
> > > > > Signed-off-by: Davidlohr Bueso <davidl...@hp.com>
> > > > 
> > > > Who is the author of this? According to the SOB chains it's Waiman, who
> > > > handed it down to Jason and it finally got submitted by you. Is that
> > > > correct?
> > > 
> > > This was a group effort for the original patchset -- things did change
> > > quite a bit for this particular patch in v2. Waiman and Jason helped
> > > review and test the original atomic ops code.
> > 
> > Then please put that into the credits (in the changelog and/or in the
> > source code), and/or into copyright notices, etc. - but keep the
> > SOB-chain the true route of the patch, where the primary author is the
> > first in the SOB and where the person sending the patch is the last
> > SOB. (the two are identical normally.)
> 
> Ok, I wasn't aware of that primary author rule, sorry. The exact
> same thing would go for patch 2 then. Is this something you want me
> to correct for this particular series, or to keep in mind for the
> future?

Would be nice to fix it, the SOB chain is important for fine print
reasons.

Thanks,

	Ingo
Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
On Fri, 2013-12-06 at 19:26 +0100, Thomas Gleixner wrote:
> On Tue, 3 Dec 2013, Davidlohr Bueso wrote:
> > Signed-off-by: Waiman Long
> > Signed-off-by: Jason Low
> > Signed-off-by: Davidlohr Bueso
> 
> Who is the author of this? According to the SOB chains it's Waiman, who
> handed it down to Jason and it finally got submitted by you. Is that
> correct?

This was a group effort for the original patchset -- things did change
quite a bit for this particular patch in v2. Waiman and Jason helped
review and test the original atomic ops code.

Thanks,
Davidlohr
Re: [PATCH v2 4/4] futex: Avoid taking hb lock if nothing to wakeup
On Tue, 3 Dec 2013, Davidlohr Bueso wrote:
> Signed-off-by: Waiman Long
> Signed-off-by: Jason Low
> Signed-off-by: Davidlohr Bueso

Who is the author of this? According to the SOB chains it's Waiman, who
handed it down to Jason and it finally got submitted by you. Is that
correct?

Thanks,

	tglx