Re: [PATCH v2 00/13] rwsem fast-path write lock stealing
On Sat, Apr 27, 2013 at 02:15:52PM -0700, Davidlohr Bueso wrote:
> From: Davidlohr Bueso
> Subject: [PATCH] rwsem: check counter to avoid cmpxchg calls
>
> This patch tries to reduce the amount of cmpxchg calls in the
> writer failed path by checking the counter value first before
> issuing the instruction. If ->count is not set to RWSEM_WAITING_BIAS
> then there is no point wasting a cmpxchg call.
>
> Signed-off-by: Davidlohr Bueso
> ---
>  lib/rwsem.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/lib/rwsem.c b/lib/rwsem.c
> index 50fdd89..a79dc95 100644
> --- a/lib/rwsem.c
> +++ b/lib/rwsem.c
> @@ -222,9 +222,10 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
>  			count = RWSEM_ACTIVE_WRITE_BIAS;
>  			if (!list_is_singular(&sem->wait_list))
>  				count += RWSEM_WAITING_BIAS;
> -			if (cmpxchg(&sem->count, RWSEM_WAITING_BIAS, count) ==
> -							RWSEM_WAITING_BIAS)
> -				break;
> +			if ((*(volatile long *)&(sem)->count) == RWSEM_WAITING_BIAS)
> +				if (cmpxchg(&sem->count, RWSEM_WAITING_BIAS,
> +						count) == RWSEM_WAITING_BIAS)
> +					break;
>  		}
>
>  		raw_spin_unlock_irq(&sem->wait_lock);

Quite impressed that this makes as much of a difference - this code
block already checks that the active count was 0 (which implies
sem->count must be RWSEM_WAITING_BIAS, since the writer thread is
known to be waiting) not long before. But I suppose it helps in the
case where someone else steals the lock while we're trying to acquire
sem->wait_lock.

Regarding style: I would prefer if the sem->count read didn't use the
volatile cast. The compiler will already be forced to reload the value
due to the barriers in set_task_state() and raw_spin_lock_irq(). And
if you absolutely need to have that volatile, I would prefer you use
ACCESS_ONCE() rather than the hand-written equivalent.
I'm going to try pushing this to Linus tomorrow; I feel I shouldn't
change your patch without putting it through tests again. Are you OK
with sending this as a followup to the series rather than me including
it?

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 00/13] rwsem fast-path write lock stealing
On Sat, Apr 27, 2013 at 2:15 PM, Davidlohr Bueso wrote:
> Hi Michel,
>
> On Fri, 2013-03-15 at 03:54 -0700, Michel Lespinasse wrote:
>> These patches extend Alex Shi's work (which added write lock stealing
>> on the rwsem slow path) in order to provide rwsem write lock stealing
>> on the fast path (that is, without taking the rwsem's wait_lock).
>>
>> I initially sent a shorter series shortly before v3.9, however some
>> patches were doing too much at once which made them confusing to
>> review. I have now split the series at a smaller granularity;
>> hope this will help :)
>
> Coincidentally, I was doing some similar work with similar results. But
> I have to say I like your series and cleanups better than mine :)
>
> I've run a few tests on this patchset and I'm pretty happy with the
> numbers. The first series of tests were on my quad-core laptop running
> pgbench on three different database sizes, each with various number of
> clients:

Thanks a lot for this testing. I was getting discouraged with this
series, as I would have hoped for it to be picked into -next a long
time ago. Anyway, I will resend a v3 with the (mostly style) updates I
got from everyone on v2 and try submitting it to Linus.

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
Re: [PATCH v2 00/13] rwsem fast-path write lock stealing
On Fri, 2013-03-15 at 03:54 -0700, Michel Lespinasse wrote:
> These patches extend Alex Shi's work (which added write lock stealing
> on the rwsem slow path) in order to provide rwsem write lock stealing
> on the fast path (that is, without taking the rwsem's wait_lock).

Since none of Alex's original code will be in here with these patches
applied, you might as well add this:

diff --git a/lib/rwsem.c b/lib/rwsem.c
index 4e4c889..61f91ca 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -4,6 +4,7 @@
  * Derived from arch/i386/kernel/semaphore.c
  *
  * Writer lock-stealing by Alex Shi
+ * and Michel Lespinasse
  */
 #include <linux/rwsem.h>
 #include <linux/sched.h>

> Michel Lespinasse (13):
>   rwsem: make the waiter type an enumeration rather than a bitmask
>   rwsem: shorter spinlocked section in rwsem_down_failed_common()
>   rwsem: move rwsem_down_failed_common code into
>     rwsem_down_{read,write}_failed
>   rwsem: simplify rwsem_down_read_failed
>   rwsem: simplify rwsem_down_write_failed
>   rwsem: more agressive lock stealing in rwsem_down_write_failed
>   rwsem: use cmpxchg for trying to steal write lock
>   rwsem: avoid taking wait_lock in rwsem_down_write_failed
>   rwsem: skip initial trylock in rwsem_down_write_failed
>   rwsem: simplify __rwsem_do_wake
>   rwsem: implement support for write lock stealing on the fastpath
>   rwsem: do not block readers at head of queue if other readers are active
>   x86 rwsem: avoid taking slow path when stealing write lock
>
>  arch/x86/include/asm/rwsem.h |  28 +++--
>  lib/rwsem-spinlock.c         |  40 +++-
>  lib/rwsem.c                  | 239 +--
>  3 files changed, 154 insertions(+), 153 deletions(-)

Because Michel asked:

Reviewed-by: Peter Hurley
[PATCH v2 00/13] rwsem fast-path write lock stealing
These patches extend Alex Shi's work (which added write lock stealing
on the rwsem slow path) in order to provide rwsem write lock stealing
on the fast path (that is, without taking the rwsem's wait_lock).

I initially sent a shorter series shortly before v3.9, however some
patches were doing too much at once which made them confusing to
review. I have now split the series at a smaller granularity;
hope this will help :)

Patches 1-2 are for cleanups:

- Patch 1 replaces the waiter type bitmask with an enumeration
  (as we don't have any other planned uses for the bitmap)

- Patch 2 shortens critical sections in rwsem_down_failed_common()
  so they don't cover more than what is absolutely necessary

Patches 3-5 split rwsem_down_failed_common() into separate functions
for the read and write sides:

- Patch 3 simply puts two identical copies of rwsem_down_failed_common()
  into rwsem_down_{read,write}_failed (no code changes, in order to
  make the review easier)

- Patch 4 does easy simplifications in rwsem_down_read_failed():
  - We don't need to wake readers queued before us;
  - We don't need to try to steal the lock, and thus we don't need to
    acquire the wait_lock after sleeping.
- Patch 5 does easy simplifications in rwsem_down_write_failed():
  - We don't need to check for !waiter.task since __rwsem_do_wake()
    doesn't remove writers from the wait_list;
  - Since the only way to exit the wait loop is by stealing the write
    lock, the corresponding exit code can be moved after the loop;
  - There is no point releasing the wait_lock before entering the
    wait loop, as we will need to reacquire it immediately;
  - We don't need to get a reference on the task structure, since the
    task is responsible for removing itself from the wait_list;

Patches 6-9 apply additional optimizations to rwsem_down_write_failed():

- Patch 6 tries write lock stealing more aggressively in order to
  avoid extra checks;
- Patch 7 uses cmpxchg to implement the write lock stealing, instead
  of doing an additive adjustment that might need to be backed out;
- Patch 8 avoids taking the wait_lock if there are already active
  locks when we wake up;
- Patch 9 avoids the initial trylock if there were already active
  locks when we entered rwsem_down_write_failed()

Patches 10-11 are updates to the __rwsem_do_wake function:

- Patch 10 has simple structure cleanups that don't change the code
  behavior
- Patch 11 adds generic support for fastpath write lock stealing, by
  removing assumptions the function made about rwsem acquisitions
  being prevented when the rwsem count value indicates there are
  queued waiters.

Patch 12 fixes a race condition Peter Hurley noticed in reviewing v1
of this patch series, which resulted in readers sometimes blocking
instead of executing in parallel with other existing readers.

Patch 13 finally implements rwsem fast path lock stealing for the x86
arch.
Michel Lespinasse (13):
  rwsem: make the waiter type an enumeration rather than a bitmask
  rwsem: shorter spinlocked section in rwsem_down_failed_common()
  rwsem: move rwsem_down_failed_common code into
    rwsem_down_{read,write}_failed
  rwsem: simplify rwsem_down_read_failed
  rwsem: simplify rwsem_down_write_failed
  rwsem: more agressive lock stealing in rwsem_down_write_failed
  rwsem: use cmpxchg for trying to steal write lock
  rwsem: avoid taking wait_lock in rwsem_down_write_failed
  rwsem: skip initial trylock in rwsem_down_write_failed
  rwsem: simplify __rwsem_do_wake
  rwsem: implement support for write lock stealing on the fastpath
  rwsem: do not block readers at head of queue if other readers are active
  x86 rwsem: avoid taking slow path when stealing write lock

 arch/x86/include/asm/rwsem.h |  28 +++--
 lib/rwsem-spinlock.c         |  40 +++-
 lib/rwsem.c                  | 239 +--
 3 files changed, 154 insertions(+), 153 deletions(-)

Changes since v1:

- Patches 1-9 and Patch 13 are unchanged from the prior submission.

- The v1 series made a change to the ordering of reader wakeups which
  proved more controversial than I had anticipated. I reverted this
  part of the series. Patches 10-11 are new and implement (well, in
  Patch 11) the minimal changes required for supporting fastpath write
  lock stealing without modifying the ordering of reader wakeups
  (this results in longer code than with the prior proposal, though).

- Patch 12 is new, fixing a race condition Peter Hurley noticed while
  reviewing v1 which could result in reduced parallelism between
  readers.

--
1.8.1.3