Re: [PATCH v5 1/3] locking/rwsem: Remove arch specific rwsem files

2019-03-22 Thread Waiman Long
On 03/22/2019 03:30 PM, Davidlohr Bueso wrote: > On Fri, 22 Mar 2019, Linus Torvalds wrote: >> Some of them _might_ be performance-critical. There's the one on >> mmap_sem in the fault handling path, for example. And yes, I'd expect >> the normal case to very much be "no other readers or writers"

Re: [PATCH v5 3/3] locking/rwsem: Optimize down_read_trylock()

2019-03-22 Thread Waiman Long
On 03/22/2019 01:25 PM, Russell King - ARM Linux admin wrote: > On Fri, Mar 22, 2019 at 10:30:08AM -0400, Waiman Long wrote: >> Modify __down_read_trylock() to optimize for an unlocked rwsem and make >> it generate slightly better code. >> >> Before th

Re: [PATCH v5 1/3] locking/rwsem: Remove arch specific rwsem files

2019-03-22 Thread Waiman Long
On 03/22/2019 01:01 PM, Linus Torvalds wrote: > On Fri, Mar 22, 2019 at 7:30 AM Waiman Long wrote: >> 19 files changed, 133 insertions(+), 930 deletions(-) > Lovely. And it all looks sane to me. > > So ack. > > The only comment I have is about __down_read_trylock()

[PATCH v5 1/3] locking/rwsem: Remove arch specific rwsem files

2019-03-22 Thread Waiman Long
to access the internal rwsem macros and functions. Signed-off-by: Waiman Long --- MAINTAINERS | 1 - arch/alpha/include/asm/rwsem.h | 211 arch/arm/include/asm/Kbuild | 1 - arch/arm64/include/asm/Kbuild | 1 - arch/hexagon/include/asm

[PATCH v5 3/3] locking/rwsem: Optimize down_read_trylock()

2019-03-22 Thread Waiman Long
case (1 thread), the new down_read_trylock() is a little bit faster. For the contended cases, the new down_read_trylock() performs pretty well on x86-64, but performance degrades at high contention levels on ARM64. Suggested-by: Linus Torvalds Signed-off-by: Waiman Long --- kernel/locking/rwse
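
For context, the optimization described above amounts to starting the trylock from the unlocked value and retrying only while the count still looks acquirable by a reader. The sketch below is reconstructed from that description (using the pre-rework RWSEM_* constants); it is not a verbatim copy of the patch:

/*
 * Sketch of the optimized trylock: try the unlocked value first and
 * loop only while no writer/waiter bias is present (count >= 0).
 */
static inline int __down_read_trylock(struct rw_semaphore *sem)
{
        long tmp = RWSEM_UNLOCKED_VALUE;        /* optimize for the unlocked case */

        do {
                if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
                                        tmp + RWSEM_ACTIVE_READ_BIAS))
                        return 1;
        } while (tmp >= 0);

        return 0;
}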

[PATCH v5 2/3] locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all archs

2019-03-22 Thread Waiman Long
-spinlock.c and make all architectures use a single implementation of rwsem - rwsem-xadd.c. All references to RWSEM_GENERIC_SPINLOCK and RWSEM_XCHGADD_ALGORITHM in the code are removed. Suggested-by: Peter Zijlstra Signed-off-by: Waiman Long --- arch/alpha/Kconfig | 7 - arch/arc/Kconfig

[PATCH v5 0/3] locking/rwsem: Rwsem rearchitecture part 0

2019-03-22 Thread Waiman Long
the architectures use one single implementation of rwsem - rwsem-xadd.c. Waiman Long (3): locking/rwsem: Remove arch specific rwsem files locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all archs locking/rwsem: Optimize down_read_trylock() MAINTAI

Re: [PATCH v4 3/3] locking/rwsem: Optimize down_read_trylock()

2019-02-21 Thread Waiman Long
On 02/21/2019 09:14 AM, Will Deacon wrote: > On Wed, Feb 13, 2019 at 05:00:17PM -0500, Waiman Long wrote: >> Modify __down_read_trylock() to optimize for an unlocked rwsem and make >> it generate slightly better code. >> >> Before this patch, down_read_trylock: >

Re: [PATCH v4 0/3] locking/rwsem: Rwsem rearchitecture part 0

2019-02-15 Thread Waiman Long
On 02/15/2019 01:40 PM, Will Deacon wrote: > On Thu, Feb 14, 2019 at 11:37:15AM +0100, Peter Zijlstra wrote: >> On Wed, Feb 13, 2019 at 05:00:14PM -0500, Waiman Long wrote: >>> v4: >>> - Remove rwsem-spinlock.c and make all archs use rwsem-xadd.c. >>> >>

Re: [PATCH v4 0/3] locking/rwsem: Rwsem rearchitecture part 0

2019-02-14 Thread Waiman Long
On 02/14/2019 05:37 AM, Peter Zijlstra wrote: > On Wed, Feb 13, 2019 at 05:00:14PM -0500, Waiman Long wrote: >> v4: >> - Remove rwsem-spinlock.c and make all archs use rwsem-xadd.c. >> >> v3: >> - Optimize __down_read_trylock() for the uncontended case as s

Re: [PATCH v3 2/2] locking/rwsem: Optimize down_read_trylock()

2019-02-14 Thread Waiman Long
On 02/14/2019 01:02 PM, Will Deacon wrote: > On Thu, Feb 14, 2019 at 11:33:33AM +0100, Peter Zijlstra wrote: >> On Wed, Feb 13, 2019 at 03:32:12PM -0500, Waiman Long wrote: >>> Modify __down_read_trylock() to optimize for an unlocked rwsem and make >>> it ge

Re: [PATCH-tip 00/22] locking/rwsem: Rework rwsem-xadd & enable new rwsem features

2019-02-14 Thread Waiman Long
On 02/14/2019 08:23 AM, Davidlohr Bueso wrote: > On Fri, 08 Feb 2019, Waiman Long wrote: >> I am planning to run more performance tests and post the data sometime >> next week. Davidlohr is also going to run some of his rwsem performance >> tests on this patchset. > > S

Re: [PATCH v3 2/2] locking/rwsem: Optimize down_read_trylock()

2019-02-14 Thread Waiman Long
On 02/14/2019 05:33 AM, Peter Zijlstra wrote: > On Wed, Feb 13, 2019 at 03:32:12PM -0500, Waiman Long wrote: >> Modify __down_read_trylock() to optimize for an unlocked rwsem and make >> it generate slightly better code. >> >> Before this patch, down_read_trylock: >

[PATCH v4 3/3] locking/rwsem: Optimize down_read_trylock()

2019-02-13 Thread Waiman Long
case (1 thread), the new down_read_trylock() is a little bit faster. For the contended cases, the new down_read_trylock() performs pretty well on x86-64, but performance degrades at high contention levels on ARM64. Suggested-by: Linus Torvalds Signed-off-by: Waiman Long --- kernel/locking/rw

[PATCH v4 2/3] locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all archs

2019-02-13 Thread Waiman Long
-spinlock.c and make all architectures use a single implementation of rwsem - rwsem-xadd.c. All references to RWSEM_GENERIC_SPINLOCK and RWSEM_XCHGADD_ALGORITHM in the code are removed. Suggested-by: Peter Zijlstra Signed-off-by: Waiman Long --- arch/alpha/Kconfig | 7 - arch/arc/Kconfig

[PATCH v4 1/3] locking/rwsem: Remove arch specific rwsem files

2019-02-13 Thread Waiman Long
to access the internal rwsem macros and functions. Signed-off-by: Waiman Long --- MAINTAINERS | 1 - arch/alpha/include/asm/rwsem.h | 211 --- arch/arm/include/asm/Kbuild | 1 - arch/arm64/include/asm/Kbuild | 1 - arch/hexagon/include

[PATCH v4 0/3] locking/rwsem: Rwsem rearchitecture part 0

2019-02-13 Thread Waiman Long
of this patchset is to remove the architecture specific files for rwsem-xadd to make it easier to add enhancements in the later rwsem patches. It also removes the legacy rwsem-spinlock.c file and makes all the architectures use one single implementation of rwsem - rwsem-xadd.c. Waiman Long (3): locking

[PATCH v3 2/2] locking/rwsem: Optimize down_read_trylock()

2019-02-13 Thread Waiman Long
case (1 thread), the new down_read_trylock() is a little bit faster. For the contended cases, the new down_read_trylock() performs pretty well on x86-64, but performance degrades at high contention levels on ARM64. Suggested-by: Linus Torvalds Signed-off-by: Waiman Long --- kernel/locking/rw

[PATCH v3 1/2] locking/rwsem: Remove arch specific rwsem files

2019-02-13 Thread Waiman Long
to access the internal rwsem macros and functions. Signed-off-by: Waiman Long --- MAINTAINERS | 1 - arch/alpha/include/asm/rwsem.h | 211 --- arch/arm/include/asm/Kbuild | 1 - arch/arm64/include/asm/Kbuild | 1 - arch/hexagon/include

[PATCH v3 0/2] locking/rwsem: Remove arch specific rwsem files

2019-02-13 Thread Waiman Long
ch 2 for arm64. Waiman Long (2): locking/rwsem: Remove arch specific rwsem files locking/rwsem: Optimize down_read_trylock() MAINTAINERS | 1 - arch/alpha/include/asm/rwsem.h | 211 --- arch/arm/include/asm/Kbuild | 1 - arch/arm64/include

Re: [PATCH v2 2/2] locking/rwsem: Optimize down_read_trylock()

2019-02-13 Thread Waiman Long
On 02/13/2019 02:45 AM, Ingo Molnar wrote: > * Waiman Long wrote: > >> I looked at the assembly code in arch/x86/include/asm/rwsem.h. For both >> trylocks (read & write), the count is read first before attempting to >> lock it. We did the same for all tryl
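
For reference, the read-the-count-first trylock pattern being described looks roughly like the generic C version these patches start from (quoted here from memory as a sketch, not from the thread itself):

/*
 * Read-first pattern: sample the count, then attempt the cmpxchg only
 * if the sampled value still permits a reader.
 */
static inline int __down_read_trylock(struct rw_semaphore *sem)
{
        long tmp;

        while ((tmp = atomic_long_read(&sem->count)) >= 0) {
                if (tmp == atomic_long_cmpxchg_acquire(&sem->count, tmp,
                                        tmp + RWSEM_ACTIVE_READ_BIAS))
                        return 1;
        }
        return 0;
}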

Re: [PATCH v2 2/2] locking/rwsem: Optimize down_read_trylock()

2019-02-12 Thread Waiman Long
On 02/12/2019 02:58 PM, Linus Torvalds wrote: > On Mon, Feb 11, 2019 at 11:31 AM Waiman Long wrote: >> Modify __down_read_trylock() to make it generate slightly better code >> (smaller and maybe a tiny bit faster). > This looks good, but I would ask you to try one slightly

Re: [PATCH v2 2/2] locking/rwsem: Optimize down_read_trylock()

2019-02-12 Thread Waiman Long
On 02/12/2019 01:36 PM, Waiman Long wrote: > On 02/12/2019 08:25 AM, Peter Zijlstra wrote: >> On Tue, Feb 12, 2019 at 02:24:04PM +0100, Peter Zijlstra wrote: >>> On Mon, Feb 11, 2019 at 02:31:26PM -0500, Waiman Long wrote: >>>> Modify __down_read_trylock() to make it

[PATCH v2 0/2] locking/rwsem: Remove arch specific rwsem files

2019-02-11 Thread Waiman Long
platforms that I could test on (arm64 & ppc) are both using the generic C code, the rwsem performance shouldn't be affected by this patch except for the down_read_trylock() code, which was included in patch 2 for arm64. Waiman Long (2): locking/rwsem: Remove arch specific rwsem files locking/r

[PATCH v2 2/2] locking/rwsem: Optimize down_read_trylock()

2019-02-11 Thread Waiman Long
1 27,787 28,259 28,359 9,234 On an ARM64 system, the performance results were: Before Patch After Patch # of Threads rlock rlock - - 1 24,155

[PATCH v2 1/2] locking/rwsem: Remove arch specific rwsem files

2019-02-11 Thread Waiman Long
to access the internal rwsem macros and functions. Signed-off-by: Waiman Long --- MAINTAINERS | 1 - arch/alpha/include/asm/rwsem.h | 211 --- arch/arm/include/asm/Kbuild | 1 - arch/arm64/include/asm/Kbuild | 1 - arch/hexagon/include

Re: [PATCH] locking/rwsem: Remove arch specific rwsem files

2019-02-11 Thread Waiman Long
On 02/11/2019 06:58 AM, Peter Zijlstra wrote: > Which is clearly worse. Now we can write that as: > > int __down_read_trylock2(unsigned long *l) > { > long tmp = READ_ONCE(*l); > > while (tmp >= 0) { > if (try_cmpxchg(l, &tmp, tmp + 1)) >

Re: [PATCH] locking/rwsem: Remove arch specific rwsem files

2019-02-10 Thread Waiman Long
On 02/10/2019 09:00 PM, Waiman Long wrote: > As the generic rwsem-xadd code is using the appropriate acquire and > release versions of the atomic operations, the arch specific rwsem.h > files will not be that much faster than the generic code as long as the > atomic functions
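
The acquire/release point can be made concrete with the generic fast paths that replace the per-arch assembly; approximately (a sketch modeled on the asm-generic implementation of that time):

/*
 * Generic rwsem fast paths: the ordering guarantees live in the
 * _acquire/_release atomics, leaving little for arch assembly to win.
 */
static inline void __down_read(struct rw_semaphore *sem)
{
        if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0))
                rwsem_down_read_failed(sem);
}

static inline void __up_read(struct rw_semaphore *sem)
{
        long tmp = atomic_long_dec_return_release(&sem->count);

        if (unlikely(tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0))
                rwsem_wake(sem);
}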

[PATCH] locking/rwsem: Remove arch specific rwsem files

2019-02-10 Thread Waiman Long
/locking needs to access the internal rwsem macros and functions. Signed-off-by: Waiman Long --- MAINTAINERS | 1 - arch/alpha/include/asm/rwsem.h | 211 --- arch/arm/include/asm/Kbuild | 1 - arch/arm64/include/asm/Kbuild | 1 - arch

Re: [PATCH-tip 00/22] locking/rwsem: Rework rwsem-xadd & enable new rwsem features

2019-02-08 Thread Waiman Long
On 02/08/2019 02:50 PM, Linus Torvalds wrote: > On Thu, Feb 7, 2019 at 11:08 AM Waiman Long wrote: >> This patchset revamps the current rwsem-xadd implementation to make >> it saner and easier to work with. This patchset removes all the >> architecture specific assembly co

Re: [PATCH-tip 15/22] locking/rwsem: Merge owner into count on x86-64

2019-02-08 Thread Waiman Long
On 02/07/2019 03:54 PM, Waiman Long wrote: > On 02/07/2019 03:08 PM, Peter Zijlstra wrote: >> On Thu, Feb 07, 2019 at 02:07:19PM -0500, Waiman Long wrote: >>> On 32-bit architectures, there aren't enough bits to hold both. >>> 64-bit architectures, however,

Re: [PATCH-tip 15/22] locking/rwsem: Merge owner into count on x86-64

2019-02-07 Thread Waiman Long
On 02/07/2019 03:08 PM, Peter Zijlstra wrote: > On Thu, Feb 07, 2019 at 02:07:19PM -0500, Waiman Long wrote: >> On 32-bit architectures, there aren't enough bits to hold both. >> 64-bit architectures, however, can have enough bits to do that. For >> x86-64, the physical add

Re: [PATCH-tip 00/22] locking/rwsem: Rework rwsem-xadd & enable new rwsem features

2019-02-07 Thread Waiman Long
On 02/07/2019 02:51 PM, Davidlohr Bueso wrote: > On Thu, 07 Feb 2019, Waiman Long wrote: >> 30 files changed, 1197 insertions(+), 1594 deletions(-) > > Performance numbers on numerous workloads, pretty please. > > I'll go and throw this at my mmap_sem intensive workl

Re: [PATCH-tip 04/22] locking/rwsem: Remove arch specific rwsem files

2019-02-07 Thread Waiman Long
On 02/07/2019 02:36 PM, Peter Zijlstra wrote: > On Thu, Feb 07, 2019 at 02:07:08PM -0500, Waiman Long wrote: > >> +static inline int __down_read_trylock(struct rw_semaphore *sem) >> +{ >> +long tmp; >> + >> +while ((tmp = atomic_long_read(&sem->

[PATCH-tip 01/22] locking/qspinlock_stat: Introduce a generic lockevent counting APIs

2019-02-07 Thread Waiman Long
() calls are replaced by either lockevent_inc() or lockevent_cond_inc() calls. The qstat_hop() call is renamed to lockevent_pv_hop(). The "reset_counters" debugfs file is also renamed to ".reset_counts". Signed-off-by: Waiman Long --- kernel/locking/lock_events.h| 55
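
The renamed API boils down to cheap per-cpu event counters. The internals below are an assumption sketched from the names in the snippet (lockevent_inc()/lockevent_cond_inc()); they are not quoted from the patch:

/*
 * Assumed shape: one per-cpu counter per event id, bumped either
 * unconditionally or when a condition holds.
 */
DECLARE_PER_CPU(unsigned long, lockevents[lockevent_num]);

#define __lockevent_inc(ev, cond)                       \
do {                                                    \
        if (cond)                                       \
                __this_cpu_inc(lockevents[ev]);         \
} while (0)

#define lockevent_inc(ev)         __lockevent_inc(LOCKEVENT_ ## ev, 1)
#define lockevent_cond_inc(ev, c) __lockevent_inc(LOCKEVENT_ ## ev, c)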

[PATCH-tip 19/22] locking/rwsem: Enable readers spinning on writer

2019-02-07 Thread Waiman Long
-by: Waiman Long --- kernel/locking/lock_events_list.h | 1 + kernel/locking/rwsem-xadd.c | 80 ++- kernel/locking/rwsem-xadd.h | 3 ++ 3 files changed, 74 insertions(+), 10 deletions(-) diff --git a/kernel/locking/lock_events_list.h b/kernel

[PATCH-tip 10/22] locking/rwsem: Enable lock event counting

2019-02-07 Thread Waiman Long
after sleeping. Signed-off-by: Waiman Long --- arch/Kconfig | 2 +- kernel/locking/lock_events_list.h | 17 + kernel/locking/rwsem-xadd.c | 12 3 files changed, 30 insertions(+), 1 deletion(-) diff --git a/arch/Kconfig b/arch/Kconfig index

[PATCH-tip 08/22] locking/rwsem: Add debug check for __down_read*()

2019-02-07 Thread Waiman Long
() are also moved over to rwsem-xadd.h. Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.h | 12 ++-- kernel/locking/rwsem.c | 3 --- 2 files changed, 10 insertions(+), 5 deletions(-) diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h index 64e7d62..77151c3

[PATCH-tip 02/22] locking/lock_events: Make lock_events available for all archs & other locks

2019-02-07 Thread Waiman Long
directory. Signed-off-by: Waiman Long --- arch/Kconfig| 10 +++ arch/x86/Kconfig| 8 --- kernel/locking/Makefile | 1 + kernel/locking/lock_events.c| 153 kernel/locking/lock_events.h| 6 +- kernel

[PATCH-tip 09/22] locking/rwsem: Enhance DEBUG_RWSEMS_WARN_ON() macro

2019-02-07 Thread Waiman Long
of the rwsem count and owner fields to give more information about what is wrong with the rwsem. Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.h | 19 --- kernel/locking/rwsem.c | 5 +++-- 2 files changed, 15 insertions(+), 9 deletions(-) diff --git a/kernel/locking/rwsem
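
The enhancement described — printing the rwsem internals along with the failed condition — can look roughly like the macro below; this is a sketch of the idea, not the exact macro from the patch:

/*
 * Sketch: warn once and include the count/owner so a triggered check
 * says what looked wrong, not merely that something did.
 */
#define DEBUG_RWSEMS_WARN_ON(c, sem)                                       \
do {                                                                       \
        if (WARN_ONCE(c, "%s: count = 0x%lx, owner = 0x%lx, curr 0x%lx\n", \
                      #c, atomic_long_read(&(sem)->count),                 \
                      (unsigned long)READ_ONCE((sem)->owner),              \
                      (unsigned long)current))                             \
                debug_locks_off();                                         \
} while (0)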

[PATCH-tip 03/22] locking/rwsem: Relocate rwsem_down_read_failed()

2019-02-07 Thread Waiman Long
The rwsem_down_read_failed*() functions were relocated from above the optimistic spinning section to below that section. This enables the reader functions to use optimistic spinning in future patches. There is no code change. Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 172

[PATCH-tip 11/22] locking/rwsem: Implement a new locking scheme

2019-02-07 Thread Waiman Long
-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 145 +++- kernel/locking/rwsem-xadd.h | 85 +- 2 files changed, 89 insertions(+), 141 deletions(-) diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index

[PATCH-tip 13/22] locking/rwsem: Remove rwsem_wake() wakeup optimization

2019-02-07 Thread Waiman Long
()/up_write()") will have to be reverted. Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 74 - 1 file changed, 74 deletions(-) diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index 12b1d61..5f74bae 100644 --- a/ker

[PATCH-tip 04/22] locking/rwsem: Remove arch specific rwsem files

2019-02-07 Thread Waiman Long
. The generic asm rwsem.h can also be merged into kernel/locking/rwsem.h as no other code other than those under kernel/locking needs to access the internal rwsem macros and functions. Signed-off-by: Waiman Long --- MAINTAINERS | 1 - arch/alpha/include/asm/rwsem.h | 211

[PATCH-tip 22/22] locking/rwsem: Ensure an RT task will not spin on reader

2019-02-07 Thread Waiman Long
to deadlock. So we have to make sure that an RT task will not spin on a reader-owned rwsem. Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index

[PATCH-tip 16/22] locking/rwsem: Remove redundant computation of writer lock word

2019-02-07 Thread Waiman Long
, the extra constant argument to rwsem_try_write_lock() and rwsem_try_write_lock_unqueued() should be optimized out by the compiler. Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 22 -- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/kernel

[PATCH-tip 15/22] locking/rwsem: Merge owner into count on x86-64

2019-02-07 Thread Waiman Long
on both read and write lock performance. Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 20 +++-- kernel/locking/rwsem-xadd.h | 105 +++- 2 files changed, 110 insertions(+), 15 deletions(-) diff --git a/kernel/locking/rwsem-xadd.c b

[PATCH-tip 17/22] locking/rwsem: Recheck owner if it is not on cpu

2019-02-07 Thread Waiman Long
Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 20 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index 16dc7a1..21d462f 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem

[PATCH-tip 05/22] locking/rwsem: Move owner setting code from rwsem.c to rwsem.h

2019-02-07 Thread Waiman Long
as reader-owned when the functions return. That is currently true except in the transient case that the waiter queue just becomes empty. So a rwsem_set_reader_owned() call is added for this case. The __rwsem_set_reader_owned() call in __rwsem_mark_wake() is now necessary. Signed-off-by: Waiman Long

[PATCH-tip 07/22] locking/rwsem: Move rwsem internal function declarations to rwsem-xadd.h

2019-02-07 Thread Waiman Long
We don't need to expose rwsem internal functions which are not supposed to be called directly from other kernel code. Signed-off-by: Waiman Long --- include/linux/rwsem.h | 7 --- kernel/locking/rwsem-xadd.h | 7 +++ 2 files changed, 7 insertions(+), 7 deletions(-) diff --git

[PATCH-tip 06/22] locking/rwsem: Rename kernel/locking/rwsem.h

2019-02-07 Thread Waiman Long
The content of kernel/locking/rwsem.h is now specific to rwsem-xadd only. Rename it to rwsem-xadd.h to indicate that it is specific to rwsem-xadd and include it only when CONFIG_RWSEM_XCHGADD_ALGORITHM is set. Signed-off-by: Waiman Long --- kernel/locking/percpu-rwsem.c | 4 +- kernel/locking

[PATCH-tip 00/22] locking/rwsem: Rework rwsem-xadd & enable new rwsem features

2019-02-07 Thread Waiman Long
. Waiman Long (22): locking/qspinlock_stat: Introduce a generic lockevent counting APIs locking/lock_events: Make lock_events available for all archs & other locks locking/rwsem: Relocate rwsem_down_read_failed() locking/rwsem: Remove arch specific rwsem files locking/rwsem:

Re: [PATCH v6 00/11] locking/rwsem: Rework rwsem-xadd & enable new rwsem features

2017-10-11 Thread Waiman Long
On 10/11/2017 04:50 PM, Dave Chinner wrote: > On Wed, Oct 11, 2017 at 02:01:51PM -0400, Waiman Long wrote: >> In terms of rwsem performance, a rwsem microbenchmark and a fio randrw >> test with an xfs filesystem on a ramdisk were used to verify the >> performance changes due t

Re: [PATCH v6 02/11] locking/rwsem: Implement a new locking scheme

2017-10-11 Thread Waiman Long
On 10/11/2017 02:58 PM, Waiman Long wrote: > On 10/11/2017 02:40 PM, Peter Zijlstra wrote: >> On Wed, Oct 11, 2017 at 02:01:53PM -0400, Waiman Long wrote: >>> +/* >>> + * The definition of the atomic counter in the semaphore: >>> + * >>> + *

[PATCH v6 02/11] locking/rwsem: Implement a new locking scheme

2017-10-11 Thread Waiman Long
than xadd in ppc, the elimination of the atomic count reversal in slowpath helps the contended performance, though. Signed-off-by: Waiman Long <long...@redhat.com> --- include/asm-generic/rwsem.h | 129 - include/linux/rwsem.h | 12 ++--

[PATCH v6 03/11] locking/rwsem: Move owner setting code from rwsem.c to rwsem-xadd.h

2017-10-11 Thread Waiman Long
if the owner is properly set first. Signed-off-by: Waiman Long <long...@redhat.com> --- kernel/locking/rwsem-xadd.c | 7 +++ kernel/locking/rwsem-xadd.h | 19 --- kernel/locking/rwsem.c | 17 ++--- kernel/locking/rwsem.h | 11 --- 4 files c

[PATCH v6 01/11] locking/rwsem: relocate rwsem_down_read_failed()

2017-10-11 Thread Waiman Long
The rwsem_down_read_failed*() functions were relocated from above the optimistic spinning section to below that section. This enables them to use functions in that section in future patches. There is no code change. Signed-off-by: Waiman Long <long...@redhat.com> --- kernel/locking/rwsem-

[PATCH v6 06/11] locking/rwsem: Remove arch specific rwsem files

2017-10-11 Thread Waiman Long
. Signed-off-by: Waiman Long <long...@redhat.com> --- arch/alpha/include/asm/rwsem.h | 195 --- arch/arm/include/asm/Kbuild | 1 - arch/arm64/include/asm/Kbuild | 1 - arch/hexagon/include/asm/Kbuild | 1 - arch/ia64/include/asm/rwsem.h

[PATCH v6 07/11] locking/rwsem: Implement lock handoff to prevent lock starvation

2017-10-11 Thread Waiman Long
done more than 646k of them. For the patched kernel, the locking rate dropped to 12,590 kop/s. The number of locking operations done per thread had a range of 14,450 - 22,648. The rwsem became much more fair with the tradeoff of lower overall throughput. Signed-off-by: Waiman Long <long...@redhat.

[PATCH v6 08/11] locking/rwsem: Enable readers spinning on writer

2017-10-11 Thread Waiman Long
. Signed-off-by: Waiman Long <long...@redhat.com> --- kernel/locking/rwsem-xadd.c | 66 ++--- 1 file changed, 57 insertions(+), 9 deletions(-) diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index bca412f..52305c3 100644 --- a/

[PATCH v6 05/11] locking/rwsem: Move rwsem internal function declarations to rwsem-xadd.h

2017-10-11 Thread Waiman Long
We don't need to expose rwsem internal functions which are not supposed to be called directly from other kernel code. Signed-off-by: Waiman Long <long...@redhat.com> --- include/linux/rwsem.h | 7 --- kernel/locking/rwsem-xadd.h | 7 +++ 2 files changed, 7 insertions

[PATCH v6 11/11] locking/rwsem: Enable count-based spinning on reader

2017-10-11 Thread Waiman Long
alternatively, the resulting locking total rates on a 4.14 based kernel were 927 kop/s and 3218 kop/s without and with the patch respectively. That was an increase of about 247%. Signed-off-by: Waiman Long <long...@redhat.com> --- kernel/locking/rwsem-xadd.

[PATCH v6 10/11] locking/rwsem: Make rwsem_spin_on_owner() return a tri-state value

2017-10-11 Thread Waiman Long
This patch modifies rwsem_spin_on_owner() to return a tri-state value to better reflect the state of the lock holder, which enables us to make a better decision about what to do next. Signed-off-by: Waiman Long <long...@redhat.com> --- kernel/locking/rwsem-xadd.c | 14 +- 1 file chan

[PATCH v6 00/11] locking/rwsem: Rework rwsem-xadd & enable new rwsem features

2017-10-11 Thread Waiman Long
9 211,276/ 509,712/1,134,007 4,894/221,839/246,818 11 884,513/1,043,989/1,252,533 9,604/ 11,105/ 25,225 It can be seen that rwsem changes from writer-preferring to reader-preferring. Waiman Long (11): locking/rwsem: relocate rwsem_down_read_failed() locking/rwsem: Impl

[PATCH v5 4/9] locking/rwsem: Change RWSEM_WAITING_BIAS for better disambiguation

2017-06-01 Thread Waiman Long
+ RWSEM_ACTIVE_READ_BIAS Signed-off-by: Waiman Long <long...@redhat.com> --- arch/alpha/include/asm/rwsem.h| 3 ++- arch/ia64/include/asm/rwsem.h | 2 +- arch/s390/include/asm/rwsem.h | 2 +- arch/x86/include/asm/rwsem.h | 3 ++- include/asm-generic/rwsem.h

[PATCH v5 1/9] locking/rwsem: relocate rwsem_down_read_failed()

2017-06-01 Thread Waiman Long
The rwsem_down_read_failed() function was relocated from above the optimistic spinning section to below that section. This enables it to use functions in that section in future patches. There is no code change. Signed-off-by: Waiman Long <long...@redhat.com> --- kernel/locking/rwsem-xadd.

[PATCH v5 6/9] locking/rwsem: Use bit in owner to stop spinning

2017-06-01 Thread Waiman Long
or spinning writers. This patch provides the helper functions to facilitate the use of that bit. Signed-off-by: Waiman Long <long...@redhat.com> --- kernel/locking/rwsem.h | 66 ++ 1 file changed, 56 insertions(+), 10 deletions(-) diff

[PATCH v5 9/9] locking/rwsem: Enable reader lock stealing

2017-06-01 Thread Waiman Long
stealing on a rwsem as long as the lock is reader-owned and optimistic spinning hasn't been disabled because of long writer wait. This will improve overall performance without running the risk of writer lock starvation. Signed-off-by: Waiman Long <long...@redhat.com> --- kernel/locking

[PATCH v5 3/9] locking/rwsem: Move common rwsem macros to asm-generic/rwsem_types.h

2017-06-01 Thread Waiman Long
ed-off-by: Waiman Long <long...@redhat.com> --- arch/alpha/include/asm/rwsem.h| 8 +--- arch/ia64/include/asm/rwsem.h | 7 ++- arch/s390/include/asm/rwsem.h | 7 +-- arch/x86/include/asm/rwsem.h | 19 +-- include/asm-generic/rwsem.

[PATCH v5 8/9] locking/rwsem: Enable count-based spinning on reader

2017-06-01 Thread Waiman Long
runs) on a 4.12 based kernel were 1760.2 Mop/s and 5439.0 Mop/s without and with the patch respectively. That was an increase of about 209%. Signed-off-by: Waiman Long <long...@redhat.com> --- kernel/locking/rwsem-xadd.c | 72 ++--- 1 file chang

[PATCH v5 7/9] locking/rwsem: Make rwsem_spin_on_owner() return a tri-state value

2017-06-01 Thread Waiman Long
This patch modifies rwsem_spin_on_owner() to return a tri-state value to better reflect the state of the lock holder, which enables us to make a better decision about what to do next. Signed-off-by: Waiman Long <long...@redhat.com> --- kernel/locking/rwsem-xadd.c | 14 +- 1 file chan

Re: [RFC PATCH-tip v4 01/10] locking/osq: Make lock/unlock proper acquire/release barrier

2016-10-05 Thread Waiman Long
On 10/05/2016 08:19 AM, Waiman Long wrote: On 10/04/2016 03:06 PM, Davidlohr Bueso wrote: On Thu, 18 Aug 2016, Waiman Long wrote: The osq_lock() and osq_unlock() functions may not provide the necessary acquire and release barriers in some cases. This patch makes sure that the proper barriers

Re: [RFC PATCH-tip v4 10/10] locking/rwsem: Add a boot parameter to reader spinning threshold

2016-08-24 Thread Waiman Long
On 08/24/2016 12:00 AM, Davidlohr Bueso wrote: On Thu, 18 Aug 2016, Waiman Long wrote: The default reader spinning threshold is currently set to 4096. However, the right reader spinning threshold may vary from one system to another and among the different architectures. This patch adds a new
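
Hooking a threshold like this up to a boot parameter is straightforward; the sketch below uses early_param() and an invented parameter name purely for illustration — the real name and mechanism are whatever the patch defines:

/* Illustrative only: the parameter name here is made up for this sketch. */
static int rwsem_rspin_threshold __read_mostly = 4096;

static int __init rwsem_set_rspin_threshold(char *str)
{
        int val;

        if (get_option(&str, &val) && val > 0)
                rwsem_rspin_threshold = val;
        return 0;
}
early_param("rwsem_rspin_threshold", rwsem_set_rspin_threshold);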

Re: [RFC PATCH-tip v4 07/10] locking/rwsem: Change RWSEM_WAITING_BIAS for better disambiguation

2016-08-19 Thread Waiman Long
On 08/19/2016 01:57 AM, Wanpeng Li wrote: 2016-08-19 5:11 GMT+08:00 Waiman Long<waiman.l...@hpe.com>: When the count value is in between 0 and RWSEM_WAITING_BIAS, there are 2 possibilities. Either a writer is present and there is no waiter count = 0x0001 or there are waiters and r
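
The ambiguity under discussion comes from the long-standing xadd bias values; for the 32-bit case they are, for reference:

/* Classic 32-bit rwsem-xadd bias values (before this change). */
#define RWSEM_ACTIVE_MASK       0x0000ffffL
#define RWSEM_ACTIVE_BIAS       0x00000001L
#define RWSEM_WAITING_BIAS      (-RWSEM_ACTIVE_MASK - 1)        /* 0xffff0000 */
#define RWSEM_ACTIVE_READ_BIAS  RWSEM_ACTIVE_BIAS
#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)

/*
 * Worked example: count == 0xffff0001 can mean either one writer holding
 * the lock with nobody waiting (RWSEM_ACTIVE_WRITE_BIAS), or waiters
 * queued plus one active reader (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_READ_BIAS).
 * Changing RWSEM_WAITING_BIAS is what removes this ambiguity.
 */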

[RFC PATCH-tip v4 10/10] locking/rwsem: Add a boot parameter to reader spinning threshold

2016-08-19 Thread Waiman Long
of different systems as well as for testing purposes. Signed-off-by: Waiman Long <waiman.l...@hpe.com> --- Documentation/kernel-parameters.txt |3 +++ kernel/locking/rwsem-xadd.c | 14 +- 2 files changed, 16 insertions(+), 1 deletions(-) diff --git a/Documen

[RFC PATCH-tip v4 08/10] locking/rwsem: Enable spinning readers

2016-08-19 Thread Waiman Long
patch BW after patch % change --- -- randrw 1352 MB/s 2164 MB/s +60% randwrite 1710 MB/s 2550 MB/s +49% Signed-off-by: Waiman Long <waiman.l...@hpe.com> --- kernel/locking/rwsem-xadd.c

[RFC PATCH-tip v4 07/10] locking/rwsem: Change RWSEM_WAITING_BIAS for better disambiguation

2016-08-18 Thread Waiman Long
+ RWSEM_ACTIVE_READ_BIAS Signed-off-by: Waiman Long <waiman.l...@hpe.com> --- arch/alpha/include/asm/rwsem.h|3 ++- arch/ia64/include/asm/rwsem.h |2 +- arch/s390/include/asm/rwsem.h |2 +- arch/x86/include/asm/rwsem.h |3 ++- include/asm-generic/rwsem.h |4 ++-- i

[RFC PATCH-tip v4 09/10] locking/rwsem: Enable reactivation of reader spinning

2016-08-18 Thread Waiman Long
. If there are sufficiently more successful spin attempts than failed ones, it will try to reactivate reader spinning. Signed-off-by: Waiman Long <waiman.l...@hpe.com> --- include/linux/rwsem.h | 12 kernel/locking/rwsem-xadd.c | 27 +-- 2 files chang

[RFC PATCH-tip v4 02/10] locking/rwsem: Stop active read lock ASAP

2016-08-18 Thread Waiman Long
--- -- randrw 1210 MB/s 1352 MB/s +12% randwrite 1622 MB/s 1710 MB/s +5.4% The write-only microbench also showed improvement because some read locking was done by the XFS code. Signed-off-by: Waiman Long <waiman.l...@hpe.com> ---

[RFC PATCH-tip v4 00/10] locking/rwsem: Enable reader optimistic spinning

2016-08-18 Thread Waiman Long
new boot parameter to change the reader spinning threshold which can be system specific. Waiman Long (10): locking/osq: Make lock/unlock proper acquire/release barrier locking/rwsem: Stop active read lock ASAP locking/rwsem: Make rwsem_spin_on_owner() return a tri-state value locking/rwse

[RFC PATCH-tip v4 01/10] locking/osq: Make lock/unlock proper acquire/release barrier

2016-08-18 Thread Waiman Long
org> Signed-off-by: Waiman Long <waiman.l...@hpe.com> --- kernel/locking/osq_lock.c | 24 ++-- 1 files changed, 18 insertions(+), 6 deletions(-) diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c index 05a3785..3da0b97 100644 --- a/kernel/locking/osq_

[RFC PATCH-tip v4 06/10] locking/rwsem: Move common rwsem macros to asm-generic/rwsem_types.h

2016-08-18 Thread Waiman Long
ed-off-by: Waiman Long <waiman.l...@hpe.com> --- arch/alpha/include/asm/rwsem.h|8 +--- arch/ia64/include/asm/rwsem.h |7 ++- arch/s390/include/asm/rwsem.h |7 +-- arch/x86/include/asm/rwsem.h | 19 +-- include/asm-generic/rwsem.h

[RFC PATCH-tip v4 04/10] locking/rwsem: Enable count-based spinning on reader

2016-08-18 Thread Waiman Long
. Both the spinning threshold and the default value for rspin_enabled can be overridden by an architecture specific rwsem.h header file. Signed-off-by: Waiman Long <waiman.l...@hpe.com> --- include/linux/rwsem.h | 19 +++- kernel/locking/rwsem-xadd.c

[RFC PATCH-tip v4 05/10] locking/rwsem: move down rwsem_down_read_failed function

2016-08-18 Thread Waiman Long
Move the rwsem_down_read_failed() function down to below the optimistic spinning section before enabling optimistic spinning for the readers. This is because the rwsem_down_read_failed() function will call rwsem_optimistic_spin() in a later patch. There is no change in code. Signed-off-by: Waiman

[RFC PATCH-tip v4 03/10] locking/rwsem: Make rwsem_spin_on_owner() return a tri-state value

2016-08-18 Thread Waiman Long
This patch modifies rwsem_spin_on_owner() to return a tri-state value to better reflect the state of the lock holder, which enables us to make a better decision about what to do next. Signed-off-by: Waiman Long <waiman.l...@hpe.com> --- kernel/locking/rwsem-xadd.c | 14 +- 1 files c

Re: [RFC PATCH-tip v2 1/6] locking/osq: Make lock/unlock proper acquire/release barrier

2016-06-17 Thread Waiman Long
On 06/17/2016 11:45 AM, Will Deacon wrote: On Fri, Jun 17, 2016 at 11:26:41AM -0400, Waiman Long wrote: On 06/16/2016 08:48 PM, Boqun Feng wrote: On Thu, Jun 16, 2016 at 05:35:54PM -0400, Waiman Long wrote: If you look into the actual code: next = xchg_release(&node->next, N

[RFC PATCH-tip/locking/core v3 02/10] locking/rwsem: Stop active read lock ASAP

2016-06-17 Thread Waiman Long
--- -- randrw 1210 MB/s 1352 MB/s +12% randwrite 1622 MB/s 1710 MB/s +5.4% The write-only microbench also showed improvement because some read locking was done by the XFS code. Signed-off-by: Waiman Long <waiman.l...@hpe.com> ---

Re: [RFC PATCH-tip v2 1/6] locking/osq: Make lock/unlock proper acquire/release barrier

2016-06-17 Thread Waiman Long
On 06/16/2016 08:48 PM, Boqun Feng wrote: On Thu, Jun 16, 2016 at 05:35:54PM -0400, Waiman Long wrote: On 06/15/2016 10:19 PM, Boqun Feng wrote: On Wed, Jun 15, 2016 at 03:01:19PM -0400, Waiman Long wrote: On 06/15/2016 04:04 AM, Boqun Feng wrote: Hi Waiman, On Tue, Jun 14, 2016 at 06:48

[RFC PATCH-tip/locking/core v3 05/10] locking/rwsem: move down rwsem_down_read_failed function

2016-06-17 Thread Waiman Long
Move the rwsem_down_read_failed() function down to below the optimistic spinning section before enabling optimistic spinning for the readers. This is because the rwsem_down_read_failed() function will call rwsem_optimistic_spin() in a later patch. There is no change in code. Signed-off-by: Waiman

[RFC PATCH-tip/locking/core v3 04/10] locking/rwsem: Enable count-based spinning on reader

2016-06-17 Thread Waiman Long
. Both the spinning threshold and the default value for rspin_enabled can be overridden by an architecture specific rwsem.h header file. Signed-off-by: Waiman Long <waiman.l...@hpe.com> --- include/linux/rwsem.h | 19 +++- kernel/locking/rwsem-xadd.c

[RFC PATCH-tip/locking/core v3 00/10] locking/rwsem: Enable reader optimistic spinning

2016-06-17 Thread Waiman Long
k code. Patch 8 enables readers to do optimistic spinning. Patch 9 allows reactivation of reader spinning when a lot of writer-on-writer spins are successful. Patch 10 adds a new boot parameter to change the reader spinning threshold which can be system specific. Waiman Long (10): locking/osq: Mak

[RFC PATCH-tip/locking/core v3 03/10] locking/rwsem: Make rwsem_spin_on_owner() return a tri-state value

2016-06-17 Thread Waiman Long
This patch modifies rwsem_spin_on_owner() to return a tri-state value to better reflect the state of the lock holder, which enables us to make a better decision about what to do next. Signed-off-by: Waiman Long <waiman.l...@hpe.com> --- kernel/locking/rwsem-xadd.c | 14 +- 1 files c

[RFC PATCH-tip/locking/core v3 09/10] locking/rwsem: Enable reactivation of reader spinning

2016-06-17 Thread Waiman Long
. If there are sufficiently more successful spin attempts than failed ones, it will try to reactivate reader spinning. Signed-off-by: Waiman Long <waiman.l...@hpe.com> --- include/linux/rwsem.h | 12 kernel/locking/rwsem-xadd.c | 27 +-- 2 files chang

[RFC PATCH-tip/locking/core v3 10/10] locking/rwsem: Add a boot parameter to reader spinning threshold

2016-06-17 Thread Waiman Long
of different systems as well as for testing purposes. Signed-off-by: Waiman Long <waiman.l...@hpe.com> --- Documentation/kernel-parameters.txt |3 +++ kernel/locking/rwsem-xadd.c | 14 +- 2 files changed, 16 insertions(+), 1 deletions(-) diff --git a/Documen

[RFC PATCH-tip/locking/core v3 06/10] locking/rwsem: Move common rwsem macros to asm-generic/rwsem_types.h

2016-06-17 Thread Waiman Long
ed-off-by: Waiman Long <waiman.l...@hpe.com> --- arch/alpha/include/asm/rwsem.h|8 +--- arch/ia64/include/asm/rwsem.h |7 ++- arch/s390/include/asm/rwsem.h |7 +-- arch/x86/include/asm/rwsem.h | 19 +-- include/asm-generic/rwsem.h

Re: [RFC PATCH-tip v2 1/6] locking/osq: Make lock/unlock proper acquire/release barrier

2016-06-16 Thread Waiman Long
On 06/15/2016 10:19 PM, Boqun Feng wrote: On Wed, Jun 15, 2016 at 03:01:19PM -0400, Waiman Long wrote: On 06/15/2016 04:04 AM, Boqun Feng wrote: Hi Waiman, On Tue, Jun 14, 2016 at 06:48:04PM -0400, Waiman Long wrote: The osq_lock() and osq_unlock() function may not provide the necessary

Re: [RFC PATCH-tip v2 2/6] locking/rwsem: Stop active read lock ASAP

2016-06-16 Thread Waiman Long
On 06/15/2016 10:14 PM, Davidlohr Bueso wrote: On Wed, 15 Jun 2016, Waiman Long wrote: I think there will be a little bit of performance impact for a workload that produces just the right amount of rwsem contention. I'm not saying the change doesn't make sense, but this is the sort of thing

Re: [RFC PATCH-tip v2 5/6] locking/rwsem: Change RWSEM_WAITING_BIAS for better disambiguation

2016-06-15 Thread Waiman Long
On 06/15/2016 01:45 PM, Peter Zijlstra wrote: On Tue, Jun 14, 2016 at 06:48:08PM -0400, Waiman Long wrote: +++ b/arch/alpha/include/asm/rwsem.h @@ -17,9 +17,9 @@ #define RWSEM_UNLOCKED_VALUE 0x0000000000000000L #define RWSEM_ACTIVE_BIAS 0x0000000000000001L #define

Re: [RFC PATCH-tip v2 5/6] locking/rwsem: Change RWSEM_WAITING_BIAS for better disambiguation

2016-06-15 Thread Waiman Long
On 06/15/2016 01:43 PM, Peter Zijlstra wrote: On Tue, Jun 14, 2016 at 06:48:08PM -0400, Waiman Long wrote: even the reduced maximum of about 16k (32-bit) or 1G (64-bit) should be more than enough for the foreseeable future. So what happens if I manage to create 16k+ threads on my 32bit kernel

Re: [RFC PATCH-tip v2 4/6] locking/rwsem: move down rwsem_down_read_failed function

2016-06-15 Thread Waiman Long
On 06/15/2016 01:40 PM, Peter Zijlstra wrote: On Tue, Jun 14, 2016 at 06:48:07PM -0400, Waiman Long wrote: Move the rwsem_down_read_failed() function down to below the optimistic spinning section before enabling optimistic spinning for the readers. newline There is no change in code
