Re: [PATCH v5 1/3] locking/rwsem: Remove arch specific rwsem files
On 03/22/2019 03:30 PM, Davidlohr Bueso wrote:
> On Fri, 22 Mar 2019, Linus Torvalds wrote:
>> Some of them _might_ be performance-critical. There's the one on
>> mmap_sem in the fault handling path, for example. And yes, I'd expect
>> the normal case to very much be "no other readers or writers" for that
>> one.
>
> Yeah, the mmap_sem case in the fault path is really expecting an unlocked
> state. To the point that four archs have added branch predictions, ie:
>
>   92181f190b6 (x86: optimise x86's do_page_fault (C entry point for the
>                page fault path))
>   b15021d994f (powerpc/mm: Add a bunch of (un)likely annotations to
>                do_page_fault)
>
> And using PROFILE_ANNOTATED_BRANCHES shows pretty clearly (without
> resetting the counters):
>
>   correct    incorrect  %  Function            File     Line
>   -------    ---------  -  --------            ----     ----
>   4603685           34  0  do_user_addr_fault  fault.c  1416  (bootup)
>   382327745        449  0  do_user_addr_fault  fault.c  1416  (kernel build)
>   399446159        461  0  do_user_addr_fault  fault.c  1416  (redis benchmark)
>
> It probably wouldn't harm doing the unlikely() for all archs, or
> alternatively, add likely() to the atomic_long_try_cmpxchg_acquire in
> patch 3 and do it implicitly, but maybe that would be less flexible(?)
>
> Thanks,
> Davidlohr

I had used my lock event counting code to count the number of contended
and uncontended trylocks. I tested both bootup and kernel build. I think
I saw less than 1% of them contended; the rest were all uncontended.
That is similar to what you got. I thought I had sent the data out
previously, but I couldn't find the email. That was the main reason why
I took Linus' suggestion to optimize it for the uncontended case.

Thanks,
Longman
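For context, the likely()/unlikely() annotations under discussion are the
standard kernel wrappers around GCC's __builtin_expect(). A minimal sketch
of the definitions (a CONFIG_PROFILE_ANNOTATED_BRANCHES build replaces
these with instrumented variants that produce the counter table quoted
above):

	/*
	 * Tell the compiler which way the branch usually goes so it can
	 * make the expected path the straight-line (fall-through) code.
	 */
	#define likely(x)	__builtin_expect(!!(x), 1)
	#define unlikely(x)	__builtin_expect(!!(x), 0)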
Re: [PATCH v5 1/3] locking/rwsem: Remove arch specific rwsem files
On Fri, 22 Mar 2019, Linus Torvalds wrote:

> Some of them _might_ be performance-critical. There's the one on
> mmap_sem in the fault handling path, for example. And yes, I'd expect
> the normal case to very much be "no other readers or writers" for that
> one.

Yeah, the mmap_sem case in the fault path is really expecting an unlocked
state. To the point that four archs have added branch predictions, ie:

  92181f190b6 (x86: optimise x86's do_page_fault (C entry point for the
               page fault path))
  b15021d994f (powerpc/mm: Add a bunch of (un)likely annotations to
               do_page_fault)

And using PROFILE_ANNOTATED_BRANCHES shows pretty clearly (without
resetting the counters):

  correct    incorrect  %  Function            File     Line
  -------    ---------  -  --------            ----     ----
  4603685           34  0  do_user_addr_fault  fault.c  1416  (bootup)
  382327745        449  0  do_user_addr_fault  fault.c  1416  (kernel build)
  399446159        461  0  do_user_addr_fault  fault.c  1416  (redis benchmark)

It probably wouldn't harm doing the unlikely() for all archs, or
alternatively, add likely() to the atomic_long_try_cmpxchg_acquire in
patch 3 and do it implicitly, but maybe that would be less flexible(?)

Thanks,
Davidlohr
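For concreteness, the annotation added by the x86 commit cited above has
roughly this shape in arch/x86/mm/fault.c (a paraphrased sketch of the
pattern, not the exact upstream lines):

	/*
	 * The fault path expects mmap_sem to be uncontended, so the
	 * trylock-failure branch is marked unlikely() and the compiler
	 * lays the slow path out of line.
	 */
	if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
		if (!user_mode(regs) && !search_exception_tables(regs->ip)) {
			bad_area_nosemaphore(regs, error_code, address);
			return;
		}
		down_read(&mm->mmap_sem);
	}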
Re: [PATCH v5 3/3] locking/rwsem: Optimize down_read_trylock()
On 03/22/2019 01:25 PM, Russell King - ARM Linux admin wrote:
> On Fri, Mar 22, 2019 at 10:30:08AM -0400, Waiman Long wrote:
>> Modify __down_read_trylock() to optimize for an unlocked rwsem and make
>> it generate slightly better code.
>>
>> Before this patch, down_read_trylock:
>>
>>    0x0000 <+0>:  callq  0x5
>>    0x0005 <+5>:  jmp    0x18
>>    0x0007 <+7>:  lea    0x1(%rdx),%rcx
>>    0x000b <+11>: mov    %rdx,%rax
>>    0x000e <+14>: lock cmpxchg %rcx,(%rdi)
>>    0x0013 <+19>: cmp    %rax,%rdx
>>    0x0016 <+22>: je     0x23
>>    0x0018 <+24>: mov    (%rdi),%rdx
>>    0x001b <+27>: test   %rdx,%rdx
>>    0x001e <+30>: jns    0x7
>>    0x0020 <+32>: xor    %eax,%eax
>>    0x0022 <+34>: retq
>>    0x0023 <+35>: mov    %gs:0x0,%rax
>>    0x002c <+44>: or     $0x3,%rax
>>    0x0030 <+48>: mov    %rax,0x20(%rdi)
>>    0x0034 <+52>: mov    $0x1,%eax
>>    0x0039 <+57>: retq
>>
>> After patch, down_read_trylock:
>>
>>    0x0000 <+0>:  callq  0x5
>>    0x0005 <+5>:  xor    %eax,%eax
>>    0x0007 <+7>:  lea    0x1(%rax),%rdx
>>    0x000b <+11>: lock cmpxchg %rdx,(%rdi)
>>    0x0010 <+16>: jne    0x29
>>    0x0012 <+18>: mov    %gs:0x0,%rax
>>    0x001b <+27>: or     $0x3,%rax
>>    0x001f <+31>: mov    %rax,0x20(%rdi)
>>    0x0023 <+35>: mov    $0x1,%eax
>>    0x0028 <+40>: retq
>>    0x0029 <+41>: test   %rax,%rax
>>    0x002c <+44>: jns    0x7
>>    0x002e <+46>: xor    %eax,%eax
>>    0x0030 <+48>: retq
>>
>> By using a rwsem microbenchmark, the down_read_trylock() rate (with a
>> load of 10 to lengthen the lock critical section) on an x86-64 system
>> before and after the patch were:
>>
>>                  Before Patch    After Patch
>>   # of Threads      rlock           rlock
>>   ------------      -----           -----
>>        1            14,496          14,716
>>        2             8,644           8,453
>>        4             6,799           6,983
>>        8             5,664           7,190
>>
>> On an ARM64 system, the performance results were:
>>
>>                  Before Patch    After Patch
>>   # of Threads      rlock           rlock
>>   ------------      -----           -----
>>        1            23,676          24,488
>>        2             7,697           9,502
>>        4             4,945           3,440
>>        8             2,641           1,603
>>
>> For the uncontended case (1 thread), the new down_read_trylock() is a
>> little bit faster. For the contended cases, the new down_read_trylock()
>> performs pretty well on x86-64, but performance degrades at high
>> contention level on ARM64.
>
> So, 70% for 4 threads, 61% for 8 threads - does this trend
> continue tailing off as the number of threads (and cores)
> increases?

I didn't try higher numbers of contending threads. I won't worry too
much about contention, as trylock is a one-off event. The chance of
having more than one trylock happening simultaneously is very small.

Cheers,
Longman
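For anyone wanting to reproduce numbers like the above: the rwsem
microbenchmark is essentially N kernel threads hammering one shared
rwsem. A minimal sketch of such a worker, assuming a hypothetical test
module (the names trylock_worker and test_sem are made up here, this is
not the actual harness that produced the posted numbers):

	#include <linux/kthread.h>
	#include <linux/module.h>
	#include <linux/rwsem.h>
	#include <linux/sched.h>

	static DECLARE_RWSEM(test_sem);

	static int trylock_worker(void *arg)
	{
		unsigned long *nlocks = arg;	/* per-thread counter */
		int load;

		while (!kthread_should_stop()) {
			if (down_read_trylock(&test_sem)) {
				/* "load of 10": lengthen the critical section */
				for (load = 0; load < 10; load++)
					cpu_relax();
				up_read(&test_sem);
				(*nlocks)++;
			}
		}
		return 0;
	}

Start one such thread per test CPU with kthread_run(), let them run for
a fixed interval, then sum the per-thread counters to get a locks/s rate.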
Re: [PATCH v5 3/3] locking/rwsem: Optimize down_read_trylock()
On Fri, Mar 22, 2019 at 10:30:08AM -0400, Waiman Long wrote:
> Modify __down_read_trylock() to optimize for an unlocked rwsem and make
> it generate slightly better code.
>
> Before this patch, down_read_trylock:
>
>    0x0000 <+0>:  callq  0x5
>    0x0005 <+5>:  jmp    0x18
>    0x0007 <+7>:  lea    0x1(%rdx),%rcx
>    0x000b <+11>: mov    %rdx,%rax
>    0x000e <+14>: lock cmpxchg %rcx,(%rdi)
>    0x0013 <+19>: cmp    %rax,%rdx
>    0x0016 <+22>: je     0x23
>    0x0018 <+24>: mov    (%rdi),%rdx
>    0x001b <+27>: test   %rdx,%rdx
>    0x001e <+30>: jns    0x7
>    0x0020 <+32>: xor    %eax,%eax
>    0x0022 <+34>: retq
>    0x0023 <+35>: mov    %gs:0x0,%rax
>    0x002c <+44>: or     $0x3,%rax
>    0x0030 <+48>: mov    %rax,0x20(%rdi)
>    0x0034 <+52>: mov    $0x1,%eax
>    0x0039 <+57>: retq
>
> After patch, down_read_trylock:
>
>    0x0000 <+0>:  callq  0x5
>    0x0005 <+5>:  xor    %eax,%eax
>    0x0007 <+7>:  lea    0x1(%rax),%rdx
>    0x000b <+11>: lock cmpxchg %rdx,(%rdi)
>    0x0010 <+16>: jne    0x29
>    0x0012 <+18>: mov    %gs:0x0,%rax
>    0x001b <+27>: or     $0x3,%rax
>    0x001f <+31>: mov    %rax,0x20(%rdi)
>    0x0023 <+35>: mov    $0x1,%eax
>    0x0028 <+40>: retq
>    0x0029 <+41>: test   %rax,%rax
>    0x002c <+44>: jns    0x7
>    0x002e <+46>: xor    %eax,%eax
>    0x0030 <+48>: retq
>
> By using a rwsem microbenchmark, the down_read_trylock() rate (with a
> load of 10 to lengthen the lock critical section) on an x86-64 system
> before and after the patch were:
>
>                  Before Patch    After Patch
>   # of Threads      rlock           rlock
>   ------------      -----           -----
>        1            14,496          14,716
>        2             8,644           8,453
>        4             6,799           6,983
>        8             5,664           7,190
>
> On an ARM64 system, the performance results were:
>
>                  Before Patch    After Patch
>   # of Threads      rlock           rlock
>   ------------      -----           -----
>        1            23,676          24,488
>        2             7,697           9,502
>        4             4,945           3,440
>        8             2,641           1,603
>
> For the uncontended case (1 thread), the new down_read_trylock() is a
> little bit faster. For the contended cases, the new down_read_trylock()
> performs pretty well on x86-64, but performance degrades at high
> contention level on ARM64.

So, 70% for 4 threads, 61% for 8 threads - does this trend
continue tailing off as the number of threads (and cores)
increases?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
Re: [PATCH v5 1/3] locking/rwsem: Remove arch specific rwsem files
On 03/22/2019 01:01 PM, Linus Torvalds wrote:
> On Fri, Mar 22, 2019 at 7:30 AM Waiman Long wrote:
>> 19 files changed, 133 insertions(+), 930 deletions(-)
>
> Lovely. And it all looks sane to me.
>
> So ack.
>
> The only comment I have is about __down_read_trylock(), which probably
> isn't critical enough to actually care about, but:
>
>> +static inline int __down_read_trylock(struct rw_semaphore *sem)
>> +{
>> +	long tmp;
>> +
>> +	while ((tmp = atomic_long_read(&sem->count)) >= 0) {
>> +		if (tmp == atomic_long_cmpxchg_acquire(&sem->count, tmp,
>> +				tmp + RWSEM_ACTIVE_READ_BIAS)) {
>> +			return 1;
>> +		}
>> +	}
>> +	return 0;
>> +}
>
> So this seems to
>
> (a) read the line early (the whole cacheline in shared state issue)
>
> (b) read the line again unnecessarily in the while loop
>
> Now, (a) might be explained by "well, maybe we do trylock even with
> existing readers", although I continue to think that the case we
> should optimize for is simply the uncontended one, where we don't even
> have multiple readers.
>
> But (b) just seems silly.
>
> So I wonder if it shouldn't just be
>
>     long tmp = 0;
>
>     do {
>         long new = atomic_long_cmpxchg_acquire(&sem->count, tmp,
>                        tmp + RWSEM_ACTIVE_READ_BIAS);
>         if (likely(new == tmp))
>             return 1;
>         tmp = new;
>     } while (tmp >= 0);
>     return 0;
>
> which would seem simpler and solve both issues. Hmm?
>
> But honestly, I didn't check what our uses of down_read_trylock() look
> like. We have more of them than I expected, and I _think_ the normal
> case is the "nobody else holds the lock", but that's just a gut
> feeling.
>
> Some of them _might_ be performance-critical. There's the one on
> mmap_sem in the fault handling path, for example. And yes, I'd expect
> the normal case to very much be "no other readers or writers" for that
> one.
>
> NOTE! The above code snippet is absolutely untested, and might be
> completely wrong. Take it as a "something like this" rather than
> anything else.
>
>            Linus

As you have noticed already, this patch just moves code around without
changing it. I optimize __down_read_trylock() in patch 3.

Cheers,
Longman
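As a side note for readers: patch 3 ends up expressing Linus' loop with
atomic_long_try_cmpxchg_acquire(). The property that makes this work is
that the try_cmpxchg family writes the observed value back through its
'old' pointer on failure, so the loop never issues a separate re-read of
the cacheline. A sketch of the contract:

	long old = 0;	/* expect the rwsem to be unlocked */

	/*
	 * Compares sem->count with old; on success, stores the new
	 * value and returns true. On failure, updates 'old' with the
	 * value actually observed and returns false -- replacing the
	 * atomic_long_read() in the original loop above.
	 */
	if (atomic_long_try_cmpxchg_acquire(&sem->count, &old,
					    old + RWSEM_ACTIVE_READ_BIAS))
		return 1;	/* uncontended: one locked cmpxchg, no prior load */

This addresses both (a) and (b): the first touch of the cacheline is the
cmpxchg itself.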
Re: [PATCH v5 2/3] locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all archs
On Fri, Mar 22, 2019 at 7:30 AM Waiman Long wrote:
>
> For simplification, we are going to remove rwsem-spinlock.c and make all
> architectures use a single implementation of rwsem - rwsem-xadd.c.

Ack.

           Linus
Re: [PATCH v5 3/3] locking/rwsem: Optimize down_read_trylock()
On Fri, Mar 22, 2019 at 7:30 AM Waiman Long wrote:
>
> Modify __down_read_trylock() to optimize for an unlocked rwsem and make
> it generate slightly better code.

Oh, that should teach me to read all patches in the series before
starting to comment on them.

So ignore my comment on #1.

           Linus
[PATCH v5 1/3] locking/rwsem: Remove arch specific rwsem files
As the generic rwsem-xadd code is using the appropriate acquire and
release versions of the atomic operations, the arch specific rwsem.h
files will not be that much faster than the generic code as long as the
atomic functions are properly implemented. So we can remove those arch
specific rwsem.h files and stop building asm/rwsem.h to reduce
maintenance effort.

Currently, only x86, alpha and ia64 have implemented architecture
specific fast paths. I don't have access to alpha and ia64 systems for
testing, but they are legacy systems that are not likely to be updated
to the latest kernel anyway.

By using a rwsem microbenchmark, the total locking rates on a 4-socket
56-core 112-thread x86-64 system before and after the patch were as
follows (mixed means an equal # of read and write locks):

                    Before Patch               After Patch
   # of Threads  wlock   rlock   mixed     wlock   rlock   mixed
   ------------  -----   -----   -----     -----   -----   -----
        1        29,201  30,143  29,458    28,615  30,172  29,201
        2         6,807  13,299   1,171     7,725  15,025   1,804
        4         6,504  12,755   1,520     7,127  14,286   1,345
        8         6,762  13,412     764     6,826  13,652     726
       16         6,693  15,408     662     6,599  15,938     626
       32         6,145  15,286     496     5,549  15,487     511
       64         5,812  15,495      60     5,858  15,572      60

There were some run-to-run variations for the multi-threaded tests. For
x86-64, using the generic C code fast path seems to be a little bit
faster than the assembly version with low lock contention. Looking at
the assembly version of the fast paths, there are assembly to/from C
code wrappers that save and restore all the callee-clobbered registers
(7 registers on x86-64). The assembly generated from the generic C code
doesn't need to do that. That may explain the slight performance gain
here.

The generic asm rwsem.h can also be merged into kernel/locking/rwsem.h
with no code change, as no code other than that under kernel/locking
needs to access the internal rwsem macros and functions.

Signed-off-by: Waiman Long
---
 MAINTAINERS                     |   1 -
 arch/alpha/include/asm/rwsem.h  | 211 ---------------
 arch/arm/include/asm/Kbuild     |   1 -
 arch/arm64/include/asm/Kbuild   |   1 -
 arch/hexagon/include/asm/Kbuild |   1 -
 arch/ia64/include/asm/rwsem.h   | 172 -------------
 arch/powerpc/include/asm/Kbuild |   1 -
 arch/s390/include/asm/Kbuild    |   1 -
 arch/sh/include/asm/Kbuild      |   1 -
 arch/sparc/include/asm/Kbuild   |   1 -
 arch/x86/include/asm/rwsem.h    | 237 ------------------
 arch/x86/lib/Makefile           |   1 -
 arch/x86/lib/rwsem.S            | 156 ------------
 arch/x86/um/Makefile            |   1 -
 arch/xtensa/include/asm/Kbuild  |   1 -
 include/asm-generic/rwsem.h     | 140 -----------
 include/linux/rwsem.h           |   4 +-
 kernel/locking/percpu-rwsem.c   |   2 +
 kernel/locking/rwsem.h          | 130 +++++++++-
 19 files changed, 133 insertions(+), 930 deletions(-)
 delete mode 100644 arch/alpha/include/asm/rwsem.h
 delete mode 100644 arch/ia64/include/asm/rwsem.h
 delete mode 100644 arch/x86/include/asm/rwsem.h
 delete mode 100644 arch/x86/lib/rwsem.S
 delete mode 100644 include/asm-generic/rwsem.h

diff --git a/MAINTAINERS b/MAINTAINERS
index e17ebf70b548..6bfd5a94c08e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9089,7 +9089,6 @@ F:	arch/*/include/asm/spinlock*.h
 F:	include/linux/rwlock*.h
 F:	include/linux/mutex*.h
 F:	include/linux/rwsem*.h
-F:	arch/*/include/asm/rwsem.h
 F:	include/linux/seqlock.h
 F:	lib/locking*.[ch]
 F:	kernel/locking/
diff --git a/arch/alpha/include/asm/rwsem.h b/arch/alpha/include/asm/rwsem.h
deleted file mode 100644
index cf8fc8f9a2ed..000000000000
--- a/arch/alpha/include/asm/rwsem.h
+++ /dev/null
@@ -1,211 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ALPHA_RWSEM_H
-#define _ALPHA_RWSEM_H
-
-/*
- * Written by Ivan Kokshaysky, 2001.
- * Based on asm-alpha/semaphore.h and asm-i386/rwsem.h
- */
-
-#ifndef _LINUX_RWSEM_H
-#error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead"
-#endif
-
-#ifdef __KERNEL__
-
-#include <linux/compiler.h>
-
-#define RWSEM_UNLOCKED_VALUE		0x0000000000000000L
-#define RWSEM_ACTIVE_BIAS		0x0000000000000001L
-#define RWSEM_ACTIVE_MASK		0x00000000ffffffffL
-#define RWSEM_WAITING_BIAS		(-0x0000000100000000L)
-#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-static inline int ___down_read(struct rw_semaphore *sem)
-{
-	long oldcount;
-#ifndef	CONFIG_SMP
-	oldcount = sem->count.counter;
-	sem->count.counter += RWSEM_ACTIVE_READ_BIAS;
-#else
-	long temp;
-	__asm__ __volatile__(
-	"1:	ldq_l
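For reference, the per-arch assembly being deleted here is replaced by
plain C over the acquire/release atomics. A rough sketch of the reader
fast paths as they stand in the generic code at this point in the series
(see include/asm-generic/rwsem.h / kernel/locking/rwsem.h for the
authoritative versions):

	/*
	 * Reader fast path: speculatively add the read bias; only take
	 * the slow path when the count indicates writer activity.
	 */
	static inline void __down_read(struct rw_semaphore *sem)
	{
		if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0))
			rwsem_down_read_failed(sem);
	}

	static inline void __up_read(struct rw_semaphore *sem)
	{
		long tmp;

		tmp = atomic_long_dec_return_release(&sem->count);
		if (unlikely(tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0))
			rwsem_wake(sem);
	}

Because these compile as ordinary C, the compiler tracks callee-clobbered
registers itself, which is exactly the asm-wrapper overhead the commit
message above measures.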
[PATCH v5 3/3] locking/rwsem: Optimize down_read_trylock()
Modify __down_read_trylock() to optimize for an unlocked rwsem and make
it generate slightly better code.

Before this patch, down_read_trylock:

   0x0000 <+0>:  callq  0x5
   0x0005 <+5>:  jmp    0x18
   0x0007 <+7>:  lea    0x1(%rdx),%rcx
   0x000b <+11>: mov    %rdx,%rax
   0x000e <+14>: lock cmpxchg %rcx,(%rdi)
   0x0013 <+19>: cmp    %rax,%rdx
   0x0016 <+22>: je     0x23
   0x0018 <+24>: mov    (%rdi),%rdx
   0x001b <+27>: test   %rdx,%rdx
   0x001e <+30>: jns    0x7
   0x0020 <+32>: xor    %eax,%eax
   0x0022 <+34>: retq
   0x0023 <+35>: mov    %gs:0x0,%rax
   0x002c <+44>: or     $0x3,%rax
   0x0030 <+48>: mov    %rax,0x20(%rdi)
   0x0034 <+52>: mov    $0x1,%eax
   0x0039 <+57>: retq

After patch, down_read_trylock:

   0x0000 <+0>:  callq  0x5
   0x0005 <+5>:  xor    %eax,%eax
   0x0007 <+7>:  lea    0x1(%rax),%rdx
   0x000b <+11>: lock cmpxchg %rdx,(%rdi)
   0x0010 <+16>: jne    0x29
   0x0012 <+18>: mov    %gs:0x0,%rax
   0x001b <+27>: or     $0x3,%rax
   0x001f <+31>: mov    %rax,0x20(%rdi)
   0x0023 <+35>: mov    $0x1,%eax
   0x0028 <+40>: retq
   0x0029 <+41>: test   %rax,%rax
   0x002c <+44>: jns    0x7
   0x002e <+46>: xor    %eax,%eax
   0x0030 <+48>: retq

By using a rwsem microbenchmark, the down_read_trylock() rate (with a
load of 10 to lengthen the lock critical section) on an x86-64 system
before and after the patch were:

                 Before Patch    After Patch
   # of Threads     rlock           rlock
   ------------     -----           -----
        1           14,496          14,716
        2            8,644           8,453
        4            6,799           6,983
        8            5,664           7,190

On an ARM64 system, the performance results were:

                 Before Patch    After Patch
   # of Threads     rlock           rlock
   ------------     -----           -----
        1           23,676          24,488
        2            7,697           9,502
        4            4,945           3,440
        8            2,641           1,603

For the uncontended case (1 thread), the new down_read_trylock() is a
little bit faster. For the contended cases, the new down_read_trylock()
performs pretty well on x86-64, but performance degrades at high
contention level on ARM64.

Suggested-by: Linus Torvalds
Signed-off-by: Waiman Long
---
 kernel/locking/rwsem.h | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index 45ee00236e03..1f5775aa6a1d 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -174,14 +174,17 @@ static inline int __down_read_killable(struct rw_semaphore *sem)
 
 static inline int __down_read_trylock(struct rw_semaphore *sem)
 {
-	long tmp;
+	/*
+	 * Optimize for the case when the rwsem is not locked at all.
+	 */
+	long tmp = RWSEM_UNLOCKED_VALUE;
 
-	while ((tmp = atomic_long_read(&sem->count)) >= 0) {
-		if (tmp == atomic_long_cmpxchg_acquire(&sem->count, tmp,
-				tmp + RWSEM_ACTIVE_READ_BIAS)) {
+	do {
+		if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
+					tmp + RWSEM_ACTIVE_READ_BIAS)) {
 			return 1;
 		}
-	}
+	} while (tmp >= 0);
 	return 0;
 }
-- 
2.18.1
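To spell out how the new loop behaves (a summary, assuming
RWSEM_UNLOCKED_VALUE == 0 and a negative count meaning writer activity,
per the bias scheme above):

   count == 0 (uncontended):  tmp starts at 0, the single lock cmpxchg
       succeeds and we return 1 -- the short path in the "After"
       disassembly, with no load of sem->count before the cmpxchg.

   count > 0 (other readers): the first try_cmpxchg fails but leaves the
       observed reader count in tmp, so the retry can succeed without a
       separate re-read of the cacheline.

   count < 0 (writer active or waiting): the failed try_cmpxchg leaves a
       negative value in tmp, the while condition fails, and we return 0.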
[PATCH v5 2/3] locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all archs
Currently, we have two different implementations of rwsem:

 1) CONFIG_RWSEM_GENERIC_SPINLOCK (rwsem-spinlock.c)
 2) CONFIG_RWSEM_XCHGADD_ALGORITHM (rwsem-xadd.c)

As we are going to use a single generic implementation for rwsem-xadd.c
and no architecture-specific code will be needed, there is no point in
keeping two different implementations of rwsem. In most cases, the
performance of rwsem-spinlock.c will be worse. It also doesn't get all
the performance tuning and optimizations that had been implemented in
rwsem-xadd.c over the years.

For simplification, we are going to remove rwsem-spinlock.c and make all
architectures use a single implementation of rwsem - rwsem-xadd.c.

All references to RWSEM_GENERIC_SPINLOCK and RWSEM_XCHGADD_ALGORITHM in
the code are removed.

Suggested-by: Peter Zijlstra
Signed-off-by: Waiman Long
---
 arch/alpha/Kconfig              |   7 -
 arch/arc/Kconfig                |   3 -
 arch/arm/Kconfig                |   4 -
 arch/arm64/Kconfig              |   3 -
 arch/c6x/Kconfig                |   3 -
 arch/csky/Kconfig               |   3 -
 arch/h8300/Kconfig              |   3 -
 arch/hexagon/Kconfig            |   6 -
 arch/ia64/Kconfig               |   4 -
 arch/m68k/Kconfig               |   7 -
 arch/microblaze/Kconfig         |   6 -
 arch/mips/Kconfig               |   7 -
 arch/nds32/Kconfig              |   3 -
 arch/nios2/Kconfig              |   3 -
 arch/openrisc/Kconfig           |   6 -
 arch/parisc/Kconfig             |   6 -
 arch/powerpc/Kconfig            |   7 -
 arch/riscv/Kconfig              |   3 -
 arch/s390/Kconfig               |   6 -
 arch/sh/Kconfig                 |   6 -
 arch/sparc/Kconfig              |   8 -
 arch/unicore32/Kconfig          |   6 -
 arch/x86/Kconfig                |   3 -
 arch/x86/um/Kconfig             |   6 -
 arch/xtensa/Kconfig             |   3 -
 include/linux/rwsem-spinlock.h  |  47 -----
 include/linux/rwsem.h           |   5 -
 kernel/Kconfig.locks            |   2 +-
 kernel/locking/Makefile         |   4 +-
 kernel/locking/rwsem-spinlock.c | 339 --------------------------------
 kernel/locking/rwsem.h          |   3 -
 31 files changed, 2 insertions(+), 520 deletions(-)
 delete mode 100644 include/linux/rwsem-spinlock.h
 delete mode 100644 kernel/locking/rwsem-spinlock.c

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 584a6e114853..27c871227eee 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -49,13 +49,6 @@ config MMU
 	bool
 	default y
 
-config RWSEM_GENERIC_SPINLOCK
-	bool
-
-config RWSEM_XCHGADD_ALGORITHM
-	bool
-	default y
-
 config ARCH_HAS_ILOG2_U32
 	bool
 	default n
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index c781e45d1d99..23e063df5d2c 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -63,9 +63,6 @@ config SCHED_OMIT_FRAME_POINTER
 config GENERIC_CSUM
 	def_bool y
 
-config RWSEM_GENERIC_SPINLOCK
-	def_bool y
-
 config ARCH_DISCONTIGMEM_ENABLE
 	def_bool n
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 054ead960f98..c11c61093c6c 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -178,10 +178,6 @@ config TRACE_IRQFLAGS_SUPPORT
 	bool
 	default !CPU_V7M
 
-config RWSEM_XCHGADD_ALGORITHM
-	bool
-	default y
-
 config ARCH_HAS_ILOG2_U32
 	bool
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7e34b9eba5de..c62b9db2b5e8 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -237,9 +237,6 @@ config LOCKDEP_SUPPORT
 config TRACE_IRQFLAGS_SUPPORT
 	def_bool y
 
-config RWSEM_XCHGADD_ALGORITHM
-	def_bool y
-
 config GENERIC_BUG
 	def_bool y
 	depends on BUG
diff --git a/arch/c6x/Kconfig b/arch/c6x/Kconfig
index e5cd3c5f8399..ed92b5840c0a 100644
--- a/arch/c6x/Kconfig
+++ b/arch/c6x/Kconfig
@@ -27,9 +27,6 @@ config MMU
 config FPU
 	def_bool n
 
-config RWSEM_GENERIC_SPINLOCK
-	def_bool y
-
 config GENERIC_CALIBRATE_DELAY
 	def_bool y
diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
index 725a115759c9..6555d1781132 100644
--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -92,9 +92,6 @@ config GENERIC_HWEIGHT
 config MMU
 	def_bool y
 
-config RWSEM_GENERIC_SPINLOCK
-	def_bool y
-
 config STACKTRACE_SUPPORT
 	def_bool y
diff --git a/arch/h8300/Kconfig b/arch/h8300/Kconfig
index c071da34e081..61c01db6c292 100644
--- a/arch/h8300/Kconfig
+++ b/arch/h8300/Kconfig
@@ -27,9 +27,6 @@ config H8300
 config CPU_BIG_ENDIAN
 	def_bool y
 
-config RWSEM_GENERIC_SPINLOCK
-	def_bool y
-
 config GENERIC_HWEIGHT
 	def_bool y
diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig
index ac441680dcc0..3e54a53208d5 100644
--- a/arch/hexagon/Kconfig
+++ b/arch/hexagon/Kconfig
@@ -65,12 +65,6 @@ config GENERIC_CSUM
 config GENERIC_IRQ_PROBE
 	def_bool y
 
-config RWSEM_GENERIC_SPINLOCK
-	def_bool n
-
-config RWSEM_XCHGADD_ALGORITHM
-	def_bool y
-
 config GENERIC_HWEIGHT
[PATCH v5 0/3] locking/rwsem: Rwsem rearchitecture part 0
v5:
 - Rebase to the latest v5.1 tree and fix conflicts in
   arch/{xtensa,s390}/include/asm/Kbuild.

v4:
 - Remove rwsem-spinlock.c and make all archs use rwsem-xadd.c.

v3:
 - Optimize __down_read_trylock() for the uncontended case as suggested
   by Linus.

v2:
 - Add patch 2 to optimize __down_read_trylock() as suggested by PeterZ.
 - Update performance test data in patch 1.

The goal of this patchset is to remove the architecture specific files
for rwsem-xadd to make it easier to add enhancements in the later rwsem
patches. It also removes the legacy rwsem-spinlock.c file and makes all
the architectures use one single implementation of rwsem - rwsem-xadd.c.

Waiman Long (3):
  locking/rwsem: Remove arch specific rwsem files
  locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all
    archs
  locking/rwsem: Optimize down_read_trylock()

 MAINTAINERS                     |   1 -
 arch/alpha/Kconfig              |   7 -
 arch/alpha/include/asm/rwsem.h  | 211 -----------
 arch/arc/Kconfig                |   3 -
 arch/arm/Kconfig                |   4 -
 arch/arm/include/asm/Kbuild     |   1 -
 arch/arm64/Kconfig              |   3 -
 arch/arm64/include/asm/Kbuild   |   1 -
 arch/c6x/Kconfig                |   3 -
 arch/csky/Kconfig               |   3 -
 arch/h8300/Kconfig              |   3 -
 arch/hexagon/Kconfig            |   6 -
 arch/hexagon/include/asm/Kbuild |   1 -
 arch/ia64/Kconfig               |   4 -
 arch/ia64/include/asm/rwsem.h   | 172 ---------
 arch/m68k/Kconfig               |   7 -
 arch/microblaze/Kconfig         |   6 -
 arch/mips/Kconfig               |   7 -
 arch/nds32/Kconfig              |   3 -
 arch/nios2/Kconfig              |   3 -
 arch/openrisc/Kconfig           |   6 -
 arch/parisc/Kconfig             |   6 -
 arch/powerpc/Kconfig            |   7 -
 arch/powerpc/include/asm/Kbuild |   1 -
 arch/riscv/Kconfig              |   3 -
 arch/s390/Kconfig               |   6 -
 arch/s390/include/asm/Kbuild    |   1 -
 arch/sh/Kconfig                 |   6 -
 arch/sh/include/asm/Kbuild      |   1 -
 arch/sparc/Kconfig              |   8 -
 arch/sparc/include/asm/Kbuild   |   1 -
 arch/unicore32/Kconfig          |   6 -
 arch/x86/Kconfig                |   3 -
 arch/x86/include/asm/rwsem.h    | 237 -------------
 arch/x86/lib/Makefile           |   1 -
 arch/x86/lib/rwsem.S            | 156 --------
 arch/x86/um/Kconfig             |   6 -
 arch/x86/um/Makefile            |   1 -
 arch/xtensa/Kconfig             |   3 -
 arch/xtensa/include/asm/Kbuild  |   1 -
 include/asm-generic/rwsem.h     | 140 --------
 include/linux/rwsem-spinlock.h  |  47 ----
 include/linux/rwsem.h           |   9 +-
 kernel/Kconfig.locks            |   2 +-
 kernel/locking/Makefile         |   4 +-
 kernel/locking/percpu-rwsem.c   |   2 +
 kernel/locking/rwsem-spinlock.c | 339 -----------
 kernel/locking/rwsem.h          | 130 ++++++-
 48 files changed, 135 insertions(+), 1447 deletions(-)
 delete mode 100644 arch/alpha/include/asm/rwsem.h
 delete mode 100644 arch/ia64/include/asm/rwsem.h
 delete mode 100644 arch/x86/include/asm/rwsem.h
 delete mode 100644 arch/x86/lib/rwsem.S
 delete mode 100644 include/asm-generic/rwsem.h
 delete mode 100644 include/linux/rwsem-spinlock.h
 delete mode 100644 kernel/locking/rwsem-spinlock.c

-- 
2.18.1