Re: [PATCH] locking/rwsem: Remove arch specific rwsem files
On Mon, Feb 11, 2019 at 11:35:24AM -0500, Waiman Long wrote:
> On 02/11/2019 06:58 AM, Peter Zijlstra wrote:
> > Which is clearly worse. Now we can write that as:
> >
> > int __down_read_trylock2(unsigned long *l)
> > {
> > 	long tmp = READ_ONCE(*l);
> >
> > 	while (tmp >= 0) {
> > 		if (try_cmpxchg(l, &tmp, tmp + 1))
> > 			return 1;
> > 	}
> >
> > 	return 0;
> > }
> >
> > which generates:
> >
> > 0000000000000030 <__down_read_trylock2>:
> >   30:	48 8b 07             	mov    (%rdi),%rax
> >   33:	48 85 c0             	test   %rax,%rax
> >   36:	78 18                	js     50 <__down_read_trylock2+0x20>
> >   38:	48 8d 50 01          	lea    0x1(%rax),%rdx
> >   3c:	f0 48 0f b1 17       	lock cmpxchg %rdx,(%rdi)
> >   41:	75 f0                	jne    33 <__down_read_trylock2+0x3>
> >   43:	b8 01 00 00 00       	mov    $0x1,%eax
> >   48:	c3                   	retq
> >   49:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
> >   50:	31 c0                	xor    %eax,%eax
> >   52:	c3                   	retq
> >
> > Which is a lot better; but not quite there yet.
> >
> > I've tried quite a bit, but I can't seem to get GCC to generate the:
> >
> > 	add $1,%rdx
> > 	jle
> >
> > required; stuff like:
> >
> > 	new = old + 1;
> > 	if (new <= 0)
> >
> > generates:
> >
> > 	lea 0x1(%rax),%rdx
> > 	test %rdx,%rdx
> > 	jle
>
> Thanks for the suggested code snippet. So you want to replace "lea
> 0x1(%rax),%rdx" by "add $1,%rdx"?
>
> I think the compiler is doing that so as to use the address generation
> unit for the addition instead of using the ALU. That leaves the ALU
> available for doing other arithmetic operations in parallel. I don't
> think it is a good idea to override the compiler and force it to use
> the ALU, so I am not going to try doing that. It is only 1 or 2 extra
> instructions anyway.

Yeah, I was trying to see what I could make it do.. #2 really should be
good enough, but you know how it is once you're poking at it :-)
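The __down_read_trylock2() loop above can be reproduced in plain userspace C to study the codegen. The following sketch is my own, using C11 atomics: atomic_compare_exchange_strong writes the observed value back into `tmp` on failure, exactly like the kernel's try_cmpxchg(), so the retry loop needs no explicit re-read.

```c
#include <stdatomic.h>

/* Userspace sketch of the try_cmpxchg-based read trylock discussed
 * above. On CAS failure, C11 updates 'tmp' with the current counter
 * value, mirroring the kernel's try_cmpxchg() semantics. */
static int down_read_trylock_sketch(atomic_long *count)
{
	long tmp = atomic_load_explicit(count, memory_order_relaxed);

	while (tmp >= 0) {
		if (atomic_compare_exchange_strong_explicit(count, &tmp,
				tmp + 1,
				memory_order_acquire, memory_order_relaxed))
			return 1;	/* reader count bumped */
	}
	return 0;	/* negative count: writer active or waiters queued */
}
```

Compiling this with gcc -O2 on x86-64 should produce a lock cmpxchg loop of the same shape as the listing above (the function and type names here are stand-ins, not kernel API).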
Re: [PATCH] locking/rwsem: Remove arch specific rwsem files
On 02/11/2019 06:58 AM, Peter Zijlstra wrote:
> Which is clearly worse. Now we can write that as:
>
> int __down_read_trylock2(unsigned long *l)
> {
> 	long tmp = READ_ONCE(*l);
>
> 	while (tmp >= 0) {
> 		if (try_cmpxchg(l, &tmp, tmp + 1))
> 			return 1;
> 	}
>
> 	return 0;
> }
>
> which generates:
>
> 0000000000000030 <__down_read_trylock2>:
>   30:	48 8b 07             	mov    (%rdi),%rax
>   33:	48 85 c0             	test   %rax,%rax
>   36:	78 18                	js     50 <__down_read_trylock2+0x20>
>   38:	48 8d 50 01          	lea    0x1(%rax),%rdx
>   3c:	f0 48 0f b1 17       	lock cmpxchg %rdx,(%rdi)
>   41:	75 f0                	jne    33 <__down_read_trylock2+0x3>
>   43:	b8 01 00 00 00       	mov    $0x1,%eax
>   48:	c3                   	retq
>   49:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
>   50:	31 c0                	xor    %eax,%eax
>   52:	c3                   	retq
>
> Which is a lot better; but not quite there yet.
>
> I've tried quite a bit, but I can't seem to get GCC to generate the:
>
> 	add $1,%rdx
> 	jle
>
> required; stuff like:
>
> 	new = old + 1;
> 	if (new <= 0)
>
> generates:
>
> 	lea 0x1(%rax),%rdx
> 	test %rdx,%rdx
> 	jle

Thanks for the suggested code snippet. So you want to replace "lea
0x1(%rax),%rdx" by "add $1,%rdx"?

I think the compiler is doing that so as to use the address generation
unit for the addition instead of using the ALU. That leaves the ALU
available for doing other arithmetic operations in parallel. I don't
think it is a good idea to override the compiler and force it to use
the ALU, so I am not going to try doing that. It is only 1 or 2 extra
instructions anyway.

Cheers,
Longman
Re: [PATCH] locking/rwsem: Remove arch specific rwsem files
On 02/11/2019 05:39 AM, Ingo Molnar wrote:
> * Ingo Molnar wrote:
>
>> Sounds good to me - I've merged this patch, will push it out after
>> testing.
>
> Based on Peter's feedback I'm delaying this - performance testing on at
> least one key ll/sc arch would be nice indeed.
>
> Thanks,
>
> 	Ingo

Yes, I will tweak the generic code to generate better code. As I said
in the commit log, only x86, ia64 and alpha provide assembly code to
replace the generic C code. The ll/sc archs that I have access to
(ARM64, ppc) are all using the generic C code anyway.

I actually had done some performance measurements on both those
platforms and didn't see any performance difference. I didn't include
the numbers as those archs were using the generic code before. I will
rerun the tests after I tweak the generic C code.

Thanks,
Longman
Re: [PATCH] locking/rwsem: Remove arch specific rwsem files
On Sun, Feb 10, 2019 at 09:00:50PM -0500, Waiman Long wrote:
> +static inline int __down_read_trylock(struct rw_semaphore *sem)
> +{
> +	long tmp;
> +
> +	while ((tmp = atomic_long_read(&sem->count)) >= 0) {
> +		if (tmp == atomic_long_cmpxchg_acquire(&sem->count, tmp,
> +				tmp + RWSEM_ACTIVE_READ_BIAS)) {
> +			return 1;
> +		}
> +	}
> +	return 0;
> +}

So the original x86 implementation reads:

static inline bool __down_read_trylock(struct rw_semaphore *sem)
{
	long result, tmp;

	asm volatile("# beginning __down_read_trylock\n\t"
		     "  mov          %[count],%[result]\n\t"
		     "1:\n\t"
		     "  mov          %[result],%[tmp]\n\t"
		     "  add          %[inc],%[tmp]\n\t"
		     "  jle          2f\n\t"
		     LOCK_PREFIX "  cmpxchg  %[tmp],%[count]\n\t"
		     "  jnz          1b\n\t"
		     "2:\n\t"
		     "# ending __down_read_trylock\n\t"
		     : [count] "+m" (sem->count), [result] "=&a" (result),
		       [tmp] "=&r" (tmp)
		     : [inc] "i" (RWSEM_ACTIVE_READ_BIAS)
		     : "memory", "cc");
	return result >= 0;
}

you replace that with:

int __down_read_trylock1(unsigned long *l)
{
	long tmp;

	while ((tmp = READ_ONCE(*l)) >= 0) {
		if (tmp == cmpxchg(l, tmp, tmp + 1))
			return 1;
	}
	return 0;
}

which generates:

0000000000000000 <__down_read_trylock1>:
   0:	eb 17                	jmp    19 <__down_read_trylock1+0x19>
   2:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
   8:	48 8d 4a 01          	lea    0x1(%rdx),%rcx
   c:	48 89 d0             	mov    %rdx,%rax
   f:	f0 48 0f b1 0f       	lock cmpxchg %rcx,(%rdi)
  14:	48 39 c2             	cmp    %rax,%rdx
  17:	74 0f                	je     28 <__down_read_trylock1+0x28>
  19:	48 8b 17             	mov    (%rdi),%rdx
  1c:	48 85 d2             	test   %rdx,%rdx
  1f:	79 e7                	jns    8 <__down_read_trylock1+0x8>
  21:	31 c0                	xor    %eax,%eax
  23:	c3                   	retq
  24:	0f 1f 40 00          	nopl   0x0(%rax)
  28:	b8 01 00 00 00       	mov    $0x1,%eax
  2d:	c3                   	retq

Which is clearly worse.
Now we can write that as:

int __down_read_trylock2(unsigned long *l)
{
	long tmp = READ_ONCE(*l);

	while (tmp >= 0) {
		if (try_cmpxchg(l, &tmp, tmp + 1))
			return 1;
	}
	return 0;
}

which generates:

0000000000000030 <__down_read_trylock2>:
  30:	48 8b 07             	mov    (%rdi),%rax
  33:	48 85 c0             	test   %rax,%rax
  36:	78 18                	js     50 <__down_read_trylock2+0x20>
  38:	48 8d 50 01          	lea    0x1(%rax),%rdx
  3c:	f0 48 0f b1 17       	lock cmpxchg %rdx,(%rdi)
  41:	75 f0                	jne    33 <__down_read_trylock2+0x3>
  43:	b8 01 00 00 00       	mov    $0x1,%eax
  48:	c3                   	retq
  49:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  50:	31 c0                	xor    %eax,%eax
  52:	c3                   	retq

Which is a lot better; but not quite there yet.

I've tried quite a bit, but I can't seem to get GCC to generate the:

	add $1,%rdx
	jle

required; stuff like:

	new = old + 1;
	if (new <= 0)

generates:

	lea 0x1(%rax),%rdx
	test %rdx,%rdx
	jle

Ah well, have fun :-)

typedef unsigned char u8;
typedef unsigned short u16;
typedef unsigned int u32;
typedef unsigned long long u64;

typedef signed char s8;
typedef signed short s16;
typedef signed int s32;
typedef signed long long s64;

typedef _Bool bool;

# define CC_SET(c) "\n\t/* output condition code " #c "*/\n"
# define CC_OUT(c) "=@cc" #c

#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

extern void __cmpxchg_wrong_size(void);

#define __raw_cmpxchg(ptr, old, new, size, lock)		\
({								\
	__typeof__(*(ptr)) __ret;				\
	__typeof__(*(ptr)) __old = (old);			\
	__typeof__(*(ptr)) __new = (new);			\
	switch (size) {						\
	case 1:							\
	{							\
		volatile u8 *__ptr = (volatile u8 *)(ptr);	\
		asm volatile(lock "cmpxchgb %2,%1"		\
			     : "=a" (__ret), "+m" (*__ptr)	\
			     : "q" (__new), "0" (__old)		\
			     : "memory");			\
		break;						\
	}							\
	case 2:							\
	{							\
		volatile u16 *__ptr = (volatile u16 *)(ptr);	\
		asm volatile(lock "cmpxchgw %2,%1"		\
			     : "=a" (__ret), "+m" (*__ptr)	\
			     : "r" (__new), "0" (__old)		\
			     : "memory");			\
		break;						\
	}							\
	case 4:							\
	{							\
		volatile u32 *__ptr = (volatile u32 *)(ptr);	\
		asm volatile(lock
Re: [PATCH] locking/rwsem: Remove arch specific rwsem files
On Mon, Feb 11, 2019 at 10:40:44AM +0100, Peter Zijlstra wrote:
> On Mon, Feb 11, 2019 at 10:36:01AM +0100, Peter Zijlstra wrote:
> > On Sun, Feb 10, 2019 at 09:00:50PM -0500, Waiman Long wrote:
> > > +static inline int __down_read_trylock(struct rw_semaphore *sem)
> > > +{
> > > +	long tmp;
> > > +
> > > +	while ((tmp = atomic_long_read(&sem->count)) >= 0) {
> > > +		if (tmp == atomic_long_cmpxchg_acquire(&sem->count, tmp,
> > > +				tmp + RWSEM_ACTIVE_READ_BIAS)) {
> > > +			return 1;
> >
> > That really wants to be:
> >
> > 	if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
> > 					    tmp + RWSEM_ACTIVE_READ_BIAS))
> >
> > > +		}
> > > +	}
> > > +	return 0;
> > > +}
>
> Also, this is the one case where LL/SC can actually do 'better'. Do you
> have benchmarks for say PowerPC or ARM64 ?

Ah, I see they already used asm-generic/rwsem.h which has similar code
to the above.
Re: [PATCH] locking/rwsem: Remove arch specific rwsem files
* Will Deacon wrote:

> On Mon, Feb 11, 2019 at 11:39:27AM +0100, Ingo Molnar wrote:
> >
> > * Ingo Molnar wrote:
> >
> > > Sounds good to me - I've merged this patch, will push it out after
> > > testing.
> >
> > Based on Peter's feedback I'm delaying this - performance testing on at
> > least one key ll/sc arch would be nice indeed.
>
> Once Waiman has posted a new version, I can take it for a spin on some
> arm64 boxen if he shares his workload.

Cool, thanks!

	Ingo
Re: [PATCH] locking/rwsem: Remove arch specific rwsem files
On Mon, Feb 11, 2019 at 11:39:27AM +0100, Ingo Molnar wrote:
>
> * Ingo Molnar wrote:
>
> > Sounds good to me - I've merged this patch, will push it out after
> > testing.
>
> Based on Peter's feedback I'm delaying this - performance testing on at
> least one key ll/sc arch would be nice indeed.

Once Waiman has posted a new version, I can take it for a spin on some
arm64 boxen if he shares his workload.

Will
Re: [PATCH] locking/rwsem: Remove arch specific rwsem files
* Ingo Molnar wrote:

> Sounds good to me - I've merged this patch, will push it out after
> testing.

Based on Peter's feedback I'm delaying this - performance testing on at
least one key ll/sc arch would be nice indeed.

Thanks,

	Ingo
Re: [PATCH] locking/rwsem: Remove arch specific rwsem files
On Mon, Feb 11, 2019 at 10:36:01AM +0100, Peter Zijlstra wrote:
> On Sun, Feb 10, 2019 at 09:00:50PM -0500, Waiman Long wrote:
> > +static inline int __down_read_trylock(struct rw_semaphore *sem)
> > +{
> > +	long tmp;
> > +
> > +	while ((tmp = atomic_long_read(&sem->count)) >= 0) {
> > +		if (tmp == atomic_long_cmpxchg_acquire(&sem->count, tmp,
> > +				tmp + RWSEM_ACTIVE_READ_BIAS)) {
> > +			return 1;
>
> That really wants to be:
>
> 	if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
> 					    tmp + RWSEM_ACTIVE_READ_BIAS))
>
> > +		}
> > +	}
> > +	return 0;
> > +}

Also, this is the one case where LL/SC can actually do 'better'. Do you
have benchmarks for say PowerPC or ARM64 ?
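Some background on the LL/SC remark above: on arm64 or PowerPC, cmpxchg is itself a load-exclusive/store-exclusive loop, so a C-level cmpxchg retry loop nests two loops, while an unconditional fetch-add compiles to a single LL/SC sequence. A userspace sketch of that xadd-style alternative follows; this is my own illustration in C11 atomics, and note the real kernel __down_read() fast path punts to a slow path instead of backing the bias out.

```c
#include <stdatomic.h>

/* xadd-style read trylock: speculatively add the reader bias with one
 * atomic RMW (a single LL/SC sequence on arm64/ppc), then undo it if
 * the count turned out to be write-locked. Illustration only: the
 * kernel falls into a slow path rather than backing out like this. */
static int down_read_trylock_xadd(atomic_long *count)
{
	if (atomic_fetch_add_explicit(count, 1, memory_order_acquire) >= 0)
		return 1;	/* old value non-negative: read lock taken */

	/* writer held or pending: undo our speculative bias */
	atomic_fetch_sub_explicit(count, 1, memory_order_relaxed);
	return 0;
}
```

The trade-off is that the speculative bias is transiently visible to other CPUs, which is why the kernel reserves this pattern for the non-trylock __down_read() where a slow path cleans up.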
Re: [PATCH] locking/rwsem: Remove arch specific rwsem files
On Sun, Feb 10, 2019 at 09:00:50PM -0500, Waiman Long wrote:
> diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
> index bad2bca..067e265 100644
> --- a/kernel/locking/rwsem.h
> +++ b/kernel/locking/rwsem.h
> @@ -32,6 +32,26 @@
>  # define DEBUG_RWSEMS_WARN_ON(c)
>  #endif
>
> +/*
> + * R/W semaphores originally for PPC using the stuff in lib/rwsem.c.
> + * Adapted largely from include/asm-i386/rwsem.h
> + * by Paul Mackerras.
> + */
> +
> +/*
> + * the semaphore definition
> + */
> +#ifdef CONFIG_64BIT
> +# define RWSEM_ACTIVE_MASK		0xffffffffL
> +#else
> +# define RWSEM_ACTIVE_MASK		0x0000ffffL
> +#endif
> +
> +#define RWSEM_ACTIVE_BIAS		0x00000001L
> +#define RWSEM_WAITING_BIAS		(-RWSEM_ACTIVE_MASK-1)
> +#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
> +#define RWSEM_ACTIVE_WRITE_BIAS	(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
> +
>  #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
>  /*
>   * All writes to owner are protected by WRITE_ONCE() to make sure that
> @@ -132,3 +152,113 @@ static inline void rwsem_clear_reader_owned(struct rw_semaphore *sem)
>  {
>  }
>  #endif
> +
> +#ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM
> +/*
> + * lock for reading
> + */
> +static inline void __down_read(struct rw_semaphore *sem)
> +{
> +	if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0))
> +		rwsem_down_read_failed(sem);
> +}
> +
> +static inline int __down_read_killable(struct rw_semaphore *sem)
> +{
> +	if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0)) {
> +		if (IS_ERR(rwsem_down_read_failed_killable(sem)))
> +			return -EINTR;
> +	}
> +
> +	return 0;
> +}
> +
> +static inline int __down_read_trylock(struct rw_semaphore *sem)
> +{
> +	long tmp;
> +
> +	while ((tmp = atomic_long_read(&sem->count)) >= 0) {
> +		if (tmp == atomic_long_cmpxchg_acquire(&sem->count, tmp,
> +				tmp + RWSEM_ACTIVE_READ_BIAS)) {
> +			return 1;

That really wants to be:

	if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
					    tmp + RWSEM_ACTIVE_READ_BIAS))

> +		}
> +	}
> +	return 0;
> +}
> +
> +/*
> + * lock for writing
> + */
> +static inline void __down_write(struct rw_semaphore *sem)
> +{
> +	long tmp;
> +
> +	tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS,
> +					     &sem->count);
> +	if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
> +		rwsem_down_write_failed(sem);
> +}
> +
> +static inline int __down_write_killable(struct rw_semaphore *sem)
> +{
> +	long tmp;
> +
> +	tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS,
> +					     &sem->count);
> +	if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
> +		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
> +			return -EINTR;
> +	return 0;
> +}
> +
> +static inline int __down_write_trylock(struct rw_semaphore *sem)
> +{
> +	long tmp;

	long tmp = RWSEM_UNLOCKED_VALUE;

> +
> +	tmp = atomic_long_cmpxchg_acquire(&sem->count, RWSEM_UNLOCKED_VALUE,
> +					  RWSEM_ACTIVE_WRITE_BIAS);
> +	return tmp == RWSEM_UNLOCKED_VALUE;

	return atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
					       RWSEM_ACTIVE_WRITE_BIAS);

> +}
> +
> +/*
> + * unlock after reading
> + */
> +static inline void __up_read(struct rw_semaphore *sem)
> +{
> +	long tmp;
> +
> +	tmp = atomic_long_dec_return_release(&sem->count);
> +	if (unlikely(tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0))
> +		rwsem_wake(sem);
> +}
> +
> +/*
> + * unlock after writing
> + */
> +static inline void __up_write(struct rw_semaphore *sem)
> +{
> +	if (unlikely(atomic_long_sub_return_release(RWSEM_ACTIVE_WRITE_BIAS,
> +						    &sem->count) < 0))
> +		rwsem_wake(sem);
> +}
> +
> +/*
> + * downgrade write lock to read lock
> + */
> +static inline void __downgrade_write(struct rw_semaphore *sem)
> +{
> +	long tmp;
> +
> +	/*
> +	 * When downgrading from exclusive to shared ownership,
> +	 * anything inside the write-locked region cannot leak
> +	 * into the read side. In contrast, anything in the
> +	 * read-locked region is ok to be re-ordered into the
> +	 * write side. As such, rely on RELEASE semantics.
> +	 */
> +	tmp = atomic_long_add_return_release(-RWSEM_WAITING_BIAS, &sem->count);
> +	if (tmp < 0)
> +		rwsem_downgrade_wake(sem);
> +}
> +
> +#endif /* CONFIG_RWSEM_XCHGADD_ALGORITHM */
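The try_cmpxchg-style __down_write_trylock suggested above can be sketched in userspace C11 as well. This is my own stand-in (function name, atomic type, and the 64-bit bias values taken from the hunk above are assumptions, not kernel API); a write trylock is a single compare-and-swap from "unlocked" to the write bias, with no retry loop at all.

```c
#include <stdatomic.h>

/* 64-bit bias values as defined in the generic rwsem code above. */
#define RWSEM_UNLOCKED_VALUE	0x0000000000000000L
#define RWSEM_ACTIVE_BIAS	0x0000000000000001L
#define RWSEM_ACTIVE_MASK	0x00000000ffffffffL
#define RWSEM_WAITING_BIAS	(-RWSEM_ACTIVE_MASK - 1)
#define RWSEM_ACTIVE_WRITE_BIAS	(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)

/* try_cmpxchg-style write trylock: one CAS from unlocked to the write
 * bias. On failure the lock was contended; report 0 without retrying. */
static int down_write_trylock_sketch(atomic_long *count)
{
	long tmp = RWSEM_UNLOCKED_VALUE;

	return atomic_compare_exchange_strong_explicit(count, &tmp,
			RWSEM_ACTIVE_WRITE_BIAS,
			memory_order_acquire, memory_order_relaxed);
}
```

The point of the rewrite is exactly what Peter notes inline: with try_cmpxchg the "expected" value lives in a variable the CAS updates for free, so the success test and the cmpxchg collapse into one call and the compiler can branch on the flags directly.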
Re: [PATCH] locking/rwsem: Remove arch specific rwsem files
* Waiman Long wrote:

> On 02/10/2019 09:00 PM, Waiman Long wrote:
> > As the generic rwsem-xadd code is using the appropriate acquire and
> > release versions of the atomic operations, the arch specific rwsem.h
> > files will not be that much faster than the generic code as long as
> > the atomic functions are properly implemented. So we can remove those
> > arch specific rwsem.h files and stop building asm/rwsem.h to reduce
> > maintenance effort.
> >
> > Currently, only x86, alpha and ia64 have implemented architecture
> > specific fast paths. I don't have access to alpha and ia64 systems
> > for testing, but they are legacy systems that are not likely to be
> > updated to the latest kernel anyway.
> >
> > By using a rwsem microbenchmark, the total locking rates on a
> > 4-socket 56-core 112-thread x86-64 system before and after the patch
> > were as follows (mixed means an equal # of read and write locks):
> >
> >                   Before Patch               After Patch
> >  # of Threads  wlock   rlock   mixed     wlock   rlock   mixed
> >  ------------  -----   -----   -----     -----   -----   -----
> >        1       27,373  29,409  28,170    28,773  30,164  29,276
> >        2        7,697  14,922   1,703     7,435  15,167   1,729
> >        4        6,987  14,285   1,490     7,181  14,438   1,330
> >        8        6,650  13,652     761     6,918  13,796     718
> >       16        6,434  15,729     713     6,554  16,030     625
> >       32        5,590  15,312     552     6,124  15,344     471
> >       64        5,980  15,478      61     5,668  15,509      58
> >
> > There were some run-to-run variations for the multi-thread tests. For
> > x86-64, using the generic C code fast path seems to be a little bit
> > faster than the assembly version, especially for read-lock and when
> > lock contention is low. Looking at the assembly version of the fast
> > paths, there are assembly to/from C code wrappers that save and
> > restore all the callee-clobbered registers (7 registers on x86-64).
> > The assembly generated from the generic C code doesn't need to do
> > that. That may explain the slight performance gain here.
> >
> > The generic asm rwsem.h can also be merged into kernel/locking/rwsem.h
> > as no code other than that under kernel/locking needs to access the
> > internal rwsem macros and functions.
> >
> > Signed-off-by: Waiman Long
>
> I have decided to break the rwsem patchset that I sent out last
> Thursday into 3 parts. This patch is part 0 as it touches a number of
> arch specific files and so has the widest distribution. I would like
> to get it merged first. Part 1 will be patches 1-10 (except 4) of my
> original rwsem patchset. This part moves things around, adds more
> debugging capability and lays the ground work for the next part. Part
> 2 will contain the remaining patches, which are the real beef of the
> whole patchset.

Sounds good to me - I've merged this patch, will push it out after
testing.

Thanks,

	Ingo
Re: [PATCH] locking/rwsem: Remove arch specific rwsem files
On 02/10/2019 09:00 PM, Waiman Long wrote:
> As the generic rwsem-xadd code is using the appropriate acquire and
> release versions of the atomic operations, the arch specific rwsem.h
> files will not be that much faster than the generic code as long as
> the atomic functions are properly implemented. So we can remove those
> arch specific rwsem.h files and stop building asm/rwsem.h to reduce
> maintenance effort.
>
> Currently, only x86, alpha and ia64 have implemented architecture
> specific fast paths. I don't have access to alpha and ia64 systems for
> testing, but they are legacy systems that are not likely to be updated
> to the latest kernel anyway.
>
> By using a rwsem microbenchmark, the total locking rates on a 4-socket
> 56-core 112-thread x86-64 system before and after the patch were as
> follows (mixed means an equal # of read and write locks):
>
>                   Before Patch               After Patch
>  # of Threads  wlock   rlock   mixed     wlock   rlock   mixed
>  ------------  -----   -----   -----     -----   -----   -----
>        1       27,373  29,409  28,170    28,773  30,164  29,276
>        2        7,697  14,922   1,703     7,435  15,167   1,729
>        4        6,987  14,285   1,490     7,181  14,438   1,330
>        8        6,650  13,652     761     6,918  13,796     718
>       16        6,434  15,729     713     6,554  16,030     625
>       32        5,590  15,312     552     6,124  15,344     471
>       64        5,980  15,478      61     5,668  15,509      58
>
> There were some run-to-run variations for the multi-thread tests. For
> x86-64, using the generic C code fast path seems to be a little bit
> faster than the assembly version, especially for read-lock and when
> lock contention is low. Looking at the assembly version of the fast
> paths, there are assembly to/from C code wrappers that save and
> restore all the callee-clobbered registers (7 registers on x86-64).
> The assembly generated from the generic C code doesn't need to do
> that. That may explain the slight performance gain here.
>
> The generic asm rwsem.h can also be merged into kernel/locking/rwsem.h
> as no code other than that under kernel/locking needs to access the
> internal rwsem macros and functions.
>
> Signed-off-by: Waiman Long

I have decided to break the rwsem patchset that I sent out last
Thursday into 3 parts. This patch is part 0 as it touches a number of
arch specific files and so has the widest distribution. I would like to
get it merged first. Part 1 will be patches 1-10 (except 4) of my
original rwsem patchset. This part moves things around, adds more
debugging capability and lays the ground work for the next part. Part 2
will contain the remaining patches, which are the real beef of the
whole patchset.

Cheers,
Longman
[PATCH] locking/rwsem: Remove arch specific rwsem files
As the generic rwsem-xadd code is using the appropriate acquire and
release versions of the atomic operations, the arch specific rwsem.h
files will not be that much faster than the generic code as long as the
atomic functions are properly implemented. So we can remove those arch
specific rwsem.h files and stop building asm/rwsem.h to reduce
maintenance effort.

Currently, only x86, alpha and ia64 have implemented architecture
specific fast paths. I don't have access to alpha and ia64 systems for
testing, but they are legacy systems that are not likely to be updated
to the latest kernel anyway.

By using a rwsem microbenchmark, the total locking rates on a 4-socket
56-core 112-thread x86-64 system before and after the patch were as
follows (mixed means an equal # of read and write locks):

                  Before Patch               After Patch
 # of Threads  wlock   rlock   mixed     wlock   rlock   mixed
 ------------  -----   -----   -----     -----   -----   -----
       1       27,373  29,409  28,170    28,773  30,164  29,276
       2        7,697  14,922   1,703     7,435  15,167   1,729
       4        6,987  14,285   1,490     7,181  14,438   1,330
       8        6,650  13,652     761     6,918  13,796     718
      16        6,434  15,729     713     6,554  16,030     625
      32        5,590  15,312     552     6,124  15,344     471
      64        5,980  15,478      61     5,668  15,509      58

There were some run-to-run variations for the multi-thread tests. For
x86-64, using the generic C code fast path seems to be a little bit
faster than the assembly version, especially for read-lock and when
lock contention is low. Looking at the assembly version of the fast
paths, there are assembly to/from C code wrappers that save and restore
all the callee-clobbered registers (7 registers on x86-64). The
assembly generated from the generic C code doesn't need to do that.
That may explain the slight performance gain here.

The generic asm rwsem.h can also be merged into kernel/locking/rwsem.h
as no code other than that under kernel/locking needs to access the
internal rwsem macros and functions.
Signed-off-by: Waiman Long
---
 MAINTAINERS                     |   1 -
 arch/alpha/include/asm/rwsem.h  | 211 ---------------
 arch/arm/include/asm/Kbuild     |   1 -
 arch/arm64/include/asm/Kbuild   |   1 -
 arch/hexagon/include/asm/Kbuild |   1 -
 arch/ia64/include/asm/rwsem.h   | 172 ------------
 arch/powerpc/include/asm/Kbuild |   1 -
 arch/s390/include/asm/Kbuild    |   1 -
 arch/sh/include/asm/Kbuild      |   1 -
 arch/sparc/include/asm/Kbuild   |   1 -
 arch/x86/include/asm/rwsem.h    | 237 -----------------
 arch/x86/lib/Makefile           |   1 -
 arch/x86/lib/rwsem.S            | 156 -----------
 arch/xtensa/include/asm/Kbuild  |   1 -
 include/asm-generic/rwsem.h     | 140 ----------
 include/linux/rwsem.h           |   4 +-
 kernel/locking/percpu-rwsem.c   |   2 +
 kernel/locking/rwsem.h          | 130 ++++++++++
 18 files changed, 133 insertions(+), 929 deletions(-)
 delete mode 100644 arch/alpha/include/asm/rwsem.h
 delete mode 100644 arch/ia64/include/asm/rwsem.h
 delete mode 100644 arch/x86/include/asm/rwsem.h
 delete mode 100644 arch/x86/lib/rwsem.S
 delete mode 100644 include/asm-generic/rwsem.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 9919840..053f536 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8926,7 +8926,6 @@ F:	arch/*/include/asm/spinlock*.h
 F:	include/linux/rwlock*.h
 F:	include/linux/mutex*.h
 F:	include/linux/rwsem*.h
-F:	arch/*/include/asm/rwsem.h
 F:	include/linux/seqlock.h
 F:	lib/locking*.[ch]
 F:	kernel/locking/
diff --git a/arch/alpha/include/asm/rwsem.h b/arch/alpha/include/asm/rwsem.h
deleted file mode 100644
index cf8fc8f9..0000000
--- a/arch/alpha/include/asm/rwsem.h
+++ /dev/null
@@ -1,211 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ALPHA_RWSEM_H
-#define _ALPHA_RWSEM_H
-
-/*
- * Written by Ivan Kokshaysky, 2001.
- * Based on asm-alpha/semaphore.h and asm-i386/rwsem.h
- */
-
-#ifndef _LINUX_RWSEM_H
-#error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead"
-#endif
-
-#ifdef __KERNEL__
-
-#include <linux/compiler.h>
-
-#define RWSEM_UNLOCKED_VALUE		0x0000000000000000L
-#define RWSEM_ACTIVE_BIAS		0x0000000000000001L
-#define RWSEM_ACTIVE_MASK		0x00000000ffffffffL
-#define RWSEM_WAITING_BIAS		(-0x0000000100000000L)
-#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS	(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-static inline int ___down_read(struct rw_semaphore *sem)
-{
-	long oldcount;
-#ifndef	CONFIG_SMP
-	oldcount = sem->count.counter;
-	sem->count.counter += RWSEM_ACTIVE_READ_BIAS;
-#else
-	long temp;
-	__asm__ __volatile__(
-	"1:	ldq_l	%0,%1\n"