Re: [PATCH v4 0/3] locking/rwsem: Rwsem rearchitecture part 0

2019-02-18 Thread Will Deacon
On Fri, Feb 15, 2019 at 01:58:34PM -0500, Waiman Long wrote:
> On 02/15/2019 01:40 PM, Will Deacon wrote:
> > On Thu, Feb 14, 2019 at 11:37:15AM +0100, Peter Zijlstra wrote:
> >> On Wed, Feb 13, 2019 at 05:00:14PM -0500, Waiman Long wrote:
> >>> v4:
> >>>  - Remove rwsem-spinlock.c and make all archs use rwsem-xadd.c.
> >>>
> >>> v3:
> >>>  - Optimize __down_read_trylock() for the uncontended case as suggested
> >>>by Linus.
> >>>
> >>> v2:
> >>>  - Add patch 2 to optimize __down_read_trylock() as suggested by PeterZ.
> >>>  - Update performance test data in patch 1.
> >>>
> >>> The goal of this patchset is to remove the architecture specific files
> >>> for rwsem-xadd to make it easier to add enhancements in the later rwsem
> >>> patches. It also removes the legacy rwsem-spinlock.c file and makes all
> >>> the architectures use one single implementation of rwsem - rwsem-xadd.c.
> >>>
> >>> Waiman Long (3):
> >>>   locking/rwsem: Remove arch specific rwsem files
> >>>   locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all
> >>> archs
> >>>   locking/rwsem: Optimize down_read_trylock()
> >> Acked-by: Peter Zijlstra (Intel) 
> >>
> >> with the caveat that I'm happy to exchange patch 3 back to my earlier
> >> suggestion in case Will expresses concerns wrt the ARM64 performance of
> >> Linus' suggestion.
> > Right, the current proposal doesn't work well for us, unfortunately. Which
> > was your earlier suggestion?
> >
> > Will
> 
> In my posting yesterday, I showed that most of the trylocks done were
> actually uncontended. Assuming that pattern holds for most of the
> workloads, it will not be that bad after all.

That's fair enough; if you're going to sit in a tight trylock() loop like the
benchmark does, then you're much better off just calling lock() if you care
at all about scalability.

Will


Re: [PATCH v4 0/3] locking/rwsem: Rwsem rearchitecture part 0

2019-02-15 Thread Will Deacon
On Thu, Feb 14, 2019 at 11:37:15AM +0100, Peter Zijlstra wrote:
> On Wed, Feb 13, 2019 at 05:00:14PM -0500, Waiman Long wrote:
> > v4:
> >  - Remove rwsem-spinlock.c and make all archs use rwsem-xadd.c.
> > 
> > v3:
> >  - Optimize __down_read_trylock() for the uncontended case as suggested
> >by Linus.
> > 
> > v2:
> >  - Add patch 2 to optimize __down_read_trylock() as suggested by PeterZ.
> >  - Update performance test data in patch 1.
> > 
> > The goal of this patchset is to remove the architecture specific files
> > for rwsem-xadd to make it easier to add enhancements in the later rwsem
> > patches. It also removes the legacy rwsem-spinlock.c file and makes all
> > the architectures use one single implementation of rwsem - rwsem-xadd.c.
> > 
> > Waiman Long (3):
> >   locking/rwsem: Remove arch specific rwsem files
> >   locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all
> > archs
> >   locking/rwsem: Optimize down_read_trylock()
> 
> Acked-by: Peter Zijlstra (Intel) 
> 
> with the caveat that I'm happy to exchange patch 3 back to my earlier
> suggestion in case Will expresses concerns wrt the ARM64 performance of
> Linus' suggestion.

Right, the current proposal doesn't work well for us, unfortunately. Which
was your earlier suggestion?

Will


Re: [PATCH v4 0/3] locking/rwsem: Rwsem rearchitecture part 0

2019-02-15 Thread Waiman Long
On 02/15/2019 01:40 PM, Will Deacon wrote:
> On Thu, Feb 14, 2019 at 11:37:15AM +0100, Peter Zijlstra wrote:
>> On Wed, Feb 13, 2019 at 05:00:14PM -0500, Waiman Long wrote:
>>> v4:
>>>  - Remove rwsem-spinlock.c and make all archs use rwsem-xadd.c.
>>>
>>> v3:
>>>  - Optimize __down_read_trylock() for the uncontended case as suggested
>>>by Linus.
>>>
>>> v2:
>>>  - Add patch 2 to optimize __down_read_trylock() as suggested by PeterZ.
>>>  - Update performance test data in patch 1.
>>>
>>> The goal of this patchset is to remove the architecture specific files
>>> for rwsem-xadd to make it easier to add enhancements in the later rwsem
>>> patches. It also removes the legacy rwsem-spinlock.c file and makes all
>>> the architectures use one single implementation of rwsem - rwsem-xadd.c.
>>>
>>> Waiman Long (3):
>>>   locking/rwsem: Remove arch specific rwsem files
>>>   locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all
>>> archs
>>>   locking/rwsem: Optimize down_read_trylock()
>> Acked-by: Peter Zijlstra (Intel) 
>>
>> with the caveat that I'm happy to exchange patch 3 back to my earlier
>> suggestion in case Will expresses concerns wrt the ARM64 performance of
>> Linus' suggestion.
> Right, the current proposal doesn't work well for us, unfortunately. Which
> was your earlier suggestion?
>
> Will

In my posting yesterday, I showed that most of the trylocks done were
actually uncontended. Assuming that pattern holds for most of the
workloads, it will not be that bad after all.

-Longman



Re: [PATCH v4 0/3] locking/rwsem: Rwsem rearchitecture part 0

2019-02-14 Thread Waiman Long
On 02/14/2019 05:37 AM, Peter Zijlstra wrote:
> On Wed, Feb 13, 2019 at 05:00:14PM -0500, Waiman Long wrote:
>> v4:
>>  - Remove rwsem-spinlock.c and make all archs use rwsem-xadd.c.
>>
>> v3:
>>  - Optimize __down_read_trylock() for the uncontended case as suggested
>>by Linus.
>>
>> v2:
>>  - Add patch 2 to optimize __down_read_trylock() as suggested by PeterZ.
>>  - Update performance test data in patch 1.
>>
>> The goal of this patchset is to remove the architecture specific files
>> for rwsem-xadd to make it easier to add enhancements in the later rwsem
>> patches. It also removes the legacy rwsem-spinlock.c file and makes all
>> the architectures use one single implementation of rwsem - rwsem-xadd.c.
>>
>> Waiman Long (3):
>>   locking/rwsem: Remove arch specific rwsem files
>>   locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all
>> archs
>>   locking/rwsem: Optimize down_read_trylock()
> Acked-by: Peter Zijlstra (Intel) 
>
> with the caveat that I'm happy to exchange patch 3 back to my earlier
> suggestion in case Will expresses concerns wrt the ARM64 performance of
> Linus' suggestion.

I inserted a few lock event counters into the rwsem trylock code:

static inline int __down_read_trylock(struct rw_semaphore *sem)
{
	/*
	 * Optimize for the case when the rwsem is not locked at all.
	 */
	long tmp = RWSEM_UNLOCKED_VALUE;

	lockevent_inc(rwsem_rtrylock);
	do {
		if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
					tmp + RWSEM_ACTIVE_READ_BIAS)) {
			rwsem_set_reader_owned(sem);
			return 1;
		}
		lockevent_inc(rwsem_rtrylock_retry);
	} while (tmp >= 0);	/* retry unless a writer holds the lock */
	lockevent_inc(rwsem_rtrylock_fail);
	return 0;
}

static inline int __down_write_trylock(struct rw_semaphore *sem)
{
	long tmp;

	lockevent_inc(rwsem_wtrylock);
	tmp = atomic_long_cmpxchg_acquire(&sem->count, RWSEM_UNLOCKED_VALUE,
					  RWSEM_ACTIVE_WRITE_BIAS);
	if (tmp == RWSEM_UNLOCKED_VALUE) {
		rwsem_set_owner(sem);
		return true;
	}
	lockevent_inc(rwsem_wtrylock_fail);
	return false;
}

I booted the new kernel on a 4-socket 56-core 112-thread Broadwell
system. The counter values were as follows:

1) After bootup:

rwsem_rtrylock=784029
rwsem_rtrylock_fail=59
rwsem_rtrylock_retry=394
rwsem_wtrylock=18284
rwsem_wtrylock_fail=230

2) After parallel kernel build (-j112):

rwsem_rtrylock=338667559
rwsem_rtrylock_fail=18
rwsem_rtrylock_retry=51
rwsem_wtrylock=17016332
rwsem_wtrylock_fail=98058

At least for these two use cases, try-for-ownership as suggested by
Linus is the right choice.
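For anyone following along, the try-for-ownership fast path can be
modeled in userspace with C11 atomics. This is only an illustrative
sketch, not the kernel code: the UNLOCKED/READ_BIAS/WRITE_BIAS
constants and the model_down_read_trylock() name are simplified
stand-ins for the kernel's RWSEM_* values and __down_read_trylock().
The point it demonstrates is that the uncontended case (the common one,
per the counters above) costs exactly one compare-and-exchange:

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Simplified model of the rwsem count word:
 * 0 = unlocked, positive = number of readers, negative = write-locked.
 * These are illustrative stand-ins, not the kernel's bias values.
 */
#define UNLOCKED   0L
#define READ_BIAS  1L
#define WRITE_BIAS (-1L)

static atomic_long count = UNLOCKED;

/*
 * Model of the Linus-suggested fast path: start by guessing the
 * semaphore is unlocked, so the uncontended case needs exactly one
 * compare-and-exchange. On failure the CAS reloads tmp with the
 * current count, and we retry unless a writer holds the lock.
 */
static int model_down_read_trylock(void)
{
	long tmp = UNLOCKED;

	do {
		if (atomic_compare_exchange_weak(&count, &tmp,
						 tmp + READ_BIAS))
			return 1;	/* acquired for read */
	} while (tmp >= 0);		/* negative count: writer present */
	return 0;
}
```

In the uncontended case the initial guess is already correct and no
retry loop iteration is taken; only when another CPU races in does the
reloaded value feed a second attempt.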

Cheers,
Longman



Re: [PATCH v4 0/3] locking/rwsem: Rwsem rearchitecture part 0

2019-02-14 Thread Peter Zijlstra
On Wed, Feb 13, 2019 at 05:00:14PM -0500, Waiman Long wrote:
> v4:
>  - Remove rwsem-spinlock.c and make all archs use rwsem-xadd.c.
> 
> v3:
>  - Optimize __down_read_trylock() for the uncontended case as suggested
>by Linus.
> 
> v2:
>  - Add patch 2 to optimize __down_read_trylock() as suggested by PeterZ.
>  - Update performance test data in patch 1.
> 
> The goal of this patchset is to remove the architecture specific files
> for rwsem-xadd to make it easier to add enhancements in the later rwsem
> patches. It also removes the legacy rwsem-spinlock.c file and makes all
> the architectures use one single implementation of rwsem - rwsem-xadd.c.
> 
> Waiman Long (3):
>   locking/rwsem: Remove arch specific rwsem files
>   locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all
> archs
>   locking/rwsem: Optimize down_read_trylock()

Acked-by: Peter Zijlstra (Intel) 

with the caveat that I'm happy to exchange patch 3 back to my earlier
suggestion in case Will expresses concerns wrt the ARM64 performance of
Linus' suggestion.


[PATCH v4 0/3] locking/rwsem: Rwsem rearchitecture part 0

2019-02-13 Thread Waiman Long
v4:
 - Remove rwsem-spinlock.c and make all archs use rwsem-xadd.c.

v3:
 - Optimize __down_read_trylock() for the uncontended case as suggested
   by Linus.

v2:
 - Add patch 2 to optimize __down_read_trylock() as suggested by PeterZ.
 - Update performance test data in patch 1.

The goal of this patchset is to remove the architecture specific files
for rwsem-xadd to make it easier to add enhancements in the later rwsem
patches. It also removes the legacy rwsem-spinlock.c file and makes all
the architectures use one single implementation of rwsem - rwsem-xadd.c.

Waiman Long (3):
  locking/rwsem: Remove arch specific rwsem files
  locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all
archs
  locking/rwsem: Optimize down_read_trylock()

 MAINTAINERS |   1 -
 arch/alpha/Kconfig  |   7 -
 arch/alpha/include/asm/rwsem.h  | 211 -
 arch/arc/Kconfig|   3 -
 arch/arm/Kconfig|   4 -
 arch/arm/include/asm/Kbuild |   1 -
 arch/arm64/Kconfig  |   3 -
 arch/arm64/include/asm/Kbuild   |   1 -
 arch/c6x/Kconfig|   3 -
 arch/csky/Kconfig   |   3 -
 arch/h8300/Kconfig  |   3 -
 arch/hexagon/Kconfig|   6 -
 arch/hexagon/include/asm/Kbuild |   1 -
 arch/ia64/Kconfig   |   4 -
 arch/ia64/include/asm/rwsem.h   | 172 
 arch/m68k/Kconfig   |   7 -
 arch/microblaze/Kconfig |   6 -
 arch/mips/Kconfig   |   7 -
 arch/nds32/Kconfig  |   3 -
 arch/nios2/Kconfig  |   3 -
 arch/openrisc/Kconfig   |   6 -
 arch/parisc/Kconfig |   6 -
 arch/powerpc/Kconfig|   7 -
 arch/powerpc/include/asm/Kbuild |   1 -
 arch/riscv/Kconfig  |   3 -
 arch/s390/Kconfig   |   6 -
 arch/s390/include/asm/Kbuild|   1 -
 arch/sh/Kconfig |   6 -
 arch/sh/include/asm/Kbuild  |   1 -
 arch/sparc/Kconfig  |   8 -
 arch/sparc/include/asm/Kbuild   |   1 -
 arch/unicore32/Kconfig  |   6 -
 arch/x86/Kconfig|   3 -
 arch/x86/include/asm/rwsem.h| 237 
 arch/x86/lib/Makefile   |   1 -
 arch/x86/lib/rwsem.S| 156 --
 arch/x86/um/Kconfig |   6 -
 arch/x86/um/Makefile|   1 -
 arch/xtensa/Kconfig |   3 -
 arch/xtensa/include/asm/Kbuild  |   1 -
 include/asm-generic/rwsem.h | 140 -
 include/linux/rwsem-spinlock.h  |  47 --
 include/linux/rwsem.h   |   9 +-
 kernel/Kconfig.locks|   2 +-
 kernel/locking/Makefile |   4 +-
 kernel/locking/percpu-rwsem.c   |   2 +
 kernel/locking/rwsem-spinlock.c | 339 
 kernel/locking/rwsem.h  | 130 +++
 48 files changed, 135 insertions(+), 1447 deletions(-)
 delete mode 100644 arch/alpha/include/asm/rwsem.h
 delete mode 100644 arch/ia64/include/asm/rwsem.h
 delete mode 100644 arch/x86/include/asm/rwsem.h
 delete mode 100644 arch/x86/lib/rwsem.S
 delete mode 100644 include/asm-generic/rwsem.h
 delete mode 100644 include/linux/rwsem-spinlock.h
 delete mode 100644 kernel/locking/rwsem-spinlock.c

-- 
1.8.3.1