Re: [PATCH v3 5/6] powerpc/pseries: implement paravirt qspinlocks for SPLPAR

2020-07-25 Thread Waiman Long
On 7/25/20 1:26 PM, Peter Zijlstra wrote: On Fri, Jul 24, 2020 at 03:10:59PM -0400, Waiman Long wrote: On 7/24/20 4:16 AM, Will Deacon wrote: On Thu, Jul 23, 2020 at 08:47:59PM +0200, pet...@infradead.org wrote: On Thu, Jul 23, 2020 at 02:32:36PM -0400, Waiman Long wrote: BTW, do you have

Re: [PATCH v3 5/6] powerpc/pseries: implement paravirt qspinlocks for SPLPAR

2020-07-24 Thread Waiman Long
On 7/24/20 3:10 PM, Waiman Long wrote: On 7/24/20 4:16 AM, Will Deacon wrote: On Thu, Jul 23, 2020 at 08:47:59PM +0200, pet...@infradead.org wrote: On Thu, Jul 23, 2020 at 02:32:36PM -0400, Waiman Long wrote: BTW, do you have any comment on my v2 lock holder cpu info qspinlock patch? I

Re: [PATCH v4 0/6] powerpc: queued spinlocks and rwlocks

2020-07-24 Thread Waiman Long
/powerpc/include/asm/simple_spinlock.h create mode 100644 arch/powerpc/include/asm/simple_spinlock_types.h That patch series looks good to me. Thanks for working on this. For the series, Acked-by: Waiman Long

Re: [PATCH v4 6/6] powerpc: implement smp_cond_load_relaxed

2020-07-24 Thread Waiman Long
On 7/24/20 9:14 AM, Nicholas Piggin wrote: This implements smp_cond_load_relaed with the slowpath busy loop using the Nit: "smp_cond_load_relaxed" Cheers, Longman

Re: [PATCH v3 5/6] powerpc/pseries: implement paravirt qspinlocks for SPLPAR

2020-07-24 Thread Waiman Long
On 7/24/20 4:16 AM, Will Deacon wrote: On Thu, Jul 23, 2020 at 08:47:59PM +0200, pet...@infradead.org wrote: On Thu, Jul 23, 2020 at 02:32:36PM -0400, Waiman Long wrote: BTW, do you have any comment on my v2 lock holder cpu info qspinlock patch? I will have to update the patch to fix

Re: [PATCH v3 5/6] powerpc/pseries: implement paravirt qspinlocks for SPLPAR

2020-07-23 Thread Waiman Long
On 7/23/20 3:58 PM, pet...@infradead.org wrote: On Thu, Jul 23, 2020 at 03:04:13PM -0400, Waiman Long wrote: On 7/23/20 2:47 PM, pet...@infradead.org wrote: On Thu, Jul 23, 2020 at 02:32:36PM -0400, Waiman Long wrote: BTW, do you have any comment on my v2 lock holder cpu info qspinlock patch

Re: [PATCH v3 5/6] powerpc/pseries: implement paravirt qspinlocks for SPLPAR

2020-07-23 Thread Waiman Long
On 7/23/20 2:47 PM, pet...@infradead.org wrote: On Thu, Jul 23, 2020 at 02:32:36PM -0400, Waiman Long wrote: BTW, do you have any comment on my v2 lock holder cpu info qspinlock patch? I will have to update the patch to fix the reported 0-day test problem, but I want to collect other feedback

Re: [PATCH v3 5/6] powerpc/pseries: implement paravirt qspinlocks for SPLPAR

2020-07-23 Thread Waiman Long
On 7/23/20 10:00 AM, Peter Zijlstra wrote: On Thu, Jul 09, 2020 at 12:06:13PM -0400, Waiman Long wrote: We don't really need to do a pv_spinlocks_init() if pv_kick() isn't supported. Waiman, if you cannot explain how not having kick is a sane thing, what are you saying here? The current PPC
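
A loose sketch of the gating being debated in this thread -- not the actual pseries patch; is_shared_processor() and __pv_init_lock_hash() are real kernel symbols, but the body here is illustrative only:

	void __init pv_spinlocks_init(void)
	{
		if (!is_shared_processor())
			return;			/* dedicated vCPUs: pv_wait()/pv_kick() buy nothing */
		__pv_init_lock_hash();		/* hash used by the unlocker to find the vCPU to kick */
	}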

Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks

2020-07-23 Thread Waiman Long
On 7/23/20 9:30 AM, Nicholas Piggin wrote: I would prefer to extract out the pending bit handling code out into a separate helper function which can be overridden by the arch code instead of breaking the slowpath into 2 pieces. You mean have the arch provide a queued_spin_lock_slowpath_pending
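
Roughly the shape of the split being floated here; queued_spin_lock_slowpath_pending() is the name from the thread, while the queueing helper name below is hypothetical:

	void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
	{
		if (queued_spin_lock_slowpath_pending(lock, val))
			return;				/* lock taken via the pending bit, no queueing */
		queued_spin_lock_slowpath_queue(lock, val);	/* hypothetical: MCS-queue half of the slowpath */
	}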

Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks

2020-07-21 Thread Waiman Long
On 7/21/20 7:08 AM, Nicholas Piggin wrote: diff --git a/arch/powerpc/include/asm/qspinlock.h b/arch/powerpc/include/asm/qspinlock.h index b752d34517b3..26d8766a1106 100644 --- a/arch/powerpc/include/asm/qspinlock.h +++ b/arch/powerpc/include/asm/qspinlock.h @@ -31,16 +31,57 @@ static inline

Re: [PATCH v3 5/6] powerpc/pseries: implement paravirt qspinlocks for SPLPAR

2020-07-09 Thread Waiman Long
++ arch/powerpc/platforms/pseries/Kconfig| 5 ++ arch/powerpc/platforms/pseries/setup.c| 6 +- include/asm-generic/qspinlock.h | 2 + Another ack? I am OK with adding the #ifdef around queued_spin_lock(). Acked-by: Waiman Long diff --git a/arch/powerpc

Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks

2020-07-08 Thread Waiman Long
On 7/8/20 7:50 PM, Waiman Long wrote: On 7/8/20 1:10 AM, Nicholas Piggin wrote: Excerpts from Waiman Long's message of July 8, 2020 1:33 pm: On 7/7/20 1:57 AM, Nicholas Piggin wrote: Yes, powerpc could certainly get more performance out of the slow paths, and then there are a few parameters

Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks

2020-07-08 Thread Waiman Long
On 7/8/20 4:41 AM, Peter Zijlstra wrote: On Tue, Jul 07, 2020 at 03:57:06PM +1000, Nicholas Piggin wrote: Yes, powerpc could certainly get more performance out of the slow paths, and then there are a few parameters to tune. Can you clarify? The slow path is already in use on ARM64 which is

Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks

2020-07-08 Thread Waiman Long
On 7/8/20 4:32 AM, Peter Zijlstra wrote: On Tue, Jul 07, 2020 at 11:33:45PM -0400, Waiman Long wrote: From 5d7941a498935fb225b2c7a3108cbf590114c3db Mon Sep 17 00:00:00 2001 From: Waiman Long Date: Tue, 7 Jul 2020 22:29:16 -0400 Subject: [PATCH 2/9] locking/pvqspinlock: Introduce

Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks

2020-07-08 Thread Waiman Long
On 7/8/20 1:10 AM, Nicholas Piggin wrote: Excerpts from Waiman Long's message of July 8, 2020 1:33 pm: On 7/7/20 1:57 AM, Nicholas Piggin wrote: Yes, powerpc could certainly get more performance out of the slow paths, and then there are a few parameters to tune. We don't have a good alternate

Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks

2020-07-07 Thread Waiman Long
From 161e545523a7eb4c42c145c04e9a5a15903ba3d9 Mon Sep 17 00:00:00 2001 From: Waiman Long Date: Tue, 7 Jul 2020 20:46:51 -0400 Subject: [PATCH 1/9] locking/pvqspinlock: Code relocation and extraction Move pv_kick_node() and the unlock functions up and extract out the hash and lock code from pv_wait_head_or_lock() into pv_hash_l

Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks

2020-07-06 Thread Waiman Long
On 7/6/20 12:35 AM, Nicholas Piggin wrote: v3 is updated to use __pv_queued_spin_unlock, noticed by Waiman (thank you). Thanks, Nick Nicholas Piggin (6): powerpc/powernv: must include hvcall.h to get PAPR defines powerpc/pseries: move some PAPR paravirt functions to their own file

Re: [PATCH v2 5/6] powerpc/pseries: implement paravirt qspinlocks for SPLPAR

2020-07-05 Thread Waiman Long
On 7/3/20 3:35 AM, Nicholas Piggin wrote: Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/paravirt.h | 28 ++ arch/powerpc/include/asm/qspinlock.h | 55 +++ arch/powerpc/include/asm/qspinlock_paravirt.h | 5 ++

Re: [PATCH 6/8] powerpc/pseries: implement paravirt qspinlocks for SPLPAR

2020-07-02 Thread Waiman Long
On 7/2/20 12:15 PM, kernel test robot wrote: Hi Nicholas, I love your patch! Yet something to improve: [auto build test ERROR on powerpc/next] [also build test ERROR on tip/locking/core v5.8-rc3 next-20200702] [If your patch is applied to the wrong git tree, kindly drop us a note. And when

Re: [PATCH 6/8] powerpc/pseries: implement paravirt qspinlocks for SPLPAR

2020-07-02 Thread Waiman Long
On 7/2/20 3:48 AM, Nicholas Piggin wrote: Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/paravirt.h | 23 arch/powerpc/include/asm/qspinlock.h | 55 +++ arch/powerpc/include/asm/qspinlock_paravirt.h | 5 ++

Re: [PATCH v4 0/3] mm, treewide: Rename kzfree() to kfree_sensitive()

2020-06-16 Thread Waiman Long
On 6/16/20 2:53 PM, Joe Perches wrote: On Mon, 2020-06-15 at 21:57 -0400, Waiman Long wrote: v4: - Break out the memzero_explicit() change as suggested by Dan Carpenter so that it can be backported to stable. - Drop the "crypto: Remove unnecessary memzero_explicit()"

Re: [PATCH v5 2/2] mm, treewide: Rename kzfree() to kfree_sensitive()

2020-06-16 Thread Waiman Long
On 6/16/20 2:09 PM, Andrew Morton wrote: On Tue, 16 Jun 2020 11:43:11 -0400 Waiman Long wrote: As said by Linus: A symmetric naming is only helpful if it implies symmetries in use. Otherwise it's actively misleading. In "kzalloc()", the z is meaningful and an important pa

[PATCH v5 1/2] mm/slab: Use memzero_explicit() in kzfree()

2020-06-16 Thread Waiman Long
.org Acked-by: Michal Hocko Signed-off-by: Waiman Long --- mm/slab_common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/slab_common.c b/mm/slab_common.c index 9e72ba224175..37d48a56431d 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1726,7 +1726,7 @@ void kz
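
The quoted hunk is cut off by the archive; reconstructed from memory (details may differ slightly), the change amounts to swapping the clearing call inside kzfree():

	void kzfree(const void *p)
	{
		size_t ks;
		void *mem = (void *)p;

		if (unlikely(ZERO_OR_NULL_PTR(mem)))
			return;
		ks = ksize(mem);
		memzero_explicit(mem, ks);	/* was: memset(mem, 0, ks); */
		kfree(mem);
	}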

[PATCH v5 2/2] mm, treewide: Rename kzfree() to kfree_sensitive()

2020-06-16 Thread Waiman Long
Acked-by: Michal Hocko Acked-by: Johannes Weiner Signed-off-by: Waiman Long --- arch/s390/crypto/prng.c | 4 +-- arch/x86/power/hibernate.c| 2 +- crypto/adiantum.c | 2 +- crypto/ahash.c

[PATCH v5 0/2] mm, treewide: Rename kzfree() to kfree_sensitive()

2020-06-16 Thread Waiman Long
especially if LTO is used. Instead, the new kfree_sensitive() uses memzero_explicit() which won't get compiled out. Waiman Long (2): mm/slab: Use memzero_explicit() in kzfree() mm, treewide: Rename kzfree() to kfree_sensitive() arch/s390/crypto/prng.c | 4 +-- arch
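
A minimal userspace sketch of the problem being described (illustrative, not kernel code): a memset() right before free() is a dead store the optimizer may delete, while a memzero_explicit()-style helper defeats that by telling the compiler the zeroed bytes are observed:

	#include <string.h>
	#include <stdlib.h>

	static void zero_explicit(void *s, size_t n)
	{
		memset(s, 0, n);
		/* empty asm with a memory clobber: compiler must assume *s is read */
		__asm__ __volatile__("" : : "r"(s) : "memory");
	}

	void drop_key_unsafe(char *key, size_t n)
	{
		memset(key, 0, n);	/* dead store, may be compiled out (esp. with LTO) */
		free(key);
	}

	void drop_key_safe(char *key, size_t n)
	{
		zero_explicit(key, n);	/* the clearing survives optimization */
		free(key);
	}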

Re: [PATCH v4 2/3] mm, treewide: Rename kzfree() to kfree_sensitive()

2020-06-16 Thread Waiman Long
On 6/16/20 10:26 AM, Dan Carpenter wrote: Last time you sent this we couldn't decide which tree it should go through. Either the crypto tree or through Andrew seems like the right thing to me. Also the other issue is that it risks breaking things if people add new kzfree() instances while we

Re: [PATCH v4 3/3] btrfs: Use kfree() in btrfs_ioctl_get_subvol_info()

2020-06-16 Thread Waiman Long
On 6/16/20 10:48 AM, David Sterba wrote: On Mon, Jun 15, 2020 at 09:57:18PM -0400, Waiman Long wrote: In btrfs_ioctl_get_subvol_info(), there is a classic case where kzalloc() was incorrectly paired with kzfree(). According to David Sterba, there isn't any sensitive information

Re: [PATCH v4 1/3] mm/slab: Use memzero_explicit() in kzfree()

2020-06-16 Thread Waiman Long
On 6/15/20 11:30 PM, Eric Biggers wrote: On Mon, Jun 15, 2020 at 09:57:16PM -0400, Waiman Long wrote: The kzfree() function is normally used to clear some sensitive information, like encryption keys, in the buffer before freeing it back to the pool. Memset() is currently used for the buffer

[PATCH v4 3/3] btrfs: Use kfree() in btrfs_ioctl_get_subvol_info()

2020-06-15 Thread Waiman Long
. Reported-by: David Sterba Signed-off-by: Waiman Long --- fs/btrfs/ioctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index f1dd9e4271e9..e8f7c5f00894 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -2692,7 +2692,7 @@ static

[PATCH v4 2/3] mm, treewide: Rename kzfree() to kfree_sensitive()

2020-06-15 Thread Waiman Long
Acked-by: Michal Hocko Acked-by: Johannes Weiner Signed-off-by: Waiman Long --- arch/s390/crypto/prng.c | 4 +-- arch/x86/power/hibernate.c| 2 +- crypto/adiantum.c | 2 +- crypto/ahash.c

[PATCH v4 1/3] mm/slab: Use memzero_explicit() in kzfree()

2020-06-15 Thread Waiman Long
especially if LTO is being used. To make sure that this optimization will not happen, memzero_explicit(), which is introduced in v3.18, is now used in kzfree() to do the clearing. Fixes: 3ef0e5ba4673 ("slab: introduce kzfree()") Cc: sta...@vger.kernel.org Signed-off-by: Waiman Lon

[PATCH v4 0/3] mm, treewide: Rename kzfree() to kfree_sensitive()

2020-06-15 Thread Waiman Long
The clearing isn't totally safe either, as the compiler may compile out the clearing in its optimizer, especially if LTO is used. Instead, the new kfree_sensitive() uses memzero_explicit() which won't get compiled out. Waiman Long (3): mm/slab: Use memzero_explicit() in kzfree() mm, treewide: Ren

Re: [PATCH 1/2] mm, treewide: Rename kzfree() to kfree_sensitive()

2020-06-15 Thread Waiman Long
On 6/15/20 2:07 PM, Dan Carpenter wrote: On Mon, Apr 13, 2020 at 05:15:49PM -0400, Waiman Long wrote: diff --git a/mm/slab_common.c b/mm/slab_common.c index 23c7500eea7d..c08bc7eb20bd 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1707,17 +1707,17 @@ void *krealloc(const void *p

Re: [PATCH v2 2/2] crypto: Remove unnecessary memzero_explicit()

2020-04-14 Thread Waiman Long
On 4/14/20 3:16 PM, Michal Suchánek wrote: > On Tue, Apr 14, 2020 at 12:24:36PM -0400, Waiman Long wrote: >> On 4/14/20 2:08 AM, Christophe Leroy wrote: >>> >>> On 14/04/2020 at 00:28, Waiman Long wrote: >>>> Since kfree_sensitive() will do an implicit me

Re: [PATCH 1/2] mm, treewide: Rename kzfree() to kfree_sensitive()

2020-04-14 Thread Waiman Long
On 4/14/20 8:48 AM, David Sterba wrote: > On Mon, Apr 13, 2020 at 05:15:49PM -0400, Waiman Long wrote: >> fs/btrfs/ioctl.c | 2 +- > >> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c >> index 40b729dce91c..eab3f8510426 100644 >> ---

Re: [PATCH v2 2/2] crypto: Remove unnecessary memzero_explicit()

2020-04-14 Thread Waiman Long
On 4/14/20 2:08 AM, Christophe Leroy wrote: > > > On 14/04/2020 at 00:28, Waiman Long wrote: >> Since kfree_sensitive() will do an implicit memzero_explicit(), there >> is no need to call memzero_explicit() before it. Eliminate those >> memzero_explicit() and simplify

[PATCH v2 2/2] crypto: Remove unnecessary memzero_explicit()

2020-04-13 Thread Waiman Long
Signed-off-by: Waiman Long --- .../allwinner/sun8i-ce/sun8i-ce-cipher.c | 19 +- .../allwinner/sun8i-ss/sun8i-ss-cipher.c | 20 +-- drivers/crypto/amlogic/amlogic-gxl-cipher.c | 12 +++ drivers/crypto/inside-secure/safexcel_hash.c | 3 +-- 4 files changed, 14

Re: [PATCH 2/2] crypto: Remove unnecessary memzero_explicit()

2020-04-13 Thread Waiman Long
On 4/13/20 5:31 PM, Joe Perches wrote: > On Mon, 2020-04-13 at 17:15 -0400, Waiman Long wrote: >> Since kfree_sensitive() will do an implicit memzero_explicit(), there >> is no need to call memzero_explicit() before it. Eliminate those >> memzero_explicit() and simplify the

[PATCH 2/2] crypto: Remove unnecessary memzero_explicit()

2020-04-13 Thread Waiman Long
Since kfree_sensitive() will do an implicit memzero_explicit(), there is no need to call memzero_explicit() before it. Eliminate those memzero_explicit() and simplify the call sites. Signed-off-by: Waiman Long --- .../crypto/allwinner/sun8i-ce/sun8i-ce-cipher.c | 15 +++ .../crypto
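
An illustrative before/after of the simplification (ctx->key and ctx->keylen are made-up names, not a quote of the driver hunks); kfree_sensitive() is NULL-safe and zeroes the buffer itself, so the explicit clear and the surrounding check go away:

	/* before */
	if (ctx->key) {
		memzero_explicit(ctx->key, ctx->keylen);
		kfree_sensitive(ctx->key);
	}

	/* after */
	kfree_sensitive(ctx->key);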

[PATCH 1/2] mm, treewide: Rename kzfree() to kfree_sensitive()

2020-04-13 Thread Waiman Long
The renaming is done by using the command sequence: git grep -w --name-only kzfree |\ xargs sed -i 's/\bkzfree\b/kfree_sensitive/' followed by some editing of the kfree_sensitive() kerneldoc and the use of memzero_explicit() instead of memset(). Suggested-by: Joe Perches Signed-off-by: W

[PATCH 0/2] mm, treewide: Rename kzfree() to kfree_sensitive()

2020-04-13 Thread Waiman Long
compile out the clearing in their optimizer. Instead, the new kfree_sensitive() uses memzero_explicit() which won't get compiled out. Waiman Long (2): mm, treewide: Rename kzfree() to kfree_sensitive() crypto: Remove unnecessary memzero_explicit() arch/s390/crypto/prng.c

Re: [PATCH v3 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt

2019-12-11 Thread Waiman Long
99.9000th: 82 > min=0, max=9887 min=0, max=121 > > Performance counter stats for 'system wide' (5 runs): > > context-switches43,373 ( +- 0.40% ) 44,597 ( +- 0.55% ) > cpu-migrations 1,211 ( +- 5.04% ) 220 ( +- 6.23% ) >

Re: [PATCH v3 2/2] powerpc/shared: Use static key to detect shared processor

2019-12-05 Thread Waiman Long
On 12/5/19 3:32 AM, Srikar Dronamraju wrote: > With the static key shared processor available, is_shared_processor() > can return without having to query the lppaca structure. > > Cc: Parth Shah > Cc: Ihor Pasichnyk > Cc: Juri Lelli > Cc: Phil Auld > Cc: Waiman Long
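
The idea, reconstructed from memory (the actual hunk may differ in detail): flip a static key once at boot instead of loading from the lppaca on every call:

	DECLARE_STATIC_KEY_FALSE(shared_processor);

	static inline bool is_shared_processor(void)
	{
		return static_branch_unlikely(&shared_processor);	/* patched branch, no lppaca load */
	}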

Re: [PATCH 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt

2019-12-04 Thread Waiman Long
00th: 70 > 98.9000th: 8136 99.9000th: 100 > min=-1, max=10008 min=0, max=142 > > Performance counter stats for 'system wide' (4 runs): > > context-switches 42,604 ( +- 0.87% ) 45,397 ( +- 0.25% )

Re: [PATCH 0/2] Enabling MSI for Microblaze

2019-10-24 Thread Waiman Long
ude/asm/Kbuild | 1 - > arch/riscv/include/asm/Kbuild | 1 - > arch/sparc/include/asm/Kbuild | 1 - > drivers/pci/Kconfig | 2 +- > include/asm-generic/Kbuild | 1 + > 9 files changed, 2 insertions(+), 8 deletions(-) > That looks OK. Acked-by: Waiman Long

Re: [PATCH-tip 00/22] locking/rwsem: Rework rwsem-xadd & enable new rwsem features

2019-04-10 Thread Waiman Long
On 04/10/2019 04:15 AM, huang ying wrote: > Hi, Waiman, > > What's the status of this patchset? And its merging plan? > > Best Regards, > Huang, Ying I have broken the patch into 3 parts (0/1/2) and rewritten some of them. Part 0 has been merged into tip. Parts 1 and 2 are still under testing.

Re: [PATCH RFC 0/5] cpu/speculation: Add 'cpu_spec_mitigations=' cmdline options

2019-04-04 Thread Waiman Long
On 04/04/2019 12:44 PM, Josh Poimboeuf wrote: > Keeping track of the number of mitigations for all the CPU speculation > bugs has become overwhelming for many users. It's getting more and more > complicated to decide which mitigations are needed for a given > architecture. Complicating matters

Re: [PATCH v5 1/3] locking/rwsem: Remove arch specific rwsem files

2019-03-22 Thread Waiman Long
On 03/22/2019 03:30 PM, Davidlohr Bueso wrote: > On Fri, 22 Mar 2019, Linus Torvalds wrote: >> Some of them _might_ be performance-critical. There's the one on >> mmap_sem in the fault handling path, for example. And yes, I'd expect >> the normal case to very much be "no other readers or writers"

Re: [PATCH v5 3/3] locking/rwsem: Optimize down_read_trylock()

2019-03-22 Thread Waiman Long
On 03/22/2019 01:25 PM, Russell King - ARM Linux admin wrote: > On Fri, Mar 22, 2019 at 10:30:08AM -0400, Waiman Long wrote: >> Modify __down_read_trylock() to optimize for an unlocked rwsem and make >> it generate slightly better code. >> >> Before th

Re: [PATCH v5 1/3] locking/rwsem: Remove arch specific rwsem files

2019-03-22 Thread Waiman Long
On 03/22/2019 01:01 PM, Linus Torvalds wrote: > On Fri, Mar 22, 2019 at 7:30 AM Waiman Long wrote: >> 19 files changed, 133 insertions(+), 930 deletions(-) > Lovely. And it all looks sane to me. > > So ack. > > The only comment I have is about __down_read_trylock()

[PATCH v5 3/3] locking/rwsem: Optimize down_read_trylock()

2019-03-22 Thread Waiman Long
case (1 thread), the new down_read_trylock() is a little bit faster. For the contended cases, the new down_read_trylock() performs pretty well on x86-64, but performance degrades at high contention levels on ARM64. Suggested-by: Linus Torvalds Signed-off-by: Waiman Long --- kernel/locking/rwse

[PATCH v5 2/3] locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all archs

2019-03-22 Thread Waiman Long
-spinlock.c and make all architectures use a single implementation of rwsem - rwsem-xadd.c. All references to RWSEM_GENERIC_SPINLOCK and RWSEM_XCHGADD_ALGORITHM in the code are removed. Suggested-by: Peter Zijlstra Signed-off-by: Waiman Long --- arch/alpha/Kconfig | 7 - arch/arc/Kconfig

[PATCH v5 1/3] locking/rwsem: Remove arch specific rwsem files

2019-03-22 Thread Waiman Long
to access the internal rwsem macros and functions. Signed-off-by: Waiman Long --- MAINTAINERS | 1 - arch/alpha/include/asm/rwsem.h | 211 arch/arm/include/asm/Kbuild | 1 - arch/arm64/include/asm/Kbuild | 1 - arch/hexagon/include/asm

[PATCH v5 0/3] locking/rwsem: Rwsem rearchitecture part 0

2019-03-22 Thread Waiman Long
the architectures use one single implementation of rwsem - rwsem-xadd.c. Waiman Long (3): locking/rwsem: Remove arch specific rwsem files locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all archs locking/rwsem: Optimize down_read_trylock() MAINTAI

Re: [PATCH v4 3/3] locking/rwsem: Optimize down_read_trylock()

2019-02-21 Thread Waiman Long
On 02/21/2019 09:14 AM, Will Deacon wrote: > On Wed, Feb 13, 2019 at 05:00:17PM -0500, Waiman Long wrote: >> Modify __down_read_trylock() to optimize for an unlocked rwsem and make >> it generate slightly better code. >> >> Before this patch, down_read_trylock: >

Re: [PATCH v4 0/3] locking/rwsem: Rwsem rearchitecture part 0

2019-02-15 Thread Waiman Long
On 02/15/2019 01:40 PM, Will Deacon wrote: > On Thu, Feb 14, 2019 at 11:37:15AM +0100, Peter Zijlstra wrote: >> On Wed, Feb 13, 2019 at 05:00:14PM -0500, Waiman Long wrote: >>> v4: >>> - Remove rwsem-spinlock.c and make all archs use rwsem-xadd.c. >>> >>

Re: [PATCH v4 0/3] locking/rwsem: Rwsem rearchitecture part 0

2019-02-14 Thread Waiman Long
On 02/14/2019 05:37 AM, Peter Zijlstra wrote: > On Wed, Feb 13, 2019 at 05:00:14PM -0500, Waiman Long wrote: >> v4: >> - Remove rwsem-spinlock.c and make all archs use rwsem-xadd.c. >> >> v3: >> - Optimize __down_read_trylock() for the uncontended case as s

Re: [PATCH v3 2/2] locking/rwsem: Optimize down_read_trylock()

2019-02-14 Thread Waiman Long
On 02/14/2019 01:02 PM, Will Deacon wrote: > On Thu, Feb 14, 2019 at 11:33:33AM +0100, Peter Zijlstra wrote: >> On Wed, Feb 13, 2019 at 03:32:12PM -0500, Waiman Long wrote: >>> Modify __down_read_trylock() to optimize for an unlocked rwsem and make >>> it ge

Re: [PATCH 03/11] kernel/locks: consolidate RWSEM_GENERIC_* options

2019-02-14 Thread Waiman Long
On 02/14/2019 12:04 PM, Christoph Hellwig wrote: > On Thu, Feb 14, 2019 at 10:26:52AM -0500, Waiman Long wrote: >> Would you mind dropping just patch 3 from your series? > Sure, we can just drop this patch. Thanks, Longman

Re: [PATCH 03/11] kernel/locks: consolidate RWSEM_GENERIC_* options

2019-02-14 Thread Waiman Long
On 02/14/2019 05:52 AM, Geert Uytterhoeven wrote: > On Thu, Feb 14, 2019 at 12:08 AM Christoph Hellwig wrote: >> Introduce one central definition of RWSEM_XCHGADD_ALGORITHM and >> RWSEM_GENERIC_SPINLOCK in kernel/Kconfig.locks and let architectures >> select RWSEM_XCHGADD_ALGORITHM if they want

Re: [PATCH-tip 00/22] locking/rwsem: Rework rwsem-xadd & enable new rwsem features

2019-02-14 Thread Waiman Long
On 02/14/2019 08:23 AM, Davidlohr Bueso wrote: > On Fri, 08 Feb 2019, Waiman Long wrote: >> I am planning to run more performance test and post the data sometimes >> next week. Davidlohr is also going to run some of his rwsem performance >> test on this patchset. > > S

Re: [PATCH v3 2/2] locking/rwsem: Optimize down_read_trylock()

2019-02-14 Thread Waiman Long
On 02/14/2019 05:33 AM, Peter Zijlstra wrote: > On Wed, Feb 13, 2019 at 03:32:12PM -0500, Waiman Long wrote: >> Modify __down_read_trylock() to optimize for an unlocked rwsem and make >> it generate slightly better code. >> >> Before this patch, down_read_trylock: >

[PATCH v4 3/3] locking/rwsem: Optimize down_read_trylock()

2019-02-13 Thread Waiman Long
case (1 thread), the new down_read_trylock() is a little bit faster. For the contended cases, the new down_read_trylock() performs pretty well on x86-64, but performance degrades at high contention levels on ARM64. Suggested-by: Linus Torvalds Signed-off-by: Waiman Long --- kernel/locking/rw

[PATCH v4 2/3] locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all archs

2019-02-13 Thread Waiman Long
-spinlock.c and make all architectures use a single implementation of rwsem - rwsem-xadd.c. All references to RWSEM_GENERIC_SPINLOCK and RWSEM_XCHGADD_ALGORITHM in the code are removed. Suggested-by: Peter Zijlstra Signed-off-by: Waiman Long --- arch/alpha/Kconfig | 7 - arch/arc/Kconfig

[PATCH v4 1/3] locking/rwsem: Remove arch specific rwsem files

2019-02-13 Thread Waiman Long
to access the internal rwsem macros and functions. Signed-off-by: Waiman Long --- MAINTAINERS | 1 - arch/alpha/include/asm/rwsem.h | 211 --- arch/arm/include/asm/Kbuild | 1 - arch/arm64/include/asm/Kbuild | 1 - arch/hexagon/include

[PATCH v4 0/3] locking/rwsem: Rwsem rearchitecture part 0

2019-02-13 Thread Waiman Long
of this patchset is to remove the architecture specific files for rwsem-xadd to make it easier to add enhancements in the later rwsem patches. It also removes the legacy rwsem-spinlock.c file and makes all the architectures use one single implementation of rwsem - rwsem-xadd.c. Waiman Long (3): locking

[PATCH v3 2/2] locking/rwsem: Optimize down_read_trylock()

2019-02-13 Thread Waiman Long
case (1 thread), the new down_read_trylock() is a little bit faster. For the contended cases, the new down_read_trylock() performs pretty well on x86-64, but performance degrades at high contention levels on ARM64. Suggested-by: Linus Torvalds Signed-off-by: Waiman Long --- kernel/locking/rw

[PATCH v3 1/2] locking/rwsem: Remove arch specific rwsem files

2019-02-13 Thread Waiman Long
to access the internal rwsem macros and functions. Signed-off-by: Waiman Long --- MAINTAINERS | 1 - arch/alpha/include/asm/rwsem.h | 211 --- arch/arm/include/asm/Kbuild | 1 - arch/arm64/include/asm/Kbuild | 1 - arch/hexagon/include

[PATCH v3 0/2] locking/rwsem: Remove arch specific rwsem files

2019-02-13 Thread Waiman Long
patch 2 for arm64. Waiman Long (2): locking/rwsem: Remove arch specific rwsem files locking/rwsem: Optimize down_read_trylock() MAINTAINERS | 1 - arch/alpha/include/asm/rwsem.h | 211 --- arch/arm/include/asm/Kbuild | 1 - arch/arm64/include

Re: [PATCH v2 2/2] locking/rwsem: Optimize down_read_trylock()

2019-02-13 Thread Waiman Long
On 02/13/2019 02:45 AM, Ingo Molnar wrote: > * Waiman Long wrote: > >> I looked at the assembly code in arch/x86/include/asm/rwsem.h. For both >> trylocks (read & write), the count is read first before attempting to >> lock it. We did the same for all tryl

Re: [PATCH v2 2/2] locking/rwsem: Optimize down_read_trylock()

2019-02-12 Thread Waiman Long
On 02/12/2019 02:58 PM, Linus Torvalds wrote: > On Mon, Feb 11, 2019 at 11:31 AM Waiman Long wrote: >> Modify __down_read_trylock() to make it generate slightly better code >> (smaller and maybe a tiny bit faster). > This looks good, but I would ask you to try one slightly

Re: [PATCH v2 2/2] locking/rwsem: Optimize down_read_trylock()

2019-02-12 Thread Waiman Long
On 02/12/2019 01:36 PM, Waiman Long wrote: > On 02/12/2019 08:25 AM, Peter Zijlstra wrote: >> On Tue, Feb 12, 2019 at 02:24:04PM +0100, Peter Zijlstra wrote: >>> On Mon, Feb 11, 2019 at 02:31:26PM -0500, Waiman Long wrote: >>>> Modify __down_read_trylock() to make it

Re: [PATCH v2 2/2] locking/rwsem: Optimize down_read_trylock()

2019-02-12 Thread Waiman Long
On 02/12/2019 08:25 AM, Peter Zijlstra wrote: > On Tue, Feb 12, 2019 at 02:24:04PM +0100, Peter Zijlstra wrote: >> On Mon, Feb 11, 2019 at 02:31:26PM -0500, Waiman Long wrote: >>> Modify __down_read_trylock() to make it generate slightly better code >>> (smaller

[PATCH v2 2/2] locking/rwsem: Optimize down_read_trylock()

2019-02-11 Thread Waiman Long
1 27,787 28,259 28,359 9,234 On an ARM64 system, the performance results were: Before Patch   After Patch # of Threads rlock rlock - - 1 24,155

[PATCH v2 1/2] locking/rwsem: Remove arch specific rwsem files

2019-02-11 Thread Waiman Long
to access the internal rwsem macros and functions. Signed-off-by: Waiman Long --- MAINTAINERS | 1 - arch/alpha/include/asm/rwsem.h | 211 --- arch/arm/include/asm/Kbuild | 1 - arch/arm64/include/asm/Kbuild | 1 - arch/hexagon/include

[PATCH v2 0/2] locking/rwsem: Remove arch specific rwsem files

2019-02-11 Thread Waiman Long
platforms that I can test on (arm64 & ppc) are both using the generic C code, so the rwsem performance shouldn't be affected by this patch except for the down_read_trylock() code, which was included in patch 2 for arm64. Waiman Long (2): locking/rwsem: Remove arch specific rwsem files locking/r

Re: [PATCH] locking/rwsem: Remove arch specific rwsem files

2019-02-11 Thread Waiman Long
On 02/11/2019 06:58 AM, Peter Zijlstra wrote: > Which is clearly worse. Now we can write that as: > > int __down_read_trylock2(unsigned long *l) > { > long tmp = READ_ONCE(*l); > > while (tmp >= 0) { > if (try_cmpxchg(l, &tmp, tmp + 1)) >
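
A standalone C11 analogue of the loop shape above (names and the simple +1 reader bias are illustrative, not the kernel's): read the count once, and only retry the cmpxchg while no writer is present, so the uncontended case costs a single compare-and-swap:

	#include <stdatomic.h>
	#include <stdio.h>

	static atomic_long count;	/* >= 0: number of readers, < 0: writer holds it */

	static int read_trylock(void)
	{
		long tmp = atomic_load_explicit(&count, memory_order_relaxed);

		while (tmp >= 0) {
			/* on failure, tmp is reloaded with the current count */
			if (atomic_compare_exchange_weak_explicit(&count, &tmp, tmp + 1,
					memory_order_acquire, memory_order_relaxed))
				return 1;	/* got a read lock */
		}
		return 0;			/* writer present, give up */
	}

	int main(void)
	{
		printf("unlocked:     %d\n", read_trylock());	/* 1 */
		atomic_store(&count, -1);			/* pretend a writer holds it */
		printf("write-locked: %d\n", read_trylock());	/* 0 */
		return 0;
	}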

Re: [PATCH] locking/rwsem: Remove arch specific rwsem files

2019-02-11 Thread Waiman Long
On 02/11/2019 05:39 AM, Ingo Molnar wrote: > * Ingo Molnar wrote: > >> Sounds good to me - I've merged this patch, will push it out after >> testing. > Based on Peter's feedback I'm delaying this - performance testing on at > least one key ll/sc arch would be nice indeed. > > Thanks, > >

Re: [PATCH] locking/rwsem: Remove arch specific rwsem files

2019-02-10 Thread Waiman Long
On 02/10/2019 09:00 PM, Waiman Long wrote: > As the generic rwsem-xadd code is using the appropriate acquire and > release versions of the atomic operations, the arch specific rwsem.h > files will not be that much faster than the generic code as long as the > atomic functions

[PATCH] locking/rwsem: Remove arch specific rwsem files

2019-02-10 Thread Waiman Long
/locking needs to access the internal rwsem macros and functions. Signed-off-by: Waiman Long --- MAINTAINERS | 1 - arch/alpha/include/asm/rwsem.h | 211 --- arch/arm/include/asm/Kbuild | 1 - arch/arm64/include/asm/Kbuild | 1 - arch

Re: [PATCH-tip 00/22] locking/rwsem: Rework rwsem-xadd & enable new rwsem features

2019-02-08 Thread Waiman Long
On 02/08/2019 02:50 PM, Linus Torvalds wrote: > On Thu, Feb 7, 2019 at 11:08 AM Waiman Long wrote: >> This patchset revamps the current rwsem-xadd implementation to make >> it saner and easier to work with. This patchset removes all the >> architecture specific assembly co

Re: [PATCH-tip 15/22] locking/rwsem: Merge owner into count on x86-64

2019-02-08 Thread Waiman Long
On 02/07/2019 03:54 PM, Waiman Long wrote: > On 02/07/2019 03:08 PM, Peter Zijlstra wrote: >> On Thu, Feb 07, 2019 at 02:07:19PM -0500, Waiman Long wrote: >>> On 32-bit architectures, there aren't enough bits to hold both. >>> 64-bit architectures, however,

Re: [PATCH-tip 15/22] locking/rwsem: Merge owner into count on x86-64

2019-02-07 Thread Waiman Long
On 02/07/2019 03:08 PM, Peter Zijlstra wrote: > On Thu, Feb 07, 2019 at 02:07:19PM -0500, Waiman Long wrote: >> On 32-bit architectures, there aren't enough bits to hold both. >> 64-bit architectures, however, can have enough bits to do that. For >> x86-64, the physical add

Re: [PATCH-tip 00/22] locking/rwsem: Rework rwsem-xadd & enable new rwsem features

2019-02-07 Thread Waiman Long
On 02/07/2019 02:51 PM, Davidlohr Bueso wrote: > On Thu, 07 Feb 2019, Waiman Long wrote: >> 30 files changed, 1197 insertions(+), 1594 deletions(-) > > Performance numbers on numerous workloads, pretty please. > > I'll go and throw this at my mmap_sem intensive workl

Re: [PATCH-tip 15/22] locking/rwsem: Merge owner into count on x86-64

2019-02-07 Thread Waiman Long
On 02/07/2019 02:45 PM, Peter Zijlstra wrote: > On Thu, Feb 07, 2019 at 02:07:19PM -0500, Waiman Long wrote: >> On 32-bit architectures, there aren't enough bits to hold both. >> 64-bit architectures, however, can have enough bits to do that. For >> x86-64, the physical add

Re: [PATCH-tip 04/22] locking/rwsem: Remove arch specific rwsem files

2019-02-07 Thread Waiman Long
On 02/07/2019 02:36 PM, Peter Zijlstra wrote: > On Thu, Feb 07, 2019 at 02:07:08PM -0500, Waiman Long wrote: > >> +static inline int __down_read_trylock(struct rw_semaphore *sem) >> +{ >> +long tmp; >> + >> +while ((tmp = atomic_long_read(&sem->count)

[PATCH-tip 22/22] locking/rwsem: Ensure an RT task will not spin on reader

2019-02-07 Thread Waiman Long
to deadlock. So we have to make sure that an RT task will not spin on a reader-owned rwsem. Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index

[PATCH-tip 21/22] locking/rwsem: Wake up all readers in wait queue

2019-02-07 Thread Waiman Long
769 5,216 At low contention level, there is a slight drop in performance. At high contention level, however, this patch gives a big performance boost. Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git

[PATCH-tip 20/22] locking/rwsem: Enable count-based spinning on reader

2019-02-07 Thread Waiman Long
in performance for mixed reader/writer workloads. Signed-off-by: Waiman Long --- kernel/locking/lock_events_list.h | 1 + kernel/locking/rwsem-xadd.c | 63 +++ kernel/locking/rwsem-xadd.h | 45 +--- 3 files changed, 94 insertions

[PATCH-tip 19/22] locking/rwsem: Enable readers spinning on writer

2019-02-07 Thread Waiman Long
Signed-off-by: Waiman Long --- kernel/locking/lock_events_list.h | 1 + kernel/locking/rwsem-xadd.c | 80 ++- kernel/locking/rwsem-xadd.h | 3 ++ 3 files changed, 74 insertions(+), 10 deletions(-) diff --git a/kernel/locking/lock_events_list.h b/kernel

[PATCH-tip 18/22] locking/rwsem: Make rwsem_spin_on_owner() return a tri-state value

2019-02-07 Thread Waiman Long
This patch modifies rwsem_spin_on_owner() to return a tri-state value to better reflect the state of the lock holder, which enables us to make a better decision about what to do next. Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 25 +++-- kernel/locking/rwsem-xadd.h

[PATCH-tip 17/22] locking/rwsem: Recheck owner if it is not on cpu

2019-02-07 Thread Waiman Long
Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 20 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index 16dc7a1..21d462f 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem

[PATCH-tip 16/22] locking/rwsem: Remove redundant computation of writer lock word

2019-02-07 Thread Waiman Long
, the extra constant argument to rwsem_try_write_lock() and rwsem_try_write_lock_unqueued() should be optimized out by the compiler. Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 22 -- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/kernel

[PATCH-tip 15/22] locking/rwsem: Merge owner into count on x86-64

2019-02-07 Thread Waiman Long
on both read and write lock performance. Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 20 +++-- kernel/locking/rwsem-xadd.h | 105 +++- 2 files changed, 110 insertions(+), 15 deletions(-) diff --git a/kernel/locking/rwsem-xadd.c b

[PATCH-tip 14/22] locking/rwsem: Add more rwsem owner access helpers

2019-02-07 Thread Waiman Long
Before combining owner and count, we are adding two new helpers for accessing the owner value in the rwsem. 1) struct task_struct *rwsem_get_owner(struct rw_semaphore *sem) 2) bool is_rwsem_reader_owned(struct rw_semaphore *sem) Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 11

[PATCH-tip 13/22] locking/rwsem: Remove rwsem_wake() wakeup optimization

2019-02-07 Thread Waiman Long
()/up_write()") will have to be reverted. Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 74 - 1 file changed, 74 deletions(-) diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index 12b1d61..5f74bae 100644 --- a/ker

[PATCH-tip 12/22] locking/rwsem: Implement lock handoff to prevent lock starvation

2019-02-07 Thread Waiman Long
became much more fair, though there was a slight drop of about 6% in the mean locking operations done, a tradeoff for the better fairness. Signed-off-by: Waiman Long --- kernel/locking/lock_events_list.h | 2 + kernel/locking/rwsem-xadd.c | 110

[PATCH-tip 11/22] locking/rwsem: Implement a new locking scheme

2019-02-07 Thread Waiman Long
Signed-off-by: Waiman Long --- kernel/locking/rwsem-xadd.c | 145 +++- kernel/locking/rwsem-xadd.h | 85 +- 2 files changed, 89 insertions(+), 141 deletions(-) diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index

[PATCH-tip 10/22] locking/rwsem: Enable lock event counting

2019-02-07 Thread Waiman Long
after sleeping. Signed-off-by: Waiman Long --- arch/Kconfig | 2 +- kernel/locking/lock_events_list.h | 17 + kernel/locking/rwsem-xadd.c | 12 3 files changed, 30 insertions(+), 1 deletion(-) diff --git a/arch/Kconfig b/arch/Kconfig index
