Re: [RFC][PATCH 3/3] locking/qspinlock: Optimize for x86

2018-10-02 Thread Peter Zijlstra
On Tue, Oct 02, 2018 at 02:19:53PM +0100, Will Deacon wrote: > On Mon, Oct 01, 2018 at 10:00:28PM +0200, Peter Zijlstra wrote: > > Let me draw a picture of that.. > > > > > > CPU0 CPU1 CPU2 CPU3 > > > > 0) lock > >

Re: [RFC][PATCH 3/3] locking/qspinlock: Optimize for x86

2018-10-02 Thread Andrea Parri
On Tue, Oct 02, 2018 at 02:22:09PM +0100, Will Deacon wrote: > On Tue, Oct 02, 2018 at 02:31:52PM +0200, Andrea Parri wrote: > > > consider this scenario with your patch: > > > > > > 1. CPU0 sees a locked val, and is about to do your xchg_relaxed() to set > > > pending. > > > > > > 2. CPU1

Re: [RFC][PATCH 3/3] locking/qspinlock: Optimize for x86

2018-10-02 Thread Will Deacon
On Tue, Oct 02, 2018 at 02:31:52PM +0200, Andrea Parri wrote: > > consider this scenario with your patch: > > > > 1. CPU0 sees a locked val, and is about to do your xchg_relaxed() to set > > pending. > > > > 2. CPU1 comes in and sets pending, spins on locked > > > > 3. CPU2 sees a pending

Re: [RFC][PATCH 3/3] locking/qspinlock: Optimize for x86

2018-10-02 Thread Will Deacon
On Mon, Oct 01, 2018 at 10:00:28PM +0200, Peter Zijlstra wrote: > On Mon, Oct 01, 2018 at 06:17:00PM +0100, Will Deacon wrote: > > Thanks for chewing up my afternoon ;) > > I'll get you a beer in EDI ;-) Just one?! > > But actually, > > consider this scenario with your patch: > > > > 1. CPU0

Re: [RFC][PATCH 3/3] locking/qspinlock: Optimize for x86

2018-10-02 Thread Andrea Parri
> consider this scenario with your patch: > > 1. CPU0 sees a locked val, and is about to do your xchg_relaxed() to set > pending. > > 2. CPU1 comes in and sets pending, spins on locked > > 3. CPU2 sees a pending and locked val, and is about to enter the head of > the waitqueue (i.e. it's

Re: [RFC][PATCH 3/3] locking/qspinlock: Optimize for x86

2018-10-01 Thread Peter Zijlstra
On Mon, Oct 01, 2018 at 06:17:00PM +0100, Will Deacon wrote: > Hi Peter, > > Thanks for chewing up my afternoon ;) I'll get you a beer in EDI ;-) > On Wed, Sep 26, 2018 at 01:01:20PM +0200, Peter Zijlstra wrote: > > /** > > + * set_pending_fetch_acquire - fetch the whole lock value and set

Re: [RFC][PATCH 3/3] locking/qspinlock: Optimize for x86

2018-10-01 Thread Will Deacon
Hi Peter, Thanks for chewing up my afternoon ;) On Wed, Sep 26, 2018 at 01:01:20PM +0200, Peter Zijlstra wrote: > On x86 we cannot do fetch_or with a single instruction and end up > using a cmpxchg loop, this reduces determinism. Replace the fetch_or > with a very tricky composite xchg8 + load.

RE: [RFC][PATCH 3/3] locking/qspinlock: Optimize for x86

2018-09-27 Thread David Laight
From: Peter Zijlstra > Sent: 26 September 2018 12:01 > > On x86 we cannot do fetch_or with a single instruction and end up > using a cmpxchg loop, this reduces determinism. Replace the fetch_or > with a very tricky composite xchg8 + load. > > The basic idea is that we use xchg8 to test-and-set

Re: [RFC][PATCH 3/3] locking/qspinlock: Optimize for x86

2018-09-27 Thread Peter Zijlstra
On Thu, Sep 27, 2018 at 10:13:15AM +0200, Andrea Parri wrote: > On Thu, Sep 27, 2018 at 09:59:35AM +0200, Peter Zijlstra wrote: > > On Thu, Sep 27, 2018 at 09:47:48AM +0200, Andrea Parri wrote: > > > > LKMM in particular does _NOT_ deal with mixed sized atomics _at_all_. > > > > > > True, but it

Re: [RFC][PATCH 3/3] locking/qspinlock: Optimize for x86

2018-09-27 Thread Andrea Parri
On Thu, Sep 27, 2018 at 09:59:35AM +0200, Peter Zijlstra wrote: > On Thu, Sep 27, 2018 at 09:47:48AM +0200, Andrea Parri wrote: > > > LKMM in particular does _NOT_ deal with mixed sized atomics _at_all_. > > > > True, but it is nothing conceptually new to deal with: there're Cat > > models that

Re: [RFC][PATCH 3/3] locking/qspinlock: Optimize for x86

2018-09-27 Thread Peter Zijlstra
On Thu, Sep 27, 2018 at 09:47:48AM +0200, Andrea Parri wrote: > > LKMM in particular does _NOT_ deal with mixed sized atomics _at_all_. > > True, but it is nothing conceptually new to deal with: there're Cat > models that handle mixed-size accesses, just give it time. Sure, but until that time I

Re: [RFC][PATCH 3/3] locking/qspinlock: Optimize for x86

2018-09-27 Thread Andrea Parri
On Thu, Sep 27, 2018 at 09:17:47AM +0200, Peter Zijlstra wrote: > On Wed, Sep 26, 2018 at 10:52:08PM +0200, Andrea Parri wrote: > > On Wed, Sep 26, 2018 at 01:01:20PM +0200, Peter Zijlstra wrote: > > > On x86 we cannot do fetch_or with a single instruction and end up > > > using a cmpxchg loop,

Re: [RFC][PATCH 3/3] locking/qspinlock: Optimize for x86

2018-09-27 Thread Peter Zijlstra
On Wed, Sep 26, 2018 at 07:54:18PM +0200, Peter Zijlstra wrote: > On Wed, Sep 26, 2018 at 12:30:36PM -0400, Waiman Long wrote: > > On 09/26/2018 07:01 AM, Peter Zijlstra wrote: > > > On x86 we cannot do fetch_or with a single instruction and end up > > > using a cmpxchg loop, this reduces

Re: [RFC][PATCH 3/3] locking/qspinlock: Optimize for x86

2018-09-27 Thread Peter Zijlstra
On Wed, Sep 26, 2018 at 10:52:08PM +0200, Andrea Parri wrote: > On Wed, Sep 26, 2018 at 01:01:20PM +0200, Peter Zijlstra wrote: > > On x86 we cannot do fetch_or with a single instruction and end up > > using a cmpxchg loop, this reduces determinism. Replace the fetch_or > > with a very tricky

Re: [RFC][PATCH 3/3] locking/qspinlock: Optimize for x86

2018-09-26 Thread Andrea Parri
On Wed, Sep 26, 2018 at 01:01:20PM +0200, Peter Zijlstra wrote: > On x86 we cannot do fetch_or with a single instruction and end up > using a cmpxchg loop, this reduces determinism. Replace the fetch_or > with a very tricky composite xchg8 + load. > > The basic idea is that we use xchg8 to

Re: [RFC][PATCH 3/3] locking/qspinlock: Optimize for x86

2018-09-26 Thread Peter Zijlstra
On Wed, Sep 26, 2018 at 12:30:36PM -0400, Waiman Long wrote: > On 09/26/2018 07:01 AM, Peter Zijlstra wrote: > > On x86 we cannot do fetch_or with a single instruction and end up > > using a cmpxchg loop, this reduces determinism. Replace the fetch_or > > with a very tricky composite xchg8 + load.

Re: [RFC][PATCH 3/3] locking/qspinlock: Optimize for x86

2018-09-26 Thread Waiman Long
On 09/26/2018 07:01 AM, Peter Zijlstra wrote: > On x86 we cannot do fetch_or with a single instruction and end up > using a cmpxchg loop, this reduces determinism. Replace the fetch_or > with a very tricky composite xchg8 + load. > > The basic idea is that we use xchg8 to test-and-set the pending
