Re: [RFC] Cancellable MCS spinlock rework

2014-07-08 Thread Peter Zijlstra
On Thu, Jul 03, 2014 at 06:07:23PM -0700, Jason Low wrote:
> On Thu, 2014-07-03 at 16:35 -0400, Waiman Long wrote:
> 
> > I do see a point in reducing the size of the rwsem structure. However, I 
> > don't quite understand the point of converting pointers in the 
> > optimistic_spin_queue structure to atomic_t. The structure is cacheline 
> > aligned and there is no saving in size. Converting them to atomic_t does 
> > have a bit of additional overhead of converting the encoded cpu number 
> > back to the actual pointer.
> > 
> > So my suggestion is to just change what is stored in the mutex and rwsem 
> > structure to atomic_t, but keep the pointers in the 
> > optimistic_spin_queue structure.
> 
> Peter, would you prefer going with the above?

Yeah..

> If we were to keep the pointers to the next and prev nodes in the struct
> optimistic_spin_queue instead of converting them to atomic_t to store
> their cpu #, we'd still need to keep track of the cpu #. In the unqueue
> phase of osq_lock, we might have to reload prev = node->prev, which we
> may then cmpxchg() with the lock tail.
> 
> The method we can think of so far would be to add a regular int variable
> to optimistic_spin_queue and initialize it to the CPU #, at the same time
> we initialize node->locked and node->next at the beginning of
> osq_lock. The cost wouldn't be much of an issue since
> optimistic_spin_queue is cache aligned.

Right, there's actually a 4 byte hole in there aside from the full
cacheline alignment, so yeah, tons of space.
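
To make the proposal concrete: on a 64-bit kernel the per-cpu node being
discussed lays out roughly as follows (a sketch, not the exact source),
and the proposed cpu field slots straight into the existing padding:

struct optimistic_spin_queue {
	struct optimistic_spin_queue *next;	/* 8 bytes, offset  0 */
	struct optimistic_spin_queue *prev;	/* 8 bytes, offset  8 */
	int locked;				/* 4 bytes, offset 16 */
	int cpu;				/* proposed; fills the 4-byte
						   hole at offset 20 */
};	/* the per-cpu copies are cacheline aligned, so 64 bytes either way */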


Re: [RFC] Cancellable MCS spinlock rework

2014-07-07 Thread Jason Low
On Fri, 2014-07-04 at 09:51 +0200, Peter Zijlstra wrote:
> On Thu, Jul 03, 2014 at 06:07:23PM -0700, Jason Low wrote:
> > On Thu, 2014-07-03 at 16:35 -0400, Waiman Long wrote:
> > 
> > > I do see a point in reducing the size of the rwsem structure. However, I 
> > > don't quite understand the point of converting pointers in the 
> > > optimistic_spin_queue structure to atomic_t. The structure is cacheline 
> > > aligned and there is no saving in size. Converting them to atomic_t does 
> > > have a bit of additional overhead of converting the encoded cpu number 
> > > back to the actual pointer.
> > > 
> > > So my suggestion is to just change what is stored in the mutex and rwsem 
> > > structure to atomic_t, but keep the pointers in the 
> > > optimistic_spin_queue structure.
> > 
> > Peter, would you prefer going with the above?
> > 
> > If we were to keep the pointers to the next and prev nodes in the struct
> > optimistic_spin_queue instead of converting them to atomic_t to store
> > their cpu #, we'd still need to keep track of the cpu #. In the unqueue
> > phase of osq_lock, we might have to reload prev = node->prev which we
> > then may cmpxchg() it with the lock tail.
> > 
> > The method we can think of so far would be to add a regular int variable
> > to optimistic_spin_queue and initialize it to the CPU #, during the time
> > we also initialize node->locked and node->next at the beginning of
> > osq_lock. The cost wouldn't be much of an issue since
> > optimistic_spin_queue is cache aligned.
> 
> Let me try and have an actual look at the patch;

Okay, I will be sending out the patchset I had so that there's something
more concrete.


Re: [RFC] Cancellable MCS spinlock rework

2014-07-04 Thread Peter Zijlstra
On Thu, Jul 03, 2014 at 06:07:23PM -0700, Jason Low wrote:
> On Thu, 2014-07-03 at 16:35 -0400, Waiman Long wrote:
> 
> > I do see a point in reducing the size of the rwsem structure. However, I 
> > don't quite understand the point of converting pointers in the 
> > optimistic_spin_queue structure to atomic_t. The structure is cacheline 
> > aligned and there is no saving in size. Converting them to atomic_t does 
> > have a bit of additional overhead of converting the encoded cpu number 
> > back to the actual pointer.
> > 
> > So my suggestion is to just change what is stored in the mutex and rwsem 
> > structure to atomic_t, but keep the pointers in the 
> > optimistic_spin_queue structure.
> 
> Peter, would you prefer going with the above?
> 
> If we were to keep the pointers to the next and prev nodes in the struct
> optimistic_spin_queue instead of converting them to atomic_t to store
> their cpu #, we'd still need to keep track of the cpu #. In the unqueue
> phase of osq_lock, we might have to reload prev = node->prev which we
> then may cmpxchg() it with the lock tail.
> 
> The method we can think of so far would be to add a regular int variable
> to optimistic_spin_queue and initialize it to the CPU #, during the time
> we also initialize node->locked and node->next at the beginning of
> osq_lock. The cost wouldn't be much of an issue since
> optimistic_spin_queue is cache aligned.

Let me try and have an actual look at the patch; I'm in the tail end of
a flu and my head isn't quite set for details, but I'll buckle up and go
look, gimme a few :-)


Re: [RFC] Cancellable MCS spinlock rework

2014-07-04 Thread Peter Zijlstra
On Thu, Jul 03, 2014 at 01:51:48PM -0700, Jason Low wrote:
> On Thu, 2014-07-03 at 16:35 -0400, Waiman Long wrote:
> > On 07/03/2014 02:34 PM, Jason Low wrote:
> > > On Thu, 2014-07-03 at 10:09 -0700, Davidlohr Bueso wrote:
> > >> On Thu, 2014-07-03 at 09:31 +0200, Peter Zijlstra wrote:
> > >>> On Wed, Jul 02, 2014 at 10:30:03AM -0700, Jason Low wrote:
> >  Would potentially reducing the size of the rw semaphore structure by 32
> >  bits (for all architectures using optimistic spinning) be a nice
> >  benefit?
> > >>> Possibly, although I had a look at the mutex structure and we didn't
> > >>> have a hole to place it in, unlike what you found with the rwsem.
> > >> Yeah, and currently struct rw_semaphore is the largest lock we have in
> > >> the kernel. Shaving off space is definitely welcome.
> > > Right, especially if it could help things like xfs inode.
> > >
> > 
> > I do see a point in reducing the size of the rwsem structure. However, I 
> > don't quite understand the point of converting pointers in the 
> > optimistic_spin_queue structure to atomic_t.
> 
> Converting the pointers in the optimistic_spin_queue to atomic_t would
> mean we're fully operating on atomic operations instead of using the
> potentially racy cmpxchg + ACCESS_ONCE stores on the pointers.
> 
> If we're in the process of using the CPU numbers in atomic_t, I thought
> we might as well fix that as well since it has actually been shown to
> result in lockups on some architectures. We can then avoid needing to
> implement the tricky architecture workarounds for optimistic spinning.
> Wouldn't that be a "nice-have"?

Nah, I think those archs are fundamentally broken at the moment; we
should not make code harder to read and/or more complex just for them.



Re: [RFC] Cancellable MCS spinlock rework

2014-07-03 Thread Jason Low
On Thu, 2014-07-03 at 16:35 -0400, Waiman Long wrote:

> I do see a point in reducing the size of the rwsem structure. However, I 
> don't quite understand the point of converting pointers in the 
> optimistic_spin_queue structure to atomic_t. The structure is cacheline 
> aligned and there is no saving in size. Converting them to atomic_t does 
> have a bit of additional overhead of converting the encoded cpu number 
> back to the actual pointer.
> 
> So my suggestion is to just change what is stored in the mutex and rwsem 
> structure to atomic_t, but keep the pointers in the 
> optimistic_spin_queue structure.

Peter, would you prefer going with the above?

If we were to keep the pointers to the next and prev nodes in the struct
optimistic_spin_queue instead of converting them to atomic_t to store
their cpu #, we'd still need to keep track of the cpu #. In the unqueue
phase of osq_lock, we might have to reload prev = node->prev, which we
may then cmpxchg() with the lock tail.

The method we can think of so far would be to add a regular int variable
to optimistic_spin_queue and initialize it to the CPU #, at the same time
we initialize node->locked and node->next at the beginning of
osq_lock. The cost wouldn't be much of an issue since
optimistic_spin_queue is cache aligned.


Re: [RFC] Cancellable MCS spinlock rework

2014-07-03 Thread Jason Low
On Thu, 2014-07-03 at 17:35 -0400, Waiman Long wrote:
> On 07/03/2014 04:51 PM, Jason Low wrote:
> > On Thu, 2014-07-03 at 16:35 -0400, Waiman Long wrote:
> >> On 07/03/2014 02:34 PM, Jason Low wrote:
> >>> On Thu, 2014-07-03 at 10:09 -0700, Davidlohr Bueso wrote:
>  On Thu, 2014-07-03 at 09:31 +0200, Peter Zijlstra wrote:
> > On Wed, Jul 02, 2014 at 10:30:03AM -0700, Jason Low wrote:
> >> Would potentially reducing the size of the rw semaphore structure by 32
> >> bits (for all architectures using optimistic spinning) be a nice
> >> benefit?
> > Possibly, although I had a look at the mutex structure and we didn't
> > have a hole to place it in, unlike what you found with the rwsem.
>  Yeah, and currently struct rw_semaphore is the largest lock we have in
>  the kernel. Shaving off space is definitely welcome.
> >>> Right, especially if it could help things like xfs inode.
> >>>
> >> I do see a point in reducing the size of the rwsem structure. However, I
> >> don't quite understand the point of converting pointers in the
> >> optimistic_spin_queue structure to atomic_t.
> > Converting the pointers in the optimistic_spin_queue to atomic_t would
> > mean we're fully operating on atomic operations instead of using the
> > potentially racy cmpxchg + ACCESS_ONCE stores on the pointers.
> 
> Yes, the ACCESS_ONCE macro for data stores does have problems on some
> architectures. However, I prefer a more holistic solution to this
> problem rather than a workaround of changing the pointers to atomic_t's.
> Even if we make the change, we are still not sure if it will work for
> those architectures, as we have no machine to verify that. Why not let
> the champions of those architectures propose changes, instead of making
> untested changes now that penalize commonly used architectures like x86?

So I initially was thinking that converting to atomic_t would not result
in reducing performance on other architectures. However, you do have a
point in your first post that converting the encoded cpu number to the
pointer may add a little bit of overhead (in the contended cases).

If converting pointers to atomic_t in the optimistic_spin_queue
structure does affect performance for commonly used architectures, then
I agree that we should avoid that and only convert what's stored in
mutex/rwsem.
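
Concretely, that compromise would look something like the sketch below,
reusing the optimistic_spin_tail type from Jason's earlier patch (the
node layout here is an assumption, not taken from a posted patch):

/* Embedded in mutex/rwsem: a 32-bit tail holding an encoded CPU number. */
struct optimistic_spin_tail {
	atomic_t tail;			/* 0 means "no CPU queued" */
};

/* The per-cpu node keeps plain next/prev pointers, plus its own CPU #. */
struct optimistic_spin_node {
	struct optimistic_spin_node *next, *prev;
	int locked;
	int cpu;			/* encoded CPU number of this node */
};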


Re: [RFC] Cancellable MCS spinlock rework

2014-07-03 Thread Waiman Long

On 07/03/2014 04:51 PM, Jason Low wrote:

On Thu, 2014-07-03 at 16:35 -0400, Waiman Long wrote:

On 07/03/2014 02:34 PM, Jason Low wrote:

On Thu, 2014-07-03 at 10:09 -0700, Davidlohr Bueso wrote:

On Thu, 2014-07-03 at 09:31 +0200, Peter Zijlstra wrote:

On Wed, Jul 02, 2014 at 10:30:03AM -0700, Jason Low wrote:

Would potentially reducing the size of the rw semaphore structure by 32
bits (for all architectures using optimistic spinning) be a nice
benefit?

Possibly, although I had a look at the mutex structure and we didn't
have a hole to place it in, unlike what you found with the rwsem.

Yeah, and currently struct rw_semaphore is the largest lock we have in
the kernel. Shaving off space is definitely welcome.

Right, especially if it could help things like xfs inode.


I do see a point in reducing the size of the rwsem structure. However, I
don't quite understand the point of converting pointers in the
optimistic_spin_queue structure to atomic_t.

Converting the pointers in the optimistic_spin_queue to atomic_t would
mean we're fully operating on atomic operations instead of using the
potentially racy cmpxchg + ACCESS_ONCE stores on the pointers.


Yes, the ACCESS_ONCE macro for data stores does have problems on some
architectures. However, I prefer a more holistic solution to this
problem rather than a workaround of changing the pointers to atomic_t's.
Even if we make the change, we are still not sure if it will work for
those architectures, as we have no machine to verify that. Why not let
the champions of those architectures propose changes, instead of making
untested changes now that penalize commonly used architectures like x86?



If we're in the process of using the CPU numbers in atomic_t, I thought
we might as well fix that as well since it has actually been shown to
result in lockups on some architectures. We can then avoid needing to
implement the tricky architecture workarounds for optimistic spinning.
Wouldn't that be a "nice-have"?

Jason



I am not aware of any tricky architectural workarounds other than 
disabling optimistic spinning for those that don't support it.


-Longman


Re: [RFC] Cancellable MCS spinlock rework

2014-07-03 Thread Jason Low
On Thu, 2014-07-03 at 16:35 -0400, Waiman Long wrote:
> On 07/03/2014 02:34 PM, Jason Low wrote:
> > On Thu, 2014-07-03 at 10:09 -0700, Davidlohr Bueso wrote:
> >> On Thu, 2014-07-03 at 09:31 +0200, Peter Zijlstra wrote:
> >>> On Wed, Jul 02, 2014 at 10:30:03AM -0700, Jason Low wrote:
>  Would potentially reducing the size of the rw semaphore structure by 32
>  bits (for all architectures using optimistic spinning) be a nice
>  benefit?
> >>> Possibly, although I had a look at the mutex structure and we didn't
> >>> have a hole to place it in, unlike what you found with the rwsem.
> >> Yeah, and currently struct rw_semaphore is the largest lock we have in
> >> the kernel. Shaving off space is definitely welcome.
> > Right, especially if it could help things like xfs inode.
> >
> 
> I do see a point in reducing the size of the rwsem structure. However, I 
> don't quite understand the point of converting pointers in the 
> optimistic_spin_queue structure to atomic_t.

Converting the pointers in the optimistic_spin_queue to atomic_t would
mean we're fully operating on atomic operations instead of using the
potentially racy cmpxchg + ACCESS_ONCE stores on the pointers.

If we're in the process of using the CPU numbers in atomic_t, I thought
we might as well fix that as well since it has actually been shown to
result in lockups on some architectures. We can then avoid needing to
implement the tricky architecture workarounds for optimistic spinning.
Wouldn't that be a "nice-have"?

Jason
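
The two patterns Jason contrasts, roughly (an illustration, not the
exact osq code):

/* Today: plain stores mixed with cmpxchg() on pointer-sized fields.
 * On an architecture that emulates cmpxchg() with a hashed spinlock,
 * the plain store bypasses that lock and can race with a concurrent
 * cmpxchg() on the same location. */
ACCESS_ONCE(prev->next) = node;
cmpxchg(&lock->tail, node, prev);

/* With encoded CPU numbers held in atomic_t, every access goes through
 * the atomic API, which such architectures implement consistently. */
atomic_set(&prev->next, curr_encoded);
atomic_cmpxchg(&lock->tail, curr_encoded, prev_encoded);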


Re: [RFC] Cancellable MCS spinlock rework

2014-07-03 Thread Waiman Long

On 07/03/2014 02:34 PM, Jason Low wrote:

On Thu, 2014-07-03 at 10:09 -0700, Davidlohr Bueso wrote:

On Thu, 2014-07-03 at 09:31 +0200, Peter Zijlstra wrote:

On Wed, Jul 02, 2014 at 10:30:03AM -0700, Jason Low wrote:

Would potentially reducing the size of the rw semaphore structure by 32
bits (for all architectures using optimistic spinning) be a nice
benefit?

Possibly, although I had a look at the mutex structure and we didn't
have a hole to place it in, unlike what you found with the rwsem.

Yeah, and currently struct rw_semaphore is the largest lock we have in
the kernel. Shaving off space is definitely welcome.

Right, especially if it could help things like xfs inode.



I do see a point in reducing the size of the rwsem structure. However, I 
don't quite understand the point of converting pointers in the 
optimistic_spin_queue structure to atomic_t. The structure is cacheline 
aligned and there is no saving in size. Converting them to atomic_t does 
have a bit of additional overhead of converting the encoded cpu number 
back to the actual pointer.


So my suggestion is to just change what is stored in the mutex and rwsem 
structure to atomic_t, but keep the pointers in the 
optimistic_spin_queue structure.


-Longman


Re: [RFC] Cancellable MCS spinlock rework

2014-07-03 Thread Jason Low
On Thu, 2014-07-03 at 10:09 -0700, Davidlohr Bueso wrote:
> On Thu, 2014-07-03 at 09:31 +0200, Peter Zijlstra wrote:
> > On Wed, Jul 02, 2014 at 10:30:03AM -0700, Jason Low wrote:
> > > Would potentially reducing the size of the rw semaphore structure by 32
> > > bits (for all architectures using optimistic spinning) be a nice
> > > benefit?
> > 
> > Possibly, although I had a look at the mutex structure and we didn't
> > have a hole to place it in, unlike what you found with the rwsem.
> 
> Yeah, and currently struct rw_semaphore is the largest lock we have in
> the kernel. Shaving off space is definitely welcome.

Right, especially if it could help things like xfs inode.


Re: [RFC] Cancellable MCS spinlock rework

2014-07-03 Thread Jason Low
On Thu, 2014-07-03 at 08:35 -0700, Linus Torvalds wrote:
> On Thu, Jul 3, 2014 at 8:31 AM, Linus Torvalds
>  wrote:
> >
> > I don't think we'll support 2 _billion_ processes/threads waiting on
> > the same semaphore any time soon, so the 'long' seems a bit of an
> > overkill on 64-bit architectures.
> 
> Oh, never mind. The 'struct semaphore' uses it as just a plain count,
> but the rwsem ends up splitting up the bits for readers/writers, so we
> actually do want the full 64-bit value there.

Yeah, that was the key to reducing the struct size, as the rwsem count
can't share a 64-bit chunk with the spinlock, unlike some of the other
locks.

Thanks,
Jason


Re: [RFC] Cancellable MCS spinlock rework

2014-07-03 Thread Davidlohr Bueso
On Thu, 2014-07-03 at 09:31 +0200, Peter Zijlstra wrote:
> On Wed, Jul 02, 2014 at 10:30:03AM -0700, Jason Low wrote:
> > Would potentially reducing the size of the rw semaphore structure by 32
> > bits (for all architectures using optimistic spinning) be a nice
> > benefit?
> 
> Possibly, although I had a look at the mutex structure and we didn't
> have a hole to place it in, unlike what you found with the rwsem.

Yeah, and currently struct rw_semaphore is the largest lock we have in
the kernel. Shaving off space is definitely welcome.


Re: [RFC] Cancellable MCS spinlock rework

2014-07-03 Thread Linus Torvalds
On Thu, Jul 3, 2014 at 8:31 AM, Linus Torvalds
 wrote:
>
> I don't think we'll support 2 _billion_ processes/threads waiting on
> the same semaphore any time soon, so the 'long' seems a bit of an
> overkill on 64-bit architectures.

Oh, never mind. The 'struct semaphore' uses it as just a plain count,
but the rwsem ends up splitting up the bits for readers/writers, so we
actually do want the full 64-bit value there.

 Linus
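
The split Linus refers to: in the rwsem code of that era the low 32 bits
of the 64-bit count hold the active reader/writer count and the upper
bits hold the waiter bias, roughly along these lines (values from the
generic rwsem headers; shown for illustration):

#define RWSEM_ACTIVE_BIAS	0x00000001L	/* one active reader/writer */
#define RWSEM_ACTIVE_MASK	0xffffffffL	/* low 32 bits: active count */
#define RWSEM_WAITING_BIAS	(-RWSEM_ACTIVE_MASK-1)	/* upper 32 bits */
#define RWSEM_ACTIVE_WRITE_BIAS	(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)

Halving count to an int would therefore also halve the space available
for the reader count.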


Re: [RFC] Cancellable MCS spinlock rework

2014-07-03 Thread Linus Torvalds
On Wed, Jul 2, 2014 at 9:39 PM, Jason Low  wrote:
>
> And due to padding, the additional modification below reduces the
> size of struct rw_semaphore by 64 bits on my machine  :)

Well, I think you could also make 'count' just be 'int' instead of 'long'.

I don't think we'll support 2 _billion_ processes/threads waiting on
the same semaphore any time soon, so the 'long' seems a bit of an
overkill on 64-bit architectures.

  Linus.


Re: [RFC] Cancellable MCS spinlock rework

2014-07-03 Thread Peter Zijlstra
On Wed, Jul 02, 2014 at 09:39:18PM -0700, Jason Low wrote:
> And due to padding, the additional modification below reduces the
> size of struct rw_semaphore by 64 bits on my machine  :)
> 
> 
>  struct rw_semaphore {
>   long count;
> - raw_spinlock_t wait_lock;
>   struct list_head wait_list;
> + raw_spinlock_t wait_lock;
>  #ifdef CONFIG_SMP
> + struct optimistic_spin_tail osq; /* spinner MCS lock */
>   /*
>    * Write owner. Used as a speculative check to see
>    * if the owner is running on the cpu.
>    */
>   struct task_struct *owner;
> - struct optimistic_spin_tail osq; /* spinner MCS lock */
>  #endif
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>   struct lockdep_map  dep_map;
> 

Right, that might make sense.


Re: [RFC] Cancellable MCS spinlock rework

2014-07-03 Thread Peter Zijlstra
On Wed, Jul 02, 2014 at 10:30:03AM -0700, Jason Low wrote:
> Would potentially reducing the size of the rw semaphore structure by 32
> bits (for all architectures using optimistic spinning) be a nice
> benefit?

Possibly, although I had a look at the mutex structure and we didn't
have a hole to place it in, unlike what you found with the rwsem.
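
For comparison, the struct mutex of that era packs its two 32-bit words
together, leaving no padding hole to reuse (a sketch, assuming x86-64
with lock debugging off):

struct mutex {
	atomic_t		count;		/*  4 bytes, offset 0 */
	spinlock_t		wait_lock;	/*  4 bytes, offset 4 */
	struct list_head	wait_list;	/* 16 bytes, offset 8 */
	/* ... */
};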


Re: [RFC] Cancellable MCS spinlock rework

2014-07-02 Thread Jason Low
On Wed, 2014-07-02 at 10:30 -0700, Jason Low wrote:
> On Wed, 2014-07-02 at 19:23 +0200, Peter Zijlstra wrote:
> > On Wed, Jul 02, 2014 at 09:59:16AM -0700, Jason Low wrote:
> > > 
> > > Why I converted pointers to atomic_t?
> > > 
> > > This would avoid the potentially racy ACCESS_ONCE stores + cmpxchg while
> > > also using less overhead, since atomic_t is often only 32 bits while
> > > pointers could be 64 bits.
> > 
> > So no real good reason.. The ACCESS_ONCE stores + cmpxchg stuff is
> > likely broken all over the place, and 'fixing' this one place doesn't
> > cure the problem.
> 
> Right, fixing the ACCESS_ONCE + cmpxchg and avoiding the architecture
> workarounds for optimistic spinning was just a nice side effect.
> 
> Would potentially reducing the size of the rw semaphore structure by 32
> bits (for all architectures using optimistic spinning) be a nice
> benefit?

And due to padding, the additional modification below reduces the
size of struct rw_semaphore by 64 bits on my machine  :)


 struct rw_semaphore {
long count;
-   raw_spinlock_t wait_lock;
struct list_head wait_list;
+   raw_spinlock_t wait_lock;
 #ifdef CONFIG_SMP
+   struct optimistic_spin_tail osq; /* spinner MCS lock */
/*
 * Write owner. Used as a speculative check to see
 * if the owner is running on the cpu.
 */
struct task_struct *owner;
-   struct optimistic_spin_tail osq; /* spinner MCS lock */
 #endif
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map  dep_map;
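
The 64-bit saving comes from eliminating two 4-byte holes. Assuming
x86-64 with lock debugging off, and raw_spinlock_t and the osq tail both
4 bytes, the layout works out roughly as:

/* Before:                              After:
 *   count            0 -  8              count            0 -  8
 *   wait_lock        8 - 12              wait_list        8 - 24
 *   <4-byte hole>   12 - 16              wait_lock       24 - 28
 *   wait_list       16 - 32              osq             28 - 32
 *   owner           32 - 40              owner           32 - 40
 *   osq             40 - 44
 *   <tail padding>  44 - 48
 * 48 bytes total                       40 bytes total
 */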


Re: [RFC] Cancellable MCS spinlock rework

2014-07-02 Thread Jason Low
On Wed, 2014-07-02 at 19:23 +0200, Peter Zijlstra wrote:
> On Wed, Jul 02, 2014 at 09:59:16AM -0700, Jason Low wrote:
> > On Wed, 2014-07-02 at 18:27 +0200, Peter Zijlstra wrote:
> > > On Wed, Jul 02, 2014 at 09:21:10AM -0700, Jason Low wrote:
> > > > The cancellable MCS spinlock is currently used to queue threads that are
> > > > doing optimistic spinning. It uses per-cpu nodes, where a thread 
> > > > obtaining
> > > > the lock would access and queue the local node corresponding to the CPU 
> > > > that
> > > > it's running on. Currently, the cancellable MCS lock is implemented by 
> > > > using
> > > > pointers to these nodes.
> > > > 
> > > > In this RFC patch, instead of operating on pointers to the per-cpu 
> > > > nodes, we
> > > > store the CPU numbers in which the per-cpu nodes correspond to in 
> > > > atomic_t.
> > > > A similar concept is used with the qspinlock.
> > > > 
> > > > We add 1 to the CPU number to retrieve an "encoded value" representing 
> > > > the node
> > > > of that CPU. By doing this, 0 can represent "no CPU", which allows us to
> > > > keep the simple "if (CPU)" and "if (!CPU)" checks. In this patch, the 
> > > > next and
> > > > prev pointers in each node were also modified to store encoded CPU 
> > > > values.
> > > > 
> > > > By operating on the CPU # of the nodes using atomic_t instead of 
> > > > pointers
> > > > to those nodes, this can reduce the overhead of the cancellable MCS 
> > > > spinlock
> > > > by 32 bits (on 64 bit systems).
> > > 
> > > Still struggling to figure out why you did this.
> > 
> > Why I converted pointers to atomic_t?
> > 
> > This would avoid the potentially racy ACCESS_ONCE stores + cmpxchg while
> > also using less overhead, since atomic_t is often only 32 bits while
> > pointers could be 64 bits.
> 
> So no real good reason.. The ACCESS_ONCE stores + cmpxchg stuff is
> likely broken all over the place, and 'fixing' this one place doesn't
> cure the problem.

Right, fixing the ACCESS_ONCE + cmpxchg and avoiding the architecture
workarounds for optimistic spinning was just a nice side effect.

Would potentially reducing the size of the rw semaphore structure by 32
bits (for all architectures using optimistic spinning) be a nice
benefit?


Re: [RFC] Cancellable MCS spinlock rework

2014-07-02 Thread Peter Zijlstra
On Wed, Jul 02, 2014 at 09:59:16AM -0700, Jason Low wrote:
> On Wed, 2014-07-02 at 18:27 +0200, Peter Zijlstra wrote:
> > On Wed, Jul 02, 2014 at 09:21:10AM -0700, Jason Low wrote:
> > > The cancellable MCS spinlock is currently used to queue threads that are
> > > doing optimistic spinning. It uses per-cpu nodes, where a thread obtaining
> > > the lock would access and queue the local node corresponding to the CPU 
> > > that
> > > it's running on. Currently, the cancellable MCS lock is implemented by 
> > > using
> > > pointers to these nodes.
> > > 
> > > In this RFC patch, instead of operating on pointers to the per-cpu nodes, 
> > > we
> > > store the CPU numbers in which the per-cpu nodes correspond to in 
> > > atomic_t.
> > > A similar concept is used with the qspinlock.
> > > 
> > > We add 1 to the CPU number to retrieve an "encoded value" representing the 
> > > node
> > > of that CPU. By doing this, 0 can represent "no CPU", which allows us to
> > > keep the simple "if (CPU)" and "if (!CPU)" checks. In this patch, the 
> > > next and
> > > prev pointers in each node were also modified to store encoded CPU values.
> > > 
> > > By operating on the CPU # of the nodes using atomic_t instead of pointers
> > > to those nodes, this can reduce the overhead of the cancellable MCS 
> > > spinlock
> > > by 32 bits (on 64 bit systems).
> > 
> > Still struggling to figure out why you did this.
> 
> Why I converted pointers to atomic_t?
> 
> This would avoid the potentially racy ACCESS_ONCE stores + cmpxchg while
> also using less overhead, since atomic_t is often only 32 bits while
> pointers could be 64 bits.

So no real good reason.. The ACCESS_ONCE stores + cmpxchg stuff is
likely broken all over the place, and 'fixing' this one place doesn't
cure the problem.


Re: [RFC] Cancellable MCS spinlock rework

2014-07-02 Thread Jason Low
On Wed, 2014-07-02 at 18:27 +0200, Peter Zijlstra wrote:
> On Wed, Jul 02, 2014 at 09:21:10AM -0700, Jason Low wrote:
> > The cancellable MCS spinlock is currently used to queue threads that are
> > doing optimistic spinning. It uses per-cpu nodes, where a thread obtaining
> > the lock would access and queue the local node corresponding to the CPU that
> > it's running on. Currently, the cancellable MCS lock is implemented by using
> > pointers to these nodes.
> > 
> > In this RFC patch, instead of operating on pointers to the per-cpu nodes, we
> > store the CPU numbers in which the per-cpu nodes correspond to in atomic_t.
> > A similar concept is used with the qspinlock.
> > 
> > We add 1 to the CPU number to retrieve an "encoded value" representing the 
> > node
> > of that CPU. By doing this, 0 can represent "no CPU", which allows us to
> > keep the simple "if (CPU)" and "if (!CPU)" checks. In this patch, the next 
> > and
> > prev pointers in each node were also modified to store encoded CPU values.
> > 
> > By operating on the CPU # of the nodes using atomic_t instead of pointers
> > to those nodes, this can reduce the overhead of the cancellable MCS spinlock
> > by 32 bits (on 64 bit systems).
> 
> Still struggling to figure out why you did this.

Why I converted pointers to atomic_t?

This would avoid the potentially racy ACCESS_ONCE stores + cmpxchg while
also using less overhead, since atomic_t is often only 32 bits while
pointers could be 64 bits.


Re: [RFC] Cancellable MCS spinlock rework

2014-07-02 Thread Peter Zijlstra
On Wed, Jul 02, 2014 at 09:21:10AM -0700, Jason Low wrote:
> The cancellable MCS spinlock is currently used to queue threads that are
> doing optimistic spinning. It uses per-cpu nodes, where a thread obtaining
> the lock would access and queue the local node corresponding to the CPU that
> it's running on. Currently, the cancellable MCS lock is implemented by using
> pointers to these nodes.
> 
> In this RFC patch, instead of operating on pointers to the per-cpu nodes, we
> store the CPU numbers in which the per-cpu nodes correspond to in atomic_t.
> A similar concept is used with the qspinlock.
> 
> We add 1 to the CPU number to retrieve an "encoded value" representing the node
> of that CPU. By doing this, 0 can represent "no CPU", which allows us to
> keep the simple "if (CPU)" and "if (!CPU)" checks. In this patch, the next and
> prev pointers in each node were also modified to store encoded CPU values.
> 
> By operating on the CPU # of the nodes using atomic_t instead of pointers
> to those nodes, this can reduce the overhead of the cancellable MCS spinlock
> by 32 bits (on 64 bit systems).

Still struggling to figure out why you did this.
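
For reference, the encoding the RFC describes amounts to something like
the following (a sketch; the helper names and the per-cpu variable are
assumptions, not quoted from the patch):

static DEFINE_PER_CPU_SHARED_ALIGNED(struct optimistic_spin_queue, osq_node);

static inline int encode_cpu(int cpu_nr)
{
	return cpu_nr + 1;	/* 0 is reserved to mean "no CPU" */
}

static inline struct optimistic_spin_queue *decode_cpu(int encoded_cpu_val)
{
	return per_cpu_ptr(&osq_node, encoded_cpu_val - 1);
}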




pgpfPeHZ3abfs.pgp
Description: PGP signature


Re: [RFC] Cancellable MCS spinlock rework

2014-07-02 Thread Peter Zijlstra
On Wed, Jul 02, 2014 at 09:21:10AM -0700, Jason Low wrote:
 The cancellable MCS spinlock is currently used to queue threads that are
 doing optimistic spinning. It uses per-cpu nodes, where a thread obtaining
 the lock would access and queue the local node corresponding to the CPU that
 it's running on. Currently, the cancellable MCS lock is implemented by using
 pointers to these nodes.
 
 In this RFC patch, instead of operating on pointers to the per-cpu nodes, we
 store the CPU numbers in which the per-cpu nodes correspond to in atomic_t.
 A similar concept is used with the qspinlock.
 
 We add 1 to the CPU number to retrive an encoded value representing the node
 of that CPU. By doing this, 0 can represent no CPU, which allows us to
 keep the simple if (CPU) and if (!CPU) checks. In this patch, the next and
 prev pointers in each node were also modified to store encoded CPU values.
 
 By operating on the CPU # of the nodes using atomic_t instead of pointers
 to those nodes, this can reduce the overhead of the cancellable MCS spinlock
 by 32 bits (on 64 bit systems).

Still struggling to figure out why you did this.




pgpfPeHZ3abfs.pgp
Description: PGP signature




Re: [RFC] Cancellable MCS spinlock rework

2014-07-02 Thread Peter Zijlstra
On Wed, Jul 02, 2014 at 09:59:16AM -0700, Jason Low wrote:
> On Wed, 2014-07-02 at 18:27 +0200, Peter Zijlstra wrote:
> > On Wed, Jul 02, 2014 at 09:21:10AM -0700, Jason Low wrote:
> > > The cancellable MCS spinlock is currently used to queue threads that are
> > > doing optimistic spinning. It uses per-cpu nodes, where a thread obtaining
> > > the lock would access and queue the local node corresponding to the CPU that
> > > it's running on. Currently, the cancellable MCS lock is implemented by using
> > > pointers to these nodes.
> > > 
> > > In this RFC patch, instead of operating on pointers to the per-cpu nodes, we
> > > store the CPU numbers to which the per-cpu nodes correspond in atomic_t.
> > > A similar concept is used with the qspinlock.
> > > 
> > > We add 1 to the CPU number to retrieve an "encoded value" representing the node
> > > of that CPU. By doing this, 0 can represent "no CPU", which allows us to
> > > keep the simple "if (CPU)" and "if (!CPU)" checks. In this patch, the next and
> > > prev pointers in each node were also modified to store encoded CPU values.
> > > 
> > > By operating on the CPU # of the nodes using atomic_t instead of pointers
> > > to those nodes, this can reduce the overhead of the cancellable MCS spinlock
> > > by 32 bits (on 64 bit systems).
> > 
> > Still struggling to figure out why you did this.
> 
> Why I converted pointers to atomic_t?
> 
> This would avoid the potentially racy ACCESS_ONCE stores + cmpxchg while
> also incurring less overhead, since atomic_t is often only 32 bits while
> pointers could be 64 bits.

So no real good reason.. The ACCESS_ONCE stores + cmpxchg stuff is
likely broken all over the place, and 'fixing' this one place doesn't
cure the problem.
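
The problematic mix Peter refers to looks roughly like this; a hedged
illustration rather than any specific call site. On architectures that
emulate cmpxchg() by taking a hashed spinlock, atomicity holds only between
operations that take that lock, so a plain ACCESS_ONCE() store to the same
field can race with a concurrent cmpxchg():

#include <linux/compiler.h>

/* Plain store to node->next, as in an unqueue path: takes no lock. */
static void clear_next(struct optimistic_spin_node *node)
{
	ACCESS_ONCE(node->next) = NULL;
}

/* Concurrent RMW on the same field from another CPU. */
static struct optimistic_spin_node *
take_next(struct optimistic_spin_node *node, struct optimistic_spin_node *old)
{
	/* Atomic only against other RMW ops where cmpxchg() is emulated. */
	return cmpxchg(&node->next, old, NULL);
}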




Re: [RFC] Cancellable MCS spinlock rework

2014-07-02 Thread Jason Low
On Wed, 2014-07-02 at 19:23 +0200, Peter Zijlstra wrote:
> On Wed, Jul 02, 2014 at 09:59:16AM -0700, Jason Low wrote:
> > On Wed, 2014-07-02 at 18:27 +0200, Peter Zijlstra wrote:
> > > On Wed, Jul 02, 2014 at 09:21:10AM -0700, Jason Low wrote:
> > > > The cancellable MCS spinlock is currently used to queue threads that are
> > > > doing optimistic spinning. It uses per-cpu nodes, where a thread obtaining
> > > > the lock would access and queue the local node corresponding to the CPU that
> > > > it's running on. Currently, the cancellable MCS lock is implemented by using
> > > > pointers to these nodes.
> > > > 
> > > > In this RFC patch, instead of operating on pointers to the per-cpu nodes, we
> > > > store the CPU numbers to which the per-cpu nodes correspond in atomic_t.
> > > > A similar concept is used with the qspinlock.
> > > > 
> > > > We add 1 to the CPU number to retrieve an "encoded value" representing the node
> > > > of that CPU. By doing this, 0 can represent "no CPU", which allows us to
> > > > keep the simple "if (CPU)" and "if (!CPU)" checks. In this patch, the next and
> > > > prev pointers in each node were also modified to store encoded CPU values.
> > > > 
> > > > By operating on the CPU # of the nodes using atomic_t instead of pointers
> > > > to those nodes, this can reduce the overhead of the cancellable MCS spinlock
> > > > by 32 bits (on 64 bit systems).
> > > 
> > > Still struggling to figure out why you did this.
> > 
> > Why I converted pointers to atomic_t?
> > 
> > This would avoid the potentially racy ACCESS_ONCE stores + cmpxchg while
> > also incurring less overhead, since atomic_t is often only 32 bits while
> > pointers could be 64 bits.
> 
> So no real good reason.. The ACCESS_ONCE stores + cmpxchg stuff is
> likely broken all over the place, and 'fixing' this one place doesn't
> cure the problem.

Right, fixing the ACCESS_ONCE + cmpxchg and avoiding the architecture
workarounds for optimistic spinning was just a nice side effect.

Would potentially reducing the size of the rw semaphore structure by 32
bits (for all architectures using optimistic spinning) be a nice
benefit?



Re: [RFC] Cancellable MCS spinlock rework

2014-07-02 Thread Jason Low
On Wed, 2014-07-02 at 10:30 -0700, Jason Low wrote:
> On Wed, 2014-07-02 at 19:23 +0200, Peter Zijlstra wrote:
> > On Wed, Jul 02, 2014 at 09:59:16AM -0700, Jason Low wrote:
> > > 
> > > Why I converted pointers to atomic_t?
> > > 
> > > This would avoid the potentially racy ACCESS_ONCE stores + cmpxchg while
> > > also incurring less overhead, since atomic_t is often only 32 bits while
> > > pointers could be 64 bits.
> > 
> > So no real good reason.. The ACCESS_ONCE stores + cmpxchg stuff is
> > likely broken all over the place, and 'fixing' this one place doesn't
> > cure the problem.
> 
> Right, fixing the ACCESS_ONCE + cmpxchg and avoiding the architecture
> workarounds for optimistic spinning was just a nice side effect.
> 
> Would potentially reducing the size of the rw semaphore structure by 32
> bits (for all architectures using optimistic spinning) be a nice
> benefit?

And due to padding, the additional modification below reduces the
size of struct rw_semaphore by 64 bits on my machine  :)


 struct rw_semaphore {
 	long count;
-	raw_spinlock_t wait_lock;
 	struct list_head wait_list;
+	raw_spinlock_t wait_lock;
 #ifdef CONFIG_SMP
+	struct optimistic_spin_tail osq; /* spinner MCS lock */
 	/*
 	 * Write owner. Used as a speculative check to see
 	 * if the owner is running on the cpu.
 	 */
 	struct task_struct *owner;
-	struct optimistic_spin_tail osq; /* spinner MCS lock */
 #endif
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	struct lockdep_map	dep_map;
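
The 64-bit figure follows from the two padding holes on a typical LP64
target; a sketch of the arithmetic, assuming long and pointers are 8 bytes
and raw_spinlock_t and the atomic_t-based osq are 4 bytes each (sizes vary
with config and architecture):

/*
 * Before (CONFIG_SMP, no lockdep):
 *   count      @  0  (8)
 *   wait_lock  @  8  (4)  + 4 padding (wait_list needs 8-byte alignment)
 *   wait_list  @ 16 (16)
 *   owner      @ 32  (8)
 *   osq        @ 40  (4)  + 4 tail padding      -> 48 bytes
 *
 * After pairing the two 4-byte fields:
 *   count      @  0  (8)
 *   wait_list  @  8 (16)
 *   wait_lock  @ 24  (4)
 *   osq        @ 28  (4)
 *   owner      @ 32  (8)                        -> 40 bytes
 *
 * Both holes disappear: 8 bytes, i.e. 64 bits, saved.
 */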

