Re: [RFC] Cancellable MCS spinlock rework
On Thu, Jul 03, 2014 at 06:07:23PM -0700, Jason Low wrote:
> On Thu, 2014-07-03 at 16:35 -0400, Waiman Long wrote:
> > I do see a point in reducing the size of the rwsem structure. However, I
> > don't quite understand the point of converting pointers in the
> > optimistic_spin_queue structure to atomic_t. The structure is cacheline
> > aligned and there is no saving in size. Converting them to atomic_t does
> > have a bit of additional overhead of converting the encoded cpu number
> > back to the actual pointer.
> >
> > So my suggestion is to just change what is stored in the mutex and rwsem
> > structure to atomic_t, but keep the pointers in the
> > optimistic_spin_queue structure.
>
> Peter, would you prefer going with the above?

Yeah..

> If we were to keep the pointers to the next and prev nodes in the struct
> optimistic_spin_queue instead of converting them to atomic_t to store
> their cpu #, we'd still need to keep track of the cpu #. In the unqueue
> phase of osq_lock, we might have to reload prev = node->prev, which we
> may then cmpxchg() with the lock tail.
>
> The method we can think of so far would be to add a regular int variable
> to optimistic_spin_queue and initialize it to the CPU #, at the time we
> also initialize node->locked and node->next at the beginning of
> osq_lock. The cost wouldn't be much of an issue since
> optimistic_spin_queue is cache aligned.

Right, there's actually a 4 byte hole in there aside from the full
cacheline alignment, so yeah, tons of space.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Cancellable MCS spinlock rework
On Fri, 2014-07-04 at 09:51 +0200, Peter Zijlstra wrote:
> On Thu, Jul 03, 2014 at 06:07:23PM -0700, Jason Low wrote:
> > On Thu, 2014-07-03 at 16:35 -0400, Waiman Long wrote:
> > > I do see a point in reducing the size of the rwsem structure. However, I
> > > don't quite understand the point of converting pointers in the
> > > optimistic_spin_queue structure to atomic_t. The structure is cacheline
> > > aligned and there is no saving in size. Converting them to atomic_t does
> > > have a bit of additional overhead of converting the encoded cpu number
> > > back to the actual pointer.
> > >
> > > So my suggestion is to just change what is stored in the mutex and rwsem
> > > structure to atomic_t, but keep the pointers in the
> > > optimistic_spin_queue structure.
> >
> > Peter, would you prefer going with the above?
> >
> > If we were to keep the pointers to the next and prev nodes in the struct
> > optimistic_spin_queue instead of converting them to atomic_t to store
> > their cpu #, we'd still need to keep track of the cpu #. In the unqueue
> > phase of osq_lock, we might have to reload prev = node->prev, which we
> > may then cmpxchg() with the lock tail.
> >
> > The method we can think of so far would be to add a regular int variable
> > to optimistic_spin_queue and initialize it to the CPU #, at the time we
> > also initialize node->locked and node->next at the beginning of
> > osq_lock. The cost wouldn't be much of an issue since
> > optimistic_spin_queue is cache aligned.
>
> Let me try and have an actual look at the patch;

Okay, I will be sending out the patchset I had so that there's something
more concrete.
Re: [RFC] Cancellable MCS spinlock rework
On Thu, Jul 03, 2014 at 06:07:23PM -0700, Jason Low wrote:
> On Thu, 2014-07-03 at 16:35 -0400, Waiman Long wrote:
> > I do see a point in reducing the size of the rwsem structure. However, I
> > don't quite understand the point of converting pointers in the
> > optimistic_spin_queue structure to atomic_t. The structure is cacheline
> > aligned and there is no saving in size. Converting them to atomic_t does
> > have a bit of additional overhead of converting the encoded cpu number
> > back to the actual pointer.
> >
> > So my suggestion is to just change what is stored in the mutex and rwsem
> > structure to atomic_t, but keep the pointers in the
> > optimistic_spin_queue structure.
>
> Peter, would you prefer going with the above?
>
> If we were to keep the pointers to the next and prev nodes in the struct
> optimistic_spin_queue instead of converting them to atomic_t to store
> their cpu #, we'd still need to keep track of the cpu #. In the unqueue
> phase of osq_lock, we might have to reload prev = node->prev, which we
> may then cmpxchg() with the lock tail.
>
> The method we can think of so far would be to add a regular int variable
> to optimistic_spin_queue and initialize it to the CPU #, at the time we
> also initialize node->locked and node->next at the beginning of
> osq_lock. The cost wouldn't be much of an issue since
> optimistic_spin_queue is cache aligned.

Let me try and have an actual look at the patch; I'm in the tail end of
a flu and my head isn't quite set for details, but I'll buckle up and go
look, gimme a few :-)
Re: [RFC] Cancellable MCS spinlock rework
On Thu, Jul 03, 2014 at 01:51:48PM -0700, Jason Low wrote:
> On Thu, 2014-07-03 at 16:35 -0400, Waiman Long wrote:
> > On 07/03/2014 02:34 PM, Jason Low wrote:
> > > On Thu, 2014-07-03 at 10:09 -0700, Davidlohr Bueso wrote:
> > > > On Thu, 2014-07-03 at 09:31 +0200, Peter Zijlstra wrote:
> > > > > On Wed, Jul 02, 2014 at 10:30:03AM -0700, Jason Low wrote:
> > > > > > Would potentially reducing the size of the rw semaphore structure
> > > > > > by 32 bits (for all architectures using optimistic spinning) be a
> > > > > > nice benefit?
> > > > > Possibly, although I had a look at the mutex structure and we didn't
> > > > > have a hole to place it in, unlike what you found with the rwsem.
> > > > Yeah, and currently struct rw_semaphore is the largest lock we have in
> > > > the kernel. Shaving off space is definitely welcome.
> > > Right, especially if it could help things like xfs inode.
> > I do see a point in reducing the size of the rwsem structure. However, I
> > don't quite understand the point of converting pointers in the
> > optimistic_spin_queue structure to atomic_t.
>
> Converting the pointers in the optimistic_spin_queue to atomic_t would
> mean we're fully operating on atomic operations instead of using the
> potentially racy cmpxchg + ACCESS_ONCE stores on the pointers.
>
> If we're in the process of using the CPU numbers in atomic_t, I thought
> we might as well fix that as well since it has actually been shown to
> result in lockups on some architectures. We can then avoid needing to
> implement the tricky architecture workarounds for optimistic spinning.
> Wouldn't that be a "nice-have"?

Nah, I think those archs are fundamentally broken at the moment; we
should not make the code harder to read and/or more complex just for them.
Re: [RFC] Cancellable MCS spinlock rework
On Thu, 2014-07-03 at 16:35 -0400, Waiman Long wrote:
> I do see a point in reducing the size of the rwsem structure. However, I
> don't quite understand the point of converting pointers in the
> optimistic_spin_queue structure to atomic_t. The structure is cacheline
> aligned and there is no saving in size. Converting them to atomic_t does
> have a bit of additional overhead of converting the encoded cpu number
> back to the actual pointer.
>
> So my suggestion is to just change what is stored in the mutex and rwsem
> structure to atomic_t, but keep the pointers in the
> optimistic_spin_queue structure.

Peter, would you prefer going with the above?

If we were to keep the pointers to the next and prev nodes in the struct
optimistic_spin_queue instead of converting them to atomic_t to store
their cpu #, we'd still need to keep track of the cpu #. In the unqueue
phase of osq_lock, we might have to reload prev = node->prev, which we
may then cmpxchg() with the lock tail.

The method we can think of so far would be to add a regular int variable
to optimistic_spin_queue and initialize it to the CPU #, at the time we
also initialize node->locked and node->next at the beginning of
osq_lock. The cost wouldn't be much of an issue since
optimistic_spin_queue is cache aligned.
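[For readers following along: the encode/decode scheme being discussed can be
sketched in userspace roughly as below. The `NR_CPUS` value, the
`encode_cpu()`/`decode_cpu()` helpers, and the per-CPU node array are
illustrative stand-ins, not the kernel's actual osq code.]

```c
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS 64

struct osq_node {
	struct osq_node *next, *prev;   /* kept as plain pointers */
	int locked;
	int cpu;                        /* regular int holding this node's CPU # */
};

/* One statically allocated node per CPU, so a CPU number fully
 * identifies a node and a pointer never has to be stored in the lock. */
static struct osq_node per_cpu_nodes[NR_CPUS];

/* The lock tail stores an encoded CPU number rather than a pointer:
 * 0 means "no tail" (unlocked), otherwise the value is cpu + 1. */
static atomic_int lock_tail;

static int encode_cpu(const struct osq_node *node)
{
	return node->cpu + 1;
}

static struct osq_node *decode_cpu(int val)
{
	return val ? &per_cpu_nodes[val - 1] : NULL;
}

/* Queueing then amounts to an atomic exchange on the encoded tail;
 * a NULL return means the queue was empty and the lock is ours,
 * otherwise we got the previous tail node to link behind. */
static struct osq_node *osq_enqueue(struct osq_node *node)
{
	return decode_cpu(atomic_exchange(&lock_tail, encode_cpu(node)));
}
```

[The overhead Waiman mentions is the `decode_cpu()` array indexing on every
load of the tail; keeping next/prev as plain pointers confines that cost to
the lock word itself.]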
Re: [RFC] Cancellable MCS spinlock rework
On Thu, 2014-07-03 at 17:35 -0400, Waiman Long wrote:
> On 07/03/2014 04:51 PM, Jason Low wrote:
> > On Thu, 2014-07-03 at 16:35 -0400, Waiman Long wrote:
> > > On 07/03/2014 02:34 PM, Jason Low wrote:
> > > > On Thu, 2014-07-03 at 10:09 -0700, Davidlohr Bueso wrote:
> > > > > On Thu, 2014-07-03 at 09:31 +0200, Peter Zijlstra wrote:
> > > > > > On Wed, Jul 02, 2014 at 10:30:03AM -0700, Jason Low wrote:
> > > > > > > Would potentially reducing the size of the rw semaphore structure
> > > > > > > by 32 bits (for all architectures using optimistic spinning) be a
> > > > > > > nice benefit?
> > > > > > Possibly, although I had a look at the mutex structure and we didn't
> > > > > > have a hole to place it in, unlike what you found with the rwsem.
> > > > > Yeah, and currently struct rw_semaphore is the largest lock we have in
> > > > > the kernel. Shaving off space is definitely welcome.
> > > > Right, especially if it could help things like xfs inode.
> > > I do see a point in reducing the size of the rwsem structure. However, I
> > > don't quite understand the point of converting pointers in the
> > > optimistic_spin_queue structure to atomic_t.
> > Converting the pointers in the optimistic_spin_queue to atomic_t would
> > mean we're fully operating on atomic operations instead of using the
> > potentially racy cmpxchg + ACCESS_ONCE stores on the pointers.
> Yes, the ACCESS_ONCE macro for data store does have problems on some
> architectures. However, I prefer a more holistic solution to this
> problem rather than a workaround by changing the pointers to atomic_t's.
> It is because even if we make the change, we are still not sure if that
> will work for those architectures, as we have no machine to verify that.
> Why not let the champions of those architectures propose changes
> instead of making some untested changes now and penalizing commonly used
> architectures like x86.

So I initially was thinking that converting to atomic_t would not result
in reduced performance on other architectures. However, you do have a
point in your first post that converting the encoded cpu number to the
pointer may add a little bit of overhead (in the contended cases).

If converting pointers to atomic_t in the optimistic_spin_queue
structure does affect performance for commonly used architectures, then
I agree that we should avoid that and only convert what's stored in
mutex/rwsem.
Re: [RFC] Cancellable MCS spinlock rework
On 07/03/2014 04:51 PM, Jason Low wrote:
> On Thu, 2014-07-03 at 16:35 -0400, Waiman Long wrote:
> > On 07/03/2014 02:34 PM, Jason Low wrote:
> > > On Thu, 2014-07-03 at 10:09 -0700, Davidlohr Bueso wrote:
> > > > On Thu, 2014-07-03 at 09:31 +0200, Peter Zijlstra wrote:
> > > > > On Wed, Jul 02, 2014 at 10:30:03AM -0700, Jason Low wrote:
> > > > > > Would potentially reducing the size of the rw semaphore structure
> > > > > > by 32 bits (for all architectures using optimistic spinning) be a
> > > > > > nice benefit?
> > > > > Possibly, although I had a look at the mutex structure and we didn't
> > > > > have a hole to place it in, unlike what you found with the rwsem.
> > > > Yeah, and currently struct rw_semaphore is the largest lock we have in
> > > > the kernel. Shaving off space is definitely welcome.
> > > Right, especially if it could help things like xfs inode.
> > I do see a point in reducing the size of the rwsem structure. However, I
> > don't quite understand the point of converting pointers in the
> > optimistic_spin_queue structure to atomic_t.
> Converting the pointers in the optimistic_spin_queue to atomic_t would
> mean we're fully operating on atomic operations instead of using the
> potentially racy cmpxchg + ACCESS_ONCE stores on the pointers.

Yes, the ACCESS_ONCE macro for data store does have problems on some
architectures. However, I prefer a more holistic solution to this
problem rather than a workaround by changing the pointers to atomic_t's.
It is because even if we make the change, we are still not sure if that
will work for those architectures, as we have no machine to verify that.
Why not let the champions of those architectures propose changes
instead of making some untested changes now and penalizing commonly used
architectures like x86.

> If we're in the process of using the CPU numbers in atomic_t, I thought
> we might as well fix that as well since it has actually been shown to
> result in lockups on some architectures. We can then avoid needing to
> implement the tricky architecture workarounds for optimistic spinning.
> Wouldn't that be a "nice-have"?
>
> Jason

I am not aware of any tricky architectural workarounds other than
disabling optimistic spinning for those that don't support it.

-Longman
Re: [RFC] Cancellable MCS spinlock rework
On Thu, 2014-07-03 at 16:35 -0400, Waiman Long wrote:
> On 07/03/2014 02:34 PM, Jason Low wrote:
> > On Thu, 2014-07-03 at 10:09 -0700, Davidlohr Bueso wrote:
> > > On Thu, 2014-07-03 at 09:31 +0200, Peter Zijlstra wrote:
> > > > On Wed, Jul 02, 2014 at 10:30:03AM -0700, Jason Low wrote:
> > > > > Would potentially reducing the size of the rw semaphore structure by
> > > > > 32 bits (for all architectures using optimistic spinning) be a nice
> > > > > benefit?
> > > > Possibly, although I had a look at the mutex structure and we didn't
> > > > have a hole to place it in, unlike what you found with the rwsem.
> > > Yeah, and currently struct rw_semaphore is the largest lock we have in
> > > the kernel. Shaving off space is definitely welcome.
> > Right, especially if it could help things like xfs inode.
> I do see a point in reducing the size of the rwsem structure. However, I
> don't quite understand the point of converting pointers in the
> optimistic_spin_queue structure to atomic_t.

Converting the pointers in the optimistic_spin_queue to atomic_t would
mean we're fully operating on atomic operations instead of using the
potentially racy cmpxchg + ACCESS_ONCE stores on the pointers.

If we're in the process of using the CPU numbers in atomic_t, I thought
we might as well fix that as well since it has actually been shown to
result in lockups on some architectures. We can then avoid needing to
implement the tricky architecture workarounds for optimistic spinning.
Wouldn't that be a "nice-have"?

Jason
Re: [RFC] Cancellable MCS spinlock rework
On 07/03/2014 02:34 PM, Jason Low wrote:
> On Thu, 2014-07-03 at 10:09 -0700, Davidlohr Bueso wrote:
> > On Thu, 2014-07-03 at 09:31 +0200, Peter Zijlstra wrote:
> > > On Wed, Jul 02, 2014 at 10:30:03AM -0700, Jason Low wrote:
> > > > Would potentially reducing the size of the rw semaphore structure by
> > > > 32 bits (for all architectures using optimistic spinning) be a nice
> > > > benefit?
> > > Possibly, although I had a look at the mutex structure and we didn't
> > > have a hole to place it in, unlike what you found with the rwsem.
> > Yeah, and currently struct rw_semaphore is the largest lock we have in
> > the kernel. Shaving off space is definitely welcome.
> Right, especially if it could help things like xfs inode.

I do see a point in reducing the size of the rwsem structure. However, I
don't quite understand the point of converting pointers in the
optimistic_spin_queue structure to atomic_t. The structure is cacheline
aligned and there is no saving in size. Converting them to atomic_t does
have a bit of additional overhead of converting the encoded cpu number
back to the actual pointer.

So my suggestion is to just change what is stored in the mutex and rwsem
structure to atomic_t, but keep the pointers in the
optimistic_spin_queue structure.

-Longman
Re: [RFC] Cancellable MCS spinlock rework
On Thu, 2014-07-03 at 10:09 -0700, Davidlohr Bueso wrote:
> On Thu, 2014-07-03 at 09:31 +0200, Peter Zijlstra wrote:
> > On Wed, Jul 02, 2014 at 10:30:03AM -0700, Jason Low wrote:
> > > Would potentially reducing the size of the rw semaphore structure by 32
> > > bits (for all architectures using optimistic spinning) be a nice
> > > benefit?
> > Possibly, although I had a look at the mutex structure and we didn't
> > have a hole to place it in, unlike what you found with the rwsem.
> Yeah, and currently struct rw_semaphore is the largest lock we have in
> the kernel. Shaving off space is definitely welcome.

Right, especially if it could help things like xfs inode.
Re: [RFC] Cancellable MCS spinlock rework
On Thu, 2014-07-03 at 08:35 -0700, Linus Torvalds wrote:
> On Thu, Jul 3, 2014 at 8:31 AM, Linus Torvalds wrote:
> > I don't think we'll support 2 _billion_ processes/threads waiting on
> > the same semaphore any time soon, so the 'long' seems a bit of an
> > overkill on 64-bit architectures.
>
> Oh, never mind. The 'struct semaphore' uses it as just a plain count,
> but the rwsem ends up splitting up the bits for readers/writers, so we
> actually do want the full 64-bit value there.

Yeah, that was the key to reducing the struct size, as the rwsem count
can't share a 64 bit chunk with the spinlock, unlike some of the other
locks.

Thanks,
Jason
Re: [RFC] Cancellable MCS spinlock rework
On Thu, 2014-07-03 at 09:31 +0200, Peter Zijlstra wrote:
> On Wed, Jul 02, 2014 at 10:30:03AM -0700, Jason Low wrote:
> > Would potentially reducing the size of the rw semaphore structure by 32
> > bits (for all architectures using optimistic spinning) be a nice
> > benefit?
>
> Possibly, although I had a look at the mutex structure and we didn't
> have a hole to place it in, unlike what you found with the rwsem.

Yeah, and currently struct rw_semaphore is the largest lock we have in
the kernel. Shaving off space is definitely welcome.
Re: [RFC] Cancellable MCS spinlock rework
On Thu, Jul 3, 2014 at 8:31 AM, Linus Torvalds wrote:
> I don't think we'll support 2 _billion_ processes/threads waiting on
> the same semaphore any time soon, so the 'long' seems a bit of an
> overkill on 64-bit architectures.

Oh, never mind. The 'struct semaphore' uses it as just a plain count,
but the rwsem ends up splitting up the bits for readers/writers, so we
actually do want the full 64-bit value there.

Linus
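[For readers following along: the "split bits" Linus refers to can be
sketched with illustrative biases. The constants and helpers below mirror
the idea of packing active lockers in the low bits and a writer/waiter bias
in the high bits of one 64-bit count; they are not the kernel's actual rwsem
values, and the sketch assumes a 64-bit long with no concurrency.]

```c
/* Illustrative 64-bit count layout (assumes 64-bit long):
 * active lockers occupy the low bits, and a large negative bias in the
 * high bits marks a writer, so one signed comparison classifies the state. */
#define RWSEM_ACTIVE_BIAS        1L
#define RWSEM_WAITING_BIAS       (-(1L << 32))
#define RWSEM_ACTIVE_WRITE_BIAS  (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)

static int reader_trylock(long *count)
{
	*count += RWSEM_ACTIVE_BIAS;    /* optimistically join as a reader */
	if (*count > 0)
		return 1;               /* positive => readers only */
	*count -= RWSEM_ACTIVE_BIAS;    /* writer bias present: back out */
	return 0;
}

static int writer_trylock(long *count)
{
	if (*count != 0)
		return 0;               /* a writer needs the lock idle */
	*count += RWSEM_ACTIVE_WRITE_BIAS;
	return 1;
}
```

[This is why the count wants the full 64 bits: the reader count and the
writer bias share one word, so 'int' would leave too few bits for each half.]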
Re: [RFC] Cancellable MCS spinlock rework
On Wed, Jul 2, 2014 at 9:39 PM, Jason Low wrote:
> And due to padding, the additional modification below reduces the
> size of struct rw_semaphore by 64 bits on my machine :)

Well, I think you could also make 'count' just be 'int' instead of
'long'. I don't think we'll support 2 _billion_ processes/threads
waiting on the same semaphore any time soon, so the 'long' seems a bit
of an overkill on 64-bit architectures.

Linus
Re: [RFC] Cancellable MCS spinlock rework
On Wed, Jul 02, 2014 at 09:39:18PM -0700, Jason Low wrote:
> And due to padding, the additional modification below reduces the
> size of struct rw_semaphore by 64 bits on my machine :)
>
>  struct rw_semaphore {
>  	long count;
> -	raw_spinlock_t wait_lock;
>  	struct list_head wait_list;
> +	raw_spinlock_t wait_lock;
>  #ifdef CONFIG_SMP
> +	struct optimistic_spin_tail osq; /* spinner MCS lock */
>  	/*
>  	 * Write owner. Used as a speculative check to see
>  	 * if the owner is running on the cpu.
>  	 */
>  	struct task_struct *owner;
> -	struct optimistic_spin_tail osq; /* spinner MCS lock */
>  #endif
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>  	struct lockdep_map dep_map;

Right, that might make sense.
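[For readers following along: the saving comes purely from padding. A rough
userspace analogue shows the effect; here `raw_spinlock_t` and the osq tail
are modeled as 4-byte ints and the other fields as pointers, which is an
assumption for illustration (sizes assume LP64), not the kernel's layout.]

```c
/* Field order as before the patch: each 4-byte field sits between
 * 8-byte-aligned neighbors, so each one drags 4 bytes of padding. */
struct rwsem_before {
	long count;                              /* 8 bytes               */
	int  wait_lock;                          /* 4 bytes + 4 pad       */
	struct { void *next, *prev; } wait_list; /* 16 bytes              */
	void *owner;                             /* 8 bytes               */
	int  osq;                                /* 4 bytes + 4 tail pad  */
};

/* Field order as after the patch: the two 4-byte fields are adjacent
 * and share a single 8-byte slot, so both holes disappear. */
struct rwsem_after {
	long count;
	struct { void *next, *prev; } wait_list;
	int  wait_lock;
	int  osq;
	void *owner;
};
```

[On an LP64 machine this is 48 vs 40 bytes, matching the "64 bits on my
machine" Jason reports above.]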
Re: [RFC] Cancellable MCS spinlock rework
On Wed, Jul 02, 2014 at 10:30:03AM -0700, Jason Low wrote:
> Would potentially reducing the size of the rw semaphore structure by 32
> bits (for all architectures using optimistic spinning) be a nice
> benefit?

Possibly, although I had a look at the mutex structure and we didn't
have a hole to place it in, unlike what you found with the rwsem.
Re: [RFC] Cancellable MCS spinlock rework
On Thu, 2014-07-03 at 09:31 +0200, Peter Zijlstra wrote:
> On Wed, Jul 02, 2014 at 10:30:03AM -0700, Jason Low wrote:
> > Would potentially reducing the size of the rw semaphore structure by 32
> > bits (for all architectures using optimistic spinning) be a nice
> > benefit?
>
> Possibly, although I had a look at the mutex structure and we didn't
> have a hole to place it in, unlike what you found with the rwsem.

Yeah, and currently struct rw_semaphore is the largest lock we have in
the kernel. Shaving off space is definitely welcome.
Re: [RFC] Cancellable MCS spinlock rework
On Thu, 2014-07-03 at 08:35 -0700, Linus Torvalds wrote:
> On Thu, Jul 3, 2014 at 8:31 AM, Linus Torvalds
> torva...@linux-foundation.org wrote:
> > I don't think we'll support 2 _billion_ processes/threads waiting on
> > the same semaphore any time soon, so the 'long' seems a bit of an
> > overkill on 64-bit architectures.
>
> Oh, never mind. The 'struct semaphore' uses it as just a plain count,
> but the rwsem ends up splitting up the bits for readers/writers, so we
> actually do want the full 64-bit value there.

Yeah, that was the key to reducing the struct size, as the rwsem count
can't share a 64 bit chunk with the spinlock, unlike some of the other
locks.

Thanks,
Jason
Re: [RFC] Cancellable MCS spinlock rework
On Thu, 2014-07-03 at 10:09 -0700, Davidlohr Bueso wrote:
> On Thu, 2014-07-03 at 09:31 +0200, Peter Zijlstra wrote:
> > On Wed, Jul 02, 2014 at 10:30:03AM -0700, Jason Low wrote:
> > > Would potentially reducing the size of the rw semaphore structure by 32
> > > bits (for all architectures using optimistic spinning) be a nice
> > > benefit?
> >
> > Possibly, although I had a look at the mutex structure and we didn't
> > have a hole to place it in, unlike what you found with the rwsem.
>
> Yeah, and currently struct rw_semaphore is the largest lock we have in
> the kernel. Shaving off space is definitely welcome.

Right, especially if it could help things like the xfs inode.
Re: [RFC] Cancellable MCS spinlock rework
On 07/03/2014 02:34 PM, Jason Low wrote:
> On Thu, 2014-07-03 at 10:09 -0700, Davidlohr Bueso wrote:
> > On Thu, 2014-07-03 at 09:31 +0200, Peter Zijlstra wrote:
> > > On Wed, Jul 02, 2014 at 10:30:03AM -0700, Jason Low wrote:
> > > > Would potentially reducing the size of the rw semaphore structure
> > > > by 32 bits (for all architectures using optimistic spinning) be a
> > > > nice benefit?
> > >
> > > Possibly, although I had a look at the mutex structure and we didn't
> > > have a hole to place it in, unlike what you found with the rwsem.
> >
> > Yeah, and currently struct rw_semaphore is the largest lock we have in
> > the kernel. Shaving off space is definitely welcome.
>
> Right, especially if it could help things like xfs inode.

I do see a point in reducing the size of the rwsem structure. However, I
don't quite understand the point of converting pointers in the
optimistic_spin_queue structure to atomic_t. The structure is cacheline
aligned and there is no saving in size. Converting them to atomic_t does
have a bit of additional overhead of converting the encoded cpu number
back to the actual pointer.

So my suggestion is to just change what is stored in the mutex and rwsem
structure to atomic_t, but keep the pointers in the
optimistic_spin_queue structure.

-Longman
Re: [RFC] Cancellable MCS spinlock rework
On Thu, 2014-07-03 at 16:35 -0400, Waiman Long wrote:
> On 07/03/2014 02:34 PM, Jason Low wrote:
> > On Thu, 2014-07-03 at 10:09 -0700, Davidlohr Bueso wrote:
> > > On Thu, 2014-07-03 at 09:31 +0200, Peter Zijlstra wrote:
> > > > On Wed, Jul 02, 2014 at 10:30:03AM -0700, Jason Low wrote:
> > > > > Would potentially reducing the size of the rw semaphore structure
> > > > > by 32 bits (for all architectures using optimistic spinning) be a
> > > > > nice benefit?
> > > >
> > > > Possibly, although I had a look at the mutex structure and we
> > > > didn't have a hole to place it in, unlike what you found with the
> > > > rwsem.
> > >
> > > Yeah, and currently struct rw_semaphore is the largest lock we have
> > > in the kernel. Shaving off space is definitely welcome.
> >
> > Right, especially if it could help things like xfs inode.
>
> I do see a point in reducing the size of the rwsem structure. However, I
> don't quite understand the point of converting pointers in the
> optimistic_spin_queue structure to atomic_t.

Converting the pointers in the optimistic_spin_queue to atomic_t would
mean we're fully operating on atomic operations instead of using the
potentially racy cmpxchg + ACCESS_ONCE stores on the pointers.

If we're in the process of using the CPU numbers in atomic_t, I thought
we might as well fix that as well, since it has actually been shown to
result in lockups on some architectures. We can then avoid needing to
implement the tricky architecture workarounds for optimistic spinning.
Wouldn't that be a nice-have?

Jason
Re: [RFC] Cancellable MCS spinlock rework
On 07/03/2014 04:51 PM, Jason Low wrote:
> On Thu, 2014-07-03 at 16:35 -0400, Waiman Long wrote:
> > On 07/03/2014 02:34 PM, Jason Low wrote:
> > > On Thu, 2014-07-03 at 10:09 -0700, Davidlohr Bueso wrote:
> > > > On Thu, 2014-07-03 at 09:31 +0200, Peter Zijlstra wrote:
> > > > > On Wed, Jul 02, 2014 at 10:30:03AM -0700, Jason Low wrote:
> > > > > > Would potentially reducing the size of the rw semaphore
> > > > > > structure by 32 bits (for all architectures using optimistic
> > > > > > spinning) be a nice benefit?
> > > > >
> > > > > Possibly, although I had a look at the mutex structure and we
> > > > > didn't have a hole to place it in, unlike what you found with
> > > > > the rwsem.
> > > >
> > > > Yeah, and currently struct rw_semaphore is the largest lock we
> > > > have in the kernel. Shaving off space is definitely welcome.
> > >
> > > Right, especially if it could help things like xfs inode.
> >
> > I do see a point in reducing the size of the rwsem structure. However,
> > I don't quite understand the point of converting pointers in the
> > optimistic_spin_queue structure to atomic_t.
>
> Converting the pointers in the optimistic_spin_queue to atomic_t would
> mean we're fully operating on atomic operations instead of using the
> potentially racy cmpxchg + ACCESS_ONCE stores on the pointers.

Yes, the ACCESS_ONCE macro for data store does have problems on some
architectures. However, I prefer a more holistic solution to this
problem rather than a workaround by changing the pointers to atomic_t's.
Even if we make the change, we are still not sure if that will work for
those architectures, as we have no machine to verify it on. Why not let
the champions of those architectures propose changes, instead of making
some untested changes now that penalize commonly used architectures like
x86?

> If we're in the process of using the CPU numbers in atomic_t, I thought
> we might as well fix that as well since it has actually been shown to
> result in lockups on some architectures. We can then avoid needing to
> implement the tricky architecture workarounds for optimistic spinning.
> Wouldn't that be a nice-have?

I am not aware of any tricky architectural workarounds other than
disabling optimistic spinning for those that don't support it.

-Longman
Re: [RFC] Cancellable MCS spinlock rework
On Thu, 2014-07-03 at 17:35 -0400, Waiman Long wrote:
> On 07/03/2014 04:51 PM, Jason Low wrote:
> > On Thu, 2014-07-03 at 16:35 -0400, Waiman Long wrote:
> > > I do see a point in reducing the size of the rwsem structure.
> > > However, I don't quite understand the point of converting pointers
> > > in the optimistic_spin_queue structure to atomic_t.
> >
> > Converting the pointers in the optimistic_spin_queue to atomic_t would
> > mean we're fully operating on atomic operations instead of using the
> > potentially racy cmpxchg + ACCESS_ONCE stores on the pointers.
>
> Yes, the ACCESS_ONCE macro for data store does have problems on some
> architectures. However, I prefer a more holistic solution to this
> problem rather than a workaround by changing the pointers to atomic_t's.
> Even if we make the change, we are still not sure if that will work for
> those architectures, as we have no machine to verify it on. Why not let
> the champions of those architectures propose changes, instead of making
> some untested changes now that penalize commonly used architectures
> like x86?

So I initially was thinking that converting to atomic_t would not reduce
performance on other architectures.

However, you do have a point in your first post that converting the
encoded cpu number to the pointer may add a little bit of overhead (in
the contended cases). If converting pointers to atomic_t in the
optimistic_spin_queue structure does affect performance for commonly
used architectures, then I agree that we should avoid that and only
convert what's stored in mutex/rwsem.
Re: [RFC] Cancellable MCS spinlock rework
On Thu, 2014-07-03 at 16:35 -0400, Waiman Long wrote:
> I do see a point in reducing the size of the rwsem structure. However, I
> don't quite understand the point of converting pointers in the
> optimistic_spin_queue structure to atomic_t. The structure is cacheline
> aligned and there is no saving in size. Converting them to atomic_t does
> have a bit of additional overhead of converting the encoded cpu number
> back to the actual pointer.
>
> So my suggestion is to just change what is stored in the mutex and rwsem
> structure to atomic_t, but keep the pointers in the
> optimistic_spin_queue structure.

Peter, would you prefer going with the above?

If we were to keep the pointers to the next and prev nodes in the struct
optimistic_spin_queue instead of converting them to atomic_t to store
their cpu #, we'd still need to keep track of the cpu #. In the unqueue
phase of osq_lock, we might have to reload prev = node->prev, which we
then may cmpxchg() with the lock tail.

The method we can think of so far would be to add a regular int variable
to optimistic_spin_queue and initialize it to the CPU #, at the time we
also initialize node->locked and node->next at the beginning of
osq_lock. The cost wouldn't be much of an issue, since
optimistic_spin_queue is cache aligned.
Re: [RFC] Cancellable MCS spinlock rework
On Wed, 2014-07-02 at 10:30 -0700, Jason Low wrote:
> On Wed, 2014-07-02 at 19:23 +0200, Peter Zijlstra wrote:
> > On Wed, Jul 02, 2014 at 09:59:16AM -0700, Jason Low wrote:
> > >
> > > Why I converted pointers to atomic_t?
> > >
> > > This would avoid the potentially racy ACCESS_ONCE stores + cmpxchg
> > > while also using less overhead, since atomic_t is often only 32 bits
> > > while pointers could be 64 bits.
> >
> > So no real good reason.. The ACCESS_ONCE stores + cmpxchg stuff is
> > likely broken all over the place, and 'fixing' this one place doesn't
> > cure the problem.
>
> Right, fixing the ACCESS_ONCE + cmpxchg and avoiding the architecture
> workarounds for optimistic spinning was just a nice side effect.
>
> Would potentially reducing the size of the rw semaphore structure by 32
> bits (for all architectures using optimistic spinning) be a nice
> benefit?

And due to padding, the additional modification below reduces the size
of struct rw_semaphore by 64 bits on my machine :)

 struct rw_semaphore {
        long count;
-       raw_spinlock_t wait_lock;
        struct list_head wait_list;
+       raw_spinlock_t wait_lock;
 #ifdef CONFIG_SMP
+       struct optimistic_spin_tail osq; /* spinner MCS lock */
        /*
         * Write owner. Used as a speculative check to see
         * if the owner is running on the cpu.
         */
        struct task_struct *owner;
-       struct optimistic_spin_tail osq; /* spinner MCS lock */
 #endif
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
        struct lockdep_map dep_map;
Re: [RFC] Cancellable MCS spinlock rework
On Wed, 2014-07-02 at 19:23 +0200, Peter Zijlstra wrote:
> On Wed, Jul 02, 2014 at 09:59:16AM -0700, Jason Low wrote:
> > On Wed, 2014-07-02 at 18:27 +0200, Peter Zijlstra wrote:
> > > On Wed, Jul 02, 2014 at 09:21:10AM -0700, Jason Low wrote:
> > > > The cancellable MCS spinlock is currently used to queue threads
> > > > that are doing optimistic spinning. It uses per-cpu nodes, where a
> > > > thread obtaining the lock would access and queue the local node
> > > > corresponding to the CPU that it's running on. Currently, the
> > > > cancellable MCS lock is implemented by using pointers to these
> > > > nodes.
> > > >
> > > > In this RFC patch, instead of operating on pointers to the per-cpu
> > > > nodes, we store the CPU numbers in which the per-cpu nodes
> > > > correspond to in atomic_t. A similar concept is used with the
> > > > qspinlock.
> > > >
> > > > We add 1 to the CPU number to retrieve an "encoded value"
> > > > representing the node of that CPU. By doing this, 0 can represent
> > > > "no CPU", which allows us to keep the simple "if (CPU)" and
> > > > "if (!CPU)" checks. In this patch, the next and prev pointers in
> > > > each node were also modified to store encoded CPU values.
> > > >
> > > > By operating on the CPU # of the nodes using atomic_t instead of
> > > > pointers to those nodes, this can reduce the overhead of the
> > > > cancellable MCS spinlock by 32 bits (on 64 bit systems).
> > >
> > > Still struggling to figure out why you did this.
> >
> > Why I converted pointers to atomic_t?
> >
> > This would avoid the potentially racy ACCESS_ONCE stores + cmpxchg
> > while also using less overhead, since atomic_t is often only 32 bits
> > while pointers could be 64 bits.
>
> So no real good reason.. The ACCESS_ONCE stores + cmpxchg stuff is
> likely broken all over the place, and 'fixing' this one place doesn't
> cure the problem.

Right, fixing the ACCESS_ONCE + cmpxchg and avoiding the architecture
workarounds for optimistic spinning was just a nice side effect.

Would potentially reducing the size of the rw semaphore structure by 32
bits (for all architectures using optimistic spinning) be a nice
benefit?
Re: [RFC] Cancellable MCS spinlock rework
On Wed, Jul 02, 2014 at 09:59:16AM -0700, Jason Low wrote:
> On Wed, 2014-07-02 at 18:27 +0200, Peter Zijlstra wrote:
> > On Wed, Jul 02, 2014 at 09:21:10AM -0700, Jason Low wrote:
> > > The cancellable MCS spinlock is currently used to queue threads that
> > > are doing optimistic spinning. It uses per-cpu nodes, where a thread
> > > obtaining the lock would access and queue the local node
> > > corresponding to the CPU that it's running on. Currently, the
> > > cancellable MCS lock is implemented by using pointers to these nodes.
> > >
> > > In this RFC patch, instead of operating on pointers to the per-cpu
> > > nodes, we store the CPU numbers in which the per-cpu nodes correspond
> > > to in atomic_t. A similar concept is used with the qspinlock.
> > >
> > > We add 1 to the CPU number to retrieve an "encoded value" representing
> > > the node of that CPU. By doing this, 0 can represent "no CPU", which
> > > allows us to keep the simple "if (CPU)" and "if (!CPU)" checks. In
> > > this patch, the next and prev pointers in each node were also
> > > modified to store encoded CPU values.
> > >
> > > By operating on the CPU # of the nodes using atomic_t instead of
> > > pointers to those nodes, this can reduce the overhead of the
> > > cancellable MCS spinlock by 32 bits (on 64 bit systems).
> >
> > Still struggling to figure out why you did this.
>
> Why I converted pointers to atomic_t?
>
> This would avoid the potentially racy ACCESS_ONCE stores + cmpxchg while
> also using less overhead, since atomic_t is often only 32 bits while
> pointers could be 64 bits.

So no real good reason.. The ACCESS_ONCE stores + cmpxchg stuff is
likely broken all over the place, and 'fixing' this one place doesn't
cure the problem.
Re: [RFC] Cancellable MCS spinlock rework
On Wed, 2014-07-02 at 18:27 +0200, Peter Zijlstra wrote:
> On Wed, Jul 02, 2014 at 09:21:10AM -0700, Jason Low wrote:
> > The cancellable MCS spinlock is currently used to queue threads that
> > are doing optimistic spinning. It uses per-cpu nodes, where a thread
> > obtaining the lock would access and queue the local node corresponding
> > to the CPU that it's running on. Currently, the cancellable MCS lock
> > is implemented by using pointers to these nodes.
> >
> > In this RFC patch, instead of operating on pointers to the per-cpu
> > nodes, we store the CPU numbers in which the per-cpu nodes correspond
> > to in atomic_t. A similar concept is used with the qspinlock.
> >
> > We add 1 to the CPU number to retrieve an "encoded value" representing
> > the node of that CPU. By doing this, 0 can represent "no CPU", which
> > allows us to keep the simple "if (CPU)" and "if (!CPU)" checks. In
> > this patch, the next and prev pointers in each node were also modified
> > to store encoded CPU values.
> >
> > By operating on the CPU # of the nodes using atomic_t instead of
> > pointers to those nodes, this can reduce the overhead of the
> > cancellable MCS spinlock by 32 bits (on 64 bit systems).
>
> Still struggling to figure out why you did this.

Why I converted pointers to atomic_t?

This would avoid the potentially racy ACCESS_ONCE stores + cmpxchg while
also using less overhead, since atomic_t is often only 32 bits while
pointers could be 64 bits.
Re: [RFC] Cancellable MCS spinlock rework
On Wed, Jul 02, 2014 at 09:21:10AM -0700, Jason Low wrote:
> The cancellable MCS spinlock is currently used to queue threads that are
> doing optimistic spinning. It uses per-cpu nodes, where a thread
> obtaining the lock would access and queue the local node corresponding
> to the CPU that it's running on. Currently, the cancellable MCS lock is
> implemented by using pointers to these nodes.
>
> In this RFC patch, instead of operating on pointers to the per-cpu
> nodes, we store the CPU numbers in which the per-cpu nodes correspond to
> in atomic_t. A similar concept is used with the qspinlock.
>
> We add 1 to the CPU number to retrieve an "encoded value" representing
> the node of that CPU. By doing this, 0 can represent "no CPU", which
> allows us to keep the simple "if (CPU)" and "if (!CPU)" checks. In this
> patch, the next and prev pointers in each node were also modified to
> store encoded CPU values.
>
> By operating on the CPU # of the nodes using atomic_t instead of
> pointers to those nodes, this can reduce the overhead of the cancellable
> MCS spinlock by 32 bits (on 64 bit systems).

Still struggling to figure out why you did this.