Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-02-07 Thread Paul E. McKenney
On Wed, Feb 05, 2014 at 10:10:34PM -0500, Waiman Long wrote: > On 02/03/2014 06:40 AM, Peter Zijlstra wrote: > >On Fri, Jan 31, 2014 at 02:24:33PM -0500, Waiman Long wrote: > >>Light contention is the only case where the qspinlock may not perform as > >>good as the ticket spinlock. I know this is

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-02-07 Thread Paul E. McKenney
On Wed, Feb 05, 2014 at 10:10:34PM -0500, Waiman Long wrote: On 02/03/2014 06:40 AM, Peter Zijlstra wrote: On Fri, Jan 31, 2014 at 02:24:33PM -0500, Waiman Long wrote: Light contention is the only case where the qspinlock may not perform as good as the ticket spinlock. I know this is the most

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-02-05 Thread Waiman Long
On 02/03/2014 06:40 AM, Peter Zijlstra wrote: On Fri, Jan 31, 2014 at 02:24:33PM -0500, Waiman Long wrote: Light contention is the only case where the qspinlock may not perform as good as the ticket spinlock. I know this is the most common case. However, I would argue that the slowdown, if any,

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-02-03 Thread Peter Zijlstra
On Fri, Jan 31, 2014 at 02:24:33PM -0500, Waiman Long wrote: > > Light contention is the only case where the qspinlock may not perform as > good as the ticket spinlock. I know this is the most common case. However, I > would argue that the slowdown, if any, will not be really noticeable. This >

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-02-03 Thread Raghavendra K T
On 01/28/2014 11:49 PM, Waiman Long wrote: +#endif /* __ASM_GENERIC_QSPINLOCK_H */ diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks index d2b32ac..f185584 100644 --- a/kernel/Kconfig.locks +++ b/kernel/Kconfig.locks @@ -223,3 +223,10 @@ endif config MUTEX_SPIN_ON_OWNER def_bool

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-02-03 Thread Peter Zijlstra
On Fri, Jan 31, 2014 at 02:24:33PM -0500, Waiman Long wrote: Light contention is the only case where the qspinlock may not perform as good as the ticket spinlock. I know this is the most common case. However, I would argue that the slowdown, if any, will not be really noticeable. This is

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-31 Thread Peter Zijlstra
On Fri, Jan 31, 2014 at 02:24:33PM -0500, Waiman Long wrote: > On 01/31/2014 10:08 AM, Peter Zijlstra wrote: > >Whereas for the straight cmpxchg() you'd get something relatively simple > >like: > > > > mov %edx,%eax > > lock cmpxchg %ecx,(%rbx) > > cmp %edx,%eax > > jne ... > > I believe
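For readers following the "straight cmpxchg()" remark above, here is a minimal user-space sketch of a lock attempt built on a single compare-and-exchange. It is not code from the patch: `struct sketch_lock`, `sketch_trylock()` and `sketch_unlock()` are hypothetical names, and the 0/1 encoding of the lock word is an assumption made for illustration. On x86-64 a C11 compare-exchange like this one is typically compiled to a `lock cmpxchg` instruction followed by a conditional branch, similar in shape to the quoted assembly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical 4-byte lock word (assumption): 0 = unlocked, 1 = locked. */
struct sketch_lock {
    atomic_uint val;
};

/* Try to take the lock with one compare-and-exchange; returns true on success. */
bool sketch_trylock(struct sketch_lock *lock)
{
    unsigned int expected = 0;  /* succeed only if the lock is currently free */

    return atomic_compare_exchange_strong_explicit(&lock->val, &expected, 1u,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

/* Release the lock; in this sketch, unlocking a lock that is not held is a bug. */
void sketch_unlock(struct sketch_lock *lock)
{
    assert(atomic_load_explicit(&lock->val, memory_order_relaxed) == 1u);
    atomic_store_explicit(&lock->val, 0u, memory_order_release);
}
```

A caller would try `sketch_trylock()` first and fall into a slow path only when it returns false; the slow path itself is outside the scope of this sketch.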

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-31 Thread Peter Zijlstra
On Fri, Jan 31, 2014 at 01:26:29PM -0500, Waiman Long wrote: > >I don't get why we need the used thing at all; something like: > > > >struct qna { > > int cnt; > > struct qnode nodes[4]; > >}; > > > >DEFINE_PER_CPU(struct qna, qna); > > > >struct qnode *get_qnode(void) > >{ > > struct
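The snippet above is cut off, so the following is only a guess at the general shape of the idea: a small per-CPU array of queue nodes handed out through a nesting counter instead of per-node "used" flags. Everything beyond `struct qna`, `cnt`, `nodes[4]` and the name `get_qnode()` is an assumption: the fields of `struct qnode`, the `put_qnode()` signature, the overflow handling, and the use of a thread-local variable as a user-space stand-in for `DEFINE_PER_CPU`. The sketch also ignores the interrupt-safety question raised elsewhere in the thread.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QNODES 4  /* matches nodes[4] in the quoted snippet */

/* Hypothetical node layout; the real fields are not visible in the snippet. */
struct qnode {
    struct qnode *next;
    int wait;
};

struct qna {
    int cnt;                         /* number of nodes currently handed out */
    struct qnode nodes[MAX_QNODES];
};

/* User-space stand-in for DEFINE_PER_CPU(struct qna, qna): one copy per thread. */
static _Thread_local struct qna qna;

/* Hand out the next free node, or NULL if all nesting levels are in use. */
struct qnode *get_qnode(void)
{
    if (qna.cnt >= MAX_QNODES)
        return NULL;
    return &qna.nodes[qna.cnt++];
}

/* Return the most recently handed-out node (strict last-in, first-out order). */
void put_qnode(void)
{
    assert(qna.cnt > 0);
    qna.cnt--;
}
```

Because nodes are taken and returned in stack order, the counter alone identifies which entries are busy, which is why no separate flag per node is needed in this sketch.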

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-31 Thread Waiman Long
On 01/31/2014 02:14 PM, George Spelvin wrote: Yes, we can do something like that. However I think put_qnode() needs to use atomic dec as well. As a result, we will need 2 additional atomic operations per slowpath invocation. The code may look simpler, but I don't think it will be faster than

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-31 Thread Waiman Long
On 01/31/2014 10:08 AM, Peter Zijlstra wrote: On Tue, Jan 28, 2014 at 01:19:10PM -0500, Waiman Long wrote: For single-thread performance (no contention), a 256K lock/unlock loop was run on a 2.4Ghz Westmere x86-64 CPU. The following table shows the average time (in ns) for a single lock/unlock

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-31 Thread George Spelvin
> Yes, we can do something like that. However I think put_qnode() needs to > use atomic dec as well. As a result, we will need 2 additional atomic > operations per slowpath invocation. The code may look simpler, but I > don't think it will be faster than what I am currently doing as the >

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-31 Thread Waiman Long
On 01/30/2014 02:35 PM, Peter Zijlstra wrote: On Tue, Jan 28, 2014 at 01:19:10PM -0500, Waiman Long wrote: This patch introduces a new queue spinlock implementation that can serve as an alternative to the default ticket spinlock. Compared with the ticket spinlock, this queue spinlock should be

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-31 Thread Waiman Long
On 01/30/2014 02:28 PM, Peter Zijlstra wrote: On Thu, Jan 30, 2014 at 11:00:30AM -0800, Tim Chen wrote: On Tue, 2014-01-28 at 13:19 -0500, Waiman Long wrote: +/** + * queue_spin_lock_slowpath - acquire the queue spinlock + * @lock: Pointer to queue spinlock structure + */ +void

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-31 Thread Waiman Long
On 01/30/2014 02:00 PM, Tim Chen wrote: On Tue, 2014-01-28 at 13:19 -0500, Waiman Long wrote: +/** + * queue_spin_lock_slowpath - acquire the queue spinlock + * @lock: Pointer to queue spinlock structure + */ +void queue_spin_lock_slowpath(struct qspinlock *lock) +{ + unsigned int

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-31 Thread Peter Zijlstra
On Tue, Jan 28, 2014 at 01:19:10PM -0500, Waiman Long wrote: > For single-thread performance (no contention), a 256K lock/unlock > loop was run on a 2.4Ghz Westmere x86-64 CPU. The following table > shows the average time (in ns) for a single lock/unlock sequence > (including the looping and

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-31 Thread George Spelvin
Yes, we can do something like that. However I think put_qnode() needs to use atomic dec as well. As a result, we will need 2 additional atomic operations per slowpath invocation. The code may look simpler, but I don't think it will be faster than what I am currently doing as the cases

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-31 Thread Peter Zijlstra
On Fri, Jan 31, 2014 at 01:26:29PM -0500, Waiman Long wrote: I don't get why we need the used thing at all; something like: struct qna { int cnt; struct qnode nodes[4]; }; DEFINE_PER_CPU(struct qna, qna); struct qnode *get_qnode(void) { struct qna *qna =

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-31 Thread Peter Zijlstra
On Fri, Jan 31, 2014 at 02:24:33PM -0500, Waiman Long wrote: On 01/31/2014 10:08 AM, Peter Zijlstra wrote: Whereas for the straight cmpxchg() you'd get something relatively simple like: mov %edx,%eax lock cmpxchg %ecx,(%rbx) cmp %edx,%eax jne ... I believe the speeds of the

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-31 Thread Peter Zijlstra
On Tue, Jan 28, 2014 at 01:19:10PM -0500, Waiman Long wrote: For single-thread performance (no contention), a 256K lock/unlock loop was run on a 2.4Ghz Westmere x86-64 CPU. The following table shows the average time (in ns) for a single lock/unlock sequence (including the looping and timing

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-30 Thread Tim Chen
> > > + /* > > > + * Set up the new cpu code to be exchanged > > > + */ > > > + my_qcode = SET_QCODE(cpu_nr, qn_idx); > > > + > > > > If we get interrupted here before we have a chance to set the used flag, > > the interrupt handler could pick up the same qnode if it tries to > > acquire

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-30 Thread Peter Zijlstra
On Tue, Jan 28, 2014 at 01:19:10PM -0500, Waiman Long wrote: > This patch introduces a new queue spinlock implementation that can > serve as an alternative to the default ticket spinlock. Compared with > the ticket spinlock, this queue spinlock should be almost as fair as > the ticket spinlock. It

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-30 Thread Peter Zijlstra
On Thu, Jan 30, 2014 at 11:00:30AM -0800, Tim Chen wrote: > On Tue, 2014-01-28 at 13:19 -0500, Waiman Long wrote: > > > +/** > > + * queue_spin_lock_slowpath - acquire the queue spinlock > > + * @lock: Pointer to queue spinlock structure > > + */ > > +void queue_spin_lock_slowpath(struct

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-30 Thread Tim Chen
On Tue, 2014-01-28 at 13:19 -0500, Waiman Long wrote: > +/** > + * queue_spin_lock_slowpath - acquire the queue spinlock > + * @lock: Pointer to queue spinlock structure > + */ > +void queue_spin_lock_slowpath(struct qspinlock *lock) > +{ > + unsigned int cpu_nr, qn_idx; > + struct qnode

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-30 Thread Rik van Riel
On 01/28/2014 01:19 PM, Waiman Long wrote: This patch introduces a new queue spinlock implementation that can serve as an alternative to the default ticket spinlock. Compared with the ticket spinlock, this queue spinlock should be almost as fair as the ticket spinlock. It has about the same

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-30 Thread Tim Chen
On Tue, 2014-01-28 at 13:19 -0500, Waiman Long wrote: +/** + * queue_spin_lock_slowpath - acquire the queue spinlock + * @lock: Pointer to queue spinlock structure + */ +void queue_spin_lock_slowpath(struct qspinlock *lock) +{ + unsigned int cpu_nr, qn_idx; + struct qnode *node,

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-30 Thread Peter Zijlstra
On Thu, Jan 30, 2014 at 11:00:30AM -0800, Tim Chen wrote: On Tue, 2014-01-28 at 13:19 -0500, Waiman Long wrote: +/** + * queue_spin_lock_slowpath - acquire the queue spinlock + * @lock: Pointer to queue spinlock structure + */ +void queue_spin_lock_slowpath(struct qspinlock *lock)

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-30 Thread Peter Zijlstra
On Tue, Jan 28, 2014 at 01:19:10PM -0500, Waiman Long wrote: This patch introduces a new queue spinlock implementation that can serve as an alternative to the default ticket spinlock. Compared with the ticket spinlock, this queue spinlock should be almost as fair as the ticket spinlock. It has

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-30 Thread Tim Chen
+ /* + * Set up the new cpu code to be exchanged + */ + my_qcode = SET_QCODE(cpu_nr, qn_idx); + If we get interrupted here before we have a chance to set the used flag, the interrupt handler could pick up the same qnode if it tries to acquire queued spin lock. Then we

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-29 Thread Waiman Long
On 01/28/2014 07:20 PM, Andi Kleen wrote: So the 1-2 threads case is the standard case on a small system, isn't it? This may well cause regressions. Yes, it is possible that in a lightly contended case, the queue spinlock maybe a bit slower because of the slowpath overhead. I observed some

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-28 Thread George Spelvin
> So the 1-2 threads case is the standard case on a small > system, isn't it? This may well cause regressions. Well, the common case should be uncontended, which is faster. But yes, testing would be nice. >> In the extremely unlikely case that all the queue node entries are >> used up, the

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-28 Thread Andi Kleen
So the 1-2 threads case is the standard case on a small system, isn't it? This may well cause regressions. > In the extremely unlikely case that all the queue node entries are > used up, the current code will fall back to busy spinning without > waiting in a queue with warning message.

[PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-28 Thread Waiman Long
This patch introduces a new queue spinlock implementation that can serve as an alternative to the default ticket spinlock. Compared with the ticket spinlock, this queue spinlock should be almost as fair as the ticket spinlock. It has about the same speed in single-thread and it can be much faster

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-28 Thread Andi Kleen
So the 1-2 threads case is the standard case on a small system, isn't it? This may well cause regressions. In the extremely unlikely case that all the queue node entries are used up, the current code will fall back to busy spinning without waiting in a queue with warning message.

Re: [PATCH v3 1/2] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-01-28 Thread George Spelvin
So the 1-2 threads case is the standard case on a small system, isn't it? This may well cause regressions. Well, the common case should be uncontended, which is faster. But yes, testing would be nice. In the extremely unlikely case that all the queue node entries are used up, the current