[PATCH v12 07/11] qspinlock: Revert to test-and-set on hypervisors

2014-10-16 Thread Waiman Long
From: Peter Zijlstra pet...@infradead.org When we detect a hypervisor (!paravirt, see qspinlock paravirt support patches), revert to a simple test-and-set lock to avoid the horrors of queue preemption. Signed-off-by: Peter Zijlstra pet...@infradead.org Signed-off-by: Waiman Long waiman.l
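For readers unfamiliar with the fallback this patch describes, a simple test-and-set lock can be sketched in user space roughly as follows. This is only an illustration using C11 atomics; `struct tas_lock` and the `tas_*` function names are hypothetical and are not taken from the patch, which operates on the kernel's own lock word and atomic primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lock word; not the kernel's struct qspinlock. */
struct tas_lock {
    atomic_int val;              /* 0 = unlocked, 1 = locked */
};

/* One attempt: atomically flip 0 -> 1. Returns true if we now own the lock. */
static bool tas_trylock(struct tas_lock *lock)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &lock->val, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Spin until the lock is taken. No queue is involved, so a preempted
 * waiter never blocks the waiters behind it; the price is unfairness. */
static void tas_lock_acquire(struct tas_lock *lock)
{
    for (;;) {
        /* Read-only spin while the lock looks busy. */
        while (atomic_load_explicit(&lock->val, memory_order_relaxed) != 0)
            ;
        if (tas_trylock(lock))
            return;
    }
}

static void tas_unlock(struct tas_lock *lock)
{
    /* Only the owner may unlock. */
    assert(atomic_load_explicit(&lock->val, memory_order_relaxed) == 1);
    atomic_store_explicit(&lock->val, 0, memory_order_release);
}
```

Because any spinner may grab the lock the moment it is released, there is no ordered queue whose head could be descheduled by the hypervisor, which appears to be the "queue preemption" problem the patch description refers to.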

[PATCH v12 08/11] qspinlock, x86: Rename paravirt_ticketlocks_enabled

2014-10-16 Thread Waiman Long
This patch renames the paravirt_ticketlocks_enabled static key to a more generic paravirt_spinlocks_enabled name. Signed-off-by: Waiman Long waiman.l...@hp.com Signed-off-by: Peter Zijlstra pet...@infradead.org --- arch/x86/include/asm/spinlock.h |4 ++-- arch/x86/kernel/kvm.c

Re: [PATCH 10/11] qspinlock: Paravirt support

2014-06-18 Thread Waiman Long
On 06/18/2014 08:03 AM, Paolo Bonzini wrote: On 17/06/2014 00:08, Waiman Long wrote: +void __pv_queue_unlock(struct qspinlock *lock) +{ +int val = atomic_read(&lock->val); + +native_queue_unlock(lock); + +if (val & _Q_LOCKED_SLOW) +___pv_kick_head(lock); +} + Again

Re: [PATCH 04/11] qspinlock: Extract out the exchange of tail code word

2014-06-18 Thread Waiman Long
On 06/18/2014 09:50 AM, Konrad Rzeszutek Wilk wrote: On Wed, Jun 18, 2014 at 01:37:45PM +0200, Paolo Bonzini wrote: On 17/06/2014 22:55, Konrad Rzeszutek Wilk wrote: On Sun, Jun 15, 2014 at 02:47:01PM +0200, Peter Zijlstra wrote: From: Waiman Long waiman.l...@hp.com This patch extracts

Re: [PATCH 03/11] qspinlock: Add pending bit

2014-06-17 Thread Waiman Long
On 06/17/2014 04:36 PM, Konrad Rzeszutek Wilk wrote: On Sun, Jun 15, 2014 at 02:47:00PM +0200, Peter Zijlstra wrote: Because the qspinlock needs to touch a second cacheline; add a pending bit and allow a single in-word spinner before we punt to the second cacheline. Could you add this in the

Re: [PATCH 03/11] qspinlock: Add pending bit

2014-06-17 Thread Waiman Long
On 06/17/2014 05:10 PM, Konrad Rzeszutek Wilk wrote: On Tue, Jun 17, 2014 at 05:07:29PM -0400, Konrad Rzeszutek Wilk wrote: On Tue, Jun 17, 2014 at 04:51:57PM -0400, Waiman Long wrote: On 06/17/2014 04:36 PM, Konrad Rzeszutek Wilk wrote: On Sun, Jun 15, 2014 at 02:47:00PM +0200, Peter

Re: [PATCH 08/11] qspinlock: Revert to test-and-set on hypervisors

2014-06-16 Thread Waiman Long
On 06/15/2014 08:47 AM, Peter Zijlstra wrote: When we detect a hypervisor (!paravirt, see later patches), revert to a simple test-and-set lock to avoid the horrors of queue preemption. Signed-off-by: Peter Zijlstra pet...@infradead.org --- arch/x86/include/asm/qspinlock.h | 14 ++

Re: [PATCH 10/11] qspinlock: Paravirt support

2014-06-16 Thread Waiman Long
On 06/15/2014 08:47 AM, Peter Zijlstra wrote: +#ifdef CONFIG_PARAVIRT_SPINLOCKS + +/* + * Write a comment about how all this works... + */ + +#define _Q_LOCKED_SLOW (2U << _Q_LOCKED_OFFSET) + +struct pv_node { + struct mcs_spinlock mcs; + struct mcs_spinlock __offset[3]; +

Re: [PATCH 10/11] qspinlock: Paravirt support

2014-06-16 Thread Waiman Long
I am resending it as my original reply has some HTML code hence rejected by the mailing lists. On 06/15/2014 08:47 AM, Peter Zijlstra wrote: +#ifdef CONFIG_PARAVIRT_SPINLOCKS + +/* + * Write a comment about how all this works... + */ + +#define _Q_LOCKED_SLOW (2U << _Q_LOCKED_OFFSET) +

Re: [PATCH v11 14/16] pvqspinlock: Add qspinlock para-virtualization support

2014-06-12 Thread Waiman Long
On 06/12/2014 04:17 AM, Peter Zijlstra wrote: On Fri, May 30, 2014 at 11:44:00AM -0400, Waiman Long wrote: @@ -19,13 +19,46 @@ extern struct static_key virt_unfairlocks_enabled; * that the clearing the lock bit is done ASAP without artificial delay * due to compiler optimization

Re: [PATCH v11 06/16] qspinlock: prolong the stay in the pending bit path

2014-06-12 Thread Waiman Long
On 06/12/2014 02:00 AM, Peter Zijlstra wrote: On Wed, Jun 11, 2014 at 05:22:28PM -0400, Long, Wai Man wrote: @@ -233,11 +233,25 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val) */ for (;;) { /* -* If we observe any contention;

Re: [PATCH v11 09/16] qspinlock, x86: Allow unfair spinlock in a virtual guest

2014-06-12 Thread Waiman Long
On 06/12/2014 01:50 AM, Peter Zijlstra wrote: On Wed, Jun 11, 2014 at 09:37:55PM -0400, Long, Wai Man wrote: On 6/11/2014 6:54 AM, Peter Zijlstra wrote: On Fri, May 30, 2014 at 11:43:55AM -0400, Waiman Long wrote: Enabling this configuration feature causes a slight decrease the performance

[PATCH v11 01/16] qspinlock: A simple generic 4-byte queue spinlock

2014-05-30 Thread Waiman Long
the lock is acquired, the queue node can be released to be used later. Signed-off-by: Waiman Long waiman.l...@hp.com Signed-off-by: Peter Zijlstra pet...@infradead.org --- include/asm-generic/qspinlock.h | 118 include/asm-generic/qspinlock_types.h | 61

[PATCH v11 00/16] qspinlock: a 4-byte queue spinlock with PV support

2014-05-30 Thread Waiman Long
: Add pending bit qspinlock: Optimize for smaller NR_CPUS Waiman Long (14): qspinlock: A simple generic 4-byte queue spinlock qspinlock, x86: Enable x86-64 to use queue spinlock qspinlock: Extract out the exchange of tail code word qspinlock: prolong the stay in the pending bit path

[PATCH v11 02/16] qspinlock, x86: Enable x86-64 to use queue spinlock

2014-05-30 Thread Waiman Long
optimization which will make the queue spinlock code perform better than the generic implementation. Signed-off-by: Waiman Long waiman.l...@hp.com Signed-off-by: Peter Zijlstra pet...@infradead.org --- arch/x86/Kconfig |1 + arch/x86/include/asm/qspinlock.h | 29

[PATCH v11 03/16] qspinlock: Add pending bit

2014-05-30 Thread Waiman Long
From: Peter Zijlstra pet...@infradead.org Because the qspinlock needs to touch a second cacheline; add a pending bit and allow a single in-word spinner before we punt to the second cacheline. Signed-off-by: Peter Zijlstra pet...@infradead.org Signed-off-by: Waiman Long waiman.l...@hp.com
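The pending-bit idea in this description can be sketched in user space as below. This is a simplified illustration, not the patch's code: the bit layout (locked in the low byte, pending at bit 8, queue tail in the upper bits) and every name here (`Q_*`, `lock_or_pend`, `pending_take_lock`, `unlock_sketch`) are assumptions made for the example. The first contender spins on the lock word itself; only later contenders would fall back to the queue, whose per-CPU node lives in a second cacheline.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout for this sketch: bits 0-7 locked, bit 8 pending, rest tail. */
#define Q_LOCKED_VAL   1u
#define Q_LOCKED_MASK  0xffu
#define Q_PENDING_VAL  (1u << 8)

enum path { GOT_LOCK, GOT_PENDING, MUST_QUEUE };

/* Decide which path a new contender takes, based on the lock word alone. */
static enum path lock_or_pend(_Atomic uint32_t *word)
{
    uint32_t val = atomic_load_explicit(word, memory_order_relaxed);

    for (;;) {
        uint32_t newval;

        /* Pending or tail already set: someone is waiting, go queue. */
        if (val & ~Q_LOCKED_VAL)
            return MUST_QUEUE;

        /* Free -> take the lock; held -> become the in-word spinner. */
        newval = (val == 0) ? Q_LOCKED_VAL : (val | Q_PENDING_VAL);
        if (atomic_compare_exchange_weak_explicit(
                word, &val, newval,
                memory_order_acquire, memory_order_relaxed))
            return (val == 0) ? GOT_LOCK : GOT_PENDING;
        /* On failure val holds the fresh value; re-evaluate. */
    }
}

/* The pending holder waits for the owner, then converts pending -> locked. */
static void pending_take_lock(_Atomic uint32_t *word)
{
    while (atomic_load_explicit(word, memory_order_acquire) & Q_LOCKED_MASK)
        ;
    /* Adding (LOCKED - PENDING) clears pending and sets locked in one
     * atomic step while leaving any tail bits untouched. */
    atomic_fetch_add_explicit(word, Q_LOCKED_VAL - Q_PENDING_VAL,
                              memory_order_acquire);
}

static void unlock_sketch(_Atomic uint32_t *word)
{
    assert(atomic_load_explicit(word, memory_order_relaxed) & Q_LOCKED_VAL);
    atomic_fetch_sub_explicit(word, Q_LOCKED_VAL, memory_order_release);
}
```

In this sketch a caller that gets `GOT_PENDING` would call `pending_take_lock()`, and one that gets `MUST_QUEUE` would enter the MCS-style queue, which is not shown here.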

[PATCH v11 04/16] qspinlock: Extract out the exchange of tail code word

2014-05-30 Thread Waiman Long
This patch extracts the logic for the exchange of new and previous tail code words into a new xchg_tail() function which can be optimized in a later patch. Signed-off-by: Waiman Long waiman.l...@hp.com --- include/asm-generic/qspinlock_types.h |2 + kernel/locking/qspinlock.c
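As a rough picture of what a helper like `xchg_tail()` does, the following user-space sketch swaps only the tail bits of a 32-bit lock word with a compare-and-swap loop, leaving the locked and pending bits alone. The bit layout, the `Q_*` constants, `encode_tail()` and `xchg_tail_sketch()` are assumptions for illustration, and the choice to return only the previous tail bits is made here for convenience; the patch itself may differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout: bits 0-15 locked+pending, 16-17 node index, 18-31 CPU+1. */
#define Q_LOCKED_PENDING_MASK 0x0000ffffu
#define Q_TAIL_MASK           0xffff0000u
#define Q_TAIL_IDX_OFFSET     16
#define Q_TAIL_CPU_OFFSET     18

/* Encode (cpu, idx) as a tail code word; cpu is stored +1 so 0 means "no tail". */
static uint32_t encode_tail(unsigned cpu, unsigned idx)
{
    assert(idx < 4 && cpu < (1u << 14) - 1);
    return ((uint32_t)(cpu + 1) << Q_TAIL_CPU_OFFSET) |
           ((uint32_t)idx << Q_TAIL_IDX_OFFSET);
}

/* Install a new tail and return the previous tail bits. */
static uint32_t xchg_tail_sketch(_Atomic uint32_t *word, uint32_t tail)
{
    uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
    uint32_t newval;

    do {
        /* Keep the current locked/pending bits, replace only the tail. */
        newval = (old & Q_LOCKED_PENDING_MASK) | tail;
    } while (!atomic_compare_exchange_weak_explicit(
                 word, &old, newval,
                 memory_order_acq_rel, memory_order_relaxed));

    return old & Q_TAIL_MASK;
}
```

Isolating this step in one function is what lets a later patch replace the loop with a cheaper operation when the tail occupies its own naturally aligned sub-word.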

[PATCH v11 06/16] qspinlock: prolong the stay in the pending bit path

2014-05-30 Thread Waiman Long
% It can be seen that the queue spinlock performance for 2 contending tasks is now comparable to ticket spinlock on the same node, but much faster when in different nodes. With 3 contending tasks, however, the ticket spinlock is still quite a bit faster. Signed-off-by: Waiman Long waiman.l

[PATCH v11 05/16] qspinlock: Optimize for smaller NR_CPUS

2014-05-30 Thread Waiman Long
this is horribly broken on Alpha pre EV56 (and any other arch that cannot do single-copy atomic byte stores). Signed-off-by: Peter Zijlstra pet...@infradead.org Signed-off-by: Waiman Long waiman.l...@hp.com --- include/asm-generic/qspinlock_types.h | 13 kernel/locking/qspinlock.c

[PATCH v11 08/16] qspinlock: Prepare for unfair lock support

2014-05-30 Thread Waiman Long
If unfair lock is supported, the lock acquisition loop at the end of the queue_spin_lock_slowpath() function may need to detect the fact that the lock can be stolen. Code is added for the stolen lock detection. Signed-off-by: Waiman Long waiman.l...@hp.com --- kernel/locking/qspinlock.c | 26

[PATCH v11 09/16] qspinlock, x86: Allow unfair spinlock in a virtual guest

2014-05-30 Thread Waiman Long
-- --- ---- 1135135 137 2 4603 1034 1458 3 10940 12087 2562 4 21555 10507 4793 Signed-off-by: Waiman Long waiman.l...@hp.com --- arch/x86/Kconfig | 11

[PATCH v11 10/16] qspinlock: Split the MCS queuing code into a separate slowerpath

2014-05-30 Thread Waiman Long
. This avoids the slow down of the pending bit and trylock code path at the expense of a little bit of additional overhead to the MCS queuing code path. Signed-off-by: Waiman Long waiman.l...@hp.com --- kernel/locking/qspinlock.c | 162 --- 1 files changed

[PATCH v11 07/16] qspinlock: Use a simple write to grab the lock, if applicable

2014-05-30 Thread Waiman Long
Usr Time -- - ticketlock 2075 10.00 216.35 3.49 qspinlock 3023 10.00 198.20 4.80 Signed-off-by: Waiman Long waiman.l...@hp.com --- kernel/locking/qspinlock.c | 62

[PATCH v11 11/16] pvqspinlock, x86: Rename paravirt_ticketlocks_enabled

2014-05-30 Thread Waiman Long
This patch renames the paravirt_ticketlocks_enabled static key to a more generic paravirt_spinlocks_enabled name. Signed-off-by: Waiman Long waiman.l...@hp.com --- arch/x86/include/asm/spinlock.h |4 ++-- arch/x86/kernel/kvm.c|2 +- arch/x86/kernel/paravirt

[PATCH v11 14/16] pvqspinlock: Add qspinlock para-virtualization support

2014-05-30 Thread Waiman Long
no difference in performance. Signed-off-by: Waiman Long waiman.l...@hp.com --- arch/x86/include/asm/pvqspinlock.h | 359 arch/x86/include/asm/qspinlock.h | 33 kernel/locking/qspinlock.c | 72 +++- 3 files changed, 458 insertions(+), 6

[PATCH v11 13/16] pvqspinlock: Enable coexistence with the unfair lock

2014-05-30 Thread Waiman Long
in the pending bit code path back to the regular queuing code path so that it can be properly halted by the PV qspinlock code. Signed-off-by: Waiman Long waiman.l...@hp.com --- kernel/locking/qspinlock.c | 47 --- 1 files changed, 43 insertions(+), 4 deletions

[PATCH v11 12/16] pvqspinlock, x86: Add PV data structure methods

2014-05-30 Thread Waiman Long
statistical data for debugfs Signed-off-by: Waiman Long waiman.l...@hp.com --- arch/x86/include/asm/paravirt.h | 18 +- arch/x86/include/asm/paravirt_types.h | 17 + arch/x86/kernel/paravirt-spinlocks.c |6 ++ 3 files changed, 40 insertions(+), 1

[PATCH v11 15/16] pvqspinlock, x86: Enable PV qspinlock PV for KVM

2014-05-30 Thread Waiman Long
PV qspinlock 402 10.00 91.55 0.00 unfair qspinlock 570 10.00 62.98 0.00 unfair + PV qspinlock 586 10.00 59.68 0.00 Signed-off-by: Waiman Long waiman.l...@hp.com Tested-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com

[PATCH v11 16/16] pvqspinlock, x86: Enable PV qspinlock for XEN

2014-05-30 Thread Waiman Long
This patch adds the necessary XEN specific code to allow XEN to support the CPU halting and kicking operations needed by the queue spinlock PV code. Signed-off-by: Waiman Long waiman.l...@hp.com --- arch/x86/xen/spinlock.c | 147 +-- kernel

Re: [PATCH v10 03/19] qspinlock: Add pending bit

2014-05-19 Thread Waiman Long
On 05/14/2014 03:13 PM, Radim Krčmář wrote: 2014-05-14 19:00+0200, Peter Zijlstra: On Wed, May 14, 2014 at 06:51:24PM +0200, Radim Krčmář wrote: Ok. I've seen merit in pvqspinlock even with slightly slower first-waiter, so I would have happily sacrificed those horrible branches. (I prefer

Re: [PATCH v10 10/19] qspinlock, x86: Allow unfair spinlock in a virtual guest

2014-05-19 Thread Waiman Long
On 05/08/2014 03:12 PM, Peter Zijlstra wrote: On Wed, May 07, 2014 at 11:01:38AM -0400, Waiman Long wrote: No, we want the unfair thing for VIRT, not PARAVIRT. Yes, you are right. I will change that to VIRT. diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index

Re: [PATCH v10 03/19] qspinlock: Add pending bit

2014-05-13 Thread Waiman Long
On 05/12/2014 11:22 AM, Radim Krčmář wrote: 2014-05-07 11:01-0400, Waiman Long: From: Peter Zijlstra pet...@infradead.org Because the qspinlock needs to touch a second cacheline; add a pending bit and allow a single in-word spinner before we punt to the second cacheline. I think

Re: [PATCH v10 03/19] qspinlock: Add pending bit

2014-05-09 Thread Waiman Long
On 05/08/2014 02:57 PM, Peter Zijlstra wrote: On Wed, May 07, 2014 at 11:01:31AM -0400, Waiman Long wrote: +/** + * trylock_pending - try to acquire queue spinlock using the pending bit + * @lock : Pointer to queue spinlock structure + * @pval : Pointer to value of the queue spinlock 32-bit

Re: [PATCH v10 06/19] qspinlock: prolong the stay in the pending bit path

2014-05-09 Thread Waiman Long
On 05/08/2014 02:58 PM, Peter Zijlstra wrote: On Wed, May 07, 2014 at 11:01:34AM -0400, Waiman Long wrote: @@ -221,11 +222,37 @@ static inline int trylock_pending(struct qspinlock *lock, u32 *pval) */ for (;;) { /* -* If we observe any

Re: [PATCH v10 07/19] qspinlock: Use a simple write to grab the lock, if applicable

2014-05-09 Thread Waiman Long
On 05/08/2014 03:00 PM, Peter Zijlstra wrote: On Wed, May 07, 2014 at 11:01:35AM -0400, Waiman Long wrote: @@ -94,23 +94,29 @@ static inline struct mcs_spinlock *decode_tail(u32 tail) * can allow better optimization of the lock acquisition for the pending * bit holder

Re: [PATCH v10 07/19] qspinlock: Use a simple write to grab the lock, if applicable

2014-05-09 Thread Waiman Long
On 05/08/2014 03:02 PM, Peter Zijlstra wrote: On Wed, May 07, 2014 at 11:01:35AM -0400, Waiman Long wrote: /** + * get_qlock - Set the lock bit and own the lock + * @lock: Pointer to queue spinlock structure + * + * This routine should only be called when the caller is the only one

Re: [PATCH v10 08/19] qspinlock: Make a new qnode structure to support virtualization

2014-05-09 Thread Waiman Long
On 05/08/2014 03:04 PM, Peter Zijlstra wrote: On Wed, May 07, 2014 at 11:01:36AM -0400, Waiman Long wrote: /* + * To have additional features for better virtualization support, it is + * necessary to store additional data in the queue node structure. So + * a new queue node structure

Re: [PATCH v10 09/19] qspinlock: Prepare for unfair lock support

2014-05-09 Thread Waiman Long
On 05/08/2014 03:06 PM, Peter Zijlstra wrote: On Wed, May 07, 2014 at 11:01:37AM -0400, Waiman Long wrote: If unfair lock is supported, the lock acquisition loop at the end of the queue_spin_lock_slowpath() function may need to detect the fact the lock can be stolen. Code are added

Re: [PATCH v10 18/19] pvqspinlock, x86: Enable PV qspinlock PV for KVM

2014-05-08 Thread Waiman Long
On 05/07/2014 03:07 PM, Konrad Rzeszutek Wilk wrote: Raghavendra KT had done some performance testing on this patch with the following results: Overall we are seeing good improvement for pv-unfair version. System: 32 cpu sandybridge with HT on (4 node with 32 GB each) Guest : 8GB with 16

Re: [PATCH v10 00/19] qspinlock: a 4-byte queue spinlock with PV support

2014-05-08 Thread Waiman Long
On 05/07/2014 03:07 PM, Konrad Rzeszutek Wilk wrote: On Wed, May 07, 2014 at 11:01:28AM -0400, Waiman Long wrote: v9-v10: - Make some minor changes to qspinlock.c to accommodate review feedback. - Change author to PeterZ for 2 of the patches. - Include Raghavendra KT's test results

Re: [PATCH v9 00/19] qspinlock: a 4-byte queue spinlock with PV support

2014-05-07 Thread Waiman Long
On 04/27/2014 02:09 PM, Raghavendra K T wrote: For kvm part feel free to add: Tested-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com V9 testing has shown no hangs. I was able to do some performance testing. here are the results: Overall we are seeing good improvement for pv-unfair

[PATCH v10 00/19] qspinlock: a 4-byte queue spinlock with PV support

2014-05-07 Thread Waiman Long
of the lock or finer granularity ones. The main purpose is to make the lock contention problems more tolerable until someone can spend the time and effort to fix them. Peter Zijlstra (2): qspinlock: Add pending bit qspinlock: Optimize for smaller NR_CPUS Waiman Long (17): qspinlock: A simple generic

[PATCH v10 01/19] qspinlock: A simple generic 4-byte queue spinlock

2014-05-07 Thread Waiman Long
the lock is acquired, the queue node can be released to be used later. Signed-off-by: Waiman Long waiman.l...@hp.com Signed-off-by: Peter Zijlstra pet...@infradead.org --- include/asm-generic/qspinlock.h | 118 include/asm-generic/qspinlock_types.h | 61

[PATCH v10 02/19] qspinlock, x86: Enable x86-64 to use queue spinlock

2014-05-07 Thread Waiman Long
optimization which will make the queue spinlock code perform better than the generic implementation. Signed-off-by: Waiman Long waiman.l...@hp.com Signed-off-by: Peter Zijlstra pet...@infradead.org --- arch/x86/Kconfig |1 + arch/x86/include/asm/qspinlock.h | 29

[PATCH v10 04/19] qspinlock: Extract out the exchange of tail code word

2014-05-07 Thread Waiman Long
This patch extracts the logic for the exchange of new and previous tail code words into a new xchg_tail() function which can be optimized in a later patch. Signed-off-by: Waiman Long waiman.l...@hp.com --- include/asm-generic/qspinlock_types.h |2 + kernel/locking/qspinlock.c

[PATCH v10 10/19] qspinlock, x86: Allow unfair spinlock in a virtual guest

2014-05-07 Thread Waiman Long
. For large critical section, however, there may not be much benefit. Signed-off-by: Waiman Long waiman.l...@hp.com --- arch/x86/Kconfig | 11 + arch/x86/include/asm/qspinlock.h | 79 ++ arch/x86/kernel/Makefile |1 + arch/x86

[PATCH v10 07/19] qspinlock: Use a simple write to grab the lock, if applicable

2014-05-07 Thread Waiman Long
Usr Time -- - ticketlock 2075 10.00 216.35 3.49 qspinlock 3023 10.00 198.20 4.80 Signed-off-by: Waiman Long waiman.l...@hp.com --- kernel/locking/qspinlock.c | 61

[PATCH v10 05/19] qspinlock: Optimize for smaller NR_CPUS

2014-05-07 Thread Waiman Long
this is horribly broken on Alpha pre EV56 (and any other arch that cannot do single-copy atomic byte stores). Signed-off-by: Peter Zijlstra pet...@infradead.org Signed-off-by: Waiman Long waiman.l...@hp.com --- include/asm-generic/qspinlock_types.h | 13 kernel/locking/qspinlock.c

[PATCH v10 08/19] qspinlock: Make a new qnode structure to support virtualization

2014-05-07 Thread Waiman Long
. It is also necessary to expand arch_mcs_spin_lock_contended() to the underlying while loop as additional code will need to be inserted into the loop. Signed-off-by: Waiman Long waiman.l...@hp.com --- kernel/locking/qspinlock.c | 36 +++- 1 files changed, 23

[PATCH v10 09/19] qspinlock: Prepare for unfair lock support

2014-05-07 Thread Waiman Long
If unfair lock is supported, the lock acquisition loop at the end of the queue_spin_lock_slowpath() function may need to detect the fact that the lock can be stolen. Code is added for the stolen lock detection. A new qhead macro is also defined as a shorthand for mcs.locked. Signed-off-by: Waiman

[PATCH v10 11/19] qspinlock: Split the MCS queuing code into a separate slowerpath

2014-05-07 Thread Waiman Long
into a slowerpath function. This avoids the slow down of the pending bit and trylock code path at the expense of a little bit of additional overhead to the MCS queuing code path. Signed-off-by: Waiman Long waiman.l...@hp.com --- kernel/locking/qspinlock.c | 120 +-- 1

[PATCH v10 12/19] unfair qspinlock: Variable frequency lock stealing mechanism

2014-05-07 Thread Waiman Long
the cacheline contention problem on the lock word while trying to maintain as much of a FIFO order as possible. Signed-off-by: Waiman Long waiman.l...@hp.com --- kernel/locking/qspinlock.c | 147 +++- 1 files changed, 146 insertions(+), 1 deletions

[PATCH v10 13/19] unfair qspinlock: Enable lock stealing in lock waiters

2014-05-07 Thread Waiman Long
10507 1869 4307 Signed-off-by: Waiman Long waiman.l...@hp.com --- kernel/locking/qspinlock.c | 160 ++-- 1 files changed, 154 insertions(+), 6 deletions(-) diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index

[PATCH v10 14/19] pvqspinlock, x86: Rename paravirt_ticketlocks_enabled

2014-05-07 Thread Waiman Long
This patch renames the paravirt_ticketlocks_enabled static key to a more generic paravirt_spinlocks_enabled name. Signed-off-by: Waiman Long waiman.l...@hp.com --- arch/x86/include/asm/spinlock.h |4 ++-- arch/x86/kernel/kvm.c|2 +- arch/x86/kernel/paravirt

[PATCH v10 16/19] pvqspinlock: Enable coexistence with the unfair lock

2014-05-07 Thread Waiman Long
in the pending bit code path back to the regular queuing code path so that it can be properly halted by the PV qspinlock code. Signed-off-by: Waiman Long waiman.l...@hp.com --- kernel/locking/qspinlock.c | 74 ++-- 1 files changed, 64 insertions(+), 10 deletions

[PATCH v10 17/19] pvqspinlock: Add qspinlock para-virtualization support

2014-05-07 Thread Waiman Long
no difference in performance. When coupled with unfair lock, the queue spinlock can be much faster than the PV ticket lock. When both the unfair lock and PV spinlock features are turned on, lock stealing will still be allowed in the fastpath, but not in the slowpath. Signed-off-by: Waiman Long waiman.l

[PATCH v10 18/19] pvqspinlock, x86: Enable PV qspinlock PV for KVM

2014-05-07 Thread Waiman Long
%) 1.5x 3991.9622 (4%) 2.0x 2527.0613 (2.5%) Signed-off-by: Waiman Long waiman.l...@hp.com Tested-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com --- arch/x86/kernel/kvm.c | 135 + kernel/Kconfig.locks |2 +- 2 files changed, 136 insertions

[PATCH v10 19/19] pvqspinlock, x86: Enable PV qspinlock for XEN

2014-05-07 Thread Waiman Long
This patch adds the necessary XEN specific code to allow XEN to support the CPU halting and kicking operations needed by the queue spinlock PV code. Signed-off-by: Waiman Long waiman.l...@hp.com --- arch/x86/xen/spinlock.c | 147 +-- kernel

Re: [PATCH v9 05/19] qspinlock: Optimize for smaller NR_CPUS

2014-04-23 Thread Waiman Long
On 04/18/2014 05:40 PM, Waiman Long wrote: On 04/18/2014 03:05 PM, Peter Zijlstra wrote: On Fri, Apr 18, 2014 at 01:52:50PM -0400, Waiman Long wrote: I am confused by your notation. Nah, I think I was confused :-) Make the 1 _Q_LOCKED_VAL though, as that's the proper constant to use

Re: [PATCH v9 05/19] qspinlock: Optimize for smaller NR_CPUS

2014-04-23 Thread Waiman Long
On 04/23/2014 10:56 AM, Konrad Rzeszutek Wilk wrote: On Wed, Apr 23, 2014 at 10:23:43AM -0400, Waiman Long wrote: On 04/18/2014 05:40 PM, Waiman Long wrote: On 04/18/2014 03:05 PM, Peter Zijlstra wrote: On Fri, Apr 18, 2014 at 01:52:50PM -0400, Waiman Long wrote: I am confused by your

Re: [PATCH v9 05/19] qspinlock: Optimize for smaller NR_CPUS

2014-04-23 Thread Waiman Long
On 04/23/2014 01:55 PM, Konrad Rzeszutek Wilk wrote: On Wed, Apr 23, 2014 at 01:43:58PM -0400, Waiman Long wrote: On 04/23/2014 10:56 AM, Konrad Rzeszutek Wilk wrote: On Wed, Apr 23, 2014 at 10:23:43AM -0400, Waiman Long wrote: On 04/18/2014 05:40 PM, Waiman Long wrote: On 04/18/2014 03:05

Re: [PATCH v9 05/19] qspinlock: Optimize for smaller NR_CPUS

2014-04-23 Thread Waiman Long
On 04/23/2014 06:24 PM, Waiman Long wrote: On 04/23/2014 01:55 PM, Konrad Rzeszutek Wilk wrote: On Wed, Apr 23, 2014 at 01:43:58PM -0400, Waiman Long wrote: On 04/23/2014 10:56 AM, Konrad Rzeszutek Wilk wrote: On Wed, Apr 23, 2014 at 10:23:43AM -0400, Waiman Long wrote: On 04/18/2014 05:40

Re: [PATCH v9 03/19] qspinlock: Add pending bit

2014-04-18 Thread Waiman Long
On 04/18/2014 03:42 AM, Ingo Molnar wrote: * Waiman Long waiman.l...@hp.com wrote: Because the qspinlock needs to touch a second cacheline; add a pending bit and allow a single in-word spinner before we punt to the second cacheline. Signed-off-by: Peter Zijlstra pet...@infradead.org

Re: [PATCH v9 03/19] qspinlock: Add pending bit

2014-04-18 Thread Waiman Long
On 04/18/2014 04:13 AM, Peter Zijlstra wrote: On Thu, Apr 17, 2014 at 05:20:31PM -0400, Waiman Long wrote: + while ((val = atomic_read(&lock->val)) & _Q_LOCKED_MASK) + arch_mutex_cpu_relax(); That was a cpu_relax(). Yes, but arch_mutex_cpu_relax() is the same as cpu_relax

Re: [PATCH v9 04/19] qspinlock: Extract out the exchange of tail code word

2014-04-18 Thread Waiman Long
On 04/18/2014 04:15 AM, Peter Zijlstra wrote: On Thu, Apr 17, 2014 at 05:28:17PM -0400, Waiman Long wrote: On 04/17/2014 11:49 AM, Peter Zijlstra wrote: On Thu, Apr 17, 2014 at 11:03:56AM -0400, Waiman Long wrote: @@ -192,36 +220,25 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32

Re: [PATCH v9 05/19] qspinlock: Optimize for smaller NR_CPUS

2014-04-18 Thread Waiman Long
On 04/18/2014 04:27 AM, Peter Zijlstra wrote: On Thu, Apr 17, 2014 at 05:46:27PM -0400, Waiman Long wrote: On 04/17/2014 11:56 AM, Peter Zijlstra wrote: On Thu, Apr 17, 2014 at 11:03:57AM -0400, Waiman Long wrote: +struct __qspinlock { + union { + atomic_t val

Re: [PATCH v9 06/19] qspinlock: prolong the stay in the pending bit path

2014-04-18 Thread Waiman Long
On 04/18/2014 04:33 AM, Peter Zijlstra wrote: On Thu, Apr 17, 2014 at 09:46:04PM -0400, Waiman Long wrote: BTW, I didn't test out your atomic_test_and_set() change. Did it provide a noticeable performance benefit when compared with cmpxchg()? I've not tested that I think. I had a hard time

Re: [PATCH v9 03/19] qspinlock: Add pending bit

2014-04-18 Thread Waiman Long
On 04/18/2014 12:35 PM, Konrad Rzeszutek Wilk wrote: On Fri, Apr 18, 2014 at 12:23:29PM -0400, Waiman Long wrote: On 04/18/2014 03:42 AM, Ingo Molnar wrote: * Waiman Long waiman.l...@hp.com wrote: Because the qspinlock needs to touch a second cacheline; add a pending bit and allow a single

Re: [PATCH v9 04/19] qspinlock: Extract out the exchange of tail code word

2014-04-18 Thread Waiman Long
On 04/18/2014 01:53 PM, Peter Zijlstra wrote: On Fri, Apr 18, 2014 at 01:32:47PM -0400, Waiman Long wrote: On 04/18/2014 04:15 AM, Peter Zijlstra wrote: On Thu, Apr 17, 2014 at 05:28:17PM -0400, Waiman Long wrote: On 04/17/2014 11:49 AM, Peter Zijlstra wrote: On Thu, Apr 17, 2014 at 11:03

Re: [PATCH v9 05/19] qspinlock: Optimize for smaller NR_CPUS

2014-04-18 Thread Waiman Long
On 04/18/2014 03:05 PM, Peter Zijlstra wrote: On Fri, Apr 18, 2014 at 01:52:50PM -0400, Waiman Long wrote: I am confused by your notation. Nah, I think I was confused :-) Make the 1 _Q_LOCKED_VAL though, as that's the proper constant to use. Everyone gets confused once in a while:-) I have

[PATCH v9 01/19] qspinlock: A simple generic 4-byte queue spinlock

2014-04-17 Thread Waiman Long
the lock is acquired, the queue node can be released to be used later. Signed-off-by: Waiman Long waiman.l...@hp.com Signed-off-by: Peter Zijlstra pet...@infradead.org --- include/asm-generic/qspinlock.h | 118 include/asm-generic/qspinlock_types.h | 61

[PATCH v9 00/19] qspinlock: a 4-byte queue spinlock with PV support

2014-04-17 Thread Waiman Long
spinlock contention problems. Those need to be solved by refactoring the code to make more efficient use of the lock or finer granularity ones. The main purpose is to make the lock contention problems more tolerable until someone can spend the time and effort to fix them. Waiman Long (19): qspinlock

[PATCH v9 03/19] qspinlock: Add pending bit

2014-04-17 Thread Waiman Long
Because the qspinlock needs to touch a second cacheline; add a pending bit and allow a single in-word spinner before we punt to the second cacheline. Signed-off-by: Peter Zijlstra pet...@infradead.org Signed-off-by: Waiman Long waiman.l...@hp.com --- include/asm-generic/qspinlock_types.h | 12

[PATCH v9 02/19] qspinlock, x86: Enable x86-64 to use queue spinlock

2014-04-17 Thread Waiman Long
optimization which will make the queue spinlock code perform better than the generic implementation. Signed-off-by: Waiman Long waiman.l...@hp.com Signed-off-by: Peter Zijlstra pet...@infradead.org --- arch/x86/Kconfig |1 + arch/x86/include/asm/qspinlock.h | 29

[PATCH v9 04/19] qspinlock: Extract out the exchange of tail code word

2014-04-17 Thread Waiman Long
This patch extracts the logic for the exchange of new and previous tail code words into a new xchg_tail() function which can be optimized in a later patch. Signed-off-by: Waiman Long waiman.l...@hp.com --- include/asm-generic/qspinlock_types.h |2 + kernel/locking/qspinlock.c

[PATCH v9 05/19] qspinlock: Optimize for smaller NR_CPUS

2014-04-17 Thread Waiman Long
(and any other arch that cannot do single-copy atomic byte stores). Signed-off-by: Peter Zijlstra pet...@infradead.org Signed-off-by: Waiman Long waiman.l...@hp.com --- include/asm-generic/qspinlock_types.h | 13 kernel/locking/qspinlock.c| 111 ++--- 2

[PATCH v9 06/19] qspinlock: prolong the stay in the pending bit path

2014-04-17 Thread Waiman Long
comparable to ticket spinlock on the same node, but much faster when in different nodes. With 3 contending tasks, however, the ticket spinlock is still quite a bit faster. Signed-off-by: Waiman Long waiman.l...@hp.com --- kernel/locking/qspinlock.c | 32 ++-- 1 files

[PATCH v9 07/19] qspinlock: Use a simple write to grab the lock, if applicable

2014-04-17 Thread Waiman Long
Usr Time -- - ticketlock 2075 10.00 216.35 3.49 qspinlock 3023 10.00 198.20 4.80 Signed-off-by: Waiman Long waiman.l...@hp.com --- kernel/locking/qspinlock.c | 61

[PATCH v9 09/19] qspinlock: Prepare for unfair lock support

2014-04-17 Thread Waiman Long
If unfair lock is supported, the lock acquisition loop at the end of the queue_spin_lock_slowpath() function may need to detect the fact that the lock can be stolen. Code is added for the stolen lock detection. Signed-off-by: Waiman Long waiman.l...@hp.com --- kernel/locking/qspinlock.c | 26

[PATCH v9 10/19] qspinlock, x86: Allow unfair spinlock in a virtual guest

2014-04-17 Thread Waiman Long
. For large critical section, however, there may not be much benefit. Signed-off-by: Waiman Long waiman.l...@hp.com --- arch/x86/Kconfig | 11 + arch/x86/include/asm/qspinlock.h | 79 ++ arch/x86/kernel/Makefile |1 + arch/x86

[PATCH v9 08/19] qspinlock: Make a new qnode structure to support virtualization

2014-04-17 Thread Waiman Long
. It is also necessary to expand arch_mcs_spin_lock_contended() to the underlying while loop as additional code will need to be inserted into the loop. Signed-off-by: Waiman Long waiman.l...@hp.com --- kernel/locking/qspinlock.c | 36 +++- 1 files changed, 23

[PATCH v9 13/19] unfair qspinlock: Enable lock stealing in lock waiters

2014-04-17 Thread Waiman Long
10507 1869 4307 Signed-off-by: Waiman Long waiman.l...@hp.com --- kernel/locking/qspinlock.c | 160 ++-- 1 files changed, 154 insertions(+), 6 deletions(-) diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index

[PATCH v9 12/19] unfair qspinlock: Variable frequency lock stealing mechanism

2014-04-17 Thread Waiman Long
the cacheline contention problem on the lock word while trying to maintain as much of a FIFO order as possible. Signed-off-by: Waiman Long waiman.l...@hp.com --- kernel/locking/qspinlock.c | 147 +++- 1 files changed, 146 insertions(+), 1 deletions

[PATCH v9 11/19] qspinlock: Split the MCS queuing code into a separate slowerpath

2014-04-17 Thread Waiman Long
into a slowerpath function. This avoids the slow down of the pending bit and trylock code path at the expense of a little bit of additional overhead to the MCS queuing code path. Signed-off-by: Waiman Long waiman.l...@hp.com --- kernel/locking/qspinlock.c | 111

[PATCH v9 15/19] pvqspinlock, x86: Add PV data structure methods

2014-04-17 Thread Waiman Long
statistical data for debugfs Signed-off-by: Waiman Long waiman.l...@hp.com --- arch/x86/include/asm/paravirt.h | 18 +- arch/x86/include/asm/paravirt_types.h | 17 + arch/x86/kernel/paravirt-spinlocks.c |6 ++ 3 files changed, 40 insertions(+), 1

[PATCH v9 14/19] pvqspinlock, x86: Rename paravirt_ticketlocks_enabled

2014-04-17 Thread Waiman Long
This patch renames the paravirt_ticketlocks_enabled static key to a more generic paravirt_spinlocks_enabled name. Signed-off-by: Waiman Long waiman.l...@hp.com --- arch/x86/include/asm/spinlock.h |4 ++-- arch/x86/kernel/kvm.c|2 +- arch/x86/kernel/paravirt

[PATCH v9 16/19] pvqspinlock: Enable coexistence with the unfair lock

2014-04-17 Thread Waiman Long
in the pending bit code path back to the regular queuing code path so that it can be properly halted by the PV qspinlock code. Signed-off-by: Waiman Long waiman.l...@hp.com --- kernel/locking/qspinlock.c | 74 ++-- 1 files changed, 64 insertions(+), 10 deletions

[PATCH v9 18/19] pvqspinlock, x86: Enable PV qspinlock PV for KVM

2014-04-17 Thread Waiman Long
-off-by: Waiman Long waiman.l...@hp.com --- arch/x86/kernel/kvm.c | 135 + kernel/Kconfig.locks |2 +- 2 files changed, 136 insertions(+), 1 deletions(-) diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 7ab8ab3..eef427b 100644

[PATCH v9 19/19] pvqspinlock, x86: Enable PV qspinlock for XEN

2014-04-17 Thread Waiman Long
This patch adds the necessary XEN specific code to allow XEN to support the CPU halting and kicking operations needed by the queue spinlock PV code. Signed-off-by: Waiman Long waiman.l...@hp.com --- arch/x86/xen/spinlock.c | 146 +-- kernel

Re: [PATCH v9 03/19] qspinlock: Add pending bit

2014-04-17 Thread Waiman Long
On 04/17/2014 11:42 AM, Peter Zijlstra wrote: On Thu, Apr 17, 2014 at 11:03:55AM -0400, Waiman Long wrote: +/** + * trylock_pending - try to acquire queue spinlock using the pending bit + * @lock : Pointer to queue spinlock structure + * @pval : Pointer to value of the queue spinlock 32-bit

Re: [PATCH v9 04/19] qspinlock: Extract out the exchange of tail code word

2014-04-17 Thread Waiman Long
On 04/17/2014 11:49 AM, Peter Zijlstra wrote: On Thu, Apr 17, 2014 at 11:03:56AM -0400, Waiman Long wrote: @@ -192,36 +220,25 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val) node->next = NULL; /* +* We touched a (possibly) cold cacheline; attempt

Re: [PATCH v9 05/19] qspinlock: Optimize for smaller NR_CPUS

2014-04-17 Thread Waiman Long
On 04/17/2014 11:56 AM, Peter Zijlstra wrote: On Thu, Apr 17, 2014 at 11:03:57AM -0400, Waiman Long wrote: +struct __qspinlock { + union { + atomic_t val; + struct { +#ifdef __LITTLE_ENDIAN + u16 locked_pending

Re: [PATCH v9 05/19] qspinlock: Optimize for smaller NR_CPUS

2014-04-17 Thread Waiman Long
On 04/17/2014 11:58 AM, Peter Zijlstra wrote: On Thu, Apr 17, 2014 at 11:03:57AM -0400, Waiman Long wrote: +static __always_inline void +clear_pending_set_locked(struct qspinlock *lock, u32 val) +{ + struct __qspinlock *l = (void *)lock; + + ACCESS_ONCE(l->locked_pending) = 1

Re: [PATCH v9 06/19] qspinlock: prolong the stay in the pending bit path

2014-04-17 Thread Waiman Long
On 04/17/2014 12:36 PM, Peter Zijlstra wrote: On Thu, Apr 17, 2014 at 11:03:58AM -0400, Waiman Long wrote: There is a problem in the current trylock_pending() function. When the lock is free, but the pending bit holder hasn't grabbed the lock and cleared the pending bit yet, the trylock_pending

Re: [PATCH v9 00/19] qspinlock: a 4-byte queue spinlock with PV support

2014-04-17 Thread Waiman Long
On 04/17/2014 01:23 PM, Konrad Rzeszutek Wilk wrote: On Thu, Apr 17, 2014 at 11:03:52AM -0400, Waiman Long wrote: v8->v9: - Integrate PeterZ's version of the queue spinlock patch with some modification: http://lkml.kernel.org/r/20140310154236.038181...@infradead.org - Break

Re: [PATCH v9 00/19] qspinlock: a 4-byte queue spinlock with PV support

2014-04-17 Thread Waiman Long
On 04/17/2014 01:40 PM, Raghavendra K T wrote: On 04/17/2014 10:53 PM, Konrad Rzeszutek Wilk wrote: On Thu, Apr 17, 2014 at 11:03:52AM -0400, Waiman Long wrote: v8->v9: - Integrate PeterZ's version of the queue spinlock patch with some modification: http://lkml.kernel.org/r

Re: [PATCH v8 00/10] qspinlock: a 4-byte queue spinlock with PV support

2014-04-08 Thread Waiman Long
On 04/07/2014 01:51 PM, Raghavendra K T wrote: On 04/07/2014 10:08 PM, Waiman Long wrote: On 04/07/2014 02:14 AM, Raghavendra K T wrote: [...] But I am seeing hang in overcommit cases. Gdb showed that many vcpus are halted and there was no progress. Suspecting the problem /race with halting

Re: [PATCH v8 00/10] qspinlock: a 4-byte queue spinlock with PV support

2014-04-07 Thread Waiman Long
On 04/07/2014 02:14 AM, Raghavendra K T wrote: I tested the v7,v8 of qspinlock with unfair config on kvm guest. I was curious about unfair locks performance in undercommit cases. (overcommit case is expected to perform well) But I am seeing hang in overcommit cases. Gdb showed that many

Re: [PATCH v8 01/10] qspinlock: A generic 4-byte queue spinlock implementation

2014-04-07 Thread Waiman Long
On 04/07/2014 10:09 AM, Peter Zijlstra wrote: On Fri, Apr 04, 2014 at 01:08:16PM -0400, Waiman Long wrote: Peter's patch is a rewrite of my patches 1-4, there is no PV or unfair lock support in there. Yes, because your patches were unreadable and entirely non obvious. And while I appreciate

Re: [PATCH v8 01/10] qspinlock: A generic 4-byte queue spinlock implementation

2014-04-04 Thread Waiman Long
On 04/04/2014 09:00 AM, Peter Zijlstra wrote: So I'm just not ever going to pick up this patch; I spend a week trying to reverse engineer this; I posted a 7 patch series creating the equivalent, but in a gradual and readable fashion:
