Re: [PATCH 0/2] dm-band: The I/O bandwidth controller: Overview

2008-01-23 Thread Peter Zijlstra
On Wed, 2008-01-23 at 21:53 +0900, Ryo Tsuruta wrote: Hi everyone, I'm happy to announce that I've implemented a Block I/O bandwidth controller. The controller is designed to be of use in a cgroup or virtual machine environment. The current approach is that the controller is implemented as

Re: [PATCH] stopmachine: add stopmachine_timeout v3

2008-07-16 Thread Peter Zijlstra
On Wed, 2008-07-16 at 15:51 +0900, Hidetoshi Seto wrote: If stop_machine() is invoked while one of the online CPUs is locked up for some reason, stop_machine cannot finish its work because the locked CPU cannot stop. This means all other healthy CPUs will be blocked indefinitely by one dead CPU.

Re: How to get a sense of VM pressure

2008-07-28 Thread Peter Zijlstra
On Fri, 2008-07-25 at 10:55 -0700, Jeremy Fitzhardinge wrote: I'm thinking about ways to improve the Xen balloon driver. This is the driver which allows the guest domain to expand or contract by either asking for more memory from the hypervisor, or giving unneeded memory back. From the

Re: [patch 0/4] [RFC] Another proportional weight IO controller

2008-11-06 Thread Peter Zijlstra
On Thu, 2008-11-06 at 11:01 -0500, Vivek Goyal wrote: Does this still require I use dm, or does it also work on regular block devices? Patch 4/4 isn't quite clear on this. No. You don't have to use dm. It will simply work on regular devices. We shall have to put a few lines of code for it

Re: [patch 0/4] [RFC] Another proportional weight IO controller

2008-11-06 Thread Peter Zijlstra
On Thu, 2008-11-06 at 10:30 -0500, [EMAIL PROTECTED] wrote: Hi, If you are not already tired of so many io controller implementations, here is another one. This is a very early, very crude implementation to get early feedback to see if this approach makes any sense or not. This

Re: [patch 0/4] [RFC] Another proportional weight IO controller

2008-11-06 Thread Peter Zijlstra
On Thu, 2008-11-06 at 11:39 -0500, Vivek Goyal wrote: On Thu, Nov 06, 2008 at 05:16:13PM +0100, Peter Zijlstra wrote: On Thu, 2008-11-06 at 11:01 -0500, Vivek Goyal wrote: Does this still require I use dm, or does it also work on regular block devices? Patch 4/4 isn't quite clear

Re: [patch 0/4] [RFC] Another proportional weight IO controller

2008-11-06 Thread Peter Zijlstra
On Thu, 2008-11-06 at 11:57 -0500, Rik van Riel wrote: Peter Zijlstra wrote: The only real issue I can see is with linear volumes, but those are stupid anyway - none of the gains but all the risks. Linear volumes may well be the most common ones. People start out with the filesystems

Re: [patch 0/4] [RFC] Another proportional weight IO controller

2008-11-07 Thread Peter Zijlstra
On Fri, 2008-11-07 at 11:41 +1100, Dave Chinner wrote: On Thu, Nov 06, 2008 at 06:11:27PM +0100, Peter Zijlstra wrote: On Thu, 2008-11-06 at 11:57 -0500, Rik van Riel wrote: Peter Zijlstra wrote: The only real issue I can see is with linear volumes, but those are stupid anyway

RE: [patch 0/4] [RFC] Another proportional weight IO controller

2008-11-14 Thread Peter Zijlstra
On Fri, 2008-11-14 at 13:58 +0900, Satoshi UCHIDA wrote: I think Satoshi's cfq controller patches also do not seem to be considering A, B, C, D and E to be at same level, instead it treats cgroup / , D and E at same level and tries to do proportional BW division among these. Satoshi,

Re: [PATCH 0/4] Virtual Machine Time Accounting

2009-08-04 Thread Peter Zijlstra
These patches never seem to have made it onto LKML?! On Mon, 2007-08-20 at 15:13 +0200, Laurent Vivier wrote: The aim of these four patches is to introduce Virtual Machine time accounting. _Ingo_, as these patches modify files of the scheduler, could you have a look at them, please?

Re: [PATCH 0/4] Virtual Machine Time Accounting

2009-08-04 Thread Peter Zijlstra
On Tue, 2009-08-04 at 19:29 +0200, Martin Schwidefsky wrote: So it's going to split user time into user and guest. Does that really make sense? For the host kernel it really is just another user process, no? The code (at least in parts) is already upstream. Look at the account_guest_time

Re: [PATCH 00/20] x86: ticket lock rewrite and paravirtualization

2010-11-15 Thread Peter Zijlstra
On Mon, 2010-11-15 at 12:03 -0800, H. Peter Anvin wrote: On 11/15/2010 12:00 PM, Jeremy Fitzhardinge wrote: Another approach I discussed with PeterZ and Mathieu is to steal the LSB of the ticket counters (halving the max CPU count) to use as a there is someone in slowpath waiting on this
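(A minimal userspace model of the LSB-stealing idea discussed above; the flag/kick names and the spin threshold are illustrative assumptions, not the eventual kernel API, and clearing the flag again on the uncontended path is omitted for brevity.)

    #include <stdatomic.h>
    #include <stdint.h>

    #define TICKET_SLOWPATH_FLAG 1u   /* stolen LSB: halves the max CPU count */
    #define TICKET_INC           2u   /* tickets now advance in steps of 2 */

    struct ticketlock { _Atomic uint16_t head, tail; };

    static void block_until_kicked(void) { /* hypercall in the real thing */ }
    static void kick_next_waiter(void)   { /* hypercall in the real thing */ }

    static void ticket_lock(struct ticketlock *l)
    {
            /* my ticket, with the slowpath flag masked off */
            uint16_t me = atomic_fetch_add(&l->tail, TICKET_INC) & ~TICKET_SLOWPATH_FLAG;
            unsigned spins = 0;

            while (atomic_load(&l->head) != me) {
                    if (++spins > 1000) {   /* done spinning */
                            /* advertise "someone is in the slowpath", then sleep */
                            atomic_fetch_or(&l->tail, TICKET_SLOWPATH_FLAG);
                            block_until_kicked();
                    }
            }
    }

    static void ticket_unlock(struct ticketlock *l)
    {
            atomic_fetch_add(&l->head, TICKET_INC);
            if (atomic_load(&l->tail) & TICKET_SLOWPATH_FLAG)
                    kick_next_waiter();     /* wake the holder of the next ticket */
    }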

Re: [PATCH 00/20] x86: ticket lock rewrite and paravirtualization

2010-11-15 Thread Peter Zijlstra
On Mon, 2010-11-15 at 13:02 -0800, Jeremy Fitzhardinge wrote: As a heuristic, it shouldn't be too bad performance-wise, since (handwaving) if ticketholder N has entered the slowpath, then it's likely that N+1 will as well. Yes, esp. if the whole slow unlock path takes more cycles than you

Re: [PATCH 13/14] x86/ticketlock: add slowpath logic

2010-11-17 Thread Peter Zijlstra
On Tue, 2010-11-16 at 13:08 -0800, Jeremy Fitzhardinge wrote: Maintain a flag in both LSBs of the ticket lock which indicates whether anyone is in the lock slowpath and may need kicking when the current holder unlocks. The flags are set when the first locker enters the slowpath, and cleared

[PATCH] sched: provide scheduler_ipi() callback in response to smp_send_reschedule()

2011-01-17 Thread Peter Zijlstra
when low on memory. Signed-off-by: Peter Zijlstra a.p.zijls...@chello.nl --- arch/alpha/kernel/smp.c |1 + arch/arm/kernel/smp.c |1 + arch/blackfin/mach-common/smp.c |3 ++- arch/cris/arch-v32/kernel/smp.c | 13 - arch/ia64/kernel/irq_ia64.c |2

Re: [PATCH] sched: provide scheduler_ipi() callback in response to smp_send_reschedule()

2011-01-17 Thread Peter Zijlstra
On Mon, 2011-01-17 at 11:26 +0000, Russell King - ARM Linux wrote: On Mon, Jan 17, 2011 at 12:07:13PM +0100, Peter Zijlstra wrote: diff --git a/arch/alpha/kernel/smp.c b/arch/alpha/kernel/smp.c index 42aa078..c4a570b 100644 --- a/arch/alpha/kernel/smp.c +++ b/arch/alpha/kernel/smp.c

Re: [PATCH] sched: provide scheduler_ipi() callback in response to smp_send_reschedule()

2011-01-17 Thread Peter Zijlstra
On Mon, 2011-01-17 at 12:31 +0100, Peter Zijlstra wrote: On Mon, 2011-01-17 at 11:26 +0000, Russell King - ARM Linux wrote: Maybe remove the comment "everything is done on the interrupt return path" as with this function call, that is no longer the case. (Removed am33, m32r-ka, m32r, arm

Re: [uclinux-dist-devel] [PATCH] sched: provide scheduler_ipi() callback in response to smp_send_reschedule()

2011-01-17 Thread Peter Zijlstra
On Mon, 2011-01-17 at 14:49 -0500, Mike Frysinger wrote: On Mon, Jan 17, 2011 at 06:07, Peter Zijlstra wrote: Also, while reading through all this, I noticed the blackfin SMP code looks to be broken, it simply discards any IPI when low on memory. not really. see changelog of commit

Re: [PATCH] sched: provide scheduler_ipi() callback in response to smp_send_reschedule()

2011-01-17 Thread Peter Zijlstra
On Tue, 2011-01-18 at 07:31 +1100, Benjamin Herrenschmidt wrote: Beware of false positive, I've used fake reschedule IPIs in the past for other things (like kicking a CPU out of sleep state for unrelated reasons). Nothing that I know that is upstream today but some of that might come back.

Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock

2011-01-19 Thread Peter Zijlstra
On Wed, 2011-01-19 at 22:42 +0530, Srivatsa Vaddagiri wrote: Add two hypercalls to KVM hypervisor to support pv-ticketlocks. KVM_HC_WAIT_FOR_KICK blocks the calling vcpu until another vcpu kicks it or it is woken up because of an event like interrupt. KVM_HC_KICK_CPU allows the calling

Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock

2011-01-19 Thread Peter Zijlstra
On Wed, 2011-01-19 at 22:53 +0530, Srivatsa Vaddagiri wrote: On Wed, Jan 19, 2011 at 10:42:39PM +0530, Srivatsa Vaddagiri wrote: Add two hypercalls to KVM hypervisor to support pv-ticketlocks. KVM_HC_WAIT_FOR_KICK blocks the calling vcpu until another vcpu kicks it or it is woken up

Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock

2011-01-20 Thread Peter Zijlstra
On Thu, 2011-01-20 at 17:29 +0530, Srivatsa Vaddagiri wrote: If we had a yield-to [1] sort of interface _and_ information on which vcpu owns a lock, then lock-spinners can yield-to the owning vcpu, and then I'd nak it for being stupid ;-) really, yield*() is retarded, never even consider

Re: [PATCH] sched: provide scheduler_ipi() callback in response to smp_send_reschedule()

2011-02-07 Thread Peter Zijlstra
On Mon, 2011-02-07 at 10:26 +1100, Benjamin Herrenschmidt wrote: You missed: diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c index 9813605..467d122 100644 --- a/arch/powerpc/kernel/smp.c +++ b/arch/powerpc/kernel/smp.c @@ -98,6 +98,7 @@ void smp_message_recv(int msg)

Re: [PATCH] sched: provide scheduler_ipi() callback in response to smp_send_reschedule()

2011-02-09 Thread Peter Zijlstra
On Wed, 2011-02-09 at 17:14 +1100, Benjamin Herrenschmidt wrote: On Mon, 2011-02-07 at 14:54 +0100, Peter Zijlstra wrote: On Mon, 2011-02-07 at 10:26 +1100, Benjamin Herrenschmidt wrote: You missed: diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c index 9813605

Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-21 Thread Peter Zijlstra
On Tue, 2012-08-21 at 16:52 +0300, Michael S. Tsirkin wrote: + rcu_read_lock(); + mapping = rcu_dereference(page->mapping); + if (mapping_balloon(mapping)) + ret = true; + rcu_read_unlock(); This looks suspicious: you

Re: [PATCH v8 1/5] mm: introduce a common interface for balloon pages mobility

2012-08-21 Thread Peter Zijlstra
On Tue, 2012-08-21 at 09:47 -0300, Rafael Aquini wrote: + mapping = rcu_access_pointer(page->mapping); + if (mapping) + mapping = mapping->assoc_mapping; The comment near rcu_access_pointer() explicitly says: * Return the value of the specified RCU-protected pointer,
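(The rule in kernel-style shorthand -- a sketch of the two access patterns, not the balloon patch itself: rcu_dereference() gives you a pointer you may dereference, but only inside a read-side critical section; rcu_access_pointer() may be used outside one, but only to test the pointer value, never to dereference it.)

    rcu_read_lock();
    mapping = rcu_dereference(page->mapping);   /* may be dereferenced here */
    if (mapping && mapping_balloon(mapping))
            ret = true;
    rcu_read_unlock();

    if (rcu_access_pointer(page->mapping))      /* NULL-test only; do NOT  */
            ret = true;                         /* follow this pointer     */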

Re: [PATCH v8 3/5] virtio_balloon: introduce migration primitives to balloon pages

2012-08-21 Thread Peter Zijlstra
On Tue, 2012-08-21 at 17:40 +0300, Michael S. Tsirkin wrote: + spin_lock(&vb->pages_lock); + page = list_first_or_null_rcu(&vb->pages, struct page, lru); Why is list_first_or_null_rcu called outside an RCU critical section here? It looks like vb->pages_lock is the

Re: [PATCH v10 1/5] mm: introduce a common interface for balloon pages mobility

2012-09-24 Thread Peter Zijlstra
On Mon, 2012-09-17 at 13:38 -0300, Rafael Aquini wrote: +static inline void assign_balloon_mapping(struct page *page, + struct address_space *mapping) +{ + page->mapping = mapping; + smp_wmb(); +} + +static inline void

Re: [PATCH RFC V9 0/19] Paravirtualized ticket spinlocks

2013-07-10 Thread Peter Zijlstra
On Wed, Jul 10, 2013 at 01:33:25PM +0300, Gleb Natapov wrote: Here's an idea, trim the damn email ;-) -- not only directed at gleb. Ingo, Gleb, From the results perspective, Andrew Theurer, Vinod's test results are pro-pvspinlock. Could you please help me to know what will make it a

Re: [PATCH RFC V10 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-07-16 Thread Peter Zijlstra
On Tue, Jul 16, 2013 at 09:02:15AM +0300, Gleb Natapov wrote: BTW can NMI handler take spinlocks? No -- that is, yes you can using trylock, but you still shouldn't. If it can what happens if NMI is delivered in a section protected by local_irq_save()/local_irq_restore()? You deadlock.
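(The only NMI-safe shape, sketched in kernel style; shared_lock is a stand-in name. Never spin: the NMI may have interrupted the very context, on the same CPU, that already holds the lock.)

    if (raw_spin_trylock(&shared_lock)) {
            /* ... touch the shared state ... */
            raw_spin_unlock(&shared_lock);
    } else {
            /* possibly racing with ourselves: skip or defer the work */
    }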

Re: [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014-02-26 Thread Peter Zijlstra
You don't happen to have a proper state diagram for this thing do you? I suppose I'm going to have to make one; this is all getting a bit unwieldy, and those xchg() + fixup things are hard to read. On Wed, Feb 26, 2014 at 10:14:23AM -0500, Waiman Long wrote: +static inline int

Re: [PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-02-26 Thread Peter Zijlstra
On Wed, Feb 26, 2014 at 10:14:21AM -0500, Waiman Long wrote: +struct qnode { + u32 wait; /* Waiting flag */ + struct qnode *next; /* Next queue node addr */ +}; + +struct qnode_set { + struct qnode nodes[MAX_QNODES]; + int

Re: [PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-02-26 Thread Peter Zijlstra
On Wed, Feb 26, 2014 at 10:14:21AM -0500, Waiman Long wrote: +static void put_qnode(void) +{ + struct qnode_set *qset = this_cpu_ptr(&qnset); + + qset->node_idx--; +} That very much wants to be: this_cpu_dec().
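(That is, the open-coded pointer-plus-decrement collapses to a single preemption-safe percpu operation; sketch assuming qnset is the per-cpu variable from the quoted patch.)

    static void put_qnode(void)
    {
            this_cpu_dec(qnset.node_idx);   /* one instruction on x86 */
    }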

Re: [PATCH v5 0/8] qspinlock: a 4-byte queue spinlock with PV support

2014-02-27 Thread Peter Zijlstra
Is this the same 8 patches you sent yesterday?

Re: [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014-02-28 Thread Peter Zijlstra
On Thu, Feb 27, 2014 at 03:42:19PM -0500, Waiman Long wrote: + old = xchg(&qlock->lock_wait, _QSPINLOCK_WAITING|_QSPINLOCK_LOCKED); + + if (old == 0) { + /* +* Got the lock, can clear the waiting bit now +*/ + smp_u8_store_release(&qlock->wait,

Re: [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014-02-28 Thread Peter Zijlstra
On Fri, Feb 28, 2014 at 08:25:24AM -0800, Linus Torvalds wrote: On Feb 28, 2014 1:30 AM, Peter Zijlstra pet...@infradead.org wrote: At low contention the cmpxchg won't have to be retried (much) so using it won't be a problem and you get to have arbitrary atomic ops. Peter, the difference

Re: [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014-02-28 Thread Peter Zijlstra
After modifying it to do a deterministic cmpxchg, the test run time of 2 contending tasks jumps up from 600ms (best case) to about 1700ms which was worse than the original qspinlock's 1300-1500ms. It is the opportunistic nature of the xchg() code that can potentially combine multiple steps

Re: [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014-03-03 Thread Peter Zijlstra
Hi, Here are some numbers for my version -- also attached is the test code. I found that booting big machines is tediously slow so I lifted the whole lot to userspace. I measure the cycles spent in arch_spin_lock() + arch_spin_unlock(). The machines used are a 4 node (2 socket) AMD Interlagos,
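(A sketch of that userspace measurement approach, x86 TSC based; arch_spinlock_t and the lock/unlock calls stand in for whichever implementation has been lifted out of the kernel for the test.)

    #include <stdint.h>

    static inline uint64_t rdtsc(void)
    {
            uint32_t lo, hi;
            asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
            return ((uint64_t)hi << 32) | lo;
    }

    /* per-thread loop: accumulate cycles spent in lock + unlock */
    uint64_t bench(arch_spinlock_t *lock, unsigned long iters)
    {
            uint64_t total = 0;

            while (iters--) {
                    uint64_t t0 = rdtsc();
                    arch_spin_lock(lock);
                    arch_spin_unlock(lock);
                    total += rdtsc() - t0;
            }
            return total;
    }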

Re: [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014-03-04 Thread Peter Zijlstra
Updated version, this includes numbers for my SNB desktop and Waiman's variant. Curiously Waiman's version seems consistently slower on 2 cross node CPUs. Whereas my version seems to have a problem on SNB with 2 CPUs. There's something weird with the ticket lock numbers; when I compile the code

Re: [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014-03-04 Thread Peter Zijlstra
On Tue, Mar 04, 2014 at 05:58:00PM +0100, Peter Zijlstra wrote: 2: 17141.324050  2: 620.185930  2: 618.737681 So I forgot that AMD has compute units that share L2: root@interlagos:~/spinlocks# export LOCK=./ticket ; ($LOCK 0 1 ; $LOCK 0 2) | awk '/^total/ { print $2

Re: [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014-03-04 Thread Peter Zijlstra
On Tue, Mar 04, 2014 at 12:48:26PM -0500, Waiman Long wrote: Peter, I was trying to implement the generic queue code exchange code using cmpxchg as suggested by you. However, when I gathered the performance data, the code performed worse than I expected at a higher contention level. Below

Re: [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014-03-05 Thread Peter Zijlstra
On Tue, Mar 04, 2014 at 11:40:43PM +0100, Peter Zijlstra wrote: On Tue, Mar 04, 2014 at 12:48:26PM -0500, Waiman Long wrote: Peter, I was trying to implement the generic queue code exchange code using cmpxchg as suggested by you. However, when I gathered the performance data, the code

Re: [PATCH v6 04/11] qspinlock: Optimized code path for 2 contending tasks

2014-03-13 Thread Peter Zijlstra
On Wed, Mar 12, 2014 at 03:08:24PM -0400, Waiman Long wrote: On 03/12/2014 02:54 PM, Waiman Long wrote: +/* + * Set the lock bit and clear the waiting bit simultaneously + * It is assumed that there is no lock stealing with this + * quick path

Re: [PATCH v6 05/11] pvqspinlock, x86: Allow unfair spinlock in a PV guest

2014-03-13 Thread Peter Zijlstra
On Wed, Mar 12, 2014 at 02:54:52PM -0400, Waiman Long wrote: +static inline void arch_spin_lock(struct qspinlock *lock) +{ + if (static_key_false(&paravirt_unfairlocks_enabled)) + queue_spin_lock_unfair(lock); + else + queue_spin_lock(lock); +} So I would

Re: [PATCH RFC v6 10/11] pvqspinlock, x86: Enable qspinlock PV support for KVM

2014-03-13 Thread Peter Zijlstra
On Wed, Mar 12, 2014 at 02:54:57PM -0400, Waiman Long wrote: A KVM guest of 20 CPU cores was created to run the disk workload of the AIM7 benchmark on both ext4 and xfs RAM disks at 3000 users on a 3.14-rc6 based kernel. The JPM (jobs/minute) data of the test run were: You really should just

Re: [PATCH v6 05/11] pvqspinlock, x86: Allow unfair spinlock in a PV guest

2014-03-14 Thread Peter Zijlstra
On Thu, Mar 13, 2014 at 04:05:19PM -0400, Waiman Long wrote: On 03/13/2014 11:15 AM, Peter Zijlstra wrote: On Wed, Mar 12, 2014 at 02:54:52PM -0400, Waiman Long wrote: +static inline void arch_spin_lock(struct qspinlock *lock) +{ + if (static_key_false(&paravirt_unfairlocks_enabled

Re: [PATCH v6 05/11] pvqspinlock, x86: Allow unfair spinlock in a PV guest

2014-03-17 Thread Peter Zijlstra
On Mon, Mar 17, 2014 at 01:44:34PM -0400, Waiman Long wrote: The PV ticketlock code was designed to handle lock holder preemption by redirecting CPU resources in a preempted guest to another guest that can better use it and then return the preempted CPU back sooner. But that's the PV code, not

Re: [PATCH v8 01/10] qspinlock: A generic 4-byte queue spinlock implementation

2014-04-04 Thread Peter Zijlstra
So I'm just not ever going to pick up this patch; I spent a week trying to reverse engineer this; I posted a 7 patch series creating the equivalent, but in a gradual and readable fashion: http://lkml.kernel.org/r/20140310154236.038181...@infradead.org You keep on ignoring that; I'll keep on

Re: [PATCH v8 01/10] qspinlock: A generic 4-byte queue spinlock implementation

2014-04-07 Thread Peter Zijlstra
On Fri, Apr 04, 2014 at 01:08:16PM -0400, Waiman Long wrote: Peter's patch is a rewrite of my patches 1-4, there is no PV or unfair lock support in there. Yes, because your patches were unreadable and entirely non-obvious. And while I appreciate that it's not entirely your fault; the subject is

Re: [PATCH v8 01/10] qspinlock: A generic 4-byte queue spinlock implementation

2014-04-07 Thread Peter Zijlstra
On Fri, Apr 04, 2014 at 10:59:09AM -0400, Waiman Long wrote: I am really sorry if you have bad feeling about it. I do not mean to discredit you on your effort to make the qspinlock patch better. I really appreciate your input and would like to work with you on this patch as well as other

Re: [PATCH v9 03/19] qspinlock: Add pending bit

2014-04-17 Thread Peter Zijlstra
On Thu, Apr 17, 2014 at 11:03:55AM -0400, Waiman Long wrote: +/** + * trylock_pending - try to acquire queue spinlock using the pending bit + * @lock : Pointer to queue spinlock structure + * @pval : Pointer to value of the queue spinlock 32-bit word + * Return: 1 if lock acquired, 0

Re: [PATCH v9 04/19] qspinlock: Extract out the exchange of tail code word

2014-04-17 Thread Peter Zijlstra
On Thu, Apr 17, 2014 at 11:03:56AM -0400, Waiman Long wrote: @@ -192,36 +220,25 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val) node->next = NULL; /* + * We touched a (possibly) cold cacheline; attempt the trylock once + * more in the hope someone

Re: [PATCH v9 05/19] qspinlock: Optimize for smaller NR_CPUS

2014-04-17 Thread Peter Zijlstra
On Thu, Apr 17, 2014 at 11:03:57AM -0400, Waiman Long wrote: +#if !defined(__LITTLE_ENDIAN) && !defined(__BIG_ENDIAN) +#error Missing either LITTLE_ENDIAN or BIG_ENDIAN definition. +#endif This seems entirely superfluous, I don't think a kernel build will go anywhere if either is missing.

Re: [PATCH v9 05/19] qspinlock: Optimize for smaller NR_CPUS

2014-04-17 Thread Peter Zijlstra
On Thu, Apr 17, 2014 at 11:03:57AM -0400, Waiman Long wrote: @@ -48,6 +53,9 @@ * We can further change the first spinner to spin on a bit in the lock word * instead of its node; whereby avoiding the need to carry a node from lock to * unlock, and preserving API. + * + * N.B. The

Re: [PATCH v9 05/19] qspinlock: Optimize for smaller NR_CPUS

2014-04-17 Thread Peter Zijlstra
On Thu, Apr 17, 2014 at 11:03:57AM -0400, Waiman Long wrote: +struct __qspinlock { + union { + atomic_t val; + struct { +#ifdef __LITTLE_ENDIAN + u16 locked_pending; + u16 tail; +#else + u16
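(For reference, the overlay under discussion, reconstructed along the lines of what later landed upstream -- treat the exact layout as an assumption. The point: with NR_CPUS < 2^14 the tail fits in 16 bits, so locked+pending can be addressed as one halfword and the tail as another.)

    struct __qspinlock {
            union {
                    atomic_t val;                   /* whole 32-bit lock word */
    #ifdef __LITTLE_ENDIAN
                    struct {
                            u8  locked;             /* bits 0-7   */
                            u8  pending;            /* bits 8-15  */
                    };
                    struct {
                            u16 locked_pending;     /* bits 0-15  */
                            u16 tail;               /* bits 16-31 */
                    };
    #endif  /* big-endian reverses the member order */
            };
    };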

Re: [PATCH v9 05/19] qspinlock: Optimize for smaller NR_CPUS

2014-04-17 Thread Peter Zijlstra
On Thu, Apr 17, 2014 at 11:03:57AM -0400, Waiman Long wrote: +static __always_inline void +clear_pending_set_locked(struct qspinlock *lock, u32 val) +{ + struct __qspinlock *l = (void *)lock; + + ACCESS_ONCE(l->locked_pending) = 1; +} @@ -157,8 +251,13 @@ static inline int

Re: [PATCH v9 06/19] qspinlock: prolong the stay in the pending bit path

2014-04-17 Thread Peter Zijlstra
On Thu, Apr 17, 2014 at 11:03:58AM -0400, Waiman Long wrote: There is a problem in the current trylock_pending() function. When the lock is free, but the pending bit holder hasn't grabbed the lock and cleared the pending bit yet, the trylock_pending() function will fail. I remember seeing some

Re: [PATCH v9 07/19] qspinlock: Use a simple write to grab the lock, if applicable

2014-04-17 Thread Peter Zijlstra
On Thu, Apr 17, 2014 at 11:03:59AM -0400, Waiman Long wrote: kernel/locking/qspinlock.c | 61 +++ 1 files changed, 44 insertions(+), 17 deletions(-) diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 497da24..80fe9ee 100644

Re: [PATCH v9 03/19] qspinlock: Add pending bit

2014-04-18 Thread Peter Zijlstra
On Thu, Apr 17, 2014 at 05:20:31PM -0400, Waiman Long wrote: + while ((val = atomic_read(&lock->val)) & _Q_LOCKED_MASK) + arch_mutex_cpu_relax(); That was a cpu_relax(). Yes, but arch_mutex_cpu_relax() is the same as cpu_relax() for x86. Yeah, so why bother typing more? Let the

Re: [PATCH v9 04/19] qspinlock: Extract out the exchange of tail code word

2014-04-18 Thread Peter Zijlstra
On Thu, Apr 17, 2014 at 05:28:17PM -0400, Waiman Long wrote: On 04/17/2014 11:49 AM, Peter Zijlstra wrote: On Thu, Apr 17, 2014 at 11:03:56AM -0400, Waiman Long wrote: @@ -192,36 +220,25 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val) node-next = NULL

Re: [PATCH v9 05/19] qspinlock: Optimize for smaller NR_CPUS

2014-04-18 Thread Peter Zijlstra
On Thu, Apr 17, 2014 at 05:46:27PM -0400, Waiman Long wrote: On 04/17/2014 11:56 AM, Peter Zijlstra wrote: On Thu, Apr 17, 2014 at 11:03:57AM -0400, Waiman Long wrote: +struct __qspinlock { + union { + atomic_t val; char bytes[4]; + struct { +#ifdef

Re: [PATCH v9 06/19] qspinlock: prolong the stay in the pending bit path

2014-04-18 Thread Peter Zijlstra
On Thu, Apr 17, 2014 at 09:46:04PM -0400, Waiman Long wrote: BTW, I didn't test out your atomic_test_and_set() change. Did it provide a noticeable performance benefit when compared with cmpxchg()? I've not tested that I think. I had a hard time showing that cmpxchg loops were slower, but once I

Re: [PATCH v9 04/19] qspinlock: Extract out the exchange of tail code word

2014-04-18 Thread Peter Zijlstra
On Fri, Apr 18, 2014 at 01:32:47PM -0400, Waiman Long wrote: On 04/18/2014 04:15 AM, Peter Zijlstra wrote: On Thu, Apr 17, 2014 at 05:28:17PM -0400, Waiman Long wrote: On 04/17/2014 11:49 AM, Peter Zijlstra wrote: On Thu, Apr 17, 2014 at 11:03:56AM -0400, Waiman Long wrote: @@ -192,36 +220,25

Re: [PATCH v9 05/19] qspinlock: Optimize for smaller NR_CPUS

2014-04-18 Thread Peter Zijlstra
On Fri, Apr 18, 2014 at 01:52:50PM -0400, Waiman Long wrote: I am confused by your notation. Nah, I think I was confused :-) Make the 1 _Q_LOCKED_VAL though, as that's the proper constant to use.

Re: [PATCH v10 03/19] qspinlock: Add pending bit

2014-05-08 Thread Peter Zijlstra
On Wed, May 07, 2014 at 11:01:31AM -0400, Waiman Long wrote: +/** + * trylock_pending - try to acquire queue spinlock using the pending bit + * @lock : Pointer to queue spinlock structure + * @pval : Pointer to value of the queue spinlock 32-bit word + * Return: 1 if lock acquired, 0

Re: [PATCH v10 06/19] qspinlock: prolong the stay in the pending bit path

2014-05-08 Thread Peter Zijlstra
On Wed, May 07, 2014 at 11:01:34AM -0400, Waiman Long wrote: @@ -221,11 +222,37 @@ static inline int trylock_pending(struct qspinlock *lock, u32 *pval) */ for (;;) { /* - * If we observe any contention; queue. + * If we observe that the

Re: [PATCH v10 07/19] qspinlock: Use a simple write to grab the lock, if applicable

2014-05-08 Thread Peter Zijlstra
On Wed, May 07, 2014 at 11:01:35AM -0400, Waiman Long wrote: @@ -94,23 +94,29 @@ static inline struct mcs_spinlock *decode_tail(u32 tail) * can allow better optimization of the lock acquisition for the pending * bit holder. */ -#if _Q_PENDING_BITS == 8 - struct __qspinlock {

Re: [PATCH v10 07/19] qspinlock: Use a simple write to grab the lock, if applicable

2014-05-08 Thread Peter Zijlstra
On Wed, May 07, 2014 at 11:01:35AM -0400, Waiman Long wrote: /** + * get_qlock - Set the lock bit and own the lock + * @lock: Pointer to queue spinlock structure + * + * This routine should only be called when the caller is the only one + * entitled to acquire the lock. + */ +static

Re: [PATCH v10 08/19] qspinlock: Make a new qnode structure to support virtualization

2014-05-08 Thread Peter Zijlstra
On Wed, May 07, 2014 at 11:01:36AM -0400, Waiman Long wrote: /* + * To have additional features for better virtualization support, it is + * necessary to store additional data in the queue node structure. So + * a new queue node structure will have to be defined and used here. + */ +struct

Re: [PATCH v10 09/19] qspinlock: Prepare for unfair lock support

2014-05-08 Thread Peter Zijlstra
On Wed, May 07, 2014 at 11:01:37AM -0400, Waiman Long wrote: If unfair lock is supported, the lock acquisition loop at the end of the queue_spin_lock_slowpath() function may need to detect the fact that the lock can be stolen. Code is added for the stolen lock detection. A new qhead macro is

Re: [PATCH v10 10/19] qspinlock, x86: Allow unfair spinlock in a virtual guest

2014-05-08 Thread Peter Zijlstra
On Wed, May 07, 2014 at 11:01:38AM -0400, Waiman Long wrote: No, we want the unfair thing for VIRT, not PARAVIRT. diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 9e7659e..10e87e1 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -227,6

Re: [PATCH v10 12/19] unfair qspinlock: Variable frequency lock stealing mechanism

2014-05-08 Thread Peter Zijlstra
On Wed, May 07, 2014 at 11:01:40AM -0400, Waiman Long wrote: +#define DEF_LOOP_CNT(c) int c = 0 +#define INC_LOOP_CNT(c) (c)++ +#define LOOP_CNT(c) c +#define LSTEAL_MIN (1 << 3) +#define LSTEAL_MAX (1 << 10) +#define LSTEAL_MIN_MASK

Re: [PATCH v10 06/19] qspinlock: prolong the stay in the pending bit path

2014-05-10 Thread Peter Zijlstra
On Fri, May 09, 2014 at 08:58:47PM -0400, Waiman Long wrote: On 05/08/2014 02:58 PM, Peter Zijlstra wrote: On Wed, May 07, 2014 at 11:01:34AM -0400, Waiman Long wrote: @@ -221,11 +222,37 @@ static inline int trylock_pending(struct qspinlock *lock, u32 *pval

Re: [PATCH v10 09/19] qspinlock: Prepare for unfair lock support

2014-05-10 Thread Peter Zijlstra
On Fri, May 09, 2014 at 09:19:32PM -0400, Waiman Long wrote: On 05/08/2014 03:06 PM, Peter Zijlstra wrote: On Wed, May 07, 2014 at 11:01:37AM -0400, Waiman Long wrote: If unfair lock is supported, the lock acquisition loop at the end of the queue_spin_lock_slowpath() function may need

Re: [PATCH v10 08/19] qspinlock: Make a new qnode structure to support virtualization

2014-05-10 Thread Peter Zijlstra
On Fri, May 09, 2014 at 09:08:56PM -0400, Waiman Long wrote: On 05/08/2014 03:04 PM, Peter Zijlstra wrote: On Wed, May 07, 2014 at 11:01:36AM -0400, Waiman Long wrote: /* + * To have additional features for better virtualization support, it is + * necessary to store additional data

Re: [PATCH v10 08/19] qspinlock: Make a new qnode structure to support virtualization

2014-05-10 Thread Peter Zijlstra
On Sat, May 10, 2014 at 04:14:17PM +0200, Peter Zijlstra wrote: On Fri, May 09, 2014 at 09:08:56PM -0400, Waiman Long wrote: On 05/08/2014 03:04 PM, Peter Zijlstra wrote: On Wed, May 07, 2014 at 11:01:36AM -0400, Waiman Long wrote: /* + * To have additional features for better

Re: [PATCH v10 03/19] qspinlock: Add pending bit

2014-05-12 Thread Peter Zijlstra
On Mon, May 12, 2014 at 05:22:08PM +0200, Radim Krčmář wrote: 2014-05-07 11:01-0400, Waiman Long: From: Peter Zijlstra pet...@infradead.org Because the qspinlock needs to touch a second cacheline; add a pending bit and allow a single in-word spinner before we punt to the second

Re: [PATCH v10 03/19] qspinlock: Add pending bit

2014-05-14 Thread Peter Zijlstra
On Wed, May 14, 2014 at 06:51:24PM +0200, Radim Krčmář wrote: Ok. I've seen merit in pvqspinlock even with slightly slower first-waiter, so I would have happily sacrificed those horrible branches. (I prefer elegant to optimized code, but I can see why we want to be strictly better than

Re: [RFC] Implement Batched (group) ticket lock

2014-05-29 Thread Peter Zijlstra
On Wed, May 28, 2014 at 05:46:39PM +0530, Raghavendra K T wrote: In virtualized environment there are mainly three problems related to spinlocks that affect performance. 1. LHP (lock holder preemption) 2. Lock Waiter Preemption (LWP) 3. Starvation/fairness Though ticketlocks solve the

Re: [PATCH v11 06/16] qspinlock: prolong the stay in the pending bit path

2014-06-11 Thread Peter Zijlstra
On Fri, May 30, 2014 at 11:43:52AM -0400, Waiman Long wrote: --- kernel/locking/qspinlock.c | 18 -- 1 files changed, 16 insertions(+), 2 deletions(-) diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index fc7fd8c..7f10758 100644 ---

Re: [PATCH v11 09/16] qspinlock, x86: Allow unfair spinlock in a virtual guest

2014-06-11 Thread Peter Zijlstra
On Fri, May 30, 2014 at 11:43:55AM -0400, Waiman Long wrote: Enabling this configuration feature causes a slight decrease in the performance of an uncontended lock-unlock operation of about 1-2%, mainly due to the use of a static key. However, uncontended lock-unlock operations are really just a

Re: [PATCH v11 09/16] qspinlock, x86: Allow unfair spinlock in a virtual guest

2014-06-11 Thread Peter Zijlstra
On Wed, Jun 11, 2014 at 12:54:02PM +0200, Peter Zijlstra wrote: @@ -252,6 +260,18 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val) BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS)); +#ifdef CONFIG_VIRT_UNFAIR_LOCKS + /* +* A simple test and set

Re: [PATCH v11 09/16] qspinlock, x86: Allow unfair spinlock in a virtual guest

2014-06-11 Thread Peter Zijlstra
On Wed, Jun 11, 2014 at 09:37:55PM -0400, Long, Wai Man wrote: On 6/11/2014 6:54 AM, Peter Zijlstra wrote: On Fri, May 30, 2014 at 11:43:55AM -0400, Waiman Long wrote: Enabling this configuration feature causes a slight decrease the performance of an uncontended lock-unlock operation

Re: [PATCH v11 06/16] qspinlock: prolong the stay in the pending bit path

2014-06-12 Thread Peter Zijlstra
On Wed, Jun 11, 2014 at 05:22:28PM -0400, Long, Wai Man wrote: @@ -233,11 +233,25 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val) */ for (;;) { /* -* If we observe any contention; queue. +* If we observe that the queue is not

Re: [PATCH v11 14/16] pvqspinlock: Add qspinlock para-virtualization support

2014-06-12 Thread Peter Zijlstra
On Fri, May 30, 2014 at 11:44:00AM -0400, Waiman Long wrote: @@ -19,13 +19,46 @@ extern struct static_key virt_unfairlocks_enabled; * that the clearing the lock bit is done ASAP without artificial delay * due to compiler optimization. */ +#ifdef CONFIG_PARAVIRT_SPINLOCKS +static

[PATCH 04/11] qspinlock: Extract out the exchange of tail code word

2014-06-15 Thread Peter Zijlstra
From: Waiman Long waiman.l...@hp.com This patch extracts the logic for the exchange of new and previous tail code words into a new xchg_tail() function which can be optimized in a later patch. Signed-off-by: Waiman Long waiman.l...@hp.com Signed-off-by: Peter Zijlstra pet...@infradead.org

[PATCH 00/11] qspinlock with paravirt support

2014-06-15 Thread Peter Zijlstra
Since Waiman seems incapable of doing simple things; here's my take on the paravirt crap. The first few patches are taken from Waiman's latest series, but the virt support is completely new. Its primary aim is to not mess up the native code. I've not stress tested it, but the virt and paravirt

[PATCH 07/11] qspinlock: Use a simple write to grab the lock, if applicable

2014-06-15 Thread Peter Zijlstra
-off-by: Peter Zijlstra pet...@infradead.org --- kernel/locking/qspinlock.c | 59 - 1 file changed, 43 insertions(+), 16 deletions(-) --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -93,24 +93,33 @@ static inline struct mcs_spinlock

[PATCH 01/11] qspinlock: A simple generic 4-byte queue spinlock

2014-06-15 Thread Peter Zijlstra
needed when waiting for the lock. Once the lock is acquired, the queue node can be released to be used later. Signed-off-by: Waiman Long waiman.l...@hp.com Signed-off-by: Peter Zijlstra pet...@infradead.org --- include/asm-generic/qspinlock.h | 118 include/asm-generic

[PATCH 09/11] pvqspinlock, x86: Rename paravirt_ticketlocks_enabled

2014-06-15 Thread Peter Zijlstra
From: Waiman Long waiman.l...@hp.com This patch renames the paravirt_ticketlocks_enabled static key to a more generic paravirt_spinlocks_enabled name. Signed-off-by: Waiman Long waiman.l...@hp.com Signed-off-by: Peter Zijlstra pet...@infradead.org --- arch/x86/include/asm/spinlock.h |4

[PATCH 08/11] qspinlock: Revert to test-and-set on hypervisors

2014-06-15 Thread Peter Zijlstra
When we detect a hypervisor (!paravirt, see later patches), revert to a simple test-and-set lock to avoid the horrors of queue preemption. Signed-off-by: Peter Zijlstra pet...@infradead.org --- arch/x86/include/asm/qspinlock.h | 14 ++ include/asm-generic/qspinlock.h |7
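(The fallback is roughly this shape -- a sketch modelled on the patch description; virt_spin_lock() is the name this took in later upstream code. On a hypervisor, skip the queue entirely: a test-and-set lock has no queue, hence no waiter can be wedged behind a preempted vcpu.)

    static inline bool virt_spin_lock(struct qspinlock *lock)
    {
            if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
                    return false;           /* bare metal: real qspinlock */

            do {
                    while (atomic_read(&lock->val))
                            cpu_relax();
            } while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);

            return true;
    }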

[PATCH 05/11] qspinlock: Optimize for smaller NR_CPUS

2014-06-15 Thread Peter Zijlstra
From: Peter Zijlstra pet...@infradead.org When we allow for a max NR_CPUS < 2^14 we can optimize the pending wait-acquire and the xchg_tail() operations. By growing the pending bit to a byte, we reduce the tail to 16bit. This means we can use xchg16 for the tail part and do away with all

[PATCH 03/11] qspinlock: Add pending bit

2014-06-15 Thread Peter Zijlstra
Because the qspinlock needs to touch a second cacheline; add a pending bit and allow a single in-word spinner before we punt to the second cacheline. Signed-off-by: Peter Zijlstra pet...@infradead.org --- include/asm-generic/qspinlock_types.h | 12 ++- kernel/locking/qspinlock.c
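(The fast path this adds, sketched from the description with names per the qspinlock code; the closing halfword store assumes the NR_CPUS < 16K layout of patch 05. A single waiter parks in the lock word itself; only later arrivals touch the second, MCS-queue cacheline.)

    /* lock held, no pending, no queue: become the one in-word spinner */
    if (val == _Q_LOCKED_VAL &&
        atomic_cmpxchg(&lock->val, val, val | _Q_PENDING_VAL) == val) {
            /* wait for the owner to go away */
            while ((val = atomic_read(&lock->val)) & _Q_LOCKED_MASK)
                    cpu_relax();
            clear_pending_set_locked(lock, val);    /* tail bits untouched */
            return;
    }
    /* anything else: punt to the MCS queue on the second cacheline */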

[PATCH 02/11] qspinlock, x86: Enable x86-64 to use queue spinlock

2014-06-15 Thread Peter Zijlstra
file includes some x86 specific optimization which will make the queue spinlock code perform better than the generic implementation. Signed-off-by: Waiman Long waiman.l...@hp.com Signed-off-by: Peter Zijlstra pet...@infradead.org --- arch/x86/Kconfig |1 + arch/x86/include

[PATCH 06/11] qspinlock: Optimize pending bit

2014-06-15 Thread Peter Zijlstra
, in this case the pending bit is guaranteed to be released 'soon', therefore wait for it and avoid queueing. Signed-off-by: Peter Zijlstra pet...@infradead.org --- kernel/locking/qspinlock.c | 10 ++ 1 file changed, 10 insertions(+) Index: linux-2.6/kernel/locking/qspinlock.c

[PATCH 10/11] qspinlock: Paravirt support

2014-06-15 Thread Peter Zijlstra
the head is done in two parts, firstly the pv_wait_head will store its cpu number in whichever node is pointed to by the tail part of the lock word. Secondly, pv_link_and_wait_node() will propagate the existing head from the old to the new tail node. Signed-off-by: Peter Zijlstra pet

[PATCH 11/11] qspinlock, kvm: Add paravirt support

2014-06-15 Thread Peter Zijlstra
Signed-off-by: Peter Zijlstra pet...@infradead.org --- arch/x86/kernel/kvm.c | 58 ++ kernel/Kconfig.locks |2 - 2 files changed, 59 insertions(+), 1 deletion(-) Index: linux-2.6/arch/x86/kernel/kvm.c

Re: [PATCH v11 06/16] qspinlock: prolong the stay in the pending bit path

2014-06-15 Thread Peter Zijlstra
On Thu, Jun 12, 2014 at 04:54:52PM -0400, Waiman Long wrote: If two tasks see the pending bit go away and try to grab it with cmpxchg, there is no way we can avoid the contention. However, if somehow the pending bit holder gets the lock and another task sets the pending bit before the current

Re: [PATCH v11 09/16] qspinlock, x86: Allow unfair spinlock in a virtual guest

2014-06-15 Thread Peter Zijlstra
On Thu, Jun 12, 2014 at 05:08:28PM -0400, Waiman Long wrote: Native performance is king, try your very utmost bestest to preserve that, paravirt is a distant second and nobody sane should care about the virt case at all. The patch won't affect native performance unless the kernel is built

Re: [PATCH v11 14/16] pvqspinlock: Add qspinlock para-virtualization support

2014-06-15 Thread Peter Zijlstra
On Thu, Jun 12, 2014 at 04:48:41PM -0400, Waiman Long wrote: I don't have a good understanding of the kernel alternatives mechanism. I didn't either; I do now, cost me a whole day reading up on alternative/paravirt code patching. See the patches I just send out; I got the 'native' case with
