Re: [PATCH -v5][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Steven Rostedt

On Thu, 8 Jan 2009, Andi Kleen wrote:

  What about memory hotplug as Ingo mentioned?
  
  Should that be:
  
  #if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_MEMORY_HOTPLUG)
 
 We expect memory hotunplug to only really work in movable zones
 (all others should at least have one kernel object somewhere that prevents
 unplug) and you can't have task structs in movable zones obviously
 
 So it's probably a non issue in practice.

Sure, it probably is a non issue, but I'm afraid that non issues of today 
might become issues of tomorrow. Where does it say that we can never put a 
task struct in a movable zone? Perhaps, we could someday have a CPU with 
memory local to it, and pinned tasks to that CPU in that memory. Then 
there can be cases where we remove the CPU and memory together.

Because of preemption in the mutex spin part, there's no guarantee that 
the task in that removed memory will not be referenced again. Of course 
this thought is purely theoretical, but I like to solve bugs that might 
happen tomorrow too. ;-)

-- Steve



Re: [PATCH -v6][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Gregory Haskins
Ingo Molnar wrote:
 * Peter Zijlstra pet...@infradead.org wrote:

   
 * WOW *

 WOW indeed - and I can see a similar _brutal_ speedup on two separate 
 16-way boxes as well:

   16 CPUs, running 128 parallel test-tasks.

   NO_OWNER_SPIN:
   avg ops/sec:   281595

   OWNER_SPIN:
   avg ops/sec:   524791

 Da Killer!
   

This jives with our findings back when we first looked at this
(200%-300% speedups in most benchmarks), so it's excellent that it is
yielding boosts here as well.

 Look at the performance counter stats:

   NO_OWNER_SPIN:
   12098.324578  task clock ticks (msecs)
           1081  CPU migrations   (events)
           7102  context switches (events)
           2763  pagefaults   (events)

   OWNER_SPIN:
   22280.283224  task clock ticks (msecs)
            117  CPU migrations   (events)
           5711  context switches (events)
           2781  pagefaults   (events)

 We were able to spend twice as much CPU time and efficiently so - and we 
 did about 10% of the cross-CPU migrations as before (!).

 My (wild) guess is that the biggest speedup factor was perhaps this little 
 trick:

 +   if (need_resched())
 +   break;

 this allows the spin-mutex to waste CPU time only if there's no other 
 work around on that CPU (i.e. if there's no other task that wants to 
 run). The moment there's some other task, we context-switch to it.
   
Well, IIUC that's only true if the other task happens to preempt current,
which may not always be the case, right?  For instance, if current still
has timeslice left, etc.  I think the primary difference is actually the
reduction in the ctx switch rate, but it's hard to say without looking at
detailed traces and more stats.  Either way, woohoo!

-Greg





[PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Peter Zijlstra
On Thu, 2009-01-08 at 15:18 +0100, Ingo Molnar wrote:

 [ A detail, -tip testing found that the patch breaks mutex debugging:
 
   =====================================
   [ BUG: bad unlock balance detected! ]
   -------------------------------------
   swapper/0 is trying to release lock (cpu_add_remove_lock) at:
   [8089f540] mutex_unlock+0xe/0x10
   but there are no more locks to release!
 
  but that's a detail for -v7 ;-) ]

Here it is..

I changed the optimistic spin cmpxchg trickery to not require the -1000
state; please double- and triple-check the code.

This was done because the interaction between trylock_slowpath and that
-1000 state hurt my head.

---
Subject: 
From: Peter Zijlstra a.p.zijls...@chello.nl
Date: Thu Jan 08 09:41:22 CET 2009

Signed-off-by: Peter Zijlstra a.p.zijls...@chello.nl
---
 include/linux/mutex.h   |4 +-
 include/linux/sched.h   |1 
 kernel/mutex-debug.c|   24 ++-
 kernel/mutex-debug.h|   18 ++-
 kernel/mutex.c  |   76 ++--
 kernel/mutex.h  |   22 -
 kernel/sched.c  |   66 +
 kernel/sched_features.h |1 
 8 files changed, 185 insertions(+), 27 deletions(-)

Index: linux-2.6/include/linux/mutex.h
===
--- linux-2.6.orig/include/linux/mutex.h
+++ linux-2.6/include/linux/mutex.h
@@ -50,8 +50,10 @@ struct mutex {
 	atomic_t		count;
 	spinlock_t		wait_lock;
 	struct list_head	wait_list;
-#ifdef CONFIG_DEBUG_MUTEXES
+#if defined(CONFIG_DEBUG_MUTEXES) || defined(CONFIG_SMP)
 	struct thread_info	*owner;
+#endif
+#ifdef CONFIG_DEBUG_MUTEXES
 	const char		*name;
 	void			*magic;
 #endif
Index: linux-2.6/kernel/mutex-debug.c
===
--- linux-2.6.orig/kernel/mutex-debug.c
+++ linux-2.6/kernel/mutex-debug.c
@@ -26,11 +26,6 @@
 /*
  * Must be called with lock->wait_lock held.
  */
-void debug_mutex_set_owner(struct mutex *lock, struct thread_info *new_owner)
-{
-	lock->owner = new_owner;
-}
-
 void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
 {
 	memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
@@ -80,9 +75,25 @@ void debug_mutex_unlock(struct mutex *lo
 		return;
 
 	DEBUG_LOCKS_WARN_ON(lock->magic != lock);
+#if 0
+	/*
+	 * XXX something is iffy with owner tracking, but lockdep - which has
+	 * similar checks - finds everything dandy, so disable for now.
+	 */
+	if (lock->owner != current_thread_info()) {
+		printk(KERN_ERR "%p %p\n", lock->owner, current_thread_info());
+		if (lock->owner) {
+			printk(KERN_ERR "%d %s\n",
+				lock->owner->task->pid,
+				lock->owner->task->comm);
+		}
+		printk(KERN_ERR "%d %s\n",
+			current_thread_info()->task->pid,
+			current_thread_info()->task->comm);
+	}
+	DEBUG_LOCKS_WARN_ON(lock->owner != current_thread_info());
+#endif
 	DEBUG_LOCKS_WARN_ON(!lock->wait_list.prev && !lock->wait_list.next);
-	DEBUG_LOCKS_WARN_ON(lock->owner != current_thread_info());
 }
 
 void debug_mutex_init(struct mutex *lock, const char *name,
@@ -95,7 +106,6 @@ void debug_mutex_init(struct mutex *lock
 	debug_check_no_locks_freed((void *)lock, sizeof(*lock));
 	lockdep_init_map(&lock->dep_map, name, key, 0);
 #endif
-	lock->owner = NULL;
 	lock->magic = lock;
 }
 
Index: linux-2.6/kernel/mutex-debug.h
===
--- linux-2.6.orig/kernel/mutex-debug.h
+++ linux-2.6/kernel/mutex-debug.h
@@ -13,14 +13,6 @@
 /*
  * This must be called with lock->wait_lock held.
  */
-extern void
-debug_mutex_set_owner(struct mutex *lock, struct thread_info *new_owner);
-
-static inline void debug_mutex_clear_owner(struct mutex *lock)
-{
-	lock->owner = NULL;
-}
-
 extern void debug_mutex_lock_common(struct mutex *lock,
struct mutex_waiter *waiter);
 extern void debug_mutex_wake_waiter(struct mutex *lock,
@@ -35,6 +27,16 @@ extern void debug_mutex_unlock(struct mu
 extern void debug_mutex_init(struct mutex *lock, const char *name,
 struct lock_class_key *key);
 
+static inline void mutex_set_owner(struct mutex *lock)
+{
+	lock->owner = current_thread_info();
+}
+
+static inline void mutex_clear_owner(struct mutex *lock)
+{
+	lock->owner = NULL;
+}
+
 #define spin_lock_mutex(lock, flags)   \
do {\
struct mutex *l = container_of(lock, struct mutex, wait_lock); \
Index: 

Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Peter Zijlstra
On Thu, 2009-01-08 at 10:09 -0500, Steven Rostedt wrote:
 On Thu, 8 Jan 2009, Peter Zijlstra wrote:
  Index: linux-2.6/kernel/sched.c
  ===
  --- linux-2.6.orig/kernel/sched.c
  +++ linux-2.6/kernel/sched.c
  @@ -4672,6 +4672,72 @@ need_resched_nonpreemptible:
   }
   EXPORT_SYMBOL(schedule);
   
  +#ifdef CONFIG_SMP
  +/*
  + * Look out! "owner" is an entirely speculative pointer
  + * access and not reliable.
  + */
  +int spin_on_owner(struct mutex *lock, struct thread_info *owner)
  +{
  +   unsigned int cpu;
  +   struct rq *rq;
  +   int ret = 1;
  +
  +   if (unlikely(!sched_feat(OWNER_SPIN)))
 
 I would remove the unlikely: if someone turns OWNER_SPIN off, then you 
 have the wrong decision being made. Choices by users should never be in a 
 likely or unlikely annotation. It's discrimination ;-)

In the unlikely case we schedule(); that seems expensive enough to want
to make the spin case ever so slightly faster.

  +   return 0;
  +
  +   preempt_disable();
  +#ifdef CONFIG_DEBUG_PAGEALLOC
  +   /*
  +* Need to access the cpu field knowing that
  +* DEBUG_PAGEALLOC could have unmapped it if
  +* the mutex owner just released it and exited.
  +*/
  +	if (probe_kernel_address(&owner->cpu, cpu))
  +		goto out;
  +#else
  +	cpu = owner->cpu;
  +#endif
  +
  +	/*
  +	 * Even if the access succeeded (likely case),
  +	 * the cpu field may no longer be valid.
  +	 */
  +	if (cpu >= nr_cpumask_bits)
  +   goto out;
  +
  +   /*
  +* We need to validate that we can do a
  +* get_cpu() and that we have the percpu area.
  +*/
  +   if (!cpu_online(cpu))
  +   goto out;
 
 Should we need to do a get_cpu or something? Couldn't the CPU disappear 
 between these two calls? Or does it do a stop-machine, and the preempt 
 disable will protect us?

Did you miss the preempt_disable() a bit up?

  +
  +   rq = cpu_rq(cpu);
  +
  +   for (;;) {
  +		if (lock->owner != owner)
  +   break;
  +
  +   /*
  +* Is that owner really running on that cpu?
  +*/
  +		if (task_thread_info(rq->curr) != owner)
  +   break;
  +
  +   if (need_resched()) {
  +   ret = 0;
  +   break;
  +   }
  +
  +   cpu_relax();
  +   }
  +out:
  +   preempt_enable_no_resched();
  +   return ret;
  +}
  +#endif



Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Peter Zijlstra
On Thu, 2009-01-08 at 10:28 -0500, Steven Rostedt wrote:
 On Thu, 8 Jan 2009, Peter Zijlstra wrote:

  In the unlikely case we schedule(); that seems expensive enough to want
  to make the spin case ever so slightly faster.
 
 OK, that makes sense, but I would comment that. Otherwise, it just looks 
 like another misuse of the unlikely annotation.

OK, sensible enough.

   Should we need to do a get_cpu or something? Couldn't the CPU disappear 
   between these two calls? Or does it do a stop-machine, and the preempt 
   disable will protect us?
  
  Did you miss the preempt_disable() a bit up?
 
 No, let me rephrase it better. Does the preempt_disable protect against
 another CPU going offline? Does taking a CPU offline do a 
 stop_machine?

Yes and yes.


Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Steven Rostedt

On Thu, 8 Jan 2009, Steven Rostedt wrote:
+   /*
+* We need to validate that we can do a
+* get_cpu() and that we have the percpu area.
+*/
+   if (!cpu_online(cpu))
+   goto out;
   
   Should we need to do a get_cpu or something? Couldn't the CPU disappear 
   between these two calls? Or does it do a stop-machine, and the preempt 
   disable will protect us?
  
  Did you miss the preempt_disable() a bit up?
 
  No, let me rephrase it better. Does the preempt_disable protect against
  another CPU going offline? Does taking a CPU offline do a 
  stop_machine?

I just looked at the cpu hotplug code, and it does call stop_machine. All 
is in order ;-)

-- Steve



Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Linus Torvalds


On Thu, 8 Jan 2009, Peter Zijlstra wrote:
 
 This was done because the interaction between trylock_slowpath and that
 -1000 state hurt my head.

Yeah, it was a stupid hacky thing to avoid the list_empty(), but doing 
it explicitly is fine (we don't hold the lock, so the list isn't 
necessarily stable, but doing list_empty() is fine because we don't ever 
dereference the pointers, we just compare the pointers themselves).

I shouldn't have done that hacky thing, it wasn't worth it.
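For reference, the reason the unlocked check is safe is visible in the 
implementation itself - list_empty() compares pointers and never follows 
them. From include/linux/list.h:

	static inline int list_empty(const struct list_head *head)
	{
		return head->next == head;
	}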

Linus


Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Peter Zijlstra
On Thu, 2009-01-08 at 08:58 -0800, Linus Torvalds wrote:
 
 Ok, I've gone through -v7, and I'm sure you're all shocked to hear it, but 
 I have no complaints. 

*cheer*, except I guess we need to figure out what goes bad for Chris.

 Except that you dropped all the good commit 
 commentary you had earlier ;)

Yeah, I've yet to add that back, will do.

 The patch looks pretty good (except for the big #if 0 block in 
 mutex-debug.c that I hope gets fixed, but I can't even really claim that I 
 can be bothered), the locking looks fine (ie no locking at all), and the 
 numbers seem pretty convincing.
 
 Oh, and I think the open-coded
 
   atomic_cmpxchg(&count, 1, 0) == 1
 
 could possibly just be replaced with a simple __mutex_fastpath_trylock(). 
 I dunno.

__mutex_fastpath_trylock() isn't always that neat -- see
include/asm-generic/mutex-xchg.h -- and it's a NOP on DEBUG_MUTEXES.

Note how I used old_val for the list_empty() thing as well; we could
possibly drop that extra condition though.



Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Linus Torvalds


On Thu, 8 Jan 2009, Steven Rostedt wrote:
 
 We keep spinning if the owner changes.

I think we want to - if you have multiple CPU's and a heavily contended 
lock that acts as a spinlock, we still _do_ want to keep spinning even if 
another CPU gets the lock.

And I don't even believe that is the bug. I suspect the bug is simpler. 

I think the need_resched() needs to go in the outer loop, or at least 
happen in the !owner case. Because at least with preemption, what can 
happen otherwise is

 - process A gets the lock, but gets preempted before it sets lock->owner.

   End result: count = 0, owner = NULL.

 - processes B/C go into the spin loop, filling up all CPU's (assuming 
   dual-core here), and will now both loop forever if they hold the kernel 
   lock (or have some other preemption disabling thing over their down()).

And all the while, process A would _happily_ set ->owner, and eventually 
release the mutex, but it never gets to run to do either of them.

In fact, you might not even need a process C: all you need is for B to be 
on the same runqueue as A, and having enough load on the other CPU's that 
A never gets migrated away. So C might be in user space.

I dunno. There are probably variations on the above.
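To spell out the interleaving (a sketch of the scenario above, assuming a 
dual-core box running the -v7 spin loop):

   CPU0 (runqueue: A, B)                 CPU1
   ----------------------                ----
   A: cmpxchg(&lock->count, 1, 0)        C: spinning on the same mutex,
      succeeds - A owns the mutex           or a userspace hog whose load
   A preempted before mutex_set_owner()     keeps A from migrating over
      => count == 0, owner == NULL
   B: spins - owner is NULL, cmpxchg
      keeps failing; if preemption is
      disabled, B never yields to A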

Linus


Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Linus Torvalds


Unrelated:

On Thu, 8 Jan 2009, Chris Mason wrote:

 RIP: 0010:[8024f4de]  [8024f4de] __cmpxchg+0x36/0x3f

Ouch. HOW THE HELL DID THAT NOT GET INLINED?

cmpxchg() is a _single_ instruction if it's inlined, but it's a horrible 
mess of dynamic conditionals on the (constant - per call-site) size 
argument if it isn't.

It looks like you probably enabled the "let gcc mess up inlining" config 
option.

Ingo - I think we need to remove that crap again. Because gcc gets the 
inlining horribly horribly wrong. As usual.

Linus


Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Steven Rostedt

On Thu, 8 Jan 2009, Linus Torvalds wrote:
 
 And I don't even believe that is the bug. I suspect the bug is simpler. 
 
 I think the need_resched() needs to go in the outer loop, or at least 
 happen in the !owner case. Because at least with preemption, what can 
 happen otherwise is
 
  - process A gets the lock, but gets preempted before it sets lock->owner.
 
    End result: count = 0, owner = NULL.
 
  - processes B/C go into the spin loop, filling up all CPU's (assuming 
    dual-core here), and will now both loop forever if they hold the kernel 
    lock (or have some other preemption disabling thing over their down()).
 
 And all the while, process A would _happily_ set ->owner, and eventually 
 release the mutex, but it never gets to run to do either of them.
 
 In fact, you might not even need a process C: all you need is for B to be 
 on the same runqueue as A, and having enough load on the other CPU's that 
 A never gets migrated away. So C might be in user space.
 
 I dunno. There are probably variations on the above.

Ouch! I think you are on to something:

for (;;) {
	struct thread_info *owner;

	old_val = atomic_cmpxchg(&lock->count, 1, 0);
	if (old_val == 1) {
		lock_acquired(&lock->dep_map, ip);
		mutex_set_owner(lock);
		return 0;
	}

	if (old_val < 0 && !list_empty(&lock->wait_list))
		break;

	/* See who owns it, and spin on him if anybody */
	owner = ACCESS_ONCE(lock->owner);

The owner was preempted before assigning lock->owner (as you stated).

	if (owner && !spin_on_owner(lock, owner))
		break;

We just spin :-(

I think adding the:

+   if (need_resched())
+   break;

would solve the problem.

Thanks,

-- Steve


cpu_relax();
}



Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Steven Rostedt



On Thu, 8 Jan 2009, Steven Rostedt wrote:
  In fact, you might not even need a process C: all you need is for B to be 
  on the same runqueue as A, and having enough load on the other CPU's that 
  A never gets migrated away. So C might be in user space.

You're right about not needing process C.

  
  I dunno. There are probably variations on the above.
 
 Ouch! I think you are on to something:
 
 for (;;) {
 	struct thread_info *owner;
 
 	old_val = atomic_cmpxchg(&lock->count, 1, 0);
 	if (old_val == 1) {
 		lock_acquired(&lock->dep_map, ip);
 		mutex_set_owner(lock);
 		return 0;
 	}
 
 	if (old_val < 0 && !list_empty(&lock->wait_list))
 		break;
 
 	/* See who owns it, and spin on him if anybody */
 	owner = ACCESS_ONCE(lock->owner);
 
 The owner was preempted before assigning lock->owner (as you stated).

If it was the current process that preempted the owner and these are RT 
tasks pinned to the same CPU and the owner is of lower priority than the 
spinner, we have a deadlock!

Hmm, I do not think the need_resched() here will even fix that :-/

-- Steve



Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Linus Torvalds


On Thu, 8 Jan 2009, Steven Rostedt wrote:
 
 Ouch! I think you are on to something:

Yeah, there's something there, but looking at Chris' backtrace, there's 
nothing there to disable preemption. So if it was this simple case, it 
should still have preempted him to let the other process run and finish 
up.

So I don't think Chris' softlockup is at least _exactly_ that case. 
There's something else going on too.

That said, I do think it's a mistake for us to care about the value of 
spin_on_owner(). I suspect v8 should

 - always have

if (need_resched())
break

   in the outer loop.

 - drop the return value from spin_on_owner(), and just break out if 
   anything changes (including the need_resched() flag).

 - I'd also drop the "old_value < 0 &&" test, and just test the 
   list_empty() unconditionally. 

Aim for really simple. 

As to what to do about the !owner case - we do want to spin on it, but 
the interaction with preemption is kind of nasty. I'd hesitate to make the 
mutex_[un]lock() use preempt_disable() to avoid scheduling in between 
getting the lock and setting the owner, though - because that would slow 
down the normal fast-path case.

Maybe we should just limit the spin on !owner to some maximal count.
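A purely illustrative sketch of that last idea (the cap, the counter name 
and the exact loop shape are made up here - this is just the relevant 
fragment of the spin loop):

	#define MUTEX_NOOWNER_SPIN_MAX	100	/* made-up bound */

	unsigned int noowner_spins = 0;

	for (;;) {
		struct thread_info *owner;

		if (need_resched())
			break;

		owner = ACCESS_ONCE(lock->owner);
		if (owner) {
			noowner_spins = 0;
			if (!spin_on_owner(lock, owner))
				break;
		} else if (++noowner_spins > MUTEX_NOOWNER_SPIN_MAX) {
			/* owner may be stuck before setting ->owner: sleep */
			break;
		}

		cpu_relax();
	}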

Linus


Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Chris Mason
On Thu, 2009-01-08 at 10:16 -0800, Linus Torvalds wrote:
 
 On Thu, 8 Jan 2009, Steven Rostedt wrote:
  
  Ouch! I think you are on to something:
 
 Yeah, there's something there, but looking at Chris' backtrace, there's 
 nothing there to disable preemption. So if it was this simple case, it 
 should still have preempted him to let the other process run and finish 
 up.
 

My .config has no lockdep or schedule debugging and voluntary preempt.
I do have CONFIG_INLINE_OPTIMIZE on; it's a good name for trusting gcc, I
guess.

 So I don't think Chris' softlockup is at least _exactly_ that case. 
 There's something else going on too.
 
 That said, I do think it's a mistake for us to care about the value of 
 spin_on_owner(). I suspect v8 should
 
  - always have
 
   if (need_resched())
   break
 
in the outer loop.
 
  - drop the return value from spin_on_owner(), and just break out if 
anything changes (including the need_resched() flag).
 
  - I'd also drop the "old_value < 0 &&" test, and just test the 
    list_empty() unconditionally. 
 

I'll give the above a shot, and we can address the preempt + !owner case
in another rev.

-chris



Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Ingo Molnar

* Linus Torvalds torva...@linux-foundation.org wrote:

 Unrelated:
 
 On Thu, 8 Jan 2009, Chris Mason wrote:
 
  RIP: 0010:[8024f4de]  [8024f4de] __cmpxchg+0x36/0x3f
 
 Ouch. HOW THE HELL DID THAT NOT GET INLINED?
 
 cmpxchg() is a _single_ instruction if it's inlined, but it's a horrible 
 mess of dynamic conditionals on the (constant - per call-site) size 
 argument if it isn't.
 
 It looks like you probably enabled the "let gcc mess up inlining" config 
 option.
 
 Ingo - I think we need to remove that crap again. Because gcc gets the 
 inlining horribly horribly wrong. As usual.

Apparently it messes up with asm()s: it doesn't know the contents of the 
asm() and hence it over-estimates the size [based on string heuristics] 
...

Which is bad - asm()s tend to be the most important entities to inline - 
all over our fastpaths.

Despite that messup it's still a 1% net size win:

      text    data     bss     dec    hex filename
   7109652 1464684  802888 9377224 8f15c8 vmlinux.always-inline
   7046115 1465324  802888 9314327 8e2017 vmlinux.optimized-inlining

That win is mixed in slowpath and fastpath as well.

I see three options:

 - Disable CONFIG_OPTIMIZE_INLINING=y altogether (it's already 
   default-off)

 - Change the asm() inline markers to something new like asm_inline, which
   defaults to __always_inline.

 - Just mark all asm() inline markers as __always_inline - realizing that 
   these should never ever be out of line.

We might still try the second or third options, as I think we shouldn't go 
back into the business of managing the inline attributes of ~100,000 
kernel functions.

I'll try to annotate the inline asms (there's not _that_ many of them), 
and measure what the size impact is.
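A sketch of what the second option might look like (the marker name and 
the example function are illustrative only, not an existing API):

	/* hypothetical: functions that are mostly inline asm should never
	 * be subject to gcc's size heuristics */
	#define asm_inline	__always_inline

	static asm_inline void my_clflush(volatile void *p)
	{
		/* a single instruction - exactly the case gcc over-estimates */
		asm volatile("clflush %0" : "+m" (*(volatile char *)p));
	}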

Ingo


Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread H. Peter Anvin
Ingo Molnar wrote:
 
 Apparently it messes up with asm()s: it doesn't know the contents of the 
 asm() and hence it over-estimates the size [based on string heuristics] 
 ...
 

Right.   gcc simply doesn't have any way to know how heavyweight an
asm() statement is, and it WILL do the wrong thing in many cases --
especially the ones which involve an out-of-line recovery stub.  This is
due to a fundamental design decision in gcc to not integrate the
compiler and assembler (which some compilers do.)

 Which is bad - asm()s tend to be the most important entities to inline - 
 all over our fastpaths.
 
 Despite that messup it's still a 1% net size win:
 
       text    data     bss     dec    hex filename
    7109652 1464684  802888 9377224 8f15c8 vmlinux.always-inline
    7046115 1465324  802888 9314327 8e2017 vmlinux.optimized-inlining
 
 That win is mixed in slowpath and fastpath as well.

The good part here is that the assembly ones really don't have much
subtlety -- a function call is at least five bytes, usually more once
you count in the register spill penalties -- so __always_inline-ing them
should still end up with numbers looking very much like the above.

 I see three options:
 
  - Disable CONFIG_OPTIMIZE_INLINING=y altogether (it's already 
default-off)
 
  - Change the asm() inline markers to something new like asm_inline, which
defaults to __always_inline.
 
  - Just mark all asm() inline markers as __always_inline - realizing that 
these should never ever be out of line.
 
 We might still try the second or third options, as I think we shouldn't go 
 back into the business of managing the inline attributes of ~100,000 
 kernel functions.
 
 I'll try to annotate the inline asms (there's not _that_ many of them), 
 and measure what the size impact is.

The main reason to do #2 over #3 would be for programmer documentation.
 There simply should be no reason to ever out-of-line these.  However,
documenting the reason to the programmer is a valuable thing in itself.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Chris Mason
On Thu, 2009-01-08 at 13:27 -0500, Chris Mason wrote:
 On Thu, 2009-01-08 at 10:16 -0800, Linus Torvalds wrote:
  
  On Thu, 8 Jan 2009, Steven Rostedt wrote:
   
   Ouch! I think you are on to something:
  
  Yeah, there's something there, but looking at Chris' backtrace, there's 
  nothing there to disable preemption. So if it was this simple case, it 
  should still have preempted him to let the other process run and finish 
  up.
  
 
 My .config has no lockdep or schedule debugging and voluntary preempt.
  I do have CONFIG_INLINE_OPTIMIZE on; it's a good name for trusting gcc, I
  guess.

The patch below isn't quite what Linus suggested, but it is working here
at least.  In every test I've tried so far, this is faster than the ugly
btrfs spin.

dbench v7.1:        789MB/s
dbench simple spin: 566MB/s

50 proc parallel creates v7.1:        162 files/s, avg sys: 1.6
50 proc parallel creates simple spin: 152 files/s, avg sys: 2

50 proc parallel stat v7.1:        2.3s total
50 proc parallel stat simple spin: 3.8s total

It is less fair though: the 50 proc parallel creates had a much bigger
span between the first and last proc's exit time.  This isn't a huge
shock; I think it shows the hot path is closer to a real spin lock.

Here's the incremental I was using.  It looks to me like most of the
things that could change inside spin_on_owner mean we still want to
spin.  The only exception is the need_resched() flag.

-chris

diff --git a/kernel/mutex.c b/kernel/mutex.c
index bd6342a..8936410 100644
--- a/kernel/mutex.c
+++ b/kernel/mutex.c
@@ -161,11 +161,13 @@ __mutex_lock_common(struct mutex *lock, long state, 
unsigned int subclass,
return 0;
}
 
-	if (old_val < 0 && !list_empty(&lock->wait_list))
+	if (!list_empty(&lock->wait_list))
 		break;
 
 	/* See who owns it, and spin on him if anybody */
 	owner = ACCESS_ONCE(lock->owner);
+	if (need_resched())
+		break;
 	if (owner && !spin_on_owner(lock, owner))
 		break;
 




Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Chris Mason
On Thu, 2009-01-08 at 13:14 -0500, Steven Rostedt wrote:
 
 
 On Thu, 8 Jan 2009, Steven Rostedt wrote:
   In fact, you might not even need a process C: all you need is for B to be 
   on the same runqueue as A, and having enough load on the other CPU's that 
   A never gets migrated away. So C might be in user space.
 
 You're right about not needing process C.
 
   
   I dunno. There are probably variations on the above.
  
  Ouch! I think you are on to something:
  
  for (;;) {
  	struct thread_info *owner;
  
  	old_val = atomic_cmpxchg(&lock->count, 1, 0);
  	if (old_val == 1) {
  		lock_acquired(&lock->dep_map, ip);
  		mutex_set_owner(lock);
  		return 0;
  	}
  
  	if (old_val < 0 && !list_empty(&lock->wait_list))
  		break;
  
  	/* See who owns it, and spin on him if anybody */
  	owner = ACCESS_ONCE(lock->owner);
  
  The owner was preempted before assigning lock->owner (as you stated).
 
 If it was the current process that preempted the owner and these are RT 
 tasks pinned to the same CPU and the owner is of lower priority than the 
 spinner, we have a deadlock!
 
 Hmm, I do not think the need_resched() here will even fix that :-/

RT tasks could go directly to sleeping.  The spinner would see them on
the list and break out.

-chris




Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Peter Zijlstra
On Thu, 2009-01-08 at 11:13 -0800, Linus Torvalds wrote:
 
 Well, at least we do unless you enable that broken paravirt support. I'm 
 not at all clear on why CONFIG_PARAVIRT wants to use inferior locks, but I 
 don't much care.

Because the virtual cpu that has the ticket might not get scheduled for
a while, even though another vcpu with a spinner is scheduled.

The whole (para)virt is a nightmare in that respect.
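For background, a minimal ticket-lock sketch (written with C11 atomics 
purely for illustration - this is not the kernel's implementation) shows 
why the strict FIFO hand-off hurts under virtualization: every later 
spinner is stuck behind a ticket holder that the hypervisor may have 
descheduled.

	#include <stdatomic.h>

	struct ticket_lock {
		atomic_ushort next;	/* next ticket to hand out */
		atomic_ushort owner;	/* ticket currently being served */
	};

	static void ticket_lock(struct ticket_lock *l)
	{
		unsigned short me = atomic_fetch_add(&l->next, 1);

		/* progress depends on the vcpu holding ticket me-1 running */
		while (atomic_load(&l->owner) != me)
			;
	}

	static void ticket_unlock(struct ticket_lock *l)
	{
		atomic_fetch_add(&l->owner, 1);
	}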


Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Steven Rostedt

On Thu, 8 Jan 2009, Chris Mason wrote:
 On Thu, 2009-01-08 at 13:14 -0500, Steven Rostedt wrote:
  
  If it was the current process that preempted the owner and these are RT 
  tasks pinned to the same CPU and the owner is of lower priority than the 
  spinner, we have a deadlock!
  
   Hmm, I do not think the need_resched() here will even fix that :-/
 
 RT tasks could go directly to sleeping.  The spinner would see them on
 the list and break out.

True, we could do:

	if (owner) {
		if (!spin_on_owner(lock, owner))
			break;
	} else if (rt_task(current))
		break;

That would at least solve the issue in the short term.

-- Steve


Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Harvey Harrison
On Thu, 2009-01-08 at 19:33 +0100, Ingo Molnar wrote:
 * Linus Torvalds torva...@linux-foundation.org wrote:

snip

  
  Ingo - I think we need to remove that crap again. Because gcc gets the 
  inlining horribly horribly wrong. As usual.
 
  Apparently it messes up with asm()s: it doesn't know the contents of the 
 asm() and hence it over-estimates the size [based on string heuristics] 
 ...

snip

 That win is mixed in slowpath and fastpath as well.
 
 I see three options:
 
  - Disable CONFIG_OPTIMIZE_INLINING=y altogether (it's already 
default-off)

I'd like to see this: leave all the heuristics out of it.  If I say inline, I
don't mean _maybe_, I mean you'd better damn well inline it.  On the other
hand, alpha seems to be hand-disabling the "inline means __always_inline"
behaviour in its arch headers, so if this option is kept, it should be
enabled on alpha, as that is the current state of play there, and the
arch-private bit should be removed.

 
  - Change the asm() inline markers to something new like asm_inline, which
defaults to __always_inline.

I'd suggest just making inline always mean __always_inline.  And get to
work removing inline from functions in C files.  This is probably also the
better choice to keep older gccs producing decent code.

 
  - Just mark all asm() inline markers as __always_inline - realizing that 
these should never ever be out of line.
 
 We might still try the second or third options, as I think we shouldn't go 
 back into the business of managing the inline attributes of ~100,000 
 kernel functions.

Or just make it clear that inline shouldn't (unless for a very good reason)
_ever_ be used in a .c file.

Cheers,

Harvey



Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Steven Rostedt

 
 The problem? Setting lock->count to 0. That will mean that the next 
 mutex_unlock() will not necessarily enter the slowpath at all, and won't 
 necessarily wake things up like it should.
 
 Normally we set lock->count to 0 after getting the lock, and only _inside_ 
 the spinlock, and then we check the waiters after that. The comment says 
 it all:
 
 	/* set it to 0 if there are no waiters left: */
 	if (likely(list_empty(&lock->wait_list)))
 		atomic_set(&lock->count, 0);
 
 and the spinning case violates that rule. 

The difference is that here we have it set to -1 (in the non-patched 
code), and we have to decide if we should change that to 0. To change from 
-1 to 0 needs the protection of the spin locks.

In the loop, we only change from 1 to 0, which is the same as the fast 
path, and should not cause any problems.
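For readers following along, the lock->count convention assumed 
throughout this exchange (as documented in the mutex code) is:

	/*
	 *  1 - unlocked
	 *  0 - locked, no waiters
	 * <0 - locked, possible waiters queued on wait_list
	 */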


 
 Now, the spinning case only sets it to 0 if we saw it set to 1, so I think 
 the argument can go something like:

Yep.

 
  - if it was 1, and we _have_ seen contention, then we know that at 
    least _one_ person that set it to 1 must have gone through the unlock 
    slowpath (ie it wasn't just a normal locked increment).

Correct.

 
  - So even if _we_ (in the spinning part of stealing that lock) didn't 
wake the waiter up, the slowpath wakeup case (that did _not_ wake 
us up, since we were spinning and hadn't added ourselves to the wait 
list) must have done so.

Agreed.

 
 So maybe it's all really really safe, and we're still guaranteed to have 
 as many wakeups as we had go-to-sleeps. But I have to admit that my brain 
 hurts a bit from worrying about this.

I do not think that the issue with the previous bug that Chris showed had 
anything to do with the actual sleepers. The slow path never changes the 
lock to '1', so it should not affect the spinners. We can think of the 
spinners as not having true contention with the lock; they are just like a:

	while (cond) {
		if (mutex_trylock(lock))
			goto got_the_lock;
	}

 
 Sleeping mutexes are not ever simple. 

Now you see why in -rt we did all this in the slow path ;-)

-- Steve



Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread H. Peter Anvin
Harvey Harrison wrote:

  We might still try the second or third options, as I think we shouldn't go 
  back into the business of managing the inline attributes of ~100,000 
  kernel functions.
 
 Or just make it clear that inline shouldn't (unless for a very good reason)
 _ever_ be used in a .c file.
 

The question is if that would produce acceptable quality code.  In
theory it should, but I'm more than wondering if it really will.

It would be ideal, of course, as it would mean less typing.  I guess we
could try it out by disabling any inline in the current code that
isn't __always_inline...

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Harvey Harrison
On Thu, 2009-01-08 at 17:44 -0800, H. Peter Anvin wrote:
 Harvey Harrison wrote:
 
  We might still try the second or third options, as I think we shouldn't go 
  back into the business of managing the inline attributes of ~100,000 
  kernel functions.
  
  Or just make it clear that inline shouldn't (unless for a very good reason)
  _ever_ be used in a .c file.
  
 
 The question is if that would produce acceptable quality code.  In
 theory it should, but I'm more than wondering if it really will.
 
 It would be ideal, of course, as it would mean less typing.  I guess we
 could try it out by disabling any inline in the current code that
 isn't __always_inline...
 

A lot of code was written assuming inline means __always_inline. I'd suggest
keeping that assumption and working on removing inlines that aren't
strictly necessary, as there's no way to know which inlines meant 'try to
inline' and which ones really should have been __always_inline.

Not that I feel _that_ strongly about it.

Cheers,

Harvey



Re: What's wrong with snapshot?

2009-01-08 Thread Shen Feng
thanks. I got it.

Shen

on 01/08/2009 07:49 PM, Yan Zheng wrote:
 2009/1/8 Shen Feng s...@cn.fujitsu.com:
 Hello,

 I tried to create a snapshot of a subvol.
 But it seems not to work.

 Please see the test below. I think the file created by dd
 should also be in the snap directory.

 
 This is the expected behaviour.  The following link describes what a
 snapshot is.
 
 http://en.wikipedia.org/wiki/Snapshot_(computer_storage)
 
 Yan Zheng
 
 



Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Andi Kleen
On Thu, Jan 08, 2009 at 05:44:25PM -0800, H. Peter Anvin wrote:
 Harvey Harrison wrote:
 
   We might still try the second or third options, as I think we shouldn't go 
   back into the business of managing the inline attributes of ~100,000 
   kernel functions.
  
  Or just make it clear that inline shouldn't (unless for a very good reason)
  _ever_ be used in a .c file.
  
 
 The question is if that would produce acceptable quality code.  In
 theory it should, but I'm more than wondering if it really will.

I actually often use noinline when developing code simply because it 
makes it easier to read oopses when gcc doesn't inline every static
(which it normally does if it only has a single caller). You know
roughly where it crashed without having to decode the line number.

I believe others do that too, I notice it's all over btrfs for example.
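For illustration (the helper and its body here are made up), the pattern 
is simply:

	/* without noinline, gcc would fold this single-caller static
	 * helper into its caller and an oops in here would be attributed
	 * to the caller; with it, the backtrace names check_items */
	static noinline int check_items(const int *items, int nr)
	{
		int i;

		for (i = 1; i < nr; i++)
			if (items[i - 1] > items[i])
				return -EINVAL;
		return 0;
	}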

-Andi

-- 
a...@linux.intel.com


Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Linus Torvalds


On Thu, 8 Jan 2009, H. Peter Anvin wrote:
 
 Right.   gcc simply doesn't have any way to know how heavyweight an
 asm() statement is

I don't think that's relevant.

First off, gcc _does_ have a perfectly fine notion of how heavy-weight an 
asm statement is: just count it as a single instruction (and count the 
argument setup cost that gcc _can_ estimate).

That would be perfectly fine. If people use inline asms, they tend to use 
it for a reason.

However, I doubt that it's the inline asm that was the biggest reason why 
gcc decided not to inline - it was probably the constant switch() 
statement. The inline function actually looks pretty large - except that 
we have a constant argument, and that one makes the switch statement go 
away.

I suspect gcc has some pre-inlining heuristics that don't take constant 
folding and simplification into account - if you look at just the raw tree 
of the function without taking the optimization into account, it will look 
big.
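A userspace toy (names made up) showing the effect - the raw function 
body looks big, but a constant size argument collapses the switch at 
every call site:

	#include <stdio.h>

	static inline unsigned long mask_to(unsigned long x, int size)
	{
		switch (size) {		/* looks large before inlining... */
		case 1: return x & 0xffUL;
		case 2: return x & 0xffffUL;
		case 4: return x & 0xffffffffUL;
		default: return x;
		}
	}

	int main(void)
	{
		/* size is constant here: only "case 2" survives optimization */
		printf("%lx\n", mask_to(0x12345678UL, 2));
		return 0;
	}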

Linus


Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread David Miller
From: Linus Torvalds torva...@linux-foundation.org
Date: Thu, 8 Jan 2009 19:46:30 -0800 (PST)

 First off, gcc _does_ have a perfectly fine notion of how heavy-weight an 
 asm statement is: just count it as a single instruction (and count the 
 argument setup cost that gcc _can_ estimate).

Actually, at least at one point, it counted the number of newline
characters in the assembly string to estimate how many instructions
are contained inside.

It actually needs to know exactly how many instructions are in there,
to emit proper far branches and stuff like that, for some cpus.

They never added an (optional) way to actually tell the compiler
this critical piece of information, so I guess the newline hack is the
best they could come up with.


Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread H. Peter Anvin
Linus Torvalds wrote:
 
 First off, gcc _does_ have a perfectly fine notion of how heavy-weight an 
 asm statement is: just count it as a single instruction (and count the 
 argument setup cost that gcc _can_ estimate).
 

True.  It's not what it's doing, though.  It looks for '\n' and ';'
characters, and counts the maximum instruction size for each possible
instruction.

The reason why is that gcc's size estimation is partially designed to
select what kind of branches it needs to use on architectures which have
more than one type of branches.  As a result, it tends to drastically
overestimate, on purpose.
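As an illustration (assuming the heuristic described above), these two 
helpers execute comparably cheap code but get wildly different size 
estimates:

	static inline void one_nop(void)
	{
		asm volatile("nop");	/* estimated: 1 max-size insn */
	}

	static inline void three_nops(void)
	{
		/* two newlines => estimated as 3 instructions, each at the
		 * architecture's maximum instruction length */
		asm volatile("nop\n\tnop\n\tnop");
	}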

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread H. Peter Anvin
Harvey Harrison wrote:
 
  A lot of code was written assuming inline means __always_inline. I'd suggest
  keeping that assumption and working on removing inlines that aren't
  strictly necessary, as there's no way to know which inlines meant 'try to
  inline' and which ones really should have been __always_inline.
 
 Not that I feel _that_ strongly about it.
 

Actually, we have that reasonably well down by now.  There are a
couple of minor tweaks still necessary, but I think we're 90-95% there
already.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Andrew Morton
On Fri, 9 Jan 2009 04:35:31 +0100 Andi Kleen a...@firstfloor.org wrote:

 On Thu, Jan 08, 2009 at 05:44:25PM -0800, H. Peter Anvin wrote:
  Harvey Harrison wrote:
  
   We might still try the second or third options, as I think we shouldn't go 
   back into the business of managing the inline attributes of ~100,000 
   kernel functions.
   
   Or just make it clear that inline shouldn't (unless for a very good 
   reason)
   _ever_ be used in a .c file.
   
  
  The question is if that would produce acceptable quality code.  In
  theory it should, but I'm more than wondering if it really will.
 
 I actually often use noinline when developing code simply because it 
  makes it easier to read oopses when gcc doesn't inline every static
 (which it normally does if it only has a single caller). You know
 roughly where it crashed without having to decode the line number.
 
 I believe others do that too, I notice it's all over btrfs for example.
 

Plus there is the problem where

foo()
{
char a[1000];
}

bar()
{
char a[1000];
}

zot()
{
foo();
bar();
}

uses 2000 bytes of stack.

Fortunately scripts/checkstack.pl can find these.

If someone runs it.

With the right kconfig settings.

And with the right compiler version.


Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning

2009-01-08 Thread Andi Kleen
On Thu, Jan 08, 2009 at 07:42:48PM -0800, Linus Torvalds wrote:
  I actually often use noinline when developing code simply because it 
  makes it easier to read oopses when gcc doesn't inline every static
  (which it normally does if it only has a single caller)
 
 Yes. Gcc inlining is a total piece of sh*t.

The static inlining by default (unfortunately) saves a lot of text size.

For testing I built an x86-64 allyesconfig kernel with 
-fno-inline-functions-called-once (which disables the default static
inlining), and it increased text size by ~4.1% (over 2MB for an allyesconfig 
kernel). So I think we have to keep that; dropping it would cost too
much :/

-Andi

-- 
a...@linux.intel.com