Re: [PATCH v2] mutex: Documentation rewrite

2014-05-22 Thread Randy Dunlap
On 05/22/2014 09:41 AM, Davidlohr Bueso wrote:
> From: Davidlohr Bueso 
> 
> Our mutexes have come a long way since the original implementation
> back in 2005/2006. However, the mutex-design.txt document is still
> stuck in the past, to the point where most of the information there
> is practically useless and, more importantly, simply incorrect. This
> patch pretty much rewrites it to resemble what we have nowadays.
> 
> Since regular semaphores are now almost extinct in the kernel
> (most users rely on mutexes or rwsems), it no longer makes
> sense to keep such a close comparison, most of which was copied
> from the cover letter Ingo wrote when he introduced the generic
> mutex subsystem.
> 
> Note that ww_mutexes are intentionally left out, leaving things as
> generic as possible.
> 
> Signed-off-by: Davidlohr Bueso 
> ---
> Changes from v1:
>  - Grammar corrections.
>  - Document cancelable MCS properties.
> 
>  Documentation/mutex-design.txt | 252 ++---
>  1 file changed, 135 insertions(+), 117 deletions(-)
> 
> diff --git a/Documentation/mutex-design.txt b/Documentation/mutex-design.txt
> index 1dfe62c..d9b0be5 100644
> --- a/Documentation/mutex-design.txt
> +++ b/Documentation/mutex-design.txt
> @@ -1,139 +1,157 @@
>  Generic Mutex Subsystem
>  
>  started by Ingo Molnar 
> +updated by Davidlohr Bueso 
>  
> -  "Why on earth do we need a new mutex subsystem, and what's wrong
> -   with semaphores?"
> +What are mutexes?
> +-----------------
>  
> -firstly, there's nothing wrong with semaphores. But if the simpler
> -mutex semantics are sufficient for your code, then there are a couple
> -of advantages of mutexes:
> +In the Linux kernel, mutexes refer to a particular locking primitive
> +that enforces serialization on shared memory systems, and not only to
> +the generic term referring to 'mutual exclusion' found in academia
> +or similar theoretical textbooks. Mutexes are sleeping locks which
> +behave similarly to binary semaphores, and were introduced in 2006[1]
> +as an alternative to them. This new data structure provided a number
> +of advantages, including simpler interfaces, and at that time smaller
> +code (see Disadvantages).
>  
> - - 'struct mutex' is smaller on most architectures: E.g. on x86,
> -   'struct semaphore' is 20 bytes, 'struct mutex' is 16 bytes.
> -   A smaller structure size means less RAM footprint, and better
> -   CPU-cache utilization.
> +[1] http://lwn.net/Articles/164802/
>  
> - - tighter code. On x86 i get the following .text sizes when
> -   switching all mutex-alike semaphores in the kernel to the mutex
> -   subsystem:
> +Implementation
> +--------------
>  
> -      text    data     bss     dec     hex filename
> -   3280380  868188  396860 4545428  455b94 vmlinux-semaphore
> -   3255329  865296  396732 4517357  44eded vmlinux-mutex
> +Mutexes are represented by 'struct mutex', defined in include/linux/mutex.h
> +and implemented in kernel/locking/mutex.c. These locks use a three-state
> +atomic counter (->count) to represent the different possible
> +transitions that can occur during the lifetime of a lock:
>  
> -   that's 25051 bytes of code saved, or a 0.76% win - off the hottest
> -   codepaths of the kernel. (The .data savings are 2892 bytes, or 0.33%)
> -   Smaller code means better icache footprint, which is one of the
> -   major optimization goals in the Linux kernel currently.
> +   1: unlocked
> +   0: locked, no waiters
> +   negative: locked, with potential waiters
>  
> - - the mutex subsystem is slightly faster and has better scalability for
> -   contended workloads. On an 8-way x86 system, running a mutex-based
> -   kernel and testing creat+unlink+close (of separate, per-task files)
> -   in /tmp with 16 parallel tasks, the average number of ops/sec is:
> +In its most basic form it also includes a wait-queue and a spinlock
> +that serializes access to it. CONFIG_SMP systems can also include
> +a pointer to the task that owns the lock (->owner) as well as a
> +spinner MCS lock (->osq), both described below in (ii).
>  
> -Semaphores:                        Mutexes:
> +When acquiring a mutex, there are three possible paths that can be
> +taken, depending on the state of the lock:
>  
> -$ ./test-mutex V 16 10             $ ./test-mutex V 16 10
> -8 CPUs, running 16 tasks.          8 CPUs, running 16 tasks.
> -checking VFS performance.          checking VFS performance.
> -avg loops/sec:      34713          avg loops/sec:      84153
> -CPU utilization:    63%            CPU utilization:    22%
> +(i) fastpath: tries to atomically acquire the lock by decrementing the
> +counter. If it was already taken by another task, it goes to the next
> +possible path. This logic is architecture-specific. On x86-64, the
> +locking fastpath is 2 instructions:
>  
> -   i.e. in this workload, the mutex based kernel was 2.4 times faster
> -   than the semaphore based kernel, _and_ it also had 2.8 times less CPU
> -   utilization. (In terms of 'ops per CPU cycle', the semaphore kernel
> -   performed 551 ops/sec per 1% of CPU time used, while the mutex kernel
> -   performed 3825 ops/sec per 1% of CPU time used - it was 6.9 times
> -   more 
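For context, the user-facing interface the rewritten document covers is the standard kernel mutex API. The sketch below is illustrative only and not part of the patch; update_shared_state is a made-up example function, but DEFINE_MUTEX, mutex_lock, mutex_unlock, and mutex_lock_interruptible are the standard calls:

```
#include <linux/mutex.h>

static DEFINE_MUTEX(my_lock);   /* statically initialized mutex */

static int update_shared_state(void)
{
	/* Sleeping lock: may block, so only legal in process context. */
	mutex_lock(&my_lock);
	/* ... touch the data my_lock serializes ... */
	mutex_unlock(&my_lock);

	/* Variant that a signal can interrupt while sleeping: */
	if (mutex_lock_interruptible(&my_lock))
		return -EINTR;
	/* ... */
	mutex_unlock(&my_lock);
	return 0;
}
```

Note that, unlike semaphores, the task that locks the mutex must be the one to unlock it.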
