Re: [RFC PATCH 00/64] mm: towards parallel address space operations

2018-02-06 Thread Davidlohr Bueso

On Mon, 05 Feb 2018, Laurent Dufour wrote:


> On 05/02/2018 02:26, Davidlohr Bueso wrote:
>> From: Davidlohr Bueso
>>
>> Hi,
>>
>> This patchset is a new version of both the range locking machinery as well
>> as a full mmap_sem conversion that makes use of it -- as the worst case
>> scenario as all mmap_sem calls are converted to a full range mmap_lock
>> equivalent. As such, while there is no improvement of concurrency per se,
>> these changes aim at adding the machinery to permit this in the future.
>
> Despite the massive rebase, what are the changes in this series compared to
> the one I sent last May (which you silently based this on, by the way):
> https://lkml.org/lkml/2017/5/24/409

Hardly, but yes I meant to reference that. It ended up being easier to just
do a from-scratch version. I haven't done a comparison, but at first I thought
you missed gup users (now not so much); this patchset allows testing on more
archs (see below), we remove the trylock in vm_insert_page(), etc.

>> Direct users of the mm->mmap_sem can be classified as those that (1) acquire
>> and release the lock within the same context, and (2) those who directly
>> manipulate the mmap_sem down the callchain. For example:
>>
>> (1)  down_read(&mm->mmap_sem);
>>      /* do something */
>>      /* nobody down the chain uses mmap_sem directly */
>>      up_read(&mm->mmap_sem);
>>
>> (2a) down_read(&mm->mmap_sem);
>>      /* do something that returns mmap_sem unlocked */
>>      fn(mm, &locked);
>>      if (locked)
>>              up_read(&mm->mmap_sem);
>>
>> (2b) down_read(&mm->mmap_sem);
>>      /* do something that in between released and reacquired mmap_sem */
>>      fn(mm);
>>      up_read(&mm->mmap_sem);
>
> Unfortunately, there are also indirect users which rely on the mmap_sem
> locking to protect their data. For the first step using a full range this
> doesn't matter, but when refining the range, these ones would be the most
> critical, as they would have to be reworked to take the range into account.

Of course. The value I see in this patchset is that we can determine whether or
not we move forward based on the worst case scenario numbers.

>> Testing: I have set up an mmtests config file with all the workloads described:
>> http://linux-scalability.org/mmtests-config
>
> Is this link still valid? I can't reach it.


Sorry, that should have been:

https://linux-scalability.org/range-mmap_lock/mmtests-config

Thanks,
Davidlohr


Re: [RFC PATCH 00/64] mm: towards parallel address space operations

2018-02-05 Thread Laurent Dufour
On 05/02/2018 02:26, Davidlohr Bueso wrote:
> From: Davidlohr Bueso 
> 
> Hi,
> 
> This patchset is a new version of both the range locking machinery as well
> as a full mmap_sem conversion that makes use of it -- as the worst case
> scenario as all mmap_sem calls are converted to a full range mmap_lock
> equivalent. As such, while there is no improvement of concurrency per se,
> these changes aim at adding the machinery to permit this in the future.

Despite the massive rebase, what are the changes in this series compared to
the one I sent last May (which you silently based this on, by the way):
https://lkml.org/lkml/2017/5/24/409

> 
> Direct users of the mm->mmap_sem can be classified as those that (1) acquire
> and release the lock within the same context, and (2) those who directly
> manipulate the mmap_sem down the callchain. For example:
> 
> (1)  down_read(&mm->mmap_sem);
>      /* do something */
>      /* nobody down the chain uses mmap_sem directly */
>      up_read(&mm->mmap_sem);
>
> (2a) down_read(&mm->mmap_sem);
>      /* do something that returns mmap_sem unlocked */
>      fn(mm, &locked);
>      if (locked)
>              up_read(&mm->mmap_sem);
>
> (2b) down_read(&mm->mmap_sem);
>      /* do something that in between released and reacquired mmap_sem */
>      fn(mm);
>      up_read(&mm->mmap_sem);

Unfortunately, there are also indirect users which rely on the mmap_sem
locking to protect their data. For the first step using a full range this
doesn't matter, but when refining the range, these ones would be the most
critical, as they would have to be reworked to take the range into account.
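
To make the full-range versus refined-range point concrete, here is a minimal
user-space sketch of the overlap test that any range lock has to perform. The
names (struct lock_range, full_range(), ranges_overlap()) are hypothetical and
are not taken from the patchset; the only assumption is that a range is a
closed interval of addresses.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical closed interval [start, last]; not the patchset's struct. */
struct lock_range {
	unsigned long start;
	unsigned long last;
};

/* A "full" range spans the whole address space. */
static struct lock_range full_range(void)
{
	struct lock_range r = { 0, ULONG_MAX };
	return r;
}

/*
 * Two closed intervals overlap when each one starts no later than the
 * other one ends.
 */
static bool ranges_overlap(const struct lock_range *a,
			   const struct lock_range *b)
{
	assert(a->start <= a->last && b->start <= b->last);
	return a->start <= b->last && b->start <= a->last;
}
```

With a full range every request overlaps every other one, so data that is
implicitly protected by the lock stays protected. Once callers pass narrower
ranges, two disjoint requests no longer exclude each other, which is why the
indirect users would need to be audited first.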

> 
> Patches 1-2: add the range locking machinery. This is rebased on the rbtree
> optimizations for interval trees such that we can quickly detect overlapping
> ranges. More documentation has also been added, with an ordering example in the
> source code.
> 
> Patch 3: adds new mm locking wrappers around mmap_sem.
> 
> Patches 4-15: teaches page fault paths about mmrange (specifically adding the
> range in question to the struct vm_fault). In addition, most of these patches
> update mmap_sem callers that call into the 2a and 2b examples above.
> 
> Patches 15-63: adds most of the trivial conversions -- the (1) example above.
> (patches 21, 22, 23 are hacks that avoid rwsem_is_locked(mmap_sem) such that
> we don't have to teach file_operations about mmrange.)
> 
> Patch 64: finally do the actual conversion and replace mmap_sem with the range
> mmap_lock.
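
The staging described above (wrappers first, backing lock swapped last) can be
sketched in user space as follows. This is only an illustration under stated
assumptions: every name ending in _sketch is hypothetical, the "rwsem" is a
mock that merely counts readers, and the patchset's real wrapper names and
signatures are not shown in this excerpt.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-ins for illustration; none of these are kernel types. */
struct range_sketch { unsigned long start, last; };
struct rwsem_sketch { int readers; };	/* mock: counts readers only */
struct mm_sketch { struct rwsem_sketch mmap_sem; };

/* Mock primitives standing in for down_read()/up_read(). */
static void down_read_sketch(struct rwsem_sketch *sem)
{
	sem->readers++;
}

static void up_read_sketch(struct rwsem_sketch *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}

/*
 * Wrappers: callers already pass a range, but as long as the rwsem is the
 * backing lock the range is simply ignored. Swapping the backing lock later
 * only touches these two functions, not the callers.
 */
static void mm_read_lock_sketch(struct mm_sketch *mm, struct range_sketch *r)
{
	(void)r;
	down_read_sketch(&mm->mmap_sem);
}

static void mm_read_unlock_sketch(struct mm_sketch *mm, struct range_sketch *r)
{
	(void)r;
	up_read_sketch(&mm->mmap_sem);
}
```

A caller of pattern (1) would declare a full range, take the lock through the
wrapper, do its work, and release it through the matching wrapper.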
> 
> I've run the series on a 40-core (ht) 2-socket IvyBridge with 16 Gb of memory
> on various benchmarks that stress address space concurrency.
> 
> ** pft is a microbenchmark for page fault rates.
> 
> When running with increasing thread counts, range locking takes a rather small
> hit (yet constant) of ~2% for the pft timings, with a max of 5%. This
> translates similarly to faults/cpu.
> 
> 
> pft timings
>                           v4.15-rc8            v4.15-rc8
>                                       range-mmap_lock-v1
> Amean  system-1     1.11 (   0.00%)      1.17 (  -5.86%)
> Amean  system-4     1.14 (   0.00%)      1.18 (  -3.07%)
> Amean  system-7     1.38 (   0.00%)      1.36 (   0.94%)
> Amean  system-12    2.28 (   0.00%)      2.31 (  -1.18%)
> Amean  system-21    4.11 (   0.00%)      4.13 (  -0.44%)
> Amean  system-30    5.94 (   0.00%)      6.01 (  -1.11%)
> Amean  system-40    8.24 (   0.00%)      8.33 (  -1.04%)
> Amean  elapsed-1    1.28 (   0.00%)      1.33 (  -4.50%)
> Amean  elapsed-4    0.32 (   0.00%)      0.34 (  -5.27%)
> Amean  elapsed-7    0.24 (   0.00%)      0.24 (  -0.43%)
> Amean  elapsed-12   0.23 (   0.00%)      0.23 (  -0.22%)
> Amean  elapsed-21   0.26 (   0.00%)      0.25 (   0.39%)
> Amean  elapsed-30   0.24 (   0.00%)      0.24 (  -0.21%)
> Amean  elapsed-40   0.24 (   0.00%)      0.24 (   0.84%)
> Stddev system-1     0.04 (   0.00%)      0.05 ( -16.29%)
> Stddev system-4     0.03 (   0.00%)      0.03 (  17.70%)
> Stddev system-7     0.08 (   0.00%)      0.02 (  68.56%)
> Stddev system-12    0.05 (   0.00%)      0.06 ( -31.22%)
> Stddev system-21    0.06 (   0.00%)      0.06 (   8.07%)
> Stddev system-30    0.05 (   0.00%)      0.09 ( -70.15%)
> Stddev system-40    0.11 (   0.00%)      0.07 (  41.53%)
> Stddev elapsed-1    0.03 (   0.00%)      0.05 ( -72.14%)
> Stddev elapsed-4    0.01 (   0.00%)      0.01 (  -4.98%)
> Stddev elapsed-7    0.01 (   0.00%)      0.01 (  60.65%)
> Stddev elapsed-12   0.01 (   0.00%)      0.01 (   6.24%)
> Stddev elapsed-21   0.01 (   0.00%)      0.01 (  -1.13%)
> Stddev elapsed-30   0.00 (   0.00%)      0.00 ( -45.10%)
> Stddev elapsed-40   0.01 (   0.00%)      0.01 (  25.97%)
> 
> pft faults
> 
