Re: [lustre-devel] [PATCH 030/124] staging: lustre: llite: Replace write mutex with range lock

2016-09-19 Thread Jan Kara
On Mon 19-09-16 09:25:48, Dilger, Andreas wrote:
> On Sep 19, 2016, at 09:28, Greg Kroah-Hartman  
> wrote:
> > 
> > On Sun, Sep 18, 2016 at 04:37:29PM -0400, James Simmons wrote:
> >> + * Range lock is used to allow multiple threads writing a single shared
> >> + * file given each thread is writing to a non-overlapping portion of the
> >> + * file.
> >> + *
> >> + * Refer to the possible upstream kernel version of range lock by
> >> + * Jan Kara : https://lkml.org/lkml/2013/1/31/480
> >> + *
> >> + * This file could later replaced by the upstream kernel version.
> > 
> > It doesn't look like range_lock ever got accepted in the kernel tree,
> > any idea what happened to it?  Having a per-filesystem lock type seems
> > odd to me...
> 
> I've added Jan and linux-fsdevel to the CC list to see what interest
> there is in the range locking implementaion.  At the time we added this
> to Lustre it appeared that this was moving nicely torward landing, but
> it seems to have stalled.
> 
> I think the range locking implementation is fairly generic, and if there
> are other users in the kernel it could easily be pulled out of the staging
> dir into vfs/.  I'm not against it going into vfs/ directly either, but
> not sure whether that is acceptable if the only user is in staging.

Yeah, so the problem with my range_lock implementation (and your one looks
fairly similar from a quick look) is that it is fairly heavyweight. That
may be OK for an inode_lock replacement but it needs a careful
benchmarking. Davidlohr was looking into making the implementation more
efficient (he wanted to use it for mmap_sem) but I think it got preempted
by other work.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [lustre-devel] [PATCH 030/124] staging: lustre: llite: Replace write mutex with range lock

2016-09-19 Thread Dilger, Andreas
On Sep 19, 2016, at 09:28, Greg Kroah-Hartman  
wrote:
> 
> On Sun, Sep 18, 2016 at 04:37:29PM -0400, James Simmons wrote:
>> + * Range lock is used to allow multiple threads writing a single shared
>> + * file given each thread is writing to a non-overlapping portion of the
>> + * file.
>> + *
>> + * Refer to the possible upstream kernel version of range lock by
>> + * Jan Kara : https://lkml.org/lkml/2013/1/31/480
>> + *
>> + * This file could later replaced by the upstream kernel version.
> 
> It doesn't look like range_lock ever got accepted in the kernel tree,
> any idea what happened to it?  Having a per-filesystem lock type seems
> odd to me...

I've added Jan and linux-fsdevel to the CC list to see what interest
there is in the range locking implementaion.  At the time we added this
to Lustre it appeared that this was moving nicely torward landing, but
it seems to have stalled.

I think the range locking implementation is fairly generic, and if there
are other users in the kernel it could easily be pulled out of the staging
dir into vfs/.  I'm not against it going into vfs/ directly either, but
not sure whether that is acceptable if the only user is in staging.

Cheers, Andreas


Re: [PATCH 030/124] staging: lustre: llite: Replace write mutex with range lock

2016-09-19 Thread Greg Kroah-Hartman
On Sun, Sep 18, 2016 at 04:37:29PM -0400, James Simmons wrote:
> + * Range lock is used to allow multiple threads writing a single shared
> + * file given each thread is writing to a non-overlapping portion of the
> + * file.
> + *
> + * Refer to the possible upstream kernel version of range lock by
> + * Jan Kara : https://lkml.org/lkml/2013/1/31/480
> + *
> + * This file could later replaced by the upstream kernel version.

It doesn't look like range_lock ever got accepted in the kernel tree,
any idea what happened to it?  Having a per-filesystem lock type seems
odd to me...

thanks,

greg k-h


[PATCH 030/124] staging: lustre: llite: Replace write mutex with range lock

2016-09-18 Thread James Simmons
From: Prakash Surya 

Testing has shown the ll_inode_inode's lli_write_mutex to be a
limiting factor with single shared file write performance, when using
many writing threads on a single machine. Even if each thread is
writing to a unique portion of the file, the lli_write_mutex will
prevent no more than a single thread to ever write to the file
simultaneously.

This change attempts to remove this bottle neck, by replacing this
mutex with a range lock. This should allow multiple threads to write
to a single file simultaneously iff the threads are writing to unique
regions of the file.

Performance testing shows this change to garner a significant
performance boost to write bandwidth. Using FIO on a single machine
with 32 GB of RAM, write performance tests were run with and without
this change applied; the results are below:

+-+---+-++++
| fio v2.0.13 |Write Bandwidth (KB/s)  |
+-+---+-++++
| # Tasks | GB / Task | Test 1  | Test 2 | Test 3 | Test 4 |
+-+---+-++++
|1|64 |  452446 | 454623 | 457653 | 463737 |
|2|32 |  850318 | 565373 | 602498 | 733027 |
|4|16 | 1058900 | 463546 | 529107 | 976284 |
|8| 8 | 1026300 | 468190 | 576451 | 963404 |
|   16| 4 | 1065500 | 503160 | 462902 | 830065 |
|   32| 2 | 1068600 | 462228 | 466963 | 749733 |
|   64| 1 |  991830 | 556618 | 557863 | 710912 |
+-+---+-++++

 * Test 1: Lustre client running 04ec54f. File per process write
   workload. This test was used as a baseline for what we
   _could_ achieve in the single shared file tests if the
   bottle necks were removed.

 * Test 2: Lustre client running 04ec54f. Single shared file
   workload, each task writing to a unique region.

 * Test 3: Lustre client running 04ec54f + I0023132b. Single shared
   file workload, each task writing to a unique region.

 * Test 4: Lustre client running 04ec54f + this patch.
   Single shared file workload, each task writing to a unique
   region.

Direct IO does not use the page cache like normal IO, so
concurrent direct IO reads of the same pages are not safe.

As a result, direct IO reads must take the range lock
in ll_file_io_generic, otherwise they will attempt to work
on the same pages and hit assertions like:
(osc_request.c:1219:osc_brw_prep_request())
ASSERTION( i == 0 || pg->off > pg_prev->off ) failed:
i 3 p_c 10 pg ea00017a5208 [pri 0 ind 2771] off 16384
prev_pg ea00017a51d0 [pri 0 ind 2256] off 16384

Signed-off-by: Prakash Surya 
Signed-off-by: Patrick Farrell 
Signed-off-by: Jinshan Xiong 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-1669
Reviewed-on: http://review.whamcloud.com/6320
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6227
Reviewed-on: http://review.whamcloud.com/14385
Reviewed-by: Bobi Jam 
Reviewed-by: Alexander Boyko 
Reviewed-by: Hiroya Nozaki 
Reviewed-by: Andreas Dilger 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/llite/Makefile   |2 +-
 drivers/staging/lustre/lustre/llite/file.c |   37 +++-
 .../staging/lustre/lustre/llite/llite_internal.h   |5 +-
 drivers/staging/lustre/lustre/llite/llite_lib.c|2 +-
 drivers/staging/lustre/lustre/llite/range_lock.c   |  233 
 drivers/staging/lustre/lustre/llite/range_lock.h   |   82 +++
 6 files changed, 348 insertions(+), 13 deletions(-)
 create mode 100644 drivers/staging/lustre/lustre/llite/range_lock.c
 create mode 100644 drivers/staging/lustre/lustre/llite/range_lock.h

diff --git a/drivers/staging/lustre/lustre/llite/Makefile 
b/drivers/staging/lustre/lustre/llite/Makefile
index 2cbb1b8..1ac0940 100644
--- a/drivers/staging/lustre/lustre/llite/Makefile
+++ b/drivers/staging/lustre/lustre/llite/Makefile
@@ -1,6 +1,6 @@
 obj-$(CONFIG_LUSTRE_FS) += lustre.o
 lustre-y := dcache.o dir.o file.o llite_close.o llite_lib.o llite_nfs.o \
-   rw.o namei.o symlink.o llite_mmap.o \
+   rw.o namei.o symlink.o llite_mmap.o range_lock.o \
xattr.o xattr_cache.o rw26.o super25.o statahead.o \
glimpse.o lcommon_cl.o lcommon_misc.o \
vvp_dev.o vvp_page.o vvp_lock.o vvp_io.o vvp_object.o vvp_req.o \
diff --git a/drivers/staging/lustre/lustre/llite/file.c 
b/drivers/staging/lustre/lustre/llite/file.c
index 273b563..d9ee255 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -1126,6 +1126,7 @@ ll_file_io_generic(const struct lu_env *env, struct 
vvp_io_args *args,
 {
struct ll_inode_info *lli = ll_i2info(file_inode(file));
struct ll_file_data  *fd  = LUSTRE_FPRIVATE(file);
+   struct