[RFC][PATCH 2/2] Btrfs: implement unlocked dio write

2013-01-31 Thread Miao Xie
This idea is from ext4. By this patch, we can make the dio write parallel,
and improve the performance.

We needn't worry about the race between dio write and truncate, because the
truncate need wait untill all the dio write end.

And we also needn't worry about the race between dio write and punch hole,
because we have extent lock to protect our operation.

I ran fio to test the performance of this feature.

== Hardware ==
CPU: Intel(R) Core(TM)2 Duo CPU E7500  @ 2.93GHz
Mem: 2GB
SSD: Intel X25-M 120GB (Test Partition: 60GB)

== config file ==
[global]
ioengine=psync
direct=1
bs=4k
size=32G
runtime=60
directory=/mnt/btrfs/
filename=testfile
group_reporting
thread

[file1]
numjobs=1 # 2 4
rw=randwrite

== result (KBps) ==
write   1   2   4
lock24936   24738   24726
nolock  24962   30866   32101

== result (iops) ==
write   1   2   4
lock623461846181
nolock  624077168025

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/inode.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d17a04b..091593a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6589,31 +6589,33 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb 
*iocb,
struct file *file = iocb-ki_filp;
struct inode *inode = file-f_mapping-host;
int flags = 0;
-   bool wakeup = false;
+   bool wakeup = true;
int ret;
 
if (check_direct_IO(BTRFS_I(inode)-root, rw, iocb, iov,
offset, nr_segs))
return 0;
 
-   if (rw == READ) {
-   atomic_inc(inode-i_dio_count);
-   smp_mb__after_atomic_inc();
-   if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
- BTRFS_I(inode)-runtime_flags))) {
-   inode_dio_done(inode);
-   flags = DIO_LOCKING | DIO_SKIP_HOLES;
-   } else {
-   wakeup = true;
-   }
+   atomic_inc(inode-i_dio_count);
+   smp_mb__after_atomic_inc();
+   if (rw == WRITE) {
+   mutex_unlock(inode-i_mutex);
+   } else if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
+BTRFS_I(inode)-runtime_flags))) {
+   inode_dio_done(inode);
+   flags = DIO_LOCKING | DIO_SKIP_HOLES;
+   wakeup = false;
}
 
ret = __blockdev_direct_IO(rw, iocb, inode,
BTRFS_I(inode)-root-fs_info-fs_devices-latest_bdev,
iov, offset, nr_segs, btrfs_get_blocks_direct, NULL,
btrfs_submit_direct, flags);
+
if (wakeup)
inode_dio_done(inode);
+   if (rw == WRITE)
+   mutex_lock(inode-i_mutex);
return ret;
 }
 
-- 
1.7.11.7
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH 2/2] Btrfs: implement unlocked dio write

2013-01-31 Thread Josef Bacik
On Thu, Jan 31, 2013 at 02:39:03AM -0700, Miao Xie wrote:
 This idea is from ext4. By this patch, we can make the dio write parallel,
 and improve the performance.
 
 We needn't worry about the race between dio write and truncate, because the
 truncate need wait untill all the dio write end.
 
 And we also needn't worry about the race between dio write and punch hole,
 because we have extent lock to protect our operation.
 
 I ran fio to test the performance of this feature.
 
 == Hardware ==
 CPU: Intel(R) Core(TM)2 Duo CPU E7500  @ 2.93GHz
 Mem: 2GB
 SSD: Intel X25-M 120GB (Test Partition: 60GB)
 
 == config file ==
 [global]
 ioengine=psync
 direct=1
 bs=4k
 size=32G
 runtime=60
 directory=/mnt/btrfs/
 filename=testfile
 group_reporting
 thread
 
 [file1]
 numjobs=1 # 2 4
 rw=randwrite
 
 == result (KBps) ==
 write 1   2   4
 lock  24936   24738   24726
 nolock24962   30866   32101
 
 == result (iops) ==
 write 1   2   4
 lock  623461846181
 nolock624077168025

So the one thing I worry about is interactions with fsync.  I've been depending
on the mutex to keep us from getting screwed by writers coming in while I'm
trying to write out the changed extents.  Could you test this with fsync and
make sure it doesn't break anything?  Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH 2/2] Btrfs: implement unlocked dio write

2013-01-31 Thread Liu Bo
On Thu, Jan 31, 2013 at 05:39:03PM +0800, Miao Xie wrote:
 This idea is from ext4. By this patch, we can make the dio write parallel,
 and improve the performance.

Interesting, AFAIK, ext4 can only do nolock dio write on some
conditions(should be a overwrite, file size remains unchanged,
no aligned/buffer io in flight), btrfs is ok without any conditions?

thanks,
liubo

 
 We needn't worry about the race between dio write and truncate, because the
 truncate need wait untill all the dio write end.
 
 And we also needn't worry about the race between dio write and punch hole,
 because we have extent lock to protect our operation.
 
 I ran fio to test the performance of this feature.
 
 == Hardware ==
 CPU: Intel(R) Core(TM)2 Duo CPU E7500  @ 2.93GHz
 Mem: 2GB
 SSD: Intel X25-M 120GB (Test Partition: 60GB)
 
 == config file ==
 [global]
 ioengine=psync
 direct=1
 bs=4k
 size=32G
 runtime=60
 directory=/mnt/btrfs/
 filename=testfile
 group_reporting
 thread
 
 [file1]
 numjobs=1 # 2 4
 rw=randwrite
 
 == result (KBps) ==
 write 1   2   4
 lock  24936   24738   24726
 nolock24962   30866   32101
 
 == result (iops) ==
 write 1   2   4
 lock  623461846181
 nolock624077168025
 
 Signed-off-by: Miao Xie mi...@cn.fujitsu.com
 ---
  fs/btrfs/inode.c | 24 +---
  1 file changed, 13 insertions(+), 11 deletions(-)
 
 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
 index d17a04b..091593a 100644
 --- a/fs/btrfs/inode.c
 +++ b/fs/btrfs/inode.c
 @@ -6589,31 +6589,33 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb 
 *iocb,
   struct file *file = iocb-ki_filp;
   struct inode *inode = file-f_mapping-host;
   int flags = 0;
 - bool wakeup = false;
 + bool wakeup = true;
   int ret;
  
   if (check_direct_IO(BTRFS_I(inode)-root, rw, iocb, iov,
   offset, nr_segs))
   return 0;
  
 - if (rw == READ) {
 - atomic_inc(inode-i_dio_count);
 - smp_mb__after_atomic_inc();
 - if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
 -   BTRFS_I(inode)-runtime_flags))) {
 - inode_dio_done(inode);
 - flags = DIO_LOCKING | DIO_SKIP_HOLES;
 - } else {
 - wakeup = true;
 - }
 + atomic_inc(inode-i_dio_count);
 + smp_mb__after_atomic_inc();
 + if (rw == WRITE) {
 + mutex_unlock(inode-i_mutex);
 + } else if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
 +  BTRFS_I(inode)-runtime_flags))) {
 + inode_dio_done(inode);
 + flags = DIO_LOCKING | DIO_SKIP_HOLES;
 + wakeup = false;
   }
  
   ret = __blockdev_direct_IO(rw, iocb, inode,
   BTRFS_I(inode)-root-fs_info-fs_devices-latest_bdev,
   iov, offset, nr_segs, btrfs_get_blocks_direct, NULL,
   btrfs_submit_direct, flags);
 +
   if (wakeup)
   inode_dio_done(inode);
 + if (rw == WRITE)
 + mutex_lock(inode-i_mutex);
   return ret;
  }
  
 -- 
 1.7.11.7
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH 2/2] Btrfs: implement unlocked dio write

2013-01-31 Thread Miao Xie
On  fri, 1 Feb 2013 10:53:30 +0800, Liu Bo wrote:
 On Thu, Jan 31, 2013 at 05:39:03PM +0800, Miao Xie wrote:
 This idea is from ext4. By this patch, we can make the dio write parallel,
 and improve the performance.
 
 Interesting, AFAIK, ext4 can only do nolock dio write on some
 conditions(should be a overwrite, file size remains unchanged,
 no aligned/buffer io in flight), btrfs is ok without any conditions?

ext4 don't have extent lock, it can not avoid 2 AIO  threads are at work on the 
same
unwritten block, so it can not use unlocked dio write for unaligned dio/aio. 
But btrfs
has extent lock, it can avoid this problem.

And ext4 need take write lock of -i_data_sem, when it allocate the free space,
but in order to avoid truncation and hole punch during dio, it need take the 
read
lock of -i_data_sem before it release -i_mutex, that is if it isn't a 
overwrite,
deadlock will happen, so the unlocked dio of ext4 should be a overwrite. But 
btrfs
doesn't have such limitation.

Thanks
Miao

 
 thanks,
 liubo
 

 We needn't worry about the race between dio write and truncate, because the
 truncate need wait untill all the dio write end.

 And we also needn't worry about the race between dio write and punch hole,
 because we have extent lock to protect our operation.

 I ran fio to test the performance of this feature.

 == Hardware ==
 CPU: Intel(R) Core(TM)2 Duo CPU E7500  @ 2.93GHz
 Mem: 2GB
 SSD: Intel X25-M 120GB (Test Partition: 60GB)

 == config file ==
 [global]
 ioengine=psync
 direct=1
 bs=4k
 size=32G
 runtime=60
 directory=/mnt/btrfs/
 filename=testfile
 group_reporting
 thread

 [file1]
 numjobs=1 # 2 4
 rw=randwrite

 == result (KBps) ==
 write1   2   4
 lock 24936   24738   24726
 nolock   24962   30866   32101

 == result (iops) ==
 write1   2   4
 lock 623461846181
 nolock   624077168025

 Signed-off-by: Miao Xie mi...@cn.fujitsu.com
 ---
  fs/btrfs/inode.c | 24 +---
  1 file changed, 13 insertions(+), 11 deletions(-)

 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
 index d17a04b..091593a 100644
 --- a/fs/btrfs/inode.c
 +++ b/fs/btrfs/inode.c
 @@ -6589,31 +6589,33 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb 
 *iocb,
  struct file *file = iocb-ki_filp;
  struct inode *inode = file-f_mapping-host;
  int flags = 0;
 -bool wakeup = false;
 +bool wakeup = true;
  int ret;
  
  if (check_direct_IO(BTRFS_I(inode)-root, rw, iocb, iov,
  offset, nr_segs))
  return 0;
  
 -if (rw == READ) {
 -atomic_inc(inode-i_dio_count);
 -smp_mb__after_atomic_inc();
 -if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
 -  BTRFS_I(inode)-runtime_flags))) {
 -inode_dio_done(inode);
 -flags = DIO_LOCKING | DIO_SKIP_HOLES;
 -} else {
 -wakeup = true;
 -}
 +atomic_inc(inode-i_dio_count);
 +smp_mb__after_atomic_inc();
 +if (rw == WRITE) {
 +mutex_unlock(inode-i_mutex);
 +} else if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
 + BTRFS_I(inode)-runtime_flags))) {
 +inode_dio_done(inode);
 +flags = DIO_LOCKING | DIO_SKIP_HOLES;
 +wakeup = false;
  }
  
  ret = __blockdev_direct_IO(rw, iocb, inode,
  BTRFS_I(inode)-root-fs_info-fs_devices-latest_bdev,
  iov, offset, nr_segs, btrfs_get_blocks_direct, NULL,
  btrfs_submit_direct, flags);
 +
  if (wakeup)
  inode_dio_done(inode);
 +if (rw == WRITE)
 +mutex_lock(inode-i_mutex);
  return ret;
  }
  
 -- 
 1.7.11.7
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH 2/2] Btrfs: implement unlocked dio write

2013-01-31 Thread Miao Xie
On  fri, 01 Feb 2013 12:08:25 +0800, Miao Xie wrote:
 Onfri, 1 Feb 2013 10:53:30 +0800, Liu Bo wrote:
 On Thu, Jan 31, 2013 at 05:39:03PM +0800, Miao Xie wrote:
 This idea is from ext4. By this patch, we can make the dio write parallel,
 and improve the performance.

 Interesting, AFAIK, ext4 can only do nolock dio write on some
 conditions(should be a overwrite, file size remains unchanged,
 no aligned/buffer io in flight), btrfs is ok without any conditions?
 
 ext4 don't have extent lock, it can not avoid 2 AIO  threads are at work on 
 the same
 unwritten block, so it can not use unlocked dio write for unaligned dio/aio. 
 But btrfs
 has extent lock, it can avoid this problem.

Besides that, btrfs doesn't allow doing a unaligned dio/aio.

I read the code again, found there is a race that several tasks may update 
i_size at
the same time. There are two methods to fix this problem:
1. just like ext4, don't do unlocked write dio if it is beyond the end of the 
file
2. use a spin lock to protect i_size update

I want to choose the 2nd one.

Thanks
Miao

 
 And ext4 need take write lock of -i_data_sem, when it allocate the free 
 space,
 but in order to avoid truncation and hole punch during dio, it need take the 
 read
 lock of -i_data_sem before it release -i_mutex, that is if it isn't a 
 overwrite,
 deadlock will happen, so the unlocked dio of ext4 should be a overwrite. But 
 btrfs
 doesn't have such limitation.
 
 Thanks
 Miao
 

 thanks,
 liubo


 We needn't worry about the race between dio write and truncate, because the
 truncate need wait untill all the dio write end.

 And we also needn't worry about the race between dio write and punch hole,
 because we have extent lock to protect our operation.

 I ran fio to test the performance of this feature.

 == Hardware ==
 CPU: Intel(R) Core(TM)2 Duo CPU E7500  @ 2.93GHz
 Mem: 2GB
 SSD: Intel X25-M 120GB (Test Partition: 60GB)

 == config file ==
 [global]
 ioengine=psync
 direct=1
 bs=4k
 size=32G
 runtime=60
 directory=/mnt/btrfs/
 filename=testfile
 group_reporting
 thread

 [file1]
 numjobs=1 # 2 4
 rw=randwrite

 == result (KBps) ==
 write   1   2   4
 lock24936   24738   24726
 nolock  24962   30866   32101

 == result (iops) ==
 write   1   2   4
 lock623461846181
 nolock  624077168025

 Signed-off-by: Miao Xie mi...@cn.fujitsu.com
 ---
  fs/btrfs/inode.c | 24 +---
  1 file changed, 13 insertions(+), 11 deletions(-)

 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
 index d17a04b..091593a 100644
 --- a/fs/btrfs/inode.c
 +++ b/fs/btrfs/inode.c
 @@ -6589,31 +6589,33 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb 
 *iocb,
 struct file *file = iocb-ki_filp;
 struct inode *inode = file-f_mapping-host;
 int flags = 0;
 -   bool wakeup = false;
 +   bool wakeup = true;
 int ret;
  
 if (check_direct_IO(BTRFS_I(inode)-root, rw, iocb, iov,
 offset, nr_segs))
 return 0;
  
 -   if (rw == READ) {
 -   atomic_inc(inode-i_dio_count);
 -   smp_mb__after_atomic_inc();
 -   if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
 - BTRFS_I(inode)-runtime_flags))) {
 -   inode_dio_done(inode);
 -   flags = DIO_LOCKING | DIO_SKIP_HOLES;
 -   } else {
 -   wakeup = true;
 -   }
 +   atomic_inc(inode-i_dio_count);
 +   smp_mb__after_atomic_inc();
 +   if (rw == WRITE) {
 +   mutex_unlock(inode-i_mutex);
 +   } else if (unlikely(test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
 +BTRFS_I(inode)-runtime_flags))) {
 +   inode_dio_done(inode);
 +   flags = DIO_LOCKING | DIO_SKIP_HOLES;
 +   wakeup = false;
 }
  
 ret = __blockdev_direct_IO(rw, iocb, inode,
 BTRFS_I(inode)-root-fs_info-fs_devices-latest_bdev,
 iov, offset, nr_segs, btrfs_get_blocks_direct, NULL,
 btrfs_submit_direct, flags);
 +
 if (wakeup)
 inode_dio_done(inode);
 +   if (rw == WRITE)
 +   mutex_lock(inode-i_mutex);
 return ret;
  }
  
 -- 
 1.7.11.7
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

 
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at