Re: [RFC] Add a new file op for fsync to give fs's more control

2011-04-18 Thread liubo
On 04/16/2011 03:32 AM, Josef Bacik wrote:
 On 04/15/2011 03:24 PM, Christoph Hellwig wrote:
  Sorry, but this is too ugly to live.  If the reason for this really is
  good enough we'll just need to push the filemap_write_and_wait_range
  and i_mutex locking into every ->fsync instance.

 
 So part of what makes small fsyncs slow in btrfs is all of our random
 threads to make checksumming not suck.  So we submit IO which spreads it
 out to helper threads to do the checksumming, and then when it returns
 it gets handed off to endio threads that run the endio stuff.  This
 works awesome with doing big writes and such, but if, say, we're an RPM
 database and write a couple of kilobytes, this tends to suck because we
 keep handing work off to other threads and waiting, so the scheduling
 latencies really hurt.
 
 So we'd like to be able to say hey, this is a small amount of io, let's
 just do the checksumming in the current thread, and the same with
 handling the endio stuff.  We can't do that currently because
 filemap_write_and_wait_range is called before we get to fsync.  We'd
 like to be able to control this so we can do the appropriate magic to do
 the submission within the fsyncing thread's context in order to speed
 things up a bit.
 
 That plus the stuff I said about i_mutex.  Is that a good enough reason
 to just push this down into all the filesystems?  Thanks,
 

Fine with the i_mutex.

I'm wondering whether it is worth doing so.

I've tested your patch with sysbench, and there is little improvement. :(

Sysbench args:
sysbench --test=fileio --num-threads=1 --file-num=10240 --file-block-size=1K \
    --file-total-size=20M --file-test-mode=rndwr --file-io-mode=sync \
    --file-extra-flags= run


10240 files, 2 KB each
===
fsync_nolock (patch):
Operations performed:  0 Read, 10000 Write, 1024000 Other = 1034000 Total
Read 0b  Written 9.7656Mb  Total transferred 9.7656Mb  (35.152Kb/sec)
   35.15 Requests/sec executed

fsync (orig):
Operations performed:  0 Read, 10000 Write, 1024000 Other = 1034000 Total
Read 0b  Written 9.7656Mb  Total transferred 9.7656Mb  (35.287Kb/sec)
   35.29 Requests/sec executed
===

It seems the gain from avoiding the handoff between threads is not enough.
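The counters above can be cross-checked with a few lines of C. This is a sketch that assumes two things: sysbench uses its default --file-fsync-freq of 100 requests, and each fsync round touches every one of the 10240 files. Neither setting appears in the arguments, but the reported totals are consistent with both. The function names are illustrative only.

```c
#include <assert.h>

/* Values from the sysbench command line above. */
#define FILE_NUM    10240L
#define TOTAL_SIZE  (20L * 1024 * 1024)   /* --file-total-size=20M */
#define BLOCK_SIZE  1024L                 /* --file-block-size=1K */
/* Assumption: sysbench's default --file-fsync-freq (not in the args). */
#define FSYNC_FREQ  100L

/* 20 MiB spread evenly over 10240 files. */
static long bytes_per_file(void)
{
    return TOTAL_SIZE / FILE_NUM;
}

/* Write requests implied by a "Written N Mb" figure (MiB) and 1 KiB blocks. */
static long writes_from_mib(double written_mib)
{
    return (long)(written_mib * 1024.0 * 1024.0 / BLOCK_SIZE + 0.5);
}

/* fsync() calls ("Other") if every FSYNC_FREQ writes fsync all files. */
static long fsyncs_for(long writes)
{
    assert(writes % FSYNC_FREQ == 0);
    return writes / FSYNC_FREQ * FILE_NUM;
}
```

Under those assumptions, about 10000 1 KiB writes trigger about a million fsync() calls. Each round has at most 100 dirty files, so most of those calls presumably hit files that received no write since the previous round.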

BTW, I'm also trying to improve fsync performance, but mainly for large
files (4G).  I found that a large file has a tremendous number of csum
items that need to be flushed into the tree log during fsync().  Btrfs
currently uses a brute-force approach to make sure it gets the most
up-to-date copies of everything, and this results in bad performance.
Changing the brute-force approach is bugging me a lot...
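Here is a rough C calculation that puts "a tremendous number of csum items" into numbers. It assumes the btrfs defaults of the time: a 4 KiB sectorsize and one 4-byte crc32c per sector. It also assumes a file in which every sector is dirty. How the checksums are packed into csum items depends on the leaf size, and the calculation leaves that out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed btrfs defaults: one 4-byte crc32c per 4 KiB data sector. */
#define SECTORSIZE  4096ULL
#define CSUM_SIZE   4ULL

/* Number of data checksums covering a file of this size (rounded up). */
static uint64_t csums_for(uint64_t file_bytes)
{
    return (file_bytes + SECTORSIZE - 1) / SECTORSIZE;
}

/* Raw checksum bytes to log if the whole file is dirty at fsync time. */
static uint64_t csum_bytes_for(uint64_t file_bytes)
{
    return csums_for(file_bytes) * CSUM_SIZE;
}
```

A fully dirty 4G file comes to about a million checksums, or 4 MiB of raw checksum data. One of the 2 KB sysbench files needs a single checksum.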

thanks,
liubo

 Josef
 -- 
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 



Re: [RFC] Add a new file op for fsync to give fs's more control

2011-04-18 Thread Josef Bacik

On 04/18/2011 02:49 AM, liubo wrote:

 On 04/16/2011 03:32 AM, Josef Bacik wrote:

  On 04/15/2011 03:24 PM, Christoph Hellwig wrote:

   Sorry, but this is too ugly to live.  If the reason for this really is
   good enough we'll just need to push the filemap_write_and_wait_range
   and i_mutex locking into every ->fsync instance.

  So part of what makes small fsyncs slow in btrfs is all of our random
  threads to make checksumming not suck.  So we submit IO which spreads it
  out to helper threads to do the checksumming, and then when it returns
  it gets handed off to endio threads that run the endio stuff.  This
  works awesome with doing big writes and such, but if, say, we're an RPM
  database and write a couple of kilobytes, this tends to suck because we
  keep handing work off to other threads and waiting, so the scheduling
  latencies really hurt.

  So we'd like to be able to say hey, this is a small amount of io, let's
  just do the checksumming in the current thread, and the same with
  handling the endio stuff.  We can't do that currently because
  filemap_write_and_wait_range is called before we get to fsync.  We'd
  like to be able to control this so we can do the appropriate magic to do
  the submission within the fsyncing thread's context in order to speed
  things up a bit.

  That plus the stuff I said about i_mutex.  Is that a good enough reason
  to just push this down into all the filesystems?  Thanks,

 Fine with the i_mutex.

 I'm wondering whether it is worth doing so.

 I've tested your patch with sysbench, and there is little improvement. :(

Yeah, it's not a huge change for us; there are other places we need to
work on.  However, things like ext4 would do well not to hold the i_mutex
over a transaction commit.  That's just one example of how this could help
all of us in general, not just btrfs.

 Sysbench args:
 sysbench --test=fileio --num-threads=1 --file-num=10240 --file-block-size=1K \
     --file-total-size=20M --file-test-mode=rndwr --file-io-mode=sync \
     --file-extra-flags= run

 10240 files, 2 KB each
 ===
 fsync_nolock (patch):
 Operations performed:  0 Read, 10000 Write, 1024000 Other = 1034000 Total
 Read 0b  Written 9.7656Mb  Total transferred 9.7656Mb  (35.152Kb/sec)
 35.15 Requests/sec executed

 fsync (orig):
 Operations performed:  0 Read, 10000 Write, 1024000 Other = 1034000 Total
 Read 0b  Written 9.7656Mb  Total transferred 9.7656Mb  (35.287Kb/sec)
 35.29 Requests/sec executed
 ===

 It seems the gain from avoiding the handoff between threads is not enough.

 BTW, I'm also trying to improve fsync performance, but mainly for large
 files (4G).  I found that a large file has a tremendous number of csum
 items that need to be flushed into the tree log during fsync().  Btrfs
 currently uses a brute-force approach to make sure it gets the most
 up-to-date copies of everything, and this results in bad performance.
 Changing the brute-force approach is bugging me a lot...

Yeah, there are some things that could be done for this.  I'm going to
spend a while trying to squeeze as much performance out of fsync as we
can get, though I'm going to start with small fsyncs, since that will be
the most practical gain at the moment (think RPM databases).  Thanks,


Josef


[RFC] Add a new file op for fsync to give fs's more control

2011-04-15 Thread Josef Bacik
Btrfs needs to be able to control how data is submitted in the case of fsync to
make it a little faster, and really we could get rid of holding the i_mutex
altogether as well.  So introduce a ->fsync_nolock helper that pushes the
responsibility of locking the inode and doing the filemap_write_and_wait_range
down into the fs so we can have better control of how we submit the io and do
our locking.  It looks like ext4 and probably xfs could get away with not taking
the i_mutex either, so they may benefit from this as well.  Really I could just
change ->fsync() to do this and push everything down into all the filesystems,
but I wasn't sure how well that would be received, so I'm taking this approach.
Thanks,

Josef
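The difference between the existing path and the proposed ->fsync_nolock can be modelled in userspace C. Everything here is a stand-in. model_inode, model_file_ops and the helpers are hypothetical. The argument list of fsync_nolock is a guess, because the patch itself is not quoted in the thread. The generic path follows the order implied above: flush the range, take i_mutex, then call ->fsync.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins; these are not the kernel's types or signatures. */
struct model_inode {
    bool i_mutex_held;      /* models inode->i_mutex */
    int  writebacks;        /* models filemap_write_and_wait_range() calls */
};

struct model_file_ops {
    /* Existing op: called with the range flushed and i_mutex held. */
    int (*fsync)(struct model_inode *inode, int datasync);
    /* RFC op (argument list guessed): the fs flushes and locks itself. */
    int (*fsync_nolock)(struct model_inode *inode, long long start,
                        long long end, int datasync);
};

static void model_write_and_wait(struct model_inode *inode,
                                 long long start, long long end)
{
    assert(start <= end);
    inode->writebacks++;
}

/* Rough shape of the VFS fsync path with the RFC applied. */
static int model_vfs_fsync(const struct model_file_ops *ops,
                           struct model_inode *inode,
                           long long start, long long end, int datasync)
{
    int ret;

    if (ops->fsync_nolock)                     /* fs takes full control */
        return ops->fsync_nolock(inode, start, end, datasync);

    model_write_and_wait(inode, start, end);   /* generic writeback first */
    inode->i_mutex_held = true;                /* then i_mutex around ->fsync */
    ret = ops->fsync(inode, datasync);
    inode->i_mutex_held = false;
    return ret;
}

/* An existing filesystem: relies on the VFS holding i_mutex. */
static int classic_fsync(struct model_inode *inode, int datasync)
{
    (void)datasync;
    return inode->i_mutex_held ? 0 : -1;
}

/* A filesystem using the new op: does its own writeback, no i_mutex. */
static int btrfs_like_fsync_nolock(struct model_inode *inode, long long start,
                                   long long end, int datasync)
{
    (void)datasync;
    model_write_and_wait(inode, start, end);   /* could submit inline here */
    return inode->i_mutex_held ? -1 : 0;
}

static const struct model_file_ops classic_ops = { .fsync = classic_fsync };
static const struct model_file_ops nolock_ops = {
    .fsync_nolock = btrfs_like_fsync_nolock,
};
```

Both ops end up flushing the range once. The difference is who decides how the flush is submitted and whether i_mutex is held around it.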


Re: [RFC] Add a new file op for fsync to give fs's more control

2011-04-15 Thread Christoph Hellwig
Sorry, but this is too ugly to live.  If the reason for this really is
good enough we'll just need to push the filemap_write_and_wait_range
and i_mutex locking into every ->fsync instance.



Re: [RFC] Add a new file op for fsync to give fs's more control

2011-04-15 Thread Josef Bacik

On 04/15/2011 03:24 PM, Christoph Hellwig wrote:

 Sorry, but this is too ugly to live.  If the reason for this really is
 good enough we'll just need to push the filemap_write_and_wait_range
 and i_mutex locking into every ->fsync instance.



So part of what makes small fsyncs slow in btrfs is all of our random 
threads to make checksumming not suck.  So we submit IO which spreads it 
out to helper threads to do the checksumming, and then when it returns 
it gets handed off to endio threads that run the endio stuff.  This 
works awesome with doing big writes and such, but if, say, we're an RPM 
database and write a couple of kilobytes, this tends to suck because we 
keep handing work off to other threads and waiting, so the scheduling 
latencies really hurt.


So we'd like to be able to say hey, this is a small amount of io, let's 
just do the checksumming in the current thread, and the same with 
handling the endio stuff.  We can't do that currently because 
filemap_write_and_wait_range is called before we get to fsync.  We'd 
like to be able to control this so we can do the appropriate magic to do 
the submission within the fsyncing thread's context in order to speed 
things up a bit.


That plus the stuff I said about i_mutex.  Is that a good enough reason 
to just push this down into all the filesystems?  Thanks,


Josef
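Josef's idea is to checksum small writes in the fsyncing thread instead of handing them to helper threads. That boils down to a size check before submission. The sketch below is hypothetical: the 64 KiB cutoff, choose_csum_path() and csum_sectors_inline() are made up for illustration, since the thread gives no threshold. The checksum is a plain bitwise CRC-32C, the checksum btrfs uses for data, written for clarity rather than speed. The end-io side is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cutoff; the thread only says "a small amount of io". */
#define INLINE_CSUM_MAX  (64 * 1024)

enum csum_path { CSUM_INLINE, CSUM_OFFLOAD };

/* Decide whether checksumming runs in the caller or in helper threads. */
static enum csum_path choose_csum_path(size_t dirty_bytes)
{
    return dirty_bytes <= INLINE_CSUM_MAX ? CSUM_INLINE : CSUM_OFFLOAD;
}

/* Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t crc32c(uint32_t crc, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Inline path: checksum each sector in the calling (fsyncing) thread. */
static void csum_sectors_inline(const unsigned char *data, size_t len,
                                size_t sectorsize, uint32_t *out)
{
    assert(sectorsize > 0);
    for (size_t off = 0, i = 0; off < len; off += sectorsize, i++) {
        size_t n = len - off < sectorsize ? len - off : sectorsize;
        out[i] = crc32c(0, data + off, n);
    }
}
```

For a write of a couple of kilobytes, the inline path does one short checksum loop and no thread handoff. That is the scheduling latency Josef wants to avoid.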


Re: [RFC] Add a new file op for fsync to give fs's more control

2011-04-15 Thread Chris Mason
Excerpts from Christoph Hellwig's message of 2011-04-15 15:24:12 -0400:
 Sorry, but this is too ugly to live.  If the reason for this really is
 good enough we'll just need to push the filemap_write_and_wait_range
 and i_mutex locking into every ->fsync instance.
 

Which part is too ugly to live?  The special op? New parameters?

The unconditional taking of i_mutex hurts a lot, especially on directory
fsyncs, so I'd love to get rid of it.

-chris


Re: [RFC] Add a new file op for fsync to give fs's more control

2011-04-15 Thread Christoph Hellwig
On Fri, Apr 15, 2011 at 03:34:57PM -0400, Chris Mason wrote:
 Excerpts from Christoph Hellwig's message of 2011-04-15 15:24:12 -0400:
  Sorry, but this is too ugly to live.  If the reason for this really is
  good enough we'll just need to push the filemap_write_and_wait_range
  and i_mutex locking into every ->fsync instance.
  
 
 Which part is too ugly to live?  The special op? New parameters?

Two different fsync ops, when we could trivially do with one by pushing
things down.
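Christoph's alternative keeps a single op. ->fsync receives the byte range, and each filesystem does its own flushing and locking. Filesystems that are happy with the old behaviour call a shared helper. A userspace sketch of that shape follows; fs_inode, fsync_op and the *_model functions are hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

struct fs_inode {
    bool locked;            /* models i_mutex */
    int  flushes;           /* models filemap_write_and_wait_range() calls */
    int  commits;           /* models the fs-specific commit/log step */
};

/* The one op: the fs gets the range and decides what to flush and lock. */
typedef int (*fsync_op)(struct fs_inode *inode, long long start,
                        long long end, int datasync);

/* Shared helper reproducing the old VFS behaviour for simple filesystems. */
static int generic_fsync_model(struct fs_inode *inode, long long start,
                               long long end, int datasync)
{
    (void)datasync;
    assert(start <= end);
    inode->flushes++;           /* flush and wait on [start, end] */
    inode->locked = true;       /* take the inode lock ... */
    inode->commits++;           /* ... and commit metadata under it */
    inode->locked = false;
    return 0;
}

/* A filesystem that flushes on its own terms and never takes the lock. */
static int lockless_fsync_model(struct fs_inode *inode, long long start,
                                long long end, int datasync)
{
    (void)start;
    (void)end;
    (void)datasync;
    inode->flushes++;           /* e.g. submit small io inline */
    inode->commits++;           /* e.g. log the inode without i_mutex */
    return inode->locked ? -1 : 0;
}

/* The VFS side shrinks to a plain call through the single op. */
static int vfs_fsync_range_model(fsync_op op, struct fs_inode *inode,
                                 long long start, long long end, int datasync)
{
    return op(inode, start, end, datasync);
}
```

This design touches every filesystem once. In return, the VFS no longer imposes i_mutex on anyone, which is the part Chris says hurts directory fsyncs.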
