Re: [reiserfs-list] fsync() Performance Issue

2002-05-06 Thread Chris Mason

On Mon, 2002-05-06 at 21:17, Manuel Krause wrote:
> On 05/07/2002 12:57 AM, Chris Mason wrote:
>
> 
> Hi, Chris & Hans!
> 
> I don't think this kind of destructive discussion will lead to
> anything useful for now; can you post a diff for
> 2.4.19-pre7 + latest-related-pending + compound-patch-from-ftp?
> 
> I'll try it and report whether that leads to more safety and/or less
> performance in my everyday use with NS6 and so on, if there is any
> difference.

The current data logging patches are at:

ftp.suse.com/pub/people/mason/patches/data-logging

They are against 2.4.19-pre7, and contain versions of the major (stable)
speedups.  The patch is pretty big, so I'm not likely to merge it with the
namesys pending directories.  The namesys guys add things frequently,
and I think it would get confusing for people trying to figure out which
patches to apply.

The data logging stuff is beta code; if you have a good test bed where
it's OK if things go wrong, I can make you a special patch with the
pending stuff merged.

-chris







Re: [reiserfs-list] fsync() Performance Issue

2002-05-06 Thread Manuel Krause

On 05/07/2002 12:57 AM, Chris Mason wrote:

> On Mon, 2002-05-06 at 17:21, Hans Reiser wrote:
> 
>>>I'd rather not put it back in because it adds yet another corner case to
>>>maintain for all time.  Most of the fsync/O_SYNC bound applications are
>>>just given their own partition anyway, so most users that need data
>>>logging need it for every write.
>>>
>>>
>>Does mozilla's mail user agent use fsync?  Should I give it its own 
>>partition?  I bet it is fsync bound;-)
>>
> 
> [ I took Wayne off the cc list, he's probably not horribly interested ]
> 
> Perhaps, but I'll also bet the fsync performance hit doesn't affect the
> performance of the system as a whole.  Remember that data=journal
> doesn't make the fsyncs fast, it just makes them faster.
> 
> 
>>Most persons using small fsyncs are using it because the person who 
>>wrote their application wrote it wrong.  What's more, many of the 
>>persons who wrote those applications cannot understand that they did it 
>>wrong even if you tell them (e.g. qmail author reportedly cannot 
>>understand, sendmail guys now understand but had Kirk McKusick on their 
>>staff and attending the meeting when I explained it to them so they are 
>>not very typical).  
>>
>>In other words, handling stupidity is an important life skill, and we 
>>all need to excel at it. ;-)
>>
> 
> A real strength to linux is the application designers can talk directly
> to their own personal bottlenecks.  Hopefully we reward those that hunt
> us down and spend the time convincing us their applications are worth
> tuning for.  They then proceed to beat the pants off their competition.
> 
> 
>>Tell me what your thoughts are on the following:
>>
>>If you ask randomly selected ReiserFS users (not the reiserfs-list, but 
>>the ones who would never send you an email)  the following 
>>questions, what percentage will answer which choice?
>>
>>The filesystem you are using is named:
>>
>>a) the Performance Optimized SuSE FS
>>
>>b) NTFS
>>
>>c) FAT
>>
>>d) ext2
>>
>>e) ReiserFS
>>
> 
> I believe the ones that know what a filesystem is will answer ReiserFS.
> You might get a lot of ext2 answers, just because that's what a lot of
> people think the Linux filesystem is.
> 
> 
>>If you want to change reiserfs to use data journaling you must do which:
>>
>>a) reinstall the reiserfs package using rpm
>>
>>b) modify /etc/fs.conf
>>
>>c) reinstall the operating system from scratch, and select different 
>>options during the install this time
>>
>>d) reformat your reiserfs partition using mkreiserfs
>>
>>e) none of the above
>>
>>f) all of the above except e)
>>
> 
> These people won't be admins of systems big enough for the difference to
> matter.  data journaling is targeted at people with so much load they
> would have to buy more hardware to make up for it.  The new option
> lowers the price to performance ratio, which is exactly what we want to
> do for sendmails, egeneras, lycos, etc.  If it takes my laptop 20ms to
> deliver a mail message, cutting the time down to 10ms just won't matter.
> 
> 
>>
>>What do you think the chances are that you can convince Hubert that 
>>every SuSE Enterprise Edition user should be asked at install time if 
>>they are going to use fsync a lot on each partition, and to use a 
>>different fstab setting if yes?
>>
> 
> Very little.  I might tell them to buy the SuSE email server instead,
> since that would have the settings done right.  data=journal is just a
> small part of mail server tuning.
> 
> 
>>I know that you are an experienced sysadmin who was good at it.  Your 
>>intuition tells you that most sysadmins are like the ones you were 
>>willing to hire into your group at the university.  They aren't.
>>
>>Linux needs to be like a telephone.  You plug it in, push buttons, and 
>>talk.  It works well, but most folks don't know why.
>>
>>
> 
> Exactly.  I think there are 3 classes of users at play here.
> 
> 1) Those who don't understand and don't have enough load to notice.
> 2) Those who don't understand and do have enough load to notice.
> 3) Those who do understand and do have enough load to notice.
> 
> #2 will buy support from someone, and they should be able to configure
> the thing right.
> 
> #3 will find the docs and do it right themselves.
> 
> 
>>A moderate number of programs are small fsync bound for the simple 
>>reason that it is simpler to write them that way.  We need to cover 
>>over their simplistic designs.
>>
>>So, you have my sympathies Chris, because I believe you that it makes 
>>the code uglier and it won't be a joy to code and test.  I hope you also 
>>see that it should be done.
>>
> 
> Mostly, I feel this kind of tuning is a mistake right now.  The patch is
> young and there are so many places left to tweak...I'm still at the
> stage where much larger improvements are possible, and a better use of
> coding time.  Plus, it's monday and it's always more fun to debate than
> give in on mondays.
> 
> -chris
> 


Hi, Chris & Hans!

I don't think this kind of destructive discussion will lead to anything
useful for now; can you post a diff for 2.4.19-pre7 + latest-related-pending
+ compound-patch-from-ftp?

I'll try it and report whether that leads to more safety and/or less
performance in my everyday use with NS6 and so on, if there is any
difference.

Re: [reiserfs-list] fsync() Performance Issue

2002-05-06 Thread Hans Reiser

Chris Mason wrote:

>On Mon, 2002-05-06 at 17:21, Hans Reiser wrote:
>  
>
>>>I'd rather not put it back in because it adds yet another corner case to
>>>maintain for all time.  Most of the fsync/O_SYNC bound applications are
>>>just given their own partition anyway, so most users that need data
>>>logging need it for every write.
>>>
>>>  
>>>
>>Does mozilla's mail user agent use fsync?  Should I give it its own 
>>partition?  I bet it is fsync bound;-)
>>
>>
>
>[ I took Wayne off the cc list, he's probably not horribly interested ]
>
>Perhaps, but I'll also bet the fsync performance hit doesn't affect the
>performance of the system as a whole.
>
I suspect that on my laptop, downloading emails is disk bound due to 
fsync().  I haven't measured it, but it "feels" that way.

>
>Mostly, I feel this kind of tuning is a mistake right now.  The patch is
>young and there are so many places left to tweak...I'm still at the
>stage where much larger improvements are possible, and a better use of
>coding time.  Plus, it's monday and it's always more fun to debate than
>give in on mondays.
>
>-chris
>
>
>
>
>  
>

Needing more time to finish analyzing what is going on and what fixes it 
best is always a good reason to defer things.

Hans




Re: [reiserfs-list] fsync() Performance Issue

2002-05-06 Thread Chris Mason

On Mon, 2002-05-06 at 17:21, Hans Reiser wrote:
>
> >I'd rather not put it back in because it adds yet another corner case to
> >maintain for all time.  Most of the fsync/O_SYNC bound applications are
> >just given their own partition anyway, so most users that need data
> >logging need it for every write.
> >
> Does mozilla's mail user agent use fsync?  Should I give it its own 
> partition?  I bet it is fsync bound;-)

[ I took Wayne off the cc list, he's probably not horribly interested ]

Perhaps, but I'll also bet the fsync performance hit doesn't affect the
performance of the system as a whole.  Remember that data=journal
doesn't make the fsyncs fast, it just makes them faster.

> 
> Most persons using small fsyncs are using it because the person who 
> wrote their application wrote it wrong.  What's more, many of the 
> persons who wrote those applications cannot understand that they did it 
> wrong even if you tell them (e.g. qmail author reportedly cannot 
> understand, sendmail guys now understand but had Kirk McKusick on their 
> staff and attending the meeting when I explained it to them so they are 
> not very typical).  
> 
> In other words, handling stupidity is an important life skill, and we 
> all need to excel at it. ;-)

A real strength to linux is the application designers can talk directly
to their own personal bottlenecks.  Hopefully we reward those that hunt
us down and spend the time convincing us their applications are worth
tuning for.  They then proceed to beat the pants off their competition.

> 
> Tell me what your thoughts are on the following:
> 
> If you ask randomly selected ReiserFS users (not the reiserfs-list, but 
> the ones who would never send you an email)  the following 
> questions, what percentage will answer which choice?
> 
> The filesystem you are using is named:
> 
> a) the Performance Optimized SuSE FS
> 
> b) NTFS
> 
> c) FAT
> 
> d) ext2
> 
> e) ReiserFS

I believe the ones that know what a filesystem is will answer ReiserFS.
You might get a lot of ext2 answers, just because that's what a lot of
people think the Linux filesystem is.

> 
> If you want to change reiserfs to use data journaling you must do which:
> 
> a) reinstall the reiserfs package using rpm
> 
> b) modify /etc/fs.conf
> 
> c) reinstall the operating system from scratch, and select different 
> options during the install this time
> 
> d) reformat your reiserfs partition using mkreiserfs
> 
> e) none of the above
> 
> f) all of the above except e)

These people won't be admins of systems big enough for the difference to
matter.  data journaling is targeted at people with so much load they
would have to buy more hardware to make up for it.  The new option
lowers the price to performance ratio, which is exactly what we want to
do for sendmails, egeneras, lycos, etc.  If it takes my laptop 20ms to
deliver a mail message, cutting the time down to 10ms just won't matter.

> 
> 
> What do you think the chances are that you can convince Hubert that 
> every SuSE Enterprise Edition user should be asked at install time if 
> they are going to use fsync a lot on each partition, and to use a 
> different fstab setting if yes?

Very little.  I might tell them to buy the SuSE email server instead,
since that would have the settings done right.  data=journal is just a
small part of mail server tuning.

> 
> I know that you are an experienced sysadmin who was good at it.  Your 
> intuition tells you that most sysadmins are like the ones you were 
> willing to hire into your group at the university.  They aren't.
> 
> Linux needs to be like a telephone.  You plug it in, push buttons, and 
> talk.  It works well, but most folks don't know why.
> 

Exactly.  I think there are 3 classes of users at play here.

1) Those who don't understand and don't have enough load to notice.
2) Those who don't understand and do have enough load to notice.
3) Those who do understand and do have enough load to notice.

#2 will buy support from someone, and they should be able to configure
the thing right.

#3 will find the docs and do it right themselves.

> A moderate number of programs are small fsync bound for the simple 
> reason that it is simpler to write them that way.  We need to cover 
> over their simplistic designs.
> 
> So, you have my sympathies Chris, because I believe you that it makes 
> the code uglier and it won't be a joy to code and test.  I hope you also 
> see that it should be done.

Mostly, I feel this kind of tuning is a mistake right now.  The patch is
young and there are so many places left to tweak...I'm still at the
stage where much larger improvements are possible, and a better use of
coding time.  Plus, it's monday and it's always more fun to debate than
give in on mondays.

-chris





Re: [reiserfs-list] fsync() Performance Issue

2002-05-06 Thread Hans Reiser

Chris Mason wrote:

>On Sat, 2002-05-04 at 10:59, Hans Reiser wrote:
>  
>
>>So how about if you revise fsync so that it always sends data blocks to 
>>the journal not to the main disk?
>>
>>
>
>This gets a little sticky.
>
>Once you log a block, it might be replayed after a crash.  So, you have
>to protect against corner cases like this:
>
>write(file)
>fsync(file) ; /* logs modified data blocks */
>write(file) ; /* write the same blocks without fsync */
>sync ; /* user expects new version of the blocks on disk */
>
>
>During replay, the logged data blocks overwrite the blocks sent to disk
>via sync().
>
>This isn't hard to correct for, every time a buffer is marked dirty, you
>check the journal hash tables to see if it is replayable, and if so you
>log it instead (the 2.2.x code did this due to tails).  This translates
>to increased CPU usage for every write.
>
>I'd rather not put it back in because it adds yet another corner case to
>maintain for all time.  Most of the fsync/O_SYNC bound applications are
>just given their own partition anyway, so most users that need data
>logging need it for every write.
>
Does mozilla's mail user agent use fsync?  Should I give it its own 
partition?  I bet it is fsync bound;-)

Also, I don't think you can reasonably expect most persons to know that 
they should turn data logging on for high fsync performance, even if you 
document it.

Most persons using small fsyncs are using it because the person who 
wrote their application wrote it wrong.  What's more, many of the 
persons who wrote those applications cannot understand that they did it 
wrong even if you tell them (e.g. qmail author reportedly cannot 
understand, sendmail guys now understand but had Kirk McKusick on their 
staff and attending the meeting when I explained it to them so they are 
not very typical).  

In other words, handling stupidity is an important life skill, and we 
all need to excel at it. ;-)

Tell me what your thoughts are on the following:

If you ask randomly selected ReiserFS users (not the reiserfs-list, but 
the ones who would never send you an email)  the following 
questions, what percentage will answer which choice?

The filesystem you are using is named:

a) the Performance Optimized SuSE FS

b) NTFS

c) FAT

d) ext2

e) ReiserFS

If you want to change reiserfs to use data journaling you must do which:

a) reinstall the reiserfs package using rpm

b) modify /etc/fs.conf

c) reinstall the operating system from scratch, and select different 
options during the install this time

d) reformat your reiserfs partition using mkreiserfs

e) none of the above

f) all of the above except e)


What do you think the chances are that you can convince Hubert that 
every SuSE Enterprise Edition user should be asked at install time if 
they are going to use fsync a lot on each partition, and to use a 
different fstab setting if yes?

I know that you are an experienced sysadmin who was good at it.  Your 
intuition tells you that most sysadmins are like the ones you were 
willing to hire into your group at the university.  They aren't.

Linux needs to be like a telephone.  You plug it in, push buttons, and 
talk.  It works well, but most folks don't know why.

A moderate number of programs are small fsync bound for the simple 
reason that it is simpler to write them that way.  We need to cover 
over their simplistic designs.

So, you have my sympathies Chris, because I believe you that it makes 
the code uglier and it won't be a joy to code and test.  I hope you also 
see that it should be done.

Hans




Re: [reiserfs-list] fsync() Performance Issue

2002-05-06 Thread Hans Reiser

Chris Mason wrote:

>On Sat, 2002-05-04 at 10:59, Hans Reiser wrote:
>  
>
>>So how about if you revise fsync so that it always sends data blocks to 
>>the journal not to the main disk?
>>
>>
>
>This gets a little sticky.
>
>Once you log a block, it might be replayed after a crash.  So, you have
>to protect against corner cases like this:
>
>write(file)
>fsync(file) ; /* logs modified data blocks */
>write(file) ; /* write the same blocks without fsync */
>sync ; /* user expects new version of the blocks on disk */
>
>
>During replay, the logged data blocks overwrite the blocks sent to disk
>via sync().
>
>This isn't hard to correct for, every time a buffer is marked dirty, you
>check the journal hash tables to see if it is replayable, and if so you
>log it instead (the 2.2.x code did this due to tails).  This translates
>to increased CPU usage for every write.
>
Significantly increased CPU usage?

>
>I'd rather not put it back in because it adds yet another corner case to
>maintain for all time.  Most of the fsync/O_SYNC bound applications are
>just given their own partition anyway, so most users that need data
>logging need it for every write.
>
most users don't know enough to turn it on;-)

>
>-chris
>
>
>
>
>
>
>  
>






RE: [reiserfs-list] fsync() Performance Issue

2002-05-06 Thread berthiaume_wayne

I'll add the write caching into the test just for info. Until there
is a way to guarantee the data is safe I'll have to go with no write caching
though. I should have all this testing done by the end of the week.

-Original Message-
From: Chris Mason [mailto:[EMAIL PROTECTED]]
Sent: Friday, May 03, 2002 6:00 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: RE: [reiserfs-list] fsync() Performance Issue


On Fri, 2002-05-03 at 16:35, [EMAIL PROTECTED] wrote:
>   Chris, I have some quick preliminary results for you. I have
> additional testing to perform and haven't run debugreiserfs() yet. If you
> have a preference for which tests to run debugreiserfs() let me know.
>   Base testing was done against 2.4.13 built on RH 7.1 using the
> test_writes.c code I forwarded to you. The system is a Tyan with single
> PIII, IDE Promise 20269, Maxtor 160GB drive - write cache disabled. All
> numbers are with fsync() and 1KB files. As I said, more testing, i.e.
> filesizes, need to be performed.

> 2.4.19-pre7 speedup, data logging, write barrier / no options
>   => 47.1ms/file

Hi Wayne, thanks for sending these along.

I expected a slight improvement over the 2.4.13 code even with the data
logging turned off.  I'm curious to see how it does with the IDE cache
turned on.  With scsi, I see 10-15% better without any options than an
unpatched kernel.

> 2.4.19-pre7 speedup, data logging, write barrier / data=journal
>   => 25.2ms/file
> 2.4.19-pre7 speedup, data logging, write barrier / data=journal,barrier=none
>   => 27.8ms/file

The barrier option doesn't make much difference because the write cache
is off.  With write cache on, the barrier code should allow you to be
faster than with the caching off, but without risking the data (Jens and
I are working on final fsync safety issues though).

Hans, data=journal turns on the data journaling.  The data journaling
patches also include optimizations to write metadata back to disk in
bigger chunks for tiny transactions (the current method is to write one
transaction's worth back; when a transaction has 3 blocks, this is
pretty slow).
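
For reference, here is a minimal sketch of selecting the option at mount
time.  The device and mount point below are made-up placeholders, and it
assumes a kernel with the data-logging patches applied, since the
data=journal option comes from those patches:

/* Hypothetical example: mount a reiserfs volume with data journaling.
 * Roughly equivalent to: mount -t reiserfs -o data=journal /dev/hda3 /mnt/mail
 * (/dev/hda3 and /mnt/mail are placeholder names.)
 */
#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
        if (mount("/dev/hda3", "/mnt/mail", "reiserfs", 0, "data=journal") != 0) {
                perror("mount");
                return 1;
        }
        return 0;
}

The same selection could of course be made with an options entry in fstab
instead of a program.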

I've put these patches up on:

ftp.suse.com/pub/people/mason/patches/data-logging

>   One question is will these patches be going into the 2.4 tree and
> when?

The data logging patches are a huge change, but the good news is they
are based on the nesting patches that have been stable for a long time
in the quota code.  I'll probably want a month or more of heavy testing
before I think about submitting them.

-chris




Re: [reiserfs-list] fsync() Performance Issue

2002-05-06 Thread Chris Mason

On Sat, 2002-05-04 at 10:59, Hans Reiser wrote:
>
> So how about if you revise fsync so that it always sends data blocks to 
> the journal not to the main disk?

This gets a little sticky.

Once you log a block, it might be replayed after a crash.  So, you have
to protect against corner cases like this:

write(file)
fsync(file) ; /* logs modified data blocks */
write(file) ; /* write the same blocks without fsync */
sync ; /* user expects new version of the blocks on disk */


During replay, the logged data blocks overwrite the blocks sent to disk
via sync().
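
(For concreteness, the same sequence as a small user-space sketch; the file
name is arbitrary, and this only illustrates the ordering described above,
not the replay behavior of any particular kernel.)

#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
        char buf[4096] = "version 1";
        int fd = open("datafile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
                exit(1);

        write(fd, buf, sizeof(buf));    /* write(file) */
        fsync(fd);                      /* modified data blocks get logged */

        buf[8] = '2';                   /* same blocks, new contents */
        lseek(fd, 0, SEEK_SET);
        write(fd, buf, sizeof(buf));    /* write(file), no fsync this time */
        sync();                         /* user expects "version 2" on disk */

        close(fd);
        return 0;
}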

This isn't hard to correct for, every time a buffer is marked dirty, you
check the journal hash tables to see if it is replayable, and if so you
log it instead (the 2.2.x code did this due to tails).  This translates
to increased CPU usage for every write.

I'd rather not put it back in because it adds yet another corner case to
maintain for all time.  Most of the fsync/O_SYNC bound applications are
just given their own partition anyway, so most users that need data
logging need it for every write.

-chris







Re: [reiserfs-list] fsync() Performance Issue

2002-05-01 Thread Oleg Drokin

Hello!

On Thu, May 02, 2002 at 07:07:18AM +0200, Christian Stuke wrote:
> Could we have this for 2.4.18+ pending also please?

This patch would apply to 2.4.18 + pending patches, I believe.
As for including these patches into the pending queue for 2.4.18, this is impossible
now; it is too big of a change, unfortunately. We hope to get something
like this into 2.4.19-pre1+.

Bye,
Oleg



Re: [reiserfs-list] fsync() Performance Issue

2002-05-01 Thread Christian Stuke

Could we have this for 2.4.18+ pending also please?

Chris
- Original Message -
From: "Oleg Drokin" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Tuesday, April 30, 2002 4:20 PM
Subject: Re: [reiserfs-list] fsync() Performance Issue


> Hello!
>
> On Fri, Apr 26, 2002 at 04:28:26PM -0400, [EMAIL PROTECTED] wrote:
> > I'm wondering if anyone out there may have some suggestions on how
> > to improve the performance of a system employing fsync(). I have to be able
> > to guarantee that every write to my fileserver is on disk when the client has
> > passed it to the server. Therefore, I have disabled write cache on the disk
> > and issue an fsync() per file. I'm running 2.4.19-pre7, reiserfs 3.6.25,
> > without additional patches. I have seen some discussions out here about
> > various other "speed-up" patches and am wondering if I need to add these to
> > 2.4.19-pre7? And what they are and where can I obtain said patches? Also,
> > I'm wondering if there is another solution to syncing the data that is
> > faster than fsync(). Testing, thus far, has shown a large disparity between
> > running with and without sync. Another idea is to explore another filesystem,
> > but I'm not exactly excited by the other journaling filesystems out there at
> > this time. All ideas will be greatly appreciated.
>
> Attached is a speedup patch for 2.4.19-pre7 that should help your fsync
> operations a little. (From Chris Mason).
> Filesystem cannot do very much at this point unfortunately, it is ending up
> waiting for disk to finish write operations.
>
> Also we are working on other speedup patches that would cover different areas
> of write performance itself.
>
> Bye,
> Oleg
>




RE: [reiserfs-list] fsync() Performance Issue

2002-04-30 Thread berthiaume_wayne

Thanks. I'll start putting this one into test.
Wayne.

-Original Message-
From: Chris Mason [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, April 30, 2002 10:28 AM
To: Oleg Drokin
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [reiserfs-list] fsync() Performance Issue


On Tue, 2002-04-30 at 10:20, Oleg Drokin wrote:

> Attached is a speedup patch for 2.4.19-pre7 that should help your fsync
> operations a little. (From Chris Mason).
> Filesystem cannot do very much at this point unfortunately, it is ending up
> waiting for disk to finish write operations.
> 
> Also we are working on other speedup patches that would cover different areas
> of write performance itself.

A newer one (against 2.4.19-pre7) is below.  It has not been through as
much testing on the namesys side, which is why Oleg sent the older one.

Wayne and I have been talking in private mail, he's getting a bunch of
beta patches later today (this speedup, data logging, updated barrier
code).  Along with instructions for testing.

-chris

# Veritas (Hugh Dickins supplied the patch) sent the bits in
# fs/super.c that allow the FS to leave super->s_dirt set after a
# write_super call.
#
diff -urN --exclude *.orig parent/fs/buffer.c comp/fs/buffer.c
--- parent/fs/buffer.c  Mon Apr 29 10:20:24 2002
+++ comp/fs/buffer.cMon Apr 29 10:20:22 2002
@@ -325,6 +325,8 @@
lock_super(sb);
if (sb->s_dirt && sb->s_op && sb->s_op->write_super)
sb->s_op->write_super(sb);
+   if (sb->s_op && sb->s_op->commit_super)
+   sb->s_op->commit_super(sb);
unlock_super(sb);
unlock_kernel();
 
@@ -344,7 +346,7 @@
lock_kernel();
sync_inodes(dev);
DQUOT_SYNC(dev);
-   sync_supers(dev);
+   commit_supers(dev);
unlock_kernel();
 
return sync_buffers(dev, 1);
diff -urN --exclude *.orig parent/fs/reiserfs/bitmap.c comp/fs/reiserfs/bitmap.c
--- parent/fs/reiserfs/bitmap.c Mon Apr 29 10:20:24 2002
+++ comp/fs/reiserfs/bitmap.c   Mon Apr 29 10:20:19 2002
@@ -122,7 +122,6 @@
   set_sb_free_blocks( rs, sb_free_blocks(rs) + 1 );
 
   journal_mark_dirty (th, s, sbh);
-  s->s_dirt = 1;
 }
 
 void reiserfs_free_block (struct reiserfs_transaction_handle *th, 
@@ -433,7 +432,6 @@
   /* update free block count in super block */
   PUT_SB_FREE_BLOCKS( s, SB_FREE_BLOCKS(s) - init_amount_needed );
   journal_mark_dirty (th, s, SB_BUFFER_WITH_SB (s));
-  s->s_dirt = 1;
 
   return CARRY_ON;
 }
diff -urN --exclude *.orig parent/fs/reiserfs/ibalance.c comp/fs/reiserfs/ibalance.c
--- parent/fs/reiserfs/ibalance.c   Mon Apr 29 10:20:24 2002
+++ comp/fs/reiserfs/ibalance.c Mon Apr 29 10:20:19 2002
@@ -632,7 +632,6 @@
/* use check_internal if new root is an internal node */
check_internal (new_root);
/*&&&&&&&&&&&&&&&&&&&&&&*/
-   tb->tb_sb->s_dirt = 1;
 
/* do what is needed for buffer thrown from tree */
reiserfs_invalidate_buffer(tb, tbSh);
@@ -950,7 +949,6 @@
 PUT_SB_ROOT_BLOCK( tb->tb_sb, tbSh->b_blocknr );
 PUT_SB_TREE_HEIGHT( tb->tb_sb, SB_TREE_HEIGHT(tb->tb_sb) + 1 );
do_balance_mark_sb_dirty (tb, tb->tb_sb->u.reiserfs_sb.s_sbh, 1);
-   tb->tb_sb->s_dirt = 1;
 }

 if ( tb->blknum[h] == 2 ) {
diff -urN --exclude *.orig parent/fs/reiserfs/journal.c comp/fs/reiserfs/journal.c
--- parent/fs/reiserfs/journal.cMon Apr 29 10:20:24 2002
+++ comp/fs/reiserfs/journal.c  Mon Apr 29 10:20:21 2002
@@ -64,12 +64,15 @@
 */
 static int reiserfs_mounted_fs_count = 0 ;
 
+static struct list_head kreiserfsd_supers = LIST_HEAD_INIT(kreiserfsd_supers);
+
 /* wake this up when you add something to the commit thread task queue */
 DECLARE_WAIT_QUEUE_HEAD(reiserfs_commit_thread_wait) ;
 
 /* wait on this if you need to be sure you task queue entries have been run */
 static DECLARE_WAIT_QUEUE_HEAD(reiserfs_commit_thread_done) ;
 DECLARE_TASK_QUEUE(reiserfs_commit_thread_tq) ;
+DECLARE_MUTEX(kreiserfsd_sem) ;
 
 #define JOURNAL_TRANS_HALF 1018   /* must be correct to keep the desc and commit
 structs at 4k */
@@ -576,17 +579,12 @@
 /* lock the current transaction */
 inline static void lock_journal(struct super_block *p_s_sb) {
   PROC_INFO_INC( p_s_sb, journal.lock_journal );
-  while(atomic_read(&(SB_JOURNAL(p_s_sb)->j_wlock)) > 0) {
-PROC_INFO_INC( p_s_sb, journal.lock_journal_wait );
-sleep_on(&(SB_JOURNAL(p_s_sb)->j_wait)) ;
-  }
-  atomic_set(&(SB_JOURNAL(p_s_sb)->j_wlock), 1) ;
+  down(&SB_JOURNAL(p_s_sb)->j_lock);
 }
 
 /* unlock the current transaction */
 inline static void unlock_journal(struct super_block *p_s_sb) {
-  atomic_dec(&(SB_JOURNAL(p_s_sb)

Re: [reiserfs-list] fsync() Performance Issue

2002-04-30 Thread Chris Mason

On Tue, 2002-04-30 at 10:20, Oleg Drokin wrote:

> Attached is a speedup patch for 2.4.19-pre7 that should help your fsync
> operations a little. (From Chris Mason).
> Filesystem cannot do very much at this point unfortunately, it is ending up
> waiting for disk to finish write operations.
> 
> Also we are working on other speedup patches that would cover different areas
> of write performance itself.

A newer one (against 2.4.19-pre7) is below.  It has not been through as
much testing on the namesys side, which is why Oleg sent the older one.

Wayne and I have been talking in private mail, he's getting a bunch of
beta patches later today (this speedup, data logging, updated barrier
code).  Along with instructions for testing.

-chris

# Veritas (Hugh Dickins supplied the patch) sent the bits in
# fs/super.c that allow the FS to leave super->s_dirt set after a
# write_super call.
#
diff -urN --exclude *.orig parent/fs/buffer.c comp/fs/buffer.c
--- parent/fs/buffer.c  Mon Apr 29 10:20:24 2002
+++ comp/fs/buffer.cMon Apr 29 10:20:22 2002
@@ -325,6 +325,8 @@
lock_super(sb);
if (sb->s_dirt && sb->s_op && sb->s_op->write_super)
sb->s_op->write_super(sb);
+   if (sb->s_op && sb->s_op->commit_super)
+   sb->s_op->commit_super(sb);
unlock_super(sb);
unlock_kernel();
 
@@ -344,7 +346,7 @@
lock_kernel();
sync_inodes(dev);
DQUOT_SYNC(dev);
-   sync_supers(dev);
+   commit_supers(dev);
unlock_kernel();
 
return sync_buffers(dev, 1);
diff -urN --exclude *.orig parent/fs/reiserfs/bitmap.c comp/fs/reiserfs/bitmap.c
--- parent/fs/reiserfs/bitmap.c Mon Apr 29 10:20:24 2002
+++ comp/fs/reiserfs/bitmap.c   Mon Apr 29 10:20:19 2002
@@ -122,7 +122,6 @@
   set_sb_free_blocks( rs, sb_free_blocks(rs) + 1 );
 
   journal_mark_dirty (th, s, sbh);
-  s->s_dirt = 1;
 }
 
 void reiserfs_free_block (struct reiserfs_transaction_handle *th, 
@@ -433,7 +432,6 @@
   /* update free block count in super block */
   PUT_SB_FREE_BLOCKS( s, SB_FREE_BLOCKS(s) - init_amount_needed );
   journal_mark_dirty (th, s, SB_BUFFER_WITH_SB (s));
-  s->s_dirt = 1;
 
   return CARRY_ON;
 }
diff -urN --exclude *.orig parent/fs/reiserfs/ibalance.c comp/fs/reiserfs/ibalance.c
--- parent/fs/reiserfs/ibalance.c   Mon Apr 29 10:20:24 2002
+++ comp/fs/reiserfs/ibalance.c Mon Apr 29 10:20:19 2002
@@ -632,7 +632,6 @@
/* use check_internal if new root is an internal node */
check_internal (new_root);
/*&&*/
-   tb->tb_sb->s_dirt = 1;
 
/* do what is needed for buffer thrown from tree */
reiserfs_invalidate_buffer(tb, tbSh);
@@ -950,7 +949,6 @@
 PUT_SB_ROOT_BLOCK( tb->tb_sb, tbSh->b_blocknr );
 PUT_SB_TREE_HEIGHT( tb->tb_sb, SB_TREE_HEIGHT(tb->tb_sb) + 1 );
do_balance_mark_sb_dirty (tb, tb->tb_sb->u.reiserfs_sb.s_sbh, 1);
-   tb->tb_sb->s_dirt = 1;
 }

 if ( tb->blknum[h] == 2 ) {
diff -urN --exclude *.orig parent/fs/reiserfs/journal.c comp/fs/reiserfs/journal.c
--- parent/fs/reiserfs/journal.cMon Apr 29 10:20:24 2002
+++ comp/fs/reiserfs/journal.c  Mon Apr 29 10:20:21 2002
@@ -64,12 +64,15 @@
 */
 static int reiserfs_mounted_fs_count = 0 ;
 
+static struct list_head kreiserfsd_supers = LIST_HEAD_INIT(kreiserfsd_supers);
+
 /* wake this up when you add something to the commit thread task queue */
 DECLARE_WAIT_QUEUE_HEAD(reiserfs_commit_thread_wait) ;
 
 /* wait on this if you need to be sure you task queue entries have been run */
 static DECLARE_WAIT_QUEUE_HEAD(reiserfs_commit_thread_done) ;
 DECLARE_TASK_QUEUE(reiserfs_commit_thread_tq) ;
+DECLARE_MUTEX(kreiserfsd_sem) ;
 
 #define JOURNAL_TRANS_HALF 1018   /* must be correct to keep the desc and commit
 structs at 4k */
@@ -576,17 +579,12 @@
 /* lock the current transaction */
 inline static void lock_journal(struct super_block *p_s_sb) {
   PROC_INFO_INC( p_s_sb, journal.lock_journal );
-  while(atomic_read(&(SB_JOURNAL(p_s_sb)->j_wlock)) > 0) {
-PROC_INFO_INC( p_s_sb, journal.lock_journal_wait );
-sleep_on(&(SB_JOURNAL(p_s_sb)->j_wait)) ;
-  }
-  atomic_set(&(SB_JOURNAL(p_s_sb)->j_wlock), 1) ;
+  down(&SB_JOURNAL(p_s_sb)->j_lock);
 }
 
 /* unlock the current transaction */
 inline static void unlock_journal(struct super_block *p_s_sb) {
-  atomic_dec(&(SB_JOURNAL(p_s_sb)->j_wlock)) ;
-  wake_up(&(SB_JOURNAL(p_s_sb)->j_wait)) ;
+  up(&SB_JOURNAL(p_s_sb)->j_lock);
 }
 
 /*
@@ -756,7 +754,6 @@
   atomic_set(&(jl->j_commit_flushing), 0) ;
   wake_up(&(jl->j_commit_wait)) ;
 
-  s->s_dirt = 1 ;
   return 0 ;
 }
 
@@ -1220,7 +1217,6 @@
 if (run++ == 0) {
 goto loop_start ;
 }
-
 atomic_set(&(jl->j_flushing), 0) ;
 wake_up(&(jl->j_flush_wait)) ;
 return ret ;
@@ -1250,7 +1246,7 @@
 while(i != start) {
 jl = SB_JOURNAL_LIST(s) + i  ;
 age = CURRENT_TIME - jl->j_time

Re: [reiserfs-list] fsync() Performance Issue

2002-04-30 Thread Oleg Drokin

Hello!

On Fri, Apr 26, 2002 at 04:28:26PM -0400, [EMAIL PROTECTED] wrote:
>   I'm wondering if anyone out there may have some suggestions on how
> to improve the performance of a system employing fsync(). I have to be able
> to guarantee that every write to my fileserver is on disk when the client has
> passed it to the server. Therefore, I have disabled write cache on the disk
> and issue an fsync() per file. I'm running 2.4.19-pre7, reiserfs 3.6.25,
> without additional patches. I have seen some discussions out here about
> various other "speed-up" patches and am wondering if I need to add these to
> 2.4.19-pre7? And what they are and where can I obtain said patches? Also,
> I'm wondering if there is another solution to syncing the data that is
> faster than fsync(). Testing, thus far, has shown a large disparity between
> running with and without sync. Another idea is to explore another filesystem,
> but I'm not exactly excited by the other journaling filesystems out there at
> this time. All ideas will be greatly appreciated.

Attached is a speedup patch for 2.4.19-pre7 that should help your fsync
operations a little. (From Chris Mason).
Filesystem cannot do very much at this point unfortunately, it is ending up
waiting for disk to finish write operations.

Also we are working on other speedup patches that would cover different areas
of write performance itself.

Bye,
Oleg


diff -uNr linux-2.4.19-pre6.o/fs/buffer.c linux-2.4.19-pre6.speedup/fs/buffer.c
--- linux-2.4.19-pre6.o/fs/buffer.c Mon Apr  8 14:53:24 2002
+++ linux-2.4.19-pre6.speedup/fs/buffer.c   Wed Apr 10 10:43:46 2002
@@ -325,6 +325,8 @@
lock_super(sb);
if (sb->s_dirt && sb->s_op && sb->s_op->write_super)
sb->s_op->write_super(sb);
+   if (sb->s_op && sb->s_op->commit_super)
+   sb->s_op->commit_super(sb);
unlock_super(sb);
unlock_kernel();
 
@@ -344,7 +346,7 @@
lock_kernel();
sync_inodes(dev);
DQUOT_SYNC(dev);
-   sync_supers(dev);
+   commit_supers(dev);
unlock_kernel();
 
return sync_buffers(dev, 1);
Binary files linux-2.4.19-pre6.o/fs/reiserfs/.journal.c.rej.swp and 
linux-2.4.19-pre6.speedup/fs/reiserfs/.journal.c.rej.swp differ
diff -uNr linux-2.4.19-pre6.o/fs/reiserfs/bitmap.c 
linux-2.4.19-pre6.speedup/fs/reiserfs/bitmap.c
--- linux-2.4.19-pre6.o/fs/reiserfs/bitmap.cMon Apr  8 14:53:24 2002
+++ linux-2.4.19-pre6.speedup/fs/reiserfs/bitmap.c  Wed Apr 10 10:43:46 2002
@@ -122,7 +122,6 @@
   set_sb_free_blocks( rs, sb_free_blocks(rs) + 1 );
 
   journal_mark_dirty (th, s, sbh);
-  s->s_dirt = 1;
 }
 
 void reiserfs_free_block (struct reiserfs_transaction_handle *th, 
@@ -433,7 +432,6 @@
   /* update free block count in super block */
   PUT_SB_FREE_BLOCKS( s, SB_FREE_BLOCKS(s) - init_amount_needed );
   journal_mark_dirty (th, s, SB_BUFFER_WITH_SB (s));
-  s->s_dirt = 1;
 
   return CARRY_ON;
 }
diff -uNr linux-2.4.19-pre6.o/fs/reiserfs/ibalance.c 
linux-2.4.19-pre6.speedup/fs/reiserfs/ibalance.c
--- linux-2.4.19-pre6.o/fs/reiserfs/ibalance.c  Sat Nov 10 01:18:25 2001
+++ linux-2.4.19-pre6.speedup/fs/reiserfs/ibalance.cWed Apr 10 10:43:46 2002
@@ -632,7 +632,6 @@
/* use check_internal if new root is an internal node */
check_internal (new_root);
/*&&*/
-   tb->tb_sb->s_dirt = 1;
 
/* do what is needed for buffer thrown from tree */
reiserfs_invalidate_buffer(tb, tbSh);
@@ -950,7 +949,6 @@
 PUT_SB_ROOT_BLOCK( tb->tb_sb, tbSh->b_blocknr );
 PUT_SB_TREE_HEIGHT( tb->tb_sb, SB_TREE_HEIGHT(tb->tb_sb) + 1 );
do_balance_mark_sb_dirty (tb, tb->tb_sb->u.reiserfs_sb.s_sbh, 1);
-   tb->tb_sb->s_dirt = 1;
 }

 if ( tb->blknum[h] == 2 ) {
diff -uNr linux-2.4.19-pre6.o/fs/reiserfs/journal.c 
linux-2.4.19-pre6.speedup/fs/reiserfs/journal.c
--- linux-2.4.19-pre6.o/fs/reiserfs/journal.c   Mon Apr  8 14:53:24 2002
+++ linux-2.4.19-pre6.speedup/fs/reiserfs/journal.c Wed Apr 10 10:44:32 2002
@@ -64,12 +64,15 @@
 */
 static int reiserfs_mounted_fs_count = 0 ;
 
+static struct list_head kreiserfsd_supers = LIST_HEAD_INIT(kreiserfsd_supers);
+
 /* wake this up when you add something to the commit thread task queue */
 DECLARE_WAIT_QUEUE_HEAD(reiserfs_commit_thread_wait) ;
 
 /* wait on this if you need to be sure you task queue entries have been run */
 static DECLARE_WAIT_QUEUE_HEAD(reiserfs_commit_thread_done) ;
 DECLARE_TASK_QUEUE(reiserfs_commit_thread_tq) ;
+DECLARE_MUTEX(kreiserfsd_sem) ;
 
 #define JOURNAL_TRANS_HALF 1018   /* must be correct to keep the desc and commit
 structs at 4k */
@@ -576,17 +579,12 @@
 /* lock the current transaction */
 inline static void lock_journal(struct super_block *p_s_sb) {
   PROC_INFO_INC( p_s_sb, journal.lock_journal );
-  while(atomic_read(&(SB_JOURNAL(p_s_sb)->j_wlock)) > 0) {
-PROC_INFO_INC( p_s_

Re: [reiserfs-list] fsync() Performance Issue

2002-04-29 Thread Hans Reiser

[EMAIL PROTECTED] wrote:

>On Mon, 29 Apr 2002 19:56:59 +0200, Matthias Andree <[EMAIL PROTECTED]> 
> said:
>
>  
>
>>Barring write cache effects, fsync() only returns after all blocks are
>>on disk. While I'm not sure if and if yes, which, Linux file systems are
>>affected, but for portable applications, be aware that sync() may return
>>prematurely (and is allowed to!).
>>
>>
>
>And in fact is the reason for the old "recipe":
>  # sync
>  # sync
>  # sync
>  # reboot
>
>On the older Vax 750-class machines, sync could return LONG before the blocks
>were all flushed - the second 2 sync's were so you were busy typing for
>several seconds while the disks whirred.  Failure to understand the typing
>speed issue has led at least one otherwise-clued author to recommend:
>  # sync;sync;sync
>  # reboot
>
>(the distinction being obvious if you think about when the shell reads the
>commands, and when it does the fork/exec for each case)
>
>  
>
Finally I understand this.  Doing more than one sync always seemed 
mysterious to me.;-)

Thanks Matthias.

Hans




Re: [reiserfs-list] fsync() Performance Issue

2002-04-29 Thread Valdis . Kletnieks

On Mon, 29 Apr 2002 19:56:59 +0200, Matthias Andree <[EMAIL PROTECTED]>  
said:

> Barring write cache effects, fsync() only returns after all blocks are
> on disk. While I'm not sure if and if yes, which, Linux file systems are
> affected, but for portable applications, be aware that sync() may return
> prematurely (and is allowed to!).

And in fact is the reason for the old "recipe":
  # sync
  # sync
  # sync
  # reboot

On the older Vax 750-class machines, sync could return LONG before the blocks
were all flushed - the second 2 sync's were so you were busy typing for
several seconds while the disks whirred.  Failure to understand the typing
speed issue has led at least one otherwise-clued author to recommend:
  # sync;sync;sync
  # reboot

(the distinction being obvious if you think about when the shell reads the
commands, and when it does the fork/exec for each case)

-- 
Valdis Kletnieks
Computer Systems Senior Engineer
Virginia Tech







RE: [reiserfs-list] fsync() Performance Issue

2002-04-29 Thread berthiaume_wayne

Agreed, it would be better to sync to disk after multiple files
rather than serially; however, in the interest of not being concerned about a
power outage during the process (one of the reasons the disk cache is
disabled), the choice was to fsync() each write.

-Original Message-
From: Chris Mason [mailto:[EMAIL PROTECTED]]
Sent: Monday, April 29, 2002 12:46 PM
To: [EMAIL PROTECTED]
Cc: Russell Coker; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [reiserfs-list] fsync() Performance Issue


On Mon, 2002-04-29 at 12:32, Toby Dickenson wrote:

> >One thing that has occurred to me (which has not been previously discussed as
> >far as I recall) is the possibility for using sync() instead of fsync() if
> >you can accumulate a number of files (and therefore replace many fsync()'s
> >with one sync() ).
> 
> I can see
> 
> write to file A
> write to file B
> write to file C
> sync
> 
> might be faster than
> 
> write to file A
> fsync A
> write to file B
> fsync B
> write to file C
> fsync C

Correct.

> 
> but is it possible for it to be faster than
> 
> write to file A
> write to file B
> write to file C
> fsync A
> fsync B
> fsync C

It depends on the rest of the system.  sync() goes through the big lru
list for the whole box, and fsync() goes through the private list for
just that inode.  If you've got other devices or files with dirty data,
case C that you presented will always be the fastest.  For general use,
I like this one the best, it is what the journal code is optimized for.

If files A, B, and C are the only dirty things on the whole box, a
single sync() will be slightly better, mostly due to reduced cpu time.

-chris




Re: [reiserfs-list] fsync() Performance Issue

2002-04-29 Thread Matthias Andree

Toby Dickenson <[EMAIL PROTECTED]> writes:

> write to file A
> write to file B
> write to file C
> sync

Be careful with this approach. Apart from syncing other processes' dirty
data, sync() does not make the same guarantees as fsync() does.

Barring write cache effects, fsync() only returns after all blocks are
on disk. While I'm not sure if, and if so which, Linux file systems are
affected, for portable applications be aware that sync() may return
prematurely (and is allowed to!).

-- 
Matthias Andree



Re: [reiserfs-list] fsync() Performance Issue

2002-04-29 Thread Chris Mason

On Mon, 2002-04-29 at 12:32, Toby Dickenson wrote:

> >One thing that has occurred to me (which has not been previously discussed as 
> >far as I recall) is the possibility for using sync() instead of fsync() if 
> >you can accumulate a number of files (and therefore replace many fsync()'s 
> >with one sync() ).
> 
> I can see
> 
> write to file A
> write to file B
> write to file C
> sync
> 
> might be faster than
> 
> write to file A
> fsync A
> write to file B
> fsync B
> write to file C
> fsync C

Correct.

> 
> but is it possible for it to be faster than
> 
> write to file A
> write to file B
> write to file C
> fsync A
> fsync B
> fsync C

It depends on the rest of the system.  sync() goes through the big lru
list for the whole box, and fsync() goes through the private list for
just that inode.  If you've got other devices or files with dirty data,
case C that you presented will always be the fastest.  For general use,
I like this one the best, it is what the journal code is optimized for.

If files A, B, and C are the only dirty things on the whole box, a
single sync() will be slightly better, mostly due to reduced cpu time.

-chris
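
To make the recommended pattern concrete, here is a minimal sketch of the
last case above: write all the files first, then fsync each one.  The file
names are placeholders and error handling is trimmed:

#include <fcntl.h>
#include <unistd.h>

static const char *names[] = { "fileA", "fileB", "fileC" };

int main(void)
{
        int fd[3];
        char buf[1024] = { 0 };
        int i;

        for (i = 0; i < 3; i++) {
                fd[i] = open(names[i], O_WRONLY | O_CREAT | O_TRUNC, 0644);
                if (fd[i] < 0)
                        return 1;
                write(fd[i], buf, sizeof(buf));  /* queue up dirty data first */
        }
        for (i = 0; i < 3; i++) {
                fsync(fd[i]);   /* then flush; commits for A, B, C can batch */
                close(fd[i]);
        }
        return 0;
}

The idea, as described above, is that all three files are already dirty
before the first fsync(), which is the case the journal code is said to be
optimized for.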





Re: [reiserfs-list] fsync() Performance Issue

2002-04-29 Thread Chris Mason

On Mon, 2002-04-29 at 12:20, Russell Coker wrote:
> On Fri, 26 Apr 2002 22:28, [EMAIL PROTECTED] wrote:
> 
> It's interesting to note your email address and what it implies...
> 
> > I'm wondering if anyone out there may have some suggestions on how
> > to improve the performance of a system employing fsync(). I have to be able
> > to guaranty that every write to my fileserver is on disk when the client
> > has passed it to the server. Therefore, I have disabled write cache on the
> > disk and issue an fsync() per file. I'm running 2.4.19-pre7, reiserfs
> > 3.6.25, without additional patches. I have seen some discussions out here
> > about various other "speed-up" patches and am wondering if I need to add
> > these to 2.4.19-pre7? And what they are and where can I obtain said
> > patches? Also, I'm wondering if there is another solution to syncing the
> > data that is faster than fsync(). Testing, thus far, has shown a large
> > disparity between running with and without sync. Another idea is to explore
> > another filesystem, but I'm not exactly excited by the other journaling
> > filesystems out there at this time. All ideas will be greatly appreciated.
> 
> These issues have been discussed a few times, but not with any results as 
> exciting as you might hope for.  One which was mentioned was using 
> fdatasync() instead of fsync().

The speedup patches should help fsync some, since they make it much more
likely a commit will be done without the journal lock held.

If all the writes on the FS end up being done through fsync, the data
logging patches might help a lot.  These should be ready for broader
testing this week.

If you are using IDE drives, the write barrier patches are almost enough
to allow you to turn on write caching safely.  They make sure metadata
triggers proper drive cache flushes; I can try to rig up something that
will also trigger a cache flush on data syncs.

-chris





Re: [reiserfs-list] fsync() Performance Issue

2002-04-29 Thread Toby Dickenson

On Mon, 29 Apr 2002 18:20:18 +0200, Russell Coker
<[EMAIL PROTECTED]> wrote:

>On Fri, 26 Apr 2002 22:28, [EMAIL PROTECTED] wrote:
>
>It's interesting to note your email address and what it implies...
>
>>  I'm wondering if anyone out there may have some suggestions on how
>> to improve the performance of a system employing fsync(). I have to be able
>> to guarantee that every write to my fileserver is on disk when the client
>> has passed it to the server. Therefore, I have disabled write cache on the
>> disk and issue an fsync() per file. I'm running 2.4.19-pre7, reiserfs
>> 3.6.25, without additional patches. I have seen some discussions out here
>> about various other "speed-up" patches and am wondering if I need to add
>> these to 2.4.19-pre7? And what they are and where can I obtain said
>> patches? Also, I'm wondering if there is another solution to syncing the
>> data that is faster than fsync(). Testing, thus far, has shown a large
>> disparity between running with and without sync. Another idea is to explore
>> another filesystem, but I'm not exactly excited by the other journaling
>> filesystems out there at this time. All ideas will be greatly appreciated.
>
>These issues have been discussed a few times, but not with any results as 
>exciting as you might hope for.  One which was mentioned was using 
>fdatasync() instead of fsync().
>
>One thing that has occurred to me (which has not been previously discussed as 
>far as I recall) is the possibility for using sync() instead of fsync() if 
>you can accumulate a number of files (and therefore replace many fsync()'s 
>with one sync() ).

I can see

write to file A
write to file B
write to file C
sync

might be faster than

write to file A
fsync A
write to file B
fsync B
write to file C
fsync C

but is it possible for it to be faster than

write to file A
write to file B
write to file C
fsync A
fsync B
fsync C

?



Toby Dickenson
[EMAIL PROTECTED]



Re: [reiserfs-list] fsync() Performance Issue

2002-04-29 Thread Russell Coker

On Fri, 26 Apr 2002 22:28, [EMAIL PROTECTED] wrote:

It's interesting to note your email address and what it implies...

>   I'm wondering if anyone out there may have some suggestions on how
> to improve the performance of a system employing fsync(). I have to be able
> to guarantee that every write to my fileserver is on disk when the client
> has passed it to the server. Therefore, I have disabled write cache on the
> disk and issue an fsync() per file. I'm running 2.4.19-pre7, reiserfs
> 3.6.25, without additional patches. I have seen some discussions out here
> about various other "speed-up" patches and am wondering if I need to add
> these to 2.4.19-pre7? And what they are and where can I obtain said
> patches? Also, I'm wondering if there is another solution to syncing the
> data that is faster than fsync(). Testing, thus far, has shown a large
> disparity between running with and without sync. Another idea is to explore
> another filesystem, but I'm not exactly excited by the other journaling
> filesystems out there at this time. All ideas will be greatly appreciated.

These issues have been discussed a few times, but not with any results as 
exciting as you might hope for.  One which was mentioned was using 
fdatasync() instead of fsync().
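
As a sketch of that substitution (the helper name is made up for
illustration; whether fdatasync() actually helps depends on the workload,
since it only skips flushing metadata such as timestamps that is not needed
to retrieve the data):

#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write one file and flush only its data. */
int write_and_flush(const char *path, const void *buf, size_t len)
{
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
                return -1;
        if (write(fd, buf, len) != (ssize_t)len || fdatasync(fd) != 0) {
                close(fd);
                return -1;
        }
        return close(fd);
}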

One thing that has occurred to me (which has not been previously discussed as 
far as I recall) is the possibility for using sync() instead of fsync() if 
you can accumulate a number of files (and therefore replace many fsync()'s 
with one sync() ).

-- 
If you send email to me or to a mailing list that I use which has >4 lines
of legalistic junk at the end then you are specifically authorizing me to do
whatever I wish with the message and all other messages from your domain, by
posting the message you agree that your long legalistic sig is void.