Re: large files

2004-05-18 Thread Bernd Schubert
Hello Chris,


 As a comparison data point, could you please try 2.6.6-mm3?  I realize
 you don't want to run this kernel in production, but it would tell us if
 I understand the problems at hand.

the results for 2.6.6-mm3 are below; we are seriously considering running this kernel
version.

Here are two other interesting facts:

1.) During file creation on 2.4.26 the system load was around 3-4, whereas on
2.6.6-mm3 the load was about 8-9.

2.) When the dd file creation process finished (with 2.4.26 running), the system
became so unresponsive that the drbd connection timed out, and a resync process
started automatically once the system became responsive again. I don't have a
comparison for 2.6.6-mm3, since we would need another drbd version. Also, I don't
know whether this happened when dd finished or when rm started, since both were
run from a script.
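
In case it helps, the test in that script amounts to the equivalent of this small C
program (only a sketch; we really ran dd and rm, with the exact parameters shown
below):

/*
 * create-remove test, roughly equivalent to:
 *   time dd if=/dev/zero of=/worka/testfile.dd bs=1M count=300000
 *   time rm -f /worka/testfile.dd
 * On 32-bit you would also need -D_FILE_OFFSET_BITS=64 to get past
 * 2GB; the opterons have a 64-bit off_t anyway.
 */
#include <stdio.h>
#include <time.h>
#include <fcntl.h>
#include <unistd.h>

#define BLOCK (1024 * 1024)     /* bs=1M */
#define COUNT 300000L           /* count=300000, i.e. ~300GB */

int main(void)
{
    static char buf[BLOCK];     /* zero-filled, like /dev/zero */
    long i;
    time_t t = time(NULL);
    int fd = open("/worka/testfile.dd", O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0) { perror("open"); return 1; }
    for (i = 0; i < COUNT; i++)
        if (write(fd, buf, BLOCK) != BLOCK) { perror("write"); return 1; }
    close(fd);
    printf("create: %lds\n", (long)(time(NULL) - t));

    t = time(NULL);
    if (unlink("/worka/testfile.dd") < 0) { perror("unlink"); return 1; }
    printf("remove: %lds\n", (long)(time(NULL) - t));
    return 0;
}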

Here are the measured times for file creation and file deletion:

= 2.4.26:

taylor:~# cat test.out-2.4.26
time dd if=/dev/zero of=/worka/testfile.dd bs=1M count=300000
300000+0 records in
300000+0 records out
314572800000 bytes transferred in 5746.266841 seconds (54743855 bytes/sec)

real    95m46.275s
user    0m0.760s
sys     29m57.800s


time rm -fr /worka/testfile.dd

real    11m20.589s
user    0m0.000s
sys     4m59.850s


= 2.6.6-mm3:


taylor:~# cat test.out-2.6.6-mm3
time dd if=/dev/zero of=/worka/testfile.dd bs=1M count=300000
300000+0 records in
300000+0 records out
314572800000 bytes transferred in 4902.873869 seconds (64160900 bytes/sec)

real    81m46.211s
user    0m1.172s
sys     22m26.010s


time rm -fr /worka/testfile.dd

real    1m38.000s
user    0m0.000s
sys     1m5.872s



Do you have any ideas how we could improve 2.4.x? 


Thanks,
Bernd


-- 
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: [EMAIL PROTECTED]




Re: large files

2004-05-18 Thread Chris Mason
On Tue, 2004-05-18 at 09:42, Bernd Schubert wrote:
 Hello Chris,
 
 
  As a comparison data point, could you please try 2.6.6-mm3?  I realize
  you don't want to run this kernel in production, but it would tell us if
  I understand the problems at hand.
 
 the results for 2.6.6-mm3 are below; we are seriously considering running this kernel
 version.
 
 Here are two other interesting facts:
 
 1.) During file creation on 2.4.26 the system load was around 3-4, whereas on
 2.6.6-mm3 the load was about 8-9.
 
Which procs contributed to this load?  The simple dd should have kept
the load at one.

 2.) When the dd file creation process finished (with 2.4.26 running), the system
 became so unresponsive that the drbd connection timed out, and a resync process
 started automatically once the system became responsive again. I don't have a
 comparison for 2.6.6-mm3, since we would need another drbd version. Also, I don't
 know whether this happened when dd finished or when rm started, since both were
 run from a script.
 
Probably the rm.

[ 2.6.6-mm3 is much faster ]

 Do you have any ideas how we could improve 2.4.x? 
 

2.6.6-mm has a few key improvements.  There's less metadata
fragmentation thanks to some block allocator fixes.  More importantly,
during the rm, metadata blocks are read in 16 at a time instead of 1 at
a time.  I'd be happy to give someone pointers on porting the metadata
readahead bits back to 2.4.
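
Roughly, the idea maps onto the 2.4 buffer cache like this (a sketch only, not the
actual 2.6 patch; getblk, ll_rw_block, and brelse are the real 2.4 interfaces, while
the function itself and the window handling are just illustration):

#include <linux/fs.h>

#define RA_WINDOW 16

/* Instead of bread()ing one metadata block at a time, queue reads for a
 * whole window of known block numbers with a single ll_rw_block() call,
 * the same way breada() handles file data readahead. */
static void meta_readahead(kdev_t dev, int blocksize, int *blocknr, int count)
{
    struct buffer_head *bhs[RA_WINDOW];
    int i, nr = 0;

    for (i = 0; i < count && nr < RA_WINDOW; i++) {
        struct buffer_head *bh = getblk(dev, blocknr[i], blocksize);
        if (!buffer_uptodate(bh))
            bhs[nr++] = bh;     /* needs reading, add to the batch */
        else
            brelse(bh);         /* already cached */
    }
    if (nr)
        ll_rw_block(READA, nr, bhs);    /* submit the whole batch at once */
    for (i = 0; i < nr; i++)
        brelse(bhs[i]);         /* the data stays in the buffer cache */
}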

-chris




Re: large files

2004-05-18 Thread Bernd Schubert
  1.) During file creation on 2.4.26 the system load was around 3-4, whereas on
  2.6.6-mm3 the load was about 8-9.

 Which procs contributed to this load?  The simple dd should have kept
 the load at one.

That's all I can see from top (2.4.26):

top - 16:45:14 up  4:47,  1 user,  load average: 3.30, 2.80, 2.09
Tasks:  80 total,   1 running,  79 sleeping,   0 stopped,   0 zombie
Cpu(s):   0.0% user,  24.0% system,   0.0% nice,  76.0% idle
Mem:   3104428k total,  3018816k used,    85612k free,   228936k buffers
Swap:  1951888k total,        0k used,  1951888k free,  2662272k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1043 root      19   0  1392  364  320 D 40.7  0.0   3:11.55 dd
    7 root       9   0     0    0    0 D  3.7  0.0   4:02.48 kupdated
    6 root       9   0     0    0    0 D  1.7  0.0   2:15.18 bdflush
    5 root       9   0     0    0    0 S  1.0  0.0   2:47.13 kswapd
   17 root       9   0     0    0    0 D  0.3  0.0   0:46.62 kreiserfsd
 1052 root       9   0  1040 1040  820 R  0.3  0.0   0:00.02 top


taylor:~# cat /proc/stat 
cpu  402 0 538233 2938065
cpu0 221 0 267698 1470431
cpu1 181 0 270535 1467634
page 215506778 453499588
swap 1 0
intr 200179775 1738350 2 0 9 4 0 2 0 4 2 0 0 0 0 13 5 0 0 0 0 0 0 0 0 5017728 
187006 0 0 0 193236650 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0
disk_io: (3,0):(4,4,32,0,0) (8,0):(5058268,3638518,431013524,1419750,906999184)
ctxt 179608469
btime 1084874257
processes 1054


Unfortunately, I don't even have an idea of how to interpret those numbers.


  Do you have any ideas how we could improve 2.4.x?

 2.6.6-mm has a few key improvements.  There's less metadata
 fragmentation thanks to some block allocator fixes.  More importantly,
 during the rm, metadata blocks are read in 16 at a time instead of 1 at
 a time.  I'd be happy to give someone pointers on porting the metadata
 readahead bits back to 2.4.

I certainly have neither the knowledge nor the time to do that.


Cheers,
Bernd




large files

2004-05-17 Thread Bernd Schubert
Hello,

I'm currently testing our new server, and though it will not primarily serve
really large files (about 40-60 users will have a quota of 25GB each on a 2TB
array), I'm still testing its performance with large files.

So I created a file of about 300GB, and the problem now is removing it.
Removing it took much more than 15 minutes. Here's the relevant top line:

 5012 root  18   0   368  368   312 D 21.9  0.0   5:48 rm

Since I didn't expect it to take this long, I didn't measure the exact time
needed to delete the file.

system specifications:
- dual opteron 242 (1600 MHz)
- linux-2.4.26 with all patches from Chris, no further patches
- reiserfs-3.6 format

The partition with the 300GB file has a size of 1.7TB.


Any ideas what's going on? 


Thanks,
Bernd


-- 
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: [EMAIL PROTECTED]




Re: large files

2004-05-17 Thread Chris Mason
On Mon, 2004-05-17 at 15:48, Bernd Schubert wrote:
 Hello,
 
 I'm currently testing our new server, and though it will not primarily serve
 really large files (about 40-60 users will have a quota of 25GB each on a 2TB
 array), I'm still testing its performance with large files.
 
 So I created a file of about 300GB, and the problem now is removing it.
 Removing it took much more than 15 minutes. Here's the relevant top line:
 
  5012 root  18   0   368  368   312 D 21.9  0.0   5:48 rm
 
 Since I didn't expect it to take this long, I didn't measure the exact time
 needed to delete the file.
 
 system specifications:
   - dual opteron 242 (1600 MHz)
   - linux-2.4.26 with all patches from Chris, no further patches
   - reiserfs-3.6 format
 
 The partition with the 300GB file has a size of 1.7TB.

This is most likely a combination of metadata fragmentation and the fact
that during deletes, 2.4.x reiserfs ends up reading one block at a time.

As a comparison data point, could you please try 2.6.6-mm3?  I realize
you don't want to run this kernel in production, but it would tell us if
I understand the problems at hand.

-chris




Re: large files

2004-05-17 Thread Bernd Schubert
 As a comparison data point, could you please try 2.6.6-mm3?  I realize
 you don't want to run this kernel in production, but it would tell us if
 I understand the problems at hand.

I will do this within the next few days. The system is not yet running in
production, so rebooting into other kernel versions is no problem.

Thanks,
Bernd


-- 
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: [EMAIL PROTECTED]






Re: problem with overwriting large files

2003-07-24 Thread Oleg Drokin
Hello!

On Mon, Jul 21, 2003 at 08:30:19AM -0700, Suman Puthana wrote:
 We do not see any problem when we are writing into empty space (using the
 write call in a C program) as the file is extending (the write operation
 takes less than 3 ms), but for a certain part of the application we need to
 overwrite these files, and we find that the write operation takes about
 200-300 ms every few minutes, sometimes every few seconds, depending on the
 system load.

The description is very nice, but it would be even nicer if you could provide
sample test code that we can run to see the problem for ourselves.
 
 3.) Would writing in filesystem blocks (4096 bytes?) or multiples of blocks
 help this situation? From some basic tests it doesn't seem to help much.
 From the filesystem performance point of view, is it better to write
 sixteen 4K chunks or one 64K chunk?

Actually, rewriting should be much faster, simply because you are not allocating
anything, only changing mtime. So... I'd really appreciate sample code
that demonstrates the problem.

Also, please include information about which kernel you are running when you see
the problem, and similar details.
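
To be concrete, something as small as this sketch is what I have in mind (the
file name, chunk size, and 100 ms threshold are placeholders; the file must
already exist and be at least NWRITES*CHUNK bytes, so every write is an
overwrite):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

#define CHUNK 65536             /* one 64K write, as in your question */
#define NWRITES 1000

int main(void)
{
    static char buf[CHUNK];
    struct timeval t0, t1;
    long us;
    int i;
    int fd = open("testfile", O_WRONLY);    /* existing file: overwrite it */

    if (fd < 0) { perror("open"); return 1; }
    for (i = 0; i < NWRITES; i++) {
        gettimeofday(&t0, NULL);
        if (write(fd, buf, CHUNK) != CHUNK) { perror("write"); return 1; }
        gettimeofday(&t1, NULL);
        us = (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec);
        if (us > 100000)        /* report anything slower than 100 ms */
            printf("write %d: %ld us\n", i, us);
    }
    close(fd);
    return 0;
}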

Thank you.

Bye,
Oleg


Re: [reiserfs-list] quotacheck on large files?

2001-07-18 Thread Soeren Sonnenburg

On Wed, Jul 18, 2001 at 12:03:03PM +0300, Harald Hannelius wrote:

 box[/mnt] # quotacheck -avug
 Scanning /dev/sdc1 [/mnt] Hmm, file `/mnt/bigfile' not found
 Guess you'd better run fsck first !
 exiting...
 lstat: Value too large for defined data type

Recompiling with -D_LARGEFILE64_SOURCE worked for me.
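
For the record, the failing call is a plain lstat(), which returns EOVERFLOW
("Value too large for defined data type") as soon as st_size doesn't fit into a
32-bit off_t. A toy example of my own (not quotacheck code):

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Built without large-file support, this fails on a >2GB file.  Build with
 *   gcc -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE big.c
 * (or switch to the explicit lstat64()/struct stat64 interface that
 * -D_LARGEFILE64_SOURCE enables) and it succeeds. */
int main(int argc, char **argv)
{
    struct stat st;

    if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
    if (lstat(argv[1], &st) < 0) {
        fprintf(stderr, "lstat: %s\n", strerror(errno));
        return 1;
    }
    printf("%s: %lld bytes\n", argv[1], (long long)st.st_size);
    return 0;
}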

S.



Re: [reiserfs-list] quotacheck on large files?

2001-07-18 Thread Harald Hannelius


On Wed, 18 Jul 2001, Vladimir V. Saveliev wrote:

clip

 If you had a big file on an ext2 filesystem, would quotacheck be able
 to deal with that file?

Yes.



Harald H Hannelius | [EMAIL PROTECTED]  | GSM +358405470870





Re: [reiserfs-list] optimizing reiserfs for large files?

2001-06-25 Thread Matthew Hawkins

On Mon, 25 Jun 2001, Christian Gottschalch wrote:
 The only problem is: is reiserfs really stable for a production
 system? I don't need high performance, only stability and
 journaling. I think I'll try GFS, which looks more stable;
 XFS looks nice too, but I think it's too new as well, which means some
 little bugs. I don't know; let's test it, test it. 

You've hit the nail on the head: test them all.  Only you will know
what is good for your environment.  We use reiserfs in production here
and have never had a problem, even when something dumb was done, like
mounting the root filesystem with tails enabled.  Although it does help
a lot performance-wise when you have directories with thousands of
files, my main interest in reiserfs is the journaling (I can't count the
number of times it's saved a lot of fsck downtime), and a general
interest in what new and crazy things the guys are going to make it do.

The other journaling filesystems that can compare (XFS, JFS) have a long
heritage from SGI and IBM respectively, and while they haven't had as
much testing and exposure on Linux, they have on Irix and AIX, and I
suspect most of the problems you'd find are ones in Linux itself, like
systems expecting the exact behaviour of ext2fs (I think that was the
problem with NFS exports).  XFS in benchmarks has been notoriously slow on file
deletes, and noticeably faster than ext2fs and reiserfs on the other
operations.  Each filesystem has its good and bad points, and there's
only one way of working out what's best for you in a particular
situation...

Cheers,

-- 
Matt



Re: [reiserfs-list] optimizing reiserfs for large files?

2001-06-22 Thread Russell Coker

On Thursday 14 June 2001 12:18, grobe wrote:
 I have a significant loss of performance in bonnie tests. The
 'writing intelligently' test, for example, gives me 20710 kB/s with reiserfs,
 while I get 24753 kB/s with ext2 (1GB file).

How much RAM do you have?  If you have more than 512M of RAM then the 
results won't be a good indication of true performance.

Also, older versions of bonnie never sync the data, so the reported 
performance depends to a large extent on how much data remains in the 
write-back cache at the end of the test!

Bonnie++ addresses these issues.

Also neither of those results is what you should expect from modern 
hardware.  Machines that were typically sold in corner stores about a 
year ago (such as the machine under my desk) return results better than 
that.  I have attached the results of an Athlon-800 with 256M of PC-133 
RAM and a single 46G ATA-66 IBM hard drive.  The machine was not the most 
powerful machine on the market when I bought it over a year ago.

What types of hard drives does the machine have?

-- 
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/   Postal SMTP/POP benchmark
http://www.coker.com.au/projects.html Projects I am working on
http://www.coker.com.au/~russell/ My home page


Version 1.92b   --Sequential Output-- --Sequential Input- --Random-
Concurrency   1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine   Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
temp      496M   447  98 28609  16 10608   7   718  98 34694  15 199.8   1
Latency        22328us    2074ms   56626us   57412us   43123us    2984ms
Version 1.92b   --Sequential Create-- Random Create
temp      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
          files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
             16   849  98 +++++ +++ 15216  90   863  99 +++++ +++  3423  98
Latency         9168us     113us     249us   12778us      41us    1744us
1.92b,1.92b,temp,1,993204157,496M,,447,98,28609,16,10608,7,718,98,34694,15,199.8,1,16,849,98,+++++,+++,15216,90,863,99,+++++,+++,3423,98,22328us,2074ms,56626us,57412us,43123us,2984ms,9168us,113us,249us,12778us,41us,1744us



Re: [reiserfs-list] optimizing reiserfs for large files?

2001-06-22 Thread Russell Coker

On Saturday 23 June 2001 01:11, Lars O. Grobe wrote:
  Also neither of those results is what you should expect from modern
  hardware.  Machines that were typically sold in corner stores about a
  year ago (such as the machine under my desk) return results better
  than that.  I have attached the results of an Athlon-800 with 256M of
  PC-133 RAM and a single 46G ATA-66 IBM hard drive.  The machine was
  not the most powerful machine on the market when I bought it over a
  year ago.
 
  What types of hard drives does the machine have?

 They should be quite fast SCA SCSI IBM drives. As I wrote, it's a 320GB
 array in an EXP15 connected to an IBM ServeRAID 4M. The Netfinity has two
 833MHz PIIIs.

Hmm.  Sounds like the performance you describe is less than expected, and 
the performance is being overstated too!  When you get some more 
accurate results it'll look even worse...

-- 
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/   Postal SMTP/POP benchmark
http://www.coker.com.au/projects.html Projects I am working on
http://www.coker.com.au/~russell/ My home page



Re: [reiserfs-list] ReiserFS and large files ... 4,294,967,295 bytes seems to be the limit.

2001-06-18 Thread Yury Yu. Rupasov

Hans Reiser wrote:
 
 Richard Sharpe wrote:
 
  Hi,
 
  I was doing some testing of a 53GB reiserfs file system on a machine I am
  building to see what the performance was like, and I stumbled across a
  problem.
 
  If I try to create a file larger than 4,294,967,295 bytes (2^32-1), it
  simply stops at that point and the copy command (cat, dd, whatever) hangs.
  The system keeps going, just the command you are using hangs.
 
  This does not occur with an EXT2 file system, only a ReiserFS on RedHat 7.1
  with Linux 2.4.2 ...
 
  I have also tried it with a smaller ReiserFS partition, like 18GB, but the
  same results.
 
  Has anyone else seen this?
 
  Regards
  ---
  Richard Sharpe, [EMAIL PROTECTED]
  Samba (Team member, www.samba.org), Ethereal (Team member, www.ethereal.com)
  Contributing author, SAMS Teach Yourself Samba in 24 Hours
  Author, Special Edition, Using Samba
 
 This is bizarre; there is nothing in reiserfs that should cause this in the 2.4
 series. Did you create a new fs from scratch under 2.4?  I recommend that you
 use kernel 2.4.4, as 2.4.2 is unstable.  Read our FAQ for the section about how
 you should update your libraries and file utils to avoid the 2GB limit, but ...
 
 Yura, please advise him.
 
 Hans

Hello !

Yes, you are right: to work properly with large files (larger than 2GB)
you need linux-2.4.x, reiserfs-3.6.x, and the 3.6 reiserfs format.

So, please check your reiserfs format, and if it is 3.5, please
use the -o conv mount option.
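For example (device and mount point are placeholders): mount -t reiserfs -o conv /dev/sda1 /mnt
Note that objects created after that use the 3.6 format, so the filesystem will
no longer be compatible with 3.5 tools.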

Here is a page describing all reiserfs mount options as well:
http://www.namesys.com/manp.html

Thanks,
Yura.



Re: [reiserfs-list] optimizing reiserfs for large files?

2001-06-14 Thread Chris Mason



On Thursday, June 14, 2001 12:54:11 PM +0200 Dirk Mueller [EMAIL PROTECTED]
wrote:

 On Thu, 14 Jun 2001, grobe wrote:
 
  I have a significant loss of performance in bonnie tests. The
  'writing intelligently' test, for example, gives me 20710 kB/s with reiserfs,
  while I get 24753 kB/s with ext2 (1GB file).
 
 well, when writing files, reiserfs has to do _journalling_, which
 requires some writes as well, so it's only natural that it is a bit
 slower. You can watch the HDD activity LED: if it's constantly on, then
 the disc is saturated and is therefore the limiting factor, not
 reiserfs. If you want journalling, i.e. no fsck after boot, then you
 have to accept a _slight_ disadvantage _somewhere_. The question is
 whether it's really common for your setup that the disc gets hammered with
 100% write requests. Experience shows that it's usually 90/10 distributed,
 that means 90% reads and 10% writes. So we're talking about a performance
 drop of 2 percent for writes - something that you won't notice in
 real life, not to mention that reiserfs is several magnitudes faster for
 reads and for creating/deleting files. 

The performance depends on workload, but there is still room for
improvement in reiserfs read and write performance.

One issue is that the journal code isn't taking advantage of the prepare_write
and commit_write address space operations.  We'll start a transaction
during prepare_write, close it, then end up starting another one during
commit_write to log the atime update.

This can be improved by allowing recursive transactions, which we also need
for a few other fixes... I hope to finish it today and get final testing
done over the weekend.  It's kinda cool.
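
In other words, something like this toy model (not reiserfs code, just the
shape of the idea): the second journal_begin joins the transaction that is
already open instead of starting and committing a new one.

#include <stdio.h>

/* Toy model only.  "Recursive transactions" just means journal_begin()
 * joins a transaction that is already open instead of starting (and
 * later committing) a second one. */
struct txn { int open; int refcount; };
static struct txn cur;

static void journal_begin(void)
{
    if (cur.open) {
        cur.refcount++;         /* join the outer transaction */
        return;
    }
    cur.open = 1;
    cur.refcount = 1;
    printf("start transaction\n");
}

static void journal_end(void)
{
    if (--cur.refcount == 0) {
        cur.open = 0;
        printf("commit\n");     /* only the outermost end commits */
    }
}

int main(void)
{
    journal_begin();            /* prepare_write opens the transaction */
    journal_begin();            /* commit_write joins it instead of... */
    journal_end();              /* ...opening and committing a second one */
    journal_end();
    return 0;
}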

Zam is already working on the block allocator; I'm sure it'll be cleaner
and faster when he's done.

 
 Chris Mason has recently written a patch to improve the performance of file
 writes (especially concurrent writes, as it removes some global kernel
 locks, if I understand him correctly). It is beta quality, as it was never
 included in any official kernel (nor -ac) yet, but I've been using it for a
 few weeks now without the slightest problem. You can find it in the
 mailing list archive (search for 'pinned pages'), or I can send it to you
 if you're adventurous enough to try it out - YOU'VE BEEN WARNED. 

;-)  This should be in the next -ac kernel; a few others have tested it and
reported good results.  But I don't expect it to have a huge performance
impact on bonnie tests (where the inode is logged in commit_write anyway).

-chris