Re: large files
Hello Chris,

> As a comparison data point, could you please try 2.6.6-mm3? I realize
> you don't want to run this kernel in production, but it would tell us
> if I understand the problems at hand.

The results with 2.6.6-mm3 are below; we are seriously considering running this kernel version. Here are two other interesting observations:

1.) During the file creation on 2.4.26 the load on the system was around 3-4, whereas on 2.6.6-mm3 the load was about 8-9.

2.) When the dd file creation process finished (under 2.4.26), the system became so unresponsive that the drbd connection timed out and a resync automatically started when the system became responsive again. I don't have a comparison for 2.6.6-mm3, since we would need another drbd version. Also, I don't know whether this happened when dd finished or when rm started, since both were run from a script.

Here are the measured times for file creation and deletion:

= 2.4.26:

taylor:~# cat test.out-2.4.26
time dd if=/dev/zero of=/worka/testfile.dd bs=1M count=300000
300000+0 records in
300000+0 records out
314572800000 bytes transferred in 5746.266841 seconds (54743855 bytes/sec)

real    95m46.275s
user    0m0.760s
sys     29m57.800s

time rm -fr /worka/testfile.dd

real    11m20.589s
user    0m0.000s
sys     4m59.850s

= 2.6.6-mm3:

taylor:~# cat test.out-2.6.6-mm3
time dd if=/dev/zero of=/worka/testfile.dd bs=1M count=300000
300000+0 records in
300000+0 records out
314572800000 bytes transferred in 4902.873869 seconds (64160900 bytes/sec)

real    81m46.211s
user    0m1.172s
sys     22m26.010s

time rm -fr /worka/testfile.dd

real    1m38.000s
user    0m0.000s
sys     1m5.872s

Do you have any ideas how we could improve 2.4.x?

Thanks,
Bernd

--
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: [EMAIL PROTECTED]
Re: large files
On Tue, 2004-05-18 at 09:42, Bernd Schubert wrote:
> The results with 2.6.6-mm3 are below; we are seriously considering
> running this kernel version. Here are two other interesting facts:
>
> 1.) During the file creation on 2.4.26 the load on the system was
> around 3-4, whereas on 2.6.6-mm3 the load was about 8-9.

Which procs contributed to this load? The simple dd should have kept the load at one.

> 2.) When the dd file creation process finished (under 2.4.26), the
> system became so unresponsive that the drbd connection timed out and
> a resync automatically started when the system became responsive
> again. I don't have a comparison for 2.6.6-mm3, since we would need
> another drbd version. Also, I don't know whether this happened when
> dd finished or when rm started, since both were run from a script.

Probably the rm.

[ 2.6.6-mm3 is much faster ]

> Do you have any ideas how we could improve 2.4.x?

2.6.6-mm has a few key improvements. There's less metadata fragmentation thanks to some block allocator fixes. More importantly, during the rm, metadata blocks are read 16 at a time instead of one at a time.

I'd be happy to give someone pointers on porting the metadata readahead bits back to 2.4.

-chris
Re: large files
> > 1.) During the file creation on 2.4.26 the load on the system was
> > around 3-4, whereas on 2.6.6-mm3 the load was about 8-9.
>
> Which procs contributed to this load? The simple dd should have kept
> the load at one.

That's all I can see from top (2.4.26):

top - 16:45:14 up 4:47, 1 user, load average: 3.30, 2.80, 2.09
Tasks: 80 total, 1 running, 79 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0% user, 24.0% system, 0.0% nice, 76.0% idle
Mem: 3104428k total, 3018816k used, 85612k free, 228936k buffers
Swap: 1951888k total, 0k used, 1951888k free, 2662272k cached

  PID USER  PR NI VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
 1043 root  19  0 1392  364  320 D 40.7  0.0 3:11.55 dd
    7 root   9  0    0    0    0 D  3.7  0.0 4:02.48 kupdated
    6 root   9  0    0    0    0 D  1.7  0.0 2:15.18 bdflush
    5 root   9  0    0    0    0 S  1.0  0.0 2:47.13 kswapd
   17 root   9  0    0    0    0 D  0.3  0.0 0:46.62 kreiserfsd
 1052 root   9  0 1040 1040  820 R  0.3  0.0 0:00.02 top

taylor:~# cat /proc/stat
cpu 402 0 538233 2938065
cpu0 221 0 267698 1470431
cpu1 181 0 270535 1467634
page 215506778 453499588
swap 1 0
intr 200179775 1738350 2 0 9 4 0 2 0 4 2 0 0 0 0 13 5 0 0 0 0 0 0 0 0 5017728 187006 0 0 0 193236650 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
disk_io: (3,0):(4,4,32,0,0) (8,0):(5058268,3638518,431013524,1419750,906999184)
ctxt 179608469
btime 1084874257
processes 1054

Unfortunately I don't even have an idea how to interpret those numbers.

> > Do you have any ideas how we could improve 2.4.x?
>
> 2.6.6-mm has a few key improvements. There's less metadata
> fragmentation thanks to some block allocator fixes. More importantly,
> during the rm, metadata blocks are read 16 at a time instead of one
> at a time.
>
> I'd be happy to give someone pointers on porting the metadata
> readahead bits back to 2.4.

I certainly have neither the knowledge nor the time to do that.

Cheers,
Bernd
large files
Hello,

I'm currently testing our new server, and though it will primarily not serve really large files (about 40-60 users will have a quota of 25GB each on a 2TB array), I'm still testing the performance for large files. So I created a roughly 300GB file, and the problem is removing it now: removing it took much more than 15 minutes. Here's the relevant top line:

5012 root 18 0 368 368 312 D 21.9 0.0 5:48 rm

Since I didn't expect it to take so much time, I didn't measure the time to delete this file.

System specifications:
- dual Opteron 242 (1600 MHz)
- linux-2.4.26 with all patches from Chris, no further patches
- reiserfs-3.6 format

The partition with the 300GB file has a size of 1.7TB. Any ideas what's going on?

Thanks,
Bernd
Re: large files
On Mon, 2004-05-17 at 15:48, Bernd Schubert wrote:
> So I created a roughly 300GB file, and the problem is removing it
> now: removing it took much more than 15 minutes.
>
> System specifications:
> - dual Opteron 242 (1600 MHz)
> - linux-2.4.26 with all patches from Chris, no further patches
> - reiserfs-3.6 format
>
> The partition with the 300GB file has a size of 1.7TB.

This is most likely a combination of metadata fragmentation and the fact that during deletes, 2.4.x reiserfs ends up reading one block at a time.

As a comparison data point, could you please try 2.6.6-mm3? I realize you don't want to run this kernel in production, but it would tell us if I understand the problems at hand.

-chris
Re: large files
> As a comparison data point, could you please try 2.6.6-mm3? I realize
> you don't want to run this kernel in production, but it would tell us
> if I understand the problems at hand.

I will do this during the next few days. Currently the system is not yet in production, so rebooting into other kernel versions is no problem.

Thanks,
Bernd
Re: problem with overwriting large files
Hello!

On Mon, Jul 21, 2003 at 08:30:19AM -0700, Suman Puthana wrote:
> We do not see any problem when we are writing into empty space (using
> the write call in a C program) as the file is extending (the write
> operation takes less than 3 ms), but for a certain part of the
> application we need to overwrite these files, and we find that the
> write operation is taking about 200-300 ms every few minutes,
> sometimes every few seconds depending on the system load.

The description is very nice, but it would be even nicer if you could provide sample test code that we can run to see the problem for ourselves.

> 3.) Would writing in file system blocks (4096 bytes?) or multiples of
> blocks help this situation? From some basic tests it doesn't seem to
> help by much. From the file system performance point of view, is it
> better to write sixteen 4K chunks or one 64K chunk?

Actually, rewriting should be much faster, just because you are not allocating anything and are only changing mtime. So... I'd really appreciate sample code that demonstrates the problem. Also tell us which kernel you are using that shows the problem, and so on.

Thank you.

Bye,
Oleg
Re: [reiserfs-list] quotacheck on large files?
On Wed, Jul 18, 2001 at 12:03:03PM +0300, Harald Hannelius wrote:
> box[/mnt] # quotacheck -avug
> Scanning /dev/sdc1 [/mnt]
> Hmm, file `/mnt/bigfile' not found
> Guess you'd better run fsck first! exiting...
> lstat: Value too large for defined data type

Doing a recompile with -D_LARGEFILE64_SOURCE worked for me.

S.
Re: [reiserfs-list] quotacheck on large files?
On Wed, 18 Jul 2001, Vladimir V. Saveliev wrote:
> <clip>
> If you had a big file on the ext2 filesystem - would quotacheck be
> able to deal with that file?

Yes.

Harald H Hannelius | [EMAIL PROTECTED] | GSM +358405470870
Re: [reiserfs-list] optimizing reiserfs for large files?
On Mon, 25 Jun 2001, Christian Gottschalch wrote:
> The only problem is: is reiserfs really stable for a production
> system? I need no high performance, only stability and journaling.
> So I think I'll try GFS - it looks more stable. XFS looks nice too,
> but I think it's too new, meaning some little bugs. I don't know -
> let's test it.

You've hit the nail on the head: test them all. Only you will know what is good for your environment.

We use reiserfs in production here and have never had a problem, even when something dumb was done, like mounting the root filesystem with tails enabled. Although it does help a lot performance-wise when you have directories with thousands of files, my main interest in reiserfs is the journaling (I can't count the number of times it's saved a lot of fsck downtime), plus a general interest in what new and crazy things the guys are going to make it do.

The other journaling filesystems that can compare (XFS, JFS) have a long heritage from SGI and IBM respectively, and while they haven't had as much testing and exposure on Linux, they have on Irix and AIX; I suspect most of the problems you'd find are ones in Linux itself, like programs expecting the exact behaviour of ext2fs (I think that was the problem with NFS exports). XFS has been notoriously slow on file deletes in benchmarks, and noticeably faster than ext2fs and reiserfs on the other operations.

Each filesystem has its good and bad points, and there's only one way of working out what's best for you in a particular situation...

Cheers,
--
Matt
Re: [reiserfs-list] optimizing reiserfs for large files?
On Thursday 14 June 2001 12:18, grobe wrote:
> I have a significant loss of performance in bonnie tests. The "writing
> intelligently" test e.g. gives me 20710 kB/s with reiserfs, while I
> get 24753 kB/s with ext2 (1 GB file).

How much RAM do you have? If you have more than 512M of RAM, then the results won't be a good indication of true performance. Also, older versions of bonnie never sync the data, so the performance report depends to a large extent on how much data remains in the write-back cache at the end of the test! Bonnie++ addresses these issues.

Also, neither of those results is what you should expect from modern hardware. Machines that were typically sold in corner stores about a year ago (such as the machine under my desk) return better results than that. I have attached the results of an Athlon-800 with 256M of PC-133 RAM and a single 46G ATA-66 IBM hard drive. The machine was not the most powerful machine on the market when I bought it over a year ago.

What types of hard drives does the machine have?
--
http://www.coker.com.au/bonnie++/      Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/        Postal SMTP/POP benchmark
http://www.coker.com.au/projects.html  Projects I am working on
http://www.coker.com.au/~russell/      My home page

Version 1.92b       --Sequential Output-- --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
temp           496M   447  98 28609  16 10608   7   718  98 34694  15 199.8   1
Latency             22328us    2074ms   56626us   57412us   43123us    2984ms
Version 1.92b       ------Sequential Create------ --------Random Create--------
temp                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   849  98 +++++ +++ 15216  90   863  99 +++++ +++  3423  98
Latency              9168us     113us     249us   12778us      41us    1744us
1.92b,1.92b,temp,1,993204157,496M,,447,98,28609,16,10608,7,718,98,34694,15,199.8,1,16,849,98,+++++,+++,15216,90,863,99,+++++,+++,3423,98,22328us,2074ms,56626us,57412us,43123us,2984ms,9168us,113us,249us,12778us,41us,1744us
Re: [reiserfs-list] optimizing reiserfs for large files?
On Saturday 23 June 2001 01:11, Lars O. Grobe wrote:
> > Also, neither of those results is what you should expect from modern
> > hardware. Machines that were typically sold in corner stores about a
> > year ago (such as the machine under my desk) return better results
> > than that. I have attached the results of an Athlon-800 with 256M of
> > PC-133 RAM and a single 46G ATA-66 IBM hard drive. The machine was
> > not the most powerful machine on the market when I bought it over a
> > year ago. What types of hard drives does the machine have?
>
> Should be quite fast SCA-SCSI IBM drives. As I wrote, it's a 320GB
> array in an EXP15 connected to an IBM ServeRAID 4M. The Netfinity has
> two 833MHz PIIIs.

Hmm. Sounds like the performance you describe is less than expected, and the performance is being over-stated too! When you get some more accurate results it'll look even worse...
Re: [reiserfs-list] ReiserFS and large files ... 4,294,967,295 bytes seems to be the limit.
Hans Reiser wrote:
> Richard Sharpe wrote:
> > Hi,
> >
> > I was doing some testing of a 53GB reiserfs file system on a machine
> > I am building to see what the performance was like, and I stumbled
> > across a problem. If I try to create a file larger than
> > 4,294,967,295 bytes (2^32-1), it simply stops at that point and the
> > copy command (cat, dd, whatever) hangs. The system keeps going, just
> > the command you are using hangs. This does not occur with an EXT2
> > file system, only with ReiserFS on RedHat 7.1 with Linux 2.4.2... I
> > have also tried it with a smaller ReiserFS partition, like 18GB,
> > with the same results. Has anyone else seen this?
> >
> > Regards
> > ---
> > Richard Sharpe, [EMAIL PROTECTED]
> > Samba (Team member, www.samba.org), Ethereal (Team member,
> > www.ethereal.com)
> > Contributing author, SAMS Teach Yourself Samba in 24 Hours
> > Author, Special Edition, Using Samba
>
> This is bizarre; there is nothing in reiserfs that should cause this
> in the 2.4 series. Did you create a new fs from scratch under 2.4? I
> recommend that you use kernel 2.4.4, as 2.4.2 is unstable. Read our
> FAQ for the section about how you should update your libraries and
> file utils to avoid the 2gb limit, but . Yura, please advise him.
>
> Hans

Hello!

Yes, you are right: to work properly with large files (>2GB) you need linux-2.4.x, reiserfs-3.6.x, and the 3.6 reiserfs format. So please check your reiserfs format, and if it is 3.5, use the -o conv mount option. Here is a page describing all the reiserfs mount options as well: http://www.namesys.com/manp.html

Thanks,
Yura.
Re: [reiserfs-list] optimizing reiserfs for large files?
On Thursday, June 14, 2001 12:54:11 PM +0200 Dirk Mueller [EMAIL PROTECTED] wrote:

> On Thu, 14 Jun 2001, grobe wrote:
> > I have a significant loss of performance in bonnie tests. The
> > "writing intelligently" test e.g. gives me 20710 kB/s with reiserfs,
> > while I get 24753 kB/s with ext2 (1 GB file).
>
> Well, when writing files, reiserfs has to do _journalling_, which
> requires some writes as well, so it's only natural that it is a bit
> slower. You can watch the HDD activity LED - if it's constantly on,
> then it's the disc that is saturated and therefore the limiting
> factor, not reiserfs. If you want journalling, i.e. no fsck after
> boot, then you have to accept a _slight_ disadvantage _somewhere_.
> The question is whether it's really common for your setup that the
> disc gets hammered with 100% write requests. Experience shows that
> it's usually 90/10 distributed, that is 90% reads and 10% writes. So
> we're talking about a performance drop of 2 percent for writes -
> something that you won't notice in real life, not to mention that
> reiserfs is several magnitudes faster for reads and for
> creating/deleting files.

The performance depends on the workload, but there is still room for improvement in reiserfs read and write performance. One issue is that the journal code isn't taking advantage of the prepare_write and commit_write address-space operations. We'll start a transaction during prepare_write, close it, then end up starting another one during commit_write to log the atime update. This can be improved by allowing recursive transactions, which we also need for a few other fixes... I hope to finish it today and get final testing done over the weekend. It's kinda cool.

Zam is already working on the block allocator; I'm sure it'll be cleaner and faster when he's done.

> Chris Mason has lately written a patch to improve the performance of
> file writes (especially for concurrent writes, as it removes some
> global kernel locks, if I understand him correctly). It is beta
> quality, as it was never included in any official kernel (nor -ac)
> yet, but I'm using it for a few weeks now without the slightest
> problem. You can find it in the mailing list archive (search for
> "pinned pages") or I can send it to you if you're adventurous enough
> to try it out - YOU'VE BEEN WARNED. ;-)

This should be in the next -ac kernel; a few others have tested it and reported good results. But I don't expect it to have a huge performance impact on bonnie tests (where the inode is logged in commit_write anyway).

-chris