Re: read two files simultaneously

2009-02-22 Thread Junsuk Shin
That's true. Using a bigger buffer will help, but it doesn't explain why reading
large files is slower than reading small files.


On Sat, Feb 21, 2009 at 5:56 PM, Wojciech Puchar 
woj...@wojtek.tensor.gdynia.pl wrote:

 My guess is that the inode structure or the physical file location on the HDD
 might be related to this. But if I read only one file, the size doesn't
 matter: reading a single file (10M, 100M, 700M) consistently gives about
 70MB/s. The weird thing happens only when I read two large files.


 If you use O_DIRECT, data is read from disk exactly as you request it,
 without readahead, so you do a lot of seeks.

 Simply use a bigger buffer, like 1MB.
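Wojciech's suggestion can be sketched in C as a sequential read through an O_DIRECT descriptor with one large buffer. This is a minimal sketch, not code from the thread: open_direct() and read_file_bigbuf() are hypothetical names, the fallback to a plain open() when a filesystem rejects O_DIRECT with EINVAL is an assumption of this sketch, and so is the idea that 4096-byte buffer alignment is sufficient for direct I/O on the target disk.

```c
#define _GNU_SOURCE /* exposes O_DIRECT on Linux; harmless on FreeBSD */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Open read-only with O_DIRECT; fall back to a plain open() if the
 * filesystem rejects O_DIRECT with EINVAL (assumption of this sketch). */
static int open_direct(const char *path)
{
#ifdef O_DIRECT
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd >= 0 || errno != EINVAL)
        return fd;
#endif
    return open(path, O_RDONLY);
}

/* Read the whole file sequentially using one large buffer (e.g. 1 MiB),
 * discarding the data. Returns the number of bytes read, or -1 on error. */
static long long read_file_bigbuf(const char *path, size_t bufsize)
{
    void *buf;
    long long total = 0;
    ssize_t n;
    int fd;

    assert(bufsize > 0 && bufsize % 4096 == 0);
    /* Direct I/O generally wants a block-aligned buffer. */
    if (posix_memalign(&buf, 4096, bufsize) != 0)
        return -1;
    fd = open_direct(path);
    if (fd < 0) {
        free(buf);
        return -1;
    }
    /* Each read(2) asks for up to bufsize bytes, so far fewer requests
     * (and seeks) are issued than with a 16K buffer. */
    while ((n = read(fd, buf, bufsize)) > 0)
        total += n;
    close(fd);
    free(buf);
    return n < 0 ? -1 : total;
}
```

For example, read_file_bigbuf(path, 1024 * 1024) reads the whole file with requests of up to 1 MiB each.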




-- 
Junsuk
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: read two files simultaneously

2009-02-22 Thread Junsuk Shin
Both of them.

Reading two 100M files interleaved with a 16K buffer: 62MB/s

Reading two 700M files interleaved with a 16K buffer: 9MB/s

Reading two 100M files interleaved with a 1M buffer: 55MB/s
  (somehow worse with the larger buffer)

Reading two 700M files interleaved with a 1M buffer: 34MB/s
  (better with the larger buffer, but still a gap: 55 vs 34)

I cannot find the reason for this. gstat(8) also shows low rates when
reading large files in an interleaved way, but not when reading small files.

On Sun, Feb 22, 2009 at 5:20 PM, Wojciech Puchar 
woj...@wojtek.tensor.gdynia.pl wrote:

 That's true. Using a bigger buffer will help, but it doesn't explain why reading
 large files is slower than reading small files.

  Really slower? Or just a bigger difference with large files?








read two files simultaneously

2009-02-21 Thread Junsuk Shin
Hello,

I need to read two files simultaneously, so I simply interleave read(2)
calls on them. The problem is that the performance varies dramatically
depending on the file size, and I'm wondering what causes this.

The test application does the following:

open 2 files
  - both files are the same size
  - since I read each file only once, bypass the cache with O_DIRECT
read 16K of file1, then 16K of file2, and so on

The simplified code looks like this:

fd1 = open(file1, O_RDONLY | O_DIRECT);
fd2 = open(file2, O_RDONLY | O_DIRECT);

for (...) {
    /* read 16K of file1 */
    while (...) {
        count = read(fd1, ...);

    }
    /* read 16K of file2 */
    while (...) {
        count = read(fd2, ...);

    }
}
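A complete, runnable version of the interleaved test could look like the following. It is a minimal sketch, not the poster's actual program: it alternates one chunk-sized read(2) between the two files until both reach EOF, so the same function covers the 16K and the 1M runs. interleaved_read() and open_direct() are hypothetical names, and the plain open() fallback and 4096-byte buffer alignment are assumptions of this sketch.

```c
#define _GNU_SOURCE /* exposes O_DIRECT on Linux; harmless on FreeBSD */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Open read-only with O_DIRECT; fall back to a plain open() if the
 * filesystem rejects O_DIRECT with EINVAL (assumption of this sketch). */
static int open_direct(const char *path)
{
#ifdef O_DIRECT
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd >= 0 || errno != EINVAL)
        return fd;
#endif
    return open(path, O_RDONLY);
}

/* Alternate one chunk-sized read(2) between two files until both hit EOF,
 * discarding the data. Returns total bytes read from both, or -1 on error. */
static long long interleaved_read(const char *path1, const char *path2,
                                  size_t chunk)
{
    int fd[2] = {-1, -1};
    int done[2] = {0, 0};
    long long total = 0;
    void *buf;
    int i;

    assert(chunk > 0 && chunk % 4096 == 0);
    /* Direct I/O generally wants a block-aligned buffer. */
    if (posix_memalign(&buf, 4096, chunk) != 0)
        return -1;
    fd[0] = open_direct(path1);
    fd[1] = open_direct(path2);
    if (fd[0] < 0 || fd[1] < 0) {
        total = -1;
        goto out;
    }
    while (!done[0] || !done[1]) {
        for (i = 0; i < 2; i++) {
            ssize_t n;

            if (done[i])
                continue;          /* this file already reached EOF */
            n = read(fd[i], buf, chunk);
            if (n < 0) {
                total = -1;
                goto out;
            }
            if (n == 0)
                done[i] = 1;
            total += n;
        }
    }
out:
    for (i = 0; i < 2; i++)
        if (fd[i] >= 0)
            close(fd[i]);
    free(buf);
    return total;
}
```

interleaved_read(file1, file2, 16 * 1024) mirrors the 16K runs; passing 1024 * 1024 mirrors the 1M runs.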

When I tested with two 100M files, it took 3.17 seconds (about 31MB/s
per file, 62MB/s in total). However, when I tested with two 700M files, it
took 162 seconds (about 4.5MB/s per file, 9MB/s in total).

My guess is that the inode structure or the physical file location on the HDD
might be related to this. But if I read only one file, the size doesn't
matter: reading a single file (10M, 100M, 700M) consistently gives about
70MB/s. The weird thing happens only when I read two large files.

Seek time might be related to this, but the difference looks far too
large to be explained by that alone. What is going on here?
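To put the two timings side by side, a small arithmetic sketch can convert each run into the number of 16K read(2) calls issued and the average time per call. It assumes that M and K mean MiB and KiB and that exactly one read(2) is issued per 16K chunk; struct run_stats and summarise() are hypothetical names.

```c
#include <assert.h>

/* Per-run summary of an interleaved read test. */
struct run_stats {
    long requests;      /* number of chunk-sized read(2) calls */
    double req_per_sec; /* calls completed per second */
    double ms_per_req;  /* average wall-clock time per call, in ms */
};

/* nfiles files of file_mib MiB each, read in chunk_kib KiB pieces,
 * taking `seconds` in total. Assumes M/K mean MiB/KiB. */
static struct run_stats summarise(int nfiles, long file_mib, long chunk_kib,
                                  double seconds)
{
    struct run_stats s;

    assert(chunk_kib > 0 && seconds > 0);
    s.requests = nfiles * file_mib * 1024 / chunk_kib;
    s.req_per_sec = (double)s.requests / seconds;
    s.ms_per_req = 1000.0 * seconds / (double)s.requests;
    return s;
}
```

summarise(2, 100, 16, 3.17) gives 12800 requests at roughly 0.25 ms each, while summarise(2, 700, 16, 162.0) gives 89600 requests at roughly 1.8 ms each, so each 16K request in the large-file run takes about seven times longer.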

Thanks.

-- 
Junsuk


read() vs fread()

2009-02-20 Thread Junsuk Shin
Hi BSD guys,

While doing a simple file read test, I found a huge difference in read
performance between read() and fread(). I'm wondering whether I'm doing
something wrong, or whether anyone has seen something similar.

Here is what I did:

For this particular application, I need to bypass the cache (I read the
data only once, and that's all).
The test file is a 700MB dummy file.
The test app just reads the whole file.

The test was done on FreeBSD 7.1 amd64, Celeron E1200, WD Caviar SE16 SATA 7200 RPM.

For test 1:

fd = open(name, O_RDONLY | O_DIRECT);
while (...) {
    cnt = read();

}

For test 2:

fd = open(name, O_RDONLY | O_DIRECT);
file = fdopen(fd, "r");
while (...) {
    cnt = fread();

}

Test 1 takes about 11.64 seconds (63 MB/s), and test 2 takes about
51.53 seconds (14 MB/s).
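Timings like these can be taken with a monotonic clock around the read loop. A minimal sketch, assuming clock_gettime(CLOCK_MONOTONIC, ...) is available (it is part of POSIX); elapsed_seconds() and mb_per_sec() are hypothetical helper names, and mb_per_sec() reports decimal megabytes (10^6 bytes) per second.

```c
#include <assert.h>
#include <time.h>

/* Seconds elapsed between two clock_gettime() readings. */
static double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Throughput in decimal megabytes (10^6 bytes) per second. */
static double mb_per_sec(long long bytes, double seconds)
{
    assert(seconds > 0);
    return (double)bytes / seconds / 1e6;
}
```

Call clock_gettime(CLOCK_MONOTONIC, &t0) right before the read loop and clock_gettime(CLOCK_MONOTONIC, &t1) right after it, then pass the byte count and elapsed_seconds(t0, t1) to mb_per_sec(). For instance, 700 MiB (734003200 bytes) in 11.64 s comes out at about 63.06 MB/s.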

If I use fopen() and fread() instead, the cache affects the result, so it
says little about HDD performance.

Personally, I don't think the overhead of fread() (a wrapper in libc) is
that large. What could be the reason for this?

Thanks.

--
Junsuk


Re: read() vs fread()

2009-02-20 Thread Junsuk Shin
setvbuf(file, buf, _IOFBF, bufsize) solved the problem perfectly.
Thanks a lot.

On Fri, Feb 20, 2009 at 4:09 PM, Pieter de Goeje pie...@degoeje.nl wrote:


 The reason is that by default a FILE has a really small internal buffer. Take
 a look at gstat(8) while running the test: you can clearly see a huge
 number of I/O requests being issued (almost 5000 reads per second on my HDD).
 To solve this, call setvbuf(3):

 setvbuf(file, buf, _IOFBF, bufsize);

 A bufsize of 16K or bigger should help a lot. After this modification, I see
 about 900 reads per second (using bufsize = 64K), and the read speed equals
 that of the read(2) case.

 Regards,

 Pieter de Goeje
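Pieter's fix can be put together as a complete function: an O_DIRECT descriptor wrapped with fdopen(), given an explicit fully buffered stdio buffer via setvbuf() before the first read, then read with small fread() calls. This is a minimal sketch under stated assumptions: fread_with_setvbuf() and open_direct() are hypothetical names, the 4K application-level request size and the fallback to a plain open() when O_DIRECT is rejected with EINVAL are choices of this sketch, and 4096-byte alignment of the stdio buffer is assumed to be enough for direct I/O.

```c
#define _GNU_SOURCE /* exposes O_DIRECT on Linux; harmless on FreeBSD */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Open read-only with O_DIRECT; fall back to a plain open() if the
 * filesystem rejects O_DIRECT with EINVAL (assumption of this sketch). */
static int open_direct(const char *path)
{
#ifdef O_DIRECT
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd >= 0 || errno != EINVAL)
        return fd;
#endif
    return open(path, O_RDONLY);
}

/* Read the whole file with fread(3) after giving the stream an explicit,
 * fully buffered stdio buffer of bufsize bytes. Returns bytes read or -1. */
static long long fread_with_setvbuf(const char *path, size_t bufsize)
{
    void *iobuf;        /* stdio refills this buffer with read(2) */
    char chunk[4096];   /* small application-level requests, as in test 2 */
    long long total = 0;
    size_t n;
    FILE *file;
    int fd;

    /* Keep requests smaller than the stdio buffer so fread() copies from
     * the aligned buffer instead of reading directly into chunk. */
    assert(bufsize > sizeof chunk && bufsize % 4096 == 0);
    if (posix_memalign(&iobuf, 4096, bufsize) != 0)
        return -1;
    fd = open_direct(path);
    if (fd < 0 || (file = fdopen(fd, "r")) == NULL) {
        if (fd >= 0)
            close(fd);
        free(iobuf);
        return -1;
    }
    /* Must come before the first I/O operation on the stream. */
    if (setvbuf(file, iobuf, _IOFBF, bufsize) != 0) {
        fclose(file);
        free(iobuf);
        return -1;
    }
    while ((n = fread(chunk, 1, sizeof chunk, file)) > 0)
        total += (long long)n;
    if (ferror(file))
        total = -1;
    fclose(file);   /* also closes fd */
    free(iobuf);    /* only after fclose(): stdio uses the buffer until then */
    return total;
}
```

fread_with_setvbuf(path, 64 * 1024) corresponds to Pieter's bufsize = 64K setting.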

--
Junsuk