Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-09 Thread Michael Snoyman
Just to clarify: the problem was in fact with my code. I was not passing
O_TRUNC to the open system call; Gregory's C code showed me the problem.
Once I add in that option, all the different benchmarks complete in roughly
the same amount of time. So given that our Haskell implementations based on
Handle are just about as fast as a raw C implementation, I'd say Handle is
performing very well.
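
For readers who want to see what a missing O_TRUNC does, here is a minimal C sketch, assuming a POSIX system. The helper names write_file and file_size are made up for this illustration and are not taken from the conduit or io-streams code.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes from `data` to `path`.
 * O_TRUNC is only passed when `truncate` is nonzero, so the caller can
 * compare both behaviours on an existing file. Returns 0 on success. */
int write_file(const char *path, const char *data, size_t len, int truncate)
{
    assert(path != NULL && data != NULL);

    int flags = O_WRONLY | O_CREAT | (truncate ? O_TRUNC : 0);
    int fd = open(path, flags, 0644);
    if (fd < 0)
        return -1;

    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, data + done, len - done);
        if (n < 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    return close(fd);
}

/* Hypothetical helper: size of `path` in bytes, or -1 on error. */
long file_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_size;
}
```

Writing 3 bytes over an existing 10-byte file without O_TRUNC leaves a 10-byte file whose first 3 bytes changed; with O_TRUNC the file ends up 3 bytes long. A benchmark that rewrites the same output file on every run therefore does different work depending on the flag. That is one plausible reading of why the numbers diverged, not something measured in this thread.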

Apologies if I got anyone overly concerned.


On Fri, Mar 8, 2013 at 12:36 PM, Simon Marlow wrote:

> 1GB/s for copying a file is reasonable - it's around half the memory
> bandwidth, so copying the data twice would give that result (assuming no
> actual I/O is taking place, which is what you want because actual I/O will
> swamp any differences at the software level).
>
> The Handle overhead should be negligible if you're only using hGetBufSome
> and hPutBuf, because those functions basically just call read() and write()
> when the amount of data is larger than the buffer size.
>
> There's clearly something suspicious going on here, unfortunately I don't
> have time right now to investigate, but I'll keep an eye on the thread.
>
> Cheers,
> Simon
>
>
> On 08/03/13 08:36, Gregory Collins wrote:
>
>> +Simon Marlow
>> A couple of comments:
>>
>>   * maybe we shouldn't back the file by a Handle. io-streams does this
>>     by default out of the box; I had a posix file interface for unix
>>     (guarded by CPP) for a while but decided to ditch it for simplicity.
>>     If your results are correct, given how slow going by Handle seems to
>>     be I may revisit this, I figured it would be "good enough".
>>   * io-streams turns Handle buffering off in withFileAsOutput. So the
>>     difference shouldn't be as a result of buffering. Simon: is this an
>>     expected result? I presume you did some Handle debugging?
>>   * the IO manager should not have any bearing here because file code
>>     doesn't actually ever use it (epoll() doesn't work for files)
>>   * does the difference persist when the file size gets bigger?
>>   * your file descriptor code doesn't handle EINTR properly, although
>>     you said you checked that the file copy is being done?
>>   * Copying a 1MB file in 1ms gives a throughput of ~1GB/s. The other
>>     methods have a more believable ~70MB/s throughput.
>>
>> G
>>
>>
>> On Fri, Mar 8, 2013 at 7:30 AM, Michael Snoyman wrote:
>>
>> Hi all,
>>
>> I'm turning to the community for some help understanding some
>> benchmark results[1]. I was curious to see how the new io-streams
>> would work with conduit, as it looks like a far saner low-level
>> approach than Handles. In fact, the API is so simple that the entire
>> wrapper is just a few lines of code[2].
>>
>> I then added in some basic file copy benchmarks, comparing
>> conduit+Handle (with ResourceT or bracket), conduit+io-streams,
>> straight io-streams, and lazy I/O. All approaches fell into the same
>> ballpark, with conduit+bracket and conduit+io-streams taking a
>> slight lead. (I haven't analyzed that enough to know if it means
>> anything, however.)
>>
>> Then I decided to pull up the NoHandle code I wrote a while ago for
>> conduit. This code was written initially for Windows only, to work
>> around the fact that System.IO.openFile does some file locking. To
>> avoid using Handles, I wrote a simple FFI wrapper exposing open,
>> read, and close system calls, ported it to POSIX, and hid it behind
>> a Cabal flag. Out of curiosity, I decided to expose it and include
>> it in the benchmark.
>>
>> The results are extreme. I've confirmed multiple times that the copy
>> algorithm is in fact copying the file, so I don't think the test
>> itself is cheating somehow. But I don't know how to explain the
>> massive gap. I've run this on two different systems. The results you
>> see linked are from my local machine. On an EC2 instance, the gap
>> was a bit smaller, but the NoHandle code was still 75% faster than
>> the others.
>>
>> My initial guess is that I'm not properly tying into the IO manager,
>> but I wanted to see if the community had any thoughts. The relevant
>> pieces of code are [3][4][5].
>>
>> Michael
>>
>> [1] http://static.snoyman.com/streams.html
>> [2]
>> https://github.com/snoyberg/conduit/blob/streams/io-streams-conduit/Data/Conduit/Streams.hs
>> [3]
>> https://github.com/snoyberg/conduit/blob/streams/conduit/System/PosixFile.hsc
>> [4]
>> https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L54

Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-09 Thread John Lato
On Fri, Mar 8, 2013 at 6:36 PM, Simon Marlow wrote:

> 1GB/s for copying a file is reasonable - it's around half the memory
> bandwidth, so copying the data twice would give that result (assuming no
> actual I/O is taking place, which is what you want because actual I/O will
> swamp any differences at the software level).
>
> The Handle overhead should be negligible if you're only using hGetBufSome
> and hPutBuf, because those functions basically just call read() and write()
> when the amount of data is larger than the buffer size.
>
> There's clearly something suspicious going on here, unfortunately I don't
> have time right now to investigate, but I'll keep an eye on the thread.
>

Possibly disk caching/syncing issues?  If some of the tests are able to
either read entirely from cache (on the 1MB test), or don't completely sync
after the write, they could happen much faster than others that have to
actually hit the disk.  For the 60MB test, it's almost guaranteed that
actual IO would take place and dominate the timings.
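
To make the caching/syncing point concrete, here is a rough C sketch (POSIX assumed) of a timed write where the clock can optionally stop only after fsync() returns. The function name timed_write is hypothetical and has nothing to do with the benchmark code in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes to `path` and return the elapsed
 * wall-clock time in seconds, or -1.0 on error. With `do_sync` set, the
 * timer keeps running until fsync() has returned, so the measurement also
 * covers flushing the page cache to the device. */
double timed_write(const char *path, const char *buf, size_t len, int do_sync)
{
    struct timespec t0, t1;
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            close(fd);
            return -1.0;
        }
        done += (size_t)n;
    }
    if (do_sync && fsync(fd) != 0) {
        close(fd);
        return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Comparing the two modes on the same machine gives a feel for how much of a measured "copy" is really just memory traffic into the page cache. The actual ratio depends on the disk and filesystem, so no numbers are claimed here.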

John L.


> Cheers,
> Simon
>
>
> On 08/03/13 08:36, Gregory Collins wrote:
>
>> +Simon Marlow
>> A couple of comments:
>>
>>   * maybe we shouldn't back the file by a Handle. io-streams does this
>>     by default out of the box; I had a posix file interface for unix
>>     (guarded by CPP) for a while but decided to ditch it for simplicity.
>>     If your results are correct, given how slow going by Handle seems to
>>     be I may revisit this, I figured it would be "good enough".
>>   * io-streams turns Handle buffering off in withFileAsOutput. So the
>>     difference shouldn't be as a result of buffering. Simon: is this an
>>     expected result? I presume you did some Handle debugging?
>>   * the IO manager should not have any bearing here because file code
>>     doesn't actually ever use it (epoll() doesn't work for files)
>>   * does the difference persist when the file size gets bigger?
>>   * your file descriptor code doesn't handle EINTR properly, although
>>     you said you checked that the file copy is being done?
>>   * Copying a 1MB file in 1ms gives a throughput of ~1GB/s. The other
>>     methods have a more believable ~70MB/s throughput.
>>
>> G
>>
>>
>> On Fri, Mar 8, 2013 at 7:30 AM, Michael Snoyman wrote:
>>
>> Hi all,
>>
>> I'm turning to the community for some help understanding some
>> benchmark results[1]. I was curious to see how the new io-streams
>> would work with conduit, as it looks like a far saner low-level
>> approach than Handles. In fact, the API is so simple that the entire
>> wrapper is just a few lines of code[2].
>>
>> I then added in some basic file copy benchmarks, comparing
>> conduit+Handle (with ResourceT or bracket), conduit+io-streams,
>> straight io-streams, and lazy I/O. All approaches fell into the same
>> ballpark, with conduit+bracket and conduit+io-streams taking a
>> slight lead. (I haven't analyzed that enough to know if it means
>> anything, however.)
>>
>> Then I decided to pull up the NoHandle code I wrote a while ago for
>> conduit. This code was written initially for Windows only, to work
>> around the fact that System.IO.openFile does some file locking. To
>> avoid using Handles, I wrote a simple FFI wrapper exposing open,
>> read, and close system calls, ported it to POSIX, and hid it behind
>> a Cabal flag. Out of curiosity, I decided to expose it and include
>> it in the benchmark.
>>
>> The results are extreme. I've confirmed multiple times that the copy
>> algorithm is in fact copying the file, so I don't think the test
>> itself is cheating somehow. But I don't know how to explain the
>> massive gap. I've run this on two different systems. The results you
>> see linked are from my local machine. On an EC2 instance, the gap
>> was a bit smaller, but the NoHandle code was still 75% faster than
>> the others.
>>
>> My initial guess is that I'm not properly tying into the IO manager,
>> but I wanted to see if the community had any thoughts. The relevant
>> pieces of code are [3][4][5].
>>
>> Michael
>>
>> [1] http://static.snoyman.com/streams.html
>> [2]
>> https://github.com/snoyberg/conduit/blob/streams/io-streams-conduit/Data/Conduit/Streams.hs
>> [3]
>> https://github.com/snoyberg/conduit/blob/streams/conduit/System/PosixFile.hsc
>> [4]
>> https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L54
>> [5]
>> https://github.com/snoyberg/conduit/blob/streams/conduit/

Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-08 Thread Simon Marlow
1GB/s for copying a file is reasonable - it's around half the memory 
bandwidth, so copying the data twice would give that result (assuming no 
actual I/O is taking place, which is what you want because actual I/O 
will swamp any differences at the software level).


The Handle overhead should be negligible if you're only using 
hGetBufSome and hPutBuf, because those functions basically just call 
read() and write() when the amount of data is larger than the buffer size.
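
As a point of reference, the raw read()/write() loop that such a Handle-based copy is being compared against might look like the C sketch below. It is an illustration only: it is neither GHC's implementation nor the code from any benchmark in this thread, and copy_fd, copy_file and the 64 KiB buffer size are assumptions made for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

enum { BUF_SIZE = 64 * 1024 }; /* arbitrary chunk size chosen for this sketch */

/* Copy everything from in_fd to out_fd through one fixed buffer.
 * Returns the number of bytes copied, or -1 on error. Any failure,
 * including an interrupted call (EINTR), is reported as an error here. */
long copy_fd(int in_fd, int out_fd)
{
    static char buf[BUF_SIZE];
    long total = 0;

    for (;;) {
        ssize_t got = read(in_fd, buf, sizeof buf);
        if (got < 0)
            return -1;
        if (got == 0)
            return total;                 /* end of input */

        ssize_t off = 0;
        while (off < got) {               /* write() may accept fewer bytes */
            ssize_t put = write(out_fd, buf + off, (size_t)(got - off));
            if (put < 0)
                return -1;
            off += put;
        }
        total += got;
    }
}

/* Hypothetical wrapper: copy `src` to `dst`, truncating `dst` first. */
long copy_file(const char *src, const char *dst)
{
    assert(src != NULL && dst != NULL);

    int in_fd = open(src, O_RDONLY);
    if (in_fd < 0)
        return -1;
    int out_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out_fd < 0) {
        close(in_fd);
        return -1;
    }

    long n = copy_fd(in_fd, out_fd);
    close(in_fd);
    if (close(out_fd) != 0)
        return -1;
    return n;
}
```

Each chunk crosses the user/kernel boundary twice, once in read() and once in write(), which matches the "copying the data twice" estimate above.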


There's clearly something suspicious going on here, unfortunately I 
don't have time right now to investigate, but I'll keep an eye on the 
thread.


Cheers,
Simon

On 08/03/13 08:36, Gregory Collins wrote:

+Simon Marlow
A couple of comments:

  * maybe we shouldn't back the file by a Handle. io-streams does this
    by default out of the box; I had a posix file interface for unix
    (guarded by CPP) for a while but decided to ditch it for simplicity.
    If your results are correct, given how slow going by Handle seems to
    be I may revisit this, I figured it would be "good enough".
  * io-streams turns Handle buffering off in withFileAsOutput. So the
    difference shouldn't be as a result of buffering. Simon: is this an
    expected result? I presume you did some Handle debugging?
  * the IO manager should not have any bearing here because file code
    doesn't actually ever use it (epoll() doesn't work for files)
  * does the difference persist when the file size gets bigger?
  * your file descriptor code doesn't handle EINTR properly, although
    you said you checked that the file copy is being done?
  * Copying a 1MB file in 1ms gives a throughput of ~1GB/s. The other
    methods have a more believable ~70MB/s throughput.

G


On Fri, Mar 8, 2013 at 7:30 AM, Michael Snoyman <mich...@snoyman.com> wrote:

Hi all,

I'm turning to the community for some help understanding some
benchmark results[1]. I was curious to see how the new io-streams
would work with conduit, as it looks like a far saner low-level
approach than Handles. In fact, the API is so simple that the entire
wrapper is just a few lines of code[2].

I then added in some basic file copy benchmarks, comparing
conduit+Handle (with ResourceT or bracket), conduit+io-streams,
straight io-streams, and lazy I/O. All approaches fell into the same
ballpark, with conduit+bracket and conduit+io-streams taking a
slight lead. (I haven't analyzed that enough to know if it means
anything, however.)

Then I decided to pull up the NoHandle code I wrote a while ago for
conduit. This code was written initially for Windows only, to work
around the fact that System.IO.openFile does some file locking. To
avoid using Handles, I wrote a simple FFI wrapper exposing open,
read, and close system calls, ported it to POSIX, and hid it behind
a Cabal flag. Out of curiosity, I decided to expose it and include
it in the benchmark.

The results are extreme. I've confirmed multiple times that the copy
algorithm is in fact copying the file, so I don't think the test
itself is cheating somehow. But I don't know how to explain the
massive gap. I've run this on two different systems. The results you
see linked are from my local machine. On an EC2 instance, the gap
was a bit smaller, but the NoHandle code was still 75% faster than
the others.

My initial guess is that I'm not properly tying into the IO manager,
but I wanted to see if the community had any thoughts. The relevant
pieces of code are [3][4][5].

Michael

[1] http://static.snoyman.com/streams.html
[2]

https://github.com/snoyberg/conduit/blob/streams/io-streams-conduit/Data/Conduit/Streams.hs
[3]

https://github.com/snoyberg/conduit/blob/streams/conduit/System/PosixFile.hsc
[4]

https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L54
[5]

https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L167

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org 
http://www.haskell.org/mailman/listinfo/haskell-cafe




--
Gregory Collins <g...@gregorycollins.net>





Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-08 Thread Michael Snoyman
That demonstrated the issue: I'd forgotten to pass O_TRUNC to the open
system call. Adding that back makes the numbers much more comparable.

Thanks for the input everyone, and Gregory for finding the actual problem
(as well as pointing out a few other improvements).


On Fri, Mar 8, 2013 at 12:13 PM, Gregory Collins wrote:

> Something must be wrong with the conduit "NoHandle" code. I increased the
> filesize to 60MB and implemented the copy loop in pure C; the code and
> results are here:
>
> https://gist.github.com/gregorycollins/5115491
>
> Everything but the conduit NoHandle code runs in roughly 600-620ms,
> including the pure C version.
>
> G
>
>
> On Fri, Mar 8, 2013 at 10:13 AM, Alexander Kjeldaas <
> alexander.kjeld...@gmail.com> wrote:
>
>>
>>
>>
>> On Fri, Mar 8, 2013 at 9:53 AM, Gregory Collins 
>> wrote:
>>
>>> On Fri, Mar 8, 2013 at 9:48 AM, John Lato  wrote:
>>>
>>>> For comparison, on my system I get
>>>> $ time cp input.dat output.dat
>>>>
>>>> real 0m0.004s
>>>> user 0m0.000s
>>>> sys 0m0.000s
>>>>
>>>
>>> Does your workstation have an SSD? Michael's using a spinning disk.
>>>
>>>
>> If you're only copying a GB or so, it should only be memory traffic.
>>
>> Alexander
>>
>>
>>>
>>> --
>>> Gregory Collins 
>>>
>>>
>>>
>>
>
>
> --
> Gregory Collins 
>
>
>


Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-08 Thread Gregory Collins
Something must be wrong with the conduit "NoHandle" code. I increased the
filesize to 60MB and implemented the copy loop in pure C; the code and
results are here:

https://gist.github.com/gregorycollins/5115491

Everything but the conduit NoHandle code runs in roughly 600-620ms,
including the pure C version.
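
The timings in this thread can be turned into throughput with a one-line calculation. The C helper below is a hypothetical convenience for that arithmetic and assumes 1 MB means 10^6 bytes.

```c
#include <assert.h>

/* Throughput in MB/s (1 MB = 10^6 bytes, an assumption of this sketch)
 * for `bytes` bytes copied in `millis` milliseconds. */
double throughput_mb_per_s(double bytes, double millis)
{
    assert(millis > 0.0);
    return (bytes / 1e6) / (millis / 1e3);
}
```

With the figures quoted here, 60MB in 600-620ms works out to roughly 97-100MB/s, while 1MB in 1ms is about 1000MB/s, i.e. the ~1GB/s mentioned elsewhere in the thread.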

G


On Fri, Mar 8, 2013 at 10:13 AM, Alexander Kjeldaas <
alexander.kjeld...@gmail.com> wrote:

>
>
>
> On Fri, Mar 8, 2013 at 9:53 AM, Gregory Collins 
> wrote:
>
>> On Fri, Mar 8, 2013 at 9:48 AM, John Lato  wrote:
>>
>>> For comparison, on my system I get
>>> $ time cp input.dat output.dat
>>>
>>> real 0m0.004s
>>> user 0m0.000s
>>> sys 0m0.000s
>>>
>>
>> Does your workstation have an SSD? Michael's using a spinning disk.
>>
>>
> If you're only copying a GB or so, it should only be memory traffic.
>
> Alexander
>
>
>>
>> --
>> Gregory Collins 
>>
>>
>>
>


-- 
Gregory Collins 


Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-08 Thread Alexander Kjeldaas
On Fri, Mar 8, 2013 at 9:53 AM, Gregory Collins wrote:

> On Fri, Mar 8, 2013 at 9:48 AM, John Lato  wrote:
>
>> For comparison, on my system I get
>> $ time cp input.dat output.dat
>>
>> real 0m0.004s
>> user 0m0.000s
>> sys 0m0.000s
>>
>
> Does your workstation have an SSD? Michael's using a spinning disk.
>
>
If you're only copying a GB or so, it should only be memory traffic.

Alexander


>
> --
> Gregory Collins 
>
>
>


Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-08 Thread Gregory Collins
On Fri, Mar 8, 2013 at 9:48 AM, John Lato wrote:

> For comparison, on my system I get
> $ time cp input.dat output.dat
>
> real 0m0.004s
> user 0m0.000s
> sys 0m0.000s
>

Does your workstation have an SSD? Michael's using a spinning disk.


-- 
Gregory Collins 


Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-08 Thread John Lato
I'd like to point out that it's entirely possible to get good performance
out of a handle. The iteratee package has had both FD and Handle-based
IO for a while, and I've never observed any serious performance differences
between the two.  Also, if I may be so bold, Michael's supercharged copy
speeds are on par with iteratee's performance using Handles:
http://www.tiresiaspress.us/io-benchmarks.html

So while there's definitely something interesting going on here, I think it
needs a bit more investigation before suggesting that Handles should be
avoided.

For comparison, on my system I get
$ time cp input.dat output.dat

real 0m0.004s
user 0m0.000s
sys 0m0.000s

so the throughput observed on the faster times is entirely reasonable.

John L.


On Fri, Mar 8, 2013 at 4:36 PM, Gregory Collins wrote:

> +Simon Marlow
> A couple of comments:
>
>    - maybe we shouldn't back the file by a Handle. io-streams does this
>    by default out of the box; I had a posix file interface for unix (guarded
>    by CPP) for a while but decided to ditch it for simplicity. If your results
>    are correct, given how slow going by Handle seems to be I may revisit this,
>    I figured it would be "good enough".
>    - io-streams turns Handle buffering off in withFileAsOutput. So the
>    difference shouldn't be as a result of buffering. Simon: is this an
>    expected result? I presume you did some Handle debugging?
>    - the IO manager should not have any bearing here because file code
>    doesn't actually ever use it (epoll() doesn't work for files)
>    - does the difference persist when the file size gets bigger?
>    - your file descriptor code doesn't handle EINTR properly, although
>    you said you checked that the file copy is being done?
>    - Copying a 1MB file in 1ms gives a throughput of ~1GB/s. The other
>    methods have a more believable ~70MB/s throughput.
>
> G
>
>
> On Fri, Mar 8, 2013 at 7:30 AM, Michael Snoyman wrote:
>
>> Hi all,
>>
>> I'm turning to the community for some help understanding some benchmark
>> results[1]. I was curious to see how the new io-streams would work with
>> conduit, as it looks like a far saner low-level approach than Handles. In
>> fact, the API is so simple that the entire wrapper is just a few lines of
>> code[2].
>>
>> I then added in some basic file copy benchmarks, comparing conduit+Handle
>> (with ResourceT or bracket), conduit+io-streams, straight io-streams, and
>> lazy I/O. All approaches fell into the same ballpark, with conduit+bracket
>> and conduit+io-streams taking a slight lead. (I haven't analyzed that
>> enough to know if it means anything, however.)
>>
>> Then I decided to pull up the NoHandle code I wrote a while ago for
>> conduit. This code was written initially for Windows only, to work around
>> the fact that System.IO.openFile does some file locking. To avoid using
>> Handles, I wrote a simple FFI wrapper exposing open, read, and close system
>> calls, ported it to POSIX, and hid it behind a Cabal flag. Out of
>> curiosity, I decided to expose it and include it in the benchmark.
>>
>> The results are extreme. I've confirmed multiple times that the copy
>> algorithm is in fact copying the file, so I don't think the test itself is
>> cheating somehow. But I don't know how to explain the massive gap. I've run
>> this on two different systems. The results you see linked are from my local
>> machine. On an EC2 instance, the gap was a bit smaller, but the NoHandle
>> code was still 75% faster than the others.
>>
>> My initial guess is that I'm not properly tying into the IO manager, but
>> I wanted to see if the community had any thoughts. The relevant pieces of
>> code are [3][4][5].
>>
>> Michael
>>
>> [1] http://static.snoyman.com/streams.html
>> [2]
>> https://github.com/snoyberg/conduit/blob/streams/io-streams-conduit/Data/Conduit/Streams.hs
>> [3]
>> https://github.com/snoyberg/conduit/blob/streams/conduit/System/PosixFile.hsc
>> [4]
>> https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L54
>> [5]
>> https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L167
>>
>>
>>
>
>
> --
> Gregory Collins 
>
>
>


Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-08 Thread Gregory Collins
+Simon Marlow
A couple of comments:

   - maybe we shouldn't back the file by a Handle. io-streams does this by
   default out of the box; I had a posix file interface for unix (guarded by
   CPP) for a while but decided to ditch it for simplicity. If your results
   are correct, given how slow going by Handle seems to be I may revisit this,
   I figured it would be "good enough".
   - io-streams turns Handle buffering off in withFileAsOutput. So the
   difference shouldn't be as a result of buffering. Simon: is this an
   expected result? I presume you did some Handle debugging?
   - the IO manager should not have any bearing here because file code
   doesn't actually ever use it (epoll() doesn't work for files)
   - does the difference persist when the file size gets bigger?
   - your file descriptor code doesn't handle EINTR properly, although you
   said you checked that the file copy is being done?
   - Copying a 1MB file in 1ms gives a throughput of ~1GB/s. The other
   methods have a more believable ~70MB/s throughput.
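
On the EINTR point, "handling it properly" usually means restarting a call that was interrupted by a signal before it transferred any data. Here is a minimal C sketch of that pattern, assuming POSIX semantics. The names read_retry and write_all are hypothetical and are not taken from PosixFile.hsc.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* read() that is restarted when a signal interrupts it before any data
 * arrived. Returns what read() returns otherwise (0 means end of file). */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n;
}

/* Write all `len` bytes, retrying on EINTR and continuing after short
 * writes. Returns 0 on success, -1 on any other error. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    assert(buf != NULL || len == 0);

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;                 /* interrupted: try again */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Without the retry, a signal arriving mid-copy makes the call fail with -1, and a loop that treats every -1 as fatal (or as end of input) would stop early.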

G


On Fri, Mar 8, 2013 at 7:30 AM, Michael Snoyman wrote:

> Hi all,
>
> I'm turning to the community for some help understanding some benchmark
> results[1]. I was curious to see how the new io-streams would work with
> conduit, as it looks like a far saner low-level approach than Handles. In
> fact, the API is so simple that the entire wrapper is just a few lines of
> code[2].
>
> I then added in some basic file copy benchmarks, comparing conduit+Handle
> (with ResourceT or bracket), conduit+io-streams, straight io-streams, and
> lazy I/O. All approaches fell into the same ballpark, with conduit+bracket
> and conduit+io-streams taking a slight lead. (I haven't analyzed that
> enough to know if it means anything, however.)
>
> Then I decided to pull up the NoHandle code I wrote a while ago for
> conduit. This code was written initially for Windows only, to work around
> the fact that System.IO.openFile does some file locking. To avoid using
> Handles, I wrote a simple FFI wrapper exposing open, read, and close system
> calls, ported it to POSIX, and hid it behind a Cabal flag. Out of
> curiosity, I decided to expose it and include it in the benchmark.
>
> The results are extreme. I've confirmed multiple times that the copy
> algorithm is in fact copying the file, so I don't think the test itself is
> cheating somehow. But I don't know how to explain the massive gap. I've run
> this on two different systems. The results you see linked are from my local
> machine. On an EC2 instance, the gap was a bit smaller, but the NoHandle
> code was still 75% faster than the others.
>
> My initial guess is that I'm not properly tying into the IO manager, but I
> wanted to see if the community had any thoughts. The relevant pieces of
> code are [3][4][5].
>
> Michael
>
> [1] http://static.snoyman.com/streams.html
> [2]
> https://github.com/snoyberg/conduit/blob/streams/io-streams-conduit/Data/Conduit/Streams.hs
> [3]
> https://github.com/snoyberg/conduit/blob/streams/conduit/System/PosixFile.hsc
> [4]
> https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L54
> [5]
> https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L167
>
>
>


-- 
Gregory Collins 


Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-08 Thread Gregory Collins
On Fri, Mar 8, 2013 at 9:36 AM, Gregory Collins wrote:

> I presume you did some Handle debugging?


...and here I mean "benchmarking" of course.


-- 
Gregory Collins 


Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-07 Thread John Lato
I would have expected sourceFileNoHandle to make the most difference, since
that's one location (write) where you've obviously removed a copy. Does
sourceFileNoHandle allocate less?

Incidentally, I've recently been making similar changes to IO code
(removing buffer copies) and getting similar speedups.  Although the
results tend to be less pronounced in code that isn't strictly IO-bound.


On Fri, Mar 8, 2013 at 2:50 PM, Michael Snoyman wrote:

> One clarification: it seems that sourceFile and sourceFileNoHandle have
> virtually no difference in speed. The gap comes exclusively from sinkFile
> vs sinkFileNoHandle. This makes me think that it might be a buffer copy
> that's causing the slowdown, in which case the benchmark may in fact be
> accurate.
> On Mar 8, 2013 8:30 AM, "Michael Snoyman"  wrote:
>
>> Hi all,
>>
>> I'm turning to the community for some help understanding some benchmark
>> results[1]. I was curious to see how the new io-streams would work with
>> conduit, as it looks like a far saner low-level approach than Handles. In
>> fact, the API is so simple that the entire wrapper is just a few lines of
>> code[2].
>>
>> I then added in some basic file copy benchmarks, comparing conduit+Handle
>> (with ResourceT or bracket), conduit+io-streams, straight io-streams, and
>> lazy I/O. All approaches fell into the same ballpark, with conduit+bracket
>> and conduit+io-streams taking a slight lead. (I haven't analyzed that
>> enough to know if it means anything, however.)
>>
>> Then I decided to pull up the NoHandle code I wrote a while ago for
>> conduit. This code was written initially for Windows only, to work around
>> the fact that System.IO.openFile does some file locking. To avoid using
>> Handles, I wrote a simple FFI wrapper exposing open, read, and close system
>> calls, ported it to POSIX, and hid it behind a Cabal flag. Out of
>> curiosity, I decided to expose it and include it in the benchmark.
>>
>> The results are extreme. I've confirmed multiple times that the copy
>> algorithm is in fact copying the file, so I don't think the test itself is
>> cheating somehow. But I don't know how to explain the massive gap. I've run
>> this on two different systems. The results you see linked are from my local
>> machine. On an EC2 instance, the gap was a bit smaller, but the NoHandle
>> code was still 75% faster than the others.
>>
>> My initial guess is that I'm not properly tying into the IO manager, but
>> I wanted to see if the community had any thoughts. The relevant pieces of
>> code are [3][4][5].
>>
>> Michael
>>
>> [1] http://static.snoyman.com/streams.html
>> [2]
>> https://github.com/snoyberg/conduit/blob/streams/io-streams-conduit/Data/Conduit/Streams.hs
>> [3]
>> https://github.com/snoyberg/conduit/blob/streams/conduit/System/PosixFile.hsc
>> [4]
>> https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L54
>> [5]
>> https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L167
>>
>
>
>


Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-07 Thread Michael Snoyman
One clarification: it seems that sourceFile and sourceFileNoHandle have
virtually no difference in speed. The gap comes exclusively from sinkFile
vs sinkFileNoHandle. This makes me think that it might be a buffer copy
that's causing the slowdown, in which case the benchmark may in fact be
accurate.
On Mar 8, 2013 8:30 AM, "Michael Snoyman" wrote:

> Hi all,
>
> I'm turning to the community for some help understanding some benchmark
> results[1]. I was curious to see how the new io-streams would work with
> conduit, as it looks like a far saner low-level approach than Handles. In
> fact, the API is so simple that the entire wrapper is just a few lines of
> code[2].
>
> I then added in some basic file copy benchmarks, comparing conduit+Handle
> (with ResourceT or bracket), conduit+io-streams, straight io-streams, and
> lazy I/O. All approaches fell into the same ballpark, with conduit+bracket
> and conduit+io-streams taking a slight lead. (I haven't analyzed that
> enough to know if it means anything, however.)
>
> Then I decided to pull up the NoHandle code I wrote a while ago for
> conduit. This code was written initially for Windows only, to work around
> the fact that System.IO.openFile does some file locking. To avoid using
> Handles, I wrote a simple FFI wrapper exposing open, read, and close system
> calls, ported it to POSIX, and hid it behind a Cabal flag. Out of
> curiosity, I decided to expose it and include it in the benchmark.
>
> The results are extreme. I've confirmed multiple times that the copy
> algorithm is in fact copying the file, so I don't think the test itself is
> cheating somehow. But I don't know how to explain the massive gap. I've run
> this on two different systems. The results you see linked are from my local
> machine. On an EC2 instance, the gap was a bit smaller, but the NoHandle
> code was still 75% faster than the others.
>
> My initial guess is that I'm not properly tying into the IO manager, but I
> wanted to see if the community had any thoughts. The relevant pieces of
> code are [3][4][5].
>
> Michael
>
> [1] http://static.snoyman.com/streams.html
> [2]
> https://github.com/snoyberg/conduit/blob/streams/io-streams-conduit/Data/Conduit/Streams.hs
> [3]
> https://github.com/snoyberg/conduit/blob/streams/conduit/System/PosixFile.hsc
> [4]
> https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L54
> [5]
> https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L167
>


[Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-07 Thread Michael Snoyman
Hi all,

I'm turning to the community for some help understanding some benchmark
results[1]. I was curious to see how the new io-streams would work with
conduit, as it looks like a far saner low-level approach than Handles. In
fact, the API is so simple that the entire wrapper is just a few lines of
code[2].

I then added in some basic file copy benchmarks, comparing conduit+Handle
(with ResourceT or bracket), conduit+io-streams, straight io-streams, and
lazy I/O. All approaches fell into the same ballpark, with conduit+bracket
and conduit+io-streams taking a slight lead. (I haven't analyzed that
enough to know if it means anything, however.)

Then I decided to pull up the NoHandle code I wrote a while ago for
conduit. This code was written initially for Windows only, to work around
the fact that System.IO.openFile does some file locking. To avoid using
Handles, I wrote a simple FFI wrapper exposing open, read, and close system
calls, ported it to POSIX, and hid it behind a Cabal flag. Out of
curiosity, I decided to expose it and include it in the benchmark.

The results are extreme. I've confirmed multiple times that the copy
algorithm is in fact copying the file, so I don't think the test itself is
cheating somehow. But I don't know how to explain the massive gap. I've run
this on two different systems. The results you see linked are from my local
machine. On an EC2 instance, the gap was a bit smaller, but the NoHandle
code was still 75% faster than the others.

My initial guess is that I'm not properly tying into the IO manager, but I
wanted to see if the community had any thoughts. The relevant pieces of
code are [3][4][5].

Michael

[1] http://static.snoyman.com/streams.html
[2]
https://github.com/snoyberg/conduit/blob/streams/io-streams-conduit/Data/Conduit/Streams.hs
[3]
https://github.com/snoyberg/conduit/blob/streams/conduit/System/PosixFile.hsc
[4]
https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L54
[5]
https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L167