Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-09 Thread John Lato
On Fri, Mar 8, 2013 at 6:36 PM, Simon Marlow marlo...@gmail.com wrote:

 1GB/s for copying a file is reasonable - it's around half the memory
 bandwidth, so copying the data twice would give that result (assuming no
 actual I/O is taking place, which is what you want because actual I/O will
 swamp any differences at the software level).

 The Handle overhead should be negligible if you're only using hGetBufSome
 and hPutBuf, because those functions basically just call read() and write()
 when the amount of data is larger than the buffer size.

 There's clearly something suspicious going on here, unfortunately I don't
 have time right now to investigate, but I'll keep an eye on the thread.


Possibly disk caching/syncing issues? If some of the tests are able to
either read entirely from cache (on the 1MB test) or skip a complete sync
after the write, they could run much faster than others that have to
actually hit the disk. For the 60MB test, it's almost guaranteed that
actual I/O would take place and dominate the timings.
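
A minimal POSIX C sketch of one way to rule out the sync difference, assuming the benchmark controls when its clock stops: have every variant call fsync() on its output before the elapsed time is read, so none of them gets credit for data still sitting in the page cache. `seconds_until_synced` is a hypothetical helper, not part of conduit, io-streams, or the benchmarks linked in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical benchmark helper. Call it after the copy under test has
   written its last byte to out_fd. It first forces the file's dirty pages
   out to the device with fsync(), retrying if a signal interrupts the call,
   and only then reads the clock, so a variant cannot look faster merely
   because its data is still sitting in the page cache.
   Returns seconds elapsed since start, or -1.0 if fsync() fails. */
static double seconds_until_synced(int out_fd, struct timespec start)
{
    int rc;
    do {
        rc = fsync(out_fd);
    } while (rc != 0 && errno == EINTR);
    if (rc != 0)
        return -1.0;

    struct timespec end;
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

A caller records `clock_gettime(CLOCK_MONOTONIC, &start)` before starting the copy and passes `start` in once the copy returns. This only evens out the write side: a small input file that is already in the page cache is still read without touching the disk.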

John L.



Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-09 Thread Michael Snoyman
Just to clarify: the problem was in fact with my code. I was not passing
O_TRUNC to the open system call, and Gregory's C code showed me the problem.
Once I added that option, all the different benchmarks completed in roughly
the same amount of time. So given that our Haskell implementations based on
Handle are just about as fast as a raw C implementation, I'd say Handle is
performing very well.
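
To show why the missing flag matters, here is a small POSIX C sketch (not Michael's actual PosixFile.hsc code; `open_destination` and `file_size` are hypothetical names). open() with O_WRONLY | O_CREAT on an existing file keeps its contents and length, and writes overwrite it in place from offset 0; only O_TRUNC empties it first. So a benchmark that rewrites the same output file repeatedly without O_TRUNC is not doing the same filesystem work as one that truncates, as System.IO's WriteMode does.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a copy destination for writing.
   With do_truncate == 0, an existing file keeps its old length and the new
   bytes overwrite it from offset 0 (the bug described above); with
   do_truncate != 0, O_TRUNC empties the file first, as a copy requires.
   Returns a file descriptor, or -1 with errno set. */
static int open_destination(const char *path, int do_truncate)
{
    int flags = O_WRONLY | O_CREAT;
    if (do_truncate)
        flags |= O_TRUNC;

    int fd;
    do {
        fd = open(path, flags, 0644);
    } while (fd < 0 && errno == EINTR);
    return fd;
}

/* Current size of the file at path in bytes, or -1 on error. */
static off_t file_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? st.st_size : (off_t)-1;
}
```

Writing 3 bytes over an existing 10-byte file leaves it 10 bytes long without O_TRUNC, and 3 bytes long with it.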

Apologies if I got anyone overly concerned.



Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-08 Thread Gregory Collins
On Fri, Mar 8, 2013 at 9:36 AM, Gregory Collins g...@gregorycollins.net wrote:

 I presume you did some Handle debugging?


...and here I mean benchmarking, of course.


-- 
Gregory Collins g...@gregorycollins.net
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-08 Thread Gregory Collins
+Simon Marlow
A couple of comments:

   - maybe we shouldn't back the file by a Handle. io-streams does this by
   default out of the box; I had a POSIX file interface for unix (guarded by
   CPP) for a while but decided to ditch it for simplicity. If your results
   are correct, given how slow going through a Handle seems to be, I may
   revisit this; I had figured it would be good enough.
   - io-streams turns Handle buffering off in withFileAsOutput, so the
   difference shouldn't be a result of buffering. Simon: is this an
   expected result? I presume you did some Handle debugging?
   - the IO manager should not have any bearing here because file code
   doesn't actually ever use it (epoll() doesn't work for files)
   - does the difference persist when the file size gets bigger?
   - your file descriptor code doesn't handle EINTR properly, although you
   said you checked that the file copy is being done?
   - Copying a 1MB file in 1ms gives a throughput of ~1GB/s. The other
   methods have a more believable ~70MB/s throughput.
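
Gregory's EINTR point is about the raw read()/write() loop. A minimal POSIX C sketch of a loop that handles it, assuming a caller-supplied buffer (`copy_fd` is a hypothetical name, not taken from Michael's PosixFile.hsc or Gregory's gist). In a GHC program the runtime's periodic timer signal is one plausible source of interrupted system calls, which is why the retry can matter even in a benchmark that sends no signals of its own.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from in_fd to out_fd through buf.
   read() and write() can fail with EINTR when a signal arrives, and write()
   can accept fewer bytes than requested, so both cases are retried rather
   than treated as errors or silently dropped.
   Returns 0 on success, -1 on error (errno set). */
static int copy_fd(int in_fd, int out_fd, char *buf, size_t bufsize)
{
    if (bufsize == 0) {
        errno = EINVAL;
        return -1;
    }
    for (;;) {
        ssize_t n = read(in_fd, buf, bufsize);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: just retry */
            return -1;
        }
        if (n == 0)
            return 0;                   /* end of input */

        size_t off = 0;
        while (off < (size_t)n) {
            ssize_t w = write(out_fd, buf + off, (size_t)n - off);
            if (w < 0) {
                if (errno == EINTR)
                    continue;           /* retry the remaining bytes */
                return -1;
            }
            off += (size_t)w;           /* short write: loop for the rest */
        }
    }
}
```

A real benchmark would pass a buffer of tens of kilobytes or more; the buffer size only changes how many system calls are made, not the result.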

G


-- 
Gregory Collins g...@gregorycollins.net


Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-08 Thread John Lato
I'd like to point out that it's entirely possible to get good performance
out of a Handle. The iteratee package has had both FD- and Handle-based
IO for a while, and I've never observed any serious performance differences
between the two. Also, if I may be so bold, Michael's supercharged copy
speeds are on par with iteratee's performance using Handles:
http://www.tiresiaspress.us/io-benchmarks.html

So while there's definitely something interesting going on here, I think it
needs a bit more investigation before suggesting that Handles should be
avoided.

For comparison, on my system I get
$ time cp input.dat output.dat

real 0m0.004s
user 0m0.000s
sys 0m0.000s

so the throughput observed in the faster runs is entirely reasonable.

John L.




Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-08 Thread Gregory Collins
On Fri, Mar 8, 2013 at 9:48 AM, John Lato jwl...@gmail.com wrote:

 For comparison, on my system I get
 $ time cp input.dat output.dat

 real 0m0.004s
 user 0m0.000s
 sys 0m0.000s


Does your workstation have an SSD? Michael's using a spinning disk.


-- 
Gregory Collins g...@gregorycollins.net


Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-08 Thread Alexander Kjeldaas
On Fri, Mar 8, 2013 at 9:53 AM, Gregory Collins g...@gregorycollins.net wrote:

 On Fri, Mar 8, 2013 at 9:48 AM, John Lato jwl...@gmail.com wrote:

 For comparison, on my system I get
 $ time cp input.dat output.dat

 real 0m0.004s
 user 0m0.000s
 sys 0m0.000s


 Does your workstation have an SSD? Michael's using a spinning disk.


If you're only copying a GB or so, it should only be memory traffic.

Alexander





Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-08 Thread Gregory Collins
Something must be wrong with the conduit NoHandle code. I increased the
file size to 60MB and implemented the copy loop in pure C; the code and
results are here:

https://gist.github.com/gregorycollins/5115491

Everything but the conduit NoHandle code runs in roughly 600-620ms,
including the pure C version.
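
For scale, the 600-620ms timings for the 60MB file work out to roughly 100MB/s, about an order of magnitude below the ~1GB/s of a fully cached copy, which is consistent with real I/O (or write-back) being a large part of what is measured at this size:

```latex
\frac{60\ \text{MB}}{0.60\ \text{s}} = 100\ \text{MB/s},
\qquad
\frac{60\ \text{MB}}{0.62\ \text{s}} \approx 97\ \text{MB/s}
```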

G


-- 
Gregory Collins g...@gregorycollins.net


Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-08 Thread Michael Snoyman
That demonstrated the issue: I'd forgotten to pass O_TRUNC to the open
system call. Adding that back makes the numbers much more comparable.

Thanks for the input everyone, and Gregory for finding the actual problem
(as well as pointing out a few other improvements).




Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-08 Thread Simon Marlow
1GB/s for copying a file is reasonable - it's around half the memory 
bandwidth, so copying the data twice would give that result (assuming no 
actual I/O is taking place, which is what you want because actual I/O 
will swamp any differences at the software level).
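
Gregory's 1MB-in-1ms figure and the "half the memory bandwidth" remark are the same back-of-the-envelope arithmetic. As a rough model (an assumption, not Simon's exact words), write B for the memory bandwidth available to one pass over the data; reading from the page cache into a user buffer and then writing it back out are two passes, so a fully cached copy can go no faster than about B/2:

```latex
\text{throughput} = \frac{1\ \text{MB}}{1\ \text{ms}} = 1000\ \text{MB/s} \approx 1\ \text{GB/s},
\qquad
\text{throughput}_{\text{cached copy}} \lesssim \frac{B}{2}
```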


The Handle overhead should be negligible if you're only using 
hGetBufSome and hPutBuf, because those functions basically just call 
read() and write() when the amount of data is larger than the buffer size.


There's clearly something suspicious going on here; unfortunately I
don't have time right now to investigate, but I'll keep an eye on the
thread.


Cheers,
Simon



[Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-07 Thread Michael Snoyman
Hi all,

I'm turning to the community for some help understanding some benchmark
results[1]. I was curious to see how the new io-streams would work with
conduit, as it looks like a far saner low-level approach than Handles. In
fact, the API is so simple that the entire wrapper is just a few lines of
code[2].

I then added in some basic file copy benchmarks, comparing conduit+Handle
(with ResourceT or bracket), conduit+io-streams, straight io-streams, and
lazy I/O. All approaches fell into the same ballpark, with conduit+bracket
and conduit+io-streams taking a slight lead. (I haven't analyzed that
enough to know if it means anything, however.)

Then I decided to pull up the NoHandle code I wrote a while ago for
conduit. This code was written initially for Windows only, to work around
the fact that System.IO.openFile does some file locking. To avoid using
Handles, I wrote a simple FFI wrapper exposing open, read, and close system
calls, ported it to POSIX, and hid it behind a Cabal flag. Out of
curiosity, I decided to expose it and include it in the benchmark.

The results are extreme: the NoHandle code is dramatically faster than
everything else. I've confirmed multiple times that the copy algorithm is in
fact copying the file, so I don't think the test itself is cheating somehow.
But I don't know how to explain the massive gap. I've run this on two
different systems. The results you see linked are from my local machine. On
an EC2 instance, the gap was a bit smaller, but the NoHandle code was still
75% faster than the others.

My initial guess is that I'm not properly tying into the IO manager, but I
wanted to see if the community had any thoughts. The relevant pieces of
code are [3][4][5].

Michael

[1] http://static.snoyman.com/streams.html
[2]
https://github.com/snoyberg/conduit/blob/streams/io-streams-conduit/Data/Conduit/Streams.hs
[3]
https://github.com/snoyberg/conduit/blob/streams/conduit/System/PosixFile.hsc
[4]
https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L54
[5]
https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L167


Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-07 Thread Michael Snoyman
One clarification: it seems that sourceFile and sourceFileNoHandle have
virtually no difference in speed. The gap comes exclusively from sinkFile
vs sinkFileNoHandle. This makes me think that it might be a buffer copy
that's causing the slowdown, in which case the benchmark may in fact be
accurate.
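
To make "a buffer copy" concrete, a minimal POSIX C sketch of two sinks (not conduit's or GHC's actual sink code; `write_all`, `sink_direct`, and `sink_staged` are hypothetical names). The first hands each incoming chunk straight to write(); the second first memcpy()s it into a staging buffer, which costs one extra pass over the data per chunk. Both produce identical output, so only timing would reveal the difference.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all len bytes, retrying on EINTR and short writes.
   Returns 0 on success, -1 on error (errno set). */
static int write_all(int fd, const char *p, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Sink without an intermediate buffer: the chunk goes straight to write(). */
static int sink_direct(int fd, const char *chunk, size_t len)
{
    return write_all(fd, chunk, len);
}

/* Sink with an intermediate buffer: each piece of the chunk is first copied
   into staging and then written, costing one extra pass over the data. */
static int sink_staged(int fd, char *staging, size_t staging_size,
                       const char *chunk, size_t len)
{
    if (staging_size == 0) {
        errno = EINVAL;
        return -1;
    }
    while (len > 0) {
        size_t n = len < staging_size ? len : staging_size;
        memcpy(staging, chunk, n);      /* the extra copy */
        if (write_all(fd, staging, n) != 0)
            return -1;
        chunk += n;
        len -= n;
    }
    return 0;
}
```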


Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)

2013-03-07 Thread John Lato
I would have expected sourceFileNoHandle to make the most difference, since
that's one location (write) where you've obviously removed a copy. Does
sourceFileNoHandle allocate less?

Incidentally, I've recently been making similar changes to IO code
(removing buffer copies) and getting similar speedups.  Although the
results tend to be less pronounced in code that isn't strictly IO-bound.


On Fri, Mar 8, 2013 at 2:50 PM, Michael Snoyman mich...@snoyman.com wrote:

 One clarification: it seems that sourceFile and sourceFileNoHandle have
 virtually no difference in speed. The gap comes exclusively from sinkFile
 vs sinkFileNoHandle. This makes me think that it might be a buffer copy
 that's causing the slowdown, in which case the benchmark may in fact be
 accurate.
 On Mar 8, 2013 8:30 AM, Michael Snoyman mich...@snoyman.com wrote:

 Hi all,

 I'm turning to the community for some help understanding some benchmark
 results[1]. I was curious to see how the new io-streams would work with
 conduit, as it looks like a far saner low-level approach than Handles. In
 fact, the API is so simple that the entire wrapper is just a few lines of
 code[2].

 I then added in some basic file copy benchmarks, comparing conduit+Handle
 (with ResourceT or bracket), conduit+io-streams, straight io-streams, and
 lazy I/O. All approaches fell into the same ballpark, with conduit+bracket
 and conduit+io-streams taking a slight lead. (I haven't analyzed that
 enough to know if it means anything, however.)

 Then I decided to pull up the NoHandle code I wrote a while ago for
 conduit. This code was written initially for Windows only, to work around
 the fact that System.IO.openFile does some file locking. To avoid using
 Handles, I wrote a simple FFI wrapper exposing open, read, and close system
 calls, ported it to POSIX, and hid it behind a Cabal flag. Out of
 curiosity, I decided to expose it and include it in the benchmark.

 The results are extreme. I've confirmed multiple times that the copy
 algorithm is in fact copying the file, so I don't think the test itself is
 cheating somehow. But I don't know how to explain the massive gap. I've run
 this on two different systems. The results you see linked are from my local
 machine. On an EC2 instance, the gap was a bit smaller, but the NoHandle
 code was still 75% faster than the others.

 My initial guess is that I'm not properly tying into the IO manager, but
 I wanted to see if the community had any thoughts. The relevant pieces of
 code are [3][4][5].

 Michael

 [1] http://static.snoyman.com/streams.html
 [2]
 https://github.com/snoyberg/conduit/blob/streams/io-streams-conduit/Data/Conduit/Streams.hs
 [3]
 https://github.com/snoyberg/conduit/blob/streams/conduit/System/PosixFile.hsc
 [4]
 https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L54
 [5]
 https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L167






Re: [Haskell-cafe] File I/O question

2008-03-14 Thread John Melesky

On Mar 12, 2008, at 4:07 PM, Andrew Coppin wrote:
I'm trying to read the file from Notepad.exe while my Haskell  
program is still running - which takes about an hour.


I'm not a Windows user, but... Is it possible that Notepad tries to  
write-lock by default (since it's an editor), and fails? Put another  
way, have you tried other ways of reading the file? Heck, copying the  
file should be a read-only action. Can you copy it during the runtime,  
and open the copy?


-johnnn



Re: [Haskell-cafe] File I/O question

2008-03-14 Thread Andrew Coppin

John Melesky wrote:

On Mar 12, 2008, at 4:07 PM, Andrew Coppin wrote:
I'm trying to read the file from Notepad.exe while my Haskell program 
is still running - which takes about an hour.


I'm not a Windows user, but... Is it possible that Notepad tries to 
write-lock by default (since it's an editor), and fails?


Notepad successfully opens the log file of another script I'm running, 
so that's not the issue.




Re: [Haskell-cafe] File I/O question

2008-03-13 Thread Bjorn Bringert
On Wed, Mar 12, 2008 at 10:03 PM, Andrew Coppin
[EMAIL PROTECTED] wrote:
 Don Stewart wrote:
   Hey Andrew,
  
   What are you trying to do? Read and write to the same file (if so, you
   need to use strict IO), or are you trying something sneakier?
  

  I have a long-running Haskell program that writes status information to
  a log file. I'd like to be able to open and read that log file before
  the program has actually terminated. I have a similar program written in
  Tcl that allows me to do this, since apparently the Tcl interpreter
  doesn't lock output files for exclusive access. Haskell, however, does.
  (This seems to be the stipulated behaviour as per the Report.) If
  there's an easy way to change this, it would be useful...

How about using appendFile?

appendFile :: FilePath -> String -> IO ()

The computation appendFile file str appends the string str to the
file file.

See 
http://haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html#v%3AappendFile
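Here is a minimal sketch of the appendFile approach, assuming one line per status update. The names logStatus and runJob are hypothetical. Each call opens the file in AppendMode, writes, and closes it again, so the program only holds the file open for a moment per update. Whether that is enough for Notepad to open the log on Windows is an assumption worth testing. The sketch does not guarantee it.

```haskell
import Control.Monad (forM_)

-- | Append one status line to the log. appendFile opens the file,
-- writes the text and closes it again, so the file is only held open
-- for the duration of each call.
logStatus :: FilePath -> String -> IO ()
logStatus path msg = appendFile path (msg ++ "\n")

-- | Hypothetical long-running job that reports progress after each step.
runJob :: FilePath -> Int -> IO ()
runJob logPath steps =
  forM_ [1 .. steps] $ \i -> do
    -- the real work for step i would go here
    logStatus logPath ("finished step " ++ show i)
```

The trade-off is one open/close per line. For a status log written every few seconds, that is likely negligible.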

/Björn


Re: [Haskell-cafe] File I/O question

2008-03-13 Thread Bulat Ziganshin
Hello Andrew,

Wednesday, March 12, 2008, 10:06:44 PM, you wrote:

 When I write to a file using System.IO, the file is locked for exclusive
 access. I gather this is as specified in the Haskell Report. Which is 
 nice, but... I'd actually prefer the file to *not* be locked. Anybody 
 know how to do that?

one (and only?) possible way is to use the Streams library, which happens
not to lock files:

http://haskell.org/haskellwiki/Library/Streams

-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]



Re: [Haskell-cafe] File I/O question

2008-03-13 Thread Andrew Coppin

Bulat Ziganshin wrote:

Hello Andrew,

one (and only?) possible way is to use Streams library which happens
to not lock files:
  


Is that likely to compile on Windows?



[Haskell-cafe] File I/O question

2008-03-12 Thread Andrew Coppin

Hi Cafe.

There's good news and there's bad news.

The bad news is... I'm back. [Did I miss anything good?]

The good news is... I have an actual question to ask as well.

When I write to a file using System.IO, the file is locked for exclusive 
access. I gather this is as specified in the Haskell Report. Which is 
nice, but... I'd actually prefer the file to *not* be locked. Anybody 
know how to do that?




Re: [Haskell-cafe] File I/O question

2008-03-12 Thread Don Stewart
andrewcoppin:
 Hi Cafe.
 
 There's good news and there's bad news.
 
 The bad news is... I'm back. [Did I miss anything good?]
 
 The good news is... I have an actual question to ask as well.
 
 When I write to a file using System.IO, the file is locked for exclusive 
 access. I gather this is as specified in the Haskell Report. Which is 
 nice, but... I'd actually prefer the file to *not* be locked. Anybody 
 know how to do that?

Hey Andrew,

What are you trying to do? Read and write to the same file (if so, you
need to use strict IO), or are you trying something sneakier?
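To illustrate the strict-IO point, here is a minimal sketch of reading and then rewriting the same file with the Prelude's lazy readFile. The name modifyFileStrict is hypothetical. The key step is forcing the entire contents before writeFile reopens the path.

```haskell
-- | Rewrite a file in place: read it, apply a pure function, and write
-- the result back to the same path.
modifyFileStrict :: FilePath -> (String -> String) -> IO ()
modifyFileStrict path f = do
  contents <- readFile path
  -- Force the whole lazy string first. Once the end of the file is
  -- reached, the lazily-read Handle closes itself. Without this step,
  -- writeFile would try to open a file that is still open for reading,
  -- and GHC would reject it as locked.
  length contents `seq` writeFile path (f contents)
```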

-- Don


Re: [Haskell-cafe] File I/O question

2008-03-12 Thread Andrew Coppin

Don Stewart wrote:

Hey Andrew,

What are you trying to do? Read and write to the same file (if so, you
need to use strict IO), or are you trying something sneakier?
  


I have a long-running Haskell program that writes status information to 
a log file. I'd like to be able to open and read that log file before 
the program has actually terminated. I have a similar program written in 
Tcl that allows me to do this, since apparently the Tcl interpreter 
doesn't lock output files for exclusive access. Haskell, however, does. 
(This seems to be the stipulated behaviour as per the Report.) If 
there's an easy way to change this, it would be useful...




Re: [Haskell-cafe] File I/O question

2008-03-12 Thread Don Stewart
andrewcoppin:
 Don Stewart wrote:
 Hey Andrew,
 
 What are you trying to do? Read and write to the same file (if so, you
 need to use strict IO), or are you trying something sneakier?
   
 
 I have a long-running Haskell program that writes status information to 
 a log file. I'd like to be able to open and read that log file before 
 the program has actually terminated. I have a similar program written in 
 Tcl that allows me to do this, since apparently the Tcl interpreter 
 doesn't lock output files for exclusive access. Haskell, however, does. 
 (This seems to be the stipulated behaviour as per the Report.) If 
 there's an easy way to change this, it would be useful...

Did you open the file in ReadWriteMode?

-- Don


Re: [Haskell-cafe] File I/O question

2008-03-12 Thread Andrew Coppin

Don Stewart wrote:

andrewcoppin:

Don Stewart wrote:

Hey Andrew,

What are you trying to do? Read and write to the same file (if so, you
need to use strict IO), or are you trying something sneakier?

I have a long-running Haskell program that writes status information to
a log file. I'd like to be able to open and read that log file before
the program has actually terminated. I have a similar program written in
Tcl that allows me to do this, since apparently the Tcl interpreter
doesn't lock output files for exclusive access. Haskell, however, does.
(This seems to be the stipulated behaviour as per the Report.) If
there's an easy way to change this, it would be useful...

Did you open the file in ReadWriteMode?

Nope. Just WriteMode. I'm trying to read the file from Notepad.exe while 
my Haskell program is still running - which takes about an hour.




Re: [Haskell-cafe] File I/O question

2008-03-12 Thread Don Stewart
andrewcoppin:
 
 Nope. Just WriteMode. I'm trying to read the file from Notepad.exe while 
 my Haskell program is still running - which takes about an hour.
 

Oh, you want another process in the system to read the file while GHC is
writing to it? This works fine on Unix systems -- and perhaps Neil, or
one of the other Windows experts, can explain what the story is on Windows.

-- Don


Re: [Haskell-cafe] File I/O question

2008-03-12 Thread Andrew Coppin

Don Stewart wrote:

Oh, you want another process in the system to read the file while GHC is
writing to it?


That's the one. ;-)

[Well, not GHC but my GHC-compiled binary, but anyway...]


This works fine on Unix systems -- and perhaps Neil, or
one of the other Windows experts, can explain what the story is on Windows.
  


I thought the Report... wait, let me check...

Oh, OK, that's not what I thought the Report says. Section 21.2.3, File 
Locking. I thought it says that Haskell is supposed to prevent access 
to the same file by multiple threads. (And that, presumably, is why it's 
using an exclusive lock to try to implement these semantics under the 
Win32 API.) However, that's apparently not what it says... (It says 
multiple reader / single writer.)


So... does this count as a bug then?
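Here is a small, GHC-specific probe of the multiple-reader / single-writer rule as seen from inside one process. The file and function names are hypothetical. It only shows what the Haskell runtime enforces on its own Handles. It says nothing about how another program such as Notepad is treated, which depends on how the file is opened at the OS level.

```haskell
import Control.Exception (try)
import System.IO
import System.IO.Error (isAlreadyInUseError)

-- | While a file is open for writing, try to open a second write Handle
-- on it from the same process. True means the runtime refused with an
-- "already in use" error, i.e. it enforced the single-writer rule.
secondWriterRefused :: FilePath -> IO Bool
secondWriterRefused path =
  withFile path WriteMode $ \_ -> do
    r <- try (openFile path WriteMode) :: IO (Either IOError Handle)
    case r of
      Left e  -> return (isAlreadyInUseError e)
      Right h -> hClose h >> return False

-- | While a file is open for reading, open it for reading again.
-- True means multiple readers were allowed.
secondReaderAllowed :: FilePath -> IO Bool
secondReaderAllowed path =
  withFile path ReadMode $ \_ -> do
    r <- try (openFile path ReadMode) :: IO (Either IOError Handle)
    case r of
      Left _  -> return False
      Right h -> hClose h >> return True
```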



[Haskell-cafe] file i/o

2006-01-03 Thread Robert Heffernan
Hi,

I am relatively new to Haskell and am finding I/O difficult to work
with. I am trying to do something like the following:

I have a file of data, each line of which looks like this:
STRING,  INTEGER SEQUENCE
for example:
FOO ,2,1,4,3,6,7,5,9,10,11,8,13,12,

I would like to write a function the reads this file and returns
arrays like this:
["FOO",[2,1,4,3,6,7,5,9,10,11,8,13,12]]

Ideally, the function would return the first line when initially
called, the second the next time it is called and so on.  I would
settle for something that returned a big array comprising arrays of
the above type containing all the information in the file.  The file
is big, however.

I can not figure out how to do this from any of the tutorials so I
thought I might ask here.

Thank you for your help,
Robert.


Re: [Haskell-cafe] file i/o

2006-01-03 Thread Neil Mitchell
Hi Robert,

The first thing to mention is that Haskell uses linked lists, not
arrays, as its standard list structure, so [1,2] is actually a
linked list.

The next thing to note is that Haskell is *lazy*. It won't do work
that it doesn't have to. This means that you can return a linked list
with all the lines in the file, but they won't actually be read until
they are required. i.e. Haskell cleverly takes care of all the "get the
next line when it's needed" bookkeeping without you even noticing -
it will read the file incrementally, as the lines are demanded.
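Here is a minimal sketch that makes the laziness point concrete. The name countLinesWhere is hypothetical. It streams through the file with the lazy readFile and counts matching lines without first building the whole list of lines.

```haskell
-- | Count the lines of a file that satisfy a predicate. readFile returns
-- its contents lazily, so the file is read in chunks as 'lines' asks for
-- more text. The whole file typically never needs to be in memory at once.
countLinesWhere :: (String -> Bool) -> FilePath -> IO Int
countLinesWhere p path = do
  src <- readFile path
  -- ($!) forces the count now, which also reads the file to the end.
  return $! length (filter p (lines src))
```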

A simple function that does some of what you want is:
 parseFile :: FilePath -> IO [(String, [Int])]
 parseFile x = do src <- readFile x
                  return (map parseLine (lines src))

 parseLine :: String -> (String, [Int])
 parseLine = undefined -- for you to write :)

The other point is that Haskell linked lists have to have every
element of the same type, so you can't have ["test",1] as a linked
list; what you actually want is a tuple, written ("test",1) - a tuple
is of fixed length and its elements can be of different types.
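Here is one possible parseLine, written as a sketch for the format in the original question: a name, then comma-separated integers, with a trailing comma. The helper splitCommas is hypothetical. Note that read will throw an error on any field that is not an integer.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- | Split a string on commas, keeping empty fields:
-- splitCommas "a,b,,c" == ["a","b","","c"].
splitCommas :: String -> [String]
splitCommas s = case break (== ',') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitCommas rest

-- | Parse a line such as "FOO ,2,1,4,3," into ("FOO",[2,1,4,3]).
-- The first field is the name; every remaining non-empty field is read
-- as an Int. Surrounding whitespace (including a stray '\r') is trimmed,
-- and the empty field left by a trailing comma is dropped.
parseLine :: String -> (String, [Int])
parseLine line =
  case map trim (splitCommas line) of
    (name : nums) -> (name, map read (filter (not . null) nums))
    []            -> ("", [])  -- unreachable: splitCommas never returns []
  where
    trim = dropWhileEnd isSpace . dropWhile isSpace
```

Plugged into the parseFile function above, the example line from the question comes back as ("FOO",[2,1,4,3,6,7,5,9,10,11,8,13,12]).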

Hope that helps,

Neil


Re: [Haskell-cafe] file i/o

2006-01-03 Thread Thomas Davie
The other thing to mention, is that if you have the ability to change  
file formats, it may be better to make just a slight adjustment... If  
you make it look exactly like the Haskell data structure you want:


[("Foo", [1,2,3,4,5,6,7])
,("Bar", [7,6,5,4,3,2,1])
,...]

Then your parser becomes even simpler:

parseFile :: FilePath -> IO [(String,[Int])]
parseFile x = do src <- readFile x
                 return $ read src
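If the file might be malformed, a variant of the read-based approach can swap in readMaybe from Text.Read (base 4.6 and later). It returns Nothing instead of crashing. The names parseTable and parseFileSafe are hypothetical.

```haskell
import Text.Read (readMaybe)

-- | Parse text written in Haskell list syntax, such as
-- [("Foo",[1,2,3]),("Bar",[3,2,1])]. Unlike read, readMaybe returns
-- Nothing on malformed input instead of throwing an error.
parseTable :: String -> Maybe [(String, [Int])]
parseTable = readMaybe

-- | File-reading wrapper around parseTable.
parseFileSafe :: FilePath -> IO (Maybe [(String, [Int])])
parseFileSafe path = fmap parseTable (readFile path)
```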





Re: [Haskell-cafe] file i/o

2006-01-03 Thread Robert Heffernan
Neil and Thomas,

Thanks to both of you for your help.  I have things working now.

Bob