Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)
On Fri, Mar 8, 2013 at 6:36 PM, Simon Marlow marlo...@gmail.com wrote: 1GB/s for copying a file is reasonable - it's around half the memory bandwidth, so copying the data twice would give that result (assuming no actual I/O is taking place, which is what you want because actual I/O will swamp any differences at the software level). The Handle overhead should be negligible if you're only using hGetBufSome and hPutBuf, because those functions basically just call read() and write() when the amount of data is larger than the buffer size. There's clearly something suspicious going on here, unfortunately I don't have time right now to investigate, but I'll keep an eye on the thread. Possibly disk caching/syncing issues? If some of the tests are able to either read entirely from cache (on the 1MB test), or don't completely sync after the write, they could happen much faster than others that have to actually hit the disk. For the 60MB test, it's almost guaranteed that actual IO would take place and dominate the timings. John L. Cheers, Simon On 08/03/13 08:36, Gregory Collins wrote: +Simon Marlow A couple of comments: * maybe we shouldn't back the file by a Handle. io-streams does this by default out of the box; I had a posix file interface for unix (guarded by CPP) for a while but decided to ditch it for simplicity. If your results are correct, given how slow going by Handle seems to be I may revisit this, I figured it would be good enough. * io-streams turns Handle buffering off in withFileAsOutput. So the difference shouldn't be as a result of buffering. Simon: is this an expected result? I presume you did some Handle debugging? * the IO manager should not have any bearing here because file code doesn't actually ever use it (epoll() doesn't work for files) * does the difference persist when the file size gets bigger? * your file descriptor code doesn't handle EINTR properly, although you said you checked that the file copy is being done? * Copying a 1MB file in 1ms gives a throughput of ~1GB/s. The other methods have a more believable ~70MB/s throughput. G On Fri, Mar 8, 2013 at 7:30 AM, Michael Snoyman mich...@snoyman.com mailto:mich...@snoyman.com wrote: Hi all, I'm turning to the community for some help understanding some benchmark results[1]. I was curious to see how the new io-streams would work with conduit, as it looks like a far saner low-level approach than Handles. In fact, the API is so simple that the entire wrapper is just a few lines of code[2]. I then added in some basic file copy benchmarks, comparing conduit+Handle (with ResourceT or bracket), conduit+io-streams, straight io-streams, and lazy I/O. All approaches fell into the same ballpark, with conduit+bracket and conduit+io-streams taking a slight lead. (I haven't analyzed that enough to know if it means anything, however.) Then I decided to pull up the NoHandle code I wrote a while ago for conduit. This code was written initially for Windows only, to work around the fact that System.IO.openFile does some file locking. To avoid using Handles, I wrote a simple FFI wrapper exposing open, read, and close system calls, ported it to POSIX, and hid it behind a Cabal flag. Out of curiosity, I decided to expose it and include it in the benchmark. The results are extreme. I've confirmed multiple times that the copy algorithm is in fact copying the file, so I don't think the test itself is cheating somehow. But I don't know how to explain the massive gap. I've run this on two different systems. The results you see linked are from my local machine. On an EC2 instance, the gap was a bit smaller, but the NoHandle code was still 75% faster than the others. My initial guess is that I'm not properly tying into the IO manager, but I wanted to see if the community had any thoughts. The relevant pieces of code are [3][4][5]. Michael [1] http://static.snoyman.com/**streams.htmlhttp://static.snoyman.com/streams.html [2] https://github.com/snoyberg/**conduit/blob/streams/io-** streams-conduit/Data/Conduit/**Streams.hshttps://github.com/snoyberg/conduit/blob/streams/io-streams-conduit/Data/Conduit/Streams.hs [3] https://github.com/snoyberg/**conduit/blob/streams/conduit/** System/PosixFile.hschttps://github.com/snoyberg/conduit/blob/streams/conduit/System/PosixFile.hsc [4] https://github.com/snoyberg/**conduit/blob/streams/conduit/** Data/Conduit/Binary.hs#L54https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L54 [5] https://github.com/snoyberg/**conduit/blob/streams/conduit/** Data/Conduit/Binary.hs#L167https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L167
Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)
Just to clarify: the problem was in fact with my code, I was not passing O_TRUNC to the open system call. Gregory's C code showed me the problem. Once I add in that option, all the different benchmarks complete in roughly the same amount of time. So given that our Haskell implementations based on Handle are just about as fast as a raw C implementation, I'd say Handle is performing very well. Apologies if I got anyone overly concerned. On Fri, Mar 8, 2013 at 12:36 PM, Simon Marlow marlo...@gmail.com wrote: 1GB/s for copying a file is reasonable - it's around half the memory bandwidth, so copying the data twice would give that result (assuming no actual I/O is taking place, which is what you want because actual I/O will swamp any differences at the software level). The Handle overhead should be negligible if you're only using hGetBufSome and hPutBuf, because those functions basically just call read() and write() when the amount of data is larger than the buffer size. There's clearly something suspicious going on here, unfortunately I don't have time right now to investigate, but I'll keep an eye on the thread. Cheers, Simon On 08/03/13 08:36, Gregory Collins wrote: +Simon Marlow A couple of comments: * maybe we shouldn't back the file by a Handle. io-streams does this by default out of the box; I had a posix file interface for unix (guarded by CPP) for a while but decided to ditch it for simplicity. If your results are correct, given how slow going by Handle seems to be I may revisit this, I figured it would be good enough. * io-streams turns Handle buffering off in withFileAsOutput. So the difference shouldn't be as a result of buffering. Simon: is this an expected result? I presume you did some Handle debugging? * the IO manager should not have any bearing here because file code doesn't actually ever use it (epoll() doesn't work for files) * does the difference persist when the file size gets bigger? * your file descriptor code doesn't handle EINTR properly, although you said you checked that the file copy is being done? * Copying a 1MB file in 1ms gives a throughput of ~1GB/s. The other methods have a more believable ~70MB/s throughput. G On Fri, Mar 8, 2013 at 7:30 AM, Michael Snoyman mich...@snoyman.com mailto:mich...@snoyman.com wrote: Hi all, I'm turning to the community for some help understanding some benchmark results[1]. I was curious to see how the new io-streams would work with conduit, as it looks like a far saner low-level approach than Handles. In fact, the API is so simple that the entire wrapper is just a few lines of code[2]. I then added in some basic file copy benchmarks, comparing conduit+Handle (with ResourceT or bracket), conduit+io-streams, straight io-streams, and lazy I/O. All approaches fell into the same ballpark, with conduit+bracket and conduit+io-streams taking a slight lead. (I haven't analyzed that enough to know if it means anything, however.) Then I decided to pull up the NoHandle code I wrote a while ago for conduit. This code was written initially for Windows only, to work around the fact that System.IO.openFile does some file locking. To avoid using Handles, I wrote a simple FFI wrapper exposing open, read, and close system calls, ported it to POSIX, and hid it behind a Cabal flag. Out of curiosity, I decided to expose it and include it in the benchmark. The results are extreme. I've confirmed multiple times that the copy algorithm is in fact copying the file, so I don't think the test itself is cheating somehow. But I don't know how to explain the massive gap. I've run this on two different systems. The results you see linked are from my local machine. On an EC2 instance, the gap was a bit smaller, but the NoHandle code was still 75% faster than the others. My initial guess is that I'm not properly tying into the IO manager, but I wanted to see if the community had any thoughts. The relevant pieces of code are [3][4][5]. Michael [1] http://static.snoyman.com/**streams.htmlhttp://static.snoyman.com/streams.html [2] https://github.com/snoyberg/**conduit/blob/streams/io-** streams-conduit/Data/Conduit/**Streams.hshttps://github.com/snoyberg/conduit/blob/streams/io-streams-conduit/Data/Conduit/Streams.hs [3] https://github.com/snoyberg/**conduit/blob/streams/conduit/** System/PosixFile.hschttps://github.com/snoyberg/conduit/blob/streams/conduit/System/PosixFile.hsc [4] https://github.com/snoyberg/**conduit/blob/streams/conduit/** Data/Conduit/Binary.hs#L54https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L54 [5] https://github.com/snoyberg/**conduit/blob/streams/conduit/**
Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)
On Fri, Mar 8, 2013 at 9:36 AM, Gregory Collins g...@gregorycollins.netwrote: I presume you did some Handle debugging? ...and here I mean benchmarking of course. -- Gregory Collins g...@gregorycollins.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)
+Simon Marlow A couple of comments: - maybe we shouldn't back the file by a Handle. io-streams does this by default out of the box; I had a posix file interface for unix (guarded by CPP) for a while but decided to ditch it for simplicity. If your results are correct, given how slow going by Handle seems to be I may revisit this, I figured it would be good enough. - io-streams turns Handle buffering off in withFileAsOutput. So the difference shouldn't be as a result of buffering. Simon: is this an expected result? I presume you did some Handle debugging? - the IO manager should not have any bearing here because file code doesn't actually ever use it (epoll() doesn't work for files) - does the difference persist when the file size gets bigger? - your file descriptor code doesn't handle EINTR properly, although you said you checked that the file copy is being done? - Copying a 1MB file in 1ms gives a throughput of ~1GB/s. The other methods have a more believable ~70MB/s throughput. G On Fri, Mar 8, 2013 at 7:30 AM, Michael Snoyman mich...@snoyman.com wrote: Hi all, I'm turning to the community for some help understanding some benchmark results[1]. I was curious to see how the new io-streams would work with conduit, as it looks like a far saner low-level approach than Handles. In fact, the API is so simple that the entire wrapper is just a few lines of code[2]. I then added in some basic file copy benchmarks, comparing conduit+Handle (with ResourceT or bracket), conduit+io-streams, straight io-streams, and lazy I/O. All approaches fell into the same ballpark, with conduit+bracket and conduit+io-streams taking a slight lead. (I haven't analyzed that enough to know if it means anything, however.) Then I decided to pull up the NoHandle code I wrote a while ago for conduit. This code was written initially for Windows only, to work around the fact that System.IO.openFile does some file locking. To avoid using Handles, I wrote a simple FFI wrapper exposing open, read, and close system calls, ported it to POSIX, and hid it behind a Cabal flag. Out of curiosity, I decided to expose it and include it in the benchmark. The results are extreme. I've confirmed multiple times that the copy algorithm is in fact copying the file, so I don't think the test itself is cheating somehow. But I don't know how to explain the massive gap. I've run this on two different systems. The results you see linked are from my local machine. On an EC2 instance, the gap was a bit smaller, but the NoHandle code was still 75% faster than the others. My initial guess is that I'm not properly tying into the IO manager, but I wanted to see if the community had any thoughts. The relevant pieces of code are [3][4][5]. Michael [1] http://static.snoyman.com/streams.html [2] https://github.com/snoyberg/conduit/blob/streams/io-streams-conduit/Data/Conduit/Streams.hs [3] https://github.com/snoyberg/conduit/blob/streams/conduit/System/PosixFile.hsc [4] https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L54 [5] https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L167 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Gregory Collins g...@gregorycollins.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)
I'd like to point out that it's entirely possible to get good performance out of a handle. The iteratee package has had both FD and Handle-based IO for a while, and I've never observed any serious performance differences between the two. Also, if I may be so bold, Michael's supercharged copy speeds are on par with iteratee's performance using Handles: http://www.tiresiaspress.us/io-benchmarks.html So while there's definitely something interesting going on here, I think it needs a bit more investigation before suggesting that Handles should be avoided. For comparison, on my system I get $ time cp input.dat output.dat real 0m0.004s user 0m0.000s sys 0m0.000s so the throughput observed on the faster times is entirely reasonable. John L. On Fri, Mar 8, 2013 at 4:36 PM, Gregory Collins g...@gregorycollins.netwrote: +Simon Marlow A couple of comments: - maybe we shouldn't back the file by a Handle. io-streams does this by default out of the box; I had a posix file interface for unix (guarded by CPP) for a while but decided to ditch it for simplicity. If your results are correct, given how slow going by Handle seems to be I may revisit this, I figured it would be good enough. - io-streams turns Handle buffering off in withFileAsOutput. So the difference shouldn't be as a result of buffering. Simon: is this an expected result? I presume you did some Handle debugging? - the IO manager should not have any bearing here because file code doesn't actually ever use it (epoll() doesn't work for files) - does the difference persist when the file size gets bigger? - your file descriptor code doesn't handle EINTR properly, although you said you checked that the file copy is being done? - Copying a 1MB file in 1ms gives a throughput of ~1GB/s. The other methods have a more believable ~70MB/s throughput. G On Fri, Mar 8, 2013 at 7:30 AM, Michael Snoyman mich...@snoyman.comwrote: Hi all, I'm turning to the community for some help understanding some benchmark results[1]. I was curious to see how the new io-streams would work with conduit, as it looks like a far saner low-level approach than Handles. In fact, the API is so simple that the entire wrapper is just a few lines of code[2]. I then added in some basic file copy benchmarks, comparing conduit+Handle (with ResourceT or bracket), conduit+io-streams, straight io-streams, and lazy I/O. All approaches fell into the same ballpark, with conduit+bracket and conduit+io-streams taking a slight lead. (I haven't analyzed that enough to know if it means anything, however.) Then I decided to pull up the NoHandle code I wrote a while ago for conduit. This code was written initially for Windows only, to work around the fact that System.IO.openFile does some file locking. To avoid using Handles, I wrote a simple FFI wrapper exposing open, read, and close system calls, ported it to POSIX, and hid it behind a Cabal flag. Out of curiosity, I decided to expose it and include it in the benchmark. The results are extreme. I've confirmed multiple times that the copy algorithm is in fact copying the file, so I don't think the test itself is cheating somehow. But I don't know how to explain the massive gap. I've run this on two different systems. The results you see linked are from my local machine. On an EC2 instance, the gap was a bit smaller, but the NoHandle code was still 75% faster than the others. My initial guess is that I'm not properly tying into the IO manager, but I wanted to see if the community had any thoughts. The relevant pieces of code are [3][4][5]. Michael [1] http://static.snoyman.com/streams.html [2] https://github.com/snoyberg/conduit/blob/streams/io-streams-conduit/Data/Conduit/Streams.hs [3] https://github.com/snoyberg/conduit/blob/streams/conduit/System/PosixFile.hsc [4] https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L54 [5] https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L167 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Gregory Collins g...@gregorycollins.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)
On Fri, Mar 8, 2013 at 9:48 AM, John Lato jwl...@gmail.com wrote: For comparison, on my system I get $ time cp input.dat output.dat real 0m0.004s user 0m0.000s sys 0m0.000s Does your workstation have an SSD? Michael's using a spinning disk. -- Gregory Collins g...@gregorycollins.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)
On Fri, Mar 8, 2013 at 9:53 AM, Gregory Collins g...@gregorycollins.netwrote: On Fri, Mar 8, 2013 at 9:48 AM, John Lato jwl...@gmail.com wrote: For comparison, on my system I get $ time cp input.dat output.dat real 0m0.004s user 0m0.000s sys 0m0.000s Does your workstation have an SSD? Michael's using a spinning disk. If you're only copying a GB or so, it should only be memory traffic. Alexander -- Gregory Collins g...@gregorycollins.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)
Something must be wrong with the conduit NoHandle code. I increased the filesize to 60MB and implemented the copy loop in pure C, the code and results are here: https://gist.github.com/gregorycollins/5115491 Everything but the conduit NoHandle code runs in roughly 600-620ms, including the pure C version. G On Fri, Mar 8, 2013 at 10:13 AM, Alexander Kjeldaas alexander.kjeld...@gmail.com wrote: On Fri, Mar 8, 2013 at 9:53 AM, Gregory Collins g...@gregorycollins.netwrote: On Fri, Mar 8, 2013 at 9:48 AM, John Lato jwl...@gmail.com wrote: For comparison, on my system I get $ time cp input.dat output.dat real 0m0.004s user 0m0.000s sys 0m0.000s Does your workstation have an SSD? Michael's using a spinning disk. If you're only copying a GB or so, it should only be memory traffic. Alexander -- Gregory Collins g...@gregorycollins.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Gregory Collins g...@gregorycollins.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)
That demonstrated the issue: I'd forgotten to pass O_TRUNC to the open system call. Adding that back makes the numbers much more comparable. Thanks for the input everyone, and Gregory for finding the actual problem (as well as pointing out a few other improvements). On Fri, Mar 8, 2013 at 12:13 PM, Gregory Collins g...@gregorycollins.netwrote: Something must be wrong with the conduit NoHandle code. I increased the filesize to 60MB and implemented the copy loop in pure C, the code and results are here: https://gist.github.com/gregorycollins/5115491 Everything but the conduit NoHandle code runs in roughly 600-620ms, including the pure C version. G On Fri, Mar 8, 2013 at 10:13 AM, Alexander Kjeldaas alexander.kjeld...@gmail.com wrote: On Fri, Mar 8, 2013 at 9:53 AM, Gregory Collins g...@gregorycollins.netwrote: On Fri, Mar 8, 2013 at 9:48 AM, John Lato jwl...@gmail.com wrote: For comparison, on my system I get $ time cp input.dat output.dat real 0m0.004s user 0m0.000s sys 0m0.000s Does your workstation have an SSD? Michael's using a spinning disk. If you're only copying a GB or so, it should only be memory traffic. Alexander -- Gregory Collins g...@gregorycollins.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Gregory Collins g...@gregorycollins.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)
1GB/s for copying a file is reasonable - it's around half the memory bandwidth, so copying the data twice would give that result (assuming no actual I/O is taking place, which is what you want because actual I/O will swamp any differences at the software level). The Handle overhead should be negligible if you're only using hGetBufSome and hPutBuf, because those functions basically just call read() and write() when the amount of data is larger than the buffer size. There's clearly something suspicious going on here, unfortunately I don't have time right now to investigate, but I'll keep an eye on the thread. Cheers, Simon On 08/03/13 08:36, Gregory Collins wrote: +Simon Marlow A couple of comments: * maybe we shouldn't back the file by a Handle. io-streams does this by default out of the box; I had a posix file interface for unix (guarded by CPP) for a while but decided to ditch it for simplicity. If your results are correct, given how slow going by Handle seems to be I may revisit this, I figured it would be good enough. * io-streams turns Handle buffering off in withFileAsOutput. So the difference shouldn't be as a result of buffering. Simon: is this an expected result? I presume you did some Handle debugging? * the IO manager should not have any bearing here because file code doesn't actually ever use it (epoll() doesn't work for files) * does the difference persist when the file size gets bigger? * your file descriptor code doesn't handle EINTR properly, although you said you checked that the file copy is being done? * Copying a 1MB file in 1ms gives a throughput of ~1GB/s. The other methods have a more believable ~70MB/s throughput. G On Fri, Mar 8, 2013 at 7:30 AM, Michael Snoyman mich...@snoyman.com mailto:mich...@snoyman.com wrote: Hi all, I'm turning to the community for some help understanding some benchmark results[1]. I was curious to see how the new io-streams would work with conduit, as it looks like a far saner low-level approach than Handles. In fact, the API is so simple that the entire wrapper is just a few lines of code[2]. I then added in some basic file copy benchmarks, comparing conduit+Handle (with ResourceT or bracket), conduit+io-streams, straight io-streams, and lazy I/O. All approaches fell into the same ballpark, with conduit+bracket and conduit+io-streams taking a slight lead. (I haven't analyzed that enough to know if it means anything, however.) Then I decided to pull up the NoHandle code I wrote a while ago for conduit. This code was written initially for Windows only, to work around the fact that System.IO.openFile does some file locking. To avoid using Handles, I wrote a simple FFI wrapper exposing open, read, and close system calls, ported it to POSIX, and hid it behind a Cabal flag. Out of curiosity, I decided to expose it and include it in the benchmark. The results are extreme. I've confirmed multiple times that the copy algorithm is in fact copying the file, so I don't think the test itself is cheating somehow. But I don't know how to explain the massive gap. I've run this on two different systems. The results you see linked are from my local machine. On an EC2 instance, the gap was a bit smaller, but the NoHandle code was still 75% faster than the others. My initial guess is that I'm not properly tying into the IO manager, but I wanted to see if the community had any thoughts. The relevant pieces of code are [3][4][5]. Michael [1] http://static.snoyman.com/streams.html [2] https://github.com/snoyberg/conduit/blob/streams/io-streams-conduit/Data/Conduit/Streams.hs [3] https://github.com/snoyberg/conduit/blob/streams/conduit/System/PosixFile.hsc [4] https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L54 [5] https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L167 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org mailto:Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Gregory Collins g...@gregorycollins.net mailto:g...@gregorycollins.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)
Hi all, I'm turning to the community for some help understanding some benchmark results[1]. I was curious to see how the new io-streams would work with conduit, as it looks like a far saner low-level approach than Handles. In fact, the API is so simple that the entire wrapper is just a few lines of code[2]. I then added in some basic file copy benchmarks, comparing conduit+Handle (with ResourceT or bracket), conduit+io-streams, straight io-streams, and lazy I/O. All approaches fell into the same ballpark, with conduit+bracket and conduit+io-streams taking a slight lead. (I haven't analyzed that enough to know if it means anything, however.) Then I decided to pull up the NoHandle code I wrote a while ago for conduit. This code was written initially for Windows only, to work around the fact that System.IO.openFile does some file locking. To avoid using Handles, I wrote a simple FFI wrapper exposing open, read, and close system calls, ported it to POSIX, and hid it behind a Cabal flag. Out of curiosity, I decided to expose it and include it in the benchmark. The results are extreme. I've confirmed multiple times that the copy algorithm is in fact copying the file, so I don't think the test itself is cheating somehow. But I don't know how to explain the massive gap. I've run this on two different systems. The results you see linked are from my local machine. On an EC2 instance, the gap was a bit smaller, but the NoHandle code was still 75% faster than the others. My initial guess is that I'm not properly tying into the IO manager, but I wanted to see if the community had any thoughts. The relevant pieces of code are [3][4][5]. Michael [1] http://static.snoyman.com/streams.html [2] https://github.com/snoyberg/conduit/blob/streams/io-streams-conduit/Data/Conduit/Streams.hs [3] https://github.com/snoyberg/conduit/blob/streams/conduit/System/PosixFile.hsc [4] https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L54 [5] https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L167 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)
One clarification: it seems that sourceFile and sourceFileNoHandle have virtually no difference in speed. The gap comes exclusively from sinkFile vs sinkFileNoHandle. This makes me think that it might be a buffer copy that's causing the slowdown, in which case the benchmark may in fact be accurate. On Mar 8, 2013 8:30 AM, Michael Snoyman mich...@snoyman.com wrote: Hi all, I'm turning to the community for some help understanding some benchmark results[1]. I was curious to see how the new io-streams would work with conduit, as it looks like a far saner low-level approach than Handles. In fact, the API is so simple that the entire wrapper is just a few lines of code[2]. I then added in some basic file copy benchmarks, comparing conduit+Handle (with ResourceT or bracket), conduit+io-streams, straight io-streams, and lazy I/O. All approaches fell into the same ballpark, with conduit+bracket and conduit+io-streams taking a slight lead. (I haven't analyzed that enough to know if it means anything, however.) Then I decided to pull up the NoHandle code I wrote a while ago for conduit. This code was written initially for Windows only, to work around the fact that System.IO.openFile does some file locking. To avoid using Handles, I wrote a simple FFI wrapper exposing open, read, and close system calls, ported it to POSIX, and hid it behind a Cabal flag. Out of curiosity, I decided to expose it and include it in the benchmark. The results are extreme. I've confirmed multiple times that the copy algorithm is in fact copying the file, so I don't think the test itself is cheating somehow. But I don't know how to explain the massive gap. I've run this on two different systems. The results you see linked are from my local machine. On an EC2 instance, the gap was a bit smaller, but the NoHandle code was still 75% faster than the others. My initial guess is that I'm not properly tying into the IO manager, but I wanted to see if the community had any thoughts. The relevant pieces of code are [3][4][5]. Michael [1] http://static.snoyman.com/streams.html [2] https://github.com/snoyberg/conduit/blob/streams/io-streams-conduit/Data/Conduit/Streams.hs [3] https://github.com/snoyberg/conduit/blob/streams/conduit/System/PosixFile.hsc [4] https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L54 [5] https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L167 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File I/O benchmark help (conduit, io-streams and Handle)
I would have expected sourceFileNoHandle to make the most difference, since that's one location (write) where you've obviously removed a copy. Does sourceFileNoHandle allocate less? Incidentally, I've recently been making similar changes to IO code (removing buffer copies) and getting similar speedups. Although the results tend to be less pronounced in code that isn't strictly IO-bound. On Fri, Mar 8, 2013 at 2:50 PM, Michael Snoyman mich...@snoyman.com wrote: One clarification: it seems that sourceFile and sourceFileNoHandle have virtually no difference in speed. The gap comes exclusively from sinkFile vs sinkFileNoHandle. This makes me think that it might be a buffer copy that's causing the slowdown, in which case the benchmark may in fact be accurate. On Mar 8, 2013 8:30 AM, Michael Snoyman mich...@snoyman.com wrote: Hi all, I'm turning to the community for some help understanding some benchmark results[1]. I was curious to see how the new io-streams would work with conduit, as it looks like a far saner low-level approach than Handles. In fact, the API is so simple that the entire wrapper is just a few lines of code[2]. I then added in some basic file copy benchmarks, comparing conduit+Handle (with ResourceT or bracket), conduit+io-streams, straight io-streams, and lazy I/O. All approaches fell into the same ballpark, with conduit+bracket and conduit+io-streams taking a slight lead. (I haven't analyzed that enough to know if it means anything, however.) Then I decided to pull up the NoHandle code I wrote a while ago for conduit. This code was written initially for Windows only, to work around the fact that System.IO.openFile does some file locking. To avoid using Handles, I wrote a simple FFI wrapper exposing open, read, and close system calls, ported it to POSIX, and hid it behind a Cabal flag. Out of curiosity, I decided to expose it and include it in the benchmark. The results are extreme. I've confirmed multiple times that the copy algorithm is in fact copying the file, so I don't think the test itself is cheating somehow. But I don't know how to explain the massive gap. I've run this on two different systems. The results you see linked are from my local machine. On an EC2 instance, the gap was a bit smaller, but the NoHandle code was still 75% faster than the others. My initial guess is that I'm not properly tying into the IO manager, but I wanted to see if the community had any thoughts. The relevant pieces of code are [3][4][5]. Michael [1] http://static.snoyman.com/streams.html [2] https://github.com/snoyberg/conduit/blob/streams/io-streams-conduit/Data/Conduit/Streams.hs [3] https://github.com/snoyberg/conduit/blob/streams/conduit/System/PosixFile.hsc [4] https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L54 [5] https://github.com/snoyberg/conduit/blob/streams/conduit/Data/Conduit/Binary.hs#L167 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File I/O question
On Mar 12, 2008, at 4:07 PM, Andrew Coppin wrote: I'm trying to read the file from Notepad.exe while my Haskell program is still running - which takes about an hour. I'm not a Windows user, but... Is it possible that Notepad tries to write-lock by default (since it's an editor), and fails? Put another way, have you tried other ways of reading the file? Heck, copying the file should be a read-only action. Can you copy it during the runtime, and open the copy? -johnnn ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File I/O question
John Melesky wrote: On Mar 12, 2008, at 4:07 PM, Andrew Coppin wrote: I'm trying to read the file from Notepad.exe while my Haskell program is still running - which takes about an hour. I'm not a Windows user, but... Is it possible that Notepad tries to write-lock by default (since it's an editor), and fails? Notepad successfully opens the log file of another script I'm running, so that's not the issue. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File I/O question
On Wed, Mar 12, 2008 at 10:03 PM, Andrew Coppin [EMAIL PROTECTED] wrote: Don Stewart wrote: Hey Andrew, What are you trying to do? Read and write to the same file (if so, you need to use strict IO), or are you trying something sneakier? I have a long-running Haskell program that writes status information to a log file. I'd like to be able to open and read that log file before the program has actually terminated. I have a similar program written in Tcl that allows me to do this, since apparently the Tcl interpretter doesn't lock output files for exclusive access. Haskell, however, does. (This seems to be the stipulated behaviour as per the Report.) If there's an easy way to change this, it would be useful... How about using appendFile? appendFile :: FilePath - String - IO () The computation appendFile file str function appends the string str, to the file file. See http://haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html#v%3AappendFile /Björn ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File I/O question
Hello Andrew, Wednesday, March 12, 2008, 10:06:44 PM, you wrote: When I write to a file using System.IO, the file is locked for exclusive access. I gather this is as specified in the Haskell Report. Which is nice, but... I'd actually prefer the file to *not* be locked. Anybody know how to do that? one (and only?) possible way is to use Streams library which happens to not lock files: http://haskell.org/haskellwiki/Library/Streams -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File I/O question
Bulat Ziganshin wrote: Hello Andrew, one (and only?) possible way is to use Streams library which happens to not lock files: Is that likely to compile on Windows? ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] File I/O question
Hi Cafe. There's good news and there's bad news. The bad news is... I'm back. [Did I miss anything good?] The good news is... I have an actual question to ask as well. When I write to a file using System.IO, the file is locked for exclusive access. I gather this is as specified in the Haskell Report. Which is nice, but... I'd actually prefer the file to *not* be locked. Anybody know how to do that? ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File I/O question
andrewcoppin: Hi Cafe. There's good news and there's bad news. The bad news is... I'm back. [Did I miss anything good?] The good news is... I have an actual question to ask as well. When I write to a file using System.IO, the file is locked for exclusive access. I gather this is as specified in the Haskell Report. Which is nice, but... I'd actually prefer the file to *not* be locked. Anybody know how to do that? Hey Andrew, What are you trying to do? Read and write to the same file (if so, you need to use strict IO), or are you trying something sneakier? -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File I/O question
Don Stewart wrote: Hey Andrew, What are you trying to do? Read and write to the same file (if so, you need to use strict IO), or are you trying something sneakier? I have a long-running Haskell program that writes status information to a log file. I'd like to be able to open and read that log file before the program has actually terminated. I have a similar program written in Tcl that allows me to do this, since apparently the Tcl interpretter doesn't lock output files for exclusive access. Haskell, however, does. (This seems to be the stipulated behaviour as per the Report.) If there's an easy way to change this, it would be useful... ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File I/O question
andrewcoppin: Don Stewart wrote: Hey Andrew, What are you trying to do? Read and write to the same file (if so, you need to use strict IO), or are you trying something sneakier? I have a long-running Haskell program that writes status information to a log file. I'd like to be able to open and read that log file before the program has actually terminated. I have a similar program written in Tcl that allows me to do this, since apparently the Tcl interpretter doesn't lock output files for exclusive access. Haskell, however, does. (This seems to be the stipulated behaviour as per the Report.) If there's an easy way to change this, it would be useful... Did you open the file in ReadWriteMode ? -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File I/O question
Don Stewart wrote: andrewcoppin: Don Stewart wrote: Hey Andrew, What are you trying to do? Read and write to the same file (if so, you need to use strict IO), or are you trying something sneakier? I have a long-running Haskell program that writes status information to a log file. I'd like to be able to open and read that log file before the program has actually terminated. I have a similar program written in Tcl that allows me to do this, since apparently the Tcl interpretter doesn't lock output files for exclusive access. Haskell, however, does. (This seems to be the stipulated behaviour as per the Report.) If there's an easy way to change this, it would be useful... Did you open the file in ReadWriteMode ? Nope. Just WriteMode. I'm trying to read the file from Notepad.exe while my Haskell program is still running - which takes about an hour. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File I/O question
andrewcoppin: Nope. Just WriteMode. I'm trying to read the file from Notepad.exe while my Haskell program is still running - which takes about an hour. Oh, you want another process in the system to read the file while GHC is writing to it? This works fine on unix systems -- and perhaps Neil, or one of the other windows experts, can explain what the story is on Windows. -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File I/O question
Don Stewart wrote: Oh, you want another process in the system to read the file while GHC is writing to it? That's the one. ;-) [Well, not GHC but my GHC-compiled binary, but anyway...] This works fine on unix systems -- and perhaps Neil, or one of the other windows experts, can explain what the story is on Windows. I thought the Report... wait, let me check... Oh, OK, that's not what I thought the Report says. Section 21.2.3, File Locking. I thought it says that Haskell is supposed to prevent access to the same file by multiple threads. (And that, presumably, is why it's using an exclusive lock to try to implement these semantics under the Win32 API.) However, that's apparently not what it says... (It says multiple reader / single writer.) So... does this count as a bug then? ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] file i/o
Hi, I am relatively new to Haskell and am finding i/o difficult to work with. I am trying to do something like the following: I have a file of data, each line of which looks like this: STRING, INTEGER SEQUENCE for example: FOO ,2,1,4,3,6,7,5,9,10,11,8,13,12, I would like to write a function the reads this file and returns arrays like this: [FOO,[2,1,4,3,6,7,5,9,10,11,8,13,12]] Ideally, the function would return the first line when initially called, the second the next time it is called and so on. I would settle for something that returned a big array comprising arrays of the above type containing all the information in the file. The file is big, however. I can not figure out how to do this from any of the tutorials so I thought I might ask here. Thank you for your help, Robert. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] file i/o
Hi Robert, The first thing to mention is that Haskell uses linked-lists, not arrays as the standard list type structure, so [1,2] is actually a linked list. The next thing to note is that Haskell is *lazy*. It won't do work that it doens't have to. This means that you can return a linked list with all the lines in the file, but they won't actually be read til they are required. i.e. Haskell cleverly worries about all the getting a next line as required stuff, without you even noticing - it will read it line by line. A simple function that does some of what you want is: parseFile :: FilePath - IO [(String, [Int])] parseFile x = do src - readFile x return (map parseLine (lines src)) parseLine :: String - (String, [Int]) parseLine = for you to write :) The other point is that Haskell linked lists have to have every element of the same type, so you can't have [test,1] as a linked list, what you actually want is a tuple, written (test,1) - a tuple is of fixed length and all elements can be of different type. Hope that helps, Neil ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] file i/o
The other thing to mention, is that if you have the ability to change file formats, it may be better to make just a slight adjustment... If you make it look exactly like the haskell data structure you want: [(Foo, [1,2,3,4,5,6,7]) ,(Bar, [7,6,5,4,3,2,1]) ,...] Then your parser becomes even simpler: parseFile :: FilePath - IO [(String,[Int])] parseFile = do src - readFile x return $ read src On Jan 3, 2006, at 11:33 AM, Neil Mitchell wrote: Hi Robert, The first thing to mention is that Haskell uses linked-lists, not arrays as the standard list type structure, so [1,2] is actually a linked list. The next thing to note is that Haskell is *lazy*. It won't do work that it doens't have to. This means that you can return a linked list with all the lines in the file, but they won't actually be read til they are required. i.e. Haskell cleverly worries about all the getting a next line as required stuff, without you even noticing - it will read it line by line. A simple function that does some of what you want is: parseFile :: FilePath - IO [(String, [Int])] parseFile x = do src - readFile x return (map parseLine (lines src)) parseLine :: String - (String, [Int]) parseLine = for you to write :) The other point is that Haskell linked lists have to have every element of the same type, so you can't have [test,1] as a linked list, what you actually want is a tuple, written (test,1) - a tuple is of fixed length and all elements can be of different type. Hope that helps, Neil ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] file i/o
Neil and Thomas, Thanks to both of you for your help. I have things working now. Bob ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe