[Haskell-cafe] performance of map reduce
Hi again.

In http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id676390
there is a map-reduce based log parser. I have written an alternative
version: http://paste.pocoo.org/show/85699/

With a file of 315 MB, I get [1]:

1) map reduce implementation, non parallel
   real 0m6.643s   user 0m6.252s   sys 0m0.212s

2) map reduce implementation, parallel with 2 cores
   real 0m3.840s   user 0m6.384s   sys 0m0.652s

3) my implementation
   real 0m8.121s   user 0m7.804s   sys 0m0.216s

What is the reason the map reduce implementation is faster, even when
not parallelized?

Is it possible to implement a map reduce version that can handle
gzipped log files?

[1] These tests do not consider the first run. For the first run (no
data in the OS cache), I get (not verified):

1) map reduce implementation, parallel with 2 cores
   real 0m3.735s   user 0m6.328s   sys 0m0.604s

2) my implementation
   real 0m13.659s   user 0m7.712s   sys 0m0.360s

Thanks
Manlio Perillo

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] performance of map reduce
manlio_perillo:
> Hi again.
>
> In http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id676390
> there is a map-reduce based log parser. I have written an alternative
> version: http://paste.pocoo.org/show/85699/
>
> With a file of 315 MB, I get [1]:
>
> 1) map reduce implementation, non parallel
>    real 0m6.643s   user 0m6.252s   sys 0m0.212s
>
> 2) map reduce implementation, parallel with 2 cores
>    real 0m3.840s   user 0m6.384s   sys 0m0.652s
>
> 3) my implementation
>    real 0m8.121s   user 0m7.804s   sys 0m0.216s
>
> What is the reason the map reduce implementation is faster, even when
> not parallelized?

Changes in how GC is utilised, or how optimisation works?

> Is it possible to implement a map reduce version that can handle
> gzipped log files?

Using the zlib binding on hackage.haskell.org, you can stream multiple
zlib decompression threads with lazy bytestrings, and combine the
results.

-- Don
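[Editorial sketch of Don's suggestion, in broad strokes: it assumes the zlib package from Hackage (Codec.Compression.GZip), and "access.log.gz" is a hypothetical input file, not one from the thread. It shows decompression of a gzipped log streamed through a lazy ByteString, not the parallel multi-stream setup Don alludes to.]

```haskell
import qualified Data.ByteString.Lazy.Char8 as L
import Codec.Compression.GZip (decompress)  -- from the zlib package

main :: IO ()
main = do
  -- Lazy reading: the file is decompressed incrementally, chunk by
  -- chunk, as the line traversal below demands input.
  compressed <- L.readFile "access.log.gz"
  let logLines = L.lines (decompress compressed)
  print (length logLines)
```

Because both `readFile` and `decompress` are lazy, the whole pipeline runs in roughly constant space regardless of the size of the compressed log.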
Re[2]: [Haskell-cafe] performance of map reduce
Hello Don,

Friday, September 19, 2008, 9:12:43 PM, you wrote:

>> Is it possible to implement a map reduce version that can handle
>> gzipped log files?

> Using the zlib binding on hackage.haskell.org, you can stream multiple
> zlib decompression threads with lazy bytestrings, and combine the
> results.

For gzip decompression you need to decompress all the previous data,
so decompression of a single file cannot be multithreaded.

-- 
Best regards,
 Bulat    mailto:[EMAIL PROTECTED]
Re: [Haskell-cafe] performance of map reduce
Don Stewart ha scritto:
> manlio_perillo:
> [...]
>> Is it possible to implement a map reduce version that can handle
>> gzipped log files?
>
> Using the zlib binding on hackage.haskell.org, you can stream multiple
> zlib decompression threads with lazy bytestrings, and combine the
> results.

This is a bit hard. A deflate-encoded stream contains multiple blocks,
so you need to find the offset of each block and decompress it in
parallel. But then you also need to make sure each final block
terminates with a '\n'. And the zlib Haskell binding does not support
this usage (I'm not even sure zlib supports this).

By the way, this phrase:

  "We allow multiple threads to read different chunks at once by
  supplying each one with a distinct file handle, all reading the
  same file"

here:
http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id677193

IMHO is not correct, or at least misleading. Each block is read in the
main thread, or at least myThreadId always returns the same value.

This is also the reason why I don't understand why my version is
slower than the book version. The only difference is that the book
version reads 4 chunks and my version reads only 1 big chunk.

Thanks
Manlio
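[Editorial sketch of the check Manlio describes: printing myThreadId from the code that touches each chunk, to see whether work really happens in forked threads. GHC-only; compile with -threaded to get OS-level parallelism. All names here are illustrative, not from the paste.]

```haskell
import Control.Concurrent (forkIO, myThreadId)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

main :: IO ()
main = do
  tid0 <- myThreadId
  putStrLn ("main thread:   " ++ show tid0)
  done <- newEmptyMVar
  _ <- forkIO $ do
    -- If chunk processing were done here, myThreadId would differ
    -- from the main thread's id.
    tid <- myThreadId
    putStrLn ("worker thread: " ++ show tid)
    putMVar done ()
  takeMVar done
```

Note the subtlety with lazy I/O: a chunk may be *defined* in a worker but, if the thunk is only ever forced from the main thread, the actual read happens on the main thread's id.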
Re: [Haskell-cafe] performance of map reduce
On Fri, Sep 19, 2008 at 2:31 PM, Manlio Perillo [EMAIL PROTECTED] wrote:

> By the way, this phrase:
>
>   "We allow multiple threads to read different chunks at once by
>   supplying each one with a distinct file handle, all reading the
>   same file"
>
> here:
> http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id677193
>
> IMHO is not correct, or at least misleading.

It's both correct and, er, leading. The files are opened in a single
thread, and then the file handles are read by multiple threads.
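[Editorial sketch of the pattern Bryan describes, under assumed names ("input.log", the chunkHandles helper): every handle is opened and positioned in the main thread, while the reads through those handles are performed by forked workers. It ignores the leftover bytes when the size is not divisible by the chunk count, and does not align chunks to line boundaries as the book's real code must.]

```haskell
import System.IO
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import qualified Data.ByteString.Lazy.Char8 as L

-- Hypothetical helper: one handle per chunk of the same file, each
-- seeked to its chunk's starting offset.  All of this runs in the
-- calling (main) thread.
chunkHandles :: FilePath -> Int -> IO [(Handle, Integer)]
chunkHandles path n = do
  size <- withFile path ReadMode hFileSize
  let chunk = size `div` fromIntegral n
  mapM (\i -> do h <- openFile path ReadMode
                 hSeek h AbsoluteSeek (fromIntegral i * chunk)
                 return (h, chunk))
       [0 .. n - 1]

main :: IO ()
main = do
  hs <- chunkHandles "input.log" 4   -- stand-in for the 315 MB log
  dones <- mapM (\(h, len) -> do
                   done <- newEmptyMVar
                   _ <- forkIO $ do
                     -- The read itself happens here, in the worker.
                     s <- L.hGet h (fromIntegral len)
                     putMVar done (L.count '\n' s)
                   return done)
                hs
  counts <- mapM takeMVar dones
  print (sum counts)
```

The design point is the division of labour: opening and seeking are cheap and sequential, so doing them up front in one thread is fine; only the expensive reading and counting is farmed out.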
Re: [Haskell-cafe] performance of map reduce
Bryan O'Sullivan ha scritto:
> On Fri, Sep 19, 2008 at 2:31 PM, Manlio Perillo [EMAIL PROTECTED] wrote:
>
>> By the way, this phrase:
>>
>>   "We allow multiple threads to read different chunks at once by
>>   supplying each one with a distinct file handle, all reading the
>>   same file"
>>
>> here:
>> http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id677193
>>
>> IMHO is not correct, or at least misleading.
>
> It's both correct and, er, leading. The files are opened in a single
> thread, and then the file handles are read by multiple threads.

Ah, right, thanks. They are read with ByteString.Lazy.

Manlio Perillo