[Haskell-cafe] performance of map reduce

2008-09-19 Thread Manlio Perillo

Hi again.

In 
http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id676390

there is a map reduce based log parser.

I have written an alternative version:
http://paste.pocoo.org/show/85699/

but, with a file of 315 MB, I have [1]:

1) map reduce implementation, non parallel
real    0m6.643s
user    0m6.252s
sys     0m0.212s

2) map reduce implementation, parallel with 2 cores
real    0m3.840s
user    0m6.384s
sys     0m0.652s

3) my implementation
real    0m8.121s
user    0m7.804s
sys     0m0.216s



Why is the map reduce implementation faster, even though it is not
parallelized?


Is it possible to implement a map reduce version that can handle gzipped
log files?



[1] These tests do not consider the first run.
For the first run (no data in the OS cache), I have (not verified):

1) map reduce implementation, parallel with 2 cores
real    0m3.735s
user    0m6.328s
sys     0m0.604s

2) my implementation
real    0m13.659s
user    0m7.712s
sys     0m0.360s




Thanks   Manlio Perillo
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] performance of map reduce

2008-09-19 Thread Don Stewart
manlio_perillo:
 Hi again.
 
 In 
 http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id676390
 there is a map reduce based log parser.
 
 I have written an alternative version:
 http://paste.pocoo.org/show/85699/
 
 but, with a file of 315 MB, I have [1]:
 
 1) map reduce implementation, non parallel
 real  0m6.643s
 user  0m6.252s
 sys   0m0.212s
 
 2) map reduce implementation, parallel with 2 cores
 real  0m3.840s
 user  0m6.384s
 sys   0m0.652s
 
 3) my implementation
 real  0m8.121s
 user  0m7.804s
 sys   0m0.216s
 
 
 
 Why is the map reduce implementation faster, even though it is not
 parallelized?

Changes in how GC is utilised, or how optimisation works?
  
 Is it possible to implement a map reduce version that can handle gzipped
 log files?

Using the zlib binding on hackage.haskell.org, you can stream multiple
zlib decompression threads with lazy bytestrings, and combine the
results.
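A minimal sketch of the lazy-streaming half of this suggestion, assuming the `zlib` and `bytestring` packages are installed (the sample file name is made up; the parallel combination step is left out):

```haskell
import qualified Data.ByteString.Lazy.Char8 as L
import Codec.Compression.GZip (compress, decompress)

-- Count the lines of a gzipped log without inflating it all at once:
-- decompress is lazy, so chunks are inflated only as L.lines consumes them.
countGzippedLines :: FilePath -> IO Int
countGzippedLines path = do
    compressed <- L.readFile path
    return (length (L.lines (decompress compressed)))

main :: IO ()
main = do
    -- write a small gzipped sample so the example is self-contained
    L.writeFile "sample.log.gz" (compress (L.pack "a\nb\nc\n"))
    countGzippedLines "sample.log.gz" >>= print  -- 3
```

Because the decompressed stream is an ordinary lazy ByteString, it can be fed to the same chunking and reducing code used for plain-text logs.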

-- Don


Re[2]: [Haskell-cafe] performance of map reduce

2008-09-19 Thread Bulat Ziganshin
Hello Don,

Friday, September 19, 2008, 9:12:43 PM, you wrote:

 Is it possible to implement a map reduce version that can handle gzipped
 log files?

 Using the zlib binding on hackage.haskell.org, you can stream multiple
 zlib decompression threads with lazy bytestrings, and combine the
 results.

For gzip decompression you need to decompress all the preceding data,
so decompression of a single file cannot be multithreaded.
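Even so, the sequentially decompressed stream can still be *parsed* in parallel. A base-only sketch (using `par` from `GHC.Conc`; this `mapReduce` is my own simplified skeleton, not the book's — compile with -threaded and run with +RTS -N2 to use two cores):

```haskell
import GHC.Conc (par)

-- Simplified map-reduce skeleton: spark each per-chunk result in
-- parallel with par, then combine them sequentially.
mapReduce :: (a -> b) -> ([b] -> c) -> [a] -> c
mapReduce f combine xs = foldr par (combine results) results
  where
    results = map f xs

-- Example: total line count over chunks already split on '\n' boundaries.
countLines :: [String] -> Int
countLines = mapReduce (length . lines) sum

main :: IO ()
main = print (countLines ["a\nb\n", "c\nd\ne\n"])  -- 5
```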


-- 
Best regards,
 Bulat    mailto:[EMAIL PROTECTED]



Re: [Haskell-cafe] performance of map reduce

2008-09-19 Thread Manlio Perillo

Don Stewart wrote:

manlio_perillo:
[...]
Is it possible to implement a map reduce version that can handle gzipped
log files?


Using the zlib binding on hackage.haskell.org, you can stream multiple
zlib decompression threads with lazy bytestrings, and combine the
results.



This is a bit hard.
A deflate-encoded stream contains multiple blocks, so you need to find
the offset of each block and decompress them in parallel.

But then you also need to make sure each final block terminates with a '\n'.

And the zlib Haskell binding does not support this usage (I'm not even
sure zlib supports this).
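Whatever happens with the decompression step, the '\n' issue is easy to solve on the decompressed stream: cut at an approximate offset, then extend each cut to the next newline. A sketch (the helper name is mine, assuming only the bytestring package):

```haskell
import qualified Data.ByteString.Lazy.Char8 as L

-- Split a stream into n roughly equal pieces, each ending on a '\n'
-- (except possibly the last), so the pieces can be parsed independently.
chunkOnNewlines :: Int -> L.ByteString -> [L.ByteString]
chunkOnNewlines n s = go s
  where
    target = max 1 (L.length s `div` fromIntegral n)
    go rest
      | L.null rest = []
      | otherwise =
          let (a, b)    = L.splitAt target rest
              (ext, b') = L.break (== '\n') b  -- extend cut to next newline
              (nl, b'') = L.splitAt 1 b'       -- keep the '\n' in the chunk
              chunk     = L.concat [a, ext, nl]
          in chunk : go b''

main :: IO ()
main = print (map L.unpack (chunkOnNewlines 2 (L.pack "aa\nbb\ncc\n")))
-- ["aa\nbb\n","cc\n"]
```

Concatenating the chunks always gives back the original stream, so nothing is lost at the boundaries.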




By the way, this phrase:
We allow multiple threads to read different chunks at once by supplying 
each one with a distinct file handle, all reading the same file

here:
http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id677193

IMHO this is not correct, or at least misleading.
Each block is read in the main thread, or at least myThreadId always
returns the same value.


This is also the reason why I don't understand why my version is slower
than the book version.
The only difference is that the book version reads 4 chunks while my
version reads only 1 big chunk.




-- Don




Thanks   Manlio


Re: [Haskell-cafe] performance of map reduce

2008-09-19 Thread Bryan O'Sullivan
On Fri, Sep 19, 2008 at 2:31 PM, Manlio Perillo [EMAIL PROTECTED] wrote:


 By the way, this phrase:
 We allow multiple threads to read different chunks at once by supplying
 each one with a distinct file handle, all reading the same file
 here:

 http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id677193

 IMHO this is not correct, or at least misleading.


It's both correct and, er, leading. The files are opened in a single thread,
and then the file handles are read by multiple threads.
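That pattern can be sketched with nothing beyond base and bytestring (all names here are mine, not the book's): one handle per worker, opened and seeked up front, with the reads happening in forked threads. Offsets are naive byte offsets; a real version would align them to newline boundaries as discussed earlier in the thread.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)
import qualified Data.ByteString.Char8 as B
import System.IO

-- Each worker gets its own handle on the same file, seeked to its own
-- offset; the handles are created here, but read in forked threads.
readChunksConcurrently :: FilePath -> Int -> IO [B.ByteString]
readChunksConcurrently path n = do
    size <- withFile path ReadMode hFileSize
    let chunkLen = (size + fromIntegral n - 1) `div` fromIntegral n
        offsets  = [i * chunkLen | i <- [0 .. fromIntegral n - 1 :: Integer]]
    vars <- forM offsets $ \off -> do
        v <- newEmptyMVar
        _ <- forkIO $ withFile path ReadMode $ \h -> do
            hSeek h AbsoluteSeek off
            s <- B.hGet h (fromIntegral (max 0 (min chunkLen (size - off))))
            putMVar v s
        return v
    mapM takeMVar vars

main :: IO ()
main = do
    writeFile "demo.log" "one\ntwo\nthree\n"
    chunks <- readChunksConcurrently "demo.log" 2
    print (B.concat chunks == B.pack "one\ntwo\nthree\n")  -- True
```

The MVars make the main thread wait for every worker, and reassembling the chunks in offset order reproduces the file exactly.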


Re: [Haskell-cafe] performance of map reduce

2008-09-19 Thread Manlio Perillo

Bryan O'Sullivan wrote:
On Fri, Sep 19, 2008 at 2:31 PM, Manlio Perillo 
[EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote:



By the way, this phrase:
We allow multiple threads to read different chunks at once by
supplying each one with a distinct file handle, all reading the same
file
here:

http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html#id677193

IMHO this is not correct, or at least misleading.


It's both correct and, er, leading. The files are opened in a single 
thread, and then the file handles are read by multiple threads.


Ah, right, thanks.
They are read with ByteString.Lazy.



Manlio Perillo