Re: [Haskell-cafe] advice on space efficient data structure with efficient snoc operation

2009-04-07 Thread Manlio Perillo

Edward Kmett ha scritto:
I'm in the process of adding a Data.Sequence.Unboxed to 
unboxed-containers. I hope to have it in hackage today or tomorrow, 
should its performance work out as well as Data.Set.Unboxed.
 


Looking at the data definition of Data.Sequence I suspect that it is not 
really space efficient.


Please note that I have 480189 arrays, where each array has, on average, 
209 (Word16 :*: Word8) elements.


Using a [(Word32, UArr (Word16 :*: Word8))] takes about 800 MB (but it's 
hard to measure exact memory usage).



> [...]


Thanks  Manlio
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] advice on space efficient data structure with efficient snoc operation

2009-04-07 Thread Edward Kmett
I'm in the process of adding a Data.Sequence.Unboxed to unboxed-containers.
I hope to have it in hackage today or tomorrow, should its performance work
out as well as Data.Set.Unboxed.

Be warned the API will likely be shifting on you for a little while, while I
figure out a better way to manage all of the instances.

-Edward Kmett
On Tue, Apr 7, 2009 at 7:49 AM, Manlio Perillo wrote:

> Hi.
>
> I'm still working on my Netflix Prize project.
>
> For a function I wrote, I really need a data structure that is both space
> efficient (unboxed elements) and with an efficient snoc operation.
>
> I have pasted a self contained module with the definition of the function
> I'm using:
> http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3453
>
>
> The movie ratings are loaded from serialized data, and the result is
> serialized again, using the binary package:
>
> transcodeIO :: IO ()
> transcodeIO = do
>  input <- L.hGetContents stdin
>  let output = encodeZ $ transcode $ decodeZ input
>  L.hPut stdout output
>
> (here encodeZ and decodeZ are wrappers around Data.Binary.encode and
> Data.Binary.decode, with support to gzip compression/decompression)
>
>
> This function (transcodeIO, not transcode) takes, on my
> Core2 CPU T7200  @ 2.00GHz:
>
> real30m8.794s
> user29m30.659s
> sys 0m10.313s
>
> 1068 Mb total memory in use
>
>
> The problem here is with snocU, that requires a lot of copying.
>
> I rewrote the transcode function so that the input data set is split in N
> parts:
> http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3453#a3456
>
> The mapReduce function is the one defined in the Real World Haskell.
>
>
> The new function takes (using only one thread):
>
> real18m48.039s
> user18m30.901s
> sys 0m6.520s
>
> 1351 Mb total memory in use
>
>
> The additional required memory is probably caused by unionsWith, that is
> not strict.
> The function takes less time, since array copying is optimized.
> I still use snocU, but on small arrays.
>
> GC time is very high: 54.4%
>
>
> Unfortunately I can not test with more then one thread, since I get
> segmentation faults (probably a bug with uvector packages).
>
> I also got two strange errors (but this may be just the result of the
> segmentation fault, I'm not able to reproduce them):
>
> tpp.c:63: __pthread_tpp_change_priority: Assertion `new_prio == -1 ||
> (new_prio >= __sched_fifo_min_prio && new_prio <= __sched_fifo_max_prio)'
> failed.
>
>
> internal error: removeThreadFromQueue: not found
>(GHC version 6.8.2 for i386_unknown_linux)
>Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
>
>
>
> Now the question is: what data structure should I use to optimize the
> transcode function?
>
> IMHO there are two solutions:
>
> 1) Use a "lazy" array.
>   Something like ByteString.Lazy, and what is available in
>   storablevector package.
>
>   Using this data structure, I can avoid the use of appendU.
>
> 2) Use an "unboxed" list.
>
>   Something like:
>  http://mdounin.ru/hg/nginx-vendor-current/file/tip/src/core/ngx_list.h
>
>   That is: a linked list of unboxed arrays, but unlike the lazy array
>   solution, a snoc operation avoid copying if there is space in the
>   current array.
>
>   I don't know if this is easy/efficient to implement in Haskell.
>
>
> Any other suggestions?
>
>
> Thanks  Manlio Perillo
> ___
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] advice on space efficient data structure with efficient snoc operation

2009-04-07 Thread Manlio Perillo

Hi.

I'm still working on my Netflix Prize project.

For a function I wrote, I really need a data structure that is both 
space efficient (unboxed elements) and with an efficient snoc operation.


I have pasted a self contained module with the definition of the 
function I'm using:

http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3453


The movie ratings are loaded from serialized data, and the result is 
serialized again, using the binary package:


transcodeIO :: IO ()
transcodeIO = do
  input <- L.hGetContents stdin
  let output = encodeZ $ transcode $ decodeZ input
  L.hPut stdout output

(here encodeZ and decodeZ are wrappers around Data.Binary.encode and 
Data.Binary.decode, with support to gzip compression/decompression)



This function (transcodeIO, not transcode) takes, on my
Core2 CPU T7200  @ 2.00GHz:

real30m8.794s
user29m30.659s
sys 0m10.313s

1068 Mb total memory in use


The problem here is with snocU, that requires a lot of copying.

I rewrote the transcode function so that the input data set is split in 
N parts:

http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3453#a3456

The mapReduce function is the one defined in the Real World Haskell.


The new function takes (using only one thread):

real18m48.039s
user18m30.901s
sys 0m6.520s

1351 Mb total memory in use


The additional required memory is probably caused by unionsWith, that is 
not strict.

The function takes less time, since array copying is optimized.
I still use snocU, but on small arrays.

GC time is very high: 54.4%


Unfortunately I can not test with more then one thread, since I get 
segmentation faults (probably a bug with uvector packages).


I also got two strange errors (but this may be just the result of the 
segmentation fault, I'm not able to reproduce them):


tpp.c:63: __pthread_tpp_change_priority: Assertion `new_prio == -1 || 
(new_prio >= __sched_fifo_min_prio && new_prio <= 
__sched_fifo_max_prio)' failed.



internal error: removeThreadFromQueue: not found
(GHC version 6.8.2 for i386_unknown_linux)
Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug



Now the question is: what data structure should I use to optimize the 
transcode function?


IMHO there are two solutions:

1) Use a "lazy" array.
   Something like ByteString.Lazy, and what is available in
   storablevector package.

   Using this data structure, I can avoid the use of appendU.

2) Use an "unboxed" list.

   Something like:
  http://mdounin.ru/hg/nginx-vendor-current/file/tip/src/core/ngx_list.h

   That is: a linked list of unboxed arrays, but unlike the lazy array
   solution, a snoc operation avoid copying if there is space in the
   current array.

   I don't know if this is easy/efficient to implement in Haskell.


Any other suggestions?


Thanks  Manlio Perillo
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe