[Haskell-cafe] 1G strings in Haskell

2006-04-20 Thread Donald Bruce Stewart
Question:
Can I manipulate 1G strings in Haskell?

Short answer:
Yes! Mostly.

Doing some stress testing of FPS, here are some results for 1G strings.

3.2Ghz box, 2G physical mem.
Size of input string: 1G

N.B. 2G of physical ram is not enough when trying to benchmark functions
that copy 1G strings around :)

FunctionTime in seconds

All O(1) functions:
length  O
index   0
head0
tail0
last0
init0

O(n) , but answer found early on:
compare 0   
any 0
all 0
elem0
elemIndex   0

O(n) mostly:
maximum 3.855
minimum 4.666   
filterChar  8.084   
elemIndicies11.504  
lines   12.247  
find14.046  
tails   27.328
inits   33.620  
words   100.963
map toUpper 143.871

Failed due to memory exhaustion. 
Almost made it though, just need a tad more ram than I had.

filter  ! 
unlines !
unwords !
reverse !   -- copy
cons!   -- copy
snoc!   -- involves a copy
++  !   -- can't concat two 1G strings on this box

sort? taking too long, but space was ok.

[Char] functions are much more costly.
pack!   -- constructing 1G [Char] is not ok.
unpack  !

Lesssons, you can play with 1G strings in Haskell. Having more than 2G
ram is useful though. Replacing higher order functions with hard coded
first order equivalents can help.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] 1G strings in Haskell

2006-04-20 Thread Donald Bruce Stewart
dagit:
 On 4/19/06, Donald Bruce Stewart [EMAIL PROTECTED] wrote:
  Question:
  Can I manipulate 1G strings in Haskell?
 
 
  Failed due to memory exhaustion.
  Almost made it though, just need a tad more ram than I had.
 
  filter  !
  unlines !
  unwords !
  reverse !   -- copy
  cons!   -- copy
  snoc!   -- involves a copy
  ++  !   -- can't concat two 1G strings on this box
 
 I would say given the nature of the experiment that you want to feed
 these functions with values that lead to 1 GB results.  At least, that
 makes sense in the case of ++, but maybe not the others.

Yes, good point. Previous experiments with 500M strings indicated that
these all worked fine:

Size of test data: 531416k

Fine:
filter  7.264   
unwords 1.715
reverse 1.980
cons0.976   
snoc0.689  
++  2.845   

Still wouldn't work at 0.5G
unlines !
concat  !
  
 Did the machine have any available swap?  If so, why was it not used? 

Swap was used, 1G more, but ulimits kicked in a few times (and I'm not
an admin on the box). This was a bit frustrating, but the 0.5G results
show what is possible.

 The strings had to be in memory all at once to not skew the benchmark
 by including I/O time?

The strings were all in memory, yes, with some deepSeq tricks.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe