Re[2]: [Haskell-cafe] idea for avoiding temporaries

2007-03-11 Thread Bulat Ziganshin
Hello Claus,

Sunday, March 11, 2007, 10:03:59 PM, you wrote:

> both the array and strict list versions avoid some intermediate structures;
> for the arbitrarily invented, relatively small inputs I've tried, strict lists
> are the clear winner, thanks to lower memory traffic, but I'd like some
> feedback from the experts:
>
> - are there any obvious inefficiencies in the array code?

Obviously, the array version should create no temporary cells. The problems were mainly due to two factors:

1) readArray m (i,j)
2) 'op' in 'l', which was passed as a real closure and was not inlined due to a weakness of the GHC optimizer

Also, we should help the strictness analyzer by marking all the variables used in tight loops as strict. After that is done, we get 1000 times less temporary data allocated and 5x faster execution. Now it's a bit faster than strict lists.
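
For example, a minimal sketch of the readArray fix and the strictness marking (dotProduct is just an invented stand-in here, not the loop from the attached CG.hs): the bounds are queried once up front, unsafeRead replaces readArray inside the loop, and the index and accumulator carry bang patterns:

{-# LANGUAGE BangPatterns #-}
import Data.Array.IO (IOUArray, getBounds)
import Data.Array.Base (unsafeRead)

-- dot product of two unboxed IO arrays: one bounds query, then an
-- unchecked, strict inner loop
dotProduct :: IOUArray Int Double -> IOUArray Int Double -> IO Double
dotProduct a b = do
  (lo, hi) <- getBounds a          -- single bounds query instead of a check per element
  let n = hi - lo                  -- raw indices run from 0 to n
      go !i !s
        | i > n     = return s
        | otherwise = do
            x <- unsafeRead a i    -- no per-element bounds check
            y <- unsafeRead b i
            go (i + 1) (s + x * y)
  go 0 0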

-- 
Best regards,
 Bulat                          mailto:[EMAIL PROTECTED]

[attachment: CG.hs]


Re: Re[2]: [Haskell-cafe] idea for avoiding temporaries

2007-03-11 Thread Claus Reinke

Hi Bulat,

> Obviously, the array version should create no temporary cells.

That's why the memory traffic surprised me. I knew there had to be something wrong.


> The problems were mainly due to two factors:
> 1) readArray m (i,j)

Yes, indeed. Since we are dealing in bulk operations, we might as well take advantage of that, so dropping the repeated bounds checks inside the loops makes a lot of sense.


Moral/note to self: bulk array operations are your friend (I knew that!-), but you need to use that fact when defining them (unsafeRead et al. are only for library writers, but library writers ought to use them; and I was writing a small inlined library).
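
A minimal sketch of that idea, assuming an IOUArray (mapInPlace is an invented name, not something from CG.hs): the bounds are checked once at the "library" boundary, and only inside the loop does it drop down to unsafeRead/unsafeWrite:

{-# LANGUAGE BangPatterns #-}
import Data.Array.IO (IOUArray, getBounds)
import Data.Array.Base (unsafeRead, unsafeWrite)

-- in-place bulk map over an unboxed IO array
mapInPlace :: (Double -> Double) -> IOUArray Int Double -> IO ()
mapInPlace f a = do
  (lo, hi) <- getBounds a
  let n = hi - lo                  -- raw indices 0..n are valid by construction
      go !i
        | i > n     = return ()
        | otherwise = do
            x <- unsafeRead a i    -- safe: i never leaves 0..n
            unsafeWrite a i (f x)
            go (i + 1)
  go 0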



> 2) 'op' in 'l', which was passed as a real closure and was not inlined
> due to a weakness of the GHC optimizer

It irked me having to write the same loop twice, but I hadn't considered the consequences. An INLINE pragma on 'l' almost seems sufficient to regain the loss, so I prefer that; but writing out the loop twice is still a tiny tick faster..
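
To make the trade-off concrete, here is a minimal sketch on plain lists rather than the arrays from CG.hs ('l' below is only the shape of the shared loop, not the real one): without the pragma, GHC may pass 'op' as a closure and call it on every element; with it, each use site can get its own specialised copy of the loop:

{-# INLINE l #-}
l :: (Double -> Double -> Double) -> [Double] -> [Double] -> [Double]
l op = go
  where
    go (x:xs) (y:ys) = (x `op` y) : go xs ys
    go _      _      = []

addV, subV :: [Double] -> [Double] -> [Double]
addV = l (+)   -- with the pragma, (+) can be inlined into a specialised loop here
subV = l (-)   -- ... and (-) here, instead of both sharing one closure-calling loop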


> We should help the strictness analyzer by marking all the variables used in
> tight loops as strict.

Ah, there were a few surprises in that one. I thought I had considered the possible strictness problems, but obviously I missed a few relevant possibilities. Annotating everything is the safe option, and clearly documents the intentions, but I cannot help thinking about which annotations could be omitted:


- modArray: a and i are both used anyway

- the i index in the loop is definitely checked (but j isn't, and some others
  weren't either; so better safe than sorry)

- some of the parameters need not be annotated in this example, but should be
  if one wanted to reuse the code elsewhere

- the one I really don't get yet is the one on the accumulators (!s in l, in
  dotA/matA); I thought I had covered that by using $! in applying the loop,
  but annotating the formal loop parameter, apart from being nicer, also seems
  faster.. (see the sketch after the moral below)

Moral/note to self: don't try to be clever, try to be clear..; strictness in formal parameters is better than strictness in actual parameters; bang patterns are good;-)
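
A small sketch of that difference, again on lists rather than on the dotA/matA loops from CG.hs (sumActual and sumFormal are invented names):

{-# LANGUAGE BangPatterns #-}

-- strictness in the actual parameter: force the accumulator with $! at each call
sumActual :: [Double] -> Double
sumActual = go 0
  where
    go s []     = s
    go s (x:xs) = (go $! s + x) xs

-- strictness in the formal parameter: the bang pattern forces s on entry,
-- and documents the intention right where the loop is defined
sumFormal :: [Double] -> Double
sumFormal = go 0
  where
    go !s []     = s
    go !s (x:xs) = go (s + x) xs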


> After that is done, we get 1000 times less temporary data allocated and 5x
> faster execution. Now it's a bit faster than strict lists.

Is this now anywhere near what the number-crunchers use, when they use Haskell?-)

Bulat, thanks for looking into it and for isolating the issues so quickly!-)
claus



Re[2]: [Haskell-cafe] idea for avoiding temporaries

2007-03-10 Thread Bulat Ziganshin
Hello Claus,

Saturday, March 10, 2007, 4:36:22 AM, you wrote:

> Ah, ok, I'm not used to thinking in such scales;-) (perhaps you should get in
> touch with those SAC people, after all - I don't know what their state of play
> is, but many years ago, they started in an office near mine, and they were
> definitely thinking about large arrays, even about how to distribute them, and
> computations on them;

In the last few days I learned the details of Google's MapReduce system. It seems that this approach is very interesting for dealing with large arrays: files (arrays) are split into chunks, and operations are split into chunks, too. AFAIK, some C compilers are already able to automatically split vector operations across several threads? At least, it would be interesting to implement the same technique for GHC, maybe just in the form of a library, like Google does.
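
For example, a minimal sketch of the chunking idea as a plain library function, assuming the Control.Parallel.Strategies API from the parallel package (chunkedSum and chunksOf are invented names, and the chunk size is left to the caller):

import Control.Parallel.Strategies (parList, rseq, using)

-- split the input into chunks, reduce each chunk in its own spark ("map"),
-- then combine the partial results sequentially ("reduce")
chunkedSum :: Int -> [Double] -> Double
chunkedSum n xs = sum partials
  where
    partials = map sum (chunksOf n xs) `using` parList rseq
    -- rseq is enough here: a Double in WHNF is already fully evaluated

    chunksOf :: Int -> [a] -> [[a]]
    chunksOf _ [] = []
    chunksOf k ys = let (c, rest) = splitAt k ys in c : chunksOf k rest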

-- 
Best regards,
 Bulat                          mailto:[EMAIL PROTECTED]
