Hi Bulat,
obviously, arrays version should create no temporary cells.
that's why the memory traffic surprised me. i knew there had to be something
wrong.
the problems was mainly due to 2 factors:
1) readArray m (i,j)
yes, indeed. since we are dealing in bulk operations, we might as well take advantage
of that, so dropping the repeated bounds-checks inside the loops makes a lot of sense.
moral/note to self: bulk array operations are your friend (i knew that!-), but you need
to use that when defining them (unsafeRead et al are only for library writers, but library
writers ought to use them; and i was writing a small inlined library)
2) 'op' in 'l' which was passed as real closure and was not inlined
due to weakness of ghc optimizer
it irked me having to write the same loop twice, but i didn't consider the
consequences.
an INLINE pragma on l almost seems sufficient to regain the loss, so i prefer that; but
writing out the loop twice is still a tiny tick faster..
we should help strictness analyzer by marking all the variables used in tight loops as strict.
ah, there were a few surprises in that one. i thought i had considered possible
strictness
problems, but obviously, i missed a few relevant possibilities. annotating
everything is the
safe option, and clearly documents the intentions, but i cannot help thinking about which
annotations could be omitted:
- modArray: a and i are both used anyway
- i index in loop is definitely checked (but j isn't, and some others weren't, either; so
better safe than sorry)
- some of the parameters need not be annotated in this example, but should be
if one
wanted to reuse the code elsewhere
- the one i really don't get yet is the one on the accumulators (!s in l, in
dotA/matA);
i thought i had covered that by using $! in applying the loop, but annotating
the
formal loop parameter, apart from being nicer, also seems faster..
moral/note to self: don't try to be clever, try to be clear..; strictness in formal
parameters is better than strictness in actual parameters; bang-patterns are good;-)
after that is done, we got 1000 times less temporary data allocated and 5x faster
execution. now it's a bit faster than strict lists
is this now anywhere near what the number-crunchers use, when they use
Haskell?-)
Bulat, thanks for looking into it and for isolating the issues so quickly!-)
claus
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe