Re[2]: inside the GHC code generator

2006-02-28 Thread Bulat Ziganshin
Hello Simon,

Friday, February 24, 2006, 12:52:36 PM, you wrote:
SM Wow!  You make a lot of good suggestions.  I think some of them are
SM unfortunately unworkable, some others have been tried before, but there
SM are definitely some to try.  I'll suggest a few more in this email.

Can you comment on the unworkable/hard-to-implement ones? Maybe GHC users
can propose solutions that you have overlooked.

SM First of all I should say that we don't *want* to use gcc as a code 
SM generator.  Everything we've been doing in the back end over the last 
SM few years has been aiming towards dropping the dependency on gcc and 
SM removing the Evil Mangler.  Now is not the time to be spending a lot of
SM effort in beefing up the C back end :-)  Our preference is to work on 
SM both the native and C-- back ends; the C-- back end has the potential to
SM produce the best code and be the most portable.  We also plan to keep 
SM the unregisterised C back end for the purposes of porting to a new 
SM platform, it's only the registerised path we want to lose.

I know this is the hardest part of the discussion, but sometimes we need
to revisit our decisions. I think it's better to compare the effort and
results of the gcc and C-- routes starting from the current state of GHC,
without counting the effort already invested :(  on the C-- side:

1) the time/effort required to implement many low-level optimizations
2) more or less efficient code

on the gcc side:

1) the time required to implement generation of idiomatic C code instead
of the current portable asm
2) some of the best optimization available

The gcc route already exists and works, while the C-- route is still
unfinished even in its basic, unoptimized state. Moreover, the
unregisterised C route provides the maximum possible level of portability.


SM Having said all that, if we can get better code out of the C back end
SM without too much effort, then that's certainly worthwhile.

Moreover, many of the changes proposed in this thread are perfectly
implementable on the C-- route as well.

 translated to something like this
 
 factorial:
 _s1BD = *(Sp + 0);
 if (_s1BD != 1) goto c1C4; // n!=1 ?
 R1 = *(Sp + 4);
 Sp = Sp + 8;
 return (*(Sp + 0))();  // return from factorial
 c1C4:
 _s1BI = _s1BD * (*(Sp + 4));   // n*r
 _s1BF = _s1BD - 1; // n-1
 *(Sp + 4) = _s1BI;
 *(Sp + 0) = _s1BF;
 goto factorial;

SM To be fair, this is only the translation on an architecture that has no
SM argument registers (x86, and currently x86_64 but hopefully not for long).

This just emphasizes the main problem with the portable asm code generated
by GHC - you can't optimize it as well as a C compiler optimizes idiomatic
C code. x86 has only 7 usable registers, after all, and gcc generates great
code for this procedure - but only if it is written in an idiomatic manner.
You would need to develop your own optimization techniques, and that will
be a long, hard road compared to just generating more high-level C code.

Just to compare: I have rewritten my binary serialization library twice.
The first version just used string concatenation (!); the second was
carefully optimized using my C experience. Nevertheless, it was still quite
slow. The final, third version was optimized according to how GHC actually
works, and the result was astonishing.

Portable asm looks great in theory, and C-- as a portable low-level
language is a great theoretical idea, but reality has its own laws. Want to
know why my second serialization library, highly optimized according to C
experience, was still so slow? It returned tuples, and constructing and
deconstructing these lazy tuples took 80% of the time.

It has been known for many years that GHC does not generate particularly
fast code. You have developed the proper theoretical way to solve this
problem, but there also seems to be a much easier backdoor :)

SM The simple optimisation of changing the final tail call to factorial 
SM into a goto to a label at the beginning of the function (as suggested by
SM John Meacham I think) should improve the code generated by gcc for 
SM functions like this, and is easy to do.

The main detail is to work with local variables instead of Sp[xx]; gcc
can't optimize accesses that go through Sp[xx]. A sketch of what I mean is
below.
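
Here is a hand-written sketch of the same factorial code with the Sp slots
cached in locals and the tail call turned into a goto to a label, assuming
hypothetical globals Sp and R1 standing in for the STG registers (this is
only an illustration, not GHC's actual output); with the values in locals,
gcc is free to keep n and r in machine registers for the whole loop:

extern long *Sp;                 /* STG stack pointer (hypothetical) */
extern long  R1;                 /* return-value register (hypothetical) */
typedef void (*Cont)(void);

void factorial(void)
{
    long n = Sp[0];              /* read the arguments from the stack once */
    long r = Sp[1];
loop:
    if (n != 1) {
        r = n * r;               /* n*r */
        n = n - 1;               /* n-1 */
        goto loop;               /* the recursive tail call becomes a local jump */
    }
    R1 = r;
    Sp += 2;                     /* pop both argument slots */
    ((Cont)Sp[0])();             /* enter the return continuation */
}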

 1) Because it is simply not asked to. You can enable gcc optimization by
 adding -optc-O6 to the command line, but this leads to compilation errors
 in some modules. It seems that gcc optimization is not compatible
 with the Evil Mangler, which is GHC's own optimization trick.

SM -O3 works, I haven't tried -O6.

I've attached optc-O2-bug.hs, which fails with both -optc-O2 and
-optc-O3.

 * C++ calling stack should be used as much as possible
 * parameters are passed in C++ way: factorial(int n, int r)

SM This is unworkable, I'm afraid.  Go and read Simon's paper on the STG:

SM http://citeseer.ist.psu.edu/peytonjones92implementing.html

SM there are two main reasons we don't use the C stack: garbage collection
SM and tail calls.  What do you plan to do about GC?


Re[2]: inside the GHC code generator

2006-02-27 Thread Bulat Ziganshin
Hello Simon,

Friday, February 24, 2006, 12:30:22 PM, you wrote:

SPJ | GHC has great high-level optimization.

SPJ However, let me strongly urge you *not* to focus attention primarily on
SPJ the gcc route.  Compiling via C has received a lot of attention over the
SPJ years, and there are many papers describing cool hacks for doing so.
SPJ GHC does not do as well as it could.

By the way, it's well known that Clean sometimes produces better code than
GHC and sometimes vice versa. I'm now almost sure that GHC outperforms
Clean in high-level optimizations and loses on low-level code generation,
so the overall result depends on which matters more for a particular
algorithm. On hand-optimized programs Clean should almost always outperform
GHC.

SPJ But there are serious obstacles.  That's not gcc's fault -- it wasn't
SPJ designed for this.  Accurate GC is one of them,

We can use two stacks - one for boxed and one for unboxed values.

SPJ tail calls is another,

Nowadays gcc optimizes tail calls.
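
For example, with -O2 (which enables -foptimize-sibling-calls) gcc compiles
mutually recursive tail calls like the following into plain jumps, so the C
stack does not grow with n (a small illustration, nothing more):

int is_odd(unsigned n);

int is_even(unsigned n) { return n == 0 ? 1 : is_odd(n - 1); }
int is_odd (unsigned n) { return n == 0 ? 0 : is_even(n - 1); }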

SPJ and there are plenty more smaller things that bite you only after you've
SPJ invested a lot of time.  This way lies madness.

Nevertheless, GHC already compiles via gcc, and with the help of the Evil
Mangler it is the fastest backend. AFAIK, the code generated by GHC works
even without the Evil Mangler, so it is just an optimization hack. Can you
please say which optimizations are done via the EM? I think some of them
could be implemented by changing the C generation, so maybe we can drop the
EM at long last.

SPJ C-- was *designed* for this purpose.  GHC uses C-- as its intermediate
SPJ language (just before emitting C).  So a good route is this:

SPJ * Write C-- to C-- optimisations

SPJ * Then, if you like, translate that code to C.  Already you will be
SPJ doing better than GHC does today, because the C-- to C-- optimiser will
SPJ let you generate better C

SPJ * But you can also emit C--, or native code; both of these paths will
SPJ directly benefit from your C-- optimisations.

SPJ The point is that none of this relies on Quick C--; you can always use
SPJ an alternative back end.

Yes, C-- was designed to solve all our problems - to provide world-class
optimization, code generation for different CPUs and so on. But does it
fulfil its promises? No. And now you propose to write these optimizations
as part of the Haskell project. Great idea! I started from the same idea
and even wrote out the sequence of optimizations that would transform the
initial C-- code for fac into efficient code. But then I realized that the
core problem is that GHC generates low-level code in a portable-asm style,
and optimizing low-level code that tries to describe how to IMPLEMENT an
algorithm, instead of just describing the ALGORITHM itself, is a hard task.
Nobody writes optimizers for assembly language, yet that is exactly what
you propose.

Instead of coping with the current portable asm code generated from STG, I
propose to generate idiomatic C or C-- code and leave its optimization to
the appropriate compiler. The SEPARATE problem is that gcc/icc can generate
much better code than qc (Quick C--), so I propose to direct the effort of
generating idiomatic C[--] toward the C compilers.

SPJ You can almost do this today. GHC uses C-- as an intermediate language.
SPJ But alas, GHC's code generator does not take advantage of C--'s native
SPJ calls or parameter passing.  Instead, it pushes things on an auxiliary
SPJ stack etc.  (Those Sp memory operations you see.)  This isn't necessary.
SPJ We are planning to split GHC's code generator into two parts
SPJ A) Generate C-- with native calls, with an implicit C-- stack
SPJ B) Perform CPS conversion, to eliminate all calls in favour of jumps,
SPJ using an explicit stack
SPJ The current code gen is A followed by B.  But A is a much more suitable
SPJ optimisation platform, and gives more flexibility.

I propose to implement only A and leave B to the C[--] compiler itself.
That will require returning to the two-stack model, but in return we get
first-class optimization instead of implementing a rather restricted subset
of it ourselves. Instead of permanently trying to catch up with the C
compilers' optimizations, we can just jump on the common bandwagon :)

SPJ Chris Thompson, an undergrad at Cambridge, is doing (B) as his
SPJ undergrad project, although it remains to be seen whether he'll have
SPJ enough complete to be usable.

SPJ Another shortcoming is that the native code generator in GHC isn't
SPJ capable of dealing with backward jumps to labels (because GHC hasn't
SPJ needed that so far).  But if you did C-- optimisation, you'd probably
SPJ generate such jumps.  It'd be great to beef up the native code gen to
SPJ handle that.

I propose to add if/for/while statements to the CmmStmt datatype to allow
generation of higher-level code. As John Meacham wrote several months ago,
gcc can't unroll loops written with gotos, so it is better to generate
explicit loops anyway. It should not be too hard to generate asm code for
these constructs, should it?

although anyway 

Re[2]: inside the GHC code generator

2006-02-24 Thread Bulat Ziganshin
Hello kyra,

Friday, February 24, 2006, 12:37:02 AM, you wrote:

 I prefer to see the asm code. This may be due to better high-level
 optimization strategies (reusing fib values). The scheme I am talking
 about will combine the advantages of both worlds
k no strategies, plain exponential algorithm,

Yes, the OCaml compiler works better with the stack. But I am sure that in
most cases gcc will outperform OCaml, because it has a large number of
optimizations that are not easy to implement (unrolling, instruction
scheduling and so on).

k also, Clean is *EXACTLY* in line with ocaml. This is interesting,
k because Clean is so much similar to Haskell.

Clean differs from Haskell in its support for uniqueness types and
strictness annotations. The latter is slowly migrating into GHC in the form
of bang patterns, but I think that is only a half-solution. In my original
letter I proposed adding strictness annotations to function type
declarations and allowing strict data structures, such as ![Int], to be
declared.

-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]



Re[2]: inside the GHC code generator

2006-02-24 Thread Bulat Ziganshin
Hello Ben,

Friday, February 24, 2006, 2:04:26 AM, you wrote:

 * multiple results can be returned via C++ pair<a,b> template, if this is
 efficiently implemented in gcc

BRG There's a -freg-struct-return option in 2.x, 3.x and 4.x. I think it's off
BRG by default on many architectures.

Thank you! That is one problem down. Moreover, we can use just plain C for
our translation instead of C++.
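
For example, two results can be returned as a small C struct; with
-freg-struct-return, gcc can hand such a struct back in registers rather
than through memory (an illustrative sketch, the names are mine):

struct LongPair { long fst, snd; };

struct LongPair divmod(long a, long b)
{
    struct LongPair r = { a / b, a % b };   /* two results, one return */
    return r;
}

A caller just writes struct LongPair p = divmod(x, y); and reads p.fst and
p.snd.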


 * recursive definitions translated into the for/while loops if possible

BRG I think recent versions of GCC do a good job of this. See

BRG  http://home.in.tum.de/~baueran/thesis/

There is no problem with the proper handling of tail calls. The problem is
that these loops are not unrolled, generating significantly worse code than
an explicit loop. You can see this in the files I attached to the original
letter.

BRG All of this efficiency stuff aside, there's a big problem you're neglecting:
BRG GARBAGE COLLECTION. For a language that allocates as much as Haskell I think
BRG a conservative collector is absolutely out of the question, because they
BRG can't compact the heap and so allocation becomes very expensive. A copying
BRG collector has to be able to walk the stack and find all the roots, and that
BRG interacts very badly with optimization. All the languages I've seen that
BRG compile via C either aren't garbage collected or use a conservative or
BRG reference-count collector.

As I said, we can just use two stacks: one, pointed to by the EBP register,
contains all boxed values; the second is the hardware stack, pointed to by
ESP of course, which contains unboxed values and is managed by gcc just as
for any other C program. So boxed parameters to functions go through the
EBP-pointed stack, and unboxed values are passed via the usual C
conventions:

int fac(int n, int r)

Currently, EBP is used for all data and ESP is simply not used. Moreover,
until 1999 the same two-stack scheme was used in GHC. Compared to the
current one-stack scheme, we will need more space for the stacks and may
lose something because memory usage patterns will be slightly less
localized.
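
A minimal sketch of what I have in mind (the names are hypothetical, not
GHC's): unboxed values travel on the ordinary C stack via the normal
calling convention, while boxed values go through a separate stack reached
by a dedicated global pointer:

typedef struct Closure Closure;

extern Closure **BoxedSp;        /* boxed-value stack pointer, e.g. kept in EBP */

/* unboxed arguments are ordinary C parameters, so gcc manages ESP itself
   and can keep n and r in registers */
long fac(long n, long r)
{
    if (n == 1)
        return r;
    return fac(n - 1, n * r);    /* a tail call gcc compiles into a jump */
}

/* a boxed argument would instead be pushed by the caller onto the boxed
   stack, where the garbage collector can always find it */
void push_boxed(Closure *c) { *--BoxedSp = c; }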


-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]



Re: Re[2]: inside the GHC code generator

2006-02-24 Thread Lemmih
On 2/24/06, Bulat Ziganshin [EMAIL PROTECTED] wrote:
 k also, Clean is *EXACTLY* in line with ocaml. This is interesting,
 k because Clean is so much similar to Haskell.

 Clean differs from Haskell in its support for uniqueness types and
 strictness annotations. The latter is slowly migrating into GHC in the
 form of bang patterns, but I think that is only a half-solution. In my
 original letter I proposed adding strictness annotations to function type
 declarations and allowing strict data structures, such as ![Int], to be
 declared.

As I've understood it, Clean's strictness annotations are a bit of a
hack which only works on certain built-in types. Am I mistaken here?

--
Friendly,
  Lemmih


Re[2]: inside the GHC code generator

2006-02-23 Thread Bulat Ziganshin
Hello kyra,

Thursday, February 23, 2006, 5:38:54 PM, you wrote:

k Bulat Ziganshin wrote:
 I think that OCaml can't generate code better than gcc and especially
 icc (the Intel C/C++ compiler), but maybe I'm wrong? ;)
k didn't try factorial, but exponential fib in ocaml is *FASTER* than both
k gcc and intel c/c++ with highest optimization levels

I prefer to see the asm code. This may be due to better high-level
optimization strategies (reusing fib values). The scheme I am talking about
will combine the advantages of both worlds.


-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]



Re[2]: inside the GHC code generator

2006-02-23 Thread Bulat Ziganshin
Hello Rene,

Thursday, February 23, 2006, 5:32:21 PM, you wrote:

It seems that you don't understand the situation. GHC compiles Haskell to a
language called Core, does almost all optimizations at the level of this
language, then translates the final result into the STG language, from
which the C-- code is generated. Changing the translation of STG can't
prevent ANY GHC optimization, although it is not so easy, because GHC's
code generation and the RTS are closely tied together.

RdV I should have been a bit clearer here. I meant that optimizations that are
RdV available from STG -> Assembler are better than STG -> C -> Assembler.

Theoretically, yes. In practice, it is hard to compete with gcc. GHC has
been waiting for such code generation for 15 years (waiting for the
mountain to come to us :), and I think that the REAL way to reach gcc-level
optimization is to compile to idiomatic C instead of continuing to wait for
this theoretically possible optimization to appear.

RdV GHC currently doesn't do most of the optimizations I am thinking of.
RdV -- Bit tagging to reduce pointer chasing, speed up pattern matching. Due to
RdV memory latency and speed it is quicker to do bit masking rather than memory
RdV reads
RdV -- Parameter passing and register usage optimizations that rely on the
RdV structure of the RTS.
RdV -- Multiple stacks with custom frame layout.
RdV -- dynamic code optimization etc.
RdV -- Taking advantage of special assember instructions and flags.

I think all of these things can be done via the idiomatic-C route as well.

RdV Though I have also seen comments that you can do a lot of these with GCC if
RdV you do your own stack and parameter management. i.e. don't use the C stack
RdV at all.

As I see it in the current code generation, attempts to do our own stack
management just mean that gcc does not understand the code and does not
optimize it as well as it could. Please compare the code generated for the
fac() function with all the other variants.

RdV Though your suggestions are probably better than nothing, which is probably
RdV what the alternative is (for instance I have not sufficient time to work on
RdV these things).

Moreover, I am sure that you can't compete with gcc :)

RdV Note that I didn't say that the assembly generation of OCAML was better than
RdV GCC, just that it was comparable.

What does comparable mean? And what would we have to do to reuse this code
generation in GHC? At least I know what to do to reuse gcc's code
generation. Can you propose a similar plan for reusing OCaml's code
generation?


-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]



Re[2]: inside the GHC code generator

2006-02-23 Thread Bulat Ziganshin
Hello Claus,

Thursday, February 23, 2006, 8:56:57 PM, you wrote:

 The long answer is: have you ever heard anyone promise that gcc is the
 best cpu-independent assembler? No? And do you know why? Because gcc is
 not a cpu-independent assembler. gcc was heavily optimized to produce
 efficient asm from the kind of code usually written by C programmers, but
 the code generated by GHC has nothing in common with it. So we are stuck
 with all these register-memory moves, non-unrolled loops and all the
 other results of naive compilation. gcc simply does not know how to
 handle such code efficiently!

CR would there be any advantage to targetting gcc's backend directly?

CR I notice that Mercury seems to support this:
CR http://www.cs.mu.oz.au/research/mercury/download/gcc-backend.html
CR http://gcc.gnu.org/frontends.html

CR that is, does using C as assembler disable possible optimizations,
CR or is going through the C frontend enabling more optimizations than
CR going to the backend directly?

I will read this. But one obvious disadvantage is that we could no longer
use other C compilers, including the more efficient Intel C/C++ (for my
code it was consistently 10% faster).

-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]



Re[2]: inside the GHC code generator

2006-02-23 Thread Bulat Ziganshin
Hello Kevin,

Thursday, February 23, 2006, 9:06:25 PM, you wrote:

KG On a related point, Mercury has two C backends: a low-level one at the
KG level of GHC's and a high-level one. Bulat might want to read this for
KG a description of the high-level C implementation:

KG http://www.cs.mu.oz.au/research/mercury/information/papers.html#hlc_cc

Quoting from the paper's abstract: "Many logic programming implementations
compile to C, but they compile to very low-level C, and thus discard many
of the advantages of compiling to a high-level language." That is exactly
what I think.

KG Also, ghc used to be faster than gcc for a naive, recursive factorial
KG function (once the cpr analysis and optimisation was added).  From
KG what Bulat wrote it seems that gcc got better ...

I am not saying that we must compile recursive Haskell/STG functions
naively to recursive C ones (as jhc does). No - I am saying that we should
translate recursive Haskell definitions into explicit C loops, which is the
NATURAL C PROGRAMMING STYLE and is therefore optimized much better, as you
can see in the files I attached.
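
To make the contrast concrete, here is the same accumulator factorial in
both styles (an illustrative sketch, nothing more); gcc typically applies
its strongest loop optimizations - register allocation, unrolling,
scheduling - to the second form:

long fac_recursive(long n, long r)
{
    if (n == 1)
        return r;
    return fac_recursive(n - 1, n * r);   /* naive translation: a tail call */
}

long fac_loop(long n, long r)
{
    for (; n != 1; n--)                   /* natural C style: a counted loop */
        r *= n;
    return r;
}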

-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]



Re[2]: inside the GHC code generator

2006-02-23 Thread Kevin Glynn

Bulat Ziganshin writes:
  Hello Kevin,
  

  
  KG Also, ghc used to be faster than gcc for a naive, recursive factorial
  KG function (once the cpr analysis and optimisation was added).  From
  KG what Bulat wrote it seems that gcc got better ...
  
  I am not saying that we must compile recursive Haskell/STG functions
  naively to recursive C ones (as jhc does). No - I am saying that we should
  translate recursive Haskell definitions into explicit C loops, which is the
  NATURAL C PROGRAMMING STYLE and is therefore optimized much better, as you
  can see in the files I attached.
  

Ahhh, OK. I misunderstood.  I don't claim ghc beat a C loop.  ghc was
just more optimised at making function calls :-).

k


Re[2]: inside the GHC code generator

2006-02-23 Thread Bulat Ziganshin
Hello Rene,

Thursday, February 23, 2006, 10:17:40 PM, you wrote:

RdV Maybe GHC should generate better C. I am just not sure whether this will
RdV bring the best global (as opposed to micro-optimizations) performance.

I answered that in my original letter (search for "Cray" :)

RdV Generating better C might also be much more difficult for idioms that don't
RdV exist in C,

I would be glad to see concrete examples.

RdV and might encourage people to write Haskell that looks like C
RdV (in order to get C like performance).

Yes, people already do exactly that (see the shootout entries or my Streams
library) and still don't get C performance, so I think we would end up in a
better position :)

RdV I prefer idiomatic Haskell. It would be nice if this could be fast. But can
RdV idiomatic Haskell be translated to efficient C? It might not have many
RdV direct loops for example.

Sorry, I don't understand what you mean. In general, idiomatic Haskell has
a big efficiency problem, and that problem is laziness. Strict Haskell
code, like ML code, can be compiled efficiently.

RdV -- Other notes

RdV Integer is about 30 times slower than it needs to be on GHC if you have over
RdV half the values between -2^31 and 2^31. On i386 you can basically test the
RdV low bit for free (it runs in parallel to the integer add instruction).
RdV This would allow only values outside this range to require calling the long
RdV integer code. Such an optimization is not easily done in C.

RdV This encourages Haskell programmers to always use Int, even if the value
RdV might get too big, because Integer is too slow.

This optimization has not been implemented in GHC so far, and I think the
amount of work required to implement it is larger than the need for faster
Integer processing justifies. Moreover, I hope that a good optimizing C
compiler can avoid the additional testing.
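
Just to illustrate the idea being discussed, a hedged sketch (this is not
GHC's actual Integer representation, and the bignum fallback here is
hypothetical): values with the low bit set are immediate integers, so the
common case is one bit test plus ordinary arithmetic:

#include <stdint.h>

#define IS_SMALL(w)  ((w) & 1)
#define UNTAG(w)     ((intptr_t)(w) >> 1)        /* arithmetic shift assumed */
#define TAG(n)       (((uintptr_t)(n) << 1) | 1)
#define SMALL_MAX    (INTPTR_MAX >> 1)
#define SMALL_MIN    (-SMALL_MAX - 1)

uintptr_t integer_add_big(uintptr_t a, uintptr_t b);   /* hypothetical slow path */

uintptr_t integer_add(uintptr_t a, uintptr_t b)
{
    if (IS_SMALL(a) && IS_SMALL(b)) {
        intptr_t r = UNTAG(a) + UNTAG(b);
        if (r >= SMALL_MIN && r <= SMALL_MAX)    /* still fits in a tagged word */
            return TAG(r);
    }
    return integer_add_big(a, b);                /* boxed operand or overflow */
}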

RdV Also Tail recursion is more general than looping. A general tail call 
RdV optimization will bring better returns than a loop optimization in GHC
RdV (though I could be wrong here). This requires special stack handling. Also
RdV not so easy in C.

It seems that you have not seen the attached files. Tail calls are
optimized by gcc.

RdV If only simple loops are optimized it will encourage people to always code
RdV loops in their haskell rather than perhaps using more appropriate 
RdV constructs.

Would you prefer that people have no way to write fast code at all? :)
Also, I want to emphasize that a C loop is Haskell recursion. We have
always said that these two constructs are equivalent; now it is time to
generate code with the same speed.

RdV Also take the Maybe data type with Nothing and Just ... or any other
RdV datatype with 0- and 1-field constructors. These could be represented by
RdV special values for the 0-field case and bit marking on the single-field
RdV constructor values. This could lead to good optimizations on case
RdV expressions. Also not so easy in C.

1) it is far beyond GHC's current optimization facilities
2) I don't see a problem with using if() - see the sketch below
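
For example, something like this, where Nothing is just a reserved pointer
value and the case expression becomes a single comparison (names are
illustrative, not GHC's representation):

typedef struct Closure Closure;

#define NOTHING ((Closure *) 0)

long get_just(Closure *c);                 /* hypothetical: read the Just field */

long from_maybe(long dflt, Closure *m)     /* fromMaybe specialized to a long payload */
{
    if (m == NOTHING)
        return dflt;
    return get_just(m);
}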

RdV The lack of this feature encourages people to encode their datatypes as
RdV Int's to get speed. Also not good.

RdV Whether we can communicate the non aliasing and aliasing properties of GHC
RdV to the C compiler, I am also not so sure.

concrete examples?

RdV Also in GHC, I am not sure whether stack base locals are the best move. It
RdV might be best to have a fixed page as register spill in some cases.

That is an optimization gcc can handle much better :)  Just look at the
code generated for the fac() function.

RdV If you wish to pass and return unboxed tuples without reboxing you will
RdV probably require a special function interface with double entry points (I
RdV think this can be done in C, but it is a bit tricky).

As I said, pair<a,b> can probably be used, but I have not tested this yet.



-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]



RE: Re[2]: inside the GHC code generator

2006-02-23 Thread Rene de Visser




From: Bulat Ziganshin [EMAIL PROTECTED]

I answered that in my original letter (search for "Cray" :)


Re-reading this, I see that you have well-defined goals that cover most of
my points.


It seems that you have not seen the attached files. Tail calls are
optimized by gcc.


No I don't see any attached files (on any of your emails).

Best of luck with your optimisations.
I will look forward to using them.

(Although I don't like the style of the shootout code, I find it very
useful in helping me speed up my own code, and yes, I find it better to
write Haskell in such a style than to use the FFI and call C.)

Rene.




Re: Re[2]: inside the GHC code generator

2006-02-23 Thread Jean-Philippe Bernardy
Hello Bulat,

From my (limited) knowledge of the GHC backend, the difficult part of your
plan is that STG is not suited to compilation to native C at all. You might
need to do a quite advanced translation from STG to another intermediate
language (GRIN, for example), and then some more advanced
analysis/optimization before C can be generated.

IIRC, the tricky part is the handling of laziness. At any point you may end
up with a thunk (closure), which GHC handles easily by evaluating it: it is
always a function pointer. (That's the "tagless" part.) When the value is
already in WHNF, the call just runs a very simple function that fills
registers with the evaluated data. Otherwise, the function call performs
the reduction to WHNF.
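
In rough C terms the convention looks something like this (an illustrative
sketch only, not GHC's actual closure layout): every heap object begins
with a code pointer, so forcing a value is always the same indirect call,
whether the object is an unevaluated thunk or already in WHNF:

typedef struct Closure Closure;
typedef Closure *(*EntryCode)(Closure *self);

struct Closure {
    EntryCode entry;       /* thunk: code that evaluates; WHNF: code that just returns */
    void     *payload[];   /* free variables or constructor fields */
};

Closure *force(Closure *c)
{
    return c->entry(c);    /* reduce to WHNF (a cheap call if already evaluated) */
}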

If you want to use the same trick, you'll end up with the same
problems (bad C). So, you'll want to eliminate thunks statically,
finally requiring a 'high level' analysis as I suggested above.

Also, allocation + garbage collection is extremely efficient in current
GHC... C/C++ allocation doesn't even come close. It's entirely possible
that even with a very fast backend, the better GHC allocator will be enough
to swing the balance (see jhc). It might be possible to re-use most of it,
however.
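
For comparison, allocation in a GHC-style nursery is roughly this cheap - a
pointer bump plus a limit check (an illustrative sketch, not the actual RTS
interface):

#include <stddef.h>

extern char *Hp, *HpLim;           /* nursery allocation pointer and limit */
void *collect_and_retry(size_t n); /* hypothetical slow path: run the GC */

void *allocate(size_t n)
{
    char *p = Hp;
    Hp += n;
    if (Hp > HpLim)                /* nursery full: hand off to the collector */
        return collect_and_retry(n);
    return p;                      /* common case: a couple of instructions */
}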

I certainly don't want to discourage you (we all want faster code :),
but there is no easy path to climb the mountain ;)

Cheers,
JP.