Re[2]: inside the GHC code generator

2006-02-28 Thread Bulat Ziganshin
Hello Simon,

Friday, February 24, 2006, 12:52:36 PM, you wrote:
SM Wow!  You make a lot of good suggestions.  I think some of them are
SM unfortunately unworkable, some others have been tried before, but there
SM are definitely some to try.  I'll suggest a few more in this email.

can you comment about unworkable/hard to implement ones? may be, ghc
users can propose some solutions that you are skipped

SM First of all I should say that we don't *want* to use gcc as a code 
SM generator.  Everything we've been doing in the back end over the last 
SM few years has been aiming towards dropping the dependency on gcc and 
SM removing the Evil Mangler.  Now is not the time to be spending a lot of
SM effort in beefing up the C back end :-)  Our preference is to work on 
SM both the native and C-- back ends; the C-- back end has the potential to
SM produce the best code and be the most portable.  We also plan to keep 
SM the unregisterised C back end for the purposes of porting to a new 
SM platform, it's only the registerised path we want to lose.

i know that is the hardest part of discussion. but sometimes we need
to change our decisions. i think that it's better to compare
efforts/results of going toward gcc/c-- way starting from current
state of ghc, avoiding counting the previous efforts :(  on the c--
side:

1) time/efforts required to implement many low-level optimizations
2) more or less efficient code

on the gcc side:

1) time required to implement generation of native C code instead of
current portable asm
2) one of the best optimization possible

gcc way already exists and works, while C-- way is still unfinished
even in its basic, unoptimized state. next, unregisterized C way
provides the maximum possible level of portability


SM Having said all that, if we can get better code out of the C back end
SM without too much effort, then that's certainly worthwhile.

moreover, many changes, proposed on this thread, perfectly
implementable even on C-- way

 translated to something like this
 
 factorial:
 _s1BD = *(Sp + 0);
 if (_s1BD != 1) goto c1C4; // n!=1 ?
 R1 = *(Sp + 4);
 Sp = Sp + 8;
 return (*(Sp + 0))();  // return from factorial
 c1C4:
 _s1BI = _s1BD * (*(Sp + 4));   // n*r
 _s1BF = _s1BD - 1; // n-1
 *(Sp + 4) = _s1BI;
 *(Sp + 0) = _s1BF;
 goto factorial;

SM To be fair, this is only the translation on an architecture that has no
SM argument registers (x86, and currently x86_64 but hopefully not for long).

this just emphasizes main problem of portable asm code generated by
ghc - you can't optimize it as good as C compiler optimizes idiomatic
C code. x86 has 7 registers, after all, and gcc generates great code
for this procedure - but only if it is written in idiomatic manner.
you will need to develop your own optimization techniques, and it will
be hard and long way comparing to just generation of more high-level C
code

just to compare - i've rewritten my binary serialization library two
times. the first version just used string concatenation (!), the
second was the carefully optimized using my C experience.
nevertheless, it was slow enough. and the final 3rd was optimized
according to ghc features. and the result was astonishing

portable asm looking great in theory, and C-- as portable low-level
language is great theoretical idea. but reality has its own laws. want
to know why my second version of serialization library, highly
optimized according to C experience, was so slow? it just returns
tuples and constructing/analyzing these lazy tuples used 80% of the
time

ghc generates not so fast code and that is known for many years. you
have developed the proper theoretical way to solve this problem, but
there is also much easier backdoor :) at least, it seems like this

SM The simple optimisation of changing the final tail call to factorial 
SM into a goto to a label at the beginning of the function (as suggested by
SM John Meacham I think) should improve the code generated by gcc for 
SM functions like this, and is easy to do.

the main detail is to work with local variables instead of Sp[xx]. gcc
can't optimize access to Sp[xx]

 1) because it just not asked. you can enable gcc optimization by
 adding -optc-O6 to the cmdline, but this leads to compilation errors
 on part of modules. it seems that gcc optimization is not compatible
 with evil mangler that is the ghc's own optimization trick.

SM -O3 works, I haven't tried -O6.

i've attached optc-O2-bug.hs that don't work both with -optc-O2 and
-optc-O3

 * C++ calling stack should be used as much as possible
 * parameters are passed in C++ way: factorial(int n, int r)

SM This is unworkable, I'm afraid.  Go and read Simon's paper on the STG:

SMhttp://citeseer.ist.psu.edu/peytonjones92implementing.html

SM there are two main reasons we don't use the C stack: garbage collection
SM and tail calls.  What do you plan to do about GC?


Re: inside the GHC code generator

2006-02-28 Thread Simon Marlow

Bulat Ziganshin wrote:


* C++ calling stack should be used as much as possible
* parameters are passed in C++ way: factorial(int n, int r)



SM This is unworkable, I'm afraid.  Go and read Simon's paper on the STG:

SMhttp://citeseer.ist.psu.edu/peytonjones92implementing.html

SM there are two main reasons we don't use the C stack: garbage collection
SM and tail calls.  What do you plan to do about GC?

minimalistic solution i see - just use two stacks. unboxed values are
passed using the C conventions and C compiler will be able to
optimize using of this stack. boxed values are passed using the
current Sp[] machinery and therefore GC will not be changed


How do you propose to do thread switching?

Incedentally, GHC had two stacks a long time ago (before 4.00) I rewrote 
the runtime and code generator to replace it with one stack.  It was 
about 6 months work, if I remember correctly.  I think you misused the 
words just and minimalistic :-)


I know gcc does (some) tail-call optimisation these days.  You don't get 
any *guarantee* that a call will be implemented as a tail-call, though, 
and the conditions are quite complex.  The gcc folks treat it as an 
optimisation, rather than a fundamental part of the operational 
semantics, which is what you need for Haskell.


SM This is called semi-tagging, it was tried a long time ago.  Certainly 
SM worth trying again, but I very much doubt the gains would be anything 
SM like you claim, and it increases code size.


we will replace 2 unpredictable jumps with one that will not be even
executed if value is in WHNF. at least, i'm very displeased by slowness
of lazy values evaluation in GHC. for example, my binary serialization
package writes 60 mb/s working with unboxed arrays and only 20 mb/s
working with the lists


You need to *try* these things and measure the difference.  Quite often 
I'm surprised with the actual results I get from an optimisation that 
looks on paper to be worthwhile.  Today's processors with 3-level caches 
are complicated beasts.


Semi tagging increases code size, so unless you can make some 
significant gains to make up for it, you've lost already.


Let me give you another example: GHC puts info tables next to the entry 
code for a closure.  On paper this looks like it might be a bad idea: it 
pollutes the instruction cache with data that is rarely accessed during 
execution.  It's hard to arrange (this is one thing the mangler does), 
so we'd like to avoid it if possible.  However, doing this eliminates an 
indirection when entering a closure, reduces the size of info tables by 
one word, and reduces code size.  These effects together are more 
beneficial than the code-cache-pollution effect - last time I measured 
it the difference was ~10%.  (turn off TABLES_NEXT_TO_CODE and try it 
yourself).



SM There are other schemes, including this one that we particularly like:
SM use a spare bit in a pointer to indicate points to an evaluated value.
SM   Since there are two spare bits, you can go further and use the 4 
SM values to mean:


SM0: possibly unevaluated
SM1: evaluated, tag 0
SM2: evaluated, tag 1
SM3: evaluated, tag 2

SM I believe Robert Ennals tried this as part of his spec eval 
SM implementation, but IIRC he didn't get separate measurements for it (or

SM it didn't improve things much).  Note that this increases code size too.

to be exact, we had only 2^30-1 valid pointers, all other values are
ready to use! :)


True, but I'm assuming you often want to store a pointer in addition to 
the tag (eg. for a cons cell).  For enumerations, you're quite right 
that the whole word can be used for the tag as long as it is 
distinguished from a pointer.


Cheers,
Simon
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: GHC 6.4.1 crash on Windows XP

2006-02-28 Thread Simon Marlow

Cyril Schmidt wrote:

A freshly installed GHC 6.4.1 on my colleague's PC crashes when I try
to build a package:

runhaskell Setup.hs build

The effect is easily reproduceable (it shows up on *any* package
that I try to build).

Does anyone have any idea of what might be wrong here?


Does it always crash in the same way?

Can you fire up GHCi and perform a small evaluation or two?

Can you compile small programs, do they run?

Please send us the complete output from commands that crash: eg. add -v 
to the build command above.


Cheers,
Simon
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Missing Folder in ghc?

2006-02-28 Thread Lemmih
On 2/28/06, Ashley Yakeley [EMAIL PROTECTED] wrote:
 I'm trying to build GHC from source. But the ghc repository at
 http://darcs.haskell.org/ghc seems to be missing ghc/lib/compat/Cabal?

Did you run 'sh darcs-all get'?

--
Friendly,
  Lemmih
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Missing Folder in ghc?

2006-02-28 Thread Ashley Yakeley

Lemmih wrote:

Did you run 'sh darcs-all get'?


Oh, that wasn't in the README. Thanks. But now I get this:

/usr/bin/ghc -H16m -O -I. -Iinclude -Rghc-timing  -I../../../libraries 
-fglasgow-exts -no-recomp-c System/Directory/Internals.hs -o 
System/Directory/Internals.o  -ohi System/Directory/Internals.hi


System/Directory/Internals.hs:1:0:
Module `System.Directory.Internals' is a member of package base-1.0.
To compile this module, please use -ignore-package base-1.0.

I'm using GHC 6.4.

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users