Re[2]: inside the GHC code generator

2006-02-24 Thread Bulat Ziganshin
Hello kyra,

Friday, February 24, 2006, 12:37:02 AM, you wrote:

 i prefer to see the asm code. this may be because of better high-level
 optimization strategies (reusing fib values). the scheme i am talking about
 will combine the advantages of both worlds
k no strategies, plain exponential algorithm,

yes, the ocaml compiler works better with the stack. but i'm sure that in
most cases gcc will outperform ocaml because it has a large number of
optimizations which are not easy to implement (unrolling, instruction
scheduling and so on)

k also, Clean is *EXACTLY* in line with ocaml. This is interesting,
k because Clean is so much similar to Haskell.

clean differs from Haskell in its support of unique types and strictness
annotations. the latter is slowly migrating into GHC in the form of bang
patterns, but i think that it is a half-solution. i mentioned in my
original letter my proposals to add strictness annotations to
function type declarations and to declare strict data structures, such
as ![Int]
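
for illustration, here is a minimal sketch of what such an annotation could mean operationally, written with today's bang patterns; the !Int arrows in the commented signature are the hypothetical proposed syntax, not something GHC accepts:

{-# LANGUAGE BangPatterns #-}

-- proposed (hypothetical) syntax:  sumTo :: !Int -> !Int
-- nearest thing expressible today: force the argument on entry
sumTo :: Int -> Int
sumTo !n = go 0 1
  where
    go !acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)

main :: IO ()
main = print (sumTo 100)   -- 5050

the strict-result half (the second !Int) has no direct equivalent here; it would only be extra information for the compiler to exploit.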

-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re[2]: inside the GHC code generator

2006-02-24 Thread Bulat Ziganshin
Hello Ben,

Friday, February 24, 2006, 2:04:26 AM, you wrote:

 * multiple results can be returned via the C++ pair<a,b> template, if this is
 efficiently implemented in gcc

BRG There's a -freg-struct-return option in 2.x, 3.x and 4.x. I think it's off
BRG by default on many architectures.

thank you! -1 problem. moreover, we can use just plain C for our
translation instead of C++


 * recursive definitions translated into the for/while loops if possible

BRG I think recent versions of GCC do a good job of this. See

BRG  http://home.in.tum.de/~baueran/thesis/

there is no problem with proper handling of tail calls. the problem is
that these loops are not unrolled, generating significantly worse
code than an explicit loop. you can see this in the files i attached to
the original letter

BRG All of this efficiency stuff aside, there's a big problem you're neglecting:
BRG GARBAGE COLLECTION. For a language that allocates as much as Haskell I think
BRG a conservative collector is absolutely out of the question, because they
BRG can't compact the heap and so allocation becomes very expensive. A copying
BRG collector has to be able to walk the stack and find all the roots, and that
BRG interacts very badly with optimization. All the languages I've seen that
BRG compile via C either aren't garbage collected or use a conservative or
BRG reference-count collector.

as i said, we can just use two stacks: one, pointed to by the EBP register,
contains all boxed values; the second is the hardware stack, pointed to of
course by ESP, which contains unboxed values and is managed by gcc as for any
other C program. so the boxed parameters to functions go through the
EBP-pointed stack and unboxed values are passed via the usual C conventions:

int fac(int n, int r)

currently, EBP is used for all data and ESP is just not used.
moreover, until 1999 the same two-stack scheme was used in GHC.
compared to the current one-stack scheme, we will need more space for
the stacks and may lose something because memory usage patterns will be
slightly less localized
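
a minimal C sketch of the idea; the names (Bp, boxed_stack) and the layout are assumptions for illustration, not anything taken from the GHC RTS:

#include <stdint.h>

typedef uintptr_t W_;

static W_  boxed_stack[1024];
static W_ *Bp = boxed_stack + 1024;   /* plays the role of the EBP-pointed stack */

/* the unboxed worker: gcc handles n and r exactly as in any other C program */
static int fac(int n, int r)
{
    return (n == 1) ? r : fac(n - 1, n * r);
}

/* a caller with a boxed value to hand over pushes it on the separate stack */
int call_fac_with_boxed(W_ some_closure, int n)
{
    *--Bp = some_closure;             /* boxed argument goes via the RTS stack  */
    int result = fac(n, 1);           /* unboxed arguments use the C convention */
    Bp++;                             /* pop the boxed argument                 */
    return result;
}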


-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re[4]: inside the GHC code generator

2006-02-24 Thread Bulat Ziganshin
Hello Jean-Philippe,

Friday, February 24, 2006, 2:04:57 AM, you wrote:

JPB From my (limited) knowledge of GHC backend, the difficult part of your
JPB plan is that STG is not suited to compilation to native C at all. You
JPB might need to do quite advanced translation from STG to another
JPB intermediate language (such as GRIN), and then some more
JPB advanced analysis/optimization before C can be generated.

your knowledge is very limited, because STG has at all times been compiled
to C and then compiled by gcc :)  moreover, it's very easy to
implement efficient compilation of the fac() definition from STG to
natural C - it's a one-hour task, maximum. the real problem is changing
the whole environment, including the AST representing C code inside GHC,
the RTS and so on

JPB iirc, the tricky part is the handling of laziness. At any point you
JPB may end up with a thunk (closure), which ghc handles easily by
JPB evaluating it: it's always a function pointer. (That's the tagless
JPB part) When in WHNF, it just calls a very simple function that fills
JPB registers with the evaluated data. Otherwise, the function call
JPB performs the reduction to WHNF.

STG attaches explicit typing information to every var. i propose:

1) compile code that works with unboxed values to equivalent natural
C code
2) for boxed values, MAXIMALLY simplify the test that they are already
in WHNF. the current solution leads to two jumps, which is very expensive
on modern cpus. i propose to make instead one check and one
conditional jump, which will be much cheaper. then, if the value is in WHNF,
we just load all the needed data ourselves. if it is not in WHNF, we
perform just the same operations as in the current GHC. for example, the
wrapper that calls the int fac(int n, int r) worker could be
something like this:

if ((p = n->closure)) goto eval_n;
n_evaluated:
if ((p = r->closure)) goto eval_r;
r_evaluated:
return fac (n->value, r->value);

eval_n:
(*p) (); // sets n->closure = NULL and n->value to n's value
goto n_evaluated;

eval_r:
(*p) ();
goto r_evaluated;
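
spelled out as compilable C under an assumed boxed-Int layout (illustrative only - this is not GHC's real closure representation, and IntBox/fac_wrapper are hypothetical names), the same idea could read:

typedef struct IntBox {
    void (*closure)(struct IntBox *self); /* evaluator; NULL once in WHNF  */
    int    value;                         /* valid only when closure==NULL */
} IntBox;

/* the unboxed worker */
static int fac(int n, int r)
{
    return (n == 1) ? r : fac(n - 1, n * r);
}

/* one test and one conditional branch per argument, as proposed above */
int fac_wrapper(IntBox *n, IntBox *r)
{
    if (n->closure) n->closure(n);        /* evaluate; sets closure = NULL */
    if (r->closure) r->closure(r);
    return fac(n->value, r->value);
}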

JPB If you want to use the same trick, you'll end up with the same
JPB problems (bad C). So, you'll want to eliminate thunks statically,
JPB finally requiring a 'high level' analysis as I suggested above.

really? :)

JPB Also, allocation + garbage collection is extremely efficient in
JPB current GHC... C/C++ alloc doesn't even come close. It's entirely
JPB possible that with even a very fast backend, the better ghc allocator
JPB will be enough to swing the balance. (see jhc)
JPB It might be possible to re-use most of it however.

i don't propose to replace everything in the ghc backend, just the way
unboxed code is compiled and something around it. i just don't know
enough about the other parts of the ghc backend/rts :)))

-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


RE: inside the GHC code generator

2006-02-24 Thread Simon Peyton-Jones
| last days i studied GHC code generator, reading -ddumps of all sorts,
| GHC sources, various papers and John's appeals :)
| 
| what i carried from this investigation:
| 
| GHC has great high-level optimization. moreover, GHC now allows one to
| program STG almost directly. when i look at the STG code i don't see
| anything that can be done better - at least for these simple loops i
| tried to compile. i see just unboxed variables, primitive operations
| and simple loops represented as tail recursion. fine.
| 
| then STG is compiled to a simplified C (called abstract C earlier and
| quasi C-- now), which can then be translated:
| 
| * to real C and then compiled by gcc
| * to assembler by build-in simple C-- compiler
| * to assembler by some external C-- compiler (at least it is
| theoretically possible)

Good stuff Bulat.  There's plenty of interesting stuff to be done here.

However, let me strongly urge you *not* to focus attention primarily on
the gcc route.  Compiling via C has received a lot of attention over the
years, and there are many papers describing cool hacks for doing so.
GHC does not do as well as it could. 

But there are serious obstacles.  That's not gcc's fault -- it wasn't
designed for this.  Accurate GC is one of them, tail calls are another,
and there are plenty more smaller things that bite you only after you've
invested a lot of time.  This way lies madness.

C-- was *designed* for this purpose.  GHC uses C-- as its intermediate
language (just before emitting C).  So a good route is this:

* Write C-- to C-- optimisations

* Then, if you like, translate that code to C.  Already you will be
doing better than GHC does today, because the C-- to C-- optimiser will
let you generate better C

* But you can also emit C--, or native code; both of these paths will
directly benefit from your C-- optimisations.

The point is that none of this relies on Quick C--; you can always use
an alternative back end.



You can almost do this today. GHC uses C-- as an intermediate language.
But alas, GHC's code generator does not take advantage of C--'s native
calls or parameter passing.  Instead, it pushes things on an auxiliary
stack etc.  (Those Sp memory operations you see.)  This isn't necessary.
We are planning to split GHC's code generator into two parts:
A) Generate C-- with native calls, with an implicit C-- stack
B) Perform CPS conversion, to eliminate all calls in favour of jumps
   using an explicit stack
The current code gen is A followed by B.  But A is a much more suitable
optimisation platform, and gives more flexibility.

Chris Thompson, an undergrad at Cambridge, is doing (B) as his
undergrad project, although it remains to be seen whether he'll have
enough complete to be usable.  

Another shortcoming is that the native code generator in GHC isn't
capable of dealing with backward jumps to labels (because GHC hasn't
needed that so far).  But if you did C-- optimisation, you'd probably
generate such jumps.  It'd be great to beef up the native code gen to
handle that.


Many of the optimisations you describe (perhaps all) are readily
expressible in the C-- intermediate language, and by working at that
level you will be independent of whether the back end is gcc, a native code
generator, or Quick C--, or some other C-- compiler.  Much better.

Simon
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: inside the GHC code generator

2006-02-24 Thread Simon Marlow

Hi Bulat,

Wow!  You make a lot of good suggestions.  I think some of them are 
unfortunately unworkable, some others have been tried before, but there 
are definitely some to try.  I'll suggest a few more in this email.


First of all I should say that we don't *want* to use gcc as a code 
generator.  Everything we've been doing in the back end over the last 
few years has been aiming towards dropping the dependency on gcc and 
removing the Evil Mangler.  Now is not the time to be spending a lot of 
effort in beefing up the C back end :-)  Our preference is to work on 
both the native and C-- back ends; the C-- back end has the potential to 
produce the best code and be the most portable.  We also plan to keep 
the unregisterised C back end for the purposes of porting to a new 
platform, it's only the registerised path we want to lose.


Having said all that, if we can get better code out of the C back end 
without too much effort, then that's certainly worthwhile.



translated to something like this

factorial:
_s1BD = *(Sp + 0);
if (_s1BD != 1) goto c1C4; // n!=1 ?
R1 = *(Sp + 4);
Sp = Sp + 8;
return (*(Sp + 0))();  // return from factorial
c1C4:
_s1BI = _s1BD * (*(Sp + 4));   // n*r
_s1BF = _s1BD - 1; // n-1
*(Sp + 4) = _s1BI;
*(Sp + 0) = _s1BF;
goto factorial;


To be fair, this is only the translation on an architecture that has no 
argument registers (x86, and currently x86_64 but hopefully not for long).


The simple optimisation of changing the final tail call to factorial 
into a goto to a label at the beginning of the function (as suggested by 
John Meacham I think) should improve the code generated by gcc for 
functions like this, and is easy to do.



1) because it is just not asked for. you can enable gcc optimization by
adding -optc-O6 to the cmdline, but this leads to compilation errors
in some of the modules. it seems that gcc optimization is not compatible
with the evil mangler, which is ghc's own optimization trick.


-O3 works, I haven't tried -O6.


* C++ calling stack should be used as much as possible
* parameters are passed in C++ way: factorial(int n, int r)


This is unworkable, I'm afraid.  Go and read Simon's paper on the STG:

  http://citeseer.ist.psu.edu/peytonjones92implementing.html

there are two main reasons we don't use the C stack: garbage collection 
and tail calls.  What do you plan to do about GC?



* recursive definitions translated into the for/while loops if possible


Certainly a good idea.


* groups of mutually recursive definitions should also be naturally
translated as much as possible
* functions marked INLINE in Haskell code should be marked the same in
C++ code


an inline function will have been inlined, so I'm not sure what you mean here!


* maximally speed up the use of already evaluated values. instead of
unconditionally jumping to the closure-computation function, just test
this field and continue the computation if it contains NULL (indicating
that the value is already in WHNF). This change will make the time penalty of
boxed data structures significantly, at least 10 times, smaller


This is called semi-tagging; it was tried a long time ago.  Certainly 
worth trying again, but I very much doubt the gains would be anything 
like you claim, and it increases code size.


There are other schemes, including this one that we particularly like: 
use a spare bit in a pointer to indicate that it points to an evaluated value.
Since there are two spare bits, you can go further and use the 4 
values to mean:


  0: possibly unevaluated
  1: evaluated, tag 0
  2: evaluated, tag 1
  3: evaluated, tag 2

I believe Robert Ennals tried this as part of his spec eval 
implementation, but IIRC he didn't get separate measurements for it (or 
it didn't improve things much).  Note that this increases code size too.
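
as a rough C sketch of that tagging scheme (the encoding in the low two bits is assumed here for illustration and is not a statement about GHC's actual heap layout):

#include <stdint.h>

typedef uintptr_t HsPtr;

/* assumed encoding: the low two bits hold the tag described above */
static inline unsigned ptr_tag(HsPtr p) { return (unsigned)(p & 3); }
static inline HsPtr    untag(HsPtr p)   { return p & ~(HsPtr)3; }

/* what a case expression could compile to under this scheme */
int scrutinise(HsPtr p)
{
    unsigned tag = ptr_tag(p);
    if (tag == 0) {
        /* possibly unevaluated: enter the closure, exactly as today
           (slow path omitted in this sketch) */
        return -1;
    }
    HsPtr obj = untag(p);   /* the constructor's fields live here */
    (void)obj;
    return (int)(tag - 1);  /* evaluated: constructor tag is tag - 1 */
}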



* in addition to the ability to define strict fields, allow defining
strict data structures (say, lists) and strict types/structures of
function parameters and results. that is a long-standing goal, but
a ![!Int] -> !Int function can be used inside higher-order code a
lot faster than a [Int] -> Int one. the ultimate goal is to allow
Haskell to be as fast as ML dialects in every situation, and this means
that laziness should be optional in every place. this will also
allow efficient emulation of C++ virtual methods


Strict types are interesting, and I know several people (including me) 
who have put some thought into how they would work.  It turns out to be 
surprisingly difficult, though.


You might look into doing something simple: suppose you could declare a 
type to be unlifted:


  data !T = ..

The type T doesn't have a bottom element.  Its kind is !, not *, and it 
can't be used to instantiate a polymorphic type variable, just like 
GHC's primitive types.


This is quite restrictive, but it's simple and pretty easy to implement 
in GHC as it stands.  It gives you a nice 

Re: Re[2]: inside the GHC code generator

2006-02-24 Thread Lemmih
On 2/24/06, Bulat Ziganshin [EMAIL PROTECTED] wrote:
 Hello kyra,

 Friday, February 24, 2006, 12:37:02 AM, you wrote:

  i prefer to see the asm code. this may be because of better high-level
  optimization strategies (reusing fib values). the scheme i am talking about
  will combine the advantages of both worlds
 k no strategies, plain exponential algorithm,

 yes, the ocaml compiler works better with the stack. but i'm sure that in
 most cases gcc will outperform ocaml because it has a large number of
 optimizations which are not easy to implement (unrolling, instruction
 scheduling and so on)

 k also, Clean is *EXACTLY* in line with ocaml. This is interesting,
 k because Clean is so much similar to Haskell.

 clean differs from Haskell in its support of unique types and strictness
 annotations. the latter is slowly migrating into GHC in the form of bang
 patterns, but i think that it is a half-solution. i mentioned in my
 original letter my proposals to add strictness annotations to
 function type declarations and to declare strict data structures, such
 as ![Int]

As I've understood it, Clean's strictness annotations are a bit of a
hack which only works on certain built-in types. Am I mistaken here?

--
Friendly,
  Lemmih
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


RE: inside the GHC code generator

2006-02-24 Thread Simon Peyton-Jones

| The closest thing I've seen to a solution is the technique used in Urban
| Boquist's thesis, which is to put a static table in the executable with
| information about register and stack usage indexed by procedure return
| point, and use that to unwind the stack and find roots. 

Every accurate garbage collector (including GHC's) uses a technique like
this, but the solution is invariably tightly-coupled to the garbage
collector, compiler and run-time system.  

A major feature of C-- is that it squarely addresses this problem in a
*reusable* way.

Simon

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


factorial: let's get ahead of jhc! :)

2006-02-24 Thread Bulat Ziganshin
Hello glasgow-haskell-users,

i propose to start with a rather small change that will allow us to
make things go several times faster and in particular outperform jhc
for small leaf loops (i.e. loops that use only GHC primitives
and don't call other functions). that includes factorial, the a[i]+=b[i]
loop that was discussed here several weeks ago, some shootout
examples, my own array serialization code and so on

the idea is that the following STG code

f :: Int# -> Double# -> ... -> Int#

(i.e. the function uses only unboxed values/results)

f x y z = code1 in
  case expr1 of
A -> codeA
B -> codeB
C -> codeC in f x_c y_c z_c
D -> codeD in f x_d y_d z_d
_ -> code0 in f x0 y0 z0

can be compiled to the following C/C-- code:

f() {
  x = sp[0];
  y = sp[4];
  z = sp[8];
  sp+=12;

  code1;
  while (expr1!=A) {
if (expr1==B) return codeB;
else if (expr1==C) {x=x_c; y=y_c; z=z_c;}
else if (expr1==D) {x=x_d; y=y_d; z=z_d;}
else {x=x0; y=y0; z=z0;}
code1;
  }
  return codeA;
}

this compilation doesn't require any changes in the GHC memory model. all we
need is to:

1) add for/if/while to the C-- statement types list (data CmmStmt)

2) implement a recognizer for such simple STG functions (a leaf unboxed
procedure recognizer)

3) implement the translation itself

as i said, it should be no more than one day of work. i even think
that it's one day for me and one hour for Simon Marlow :)  Simon, how
about this? i can even make the patches against the current 6.6 sources and
you can apply them in the morning and we will see whether it works :) i
have never yet tried to rebuild the whole ghc :)

-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


GHC 6.4.1 crash on Windows XP

2006-02-24 Thread Cyril Schmidt
A freshly installed GHC 6.4.1 on my colleague's PC crashes when I try
to build a package:

runhaskell Setup.hs build

The effect is easily reproducible (it shows up on *any* package
that I try to build).

Does anyone have any idea of what might be wrong here?

Cyril
___

For the record, the information given by Windows at the point of crash:

Exception Information
Code: 0xc005    Flags: 0x0
Record: 0x0  Address: 0x0

System Information
Windows NT 5.1 Build: 2600

Module 1
ghc.exe
Image Base: 0x0040  Image Size: 0x0
Checksum: 0x0088d2bd    Time Stamp: 0x433058b8

(The IP and SP already point somewhere inside Dr. Watson, as far
as I can see, so I don't think the registers and stack contents
are of any use at this point.)


___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re[3]: factorial: let's get ahead of jhc! :)

2006-02-24 Thread Bulat Ziganshin
Hello Bulat,

Friday, February 24, 2006, 6:37:42 PM, you wrote:

SPJ Perhaps you may consider doing this transformation on the C-- data type
SPJ only, without involving the (already very complicated) STG -> C-- code
SPJ generator?

i have found my investigations in this area. here is the C-- code
generated for the fac worker:

Fac_zdwfac_entry() {
c1C0:
_s1BD = I32[Sp + 0];
if (_s1BD != 1) goto c1C4;
R1 = I32[Sp + 4];
Sp = Sp + 8;
jump (I32[Sp + 0]);
c1C4:
_s1BI = _s1BD * I32[Sp + 4];
_s1BF = _s1BD - 1;
I32[Sp + 4] = _s1BI;
I32[Sp + 0] = _s1BF;
jump c1C0;
}

first, we convert the jump into an explicit loop:

_s1BD = I32[Sp + 0];
while (_s1BD != 1) {
_s1BI = _s1BD * I32[Sp + 4];
_s1BF = _s1BD - 1;
I32[Sp + 4] = _s1BI;
I32[Sp + 0] = _s1BF;
_s1BD = I32[Sp + 0];
}
R1 = I32[Sp + 4];
Sp = Sp + 8;
jump (I32[Sp + 0]);

then, we cache the contents of the Sp[*] slots in local variables:

sp4 = I32[Sp + 4];
sp0 = I32[Sp + 0];
_s1BD = sp0;
while (_s1BD != 1) {
_s1BI = _s1BD * sp4;
_s1BF = _s1BD - 1;
sp4 = _s1BI;
sp0 = _s1BF;
_s1BD = sp0;
}
I32[Sp + 4] = sp4;
I32[Sp + 0] = sp0;
R1 = I32[Sp + 4];
Sp = Sp + 8;
jump (I32[Sp + 0]);

and then we wipe out all the superfluous variables:

sp4 = I32[Sp + 4];
sp0 = I32[Sp + 0];
while (sp0 != 1) {
sp4 = sp0 * sp4;
sp0 = sp0 - 1;
}
R1 = sp4;
Sp = Sp + 8;
jump (I32[Sp + 0]);

and that is even for the simple fac() function!!! instead of straightforward
generation of the code that we need, we have to do all this nontrivial
analysis and these hard transformations on already generated code. on the
other hand, we can just generate the following code right from the
STG:

int fac () {
sp4 = I32[Sp + 4];
sp0 = I32[Sp + 0];
while (sp0 != 1) {
sp4 = sp0 * sp4;
sp0 = sp0 - 1;
}
R1 = sp4;
Sp = Sp + 8;
jump (I32[Sp + 0]);
}

i think that my way is 100x simpler

-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: factorial: let's get ahead of jhc! :)

2006-02-24 Thread Simon Marlow

Bulat Ziganshin wrote:


i propose to start with a rather small change that will allow us to
make things go several times faster and in particular outperform jhc
for small leaf loops (i.e. loops that use only GHC primitives
and don't call other functions). that includes factorial, the a[i]+=b[i]
loop that was discussed here several weeks ago, some shootout
examples, my own array serialization code and so on

the idea is that the following STG code

f :: Int# -> Double# -> ... -> Int#

(i.e. the function uses only unboxed values/results)

f x y z = code1 in
  case expr1 of
A -> codeA
B -> codeB
C -> codeC in f x_c y_c z_c
D -> codeD in f x_d y_d z_d
_ -> code0 in f x0 y0 z0

can be compiled to the following C/C-- code:

f() {
  x = sp[0];
  y = sp[4];
  z = sp[8];
  sp+=12;

  code1;
  while (expr1!=A) {
if (expr1==B) return codeB;
else if (expr1==C) {x=x_c; y=y_c; z=z_c;}
else if (expr1==D) {x=x_d; y=y_d; z=z_d;}
else {x=x0; y=y0; z=z0;}
code1;
  }
  return codeA;
}

this compilation doesn't require any changes in the GHC memory model. all we
need is to:

1) add for/if/while to the C-- statement types list (data CmmStmt)


Please don't extend the C-- data type - it's very small and simple 
because that makes it easy to manipulate and reason about.


I don't see why you need to, either: you can already express for, if and 
while in C-- using conditional branches.  I don't think gcc cares 
whether you write your code using labels and goto or while/for/if, it 
generates the same code either way.



2) implement a recognizer for such simple STG functions (a leaf unboxed
procedure recognizer)

3) implement the translation itself


By all means try this.  What you want is to compile recursive functions 
like this:


  f() {
   x = arg1;
   y = arg2;
  L:
   ... body of f, with args mapped to x & y,
   and recursive calls jumping to L passing args in x & y.
  }

It's quite hard to do this as a C-- to C-- optimisation, at least 
without implementing a lot of other optimisations that we don't already 
have.  I was hoping that gcc would do it for us, if we compile code like 
this:


  f() {
  L:
   ... body of f ...
   goto L;
  }

but sadly it doesn't do the whole job.  (see cmmLoopifyForC in 
cmm/CmmOpt.hs, which I added today).


So you might try the hacky way of doing this transformation at a higher 
level, when generating the C-- in the first place.  Putting the args in 
temporaries is the easy bit; generating different code for the recursive 
call will be more tricky.
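
for concreteness, a sketch of the target shape in plain C (hypothetical code, not GHC output): the arguments are copied into locals once, and the recursive call becomes new values for those locals plus a jump back to the top, which gcc can then treat as an ordinary loop:

int fac_entry(int arg1, int arg2)
{
    int n = arg1, r = arg2;   /* "putting the args in temporaries" */
L:
    if (n == 1)
        return r;
    r = n * r;                /* the recursive call fac (n-1) (n*r) ... */
    n = n - 1;                /* ... becomes updates of the locals ...  */
    goto L;                   /* ... and a jump back to L               */
}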


Cheers,
Simon
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


RE: Re[3]: factorial: let's get ahead of jhc! :)

2006-02-24 Thread Simon Peyton-Jones

| i have found my investigations in this area. here is the C-- code
| generated for the fac worker:
| 
| Fac_zdwfac_entry() {
| c1C0:
| _s1BD = I32[Sp + 0];
| if (_s1BD != 1) goto c1C4;
| R1 = I32[Sp + 4];
| Sp = Sp + 8;
| jump (I32[Sp + 0]);
| c1C4:
| _s1BI = _s1BD * I32[Sp + 4];
| _s1BF = _s1BD - 1;
| I32[Sp + 4] = _s1BI;
| I32[Sp + 0] = _s1BF;
| jump c1C0;
| }

Once we do the A/B split of the code generator that I referred to, we
should get something like this from the A part

fac(n,m) {
  if n!=1 goto L
  return(m)
L: jump fac( n-1, n*m ) 
}

The B part will introduce the Sp nonsense.

I think you'll find this a much easier optimisation target.

Simon
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: inside the GHC code generator

2006-02-24 Thread Ben Rudiak-Gould

Rene de Visser wrote:
Integer is about 30 times slower than it needs to be on GHC if you have 
over half the values between -2^31 and 2^31. On i386 you can basically 
test the low bit for free (it runs in parallel to the integer add 
instruction). This would allow only values outside this range to 
require calling the long integer code. Such an optimization is not 
easily done in C.


Currently GHC defines

data Integer = S# Int# | J# Int# ByteArray#

So it already avoids using GMP for small integers. There's a "will this 
multiplication overflow" primitive, but I'm not sure how it's implemented.


I think that changing this to use magic pointers would be pretty easy. You'd 
need to introduce a new primitive type, say Int31#, and then:


1. anytime you previously constructed a WHNF S# on the heap, make a
   magic pointer instead

2. anytime you dereference a pointer that might be an S#, check for a
   magic pointer first.

Even if a lot of code needs to be changed, it's straightforward because the 
changes are local. You're just changing the meaning of a pointer such that 
there's a statically allocated S# n at address 2n+1.
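
a small C sketch of that encoding; the helper names are hypothetical and the 2n+1 scheme just follows the paragraph above, it is not GHC's actual representation:

#include <stdint.h>

typedef uintptr_t HsRef;   /* either a real heap pointer or a "magic" one */

static inline int      is_small(HsRef p)      { return (int)(p & 1); }
static inline HsRef    make_small(intptr_t n) { return ((HsRef)n << 1) | 1; } /* "address" 2n+1 */
static inline intptr_t small_value(HsRef p)   { return (intptr_t)p >> 1; }

/* stand-in for the real J# (heap object) path */
static intptr_t eval_big_integer(HsRef p) { (void)p; return 0; }

/* dereferencing code checks for a magic pointer before touching the heap */
intptr_t integer_value(HsRef p)
{
    if (is_small(p))
        return small_value(p);    /* the S# case: no memory access at all */
    return eval_big_integer(p);   /* otherwise p points at a heap object  */
}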


It would also be worth trying this for Char#, which is already a 31-bit 
type, to see if it would speed up text-processing code.


If only simple loops are optimized it will encourage people to always 
code loops in their haskell rather than perhaps using more appropriate 
constructs.


You could say that about any language. When your code needs to be fast it 
needs to be fast. I'd rather write ugly Haskell code than ugly C code, if it 
comes to that.


Also take the Maybe data type with Nothing and Just ... or any other 
datatypes with 0- and 1-argument constructors. Here these could be 
represented by special values for the 0-argument case and bit marking on 
the single-constructor values.


I'm not sure this would help much. Nullary constructors are already 
allocated statically, so they're already represented by special values. You 
could check for those values instead of dereferencing the pointer, but in 
time-critical code the static object will presumably be in L1 cache anyway. 
I could be wrong of course, and it would be easy enough to extend the Int31# 
idea to nullary constructors (Int0#).


-- Ben

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re[4]: inside the GHC code generator

2006-02-24 Thread Bulat Ziganshin
Hello Lemmih,

Friday, February 24, 2006, 1:15:51 PM, you wrote:

 clean differs from Haskell in its support of unique types and strictness
 annotations. the latter is slowly migrating into GHC in the form of bang
 patterns, but i think that it is a half-solution. i mentioned in my
 original letter my proposals to add strictness annotations to
 function type declarations and to declare strict data structures, such
 as ![Int]

L As I've understood it, Clean's strictness annotations are a bit of a
L hack which only works on certain built-in types. Am I mistaken here?

i don't know Clean very well, although i've seen rumors that it
supports strict data structures. after all, this doesn't need changes in
the language itself, it's just syntactic sugar:

data [a] = [] | a:[a]
x :: ![Int]

translates to

data StrictList a = Nil | Cons a !(StrictList a)
x :: !(StrictList a)
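
and the translated form is already writable today; a small self-contained sketch (StrictList, fromList and sumStrict are illustrative names, not a GHC feature):

{-# LANGUAGE BangPatterns #-}

-- spine-strict list: forcing a Cons forces the whole tail
data StrictList a = Nil | Cons a !(StrictList a)

fromList :: [a] -> StrictList a
fromList = foldr Cons Nil

sumStrict :: StrictList Int -> Int
sumStrict = go 0
  where
    go !acc Nil         = acc
    go !acc (Cons x xs) = go (acc + x) xs

main :: IO ()
main = print (sumStrict (fromList [1 .. 100]))   -- 5050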



... i've found the following in the Feb 6 letter on haskell-cafe by Brian Hulley:

Bulat Ziganshin wrote:
 yes, i remember this SPJ's question :)  [!a] means that the list
 elements are strict; it's the same as defining a new list type with
 strict elements and using it here. ![a] means a strict list; it is
 the same as defining a list with the next field strict:

 data List1 a = Nil1 | List1 !a (List1 a)
 data List2 a = Nil2 | List2 a !(List2 a)
 data List3 a = Nil3 | List3 !a !(List3 a)

Clean allows (AFAIK) several distinctions to be made:

1) ![a] means that the list of a's is a strict argument, just like writing !b

2) [!a] means that the list is head strict (List1 a)

3) [a!] means that the list is tail strict (List2 a)

4) [!a!] means that the list is head and tail strict (List3 a)

5) ![!a!] means that the head-and-tail-strict-list-argument is strict!!!

I think also (though I'm not entirely sure) that these distinctions are 
generalized for other data types by talking about element strictness and 
spine strictness.

One motivation seems to be that, in the absence of whole-program 
optimization, the strictness annotations on a function's type can allow the 
compiler to avoid creating thunks at the call site for cross-module calls, 
whereas using seq in the function body itself means that the thunk still has 
to be created at the call site, because the compiler can't possibly know that 
it's going to be immediately evaluated by seq.




and from my own letter earlier, on Feb 6:

 foo :: !Int -> !Int

KM (Is the second ! actually meaningful?)

yes! it means that the function is strict in its result - i.e. it can't return
an undefined value when strict arguments are given. this sort of knowledge
should help a compiler to propagate strictness and figure out the
parts of the program that can be compiled as strict code. really, i think
ghc is able to figure out functions with strict results just as it is able to
figure out strict function arguments

KM Personally, I think this is much nicer than sprinkling seq's around, and
KM generally sufficient.  However, there could perhaps be ambiguities?

btw, it has just been implemented in GHC HEAD

KM Last time this came up, I think examples resembling these were brought
KM up:

KM   foo :: [!a] -> ![a] -> a

yes, i remember this SPJ's question :)  [!a] means that the list
elements are strict; it's the same as defining a new list type with
strict elements and using it here. ![a] means a strict list; it is
the same as defining a list with the next field strict:

data List1 a = Nil1 | List1 !a (List1 a)
data List2 a = Nil2 | List2 a !(List2 a)
data List3 a = Nil3 | List3 !a !(List3 a)

the type List3 is a simple strict list, like in any strict programming
language.

foo :: [!a] -> ![a] -> ![!a] -> a

translates to

foo :: List1 a -> List2 a -> List3 a -> a



KM   foo' :: Map !Int String -> Int -> String

that means that the keys in this map are saved as strict values. for example,
the following definition

type Map a b = [(a,b)]

will be instantiated to

Map !Int String == [(!Int, String)]


KM Anyway, if a reasonable semantics can be formulated, I think
KM strictness type annotations would be a great, useful, and
KM relatively non-intrusive (AFAICT, entirely backwards compatible)
KM addition to Haskell'. 

such a proposal already exists and is supported by an implementation in GHC
HEAD


-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Re[4]: inside the GHC code generator

2006-02-24 Thread Lemmih
On 2/24/06, Bulat Ziganshin [EMAIL PROTECTED] wrote:
 Hello Lemmih,

 Friday, February 24, 2006, 1:15:51 PM, you wrote:

  clean differs from Haskell in its support of unique types and strictness
  annotations. the latter is slowly migrating into GHC in the form of bang
  patterns, but i think that it is a half-solution. i mentioned in my
  original letter my proposals to add strictness annotations to
  function type declarations and to declare strict data structures, such
  as ![Int]

 L As I've understood it, Clean's strictness annotations are a bit of a
 L hack which only works on certain built-in types. Am I mistaken here?

 i don't know Clean very well, although i've seen rumors that it
 supports strict data structures. after all, this doesn't need changes in
 the language itself, it's just syntactic sugar:

 data [a] = [] | a:[a]
 x :: ![Int]

 translates to

 data StrictList a = Nil | Cons a !(StrictList a)
 x :: !(StrictList a)

Let's try this:

x :: ![Int] -> Int

It would translate to something like this:

mkStrictList :: [a] -> StrictList a
x = xStrict . mkStrictList
xStrict = ...

Wouldn't it be very expensive to strictify the list?
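
for reference, a sketch of what mkStrictList would have to do (the names come from the mail above, the definition itself is assumed): because of the strict tail, the entire spine is forced before xStrict can even start, which is exactly the cost being asked about:

data StrictList a = Nil | Cons a !(StrictList a)

mkStrictList :: [a] -> StrictList a
mkStrictList []     = Nil
mkStrictList (y:ys) = Cons y (mkStrictList ys)   -- forcing the result forces the whole spine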

--
Friendly,
  Lemmih
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re[6]: inside the GHC code generator

2006-02-24 Thread Bulat Ziganshin
Hello Lemmih,

Friday, February 24, 2006, 8:55:37 PM, you wrote:

 x :: ![Int]

 translates to the

 data StrictList a = Nil | Cons a !(StrictList a)
 x :: !(StrictList a)

L Let's try this:

L x :: ![Int] -> Int

L It would translate to something like this:

L mkStrictList :: [a] -> StrictList a
L x = xStrict . mkStrictList
L xStrict = ...

L Wouldn't it be very expensive to strictify the list?

yes, it would. but evaluating the list as a lazy one on EACH repetition of
some loop will cost much more. just for example - i found that the
loop

repeat 666 (putStr (replicate 15 ' '))

runs several times slower than

repeat (10^8) (putChar ' ')

if the argument of putStr were evaluated only one time, then much of the
difference would be gone. of course, that is just a benchmark. but there are
cases when i prefer to work with strict data structures and specialize
my functions accordingly. typically these are the cases where time/space
are really critical

-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: inside the GHC code generator

2006-02-24 Thread Wolfgang Thaller

Another shortcoming is that the native code generator in GHC isn't
capable of dealing with backward jumps to labels (because GHC hasn't
needed that so far).  But if you did C-- optimisation, you'd probably
generate such jumps.  It'd be great to beef up the native code gen to
handle that.


I'm already working on that. It's basically done, I think I only need  
to get around to one more session with the code for final cleanup.


(Just to avoid any duplication of effort).

Cheers,

Wolfgang

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: inside the GHC code generator

2006-02-24 Thread Ben Rudiak-Gould

Simon Peyton-Jones wrote:

| [...] put a static table in the executable with
| information about register and stack usage indexed by procedure return
| point, and use that to unwind the stack and find roots. 


Every accurate garbage collector (including GHC's) uses a technique like
this, but the solution is invariably tightly-coupled to the garbage
collector, compiler and run-time system.


Okay, I don't know what I was thinking when I wrote that no languages that 
compile via C use compacting collectors, since obviously lots of them do. 
But they do it by using various hacks to force local heap pointers into 
locations where the GC can find them. Most of this effort is wasted, because 
the local variables disappear before the GC runs. What I'm talking about is 
idiomatic C code which manipulates heap pointers like any other object, and 
which can be optimized as usual (e.g. holding heap pointers in callee-saved 
registers across function calls) without causing GC problems. There's no 
reason in principle that this couldn't be done.


-- Ben

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users