Re: LLVM back end

2007-01-04 Thread Simon Marlow

Simon Peyton-Jones wrote:

How about
* concurrency (the ability to have zillions of little stacks,
with stack overflow checks and growing the stck on overflow)?
* exception handling (the ability to crawl over the stack
looking for exception catch frames)?
* garbage collection (the ability to find pointers in the stack)


I'd imagined that we'd use LLVM as a low-level code generator, doing our own 
stack management and using the existing GHC runtime support for all of these 
things.  Michael seems to be saying something different though... but it seems 
to me that anything other than using LLVM as a code generator only is a *lot* of 
work, and in direct competition with C--, so not an easy sell.


Cheers,
Simon


Simon

| -Original Message-
| From: [EMAIL PROTECTED] [mailto:glasgow-
| [EMAIL PROTECTED] On Behalf Of Wolfgang Thaller
| Sent: 20 December 2006 23:55
| To: [EMAIL PROTECTED]
| Cc: GHC Users; Simon Marlow
| Subject: Re: LLVM back end
|
| On 20-Dec-06, at 10:10 AM, Michael T. Richter wrote:
|
|  Well, I'm almost entirely ignorant of LLVM and of Haskell --
|  especially the internals of both.  That makes me ideally suited for
|  this project since I'm not aware that it's impossible.
|
| I'm afraid LLVM currently lacks some features that are required for
| efficient GHC code generation, specifically:
|
| a) Global Register Variables
| b) The ability to put data next to code
| c) (maybe, I'm not sure) Tailcalls
|
| ad a)
| We need to keep some things in global registers for speed reasons
| (depending on the number of available registers); among other things,
| the haskell stack pointer and the heap pointer. Using regular global
| variables instead would be a lot slower.
|
| ad b)
| With (almost) every chunk of code, we want to associate a smal chunk
| of data (the info table) which contains information used by the
| garbage collector and other parts of the run time system. We want to
| use only one pointer to point to both of those things, and we don't
| want to waste time with any additional indirections, so we make the
| pointer point between the data chunk and the code chunk.
|
| ad c)
| While they are supported in theory, I couldn't get LLVM to generate
| any tailcalls. Maybe I've done something wrong, maybe they're not
| really implemented yet.
|
| So I guess step one would be to start negotiating about those things
| with the LLVM people.
|
| Cheers,
|
| Wolfgang
|
| ___
| Glasgow-haskell-users mailing list
| Glasgow-haskell-users@haskell.org
| http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: LLVM back end

2006-12-22 Thread John Meacham
On Thu, Dec 21, 2006 at 12:54:50AM +0100, Wolfgang Thaller wrote:
 ad c)
 While they are supported in theory, I couldn't get LLVM to generate  
 any tailcalls. Maybe I've done something wrong, maybe they're not  
 really implemented yet.

it seems that ghc could benefit from a tail-call elimination
optimization, that turns them directly into loops. I have implemented it
for jhc and it was not terribly difficult (though, not trivial)

basically, I just refrained from lambda lifting any functions that were
always applied in a fully saturated manner and called in tail-position.
then when spitting out C, I just compile them directly into 'goto's. It
seems to work well and subsumes turning jump-points directly into goto's
rather than function calls as well.

I actually do lambda lifting twice, once right before converting to
grin, where I lambda lift everything that can never be directly jumped
to because it occurs in an unsaturated call or in a lazy context (even
if it is called outside of tail position in a strict context), then I
optimize, then I lambda lift right before conversion to C any that
wern't able to be turned into true jumps. the grin optimizations seem to
be able to expose more tail-calls so make this two pass lambda lifting
worth it.


John

-- 
John Meacham - ⑆repetae.net⑆john⑈
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


RE: LLVM back end

2006-12-21 Thread Simon Peyton-Jones
How about
* concurrency (the ability to have zillions of little stacks,
with stack overflow checks and growing the stck on overflow)?
* exception handling (the ability to crawl over the stack
looking for exception catch frames)?
* garbage collection (the ability to find pointers in the stack)

Simon

| -Original Message-
| From: [EMAIL PROTECTED] [mailto:glasgow-
| [EMAIL PROTECTED] On Behalf Of Wolfgang Thaller
| Sent: 20 December 2006 23:55
| To: [EMAIL PROTECTED]
| Cc: GHC Users; Simon Marlow
| Subject: Re: LLVM back end
|
| On 20-Dec-06, at 10:10 AM, Michael T. Richter wrote:
|
|  Well, I'm almost entirely ignorant of LLVM and of Haskell --
|  especially the internals of both.  That makes me ideally suited for
|  this project since I'm not aware that it's impossible.
|
| I'm afraid LLVM currently lacks some features that are required for
| efficient GHC code generation, specifically:
|
| a) Global Register Variables
| b) The ability to put data next to code
| c) (maybe, I'm not sure) Tailcalls
|
| ad a)
| We need to keep some things in global registers for speed reasons
| (depending on the number of available registers); among other things,
| the haskell stack pointer and the heap pointer. Using regular global
| variables instead would be a lot slower.
|
| ad b)
| With (almost) every chunk of code, we want to associate a smal chunk
| of data (the info table) which contains information used by the
| garbage collector and other parts of the run time system. We want to
| use only one pointer to point to both of those things, and we don't
| want to waste time with any additional indirections, so we make the
| pointer point between the data chunk and the code chunk.
|
| ad c)
| While they are supported in theory, I couldn't get LLVM to generate
| any tailcalls. Maybe I've done something wrong, maybe they're not
| really implemented yet.
|
| So I guess step one would be to start negotiating about those things
| with the LLVM people.
|
| Cheers,
|
| Wolfgang
|
| ___
| Glasgow-haskell-users mailing list
| Glasgow-haskell-users@haskell.org
| http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


RE: LLVM back end

2006-12-21 Thread Michael T. Richter
Simon  Wolfgang:

OK, let me address these as far as I know them from my miniscule
research so far:

Global Register Variables -- I'm not sure I understand the point here.
LLVM is a... well, a virtual machine.  It's not a real target.  The LLVM
code is then either simulated in a VM (as per, say, Java) or it is
further compiled to a native representation.  (Or it is JITted, etc.)
If it's run in a VM, there is no register anythings.  For the rest, well
the simplicity of the LLVM format is supposed to make optimisations
easier.  The experience with the gcc3.34 port seems to bear this out
since it supposedly actually generates superior running code to actual
GCC3.  (I don't know if this is true for GCC4, though.)  My guess would
be that any half-decent optimiser would spot a global pointer variable
that's hit a billion times per second and would mark it as a candidate
for register usage.  And if it doesn't?  It sounds like an optimiser
that does just that could be written pretty simply.

The ability to put data next to code -- I'm not exactly sure what you
mean by this.  Do you mean some kind of inlined data like this kind of
psuedo-assembler?


move r1,mem-whatever
jmp foo

mem-data-inline db 1, 2, 3, 4, 5, 6, 7, 8

:foo
move r2,mem-data-inline
...


Tail calls purport to be supported.  The wording, however, says that
marking a call as tail, enables tail-call optimisation.  I'm not
positive what that translates to in practical terms.  I'll dig deeper
into it.  It could be that some kind of back-end optimiser has to be
provided to make this work.  (This is part of the charm of LLVM in my
opinion, though.  The ability to relatively easily write such
back-ends.)

I'm not sure about the concurrency issues.  I'm not sure I understand
what you mean here.  I'll see what LLVM's ability to cope with zillions
of little stacks is like, however.

Exception handling: there's an invoke primitive (a call, basically,
with decorations) and an unwind primitive.  These are in place
specifically for exception-handling purposes.  Given that GCC (3  4) is
a front-end for LLVM, and given that GCC supports exception handling,
I'm guessing this works just fine.  ;)

Garbage collection has extensive support.  Conservative collection is
supported without any work required at all.  Accurate garbage collection
has a whole web page devoted to it:
http://llvm.org/docs/GarbageCollection.html  The precis:

  * support is claimed for compacting semi-space collectors,
mark-sweep collectors, generational collectors, and even
reference counting implementations;
  * it provides read and write barrier support;
  * meta-data association with stack objects is supported.

There is a full API for front-ends to use for garbage collection
(detailed on the provided link).  Of interest is that the garbage
collector seems to be unlinked from the front-end client; you implement
a garbage collector (or take another implementation) on the back-end and
use it with the generic API provided.  How practical this will turn out
to be in real life work is, of course, the rub.  But it does look like
they thought about it at least.

-- 
Michael T. Richter
Email: [EMAIL PROTECTED], [EMAIL PROTECTED]
MSN: [EMAIL PROTECTED], [EMAIL PROTECTED]; YIM:
michael_richter_1966; AIM: YanJiahua1966; ICQ: 241960658; Jabber:
[EMAIL PROTECTED]

Thanks to the Court's decision, only clean Indians or colored people
other than Kaffirs, can now travel in the trams. --Mahatma Gandhi


smiley-4.png
Description: PNG image


signature.asc
Description: This is a digitally signed message part
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: LLVM back end

2006-12-21 Thread Wolfgang Thaller

On 21-Dec-06, at 1:26 PM, Michael T. Richter wrote:

Global Register Variables -- I'm not sure I understand the point  
here.  LLVM is a... well, a virtual machine.  It's not a real  
target.  The LLVM code is then either simulated in a VM (as per,  
say, Java) or it is further compiled to a native representation.   
(Or it is JITted, etc.)  If it's run in a VM, there is no register  
anythings.  [...]


Reading between the lines, I take it that you want to port GHC to  
LLVM in its entirety; LLVM-ghc would not support -fvia-C or -fasm  
code generation but leave everything to LLVM only. If you want to  
keep any of the other code generation routes, then the global  
register variables used by GHC are part of the ABI and have to be  
specified explitly.


 It sounds like an optimiser that does just that could be written  
pretty simply.


Well, it wouldn't be just another optimiser, it would require support  
from the code generators, I guess...


On the other hand... if we're not trying to mix LLVM code with  
traditional GHC code anyway, then who says those variables should be  
global? They should probably be parameters to every function, and  
then lets hope that the calling convention used for tail-calls puts  
them in registers:


void %foo(sbyte* %stackPointer, sbyte* %stackLimit, sbyte* % 
heapPointer, sbyte* %heapLimit) {

entry:
tail call void %bar( sbyte* %stackPointer, sbyte* % 
stackLimit, sbyte* %heapPointer, sbyte* %heapLimit )

ret void
}


The ability to put data next to code -- I'm not exactly sure what  
you mean by this.  Do you mean some kind of inlined data like this  
kind of psuedo-assembler?


move r1,mem-whatever
jmp foo

mem-data-inline db 1, 2, 3, 4, 5, 6, 7, 8

:foo
move r2,mem-data-inline
...


Well, basically, a heap object will start with a pointer to foo;  
sometimes we will want to tail-call foo, and sometimes we will want  
to access things at (foo-4) etc, i.e. the things at mem-data-inline.  
We want both things to be blindingly fast, i.e. we don't want to  
dereference another pointer.
Unless we want to link the LLVM-compiled code with -fasm and -fvia-C- 
compiled code, all we need is *any* mechanism to quickly access a  
block of data and a block of code from the same pointer.



On 21-Dec-06, at 9:14 AM, Simon Peyton-Jones wrote:


How about
* concurrency (the ability to have zillions of little stacks,
with stack overflow checks and growing the stck on overflow)?
* exception handling (the ability to crawl over the stack
looking for exception catch frames)?
* garbage collection (the ability to find pointers in the stack)


The alternative to using LLVM's support for all those things is to  
keep using GHC's run-time system and a stack we manage ourselves.  
Then we don't need to care about LLVM's support for those yet  
(although it might still be a good idea later, to give the optimiser  
more opportunities to help us).



Cheers,

Wolfgang
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: LLVM back end

2006-12-20 Thread Michael T. Richter
On Tue, 2006-19-12 at 13:13 +, Simon Marlow wrote:

  Is this me opening up a Pandora's Box of ignorance here?  Or is LLVM 
  potentially interesting?  (And were someone motivated into perhaps 
  trying to make an LLVM back-end, where would one start to poke around 
  in, say, the GHC codebase to even begin to implement this?  And how 
  insane would they be driven by the process?)



 Apologies for the slow reply.  Actually I think this is a pretty cool idea 
 (with 
 a disclaimer that I know very nearly nothing about LLVM).  Provided there are 
 no 
 serious gotchas, what you need to do is write a new backend for GHC that 
 translates Cmm to LLVM.  This should be pretty straightforward: for example, 
 the 
 Cmm-C code generator is only 1000 lines of Haskell:
 
http://darcs.haskell.org/ghc/compiler/cmm/PprC.hs
 
 I hope LLVM lets you put data next to code, which is what GHC needs for its 
 info 
 tables.  Also I hope it lets you fix global registers.


Well, I'm almost entirely ignorant of LLVM and of Haskell -- especially
the internals of both.  That makes me ideally suited for this project
since I'm not aware that it's impossible.  :D

Do you mind, Simon, if, since I've pretty much decided that I want to
tackle this, I pick your brains a lot for the Haskell internals side of
the fence?

-- 
Michael T. Richter
Email: [EMAIL PROTECTED], [EMAIL PROTECTED]
MSN: [EMAIL PROTECTED], [EMAIL PROTECTED]; YIM:
michael_richter_1966; AIM: YanJiahua1966; ICQ: 241960658; Jabber:
[EMAIL PROTECTED]

To [the Chinese], all other people are barbarians. --The Dalai Lama


smiley-1.png
Description: PNG image


signature.asc
Description: This is a digitally signed message part
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: LLVM back end

2006-12-20 Thread Wolfgang Thaller

On 20-Dec-06, at 10:10 AM, Michael T. Richter wrote:

Well, I'm almost entirely ignorant of LLVM and of Haskell --  
especially the internals of both.  That makes me ideally suited for  
this project since I'm not aware that it's impossible.


I'm afraid LLVM currently lacks some features that are required for  
efficient GHC code generation, specifically:


a) Global Register Variables
b) The ability to put data next to code
c) (maybe, I'm not sure) Tailcalls

ad a)
We need to keep some things in global registers for speed reasons  
(depending on the number of available registers); among other things,  
the haskell stack pointer and the heap pointer. Using regular global  
variables instead would be a lot slower.


ad b)
With (almost) every chunk of code, we want to associate a smal chunk  
of data (the info table) which contains information used by the  
garbage collector and other parts of the run time system. We want to  
use only one pointer to point to both of those things, and we don't  
want to waste time with any additional indirections, so we make the  
pointer point between the data chunk and the code chunk.


ad c)
While they are supported in theory, I couldn't get LLVM to generate  
any tailcalls. Maybe I've done something wrong, maybe they're not  
really implemented yet.


So I guess step one would be to start negotiating about those things  
with the LLVM people.


Cheers,

Wolfgang

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: LLVM back end

2006-12-19 Thread Simon Marlow

Michael T. Richter wrote:
I've been eyeing LLVM[1 http://llvm.org] as interesting technology -- 
brief executive summary: a virtual machine suited as the back end of 
compiler output with optimised native code then coming from it as either 
JIT-based execution of the LLVM bytecode or as a further compilation 
step -- and couldn't help but immediately think of the possibility of 
one of the Haskell compiler projects providing an LLVM code generator.  
I think this would help in several areas:


* it could make porting the compiler to other architectures --
  including oddball ones that would be too small to otherwise
  support -- easier;
* it could help remove the nigh-ubiquitous reliance upon GCC as a
  back-end (while I think that GCC is a pretty good piece of
  software, I'm not sure it's really suited to its current role as
  the do-everything back end);
* it could leverage some of the really interesting work that's going
  on in optimisation technology by letting one VM's optimiser do the
  work for any number of languages;
* it could improve interaction between source code written in
  multiple languages. 

Is this me opening up a Pandora's Box of ignorance here?  Or is LLVM 
potentially interesting?  (And were someone motivated into perhaps 
trying to make an LLVM back-end, where would one start to poke around 
in, say, the GHC codebase to even begin to implement this?  And how 
insane would they be driven by the process?)


Apologies for the slow reply.  Actually I think this is a pretty cool idea (with 
a disclaimer that I know very nearly nothing about LLVM).  Provided there are no 
serious gotchas, what you need to do is write a new backend for GHC that 
translates Cmm to LLVM.  This should be pretty straightforward: for example, the 
Cmm-C code generator is only 1000 lines of Haskell:


  http://darcs.haskell.org/ghc/compiler/cmm/PprC.hs

I hope LLVM lets you put data next to code, which is what GHC needs for its info 
tables.  Also I hope it lets you fix global registers.


Cheers,
Simon
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: LLVM back end

2006-11-22 Thread Al Falloon
I have also been looking into LLVM and I wondered if this had been 
looked at before.


Michael T. Richter wrote:
I've been eyeing LLVM[1 http://llvm.org] as interesting technology -- 

snip
step -- and couldn't help but immediately think of the possibility of 
one of the Haskell compiler projects providing an LLVM code generator.  
I think this would help in several areas:


* it could make porting the compiler to other architectures --
  including oddball ones that would be too small to otherwise
  support -- easier;


C is far more ubiquitous than LLVM. If GCC adopts LLVM as its back-end 
then that could change, but for now, C is still the most common 
portable assembler



* it could leverage some of the really interesting work that's going
  on in optimisation technology by letting one VM's optimiser do the
  work for any number of languages;


I am especially interested in the global optimizations for a static LLVM 
program. I would be curious to see if there is any improvement in the 
performance of the final executables. However, with GHC emitting the 
generated C into a single file, I think that GCC is already able to 
perform most of the same kinds of optimizations.



* it could improve interaction between source code written in
  multiple languages. 


LLVM is really low level, so you still have most of the same problems 
with data representation that you have with using C as the common 
language. However, there is one  advantage over C: since LLVM lets you 
directly represent tail-calls you can support languages like Scheme and 
Haskell that depend on recursion.


Is this me opening up a Pandora's Box of ignorance here?  Or is LLVM 
potentially interesting?  (And were someone motivated into perhaps 
trying to make an LLVM back-end, where would one start to poke around 
in, say, the GHC codebase to even begin to implement this?  And how 
insane would they be driven by the process?)


[1]http://llvm.org


After looking into it, I thought that generating the LLVM bytecode or 
text representation would be much easier than trying to use the FFI to 
link against the LLVM libraries.


So, I thought a good starting project would be to write a module that 
lets you manipulate the LLVM representation in Haskell including reading 
and writing it. From there you would be in a good starting position to 
try to make GHC emit LLVM bytecode instead of C or C--.


I haven't looked at the GHC internals, but C-- and LLVM are very 
similar, so I expect that writing the LLVM back-end would be a fairly 
mechanical transformation of the C-- back-end.


Another nice side-benifit of an LLVM library in Haskell is that you can 
use Haskell to write stand-alone optimization passes for the LLVM compiler.


--
Alan Falloon

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users