Re: a plan for native compilation

2010-04-22 Thread Andy Wingo
Hi Ken,

On Wed 21 Apr 2010 19:02, Ken Raeburn raeb...@raeburn.org writes:

 On Apr 18, 2010, at 07:41, Andy Wingo wrote:
 Specifically, we should make it so that there is nothing you would want
 to go to a core file for. Compiling Scheme code to native code should
 never produce code that segfaults at runtime. All errors would still be
 handled by the catch/throw mechanism.

 Including a segfault in compiled Scheme code, caused by an
 application-supplied C procedure returning something that looks like one
 of the pointer-using SCM objects but is in reality just garbage? There
 *will* be core files.

Good point.

 * Debug info in native representations, handled by GDB and other
 debuggers. Okay, this is hard if we don't go via C code as an
 intermediate language, and probably even if we do. But we can probably
 at least map PC address ranges to function names and line numbers,
 stuff like that. Maybe we could do the more advanced stuff one format
 at a time, starting with DWARF.
 
 We should be able to do this already, given that we map bytecode address
 ranges to line numbers; and while the function is still on the stack,
 you can query it for whatever you like. Adding a map when generating
 native code should be easy.

 I think for best results with GDB and other debuggers, it should be
 converted into whatever the native format is, DWARF or otherwise.

I agree that this would be nice, eventually. However Guile's debugging
information is currently for Guile, not for GDB. It needs to be readable
by Guile. This would imply DWARF readers for Guile, which would be nice,
but a pain also.
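For reference, the bytecode-address-to-source mapping discussed above is already queryable from Scheme. A minimal sketch, assuming the Guile 2.0 `(system vm program)` API (`program?` and `program-sources` as I recall them, so treat the names as an assumption):

```scheme
;; Sketch: inspect the source-location map of a compiled procedure.
(use-modules (system vm program))

(define (square x) (* x x))

;; program-sources returns a list of (instruction-pointer file line . col)
;; entries when the procedure was compiled with source information.
(if (program? square)
    (format #t "~a source entries~%" (length (program-sources square)))
    (display "square was not compiled to bytecode\n"))
```

This is the Guile-side reader; emitting the same information as DWARF for GDB would be an additional translation step on top of it.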

 * Even for JIT compilation, but especially for AOT compilation,
 optimizations should only be enabled with careful consideration of
 concurrent execution. E.g., if (while (not done) ) is supposed
 to work with a second thread altering done, you may not be able to
 combine multiple cases of reading the value of any variable even when
 you can prove that the current thread doesn't alter the value in
 between.
 
 Fortunately, Scheme programming style discourages global variables ;)
 Reminds me of "spooky action at a distance". And when they are read, it
 is always through an indirection, so we should be good.

 Who said global? It could be two procedures accessing a value in a
 shared outer scope, with one of them launched in a second thread,
 perhaps indirectly via a third procedure which the compiler couldn't
 examine at the time to know that it would create a thread.

 I'm not sure indirection helps -- unless you mean it disables that sort
 of optimization.

Variables which are never set may be copied when closures are made.
Variables which are set! need to be boxed due to continuations, and so
that closures can just copy the box instead of values. There is still an
indirection. The compiler handles this.
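The boxing described above can be made concrete with plain Scheme: a variable that is only read may be copied into each closure, but a `set!` variable must be shared through one box so that every closure (and continuation) sees updates:

```scheme
;; A set! variable captured by two closures: both must observe the same
;; mutation, so the compiler gives them one shared box, not two copies.
(define (make-counter)
  (let ((n 0))                               ; n is set!, so it is boxed
    (values (lambda () (set! n (+ n 1)) n)   ; incrementer
            (lambda () n))))                 ; reader

(call-with-values make-counter
  (lambda (inc peek)
    (inc) (inc)
    (format #t "count is ~a~%" (peek))))     ; prints: count is 2
```

If `n` were copied rather than boxed, the reader closure would still see 0 after the increments.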

 Better for emacs? Well I don't think we should over-sell speed, if
 that's what you're getting at.

 Hey, you're the one who said, "Guile can implement Emacs Lisp better
 than Emacs can." :-) And specifically said that Emacs using Guile would
 be faster.

You caught me! ;)

Emacs using Guile will certainly be faster, when we get native
compilation. While we're just bytecode-based, though, I think it will be
the same.

 The initial work, at least, wouldn't involve a rewrite of Lisp into
 Scheme. So we still need to support dynamic scoping of, well, just about
 anything.

Indeed.

 Native-code compilation will make both Scheme and Elisp significantly
 faster -- I think 4x would be a typical improvement, though one would
 find 2x and 20x as well.

 For raw Scheme data processing, perhaps. Like I said, I'm concerned
 about how much of the performance of Emacs is tied to that of the Emacs
 C code (redisplay, buffer manipulation, etc) and that part probably
 wouldn't improve much if at all. So a 4x speedup of actual Emacs Lisp
 code becomes ... well, a much smaller speedup of Emacs overall.

Ah, a speedup to emacs itself! I was just talking about elisp ;-) It
certainly depends on what you're doing, I guess is the answer here. I
would like my Gnus to be faster, but I'm quite fine with just editing
source code and mail ;-)

 On my reasonably fast Mac desktop, Emacs takes about 3s to launch and
 load my .emacs file.
 
 How long does emacs -Q take?

 Maybe about 1s less?

Good to know, thanks.

 I'm also pondering loading different Lisp files in two or three
 threads in parallel, when dependencies allow, but any manipulation of
 global variables has to be handled carefully, as do any load-time
 errors. (One thread blocks reading, while another executes
 already-loaded code... maybe more, to keep multiple cores busy at
 once.)
 
 This is a little crazy ;-)

 Only a little?

:)


Well, I've spent the whole morning poking mail, which doesn't do much to
help Guile or Emacs. I'm going to see if I can focus on code in the next
two or three weeks, besides GHM organization obligations.

Happy hacking,

Andy
-- 
http://wingolog.org/




Re: a plan for native compilation

2010-04-21 Thread Ken Raeburn
On Apr 18, 2010, at 07:41, Andy Wingo wrote:
 Specifically, we should make it so that there is nothing you would want
 to go to a core file for. Compiling Scheme code to native code should
 never produce code that segfaults at runtime. All errors would still be
 handled by the catch/throw mechanism.

Including a segfault in compiled Scheme code, caused by an application-supplied 
C procedure returning something that looks like one of the pointer-using SCM 
objects but is in reality just garbage?  There *will* be core files.

 * Debug info in native representations, handled by GDB and other
 debuggers. Okay, this is hard if we don't go via C code as an
 intermediate language, and probably even if we do. But we can probably
 at least map PC address ranges to function names and line numbers,
 stuff like that. Maybe we could do the more advanced stuff one format
 at a time, starting with DWARF.
 
 We should be able to do this already, given that we map bytecode address
 ranges to line numbers; and while the function is still on the stack,
 you can query it for whatever you like. Adding a map when generating
 native code should be easy.

I think for best results with GDB and other debuggers, it should be converted 
into whatever the native format is, DWARF or otherwise.

 I would actually like to switch our compiled-code on-disk format to be a
 subset of ELF, so we can have e.g. a bytecode section, a native code
 section, sections for RO and RW data, etc. But that would take a fair
 amount of thinking.

And if it's actually compatible with ELF, would make special handling of 
compiled Scheme + compiled C possible on ELF platforms but not others, leading 
to two different ways of potentially building stuff (or, people supporting only 
ELF platforms in their packages, whether intentionally or not; or, people not 
bothering using the non-portable special handling).  Which is why I was 
suggesting native formats rather than ELF specifically -- more work up front, 
but more uniform treatment of platforms in the build process.

 * With some special compile-time hooks, perhaps FFI symbol references
 could turn into (weak?) direct symbol references, processed with
 native relocation handling, etc.
 
 This might improve startup times (marginally?), but it wouldn't affect
 runtimes, would it?

Depending how it's done, it might improve the first reference to a symbol very 
slightly.  You could (again, depending how it's done) perhaps trigger link-time 
errors if a developer forgets to supply libraries defining symbols the Scheme 
code knows will be required, instead of a delayed run-time error.
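For contrast, the current run-time path can be sketched with Guile's dynamic FFI, the `(system foreign)` module available from 2.0 (the `pointer->procedure`/`dynamic-func` calls below are from that API as I understand it). A missing symbol here surfaces only when `dynamic-func` runs, which is exactly the late-binding that native relocation handling could turn into a link-time error:

```scheme
;; Sketch: run-time symbol resolution through the dynamic FFI.
(use-modules (system foreign))

;; (dynamic-link) with no argument searches the global symbol table,
;; so this assumes a platform (e.g. GNU/Linux) where libc is loaded.
(define strlen
  (pointer->procedure size_t
                      (dynamic-func "strlen" (dynamic-link))
                      (list '*)))

(format #t "strlen(\"hello\") = ~a~%"
        (strlen (string->pointer "hello")))
```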

 * Even for JIT compilation, but especially for AOT compilation,
 optimizations should only be enabled with careful consideration of
 concurrent execution. E.g., if (while (not done) ) is supposed
 to work with a second thread altering done, you may not be able to
 combine multiple cases of reading the value of any variable even when
 you can prove that the current thread doesn't alter the value in
 between.
 
 Fortunately, Scheme programming style discourages global variables ;)
 Reminds me of "spooky action at a distance". And when they are read, it
 is always through an indirection, so we should be good.

Who said global?  It could be two procedures accessing a value in a shared 
outer scope, with one of them launched in a second thread, perhaps indirectly 
via a third procedure which the compiler couldn't examine at the time to know 
that it would create a thread.

I'm not sure indirection helps -- unless you mean it disables that sort of 
optimization.
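That scenario, concretely, using Guile's `(ice-9 threads)`: two closures over one outer-scope variable, one of them launched in a second thread. Whether the compiler may hoist the repeated reads of `done` out of the loop is exactly the question at issue; today the `set!`-induced box keeps the loop honest:

```scheme
;; Two closures over a shared outer-scope variable; one runs in a
;; second thread.  If reads of `done' were hoisted out of the loop,
;; the worker would spin forever.
(use-modules (ice-9 threads))

(define (make-worker)
  (let ((done #f))
    (values (lambda ()                       ; runs in a second thread
              (let loop ()
                (unless done (usleep 1000) (loop))))
            (lambda () (set! done #t)))))    ; runs in the first thread

(call-with-values make-worker
  (lambda (work stop)
    (let ((t (begin-thread (work))))
      (stop)
      (join-thread t)
      (display "worker stopped\n"))))
```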

 Of course. Sandboxed code of course should not have access to mutexes or
 the FFI or many other things. Though it is an interesting point, that
 resources that you provide to sandboxed code should be threadsafe, if
 the sandbox itself has threads.

Actually, I'm not sure that mutexes should be forbidden, especially if you let 
the sandbox create threads.  But they should be well-protected, bullet-proof 
mutexes; none of this undefined behavior stuff.  :-)
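At the Scheme level Guile's mutexes are already fairly well-behaved; a small sketch with `(ice-9 threads)`, where `with-mutex` releases the lock even on a non-local exit, which is the kind of "bullet-proof" behavior sandboxed code would need:

```scheme
;; with-mutex unlocks even if the body throws, so a sandboxed thread
;; cannot leave the lock held by raising an exception.
(use-modules (ice-9 threads))

(define m (make-mutex))
(define counter 0)

(define (add-safely!)
  (with-mutex m
    (set! counter (+ counter 1))))

(let ((threads (map (lambda (_) (begin-thread (add-safely!)))
                    (iota 10))))
  (for-each join-thread threads)
  (format #t "counter = ~a~%" counter))
```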

 * Link compiled C and Scheme parts of a package together into a single
 shared library object, []
 
 This is all very hard stuff!

Maybe somewhat.  The big char array transformation wouldn't be that hard, I 
think, though we'd clearly be going outside the bounds of what a C99 compiler 
is *required* to support in terms of array size.  Slap a C struct wrapper on it 
(or C++, which would give you an encoding system for multiple names in a 
hierarchy, though with different character set limitations), and you've 
basically got an object file ready to be created.  Then you just have to teach 
libguile how not to read files for some modules.
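The "big char array" transformation can be sketched in a few lines of Scheme: read a compiled `.go` file and emit it as a C array definition ready to be compiled and linked alongside a package's C code (the file and symbol names below are made up for illustration):

```scheme
;; Sketch: dump a compiled objcode file as a C char-array definition.
(use-modules (ice-9 binary-ports) (rnrs bytevectors))

(define (file->c-array file c-name)
  (let ((bv (call-with-input-file file get-bytevector-all
                                  #:binary #t)))
    (string-append
     "static const unsigned char " c-name "["
     (number->string (bytevector-length bv)) "] = {"
     (string-join (map number->string (bytevector->u8-list bv)) ",")
     "};\n")))

;; e.g. (display (file->c-array "foo.go" "foo_objcode"))
```

The C99 minimum guaranteed object size is the limit Ken alludes to; in practice compilers accept far larger arrays than the standard requires.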

 * Can anything remotely reasonable happen when C++ code calls Scheme
 code which calls C++ code ... with stack-unwinding cleanup code
 specified in both languages, and an exception is raised? []
 
 I have no earthly idea :)

It only just occurred to me.  It may be worth looking at 

Re: a plan for native compilation

2010-04-18 Thread Ludovic Courtès
Hi,

Ken Raeburn raeb...@raeburn.org writes:

 It would be awesome if GDB could display this information when
 debugging a process, *and* when looking at a core file.

Actually, GDB has some Guile support (‘set language scheme’), although
it works only with 1.6 and partly with 1.8 (because tags have changed,
stack management as well, etc.).

Thanks,
Ludo’.





Re: a plan for native compilation

2010-04-17 Thread Andy Wingo
Greets,

On Sat 17 Apr 2010 01:15, l...@gnu.org (Ludovic Courtès) writes:

 Andy Wingo wi...@pobox.com writes:

 So, my thought is to extend procedures with an additional pointer, a
 pointer to a native code structure.

 (So your point is what should we do now to allow for such experiments
 eventually, right?)

 Adding an extra word to procedures seems like a good idea, yes.

I'm not sure I have an immediate point ;) Also I don't think it's a good
idea to reserve a word just-in-case. Better to do this work on a
future 2.2 branch. The word would probably go in the objcode structure,
also; a 5-word procedure object has bad cache implications.

 Now, what technology to choose for the compiler itself? Dunno. For a
 JIT, it would be useful to use something portable, and perhaps do the
 JIT compilation on the bytecode itself, without more source information.
 It would not produce the fastest code, but it would run fast.

 Yes, that’s what I had in mind, using GNU lightning (see
 http://www.fdn.fr/~lcourtes/software/guile/jit.html.)  It /seems/ to
 be doable, with milestones to do it incrementally, starting from a dumb
 version.

Yes, I had your model in mind. I think it's generally a good idea,
though I would like to be able to avoid trampolining.

 I think we can produce better native code ahead-of-time coming from the
 tree-il layer directly. I feel like eventually we'll need to replace
 GLIL with something else, but I don't really know; we'll find out in the
 future I guess. But I do want to do ahead-of-time compilation, because
 I want Guile programs to start up very quickly and not consume much
 memory.

 Sure.

 lightning does x86, x86_64, sparc, and powerpc (PLT uses it) while Sassy
 does only x86, so it may be that both could play a role.

 Anyway, not for now.  :-)

Agreed :) Just wanted to reify these thoughts, as breadcrumbs for the
future :)

Andy
-- 
http://wingolog.org/




Re: a plan for native compilation

2010-04-17 Thread Ludovic Courtès
Hi,

Andy Wingo wi...@pobox.com writes:

 It's a shame that GCC has not been able to support LLVM's level of
 innovation,

I don’t think innovation is the problem (did you read the 4.5 release
notes?).  However, it’s been too much of a monolithic compiler, unlike
LLVM, although plug-ins will now improve the situation.

Ludo’.





Re: a plan for native compilation

2010-04-17 Thread Ken Raeburn
Good stuff, Andy!

On Apr 16, 2010, at 07:09, Andy Wingo wrote:
 Currently, Guile has a compiler to a custom virtual machine, and the
 associated toolchain: assemblers and disassemblers, stack walkers, the
 debugger, etc. One can get the source location of a particular
 instruction pointer, for example.

These are great... but if they're run-time features of Guile, they're useless 
when examining a core file.

It would be awesome if GDB could display this information when debugging a 
process, *and* when looking at a core file.  (For both JIT and AOT compilation, 
of course.)  It doesn't currently know about Scheme and Guile, so obviously 
some work would need to be done on that side.  It's got some rather clunky 
looking hooks for providing debug info associated with JIT compilation, which I 
think I mentioned in IRC a while back.  Maybe the GDB developers could be 
persuaded to support a more direct way of supplying debug info than the current 
mechanism, such as a pointer to DWARF data.  GDB would also need to learn about 
Scheme and Guile specifically, which would take cooperation from both groups.

Obviously, when looking at a core file, no helper code from the library can be 
executed.  Perhaps less obviously, with a live process, when doing simple 
things like looking at symbol values, you probably don't want to execute 
library code if it means enabling other threads to resume executing for a while 
as well.

GDB 7 supports supplying Python code to pretty-print selected object types, or 
defining new commands.  We could supply Python code for looking at SCM objects, 
maybe even walking the stack, if that turns out to be practical with the 
interfaces GDB supplies.

 So, my thought is to extend procedures with an additional pointer, a
 pointer to a native code structure. The native code could be written
 out ahead-of-time, or compiled at runtime. But procedures would still
 have bytecode, for various purposes for example to enable code coverage
 via the next-instruction hook, and in the JIT case, because only some
 procedures will be native-compiled.

I wondered about this, when looking briefly at what JIT compilation would need 
to generate given certain byte codes.  Would you generate code based on the 
debug or non-debug versions of the instructions?  What would the choice depend 
on?  Can both bytecode evaluators be used in one process and with the same 
bytecode object?

What about when profiling for performance?

 We keep the same stack representation, so stack walkers and the debugger
 still work. Some local variables can be allocated into registers, but
 procedure args are still passed and returned on the stack. Though the
 procedure's arity and other metadata would be the same, the local
 variable allocations and source locations would differ, so we would need
 some additional debugger support, but we can work on that when the time
 comes.

The call sequence would have to work a little differently from now, I think.  
As you describe:

 All Scheme procedures, bytecode and native, will run inside the VM. If a
 bytecode procedure calls a native procedure, the machine registers are
 saved, and some machine-specific stub transfers control to the native
 code. Native code calling native code uses the same stack as the VM,
 though it has its own conventions over what registers to save; and
 native code calling bytecode prepares the Scheme stack, then restores
 the VM-saved machine registers.

Does the native code figure out if it's jumping to byte code or machine code, 
or does it use some transfer stub?

 AIUI the hotspot compiler actually does an SSA transformation of Java
 bytecode, then works on that. I'm not particularly interested in
 something like that; I'm more interested in something direct and fast,
 and obviously correct and understandable by our debugging
 infrastructure.

Though as you say, we can experiment later with additional changes.  If there's 
some heavily-used dynamically-generated code, it may be worth the extra effort, 
but we can find that out after we've got something working.

 Anyway, just some thoughts here. I'm not going to focus on native
 compilation in the coming months, as there are other things to do, but
 this is how I think it should be done :-)

Some random thoughts of my own:

Several possible options for AOT compilation (e.g., generating C or assembly 
and using native tools) could involve the generation of native object files.  
It seems tempting to me to see how much we might be able to use the native 
C/C++/Fortran/etc method or do something parallel:

* Debug info in native representations, handled by GDB and other debuggers.  
Okay, this is hard if we don't go via C code as an intermediate language, and 
probably even if we do.  But we can probably at least map PC address ranges to 
function names and line numbers, stuff like that.  Maybe we could do the more 
advanced stuff one format at a time, starting with DWARF.

* Code and read-only data 

Re: a plan for native compilation

2010-04-16 Thread No Itisnt
One option I am really starting to like is LLVM. I know what you're
thinking, huge memory consumption, giant dependency, etc, but it's so
cool! It supports every desktop architecture too.




Re: a plan for native compilation

2010-04-16 Thread Ludovic Courtès
Howdy!

Andy Wingo wi...@pobox.com writes:

 So, my thought is to extend procedures with an additional pointer, a
 pointer to a native code structure.

(So your point is what should we do now to allow for such experiments
eventually, right?)

Adding an extra word to procedures seems like a good idea, yes.

 Now, what technology to choose for the compiler itself? Dunno. For a
 JIT, it would be useful to use something portable, and perhaps do the
 JIT compilation on the bytecode itself, without more source information.
 It would not produce the fastest code, but it would run fast.

Yes, that’s what I had in mind, using GNU lightning (see
http://www.fdn.fr/~lcourtes/software/guile/jit.html.)  It /seems/ to
be doable, with milestones to do it incrementally, starting from a dumb
version.

 I think we can produce better native code ahead-of-time coming from the
 tree-il layer directly. I feel like eventually we'll need to replace
 GLIL with something else, but I don't really know; we'll find out in the
 future I guess. But I do want to do ahead-of-time compilation, because
 I want Guile programs to start up very quickly and not consume much
 memory.

Sure.

lightning does x86, x86_64, sparc, and powerpc (PLT uses it) while Sassy
does only x86, so it may be that both could play a role.

Anyway, not for now.  :-)

Thanks,
Ludo’.