Re: guile 3 update, september edition

2018-09-18 Thread Ludovic Courtès
Hello,

Andy Wingo  skribis:

> On Mon 17 Sep 2018 11:35, l...@gnu.org (Ludovic Courtès) writes:
>
>>> The threshold at which Guile will automatically JIT-compile is set from
>>> the GUILE_JIT_THRESHOLD environment variable.  By default it is 5.
>>> If you set it to -1, you disable the JIT.  If you set it to 0, *all*
>>> code will be JIT-compiled.  The test suite passes at
>>> GUILE_JIT_THRESHOLD=0, indicating that all features in Guile are
>>> supported by the JIT.  Set the GUILE_JIT_LOG environment variable to 1
>>> or 2 to see JIT progress.
>>
>> Just to be clear, does GUILE_JIT_THRESHOLD represent the number of
>> times a given instruction pointer is hit?
>
> No.  It is an abstract "hotness" counter associated with a function's
> code.  (I say "function's code" because many closures can share the same
> code and thus the same counter.  It's not in the scm_tc7_program object
> because some procedures don't have these.)
>
> All counters start at 0 when Guile starts.  A function's counters
> increment by 30 when a function is called, currently, and 2 on every
> loop back-edge.  I have not attempted to tweak these values yet.

OK, I see.

Exciting times!

Ludo’.



Re: guile 3 update, september edition

2018-09-18 Thread Andy Wingo
Greets :)

On Mon 17 Sep 2018 11:35, l...@gnu.org (Ludovic Courtès) writes:

>> The threshold at which Guile will automatically JIT-compile is set from
>> the GUILE_JIT_THRESHOLD environment variable.  By default it is 5.
>> If you set it to -1, you disable the JIT.  If you set it to 0, *all*
>> code will be JIT-compiled.  The test suite passes at
>> GUILE_JIT_THRESHOLD=0, indicating that all features in Guile are
>> supported by the JIT.  Set the GUILE_JIT_LOG environment variable to 1
>> or 2 to see JIT progress.
>
> Just to be clear, does GUILE_JIT_THRESHOLD represent the number of
> times a given instruction pointer is hit?

No.  It is an abstract "hotness" counter associated with a function's
code.  (I say "function's code" because many closures can share the same
code and thus the same counter.  It's not in the scm_tc7_program object
because some procedures don't have these.)

All counters start at 0 when Guile starts.  A function's counters
increment by 30 when a function is called, currently, and 2 on every
loop back-edge.  I have not attempted to tweak these values yet.
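
In Scheme terms the accounting is roughly as follows (just a sketch
with invented names; the real counters are maintained in C alongside
the function's code):

   ;; Hypothetical model of per-code hotness accounting.
   (define jit-threshold 5)           ; cf. GUILE_JIT_THRESHOLD
   (define (make-hotness) (vector 0)) ; one counter per function's code
   (define (on-call! h)               ; +30 on each call
     (vector-set! h 0 (+ (vector-ref h 0) 30)))
   (define (on-back-edge! h)          ; +2 on each loop back-edge
     (vector-set! h 0 (+ (vector-ref h 0) 2)))
   (define (hot? h)                   ; tier up once past the threshold
     (>= (vector-ref h 0) jit-threshold))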

>> Using GNU Lightning has been useful but in the long term I don't think
>> it's the library that we need, for a few reasons:
>
> [...]
>
> It might be that the lightning 1.x branch would be a better fit (it was
> exactly as you describe).  I think that’s what Racket was (is?) using.

Could be!  I will have a look.

Cheers,

Andy



Re: guile 3 update, september edition

2018-09-17 Thread Amirouche Boubekki
On Mon, 17 Sep 2018 at 10:26, Andy Wingo  wrote:

> Hi!
>
> This is an update on progress towards Guile 3.  In our last update, we
> saw the first bits of generated code:
>
>   https://lists.gnu.org/archive/html/guile-devel/2018-08/msg5.html
>
> Since then, the JIT is now feature-complete.  It can JIT-compile *all*
> code in Guile, including delimited continuations, dynamic-wind, all
> that.  It runs automatically, in response to a function being called a
> lot.  It can also tier up from within hot loops.
>

This looks very good!

Once the merge is done, maybe it will be time to move guile-next to
master in Guix?  WDYT?

Keep it steady!


Re: guile 3 update, september edition

2018-09-17 Thread Ludovic Courtès
Hello!

Andy Wingo  skribis:

> This is an update on progress towards Guile 3.  In our last update, we
> saw the first bits of generated code:
>
>   https://lists.gnu.org/archive/html/guile-devel/2018-08/msg5.html
>
> Since then, the JIT is now feature-complete.  It can JIT-compile *all*
> code in Guile, including delimited continuations, dynamic-wind, all
> that.  It runs automatically, in response to a function being called a
> lot.  It can also tier up from within hot loops.

Woohoo!  It’s awesome that the JIT can already handle all Guile code
and pass the whole test suite.  To me that means it can be merged into
‘master’.

> The threshold at which Guile will automatically JIT-compile is set from
> the GUILE_JIT_THRESHOLD environment variable.  By default it is 5.
> If you set it to -1, you disable the JIT.  If you set it to 0, *all*
> code will be JIT-compiled.  The test suite passes at
> GUILE_JIT_THRESHOLD=0, indicating that all features in Guile are
> supported by the JIT.  Set the GUILE_JIT_LOG environment variable to 1
> or 2 to see JIT progress.

Just to be clear, does GUILE_JIT_THRESHOLD represent the number of
times a given instruction pointer is hit?

> For debugging (single-stepping, tracing, breakpoints), Guile will fall
> back to the bytecode interpreter (the VM), for the thread that has
> debugging enabled.  Once debugging is no longer enabled (no more hooks
> active), that thread can return to JIT-compiled code.

Cool.

> Using GNU Lightning has been useful but in the long term I don't think
> it's the library that we need, for a few reasons:

[...]

It might be that the lightning 1.x branch would be a better fit (it was
exactly as you describe).  I think that’s what Racket was (is?) using.

> Meaning that "eval" in Guile 3 is somewhere around 80% faster than in
> Guile 2.2 -- because "eval" is now JIT-compiled.

Very cool.

> Incidentally, as a comparison, Guile 2.0 (whose "eval" is slower for
> various reasons) takes 70s real time for the same benchmark.  Guile 1.8,
> whose eval was written in C, takes 4.536 seconds real time.  It's still
> a little faster than Guile 3's eval-in-Scheme, but it's close and we're
> catching up :)

It’s an insightful comparison; soon we can say it’s “as fast as
hand-optimized C” (and more readable, too :-)).

> I have also tested with ecraven's r7rs-benchmarks and we make a nice
> jump past the 2.2 results; but not yet at Racket or Chez levels yet.  I
> think we need to tighten up our emitted code.  There's another 2x of
> performance that we should be able to get with incremental improvements.
> For the last bit we will need global register allocation though, I
> think.

Looking forward to reading ecraven’s updated benchmark results.

Thank you for the awesomeness!

Ludo’.



guile 3 update, september edition

2018-09-17 Thread Andy Wingo
Hi!

This is an update on progress towards Guile 3.  In our last update, we
saw the first bits of generated code:

  https://lists.gnu.org/archive/html/guile-devel/2018-08/msg5.html

Since then, the JIT is now feature-complete.  It can JIT-compile *all*
code in Guile, including delimited continuations, dynamic-wind, all
that.  It runs automatically, in response to a function being called a
lot.  It can also tier up from within hot loops.

The threshold at which Guile will automatically JIT-compile is set from
the GUILE_JIT_THRESHOLD environment variable.  By default it is 5.
If you set it to -1, you disable the JIT.  If you set it to 0, *all*
code will be JIT-compiled.  The test suite passes at
GUILE_JIT_THRESHOLD=0, indicating that all features in Guile are
supported by the JIT.  Set the GUILE_JIT_LOG environment variable to 1
or 2 to see JIT progress.
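
For example, to JIT-compile everything and log progress (using the same
/opt/guile build as in the benchmarks below; the expression itself is
arbitrary):

   $ GUILE_JIT_THRESHOLD=0 GUILE_JIT_LOG=1 /opt/guile/bin/guile -c '(pk (+ 1 2))'

Likewise, GUILE_JIT_THRESHOLD=-1 on the same command line runs it with
the JIT disabled.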

For debugging (single-stepping, tracing, breakpoints), Guile will fall
back to the bytecode interpreter (the VM), for the thread that has
debugging enabled.  Once debugging is no longer enabled (no more hooks
active), that thread can return to JIT-compiled code.

Right now the JIT-compiled code exactly replicates what the bytecode
interpreter does: the same stack reads and writes, etc.  There is some
specialization when a bytecode has immediate operands, of course.
However, the choice to do debugging via the bytecode interpreter --
effectively, to always have bytecode around -- will allow machine code
(compiled either just-in-time or ahead-of-time) to do register
allocation.  JIT will probably do a simple block-local allocation.  An
AOT compiler is free to do something smarter.

As far as I can tell, with the default setting of
GUILE_JIT_THRESHOLD=5, JIT does not increase startup latency for any
workload, and always increases throughput.  More benchmarking is needed
though.

Using GNU Lightning has been useful but in the long term I don't think
it's the library that we need, for a few reasons:

  * When Lightning does a JIT compilation, it builds a graph of
    operations, does some minor optimizations, and then emits code.
    But the graph phase takes time and memory.  I think we need a
    library that just emits code directly.  That would lower the cost
    of JIT and allow us to lower the default GUILE_JIT_THRESHOLD.

  * The register allocation phase in Lightning exists essentially for
    calls.  However, we have a very restricted set of calls that we
    need to make, and we can do the allocation by hand on each
    architecture.  (We don't use CPU call instructions for Scheme
    function calls because we use the VM stack.  We might be able to
    revise this in the future, but again Lightning is in the way.)
    Doing the allocation by hand would bring a few benefits:

      * Hand allocation would free up more temporary registers.  Right
        now Lightning reserves all registers used as part of the
        platform calling convention; they are unavailable to the JIT.

      * Sometimes when Lightning needs a temporary register, it can
        clobber one that we're using as part of an internal calling
        convention.  I believe this is fixed for x86-64 but I can't be
        sure for other architectures!  See commit
        449ef7d9755b553cb0ad2629bca3bc42c5913e88.

      * We need to do our own register allocation; having Lightning
        also do it is a misfeature.

  * Sometimes we know that we can get better emitted code, but the
    Lightning abstraction doesn't let us do it.  We should allow
    ourselves to punch through that abstraction.

The platform-specific Lightning files basically expose most of the API
we need.  We could consider incrementally punching through lightning.h
to reach those files.  Something to think about for the future.

Finally, as far as performance goes -- we're generally somewhere around
80% faster than 2.2.  Sometimes more, sometimes less, always faster
though AFAIK.  As an example, here's a simple fib.scm:

   $ cat /tmp/fib.scm
   (define (fib n)
     (if (< n 2)
         1
         (+ (fib (- n 1))
            (fib (- n 2)))))

Now let's use eval-in-scheme to print the 35th fibonacci number.  For
Guile 2.2:

   $ time /opt/guile-2.2/bin/guile -c \
   '(begin (primitive-load "/tmp/fib.scm") (pk (fib 35)))'

   ;;; (14930352)

   real 0m9.610s
   user 0m10.547s
   sys  0m0.040s

But with Guile from the lightning branch, we get:

   $ time /opt/guile/bin/guile -c \
   '(begin (primitive-load "/tmp/fib.scm") (pk (fib 35)))'

   ;;; (14930352)

   real 0m5.299s
   user 0m6.167s
   sys  0m0.064s

Meaning that "eval" in Guile 3 is somewhere around 80% faster than in
Guile 2.2 -- because "eval" is now JIT-compiled.  (Otherwise it's the
same program.)  This improves bootstrap times, though Guile 3's compiler
will generally make more CPS nodes than Guile 2.2 for the same
expression, which takes more time and memory, so the gain isn't
earth-shattering.

Incidentally, as a comparison, Guile 2.0 (whose "eval" is slower for
various reasons) takes 70s real time for the same benchmark.  Guile 1.8,
whose eval was written in C, takes 4.536 seconds real time.  It's still
a little faster than Guile 3's eval-in-Scheme, but it's close and we're
catching up :)