Re: guile 3 update, september edition
Hello,

Andy Wingo writes:

> On Mon 17 Sep 2018 11:35, l...@gnu.org (Ludovic Courtès) writes:
>
>>> The threshold at which Guile will automatically JIT-compile is set from
>>> the GUILE_JIT_THRESHOLD environment variable. By default it is 5.
>>> If you set it to -1, you disable the JIT. If you set it to 0, *all*
>>> code will be JIT-compiled. The test suite passes at
>>> GUILE_JIT_THRESHOLD=0, indicating that all features in Guile are
>>> supported by the JIT. Set the GUILE_JIT_LOG environment variable to 1
>>> or 2 to see JIT progress.
>>
>> Just to be clear, does GUILE_JIT_THRESHOLD represent the number of
>> times a given instruction pointer is hit?
>
> No. It is an abstract "hotness" counter associated with a function's
> code. (I say "function's code" because many closures can share the same
> code and thus the same counter. It's not in the scm_tc7_program object
> because some procedures don't have these.)
>
> All counters start at 0 when Guile starts. A function's counters
> increment by 30 when a function is called, currently, and 2 on every
> loop back-edge. I have not attempted to tweak these values yet.

OK, I see. Exciting times!

Ludo’.
Re: guile 3 update, september edition
Greets :)

On Mon 17 Sep 2018 11:35, l...@gnu.org (Ludovic Courtès) writes:

>> The threshold at which Guile will automatically JIT-compile is set from
>> the GUILE_JIT_THRESHOLD environment variable. By default it is 5.
>> If you set it to -1, you disable the JIT. If you set it to 0, *all*
>> code will be JIT-compiled. The test suite passes at
>> GUILE_JIT_THRESHOLD=0, indicating that all features in Guile are
>> supported by the JIT. Set the GUILE_JIT_LOG environment variable to 1
>> or 2 to see JIT progress.
>
> Just to be clear, does GUILE_JIT_THRESHOLD represent the number of
> times a given instruction pointer is hit?

No. It is an abstract "hotness" counter associated with a function's code. (I say "function's code" because many closures can share the same code and thus the same counter. It's not in the scm_tc7_program object because some procedures don't have these.)

All counters start at 0 when Guile starts. A function's counters increment by 30 when a function is called, currently, and 2 on every loop back-edge. I have not attempted to tweak these values yet.

>> Using GNU Lightning has been useful but in the long term I don't think
>> it's the library that we need, for a few reasons:
>
> [...]
>
> It might be that the lightning 1.x branch would be a better fit (it was
> exactly as you describe.) I think that’s what Racket was (is?) using.

Could be! I will have a look.

Cheers,

Andy
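The counter scheme described in the reply above can be modelled in a few lines of C. This is a minimal sketch under stated assumptions, not libguile's code: every name here (`code_info`, `parse_threshold`, `on_call`, `on_backedge`) is hypothetical, and the exact comparison ("tier up once the counter reaches the threshold") is an assumption chosen so that 0 compiles everything and -1 disables the JIT, as the message says. The increments (+30 per call, +2 per back-edge) are the ones quoted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Increments quoted in the reply above: +30 per call, +2 per loop
   back-edge. All names here are hypothetical, not libguile's. */
enum { CALL_INCREMENT = 30, BACKEDGE_INCREMENT = 2 };

/* One counter per piece of compiled code. Closures that share code
   point at the same struct, so they also share the counter. */
struct code_info {
  uint64_t hotness; /* starts at 0 when the process starts */
  bool jitted;      /* set once machine code has been generated */
};

/* Read a GUILE_JIT_THRESHOLD-style string (e.g. from getenv), falling
   back to DEFAULT_VALUE when it is unset or not a whole number. */
static long parse_threshold(const char *text, long default_value) {
  if (text == NULL || *text == '\0')
    return default_value;
  char *end;
  long value = strtol(text, &end, 10);
  return *end == '\0' ? value : default_value;
}

/* Add INCREMENT to CODE's counter and report whether it should be
   JIT-compiled now. Assumed semantics: a negative threshold disables
   the JIT; otherwise compile once the counter reaches the threshold,
   so a threshold of 0 compiles everything on first use. */
static bool bump_hotness(struct code_info *code, uint64_t increment,
                         long threshold) {
  assert(code != NULL);
  if (threshold < 0 || code->jitted)
    return false;
  code->hotness += increment;
  if (code->hotness >= (uint64_t)threshold) {
    code->jitted = true; /* a real JIT would emit machine code here */
    return true;
  }
  return false;
}

/* Called on every function entry. */
static bool on_call(struct code_info *code, long threshold) {
  return bump_hotness(code, CALL_INCREMENT, threshold);
}

/* Called on every loop back-edge, which lets a hot loop tier up even
   if its function is only entered once. */
static bool on_backedge(struct code_info *code, long threshold) {
  return bump_hotness(code, BACKEDGE_INCREMENT, threshold);
}
```

With a threshold of 100, for example, the fourth call (counter 120) is the first to return true, while a loop inside a function that is never re-entered tiers up on its 50th back-edge (counter 100). `parse_threshold(getenv("GUILE_JIT_THRESHOLD"), some_default)` would be one way to feed the environment variable in; the default is left to the caller here.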
Re: guile 3 update, september edition
On Mon, 17 Sep 2018 at 10:26, Andy Wingo wrote:

> Hi!
>
> This is an update on progress towards Guile 3. In our last update, we
> saw the first bits of generated code:
>
> https://lists.gnu.org/archive/html/guile-devel/2018-08/msg5.html
>
> Since then, the JIT is now feature-complete. It can JIT-compile *all*
> code in Guile, including delimited continuations, dynamic-wind, all
> that. It runs automatically, in response to a function being called a
> lot. It can also tier up from within hot loops.
>

This looks very good! Once the merge is done, maybe it will be time to move guile-next to master in Guix? WDYT?

Keep it steady!
Re: guile 3 update, september edition
Hello!

Andy Wingo writes:

> This is an update on progress towards Guile 3. In our last update, we
> saw the first bits of generated code:
>
> https://lists.gnu.org/archive/html/guile-devel/2018-08/msg5.html
>
> Since then, the JIT is now feature-complete. It can JIT-compile *all*
> code in Guile, including delimited continuations, dynamic-wind, all
> that. It runs automatically, in response to a function being called a
> lot. It can also tier up from within hot loops.

Woohoo! It’s awesome that the JIT can already handle all Guile code and run the whole test suite. To me that means it can be merged into ‘master’.

> The threshold at which Guile will automatically JIT-compile is set from
> the GUILE_JIT_THRESHOLD environment variable. By default it is 5.
> If you set it to -1, you disable the JIT. If you set it to 0, *all*
> code will be JIT-compiled. The test suite passes at
> GUILE_JIT_THRESHOLD=0, indicating that all features in Guile are
> supported by the JIT. Set the GUILE_JIT_LOG environment variable to 1
> or 2 to see JIT progress.

Just to be clear, does GUILE_JIT_THRESHOLD represent the number of times a given instruction pointer is hit?

> For debugging (single-stepping, tracing, breakpoints), Guile will fall
> back to the bytecode interpreter (the VM), for the thread that has
> debugging enabled. Once debugging is no longer enabled (no more hooks
> active), that thread can return to JIT-compiled code.

Cool.

> Using GNU Lightning has been useful but in the long term I don't think
> it's the library that we need, for a few reasons:

[...]

It might be that the lightning 1.x branch would be a better fit (it was exactly as you describe.) I think that’s what Racket was (is?) using.

> Meaning that "eval" in Guile 3 is somewhere around 80% faster than in
> Guile 2.2 -- because "eval" is now JIT-compiled.

Very cool.

> Incidentally, as a comparison, Guile 2.0 (whose "eval" is slower for
> various reasons) takes 70s real time for the same benchmark. Guile 1.8,
> whose eval was written in C, takes 4.536 seconds real time. It's still
> a little faster than Guile 3's eval-in-Scheme, but it's close and we're
> catching up :)

It’s an insightful comparison; soon we can say it’s “as fast as hand-optimized C” (and more readable, too :-)).

> I have also tested with ecraven's r7rs-benchmarks and we make a nice
> jump past the 2.2 results; but not yet at Racket or Chez levels yet. I
> think we need to tighten up our emitted code. There's another 2x of
> performance that we should be able to get with incremental improvements.
> For the last bit we will need global register allocation though, I
> think.

Looking forward to reading ecraven’s updated benchmark results.

Thank you for the awesomeness!

Ludo’.
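The per-thread debugging fallback quoted in this message can be pictured as a small dispatch decision. The sketch below is hypothetical throughout (`thread_state`, `active_hooks`, `choose_engine` are illustrative names, not Guile's); it only encodes the behaviour described: a thread with debugging hooks active runs bytecode in the VM, and once its hooks are gone it can use machine code again where that code exists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the per-thread fallback described above;
   none of these names come from libguile. */
enum engine { ENGINE_VM, ENGINE_JIT };

struct thread_state {
  int active_hooks; /* tracing, single-step or breakpoint hooks */
};

struct code {
  void *machine_code; /* NULL until this code has been JIT-compiled */
};

/* Decide how THREAD should run CODE: any active debugging hook forces
   the bytecode VM (bytecode is always kept around); otherwise use the
   machine code if it has been generated. */
static enum engine choose_engine(const struct thread_state *thread,
                                 const struct code *code) {
  assert(thread != NULL && code != NULL);
  if (thread->active_hooks > 0)
    return ENGINE_VM;
  return code->machine_code != NULL ? ENGINE_JIT : ENGINE_VM;
}
```

Because the decision is per thread, one thread can be single-stepped in the VM while others keep running JIT-compiled code.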
guile 3 update, september edition
Hi!

This is an update on progress towards Guile 3. In our last update, we saw the first bits of generated code:

  https://lists.gnu.org/archive/html/guile-devel/2018-08/msg5.html

Since then, the JIT is now feature-complete. It can JIT-compile *all* code in Guile, including delimited continuations, dynamic-wind, all that. It runs automatically, in response to a function being called a lot. It can also tier up from within hot loops.

The threshold at which Guile will automatically JIT-compile is set from the GUILE_JIT_THRESHOLD environment variable. By default it is 5. If you set it to -1, you disable the JIT. If you set it to 0, *all* code will be JIT-compiled. The test suite passes at GUILE_JIT_THRESHOLD=0, indicating that all features in Guile are supported by the JIT. Set the GUILE_JIT_LOG environment variable to 1 or 2 to see JIT progress.

For debugging (single-stepping, tracing, breakpoints), Guile will fall back to the bytecode interpreter (the VM), for the thread that has debugging enabled. Once debugging is no longer enabled (no more hooks active), that thread can return to JIT-compiled code.

Right now the JIT-compiled code exactly replicates what the bytecode interpreter does: the same stack reads and writes, etc. There is some specialization when a bytecode has immediate operands of course. However the choice to do debugging via the bytecode interpreter -- effectively, to always have bytecode around -- will allow machine code (compiled either just-in-time or ahead-of-time) to do register allocation. JIT will probably do a simple block-local allocation. An AOT compiler is free to do something smarter.

As far as I can tell, with the default setting of GUILE_JIT_THRESHOLD=5, JIT does not increase startup latency for any workload, and always increases throughput. More benchmarking is needed though.
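To make the difference between "replicate the interpreter's stack reads and writes" and "a simple block-local allocation" concrete, here is a toy cost model in C. It counts memory operations for a straight-line block of three-operand additions over stack slots. The instruction format, `MAX_SLOTS`, and the assumption of unlimited registers (no spilling) are all made up for illustration; this is not how Guile's JIT is or will be written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy basic block: each instruction computes slot[dst] = slot[a] + slot[b]. */
struct add_insn { int dst, a, b; };

struct mem_ops { int loads, stores; };

#define MAX_SLOTS 16

/* Stack-replicating code: every operand is read from the stack and
   every result written straight back, like the interpreter does. */
static struct mem_ops count_naive(const struct add_insn *insns, size_t n) {
  struct mem_ops ops = {0, 0};
  (void)insns; /* operands don't matter: each one is a stack access */
  for (size_t i = 0; i < n; i++) {
    ops.loads += 2;
    ops.stores += 1;
  }
  return ops;
}

/* Block-local allocation, assuming enough registers for every slot the
   block touches: load a slot on first read, keep results in registers,
   and write each modified slot back once at the end of the block. */
static struct mem_ops count_block_local(const struct add_insn *insns,
                                        size_t n) {
  bool in_reg[MAX_SLOTS] = {false};
  bool dirty[MAX_SLOTS] = {false};
  struct mem_ops ops = {0, 0};
  for (size_t i = 0; i < n; i++) {
    int operands[2] = { insns[i].a, insns[i].b };
    for (int k = 0; k < 2; k++) {
      int s = operands[k];
      assert(s >= 0 && s < MAX_SLOTS);
      if (!in_reg[s]) { /* first use in this block: one load */
        ops.loads++;
        in_reg[s] = true;
      }
    }
    int d = insns[i].dst;
    assert(d >= 0 && d < MAX_SLOTS);
    in_reg[d] = true; /* result stays in a register for now */
    dirty[d] = true;
  }
  for (int s = 0; s < MAX_SLOTS; s++) /* flush at the block boundary */
    if (dirty[s])
      ops.stores++;
  return ops;
}
```

For the block `s2 = s0 + s1; s3 = s2 + s0; s2 = s3 + s3`, the stack-replicating version does 6 loads and 3 stores, while the block-local version does 2 loads and 2 stores. A real allocator would also have to spill when it runs out of registers and flush before calls or anything else that inspects the stack, which this model ignores.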
Using GNU Lightning has been useful but in the long term I don't think it's the library that we need, for a few reasons:

  * When Lightning does a JIT compilation, it builds a graph of operations, does some minor optimizations, and then emits code. But the graph phase takes time and memory. I think we just need a library that emits code directly. That would lower the cost of JIT and allow us to lower the default GUILE_JIT_THRESHOLD.

  * The register allocation phase in Lightning exists essentially for calls. However we have a very restricted set of calls that we need to do, and can do the allocation by hand on each architecture. (We don't use CPU call instructions for Scheme function calls because we use the VM stack. We might be able to revise this in the future but again Lightning is in the way.) Doing it by hand would allow a few benefits:

     * Hand allocation would free up more temporary registers. Right now Lightning reserves all registers used as part of the platform calling convention; they are unavailable to the JIT.

     * Sometimes when Lightning needs a temporary register, it can clobber one that we're using as part of an internal calling convention. I believe this is fixed for x86-64 but I can't be sure for other architectures! See commit 449ef7d9755b553cb0ad2629bca3bc42c5913e88.

     * We need to do our own register allocation; having Lightning also do it is a misfeature.

  * Sometimes we know that we can get better emitted code, but the Lightning abstraction doesn't let us do it. We should allow ourselves to punch through that abstraction.

The platform-specific Lightning files basically expose most of the API we need. We could consider incrementally punching through lightning.h to reach those files. Something to think about for the future.

Finally, as far as performance goes -- we're generally somewhere around 80% faster than 2.2. Sometimes more, sometimes less, always faster though AFAIK.
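The first point above contrasts Lightning's build-a-graph-then-emit pipeline with "a library that just emits code directly". A minimal sketch of the direct style: each call appends encoded bytes to a buffer immediately, with no intermediate representation. The `emitter` type and function names are hypothetical (this is not Lightning's API or any existing library's), and only two x86-64 instructions are encoded: `mov eax, imm32` (opcode `B8` followed by a little-endian 32-bit immediate) and `ret` (`C3`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical direct-emission assembler: no IR graph, each call
   writes machine-code bytes into the buffer right away. */
struct emitter {
  uint8_t *buf;
  size_t len, cap;
};

static void emit_u8(struct emitter *e, uint8_t byte) {
  assert(e->len < e->cap); /* a real emitter would grow the buffer */
  e->buf[e->len++] = byte;
}

static void emit_u32_le(struct emitter *e, uint32_t v) {
  for (int i = 0; i < 4; i++) /* least significant byte first */
    emit_u8(e, (uint8_t)(v >> (8 * i)));
}

/* x86-64 "mov eax, imm32": opcode B8, then the 32-bit immediate. */
static void emit_mov_eax_imm32(struct emitter *e, uint32_t imm) {
  emit_u8(e, 0xB8);
  emit_u32_le(e, imm);
}

/* x86-64 "ret": opcode C3. */
static void emit_ret(struct emitter *e) {
  emit_u8(e, 0xC3);
}
```

Calling `emit_mov_eax_imm32(&e, 42)` and then `emit_ret(&e)` produces the six bytes `B8 2A 00 00 00 C3`, a function returning 42. Actually running them would also require placing the buffer in executable memory, which is left out here. The cost of compilation is just the cost of writing bytes, which is the property that would let the JIT threshold go down.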
As an example, here's a simple fib.scm:

  $ cat /tmp/fib.scm
  (define (fib n)
    (if (< n 2)
        1
        (+ (fib (- n 1)) (fib (- n 2)))))

Now let's use eval-in-Scheme to print the 35th Fibonacci number. For Guile 2.2:

  $ time /opt/guile-2.2/bin/guile -c \
      '(begin (primitive-load "/tmp/fib.scm") (pk (fib 35)))'

  ;;; (14930352)

  real    0m9.610s
  user    0m10.547s
  sys     0m0.040s

But with Guile from the lightning branch, we get:

  $ time /opt/guile/bin/guile -c \
      '(begin (primitive-load "/tmp/fib.scm") (pk (fib 35)))'

  ;;; (14930352)

  real    0m5.299s
  user    0m6.167s
  sys     0m0.064s

Meaning that "eval" in Guile 3 is somewhere around 80% faster than in Guile 2.2 -- because "eval" is now JIT-compiled. (Otherwise it's the same program.) This improves bootstrap times, though Guile 3's compiler will generally make more CPS nodes than Guile 2.2 for the same expression, which takes more time and memory, so the gain isn't earth-shattering.

Incidentally, as a comparison, Guile 2.0 (whose