Re: SHA256 performance with Guile 2.2 vs. Guile 3.0

2020-01-07 Thread Ludovic Courtès
Andy Wingo  skribis:

> On Tue 07 Jan 2020 12:08, Ludovic Courtès  writes:
>
>> Andy Wingo  skribis:
>>
>>> Concretely I would add a little part of the compiler to the Tree-IL
>>> phase to serialize a bytecode for the "small" definitions in the module,
>>> for declarative modules, both public and private (because public
>>> definitions may alias private definitions).  This would be stored as a
>>> bytevector in an additional field of the module, and the program being
>>> compiled would be transformed to initialize the "lto" field (placeholder
>>> name) of the module, so that once the compiled module is loaded, we have
>>> the inlinable bindings.  I think this can be done compatibly.
>>
>> OK, sounds great.  What are your thoughts about versioning that wire
>> Tree-IL representation?
>
> It would be a little bytecode language, with its own versioning
> considerations.  It would need to have a translation to and from
> Tree-IL, though not necessarily lossless.  It would change only in
> ABI-compatible ways, using the bytecode version of the Guile doing the
> compilation as a proxy for what is OK to support.

I see, that makes a lot of sense to me.  Thanks for explaining!

Ludo’.



Re: SHA256 performance with Guile 2.2 vs. Guile 3.0

2020-01-07 Thread Andy Wingo
On Tue 07 Jan 2020 12:08, Ludovic Courtès  writes:

> Andy Wingo  skribis:
>
>> Concretely I would add a little part of the compiler to the Tree-IL
>> phase to serialize a bytecode for the "small" definitions in the module,
>> for declarative modules, both public and private (because public
>> definitions may alias private definitions).  This would be stored as a
>> bytevector in an additional field of the module, and the program being
>> compiled would be transformed to initialize the "lto" field (placeholder
>> name) of the module, so that once the compiled module is loaded, we have
>> the inlinable bindings.  I think this can be done compatibly.
>
> OK, sounds great.  What are your thoughts about versioning that wire
> Tree-IL representation?

It would be a little bytecode language, with its own versioning
considerations.  It would need to have a translation to and from
Tree-IL, though not necessarily lossless.  It would change only in
ABI-compatible ways, using the bytecode version of the Guile doing the
compilation as a proxy for what is OK to support.

A



Re: SHA256 performance with Guile 2.2 vs. Guile 3.0

2020-01-07 Thread Ludovic Courtès
Hi!

Andy Wingo  skribis:

> On Mon 06 Jan 2020 10:47, Ludovic Courtès  writes:
>
>> Andy Wingo  skribis:
>>
>>> With cross-module inlining of "small" definitions, I think we would
>>> solve a lot of this kind of problem.  I think we could add this during
>>> 3.0 and for this reason I would hesitate to apply this patch for 3.0
>>> because it changes "fx+" exports to be macros rather than "normal"
>>> values in the ABI.  WDYT?
>>
>> I agree that cross-module inlining is the better fix whereas this patch
>> is the immediate workaround.
>>
>> Are you confident that cross-module inlining can happen be added without
>> introducing incompatibilities over in the 3.0 series?  (At first sight
>> it seems tricky to me, notably because we’d have to store Tree-IL in
>> object files, which introduces compatibility and thus external
>> representation versioning considerations.)
>
> Concretely I would add a little part of the compiler to the Tree-IL
> phase to serialize a bytecode for the "small" definitions in the module,
> for declarative modules, both public and private (because public
> definitions may alias private definitions).  This would be stored as a
> bytevector in an additional field of the module, and the program being
> compiled would be transformed to initialize the "lto" field (placeholder
> name) of the module, so that once the compiled module is loaded, we have
> the inlinable bindings.  I think this can be done compatibly.

OK, sounds great.  What are your thoughts about versioning that wire
Tree-IL representation?

>> If you do, then it’s fine to drop this patch.  If conversely
>> cross-module inlining might take longer, then we can have this patch in
>> and drop it in 3.2.  Your call!  (I guess I’m not being that helpful
>> here.  :-))
>
> :)
>
> I hesitate to land this patch because we haven't shown that it
> significantly helps things, it would need to be undone, and it makes the
> ABI more fragile.  So if that's OK let's do nothing :)

Alright, fine with me!

Thanks,
Ludo’.



Re: SHA256 performance with Guile 2.2 vs. Guile 3.0

2020-01-06 Thread Andy Wingo
On Mon 06 Jan 2020 10:47, Ludovic Courtès  writes:

> Andy Wingo  skribis:
>
>> With cross-module inlining of "small" definitions, I think we would
>> solve a lot of this kind of problem.  I think we could add this during
>> 3.0 and for this reason I would hesitate to apply this patch for 3.0
>> because it changes "fx+" exports to be macros rather than "normal"
>> values in the ABI.  WDYT?
>
> I agree that cross-module inlining is the better fix whereas this patch
> is the immediate workaround.
>
> Are you confident that cross-module inlining can happen be added without
> introducing incompatibilities over in the 3.0 series?  (At first sight
> it seems tricky to me, notably because we’d have to store Tree-IL in
> object files, which introduces compatibility and thus external
> representation versioning considerations.)

Concretely I would add a little part of the compiler to the Tree-IL
phase to serialize a bytecode for the "small" definitions in the module,
for declarative modules, both public and private (because public
definitions may alias private definitions).  This would be stored as a
bytevector in an additional field of the module, and the program being
compiled would be transformed to initialize the "lto" field (placeholder
name) of the module, so that once the compiled module is loaded, we have
the inlinable bindings.  I think this can be done compatibly.

> If you do, then it’s fine to drop this patch.  If conversely
> cross-module inlining might take longer, then we can have this patch in
> and drop it in 3.2.  Your call!  (I guess I’m not being that helpful
> here.  :-))

:)

I hesitate to land this patch because we haven't shown that it
significantly helps things, it would need to be undone, and it makes the
ABI more fragile.  So if that's OK let's do nothing :)

Cheers,

Andy



Re: SHA256 performance with Guile 2.2 vs. Guile 3.0

2020-01-06 Thread Ludovic Courtès
Hello,

Andy Wingo  skribis:

> On Sat 04 Jan 2020 01:40, Ludovic Courtès  writes:
>
>> Ludovic Courtès  skribis:
>>
>>> ludo@ribbon ~/src/guix$ ./pre-inst-env guix environment --pure --ad-hoc 
>>> guile-next guile3.0-hashing -- guile ~/tmp/sha256.scm
>>>
>>> ;;; (hash 
>>> "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988")
>>> clock utime stime cutime cstime gctime
>>> 65.17 89.75  0.45   0.00   0.00  35.63
>>
>>   (define fx32xor fxxor)
>>   …
>
> From a speed perspective I think there is one major issue and one minor
> issue.
>
> The major issue is that we don't do cross-module inlining.  But now that
> we have declarative modules, this is a possibility:
>
>   https://lists.gnu.org/archive/html/guile-devel/2016-03/msg00026.html
>   https://lists.gnu.org/archive/html/guile-devel/2016-03/msg00027.html

Neat.  Storing Tree-IL in object files reminds me of LTO in GCC & co.

> With cross-module inlining of "small" definitions, I think we would
> solve a lot of this kind of problem.  I think we could add this during
> 3.0 and for this reason I would hesitate to apply this patch for 3.0
> because it changes "fx+" exports to be macros rather than "normal"
> values in the ABI.  WDYT?

I agree that cross-module inlining is the better fix whereas this patch
is the immediate workaround.

Are you confident that cross-module inlining can happen be added without
introducing incompatibilities over in the 3.0 series?  (At first sight
it seems tricky to me, notably because we’d have to store Tree-IL in
object files, which introduces compatibility and thus external
representation versioning considerations.)

If you do, then it’s fine to drop this patch.  If conversely
cross-module inlining might take longer, then we can have this patch in
and drop it in 3.2.  Your call!  (I guess I’m not being that helpful
here.  :-))

> The minor issue, at least relatively speaking, is that IMO the (rnrs
> arithmetic fixnums) API is not appropriate for bitwise operations.  When
> you do bitwise operations and you want to ensure that you're within some
> fixed domain, it's best to do e.g. "(logand x #x)" on operands
> and results.  Guile will optimize this well.  The good optimization
> isn't fixnum vs other kinds of numbers, it's unboxing to raw unsigned
> integers; and you usually want to exclude negative numbers.  fx+ doesn't
> help with that.

I agree, of course.  The tragedy is that people would use this (clumsy)
API in the hope of getting better performance, and the result ends up
being worse performance, at least on today’s Guile (perhaps also on
other implementations?).

Thanks,
Ludo’.



Re: SHA256 performance with Guile 2.2 vs. Guile 3.0

2020-01-06 Thread Ludovic Courtès
Hi Göran,

Göran Weinholt  skribis:

> I've pushed a Guile-specific version of (hashing fixnums) that inlines
> the generic arithmetic procedures. This and some other small changes
> improved the runtime:
>
>clock utime stime cutime cstime gctime
>  before:
> 2.2.6  31.06 32.61  0.03   0.00   0.00   1.38
> 2.9.8  15.55 16.23  0.01   0.00   0.00   1.19
>  after:
> 2.2.6   2.95  3.01  0.00   0.00   0.00   0.10
> 2.9.8   1.98  1.99  0.00   0.00   0.00   0.08

Excellent!

> That's about 100 times slower than sha256sum from coreutils. You might
> get some more performance out of it by unrolling the loop in
> sha-256-transform!.

I suppose that’s a challenge for Andy.  ;-)

Thanks,
Ludo’.



Re: SHA256 performance with Guile 2.2 vs. Guile 3.0

2020-01-05 Thread Göran Weinholt
Arne Babenhauserheide  writes:

> Göran Weinholt  writes:
>> I've pushed a Guile-specific version of (hashing fixnums) that inlines
>> the generic arithmetic procedures. This and some other small changes
>> improved the runtime:
>>
>>clock utime stime cutime cstime gctime
>>  before:
>> 2.2.6  31.06 32.61  0.03   0.00   0.00   1.38
>> 2.9.8  15.55 16.23  0.01   0.00   0.00   1.19
>>  after:
>> 2.2.6   2.95  3.01  0.00   0.00   0.00   0.10
>> 2.9.8   1.98  1.99  0.00   0.00   0.00   0.08
>>
>> That's about 100 times slower than sha256sum from coreutils. You might
>> get some more performance out of it by unrolling the loop in
>> sha-256-transform!.
>
> I’d like to highlight the understatement: The new version is about
> factor 10 faster on 2.2.6 and factor 8 faster on 2.9.8 (and the new
> version on Guile 2.9.8 is factor 15 faster than the old version with the
> old Guile).
>
> That’s an awesome speedup!
>
> (can you show the change/patch that did this? Sidenote: It would be
> great to have a list of performance hints for Guile)

The commit is here:

https://github.com/weinholt/hashing/commit/eb28080180b5fdfb6ffc74f8cdf2c1a7823ef0cb

The relevant changes are:

* The define-fx macro previously returned:

  #'(define name
  (if (> (fixnum-width) bit-width)
  fxname bitwise-name))

  This was bad for three reasons: Guile does not do constant folding on
  (fixnum-width); in Guile the fixnum procedures are slower than the
  generic procedures; and Guile 2.2 did not inline those defines. It now
  returns:

  #'(define-syntax name
  (identifier-syntax bitwise-name))

  So the generic procedures like +, bitwise-and, etc are used instead of
  fx+ and fxand. The use of define-syntax and identifier-syntax is
  basically manual inlining for the benefit of Guile 2.2.

* I replaced some instances of fixnum operations with the ones from
  (hashing fixnums), which benefit from the above.

* I changed some comparisons with constant fixnums to use eqv? instead
  of fx=?, e.g. (eqv? i 8) instead of (fx=? i 8). This is commonly
  faster, not only in Guile, because in implementations with immediate
  fixnums it can be done with only a simple compare + branch. I don't
  know if Guile takes advantage of this fact, but it's still faster than
  fx=?.

Regards,

-- 
Göran Weinholt   | https://weinholt.se/
Debian developer | 73 de SA6CJK



Re: SHA256 performance with Guile 2.2 vs. Guile 3.0

2020-01-05 Thread Andy Wingo
On Sat 04 Jan 2020 01:40, Ludovic Courtès  writes:

> Ludovic Courtès  skribis:
>
>> ludo@ribbon ~/src/guix$ ./pre-inst-env guix environment --pure --ad-hoc 
>> guile-next guile3.0-hashing -- guile ~/tmp/sha256.scm
>>
>> ;;; (hash "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988")
>> clock utime stime cutime cstime gctime
>> 65.17 89.75  0.45   0.00   0.00  35.63
>
>   (define fx32xor fxxor)
>   …

>From a speed perspective I think there is one major issue and one minor
issue.

The major issue is that we don't do cross-module inlining.  But now that
we have declarative modules, this is a possibility:

  https://lists.gnu.org/archive/html/guile-devel/2016-03/msg00026.html
  https://lists.gnu.org/archive/html/guile-devel/2016-03/msg00027.html

With cross-module inlining of "small" definitions, I think we would
solve a lot of this kind of problem.  I think we could add this during
3.0 and for this reason I would hesitate to apply this patch for 3.0
because it changes "fx+" exports to be macros rather than "normal"
values in the ABI.  WDYT?

The minor issue, at least relatively speaking, is that IMO the (rnrs
arithmetic fixnums) API is not appropriate for bitwise operations.  When
you do bitwise operations and you want to ensure that you're within some
fixed domain, it's best to do e.g. "(logand x #x)" on operands
and results.  Guile will optimize this well.  The good optimization
isn't fixnum vs other kinds of numbers, it's unboxing to raw unsigned
integers; and you usually want to exclude negative numbers.  fx+ doesn't
help with that.

Cheers,

Andy



Re: SHA256 performance with Guile 2.2 vs. Guile 3.0

2020-01-04 Thread Arne Babenhauserheide
Göran Weinholt  writes:
> I've pushed a Guile-specific version of (hashing fixnums) that inlines
> the generic arithmetic procedures. This and some other small changes
> improved the runtime:
>
>clock utime stime cutime cstime gctime
>  before:
> 2.2.6  31.06 32.61  0.03   0.00   0.00   1.38
> 2.9.8  15.55 16.23  0.01   0.00   0.00   1.19
>  after:
> 2.2.6   2.95  3.01  0.00   0.00   0.00   0.10
> 2.9.8   1.98  1.99  0.00   0.00   0.00   0.08
>
> That's about 100 times slower than sha256sum from coreutils. You might
> get some more performance out of it by unrolling the loop in
> sha-256-transform!.

I’d like to highlight the understatement: The new version is about
factor 10 faster on 2.2.6 and factor 8 faster on 2.9.8 (and the new
version on Guile 2.9.8 is factor 15 faster than the old version with the
old Guile).

That’s an awesome speedup!

(can you show the change/patch that did this? Sidenote: It would be
great to have a list of performance hints for Guile)

Best wishes,
Arne
--
Unpolitisch sein
heißt politisch sein
ohne es zu merken



Re: SHA256 performance with Guile 2.2 vs. Guile 3.0

2020-01-04 Thread Nala Ginrut
Congrats!
I just replaced Weinholt's hmac implementation with NSS binding for product
consideration, but nice to know this great result! And thanks to Weinholt's
work, Artanis had been relying on it for many years.

Best regards.


On Sat, Jan 4, 2020, 18:36 Göran Weinholt  wrote:

> Ludovic Courtès  writes:
>
> > Ludovic Courtès  skribis:
> >
> >> ludo@ribbon ~/src/guix$ ./pre-inst-env guix environment --pure
> --ad-hoc guile-next guile3.0-hashing -- guile ~/tmp/sha256.scm
> >>
> >> ;;; (hash
> "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988")
> >> clock utime stime cutime cstime gctime
> >> 65.17 89.75  0.45   0.00   0.00  35.63
> >
> > The patch below gives us:
> >
> > ludo@ribbon /tmp/hashing$ guile --r6rs -L .. ~/tmp/sha256.scm
> >
> > ;;; (hash
> "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988")
> > clock utime stime cutime cstime gctime
> > 59.31 80.65  0.39   0.00   0.00  30.73
> >
> > It’s a disappointingly small improvement.  The reason is that (hashing
> > fixnums) adds another layer of opacity, where it ends up doing
> > essentially:
> >
> >   (define fx32xor fxxor)
> >   …
> >
> > Thus, no inlining, and no easy trick to solve that.  :-/
>
> I've pushed a Guile-specific version of (hashing fixnums) that inlines
> the generic arithmetic procedures. This and some other small changes
> improved the runtime:
>
>clock utime stime cutime cstime gctime
>  before:
> 2.2.6  31.06 32.61  0.03   0.00   0.00   1.38
> 2.9.8  15.55 16.23  0.01   0.00   0.00   1.19
>  after:
> 2.2.6   2.95  3.01  0.00   0.00   0.00   0.10
> 2.9.8   1.98  1.99  0.00   0.00   0.00   0.08
>
> That's about 100 times slower than sha256sum from coreutils. You might
> get some more performance out of it by unrolling the loop in
> sha-256-transform!.
>
> Regards,
>
> --
> Göran Weinholt   | https://weinholt.se/
> Debian developer | 73 de SA6CJK
>
>


Re: SHA256 performance with Guile 2.2 vs. Guile 3.0

2020-01-04 Thread Göran Weinholt
Ludovic Courtès  writes:

> Ludovic Courtès  skribis:
>
>> ludo@ribbon ~/src/guix$ ./pre-inst-env guix environment --pure --ad-hoc 
>> guile-next guile3.0-hashing -- guile ~/tmp/sha256.scm
>>
>> ;;; (hash "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988")
>> clock utime stime cutime cstime gctime
>> 65.17 89.75  0.45   0.00   0.00  35.63
>
> The patch below gives us:
>
> ludo@ribbon /tmp/hashing$ guile --r6rs -L .. ~/tmp/sha256.scm
>
> ;;; (hash "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988")
> clock utime stime cutime cstime gctime
> 59.31 80.65  0.39   0.00   0.00  30.73
>
> It’s a disappointingly small improvement.  The reason is that (hashing
> fixnums) adds another layer of opacity, where it ends up doing
> essentially:
>
>   (define fx32xor fxxor)
>   …
>
> Thus, no inlining, and no easy trick to solve that.  :-/

I've pushed a Guile-specific version of (hashing fixnums) that inlines
the generic arithmetic procedures. This and some other small changes
improved the runtime:

   clock utime stime cutime cstime gctime
 before:
2.2.6  31.06 32.61  0.03   0.00   0.00   1.38
2.9.8  15.55 16.23  0.01   0.00   0.00   1.19
 after:
2.2.6   2.95  3.01  0.00   0.00   0.00   0.10
2.9.8   1.98  1.99  0.00   0.00   0.00   0.08

That's about 100 times slower than sha256sum from coreutils. You might
get some more performance out of it by unrolling the loop in
sha-256-transform!.

Regards,

-- 
Göran Weinholt   | https://weinholt.se/
Debian developer | 73 de SA6CJK



Re: SHA256 performance with Guile 2.2 vs. Guile 3.0

2020-01-03 Thread Ludovic Courtès
Ludovic Courtès  skribis:

> ludo@ribbon ~/src/guix$ ./pre-inst-env guix environment --pure --ad-hoc 
> guile-next guile3.0-hashing -- guile ~/tmp/sha256.scm
>
> ;;; (hash "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988")
> clock utime stime cutime cstime gctime
> 65.17 89.75  0.45   0.00   0.00  35.63

The patch below gives us:

--8<---cut here---start->8---
ludo@ribbon /tmp/hashing$ guile --r6rs -L .. ~/tmp/sha256.scm

;;; (hash "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988")
clock utime stime cutime cstime gctime
59.31 80.65  0.39   0.00   0.00  30.73
--8<---cut here---end--->8---

It’s a disappointingly small improvement.  The reason is that (hashing
fixnums) adds another layer of opacity, where it ends up doing
essentially:

  (define fx32xor fxxor)
  …

Thus, no inlining, and no easy trick to solve that.  :-/

Anyhow, I think the patch is probably a good idea.  WDYT?

Thanks,
Ludo’.

diff --git a/module/rnrs/arithmetic/fixnums.scm b/module/rnrs/arithmetic/fixnums.scm
index 4ec1cae0c..c30807eb5 100644
--- a/module/rnrs/arithmetic/fixnums.scm
+++ b/module/rnrs/arithmetic/fixnums.scm
@@ -1,6 +1,6 @@
 ;;; fixnums.scm --- The R6RS fixnums arithmetic library
 
-;;  Copyright (C) 2010, 2011, 2013 Free Software Foundation, Inc.
+;;  Copyright (C) 2010, 2011, 2013, 2020 Free Software Foundation, Inc.
 ;;
 ;; This library is free software; you can redistribute it and/or
 ;; modify it under the terms of the GNU Lesser General Public
@@ -75,25 +75,26 @@
 	  fxrotate-bit-field
 	  fxreverse-bit-field)
   (import (only (guile) ash
-		cons*
-			define-inlinable
-			inexact->exact
-			logand
-			logbit?
-			logcount
-			logior
-			lognot
-			logxor
-			most-positive-fixnum 
-			most-negative-fixnum
-			object-address)
+		cons*
+		define-inlinable
+		inexact->exact
+		logand
+		logbit?
+		logcount
+		logior
+		lognot
+		logxor
+		most-positive-fixnum 
+		most-negative-fixnum
+		object-address)
 	  (ice-9 optargs)
 	  (rnrs base (6))
 	  (rnrs control (6))
 	  (rnrs arithmetic bitwise (6))
 	  (rnrs conditions (6))
 	  (rnrs exceptions (6))
-	  (rnrs lists (6)))
+  (rnrs lists (6))
+  (rnrs syntax-case (6)))
 
   (define fixnum-width
 (let ((w (do ((i 0 (+ 1 i))
@@ -121,70 +122,105 @@
 (or (for-all inline-fixnum? args) (raise (make-assertion-violation
 
   (define-syntax define-fxop*
+(lambda (s)
+  (syntax-case s ()
+((_ name op)
+ (with-syntax ((proc (datum->syntax
+  #'name
+  (string->symbol
+   (string-append "%"
+  (symbol->string
+   (syntax->datum #'name))
+  "-proc")
+   #'(begin
+   ;; Define a procedure for when the inline case doesn't
+   ;; apply.
+   (define proc
+ (case-lambda
+   ((x y)
+(assert-fixnum x y)
+(op x y))
+   (args
+(assert-fixnums args)
+(apply op args
+
+   (define-syntax name
+ (lambda (s)
+   (syntax-case s ()
+ ((_ args (... ...))
+  #'(begin
+  (assert-fixnum args (... ...))
+  (op args (... ...
+ (x
+  (identifier? #'x)
+  #'proc))
+
+  (define-syntax define-alias
 (syntax-rules ()
-  ((_ name op)
-   (define name
-	(case-lambda
-	  ((x y)
-	   (assert-fixnum x y)
-	   (op x y))
-	  (args
-	   (assert-fixnums args)
-   (apply op args)))
+  ((_ new old)
+   (define-syntax new (identifier-syntax old)
 
   ;; All these predicates don't check their arguments for fixnum-ness,
   ;; as this doesn't seem to be strictly required by R6RS.
 
-  (define fx=? =)
-  (define fx>? >)
-  (define fx=? >=)
-  (define fx<=? <=)
+  (define-alias fx=? =)
+  (define-alias fx>? >)
+  (define-alias fx=? >=)
+  (define-alias fx<=? <=)
 
-  (define fxzero? zero?)
-  (define fxpositive? positive?)
-  (define fxnegative? negative?)
-  (define fxodd? odd?)
-  (define fxeven? even?)
+  (define-alias fxzero? zero?)
+  (define-alias fxpositive? positive?)
+  (define-alias fxnegative? negative?)
+  (define-alias fxodd? odd?)
+  (define-alias fxeven? even?)
 
   (define-fxop* fxmax max)
   (define-fxop* fxmin min)
 
-  (define (fx+ fx1 fx2)
+  (define-inlinable (fx+ fx1 fx2)
 (assert-fixnum fx1 fx2) 
 (let ((r (+ fx1 fx2))) 
   (or (inline-fixnum? r)
   (raise (make-implementation-restriction-violation)))
   r))
 
-  (define (fx* fx1 fx2)
+  (define-inlinable (fx* fx1 fx2)
 (assert-fixnum 

SHA256 performance with Guile 2.2 vs. Guile 3.0

2020-01-03 Thread Ludovic Courtès
Hello Guilers!

Göran Weinholt wrote a pure-Scheme (R6RS) implementation of common
cryptographic hash functions:

  https://github.com/weinholt/hashing

I thought this would make a useful benchmark (and probably favorable to
JIT because it’s one hot loop), so here goes (computing the SHA256 hash
of ‘guile-2.2.6.tar.xz’ with the code below, compiled with -O2, Guile
2.2.6 vs. 2.9.8):

--8<---cut here---start->8---
ludo@ribbon ~/src/guix$ ./pre-inst-env guix environment --pure --ad-hoc guile 
guile-hashing -- guile ~/tmp/sha256.scm

;;; (hash "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988")
clock utime stime cutime cstime gctime
135.07 164.13  0.52   0.00   0.00  42.14
ludo@ribbon ~/src/guix$ ./pre-inst-env guix environment --pure --ad-hoc 
guile-next guile3.0-hashing -- guile ~/tmp/sha256.scm

;;; (hash "b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988")
clock utime stime cutime cstime gctime
65.17 89.75  0.45   0.00   0.00  35.63
--8<---cut here---end--->8---

Guile 3 is twice as fast!  No difference if we force ‘-O3’.

Still far from the libgcrypt implementation in C + asm, but hey!

--8<---cut here---start->8---
ludo@ribbon ~/src/guix$ time guix hash -f hex 
"/gnu/store/zfp7d4wr5hbl5lrnzs8c15bsc3ygq74g-guile-2.2.6.tar.xz"
b33576331465a60b003573541bf3b1c205936a16c407bc69f8419a527bf5c988

real0m0.097s
user0m0.093s
sys 0m0.024s
--8<---cut here---end--->8---

To be continued…

Ludo’.

(use-modules (hashing sha-2)
 (rnrs bytevectors)
 (ice-9 binary-ports)
 (ice-9 match)
 (ice-9 time))

(define (port-sha256 port)
  ;; Return the SHA256 of the data read from PORT.
  (define bv (make-bytevector 65536))
  (define hash (make-sha-256))

  (let loop ()
(match (get-bytevector-n! port bv 0
  (bytevector-length bv))
  ((? eof-object?)
   (sha-256-finish! hash)
   hash)
  (n
   (sha-256-update! hash bv 0 n)
   (loop)

(define (file-sha256 file)
  ;; Return the SHA256 of FILE.
  (call-with-input-file file port-sha256))

(time (pk 'hash (sha-256->string (file-sha256 
"/gnu/store/zfp7d4wr5hbl5lrnzs8c15bsc3ygq74g-guile-2.2.6.tar.xz"