Re: [Chicken-users] performance of bignums

2015-07-06 Thread Peter Bex
On Sun, Jun 28, 2015 at 09:47:30PM +0200, Peter Bex wrote:
 So far it seems my implementation of Burnikel/Ziegler division is rather
 unstable, performance-wise.  If I disable burnikel/ziegler so it falls
 back to the traditional gradebook method, the benchmark finishes in
 a quarter of the time it takes to run with BZ enabled.

I have gotten a handle on this problem now: It looks a lot like the
problem is either in the original paper's description of the algorithm
or in my interpretation of it:

It does some shifting magic to make the two numbers get a length so that
the dividend is twice as large as the divisor, then it passes the length
of the smaller one, which is used in the recursion (read the paper if you
want the details).

In the recursion, the size is always taken from this value that's passed
around, which is divided by two or four, depending on the part of the
algorithm you're in.  However, if you have a number with sparse digits,
this is completely unrealistic.  Splitting the number up into parts may
cause very small numbers to be passed to the recursion.  If you follow
the paper strictly, you will perform the divisions on these small numbers
assuming they are exactly half the size of the original number.  So you'd
be performing a lot of work down into the recursion until you hit the
cutoff point (which is 300 in our case), instead of just jumping back into
the regular gradebook division.

At least, that's my (admittedly very) limited understanding of what may
be going wrong here.

I had a quick look at GMP, but of course that code is inscrutable;
it *looks* like it is also ignoring the size propagation, instead
relying on the numbers' _actual_ size.  I read up on the various
descriptions of the algorithm and found that the algorithm's description
in Modern Computer Arithmetic by Brent  Zimmermann is completely
different (and a little bit simpler): it also looks at the number's
*actual* size.  Furthermore, it uses the difference in sizes as the
determining factor when to stop.

I tried using Brent  Zimmermann's algorithm as-is, but unfortunately
that caused the performance of another benchmark to be rather worse, and
it would anyway require me to convert that pile of code to C again for
CHICKEN 5 (so it's inlineable).  Then I tried applying the difference
check instead of the size check and *bingo*, the algorithm performed as
expected.

While investigating, I figured out that others have had the same basic
problem, but nobody seems to have written down exactly what's going
wrong (neither have I, but at least this mail is an attempt at
documenting my findings):

The following blog post mentions
also having extremely bad performance until they switched to using
a simple split the numbers in half approach:
https://deamentiaemundi.wordpress.com/2014/11/16/multiple-precision-in-javascript-fast-division-i/

There's a patch for Tcl to use Burnikel-Ziegler division that never
was applied due to disappointing results:
http://sourceforge.net/p/tcl/patches/589/

Then, I also tried a Python implementation against the Python version
of the pidigits benchmark (disabling the GMP stuff):
http://bugs.python.org/issue3451 (file fast_div.py)
This had the same problem that the fast division was much,
much slower than the straightforward gradebook/school division.

Anyway, I've now applied a patch to numbers trunk (I am awaiting
feedback on another bug report that I may have fixed before releasing)
and to CHICKEN 5 master.  The performance is now a lot closer to what
one might reasonably expect from a Scheme that doesn't use GMP.

On my 64-bit laptop, the results of CHICKEN 5 on pidigits is:
7.764s CPU time, 1.796s GC time (major), 208425/208209 mutations 
(total/tracked), 3628/4917 GCs (major/minor)

Versus Guile 2 (with GMP):
clock utime stime cutime cstime gctime
 6.05  5.81  0.24   0.00   0.00   4.51

Not too bad, I'd say :)

Numbers is a little slower, as is expected:
9.428s CPU time, 5.196s GC time (major), 208425 mutations, 8391/797 GCs 
(major/minor)


Unfortunately, this is the best of all possible cases.  On my 32-bit VPS
the performance gap between Guile and CHICKEN is much bigger:

CHICKEN 5 on 32-bits:
37.734s CPU time, 16.917s GC time (major), 209346/209296 mutations 
(total/tracked), 17038/14731 GCs (major/minor)

Guile 2 on 32-bits:
clock utime stime cutime cstime gctime
12.96 12.74  0.20   0.00   0.00   0.00

Numbers is again somewhat slower than CHICKEN 5:
46.806s CPU time, 30.346s GC time (major), 209346 mutations, 22827/12045 GCs 
(major/minor)

I'd still be interested in seeing comparisons with other Schemes.
Remember, the improved version of numbers is not released yet, so
you'll have to get it from Subversion for the time being:
https://code.call-cc.org/svn/chicken-eggs/release/4/numbers/trunk
(username: anonymous, password: the empty string)

If anyone else wants to take a stab at whittling down the
performance difference even further, please be my guest!  For now
I need a break from 

Re: [Chicken-users] performance of bignums

2015-06-28 Thread Peter Bex
On Thu, Jun 25, 2015 at 10:09:19PM +0200, Peter Bex wrote:
 I already removed the use of the format egg (the code contains a
 commented-out version that relies only on display), and even
 completely disable the println call.  It's hard to be sure, but it
 *looks* like the majority of the time is being spent in the division
 procedures.

So far it seems my implementation of Burnikel/Ziegler division is rather
unstable, performance-wise.  If I disable burnikel/ziegler so it falls
back to the traditional gradebook method, the benchmark finishes in
a quarter of the time it takes to run with BZ enabled.

It's not just that the BZ implementation is really bad: it does improve
performance on Chudnovsky's digits of pi calculation benchmark by a
factor of four or so.  I will do further experimentation.  I also noticed
a bug in the gradebook method, which I will have to address as well.

Cheers,
Peter


signature.asc
Description: Digital signature
___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] performance of bignums

2015-06-28 Thread Kooda
On Sun, Jun 28, 2015 at 09:47:30PM +0200, Peter Bex wrote:
 So far it seems my implementation of Burnikel/Ziegler division is rather
 unstable, performance-wise.  If I disable burnikel/ziegler so it falls
 back to the traditional gradebook method, the benchmark finishes in
 a quarter of the time it takes to run with BZ enabled.
 
 It's not just that the BZ implementation is really bad: it does improve
 performance on Chudnovsky's digits of pi calculation benchmark by a
 factor of four or so.  I will do further experimentation.  I also noticed
 a bug in the gradebook method, which I will have to address as well.
 

Thanks a lot for putting so much effort in this!

If you need anything please do not hesitate to ask, I’m really excited
about that feature in CHICKEN 5 (and surely I’m not alone). :)

-- 
Envoyé depuis ma GameBoy.

___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] performance of bignums

2015-06-25 Thread Dan Leslie

SBCL is Public Domain/MIT/BSD, depending on the component. GMP is dual
licensed as LGPL3 and GPL2.

The licensing issues would impact those who distribute binaries built
with SBCL, but only if they are statically linked to GMP.

-Dan

Stephen Eilert spedr...@gmail.com writes:

 On Thu, Jun 25, 2015 at 4:50 PM, Peter Bex pe...@more-magic.net wrote:

 On Thu, Jun 25, 2015 at 11:39:50AM -0700, Martin DeMello wrote:
  Post to /r/scheme about chicken's bignum performance. (Not my post,
  just figured it could use some eyeballs.)
 
 
 http://www.reddit.com/r/scheme/comments/3b1ujw/performance_of_chicken_scheme_numbers_bignums/

 Hello Martin,

 Thanks for posting this.  We had already been discussing it earlier
 today in #chicken.  I had another look at the code but I can't really
 find any obvious inefficiencies.  It is indeed a bit faster with
 CHICKEN 5, but not by much.

 Of course, Guile is cheating by using GMP.  If I compare it to another
 Scheme which has its own bignum implementation like Gauche, we perform
 about the same.  It's interesting that sbcl is doing so well.  Maybe I'm
 overlooking something seemingly minor but important?

 Cheers,
 Peter



 Not sure about the status of this particular GSOC, but SBCL could also be
 cheating.

 http://www.sbcl.org/gsoc2013/ideas/#sec-1.2

 Now, I thought GMP were GPL'd and SBCL not, so I'm unsure about the legal
 implications, if it does bundle GMP now.


 — Stephen
 ___
 Chicken-users mailing list
 Chicken-users@nongnu.org
 https://lists.nongnu.org/mailman/listinfo/chicken-users

-- 
-Dan Leslie

___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] performance of bignums

2015-06-25 Thread Peter Bex
On Thu, Jun 25, 2015 at 11:39:50AM -0700, Martin DeMello wrote:
 Post to /r/scheme about chicken's bignum performance. (Not my post,
 just figured it could use some eyeballs.)
 
 http://www.reddit.com/r/scheme/comments/3b1ujw/performance_of_chicken_scheme_numbers_bignums/

Hello Martin,

Thanks for posting this.  We had already been discussing it earlier
today in #chicken.  I had another look at the code but I can't really
find any obvious inefficiencies.  It is indeed a bit faster with
CHICKEN 5, but not by much.

Of course, Guile is cheating by using GMP.  If I compare it to another
Scheme which has its own bignum implementation like Gauche, we perform
about the same.  It's interesting that sbcl is doing so well.  Maybe I'm
overlooking something seemingly minor but important?

Cheers,
Peter


signature.asc
Description: Digital signature
___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] performance of bignums

2015-06-25 Thread cowan
Peter Bex scripsit:

 Thanks for posting this.  We had already been discussing it earlier
 today in #chicken.  I had another look at the code but I can't really
 find any obvious inefficiencies.  It is indeed a bit faster with
 CHICKEN 5, but not by much.

Potential confounders are the I/O and the call to format.  When I
get a chance, I'll try running the core of the code on the various
implementations on my Scheme machine (which I only have access to
nights and weekends now, thanks to $EMPLOYER's paranoia).

-- 
John Cowan  http://www.ccil.org/~cowanco...@ccil.org
An observable characteristic is not necessarily a functional requirement.
--John Hudson



___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] performance of bignums

2015-06-25 Thread Peter Bex
On Thu, Jun 25, 2015 at 10:26:53PM +0200, Peter Bex wrote:
 On Thu, Jun 25, 2015 at 05:04:17PM -0300, Stephen Eilert wrote:
  Not sure about the status of this particular GSOC, but SBCL could also be
  cheating.
  
  http://www.sbcl.org/gsoc2013/ideas/#sec-1.2
  
  Now, I thought GMP were GPL'd and SBCL not, so I'm unsure about the legal
  implications, if it does bundle GMP now.
 
 Thanks for posting that; I hadn't looked at the sbcl implementation
 of the pi-digits benchmark, but indeed that page says Pi digits on
 the benchmarks game is currently a bunch of calls to GMP.  This would
 *really* be cheating in my opinion, as it's not testing the language
 implementation but using nonstandard extensions that replace stuff
 that's already part of the core.
 
 I had just assumed it would be using the sbcl native implementation
 (which indeed is pretty damn good and almost 100% pure Lisp code).

In fact, there are two implementations, one of which uses native
operations.  It is still faster than CHICKEN and very close to Guile/GMP:

http://benchmarksgame.alioth.debian.org/u32/program.php?test=pidigitslang=sbclid=2#log

It does use some type annotations, though.  Adding some to the
CHICKEN version makes it go a tiny bit faster, but not by much.

Cheers,
Peter


signature.asc
Description: Digital signature
___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] performance of bignums

2015-06-25 Thread Stephen Eilert
On Thu, Jun 25, 2015 at 4:50 PM, Peter Bex pe...@more-magic.net wrote:

 On Thu, Jun 25, 2015 at 11:39:50AM -0700, Martin DeMello wrote:
  Post to /r/scheme about chicken's bignum performance. (Not my post,
  just figured it could use some eyeballs.)
 
 
 http://www.reddit.com/r/scheme/comments/3b1ujw/performance_of_chicken_scheme_numbers_bignums/

 Hello Martin,

 Thanks for posting this.  We had already been discussing it earlier
 today in #chicken.  I had another look at the code but I can't really
 find any obvious inefficiencies.  It is indeed a bit faster with
 CHICKEN 5, but not by much.

 Of course, Guile is cheating by using GMP.  If I compare it to another
 Scheme which has its own bignum implementation like Gauche, we perform
 about the same.  It's interesting that sbcl is doing so well.  Maybe I'm
 overlooking something seemingly minor but important?

 Cheers,
 Peter



Not sure about the status of this particular GSOC, but SBCL could also be
cheating.

http://www.sbcl.org/gsoc2013/ideas/#sec-1.2

Now, I thought GMP were GPL'd and SBCL not, so I'm unsure about the legal
implications, if it does bundle GMP now.


— Stephen
___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] performance of bignums

2015-06-25 Thread Peter Bex
On Thu, Jun 25, 2015 at 03:59:59PM -0400, co...@ccil.org wrote:
 Peter Bex scripsit:
 
  Thanks for posting this.  We had already been discussing it earlier
  today in #chicken.  I had another look at the code but I can't really
  find any obvious inefficiencies.  It is indeed a bit faster with
  CHICKEN 5, but not by much.
 
 Potential confounders are the I/O and the call to format.  When I
 get a chance, I'll try running the core of the code on the various
 implementations on my Scheme machine (which I only have access to
 nights and weekends now, thanks to $EMPLOYER's paranoia).

I already removed the use of the format egg (the code contains a
commented-out version that relies only on display), and even
completely disable the println call.  It's hard to be sure, but it
*looks* like the majority of the time is being spent in the division
procedures.

Seeing how it fares on other Schemes would be interesting though!

Cheers,
Peter


signature.asc
Description: Digital signature
___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] performance of bignums

2015-06-25 Thread Peter Bex
On Thu, Jun 25, 2015 at 05:04:17PM -0300, Stephen Eilert wrote:
 Not sure about the status of this particular GSOC, but SBCL could also be
 cheating.
 
 http://www.sbcl.org/gsoc2013/ideas/#sec-1.2
 
 Now, I thought GMP were GPL'd and SBCL not, so I'm unsure about the legal
 implications, if it does bundle GMP now.

Thanks for posting that; I hadn't looked at the sbcl implementation
of the pi-digits benchmark, but indeed that page says Pi digits on
the benchmarks game is currently a bunch of calls to GMP.  This would
*really* be cheating in my opinion, as it's not testing the language
implementation but using nonstandard extensions that replace stuff
that's already part of the core.

I had just assumed it would be using the sbcl native implementation
(which indeed is pretty damn good and almost 100% pure Lisp code).

Cheers,
Peter


signature.asc
Description: Digital signature
___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] performance of bignums

2015-06-25 Thread cowan
Peter Bex scripsit:

 Of course, Guile is cheating by using GMP.  If I compare it to another
 Scheme which has its own bignum implementation like Gauche, we perform
 about the same.  It's interesting that sbcl is doing so well.  Maybe I'm
 overlooking something seemingly minor but important?

Here's a remark from the SBLC ideas page that may be helpful:

Integer division is notorious for being slow. However, it is also known
that the divisor is constant in the vast majority of cases, and serious
compilers exploit that fact to simplify divisions into sequences of simpler
multiplications, shifts, and additions. SBCL implements such a
simplification only for truncated division of unsigned machine words. Floor
and ceiling are less commonly supported natively in programming languages,
and there is a dearth of literature describing how to simplify them.
However, it is possible to do so, for both signed and unsigned machine
integers. It is also possible to specialise the routines for tagged
arithmetic. A complete execution of this project would include the
development of simplification routines for signed and unsigned truncate,
floor and ceiling divisions by integer constants. Some of the
simplifications, particularly those concerning tagged integers, will be
widely applicable and likely novel.


___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] performance of bignums

2015-06-25 Thread Stephen Eilert
On Thu, Jun 25, 2015 at 5:53 PM, Peter Bex pe...@more-magic.net wrote:

 On Thu, Jun 25, 2015 at 10:26:53PM +0200, Peter Bex wrote:
  On Thu, Jun 25, 2015 at 05:04:17PM -0300, Stephen Eilert wrote:
   Not sure about the status of this particular GSOC, but SBCL could also
 be
   cheating.
  
   http://www.sbcl.org/gsoc2013/ideas/#sec-1.2
  
   Now, I thought GMP were GPL'd and SBCL not, so I'm unsure about the
 legal
   implications, if it does bundle GMP now.
 
  Thanks for posting that; I hadn't looked at the sbcl implementation
  of the pi-digits benchmark, but indeed that page says Pi digits on
  the benchmarks game is currently a bunch of calls to GMP.  This would
  *really* be cheating in my opinion, as it's not testing the language
  implementation but using nonstandard extensions that replace stuff
  that's already part of the core.
 
  I had just assumed it would be using the sbcl native implementation
  (which indeed is pretty damn good and almost 100% pure Lisp code).

 In fact, there are two implementations, one of which uses native
 operations.  It is still faster than CHICKEN and very close to Guile/GMP:


 http://benchmarksgame.alioth.debian.org/u32/program.php?test=pidigitslang=sbclid=2#log

 It does use some type annotations, though.  Adding some to the
 CHICKEN version makes it go a tiny bit faster, but not by much.

 Cheers,
 Peter


That compilation output is amazing!

; note: forced to do GENERIC-* (cost 30)
;   unable to do inline fixnum arithmetic (cost 4) because:
;   The first argument is a INTEGER, not a FIXNUM.
;   The result is a (VALUES INTEGER OPTIONAL), not a (VALUES
FIXNUM REST T).
;   unable to do inline (signed-byte 32) arithmetic (cost 5) because:
;   The first argument is a INTEGER, not a (SIGNED-BYTE 32).
;   The result is a (VALUES INTEGER OPTIONAL), not a (VALUES
;
(SIGNED-BYTE 32) REST
;  T).


From the benchmark page (
http://benchmarksgame.alioth.debian.org/u32/performance.php?test=pidigits#about
):

In addition to language specific multiprecision arithmetic, we will accept
programs that use GMP http://gmplib.org/.

So, the cheating is official. Let's cheat too!
*I fail to see the point though. If it's a programming language shootout,
does it make sense to offload the meat of the processing to GMP?* *Do they
intend to measure the FFI performance?*


— Stephen
___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


[Chicken-users] performance of bignums

2015-06-25 Thread Martin DeMello
Post to /r/scheme about chicken's bignum performance. (Not my post,
just figured it could use some eyeballs.)

http://www.reddit.com/r/scheme/comments/3b1ujw/performance_of_chicken_scheme_numbers_bignums/

martin

___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users