On 18/09/2010 16:31, KvS wrote:
Hi again,

I hope you don't mind me bumping this thread one more time. I started
experimenting with a few approaches to fast arbitrary-precision
computations using Cython. Earlier it was suggested that using MPFR
directly, i.e. without the RealNumber wrapper, would be the fastest
way. Here is a bit of code that does the same thing (computing
x**2+i*x for input x and i in range(10**7)) using RealNumber, using
doubles, and using MPFR directly:



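(The attached benchmark code didn't survive in this archive. As a rough plain-Python sketch of the kind of comparison described, with the stdlib `decimal` module standing in for MPFR's software arithmetic and `N` reduced from the post's 10**7 to keep the run short, it might look like:)

```python
import time
from decimal import Decimal, getcontext

getcontext().prec = 16   # roughly the 53-bit precision of a hardware double
N = 10**5                # the post uses 10**7; reduced here for a quick run

def loop(x):
    # compute x**2 + i*x for each i, as in the post
    s = x * 0
    for i in range(N):
        s = x * x + i * x
    return s

t0 = time.perf_counter()
r_float = loop(1.5)                 # hardware double-precision floats
t_float = time.perf_counter() - t0

t0 = time.perf_counter()
r_dec = loop(Decimal("1.5"))        # software multiprecision (MPFR stand-in)
t_dec = time.perf_counter() - t0

print(f"float:   {t_float:.4f}s")
print(f"Decimal: {t_dec:.4f}s  (~{t_dec / t_float:.0f}x slower)")
```

The exact slowdown varies by machine, but the software type is reliably an order of magnitude or more behind the hardware floats, which is the gap the rest of the thread explains.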
What surprises/disappoints me a bit is that both RealNumber and direct
MPFR are a factor of 100 slower than using doubles, even though I'm
using RealField(53). Is this something I just have to live with, i.e.
are computations with doubles somehow just more optimized, or did I
do something wrong / is there something I can do to improve the speed
of the RealNumber and direct-MPFR variants?


Hello,

There is no way to work around this:

-> RDF (double-precision numbers) uses the processor's arithmetic.
-> RealField(x) uses the MPFR library.

In the first case, Sage (Python, Cython) passes the operations to the processor, which performs each operation in one (or maybe two) clock cycles. In the second case, MPFR calls a (maybe small) routine to perform the task. Even if MPFR is perfectly optimized, this is much more costly. Use MPFR when you need high precision, whatever it costs, or a special rounding mode.
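(To illustrate the "when you need high precision" point with a stdlib stand-in, Python's `decimal` playing the role of MPFR here: a quantity that vanishes entirely in hardware doubles survives at higher working precision.)

```python
from decimal import Decimal, getcontext

# In a 53-bit hardware double, the 1e-20 term is rounded away completely.
lost = (1.0 + 1e-20) - 1.0
print(lost)            # 0.0

# With 40 significant digits of software precision, it is preserved.
getcontext().prec = 40
kept = (Decimal(1) + Decimal("1e-20")) - Decimal(1)
print(kept)            # 1E-20
```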

Even worse (or better): the processor implements pipelining, parallelism and so on.

Something I find very spectacular:

def prod(a,b):
    return a*b

c=random_matrix(RDF,1000)
d=random_matrix(RDF,1000)

prod(c,d)

Try it; count the operations (10^9 multiplications). On my computer (3 GHz), prod(c,d) takes 0.18 seconds: that is to say 5 gigaflops! (or 10 if you count the additions as well): more than one operation per clock cycle. This is because we use here: 1) the processor's FPU, 2) the ATLAS BLAS, which lets the processor run at full speed (and minimizes cache misses).
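(The operation count behind those figures checks out, with n = 1000 and the timing as reported in the post:)

```python
n = 1000
seconds = 0.18                 # RDF timing reported above
mults = n**3                   # one multiplication per inner-product term
adds = n**2 * (n - 1)          # additions inside the inner products

print(f"{mults / seconds / 1e9:.1f} Gflop/s (multiplications only)")
print(f"{(mults + adds) / seconds / 1e9:.1f} Gflop/s (with additions)")
```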

Now, change RDF to RR. It takes 421 seconds to perform prod(c,d) on my computer: that is to say *2000* times more than with RDF numbers, because 1) we do not use the processor's FPU, and 2) there is no optimization of the matrix product (and a lot of cache misses too!).
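(And the "*2000* times" figure follows from the two timings quoted above:)

```python
rdf_time = 0.18    # seconds for prod(c,d) with RDF
rr_time = 421.0    # seconds for prod(c,d) with RR

print(f"slowdown: ~{rr_time / rdf_time:.0f}x")
```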

For "number crunching" use the processor's fpu (RDF) directly!

Yours
t.d.



Many thanks, Kees

