On 18/09/2010 16:31, KvS wrote:
Hi again,

I hope you don't mind me bumping this thread one more time. I started
experimenting with a few approaches to fast arbitrary-precision
computations using Cython. Earlier it was suggested that using MPFR
directly, i.e. without the RealNumber wrapper, would be the fastest
way. Here is a bit of code that does the same thing (computing
x**2+i*x for input x and i in range(10**7)) using RealNumber, using
doubles, and using MPFR directly:



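(The attached benchmark code didn't survive in this archive. As a rough plain-Python sketch of the kind of comparison described, with the stdlib `decimal` module standing in for MPFR's software arithmetic and `N` reduced from the post's 10**7 to keep the run short, it might look like:)

```python
import time
from decimal import Decimal, getcontext

getcontext().prec = 16   # roughly the 53-bit precision of a hardware double
N = 10**5                # the post uses 10**7; reduced here for a quick run

def loop(x):
    # compute x**2 + i*x for each i, as in the post
    s = x * 0
    for i in range(N):
        s = x * x + i * x
    return s

t0 = time.perf_counter()
r_float = loop(1.5)                 # hardware double-precision floats
t_float = time.perf_counter() - t0

t0 = time.perf_counter()
r_dec = loop(Decimal("1.5"))        # software multiprecision (MPFR stand-in)
t_dec = time.perf_counter() - t0

print(f"float:   {t_float:.4f}s")
print(f"Decimal: {t_dec:.4f}s  (~{t_dec / t_float:.0f}x slower)")
```

The exact slowdown varies by machine, but the software type is reliably an order of magnitude or more behind the hardware floats, which is the gap the rest of the thread explains.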
What surprises/disappoints me a bit is that both RealNumber and direct
MPFR are a factor of 100 slower than using doubles, even though I'm
using RealField(53). Is this something I just have to live with, i.e.
are computations with doubles somehow just more optimized, or did I
do something wrong / is there something I can do to improve the speed
of the RealNumber and direct-MPFR variants?


Hello,

There is no way to work around this:

-> RDF (double-precision numbers) uses the processor's arithmetic.
-> RealField(x) uses the MPFR library.

In the first case, Sage (Python, Cython) passes the operations to the processor, which performs each operation in one (or maybe two) clock cycles. In the second case, MPFR calls a (maybe small) routine to perform the task. Even if MPFR is perfectly optimized, this is much more costly. Use MPFR when you need high precision, whatever it costs, or a special rounding mode.
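(To illustrate the "when you need high precision" point with a stdlib stand-in, Python's `decimal` playing the role of MPFR here: a quantity that vanishes entirely in hardware doubles survives at higher working precision.)

```python
from decimal import Decimal, getcontext

# In a 53-bit hardware double, the 1e-20 term is rounded away completely.
lost = (1.0 + 1e-20) - 1.0
print(lost)            # 0.0

# With 40 significant digits of software precision, it is preserved.
getcontext().prec = 40
kept = (Decimal(1) + Decimal("1e-20")) - Decimal(1)
print(kept)            # 1E-20
```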

Even worse (or better): the processor implements pipelining, parallelism and so on.

Something I find very spectacular:

def prod(a,b):
    return a*b

c=random_matrix(RDF,1000)
d=random_matrix(RDF,1000)

prod(c,d)

Try it; count the operations (10^9 multiplications). On my computer (3 GHz), prod(c,d) takes 0.18 seconds: that is to say 5 gigaflops! (or 10 if you count the additions as well): more than one operation per clock cycle. This is because we use here: 1) the processor's FPU, 2) the ATLAS BLAS, which lets the processor run at full speed (and minimizes cache misses).
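(The operation count behind those figures checks out, with n = 1000 and the timing as reported in the post:)

```python
n = 1000
seconds = 0.18                 # RDF timing reported above
mults = n**3                   # one multiplication per inner-product term
adds = n**2 * (n - 1)          # additions inside the inner products

print(f"{mults / seconds / 1e9:.1f} Gflop/s (multiplications only)")
print(f"{(mults + adds) / seconds / 1e9:.1f} Gflop/s (with additions)")
```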

Now, change RDF to RR. It takes 421 seconds to perform prod(c,d) on my computer: that is to say *2000* times more than with RDF numbers, because 1) we do not use the processor's FPU, and 2) there is no optimization of the matrix product (and a lot of cache misses too!).
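(And the "*2000* times" figure follows from the two timings quoted above:)

```python
rdf_time = 0.18    # seconds for prod(c,d) with RDF
rr_time = 421.0    # seconds for prod(c,d) with RR

print(f"slowdown: ~{rr_time / rdf_time:.0f}x")
```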

For "number crunching" use the processor's fpu (RDF) directly!

Yours
t.d.



Many thanks, Kees

