On Sep 18, 7:10 pm, Thierry Dumont <[email protected]> wrote:
> On 18/09/2010 16:31, KvS wrote:
>
>
>
> > Hi again,
>
> > I hope you don't mind me bumping this thread one more time. I started
> > experimenting with trying a few things for fast arbitrary precision
> > computations using Cython. Above it was suggested to use MPFR
> > directly, so without the RealNumber wrapper, as the fastest way. Here
> > is a bit of code that does the same thing (computing x**2+i*x for
> > input x and i in range(10**7)) using RealNumber, using doubles and
> > using MPFR directly:
>
> > What surprises/disappoints me a bit is that both RealNumber and MPFR
> > directly are a factor 100 slower than using doubles, even though I'm
> > using RealField(53). Is this something I just have to live with, i.e.
> > computations with doubles are somehow just more optimized (?), or did
> > I do something wrong/is there something I can do to improve the speed
> > of the RealNumber/MPFR directly variations?
>
> Hello,
>
> There is no way to work around this:
>
> -> RDF (double-precision numbers) uses the processor's arithmetic.
> -> RealField(x) uses the MPFR library.
>
> In the first case, Sage (Python, Cython) passes the operations to the
> processor, which performs each operation in 1 (or maybe 2) clock cycles.
> In the second case, MPFR will call a (maybe small) routine to perform
> the task. Even if MPFR is perfectly optimized, this is much more costly.
> Use MPFR when you need high precision, whatever it costs, or a special
> rounding mode.
>
> Even worse (or better): the processor also implements pipelining,
> parallelism, and so on.
>
> Something I find very spectacular:
>
> def prod(a, b):
>     return a * b
>
> c = random_matrix(RDF, 1000)
> d = random_matrix(RDF, 1000)
>
> prod(c, d)
>
> try it and count the operations: 10^9 multiplications for a single
> 1000x1000 matrix product. On my computer (3 GHz), prod(c,d) takes 0.18
> seconds, that is to say 5 gigaflops (or 10 if you also count the
> additions): more than one operation per clock cycle. This is because we
> use here 1) the processor's FPU, and 2) the ATLAS BLAS, which lets the
> processor run at full speed (and minimizes cache misses).
>
> Now, change RDF to RR. It takes 421 seconds to perform prod(c,d) on my
> computer, that is to say *2000* times more than with RDF numbers,
> because 1) we do not use the processor's FPU, and 2) there is no
> optimization of the matrix product (and a lot of cache misses, too!).
>
> For "number crunching", use the processor's FPU (RDF) directly!
>
> Yours
> t.d.
>
> > Many thanks, Kees
>
>
>

Alright, many thanks for the clear and extensive answer, Thierry.
The bottom line, then, is that I'll have to live with it.
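To get a feel for the gap outside of Sage, here is a rough pure-Python sketch of the same effect, using the standard decimal module merely as a stand-in for a software floating-point library such as MPFR (the exact ratio will of course vary by machine and is smaller here, since both loops also pay Python's interpreter overhead):

```python
import time
from decimal import Decimal, getcontext

getcontext().prec = 16  # roughly the 53 bits of RealField(53)

N = 100_000
x_hw = 1.2345             # hardware double
x_sw = Decimal("1.2345")  # software arbitrary precision

t0 = time.perf_counter()
for i in range(N):
    x_hw * x_hw + i * x_hw   # each operation is a hardware FPU instruction
t_hw = time.perf_counter() - t0

t0 = time.perf_counter()
for i in range(N):
    x_sw * x_sw + i * x_sw   # each operation is a call into a library routine
t_sw = time.perf_counter() - t0

print(f"doubles: {t_hw:.3f}s  Decimal: {t_sw:.3f}s  ratio: {t_sw / t_hw:.1f}x")
```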

On a side note, I must admit it surprises me. I'm only an amateur
programmer, let alone someone who knows anything about the subtleties of
how CPUs interact with code, but you would somehow expect that when you
add two arbitrary-precision numbers, you could break them up into
smaller chunks (like 2115 + 3135 = (21 + 31)*100 + (15 + 35)) so that
each of the smaller chunks is essentially a double to be added, and the
additions could hence be performed at optimal speed. You would get the
overhead (breaking the numbers up into chunks and putting them back
together again) plus the cycles needed for the additions, but I'd have
guessed that would amount to a lot less than a factor of 100.
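That chunking idea is in fact roughly what GMP/MPFR do internally, with machine-word "limbs" instead of decimal chunks; the overhead comes from the per-limb loop, carry propagation, memory traffic, and normalization that every single operation pays, versus one hardware instruction for a double. A toy, purely illustrative sketch of the idea in plain Python (base-10^4 chunks, non-negative integers only):

```python
BASE = 10 ** 4  # each "chunk" (limb) holds four decimal digits

def to_limbs(n):
    """Split a non-negative integer into little-endian base-10^4 chunks."""
    limbs = []
    while True:
        n, r = divmod(n, BASE)
        limbs.append(r)
        if n == 0:
            return limbs

def add_limbs(a, b):
    """Add two limb lists with explicit carry propagation per chunk."""
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        carry, limb = divmod(s, BASE)
        result.append(limb)
    if carry:
        result.append(carry)
    return result

def from_limbs(limbs):
    """Reassemble an integer from its little-endian chunks."""
    return sum(l * BASE ** i for i, l in enumerate(limbs))

# The example from the text: 2115 + 3135, chunk by chunk
print(from_limbs(add_limbs(to_limbs(2115), to_limbs(3135))))  # 5250
```

Even in this toy version, every addition costs a loop, several divmods, and list bookkeeping, where a hardware double add is a single instruction.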

Well, anyhow, the above is of course layman talk and I'm surely missing
essential points, but that's probably why I'd have expected
arbitrary-precision speed to be a lot closer to double speed.

Cheers, Kees

-- 
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/sage-support
URL: http://www.sagemath.org
