[sage-devel] Re: Implementation of Q(t) and monomials

William Stein Sat, 14 Nov 2009 21:33:43 -0800

On Sat, Nov 14, 2009 at 9:00 PM, Sebastian Pancratz <s...@pancratz.org> wrote:
>
> Dear William,
>
> I am adding a somewhat more detailed performance report below,
> comparing my own FLINT-based C code, MAGMA, SAGE 4.1.2.alpha2 and SAGE
> 4.1.2.alpha2 with the patch from trac ticket #4000.
>
> To give an impression of the results, for very small examples MAGMA
> marginally beats my C implementation, but I am very confident that I
> could change this around using a lazy evaluation approach for changing
> values away from 0/1 (then to be implemented as NULL/NULL).  At this
> stage already, the regular (resp. patched) SAGE code is out by a
> factor of 100 (resp. 10).  In the last example with degrees 100 and
> 140, the C code beats MAGMA by a factor of close to 10 and the regular
> SAGE code is a factor of around 150,000 slower (for addition; the I
> did not wait for the multiplication to finish), that is to say,
> essentially unusable for repeated calculations.  Fortunately, at this
> stage, the patched SAGE code is less than a factor of 2 slower than
> the C code and clearly beats the MAGMA code.
>
> By the way, I wouldn't want to say that the C code is heavily
> optimised at the moment.  When writing it, I have focussed on writing
> a clean interface while adding as little overhead as possible, thereby
> taking advantage of the speed that FLINT provides.  Nonetheless, apart
> from using the lazy evaluation approach mentioned above and trying to
> improve the combined operation addmul (used for matrix operations), I
> do not have any ideas for improvement at the moment.  But of course,
> I'd be happy to incorporate any suggestions.
>
> As a final comment, I want to add that the performance of the patched
> SAGE code shouldn't be a surprise.  As part of the work on trac ticket
> #4000, I had already written a class
> "UnivariateRationalRationalFunction(FractionFieldElement)" to
> represent a rational function over the rationals, based on Z[t], which
> in turn is based on FLINT already.  It is thus to be expected that it
> should run at a similar speed apart from a small constant factor.
>
> Kind regards,
>
> Sebastian
>
> [Performance comparison]
> Performance comparison between
>
>    FLINT-based C code,
>    MAGMA (Magma V2.15-12),
>    SAGE (4.1.2.alpha2),
>    SAGE (4.1.2.alpha2 with changes along trac ticket #4000).
>
> MAGMA was run on the departmental machine, with eight CPUs with the
> following specifications
>
>    model name      : Dual-Core AMD Opteron(tm) Processor 8220
>    cpu MHz         : 1000.000


Why do you say "1000" MHZ when that particular processor is a 2800Mhz
(=2.8Ghz) processor?

>    cache size      : 1024 KB
>
> and 32GB of memory.
>
> The other three programs were run on my laptop, with two CPUs with the
> following specifications
>
>    model name      : Intel(R) Core(TM)2 Duo CPU     P8400  @ 2.26GHz
>    cpu MHz         : 800.000


That cpu MHz of 800 makes no sense, given that your CPU is 2.26GHz.
I'm just curious.

>    cache size      : 3072 KB
>
> and 2GB of memory.
>
> We only compare addition and multiplication, based on three examples
> of different sizes.  The number of repetitions only applies to the
> FLINT-based C code and MAGMA.  We give the output of "timeit" for SAGE
> and scale the number up by the number of repetitions to make it easily
> comparable.
>
> The C timings are done using setup in the file qfmpz_poly-testp.
>
> The MAGMA code is executed as follows:
>
>    F := FieldOfFractions(PolynomialRing(Rationals()));
>    t := F.1;
>    b := (3*t^2+2*t+1)/(t+10);
>    c := (5*t+7)/13;
>    t0 := Cputime();
>    for i:=1 to 1000000 do
>    for> a := b+c;
>    for> end for;
>    t1 := Cputime(t0);
>    t1;
>
> The SAGE code is time with
>
>    F.<t> = Frac(PolynomialRing(QQ, 't'))
>    b = (3*t^2+2*t+1)/(t+10)
>    c = (5*t+7)/13
>    timeit('a = b+c')
>
> In each case, we give the timings in the same order as the name of the
> programs at the top of this text.
>
> In the small case, we use N = 1,000,000 repetitions and
>
> b = (3*t^2+2*t+1)/(t+10)
> c = (5*t+7)/13
>
> ADD
> 3.41
> 3.010
> 263 (625 loops, best of 3: 263 µs per loop)
> 50 (625 loops, best of 3: 50 µs per loop)
>
> MUL
> 3.74
> 4.340
> 391 (625 loops, best of 3: 391 µs per loop)
> 58.2 (625 loops, best of 3: 58.2 µs per loop)

Via the PARI C library:

sage: t=pari('t')
sage: b = (3*t^2+2*t+1)/(t+10)
sage: c = (5*t+7)/13
sage: timeit('a = b+c')
625 loops, best of 3: 8.06 µs per loop
sage: timeit('a=b*c')
625 loops, best of 3: 10.3 µs per loop


>
> In the medium case, we use N = 100,000 repetitions and
>
> b =
> (-19836522420*t^29+79346089680*t^28+57706247040*t^27+238038269040*t^26+158692179360*t^25+555422627760*t^24+5119102560*t^23-714114807120*t^22+52897393120*t^21+79346089680*t^20-79346089680*t^19-1983652242000*t^16-39673044840*t^14+39673044840*t^13+4959130605*t^12+79346089680*t^11+359032080*t^10-39673044840*t^6-119019134520*t^5+15869217936*t-26448696560)/
> (158692179360*t^30+7934608968*t^29-793460896800*t^28+13224348280*t^27-1017257560*t^25+741552240*t^24+52897393120*t^23-79346089680*t^21-46285218980*t^20+29754783630*t^19+158692179360*t^18-555422627760*t^17-158692179360*t^15-39673044840*t^14-7213280880*t^11-39673044840*t^10+79346089680*t^9-449627841520*t^8-79346089680*t^7-5805811440*t^5-79346089680*t^4+79346089680*t^3-26448696560*t-634768717440)
> c =
> (91560*t^30-183120*t^29+45780*t^27-30520*t^26-6226080*t^24-91560*t^22+366240*t^21+91560*t^20+45780*t^19+91560*t^17+45780*t^16-91560*t^15+137340*t^14-30520*t^13-30520*t^12-45780*t^11+274680*t^10+1007160*t^8-9156*t^7-274680*t^6-11445*t^4+9240*t^3-412020*t^2-39240*t
> +366240)/
> (45780*t^30-91560*t^29-91560*t^28-91560*t^27-45780*t^25+22890*t^24+91560*t^23+91560*t^22+183120*t^21+183120*t^20+10071600*t^19-18312*t^18+549360*t^16-45780*t^14+549360*t^13-91560*t^12+274680*t^10-45780*t^9-91560*t^8+549360*t^7-45780*t^6+503580*t^5+91560*t^3-137340*t^2+457800*t
> +91560)
>
> ADD
> 16.4
> 30.890
> 25200 (5 loops, best of 3: 252 ms per loop)
> 27 (625 loops, best of 3: 270 µs per loop)
>
> MUL
> 19.43
> 47.670
> 3370000 (1 loops, best of 1: 33.7 s per loop)
> 34.4 (625 loops, best of 3: 344 µs per loop)

Did you try using the Sage PARI C library.  It's reasonably good at
arithmetic in K(t) for K=Q and F_q.   On my 2.1Ghz laptop I get the
following on the above example:

t=pari('t')
a, b as above
sage: timeit('a = b+c')
21.5    (625 loops, best of 3: 215 µs per loop)
sage: timeit('a = b*c')
22.8    (625 loops, best of 3: 228 µs per loop)

Or am I timing the wrong thing?

>
> In the large case, we use N = 100,000 repetitions and
>
> b = (t^99-5*t^98+4*t^96+t^95-t^94+4*t^93-8*t^92-310*t^91-
> t^89+21*t^86-10*t^85+333*t^84-28*t^81+t^79+5*t^78-4*t^77+5*t^76-5*t^75-5*t^74-
> t^73+t^72-2*t^71+2*t^69+t^68+8*t^67-
> t^66-2*t^65+6*t^64+t^63+t^62+2*t^61+4*t^60+t^59+6*t^57-2*t^56+t^55-5*t^52-63*t^51-2*t^49-
> t^48+t^47-t^46+2*t^45+t^44-
> t^43+5*t^42-2*t^41+15*t^40+t^39-2*t^38+t^37-55*t^36+t^35+t^34-7*t^33-
> t^31+4*t^30-2*t^29+t^27+2*t^26-3*t^25+t^24-2*t^23+2*t^21+2*t^20+t^19+t^18-
> t^17-94*t^16+4*t^15+2*t^14-3*t^13-t^12+4*t^11+t^10-t^9+t^7+2*t^6+t^5-
> t^4)/(2*t^100-
> t^99-2*t^98+2*t^97+2*t^95+2*t^94+11*t^93+2*t^91+t^90+t^89+t^88-
> t^87+2*t^86+12*t^85-2*t^83+3*t^81+9*t^80+t^79-9*t^78-
> t^76+2*t^75+50*t^74-t^73+t^72-2*t^71-2*t^70-
> t^65+t^64+t^63+3*t^62-2*t^60+t^58-5*t^57-2*t^55-5*t^54+t^53-t^52-t^51-
> t^50-2*t^49+7*t^48+2*t^46+3*t^44+2*t^43+60*t^41-
> t^38+t^37+7*t^36+7*t^34-t^32+6*t^30-3*t^28+t^27+29*t^26-t^25-
> t^24+7*t^23+17*t^21-4*t^20+10*t^19+t^18+t^16-t^15+t^14+t^13-
> t^12+t^11+t^10+t^9+t^8-t^7+14*t^6-t^5-t^4+t^3-t^2-t
> +1)
> c = (-t^137-5*t^136-t^135-t^134-2*t^133+t^131+8*t^130-t^129-
> t^128-10*t^127+t^125+t^124+9*t^123-t^122-16*t^121+2*t^120-t^119-
> t^118+2*t^117-2*t^116-t^114-8*t^111+t^110+t^108+t^107-
> t^106+10*t^104-6*t^103-
> t^102+t^101+5*t^99-81*t^98+t^97-2*t^96+2*t^95-5*t^94+t^93-2*t^92+5*t^91+t^90-
> t^89+t^87-7*t^86-9*t^85-6*t^84-t^83+23*t^82-2*t^81-t^80+2*t^78-
> t^77-2*t^75-3*t^74+t^73+19*t^71-20*t^70+2*t^68-29*t^67-t^66-2*t^65-
> t^63+133*t^62+t^61+8*t^60+5*t^59-
> t^58-2*t^57+5*t^56+t^55-2*t^54-9*t^51-4*t^50+14*t^49-t^48-
> t^47+6*t^46+t^45+5*t^43+3*t^42-9*t^41+t^40-3*t^39+t^38+4*t^37+2*t^36+3*t^35-2*t^34-
> t^33-t^31-t^30-3*t^28-t^27+t^25+2*t^24-4*t^23+t^22+t^21-t^20-
> t^19+3*t^18+4*t^17+t^16-51*t^15-2*t^14-t^13-
> t^12-2*t^11+3*t^10+t^9-2*t^8+t^7-3*t^6+4*t^4+3*t^3-t^2)/
> (t^137+t^136-3*t^135-t^134+2*t^133-
> t^132+20*t^131-3*t^129+t^128-2*t^126-t^125-t^124-2*t^123-t^121+2*t^120-
> t^119+t^118-77*t^116+t^115-t^113-4*t^110-2*t^109-
> t^108+t^106+4*t^105-2*t^103+t^102+t^101+10*t^100-23*t^99-
> t^98-22*t^97-2*t^94-7*t^93+t^91+2*t^90-2*t^89+2*t^87-6*t^86-
> t^85+t^84-2*t^82-28*t^81-
> t^79-176*t^78-9*t^77+t^76-6*t^75-3*t^74+t^73-23*t^72-6*t^71+10*t^70+t^69+2*t^68-2*t^67-4*t^66+37*t^64-
> t^63+2*t^62-2*t^61+t^60+t^59-2*t^58+67*t^56-10*t^55-8*t^54+25*t^53+2*t^52-5*t^51+t^50-2*t^49-
> t^48-t^47-2*t^46-5*t^45-t^43-t^42-
> t^41+5*t^39+2*t^35-4*t^34+t^33+3*t^32-
> t^30-3*t^29+t^28+2*t^27+2*t^26+t^25+t^24-t^23+t^22+3*t^21-
> t^20+t^19-3*t^17+4*t^15-t^14+4*t^13+2*t^11+t^9-t^8+t^7-t^4-t^3-
> t^2-12*t-17)
>
> ADD
> 27.62
> 180.040
> 3980000 (1 loops, best of 1: 39.8 s per loop)
> 36.4 (625 loops, best of 3: 364 µs per loop)
>
> MUL
> 35.79
> 285.680
> NA (Manually terminated before completion)
> 57 (625 loops, best of 3: 570 µs per loop)

In PARI:

sage: timeit('a = b+c')
125 loops, best of 3: 1.98 ms per loop
sage: timeit('a = b*c')
125 loops, best of 3: 3.11 ms per loop

There FLINT is beating PARI's slower multiplication.

 -- William

--~--~---------~--~----~------------~-------~--~----~
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to 
sage-devel-unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org
-~----------~----~----~----~------~----~------~--~---

[sage-devel] Re: Implementation of Q(t) and monomials

Reply via email to