On Sat, Nov 14, 2009 at 9:00 PM, Sebastian Pancratz <s...@pancratz.org> wrote: > > Dear William, > > I am adding a somewhat more detailed performance report below, > comparing my own FLINT-based C code, MAGMA, SAGE 4.1.2.alpha2 and SAGE > 4.1.2.alpha2 with the patch from trac ticket #4000. > > To give an impression of the results, for very small examples MAGMA > marginally beats my C implementation, but I am very confident that I > could change this around using a lazy evaluation approach for changing > values away from 0/1 (then to be implemented as NULL/NULL). At this > stage already, the regular (resp. patched) SAGE code is out by a > factor of 100 (resp. 10). In the last example with degrees 100 and > 140, the C code beats MAGMA by a factor of close to 10 and the regular > SAGE code is a factor of around 150,000 slower (for addition; the I > did not wait for the multiplication to finish), that is to say, > essentially unusable for repeated calculations. Fortunately, at this > stage, the patched SAGE code is less than a factor of 2 slower than > the C code and clearly beats the MAGMA code. > > By the way, I wouldn't want to say that the C code is heavily > optimised at the moment. When writing it, I have focussed on writing > a clean interface while adding as little overhead as possible, thereby > taking advantage of the speed that FLINT provides. Nonetheless, apart > from using the lazy evaluation approach mentioned above and trying to > improve the combined operation addmul (used for matrix operations), I > do not have any ideas for improvement at the moment. But of course, > I'd be happy to incorporate any suggestions. > > As a final comment, I want to add that the performance of the patched > SAGE code shouldn't be a surprise. As part of the work on trac ticket > #4000, I had already written a class > "UnivariateRationalRationalFunction(FractionFieldElement)" to > represent a rational function over the rationals, based on Z[t], which > in turn is based on FLINT already. It is thus to be expected that it > should run at a similar speed apart from a small constant factor. > > Kind regards, > > Sebastian > > [Performance comparison] > Performance comparison between > > FLINT-based C code, > MAGMA (Magma V2.15-12), > SAGE (4.1.2.alpha2), > SAGE (4.1.2.alpha2 with changes along trac ticket #4000). > > MAGMA was run on the departmental machine, with eight CPUs with the > following specifications > > model name : Dual-Core AMD Opteron(tm) Processor 8220 > cpu MHz : 1000.000
Why do you say "1000" MHZ when that particular processor is a 2800Mhz (=2.8Ghz) processor? > cache size : 1024 KB > > and 32GB of memory. > > The other three programs were run on my laptop, with two CPUs with the > following specifications > > model name : Intel(R) Core(TM)2 Duo CPU P8400 @ 2.26GHz > cpu MHz : 800.000 That cpu MHz of 800 makes no sense, given that your CPU is 2.26GHz. I'm just curious. > cache size : 3072 KB > > and 2GB of memory. > > We only compare addition and multiplication, based on three examples > of different sizes. The number of repetitions only applies to the > FLINT-based C code and MAGMA. We give the output of "timeit" for SAGE > and scale the number up by the number of repetitions to make it easily > comparable. > > The C timings are done using setup in the file qfmpz_poly-testp. > > The MAGMA code is executed as follows: > > F := FieldOfFractions(PolynomialRing(Rationals())); > t := F.1; > b := (3*t^2+2*t+1)/(t+10); > c := (5*t+7)/13; > t0 := Cputime(); > for i:=1 to 1000000 do > for> a := b+c; > for> end for; > t1 := Cputime(t0); > t1; > > The SAGE code is time with > > F.<t> = Frac(PolynomialRing(QQ, 't')) > b = (3*t^2+2*t+1)/(t+10) > c = (5*t+7)/13 > timeit('a = b+c') > > In each case, we give the timings in the same order as the name of the > programs at the top of this text. > > In the small case, we use N = 1,000,000 repetitions and > > b = (3*t^2+2*t+1)/(t+10) > c = (5*t+7)/13 > > ADD > 3.41 > 3.010 > 263 (625 loops, best of 3: 263 µs per loop) > 50 (625 loops, best of 3: 50 µs per loop) > > MUL > 3.74 > 4.340 > 391 (625 loops, best of 3: 391 µs per loop) > 58.2 (625 loops, best of 3: 58.2 µs per loop) Via the PARI C library: sage: t=pari('t') sage: b = (3*t^2+2*t+1)/(t+10) sage: c = (5*t+7)/13 sage: timeit('a = b+c') 625 loops, best of 3: 8.06 µs per loop sage: timeit('a=b*c') 625 loops, best of 3: 10.3 µs per loop > > In the medium case, we use N = 100,000 repetitions and > > b = > (-19836522420*t^29+79346089680*t^28+57706247040*t^27+238038269040*t^26+158692179360*t^25+555422627760*t^24+5119102560*t^23-714114807120*t^22+52897393120*t^21+79346089680*t^20-79346089680*t^19-1983652242000*t^16-39673044840*t^14+39673044840*t^13+4959130605*t^12+79346089680*t^11+359032080*t^10-39673044840*t^6-119019134520*t^5+15869217936*t-26448696560)/ > (158692179360*t^30+7934608968*t^29-793460896800*t^28+13224348280*t^27-1017257560*t^25+741552240*t^24+52897393120*t^23-79346089680*t^21-46285218980*t^20+29754783630*t^19+158692179360*t^18-555422627760*t^17-158692179360*t^15-39673044840*t^14-7213280880*t^11-39673044840*t^10+79346089680*t^9-449627841520*t^8-79346089680*t^7-5805811440*t^5-79346089680*t^4+79346089680*t^3-26448696560*t-634768717440) > c = > (91560*t^30-183120*t^29+45780*t^27-30520*t^26-6226080*t^24-91560*t^22+366240*t^21+91560*t^20+45780*t^19+91560*t^17+45780*t^16-91560*t^15+137340*t^14-30520*t^13-30520*t^12-45780*t^11+274680*t^10+1007160*t^8-9156*t^7-274680*t^6-11445*t^4+9240*t^3-412020*t^2-39240*t > +366240)/ > (45780*t^30-91560*t^29-91560*t^28-91560*t^27-45780*t^25+22890*t^24+91560*t^23+91560*t^22+183120*t^21+183120*t^20+10071600*t^19-18312*t^18+549360*t^16-45780*t^14+549360*t^13-91560*t^12+274680*t^10-45780*t^9-91560*t^8+549360*t^7-45780*t^6+503580*t^5+91560*t^3-137340*t^2+457800*t > +91560) > > ADD > 16.4 > 30.890 > 25200 (5 loops, best of 3: 252 ms per loop) > 27 (625 loops, best of 3: 270 µs per loop) > > MUL > 19.43 > 47.670 > 3370000 (1 loops, best of 1: 33.7 s per loop) > 34.4 (625 loops, best of 3: 344 µs per loop) Did you try using the Sage PARI C library. It's reasonably good at arithmetic in K(t) for K=Q and F_q. On my 2.1Ghz laptop I get the following on the above example: t=pari('t') a, b as above sage: timeit('a = b+c') 21.5 (625 loops, best of 3: 215 µs per loop) sage: timeit('a = b*c') 22.8 (625 loops, best of 3: 228 µs per loop) Or am I timing the wrong thing? > > In the large case, we use N = 100,000 repetitions and > > b = (t^99-5*t^98+4*t^96+t^95-t^94+4*t^93-8*t^92-310*t^91- > t^89+21*t^86-10*t^85+333*t^84-28*t^81+t^79+5*t^78-4*t^77+5*t^76-5*t^75-5*t^74- > t^73+t^72-2*t^71+2*t^69+t^68+8*t^67- > t^66-2*t^65+6*t^64+t^63+t^62+2*t^61+4*t^60+t^59+6*t^57-2*t^56+t^55-5*t^52-63*t^51-2*t^49- > t^48+t^47-t^46+2*t^45+t^44- > t^43+5*t^42-2*t^41+15*t^40+t^39-2*t^38+t^37-55*t^36+t^35+t^34-7*t^33- > t^31+4*t^30-2*t^29+t^27+2*t^26-3*t^25+t^24-2*t^23+2*t^21+2*t^20+t^19+t^18- > t^17-94*t^16+4*t^15+2*t^14-3*t^13-t^12+4*t^11+t^10-t^9+t^7+2*t^6+t^5- > t^4)/(2*t^100- > t^99-2*t^98+2*t^97+2*t^95+2*t^94+11*t^93+2*t^91+t^90+t^89+t^88- > t^87+2*t^86+12*t^85-2*t^83+3*t^81+9*t^80+t^79-9*t^78- > t^76+2*t^75+50*t^74-t^73+t^72-2*t^71-2*t^70- > t^65+t^64+t^63+3*t^62-2*t^60+t^58-5*t^57-2*t^55-5*t^54+t^53-t^52-t^51- > t^50-2*t^49+7*t^48+2*t^46+3*t^44+2*t^43+60*t^41- > t^38+t^37+7*t^36+7*t^34-t^32+6*t^30-3*t^28+t^27+29*t^26-t^25- > t^24+7*t^23+17*t^21-4*t^20+10*t^19+t^18+t^16-t^15+t^14+t^13- > t^12+t^11+t^10+t^9+t^8-t^7+14*t^6-t^5-t^4+t^3-t^2-t > +1) > c = (-t^137-5*t^136-t^135-t^134-2*t^133+t^131+8*t^130-t^129- > t^128-10*t^127+t^125+t^124+9*t^123-t^122-16*t^121+2*t^120-t^119- > t^118+2*t^117-2*t^116-t^114-8*t^111+t^110+t^108+t^107- > t^106+10*t^104-6*t^103- > t^102+t^101+5*t^99-81*t^98+t^97-2*t^96+2*t^95-5*t^94+t^93-2*t^92+5*t^91+t^90- > t^89+t^87-7*t^86-9*t^85-6*t^84-t^83+23*t^82-2*t^81-t^80+2*t^78- > t^77-2*t^75-3*t^74+t^73+19*t^71-20*t^70+2*t^68-29*t^67-t^66-2*t^65- > t^63+133*t^62+t^61+8*t^60+5*t^59- > t^58-2*t^57+5*t^56+t^55-2*t^54-9*t^51-4*t^50+14*t^49-t^48- > t^47+6*t^46+t^45+5*t^43+3*t^42-9*t^41+t^40-3*t^39+t^38+4*t^37+2*t^36+3*t^35-2*t^34- > t^33-t^31-t^30-3*t^28-t^27+t^25+2*t^24-4*t^23+t^22+t^21-t^20- > t^19+3*t^18+4*t^17+t^16-51*t^15-2*t^14-t^13- > t^12-2*t^11+3*t^10+t^9-2*t^8+t^7-3*t^6+4*t^4+3*t^3-t^2)/ > (t^137+t^136-3*t^135-t^134+2*t^133- > t^132+20*t^131-3*t^129+t^128-2*t^126-t^125-t^124-2*t^123-t^121+2*t^120- > t^119+t^118-77*t^116+t^115-t^113-4*t^110-2*t^109- > t^108+t^106+4*t^105-2*t^103+t^102+t^101+10*t^100-23*t^99- > t^98-22*t^97-2*t^94-7*t^93+t^91+2*t^90-2*t^89+2*t^87-6*t^86- > t^85+t^84-2*t^82-28*t^81- > t^79-176*t^78-9*t^77+t^76-6*t^75-3*t^74+t^73-23*t^72-6*t^71+10*t^70+t^69+2*t^68-2*t^67-4*t^66+37*t^64- > t^63+2*t^62-2*t^61+t^60+t^59-2*t^58+67*t^56-10*t^55-8*t^54+25*t^53+2*t^52-5*t^51+t^50-2*t^49- > t^48-t^47-2*t^46-5*t^45-t^43-t^42- > t^41+5*t^39+2*t^35-4*t^34+t^33+3*t^32- > t^30-3*t^29+t^28+2*t^27+2*t^26+t^25+t^24-t^23+t^22+3*t^21- > t^20+t^19-3*t^17+4*t^15-t^14+4*t^13+2*t^11+t^9-t^8+t^7-t^4-t^3- > t^2-12*t-17) > > ADD > 27.62 > 180.040 > 3980000 (1 loops, best of 1: 39.8 s per loop) > 36.4 (625 loops, best of 3: 364 µs per loop) > > MUL > 35.79 > 285.680 > NA (Manually terminated before completion) > 57 (625 loops, best of 3: 570 µs per loop) In PARI: sage: timeit('a = b+c') 125 loops, best of 3: 1.98 ms per loop sage: timeit('a = b*c') 125 loops, best of 3: 3.11 ms per loop There FLINT is beating PARI's slower multiplication. -- William --~--~---------~--~----~------------~-------~--~----~ To post to this group, send an email to sage-devel@googlegroups.com To unsubscribe from this group, send an email to sage-devel-unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/sage-devel URL: http://www.sagemath.org -~----------~----~----~----~------~----~------~--~---