#10480: fast PowerSeries_poly multiplication
-----------------------------------+----------------------------------------
Reporter: pernici | Owner: malb
Type: enhancement | Status: needs_work
Priority: major | Milestone: sage-4.6.2
Component: commutative algebra | Keywords: power series
Author: mario pernici | Upstream: N/A
Reviewer: | Merged:
Work_issues: |
-----------------------------------+----------------------------------------
Comment(by pernici):
Hello,
lftabera wrote:
> I think that we can define an algorithm that works like Karatsuba but
> that has a prec parameter. If a portion would have order greater than
> the prec, that part is discarded. The prec is updated on recursive
> calls.
I followed your suggestion to perform truncated multiplication
recursively, keeping track of the precision and calling Karatsuba
multiplication when no truncation occurs.
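As a rough illustration of this scheme (not the patch's actual code, which works on Sage polynomial objects in Cython), here is a plain-Python sketch on dense coefficient lists, where index i holds the coefficient of t^i; `_schoolbook_trunc`, the function names, and the cutoff are all illustrative, with the schoolbook routine standing in for the Karatsuba multiplication the patch actually calls:

```python
def _schoolbook_trunc(a, b, prec):
    """Naive product of coefficient lists a and b, keeping orders < prec."""
    res = [0] * min(prec, len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j >= prec:
                break
            res[i + j] += ai * bj
    return res

def mul_trunc(a, b, prec, cutoff=8):
    """First `prec` coefficients of a*b, truncating recursively."""
    a, b = a[:prec], b[:prec]
    if prec <= 0 or not a or not b:
        return []
    if len(a) + len(b) - 1 <= prec or min(len(a), len(b)) <= cutoff:
        # The whole product fits below prec (no truncation happens), or
        # the operands are small: hand off to a plain multiplication --
        # the patch calls Karatsuba here, we use the schoolbook stand-in.
        return _schoolbook_trunc(a, b, prec)
    k = (prec + 1) // 2   # k >= prec - k, so no pair (i, j) is counted twice
    res = _schoolbook_trunc(a[:k], b[:k], prec)       # low x low, full
    res += [0] * (prec - len(res))
    for part in (mul_trunc(a[k:], b, prec - k, cutoff),   # high x low, shifted by k
                 mul_trunc(a, b[k:], prec - k, cutoff)):  # low x high, shifted by k
        for i, c in enumerate(part):
            res[k + i] += c
    return res
```

Choosing the split point with k >= prec - k guarantees that every high x high term has order >= prec, so that block is never computed at all.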
> Also, the following cases have to be correctly covered:
{{{
sage: O(x)*O(x)
0
sage: O(x^2)*O(x^3)
0
}}}
The new patch trac_10480_fast_PowerSeries_poly_multiplication2.patch
passes all tests, including these.
It generally performs better than unpatched Sage 4.6,
except in the case where the series has degree lower than the
precision, as in your example
{{{
p1 = K.random_element(800) + O(t^990)
}}}
on which it is typically up to 50% slower, whereas the previous patch
was 2x slower there; the first patch was also slow in other cases, see
below.
The performance of truncated multiplication vs Karatsuba multiplication
depends on the distribution of the terms in the univariate series;
truncated multiplication is fast when the coefficients of the high
powers of t have many more terms than those of the low powers, as in
this example using this patch:
{{{
sage: from sage.rings.power_series_poly import PowerSeries_poly
sage: R.<a,b> = QQ[]
sage: K.<t> = R[]
sage: p = (1 + a*t + b*t^2 + O(t^50))^-1
sage: v = [len(x.coefficients()) for x in p.coefficients()]
sage: v[:6], v[-6:]
([1, 1, 2, 2, 3, 3], [23, 23, 24, 24, 25, 25])
sage: %timeit p*p
25 loops, best of 3: 26 ms per loop
sage: %timeit PowerSeries_poly(p.parent(),p.polynomial().mul_trunc(p.polynomial(),50,1),50)
125 loops, best of 3: 4.89 ms per loop
sage: p*p - PowerSeries_poly(p.parent(),p.polynomial().mul_trunc(p.polynomial(),50,1),50)
O(t^50)
}}}
unpatched Sage (main):
{{{
sage: %timeit p*p
5 loops, best of 3: 833 ms per loop
}}}
`mul_trunc(self, right, h, generic=1)` calls `do_mul_trunc_generic`,
which is the algorithm used in the first patch; in this case it is
faster, but in other cases it is slower, so I think it is better to use
`do_mul_trunc`.
However, for multivariate series `do_mul_trunc_generic` seems to always
perform better; I will post a patch about this in ticket #1956.
The series
{{{
p = K.random_element(50) + O(t^50)
}}}
has a uniform distribution of terms; for `p*p` the patched version
performs like the main version.
In un.sage one takes N successive steps of `p = p*(p+1)`, starting from
a random univariate series over a polynomial ring in n variables; the
ratio of the number of terms in the upper half of the univariate series
to the number in the lower half is reported. The ratio is close to 1
for a random polynomial, and increases with n and N.
There is a correlation between the speedup
using truncated multiplication and this ratio.
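This diagnostic can be computed directly from the list of per-coefficient term counts (like the list `v` in the earlier example); a minimal helper, with an illustrative name and the convention that an odd-length middle coefficient counts toward the upper half:

```python
def half_ratio(v):
    """Ratio of term counts: upper half of the series vs lower half.

    `v` is the list of term counts of the coefficients of t^0, t^1, ...,
    as produced by [len(x.coefficients()) for x in p.coefficients()].
    """
    mid = len(v) // 2
    lower = sum(v[:mid])
    upper = sum(v[mid:])
    return upper / lower
```

For the inverse-series example above, `v` grows roughly linearly, so the ratio is well above 1, while a random polynomial gives a ratio near 1.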
The old patch (patch1) performs badly in some cases; the worst case is
included in the timings below.
{{{
n variables in the polynomial ring over QQ; h = precision;
N = number of times p = p*(p+1) is taken

n=1, h=20
N       1      2      3      4     5     6     7
ratio   0.86   0.98   1      1     1     1     1
main    0.005  0.008  0.022  0.09  0.41  3.5   53.5
patch   0.004  0.006  0.016  0.06  0.31  2.5   41.5

n=1, h=100
ratio   0.93   1      1      1     1
main    0.049  0.14   0.47   1.6   7.9
patch   0.043  0.10   0.31   1.1   5.5

n=1, h=500
ratio   0.96   1.0
main    1.5    20.9
patch1  2.0    49.1
patch   1.3    17.0

n=2, h=20
ratio   0.81   1.06   1.10   1.19  1.34
main    0.009  0.04   0.30   4.3   134
patch   0.007  0.03   0.22   2.6   61

n=2, h=100
ratio   1.05   1.01   1.0    1.0
main    0.11   0.80   7.8    171
patch   0.09   0.59   5.5    103

n=4, h=20
ratio   1.08   1.53   1.64
main    0.015  0.35   21.0
patch   0.010  0.19   10.9

n=8, h=10
ratio   1.28   2.67   5.43
main    0.008  0.38   127
patch   0.006  0.046  4.4

n=16, h=10
ratio   1.00   2.77   8.90
main    0.013  1.95   > 19m (6GB)
patch   0.009  0.14   29.0 (0.5GB)
}}}
In the last example I stopped the computation with main because too much
memory was used.
> I will try to implement the algorithm at "A long note on Mulders’ short
> product" of Hanrot and Zimmermann that looks not harder than Karatsuba.
> The authors claim that heuristically the gain time is on the average
> 0.7 x Karatsuba time.
If I understand correctly, that paper deals with univariate series over
numeric rings. Maybe in the case of univariate series over polynomial
rings one could use the list of per-coefficient term counts to guide
the algorithm.
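For reference, the short-product recursion from that line of work can be sketched as follows; this is a simplified plain-Python model under my reading of the paper, not its tuned algorithm: the real version multiplies the low blocks with Karatsuba (here a naive loop) and chooses the split fraction beta ~ 0.694 carefully, while all names here are illustrative:

```python
def short_mul(a, b, n, beta=0.694):
    """First n coefficients of a*b (a, b dense coefficient lists)."""
    a, b = a[:n], b[:n]
    if n <= 0 or not a or not b:
        return []
    if n <= 8 or min(len(a), len(b)) < 2:
        # base case: naive truncated product
        res = [0] * n
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                if i + j >= n:
                    break
                res[i + j] += ai * bj
        return res
    k = max(int(beta * n), (n + 1) // 2)  # k >= n - k: high x high never appears
    res = [0] * n
    # full product of the two low blocks (Karatsuba in the real algorithm)
    for i, ai in enumerate(a[:k]):
        for j, bj in enumerate(b[:k]):
            if i + j < n:
                res[i + j] += ai * bj
    # two short products of size n - k, shifted up by k
    for part in (short_mul(a[k:], b, n - k, beta),
                 short_mul(a, b[k:], n - k, beta)):
        for i, c in enumerate(part):
            res[k + i] += c
    return res
```

The heuristic gain comes from the low-block full product being handled by a fast untruncated multiplication, with only the two smaller (n - k)-sized products recursing.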
--
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/10480#comment:5>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB