#10480: fast PowerSeries_poly multiplication
-----------------------------------+----------------------------------------
   Reporter:  pernici              |       Owner:  malb        
       Type:  enhancement          |      Status:  needs_work  
   Priority:  major                |   Milestone:  sage-4.6.2  
  Component:  commutative algebra  |    Keywords:  power series
     Author:  mario pernici        |    Upstream:  N/A         
   Reviewer:                       |      Merged:              
Work_issues:                       |  
-----------------------------------+----------------------------------------

Comment(by pernici):

 Hello,

 lftabera wrote:
 >I think that we can define an algorithm that works like Karatsuba but
 that has a prec parameter. If a portion would have order greater than
 prec, that part is discarded. The prec is updated on recursive calls.

 I followed your suggestion to perform truncated multiplication
 recursively, keeping track of the precision and calling Karatsuba
 multiplication when no truncation occurs.
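 A minimal Python sketch of this scheme on plain coefficient lists
 (lowest degree first); the function names and the cut point are my own
 illustration, not the patch's code, and `mul_plain` stands in for the
 full (Karatsuba) product:

```python
def mul_plain(a, b):
    # full product of coefficient lists; stands in for Karatsuba
    if not a or not b:
        return []
    res = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            res[i + j] += x * y
    return res

def mul_trunc(a, b, prec):
    # coefficients of a*b up to degree < prec, recursing only while
    # truncation actually discards terms
    a, b = a[:prec], b[:prec]
    if not a or not b:
        return []
    if len(a) + len(b) - 1 <= prec:
        return mul_plain(a, b)         # no truncation: full product
    if min(len(a), len(b)) <= 4:
        return mul_plain(a, b)[:prec]  # small case: truncate directly
    k = prec // 2
    # a*b = a_lo*b_lo + t^k*(a_hi*b) + t^k*(a_lo*b_hi)
    res = mul_trunc(a[:k], b[:k], prec)
    res += [0] * (prec - len(res))
    for part in (mul_trunc(a[k:], b, prec - k),
                 mul_trunc(a[:k], b[k:], prec - k)):
        for i, c in enumerate(part):
            res[k + i] += c
    return res
```

 At the top level this returns the same coefficients as truncating the
 full product, but the recursion never multiplies parts whose whole
 contribution lies at degree >= prec.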

 >Also, the following cases have to be correctly covered:

 {{{
 sage: O(x)*O(x)
 0
 sage: O(x^2)*O(x^3)
 0
 }}}

 The new patch trac_10480_fast_PowerSeries_poly_multiplication2.patch
 passes all tests, including these.
 It generally performs better than unpatched Sage 4.6,
 except in the case where the series has degree lower than the precision,
 as in your example

 {{{
 p1 = K.random_element(800) + O(t^990)
 }}}

 on which it is typically up to 50% slower, whereas the previous patch
 was 2x slower.

 The first patch was also slow in other cases, see below.

 The performance of truncated multiplication versus Karatsuba
 multiplication depends on the distribution of the terms in the
 univariate series: truncated multiplication is fast when the
 coefficients of high powers of the variable t have many more terms than
 those of low powers, as in this example
 using this patch:
 {{{
 sage: from sage.rings.power_series_poly import PowerSeries_poly
 sage: R.<a,b> = QQ[]
 sage: K.<t> = R[]
 sage: p = (1 + a*t + b*t^2 + O(t^50))^-1
 sage: v = [len(x.coefficients()) for x in p.coefficients()]
 sage: v[:6], v[-6:]
 ([1, 1, 2, 2, 3, 3], [23, 23, 24, 24, 25, 25])
 sage: %timeit p*p
 25 loops, best of 3: 26 ms per loop
 sage: %timeit PowerSeries_poly(p.parent(),p.polynomial().mul_trunc(p.polynomial(),50,1),50)
 125 loops, best of 3: 4.89 ms per loop
 sage: p*p - PowerSeries_poly(p.parent(),p.polynomial().mul_trunc(p.polynomial(),50,1),50)
 O(t^50)
 }}}

 unpatched Sage (main):

 {{{
 sage: %timeit p*p
 5 loops, best of 3: 833 ms per loop
 }}}
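 The effect can be illustrated with a toy cost model (my own sketch, not
 code from the patch): if the coefficient of t^i has s_i terms,
 multiplying coefficients i and j costs about s_i*s_j term products, and
 a truncated product skips every pair with i + j >= prec, which are
 exactly the pairs involving the term-rich high powers:

```python
def full_work(sizes):
    # term products done by a full polynomial multiplication
    return sum(si * sj for si in sizes for sj in sizes)

def trunc_work(sizes, prec):
    # term products done when output degrees >= prec are skipped
    return sum(sizes[i] * sizes[j]
               for i in range(len(sizes))
               for j in range(len(sizes))
               if i + j < prec)

# term counts growing linearly with the power of t, roughly like the
# v[:6], v[-6:] output in the example above
sizes = [1 + i // 2 for i in range(50)]
print(trunc_work(sizes, 50) / full_work(sizes))  # a small fraction
```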

 `mul_trunc(self, right, h, generic=1)` calls `do_mul_trunc_generic`,
 which is the algorithm used in the first patch; in this case it is
 faster, but in other cases it is slower, so I think it is better to use
 `do_mul_trunc`. However, for multivariate series `do_mul_trunc_generic`
 seems to always perform better; I will post a patch about this in
 ticket #1956. The series

 {{{
 p = K.random_element(50) + O(t^50)
 }}}

 has a uniform term distribution; for `p*p` the patched version performs
 the same as the main version.


 In un.sage, p = p*(p+1) is applied N times, starting from a random
 univariate series over a polynomial ring in n variables; the ratio of
 the number of terms in the upper half of the univariate series to the
 number in the lower half is reported.
 This ratio is close to 1 for a random polynomial and increases with n
 and N, and it correlates with the speedup obtained from truncated
 multiplication.
 The old patch (patch1) performs badly in some cases; the worst case is
 shown.
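 The ratio can be computed from the term counts of the coefficients
 roughly as follows (a hypothetical helper, not the code in un.sage):

```python
def half_ratio(term_counts):
    # term_counts[i] = number of terms in the coefficient of t^i;
    # return (terms in the upper half) / (terms in the lower half)
    h = len(term_counts) // 2
    return sum(term_counts[h:]) / sum(term_counts[:h])

# e.g. the first six term counts of the inverse-series example above,
# [1, 1, 2, 2, 3, 3], are already top-heavy: half_ratio gives 2.0
```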

 {{{
 n = number of variables in the polynomial ring over QQ; h = precision;
 N = number of times p = p*(p+1) is applied; times in seconds
 n=1
 h=20  N=1     2    3     4    5    6   7
 ratio  0.86  0.98  1     1    1    1   1
 main   0.005 0.008 0.022 0.09 0.41 3.5 53.5
 patch  0.004 0.006 0.016 0.06 0.31 2.5 41.5
 h=100
 ratio  0.93  1     1     1    1
 main   0.049 0.14  0.47  1.6  7.9
 patch  0.043 0.10  0.31  1.1  5.5
 h=500
 ratio  0.96  1.0
 main   1.5   20.9
 patch1 2.0   49.1
 patch  1.3   17.0

 n=2
 h=20
 ratio  0.81  1.06  1.10  1.19 1.34
 main   0.009 0.04  0.30  4.3  134
 patch  0.007 0.03  0.22  2.6  61
 h=100
 ratio  1.05  1.01  1.0   1.0
 main   0.11  0.80  7.8   171
 patch  0.09  0.59  5.5   103

 n=4
 h=20
 ratio  1.08  1.53  1.64
 main   0.015 0.35  21.0
 patch  0.010 0.19  10.9

 n=8
 h=10
 ratio  1.28  2.67  5.43
 main   0.008 0.38  127
 patch  0.006 0.046 4.4

 n=16
 h=10
 ratio  1.00  2.77  8.90
 main   0.013 1.95  > 19m(6GB)
 patch  0.009 0.14  29.0(0.5GB)
 }}}

 In the last example I stopped the main computation because it used too
 much memory.

 > I will try to implement the algorithm of "A long note on Mulders' short
 > product" by Hanrot and Zimmermann, which looks no harder than Karatsuba.
 > The authors claim that heuristically the gain is on average 0.7 x
 > Karatsuba time.

 If I understand correctly, that paper deals with univariate series
 over numeric rings. Maybe in the case of univariate series over
 polynomial rings one could use the list of term counts of the
 coefficients to guide the algorithm.
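
 For reference, the short product of that paper can be sketched on plain
 coefficient lists as follows (illustrative code only; `mul_plain`
 stands in for the full Karatsuba product, and beta is the cutoff
 fraction analysed in the paper):

```python
def mul_plain(a, b):
    # full product of coefficient lists; stands in for Karatsuba
    if not a or not b:
        return []
    res = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            res[i + j] += x * y
    return res

def short_product(a, b, n, beta=0.7):
    # Mulders-style short product sketch: the n low-order coefficients
    # of a*b, via one full k x k product plus two short products of
    # size n - k
    a = (a + [0] * n)[:n]   # pad/truncate both inputs to length n
    b = (b + [0] * n)[:n]
    if n <= 4:
        return mul_plain(a, b)[:n]
    k = int(beta * n)       # with beta >= 0.5 the three parts below
                            # cover each pair i + j < n exactly once
    res = mul_plain(a[:k], b[:k])[:n]                      # i < k, j < k
    res += [0] * (n - len(res))
    for part in (short_product(a[k:], b[:n - k], n - k),   # i >= k
                 short_product(a[:n - k], b[k:], n - k)):  # j >= k
        for i, c in enumerate(part):
            res[k + i] += c
    return res
```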

-- 
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/10480#comment:5>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB
