#8096: Speed up parent creation for multiplication of square matrices
------------------------------+---------------------------------------------
Reporter: boothby | Owner: boothby
Type: enhancement | Status: needs_review
Priority: minor | Milestone: sage-5.0
Component: linear algebra | Keywords:
Work_issues: rebase | Upstream: N/A
Reviewer: | Author: boothby, robertwb
Merged: | Dependencies:
------------------------------+---------------------------------------------
Changes (by SimonKing):
* status: needs_work => needs_review
Old description:
> Multiplication of small square matrices is ridiculously slow:
> {{{
> sage: for d in range(1, 100):
> ... print d
> ... A = random_matrix(GF(3), d)
> ... B = random_matrix(GF(3), d)
> ... timeit("C = A*B")
>
> 1
> 625 loops, best of 3: 93.8 µs per loop
> 2
> 625 loops, best of 3: 93.9 µs per loop
> 3
> 625 loops, best of 3: 94.2 µs per loop
> 4
> 625 loops, best of 3: 94.1 µs per loop
> 5
> 625 loops, best of 3: 94.7 µs per loop
> 6
> 625 loops, best of 3: 94.9 µs per loop
> 7
> 625 loops, best of 3: 95.2 µs per loop
> 8
> 625 loops, best of 3: 95.8 µs per loop
> 9
> 625 loops, best of 3: 96.8 µs per loop
> 10
> 625 loops, best of 3: 97.6 µs per loop
> 11
> 625 loops, best of 3: 98.1 µs per loop
> 12
> 625 loops, best of 3: 101 µs per loop
> 13
> 625 loops, best of 3: 101 µs per loop
> 14
> 625 loops, best of 3: 104 µs per loop
> 15
> 625 loops, best of 3: 104 µs per loop
> 16
> 625 loops, best of 3: 108 µs per loop
> 17
> 625 loops, best of 3: 108 µs per loop
> 18
> 625 loops, best of 3: 113 µs per loop
> 19
> 625 loops, best of 3: 112 µs per loop
> 20
> 625 loops, best of 3: 118 µs per loop
> 21
> 625 loops, best of 3: 117 µs per loop
> 22
> 625 loops, best of 3: 125 µs per loop
> 23
> 625 loops, best of 3: 123 µs per loop
> 24
> 625 loops, best of 3: 133 µs per loop
> 25
> 625 loops, best of 3: 129 µs per loop
> 26
> 625 loops, best of 3: 143 µs per loop
> 27
> 625 loops, best of 3: 137 µs per loop
> 28
> 625 loops, best of 3: 155 µs per loop
> 29
> 625 loops, best of 3: 147 µs per loop
> 30
> 625 loops, best of 3: 166 µs per loop
> 31
> 625 loops, best of 3: 157 µs per loop
> 32
> 625 loops, best of 3: 179 µs per loop
> 33
> 625 loops, best of 3: 168 µs per loop
> 34
> 625 loops, best of 3: 196 µs per loop
> 35
> 625 loops, best of 3: 182 µs per loop
> 36
> 625 loops, best of 3: 214 µs per loop
> 37
> 625 loops, best of 3: 198 µs per loop
> 38
> 625 loops, best of 3: 234 µs per loop
> 39
> 625 loops, best of 3: 213 µs per loop
> 40
> 625 loops, best of 3: 255 µs per loop
> 41
> 625 loops, best of 3: 231 µs per loop
> 42
> 625 loops, best of 3: 279 µs per loop
> 43
> 625 loops, best of 3: 251 µs per loop
> 44
> 625 loops, best of 3: 307 µs per loop
> 45
> 625 loops, best of 3: 271 µs per loop
> 46
> 625 loops, best of 3: 335 µs per loop
> 47
> 625 loops, best of 3: 298 µs per loop
> 48
> 625 loops, best of 3: 363 µs per loop
> 49
> 625 loops, best of 3: 319 µs per loop
> 50
> 625 loops, best of 3: 401 µs per loop
> 51
> 625 loops, best of 3: 346 µs per loop
> }}}
>
> Here's a profile of the 1x1 case:
>
> {{{
> 625 loops, best of 3: 91.7 µs per loop
> 810004 function calls in 5.980 CPU seconds
>
> Ordered by: standard name
>
> ncalls tottime percall cumtime percall filename:lineno(function)
> 40000 0.100 0.000 0.100 0.000 :0(IntegerMod)
> 30000 0.080 0.000 0.080 0.000 :0(append)
> 10000 0.030 0.000 0.030 0.000 :0(base_ring)
> 10000 0.150 0.000 0.900 0.000 :0(category)
> 40000 0.250 0.000 0.290 0.000 :0(has_key)
> 10000 0.070 0.000 0.070 0.000 :0(hasattr)
> 10000 0.030 0.000 0.030 0.000 :0(is_Matrix)
> 80000 0.250 0.000 0.250 0.000 :0(isinstance)
> 30000 0.070 0.000 0.070 0.000 :0(keys)
> 30000 0.040 0.000 0.040 0.000 :0(len)
> 30001 0.080 0.000 0.080 0.000 :0(range)
> 10000 0.030 0.000 0.030 0.000 :0(setdefault)
> 1 0.000 0.000 0.000 0.000 :0(setprofile)
> 30000 0.140 0.000 0.140 0.000 :0(sorted)
> 1 0.380 0.380 5.980 5.980 <string>:1(<module>)
> 30000 0.250 0.000 1.750 0.000 cachefunc.py:155(get_key)
> 10000 0.060 0.000 0.850 0.000 cachefunc.py:220(__call__)
> 10000 0.060 0.000 0.090 0.000 cachefunc.py:254(get_cache)
> 10000 0.050 0.000 0.050 0.000 cachefunc.py:275(__get__)
> 20000 0.270 0.000 1.550 0.000 cachefunc.py:76(__call__)
> 20000 0.060 0.000 0.060 0.000 cachefunc.py:95(get_cache)
> 10000 0.070 0.000 2.520 0.000
> category.py:459(__contains__)
> 10000 0.360 0.000 1.550 0.000
> category.py:651(is_subcategory)
> 20000 0.120 0.000 1.670 0.000
> classcall_metaclass.py:64(__call__)
> 20000 0.040 0.000 0.040 0.000
> finite_field_prime_modn.py:121(characteristic)
> 20000 0.110 0.000 0.110 0.000
> finite_field_prime_modn.py:187(order)
> 30000 1.010 0.000 1.500 0.000
> function_mangling.py:205(fix_to_pos)
> 30000 0.080 0.000 0.080 0.000
> function_mangling.py:261(<genexpr>)
> 40000 0.290 0.000 0.390 0.000
> integer_mod_ring.py:733(__call__)
> 10000 0.050 0.000 0.070 0.000
> matrix_group_element.py:68(is_MatrixGroupElement)
> 10000 0.450 0.000 0.710 0.000
> matrix_space.py:1035(matrix)
> 10000 0.140 0.000 4.000 0.000
> matrix_space.py:1089(matrix_space)
> 10000 0.200 0.000 3.830 0.000
> matrix_space.py:110(MatrixSpace)
> 10000 0.000 0.000 0.000 0.000 matrix_space.py:1112(ncols)
> 10000 0.030 0.000 0.030 0.000 matrix_space.py:1124(nrows)
> 10000 0.310 0.000 1.270 0.000
> matrix_space.py:271(__call__)
> 10000 0.000 0.000 0.000 0.000 misc.py:514(get_verbose)
> 1 0.000 0.000 5.980 5.980 profile:0(for i in
> range(10000): C = A*B)
> 0 0.000 0.000 profile:0(profiler)
> 90000 0.270 0.000 0.270 0.000
> unique_representation.py:454(__eq__)
> }}}
New description:
Multiplication of small square matrices is ridiculously slow:
{{{
sage: for d in range(1, 100):
... print d
... A = random_matrix(GF(3), d)
... B = random_matrix(GF(3), d)
... timeit("C = A*B")
1
625 loops, best of 3: 93.8 µs per loop
2
625 loops, best of 3: 93.9 µs per loop
3
625 loops, best of 3: 94.2 µs per loop
4
625 loops, best of 3: 94.1 µs per loop
5
625 loops, best of 3: 94.7 µs per loop
6
625 loops, best of 3: 94.9 µs per loop
7
625 loops, best of 3: 95.2 µs per loop
8
625 loops, best of 3: 95.8 µs per loop
9
625 loops, best of 3: 96.8 µs per loop
10
625 loops, best of 3: 97.6 µs per loop
11
625 loops, best of 3: 98.1 µs per loop
12
625 loops, best of 3: 101 µs per loop
13
625 loops, best of 3: 101 µs per loop
14
625 loops, best of 3: 104 µs per loop
15
625 loops, best of 3: 104 µs per loop
16
625 loops, best of 3: 108 µs per loop
17
625 loops, best of 3: 108 µs per loop
18
625 loops, best of 3: 113 µs per loop
19
625 loops, best of 3: 112 µs per loop
20
625 loops, best of 3: 118 µs per loop
21
625 loops, best of 3: 117 µs per loop
22
625 loops, best of 3: 125 µs per loop
23
625 loops, best of 3: 123 µs per loop
24
625 loops, best of 3: 133 µs per loop
25
625 loops, best of 3: 129 µs per loop
26
625 loops, best of 3: 143 µs per loop
27
625 loops, best of 3: 137 µs per loop
28
625 loops, best of 3: 155 µs per loop
29
625 loops, best of 3: 147 µs per loop
30
625 loops, best of 3: 166 µs per loop
31
625 loops, best of 3: 157 µs per loop
32
625 loops, best of 3: 179 µs per loop
33
625 loops, best of 3: 168 µs per loop
34
625 loops, best of 3: 196 µs per loop
35
625 loops, best of 3: 182 µs per loop
36
625 loops, best of 3: 214 µs per loop
37
625 loops, best of 3: 198 µs per loop
38
625 loops, best of 3: 234 µs per loop
39
625 loops, best of 3: 213 µs per loop
40
625 loops, best of 3: 255 µs per loop
41
625 loops, best of 3: 231 µs per loop
42
625 loops, best of 3: 279 µs per loop
43
625 loops, best of 3: 251 µs per loop
44
625 loops, best of 3: 307 µs per loop
45
625 loops, best of 3: 271 µs per loop
46
625 loops, best of 3: 335 µs per loop
47
625 loops, best of 3: 298 µs per loop
48
625 loops, best of 3: 363 µs per loop
49
625 loops, best of 3: 319 µs per loop
50
625 loops, best of 3: 401 µs per loop
51
625 loops, best of 3: 346 µs per loop
}}}
Here's a profile of the 1x1 case:
{{{
625 loops, best of 3: 91.7 µs per loop
810004 function calls in 5.980 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
40000 0.100 0.000 0.100 0.000 :0(IntegerMod)
30000 0.080 0.000 0.080 0.000 :0(append)
10000 0.030 0.000 0.030 0.000 :0(base_ring)
10000 0.150 0.000 0.900 0.000 :0(category)
40000 0.250 0.000 0.290 0.000 :0(has_key)
10000 0.070 0.000 0.070 0.000 :0(hasattr)
10000 0.030 0.000 0.030 0.000 :0(is_Matrix)
80000 0.250 0.000 0.250 0.000 :0(isinstance)
30000 0.070 0.000 0.070 0.000 :0(keys)
30000 0.040 0.000 0.040 0.000 :0(len)
30001 0.080 0.000 0.080 0.000 :0(range)
10000 0.030 0.000 0.030 0.000 :0(setdefault)
1 0.000 0.000 0.000 0.000 :0(setprofile)
30000 0.140 0.000 0.140 0.000 :0(sorted)
1 0.380 0.380 5.980 5.980 <string>:1(<module>)
30000 0.250 0.000 1.750 0.000 cachefunc.py:155(get_key)
10000 0.060 0.000 0.850 0.000 cachefunc.py:220(__call__)
10000 0.060 0.000 0.090 0.000 cachefunc.py:254(get_cache)
10000 0.050 0.000 0.050 0.000 cachefunc.py:275(__get__)
20000 0.270 0.000 1.550 0.000 cachefunc.py:76(__call__)
20000 0.060 0.000 0.060 0.000 cachefunc.py:95(get_cache)
10000 0.070 0.000 2.520 0.000
category.py:459(__contains__)
10000 0.360 0.000 1.550 0.000
category.py:651(is_subcategory)
20000 0.120 0.000 1.670 0.000
classcall_metaclass.py:64(__call__)
20000 0.040 0.000 0.040 0.000
finite_field_prime_modn.py:121(characteristic)
20000 0.110 0.000 0.110 0.000
finite_field_prime_modn.py:187(order)
30000 1.010 0.000 1.500 0.000
function_mangling.py:205(fix_to_pos)
30000 0.080 0.000 0.080 0.000
function_mangling.py:261(<genexpr>)
40000 0.290 0.000 0.390 0.000
integer_mod_ring.py:733(__call__)
10000 0.050 0.000 0.070 0.000
matrix_group_element.py:68(is_MatrixGroupElement)
10000 0.450 0.000 0.710 0.000 matrix_space.py:1035(matrix)
10000 0.140 0.000 4.000 0.000
matrix_space.py:1089(matrix_space)
10000 0.200 0.000 3.830 0.000
matrix_space.py:110(MatrixSpace)
10000 0.000 0.000 0.000 0.000 matrix_space.py:1112(ncols)
10000 0.030 0.000 0.030 0.000 matrix_space.py:1124(nrows)
10000 0.310 0.000 1.270 0.000
matrix_space.py:271(__call__)
10000 0.000 0.000 0.000 0.000 misc.py:514(get_verbose)
1 0.000 0.000 5.980 5.980 profile:0(for i in
range(10000): C = A*B)
0 0.000 0.000 profile:0(profiler)
90000 0.270 0.000 0.270 0.000
unique_representation.py:454(__eq__)
}}}
Apply [attachment:trac8096_speedup_matrix_parent.patch]
--
Comment:
I have rebased the patches, combining them into one patch. In fact, a part
(but not all) of the second patch is not needed anymore, since the
creation of the zero matrix has been improved in another ticket.
Here is evidence that the patch still does improve the timings for the
computation of small matrices - even after we have switched to linbox:
Without patch:
{{{
sage: d = 1
sage: A = random_matrix(GF(3), d)
sage: B = random_matrix(GF(3), d)
sage: timeit("C = A*B")
625 loops, best of 3: 40.5 µs per loop
}}}
With the patch:
{{{
sage: d = 1
sage: A = random_matrix(GF(3), d)
sage: B = random_matrix(GF(3), d)
sage: timeit("C = A*B")
625 loops, best of 3: 18.5 µs per loop
}}}
I need to run the doctests, though. But needs review.
Apply trac8096_speedup_matrix_parent.patch
--
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/8096#comment:40>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB
--
You received this message because you are subscribed to the Google Groups
"sage-trac" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/sage-trac?hl=en.