#15104: Special case modn_dense matrix operations to improve performance
-------------------------------------+-------------------------------------
       Reporter:  nbruin             |        Owner:
           Type:  enhancement        |       Status:  needs_work
       Priority:  major              |    Milestone:  sage-6.2
      Component:  linear algebra     |   Resolution:
       Keywords:                     |    Merged in:
        Authors:  Nils Bruin         |    Reviewers:
Report Upstream:  N/A                |  Work issues:
         Branch:                     |       Commit:
  u/nbruin/ticket/15104              |  a908e28159a544ca33f03dcbf0e8def3cfe9a60e
   Dependencies:                     |     Stopgaps:
-------------------------------------+-------------------------------------

Comment (by nbruin):

 Some timings. I changed `dense_template.transpose` to reuse the parent of
 `self` for square matrices, since their transpose lives in the same matrix
 space.
 With this inner copy loop:
 {{{
         for i from 0 <= i < ncols:
             for j from 0 <= j < nrows:
                 M._entries[j+i*nrows] = self._entries[i+j*ncols]
 }}}
 I get:
 {{{
 sage: k=GF(17)
 sage: A=matrix(k,100,101,[k.random_element() for i in range(100*101)])
 sage: B=matrix(k,100,100,[k.random_element() for i in range(100*100)])
 sage: %timeit At=A.transpose()
 10000 loops, best of 3: 30.3 us per loop
 sage: %timeit Bt=B.transpose()
 100000 loops, best of 3: 11 us per loop
 }}}
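 As you can see, parent creation overhead is still the main cost: the
 non-square transpose, which needs a new parent, takes almost three times as
 long as the square one.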
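 To make the index arithmetic of the first loop explicit, here is a rough
 pure-Python model of it. It assumes, as the loop does, that `_entries` holds
 the matrix in row-major order, so entry (r, c) of an `nrows` x `ncols`
 matrix sits at `r*ncols + c`. The name `transpose_flat` is hypothetical, and
 the model only mirrors the indexing, not the Cython performance.

```python
def transpose_flat(entries, nrows, ncols):
    """Transpose a row-major flat list of length nrows*ncols.

    Mirrors the first Cython loop: entry (i, j) of the ncols x nrows
    result equals entry (j, i) of the source.
    """
    out = [0] * (nrows * ncols)
    for i in range(ncols):
        for j in range(nrows):
            # output row i, column j  <-  source row j, column i
            out[j + i * nrows] = entries[i + j * ncols]
    return out


# 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] stored row-major
A = [1, 2, 3, 4, 5, 6]
At = transpose_flat(A, 2, 3)  # 3 x 2 result, also row-major
print(At)  # [1, 4, 2, 5, 3, 6]
```

 Writes to `out` are sequential, while reads from `entries` stride by
 `ncols`; that access pattern is the same in all three variants below.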

 With this inner loop
 {{{
         for i from 0 <= i < ncols:
             for j from 0 <= j < nrows:
                 M._matrix[i][j] = self._matrix[j][i]
 }}}
 I get:
 {{{
 sage: %timeit At=A.transpose()
 10000 loops, best of 3: 35.8 us per loop
 sage: %timeit Bt=B.transpose()
 100000 loops, best of 3: 15.8 us per loop
 }}}
 as one of the better timings. Depending on how the matrix was created (but
 consistently for a given matrix), I also saw `23.4 us`; my guess is that
 this happens when the `._matrix` pointer array is allocated at an
 unfortunate location in memory relative to `_entries` (cache thrashing,
 perhaps).
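 The second loop goes through `_matrix`, the array of row pointers into
 `_entries`, so every access adds an indirection and touches a second
 allocation. The sketch below models those row pointers with `memoryview`
 slices of a flat `array`; the names `row_views` and `transpose_rows` and the
 `'l'` typecode are assumptions for illustration, and the model cannot
 reproduce the cache effects themselves.

```python
from array import array


def row_views(buf, nrows, ncols):
    """Model of a row-pointer array: one view per row into the flat buffer."""
    mv = memoryview(buf)
    return [mv[r * ncols:(r + 1) * ncols] for r in range(nrows)]


def transpose_rows(entries, nrows, ncols):
    """Transpose via row views, like M._matrix[i][j] = self._matrix[j][i]."""
    src = row_views(entries, nrows, ncols)
    out = array(entries.typecode, [0]) * len(entries)
    dst = row_views(out, ncols, nrows)  # the result is ncols x nrows
    for i in range(ncols):
        for j in range(nrows):
            # writes through the view land in the flat buffer `out`
            dst[i][j] = src[j][i]
    return out


A = array('l', [1, 2, 3, 4, 5, 6])  # 2 x 3, row-major
print(transpose_rows(A, 2, 3).tolist())  # [1, 4, 2, 5, 3, 6]
```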

 I've also tried:
 {{{
         Midx=0
         for i from 0 <= i < ncols:
             selfidx=i
             for j from 0 <= j < nrows:
                 M._entries[Midx]=self._entries[selfidx]
                 Midx+=1
                 selfidx+=ncols
 }}}
 which was not really distinguishable from the first solution; if anything,
 it was slightly slower. So my guess is that the index multiplication is not
 something to worry about on modern CPUs.
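 The third variant is a manual strength reduction: the products `i*nrows`
 and `j*ncols` are replaced by running counters. In the pure-Python model
 below (hypothetical name `transpose_incremental`), `midx` always equals
 `j + i*nrows` and `selfidx` equals `i + j*ncols`, so it computes exactly the
 same result as the first loop. Optimizing C compilers often perform this
 kind of induction-variable strength reduction on their own, which may be
 another reason no difference showed up in the timings.

```python
def transpose_incremental(entries, nrows, ncols):
    """Row-major transpose using running indices instead of multiplications."""
    out = [0] * (nrows * ncols)
    midx = 0                     # equals j + i*nrows inside the inner loop
    for i in range(ncols):
        selfidx = i              # start of source column i (row 0)
        for j in range(nrows):
            out[midx] = entries[selfidx]
            midx += 1            # next slot in the output (contiguous)
            selfidx += ncols     # step down one row in the source
    return out


print(transpose_incremental([1, 2, 3, 4, 5, 6], 2, 3))  # [1, 4, 2, 5, 3, 6]
```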

--
Ticket URL: <http://trac.sagemath.org/ticket/15104#comment:15>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB
