I added proper preallocation for MatTranspose_MPIAIJ(), which speeds it up greatly (roughly 13x in the runs below).

https://bitbucket.org/petsc/petsc-dev/changeset/486d000050ec62fbd732c0049cb5f09b2b5709b8
https://bitbucket.org/petsc/petsc-dev/changeset/75fca7ed1efa754ca010596a8ba69319501baf52 (oops)

Testing on cg

$ mpirun -n 64 ./ex56 -pc_type gamg -ksp_monitor -ksp_rtol 1e-1 -log_summary -mattransposematmult_viamatmatmult 1

*Before:*

-ne 99
MatTranspose      3 1.0 *1.3230e+00* 1.0 0.00e+00 0.0 1.0e+04 2.7e+03 5.1e+01 17  0  3  2  4  33  0  6  7  4     0
MatTrnMatMult     3 1.0  1.8360e+00  1.0 2.26e+07 1.1 2.3e+04 6.0e+03 1.2e+02 24  2  6 12  9  46 10 13 35 10   765

-ne 119
MatTranspose      3 1.0 *2.3402e+00* 1.0 0.00e+00 0.0 1.3e+04 3.1e+03 5.1e+01 16  0  3  2  4  34  0  6  7  4     0
MatTrnMatMult     3 1.0  3.2240e+00  1.0 3.91e+07 1.1 2.8e+04 6.9e+03 1.2e+02 23  2  6 12  9  46 10 13 35 10   759

*After:*

-ne 99
MatTranspose      3 1.0 *9.5813e-02* 1.0 0.00e+00 0.0 1.0e+04 2.7e+03 4.8e+01  1  0  3  2  4   3  0  6  7  4     0
MatTrnMatMult     3 1.0  6.0673e-01  1.0 2.26e+07 1.1 2.3e+04 6.0e+03 1.2e+02  8  2  6 12  9  21 10 13 35 10  2316

-ne 119
MatTranspose      3 1.0 *1.8572e-01* 1.0 0.00e+00 0.0 1.3e+04 3.1e+03 4.8e+01  2  0  3  2  4   4  0  6  7  4     0
MatTrnMatMult     3 1.0  1.0656e+00  1.0 3.91e+07 1.1 2.8e+04 6.9e+03 1.2e+02 10  2  6 12  9  23 10 13 35 10  2297

*Reference* (-mattransposematmult_viamatmatmult 0):

-ne 99
MatTrnMatMult     3 1.0  8.0196e-01  1.0 1.02e+08 1.1 1.3e+04 1.3e+04 8.7e+01 13 10  4 15  7  28 33  8 40  7  7831

-ne 119
MatTrnMatMult     3 1.0  1.3759e+00  1.0 1.78e+08 1.1 1.6e+04 1.6e+04 8.7e+01 12 10  4 15  7  27 33  8 40  8  7999

I don't know why the reference implementation claims to have done so many more flops.

This indicates that perhaps it makes sense for MatPtAP to do an explicit transpose and then RAP. Unless we can find a fast data structure for A^T * B.
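
For readers unfamiliar with why preallocation matters for a transpose, here is a minimal serial sketch of the general idea. It is not the code from the changesets above and does not touch the MPIAIJ format or any PETSc API; the function name transpose_csr and the plain-list CSR layout are assumptions made for illustration. The point is that a first pass counts the nonzeros in each column of A (the row lengths of A^T), so the output arrays can be allocated once at their final size and the second pass only writes into slots that already exist.

```python
def transpose_csr(n_rows, n_cols, row_ptr, col_idx, values):
    """Transpose a CSR matrix (hypothetical helper, serial only).

    A counting pass sizes every row of A^T exactly before any entry
    is written, so no storage is grown during the fill.
    """
    # Pass 1 (the "preallocation"): count nonzeros per column of A,
    # which is the length of each row of A^T.
    counts = [0] * n_cols
    for j in col_idx:
        counts[j] += 1

    # A prefix sum turns the counts into row offsets for A^T.
    t_row_ptr = [0] * (n_cols + 1)
    for j in range(n_cols):
        t_row_ptr[j + 1] = t_row_ptr[j] + counts[j]

    # Allocate the output arrays once, at their final size.
    nnz = t_row_ptr[n_cols]
    t_col_idx = [0] * nnz
    t_values = [0.0] * nnz

    # Pass 2: scatter each entry (i, j, v) of A into row j of A^T.
    next_slot = t_row_ptr[:-1].copy()
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            dest = next_slot[j]
            t_col_idx[dest] = i
            t_values[dest] = values[k]
            next_slot[j] += 1

    return t_row_ptr, t_col_idx, t_values


# Usage: A = [[1, 0, 2],
#             [0, 3, 0]]  stored as CSR.
t_row_ptr, t_col_idx, t_values = transpose_csr(
    2, 3, [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
)
```

Without the counting pass, each insertion into a row of A^T may have to grow that row's storage, and that repeated reallocation is the kind of cost that preallocation is meant to avoid.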
