On Mon, Aug 1, 2011 at 9:31 PM, Adam Byrd <adam1.byrd at gmail.com> wrote:
> On Mon, Aug 1, 2011 at 5:09 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>> On Aug 1, 2011, at 3:00 PM, Adam Byrd wrote:
>>
>> > Hello,
>> >
>> > I'm looking for help reducing the time and communication of a parallel
>> > MatMatSolve using MUMPS. On a single processor I experience decent solve
>> > times (~9 seconds each), but when moving to multiple processors I see
>> > longer times with more cores. I've run with -log_summary and confirmed
>> > (practically) all the time is spent in MatMatSolve. I'm fairly certain
>> > it's all communication between nodes and I'm trying to figure out where
>> > I can make optimizations, or if it is even feasible for this type of
>> > problem. It is a parallel, dense,
>>
>>    I hope you mean that the original matrix you use with MUMPS is sparse
>> (you should not use MUMPS to solve dense linear systems).
>>
>
> Oops, yes. The original matrix is sparse. It requires the solution and
> identity matrix to be dense. I was typing faster than thinking.
>
>> > direct solve using MUMPS with an LU preconditioner. I know there are
>> > many smaller optimizations that can be done in other areas, but at the
>> > moment it is only the solve that concerns me.
>>
>>    MUMPS will run slower on 2 processors than 1, this is just a fact of
>> life. You will only gain with parallel for MUMPS for large problems.
>>
>
> I see. It looks like I took off in the wrong direction, then. I'm trying to
> solve for the inverse of a sparse matrix in parallel. I'm starting at
> 3600x3600 and will be moving to 30,000x30,000+ in the future. Which solver
> suits this sort of problem?
>

The key to parallel computing (and most other things) is choosing the right
problem. This, unfortunately, is not a problem that lends itself to
parallelism.

   Matt

>>    Barry
>>
>> >
>> > ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>> >
>> > ./cntor on a complex-c named hpc-1-0.local with 2 processors, by abyrd Mon Aug 1 16:25:51 2011
>> > Using Petsc Release Version 3.1.0, Patch 8, Thu Mar 17 13:37:48 CDT 2011
>> >
>> >                          Max       Max/Min   Avg        Total
>> > Time (sec):           1.307e+02    1.00000   1.307e+02
>> > Objects:              1.180e+02    1.00000   1.180e+02
>> > Flops:                0.000e+00    0.00000   0.000e+00  0.000e+00
>> > Flops/sec:            0.000e+00    0.00000   0.000e+00  0.000e+00
>> > Memory:               2.091e+08    1.00001              4.181e+08
>> > MPI Messages:         7.229e+03    1.00000   7.229e+03  1.446e+04
>> > MPI Message Lengths:  4.141e+08    1.00000   5.729e+04  8.283e+08
>> > MPI Reductions:       1.464e+04    1.00000
>> >
>> > Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>> >                           e.g., VecAXPY() for real vectors of length N --> 2N flops
>> >                           and VecAXPY() for complex vectors of length N --> 8N flops
>> >
>> > Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>> >                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>> >  0:      Main Stage: 1.3072e+02 100.0%  0.0000e+00   0.0%  1.446e+04 100.0%  5.729e+04      100.0%  1.730e+02   1.2%
>> >
>> > ------------------------------------------------------------------------------------------------------------------------
>> > See the 'Profiling' chapter of the users' manual for details on interpreting output.
>> > Phase summary info:
>> >    Count: number of times phase was executed
>> >    Time and Flops: Max - maximum over all processors
>> >                    Ratio - ratio of maximum to minimum over all processors
>> >    Mess: number of messages sent
>> >    Avg. len: average message length
>> >    Reduct: number of global reductions
>> >    Global: entire computation
>> >    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>> >       %T - percent time in this phase         %F - percent flops in this phase
>> >       %M - percent messages in this phase     %L - percent message lengths in this phase
>> >       %R - percent reductions in this phase
>> >    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
>> > ------------------------------------------------------------------------------------------------------------------------
>> >
>> >   ##########################################################
>> >   #                                                        #
>> >   #                       WARNING!!!                       #
>> >   #                                                        #
>> >   #   This code was compiled with a debugging option,      #
>> >   #   To get timing results run config/configure.py        #
>> >   #   using --with-debugging=no, the performance will      #
>> >   #   be generally two or three times faster.              #
>> >   #                                                        #
>> >   ##########################################################
>> >
>> >   ##########################################################
>> >   #                                                        #
>> >   #                       WARNING!!!                       #
>> >   #                                                        #
>> >   #   The code for various complex numbers numerical       #
>> >   #   kernels uses C++, which generally is not well        #
>> >   #   optimized. For performance that is about 4-5 times   #
>> >   #   faster, specify --with-fortran-kernels=1             #
>> >   #   when running config/configure.py.                    #
>> >   #                                                        #
>> >   ##########################################################
>> >
>> > Event                Count      Time (sec)      Flops                            --- Global ---   --- Stage ---   Total
>> >                    Max Ratio  Max        Ratio  Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>> > ------------------------------------------------------------------------------------------------------------------------
>> >
>> > --- Event Stage 0: Main Stage
>> >
>> > MatSolve           14400 1.0 1.2364e+02  1.0 0.00e+00 0.0 1.4e+04 5.7e+04 2.0e+01 95  0 100 100  0  95  0 100 100 12     0
>> > MatLUFactorSym         4 1.0 2.0027e-05  1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
>> > MatLUFactorNum         4 1.0 3.4223e+00  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01  3  0   0   0  0   3  0   0   0 14     0
>> > MatConvert             1 1.0 2.3644e-01  2.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+01  0  0   0   0  0   0  0   0   0  6     0
>> > MatAssemblyBegin      14 1.0 1.9959e-01  9.3 0.00e+00 0.0 3.0e+01 5.2e+04 1.2e+01  0  0   0   0  0   0  0   0   0  7     0
>> > MatAssemblyEnd        14 1.0 1.9908e-01  1.1 0.00e+00 0.0 4.0e+00 2.8e+01 2.0e+01  0  0   0   0  0   0  0   0   0 12     0
>> > MatGetRow             32 1.0 4.2677e-05  1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
>> > MatGetSubMatrice       4 1.0 7.6661e-03  1.0 0.00e+00 0.0 1.6e+01 1.2e+05 2.4e+01  0  0   0   0  0   0  0   0   0 14     0
>> > MatMatSolve            4 1.0 1.2380e+02  1.0 0.00e+00 0.0 1.4e+04 5.7e+04 2.0e+01 95  0 100 100  0  95  0 100 100 12     0
>> > VecSet                 4 1.0 1.8590e-02  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
>> > VecScatterBegin    28800 1.0 2.2810e+00  2.2 0.00e+00 0.0 1.4e+04 5.7e+04 0.0e+00  1  0 100 100  0   1  0 100 100  0     0
>> > VecScatterEnd      14400 1.0 4.1534e+00  2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0   0   0  0   2  0   0   0  0     0
>> > KSPSetup               4 1.0 1.1060e-02 12.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
>> > PCSetUp                4 1.0 3.4280e+00  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.6e+01  3  0   0   0  0   3  0   0   0 32     0
>> > ------------------------------------------------------------------------------------------------------------------------
>> >
>> > Memory usage is given in bytes:
>> >
>> > Object Type          Creations   Destructions   Memory   Descendants' Mem.
>> > Reports information only for process 0.
>> >
>> > --- Event Stage 0: Main Stage
>> >
>> >               Matrix    27             27      208196712     0
>> >                  Vec    36             36        1027376     0
>> >          Vec Scatter    11             11           7220     0
>> >            Index Set    42             42          22644     0
>> >        Krylov Solver     1              1          34432     0
>> >       Preconditioner     1              1            752     0
>> > ========================================================================================================================
>> > Average time to get PetscTime(): 1.90735e-07
>> > Average time for MPI_Barrier(): 3.8147e-06
>> > Average time for zero size MPI_Send(): 7.51019e-06
>> > #PETSc Option Table entries:
>> > -log_summary
>> > -pc_factor_mat_solver_package mumps
>> > -pc_type lu
>> > #End of PETSc Option Table entries
>> > Compiled without FORTRAN kernels
>> > Compiled with full precision matrices (default)
>> > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16
>> > Configure run at: Mon Jul 11 15:28:42 2011
>> > Configure options: PETSC_ARCH=complex-cpp-mumps --with-cc=mpicc --with-fc=mpif90 --with-blas-lapack-dir=/usr/lib64 --with-shared --with-clanguage=c++ --with-scalar-type=complex --download-mumps=1 --download-blacs=1 --download-scalapack=1 --download-parmetis=1 --with-cxx=mpicxx
>> > -----------------------------------------
>> > Libraries compiled on Mon Jul 11 15:39:58 EDT 2011 on sc.local
>> > Machine characteristics: Linux sc.local 2.6.18-194.11.1.el5 #1 SMP Tue Aug 10 19:05:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
>> > Using PETSc directory: /panfs/storage.local/scs/home/abyrd/petsc-3.1-p8
>> > Using PETSc arch: complex-cpp-mumps
>> > -----------------------------------------
>> > Using C compiler: mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -g -fPIC
>> > Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -g
>> > -----------------------------------------
>> > Using include paths: -I/panfs/storage.local/scs/home/abyrd/petsc-3.1-p8/complex-cpp-mumps/include -I/panfs/storage.local/scs/home/abyrd/petsc-3.1-p8/include -I/panfs/storage.local/scs/home/abyrd/petsc-3.1-p8/complex-cpp-mumps/include -I/usr/mpi/gnu/openmpi-1.4.2/include -I/usr/mpi/gnu/openmpi-1.4.2/lib64
>> > ------------------------------------------
>> > Using C linker: mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -g
>> > Using Fortran linker: mpif90 -fPIC -Wall -Wno-unused-variable -g
>> > Using libraries: -Wl,-rpath,/panfs/storage.local/scs/home/abyrd/petsc-3.1-p8/complex-cpp-mumps/lib -L/panfs/storage.local/scs/home/abyrd/petsc-3.1-p8/complex-cpp-mumps/lib -lpetsc -lX11 -Wl,-rpath,/panfs/storage.local/scs/home/abyrd/petsc-3.1-p8/complex-cpp-mumps/lib -L/panfs/storage.local/scs/home/abyrd/petsc-3.1-p8/complex-cpp-mumps/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -Wl,-rpath,/usr/lib64 -L/usr/lib64 -llapack -lblas -lnsl -lrt -Wl,-rpath,/usr/mpi/gnu/openmpi-1.4.2/lib64 -L/usr/mpi/gnu/openmpi-1.4.2/lib64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl
>> >
>> > Respectfully,
>> > Adam Byrd
>> > <PETScCntor.zip>
>>

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
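
A minimal sketch of the workflow discussed in this thread: assemble a sparse PETSc matrix, let the runtime options -pc_type lu -pc_factor_mat_solver_package mumps pick MUMPS for the LU factorization, and call MatMatSolve() against a dense identity so that X becomes the inverse. This is not the poster's cntor code: the tridiagonal placeholder operator, the size n = 3600, and the file name invert_sketch.c are made up for illustration, and the calls follow the PETSc 3.1-era API used above (e.g., MatDestroy() taking the object rather than a pointer, KSPSetOperators() with a MatStructure flag).

/* invert_sketch.c -- hedged sketch, not the thread's actual program.
 * Run with the same options as the log above, e.g.:
 *   mpiexec -n 2 ./invert_sketch -pc_type lu -pc_factor_mat_solver_package mumps -log_summary
 */
#include "petscksp.h"

int main(int argc, char **argv)
{
  Mat            A, F, Id, X;   /* sparse operator, LU factor, dense identity, dense inverse */
  KSP            ksp;
  PC             pc;
  PetscInt       n = 3600, i, rstart, rend;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);CHKERRQ(ierr);

  /* Placeholder sparse matrix (tridiagonal); a real application assembles its own operator. */
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {
    if (i > 0)   { ierr = MatSetValue(A, i, i-1, -1.0, INSERT_VALUES);CHKERRQ(ierr); }
    if (i < n-1) { ierr = MatSetValue(A, i, i+1, -1.0, INSERT_VALUES);CHKERRQ(ierr); }
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* Dense identity as the multi-column right-hand side, and a dense X to receive inv(A). */
  ierr = MatCreateMPIDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, PETSC_NULL, &Id);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(Id, &rstart, &rend);CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {
    ierr = MatSetValue(Id, i, i, 1.0, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(Id, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(Id, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatDuplicate(Id, MAT_DO_NOT_COPY_VALUES, &X);CHKERRQ(ierr);

  /* LU preconditioner; the factorization package (MUMPS) is chosen by the runtime options. */
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A, DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);   /* picks up -pc_factor_mat_solver_package mumps */
  ierr = KSPSetUp(ksp);CHKERRQ(ierr);            /* triggers the symbolic and numeric LU factorization */

  /* Solve A*X = I column by column; X now holds the (dense) inverse of A. */
  ierr = PCFactorGetMatrix(pc, &F);CHKERRQ(ierr);
  ierr = MatMatSolve(F, Id, X);CHKERRQ(ierr);

  /* F belongs to the PC and is not destroyed here. */
  ierr = MatDestroy(A);CHKERRQ(ierr);
  ierr = MatDestroy(Id);CHKERRQ(ierr);
  ierr = MatDestroy(X);CHKERRQ(ierr);
  ierr = KSPDestroy(ksp);CHKERRQ(ierr);
  ierr = PetscFinalize();CHKERRQ(ierr);
  return 0;
}

As the log above shows, MatMatSolve() dominates the runtime: it is repeated triangular solves whose cost is mostly communication, which is why Barry and Matt caution that adding processes to a problem of this size mainly adds message traffic rather than speed.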
