Danyang Su <[email protected]> writes: > Hi All, > > I have test the same example under Ubuntu12.04 X64. The PETSc-dev > version is update to date (GIT Date: 2013-11-01 14:59:20 -0500) and the > installation is smooth without any error. The speedup of MPI version is > linear scalable but the speedup of OpenMP version does not change. *From > the CPU usage, the program still run in one thread when use OpenMP. * > > The commands to run the test are as follows: > > openmp > ./ex2f -threadcomm_type openmp -threadcomm_nthreads 4 -m 1000 -n 1000 > -log_summary log_ex2f_1000x1000_ubuntu1204_omp_p4.log > > mpi > mpiexec -n 4 ./ex2f -m 1000 -n 1000 -log_summary > log_ex2f_1000x1000_ubuntu1204_mpi_p4.log > > This problem is so tricky to me. Can anybody confirm if KSP solver is > parallelized for OpenMP version? > > Thanks and regards, > > Danyang > > On 31/10/2013 4:54 PM, Danyang Su wrote: >> Hi All, >> >> I have a question on the speedup of PETSc when using OpenMP. I can get >> good speedup when using MPI, but no speedup when using OpenMP. >> The example is ex2f with m=100 and n=100. The number of available >> processors is 16 (32 threads) and the OS is Windows Server 2012. The >> log files for 4 and 8 processors are attached. >> >> The commands I used to run with 4 processors are as follows: >> Run using MPI >> mpiexec -n 4 Petsc-windows-ex2f.exe -m 100 -n 100 -log_summary >> log_100x100_mpi_p4.log >> >> Run using OpenMP >> Petsc-windows-ex2f.exe -threadcomm_type openmp -threadcomm_nthreads 4 >> -m 100 -n 100 -log_summary log_100x100_openmp_p4.log >> >> The PETSc used for this test is PETSc for Windows >> http://www.mic-tc.ch/downloads/PETScForWindows.zip, but I guess this >> is not the problem because the same problem exists when I use >> PETSc-dev in Cygwin. I don't know if this problem exists in Linux, >> would anybody help to test? >> >> Thanks and regards, >> >> Danyang > > ************************************************************************************************************************ > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r > -fCourier9' to print this document *** > ************************************************************************************************************************ > > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./ex2f on a linux-gnu-omp-opt named dsu-pc with 1 processor, by root Mon Nov > 4 15:35:47 2013 > With 4 threads per MPI_Comm > Using Petsc Development GIT revision: > 1beacf92f04482972e84431be0032cb960d262c7 GIT Date: 2013-11-01 14:59:20 -0500 > > Max Max/Min Avg Total > Time (sec): 2.376e+02 1.00000 2.376e+02 > Objects: 4.500e+01 1.00000 4.500e+01 > Flops: 2.203e+11 1.00000 2.203e+11 2.203e+11 > Flops/sec: 9.271e+08 1.00000 9.271e+08 9.271e+08 > MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.00000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N --> > 2N flops > and VecAXPY() for complex vectors of length N --> > 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- > -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts %Total > Avg %Total counts %Total > 0: Main Stage: 2.3759e+02 100.0% 2.2028e+11 100.0% 0.000e+00 0.0% > 0.000e+00 0.0% 0.000e+00 0.0% > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on interpreting > output. > Phase summary info: > Count: number of times phase was executed > Time and Flops: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. len: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %f - percent flops in this phase > %M - percent messages in this phase %L - percent message lengths in > this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over > all processors) > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flops > --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > MatMult 2657 1.0 4.1715e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 18 11 0 0 0 18 11 0 0 0 573 > MatSolve 2657 1.0 6.4028e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 27 11 0 0 0 27 11 0 0 0 373 > MatLUFactorNum 1 1.0 1.1149e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 99 > MatILUFactorSym 1 1.0 8.2365e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyBegin 1 1.0 7.8678e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 9.1023e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetRowIJ 1 1.0 1.0014e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 1.2122e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecMDot 2571 1.0 5.1144e+01 1.0 7.95e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 22 36 0 0 0 22 36 0 0 0 1555 > VecNorm 2658 1.0 5.4516e+00 1.0 5.32e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 2 0 0 0 2 2 0 0 0 975 > VecScale 2657 1.0 3.8631e+00 1.0 2.66e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 688 > VecCopy 86 1.0 2.2233e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 88 1.0 1.1501e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAXPY 172 1.0 4.4589e-01 1.0 3.44e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 771 > VecMAXPY 2657 1.0 6.9213e+01 1.0 8.47e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 29 38 0 0 0 29 38 0 0 0 1223 > VecNormalize 2657 1.0 9.3968e+00 1.0 7.97e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 4 4 0 0 0 4 4 0 0 0 848 > KSPGMRESOrthog 2571 1.0 1.1630e+02 1.0 1.59e+11 1.0 0.0e+00 0.0e+00 > 0.0e+00 49 72 0 0 0 49 72 0 0 0 1367 > KSPSetUp 1 1.0 2.8520e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 2.3699e+02 1.0 2.20e+11 1.0 0.0e+00 0.0e+00 > 0.0e+00100100 0 0 0 100100 0 0 0 929 > PCSetUp 1 1.0 2.0609e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 53 > PCApply 2657 1.0 6.4088e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 27 11 0 0 0 27 11 0 0 0 373 > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. 
> > --- Event Stage 0: Main Stage > > Matrix 2 2 151957404 0 > Vector 37 37 296057424 0 > Krylov Solver 1 1 18368 0 > Preconditioner 1 1 984 0 > Index Set 3 3 4002304 0 > Viewer 1 0 0 0 > ======================================================================================================================== > Average time to get PetscTime(): 6.50883e-06 > #PETSc Option Table entries: > -log_summary log_1000x1000_omp_p4.log > -m 1000 > -n 1000 > -threadcomm_nthreads 4 > -threadcomm_type openmp > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > Configure run at: Mon Nov 4 15:22:12 2013 > Configure options: PETSC_ARCH=linux-gnu-omp-opt --with-cc=gcc > --with-fc=gfortran --download-f-blas-lapack --with-mpi=0 --with-openmp > --with-debugging=0
Add --with-threadcomm --with-pthreadclasses to the configure line quoted above; --with-openmp on its own doesn't turn on threadcomm, which would explain why the run stays on a single thread. Use all three flags for now and compare -threadcomm_type openmp to -threadcomm_type pthread (a minimal configure/run sketch is included at the end of this message, after the quoted logs). http://www.mcs.anl.gov/petsc/documentation/installation.html#threads
> ----------------------------------------- > Libraries compiled on Mon Nov 4 15:22:12 2013 on dsu-pc > Machine characteristics: > Linux-3.2.0-55-generic-x86_64-with-Ubuntu-12.04-precise > Using PETSc directory: /home/dsu/petsc > Using PETSc arch: linux-gnu-omp-opt > ----------------------------------------- > > Using C compiler: gcc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing > -Wno-unknown-pragmas -O -fopenmp ${COPTFLAGS} ${CFLAGS} > Using Fortran compiler: gfortran -fPIC -Wall -Wno-unused-variable > -Wno-unused-dummy-argument -O -fopenmp ${FOPTFLAGS} ${FFLAGS} > ----------------------------------------- > > Using include paths: -I/home/dsu/petsc/linux-gnu-omp-opt/include > -I/home/dsu/petsc/include -I/home/dsu/petsc/include > -I/home/dsu/petsc/linux-gnu-omp-opt/include -I/home/dsu/petsc/include/mpiuni > ----------------------------------------- > > Using C linker: gcc > Using Fortran linker: gfortran > Using libraries: -Wl,-rpath,/home/dsu/petsc/linux-gnu-omp-opt/lib > -L/home/dsu/petsc/linux-gnu-omp-opt/lib -lpetsc > -Wl,-rpath,/home/dsu/petsc/linux-gnu-omp-opt/lib > -L/home/dsu/petsc/linux-gnu-omp-opt/lib -lflapack -lfblas -lpthread -lm > -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 > -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu > -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu > -L/lib/x86_64-linux-gnu -lgfortran -lm -lgfortran -lm -lquadmath -lm -lstdc++ > -ldl -lgcc_s -ldl > ----------------------------------------- > > ************************************************************************************************************************ > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r > -fCourier9' to print this document *** > ************************************************************************************************************************ > > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./ex2f on a linux-gnu-omp-opt named dsu-pc with 1 processor, by root Mon Nov > 4 15:31:30 2013 > Using Petsc Development GIT revision: > 1beacf92f04482972e84431be0032cb960d262c7 GIT Date: 2013-11-01 14:59:20 -0500 > > Max Max/Min Avg Total > Time (sec): 2.388e+02 1.00000 2.388e+02 > Objects: 4.500e+01 1.00000 4.500e+01 > Flops: 2.203e+11 1.00000 2.203e+11 2.203e+11 > Flops/sec: 9.224e+08 1.00000 9.224e+08 9.224e+08 > MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.00000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N --> > 2N flops > and VecAXPY() for complex vectors of length N --> > 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- > -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts %Total > Avg %Total counts %Total > 0: Main Stage: 2.3881e+02 100.0% 2.2028e+11 100.0% 0.000e+00 0.0% > 0.000e+00 0.0% 0.000e+00 0.0% > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on interpreting > output. > Phase summary info: > Count: number of times phase was executed > Time and Flops: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. len: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %f - percent flops in this phase > %M - percent messages in this phase %L - percent message lengths in > this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over > all processors) > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flops > --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > MatMult 2657 1.0 4.0429e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 17 11 0 0 0 17 11 0 0 0 591 > MatSolve 2657 1.0 6.3888e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 27 11 0 0 0 27 11 0 0 0 374 > MatLUFactorNum 1 1.0 1.2874e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 85 > MatILUFactorSym 1 1.0 1.3501e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyBegin 1 1.0 8.1062e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 6.8491e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetRowIJ 1 1.0 1.1921e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 1.3066e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecMDot 2571 1.0 5.2507e+01 1.0 7.95e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 22 36 0 0 0 22 36 0 0 0 1514 > VecNorm 2658 1.0 5.4426e+00 1.0 5.32e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 2 0 0 0 2 2 0 0 0 977 > VecScale 2657 1.0 3.8871e+00 1.0 2.66e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 684 > VecCopy 86 1.0 1.9921e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 88 1.0 1.0965e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAXPY 172 1.0 4.0171e-01 1.0 3.44e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 856 > VecMAXPY 2657 1.0 7.0096e+01 1.0 8.47e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 29 38 0 0 0 29 38 0 0 0 1208 > VecNormalize 2657 1.0 9.4060e+00 1.0 7.97e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 4 4 0 0 0 4 4 0 0 0 847 > KSPGMRESOrthog 2571 1.0 1.1847e+02 1.0 1.59e+11 1.0 0.0e+00 0.0e+00 > 0.0e+00 50 72 0 0 0 50 72 0 0 0 1342 > KSPSetUp 1 1.0 3.7805e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 2.3820e+02 1.0 2.20e+11 1.0 0.0e+00 0.0e+00 > 0.0e+00100100 0 0 0 100100 0 0 0 925 > PCSetUp 1 1.0 2.7698e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 40 > PCApply 2657 1.0 6.3946e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 27 11 0 0 0 27 11 0 0 0 374 > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. 
> > --- Event Stage 0: Main Stage > > Matrix 2 2 151957404 0 > Vector 37 37 296057424 0 > Krylov Solver 1 1 18368 0 > Preconditioner 1 1 984 0 > Index Set 3 3 4002304 0 > Viewer 1 0 0 0 > ======================================================================================================================== > Average time to get PetscTime(): 8.51154e-06 > #PETSc Option Table entries: > -log_summary log_1000x1000_omp_p1.log > -m 1000 > -n 1000 > -threadcomm_nthreads 1 > -threadcomm_type openmp > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > Configure run at: Mon Nov 4 15:22:12 2013 > Configure options: PETSC_ARCH=linux-gnu-omp-opt --with-cc=gcc > --with-fc=gfortran --download-f-blas-lapack --with-mpi=0 --with-openmp > --with-debugging=0 > ----------------------------------------- > Libraries compiled on Mon Nov 4 15:22:12 2013 on dsu-pc > Machine characteristics: > Linux-3.2.0-55-generic-x86_64-with-Ubuntu-12.04-precise > Using PETSc directory: /home/dsu/petsc > Using PETSc arch: linux-gnu-omp-opt > ----------------------------------------- > > Using C compiler: gcc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing > -Wno-unknown-pragmas -O -fopenmp ${COPTFLAGS} ${CFLAGS} > Using Fortran compiler: gfortran -fPIC -Wall -Wno-unused-variable > -Wno-unused-dummy-argument -O -fopenmp ${FOPTFLAGS} ${FFLAGS} > ----------------------------------------- > > Using include paths: -I/home/dsu/petsc/linux-gnu-omp-opt/include > -I/home/dsu/petsc/include -I/home/dsu/petsc/include > -I/home/dsu/petsc/linux-gnu-omp-opt/include -I/home/dsu/petsc/include/mpiuni > ----------------------------------------- > > Using C linker: gcc > Using Fortran linker: gfortran > Using libraries: -Wl,-rpath,/home/dsu/petsc/linux-gnu-omp-opt/lib > -L/home/dsu/petsc/linux-gnu-omp-opt/lib -lpetsc > -Wl,-rpath,/home/dsu/petsc/linux-gnu-omp-opt/lib > -L/home/dsu/petsc/linux-gnu-omp-opt/lib -lflapack -lfblas -lpthread -lm > -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 > -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu > -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu > -L/lib/x86_64-linux-gnu -lgfortran -lm -lgfortran -lm -lquadmath -lm -lstdc++ > -ldl -lgcc_s -ldl > ----------------------------------------- > > ************************************************************************************************************************ > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r > -fCourier9' to print this document *** > ************************************************************************************************************************ > > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./ex2f on a linux-gnu-opt named dsu-pc with 4 processors, by root Mon Nov 4 > 16:10:24 2013 > Using Petsc Development GIT revision: > 1beacf92f04482972e84431be0032cb960d262c7 GIT Date: 2013-11-01 14:59:20 -0500 > > Max Max/Min Avg Total > Time (sec): 5.364e+01 1.00045 5.362e+01 > Objects: 5.600e+01 1.00000 5.600e+01 > Flops: 2.837e+10 1.00010 2.837e+10 1.135e+11 > Flops/sec: 5.291e+08 1.00054 5.290e+08 2.116e+09 > MPI Messages: 2.744e+03 2.00000 2.058e+03 8.232e+03 > MPI Message Lengths: 2.193e+07 2.00000 7.991e+03 6.578e+07 > MPI Reductions: 2.720e+03 1.00000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N --> > 2N flops > and VecAXPY() for complex vectors of length N --> > 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- > -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts %Total > Avg %Total counts %Total > 0: Main Stage: 5.3623e+01 100.0% 1.1347e+11 100.0% 8.232e+03 100.0% > 7.991e+03 100.0% 2.719e+03 100.0% > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on interpreting > output. > Phase summary info: > Count: number of times phase was executed > Time and Flops: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. len: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %f - percent flops in this phase > %M - percent messages in this phase %L - percent message lengths in > this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over > all processors) > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flops > --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > MatMult 1370 1.0 8.7882e+00 1.0 3.08e+09 1.0 8.2e+03 8.0e+03 > 0.0e+00 16 11100100 0 16 11100100 0 1402 > MatSolve 1370 1.0 9.0304e+00 1.0 3.08e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 17 11 0 0 0 17 11 0 0 0 1362 > MatLUFactorNum 1 1.0 3.3336e-02 1.0 2.74e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 329 > MatILUFactorSym 1 1.0 7.1875e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyBegin 1 1.0 7.2212e-0241.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 5.4802e-02 1.0 0.00e+00 0.0 1.2e+01 2.0e+03 > 9.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetRowIJ 1 1.0 1.2875e-05 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 4.8881e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecMDot 1325 1.0 1.4754e+01 1.0 1.02e+10 1.0 0.0e+00 0.0e+00 > 1.3e+03 27 36 0 0 49 27 36 0 0 49 2776 > VecNorm 1371 1.0 1.9989e+00 1.1 6.86e+08 1.0 0.0e+00 0.0e+00 > 1.4e+03 4 2 0 0 50 4 2 0 0 50 1372 > VecScale 1370 1.0 4.9844e-01 1.1 3.42e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 1 0 0 0 1 1 0 0 0 2749 > VecCopy 45 1.0 4.4863e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 1418 1.0 6.2273e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecAXPY 90 1.0 1.0165e-01 1.0 4.50e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 1771 > VecMAXPY 1370 1.0 1.5635e+01 1.0 1.09e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 29 38 0 0 0 29 38 0 0 0 2789 > VecScatterBegin 1370 1.0 1.6159e-01 1.8 0.00e+00 0.0 8.2e+03 8.0e+03 > 0.0e+00 0 0100100 0 0 0100100 0 0 > VecScatterEnd 1370 1.0 9.6929e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > VecNormalize 1370 1.0 2.5033e+00 1.1 1.03e+09 1.0 0.0e+00 0.0e+00 > 1.4e+03 5 4 0 0 50 5 4 0 0 50 1642 > KSPGMRESOrthog 1325 1.0 2.9419e+01 1.0 2.05e+10 1.0 0.0e+00 0.0e+00 > 1.3e+03 54 72 0 0 49 54 72 0 0 49 2784 > KSPSetUp 2 1.0 2.1291e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 5.2989e+01 1.0 2.84e+10 1.0 8.2e+03 8.0e+03 > 2.7e+03 99100100100 99 99100100100 99 2141 > PCSetUp 2 1.0 1.4600e-01 1.1 2.74e+06 1.0 0.0e+00 0.0e+00 > 5.0e+00 0 0 0 0 0 0 0 0 0 0 75 > PCSetUpOnBlocks 1 1.0 1.1017e-01 1.0 2.74e+06 1.0 0.0e+00 0.0e+00 > 3.0e+00 0 0 0 0 0 0 0 0 0 0 100 > PCApply 1370 1.0 9.7092e+00 1.0 3.08e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 18 11 0 0 0 18 11 0 0 0 1267 > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. 
> > --- Event Stage 0: Main Stage > > Matrix 4 4 56984588 0 > Vector 41 41 74071440 0 > Vector Scatter 1 1 1060 0 > Index Set 5 5 1007832 0 > Krylov Solver 2 2 19520 0 > Preconditioner 2 2 1864 0 > Viewer 1 0 0 0 > ======================================================================================================================== > Average time to get PetscTime(): 6.19888e-06 > Average time for MPI_Barrier(): 0.000529623 > Average time for zero size MPI_Send(): 0.000117242 > #PETSc Option Table entries: > -log_summary log_1000x1000_mpi_p4.log > -m 1000 > -n 1000 > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > Configure run at: Mon Nov 4 14:29:26 2013 > Configure options: PETSC_ARCH=linux-gnu-opt --with-cc=gcc --with-fc=gfortran > --with-debugging=0 --download-f-blas-lapack --download-mpich > ----------------------------------------- > Libraries compiled on Mon Nov 4 14:29:26 2013 on dsu-pc > Machine characteristics: > Linux-3.2.0-41-generic-x86_64-with-Ubuntu-12.04-precise > Using PETSc directory: /home/dsu/petsc > Using PETSc arch: linux-gnu-opt > ----------------------------------------- > > Using C compiler: /home/dsu/petsc/linux-gnu-opt/bin/mpicc -fPIC -Wall > -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} > ${CFLAGS} > Using Fortran compiler: /home/dsu/petsc/linux-gnu-opt/bin/mpif90 -fPIC > -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O ${FOPTFLAGS} > ${FFLAGS} > ----------------------------------------- > > Using include paths: -I/home/dsu/petsc/linux-gnu-opt/include > -I/home/dsu/petsc/include -I/home/dsu/petsc/include > -I/home/dsu/petsc/linux-gnu-opt/include > ----------------------------------------- > > Using C linker: /home/dsu/petsc/linux-gnu-opt/bin/mpicc > Using Fortran linker: /home/dsu/petsc/linux-gnu-opt/bin/mpif90 > Using libraries: -Wl,-rpath,/home/dsu/petsc/linux-gnu-opt/lib > -L/home/dsu/petsc/linux-gnu-opt/lib -lpetsc > -Wl,-rpath,/home/dsu/petsc/linux-gnu-opt/lib > -L/home/dsu/petsc/linux-gnu-opt/lib -lflapack -lfblas -lpthread -lm > -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 > -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu > -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu > -L/lib/x86_64-linux-gnu -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath > -lm -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl > ----------------------------------------- > > ************************************************************************************************************************ > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r > -fCourier9' to print this document *** > ************************************************************************************************************************ > > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./ex2f on a linux-gnu-opt named dsu-pc with 1 processor, by root Mon Nov 4 > 16:14:37 2013 > Using Petsc Development GIT revision: > 1beacf92f04482972e84431be0032cb960d262c7 GIT Date: 2013-11-01 14:59:20 -0500 > > Max Max/Min Avg Total > Time (sec): 2.295e+02 1.00000 2.295e+02 > Objects: 4.500e+01 1.00000 4.500e+01 > Flops: 2.203e+11 1.00000 2.203e+11 2.203e+11 > Flops/sec: 9.597e+08 1.00000 9.597e+08 9.597e+08 > MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 > MPI Reductions: 5.236e+03 1.00000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N --> > 2N flops > and VecAXPY() for complex vectors of length N --> > 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- > -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts %Total > Avg %Total counts %Total > 0: Main Stage: 2.2953e+02 100.0% 2.2028e+11 100.0% 0.000e+00 0.0% > 0.000e+00 0.0% 5.235e+03 100.0% > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on interpreting > output. > Phase summary info: > Count: number of times phase was executed > Time and Flops: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. len: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %f - percent flops in this phase > %M - percent messages in this phase %L - percent message lengths in > this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over > all processors) > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flops > --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > MatMult 2657 1.0 4.0388e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 18 11 0 0 0 18 11 0 0 0 592 > MatSolve 2657 1.0 6.1962e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 27 11 0 0 0 27 11 0 0 0 386 > MatLUFactorNum 1 1.0 1.2718e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 86 > MatILUFactorSym 1 1.0 9.5901e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyBegin 1 1.0 1.2159e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 6.3241e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetRowIJ 1 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 1.2885e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecMDot 2571 1.0 4.9771e+01 1.0 7.95e+10 1.0 0.0e+00 0.0e+00 > 2.6e+03 22 36 0 0 49 22 36 0 0 49 1598 > VecNorm 2658 1.0 5.2489e+00 1.0 5.32e+09 1.0 0.0e+00 0.0e+00 > 2.7e+03 2 2 0 0 51 2 2 0 0 51 1013 > VecScale 2657 1.0 3.5420e+00 1.0 2.66e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 750 > VecCopy 86 1.0 2.0908e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 88 1.0 1.1408e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAXPY 172 1.0 4.3620e-01 1.0 3.44e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 789 > VecMAXPY 2657 1.0 6.6513e+01 1.0 8.47e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 29 38 0 0 0 29 38 0 0 0 1273 > VecNormalize 2657 1.0 8.8659e+00 1.0 7.97e+09 1.0 0.0e+00 0.0e+00 > 2.7e+03 4 4 0 0 51 4 4 0 0 51 899 > KSPGMRESOrthog 2571 1.0 1.1234e+02 1.0 1.59e+11 1.0 0.0e+00 0.0e+00 > 2.6e+03 49 72 0 0 49 49 72 0 0 49 1416 > KSPSetUp 1 1.0 2.9065e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 2.2896e+02 1.0 2.20e+11 1.0 0.0e+00 0.0e+00 > 5.2e+03100100 0 0100 100100 0 0100 962 > PCSetUp 1 1.0 2.3610e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 > 3.0e+00 0 0 0 0 0 0 0 0 0 0 47 > PCApply 2657 1.0 6.2019e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 27 11 0 0 0 27 11 0 0 0 385 > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. 
> > --- Event Stage 0: Main Stage > > Matrix 2 2 151957404 0 > Vector 37 37 296057424 0 > Krylov Solver 1 1 18368 0 > Preconditioner 1 1 984 0 > Index Set 3 3 4002304 0 > Viewer 1 0 0 0 > ======================================================================================================================== > Average time to get PetscTime(): 5.81741e-06 > #PETSc Option Table entries: > -log_summary log_1000x1000_mpi_p1.log > -m 1000 > -n 1000 > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > Configure run at: Mon Nov 4 14:29:26 2013 > Configure options: PETSC_ARCH=linux-gnu-opt --with-cc=gcc --with-fc=gfortran > --with-debugging=0 --download-f-blas-lapack --download-mpich > ----------------------------------------- > Libraries compiled on Mon Nov 4 14:29:26 2013 on dsu-pc > Machine characteristics: > Linux-3.2.0-41-generic-x86_64-with-Ubuntu-12.04-precise > Using PETSc directory: /home/dsu/petsc > Using PETSc arch: linux-gnu-opt > ----------------------------------------- > > Using C compiler: /home/dsu/petsc/linux-gnu-opt/bin/mpicc -fPIC -Wall > -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} > ${CFLAGS} > Using Fortran compiler: /home/dsu/petsc/linux-gnu-opt/bin/mpif90 -fPIC > -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O ${FOPTFLAGS} > ${FFLAGS} > ----------------------------------------- > > Using include paths: -I/home/dsu/petsc/linux-gnu-opt/include > -I/home/dsu/petsc/include -I/home/dsu/petsc/include > -I/home/dsu/petsc/linux-gnu-opt/include > ----------------------------------------- > > Using C linker: /home/dsu/petsc/linux-gnu-opt/bin/mpicc > Using Fortran linker: /home/dsu/petsc/linux-gnu-opt/bin/mpif90 > Using libraries: -Wl,-rpath,/home/dsu/petsc/linux-gnu-opt/lib > -L/home/dsu/petsc/linux-gnu-opt/lib -lpetsc > -Wl,-rpath,/home/dsu/petsc/linux-gnu-opt/lib > -L/home/dsu/petsc/linux-gnu-opt/lib -lflapack -lfblas -lpthread -lm > -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 > -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu > -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu > -L/lib/x86_64-linux-gnu -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath > -lm -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl > -----------------------------------------
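For reference, here is a minimal sketch of the reconfigure-and-rerun steps suggested above. It simply reuses the options from the configure line in your first log and adds the two missing flags; the PETSC_ARCH name linux-gnu-threads-opt and the log file names are only placeholders, so adjust them as you like.

    ./configure PETSC_ARCH=linux-gnu-threads-opt --with-cc=gcc --with-fc=gfortran \
        --download-f-blas-lapack --with-mpi=0 --with-debugging=0 \
        --with-openmp --with-threadcomm --with-pthreadclasses
    make PETSC_DIR=/home/dsu/petsc PETSC_ARCH=linux-gnu-threads-opt all

    # OpenMP backend of threadcomm
    ./ex2f -threadcomm_type openmp -threadcomm_nthreads 4 -m 1000 -n 1000 -log_summary log_1000x1000_omp_p4.log
    # pthread backend, for comparison
    ./ex2f -threadcomm_type pthread -threadcomm_nthreads 4 -m 1000 -n 1000 -log_summary log_1000x1000_pthread_p4.log

Comparing the MatMult and VecMDot lines of -log_summary between -threadcomm_nthreads 1 and 4 should then show whether the threaded kernels are actually being used.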
