Dear All,

I have some questions about our recent implementation, which uses PETSc to solve a large linear system arising from a 4D problem on hybrid unstructured meshes.
The point is that we have implemented all the mappings and the solution is fine, and so is the number of iterations. The results are robust with respect to the number of CPUs used, but we have a scaling issue. The system is a latest-generation Intel cluster with InfiniBand. We have attached the summary ... with hopefully a lot of information. Any comments, suggestions, or ideas are very welcome. We have been reading the threads dealing with multi-core machines and the memory-bus limitation, so we are aware of this issue. I am now thinking about a hybrid OpenMP/MPI approach, but I am not quite happy with the bus-limitation explanation, since most systems these days are multi-core anyway. I hope the limitation is not the sparse matrix mapping that we are using ...

Thanks in advance ...

Cheers
Aron

-------------- next part --------------
PuertoRico Ymir
Nodes: 124815
25.5.2012
Solver: KSPLGMRES
Preconditioner: PCMG
Matrix Size: 71,893,440 x 71,893,440
Total matrix NNZ: 482,500,000
MSC = MCD = 24
dt = 60 sec
Simulation time: 20 min, 20 time steps (20 calls to KSPSolve)

                 FLUCT      Solver  Solver     Solver   Eff.  App        App      Eff.
Nodes  Threads   time[sec]  Iter    Time[sec]  Speedup  [%]   Time[sec]  Speedup  [%]
  4       8      33-38      7-9     18.8        3.2     40      998       4.5     56
  5      25      10-11      8-10     6.85       8.8     35      358      12.6     50
  4      32      10-12      8-11     6.45       9.3     29      317      14.2     44
  8      32      12-14      8-11     8.75       6.9     22      392      11.5     36
 10      40      11-13      9-10     7.10       8.5     21      355      12.7     32
 10      50       7-8       8-11     5.10      11.8     24      252      17.9     36

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/Utilisateurs/aroland/bin/selfewwm_thomas on a linux-int named r1i3n7 with 40 processors, by aroland Fri May 25 19:17:22 2012
Using Petsc Release Version 3.2.0, Patch 6, Wed Jan 11 09:28:45 CST 2012

                         Max       Max/Min        Avg      Total
Time (sec):           3.359e+02      1.00000   3.359e+02
Objects:              1.400e+02      1.00000   1.400e+02
Flops:                3.776e+10      1.14746   3.558e+10  1.423e+12
Flops/sec:            1.124e+08      1.14746   1.059e+08  4.237e+09
MPI Messages:         5.280e+03      6.00000   2.442e+03  9.768e+04
MPI Message Lengths:  5.991e+08      3.80128   1.469e+05  1.435e+10
MPI Reductions:       1.406e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 3.3594e+02 100.0%  1.4233e+12 100.0%  9.768e+04 100.0%  1.469e+05      100.0%  1.405e+03  99.9%

------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)      Flops                              --- Global ---  --- Stage ---   Total
                   Max Ratio  Max      Ratio   Max   Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult              438 1.0 4.4104e+01   1.9 1.06e+10  1.1 9.7e+04 1.5e+05 0.0e+00 11  28 100 100  0  11  28 100 100  0  9131
MatSolve             458 1.0 3.6361e+01   2.8 1.09e+10  1.1 0.0e+00 0.0e+00 0.0e+00 10  29   0   0  0  10  29   0   0  0 11380
MatLUFactorNum        20 1.0 6.9722e+00   2.7 4.73e+08 10.5 0.0e+00 0.0e+00 0.0e+00  1   1   0   0  0   1   1   0   0  0  1534
MatILUFactorSym        1 1.0 2.2795e-01   2.4 0.00e+00  0.0 0.0e+00 0.0e+00 1.0e+00  0   0   0   0  0   0   0   0   0  0     0
MatAssemblyBegin      45 1.0 8.7023e-05   2.4 0.00e+00  0.0 0.0e+00 0.0e+00 0.0e+00  0   0   0   0  0   0   0   0   0  0     0
MatAssemblyEnd        45 1.0 2.9763e+00   1.7 0.00e+00  0.0 4.4e+02 3.7e+04 8.0e+00  1   0   0   0  1   1   0   0   0  1     0
MatGetRowIJ            1 1.0 6.1989e-06   2.2 0.00e+00  0.0 0.0e+00 0.0e+00 0.0e+00  0   0   0   0  0   0   0   0   0  0     0
MatGetSubMatrice      20 1.0 5.9878e+00   2.3 0.00e+00  0.0 0.0e+00 0.0e+00 4.0e+00  2   0   0   0  0   2   0   0   0  0     0
MatGetOrdering         1 1.0 1.8968e-02   2.4 0.00e+00  0.0 0.0e+00 0.0e+00 2.0e+00  0   0   0   0  0   0   0   0   0  0     0
MatZeroEntries        19 1.0 9.1826e-01   2.2 0.00e+00  0.0 0.0e+00 0.0e+00 0.0e+00  0   0   0   0  0   0   0   0   0  0     0
VecMax                20 1.0 1.0160e-01   1.0 0.00e+00  0.0 0.0e+00 0.0e+00 2.0e+01  0   0   0   0  1   0   0   0   0  1     0
VecMin                20 1.0 1.0970e+00  12.8 0.00e+00  0.0 0.0e+00 0.0e+00 2.0e+01  0   0   0   0  1   0   0   0   0  1     0
VecMDot              418 1.0 3.4962e+01   4.9 4.69e+09  1.2 0.0e+00 0.0e+00 4.2e+02  4  12   0   0 30   4  12   0   0 30  5013
VecNorm              687 1.0 4.1451e+01   5.3 2.64e+09  1.2 0.0e+00 0.0e+00 6.9e+02  5   7   0   0 49   5   7   0   0 49  2383
VecScale             667 1.0 5.9862e+00   3.2 1.28e+09  1.2 0.0e+00 0.0e+00 0.0e+00  2   3   0   0  0   2   3   0   0  0  8011
VecCopy              269 1.0 2.2040e+00   2.3 0.00e+00  0.0 0.0e+00 0.0e+00 0.0e+00  1   0   0   0  0   1   0   0   0  0     0
VecSet              1225 1.0 8.5033e+00   2.9 0.00e+00  0.0 0.0e+00 0.0e+00 0.0e+00  2   0   0   0  0   2   0   0   0  0     0
VecAXPY              269 1.0 3.6360e+00   2.7 1.03e+09  1.2 0.0e+00 0.0e+00 0.0e+00  1   3   0   0  0   1   3   0   0  0 10638
VecMAXPY             667 1.0 1.5898e+01   2.9 6.30e+09  1.2 0.0e+00 0.0e+00 0.0e+00  4  17   0   0  0   4  17   0   0  0 14806
VecAssemblyBegin      40 1.0 2.4674e+00 297.8 0.00e+00  0.0 0.0e+00 0.0e+00 1.2e+02  1   0   0   0  9   1   0   0   0  9     0
VecAssemblyEnd        40 1.0 1.2517e-04   4.1 0.00e+00  0.0 0.0e+00 0.0e+00 0.0e+00  0   0   0   0  0   0   0   0   0  0     0
VecScatterBegin      438 1.0 1.1889e+00   9.1 0.00e+00  0.0 9.7e+04 1.5e+05 0.0e+00  0   0 100 100  0   0   0 100 100  0     0
VecScatterEnd        438 1.0 2.2600e+01  89.8 0.00e+00  0.0 0.0e+00 0.0e+00 0.0e+00  1   0   0   0  0   1   0   0   0  0     0
VecNormalize         458 1.0 3.5072e+01   3.8 2.64e+09  1.2 0.0e+00 0.0e+00 4.6e+02  5   7   0   0 33   5   7   0   0 33  2817
KSPGMRESOrthog       418 1.0 3.9814e+01   2.2 9.38e+09  1.2 0.0e+00 0.0e+00 4.2e+02  7  25   0   0 30   7  25   0   0 30  8805
KSPSetup              60 1.0 1.7510e-01   1.2 0.00e+00  0.0 0.0e+00 0.0e+00 1.2e+01  0   0   0   0  1   0   0   0   0  1     0
KSPSolve              20 1.0 1.4208e+02   1.0 3.78e+10  1.1 9.7e+04 1.5e+05 1.2e+03 42 100 100 100 82  42 100 100 100 82 10017
PCSetUp               40 1.0 1.3285e+01   2.1 4.73e+08 10.5 0.0e+00 0.0e+00 1.2e+01  3   1   0   0  1   3   1   0   0  1   805
PCSetUpOnBlocks      687 1.0 4.4281e+01   2.5 1.13e+10  1.1 0.0e+00 0.0e+00 3.0e+00 12  30   0   0  0  12  30   0   0  0  9586
PCApply              229 1.0 1.0103e+02   1.1 2.30e+10  1.1 5.1e+04 1.5e+05 6.9e+02 29  61  52  52 49  29  61  52  52 49  8567
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage

Index Set                 10             10        7700472     0
IS L to G Mapping          1              1            564     0
Application Order          1              1         999160     0
Matrix                     5              5      386024408     0
Vector                   114            114     1062454512     0
Vector Scatter             2              2           2072     0
Krylov Solver              3              3          37800     0
Preconditioner             3              3           2800     0
Viewer                     1              0              0     0
========================================================================================================================
Average time to get PetscTime(): 0
Average time for MPI_Barrier(): 3.93867e-05
Average time for zero size MPI_Send(): 4.75049e-06
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Thu Apr 26 22:05:34 2012
Configure options: --prefix=/Utilisateurs/aroland/thomas/opt/petsc_3.2-p6 --download-f-blas-lapack=1 --with-mpi-dir=/Utilisateurs/aroland/thomas/opt/mpich2/ --with-superlu_dist=true --download-superlu_dist=yes --with-parmetis-lib="[/Utilisateurs/aroland/opt/ParMetis-3.1-64bit/libparmetis.a,/Utilisateurs/aroland/opt/ParMetis-3.1-64bit/libmetis.a]" --with-parmetis-include=/Utilisateurs/aroland/opt/ParMetis-3.1-64bit/ --with-debugging=0 COPTFLAGS=-O3 FOPTFLAGS=-O3
-----------------------------------------
Libraries compiled on Thu Apr 26 22:05:34 2012 on service1
Machine characteristics: Linux-2.6.32.12-0.7-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6
Using PETSc arch: linux-intel-performance
-----------------------------------------
Using C compiler: /Utilisateurs/aroland/thomas/opt/mpich2/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /Utilisateurs/aroland/thomas/opt/mpich2/bin/mpif90 -O3 ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/linux-intel-performance/include -I/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/include -I/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/include -I/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/linux-intel-performance/include -I/Utilisateurs/aroland/opt/ParMetis-3.1-64bit -I/Utilisateurs/aroland/thomas/opt/mpich2/include -I/Utilisateurs/aroland/thomas/opt/mpich2-1.4.1p1/include
-----------------------------------------
Using C linker: /Utilisateurs/aroland/thomas/opt/mpich2/bin/mpicc
Using Fortran linker: /Utilisateurs/aroland/thomas/opt/mpich2/bin/mpif90
Using libraries: -Wl,-rpath,/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/linux-intel-performance/lib -L/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/linux-intel-performance/lib -lpetsc -lX11 -lpthread -Wl,-rpath,/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/linux-intel-performance/lib -L/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/linux-intel-performance/lib -lsuperlu_dist_2.5 -Wl,-rpath,/Utilisateurs/aroland/opt/ParMetis-3.1-64bit -L/Utilisateurs/aroland/opt/ParMetis-3.1-64bit -lparmetis -lmetis -lflapack -lfblas -lm -L/Utilisateurs/aroland/thomas/opt/mpich2-1.4.1p1/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.3 -L/Calcul/Apps2/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 -L/Calcul/Apps2/intel/composer_xe_2011_sp1.6.233/ipp/lib/intel64 -L/Calcul/Apps2/intel/composer_xe_2011_sp1.6.233/mkl/lib/intel64 -L/Calcul/Apps2/intel/composer_xe_2011_sp1.6.233/tbb/lib/intel64/cc4.1.0_libc2.4_kernel2.6.16.21 -L/Calcul/Apps/intel/impi/4.0.3.008/lib64 -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -lmpichf90 -lifport -lifcore -limf -lsvml -lm -lipgo -lirc -lirc_s -lm -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl
-----------------------------------------
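
Note on the scaling table above: the Speedup and Eff. columns are consistent with the usual definitions, speedup S(p) = T_ref / T(p) and efficiency E(p) = S(p) / p, where p is the number of threads (cores). The reference time T_ref is not stated in the table; from the numbers it appears to be roughly 60 s for the solver and roughly 4500 s for the whole application (this is inferred, not given in the original). For example, for the 10-node / 50-thread run:

    S_solver = 60 / 5.10  ~ 11.8    E_solver = 11.8 / 50 ~ 24 %
    S_app    = 4500 / 252 ~ 17.9    E_app    = 17.9 / 50 ~ 36 %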

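For context, here is a minimal, self-contained sketch (not the actual application code) of the kind of driver described above: a distributed AIJ matrix solved repeatedly with KSPLGMRES and profiled with -log_summary, using the PETSc 3.2 calling sequence from the attached log. The matrix (a 1-D Laplacian stand-in), its size N, and the 20-step loop are placeholders; the preconditioner (PCMG in the actual run, which additionally needs its levels configured) is left to the options database here.

/* Minimal sketch only -- the operator, sizes and time loop are placeholders.
 * Build against PETSc 3.2 and run e.g.
 *   mpiexec -n 4 ./sketch -ksp_type lgmres -log_summary
 */
#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat            A;
  Vec            x, b;
  KSP            ksp;
  PetscInt       i, step, rstart, rend, N = 1000;
  PetscInt       col[3];
  PetscScalar    v[3];
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);CHKERRQ(ierr);

  /* Distributed AIJ matrix; a 1-D Laplacian stands in for the real operator. */
  ierr = MatCreateMPIAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, N, N,
                         3, PETSC_NULL, 2, PETSC_NULL, &A);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {
    col[0] = i - 1; col[1] = i;   col[2] = i + 1;
    v[0]   = -1.0;  v[1]   = 2.0; v[2]   = -1.0;
    if (i == 0) {
      ierr = MatSetValues(A, 1, &i, 2, &col[1], &v[1], INSERT_VALUES);CHKERRQ(ierr);
    } else if (i == N - 1) {
      ierr = MatSetValues(A, 1, &i, 2, col, v, INSERT_VALUES);CHKERRQ(ierr);
    } else {
      ierr = MatSetValues(A, 1, &i, 3, col, v, INSERT_VALUES);CHKERRQ(ierr);
    }
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  ierr = MatGetVecs(A, &x, &b);CHKERRQ(ierr);
  ierr = VecSet(b, 1.0);CHKERRQ(ierr);

  /* KSPLGMRES as in the attached run; the preconditioner is taken from the
   * options database (e.g. -pc_type ...) via KSPSetFromOptions(). */
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPLGMRES);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);

  /* 20 time steps -> 20 calls to KSPSolve, as in the summary above. */
  for (step = 0; step < 20; step++) {
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  }

  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}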