Thank you all for your replies!
> Are you using a KSP/PC configuration which should weak scale?
Yes, the system is solved with KSPSolve. There is no preconditioner yet,
but I fixed the number of CG iterations to 3 to ensure an apples-to-apples
comparison during the scaling measurements.
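In PETSc API terms, that configuration amounts to roughly the following (a simplified sketch with placeholder names A, b, x, not the literal application code):

  /* Sketch only: Mat A and Vec b, x are assumed to be assembled elsewhere. */
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPCG);CHKERRQ(ierr);    /* conjugate gradient */
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCNONE);CHKERRQ(ierr);     /* no preconditioner */
  /* cap the iteration count at 3 so every run does the same amount of work */
  ierr = KSPSetTolerances(ksp, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT, 3);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);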
> VecScatter has been greatly refactored (and the default implementation
> is entirely new) since 3.7.
I have now tried PETSc 3.11 and the code runs fine. The communication
shows better weak-scaling behavior now.
I'll see whether we can simply upgrade to 3.11.
> Anyway, I'm curious about your
> configuration and how you determine that MPI_Alltoallv/MPI_Alltoallw is
> being used.
I used the Extrae profiler, which intercepts all MPI calls and logs them
to a file. It showed that Alltoall is used for the communication, which I
found surprising. With PETSc 3.11 the Alltoall calls are replaced by
MPI_Start(all) and MPI_Wait(all), which sounds more reasonable to me.
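For context on the mechanism (this is not Extrae's code, just a sketch of the PMPI interposition such profilers rely on): MPI provides a name-shifted PMPI_ entry point for every call, so a preloaded wrapper can log a call like MPI_Alltoallv and then forward it:

  #include <mpi.h>
  #include <stdio.h>

  /* Hypothetical interposer: log every MPI_Alltoallv, then forward it to the
   * real implementation via its PMPI_ entry point. Preloading such a wrapper
   * shows whether a library issues all-to-all collectives. */
  int MPI_Alltoallv(const void *sendbuf, const int sendcounts[], const int sdispls[],
                    MPI_Datatype sendtype, void *recvbuf, const int recvcounts[],
                    const int rdispls[], MPI_Datatype recvtype, MPI_Comm comm)
  {
    int rank;
    PMPI_Comm_rank(comm, &rank);
    fprintf(stderr, "[rank %d] MPI_Alltoallv intercepted\n", rank);
    return PMPI_Alltoallv(sendbuf, sendcounts, sdispls, sendtype,
                          recvbuf, recvcounts, rdispls, recvtype, comm);
  }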
> This has never been a default code path, so I suspect
> something in your environment or code making this happen.
I attached log files for PETSc 3.7 runs on 1, 19, and 115 nodes
(24 cores each), which suggest polynomial rather than logarithmic
scaling. Could it be an installation setting of this PETSc build? (I
use a preinstalled PETSc.)
> Can you please send representative log files which characterize the
> lack of scaling (include the full log_view)?
"Stage 1: activation" is the stage of interest, as it wraps the
KSPSolve. The number of unkowns per rank is very small in the
measurement, so most of the time should be communication. However, I
just noticed, that the stage also contains an additional setup step
which might be the reason why the MatMul takes longer than the KSPSolve.
I can repeat the measurements if necessary.
I should add, that I put a MPI_Barrier before the KSPSolve, to avoid any
previous work imbalance to effect the KSPSolve call.
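For clarity, the barrier and the logging stage are arranged roughly as follows (a sketch with placeholder variable names, not the literal code; in the actual runs the stage also covers the extra setup step mentioned above):

  /* Synchronize first so earlier load imbalance is not charged to KSPSolve,
   * then wrap the solve in its own logging stage ("activation"). */
  PetscLogStage  activationStage;
  PetscErrorCode ierr;

  ierr = PetscLogStageRegister("activation", &activationStage);CHKERRQ(ierr);
  ierr = MPI_Barrier(PETSC_COMM_WORLD);CHKERRQ(ierr);
  ierr = PetscLogStagePush(activationStage);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = PetscLogStagePop();CHKERRQ(ierr);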
Best regards,
Felix
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
... on a haswell named nid04236 with 18 processors, by me Fri Jan 24 14:00:09 2020
Using Petsc Release Version 3.7.6, Apr, 24, 2017
Max Max/Min Avg Total
Time (sec): 1.955e+01 1.00000 1.955e+01
Objects: 2.677e+04 1.26112 2.300e+04
Flops: 2.304e+08 1.55728 1.977e+08 3.559e+09
Flops/sec: 1.179e+07 1.55728 1.012e+07 1.821e+08
MPI Messages: 3.953e+03 1.28469 3.426e+03 6.168e+04
MPI Message Lengths: 7.619e+06 1.76512 1.608e+03 9.915e+07
MPI Reductions: 4.489e+04 1.16111
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.0610e+01 54.3% 0.0000e+00 0.0% 5.456e+03 8.8% 1.421e+00 0.1% 2.540e+03 5.7%
1: activation: 3.8488e+00 19.7% 1.6746e+06 0.0% 0.000e+00 0.0% 1.497e+00 0.1% 2.200e+01 0.0%
2: activation_rhs: 6.9804e-06 0.0% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
3: run: 5.0884e+00 26.0% 3.5578e+09 100.0% 5.622e+04 91.2% 1.605e+03 99.8% 3.811e+04 84.9%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
--- Event Stage 1: activation
VecTDot 7 1.0 5.9128e-05 1.4 5.59e+03 1.7 0.0e+00 0.0e+00 7.0e+00 0 0 0 0 0 0 5 0 0 32 1437
VecNorm 5 1.0 7.2002e-05 2.2 4.00e+03 1.7 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 0 0 4 0 0 23 844
VecCopy 6 1.0 2.8610e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 6 1.0 6.9141e-06 1.8 4.80e+03 1.7 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 4 0 0 0 10545
VecAYPX 3 1.0 2.1458e-06 0.0 2.00e+03 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 14158
VecScatterBegin 5 1.0 6.2842e+00104594.2 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 18 0 0 0 0 90 0 0100 23 0
VecScatterEnd 5 1.0 6.9141e-06 3.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 5 1.0 6.2843e+0034957.9 9.01e+04 1.4 0.0e+00 0.0e+00 5.0e+00 18 0 0 0 0 90 85 0100 23 0
KSPSetUp 1 1.0 2.7895e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 3.8600e-04 1.0 8.85e+04 1.5 0.0e+00 0.0e+00 2.1e+01 0 0 0 0 0 0 83 0 80 95 3600
PCSetUp 1 1.0 9.5367e-07 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 5 1.0 5.9605e-06 6.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 2: activation_rhs
--- Event Stage 3: run
VecMDot 4876 1.0 6.8255e-02 1.5 6.03e+07 1.7 0.0e+00 0.0e+00 4.9e+03 0 26 0 0 11 1 26 0 0 13 13416
VecTDot 336 1.3 6.2203e-04 1.5 4.97e+05 1.3 0.0e+00 0.0e+00 2.9e+02 0 0 0 0 1 0 0 0 0 1 12288
VecNorm 5369 1.0 4.0783e-02 1.2 4.52e+06 1.6 0.0e+00 0.0e+00 5.3e+03 0 2 0 0 12 1 2 0 0 14 1686
VecScale 5039 1.0 4.8268e-03 2.9 2.02e+06 1.7 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 6343
VecCopy 6019 1.0 1.4820e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 3361 1.2 2.5723e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 1798 1.2 4.0171e-03 1.1 6.75e+06 1.3 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 25923
VecAYPX 12 0.0 7.8678e-06 0.0 1.78e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 15801
VecMAXPY 5039 1.0 9.2657e-03 1.6 6.43e+07 1.7 0.0e+00 0.0e+00 0.0e+00 0 27 0 0 0 0 27 0 0 0 105369
VecScatterBegin 5808 1.0 5.4529e+00146.7 0.00e+00 0.0 0.0e+00 0.0e+00 5.7e+03 4 0 0 94 13 17 0 0 94 15 0
VecScatterEnd 5808 1.0 1.8566e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 5039 1.0 4.0384e-02 1.1 6.05e+06 1.7 0.0e+00 0.0e+00 5.0e+03 0 3 0 0 11 1 3 0 0 13 2274
MatMult 5206 1.0 1.1557e-01 1.2 9.14e+07 1.4 0.0e+00 0.0e+00 5.2e+03 1 41 0 94 12 2 41 0 94 14 12512
MatScale 82 1.3 2.6333e-03 1.1 1.89e+05 1.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1109
MatAssemblyBegin 1147 1.3 1.6677e-0116.5 0.00e+00 0.0 8.5e+03 6.0e+02 1.6e+03 1 0 14 5 3 2 0 15 5 4 0
MatAssemblyEnd 1147 1.3 3.7627e-02 1.3 0.00e+00 0.0 8.6e+03 1.0e+01 2.2e+03 0 0 14 0 5 1 0 15 0 6 0
MatGetValues 36 1.0 6.4850e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRow 59940 1.3 4.7204e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMatMult 81 1.3 2.7820e-02 1.3 3.59e+05 1.3 2.5e+03 4.0e+00 1.1e+03 0 0 4 0 2 1 0 4 0 3 199
MatMatMultSym 81 1.3 2.4759e-02 1.3 0.00e+00 0.0 2.5e+03 4.0e+00 9.7e+02 0 0 4 0 2 0 0 4 0 3 0
MatMatMultNum 81 1.3 2.9891e-03 1.3 3.59e+05 1.3 0.0e+00 0.0e+00 1.4e+02 0 0 0 0 0 0 0 0 0 0 1856
MatGetLocalMat 162 1.3 6.6185e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 162 1.3 1.1911e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 4876 1.0 7.8934e-02 1.5 1.21e+08 1.7 0.0e+00 0.0e+00 4.9e+03 0 51 0 0 11 1 52 0 0 13 23219
KSPSetUp 2 1.0 8.1062e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 163 1.3 2.2359e-01 1.0 2.24e+08 1.6 0.0e+00 0.0e+00 1.6e+04 1 97 0 94 35 4 97 0 94 41 15460
PCSetUp 2 1.0 9.5367e-07 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 5369 1.0 2.6193e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetGraph 84 1.3 3.3445e-0332.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceBegin 84 1.3 4.1623e-03 4.7 0.00e+00 0.0 4.5e+01 7.0e+01 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceEnd 84 1.3 7.8917e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
BuildTwoSided 84 1.3 3.7937e-03 7.3 0.00e+00 0.0 9.0e+00 4.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Index Set 1066 656 513568 0.
IS L to G Mapping 328 0 0 0.
Vector 738 0 0 0.
Vector Scatter 328 0 0 0.
Viewer 1 0 0 0.
--- Event Stage 1: activation
Vector 8 2 7784 0.
--- Event Stage 2: activation_rhs
--- Event Stage 3: run
Index Set 8417 5402 4224128 0.
IS L to G Mapping 4015 661 2363088 0.
Vector 6616 817 2765416 0.
Vector Scatter 2700 0 0 0.
Matrix 2458 661 1824360 0.
Matrix Null Space 1 0 0 0.
Krylov Solver 3 0 0 0.
Preconditioner 3 0 0 0.
Star Forest Bipartite Graph 84 84 71232 0.
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 6.19888e-06
Average time for zero size MPI_Send(): 5.54985e-06
#PETSc Option Table entries:
--cellml_file ../../../input/hodgkin_huxley_1952.c
--diffusion_solver_maxit 5
--disable_firing_output
--dt_0D 1e-3
--dt_1D 2e-3
--dt_3D 4e-3
--dt_splitting 2e-3
--emg_initial_guess_nonzero
--emg_preconditioner_type none
--emg_solver_maxit 3
--emg_solver_type cg
--end_time 4e-3
--fiber_distribution_file ../../../input/MU_fibre_distribution_3780.txt
--fiber_file ../../../input/25x25fibers.bin
--firing_times_file ../../../input/MU_firing_times_real.txt
--n_subdomains 3
--scenario_name weak_scaling_3_3_2_
-on_error_abort
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --known-has-attribute-aligned=1 --known-mpi-int64_t=0 --known-bits-per-byte=8 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --known-level1-dcache-assoc=0 --known-level1-dcache-linesize=32 --known-level1-dcache-size=32768 --known-memcmp-ok=1 --known-mpi-c-double-complex=1 --known-mpi-long-double=1 --known-mpi-shared-libraries=0 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-sizeof-char=1 --known-sizeof-double=8 --known-sizeof-float=4 --known-sizeof-int=4 --known-sizeof-long-long=8 --known-sizeof-long=8 --known-sizeof-short=2 --known-sizeof-size_t=8 --known-sizeof-void-p=8 --with-ar=ar --with-batch=1 --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-dependencies=0 --with-fc=ftn --with-fortran-datatypes=0 --with-fortran-interfaces=0 --with-fortranlib-autodetect=0 --with-ranlib=ranlib --with-scalar-type=real --with-shared-ld=ar --with-etags=0 --with-dependencies=0 --with-x=0 --with-ssl=0 --with-shared-libraries=0 --with-dependencies=0 --with-mpi-lib="[]" --with-mpi-include="[]" --with-blas-lapack-lib="-L/opt/cray/libsci/13.2.0/GNU/5.1/x86_64/lib -lsci_gnu_mp" --with-superlu=1 --with-superlu-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-superlu-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lsuperlu" --with-superlu_dist=1 --with-superlu_dist-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-superlu_dist-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lsuperlu_dist" --with-parmetis=1 --with-parmetis-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-parmetis-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lparmetis" --with-metis=1 --with-metis-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-metis-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lmetis" --with-ptscotch=1 --with-ptscotch-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-ptscotch-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lptscotch -lscotch -lptscotcherr -lscotcherr" --with-scalapack=1 --with-scalapack-include=/opt/cray/libsci/13.2.0/GNU/5.1/x86_64/include --with-scalapack-lib="-L/opt/cray/libsci/13.2.0/GNU/5.1/x86_64/lib -lsci_gnu_mpi_mp -lsci_gnu_mp" --with-mumps=1 --with-mumps-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-mumps-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lptesmumps -lesmumps -lpord" --with-hdf5=1 --with-hdf5-include=/opt/cray/hdf5-parallel/1.10.1.1/GNU/5.1/include --with-hdf5-lib="-L/opt/cray/hdf5-parallel/1.10.1.1/GNU/5.1/lib -lhdf5_parallel -lz -ldl" --CFLAGS="-march=haswell -fopenmp -O3 -ffast-math -fPIC" --CPPFLAGS= --CXXFLAGS="-march=haswell -fopenmp -O3 -ffast-math -fPIC" --FFLAGS="-march=haswell -fopenmp -O3 -ffast-math -fPIC" --LIBS= --CXX_LINKER_FLAGS= --PETSC_ARCH=haswell --prefix=/opt/cray/pe/petsc/3.7.6.2/real/GNU/5.3/haswell --with-hypre=1 --with-hypre-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-hypre-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lHYPRE" --with-sundials=1 --with-sundials-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-sundials-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lsundials_cvode -lsundials_cvodes -lsundials_ida -lsundials_idas -lsundials_kinsol -lsundials_nvecparallel -lsundials_nvecserial"
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
... on a haswell named nid04454 with 450 processors, by me Fri Jan 24 14:10:16 2020
Using Petsc Release Version 3.7.6, Apr, 24, 2017
Max Max/Min Avg Total
Time (sec): 3.901e+01 1.00001 3.901e+01
Objects: 2.122e+04 1.29938 1.757e+04
Flops: 3.160e+08 1.15372 3.020e+08 1.359e+11
Flops/sec: 8.100e+06 1.15373 7.742e+06 3.484e+09
MPI Messages: 3.364e+03 1.40029 2.820e+03 1.269e+06
MPI Message Lengths: 1.516e+07 2.22510 4.830e+03 6.131e+09
MPI Reductions: 5.445e+04 1.11343
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.8516e+01 47.5% 0.0000e+00 0.0% 1.116e+05 8.8% 1.842e+00 0.0% 1.941e+03 3.6%
1: activation: 7.6666e-03 0.0% 3.2847e+07 0.0% 0.000e+00 0.0% 2.268e+00 0.0% 2.200e+01 0.0%
2: activation_rhs: 7.2331e-06 0.0% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
3: run: 2.0490e+01 52.5% 1.3588e+11 100.0% 1.158e+06 91.2% 4.826e+03 99.9% 4.834e+04 88.8%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
--- Event Stage 1: activation
VecTDot 7 1.0 1.6212e-04 1.3 3.58e+03 1.1 0.0e+00 0.0e+00 7.0e+00 0 0 0 0 0 2 5 0 0 32 9618
VecNorm 5 1.0 1.7476e-04 2.0 2.56e+03 1.1 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 0 2 3 0 0 23 6386
VecCopy 6 1.0 5.2452e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 6 1.0 1.0252e-05 2.1 3.07e+03 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 4 0 0 0 130628
VecAYPX 3 1.0 3.3379e-06 0.0 1.28e+03 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 167173
VecScatterBegin 5 1.0 6.8860e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 0 90 0 0100 23 0
VecScatterEnd 5 1.0 1.3351e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 5 1.0 7.0133e-03 1.0 6.64e+04 1.3 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 0 91 86 0100 23 4032
KSPSetUp 1 1.0 3.0994e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 1.8902e-03 1.0 6.36e+04 1.2 0.0e+00 0.0e+00 2.1e+01 0 0 0 0 0 24 83 0 80 95 14386
PCSetUp 1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 5 1.0 7.1526e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 2: activation_rhs
--- Event Stage 3: run
VecMDot 10000 1.0 3.0873e-01 1.3 7.92e+07 1.1 0.0e+00 0.0e+00 1.0e+04 1 25 0 0 18 1 25 0 0 21 111760
VecTDot 292 1.5 5.1212e-04 1.7 4.32e+05 1.5 0.0e+00 0.0e+00 2.2e+02 0 0 0 0 0 0 0 0 0 0 281566
VecNorm 10608 1.0 3.9872e-01 1.8 5.70e+06 1.1 0.0e+00 0.0e+00 1.1e+04 1 2 0 0 19 1 2 0 0 22 6142
VecScale 10334 1.0 1.3559e-02 2.0 2.65e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 85057
VecCopy 11327 1.0 3.6891e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 2869 1.3 2.0351e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 1858 1.2 5.2643e-03 1.6 5.51e+06 1.3 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 389840
VecAYPX 18 0.0 8.8215e-06 0.0 2.66e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 205353
VecMAXPY 10334 1.0 1.4979e-02 1.2 8.44e+07 1.1 0.0e+00 0.0e+00 0.0e+00 0 27 0 0 0 0 27 0 0 0 2457210
VecScatterBegin 10962 1.0 4.9950e+00 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+04 12 0 0 97 20 23 0 0 97 22 0
VecScatterEnd 10962 1.0 8.7821e-03 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 10334 1.0 2.4845e-01 1.1 7.94e+06 1.1 0.0e+00 0.0e+00 1.0e+04 1 3 0 0 19 1 3 0 0 21 13925
MatMult 10479 1.0 2.6717e+00 1.0 1.38e+08 1.3 0.0e+00 0.0e+00 1.0e+04 7 43 0 97 19 13 43 0 97 22 21938
MatScale 65 1.3 2.6226e-03 1.2 1.49e+05 1.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 21205
MatAssemblyBegin 909 1.3 1.4655e-01 9.1 0.00e+00 0.0 1.8e+05 8.5e+02 1.2e+03 0 0 14 3 2 0 0 16 3 2 0
MatAssemblyEnd 909 1.3 3.5992e-02 1.7 0.00e+00 0.0 1.8e+05 1.3e+01 1.7e+03 0 0 15 0 3 0 0 16 0 4 0
MatGetValues 36 1.4 9.4175e-05 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRow 47360 1.3 4.0329e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMatMult 64 1.3 2.5717e-02 1.6 2.84e+05 1.3 4.8e+04 4.0e+00 8.4e+02 0 0 4 0 2 0 0 4 0 2 4101
MatMatMultSym 64 1.3 2.2983e-02 1.6 0.00e+00 0.0 4.8e+04 4.0e+00 7.4e+02 0 0 4 0 1 0 0 4 0 2 0
MatMatMultNum 64 1.3 2.5036e-03 1.4 2.84e+05 1.3 0.0e+00 0.0e+00 1.1e+02 0 0 0 0 0 0 0 0 0 0 42121
MatGetLocalMat 128 1.3 6.2647e-03 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 128 1.3 9.9993e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 10000 1.0 3.2691e-01 1.2 1.58e+08 1.1 0.0e+00 0.0e+00 1.0e+04 1 51 0 0 18 1 51 0 0 21 211305
KSPSetUp 2 1.0 6.5250e-03181.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 129 1.3 3.3797e+00 1.1 3.11e+08 1.2 0.0e+00 0.0e+00 3.1e+04 8 99 0 97 57 16 99 0 97 65 39637
PCSetUp 2 1.0 2.1458e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 10608 1.0 6.7325e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetGraph 67 1.3 1.2636e-04 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceBegin 67 1.3 1.2133e-03 1.6 0.00e+00 0.0 1.1e+03 5.2e+01 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceEnd 67 1.3 8.5115e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
BuildTwoSided 67 1.3 8.2541e-04 1.7 0.00e+00 0.0 2.2e+02 4.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Index Set 845 520 407104 0.
IS L to G Mapping 260 0 0 0.
Vector 585 0 0 0.
Vector Scatter 260 0 0 0.
Viewer 1 0 0 0.
--- Event Stage 1: activation
Vector 8 2 6376 0.
--- Event Stage 2: activation_rhs
--- Event Stage 3: run
Index Set 6666 4280 3346816 0.
IS L to G Mapping 3182 525 1865968 0.
Vector 5256 647 2184824 0.
Vector Scatter 2139 0 0 0.
Matrix 1948 525 1449000 0.
Matrix Null Space 1 0 0 0.
Krylov Solver 3 0 0 0.
Preconditioner 3 0 0 0.
Star Forest Bipartite Graph 67 67 56816 0.
========================================================================================================================
Average time to get PetscTime(): 1.19209e-07
Average time for MPI_Barrier(): 1.67847e-05
Average time for zero size MPI_Send(): 5.97318e-06
#PETSc Option Table entries:
--cellml_file ../../../input/hodgkin_huxley_1952.c
--diffusion_solver_maxit 5
--disable_firing_output
--dt_0D 1e-3
--dt_1D 2e-3
--dt_3D 4e-3
--dt_splitting 2e-3
--emg_initial_guess_nonzero
--emg_preconditioner_type none
--emg_solver_maxit 3
--emg_solver_type cg
--end_time 4e-3
--fiber_distribution_file ../../../input/MU_fibre_distribution_3780.txt
--fiber_file ../../../input/109x109fibers.bin
--firing_times_file ../../../input/MU_firing_times_real.txt
--n_subdomains 15
--scenario_name weak_scaling_15_15_2
-on_error_abort
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --known-has-attribute-aligned=1 --known-mpi-int64_t=0 --known-bits-per-byte=8 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --known-level1-dcache-assoc=0 --known-level1-dcache-linesize=32 --known-level1-dcache-size=32768 --known-memcmp-ok=1 --known-mpi-c-double-complex=1 --known-mpi-long-double=1 --known-mpi-shared-libraries=0 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-sizeof-char=1 --known-sizeof-double=8 --known-sizeof-float=4 --known-sizeof-int=4 --known-sizeof-long-long=8 --known-sizeof-long=8 --known-sizeof-short=2 --known-sizeof-size_t=8 --known-sizeof-void-p=8 --with-ar=ar --with-batch=1 --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-dependencies=0 --with-fc=ftn --with-fortran-datatypes=0 --with-fortran-interfaces=0 --with-fortranlib-autodetect=0 --with-ranlib=ranlib --with-scalar-type=real --with-shared-ld=ar --with-etags=0 --with-dependencies=0 --with-x=0 --with-ssl=0 --with-shared-libraries=0 --with-dependencies=0 --with-mpi-lib="[]" --with-mpi-include="[]" --with-blas-lapack-lib="-L/opt/cray/libsci/13.2.0/GNU/5.1/x86_64/lib -lsci_gnu_mp" --with-superlu=1 --with-superlu-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-superlu-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lsuperlu" --with-superlu_dist=1 --with-superlu_dist-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-superlu_dist-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lsuperlu_dist" --with-parmetis=1 --with-parmetis-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-parmetis-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lparmetis" --with-metis=1 --with-metis-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-metis-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lmetis" --with-ptscotch=1 --with-ptscotch-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-ptscotch-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lptscotch -lscotch -lptscotcherr -lscotcherr" --with-scalapack=1 --with-scalapack-include=/opt/cray/libsci/13.2.0/GNU/5.1/x86_64/include --with-scalapack-lib="-L/opt/cray/libsci/13.2.0/GNU/5.1/x86_64/lib -lsci_gnu_mpi_mp -lsci_gnu_mp" --with-mumps=1 --with-mumps-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-mumps-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lptesmumps -lesmumps -lpord" --with-hdf5=1 --with-hdf5-include=/opt/cray/hdf5-parallel/1.10.1.1/GNU/5.1/include --with-hdf5-lib="-L/opt/cray/hdf5-parallel/1.10.1.1/GNU/5.1/lib -lhdf5_parallel -lz -ldl" --CFLAGS="-march=haswell -fopenmp -O3 -ffast-math -fPIC" --CPPFLAGS= --CXXFLAGS="-march=haswell -fopenmp -O3 -ffast-math -fPIC" --FFLAGS="-march=haswell -fopenmp -O3 -ffast-math -fPIC" --LIBS= --CXX_LINKER_FLAGS= --PETSC_ARCH=haswell --prefix=/opt/cray/pe/petsc/3.7.6.2/real/GNU/5.3/haswell --with-hypre=1 --with-hypre-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-hypre-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lHYPRE" --with-sundials=1 --with-sundials-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-sundials-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lsundials_cvode -lsundials_cvodes -lsundials_ida -lsundials_idas -lsundials_kinsol -lsundials_nvecparallel -lsundials_nvecserial"
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
... on a haswell named nid03252 with 2738 processors, by me Fri Jan 24 15:32:45 2020
Using Petsc Release Version 3.7.6, Apr, 24, 2017
Max Max/Min Avg Total
Time (sec): 1.215e+02 1.00001 1.215e+02
Objects: 2.122e+04 1.29938 1.863e+04
Flops: 3.162e+08 1.15455 3.042e+08 8.328e+11
Flops/sec: 2.603e+06 1.15455 2.504e+06 6.856e+09
MPI Messages: 3.364e+03 1.40029 2.990e+03 8.185e+06
MPI Message Lengths: 1.516e+07 2.22511 4.777e+03 3.910e+10
MPI Reductions: 5.447e+04 1.11392
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 6.8289e+01 56.2% 0.0000e+00 0.0% 7.198e+05 8.8% 1.822e+00 0.0% 2.058e+03 3.8%
1: activation: 1.1864e-02 0.0% 2.0221e+08 0.0% 0.000e+00 0.0% 2.243e+00 0.0% 2.200e+01 0.0%
2: activation_rhs: 7.2035e-06 0.0% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
3: run: 5.3171e+01 43.8% 8.3258e+11 100.0% 7.466e+06 91.2% 4.773e+03 99.9% 4.941e+04 90.8%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
--- Event Stage 1: activation
VecTDot 7 1.0 3.8028e-04 1.3 3.58e+03 1.1 0.0e+00 0.0e+00 7.0e+00 0 0 0 0 0 3 5 0 0 32 24948
VecNorm 5 1.0 3.2425e-04 1.6 2.56e+03 1.1 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 0 2 3 0 0 23 20941
VecCopy 6 1.0 1.0967e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 6 1.0 1.4067e-05 3.7 3.07e+03 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 4 0 0 0 579261
VecAYPX 3 1.0 8.8215e-06 0.0 1.28e+03 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 384869
VecScatterBegin 5 1.0 1.0569e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 0 89 0 0100 23 0
VecScatterEnd 5 1.0 2.9087e-0530.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 5 1.0 1.0730e-02 1.0 6.64e+04 1.3 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 0 90 86 0100 23 16252
KSPSetUp 1 1.0 5.1975e-05 4.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 5.2700e-03 1.0 6.36e+04 1.2 0.0e+00 0.0e+00 2.1e+01 0 0 0 0 0 44 83 0 80 95 31751
PCSetUp 1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 5 1.0 1.1921e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 2: activation_rhs
--- Event Stage 3: run
VecMDot 10000 1.0 6.0100e-01 1.1 7.92e+07 1.1 0.0e+00 0.0e+00 1.0e+04 0 25 0 0 18 1 25 0 0 20 349316
VecTDot 304 1.6 5.8770e-04 3.2 4.50e+05 1.6 0.0e+00 0.0e+00 2.3e+02 0 0 0 0 0 0 0 0 0 0 1584199
VecNorm 10614 1.0 6.8425e-01 1.5 5.71e+06 1.1 0.0e+00 0.0e+00 1.1e+04 0 2 0 0 19 1 2 0 0 21 21855
VecScale 10334 1.0 1.4122e-02 2.1 2.65e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 496878
VecCopy 11333 1.0 4.0164e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 2869 1.3 2.1954e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 1870 1.2 6.5804e-03 1.7 5.53e+06 1.3 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2005584
VecAYPX 24 0.0 1.5497e-05 0.0 3.55e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 748346
VecMAXPY 10334 1.0 1.5738e-02 1.3 8.44e+07 1.1 0.0e+00 0.0e+00 0.0e+00 0 27 0 0 0 0 27 0 0 0 14229163
VecScatterBegin 10968 1.0 2.1788e+01 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+04 17 0 0 97 20 40 0 0 97 22 0
VecScatterEnd 10968 1.0 9.3315e-03 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 10334 1.0 4.8330e-01 1.0 7.94e+06 1.1 0.0e+00 0.0e+00 1.0e+04 0 3 0 0 19 1 3 0 0 21 43557
MatMult 10485 1.0 1.0264e+01 1.0 1.38e+08 1.3 0.0e+00 0.0e+00 1.0e+04 8 43 0 97 19 19 43 0 97 21 35225
MatScale 65 1.3 2.8410e-03 1.3 1.49e+05 1.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 126118
MatAssemblyBegin 909 1.3 1.6064e-0114.3 0.00e+00 0.0 1.2e+06 8.4e+02 1.3e+03 0 0 14 3 2 0 0 16 3 3 0
MatAssemblyEnd 909 1.3 3.2645e-02 1.5 0.00e+00 0.0 1.2e+06 1.3e+01 1.8e+03 0 0 15 0 3 0 0 16 0 4 0
MatGetValues 36 1.4 1.0395e-04 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRow 47360 1.3 4.7610e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMatMult 64 1.3 2.9266e-02 1.9 2.84e+05 1.3 3.1e+05 4.0e+00 9.0e+02 0 0 4 0 2 0 0 4 0 2 23271
MatMatMultSym 64 1.3 2.6498e-02 2.0 0.00e+00 0.0 3.1e+05 4.0e+00 7.8e+02 0 0 4 0 1 0 0 4 0 2 0
MatMatMultNum 64 1.3 2.5842e-03 1.7 2.84e+05 1.3 0.0e+00 0.0e+00 1.1e+02 0 0 0 0 0 0 0 0 0 0 263541
MatGetLocalMat 128 1.3 5.5459e-03 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 128 1.3 1.0481e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 10000 1.0 6.1988e-01 1.1 1.58e+08 1.1 0.0e+00 0.0e+00 1.0e+04 0 50 0 0 18 1 50 0 0 20 678034
KSPSetUp 2 1.0 9.9897e-05 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 129 1.3 1.1538e+01 1.0 3.11e+08 1.2 0.0e+00 0.0e+00 3.1e+04 9 98 0 97 57 21 99 0 97 63 71082
PCSetUp 2 1.0 7.8678e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 10614 1.0 6.8228e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetGraph 67 1.3 1.9526e-04 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceBegin 67 1.3 1.5805e-03 1.8 0.00e+00 0.0 6.8e+03 5.2e+01 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceEnd 67 1.3 9.5844e-05 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
BuildTwoSided 67 1.3 1.1051e-03 1.8 0.00e+00 0.0 1.4e+03 4.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Index Set 845 520 407104 0.
IS L to G Mapping 260 0 0 0.
Vector 585 0 0 0.
Vector Scatter 260 0 0 0.
Viewer 1 0 0 0.
--- Event Stage 1: activation
Vector 8 2 6376 0.
--- Event Stage 2: activation_rhs
--- Event Stage 3: run
Index Set 6666 4280 3346816 0.
IS L to G Mapping 3182 525 1865968 0.
Vector 5256 647 2184824 0.
Vector Scatter 2139 0 0 0.
Matrix 1948 525 1449000 0.
Matrix Null Space 1 0 0 0.
Krylov Solver 3 0 0 0.
Preconditioner 3 0 0 0.
Star Forest Bipartite Graph 67 67 56816 0.
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 3.70026e-05
Average time for zero size MPI_Send(): 6.16989e-06
#PETSc Option Table entries:
--cellml_file ../../../input/hodgkin_huxley_1952.c
--diffusion_solver_maxit 5
--disable_firing_output
--dt_0D 1e-3
--dt_1D 2e-3
--dt_3D 4e-3
--dt_splitting 2e-3
--emg_initial_guess_nonzero
--emg_preconditioner_type none
--emg_solver_maxit 3
--emg_solver_type cg
--end_time 4e-3
--fiber_distribution_file ../../../input/MU_fibre_distribution_3780.txt
--fiber_file ../../../input/277x277fibers.bin
--firing_times_file ../../../input/MU_firing_times_real.txt
--n_subdomains 37
--scenario_name weak_scaling_37_37_2
-on_error_abort
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --known-has-attribute-aligned=1 --known-mpi-int64_t=0 --known-bits-per-byte=8 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --known-level1-dcache-assoc=0 --known-level1-dcache-linesize=32 --known-level1-dcache-size=32768 --known-memcmp-ok=1 --known-mpi-c-double-complex=1 --known-mpi-long-double=1 --known-mpi-shared-libraries=0 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-sizeof-char=1 --known-sizeof-double=8 --known-sizeof-float=4 --known-sizeof-int=4 --known-sizeof-long-long=8 --known-sizeof-long=8 --known-sizeof-short=2 --known-sizeof-size_t=8 --known-sizeof-void-p=8 --with-ar=ar --with-batch=1 --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-dependencies=0 --with-fc=ftn --with-fortran-datatypes=0 --with-fortran-interfaces=0 --with-fortranlib-autodetect=0 --with-ranlib=ranlib --with-scalar-type=real --with-shared-ld=ar --with-etags=0 --with-dependencies=0 --with-x=0 --with-ssl=0 --with-shared-libraries=0 --with-dependencies=0 --with-mpi-lib="[]" --with-mpi-include="[]" --with-blas-lapack-lib="-L/opt/cray/libsci/13.2.0/GNU/5.1/x86_64/lib -lsci_gnu_mp" --with-superlu=1 --with-superlu-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-superlu-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lsuperlu" --with-superlu_dist=1 --with-superlu_dist-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-superlu_dist-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lsuperlu_dist" --with-parmetis=1 --with-parmetis-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-parmetis-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lparmetis" --with-metis=1 --with-metis-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-metis-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lmetis" --with-ptscotch=1 --with-ptscotch-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-ptscotch-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lptscotch -lscotch -lptscotcherr -lscotcherr" --with-scalapack=1 --with-scalapack-include=/opt/cray/libsci/13.2.0/GNU/5.1/x86_64/include --with-scalapack-lib="-L/opt/cray/libsci/13.2.0/GNU/5.1/x86_64/lib -lsci_gnu_mpi_mp -lsci_gnu_mp" --with-mumps=1 --with-mumps-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-mumps-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lptesmumps -lesmumps -lpord" --with-hdf5=1 --with-hdf5-include=/opt/cray/hdf5-parallel/1.10.1.1/GNU/5.1/include --with-hdf5-lib="-L/opt/cray/hdf5-parallel/1.10.1.1/GNU/5.1/lib -lhdf5_parallel -lz -ldl" --CFLAGS="-march=haswell -fopenmp -O3 -ffast-math -fPIC" --CPPFLAGS= --CXXFLAGS="-march=haswell -fopenmp -O3 -ffast-math -fPIC" --FFLAGS="-march=haswell -fopenmp -O3 -ffast-math -fPIC" --LIBS= --CXX_LINKER_FLAGS= --PETSC_ARCH=haswell --prefix=/opt/cray/pe/petsc/3.7.6.2/real/GNU/5.3/haswell --with-hypre=1 --with-hypre-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-hypre-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lHYPRE" --with-sundials=1 --with-sundials-include=/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/include --with-sundials-lib="-L/opt/cray/tpsl/17.11.1/GNU/5.1/haswell/lib -lsundials_cvode -lsundials_cvodes -lsundials_ida -lsundials_idas -lsundials_kinsol -lsundials_nvecparallel -lsundials_nvecserial"