Thanks for the MWE. But it behaves exactly as I would anticipate, see log. I think that naming the event differently for each rank is not good but fortunately PETSc just ignores this for rank > 0. Important is the event id which is the same. I correctly get that the time spent in this event is slightly more than 8 s and the ratio between the shortest and the longest time among all ranks is 8.0. Where's the deadlock? Note that PetscLogEventBegin/End and PetscStageLogPush/Pop are not collective! And thanks for another argument for better specificity of the log view. If the same event would be nested at several levels, the obtained total time wouldn't say anything useful. Thanks Vaclav |
************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary:
----------------------------------------------
##########################################################
# #
# WARNING!!! #
# #
# This code was compiled with a debugging option, #
# To get timing results run ./configure #
# using --with-debugging=no, the performance will #
# be generally two or three times faster. #
# #
##########################################################
./ex1111 on a arch-mac-mpich-matlab-hdf5 named vhapla-macbook-pro with 8
processors, by vhapla Mon Jul 2 17:27:41 2018
Using Petsc Development GIT revision: v3.9.2-600-g2e10669da5 GIT Date:
2018-06-15 17:49:51 +0200
Max Max/Min Avg Total
Time (sec): 8.012e+00 1.00006 8.012e+00
Objects: 1.000e+00 1.00000 1.000e+00
Flop: 0.000e+00 0.00000 0.000e+00 0.000e+00
Flop/sec: 0.000e+00 0.00000 0.000e+00 0.000e+00
Memory: 3.819e+04 1.00000 3.055e+05
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 1.200e+01 1.00000
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N
flop
and VecAXPY() for complex vectors of length N -->
8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- --
Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total
Avg %Total counts %Total
0: Main Stage: 8.0115e+00 100.0% 0.0000e+00 0.0% 0.000e+00 0.0%
0.000e+00 0.0% 4.000e+00 33.3%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting
output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in
this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all
processors)
------------------------------------------------------------------------------------------------------------------------
##########################################################
# #
# WARNING!!! #
# #
# This code was compiled with a debugging option, #
# To get timing results run ./configure #
# using --with-debugging=no, the performance will #
# be generally two or three times faster. #
# #
##########################################################
Event Count Time (sec) Flop
--- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct
%T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
Event rank 0 1 1.0 8.0052e+00 8.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 56 0 0 0 0 56 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Viewer 1 0 0 0.
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 0.000237179
Average time for zero size MPI_Send(): 0.000297129
#PETSc Option Table entries:
-log_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc="/usr/bin/xcrun -sdk macosx10.13 clang"
--with-matlab=1 --with-matlab-arch=maci64 --with-matlab-engine
--with-matlab-socket=0 --with-hdf5-dir=/opt/miniconda3/envs/salvus
--download-mpich --with-valgrind --with-x=0
PETSC_ARCH=arch-mac-mpich-matlab-hdf5
-----------------------------------------
Libraries compiled on 2018-06-19 13:50:29 on vhapla-macbook-pro
Machine characteristics: Darwin-17.6.0-x86_64-i386-64bit
Using PETSc directory: /Users/vhapla/petsc1
Using PETSc arch: arch-mac-mpich-matlab-hdf5
-----------------------------------------
Using C compiler: /Users/vhapla/petsc1/arch-mac-mpich-matlab-hdf5/bin/mpicc
-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas
-fstack-protector -Qunused-arguments -fvisibility=hidden -g3
-----------------------------------------
Using include paths: -I/Users/vhapla/petsc1/include
-I/Users/vhapla/petsc1/arch-mac-mpich-matlab-hdf5/include
-I/opt/miniconda3/envs/salvus/include
-I/Applications/MATLAB_R2017b.app/extern/include -I/usr/local/include
-----------------------------------------
Using C linker: /Users/vhapla/petsc1/arch-mac-mpich-matlab-hdf5/bin/mpicc
Using libraries: -Wl,-rpath,/Users/vhapla/petsc1/arch-mac-mpich-matlab-hdf5/lib
-L/Users/vhapla/petsc1/arch-mac-mpich-matlab-hdf5/lib -lpetsc
-Wl,-rpath,/opt/miniconda3/envs/salvus/lib -L/opt/miniconda3/envs/salvus/lib
-L/Applications/MATLAB_R2017b.app/bin/maci64
-L/Applications/MATLAB_R2017b.app/extern/lib/maci64 -llapack -lblas -lhdf5_hl
-lhdf5 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -lstdc++ -ldl
-----------------------------------------
##########################################################
# #
# WARNING!!! #
# #
# This code was compiled with a debugging option, #
# To get timing results run ./configure #
# using --with-debugging=no, the performance will #
# be generally two or three times faster. #
# #
##########################################################
|
