Thanks for the MWE. But it behaves exactly as I would anticipate, see log. I think that naming the event differently for each rank is not good but fortunately PETSc just ignores this for rank > 0. Important is the event id which is the same. I correctly get that the time spent in this event is slightly more than 8 s and the ratio between the shortest and the longest time among all ranks is 8.0.

Where's the deadlock? Note that PetscLogEventBegin/End and PetscStageLogPush/Pop are not collective!

And thanks for another argument for better specificity of the log view. If the same event would be nested at several levels, the obtained total time wouldn't say anything useful.

Thanks

Vaclav

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r 
-fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: 
----------------------------------------------



      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was compiled with a debugging option,      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################


./ex1111 on a arch-mac-mpich-matlab-hdf5 named vhapla-macbook-pro with 8 
processors, by vhapla Mon Jul  2 17:27:41 2018
Using Petsc Development GIT revision: v3.9.2-600-g2e10669da5  GIT Date: 
2018-06-15 17:49:51 +0200

                         Max       Max/Min        Avg      Total 
Time (sec):           8.012e+00      1.00006   8.012e+00
Objects:              1.000e+00      1.00000   1.000e+00
Flop:                 0.000e+00      0.00000   0.000e+00  0.000e+00
Flop/sec:            0.000e+00      0.00000   0.000e+00  0.000e+00
Memory:               3.819e+04      1.00000              3.055e+05
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       1.200e+01      1.00000

Flop counting convention: 1 flop = 1 real number operation of type 
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N 
flop
                            and VecAXPY() for complex vectors of length N --> 
8N flop

Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages ---  -- 
Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     
Avg         %Total   counts   %Total 
 0:      Main Stage: 8.0115e+00 100.0%  0.0000e+00   0.0%  0.000e+00   0.0%  
0.000e+00        0.0%  4.000e+00  33.3% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting 
output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and 
PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in 
this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all 
processors)
------------------------------------------------------------------------------------------------------------------------


      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was compiled with a debugging option,      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################


Event                Count      Time (sec)     Flop                             
--- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct 
 %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

Event rank 0           1 1.0 8.0052e+00 8.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 56  0  0  0  0  56  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Viewer     1              0            0     0.
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 0.000237179
Average time for zero size MPI_Send(): 0.000297129
#PETSc Option Table entries:
-log_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc="/usr/bin/xcrun -sdk macosx10.13 clang" 
--with-matlab=1 --with-matlab-arch=maci64 --with-matlab-engine 
--with-matlab-socket=0 --with-hdf5-dir=/opt/miniconda3/envs/salvus 
--download-mpich --with-valgrind --with-x=0 
PETSC_ARCH=arch-mac-mpich-matlab-hdf5
-----------------------------------------
Libraries compiled on 2018-06-19 13:50:29 on vhapla-macbook-pro 
Machine characteristics: Darwin-17.6.0-x86_64-i386-64bit
Using PETSc directory: /Users/vhapla/petsc1
Using PETSc arch: arch-mac-mpich-matlab-hdf5
-----------------------------------------

Using C compiler: /Users/vhapla/petsc1/arch-mac-mpich-matlab-hdf5/bin/mpicc    
-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas 
-fstack-protector -Qunused-arguments -fvisibility=hidden -g3  
-----------------------------------------

Using include paths: -I/Users/vhapla/petsc1/include 
-I/Users/vhapla/petsc1/arch-mac-mpich-matlab-hdf5/include 
-I/opt/miniconda3/envs/salvus/include 
-I/Applications/MATLAB_R2017b.app/extern/include -I/usr/local/include
-----------------------------------------

Using C linker: /Users/vhapla/petsc1/arch-mac-mpich-matlab-hdf5/bin/mpicc
Using libraries: -Wl,-rpath,/Users/vhapla/petsc1/arch-mac-mpich-matlab-hdf5/lib 
-L/Users/vhapla/petsc1/arch-mac-mpich-matlab-hdf5/lib -lpetsc 
-Wl,-rpath,/opt/miniconda3/envs/salvus/lib -L/opt/miniconda3/envs/salvus/lib 
-L/Applications/MATLAB_R2017b.app/bin/maci64 
-L/Applications/MATLAB_R2017b.app/extern/lib/maci64 -llapack -lblas -lhdf5_hl 
-lhdf5 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -lstdc++ -ldl
-----------------------------------------



      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was compiled with a debugging option,      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################



2. 7. 2018 v 17:20, Stefano Zampini <[email protected]>:




I don't see why this should lead to deadlock? With current class-wise events you can already have many simultaneous instances of the same event at once.


Attached a MVE to better explain myself. If you run with p processes, you will see what I mean

 

--
Stefano
<log.c>

Reply via email to