OK, now I get your point. So I think it's actually not that important when the events are created, but whether and in which order they are registered on different ranks (see the attached edited MWE; a rough sketch of it follows below). I must admit that's a relevant concern, mainly for things working with subcomms.
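Roughly like this (a minimal sketch of what I mean, not the actual attached log.c; the class name "MWE", the even/odd split and the sleep times are just my placeholders):

#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  PetscMPIInt    rank;
  PetscClassId   classid;
  PetscLogEvent  ev1, ev2, first;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
  ierr = PetscClassIdRegister("MWE", &classid);CHKERRQ(ierr);

  if (rank % 2 == 0) { /* even ranks register event1 first ... */
    ierr = PetscLogEventRegister("event1", classid, &ev1);CHKERRQ(ierr);
    ierr = PetscLogEventRegister("event2", classid, &ev2);CHKERRQ(ierr);
    first = ev1;
  } else {             /* ... odd ranks register event2 first */
    ierr = PetscLogEventRegister("event2", classid, &ev2);CHKERRQ(ierr);
    ierr = PetscLogEventRegister("event1", classid, &ev1);CHKERRQ(ierr);
    first = ev2;
  }
  ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD, "[%d] event1 registered as %d\n[%d] event2 registered as %d\n", rank, ev1, rank, ev2);CHKERRQ(ierr);
  ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT);CHKERRQ(ierr);

  /* Time the first-registered event: the same id on every rank, but mapped to a
     different name per rank.  Each rank "works" rank+1 seconds, so on 8 ranks
     -log_view should report ~8 s max and a max/min time ratio of 8. */
  ierr = PetscLogEventBegin(first, 0, 0, 0, 0);CHKERRQ(ierr);
  ierr = PetscSleep((PetscReal)(rank + 1));CHKERRQ(ierr);
  ierr = PetscLogEventEnd(first, 0, 0, 0, 0);CHKERRQ(ierr);

  ierr = PetscFinalize();
  return ierr;
}

Run with -log_view, something like this gives per-rank registration printouts of the form shown below.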

Thanks

Vaclav
[0] event1 registered as 3
[0] event2 registered as 4
[2] event1 registered as 3
[2] event2 registered as 4
[4] event1 registered as 3
[4] event2 registered as 4
[5] event2 registered as 3
[5] event1 registered as 4
[7] event2 registered as 3
[7] event1 registered as 4
[1] event2 registered as 3
[3] event2 registered as 3
[3] event1 registered as 4
[1] event1 registered as 4
[6] event1 registered as 3
[6] event2 registered as 4
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------



      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was compiled with a debugging option,      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################


./ex1111 on a arch-mac-mpich-matlab-hdf5 named vhapla-macbook-pro with 8 processors, by vhapla Mon Jul  2 18:36:13 2018
Using Petsc Development GIT revision: v3.9.2-600-g2e10669da5  GIT Date: 2018-06-15 17:49:51 +0200

                         Max       Max/Min        Avg      Total 
Time (sec):           8.012e+00      1.00011   8.011e+00
Objects:              1.000e+00      1.00000   1.000e+00
Flop:                 0.000e+00      0.00000   0.000e+00  0.000e+00
Flop/sec:            0.000e+00      0.00000   0.000e+00  0.000e+00
Memory:               3.821e+04      1.00000              3.057e+05
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       1.200e+01      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 8.0109e+00 100.0%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  4.000e+00  33.3%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------


      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was compiled with a debugging option,      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################


Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

 uÝ                  1 1.0 8.0012e+00 8.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 56  0  0  0  0  56  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Viewer     1              0            0     0.
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 0.000205183
Average time for zero size MPI_Send(): 0.000130266
#PETSc Option Table entries:
-log_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc="/usr/bin/xcrun -sdk macosx10.13 clang" --with-matlab=1 --with-matlab-arch=maci64 --with-matlab-engine --with-matlab-socket=0 --with-hdf5-dir=/opt/miniconda3/envs/salvus --download-mpich --with-valgrind --with-x=0 PETSC_ARCH=arch-mac-mpich-matlab-hdf5
-----------------------------------------
Libraries compiled on 2018-06-19 13:50:29 on vhapla-macbook-pro 
Machine characteristics: Darwin-17.6.0-x86_64-i386-64bit
Using PETSc directory: /Users/vhapla/petsc1
Using PETSc arch: arch-mac-mpich-matlab-hdf5
-----------------------------------------

Using C compiler: /Users/vhapla/petsc1/arch-mac-mpich-matlab-hdf5/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g3
-----------------------------------------

Using include paths: -I/Users/vhapla/petsc1/include -I/Users/vhapla/petsc1/arch-mac-mpich-matlab-hdf5/include -I/opt/miniconda3/envs/salvus/include -I/Applications/MATLAB_R2017b.app/extern/include -I/usr/local/include
-----------------------------------------

Using C linker: /Users/vhapla/petsc1/arch-mac-mpich-matlab-hdf5/bin/mpicc
Using libraries: -Wl,-rpath,/Users/vhapla/petsc1/arch-mac-mpich-matlab-hdf5/lib -L/Users/vhapla/petsc1/arch-mac-mpich-matlab-hdf5/lib -lpetsc -Wl,-rpath,/opt/miniconda3/envs/salvus/lib -L/opt/miniconda3/envs/salvus/lib -L/Applications/MATLAB_R2017b.app/bin/maci64 -L/Applications/MATLAB_R2017b.app/extern/lib/maci64 -llapack -lblas -lhdf5_hl -lhdf5 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -lstdc++ -ldl
-----------------------------------------



      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was compiled with a debugging option,      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################


Attachment: log.c
Description: Binary data


On 2 Jul 2018, at 17:57, Stefano Zampini <stefano.zamp...@gmail.com> wrote:

The intent was not to show you a deadlock; it was to provide evidence that the PetscLogEvent model is meant to have the same event created collectively (ideally at class registration) on all processes. See also PetscLogViewDefault.
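For the record, this is the kind of pattern I mean (a sketch with made-up names, not code taken from PETSc itself): every rank runs the same registration routine, in the same order, so a given event id means the same thing everywhere.

#include <petscsys.h>

/* Hypothetical package-style registration: called on all ranks, always in the
   same order, so the name<->id mapping is identical on every process. */
static PetscClassId  MY_CLASSID;
static PetscLogEvent MY_Event1, MY_Event2;

static PetscErrorCode MyPackageInitialize(void)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = PetscClassIdRegister("MyClass", &MY_CLASSID);CHKERRQ(ierr);
  ierr = PetscLogEventRegister("MyEvent1", MY_CLASSID, &MY_Event1);CHKERRQ(ierr);
  ierr = PetscLogEventRegister("MyEvent2", MY_CLASSID, &MY_Event2);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}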

2018-07-02 18:45 GMT+03:00 Vaclav Hapla <vaclav.ha...@erdw.ethz.ch>:
Thanks for the MWE. But it behaves exactly as I would anticipate; see the attached log. I think that naming the event differently on each rank is not good, but fortunately PETSc just ignores the name for rank > 0. What matters is the event id, which is the same. I correctly get that the time spent in this event is slightly more than 8 s and that the ratio between the shortest and the longest time among all ranks is 8.0.

Where's the deadlock? Note that PetscLogEventBegin/End and PetscStageLogPush/Pop are not collective!
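To illustrate what I mean by non-collective (a sketch with a hypothetical helper, nothing PETSc-specific beyond Begin/End): each rank, or only the ranks of a subcomm, can time an event independently, with no synchronization involved.

#include <petscsys.h>

/* Times some purely rank-local work under the given event.  PetscLogEventBegin/End
   do no communication, so it is fine if only a subset of ranks (e.g. those in a
   subcommunicator) ever calls this; the other ranks simply report zero time for
   the event in -log_view. */
static PetscErrorCode TimeLocalWork(PetscLogEvent event, PetscReal seconds)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = PetscLogEventBegin(event, 0, 0, 0, 0);CHKERRQ(ierr);
  ierr = PetscSleep(seconds);CHKERRQ(ierr);  /* stand-in for the actual local work */
  ierr = PetscLogEventEnd(event, 0, 0, 0, 0);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}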

And thanks for another argument for better specificity of the log view. If the same event were nested at several levels, the obtained total time wouldn't say anything useful.
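For instance (hypothetical event and recursion, just to make the nesting point concrete): a routine that logs an event and ends up calling itself has two instances of the same event active at once; the begin/end pairs simply nest and there is no per-instance identity in the log.

#include <petscsys.h>

static PetscLogEvent MY_Event1;  /* assumed registered elsewhere, identically on all ranks */

/* While the outer call is still inside MY_Event1, the recursive call opens a
   second, simultaneous instance of the same event. */
static PetscErrorCode NestedWork(PetscInt depth)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = PetscLogEventBegin(MY_Event1, 0, 0, 0, 0);CHKERRQ(ierr);
  if (depth > 0) { ierr = NestedWork(depth - 1);CHKERRQ(ierr); }
  ierr = PetscLogEventEnd(MY_Event1, 0, 0, 0, 0);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}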

Thanks

Vaclav



On 2 Jul 2018, at 17:20, Stefano Zampini <stefano.zamp...@gmail.com> wrote:




I don't see why this should lead to a deadlock. With the current class-wise events you can already have many simultaneous instances of the same event at once.


Attached is an MWE to better explain myself. If you run it with p processes, you will see what I mean.

 

-- 
Stefano
<log.c>





-- 
Stefano
