The debug version does extra reductions for error checking; you should never 
look at -log_summary output from a debug build.

   Normally MatMult() for MPIAIJ matrices has only nearest-neighbor communication, 
so neither MatMult() nor MatMultTranspose() has global reductions. But if some 
scatters involve all entries, VecScatter does use global reductions in 
those cases.  You could do a slimmed-down run on 2 processes with one process in 
the debugger and put a breakpoint in MPI_Allreduce() and MPI_Reduce() to see 
when they are triggered inside MatMultTranspose(). Use 
-start_in_debugger -debugger_nodes 0


   Barry

On Jan 4, 2014, at 1:32 PM, R. Oğuz Selvitopi <[email protected]> wrote:

> Hello,
> 
> I am trying to understand the output generated by PETSc with the -log_summary 
> option.
> 
> Using PetscLogStageRegister/PetscLogStagePush/PetscLogStagePop I want to find 
> out if there exists unnecessary communication in my code.
> My problem is with understanding the number of reductions performed.
> 
> I have a solver whose stages are logged, and in the summary stages output, I 
> get
> 
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
> 4:          Solver: 6.5625e-04   4.3%  4.2000e+02  59.7%  1.600e+01  23.2%  3.478e+00       14.5%  8.000e+00   5.3%
> 
> where it seems I have 8 reduction operations performed. But in the details of 
> the stage events, I get:
> 
> --- Event Stage 4: Solver
> 
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> MatMult                1 1.0 1.2207e-04 1.8 4.10e+01 4.1 8.0e+00 1.5e+01 0.0e+00  1 16 12  7  0  15 27 50 50  0     1
> MatMultTranspose       1 1.0 1.2112e-04 1.0 4.60e+01 3.8 8.0e+00 1.5e+01 2.0e+00  1 18 12  7  1  18 30 50 50 25     1
> VecDot                 3 1.0 2.6989e-04 1.2 2.90e+01 2.6 0.0e+00 0.0e+00 3.0e+00  2 12  0  0  2  36 20  0  0 38     0
> VecSet                 2 1.0 8.1062e-06 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
> VecAXPY                2 1.0 3.7909e-05 1.3 2.00e+01 2.0 0.0e+00 0.0e+00 0.0e+00  0  9  0  0  0   5 15  0  0  0     2
> VecAYPX                1 1.0 5.0068e-06 1.2 1.00e+01 1.7 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   1  8  0  0  0     6
> VecScatterBegin        2 1.0 7.2956e-05 2.4 0.00e+00 0.0 1.6e+01 1.5e+01 0.0e+00  0  0 23 14  0   6  0100100  0     0
> VecScatterEnd          2 1.0 9.5129e-05 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  10  0  0  0  0     0
> 
> It seems there are only 5 reductions.
> 
> But when I detail my log stages (those logs are not included here), they show 
> that the VecAXPY/VecAYPX operations require reductions as well. I have two 
> VecAXPY calls and a single VecAYPX call, so 5+3 = 8.
> 
> Normally these two operations should not require any reductions at all, as 
> opposed to VecDot.
> 
> Do VecAXPY/VecAYPX require reductions? Is it because PETSc is compiled with 
> the debugging option so that it performs additional checks that perform 
> reductions?
> 
> Which is the correct number of reductions in the above statistics, 5 or 8?
> 
> Moreover, why does MatMult require no reduction whereas MatMultTranspose 
> requires two of them?
> 
> Thanks in advance.
