The results are kind of funky:
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
SNESSolve             40 1.0 4.9745e+02 3.3 4.25e+09 1.0 1.7e+06 3.8e+04 2.7e+03 46 93 99 95 80  46 93 99 95 80  2187
SNESFunctionEval     666 1.0 4.8990e+02 3.4 5.73e+08 1.0 1.7e+06 3.8e+04 1.3e+03 45 13 99 95 40  45 13 99 95 40   299
SNESLineSearch        79 1.0 3.8578e+00 1.0 4.98e+08 1.0 4.0e+05 3.8e+04 6.3e+02  1 11 23 23 19   1 11 23 23 19 33068
VecScatterEnd       1335 1.0 3.4761e+02 5.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 31  0  0  0  0  31  0  0  0  0     0
MatMult MF           547 1.0 1.2570e+01 1.1 1.27e+09 1.0 1.4e+06 3.8e+04 1.1e+03  2 28 81 78 34   2 28 81 78 34 25962
MatMult              547 1.0 1.2571e+01 1.1 1.27e+09 1.0 1.4e+06 3.8e+04 1.1e+03  2 28 81 78 34   2 28 81 78 34 25960
Look at the %T column: the global SNESSolve is 46% of the total time and the
function evaluations are 45%, but MatMult is only 2% (and yet MatMult should
contain most of the function evaluations). I cannot explain this. Also, the
VecScatterEnd is HUGE and has a bad load imbalance of 5.8. Why are there so
many more scatters than function evaluations? What other operations are you
doing that require scatters?
It's almost as if you have some mysterious "extra" function calls outside of
the SNESSolve that are killing the performance. To understand the performance,
it might help to strip out all extraneous computations that are not needed
(like custom monitors, etc.).
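
For example, you could put the solve in its own logging stage so that
-log_summary reports it separately from everything else. A minimal sketch (the
stage name and the surrounding calls are placeholders, not taken from your
code):

  PetscLogStage stage;
  ierr = PetscLogStageRegister("SNESSolve only", &stage);CHKERRQ(ierr);
  ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
  ierr = SNESSolve(snes, NULL, x);CHKERRQ(ierr);  /* only the solve is charged to this stage */
  ierr = PetscLogStagePop();CHKERRQ(ierr);
  /* anything outside the push/pop (custom monitors, diagnostics, I/O) stays in
     the main stage, so "extra" work outside SNESSolve becomes visible in the
     per-stage columns of -log_summary */

The stage columns would then tell you whether the extra function calls happen
inside or outside the solve.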
Barry
> On Aug 25, 2015, at 1:21 AM, Timothée Nicolas <[email protected]>
> wrote:
>
> Here is the log summary (attached). At the beginning are personal prints,
> which you can skip. In the present state I seem to have a memory crash after
> typically 45 iterations (that's why I used 40 here). The log summary
> indicates some creations without destruction of PETSc objects (I will fix
> this immediately); that may cause the memory crash, but I don't think it is
> the cause of the slow function evaluations.
>
> The log_summary is consistent with 0.7 s per function evaluation
> (4.8990e+02/666 = 0.736). In addition, SNESSolve itself takes approximately
> the same amount of time (is that normal?). The other long operation is
> VecScatterEnd. I assume that is the time spent in interprocess communication?
> In which case I suppose it is normal that it takes a significant amount of time.
>
> So this ~10x increase does not look normal, right?
>
> Best
>
> Timothee NICOLAS
>
>
> 2015-08-25 14:56 GMT+09:00 Barry Smith <[email protected]>:
>
> > On Aug 25, 2015, at 12:45 AM, Timothée Nicolas <[email protected]>
> > wrote:
> >
> > Hi,
> >
> > I am testing PETSc on the supercomputer where I used to run my explicit MHD
> > code. For my tests I use 256 processes on a problem of size 128*128*640 =
> > 10485760, that is, 40960 grid points per process, and 8 degrees of freedom
> > (or physical fields). The explicit code used Runge-Kutta 4 for the time
> > scheme, which means 4 function evaluations per time step (plus one
> > operation to put everything together, but let's forget that one).
> >
> > I could thus easily determine that the typical time required for a function
> > evaluation was of the order of 50 ms.
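> >
> > (As a sketch of how such a measurement can be done; rhs_eval here is a
> > placeholder, not the actual routine:)
> >
> > #include <mpi.h>
> >
> > void rhs_eval(const double *u, double *f);   /* placeholder for the MHD right-hand side */
> >
> > /* returns the slowest process's wall time for one function evaluation */
> > double time_one_eval(const double *u, double *f)
> > {
> >   double t0 = MPI_Wtime();
> >   rhs_eval(u, f);                            /* one RK4-stage evaluation */
> >   double tloc = MPI_Wtime() - t0, tmax = 0.0;
> >   MPI_Reduce(&tloc, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
> >   return tmax;                               /* valid on rank 0; ~50 ms here */
> > }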
> >
> > Now with the implicit Newton-Krylov solver written in PETSc, in the present
> > state where I have not yet implemented any Jacobian or preconditioner
> > (so I run with -snes_mf), I measure a typical time between two time steps
> > of 5 to 20 seconds, and the number of function evaluations per time step
> > obtained with SNESGetNumberFunctionEvals is 17 (I am speaking of a
> > particular case, of course).
> >
> > This means a time per function evaluation of about 0.5 to 1 second, that
> > is, 10 to 20 times slower.
> >
> > So I have some questions about this.
> >
> > 1. First, does SNESGetNumberFunctionEvals take into account the function
> > evaluations required to evaluate the Jacobian when -snes_mf is used, as
> > well as the operations required by the GMRES (Krylov) method? If it did, I
> > would intuitively expect a number larger than 17, which could explain the
> > increase in time.
>
> PetscErrorCode SNESGetNumberFunctionEvals(SNES snes, PetscInt *nfuncs)
> {
>   *nfuncs = snes->nfuncs;
> }
>
> PetscErrorCode SNESComputeFunction(SNES snes, Vec x, Vec y)
> {
>   ...
>   snes->nfuncs++;   /* every function evaluation goes through here and is counted */
> }
>
> PetscErrorCode MatCreateSNESMF(SNES snes, Mat *J)
> {
>   ...
>   if (snes->pc && snes->pcside == PC_LEFT) {
>     ierr = MatMFFDSetFunction(*J,(PetscErrorCode (*)(void*,Vec,Vec))SNESComputeFunctionDefaultNPC,snes);CHKERRQ(ierr);
>   } else {
>     ierr = MatMFFDSetFunction(*J,(PetscErrorCode (*)(void*,Vec,Vec))SNESComputeFunction,snes);CHKERRQ(ierr);
>   }
> }
>
> So, yes, I would expect all the function evaluations needed for the
> matrix-free Jacobian matrix-vector products to be counted. You can also look
> at the number of GMRES (Krylov) iterations it took (there should be one
> multiply per iteration) to double-check that the numbers make sense.
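>
> For instance (a sketch, assuming the SNES object is called snes and
> SNESSolve has already run):
>
>   PetscInt nfuncs, lits;
>   ierr = SNESGetNumberFunctionEvals(snes, &nfuncs);CHKERRQ(ierr);
>   ierr = SNESGetLinearSolveIterations(snes, &lits);CHKERRQ(ierr);
>   ierr = PetscPrintf(PETSC_COMM_WORLD, "func evals %D  Krylov its %D\n", nfuncs, lits);CHKERRQ(ierr);
>   /* with -snes_mf each Krylov iteration costs one function evaluation, so
>      nfuncs should be roughly lits plus the Newton steps and line-search evaluations */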
>
> What does your -log_summary output look like? One thing GMRES does is
> introduce a global reduction with each multiply (hence a barrier across all
> your processes); on some systems this can be deadly.
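>
> For example, something like (the executable name here is just whatever
> yours is called):
>
>   mpiexec -n 256 ./your_code -snes_mf -log_summary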
>
> Barry
>
>
> >
> > 2. In any case, I thought that, all things considered, the function
> > evaluation would be the most time-consuming part of a Newton-Krylov solver;
> > am I completely wrong about that? Is the 10-20x factor legit?
> >
> > I realize of course that preconditioning should make all this smoother, in
> > particular by allowing larger time steps, but here I am just concerned
> > about the sheer function evaluation time.
> >
> > Best regards
> >
> > Timothee NICOLAS
>
>
> <log_summary.txt>