> On Aug 25, 2015, at 2:06 AM, Timothée Nicolas <[email protected]> wrote:
>
> OK, I see,
>
> Might it be that I do something a bit funky to obtain a good initial guess for the solve? I had the following idea, which I used with great success on a very different problem (much simpler, maybe that's why it worked): obtain the initial guess as a cubic extrapolation of the preceding solutions. The idea is that I expect my solution to be reasonably smooth in time, so the increment of the fields should also be continuous (I solve for the increments, not the fields themselves). Therefore, I store in my user context the current vector Xk as well as the two previous solutions Xkm1 and Xkm2.
>
> I define
>
>   dxm2 = Xkm1 - Xkm2
>   dxm1 = Xk - Xkm1
>
> and I use the result of the last SNESSolve as
>
>   dx = Xkp1 - Xk
>
> Then I set the new dx initial guess as the pointwise cubic extrapolation of (dxm2, dxm1, dx).
>
> However, this seems pretty local and I don't see why scatters would be required for it.
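[For reference, the extrapolation described above can be sketched in NumPy: fit a polynomial in time through the three previous increments and evaluate it one step ahead. This is a minimal illustrative sketch, not the poster's code; the function and argument names are invented here.]

```python
import numpy as np

def extrapolate_increment(dxm2, dxm1, dx, tkm1, tk, t, dt):
    """Pointwise extrapolation of the next increment from the last three.

    Fits a polynomial in time through (tkm1, dxm2), (tk, dxm1), (t, dx)
    and evaluates it at t + dt. Each vector component is treated
    independently, which is why no scatters (communication) are needed.
    """
    # Vandermonde system: p(s) = c2*s**2 + c1*s + c0 through the three samples
    V = np.vander([tkm1, tk, t], 3)                          # rows: [s**2, s, 1]
    c2, c1, c0 = np.linalg.solve(V, np.stack([dxm2, dxm1, dx]))
    s = t + dt
    return c2 * s**2 + c1 * s + c0
```

[If the stored increments really do follow a smooth polynomial in time, this reproduces the next increment exactly; otherwise it is only a (usually good) initial guess for the Newton solve.]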
   Yes, no scatters here.

> I printed the routine I use to do this below. In any case I will clean it up a bit and remove the extra stuff (not much there, however). If that is not sufficient, I will turn my form function into a dummy that requires no computation and see what happens.
>
> Timothee
>
>   PetscErrorCode :: ierr
>   PetscScalar    :: M(3,3)
>   Vec            :: xkm2, xkm1
>   Vec            :: coef1, coef2, coef3
>   PetscScalar    :: a, b, c, t, det
>
>   a = user%tkm1
>   b = user%tk
>   c = user%t
>   t = user%t + user%dt
>
>   det = b*a**2 + c*b**2 + a*c**2 - (c*a**2 + a*b**2 + b*c**2)
>
>   M(1,1) = (b-c)/det
>   M(2,1) = (c**2-b**2)/det
>   M(3,1) = (c*b**2-b*c**2)/det
>
>   M(1,2) = (c-a)/det
>   M(2,2) = (a**2-c**2)/det
>   M(3,2) = (a*c**2-c*a**2)/det
>
>   M(1,3) = (a-b)/det
>   M(2,3) = (b**2-a**2)/det
>   M(3,3) = (b*a**2-a*b**2)/det
>
>   call VecDuplicate(x,xkm1,ierr)
>   call VecDuplicate(x,xkm2,ierr)
>
>   call VecDuplicate(x,coef1,ierr)
>   call VecDuplicate(x,coef2,ierr)
>   call VecDuplicate(x,coef3,ierr)
>
>   call VecWAXPY(xkm2,-one,user%Xkm2,user%Xkm1,ierr)
>   call VecWAXPY(xkm1,-one,user%Xkm1,user%Xk,ierr)
>
>   ! The following lines correspond to the simple operation
>   !   coef1 = M(1,1)*xkm2 + M(1,2)*xkm1 + M(1,3)*x
>   !   coef2 = M(2,1)*xkm2 + M(2,2)*xkm1 + M(2,3)*x
>   !   coef3 = M(3,1)*xkm2 + M(3,2)*xkm1 + M(3,3)*x
>
>   call VecCopy(xkm2,coef1,ierr)
>   call VecScale(coef1,M(1,1),ierr)
>   call VecAXPY(coef1,M(1,2),xkm1,ierr)
>   call VecAXPY(coef1,M(1,3),x,ierr)
>
>   call VecCopy(xkm2,coef2,ierr)
>   call VecScale(coef2,M(2,1),ierr)
>   call VecAXPY(coef2,M(2,2),xkm1,ierr)
>   call VecAXPY(coef2,M(2,3),x,ierr)
>
>   call VecCopy(xkm2,coef3,ierr)
>   call VecScale(coef3,M(3,1),ierr)
>   call VecAXPY(coef3,M(3,2),xkm1,ierr)
>   call VecAXPY(coef3,M(3,3),x,ierr)
>
>   call VecCopy(coef3,x,ierr)
>   call VecAXPY(x,t,coef2,ierr)
>   call VecAXPY(x,t**2,coef1,ierr)
>
>   call VecDestroy(xkm2,ierr)
>   call VecDestroy(xkm1,ierr)
>
>   call VecDestroy(coef1,ierr)
>   call VecDestroy(coef2,ierr)
>   call VecDestroy(coef3,ierr)
>
> 2015-08-25 15:47 GMT+09:00 Barry Smith <[email protected]>:
>
> The results are kind of funky,
>
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> SNESSolve             40 1.0 4.9745e+02 3.3 4.25e+09 1.0 1.7e+06 3.8e+04 2.7e+03 46 93 99 95 80  46 93 99 95 80  2187
> SNESFunctionEval     666 1.0 4.8990e+02 3.4 5.73e+08 1.0 1.7e+06 3.8e+04 1.3e+03 45 13 99 95 40  45 13 99 95 40   299
> SNESLineSearch        79 1.0 3.8578e+00 1.0 4.98e+08 1.0 4.0e+05 3.8e+04 6.3e+02  1 11 23 23 19   1 11 23 23 19 33068
> VecScatterEnd       1335 1.0 3.4761e+02 5.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 31  0  0  0  0  31  0  0  0  0     0
> MatMult MF           547 1.0 1.2570e+01 1.1 1.27e+09 1.0 1.4e+06 3.8e+04 1.1e+03  2 28 81 78 34   2 28 81 78 34 25962
> MatMult              547 1.0 1.2571e+01 1.1 1.27e+09 1.0 1.4e+06 3.8e+04 1.1e+03  2 28 81 78 34   2 28 81 78 34 25960
> ------------------------------------------------------------------------------------------------------------------------
>
> Look at the %T column: the global SNES solve is 46% of the total time, function evaluations are
> 45%, but MatMult is only 2% (and yet MatMult should contain most of the function evaluations). I cannot explain this. Also, the VecScatterEnd is HUGE and has a bad load imbalance of 5.8. Why are there so many more scatters than function evaluations? What other operations are you doing that require scatters?
>
> It's almost as if you have some mysterious "extra" function calls outside of the SNESSolve that are killing the performance. It might help in understanding the performance to strip out all extraneous computations that are not needed (like custom monitors, etc.).
>
> Barry
>
> > On Aug 25, 2015, at 1:21 AM, Timothée Nicolas <[email protected]> wrote:
> >
> > Here is the log summary (attached). At the beginning are personal prints; you can skip them. In the present state I seem to have a memory crash after typically 45 iterations (that's why I used 40 here). The log summary indicates some creations without destructions of PETSc objects (I will fix this immediately); that may cause the memory crash, but I don't think it's the cause of the slow function evaluations.
> >
> > The log_summary is consistent with 0.7 s per function evaluation (4.8990e+02/666 = 0.736). In addition, SNESSolve itself takes approximately the same amount of time (is that normal?). The other long operation is VecScatterEnd. I assume that is the time spent in interprocess communication? In which case I suppose it is normal that it takes a significant amount of time.
> >
> > So this ~10x increase does not look normal, right?
> >
> > Best
> >
> > Timothee NICOLAS
> >
> > 2015-08-25 14:56 GMT+09:00 Barry Smith <[email protected]>:
> >
> > > On Aug 25, 2015, at 12:45 AM, Timothée Nicolas <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > I am testing PETSc on the supercomputer where I used to run my explicit MHD code.
> > > For my tests I use 256 processes on a problem of size 128*128*640 = 10485760, that is, 40960 grid points per process, with 8 degrees of freedom (or physical fields) per point. The explicit code used Runge-Kutta 4 for the time scheme, which means 4 function evaluations per time step (plus one operation to put everything together, but let's forget that one).
> > >
> > > I could thus easily determine that the typical time required for a function evaluation was on the order of 50 ms.
> > >
> > > Now with the implicit Newton-Krylov solver written with PETSc, in the present state where I have not yet implemented any Jacobian or preconditioner whatsoever (so I run with -snes_mf), I measure a typical time between two time steps of 5 to 20 seconds, and the number of function evaluations per time step reported by SNESGetNumberFunctionEvals is 17 (for one particular case, of course).
> > >
> > > This means a time per function evaluation of about 0.5 to 1 second, that is, 10 to 20 times slower.
> > >
> > > So I have some questions about this.
> > >
> > > 1. First, does SNESGetNumberFunctionEvals take into account the function evaluations required to evaluate the Jacobian when -snes_mf is used, as well as the operations required by the GMRES (Krylov) method? If it did, I would intuitively expect a number larger than 17, which could explain the increase in time.
> >
> >   PetscErrorCode SNESGetNumberFunctionEvals(SNES snes, PetscInt *nfuncs)
> >   {
> >     *nfuncs = snes->nfuncs;
> >   }
> >
> >   PetscErrorCode SNESComputeFunction(SNES snes,Vec x,Vec y)
> >   {
> >     ...
> >     snes->nfuncs++;
> >   }
> >
> >   PetscErrorCode MatCreateSNESMF(SNES snes,Mat *J)
> >   {
> >     .....
> >     if (snes->pc && snes->pcside == PC_LEFT) {
> >       ierr = MatMFFDSetFunction(*J,(PetscErrorCode (*)(void*,Vec,Vec))SNESComputeFunctionDefaultNPC,snes);CHKERRQ(ierr);
> >     } else {
> >       ierr = MatMFFDSetFunction(*J,(PetscErrorCode (*)(void*,Vec,Vec))SNESComputeFunction,snes);CHKERRQ(ierr);
> >     }
> >   }
> >
> > So, yes, I would expect all the function evaluations needed for the matrix-free Jacobian matrix-vector products to be counted. You can also look at the number of GMRES Krylov iterations (there should be one multiply per iteration) to double-check that the numbers make sense.
> >
> > What does your -log_summary output look like? One thing GMRES does is introduce a global reduction with each multiply (hence a barrier across all your processes); on some systems this can be deadly.
> >
> > Barry
> >
> > > 2. In any case, I thought that, all things considered, the function evaluation would be the most time-consuming part of a Newton-Krylov solver; am I completely wrong about that? Is the 10-20x factor legit?
> > >
> > > I realize of course that preconditioning should make all this smoother, in particular by allowing larger time steps, but here I am just concerned about the sheer function evaluation time.
> > >
> > > Best regards
> > >
> > > Timothee NICOLAS
> > >
> > > <log_summary.txt>
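[As a side note on the numbers discussed in the thread, the per-evaluation cost and the slowdown factor can be checked with a couple of lines. The figures are taken from the -log_summary excerpt and the ~50 ms explicit-code timing quoted in the first email; this is just the thread's arithmetic made explicit.]

```python
# Figures quoted in the thread: SNESFunctionEval took 4.8990e+02 s over
# 666 calls, versus ~50 ms per evaluation in the explicit RK4 code.
total_time = 4.8990e+02          # seconds spent in SNESFunctionEval
n_evals = 666                    # number of function evaluations
per_eval = total_time / n_evals  # seconds per function evaluation
slowdown = per_eval / 0.050      # relative to the explicit code
print(f"{per_eval:.3f} s per evaluation, ~{slowdown:.0f}x slower")
# prints "0.736 s per evaluation, ~15x slower"
```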
