> On Aug 25, 2015, at 2:06 AM, Timothée Nicolas <[email protected]> wrote:
>
> OK, I see,
>
> Might it be that I do something a bit funky to obtain a good initial guess for the solve? I had the following idea, which I used with great success on a very different problem (much simpler, maybe that's why it worked): obtain the initial guess as a cubic extrapolation of the preceding solutions. The idea is that I expect my solution to be reasonably smooth in time, so the increment of the fields should also be continuous (I solve for the increments, not the fields themselves). Therefore, I store in my user context the current vector Xk as well as the two previous solutions Xkm1 and Xkm2.
>
> I define
>
>   dxm2 = Xkm1 - Xkm2
>   dxm1 = Xk - Xkm1
>
> and I use the result of the last SNESSolve as
>
>   dx = Xkp1 - Xk
>
> Then I set the new dx initial guess as the pointwise cubic extrapolation of (dxm2, dxm1, dx).
>
> However, this seems pretty local and I don't see why scatters would be required for it.
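[For reference, the extrapolation described above can be sketched in NumPy: fit a polynomial in time through the three previous increments and evaluate it one step ahead. This is a minimal illustrative sketch, not the poster's code; the function and argument names are invented here.]

```python
import numpy as np

def extrapolate_increment(dxm2, dxm1, dx, tkm1, tk, t, dt):
    """Pointwise extrapolation of the next increment from the last three.

    Fits a polynomial in time through (tkm1, dxm2), (tk, dxm1), (t, dx)
    and evaluates it at t + dt. Each vector component is treated
    independently, which is why no scatters (communication) are needed.
    """
    # Vandermonde system: p(s) = c2*s**2 + c1*s + c0 through the three samples
    V = np.vander([tkm1, tk, t], 3)                          # rows: [s**2, s, 1]
    c2, c1, c0 = np.linalg.solve(V, np.stack([dxm2, dxm1, dx]))
    s = t + dt
    return c2 * s**2 + c1 * s + c0
```

[If the stored increments really do follow a smooth polynomial in time, this reproduces the next increment exactly; otherwise it is only a (usually good) initial guess for the Newton solve.]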
   Yes, no scatters here.

> I printed the routine I use to do this below. In any case I will clean it up a bit and remove the extra stuff (not much there, however). If that is not sufficient, I will turn my form function into a dummy that requires no computation and see what happens.
>
> Timothee
>
>   PetscErrorCode :: ierr
>   PetscScalar    :: M(3,3)
>   Vec            :: xkm2, xkm1
>   Vec            :: coef1, coef2, coef3
>   PetscScalar    :: a, b, c, t, det
>
>   a = user%tkm1
>   b = user%tk
>   c = user%t
>   t = user%t + user%dt
>
>   det = b*a**2 + c*b**2 + a*c**2 - (c*a**2 + a*b**2 + b*c**2)
>
>   M(1,1) = (b-c)/det
>   M(2,1) = (c**2-b**2)/det
>   M(3,1) = (c*b**2-b*c**2)/det
>
>   M(1,2) = (c-a)/det
>   M(2,2) = (a**2-c**2)/det
>   M(3,2) = (a*c**2-c*a**2)/det
>
>   M(1,3) = (a-b)/det
>   M(2,3) = (b**2-a**2)/det
>   M(3,3) = (b*a**2-a*b**2)/det
>
>   call VecDuplicate(x,xkm1,ierr)
>   call VecDuplicate(x,xkm2,ierr)
>
>   call VecDuplicate(x,coef1,ierr)
>   call VecDuplicate(x,coef2,ierr)
>   call VecDuplicate(x,coef3,ierr)
>
>   call VecWAXPY(xkm2,-one,user%Xkm2,user%Xkm1,ierr)
>   call VecWAXPY(xkm1,-one,user%Xkm1,user%Xk,ierr)
>
>   ! The following lines correspond to the simple operation
>   !   coef1 = M(1,1)*xkm2 + M(1,2)*xkm1 + M(1,3)*x
>   !   coef2 = M(2,1)*xkm2 + M(2,2)*xkm1 + M(2,3)*x
>   !   coef3 = M(3,1)*xkm2 + M(3,2)*xkm1 + M(3,3)*x
>
>   call VecCopy(xkm2,coef1,ierr)
>   call VecScale(coef1,M(1,1),ierr)
>   call VecAXPY(coef1,M(1,2),xkm1,ierr)
>   call VecAXPY(coef1,M(1,3),x,ierr)
>
>   call VecCopy(xkm2,coef2,ierr)
>   call VecScale(coef2,M(2,1),ierr)
>   call VecAXPY(coef2,M(2,2),xkm1,ierr)
>   call VecAXPY(coef2,M(2,3),x,ierr)
>
>   call VecCopy(xkm2,coef3,ierr)
>   call VecScale(coef3,M(3,1),ierr)
>   call VecAXPY(coef3,M(3,2),xkm1,ierr)
>   call VecAXPY(coef3,M(3,3),x,ierr)
>
>   call VecCopy(coef3,x,ierr)
>   call VecAXPY(x,t,coef2,ierr)
>   call VecAXPY(x,t**2,coef1,ierr)
>
>   call VecDestroy(xkm2,ierr)
>   call VecDestroy(xkm1,ierr)
>
>   call VecDestroy(coef1,ierr)
>   call VecDestroy(coef2,ierr)
>   call VecDestroy(coef3,ierr)
>
> 2015-08-25 15:47 GMT+09:00 Barry Smith <[email protected]>:
>
> The results are kind of funky,
>
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> SNESSolve             40 1.0 4.9745e+02 3.3 4.25e+09 1.0 1.7e+06 3.8e+04 2.7e+03 46 93 99 95 80  46 93 99 95 80  2187
> SNESFunctionEval     666 1.0 4.8990e+02 3.4 5.73e+08 1.0 1.7e+06 3.8e+04 1.3e+03 45 13 99 95 40  45 13 99 95 40   299
> SNESLineSearch        79 1.0 3.8578e+00 1.0 4.98e+08 1.0 4.0e+05 3.8e+04 6.3e+02  1 11 23 23 19   1 11 23 23 19 33068
> VecScatterEnd       1335 1.0 3.4761e+02 5.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 31  0  0  0  0  31  0  0  0  0     0
> MatMult MF           547 1.0 1.2570e+01 1.1 1.27e+09 1.0 1.4e+06 3.8e+04 1.1e+03  2 28 81 78 34   2 28 81 78 34 25962
> MatMult              547 1.0 1.2571e+01 1.1 1.27e+09 1.0 1.4e+06 3.8e+04 1.1e+03  2 28 81 78 34   2 28 81 78 34 25960
> ------------------------------------------------------------------------------------------------------------------------
>
> Look at the %T column: the global SNES solve is 46% of the total time, function evaluations are
> 45%, but MatMult is only 2% (and yet MatMult should contain most of the function evaluations). I cannot explain this. Also, the VecScatterEnd is HUGE and has a bad load imbalance of 5.8. Why are there so many more scatters than function evaluations? What other operations are you doing that require scatters?
>
> It's almost as if you have some mysterious "extra" function calls outside of the SNESSolve that are killing the performance. It might help in understanding the performance to strip out all extraneous computations that are not needed (like custom monitors, etc.).
>
> Barry
>
> > On Aug 25, 2015, at 1:21 AM, Timothée Nicolas <[email protected]> wrote:
> >
> > Here is the log summary (attached). At the beginning are personal prints; you can skip them. In the present state I seem to have a memory crash after typically 45 iterations (that's why I used 40 here). The log summary indicates some creations without destructions of PETSc objects (I will fix this immediately); that may cause the memory crash, but I don't think it's the cause of the slow function evaluations.
> >
> > The log_summary is consistent with 0.7 s per function evaluation (4.8990e+02/666 = 0.736). In addition, SNESSolve itself takes approximately the same amount of time (is that normal?). The other long operation is VecScatterEnd. I assume that is the time spent in interprocess communication? In which case I suppose it is normal that it takes a significant amount of time.
> >
> > So this ~10x increase does not look normal, right?
> >
> > Best
> >
> > Timothee NICOLAS
> >
> > 2015-08-25 14:56 GMT+09:00 Barry Smith <[email protected]>:
> >
> > > On Aug 25, 2015, at 12:45 AM, Timothée Nicolas <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > I am testing PETSc on the supercomputer where I used to run my explicit MHD code.
> > > For my tests I use 256 processes on a problem of size 128*128*640 = 10485760, that is, 40960 grid points per process, with 8 degrees of freedom (or physical fields) per point. The explicit code used Runge-Kutta 4 for the time scheme, which means 4 function evaluations per time step (plus one operation to put everything together, but let's forget that one).
> > >
> > > I could thus easily determine that the typical time required for a function evaluation was on the order of 50 ms.
> > >
> > > Now with the implicit Newton-Krylov solver written with PETSc, in the present state where I have not yet implemented any Jacobian or preconditioner whatsoever (so I run with -snes_mf), I measure a typical time between two time steps of 5 to 20 seconds, and the number of function evaluations per time step reported by SNESGetNumberFunctionEvals is 17 (for one particular case, of course).
> > >
> > > This means a time per function evaluation of about 0.5 to 1 second, that is, 10 to 20 times slower.
> > >
> > > So I have some questions about this.
> > >
> > > 1. First, does SNESGetNumberFunctionEvals take into account the function evaluations required to evaluate the Jacobian when -snes_mf is used, as well as the operations required by the GMRES (Krylov) method? If it did, I would intuitively expect a number larger than 17, which could explain the increase in time.
> >
> >   PetscErrorCode SNESGetNumberFunctionEvals(SNES snes, PetscInt *nfuncs)
> >   {
> >     *nfuncs = snes->nfuncs;
> >   }
> >
> >   PetscErrorCode SNESComputeFunction(SNES snes,Vec x,Vec y)
> >   {
> >     ...
> >     snes->nfuncs++;
> >   }
> >
> >   PetscErrorCode MatCreateSNESMF(SNES snes,Mat *J)
> >   {
> >     .....
> >     if (snes->pc && snes->pcside == PC_LEFT) {
> >       ierr = MatMFFDSetFunction(*J,(PetscErrorCode (*)(void*,Vec,Vec))SNESComputeFunctionDefaultNPC,snes);CHKERRQ(ierr);
> >     } else {
> >       ierr = MatMFFDSetFunction(*J,(PetscErrorCode (*)(void*,Vec,Vec))SNESComputeFunction,snes);CHKERRQ(ierr);
> >     }
> >   }
> >
> > So, yes, I would expect all the function evaluations needed for the matrix-free Jacobian matrix-vector products to be counted. You can also look at the number of GMRES Krylov iterations (there should be one multiply per iteration) to double-check that the numbers make sense.
> >
> > What does your -log_summary output look like? One thing GMRES does is introduce a global reduction with each multiply (hence a barrier across all your processes); on some systems this can be deadly.
> >
> > Barry
> >
> > > 2. In any case, I thought that, all things considered, the function evaluation would be the most time-consuming part of a Newton-Krylov solver; am I completely wrong about that? Is the 10-20x factor legit?
> > >
> > > I realize of course that preconditioning should make all this smoother, in particular by allowing larger time steps, but here I am just concerned about the sheer function evaluation time.
> > >
> > > Best regards
> > >
> > > Timothee NICOLAS
> > >
> > > <log_summary.txt>
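[As a side note on the numbers discussed in the thread, the per-evaluation cost and the slowdown factor can be checked with a couple of lines. The figures are taken from the -log_summary excerpt and the ~50 ms explicit-code timing quoted in the first email; this is just the thread's arithmetic made explicit.]

```python
# Figures quoted in the thread: SNESFunctionEval took 4.8990e+02 s over
# 666 calls, versus ~50 ms per evaluation in the explicit RK4 code.
total_time = 4.8990e+02          # seconds spent in SNESFunctionEval
n_evals = 666                    # number of function evaluations
per_eval = total_time / n_evals  # seconds per function evaluation
slowdown = per_eval / 0.050      # relative to the explicit code
print(f"{per_eval:.3f} s per evaluation, ~{slowdown:.0f}x slower")
# prints "0.736 s per evaluation, ~15x slower"
```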
