> On Oct 30, 2017, at 2:23 PM, Mark Lohry <mlo...@gmail.com> wrote: > > > Hmm, are those blocks dense? If so you could benefit enormously from using > BAIJ format. > > > Yes they're dense blocks. Usually coupled compressible 3D NS with DG > elements, 5 equations x order (N+1)*(N+2)*(N+3)/3 block size. So block sizes > of 50^2 to 175^2 are typical. I'll try BAIJ; I initially set it up with AIJ > as it seemed better supported in parallel on the linear solver table, but I > suppose these are rather large blocks... still surprising performance as this > was overall a pretty small system (1,536 elements/diagonal 100^2 blocks).
Something is really wrong to get those huge times. > > > Could you run with -ksp_view_mat binary and send the resulting file called > binaryoutput and we can run the coloring codes local to performance debug. > > > Will send this evening. Thanks but send for the AIJ case, not BAIJ > > > On Mon, Oct 30, 2017 at 3:02 PM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote: > > > On Oct 30, 2017, at 1:58 PM, Mark Lohry <mlo...@gmail.com> wrote: > > > > Hmm, metis doesn't really have anything to do with the sparsity of the > > Jacobian does it? > > > > No, I just mean I'm doing initial partitioning and parallel communication > > for the residual evaluations independently of petsc, and then doing a > > 1-to-1 mapping to the petsc solution vector. Along with manually setting > > the non-zero structure of the MPIAIJ system as in the user manual. I don't > > think there's anything wrong with the system structure as it gives the same > > correct answer as the un-preconditioned matrix-free approach. > > > > The exact system those MatColoring times came from has size (100x100) > > blocks on the diagonals corresponding to the tetrahedral cells, with those > > having 4 neighbor blocks on the same row (or fewer for elements on > > boundaries.) > > Hmm, are those blocks dense? If so you could benefit enormously from using > BAIJ format. > > Matt, > > Sounds like performance bugs for the parallel coloring apply algorithms > with big "diagonal blocks" > > Mark, > > Could you run with -ksp_view_mat binary and send the resulting file > called binaryoutput and we can run the coloring codes local to performance > debug. > > > Barry > > > > > On Mon, Oct 30, 2017 at 1:55 PM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote: > > > > > On Oct 30, 2017, at 12:39 PM, Mark Lohry <mlo...@gmail.com> wrote: > > > > > > > > > > > > > > 3) Are there any hooks analogous to KSPSetPreSolve/PostSolve for the FD > > > > computation of the jacobians, or for the computation of the > > > > preconditioner? I'd like to get a handle on the relative costs of these. > > > > > > No, do you just want the time? You can get that from the logging; for > > > example -log_view > > > > > > Yes, was just thinking in regards to your suggestion of recomputing when > > > the number of linear iterations gets too high; I assume it's the ratio of > > > preconditioner cost vs linear solver cost at runtime that's the metric of > > > interest, and not the absolute value of either. But I'll cross that > > > bridge when I come to it. > > > > > > When I had asked, I was looking to see where a long pause was happening > > > thinking it was the FD jacobian; turned out to be before that in > > > MatColoringApply which seems surprisingly expensive. MATCOLORINGJP took > > > ~15 minutes on 32 cores on a small 153,000^2 system, with > > > MATCOLORINGGREEDY taking 30 seconds. Any guidance there, or is this > > > expected? I'm not using DM, just manually entering the sparsity resulting > > > from a metis decomposition of a tetrahedral mesh. > > > > Hmm, metis doesn't really have anything to do with the sparsity of the > > Jacobian does it? > > > > Matt, > > > > These times are huge. What is going on? > > > > Barry > > > > > > > > > > > Thanks for the info on the lag logic, I'll play with the TS pre/post > > > calls for the time-accurate problems and only use LagJacobian. > > > > > > On Mon, Oct 30, 2017 at 11:29 AM, Smith, Barry F. <bsm...@mcs.anl.gov> > > > wrote: > > > > > > > On Oct 29, 2017, at 11:50 AM, Mark Lohry <mlo...@gmail.com> wrote: > > > > > > > > Thanks again Barry, I've got the preconditioners hooked up with > > > > -snes_mf_operator and at least AMG looks to be working great on a high > > > > order unstructured DG problem. > > > > > > > > Couple questions on the SNESSetLagJacobian + SNESSetLagPreconditioner > > > > code flow: > > > > > > > > 1) With -snes_mf_operator, and given SNESSetLagJacobian(snes, 1) > > > > (default) and SNESSetLagPreconditioner(snes, 2), after the first KSP > > > > solve in a newton iteration, will it do the finite different jacobian > > > > calculation? Or will the Jacobian only be computed when the > > > > preconditioner lag setting demands it on the 3rd newton step? I suspect > > > > it's the latter based on where I see the code pause. > > > > > > SNES with -snes_mf_operator will ALWAYS use the matrix-free finite > > > difference f(x+h) - f(x) to apply the matrix vector product. > > > > > > The LagJacobian and LagPreconditioner are not coordinated. The first > > > determines how often the Jacobian used for preconditioning is recomputed > > > and the second determines how often the preconditioner is recomputed. > > > > > > If you are using -snes_mf_operator then it never makes sense to have > > > lagJacobian < lagPreconditioner since it would recompute the Jacobian but > > > not actually use it. It also makes no sense for lagPreconditioner < > > > lagJacobian because you'd be recomputing the preconditioner on the same > > > Jacobian. > > > > > > But actually if you don't change the Jacobian used in building the > > > preconditioner then when it tries to recompute the preconditioner it > > > determines the matrix has not changed so skips rebuilding the > > > preconditioner. So when using -snes_mf_operator there is really no reason > > > generally to set the preconditioner lag. > > > > > > > > 2) How do implicit TS and SNESSetLagPreconditioner/Persists interact? > > > > Does the counter since-last-preconditioner-compute reset with time > > > > steps, or does that lag counter just increment with every SNES solve > > > > regardless of how many nonlinear solves might have happened in a given > > > > timestep? Say lag preconditioner is 2, and a time stepper uses 3, 2, > > > > and 3 nonlinear solves on 3 steps, is the flow > > > > > > > > (time step 1)->(update preconditioner)->(snes solve)->(snes > > > > solve)->(update preconditioner)->(snes solve) > > > > (time step 2)->(snes solve)->(update preconditioner)->(snes solve) > > > > (time step 3)->(snes solve)->(update preconditioner)->(snes > > > > solve)->(snes solve) > > > > > > > > or > > > > > > > > (time step 1)->(update preconditioner)->(snes solve)->(snes > > > > solve)->(update preconditioner)->(snes solve) > > > > (time step 2)->(update preconditioner)->(snes solve)->(snes solve) > > > > (time step 3)->(update preconditioner)->(snes solve)->(snes > > > > solve)->(update preconditioner)->(snes solve) > > > > > > > > ? > > > > > > > > I think for implicit time stepping I'd probably want the preconditioner > > > > to be recomputed just once at the beginning of each time step, or some > > > > multiple of that. Does that sound reasonable? > > > > > > Yes, what you want to do is completely reasonable. > > > > > > You can use SNESSetLagJacobian() and SNESSetLagJacobianPersists() in > > > combination to have the Jacobian recomputed ever fixed number of times; > > > if you set the persists flag and set LagJacobian to 10 it will recompute > > > the Jacobian used in the preconditioner every 10th time a new Jacobian is > > > needed. > > > > > > If you want to compute the new Jacobian used to build the > > > preconditioner once at the beginning of each new TS stage you can set > > > SNESSetLagJacobian() to negative -2 in the TS prestage call. There are > > > possibly other tricks you can do by setting the two flags at different > > > locations. > > > > > > An alternative to hardwiring how often the Jacobian used to build the > > > preconditioner is rebuilt is to rebuild based on when the preconditioner > > > starts "working less well". Here you could put an additional KSPMonitor > > > or SNESMonitor that detects if the number of linear iterations is above a > > > certain amount and then sets the recompute Jacobian flag to -2 so that > > > for the next solve it recreates the Jacobian used in building the > > > preconditioner. > > > > > > > > > > > > > > 3) Are there any hooks analogous to KSPSetPreSolve/PostSolve for the FD > > > > computation of the jacobians, or for the computation of the > > > > preconditioner? I'd like to get a handle on the relative costs of these. > > > > > > No, do you just want the time? You can get that from the logging; for > > > example -log_view > > > > > > > > > > > > > > > Best, > > > > Mark > > > > > > > > On Sat, Sep 23, 2017 at 3:28 PM, Mark Lohry <mlo...@gmail.com> wrote: > > > > Great, thanks Barry. > > > > > > > > On Sat, Sep 23, 2017 at 3:12 PM, Barry Smith <bsm...@mcs.anl.gov> wrote: > > > > > > > > > On Sep 23, 2017, at 12:48 PM, Mark W. Lohry <mlo...@princeton.edu> > > > > > wrote: > > > > > > > > > > I'm currently using JFNK in an application where I don't have a > > > > > hand-coded jacobian, and it's working well enough but as expected the > > > > > scaling isn't great. > > > > > > > > > > What is the general process for using PC with MatMFFDComputeJacobian? > > > > > Does it make sense to occasionally have petsc re-compute the jacobian > > > > > via finite differences, and then recompute the preconditioner? Any > > > > > that just need the sparsity structure? > > > > > > > > Mark > > > > > > > > Yes, this is a common approach. SNESSetLagJacobian -snes_lag_jacobian > > > > > > > > The normal approach in SNES to use matrix-free for the operator and > > > > use finite differences to compute an approximate Jacobian used to > > > > construct preconditioners is to to create a sparse matrix with the > > > > sparsity of the approximate Jacobian (yes you need a way to figure out > > > > the sparsity, if you use DMDA it will figure out the sparsity for you). > > > > Then you use > > > > > > > > SNESSetJacobian(snes,J,J, SNESComputeJacobianDefaultColor, NULL); > > > > > > > > and use the options database option -snes_mf_operator > > > > > > > > > > > > > Are there any PCs that don't work in the matrix-free context? > > > > > > > > If you do the above you can use almost all the PC since you are > > > > providing an explicit matrix from which to build the preconditioner > > > > > > > > > Are there any example codes I overlooked? > > > > > > > > > > Last but not least... can the Boomer-AMG preconditioner work with > > > > > JFNK? To really show my ignorance of AMG, can it actually be written > > > > > as a matrix P^-1(Ax-b)=0, , or is it just a linear operator? > > > > > > > > Again, if you provide an approximate Jacobian like above you can use > > > > it with BoomerAMG, if you provide NO explicit matrix you cannot use > > > > BoomerAMG or almost any other preconditioner. > > > > > > > > Barry > > > > > > > > > > > > > > Thanks, > > > > > Mark > > > > > > > > > > > > > > > > > > > > > > > >