Matt, using greedy on that system took 30 seconds wallclock; does that seem reasonable? It's a bit concerning for scaling, since this is a small system. MATCOLORINGSL (the default) takes <10 seconds; the user manual says it's a sequential method, yet it seems to work on MPIAIJ without complaining. Also, are any of these expected to work with BAIJ?
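For reference, a coloring type can be timed in isolation with something like the sketch below (TimeColoring is a hypothetical helper; J is assumed to be the already-assembled MPIAIJ Jacobian):

  #include <petscmat.h>
  #include <petsctime.h>

  /* Sketch: time one MatColoringApply on an assembled Jacobian J.
     Override the type at runtime with -mat_coloring_type greedy|sl|jp. */
  PetscErrorCode TimeColoring(Mat J)
  {
    PetscErrorCode ierr;
    MatColoring    mc;
    ISColoring     iscoloring;
    PetscLogDouble t0, t1;

    ierr = MatColoringCreate(J, &mc);CHKERRQ(ierr);
    ierr = MatColoringSetType(mc, MATCOLORINGGREEDY);CHKERRQ(ierr);
    ierr = MatColoringSetFromOptions(mc);CHKERRQ(ierr);
    ierr = PetscTime(&t0);CHKERRQ(ierr);
    ierr = MatColoringApply(mc, &iscoloring);CHKERRQ(ierr);
    ierr = PetscTime(&t1);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD, "MatColoringApply: %g s\n", (double)(t1 - t0));CHKERRQ(ierr);
    ierr = ISColoringDestroy(&iscoloring);CHKERRQ(ierr);
    ierr = MatColoringDestroy(&mc);CHKERRQ(ierr);
    return 0;
  }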
-Mark

On Tue, Oct 31, 2017 at 9:34 AM, Matthew Knepley <knep...@gmail.com> wrote:
> On Mon, Oct 30, 2017 at 3:02 PM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>>
>> > On Oct 30, 2017, at 1:58 PM, Mark Lohry <mlo...@gmail.com> wrote:
>> >
>> > Hmm, metis doesn't really have anything to do with the sparsity of the Jacobian does it?
>> >
>> > No, I just mean I'm doing the initial partitioning and parallel communication for the residual evaluations independently of PETSc, and then doing a 1-to-1 mapping to the PETSc solution vector, along with manually setting the nonzero structure of the MPIAIJ system as in the user manual. I don't think there's anything wrong with the system structure, as it gives the same correct answer as the un-preconditioned matrix-free approach.
>> >
>> > The exact system those MatColoring times came from has (100x100) blocks on the diagonals corresponding to the tetrahedral cells, with each having 4 neighbor blocks on the same row (or fewer for elements on boundaries).
>>
>>    Hmm, are those blocks dense? If so you could benefit enormously from using the BAIJ format.
>>
>>    Matt,
>>
>>    Sounds like performance bugs in the parallel coloring apply algorithms with big "diagonal blocks".
>
> Peter wrote the JP code (I think). I tried to look at it last night, but abstraction is not present. It's not easy to see where a performance problem might lurk. I think if we care, we just have to instrument it and run this example. Personally I have never used anything but greedy, which works great.
>
>    Matt
>
>>    Mark,
>>
>>    Could you run with -ksp_view_mat binary and send the resulting file called binaryoutput, and we can run the coloring codes locally to debug the performance.
>>
>>    Barry
>>
>> > On Mon, Oct 30, 2017 at 1:55 PM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>> >
>> > > On Oct 30, 2017, at 12:39 PM, Mark Lohry <mlo...@gmail.com> wrote:
>> > >
>> > > > 3) Are there any hooks analogous to KSPSetPreSolve/PostSolve for the FD computation of the Jacobians, or for the computation of the preconditioner? I'd like to get a handle on the relative costs of these.
>> > >
>> > > No, do you just want the time? You can get that from the logging; for example -log_view
>> > >
>> > > Yes, I was just thinking in regard to your suggestion of recomputing when the number of linear iterations gets too high; I assume it's the ratio of preconditioner cost to linear solver cost at runtime that's the metric of interest, and not the absolute value of either. But I'll cross that bridge when I come to it.
>> > >
>> > > When I had asked, I was looking to see where a long pause was happening, thinking it was the FD Jacobian; it turned out to be before that, in MatColoringApply, which seems surprisingly expensive. MATCOLORINGJP took ~15 minutes on 32 cores on a small 153,000^2 system, with MATCOLORINGGREEDY taking 30 seconds. Any guidance there, or is this expected? I'm not using DM, just manually entering the sparsity resulting from a metis decomposition of a tetrahedral mesh.
>> >
>> >    Hmm, metis doesn't really have anything to do with the sparsity of the Jacobian does it?
>> >
>> >    Matt,
>> >
>> >    These times are huge. What is going on?
>> >
>> >    Barry
>> >
>> > >
>> > > Thanks for the info on the lag logic, I'll play with the TS pre/post calls for the time-accurate problems and only use LagJacobian.
>> > >
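As an aside, Barry's BAIJ suggestion above might look roughly like the following for the block structure described (a sketch; CreateBlockJacobian and nLocalCells are hypothetical names, and the per-block-row counts are upper bounds):

  #include <petscmat.h>

  /* Sketch: 100x100 dense blocks, one diagonal block per tet plus at
     most 4 face-neighbor blocks per block row. nLocalCells is the
     (hypothetical) number of cells owned by this rank. */
  PetscErrorCode CreateBlockJacobian(MPI_Comm comm, PetscInt nLocalCells, Mat *J)
  {
    PetscErrorCode ierr;
    const PetscInt bs = 100;

    ierr = MatCreateBAIJ(comm, bs,
                         bs*nLocalCells, bs*nLocalCells,   /* local sizes */
                         PETSC_DETERMINE, PETSC_DETERMINE, /* global sizes */
                         5, NULL,  /* <=5 block nonzeros/row in the diagonal part (self + 4 neighbors) */
                         4, NULL,  /* <=4 block nonzeros/row in the off-diagonal part */
                         J);CHKERRQ(ierr);
    return 0;
  }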
>> > > On Mon, Oct 30, 2017 at 11:29 AM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>> > >
>> > > > On Oct 29, 2017, at 11:50 AM, Mark Lohry <mlo...@gmail.com> wrote:
>> > > >
>> > > > Thanks again Barry, I've got the preconditioners hooked up with -snes_mf_operator and at least AMG looks to be working great on a high-order unstructured DG problem.
>> > > >
>> > > > A couple of questions on the SNESSetLagJacobian + SNESSetLagPreconditioner code flow:
>> > > >
>> > > > 1) With -snes_mf_operator, and given SNESSetLagJacobian(snes, 1) (the default) and SNESSetLagPreconditioner(snes, 2), after the first KSP solve in a Newton iteration, will it do the finite-difference Jacobian calculation? Or will the Jacobian only be computed when the preconditioner lag setting demands it on the 3rd Newton step? I suspect it's the latter based on where I see the code pause.
>> > >
>> > >    SNES with -snes_mf_operator will ALWAYS use the matrix-free finite difference (f(x+hv) - f(x))/h to apply the matrix-vector product.
>> > >
>> > >    The LagJacobian and LagPreconditioner are not coordinated. The first determines how often the Jacobian used for preconditioning is recomputed and the second determines how often the preconditioner is recomputed.
>> > >
>> > >    If you are using -snes_mf_operator then it never makes sense to have lagJacobian < lagPreconditioner, since that would recompute the Jacobian but not actually use it. It also makes no sense to have lagPreconditioner < lagJacobian, because you'd be recomputing the preconditioner from the same Jacobian.
>> > >
>> > >    But actually, if you don't change the Jacobian used in building the preconditioner, then when it tries to recompute the preconditioner it determines the matrix has not changed and skips rebuilding the preconditioner. So when using -snes_mf_operator there is generally no reason to set the preconditioner lag.
>> > > >
>> > > > 2) How do implicit TS and SNESSetLagPreconditioner/Persists interact? Does the counter since the last preconditioner compute reset with time steps, or does that lag counter just increment with every SNES solve regardless of how many nonlinear solves happened in a given timestep? Say the preconditioner lag is 2, and a time stepper uses 3, 2, and 3 nonlinear solves on 3 steps; is the flow
>> > > >
>> > > > (time step 1) -> (update preconditioner) -> (snes solve) -> (snes solve) -> (update preconditioner) -> (snes solve)
>> > > > (time step 2) -> (snes solve) -> (update preconditioner) -> (snes solve)
>> > > > (time step 3) -> (snes solve) -> (update preconditioner) -> (snes solve) -> (snes solve)
>> > > >
>> > > > or
>> > > >
>> > > > (time step 1) -> (update preconditioner) -> (snes solve) -> (snes solve) -> (update preconditioner) -> (snes solve)
>> > > > (time step 2) -> (update preconditioner) -> (snes solve) -> (snes solve)
>> > > > (time step 3) -> (update preconditioner) -> (snes solve) -> (snes solve) -> (update preconditioner) -> (snes solve)
>> > > >
>> > > > ?
>> > > >
>> > > > I think for implicit time stepping I'd probably want the preconditioner to be recomputed just once at the beginning of each time step, or some multiple of that. Does that sound reasonable?
>> > >
>> > >    Yes, what you want to do is completely reasonable.
>> > >
>> > >    You can use SNESSetLagJacobian() and SNESSetLagJacobianPersists() in combination to have the Jacobian recomputed every fixed number of times; if you set the persists flag and set LagJacobian to 10, it will recompute the Jacobian used in the preconditioner every 10th time a new Jacobian is needed.
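Concretely, the combination Barry describes might be set up like this (a sketch; snes and ierr are assumed already declared and created):

  /* Sketch: with the persists flag set, a lag of 10 rebuilds the
     preconditioning Jacobian every 10th time one would otherwise
     be requested, across nonlinear solves. */
  ierr = SNESSetLagJacobian(snes, 10);CHKERRQ(ierr);
  ierr = SNESSetLagJacobianPersists(snes, PETSC_TRUE);CHKERRQ(ierr);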
>> > >
>> > >    If you want to compute the new Jacobian used to build the preconditioner once at the beginning of each new TS stage, you can set SNESSetLagJacobian() to -2 in the TS prestage call. There are possibly other tricks you can do by setting the two flags at different locations.
>> > >
>> > >    An alternative to hardwiring how often the Jacobian used to build the preconditioner is rebuilt is to rebuild based on when the preconditioner starts "working less well". Here you could put an additional KSPMonitor or SNESMonitor that detects whether the number of linear iterations is above a certain amount and then sets the recompute-Jacobian flag to -2, so that for the next solve it recreates the Jacobian used in building the preconditioner.
>> > > >
>> > > > 3) Are there any hooks analogous to KSPSetPreSolve/PostSolve for the FD computation of the Jacobians, or for the computation of the preconditioner? I'd like to get a handle on the relative costs of these.
>> > >
>> > >    No, do you just want the time? You can get that from the logging; for example -log_view
>> > > >
>> > > > Best,
>> > > > Mark
>> > > >
>> > > > On Sat, Sep 23, 2017 at 3:28 PM, Mark Lohry <mlo...@gmail.com> wrote:
>> > > > Great, thanks Barry.
>> > > >
>> > > > On Sat, Sep 23, 2017 at 3:12 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>> > > >
>> > > > > On Sep 23, 2017, at 12:48 PM, Mark W. Lohry <mlo...@princeton.edu> wrote:
>> > > > >
>> > > > > I'm currently using JFNK in an application where I don't have a hand-coded Jacobian, and it's working well enough, but as expected the scaling isn't great.
>> > > > >
>> > > > > What is the general process for using a PC with MatMFFDComputeJacobian? Does it make sense to occasionally have PETSc re-compute the Jacobian via finite differences, and then recompute the preconditioner? Are there any that just need the sparsity structure?
>> > > >
>> > > >    Mark,
>> > > >
>> > > >    Yes, this is a common approach. SNESSetLagJacobian / -snes_lag_jacobian
>> > > >
>> > > >    The normal approach in SNES to use matrix-free for the operator and finite differences to compute an approximate Jacobian used to construct preconditioners is to create a sparse matrix with the sparsity of the approximate Jacobian (yes, you need a way to figure out the sparsity; if you use DMDA it will figure out the sparsity for you). Then you use
>> > > >
>> > > >    SNESSetJacobian(snes, J, J, SNESComputeJacobianDefaultColor, NULL);
>> > > >
>> > > >    and use the options database option -snes_mf_operator
>> > > >
>> > > > > Are there any PCs that don't work in the matrix-free context?
>> > > >
>> > > >    If you do the above you can use almost all the PCs, since you are providing an explicit matrix from which to build the preconditioner.
>> > > >
>> > > > > Are there any example codes I overlooked?
>> > > > >
>> > > > > Last but not least... can the BoomerAMG preconditioner work with JFNK? To really show my ignorance of AMG: can it actually be written as a matrix P^-1(Ax-b)=0, or is it just a linear operator?
>> > > >
>> > > >    Again, if you provide an approximate Jacobian as above you can use it with BoomerAMG; if you provide NO explicit matrix, you cannot use BoomerAMG or almost any other preconditioner.
>> > > >
>> > > >    Barry
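Spelled out, Barry's recipe might look roughly like the following (a sketch; SolveWithFDPreconditioner is a hypothetical wrapper, and J, x, r, and FormFunction are assumed to be created elsewhere with the correct sparsity):

  #include <petscsnes.h>

  /* User residual routine, defined elsewhere. */
  extern PetscErrorCode FormFunction(SNES, Vec, Vec, void *);

  /* Sketch: matrix-free operator with a colored finite-difference
     Jacobian J used only to build the preconditioner. */
  PetscErrorCode SolveWithFDPreconditioner(Mat J, Vec x, Vec r)
  {
    SNES           snes;
    PetscErrorCode ierr;

    ierr = SNESCreate(PETSC_COMM_WORLD, &snes);CHKERRQ(ierr);
    ierr = SNESSetFunction(snes, r, FormFunction, NULL);CHKERRQ(ierr);
    /* SNESComputeJacobianDefaultColor fills J by colored finite differences;
       J must already carry the correct nonzero pattern. */
    ierr = SNESSetJacobian(snes, J, J, SNESComputeJacobianDefaultColor, NULL);CHKERRQ(ierr);
    ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);
    ierr = SNESSolve(snes, NULL, x);CHKERRQ(ierr);
    ierr = SNESDestroy(&snes);CHKERRQ(ierr);
    return 0;
  }

Run with -snes_mf_operator so the action of the true Jacobian is applied matrix-free while J is used only to build the preconditioner.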
>> > > > >
>> > > > > Thanks,
>> > > > > Mark
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>   -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/