> On Feb 19, 2015, at 1:56 PM, Dmitry Karpeyev <[email protected]> wrote: > > > > On Thu Feb 19 2015 at 12:41:59 PM Barry Smith <[email protected]> wrote: > > Yeah, that sounds like a good fix, except for this: we have to make sure all > ranks diverge with this failure so that the user can retry the solve, if > necessary.
This is the business of the person calling MatSetFailure(). We can require that this routine be a collective. Barry > That would require an extra reduction every KSP iteration. > With Inf or NaN we could piggyback on the norm computation. > > Dmitry. > > Barry > > > > > > On Feb 19, 2015, at 9:33 AM, Dmitry Karpeyev <[email protected]> wrote: > > > > I wanted to revive this thread and move it to petsc-dev. This problem seems > > to be harder than I realized. > > > > Suppose MatMult inside KSPSolve() inside SNESSolve() cannot compute a valid > > output vector. > > For example, it's a MatMFFD and as part of its function evaluation it has > > to evaluate an implicitly-defined > > constitutive model (e.g., solve an equation of state) and this inner solve > > diverges > > (e.g., the time step is too big). I want to be able to abort the linear > > solve and the nonlinear solve, return a suitable "converged" reason and let > > the user retry, maybe with a > > different timestep size. This is for a hand-rolled time stepper, but TS > > would face similar issues. > > > > Based on the previous thread here > > http://lists.mcs.anl.gov/pipermail/petsc-users/2014-August/022597.html > > I tried marking the result of MatMult as "invalid" and let it propagate up > > to KSPSolve() where it can be handled. > > This quickly gets out of control, since the invalid Vec isn't returned to > > the KSP immediately. It could be a work > > vector, which is fed into PCApply() along various code paths, depending on > > the side of the preconditioner, whether it's a > > transpose solve, etc. Each of these transformations (e.g., PCApply()) > > would then have to check the validity of > > the input argument, clear its error condition and set it on the output > > argument, etc. Very error-prone and fragile. > > Not to mention the large amount of code to sift through. > > > > This is a general problem of exception handling -- we want to "unwind" the > > stack to the point where the problem should > > be handled, but there doesn't seem to a good way to do it. We also want to > > be able to clear all of the error conditions > > on the way up (e.g., mark vectors as valid again, but not too early), > > otherwise we leave the solver in an invalid state. > > > > > > Instead of passing an exception condition up the stack I could try storing > > that condition in one of the more globally-visible > > objects (e.g., the Mat), but if the error occurs inside the evaluation of > > the residual that's being differenced, it doesn't really > > have access to the Mat. This probably raises various thread safety issues > > as well. > > > > Using SNESSetFunctionDomainError() doesn't seem to be a solution: a MatMFFD > > created with MatCreateSNESMF() > > has a pointer to SNES, but the function evaluation code actually has no > > clue about that. More generally, I don't > > know whether we want to wait for the linear solve to complete before > > handling this exception: it is unnecessary, > > it might be an expensive linear solve and the result of such a KSPSolve() > > is probably undefined and might blow up in > > unexpected ways. I suppose if there is a way to get a hold of SNES, each > > subsequent MatMult_MFFD has to check > > whether the domain error is set and return early in that case? We would > > still have to wait for the linear solve to grind > > through the rest of its iterations. I don't know, however, if there is a > > good way to guarantee that linear solver will get > > through this quickly and without unintended consequences. Should MatMFFD > > also get a hold of the KSP and set a flag > > there to abort? I still don't know what the intervening code (e.g., the > > various PCApply()) will do before the KSP has a > > chance to deal with this. > > > > I'm now thinking that setting some vector entries to NaN might be a good > > solution: I hope this NaN will propagate all the > > way up through the subsequent arithmetic operations (does the IEEE > > floating-point arithmetic guarantees?), this "error > > condition" gets automatically cleared the next time the vector is > > recomputed, since its values are reset. Finally, I want > > this exception to be detected globally but without incurring an extra > > reduction every time the residual is evaluated, > > and NaN will be show up in the norm that (most) KSPs would compute anyway. > > That way KSP could diverge with a > > KSP_DIVERGED_NAN or a similar reason and the user would have an option to > > retry. The problem with this approach > > is that VecValidEntries() in MatMult() and PCApply() will throw an error > > before this can work, so I'm trying to think about > > good ways of turning it off. Any ideas about how to do this? > > > > Incidentally, I realized that I don't understand how > > SNESFunctionDomainError can be handled gracefully in the current > > set up: it's not set or checked collectively, so there isn't a good way to > > abort and retry across the whole comm, is there? > > > > Dmitry. > > > > > > > > > > > > > > > > > > On Sun Aug 31 2014 at 10:12:53 PM Jed Brown <[email protected]> wrote: > > Dmitry Karpeyev <[email protected]> writes: > > > > > Handling this at the KSP level (I actually think the Mat level is more > > > appropriate, since the operator, not the solver, knows its domain), > > > > We are dynamically discovering the domain, but I don't think it's > > appropriate for Mat to refuse to evaluate any more matrix actions until > > some action is taken at the MatMFFD/SNESMF level. Marking the Vec > > invalid is fine, but some action needs to be taken and if Derek wants > > the SNES to skip further evaluations, we need to propagate the > > information up the stack somehow. >
