> On Feb 19, 2015, at 1:56 PM, Dmitry Karpeyev <[email protected]> wrote:
> 
> 
> 
> On Thu Feb 19 2015 at 12:41:59 PM Barry Smith <[email protected]> wrote:
> 
> Yeah, that sounds like a good fix, except for this: we have to make sure all 
> ranks diverge with this failure so that the user can retry the solve, if 
> necessary.  

   This is the business of the person calling MatSetFailure(). We can require
that this routine be collective.

 Barry

> That would require an extra reduction every KSP iteration.
> With Inf or NaN we could piggyback on the norm computation.
> 
> Dmitry.
> 
>    Barry
> 
> > On Feb 19, 2015, at 9:33 AM, Dmitry Karpeyev <[email protected]> wrote:
> >
> > I wanted to revive this thread and move it to petsc-dev. This problem seems 
> > to be harder than I realized.
> >
> > Suppose MatMult inside KSPSolve() inside SNESSolve() cannot compute a
> > valid output vector. For example, it's a MatMFFD and as part of its
> > function evaluation it has to evaluate an implicitly-defined constitutive
> > model (e.g., solve an equation of state) and this inner solve diverges
> > (e.g., the time step is too big). I want to be able to abort the linear
> > solve and the nonlinear solve, return a suitable "converged" reason and
> > let the user retry, maybe with a different timestep size. This is for a
> > hand-rolled time stepper, but TS would face similar issues.
> >
> > Based on the previous thread here
> > http://lists.mcs.anl.gov/pipermail/petsc-users/2014-August/022597.html
> > I tried marking the result of MatMult as "invalid" and letting it
> > propagate up to KSPSolve(), where it can be handled. This quickly gets
> > out of control, since the invalid Vec isn't returned to the KSP
> > immediately. It could be a work vector, which is fed into PCApply()
> > along various code paths, depending on the side of the preconditioner,
> > whether it's a transpose solve, etc. Each of these transformations
> > (e.g., PCApply()) would then have to check the validity of its input
> > argument, clear its error condition, set it on the output argument,
> > etc. This is very error-prone and fragile, not to mention the large
> > amount of code to sift through.
> >
> > This is a general problem of exception handling: we want to "unwind"
> > the stack to the point where the problem should be handled, but there
> > doesn't seem to be a good way to do it. We also want to be able to
> > clear all of the error conditions on the way up (e.g., mark vectors as
> > valid again, but not too early); otherwise we leave the solver in an
> > invalid state.
> >
> >
> > Instead of passing an exception condition up the stack, I could try
> > storing that condition in one of the more globally visible objects
> > (e.g., the Mat), but if the error occurs inside the evaluation of the
> > residual that's being differenced, that evaluation code doesn't really
> > have access to the Mat. This probably raises various thread-safety
> > issues as well.
> >
> > Using SNESSetFunctionDomainError() doesn't seem to be a solution: a
> > MatMFFD created with MatCreateSNESMF() has a pointer to the SNES, but the
> > function evaluation code actually has no clue about that. More generally,
> > I don't know whether we want to wait for the linear solve to complete
> > before handling this exception: it is unnecessary, it might be an
> > expensive linear solve, and the result of such a KSPSolve() is probably
> > undefined and might blow up in unexpected ways. I suppose that, if there
> > is a way to get hold of the SNES, each subsequent MatMult_MFFD would have
> > to check whether the domain error is set and return early in that case?
> > We would still have to wait for the linear solve to grind through the
> > rest of its iterations. I don't know, however, if there is a good way to
> > guarantee that the linear solver will get through this quickly and
> > without unintended consequences. Should MatMFFD also get hold of the KSP
> > and set a flag there to abort? I still don't know what the intervening
> > code (e.g., the various PCApply() calls) will do before the KSP has a
> > chance to deal with this.
> >
> > I'm now thinking that setting some vector entries to NaN might be a good
> > solution. I hope this NaN will propagate all the way up through the
> > subsequent arithmetic operations (does IEEE floating-point arithmetic
> > guarantee this?), and this "error condition" gets cleared automatically
> > the next time the vector is recomputed, since its values are reset.
> > Finally, I want this exception to be detected globally, but without
> > incurring an extra reduction every time the residual is evaluated, and
> > a NaN will show up in the norm that (most) KSPs compute anyway. That
> > way the KSP could diverge with KSP_DIVERGED_NAN or a similar reason,
> > and the user would have the option to
> > retry. The problem with this approach is that VecValidValues() in
> > MatMult() and PCApply() will throw an error before this can work, so I'm
> > trying to think of good ways of turning it off. Any ideas about how to
> > do this?
> >
> > Incidentally, I realized that I don't understand how a
> > SNESFunctionDomainError can be handled gracefully in the current setup:
> > it's not set or checked collectively, so there isn't a good way to abort
> > and retry across the whole comm, is there?
> >
> > Dmitry.
> >
> > On Sun Aug 31 2014 at 10:12:53 PM Jed Brown <[email protected]> wrote:
> > Dmitry Karpeyev <[email protected]> writes:
> >
> > > Handling this at the KSP level (I actually think the Mat level is more
> > > appropriate, since the operator, not the solver, knows its domain),
> >
> > We are dynamically discovering the domain, but I don't think it's
> > appropriate for Mat to refuse to evaluate any more matrix actions until
> > some action is taken at the MatMFFD/SNESMF level.  Marking the Vec
> > invalid is fine, but some action needs to be taken and if Derek wants
> > the SNES to skip further evaluations, we need to propagate the
> > information up the stack somehow.
> 
