Re: [petsc-dev] handling user domain errors

Barry Smith Mon, 04 May 2015 21:19:19 -0700

> On May 4, 2015, at 10:54 PM, Dmitry Karpeyev <[email protected]> wrote:
> 
> 
> 
> On Mon, May 4, 2015 at 6:20 PM Barry Smith <[email protected]> wrote:
> 
>   My first reaction to this was "man that is ugly and cumbersome, I can do it 
> much cleaner than that"; turns out it isn't as simple as I thought but with a 
> couple of macros I think I've incorporated much of what is needed in
> 
> https://bitbucket.org/petsc/petsc/pull-request/315/propagating-solver-errors-instead-of/diff
> 
> Some work needs to be done on getting the most appropriate SNES converged 
> reason set. In fact one could argue that trying to pass the converged reason 
> up as a single enum type may not be the best model, since there may be more 
> information that one wishes to convey, such as a function domain error that 
> happened while differencing the function with coloring to compute the 
> Jacobian.
> Are you arguing for a more full-fledged exception handling?


  No. Actually, more full-fledged exception handling would have to handle the 
parallel collective issues, which is tough.

> Note that you are essentially having to insert various custom "exception 
> condition" checks (e.g., SNESCheckKSPSolve(), if(ksp->reason) break; 
> KSPCheckDot(), etc) on the whole call path, along which an exception might be 
> propagating.  This strikes me as brittle and error-prone, not to mention 
> threatening to get rather complex if the number of these exceptions and their 
> combinations starts to grow.

   Propose something better

> 
>   Anyway, in particular look at the test example ex69.c
> Looks pretty good.  Thanks! 
> 
> 
>   Barry
> 
> > On May 1, 2015, at 10:52 PM, Dmitry Karpeyev <[email protected]> wrote:
> >
> > Here's the first crack at it: 
> > https://bitbucket.org/petsc/petsc/branch/karpeev/ksp-diverged-on-matmult-nanorinf.
> > Messier than I had expected (GMRES only for now).
> >
> > On Fri, May 1, 2015 at 8:06 PM Dmitry Karpeyev <[email protected]> wrote:
> > On Fri, May 1, 2015 at 7:32 PM Barry Smith <[email protected]> wrote:
> >
> > > On May 1, 2015, at 6:43 PM, Jed Brown <[email protected]> wrote:
> > >
> > > Barry Smith <[email protected]> writes:
> > >>   1) This simplifies the needed code since we won't need to put
> > >>   checks all over the place on returns about failure nor do we need
> > >>   to worry about propagating errors from one process to another
> > >>   (since the NaN/Inf get moved by the MPI_Allreduce()).
> > >
> > > My concern is that -fp_trap will become a lot less useful.
> >
> >   I agree there is a tradeoff; but under "normal" circumstances where there 
> > are no NaN or Inf around (which I think is most of the time) -fp_trap will 
> > be just as useful as now. For the other cases the user will have to have 
> > some idea where (and when) in the code to turn on the trapping to catch the 
> > "true" problems.
> >
> >    Barry
> >
> >   The only other way I see to do it is to carry a validity flag around with 
> > each vector and reduce that flag in all the vector reductions; but this 
> > alone is not enough: we would also need some propagation code for 
> > things like zero pivots, for example setting a validity flag in the Mat 
> > factor (saying the factor is not valid) and propagating those flags up. We 
> > get all these things "for free" with the Inf/NaN approach.
> > There is an additional benefit to the NaN approach: a validity flag would 
> > have to be cleared by the caller to avoid "false positives" on subsequent 
> > calls.  That's an opportunity for bugs.  With NaN the "error condition" 
> > (i.e., the NaN entry) gets cleared automatically by a subsequent successful 
> > vector operation.
> >
> >
> > What exactly caused the NaN would have to be signaled "out-of-band" as the 
> > saying goes. One way to "signal" it is by the code path that led to the 
> > error condition: that's why calling through KSP_MatMult() is useful.  It's 
> > not ideal, but covers the cases of immediate interest.
> > Dmitry.
> >
> > >
> >
>
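
The Inf/NaN propagation idea discussed in this thread can be illustrated with a minimal C sketch. It uses neither PETSc nor MPI: sum_reduce() is a hypothetical stand-in for a sum-type MPI_Allreduce(), with the "ranks" simulated as array entries, and every function name here is made up for illustration. It only assumes IEEE floating-point behavior, where a NaN operand makes a sum NaN.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical stand-in for a sum reduction across ranks (roughly what
   MPI_Allreduce() with MPI_SUM would compute). The "ranks" are simulated
   as entries of the array `local`. */
static double sum_reduce(const double *local, size_t nranks)
{
  double total = 0.0;
  for (size_t i = 0; i < nranks; i++) total += local[i];
  return total;
}

/* Local contribution of one rank to the dot product x . y. */
static double local_dot(const double *x, const double *y, size_t n)
{
  double s = 0.0;
  for (size_t i = 0; i < n; i++) s += x[i] * y[i];
  return s;
}

/* Returns 1 if the reduced value is NaN or Inf, i.e. some rank
   contributed an invalid entry, and 0 otherwise. */
static int reduced_value_is_invalid(double v)
{
  return isnan(v) || isinf(v);
}
```

If one simulated rank holds a NaN entry, the reduced dot product is NaN for everyone, so every rank can test the same value and react consistently without extra communication. Once the offending entry is overwritten with a finite value, the next reduction is finite again and there is no separate flag to reset.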
