My first reaction to this was "man, that is ugly and cumbersome, I can do it 
much cleaner than that". It turns out it isn't as simple as I thought, but with 
a couple of macros I think I've incorporated much of what is needed in 

https://bitbucket.org/petsc/petsc/pull-request/315/propagating-solver-errors-instead-of/diff
 

Some work still needs to be done on getting the most appropriate SNES converged 
reason set. In fact, one could argue that passing the converged reason up as a 
single enum type may not be the best model, since there may be more information 
that one wishes to convey, such as a function domain error that happened while 
differencing the function with coloring to compute the Jacobian.
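
To make the "more than a single enum" point concrete, here is a minimal sketch 
of a result that carries both a reason and the stage where the failure 
happened. Every name in it (SolveOutcome, OutcomeReason, OutcomeStage, 
outcome_fail) is hypothetical and made up for illustration; none of it is 
PETSc API or what the pull request does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; these are NOT PETSc types. */
typedef enum {
  OUTCOME_ITERATING = 0,   /* no failure recorded yet */
  OUTCOME_CONVERGED,
  OUTCOME_DIVERGED_MAX_IT,
  OUTCOME_DIVERGED_DOMAIN  /* function evaluated outside its domain */
} OutcomeReason;

typedef enum {
  STAGE_NONE = 0,
  STAGE_RESIDUAL,          /* plain function evaluation */
  STAGE_JACOBIAN_COLORING, /* differencing the function with coloring */
  STAGE_LINEAR_SOLVE
} OutcomeStage;

/* The reason alone says "what"; the stage adds "where". */
typedef struct {
  OutcomeReason reason;
  OutcomeStage  stage;
} SolveOutcome;

/* Record a failure. Only the first failure is kept, so detail set by an
   inner layer is not overwritten by an outer layer reporting later. */
static void outcome_fail(SolveOutcome *o, OutcomeReason r, OutcomeStage s)
{
  assert(o != NULL);
  if (o->reason == OUTCOME_ITERATING) {
    o->reason = r;
    o->stage  = s;
  }
}
```

With something of this shape, a domain error hit during coloring would reach 
the caller as (OUTCOME_DIVERGED_DOMAIN, STAGE_JACOBIAN_COLORING) rather than 
being collapsed into one enum value.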

  Anyway, in particular, look at the test example ex69.c.


  Barry

> On May 1, 2015, at 10:52 PM, Dmitry Karpeyev <[email protected]> wrote:
> 
> Here's the first crack at it: 
> https://bitbucket.org/petsc/petsc/branch/karpeev/ksp-diverged-on-matmult-nanorinf.
> Messier than I had expected (GMRES only for now).
> 
> On Fri, May 1, 2015 at 8:06 PM Dmitry Karpeyev <[email protected]> wrote:
> On Fri, May 1, 2015 at 7:32 PM Barry Smith <[email protected]> wrote:
> 
> > On May 1, 2015, at 6:43 PM, Jed Brown <[email protected]> wrote:
> >
> > Barry Smith <[email protected]> writes:
> >>   1) This simplifies the needed code since we won't need to put
> >>   checks all over the place on returns about failure nor do we need
> >>   to worry about propagating errors from one process to another
> >>   (since the Nan/Inf get moved by the MPI_Allreduce()).
> >
> > My concern is that -fp_trap will become a lot less useful.
> 
>   I agree there is a tradeoff; but under "normal" circumstances where there 
> are no Nan or Inf around (which I think is most of the time) -fp_trap will be 
> just as useful as now. For the other cases the user will have to have some 
> idea where (and when) in the code to turn on the trapping to catch the "true" 
> problems.
> 
>    Barry
> 
>   The only other way I see to do it is to carry a validity flag around with 
> each vector and reduce that flag in all the vector reductions. But this alone 
> is not enough: we would also need propagation code for things like zero 
> pivot, for example setting a validity flag in the Mat factor (saying the 
> factor is not valid) and propagating those flags up. We get all these things 
> "for free" with the Inf/NaN approach.
> There is an additional benefit: the validity flag would have to be cleared by 
> the caller to avoid "false positives" on subsequent calls.  That's an 
> opportunity for bugs.  With NaN the "error condition" (i.e., the NaN entry) 
> gets cleared automatically by a subsequent successful vector operation. 
> 
> 
> What exactly caused the NaN would have to be signaled "out-of-band" as the 
> saying goes. One way to "signal" it is by the code path that led to the error 
> condition: that's why calling through KSP_MatMult() is useful.  It's not 
> ideal, but covers the cases of immediate interest.
> Dmitry.
> 
> >
> 
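
The NaN-propagation argument in the quoted thread can be sketched in a few 
lines of C. This is only an illustration under stated assumptions: reduce_sum 
is a serial stand-in for what a sum reduction such as MPI_Allreduce() with 
MPI_SUM would compute, and reduction_failed is a hypothetical helper, not a 
PETSc function. It relies on IEEE floating-point semantics (so no 
-ffast-math).

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Serial stand-in for a sum reduction across ranks; each array entry
   plays the role of one rank's local contribution. */
static double reduce_sum(const double *local, size_t nranks)
{
  double s = 0.0;
  assert(local != NULL);
  for (size_t i = 0; i < nranks; i++) s += local[i];
  return s;
}

/* Hypothetical check a solver could make after a norm or dot product:
   a NaN or Inf contributed by any "rank" shows up in the reduced value. */
static bool reduction_failed(double reduced)
{
  return isnan(reduced) || isinf(reduced);
}
```

A single NaN contribution makes the reduced value NaN for everyone, which is 
the "for free" propagation. There is also no flag to reset: the next 
reduction over clean data simply produces a finite value again.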
