Here's the first crack at it: https://bitbucket.org/petsc/petsc/branch/karpeev/ksp-diverged-on-matmult-nanorinf . Messier than I had expected (GMRES only for now).
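For anyone skimming the thread below, here is a minimal sketch of the NaN/Inf propagation idea under discussion. It is not code from the branch. The function names are hypothetical, and a plain loop stands in for the sum that MPI_Allreduce() would compute, so the sketch builds without MPI. The point is only that a NaN in one rank's partial result poisons the reduced value that every rank sees, so no separate validity flag has to be communicated.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical stand-in for MPI_Allreduce(..., MPI_SUM, ...): every "rank"
   would receive this same total of the per-rank partial values. */
static double allreduce_sum(const double *partial, size_t nranks)
{
  double total = 0.0;
  for (size_t r = 0; r < nranks; r++) total += partial[r];
  return total;
}

/* One rank's local contribution to a dot product. */
static double local_dot(const double *x, const double *y, size_t n)
{
  double s = 0.0;
  for (size_t i = 0; i < n; i++) s += x[i] * y[i];
  return s;
}

/* Hypothetical check a solver could apply to a reduced value: a NaN or Inf
   here means some rank produced an invalid entry upstream. */
static int reduced_value_is_invalid(double v)
{
  return isnan(v) || isinf(v);
}
```

If one rank's vector contains a NaN, local_dot() returns NaN on that rank, allreduce_sum() returns NaN for all ranks, and reduced_value_is_invalid() fires everywhere at once. Recomputing with valid entries gives a finite sum again, with no flag to reset.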
On Fri, May 1, 2015 at 8:06 PM Dmitry Karpeyev <[email protected]> wrote:
> On Fri, May 1, 2015 at 7:32 PM Barry Smith <[email protected]> wrote:
>
>>
>> > On May 1, 2015, at 6:43 PM, Jed Brown <[email protected]> wrote:
>> >
>> > Barry Smith <[email protected]> writes:
>> >> 1) This simplifies the needed code since we won't need to put
>> >> checks all over the place on returns about failure nor do we need
>> >> to worry about propagating errors from one process to another
>> >> (since the Nan/Inf get moved by the MPI_Allreduce()).
>> >
>> > My concern is that -fp_trap will become a lot less useful.
>>
>> I agree there is a tradeoff; but under "normal" circumstances where
>> there are no Nan or Inf around (which I think is most of the time) -fp_trap
>> will be just as useful as now. For the other cases the user will have to
>> have some idea where (and when) in the code to turn on the trapping to
>> catch the "true" problems.
>>
>> Barry
>>
>> The only other way I see to do it is carry a validity flag around with
>> each vector and reduce that flag in all the vector reductions; but this
>> alone is not enough we would also have to have some propagation code for
>> things like zero pivot, for example setting a validity flag in the Mat
>> factor (saying the factor is not valid) and propagating up those flags. We
>> get all these things "for free" with the Inf Nan approach.
>>
> There is an additional benefit: the validity flag would have to be cleared
> by the caller to avoid "false positives" on subsequent calls. That's an
> opportunity for bugs. With NaN the "error condition" (i.e., the NaN entry)
> gets cleared automatically by a subsequent successful vector operation.
>
>
> What exactly caused the NaN would have to be signaled "out-of-band" as the
> saying goes. One way to "signal" it is by the code path that led to the
> error condition: that's why calling through KSP_MatMult() is useful. It's
> not ideal, but covers the cases of immediate interest.
> Dmitry.
