Re: Catch divide-by-zero floating number exception in backend

2018-11-12 Thread Pedro Larroy
Hi

Could you be specific about the bugs? While we could use this to debug some 
particular errors as you describe, I would think that in the general case you 
would want to rely on unit testing and on conditional checks for very small 
numbers in the denominator if you can’t have a NaN. I think we should first 
collect some examples and study them carefully, as fp arithmetic is tricky. I 
also think it is not common practice, and not portable, to use signals and fp 
exceptions, as you mentioned.
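
To make the "conditional check" idea concrete, here is a minimal sketch. The
helper name checked_divide and the default threshold are purely illustrative
assumptions, not anything that exists in MXNet; the right threshold depends on
the operator and the dtype.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not part of MXNet: divides only when the denominator
// is safely away from zero. Returns false (leaving *out untouched) when the
// denominator is within eps of zero or is itself NaN.
inline bool checked_divide(float num, float den, float* out,
                           float eps = 1e-12f) {
  assert(out != nullptr);
  assert(eps > 0.0f);
  // Written as !(x > eps) so that a NaN denominator is rejected as well:
  // every ordered comparison with NaN evaluates to false.
  if (!(std::fabs(den) > eps)) {
    return false;
  }
  *out = num / den;
  return true;
}
```

The caller then decides what a rejected division means (skip, clamp, or
report an error), which is a per-operator decision.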

Pedro

> On 9. Nov 2018, at 00:30, Lin Yuan wrote:
> 
> Dear MXNet Community,
> 
> I recently found that NaN errors can sometimes be caused by
> divide-by-zero floating-point bugs in the engine backend. However, by default,
> no exception is thrown in such cases. I added a signal trap to catch this
> error (https://github.com/apache/incubator-mxnet/pull/13190) and caught a
> few exceptions when running the Python unit tests. But this only works on
> Linux.
> 
> I would like to get more feedback on the best practice for catching such bugs
> in the code, and on whether we should enforce such checks in CI. Any comments
> are appreciated.
> 
> Best Regards,
> 
> Lin


Catch divide-by-zero floating number exception in backend

2018-11-08 Thread Lin Yuan
Dear MXNet Community,

I recently found that NaN errors can sometimes be caused by
divide-by-zero floating-point bugs in the engine backend. However, by default,
no exception is thrown in such cases. I added a signal trap to catch this
error (https://github.com/apache/incubator-mxnet/pull/13190) and caught a
few exceptions when running the Python unit tests. But this only works on
Linux.
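
For comparison, a possibly more portable alternative to a signal trap is to
poll the floating-point status flags through the standard <cfenv> header. The
sketch below is only an illustration and is not what the PR does; the function
name division_raises_div_by_zero is hypothetical. It assumes the compiler does
not reorder or fold the floating-point operation, which the volatile operands
are meant to discourage.

```cpp
#include <cfenv>

// Hypothetical helper, not MXNet code: performs num / den and reports whether
// the operation raised the IEEE 754 divide-by-zero flag.
// Note: a finite nonzero value divided by zero raises FE_DIVBYZERO and yields
// +/-inf, whereas 0/0 yields NaN and raises FE_INVALID instead.
inline bool division_raises_div_by_zero(float num, float den) {
  // volatile keeps the compiler from evaluating the division at compile time.
  volatile float n = num;
  volatile float d = den;
  std::feclearexcept(FE_ALL_EXCEPT);  // start from a clean flag state
  volatile float q = n / d;           // the operation under test
  (void)q;
  return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```

A test harness could clear the flags before an operator runs and test them
afterwards, including FE_INVALID if NaN-producing cases such as 0/0 should be
caught too.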

I would like to get more feedback on the best practice for catching such bugs
in the code, and on whether we should enforce such checks in CI. Any comments
are appreciated.

Best Regards,

Lin