GitHub user srowen opened a pull request:
https://github.com/apache/spark/pull/1659
SPARK-2748 [MLLIB] [GRAPHX] Loss of precision for small arguments to
Math.exp, Math.log
In a few places in MLlib, an expression of the form `log(1.0 + p)` is
evaluated. When `p` is so small that `1.0 + p == 1.0`, the result is 0.0, whereas
the correct answer is very nearly `p`. This is why `Math.log1p` exists.
The same applies to one instance of `exp(m) - 1` in GraphX, for which there is
a special `Math.expm1` method.
While the errors occur only for very small arguments, such arguments are
entirely plausible in machine learning algorithms, where these expressions are used.
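
To illustrate the effect described above, here is a minimal standalone sketch
in plain Java. It is not code from this patch: the class name `PrecisionDemo`,
its helper methods, and the sample value `1e-20` are hypothetical and chosen
only for demonstration. The only library calls used are the standard
`Math.log`, `Math.log1p`, `Math.exp` and `Math.expm1`.

```java
public class PrecisionDemo {
    // Naive form: 1.0 + p rounds to exactly 1.0 once p is far below
    // double precision (about 1.1e-16), so the logarithm collapses to 0.0.
    static double naiveLog(double p) {
        return Math.log(1.0 + p);
    }

    // log1p computes log(1 + p) without forming the sum 1.0 + p first.
    static double preciseLog(double p) {
        return Math.log1p(p);
    }

    // Naive form: exp(m) rounds to 1.0 for tiny m, so the difference is lost.
    static double naiveExp(double m) {
        return Math.exp(m) - 1.0;
    }

    // expm1 computes exp(m) - 1 without the cancelling subtraction.
    static double preciseExp(double m) {
        return Math.expm1(m);
    }

    public static void main(String[] args) {
        double tiny = 1e-20; // illustrative value only, not taken from MLlib

        System.out.println("log(1.0 + p) = " + naiveLog(tiny));   // 0.0
        System.out.println("log1p(p)     = " + preciseLog(tiny)); // very close to p
        System.out.println("exp(m) - 1   = " + naiveExp(tiny));
        System.out.println("expm1(m)     = " + preciseExp(tiny)); // very close to m
    }
}
```

For arguments this small, the naive forms lose all significant digits, while
`log1p` and `expm1` return values that agree with the argument to within
rounding error.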
Also note the related PR for Python:
https://github.com/apache/spark/pull/1652
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/srowen/spark SPARK-2748
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1659.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1659
----
commit c5926d4d1cc0d071b4ebbd2bddc14af250809fb0
Author: Sean Owen <[email protected]>
Date: 2014-07-30T11:34:20Z
Use log1p, expm1 for better precision for tiny arguments
----