GitHub user mengxr commented on the pull request:
https://github.com/apache/spark/pull/88#issuecomment-38083186
@rezazadeh Continuing the discussion from an outdated comment:
~~~
I don't want to be using relative condition number here, I just want an
absolute lower bound on the smallest singular value considered nonzero so we
don't run into issues when dividing by it. I will rename the rcond instance
variable too, but that's it.
~~~
If the largest singular value is `1e10`, any singular value below `1e-6` is
indistinguishable from noise because of limited precision (double-precision
machine epsilon is about `2.2e-16`, and `1e10 * 2.2e-16 ≈ 2.2e-6`), so it should
be treated as zero. For big matrices, even more precision is lost because
rounding errors accumulate. Users usually do not know the largest singular value
beforehand, so they have no way to set the correct absolute threshold. If `1e-9`
is used in the case I just mentioned, there is no divide-by-zero exception, but
the columns of `U` that correspond to those tiny singular values will contain
nothing but noise. This is why LAPACK uses RCOND in many places, such as xGESDD
and xGELSD. I think it would be nice for us to follow the standard.
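A minimal Python sketch of the relative-threshold idea, assuming the singular values arrive as a plain, non-empty list of floats. The helper names and the default `rcond` of machine epsilon times the larger matrix dimension (the heuristic NumPy's `matrix_rank` uses for its default tolerance) are illustrative assumptions, not Spark or LAPACK APIs. As with the RCOND argument of xGELSD, a singular value is treated as zero when it is at or below `rcond` times the largest singular value.

```python
import sys
from typing import List, Sequence


def default_rcond(num_rows: int, num_cols: int) -> float:
    # Assumed default (hypothetical here): machine epsilon times the larger
    # dimension, the same heuristic NumPy's matrix_rank uses. Not a Spark API.
    return sys.float_info.epsilon * max(num_rows, num_cols)


def inverse_singular_values(sigmas: Sequence[float], rcond: float) -> List[float]:
    """Relative threshold: treat sigma <= rcond * sigma_max as zero."""
    cutoff = rcond * max(sigmas)
    # Only invert singular values that are numerically distinguishable from zero.
    return [1.0 / s if s > cutoff else 0.0 for s in sigmas]


def inverse_singular_values_absolute(sigmas: Sequence[float], min_sigma: float) -> List[float]:
    """Absolute threshold, for comparison: ignores the scale of sigma_max."""
    return [1.0 / s if s > min_sigma else 0.0 for s in sigmas]


if __name__ == "__main__":
    # Hypothetical singular values of a 1000 x 100 matrix with sigma_max = 1e10.
    sigmas = [1e10, 3.0, 1e-8]
    rcond = default_rcond(1000, 100)  # about 2.2e-13, so the cutoff is about 2.2e-3

    # Relative: 1e-8 falls below the cutoff and is treated as zero.
    print(inverse_singular_values(sigmas, rcond))
    # Absolute 1e-9: 1e-8 survives, and dividing by it amplifies noise by ~1e8.
    print(inverse_singular_values_absolute(sigmas, 1e-9))
```

The relative form scales with the matrix, so the caller never needs to know the largest singular value in advance; an absolute bound such as `1e-9` would have to be retuned for every input scale.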