Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/88#issuecomment-38083186
  
    @rezazadeh Continuing the discussion from an outdated comment:
    
    ~~~
    I don't want to be using relative condition number here, I just want an 
absolute lower bound on the smallest singular value considered nonzero so we 
don't run into issues when dividing by it. I will rename the rcond instance 
variable too, but that's it.
    ~~~
    
    If the largest singular value is `1e10`, any singular value below about 
`1e-6` is indistinguishable from round-off noise in double precision and should 
be treated as zero. For large matrices, accumulated rounding error costs even 
more precision. Users usually do not know the largest singular value beforehand, 
so they have no way to choose a correct absolute threshold. With an absolute 
threshold of `1e-9` in the case above, there is no division-by-zero exception, 
but the resulting `U` matrix will contain nothing but noise. This is why LAPACK 
uses RCOND, a threshold relative to the largest singular value, in many places 
like xGESDD and xGELSD. I think it would be nice for us to follow the standard.
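
To make the argument above concrete, here is a minimal sketch in plain Python (3.9+ for `math.ulp`). It is not the code in this pull request: the function names `rank_absolute` and `rank_relative` and the sample singular values are made up for illustration. It contrasts an absolute cutoff with an RCOND-style cutoff that scales with the largest singular value.

```python
import math

def rank_absolute(singular_values, tol=1e-9):
    """Count singular values above a fixed absolute cutoff (hypothetical helper)."""
    return sum(1 for s in singular_values if s > tol)

def rank_relative(singular_values, rcond=1e-9):
    """Count singular values above rcond * sigma_max (RCOND-style cutoff)."""
    if not singular_values:
        return 0
    threshold = rcond * max(singular_values)
    return sum(1 for s in singular_values if s > threshold)

# Illustrative spectrum: the last two values are far below the spacing of
# doubles near 1e10, so they cannot be told apart from round-off noise.
sigmas = [1e10, 3e4, 50.0, 5e-7, 1e-8]

spacing = math.ulp(1e10)            # gap between adjacent doubles near 1e10
kept_abs = rank_absolute(sigmas)    # absolute cutoff 1e-9 keeps the noise
kept_rel = rank_relative(sigmas)    # relative cutoff is 1e-9 * 1e10, about 10

print(spacing, kept_abs, kept_rel)
```

With the absolute cutoff, dividing by the two smallest values raises no error, but the corresponding columns of `U` would be built from noise; the relative cutoff discards them without the caller having to know the scale of the matrix in advance.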

