tygert commented on issue #16556: [SPARK-19184][MLlib] Improve numerical 
stability for method tallSkinnyQR.
URL: https://github.com/apache/spark/pull/16556#issuecomment-531984643
 
 
   Well, in applications to dimension reduction (that is, when principal 
component analysis is useful), the singular values of the matrix being analyzed 
need to decay to something relatively negligible. So, no, the problem doesn't 
get better as the number of columns increases; it typically gets worse. You're 
right that the fix which @hl475 devised is not ideal from the point of view of 
efficiency; the fix provides numerical stability through code that would 
ideally be optimized in C or some other lower-level language. Basically every 
code that is serious about dimension reduction includes some kind of 
truncation of singular values or a regularized pseudoinverse. I've seen people 
use Spark's MLlib by adding a multiple of the identity to AA^T just to avoid 
the numerical instability. In some sense, that latter hack works, and it 
sacrifices accuracy in a controllable way.
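
To make the last point concrete, here is a minimal sketch in plain Python (not Spark code, and not the fix in this pull request). It assumes a toy 2-column matrix whose singular values decay by a factor of 2^-10, forms the small Gram matrix of its columns, and compares the condition number before and after adding a multiple of the identity. The helper names and the shift value are made up for illustration.

```python
import math

def gram(rows):
    """Gram matrix of the columns of a tall matrix given as a list of rows."""
    k = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]

def eig_sym2x2(g):
    """Eigenvalues (largest first) of a symmetric 2 x 2 matrix, in closed form."""
    a, b, c = g[0][0], g[0][1], g[1][1]
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean + radius, mean - radius

def gram_condition(rows, shift=0.0):
    """Condition number of G + shift * I, where G is the Gram matrix.

    Adding shift * I moves every eigenvalue of G up by exactly `shift`.
    """
    big, small = eig_sym2x2(gram(rows))
    return (big + shift) / (small + shift)

# Hypothetical test matrix: singular values sqrt(2) and sqrt(2) * d,
# so its own condition number is 1 / d = 1024.
d = 2.0 ** -10
A = [[1.0, 1.0], [d, -d]]

cond_plain = gram_condition(A)                      # about 1024 squared
cond_shifted = gram_condition(A, shift=2.0 ** -10)  # shift value is an arbitrary choice

print("Gram matrix condition number:", cond_plain)
print("after adding a multiple of the identity:", cond_shifted)
```

Forming the Gram matrix squares the condition number, which is where the instability comes from once the singular values decay. The shift caps the condition number, at the price of damping the directions whose singular values fall below roughly the square root of the shift; that damping is the controllable loss of accuracy.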
