tygert commented on issue #16556: [SPARK-19184][MLlib] Improve numerical stability for method tallSkinnyQR.
URL: https://github.com/apache/spark/pull/16556#issuecomment-531984643

Well, in applications to dimension reduction (that is, when principal component analysis is useful), the singular values of the matrix being analyzed need to decay to something relatively negligible. So, no, the problem doesn't get better as the number of columns increases; it typically gets worse.

You're right that the fix which @hl475 devised is not ideal from the point of view of efficiency; the fix provides numerical stability through code that would ideally be optimized in C or some other lower-level language.

Basically all codes that are serious about dimension reduction include some kind of truncation of singular values or a regularized pseudoinverse. I've seen people use Spark's MLlib by adding a multiple of the identity to AA^T just to avoid the numerical instability. In some sense, that latter hack works, and it sacrifices accuracy controllably.
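
To make the last point concrete, here is a minimal sketch in plain Python (not Spark code, and not the fix from this PR). It assumes a made-up 3x2 matrix with nearly dependent columns, forms the small 2x2 Gram matrix (A^T A in this row-per-observation layout), and compares its eigenvalues with and without an added multiple of the identity. The names `gram_2col`, `sym2_eigs`, `eps`, and `lam` are illustrative only.

```python
import math

def gram_2col(a):
    """Return the entries (g11, g12, g22) of the 2x2 Gram matrix A^T A."""
    g11 = sum(r[0] * r[0] for r in a)
    g12 = sum(r[0] * r[1] for r in a)
    g22 = sum(r[1] * r[1] for r in a)
    return g11, g12, g22

def sym2_eigs(g11, g12, g22):
    """Eigenvalues of the symmetric matrix [[g11, g12], [g12, g22]], largest first."""
    mean = 0.5 * (g11 + g22)
    rad = math.hypot(0.5 * (g11 - g22), g12)
    return mean + rad, mean - rad

# Assumed test matrix: exact singular values are sqrt(2 + eps^2) and eps.
eps = 1e-8
A = [[1.0, 1.0],
     [eps, 0.0],
     [0.0, eps]]

g11, g12, g22 = gram_2col(A)

# Without regularization: eps^2 is below double-precision resolution next to 1,
# so the Gram matrix rounds to [[1, 1], [1, 1]] and the small eigenvalue is lost.
big_plain, small_plain = sym2_eigs(g11, g12, g22)

# With a multiple of the identity added (the "hack" described above):
# every eigenvalue is shifted up by lam, so the matrix stays invertible.
lam = 1e-6  # assumed regularization strength, chosen only for illustration
big_ridge, small_ridge = sym2_eigs(g11 + lam, g12, g22 + lam)

print("plain :", big_plain, small_plain)
print("ridge :", big_ridge, small_ridge)
```

With these assumed values the unregularized Gram matrix loses the singular value 1e-8 entirely, while adding `lam` keeps the smallest eigenvalue near `lam`. That is the controllable trade: the result is biased by an amount of order `lam`, but the inversion no longer blows up.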
