tygert commented on issue #16556: [SPARK-19184][MLlib] Improve numerical stability for method tallSkinnyQR.
URL: https://github.com/apache/spark/pull/16556#issuecomment-531974017

To be honest, @srowen: this is way more likely at scale than for the 4x4 case. That is how we found the problem. @hl475 eventually worked out a small case that was representative of what others had been observing. We got complaints that principal component analysis in Spark was broken, and it turned out that the problem was numerical instability.

You could in principle use a least-squares solver rather than inverting matrices, if you wanted to rely on Breeze alone. There seems to be a larger issue, though: solving systems of linear equations by explicitly inverting matrices and without any reason for subspaces to align is something prohibited very early in textbooks on numerical linear algebra. Ideally whoever would be maintaining MLlib would be familiar with condition numbers and numerical instability, though I fully realize that there may not be enough resources available to approach the ideal.

If we end up using MLlib more where I work, then perhaps I can fix this in the future. Sorry for the distraction.
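
For readers who want the "solve rather than invert" point made concrete, here is a minimal sketch in plain Python. It is not Spark or Breeze code, and the function name `solve_right_upper_triangular` is hypothetical. It assumes an upper-triangular factor R is already available and computes X with X R = A by substitution, so the inverse of R is never formed. Substitution alone does not cure an ill-conditioned R; it only avoids the extra error introduced by building the inverse explicitly.

```python
def solve_right_upper_triangular(a, r):
    """Return x with x * r == a for an upper-triangular square r.

    Illustrative only: a and r are lists of rows, r is n-by-n,
    and each row of a has n entries. The inverse of r is never built.
    """
    n = len(r)
    for j in range(n):
        if r[j][j] == 0.0:
            raise ValueError("r is singular: zero on the diagonal at index %d" % j)

    x = []
    for row in a:
        x_row = [0.0] * n
        for j in range(n):
            # (x * r)[j] = sum over k <= j of x[k] * r[k][j], since r is upper triangular.
            # Subtract the already-known terms (k < j), then divide by the diagonal entry.
            known = sum(x_row[k] * r[k][j] for k in range(j))
            x_row[j] = (row[j] - known) / r[j][j]
        x.append(x_row)
    return x


# Small, well-conditioned example chosen so the arithmetic is exact.
r = [[2.0, 1.0],
     [0.0, 4.0]]
a = [[2.0, 5.0],
     [4.0, 2.0]]
print(solve_right_upper_triangular(a, r))  # [[1.0, 1.0], [2.0, 0.0]]
```

In a real linear algebra library the same step would normally be delegated to a triangular or least-squares solve routine rather than written by hand.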
