GitHub user shahidki31 commented on the issue:

    https://github.com/apache/spark/pull/22784

    Test results comparing the existing PCA with PCA computed via SVD, without computing the covariance matrix:

    val data = Array(
      Vectors.sparse(5, Seq((1, 1.0), (3, 7.0))),
      Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
      Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0))

    1) PCA using the covariance matrix:

    Explained variance = [0.7943932532, 0.2056067468, 1.26E-16]

    Top 2 principal components (one column per component):

    [[-0.44859172075072673  -0.28423808214073987
       0.13301985745398526  -0.05621155904253121
      -0.1252315635978212    0.7636264774662965
       0.21650756651919933  -0.5652958773533949
      -0.8476512931126826   -0.11560340501314653]]

    2) PCA using SVD, without computing the covariance matrix:

    Explained variance = [0.7943932532, 0.2056067468, 5.55E-17]

    Top 2 principal components (one column per component):

    [[-0.44859172075072673  -0.2842380821407399
       0.13301985745398529  -0.056211559042531424
      -0.12523156359782125   0.7636264774662964
       0.21650756651919945  -0.5652958773533953
      -0.8476512931126826   -0.11560340501314664]]

    **Leading eigenvalues MSE = 0.0
    Leading eigenvectors MSE = 0.0**
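The comparison above can be reproduced outside Spark with a minimal sketch in plain Python. It is an illustration only, not the PR's implementation or Spark's API. Power iteration with deflation stands in for the eigen and SVD solvers, and every helper name (`center`, `top_eigenpairs`, `mse`) is hypothetical. Path 1 forms the 5 x 5 covariance matrix explicitly. Path 2 only multiplies by the centered data matrix and its transpose, which yields the squared singular values without ever building the covariance matrix. The sign of each component is arbitrary, so the signs may be flipped relative to the numbers quoted above.

```python
import math

# Rows of the data matrix from the comment (sparse vector written out densely).
DATA = [
    [0.0, 1.0, 0.0, 7.0, 0.0],
    [2.0, 0.0, 3.0, 4.0, 5.0],
    [4.0, 0.0, 0.0, 6.0, 7.0],
]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(m, v):
    return [dot(row, v) for row in m]

def center(rows):
    """Subtract the column means from every row."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def top_eigenpairs(apply, dim, k, iters=200):
    """Top-k eigenpairs of a symmetric PSD operator via power iteration with deflation."""
    pairs = []
    for _ in range(k):
        v = [1.0 / math.sqrt(dim)] * dim
        lam = 0.0
        for _ in range(iters):
            w = apply(v)
            # Deflation: remove the contribution of components already found.
            for lam_j, v_j in pairs:
                c = lam_j * dot(v_j, v)
                w = [wi - c * x for wi, x in zip(w, v_j)]
            lam = math.sqrt(dot(w, w))
            v = [x / lam for x in w]
        pairs.append((lam, v))
    return pairs

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

xc = center(DATA)
xct = [list(col) for col in zip(*xc)]
n, d = len(xc), len(xc[0])
# Total variance = trace of the covariance = squared Frobenius norm / (n - 1).
total_var = sum(x * x for row in xc for x in row) / (n - 1)

# Path 1: build the d x d covariance matrix, then take its top eigenpairs.
cov = [[dot(ci, cj) / (n - 1) for cj in xct] for ci in xct]
pairs_cov = top_eigenpairs(lambda v: matvec(cov, v), d, 2)

# Path 2: never form the covariance; apply Xc^T (Xc v) instead.
# The eigenvalues found here are the squared singular values of Xc.
pairs_svd = top_eigenpairs(lambda v: matvec(xct, matvec(xc, v)), d, 2)

ratio_cov = [lam / total_var for lam, _ in pairs_cov]
ratio_svd = [lam / (n - 1) / total_var for lam, _ in pairs_svd]
pcs_cov = [v for _, v in pairs_cov]
pcs_svd = [v for _, v in pairs_svd]

print("explained variance (covariance):", ratio_cov)
print("explained variance (SVD-style): ", ratio_svd)
print("leading eigenvalues MSE: ", mse(ratio_cov, ratio_svd))
print("leading eigenvectors MSE:", mse(pcs_cov[0] + pcs_cov[1], pcs_svd[0] + pcs_svd[1]))
```

In floating point the two MSE values may come out as tiny nonzero numbers rather than exactly 0.0; the 0.0 figures quoted above are presumably rounded.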