Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/23126

I compared Spark's computeCovariance function in RowMatrix (DenseVector input) with NumPy's cov function and found two problems. The results are below:

1) The Spark computeCovariance function in RowMatrix is not accurate.

Input data (two columns, five observations each):
1.0, 2.0, 3.0, 4.0, 5.0
2.0, 3.0, 1.0, 2.0, 6.0

NumPy cov result:
[[ 2.5   1.75]
 [ 1.75  3.7 ]]

RowMatrix computeCovariance result:
2.5   1.75
1.75  3.700000000000001

2) For some inputs the result is far off. The input data was generated with:

data1 = np.random.normal(loc=100000, scale=0.000009, size=10000000)
data2 = np.random.normal(loc=200000, scale=0.000002, size=10000000)

NumPy cov result:
[[ 8.10536442e-11  -4.35439574e-15]
 [-4.35439574e-15   3.99928264e-12]]

RowMatrix computeCovariance result:
-0.0027484893798828125   0.001491546630859375
 0.001491546630859375    8.087158203125E-4
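A minimal sketch of how this comparison could be reproduced, assuming a local PySpark session; the session setup, variable names, and the way the columns are zipped into rows are my own choices and are not taken from the original report:

import numpy as np
from pyspark.sql import SparkSession
from pyspark.mllib.linalg.distributed import RowMatrix

# Assumed local session for illustration only.
spark = SparkSession.builder.master("local[*]").appName("cov-comparison").getOrCreate()
sc = spark.sparkContext

# Case 1: two columns with five observations each.
col1 = [1.0, 2.0, 3.0, 4.0, 5.0]
col2 = [2.0, 3.0, 1.0, 2.0, 6.0]
print(np.cov(col1, col2))                           # NumPy sample covariance
rows = sc.parallelize([list(obs) for obs in zip(col1, col2)])
print(RowMatrix(rows).computeCovariance())          # Spark covariance

# Case 2: large mean, tiny variance. Note that parallelizing 10 million rows
# from the driver is memory-heavy; a smaller size shows the same effect.
data1 = np.random.normal(loc=100000, scale=0.000009, size=10000000)
data2 = np.random.normal(loc=200000, scale=0.000002, size=10000000)
print(np.cov(data1, data2))
rows2 = sc.parallelize(np.column_stack((data1, data2)).tolist())
print(RowMatrix(rows2).computeCovariance())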