Hi all, I have recently been testing Lasso and ridge regression in MLlib and keep getting Double.NaN results, while the other classification and regression methods work fine.
I finally found that Lasso and RidgeRegression call computeStats() to compute the mean and SD (standard deviation) used to normalize the input data. However, some of the returned SDs are zero. An SD of zero means the feature is constant, so normalizing it as (x - mean) / SD evaluates 0.0 / 0.0, which produces NaN. How about setting the result directly to zero when both the divisor and the dividend are zero, and adding a smoothing factor (e.g. 1.0e-10) to the divisor when the divisor alone is zero? Or does anyone have a better idea? Thanks!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/computeStats-in-MLUtils-will-cause-Nan-not-a-number-error-tp980.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
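To make the failure and the proposed fix concrete, here is a minimal sketch in plain Scala on in-memory arrays rather than Spark RDDs. It is not MLlib's actual computeStats(); the names SafeStandardize, meanAndStd, naive and safe are hypothetical, and the use of the population SD (dividing by n) is an assumption. It shows that a constant column yields 0.0 / 0.0 = NaN, and that the rule suggested above (0/0 becomes 0, otherwise add a small epsilon to a zero divisor) avoids it.

```scala
object SafeStandardize {
  // Hypothetical helper (not MLlib's computeStats): per-column mean and
  // population standard deviation (divides by n, an assumption).
  def meanAndStd(rows: Array[Array[Double]]): (Array[Double], Array[Double]) = {
    val n = rows.length
    val d = rows.head.length
    val mean = Array.tabulate(d)(j => rows.map(_(j)).sum / n)
    val std = Array.tabulate(d) { j =>
      math.sqrt(rows.map { r => val diff = r(j) - mean(j); diff * diff }.sum / n)
    }
    (mean, std)
  }

  // Naive standardization: for a constant column this is 0.0 / 0.0 = NaN.
  def naive(x: Double, mean: Double, std: Double): Double = (x - mean) / std

  // The proposal from the post: 0 / 0 -> 0; if only the divisor is zero,
  // add a small smoothing factor to it instead of dividing by zero.
  def safe(x: Double, mean: Double, std: Double, eps: Double = 1.0e-10): Double = {
    val num = x - mean
    if (std == 0.0) {
      if (num == 0.0) 0.0 else num / (std + eps)
    } else num / std
  }

  def main(args: Array[String]): Unit = {
    // Second column is constant, so its SD is exactly 0.0.
    val rows = Array(Array(1.0, 5.0), Array(2.0, 5.0), Array(3.0, 5.0))
    val (mean, std) = meanAndStd(rows)
    println(naive(rows(0)(1), mean(1), std(1))) // NaN
    println(safe(rows(0)(1), mean(1), std(1)))  // 0.0
  }
}
```

One design note: for a truly constant feature x - mean is (up to rounding) always zero, so the first branch handles the common case; the epsilon branch only matters when rounding leaves a tiny nonzero numerator, and in that case it can produce very large values, which is why leaving zero-variance features unscaled is another option worth considering.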
