Hi all,

I have been testing Lasso and ridge regression in MLlib, and both produce
Double.NaN results, while the other classification and regression methods
work well.

I eventually found that Lasso and RidgeRegression call the computeStats()
function to compute the mean and SD (standard deviation) used to normalize
the input data. However, some of the returned SDs are zero, so the
normalization evaluates 0.0 / 0.0, which yields NaN.
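
To illustrate the failure, here is a minimal standalone sketch in plain
Scala. It does not use MLlib: standardize is a hypothetical helper that
mimics mean/SD normalization, not the actual computeStats() code.

```scala
object NaNDemo {
  // Hypothetical helper (not MLlib code): standardize one feature column
  // as (x - mean) / sd, using the population standard deviation.
  def standardize(column: Array[Double]): Array[Double] = {
    val mean = column.sum / column.length
    val variance = column.map(x => (x - mean) * (x - mean)).sum / column.length
    val sd = math.sqrt(variance)
    column.map(x => (x - mean) / sd)
  }

  def main(args: Array[String]): Unit = {
    // A constant column: every value equals the mean, so sd is 0.0
    // and each entry becomes 0.0 / 0.0.
    val constant = Array(3.0, 3.0, 3.0)
    println(standardize(constant).forall(_.isNaN))
  }
}
```

Any feature that takes the same value in every sample (for example a column
of all zeros) would trigger this.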

How about setting the result directly to zero if both the divisor and the
dividend are zero, and adding a smoothing factor (e.g. 1.0e-10) to the
divisor if the divisor alone is zero? Or does anyone have a better idea?
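
A rough sketch of that idea in plain Scala is below. The names safeDivide
and Epsilon are made up for illustration and are not part of MLlib. I am
reading the second case as "the divisor is zero but the dividend is not",
which is an assumption on my side.

```scala
object SafeNormalize {
  // Smoothing factor from the proposal above (hypothetical name).
  val Epsilon: Double = 1.0e-10

  // Hypothetical guard around the division used during normalization.
  def safeDivide(dividend: Double, divisor: Double): Double =
    if (divisor == 0.0) {
      // 0.0 / 0.0 case: return 0.0 directly instead of NaN.
      if (dividend == 0.0) 0.0
      // Zero divisor only: smooth the divisor so the result stays finite.
      else dividend / (divisor + Epsilon)
    } else {
      // Normal case: plain division.
      dividend / divisor
    }

  def main(args: Array[String]): Unit = {
    println(safeDivide(0.0, 0.0)) // guarded 0.0 / 0.0 case
    println(safeDivide(4.0, 2.0)) // ordinary division
  }
}
```

One caveat with this sketch: a non-zero dividend over a smoothed zero
divisor becomes very large (on the order of 1e10), so dropping
zero-variance features might be a safer alternative.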

Thanks!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/computeStats-in-MLUtils-will-cause-Nan-not-a-number-error-tp980.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
