[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-05-10 Thread iyounus
Github user iyounus commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-218239032 @mengxr I looked into using DGELSD to solve `A^T A x = A^T b` as you suggested. It works fine, but then the issue is how to calculate the errors on the coefficients

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-19 Thread iyounus
Github user iyounus commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-197468720 One problem with the eigen decomposition method is that for a rank-deficient matrix some of the eigenvalues can be extremely small (instead of being zero
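
A rough illustration of the point in this snippet, not code from the pull request: in floating point, an eigenvalue that should be exactly zero may come out as a tiny nonzero number, so a rank check usually compares each eigenvalue against the largest one rather than against zero. The helper names and the `rel_tol` default below are hypothetical, and the 2x2 closed form is used only to keep the sketch dependency-free.

```python
import math


def sym2x2_eigenvalues(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], ascending."""
    half_trace = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return half_trace - radius, half_trace + radius


def numerical_rank(eigenvalues, rel_tol=1e-10):
    """Count eigenvalues that are not negligible next to the largest one.

    Assumption: rel_tol is an illustrative threshold, not a Spark setting.
    """
    largest = max(abs(ev) for ev in eigenvalues)
    return sum(1 for ev in eigenvalues if abs(ev) > rel_tol * largest)


# Gram matrix A^T A for a matrix A whose second column is 3x the first,
# so A^T A has rank 1 and one eigenvalue is mathematically zero.
col = [0.1, 0.2, 0.3]
a = sum(x * x for x in col)
b = sum(x * (3 * x) for x in col)
c = sum((3 * x) * (3 * x) for x in col)
small, large = sym2x2_eigenvalues(a, b, c)
rank = numerical_rank((small, large))
```

Comparing `small` with `0.0` directly would depend on round-off, while the relative threshold reports rank 1 either way.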

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-15 Thread iyounus
Github user iyounus commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-196960794 I'm a bit confused about the use of DGELSD. As far as I can tell, it requires matrix A itself. But in the current implementation, we're decomposing A^T.A on the driver

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-09 Thread iyounus
Github user iyounus commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-194522061 I should point out that, to identify constant features, I'm comparing the variance (aVar) to zero. But it can happen that the variance for constant features may
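
The snippet is cut off, so the exact concern is not visible here. One way an exact comparison with zero can go wrong, shown as a hedged sketch: a variance accumulated in floating point may come out as a tiny nonzero number for a column that is in fact constant. The helper `is_constant` and its `rel_tol` default are hypothetical and are not taken from the pull request.

```python
def variance(values):
    """Population variance via E[x^2] - E[x]^2 (prone to round-off)."""
    n = len(values)
    mean = sum(values) / n
    mean_sq = sum(v * v for v in values) / n
    return mean_sq - mean * mean


def is_constant(values, rel_tol=1e-9):
    """Treat a feature as constant when its variance is negligible
    relative to its mean square (rel_tol is an illustrative choice)."""
    n = len(values)
    mean_sq = sum(v * v for v in values) / n
    # The floor keeps the check meaningful for an all-zero column.
    return abs(variance(values)) <= rel_tol * max(mean_sq, 1e-300)


constant_column = [0.1] * 1000
varying_column = [0.1, 0.2, 0.3, 0.4]
```

With these inputs, `is_constant(constant_column)` holds even if `variance(constant_column)` is not exactly `0.0`, while `varying_column` is still treated as informative.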

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-09 Thread iyounus
GitHub user iyounus opened a pull request: https://github.com/apache/spark/pull/11610 [SPARK-13777] [ML] Remove constant features from training in normal solver (WLS) ## What changes were proposed in this pull request? "normal" solver in LinearRegression use

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-02-02 Thread iyounus
Github user iyounus commented on the pull request: https://github.com/apache/spark/pull/10702#issuecomment-178770437 GLMNET sets all coefficients to zero if yStd=0 and fitIntercept=false regardless of standardization or regularization. That's why I cannot compare my normal equation

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-02-02 Thread iyounus
Github user iyounus commented on a diff in the pull request: https://github.com/apache/spark/pull/10702#discussion_r51615248 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -398,7 +422,8 @@ class LinearRegressionModel private[ml

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-02-02 Thread iyounus
Github user iyounus commented on the pull request: https://github.com/apache/spark/pull/10702#issuecomment-178941008 For `yStd != 0` and `regParam != 0`, my solution doesn't match GLMNET. I showed this comparison on this jira https://github.com/apache/spark/pull/10274

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-02-01 Thread iyounus
Github user iyounus commented on the pull request: https://github.com/apache/spark/pull/10702#issuecomment-178134675 For the case (3), I'm assuming that the label and features are not standardized. So, in that case, the solution exists. Here is my perspective

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-02-01 Thread iyounus
Github user iyounus commented on a diff in the pull request: https://github.com/apache/spark/pull/10702#discussion_r51505471 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala --- @@ -558,6 +575,47 @@ class LinearRegressionSuite

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-30 Thread iyounus
Github user iyounus commented on a diff in the pull request: https://github.com/apache/spark/pull/10702#discussion_r51355081 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala --- @@ -558,6 +575,47 @@ class LinearRegressionSuite

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-30 Thread iyounus
Github user iyounus commented on the pull request: https://github.com/apache/spark/pull/10702#issuecomment-177375091 I've completed this PR. I think all the tests are there. Here, I'm going to document a couple of minor issues just for future reference. __Issue 1__

[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...

2016-01-28 Thread iyounus
Github user iyounus commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-176454056 @dbtsai Linear regression also has similar issues. There, "normal" and "l-bfgs" solvers treat this case differently (and incorrectly)

[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...

2016-01-28 Thread iyounus
Github user iyounus commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-176481784 @dbtsai I agree that the constant feature doesn't have predictive power. But WeightedLeastSquares just throws an `AssertionError` in `lapack.dpotrs` (https

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-27 Thread iyounus
Github user iyounus commented on a diff in the pull request: https://github.com/apache/spark/pull/10702#discussion_r51071809 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -219,33 +219,43 @@ class LinearRegression @Since("

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-27 Thread iyounus
Github user iyounus commented on a diff in the pull request: https://github.com/apache/spark/pull/10702#discussion_r51071489 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -219,33 +219,43 @@ class LinearRegression @Since("

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-21 Thread iyounus
Github user iyounus commented on the pull request: https://github.com/apache/spark/pull/10702#issuecomment-173738254 I've added an exception for the case when label is constant and `standardization == true` and `regParam != 0.0`. Also added test for this case. I cannot test

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-20 Thread iyounus
Github user iyounus commented on the pull request: https://github.com/apache/spark/pull/10702#issuecomment-173363841 @mengxr I haven't implemented the changes suggested by @dbtsai and @srowen yet. I think the solution I proposed to this issue may not be very suitable. I'll make some

[GitHub] spark pull request: [SPARK-12230][ML] WeightedLeastSquares.fit() s...

2016-01-18 Thread iyounus
Github user iyounus commented on a diff in the pull request: https://github.com/apache/spark/pull/10274#discussion_r50060828 --- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/WeightedLeastSquaresSuite.scala --- @@ -74,6 +89,35 @@ class WeightedLeastSquaresSuite extends

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-16 Thread iyounus
Github user iyounus commented on a diff in the pull request: https://github.com/apache/spark/pull/10702#discussion_r49941742 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -219,33 +219,41 @@ class LinearRegression @Since("

[GitHub] spark pull request: [SPARK-12230][ML] WeightedLeastSquares.fit() s...

2016-01-14 Thread iyounus
Github user iyounus commented on a diff in the pull request: https://github.com/apache/spark/pull/10274#discussion_r49815599 --- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/WeightedLeastSquaresSuite.scala --- @@ -74,6 +89,35 @@ class WeightedLeastSquaresSuite extends

[GitHub] spark pull request: [SPARK-12230][ML] WeightedLeastSquares.fit() s...

2016-01-12 Thread iyounus
Github user iyounus commented on a diff in the pull request: https://github.com/apache/spark/pull/10274#discussion_r49543068 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -94,8 +110,7 @@ private[ml] class WeightedLeastSquares

[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...

2016-01-11 Thread iyounus
GitHub user iyounus opened a pull request: https://github.com/apache/spark/pull/10702 [Spark-12732][ML] bug fix in linear regression train Fixed the bug in linear regression train for the case when the target variable is constant. The two cases for `fitIntercept=true

[GitHub] spark pull request: [SPARK-12230][ML] WeightedLeastSquares.fit() s...

2016-01-11 Thread iyounus
Github user iyounus commented on a diff in the pull request: https://github.com/apache/spark/pull/10274#discussion_r49368466 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -86,6 +86,22 @@ private[ml] class WeightedLeastSquares

[GitHub] spark pull request: [SPARK-12230][ML] WeightedLeastSquares.fit() s...

2016-01-07 Thread iyounus
Github user iyounus commented on a diff in the pull request: https://github.com/apache/spark/pull/10274#discussion_r49140607 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -94,8 +110,7 @@ private[ml] class WeightedLeastSquares

[GitHub] spark pull request: [SPARK-12230][ML] WeightedLeastSquares.fit() s...

2016-01-05 Thread iyounus
Github user iyounus commented on a diff in the pull request: https://github.com/apache/spark/pull/10274#discussion_r48902276 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -86,6 +86,22 @@ private[ml] class WeightedLeastSquares

[GitHub] spark pull request: [SPARK-12331][ML] R^2 for regression through t...

2015-12-30 Thread iyounus
Github user iyounus commented on the pull request: https://github.com/apache/spark/pull/10384#issuecomment-168013749 Just made the changes as suggested by @dbtsai. Sorry for the delay.

[GitHub] spark pull request: [SPARK-12331][ML] R^2 for regression through t...

2015-12-22 Thread iyounus
Github user iyounus commented on a diff in the pull request: https://github.com/apache/spark/pull/10384#discussion_r48287387 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala --- @@ -23,15 +23,23 @@ import org.apache.spark.Logging

[GitHub] spark pull request: [SPARK-12331][ML] R^2 for regression through t...

2015-12-22 Thread iyounus
Github user iyounus commented on a diff in the pull request: https://github.com/apache/spark/pull/10384#discussion_r48288460 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/RegressionMetricsSuite.scala --- @@ -22,91 +22,115 @@ import

[GitHub] spark pull request: [SPARK-12331][ML] R^2 for regression through t...

2015-12-21 Thread iyounus
Github user iyounus commented on a diff in the pull request: https://github.com/apache/spark/pull/10384#discussion_r4829 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala --- @@ -105,6 +112,14 @@ class RegressionMetrics @Since("

[GitHub] spark pull request: [SPARK-12331][ML] R^2 for regression through t...

2015-12-21 Thread iyounus
Github user iyounus commented on a diff in the pull request: https://github.com/apache/spark/pull/10384#discussion_r48199690 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala --- @@ -31,13 +30,17 @@ import org.apache.spark.sql.DataFrame

[GitHub] spark pull request: [SPARK-12331][ML] R^2 for regression through t...

2015-12-21 Thread iyounus
Github user iyounus commented on a diff in the pull request: https://github.com/apache/spark/pull/10384#discussion_r48209708 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala --- @@ -105,6 +112,14 @@ class RegressionMetrics @Since("

[GitHub] spark pull request: R^2 for regression through the origin.

2015-12-18 Thread iyounus
GitHub user iyounus opened a pull request: https://github.com/apache/spark/pull/10384 R^2 for regression through the origin. Modified the definition of R^2 for regression through origin. Added modified test for regression metrics.
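
For context, a minimal sketch of the distinction the title points at, assuming the common convention (used, for example, by R's `lm` summary for models without an intercept) that the total sum of squares is taken about zero rather than about the label mean when the fit is forced through the origin. The function `r2` and its `through_origin` flag are hypothetical and are not Spark's `RegressionMetrics` API.

```python
def r2(labels, predictions, through_origin=False):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Assumption: with through_origin=True the total sum of squares is
    measured about zero instead of about the label mean.
    """
    ss_res = sum((y - p) ** 2 for y, p in zip(labels, predictions))
    if through_origin:
        ss_tot = sum(y ** 2 for y in labels)
    else:
        mean = sum(labels) / len(labels)
        ss_tot = sum((y - mean) ** 2 for y in labels)
    return 1.0 - ss_res / ss_tot


labels = [1.0, 2.0, 3.0]
predictions = [1.1, 1.9, 3.2]
centered = r2(labels, predictions)
uncentered = r2(labels, predictions, through_origin=True)
```

The two values differ because the uncentered total sum of squares is never smaller than the centered one, so the same residuals yield a higher R^2 under the through-origin definition.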

[GitHub] spark pull request: R^2 for regression through the origin.

2015-12-18 Thread iyounus
Github user iyounus commented on a diff in the pull request: https://github.com/apache/spark/pull/10384#discussion_r48064092 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala --- @@ -31,13 +30,17 @@ import org.apache.spark.sql.DataFrame

[GitHub] spark pull request: [SPARK-12331][ML] R^2 for regression through t...

2015-12-18 Thread iyounus
Github user iyounus commented on a diff in the pull request: https://github.com/apache/spark/pull/10384#discussion_r48072262 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/RegressionMetricsSuite.scala --- @@ -22,91 +22,111 @@ import

[GitHub] spark pull request: [SPARK-12230][ML] WeightedLeastSquares.fit() s...

2015-12-11 Thread iyounus
GitHub user iyounus opened a pull request: https://github.com/apache/spark/pull/10274 [SPARK-12230][ML] WeightedLeastSquares.fit() should handle division by zero properly if standard deviation of target variable is zero. This fixes the behavior of WeightedLeastSquares.fit() when