Github user iyounus commented on the pull request:
https://github.com/apache/spark/pull/11610#issuecomment-218239032
@mengxr I looked into using DGELSD to solve `A^T A x = A^T b` as you
suggested. It works fine, but then the issue is how to calculate the errors on
the coefficients
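A minimal numpy sketch (illustrative only, not the Spark code) of the trade-off above: an SVD-based least-squares driver like DGELSD returns the solution but not `(A^T A)^{-1}`, which the standard errors of the coefficients require, so that matrix still has to be formed and inverted separately. All data and names here are made up.

```python
import numpy as np

# Hypothetical design matrix A and target b.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))
b = A @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# lstsq wraps the same SVD-based LAPACK driver family as DGELSD: it
# returns the minimum-norm solution but not (A^T A)^{-1}.
x, *_ = np.linalg.lstsq(A, b, rcond=None)

# Standard errors need the extra step the solver does not provide:
# se_j = sqrt(sigma^2 * [(A^T A)^{-1}]_jj), sigma^2 from the residuals.
resid = b - A @ x
n, p = A.shape
sigma2 = resid @ resid / (n - p)
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
```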
Github user iyounus commented on the pull request:
https://github.com/apache/spark/pull/11610#issuecomment-197468720
One problem with the eigen decomposition method is that for rank deficient
matrix some of the eigenvalues can be extremely small (instead of being zero
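A small numpy illustration of that failure mode: for an exactly rank-deficient Gram matrix, floating point returns a tiny eigenvalue instead of zero, so any rank decision needs a relative tolerance (the threshold below is an illustrative choice, not Spark's).

```python
import numpy as np

# Rank-deficient by construction: the third column duplicates the first.
A = np.array([[1.0, 2.0, 1.0],
              [2.0, 1.0, 2.0],
              [3.0, 4.0, 3.0]])
G = A.T @ A

# In floating point the "zero" eigenvalue comes back tiny, not zero.
w, V = np.linalg.eigh(G)

# A relative tolerance (in the spirit of LAPACK's rcond) separates the
# true rank from numerical noise.
tol = np.max(w) * 1e-10
rank = int(np.sum(w > tol))
```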
Github user iyounus commented on the pull request:
https://github.com/apache/spark/pull/11610#issuecomment-196960794
I'm a bit confused about the use of DGELSD. As far as I can tell, it
requires the matrix A itself. But in the current implementation, we're decomposing
`A^T A` on the driver
Github user iyounus commented on the pull request:
https://github.com/apache/spark/pull/11610#issuecomment-194522061
I should point out that to identify constant features, I'm comparing
variance (aVar) to zero. But it can happen that the variance for constant
features may
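An illustrative numpy version of that check (the tolerance is a made-up example, not Spark's): compare the computed variance against a scale-aware tolerance rather than against exactly zero.

```python
import numpy as np

# Column 1 is constant; with streaming summary statistics its computed
# variance can come out as a tiny positive number rather than exactly
# zero, so the comparison uses a tolerance instead of == 0.0.
X = np.array([[0.1, 7.0, 3.0],
              [0.2, 7.0, 1.0],
              [0.3, 7.0, 4.0]])
var = X.var(axis=0)

# Illustrative scale-aware tolerance, per column.
eps = 1e-12 * np.maximum(X.max(axis=0) ** 2, 1.0)
constant = var <= eps
```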
GitHub user iyounus opened a pull request:
https://github.com/apache/spark/pull/11610
[SPARK-13777] [ML] Remove constant features from training in normal solver
(WLS)
## What changes were proposed in this pull request?
"normal" solver in LinearRegression use
Github user iyounus commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-178770437
GLMNET sets all coefficients to zero if yStd=0 and fitIntercept=false
regardless of standardization or regularization. That's why I cannot compare my
normal equation
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51615248
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -398,7 +422,8 @@ class LinearRegressionModel private[ml
Github user iyounus commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-178941008
For `yStd != 0` and `regParam != 0`, my solution doesn't match GLMNET.
I showed this comparison on this jira
https://github.com/apache/spark/pull/10274
Github user iyounus commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-178134675
For the case (3), I'm assuming that the label and features are not
standardized. So, in that case, the solution exists. Here is my perspective
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51505471
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -558,6 +575,47 @@ class LinearRegressionSuite
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51355081
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -558,6 +575,47 @@ class LinearRegressionSuite
Github user iyounus commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-177375091
I've completed this PR. I think all the tests are there. Here, I'm going to
document a couple of minor issues just for future reference.
__Issue 1__
Github user iyounus commented on the pull request:
https://github.com/apache/spark/pull/10940#issuecomment-176454056
@dbtsai Linear regression also has similar issues. There, the "normal" and
"l-bfgs" solvers treat this case differently (and incorrectly)
Github user iyounus commented on the pull request:
https://github.com/apache/spark/pull/10940#issuecomment-176481784
@dbtsai I agree that the constant feature doesn't have predictive power.
But the WeightedLeastSquares just throws an `AssertionError` in `lapack.dpotrs`
(https
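A quick numpy demonstration of why the Cholesky-based `dpotrs` path fails there (a sketch, not the Spark code): with an intercept, a constant feature column is exactly collinear with the implicit all-ones column, so the normal matrix `A^T A` is singular and has no Cholesky factorization.

```python
import numpy as np

# fitIntercept adds an all-ones column; a constant feature (here 3.0)
# is an exact multiple of it, so the columns of A are linearly
# dependent and G = A^T A is singular.
ones = np.ones(5)
const_feature = 3.0 * np.ones(5)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
A = np.column_stack([ones, const_feature, x1])
G = A.T @ A

# Rank 2 < 3 confirms the singularity that breaks the Cholesky solve.
rank = int(np.linalg.matrix_rank(G))
```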
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51071809
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,43 @@ class LinearRegression @Since("
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51071489
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,43 @@ class LinearRegression @Since("
Github user iyounus commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-173738254
I've added an exception for the case when label is constant and
`standardization == true` and `regParam != 0.0`. Also added test for this case.
I cannot test
Github user iyounus commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-173363841
@mengxr I haven't implemented the changes suggested by @dbtsai and @srowen
yet. I think the solution I proposed for this issue may not be very suitable.
I'll make some
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10274#discussion_r50060828
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/optim/WeightedLeastSquaresSuite.scala
---
@@ -74,6 +89,35 @@ class WeightedLeastSquaresSuite extends
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r49941742
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,41 @@ class LinearRegression @Since("
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10274#discussion_r49815599
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/optim/WeightedLeastSquaresSuite.scala
---
@@ -74,6 +89,35 @@ class WeightedLeastSquaresSuite extends
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10274#discussion_r49543068
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala ---
@@ -94,8 +110,7 @@ private[ml] class WeightedLeastSquares
GitHub user iyounus opened a pull request:
https://github.com/apache/spark/pull/10702
[Spark-12732][ML] bug fix in linear regression train
Fixed the bug in linear regression train for the case when the target
variable is constant. The two cases for `fitIntercept=true
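A hedged sketch of the degenerate case this PR describes (the function name and the non-degenerate branch are illustrative, not Spark's implementation): with `fitIntercept=true` and a constant label, the least-squares optimum is zero coefficients with the intercept equal to the label mean, so no solver call is needed.

```python
import numpy as np

def solve_constant_label(X, y, fit_intercept=True):
    """Illustrative handling of a constant label (not Spark's code)."""
    if fit_intercept and np.std(y) == 0.0:
        # Closed form: residuals vanish with coefficients = 0 and
        # intercept = mean(y), so skip the solver entirely.
        return np.zeros(X.shape[1]), float(y.mean())
    # Non-degenerate branch, simplified: no intercept handling here.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, 0.0

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
y = np.array([4.0, 4.0, 4.0])
coef, intercept = solve_constant_label(X, y)
```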
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10274#discussion_r49368466
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala ---
@@ -86,6 +86,22 @@ private[ml] class WeightedLeastSquares
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10274#discussion_r49140607
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala ---
@@ -94,8 +110,7 @@ private[ml] class WeightedLeastSquares
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10274#discussion_r48902276
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala ---
@@ -86,6 +86,22 @@ private[ml] class WeightedLeastSquares
Github user iyounus commented on the pull request:
https://github.com/apache/spark/pull/10384#issuecomment-168013749
Just made the changes as suggested by @dbtsai. Sorry for the delay.
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10384#discussion_r48287387
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala
---
@@ -23,15 +23,23 @@ import org.apache.spark.Logging
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10384#discussion_r48288460
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/evaluation/RegressionMetricsSuite.scala
---
@@ -22,91 +22,115 @@ import
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10384#discussion_r4829
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala
---
@@ -105,6 +112,14 @@ class RegressionMetrics @Since("
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10384#discussion_r48199690
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala
---
@@ -31,13 +30,17 @@ import org.apache.spark.sql.DataFrame
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10384#discussion_r48209708
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala
---
@@ -105,6 +112,14 @@ class RegressionMetrics @Since("
GitHub user iyounus opened a pull request:
https://github.com/apache/spark/pull/10384
R^2 for regression through the origin.
Modified the definition of R^2 for regression through the origin. Added a
modified test for regression metrics.
You can merge this pull request into a Git
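The modification amounts to using the uncentered total sum of squares: through the origin, `SS_tot = sum(y^2)` rather than `sum((y - mean(y))^2)`. A small numpy sketch of the definition (illustrative, not the RegressionMetrics code):

```python
import numpy as np

def r2(y, y_pred, through_origin=False):
    ss_res = np.sum((y - y_pred) ** 2)
    # Through the origin, SS_tot is uncentered; using the centered
    # version with a no-intercept model can push R^2 outside [0, 1].
    ss_tot = np.sum(y ** 2) if through_origin else np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0])
y_pred = 1.1 * y  # predictions from a hypothetical no-intercept fit
score = r2(y, y_pred, through_origin=True)
```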
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10384#discussion_r48064092
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala
---
@@ -31,13 +30,17 @@ import org.apache.spark.sql.DataFrame
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10384#discussion_r48072262
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/evaluation/RegressionMetricsSuite.scala
---
@@ -22,91 +22,111 @@ import
GitHub user iyounus opened a pull request:
https://github.com/apache/spark/pull/10274
[SPARK-12230][ML] WeightedLeastSquares.fit() should handle division by zero
properly if standard deviation of target variable is zero.
This fixes the behavior of WeightedLeastSquares.fit() when