[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10940 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-190593882 Had an offline discussion with @dbtsai and @coderxiang. We agreed to keep the current behavior and have it well documented. I will mark this JIRA as "won't" and have created SPARK-13590 for documentation and logging improvement. @coderxiang Do you mind closing this PR?
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-190553302 @coderxiang @dbtsai Sorry for the late response! I actually thought this PR had already been merged ... Anyway, I tested `glmnet` and found that it outputs zero coefficients for constant columns regardless of intercept, regularization, and standardization settings. I thought about it today and I feel it actually makes sense. If we have a constant column in our training data, do we expect it to change or stay constant in test data? If its value might change, we should set its coefficient to zero because we cannot estimate how big the change would be. If its value stays constant (or maybe users created this column to add bias manually), it shouldn't be regularized and users should really turn on `fitIntercept` instead. So my suggestion is to follow glmnet and set the coefficients of constant columns to zero regardless of other settings. If there are constant columns and `fitIntercept` is false, we should output a warning message. Does that sound good to you?
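The policy suggested in this comment (zero the coefficients of constant columns, warn when `fitIntercept` is false) could look roughly like the sketch below. It is a plain-Scala illustration, not Spark's implementation: `ConstantColumnPolicy`, `constantColumns`, and `applyPolicy` are hypothetical names, and the data is held in ordinary arrays rather than a DataFrame.

```scala
// Hypothetical helper, not part of Spark: illustrates the glmnet-style policy only.
object ConstantColumnPolicy {
  /** Indices of columns whose value is identical in every row. */
  def constantColumns(rows: Seq[Array[Double]]): Seq[Int] = {
    if (rows.isEmpty) Seq.empty
    else rows.head.indices.filter { j =>
      rows.forall(r => r(j) == rows.head(j))
    }
  }

  /** Zero the coefficients of constant columns and return an optional warning. */
  def applyPolicy(
      coefficients: Array[Double],
      rows: Seq[Array[Double]],
      fitIntercept: Boolean): (Array[Double], Option[String]) = {
    val constant = constantColumns(rows)
    val adjusted = coefficients.clone()
    constant.foreach(j => adjusted(j) = 0.0)
    // Warn only in the case the comment singles out: constant columns without an intercept.
    val warning =
      if (constant.nonEmpty && !fitIntercept) {
        Some(s"Constant columns ${constant.mkString(", ")} found while fitIntercept=false; " +
          "their coefficients are set to zero.")
      } else {
        None
      }
    (adjusted, warning)
  }
}
```

With the two-row dataset used in the PR's small test, `(1, 1)` and `(0, 1)`, column 1 is reported as constant and its coefficient is zeroed.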
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-176459204 @iyounus Ideally, when `intercept=true`, we keep the current behavior, in which a constant column doesn't have any predictive power.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user iyounus commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-176454056 @dbtsai Linear regression also has similar issues. There, the "normal" and "l-bfgs" solvers treat this case differently (and incorrectly). The other problem there is that if intercept=true, then a constant feature column makes the Gramian matrix singular and the Cholesky decomposition fails. Should I create a separate JIRA for the case of a constant feature?
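The singularity mentioned in this comment can be checked on a toy design matrix. The sketch below assumes the intercept is modelled as an extra all-ones column, which is a simplification and not necessarily how `WeightedLeastSquares` builds its system; `SingularGramian`, `gramian`, and `det2` are hypothetical names.

```scala
// Illustration only: a constant feature plus an all-ones intercept column
// are linearly dependent, so the Gramian A^T A has determinant zero.
object SingularGramian {
  /** Gramian A^T A of a matrix given as a sequence of rows. */
  def gramian(rows: Seq[Array[Double]]): Array[Array[Double]] = {
    val n = rows.head.length
    Array.tabulate(n, n) { (i, j) => rows.map(r => r(i) * r(j)).sum }
  }

  /** Determinant of a 2x2 matrix. */
  def det2(m: Array[Array[Double]]): Double =
    m(0)(0) * m(1)(1) - m(0)(1) * m(1)(0)
}
```

For rows `(3, 1), (3, 1), (3, 1)` the Gramian is `[[27, 9], [9, 3]]`, whose determinant is zero, so a Cholesky factorization has no positive pivot to work with. Replacing the first column with `1, 2, 4` gives `[[21, 7], [7, 3]]`, which is invertible.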
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user iyounus commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-176481784 @dbtsai I agree that the constant feature doesn't have predictive power. But `WeightedLeastSquares` just throws an `AssertionError` in `lapack.dpotrs` (https://issues.apache.org/jira/browse/SPARK-11918), whereas the "l-bfgs" solver sets the coefficient to zero.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-176503318 @iyounus Maybe in this case, `WeightedLeastSquares` should drop those columns so the model can still be trained.
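One way to read the suggestion above is: solve on the non-constant columns only, then report zero for the dropped ones. The sketch below shows only that bookkeeping, under the assumption that some solver produces coefficients for the reduced problem; `DropConstantColumns` and its methods are hypothetical names, not Spark APIs.

```scala
// Hypothetical bookkeeping around a solver; the solver itself is not shown.
object DropConstantColumns {
  /** Indices of columns that take at least two distinct values. */
  def nonConstantColumns(rows: Seq[Array[Double]]): Seq[Int] =
    rows.head.indices.filter(j => rows.exists(r => r(j) != rows.head(j)))

  /** Keep only the given columns of every row. */
  def project(rows: Seq[Array[Double]], keep: Seq[Int]): Seq[Array[Double]] =
    rows.map(r => keep.map(j => r(j)).toArray)

  /** Expand reduced coefficients back to full width, with zeros for dropped columns. */
  def expand(reduced: Array[Double], keep: Seq[Int], numFeatures: Int): Array[Double] = {
    val full = Array.fill(numFeatures)(0.0)
    keep.indices.foreach(k => full(keep(k)) = reduced(k))
    full
  }
}
```

The expanded vector matches what the "l-bfgs" path is described as producing in this thread: a zero coefficient for each constant column.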
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175898494 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50229/ Test PASSed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175898492 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175911254 **[Test build #50227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50227/consoleFull)** for PR 10940 at commit [`914cffc`](https://github.com/apache/spark/commit/914cffc6f0a9e0d847f486916ff89941c55c63ce). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175898182 **[Test build #50229 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50229/consoleFull)** for PR 10940 at commit [`914cffc`](https://github.com/apache/spark/commit/914cffc6f0a9e0d847f486916ff89941c55c63ce). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175911585 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175911586 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50227/ Test PASSed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user coderxiang commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175804541 @mengxr Do you mean doing this locally? I was concerned this would create confusion since we are modifying the true value.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175803323 @coderxiang Would the fix be cleaner if we set `featuresStd(i)` to `1.0` if it is `0.0`?
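The idea in this comment, replacing a zero standard deviation by `1.0` so that the scaling step leaves constant columns untouched, might be sketched as follows. `SafeStd`, `safeStd`, and `margin` are hypothetical names; the loop only mirrors the shape of the margin computation quoted in the diffs of this thread and is not the aggregator's actual code.

```scala
// Illustration only: with zero stds replaced by 1.0, the inner loop needs no std check.
object SafeStd {
  /** Return a copy in which zero standard deviations are replaced by 1.0. */
  def safeStd(featuresStd: Array[Double]): Array[Double] =
    featuresStd.map(s => if (s == 0.0) 1.0 else s)

  /** Sum of w_i * x_i / std_i over the non-zero features. */
  def margin(coefficients: Array[Double], features: Array[Double], std: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < features.length) {
      if (features(i) != 0.0) sum += coefficients(i) * (features(i) / std(i))
      i += 1
    }
    sum
  }
}
```

Because `safeStd` returns a new array, the original `featuresStd` values stay available for reporting, which is one way to address the concern about modifying the true value.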
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175811350 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50209/ Test FAILed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175811345 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user coderxiang commented on a diff in the pull request: https://github.com/apache/spark/pull/10940#discussion_r51033689
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -971,8 +971,12 @@ private class LogisticAggregator(
       val margin = - {
         var sum = 0.0
         features.foreachActive { (index, value) =>
-          if (featuresStd(index) != 0.0 && value != 0.0) {
-            sum += localCoefficientsArray(index) * (value / featuresStd(index))
+          if (value != 0.0) {
--- End diff --
Sure, I'll remove it; I was avoiding changing the existing logic.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175808760 **[Test build #50210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50210/consoleFull)** for PR 10940 at commit [`43db782`](https://github.com/apache/spark/commit/43db782a5ba649d3139df2688e9578a6de9de734).
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user coderxiang commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175811802 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/10940#discussion_r51033447
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -971,8 +971,12 @@ private class LogisticAggregator(
       val margin = - {
         var sum = 0.0
         features.foreachActive { (index, value) =>
-          if (featuresStd(index) != 0.0 && value != 0.0) {
-            sum += localCoefficientsArray(index) * (value / featuresStd(index))
+          if (value != 0.0) {
--- End diff --
This `if` branch is not necessary.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/10940#discussion_r51033453
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -983,8 +987,12 @@ private class LogisticAggregator(
       val multiplier = weight * (1.0 / (1.0 + math.exp(margin)) - label)
       features.foreachActive { (index, value) =>
-        if (featuresStd(index) != 0.0 && value != 0.0) {
-          localGradientSumArray(index) += multiplier * (value / featuresStd(index))
+        if (value != 0.0) {
--- End diff --
Same here.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175819709 **[Test build #50213 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50213/consoleFull)** for PR 10940 at commit [`43db782`](https://github.com/apache/spark/commit/43db782a5ba649d3139df2688e9578a6de9de734).
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175828908 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50210/ Test PASSed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175840958 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175828663 **[Test build #50210 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50210/consoleFull)** for PR 10940 at commit [`43db782`](https://github.com/apache/spark/commit/43db782a5ba649d3139df2688e9578a6de9de734). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175828903 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175840489 **[Test build #50213 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50213/consoleFull)** for PR 10940 at commit [`43db782`](https://github.com/apache/spark/commit/43db782a5ba649d3139df2688e9578a6de9de734). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175840967 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50213/ Test PASSed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/10940#discussion_r51049983
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -971,8 +971,10 @@ private class LogisticAggregator(
       val margin = - {
         var sum = 0.0
         features.foreachActive { (index, value) =>
-          if (featuresStd(index) != 0.0 && value != 0.0) {
+          if (featuresStd(index) != 0.0) {
--- End diff --
The previous version is correct. Checking `value != 0.0` is much cheaper than computing `localCoefficientsArray(index)`.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/10940#discussion_r51050297
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala ---
@@ -607,6 +607,74 @@ class LogisticRegressionSuite
     assert(model2.coefficients ~= coefficientsR2 relTol 1E-2)
   }
+
+  test("an extra large example for review only") {
+    val trainer1 = (new LogisticRegression).setFitIntercept(false)
+      .setElasticNetParam(0.0)
+      .setRegParam(1)
+      .setStandardization(false)
+      .setMaxIter(1000)
+      .setTol(1e-9)
+
+    val binaryDatasetWithUniqueColumn = sqlContext.read
+      .format("libsvm")
+      .load("../data/mllib/sample_libsvm_data_with_unique_column.txt")
+
+    val model1 = trainer1.fit(binaryDatasetWithUniqueColumn)
+
+    val interceptR1 = 0.0
+    val coefficientsR1 = Vectors.dense(0.0301002746509743, 0.0906099616129797, 0.0954855492088332,
+      0.0243782420594917, 0.0174024017667667, -0.0006549273929309,
+      0.0637250665085166, -0.0589532651377124, 0.1383368129434264,
+      0.0665749825701113, 0.0799386779781182, 0.1198682685242071,
+      0.1802933312643371, -0.0124797701753129)
+
+    assert(model1.intercept ~== interceptR1 absTol 1E-3)
+    assert(model1.coefficients ~= coefficientsR1 relTol 1E-2)
+  }
+
+  test("binary logistic regression without intercept with L2 regularizationon " +
+    "data with unique column without intercept") {
+    val trainer = (new LogisticRegression).setFitIntercept(false)
+      .setElasticNetParam(0.0)
+      .setRegParam(1)
+      .setStandardization(false)
+      .setMaxIter(1000)
+      .setTol(1e-9)
+
+    val binaryDatasetWithUniqueColumn = sqlContext.createDataFrame(
+      sc.parallelize(
+        Array(
+          LabeledPoint(label = 1.0, features = Vectors.dense(1, 1)),
+          LabeledPoint(label = 0.0, features = Vectors.dense(0, 1))
+        )
+      )
+    )
+
+    val model = trainer.fit(binaryDatasetWithUniqueColumn)
+
+    val interceptR = 0.0
+    val coefficientsR = Vectors.dense(0.22478867, -0.02241016)
+
+    assert(model.intercept ~== interceptR absTol 1E-3)
+    assert(model.coefficients ~= coefficientsR relTol 1E-2)
+
+    /*
+    Use the following scikit-learn Python code to get a reference result:
--- End diff --
Please move the comments to the beginning of the test.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/10940#discussion_r51050301
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala ---
@@ -607,6 +607,74 @@ class LogisticRegressionSuite
     assert(model2.coefficients ~= coefficientsR2 relTol 1E-2)
   }
+
+  test("an extra large example for review only") {
+    val trainer1 = (new LogisticRegression).setFitIntercept(false)
+      .setElasticNetParam(0.0)
+      .setRegParam(1)
+      .setStandardization(false)
+      .setMaxIter(1000)
+      .setTol(1e-9)
+
+    val binaryDatasetWithUniqueColumn = sqlContext.read
+      .format("libsvm")
+      .load("../data/mllib/sample_libsvm_data_with_unique_column.txt")
+
+    val model1 = trainer1.fit(binaryDatasetWithUniqueColumn)
+
+    val interceptR1 = 0.0
+    val coefficientsR1 = Vectors.dense(0.0301002746509743, 0.0906099616129797, 0.0954855492088332,
+      0.0243782420594917, 0.0174024017667667, -0.0006549273929309,
+      0.0637250665085166, -0.0589532651377124, 0.1383368129434264,
+      0.0665749825701113, 0.0799386779781182, 0.1198682685242071,
+      0.1802933312643371, -0.0124797701753129)
+
+    assert(model1.intercept ~== interceptR1 absTol 1E-3)
+    assert(model1.coefficients ~= coefficientsR1 relTol 1E-2)
+  }
+
+  test("binary logistic regression without intercept with L2 regularizationon " +
+    "data with unique column without intercept") {
--- End diff --
`unique column` -> `a constant column`
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10940#discussion_r51051112
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -341,11 +341,11 @@ class LogisticRegression @Since("1.2.0") (
               regParamL1
             } else {
               // If `standardization` is false, we still standardize the data
-              // to improve the rate of convergence; as a result, we have to
-              // perform this reverse standardization by penalizing each component
-              // differently to get effectively the same objective function when
+              // to improve the rate of convergence unless the standard deviation is zero;
+              // as a result, we have to perform this reverse standardization by penalizing
+              // each component differently to get effectively the same objective function when
               // the training dataset is not standardized.
-              if (featuresStd(index) != 0.0) regParamL1 / featuresStd(index) else 0.0
+              if (featuresStd(index) != 0.0) regParamL1 / featuresStd(index) else regParamL1
--- End diff --
The constant `value` can be very large or very negative. The optimizer may not be able to converge well in this case. I haven't proved or tried it yet, but mathematically, with the following changes, this should solve an identical problem.
```scala
// the training dataset is not standardized.
if (featuresStd(index) != 0.0) regParamL1 / featuresStd(index) else regParamL1 / featuresMean(index)
```
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10940#discussion_r51051223

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -971,8 +971,10 @@ private class LogisticAggregator(
 val margin = - {
 var sum = 0.0
 features.foreachActive { (index, value) =>
- if (featuresStd(index) != 0.0 && value != 0.0) {
+ if (featuresStd(index) != 0.0) {
 sum += localCoefficientsArray(index) * (value / featuresStd(index))
+ } else {
+sum += localCoefficientsArray(index) * value
--- End diff --

Change `value` into `1.0`
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10940#discussion_r51051184

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -417,7 +417,7 @@ class LogisticRegression @Since("1.2.0") (
 val rawCoefficients = state.x.toArray.clone()
 var i = 0
 while (i < numFeatures) {
- rawCoefficients(i) *= { if (featuresStd(i) != 0.0) 1.0 / featuresStd(i) else 0.0 }
+ rawCoefficients(i) *= { if (featuresStd(i) != 0.0) 1.0 / featuresStd(i) else 1.0 }
--- End diff --

```scala
rawCoefficients(i) *= { if (featuresStd(i) != 0.0) 1.0 / featuresStd(i) else 1.0 / featuresMean(i) }
```
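A small sketch of how the suggestions in this thread fit together: if a constant column is treated as `1.0` during training (its value divided by its mean) and the learned coefficient is divided by the same mean afterwards, the margin in the original feature space should be unchanged. This is plain Scala, not the Spark implementation; the arrays, the object name and `divideByScale` are hypothetical, and the sketch assumes the constant column has a non-zero mean.

```scala
object ConstantColumnMarginSketch {
  // Hypothetical per-feature statistics: feature 0 varies, feature 1 is constant at 5.0.
  val featuresStd: Array[Double] = Array(2.0, 0.0)
  val featuresMean: Array[Double] = Array(1.5, 5.0)

  // Scale used for each feature: the std when it is non-zero, otherwise the mean.
  def scale(i: Int): Double =
    if (featuresStd(i) != 0.0) featuresStd(i) else featuresMean(i)

  // Divides each component by its scale. Applied to features it gives the
  // training-space values; applied to trained coefficients it maps them back.
  def divideByScale(v: Array[Double]): Array[Double] =
    v.indices.map(i => v(i) / scale(i)).toArray

  def dot(w: Array[Double], x: Array[Double]): Double =
    w.indices.map(i => w(i) * x(i)).sum

  def main(args: Array[String]): Unit = {
    val x = Array(3.0, 5.0)          // one row in the original feature space
    val scaledX = divideByScale(x)   // the constant column becomes 1.0
    val scaledW = Array(0.8, -0.6)   // coefficients as the optimizer would see them
    val rawW = divideByScale(scaledW)
    println(s"scaled-space margin = ${dot(scaledW, scaledX)}")
    println(s"raw-space margin    = ${dot(rawW, x)}")
  }
}
```

Both margins should match up to floating-point rounding, which is the sense in which the rescaled problem is identical to the original one.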
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10940#discussion_r51051267

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -1106,7 +1110,8 @@ private class LogisticCostFun(
 totalGradientArray(index) += regParamL2 * temp
 value * temp
 } else {
-0.0
+totalGradientArray(index) += regParamL2 * value
+value * value
--- End diff --

Change `value` into `1.0`
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10940#discussion_r51051238

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -983,8 +985,10 @@ private class LogisticAggregator(
 val multiplier = weight * (1.0 / (1.0 + math.exp(margin)) - label)
 features.foreachActive { (index, value) =>
-if (featuresStd(index) != 0.0 && value != 0.0) {
+if (featuresStd(index) != 0.0) {
 localGradientSumArray(index) += multiplier * (value / featuresStd(index))
+} else {
+ localGradientSumArray(index) += multiplier * value
--- End diff --

Change `value` into `1.0`
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user coderxiang commented on a diff in the pull request: https://github.com/apache/spark/pull/10940#discussion_r51052438

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -341,11 +341,11 @@ class LogisticRegression @Since("1.2.0") (
 regParamL1
 } else {
 // If `standardization` is false, we still standardize the data
-// to improve the rate of convergence; as a result, we have to
-// perform this reverse standardization by penalizing each component
-// differently to get effectively the same objective function when
+// to improve the rate of convergence unless the standard deviation is zero;
+// as a result, we have to perform this reverse standardization by penalizing
+// each component differently to get effectively the same objective function when
 // the training dataset is not standardized.
-if (featuresStd(index) != 0.0) regParamL1 / featuresStd(index) else 0.0
+if (featuresStd(index) != 0.0) regParamL1 / featuresStd(index) else regParamL1
--- End diff --

Can you give an example where using `value` fails to converge? I agree that any non-zero number here makes the algorithm work, but should we pick a particular number as the denominator, or keep the original value? @mengxr what do you think?
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10940#discussion_r51052743

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -341,11 +341,11 @@ class LogisticRegression @Since("1.2.0") (
 regParamL1
 } else {
 // If `standardization` is false, we still standardize the data
-// to improve the rate of convergence; as a result, we have to
-// perform this reverse standardization by penalizing each component
-// differently to get effectively the same objective function when
+// to improve the rate of convergence unless the standard deviation is zero;
+// as a result, we have to perform this reverse standardization by penalizing
+// each component differently to get effectively the same objective function when
 // the training dataset is not standardized.
-if (featuresStd(index) != 0.0) regParamL1 / featuresStd(index) else 0.0
+if (featuresStd(index) != 0.0) regParamL1 / featuresStd(index) else regParamL1
--- End diff --

BTW, this technique is already used here, where we divide by the std of the feature. Note that you need to make sure `value != 0`; if `value == 0`, the coefficient of that feature will be zero.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user coderxiang commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175863915

Jenkins, test this please.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175867624

Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175868432

Intuitively, this makes sense to me. When `setFitIntercept(false)` is used, features with `std == 0` act as an intercept, which results in non-zero coefficients. Can you add a couple more tests, as follows? Thanks.

First, add two new datasets based on `binaryDataset`, one by zeroing out a column and one by making a column a non-zero constant. Then match the results against GLMNET, like the rest of the tests, for:

1) setFitIntercept(false), setStandardization(false) with/without regularization
2) setFitIntercept(false), setStandardization(true) with/without regularization
3) setFitIntercept(true), setStandardization(false) with/without regularization
4) setFitIntercept(true), setStandardization(true) with/without regularization

+cc @iyounus Linear Regression may have a similar issue. If you have time, you may want to check it out. Thanks.
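The requested test matrix can be enumerated mechanically. The sketch below is plain Scala and does not touch Spark; `Config` and `TestMatrixSketch` are hypothetical names, `regParam = 0.0` stands in for "without regularization", and `1.0` is an arbitrary non-zero value chosen for illustration.

```scala
object TestMatrixSketch {
  // Hypothetical container for one test configuration; not part of Spark.
  final case class Config(fitIntercept: Boolean, standardization: Boolean, regParam: Double)

  // Four intercept/standardization combinations, each with and without regularization.
  val configs: Seq[Config] = for {
    fitIntercept    <- Seq(false, true)
    standardization <- Seq(false, true)
    regParam        <- Seq(0.0, 1.0)
  } yield Config(fitIntercept, standardization, regParam)

  def main(args: Array[String]): Unit =
    configs.foreach(println)
}
```

Each configuration would then be run on both new datasets and compared against the corresponding GLMNET output, as the other tests in the suite do.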
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10940#discussion_r51054476

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -341,11 +341,11 @@ class LogisticRegression @Since("1.2.0") (
 regParamL1
 } else {
 // If `standardization` is false, we still standardize the data
-// to improve the rate of convergence; as a result, we have to
-// perform this reverse standardization by penalizing each component
-// differently to get effectively the same objective function when
+// to improve the rate of convergence unless the standard deviation is zero;
+// as a result, we have to perform this reverse standardization by penalizing
+// each component differently to get effectively the same objective function when
 // the training dataset is not standardized.
-if (featuresStd(index) != 0.0) regParamL1 / featuresStd(index) else 0.0
+if (featuresStd(index) != 0.0) regParamL1 / featuresStd(index) else regParamL1
--- End diff --

@coderxiang You may try it out. When the constant `value` is very small, like `1e-6`, compared to the rest of the features, the corresponding coefficient will be very large. In this case the coefficients are optimized on different scales, which often causes convergence issues in the line search. A similar argument can be made when the constant `value` is very large compared to the rest of the features. That's why we still standardize the features even when users ask for `standardization == false`.
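To make the scale argument concrete, here is a rough sketch in plain Scala. It only computes the coefficient a constant column would need in order to contribute a fixed amount to the margin; it does not run an optimizer. The object, the function name and the target contribution of `0.5` are hypothetical choices for illustration.

```scala
object CoefficientScaleSketch {
  // Coefficient a constant column must take to contribute `target` to the margin.
  def requiredCoefficient(target: Double, constant: Double): Double = {
    require(constant != 0.0, "a zero column cannot contribute to the margin")
    target / constant
  }

  def main(args: Array[String]): Unit = {
    val target = 0.5 // hypothetical intercept-like contribution
    Seq(1e-6, 1.0, 1e6).foreach { c =>
      println(s"constant = $c, required coefficient = ${requiredCoefficient(target, c)}")
    }
  }
}
```

With a constant of `1e-6` the required coefficient is on the order of `1e5` to `1e6`, while coefficients of features on a unit scale stay near `1`, which is the mismatch in scales the comment describes.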
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175867627

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50225/ Test FAILed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user coderxiang commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175870597

Jenkins, test this please.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user coderxiang commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175870484

@dbtsai Without regularization, the objective may not be strongly convex, so the solution is not guaranteed to be unique.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175871497

**[Test build #50227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50227/consoleFull)** for PR 10940 at commit [`914cffc`](https://github.com/apache/spark/commit/914cffc6f0a9e0d847f486916ff89941c55c63ce).
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175872406

@coderxiang I agree. Without regularization, those features become collinear, so the solution will not be unique. However, for the features with `std != 0`, the coefficients should be unique. Can you check them at least?
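One way to read the collinearity point: when an intercept is fitted and a column is constant, weight can be moved between the intercept and that column's coefficient without changing any prediction. The plain-Scala sketch below illustrates this with made-up numbers; the object name, rows and parameter values are all hypothetical.

```scala
object CollinearitySketch {
  def margin(intercept: Double, w: Array[Double], x: Array[Double]): Double =
    intercept + w.indices.map(i => w(i) * x(i)).sum

  val c = 2.0 // value of the constant column (feature 1)
  val rows: Seq[Array[Double]] = Seq(Array(0.5, c), Array(-1.0, c), Array(3.0, c))

  // Two different parameter sets: 1.0 + 0.25 * c == 0.5 + 0.5 * c == 1.5
  val (b1, w1) = (1.0, Array(0.7, 0.25))
  val (b2, w2) = (0.5, Array(0.7, 0.5))

  def main(args: Array[String]): Unit =
    rows.foreach { x =>
      println(s"margin A = ${margin(b1, w1, x)}, margin B = ${margin(b2, w2, x)}")
    }
}
```

Both parameter sets give the same margin on every row, so an unregularized loss cannot distinguish them, while the coefficient of the varying feature (`0.7` here) is the same in both.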
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175881461

**[Test build #50229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50229/consoleFull)** for PR 10940 at commit [`914cffc`](https://github.com/apache/spark/commit/914cffc6f0a9e0d847f486916ff89941c55c63ce).
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175579533

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50191/ Test PASSed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175579209

**[Test build #50191 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50191/consoleFull)** for PR 10940 at commit [`ae4dd3c`](https://github.com/apache/spark/commit/ae4dd3cc55254165a84f1c4e5f89abbf6c5e41c2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175582499

**[Test build #50189 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50189/consoleFull)** for PR 10940 at commit [`ae4dd3c`](https://github.com/apache/spark/commit/ae4dd3cc55254165a84f1c4e5f89abbf6c5e41c2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175507401

**[Test build #50184 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50184/consoleFull)** for PR 10940 at commit [`df49cfe`](https://github.com/apache/spark/commit/df49cfead7664e9f04fcae14156da65d1b13b9ac).
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175579527

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175583165

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50189/ Test PASSed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175583162

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175479554

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175479556

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50181/ Test PASSed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175508278

Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175508286

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50184/ Test FAILed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175508269

**[Test build #50184 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50184/consoleFull)** for PR 10940 at commit [`df49cfe`](https://github.com/apache/spark/commit/df49cfead7664e9f04fcae14156da65d1b13b9ac).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175524526

**[Test build #50191 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50191/consoleFull)** for PR 10940 at commit [`ae4dd3c`](https://github.com/apache/spark/commit/ae4dd3cc55254165a84f1c4e5f89abbf6c5e41c2).
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175524347

**[Test build #50189 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50189/consoleFull)** for PR 10940 at commit [`ae4dd3c`](https://github.com/apache/spark/commit/ae4dd3cc55254165a84f1c4e5f89abbf6c5e41c2).
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user coderxiang commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175522507

Jenkins, test this please.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175479373

**[Test build #50181 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50181/consoleFull)** for PR 10940 at commit [`eca66df`](https://github.com/apache/spark/commit/eca66df5f6af20e5f74a87a78c3749a0d391ffe1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user coderxiang commented on a diff in the pull request: https://github.com/apache/spark/pull/10940#discussion_r50957402

--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala ---
@@ -607,6 +607,73 @@ class LogisticRegressionSuite
 assert(model2.coefficients ~= coefficientsR2 relTol 1E-2)
 }
+
+ test("an extra large example for review only") {
--- End diff --

This test is for review only, to show an example on a larger data set. I will remove it if this is merged.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user coderxiang commented on a diff in the pull request: https://github.com/apache/spark/pull/10940#discussion_r50957422 --- Diff: data/mllib/sample_libsvm_data_with_unique_column.txt --- @@ -0,0 +1,270 @@ +1 1:0.708333 2:1 3:1 4:-0.320755 5:-0.105023 6:-1 7:1 8:-0.419847 9:-1 10:-0.225806 12:1 13:-1 14:1 --- End diff -- This file is for review only, to show an example on a larger data set. Will remove if merged.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175389799 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50169/ Test FAILed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175389795 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175469270 **[Test build #50181 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50181/consoleFull)** for PR 10940 at commit [`eca66df`](https://github.com/apache/spark/commit/eca66df5f6af20e5f74a87a78c3749a0d391ffe1).
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175382007 **[Test build #50168 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50168/consoleFull)** for PR 10940 at commit [`09e95a7`](https://github.com/apache/spark/commit/09e95a73392292914cedbe4b0bfc373420dc).
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user coderxiang commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175466437 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
GitHub user coderxiang opened a pull request: https://github.com/apache/spark/pull/10940 [SPARK-13029][ml] fix a logistic regression issue when inputing data has a column with identical value This is a bug that appears when fitting a Logistic Regression model with `.setStandardization(false)` and `.setFitIntercept(false)`. If the data matrix has a column whose values are all identical, the resulting model is not correct. Specifically, that column always gets a weight of 0 because of a special check inside the code. However, the correct solution, which is unique for L2-regularized logistic regression, usually has a non-zero weight for it. The fix is to update the special handling logic so that it is compatible with columns that have std=0. You can merge this pull request into a Git repository by running: $ git pull https://github.com/coderxiang/spark dev Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10940.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10940 commit 437a285bfa20d9431819bbd9e01faa2622893616 Author: Shuo Xiang Date: 2016-01-27T01:25:22Z handle Logistic regression with column of unique value commit 09e95a73392292914cedbe4b0bfc373420dc Author: Shuo Xiang Date: 2016-01-27T01:25:38Z handle Logistic regression with column of unique value
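The claim in the description above, that the L2-regularized optimum usually puts a non-zero weight on a constant column when no intercept is fitted, can be checked on a toy problem. The following is a minimal sketch in plain Python, not Spark's code or optimizer: the data set, the regularization strength `reg`, the step size, and the helper names `fit` and `objective` are all made up for illustration. It fits the model once freely and once with the constant column's weight pinned to zero, which only mimics the behavior the description reports.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def objective(w, rows, labels, reg):
    """Mean logistic loss plus (reg / 2) * ||w||^2, with no intercept term."""
    total = 0.0
    for x, y in zip(rows, labels):
        z = sum(wj * xj for wj, xj in zip(w, x))
        # log(1 + exp(-z)) for y = 1, log(1 + exp(z)) for y = 0
        total += math.log1p(math.exp(-z if y == 1 else z))
    return total / len(rows) + 0.5 * reg * sum(wj * wj for wj in w)

def fit(rows, labels, reg, frozen=(), step=0.5, iters=5000):
    """Plain gradient descent; weights whose index is in `frozen` stay at 0."""
    n, d = len(rows), len(rows[0])
    w = [0.0] * d
    for _ in range(iters):
        grad = [reg * wj for wj in w]
        for x, y in zip(rows, labels):
            residual = sigmoid(sum(wj * xj for wj, xj in zip(w, x))) - y
            for j in range(d):
                grad[j] += residual * x[j] / n
        w = [0.0 if j in frozen else wj - step * grad[j]
             for j, wj in enumerate(w)]
    return w

# Hypothetical data: column 0 varies, column 1 is constant (std = 0).
rows = [[-2.0, 1.0], [-1.0, 1.0], [0.0, 1.0],
        [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
labels = [0, 1, 1, 1, 1, 1]
reg = 0.1

w_free = fit(rows, labels, reg)                 # all weights optimized
w_frozen = fit(rows, labels, reg, frozen={1})   # constant column pinned to 0

print("free:  ", w_free, objective(w_free, rows, labels, reg))
print("frozen:", w_frozen, objective(w_frozen, rows, labels, reg))
```

Because the labels are imbalanced, the gradient with respect to the constant column's weight is non-zero at the origin, so the unconstrained minimizer moves that weight away from zero and reaches a lower objective than the pinned fit. This only illustrates the mathematical point; it says nothing about which behavior a library should choose.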
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175423974 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50168/ Test FAILed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user coderxiang commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175393478 Jenkins, retest please
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175423971 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175423859 **[Test build #50168 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50168/consoleFull)** for PR 10940 at commit [`09e95a7`](https://github.com/apache/spark/commit/09e95a73392292914cedbe4b0bfc373420dc). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user coderxiang commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175462272 Jenkins, retest please.