[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20566 I'd close this and favor the quick fix #20594 based on the discussion in JIRA. Will re-open it if it is needed later. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20566 @jkbradley Thanks! I will post the problem and proposed design on the JIRA.
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20566 Thanks for the patch, @viirya. As always, I'll request that we put design decisions & long discussions in JIRA so that they are easier to find later. It can also be good to get quick feedback about design before implementation. I'll comment in JIRA.
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20566 cc @MLnick @jkbradley
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87302/ Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87302 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87302/testReport)** for PR 20566 at commit [`c1fb657`](https://github.com/apache/spark/commit/c1fb6577d950b5c17c47d40b6baf0b86fc45a71a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87302 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87302/testReport)** for PR 20566 at commit [`c1fb657`](https://github.com/apache/spark/commit/c1fb6577d950b5c17c47d40b6baf0b86fc45a71a).
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/788/ Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test PASSed.
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20566 retest this please.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87301/ Test FAILed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87301 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87301/testReport)** for PR 20566 at commit [`c1fb657`](https://github.com/apache/spark/commit/c1fb6577d950b5c17c47d40b6baf0b86fc45a71a). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87301/testReport)** for PR 20566 at commit [`c1fb657`](https://github.com/apache/spark/commit/c1fb6577d950b5c17c47d40b6baf0b86fc45a71a).
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/787/ Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87299/ Test FAILed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test FAILed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87299 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87299/testReport)** for PR 20566 at commit [`daceafe`](https://github.com/apache/spark/commit/daceafee5e6eed11cbfed91d1f72e0477ff0ec68). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87299 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87299/testReport)** for PR 20566 at commit [`daceafe`](https://github.com/apache/spark/commit/daceafee5e6eed11cbfed91d1f72e0477ff0ec68).
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/785/ Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test PASSed.
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20566 It's not only `threshold`: the default params of `NaiveBayes` and `LogisticRegression` (maybe more, I'm checking now) are all set in the estimator, not in their models. The models receive the default values at the end of `fit`.
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/20566 I believe this will break persistence for LogisticRegression. I believe the issue is that the `threshold` param on LogisticRegressionModel doesn't get a default directly, but only gets it during the call to `fit` on LogisticRegression. This is currently fine because the Model can only be created by fitting or by being read from disk, and in both cases some value gets set for threshold. With this change that's no longer the case. Here's a test to confirm: https://github.com/apache/spark/commit/5db2108224accdf848b41ef0d8d1c312b49f49c6. I believe LinearRegression may have a similar issue. Our current tests don't seem to cover this kind of thing, so I think we should improve test coverage if we want to make this kind of change.
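The failure mode described in this comment can be sketched with a toy model of param handling. This is plain Python, not Spark's Scala API: `ToyParams`, `ToyEstimator`, `ToyModel`, and the `include_defaults` flag are hypothetical names, and the sketch assumes (the thread does not spell this out) that the change stops writing default values to disk.

```python
class ToyParams:
    """Hypothetical stand-in for a params container.

    Explicitly set values take precedence over defaults.
    """

    def __init__(self):
        self.defaults = {}
        self.values = {}

    def get_or_default(self, name):
        if name in self.values:
            return self.values[name]
        if name in self.defaults:
            return self.defaults[name]
        raise KeyError(f"no value or default for param {name!r}")


class ToyModel(ToyParams):
    """Declares no default of its own for `threshold`."""

    def save(self, include_defaults):
        # Assumed old behaviour: defaults are written alongside set values.
        # Assumed new behaviour: only explicitly set values are written.
        state = dict(self.defaults) if include_defaults else {}
        state.update(self.values)
        return state

    @classmethod
    def load(cls, state):
        # Everything read back becomes an explicitly set value.
        model = cls()
        model.values.update(state)
        return model


class ToyEstimator(ToyParams):
    def __init__(self):
        super().__init__()
        # The default lives on the estimator only.
        self.defaults["threshold"] = 0.5

    def fit(self):
        # At the end of fit, the estimator's params are copied to the model.
        model = ToyModel()
        model.defaults.update(self.defaults)
        model.values.update(self.values)
        return model
```

In this sketch a model returned by `fit` resolves `threshold`, and so does one loaded from `save(include_defaults=True)`. A model loaded from `save(include_defaults=False)` raises `KeyError` on `get_or_default("threshold")`, because neither the saved state nor the model class supplies a value.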
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87289/ Test FAILed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test FAILed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87289 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87289/testReport)** for PR 20566 at commit [`6228006`](https://github.com/apache/spark/commit/6228006fdb62ca25ffda21dab3e88cfe406a9e0b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87289/testReport)** for PR 20566 at commit [`6228006`](https://github.com/apache/spark/commit/6228006fdb62ca25ffda21dab3e88cfe406a9e0b).
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/775/ Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test FAILed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87285 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87285/testReport)** for PR 20566 at commit [`3b5e7c6`](https://github.com/apache/spark/commit/3b5e7c64742f7eeaf2fe9d3cb95bbbcef1f15abc). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87285/ Test FAILed.
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20566 Yeah, IMHO, when the user loads a model saved by an old version into a new version and runs it, it is reasonable to run it with the current default value, because the param was not explicitly set and should use the "default" value of the current system. Thanks for your comment. Let's wait for others' opinions.
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20566 @viirya that's a good question. Honestly, my view is that if the user doesn't set a value, he/she doesn't care about it, so it is good to use the new version's default IMHO. But it is also true that changing a default may cause unexpected behavior in user code. So, it LGTM, but I'd like to hear others' opinions on this too.
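The two options weighed in this exchange can be sketched as a load-time policy. Everything below is made up for illustration: `DEFAULTS_BY_VERSION`, `resolve_params`, the version labels, and the numeric defaults are hypothetical, and none of it is Spark's API.

```python
# Hypothetical defaults for a single param in two releases; values are invented.
DEFAULTS_BY_VERSION = {
    "old": {"threshold": 0.5},
    "new": {"threshold": 0.4},
}


def resolve_params(saved, current_version, policy):
    """Return the effective params for a model saved without its defaults.

    saved:  {"version": <release that wrote the model>,
             "explicit": <params the user set explicitly>}
    policy: "current" applies the running release's defaults;
            "saved" applies the defaults of the release that wrote the model.
    """
    if policy == "current":
        defaults = DEFAULTS_BY_VERSION[current_version]
    elif policy == "saved":
        defaults = DEFAULTS_BY_VERSION[saved["version"]]
    else:
        raise ValueError(f"unknown policy: {policy!r}")
    # Explicitly set params always win over defaults.
    return {**defaults, **saved["explicit"]}
```

Under either policy a param the user set explicitly is unaffected. The policies differ only for unset params, which is where a changed default could alter the behavior of existing user code.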
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/771/ Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87285 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87285/testReport)** for PR 20566 at commit [`3b5e7c6`](https://github.com/apache/spark/commit/3b5e7c64742f7eeaf2fe9d3cb95bbbcef1f15abc).
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20566 @mgaido91 I also considered the issue of default values changing across versions. I'm not sure which is more reasonable: using the old version's default value or using the current version's default value.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87283/ Test FAILed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87283 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87283/testReport)** for PR 20566 at commit [`7785cac`](https://github.com/apache/spark/commit/7785cacee8dd4a6e9938c3c99dad3ad3117655d3). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/769/ Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87283 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87283/testReport)** for PR 20566 at commit [`7785cac`](https://github.com/apache/spark/commit/7785cacee8dd4a6e9938c3c99dad3ad3117655d3).