[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-12 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20566 I'd close this and favor the quick fix #20594 based on the discussion in JIRA. Will re-open it if it is needed later.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-12 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20566 @jkbradley Thanks! I will post the problem and proposed design on the JIRA.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-12 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20566 Thanks for the patch, @viirya. As always, I'll request that we put design decisions & long discussions in JIRA so that they are easier to uncover. It can also be good to get quick feedback

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20566 cc @MLnick @jkbradley --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test PASSed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87302/ Test PASSed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87302 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87302/testReport)** for PR 20566 at commit

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87302 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87302/testReport)** for PR 20566 at commit

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/788/

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test PASSed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20566 retest this please.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test FAILed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87301/ Test FAILed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87301 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87301/testReport)** for PR 20566 at commit

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87301/testReport)** for PR 20566 at commit

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test PASSed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/787/

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87299/ Test FAILed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test FAILed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87299 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87299/testReport)** for PR 20566 at commit

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87299 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87299/testReport)** for PR 20566 at commit

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/785/

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test PASSed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20566 It is not only `threshold`: the default params of `NaiveBayes` and `LogisticRegression` (maybe more, I'm looking them up now) are all set in the estimator, not in their models. The models receive the default

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread MrBago
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/20566 I believe this will break persistence for LogisticRegression. The issue, I think, is that the `threshold` param on LogisticRegressionModel doesn't get a default directly, but only gets it during the

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87289/ Test FAILed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test FAILed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87289 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87289/testReport)** for PR 20566 at commit

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87289/testReport)** for PR 20566 at commit

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/775/

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test PASSed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test FAILed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87285/ Test FAILed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87285 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87285/testReport)** for PR 20566 at commit

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20566 Yeah, IMHO, when the user loads a model saved by an old version into a new version, it is reasonable to run it with the current default value, because the param is not explicitly set and should use
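
The trade-off discussed in this comment can be illustrated with a small, self-contained sketch. It is not Spark's persistence code: `ToyParams`, `OLD_DEFAULTS`, `NEW_DEFAULTS`, and the changed `threshold` default are all hypothetical names and values invented for illustration. The sketch assumes a save format that stores only explicitly set params, so that defaults are resolved by whichever version loads the model.

```python
import json


class ToyParams:
    """Hypothetical stand-in for a params holder; this is NOT Spark's API."""

    def __init__(self, defaults):
        self._defaults = dict(defaults)  # defaults of the running version
        self._explicit = {}              # values the user set by hand

    def set(self, name, value):
        self._explicit[name] = value
        return self

    def get(self, name):
        # An explicitly set value always wins over the default.
        if name in self._explicit:
            return self._explicit[name]
        return self._defaults[name]

    def save(self):
        # Assumption: only explicitly set values are persisted.
        return json.dumps(self._explicit)

    @classmethod
    def load(cls, payload, defaults):
        # Defaults come from the loading version, not from the saved payload.
        params = cls(defaults)
        params._explicit = json.loads(payload)
        return params


# Made-up defaults for an "old" and a "new" release.
OLD_DEFAULTS = {"threshold": 0.5}
NEW_DEFAULTS = {"threshold": 0.6}

untouched = ToyParams(OLD_DEFAULTS)                    # user never set threshold
tuned = ToyParams(OLD_DEFAULTS).set("threshold", 0.3)  # user set it explicitly

reloaded_untouched = ToyParams.load(untouched.save(), NEW_DEFAULTS)
reloaded_tuned = ToyParams.load(tuned.save(), NEW_DEFAULTS)

print(reloaded_untouched.get("threshold"))  # resolves to the new version's default
print(reloaded_tuned.get("threshold"))      # keeps the user's explicit value
```

Under this assumption, a param the user never set silently follows the new default after an upgrade, while an explicitly set value survives unchanged. Persisting the defaults alongside the explicit values would give the opposite behavior, which is the other option weighed in the thread.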

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20566 @viirya that's a good question. Honestly, my idea is that if the user doesn't set a value, he/she doesn't care about it, so IMHO it is fine to use the new version's default. But it is also true that

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/771/

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87285 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87285/testReport)** for PR 20566 at commit

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test PASSed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20566 @mgaido91 I also considered the issue of default values changing across versions. I'm not sure which is more reasonable: using the old version's default value or the current version's default value.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test FAILed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87283/ Test FAILed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87283 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87283/testReport)** for PR 20566 at commit

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/769/

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20566 Merged build finished. Test PASSed.

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20566 **[Test build #87283 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87283/testReport)** for PR 20566 at commit