[GitHub] spark issue #16011: [SPARK-18587][ML] Remove handleInvalid from QuantileDisc...

2016-11-28 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16011 That's OK. I will improve the docs in another 2.1 QA PR and close this one. Thanks for all your clarifications. @MLnick @jkbradley --- If your project is set up for it, you can reply to this

[GitHub] spark issue #16011: [SPARK-18587][ML] Remove handleInvalid from QuantileDisc...

2016-11-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16011 I agree with @MLnick that we should not make this change. We need users to be able to set all Params in Estimators, including the Params of the Models they produce. If this is confusing for

[GitHub] spark issue #16011: [SPARK-18587][ML] Remove handleInvalid from QuantileDisc...

2016-11-28 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16011 When I did QA work for 2.1, I found the parameter `handleInvalid` confusing. For example, the default behavior of `QuantileDiscretizer` when handling invalid values (i.e. NaN) is

[GitHub] spark issue #16011: [SPARK-18587][ML] Remove handleInvalid from QuantileDisc...

2016-11-27 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/16011 As far as I recall, the idea is that the `Bucketizer` can be used standalone, and because the `QuantileDiscretizer` itself produced the same thing as a bucketizer, it was used as the model rather
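The design described above (a standalone bucketizer that also serves as the discretizer's fitted model) can be illustrated with a toy sketch in plain Python. This is not Spark's API: `ToyBucketizer` and `ToyQuantileDiscretizer` are hypothetical names, and the sketch assumes exact quantiles over the sorted data, whereas Spark computes approximate quantiles.

```python
import bisect
import math


class ToyBucketizer:
    """Hypothetical stand-in for a bucketizer, not Spark's class.

    A value x lands in bucket i when splits[i] <= x < splits[i + 1];
    the last bucket also includes its upper bound.
    """

    def __init__(self, splits):
        self.splits = list(splits)

    def transform(self, values):
        buckets = []
        for v in values:
            if math.isnan(v) or v < self.splits[0] or v > self.splits[-1]:
                raise ValueError(f"value {v!r} cannot be bucketized")
            if v == self.splits[-1]:
                buckets.append(len(self.splits) - 2)
            else:
                buckets.append(bisect.bisect_right(self.splits, v) - 1)
        return buckets


class ToyQuantileDiscretizer:
    """Hypothetical estimator: fit() learns splits and returns a ToyBucketizer as its model."""

    def __init__(self, num_buckets):
        self.num_buckets = num_buckets

    def fit(self, values):
        # NaN values are ignored when learning the splits.
        clean = sorted(v for v in values if not math.isnan(v))
        if not clean:
            raise ValueError("need at least one non-NaN value")
        n = len(clean)
        inner = [clean[int(i / self.num_buckets * (n - 1))]
                 for i in range(1, self.num_buckets)]
        # Open-ended outer splits so any finite value falls in some bucket.
        splits = sorted(set([-math.inf] + inner + [math.inf]))
        return ToyBucketizer(splits)


# Fitted use: the estimator's model is just a bucketizer.
model = ToyQuantileDiscretizer(num_buckets=4).fit([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
print(model.splits)                           # [-inf, 2.0, 4.0, 6.0, inf]
print(model.transform([1.0, 2.0, 5.0, 8.0]))  # [0, 1, 2, 3]

# Standalone use: the same class with hand-picked splits.
manual = ToyBucketizer([-math.inf, 0.0, 10.0, math.inf])
print(manual.transform([-5.0, 3.0, 20.0]))    # [0, 1, 2]
```

Because the fitted model and the standalone transformer are the same type, any param that controls transform-time behavior naturally lives on that shared type.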

[GitHub] spark issue #16011: [SPARK-18587][ML] Remove handleInvalid from QuantileDisc...

2016-11-27 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16011 @MLnick Yeah, I think copying Params from estimators to models is the most common case. However, I also found some algorithms that do not comply with this rule, such as `ALS`, which has

[GitHub] spark issue #16011: [SPARK-18587][ML] Remove handleInvalid from QuantileDisc...

2016-11-25 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/16011 Typically the estimator Params are copied to the model, though. How do you propose to set the `handleInvalid` param in, say, a pipeline? On Fri, 25 Nov 2016 at 18:38, Yanbo Liang
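The pipeline question can be made concrete with a toy sketch in plain Python. All class and method names here are hypothetical and this is not Spark's API. The assumption being illustrated: inside a pipeline the user configures only the estimator before `fit`, so a model-side setting such as `handleInvalid` has to be settable on the estimator and copied onto the model it produces.

```python
import math


class ToyBucketizerModel:
    """Hypothetical model: holds learned splits plus a copied handle_invalid param."""

    def __init__(self, splits, handle_invalid):
        self.splits = splits
        self.handle_invalid = handle_invalid


class ToyDiscretizer:
    """Hypothetical estimator that also exposes the model's handle_invalid param."""

    VALID_MODES = ("error", "skip", "keep")

    def __init__(self):
        self.handle_invalid = "error"  # default, mirrored onto the model at fit time

    def set_handle_invalid(self, mode):
        if mode not in self.VALID_MODES:
            raise ValueError(f"unsupported mode: {mode!r}")
        self.handle_invalid = mode
        return self  # return self so setters can be chained

    def fit(self, values):
        clean = sorted(v for v in values if not math.isnan(v))
        if not clean:
            raise ValueError("need at least one non-NaN value")
        median = clean[len(clean) // 2]
        # The estimator's param is copied onto the model it produces.
        return ToyBucketizerModel([-math.inf, median, math.inf], self.handle_invalid)


class ToyPipeline:
    """Hypothetical pipeline: fits each stage; the user only configures the estimators."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, values):
        return [stage.fit(values) for stage in self.stages]


models = ToyPipeline([ToyDiscretizer().set_handle_invalid("skip")]).fit([1.0, 2.0, 3.0, math.nan])
print(models[0].handle_invalid)  # skip
print(models[0].splits)          # [-inf, 2.0, inf]
```

If the estimator did not carry the param, the only way to change it would be to modify the fitted model after the pipeline has already run.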

[GitHub] spark issue #16011: [SPARK-18587][ML] Remove handleInvalid from QuantileDisc...

2016-11-25 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16011 @MLnick Your description is totally correct. However, the `model` used in your example is of type `Bucketizer`. I will keep `handleInvalid` in `Bucketizer`. In the current ML

[GitHub] spark issue #16011: [SPARK-18587][ML] Remove handleInvalid from QuantileDisc...

2016-11-25 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/16011 I don't think this is correct. The idea is `QuantileDiscretizer` skips NaN when creating the buckets, but there will still be an error thrown during `transform` (rather than `fit`) if NaNs are in
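The fit/transform distinction described above can be sketched as a toy in plain Python. The function names are hypothetical and this is not Spark's implementation: learning splits silently drops NaN, while transforming a NaN fails under the default "error" mode. The "skip" (drop the row) and "keep" (put NaN in one extra bucket) modes are assumed to correspond to the other `handleInvalid` options; the exact bucket index used for "keep" here is an assumption.

```python
import bisect
import math

MODES = ("error", "skip", "keep")


def toy_fit_splits(values, num_buckets):
    """Learn split points from exact quantiles, ignoring NaN (toy version)."""
    clean = sorted(v for v in values if not math.isnan(v))
    if not clean:
        raise ValueError("need at least one non-NaN value")
    n = len(clean)
    inner = [clean[int(i / num_buckets * (n - 1))] for i in range(1, num_buckets)]
    return sorted(set([-math.inf] + inner + [math.inf]))


def toy_transform(values, splits, handle_invalid="error"):
    """Bucketize values; NaN handling depends on handle_invalid."""
    if handle_invalid not in MODES:
        raise ValueError(f"unsupported mode: {handle_invalid!r}")
    buckets = []
    for v in values:
        if math.isnan(v):
            if handle_invalid == "error":
                raise ValueError("NaN found during transform")
            if handle_invalid == "skip":
                continue  # drop this row entirely
            buckets.append(len(splits) - 1)  # "keep": one extra bucket past the regular ones
            continue
        if v == splits[-1]:
            buckets.append(len(splits) - 2)  # last bucket includes its upper bound
        else:
            buckets.append(bisect.bisect_right(splits, v) - 1)
    return buckets


data = [1.0, 2.0, math.nan, 3.0, 4.0]
splits = toy_fit_splits(data, num_buckets=2)  # fit succeeds: NaN is skipped
print(splits)                                 # [-inf, 2.0, inf]
print(toy_transform(data, splits, "skip"))    # [0, 1, 1, 1]
print(toy_transform(data, splits, "keep"))    # [0, 1, 2, 1, 1]
try:
    toy_transform(data, splits)               # default "error" mode
except ValueError as err:
    print(err)                                # NaN found during transform
```

The asymmetry is the point under discussion: the NaN never causes a problem during fitting, so the setting that decides its fate only matters at transform time, on whatever object performs the bucketing.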

[GitHub] spark issue #16011: [SPARK-18587][ML] Remove handleInvalid from QuantileDisc...

2016-11-25 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16011 cc @jkbradley @srowen @VinceShieh @sethah

[GitHub] spark issue #16011: [SPARK-18587][ML] Remove handleInvalid from QuantileDisc...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16011 Merged build finished. Test PASSed.

[GitHub] spark issue #16011: [SPARK-18587][ML] Remove handleInvalid from QuantileDisc...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16011 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69155/

[GitHub] spark issue #16011: [SPARK-18587][ML] Remove handleInvalid from QuantileDisc...

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16011 **[Test build #69155 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69155/consoleFull)** for PR 16011 at commit

[GitHub] spark issue #16011: [SPARK-18587][ML] Remove handleInvalid from QuantileDisc...

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16011 **[Test build #69155 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69155/consoleFull)** for PR 16011 at commit