Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/16011
That's OK. I will improve the docs in another 2.1 QA PR and close this one.
Thanks for all your clarifications. @MLnick @jkbradley
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well.
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/16011
I agree with @MLnick that we should not make this change. We need users to
be able to set all Params in Estimators, including the Params of the Models
they produce. If this is confusing for
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/16011
When I did QA work for 2.1, I found the parameter ```handleInvalid```
disorienting. For example, the default behavior of ```QuantileDiscretizer``` when
handling invalid values (i.e. NaN) is
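The behaviors under discussion can be sketched with a toy bucketizer. The option names below follow the ```error```/```skip```/```keep``` choices for this param; the implementation itself is illustrative only, not Spark's.

```python
import math

def bucketize(values, splits, handle_invalid="error"):
    """Toy sketch of handleInvalid semantics for a bucketizer.
    Not Spark's implementation; option names are illustrative."""
    out = []
    for v in values:
        if math.isnan(v):
            if handle_invalid == "error":
                # default: invalid values are an error
                raise ValueError("NaN encountered during transform")
            elif handle_invalid == "skip":
                continue  # drop the row
            else:  # "keep": put invalid values in an extra bucket
                out.append(len(splits) - 1)
                continue
        # find the bucket whose [splits[i], splits[i+1]) range contains v
        for i in range(len(splits) - 1):
            if splits[i] <= v < splits[i + 1]:
                out.append(i)
                break
    return out

splits = [float("-inf"), 0.0, 10.0, float("inf")]
print(bucketize([-1.0, 5.0], splits))                   # [0, 1]
print(bucketize([-1.0, float("nan")], splits, "skip"))  # [0]
print(bucketize([float("nan")], splits, "keep"))        # [3]
```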
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/16011
As far as I recall, the idea is that the `Bucketizer` can be used
standalone, and because the `QuantileDiscretizer` itself produced the same
thing as a bucketizer, it was used as the model rather
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/16011
@MLnick Yeah, I think this is the most common case of copying Params from
estimators to models. However, I also found some algorithms that do not comply
with this rule, such as ```ALS```, which has
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/16011
Typically the estimator Params are copied to the model though. How do you
propose to set the handle invalid param in say a pipeline?
On Fri, 25 Nov 2016 at 18:38, Yanbo Liang
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/16011
@MLnick Your description is totally correct. However, the ```model``` you
used in your example is of type ```Bucketizer```. I will keep
```handleInvalid``` in ```Bucketizer```. In the current ML
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/16011
I don't think this is correct. The idea is `QuantileDiscretizer` skips NaN
when creating the buckets, but there will still be an error thrown during
`transform` (rather than `fit`) if NaNs are in
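The fit-versus-transform distinction described here can be sketched as follows: computing the splits ignores NaN, but applying them to data that still contains NaN raises by default. Both function names and the quantile computation below are illustrative only, not Spark's approximate-quantile implementation.

```python
import math

def fit_quantile_splits(values, num_buckets):
    """Compute toy bucket boundaries, skipping NaN -- mirroring the
    behavior described above, where NaN does not affect the buckets
    computed at fit() time. Illustrative sketch only."""
    clean = sorted(v for v in values if not math.isnan(v))
    step = len(clean) / num_buckets
    inner = [clean[int(i * step)] for i in range(1, num_buckets)]
    return [float("-inf")] + inner + [float("inf")]

def transform(values, splits):
    """Apply the splits; a NaN in the input is an error here,
    even though fit() silently skipped it."""
    out = []
    for v in values:
        if math.isnan(v):
            raise ValueError("NaN seen during transform")
        out.append(sum(1 for s in splits[1:-1] if v >= s))
    return out

data = [1.0, 2.0, float("nan"), 3.0, 4.0]
splits = fit_quantile_splits(data, num_buckets=2)  # NaN ignored here
print(transform([1.5, 3.5], splits))               # [0, 1]
# transform(data, splits) would raise ValueError because of the NaN
```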
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/16011
cc @jkbradley @srowen @VinceShieh @sethah
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16011
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69155/
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16011
**[Test build #69155 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69155/consoleFull)**
for PR 16011 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16011
**[Test build #69155 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69155/consoleFull)**
for PR 16011 at commit