Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/16011
As far as I recall, the idea is that the `Bucketizer` can be used
standalone, and because fitting a `QuantileDiscretizer` produces the same
thing a `Bucketizer` holds (a set of splits), `Bucketizer` was used as the
model rather than having a dedicated `QuantileDiscretizerModel`.
`Bucketizer` is already a separate transformer (it is not required to be
produced by a `QuantileDiscretizer`): it's a `Model` and its constructor
is public (by design). So it can be used in a pipeline by itself, and the
`splits` param could be selected via cross-validation (for example).
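
To make the design point concrete, here is a minimal plain-Python sketch of the relationship. It is not Spark's implementation or API: `SimpleBucketizer` and `SimpleQuantileDiscretizer` are hypothetical stand-ins, and the quantile computation is a naive exact one chosen for illustration only. The only thing it is meant to show is that the estimator's `fit` just computes splits and returns an ordinary bucketizer, which can equally be constructed by hand.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SimpleBucketizer:
    """Hypothetical stand-in for a bucketizer: maps values to bucket indices."""

    splits: List[float]  # strictly increasing bucket boundaries

    def transform(self, values: Sequence[float]) -> List[int]:
        last = len(self.splits) - 2  # index of the final bucket
        out = []
        for v in values:
            if not (self.splits[0] <= v <= self.splits[-1]):
                raise ValueError(f"{v} is outside the splits range")
            # Bucket i covers [splits[i], splits[i+1]); the final bucket
            # also includes its upper bound.
            out.append(min(bisect_right(self.splits, v) - 1, last))
        return out


class SimpleQuantileDiscretizer:
    """Hypothetical estimator: fit() computes splits, returns a bucketizer."""

    def __init__(self, num_buckets: int):
        self.num_buckets = num_buckets

    def fit(self, values: Sequence[float]) -> SimpleBucketizer:
        ordered = sorted(values)
        n = len(ordered)
        # Naive exact quantiles (an assumption made for this sketch).
        inner = [ordered[(i * n) // self.num_buckets]
                 for i in range(1, self.num_buckets)]
        splits = [float("-inf")] + sorted(set(inner)) + [float("inf")]
        return SimpleBucketizer(splits)


# Produced by the estimator...
fitted = SimpleQuantileDiscretizer(num_buckets=4).fit(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
# ...or built directly, with splits chosen by hand or by a search.
manual = SimpleBucketizer([0.0, 10.0, 20.0])
```

Both `fitted` and `manual` are the same kind of object, so anything that accepts one accepts the other.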
What you propose here makes using `QuantileDiscretizer` and a non-default
`handleInvalid` param together with cross-validation impossible. In addition,
as you've pointed out in your code example above, this would force a pretty
clunky "workaround" to set the `handleInvalid` param in a pipeline.
Why do this? What is the actual problem with what exists currently? To me
it seems better the way it is. Also, I don't see any major benefit to adding a
new `QuantileDiscretizerModel`.