Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/16011
@MLnick Yeah, I think the most common case is copying Params from
estimators to models. However, I also found that some algorithms do not follow
this rule, such as ```ALS```, which has ```ALSParams``` and ```ALSModelParams```
for the estimator and the model respectively.
I think we can set params on models without going through the estimator, for example:
```
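import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{Bucketizer, QuantileDiscretizer}

// Assumes df is an existing DataFrame; input/output columns are omitted for brevity.
val discretizer = new QuantileDiscretizer()
val pipeline = new Pipeline().setStages(Array(discretizer))
val model = pipeline.fit(df)
// QuantileDiscretizer.fit returns a Bucketizer, so the fitted stage can be cast and configured.
model.stages(0).asInstanceOf[Bucketizer].setHandleInvalid("skip")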
```
I know this approach is a little tricky. A better way may be to have a
```QuantileDiscretizerModel``` produced by ```QuantileDiscretizer```.
Thinking more about it, though, ```Bucketizer``` is a separate transformer
with two main params (```splits``` and ```handleInvalid```) that can be set.
Users can provide candidates for these two params when doing cross validation
to select the best model. But if we require that it be produced by
```QuantileDiscretizer```, the ```splits``` would be a member variable of the
model rather than a param. From this perspective, it makes more sense to treat
```Bucketizer``` as a separate transformer.
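To illustrate the cross-validation point, here is a minimal sketch of a param grid over a standalone ```Bucketizer```. It assumes a Spark version in which ```Bucketizer``` exposes ```handleInvalid```; the column names and split points are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.ml.feature.Bucketizer
import org.apache.spark.ml.tuning.ParamGridBuilder

// Hypothetical column names, used only for illustration.
val bucketizer = new Bucketizer()
  .setInputCol("hour")
  .setOutputCol("hourBucket")

// Hypothetical candidate split points; each array is one candidate value for splits.
val candidateSplits: Seq[Array[Double]] = Seq(
  Array(Double.NegativeInfinity, 6.0, 12.0, 18.0, Double.PositiveInfinity),
  Array(Double.NegativeInfinity, 12.0, Double.PositiveInfinity)
)

// Because splits and handleInvalid are Params, both can go into the grid.
val paramGrid = new ParamGridBuilder()
  .addGrid(bucketizer.splits, candidateSplits)
  .addGrid(bucketizer.handleInvalid, Seq("skip", "keep"))
  .build()
```

A grid like this could be passed to ```CrossValidator.setEstimatorParamMaps``` together with a pipeline that contains the ```Bucketizer``` and a downstream estimator. This would not be possible if ```splits``` were a plain member variable of a model.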