GitHub user davireis commented on the issue:
https://github.com/apache/spark/pull/12614
Just weighing in with the motivations:
https://0xdata.atlassian.net/browse/SW-224
http://apache-spark-developers-list.1001551.n3.nabble.com/spark-ml-Why-is-private-class-ColumnPruner-td16863.html
And my own use case: I have a DataFrame with two text columns, and I want to
apply an LDAModel to each of them. The model was trained on a different
dataset, and although I can reset its input column (setFeaturesCol), I cannot
reset its output column (there is no setTopicDistributionCol on the trained
model). Since both applications of the LDAModel write to the same output column
name, my pipeline fails. If I had ColumnPruner, I could combine it with
SQLTransformer to rename the output column. Alternatively, LDAModel itself
could be fixed, or I could build a WithColumnRenamedTransformer. But I believe
ColumnPruner would suffice as a primitive for many use cases, since most other
simple schema manipulations can be achieved with SQLTransformer. Maybe I am
missing some existing alternative, but as far as I understand, the only way to
achieve what I want right now is a custom transformer.
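
To make the collision concrete, here is a minimal sketch in plain Python rather
than Spark. Everything in it is a hypothetical stand-in: FixedOutputModel plays
the role of a trained model with a settable input column and a fixed output
column, RenameColumn plays the role of a copy-then-prune stage, and a "frame" is
just a dict of column name to values. The stand-in model is assumed to reject an
output column that already exists. The sketch only illustrates the shape of the
problem and of the rename workaround, not how Spark implements either.

```python
# Plain-Python stand-ins (hypothetical, not Spark classes) for the use case above.


class FixedOutputModel:
    """Mimics a trained model whose input column can be set but whose
    output column name is fixed."""

    OUTPUT_COL = "topicDistribution"

    def __init__(self, input_col):
        self.input_col = input_col

    def transform(self, frame):
        if self.OUTPUT_COL in frame:
            # A second application collides with the first one's output.
            raise ValueError(f"Column {self.OUTPUT_COL} already exists")
        out = dict(frame)
        # Placeholder "scoring": text length instead of a topic distribution.
        out[self.OUTPUT_COL] = [len(text) for text in frame[self.input_col]]
        return out


class RenameColumn:
    """Hypothetical rename stage: copy `old` to `new`, then prune `old`."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def transform(self, frame):
        out = {name: values for name, values in frame.items() if name != self.old}
        out[self.new] = frame[self.old]
        return out


def run_pipeline(stages, frame):
    """Apply each stage in order, feeding its output to the next stage."""
    for stage in stages:
        frame = stage.transform(frame)
    return frame


frame = {"title": ["a b", "c"], "body": ["d e f g", "h i"]}

# Two applications of the same model: the second one finds its output
# column already taken and fails.
colliding = [FixedOutputModel("title"), FixedOutputModel("body")]

# Renaming after each application frees the fixed output column name.
fixed = [
    FixedOutputModel("title"),
    RenameColumn("topicDistribution", "titleTopics"),
    FixedOutputModel("body"),
    RenameColumn("topicDistribution", "bodyTopics"),
]

result = run_pipeline(fixed, frame)
```

Running the `colliding` stages raises on the second model, while the `fixed`
stages leave one renamed output column per input column. The rename step is
exactly a copy followed by a prune, which is why a pruning primitive alongside
a SQL-style copy would be enough here.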