[GitHub] spark issue #12614: [SPARK-14730][ML] Expose ColumnPruner as feature transfo...

davireis Fri, 28 Apr 2017 04:06:07 -0700

Github user davireis commented on the issue:

    https://github.com/apache/spark/pull/12614
  
    Just weighing in on the motivations: 
    
    https://0xdata.atlassian.net/browse/SW-224
    
http://apache-spark-developers-list.1001551.n3.nabble.com/spark-ml-Why-is-private-class-ColumnPruner-td16863.html
    
    And my own use case: I have a dataframe with two textual columns, and I 
want to apply an LDAModel to each of them. The model was trained on a 
different dataset, and although I can reset its input column 
(setFeaturesCol), I cannot reset its output column (the trained model has no 
setTopicDistributionCol). Since both applications of LDAModel write to the 
same output column name, my pipeline barfs. If I had ColumnPruner, I could 
just combine it with SQLTransformer to rename the output column. 
Alternatively, LDAModel itself could be fixed, or I could build a 
WithColumnRenamedTransformer. But I believe ColumnPruner would suffice as a 
primitive for many use cases, since most other simple schema manipulations 
can be achieved with SQLTransformer. Maybe I am missing some alternatives 
that are already in place, but from what I understand, I can currently only 
achieve what I want with a custom transformer.
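
    For illustration, the custom-transformer route could look roughly like the 
sketch below. It assumes the Spark 2.x ML API. The class name comes from the 
comment above, but the parameter names (existingName, newName) are hypothetical 
and not part of Spark.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.{Param, ParamMap}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.types._

// Hypothetical helper, not part of Spark: renames one column so that a later
// stage can write to the original column name again.
class WithColumnRenamedTransformer(override val uid: String) extends Transformer {

  def this() = this(Identifiable.randomUID("withColumnRenamed"))

  // Param names are assumptions made for this sketch.
  val existingName = new Param[String](this, "existingName", "column to rename")
  val newName = new Param[String](this, "newName", "name to give the column")

  def setExistingName(value: String): this.type = set(existingName, value)
  def setNewName(value: String): this.type = set(newName, value)

  override def transform(dataset: Dataset[_]): DataFrame = {
    // Validate the schema first so a bad configuration fails early.
    transformSchema(dataset.schema)
    dataset.withColumnRenamed($(existingName), $(newName))
  }

  override def transformSchema(schema: StructType): StructType = {
    require(schema.fieldNames.contains($(existingName)),
      "Input column " + $(existingName) + " does not exist.")
    require(!schema.fieldNames.contains($(newName)),
      "Output column " + $(newName) + " already exists.")
    // Keep every field as-is except the renamed one.
    StructType(schema.fields.map { f =>
      if (f.name == $(existingName)) f.copy(name = $(newName)) else f
    })
  }

  override def copy(extra: ParamMap): WithColumnRenamedTransformer = defaultCopy(extra)
}
```

    Placed between the two LDAModel stages, with existingName set to the 
model's topic distribution column, such a stage would free that column name 
before the second application runs.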



