GitHub user aseigneurin commented on the pull request:

    https://github.com/apache/spark/pull/8152#issuecomment-132155133
  
    I like the fact that `IndexToString` is a standard `Transformer`. However, I 
don't see how to use it in the same `Pipeline` as a `StringIndexer`.
    
    I would like to have a pipeline with the following stages:
    
    - `StringIndexer` to transform the "label" column ("y"/"n" values) to a 
"label_indexed" column (0/1)
    - `RandomForestClassifier` using the "label_indexed" column and producing a 
"prediction" column (0/1)
    - `IndexToString` converting the "prediction" column (0/1) to a 
"prediction_label" column (back to "y"/"n" values)
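
    A minimal sketch of the three stages listed above, for illustration only. It assumes a "features" vector column, which the comment does not mention, and the object name is hypothetical. It only assembles the pipeline; the commented-out `setLabels` call marks where the difficulty arises.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.feature.{IndexToString, StringIndexer}

// Hypothetical wrapper object; only the column names come from the comment.
object LabelRoundTripSketch {
  // Stage 1: "label" ("y"/"n") -> "label_indexed" (0/1)
  val indexer = new StringIndexer()
    .setInputCol("label")
    .setOutputCol("label_indexed")

  // Stage 2: train on "label_indexed", write "prediction" (0/1).
  // The "features" column name is an assumption.
  val forest = new RandomForestClassifier()
    .setLabelCol("label_indexed")
    .setFeaturesCol("features")
    .setPredictionCol("prediction")

  // Stage 3: "prediction" (0/1) -> "prediction_label" ("y"/"n")
  val converter = new IndexToString()
    .setInputCol("prediction")
    .setOutputCol("prediction_label")
  // .setLabels(...) would need the labels that stage 1 learns when it is
  // fitted; they are not known yet while the pipeline is being assembled.

  // Building the pipeline needs no data; nothing is fitted here.
  val pipeline = new Pipeline()
    .setStages(Array[PipelineStage](indexer, forest, converter))
}
```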
    
    However, `IndexToString` has to be given the labels from the 
`StringIndexer`, and those only exist once the indexer has been fitted, so I'm 
unable to do that.
    
    I have seen that, if I don't call `IndexToString.setLabels`, the input 
column's metadata is used to get the labels but, in my case, the "prediction" 
column doesn't have metadata.
    
    Any idea how to proceed without having to move the `StringIndexer` and 
`IndexToString` transformers out of the pipeline?

