[GitHub] spark pull request: [SPARK-5886][ML] Add label indexer

jkbradley Tue, 24 Feb 2015 15:12:48 -0800

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/4735#issuecomment-75869252
  
    Is LabelIndexer going to be different from FeatureIndexer?  We will need a 
transformer to index features as well.  I've been planning to revive my old PR 
for DatasetIndexer which was headed this way: 
[https://github.com/apache/spark/pull/3000], but it would be good not to 
duplicate efforts.
    
    If you want to merge the two, that would be great, or I could push an update. 
 The main thing I liked about DatasetIndexer is that it could be used to choose 
which features to treat as continuous vs. categorical based on a maxCategories 
threshold.  Choosing automatically spares users from having to hand-pick 
columns as categorical vs. continuous.
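
    To make the maxCategories idea concrete, here is a minimal sketch in plain 
Python. It is not the DatasetIndexer code from the PR and does not use Spark's 
API; the function name `split_features`, the rule "categorical if the number of 
distinct values is at most max_categories", and the sample rows are all 
assumptions for illustration.

```python
def split_features(rows, max_categories):
    """Decide per column whether to treat it as categorical or continuous.

    Assumed rule (hypothetical, for illustration): a column is categorical
    when it has at most `max_categories` distinct values.
    Returns (categorical, continuous), where `categorical` maps a column
    index to a {value: category index} dict and `continuous` is a list of
    column indices.
    """
    if not rows:
        return {}, []

    num_features = len(rows[0])
    distinct = [set() for _ in range(num_features)]

    for row in rows:
        for j, value in enumerate(row):
            # Stop collecting once a column has exceeded the threshold,
            # so memory stays bounded for truly continuous columns.
            if len(distinct[j]) <= max_categories:
                distinct[j].add(value)

    categorical = {}
    continuous = []
    for j, values in enumerate(distinct):
        if len(values) <= max_categories:
            # Index categories in sorted order for a deterministic mapping.
            categorical[j] = {v: i for i, v in enumerate(sorted(values))}
        else:
            continuous.append(j)
    return categorical, continuous


rows = [
    [0.0, 1.5, 10.0],
    [1.0, 2.7, 20.0],
    [0.0, 3.1, 10.0],
    [1.0, 4.9, 30.0],
]
categorical, continuous = split_features(rows, max_categories=3)
```

    With max_categories=3, columns 0 and 2 have few enough distinct values to be 
indexed as categorical, while column 1 is left as continuous. A distributed 
version would need to merge the per-partition value sets, which this sketch 
does not attempt.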




