[GitHub] spark pull request: SPARK-4588 [MLLIB] [WIP] Add API for feature a...

jkbradley Fri, 20 Feb 2015 19:03:06 -0800

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/4460#issuecomment-75352674
  
    +1 for the feature assembler or some other algorithm handling munging and 
indexing as needed.
    * Note that the behavior of the assembler may depend on the algorithm being 
used.  E.g., an assembler may want to use 1-hot encoding for Strings for linear 
regression, but use simple indexing for trees.  That makes it awkward for the 
user, and we may eventually want each algorithm to handle its own feature 
assembly if needed.
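
    To make the point above concrete, here is a minimal sketch in plain Python (not Spark code) of an assembler whose output depends on the algorithm. The names `build_index`, `encode_one_hot`, `encode_indexed`, and `assemble`, and the `"linear"`/`"tree"` labels, are hypothetical and are not part of the PR's API.

```python
from typing import Dict, List


def build_index(values: List[str]) -> Dict[str, int]:
    """Map each distinct string to an index, in order of first appearance."""
    index: Dict[str, int] = {}
    for v in values:
        if v not in index:
            index[v] = len(index)
    return index


def encode_one_hot(values: List[str], index: Dict[str, int]) -> List[List[float]]:
    """One column per category; exactly one 1.0 per row."""
    rows = []
    for v in values:
        row = [0.0] * len(index)
        row[index[v]] = 1.0
        rows.append(row)
    return rows


def encode_indexed(values: List[str], index: Dict[str, int]) -> List[List[float]]:
    """A single column holding the category index."""
    return [[float(index[v])] for v in values]


def assemble(values: List[str], algorithm: str) -> List[List[float]]:
    """Hypothetical assembler: the encoding is chosen per algorithm."""
    index = build_index(values)
    if algorithm == "linear":
        return encode_one_hot(values, index)
    if algorithm == "tree":
        return encode_indexed(values, index)
    raise ValueError(f"unknown algorithm: {algorithm}")


colors = ["red", "green", "red"]
linear_features = assemble(colors, "linear")
tree_features = assemble(colors, "tree")
```

    The same input column produces a different number of feature columns depending on the algorithm, which is the awkwardness for the user described above.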
    
    About categorical types for decision trees: There should ideally be a 
distinction between categorical types with arbitrary values and categorical 
types known to be in a range {0, 1, ..., numCategories-1}.
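
    One way to picture that distinction is the rough sketch below, again in plain Python. `RangeCategorical`, `ArbitraryCategorical`, and `to_indices` are hypothetical names chosen for illustration, not existing Spark types. It assumes range-bounded values can be used directly as category indices, while arbitrary values need an index built from the data first.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Union


@dataclass(frozen=True)
class RangeCategorical:
    """Values are known to lie in {0, 1, ..., num_categories - 1}."""
    num_categories: int


@dataclass(frozen=True)
class ArbitraryCategorical:
    """Values are arbitrary; an index must be built from the data."""


def to_indices(
    values: List[Hashable],
    kind: Union[RangeCategorical, ArbitraryCategorical],
) -> List[int]:
    if isinstance(kind, RangeCategorical):
        # No re-indexing needed: only validate that values are in range.
        for v in values:
            if not (isinstance(v, int) and 0 <= v < kind.num_categories):
                raise ValueError(f"value {v!r} outside 0..{kind.num_categories - 1}")
        return list(values)
    # Arbitrary values: assign indices in order of first appearance.
    index: Dict[Hashable, int] = {}
    out = []
    for v in values:
        if v not in index:
            index[v] = len(index)
        out.append(index[v])
    return out


in_range = to_indices([2, 0, 1], RangeCategorical(num_categories=3))
arbitrary = to_indices([10, 42, 10], ArbitraryCategorical())
```

    In this sketch the range-bounded case keeps the original values, so the caller's category numbering is preserved, while the arbitrary case has to invent a numbering.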



