[GitHub] spark pull request: SPARK-4588 [MLLIB] [WIP] Add API for feature a...

mengxr Mon, 16 Feb 2015 22:39:13 -0800

Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/4460#issuecomment-74623603
  
    There are two types of `Attribute(s)`: one describes a feature group (a 
vector column) and the other describes a single feature (a scalar column). For 
a feature group, the column name becomes the group name, and individual 
features inside the group may have their own names. For example, given a vector 
column called `user`, the group can contain features named `age` and `gender`. 
When we merge multiple groups into a single feature vector, e.g., in a feature 
vector assembler, the names are flattened to `user:age` and `user:gender`. This 
answers @sryza's question about one-hot encoding. Assume that the input column 
is a scalar column called `country` with its categories stored in the 
attribute. Then OneHotEncoder will output a vector column and generate feature 
attributes with names like `country:US`, `country:CA`, etc.
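    
    To make the naming rule concrete, here is a minimal sketch in plain Scala. 
The types `Attribute` and `AttributeGroup` and the helpers `flatten` and 
`oneHotNames` are hypothetical names used only for illustration; they are not 
the API in this PR, and the real classes would likely carry more metadata.

```scala
// Illustrative only: hypothetical types, not the classes proposed in the PR.
final case class Attribute(name: String)
final case class AttributeGroup(name: String, attributes: Seq[Attribute])

object AttributeNaming {
  // Merge several feature groups into one flat list of feature names,
  // prefixing each feature with the name of the group (column) it came from.
  def flatten(groups: Seq[AttributeGroup]): Seq[String] =
    groups.flatMap(g => g.attributes.map(a => s"${g.name}:${a.name}"))

  // Names of the features a one-hot encoder would generate for a scalar
  // column, assuming the categories are already stored in its attribute.
  def oneHotNames(column: String, categories: Seq[String]): Seq[String] =
    categories.map(c => s"$column:$c")
}
```

    With these definitions, a `user` group holding `age` and `gender` 
flattens to `user:age` and `user:gender`, and a `country` column with 
categories `US` and `CA` yields `country:US` and `country:CA`.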
    
    +1 on @jkbradley's suggestion about not calling it `FeatureAttribute`. 
`Attribute` should be okay to describe a scalar column, but we also need a name 
to describe a vector column, where `Attributes` may sound a little confusing. 
I suggest `AttributeGroup`.
    
    We don't need to care about the `FeatureType` in `mllib.tree` in this PR. 
Once we have this PR merged, we can migrate the decision tree code.



