[GitHub] spark pull request: SPARK-4588 [MLLIB] [WIP] Add API for feature a...

srowen Tue, 17 Feb 2015 02:40:13 -0800

Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/4460#issuecomment-74648170
  
    So is the idea that `FeatureAttributes` becomes `AttributeGroup`, and that 
it continues to contain many `Attribute`s? I didn't realize that we intended 
vector-valued features to be whole schemas in themselves. In that case they can 
be `AttributeGroup`s as well, which means an `AttributeGroup` is itself an 
`Attribute`. Makes sense.
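
    A minimal sketch of the structure described above, where a group is itself 
an attribute and can therefore nest. All names other than `Attribute` and 
`AttributeGroup` (for example `ScalarAttribute` and `size`) are hypothetical and 
are not taken from the pull request.

```scala
// Hypothetical sketch of "an AttributeGroup is an Attribute too".
// ScalarAttribute and size are illustrative names, not the PR's API.
sealed trait Attribute {
  def name: String
  /** Number of scalar features this attribute covers. */
  def size: Int
}

/** A single scalar feature. */
final case class ScalarAttribute(name: String) extends Attribute {
  val size: Int = 1
}

/** A vector-valued feature: a group of attributes, which may contain groups. */
final case class AttributeGroup(name: String, attributes: Seq[Attribute])
    extends Attribute {
  def size: Int = attributes.map(_.size).sum
}

val geo = AttributeGroup("geo", Seq(ScalarAttribute("lat"), ScalarAttribute("lon")))
val features = AttributeGroup("features", Seq(ScalarAttribute("age"), geo))
```

Because the group extends `Attribute`, code that walks a schema can treat scalar 
and vector-valued features uniformly.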
    
    Rename `FeatureType`? And what's its value for an `AttributeGroup`: `GROUP` 
or `null`?
    
    You could imagine a more elaborate hierarchy of types: _discrete_ is a 
special case of _continuous_, and _ordinal_ is a special case of _discrete_. It's 
nice to have that expressiveness, though it adds some complexity for the 
caller and the code. You could argue that the schema should instead force a 
single interpretation on the algorithm, but I kind of like the hierarchy. The 
type objects would have methods like `isContinuous` and `isCategorical`. Should 
I make a fuller hierarchy or stick to adding `BINARY`?
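
    One way such a hierarchy might look, as a rough sketch only. The 
discrete/continuous and ordinal/discrete relationships and the `isContinuous` 
and `isCategorical` methods come from the comment above. The `parent` field, the 
`isA` and `isDiscrete` helpers, and the placement of `Categorical` and `Binary` 
are assumptions made for illustration.

```scala
// Hypothetical sketch; not the PR's API.
sealed abstract class FeatureType(val name: String, val parent: Option[FeatureType]) {
  /** True if this type is `other` or a (transitive) special case of it. */
  def isA(other: FeatureType): Boolean =
    (this == other) || parent.exists(_.isA(other))

  def isContinuous: Boolean = isA(FeatureType.Continuous)
  def isDiscrete: Boolean = isA(FeatureType.Discrete)
  def isCategorical: Boolean = isA(FeatureType.Categorical)
}

object FeatureType {
  case object Continuous extends FeatureType("continuous", None)
  // Discrete is a special case of continuous.
  case object Discrete extends FeatureType("discrete", Some(Continuous))
  // Ordinal is a special case of discrete.
  case object Ordinal extends FeatureType("ordinal", Some(Discrete))
  // Assumption: the comment does not say where categorical or binary sit.
  case object Categorical extends FeatureType("categorical", None)
  case object Binary extends FeatureType("binary", Some(Categorical))
}
```

With this shape, an algorithm that needs continuous input could check 
`isContinuous` and would accept ordinal features as well, at the cost of the 
extra complexity mentioned above.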



