GitHub user srowen commented on the pull request:
https://github.com/apache/spark/pull/4460#issuecomment-74648170
So is the idea that `FeatureAttributes` becomes `AttributeGroup`, and that
it continues to contain many `Attribute`s? I didn't realize that we intended
the vector-valued features to be whole schemas within themselves. So they may
be `AttributeGroup`s too, which means an `AttributeGroup` is an `Attribute` as well.
Makes sense.
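To make sure I'm reading the nesting right, here is a minimal sketch of the shape I have in mind. It is only an illustration: `AttributeSketch`, `NumericAttribute`, and the `size` helper are hypothetical names for this comment, not the classes in the PR.

```scala
// Hypothetical sketch of the nesting described above; not the PR's actual classes.
object AttributeSketch {
  // Anything that can describe a feature column.
  sealed trait Attribute {
    def name: String
  }

  // A single scalar feature (assumed leaf type, for illustration only).
  final case class NumericAttribute(name: String) extends Attribute

  // A vector-valued feature: it is itself an Attribute and contains child Attributes,
  // so groups can nest inside groups.
  final case class AttributeGroup(name: String, attributes: Seq[Attribute]) extends Attribute {
    // Number of scalar slots, counting nested groups recursively.
    def size: Int = attributes.map {
      case g: AttributeGroup => g.size
      case _ => 1
    }.sum
  }
}
```

With this shape, a group holding one scalar and one two-element group would report three slots in total.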
Should `FeatureType` be renamed? And what's its value for an `AttributeGroup`:
`GROUP` or `null`?
You could imagine a more elaborate hierarchy of types: _discrete_ is a
special case of _continuous_, and _ordinal_ is a special case of _discrete_. It's
nice to have that expressiveness, though it adds some complexity for the
caller and the code. Maybe you could argue that the schema should force an
interpretation for the algorithm. But I kind of like it. The type objects would
have methods like `isContinuous`, `isCategorical`. Should I make a fuller
hierarchy or stick to adding `BINARY`?
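For concreteness, a rough sketch of what the fuller hierarchy could look like. Everything here is an assumption for discussion: the object name `FeatureTypeSketch`, the `isA` helper, and especially where `Categorical` and `Binary` sit, since I only spelled out the continuous/discrete/ordinal chain above.

```scala
// Hypothetical sketch of a fuller type hierarchy; names and placement are assumptions.
object FeatureTypeSketch {
  sealed abstract class FeatureType(val parent: Option[FeatureType]) {
    // True if this type is `other` or a (transitive) special case of it.
    def isA(other: FeatureType): Boolean =
      this == other || parent.exists(_.isA(other))

    def isContinuous: Boolean = isA(Continuous)
    def isCategorical: Boolean = isA(Categorical)
  }

  case object Continuous extends FeatureType(None)
  // Discrete is a special case of continuous.
  case object Discrete extends FeatureType(Some(Continuous))
  // Ordinal is a special case of discrete.
  case object Ordinal extends FeatureType(Some(Discrete))
  // Assumption: categorical is kept as a separate root.
  case object Categorical extends FeatureType(None)
  // Assumption: binary is treated as a special case of categorical.
  case object Binary extends FeatureType(Some(Categorical))
}
```

An algorithm that only handles continuous inputs could then check `isContinuous` and accept `Ordinal` or `Discrete` features too, which is the expressiveness I mean. The cost is that every caller has to think about the chain rather than a flat enum.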