GitHub user srowen commented on the pull request:
https://github.com/apache/spark/pull/4460#issuecomment-75170753
Call it `AttributeType` maybe?
An `AttributeGroup` can contain both `Attribute`s and vector-valued
columns, and the latter sound like `AttributeGroup`s themselves. That's why it
seemed like `AttributeGroup` should be an `Attribute`, or at least share a
common superclass. But then I didn't know what to call the superclass, and it
seemed like overkill. That was the logic behind `AttributeGroup extends Attribute` -- WDYT?
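
To make that concrete, here is a minimal sketch of the `AttributeGroup extends Attribute` arrangement, where a group is itself an attribute and so can nest. The names `ScalarAttribute`, `members`, and `flatten` are made up for illustration and are not taken from the PR:

```scala
// Hypothetical sketch; these are not the actual classes in the PR.
sealed trait Attribute {
  def name: String
}

// A single scalar-valued column.
final case class ScalarAttribute(name: String) extends Attribute

// A vector-valued column: a group that is itself an Attribute, so groups can nest.
final case class AttributeGroup(name: String, members: Seq[Attribute]) extends Attribute {
  // Flatten nested groups into the scalar attributes they contain, in order.
  def flatten: Seq[ScalarAttribute] = members.flatMap {
    case s: ScalarAttribute => Seq(s)
    case g: AttributeGroup  => g.flatten
  }
}
```

With this shape, a group holding one scalar column and one nested vector column flattens to the scalar attributes of both, without needing a separate superclass.
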
As for the hierarchy, that's all I can think of: ordinal extends discrete,
which extends continuous; binary extends, well, discrete and categorical, I suppose.
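
As a rough illustration of that hierarchy, Scala traits can express it directly, including binary inheriting from both discrete and categorical. All names here (including `AttributeType`, borrowed from the suggestion above, and the `isCategorical` helper) are assumptions for the sketch, not an existing API:

```scala
// Hypothetical type hierarchy; names are illustrative only.
sealed trait AttributeType

trait Continuous extends AttributeType
trait Discrete extends Continuous          // discrete extends continuous
trait Ordinal extends Discrete             // ordinal extends discrete
trait Categorical extends AttributeType
trait Binary extends Discrete with Categorical  // binary is both

case object OrdinalType extends Ordinal
case object BinaryType extends Binary

object AttributeType {
  // True when the given type is (a subtype of) Categorical.
  def isCategorical(t: AttributeType): Boolean = t match {
    case _: Categorical => true
    case _              => false
  }

  // True when the given type is (a subtype of) Continuous.
  def isContinuous(t: AttributeType): Boolean = t match {
    case _: Continuous => true
    case _             => false
  }
}
```
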
Hm, I'd imagine most categorical features come in as strings. This feels
like just the kind of thing a framework can accommodate if it has the type
information. I don't think it's any more or less complex to say that a string
column can be categorical. It would take some work to inject a translation to
integers where that's needed, but it would be great if the framework could do that.
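
The translation itself could be as simple as the sketch below, which maps each distinct string to an integer in order of first appearance. `CategoryIndexer` and `indexCategories` are hypothetical names, and how the framework would actually wire this in is left open:

```scala
// Hypothetical helper; not part of the PR or any existing API.
object CategoryIndexer {
  // Returns the string-to-index mapping and the input re-encoded as integers.
  def indexCategories(values: Seq[String]): (Map[String, Int], Seq[Int]) = {
    // distinct keeps first-occurrence order, so the mapping is deterministic.
    val mapping = values.distinct.zipWithIndex.toMap
    (mapping, values.map(mapping))
  }
}
```

Keeping the mapping alongside the encoded values matters because the type information (which integer means which category) is what lets later stages translate back.
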