GitHub user srowen commented on the pull request:
https://github.com/apache/spark/pull/4460#issuecomment-75370475
- (Any support for making Metadata .get methods return `Option`?)
- I created a `FeatureType` hierarchy. It's a little tricky and involves
`trait`s because `Binary` has to have two types, but works.
- I did not yet make a `BinaryAttribute` or `DiscreteAttribute`, pending
views on whether it's worth it. Letting `DiscreteAttribute` share attributes
of both `CategoricalAttribute` and `ContinuousAttribute` would need the same
sort of extra work, such as injecting a `CategoricalTypeAttribute` trait or
something similar. Worth it?
- I did not yet make a `DiscreteAttribute`. Worth it? That one's not hard.
- `CategoricalAttribute` now accepts an optional, explicit cardinality (and
that replaces `numCategories`). I think it should only be defined for
categorical types, no? I added an example in the scaladoc.
- I did not rename `FeatureAttributes` or try to make it extend
`Attribute`, since I understand that vector-valued features do not have
heterogeneous types. That is, a vector-valued feature is simply a many-valued
continuous or categorical or binary column, not a complete sub-schema.
- I went to add `AttributeGroup`, but I can't see what it would cover that
`Attribute`'s dimension doesn't already. The dimension is 1 for a scalar and
>1 for a vector-valued feature. That's the only difference from a metadata
perspective, right?
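To make the `FeatureType`, cardinality and dimension points above concrete, here is a rough, self-contained sketch. It is not the code in this PR. Every name in it is hypothetical, and it assumes, purely for illustration, that `Binary` is the type that belongs to both a categorical and a discrete family.

```scala
object AttributeSketch {
  // Hypothetical hierarchy; names echo the comment, not the PR's actual API.
  sealed trait FeatureType { def name: String }

  // Marker traits so that a single type can belong to two families.
  sealed trait CategoricalType extends FeatureType
  sealed trait DiscreteType extends FeatureType

  case object Continuous extends FeatureType { val name = "continuous" }
  case object Categorical extends CategoricalType { val name = "categorical" }
  case object Discrete extends DiscreteType { val name = "discrete" }

  // Assumption: Binary is both categorical and discrete.
  case object Binary extends CategoricalType with DiscreteType { val name = "binary" }

  // One attribute class: dimension 1 is a scalar, >1 is a vector-valued
  // feature whose elements all share the same feature type.
  final case class Attribute(
      featureType: FeatureType,
      dimension: Int = 1,
      cardinality: Option[Int] = None) {
    require(dimension >= 1, "dimension must be at least 1")
    require(
      cardinality.isEmpty || featureType.isInstanceOf[CategoricalType],
      "cardinality is only defined for categorical types")

    def isVector: Boolean = dimension > 1
  }

  def main(args: Array[String]): Unit = {
    val color = Attribute(Categorical, cardinality = Some(3))
    val flags = Attribute(Binary, dimension = 4, cardinality = Some(2))
    println(s"${color.featureType.name} vector? ${color.isVector}")
    println(s"${flags.featureType.name} vector? ${flags.isVector}")
    // Attribute(Continuous, cardinality = Some(5)) would fail the second require.
  }
}
```

In this sketch a vector-valued feature is just an `Attribute` with `dimension > 1`, which is why a separate `AttributeGroup` looks redundant unless the elements can have heterogeneous types.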
@mengxr Please feel free at any time to take this over if it's easier. At
some point it will be more efficient for you to just finish it as you think
best. I think your vision sounds good, and I'm more concerned with making this
change match the other code you're creating and letting you get on with that.