GitHub user mengxr commented on the pull request:
https://github.com/apache/spark/pull/304#issuecomment-40024806
@sryza I made one pass over the code. Besides the inline comments:
1. The output of one-hot encoding is always sparse, so we should use a sparse
vector instead of a dense one.
2. This is part of feature transformation. Using `Array` to store features
would result in memory reallocation. We should spend more time on the data
types.
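
To illustrate point 1, here is a minimal sketch of a one-hot encoder that emits a sparse representation. `SparseVec` and `OneHot.encode` are hypothetical names made up for this example, not the code in the PR; in MLlib itself the natural replacement would probably be `Vectors.sparse(size, indices, values)`, assuming that API is available on the target branch.

```scala
// Hypothetical minimal sparse vector: only the non-zero entries are stored.
final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) {
  // Expand to a dense array; included only to compare the two layouts.
  def toDense: Array[Double] = {
    val dense = new Array[Double](size)
    var i = 0
    while (i < indices.length) {
      dense(indices(i)) = values(i)
      i += 1
    }
    dense
  }
}

object OneHot {
  // Encode a category index in [0, numCategories) as a one-hot sparse vector.
  // Exactly one index and one value are stored, whatever numCategories is.
  def encode(category: Int, numCategories: Int): SparseVec = {
    require(category >= 0 && category < numCategories,
      s"category $category out of range [0, $numCategories)")
    SparseVec(numCategories, Array(category), Array(1.0))
  }
}
```

With, say, 1000 categories, the sparse form keeps one index and one value per feature, while a dense `Array[Double]` would allocate 1000 entries of which 999 are zero.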