[GitHub] spark pull request: SPARK-1216. Add a OneHotEncoder for handling c...

mengxr Wed, 09 Apr 2014 15:34:32 -0700

Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/304#issuecomment-40024806
  
    @sryza I made one pass over the code. Besides the inline comments:
    
    1. The output of one-hot encoding is always sparse, so we should use a sparse 
vector instead of a dense one.
    
    2. This is part of feature transformation. Using `Array` to store features 
would result in memory reallocation. We should spend more time on the data 
types.
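
To illustrate point 1, here is a minimal sketch of one-hot output stored sparsely. It is not code from the pull request: `SparseVec`, `OneHotSketch`, and `oneHot` are hypothetical names, and the (size, indices, values) layout is only an assumed stand-in for whatever sparse vector type is eventually chosen.

```scala
// Hypothetical stand-in for a sparse vector: only non-zero entries are stored.
final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) {
  // Expand to a dense array; intended for inspection, not for the hot path.
  def toDense: Array[Double] = {
    val out = new Array[Double](size)
    var i = 0
    while (i < indices.length) {
      out(indices(i)) = values(i)
      i += 1
    }
    out
  }
}

object OneHotSketch {
  // One-hot encode a category index: exactly one non-zero entry out of numCategories.
  def oneHot(categoryIndex: Int, numCategories: Int): SparseVec = {
    require(categoryIndex >= 0 && categoryIndex < numCategories,
      "category index out of range")
    SparseVec(numCategories, Array(categoryIndex), Array(1.0))
  }
}
```

Under these assumptions, each encoded feature stores one index and one value regardless of the number of categories, whereas a dense `Array[Double]` would allocate `numCategories` entries per feature.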



