[GitHub] spark pull request #19024: [SPARK-21469][ML][EXAMPLES] Adding Examples for F...

MLnick Wed, 23 Aug 2017 05:50:22 -0700

Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19024#discussion_r134741609
  
    --- Diff: docs/ml-features.md ---
    @@ -211,6 +211,65 @@ for more details on the API.
     </div>
     </div>
     
    +## FeatureHasher
    +
    +Feature hashing projects a set of categorical or numerical features into a feature vector of
    +specified dimension (typically substantially smaller than that of the original feature
    +space). This is done using the [hashing trick](https://en.wikipedia.org/wiki/Feature_hashing)
    +to map features to indices in the feature vector.
    +
    +The `FeatureHasher` transformer operates on multiple columns. Each column may contain either
    +numeric or categorical features. Behavior and handling of column data types is as follows:
    +
    +- Numeric columns: For numeric features, the hash value of the column name is used to map the
    +feature value to its index in the feature vector. Numeric features are never treated as
    +categorical, even when they are integers. You must explicitly convert numeric columns containing
    +categorical features to strings first.
    +- String columns: For categorical features, the hash value of the string "column_name=value"
    +is used to map to the vector index, with an indicator value of `1.0`. Thus, categorical features
    +are "one-hot" encoded (similarly to using `OneHotEncoder` with `dropLast=false`).
    --- End diff --
    
    Should link to `OneHotEncoder` section within the guide here.
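
For readers of this thread, the column handling described in the diff above can be sketched in plain Python. This is only an illustration under stated assumptions, not Spark's implementation: `zlib.crc32` is a stand-in hash function chosen for convenience, and the helper name `hash_features` and the default of 16 dimensions are hypothetical.

```python
import zlib


def hash_features(row, num_features=16):
    """Hash a dict of column name -> value into a sparse {index: value} vector.

    Hypothetical helper for illustration only. zlib.crc32 is a stand-in
    hash function and is an assumption of this sketch.
    """
    vector = {}
    for column, value in row.items():
        if isinstance(value, str):
            # String column: hash "column_name=value", indicator value 1.0.
            key = f"{column}={value}"
            weight = 1.0
        else:
            # Numeric column: hash the column name, keep the feature value.
            key = column
            weight = float(value)
        index = zlib.crc32(key.encode("utf-8")) % num_features
        # Features that collide on the same index are summed.
        vector[index] = vector.get(index, 0.0) + weight
    return vector


# An integer stays numeric; converting it to a string makes it categorical.
as_numeric = hash_features({"n": 3})
as_categorical = hash_features({"n": "3"})
```

In this sketch `as_numeric` holds the value `3.0` at the index derived from the column name, while `as_categorical` holds `1.0` at the index derived from `"n=3"`, which mirrors the explicit string conversion the documentation asks for.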



