[GitHub] spark pull request #20257: [SPARK-23048][ML] Add OneHotEncoderEstimator docu...

MLnick Wed, 17 Jan 2018 04:50:12 -0800

Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20257#discussion_r162040939
  
    --- Diff: docs/ml-features.md ---
    @@ -783,11 +783,11 @@ Because this existing `OneHotEncoder` is a stateless transformer, it is not usab
     
     ## OneHotEncoderEstimator
     
    -[One-hot encoding](http://en.wikipedia.org/wiki/One-hot) maps a column of label indices to a column of binary vectors, and each output binary vector includes at most a single one-value. This encoding allows algorithms which expect continuous features, such as Logistic Regression, to use categorical features. For string type input data, it is common to encode categorical features using [StringIndexer](ml-features.html#stringindexer) first.
    +[One-hot encoding](http://en.wikipedia.org/wiki/One-hot) maps a categorical feature, represented as a label index, to a binary vector with at most a single one-value indicating the presence of a specific feature value from among the set of all feature values.
    --- End diff --
    
    @viirya sorry for any confusion, but I didn't intend for you to remove these sentences:
    
    ```This encoding allows algorithms which expect continuous features, such as Logistic Regression, to use categorical features. For string type input data, it is common to encode categorical features using [StringIndexer](ml-features.html#stringindexer) first.```
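
For readers skimming the thread, the mapping described in the proposed wording can be sketched in plain Python. This is only an illustration, not Spark's implementation: `string_index` and `one_hot` are hypothetical helpers, the frequency-based index order with alphabetical tie-breaking is an assumption standing in for what an indexing step might produce, and the `drop_last` flag is assumed to be the reason a vector can contain "at most" one one-value (the last category maps to all zeros).

```python
from collections import Counter


def string_index(values):
    """Hypothetical stand-in for an indexing step: map each distinct string
    to a label index, most frequent first (ties broken alphabetically)."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: idx for idx, value in enumerate(ordered)}


def one_hot(index, num_categories, drop_last=True):
    """Map a label index to a binary vector with at most a single 1.0.

    With drop_last=True (an assumption for this sketch) the vector has
    num_categories - 1 slots and the last category becomes all zeros.
    """
    if not 0 <= index < num_categories:
        raise ValueError("label index out of range")
    size = num_categories - 1 if drop_last else num_categories
    vector = [0.0] * size
    if index < size:
        vector[index] = 1.0
    return vector


# String input is indexed first, then each index is one-hot encoded.
raw = ["a", "b", "c", "a", "a", "c"]
labels = string_index(raw)
encoded = [one_hot(labels[v], len(labels)) for v in raw]
```

The all-zeros case for the last category is what makes "at most a single one-value" accurate, and the indexing step is why the removed sentence about string input is worth keeping in the docs.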



---

