[GitHub] spark pull request: SPARK-7579 [ML] [DOC] User guide update for On...

SparkQA Tue, 19 May 2015 17:17:56 -0700

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/6126#issuecomment-103703111
  
      [Test build #33101 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33101/consoleFull) for PR 6126 at commit [`4f5376e`](https://github.com/apache/spark/commit/4f5376ee0a2f2700bc57cce29ad39959ca943e37).
     * This patch **passes all tests**.
     * This patch **does not merge cleanly**.
     * This patch adds the following public classes _(experimental)_:
      * `[One-hot encoding](http://en.wikipedia.org/wiki/One-hot) maps a column 
of label indices to a column of binary vectors, with at most a single 
one-value. This encoding allows algorithms that expect continuous features, 
such as Logistic Regression, to use categorical features as well. The 
[OneHotEncoder](api/scala/index.html#org.apache.spark.ml.feature.OneHotEncoder) 
class provides this functionality. By default, the resulting binary vector has 
a component for each category, so with 5 categories, an input value of 2.0 
would map to an output vector of (0.0, 0.0, 1.0, 0.0, 0.0). If 
`includeFirst` is set to false, the first category is omitted, so the output 
vector for the previous example would be (0.0, 1.0, 0.0, 0.0) and an input 
value of 0.0 would map to a vector of all zeros. Including the first category 
makes the vector columns linearly dependent because they sum to one.`
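
The mapping described in the quoted text can be sketched in plain Python. This is only an illustration of the arithmetic, not Spark's `OneHotEncoder` implementation: the function name `one_hot` and its `num_categories` and `include_first` parameters are hypothetical, with `include_first` chosen to mirror the `includeFirst` setting mentioned above.

```python
def one_hot(index, num_categories, include_first=True):
    """Map a category index to a one-hot list of floats.

    Hypothetical helper for illustration; it is not part of Spark.
    """
    i = int(index)
    if not 0 <= i < num_categories:
        raise ValueError("index %r is outside [0, %d)" % (index, num_categories))

    if include_first:
        # One component per category.
        vec = [0.0] * num_categories
        vec[i] = 1.0
    else:
        # Category 0 is omitted, so it maps to the all-zeros vector and
        # every other category shifts down by one position.
        vec = [0.0] * (num_categories - 1)
        if i > 0:
            vec[i - 1] = 1.0
    return vec


print(one_hot(2.0, 5))                       # [0.0, 0.0, 1.0, 0.0, 0.0]
print(one_hot(2.0, 5, include_first=False))  # [0.0, 1.0, 0.0, 0.0]
print(one_hot(0.0, 5, include_first=False))  # [0.0, 0.0, 0.0, 0.0]
```

With the first category included, the components of every output sum to one, which is the linear dependence the text refers to; omitting it removes that constraint.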




