GitHub user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/6126#issuecomment-103703111
[Test build #33101 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33101/consoleFull)
for PR 6126 at commit
[`4f5376e`](https://github.com/apache/spark/commit/4f5376ee0a2f2700bc57cce29ad39959ca943e37).
* This patch **passes all tests**.
* This patch **does not merge cleanly**.
* This patch adds the following public classes _(experimental)_:
* [One-hot encoding](http://en.wikipedia.org/wiki/One-hot) maps a column
of label indices to a column of binary vectors, each with at most a single
one-value. This encoding allows algorithms that expect continuous features,
such as Logistic Regression, to use categorical features as well. The
[OneHotEncoder](api/scala/index.html#org.apache.spark.ml.feature.OneHotEncoder)
class provides this functionality. By default, the resulting binary vector has
a component for each category, so with 5 categories, an input value of 2.0
maps to the output vector (0.0, 0.0, 1.0, 0.0, 0.0). If `includeFirst` is
set to false, the first category is omitted, so the output vector for the
previous example would be (0.0, 1.0, 0.0, 0.0) and an input value of 0.0
would map to a vector of all zeros. Including the first category makes the
vector columns linearly dependent because they sum to one.
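
The mapping described above can be illustrated with a minimal plain-Python sketch. This is not the Spark `OneHotEncoder` implementation: the function name `one_hot` and the parameters `num_categories` and `include_first` are hypothetical, chosen only to mirror the `includeFirst` option mentioned in the text.

```python
from typing import List


def one_hot(index: float, num_categories: int, include_first: bool = True) -> List[float]:
    """Encode a label index as a binary vector (illustrative sketch only).

    `one_hot`, `num_categories`, and `include_first` are hypothetical names;
    `include_first` mirrors the `includeFirst` option described in the text.
    """
    i = int(index)
    if not 0 <= i < num_categories:
        raise ValueError(f"index {index} is outside [0, {num_categories})")

    # Full encoding: one component per category, with a single 1.0 at position i.
    vec = [0.0] * num_categories
    vec[i] = 1.0

    # Omitting the first category drops component 0, so index 0 becomes all zeros.
    return vec if include_first else vec[1:]


if __name__ == "__main__":
    print(one_hot(2.0, 5))                       # full encoding
    print(one_hot(2.0, 5, include_first=False))  # first category omitted
    print(one_hot(0.0, 5, include_first=False))  # all zeros
```

With the first category included, every encoded vector sums to one, which is the linear dependence the text refers to. Omitting it removes that dependence, and category 0 is then represented by the all-zeros vector.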