GitHub user BryanCutler opened a pull request:
https://github.com/apache/spark/pull/11832
[SPARK-13963][ML] Adding binary toggle param to HashingTF
## What changes were proposed in this pull request?
Adding a binary toggle parameter to ml.feature.HashingTF, as well as to
mllib.feature.HashingTF, since the former wraps the latter's functionality.
When set to true, this parameter sets every non-zero term count to 1, turning
term-count features into binary values, which are well suited to discrete
probabilistic models.
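The effect of the toggle can be sketched in plain Python. This is a minimal, hypothetical helper. `hashing_tf` is not a Spark API. It uses `zlib.crc32` as a stand-in bucket hash. Spark's HashingTF uses its own hash function, so the actual bucket indices will differ. The sketch only shows that with `binary=True` the non-zero counts collapse to 1.0, while the set of non-zero indices stays the same.

```python
import zlib
from collections import defaultdict


def hashing_tf(terms, num_features=16, binary=False):
    """Map a list of terms to a sparse {index: value} term-frequency vector.

    Hypothetical helper for illustration: zlib.crc32 stands in for the
    hash function Spark actually uses, so indices will not match Spark's.
    """
    counts = defaultdict(float)
    for term in terms:
        # crc32 is non-negative in Python 3, so the index is in [0, num_features)
        index = zlib.crc32(term.encode("utf-8")) % num_features
        counts[index] += 1.0
    if binary:
        # Binary toggle: every non-zero count becomes 1.0
        return {i: 1.0 for i in counts}
    return dict(counts)


doc = ["a", "b", "a", "c", "a"]
counts = hashing_tf(doc)               # raw term counts per bucket
binary = hashing_tf(doc, binary=True)  # same buckets, values clipped to 1.0
print(sum(counts.values()))            # 5.0
```

In this sketch, the binary option is applied after counting. As a result, hash collisions behave identically in both modes, and only the stored values change.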
## How was this patch tested?
Added unit tests for the ML and MLlib HashingTF implementations.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/BryanCutler/spark binary-param-HashingTF-SPARK-13963
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11832.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11832
----
commit a5ff3309c0d07e57177374133130803eb98ebffb
Author: Bryan Cutler <[email protected]>
Date: 2016-03-18T21:19:19Z
[SPARK-13963] Adding binary toggle to HashingTF in ml/mllib
commit 31097231769860b86d1d3234ebf7d4e95f96e5cb
Author: Bryan Cutler <[email protected]>
Date: 2016-03-18T21:19:48Z
Added unit test for HashingTF binary toggle
commit ca1436166a1292f92d72408c10cf606623b31bbd
Author: Bryan Cutler <[email protected]>
Date: 2016-03-18T21:26:34Z
fixed param description text
----