GitHub user mengxr opened a pull request:
https://github.com/apache/spark/pull/6308
[SPARK-7219] [MLLIB] Output feature attributes in HashingTF
This PR updates `HashingTF` to output ML attributes that tell the number of
features in the output column. We need to expand `UnaryTransformer` to support
output metadata. A `def outputMetadata: Metadata` is not sufficient because the
metadata may also depend on the input data. Though this is not true for
`HashingTF`, I think it is reasonable to update `UnaryTransformer` in a
separate PR. `checkParams` is added to verify common requirements for params. I
will send a separate PR to use it in other test suites. @jkbradley
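
To illustrate the distinction drawn above between metadata that is fixed by the params and metadata that depends on the input data, here is a minimal plain-Python sketch. It does not use Spark: `Column`, `hashing_tf`, and `index_labels` are hypothetical names invented for this example, and `zlib.crc32` is only a stand-in hash chosen because it is stable across runs, not the hash function `HashingTF` uses.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Column:
    """Hypothetical stand-in for an output column: row values plus metadata."""
    rows: list
    metadata: dict


def hashing_tf(docs, num_features=16):
    """Hash each token into one of num_features buckets and count occurrences."""
    rows = []
    for tokens in docs:
        counts = {}
        for tok in tokens:
            idx = zlib.crc32(tok.encode("utf-8")) % num_features
            counts[idx] = counts.get(idx, 0) + 1
        rows.append(counts)
    # The output width follows from the parameter alone, so this metadata
    # could be produced without ever looking at the data.
    return Column(rows, {"num_features": num_features})


def index_labels(values):
    """Assign each distinct value an integer index in order of first appearance."""
    mapping = {}
    rows = [mapping.setdefault(v, len(mapping)) for v in values]
    # Here the metadata is only known after scanning the data, which is the
    # case a parameterless metadata accessor could not cover.
    return Column(rows, {"num_values": len(mapping)})


hashed = hashing_tf([["a", "b", "a"], ["c"]], num_features=8)
indexed = index_labels(["x", "y", "x"])
print(hashed.metadata)   # {'num_features': 8}
print(indexed.metadata)  # {'num_values': 2}
```

In the first function the metadata is a pure function of the params, which matches what is said above about `HashingTF`. The second function is an assumed example of the data-dependent case.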
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mengxr/spark SPARK-7219
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/6308.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #6308
----
commit 91a6106f9d0576fe4842fca7b99e6e69ea9d7707
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-20T23:55:07Z
WIP
commit 178ae238e2c3a002069973535dcc247b41a95172
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-21T00:21:31Z
update HashingTF with tests
commit 2194703ca6bb1a1983b58290861287e6956d0444
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-21T01:33:56Z
add test for attributes
----