GitHub user mengxr opened a pull request:

    https://github.com/apache/spark/pull/6308

    [SPARK-7219] [MLLIB] Output feature attributes in HashingTF

    This PR updates `HashingTF` to output ML attributes that record the number of 
features in the output column. We need to expand `UnaryTransformer` to support 
output metadata. A `def outputMetadata: Metadata` is not sufficient because the 
metadata may also depend on the input data. Though this is not true for 
`HashingTF`, I think it is reasonable to update `UnaryTransformer` in a 
separate PR. `checkParams` is added to verify common requirements for params. I 
will send a separate PR to use it in other test suites. @jkbradley
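    To make the design argument concrete, here is a minimal, Spark-free Scala 
sketch. Every name in it (`OutputMetadataSketch`, `MetadataAwareTransformer`, 
`HashingTFSketch`, `ElementwiseScaleSketch`) is hypothetical and not a Spark 
API. Metadata is modeled as a plain `Map[String, Any]` instead of Spark's 
`org.apache.spark.sql.types.Metadata`, the input schema as a map from column 
name to column metadata, and a term's bucket is assumed to be a non-negative 
modulo of its `hashCode`. The hashing case can ignore the input schema because 
its output size is fixed by `numFeatures`, while an element-wise transformer has 
to copy the size from its input column, which is why output metadata is modeled 
here as a function of the input schema.

```scala
// Hypothetical, Spark-free model of the design discussed in this PR.
// None of these names are Spark APIs.
object OutputMetadataSketch {
  // Stand-in for org.apache.spark.sql.types.Metadata.
  type Metadata = Map[String, Any]
  // Stand-in for an input schema: column name -> column metadata.
  type Schema = Map[String, Metadata]

  // Output metadata is computed from the input schema instead of being
  // a fixed, no-argument `def outputMetadata: Metadata`.
  trait MetadataAwareTransformer[In, Out] {
    def inputCol: String
    def transform(in: In): Out
    def outputMetadata(inputSchema: Schema): Metadata
  }

  // Hashing term-frequency sketch: each term is counted in bucket
  // nonNegativeMod(term.hashCode, numFeatures), stored sparsely.
  final class HashingTFSketch(val inputCol: String, val numFeatures: Int)
      extends MetadataAwareTransformer[Seq[String], Map[Int, Double]] {
    require(numFeatures > 0, "numFeatures must be positive")

    private def nonNegativeMod(x: Int, mod: Int): Int = {
      val raw = x % mod
      if (raw < 0) raw + mod else raw
    }

    def transform(terms: Seq[String]): Map[Int, Double] =
      terms
        .groupBy(t => nonNegativeMod(t.##, numFeatures))
        .map { case (idx, ts) => idx -> ts.size.toDouble }

    // The output size is fixed by numFeatures, so the schema is unused.
    def outputMetadata(inputSchema: Schema): Metadata =
      Map("numAttributes" -> numFeatures)
  }

  // Element-wise scaling keeps the vector size, so its output metadata
  // must be copied from the input column's metadata when present.
  final class ElementwiseScaleSketch(val inputCol: String, factor: Double)
      extends MetadataAwareTransformer[Vector[Double], Vector[Double]] {
    def transform(v: Vector[Double]): Vector[Double] = v.map(_ * factor)

    def outputMetadata(inputSchema: Schema): Metadata =
      inputSchema
        .getOrElse(inputCol, Map.empty[String, Any])
        .get("numAttributes")
        .map(n => Map[String, Any]("numAttributes" -> n))
        .getOrElse(Map.empty[String, Any])
  }
}
```

    With only a no-argument `outputMetadata`, the second class in this sketch 
could not report its output size, which mirrors the input-dependent case the 
description mentions.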

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mengxr/spark SPARK-7219

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/6308.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6308
    
----
commit 91a6106f9d0576fe4842fca7b99e6e69ea9d7707
Author: Xiangrui Meng <[email protected]>
Date:   2015-05-20T23:55:07Z

    WIP

commit 178ae238e2c3a002069973535dcc247b41a95172
Author: Xiangrui Meng <[email protected]>
Date:   2015-05-21T00:21:31Z

    update HashingTF with tests

commit 2194703ca6bb1a1983b58290861287e6956d0444
Author: Xiangrui Meng <[email protected]>
Date:   2015-05-21T01:33:56Z

    add test for attributes

----


