GitHub user ericl opened a pull request:

    https://github.com/apache/spark/pull/7574

    [SPARK-9230] [ML] Support StringType features in RFormula

    This adds StringType feature support via OneHotEncoder. As part of this
    task it was necessary to change RFormula to an Estimator, so that factor
    levels could be determined from the training dataset.
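
To illustrate why string features call for an Estimator, here is a minimal, self-contained sketch of the fit/transform split. It is not the code in this pull request and does not reproduce the details of Spark's OneHotEncoder. The names `StringEncoder` and `StringEncoderModel` are hypothetical, and encoding unseen levels as all zeros is an assumption made only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class StringEncoderModel:
    """Hypothetical fitted model: holds the factor levels seen during fit."""

    levels: Dict[str, int]

    def transform(self, values: Sequence[str]) -> List[List[float]]:
        # One-hot encode each value against the levels learned at fit time.
        width = len(self.levels)
        rows = []
        for value in values:
            row = [0.0] * width
            index = self.levels.get(value)
            if index is not None:
                row[index] = 1.0
            # Assumption: a level not seen during fit stays all zeros.
            rows.append(row)
        return rows


class StringEncoder:
    """Hypothetical estimator: fit() scans the training column for levels."""

    def fit(self, training_values: Sequence[str]) -> StringEncoderModel:
        # Sort the distinct values so the level-to-index mapping is stable.
        distinct = sorted(set(training_values))
        return StringEncoderModel({level: i for i, level in enumerate(distinct)})


# The levels come from the training data, not from the data being encoded.
model = StringEncoder().fit(["red", "blue", "red", "green"])
encoded = model.transform(["blue", "green", "purple"])
```

A plain transformer has no step in which it could learn the set of levels, which is why the sketch separates an estimator (`fit`) from the model it produces (`transform`).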
    
    I am not sure whether I am using uids correctly here; reviewer help on
    that would be appreciated.
    cc @mengxr
    
    Umbrella design doc: 
https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit#

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ericl/spark string-features

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7574.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #7574
    
----
commit a1d03f44f7e226198bde129cc0f40827761bff17
Author: Eric Liang <[email protected]>
Date:   2015-07-20T22:25:55Z

    refactor into estimator

commit 8a637db882175161ef17dce0795cf1576b594f20
Author: Eric Liang <[email protected]>
Date:   2015-07-20T23:40:20Z

    encoder wip

commit b01c7c5c90efac1d3470b2c463fddf91fbf67408
Author: Eric Liang <[email protected]>
Date:   2015-07-21T00:53:11Z

    add test

commit 5b2c4a2d8c29065a232aa207deaa6e869e545131
Author: Eric Liang <[email protected]>
Date:   2015-07-21T01:45:33Z

    Mon Jul 20 18:45:33 PDT 2015

commit d841cec4f42cef5dbda3d43e036964ae63fd71c9
Author: Eric Liang <[email protected]>
Date:   2015-07-21T17:49:29Z

    Merge branch 'master' into string-features
    
    Conflicts:
        mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala
        mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala

commit 72bd6f333dd118a900338917213bb8e75144c6e7
Author: Eric Liang <[email protected]>
Date:   2015-07-21T19:22:57Z

    fix merge

commit a230a4790c5163d337781fb9f50cca8a7f83a8b1
Author: Eric Liang <[email protected]>
Date:   2015-07-21T19:49:03Z

    Merge branch 'master' into string-features

commit 169a0850fc40964194e48c4b317b74226a542cd5
Author: Eric Liang <[email protected]>
Date:   2015-07-21T20:08:48Z

    tweak functional test

----


