GitHub user ericl opened a pull request:
https://github.com/apache/spark/pull/7574
[SPARK-9230] [ML] Support StringType features in RFormula
This adds StringType feature support via OneHotEncoder. As part of this
task it was necessary to change RFormula to an Estimator, so that factor levels
could be determined from the training dataset.
I am not sure whether I am using uids correctly here; reviewer help on that
would be appreciated.
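To illustrate why string features push RFormula from a plain transformer toward an Estimator, here is a minimal plain-Python sketch of the fit/transform split. It is not Spark's API: the class names `OneHotEstimator` and `OneHotModel` are hypothetical, and the choices to keep one output slot per level and to encode unseen values as all zeros are assumptions of this sketch, not a description of how OneHotEncoder or this patch behaves.

```python
# Plain-Python sketch of the Estimator/Model split; NOT Spark's RFormula API.
# All names here are hypothetical and exist only for illustration.
from typing import Dict, List, Sequence


class OneHotModel:
    """Fitted result: holds the factor levels learned from training data."""

    def __init__(self, levels: Sequence[str]) -> None:
        # Map each level to a fixed column position.
        self.index: Dict[str, int] = {lvl: i for i, lvl in enumerate(levels)}

    def transform(self, values: Sequence[str]) -> List[List[float]]:
        # One slot per level; values unseen during fit encode as all zeros
        # (an assumption of this sketch).
        out = []
        for v in values:
            row = [0.0] * len(self.index)
            if v in self.index:
                row[self.index[v]] = 1.0
            out.append(row)
        return out


class OneHotEstimator:
    """Unfitted stage: must see the training column to learn its levels."""

    def fit(self, values: Sequence[str]) -> OneHotModel:
        # Sort the distinct values so the column order is deterministic.
        return OneHotModel(sorted(set(values)))
```

For example, `OneHotEstimator().fit(["a", "b", "a", "c"])` learns the levels `a`, `b`, `c`, and the resulting model can then encode new rows with the same column layout. A stateless transformer could not do this, because the set of levels is only known after looking at the training data.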
cc @mengxr
Umbrella design doc:
https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit#
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ericl/spark string-features
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/7574.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #7574
----
commit a1d03f44f7e226198bde129cc0f40827761bff17
Author: Eric Liang <[email protected]>
Date: 2015-07-20T22:25:55Z
refactor into estimator
commit 8a637db882175161ef17dce0795cf1576b594f20
Author: Eric Liang <[email protected]>
Date: 2015-07-20T23:40:20Z
encoder wip
commit b01c7c5c90efac1d3470b2c463fddf91fbf67408
Author: Eric Liang <[email protected]>
Date: 2015-07-21T00:53:11Z
add test
commit 5b2c4a2d8c29065a232aa207deaa6e869e545131
Author: Eric Liang <[email protected]>
Date: 2015-07-21T01:45:33Z
Mon Jul 20 18:45:33 PDT 2015
commit d841cec4f42cef5dbda3d43e036964ae63fd71c9
Author: Eric Liang <[email protected]>
Date: 2015-07-21T17:49:29Z
Merge branch 'master' into string-features
Conflicts:
mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala
mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala
commit 72bd6f333dd118a900338917213bb8e75144c6e7
Author: Eric Liang <[email protected]>
Date: 2015-07-21T19:22:57Z
fix merge
commit a230a4790c5163d337781fb9f50cca8a7f83a8b1
Author: Eric Liang <[email protected]>
Date: 2015-07-21T19:49:03Z
Merge branch 'master' into string-features
commit 169a0850fc40964194e48c4b317b74226a542cd5
Author: Eric Liang <[email protected]>
Date: 2015-07-21T20:08:48Z
tweak functional test
----