[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-125295030 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-125295005 Merged build triggered.

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-125294308 test this please

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-125300346 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/7574#discussion_r35596332 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala --- @@ -114,25 +177,29 @@ class RFormula(override val uid: String) }

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/7574#discussion_r35596324 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala --- @@ -62,19 +77,72 @@ class RFormula(override val uid: String) /** @group

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/7574#discussion_r35596320 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala --- @@ -62,19 +77,72 @@ class RFormula(override val uid: String) /** @group

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-125372454 Merged build started.

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-125372440 Merged build triggered.

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-125376053 [Test build #38602 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38602/consoleFull) for PR 7574 at commit

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-125375942 Merged build started.

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-125375931 Merged build triggered.

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread ericl
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/7574#discussion_r35598513 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala --- @@ -48,55 +49,59 @@ class RFormulaSuite extends SparkFunSuite with

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread ericl
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/7574#discussion_r35598495 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala --- @@ -114,25 +177,29 @@ class RFormula(override val uid: String) }

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread ericl
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/7574#discussion_r35598510 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala --- @@ -48,55 +49,59 @@ class RFormulaSuite extends SparkFunSuite with

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread ericl
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/7574#discussion_r35598489 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala --- @@ -62,19 +77,72 @@ class RFormula(override val uid: String) /** @group

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-125371041 test this please

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/7574#discussion_r35596483 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala --- @@ -48,55 +49,59 @@ class RFormulaSuite extends SparkFunSuite with

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/7574#discussion_r35596486 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala --- @@ -48,55 +49,59 @@ class RFormulaSuite extends SparkFunSuite with

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/7574#discussion_r35596488 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala --- @@ -48,55 +49,59 @@ class RFormulaSuite extends SparkFunSuite with

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread ericl
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/7574#discussion_r35598503 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala --- @@ -48,55 +49,59 @@ class RFormulaSuite extends SparkFunSuite with

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread ericl
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/7574#discussion_r35598479 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala --- @@ -62,19 +77,72 @@ class RFormula(override val uid: String) /** @group

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-125373141 [Test build #38597 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38597/consoleFull) for PR 7574 at commit

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-125376595 LGTM pending Jenkins.

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-125396509 [Test build #38597 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38597/console) for PR 7574 at commit

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-125396596 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-125380508 [Test build #38602 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38602/console) for PR 7574 at commit

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-125384454 Merged into master. Thanks!

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/7574

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-125380557 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-25 Thread ericl
Github user ericl commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-124916428 ptal

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-24 Thread ericl
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/7574#discussion_r35397462 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala --- @@ -130,9 +173,52 @@ class RFormula(override val uid: String) Label

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-24 Thread ericl
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/7574#discussion_r35397464 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala --- @@ -130,9 +173,52 @@ class RFormula(override val uid: String) Label

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-24 Thread ericl
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/7574#discussion_r35397461 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala --- @@ -62,19 +77,60 @@ class RFormula(override val uid: String) /** @group

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-124346736 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-124346148 [Test build #38316 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38316/console) for PR 7574 at commit

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-124338457 [Test build #38316 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38316/consoleFull) for PR 7574 at commit

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-124769768 Merged build started.

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-124769744 Merged build triggered.

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-124770734 [Test build #38410 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38410/consoleFull) for PR 7574 at commit

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-124776979 [Test build #38410 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/38410/console) for PR 7574 at commit

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-124777072 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-124338057 Merged build triggered.

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-124338078 Merged build started.

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-23 Thread ericl
Github user ericl commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-123988222 Hmm, I guess that is pretty harmless though. Will do.

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-23 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-123994687 You can construct a `Pipeline` object in `RFormula.fit`, which contains all `StringIndexer`, `OneHotEncoder`, etc. Then call `Pipeline.fit` in `RFormula.fit` and get the
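The pattern suggested in this comment (build the stages inside the estimator's `fit`, fit them as one pipeline, and keep only the fitted result) can be illustrated with a minimal plain-Python sketch. This is not Spark code: every class below is a hypothetical stand-in named after its Spark counterpart, and `FormulaEstimator` / `FormulaModel` are invented names for illustration only.

```python
from collections import Counter


class StringIndexer:
    """Stand-in estimator: learns a string -> index mapping (most frequent label gets 0)."""

    def __init__(self, input_col, output_col):
        self.input_col, self.output_col = input_col, output_col

    def fit(self, rows):
        counts = Counter(row[self.input_col] for row in rows)
        # Ties are broken alphabetically so the mapping is deterministic.
        labels = sorted(counts, key=lambda label: (-counts[label], label))
        return StringIndexerModel(self.input_col, self.output_col, labels)


class StringIndexerModel:
    """Stand-in transformer: keeps only the learned labels, not the training rows."""

    def __init__(self, input_col, output_col, labels):
        self.input_col, self.output_col = input_col, output_col
        self.index = {label: i for i, label in enumerate(labels)}

    def transform(self, rows):
        return [{**row, self.output_col: self.index[row[self.input_col]]} for row in rows]


class Pipeline:
    """Fits each estimator stage in order, feeding transformed rows to the next stage."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            model = stage.fit(rows) if hasattr(stage, "fit") else stage
            rows = model.transform(rows)
            fitted.append(model)
        return PipelineModel(fitted)


class PipelineModel:
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class FormulaEstimator:
    """Hypothetical analogue of an estimator that delegates all fitting to a pipeline."""

    def __init__(self, string_cols):
        self.string_cols = string_cols

    def fit(self, rows):
        # No stage is fitted explicitly here; the pipeline does it in one place.
        stages = [StringIndexer(col, col + "_idx") for col in self.string_cols]
        return FormulaModel(Pipeline(stages).fit(rows))


class FormulaModel:
    def __init__(self, pipeline_model):
        self.pipeline_model = pipeline_model

    def transform(self, rows):
        return self.pipeline_model.transform(rows)


train = [{"country": "US"}, {"country": "CA"}, {"country": "US"}]
model = FormulaEstimator(["country"]).fit(train)
result = model.transform([{"country": "CA"}])
```

In this sketch the returned model holds only the fitted stages, so it can transform new rows without keeping a reference to the training data.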

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-22 Thread ericl
Github user ericl commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-123961633 @mengxr to clarify, not calling `StringIndexer.fit` in `RFormula.fit` means RFormulaModel will have a reference to the original fitted dataset, correct?

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-22 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/7574#discussion_r35279252 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala --- @@ -62,19 +77,60 @@ class RFormula(override val uid: String) /** @group

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-22 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/7574#discussion_r35279311 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala --- @@ -130,9 +173,52 @@ class RFormula(override val uid: String) Label

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-22 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-123913341 @ericl I think it is simpler to construct a `pipeline` in `RFormula.fit` without calling `StringIndexer.fit` explicitly. That leaves space for `pipeline.fit`

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-22 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/7574#discussion_r35279570 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala --- @@ -130,9 +173,52 @@ class RFormula(override val uid: String) Label

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-123488315 [Test build #37982 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37982/console) for PR 7574 at commit

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-123488427 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-123475216 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-123475106 [Test build #37977 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37977/console) for PR 7574 at commit

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-123479213 Merged build started.

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-123479155 Merged build triggered.

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-123467145 [Test build #37977 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37977/consoleFull) for PR 7574 at commit

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-123465703 Merged build started.

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-123465662 Merged build triggered.

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7574#issuecomment-123480884 [Test build #37982 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37982/consoleFull) for PR 7574 at commit

[GitHub] spark pull request: [SPARK-9230] [ML] Support StringType features ...

2015-07-21 Thread ericl
GitHub user ericl opened a pull request: https://github.com/apache/spark/pull/7574 [SPARK-9230] [ML] Support StringType features in RFormula This adds StringType feature support via OneHotEncoder. As part of this task it was necessary to change RFormula to an Estimator, so that
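As a rough illustration of why string features require a fitting step, here is a plain-Python sketch of one-hot encoding a string column alongside a numeric one. It does not use Spark's `OneHotEncoder`; the helper names (`fit_labels`, `one_hot`, `encode_rows`) are hypothetical, and dropping the last category as a reference level is an assumption of this sketch, not something the pull request description states.

```python
from collections import Counter


def fit_labels(values):
    """Learn the category order from data: most frequent first, ties alphabetical."""
    counts = Counter(values)
    return sorted(counts, key=lambda value: (-counts[value], value))


def one_hot(value, labels, drop_last=True):
    """Encode one string as a 0/1 vector; with drop_last the final label maps to all zeros."""
    width = len(labels) - 1 if drop_last else len(labels)
    vector = [0.0] * width
    position = labels.index(value)
    if position < width:
        vector[position] = 1.0
    return vector


def encode_rows(rows, string_col, numeric_cols):
    """Fit the labels on the rows, then build one feature vector per row."""
    labels = fit_labels(row[string_col] for row in rows)
    return [
        one_hot(row[string_col], labels) + [float(row[col]) for col in numeric_cols]
        for row in rows
    ]


rows = [
    {"a": "foo", "b": 4},
    {"a": "bar", "b": 5},
    {"a": "foo", "b": 6},
    {"a": "baz", "b": 7},
]
features = encode_rows(rows, "a", ["b"])
```

The label set has to be learned from the data before any row can be encoded, which is the kind of dependency that makes an estimator (fit, then transform) a natural shape for this step.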