[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-09 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-207738831 Yup, LGTM. Merged to master. Thanks all for review and @wangmiao1981 for the PR. --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-09 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12200 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-08 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-207628271 This LGTM For defaults, if the Param applies to training only (like minDF), then it makes sense to set the default in the Estimator. But if the Param

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-207523226 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-207523225 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-207522950 **[Test build #55359 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55359/consoleFull)** for PR 12200 at commit

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-207507859 **[Test build #55359 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55359/consoleFull)** for PR 12200 at commit

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-08 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-207266803 @wangmiao1981 I think you're right - we should move the `setDefault` call for binary back to the trait as the param is shared between the estimator and model. Thanks!

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-08 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58987840 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -127,6 +146,9 @@ class CountVectorizer(override val uid: String)

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-207082749 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-207082751 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-207082605 **[Test build #55241 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55241/consoleFull)** for PR 12200 at commit

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-207072619 **[Test build #55241 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55241/consoleFull)** for PR 12200 at commit

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-07 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58929398 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -127,6 +146,9 @@ class CountVectorizer(override val uid:

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-07 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206999812 @MLnick I will revise the test accordingly. I think after testing the estimator, I need to turn off the flag of the trained model first. Otherwise, the binary

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-07 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206724684 @wangmiao1981 the tests are ok - but I do like @BrianCutler's idea of fitting the vectorizer first, and then building the model from that fitted vocab - it seems

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-07 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58826834 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -127,6 +146,9 @@ class CountVectorizer(override val uid: String)

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206523617 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206523611 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206523215 **[Test build #55135 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55135/consoleFull)** for PR 12200 at commit

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206507539 **[Test build #55135 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55135/consoleFull)** for PR 12200 at commit

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206506499 **[Test build #55134 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55134/consoleFull)** for PR 12200 at commit

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206506504 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206506503 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206506032 **[Test build #55134 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55134/consoleFull)** for PR 12200 at commit

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206502911 **[Test build #55133 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55133/consoleFull)** for PR 12200 at commit

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206502928 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206502936 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206502365 **[Test build #55133 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55133/consoleFull)** for PR 12200 at commit

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206501277 **[Test build #55132 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55132/consoleFull)** for PR 12200 at commit

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206501293 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206501286 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206500835 **[Test build #55132 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55132/consoleFull)** for PR 12200 at commit

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206500157 ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58754677 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala --- @@ -183,6 +183,19 @@ class CountVectorizerSuite extends

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58754307 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala --- @@ -183,6 +183,26 @@ class CountVectorizerSuite extends

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58753889 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala --- @@ -183,6 +183,26 @@ class CountVectorizerSuite extends

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread BryanCutler
Github user BryanCutler commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206493241 @wangmiao1981 the test looks ok to me now, although I'm not too sure why you added extra 'a's and 'b's, but that's fine. --- If your project is set up for it, you

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58752457 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala --- @@ -114,7 +114,7 @@ class CountVectorizerSuite extends

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58752409 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala --- @@ -183,6 +183,19 @@ class CountVectorizerSuite extends

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58750775 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala --- @@ -183,6 +183,26 @@ class CountVectorizerSuite extends

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58749870 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala --- @@ -183,6 +183,26 @@ class CountVectorizerSuite extends

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58748929 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala --- @@ -183,6 +183,26 @@ class CountVectorizerSuite extends

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58746316 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala --- @@ -183,6 +183,26 @@ class CountVectorizerSuite extends

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58744522 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala --- @@ -183,6 +183,26 @@ class CountVectorizerSuite extends

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58744168 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala --- @@ -183,6 +183,26 @@ class CountVectorizerSuite extends

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58743453 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -127,6 +146,9 @@ class CountVectorizer(override val uid: String)

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58742287 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -127,6 +146,9 @@ class CountVectorizer(override val uid: String)

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58740250 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -100,6 +103,24 @@ private[feature] trait CountVectorizerParams

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58739580 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -42,7 +42,8 @@ private[feature] trait CountVectorizerParams

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58673286 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -100,6 +103,24 @@ private[feature] trait CountVectorizerParams

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58672875 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -100,6 +103,24 @@ private[feature] trait CountVectorizerParams

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58672971 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -184,7 +208,8 @@ object CountVectorizer extends

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58672317 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -42,7 +42,8 @@ private[feature] trait CountVectorizerParams extends

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread MLnick
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206186423 I'll do it once you update the tests as discussed (so we only test when necessary) :) --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58662986 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala --- @@ -115,6 +115,27 @@ class CountVectorizerSuite extends

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206181665 @MLnick can you trigger the auto test? It seems that I am not in the white list. I had one JIRA merged to master. Thanks! Miao --- If your project is set up

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58662007 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala --- @@ -115,6 +115,27 @@ class CountVectorizerSuite extends

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/12200#discussion_r58657525 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala --- @@ -115,6 +115,27 @@ class CountVectorizerSuite extends

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12200#issuecomment-206140085 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request: [SPARK-14392][ML]CountVectorizer Estimator sho...

2016-04-06 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/12200 [SPARK-14392][ML]CountVectorizer Estimator should include binary toggle Param ## What changes were proposed in this pull request? CountVectorizerModel has a binary toggle param. This