[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-23 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/15179 Thanks Sean. LGTM. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so,

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15179 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15179 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65821/ Test PASSed. ---

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15179 **[Test build #65821 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65821/consoleFull)** for PR 15179 at commit

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15179 **[Test build #65821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65821/consoleFull)** for PR 15179 at commit

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-22 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15179 Accepting more input types SGTM too (with unit tests). The PR title and description (and perhaps the JIRA too) should be updated. Thanks! --- If your project is set up for it, you can reply to

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-22 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/15179 Sorry, I meant unit test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-22 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15179 Sorry, what's a UT here? user defined type? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-22 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/15179 LGTM. How about adding an UT for the input type new ArrayType(StringType, false)? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15179 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65769/ Test PASSed. ---

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15179 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15179 **[Test build #65769 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65769/consoleFull)** for PR 15179 at commit

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15179 **[Test build #65769 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65769/consoleFull)** for PR 15179 at commit

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-22 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15179 Yes that's better IMHO. It makes this more consistent. Word2Vec was the only case that looked for an exact match on nullability when it was expecting nullable input (i.e. pointlessly disallowed

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-21 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/15179 `CountVectorizer` can accept both `Array(String, true)` and `Array(String, false)`. Perhaps we can just change the validation code. --- If your project is set up for it, you can reply to this email

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-21 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15179 I think the problem is that it won't accept anything that outputs `Array(String, true)`? it seems like it's looking for exactly the same type. Elsewhere I see a similar check written so as to not

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-21 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/15179 Checked locally and Word2Vec can work with `Array(String, false)`. We can loosen the constraint. Maybe a new UT can be added. --- If your project is set up for it, you can reply to this email and

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-21 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/15179 I'm not sure what will happen if we actually send an `Array(String, false)` column to Word2Vec. If nothing goes wrong, I think we can loosen the constraint. I'll try something locally. ---

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-21 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15179 Or really to have it accept an array of strings, whether nullable or not. Right now it can accept a nullable string array type, but will pointlessly reject input that is _not_ nullable. The check is

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-21 Thread hhbyyh
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/15179 Hi @srowen. By fixing `Word2Vec`, do you mean to change its input type to `Array(String, false)`? I cannot recall all the details. Right now I think the primary reason is to be consistent

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15179 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15179 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65711/ Test PASSed. ---

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15179 **[Test build #65711 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65711/consoleFull)** for PR 15179 at commit

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15179 **[Test build #65711 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65711/consoleFull)** for PR 15179 at commit

[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-21 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15179 This change merely parallels https://github.com/apache/spark/pull/7414 from @hhbyyh (what do you think?) I'd still sort of like to understand why we didn't 'fix' Word2Vec instead in that change?