[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-23 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/15179
  
Thanks Sean. LGTM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15179
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15179
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65821/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15179
  
**[Test build #65821 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65821/consoleFull)**
 for PR 15179 at commit 
[`76af236`](https://github.com/apache/spark/commit/76af236ab516026d43bdbce679b49a3629108ef3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15179
  
**[Test build #65821 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65821/consoleFull)**
 for PR 15179 at commit 
[`76af236`](https://github.com/apache/spark/commit/76af236ab516026d43bdbce679b49a3629108ef3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-22 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/15179
  
Accepting more input types SGTM too (with unit tests).  The PR title and 
description (and perhaps the JIRA too) should be updated.  Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-22 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/15179
  
Sorry, I meant unit test



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-22 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15179
  
Sorry, what's a UT here? user defined type?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-22 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/15179
  
LGTM. How about adding an UT for the input type new ArrayType(StringType, 
false)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15179
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65769/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15179
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15179
  
**[Test build #65769 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65769/consoleFull)**
 for PR 15179 at commit 
[`4335237`](https://github.com/apache/spark/commit/43352373d949e97f1c6bff65fc00fbd4a86e1db5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15179
  
**[Test build #65769 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65769/consoleFull)**
 for PR 15179 at commit 
[`4335237`](https://github.com/apache/spark/commit/43352373d949e97f1c6bff65fc00fbd4a86e1db5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-22 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15179
  
Yes that's better IMHO. It makes this more consistent. Word2Vec was the 
only case that looked for an exact match on nullability when it was expecting 
nullable input (i.e. pointlessly disallowed non-null input)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-21 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/15179
  
`CountVectorizer` can accept both `Array(String, true)` and `Array(String, 
false)`. Perhaps we can just change the validation code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-21 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15179
  
I think the problem is that it won't accept anything that outputs 
`Array(String, true)`? it seems like it's looking for exactly the same type. 
Elsewhere I see a similar check written so as to not have to match nullability.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-21 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/15179
  
Checked locally and Word2Vec can work with `Array(String, false)`. We can 
loosen the constraint. Maybe a new UT can be added.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-21 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/15179
  
I'm not sure what will happen if we actually send an `Array(String, false)` 
column to Word2Vec. If nothing goes wrong, I think we can loosen the constraint.

I'll try something locally.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-21 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15179
  
Or really to have it accept an array of strings, whether nullable or not. 
Right now it can accept a nullable string array type, but will pointlessly 
reject input that is _not_ nullable. The check is too strict. We could just 
loosen that and then any string array input would work, which seems like the 
idea?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-21 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/15179
  
Hi @srowen. By fixing `Word2Vec`, do you mean to change its input type to 
`Array(String, false)`?

I cannot recall all the details. Right now I think the primary reason is to 
be consistent with auto schema inference, thus to support usage like 
https://github.com/apache/spark/blob/180fd3e0a3426db200c97170926afb60751dfd0e/examples/src/main/scala/org/apache/spark/examples/ml/Word2VecExample.scala
 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15179
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15179
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65711/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15179
  
**[Test build #65711 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65711/consoleFull)**
 for PR 15179 at commit 
[`0c7e552`](https://github.com/apache/spark/commit/0c7e552bae0fad20dd269476924f0b3d1256b67a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15179
  
**[Test build #65711 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65711/consoleFull)**
 for PR 15179 at commit 
[`0c7e552`](https://github.com/apache/spark/commit/0c7e552bae0fad20dd269476924f0b3d1256b67a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15179: [SPARK-10835] [ML] Change Output of NGram to Array(Strin...

2016-09-21 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15179
  
This change merely parallels https://github.com/apache/spark/pull/7414 from 
@hhbyyh  (what do you think?) I'd still sort of like to understand why we 
didn't 'fix' Word2Vec instead in that change?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org