[jira] [Commented] (SPARK-18374) Incorrect words in StopWords/english.txt

2016-12-14 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750041#comment-15750041 ] Joseph K. Bradley commented on SPARK-18374: Oh nice, I didn't realize that was

[jira] [Commented] (SPARK-18374) Incorrect words in StopWords/english.txt

2016-12-14 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749138#comment-15749138 ] Sean Owen commented on SPARK-18374: Yeah I tagged as 'releasenotes' for that reason --

[jira] [Commented] (SPARK-18374) Incorrect words in StopWords/english.txt

2016-12-14 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749114#comment-15749114 ] Joseph K. Bradley commented on SPARK-18374: I noted this change of behavior in

[jira] [Commented] (SPARK-18374) Incorrect words in StopWords/english.txt

2016-12-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15712533#comment-15712533 ] Apache Spark commented on SPARK-18374: User 'hhbyyh' has created a pull request for

[jira] [Commented] (SPARK-18374) Incorrect words in StopWords/english.txt

2016-12-01 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15711645#comment-15711645 ] Sean Owen commented on SPARK-18374: Seems OK to me and to remove the stems like won.

[jira] [Commented] (SPARK-18374) Incorrect words in StopWords/english.txt

2016-11-30 Thread yuhao yang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710198#comment-15710198 ] yuhao yang commented on SPARK-18374: I checked with some other lists of stopwords and

[jira] [Commented] (SPARK-18374) Incorrect words in StopWords/english.txt

2016-11-30 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15708181#comment-15708181 ] Sean Owen commented on SPARK-18374: I think you can proceed to remove things like "won

[jira] [Commented] (SPARK-18374) Incorrect words in StopWords/english.txt

2016-11-29 Thread yuhao yang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707659#comment-15707659 ] yuhao yang commented on SPARK-18374: Yes. Currently we're discussing if we should put

[jira] [Commented] (SPARK-18374) Incorrect words in StopWords/english.txt

2016-11-29 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707642#comment-15707642 ] Xiangrui Meng commented on SPARK-18374: See the discussion here: https://github.co

[jira] [Commented] (SPARK-18374) Incorrect words in StopWords/english.txt

2016-11-29 Thread yuhao yang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707560#comment-15707560 ] yuhao yang commented on SPARK-18374: cc [~mengxr] to see if he recalls any specific r

[jira] [Commented] (SPARK-18374) Incorrect words in StopWords/english.txt

2016-11-14 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15664807#comment-15664807 ] Sean Owen commented on SPARK-18374: [~whisper] do you have a comment here? it does see

[jira] [Commented] (SPARK-18374) Incorrect words in StopWords/english.txt

2016-11-14 Thread yuhao yang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15664503#comment-15664503 ] yuhao yang commented on SPARK-18374: Thanks for the response. By default, _Tokenizer_

[jira] [Commented] (SPARK-18374) Incorrect words in StopWords/english.txt

2016-11-14 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15663273#comment-15663273 ] Sean Owen commented on SPARK-18374: Adding the stop-words is fine, however, if the ups

[jira] [Commented] (SPARK-18374) Incorrect words in StopWords/english.txt

2016-11-13 Thread yuhao yang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15662015#comment-15662015 ] yuhao yang commented on SPARK-18374: With the default behavior of the _Tokenizer_ and

[jira] [Commented] (SPARK-18374) Incorrect words in StopWords/english.txt

2016-11-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652076#comment-15652076 ] Sean Owen commented on SPARK-18374: It's a fair point indeed, because it would be much

[jira] [Commented] (SPARK-18374) Incorrect words in StopWords/english.txt

2016-11-09 Thread nirav patel (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651793#comment-15651793 ] nirav patel commented on SPARK-18374: [~srowen] Do you mean how to tokenize words in

[jira] [Commented] (SPARK-18374) Incorrect words in StopWords/english.txt

2016-11-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15650799#comment-15650799 ] Sean Owen commented on SPARK-18374: I think the idea is that it's applied post-tokeniz

[jira] [Commented] (SPARK-18374) Incorrect words in StopWords/english.txt

2016-11-08 Thread yuhao yang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649953#comment-15649953 ] yuhao yang commented on SPARK-18374: Just to provide some history info for the issue: