[
https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750041#comment-15750041
]
Joseph K. Bradley commented on SPARK-18374:
---
Oh nice, I didn't realize that was
[
https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749138#comment-15749138
]
Sean Owen commented on SPARK-18374:
---
Yeah I tagged as 'releasenotes' for that reason --
[
https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749114#comment-15749114
]
Joseph K. Bradley commented on SPARK-18374:
---
I noted this change of behavior in
[
https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15712533#comment-15712533
]
Apache Spark commented on SPARK-18374:
---
User 'hhbyyh' has created a pull request for
[
https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15711645#comment-15711645
]
Sean Owen commented on SPARK-18374:
---
Seems OK to me and to remove the stems like won.
[
https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710198#comment-15710198
]
yuhao yang commented on SPARK-18374:
I checked with some other lists of stopwords and
[
https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15708181#comment-15708181
]
Sean Owen commented on SPARK-18374:
---
I think you can proceed to remove things like "won
[
https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707659#comment-15707659
]
yuhao yang commented on SPARK-18374:
Yes. Currently we're discussing if we should put
[
https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707642#comment-15707642
]
Xiangrui Meng commented on SPARK-18374:
---
See the discussion here: https://github.co
[
https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707560#comment-15707560
]
yuhao yang commented on SPARK-18374:
cc [~mengxr] to see if he recalls any specific r
[
https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15664807#comment-15664807
]
Sean Owen commented on SPARK-18374:
---
[~whisper] do you have a comment here? it does see
[
https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15664503#comment-15664503
]
yuhao yang commented on SPARK-18374:
Thanks for the response. By default, _Tokenizer_
[
https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15663273#comment-15663273
]
Sean Owen commented on SPARK-18374:
---
Adding the stop-words is fine, however, if the ups
[
https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15662015#comment-15662015
]
yuhao yang commented on SPARK-18374:
With the default behavior of the _Tokenizer_ and
[
https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652076#comment-15652076
]
Sean Owen commented on SPARK-18374:
---
It's a fair point indeed, because it would be much
[
https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651793#comment-15651793
]
nirav patel commented on SPARK-18374:
---
[~srowen] Do you mean how to tokenize words in
[
https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15650799#comment-15650799
]
Sean Owen commented on SPARK-18374:
---
I think the idea is that it's applied post-tokeniz
[
https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649953#comment-15649953
]
yuhao yang commented on SPARK-18374:
Just to provide some history info for the issue: