[jira] [Commented] (LUCENE-10541) What to do about massive terms in our Wikipedia EN LineFileDocs?

2022-04-29 Thread Michael Sokolov (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17530144#comment-17530144 ] Michael Sokolov commented on LUCENE-10541: I think the probability of choosing a particular

[jira] [Commented] (LUCENE-10541) What to do about massive terms in our Wikipedia EN LineFileDocs?

2022-04-29 Thread Dawid Weiss (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529845#comment-17529845 ] Dawid Weiss commented on LUCENE-10541: I've applied the PR - we can close this issue (for now)?

[jira] [Commented] (LUCENE-10541) What to do about massive terms in our Wikipedia EN LineFileDocs?

2022-04-29 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529844#comment-17529844 ] ASF subversion and git services commented on LUCENE-10541: Commit

[jira] [Commented] (LUCENE-10541) What to do about massive terms in our Wikipedia EN LineFileDocs?

2022-04-29 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529839#comment-17529839 ] ASF subversion and git services commented on LUCENE-10541: Commit

[jira] [Commented] (LUCENE-10541) What to do about massive terms in our Wikipedia EN LineFileDocs?

2022-04-29 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529837#comment-17529837 ] ASF subversion and git services commented on LUCENE-10541: Commit

[jira] [Commented] (LUCENE-10541) What to do about massive terms in our Wikipedia EN LineFileDocs?

2022-04-28 Thread Michael McCandless (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529416#comment-17529416 ] Michael McCandless commented on LUCENE-10541: {quote}enwiki lines contains 2 million

[jira] [Commented] (LUCENE-10541) What to do about massive terms in our Wikipedia EN LineFileDocs?

2022-04-28 Thread Uwe Schindler (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529286#comment-17529286 ] Uwe Schindler commented on LUCENE-10541: I think another option might be to use

[jira] [Commented] (LUCENE-10541) What to do about massive terms in our Wikipedia EN LineFileDocs?

2022-04-28 Thread Dawid Weiss (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529261#comment-17529261 ] Dawid Weiss commented on LUCENE-10541: Filed a PR at https://github.com/apache/lucene/pull/850.

[jira] [Commented] (LUCENE-10541) What to do about massive terms in our Wikipedia EN LineFileDocs?

2022-04-27 Thread Dawid Weiss (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528991#comment-17528991 ] Dawid Weiss commented on LUCENE-10541: I agree - we should fix mock analyzer to not return such

[jira] [Commented] (LUCENE-10541) What to do about massive terms in our Wikipedia EN LineFileDocs?

2022-04-27 Thread Michael McCandless (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528913#comment-17528913 ] Michael McCandless commented on LUCENE-10541: {quote}Let's fix the default. I know the

[jira] [Commented] (LUCENE-10541) What to do about massive terms in our Wikipedia EN LineFileDocs?

2022-04-27 Thread Robert Muir (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528912#comment-17528912 ] Robert Muir commented on LUCENE-10541: I think the problem is this line in MockTokenizer: