[jira] [Commented] (LUCENE-8509) NGramTokenizer, TrimFilter and WordDelimiterGraphFilter in combination can produce backwards offsets

2018-12-04 Thread ASF subversion and git services (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708502#comment-16708502 ] ASF subversion and git services commented on LUCENE-8509: Commit

[jira] [Commented] (LUCENE-8509) NGramTokenizer, TrimFilter and WordDelimiterGraphFilter in combination can produce backwards offsets

2018-12-03 Thread Alan Woodward (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707100#comment-16707100 ] Alan Woodward commented on LUCENE-8509: I plan on committing this in the next couple of days >

[jira] [Commented] (LUCENE-8509) NGramTokenizer, TrimFilter and WordDelimiterGraphFilter in combination can produce backwards offsets

2018-11-19 Thread Alan Woodward (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691884#comment-16691884 ] Alan Woodward commented on LUCENE-8509: Here's an updated patch that allows you to conditionally

[jira] [Commented] (LUCENE-8509) NGramTokenizer, TrimFilter and WordDelimiterGraphFilter in combination can produce backwards offsets

2018-10-29 Thread Michael Gibney (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667406#comment-16667406 ] Michael Gibney commented on LUCENE-8509: I'd echo [~dsmiley]'s comment over at LUCENE-8516 – "I

[jira] [Commented] (LUCENE-8509) NGramTokenizer, TrimFilter and WordDelimiterGraphFilter in combination can produce backwards offsets

2018-10-28 Thread Alan Woodward (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1403#comment-1403 ] Alan Woodward commented on LUCENE-8509: > WDGF is playing the role of a tokenizer This is the

[jira] [Commented] (LUCENE-8509) NGramTokenizer, TrimFilter and WordDelimiterGraphFilter in combination can produce backwards offsets

2018-10-26 Thread Mike Sokolov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665429#comment-16665429 ] Mike Sokolov commented on LUCENE-8509: [ from mailing list – sorry for the duplication ] The

Re: [jira] [Commented] (LUCENE-8509) NGramTokenizer, TrimFilter and WordDelimiterGraphFilter in combination can produce backwards offsets

2018-10-26 Thread Michael Gibney
Ah, I see -- thanks, Michael. To make sure I understand correctly, this particular case (with this particular order of analysis components) *would* in fact be fixed by causing TrimFilter to update offsets. But for the sake of argument, if we had some filter *before* TrimFilter that for some reason
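
The first point can be made concrete with a simplified model. The sketch below is hypothetical Python, not Lucene's API: `ngrams` emits fixed-size 3-grams, `trim_keep_offsets` and `trim_update_offsets` are two assumed trim behaviors, and `split_on_delimiters` is an assumed, much-reduced imitation of WordDelimiterGraphFilter that derives each part's offsets from its position inside the term. On the text "a bb", keeping offsets unchanged on trim yields start offsets that go backwards, while moving the start offset past the stripped whitespace keeps them in order for this particular chain.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    term: str
    start: int  # start offset into the original text
    end: int    # end offset (exclusive) into the original text


def ngrams(text, n):
    # Fixed-size n-grams in order of start position (a simplification).
    return [Token(text[i:i + n], i, i + n) for i in range(len(text) - n + 1)]


def trim_keep_offsets(tok):
    # Strips whitespace from the term but leaves the offsets untouched.
    return Token(tok.term.strip(), tok.start, tok.end)


def trim_update_offsets(tok):
    # Strips whitespace and shrinks the offsets to cover only what remains.
    # Assumes the incoming term still lines up with text[start:end].
    stripped = tok.term.strip()
    lead = len(tok.term) - len(tok.term.lstrip())
    start = tok.start + lead
    return Token(stripped, start, start + len(stripped))


def split_on_delimiters(tok):
    # Hypothetical splitter: breaks the term on non-alphanumerics and derives
    # each part's offsets from its index within the term.
    return [Token(m.group(), tok.start + m.start(), tok.start + m.end())
            for m in re.finditer(r"[A-Za-z0-9]+", tok.term)]


def start_offsets(text, trim):
    # Runs 3-grams -> trim -> split and collects the start offsets in order.
    out = []
    for tok in ngrams(text, 3):
        out.extend(split_on_delimiters(trim(tok)))
    return [t.start for t in out]


print(start_offsets("a bb", trim_keep_offsets))    # [0, 2, 1]  (goes backwards)
print(start_offsets("a bb", trim_update_offsets))  # [0, 2, 2]
```

In this model `trim_update_offsets` only produces correct offsets because its input term still matches `text[start:end]` character for character; it has no way to detect a term that an earlier step has already changed.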

Re: [jira] [Commented] (LUCENE-8509) NGramTokenizer, TrimFilter and WordDelimiterGraphFilter in combination can produce backwards offsets

2018-10-26 Thread Michael Sokolov
In case it wasn't clear, I am +1 for Alan's plan. We can always restore offset-alterations here if at some future date we figure out how to do it correctly. On Fri, Oct 26, 2018 at 6:08 AM Michael Sokolov wrote: > The current situation is that it is impossible to apply offsets correctly > in a

Re: [jira] [Commented] (LUCENE-8509) NGramTokenizer, TrimFilter and WordDelimiterGraphFilter in combination can produce backwards offsets

2018-10-26 Thread Michael Sokolov
The current situation is that it is impossible to apply offsets correctly in a TokenFilter. It seems to work OK most of the time, but truly correct behavior relies on prior components in the chain not having altered the length of tokens, which some of them occasionally do. For complete correctness
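
The dependency on earlier components can be illustrated with a hypothetical Python model (not Lucene classes). Here `split_derived_offsets` computes offsets as `start + index`, which silently assumes `len(term) == end - start`; the trim step breaks that assumption for the 3-gram " bb". `split_parent_offsets` does no offset arithmetic and gives every part its parent token's span, which is coarser but stays consistent. It is shown only as an illustration of not adjusting offsets, not as the actual patch.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    term: str
    start: int  # start offset into the original text
    end: int    # end offset (exclusive) into the original text


def ngrams(text, n):
    # Fixed-size n-grams in order of start position (a simplification).
    return [Token(text[i:i + n], i, i + n) for i in range(len(text) - n + 1)]


def trim(tok):
    # Changes the term's length but leaves the offsets alone.
    return Token(tok.term.strip(), tok.start, tok.end)


def parts(term):
    # Alphanumeric runs within the term, with their positions.
    return list(re.finditer(r"[A-Za-z0-9]+", term))


def split_derived_offsets(tok):
    # Offsets computed from positions in the term: only valid while
    # len(term) == end - start, which nothing here checks.
    return [Token(m.group(), tok.start + m.start(), tok.start + m.end())
            for m in parts(tok.term)]


def split_parent_offsets(tok):
    # No offset arithmetic: every part keeps the parent token's offsets.
    return [Token(m.group(), tok.start, tok.end) for m in parts(tok.term)]


def analyze(text, split):
    # Runs 3-grams -> trim -> split and returns the resulting tokens.
    out = []
    for tok in ngrams(text, 3):
        out.extend(split(trim(tok)))
    return out


def starts_in_order(tokens):
    # In this model, start offsets must never decrease along the stream.
    starts = [t.start for t in tokens]
    return all(a <= b for a, b in zip(starts, starts[1:]))


for split in (split_derived_offsets, split_parent_offsets):
    tokens = analyze("a bb", split)
    print(split.__name__, [(t.term, t.start, t.end) for t in tokens],
          starts_in_order(tokens))
```

With the derived-offset splitter the start offsets come out as 0, 2, 1 and the ordering check fails; with parent offsets they are 0, 0, 1 and it passes.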

[jira] [Commented] (LUCENE-8509) NGramTokenizer, TrimFilter and WordDelimiterGraphFilter in combination can produce backwards offsets

2018-10-24 Thread Michael Gibney (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663115#comment-16663115 ] Michael Gibney commented on LUCENE-8509: > The trim filter removes the leading space from the

[jira] [Commented] (LUCENE-8509) NGramTokenizer, TrimFilter and WordDelimiterGraphFilter in combination can produce backwards offsets

2018-10-24 Thread David Smiley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662897#comment-16662897 ] David Smiley commented on LUCENE-8509: > The trim filter removes the leading space from the

[jira] [Commented] (LUCENE-8509) NGramTokenizer, TrimFilter and WordDelimiterGraphFilter in combination can produce backwards offsets

2018-10-24 Thread Alan Woodward (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662159#comment-16662159 ] Alan Woodward commented on LUCENE-8509: Here is a patch removing the offset-adjustment logic from

[jira] [Commented] (LUCENE-8509) NGramTokenizer, TrimFilter and WordDelimiterGraphFilter in combination can produce backwards offsets

2018-10-24 Thread Alan Woodward (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662137#comment-16662137 ] Alan Woodward commented on LUCENE-8509: A related case:

[jira] [Commented] (LUCENE-8509) NGramTokenizer, TrimFilter and WordDelimiterGraphFilter in combination can produce backwards offsets

2018-09-19 Thread Alan Woodward (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620596#comment-16620596 ] Alan Woodward commented on LUCENE-8509: I don't think this is fixable with the current setup, but