[jira] [Commented] (LUCENE-8730) Ensure WordDelimiterGraphFilter always emits its original token first
[ https://issues.apache.org/jira/browse/LUCENE-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807496#comment-16807496 ]

ASF subversion and git services commented on LUCENE-8730:
---------------------------------------------------------

Commit 9591052fede6dda95fc26113bb22ab79b5405a75 in lucene-solr's branch refs/heads/branch_8x from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9591052 ]

LUCENE-8730: WordDelimiterGraphFilter always emits its original token first

> Ensure WordDelimiterGraphFilter always emits its original token first
> ---------------------------------------------------------------------
>
>          Key: LUCENE-8730
>          URL: https://issues.apache.org/jira/browse/LUCENE-8730
>      Project: Lucene - Core
>   Issue Type: Improvement
>     Reporter: Alan Woodward
>     Assignee: Alan Woodward
>     Priority: Major
>  Attachments: LUCENE-8730.patch, LUCENE-8730.patch
>
>
> WordDelimiterFilter and WordDelimiterGraphFilter behave almost identically
> outside setting position length; the only difference being that WDGF can
> sometimes emit its original token as the second output token rather than the
> first. We should change this to conform to the behaviour of the older filter
> - this will make it much easier to remove WDF entirely and cut over tests
> that use it incidentally.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8730) Ensure WordDelimiterGraphFilter always emits its original token first
[ https://issues.apache.org/jira/browse/LUCENE-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807497#comment-16807497 ]

ASF subversion and git services commented on LUCENE-8730:
---------------------------------------------------------

Commit 3de0b3671998cc9bc723d10f1b31ce48cbd4fa64 in lucene-solr's branch refs/heads/master from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3de0b36 ]

LUCENE-8730: WordDelimiterGraphFilter always emits its original token first
[jira] [Commented] (LUCENE-8730) Ensure WordDelimiterGraphFilter always emits its original token first
[ https://issues.apache.org/jira/browse/LUCENE-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806975#comment-16806975 ]

Jim Ferenczi commented on LUCENE-8730:
--------------------------------------

+1, thanks Alan
[jira] [Commented] (LUCENE-8730) Ensure WordDelimiterGraphFilter always emits its original token first
[ https://issues.apache.org/jira/browse/LUCENE-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806942#comment-16806942 ]

Alan Woodward commented on LUCENE-8730:
---------------------------------------

Updated patch, folding in Jim's feedback.
[jira] [Commented] (LUCENE-8730) Ensure WordDelimiterGraphFilter always emits its original token first
[ https://issues.apache.org/jira/browse/LUCENE-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806642#comment-16806642 ]

Jim Ferenczi commented on LUCENE-8730:
--------------------------------------

+1 to output the original token first. Is it possible to set the original
token offset (savedTermLength) once, since the value doesn't change? I also
wonder if the first value in the buffer should be excluded from the sort
entirely (e.g. call sorter.sort(1, bufferedLen)) to ensure correctness?
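
The suggestion to sort only the range (1, bufferedLen) can be illustrated with a small standalone sketch. This is not the WordDelimiterGraphFilter code or the committed patch: the Part class, the sample offsets, and the comparator (start offset, then end offset) are hypothetical, and java.util.Arrays.sort over a sub-range stands in for the filter's own sorter. It only shows that leaving index 0 out of the sorted range keeps the original token first regardless of how the comparator would otherwise rank it.

```java
import java.util.Arrays;
import java.util.Comparator;

public class PinnedFirstSort {

    /** Hypothetical stand-in for one buffered token: its text and offsets in the original term. */
    public static final class Part {
        public final String text;
        public final int start;
        public final int end;

        public Part(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Illustrative ordering only: by start offset, then by end offset (shorter parts first). */
    static final Comparator<Part> ORDER =
        Comparator.comparingInt((Part p) -> p.start).thenComparingInt(p -> p.end);

    /**
     * Sorts buffer[1 .. bufferedLen) and leaves buffer[0] (the original token)
     * where it is, so it is always the first entry.
     */
    public static void sortKeepingFirst(Part[] buffer, int bufferedLen) {
        if (bufferedLen > 2) {
            Arrays.sort(buffer, 1, bufferedLen, ORDER);
        }
    }

    /** Sample buffer for the term "PowerShot-500": original token first, parts unordered. */
    public static Part[] sampleBuffer() {
        return new Part[] {
            new Part("PowerShot-500", 0, 13), // original token, pinned at index 0
            new Part("500", 10, 13),
            new Part("Shot", 5, 9),
            new Part("Power", 0, 5),
            new Part("PowerShot", 0, 9),
        };
    }

    public static void main(String[] args) {
        Part[] buffer = sampleBuffer();
        sortKeepingFirst(buffer, buffer.length);
        for (Part p : buffer) {
            System.out.println(p.text);
        }
        // Prints: PowerShot-500, Power, PowerShot, Shot, 500 (one per line)
    }
}
```

With this comparator, sorting the whole buffer would place the original token (0, 13) after "Power" and "PowerShot", which is the kind of reordering the sub-range sort rules out.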