[jira] [Commented] (LUCENE-8730) Ensure WordDelimiterGraphFilter always emits its original token first

2019-04-02 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807496#comment-16807496
 ] 

ASF subversion and git services commented on LUCENE-8730:
-

Commit 9591052fede6dda95fc26113bb22ab79b5405a75 in lucene-solr's branch 
refs/heads/branch_8x from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9591052 ]

LUCENE-8730: WordDelimiterGraphFilter always emits its original token first


> Ensure WordDelimiterGraphFilter always emits its original token first
> -
>
> Key: LUCENE-8730
> URL: https://issues.apache.org/jira/browse/LUCENE-8730
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8730.patch, LUCENE-8730.patch
>
>
> WordDelimiterFilter and WordDelimiterGraphFilter behave almost identically 
> outside setting position length; the only difference being that WDGF can 
> sometimes emit its original token as the second output token rather than the 
> first.  We should change this to conform to the behaviour of the older filter 
> - this will make it much easier to remove WDF entirely and cut over tests 
> that use it incidentally.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8730) Ensure WordDelimiterGraphFilter always emits its original token first

2019-04-02 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807497#comment-16807497
 ] 

ASF subversion and git services commented on LUCENE-8730:
-

Commit 3de0b3671998cc9bc723d10f1b31ce48cbd4fa64 in lucene-solr's branch 
refs/heads/master from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3de0b36 ]

LUCENE-8730: WordDelimiterGraphFilter always emits its original token first





[jira] [Commented] (LUCENE-8730) Ensure WordDelimiterGraphFilter always emits its original token first

2019-04-01 Thread Jim Ferenczi (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806975#comment-16806975
 ] 

Jim Ferenczi commented on LUCENE-8730:
--

+1, thanks Alan




[jira] [Commented] (LUCENE-8730) Ensure WordDelimiterGraphFilter always emits its original token first

2019-04-01 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806942#comment-16806942
 ] 

Alan Woodward commented on LUCENE-8730:
---

Updated patch, folding in Jim's feedback.




[jira] [Commented] (LUCENE-8730) Ensure WordDelimiterGraphFilter always emits its original token first

2019-04-01 Thread Jim Ferenczi (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806642#comment-16806642
 ] 

Jim Ferenczi commented on LUCENE-8730:
--

+1 to outputting the original token first. Is it possible to set the original 
token offset (savedTermLength) only once, since the value doesn't change? I also 
wonder if the first value in the buffer should be excluded from the sort entirely 
(e.g. call sorter.sort(1, bufferedLen)) to ensure correctness?
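
To illustrate the second suggestion, here is a minimal sketch of sorting only 
the tail of a buffer so that slot 0 can never be reordered. It is not the code 
from the patch: the Part class, the BY_POSITION ordering and the sample 
positions are assumptions made up for the example, and java.util.Arrays.sort 
stands in for the filter's own sorter.

```java
import java.util.Arrays;
import java.util.Comparator;

class OriginalFirstSketch {

    /** Hypothetical stand-in for one buffered token part. */
    static final class Part {
        final String text;
        final int startPos;
        final int endPos;

        Part(String text, int startPos, int endPos) {
            this.text = text;
            this.startPos = startPos;
            this.endPos = endPos;
        }
    }

    /** Assumed ordering: smaller start position first, longer span first on ties. */
    static final Comparator<Part> BY_POSITION = (a, b) ->
        a.startPos != b.startPos
            ? Integer.compare(a.startPos, b.startPos)
            : Integer.compare(b.endPos, a.endPos);

    /** Sorts entries 1..bufferedLen-1 and leaves entry 0 (the original token) alone. */
    static void sortKeepingFirst(Part[] buffer, int bufferedLen) {
        if (bufferedLen > 1) {
            // The range starts at 1, so the comparator never sees slot 0.
            Arrays.sort(buffer, 1, bufferedLen, BY_POSITION);
        }
    }

    public static void main(String[] args) {
        Part[] buffer = {
            new Part("Power-Shot", 0, 1), // original token, placed first by the caller
            new Part("Shot", 1, 2),
            new Part("PowerShot", 0, 2),  // would sort ahead of slot 0 in a full sort
            new Part("Power", 0, 1),
        };
        sortKeepingFirst(buffer, buffer.length);
        for (Part p : buffer) {
            System.out.println(p.text + " [" + p.startPos + "," + p.endPos + "]");
        }
    }
}
```

With this shape, the position of the original token no longer depends on how 
the comparator breaks ties, which is the correctness concern raised above.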
 
