Robert Muir commented on SOLR-1869:

bq. this all started because the highlighter was highlighting a term at the same offsets twice,

Perhaps we should fix this directly in DefaultSolrHighlighter? It already has
a TokenStream-sorting filter that's intended to do the following:
/** Orders Tokens in a window first by their startOffset ascending.
 * endOffset is currently ignored.
 * This is meant to work around fickleness in the highlighter only.  It
 * can mess up token positions and should not be used for indexing or querying.

Maybe the deduplication logic should occur here after it sorts on startOffset? 
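Here is a minimal sketch of that idea. It models tokens as plain (term, startOffset, endOffset) records instead of Lucene attributes, and `SortAndDedupSketch`, `Tok`, and `sortAndDedup` are hypothetical names, not code from DefaultSolrHighlighter. After a stable sort on startOffset (endOffset is ignored for ordering, as the Javadoc says), the sketch drops any token that repeats a term and endOffset already seen at the same startOffset. It only needs to remember one startOffset group at a time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SortAndDedupSketch {

    // Hypothetical stand-in for a token: term text plus character offsets.
    record Tok(String term, int startOffset, int endOffset) {}

    // Sorts by startOffset (List.sort is stable, so ties keep input order),
    // then drops tokens identical to one already kept in the same
    // startOffset group (same term, same start and end offsets).
    static List<Tok> sortAndDedup(List<Tok> window) {
        List<Tok> sorted = new ArrayList<>(window);
        sorted.sort(Comparator.comparingInt(Tok::startOffset));

        List<Tok> out = new ArrayList<>();
        Set<Tok> seenAtStart = new HashSet<>(); // records give value equality
        int currentStart = Integer.MIN_VALUE;
        for (Tok t : sorted) {
            if (t.startOffset() != currentStart) {
                // New startOffset group: earlier groups can't hold duplicates.
                currentStart = t.startOffset();
                seenAtStart.clear();
            }
            if (seenAtStart.add(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> window = List.of(
            new Tok("foo", 0, 3),
            new Tok("bar", 4, 7),
            new Tok("foo", 0, 3),   // same term, same offsets: duplicate
            new Tok("fo", 0, 3));   // different term at same offsets: kept
        for (Tok t : sortAndDedup(window)) {
            System.out.println(t);
        }
    }
}
```

With this input the second `foo` is dropped, and `foo`, `fo`, `bar` come out in that order.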

> RemoveDuplicatesTokenFilter doesn't have expected behaviour
> -----------------------------------------------------------
>                 Key: SOLR-1869
>                 URL: https://issues.apache.org/jira/browse/SOLR-1869
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Joe Calderon
>            Priority: Minor
>         Attachments: RemoveDupOffsetTokenFilter.java, RemoveDupOffsetTokenFilterFactory.java, SOLR-1869.patch
> The RemoveDuplicatesTokenFilter seems broken, as it initializes its map and
> attributes at the class level rather than within its constructor.
> In addition, I would expect the filter to remove identical terms with the
> same offsets. Instead, it appears to remove duplicates based on position
> increment, which won't work when it is used after something like the
> EdgeNGram filter. When I posted this to the mailing list, even Erik Hatcher
> seemed to think that's what this filter was supposed to do.
> I'm attaching a patch that has the expected behaviour and initializes its
> variables in the constructor.
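Here is a minimal sketch contrasting the two dedup rules on a hand-built token list. It models tokens as plain Java records instead of Lucene attributes. `DedupComparisonSketch`, `dedupByPosition`, and `dedupByOffsets` are hypothetical names, not the code in RemoveDuplicatesTokenFilter or in the attached patch. The position-based rule only compares tokens stacked at the same position (posInc == 0), so a repeat that arrives with posInc 1 slips through. The offset-based rule described in the report catches it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupComparisonSketch {

    // Hypothetical token model: term, position increment, character offsets.
    record Tok(String term, int posInc, int startOffset, int endOffset) {}

    // Key used by the offset-based rule: term plus exact offsets.
    record Key(String term, int startOffset, int endOffset) {}

    // Position-based: drop a token only if its term was already seen at the
    // current position (posInc == 0 means "same position as previous").
    static List<Tok> dedupByPosition(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        Set<String> termsAtPosition = new HashSet<>();
        for (Tok t : in) {
            if (t.posInc() > 0) {
                termsAtPosition.clear(); // moved on to a new position
            }
            if (termsAtPosition.add(t.term())) {
                out.add(t);
            }
        }
        return out;
    }

    // Offset-based: drop a token if the same term was already emitted with
    // identical start and end offsets, regardless of position increment.
    // Simplifications: keys are kept for the whole stream, and the posInc of
    // a dropped token is not carried over to the next kept token.
    static List<Tok> dedupByOffsets(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        Set<Key> seen = new HashSet<>();
        for (Tok t : in) {
            if (seen.add(new Key(t.term(), t.startOffset(), t.endOffset()))) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hand-built stream: the third token repeats "foo" at the same
        // offsets but arrives with posInc 1 instead of 0.
        List<Tok> stream = List.of(
            new Tok("foo", 1, 0, 3),
            new Tok("foo", 0, 0, 3),
            new Tok("foo", 1, 0, 3));
        System.out.println("by position: " + dedupByPosition(stream).size());
        System.out.println("by offsets:  " + dedupByOffsets(stream).size());
    }
}
```

On this stream the position-based rule keeps two `foo` tokens, and the offset-based rule keeps one.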

