[ https://issues.apache.org/jira/browse/SOLR-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854983#action_12854983 ]
Robert Muir commented on SOLR-1869:
-----------------------------------

bq. this all started because the highlighter was highlighting a term at the same offsets twice,

Perhaps we should fix this directly in DefaultSolrHighlighter?

It already has this TokenStream-sorting filter that's intended to do the following:
{code}
/** Orders Tokens in a window first by their startOffset ascending.
 * endOffset is currently ignored.
 * This is meant to work around fickleness in the highlighter only. It
 * can mess up token positions and should not be used for indexing or querying.
 */
{code}

Maybe the deduplication logic should occur here after it sorts on startOffset?

> RemoveDuplicatesTokenFilter doest have expected behaviour
> ---------------------------------------------------------
>
>                 Key: SOLR-1869
>                 URL: https://issues.apache.org/jira/browse/SOLR-1869
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Joe Calderon
>            Priority: Minor
>         Attachments: RemoveDupOffsetTokenFilter.java, RemoveDupOffsetTokenFilterFactory.java, SOLR-1869.patch
>
>
> the RemoveDuplicatesTokenFilter seems broken, as it initializes its map and
> attributes at the class level and not within its constructor.
> in addition, i would think the expected behaviour would be to remove identical
> terms with the same offset positions. instead, it looks like it removes
> duplicates based on position increment, which won't work when using it after
> something like the edgengram filter. when i posted this to the mailing list,
> even erik hatcher seemed to think that's what this filter was supposed to do...
> attaching a patch that has the expected behaviour and initializes variables
> in the constructor
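
As a rough illustration of the idea discussed above (sort the tokens in a window by startOffset, then drop any token whose term and offsets repeat an earlier one), here is a minimal sketch in plain Java. It is not the attached patch and not the DefaultSolrHighlighter code: `OffsetDedupSketch`, `Tok`, and `sortAndDedup` are hypothetical names, and a simple value class stands in for Lucene's token attributes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: plain Java objects stand in for Lucene tokens.
final class OffsetDedupSketch {

    /** Stand-in for a token: term text plus character offsets. */
    static final class Tok {
        final String term;
        final int startOffset;
        final int endOffset;

        Tok(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        boolean sameTermAndOffsets(Tok other) {
            return term.equals(other.term)
                    && startOffset == other.startOffset
                    && endOffset == other.endOffset;
        }

        @Override
        public String toString() {
            return term + "[" + startOffset + "," + endOffset + "]";
        }
    }

    /**
     * Sorts a window of tokens by startOffset (stable sort), then removes
     * tokens that repeat the term and offsets of an already-kept token.
     */
    static List<Tok> sortAndDedup(List<Tok> window) {
        List<Tok> sorted = new ArrayList<>(window);
        sorted.sort(Comparator.comparingInt((Tok t) -> t.startOffset));

        List<Tok> kept = new ArrayList<>();
        for (Tok t : sorted) {
            boolean duplicate = false;
            // After sorting, any duplicate must sit among the kept tokens
            // that share this startOffset, i.e. at the tail of the list.
            for (int i = kept.size() - 1;
                 i >= 0 && kept.get(i).startOffset == t.startOffset; i--) {
                if (kept.get(i).sameTermAndOffsets(t)) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Tok> window = Arrays.asList(
                new Tok("solr", 5, 9),
                new Tok("so", 0, 2),
                new Tok("solr", 5, 9),   // same term, same offsets: dropped
                new Tok("sol", 5, 8));   // same start, different term: kept
        System.out.println(sortAndDedup(window));
    }
}
```

Because the comparison uses term text and both offsets rather than position increments, tokens such as the prefixes emitted by an edge n-gram filter are kept as long as their text or end offset differs.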